* [RFC PATCH 0/9] arm64: use unwind data on GCC for shadow call stack
@ 2021-10-13 15:22 ` Ard Biesheuvel
  0 siblings, 0 replies; 32+ messages in thread
From: Ard Biesheuvel @ 2021-10-13 15:22 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-hardening, mark.rutland, catalin.marinas, will,
	Ard Biesheuvel, Kees Cook, Sami Tolvanen, Fangrui Song,
	Nick Desaulniers, Dan Li

This series is a proof-of-concept implementation of using unwind tables
to locate PACIASP/AUTIASP instructions in the code, and patching them
into shadow call stack pushes/pops at boot time if the platform in
question does not support pointer authentication in hardware. This way,
the overhead of the shadow call stack is only incurred where it actually
provides a benefit. It also means that the compiler does not need to
generate the shadow call stack code, so this works with GCC as well.

In fact, it only works with GCC at the moment, as Clang does not seem to
implement DW_CFA_negate_ra_state correctly: the directive should be
emitted after each PACIASP or AUTIASP instruction, but Clang only emits
it after the former. However, GCC does not appear to get it quite right
either, as it emits the directive in the wrong place in some cases (but
in a way that can be worked around).
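
For illustration only, and not taken from the series: a minimal sketch
of how the call frame instructions of a single FDE could be scanned for
DW_CFA_negate_ra_state. The function name find_negate_ra_state(), the
small subset of opcodes it understands, and the idea of passing the
code alignment factor as a parameter are all assumptions made for this
example; a real parser has to cover the full opcode set and parse the
CIE/FDE headers first. Each reported location is the code offset that
follows the PACIASP or AUTIASP instruction, so the instruction itself
would sit 4 bytes earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DWARF call frame opcodes; only a subset is handled in this sketch. */
#define DW_CFA_nop		0x00
#define DW_CFA_advance_loc1	0x02
#define DW_CFA_def_cfa_offset	0x0e
#define DW_CFA_negate_ra_state	0x2d	/* AArch64 vendor opcode */
#define DW_CFA_advance_loc	0x40	/* primary opcode, delta in low 6 bits */
#define DW_CFA_offset		0x80	/* primary opcode, register in low 6 bits */
#define DW_CFA_restore		0xc0	/* primary opcode, register in low 6 bits */

/* Skip one ULEB128 operand and return the position that follows it. */
static size_t skip_uleb128(const uint8_t *p, size_t pos, size_t len)
{
	while (pos < len && (p[pos++] & 0x80))
		;
	return pos;
}

/*
 * Hypothetical helper: walk the call frame instructions of one FDE and
 * store the code offset of every DW_CFA_negate_ra_state in out[].
 * Returns the number of occurrences found, or -1 if an opcode outside
 * the supported subset (or a truncated operand) is encountered.
 */
static int find_negate_ra_state(const uint8_t *insns, size_t len,
				unsigned int code_align,
				uint64_t *out, size_t max_out)
{
	uint64_t loc = 0;
	size_t pos = 0;
	int found = 0;

	assert(code_align > 0);

	while (pos < len) {
		uint8_t op = insns[pos++];

		/* Primary opcodes are encoded in the top two bits. */
		switch (op & 0xc0) {
		case DW_CFA_advance_loc:
			loc += (uint64_t)(op & 0x3f) * code_align;
			continue;
		case DW_CFA_offset:
			pos = skip_uleb128(insns, pos, len);
			continue;
		case DW_CFA_restore:
			continue;
		}

		/* Extended opcodes use the whole byte. */
		switch (op) {
		case DW_CFA_nop:
			break;
		case DW_CFA_advance_loc1:
			if (pos >= len)
				return -1;
			loc += (uint64_t)insns[pos++] * code_align;
			break;
		case DW_CFA_def_cfa_offset:
			pos = skip_uleb128(insns, pos, len);
			break;
		case DW_CFA_negate_ra_state:
			if ((size_t)found < max_out)
				out[found] = loc;
			found++;
			break;
		default:
			return -1;	/* not covered by this sketch */
		}
	}
	return found;
}
```

Bailing out on unknown opcodes rather than guessing their operand
length keeps the sketch from misinterpreting operand bytes as opcodes.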

Note that this only implements it for the core kernel. Modules should be
straightforward, and most of the code can be reused. Also, the
transformation is applied unconditionally, even if the hardware does
implement PAC, but this does not really matter for a PoC.
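
To make the transformation concrete, here is a minimal sketch (not the
code from patch 9/9) of rewriting a single instruction word. The helper
name scs_patch_insn() is hypothetical. The shadow call stack push/pop
encodings assume x18 is the shadow call stack pointer and that each
entry is 8 bytes. Instruction endianness, writing through a suitable
mapping, and cache maintenance are all left out.

```c
#include <stddef.h>
#include <stdint.h>

/* A64 encodings of the instructions involved. */
#define INSN_PACIASP	0xd503233fu	/* hint #25 */
#define INSN_AUTIASP	0xd50323bfu	/* hint #29 */
#define INSN_SCS_PUSH	0xf800865eu	/* str x30, [x18], #8 */
#define INSN_SCS_POP	0xf85f8e5eu	/* ldr x30, [x18, #-8]! */

/*
 * Hypothetical helper: turn a PACIASP into a shadow call stack push and
 * an AUTIASP into a shadow call stack pop. Returns 0 if the word was
 * patched, or -1 if it is neither instruction, in which case it is left
 * untouched.
 */
static int scs_patch_insn(uint32_t *insn)
{
	switch (*insn) {
	case INSN_PACIASP:
		*insn = INSN_SCS_PUSH;
		return 0;
	case INSN_AUTIASP:
		*insn = INSN_SCS_POP;
		return 0;
	default:
		return -1;
	}
}
```

Refusing to patch anything other than the two expected opcodes guards
against unwind data that points at the wrong location, which is
relevant given the misplaced directives mentioned above.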

One obvious downside is the size of the unwind tables (3 MiB for
defconfig), although there are plenty of use cases where this does not
really matter (and I haven't checked the compressed size). However,
there may be other reasons why we'd want to have access to these unwind
tables (reliable stack traces), so this will need to be discussed before
I take this any further.

Cc: Kees Cook <keescook@google.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Dan Li <ashimida@linux.alibaba.com>

Ard Biesheuvel (9):
  arm64: assembler: enable PAC for non-leaf assembler routines
  arm64: cache: use ALIAS version of linkage macros for local aliases
  arm64: crypto: avoid overlapping linkage definitions for AES-CBC
  arm64: aes-neonbs: move frame pop to end of function
  arm64: chacha-neon: move frame pop forward
  arm64: smccc: create proper stack frames for HVC/SMC calls
  arm64: assembler: add unwind annotations to frame push/pop macros
  arm64: unwind: add asynchronous unwind tables to the kernel proper
  arm64: implement dynamic shadow call stack for GCC

 Makefile                              |   4 +-
 arch/Kconfig                          |   4 +-
 arch/arm64/Kconfig                    |  11 +-
 arch/arm64/Makefile                   |   7 +-
 arch/arm64/crypto/aes-modes.S         |   4 +-
 arch/arm64/crypto/aes-neonbs-core.S   |   8 +-
 arch/arm64/crypto/chacha-neon-core.S  |   9 +-
 arch/arm64/include/asm/assembler.h    |  32 ++-
 arch/arm64/include/asm/linkage.h      |  16 +-
 arch/arm64/kernel/Makefile            |   2 +
 arch/arm64/kernel/head.S              |   3 +
 arch/arm64/kernel/patch-scs.c         | 223 ++++++++++++++++++++
 arch/arm64/kernel/smccc-call.S        |  40 ++--
 arch/arm64/kernel/vmlinux.lds.S       |  20 ++
 arch/arm64/mm/cache.S                 |   8 +-
 drivers/firmware/efi/libstub/Makefile |   1 +
 16 files changed, 347 insertions(+), 45 deletions(-)
 create mode 100644 arch/arm64/kernel/patch-scs.c

-- 
2.30.2


* [RFC PATCH 1/9] arm64: assembler: enable PAC for non-leaf assembler routines
  2021-10-13 15:22 ` Ard Biesheuvel
@ 2021-10-13 15:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 32+ messages in thread
From: Ard Biesheuvel @ 2021-10-13 15:22 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-hardening, mark.rutland, catalin.marinas, will, Ard Biesheuvel

Enable pointer signing and authentication when preserving and restoring
the link register to/from the stack for assembler routines that use
the frame_push and frame_pop macros to set up their stack frames. This
protects the return address from inadvertent modification while stored
in memory.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/assembler.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 89faca0e740d..ceed84ac4005 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -665,6 +665,9 @@ alternative_endif
 	 *              for locals.
 	 */
 	.macro		frame_push, regcount:req, extra
+#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
+	paciasp
+#endif
 	__frame		st, \regcount, \extra
 	.endm
 
@@ -676,6 +679,9 @@ alternative_endif
 	 */
 	.macro		frame_pop
 	__frame		ld
+#ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
+	autiasp
+#endif
 	.endm
 
 	.macro		__frame_regs, reg1, reg2, op, num
-- 
2.30.2


* [RFC PATCH 2/9] arm64: cache: use ALIAS version of linkage macros for local aliases
  2021-10-13 15:22 ` Ard Biesheuvel
@ 2021-10-13 15:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 32+ messages in thread
From: Ard Biesheuvel @ 2021-10-13 15:22 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-hardening, mark.rutland, catalin.marinas, will, Ard Biesheuvel

Upcoming changes to the linkage macros will no longer tolerate duplicate
start and end symbols for functions unless they are annotated as
aliases. This is needed to avoid emitting mismatched .cfi start/end
directives.

So update a couple of occurrences in cache.S where a local alias is
incorrectly declared as a proper local symbol.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/mm/cache.S | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index 5051b3c1a4f1..681a89921992 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -140,7 +140,7 @@ SYM_FUNC_END(dcache_clean_pou)
  *	- start   - kernel start address of region
  *	- end     - kernel end address of region
  */
-SYM_FUNC_START_LOCAL(__dma_inv_area)
+SYM_FUNC_START_LOCAL_ALIAS(__dma_inv_area)
 SYM_FUNC_START_PI(dcache_inval_poc)
 	/* FALLTHROUGH */
 
@@ -167,7 +167,7 @@ SYM_FUNC_START_PI(dcache_inval_poc)
 	dsb	sy
 	ret
 SYM_FUNC_END_PI(dcache_inval_poc)
-SYM_FUNC_END(__dma_inv_area)
+SYM_FUNC_END_ALIAS(__dma_inv_area)
 
 /*
  *	dcache_clean_poc(start, end)
@@ -178,7 +178,7 @@ SYM_FUNC_END(__dma_inv_area)
  *	- start   - virtual start address of region
  *	- end     - virtual end address of region
  */
-SYM_FUNC_START_LOCAL(__dma_clean_area)
+SYM_FUNC_START_LOCAL_ALIAS(__dma_clean_area)
 SYM_FUNC_START_PI(dcache_clean_poc)
 	/* FALLTHROUGH */
 
@@ -190,7 +190,7 @@ SYM_FUNC_START_PI(dcache_clean_poc)
 	dcache_by_line_op cvac, sy, x0, x1, x2, x3
 	ret
 SYM_FUNC_END_PI(dcache_clean_poc)
-SYM_FUNC_END(__dma_clean_area)
+SYM_FUNC_END_ALIAS(__dma_clean_area)
 
 /*
  *	dcache_clean_pop(start, end)
-- 
2.30.2


* [RFC PATCH 3/9] arm64: crypto: avoid overlapping linkage definitions for AES-CBC
  2021-10-13 15:22 ` Ard Biesheuvel
@ 2021-10-13 15:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 32+ messages in thread
From: Ard Biesheuvel @ 2021-10-13 15:22 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-hardening, mark.rutland, catalin.marinas, will, Ard Biesheuvel

The aes_essiv_cbc_[en|de]crypt routines perform a single AES block
encryption of the IV before tail calling into the ordinary AES-CBC
routines to perform the actual data en/decryption. In the asm code, the
symbol definitions currently overlap, which is unnecessary, and becomes
problematic once we enable generation of CFI unwind metadata. So
instead, move the end marker of the ESSIV versions right after the
respective tail calls.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/crypto/aes-modes.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index b495de22bb38..50427301b4d8 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -134,6 +134,7 @@ AES_FUNC_START(aes_essiv_cbc_encrypt)
 	encrypt_block	v4, w8, x6, x7, w9
 	enc_switch_key	w3, x2, x6
 	b		.Lcbcencloop4x
+AES_FUNC_END(aes_essiv_cbc_encrypt)
 
 AES_FUNC_START(aes_cbc_encrypt)
 	ld1		{v4.16b}, [x5]			/* get iv */
@@ -168,7 +169,6 @@ AES_FUNC_START(aes_cbc_encrypt)
 	st1		{v4.16b}, [x5]			/* return iv */
 	ret
 AES_FUNC_END(aes_cbc_encrypt)
-AES_FUNC_END(aes_essiv_cbc_encrypt)
 
 AES_FUNC_START(aes_essiv_cbc_decrypt)
 	stp		x29, x30, [sp, #-16]!
@@ -180,6 +180,7 @@ AES_FUNC_START(aes_essiv_cbc_decrypt)
 	enc_prepare	w8, x6, x7
 	encrypt_block	cbciv, w8, x6, x7, w9
 	b		.Lessivcbcdecstart
+AES_FUNC_END(aes_essiv_cbc_decrypt)
 
 AES_FUNC_START(aes_cbc_decrypt)
 	stp		x29, x30, [sp, #-16]!
@@ -239,7 +240,6 @@ ST5(	st1		{v4.16b}, [x0], #16		)
 	ldp		x29, x30, [sp], #16
 	ret
 AES_FUNC_END(aes_cbc_decrypt)
-AES_FUNC_END(aes_essiv_cbc_decrypt)
 
 
 	/*
-- 
2.30.2


* [RFC PATCH 4/9] arm64: aes-neonbs: move frame pop to end of function
  2021-10-13 15:22 ` Ard Biesheuvel
@ 2021-10-13 15:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 32+ messages in thread
From: Ard Biesheuvel @ 2021-10-13 15:22 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-hardening, mark.rutland, catalin.marinas, will, Ard Biesheuvel

The AES-CTR routine handles inputs of fewer than 8 blocks in code that
is emitted out of line, after the frame pop. Describing this in CFI
unwind metadata would involve preserving/restoring the virtual register
set to convey that the state in that out-of-line code equals the state
before the frame pop. To keep the annotations simple, let's just move
the frame pop to the end of the function.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/crypto/aes-neonbs-core.S | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/crypto/aes-neonbs-core.S b/arch/arm64/crypto/aes-neonbs-core.S
index a3405b8c344b..7104b54448dc 100644
--- a/arch/arm64/crypto/aes-neonbs-core.S
+++ b/arch/arm64/crypto/aes-neonbs-core.S
@@ -966,10 +966,6 @@ CPU_LE(	rev		x8, x8		)
 
 	b		99b
 
-.Lctr_done:
-	frame_pop
-	ret
-
 	/*
 	 * If we are handling the tail of the input (x6 != NULL), return the
 	 * final keystream block back to the caller.
@@ -998,4 +994,8 @@ CPU_LE(	rev		x8, x8		)
 7:	cbz		x25, 8b
 	st1		{v5.16b}, [x25]
 	b		8b
+
+.Lctr_done:
+	frame_pop
+	ret
 SYM_FUNC_END(aesbs_ctr_encrypt)
-- 
2.30.2


* [RFC PATCH 5/9] arm64: chacha-neon: move frame pop forward
  2021-10-13 15:22 ` Ard Biesheuvel
@ 2021-10-13 15:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 32+ messages in thread
From: Ard Biesheuvel @ 2021-10-13 15:22 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-hardening, mark.rutland, catalin.marinas, will, Ard Biesheuvel

Instead of branching back to the common exit point of the routine to pop
the stack frame and return to the caller, move the frame pop to right
after the point where we last use the callee-saved registers. This
simplifies the generation of CFI unwind metadata, and reduces the number
of needed branches.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/crypto/chacha-neon-core.S | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/crypto/chacha-neon-core.S b/arch/arm64/crypto/chacha-neon-core.S
index b70ac76f2610..918c0beae019 100644
--- a/arch/arm64/crypto/chacha-neon-core.S
+++ b/arch/arm64/crypto/chacha-neon-core.S
@@ -691,6 +691,8 @@ CPU_BE(	  rev		a15, a15	)
 	zip2		v15.2d, v29.2d, v31.2d
 	  stp		a14, a15, [x1, #-8]
 
+	frame_pop
+
 	tbnz		x5, #63, .Lt128
 	ld1		{v28.16b-v31.16b}, [x2]
 
@@ -726,7 +728,6 @@ CPU_BE(	  rev		a15, a15	)
 	st1		{v24.16b-v27.16b}, [x1], #64
 	st1		{v28.16b-v31.16b}, [x1]
 
-.Lout:	frame_pop
 	ret
 
 	// fewer than 192 bytes of in/output
@@ -744,7 +745,7 @@ CPU_BE(	  rev		a15, a15	)
 	eor		v23.16b, v23.16b, v31.16b
 	st1		{v20.16b-v23.16b}, [x5]		// overlapping stores
 1:	st1		{v16.16b-v19.16b}, [x1]
-	b		.Lout
+	ret
 
 	// fewer than 128 bytes of in/output
 .Lt128:	ld1		{v28.16b-v31.16b}, [x10]
@@ -772,7 +773,7 @@ CPU_BE(	  rev		a15, a15	)
 	eor		v31.16b, v31.16b, v3.16b
 	st1		{v28.16b-v31.16b}, [x6]		// overlapping stores
 2:	st1		{v20.16b-v23.16b}, [x1]
-	b		.Lout
+	ret
 
 	// fewer than 320 bytes of in/output
 .Lt320:	cbz		x7, 3f				// exactly 256 bytes?
@@ -789,7 +790,7 @@ CPU_BE(	  rev		a15, a15	)
 	eor		v31.16b, v31.16b, v3.16b
 	st1		{v28.16b-v31.16b}, [x7]		// overlapping stores
 3:	st1		{v24.16b-v27.16b}, [x1]
-	b		.Lout
+	ret
 SYM_FUNC_END(chacha_4block_xor_neon)
 
 	.section	".rodata", "a", %progbits
-- 
2.30.2


* [RFC PATCH 6/9] arm64: smccc: create proper stack frames for HVC/SMC calls
  2021-10-13 15:22 ` Ard Biesheuvel
@ 2021-10-13 15:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 32+ messages in thread
From: Ard Biesheuvel @ 2021-10-13 15:22 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-hardening, mark.rutland, catalin.marinas, will, Ard Biesheuvel

Create proper stack frames using the provided macros for HVC/SMC calling
helpers that use the stack. This adds the PAC return address signing
when enabled, and ensures that the unwinder can deal with occurrences
of these routines appearing on the call stack.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/smccc-call.S | 40 +++++++++-----------
 1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/kernel/smccc-call.S b/arch/arm64/kernel/smccc-call.S
index 487381164ff6..b1864880159a 100644
--- a/arch/arm64/kernel/smccc-call.S
+++ b/arch/arm64/kernel/smccc-call.S
@@ -32,8 +32,7 @@ SYM_FUNC_END(__arm_smccc_sve_check)
 EXPORT_SYMBOL(__arm_smccc_sve_check)
 
 	.macro SMCCC instr
-	stp     x29, x30, [sp, #-16]!
-	mov	x29, sp
+	frame_push 0
 alternative_if ARM64_SVE
 	bl	__arm_smccc_sve_check
 alternative_else_nop_endif
@@ -47,7 +46,7 @@ alternative_else_nop_endif
 	cmp	x9, #ARM_SMCCC_QUIRK_QCOM_A6
 	b.ne	1f
 	str	x6, [x4, ARM_SMCCC_QUIRK_STATE_OFFS]
-1:	ldp     x29, x30, [sp], #16
+1:	frame_pop
 	ret
 	.endm
 
@@ -74,11 +73,10 @@ SYM_FUNC_END(__arm_smccc_hvc)
 EXPORT_SYMBOL(__arm_smccc_hvc)
 
 	.macro SMCCC_1_2 instr
-	/* Save `res` and free a GPR that won't be clobbered */
-	stp     x1, x19, [sp, #-16]!
+	frame_push 2
 
-	/* Ensure `args` won't be clobbered while loading regs in next step */
-	mov	x19, x0
+	mov	x19, x0		// preserve args
+	mov	x20, x1		// preserve res
 
 	/* Load the registers x0 - x17 from the struct arm_smccc_1_2_regs */
 	ldp	x0, x1, [x19, #ARM_SMCCC_1_2_REGS_X0_OFFS]
@@ -93,24 +91,20 @@ EXPORT_SYMBOL(__arm_smccc_hvc)
 
 	\instr #0
 
-	/* Load the `res` from the stack */
-	ldr	x19, [sp]
-
 	/* Store the registers x0 - x17 into the result structure */
-	stp	x0, x1, [x19, #ARM_SMCCC_1_2_REGS_X0_OFFS]
-	stp	x2, x3, [x19, #ARM_SMCCC_1_2_REGS_X2_OFFS]
-	stp	x4, x5, [x19, #ARM_SMCCC_1_2_REGS_X4_OFFS]
-	stp	x6, x7, [x19, #ARM_SMCCC_1_2_REGS_X6_OFFS]
-	stp	x8, x9, [x19, #ARM_SMCCC_1_2_REGS_X8_OFFS]
-	stp	x10, x11, [x19, #ARM_SMCCC_1_2_REGS_X10_OFFS]
-	stp	x12, x13, [x19, #ARM_SMCCC_1_2_REGS_X12_OFFS]
-	stp	x14, x15, [x19, #ARM_SMCCC_1_2_REGS_X14_OFFS]
-	stp	x16, x17, [x19, #ARM_SMCCC_1_2_REGS_X16_OFFS]
-
-	/* Restore original x19 */
-	ldp     xzr, x19, [sp], #16
+	stp	x0, x1, [x20, #ARM_SMCCC_1_2_REGS_X0_OFFS]
+	stp	x2, x3, [x20, #ARM_SMCCC_1_2_REGS_X2_OFFS]
+	stp	x4, x5, [x20, #ARM_SMCCC_1_2_REGS_X4_OFFS]
+	stp	x6, x7, [x20, #ARM_SMCCC_1_2_REGS_X6_OFFS]
+	stp	x8, x9, [x20, #ARM_SMCCC_1_2_REGS_X8_OFFS]
+	stp	x10, x11, [x20, #ARM_SMCCC_1_2_REGS_X10_OFFS]
+	stp	x12, x13, [x20, #ARM_SMCCC_1_2_REGS_X12_OFFS]
+	stp	x14, x15, [x20, #ARM_SMCCC_1_2_REGS_X14_OFFS]
+	stp	x16, x17, [x20, #ARM_SMCCC_1_2_REGS_X16_OFFS]
+
+	frame_pop
 	ret
-.endm
+	.endm
 
 /*
  * void arm_smccc_1_2_hvc(const struct arm_smccc_1_2_regs *args,
-- 
2.30.2


* [RFC PATCH 7/9] arm64: assembler: add unwind annotations to frame push/pop macros
  2021-10-13 15:22 ` Ard Biesheuvel
@ 2021-10-13 15:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 32+ messages in thread
From: Ard Biesheuvel @ 2021-10-13 15:22 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-hardening, mark.rutland, catalin.marinas, will, Ard Biesheuvel

To ensure that we can unwind from hand-rolled assembly routines,
decorate the frame push/pop helper macros that are used by non-leaf
assembler routines with the appropriate CFI annotations.
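
As a rough illustration of the bookkeeping behind these annotations, the
C sketch below mirrors the arithmetic the macros use to assign
CFA-relative save slots. The struct and function names are hypothetical
and exist only for this example; the formulas are taken from the .set
expressions in the diff below.

```c
#include <assert.h>

/* Hypothetical type for illustration only; not part of the patch. */
struct frame_layout {
	int cfa_offset;		/* distance from the new sp up to the CFA */
	int x29_slot;		/* CFA-relative save slot of x29 */
	int x30_slot;		/* CFA-relative save slot of x30 */
	int first_reg_slot;	/* CFA-relative slot of the first extra reg (x19) */
};

/*
 * Mirror of the macro arithmetic:
 *   .Lframe_local_offset = ((regcount + 3) / 2) * 16
 *   .Lframe_cfa_offset   = .Lframe_local_offset + extra
 * x29/x30 are saved at the bottom of the frame, and the running offset is
 * then reduced by 16 before the callee-saved registers are annotated.
 */
static struct frame_layout frame_layout(int regcount, int extra)
{
	struct frame_layout l;
	int local_offset;

	assert(regcount >= 0 && extra >= 0);

	local_offset = ((regcount + 3) / 2) * 16;
	l.cfa_offset = local_offset + extra;
	l.x29_slot = -l.cfa_offset;
	l.x30_slot = -l.cfa_offset + 8;
	l.first_reg_slot = -(l.cfa_offset - 16);
	return l;
}
```

For example, frame_layout(2, 0) corresponds to a "frame_push 2": the CFA
sits 32 bytes above sp, x29/x30 are recorded at CFA-32 and CFA-24, and
x19 at CFA-16.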

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/assembler.h | 26 +++++++++++++++++++-
 arch/arm64/include/asm/linkage.h   | 16 +++++++++++-
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index ceed84ac4005..cebb6c8c489b 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -664,9 +664,10 @@ alternative_endif
 	 *              the new value of sp. Add @extra bytes of stack space
 	 *              for locals.
 	 */
-	.macro		frame_push, regcount:req, extra
+	.macro		frame_push, regcount:req, extra=0
 #ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
 	paciasp
+	.cfi_negate_ra_state
 #endif
 	__frame		st, \regcount, \extra
 	.endm
@@ -681,14 +682,29 @@ alternative_endif
 	__frame		ld
 #ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
 	autiasp
+	.cfi_negate_ra_state
 #endif
 	.endm
 
 	.macro		__frame_regs, reg1, reg2, op, num
 	.if		.Lframe_regcount == \num
 	\op\()r		\reg1, [sp, #(\num + 1) * 8]
+	.ifc		\op, st
+	.cfi_offset	\reg1, -.Lframe_cfa_offset
+	.set		.Lframe_cfa_offset, .Lframe_cfa_offset - 8
+	.else
+	.cfi_restore	\reg1
+	.endif
 	.elseif		.Lframe_regcount > \num
 	\op\()p		\reg1, \reg2, [sp, #(\num + 1) * 8]
+	.ifc		\op, st
+	.cfi_offset	\reg1, -.Lframe_cfa_offset
+	.cfi_offset	\reg2, -.Lframe_cfa_offset + 8
+	.set		.Lframe_cfa_offset, .Lframe_cfa_offset - 16
+	.else
+	.cfi_restore	\reg1
+	.cfi_restore	\reg2
+	.endif
 	.endif
 	.endm
 
@@ -708,7 +724,12 @@ alternative_endif
 	.set		.Lframe_regcount, \regcount
 	.set		.Lframe_extra, \extra
 	.set		.Lframe_local_offset, ((\regcount + 3) / 2) * 16
+	.set		.Lframe_cfa_offset, .Lframe_local_offset + .Lframe_extra
 	stp		x29, x30, [sp, #-.Lframe_local_offset - .Lframe_extra]!
+	.cfi_def_cfa_offset .Lframe_cfa_offset
+	.cfi_offset	x29, -.Lframe_cfa_offset
+	.cfi_offset	x30, -.Lframe_cfa_offset + 8
+	.set		.Lframe_cfa_offset, .Lframe_cfa_offset - 16
 	mov		x29, sp
 	.endif
 
@@ -723,6 +744,9 @@ alternative_endif
 	.error		"frame_push/frame_pop may not be nested"
 	.endif
 	ldp		x29, x30, [sp], #.Lframe_local_offset + .Lframe_extra
+	.cfi_restore	x29
+	.cfi_restore	x30
+	.cfi_def_cfa_offset 0
 	.set		.Lframe_regcount, -1
 	.endif
 	.endm
diff --git a/arch/arm64/include/asm/linkage.h b/arch/arm64/include/asm/linkage.h
index 9906541a6861..d984a6750b01 100644
--- a/arch/arm64/include/asm/linkage.h
+++ b/arch/arm64/include/asm/linkage.h
@@ -4,6 +4,9 @@
 #define __ALIGN		.align 2
 #define __ALIGN_STR	".align 2"
 
+#define SYM_FUNC_CFI_START	.cfi_startproc ;
+#define SYM_FUNC_CFI_END	.cfi_endproc ;
+
 #if defined(CONFIG_ARM64_BTI_KERNEL) && defined(__aarch64__)
 
 /*
@@ -12,6 +15,9 @@
  * instead.
  */
 #define BTI_C hint 34 ;
+#else
+#define BTI_C
+#endif
 
 /*
  * When using in-kernel BTI we need to ensure that PCS-conformant assembly
@@ -20,29 +26,37 @@
  */
 #define SYM_FUNC_START(name)				\
 	SYM_START(name, SYM_L_GLOBAL, SYM_A_ALIGN)	\
+	SYM_FUNC_CFI_START				\
 	BTI_C
 
 #define SYM_FUNC_START_NOALIGN(name)			\
 	SYM_START(name, SYM_L_GLOBAL, SYM_A_NONE)	\
+	SYM_FUNC_CFI_START				\
 	BTI_C
 
 #define SYM_FUNC_START_LOCAL(name)			\
 	SYM_START(name, SYM_L_LOCAL, SYM_A_ALIGN)	\
+	SYM_FUNC_CFI_START				\
 	BTI_C
 
 #define SYM_FUNC_START_LOCAL_NOALIGN(name)		\
 	SYM_START(name, SYM_L_LOCAL, SYM_A_NONE)	\
+	SYM_FUNC_CFI_START				\
 	BTI_C
 
 #define SYM_FUNC_START_WEAK(name)			\
 	SYM_START(name, SYM_L_WEAK, SYM_A_ALIGN)	\
+	SYM_FUNC_CFI_START				\
 	BTI_C
 
 #define SYM_FUNC_START_WEAK_NOALIGN(name)		\
 	SYM_START(name, SYM_L_WEAK, SYM_A_NONE)		\
+	SYM_FUNC_CFI_START				\
 	BTI_C
 
-#endif
+#define SYM_FUNC_END(name)				\
+	SYM_FUNC_CFI_END				\
+	SYM_END(name, SYM_T_FUNC)
 
 /*
  * Annotate a function as position independent, i.e., safe to be called before
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 8/9] arm64: unwind: add asynchronous unwind tables to the kernel proper
  2021-10-13 15:22 ` Ard Biesheuvel
@ 2021-10-13 15:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 32+ messages in thread
From: Ard Biesheuvel @ 2021-10-13 15:22 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-hardening, mark.rutland, catalin.marinas, will, Ard Biesheuvel

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/Kconfig                    |  3 +++
 arch/arm64/Makefile                   |  7 ++++++-
 arch/arm64/kernel/vmlinux.lds.S       | 20 ++++++++++++++++++++
 drivers/firmware/efi/libstub/Makefile |  1 +
 4 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 077f2ec4eeb2..742baca09343 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -350,6 +350,9 @@ config KASAN_SHADOW_OFFSET
 	default 0xeffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
 	default 0xffffffffffffffff
 
+config UNWIND_TABLES
+	bool
+
 source "arch/arm64/Kconfig.platforms"
 
 menu "Kernel Features"
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index c744b1e7b356..95ffc4deebb0 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -10,7 +10,7 @@
 #
 # Copyright (C) 1995-2001 by Russell King
 
-LDFLAGS_vmlinux	:=--no-undefined -X
+LDFLAGS_vmlinux	:=--no-undefined -X --eh-frame-hdr
 
 ifeq ($(CONFIG_RELOCATABLE), y)
 # Pass --no-apply-dynamic-relocs to restore pre-binutils-2.27 behaviour
@@ -45,8 +45,13 @@ KBUILD_CFLAGS	+= $(call cc-option,-mabi=lp64)
 KBUILD_AFLAGS	+= $(call cc-option,-mabi=lp64)
 
 # Avoid generating .eh_frame* sections.
+ifneq ($(CONFIG_UNWIND_TABLES),y)
 KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables -fno-unwind-tables
 KBUILD_AFLAGS	+= -fno-asynchronous-unwind-tables -fno-unwind-tables
+else
+KBUILD_CFLAGS	+= -fasynchronous-unwind-tables
+KBUILD_AFLAGS	+= -fasynchronous-unwind-tables
+endif
 
 ifeq ($(CONFIG_STACKPROTECTOR_PER_TASK),y)
 prepare: stack_protector_prepare
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index f6b1a88245db..ed3db80bf696 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -111,6 +111,21 @@ jiffies = jiffies_64;
 #define TRAMP_TEXT
 #endif
 
+#ifdef CONFIG_UNWIND_TABLES
+#define UNWIND_DATA_SECTIONS				\
+	.eh_frame_hdr : {				\
+		__eh_frame_hdr = .;			\
+		*(.eh_frame_hdr)			\
+	}						\
+	.eh_frame : {					\
+		__eh_frame_start = .;			\
+		*(.eh_frame)				\
+		__eh_frame_end = .;			\
+	}
+#else
+#define UNWIND_DATA_SECTIONS
+#endif
+
 /*
  * The size of the PE/COFF section that covers the kernel image, which
  * runs from _stext to _edata, must be a round multiple of the PE/COFF
@@ -139,6 +154,9 @@ SECTIONS
 	/DISCARD/ : {
 		*(.interp .dynamic)
 		*(.dynsym .dynstr .hash .gnu.hash)
+#ifndef CONFIG_UNWIND_TABLES
+		*(.eh_frame_hdr .eh_frame)
+#endif
 	}
 
 	. = KIMAGE_VADDR;
@@ -217,6 +235,8 @@ SECTIONS
 		__alt_instructions_end = .;
 	}
 
+	UNWIND_DATA_SECTIONS
+
 	. = ALIGN(SEGMENT_ALIGN);
 	__inittext_end = .;
 	__initdata_begin = .;
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index d0537573501e..78c46638707a 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -20,6 +20,7 @@ cflags-$(CONFIG_X86)		+= -m$(BITS) -D__KERNEL__ \
 # disable the stackleak plugin
 cflags-$(CONFIG_ARM64)		:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
 				   -fpie $(DISABLE_STACKLEAK_PLUGIN) \
+				   -fno-unwind-tables -fno-asynchronous-unwind-tables \
 				   $(call cc-option,-mbranch-protection=none)
 cflags-$(CONFIG_ARM)		:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
 				   -fno-builtin -fpic \
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 9/9] arm64: implement dynamic shadow call stack for GCC
  2021-10-13 15:22 ` Ard Biesheuvel
@ 2021-10-13 15:22   ` Ard Biesheuvel
  -1 siblings, 0 replies; 32+ messages in thread
From: Ard Biesheuvel @ 2021-10-13 15:22 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-hardening, mark.rutland, catalin.marinas, will, Ard Biesheuvel

Implement dynamic shadow call stack support for GCC by parsing the
unwind tables at init time to locate all occurrences of PACIASP/AUTIASP,
and replacing them with the shadow call stack push and pop instructions,
respectively.
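
To make the table walk more concrete, here is a minimal, stand-alone
sketch of how a CFA program can be scanned for DW_CFA_negate_ra_state
while tracking the code location. It is not the code from this patch:
find_negate_ra() and skip_uleb128() are hypothetical names, only a
handful of opcodes are handled, and a code alignment factor of 4 is
assumed (as the patch does).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DW_CFA_nop		0x00
#define DW_CFA_advance_loc1	0x02
#define DW_CFA_def_cfa		0x0c
#define DW_CFA_def_cfa_register	0x0d
#define DW_CFA_def_cfa_offset	0x0e
#define DW_CFA_negate_ra_state	0x2d

/* Skip one ULEB128 operand and return the position just past it. */
static size_t skip_uleb128(const uint8_t *ops, size_t pos, size_t len)
{
	while (pos < len && (ops[pos++] & 0x80))
		;
	return pos;
}

/*
 * Walk a CFA program that starts describing code at address 'loc'. For
 * each DW_CFA_negate_ra_state, record the address of the instruction just
 * before the current location (loc - 4), which is where the PAC
 * instruction is expected. Returns the number of recorded addresses, or
 * -1 on an opcode this sketch does not handle or if 'hits' is too small.
 */
static int find_negate_ra(const uint8_t *ops, size_t len, uint64_t loc,
			  uint64_t *hits, size_t max_hits)
{
	size_t pos = 0;
	size_t n = 0;

	assert(ops && hits);

	while (pos < len) {
		uint8_t op = ops[pos++];

		switch (op & 0xc0) {
		case 0x40:	/* DW_CFA_advance_loc: delta in low 6 bits */
			loc += (uint64_t)(op & 0x3f) * 4;
			continue;
		case 0x80:	/* DW_CFA_offset: one ULEB128 operand */
			pos = skip_uleb128(ops, pos, len);
			continue;
		case 0xc0:	/* DW_CFA_restore: no operand */
			continue;
		}

		switch (op) {
		case DW_CFA_nop:
			break;
		case DW_CFA_advance_loc1:
			if (pos >= len)
				return -1;
			loc += (uint64_t)ops[pos++] * 4;
			break;
		case DW_CFA_def_cfa:	/* register and offset operands */
			pos = skip_uleb128(ops, pos, len);
			pos = skip_uleb128(ops, pos, len);
			break;
		case DW_CFA_def_cfa_register:
		case DW_CFA_def_cfa_offset:
			pos = skip_uleb128(ops, pos, len);
			break;
		case DW_CFA_negate_ra_state:
			if (n == max_hits)
				return -1;
			hits[n++] = loc - 4;
			break;
		default:
			return -1;
		}
	}
	return (int)n;
}
```

The caller would then inspect the instruction at each recorded address
and only patch it if it really is a PACIASP or AUTIASP.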

This is useful because the overhead of the shadow call stack is
difficult to justify on hardware that implements pointer authentication
(PAC), and given that the PAC instructions are executed as NOPs on
hardware that doesn't, we can just replace them.
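
The replacement itself is a one-to-one opcode swap. The sketch below
shows it on a plain buffer of instruction words, using the encodings
that appear in patch-scs.c further down; scs_rewrite_insn() and
scs_rewrite_buf() are hypothetical helpers for illustration, and the
buffer is assumed to already be in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INSN_PACIASP	0xd503233fu
#define INSN_AUTIASP	0xd50323bfu
#define INSN_SCS_PUSH	0xf800865eu	/* str x30, [x18], #8 */
#define INSN_SCS_POP	0xf85f8e5eu	/* ldr x30, [x18, #-8]! */

/* Return the shadow call stack equivalent of a PAC instruction. */
static uint32_t scs_rewrite_insn(uint32_t insn)
{
	switch (insn) {
	case INSN_PACIASP:
		return INSN_SCS_PUSH;
	case INSN_AUTIASP:
		return INSN_SCS_POP;
	default:
		return insn;	/* leave everything else untouched */
	}
}

/* Rewrite a buffer in place; returns how many words were changed. */
static size_t scs_rewrite_buf(uint32_t *insns, size_t count)
{
	size_t i, changed = 0;

	assert(insns || count == 0);

	for (i = 0; i < count; i++) {
		uint32_t new_insn = scs_rewrite_insn(insns[i]);

		if (new_insn != insns[i]) {
			insns[i] = new_insn;
			changed++;
		}
	}
	return changed;
}
```

Unlike this sketch, the patch does not scan the text blindly: it only
touches locations that the unwind data identifies, which avoids
misinterpreting data or literals that happen to match these encodings.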

This patch only implements this for the core kernel, but the logic can
be reused for modules without much trouble.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 Makefile                      |   4 +-
 arch/Kconfig                  |   4 +-
 arch/arm64/Kconfig            |   8 +-
 arch/arm64/kernel/Makefile    |   2 +
 arch/arm64/kernel/head.S      |   3 +
 arch/arm64/kernel/patch-scs.c | 223 ++++++++++++++++++++
 6 files changed, 239 insertions(+), 5 deletions(-)

diff --git a/Makefile b/Makefile
index 7cfe4ff36f44..2d94fed93d9d 100644
--- a/Makefile
+++ b/Makefile
@@ -933,8 +933,8 @@ LDFLAGS_vmlinux += --gc-sections
 endif
 
 ifdef CONFIG_SHADOW_CALL_STACK
-CC_FLAGS_SCS	:= -fsanitize=shadow-call-stack
-KBUILD_CFLAGS	+= $(CC_FLAGS_SCS)
+CC_FLAGS_SCS-$(CONFIG_CC_IS_CLANG)	:= -fsanitize=shadow-call-stack
+KBUILD_CFLAGS				+= $(CC_FLAGS_SCS-y)
 export CC_FLAGS_SCS
 endif
 
diff --git a/arch/Kconfig b/arch/Kconfig
index 8df1c7102643..21eeec66bf4c 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -596,8 +596,8 @@ config ARCH_SUPPORTS_SHADOW_CALL_STACK
 	  switching.
 
 config SHADOW_CALL_STACK
-	bool "Clang Shadow Call Stack"
-	depends on CC_IS_CLANG && ARCH_SUPPORTS_SHADOW_CALL_STACK
+	bool "Shadow Call Stack"
+	depends on ARCH_SUPPORTS_SHADOW_CALL_STACK
 	depends on DYNAMIC_FTRACE_WITH_REGS || !FUNCTION_GRAPH_TRACER
 	help
 	  This option enables Clang's Shadow Call Stack, which uses a
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 742baca09343..6d74822fd386 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -81,7 +81,7 @@ config ARM64
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC
 	select ARCH_SUPPORTS_HUGETLBFS
 	select ARCH_SUPPORTS_MEMORY_FAILURE
-	select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
+	select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK || CC_IS_GCC
 	select ARCH_SUPPORTS_LTO_CLANG if CPU_LITTLE_ENDIAN
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
 	select ARCH_SUPPORTS_CFI_CLANG
@@ -353,6 +353,12 @@ config KASAN_SHADOW_OFFSET
 config UNWIND_TABLES
 	bool
 
+config UNWIND_PATCH_PAC_INTO_SCS
+	def_bool y
+	depends on CC_IS_GCC && SHADOW_CALL_STACK
+	select UNWIND_TABLES
+	select ARM64_PTR_AUTH_KERNEL
+
 source "arch/arm64/Kconfig.platforms"
 
 menu "Kernel Features"
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 3f1490bfb938..42b9bd92d51e 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -73,6 +73,8 @@ obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
 obj-$(CONFIG_ARM64_MTE)			+= mte.o
 obj-y					+= vdso-wrap.o
 obj-$(CONFIG_COMPAT_VDSO)		+= vdso32-wrap.o
+obj-$(CONFIG_UNWIND_PATCH_PAC_INTO_SCS)	+= patch-scs.o
+CFLAGS_patch-scs.o			+= -mbranch-protection=none
 
 obj-y					+= probes/
 head-y					:= head.o
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 17962452e31d..5d50d212d3ae 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -447,6 +447,9 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 	bl	__pi_memset
 	dsb	ishst				// Make zero page visible to PTW
 
+#ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
+	bl	scs_patch_vmlinux
+#endif
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 	bl	kasan_early_init
 #endif
diff --git a/arch/arm64/kernel/patch-scs.c b/arch/arm64/kernel/patch-scs.c
new file mode 100644
index 000000000000..878a40060550
--- /dev/null
+++ b/arch/arm64/kernel/patch-scs.c
@@ -0,0 +1,223 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021 - Google LLC
+ * Author: Ard Biesheuvel <ardb@google.com>
+ */
+
+#include <linux/bug.h>
+#include <linux/init.h>
+#include <linux/linkage.h>
+#include <linux/printk.h>
+#include <linux/types.h>
+
+#define DW_CFA_nop                          0x00
+#define DW_CFA_set_loc                      0x01
+#define DW_CFA_advance_loc1                 0x02
+#define DW_CFA_advance_loc2                 0x03
+#define DW_CFA_advance_loc4                 0x04
+#define DW_CFA_offset_extended              0x05
+#define DW_CFA_restore_extended             0x06
+#define DW_CFA_undefined                    0x07
+#define DW_CFA_same_value                   0x08
+#define DW_CFA_register                     0x09
+#define DW_CFA_remember_state               0x0a
+#define DW_CFA_restore_state                0x0b
+#define DW_CFA_def_cfa                      0x0c
+#define DW_CFA_def_cfa_register             0x0d
+#define DW_CFA_def_cfa_offset               0x0e
+#define DW_CFA_def_cfa_expression           0x0f
+#define DW_CFA_expression                   0x10
+#define DW_CFA_offset_extended_sf           0x11
+#define DW_CFA_def_cfa_sf                   0x12
+#define DW_CFA_def_cfa_offset_sf            0x13
+#define DW_CFA_val_offset                   0x14
+#define DW_CFA_val_offset_sf                0x15
+#define DW_CFA_val_expression               0x16
+#define DW_CFA_lo_user                      0x1c
+#define DW_CFA_negate_ra_state              0x2d
+#define DW_CFA_GNU_args_size                0x2e
+#define DW_CFA_GNU_negative_offset_extended 0x2f
+#define DW_CFA_hi_user                      0x3f
+
+static unsigned long get_uleb128(const u8 **pcur, const u8 *end)
+{
+	const u8 *cur = *pcur;
+	unsigned long value;
+	unsigned int shift;
+
+	for (shift = 0, value = 0; cur < end; shift += 7) {
+		if (shift + 7 > 8 * sizeof(value)
+		    && (*cur & 0x7fU) >= (1U << (8 * sizeof(value) - shift))) {
+			cur = end + 1;
+			break;
+		}
+		value |= (unsigned long) (*cur & 0x7f) << shift;
+		if (!(*cur++ & 0x80))
+			break;
+	}
+	*pcur = cur;
+
+	return value;
+}
+
+extern const u8 __eh_frame_start[], __eh_frame_end[];
+
+struct fde_frame {
+	s32		initial_loc;
+	s32		range;
+};
+
+static int scs_patch_loc(u64 loc)
+{
+	u32 insn = le32_to_cpup((void *)loc);
+
+	/*
+	 * Sometimes, the unwind data appears to be out of sync, and associates
+	 * the DW_CFA_negate_ra_state directive with the ret instruction
+	 * following the autiasp, rather than the autiasp itself.
+	 */
+	if (insn == 0xd65f03c0) { // ret
+		loc -= 4;
+		insn = le32_to_cpup((void *)loc);
+	}
+
+	switch (insn) {
+	case 0xd503233f: // paciasp
+		*(u32 *)loc = cpu_to_le32(0xf800865e);
+		break;
+	case 0xd50323bf: // autiasp
+		*(u32 *)loc = cpu_to_le32(0xf85f8e5e);
+		break;
+	default:
+		// ignore
+		break;
+	}
+	return 0;
+}
+
+static int noinstr scs_handle_frame(const u8 eh_frame[], u32 size)
+{
+	const struct fde_frame *fde;
+	const u8 *opcode;
+	u64 loc;
+
+	/*
+	 * For patching PAC opcodes, we only care about the FDE records, and
+	 * not the CIE, which carries the initial CFA directives but they only
+	 * pertain to which register is the stack pointer.
+	 * TODO this is not 100% true - we need the augmentation string and the
+	 * encoding but they are always the same in practice.
+	 */
+	if (*(u32 *)eh_frame == 0)
+		return 0;
+
+	fde = (const struct fde_frame *)(eh_frame + 4);
+	loc = (u64)offset_to_ptr(&fde->initial_loc);
+	opcode = (const u8 *)(fde + 1);
+
+	// TODO check augmentation data
+	WARN_ON(*opcode++);
+	size -= sizeof(u32) + sizeof(*fde) + 1;
+
+	/*
+	 * Starting from 'loc', apply the CFA opcodes that advance the location
+	 * pointer, and identify the locations of the PAC instructions.
+	 */
+	do {
+		const u8 *end;
+
+		switch (*opcode & 0xC0) {
+		case 0:
+			// handle DW_CFA_xxx opcodes
+			switch (*opcode) {
+				int ret;
+
+			case DW_CFA_nop:
+			case DW_CFA_remember_state:
+			case DW_CFA_restore_state:
+				break;
+
+			case DW_CFA_advance_loc1:
+				loc += 4 * *++opcode;
+				size--;
+				break;
+
+			case DW_CFA_advance_loc2:
+				loc += 4 * *++opcode;
+				loc += 4 * *++opcode << 8;
+				size -= 2;
+				break;
+
+			case DW_CFA_def_cfa:
+			case DW_CFA_def_cfa_offset:
+			case DW_CFA_def_cfa_register:
+				opcode++;
+				size--;
+				end = opcode + size;
+				get_uleb128(&opcode, end);
+				size = end - opcode;
+				continue;
+
+			case DW_CFA_negate_ra_state:
+				// patch paciasp/autiasp into shadow stack push/pop
+				ret = scs_patch_loc(loc - 4);
+				if (ret)
+					return ret;
+				break;
+
+			default:
+				pr_debug("unhandled opcode: %02x\n", *opcode);
+				return -ENOEXEC;
+			}
+			opcode++;
+			size--;
+			break;
+
+		case 0x40:
+			// advance loc
+			loc += (*opcode++ & 0x3f) * 4;
+			size--;
+			break;
+
+		case 0x80:
+			opcode++;
+			size--;
+			end = opcode + size;
+			get_uleb128(&opcode, end);
+			size = end - opcode;
+			continue;
+
+		default:
+			// ignore
+			opcode++;
+			size--;
+			break;
+		}
+	} while (size > 0);
+
+	return 0;
+}
+
+int noinstr scs_patch(const u8 eh_frame[], int size)
+{
+	const u8 *p = eh_frame;
+
+	while (size > 4) {
+		const u32 *frame_size = (const u32 *)p;
+		int ret;
+
+		if (*frame_size != -1 && *frame_size <= size) {
+			ret = scs_handle_frame(p + 4, *frame_size);
+			if (ret)
+				return ret;
+			p += 4 + *frame_size;
+			size -= 4 + *frame_size;
+		}
+	}
+	return 0;
+}
+
+asmlinkage int noinstr scs_patch_vmlinux(void)
+{
+	return scs_patch(__eh_frame_start, __eh_frame_end - __eh_frame_start);
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 9/9] arm64: implement dynamic shadow call stack for GCC
@ 2021-10-13 15:22   ` Ard Biesheuvel
  0 siblings, 0 replies; 32+ messages in thread
From: Ard Biesheuvel @ 2021-10-13 15:22 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-hardening, mark.rutland, catalin.marinas, will, Ard Biesheuvel

Implement support for the shadow call stack on GCC, and in a dynamic
manner, by parsing the unwind tables at init time to locate all
occurrences of PACIASP/AUTIASP, and replacing them with the shadow call
stack push and pop instructions, respectively.

This is useful because the overhead of the shadow call stack is
difficult to justify on hardware that implements pointer authentication
(PAC), and given that the PAC instructions are executed as NOPs on
hardware that doesn't, we can just replace them.

This patch only implements this for the core kernel, but the logic can
be reused for modules without much trouble.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 Makefile                      |   4 +-
 arch/Kconfig                  |   4 +-
 arch/arm64/Kconfig            |   8 +-
 arch/arm64/kernel/Makefile    |   2 +
 arch/arm64/kernel/head.S      |   3 +
 arch/arm64/kernel/patch-scs.c | 223 ++++++++++++++++++++
 6 files changed, 239 insertions(+), 5 deletions(-)

diff --git a/Makefile b/Makefile
index 7cfe4ff36f44..2d94fed93d9d 100644
--- a/Makefile
+++ b/Makefile
@@ -933,8 +933,8 @@ LDFLAGS_vmlinux += --gc-sections
 endif
 
 ifdef CONFIG_SHADOW_CALL_STACK
-CC_FLAGS_SCS	:= -fsanitize=shadow-call-stack
-KBUILD_CFLAGS	+= $(CC_FLAGS_SCS)
+CC_FLAGS_SCS-$(CONFIG_CC_IS_CLANG)	:= -fsanitize=shadow-call-stack
+KBUILD_CFLAGS				+= $(CC_FLAGS_SCS-y)
 export CC_FLAGS_SCS
 endif
 
diff --git a/arch/Kconfig b/arch/Kconfig
index 8df1c7102643..21eeec66bf4c 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -596,8 +596,8 @@ config ARCH_SUPPORTS_SHADOW_CALL_STACK
 	  switching.
 
 config SHADOW_CALL_STACK
-	bool "Clang Shadow Call Stack"
-	depends on CC_IS_CLANG && ARCH_SUPPORTS_SHADOW_CALL_STACK
+	bool "Shadow Call Stack"
+	depends on ARCH_SUPPORTS_SHADOW_CALL_STACK
 	depends on DYNAMIC_FTRACE_WITH_REGS || !FUNCTION_GRAPH_TRACER
 	help
 	  This option enables Clang's Shadow Call Stack, which uses a
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 742baca09343..6d74822fd386 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -81,7 +81,7 @@ config ARM64
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC
 	select ARCH_SUPPORTS_HUGETLBFS
 	select ARCH_SUPPORTS_MEMORY_FAILURE
-	select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
+	select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK || CC_IS_GCC
 	select ARCH_SUPPORTS_LTO_CLANG if CPU_LITTLE_ENDIAN
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
 	select ARCH_SUPPORTS_CFI_CLANG
@@ -353,6 +353,12 @@ config KASAN_SHADOW_OFFSET
 config UNWIND_TABLES
 	bool
 
+config UNWIND_PATCH_PAC_INTO_SCS
+	def_bool y
+	depends on CC_IS_GCC && SHADOW_CALL_STACK
+	select UNWIND_TABLES
+	select ARM64_PTR_AUTH_KERNEL
+
 source "arch/arm64/Kconfig.platforms"
 
 menu "Kernel Features"
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 3f1490bfb938..42b9bd92d51e 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -73,6 +73,8 @@ obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
 obj-$(CONFIG_ARM64_MTE)			+= mte.o
 obj-y					+= vdso-wrap.o
 obj-$(CONFIG_COMPAT_VDSO)		+= vdso32-wrap.o
+obj-$(CONFIG_UNWIND_PATCH_PAC_INTO_SCS)	+= patch-scs.o
+CFLAGS_patch-scs.o			+= -mbranch-protection=none
 
 obj-y					+= probes/
 head-y					:= head.o
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 17962452e31d..5d50d212d3ae 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -447,6 +447,9 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 	bl	__pi_memset
 	dsb	ishst				// Make zero page visible to PTW
 
+#ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
+	bl	scs_patch_vmlinux
+#endif
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 	bl	kasan_early_init
 #endif
diff --git a/arch/arm64/kernel/patch-scs.c b/arch/arm64/kernel/patch-scs.c
new file mode 100644
index 000000000000..878a40060550
--- /dev/null
+++ b/arch/arm64/kernel/patch-scs.c
@@ -0,0 +1,223 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021 - Google LLC
+ * Author: Ard Biesheuvel <ardb@google.com>
+ */
+
+#include <linux/bug.h>
+#include <linux/init.h>
+#include <linux/linkage.h>
+#include <linux/printk.h>
+#include <linux/types.h>
+
+#define DW_CFA_nop                          0x00
+#define DW_CFA_set_loc                      0x01
+#define DW_CFA_advance_loc1                 0x02
+#define DW_CFA_advance_loc2                 0x03
+#define DW_CFA_advance_loc4                 0x04
+#define DW_CFA_offset_extended              0x05
+#define DW_CFA_restore_extended             0x06
+#define DW_CFA_undefined                    0x07
+#define DW_CFA_same_value                   0x08
+#define DW_CFA_register                     0x09
+#define DW_CFA_remember_state               0x0a
+#define DW_CFA_restore_state                0x0b
+#define DW_CFA_def_cfa                      0x0c
+#define DW_CFA_def_cfa_register             0x0d
+#define DW_CFA_def_cfa_offset               0x0e
+#define DW_CFA_def_cfa_expression           0x0f
+#define DW_CFA_expression                   0x10
+#define DW_CFA_offset_extended_sf           0x11
+#define DW_CFA_def_cfa_sf                   0x12
+#define DW_CFA_def_cfa_offset_sf            0x13
+#define DW_CFA_val_offset                   0x14
+#define DW_CFA_val_offset_sf                0x15
+#define DW_CFA_val_expression               0x16
+#define DW_CFA_lo_user                      0x1c
+#define DW_CFA_negate_ra_state              0x2d
+#define DW_CFA_GNU_args_size                0x2e
+#define DW_CFA_GNU_negative_offset_extended 0x2f
+#define DW_CFA_hi_user                      0x3f
+
+static unsigned long get_uleb128(const u8 **pcur, const u8 *end)
+{
+	const u8 *cur = *pcur;
+	unsigned long value;
+	unsigned int shift;
+
+	for (shift = 0, value = 0; cur < end; shift += 7) {
+		if (shift + 7 > 8 * sizeof(value)
+		    && (*cur & 0x7fU) >= (1U << (8 * sizeof(value) - shift))) {
+			cur = end + 1;
+			break;
+		}
+		value |= (unsigned long) (*cur & 0x7f) << shift;
+		if (!(*cur++ & 0x80))
+			break;
+	}
+	*pcur = cur;
+
+	return value;
+}
+
+extern const u8 __eh_frame_start[], __eh_frame_end[];
+
+struct fde_frame {
+	s32		initial_loc;
+	s32		range;
+};
+
+static int scs_patch_loc(u64 loc)
+{
+	u32 insn = le32_to_cpup((void *)loc);
+
+	/*
+	 * Sometimes, the unwind data appears to be out of sync, and associates
+	 * the DW_CFA_negate_ra_state directive with the ret instruction
+	 * following the autiasp, rather than the autiasp itself.
+	 */
+	if (insn == 0xd65f03c0) { // ret
+		loc -= 4;
+		insn = le32_to_cpup((void *)loc);
+	}
+
+	switch (insn) {
+	case 0xd503233f: // paciasp
+		*(u32 *)loc = cpu_to_le32(0xf800865e);
+		break;
+	case 0xd50323bf: // autiasp
+		*(u32 *)loc = cpu_to_le32(0xf85f8e5e);
+		break;
+	default:
+		// ignore
+		break;
+	}
+	return 0;
+}
+
+static int noinstr scs_handle_frame(const u8 eh_frame[], u32 size)
+{
+	const struct fde_frame *fde;
+	const u8 *opcode;
+	u64 loc;
+
+	/*
+	 * For patching PAC opcodes, we only care about the FDE records, and
+	 * not the CIE. The CIE carries the initial CFA directives, but those
+	 * only describe which register is the stack pointer.
+	 * TODO this is not 100% true - we need the augmentation string and the
+	 * encoding but they are always the same in practice.
+	 */
+	if (*(u32 *)eh_frame == 0)
+		return 0;
+
+	fde = (const struct fde_frame *)(eh_frame + 4);
+	loc = (u64)offset_to_ptr(&fde->initial_loc);
+	opcode = (const u8 *)(fde + 1);
+
+	// TODO check augmentation data
+	WARN_ON(*opcode++);
+	size -= sizeof(u32) + sizeof(*fde) + 1;
+
+	/*
+	 * Starting from 'loc', apply the CFA opcodes that advance the location
+	 * pointer, and identify the locations of the PAC instructions.
+	 */
+	do {
+		const u8 *end;
+
+		switch (*opcode & 0xC0) {
+		case 0:
+			// handle DW_CFA_xxx opcodes
+			switch (*opcode) {
+				int ret;
+
+			case DW_CFA_nop:
+			case DW_CFA_remember_state:
+			case DW_CFA_restore_state:
+				break;
+
+			case DW_CFA_advance_loc1:
+				loc += 4 * *++opcode;
+				size--;
+				break;
+
+			case DW_CFA_advance_loc2:
+				loc += 4 * *++opcode;
+				loc += 4 * *++opcode << 8;
+				size -= 2;
+				break;
+
+			case DW_CFA_def_cfa:
+			case DW_CFA_def_cfa_offset:
+			case DW_CFA_def_cfa_register:
+				opcode++;
+				size--;
+				end = opcode + size;
+				get_uleb128(&opcode, end);
+				size = end - opcode;
+				continue;
+
+			case DW_CFA_negate_ra_state:
+				// patch paciasp/autiasp into shadow stack push/pop
+				ret = scs_patch_loc(loc - 4);
+				if (ret)
+					return ret;
+				break;
+
+			default:
+				pr_debug("unhandled opcode: %02x\n", *opcode);
+				return -ENOEXEC;
+			}
+			opcode++;
+			size--;
+			break;
+
+		case 0x40:
+			// advance loc
+			loc += (*opcode++ & 0x3f) * 4;
+			size--;
+			break;
+
+		case 0x80:
+			opcode++;
+			size--;
+			end = opcode + size;
+			get_uleb128(&opcode, end);
+			size = end - opcode;
+			continue;
+
+		default:
+			// ignore
+			opcode++;
+			size--;
+			break;
+		}
+	} while (size > 0);
+
+	return 0;
+}
+
+int noinstr scs_patch(const u8 eh_frame[], int size)
+{
+	const u8 *p = eh_frame;
+
+	while (size > 4) {
+		const u32 *frame_size = (const u32 *)p;
+		int ret;
+
+		if (*frame_size != -1 && *frame_size <= size) {
+			ret = scs_handle_frame(p + 4, *frame_size);
+			if (ret)
+				return ret;
+			p += 4 + *frame_size;
+			size -= 4 + *frame_size;
+		}
+	}
+	return 0;
+}
+
+asmlinkage int noinstr scs_patch_vmlinux(void)
+{
+	return scs_patch(__eh_frame_start, __eh_frame_end - __eh_frame_start);
+}
-- 
2.30.2
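
For readers following the CFA walk in scs_handle_frame() above, here is
a minimal userspace sketch of the same idea. It is not part of the
patch. It steps through a CFA program, tracks the current code offset,
and records the offset in effect at each DW_CFA_negate_ra_state. The
names find_negate_ra_state() and skip_uleb128() are made up for
illustration, the code alignment factor is assumed to be 4, and only a
handful of opcodes are handled. Note that the patch rewrites the
instruction at 'loc - 4', since the directive follows the instruction
it describes.

```c
#include <stddef.h>
#include <stdint.h>

#define DW_CFA_nop              0x00
#define DW_CFA_advance_loc1     0x02
#define DW_CFA_def_cfa_offset   0x0e
#define DW_CFA_negate_ra_state  0x2d

/* Skip one ULEB128 operand; returns the number of bytes consumed. */
static size_t skip_uleb128(const uint8_t *p, size_t len)
{
	size_t n = 0;

	while (n < len && (p[n++] & 0x80))
		;
	return n;
}

/*
 * Hypothetical helper: walk a CFA program and store the code offset (in
 * bytes, relative to the start of the function) seen at each
 * DW_CFA_negate_ra_state. Returns the number of hits, or -1 on an opcode
 * this sketch does not handle.
 */
static int find_negate_ra_state(const uint8_t *prog, size_t len,
				uint64_t *hits, size_t max_hits)
{
	uint64_t loc = 0;
	size_t i = 0, n = 0;

	while (i < len) {
		uint8_t op = prog[i++];

		/* Opcodes with an operand packed into the low 6 bits. */
		switch (op & 0xc0) {
		case 0x40:	/* DW_CFA_advance_loc */
			loc += 4 * (uint64_t)(op & 0x3f);
			continue;
		case 0x80:	/* DW_CFA_offset: one ULEB128 operand */
			i += skip_uleb128(prog + i, len - i);
			continue;
		case 0xc0:	/* DW_CFA_restore: no operand */
			continue;
		}

		/* High two bits are zero: extended opcodes. */
		switch (op) {
		case DW_CFA_nop:
			break;
		case DW_CFA_advance_loc1:
			if (i >= len)
				return -1;
			loc += 4 * (uint64_t)prog[i++];
			break;
		case DW_CFA_def_cfa_offset:
			i += skip_uleb128(prog + i, len - i);
			break;
		case DW_CFA_negate_ra_state:
			if (n < max_hits)
				hits[n] = loc;
			n++;
			break;
		default:
			return -1;
		}
	}
	return (int)n;
}
```

Returning an error on unknown opcodes, rather than guessing their
length, mirrors the -ENOEXEC path in the patch: a misparsed operand
would desynchronise the walk and could point at the wrong instruction.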

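The outer loop in scs_patch() walks length-prefixed .eh_frame records.
As posted, it does not advance when the length check fails, so a
malformed or 64-bit length would make it spin. Below is a hedged
userspace sketch of a record walk that always either advances or
returns. The name count_fdes() is hypothetical, and the buffer is
assumed to use the host byte order. It only counts FDEs instead of
patching anything.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: count the FDE records in a .eh_frame-style buffer.
 * Each record is a 32-bit length followed by that many bytes, the first
 * four of which are the ID field (zero for a CIE, nonzero for an FDE).
 * Returns -1 on a length this sketch cannot handle.
 */
static int count_fdes(const uint8_t *buf, size_t size)
{
	int fdes = 0;

	while (size >= 4) {
		uint32_t len, id;

		memcpy(&len, buf, sizeof(len));	/* avoid unaligned loads */
		if (len == 0)			/* zero terminator */
			break;
		if (len == 0xffffffff || len < 4 || len > size - 4)
			return -1;		/* 64-bit DWARF or truncated */

		memcpy(&id, buf + 4, sizeof(id));
		if (id != 0)
			fdes++;

		buf += 4 + (size_t)len;
		size -= 4 + (size_t)len;
	}
	return fdes;
}
```

Bailing out on 0xffffffff is a simplification: that value introduces a
64-bit length in the extended format, which this sketch does not parse.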

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 9/9] arm64: implement dynamic shadow call stack for GCC
  2021-10-13 15:22   ` Ard Biesheuvel
@ 2021-10-13 15:42     ` Mark Brown
  -1 siblings, 0 replies; 32+ messages in thread
From: Mark Brown @ 2021-10-13 15:42 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, linux-hardening, mark.rutland, catalin.marinas, will

[-- Attachment #1: Type: text/plain, Size: 377 bytes --]

On Wed, Oct 13, 2021 at 05:22:43PM +0200, Ard Biesheuvel wrote:

> +config UNWIND_PATCH_PAC_INTO_SCS
> +	def_bool y
> +	depends on CC_IS_GCC && SHADOW_CALL_STACK
> +	select UNWIND_TABLES
> +	select ARM64_PTR_AUTH_KERNEL
> +

This needs a dependency on the relevant GCC toolchain features for
pointer auth, doesn't it?  Or just make it depend on pointer auth
rather than select it.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 6/9] arm64: smccc: create proper stack frames for HVC/SMC calls
  2021-10-13 15:22   ` Ard Biesheuvel
@ 2021-10-13 15:44     ` Mark Brown
  -1 siblings, 0 replies; 32+ messages in thread
From: Mark Brown @ 2021-10-13 15:44 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, linux-hardening, mark.rutland, catalin.marinas, will

[-- Attachment #1: Type: text/plain, Size: 377 bytes --]

On Wed, Oct 13, 2021 at 05:22:40PM +0200, Ard Biesheuvel wrote:
> Create proper stack frames using the provided macros for HVC/SMC calling
> helpers that use the stack. This adds the PAC return address signing
> when enabled, and ensures that the unwinder can deal with occurrences
> of these routines appearing on the call stack.

Reviewed-by: Mark Brown <broonie@kernel.org>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/9] arm64: use unwind data on GCC for shadow call stack
  2021-10-13 15:22 ` Ard Biesheuvel
@ 2021-10-13 17:52   ` Ard Biesheuvel
  -1 siblings, 0 replies; 32+ messages in thread
From: Ard Biesheuvel @ 2021-10-13 17:52 UTC (permalink / raw)
  To: Linux ARM
  Cc: linux-hardening, Mark Rutland, Catalin Marinas, Will Deacon,
	Kees Cook, Sami Tolvanen, Fangrui Song, Nick Desaulniers, Dan Li

On Wed, 13 Oct 2021 at 17:22, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> This series is a proof of concept implementation of using unwind tables
> to locate PACIASP/AUTIASP instructions in the code, and patching them
> into shadow call stack pushes/pops at boot time if the platform in
> question does not support pointer authentication in hardware. This way,
> the overhead of the shadow call stack is only imposed if it actually
> gives any benefit. It also means that the compiler does not need to
> generate the code, so this works with GCC as well.
>
> In fact, it only works with GCC at the moment, as Clang does not seem to
> implement the DW_CFA_negate_ra_state correctly, which is emitted after
> each PACIASP or AUTIASP instruction (Clang only does the former).
> However, GCC does not appear to get it quite right either, as it emits
> the directive in the wrong place in some cases (but in a way that can be
> worked around).
>
> Note that this only implements it for the core kernel. Modules should be
> straight-forward, and most of the code can be reused. Also, the
> transformation is applied unconditionally, even if the hardware does
> implement PAC, but this does not really matter for a PoC.
>
> One obvious downside is the size of the unwind tables (3 MiB for
> defconfig), although there are plenty of use cases where this does not
> really matter (and I haven't checked the compressed size). However,
> there may be other reasons why we'd want to have access to these unwind
> tables (reliable stack traces), so this will need to be discussed before
> I intend to take this any further.
>
> Cc: Kees Cook <keescook@google.com>
> Cc: Sami Tolvanen <samitolvanen@google.com>
> Cc: Fangrui Song <maskray@google.com>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Cc: Dan Li <ashimida@linux.alibaba.com>
>

Apologies - I failed to pass --cc-cover, so the cc'ees above have only
received this cover letter.

The lore thread is here:
https://lore.kernel.org/r/20211013152243.2216899-1-ardb@kernel.org/


> Ard Biesheuvel (9):
>   arm64: assembler: enable PAC for non-leaf assembler routines
>   arm64: cache: use ALIAS version of linkage macros for local aliases
>   arm64: crypto: avoid overlapping linkage definitions for AES-CBC
>   arm64: aes-neonbs: move frame pop to end of function
>   arm64: chacha-neon: move frame pop forward
>   arm64: smccc: create proper stack frames for HVC/SMC calls
>   arm64: assembler: add unwind annotations to frame push/pop macros
>   arm64: unwind: add asynchronous unwind tables to the kernel proper
>   arm64: implement dynamic shadow call stack for GCC
>
>  Makefile                              |   4 +-
>  arch/Kconfig                          |   4 +-
>  arch/arm64/Kconfig                    |  11 +-
>  arch/arm64/Makefile                   |   7 +-
>  arch/arm64/crypto/aes-modes.S         |   4 +-
>  arch/arm64/crypto/aes-neonbs-core.S   |   8 +-
>  arch/arm64/crypto/chacha-neon-core.S  |   9 +-
>  arch/arm64/include/asm/assembler.h    |  32 ++-
>  arch/arm64/include/asm/linkage.h      |  16 +-
>  arch/arm64/kernel/Makefile            |   2 +
>  arch/arm64/kernel/head.S              |   3 +
>  arch/arm64/kernel/patch-scs.c         | 223 ++++++++++++++++++++
>  arch/arm64/kernel/smccc-call.S        |  40 ++--
>  arch/arm64/kernel/vmlinux.lds.S       |  20 ++
>  arch/arm64/mm/cache.S                 |   8 +-
>  drivers/firmware/efi/libstub/Makefile |   1 +
>  16 files changed, 347 insertions(+), 45 deletions(-)
>  create mode 100644 arch/arm64/kernel/patch-scs.c
>
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/9] arm64: use unwind data on GCC for shadow call stack
  2021-10-13 15:22 ` Ard Biesheuvel
@ 2021-10-13 18:01   ` Nick Desaulniers
  -1 siblings, 0 replies; 32+ messages in thread
From: Nick Desaulniers @ 2021-10-13 18:01 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, linux-hardening, mark.rutland, catalin.marinas,
	will, Kees Cook, Sami Tolvanen, Fangrui Song, Dan Li,
	linux-toolchains

On Wed, Oct 13, 2021 at 8:22 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> This series is a proof of concept implementation of using unwind tables
> to locate PACIASP/AUTIASP instructions in the code, and patching them
> into shadow call stack pushes/pops at boot time if the platform in
> question does not support pointer authentication in hardware. This way,
> the overhead of the shadow call stack is only imposed if it actually
> gives any benefit. It also means that the compiler does not need to
> generate the code, so this works with GCC as well.
>
> In fact, it only works with GCC at the moment, as Clang does not seem to
> implement the DW_CFA_negate_ra_state correctly, which is emitted after
> each PACIASP or AUTIASP instruction (Clang only does the former).
> However, GCC does not appear to get it quite right either, as it emits
> the directive in the wrong place in some cases (but in a way that can be
> worked around).

Can we work on getting bug reports to the compiler vendors? Then we
can have something free of workarounds, and more toolchain portable.

>
> Note that this only implements it for the core kernel. Modules should be
> straight-forward, and most of the code can be reused. Also, the
> transformation is applied unconditionally, even if the hardware does
> implement PAC, but this does not really matter for a PoC.
>
> One obvious downside is the size of the unwind tables (3 MiB for
> defconfig), although there are plenty of use cases where this does not
> really matter (and I haven't checked the compressed size). However,
> there may be other reasons why we'd want to have access to these unwind
> tables (reliable stack traces), so this will need to be discussed before
> I intend to take this any further.
>
> Cc: Kees Cook <keescook@google.com>
> Cc: Sami Tolvanen <samitolvanen@google.com>
> Cc: Fangrui Song <maskray@google.com>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Cc: Dan Li <ashimida@linux.alibaba.com>
>
> Ard Biesheuvel (9):
>   arm64: assembler: enable PAC for non-leaf assembler routines
>   arm64: cache: use ALIAS version of linkage macros for local aliases
>   arm64: crypto: avoid overlapping linkage definitions for AES-CBC
>   arm64: aes-neonbs: move frame pop to end of function
>   arm64: chacha-neon: move frame pop forward
>   arm64: smccc: create proper stack frames for HVC/SMC calls
>   arm64: assembler: add unwind annotations to frame push/pop macros
>   arm64: unwind: add asynchronous unwind tables to the kernel proper
>   arm64: implement dynamic shadow call stack for GCC
>
>  Makefile                              |   4 +-
>  arch/Kconfig                          |   4 +-
>  arch/arm64/Kconfig                    |  11 +-
>  arch/arm64/Makefile                   |   7 +-
>  arch/arm64/crypto/aes-modes.S         |   4 +-
>  arch/arm64/crypto/aes-neonbs-core.S   |   8 +-
>  arch/arm64/crypto/chacha-neon-core.S  |   9 +-
>  arch/arm64/include/asm/assembler.h    |  32 ++-
>  arch/arm64/include/asm/linkage.h      |  16 +-
>  arch/arm64/kernel/Makefile            |   2 +
>  arch/arm64/kernel/head.S              |   3 +
>  arch/arm64/kernel/patch-scs.c         | 223 ++++++++++++++++++++
>  arch/arm64/kernel/smccc-call.S        |  40 ++--
>  arch/arm64/kernel/vmlinux.lds.S       |  20 ++
>  arch/arm64/mm/cache.S                 |   8 +-
>  drivers/firmware/efi/libstub/Makefile |   1 +
>  16 files changed, 347 insertions(+), 45 deletions(-)
>  create mode 100644 arch/arm64/kernel/patch-scs.c
>
> --
> 2.30.2
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 9/9] arm64: implement dynamic shadow call stack for GCC
  2021-10-13 15:22   ` Ard Biesheuvel
@ 2021-10-13 22:35     ` Dan Li
  -1 siblings, 0 replies; 32+ messages in thread
From: Dan Li @ 2021-10-13 22:35 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: linux-hardening, mark.rutland, catalin.marinas, will



On 10/13/21 11:22 PM, Ard Biesheuvel wrote:
> Implement support for the shadow call stack on GCC, and in a dynamic
> manner, by parsing the unwind tables at init time to locate all
> occurrences of PACIASP/AUTIASP, and replacing them with the shadow call
> stack push and pop instructions, respectively.
> 
> This is useful because the overhead of the shadow call stack is
> difficult to justify on hardware that implements pointer authentication
> (PAC), and given that the PAC instructions are executed as NOPs on
> hardware that doesn't, we can just replace them.
> 
> This patch only implements this for the core kernel, but the logic can
> be reused for modules without much trouble.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>   Makefile                      |   4 +-
>   arch/Kconfig                  |   4 +-
>   arch/arm64/Kconfig            |   8 +-
>   arch/arm64/kernel/Makefile    |   2 +
>   arch/arm64/kernel/head.S      |   3 +
>   arch/arm64/kernel/patch-scs.c | 223 ++++++++++++++++++++
>   6 files changed, 239 insertions(+), 5 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 7cfe4ff36f44..2d94fed93d9d 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -933,8 +933,8 @@ LDFLAGS_vmlinux += --gc-sections
>   endif
>   
>   ifdef CONFIG_SHADOW_CALL_STACK
> -CC_FLAGS_SCS	:= -fsanitize=shadow-call-stack
> -KBUILD_CFLAGS	+= $(CC_FLAGS_SCS)
> +CC_FLAGS_SCS-$(CONFIG_CC_IS_CLANG)	:= -fsanitize=shadow-call-stack
> +KBUILD_CFLAGS				+= $(CC_FLAGS_SCS-y)
>   export CC_FLAGS_SCS
>   endif
>   
> diff --git a/arch/arm64/kernel/patch-scs.c b/arch/arm64/kernel/patch-scs.c
> new file mode 100644
> index 000000000000..878a40060550
> --- /dev/null
> +++ b/arch/arm64/kernel/patch-scs.c
> +static int scs_patch_loc(u64 loc)
> +{
> +	u32 insn = le32_to_cpup((void *)loc);
> +
> +	/*
> +	 * Sometimes, the unwind data appears to be out of sync, and associates
> +	 * the DW_CFA_negate_ra_state directive with the ret instruction
> +	 * following the autiasp, rather than the autiasp itself.
> +	 */
> +	if (insn == 0xd65f03c0) { // ret
> +		loc -= 4;
> +		insn = le32_to_cpup((void *)loc);
> +	}
> +
> +	switch (insn) {
> +	case 0xd503233f: // paciasp
> +		*(u32 *)loc = cpu_to_le32(0xf800865e);
> +		break;
> +	case 0xd50323bf: // autiasp
> +		*(u32 *)loc = cpu_to_le32(0xf85f8e5e);
> +		break;
> +	default:
> +		// ignore
> +		break;
> +	}
> +	return 0;
> +}

Hi Ard,

As I understand it (I may be wrong), '-march=armv8.3-a' may need to be
filtered out here. When it is specified, GCC uses 'retaa' instead of
'autiasp' for the PAC check.
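
To make the concern concrete, here is a small sketch, not taken from
the patch, that classifies the instruction words involved. The
paciasp, autiasp and ret encodings are the ones used in
scs_patch_loc(). The retaa encoding (0xd65f0bff) is added here from
the A64 instruction set and should be double-checked before relying on
it. The names classify_insn() and patch_would_rewrite() are
hypothetical.

```c
#include <stdint.h>

enum pac_insn {
	INSN_OTHER,
	INSN_PACIASP,
	INSN_AUTIASP,
	INSN_RET,
	INSN_RETAA,
};

/* Map a 32-bit A64 instruction word to the cases of interest. */
static enum pac_insn classify_insn(uint32_t insn)
{
	switch (insn) {
	case 0xd503233f: return INSN_PACIASP;
	case 0xd50323bf: return INSN_AUTIASP;
	case 0xd65f03c0: return INSN_RET;
	case 0xd65f0bff: return INSN_RETAA;	/* not handled by the patch */
	default:         return INSN_OTHER;
	}
}

/* Nonzero if the switch in scs_patch_loc() would rewrite this word. */
static int patch_would_rewrite(uint32_t insn)
{
	enum pac_insn k = classify_insn(insn);

	return k == INSN_PACIASP || k == INSN_AUTIASP;
}
```

With these definitions, a function epilogue that ends in retaa has no
autiasp for the patcher to find, so its pop would be left unpatched
while the paciasp in the prologue is still rewritten into a push.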

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 9/9] arm64: implement dynamic shadow call stack for GCC
@ 2021-10-13 22:35     ` Dan Li
  0 siblings, 0 replies; 32+ messages in thread
From: Dan Li @ 2021-10-13 22:35 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: linux-hardening, mark.rutland, catalin.marinas, will



On 10/13/21 11:22 PM, Ard Biesheuvel wrote:
> Implement support for the shadow call stack on GCC, and in a dynamic
> manner, by parsing the unwind tables at init time to locate all
> occurrences of PACIASP/AUTIASP, and replacing them with the shadow call
> stack push and pop instructions, respectively.
> 
> This is useful because the overhead of the shadow call stack is
> difficult to justify on hardware that implements pointer authentication
> (PAC), and given that the PAC instructions are executed as NOPs on
> hardware that doesn't, we can just replace them.
> 
> This patch only implements this for the core kernel, but the logic can
> be reused for modules without much trouble.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>   Makefile                      |   4 +-
>   arch/Kconfig                  |   4 +-
>   arch/arm64/Kconfig            |   8 +-
>   arch/arm64/kernel/Makefile    |   2 +
>   arch/arm64/kernel/head.S      |   3 +
>   arch/arm64/kernel/patch-scs.c | 223 ++++++++++++++++++++
>   6 files changed, 239 insertions(+), 5 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 7cfe4ff36f44..2d94fed93d9d 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -933,8 +933,8 @@ LDFLAGS_vmlinux += --gc-sections
>   endif
>   
>   ifdef CONFIG_SHADOW_CALL_STACK
> -CC_FLAGS_SCS	:= -fsanitize=shadow-call-stack
> -KBUILD_CFLAGS	+= $(CC_FLAGS_SCS)
> +CC_FLAGS_SCS-$(CONFIG_CC_IS_CLANG)	:= -fsanitize=shadow-call-stack
> +KBUILD_CFLAGS				+= $(CC_FLAGS_SCS-y)
>   export CC_FLAGS_SCS
>   endif
>   
> diff --git a/arch/arm64/kernel/patch-scs.c b/arch/arm64/kernel/patch-scs.c
> new file mode 100644
> index 000000000000..878a40060550
> --- /dev/null
> +++ b/arch/arm64/kernel/patch-scs.c
> +static int scs_patch_loc(u64 loc)
> +{
> +	u32 insn = le32_to_cpup((void *)loc);
> +
> +	/*
> +	 * Sometimes, the unwind data appears to be out of sync, and associates
> +	 * the DW_CFA_negate_ra_state directive with the ret instruction
> +	 * following the autiasp, rather than the autiasp itself.
> +	 */
> +	if (insn == 0xd65f03c0) { // ret
> +		loc -= 4;
> +		insn = le32_to_cpup((void *)loc);
> +	}
> +
> +	switch (insn) {
> +	case 0xd503233f: // paciasp
> +		*(u32 *)loc = cpu_to_le32(0xf800865e);
> +		break;
> +	case 0xd50323bf: // autiasp
> +		*(u32 *)loc = cpu_to_le32(0xf85f8e5e);
> +		break;
> +	default:
> +		// ignore
> +		break;
> +	}
> +	return 0;
> +}
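
For readers following the quoted scs_patch_loc(), here is a minimal
user-space sketch (not part of the patch) that decodes the two replacement
opcodes. The struct and helper names are hypothetical. The bit layout is
the A64 "load/store register, immediate pre/post-indexed" class, and only
the 64-bit integer STR/LDR forms seen in the patch are considered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: fields of a pre/post-indexed load/store word. */
struct ldst_idx {
	bool is_load;    /* bit 22: LDR when set, STR when clear */
	bool pre_index;  /* bits [11:10]: 0b11 = pre-index, 0b01 = post-index */
	int  imm9;       /* bits [20:12], sign-extended byte offset */
	unsigned rn;     /* bits [9:5]: base register */
	unsigned rt;     /* bits [4:0]: register being stored or loaded */
};

/* True if insn is an integer load/store with a 9-bit immediate and writeback. */
static bool is_ldst_idx(uint32_t insn)
{
	return (insn & 0x3f200400u) == 0x38000400u;
}

/* Split the word into its fields; the caller must pass a word of that class. */
static struct ldst_idx decode_ldst_idx(uint32_t insn)
{
	struct ldst_idx d;
	int imm = (insn >> 12) & 0x1ff;

	assert(is_ldst_idx(insn));
	d.is_load   = (insn >> 22) & 1;
	d.pre_index = ((insn >> 10) & 3) == 3;
	d.imm9      = (imm & 0x100) ? imm - 0x200 : imm; /* sign-extend 9 bits */
	d.rn        = (insn >> 5) & 0x1f;
	d.rt        = insn & 0x1f;
	return d;
}
```

Decoded this way, 0xf800865e reads as "str x30, [x18], #8" (the push) and
0xf85f8e5e as "ldr x30, [x18, #-8]!" (the pop), with x18 serving as the
shadow call stack pointer.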

Hi Ard,

As far as I understand (I may be wrong), '-march=armv8.3-a' may need to be
filtered out here. When it is specified, GCC uses 'retaa' instead of
'autiasp' as the PAC check.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* Re: [RFC PATCH 9/9] arm64: implement dynamic shadow call stack for GCC
  2021-10-13 22:35     ` Dan Li
@ 2021-10-14  9:41       ` Ard Biesheuvel
  -1 siblings, 0 replies; 32+ messages in thread
From: Ard Biesheuvel @ 2021-10-14  9:41 UTC (permalink / raw)
  To: Dan Li
  Cc: Linux ARM, linux-hardening, Mark Rutland, Catalin Marinas, Will Deacon

On Thu, 14 Oct 2021 at 00:35, Dan Li <ashimida@linux.alibaba.com> wrote:
>
>
>
> On 10/13/21 11:22 PM, Ard Biesheuvel wrote:
> > Implement support for the shadow call stack on GCC, and in a dynamic
> > manner, by parsing the unwind tables at init time to locate all
> > occurrences of PACIASP/AUTIASP, and replacing them with the shadow call
> > stack push and pop instructions, respectively.
> >
> > This is useful because the overhead of the shadow call stack is
> > difficult to justify on hardware that implements pointer authentication
> > (PAC), and given that the PAC instructions are executed as NOPs on
> > hardware that doesn't, we can just replace them.
> >
> > This patch only implements this for the core kernel, but the logic can
> > be reused for modules without much trouble.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >   Makefile                      |   4 +-
> >   arch/Kconfig                  |   4 +-
> >   arch/arm64/Kconfig            |   8 +-
> >   arch/arm64/kernel/Makefile    |   2 +
> >   arch/arm64/kernel/head.S      |   3 +
> >   arch/arm64/kernel/patch-scs.c | 223 ++++++++++++++++++++
> >   6 files changed, 239 insertions(+), 5 deletions(-)
> >
> > diff --git a/Makefile b/Makefile
> > index 7cfe4ff36f44..2d94fed93d9d 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -933,8 +933,8 @@ LDFLAGS_vmlinux += --gc-sections
> >   endif
> >
> >   ifdef CONFIG_SHADOW_CALL_STACK
> > -CC_FLAGS_SCS := -fsanitize=shadow-call-stack
> > -KBUILD_CFLAGS        += $(CC_FLAGS_SCS)
> > +CC_FLAGS_SCS-$(CONFIG_CC_IS_CLANG)   := -fsanitize=shadow-call-stack
> > +KBUILD_CFLAGS                                += $(CC_FLAGS_SCS-y)
> >   export CC_FLAGS_SCS
> >   endif
> >
> > diff --git a/arch/arm64/kernel/patch-scs.c b/arch/arm64/kernel/patch-scs.c
> > new file mode 100644
> > index 000000000000..878a40060550
> > --- /dev/null
> > +++ b/arch/arm64/kernel/patch-scs.c
> > +static int scs_patch_loc(u64 loc)
> > +{
> > +     u32 insn = le32_to_cpup((void *)loc);
> > +
> > +     /*
> > +      * Sometimes, the unwind data appears to be out of sync, and associates
> > +      * the DW_CFA_negate_ra_state directive with the ret instruction
> > +      * following the autiasp, rather than the autiasp itself.
> > +      */
> > +     if (insn == 0xd65f03c0) { // ret
> > +             loc -= 4;
> > +             insn = le32_to_cpup((void *)loc);
> > +     }
> > +
> > +     switch (insn) {
> > +     case 0xd503233f: // paciasp
> > +             *(u32 *)loc = cpu_to_le32(0xf800865e);
> > +             break;
> > +     case 0xd50323bf: // autiasp
> > +             *(u32 *)loc = cpu_to_le32(0xf85f8e5e);
> > +             break;
> > +     default:
> > +             // ignore
> > +             break;
> > +     }
> > +     return 0;
> > +}
>
> Hi Ard,
>
> As far as I understand (I may be wrong), '-march=armv8.3-a' may need to be
> filtered out here. When it is specified, GCC uses 'retaa' instead of
> 'autiasp' as the PAC check.

We can't use that for the single kernel image anyway, since retaa is
UNDEFINED if the PAC extension is not implemented. So in this
particular case, it does not really matter, given that you would not
include the SCS fallback in a kernel that is targeting only hardware
that implements PAC.
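
A minimal sketch of the distinction, as a user-space illustration rather
than anything from the series: PACIASP and AUTIASP are encoded in the A64
HINT space, which is why they execute as NOPs on hardware without PAC,
whereas RETAA is not. The helper names are hypothetical, and the RETAA
opcode is an assumption that does not come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define A64_PACIASP 0xd503233fu /* as in the quoted scs_patch_loc() */
#define A64_AUTIASP 0xd50323bfu /* as in the quoted scs_patch_loc() */
#define A64_RETAA   0xd65f0bffu /* assumption: not taken from the patch */

/* HINT #imm7 is encoded as 0xd503201f | (imm7 << 5). */
static bool a64_is_hint(uint32_t insn)
{
	return (insn & 0xfffff01fu) == 0xd503201fu;
}

/* Extract the 7-bit hint number (CRm:op2) from a HINT-space word. */
static unsigned a64_hint_imm(uint32_t insn)
{
	return (insn >> 5) & 0x7f;
}
```

With these helpers, PACIASP and AUTIASP come out as HINT #25 and HINT #29,
while RETAA fails the a64_is_hint() test, so there is no NOP behaviour to
fall back on when the PAC extension is absent.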

end of thread, other threads:[~2021-10-14  9:44 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-13 15:22 [RFC PATCH 0/9] arm64: use unwind data on GCC for shadow call stack Ard Biesheuvel
2021-10-13 15:22 ` Ard Biesheuvel
2021-10-13 15:22 ` [RFC PATCH 1/9] arm64: assembler: enable PAC for non-leaf assembler routines Ard Biesheuvel
2021-10-13 15:22   ` Ard Biesheuvel
2021-10-13 15:22 ` [RFC PATCH 2/9] arm64: cache: use ALIAS version of linkage macros for local aliases Ard Biesheuvel
2021-10-13 15:22   ` Ard Biesheuvel
2021-10-13 15:22 ` [RFC PATCH 3/9] arm64: crypto: avoid overlapping linkage definitions for AES-CBC Ard Biesheuvel
2021-10-13 15:22   ` Ard Biesheuvel
2021-10-13 15:22 ` [RFC PATCH 4/9] arm64: aes-neonbs: move frame pop to end of function Ard Biesheuvel
2021-10-13 15:22   ` Ard Biesheuvel
2021-10-13 15:22 ` [RFC PATCH 5/9] arm64: chacha-neon: move frame pop forward Ard Biesheuvel
2021-10-13 15:22   ` Ard Biesheuvel
2021-10-13 15:22 ` [RFC PATCH 6/9] arm64: smccc: create proper stack frames for HVC/SMC calls Ard Biesheuvel
2021-10-13 15:22   ` Ard Biesheuvel
2021-10-13 15:44   ` Mark Brown
2021-10-13 15:44     ` Mark Brown
2021-10-13 15:22 ` [RFC PATCH 7/9] arm64: assembler: add unwind annotations to frame push/pop macros Ard Biesheuvel
2021-10-13 15:22   ` Ard Biesheuvel
2021-10-13 15:22 ` [RFC PATCH 8/9] arm64: unwind: add asynchronous unwind tables to the kernel proper Ard Biesheuvel
2021-10-13 15:22   ` Ard Biesheuvel
2021-10-13 15:22 ` [RFC PATCH 9/9] arm64: implement dynamic shadow call stack for GCC Ard Biesheuvel
2021-10-13 15:22   ` Ard Biesheuvel
2021-10-13 15:42   ` Mark Brown
2021-10-13 15:42     ` Mark Brown
2021-10-13 22:35   ` Dan Li
2021-10-13 22:35     ` Dan Li
2021-10-14  9:41     ` Ard Biesheuvel
2021-10-14  9:41       ` Ard Biesheuvel
2021-10-13 17:52 ` [RFC PATCH 0/9] arm64: use unwind data on GCC for shadow call stack Ard Biesheuvel
2021-10-13 17:52   ` Ard Biesheuvel
2021-10-13 18:01 ` Nick Desaulniers
2021-10-13 18:01   ` Nick Desaulniers
