* [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
From: Ard Biesheuvel @ 2018-08-02 13:21 UTC (permalink / raw)
  To: linux-arm-kernel

This is a proof of concept I cooked up, primarily to trigger a discussion
about whether there is a point to doing anything like this, and if there
is, what the pitfalls are. Also, while I am not aware of any similar
implementations, the idea is so simple that I would be surprised if nobody
else thought of the same thing way before I did.

The idea is that we can significantly limit the kernel's attack surface
for ROP based attacks by clearing the stack pointer's sign bit before
returning from a function, and setting it again right after proceeding
from the [expected] return address. This should make it much more difficult
to return to arbitrary gadgets, given that they rely on being chained to
the next via a return address popped off the stack, and this is difficult
when the stack pointer is invalid.

Of course, 4 additional instructions per function return are not exactly
free, but they are just movs, ands and orrs, and leaf functions are
disregarded unless they allocate a stack frame (this comes for free
because simple_return insns are disregarded by the plugin).
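
Concretely, the instrumentation amounts to the following two sequences
(a sketch; the actual sequences are emitted by the plugin in patch #2):

	mov	x16, sp			// emitted before every 'ret':
	and	sp, x16, #~(1 << 55)	//   clear bit #55 in SP

	mov	x30, sp			// emitted after every 'bl'/'blr':
	orr	sp, x30, #(1 << 55)	//   set bit #55 in SP again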

Please shoot, preferably with better ideas ...

Ard Biesheuvel (3):
  arm64: use wrapper macros for bl/blr instructions from asm code
  gcc: plugins: add ROP shield plugin for arm64
  arm64: enable ROP protection by clearing SP bit #55 across function
    returns

 arch/Kconfig                                  |   4 +
 arch/arm64/Kconfig                            |  10 ++
 arch/arm64/include/asm/assembler.h            |  21 +++-
 arch/arm64/kernel/entry-ftrace.S              |   6 +-
 arch/arm64/kernel/entry.S                     | 104 +++++++++-------
 arch/arm64/kernel/head.S                      |   4 +-
 arch/arm64/kernel/probes/kprobes_trampoline.S |   2 +-
 arch/arm64/kernel/sleep.S                     |   6 +-
 drivers/firmware/efi/libstub/Makefile         |   3 +-
 scripts/Makefile.gcc-plugins                  |   7 ++
 scripts/gcc-plugins/arm64_rop_shield_plugin.c | 116 ++++++++++++++++++
 11 files changed, 228 insertions(+), 55 deletions(-)
 create mode 100644 scripts/gcc-plugins/arm64_rop_shield_plugin.c

-- 
2.18.0

* [RFC/PoC PATCH 1/3] arm64: use wrapper macros for bl/blr instructions from asm code
From: Ard Biesheuvel @ 2018-08-02 13:21 UTC (permalink / raw)
  To: linux-arm-kernel

In preparation for enabling a feature that temporarily clears the
sign bit in the stack pointer register across a subroutine return,
switch to bl_c/blr_c wrapper macros for making subroutine calls from
assembler source. The macros will be updated in a subsequent patch to
conditionally incorporate the restore sequence for the stack pointer
register.
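
For reference, once the feature is wired up by the last patch in this
series, the wrappers end up expanding to roughly the following (sketch):

	.macro		bl_c, target
	bl		\target
	mov		x30, sp			// x30 is dead after the call, so
	orr		sp, x30, #(1 << 55)	// use it to set the SP bit again
	.endm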

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/assembler.h            | 12 ++-
 arch/arm64/kernel/entry-ftrace.S              |  6 +-
 arch/arm64/kernel/entry.S                     | 86 ++++++++++----------
 arch/arm64/kernel/head.S                      |  4 +-
 arch/arm64/kernel/probes/kprobes_trampoline.S |  2 +-
 arch/arm64/kernel/sleep.S                     |  6 +-
 6 files changed, 62 insertions(+), 54 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 0bcc98dbba56..346ada4de48a 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -687,8 +687,8 @@ USER(\label, ic	ivau, \tmp2)			// invalidate I line PoU
 	.endm
 
 	.macro		do_cond_yield_neon
-	bl		kernel_neon_end
-	bl		kernel_neon_begin
+	bl_c		kernel_neon_end
+	bl_c		kernel_neon_begin
 	.endm
 
 	.macro		endif_yield_neon, lbl
@@ -701,4 +701,12 @@ USER(\label, ic	ivau, \tmp2)			// invalidate I line PoU
 .Lyield_out_\@ :
 	.endm
 
+	.macro		bl_c, target
+	bl		\target
+	.endm
+
+	.macro		blr_c, reg
+	blr		\reg
+	.endm
+
 #endif	/* __ASM_ASSEMBLER_H */
diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
index 1175f5827ae1..4691eef0dc65 100644
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -106,7 +106,7 @@ ENTRY(_mcount)
 
 	mcount_get_pc	x0		//       function's pc
 	mcount_get_lr	x1		//       function's lr (= parent's pc)
-	blr	x2			//   (*ftrace_trace_function)(pc, lr);
+	blr_c	x2			//   (*ftrace_trace_function)(pc, lr);
 
 skip_ftrace_call:			// }
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
@@ -200,7 +200,7 @@ ENTRY(ftrace_graph_caller)
 	mcount_get_lr_addr	  x0	//     pointer to function's saved lr
 	mcount_get_pc		  x1	//     function's pc
 	mcount_get_parent_fp	  x2	//     parent's fp
-	bl	prepare_ftrace_return	// prepare_ftrace_return(&lr, pc, fp)
+	bl_c	prepare_ftrace_return	// prepare_ftrace_return(&lr, pc, fp)
 
 	mcount_exit
 ENDPROC(ftrace_graph_caller)
@@ -215,7 +215,7 @@ ENDPROC(ftrace_graph_caller)
 ENTRY(return_to_handler)
 	save_return_regs
 	mov	x0, x29			//     parent's fp
-	bl	ftrace_return_to_handler// addr = ftrace_return_to_hander(fp);
+	bl_c	ftrace_return_to_handler// addr = ftrace_return_to_hander(fp);
 	mov	x30, x0			// restore the original return address
 	restore_return_regs
 	ret
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 28ad8799406f..eba5b6b528ea 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -43,7 +43,7 @@
  */
 	.macro ct_user_exit, syscall = 0
 #ifdef CONFIG_CONTEXT_TRACKING
-	bl	context_tracking_user_exit
+	bl_c	context_tracking_user_exit
 	.if \syscall == 1
 	/*
 	 * Save/restore needed during syscalls.  Restore syscall arguments from
@@ -59,7 +59,7 @@
 
 	.macro ct_user_enter
 #ifdef CONFIG_CONTEXT_TRACKING
-	bl	context_tracking_user_enter
+	bl_c	context_tracking_user_enter
 #endif
 	.endm
 
@@ -305,7 +305,7 @@ alternative_else_nop_endif
 	 * Cavium erratum 27456 (broadcast TLBI instructions may cause I-cache
 	 * corruption).
 	 */
-	bl	post_ttbr_update_workaround
+	bl_c	post_ttbr_update_workaround
 	.endif
 1:
 	.if	\el != 0
@@ -425,7 +425,7 @@ tsk	.req	x28		// current thread_info
 	ldr_l	x1, handle_arch_irq
 	mov	x0, sp
 	irq_stack_entry
-	blr	x1
+	blr_c	x1
 	irq_stack_exit
 	.endm
 
@@ -490,7 +490,7 @@ __bad_stack:
 	mov	x0, sp
 
 	/* Time to die */
-	bl	handle_bad_stack
+	bl_c	handle_bad_stack
 	ASM_BUG()
 #endif /* CONFIG_VMAP_STACK */
 
@@ -502,7 +502,7 @@ __bad_stack:
 	mov	x0, sp
 	mov	x1, #\reason
 	mrs	x2, esr_el1
-	bl	bad_mode
+	bl_c	bad_mode
 	ASM_BUG()
 	.endm
 
@@ -580,7 +580,7 @@ el1_da:
 	inherit_daif	pstate=x23, tmp=x2
 	clear_address_tag x0, x3
 	mov	x2, sp				// struct pt_regs
-	bl	do_mem_abort
+	bl_c	do_mem_abort
 
 	kernel_exit 1
 el1_sp_pc:
@@ -590,7 +590,7 @@ el1_sp_pc:
 	mrs	x0, far_el1
 	inherit_daif	pstate=x23, tmp=x2
 	mov	x2, sp
-	bl	do_sp_pc_abort
+	bl_c	do_sp_pc_abort
 	ASM_BUG()
 el1_undef:
 	/*
@@ -598,7 +598,7 @@ el1_undef:
 	 */
 	inherit_daif	pstate=x23, tmp=x2
 	mov	x0, sp
-	bl	do_undefinstr
+	bl_c	do_undefinstr
 	ASM_BUG()
 el1_dbg:
 	/*
@@ -609,7 +609,7 @@ el1_dbg:
 	tbz	x24, #0, el1_inv		// EL1 only
 	mrs	x0, far_el1
 	mov	x2, sp				// struct pt_regs
-	bl	do_debug_exception
+	bl_c	do_debug_exception
 	kernel_exit 1
 el1_inv:
 	// TODO: add support for undefined instructions in kernel mode
@@ -617,7 +617,7 @@ el1_inv:
 	mov	x0, sp
 	mov	x2, x1
 	mov	x1, #BAD_SYNC
-	bl	bad_mode
+	bl_c	bad_mode
 	ASM_BUG()
 ENDPROC(el1_sync)
 
@@ -626,7 +626,7 @@ el1_irq:
 	kernel_entry 1
 	enable_da_f
 #ifdef CONFIG_TRACE_IRQFLAGS
-	bl	trace_hardirqs_off
+	bl_c	trace_hardirqs_off
 #endif
 
 	irq_handler
@@ -636,11 +636,11 @@ el1_irq:
 	cbnz	w24, 1f				// preempt count != 0
 	ldr	x0, [tsk, #TSK_TI_FLAGS]	// get flags
 	tbz	x0, #TIF_NEED_RESCHED, 1f	// needs rescheduling?
-	bl	el1_preempt
+	bl_c	el1_preempt
 1:
 #endif
 #ifdef CONFIG_TRACE_IRQFLAGS
-	bl	trace_hardirqs_on
+	bl_c	trace_hardirqs_on
 #endif
 	kernel_exit 1
 ENDPROC(el1_irq)
@@ -648,7 +648,7 @@ ENDPROC(el1_irq)
 #ifdef CONFIG_PREEMPT
 el1_preempt:
 	mov	x24, lr
-1:	bl	preempt_schedule_irq		// irq en/disable is done inside
+1:	bl_c	preempt_schedule_irq		// irq en/disable is done inside
 	ldr	x0, [tsk, #TSK_TI_FLAGS]	// get new tasks TI_FLAGS
 	tbnz	x0, #TIF_NEED_RESCHED, 1b	// needs rescheduling?
 	ret	x24
@@ -749,7 +749,7 @@ el0_da:
 	clear_address_tag x0, x26
 	mov	x1, x25
 	mov	x2, sp
-	bl	do_mem_abort
+	bl_c	do_mem_abort
 	b	ret_to_user
 el0_ia:
 	/*
@@ -758,13 +758,13 @@ el0_ia:
 	mrs	x26, far_el1
 	enable_da_f
 #ifdef CONFIG_TRACE_IRQFLAGS
-	bl	trace_hardirqs_off
+	bl_c	trace_hardirqs_off
 #endif
 	ct_user_exit
 	mov	x0, x26
 	mov	x1, x25
 	mov	x2, sp
-	bl	do_el0_ia_bp_hardening
+	bl_c	do_el0_ia_bp_hardening
 	b	ret_to_user
 el0_fpsimd_acc:
 	/*
@@ -774,7 +774,7 @@ el0_fpsimd_acc:
 	ct_user_exit
 	mov	x0, x25
 	mov	x1, sp
-	bl	do_fpsimd_acc
+	bl_c	do_fpsimd_acc
 	b	ret_to_user
 el0_sve_acc:
 	/*
@@ -784,7 +784,7 @@ el0_sve_acc:
 	ct_user_exit
 	mov	x0, x25
 	mov	x1, sp
-	bl	do_sve_acc
+	bl_c	do_sve_acc
 	b	ret_to_user
 el0_fpsimd_exc:
 	/*
@@ -794,7 +794,7 @@ el0_fpsimd_exc:
 	ct_user_exit
 	mov	x0, x25
 	mov	x1, sp
-	bl	do_fpsimd_exc
+	bl_c	do_fpsimd_exc
 	b	ret_to_user
 el0_sp_pc:
 	/*
@@ -803,13 +803,13 @@ el0_sp_pc:
 	mrs	x26, far_el1
 	enable_da_f
 #ifdef CONFIG_TRACE_IRQFLAGS
-	bl	trace_hardirqs_off
+	bl_c	trace_hardirqs_off
 #endif
 	ct_user_exit
 	mov	x0, x26
 	mov	x1, x25
 	mov	x2, sp
-	bl	do_sp_pc_abort
+	bl_c	do_sp_pc_abort
 	b	ret_to_user
 el0_undef:
 	/*
@@ -818,7 +818,7 @@ el0_undef:
 	enable_daif
 	ct_user_exit
 	mov	x0, sp
-	bl	do_undefinstr
+	bl_c	do_undefinstr
 	b	ret_to_user
 el0_sys:
 	/*
@@ -828,7 +828,7 @@ el0_sys:
 	ct_user_exit
 	mov	x0, x25
 	mov	x1, sp
-	bl	do_sysinstr
+	bl_c	do_sysinstr
 	b	ret_to_user
 el0_dbg:
 	/*
@@ -838,7 +838,7 @@ el0_dbg:
 	mrs	x0, far_el1
 	mov	x1, x25
 	mov	x2, sp
-	bl	do_debug_exception
+	bl_c	do_debug_exception
 	enable_daif
 	ct_user_exit
 	b	ret_to_user
@@ -848,7 +848,7 @@ el0_inv:
 	mov	x0, sp
 	mov	x1, #BAD_SYNC
 	mov	x2, x25
-	bl	bad_el0_sync
+	bl_c	bad_el0_sync
 	b	ret_to_user
 ENDPROC(el0_sync)
 
@@ -858,19 +858,19 @@ el0_irq:
 el0_irq_naked:
 	enable_da_f
 #ifdef CONFIG_TRACE_IRQFLAGS
-	bl	trace_hardirqs_off
+	bl_c	trace_hardirqs_off
 #endif
 
 	ct_user_exit
 #ifdef CONFIG_HARDEN_BRANCH_PREDICTOR
 	tbz	x22, #55, 1f
-	bl	do_el0_irq_bp_hardening
+	bl_c	do_el0_irq_bp_hardening
 1:
 #endif
 	irq_handler
 
 #ifdef CONFIG_TRACE_IRQFLAGS
-	bl	trace_hardirqs_on
+	bl_c	trace_hardirqs_on
 #endif
 	b	ret_to_user
 ENDPROC(el0_irq)
@@ -880,7 +880,7 @@ el1_error:
 	mrs	x1, esr_el1
 	enable_dbg
 	mov	x0, sp
-	bl	do_serror
+	bl_c	do_serror
 	kernel_exit 1
 ENDPROC(el1_error)
 
@@ -890,7 +890,7 @@ el0_error_naked:
 	mrs	x1, esr_el1
 	enable_dbg
 	mov	x0, sp
-	bl	do_serror
+	bl_c	do_serror
 	enable_daif
 	ct_user_exit
 	b	ret_to_user
@@ -920,9 +920,9 @@ ret_fast_syscall_trace:
  */
 work_pending:
 	mov	x0, sp				// 'regs'
-	bl	do_notify_resume
+	bl_c	do_notify_resume
 #ifdef CONFIG_TRACE_IRQFLAGS
-	bl	trace_hardirqs_on		// enabled while in userspace
+	bl_c	trace_hardirqs_on		// enabled while in userspace
 #endif
 	ldr	x1, [tsk, #TSK_TI_FLAGS]	// re-check for single-step
 	b	finish_ret_to_user
@@ -980,11 +980,11 @@ el0_svc_naked:					// compat entry point
 	b.hs	ni_sys
 	mask_nospec64 xscno, xsc_nr, x19	// enforce bounds for syscall number
 	ldr	x16, [stbl, xscno, lsl #3]	// address in the syscall table
-	blr	x16				// call sys_* routine
+	blr_c	x16				// call sys_* routine
 	b	ret_fast_syscall
 ni_sys:
 	mov	x0, sp
-	bl	do_ni_syscall
+	bl_c	do_ni_syscall
 	b	ret_fast_syscall
 ENDPROC(el0_svc)
 
@@ -998,7 +998,7 @@ __sys_trace:
 	mov	x0, #-ENOSYS			// set default errno if so
 	str	x0, [sp, #S_X0]
 1:	mov	x0, sp
-	bl	syscall_trace_enter
+	bl_c	syscall_trace_enter
 	cmp	w0, #NO_SYSCALL			// skip the syscall?
 	b.eq	__sys_trace_return_skipped
 	mov	wscno, w0			// syscall number (possibly new)
@@ -1010,18 +1010,18 @@ __sys_trace:
 	ldp	x4, x5, [sp, #S_X4]
 	ldp	x6, x7, [sp, #S_X6]
 	ldr	x16, [stbl, xscno, lsl #3]	// address in the syscall table
-	blr	x16				// call sys_* routine
+	blr_c	x16				// call sys_* routine
 
 __sys_trace_return:
 	str	x0, [sp, #S_X0]			// save returned x0
 __sys_trace_return_skipped:
 	mov	x0, sp
-	bl	syscall_trace_exit
+	bl_c	syscall_trace_exit
 	b	ret_to_user
 
 __ni_sys_trace:
 	mov	x0, sp
-	bl	do_ni_syscall
+	bl_c	do_ni_syscall
 	b	__sys_trace_return
 
 	.popsection				// .entry.text
@@ -1182,10 +1182,10 @@ NOKPROBE(cpu_switch_to)
  * This is how we return from a fork.
  */
 ENTRY(ret_from_fork)
-	bl	schedule_tail
+	bl_c	schedule_tail
 	cbz	x19, 1f				// not a kernel thread
 	mov	x0, x20
-	blr	x19
+	blr_c	x19
 1:	get_thread_info tsk
 	b	ret_to_user
 ENDPROC(ret_from_fork)
@@ -1337,7 +1337,7 @@ ENTRY(__sdei_asm_handler)
 
 	add	x0, x19, #SDEI_EVENT_INTREGS
 	mov	x1, x19
-	bl	__sdei_handler
+	bl_c	__sdei_handler
 
 	msr	sp_el0, x28
 	/* restore regs >x17 that we clobbered */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index b0853069702f..10414bbbeecb 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -432,13 +432,13 @@ __primary_switched:
 	dsb	ishst				// Make zero page visible to PTW
 
 #ifdef CONFIG_KASAN
-	bl	kasan_early_init
+	bl_c	kasan_early_init
 #endif
 #ifdef CONFIG_RANDOMIZE_BASE
 	tst	x23, ~(MIN_KIMG_ALIGN - 1)	// already running randomized?
 	b.ne	0f
 	mov	x0, x21				// pass FDT address in x0
-	bl	kaslr_early_init		// parse FDT for KASLR options
+	bl_c	kaslr_early_init		// parse FDT for KASLR options
 	cbz	x0, 0f				// KASLR disabled? just proceed
 	orr	x23, x23, x0			// record KASLR offset
 	ldp	x29, x30, [sp], #16		// we must enable KASLR, return
diff --git a/arch/arm64/kernel/probes/kprobes_trampoline.S b/arch/arm64/kernel/probes/kprobes_trampoline.S
index 45dce03aaeaf..0b195b727dc7 100644
--- a/arch/arm64/kernel/probes/kprobes_trampoline.S
+++ b/arch/arm64/kernel/probes/kprobes_trampoline.S
@@ -67,7 +67,7 @@ ENTRY(kretprobe_trampoline)
 	save_all_base_regs
 
 	mov x0, sp
-	bl trampoline_probe_handler
+	bl_c trampoline_probe_handler
 	/*
 	 * Replace trampoline address in lr with actual orig_ret_addr return
 	 * address.
diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
index bebec8ef9372..6ced3a8bb528 100644
--- a/arch/arm64/kernel/sleep.S
+++ b/arch/arm64/kernel/sleep.S
@@ -90,7 +90,7 @@ ENTRY(__cpu_suspend_enter)
 	str	x0, [x1]
 	add	x0, x0, #SLEEP_STACK_DATA_SYSTEM_REGS
 	stp	x29, lr, [sp, #-16]!
-	bl	cpu_do_suspend
+	bl_c	cpu_do_suspend
 	ldp	x29, lr, [sp], #16
 	mov	x0, #1
 	ret
@@ -129,11 +129,11 @@ ENTRY(_cpu_resume)
 	/*
 	 * cpu_do_resume expects x0 to contain context address pointer
 	 */
-	bl	cpu_do_resume
+	bl_c	cpu_do_resume
 
 #ifdef CONFIG_KASAN
 	mov	x0, sp
-	bl	kasan_unpoison_task_stack_below
+	bl_c	kasan_unpoison_task_stack_below
 #endif
 
 	ldp	x19, x20, [x29, #16]
-- 
2.18.0

* [RFC/PoC PATCH 2/3] gcc: plugins: add ROP shield plugin for arm64
From: Ard Biesheuvel @ 2018-08-02 13:21 UTC (permalink / raw)
  To: linux-arm-kernel

Add a plugin that mangles every 'ret' instruction so that bit 55 in the
stack pointer register is cleared first, and every 'bl' or 'blr'
instruction so that the bit is set again right after the call returns.
This should make it very difficult to stage ROP attacks, given that the
supply of gadgets is now reduced to those that start with the 'reset
bit #55' sequence, which only occurs right after a call returns, when
all caller-saved registers are dead.
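
For illustration, a function that makes a single call is instrumented
roughly as follows ('foo' and 'bar' are placeholders; actual code
generation will differ):

foo:
	stp	x29, x30, [sp, #-16]!
	mov	x29, sp
	bl	bar
	mov	x30, sp			// emitted after every bl/blr:
	orr	sp, x30, #(1 << 55)	//   set bit #55 in SP again
	ldp	x29, x30, [sp], #16
	mov	x16, sp			// emitted before every ret:
	and	sp, x16, #~(1 << 55)	//   clear bit #55 in SP
	ret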

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/Kconfig                                  |   4 +
 drivers/firmware/efi/libstub/Makefile         |   3 +-
 scripts/Makefile.gcc-plugins                  |   7 ++
 scripts/gcc-plugins/arm64_rop_shield_plugin.c | 116 ++++++++++++++++++++
 4 files changed, 129 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 1aa59063f1fd..d61d1249d986 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -549,6 +549,10 @@ config GCC_PLUGIN_RANDSTRUCT_PERFORMANCE
 	  in structures.  This reduces the performance hit of RANDSTRUCT
 	  at the cost of weakened randomization.
 
+config GCC_PLUGIN_ARM64_ROP_SHIELD
+	bool
+	depends on GCC_PLUGINS && ARM64
+
 config HAVE_STACKPROTECTOR
 	bool
 	help
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index a34e9290a699..9b1510e5957f 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -20,7 +20,8 @@ cflags-$(CONFIG_EFI_ARMSTUB)	+= -I$(srctree)/scripts/dtc/libfdt
 KBUILD_CFLAGS			:= $(cflags-y) -DDISABLE_BRANCH_PROFILING \
 				   -D__NO_FORTIFY \
 				   $(call cc-option,-ffreestanding) \
-				   $(call cc-option,-fno-stack-protector)
+				   $(call cc-option,-fno-stack-protector) \
+				   $(DISABLE_ARM64_ROP_SHIELD_PLUGIN)
 
 GCOV_PROFILE			:= n
 KASAN_SANITIZE			:= n
diff --git a/scripts/Makefile.gcc-plugins b/scripts/Makefile.gcc-plugins
index c961b9a65d11..f4f9c27fb3a0 100644
--- a/scripts/Makefile.gcc-plugins
+++ b/scripts/Makefile.gcc-plugins
@@ -17,10 +17,17 @@ gcc-plugin-$(CONFIG_GCC_PLUGIN_RANDSTRUCT)	+= randomize_layout_plugin.so
 gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_RANDSTRUCT)	+= -DRANDSTRUCT_PLUGIN
 gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_RANDSTRUCT_PERFORMANCE)	+= -fplugin-arg-randomize_layout_plugin-performance-mode
 
+ifdef CONFIG_GCC_PLUGIN_ARM64_ROP_SHIELD
+gcc-plugin-y				+= arm64_rop_shield_plugin.so
+gcc-plugin-cflags-y			+= -DARM64_ROP_SHIELD_PLUGIN
+DISABLE_ARM64_ROP_SHIELD_PLUGIN		+= -fplugin-arg-arm64_rop_shield_plugin-disable
+endif
+
 GCC_PLUGINS_CFLAGS := $(strip $(addprefix -fplugin=$(objtree)/scripts/gcc-plugins/, $(gcc-plugin-y)) $(gcc-plugin-cflags-y))
 
 export GCC_PLUGINS_CFLAGS GCC_PLUGIN GCC_PLUGIN_SUBDIR
 export DISABLE_LATENT_ENTROPY_PLUGIN
+export DISABLE_ARM64_ROP_SHIELD_PLUGIN
 
 # sancov_plugin.so can be only in CFLAGS_KCOV because avoid duplication.
 GCC_PLUGINS_CFLAGS := $(filter-out %/sancov_plugin.so, $(GCC_PLUGINS_CFLAGS))
diff --git a/scripts/gcc-plugins/arm64_rop_shield_plugin.c b/scripts/gcc-plugins/arm64_rop_shield_plugin.c
new file mode 100644
index 000000000000..e87f4a9b9ab0
--- /dev/null
+++ b/scripts/gcc-plugins/arm64_rop_shield_plugin.c
@@ -0,0 +1,116 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018 Ard Biesheuvel <ard.biesheuvel@linaro.org>
+ */
+
+#include "gcc-common.h"
+
+__visible int plugin_is_GPL_compatible;
+
+static unsigned int arm64_rop_shield_execute(void)
+{
+	rtx_insn *insn;
+	rtx body, x, y;
+
+	for (insn = get_insns(); insn; insn = NEXT_INSN(insn)) {
+		if (JUMP_P(insn)) {
+			body = PATTERN(insn);
+
+			if (GET_CODE(body) != RETURN)
+				continue;
+
+			x = gen_rtx_ASM_OPERANDS(VOIDmode,
+						 "mov	x16, sp		   \n\t"
+						 "and	sp, x16, #~(1 << 55)",
+						 "",
+						 0,
+						 rtvec_alloc(0),
+						 rtvec_alloc(0),
+						 rtvec_alloc(0),
+						 UNKNOWN_LOCATION);
+			MEM_VOLATILE_P(x) = true;
+
+			/*
+			 * According to the AAPCS spec, x16 may only be used by
+			 * subroutine calls that are exposed via a jump/call
+			 * ELF relocation, and so the compiler may assume it is
+			 * preserved across a call to a function in the same
+			 * compilation unit. So mark x16 as clobbered
+			 * explicitly.
+			 */
+			y = gen_rtx_CLOBBER(VOIDmode, gen_rtx_REG(Pmode, 16));
+
+			emit_insn_before(gen_rtx_PARALLEL(VOIDmode,
+							  gen_rtvec(2, x, y)),
+					 insn);
+		}
+
+		if (CALL_P(insn)) {
+			rtx_insn *next;
+
+			/*
+			 * We can use x30 here without marking it as clobbered.
+			 * The bl instruction already clobbers it, and whether
+			 * we returned here via a plain 'ret' instruction or via
+			 * some other way is unspecified, so it is no longer
+			 * live when we get here.
+			 */
+			x = gen_rtx_ASM_OPERANDS(VOIDmode,
+						 "mov	x30, sp		   \n\t"
+						 "orr	sp, x30, #(1 << 55)",
+						 "",
+						 0,
+						 rtvec_alloc(0),
+						 rtvec_alloc(0),
+						 rtvec_alloc(0),
+						 UNKNOWN_LOCATION);
+			MEM_VOLATILE_P(x) = true;
+
+			next = NEXT_INSN(insn);
+			if (NOTE_P(next))
+				insn = next;
+
+			emit_insn_after(x, insn);
+		}
+	}
+	return 0;
+}
+
+#define PASS_NAME arm64_rop_shield
+
+#define NO_GATE
+#define TODO_FLAGS_FINISH TODO_dump_func
+#include "gcc-generate-rtl-pass.h"
+
+__visible int plugin_init(struct plugin_name_args *plugin_info,
+			  struct plugin_gcc_version *version)
+{
+	const struct plugin_argument *argv = plugin_info->argv;
+	int argc = plugin_info->argc;
+	bool enable = true;
+	int i;
+
+	if (!plugin_default_version_check(version, &gcc_version)) {
+		error(G_("incompatible gcc/plugin versions"));
+		return 1;
+	}
+
+	PASS_INFO(arm64_rop_shield, "shorten", 1, PASS_POS_INSERT_BEFORE);
+
+	for (i = 0; i < argc; i++) {
+		if (!strcmp(argv[i].key, "disable")) {
+			enable = false;
+			continue;
+		}
+		error(G_("unknown option '-fplugin-arg-%s-%s'"),
+		      plugin_info->base_name, argv[i].key);
+	}
+
+	if (!enable)
+		return 0;
+
+	register_callback(plugin_info->base_name, PLUGIN_PASS_MANAGER_SETUP,
+			  NULL, &arm64_rop_shield_pass_info);
+
+	return 0;
+}
-- 
2.18.0

* [RFC/PoC PATCH 3/3] arm64: enable ROP protection by clearing SP bit #55 across function returns
From: Ard Biesheuvel @ 2018-08-02 13:21 UTC (permalink / raw)
  To: linux-arm-kernel

ROP attacks rely on a large supply of so-called 'gadgets', which are
(in this context) short sequences of instructions ending in a stack
pop and a return instruction. An attacker exploits a stack overflow to
create a specially crafted stack frame, from which each gadget jumps
to the next by popping off the next gadget's address as a fake return
address, allowing non-trivial 'programs' to be executed by piecing
together a large number of such gadgets.

This attack vector relies heavily on the ability to jump to arbitrary
places in the code. If we can limit where a function can return to, it
becomes much more difficult to amass a collection of gadgets that
allows arbitrary attacks to be mounted.

So let's try to do so by clearing bit #55 in the stack pointer register
before returning from a function, and setting it again right after a
'bl' or 'blr' instruction. That way, jumping to arbitrary places in the
code and popping the next gadget's address becomes a lot more
complicated, since the stack pointer will not be valid after a function
return until the 'reset' sequence is executed (or an exception is
taken).
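
Concretely, at an instrumented call site the stack pointer only becomes
usable again once the restore sequence has run (a sketch based on the
sequences from the previous patch; 'some_function' is a placeholder):

	bl	some_function		// the callee's epilogue clears
					// bit #55 before its 'ret':
					//	mov	x16, sp
					//	and	sp, x16, #~(1 << 55)
	mov	x30, sp			// SP is not valid for stack accesses
	orr	sp, x30, #(1 << 55)	// until this sets bit #55 again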

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/Kconfig                 | 10 ++++++++++
 arch/arm64/include/asm/assembler.h |  9 +++++++++
 arch/arm64/kernel/entry.S          | 18 ++++++++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 42c090cf0292..4562af0250b9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1011,6 +1011,16 @@ config ARM64_SW_TTBR0_PAN
 	  zeroed area and reserved ASID. The user access routines
 	  restore the valid TTBR0_EL1 temporarily.
 
+config ARM64_ROP_SHIELD
+	bool "Enable basic ROP protection through the stack pointer sign bit"
+	depends on GCC_PLUGINS && VMAP_STACK
+	select GCC_PLUGIN_ARM64_ROP_SHIELD
+	help
+	  Enable protection against ROP attacks by clearing bit #55 in the
+	  stack pointer register across a function return.
+
+	  If paranoid, say Y here. If unsure, say N.
+
 menu "ARMv8.1 architectural features"
 
 config ARM64_HW_AFDBM
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 346ada4de48a..95d3ec98eb58 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -701,12 +701,21 @@ USER(\label, ic	ivau, \tmp2)			// invalidate I line PoU
 .Lyield_out_\@ :
 	.endm
 
+	.macro		unclobber_sp, tmp
+#ifdef CONFIG_ARM64_ROP_SHIELD
+	mov		\tmp, sp
+	orr		sp, \tmp, #(1 << 55)
+#endif
+	.endm
+
 	.macro		bl_c, target
 	bl		\target
+	unclobber_sp	x30
 	.endm
 
 	.macro		blr_c, reg
 	blr		\reg
+	unclobber_sp	x30
 	.endm
 
 #endif	/* __ASM_ASSEMBLER_H */
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index eba5b6b528ea..2adebca74f11 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -95,6 +95,9 @@ alternative_else_nop_endif
 	 */
 	add	sp, sp, x0			// sp' = sp + x0
 	sub	x0, sp, x0			// x0' = sp' - x0 = (sp + x0) - x0 = sp
+#ifdef CONFIG_ARM64_ROP_SHIELD
+	tbz	x0, #55, 1f
+#endif
 	tbnz	x0, #THREAD_SHIFT, 0f
 	sub	x0, sp, x0			// x0'' = sp' - x0' = (sp + x0) - sp = x0
 	sub	sp, sp, x0			// sp'' = sp' - x0 = (sp + x0) - x0 = sp
@@ -129,6 +132,21 @@ alternative_else_nop_endif
 	/* We were already on the overflow stack. Restore sp/x0 and carry on. */
 	sub	sp, sp, x0
 	mrs	x0, tpidrro_el0
+	b	el\()\el\()_\label
+
+#ifdef CONFIG_ARM64_ROP_SHIELD
+1:	/*
+	 * We have to do a little dance here to set bit 55 in the stack
+	 * pointer register without clobbering anything else.
+	 */
+	orr	x0, x0, #(1 << 55)
+	str	x1, [x0]
+	mov	x1, sp
+	mov	sp, x0
+	and	x0, x0, #~(1 << 55)
+	sub	x0, x1, x0
+	ldr	x1, [sp]
+#endif
 #endif
 	b	el\()\el\()_\label
 	.endm
-- 
2.18.0

* Re: [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
From: Florian Weimer @ 2018-08-06 10:07 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/02/2018 03:21 PM, Ard Biesheuvel wrote:
> The idea is that we can significantly limit the kernel's attack surface
> for ROP based attacks by clearing the stack pointer's sign bit before
> returning from a function, and setting it again right after proceeding
> from the [expected] return address. This should make it much more difficult
> to return to arbitrary gadgets, given that they rely on being chained to
> the next via a return address popped off the stack, and this is difficult
> when the stack pointer is invalid.

Doesn't this break stack unwinding?

Thanks,
Florian

* Re: [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
From: Ard Biesheuvel @ 2018-08-06 10:31 UTC (permalink / raw)
  To: linux-arm-kernel

On 6 August 2018 at 12:07, Florian Weimer <fweimer@redhat.com> wrote:
> On 08/02/2018 03:21 PM, Ard Biesheuvel wrote:
>>
>> The idea is that we can significantly limit the kernel's attack surface
>> for ROP based attacks by clearing the stack pointer's sign bit before
>> returning from a function, and setting it again right after proceeding
>> from the [expected] return address. This should make it much more
>> difficult
>> to return to arbitrary gadgets, given that they rely on being chained to
>> the next via a return address popped off the stack, and this is difficult
>> when the stack pointer is invalid.
>
>
> Doesn't this break stack unwinding?
>

Any exception that is taken between clearing and setting the SP bit
will first reset the bit. Since arm64 does not rely on hardware to
preserve the exception context, we can do this in code before
preserving the registers (which is why we need the 'little dance' in
the last patch).

So any time the stack unwinding code runs, the stack pointer should be valid.

* Re: [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
From: Robin Murphy @ 2018-08-06 13:55 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/08/18 14:21, Ard Biesheuvel wrote:
> This is a proof of concept I cooked up, primarily to trigger a discussion
> about whether there is a point to doing anything like this, and if there
> is, what the pitfalls are. Also, while I am not aware of any similar
> implementations, the idea is so simple that I would be surprised if nobody
> else thought of the same thing way before I did.

So, "TTBR0 PAN: Pointer Auth edition"? :P

> The idea is that we can significantly limit the kernel's attack surface
> for ROP based attacks by clearing the stack pointer's sign bit before
> returning from a function, and setting it again right after proceeding
> from the [expected] return address. This should make it much more difficult
> to return to arbitrary gadgets, given that they rely on being chained to
> the next via a return address popped off the stack, and this is difficult
> when the stack pointer is invalid.
> 
> Of course, 4 additional instructions per function return is not exactly
> for free, but they are just movs and adds, and leaf functions are
> disregarded unless they allocate a stack frame (this comes for free
> because simple_return insns are disregarded by the plugin)
> 
> Please shoot, preferably with better ideas ...

Actually, on the subject of PAN, shouldn't this at least have a very 
hard dependency on that? AFAICS without PAN clearing bit 55 of SP is 
effectively giving userspace direct control of the kernel stack (thanks 
to TBI). Ouch.

I wonder if there's a little more mileage in using "{add,sub} sp, sp, 
#1" sequences and relying on stack alignment exceptions instead, with 
the added bonus that that's about as low as the instruction-level 
overhead can get.
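
Something like this, presumably (my sketch of that idea, not code from
the series; it leans on SCTLR_EL1.SA making SP-relative accesses trap
whenever SP is not 16-byte aligned):

  ...
  add     sp, sp, #1      // before ret: SP is now misaligned, so any
  ret                     // SP-based load/store in a gadget faults
  ...
  bl      foo
  sub     sp, sp, #1      // right after the expected return address:
                          // realign SP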

Robin.

> 
> Ard Biesheuvel (3):
>    arm64: use wrapper macro for bl/blx instructions from asm code
>    gcc: plugins: add ROP shield plugin for arm64
>    arm64: enable ROP protection by clearing SP bit #55 across function
>      returns
> 
>   arch/Kconfig                                  |   4 +
>   arch/arm64/Kconfig                            |  10 ++
>   arch/arm64/include/asm/assembler.h            |  21 +++-
>   arch/arm64/kernel/entry-ftrace.S              |   6 +-
>   arch/arm64/kernel/entry.S                     | 104 +++++++++-------
>   arch/arm64/kernel/head.S                      |   4 +-
>   arch/arm64/kernel/probes/kprobes_trampoline.S |   2 +-
>   arch/arm64/kernel/sleep.S                     |   6 +-
>   drivers/firmware/efi/libstub/Makefile         |   3 +-
>   scripts/Makefile.gcc-plugins                  |   7 ++
>   scripts/gcc-plugins/arm64_rop_shield_plugin.c | 116 ++++++++++++++++++
>   11 files changed, 228 insertions(+), 55 deletions(-)
>   create mode 100644 scripts/gcc-plugins/arm64_rop_shield_plugin.c
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
  2018-08-06 13:55   ` Robin Murphy
@ 2018-08-06 14:04     ` Ard Biesheuvel
  -1 siblings, 0 replies; 46+ messages in thread
From: Ard Biesheuvel @ 2018-08-06 14:04 UTC (permalink / raw)
  To: linux-arm-kernel

On 6 August 2018 at 15:55, Robin Murphy <robin.murphy@arm.com> wrote:
> On 02/08/18 14:21, Ard Biesheuvel wrote:
>>
>> This is a proof of concept I cooked up, primarily to trigger a discussion
>> about whether there is a point to doing anything like this, and if there
>> is, what the pitfalls are. Also, while I am not aware of any similar
>> implementations, the idea is so simple that I would be surprised if nobody
>> else thought of the same thing way before I did.
>
>
> So, "TTBR0 PAN: Pointer Auth edition"? :P
>
>> The idea is that we can significantly limit the kernel's attack surface
>> for ROP based attacks by clearing the stack pointer's sign bit before
>> returning from a function, and setting it again right after proceeding
>> from the [expected] return address. This should make it much more
>> difficult
>> to return to arbitrary gadgets, given that they rely on being chained to
>> the next via a return address popped off the stack, and this is difficult
>> when the stack pointer is invalid.
>>
>> Of course, 4 additional instructions per function return is not exactly
>> for free, but they are just movs and adds, and leaf functions are
>> disregarded unless they allocate a stack frame (this comes for free
>> because simple_return insns are disregarded by the plugin)
>>
>> Please shoot, preferably with better ideas ...
>
>
> Actually, on the subject of PAN, shouldn't this at least have a very hard
> dependency on that? AFAICS without PAN clearing bit 55 of SP is effectively
> giving userspace direct control of the kernel stack (thanks to TBI). Ouch.
>

How's that? Bits 52 .. 54 will still be set, so SP will never contain
a valid userland address in any case. Or am I missing something?

> I wonder if there's a little  more mileage in using "{add,sub} sp, sp, #1"
> sequences to rely on stack alignment exceptions instead, with the added
> bonus that that's about as low as the instruction-level overhead can get.
>

Good point. I did consider that, but couldn't convince myself that it
isn't easier to defeat: loads via x29 occur reasonably often, and you
can simply offset your doctored stack frame by a single byte.


>>
>> Ard Biesheuvel (3):
>>    arm64: use wrapper macro for bl/blx instructions from asm code
>>    gcc: plugins: add ROP shield plugin for arm64
>>    arm64: enable ROP protection by clearing SP bit #55 across function
>>      returns
>>
>>   arch/Kconfig                                  |   4 +
>>   arch/arm64/Kconfig                            |  10 ++
>>   arch/arm64/include/asm/assembler.h            |  21 +++-
>>   arch/arm64/kernel/entry-ftrace.S              |   6 +-
>>   arch/arm64/kernel/entry.S                     | 104 +++++++++-------
>>   arch/arm64/kernel/head.S                      |   4 +-
>>   arch/arm64/kernel/probes/kprobes_trampoline.S |   2 +-
>>   arch/arm64/kernel/sleep.S                     |   6 +-
>>   drivers/firmware/efi/libstub/Makefile         |   3 +-
>>   scripts/Makefile.gcc-plugins                  |   7 ++
>>   scripts/gcc-plugins/arm64_rop_shield_plugin.c | 116 ++++++++++++++++++
>>   11 files changed, 228 insertions(+), 55 deletions(-)
>>   create mode 100644 scripts/gcc-plugins/arm64_rop_shield_plugin.c
>>
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
  2018-08-06 14:04     ` Ard Biesheuvel
@ 2018-08-06 15:20       ` Ard Biesheuvel
  -1 siblings, 0 replies; 46+ messages in thread
From: Ard Biesheuvel @ 2018-08-06 15:20 UTC (permalink / raw)
  To: linux-arm-kernel

On 6 August 2018 at 16:04, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 6 August 2018 at 15:55, Robin Murphy <robin.murphy@arm.com> wrote:
>> On 02/08/18 14:21, Ard Biesheuvel wrote:
>>>
>>> This is a proof of concept I cooked up, primarily to trigger a discussion
>>> about whether there is a point to doing anything like this, and if there
>>> is, what the pitfalls are. Also, while I am not aware of any similar
>>> implementations, the idea is so simple that I would be surprised if nobody
>>> else thought of the same thing way before I did.
>>
>>
>> So, "TTBR0 PAN: Pointer Auth edition"? :P
>>
>>> The idea is that we can significantly limit the kernel's attack surface
>>> for ROP based attacks by clearing the stack pointer's sign bit before
>>> returning from a function, and setting it again right after proceeding
>>> from the [expected] return address. This should make it much more
>>> difficult
>>> to return to arbitrary gadgets, given that they rely on being chained to
>>> the next via a return address popped off the stack, and this is difficult
>>> when the stack pointer is invalid.
>>>
>>> Of course, 4 additional instructions per function return is not exactly
>>> for free, but they are just movs and adds, and leaf functions are
>>> disregarded unless they allocate a stack frame (this comes for free
>>> because simple_return insns are disregarded by the plugin)
>>>
>>> Please shoot, preferably with better ideas ...
>>
>>
>> Actually, on the subject of PAN, shouldn't this at least have a very hard
>> dependency on that? AFAICS without PAN clearing bit 55 of SP is effectively
>> giving userspace direct control of the kernel stack (thanks to TBI). Ouch.
>>
>
> How's that? Bits 52 .. 54 will still be set, so SP will never contain
> a valid userland address in any case. Or am I missing something?
>
>> I wonder if there's a little  more mileage in using "{add,sub} sp, sp, #1"
>> sequences to rely on stack alignment exceptions instead, with the added
>> bonus that that's about as low as the instruction-level overhead can get.
>>
>
> Good point. I did consider that, but couldn't convince myself that it
> isn't easier to defeat: loads via x29 occur reasonably often, and you
> can simply offset your doctored stack frame by a single byte.
>

Also, the restore has to be idempotent: only functions that modify sp
set the bit, so it cannot be reset unconditionally. And when an
exception is taken in the middle, we'll return with the bit set even
if it was clear when the exception was taken.
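
To spell out the first point for the add/sub variant (my illustration,
not anything from the series): a call to a leaf function that was left
uninstrumented returns with SP untouched, so an unconditional
realignment at the call site would corrupt SP all by itself:

  bl      leaf_without_frame      // hypothetical leaf, nothing poisoned
  sub     sp, sp, #1              // the "restore" now misaligns a
                                  // perfectly good SP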

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
  2018-08-06 14:04     ` Ard Biesheuvel
@ 2018-08-06 15:38       ` Robin Murphy
  -1 siblings, 0 replies; 46+ messages in thread
From: Robin Murphy @ 2018-08-06 15:38 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/08/18 15:04, Ard Biesheuvel wrote:
> On 6 August 2018 at 15:55, Robin Murphy <robin.murphy@arm.com> wrote:
>> On 02/08/18 14:21, Ard Biesheuvel wrote:
>>>
>>> This is a proof of concept I cooked up, primarily to trigger a discussion
>>> about whether there is a point to doing anything like this, and if there
>>> is, what the pitfalls are. Also, while I am not aware of any similar
>>> implementations, the idea is so simple that I would be surprised if nobody
>>> else thought of the same thing way before I did.
>>
>>
>> So, "TTBR0 PAN: Pointer Auth edition"? :P
>>
>>> The idea is that we can significantly limit the kernel's attack surface
>>> for ROP based attacks by clearing the stack pointer's sign bit before
>>> returning from a function, and setting it again right after proceeding
>>> from the [expected] return address. This should make it much more
>>> difficult
>>> to return to arbitrary gadgets, given that they rely on being chained to
>>> the next via a return address popped off the stack, and this is difficult
>>> when the stack pointer is invalid.
>>>
>>> Of course, 4 additional instructions per function return is not exactly
>>> for free, but they are just movs and adds, and leaf functions are
>>> disregarded unless they allocate a stack frame (this comes for free
>>> because simple_return insns are disregarded by the plugin)
>>>
>>> Please shoot, preferably with better ideas ...
>>
>>
>> Actually, on the subject of PAN, shouldn't this at least have a very hard
>> dependency on that? AFAICS without PAN clearing bit 55 of SP is effectively
>> giving userspace direct control of the kernel stack (thanks to TBI). Ouch.
>>
> 
> How's that? Bits 52 .. 54 will still be set, so SP will never contain
> a valid userland address in any case. Or am I missing something?

Ah, yes, I'd managed to forget about the address hole, but I think that 
only makes it a bit trickier, rather than totally safe - it feels like 
you just need to chain one or two returns through "valid" targets until 
you can hit an epilogue with a "mov sp, x29" (at first glance there are 
a fair few of those in my vmlinux), after which we're back to the bit 55 
scheme alone giving no protection against retargeting the stack to a 
valid TTBR0 address.
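
The sort of epilogue in question looks roughly like this (a generic
frame-pointer epilogue shown for illustration, not quoted from any
particular vmlinux):

  mov     sp, x29                 // SP now comes straight from x29,
  ldp     x29, x30, [sp], #32     // which a doctored frame record controls
  ret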

>> I wonder if there's a little  more mileage in using "{add,sub} sp, sp, #1"
>> sequences to rely on stack alignment exceptions instead, with the added
>> bonus that that's about as low as the instruction-level overhead can get.
>>
> 
> Good point. I did consider that, but couldn't convince myself that it
> isn't easier to defeat: loads via x29 occur reasonably often, and you
> can simply offset your doctored stack frame by a single byte.

True; in theory there are 3072 possible unaligned offsets to choose 
from, but compile-time randomisation doesn't seem much use, and 
hotpatching just about every function call in the kernel isn't a nice 
thought either.

Robin.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
  2018-08-06 15:38       ` Robin Murphy
@ 2018-08-06 15:50         ` Ard Biesheuvel
  -1 siblings, 0 replies; 46+ messages in thread
From: Ard Biesheuvel @ 2018-08-06 15:50 UTC (permalink / raw)
  To: linux-arm-kernel

On 6 August 2018 at 17:38, Robin Murphy <robin.murphy@arm.com> wrote:
> On 06/08/18 15:04, Ard Biesheuvel wrote:
>>
>> On 6 August 2018 at 15:55, Robin Murphy <robin.murphy@arm.com> wrote:
>>>
>>> On 02/08/18 14:21, Ard Biesheuvel wrote:
>>>>
>>>>
>>>> This is a proof of concept I cooked up, primarily to trigger a
>>>> discussion
>>>> about whether there is a point to doing anything like this, and if there
>>>> is, what the pitfalls are. Also, while I am not aware of any similar
>>>> implementations, the idea is so simple that I would be surprised if
>>>> nobody
>>>> else thought of the same thing way before I did.
>>>
>>>
>>>
>>> So, "TTBR0 PAN: Pointer Auth edition"? :P
>>>
>>>> The idea is that we can significantly limit the kernel's attack surface
>>>> for ROP based attacks by clearing the stack pointer's sign bit before
>>>> returning from a function, and setting it again right after proceeding
>>>> from the [expected] return address. This should make it much more
>>>> difficult
>>>> to return to arbitrary gadgets, given that they rely on being chained to
>>>> the next via a return address popped off the stack, and this is
>>>> difficult
>>>> when the stack pointer is invalid.
>>>>
>>>> Of course, 4 additional instructions per function return is not exactly
>>>> for free, but they are just movs and adds, and leaf functions are
>>>> disregarded unless they allocate a stack frame (this comes for free
>>>> because simple_return insns are disregarded by the plugin)
>>>>
>>>> Please shoot, preferably with better ideas ...
>>>
>>>
>>>
>>> Actually, on the subject of PAN, shouldn't this at least have a very hard
>>> dependency on that? AFAICS without PAN clearing bit 55 of SP is
>>> effectively
>>> giving userspace direct control of the kernel stack (thanks to TBI).
>>> Ouch.
>>>
>>
>> How's that? Bits 52 .. 54 will still be set, so SP will never contain
>> a valid userland address in any case. Or am I missing something?
>
>
> Ah, yes, I'd managed to forget about the address hole, but I think that only
> makes it a bit trickier, rather than totally safe - it feels like you just
> need to chain one or two returns through "valid" targets until you can hit
> an epilogue with a "mov sp, x29" (at first glance there are a fair few of
> those in my vmlinux), after which we're back to the bit 55 scheme alone
> giving no protection against retargeting the stack to a valid TTBR0 address.
>

Wouldn't such an epilogue clear the SP bit before returning again?

>>> I wonder if there's a little  more mileage in using "{add,sub} sp, sp,
>>> #1"
>>> sequences to rely on stack alignment exceptions instead, with the added
>>> bonus that that's about as low as the instruction-level overhead can get.
>>>
>>
>> Good point. I did consider that, but couldn't convince myself that it
>> isn't easier to defeat: loads via x29 occur reasonably often, and you
>> can simply offset your doctored stack frame by a single byte.
>
>
> True; in theory there are 3072 possible unaligned offsets to choose from,
> but compile-time randomisation doesn't seem much use, and hotpatching just
> about every function call in the kernel isn't a nice thought either.
>
> Robin.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
  2018-08-06 15:50         ` Ard Biesheuvel
@ 2018-08-06 16:04           ` Ard Biesheuvel
  -1 siblings, 0 replies; 46+ messages in thread
From: Ard Biesheuvel @ 2018-08-06 16:04 UTC (permalink / raw)
  To: linux-arm-kernel

On 6 August 2018 at 17:50, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 6 August 2018 at 17:38, Robin Murphy <robin.murphy@arm.com> wrote:
>> On 06/08/18 15:04, Ard Biesheuvel wrote:
>>>
>>> On 6 August 2018 at 15:55, Robin Murphy <robin.murphy@arm.com> wrote:
>>>>
>>>> On 02/08/18 14:21, Ard Biesheuvel wrote:
>>>>>
>>>>>
>>>>> This is a proof of concept I cooked up, primarily to trigger a
>>>>> discussion
>>>>> about whether there is a point to doing anything like this, and if there
>>>>> is, what the pitfalls are. Also, while I am not aware of any similar
>>>>> implementations, the idea is so simple that I would be surprised if
>>>>> nobody
>>>>> else thought of the same thing way before I did.
>>>>
>>>>
>>>>
>>>> So, "TTBR0 PAN: Pointer Auth edition"? :P
>>>>
>>>>> The idea is that we can significantly limit the kernel's attack surface
>>>>> for ROP based attacks by clearing the stack pointer's sign bit before
>>>>> returning from a function, and setting it again right after proceeding
>>>>> from the [expected] return address. This should make it much more
>>>>> difficult
>>>>> to return to arbitrary gadgets, given that they rely on being chained to
>>>>> the next via a return address popped off the stack, and this is
>>>>> difficult
>>>>> when the stack pointer is invalid.
>>>>>
>>>>> Of course, 4 additional instructions per function return is not exactly
>>>>> for free, but they are just movs and adds, and leaf functions are
>>>>> disregarded unless they allocate a stack frame (this comes for free
>>>>> because simple_return insns are disregarded by the plugin)
>>>>>
>>>>> Please shoot, preferably with better ideas ...
>>>>
>>>>
>>>>
>>>> Actually, on the subject of PAN, shouldn't this at least have a very hard
>>>> dependency on that? AFAICS without PAN clearing bit 55 of SP is
>>>> effectively
>>>> giving userspace direct control of the kernel stack (thanks to TBI).
>>>> Ouch.
>>>>
>>>
>>> How's that? Bits 52 .. 54 will still be set, so SP will never contain
>>> a valid userland address in any case. Or am I missing something?
>>
>>
>> Ah, yes, I'd managed to forget about the address hole, but I think that only
>> makes it a bit trickier, rather than totally safe - it feels like you just
>> need to chain one or two returns through "valid" targets until you can hit
>> an epilogue with a "mov sp, x29" (at first glance there are a fair few of
>> those in my vmlinux), after which we're back to the bit 55 scheme alone
>> giving no protection against retargeting the stack to a valid TTBR0 address.
>>
>
> Wouldn't such an epilogue clear the SP bit before returning again?
>

... or are you saying you can play tricks and clear bits 52 .. 54 ? If
so, you can already do that, right? And apply it to bit 55 as well?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
  2018-08-06 16:04           ` Ard Biesheuvel
@ 2018-08-06 17:45             ` Robin Murphy
  -1 siblings, 0 replies; 46+ messages in thread
From: Robin Murphy @ 2018-08-06 17:45 UTC (permalink / raw)
  To: linux-arm-kernel



On 06/08/18 17:04, Ard Biesheuvel wrote:
> On 6 August 2018 at 17:50, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> On 6 August 2018 at 17:38, Robin Murphy <robin.murphy@arm.com> wrote:
>>> On 06/08/18 15:04, Ard Biesheuvel wrote:
>>>>
>>>> On 6 August 2018 at 15:55, Robin Murphy <robin.murphy@arm.com> wrote:
>>>>>
>>>>> On 02/08/18 14:21, Ard Biesheuvel wrote:
>>>>>>
>>>>>>
>>>>>> This is a proof of concept I cooked up, primarily to trigger a
>>>>>> discussion
>>>>>> about whether there is a point to doing anything like this, and if there
>>>>>> is, what the pitfalls are. Also, while I am not aware of any similar
>>>>>> implementations, the idea is so simple that I would be surprised if
>>>>>> nobody
>>>>>> else thought of the same thing way before I did.
>>>>>
>>>>>
>>>>>
>>>>> So, "TTBR0 PAN: Pointer Auth edition"? :P
>>>>>
>>>>>> The idea is that we can significantly limit the kernel's attack surface
>>>>>> for ROP based attacks by clearing the stack pointer's sign bit before
>>>>>> returning from a function, and setting it again right after proceeding
>>>>>> from the [expected] return address. This should make it much more
>>>>>> difficult
>>>>>> to return to arbitrary gadgets, given that they rely on being chained to
>>>>>> the next via a return address popped off the stack, and this is
>>>>>> difficult
>>>>>> when the stack pointer is invalid.
>>>>>>
>>>>>> Of course, 4 additional instructions per function return is not exactly
>>>>>> for free, but they are just movs and adds, and leaf functions are
>>>>>> disregarded unless they allocate a stack frame (this comes for free
>>>>>> because simple_return insns are disregarded by the plugin)
>>>>>>
>>>>>> Please shoot, preferably with better ideas ...
>>>>>
>>>>>
>>>>>
>>>>> Actually, on the subject of PAN, shouldn't this at least have a very hard
>>>>> dependency on that? AFAICS without PAN clearing bit 55 of SP is
>>>>> effectively
>>>>> giving userspace direct control of the kernel stack (thanks to TBI).
>>>>> Ouch.
>>>>>
>>>>
>>>> How's that? Bits 52 .. 54 will still be set, so SP will never contain
>>>> a valid userland address in any case. Or am I missing something?
>>>
>>>
>>> Ah, yes, I'd managed to forget about the address hole, but I think that only
>>> makes it a bit trickier, rather than totally safe - it feels like you just
>>> need to chain one or two returns through "valid" targets until you can hit
>>> an epilogue with a "mov sp, x29" (at first glance there are a fair few of
>>> those in my vmlinux), after which we're back to the bit 55 scheme alone
>>> giving no protection against retargeting the stack to a valid TTBR0 address.
>>>
>>
>> Wouldn't such an epilogue clear the SP bit before returning again?
>>
> 
> ... or are you saying you can play tricks and clear bits 52 .. 54 ? If
> so, you can already do that, right? And apply it to bit 55 as well?

Indeed, in this scenario clearing bit 55 immediately before the final 
ret does nothing because the "valid" return beforehand loaded x29 with 
an arbitrary userspace address from a doctored stack frame, so the rest 
of that epilogue beyond that first mov already ran off the fake stack.

Admittedly you might have to retain control of the "real" kernel stack 
and go through much the same dance if the gadget chain ever needs to 
pass through a real return target (to mitigate bit 55 being 
unconditionally set again to make an invalid TTBR1 address). Working 
around the mitigations certainly makes the exploit more difficult, but 
still seemingly far from impossible. And yes, AFAICS an attacker could 
indeed use the same SP-hijacking trick today (in the same absence of 
PAN), it's just that without any mitigations to prevent using the kernel 
stack alone I can't imagine it would be worth the extra complication.

I guess what I'm getting at is that if the protection mechanism is 
"always return with SP outside TTBR1", there seems little point in going 
through the motions if SP in TTBR0 could still be valid and allow an 
attack to succeed anyway; this is basically just me working through a 
justification for saying the proposed scheme needs "depends on ARM64_PAN 
|| ARM64_SW_TTBR0_PAN", making it that much uglier for v8.0 CPUs...

Robin.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
  2018-08-06 17:45             ` Robin Murphy
@ 2018-08-06 18:49               ` Kees Cook
  -1 siblings, 0 replies; 46+ messages in thread
From: Kees Cook @ 2018-08-06 18:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Aug 6, 2018 at 10:45 AM, Robin Murphy <robin.murphy@arm.com> wrote:
> I guess what I'm getting at is that if the protection mechanism is "always
> return with SP outside TTBR1", there seems little point in going through the
> motions if SP in TTBR0 could still be valid and allow an attack to succeed
> anyway; this is basically just me working through a justification for saying
> the proposed scheme needs "depends on ARM64_PAN || ARM64_SW_TTBR0_PAN",
> making it that much uglier for v8.0 CPUs...

I think anyone with v8.0 CPUs interested in this mitigation would also
very much want PAN emulation. If a "depends on" isn't desired, what
about "imply" in the Kconfig?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
  2018-08-06 18:49               ` Kees Cook
@ 2018-08-06 19:35                 ` Ard Biesheuvel
  -1 siblings, 0 replies; 46+ messages in thread
From: Ard Biesheuvel @ 2018-08-06 19:35 UTC (permalink / raw)
  To: linux-arm-kernel

On 6 August 2018 at 20:49, Kees Cook <keescook@chromium.org> wrote:
> On Mon, Aug 6, 2018 at 10:45 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>> I guess what I'm getting at is that if the protection mechanism is "always
>> return with SP outside TTBR1", there seems little point in going through the
>> motions if SP in TTBR0 could still be valid and allow an attack to succeed
>> anyway; this is basically just me working through a justification for saying
>> the proposed scheme needs "depends on ARM64_PAN || ARM64_SW_TTBR0_PAN",
>> making it that much uglier for v8.0 CPUs...
>
> I think anyone with v8.0 CPUs interested in this mitigation would also
> very much want PAN emulation. If a "depends on" isn't desired, what
> about "imply" in the Kconfig?
>

Yes, but actually, using bit #0 may be a better alternative in any
case. You can never dereference SP with bit #0 set, regardless of
whether the address points to user or kernel space. And my concern
about reloading sp from x29 doesn't really apply: x29 is always
assigned from sp right after pushing x29 and x30 in the function
prologue, and sp only gets restored from x29 in the epilogue when
there is a stack frame to begin with, in which case we add #1 to
sp again before returning from the function.

The other code gets a lot cleaner as well.

So for the return we'll have

  ldp     x29, x30, [sp], #nn
>>add     sp, sp, #0x1
  ret

and for the function call

  bl      <foo>
>>mov      x30, sp
>>bic     sp, x30, #1
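
Putting it together, a frame-pointer function under the bit #0 scheme
would look roughly like this (again a sketch rather than plugin output;
the 32-byte frame is made up):

  foo:
    stp     x29, x30, [sp, #-32]!   // SP is valid here: the caller's
    mov     x29, sp                 // fixup ran right after its bl
    ...
    mov     sp, x29                 // restoring SP from x29 is fine, x29
    ldp     x29, x30, [sp], #32     // was copied from a then-valid SP
    add     sp, sp, #0x1            // leave with bit #0 set
    ret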

The restore sequence in entry.S:96 (which has no spare registers) gets
much simpler as well:

--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -95,6 +95,15 @@ alternative_else_nop_endif
         */
        add     sp, sp, x0      // sp' = sp + x0
        sub     x0, sp, x0      // x0' = sp' - x0 = (sp + x0) - x0 = sp
+#ifdef CONFIG_ARM64_ROP_SHIELD
+       tbnz    x0, #0, 1f
+       .subsection     1
+1:     sub     x0, x0, #1
+       sub     sp, sp, #1
+       b       2f
+       .previous
+2:
+#endif
        tbnz    x0, #THREAD_SHIFT, 0f
        sub     x0, sp, x0      // x0'' = sp' - x0' = (sp + x0) - sp = x0
        sub     sp, sp, x0      // sp'' = sp' - x0 = (sp + x0) - x0 = sp
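
To convince yourself the fixup drops out of the existing arithmetic (my
working, for the non-overflow path): write sp0 and x0_0 for the values
at exception entry, with bit #0 of sp0 set by a poisoned return:

  add  sp, sp, x0      //  sp = sp0 + x0_0
  sub  x0, sp, x0      //  x0 = sp0                 (bit #0 still set)
  sub  x0, x0, #1      //  x0 = sp0 - 1             (fixup path)
  sub  sp, sp, #1      //  sp = sp0 + x0_0 - 1      (fixup path)
  ...                  //  tbnz-based overflow check
  sub  x0, sp, x0      //  x0 = x0_0                (caller's x0 restored)
  sub  sp, sp, x0      //  sp = sp0 - 1, i.e. the original SP with
                       //  bit #0 cleared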

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
  2018-08-06 19:35                 ` Ard Biesheuvel
@ 2018-08-06 19:50                   ` Kees Cook
  -1 siblings, 0 replies; 46+ messages in thread
From: Kees Cook @ 2018-08-06 19:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Aug 6, 2018 at 12:35 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 6 August 2018 at 20:49, Kees Cook <keescook@chromium.org> wrote:
>> On Mon, Aug 6, 2018 at 10:45 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>>> I guess what I'm getting at is that if the protection mechanism is "always
>>> return with SP outside TTBR1", there seems little point in going through the
>>> motions if SP in TTBR0 could still be valid and allow an attack to succeed
>>> anyway; this is basically just me working through a justification for saying
>>> the proposed scheme needs "depends on ARM64_PAN || ARM64_SW_TTBR0_PAN",
>>> making it that much uglier for v8.0 CPUs...
>>
>> I think anyone with v8.0 CPUs interested in this mitigation would also
>> very much want PAN emulation. If a "depends on" isn't desired, what
>> about "imply" in the Kconfig?
>>
>
> Yes, but actually, using bit #0 is maybe a better alternative in any
> case. You can never dereference SP with bit #0 set, regardless of
> whether the address points to user or kernel space, and my concern
> about reloading sp from x29 doesn't really make sense, given that x29
> is always assigned from sp right after pushing x29 and x30 in the
> function prologue, and sp only gets restored from x29 in the epilogue
> when there is a stack frame to begin with, in which case we add #1 to
> sp again before returning from the function.

Fair enough! :)

> The other code gets a lot cleaner as well.
>
> So for the return we'll have
>
>   ldp     x29, x30, [sp], #nn
>>>add     sp, sp, #0x1
>   ret
>
> and for the function call
>
>   bl      <foo>
>>>mov      x30, sp
>>>bic     sp, x30, #1
>
> The restore sequence in entry.s:96 (which has no spare registers) gets
> much simpler as well:
>
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -95,6 +95,15 @@ alternative_else_nop_endif
>          */
>         add     sp, sp, x0      // sp' = sp + x0
>         sub     x0, sp, x0      // x0' = sp' - x0 = (sp + x0) - x0 = sp
> +#ifdef CONFIG_ARM64_ROP_SHIELD
> +       tbnz    x0, #0, 1f
> +       .subsection     1
> +1:     sub     x0, x0, #1
> +       sub     sp, sp, #1
> +       b       2f
> +       .previous
> +2:
> +#endif
>         tbnz    x0, #THREAD_SHIFT, 0f
>         sub     x0, sp, x0      // x0'' = sp' - x0' = (sp + x0) - sp = x0
>         sub     sp, sp, x0      // sp'' = sp' - x0 = (sp + x0) - x0 = sp

I get slightly concerned about "add" vs "clear bit", but I don't see a
real way to chain a lot of "add"s to get to avoid the unaligned
access. Is "or" less efficient than "add"?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
  2018-08-06 19:50                   ` Kees Cook
@ 2018-08-06 19:54                     ` Ard Biesheuvel
  -1 siblings, 0 replies; 46+ messages in thread
From: Ard Biesheuvel @ 2018-08-06 19:54 UTC (permalink / raw)
  To: linux-arm-kernel

On 6 August 2018 at 21:50, Kees Cook <keescook@chromium.org> wrote:
> On Mon, Aug 6, 2018 at 12:35 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> On 6 August 2018 at 20:49, Kees Cook <keescook@chromium.org> wrote:
>>> On Mon, Aug 6, 2018 at 10:45 AM, Robin Murphy <robin.murphy@arm.com> wrote:
>>>> I guess what I'm getting at is that if the protection mechanism is "always
>>>> return with SP outside TTBR1", there seems little point in going through the
>>>> motions if SP in TTBR0 could still be valid and allow an attack to succeed
>>>> anyway; this is basically just me working through a justification for saying
>>>> the proposed scheme needs "depends on ARM64_PAN || ARM64_SW_TTBR0_PAN",
>>>> making it that much uglier for v8.0 CPUs...
>>>
>>> I think anyone with v8.0 CPUs interested in this mitigation would also
>>> very much want PAN emulation. If a "depends on" isn't desired, what
>>> about "imply" in the Kconfig?
>>>
>>
>> Yes, but actually, using bit #0 is maybe a better alternative in any
>> case. You can never dereference SP with bit #0 set, regardless of
>> whether the address points to user or kernel space, and my concern
>> about reloading sp from x29 doesn't really make sense, given that x29
>> is always assigned from sp right after pushing x29 and x30 in the
>> function prologue, and sp only gets restored from x29 in the epilogue
>> when there is a stack frame to begin with, in which case we add #1 to
>> sp again before returning from the function.
>
> Fair enough! :)
>
>> The other code gets a lot cleaner as well.
>>
>> So for the return we'll have
>>
>>   ldp     x29, x30, [sp], #nn
>>>>add     sp, sp, #0x1
>>   ret
>>
>> and for the function call
>>
>>   bl      <foo>
>>>>mov      x30, sp
>>>>bic     sp, x30, #1
>>
>> The restore sequence in entry.s:96 (which has no spare registers) gets
>> much simpler as well:
>>
>> --- a/arch/arm64/kernel/entry.S
>> +++ b/arch/arm64/kernel/entry.S
>> @@ -95,6 +95,15 @@ alternative_else_nop_endif
>>          */
>>         add     sp, sp, x0      // sp' = sp + x0
>>         sub     x0, sp, x0      // x0' = sp' - x0 = (sp + x0) - x0 = sp
>> +#ifdef CONFIG_ARM64_ROP_SHIELD
>> +       tbnz    x0, #0, 1f
>> +       .subsection     1
>> +1:     sub     x0, x0, #1
>> +       sub     sp, sp, #1
>> +       b       2f
>> +       .previous
>> +2:
>> +#endif
>>         tbnz    x0, #THREAD_SHIFT, 0f
>>         sub     x0, sp, x0      // x0'' = sp' - x0' = (sp + x0) - sp = x0
>>         sub     sp, sp, x0      // sp'' = sp' - x0 = (sp + x0) - x0 = sp
>
> I get slightly concerned about "add" vs "clear bit", but I don't see a
> real way to chain a lot of "add"s to get to avoid the unaligned
> access. Is "or" less efficient than "add"?
>

Yes. The stack pointer is special on arm64, and can only be used with
a limited set of ALU instructions. So orring #1 would involve 'mov
<reg>, sp ; orr sp, <reg>, #1' like in the 'bic' case above, which
requires a scratch register as well.
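
For comparison, a rough sketch of the two spellings (x16 is assumed to
be a free scratch register here):

  mov     x16, sp         // logical-immediate form: sp cannot be the source
  orr     sp, x16, #1     //   operand, so it has to go through a scratch register

  add     sp, sp, #0x1    // arithmetic form: operates on sp directly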

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
  2018-08-06 19:54                     ` Ard Biesheuvel
  (?)
@ 2018-08-07  3:05                     ` Mark Brand
  2018-08-07  9:21                         ` Ard Biesheuvel
  -1 siblings, 1 reply; 46+ messages in thread
From: Mark Brand @ 2018-08-07  3:05 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Catalin Marinas, Christoffer Dall, Julien Thierry, Kees Cook,
	Kernel Hardening, Laura Abbott, Mark Rutland, Robin Murphy,
	Will Deacon, linux-arm-kernel



I think the phrasing of "limit kernel attack surface against ROP attacks"
is confusing and misleading. ROP does not describe a class of bugs,
vulnerabilities or attacks against the kernel - it's just one of many
code-reuse techniques that can be used by an attacker while exploiting a
vulnerability. But that's kind of off-topic!

I think what this thread is talking about is implementing extremely
coarse-grained reverse-edge control-flow-integrity, in that a return can
only return to the address following a legitimate call, but it can return
to any of those.

I suspect there's not much benefit to this, since (as far as I can see) the
assumption is that an attacker has the means to direct flow of execution as
far as taking complete control of the (el1) stack before executing any ROP
payload.

At that point, I think it's highly unlikely an attacker needs to chain
gadgets through return instructions at all - I suspect there are a few
places in the kernel where it is necessary to load the entire register
context from a register that is not the stack pointer, and it would likely
not be more than a minor inconvenience to an attacker to use these (and
chaining through branch register) instructions instead of chaining through
return instructions.

I'd have to take a closer look at an arm64 kernel image to be sure though -
I'll do that when I get a chance and update...

Regards,
Mark

On Mon, 6 Aug 2018 at 19:28, Ard Biesheuvel <ard.biesheuvel@linaro.org>
wrote:

> On 6 August 2018 at 21:50, Kees Cook <keescook@chromium.org> wrote:
> > On Mon, Aug 6, 2018 at 12:35 PM, Ard Biesheuvel
> > <ard.biesheuvel@linaro.org> wrote:
> >> On 6 August 2018 at 20:49, Kees Cook <keescook@chromium.org> wrote:
> >>> On Mon, Aug 6, 2018 at 10:45 AM, Robin Murphy <robin.murphy@arm.com>
> wrote:
> >>>> I guess what I'm getting at is that if the protection mechanism is
> "always
> >>>> return with SP outside TTBR1", there seems little point in going
> through the
> >>>> motions if SP in TTBR0 could still be valid and allow an attack to
> succeed
> >>>> anyway; this is basically just me working through a justification for
> saying
> >>>> the proposed scheme needs "depends on ARM64_PAN ||
> ARM64_SW_TTBR0_PAN",
> >>>> making it that much uglier for v8.0 CPUs...
> >>>
> >>> I think anyone with v8.0 CPUs interested in this mitigation would also
> >>> very much want PAN emulation. If a "depends on" isn't desired, what
> >>> about "imply" in the Kconfig?
> >>>
> >>
> >> Yes, but actually, using bit #0 is maybe a better alternative in any
> >> case. You can never dereference SP with bit #0 set, regardless of
> >> whether the address points to user or kernel space, and my concern
> >> about reloading sp from x29 doesn't really make sense, given that x29
> >> is always assigned from sp right after pushing x29 and x30 in the
> >> function prologue, and sp only gets restored from x29 in the epilogue
> >> when there is a stack frame to begin with, in which case we add #1 to
> >> sp again before returning from the function.
> >
> > Fair enough! :)
> >
> >> The other code gets a lot cleaner as well.
> >>
> >> So for the return we'll have
> >>
> >>   ldp     x29, x30, [sp], #nn
> >>>>add     sp, sp, #0x1
> >>   ret
> >>
> >> and for the function call
> >>
> >>   bl      <foo>
> >>>>mov      x30, sp
> >>>>bic     sp, x30, #1
> >>
> >> The restore sequence in entry.s:96 (which has no spare registers) gets
> >> much simpler as well:
> >>
> >> --- a/arch/arm64/kernel/entry.S
> >> +++ b/arch/arm64/kernel/entry.S
> >> @@ -95,6 +95,15 @@ alternative_else_nop_endif
> >>          */
> >>         add     sp, sp, x0      // sp' = sp + x0
> >>         sub     x0, sp, x0      // x0' = sp' - x0 = (sp + x0) - x0 = sp
> >> +#ifdef CONFIG_ARM64_ROP_SHIELD
> >> +       tbnz    x0, #0, 1f
> >> +       .subsection     1
> >> +1:     sub     x0, x0, #1
> >> +       sub     sp, sp, #1
> >> +       b       2f
> >> +       .previous
> >> +2:
> >> +#endif
> >>         tbnz    x0, #THREAD_SHIFT, 0f
> >>         sub     x0, sp, x0      // x0'' = sp' - x0' = (sp + x0) - sp = x0
> >>         sub     sp, sp, x0      // sp'' = sp' - x0 = (sp + x0) - x0 = sp
> >
> > I get slightly concerned about "add" vs "clear bit", but I don't see a
> > real way to chain a lot of "add"s to get to avoid the unaligned
> > access. Is "or" less efficient than "add"?
> >
>
> Yes. The stack pointer is special on arm64, and can only be used with
> a limited set of ALU instructions. So orring #1 would involve 'mov
> <reg>, sp ; orr sp, <reg>, #1' like in the 'bic' case above, which
> requires a scratch register as well.
>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
  2018-08-07  3:05                     ` Mark Brand
@ 2018-08-07  9:21                         ` Ard Biesheuvel
  0 siblings, 0 replies; 46+ messages in thread
From: Ard Biesheuvel @ 2018-08-07  9:21 UTC (permalink / raw)
  To: linux-arm-kernel

On 7 August 2018 at 05:05, Mark Brand <markbrand@google.com> wrote:
> I think the phrasing of "limit kernel attack surface against ROP attacks" is
> confusing and misleading. ROP does not describe a class of bugs,
> vulnerabilities or attacks against the kernel - it's just one of many
> code-reuse techniques that can be used by an attacker while exploiting a
> vulnerability. But that's kind of off-topic!
>
> I think what this thread is talking about is implementing extremely
> coarse-grained reverse-edge control-flow-integrity, in that a return can
> only return to the address following a legitimate call, but it can return to
> any of those.
>

Indeed. Apologies for not mastering the lingo, but it is indeed about
no longer being able to subvert function returns into jumping to
arbitrary places in the code.

> I suspect there's not much benefit to this, since (as far as I can see) the
> assumption is that an attacker has the means to direct flow of execution as
> far as taking complete control of the (el1) stack before executing any ROP
> payload.
>
> At that point, I think it's highly unlikely an attacker needs to chain
> gadgets through return instructions at all - I suspect there are a few
> places in the kernel where it is necessary to load the entire register
> context from a register that is not the stack pointer, and it would likely
> not be more than a minor inconvenience to an attacker to use these (and
> chaining through branch register) instructions instead of chaining through
> return instructions.
>
> I'd have to take a closer look at an arm64 kernel image to be sure though -
> I'll do that when I get a chance and update...
>

Thanks. Reloading all registers from an arbitrary offset register
should occur rarely, no? Could we work around that?

> On Mon, 6 Aug 2018 at 19:28, Ard Biesheuvel <ard.biesheuvel@linaro.org>
> wrote:
>>
>> On 6 August 2018 at 21:50, Kees Cook <keescook@chromium.org> wrote:
>> > On Mon, Aug 6, 2018 at 12:35 PM, Ard Biesheuvel
>> > <ard.biesheuvel@linaro.org> wrote:
>> >> On 6 August 2018 at 20:49, Kees Cook <keescook@chromium.org> wrote:
>> >>> On Mon, Aug 6, 2018 at 10:45 AM, Robin Murphy <robin.murphy@arm.com>
>> >>> wrote:
>> >>>> I guess what I'm getting at is that if the protection mechanism is
>> >>>> "always
>> >>>> return with SP outside TTBR1", there seems little point in going
>> >>>> through the
>> >>>> motions if SP in TTBR0 could still be valid and allow an attack to
>> >>>> succeed
>> >>>> anyway; this is basically just me working through a justification for
>> >>>> saying
>> >>>> the proposed scheme needs "depends on ARM64_PAN ||
>> >>>> ARM64_SW_TTBR0_PAN",
>> >>>> making it that much uglier for v8.0 CPUs...
>> >>>
>> >>> I think anyone with v8.0 CPUs interested in this mitigation would also
>> >>> very much want PAN emulation. If a "depends on" isn't desired, what
>> >>> about "imply" in the Kconfig?
>> >>>
>> >>
>> >> Yes, but actually, using bit #0 is maybe a better alternative in any
>> >> case. You can never dereference SP with bit #0 set, regardless of
>> >> whether the address points to user or kernel space, and my concern
>> >> about reloading sp from x29 doesn't really make sense, given that x29
>> >> is always assigned from sp right after pushing x29 and x30 in the
>> >> function prologue, and sp only gets restored from x29 in the epilogue
>> >> when there is a stack frame to begin with, in which case we add #1 to
>> >> sp again before returning from the function.
>> >
>> > Fair enough! :)
>> >
>> >> The other code gets a lot cleaner as well.
>> >>
>> >> So for the return we'll have
>> >>
>> >>   ldp     x29, x30, [sp], #nn
>> >>>>add     sp, sp, #0x1
>> >>   ret
>> >>
>> >> and for the function call
>> >>
>> >>   bl      <foo>
>> >>>>mov      x30, sp
>> >>>>bic     sp, x30, #1
>> >>
>> >> The restore sequence in entry.s:96 (which has no spare registers) gets
>> >> much simpler as well:
>> >>
>> >> --- a/arch/arm64/kernel/entry.S
>> >> +++ b/arch/arm64/kernel/entry.S
>> >> @@ -95,6 +95,15 @@ alternative_else_nop_endif
>> >>          */
>> >>         add     sp, sp, x0      // sp' = sp + x0
>> >>         sub     x0, sp, x0      // x0' = sp' - x0 = (sp + x0) - x0 = sp
>> >> +#ifdef CONFIG_ARM64_ROP_SHIELD
>> >> +       tbnz    x0, #0, 1f
>> >> +       .subsection     1
>> >> +1:     sub     x0, x0, #1
>> >> +       sub     sp, sp, #1
>> >> +       b       2f
>> >> +       .previous
>> >> +2:
>> >> +#endif
>> >>         tbnz    x0, #THREAD_SHIFT, 0f
>> >>         sub     x0, sp, x0      // x0'' = sp' - x0' = (sp + x0) - sp =
>> >> x0
>> >>         sub     sp, sp, x0      // sp'' = sp' - x0 = (sp + x0) - x0 =
>> >> sp
>> >
>> > I get slightly concerned about "add" vs "clear bit", but I don't see a
>> > real way to chain a lot of "add"s to get to avoid the unaligned
>> > access. Is "or" less efficient than "add"?
>> >
>>
>> Yes. The stack pointer is special on arm64, and can only be used with
>> a limited set of ALU instructions. So orring #1 would involve 'mov
>> <reg>, sp ; orr sp, <reg>, #1' like in the 'bic' case above, which
>> requires a scratch register as well.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
  2018-08-07  9:21                         ` Ard Biesheuvel
@ 2018-08-08 16:09                           ` Mark Brand
  -1 siblings, 0 replies; 46+ messages in thread
From: Mark Brand @ 2018-08-08 16:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Aug 7, 2018 at 2:22 AM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
> On 7 August 2018 at 05:05, Mark Brand <markbrand@google.com> wrote:
> > I think the phrasing of "limit kernel attack surface against ROP attacks" is
> > confusing and misleading. ROP does not describe a class of bugs,
> > vulnerabilities or attacks against the kernel - it's just one of many
> > code-reuse techniques that can be used by an attacker while exploiting a
> > vulnerability. But that's kind of off-topic!
> >
> > I think what this thread is talking about is implementing extremely
> > coarse-grained reverse-edge control-flow-integrity, in that a return can
> > only return to the address following a legitimate call, but it can return to
> > any of those.
> >
>
> Indeed. Apologies for not mastering the lingo, but it is indeed about
> no longer being able to subvert function returns into jumping to
> arbitrary places in the code.
>
> > I suspect there's not much benefit to this, since (as far as I can see) the
> > assumption is that an attacker has the means to direct flow of execution as
> > far as taking complete control of the (el1) stack before executing any ROP
> > payload.
> >
> > At that point, I think it's highly unlikely an attacker needs to chain
> > gadgets through return instructions at all - I suspect there are a few
> > places in the kernel where it is necessary to load the entire register
> > context from a register that is not the stack pointer, and it would likely
> > not be more than a minor inconvenience to an attacker to use these (and
> > chaining through branch register) instructions instead of chaining through
> > return instructions.
> >
> > I'd have to take a closer look at an arm64 kernel image to be sure though -
> > I'll do that when I get a chance and update...
> >
>
> Thanks. Reloading all registers from an arbitrary offset register
> should occur rarely, no? Could we work around that?

 I forgot about the gmail-html-by-default... Hopefully everyone else
can read the quotes though :-/.

I took a look and put together an example ROP chain that doesn't use
any return instructions you could instrument, and that calls an
arbitrary kernel function with controlled parameters (at least x0 -
x4; getting full register control would probably require fixing up
some alignment and repeating the last gadget). It assumes that the
attacker controls the memory pointed to by x0 at the point where they
gain control of pc, and that they know where that memory is located
(it would also work if they merely controlled the memory pointed to by
x0 and had another chunk of kernel memory under their control at a
known address). That seems like a pretty reasonable starting
assumption, and I'm sure anyone with a little motivation could produce
similar chains for other starting conditions; this just seemed the
most likely set of reasonable conditions to me.

There are two basic principles used here -

(1) chaining through the mempool_free function, I found this really
quickly when searching for useful gadgets based off x0

void mempool_free(void *element, mempool_t *pool)
{
  unsigned long flags;

  if (unlikely(element == NULL))
    return;

  /* snip */
  smp_rmb();

  /* snip */
  if (unlikely(pool->curr_nr < pool->min_nr)) {
    spin_lock_irqsave(&pool->lock, flags);
    if (likely(pool->curr_nr < pool->min_nr)) {
      add_element(pool, element);
      spin_unlock_irqrestore(&pool->lock, flags);
      wake_up(&pool->wait);
      return;
    }
    spin_unlock_irqrestore(&pool->lock, flags);
  }

  pool->free(element, pool->pool_data);
}

Since the callsites for this function usually load the arguments
through some registers, and the function to call gets pulled out of
one of those arguments, it's easy to get a couple of registers loaded
here and then the chain continues.

(2) loading complete register state using kernel_exit macro.

Since the kernel_exit macro actually loads spsr_el1 and elr_el1 from
registers, I think that you can let the eret return to anywhere in el1
without dropping to el0, since the same handler is used for "exiting
the kernel" when a hardware interrupt interrupts the kernel itself. I
didn't fill out the necessary register values in the chain below,
since I don't anyway have a device around to test this on right now.

I'm not sure that you could really robustly protect this eret; I
suppose that you could try and somehow validate the saved register
state, but given that it would be happening on every exception return,
I suspect it would be expensive.

0:dispatch_io + yy (mempool_free gadget, appears in plenty of other places.)
ffffff8008a340d4  084c41a9   ldp     x8, x19, [x0, #0x10]
ffffff8008a340d8  190040f9   ldr     x25, [x0]
ffffff8008a340dc  1a1040f9   ldr     x26, [x0, #0x20]
ffffff8008a340e0  010140f9   ldr     x1, [x8]
ffffff8008a340e4  ed80dd97   bl      mempool_free

  mempool_free:
  ffffff8008194498  f44fbea9   stp     x20, x19, [sp, #-0x20 {__saved_x20} {__saved_x19}]!
  ffffff800819449c  fd7b01a9   stp     x29, x30, [sp, #0x10 {__saved_x29} {__saved_x30}]
  ffffff80081944a0  fd430091   add     x29, sp, #0x10 {__saved_x29}
  ffffff80081944a4  f30301aa   mov     x19, x1
  ffffff80081944a8  f40300aa   mov     x20, x0
  ffffff80081944ac  340100b4   cbz     x20, 0xffffff80081944d0

  ffffff80081944b0  bf3903d5   dmb     ishld
  ffffff80081944b4  68a64029   ldp     w8, w9, [x19, #0x4]
  ffffff80081944b8  3f01086b   cmp     w9, w8
  ffffff80081944bc  0b010054   b.lt    0xffffff80081944dc

  ffffff80081944c0  681640f9   ldr     x8, [x19, #0x28]
  ffffff80081944c4  610e40f9   ldr     x1, [x19, #0x18]
  ffffff80081944c8  e00314aa   mov     x0, x20
  ffffff80081944cc  00013fd6   blr     x8

  ffffff80081944d0  fd7b41a9   ldp     x29, x30, [sp, #0x10 {__saved_x29} {__saved_x30}]
  ffffff80081944d4  f44fc2a8   ldp     x20, x19, [sp {__saved_x20} {__saved_x19}], #0x20
  ffffff80081944d8  c0035fd6   ret

ffffff8008a340e8  e00319aa   mov     x0, x25
ffffff8008a340ec  e1031aaa   mov     x1, x26
ffffff8008a340f0  60023fd6   blr     x19

1:el1_irq + xx - (x1, x26) -> sp control
ffffff800808314c  5f030091   mov     sp, x26
ffffff8008083150  fd4fbfa9   stp     x29, x19, [sp, #-0x10]! {__saved_x0}
ffffff8008083154  fd030091   mov     x29, sp
ffffff8008083158  20003fd6   blr     x1

2:ipc_log_extract + xx (sp, x19) -> survival
ffffff800817c35c  e0c30091   add     x0, sp, #0x30 {var_170}
ffffff800817c360  e1430091   add     x1, sp, #0x10 {var_190}
ffffff800817c364  60023fd6   blr     x19

3:dispatch_io + xx (mempool_free gadget, appears in plenty of other places.)
ffffff8008a342cc  084c41a9   ldp     x8, x19, [x0, #0x10]
ffffff8008a342d0  140040f9   ldr     x20, [x0]
ffffff8008a342d4  151040f9   ldr     x21, [x0, #0x20]
ffffff8008a342d8  010140f9   ldr     x1, [x8]
ffffff8008a342dc  6f80dd97   bl      mempool_free

  mempool_free:
  ffffff8008194498  f44fbea9   stp     x20, x19, [sp, #-0x20 {__saved_x20} {__saved_x19}]!
  ffffff800819449c  fd7b01a9   stp     x29, x30, [sp, #0x10 {__saved_x29} {__saved_x30}]
  ffffff80081944a0  fd430091   add     x29, sp, #0x10 {__saved_x29}
  ffffff80081944a4  f30301aa   mov     x19, x1
  ffffff80081944a8  f40300aa   mov     x20, x0
  ffffff80081944ac  340100b4   cbz     x20, 0xffffff80081944d0

  ffffff80081944b0  bf3903d5   dmb     ishld
  ffffff80081944b4  68a64029   ldp     w8, w9, [x19, #0x4]
  ffffff80081944b8  3f01086b   cmp     w9, w8
  ffffff80081944bc  0b010054   b.lt    0xffffff80081944dc

  ffffff80081944c0  681640f9   ldr     x8, [x19, #0x28]
  ffffff80081944c4  610e40f9   ldr     x1, [x19, #0x18]
  ffffff80081944c8  e00314aa   mov     x0, x20
  ffffff80081944cc  00013fd6   blr     x8

  ffffff80081944d0  fd7b41a9   ldp     x29, x30, [sp, #0x10 {__saved_x29} {__saved_x30}]
  ffffff80081944d4  f44fc2a8   ldp     x20, x19, [sp {__saved_x20} {__saved_x19}], #0x20
  ffffff80081944d8  c0035fd6   ret

ffffff8008a342e0  e00314aa   mov     x0, x20
ffffff8008a342e4  e10315aa   mov     x1, x21
ffffff8008a342e8  60023fd6   blr     x19

4:bus_sort_breadthfirst + xx - (x26)
ffffff8008683cc8  561740f9   ldr     x22, [x26, #0x28]
ffffff8008683ccc  e00315aa   mov     x0, x21
ffffff8008683cd0  e10316aa   mov     x1, x22
ffffff8008683cd4  80023fd6   blr     x20

5:kernel_exit (macro) - (x21, x22, sp) -> full register control & pc control
ffffff8008082f64  354018d5   msr     elr_el1, x21
ffffff8008082f68  164018d5   msr     spsr_el1, x22
ffffff8008082f6c  e00740a9   ldp     x0, x1, [sp {var_130} {var_128}]
ffffff8008082f70  e20f41a9   ldp     x2, x3, [sp, #0x10 {var_120} {var_118}]
ffffff8008082f74  e41742a9   ldp     x4, x5, [sp, #0x20 {var_110} {var_108}]
ffffff8008082f78  e61f43a9   ldp     x6, x7, [sp, #0x30 {var_100} {var_f8}]
ffffff8008082f7c  e82744a9   ldp     x8, x9, [sp, #0x40 {var_f0} {var_e8}]
ffffff8008082f80  ea2f45a9   ldp     x10, x11, [sp, #0x50 {var_e0} {var_d8}]
ffffff8008082f84  ec3746a9   ldp     x12, x13, [sp, #0x60 {var_d0} {var_c8}]
ffffff8008082f88  ee3f47a9   ldp     x14, x15, [sp, #0x70 {var_c0} {var_b8}]
ffffff8008082f8c  f04748a9   ldp     x16, x17, [sp, #0x80 {var_b0} {var_a8}]
ffffff8008082f90  f24f49a9   ldp     x18, x19, [sp, #0x90 {var_a0} {var_98}]
ffffff8008082f94  f4574aa9   ldp     x20, x21, [sp, #0xa0 {var_90} {var_88}]
ffffff8008082f98  f65f4ba9   ldp     x22, x23, [sp, #0xb0 {var_80} {var_78}]
ffffff8008082f9c  f8674ca9   ldp     x24, x25, [sp, #0xc0 {var_70} {var_68}]
ffffff8008082fa0  fa6f4da9   ldp     x26, x27, [sp, #0xd0 {var_60} {var_58}]
ffffff8008082fa4  fc774ea9   ldp     x28, x29, [sp, #0xe0 {var_50} {var_48}]
ffffff8008082fa8  fe7b40f9   ldr     x30, [sp, #0xf0 {var_40}]
ffffff8008082fac  ffc30491   add     sp, sp, #0x130
ffffff8008082fb0  e0039fd6   eret


ptr = 0000414100000000 = initial x0

0000: 2525252525252525 ; (0:40d8) x25
0008: 0000414100000030 ; (0:40e0) x1
0010: 0000414100000000 ; (0:40d4) x8
0018: ffffff8008a342cc ; (0:40d4) x19 -> branch target (2:c364)
0020: 0000414100000070 ; (0:40dc) x26 -> sp (1:314c)
0028:
0030: 8888888899999999 ; (0:44b4) w8, w9
0038:
0040:
0048: ffffff800817c35c ; (0:44c4) x1 -> branch target (1:3158)
0050:
0058: ffffff800808314c ; (0:44c0) x8 -> branch target (0:44c4)
0060: xxxxxxxxxxxxxxxx ; saved x29                               <-- sp@(1:3154)
0068: xxxxxxxxxxxxxxxx ; saved x19
0070:                  ;                                         <-- sp@(1:314c), (5:2f64)
0078:
0080:
0088:
0090:
0098: 2222222222222222 ; (4:3cc8) x22 -> spsr_el1
00a0: ffffff8008082f64 ; (3:42d0) x20 -> branch target (4:3cd4)  <-- x0@(2:c35c)
00a8: 00004141000000d0 ; (3:42d8) x1
00b0: 00004141000000a0 ; (3:42cc) x8
00b8: 1919191919191919 ; (3:42cc) x19
00c0: 2121212121212121 ; (3:42d4) x21 -> elr_el1
00c8:
00d0: 8888888899999999 ; (3:44b4) w8, w9
00d8:
00e0:
00e8: 1111111111111111 ; (3:44c4) x1
00f0:
00f8: ffffff800808314c ; (3:44c0) x8 -> branch target (3:44c4)
>
> > On Mon, 6 Aug 2018 at 19:28, Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > wrote:
> >>
> >> On 6 August 2018 at 21:50, Kees Cook <keescook@chromium.org> wrote:
> >> > On Mon, Aug 6, 2018 at 12:35 PM, Ard Biesheuvel
> >> > <ard.biesheuvel@linaro.org> wrote:
> >> >> On 6 August 2018 at 20:49, Kees Cook <keescook@chromium.org> wrote:
> >> >>> On Mon, Aug 6, 2018 at 10:45 AM, Robin Murphy <robin.murphy@arm.com>
> >> >>> wrote:
> >> >>>> I guess what I'm getting at is that if the protection mechanism is
> >> >>>> "always
> >> >>>> return with SP outside TTBR1", there seems little point in going
> >> >>>> through the
> >> >>>> motions if SP in TTBR0 could still be valid and allow an attack to
> >> >>>> succeed
> >> >>>> anyway; this is basically just me working through a justification for
> >> >>>> saying
> >> >>>> the proposed scheme needs "depends on ARM64_PAN ||
> >> >>>> ARM64_SW_TTBR0_PAN",
> >> >>>> making it that much uglier for v8.0 CPUs...
> >> >>>
> >> >>> I think anyone with v8.0 CPUs interested in this mitigation would also
> >> >>> very much want PAN emulation. If a "depends on" isn't desired, what
> >> >>> about "imply" in the Kconfig?
> >> >>>
> >> >>
> >> >> Yes, but actually, using bit #0 is maybe a better alternative in any
> >> >> case. You can never dereference SP with bit #0 set, regardless of
> >> >> whether the address points to user or kernel space, and my concern
> >> >> about reloading sp from x29 doesn't really make sense, given that x29
> >> >> is always assigned from sp right after pushing x29 and x30 in the
> >> >> function prologue, and sp only gets restored from x29 in the epilogue
> >> >> when there is a stack frame to begin with, in which case we add #1 to
> >> >> sp again before returning from the function.
> >> >
> >> > Fair enough! :)
> >> >
> >> >> The other code gets a lot cleaner as well.
> >> >>
> >> >> So for the return we'll have
> >> >>
> >> >>   ldp     x29, x30, [sp], #nn
> >> >>>>add     sp, sp, #0x1
> >> >>   ret
> >> >>
> >> >> and for the function call
> >> >>
> >> >>   bl      <foo>
> >> >>>>mov      x30, sp
> >> >>>>bic     sp, x30, #1
> >> >>
> >> >> The restore sequence in entry.s:96 (which has no spare registers) gets
> >> >> much simpler as well:
> >> >>
> >> >> --- a/arch/arm64/kernel/entry.S
> >> >> +++ b/arch/arm64/kernel/entry.S
> >> >> @@ -95,6 +95,15 @@ alternative_else_nop_endif
> >> >>          */
> >> >>         add     sp, sp, x0      // sp' = sp + x0
> >> >>         sub     x0, sp, x0      // x0' = sp' - x0 = (sp + x0) - x0 = sp
> >> >> +#ifdef CONFIG_ARM64_ROP_SHIELD
> >> >> +       tbnz    x0, #0, 1f
> >> >> +       .subsection     1
> >> >> +1:     sub     x0, x0, #1
> >> >> +       sub     sp, sp, #1
> >> >> +       b       2f
> >> >> +       .previous
> >> >> +2:
> >> >> +#endif
> >> >>         tbnz    x0, #THREAD_SHIFT, 0f
> >> >>         sub     x0, sp, x0      // x0'' = sp' - x0' = (sp + x0) - sp = x0
> >> >>         sub     sp, sp, x0      // sp'' = sp' - x0 = (sp + x0) - x0 = sp
> >> >
> >> > I get slightly concerned about "add" vs "clear bit", but I don't see a
> >> > real way to chain a lot of "add"s to get to avoid the unaligned
> >> > access. Is "or" less efficient than "add"?
> >> >
> >>
> >> Yes. The stack pointer is special on arm64, and can only be used with
> >> a limited set of ALU instructions. So orring #1 would involve 'mov
> >> <reg>, sp ; orr sp, <reg>, #1' like in the 'bic' case above, which
> >> requires a scratch register as well.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
@ 2018-08-08 16:09                           ` Mark Brand
  0 siblings, 0 replies; 46+ messages in thread
From: Mark Brand @ 2018-08-08 16:09 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Catalin Marinas, Christoffer Dall, Julien Thierry, Kees Cook,
	Kernel Hardening, Laura Abbott, Mark Rutland, Robin Murphy,
	Will Deacon, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 14384 bytes --]

On Tue, Aug 7, 2018 at 2:22 AM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
> On 7 August 2018 at 05:05, Mark Brand <markbrand@google.com> wrote:
> > I think the phrasing of "limit kernel attack surface against ROP attacks" is
> > confusing and misleading. ROP does not describe a class of bugs,
> > vulnerabilities or attacks against the kernel - it's just one of many
> > code-reuse techniques that can be used by an attacker while exploiting a
> > vulnerability. But that's kind of off-topic!
> >
> > I think what this thread is talking about is implementing extremely
> > coarse-grained reverse-edge control-flow-integrity, in that a return can
> > only return to the address following a legitimate call, but it can return to
> > any of those.
> >
>
> Indeed. Apologies for not mastering the lingo, but it is indeed about
> no longer being able to subvert function returns into jumping to
> arbitrary places in the code.
>
> > I suspect there's not much benefit to this, since (as far as I can see) the
> > assumption is that an attacker has the means to direct flow of execution as
> > far as taking complete control of the (el1) stack before executing any ROP
> > payload.
> >
> > At that point, I think it's highly unlikely an attacker needs to chain
> > gadgets through return instructions at all - I suspect there are a few
> > places in the kernel where it is necessary to load the entire register
> > context from a register that is not the stack pointer, and it would likely
> > not be more than a minor inconvenience to an attacker to use these (and
> > chaining through branch register) instructions instead of chaining through
> > return instructions.
> >
> > I'd have to take a closer look at an arm64 kernel image to be sure though -
> > I'll do that when I get a chance and update...
> >
>
> Thanks. Reloading all registers from an arbitrary offset register
> should occur rarely, no? Could we work around that?

 I forgot about the gmail-html-by-default... Hopefully everyone else
can read the quotes though :-/.

I took a look and have put together an example rop chain that doesn't
use any return instructions that you could instrument, that will call
an arbitrary kernel function with controlled parameters (at least x0 -
x4, would have to probably mess with some alignment and add a
repetition of the last gadget to get all register control. It assumes
that the attacker has control over the memory pointed to by x0 at the
point where they get control of pc, and that they know where that
memory is located (but it would also work if they just controlled the
memory pointed to by x0, and had another chunk of kernel memory they
control at a known address. Seems like a pretty reasonable starting
assumption, and I'm sure anyone with a little motivation could produce
similar chains for other starting conditions, this just seemed the
"most likely" reasonable conditions to me.

There are two basic principles used here -

(1) chaining through the mempool_free function, I found this really
quickly when searching for useful gadgets based off x0

void mempool_free(void *element, mempool_t *pool)
{
  unsigned long flags;

  if (unlikely(element == NULL))
    return;

  /* snip */
  smp_rmb();

  /* snip */
  if (unlikely(pool->curr_nr < pool->min_nr)) {
    spin_lock_irqsave(&pool->lock, flags);
    if (likely(pool->curr_nr < pool->min_nr)) {
      add_element(pool, element);
      spin_unlock_irqrestore(&pool->lock, flags);
      wake_up(&pool->wait);
      return;
    }
    spin_unlock_irqrestore(&pool->lock, flags);
  }

  pool->free(element, pool->pool_data);
}

Since the callsites for this function usually load the arguments
through some registers, and the function to call gets pulled out of
one of those arguments, it's easy to get a couple of registers loaded
here and then the chain continue.

(2) loading complete register state using kernel_exit macro.

Since the kernel_exit macro actually loads spsr_el1 and elr_el1 from
registers, I think that you can let the eret return to anywhere in el1
without dropping to el0, since the same handler is used for "exiting
the kernel" when a hardware interrupt interrupts the kernel itself. I
didn't fill out the necessary register values in the chain below,
since I don't anyway have a device around to test this on right now.

I'm not sure that you could really robustly protect this eret; I
suppose that you could try and somehow validate the saved register
state, but given that it would be happening on every exception return,
I suspect it would be expensive.

0:dispatch_io + yy (mempool_free gadget, appears in plenty of other places.)
ffffff8008a340d4  084c41a9   ldp     x8, x19, [x0, #0x10]
ffffff8008a340d8  190040f9   ldr     x25, [x0]
ffffff8008a340dc  1a1040f9   ldr     x26, [x0, #0x20]
ffffff8008a340e0  010140f9   ldr     x1, [x8]
ffffff8008a340e4  ed80dd97   bl      mempool_free

  mempool_free:
  ffffff8008194498  f44fbea9   stp     x20, x19, [sp, #-0x20
{__saved_x20} {__saved_x19}]!
  ffffff800819449c  fd7b01a9   stp     x29, x30, [sp, #0x10
{__saved_x29} {__saved_x30}]
  ffffff80081944a0  fd430091   add     x29, sp, #0x10 {__saved_x29}
  ffffff80081944a4  f30301aa   mov     x19, x1
  ffffff80081944a8  f40300aa   mov     x20, x0
  ffffff80081944ac  340100b4   cbz     x20, 0xffffff80081944d0

  ffffff80081944b0  bf3903d5   dmb     ishld
  ffffff80081944b4  68a64029   ldp     w8, w9, [x19, #0x4]
  ffffff80081944b8  3f01086b   cmp     w9, w8
  ffffff80081944bc  0b010054   b.lt    0xffffff80081944dc

  ffffff80081944c0  681640f9   ldr     x8, [x19, #0x28]
  ffffff80081944c4  610e40f9   ldr     x1, [x19, #0x18]
  ffffff80081944c8  e00314aa   mov     x0, x20
  ffffff80081944cc  00013fd6   blr     x8

  ffffff80081944d0  fd7b41a9   ldp     x29, x30, [sp, #0x10
{__saved_x29} {__saved_x30}]
  ffffff80081944d4  f44fc2a8   ldp     x20, x19, [sp {__saved_x20}
{__saved_x19}], #0x20
  ffffff80081944d8  c0035fd6   ret

ffffff8008a340e8  e00319aa   mov     x0, x25
ffffff8008a340ec  e1031aaa   mov     x1, x26
ffffff8008a340f0  60023fd6   blr     x19

1:el1_irq + xx - (x1, x26) -> sp control
ffffff800808314c  5f030091   mov     sp, x26
ffffff8008083150  fd4fbfa9   stp     x29, x19, [sp, #-0x10]! {__saved_x0}
ffffff8008083154  fd030091   mov     x29, sp
ffffff8008083158  20003fd6   blr     x1

2:ipc_log_extract + xx (sp, x19) -> survival
ffffff800817c35c  e0c30091   add     x0, sp, #0x30 {var_170}
ffffff800817c360  e1430091   add     x1, sp, #0x10 {var_190}
ffffff800817c364  60023fd6   blr     x19

3:dispatch_io + xx (mempool_free gadget, appears in plenty of other places.)
ffffff8008a342cc  084c41a9   ldp     x8, x19, [x0, #0x10]
ffffff8008a342d0  140040f9   ldr     x20, [x0]
ffffff8008a342d4  151040f9   ldr     x21, [x0, #0x20]
ffffff8008a342d8  010140f9   ldr     x1, [x8]
ffffff8008a342dc  6f80dd97   bl      mempool_free

  mempool_free:
  ffffff8008194498  f44fbea9   stp     x20, x19, [sp, #-0x20
{__saved_x20} {__saved_x19}]!
  ffffff800819449c  fd7b01a9   stp     x29, x30, [sp, #0x10
{__saved_x29} {__saved_x30}]
  ffffff80081944a0  fd430091   add     x29, sp, #0x10 {__saved_x29}
  ffffff80081944a4  f30301aa   mov     x19, x1
  ffffff80081944a8  f40300aa   mov     x20, x0
  ffffff80081944ac  340100b4   cbz     x20, 0xffffff80081944d0

  ffffff80081944b0  bf3903d5   dmb     ishld
  ffffff80081944b4  68a64029   ldp     w8, w9, [x19, #0x4]
  ffffff80081944b8  3f01086b   cmp     w9, w8
  ffffff80081944bc  0b010054   b.lt    0xffffff80081944dc

  ffffff80081944c0  681640f9   ldr     x8, [x19, #0x28]
  ffffff80081944c4  610e40f9   ldr     x1, [x19, #0x18]
  ffffff80081944c8  e00314aa   mov     x0, x20
  ffffff80081944cc  00013fd6   blr     x8

  ffffff80081944d0  fd7b41a9   ldp     x29, x30, [sp, #0x10
{__saved_x29} {__saved_x30}]
  ffffff80081944d4  f44fc2a8   ldp     x20, x19, [sp {__saved_x20}
{__saved_x19}], #0x20
  ffffff80081944d8  c0035fd6   ret

ffffff8008a342e0  e00314aa   mov     x0, x20
ffffff8008a342e4  e10315aa   mov     x1, x21
ffffff8008a342e8  60023fd6   blr     x19

4:bus_sort_breadthfirst + xx - (x26)
ffffff8008683cc8  561740f9   ldr     x22, [x26, #0x28]
ffffff8008683ccc  e00315aa   mov     x0, x21
ffffff8008683cd0  e10316aa   mov     x1, x22
ffffff8008683cd4  80023fd6   blr     x20

5:kernel_exit (macro) - (x21, x22, sp) -> full register control & pc control
ffffff8008082f64  354018d5   msr     elr_el1, x21
ffffff8008082f68  164018d5   msr     spsr_el1, x22
ffffff8008082f6c  e00740a9   ldp     x0, x1, [sp {var_130} {var_128}]
ffffff8008082f70  e20f41a9   ldp     x2, x3, [sp, #0x10 {var_120} {var_118}]
ffffff8008082f74  e41742a9   ldp     x4, x5, [sp, #0x20 {var_110} {var_108}]
ffffff8008082f78  e61f43a9   ldp     x6, x7, [sp, #0x30 {var_100} {var_f8}]
ffffff8008082f7c  e82744a9   ldp     x8, x9, [sp, #0x40 {var_f0} {var_e8}]
ffffff8008082f80  ea2f45a9   ldp     x10, x11, [sp, #0x50 {var_e0} {var_d8}]
ffffff8008082f84  ec3746a9   ldp     x12, x13, [sp, #0x60 {var_d0} {var_c8}]
ffffff8008082f88  ee3f47a9   ldp     x14, x15, [sp, #0x70 {var_c0} {var_b8}]
ffffff8008082f8c  f04748a9   ldp     x16, x17, [sp, #0x80 {var_b0} {var_a8}]
ffffff8008082f90  f24f49a9   ldp     x18, x19, [sp, #0x90 {var_a0} {var_98}]
ffffff8008082f94  f4574aa9   ldp     x20, x21, [sp, #0xa0 {var_90} {var_88}]
ffffff8008082f98  f65f4ba9   ldp     x22, x23, [sp, #0xb0 {var_80} {var_78}]
ffffff8008082f9c  f8674ca9   ldp     x24, x25, [sp, #0xc0 {var_70} {var_68}]
ffffff8008082fa0  fa6f4da9   ldp     x26, x27, [sp, #0xd0 {var_60} {var_58}]
ffffff8008082fa4  fc774ea9   ldp     x28, x29, [sp, #0xe0 {var_50} {var_48}]
ffffff8008082fa8  fe7b40f9   ldr     x30, [sp, #0xf0 {var_40}]
ffffff8008082fac  ffc30491   add     sp, sp, #0x130
ffffff8008082fb0  e0039fd6   eret


ptr = 0000414100000000 = initial x0

0000: 2525252525252525 ; (0:40d8) x25
0008: 0000414100000030 ; (0:40e0) x1
0010: 0000414100000000 ; (0:40d4) x8
0018: ffffff8008a342cc ; (0:40d4) x19 -> branch target (2:c364)
0020: 0000414100000070 ; (0:40dc) x26 -> sp (1:314c)
0028:
0030: 8888888899999999 ; (0:44b4) w8, w9
0038:
0040:
0048: ffffff800817c35c ; (0:44c4) x1 -> branch target (1:3158)
0050:
0058: ffffff800808314c ; (0:44c0) x8 -> branch target (0:44c4)
0060: xxxxxxxxxxxxxxxx ; saved x29                               <-- sp@(1:3154)
0068: xxxxxxxxxxxxxxxx ; saved x19
0070:                  ;                                         <--
sp@(1:314c), (5:2f64)
0078:
0080:
0088:
0090:
0098: 2222222222222222 ; (4:3cc8) x22 -> spsr_el1
00a0: ffffff8008082f64 ; (3:42d0) x20 -> branch target (4:3cd4)  <-- x0@(2:c35c)
00a8: 00004141000000d0 ; (3:42d8) x1
00b0: 00004141000000a0 ; (3:42cc) x8
00b8: 1919191919191919 ; (3:42cc) x19
00c0: 2121212121212121 ; (3:42d4) x21 -> elr_el1
00c8:
00d0: 8888888899999999 ; (3:44b4) w8, w9
00d8:
00e0:
00e8: 1111111111111111 ; (3:44c4) x1
00f0:
00f8: ffffff800808314c ; (3:44c0) x8 -> branch target (3:44c4)
>
> > On Mon, 6 Aug 2018 at 19:28, Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > wrote:
> >>
> >> On 6 August 2018 at 21:50, Kees Cook <keescook@chromium.org> wrote:
> >> > On Mon, Aug 6, 2018 at 12:35 PM, Ard Biesheuvel
> >> > <ard.biesheuvel@linaro.org> wrote:
> >> >> On 6 August 2018 at 20:49, Kees Cook <keescook@chromium.org> wrote:
> >> >>> On Mon, Aug 6, 2018 at 10:45 AM, Robin Murphy <robin.murphy@arm.com>
> >> >>> wrote:
> >> >>>> I guess what I'm getting at is that if the protection mechanism is
> >> >>>> "always
> >> >>>> return with SP outside TTBR1", there seems little point in going
> >> >>>> through the
> >> >>>> motions if SP in TTBR0 could still be valid and allow an attack to
> >> >>>> succeed
> >> >>>> anyway; this is basically just me working through a justification for
> >> >>>> saying
> >> >>>> the proposed scheme needs "depends on ARM64_PAN ||
> >> >>>> ARM64_SW_TTBR0_PAN",
> >> >>>> making it that much uglier for v8.0 CPUs...
> >> >>>
> >> >>> I think anyone with v8.0 CPUs interested in this mitigation would also
> >> >>> very much want PAN emulation. If a "depends on" isn't desired, what
> >> >>> about "imply" in the Kconfig?
> >> >>>
> >> >>
> >> >> Yes, but actually, using bit #0 is maybe a better alternative in any
> >> >> case. You can never dereference SP with bit #0 set, regardless of
> >> >> whether the address points to user or kernel space, and my concern
> >> >> about reloading sp from x29 doesn't really make sense, given that x29
> >> >> is always assigned from sp right after pushing x29 and x30 in the
> >> >> function prologue, and sp only gets restored from x29 in the epilogue
> >> >> when there is a stack frame to begin with, in which case we add #1 to
> >> >> sp again before returning from the function.
> >> >
> >> > Fair enough! :)
> >> >
> >> >> The other code gets a lot cleaner as well.
> >> >>
> >> >> So for the return we'll have
> >> >>
> >> >>   ldp     x29, x30, [sp], #nn
> >> >>>>add     sp, sp, #0x1
> >> >>   ret
> >> >>
> >> >> and for the function call
> >> >>
> >> >>   bl      <foo>
> >> >>>>mov      x30, sp
> >> >>>>bic     sp, x30, #1
> >> >>
> >> >> The restore sequence in entry.s:96 (which has no spare registers) gets
> >> >> much simpler as well:
> >> >>
> >> >> --- a/arch/arm64/kernel/entry.S
> >> >> +++ b/arch/arm64/kernel/entry.S
> >> >> @@ -95,6 +95,15 @@ alternative_else_nop_endif
> >> >>          */
> >> >>         add     sp, sp, x0      // sp' = sp + x0
> >> >>         sub     x0, sp, x0      // x0' = sp' - x0 = (sp + x0) - x0 = sp
> >> >> +#ifdef CONFIG_ARM64_ROP_SHIELD
> >> >> +       tbnz    x0, #0, 1f
> >> >> +       .subsection     1
> >> >> +1:     sub     x0, x0, #1
> >> >> +       sub     sp, sp, #1
> >> >> +       b       2f
> >> >> +       .previous
> >> >> +2:
> >> >> +#endif
> >> >>         tbnz    x0, #THREAD_SHIFT, 0f
> >> >>         sub     x0, sp, x0      // x0'' = sp' - x0' = (sp + x0) - sp =
> >> >> x0
> >> >>         sub     sp, sp, x0      // sp'' = sp' - x0 = (sp + x0) - x0 =
> >> >> sp
> >> >
> >> > I get slightly concerned about "add" vs "clear bit", but I don't see a
> >> > real way to chain a lot of "add"s to get to avoid the unaligned
> >> > access. Is "or" less efficient than "add"?
> >> >
> >>
> >> Yes. The stack pointer is special on arm64, and can only be used with
> >> a limited set of ALU instructions. So orring #1 would involve 'mov
> >> <reg>, sp ; orr sp, <reg>, #1' like in the 'bic' case above, which
> >> requires a scratch register as well.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
@ 2018-08-08 22:02                             ` Kees Cook
  0 siblings, 0 replies; 46+ messages in thread
From: Kees Cook @ 2018-08-08 22:02 UTC (permalink / raw)
  To: Mark Brand
  Cc: Ard Biesheuvel, Catalin Marinas, Christoffer Dall,
	Julien Thierry, Kernel Hardening, Laura Abbott, Mark Rutland,
	Robin Murphy, Will Deacon, linux-arm-kernel

On Wed, Aug 8, 2018 at 9:09 AM, Mark Brand <markbrand@google.com> wrote:
> (1) chaining through the mempool_free function, I found this really
> quickly when searching for useful gadgets based off x0
>
> void mempool_free(void *element, mempool_t *pool)
> {
>   unsigned long flags;
>
>   if (unlikely(element == NULL))
>     return;
>
>   /* snip */
>   smp_rmb();
>
>   /* snip */
>   if (unlikely(pool->curr_nr < pool->min_nr)) {
>     spin_lock_irqsave(&pool->lock, flags);
>     if (likely(pool->curr_nr < pool->min_nr)) {
>       add_element(pool, element);
>       spin_unlock_irqrestore(&pool->lock, flags);
>       wake_up(&pool->wait);
>       return;
>     }
>     spin_unlock_irqrestore(&pool->lock, flags);
>   }
>
>   pool->free(element, pool->pool_data);
> }
>
> Since the callsites for this function usually load the arguments
> through some registers, and the function to call gets pulled out of
> one of those arguments, it's easy to get a couple of registers loaded
> here and then the chain continues.
>
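
To make the shape of such a call site concrete, the following is a purely
hypothetical caller (struct and function names invented for illustration,
not a real kernel call site) in which both arguments to mempool_free() are
loaded from a heap object, so whoever controls that object also controls
the pool->free() indirect call that mempool_free() ends with:

#include <linux/mempool.h>

/* Hypothetical call-site shape, for illustration only: 'ioc' lives in
 * memory the attacker may control, so element, pool, pool->free and
 * pool->pool_data are all attacker-chosen, and the final blr inside
 * mempool_free() goes wherever the forged pool says. */
struct io_completion {
	void		*payload;
	mempool_t	*pool;
};

static void io_complete(struct io_completion *ioc)
{
	mempool_free(ioc->payload, ioc->pool);
}
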
> (2) loading complete register state using kernel_exit macro.
>
> Since the kernel_exit macro actually loads spsr_el1 and elr_el1 from
> registers, I think that you can let the eret return to anywhere in el1
> without dropping to el0, since the same handler is used for "exiting
> the kernel" when a hardware interrupt interrupts the kernel itself. I
> didn't fill out the necessary register values in the chain below,
> since I don't have a device around to test this on right now anyway.
>
> I'm not sure that you could really robustly protect this eret; I
> suppose that you could try and somehow validate the saved register
> state, but given that it would be happening on every exception return,
> I suspect it would be expensive.
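
As a rough illustration of what "validate the saved register state" could
mean here (a sketch of the idea only; the specific check below is an
assumption made for illustration and nothing like it exists in the series),
kernel_exit could in principle refuse to eret back into EL1 unless the
saved ELR points at kernel text:

#include <linux/kernel.h>
#include <asm/ptrace.h>

/* Sketch only: a plausibility check on the saved register state before the
 * eret. Even a check this small would run on every exception return, which
 * is exactly the cost concern raised above. */
static bool saved_regs_plausible(const struct pt_regs *regs)
{
	/* returns to EL0 go through the usual user/kernel boundary anyway */
	if ((regs->pstate & PSR_MODE_MASK) == PSR_MODE_EL0t)
		return true;

	/* an eret that stays in EL1 should at least land in kernel text */
	return core_kernel_text(regs->pc);
}
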

This is a wonderful example, thanks! Yeah, Call-Oriented Programming?
:P mempool_free() is quite a nice gadget.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
  2018-08-08 16:09                           ` Mark Brand
  (?)
  (?)
@ 2018-08-13  7:39                           ` Ard Biesheuvel
  -1 siblings, 0 replies; 46+ messages in thread
From: Ard Biesheuvel @ 2018-08-13  7:39 UTC (permalink / raw)
  To: Mark Brand
  Cc: Catalin Marinas, Christoffer Dall, Julien Thierry, Kees Cook,
	Kernel Hardening, Laura Abbott, Mark Rutland, Robin Murphy,
	Will Deacon, linux-arm-kernel

On Wed 8 Aug 2018 at 19:09, Mark Brand <markbrand@google.com> wrote:

> On Tue, Aug 7, 2018 at 2:22 AM Ard Biesheuvel <ard.biesheuvel@linaro.org>
> wrote:
> >
> > On 7 August 2018 at 05:05, Mark Brand <markbrand@google.com> wrote:
> > > I think the phrasing of "limit kernel attack surface against ROP
> > > attacks" is confusing and misleading. ROP does not describe a class
> > > of bugs, vulnerabilities or attacks against the kernel - it's just
> > > one of many code-reuse techniques that can be used by an attacker
> > > while exploiting a vulnerability. But that's kind of off-topic!
> > >
> > > I think what this thread is talking about is implementing extremely
> > > coarse-grained reverse-edge control-flow-integrity, in that a return
> > > can only return to the address following a legitimate call, but it
> > > can return to any of those.
> > >
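
As a toy illustration of that coarseness (hypothetical code, unrelated to
the kernel): every call site is followed by the same "make SP valid again"
sequence, so a corrupted return address can target the instruction after
any call in the image and still leave SP usable.

/* Toy example: a forged return address in victim() that points just past
 * the call in unrelated() passes the SP-bit scheme exactly as well as the
 * legitimate return site does. */
static void helper_a(void) { }
static void helper_b(void) { }

static void victim(void)    { helper_a(); /* intended return site      */ }
static void unrelated(void) { helper_b(); /* equally acceptable target */ }
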
> >
> > Indeed. Apologies for not mastering the lingo, but it is indeed about
> > no longer being able to subvert function returns into jumping to
> > arbitrary places in the code.
> >
> > > I suspect there's not much benefit to this, since (as far as I can
> > > see) the assumption is that an attacker has the means to direct flow
> > > of execution as far as taking complete control of the (el1) stack
> > > before executing any ROP payload.
> > >
> > > At that point, I think it's highly unlikely an attacker needs to
> > > chain gadgets through return instructions at all - I suspect there
> > > are a few places in the kernel where it is necessary to load the
> > > entire register context from a register that is not the stack
> > > pointer, and it would likely not be more than a minor inconvenience
> > > to an attacker to use these (and chaining through branch register)
> > > instructions instead of chaining through return instructions.
> > >
> > > I'd have to take a closer look at an arm64 kernel image to be sure
> > > though - I'll do that when I get a chance and update...
> > >
> >
> > Thanks. Reloading all registers from an arbitrary offset register
> > should occur rarely, no? Could we work around that?
>
>  I forgot about the gmail-html-by-default... Hopefully everyone else
> can read the quotes though :-/.
>
> I took a look and have put together an example ROP chain that doesn't
> use any return instructions that you could instrument, and that will
> call an arbitrary kernel function with controlled parameters (at least
> x0 - x4; you would probably have to mess with some alignment and add a
> repetition of the last gadget to get full register control). It assumes
> that the attacker has control over the memory pointed to by x0 at the
> point where they get control of pc, and that they know where that
> memory is located (but it would also work if they just controlled the
> memory pointed to by x0 and had another chunk of kernel memory they
> control at a known address). That seems like a pretty reasonable
> starting assumption, and I'm sure anyone with a little motivation
> could produce similar chains for other starting conditions; these just
> seemed the "most likely" reasonable conditions to me.
>

Thanks a lot for taking the time to put together this excellent example. I
will study it in more detail after I return from my vacation.

Ard.


> There are two basic principles used here -
>
> (1) chaining through the mempool_free function, I found this really
> quickly when searching for useful gadgets based off x0
>
> void mempool_free(void *element, mempool_t *pool)
> {
>   unsigned long flags;
>
>   if (unlikely(element == NULL))
>     return;
>
>   /* snip */
>   smp_rmb();
>
>   /* snip */
>   if (unlikely(pool->curr_nr < pool->min_nr)) {
>     spin_lock_irqsave(&pool->lock, flags);
>     if (likely(pool->curr_nr < pool->min_nr)) {
>       add_element(pool, element);
>       spin_unlock_irqrestore(&pool->lock, flags);
>       wake_up(&pool->wait);
>       return;
>     }
>     spin_unlock_irqrestore(&pool->lock, flags);
>   }
>
>   pool->free(element, pool->pool_data);
> }
>
> Since the callsites for this function usually load the arguments
> through some registers, and the function to call gets pulled out of
> one of those arguments, it's easy to get a couple of registers loaded
> here and then the chain continues.
>
> (2) loading complete register state using kernel_exit macro.
>
> Since the kernel_exit macro actually loads spsr_el1 and elr_el1 from
> registers, I think that you can let the eret return to anywhere in el1
> without dropping to el0, since the same handler is used for "exiting
> the kernel" when a hardware interrupt interrupts the kernel itself. I
> didn't fill out the necessary register values in the chain below,
> since I don't have a device around to test this on right now anyway.
>
> I'm not sure that you could really robustly protect this eret; I
> suppose that you could try and somehow validate the saved register
> state, but given that it would be happening on every exception return,
> I suspect it would be expensive.
>
> 0:dispatch_io + yy (mempool_free gadget, appears in plenty of other
> places.)
> ffffff8008a340d4  084c41a9   ldp     x8, x19, [x0, #0x10]
> ffffff8008a340d8  190040f9   ldr     x25, [x0]
> ffffff8008a340dc  1a1040f9   ldr     x26, [x0, #0x20]
> ffffff8008a340e0  010140f9   ldr     x1, [x8]
> ffffff8008a340e4  ed80dd97   bl      mempool_free
>
>   mempool_free:
>   ffffff8008194498  f44fbea9   stp     x20, x19, [sp, #-0x20 {__saved_x20} {__saved_x19}]!
>   ffffff800819449c  fd7b01a9   stp     x29, x30, [sp, #0x10 {__saved_x29} {__saved_x30}]
>   ffffff80081944a0  fd430091   add     x29, sp, #0x10 {__saved_x29}
>   ffffff80081944a4  f30301aa   mov     x19, x1
>   ffffff80081944a8  f40300aa   mov     x20, x0
>   ffffff80081944ac  340100b4   cbz     x20, 0xffffff80081944d0
>
>   ffffff80081944b0  bf3903d5   dmb     ishld
>   ffffff80081944b4  68a64029   ldp     w8, w9, [x19, #0x4]
>   ffffff80081944b8  3f01086b   cmp     w9, w8
>   ffffff80081944bc  0b010054   b.lt    0xffffff80081944dc
>
>   ffffff80081944c0  681640f9   ldr     x8, [x19, #0x28]
>   ffffff80081944c4  610e40f9   ldr     x1, [x19, #0x18]
>   ffffff80081944c8  e00314aa   mov     x0, x20
>   ffffff80081944cc  00013fd6   blr     x8
>
>   ffffff80081944d0  fd7b41a9   ldp     x29, x30, [sp, #0x10 {__saved_x29} {__saved_x30}]
>   ffffff80081944d4  f44fc2a8   ldp     x20, x19, [sp {__saved_x20} {__saved_x19}], #0x20
>   ffffff80081944d8  c0035fd6   ret
>
> ffffff8008a340e8  e00319aa   mov     x0, x25
> ffffff8008a340ec  e1031aaa   mov     x1, x26
> ffffff8008a340f0  60023fd6   blr     x19
>
> 1:el1_irq + xx - (x1, x26) -> sp control
> ffffff800808314c  5f030091   mov     sp, x26
> ffffff8008083150  fd4fbfa9   stp     x29, x19, [sp, #-0x10]! {__saved_x0}
> ffffff8008083154  fd030091   mov     x29, sp
> ffffff8008083158  20003fd6   blr     x1
>
> 2:ipc_log_extract + xx (sp, x19) -> survival
> ffffff800817c35c  e0c30091   add     x0, sp, #0x30 {var_170}
> ffffff800817c360  e1430091   add     x1, sp, #0x10 {var_190}
> ffffff800817c364  60023fd6   blr     x19
>
> 3:dispatch_io + xx (mempool_free gadget, appears in plenty of other
> places.)
> ffffff8008a342cc  084c41a9   ldp     x8, x19, [x0, #0x10]
> ffffff8008a342d0  140040f9   ldr     x20, [x0]
> ffffff8008a342d4  151040f9   ldr     x21, [x0, #0x20]
> ffffff8008a342d8  010140f9   ldr     x1, [x8]
> ffffff8008a342dc  6f80dd97   bl      mempool_free
>
>   mempool_free:
>   ffffff8008194498  f44fbea9   stp     x20, x19, [sp, #-0x20 {__saved_x20} {__saved_x19}]!
>   ffffff800819449c  fd7b01a9   stp     x29, x30, [sp, #0x10 {__saved_x29} {__saved_x30}]
>   ffffff80081944a0  fd430091   add     x29, sp, #0x10 {__saved_x29}
>   ffffff80081944a4  f30301aa   mov     x19, x1
>   ffffff80081944a8  f40300aa   mov     x20, x0
>   ffffff80081944ac  340100b4   cbz     x20, 0xffffff80081944d0
>
>   ffffff80081944b0  bf3903d5   dmb     ishld
>   ffffff80081944b4  68a64029   ldp     w8, w9, [x19, #0x4]
>   ffffff80081944b8  3f01086b   cmp     w9, w8
>   ffffff80081944bc  0b010054   b.lt    0xffffff80081944dc
>
>   ffffff80081944c0  681640f9   ldr     x8, [x19, #0x28]
>   ffffff80081944c4  610e40f9   ldr     x1, [x19, #0x18]
>   ffffff80081944c8  e00314aa   mov     x0, x20
>   ffffff80081944cc  00013fd6   blr     x8
>
>   ffffff80081944d0  fd7b41a9   ldp     x29, x30, [sp, #0x10 {__saved_x29} {__saved_x30}]
>   ffffff80081944d4  f44fc2a8   ldp     x20, x19, [sp {__saved_x20} {__saved_x19}], #0x20
>   ffffff80081944d8  c0035fd6   ret
>
> ffffff8008a342e0  e00314aa   mov     x0, x20
> ffffff8008a342e4  e10315aa   mov     x1, x21
> ffffff8008a342e8  60023fd6   blr     x19
>
> 4:bus_sort_breadthfirst + xx - (x26)
> ffffff8008683cc8  561740f9   ldr     x22, [x26, #0x28]
> ffffff8008683ccc  e00315aa   mov     x0, x21
> ffffff8008683cd0  e10316aa   mov     x1, x22
> ffffff8008683cd4  80023fd6   blr     x20
>
> 5:kernel_exit (macro) - (x21, x22, sp) -> full register control & pc control
> ffffff8008082f64  354018d5   msr     elr_el1, x21
> ffffff8008082f68  164018d5   msr     spsr_el1, x22
> ffffff8008082f6c  e00740a9   ldp     x0, x1, [sp {var_130} {var_128}]
> ffffff8008082f70  e20f41a9   ldp     x2, x3, [sp, #0x10 {var_120} {var_118}]
> ffffff8008082f74  e41742a9   ldp     x4, x5, [sp, #0x20 {var_110} {var_108}]
> ffffff8008082f78  e61f43a9   ldp     x6, x7, [sp, #0x30 {var_100} {var_f8}]
> ffffff8008082f7c  e82744a9   ldp     x8, x9, [sp, #0x40 {var_f0} {var_e8}]
> ffffff8008082f80  ea2f45a9   ldp     x10, x11, [sp, #0x50 {var_e0} {var_d8}]
> ffffff8008082f84  ec3746a9   ldp     x12, x13, [sp, #0x60 {var_d0} {var_c8}]
> ffffff8008082f88  ee3f47a9   ldp     x14, x15, [sp, #0x70 {var_c0} {var_b8}]
> ffffff8008082f8c  f04748a9   ldp     x16, x17, [sp, #0x80 {var_b0} {var_a8}]
> ffffff8008082f90  f24f49a9   ldp     x18, x19, [sp, #0x90 {var_a0} {var_98}]
> ffffff8008082f94  f4574aa9   ldp     x20, x21, [sp, #0xa0 {var_90} {var_88}]
> ffffff8008082f98  f65f4ba9   ldp     x22, x23, [sp, #0xb0 {var_80} {var_78}]
> ffffff8008082f9c  f8674ca9   ldp     x24, x25, [sp, #0xc0 {var_70} {var_68}]
> ffffff8008082fa0  fa6f4da9   ldp     x26, x27, [sp, #0xd0 {var_60} {var_58}]
> ffffff8008082fa4  fc774ea9   ldp     x28, x29, [sp, #0xe0 {var_50} {var_48}]
> ffffff8008082fa8  fe7b40f9   ldr     x30, [sp, #0xf0 {var_40}]
> ffffff8008082fac  ffc30491   add     sp, sp, #0x130
> ffffff8008082fb0  e0039fd6   eret
>
>
> ptr = 0000414100000000 = initial x0
>
> 0000: 2525252525252525 ; (0:40d8) x25
> 0008: 0000414100000030 ; (0:40e0) x1
> 0010: 0000414100000000 ; (0:40d4) x8
> 0018: ffffff8008a342cc ; (0:40d4) x19 -> branch target (2:c364)
> 0020: 0000414100000070 ; (0:40dc) x26 -> sp (1:314c)
> 0028:
> 0030: 8888888899999999 ; (0:44b4) w8, w9
> 0038:
> 0040:
> 0048: ffffff800817c35c ; (0:44c4) x1 -> branch target (1:3158)
> 0050:
> 0058: ffffff800808314c ; (0:44c0) x8 -> branch target (0:44c4)
> 0060: xxxxxxxxxxxxxxxx ; saved x29                               <-- sp@(1:3154)
> 0068: xxxxxxxxxxxxxxxx ; saved x19
> 0070:                  ;                                         <--
> sp@(1:314c), (5:2f64)
> 0078:
> 0080:
> 0088:
> 0090:
> 0098: 2222222222222222 ; (4:3cc8) x22 -> spsr_el1
> 00a0: ffffff8008082f64 ; (3:42d0) x20 -> branch target (4:3cd4)  <-- x0@(2:c35c)
> 00a8: 00004141000000d0 ; (3:42d8) x1
> 00b0: 00004141000000a0 ; (3:42cc) x8
> 00b8: 1919191919191919 ; (3:42cc) x19
> 00c0: 2121212121212121 ; (3:42d4) x21 -> elr_el1
> 00c8:
> 00d0: 8888888899999999 ; (3:44b4) w8, w9
> 00d8:
> 00e0:
> 00e8: 1111111111111111 ; (3:44c4) x1
> 00f0:
> 00f8: ffffff800808314c ; (3:44c0) x8 -> branch target (3:44c4)
> >
> > > On Mon, 6 Aug 2018 at 19:28, Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > > wrote:
> > >>
> > >> On 6 August 2018 at 21:50, Kees Cook <keescook@chromium.org> wrote:
> > >> > On Mon, Aug 6, 2018 at 12:35 PM, Ard Biesheuvel
> > >> > <ard.biesheuvel@linaro.org> wrote:
> > >> >> On 6 August 2018 at 20:49, Kees Cook <keescook@chromium.org> wrote:
> > >> >>> On Mon, Aug 6, 2018 at 10:45 AM, Robin Murphy <robin.murphy@arm.com>
> > >> >>> wrote:
> > >> >>>> I guess what I'm getting at is that if the protection mechanism is
> > >> >>>> "always
> > >> >>>> return with SP outside TTBR1", there seems little point in going
> > >> >>>> through the
> > >> >>>> motions if SP in TTBR0 could still be valid and allow an attack to
> > >> >>>> succeed
> > >> >>>> anyway; this is basically just me working through a justification for
> > >> >>>> saying
> > >> >>>> the proposed scheme needs "depends on ARM64_PAN ||
> > >> >>>> ARM64_SW_TTBR0_PAN",
> > >> >>>> making it that much uglier for v8.0 CPUs...
> > >> >>>
> > >> >>> I think anyone with v8.0 CPUs interested in this mitigation would also
> > >> >>> very much want PAN emulation. If a "depends on" isn't desired, what
> > >> >>> about "imply" in the Kconfig?
> > >> >>>
> > >> >>
> > >> >> Yes, but actually, using bit #0 is maybe a better alternative in any
> > >> >> case. You can never dereference SP with bit #0 set, regardless of
> > >> >> whether the address points to user or kernel space, and my concern
> > >> >> about reloading sp from x29 doesn't really make sense, given that x29
> > >> >> is always assigned from sp right after pushing x29 and x30 in the
> > >> >> function prologue, and sp only gets restored from x29 in the epilogue
> > >> >> when there is a stack frame to begin with, in which case we add #1 to
> > >> >> sp again before returning from the function.
> > >> >
> > >> > Fair enough! :)
> > >> >
> > >> >> The other code gets a lot cleaner as well.
> > >> >>
> > >> >> So for the return we'll have
> > >> >>
> > >> >>   ldp     x29, x30, [sp], #nn
> > >> >>>>add     sp, sp, #0x1
> > >> >>   ret
> > >> >>
> > >> >> and for the function call
> > >> >>
> > >> >>   bl      <foo>
> > >> >>>>mov      x30, sp
> > >> >>>>bic     sp, x30, #1
> > >> >>
> > >> >> The restore sequence in entry.s:96 (which has no spare registers) gets
> > >> >> much simpler as well:
> > >> >>
> > >> >> --- a/arch/arm64/kernel/entry.S
> > >> >> +++ b/arch/arm64/kernel/entry.S
> > >> >> @@ -95,6 +95,15 @@ alternative_else_nop_endif
> > >> >>          */
> > >> >>         add     sp, sp, x0      // sp' = sp + x0
> > >> >>         sub     x0, sp, x0      // x0' = sp' - x0 = (sp + x0) - x0 = sp
> > >> >> +#ifdef CONFIG_ARM64_ROP_SHIELD
> > >> >> +       tbnz    x0, #0, 1f
> > >> >> +       .subsection     1
> > >> >> +1:     sub     x0, x0, #1
> > >> >> +       sub     sp, sp, #1
> > >> >> +       b       2f
> > >> >> +       .previous
> > >> >> +2:
> > >> >> +#endif
> > >> >>         tbnz    x0, #THREAD_SHIFT, 0f
> > >> >>         sub     x0, sp, x0      // x0'' = sp' - x0' = (sp + x0) - sp =
> > >> >> x0
> > >> >>         sub     sp, sp, x0      // sp'' = sp' - x0 = (sp + x0) - x0 =
> > >> >> sp
> > >> >
> > >> > I get slightly concerned about "add" vs "clear bit", but I don't see a
> > >> > real way to chain a lot of "add"s to get to avoid the unaligned
> > >> > access. Is "or" less efficient than "add"?
> > >> >
> > >>
> > >> Yes. The stack pointer is special on arm64, and can only be used with
> > >> a limited set of ALU instructions. So orring #1 would involve 'mov
> > >> <reg>, sp ; orr sp, <reg>, #1' like in the 'bic' case above, which
> > >> requires a scratch register as well.
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
@ 2018-08-18  1:27   ` Laura Abbott
  0 siblings, 0 replies; 46+ messages in thread
From: Laura Abbott @ 2018-08-18  1:27 UTC (permalink / raw)
  To: Ard Biesheuvel, kernel-hardening
  Cc: keescook, christoffer.dall, will.deacon, catalin.marinas,
	mark.rutland, labbott, linux-arm-kernel

On 08/02/2018 06:21 AM, Ard Biesheuvel wrote:
> This is a proof of concept I cooked up, primarily to trigger a discussion
> about whether there is a point to doing anything like this, and if there
> is, what the pitfalls are. Also, while I am not aware of any similar
> implementations, the idea is so simple that I would be surprised if nobody
> else thought of the same thing way before I did.
> 
> The idea is that we can significantly limit the kernel's attack surface
> for ROP based attacks by clearing the stack pointer's sign bit before
> returning from a function, and setting it again right after proceeding
> from the [expected] return address. This should make it much more difficult
> to return to arbitrary gadgets, given that they rely on being chained to
> the next via a return address popped off the stack, and this is difficult
> when the stack pointer is invalid.
> 
> Of course, 4 additional instructions per function return is not exactly
> for free, but they are just movs and adds, and leaf functions are
> disregarded unless they allocate a stack frame (this comes for free
> because simple_return insns are disregarded by the plugin)
> 
> Please shoot, preferably with better ideas ...
> 
> Ard Biesheuvel (3):
>    arm64: use wrapper macro for bl/blx instructions from asm code
>    gcc: plugins: add ROP shield plugin for arm64
>    arm64: enable ROP protection by clearing SP bit #55 across function
>      returns
> 
>   arch/Kconfig                                  |   4 +
>   arch/arm64/Kconfig                            |  10 ++
>   arch/arm64/include/asm/assembler.h            |  21 +++-
>   arch/arm64/kernel/entry-ftrace.S              |   6 +-
>   arch/arm64/kernel/entry.S                     | 104 +++++++++-------
>   arch/arm64/kernel/head.S                      |   4 +-
>   arch/arm64/kernel/probes/kprobes_trampoline.S |   2 +-
>   arch/arm64/kernel/sleep.S                     |   6 +-
>   drivers/firmware/efi/libstub/Makefile         |   3 +-
>   scripts/Makefile.gcc-plugins                  |   7 ++
>   scripts/gcc-plugins/arm64_rop_shield_plugin.c | 116 ++++++++++++++++++
>   11 files changed, 228 insertions(+), 55 deletions(-)
>   create mode 100644 scripts/gcc-plugins/arm64_rop_shield_plugin.c
> 

I tried this on the Fedora config and it died in mutex_lock

#0  el1_sync () at arch/arm64/kernel/entry.S:570
#1  0xffff000008c62ed4 in __cmpxchg_case_acq_8 (new=<optimized out>, old=<optimized out>, ptr=<optimized out>) at ./arch/arm64/include/asm/atomic_lse.h:480
#2  __cmpxchg_acq (size=<optimized out>, new=<optimized out>, old=<optimized out>, ptr=<optimized out>) at ./arch/arm64/include/asm/cmpxchg.h:141
#3  __mutex_trylock_fast (lock=<optimized out>) at kernel/locking/mutex.c:144
#4  mutex_lock (lock=0xffff0000098dee48 <cgroup_mutex>) at kernel/locking/mutex.c:241
#5  0xffff000008f40978 in kallsyms_token_index ()

ffff000008bda050 <mutex_lock>:
ffff000008bda050:       a9bf7bfd        stp     x29, x30, [sp, #-16]!
ffff000008bda054:       aa0003e3        mov     x3, x0
ffff000008bda058:       d5384102        mrs     x2, sp_el0
ffff000008bda05c:       910003fd        mov     x29, sp
ffff000008bda060:       d2800001        mov     x1, #0x0                        // #0
ffff000008bda064:       97ff85af        bl      ffff000008bbb720 <__ll_sc___cmpxchg_case_acq_8>
ffff000008bda068:       d503201f        nop
ffff000008bda06c:       d503201f        nop
ffff000008bda070:       b50000c0        cbnz    x0, ffff000008bda088 <mutex_lock+0x38>
ffff000008bda074:       a8c17bfd        ldp     x29, x30, [sp], #16
ffff000008bda078:       910003f0        mov     x16, sp
ffff000008bda07c:       9248fa1f        and     sp, x16, #0xff7fffffffffffff
ffff000008bda080:       d65f03c0        ret
ffff000008bda084:       d503201f        nop
ffff000008bda088:       aa0303e0        mov     x0, x3
ffff000008bda08c:       97ffffe7        bl      ffff000008bda028 <__mutex_lock_slowpath>
ffff000008bda090:       910003fe        mov     x30, sp
ffff000008bda094:       b24903df        orr     sp, x30, #0x80000000000000
ffff000008bda098:       a8c17bfd        ldp     x29, x30, [sp], #16
ffff000008bda09c:       910003f0        mov     x16, sp
ffff000008bda0a0:       9248fa1f        and     sp, x16, #0xff7fffffffffffff
ffff000008bda0a4:       d65f03c0        ret

ffff000008bbb720 <__ll_sc___cmpxchg_case_acq_8>:
ffff000008bbb720:       f9800011        prfm    pstl1strm, [x0]
ffff000008bbb724:       c85ffc10        ldaxr   x16, [x0]
ffff000008bbb728:       ca010211        eor     x17, x16, x1
ffff000008bbb72c:       b5000071        cbnz    x17, ffff000008bbb738 <__ll_sc___cmpxchg_case_acq_8+0x18>
ffff000008bbb730:       c8117c02        stxr    w17, x2, [x0]
ffff000008bbb734:       35ffff91        cbnz    w17, ffff000008bbb724 <__ll_sc___cmpxchg_case_acq_8+0x4>
ffff000008bbb738:       aa1003e0        mov     x0, x16
ffff000008bbb73c:       910003f0        mov     x16, sp
ffff000008bbb740:       9248fa1f        and     sp, x16, #0xff7fffffffffffff
ffff000008bbb744:       d65f03c0        ret

If I turn off CONFIG_ARM64_LSE_ATOMICS it works

Thanks,
Laura

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation
@ 2018-08-20  6:30     ` Ard Biesheuvel
  0 siblings, 0 replies; 46+ messages in thread
From: Ard Biesheuvel @ 2018-08-20  6:30 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Kernel Hardening, Kees Cook, Christoffer Dall, Will Deacon,
	Catalin Marinas, Mark Rutland, Laura Abbott, linux-arm-kernel

On 18 August 2018 at 03:27, Laura Abbott <labbott@redhat.com> wrote:
> On 08/02/2018 06:21 AM, Ard Biesheuvel wrote:
>>
>> This is a proof of concept I cooked up, primarily to trigger a discussion
>> about whether there is a point to doing anything like this, and if there
>> is, what the pitfalls are. Also, while I am not aware of any similar
>> implementations, the idea is so simple that I would be surprised if nobody
>> else thought of the same thing way before I did.
>>
>> The idea is that we can significantly limit the kernel's attack surface
>> for ROP based attacks by clearing the stack pointer's sign bit before
>> returning from a function, and setting it again right after proceeding
>> from the [expected] return address. This should make it much more
>> difficult
>> to return to arbitrary gadgets, given that they rely on being chained to
>> the next via a return address popped off the stack, and this is difficult
>> when the stack pointer is invalid.
>>
>> Of course, 4 additional instructions per function return is not exactly
>> for free, but they are just movs and adds, and leaf functions are
>> disregarded unless they allocate a stack frame (this comes for free
>> because simple_return insns are disregarded by the plugin)
>>
>> Please shoot, preferably with better ideas ...
>>
>> Ard Biesheuvel (3):
>>    arm64: use wrapper macro for bl/blx instructions from asm code
>>    gcc: plugins: add ROP shield plugin for arm64
>>    arm64: enable ROP protection by clearing SP bit #55 across function
>>      returns
>>
>>   arch/Kconfig                                  |   4 +
>>   arch/arm64/Kconfig                            |  10 ++
>>   arch/arm64/include/asm/assembler.h            |  21 +++-
>>   arch/arm64/kernel/entry-ftrace.S              |   6 +-
>>   arch/arm64/kernel/entry.S                     | 104 +++++++++-------
>>   arch/arm64/kernel/head.S                      |   4 +-
>>   arch/arm64/kernel/probes/kprobes_trampoline.S |   2 +-
>>   arch/arm64/kernel/sleep.S                     |   6 +-
>>   drivers/firmware/efi/libstub/Makefile         |   3 +-
>>   scripts/Makefile.gcc-plugins                  |   7 ++
>>   scripts/gcc-plugins/arm64_rop_shield_plugin.c | 116 ++++++++++++++++++
>>   11 files changed, 228 insertions(+), 55 deletions(-)
>>   create mode 100644 scripts/gcc-plugins/arm64_rop_shield_plugin.c
>>
>
> I tried this on the Fedora config and it died in mutex_lock
>
> #0  el1_sync () at arch/arm64/kernel/entry.S:570
> #1  0xffff000008c62ed4 in __cmpxchg_case_acq_8 (new=<optimized out>, old=<optimized out>, ptr=<optimized out>) at ./arch/arm64/include/asm/atomic_lse.h:480
> #2  __cmpxchg_acq (size=<optimized out>, new=<optimized out>, old=<optimized out>, ptr=<optimized out>) at ./arch/arm64/include/asm/cmpxchg.h:141
> #3  __mutex_trylock_fast (lock=<optimized out>) at kernel/locking/mutex.c:144
> #4  mutex_lock (lock=0xffff0000098dee48 <cgroup_mutex>) at kernel/locking/mutex.c:241
> #5  0xffff000008f40978 in kallsyms_token_index ()
>
> ffff000008bda050 <mutex_lock>:
> ffff000008bda050:       a9bf7bfd        stp     x29, x30, [sp, #-16]!
> ffff000008bda054:       aa0003e3        mov     x3, x0
> ffff000008bda058:       d5384102        mrs     x2, sp_el0
> ffff000008bda05c:       910003fd        mov     x29, sp
> ffff000008bda060:       d2800001        mov     x1, #0x0                        // #0
> ffff000008bda064:       97ff85af        bl      ffff000008bbb720 <__ll_sc___cmpxchg_case_acq_8>
> ffff000008bda068:       d503201f        nop
> ffff000008bda06c:       d503201f        nop
> ffff000008bda070:       b50000c0        cbnz    x0, ffff000008bda088 <mutex_lock+0x38>
> ffff000008bda074:       a8c17bfd        ldp     x29, x30, [sp], #16
> ffff000008bda078:       910003f0        mov     x16, sp
> ffff000008bda07c:       9248fa1f        and     sp, x16, #0xff7fffffffffffff
> ffff000008bda080:       d65f03c0        ret
> ffff000008bda084:       d503201f        nop
> ffff000008bda088:       aa0303e0        mov     x0, x3
> ffff000008bda08c:       97ffffe7        bl      ffff000008bda028 <__mutex_lock_slowpath>
> ffff000008bda090:       910003fe        mov     x30, sp
> ffff000008bda094:       b24903df        orr     sp, x30, #0x80000000000000
> ffff000008bda098:       a8c17bfd        ldp     x29, x30, [sp], #16
> ffff000008bda09c:       910003f0        mov     x16, sp
> ffff000008bda0a0:       9248fa1f        and     sp, x16, #0xff7fffffffffffff
> ffff000008bda0a4:       d65f03c0        ret
>
> ffff000008bbb720 <__ll_sc___cmpxchg_case_acq_8>:
> ffff000008bbb720:       f9800011        prfm    pstl1strm, [x0]
> ffff000008bbb724:       c85ffc10        ldaxr   x16, [x0]
> ffff000008bbb728:       ca010211        eor     x17, x16, x1
> ffff000008bbb72c:       b5000071        cbnz    x17, ffff000008bbb738 <__ll_sc___cmpxchg_case_acq_8+0x18>
> ffff000008bbb730:       c8117c02        stxr    w17, x2, [x0]
> ffff000008bbb734:       35ffff91        cbnz    w17, ffff000008bbb724 <__ll_sc___cmpxchg_case_acq_8+0x4>
> ffff000008bbb738:       aa1003e0        mov     x0, x16
> ffff000008bbb73c:       910003f0        mov     x16, sp
> ffff000008bbb740:       9248fa1f        and     sp, x16, #0xff7fffffffffffff
> ffff000008bbb744:       d65f03c0        ret
>
> If I turn off CONFIG_ARM64_LSE_ATOMICS it works
>

Thanks Laura.

It is unlikely that this series will be resubmitted in a form that is
anywhere close to its current one, but this is a useful data point
nonetheless. Disregarding ll_sc_atomics.o is straightforward, and I am
glad to hear that it works without issue otherwise.
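
One plausible reading of Laura's disassembly (an interpretation, not
something spelled out in the thread) is that the fallback branch into the
out-of-line LL/SC helper is emitted from inline assembly, so the GCC plugin
never sees a call there and never adds the "set bit 55 again" sequence
after it, while the instrumented callee still clears the bit before its
ret, leaving SP invalid in the caller. A simplified sketch of the pattern
(not the exact kernel source) looks roughly like this:

#include <linux/atomic.h>
#include <asm/alternative.h>

/* Simplified sketch of the ARM64_LSE_ATOMIC_INSN pattern, not the exact
 * kernel macro: the bl below is just text inside an asm() alternative and
 * is therefore invisible to compiler-level instrumentation. */
static inline void atomic_add_sketch(int i, atomic_t *v)
{
	register int w0 asm ("w0") = i;
	register atomic_t *x1 asm ("x1") = v;

	asm volatile(
	ALTERNATIVE("bl	__ll_sc_atomic_add",	/* LL/SC fallback call */
		    "stadd	%w[i], %[v]",	/* LSE fast path */
		    ARM64_HAS_LSE_ATOMICS)
	: [i] "+r" (w0), [v] "+Q" (v->counter)
	: "r" (x1)
	: "x16", "x17", "x30");
}
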

-- 
Ard.

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2018-08-20  6:30 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-02 13:21 [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation Ard Biesheuvel
2018-08-02 13:21 ` Ard Biesheuvel
2018-08-02 13:21 ` [RFC/PoC PATCH 1/3] arm64: use wrapper macro for bl/blx instructions from asm code Ard Biesheuvel
2018-08-02 13:21   ` Ard Biesheuvel
2018-08-02 13:21 ` [RFC/PoC PATCH 2/3] gcc: plugins: add ROP shield plugin for arm64 Ard Biesheuvel
2018-08-02 13:21   ` Ard Biesheuvel
2018-08-02 13:21 ` [RFC/PoC PATCH 3/3] arm64: enable ROP protection by clearing SP bit #55 across function returns Ard Biesheuvel
2018-08-02 13:21   ` Ard Biesheuvel
2018-08-06 10:07 ` [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation Florian Weimer
2018-08-06 10:07   ` Florian Weimer
2018-08-06 10:31   ` Ard Biesheuvel
2018-08-06 10:31     ` Ard Biesheuvel
2018-08-06 13:55 ` Robin Murphy
2018-08-06 13:55   ` Robin Murphy
2018-08-06 14:04   ` Ard Biesheuvel
2018-08-06 14:04     ` Ard Biesheuvel
2018-08-06 15:20     ` Ard Biesheuvel
2018-08-06 15:20       ` Ard Biesheuvel
2018-08-06 15:38     ` Robin Murphy
2018-08-06 15:38       ` Robin Murphy
2018-08-06 15:50       ` Ard Biesheuvel
2018-08-06 15:50         ` Ard Biesheuvel
2018-08-06 16:04         ` Ard Biesheuvel
2018-08-06 16:04           ` Ard Biesheuvel
2018-08-06 17:45           ` Robin Murphy
2018-08-06 17:45             ` Robin Murphy
2018-08-06 18:49             ` Kees Cook
2018-08-06 18:49               ` Kees Cook
2018-08-06 19:35               ` Ard Biesheuvel
2018-08-06 19:35                 ` Ard Biesheuvel
2018-08-06 19:50                 ` Kees Cook
2018-08-06 19:50                   ` Kees Cook
2018-08-06 19:54                   ` Ard Biesheuvel
2018-08-06 19:54                     ` Ard Biesheuvel
2018-08-07  3:05                     ` Mark Brand
2018-08-07  9:21                       ` Ard Biesheuvel
2018-08-07  9:21                         ` Ard Biesheuvel
2018-08-08 16:09                         ` Mark Brand
2018-08-08 16:09                           ` Mark Brand
2018-08-08 22:02                           ` Kees Cook
2018-08-08 22:02                             ` Kees Cook
2018-08-13  7:39                           ` Ard Biesheuvel
2018-08-18  1:27 ` Laura Abbott
2018-08-18  1:27   ` Laura Abbott
2018-08-20  6:30   ` Ard Biesheuvel
2018-08-20  6:30     ` Ard Biesheuvel
