All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it
@ 2018-02-16 18:10 Alexander Popov
  2018-02-16 18:10 ` [PATCH RFC v8 1/6] x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls Alexander Popov
                   ` (7 more replies)
  0 siblings, 8 replies; 61+ messages in thread
From: Alexander Popov @ 2018-02-16 18:10 UTC (permalink / raw)
  To: kernel-hardening, Kees Cook, PaX Team, Brad Spengler,
	Ingo Molnar, Andy Lutomirski, Tycho Andersen, Laura Abbott,
	Mark Rutland, Ard Biesheuvel, Borislav Petkov, Thomas Gleixner,
	H . Peter Anvin, Peter Zijlstra, Dmitry V . Levin, x86,
	alex.popov

This is the 8th version of the patch series introducing STACKLEAK to the
mainline kernel. I've made some minor improvements while waiting for the
next review by x86 maintainers.

STACKLEAK is a security feature developed by Grsecurity/PaX (kudos to them),
which:
 - reduces the information that can be revealed through kernel stack leak bugs;
 - blocks some uninitialized stack variable attacks (e.g. CVE-2010-2963);
 - introduces some runtime checks for kernel stack overflow detection.

STACKLEAK instrumentation statistics
====================================

These numbers were obtained for the 4th version of the patch series.

Size of vmlinux (x86_64_defconfig):
 file size:
  - STACKLEAK disabled: 35014784 bytes
  - STACKLEAK enabled: 35044952 bytes (+0.086%)
 .text section size (calculated by size utility):
  - STACKLEAK disabled: 10752983
  - STACKLEAK enabled: 11062221 (+2.876%)

The readelf utility shows 45602 functions in vmlinux. The STACKLEAK gcc plugin
inserted 36 check_alloca() calls and 1265 track_stack() calls (42274 calls are
inserted during GIMPLE pass and 41009 calls are deleted during RTL pass).
So 2.853% of functions are instrumented.

STACKLEAK performance impact
============================

The STACKLEAK description in Kconfig includes:
"The tradeoff is the performance impact: on a single CPU system kernel
compilation sees a 1% slowdown, other systems and workloads may vary and you are
advised to test this feature on your expected workload before deploying it".

Here are the results of a brief performance test on x86_64 (for the 2nd version
of the patch). The numbers are very different because the STACKLEAK performance
penalty depends on the workloads.

Hardware: Intel Core i7-4770, 16 GB RAM

Test #1: building the Linux kernel with Ubuntu config (time make -j9)
Result on 4.11-rc8:
  real	32m14.893s
  user	237m30.886s
  sys	11m12.273s
Result on 4.11-rc8+stackleak:
  real	32m26.881s (+0.62%)
  user	238m38.926s (+0.48%)
  sys	11m36.426s (+3.59%)

Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P
Average result on 4.11-rc8: 8.71s
Average result on 4.11-rc8+stackleak: 9.08s (+4.29%)

Changelog
=========

Changes in v8

1. Renamed the STACKLEAK tests to STACKLEAK_ALLOCA and STACKLEAK_DEEP_RECURSION.
   Fixed some obsolete comments there.

2. Added the comments describing stack erasing on x86_32, because it differs
   a lot from one on x86_64 since v7.

Changes in v7

1. Improved erase_kstack() on x86_64. Now it detects which stack we are
   currently using. If we are on the thread stack, erase_kstack() writes the
   poison value up to the stack pointer. If we are not on the thread stack (we
   are on the trampoline stack introduced by Andy Lutomirski), erase_kstack()
   writes the poison value up to the cpu_current_top_of_stack.

   N.B. Right now there are two situations when erase_kstack() is called
   on the thread stack:
    - from sysret32_from_system_call in entry_64_compat.S;
    - from syscall_trace_enter() (see the 3rd patch of this series).

2. Introduced STACKLEAK_POISON_CHECK_DEPTH macro and BUILD_BUG_ON() checking
   its consistency with CONFIG_STACKLEAK_TRACK_MIN_SIZE.

3. Added "disable" option for the STACKLEAK gcc plugin (asked by Laura Abbott).

4. Improved CONFIG_STACKLEAK_METRICS. Now /proc/<pid>/stack_depth shows
   the maximum kernel stack consumption for the current and previous syscalls.
   Although this information is not precise, it can be useful for estimating
   the STACKLEAK performance impact for different workloads.

5. Removed redundant erase_kstack() calls from syscall_trace_enter().
   There is no need to erase the stack in syscall_trace_enter() when it
   returns -1 (thanks to Dmitry Levin for noticing that).

6. Introduced MIN_STACK_LEFT to get rid of hardcoded numbers in check_alloca().

Changes in v6

1. Examined syscall entry/exit paths.
    - Added missing erase_kstack() call at ret_from_fork() for x86_32.
    - Added missing erase_kstack() call at syscall_trace_enter().
    - Solved questions previously marked with TODO.

2. Rebased onto v4.15-rc2, which includes Andy Lutomirski's entry changes.
   Andy removed sp0 from thread_struct for x86_64, which was the only issue
   during rebasing.

3. Removed the recursive BUG() in track_stack() that was caused by the code
   instrumentation. Instead, CONFIG_GCC_PLUGIN_STACKLEAK now implies
   CONFIG_VMAP_STACK and CONFIG_SCHED_STACK_END_CHECK, which seems to be
   an optimal solution.

4. Put stack erasing in syscall_trace_enter() into a separate patch and
   fixed my mistake with secure_computing() (found by Tycho Andersen).

5. After some experiments, kept the asm implementation of erase_kstack(),
   because it gives a full control over the stack for clearing it neatly
   and doesn't offend KASAN.

6. Improved the comments describing STACKLEAK.

Changes in v5 (mostly according to the feedback from Ingo Molnar)

1. Introduced the CONFIG_STACKLEAK_METRICS providing STACKLEAK information
   about tasks via the /proc file system. That information can be useful for
   estimating the STACKLEAK performance impact for different workloads.
   In particular, /proc/<pid>/lowest_stack shows the current lowest_stack
   value and its final value from the previous syscall.

2. Introduced a single check_alloca() implementation working for both
   x86_64 and x86_32.

3. Fixed coding style issues and did some refactoring in the STACKLEAK
   gcc plugin.

4. Put the gcc plugin and the kernel stack erasing into separate (working)
   patches.

Changes in v4

1. Introduced the CONFIG_STACKLEAK_TRACK_MIN_SIZE parameter instead of
   hard-coded track-lowest-sp.

2. Carefully looked into the assertions in track_stack() and check_alloca().
    - Fixed the incorrect BUG() condition in track_stack(), thanks to Tycho
       Andersen. Also disabled that check if CONFIG_VMAP_STACK is enabled.
    - Fixed the surplus and erroneous code for calculating stack_left in
       check_alloca() on x86_64. That code repeats the work which is already
       done in get_stack_info() and it misses the fact that different
       exception stacks on x86_64 have different size.

3. Introduced two lkdtm tests for the STACKLEAK feature (developed together
   with Tycho Andersen).

4. Added info about STACKLEAK to Documentation/security/self-protection.rst.

5. Improved the comments.

6. Decided not to change "unsigned long sp = (unsigned long)&sp" to
   current_stack_pointer. The original variant is more platform independent
   since current_stack_pointer has different type on x86 and arm.

Changes in v3

1. Added the detailed comments describing the STACKLEAK gcc plugin.
   Read the plugin from the bottom up, like you do for Linux kernel drivers.
   Additional information:
    - gcc internals documentation available from gcc sources;
    - wonderful slides by Diego Novillo
       https://www.airs.com/dnovillo/200711-GCC-Internals/
    - nice introduction to gcc plugins at LWN
       https://lwn.net/Articles/457543/

2. Improved the commit message and Kconfig description according to the
   feedback from Kees Cook. Also added the original notice describing
   the performance impact of the STACKLEAK feature.

3. Removed the arch-specific ix86_cmodel check in stackleak_track_stack_gate().
   It caused skipping the kernel code instrumentation for i386.

4. Fixed a minor mistake in stackleak_tree_instrument_execute().
   First versions of the plugin used ENTRY_BLOCK_PTR_FOR_FN(cfun)->next_bb
   to get the basic block with the function prologue. That was not correct
   since the control flow graph edge from ENTRY_BLOCK_PTR doesn't always
   go to that basic block.

   So later it was changed to single_succ(ENTRY_BLOCK_PTR_FOR_FN(cfun)),
   but not completely. next_bb was still used for entry_bb assignment,
   which could cause the wrong value of the prologue_instrumented variable.

   I've reported this issue to Grsecurity and proposed the fix for it, but
   unfortunately didn't get any reply.

5. Introduced the STACKLEAK_POISON macro and renamed the config option
   according to the feedback from Kees Cook.

Ideas for further work
======================

 - Think of erasing stack on the way out of exception handlers (idea by
   Andy Lutomirski).
 - Think of erasing the kernel stack after invoking EFI runtime services
   (idea by Mark Rutland).
 - Try to port STACKLEAK to arm64 (Laura Abbott is working on it).


Alexander Popov (6):
  x86/entry: Add STACKLEAK erasing the kernel stack at the end of
    syscalls
  gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack
  x86/entry: Erase kernel stack in syscall_trace_enter()
  lkdtm: Add a test for STACKLEAK
  fs/proc: Show STACKLEAK metrics in the /proc file system
  doc: self-protection: Add information about STACKLEAK feature

 Documentation/security/self-protection.rst |  23 +-
 arch/Kconfig                               |  53 ++++
 arch/x86/Kconfig                           |   1 +
 arch/x86/entry/common.c                    |   7 +
 arch/x86/entry/entry_32.S                  |  93 ++++++
 arch/x86/entry/entry_64.S                  | 113 +++++++
 arch/x86/entry/entry_64_compat.S           |  11 +
 arch/x86/include/asm/processor.h           |   7 +
 arch/x86/kernel/asm-offsets.c              |  11 +
 arch/x86/kernel/dumpstack.c                |  20 ++
 arch/x86/kernel/process_32.c               |   8 +
 arch/x86/kernel/process_64.c               |   8 +
 drivers/misc/Makefile                      |   3 +
 drivers/misc/lkdtm.h                       |   4 +
 drivers/misc/lkdtm_core.c                  |   2 +
 drivers/misc/lkdtm_stackleak.c             | 136 +++++++++
 fs/exec.c                                  |  33 ++
 fs/proc/base.c                             |  18 ++
 include/linux/compiler.h                   |   6 +
 scripts/Makefile.gcc-plugins               |   3 +
 scripts/gcc-plugins/stackleak_plugin.c     | 471 +++++++++++++++++++++++++++++
 21 files changed, 1022 insertions(+), 9 deletions(-)
 create mode 100644 drivers/misc/lkdtm_stackleak.c
 create mode 100644 scripts/gcc-plugins/stackleak_plugin.c

-- 
2.7.4

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH RFC v8 1/6] x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
  2018-02-16 18:10 [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it Alexander Popov
@ 2018-02-16 18:10 ` Alexander Popov
  2018-02-21 13:24   ` Borislav Petkov
  2018-02-16 18:10 ` [PATCH RFC v8 2/6] gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack Alexander Popov
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 61+ messages in thread
From: Alexander Popov @ 2018-02-16 18:10 UTC (permalink / raw)
  To: kernel-hardening, Kees Cook, PaX Team, Brad Spengler,
	Ingo Molnar, Andy Lutomirski, Tycho Andersen, Laura Abbott,
	Mark Rutland, Ard Biesheuvel, Borislav Petkov, Thomas Gleixner,
	H . Peter Anvin, Peter Zijlstra, Dmitry V . Levin, x86,
	alex.popov

The STACKLEAK feature erases the kernel stack before returning from
syscalls. That reduces the information which kernel stack leak bugs can
reveal and blocks some uninitialized stack variable attacks. Moreover,
STACKLEAK provides runtime checks for kernel stack overflow detection.

This commit introduces the architecture-specific code filling the used
part of the kernel stack with a poison value before returning to the
userspace. Full STACKLEAK feature also contains the gcc plugin which
comes in a separate commit.

The STACKLEAK feature is ported from grsecurity/PaX. More information at:
  https://grsecurity.net/
  https://pax.grsecurity.net/

This code is modified from Brad Spengler/PaX Team's code in the last
public patch of grsecurity/PaX based on our understanding of the code.
Changes or omissions from the original code are ours and don't reflect
the original grsecurity/PaX code.

Signed-off-by: Alexander Popov <alex.popov@linux.com>
---
 arch/Kconfig                     |  27 ++++++++++
 arch/x86/Kconfig                 |   1 +
 arch/x86/entry/entry_32.S        |  89 ++++++++++++++++++++++++++++++++
 arch/x86/entry/entry_64.S        | 109 +++++++++++++++++++++++++++++++++++++++
 arch/x86/entry/entry_64_compat.S |  11 ++++
 arch/x86/include/asm/processor.h |   4 ++
 arch/x86/kernel/asm-offsets.c    |   8 +++
 arch/x86/kernel/process_32.c     |   5 ++
 arch/x86/kernel/process_64.c     |   5 ++
 include/linux/compiler.h         |   6 +++
 10 files changed, 265 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index 76c0b54..368e2fb 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -401,6 +401,13 @@ config SECCOMP_FILTER
 
 	  See Documentation/prctl/seccomp_filter.txt for details.
 
+config HAVE_ARCH_STACKLEAK
+	bool
+	help
+	  An architecture should select this if it has the code which
+	  fills the used part of the kernel stack with the STACKLEAK_POISON
+	  value before returning from system calls.
+
 config HAVE_GCC_PLUGINS
 	bool
 	help
@@ -531,6 +538,26 @@ config GCC_PLUGIN_RANDSTRUCT_PERFORMANCE
 	  in structures.  This reduces the performance hit of RANDSTRUCT
 	  at the cost of weakened randomization.
 
+config GCC_PLUGIN_STACKLEAK
+	bool "Erase the kernel stack before returning from syscalls"
+	depends on GCC_PLUGINS
+	depends on HAVE_ARCH_STACKLEAK
+	help
+	  This option makes the kernel erase the kernel stack before it
+	  returns from a system call. That reduces the information which
+	  kernel stack leak bugs can reveal and blocks some uninitialized
+	  stack variable attacks. This option also provides runtime checks
+	  for kernel stack overflow detection.
+
+	  The tradeoff is the performance impact: on a single CPU system kernel
+	  compilation sees a 1% slowdown, other systems and workloads may vary
+	  and you are advised to test this feature on your expected workload
+	  before deploying it.
+
+	  This plugin was ported from grsecurity/PaX. More information at:
+	   * https://grsecurity.net/
+	   * https://pax.grsecurity.net/
+
 config HAVE_CC_STACKPROTECTOR
 	bool
 	help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 63bf349..62748bd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -119,6 +119,7 @@ config X86
 	select HAVE_ARCH_COMPAT_MMAP_BASES	if MMU && COMPAT
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
+	select HAVE_ARCH_STACKLEAK
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if X86_64
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 16c2c02..4a7365a 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -77,6 +77,90 @@
 #endif
 .endm
 
+.macro erase_kstack
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+	call erase_kstack
+#endif
+.endm
+
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+ENTRY(erase_kstack)
+	pushl	%edi
+	pushl	%ecx
+	pushl	%eax
+	pushl	%ebp
+
+	movl	PER_CPU_VAR(current_task), %ebp
+	mov	TASK_lowest_stack(%ebp), %edi
+	mov	$STACKLEAK_POISON, %eax
+	std
+
+	/*
+	 * Let's search for the poison value in the stack.
+	 * Start from the lowest_stack and go to the bottom (see std above).
+	 */
+1:
+	mov	%edi, %ecx
+	and	$THREAD_SIZE_asm - 1, %ecx
+	shr	$2, %ecx
+	repne	scasl
+	jecxz	2f	/* Didn't find it. Go to poisoning. */
+
+	/*
+	 * Found the poison value in the stack. Go to poisoning if there is
+	 * not enough space left for the poison check.
+	 */
+	cmp	$STACKLEAK_POISON_CHECK_DEPTH / 4, %ecx
+	jc	2f
+
+	/*
+	 * Check that some further dwords contain poison. If so, the part
+	 * of the stack below the address in %edi is likely to be poisoned.
+	 * Otherwise we need to search deeper.
+	 */
+	mov	$STACKLEAK_POISON_CHECK_DEPTH / 4, %ecx
+	repe	scasl
+	jecxz	2f	/* Poison the upper part of the stack */
+	jne	1b	/* Search deeper */
+
+2:
+	/*
+	 * Prepare the counter for poisoning the kernel stack between
+	 * %edi and %esp. Two dwords at the bottom of the stack are reserved
+	 * and should not be poisoned (see CONFIG_SCHED_STACK_END_CHECK).
+	 */
+	or	$2 * 4, %edi
+	cld
+	mov	%esp, %ecx
+	sub	%edi, %ecx
+
+	/* Check that the counter value is sane. */
+	cmp	$THREAD_SIZE_asm, %ecx
+	jb	3f
+	ud2
+
+3:
+	/*
+	 * So let's write the poison value to the kernel stack. Start from the
+	 * address in %edi and move up (see cld above) to the address in %esp
+	 * (not included, used memory).
+	 */
+	shr	$2, %ecx
+	rep	stosl
+
+	/* Set the lowest_stack value to the top_of_stack - 128 */
+	movl	PER_CPU_VAR(cpu_current_top_of_stack), %edi
+	sub	$128, %edi
+	mov	%edi, TASK_lowest_stack(%ebp)
+
+	popl	%ebp
+	popl	%eax
+	popl	%ecx
+	popl	%edi
+	ret
+ENDPROC(erase_kstack)
+#endif
+
 /*
  * User gs save/restore
  *
@@ -299,6 +383,7 @@ ENTRY(ret_from_fork)
 	/* When we fork, we trace the syscall return in the child, too. */
 	movl    %esp, %eax
 	call    syscall_return_slowpath
+	erase_kstack
 	jmp     restore_all
 
 	/* kernel thread */
@@ -459,6 +544,8 @@ ENTRY(entry_SYSENTER_32)
 	ALTERNATIVE "testl %eax, %eax; jz .Lsyscall_32_done", \
 		    "jmp .Lsyscall_32_done", X86_FEATURE_XENPV
 
+	erase_kstack
+
 /* Opportunistic SYSEXIT */
 	TRACE_IRQS_ON			/* User mode traces as IRQs on. */
 	movl	PT_EIP(%esp), %edx	/* pt_regs->ip */
@@ -545,6 +632,8 @@ ENTRY(entry_INT80_32)
 	call	do_int80_syscall_32
 .Lsyscall_32_done:
 
+	erase_kstack
+
 restore_all:
 	TRACE_IRQS_IRET
 .Lrestore_all_notrace:
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 30c8c53..6863af8 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -66,6 +66,112 @@ END(native_usergs_sysret64)
 	TRACE_IRQS_FLAGS EFLAGS(%rsp)
 .endm
 
+.macro erase_kstack
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+	call erase_kstack
+#endif
+.endm
+
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+ENTRY(erase_kstack)
+	pushq	%rdi
+	pushq	%rcx
+	pushq	%rax
+	pushq	%r11
+
+	mov	PER_CPU_VAR(current_task), %r11
+	mov	TASK_lowest_stack(%r11), %rdi
+	mov	$STACKLEAK_POISON, %rax
+	std
+
+	/*
+	 * Let's search for the poison value in the stack.
+	 * Start from the lowest_stack and go to the bottom (see std above).
+	 */
+1:
+	mov	%edi, %ecx
+	and	$THREAD_SIZE_asm - 1, %ecx
+	shr	$3, %ecx
+	repne	scasq
+	jecxz	2f	/* Didn't find it. Go to poisoning. */
+
+	/*
+	 * Found the poison value in the stack. Go to poisoning if there is
+	 * not enough space left for the poison check.
+	 */
+	cmp	$STACKLEAK_POISON_CHECK_DEPTH / 8, %ecx
+	jb	2f
+
+	/*
+	 * Check that some further qwords contain poison. If so, the part
+	 * of the stack below the address in %rdi is likely to be poisoned.
+	 * Otherwise we need to search deeper.
+	 */
+	mov	$STACKLEAK_POISON_CHECK_DEPTH / 8, %ecx
+	repe	scasq
+	jecxz	2f	/* Poison the upper part of the stack */
+	jne	1b	/* Search deeper */
+
+2:
+	/*
+	 * Two qwords at the bottom of the thread stack are reserved and
+	 * should not be poisoned (see CONFIG_SCHED_STACK_END_CHECK).
+	 */
+	or	$2 * 8, %rdi
+
+	/*
+	 * Check whether we are on the thread stack to prepare the counter
+	 * for stack poisoning.
+	 */
+	mov	PER_CPU_VAR(cpu_current_top_of_stack), %rcx
+	sub	%rsp, %rcx
+	cmp	$THREAD_SIZE_asm, %rcx
+	jb	3f
+
+	/*
+	 * We are not on the thread stack, so we can write poison between
+	 * the address in %rdi and the stack top.
+	 */
+	mov	PER_CPU_VAR(cpu_current_top_of_stack), %rcx
+	sub	%rdi, %rcx
+	jmp	4f
+
+3:
+	/*
+	 * We are on the thread stack, so we can write poison between the
+	 * address in %rdi and the address in %rsp (not included, used memory).
+	 */
+	mov	%rsp, %rcx
+	sub	%rdi, %rcx
+
+4:
+	/* Check that the counter value is sane */
+	cmp	$THREAD_SIZE_asm, %rcx
+	jb	5f
+	ud2
+
+5:
+	/*
+	 * So let's write the poison value to the kernel stack. Start from the
+	 * address in %rdi and move up (see cld).
+	 */
+	cld
+	shr	$3, %ecx
+	rep	stosq
+
+	/* Set the lowest_stack value to the top_of_stack - 256 */
+	mov	PER_CPU_VAR(cpu_current_top_of_stack), %rdi
+	sub	$256, %rdi
+	mov	%rdi, TASK_lowest_stack(%r11)
+
+	popq	%r11
+	popq	%rax
+	popq	%rcx
+	popq	%rdi
+	ret
+ENDPROC(erase_kstack)
+#endif
+
 /*
  * When dynamic function tracer is enabled it will add a breakpoint
  * to all locations that it is about to modify, sync CPUs, update
@@ -345,6 +451,8 @@ syscall_return_via_sysret:
 	 * We are on the trampoline stack.  All regs except RDI are live.
 	 * We can do future final exit work right here.
 	 */
+	erase_kstack
+
 	SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
 
 	popq	%rdi
@@ -653,6 +761,7 @@ GLOBAL(swapgs_restore_regs_and_return_to_usermode)
 	 * We are on the trampoline stack.  All regs except RDI are live.
 	 * We can do future final exit work right here.
 	 */
+	erase_kstack
 
 	SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
 
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 98d5358..727dde2 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -19,6 +19,12 @@
 
 	.section .entry.text, "ax"
 
+	.macro erase_kstack
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+	call erase_kstack
+#endif
+	.endm
+
 /*
  * 32-bit SYSENTER entry.
  *
@@ -238,6 +244,11 @@ GLOBAL(entry_SYSCALL_compat_after_hwframe)
 
 	/* Opportunistic SYSRET */
 sysret32_from_system_call:
+	/*
+	 * We are not going to return to the userspace from the trampoline
+	 * stack. So let's erase the thread stack right now.
+	 */
+	erase_kstack
 	TRACE_IRQS_ON			/* User mode traces as IRQs on. */
 	movq	RBX(%rsp), %rbx		/* pt_regs->rbx */
 	movq	RBP(%rsp), %rbp		/* pt_regs->rbp */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 793bae7..5fd8ae1 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -494,6 +494,10 @@ struct thread_struct {
 
 	mm_segment_t		addr_limit;
 
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+	unsigned long		lowest_stack;
+#endif
+
 	unsigned int		sig_on_uaccess_err:1;
 	unsigned int		uaccess_err:1;	/* uaccess failed */
 
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 76417a9..ef5d260 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -39,6 +39,9 @@ void common(void) {
 	BLANK();
 	OFFSET(TASK_TI_flags, task_struct, thread_info.flags);
 	OFFSET(TASK_addr_limit, task_struct, thread.addr_limit);
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+	OFFSET(TASK_lowest_stack, task_struct, thread.lowest_stack);
+#endif
 
 	BLANK();
 	OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx);
@@ -75,6 +78,11 @@ void common(void) {
 	OFFSET(PV_MMU_read_cr2, pv_mmu_ops, read_cr2);
 #endif
 
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+	BLANK();
+	DEFINE(THREAD_SIZE_asm, THREAD_SIZE);
+#endif
+
 #ifdef CONFIG_XEN
 	BLANK();
 	OFFSET(XEN_vcpu_info_mask, vcpu_info, evtchn_upcall_mask);
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 5224c60..6d256ab 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -136,6 +136,11 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	p->thread.sp0 = (unsigned long) (childregs+1);
 	memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
 
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+	p->thread.lowest_stack = (unsigned long)task_stack_page(p) +
+						2 * sizeof(unsigned long);
+#endif
+
 	if (unlikely(p->flags & PF_KTHREAD)) {
 		/* kernel thread */
 		memset(childregs, 0, sizeof(struct pt_regs));
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 9eb448c..6dc55f6 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -281,6 +281,11 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	p->thread.sp = (unsigned long) fork_frame;
 	p->thread.io_bitmap_ptr = NULL;
 
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+	p->thread.lowest_stack = (unsigned long)task_stack_page(p) +
+						2 * sizeof(unsigned long);
+#endif
+
 	savesegment(gs, p->thread.gsindex);
 	p->thread.gsbase = p->thread.gsindex ? 0 : me->thread.gsbase;
 	savesegment(fs, p->thread.fsindex);
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index e835fc0..fe04f60 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -337,4 +337,10 @@ unsigned long read_word_at_a_time(const void *addr)
 	compiletime_assert(__native_word(t),				\
 		"Need native word sized stores/loads for atomicity.")
 
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+/* Poison value points to the unused hole in the virtual memory map */
+# define STACKLEAK_POISON -0xBEEF
+# define STACKLEAK_POISON_CHECK_DEPTH 128
+#endif
+
 #endif /* __LINUX_COMPILER_H */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v8 2/6] gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack
  2018-02-16 18:10 [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it Alexander Popov
  2018-02-16 18:10 ` [PATCH RFC v8 1/6] x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls Alexander Popov
@ 2018-02-16 18:10 ` Alexander Popov
  2018-02-16 18:10 ` [PATCH RFC v8 3/6] x86/entry: Erase kernel stack in syscall_trace_enter() Alexander Popov
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 61+ messages in thread
From: Alexander Popov @ 2018-02-16 18:10 UTC (permalink / raw)
  To: kernel-hardening, Kees Cook, PaX Team, Brad Spengler,
	Ingo Molnar, Andy Lutomirski, Tycho Andersen, Laura Abbott,
	Mark Rutland, Ard Biesheuvel, Borislav Petkov, Thomas Gleixner,
	H . Peter Anvin, Peter Zijlstra, Dmitry V . Levin, x86,
	alex.popov

The STACKLEAK feature erases the kernel stack before returning from
syscalls. That reduces the information which kernel stack leak bugs can
reveal and blocks some uninitialized stack variable attacks. Moreover,
STACKLEAK provides runtime checks for kernel stack overflow detection.

This commit introduces the STACKLEAK gcc plugin. It is needed for:
 - tracking the lowest border of the kernel stack, which is important
    for the code erasing the used part of the kernel stack at the end
    of syscalls (comes in a separate commit);
 - checking that alloca calls don't cause stack overflow.

So this plugin instruments the kernel code inserting:
 - the check_alloca() call before alloca and the track_stack() call
    after it;
 - the track_stack() call for the functions with a stack frame size
    greater than or equal to CONFIG_STACKLEAK_TRACK_MIN_SIZE.

The STACKLEAK feature is ported from grsecurity/PaX. More information at:
  https://grsecurity.net/
  https://pax.grsecurity.net/

This code is modified from Brad Spengler/PaX Team's code in the last
public patch of grsecurity/PaX based on our understanding of the code.
Changes or omissions from the original code are ours and don't reflect
the original grsecurity/PaX code.

Signed-off-by: Alexander Popov <alex.popov@linux.com>
---
 arch/Kconfig                           |  14 +
 arch/x86/kernel/dumpstack.c            |  20 ++
 fs/exec.c                              |  33 +++
 scripts/Makefile.gcc-plugins           |   3 +
 scripts/gcc-plugins/stackleak_plugin.c | 471 +++++++++++++++++++++++++++++++++
 5 files changed, 541 insertions(+)
 create mode 100644 scripts/gcc-plugins/stackleak_plugin.c

diff --git a/arch/Kconfig b/arch/Kconfig
index 368e2fb..a4a8fba 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -542,6 +542,8 @@ config GCC_PLUGIN_STACKLEAK
 	bool "Erase the kernel stack before returning from syscalls"
 	depends on GCC_PLUGINS
 	depends on HAVE_ARCH_STACKLEAK
+	imply VMAP_STACK
+	imply SCHED_STACK_END_CHECK
 	help
 	  This option makes the kernel erase the kernel stack before it
 	  returns from a system call. That reduces the information which
@@ -558,6 +560,18 @@ config GCC_PLUGIN_STACKLEAK
 	   * https://grsecurity.net/
 	   * https://pax.grsecurity.net/
 
+config STACKLEAK_TRACK_MIN_SIZE
+	int "Minimum stack frame size of functions tracked by STACKLEAK"
+	default 100
+	range 0 4096
+	depends on GCC_PLUGIN_STACKLEAK
+	help
+	  The STACKLEAK gcc plugin instruments the kernel code for tracking
+	  the lowest border of the kernel stack (and for some other purposes).
+	  It inserts the track_stack() call for the functions with a stack
+	  frame size greater than or equal to this parameter.
+	  If unsure, leave the default value 100.
+
 config HAVE_CC_STACKPROTECTOR
 	bool
 	help
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index a2d8a39..b53163f 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -375,3 +375,23 @@ static int __init code_bytes_setup(char *s)
 	return 1;
 }
 __setup("code_bytes=", code_bytes_setup);
+
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+
+#define MIN_STACK_LEFT 256
+
+void __used check_alloca(unsigned long size)
+{
+	unsigned long sp = (unsigned long)&sp;
+	struct stack_info stack_info = {0};
+	unsigned long visit_mask = 0;
+	unsigned long stack_left;
+
+	BUG_ON(get_stack_info(&sp, current, &stack_info, &visit_mask));
+
+	stack_left = sp - (unsigned long)stack_info.begin;
+	BUG_ON(stack_left < MIN_STACK_LEFT ||
+				size >= stack_left - MIN_STACK_LEFT);
+}
+EXPORT_SYMBOL(check_alloca);
+#endif
diff --git a/fs/exec.c b/fs/exec.c
index 7eb8d21..9865612 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1962,3 +1962,36 @@ COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
 				  argv, envp, flags);
 }
 #endif
+
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+void __used track_stack(void)
+{
+	/*
+	 * N.B. The arch-specific part of the STACKLEAK feature fills the
+	 * kernel stack with the poison value, which has the register width.
+	 * That code assumes that the value of thread.lowest_stack is aligned
+	 * on the register width boundary.
+	 *
+	 * That is true for x86 and x86_64 because of the kernel stack
+	 * alignment on these platforms (for details, see cc_stack_align in
+	 * arch/x86/Makefile). Take care of that when you port STACKLEAK to
+	 * new platforms.
+	 */
+	unsigned long sp = (unsigned long)&sp;
+
+	/*
+	 * Having CONFIG_STACKLEAK_TRACK_MIN_SIZE larger than
+	 * STACKLEAK_POISON_CHECK_DEPTH makes the poison search in
+	 * erase_kstack() unreliable. Let's prevent that.
+	 */
+	BUILD_BUG_ON(CONFIG_STACKLEAK_TRACK_MIN_SIZE >
+						STACKLEAK_POISON_CHECK_DEPTH);
+
+	if (sp < current->thread.lowest_stack &&
+	    sp >= (unsigned long)task_stack_page(current) +
+					2 * sizeof(unsigned long)) {
+		current->thread.lowest_stack = sp;
+	}
+}
+EXPORT_SYMBOL(track_stack);
+#endif /* CONFIG_GCC_PLUGIN_STACKLEAK */
diff --git a/scripts/Makefile.gcc-plugins b/scripts/Makefile.gcc-plugins
index b2a95af..8d6070f 100644
--- a/scripts/Makefile.gcc-plugins
+++ b/scripts/Makefile.gcc-plugins
@@ -35,6 +35,9 @@ ifdef CONFIG_GCC_PLUGINS
   gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_RANDSTRUCT)	+= -DRANDSTRUCT_PLUGIN
   gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_RANDSTRUCT_PERFORMANCE)	+= -fplugin-arg-randomize_layout_plugin-performance-mode
 
+  gcc-plugin-$(CONFIG_GCC_PLUGIN_STACKLEAK)	+= stackleak_plugin.so
+  gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_STACKLEAK)	+= -DSTACKLEAK_PLUGIN -fplugin-arg-stackleak_plugin-track-min-size=$(CONFIG_STACKLEAK_TRACK_MIN_SIZE)
+
   GCC_PLUGINS_CFLAGS := $(strip $(addprefix -fplugin=$(objtree)/scripts/gcc-plugins/, $(gcc-plugin-y)) $(gcc-plugin-cflags-y))
 
   export PLUGINCC GCC_PLUGINS_CFLAGS GCC_PLUGIN GCC_PLUGIN_SUBDIR
diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
new file mode 100644
index 0000000..6fc991c
--- /dev/null
+++ b/scripts/gcc-plugins/stackleak_plugin.c
@@ -0,0 +1,471 @@
+/*
+ * Copyright 2011-2017 by the PaX Team <pageexec@freemail.hu>
+ * Modified by Alexander Popov <alex.popov@linux.com>
+ * Licensed under the GPL v2
+ *
+ * Note: the choice of the license means that the compilation process is
+ * NOT 'eligible' as defined by gcc's library exception to the GPL v3,
+ * but for the kernel it doesn't matter since it doesn't link against
+ * any of the gcc libraries
+ *
+ * This gcc plugin is needed for tracking the lowest border of the kernel stack
+ * and checking that alloca calls don't cause stack overflow. It instruments
+ * the kernel code inserting:
+ *  - the check_alloca() call before alloca and the track_stack() call after it;
+ *  - the track_stack() call for the functions with a stack frame size greater
+ *     than or equal to the "track-min-size" plugin parameter.
+ *
+ * This plugin is ported from grsecurity/PaX. For more information see:
+ *   https://grsecurity.net/
+ *   https://pax.grsecurity.net/
+ *
+ * Debugging:
+ *  - use fprintf() to stderr, debug_generic_expr(), debug_gimple_stmt()
+ *     and print_rtl();
+ *  - add "-fdump-tree-all -fdump-rtl-all" to the plugin CFLAGS in
+ *     Makefile.gcc-plugins to see the verbose dumps of the gcc passes;
+ *  - use gcc -E to understand the preprocessing shenanigans;
+ *  - use gcc with enabled CFG/GIMPLE/SSA verification (--enable-checking).
+ */
+
+#include "gcc-common.h"
+
+__visible int plugin_is_GPL_compatible;
+
+static int track_frame_size = -1;
+static const char track_function[] = "track_stack";
+static const char check_function[] = "check_alloca";
+
+/*
+ * Mark these global variables (roots) for gcc garbage collector since
+ * they point to the garbage-collected memory.
+ */
+static GTY(()) tree track_function_decl;
+static GTY(()) tree check_function_decl;
+
+static struct plugin_info stackleak_plugin_info = {
+	.version = "201707101337",
+	.help = "track-min-size=nn\ttrack stack for functions with a stack frame size >= nn bytes\n"
+		"disable\t\tdo not activate the plugin\n"
+};
+
+static void stackleak_check_alloca(gimple_stmt_iterator *gsi)
+{
+	gimple stmt;
+	gcall *check_alloca;
+	tree alloca_size;
+	cgraph_node_ptr node;
+	int frequency;
+	basic_block bb;
+
+	/* Insert call to void check_alloca(unsigned long size) */
+	alloca_size = gimple_call_arg(gsi_stmt(*gsi), 0);
+	stmt = gimple_build_call(check_function_decl, 1, alloca_size);
+	check_alloca = as_a_gcall(stmt);
+	gsi_insert_before(gsi, check_alloca, GSI_SAME_STMT);
+
+	/* Update the cgraph */
+	bb = gimple_bb(check_alloca);
+	node = cgraph_get_create_node(check_function_decl);
+	gcc_assert(node);
+	frequency = compute_call_stmt_bb_frequency(current_function_decl, bb);
+	cgraph_create_edge(cgraph_get_node(current_function_decl), node,
+			check_alloca, bb->count, frequency, bb->loop_depth);
+}
+
+static void stackleak_add_instrumentation(gimple_stmt_iterator *gsi, bool after)
+{
+	gimple stmt;
+	gcall *track_stack;
+	cgraph_node_ptr node;
+	int frequency;
+	basic_block bb;
+
+	/* Insert call to void track_stack(void) */
+	stmt = gimple_build_call(track_function_decl, 0);
+	track_stack = as_a_gcall(stmt);
+	if (after)
+		gsi_insert_after(gsi, track_stack, GSI_CONTINUE_LINKING);
+	else
+		gsi_insert_before(gsi, track_stack, GSI_SAME_STMT);
+
+	/* Update the cgraph */
+	bb = gimple_bb(track_stack);
+	node = cgraph_get_create_node(track_function_decl);
+	gcc_assert(node);
+	frequency = compute_call_stmt_bb_frequency(current_function_decl, bb);
+	cgraph_create_edge(cgraph_get_node(current_function_decl), node,
+			track_stack, bb->count, frequency, bb->loop_depth);
+}
+
+static bool is_alloca(gimple stmt)
+{
+	if (gimple_call_builtin_p(stmt, BUILT_IN_ALLOCA))
+		return true;
+
+#if BUILDING_GCC_VERSION >= 4007
+	if (gimple_call_builtin_p(stmt, BUILT_IN_ALLOCA_WITH_ALIGN))
+		return true;
+#endif
+
+	return false;
+}
+
+/*
+ * Work with the GIMPLE representation of the code.
+ * Insert the check_alloca() call before alloca and track_stack() call after
+ * it. Also insert track_stack() call into the beginning of the function
+ * if it is not instrumented.
+ */
+static unsigned int stackleak_tree_instrument_execute(void)
+{
+	basic_block bb, entry_bb;
+	bool prologue_instrumented = false, is_leaf = true;
+	gimple_stmt_iterator gsi;
+
+	/*
+	 * ENTRY_BLOCK_PTR is a basic block which represents possible entry
+	 * point of a function. This block does not contain any code and
+	 * has a CFG edge to its successor.
+	 */
+	gcc_assert(single_succ_p(ENTRY_BLOCK_PTR_FOR_FN(cfun)));
+	entry_bb = single_succ(ENTRY_BLOCK_PTR_FOR_FN(cfun));
+
+	/*
+	 * 1. Loop through the GIMPLE statements in each of cfun basic blocks.
+	 * cfun is a global variable which represents the function that is
+	 * currently processed.
+	 */
+	FOR_EACH_BB_FN(bb, cfun) {
+		for (gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
+			gimple stmt;
+
+			stmt = gsi_stmt(gsi);
+
+			/* Leaf function is a function which makes no calls */
+			if (is_gimple_call(stmt))
+				is_leaf = false;
+
+			if (!is_alloca(stmt))
+				continue;
+
+			/* 2. Insert stack overflow check before alloca call */
+			stackleak_check_alloca(&gsi);
+
+			/* 3. Insert track_stack() call after alloca call */
+			stackleak_add_instrumentation(&gsi, true);
+			if (bb == entry_bb)
+				prologue_instrumented = true;
+		}
+	}
+
+	if (prologue_instrumented)
+		return 0;
+
+	/*
+	 * Special cases to skip the instrumentation.
+	 *
+	 * Taking the address of static inline functions materializes them,
+	 * but we mustn't instrument some of them as the resulting stack
+	 * alignment required by the function call ABI will break other
+	 * assumptions regarding the expected (but not otherwise enforced)
+	 * register clobbering ABI.
+	 *
+	 * Case in point: native_save_fl on amd64 when optimized for size
+	 * clobbers rdx if it were instrumented here.
+	 *
+	 * TODO: any more special cases?
+	 */
+	if (is_leaf &&
+	    !TREE_PUBLIC(current_function_decl) &&
+	    DECL_DECLARED_INLINE_P(current_function_decl)) {
+		return 0;
+	}
+
+	if (is_leaf &&
+	    !strncmp(IDENTIFIER_POINTER(DECL_NAME(current_function_decl)),
+		     "_paravirt_", 10)) {
+		return 0;
+	}
+
+	/* 4. Insert track_stack() call at the function beginning */
+	bb = entry_bb;
+	if (!single_pred_p(bb)) {
+		/* gcc_assert(bb_loop_depth(bb) ||
+				(bb->flags & BB_IRREDUCIBLE_LOOP)); */
+		split_edge(single_succ_edge(ENTRY_BLOCK_PTR_FOR_FN(cfun)));
+		gcc_assert(single_succ_p(ENTRY_BLOCK_PTR_FOR_FN(cfun)));
+		bb = single_succ(ENTRY_BLOCK_PTR_FOR_FN(cfun));
+	}
+	gsi = gsi_after_labels(bb);
+	stackleak_add_instrumentation(&gsi, false);
+
+	return 0;
+}
+
+/*
+ * Work with the RTL representation of the code.
+ * Remove the unneeded track_stack() calls from the functions which don't
+ * call alloca and have the stack frame size less than track_frame_size.
+ */
+static unsigned int stackleak_final_execute(void)
+{
+	rtx_insn *insn, *next;
+
+	if (cfun->calls_alloca)
+		return 0;
+
+	if (get_frame_size() >= track_frame_size)
+		return 0;
+
+	/*
+	 * 1. Find track_stack() calls. Loop through the chain of insns,
+	 * which is an RTL representation of the code for a function.
+	 *
+	 * The example of a matching insn:
+	 *    (call_insn 8 4 10 2 (call (mem (symbol_ref ("track_stack")
+	 *    [flags 0x41] <function_decl 0x7f7cd3302a80 track_stack>)
+	 *    [0 track_stack S1 A8]) (0)) 675 {*call} (expr_list
+	 *    (symbol_ref ("track_stack") [flags 0x41] <function_decl
+	 *    0x7f7cd3302a80 track_stack>) (expr_list (0) (nil))) (nil))
+	 */
+	for (insn = get_insns(); insn; insn = next) {
+		rtx body;
+
+		next = NEXT_INSN(insn);
+
+		/* Check the expression code of the insn */
+		if (!CALL_P(insn))
+			continue;
+
+		/*
+		 * Check the expression code of the insn body, which is an RTL
+		 * Expression (RTX) describing the side effect performed by
+		 * that insn.
+		 */
+		body = PATTERN(insn);
+		if (GET_CODE(body) != CALL)
+			continue;
+
+		/*
+		 * Check the first operand of the call expression. It should
+		 * be a mem RTX describing the needed subroutine with a
+		 * symbol_ref RTX.
+		 */
+		body = XEXP(body, 0);
+		if (GET_CODE(body) != MEM)
+			continue;
+
+		body = XEXP(body, 0);
+		if (GET_CODE(body) != SYMBOL_REF)
+			continue;
+
+		if (SYMBOL_REF_DECL(body) != track_function_decl)
+			continue;
+
+		/* 2. Delete the track_stack() call */
+		delete_insn_and_edges(insn);
+#if BUILDING_GCC_VERSION >= 4007
+		if (GET_CODE(next) == NOTE &&
+		    NOTE_KIND(next) == NOTE_INSN_CALL_ARG_LOCATION) {
+			insn = next;
+			next = NEXT_INSN(insn);
+			delete_insn_and_edges(insn);
+		}
+#endif
+	}
+
+	/*
+	 * Uncomment the following to see the code which was cleaned at this
+	 * pass. It should not contain check_alloca() and track_stack() calls.
+	 * The stack frame size should be less than track_frame_size.
+	 *
+	 * warning(0, "Instrumentation is removed, stack frame size: %ld",
+	 * 						get_frame_size());
+	 * print_simple_rtl(stderr, get_insns());
+	 */
+
+	return 0;
+}
+
+static bool stackleak_track_stack_gate(void)
+{
+	tree section;
+
+	section = lookup_attribute("section",
+				   DECL_ATTRIBUTES(current_function_decl));
+	if (section && TREE_VALUE(section)) {
+		section = TREE_VALUE(TREE_VALUE(section));
+
+		if (!strncmp(TREE_STRING_POINTER(section), ".init.text", 10))
+			return false;
+		if (!strncmp(TREE_STRING_POINTER(section), ".devinit.text", 13))
+			return false;
+		if (!strncmp(TREE_STRING_POINTER(section), ".cpuinit.text", 13))
+			return false;
+		if (!strncmp(TREE_STRING_POINTER(section), ".meminit.text", 13))
+			return false;
+	}
+
+	return track_frame_size >= 0;
+}
+
+/* Build function declarations for track_stack() and check_alloca() */
+static void stackleak_start_unit(void *gcc_data __unused,
+				 void *user_data __unused)
+{
+	tree fntype;
+
+	/* void track_stack(void) */
+	fntype = build_function_type_list(void_type_node, NULL_TREE);
+	track_function_decl = build_fn_decl(track_function, fntype);
+	DECL_ASSEMBLER_NAME(track_function_decl); /* for LTO */
+	TREE_PUBLIC(track_function_decl) = 1;
+	TREE_USED(track_function_decl) = 1;
+	DECL_EXTERNAL(track_function_decl) = 1;
+	DECL_ARTIFICIAL(track_function_decl) = 1;
+	DECL_PRESERVE_P(track_function_decl) = 1;
+
+	/* void check_alloca(unsigned long) */
+	fntype = build_function_type_list(void_type_node,
+				long_unsigned_type_node, NULL_TREE);
+	check_function_decl = build_fn_decl(check_function, fntype);
+	DECL_ASSEMBLER_NAME(check_function_decl); /* for LTO */
+	TREE_PUBLIC(check_function_decl) = 1;
+	TREE_USED(check_function_decl) = 1;
+	DECL_EXTERNAL(check_function_decl) = 1;
+	DECL_ARTIFICIAL(check_function_decl) = 1;
+	DECL_PRESERVE_P(check_function_decl) = 1;
+}
+
+/*
+ * Pass gate function is a predicate function that gets executed before the
+ * corresponding pass. If the return value is 'true' the pass gets executed,
+ * otherwise, it is skipped.
+ */
+static bool stackleak_tree_instrument_gate(void)
+{
+	return stackleak_track_stack_gate();
+}
+
+#define PASS_NAME stackleak_tree_instrument
+#define PROPERTIES_REQUIRED PROP_gimple_leh | PROP_cfg
+#define TODO_FLAGS_START TODO_verify_ssa | TODO_verify_flow | TODO_verify_stmts
+#define TODO_FLAGS_FINISH TODO_verify_ssa | TODO_verify_stmts | TODO_dump_func \
+			| TODO_update_ssa | TODO_rebuild_cgraph_edges
+#include "gcc-generate-gimple-pass.h"
+
+static bool stackleak_final_gate(void)
+{
+	return stackleak_track_stack_gate();
+}
+
+#define PASS_NAME stackleak_final
+#define TODO_FLAGS_FINISH TODO_dump_func
+#include "gcc-generate-rtl-pass.h"
+
+/*
+ * Every gcc plugin exports a plugin_init() function that is called right
+ * after the plugin is loaded. This function is responsible for registering
+ * the plugin callbacks and doing other required initialization.
+ */
+__visible int plugin_init(struct plugin_name_args *plugin_info,
+			  struct plugin_gcc_version *version)
+{
+	const char * const plugin_name = plugin_info->base_name;
+	const int argc = plugin_info->argc;
+	const struct plugin_argument * const argv = plugin_info->argv;
+	int i = 0;
+
+	/* Extra GGC root tables describing our GTY-ed data */
+	static const struct ggc_root_tab gt_ggc_r_gt_stackleak[] = {
+		{
+			.base = &track_function_decl,
+			.nelt = 1,
+			.stride = sizeof(track_function_decl),
+			.cb = &gt_ggc_mx_tree_node,
+			.pchw = &gt_pch_nx_tree_node
+		},
+		{
+			.base = &check_function_decl,
+			.nelt = 1,
+			.stride = sizeof(check_function_decl),
+			.cb = &gt_ggc_mx_tree_node,
+			.pchw = &gt_pch_nx_tree_node
+		},
+		LAST_GGC_ROOT_TAB
+	};
+
+	/*
+	 * The stackleak_tree_instrument pass should be executed before the
+	 * "optimized" pass, which is the control flow graph cleanup that is
+	 * performed just before expanding gcc trees to the RTL. In former
+	 * versions of the plugin this new pass was inserted before the
+	 * "tree_profile" pass, which is currently called "profile".
+	 */
+	PASS_INFO(stackleak_tree_instrument, "optimized", 1,
+						PASS_POS_INSERT_BEFORE);
+
+	/*
+	 * The stackleak_final pass should be executed before the "final" pass,
+	 * which turns the RTL (Register Transfer Language) into assembly.
+	 */
+	PASS_INFO(stackleak_final, "final", 1, PASS_POS_INSERT_BEFORE);
+
+	if (!plugin_default_version_check(version, &gcc_version)) {
+		error(G_("incompatible gcc/plugin versions"));
+		return 1;
+	}
+
+	/* Parse the plugin arguments */
+	for (i = 0; i < argc; i++) {
+		if (!strcmp(argv[i].key, "disable"))
+			return 0;
+
+		if (!strcmp(argv[i].key, "track-min-size")) {
+			if (!argv[i].value) {
+				error(G_("no value supplied for option '-fplugin-arg-%s-%s'"),
+					plugin_name, argv[i].key);
+				return 1;
+			}
+
+			track_frame_size = atoi(argv[i].value);
+			if (track_frame_size < 0) {
+				error(G_("invalid option argument '-fplugin-arg-%s-%s=%s'"),
+					plugin_name, argv[i].key, argv[i].value);
+				return 1;
+			}
+		} else {
+			error(G_("unknown option '-fplugin-arg-%s-%s'"),
+					plugin_name, argv[i].key);
+			return 1;
+		}
+	}
+
+	/* Give the information about the plugin */
+	register_callback(plugin_name, PLUGIN_INFO, NULL,
+						&stackleak_plugin_info);
+
+	/* Register to be called before processing a translation unit */
+	register_callback(plugin_name, PLUGIN_START_UNIT,
+					&stackleak_start_unit, NULL);
+
+	/* Register an extra GCC garbage collector (GGC) root table */
+	register_callback(plugin_name, PLUGIN_REGISTER_GGC_ROOTS, NULL,
+					(void *)&gt_ggc_r_gt_stackleak);
+
+	/*
+	 * Hook into the Pass Manager to register new gcc passes.
+	 *
+	 * The stack frame size info is available only at the last RTL pass,
+	 * when it's too late to insert complex code like a function call.
+	 * So we register two gcc passes to instrument every function at first
+	 * and remove the unneeded instrumentation later.
+	 */
+	register_callback(plugin_name, PLUGIN_PASS_MANAGER_SETUP, NULL,
+					&stackleak_tree_instrument_pass_info);
+	register_callback(plugin_name, PLUGIN_PASS_MANAGER_SETUP, NULL,
+					&stackleak_final_pass_info);
+
+	return 0;
+}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v8 3/6] x86/entry: Erase kernel stack in syscall_trace_enter()
  2018-02-16 18:10 [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it Alexander Popov
  2018-02-16 18:10 ` [PATCH RFC v8 1/6] x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls Alexander Popov
  2018-02-16 18:10 ` [PATCH RFC v8 2/6] gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack Alexander Popov
@ 2018-02-16 18:10 ` Alexander Popov
  2018-02-16 18:10 ` [PATCH RFC v8 4/6] lkdtm: Add a test for STACKLEAK Alexander Popov
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 61+ messages in thread
From: Alexander Popov @ 2018-02-16 18:10 UTC (permalink / raw)
  To: kernel-hardening, Kees Cook, PaX Team, Brad Spengler,
	Ingo Molnar, Andy Lutomirski, Tycho Andersen, Laura Abbott,
	Mark Rutland, Ard Biesheuvel, Borislav Petkov, Thomas Gleixner,
	H . Peter Anvin, Peter Zijlstra, Dmitry V . Levin, x86,
	alex.popov

Make STACKLEAK erase kernel stack after ptrace/seccomp/auditing
not to leave any sensitive information on the stack for the syscall code.

This code is modified from Brad Spengler/PaX Team's code in the last
public patch of grsecurity/PaX based on our understanding of the code.
Changes or omissions from the original code are ours and don't reflect
the original grsecurity/PaX code.

Signed-off-by: Alexander Popov <alex.popov@linux.com>
---
 arch/x86/entry/common.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 74f6eee..b4be776 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -46,6 +46,12 @@ __visible inline void enter_from_user_mode(void)
 static inline void enter_from_user_mode(void) {}
 #endif
 
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+asmlinkage void erase_kstack(void);
+#else
+static void erase_kstack(void) {}
+#endif
+
 static void do_audit_syscall_entry(struct pt_regs *regs, u32 arch)
 {
 #ifdef CONFIG_X86_64
@@ -128,6 +134,7 @@ static long syscall_trace_enter(struct pt_regs *regs)
 
 	do_audit_syscall_entry(regs, arch);
 
+	erase_kstack();
 	return ret ?: regs->orig_ax;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v8 4/6] lkdtm: Add a test for STACKLEAK
  2018-02-16 18:10 [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it Alexander Popov
                   ` (2 preceding siblings ...)
  2018-02-16 18:10 ` [PATCH RFC v8 3/6] x86/entry: Erase kernel stack in syscall_trace_enter() Alexander Popov
@ 2018-02-16 18:10 ` Alexander Popov
  2018-02-16 18:10 ` [PATCH RFC v8 5/6] fs/proc: Show STACKLEAK metrics in the /proc file system Alexander Popov
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 61+ messages in thread
From: Alexander Popov @ 2018-02-16 18:10 UTC (permalink / raw)
  To: kernel-hardening, Kees Cook, PaX Team, Brad Spengler,
	Ingo Molnar, Andy Lutomirski, Tycho Andersen, Laura Abbott,
	Mark Rutland, Ard Biesheuvel, Borislav Petkov, Thomas Gleixner,
	H . Peter Anvin, Peter Zijlstra, Dmitry V . Levin, x86,
	alex.popov

Introduce two lkdtm tests for the STACKLEAK feature: STACKLEAK_ALLOCA
and STACKLEAK_DEEP_RECURSION. Both of them check that the current task
stack is properly erased (filled with STACKLEAK_POISON).

STACKLEAK_ALLOCA tests that:
 - check_alloca() allows alloca calls which don't exhaust the kernel stack;
 - alloca calls which exhaust/overflow the kernel stack hit BUG() in
    check_alloca().

STACKLEAK_DEEP_RECURSION tests that exhausting the current task stack
with a deep recursion is detected by CONFIG_VMAP_STACK (which is implied
by CONFIG_GCC_PLUGIN_STACKLEAK).

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Alexander Popov <alex.popov@linux.com>
---
 drivers/misc/Makefile          |   3 +
 drivers/misc/lkdtm.h           |   4 ++
 drivers/misc/lkdtm_core.c      |   2 +
 drivers/misc/lkdtm_stackleak.c | 136 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 145 insertions(+)
 create mode 100644 drivers/misc/lkdtm_stackleak.c

diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index c3c8624..2b11823 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -65,6 +65,9 @@ lkdtm-$(CONFIG_LKDTM)		+= lkdtm_perms.o
 lkdtm-$(CONFIG_LKDTM)		+= lkdtm_refcount.o
 lkdtm-$(CONFIG_LKDTM)		+= lkdtm_rodata_objcopy.o
 lkdtm-$(CONFIG_LKDTM)		+= lkdtm_usercopy.o
+lkdtm-$(CONFIG_LKDTM)		+= lkdtm_stackleak.o
+
+KASAN_SANITIZE_lkdtm_stackleak.o := n
 
 KCOV_INSTRUMENT_lkdtm_rodata.o	:= n
 
diff --git a/drivers/misc/lkdtm.h b/drivers/misc/lkdtm.h
index 9e513dc..4b2b8e3 100644
--- a/drivers/misc/lkdtm.h
+++ b/drivers/misc/lkdtm.h
@@ -83,4 +83,8 @@ void lkdtm_USERCOPY_STACK_FRAME_FROM(void);
 void lkdtm_USERCOPY_STACK_BEYOND(void);
 void lkdtm_USERCOPY_KERNEL(void);
 
+/* lkdtm_stackleak.c */
+void lkdtm_STACKLEAK_ALLOCA(void);
+void lkdtm_STACKLEAK_DEEP_RECURSION(void);
+
 #endif
diff --git a/drivers/misc/lkdtm_core.c b/drivers/misc/lkdtm_core.c
index 2154d1b..c37fd85 100644
--- a/drivers/misc/lkdtm_core.c
+++ b/drivers/misc/lkdtm_core.c
@@ -183,6 +183,8 @@ static const struct crashtype crashtypes[] = {
 	CRASHTYPE(USERCOPY_STACK_FRAME_FROM),
 	CRASHTYPE(USERCOPY_STACK_BEYOND),
 	CRASHTYPE(USERCOPY_KERNEL),
+	CRASHTYPE(STACKLEAK_ALLOCA),
+	CRASHTYPE(STACKLEAK_DEEP_RECURSION),
 };
 
 
diff --git a/drivers/misc/lkdtm_stackleak.c b/drivers/misc/lkdtm_stackleak.c
new file mode 100644
index 0000000..b1d2a9c
--- /dev/null
+++ b/drivers/misc/lkdtm_stackleak.c
@@ -0,0 +1,136 @@
+/*
+ * This code tests several aspects of the STACKLEAK feature:
+ *  - the current task stack is properly erased (filled with STACKLEAK_POISON);
+ *  - check_alloca() allows alloca calls which don't exhaust the kernel stack;
+ *  - alloca calls which exhaust/overflow the kernel stack hit BUG() in
+ *     check_alloca();
+ *  - exhausting the current task stack with a deep recursion is detected by
+ *     CONFIG_VMAP_STACK (which is implied by CONFIG_GCC_PLUGIN_STACKLEAK).
+ *
+ * Authors:
+ *   Tycho Andersen <tycho@tycho.ws>
+ *   Alexander Popov <alex.popov@linux.com>
+ */
+
+#include "lkdtm.h"
+#include <linux/sched.h>
+#include <linux/compiler.h>
+
+#ifndef CONFIG_GCC_PLUGIN_STACKLEAK
+# define STACKLEAK_POISON -0xBEEF
+# define CONFIG_STACKLEAK_TRACK_MIN_SIZE 100
+#endif
+
+static noinline bool stack_is_erased(void)
+{
+	unsigned long *sp, left, found, i;
+
+	/*
+	 * For the details about the alignment of the poison values, see
+	 * the comment in track_stack().
+	 */
+	sp = PTR_ALIGN(&i, sizeof(unsigned long));
+
+	left = ((unsigned long)sp & (THREAD_SIZE - 1)) / sizeof(unsigned long);
+	sp--;
+
+	/*
+	 * Two unsigned long ints at the bottom of the thread stack are
+	 * reserved and not poisoned.
+	 */
+	if (left <= 2)
+		return false;
+
+	left -= 2;
+	pr_info("checking unused part of the thread stack (%lu bytes)...\n",
+					left * sizeof(unsigned long));
+
+	/* Search for 17 poison values in a row (like erase_kstack() does) */
+	for (i = 0, found = 0; i < left && found < 17; i++) {
+		if (*(sp - i) == STACKLEAK_POISON)
+			found++;
+		else
+			found = 0;
+	}
+
+	if (found < 17) {
+		pr_err("FAIL: thread stack is not erased (checked %lu bytes)\n",
+						i * sizeof(unsigned long));
+		return false;
+	}
+
+	pr_info("first %lu bytes are unpoisoned\n",
+				(i - found) * sizeof(unsigned long));
+
+	/* The rest of thread stack should be erased */
+	for (; i < left; i++) {
+		if (*(sp - i) != STACKLEAK_POISON) {
+			pr_err("FAIL: thread stack is NOT properly erased\n");
+			return false;
+		}
+	}
+
+	pr_info("the rest of the thread stack is properly erased\n");
+	return true;
+}
+
+static noinline void do_alloca(unsigned long size)
+{
+	char buf[size];
+
+	/* So this doesn't get inlined or optimized out */
+	snprintf(buf, size, "testing alloca...\n");
+}
+
+void lkdtm_STACKLEAK_ALLOCA(void)
+{
+	unsigned long left = (unsigned long)&left & (THREAD_SIZE - 1);
+
+	if (!stack_is_erased())
+		return;
+
+	/* Try a small alloca to see if it works */
+	pr_info("try a small alloca of 16 bytes...\n");
+	do_alloca(16);
+	pr_info("small alloca is successful\n");
+
+	/* Try to hit the BUG() in check_alloca() */
+	pr_info("try a large alloca of %lu bytes (stack overflow)...\n", left);
+	do_alloca(left);
+	pr_err("FAIL: large alloca overstepped the thread stack boundary\n");
+}
+
+/*
+ * The stack frame size of recursion() is bigger than the
+ * CONFIG_STACKLEAK_TRACK_MIN_SIZE, hence that function is instrumented
+ * by the STACKLEAK gcc plugin and it calls track_stack() at the beginning.
+ */
+static noinline unsigned long recursion(unsigned long prev_sp)
+{
+	char buf[CONFIG_STACKLEAK_TRACK_MIN_SIZE + 42];
+	unsigned long sp = (unsigned long)&sp;
+
+	snprintf(buf, sizeof(buf), "testing deep recursion...\n");
+
+	if (prev_sp < sp + THREAD_SIZE)
+		sp = recursion(prev_sp);
+
+	return sp;
+}
+
+void lkdtm_STACKLEAK_DEEP_RECURSION(void)
+{
+	unsigned long sp = (unsigned long)&sp;
+
+	if (!stack_is_erased())
+		return;
+
+	/*
+	 * Exhaust the thread stack with a deep recursion. It should hit the
+	 * guard page provided by CONFIG_VMAP_STACK (which is implied by
+	 * CONFIG_GCC_PLUGIN_STACKLEAK).
+	 */
+	pr_info("try to exhaust the thread stack with a deep recursion...\n");
+	pr_err("FAIL: thread stack exhaustion (%lu bytes) is not detected\n",
+							sp - recursion(sp));
+}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v8 5/6] fs/proc: Show STACKLEAK metrics in the /proc file system
  2018-02-16 18:10 [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it Alexander Popov
                   ` (3 preceding siblings ...)
  2018-02-16 18:10 ` [PATCH RFC v8 4/6] lkdtm: Add a test for STACKLEAK Alexander Popov
@ 2018-02-16 18:10 ` Alexander Popov
  2018-02-16 18:10 ` [PATCH RFC v8 6/6] doc: self-protection: Add information about STACKLEAK feature Alexander Popov
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 61+ messages in thread
From: Alexander Popov @ 2018-02-16 18:10 UTC (permalink / raw)
  To: kernel-hardening, Kees Cook, PaX Team, Brad Spengler,
	Ingo Molnar, Andy Lutomirski, Tycho Andersen, Laura Abbott,
	Mark Rutland, Ard Biesheuvel, Borislav Petkov, Thomas Gleixner,
	H . Peter Anvin, Peter Zijlstra, Dmitry V . Levin, x86,
	alex.popov

Introduce CONFIG_STACKLEAK_METRICS providing STACKLEAK information about
tasks via the /proc file system. In particular, /proc/<pid>/stack_depth
shows the maximum kernel stack consumption for the current and previous
syscalls. Although this information is not precise, it  can be useful for
estimating the STACKLEAK performance impact for your workloads.

Signed-off-by: Alexander Popov <alex.popov@linux.com>
---
 arch/Kconfig                     | 12 ++++++++++++
 arch/x86/entry/entry_32.S        |  4 ++++
 arch/x86/entry/entry_64.S        |  4 ++++
 arch/x86/include/asm/processor.h |  3 +++
 arch/x86/kernel/asm-offsets.c    |  3 +++
 arch/x86/kernel/process_32.c     |  3 +++
 arch/x86/kernel/process_64.c     |  3 +++
 fs/proc/base.c                   | 18 ++++++++++++++++++
 8 files changed, 50 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index a4a8fba..42ebfb9 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -572,6 +572,18 @@ config STACKLEAK_TRACK_MIN_SIZE
 	  frame size greater than or equal to this parameter.
 	  If unsure, leave the default value 100.
 
+config STACKLEAK_METRICS
+	bool "Show STACKLEAK metrics in the /proc file system"
+	depends on GCC_PLUGIN_STACKLEAK
+	depends on PROC_FS
+	help
+	  If this is set, STACKLEAK metrics for every task are available in
+	  the /proc file system. In particular, /proc/<pid>/stack_depth
+	  shows the maximum kernel stack consumption for the current and
+	  previous syscalls. Although this information is not precise, it
+	  can be useful for estimating the STACKLEAK performance impact for
+	  your workloads.
+
 config HAVE_CC_STACKPROTECTOR
 	bool
 	help
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 4a7365a..c6613ca 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -134,6 +134,10 @@ ENTRY(erase_kstack)
 	mov	%esp, %ecx
 	sub	%edi, %ecx
 
+#ifdef CONFIG_STACKLEAK_METRICS
+	mov	%edi, TASK_prev_lowest_stack(%ebp)
+#endif
+
 	/* Check that the counter value is sane. */
 	cmp	$THREAD_SIZE_asm, %ecx
 	jb	3f
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 6863af8..b418d3a 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -119,6 +119,10 @@ ENTRY(erase_kstack)
 	 */
 	or	$2 * 8, %rdi
 
+#ifdef CONFIG_STACKLEAK_METRICS
+	mov	%rdi, TASK_prev_lowest_stack(%r11)
+#endif
+
 	/*
 	 * Check whether we are on the thread stack to prepare the counter
 	 * for stack poisoning.
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 5fd8ae1..871e197 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -496,6 +496,9 @@ struct thread_struct {
 
 #ifdef CONFIG_GCC_PLUGIN_STACKLEAK
 	unsigned long		lowest_stack;
+# ifdef CONFIG_STACKLEAK_METRICS
+	unsigned long		prev_lowest_stack;
+# endif
 #endif
 
 	unsigned int		sig_on_uaccess_err:1;
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index ef5d260..f48197a 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -41,6 +41,9 @@ void common(void) {
 	OFFSET(TASK_addr_limit, task_struct, thread.addr_limit);
 #ifdef CONFIG_GCC_PLUGIN_STACKLEAK
 	OFFSET(TASK_lowest_stack, task_struct, thread.lowest_stack);
+# ifdef CONFIG_STACKLEAK_METRICS
+	OFFSET(TASK_prev_lowest_stack, task_struct, thread.prev_lowest_stack);
+# endif
 #endif
 
 	BLANK();
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 6d256ab..48993fe 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -139,6 +139,9 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 #ifdef CONFIG_GCC_PLUGIN_STACKLEAK
 	p->thread.lowest_stack = (unsigned long)task_stack_page(p) +
 						2 * sizeof(unsigned long);
+# ifdef CONFIG_STACKLEAK_METRICS
+	p->thread.prev_lowest_stack = p->thread.lowest_stack;
+# endif
 #endif
 
 	if (unlikely(p->flags & PF_KTHREAD)) {
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 6dc55f6..0355fba 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -284,6 +284,9 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 #ifdef CONFIG_GCC_PLUGIN_STACKLEAK
 	p->thread.lowest_stack = (unsigned long)task_stack_page(p) +
 						2 * sizeof(unsigned long);
+# ifdef CONFIG_STACKLEAK_METRICS
+	p->thread.prev_lowest_stack = p->thread.lowest_stack;
+# endif
 #endif
 
 	savesegment(gs, p->thread.gsindex);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 9298324..6a7f9bd 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2914,6 +2914,21 @@ static int proc_pid_patch_state(struct seq_file *m, struct pid_namespace *ns,
 }
 #endif /* CONFIG_LIVEPATCH */
 
+#ifdef CONFIG_STACKLEAK_METRICS
+static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns,
+				struct pid *pid, struct task_struct *task)
+{
+	unsigned long prev_depth = THREAD_SIZE -
+			(task->thread.prev_lowest_stack & (THREAD_SIZE - 1));
+	unsigned long depth = THREAD_SIZE -
+			(task->thread.lowest_stack & (THREAD_SIZE - 1));
+
+	seq_printf(m, "previous stack depth: %lu\nstack depth: %lu\n",
+							prev_depth, depth);
+	return 0;
+}
+#endif /* CONFIG_STACKLEAK_METRICS */
+
 /*
  * Thread groups
  */
@@ -3018,6 +3033,9 @@ static const struct pid_entry tgid_base_stuff[] = {
 #ifdef CONFIG_LIVEPATCH
 	ONE("patch_state",  S_IRUSR, proc_pid_patch_state),
 #endif
+#ifdef CONFIG_STACKLEAK_METRICS
+	ONE("stack_depth", S_IRUGO, proc_stack_depth),
+#endif
 };
 
 static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v8 6/6] doc: self-protection: Add information about STACKLEAK feature
  2018-02-16 18:10 [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it Alexander Popov
                   ` (4 preceding siblings ...)
  2018-02-16 18:10 ` [PATCH RFC v8 5/6] fs/proc: Show STACKLEAK metrics in the /proc file system Alexander Popov
@ 2018-02-16 18:10 ` Alexander Popov
  2018-02-20 10:29 ` [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it Alexander Popov
  2018-02-22  1:43 ` Laura Abbott
  7 siblings, 0 replies; 61+ messages in thread
From: Alexander Popov @ 2018-02-16 18:10 UTC (permalink / raw)
  To: kernel-hardening, Kees Cook, PaX Team, Brad Spengler,
	Ingo Molnar, Andy Lutomirski, Tycho Andersen, Laura Abbott,
	Mark Rutland, Ard Biesheuvel, Borislav Petkov, Thomas Gleixner,
	H . Peter Anvin, Peter Zijlstra, Dmitry V . Levin, x86,
	alex.popov

Add information about STACKLEAK feature to "Stack depth overflow" and
"Memory poisoning" sections of self-protection.rst.

Signed-off-by: Alexander Popov <alex.popov@linux.com>
---
 Documentation/security/self-protection.rst | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/Documentation/security/self-protection.rst b/Documentation/security/self-protection.rst
index 0f53826..b685f18 100644
--- a/Documentation/security/self-protection.rst
+++ b/Documentation/security/self-protection.rst
@@ -165,10 +165,15 @@ Stack depth overflow
 A less well understood attack is using a bug that triggers the
 kernel to consume stack memory with deep function calls or large stack
 allocations. With this attack it is possible to write beyond the end of
-the kernel's preallocated stack space and into sensitive structures. Two
-important changes need to be made for better protections: moving the
-sensitive thread_info structure elsewhere, and adding a faulting memory
-hole at the bottom of the stack to catch these overflows.
+the kernel's preallocated stack space and into sensitive structures.
+The combination of the following measures gives better protection:
+
+* moving the sensitive thread_info structure off the stack
+  (``CONFIG_THREAD_INFO_IN_TASK``);
+* adding a faulting memory hole at the bottom of the stack to catch
+  these overflows (``CONFIG_VMAP_STACK``);
+* runtime checking that alloca() calls don't overstep the stack boundary
+  (``CONFIG_GCC_PLUGIN_STACKLEAK``).
 
 Heap memory integrity
 ---------------------
@@ -302,11 +307,11 @@ sure structure holes are cleared.
 Memory poisoning
 ----------------
 
-When releasing memory, it is best to poison the contents (clear stack on
-syscall return, wipe heap memory on a free), to avoid reuse attacks that
-rely on the old contents of memory. This frustrates many uninitialized
-variable attacks, stack content exposures, heap content exposures, and
-use-after-free attacks.
+When releasing memory, it is best to poison the contents, to avoid reuse
+attacks that rely on the old contents of memory. E.g., clear stack on a
+syscall return (``CONFIG_GCC_PLUGIN_STACKLEAK``), wipe heap memory on a
+free. This frustrates many uninitialized variable attacks, stack content
+exposures, heap content exposures, and use-after-free attacks.
 
 Destination tracking
 --------------------
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it
  2018-02-16 18:10 [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it Alexander Popov
                   ` (5 preceding siblings ...)
  2018-02-16 18:10 ` [PATCH RFC v8 6/6] doc: self-protection: Add information about STACKLEAK feature Alexander Popov
@ 2018-02-20 10:29 ` Alexander Popov
  2018-02-20 23:17   ` Kees Cook
  2018-02-22  1:43 ` Laura Abbott
  7 siblings, 1 reply; 61+ messages in thread
From: Alexander Popov @ 2018-02-20 10:29 UTC (permalink / raw)
  To: kernel-hardening, Kees Cook, PaX Team, Brad Spengler,
	Ingo Molnar, Andy Lutomirski, Tycho Andersen, Laura Abbott,
	Mark Rutland, Ard Biesheuvel, Borislav Petkov, Thomas Gleixner,
	H . Peter Anvin, Peter Zijlstra, Dmitry V . Levin, x86,
	Mohamed Ghannam

On 16.02.2018 21:10, Alexander Popov wrote:
> This is the 8th version of the patch series introducing STACKLEAK to the
> mainline kernel. I've made some minor improvements while waiting for the
> next review by x86 maintainers.
> 
> STACKLEAK is a security feature developed by Grsecurity/PaX (kudos to them),
> which:
>  - reduces the information that can be revealed through kernel stack leak bugs;
>  - blocks some uninitialized stack variable attacks (e.g. CVE-2010-2963);
>  - introduces some runtime checks for kernel stack overflow detection.

Hello! I've just tested STACKLEAK against the recent CVE-2017-17712 exploit:
http://openwall.com/lists/oss-security/2018/02/20/1

This vulnerability is a race condition in raw_sendmsg() in net/ipv4/raw.c. It
leads to uninitialized stack pointer usage which can be used for a local
privilege escalation.

CVE-2017-17712 was discovered and fixed by Mohamed Ghannam (kudos to him!). He
also provided a stable PoC exploit for it.

With STACKLEAK, the uninitialized stack pointer is set to STACKLEAK_POISON
(-0xBEEF) and points to the unused hole in the virtual memory map. That blocks
the stack spraying needed for CVE-2017-17712 exploit (similar to CVE-2010-2963).

Best regards,
Alexander

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it
  2018-02-20 10:29 ` [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it Alexander Popov
@ 2018-02-20 23:17   ` Kees Cook
  2018-02-20 23:33     ` Laura Abbott
                       ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Kees Cook @ 2018-02-20 23:17 UTC (permalink / raw)
  To: Alexander Popov, Thomas Gleixner, Andy Lutomirski
  Cc: Kernel Hardening, PaX Team, Brad Spengler, Ingo Molnar,
	Tycho Andersen, Laura Abbott, Mark Rutland, Ard Biesheuvel,
	Borislav Petkov, H . Peter Anvin, Peter Zijlstra,
	Dmitry V . Levin, X86 ML, Mohamed Ghannam

On Tue, Feb 20, 2018 at 2:29 AM, Alexander Popov <alex.popov@linux.com> wrote:
> On 16.02.2018 21:10, Alexander Popov wrote:
>> This is the 8th version of the patch series introducing STACKLEAK to the
>> mainline kernel. I've made some minor improvements while waiting for the
>> next review by x86 maintainers.

If we can borrow some of luto or tglx's time, I think that'd be best:
they've been looking at the entry code a lot lately. :) Regardless, I
think the addition to the entry code is clean (especially now that the
fast path is gone *sob*). :P

>> STACKLEAK is a security feature developed by Grsecurity/PaX (kudos to them),
>> which:
>>  - reduces the information that can be revealed through kernel stack leak bugs;
>>  - blocks some uninitialized stack variable attacks (e.g. CVE-2010-2963);
>>  - introduces some runtime checks for kernel stack overflow detection.

I've added this series to my kernel.org trees, which means 0-day will
start grinding on it too now:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git
kspp/gcc-plugin/stackleak

The LKDTM tests look great and check out for me. I think the code is
clear, so I'd like to get it into -next, but I want to be sure I'm not
stepping on x86 toes first.

Laura, how does arm64 look for this? Would it be possible to add it to
this series (at least on kernel.org for build/run testing)?

> Hello! I've just tested STACKLEAK against the recent CVE-2017-17712 exploit:
> http://openwall.com/lists/oss-security/2018/02/20/1
>
> This vulnerability is a race condition in raw_sendmsg() in net/ipv4/raw.c. It
> leads to uninitialized stack pointer usage which can be used for a local
> privilege escalation.
>
> CVE-2017-17712 was discovered and fixed by Mohamed Ghannam (kudos to him!). He
> also provided a stable PoC exploit for it.
>
> With STACKLEAK, the uninitialized stack pointer is set to STACKLEAK_POISON
> (-0xBEEF) and points to the unused hole in the virtual memory map. That blocks
> the stack spraying needed for CVE-2017-17712 exploit (similar to CVE-2010-2963).

Nice check! I wonder if the the byref structleak also solves it?
Regardless, I think stackleak has better performance and greater
coverage.

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it
  2018-02-20 23:17   ` Kees Cook
@ 2018-02-20 23:33     ` Laura Abbott
  2018-02-21  1:13         ` Laura Abbott
  2018-02-21 10:05     ` [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it Borislav Petkov
  2018-02-21 14:43     ` Alexander Popov
  2 siblings, 1 reply; 61+ messages in thread
From: Laura Abbott @ 2018-02-20 23:33 UTC (permalink / raw)
  To: Kees Cook, Alexander Popov, Thomas Gleixner, Andy Lutomirski
  Cc: Kernel Hardening, PaX Team, Brad Spengler, Ingo Molnar,
	Tycho Andersen, Mark Rutland, Ard Biesheuvel, Borislav Petkov,
	H . Peter Anvin, Peter Zijlstra, Dmitry V . Levin, X86 ML,
	Mohamed Ghannam

On 02/20/2018 03:17 PM, Kees Cook wrote:
> On Tue, Feb 20, 2018 at 2:29 AM, Alexander Popov <alex.popov@linux.com> wrote:
>> On 16.02.2018 21:10, Alexander Popov wrote:
>>> This is the 8th version of the patch series introducing STACKLEAK to the
>>> mainline kernel. I've made some minor improvements while waiting for the
>>> next review by x86 maintainers.
> 
> If we can borrow some of luto or tglx's time, I think that'd be best:
> they've been looking at the entry code a lot lately. :) Regardless, I
> think the addition to the entry code is clean (especially now that the
> fast path is gone *sob*). :P
> 
>>> STACKLEAK is a security feature developed by Grsecurity/PaX (kudos to them),
>>> which:
>>>   - reduces the information that can be revealed through kernel stack leak bugs;
>>>   - blocks some uninitialized stack variable attacks (e.g. CVE-2010-2963);
>>>   - introduces some runtime checks for kernel stack overflow detection.
> 
> I've added this series to my kernel.org trees, which means 0-day will
> start grinding on it too now:
> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git
> kspp/gcc-plugin/stackleak
> 
> The LKDTM tests look great and check out for me. I think the code is
> clear, so I'd like to get it into -next, but I want to be sure I'm not
> stepping on x86 toes first.
> 
> Laura, how does arm64 look for this? Would it be possible to add it to
> this series (at least on kernel.org for build/run testing)?
> 

I fell behind on rebasing/testing so I need to bring it up to date.
Assuming the arm folks are okay with the approach, we can bring it
in for kernel.org testing once I'm finished.

Thanks,
Laura

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 0/2] Stackleak for arm64
  2018-02-20 23:33     ` Laura Abbott
@ 2018-02-21  1:13         ` Laura Abbott
  0 siblings, 0 replies; 61+ messages in thread
From: Laura Abbott @ 2018-02-21  1:13 UTC (permalink / raw)
  To: Alexander Popov, Kees Cook, Mark Rutland, Ard Biesheuvel
  Cc: Laura Abbott, kernel-hardening, linux-arm-kernel, linux-kernel

This is the arm64 version of the STACKLEAK plugin originall from
grsecurity. See
https://marc.info/?l=kernel-hardening&m=151880470609808 for the
full x86 version. This is based on top of Kees' branch for stackleak
and has been cleaned up to use a few macros from that branch.

Comments welcome, if there are no major objections Kees will queue this
up to get some CI testing. This passed both of the LKDTM tests.

Laura Abbott (2):
  stackleak: Update for arm64
  arm64: Clear the stack

 arch/arm64/Kconfig                     |   1 +
 arch/arm64/include/asm/processor.h     |   6 ++
 arch/arm64/kernel/asm-offsets.c        |   3 +
 arch/arm64/kernel/entry.S              | 108 +++++++++++++++++++++++++++++++++
 arch/arm64/kernel/process.c            |  16 +++++
 drivers/firmware/efi/libstub/Makefile  |   3 +-
 scripts/Makefile.gcc-plugins           |   5 +-
 scripts/gcc-plugins/stackleak_plugin.c |   5 ++
 8 files changed, 145 insertions(+), 2 deletions(-)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 0/2] Stackleak for arm64
@ 2018-02-21  1:13         ` Laura Abbott
  0 siblings, 0 replies; 61+ messages in thread
From: Laura Abbott @ 2018-02-21  1:13 UTC (permalink / raw)
  To: linux-arm-kernel

This is the arm64 version of the STACKLEAK plugin originall from
grsecurity. See
https://marc.info/?l=kernel-hardening&m=151880470609808 for the
full x86 version. This is based on top of Kees' branch for stackleak
and has been cleaned up to use a few macros from that branch.

Comments welcome, if there are no major objections Kees will queue this
up to get some CI testing. This passed both of the LKDTM tests.

Laura Abbott (2):
  stackleak: Update for arm64
  arm64: Clear the stack

 arch/arm64/Kconfig                     |   1 +
 arch/arm64/include/asm/processor.h     |   6 ++
 arch/arm64/kernel/asm-offsets.c        |   3 +
 arch/arm64/kernel/entry.S              | 108 +++++++++++++++++++++++++++++++++
 arch/arm64/kernel/process.c            |  16 +++++
 drivers/firmware/efi/libstub/Makefile  |   3 +-
 scripts/Makefile.gcc-plugins           |   5 +-
 scripts/gcc-plugins/stackleak_plugin.c |   5 ++
 8 files changed, 145 insertions(+), 2 deletions(-)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 1/2] stackleak: Update for arm64
  2018-02-21  1:13         ` Laura Abbott
@ 2018-02-21  1:13           ` Laura Abbott
  -1 siblings, 0 replies; 61+ messages in thread
From: Laura Abbott @ 2018-02-21  1:13 UTC (permalink / raw)
  To: Alexander Popov, Kees Cook, Mark Rutland, Ard Biesheuvel
  Cc: Laura Abbott, kernel-hardening, linux-arm-kernel, linux-kernel


arm64 has another layer of indirection in the RTL.
Account for this in the plugin.

Signed-off-by: Laura Abbott <labbott@redhat.com>
---
 scripts/gcc-plugins/stackleak_plugin.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
index 6fc991c98d8b..7dfaa027423f 100644
--- a/scripts/gcc-plugins/stackleak_plugin.c
+++ b/scripts/gcc-plugins/stackleak_plugin.c
@@ -244,6 +244,11 @@ static unsigned int stackleak_final_execute(void)
 		 * that insn.
 		 */
 		body = PATTERN(insn);
+		/* arm64 is different */
+		if (GET_CODE(body) == PARALLEL) {
+			body = XEXP(body, 0);
+			body = XEXP(body, 0);
+		}
 		if (GET_CODE(body) != CALL)
 			continue;
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 1/2] stackleak: Update for arm64
@ 2018-02-21  1:13           ` Laura Abbott
  0 siblings, 0 replies; 61+ messages in thread
From: Laura Abbott @ 2018-02-21  1:13 UTC (permalink / raw)
  To: linux-arm-kernel


arm64 has another layer of indirection in the RTL.
Account for this in the plugin.

Signed-off-by: Laura Abbott <labbott@redhat.com>
---
 scripts/gcc-plugins/stackleak_plugin.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
index 6fc991c98d8b..7dfaa027423f 100644
--- a/scripts/gcc-plugins/stackleak_plugin.c
+++ b/scripts/gcc-plugins/stackleak_plugin.c
@@ -244,6 +244,11 @@ static unsigned int stackleak_final_execute(void)
 		 * that insn.
 		 */
 		body = PATTERN(insn);
+		/* arm64 is different */
+		if (GET_CODE(body) == PARALLEL) {
+			body = XEXP(body, 0);
+			body = XEXP(body, 0);
+		}
 		if (GET_CODE(body) != CALL)
 			continue;
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 2/2] arm64: Clear the stack
  2018-02-21  1:13         ` Laura Abbott
@ 2018-02-21  1:13           ` Laura Abbott
  -1 siblings, 0 replies; 61+ messages in thread
From: Laura Abbott @ 2018-02-21  1:13 UTC (permalink / raw)
  To: Alexander Popov, Kees Cook, Mark Rutland, Ard Biesheuvel
  Cc: Laura Abbott, kernel-hardening, linux-arm-kernel, linux-kernel


Implementation of stackleak based heavily on the x86 version

Signed-off-by: Laura Abbott <labbott@redhat.com>
---
 arch/arm64/Kconfig                    |   1 +
 arch/arm64/include/asm/processor.h    |   6 ++
 arch/arm64/kernel/asm-offsets.c       |   3 +
 arch/arm64/kernel/entry.S             | 108 ++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/process.c           |  16 +++++
 drivers/firmware/efi/libstub/Makefile |   3 +-
 scripts/Makefile.gcc-plugins          |   5 +-
 7 files changed, 140 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7381eeb7ef8e..dcadcae674a7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -92,6 +92,7 @@ config ARM64
 	select HAVE_ARCH_MMAP_RND_BITS
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
 	select HAVE_ARCH_SECCOMP_FILTER
+	select HAVE_ARCH_STACKLEAK
 	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index fce604e3e599..4b309101ac83 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -114,6 +114,12 @@ struct thread_struct {
 	unsigned long		fault_address;	/* fault info */
 	unsigned long		fault_code;	/* ESR_EL1 value */
 	struct debug_info	debug;		/* debugging */
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+	unsigned long           lowest_stack;
+#ifdef CONFIG_STACKLEAK_METRICS
+	unsigned long		prev_lowest_stack;
+#endif
+#endif
 };
 
 /*
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 1303e04110cd..b5c6100e8b14 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -45,6 +45,9 @@ int main(void)
   DEFINE(TSK_TI_TTBR0,		offsetof(struct task_struct, thread_info.ttbr0));
 #endif
   DEFINE(TSK_STACK,		offsetof(struct task_struct, stack));
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+  DEFINE(TSK_TI_LOWEST_STACK,	offsetof(struct task_struct, thread.lowest_stack));
+#endif
   BLANK();
   DEFINE(THREAD_CPU_CONTEXT,	offsetof(struct task_struct, thread.cpu_context));
   BLANK();
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index ec2ee720e33e..b909b436293a 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -401,6 +401,11 @@ tsk	.req	x28		// current thread_info
 
 	.text
 
+	.macro	erase_kstack
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+	bl	__erase_kstack
+#endif
+	.endm
 /*
  * Exception vectors.
  */
@@ -901,6 +906,7 @@ work_pending:
  */
 ret_to_user:
 	disable_daif
+	erase_kstack
 	ldr	x1, [tsk, #TSK_TI_FLAGS]
 	and	x2, x1, #_TIF_WORK_MASK
 	cbnz	x2, work_pending
@@ -1337,3 +1343,105 @@ alternative_else_nop_endif
 ENDPROC(__sdei_asm_handler)
 NOKPROBE(__sdei_asm_handler)
 #endif /* CONFIG_ARM_SDE_INTERFACE */
+
+/*
+ * This is what the stack looks like
+ *
+ * +---+ <- task_stack_page(p) + THREAD_SIZE
+ * |   |
+ * +---+ <- task_stack_page(p) + THREAD_START_SP
+ * |   |
+ * |   |
+ * +---+ <- task_pt_regs(p)
+ * |   |
+ * |   |
+ * |   | <- current_sp
+ * ~~~~~
+ *
+ * ~~~~~
+ * |   | <- lowest_stack
+ * |   |
+ * |   |
+ * +---+ <- task_stack_page(p)
+ *
+ * This function is desgned to poison the memory between the lowest_stack
+ * and the current stack pointer. After clearing the stack, the lowest
+ * stack is reset.
+ */
+
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+ENTRY(__erase_kstack)
+	mov	x10, x0	// save x0 for the fast path
+
+	get_thread_info	x0
+	ldr	x1, [x0, #TSK_TI_LOWEST_STACK]
+
+	/* get the number of bytes to check for lowest stack */
+	mov	x3, x1
+	and	x3, x3, #THREAD_SIZE - 1
+	lsr	x3, x3, #3
+
+	/* generate addresses from the bottom of the stack */
+	mov	x4, sp
+	movn	x2, #THREAD_SIZE - 1
+	and	x1, x4, x2
+
+	mov	x2, #STACKLEAK_POISON
+
+	mov	x5, #0
+1:
+	/*
+	 * As borrowed from the x86 logic, start from the lowest_stack
+	 * and go to the bottom to find the poison value.
+	 * The check of 16 is to hopefully avoid false positives.
+	 */
+	cbz	x3, 4f
+	ldr	x4, [x1, x3, lsl #3]
+	cmp	x4, x2
+	csinc	x5, xzr, x5, ne
+	tbnz	x5, #STACKLEAK_POISON_CHECK_DEPTH/4, 4f	// found 16 poisons?
+	sub	x3, x3, #1
+	b	1b
+
+4:
+	/* total number of bytes to poison */
+	add     x5, x1, x3, lsl #3
+	mov	x4, sp
+	sub     x8, x4, x5
+
+	cmp     x8, #THREAD_SIZE // sanity check the range
+	b.lo    5f
+	ASM_BUG()
+
+5:
+	/*
+	 * We may have hit a path where the stack did not get used,
+	 * no need to do anything here
+	 */
+	cbz	x8, 7f
+
+	sub	x8, x8, #1 // don't poison the current stack pointer
+
+	lsr     x8, x8, #3
+	add     x3, x3, x8
+
+	/*
+	 * The logic of this loop ensures the last stack word isn't
+	 * ovewritten.
+	 */
+6:
+	cbz     x8, 7f
+	str     x2, [x1, x3, lsl #3]
+	sub     x3, x3, #1
+	sub     x8, x8, #1
+	b	6b
+
+	/* Reset the lowest stack to the top of the stack */
+7:
+	mov	x1, sp
+	str	x1, [x0, #TSK_TI_LOWEST_STACK]
+
+	mov	x0, x10
+	ret
+ENDPROC(__erase_kstack)
+#endif
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index ad8aeb098b31..fd0528db6772 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -357,6 +357,9 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
 	p->thread.cpu_context.sp = (unsigned long)childregs;
 
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+	p->thread.lowest_stack = (unsigned long)task_stack_page(p);
+#endif
 	ptrace_hw_copy_thread(p);
 
 	return 0;
@@ -486,3 +489,16 @@ void arch_setup_new_exec(void)
 {
 	current->mm->context.flags = is_compat_task() ? MMCF_AARCH32 : 0;
 }
+
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+void __used check_alloca(unsigned long size)
+{
+	unsigned long sp, stack_left;
+
+	sp = current_stack_pointer;
+
+	stack_left = sp & (THREAD_SIZE - 1);
+	BUG_ON(stack_left < 256 || size >= stack_left - 256);
+}
+EXPORT_SYMBOL(check_alloca);
+#endif
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 7b3ba40f0745..35ebbc1b17ff 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -20,7 +20,8 @@ cflags-$(CONFIG_EFI_ARMSTUB)	+= -I$(srctree)/scripts/dtc/libfdt
 KBUILD_CFLAGS			:= $(cflags-y) -DDISABLE_BRANCH_PROFILING \
 				   -D__NO_FORTIFY \
 				   $(call cc-option,-ffreestanding) \
-				   $(call cc-option,-fno-stack-protector)
+				   $(call cc-option,-fno-stack-protector) \
+				   $(DISABLE_STACKLEAK_PLUGIN)
 
 GCOV_PROFILE			:= n
 KASAN_SANITIZE			:= n
diff --git a/scripts/Makefile.gcc-plugins b/scripts/Makefile.gcc-plugins
index 8d6070fc538f..6cc0e35d3324 100644
--- a/scripts/Makefile.gcc-plugins
+++ b/scripts/Makefile.gcc-plugins
@@ -37,11 +37,14 @@ ifdef CONFIG_GCC_PLUGINS
 
   gcc-plugin-$(CONFIG_GCC_PLUGIN_STACKLEAK)	+= stackleak_plugin.so
   gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_STACKLEAK)	+= -DSTACKLEAK_PLUGIN -fplugin-arg-stackleak_plugin-track-min-size=$(CONFIG_STACKLEAK_TRACK_MIN_SIZE)
+  ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+    DISABLE_STACKLEAK_PLUGIN		+= -fplugin-arg-stackleak_plugin-disable
+  endif
 
   GCC_PLUGINS_CFLAGS := $(strip $(addprefix -fplugin=$(objtree)/scripts/gcc-plugins/, $(gcc-plugin-y)) $(gcc-plugin-cflags-y))
 
   export PLUGINCC GCC_PLUGINS_CFLAGS GCC_PLUGIN GCC_PLUGIN_SUBDIR
-  export SANCOV_PLUGIN DISABLE_LATENT_ENTROPY_PLUGIN
+  export SANCOV_PLUGIN DISABLE_LATENT_ENTROPY_PLUGIN DISABLE_STACKLEAK_PLUGIN
 
   ifneq ($(PLUGINCC),)
     # SANCOV_PLUGIN can be only in CFLAGS_KCOV because avoid duplication.
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 2/2] arm64: Clear the stack
@ 2018-02-21  1:13           ` Laura Abbott
  0 siblings, 0 replies; 61+ messages in thread
From: Laura Abbott @ 2018-02-21  1:13 UTC (permalink / raw)
  To: linux-arm-kernel


Implementation of stackleak based heavily on the x86 version

Signed-off-by: Laura Abbott <labbott@redhat.com>
---
 arch/arm64/Kconfig                    |   1 +
 arch/arm64/include/asm/processor.h    |   6 ++
 arch/arm64/kernel/asm-offsets.c       |   3 +
 arch/arm64/kernel/entry.S             | 108 ++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/process.c           |  16 +++++
 drivers/firmware/efi/libstub/Makefile |   3 +-
 scripts/Makefile.gcc-plugins          |   5 +-
 7 files changed, 140 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7381eeb7ef8e..dcadcae674a7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -92,6 +92,7 @@ config ARM64
 	select HAVE_ARCH_MMAP_RND_BITS
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
 	select HAVE_ARCH_SECCOMP_FILTER
+	select HAVE_ARCH_STACKLEAK
 	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index fce604e3e599..4b309101ac83 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -114,6 +114,12 @@ struct thread_struct {
 	unsigned long		fault_address;	/* fault info */
 	unsigned long		fault_code;	/* ESR_EL1 value */
 	struct debug_info	debug;		/* debugging */
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+	unsigned long           lowest_stack;
+#ifdef CONFIG_STACKLEAK_METRICS
+	unsigned long		prev_lowest_stack;
+#endif
+#endif
 };
 
 /*
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 1303e04110cd..b5c6100e8b14 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -45,6 +45,9 @@ int main(void)
   DEFINE(TSK_TI_TTBR0,		offsetof(struct task_struct, thread_info.ttbr0));
 #endif
   DEFINE(TSK_STACK,		offsetof(struct task_struct, stack));
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+  DEFINE(TSK_TI_LOWEST_STACK,	offsetof(struct task_struct, thread.lowest_stack));
+#endif
   BLANK();
   DEFINE(THREAD_CPU_CONTEXT,	offsetof(struct task_struct, thread.cpu_context));
   BLANK();
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index ec2ee720e33e..b909b436293a 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -401,6 +401,11 @@ tsk	.req	x28		// current thread_info
 
 	.text
 
+	.macro	erase_kstack
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+	bl	__erase_kstack
+#endif
+	.endm
 /*
  * Exception vectors.
  */
@@ -901,6 +906,7 @@ work_pending:
  */
 ret_to_user:
 	disable_daif
+	erase_kstack
 	ldr	x1, [tsk, #TSK_TI_FLAGS]
 	and	x2, x1, #_TIF_WORK_MASK
 	cbnz	x2, work_pending
@@ -1337,3 +1343,105 @@ alternative_else_nop_endif
 ENDPROC(__sdei_asm_handler)
 NOKPROBE(__sdei_asm_handler)
 #endif /* CONFIG_ARM_SDE_INTERFACE */
+
+/*
+ * This is what the stack looks like
+ *
+ * +---+ <- task_stack_page(p) + THREAD_SIZE
+ * |   |
+ * +---+ <- task_stack_page(p) + THREAD_START_SP
+ * |   |
+ * |   |
+ * +---+ <- task_pt_regs(p)
+ * |   |
+ * |   |
+ * |   | <- current_sp
+ * ~~~~~
+ *
+ * ~~~~~
+ * |   | <- lowest_stack
+ * |   |
+ * |   |
+ * +---+ <- task_stack_page(p)
+ *
+ * This function is desgned to poison the memory between the lowest_stack
+ * and the current stack pointer. After clearing the stack, the lowest
+ * stack is reset.
+ */
+
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+ENTRY(__erase_kstack)
+	mov	x10, x0	// save x0 for the fast path
+
+	get_thread_info	x0
+	ldr	x1, [x0, #TSK_TI_LOWEST_STACK]
+
+	/* get the number of bytes to check for lowest stack */
+	mov	x3, x1
+	and	x3, x3, #THREAD_SIZE - 1
+	lsr	x3, x3, #3
+
+	/* generate addresses from the bottom of the stack */
+	mov	x4, sp
+	movn	x2, #THREAD_SIZE - 1
+	and	x1, x4, x2
+
+	mov	x2, #STACKLEAK_POISON
+
+	mov	x5, #0
+1:
+	/*
+	 * As borrowed from the x86 logic, start from the lowest_stack
+	 * and go to the bottom to find the poison value.
+	 * The check of 16 is to hopefully avoid false positives.
+	 */
+	cbz	x3, 4f
+	ldr	x4, [x1, x3, lsl #3]
+	cmp	x4, x2
+	csinc	x5, xzr, x5, ne
+	tbnz	x5, #STACKLEAK_POISON_CHECK_DEPTH/4, 4f	// found 16 poisons?
+	sub	x3, x3, #1
+	b	1b
+
+4:
+	/* total number of bytes to poison */
+	add     x5, x1, x3, lsl #3
+	mov	x4, sp
+	sub     x8, x4, x5
+
+	cmp     x8, #THREAD_SIZE // sanity check the range
+	b.lo    5f
+	ASM_BUG()
+
+5:
+	/*
+	 * We may have hit a path where the stack did not get used,
+	 * no need to do anything here
+	 */
+	cbz	x8, 7f
+
+	sub	x8, x8, #1 // don't poison the current stack pointer
+
+	lsr     x8, x8, #3
+	add     x3, x3, x8
+
+	/*
+	 * The logic of this loop ensures the last stack word isn't
+	 * ovewritten.
+	 */
+6:
+	cbz     x8, 7f
+	str     x2, [x1, x3, lsl #3]
+	sub     x3, x3, #1
+	sub     x8, x8, #1
+	b	6b
+
+	/* Reset the lowest stack to the top of the stack */
+7:
+	mov	x1, sp
+	str	x1, [x0, #TSK_TI_LOWEST_STACK]
+
+	mov	x0, x10
+	ret
+ENDPROC(__erase_kstack)
+#endif
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index ad8aeb098b31..fd0528db6772 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -357,6 +357,9 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
 	p->thread.cpu_context.sp = (unsigned long)childregs;
 
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+	p->thread.lowest_stack = (unsigned long)task_stack_page(p);
+#endif
 	ptrace_hw_copy_thread(p);
 
 	return 0;
@@ -486,3 +489,16 @@ void arch_setup_new_exec(void)
 {
 	current->mm->context.flags = is_compat_task() ? MMCF_AARCH32 : 0;
 }
+
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+void __used check_alloca(unsigned long size)
+{
+	unsigned long sp, stack_left;
+
+	sp = current_stack_pointer;
+
+	stack_left = sp & (THREAD_SIZE - 1);
+	BUG_ON(stack_left < 256 || size >= stack_left - 256);
+}
+EXPORT_SYMBOL(check_alloca);
+#endif
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 7b3ba40f0745..35ebbc1b17ff 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -20,7 +20,8 @@ cflags-$(CONFIG_EFI_ARMSTUB)	+= -I$(srctree)/scripts/dtc/libfdt
 KBUILD_CFLAGS			:= $(cflags-y) -DDISABLE_BRANCH_PROFILING \
 				   -D__NO_FORTIFY \
 				   $(call cc-option,-ffreestanding) \
-				   $(call cc-option,-fno-stack-protector)
+				   $(call cc-option,-fno-stack-protector) \
+				   $(DISABLE_STACKLEAK_PLUGIN)
 
 GCOV_PROFILE			:= n
 KASAN_SANITIZE			:= n
diff --git a/scripts/Makefile.gcc-plugins b/scripts/Makefile.gcc-plugins
index 8d6070fc538f..6cc0e35d3324 100644
--- a/scripts/Makefile.gcc-plugins
+++ b/scripts/Makefile.gcc-plugins
@@ -37,11 +37,14 @@ ifdef CONFIG_GCC_PLUGINS
 
   gcc-plugin-$(CONFIG_GCC_PLUGIN_STACKLEAK)	+= stackleak_plugin.so
   gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_STACKLEAK)	+= -DSTACKLEAK_PLUGIN -fplugin-arg-stackleak_plugin-track-min-size=$(CONFIG_STACKLEAK_TRACK_MIN_SIZE)
+  ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+    DISABLE_STACKLEAK_PLUGIN		+= -fplugin-arg-stackleak_plugin-disable
+  endif
 
   GCC_PLUGINS_CFLAGS := $(strip $(addprefix -fplugin=$(objtree)/scripts/gcc-plugins/, $(gcc-plugin-y)) $(gcc-plugin-cflags-y))
 
   export PLUGINCC GCC_PLUGINS_CFLAGS GCC_PLUGIN GCC_PLUGIN_SUBDIR
-  export SANCOV_PLUGIN DISABLE_LATENT_ENTROPY_PLUGIN
+  export SANCOV_PLUGIN DISABLE_LATENT_ENTROPY_PLUGIN DISABLE_STACKLEAK_PLUGIN
 
   ifneq ($(PLUGINCC),)
     # SANCOV_PLUGIN can be only in CFLAGS_KCOV because avoid duplication.
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it
  2018-02-20 23:17   ` Kees Cook
  2018-02-20 23:33     ` Laura Abbott
@ 2018-02-21 10:05     ` Borislav Petkov
  2018-02-21 15:09       ` Alexander Popov
  2018-02-21 14:43     ` Alexander Popov
  2 siblings, 1 reply; 61+ messages in thread
From: Borislav Petkov @ 2018-02-21 10:05 UTC (permalink / raw)
  To: Kees Cook, Alexander Popov
  Cc: Thomas Gleixner, Andy Lutomirski, Kernel Hardening, PaX Team,
	Brad Spengler, Ingo Molnar, Tycho Andersen, Laura Abbott,
	Mark Rutland, Ard Biesheuvel, H . Peter Anvin, Peter Zijlstra,
	Dmitry V . Levin, X86 ML, Mohamed Ghannam

On Tue, Feb 20, 2018 at 03:17:21PM -0800, Kees Cook wrote:
> On Tue, Feb 20, 2018 at 2:29 AM, Alexander Popov <alex.popov@linux.com> wrote:
> > On 16.02.2018 21:10, Alexander Popov wrote:
> >> This is the 8th version of the patch series introducing STACKLEAK to the
> >> mainline kernel. I've made some minor improvements while waiting for the
> >> next review by x86 maintainers.
> 
> If we can borrow some of luto or tglx's time, I think that'd be best:

Btw, someone just noticed: why hasn't this patchset ever been sent to
lkml for a wider review?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v8 1/6] x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
  2018-02-16 18:10 ` [PATCH RFC v8 1/6] x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls Alexander Popov
@ 2018-02-21 13:24   ` Borislav Petkov
  2018-02-21 21:49     ` Alexander Popov
  0 siblings, 1 reply; 61+ messages in thread
From: Borislav Petkov @ 2018-02-21 13:24 UTC (permalink / raw)
  To: Alexander Popov
  Cc: kernel-hardening, Kees Cook, PaX Team, Brad Spengler,
	Ingo Molnar, Andy Lutomirski, Tycho Andersen, Laura Abbott,
	Mark Rutland, Ard Biesheuvel, Thomas Gleixner, H . Peter Anvin,
	Peter Zijlstra, Dmitry V . Levin, x86

On Fri, Feb 16, 2018 at 09:10:52PM +0300, Alexander Popov wrote:
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 63bf349..62748bd 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -119,6 +119,7 @@ config X86
>  	select HAVE_ARCH_COMPAT_MMAP_BASES	if MMU && COMPAT
>  	select HAVE_ARCH_SECCOMP_FILTER
>  	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
> +	select HAVE_ARCH_STACKLEAK
>  	select HAVE_ARCH_TRACEHOOK
>  	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
>  	select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if X86_64
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index 16c2c02..4a7365a 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -77,6 +77,90 @@
>  #endif
>  .endm
>  
> +.macro erase_kstack
> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
> +	call erase_kstack

Hmm, why do we need a macro *and* a function of the same name? Why not do

	call erase_kstack

everywhere we need to?

And then do

ENTRY(erase_kstack)
#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
	...

to tone down the ifdeffery.

> +#endif
> +.endm
> +
> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
> +ENTRY(erase_kstack)
> +	pushl	%edi
> +	pushl	%ecx
> +	pushl	%eax
> +	pushl	%ebp
> +
> +	movl	PER_CPU_VAR(current_task), %ebp
> +	mov	TASK_lowest_stack(%ebp), %edi

So instead of TASK_lowest_stack and adding all the ifdeffery around, why
not do this computation:

	(unsigned long)task_stack_page(p) + 2 * sizeof(unsigned long);

in asm?

You only need task->stack as an offset variable. And other code might
need it too, anyway so you could just as well use it instead of adding
another var to thread_struct.

/me reads further.

Ah. So this lowest_stack thing changes at the end:

	"Set the lowest_stack value to the top_of_stack - 256"

Why?

So that magic needs more explanation for slow people like me.

> +	mov	$STACKLEAK_POISON, %eax
> +	std
> +
> +	/*
> +	 * Let's search for the poison value in the stack.
> +	 * Start from the lowest_stack and go to the bottom (see std above).

Let's write insns capitalized: "see STD above".

...

> +	/*
> +	 * Check that some further qwords contain poison. If so, the part
> +	 * of the stack below the address in %rdi is likely to be poisoned.
> +	 * Otherwise we need to search deeper.
> +	 */
> +	mov	$STACKLEAK_POISON_CHECK_DEPTH / 8, %ecx
> +	repe	scasq
> +	jecxz	2f	/* Poison the upper part of the stack */
> +	jne	1b	/* Search deeper */
> +
> +2:

You could name that label

.Lpoison

and make the code more readable:

	jb	.Lpoison

and so on.

> +	/*
> +	 * Two qwords at the bottom of the thread stack are reserved and
> +	 * should not be poisoned (see CONFIG_SCHED_STACK_END_CHECK).
> +	 */
> +	or	$2 * 8, %rdi
> +
> +	/*
> +	 * Check whether we are on the thread stack to prepare the counter
> +	 * for stack poisoning.
> +	 */
> +	mov	PER_CPU_VAR(cpu_current_top_of_stack), %rcx
> +	sub	%rsp, %rcx
> +	cmp	$THREAD_SIZE_asm, %rcx
> +	jb	3f
> +
> +	/*
> +	 * We are not on the thread stack, so we can write poison between
> +	 * the address in %rdi and the stack top.
> +	 */
> +	mov	PER_CPU_VAR(cpu_current_top_of_stack), %rcx
> +	sub	%rdi, %rcx
> +	jmp	4f
> +
> +3:
> +	/*
> +	 * We are on the thread stack, so we can write poison between the
> +	 * address in %rdi and the address in %rsp (not included, used memory).
> +	 */
> +	mov	%rsp, %rcx
> +	sub	%rdi, %rcx
> +
> +4:
> +	/* Check that the counter value is sane */
> +	cmp	$THREAD_SIZE_asm, %rcx
> +	jb	5f
> +	ud2
> +
> +5:
> +	/*
> +	 * So let's write the poison value to the kernel stack. Start from the
> +	 * address in %rdi and move up (see cld).
> +	 */
> +	cld
> +	shr	$3, %ecx
> +	rep	stosq

Hohumm, that's >2K loops per syscall here and stack is

0xffffc90000008010:     0xffffffffffff4111      0xffffffffffff4111
0xffffc90000008020:     0xffffffffffff4111      0xffffffffffff4111
0xffffc90000008030:     0xffffffffffff4111      0xffffffffffff4111
0xffffc90000008040:     0xffffffffffff4111      0xffffffffffff4111
0xffffc90000008050:     0xffffffffffff4111      0xffffffffffff4111
0xffffc90000008060:     0xffffffffffff4111      0xffffffffffff4111
...

> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index 9eb448c..6dc55f6 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -281,6 +281,11 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
>  	p->thread.sp = (unsigned long) fork_frame;
>  	p->thread.io_bitmap_ptr = NULL;
>  
> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
> +	p->thread.lowest_stack = (unsigned long)task_stack_page(p) +
> +						2 * sizeof(unsigned long);
> +#endif
> +
>  	savesegment(gs, p->thread.gsindex);
>  	p->thread.gsbase = p->thread.gsindex ? 0 : me->thread.gsbase;
>  	savesegment(fs, p->thread.fsindex);
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index e835fc0..fe04f60 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -337,4 +337,10 @@ unsigned long read_word_at_a_time(const void *addr)
>  	compiletime_assert(__native_word(t),				\
>  		"Need native word sized stores/loads for atomicity.")
>  
> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
> +/* Poison value points to the unused hole in the virtual memory map */

There are a bunch of those there now. Please document it in
Documentation/x86/x86_64/mm.txt

> +# define STACKLEAK_POISON -0xBEEF
> +# define STACKLEAK_POISON_CHECK_DEPTH 128
> +#endif
> +
>  #endif /* __LINUX_COMPILER_H */
> -- 
> 2.7.4
> 

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it
  2018-02-20 23:17   ` Kees Cook
  2018-02-20 23:33     ` Laura Abbott
  2018-02-21 10:05     ` [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it Borislav Petkov
@ 2018-02-21 14:43     ` Alexander Popov
  2 siblings, 0 replies; 61+ messages in thread
From: Alexander Popov @ 2018-02-21 14:43 UTC (permalink / raw)
  To: Kees Cook, Thomas Gleixner, Andy Lutomirski
  Cc: Kernel Hardening, PaX Team, Brad Spengler, Ingo Molnar,
	Tycho Andersen, Laura Abbott, Mark Rutland, Ard Biesheuvel,
	Borislav Petkov, H . Peter Anvin, Peter Zijlstra,
	Dmitry V . Levin, X86 ML, Mohamed Ghannam

Hello Kees,

On 21.02.2018 02:17, Kees Cook wrote:
> On Tue, Feb 20, 2018 at 2:29 AM, Alexander Popov <alex.popov@linux.com> wrote:
>> On 16.02.2018 21:10, Alexander Popov wrote:
>>> This is the 8th version of the patch series introducing STACKLEAK to the
>>> mainline kernel. I've made some minor improvements while waiting for the
>>> next review by x86 maintainers.
> 
> If we can borrow some of luto or tglx's time, I think that'd be best:
> they've been looking at the entry code a lot lately. :) Regardless, I
> think the addition to the entry code is clean (especially now that the
> fast path is gone *sob*). :P
> 
>>> STACKLEAK is a security feature developed by Grsecurity/PaX (kudos to them),
>>> which:
>>>  - reduces the information that can be revealed through kernel stack leak bugs;
>>>  - blocks some uninitialized stack variable attacks (e.g. CVE-2010-2963);
>>>  - introduces some runtime checks for kernel stack overflow detection.
> 
> I've added this series to my kernel.org trees, which means 0-day will
> start grinding on it too now:
> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git
> kspp/gcc-plugin/stackleak

Great, thanks!

> The LKDTM tests look great and check out for me. I think the code is
> clear, so I'd like to get it into -next, but I want to be sure I'm not
> stepping on x86 toes first.

Yes, I see. Thanks for your review anyway.

> Laura, how does arm64 look for this? Would it be possible to add it to
> this series (at least on kernel.org for build/run testing)?
> 
>> Hello! I've just tested STACKLEAK against the recent CVE-2017-17712 exploit:
>> http://openwall.com/lists/oss-security/2018/02/20/1
>>
>> This vulnerability is a race condition in raw_sendmsg() in net/ipv4/raw.c. It
>> leads to uninitialized stack pointer usage which can be used for a local
>> privilege escalation.
>>
>> CVE-2017-17712 was discovered and fixed by Mohamed Ghannam (kudos to him!). He
>> also provided a stable PoC exploit for it.
>>
>> With STACKLEAK, the uninitialized stack pointer is set to STACKLEAK_POISON
>> (-0xBEEF) and points to the unused hole in the virtual memory map. That blocks
>> the stack spraying needed for CVE-2017-17712 exploit (similar to CVE-2010-2963).
> 
> Nice check! I wonder if the the byref structleak also solves it?
> Regardless, I think stackleak has better performance and greater
> coverage.

I disabled CONFIG_GCC_PLUGIN_STACKLEAK and instead enabled
CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL. The msg pointer is now 0. So the access
to msg->msg_iter now gives the following:

[    7.324333] BUG: unable to handle kernel NULL pointer dereference at
0000000000000010
[    7.325067] IP: csum_and_copy_from_iter_full+0x1e/0x400
[    7.325067] PGD 800000007db6c067 P4D 800000007db6c067 PUD 7d34a067 PMD 0
[    7.325067] Oops: 0000 [#1] SMP PTI
[    7.325067] Dumping ftrace buffer:
[    7.325067]    (ftrace buffer empty)
[    7.325067] Modules linked in:
[    7.325067] CPU: 0 PID: 2708 Comm: poc Not tainted 4.16.0-rc1+ #5
[    7.325067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
[    7.325067] RIP: 0010:csum_and_copy_from_iter_full+0x1e/0x400
[    7.325067] RSP: 0018:ffffc90000d839d0 EFLAGS: 00010246
[    7.325067] RAX: 0000000000000000 RBX: ffff88003d0887c0 RCX: 0000000000000010
[    7.325067] RDX: ffffc90000d83a64 RSI: 0000000000006400 RDI: ffff88003b828024
[    7.325067] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88003d0887c0
[    7.325067] R10: ffff88003b828024 R11: 0000000000000000 R12: 0000000000006400
[    7.325067] R13: 0000000000006400 R14: ffff88003de53640 R15: ffff88007d90c110
[    7.325067] FS:  00007f52585d0700(0000) GS:ffff88003ec00000(0000)
knlGS:0000000000000000
[    7.325067] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.325067] CR2: 0000000000000010 CR3: 000000007db70000 CR4: 00000000000006f0
[    7.325067] Call Trace:
[    7.325067]  ? __kmalloc_reserve.isra.41+0x29/0x80
[    7.325067]  ip_generic_getfrag+0x73/0xc0
[    7.325067]  __ip_append_data.isra.48+0x68b/0x890
[    7.325067]  ? raw_destroy+0x20/0x20
[    7.325067]  ? raw_destroy+0x20/0x20
[    7.325067]  ip_append_data.part.50+0x67/0xc0
[    7.325067]  raw_sendmsg+0x486/0xa50
[    7.325067]  ? __kmalloc+0x141/0x1a0
[    7.325067]  sock_sendmsg+0x31/0x40
[    7.325067]  ___sys_sendmsg+0x277/0x2d0
[    7.325067]  ? __sys_sendmsg+0x62/0xa0
[    7.325067]  __sys_sendmsg+0x62/0xa0
[    7.325067]  do_syscall_64+0x63/0x120
[    7.325067]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[    7.325067] RIP: 0033:0x7f525c186e90
[    7.325067] RSP: 002b:00007f52585cff00 EFLAGS: 00000293 ORIG_RAX:
000000000000002e
[    7.325067] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f525c186e90
[    7.325067] RDX: 0000000000000000 RSI: 00000000011bd010 RDI: 0000000000000003
[    7.325067] RBP: 00000000011bd010 R08: 0000000000000000 R09: 00007f52585d0700
[    7.325067] R10: 00007f52585cff40 R11: 0000000000000293 R12: 0000000000000000
[    7.325067] R13: 00007fffde2c70bf R14: 0000000000000000 R15: 00007f525c5b7040
[    7.325067] Code: 4c 89 c3 eb f6 49 01 d8 e9 7a ff ff ff 41 57 41 56 41 55 41
54 55 53 48 83 ec 48 65 48 8b 04 25 28 00 00 00 48 89 44 24 40 31 c0 <8b> 01 48
89 14 24 a8 08 0f 85 e5 00 00 00 48 39 71 10 49 89 f6
[    7.325067] RIP: csum_and_copy_from_iter_full+0x1e/0x400 RSP: ffffc90000d839d0
[    7.325067] CR2: 0000000000000010
[    7.382645] ---[ end trace f6927758360fb538 ]---


I think that would block the stack spraying as well.

Best regards,
Alexander

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 0/2] Stackleak for arm64
  2018-02-21  1:13         ` Laura Abbott
@ 2018-02-21 14:48           ` Alexander Popov
  -1 siblings, 0 replies; 61+ messages in thread
From: Alexander Popov @ 2018-02-21 14:48 UTC (permalink / raw)
  To: Laura Abbott, Kees Cook, Mark Rutland, Ard Biesheuvel
  Cc: kernel-hardening, linux-arm-kernel, linux-kernel

On 21.02.2018 04:13, Laura Abbott wrote:
> This is the arm64 version of the STACKLEAK plugin originall from
> grsecurity. See
> https://marc.info/?l=kernel-hardening&m=151880470609808 for the
> full x86 version. This is based on top of Kees' branch for stackleak
> and has been cleaned up to use a few macros from that branch.
> 
> Comments welcome, if there are no major objections Kees will queue this
> up to get some CI testing. This passed both of the LKDTM tests.

Hello, Laura,

Thank you. I'll take some time to learn your patches and test them on my LeMaker
HiKey board. I'll return with the feedback.

Best regards,
Alexander

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 0/2] Stackleak for arm64
@ 2018-02-21 14:48           ` Alexander Popov
  0 siblings, 0 replies; 61+ messages in thread
From: Alexander Popov @ 2018-02-21 14:48 UTC (permalink / raw)
  To: linux-arm-kernel

On 21.02.2018 04:13, Laura Abbott wrote:
> This is the arm64 version of the STACKLEAK plugin originall from
> grsecurity. See
> https://marc.info/?l=kernel-hardening&m=151880470609808 for the
> full x86 version. This is based on top of Kees' branch for stackleak
> and has been cleaned up to use a few macros from that branch.
> 
> Comments welcome, if there are no major objections Kees will queue this
> up to get some CI testing. This passed both of the LKDTM tests.

Hello, Laura,

Thank you. I'll take some time to learn your patches and test them on my LeMaker
HiKey board. I'll return with the feedback.

Best regards,
Alexander

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it
  2018-02-21 10:05     ` [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it Borislav Petkov
@ 2018-02-21 15:09       ` Alexander Popov
  0 siblings, 0 replies; 61+ messages in thread
From: Alexander Popov @ 2018-02-21 15:09 UTC (permalink / raw)
  To: Borislav Petkov, Kees Cook
  Cc: Thomas Gleixner, Andy Lutomirski, Kernel Hardening, PaX Team,
	Brad Spengler, Ingo Molnar, Tycho Andersen, Laura Abbott,
	Mark Rutland, Ard Biesheuvel, H . Peter Anvin, Peter Zijlstra,
	Dmitry V . Levin, X86 ML, Mohamed Ghannam

On 21.02.2018 13:05, Borislav Petkov wrote:
> On Tue, Feb 20, 2018 at 03:17:21PM -0800, Kees Cook wrote:
>> On Tue, Feb 20, 2018 at 2:29 AM, Alexander Popov <alex.popov@linux.com> wrote:
>>> On 16.02.2018 21:10, Alexander Popov wrote:
>>>> This is the 8th version of the patch series introducing STACKLEAK to the
>>>> mainline kernel. I've made some minor improvements while waiting for the
>>>> next review by x86 maintainers.
>>
>> If we can borrow some of luto or tglx's time, I think that'd be best:
> 
> Btw, someone just noticed: why hasn't this patchset ever been sent to
> lkml for a wider review?

Thanks for that notice.

I just realized that I'm resending this patch series to people who I chose
approximately 8 month ago, when I started to work on STACKLEAK.

I'll use scripts/get_maintainer.pl for the next version.

Best regards,
Alexander

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/2] arm64: Clear the stack
  2018-02-21  1:13           ` Laura Abbott
@ 2018-02-21 15:38             ` Mark Rutland
  -1 siblings, 0 replies; 61+ messages in thread
From: Mark Rutland @ 2018-02-21 15:38 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Alexander Popov, Kees Cook, Ard Biesheuvel, kernel-hardening,
	linux-arm-kernel, linux-kernel

Hi Laura,

On Tue, Feb 20, 2018 at 05:13:03PM -0800, Laura Abbott wrote:
> Implementation of stackleak based heavily on the x86 version

Neat!

> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index ec2ee720e33e..b909b436293a 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -401,6 +401,11 @@ tsk	.req	x28		// current thread_info
>  
>  	.text
>  
> +	.macro	erase_kstack
> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
> +	bl	__erase_kstack
> +#endif
> +	.endm
>  /*
>   * Exception vectors.
>   */
> @@ -901,6 +906,7 @@ work_pending:
>   */
>  ret_to_user:
>  	disable_daif
> +	erase_kstack

I *think* this should happen in finish_ret_to_user a few lines down, since we
can call C code if we branch to work_pending, dirtying the stack.

>  	ldr	x1, [tsk, #TSK_TI_FLAGS]
>  	and	x2, x1, #_TIF_WORK_MASK
>  	cbnz	x2, work_pending
> @@ -1337,3 +1343,105 @@ alternative_else_nop_endif
>  ENDPROC(__sdei_asm_handler)
>  NOKPROBE(__sdei_asm_handler)
>  #endif /* CONFIG_ARM_SDE_INTERFACE */
> +
> +/*
> + * This is what the stack looks like
> + *
> + * +---+ <- task_stack_page(p) + THREAD_SIZE
> + * |   |
> + * +---+ <- task_stack_page(p) + THREAD_START_SP
> + * |   |
> + * |   |
> + * +---+ <- task_pt_regs(p)

THREAD_START_SP got killed off in commit 34be98f4944f9907 as part of the
VMAP_STACK rework, so this can be:

      +---+ <- task_stack_page(p) + THREAD_SIZE
      |   |
      |   |
      +---+ <- task_pt_regs(p)
       ...

> + * |   |
> + * |   |
> + * |   | <- current_sp
> + * ~~~~~
> + *
> + * ~~~~~
> + * |   | <- lowest_stack
> + * |   |
> + * |   |
> + * +---+ <- task_stack_page(p)
> + *
> + * This function is desgned to poison the memory between the lowest_stack
> + * and the current stack pointer. After clearing the stack, the lowest
> + * stack is reset.
> + */
> +
> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
> +ENTRY(__erase_kstack)
> +	mov	x10, x0	// save x0 for the fast path

AFAICT, we only call this from ret_to_user, where x0 doesn't need to be
preserved.

Is that for ret_fast_syscall? In some cases, ret_fast_syscall can bypass
ret_to_user and calls kernel_exit directly, so we might need a call there.

> +
> +	get_thread_info	x0
> +	ldr	x1, [x0, #TSK_TI_LOWEST_STACK]
> +
> +	/* get the number of bytes to check for lowest stack */
> +	mov	x3, x1
> +	and	x3, x3, #THREAD_SIZE - 1
> +	lsr	x3, x3, #3
> +
> +	/* generate addresses from the bottom of the stack */
> +	mov	x4, sp
> +	movn	x2, #THREAD_SIZE - 1
> +	and	x1, x4, x2

Can we replace the MOVN;AND with a single instruction to clear the low bits?
e.g.

	mov	x4, sp
	bic	x1, x4, #THREAD_SIZE - 1

... IIUC BIC is an alias for the bitfield instructions, though I can't recall
exactly which one(s).

> +
> +	mov	x2, #STACKLEAK_POISON
> +
> +	mov	x5, #0
> +1:
> +	/*
> +	 * As borrowed from the x86 logic, start from the lowest_stack
> +	 * and go to the bottom to find the poison value.
> +	 * The check of 16 is to hopefully avoid false positives.
> +	 */
> +	cbz	x3, 4f
> +	ldr	x4, [x1, x3, lsl #3]
> +	cmp	x4, x2
> +	csinc	x5, xzr, x5, ne
> +	tbnz	x5, #STACKLEAK_POISON_CHECK_DEPTH/4, 4f	// found 16 poisons?
> +	sub	x3, x3, #1
> +	b	1b
> +
> +4:
> +	/* total number of bytes to poison */
> +	add     x5, x1, x3, lsl #3
> +	mov	x4, sp
> +	sub     x8, x4, x5
> +
> +	cmp     x8, #THREAD_SIZE // sanity check the range
> +	b.lo    5f
> +	ASM_BUG()
> +
> +5:
> +	/*
> +	 * We may have hit a path where the stack did not get used,
> +	 * no need to do anything here
> +	 */
> +	cbz	x8, 7f
> +
> +	sub	x8, x8, #1 // don't poison the current stack pointer
> +
> +	lsr     x8, x8, #3
> +	add     x3, x3, x8
> +
> +	/*
> +	 * The logic of this loop ensures the last stack word isn't
> +	 * ovewritten.
> +	 */

Is that to ensure that we don't clobber the word at the current sp value?

> +6:
> +	cbz     x8, 7f
> +	str     x2, [x1, x3, lsl #3]
> +	sub     x3, x3, #1
> +	sub     x8, x8, #1
> +	b	6b
> +
> +	/* Reset the lowest stack to the top of the stack */
> +7:
> +	mov	x1, sp
> +	str	x1, [x0, #TSK_TI_LOWEST_STACK]
> +
> +	mov	x0, x10
> +	ret
> +ENDPROC(__erase_kstack)
> +#endif

[...]

> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 7b3ba40f0745..35ebbc1b17ff 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -20,7 +20,8 @@ cflags-$(CONFIG_EFI_ARMSTUB)	+= -I$(srctree)/scripts/dtc/libfdt
>  KBUILD_CFLAGS			:= $(cflags-y) -DDISABLE_BRANCH_PROFILING \
>  				   -D__NO_FORTIFY \
>  				   $(call cc-option,-ffreestanding) \
> -				   $(call cc-option,-fno-stack-protector)
> +				   $(call cc-option,-fno-stack-protector) \
> +				   $(DISABLE_STACKLEAK_PLUGIN)

I believe the KVM hyp code will also need to opt-out of this.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 2/2] arm64: Clear the stack
@ 2018-02-21 15:38             ` Mark Rutland
  0 siblings, 0 replies; 61+ messages in thread
From: Mark Rutland @ 2018-02-21 15:38 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Laura,

On Tue, Feb 20, 2018 at 05:13:03PM -0800, Laura Abbott wrote:
> Implementation of stackleak based heavily on the x86 version

Neat!

> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index ec2ee720e33e..b909b436293a 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -401,6 +401,11 @@ tsk	.req	x28		// current thread_info
>  
>  	.text
>  
> +	.macro	erase_kstack
> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
> +	bl	__erase_kstack
> +#endif
> +	.endm
>  /*
>   * Exception vectors.
>   */
> @@ -901,6 +906,7 @@ work_pending:
>   */
>  ret_to_user:
>  	disable_daif
> +	erase_kstack

I *think* this should happen in finish_ret_to_user a few lines down, since we
can call C code if we branch to work_pending, dirtying the stack.

>  	ldr	x1, [tsk, #TSK_TI_FLAGS]
>  	and	x2, x1, #_TIF_WORK_MASK
>  	cbnz	x2, work_pending
> @@ -1337,3 +1343,105 @@ alternative_else_nop_endif
>  ENDPROC(__sdei_asm_handler)
>  NOKPROBE(__sdei_asm_handler)
>  #endif /* CONFIG_ARM_SDE_INTERFACE */
> +
> +/*
> + * This is what the stack looks like
> + *
> + * +---+ <- task_stack_page(p) + THREAD_SIZE
> + * |   |
> + * +---+ <- task_stack_page(p) + THREAD_START_SP
> + * |   |
> + * |   |
> + * +---+ <- task_pt_regs(p)

THREAD_START_SP got killed off in commit 34be98f4944f9907 as part of the
VMAP_STACK rework, so this can be:

      +---+ <- task_stack_page(p) + THREAD_SIZE
      |   |
      |   |
      +---+ <- task_pt_regs(p)
       ...

> + * |   |
> + * |   |
> + * |   | <- current_sp
> + * ~~~~~
> + *
> + * ~~~~~
> + * |   | <- lowest_stack
> + * |   |
> + * |   |
> + * +---+ <- task_stack_page(p)
> + *
> + * This function is desgned to poison the memory between the lowest_stack
> + * and the current stack pointer. After clearing the stack, the lowest
> + * stack is reset.
> + */
> +
> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
> +ENTRY(__erase_kstack)
> +	mov	x10, x0	// save x0 for the fast path

AFAICT, we only call this from ret_to_user, where x0 doesn't need to be
preserved.

Is that for ret_fast_syscall? In some cases, ret_fast_syscall can bypass
ret_to_user and calls kernel_exit directly, so we might need a call there.

> +
> +	get_thread_info	x0
> +	ldr	x1, [x0, #TSK_TI_LOWEST_STACK]
> +
> +	/* get the number of bytes to check for lowest stack */
> +	mov	x3, x1
> +	and	x3, x3, #THREAD_SIZE - 1
> +	lsr	x3, x3, #3
> +
> +	/* generate addresses from the bottom of the stack */
> +	mov	x4, sp
> +	movn	x2, #THREAD_SIZE - 1
> +	and	x1, x4, x2

Can we replace the MOVN;AND with a single instruction to clear the low bits?
e.g.

	mov	x4, sp
	bic	x1, x4, #THREAD_SIZE - 1

... IIUC BIC is an alias for the bitfield instructions, though I can't recall
exactly which one(s).

> +
> +	mov	x2, #STACKLEAK_POISON
> +
> +	mov	x5, #0
> +1:
> +	/*
> +	 * As borrowed from the x86 logic, start from the lowest_stack
> +	 * and go to the bottom to find the poison value.
> +	 * The check of 16 is to hopefully avoid false positives.
> +	 */
> +	cbz	x3, 4f
> +	ldr	x4, [x1, x3, lsl #3]
> +	cmp	x4, x2
> +	csinc	x5, xzr, x5, ne
> +	tbnz	x5, #STACKLEAK_POISON_CHECK_DEPTH/4, 4f	// found 16 poisons?
> +	sub	x3, x3, #1
> +	b	1b
> +
> +4:
> +	/* total number of bytes to poison */
> +	add     x5, x1, x3, lsl #3
> +	mov	x4, sp
> +	sub     x8, x4, x5
> +
> +	cmp     x8, #THREAD_SIZE // sanity check the range
> +	b.lo    5f
> +	ASM_BUG()
> +
> +5:
> +	/*
> +	 * We may have hit a path where the stack did not get used,
> +	 * no need to do anything here
> +	 */
> +	cbz	x8, 7f
> +
> +	sub	x8, x8, #1 // don't poison the current stack pointer
> +
> +	lsr     x8, x8, #3
> +	add     x3, x3, x8
> +
> +	/*
> +	 * The logic of this loop ensures the last stack word isn't
> +	 * ovewritten.
> +	 */

Is that to ensure that we don't clobber the word at the current sp value?

> +6:
> +	cbz     x8, 7f
> +	str     x2, [x1, x3, lsl #3]
> +	sub     x3, x3, #1
> +	sub     x8, x8, #1
> +	b	6b
> +
> +	/* Reset the lowest stack to the top of the stack */
> +7:
> +	mov	x1, sp
> +	str	x1, [x0, #TSK_TI_LOWEST_STACK]
> +
> +	mov	x0, x10
> +	ret
> +ENDPROC(__erase_kstack)
> +#endif

[...]

> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 7b3ba40f0745..35ebbc1b17ff 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -20,7 +20,8 @@ cflags-$(CONFIG_EFI_ARMSTUB)	+= -I$(srctree)/scripts/dtc/libfdt
>  KBUILD_CFLAGS			:= $(cflags-y) -DDISABLE_BRANCH_PROFILING \
>  				   -D__NO_FORTIFY \
>  				   $(call cc-option,-ffreestanding) \
> -				   $(call cc-option,-fno-stack-protector)
> +				   $(call cc-option,-fno-stack-protector) \
> +				   $(DISABLE_STACKLEAK_PLUGIN)

I believe the KVM hyp code will also need to opt-out of this.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v8 1/6] x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
  2018-02-21 13:24   ` Borislav Petkov
@ 2018-02-21 21:49     ` Alexander Popov
  2018-02-22 19:14       ` Borislav Petkov
  0 siblings, 1 reply; 61+ messages in thread
From: Alexander Popov @ 2018-02-21 21:49 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: kernel-hardening, Kees Cook, PaX Team, Brad Spengler,
	Ingo Molnar, Andy Lutomirski, Tycho Andersen, Laura Abbott,
	Mark Rutland, Ard Biesheuvel, Thomas Gleixner, H . Peter Anvin,
	Peter Zijlstra, Dmitry V . Levin, x86

Hello Borislav,

Thanks a lot for your review. Please see my comments below.

On 21.02.2018 16:24, Borislav Petkov wrote:
> On Fri, Feb 16, 2018 at 09:10:52PM +0300, Alexander Popov wrote:
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 63bf349..62748bd 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -119,6 +119,7 @@ config X86
>>  	select HAVE_ARCH_COMPAT_MMAP_BASES	if MMU && COMPAT
>>  	select HAVE_ARCH_SECCOMP_FILTER
>>  	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
>> +	select HAVE_ARCH_STACKLEAK
>>  	select HAVE_ARCH_TRACEHOOK
>>  	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
>>  	select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if X86_64
>> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
>> index 16c2c02..4a7365a 100644
>> --- a/arch/x86/entry/entry_32.S
>> +++ b/arch/x86/entry/entry_32.S
>> @@ -77,6 +77,90 @@
>>  #endif
>>  .endm
>>  
>> +.macro erase_kstack
>> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
>> +	call erase_kstack
> 
> Hmm, why do we need a macro *and* a function of the same name? Why not do
> 
> 	call erase_kstack
> 
> everywhere we need to?
> 
> And then do
> 
> ENTRY(erase_kstack)
> #ifdef CONFIG_GCC_PLUGIN_STACKLEAK
> 	...
> 
> to tone down the ifdeffery.

This approach doesn't work. There is the diff:

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index b418d3a..0a05755 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -66,14 +66,8 @@ END(native_usergs_sysret64)
        TRACE_IRQS_FLAGS EFLAGS(%rsp)
 .endm

-.macro erase_kstack
-#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
-       call erase_kstack
-#endif
-.endm
-
-#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
 ENTRY(erase_kstack)
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
        pushq   %rdi
        pushq   %rcx
        pushq   %rax
@@ -173,8 +167,8 @@ ENTRY(erase_kstack)
        popq    %rcx
        popq    %rdi
        ret
-ENDPROC(erase_kstack)
 #endif
+ENDPROC(erase_kstack)

 /*
  * When dynamic function tracer is enabled it will add a breakpoint
@@ -455,7 +449,7 @@ syscall_return_via_sysret:
         * We are on the trampoline stack.  All regs except RDI are live.
         * We can do future final exit work right here.
         */
-       erase_kstack
+       call erase_kstack

        SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi

@@ -765,7 +759,7 @@ GLOBAL(swapgs_restore_regs_and_return_to_usermode)
         * We are on the trampoline stack.  All regs except RDI are live.
         * We can do future final exit work right here.
         */
-       erase_kstack
+       call erase_kstack

        SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi


It crashes the kernel with STACKLEAK disabled:

[    1.709081] VFS: Mounted root (ext4 filesystem) readonly on device 8:0.
[    1.710202] devtmpfs: mounted
[    1.711741] Freeing unused kernel memory: 1276K
[    1.712391] Kernel memory protection disabled.
[    1.760050] PANIC: double fault, error_code: 0x0
[    1.760627] CPU: 0 PID: 1 Comm: init Not tainted 4.16.0-rc1+ #8
[    1.761005] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[    1.761005] RIP: 0010:update_curr+0x7/0x160
[    1.761005] RSP: 0000:fffffe0000002000 EFLAGS: 00010002
[    1.761005] RAX: ffff88007c890100 RBX: ffff88007fc20900 RCX: 000000003da0c28c
[    1.761005] RDX: ffff88007fc20900 RSI: ffff88007c890100 RDI: ffff88007fc20900
[    1.761005] RBP: ffff88007c890100 R08: ffffffffffffffda R09: 0000000000000001
[    1.761005] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007fc20900
[    1.761005] R13: ffff88007b529d00 R14: ffff88007c890100 R15: 0000000000000000
[    1.761005] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[    1.761005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.761005] CR2: fffffe0000001ff8 CR3: 000000007b090000 CR4: 00000000000006f0
[    1.761005] Call Trace:
[    1.761005]  <ENTRY_TRAMPOLINE>
[    1.761005]  put_prev_entity+0x31/0x580
[    1.761005]  pick_next_task_fair+0x454/0x470
[    1.761005]  __schedule+0x14b/0x6f0
[    1.761005]  schedule+0x23/0x80
[    1.761005]  exit_to_usermode_loop+0x5c/0x90
[    1.761005]  do_syscall_64+0xfa/0x110
[    1.761005]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[    1.761005] RIP: 81a00935:swapgs_restore_regs_and_return_to_usermode+0x38/0x8c
[    1.761005] RSP: 81a00935:ffffffff81a00935 EFLAGS: ffffffff81a00935 ORIG_RAX: ffffffffffffffda
[    1.761005] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000000000
[    1.761005] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81a00935
[    1.761005] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[    1.761005] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    1.761005] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    1.761005]  ? swapgs_restore_regs_and_return_to_usermode+0x38/0x8c
[    1.761005]  ? swapgs_restore_regs_and_return_to_usermode+0x38/0x8c
[    1.761005]  ? swapgs_restore_regs_and_return_to_usermode+0x38/0x8c
[    1.761005]  ? swapgs_restore_regs_and_return_to_usermode+0x38/0x8c
[    1.761005]  ? swapgs_restore_regs_and_return_to_usermode+0x38/0x8c
[    1.761005]  ? swapgs_restore_regs_and_return_to_usermode+0x38/0x8c
[    1.761005]  </ENTRY_TRAMPOLINE>
[    1.761005] Code: 02 00 00 8b 78 18 e8 29 ff ff ff 48 ba db 34 b6 d7 82 de 1b 43 48 f7 e2 48 89 d0 48 c1 e8 12 c3 0f
1f 40 00 41 56 41 55 41 54 55 <53> 48 8b 5f 40 48 8b 87 30 01 00 00 48 85 db 48 8b 80 68 09 00
[    1.761005] Kernel panic - not syncing: Machine halted.


Actually gcc creates a strange erase_kstack():

ffffffff81a00010 <erase_kstack>:
ffffffff81a00010:	5f                   	pop    %rdi
ffffffff81a00011:	eb 31                	jmp    ffffffff81a00044 <entry_SYSCALL_64_after_hwframe>
ffffffff81a00013:	0f 1f 00             	nopl   (%rax)
ffffffff81a00016:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
ffffffff81a0001d:	00 00 00


But with the initial macro the binary doesn't have erase_kstack() at all
when STACKLEAK is disabled.

>> +#endif
>> +.endm
>> +
>> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
>> +ENTRY(erase_kstack)
>> +	pushl	%edi
>> +	pushl	%ecx
>> +	pushl	%eax
>> +	pushl	%ebp
>> +
>> +	movl	PER_CPU_VAR(current_task), %ebp
>> +	mov	TASK_lowest_stack(%ebp), %edi
> 
> So instead of TASK_lowest_stack and adding all the ifdeffery around, why
> not do this computation:
> 
> 	(unsigned long)task_stack_page(p) + 2 * sizeof(unsigned long);
> 
> in asm?
> 
> You only need task->stack as an offset variable. And other code might
> need it too, anyway so you could just as well use it instead of adding
> another var to thread_struct.
> 
> /me reads further.
> 
> Ah. So this lowest_stack thing changes at the end:
> 
> 	"Set the lowest_stack value to the top_of_stack - 256"
> 
> Why?
> 
> So that magic needs more explanation for slow people like me.

Thanks for your question.

The commit message says: "Full STACKLEAK feature also contains the gcc plugin
which comes in a separate commit".

And the next commit in this series introduces that plugin. Let me quote its
commit message as well:

 This commit introduces the STACKLEAK gcc plugin. It is needed for:
  - tracking the lowest border of the kernel stack, which is important
     for the code erasing the used part of the kernel stack at the end
     of syscalls (comes in a separate commit);
  - checking that alloca calls don't cause stack overflow.

 So this plugin instruments the kernel code inserting:
  - the check_alloca() call before alloca and the track_stack() call
     after it;
  - the track_stack() call for the functions with a stack frame size
     greater than or equal to CONFIG_STACKLEAK_TRACK_MIN_SIZE.

Tracking the lowest border of the kernel stack with the lowest_stack
variable makes STACKLEAK so efficient (please see the performance
statistics in the cover letter).

>> +	mov	$STACKLEAK_POISON, %eax
>> +	std
>> +
>> +	/*
>> +	 * Let's search for the poison value in the stack.
>> +	 * Start from the lowest_stack and go to the bottom (see std above).
> 
> Let's write insns capitalized: "see STD above".

Ok.

> ...
> 
>> +	/*
>> +	 * Check that some further qwords contain poison. If so, the part
>> +	 * of the stack below the address in %rdi is likely to be poisoned.
>> +	 * Otherwise we need to search deeper.
>> +	 */
>> +	mov	$STACKLEAK_POISON_CHECK_DEPTH / 8, %ecx
>> +	repe	scasq
>> +	jecxz	2f	/* Poison the upper part of the stack */
>> +	jne	1b	/* Search deeper */
>> +
>> +2:
> 
> You could name that label
> 
> .Lpoison
> 
> and make the code more readable:
> 
> 	jb	.Lpoison
> 
> and so on.

Ok, thanks.

>> +	/*
>> +	 * Two qwords at the bottom of the thread stack are reserved and
>> +	 * should not be poisoned (see CONFIG_SCHED_STACK_END_CHECK).
>> +	 */
>> +	or	$2 * 8, %rdi
>> +
>> +	/*
>> +	 * Check whether we are on the thread stack to prepare the counter
>> +	 * for stack poisoning.
>> +	 */
>> +	mov	PER_CPU_VAR(cpu_current_top_of_stack), %rcx
>> +	sub	%rsp, %rcx
>> +	cmp	$THREAD_SIZE_asm, %rcx
>> +	jb	3f
>> +
>> +	/*
>> +	 * We are not on the thread stack, so we can write poison between
>> +	 * the address in %rdi and the stack top.
>> +	 */
>> +	mov	PER_CPU_VAR(cpu_current_top_of_stack), %rcx
>> +	sub	%rdi, %rcx
>> +	jmp	4f
>> +
>> +3:
>> +	/*
>> +	 * We are on the thread stack, so we can write poison between the
>> +	 * address in %rdi and the address in %rsp (not included, used memory).
>> +	 */
>> +	mov	%rsp, %rcx
>> +	sub	%rdi, %rcx
>> +
>> +4:
>> +	/* Check that the counter value is sane */
>> +	cmp	$THREAD_SIZE_asm, %rcx
>> +	jb	5f
>> +	ud2
>> +
>> +5:
>> +	/*
>> +	 * So let's write the poison value to the kernel stack. Start from the
>> +	 * address in %rdi and move up (see cld).
>> +	 */
>> +	cld
>> +	shr	$3, %ecx
>> +	rep	stosq
> 
> Hohumm, that's >2K loops per syscall here and stack is
> 
> 0xffffc90000008010:     0xffffffffffff4111      0xffffffffffff4111
> 0xffffc90000008020:     0xffffffffffff4111      0xffffffffffff4111
> 0xffffc90000008030:     0xffffffffffff4111      0xffffffffffff4111
> 0xffffc90000008040:     0xffffffffffff4111      0xffffffffffff4111
> 0xffffc90000008050:     0xffffffffffff4111      0xffffffffffff4111
> 0xffffc90000008060:     0xffffffffffff4111      0xffffffffffff4111

Yes, so the stack is erased. That is the actual goal.

> ...
>> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
>> +/* Poison value points to the unused hole in the virtual memory map */
> 
> There are a bunch of those there now. Please document it in
> Documentation/x86/x86_64/mm.txt
> 
>> +# define STACKLEAK_POISON -0xBEEF

The mm.txt already has this line:
 ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole

Excuse me, I didn't get what to document.

Thanks again.

Best regards,
Alexander

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/2] arm64: Clear the stack
  2018-02-21 15:38             ` Mark Rutland
@ 2018-02-21 23:53               ` Laura Abbott
  -1 siblings, 0 replies; 61+ messages in thread
From: Laura Abbott @ 2018-02-21 23:53 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alexander Popov, Kees Cook, Ard Biesheuvel, kernel-hardening,
	linux-arm-kernel, linux-kernel

On 02/21/2018 07:38 AM, Mark Rutland wrote:
> Hi Laura,
> 
> On Tue, Feb 20, 2018 at 05:13:03PM -0800, Laura Abbott wrote:
>> Implementation of stackleak based heavily on the x86 version
> 
> Neat!
> 
>> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
>> index ec2ee720e33e..b909b436293a 100644
>> --- a/arch/arm64/kernel/entry.S
>> +++ b/arch/arm64/kernel/entry.S
>> @@ -401,6 +401,11 @@ tsk	.req	x28		// current thread_info
>>   
>>   	.text
>>   
>> +	.macro	erase_kstack
>> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
>> +	bl	__erase_kstack
>> +#endif
>> +	.endm
>>   /*
>>    * Exception vectors.
>>    */
>> @@ -901,6 +906,7 @@ work_pending:
>>    */
>>   ret_to_user:
>>   	disable_daif
>> +	erase_kstack
> 
> I *think* this should happen in finish_ret_to_user a few lines down, since we
> can call C code if we branch to work_pending, dirtying the stack.
> 

I think you're right but this didn't immediately work when I tried it.
I'll have to dig into this some more.

>>   	ldr	x1, [tsk, #TSK_TI_FLAGS]
>>   	and	x2, x1, #_TIF_WORK_MASK
>>   	cbnz	x2, work_pending
>> @@ -1337,3 +1343,105 @@ alternative_else_nop_endif
>>   ENDPROC(__sdei_asm_handler)
>>   NOKPROBE(__sdei_asm_handler)
>>   #endif /* CONFIG_ARM_SDE_INTERFACE */
>> +
>> +/*
>> + * This is what the stack looks like
>> + *
>> + * +---+ <- task_stack_page(p) + THREAD_SIZE
>> + * |   |
>> + * +---+ <- task_stack_page(p) + THREAD_START_SP
>> + * |   |
>> + * |   |
>> + * +---+ <- task_pt_regs(p)
> 
> THREAD_START_SP got killed off in commit 34be98f4944f9907 as part of the
> VMAP_STACK rework, so this can be:
> 
>        +---+ <- task_stack_page(p) + THREAD_SIZE
>        |   |
>        |   |
>        +---+ <- task_pt_regs(p)
>         ...
> 

Good point.

>> + * |   |
>> + * |   |
>> + * |   | <- current_sp
>> + * ~~~~~
>> + *
>> + * ~~~~~
>> + * |   | <- lowest_stack
>> + * |   |
>> + * |   |
>> + * +---+ <- task_stack_page(p)
>> + *
>> + * This function is desgned to poison the memory between the lowest_stack
>> + * and the current stack pointer. After clearing the stack, the lowest
>> + * stack is reset.
>> + */
>> +
>> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
>> +ENTRY(__erase_kstack)
>> +	mov	x10, x0	// save x0 for the fast path
> 
> AFAICT, we only call this from ret_to_user, where x0 doesn't need to be
> preserved.
> 
> Is that for ret_fast_syscall? In some cases, ret_fast_syscall can bypass
> ret_to_user and calls kernel_exit directly, so we might need a call there.
> 

This was a hold over when I was experimenting with calling  erase_kstack
more places, one of which came through ret_fast_syscall. I realized
later that the erase was unnecessary but accidentally kept the saving
in. I'll see about removing it assuming we don't decide later to put
a call on the fast path.

>> +
>> +	get_thread_info	x0
>> +	ldr	x1, [x0, #TSK_TI_LOWEST_STACK]
>> +
>> +	/* get the number of bytes to check for lowest stack */
>> +	mov	x3, x1
>> +	and	x3, x3, #THREAD_SIZE - 1
>> +	lsr	x3, x3, #3
>> +
>> +	/* generate addresses from the bottom of the stack */
>> +	mov	x4, sp
>> +	movn	x2, #THREAD_SIZE - 1
>> +	and	x1, x4, x2
> 
> Can we replace the MOVN;AND with a single instruction to clear the low bits?
> e.g.
> 
> 	mov	x4, sp
> 	bic	x1, x4, #THREAD_SIZE - 1
> 
> ... IIUC BIC is an alias for the bitfield instructions, though I can't recall
> exactly which one(s).
> 

Good suggestion.

>> +
>> +	mov	x2, #STACKLEAK_POISON
>> +
>> +	mov	x5, #0
>> +1:
>> +	/*
>> +	 * As borrowed from the x86 logic, start from the lowest_stack
>> +	 * and go to the bottom to find the poison value.
>> +	 * The check of 16 is to hopefully avoid false positives.
>> +	 */
>> +	cbz	x3, 4f
>> +	ldr	x4, [x1, x3, lsl #3]
>> +	cmp	x4, x2
>> +	csinc	x5, xzr, x5, ne
>> +	tbnz	x5, #STACKLEAK_POISON_CHECK_DEPTH/4, 4f	// found 16 poisons?
>> +	sub	x3, x3, #1
>> +	b	1b
>> +
>> +4:
>> +	/* total number of bytes to poison */
>> +	add     x5, x1, x3, lsl #3
>> +	mov	x4, sp
>> +	sub     x8, x4, x5
>> +
>> +	cmp     x8, #THREAD_SIZE // sanity check the range
>> +	b.lo    5f
>> +	ASM_BUG()
>> +
>> +5:
>> +	/*
>> +	 * We may have hit a path where the stack did not get used,
>> +	 * no need to do anything here
>> +	 */
>> +	cbz	x8, 7f
>> +
>> +	sub	x8, x8, #1 // don't poison the current stack pointer
>> +
>> +	lsr     x8, x8, #3
>> +	add     x3, x3, x8
>> +
>> +	/*
>> +	 * The logic of this loop ensures the last stack word isn't
>> +	 * ovewritten.
>> +	 */
> 
> Is that to ensure that we don't clobber the word at the current sp value?
> 

Correct.

>> +6:
>> +	cbz     x8, 7f
>> +	str     x2, [x1, x3, lsl #3]
>> +	sub     x3, x3, #1
>> +	sub     x8, x8, #1
>> +	b	6b
>> +
>> +	/* Reset the lowest stack to the top of the stack */
>> +7:
>> +	mov	x1, sp
>> +	str	x1, [x0, #TSK_TI_LOWEST_STACK]
>> +
>> +	mov	x0, x10
>> +	ret
>> +ENDPROC(__erase_kstack)
>> +#endif
> 
> [...]
> 
>> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
>> index 7b3ba40f0745..35ebbc1b17ff 100644
>> --- a/drivers/firmware/efi/libstub/Makefile
>> +++ b/drivers/firmware/efi/libstub/Makefile
>> @@ -20,7 +20,8 @@ cflags-$(CONFIG_EFI_ARMSTUB)	+= -I$(srctree)/scripts/dtc/libfdt
>>   KBUILD_CFLAGS			:= $(cflags-y) -DDISABLE_BRANCH_PROFILING \
>>   				   -D__NO_FORTIFY \
>>   				   $(call cc-option,-ffreestanding) \
>> -				   $(call cc-option,-fno-stack-protector)
>> +				   $(call cc-option,-fno-stack-protector) \
>> +				   $(DISABLE_STACKLEAK_PLUGIN)
> 
> I believe the KVM hyp code will also need to opt-out of this.
> 

I'll double check that.

> Thanks,
> Mark.
> 

Thanks,
Laura

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 2/2] arm64: Clear the stack
@ 2018-02-21 23:53               ` Laura Abbott
  0 siblings, 0 replies; 61+ messages in thread
From: Laura Abbott @ 2018-02-21 23:53 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/21/2018 07:38 AM, Mark Rutland wrote:
> Hi Laura,
> 
> On Tue, Feb 20, 2018 at 05:13:03PM -0800, Laura Abbott wrote:
>> Implementation of stackleak based heavily on the x86 version
> 
> Neat!
> 
>> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
>> index ec2ee720e33e..b909b436293a 100644
>> --- a/arch/arm64/kernel/entry.S
>> +++ b/arch/arm64/kernel/entry.S
>> @@ -401,6 +401,11 @@ tsk	.req	x28		// current thread_info
>>   
>>   	.text
>>   
>> +	.macro	erase_kstack
>> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
>> +	bl	__erase_kstack
>> +#endif
>> +	.endm
>>   /*
>>    * Exception vectors.
>>    */
>> @@ -901,6 +906,7 @@ work_pending:
>>    */
>>   ret_to_user:
>>   	disable_daif
>> +	erase_kstack
> 
> I *think* this should happen in finish_ret_to_user a few lines down, since we
> can call C code if we branch to work_pending, dirtying the stack.
> 

I think you're right but this didn't immediately work when I tried it.
I'll have to dig into this some more.

>>   	ldr	x1, [tsk, #TSK_TI_FLAGS]
>>   	and	x2, x1, #_TIF_WORK_MASK
>>   	cbnz	x2, work_pending
>> @@ -1337,3 +1343,105 @@ alternative_else_nop_endif
>>   ENDPROC(__sdei_asm_handler)
>>   NOKPROBE(__sdei_asm_handler)
>>   #endif /* CONFIG_ARM_SDE_INTERFACE */
>> +
>> +/*
>> + * This is what the stack looks like
>> + *
>> + * +---+ <- task_stack_page(p) + THREAD_SIZE
>> + * |   |
>> + * +---+ <- task_stack_page(p) + THREAD_START_SP
>> + * |   |
>> + * |   |
>> + * +---+ <- task_pt_regs(p)
> 
> THREAD_START_SP got killed off in commit 34be98f4944f9907 as part of the
> VMAP_STACK rework, so this can be:
> 
>        +---+ <- task_stack_page(p) + THREAD_SIZE
>        |   |
>        |   |
>        +---+ <- task_pt_regs(p)
>         ...
> 

Good point.

>> + * |   |
>> + * |   |
>> + * |   | <- current_sp
>> + * ~~~~~
>> + *
>> + * ~~~~~
>> + * |   | <- lowest_stack
>> + * |   |
>> + * |   |
>> + * +---+ <- task_stack_page(p)
>> + *
>> + * This function is desgned to poison the memory between the lowest_stack
>> + * and the current stack pointer. After clearing the stack, the lowest
>> + * stack is reset.
>> + */
>> +
>> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
>> +ENTRY(__erase_kstack)
>> +	mov	x10, x0	// save x0 for the fast path
> 
> AFAICT, we only call this from ret_to_user, where x0 doesn't need to be
> preserved.
> 
> Is that for ret_fast_syscall? In some cases, ret_fast_syscall can bypass
> ret_to_user and calls kernel_exit directly, so we might need a call there.
> 

This was a hold over when I was experimenting with calling  erase_kstack
more places, one of which came through ret_fast_syscall. I realized
later that the erase was unnecessary but accidentally kept the saving
in. I'll see about removing it assuming we don't decide later to put
a call on the fast path.

>> +
>> +	get_thread_info	x0
>> +	ldr	x1, [x0, #TSK_TI_LOWEST_STACK]
>> +
>> +	/* get the number of bytes to check for lowest stack */
>> +	mov	x3, x1
>> +	and	x3, x3, #THREAD_SIZE - 1
>> +	lsr	x3, x3, #3
>> +
>> +	/* generate addresses from the bottom of the stack */
>> +	mov	x4, sp
>> +	movn	x2, #THREAD_SIZE - 1
>> +	and	x1, x4, x2
> 
> Can we replace the MOVN;AND with a single instruction to clear the low bits?
> e.g.
> 
> 	mov	x4, sp
> 	bic	x1, x4, #THREAD_SIZE - 1
> 
> ... IIUC BIC is an alias for the bitfield instructions, though I can't recall
> exactly which one(s).
> 

Good suggestion.

>> +
>> +	mov	x2, #STACKLEAK_POISON
>> +
>> +	mov	x5, #0
>> +1:
>> +	/*
>> +	 * As borrowed from the x86 logic, start from the lowest_stack
>> +	 * and go to the bottom to find the poison value.
>> +	 * The check of 16 is to hopefully avoid false positives.
>> +	 */
>> +	cbz	x3, 4f
>> +	ldr	x4, [x1, x3, lsl #3]
>> +	cmp	x4, x2
>> +	csinc	x5, xzr, x5, ne
>> +	tbnz	x5, #STACKLEAK_POISON_CHECK_DEPTH/4, 4f	// found 16 poisons?
>> +	sub	x3, x3, #1
>> +	b	1b
>> +
>> +4:
>> +	/* total number of bytes to poison */
>> +	add     x5, x1, x3, lsl #3
>> +	mov	x4, sp
>> +	sub     x8, x4, x5
>> +
>> +	cmp     x8, #THREAD_SIZE // sanity check the range
>> +	b.lo    5f
>> +	ASM_BUG()
>> +
>> +5:
>> +	/*
>> +	 * We may have hit a path where the stack did not get used,
>> +	 * no need to do anything here
>> +	 */
>> +	cbz	x8, 7f
>> +
>> +	sub	x8, x8, #1 // don't poison the current stack pointer
>> +
>> +	lsr     x8, x8, #3
>> +	add     x3, x3, x8
>> +
>> +	/*
>> +	 * The logic of this loop ensures the last stack word isn't
>> +	 * ovewritten.
>> +	 */
> 
> Is that to ensure that we don't clobber the word at the current sp value?
> 

Correct.

>> +6:
>> +	cbz     x8, 7f
>> +	str     x2, [x1, x3, lsl #3]
>> +	sub     x3, x3, #1
>> +	sub     x8, x8, #1
>> +	b	6b
>> +
>> +	/* Reset the lowest stack to the top of the stack */
>> +7:
>> +	mov	x1, sp
>> +	str	x1, [x0, #TSK_TI_LOWEST_STACK]
>> +
>> +	mov	x0, x10
>> +	ret
>> +ENDPROC(__erase_kstack)
>> +#endif
> 
> [...]
> 
>> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
>> index 7b3ba40f0745..35ebbc1b17ff 100644
>> --- a/drivers/firmware/efi/libstub/Makefile
>> +++ b/drivers/firmware/efi/libstub/Makefile
>> @@ -20,7 +20,8 @@ cflags-$(CONFIG_EFI_ARMSTUB)	+= -I$(srctree)/scripts/dtc/libfdt
>>   KBUILD_CFLAGS			:= $(cflags-y) -DDISABLE_BRANCH_PROFILING \
>>   				   -D__NO_FORTIFY \
>>   				   $(call cc-option,-ffreestanding) \
>> -				   $(call cc-option,-fno-stack-protector)
>> +				   $(call cc-option,-fno-stack-protector) \
>> +				   $(DISABLE_STACKLEAK_PLUGIN)
> 
> I believe the KVM hyp code will also need to opt-out of this.
> 

I'll double check that.

> Thanks,
> Mark.
> 

Thanks,
Laura

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/2] arm64: Clear the stack
  2018-02-21 23:53               ` Laura Abbott
@ 2018-02-22  1:35                 ` Laura Abbott
  -1 siblings, 0 replies; 61+ messages in thread
From: Laura Abbott @ 2018-02-22  1:35 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alexander Popov, Kees Cook, Ard Biesheuvel, kernel-hardening,
	linux-arm-kernel, linux-kernel

On 02/21/2018 03:53 PM, Laura Abbott wrote:
>> I *think* this should happen in finish_ret_to_user a few lines down, since we
>> can call C code if we branch to work_pending, dirtying the stack.
>>
> 
> I think you're right but this didn't immediately work when I tried it.
> I'll have to dig into this some more.

Okay I figured this out. Not corrupting registers works wonders.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 2/2] arm64: Clear the stack
@ 2018-02-22  1:35                 ` Laura Abbott
  0 siblings, 0 replies; 61+ messages in thread
From: Laura Abbott @ 2018-02-22  1:35 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/21/2018 03:53 PM, Laura Abbott wrote:
>> I *think* this should happen in finish_ret_to_user a few lines down, since we
>> can call C code if we branch to work_pending, dirtying the stack.
>>
> 
> I think you're right but this didn't immediately work when I tried it.
> I'll have to dig into this some more.

Okay I figured this out. Not corrupting registers works wonders.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it
  2018-02-16 18:10 [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it Alexander Popov
                   ` (6 preceding siblings ...)
  2018-02-20 10:29 ` [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it Alexander Popov
@ 2018-02-22  1:43 ` Laura Abbott
  2018-02-22 23:14   ` [PATCH 0/2] Update stackleak for gcc-8 Laura Abbott
  7 siblings, 1 reply; 61+ messages in thread
From: Laura Abbott @ 2018-02-22  1:43 UTC (permalink / raw)
  To: Alexander Popov, kernel-hardening, Kees Cook, PaX Team,
	Brad Spengler, Ingo Molnar, Andy Lutomirski, Tycho Andersen,
	Mark Rutland, Ard Biesheuvel, Borislav Petkov, Thomas Gleixner,
	H . Peter Anvin, Peter Zijlstra, Dmitry V . Levin, x86

On 02/16/2018 10:10 AM, Alexander Popov wrote:
> This is the 8th version of the patch series introducing STACKLEAK to the
> mainline kernel. I've made some minor improvements while waiting for the
> next review by x86 maintainers.
> 
> STACKLEAK is a security feature developed by Grsecurity/PaX (kudos to them),
> which:
>   - reduces the information that can be revealed through kernel stack leak bugs;
>   - blocks some uninitialized stack variable attacks (e.g. CVE-2010-2963);
>   - introduces some runtime checks for kernel stack overflow detection.
> 
> STACKLEAK instrumentation statistics
> ====================================
> 
> These numbers were obtained for the 4th version of the patch series.
> 
> Size of vmlinux (x86_64_defconfig):
>   file size:
>    - STACKLEAK disabled: 35014784 bytes
>    - STACKLEAK enabled: 35044952 bytes (+0.086%)
>   .text section size (calculated by size utility):
>    - STACKLEAK disabled: 10752983
>    - STACKLEAK enabled: 11062221 (+2.876%)
> 
> The readelf utility shows 45602 functions in vmlinux. The STACKLEAK gcc plugin
> inserted 36 check_alloca() calls and 1265 track_stack() calls (42274 calls are
> inserted during GIMPLE pass and 41009 calls are deleted during RTL pass).
> So 2.853% of functions are instrumented.
> 
> STACKLEAK performance impact
> ============================
> 
> The STACKLEAK description in Kconfig includes:
> "The tradeoff is the performance impact: on a single CPU system kernel
> compilation sees a 1% slowdown, other systems and workloads may vary and you are
> advised to test this feature on your expected workload before deploying it".
> 
> Here are the results of a brief performance test on x86_64 (for the 2nd version
> of the patch). The numbers are very different because the STACKLEAK performance
> penalty depends on the workloads.
> 
> Hardware: Intel Core i7-4770, 16 GB RAM
> 
> Test #1: building the Linux kernel with Ubuntu config (time make -j9)
> Result on 4.11-rc8:
>    real	32m14.893s
>    user	237m30.886s
>    sys	11m12.273s
> Result on 4.11-rc8+stackleak:
>    real	32m26.881s (+0.62%)
>    user	238m38.926s (+0.48%)
>    sys	11m36.426s (+3.59%)
> 
> Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P
> Average result on 4.11-rc8: 8.71s
> Average result on 4.11-rc8+stackleak: 9.08s (+4.29%)
> 
> Changelog
> =========
> 
> Changes in v8
> 
> 1. Renamed the STACKLEAK tests to STACKLEAK_ALLOCA and STACKLEAK_DEEP_RECURSION.
>     Fixed some obsolete comments there.
> 
> 2. Added the comments describing stack erasing on x86_32, because it differs
>     a lot from one on x86_64 since v7.
> 
> Changes in v7
> 
> 1. Improved erase_kstack() on x86_64. Now it detects which stack we are
>     currently using. If we are on the thread stack, erase_kstack() writes the
>     poison value up to the stack pointer. If we are not on the thread stack (we
>     are on the trampoline stack introduced by Andy Lutomirski), erase_kstack()
>     writes the poison value up to the cpu_current_top_of_stack.
> 
>     N.B. Right now there are two situations when erase_kstack() is called
>     on the thread stack:
>      - from sysret32_from_system_call in entry_64_compat.S;
>      - from syscall_trace_enter() (see the 3rd patch of this series).
> 
> 2. Introduced STACKLEAK_POISON_CHECK_DEPTH macro and BUILD_BUG_ON() checking
>     its consistency with CONFIG_STACKLEAK_TRACK_MIN_SIZE.
> 
> 3. Added "disable" option for the STACKLEAK gcc plugin (asked by Laura Abbott).
> 
> 4. Improved CONFIG_STACKLEAK_METRICS. Now /proc/<pid>/stack_depth shows
>     the maximum kernel stack consumption for the current and previous syscalls.
>     Although this information is not precise, it can be useful for estimating
>     the STACKLEAK performance impact for different workloads.
> 
> 5. Removed redundant erase_kstack() calls from syscall_trace_enter().
>     There is no need to erase the stack in syscall_trace_enter() when it
>     returns -1 (thanks to Dmitry Levin for noticing that).
> 
> 6. Introduced MIN_STACK_LEFT to get rid of hardcoded numbers in check_alloca().
> 
> Changes in v6
> 
> 1. Examined syscall entry/exit paths.
>      - Added missing erase_kstack() call at ret_from_fork() for x86_32.
>      - Added missing erase_kstack() call at syscall_trace_enter().
>      - Solved questions previously marked with TODO.
> 
> 2. Rebased onto v4.15-rc2, which includes Andy Lutomirski's entry changes.
>     Andy removed sp0 from thread_struct for x86_64, which was the only issue
>     during rebasing.
> 
> 3. Removed the recursive BUG() in track_stack() that was caused by the code
>     instrumentation. Instead, CONFIG_GCC_PLUGIN_STACKLEAK now implies
>     CONFIG_VMAP_STACK and CONFIG_SCHED_STACK_END_CHECK, which seems to be
>     an optimal solution.
> 
> 4. Put stack erasing in syscall_trace_enter() into a separate patch and
>     fixed my mistake with secure_computing() (found by Tycho Andersen).
> 
> 5. After some experiments, kept the asm implementation of erase_kstack(),
>     because it gives a full control over the stack for clearing it neatly
>     and doesn't offend KASAN.
> 
> 6. Improved the comments describing STACKLEAK.
> 
> Changes in v5 (mostly according to the feedback from Ingo Molnar)
> 
> 1. Introduced the CONFIG_STACKLEAK_METRICS providing STACKLEAK information
>     about tasks via the /proc file system. That information can be useful for
>     estimating the STACKLEAK performance impact for different workloads.
>     In particular, /proc/<pid>/lowest_stack shows the current lowest_stack
>     value and its final value from the previous syscall.
> 
> 2. Introduced a single check_alloca() implementation working for both
>     x86_64 and x86_32.
> 
> 3. Fixed coding style issues and did some refactoring in the STACKLEAK
>     gcc plugin.
> 
> 4. Put the gcc plugin and the kernel stack erasing into separate (working)
>     patches.
> 
> Changes in v4
> 
> 1. Introduced the CONFIG_STACKLEAK_TRACK_MIN_SIZE parameter instead of
>     hard-coded track-lowest-sp.
> 
> 2. Carefully looked into the assertions in track_stack() and check_alloca().
>      - Fixed the incorrect BUG() condition in track_stack(), thanks to Tycho
>         Andersen. Also disabled that check if CONFIG_VMAP_STACK is enabled.
>      - Fixed the surplus and erroneous code for calculating stack_left in
>         check_alloca() on x86_64. That code repeats the work which is already
>         done in get_stack_info() and it misses the fact that different
>         exception stacks on x86_64 have different size.
> 
> 3. Introduced two lkdtm tests for the STACKLEAK feature (developed together
>     with Tycho Andersen).
> 
> 4. Added info about STACKLEAK to Documentation/security/self-protection.rst.
> 
> 5. Improved the comments.
> 
> 6. Decided not to change "unsigned long sp = (unsigned long)&sp" to
>     current_stack_pointer. The original variant is more platform independent
>     since current_stack_pointer has different type on x86 and arm.
> 
> Changes in v3
> 
> 1. Added the detailed comments describing the STACKLEAK gcc plugin.
>     Read the plugin from the bottom up, like you do for Linux kernel drivers.
>     Additional information:
>      - gcc internals documentation available from gcc sources;
>      - wonderful slides by Diego Novillo
>         https://www.airs.com/dnovillo/200711-GCC-Internals/
>      - nice introduction to gcc plugins at LWN
>         https://lwn.net/Articles/457543/
> 
> 2. Improved the commit message and Kconfig description according to the
>     feedback from Kees Cook. Also added the original notice describing
>     the performance impact of the STACKLEAK feature.
> 
> 3. Removed the arch-specific ix86_cmodel check in stackleak_track_stack_gate().
>     It caused skipping the kernel code instrumentation for i386.
> 
> 4. Fixed a minor mistake in stackleak_tree_instrument_execute().
>     First versions of the plugin used ENTRY_BLOCK_PTR_FOR_FN(cfun)->next_bb
>     to get the basic block with the function prologue. That was not correct
>     since the control flow graph edge from ENTRY_BLOCK_PTR doesn't always
>     go to that basic block.
> 
>     So later it was changed to single_succ(ENTRY_BLOCK_PTR_FOR_FN(cfun)),
>     but not completely. next_bb was still used for entry_bb assignment,
>     which could cause the wrong value of the prologue_instrumented variable.
> 
>     I've reported this issue to Grsecurity and proposed the fix for it, but
>     unfortunately didn't get any reply.
> 
> 5. Introduced the STACKLEAK_POISON macro and renamed the config option
>     according to the feedback from Kees Cook.
> 
> Ideas for further work
> ======================
> 
>   - Think of erasing stack on the way out of exception handlers (idea by
>     Andy Lutomirski).
>   - Think of erasing the kernel stack after invoking EFI runtime services
>     (idea by Mark Rutland).
>   - Try to port STACKLEAK to arm64 (Laura Abbott is working on it).
> 
> 
> Alexander Popov (6):
>    x86/entry: Add STACKLEAK erasing the kernel stack at the end of
>      syscalls
>    gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack
>    x86/entry: Erase kernel stack in syscall_trace_enter()
>    lkdtm: Add a test for STACKLEAK
>    fs/proc: Show STACKLEAK metrics in the /proc file system
>    doc: self-protection: Add information about STACKLEAK feature
> 
>   Documentation/security/self-protection.rst |  23 +-
>   arch/Kconfig                               |  53 ++++
>   arch/x86/Kconfig                           |   1 +
>   arch/x86/entry/common.c                    |   7 +
>   arch/x86/entry/entry_32.S                  |  93 ++++++
>   arch/x86/entry/entry_64.S                  | 113 +++++++
>   arch/x86/entry/entry_64_compat.S           |  11 +
>   arch/x86/include/asm/processor.h           |   7 +
>   arch/x86/kernel/asm-offsets.c              |  11 +
>   arch/x86/kernel/dumpstack.c                |  20 ++
>   arch/x86/kernel/process_32.c               |   8 +
>   arch/x86/kernel/process_64.c               |   8 +
>   drivers/misc/Makefile                      |   3 +
>   drivers/misc/lkdtm.h                       |   4 +
>   drivers/misc/lkdtm_core.c                  |   2 +
>   drivers/misc/lkdtm_stackleak.c             | 136 +++++++++
>   fs/exec.c                                  |  33 ++
>   fs/proc/base.c                             |  18 ++
>   include/linux/compiler.h                   |   6 +
>   scripts/Makefile.gcc-plugins               |   3 +
>   scripts/gcc-plugins/stackleak_plugin.c     | 471 +++++++++++++++++++++++++++++
>   21 files changed, 1022 insertions(+), 9 deletions(-)
>   create mode 100644 drivers/misc/lkdtm_stackleak.c
>   create mode 100644 scripts/gcc-plugins/stackleak_plugin.c
> 

This doesn't work with gcc-8. One of the errors is a minor API change
which can be fixed up but a bigger problem is they changed get_frame_size

/* Return size needed for stack frame based on slots so far allocated.
    This size counts from zero.  It is not rounded to STACK_BOUNDARY;
    the caller may have to do that.  */
extern poly_int64 get_frame_size (void);

This is described at https://gcc.gnu.org/onlinedocs//gccint/poly_005fint.html
but I'm still trying to come up with a good solution. Feel free to beat
me to fixing it...

Thanks,
Laura

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] stackleak: Update for arm64
  2018-02-21  1:13           ` Laura Abbott
@ 2018-02-22 16:58             ` Will Deacon
  -1 siblings, 0 replies; 61+ messages in thread
From: Will Deacon @ 2018-02-22 16:58 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Alexander Popov, Kees Cook, Mark Rutland, Ard Biesheuvel,
	linux-kernel, linux-arm-kernel, kernel-hardening,
	richard.sandiford

Hi Laura,

On Tue, Feb 20, 2018 at 05:13:02PM -0800, Laura Abbott wrote:
> 
> arm64 has another layer of indirection in the RTL.
> Account for this in the plugin.
> 
> Signed-off-by: Laura Abbott <labbott@redhat.com>
> ---
>  scripts/gcc-plugins/stackleak_plugin.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
> index 6fc991c98d8b..7dfaa027423f 100644
> --- a/scripts/gcc-plugins/stackleak_plugin.c
> +++ b/scripts/gcc-plugins/stackleak_plugin.c
> @@ -244,6 +244,11 @@ static unsigned int stackleak_final_execute(void)
>  		 * that insn.
>  		 */
>  		body = PATTERN(insn);
> +		/* arm64 is different */
> +		if (GET_CODE(body) == PARALLEL) {
> +			body = XEXP(body, 0);
> +			body = XEXP(body, 0);
> +		}

Like most kernel developers, I don't know the first thing about GCC internals
so I asked our GCC team and Richard (CC'd) reckons this should be:

	if (GET_CODE(body) == PARALLEL)
		body = XVECEXP(body, 0, 0);

instead of the hunk above. Can you give that a go instead, please?

Cheers,

Will

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 1/2] stackleak: Update for arm64
@ 2018-02-22 16:58             ` Will Deacon
  0 siblings, 0 replies; 61+ messages in thread
From: Will Deacon @ 2018-02-22 16:58 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Laura,

On Tue, Feb 20, 2018 at 05:13:02PM -0800, Laura Abbott wrote:
> 
> arm64 has another layer of indirection in the RTL.
> Account for this in the plugin.
> 
> Signed-off-by: Laura Abbott <labbott@redhat.com>
> ---
>  scripts/gcc-plugins/stackleak_plugin.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
> index 6fc991c98d8b..7dfaa027423f 100644
> --- a/scripts/gcc-plugins/stackleak_plugin.c
> +++ b/scripts/gcc-plugins/stackleak_plugin.c
> @@ -244,6 +244,11 @@ static unsigned int stackleak_final_execute(void)
>  		 * that insn.
>  		 */
>  		body = PATTERN(insn);
> +		/* arm64 is different */
> +		if (GET_CODE(body) == PARALLEL) {
> +			body = XEXP(body, 0);
> +			body = XEXP(body, 0);
> +		}

Like most kernel developers, I don't know the first thing about GCC internals
so I asked our GCC team and Richard (CC'd) reckons this should be:

	if (GET_CODE(body) == PARALLEL)
		body = XVECEXP(body, 0, 0);

instead of the hunk above. Can you give that a go instead, please?

Cheers,

Will

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v8 1/6] x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
  2018-02-21 21:49     ` Alexander Popov
@ 2018-02-22 19:14       ` Borislav Petkov
  2018-02-22 20:24         ` Alexander Popov
  0 siblings, 1 reply; 61+ messages in thread
From: Borislav Petkov @ 2018-02-22 19:14 UTC (permalink / raw)
  To: Alexander Popov
  Cc: kernel-hardening, Kees Cook, PaX Team, Brad Spengler,
	Ingo Molnar, Andy Lutomirski, Tycho Andersen, Laura Abbott,
	Mark Rutland, Ard Biesheuvel, Thomas Gleixner, H . Peter Anvin,
	Peter Zijlstra, Dmitry V . Levin, x86

On Thu, Feb 22, 2018 at 12:49:44AM +0300, Alexander Popov wrote:
> Actually gcc creates a strange erase_kstack():
> 
> ffffffff81a00010 <erase_kstack>:
> ffffffff81a00010:	5f                   	pop    %rdi
> ffffffff81a00011:	eb 31                	jmp    ffffffff81a00044 <entry_SYSCALL_64_after_hwframe>
> ffffffff81a00013:	0f 1f 00             	nopl   (%rax)
> ffffffff81a00016:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
> ffffffff81a0001d:	00 00 00

Something's very strange here. That code looks like:

ENTRY(entry_SYSCALL_64_stage2)
        UNWIND_HINT_EMPTY
        popq    %rdi
        jmp     entry_SYSCALL_64_after_hwframe
END(entry_SYSCALL_64_stage2)

ENTRY(entry_SYSCALL_64)
...

so frankly I wonder what gcc is even doing here?!? And why isn't
that erase_kstack symbol completely empty. It gives it that address
ffffffff81a00010 and size and this looks really wrong.

/me experiments a bit more, notices the ENDPROC()...

Ok, I think I know what it is:

END(erase_kstack) gives

.globl erase_kstack ; .p2align 4, 0x90 ; erase_kstack:
# 167 "arch/x86/entry/entry_64.S"
.size erase_kstack, .-erase_kstack
# 182 "arch/x86/entry/entry_64.S"

and that generates a STT_NOTYPE symbol:

 67318: ffffffff81a00000     0 NOTYPE  GLOBAL DEFAULT    1 erase_kstack

In that case, the symbol gets ignored by objdump as it dumps this at
that address:

ffffffff81a00000 <entry_SYSCALL_64_stage2>:
ffffffff81a00000:       5f                      pop    %rdi
ffffffff81a00001:       eb 31                   jmp    ffffffff81a00034 <entry_SYSCALL_64_after_hwframe>
ffffffff81a00003:       0f 1f 00                nopl   (%rax)
ffffffff81a00006:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
ffffffff81a0000d:       00 00 00

which is basically the strange erase_kstack() variant you pasted above.

while ENDPROC(erase_kstack) gives

.globl erase_kstack ; .p2align 4, 0x90 ; erase_kstack:
# 167 "arch/x86/entry/entry_64.S"
.type erase_kstack, @function ; .size erase_kstack, .-erase_kstack
# 182 "arch/x86/entry/entry_64.S"

and that generates a STT_FUNC:

 67318: ffffffff81a00000     0 FUNC    GLOBAL DEFAULT    1 erase_kstack

Now, there's this comment over ENDPROC:

/* If symbol 'name' is treated as a subroutine (gets called, and returns)
 * then please use ENDPROC to mark 'name' as STT_FUNC for the benefit of
 * static analysis tools such as stack depth analyzer.
 */
#ifndef ENDPROC

and I don't think we want to analyze the stack depth of erase_kstack
itself.

However, even if we did END(erase_kstack), the calls are still in the
code:

ffffffff81a00111:       e8 ea fe ff ff          callq  ffffffff81a00000 <entry_SYSCALL_64_stage2>

so macro it is. But please call the macro something else, not the same
name as the function.

> The commit message says: "Full STACKLEAK feature also contains the gcc plugin
> which comes in a separate commit".
> 
> And the next commit in this series introduces that plugin. Let me quote its
> commit message as well:

...

> Tracking the lowest border of the kernel stack with the lowest_stack
> variable makes STACKLEAK so efficient (please see the performance
> statistics in the cover letter).

Aha, there it is. I guess I better continue looking through the
patchset. :)

> The mm.txt already has this line:
>  ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
> 
> Excuse me, I didn't get what to document.

You say

/* Poison value points to the unused hole in the virtual memory map */

but we do change that memory map from time to time and there are
multiple unused holes.

So do something like this so that there are no clashes when someone
decides to use that unused hole:

---
diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index ea91cb61a602..5d8f4168247d 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -24,6 +24,7 @@ ffffffffa0000000 - [fixmap start]   (~1526 MB) module mapping space (variable)
 [fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
 ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
 ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
+Stackleak poison value in this last hole: 0xffffffffffff4111
 
 Virtual memory map with 5 level page tables:
 
@@ -50,6 +51,7 @@ ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
 [fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
 ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
 ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
+Stackleak poison value in this last hole: 0xffffffffffff4111
 
 Architecture defines a 64-bit virtual address. Implementations can support
 less. Currently supported are 48- and 57-bit virtual addresses. Bits 63
---

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] stackleak: Update for arm64
  2018-02-22 16:58             ` Will Deacon
@ 2018-02-22 19:22               ` Alexander Popov
  -1 siblings, 0 replies; 61+ messages in thread
From: Alexander Popov @ 2018-02-22 19:22 UTC (permalink / raw)
  To: Will Deacon, Laura Abbott
  Cc: Kees Cook, Mark Rutland, Ard Biesheuvel, linux-kernel,
	linux-arm-kernel, kernel-hardening, richard.sandiford

Hello Will, Richard and GCC folks!

On 22.02.2018 19:58, Will Deacon wrote:
> On Tue, Feb 20, 2018 at 05:13:02PM -0800, Laura Abbott wrote:
>>
>> arm64 has another layer of indirection in the RTL.
>> Account for this in the plugin.
>>
>> Signed-off-by: Laura Abbott <labbott@redhat.com>
>> ---
>>  scripts/gcc-plugins/stackleak_plugin.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
>> index 6fc991c98d8b..7dfaa027423f 100644
>> --- a/scripts/gcc-plugins/stackleak_plugin.c
>> +++ b/scripts/gcc-plugins/stackleak_plugin.c
>> @@ -244,6 +244,11 @@ static unsigned int stackleak_final_execute(void)
>>  		 * that insn.
>>  		 */
>>  		body = PATTERN(insn);
>> +		/* arm64 is different */
>> +		if (GET_CODE(body) == PARALLEL) {
>> +			body = XEXP(body, 0);
>> +			body = XEXP(body, 0);
>> +		}
> 
> Like most kernel developers, I don't know the first thing about GCC internals
> so I asked our GCC team and Richard (CC'd) reckons this should be:
> 
> 	if (GET_CODE(body) == PARALLEL)
> 		body = XVECEXP(body, 0, 0);
> 
> instead of the hunk above. Can you give that a go instead, please?

Thanks a lot!

Would you be so kind to take a look at the whole STACKLEAK plugin?
http://www.openwall.com/lists/kernel-hardening/2018/02/16/4
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/gcc-plugin/stackleak&id=57a0a6763b12e82dd462593d0f42be610e93cdc9

It's not very big. I documented it in detail. I would be really grateful for the
review!

Best regards,
Alexander

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 1/2] stackleak: Update for arm64
@ 2018-02-22 19:22               ` Alexander Popov
  0 siblings, 0 replies; 61+ messages in thread
From: Alexander Popov @ 2018-02-22 19:22 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Will, Richard and GCC folks!

On 22.02.2018 19:58, Will Deacon wrote:
> On Tue, Feb 20, 2018 at 05:13:02PM -0800, Laura Abbott wrote:
>>
>> arm64 has another layer of indirection in the RTL.
>> Account for this in the plugin.
>>
>> Signed-off-by: Laura Abbott <labbott@redhat.com>
>> ---
>>  scripts/gcc-plugins/stackleak_plugin.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
>> index 6fc991c98d8b..7dfaa027423f 100644
>> --- a/scripts/gcc-plugins/stackleak_plugin.c
>> +++ b/scripts/gcc-plugins/stackleak_plugin.c
>> @@ -244,6 +244,11 @@ static unsigned int stackleak_final_execute(void)
>>  		 * that insn.
>>  		 */
>>  		body = PATTERN(insn);
>> +		/* arm64 is different */
>> +		if (GET_CODE(body) == PARALLEL) {
>> +			body = XEXP(body, 0);
>> +			body = XEXP(body, 0);
>> +		}
> 
> Like most kernel developers, I don't know the first thing about GCC internals
> so I asked our GCC team and Richard (CC'd) reckons this should be:
> 
> 	if (GET_CODE(body) == PARALLEL)
> 		body = XVECEXP(body, 0, 0);
> 
> instead of the hunk above. Can you give that a go instead, please?

Thanks a lot!

Would you be so kind to take a look at the whole STACKLEAK plugin?
http://www.openwall.com/lists/kernel-hardening/2018/02/16/4
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/gcc-plugin/stackleak&id=57a0a6763b12e82dd462593d0f42be610e93cdc9

It's not very big. I documented it in detail. I would be really grateful for the
review!

Best regards,
Alexander

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] stackleak: Update for arm64
  2018-02-22 16:58             ` Will Deacon
@ 2018-02-22 19:38               ` Laura Abbott
  -1 siblings, 0 replies; 61+ messages in thread
From: Laura Abbott @ 2018-02-22 19:38 UTC (permalink / raw)
  To: Will Deacon
  Cc: Alexander Popov, Kees Cook, Mark Rutland, Ard Biesheuvel,
	linux-kernel, linux-arm-kernel, kernel-hardening,
	richard.sandiford

On 02/22/2018 08:58 AM, Will Deacon wrote:
> Hi Laura,
> 
> On Tue, Feb 20, 2018 at 05:13:02PM -0800, Laura Abbott wrote:
>>
>> arm64 has another layer of indirection in the RTL.
>> Account for this in the plugin.
>>
>> Signed-off-by: Laura Abbott <labbott@redhat.com>
>> ---
>>   scripts/gcc-plugins/stackleak_plugin.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
>> index 6fc991c98d8b..7dfaa027423f 100644
>> --- a/scripts/gcc-plugins/stackleak_plugin.c
>> +++ b/scripts/gcc-plugins/stackleak_plugin.c
>> @@ -244,6 +244,11 @@ static unsigned int stackleak_final_execute(void)
>>   		 * that insn.
>>   		 */
>>   		body = PATTERN(insn);
>> +		/* arm64 is different */
>> +		if (GET_CODE(body) == PARALLEL) {
>> +			body = XEXP(body, 0);
>> +			body = XEXP(body, 0);
>> +		}
> 
> Like most kernel developers, I don't know the first thing about GCC internals
> so I asked our GCC team and Richard (CC'd) reckons this should be:
> 
> 	if (GET_CODE(body) == PARALLEL)
> 		body = XVECEXP(body, 0, 0);
> 
> instead of the hunk above. Can you give that a go instead, please?
> 
> Cheers,
> 
> Will
> 

Yep, seems to work fine and makes sense from my understanding of
gcc internals. I'll fix it up for the next version. Thanks for the
review!

Laura

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 1/2] stackleak: Update for arm64
@ 2018-02-22 19:38               ` Laura Abbott
  0 siblings, 0 replies; 61+ messages in thread
From: Laura Abbott @ 2018-02-22 19:38 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/22/2018 08:58 AM, Will Deacon wrote:
> Hi Laura,
> 
> On Tue, Feb 20, 2018 at 05:13:02PM -0800, Laura Abbott wrote:
>>
>> arm64 has another layer of indirection in the RTL.
>> Account for this in the plugin.
>>
>> Signed-off-by: Laura Abbott <labbott@redhat.com>
>> ---
>>   scripts/gcc-plugins/stackleak_plugin.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
>> index 6fc991c98d8b..7dfaa027423f 100644
>> --- a/scripts/gcc-plugins/stackleak_plugin.c
>> +++ b/scripts/gcc-plugins/stackleak_plugin.c
>> @@ -244,6 +244,11 @@ static unsigned int stackleak_final_execute(void)
>>   		 * that insn.
>>   		 */
>>   		body = PATTERN(insn);
>> +		/* arm64 is different */
>> +		if (GET_CODE(body) == PARALLEL) {
>> +			body = XEXP(body, 0);
>> +			body = XEXP(body, 0);
>> +		}
> 
> Like most kernel developers, I don't know the first thing about GCC internals
> so I asked our GCC team and Richard (CC'd) reckons this should be:
> 
> 	if (GET_CODE(body) == PARALLEL)
> 		body = XVECEXP(body, 0, 0);
> 
> instead of the hunk above. Can you give that a go instead, please?
> 
> Cheers,
> 
> Will
> 

Yep, seems to work fine and makes sense from my understanding of
gcc internals. I'll fix it up for the next version. Thanks for the
review!

Laura

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v8 1/6] x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
  2018-02-22 19:14       ` Borislav Petkov
@ 2018-02-22 20:24         ` Alexander Popov
  0 siblings, 0 replies; 61+ messages in thread
From: Alexander Popov @ 2018-02-22 20:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: kernel-hardening, Kees Cook, PaX Team, Brad Spengler,
	Ingo Molnar, Andy Lutomirski, Tycho Andersen, Laura Abbott,
	Mark Rutland, Ard Biesheuvel, Thomas Gleixner, H . Peter Anvin,
	Peter Zijlstra, Dmitry V . Levin, x86

Hello Borislav,

On 22.02.2018 22:14, Borislav Petkov wrote:
> On Thu, Feb 22, 2018 at 12:49:44AM +0300, Alexander Popov wrote:
> However, even if we did END(erase_kstack), the calls are still in the
> code:
> 
> ffffffff81a00111:       e8 ea fe ff ff          callq  ffffffff81a00000 <entry_SYSCALL_64_stage2>
> 
> so macro it is. But please call the macro something else, not the same
> name as the function.

Thanks for your time spent on this! I'll call it ERASE_KSTACK and it will look
like other macros.

>> The mm.txt already has this line:
>>  ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
>>
>> Excuse me, I didn't get what to document.
> 
> You say
> 
> /* Poison value points to the unused hole in the virtual memory map */
> 
> but we do change that memory map from time to time and there are
> multiple unused holes.
> 
> So do something like this so that there are no clashes when someone
> decides to use that unused hole:
> 
> ---
> diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
> index ea91cb61a602..5d8f4168247d 100644
> --- a/Documentation/x86/x86_64/mm.txt
> +++ b/Documentation/x86/x86_64/mm.txt
> @@ -24,6 +24,7 @@ ffffffffa0000000 - [fixmap start]   (~1526 MB) module mapping space (variable)
>  [fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
>  ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
>  ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
> +Stackleak poison value in this last hole: 0xffffffffffff4111
>  
>  Virtual memory map with 5 level page tables:
>  
> @@ -50,6 +51,7 @@ ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
>  [fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
>  ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
>  ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
> +Stackleak poison value in this last hole: 0xffffffffffff4111

Ok, I see. Thank you very much.

Best regards,
Alexander

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 0/2] Update stackleak for gcc-8
  2018-02-22  1:43 ` Laura Abbott
@ 2018-02-22 23:14   ` Laura Abbott
  2018-02-22 23:14     ` [PATCH 1/2] gcc-plugins: Update cgraph_create_edge " Laura Abbott
                       ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Laura Abbott @ 2018-02-22 23:14 UTC (permalink / raw)
  To: Alexander Popov, Kees Cook, kernel-hardening; +Cc: Laura Abbott


Okay I _think_ this should be the solution for gcc-8. I've split this
into two patches, one for the generic framework and the other for the
stackleak plugin itself. Kees, I'll let you figure out how to apply
these. Feel free to fold in as appropriate.



Laura Abbott (2):
  gcc-plugins: Update cgraph_create_edge for gcc-8
  gcc-plugins: stackleak: Update for gcc-8

 scripts/gcc-plugins/gcc-common.h       |  6 ++++++
 scripts/gcc-plugins/stackleak_plugin.c | 24 +++++++++++++++++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 1/2] gcc-plugins: Update cgraph_create_edge for gcc-8
  2018-02-22 23:14   ` [PATCH 0/2] Update stackleak for gcc-8 Laura Abbott
@ 2018-02-22 23:14     ` Laura Abbott
  2018-02-22 23:40       ` Kees Cook
  2018-02-22 23:14     ` [PATCH 2/2] gcc-plugins: stackleak: Update " Laura Abbott
  2018-02-22 23:43     ` [PATCH 0/2] Update stackleak " Kees Cook
  2 siblings, 1 reply; 61+ messages in thread
From: Laura Abbott @ 2018-02-22 23:14 UTC (permalink / raw)
  To: Alexander Popov, Kees Cook, kernel-hardening; +Cc: Laura Abbott


gcc-8 changed the API for cgraph_create_edge. Update accordingly.

Signed-off-by: Laura Abbott <labbott@redhat.com>
---
 scripts/gcc-plugins/gcc-common.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/scripts/gcc-plugins/gcc-common.h b/scripts/gcc-plugins/gcc-common.h
index f46750053377..42c55c29157f 100644
--- a/scripts/gcc-plugins/gcc-common.h
+++ b/scripts/gcc-plugins/gcc-common.h
@@ -723,8 +723,14 @@ static inline const char *get_decl_section_name(const_tree decl)
 #define varpool_get_node(decl) varpool_node::get(decl)
 #define dump_varpool_node(file, node) (node)->dump(file)
 
+#if BUILDING_GCC_VERSION >= 8000
+#define cgraph_create_edge(caller, callee, call_stmt, count, nest) \
+	(caller)->create_edge((callee), (call_stmt), (count))
+#else
 #define cgraph_create_edge(caller, callee, call_stmt, count, freq, nest) \
 	(caller)->create_edge((callee), (call_stmt), (count), (freq))
+
+#endif
 #define cgraph_create_edge_including_clones(caller, callee, old_call_stmt, call_stmt, count, freq, nest, reason) \
 	(caller)->create_edge_including_clones((callee), (old_call_stmt), (call_stmt), (count), (freq), (reason))
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 2/2] gcc-plugins: stackleak: Update for gcc-8
  2018-02-22 23:14   ` [PATCH 0/2] Update stackleak for gcc-8 Laura Abbott
  2018-02-22 23:14     ` [PATCH 1/2] gcc-plugins: Update cgraph_create_edge " Laura Abbott
@ 2018-02-22 23:14     ` Laura Abbott
  2018-02-24 14:04       ` Alexander Popov
  2018-02-22 23:43     ` [PATCH 0/2] Update stackleak " Kees Cook
  2 siblings, 1 reply; 61+ messages in thread
From: Laura Abbott @ 2018-02-22 23:14 UTC (permalink / raw)
  To: Alexander Popov, Kees Cook, kernel-hardening; +Cc: Laura Abbott


- Use the new cgraph_create_edge API
- Account for the change in type for get_frame_size

Signed-off-by: Laura Abbott <labbott@redhat.com>
---
 scripts/gcc-plugins/stackleak_plugin.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
index 6fc991c98d8b..4ea2fdb987e6 100644
--- a/scripts/gcc-plugins/stackleak_plugin.c
+++ b/scripts/gcc-plugins/stackleak_plugin.c
@@ -69,8 +69,13 @@ static void stackleak_check_alloca(gimple_stmt_iterator *gsi)
 	node = cgraph_get_create_node(check_function_decl);
 	gcc_assert(node);
 	frequency = compute_call_stmt_bb_frequency(current_function_decl, bb);
+#if BUILDING_GCC_VERSION >= 8000
+	cgraph_create_edge(cgraph_get_node(current_function_decl), node,
+			check_alloca, bb->count, bb->loop_depth);
+#else
 	cgraph_create_edge(cgraph_get_node(current_function_decl), node,
 			check_alloca, bb->count, frequency, bb->loop_depth);
+#endif
 }
 
 static void stackleak_add_instrumentation(gimple_stmt_iterator *gsi, bool after)
@@ -94,8 +99,13 @@ static void stackleak_add_instrumentation(gimple_stmt_iterator *gsi, bool after)
 	node = cgraph_get_create_node(track_function_decl);
 	gcc_assert(node);
 	frequency = compute_call_stmt_bb_frequency(current_function_decl, bb);
+#if BUILDING_GCC_VERSION >= 8000
+	cgraph_create_edge(cgraph_get_node(current_function_decl), node,
+			track_stack, bb->count, bb->loop_depth);
+#else
 	cgraph_create_edge(cgraph_get_node(current_function_decl), node,
 			track_stack, bb->count, frequency, bb->loop_depth);
+#endif
 }
 
 static bool is_alloca(gimple stmt)
@@ -203,6 +213,18 @@ static unsigned int stackleak_tree_instrument_execute(void)
 	return 0;
 }
 
+
+#if BUILDING_GCC_VERSION >= 8000
+bool check_frame_size()
+{
+	return maybe_ge(get_frame_size(), track_frame_size);
+}
+#else
+bool check_frame_size()
+{
+	return get_frame_size() >= track_frame_size;
+}
+#endif
 /*
  * Work with the RTL representation of the code.
  * Remove the unneeded track_stack() calls from the functions which don't
@@ -215,7 +237,7 @@ static unsigned int stackleak_final_execute(void)
 	if (cfun->calls_alloca)
 		return 0;
 
-	if (get_frame_size() >= track_frame_size)
+	if (check_frame_size())
 		return 0;
 
 	/*
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] gcc-plugins: Update cgraph_create_edge for gcc-8
  2018-02-22 23:14     ` [PATCH 1/2] gcc-plugins: Update cgraph_create_edge " Laura Abbott
@ 2018-02-22 23:40       ` Kees Cook
  2018-02-23 17:30         ` Laura Abbott
  0 siblings, 1 reply; 61+ messages in thread
From: Kees Cook @ 2018-02-22 23:40 UTC (permalink / raw)
  To: Laura Abbott; +Cc: Alexander Popov, Kernel Hardening

On Thu, Feb 22, 2018 at 3:14 PM, Laura Abbott <labbott@redhat.com> wrote:
>
> gcc-8 changed the API for cgraph_create_edge. Update accordingly.
>
> Signed-off-by: Laura Abbott <labbott@redhat.com>
> ---
>  scripts/gcc-plugins/gcc-common.h | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/scripts/gcc-plugins/gcc-common.h b/scripts/gcc-plugins/gcc-common.h
> index f46750053377..42c55c29157f 100644
> --- a/scripts/gcc-plugins/gcc-common.h
> +++ b/scripts/gcc-plugins/gcc-common.h
> @@ -723,8 +723,14 @@ static inline const char *get_decl_section_name(const_tree decl)
>  #define varpool_get_node(decl) varpool_node::get(decl)
>  #define dump_varpool_node(file, node) (node)->dump(file)
>
> +#if BUILDING_GCC_VERSION >= 8000
> +#define cgraph_create_edge(caller, callee, call_stmt, count, nest) \
> +       (caller)->create_edge((callee), (call_stmt), (count))
> +#else
>  #define cgraph_create_edge(caller, callee, call_stmt, count, freq, nest) \
>         (caller)->create_edge((callee), (call_stmt), (count), (freq))
> +
> +#endif

Since gcc 8 "throws away" the freq argument, could this just be
updated to do the same? i.e., leave "freq" as an argument, and just
don't use it? That way the plugin caller doesn't need to be updated.

+#if BUILDING_GCC_VERSION >= 8000
+#define cgraph_create_edge(caller, callee, call_stmt, count, freq, nest) \
+       (caller)->create_edge((callee), (call_stmt), (count))
+#else
 #define cgraph_create_edge(caller, callee, call_stmt, count, freq, nest) \
        (caller)->create_edge((callee), (call_stmt), (count), (freq))
+#endif


-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 0/2] Update stackleak for gcc-8
  2018-02-22 23:14   ` [PATCH 0/2] Update stackleak for gcc-8 Laura Abbott
  2018-02-22 23:14     ` [PATCH 1/2] gcc-plugins: Update cgraph_create_edge " Laura Abbott
  2018-02-22 23:14     ` [PATCH 2/2] gcc-plugins: stackleak: Update " Laura Abbott
@ 2018-02-22 23:43     ` Kees Cook
  2 siblings, 0 replies; 61+ messages in thread
From: Kees Cook @ 2018-02-22 23:43 UTC (permalink / raw)
  To: Laura Abbott; +Cc: Alexander Popov, Kernel Hardening

On Thu, Feb 22, 2018 at 3:14 PM, Laura Abbott <labbott@redhat.com> wrote:
>
> Okay I _think_ this should be the solution for gcc-8. I've split this
> into two patches, one for the generic framework and the other for the
> stackleak plugin itself.

Nice! Thanks for solving this!

> Kees, I'll let you figure out how to apply
> these. Feel free to fold in as appropriate.

Other than the suggestion to have fewer ifdef in stackleak, this looks
fine. Alexander, feel free to take these both (with adjustments) for
the stackleak series: stackleak is the only user of
cgraph_create_edge() so I think it's fine to adjust gcc-common.h in
the series.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] gcc-plugins: Update cgraph_create_edge for gcc-8
  2018-02-22 23:40       ` Kees Cook
@ 2018-02-23 17:30         ` Laura Abbott
  2018-02-24 12:36           ` Alexander Popov
  0 siblings, 1 reply; 61+ messages in thread
From: Laura Abbott @ 2018-02-23 17:30 UTC (permalink / raw)
  To: Kees Cook; +Cc: Alexander Popov, Kernel Hardening

On 02/22/2018 03:40 PM, Kees Cook wrote:
> On Thu, Feb 22, 2018 at 3:14 PM, Laura Abbott <labbott@redhat.com> wrote:
>>
>> gcc-8 changed the API for cgraph_create_edge. Update accordingly.
>>
>> Signed-off-by: Laura Abbott <labbott@redhat.com>
>> ---
>>   scripts/gcc-plugins/gcc-common.h | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/scripts/gcc-plugins/gcc-common.h b/scripts/gcc-plugins/gcc-common.h
>> index f46750053377..42c55c29157f 100644
>> --- a/scripts/gcc-plugins/gcc-common.h
>> +++ b/scripts/gcc-plugins/gcc-common.h
>> @@ -723,8 +723,14 @@ static inline const char *get_decl_section_name(const_tree decl)
>>   #define varpool_get_node(decl) varpool_node::get(decl)
>>   #define dump_varpool_node(file, node) (node)->dump(file)
>>
>> +#if BUILDING_GCC_VERSION >= 8000
>> +#define cgraph_create_edge(caller, callee, call_stmt, count, nest) \
>> +       (caller)->create_edge((callee), (call_stmt), (count))
>> +#else
>>   #define cgraph_create_edge(caller, callee, call_stmt, count, freq, nest) \
>>          (caller)->create_edge((callee), (call_stmt), (count), (freq))
>> +
>> +#endif
> 
> Since gcc 8 "throws away" the freq argument, could this just be
> updated to do the same? i.e., leave "freq" as an argument, and just
> don't use it? That way the plugin caller doesn't need to be updated.
> 
> +#if BUILDING_GCC_VERSION >= 8000
> +#define cgraph_create_edge(caller, callee, call_stmt, count, freq, nest) \
> +       (caller)->create_edge((callee), (call_stmt), (count))
> +#else
>   #define cgraph_create_edge(caller, callee, call_stmt, count, freq, nest) \
>          (caller)->create_edge((callee), (call_stmt), (count), (freq))
> +#endif
> 
> 
> -Kees
> 

Yeah, I think I started out that way then ended up with this for
debugging reasons. The only concern I might have is ending up
with the freq argument flagged as unused by gcc but I'm indifferent
overall.

Thanks,
Laura

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] gcc-plugins: Update cgraph_create_edge for gcc-8
  2018-02-23 17:30         ` Laura Abbott
@ 2018-02-24 12:36           ` Alexander Popov
  0 siblings, 0 replies; 61+ messages in thread
From: Alexander Popov @ 2018-02-24 12:36 UTC (permalink / raw)
  To: Laura Abbott, Kees Cook; +Cc: Kernel Hardening

On 23.02.2018 20:30, Laura Abbott wrote:
> On 02/22/2018 03:40 PM, Kees Cook wrote:
>> On Thu, Feb 22, 2018 at 3:14 PM, Laura Abbott <labbott@redhat.com> wrote:
>>>
>>> gcc-8 changed the API for cgraph_create_edge. Update accordingly.
>>>
>>> Signed-off-by: Laura Abbott <labbott@redhat.com>
>>> ---
>>>   scripts/gcc-plugins/gcc-common.h | 6 ++++++
>>>   1 file changed, 6 insertions(+)
>>>
>>> diff --git a/scripts/gcc-plugins/gcc-common.h b/scripts/gcc-plugins/gcc-common.h
>>> index f46750053377..42c55c29157f 100644
>>> --- a/scripts/gcc-plugins/gcc-common.h
>>> +++ b/scripts/gcc-plugins/gcc-common.h
>>> @@ -723,8 +723,14 @@ static inline const char *get_decl_section_name(const_tree decl)
>>>   #define varpool_get_node(decl) varpool_node::get(decl)
>>>   #define dump_varpool_node(file, node) (node)->dump(file)
>>>
>>> +#if BUILDING_GCC_VERSION >= 8000
>>> +#define cgraph_create_edge(caller, callee, call_stmt, count, nest) \
>>> +       (caller)->create_edge((callee), (call_stmt), (count))
>>> +#else
>>>   #define cgraph_create_edge(caller, callee, call_stmt, count, freq, nest) \
>>>          (caller)->create_edge((callee), (call_stmt), (count), (freq))
>>> +
>>> +#endif
>>
>> Since gcc 8 "throws away" the freq argument, could this just be
>> updated to do the same? i.e., leave "freq" as an argument, and just
>> don't use it? That way the plugin caller doesn't need to be updated.
>>
>> +#if BUILDING_GCC_VERSION >= 8000
>> +#define cgraph_create_edge(caller, callee, call_stmt, count, freq, nest) \
>> +       (caller)->create_edge((callee), (call_stmt), (count))
>> +#else
>>   #define cgraph_create_edge(caller, callee, call_stmt, count, freq, nest) \
>>          (caller)->create_edge((callee), (call_stmt), (count), (freq))
>> +#endif
>>
> 
> Yeah, I think I started out that way then ended up with this for
> debugging reasons. The only concern I might have is ending up
> with the freq argument flagged as unused by gcc but I'm indifferent
> overall.

Hello Laura and Kees,

Laura, thanks a lot for your work! I double-checked gcc changelog and
agree with you. I will add this fix for gcc-8 as a separate commit
to the next version of STACKLEAK.

And I'll do some macro cleanup as well. All the cgraph_create_edge* macros
in gcc-common.h look strange.

At line 395 we have these redefinitions for dropping "nest":

 #if BUILDING_GCC_VERSION >= 4007 && BUILDING_GCC_VERSION <= 4009
 #define cgraph_create_edge(caller, callee, call_stmt, count, freq, nest) \
 	cgraph_create_edge((caller), (callee), (call_stmt), (count), (freq))
 #define cgraph_create_edge_including_clones(caller, callee, old_call_stmt, call_stmt, count, freq, nest, reason) \
 	cgraph_create_edge_including_clones((caller), (callee), (old_call_stmt), (call_stmt), (count), (freq), (reason))
 #endif

But later at line 726 "nest" is not used as well:

 #define cgraph_create_edge(caller, callee, call_stmt, count, freq, nest) \
 	(caller)->create_edge((callee), (call_stmt), (count), (freq))
 #define cgraph_create_edge_including_clones(caller, callee, old_call_stmt, call_stmt, count, freq, nest, reason) \
 	(caller)->create_edge_including_clones((callee), (old_call_stmt), (call_stmt), (count), (freq), (reason))

So I'm going to drop "nest" from cgraph_create_edge* macros and the plugin code.

Thanks again!
Best regards,
Alexander

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/2] gcc-plugins: stackleak: Update for gcc-8
  2018-02-22 23:14     ` [PATCH 2/2] gcc-plugins: stackleak: Update " Laura Abbott
@ 2018-02-24 14:04       ` Alexander Popov
  2018-02-26 21:51         ` Laura Abbott
  0 siblings, 1 reply; 61+ messages in thread
From: Alexander Popov @ 2018-02-24 14:04 UTC (permalink / raw)
  To: Laura Abbott, Kees Cook, kernel-hardening; +Cc: Will Deacon, richard.sandiford

Hello Laura,

Thanks for the cooperation!

On 23.02.2018 02:14, Laura Abbott wrote:
> +#if BUILDING_GCC_VERSION >= 8000
> +bool check_frame_size()
> +{
> +	return maybe_ge(get_frame_size(), track_frame_size);

After looking through this guide
https://gcc.gnu.org/onlinedocs//gccint/Guidelines-for-using-poly_005fint.html#Guidelines-for-using-poly_005fint
it seems to me that we should better use something like that:

poly_int64 frame_size = get_frame_size();

if (frame_size.to_constant() >= track_frame_size)
	return 0;

May I ask for your opinion?

> +}
> +#else
> +bool check_frame_size()
> +{
> +	return get_frame_size() >= track_frame_size;
> +}
> +#endif
>  /*
>   * Work with the RTL representation of the code.
>   * Remove the unneeded track_stack() calls from the functions which don't
> @@ -215,7 +237,7 @@ static unsigned int stackleak_final_execute(void)
>  	if (cfun->calls_alloca)
>  		return 0;
>  
> -	if (get_frame_size() >= track_frame_size)
> +	if (check_frame_size())
>  		return 0;
>  

Best regards,
Alexander

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/2] gcc-plugins: stackleak: Update for gcc-8
  2018-02-24 14:04       ` Alexander Popov
@ 2018-02-26 21:51         ` Laura Abbott
  2018-02-27 10:30           ` Richard Sandiford
  0 siblings, 1 reply; 61+ messages in thread
From: Laura Abbott @ 2018-02-26 21:51 UTC (permalink / raw)
  To: alex.popov, Kees Cook, kernel-hardening; +Cc: Will Deacon, richard.sandiford

On 02/24/2018 06:04 AM, Alexander Popov wrote:
> Hello Laura,
> 
> Thanks for the cooperation!
> 
> On 23.02.2018 02:14, Laura Abbott wrote:
>> +#if BUILDING_GCC_VERSION >= 8000
>> +bool check_frame_size()
>> +{
>> +	return maybe_ge(get_frame_size(), track_frame_size);
> 
> After looking through this guide
> https://gcc.gnu.org/onlinedocs//gccint/Guidelines-for-using-poly_005fint.html#Guidelines-for-using-poly_005fint
> it seems to me that we should better use something like that:
> 
> poly_int64 frame_size = get_frame_size();
> 
> if (frame_size.to_constant() >= track_frame_size)
> 	return 0;
> 
> May I ask for your opinion?
> 

I was a bit wary of using to_constant() because I wasn't 100% sure
I could make the assertion that it could actually be a constant so using
maybe_ge seemed like the conservative approach. This also getting into
compiler internals so it's possible I'm misunderstanding how poly_ints
are supposed to work.

Thanks,
Laura

>> +}
>> +#else
>> +bool check_frame_size()
>> +{
>> +	return get_frame_size() >= track_frame_size;
>> +}
>> +#endif
>>   /*
>>    * Work with the RTL representation of the code.
>>    * Remove the unneeded track_stack() calls from the functions which don't
>> @@ -215,7 +237,7 @@ static unsigned int stackleak_final_execute(void)
>>   	if (cfun->calls_alloca)
>>   		return 0;
>>   
>> -	if (get_frame_size() >= track_frame_size)
>> +	if (check_frame_size())
>>   		return 0;
>>   
> 
> Best regards,
> Alexander
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] stackleak: Update for arm64
  2018-02-22 19:22               ` Alexander Popov
  (?)
@ 2018-02-27 10:21                 ` Richard Sandiford
  -1 siblings, 0 replies; 61+ messages in thread
From: Richard Sandiford @ 2018-02-27 10:21 UTC (permalink / raw)
  To: Alexander Popov
  Cc: Will Deacon, Laura Abbott, Kees Cook, Mark Rutland,
	Ard Biesheuvel, linux-kernel, linux-arm-kernel, kernel-hardening

Hi Alexander,

Sorry for the slow reply, been caught up in an office move.

Alexander Popov <alex.popov@linux.com> writes:
> Hello Will, Richard and GCC folks!
>
> On 22.02.2018 19:58, Will Deacon wrote:
>> On Tue, Feb 20, 2018 at 05:13:02PM -0800, Laura Abbott wrote:
>>>
>>> arm64 has another layer of indirection in the RTL.
>>> Account for this in the plugin.
>>>
>>> Signed-off-by: Laura Abbott <labbott@redhat.com>
>>> ---
>>>  scripts/gcc-plugins/stackleak_plugin.c | 5 +++++
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
>>> index 6fc991c98d8b..7dfaa027423f 100644
>>> --- a/scripts/gcc-plugins/stackleak_plugin.c
>>> +++ b/scripts/gcc-plugins/stackleak_plugin.c
>>> @@ -244,6 +244,11 @@ static unsigned int stackleak_final_execute(void)
>>>  		 * that insn.
>>>  		 */
>>>  		body = PATTERN(insn);
>>> +		/* arm64 is different */
>>> +		if (GET_CODE(body) == PARALLEL) {
>>> +			body = XEXP(body, 0);
>>> +			body = XEXP(body, 0);
>>> +		}
>> 
>> Like most kernel developers, I don't know the first thing about GCC internals
>> so I asked our GCC team and Richard (CC'd) reckons this should be:
>> 
>> 	if (GET_CODE(body) == PARALLEL)
>> 		body = XVECEXP(body, 0, 0);
>> 
>> instead of the hunk above. Can you give that a go instead, please?
>
> Thanks a lot!
>
> Would you be so kind to take a look at the whole STACKLEAK plugin?
> http://www.openwall.com/lists/kernel-hardening/2018/02/16/4
> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/gcc-plugin/stackleak&id=57a0a6763b12e82dd462593d0f42be610e93cdc9
>
> It's not very big. I documented it in detail. I would be really grateful for the
> review!

Looks good to me FWIW.  Just a couple of minor things:

> +	/*
> +	 * 1. Loop through the GIMPLE statements in each of cfun basic blocks.
> +	 * cfun is a global variable which represents the function that is
> +	 * currently processed.
> +	 */
> +	FOR_EACH_BB_FN(bb, cfun) {
> +		for (gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
> +			gimple stmt;
> +
> +			stmt = gsi_stmt(gsi);
> +
> +			/* Leaf function is a function which makes no calls */
> +			if (is_gimple_call(stmt))
> +				is_leaf = false;

It's probably not going to matter in practice, but it might be worth
emphasising in the comments that this leafness is only approximate.
It will sometimes be a false positive, because we could still
end up creating calls to libgcc functions from non-call statements
(or for target-specific reasons).  It can also be a false negative,
since call statements can be to built-in or internal functions that
end up being open-coded.

> +	/*
> +	 * The stackleak_final pass should be executed before the "final" pass,
> +	 * which turns the RTL (Register Transfer Language) into assembly.
> +	 */
> +	PASS_INFO(stackleak_final, "final", 1, PASS_POS_INSERT_BEFORE);

This might be too late, since it happens e.g. after addresses have
been calculated for branch ranges, and after machine-specific passes
(e.g. bundling on ia64).

The stack size is final after reload, so inserting the pass after that
might be better.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 1/2] stackleak: Update for arm64
@ 2018-02-27 10:21                 ` Richard Sandiford
  0 siblings, 0 replies; 61+ messages in thread
From: Richard Sandiford @ 2018-02-27 10:21 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Alexander,

Sorry for the slow reply, been caught up in an office move.

Alexander Popov <alex.popov@linux.com> writes:
> Hello Will, Richard and GCC folks!
>
> On 22.02.2018 19:58, Will Deacon wrote:
>> On Tue, Feb 20, 2018 at 05:13:02PM -0800, Laura Abbott wrote:
>>>
>>> arm64 has another layer of indirection in the RTL.
>>> Account for this in the plugin.
>>>
>>> Signed-off-by: Laura Abbott <labbott@redhat.com>
>>> ---
>>>  scripts/gcc-plugins/stackleak_plugin.c | 5 +++++
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
>>> index 6fc991c98d8b..7dfaa027423f 100644
>>> --- a/scripts/gcc-plugins/stackleak_plugin.c
>>> +++ b/scripts/gcc-plugins/stackleak_plugin.c
>>> @@ -244,6 +244,11 @@ static unsigned int stackleak_final_execute(void)
>>>  		 * that insn.
>>>  		 */
>>>  		body = PATTERN(insn);
>>> +		/* arm64 is different */
>>> +		if (GET_CODE(body) == PARALLEL) {
>>> +			body = XEXP(body, 0);
>>> +			body = XEXP(body, 0);
>>> +		}
>> 
>> Like most kernel developers, I don't know the first thing about GCC internals
>> so I asked our GCC team and Richard (CC'd) reckons this should be:
>> 
>> 	if (GET_CODE(body) == PARALLEL)
>> 		body = XVECEXP(body, 0, 0);
>> 
>> instead of the hunk above. Can you give that a go instead, please?
>
> Thanks a lot!
>
> Would you be so kind to take a look at the whole STACKLEAK plugin?
> http://www.openwall.com/lists/kernel-hardening/2018/02/16/4
> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/gcc-plugin/stackleak&id=57a0a6763b12e82dd462593d0f42be610e93cdc9
>
> It's not very big. I documented it in detail. I would be really grateful for the
> review!

Looks good to me FWIW.  Just a couple of minor things:

> +	/*
> +	 * 1. Loop through the GIMPLE statements in each of cfun basic blocks.
> +	 * cfun is a global variable which represents the function that is
> +	 * currently processed.
> +	 */
> +	FOR_EACH_BB_FN(bb, cfun) {
> +		for (gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
> +			gimple stmt;
> +
> +			stmt = gsi_stmt(gsi);
> +
> +			/* Leaf function is a function which makes no calls */
> +			if (is_gimple_call(stmt))
> +				is_leaf = false;

It's probably not going to matter in practice, but it might be worth
emphasising in the comments that this leafness is only approximate.
It will sometimes be a false positive, because we could still
end up creating calls to libgcc functions from non-call statements
(or for target-specific reasons).  It can also be a false negative,
since call statements can be to built-in or internal functions that
end up being open-coded.

> +	/*
> +	 * The stackleak_final pass should be executed before the "final" pass,
> +	 * which turns the RTL (Register Transfer Language) into assembly.
> +	 */
> +	PASS_INFO(stackleak_final, "final", 1, PASS_POS_INSERT_BEFORE);

This might be too late, since it happens e.g. after addresses have
been calculated for branch ranges, and after machine-specific passes
(e.g. bundling on ia64).

The stack size is final after reload, so inserting the pass after that
might be better.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] stackleak: Update for arm64
@ 2018-02-27 10:21                 ` Richard Sandiford
  0 siblings, 0 replies; 61+ messages in thread
From: Richard Sandiford @ 2018-02-27 10:21 UTC (permalink / raw)
  To: Alexander Popov
  Cc: Will Deacon, Laura Abbott, Kees Cook, Mark Rutland,
	Ard Biesheuvel, linux-kernel, linux-arm-kernel, kernel-hardening

Hi Alexander,

Sorry for the slow reply, been caught up in an office move.

Alexander Popov <alex.popov@linux.com> writes:
> Hello Will, Richard and GCC folks!
>
> On 22.02.2018 19:58, Will Deacon wrote:
>> On Tue, Feb 20, 2018 at 05:13:02PM -0800, Laura Abbott wrote:
>>>
>>> arm64 has another layer of indirection in the RTL.
>>> Account for this in the plugin.
>>>
>>> Signed-off-by: Laura Abbott <labbott@redhat.com>
>>> ---
>>>  scripts/gcc-plugins/stackleak_plugin.c | 5 +++++
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
>>> index 6fc991c98d8b..7dfaa027423f 100644
>>> --- a/scripts/gcc-plugins/stackleak_plugin.c
>>> +++ b/scripts/gcc-plugins/stackleak_plugin.c
>>> @@ -244,6 +244,11 @@ static unsigned int stackleak_final_execute(void)
>>>  		 * that insn.
>>>  		 */
>>>  		body = PATTERN(insn);
>>> +		/* arm64 is different */
>>> +		if (GET_CODE(body) == PARALLEL) {
>>> +			body = XEXP(body, 0);
>>> +			body = XEXP(body, 0);
>>> +		}
>> 
>> Like most kernel developers, I don't know the first thing about GCC internals
>> so I asked our GCC team and Richard (CC'd) reckons this should be:
>> 
>> 	if (GET_CODE(body) == PARALLEL)
>> 		body = XVECEXP(body, 0, 0);
>> 
>> instead of the hunk above. Can you give that a go instead, please?
>
> Thanks a lot!
>
> Would you be so kind to take a look at the whole STACKLEAK plugin?
> http://www.openwall.com/lists/kernel-hardening/2018/02/16/4
> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/gcc-plugin/stackleak&id=57a0a6763b12e82dd462593d0f42be610e93cdc9
>
> It's not very big. I documented it in detail. I would be really grateful for the
> review!

Looks good to me FWIW.  Just a couple of minor things:

> +	/*
> +	 * 1. Loop through the GIMPLE statements in each of cfun basic blocks.
> +	 * cfun is a global variable which represents the function that is
> +	 * currently processed.
> +	 */
> +	FOR_EACH_BB_FN(bb, cfun) {
> +		for (gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
> +			gimple stmt;
> +
> +			stmt = gsi_stmt(gsi);
> +
> +			/* Leaf function is a function which makes no calls */
> +			if (is_gimple_call(stmt))
> +				is_leaf = false;

It's probably not going to matter in practice, but it might be worth
emphasising in the comments that this leafness is only approximate.
It will sometimes be a false positive, because we could still
end up creating calls to libgcc functions from non-call statements
(or for target-specific reasons).  It can also be a false negative,
since call statements can be to built-in or internal functions that
end up being open-coded.

> +	/*
> +	 * The stackleak_final pass should be executed before the "final" pass,
> +	 * which turns the RTL (Register Transfer Language) into assembly.
> +	 */
> +	PASS_INFO(stackleak_final, "final", 1, PASS_POS_INSERT_BEFORE);

This might be too late, since it happens e.g. after addresses have
been calculated for branch ranges, and after machine-specific passes
(e.g. bundling on ia64).

The stack size is final after reload, so inserting the pass after that
might be better.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/2] gcc-plugins: stackleak: Update for gcc-8
  2018-02-26 21:51         ` Laura Abbott
@ 2018-02-27 10:30           ` Richard Sandiford
  2018-02-28 10:27             ` Alexander Popov
  0 siblings, 1 reply; 61+ messages in thread
From: Richard Sandiford @ 2018-02-27 10:30 UTC (permalink / raw)
  To: Laura Abbott; +Cc: alex.popov, Kees Cook, kernel-hardening, Will Deacon

Laura Abbott <labbott@redhat.com> writes:
> On 02/24/2018 06:04 AM, Alexander Popov wrote:
>> Hello Laura,
>> 
>> Thanks for the cooperation!
>> 
>> On 23.02.2018 02:14, Laura Abbott wrote:
>>> +#if BUILDING_GCC_VERSION >= 8000
>>> +bool check_frame_size()
>>> +{
>>> +	return maybe_ge(get_frame_size(), track_frame_size);
>> 
>> After looking through this guide
>> https://gcc.gnu.org/onlinedocs//gccint/Guidelines-for-using-poly_005fint.html#Guidelines-for-using-poly_005fint
>> it seems to me that we should better use something like that:
>> 
>> poly_int64 frame_size = get_frame_size();
>> 
>> if (frame_size.to_constant() >= track_frame_size)
>> 	return 0;
>> 
>> May I ask for your opinion?
>> 
>
> I was a bit wary of using to_constant() because I wasn't 100% sure
> I could make the assertion that it could actually be a constant so using

Yeah, the assert would fire for SVE functions that spill SVE registers
to the stack.

> maybe_ge seemed like the conservative approach. This also getting into
> compiler internals so it's possible I'm misunderstanding how poly_ints
> are supposed to work.

The maybe_ge patch looks good to me FWIW.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/2] gcc-plugins: stackleak: Update for gcc-8
  2018-02-27 10:30           ` Richard Sandiford
@ 2018-02-28 10:27             ` Alexander Popov
  0 siblings, 0 replies; 61+ messages in thread
From: Alexander Popov @ 2018-02-28 10:27 UTC (permalink / raw)
  To: Laura Abbott, Kees Cook, kernel-hardening, Will Deacon,
	richard.sandiford

On 27.02.2018 13:30, Richard Sandiford wrote:
> Laura Abbott <labbott@redhat.com> writes:
>> On 02/24/2018 06:04 AM, Alexander Popov wrote:
>>> Hello Laura,
>>>
>>> Thanks for the cooperation!
>>>
>>> On 23.02.2018 02:14, Laura Abbott wrote:
>>>> +#if BUILDING_GCC_VERSION >= 8000
>>>> +bool check_frame_size()
>>>> +{
>>>> +	return maybe_ge(get_frame_size(), track_frame_size);
>>>
>>> After looking through this guide
>>> https://gcc.gnu.org/onlinedocs//gccint/Guidelines-for-using-poly_005fint.html#Guidelines-for-using-poly_005fint
>>> it seems to me that we should better use something like that:
>>>
>>> poly_int64 frame_size = get_frame_size();
>>>
>>> if (frame_size.to_constant() >= track_frame_size)
>>> 	return 0;
>>>
>>> May I ask for your opinion?
>>>
>>
>> I was a bit wary of using to_constant() because I wasn't 100% sure
>> I could make the assertion that it could actually be a constant so using
> 
> Yeah, the assert would fire for SVE functions that spill SVE registers
> to the stack.
> 
>> maybe_ge seemed like the conservative approach. This also getting into
>> compiler internals so it's possible I'm misunderstanding how poly_ints
>> are supposed to work.
> 
> The maybe_ge patch looks good to me FWIW.

Thanks, Laura and Richard!

I see the rationale, I will use maybe_ge() in the next version of the patch.

Best regards,
Alexander

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] stackleak: Update for arm64
  2018-02-27 10:21                 ` Richard Sandiford
@ 2018-02-28 15:09                   ` Alexander Popov
  -1 siblings, 0 replies; 61+ messages in thread
From: Alexander Popov @ 2018-02-28 15:09 UTC (permalink / raw)
  To: Will Deacon, Laura Abbott, Kees Cook, Mark Rutland,
	Ard Biesheuvel, linux-kernel, linux-arm-kernel, kernel-hardening,
	richard.sandiford

On 27.02.2018 13:21, Richard Sandiford wrote:
> Hi Alexander,
> 
> Sorry for the slow reply, been caught up in an office move.

Thank you very much for the review, Richard!

> Alexander Popov <alex.popov@linux.com> writes:
>> Would you be so kind to take a look at the whole STACKLEAK plugin?
>> http://www.openwall.com/lists/kernel-hardening/2018/02/16/4
>> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/gcc-plugin/stackleak&id=57a0a6763b12e82dd462593d0f42be610e93cdc9
>>
>> It's not very big. I documented it in detail. I would be really grateful for the
>> review!
> 
> Looks good to me FWIW.  Just a couple of minor things:
> 
>> +	/*
>> +	 * 1. Loop through the GIMPLE statements in each of cfun basic blocks.
>> +	 * cfun is a global variable which represents the function that is
>> +	 * currently processed.
>> +	 */
>> +	FOR_EACH_BB_FN(bb, cfun) {
>> +		for (gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
>> +			gimple stmt;
>> +
>> +			stmt = gsi_stmt(gsi);
>> +
>> +			/* Leaf function is a function which makes no calls */
>> +			if (is_gimple_call(stmt))
>> +				is_leaf = false;
> 
> It's probably not going to matter in practice, but it might be worth
> emphasising in the comments that this leafness is only approximate.

That's important, thank you! May I ask why you think it's not going to matter in
practice?

> It will sometimes be a false positive, because we could still
> end up creating calls to libgcc functions from non-call statements
> (or for target-specific reasons).  It can also be a false negative,
> since call statements can be to built-in or internal functions that
> end up being open-coded.

Oh, that raises the question: how does this leafness inaccuracy affect in my
particular case?

is_leaf is currently used for finding the special cases to skip the
track_stack() call insertion:

/*
 * Special cases to skip the instrumentation.
 *
 * Taking the address of static inline functions materializes them,
 * but we mustn't instrument some of them as the resulting stack
 * alignment required by the function call ABI will break other
 * assumptions regarding the expected (but not otherwise enforced)
 * register clobbering ABI.
 *
 * Case in point: native_save_fl on amd64 when optimized for size
 * clobbers rdx if it were instrumented here.
 *
 * TODO: any more special cases?
 */
if (is_leaf &&
    !TREE_PUBLIC(current_function_decl) &&
    DECL_DECLARED_INLINE_P(current_function_decl)) {
	return 0;
}


And now it seems to me that the stackleak plugin should not instrument all
static inline functions, regardless of is_leaf. Do you agree?

>> +	/*
>> +	 * The stackleak_final pass should be executed before the "final" pass,
>> +	 * which turns the RTL (Register Transfer Language) into assembly.
>> +	 */
>> +	PASS_INFO(stackleak_final, "final", 1, PASS_POS_INSERT_BEFORE);
> 
> This might be too late, since it happens e.g. after addresses have
> been calculated for branch ranges, and after machine-specific passes
> (e.g. bundling on ia64).
> 
> The stack size is final after reload, so inserting the pass after that
> might be better.

Thanks for that notice. May I ask for the additional clarification?

I specified -fdump-passes and see a lot of passes between reload and final:
  ...
  rtl-sched1                                       :  OFF
  rtl-ira                                          :  ON
  rtl-reload                                       :  ON
  rtl-vzeroupper                                   :  OFF
  *all-postreload                                  :  OFF
     rtl-postreload                                :  OFF
     rtl-gcse2                                     :  OFF
     rtl-split2                                    :  ON
     rtl-ree                                       :  ON
     rtl-cmpelim                                   :  OFF
     rtl-btl1                                      :  OFF
     rtl-pro_and_epilogue                          :  ON
     rtl-dse2                                      :  ON
     rtl-csa                                       :  ON
     rtl-jump2                                     :  ON
     rtl-compgotos                                 :  ON
     rtl-sched_fusion                              :  OFF
     rtl-peephole2                                 :  ON
     rtl-ce3                                       :  ON
     rtl-rnreg                                     :  OFF
     rtl-cprop_hardreg                             :  ON
     rtl-rtl_dce                                   :  ON
     rtl-bbro                                      :  ON
     rtl-btl2                                      :  OFF
     *leaf_regs                                    :  ON
     rtl-split4                                    :  ON
     rtl-sched2                                    :  ON
     *stack_regs                                   :  ON
        rtl-split3                                 :  OFF
        rtl-stack                                  :  ON
  *all-late_compilation                            :  OFF
     rtl-alignments                                :  ON
     rtl-vartrack                                  :  ON
     *free_cfg                                     :  ON
     rtl-mach                                      :  ON
     rtl-barriers                                  :  ON
     rtl-dbr                                       :  OFF
     rtl-split5                                    :  OFF
     rtl-eh_ranges                                 :  OFF
     rtl-shorten                                   :  ON
     rtl-nothrow                                   :  ON
     rtl-dwarf2                                    :  ON
     rtl-stackleak_final                           :  ON
     rtl-final                                     :  ON
  rtl-dfinish                                      :  ON
clean_state                                        :  ON

Where exactly would you recommend me to insert the stackleak_final pass, which
removes the unneeded track_stack() calls?

Best regards,
Alexander

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 1/2] stackleak: Update for arm64
@ 2018-02-28 15:09                   ` Alexander Popov
  0 siblings, 0 replies; 61+ messages in thread
From: Alexander Popov @ 2018-02-28 15:09 UTC (permalink / raw)
  To: linux-arm-kernel

On 27.02.2018 13:21, Richard Sandiford wrote:
> Hi Alexander,
> 
> Sorry for the slow reply, been caught up in an office move.

Thank you very much for the review, Richard!

> Alexander Popov <alex.popov@linux.com> writes:
>> Would you be so kind to take a look at the whole STACKLEAK plugin?
>> http://www.openwall.com/lists/kernel-hardening/2018/02/16/4
>> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/gcc-plugin/stackleak&id=57a0a6763b12e82dd462593d0f42be610e93cdc9
>>
>> It's not very big. I documented it in detail. I would be really grateful for the
>> review!
> 
> Looks good to me FWIW.  Just a couple of minor things:
> 
>> +	/*
>> +	 * 1. Loop through the GIMPLE statements in each of cfun basic blocks.
>> +	 * cfun is a global variable which represents the function that is
>> +	 * currently processed.
>> +	 */
>> +	FOR_EACH_BB_FN(bb, cfun) {
>> +		for (gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
>> +			gimple stmt;
>> +
>> +			stmt = gsi_stmt(gsi);
>> +
>> +			/* Leaf function is a function which makes no calls */
>> +			if (is_gimple_call(stmt))
>> +				is_leaf = false;
> 
> It's probably not going to matter in practice, but it might be worth
> emphasising in the comments that this leafness is only approximate.

That's important, thank you! May I ask why you think it's not going to matter in
practice?

> It will sometimes be a false positive, because we could still
> end up creating calls to libgcc functions from non-call statements
> (or for target-specific reasons).  It can also be a false negative,
> since call statements can be to built-in or internal functions that
> end up being open-coded.

Oh, that raises the question: how does this leafness inaccuracy affect in my
particular case?

is_leaf is currently used for finding the special cases to skip the
track_stack() call insertion:

/*
 * Special cases to skip the instrumentation.
 *
 * Taking the address of static inline functions materializes them,
 * but we mustn't instrument some of them as the resulting stack
 * alignment required by the function call ABI will break other
 * assumptions regarding the expected (but not otherwise enforced)
 * register clobbering ABI.
 *
 * Case in point: native_save_fl on amd64 when optimized for size
 * clobbers rdx if it were instrumented here.
 *
 * TODO: any more special cases?
 */
if (is_leaf &&
    !TREE_PUBLIC(current_function_decl) &&
    DECL_DECLARED_INLINE_P(current_function_decl)) {
	return 0;
}


And now it seems to me that the stackleak plugin should not instrument all
static inline functions, regardless of is_leaf. Do you agree?

>> +	/*
>> +	 * The stackleak_final pass should be executed before the "final" pass,
>> +	 * which turns the RTL (Register Transfer Language) into assembly.
>> +	 */
>> +	PASS_INFO(stackleak_final, "final", 1, PASS_POS_INSERT_BEFORE);
> 
> This might be too late, since it happens e.g. after addresses have
> been calculated for branch ranges, and after machine-specific passes
> (e.g. bundling on ia64).
> 
> The stack size is final after reload, so inserting the pass after that
> might be better.

Thanks for that notice. May I ask for the additional clarification?

I specified -fdump-passes and see a lot of passes between reload and final:
  ...
  rtl-sched1                                       :  OFF
  rtl-ira                                          :  ON
  rtl-reload                                       :  ON
  rtl-vzeroupper                                   :  OFF
  *all-postreload                                  :  OFF
     rtl-postreload                                :  OFF
     rtl-gcse2                                     :  OFF
     rtl-split2                                    :  ON
     rtl-ree                                       :  ON
     rtl-cmpelim                                   :  OFF
     rtl-btl1                                      :  OFF
     rtl-pro_and_epilogue                          :  ON
     rtl-dse2                                      :  ON
     rtl-csa                                       :  ON
     rtl-jump2                                     :  ON
     rtl-compgotos                                 :  ON
     rtl-sched_fusion                              :  OFF
     rtl-peephole2                                 :  ON
     rtl-ce3                                       :  ON
     rtl-rnreg                                     :  OFF
     rtl-cprop_hardreg                             :  ON
     rtl-rtl_dce                                   :  ON
     rtl-bbro                                      :  ON
     rtl-btl2                                      :  OFF
     *leaf_regs                                    :  ON
     rtl-split4                                    :  ON
     rtl-sched2                                    :  ON
     *stack_regs                                   :  ON
        rtl-split3                                 :  OFF
        rtl-stack                                  :  ON
  *all-late_compilation                            :  OFF
     rtl-alignments                                :  ON
     rtl-vartrack                                  :  ON
     *free_cfg                                     :  ON
     rtl-mach                                      :  ON
     rtl-barriers                                  :  ON
     rtl-dbr                                       :  OFF
     rtl-split5                                    :  OFF
     rtl-eh_ranges                                 :  OFF
     rtl-shorten                                   :  ON
     rtl-nothrow                                   :  ON
     rtl-dwarf2                                    :  ON
     rtl-stackleak_final                           :  ON
     rtl-final                                     :  ON
  rtl-dfinish                                      :  ON
clean_state                                        :  ON

Where exactly would you recommend me to insert the stackleak_final pass, which
removes the unneeded track_stack() calls?

Best regards,
Alexander

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] stackleak: Update for arm64
  2018-02-28 15:09                   ` Alexander Popov
  (?)
@ 2018-03-01 10:33                     ` Richard Sandiford
  -1 siblings, 0 replies; 61+ messages in thread
From: Richard Sandiford @ 2018-03-01 10:33 UTC (permalink / raw)
  To: Alexander Popov
  Cc: Will Deacon, Laura Abbott, Kees Cook, Mark Rutland,
	Ard Biesheuvel, linux-kernel, linux-arm-kernel, kernel-hardening

Alexander Popov <alex.popov@linux.com> writes:
> On 27.02.2018 13:21, Richard Sandiford wrote:
>> Hi Alexander,
>> 
>> Sorry for the slow reply, been caught up in an office move.
>
> Thank you very much for the review, Richard!
>
>> Alexander Popov <alex.popov@linux.com> writes:
>>> Would you be so kind to take a look at the whole STACKLEAK plugin?
>>> http://www.openwall.com/lists/kernel-hardening/2018/02/16/4
>>> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/gcc-plugin/stackleak&id=57a0a6763b12e82dd462593d0f42be610e93cdc9
>>>
>>> It's not very big. I documented it in detail. I would be really
>>> grateful for the
>>> review!
>> 
>> Looks good to me FWIW.  Just a couple of minor things:
>> 
>>> +	/*
>>> +	 * 1. Loop through the GIMPLE statements in each of cfun basic blocks.
>>> +	 * cfun is a global variable which represents the function that is
>>> +	 * currently processed.
>>> +	 */
>>> +	FOR_EACH_BB_FN(bb, cfun) {
>>> +		for (gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
>>> +			gimple stmt;
>>> +
>>> +			stmt = gsi_stmt(gsi);
>>> +
>>> +			/* Leaf function is a function which makes no calls */
>>> +			if (is_gimple_call(stmt))
>>> +				is_leaf = false;
>> 
>> It's probably not going to matter in practice, but it might be worth
>> emphasising in the comments that this leafness is only approximate.
>
> That's important, thank you! May I ask why you think it's not going to matter in
> practice?

I just thought the kind of calls it misses are going to have very
shallow frames, but from what you said later I guess that isn't the
point.  It also might be a bit too hand-wavy for something like this :-)

>> It will sometimes be a false positive, because we could still
>> end up creating calls to libgcc functions from non-call statements
>> (or for target-specific reasons).  It can also be a false negative,
>> since call statements can be to built-in or internal functions that
>> end up being open-coded.
>
> Oh, that raises the question: how does this leafness inaccuracy affect in my
> particular case?
>
> is_leaf is currently used for finding the special cases to skip the
> track_stack() call insertion:
>
> /*
>  * Special cases to skip the instrumentation.
>  *
>  * Taking the address of static inline functions materializes them,
>  * but we mustn't instrument some of them as the resulting stack
>  * alignment required by the function call ABI will break other
>  * assumptions regarding the expected (but not otherwise enforced)
>  * register clobbering ABI.
>  *
>  * Case in point: native_save_fl on amd64 when optimized for size
>  * clobbers rdx if it were instrumented here.
>  *
>  * TODO: any more special cases?
>  */
> if (is_leaf &&
>     !TREE_PUBLIC(current_function_decl) &&
>     DECL_DECLARED_INLINE_P(current_function_decl)) {
> 	return 0;
> }
>
>
> And now it seems to me that the stackleak plugin should not instrument all
> static inline functions, regardless of is_leaf. Do you agree?

OK.  I'd missed that this was just a heuristic to detect certain kinds
of linux function, so it's probably fine as it is.

Not sure whether it's safe to punt for general static inline functions.
E.g. couldn't you have a static inline function that just provides a
more convenient interface to another function?  But I guess it's a
linux-specific heuristic, so I can't really say.

TBH the paravirt save_fl stuff seems like dancing on the edge,
but that's another story. :-)

>>> +	/*
>>> +	 * The stackleak_final pass should be executed before the "final" pass,
>>> +	 * which turns the RTL (Register Transfer Language) into assembly.
>>> +	 */
>>> +	PASS_INFO(stackleak_final, "final", 1, PASS_POS_INSERT_BEFORE);
>> 
>> This might be too late, since it happens e.g. after addresses have
>> been calculated for branch ranges, and after machine-specific passes
>> (e.g. bundling on ia64).
>> 
>> The stack size is final after reload, so inserting the pass after that
>> might be better.
>
> Thanks for that notice. May I ask for the additional clarification?
>
> I specified -fdump-passes and see a lot of passes between reload and final:
>   ...
>   rtl-sched1                                       :  OFF
>   rtl-ira                                          :  ON
>   rtl-reload                                       :  ON
>   rtl-vzeroupper                                   :  OFF
>   *all-postreload                                  :  OFF
>      rtl-postreload                                :  OFF
>      rtl-gcse2                                     :  OFF
>      rtl-split2                                    :  ON
>      rtl-ree                                       :  ON
>      rtl-cmpelim                                   :  OFF
>      rtl-btl1                                      :  OFF
>      rtl-pro_and_epilogue                          :  ON
>      rtl-dse2                                      :  ON
>      rtl-csa                                       :  ON
>      rtl-jump2                                     :  ON
>      rtl-compgotos                                 :  ON
>      rtl-sched_fusion                              :  OFF
>      rtl-peephole2                                 :  ON
>      rtl-ce3                                       :  ON
>      rtl-rnreg                                     :  OFF
>      rtl-cprop_hardreg                             :  ON
>      rtl-rtl_dce                                   :  ON
>      rtl-bbro                                      :  ON
>      rtl-btl2                                      :  OFF
>      *leaf_regs                                    :  ON
>      rtl-split4                                    :  ON
>      rtl-sched2                                    :  ON
>      *stack_regs                                   :  ON
>         rtl-split3                                 :  OFF
>         rtl-stack                                  :  ON
>   *all-late_compilation                            :  OFF
>      rtl-alignments                                :  ON
>      rtl-vartrack                                  :  ON
>      *free_cfg                                     :  ON
>      rtl-mach                                      :  ON
>      rtl-barriers                                  :  ON
>      rtl-dbr                                       :  OFF
>      rtl-split5                                    :  OFF
>      rtl-eh_ranges                                 :  OFF
>      rtl-shorten                                   :  ON
>      rtl-nothrow                                   :  ON
>      rtl-dwarf2                                    :  ON
>      rtl-stackleak_final                           :  ON
>      rtl-final                                     :  ON
>   rtl-dfinish                                      :  ON
> clean_state                                        :  ON
>
> Where exactly would you recommend me to insert the stackleak_final pass, which
> removes the unneeded track_stack() calls?

Directly after rtl-reload seems best.  That's the first point at which
the frame size is final, and reload is one of the few rtl passes that
always runs.  Doing it there could also help with things like shrink
wrapping (part of rtl-pro_and_epilogue).

Thanks,
Richard

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 1/2] stackleak: Update for arm64
@ 2018-03-01 10:33                     ` Richard Sandiford
  0 siblings, 0 replies; 61+ messages in thread
From: Richard Sandiford @ 2018-03-01 10:33 UTC (permalink / raw)
  To: linux-arm-kernel

Alexander Popov <alex.popov@linux.com> writes:
> On 27.02.2018 13:21, Richard Sandiford wrote:
>> Hi Alexander,
>> 
>> Sorry for the slow reply, been caught up in an office move.
>
> Thank you very much for the review, Richard!
>
>> Alexander Popov <alex.popov@linux.com> writes:
>>> Would you be so kind to take a look at the whole STACKLEAK plugin?
>>> http://www.openwall.com/lists/kernel-hardening/2018/02/16/4
>>> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/gcc-plugin/stackleak&id=57a0a6763b12e82dd462593d0f42be610e93cdc9
>>>
>>> It's not very big. I documented it in detail. I would be really
>>> grateful for the
>>> review!
>> 
>> Looks good to me FWIW.  Just a couple of minor things:
>> 
>>> +	/*
>>> +	 * 1. Loop through the GIMPLE statements in each of cfun basic blocks.
>>> +	 * cfun is a global variable which represents the function that is
>>> +	 * currently processed.
>>> +	 */
>>> +	FOR_EACH_BB_FN(bb, cfun) {
>>> +		for (gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
>>> +			gimple stmt;
>>> +
>>> +			stmt = gsi_stmt(gsi);
>>> +
>>> +			/* Leaf function is a function which makes no calls */
>>> +			if (is_gimple_call(stmt))
>>> +				is_leaf = false;
>> 
>> It's probably not going to matter in practice, but it might be worth
>> emphasising in the comments that this leafness is only approximate.
>
> That's important, thank you! May I ask why you think it's not going to matter in
> practice?

I just thought the kind of calls it misses are going to have very
shallow frames, but from what you said later I guess that isn't the
point.  It also might be a bit too hand-wavy for something like this :-)

>> It will sometimes be a false positive, because we could still
>> end up creating calls to libgcc functions from non-call statements
>> (or for target-specific reasons).  It can also be a false negative,
>> since call statements can be to built-in or internal functions that
>> end up being open-coded.
>
> Oh, that raises the question: how does this leafness inaccuracy affect in my
> particular case?
>
> is_leaf is currently used for finding the special cases to skip the
> track_stack() call insertion:
>
> /*
>  * Special cases to skip the instrumentation.
>  *
>  * Taking the address of static inline functions materializes them,
>  * but we mustn't instrument some of them as the resulting stack
>  * alignment required by the function call ABI will break other
>  * assumptions regarding the expected (but not otherwise enforced)
>  * register clobbering ABI.
>  *
>  * Case in point: native_save_fl on amd64 when optimized for size
>  * clobbers rdx if it were instrumented here.
>  *
>  * TODO: any more special cases?
>  */
> if (is_leaf &&
>     !TREE_PUBLIC(current_function_decl) &&
>     DECL_DECLARED_INLINE_P(current_function_decl)) {
> 	return 0;
> }
>
>
> And now it seems to me that the stackleak plugin should not instrument all
> static inline functions, regardless of is_leaf. Do you agree?

OK.  I'd missed that this was just a heuristic to detect certain kinds
of linux function, so it's probably fine as it is.

Not sure whether it's safe to punt for general static inline functions.
E.g. couldn't you have a static inline function that just provides a
more convenient interface to another function?  But I guess it's a
linux-specific heuristic, so I can't really say.

TBH the paravirt save_fl stuff seems like dancing on the edge,
but that's another story. :-)

>>> +	/*
>>> +	 * The stackleak_final pass should be executed before the "final" pass,
>>> +	 * which turns the RTL (Register Transfer Language) into assembly.
>>> +	 */
>>> +	PASS_INFO(stackleak_final, "final", 1, PASS_POS_INSERT_BEFORE);
>> 
>> This might be too late, since it happens e.g. after addresses have
>> been calculated for branch ranges, and after machine-specific passes
>> (e.g. bundling on ia64).
>> 
>> The stack size is final after reload, so inserting the pass after that
>> might be better.
>
> Thanks for that notice. May I ask for the additional clarification?
>
> I specified -fdump-passes and see a lot of passes between reload and final:
>   ...
>   rtl-sched1                                       :  OFF
>   rtl-ira                                          :  ON
>   rtl-reload                                       :  ON
>   rtl-vzeroupper                                   :  OFF
>   *all-postreload                                  :  OFF
>      rtl-postreload                                :  OFF
>      rtl-gcse2                                     :  OFF
>      rtl-split2                                    :  ON
>      rtl-ree                                       :  ON
>      rtl-cmpelim                                   :  OFF
>      rtl-btl1                                      :  OFF
>      rtl-pro_and_epilogue                          :  ON
>      rtl-dse2                                      :  ON
>      rtl-csa                                       :  ON
>      rtl-jump2                                     :  ON
>      rtl-compgotos                                 :  ON
>      rtl-sched_fusion                              :  OFF
>      rtl-peephole2                                 :  ON
>      rtl-ce3                                       :  ON
>      rtl-rnreg                                     :  OFF
>      rtl-cprop_hardreg                             :  ON
>      rtl-rtl_dce                                   :  ON
>      rtl-bbro                                      :  ON
>      rtl-btl2                                      :  OFF
>      *leaf_regs                                    :  ON
>      rtl-split4                                    :  ON
>      rtl-sched2                                    :  ON
>      *stack_regs                                   :  ON
>         rtl-split3                                 :  OFF
>         rtl-stack                                  :  ON
>   *all-late_compilation                            :  OFF
>      rtl-alignments                                :  ON
>      rtl-vartrack                                  :  ON
>      *free_cfg                                     :  ON
>      rtl-mach                                      :  ON
>      rtl-barriers                                  :  ON
>      rtl-dbr                                       :  OFF
>      rtl-split5                                    :  OFF
>      rtl-eh_ranges                                 :  OFF
>      rtl-shorten                                   :  ON
>      rtl-nothrow                                   :  ON
>      rtl-dwarf2                                    :  ON
>      rtl-stackleak_final                           :  ON
>      rtl-final                                     :  ON
>   rtl-dfinish                                      :  ON
> clean_state                                        :  ON
>
> Where exactly would you recommend me to insert the stackleak_final pass, which
> removes the unneeded track_stack() calls?

Directly after rtl-reload seems best.  That's the first point at which
the frame size is final, and reload is one of the few rtl passes that
always runs.  Doing it there could also help with things like shrink
wrapping (part of rtl-pro_and_epilogue).

Thanks,
Richard

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] stackleak: Update for arm64
@ 2018-03-01 10:33                     ` Richard Sandiford
  0 siblings, 0 replies; 61+ messages in thread
From: Richard Sandiford @ 2018-03-01 10:33 UTC (permalink / raw)
  To: Alexander Popov
  Cc: Will Deacon, Laura Abbott, Kees Cook, Mark Rutland,
	Ard Biesheuvel, linux-kernel, linux-arm-kernel, kernel-hardening

Alexander Popov <alex.popov@linux.com> writes:
> On 27.02.2018 13:21, Richard Sandiford wrote:
>> Hi Alexander,
>> 
>> Sorry for the slow reply, been caught up in an office move.
>
> Thank you very much for the review, Richard!
>
>> Alexander Popov <alex.popov@linux.com> writes:
>>> Would you be so kind to take a look at the whole STACKLEAK plugin?
>>> http://www.openwall.com/lists/kernel-hardening/2018/02/16/4
>>> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/gcc-plugin/stackleak&id=57a0a6763b12e82dd462593d0f42be610e93cdc9
>>>
>>> It's not very big. I documented it in detail. I would be really
>>> grateful for the
>>> review!
>> 
>> Looks good to me FWIW.  Just a couple of minor things:
>> 
>>> +	/*
>>> +	 * 1. Loop through the GIMPLE statements in each of cfun basic blocks.
>>> +	 * cfun is a global variable which represents the function that is
>>> +	 * currently processed.
>>> +	 */
>>> +	FOR_EACH_BB_FN(bb, cfun) {
>>> +		for (gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
>>> +			gimple stmt;
>>> +
>>> +			stmt = gsi_stmt(gsi);
>>> +
>>> +			/* Leaf function is a function which makes no calls */
>>> +			if (is_gimple_call(stmt))
>>> +				is_leaf = false;
>> 
>> It's probably not going to matter in practice, but it might be worth
>> emphasising in the comments that this leafness is only approximate.
>
> That's important, thank you! May I ask why you think it's not going to matter in
> practice?

I just thought the kind of calls it misses are going to have very
shallow frames, but from what you said later I guess that isn't the
point.  It also might be a bit too hand-wavy for something like this :-)

>> It will sometimes be a false positive, because we could still
>> end up creating calls to libgcc functions from non-call statements
>> (or for target-specific reasons).  It can also be a false negative,
>> since call statements can be to built-in or internal functions that
>> end up being open-coded.
>
> Oh, that raises the question: how does this leafness inaccuracy affect in my
> particular case?
>
> is_leaf is currently used for finding the special cases to skip the
> track_stack() call insertion:
>
> /*
>  * Special cases to skip the instrumentation.
>  *
>  * Taking the address of static inline functions materializes them,
>  * but we mustn't instrument some of them as the resulting stack
>  * alignment required by the function call ABI will break other
>  * assumptions regarding the expected (but not otherwise enforced)
>  * register clobbering ABI.
>  *
>  * Case in point: native_save_fl on amd64 when optimized for size
>  * clobbers rdx if it were instrumented here.
>  *
>  * TODO: any more special cases?
>  */
> if (is_leaf &&
>     !TREE_PUBLIC(current_function_decl) &&
>     DECL_DECLARED_INLINE_P(current_function_decl)) {
> 	return 0;
> }
>
>
> And now it seems to me that the stackleak plugin should not instrument all
> static inline functions, regardless of is_leaf. Do you agree?

OK.  I'd missed that this was just a heuristic to detect certain kinds
of linux function, so it's probably fine as it is.

Not sure whether it's safe to punt for general static inline functions.
E.g. couldn't you have a static inline function that just provides a
more convenient interface to another function?  But I guess it's a
linux-specific heuristic, so I can't really say.

TBH the paravirt save_fl stuff seems like dancing on the edge,
but that's another story. :-)

>>> +	/*
>>> +	 * The stackleak_final pass should be executed before the "final" pass,
>>> +	 * which turns the RTL (Register Transfer Language) into assembly.
>>> +	 */
>>> +	PASS_INFO(stackleak_final, "final", 1, PASS_POS_INSERT_BEFORE);
>> 
>> This might be too late, since it happens e.g. after addresses have
>> been calculated for branch ranges, and after machine-specific passes
>> (e.g. bundling on ia64).
>> 
>> The stack size is final after reload, so inserting the pass after that
>> might be better.
>
> Thanks for that notice. May I ask for the additional clarification?
>
> I specified -fdump-passes and see a lot of passes between reload and final:
>   ...
>   rtl-sched1                                       :  OFF
>   rtl-ira                                          :  ON
>   rtl-reload                                       :  ON
>   rtl-vzeroupper                                   :  OFF
>   *all-postreload                                  :  OFF
>      rtl-postreload                                :  OFF
>      rtl-gcse2                                     :  OFF
>      rtl-split2                                    :  ON
>      rtl-ree                                       :  ON
>      rtl-cmpelim                                   :  OFF
>      rtl-btl1                                      :  OFF
>      rtl-pro_and_epilogue                          :  ON
>      rtl-dse2                                      :  ON
>      rtl-csa                                       :  ON
>      rtl-jump2                                     :  ON
>      rtl-compgotos                                 :  ON
>      rtl-sched_fusion                              :  OFF
>      rtl-peephole2                                 :  ON
>      rtl-ce3                                       :  ON
>      rtl-rnreg                                     :  OFF
>      rtl-cprop_hardreg                             :  ON
>      rtl-rtl_dce                                   :  ON
>      rtl-bbro                                      :  ON
>      rtl-btl2                                      :  OFF
>      *leaf_regs                                    :  ON
>      rtl-split4                                    :  ON
>      rtl-sched2                                    :  ON
>      *stack_regs                                   :  ON
>         rtl-split3                                 :  OFF
>         rtl-stack                                  :  ON
>   *all-late_compilation                            :  OFF
>      rtl-alignments                                :  ON
>      rtl-vartrack                                  :  ON
>      *free_cfg                                     :  ON
>      rtl-mach                                      :  ON
>      rtl-barriers                                  :  ON
>      rtl-dbr                                       :  OFF
>      rtl-split5                                    :  OFF
>      rtl-eh_ranges                                 :  OFF
>      rtl-shorten                                   :  ON
>      rtl-nothrow                                   :  ON
>      rtl-dwarf2                                    :  ON
>      rtl-stackleak_final                           :  ON
>      rtl-final                                     :  ON
>   rtl-dfinish                                      :  ON
> clean_state                                        :  ON
>
> Where exactly would you recommend me to insert the stackleak_final pass, which
> removes the unneeded track_stack() calls?

Directly after rtl-reload seems best.  That's the first point at which
the frame size is final, and reload is one of the few rtl passes that
always runs.  Doing it there could also help with things like shrink
wrapping (part of rtl-pro_and_epilogue).

Thanks,
Richard

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] stackleak: Update for arm64
  2018-03-01 10:33                     ` Richard Sandiford
@ 2018-03-02 11:14                       ` Alexander Popov
  -1 siblings, 0 replies; 61+ messages in thread
From: Alexander Popov @ 2018-03-02 11:14 UTC (permalink / raw)
  To: Will Deacon, Laura Abbott, Kees Cook, Mark Rutland,
	Ard Biesheuvel, linux-kernel, linux-arm-kernel, kernel-hardening,
	richard.sandiford
  Cc: PaX Team

Thanks for your reply, Richard!

On 01.03.2018 13:33, Richard Sandiford wrote:
> Alexander Popov <alex.popov@linux.com> writes:
>> On 27.02.2018 13:21, Richard Sandiford wrote:
>>> Alexander Popov <alex.popov@linux.com> writes:
>>>> Would you be so kind to take a look at the whole STACKLEAK plugin?
>>>> http://www.openwall.com/lists/kernel-hardening/2018/02/16/4
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/gcc-plugin/stackleak&id=57a0a6763b12e82dd462593d0f42be610e93cdc9
>>>>
>>>> It's not very big. I documented it in detail. I would be really
>>>> grateful for the
>>>> review!
>>>
>>> Looks good to me FWIW.  Just a couple of minor things:
>>>
>>>> +	/*
>>>> +	 * 1. Loop through the GIMPLE statements in each of cfun basic blocks.
>>>> +	 * cfun is a global variable which represents the function that is
>>>> +	 * currently processed.
>>>> +	 */
>>>> +	FOR_EACH_BB_FN(bb, cfun) {
>>>> +		for (gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
>>>> +			gimple stmt;
>>>> +
>>>> +			stmt = gsi_stmt(gsi);
>>>> +
>>>> +			/* Leaf function is a function which makes no calls */
>>>> +			if (is_gimple_call(stmt))
>>>> +				is_leaf = false;
>>>
>>> It's probably not going to matter in practice, but it might be worth
>>> emphasising in the comments that this leafness is only approximate.
>>
>> That's important, thank you! May I ask why you think it's not going to matter in
>> practice?
> 
> I just thought the kind of calls it misses are going to have very
> shallow frames, but from what you said later I guess that isn't the
> point.  It also might be a bit too hand-wavy for something like this :-)
> 
>>> It will sometimes be a false positive, because we could still
>>> end up creating calls to libgcc functions from non-call statements
>>> (or for target-specific reasons).  It can also be a false negative,
>>> since call statements can be to built-in or internal functions that
>>> end up being open-coded.
>>
>> Oh, that raises the question: how does this leafness inaccuracy affect in my
>> particular case?
>>
>> is_leaf is currently used for finding the special cases to skip the
>> track_stack() call insertion:
>>
>> /*
>>  * Special cases to skip the instrumentation.
>>  *
>>  * Taking the address of static inline functions materializes them,
>>  * but we mustn't instrument some of them as the resulting stack
>>  * alignment required by the function call ABI will break other
>>  * assumptions regarding the expected (but not otherwise enforced)
>>  * register clobbering ABI.
>>  *
>>  * Case in point: native_save_fl on amd64 when optimized for size
>>  * clobbers rdx if it were instrumented here.
>>  *
>>  * TODO: any more special cases?
>>  */
>> if (is_leaf &&
>>     !TREE_PUBLIC(current_function_decl) &&
>>     DECL_DECLARED_INLINE_P(current_function_decl)) {
>> 	return 0;
>> }
>>
>>
>> And now it seems to me that the stackleak plugin should not instrument all
>> static inline functions, regardless of is_leaf. Do you agree?
> 
> OK.  I'd missed that this was just a heuristic to detect certain kinds
> of linux function, so it's probably fine as it is.
> 
> Not sure whether it's safe to punt for general static inline functions.
> E.g. couldn't you have a static inline function that just provides a
> more convenient interface to another function?  But I guess it's a
> linux-specific heuristic, so I can't really say.

Huh, I got the insight! I think that the current approach (originally by PaX
Team) should work fine despite the false positives which you described:

If some static inline function already does explicit calls (so is_leaf is
false), adding the track_stack() call will not introduce anything special that
can break the aforementioned register clobbering ABI in that function.

Does it sound reasonable?

However, I don't know what to with false negatives.

> TBH the paravirt save_fl stuff seems like dancing on the edge,
> but that's another story. :-)

That's interesting. Could you elaborate on that?

>>>> +	/*
>>>> +	 * The stackleak_final pass should be executed before the "final" pass,
>>>> +	 * which turns the RTL (Register Transfer Language) into assembly.
>>>> +	 */
>>>> +	PASS_INFO(stackleak_final, "final", 1, PASS_POS_INSERT_BEFORE);
>>>
>>> This might be too late, since it happens e.g. after addresses have
>>> been calculated for branch ranges, and after machine-specific passes
>>> (e.g. bundling on ia64).
>>>
>>> The stack size is final after reload, so inserting the pass after that
>>> might be better.
>>
>> Thanks for that notice. May I ask for the additional clarification?
>>
>> I specified -fdump-passes and see a lot of passes between reload and final:
...
>>
>> Where exactly would you recommend me to insert the stackleak_final pass, which
>> removes the unneeded track_stack() calls?
> 
> Directly after rtl-reload seems best.  That's the first point at which
> the frame size is final, and reload is one of the few rtl passes that
> always runs.  Doing it there could also help with things like shrink
> wrapping (part of rtl-pro_and_epilogue).
Thanks a lot for your detailed answer. I'll follow your advice in the next
version of the patch series.

Best regards,
Alexander

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 1/2] stackleak: Update for arm64
@ 2018-03-02 11:14                       ` Alexander Popov
  0 siblings, 0 replies; 61+ messages in thread
From: Alexander Popov @ 2018-03-02 11:14 UTC (permalink / raw)
  To: linux-arm-kernel

Thanks for your reply, Richard!

On 01.03.2018 13:33, Richard Sandiford wrote:
> Alexander Popov <alex.popov@linux.com> writes:
>> On 27.02.2018 13:21, Richard Sandiford wrote:
>>> Alexander Popov <alex.popov@linux.com> writes:
>>>> Would you be so kind to take a look at the whole STACKLEAK plugin?
>>>> http://www.openwall.com/lists/kernel-hardening/2018/02/16/4
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/gcc-plugin/stackleak&id=57a0a6763b12e82dd462593d0f42be610e93cdc9
>>>>
>>>> It's not very big. I documented it in detail. I would be really
>>>> grateful for the
>>>> review!
>>>
>>> Looks good to me FWIW.  Just a couple of minor things:
>>>
>>>> +	/*
>>>> +	 * 1. Loop through the GIMPLE statements in each of cfun basic blocks.
>>>> +	 * cfun is a global variable which represents the function that is
>>>> +	 * currently processed.
>>>> +	 */
>>>> +	FOR_EACH_BB_FN(bb, cfun) {
>>>> +		for (gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
>>>> +			gimple stmt;
>>>> +
>>>> +			stmt = gsi_stmt(gsi);
>>>> +
>>>> +			/* Leaf function is a function which makes no calls */
>>>> +			if (is_gimple_call(stmt))
>>>> +				is_leaf = false;
>>>
>>> It's probably not going to matter in practice, but it might be worth
>>> emphasising in the comments that this leafness is only approximate.
>>
>> That's important, thank you! May I ask why you think it's not going to matter in
>> practice?
> 
> I just thought the kind of calls it misses are going to have very
> shallow frames, but from what you said later I guess that isn't the
> point.  It also might be a bit too hand-wavy for something like this :-)
> 
>>> It will sometimes be a false positive, because we could still
>>> end up creating calls to libgcc functions from non-call statements
>>> (or for target-specific reasons).  It can also be a false negative,
>>> since call statements can be to built-in or internal functions that
>>> end up being open-coded.
>>
>> Oh, that raises the question: how does this leafness inaccuracy affect in my
>> particular case?
>>
>> is_leaf is currently used for finding the special cases to skip the
>> track_stack() call insertion:
>>
>> /*
>>  * Special cases to skip the instrumentation.
>>  *
>>  * Taking the address of static inline functions materializes them,
>>  * but we mustn't instrument some of them as the resulting stack
>>  * alignment required by the function call ABI will break other
>>  * assumptions regarding the expected (but not otherwise enforced)
>>  * register clobbering ABI.
>>  *
>>  * Case in point: native_save_fl on amd64 when optimized for size
>>  * clobbers rdx if it were instrumented here.
>>  *
>>  * TODO: any more special cases?
>>  */
>> if (is_leaf &&
>>     !TREE_PUBLIC(current_function_decl) &&
>>     DECL_DECLARED_INLINE_P(current_function_decl)) {
>> 	return 0;
>> }
>>
>>
>> And now it seems to me that the stackleak plugin should not instrument all
>> static inline functions, regardless of is_leaf. Do you agree?
> 
> OK.  I'd missed that this was just a heuristic to detect certain kinds
> of linux function, so it's probably fine as it is.
> 
> Not sure whether it's safe to punt for general static inline functions.
> E.g. couldn't you have a static inline function that just provides a
> more convenient interface to another function?  But I guess it's a
> linux-specific heuristic, so I can't really say.

Huh, I got the insight! I think that the current approach (originally by PaX
Team) should work fine despite the false positives which you described:

If some static inline function already does explicit calls (so is_leaf is
false), adding the track_stack() call will not introduce anything special that
can break the aforementioned register clobbering ABI in that function.

Does it sound reasonable?

However, I don't know what to with false negatives.

> TBH the paravirt save_fl stuff seems like dancing on the edge,
> but that's another story. :-)

That's interesting. Could you elaborate on that?

>>>> +	/*
>>>> +	 * The stackleak_final pass should be executed before the "final" pass,
>>>> +	 * which turns the RTL (Register Transfer Language) into assembly.
>>>> +	 */
>>>> +	PASS_INFO(stackleak_final, "final", 1, PASS_POS_INSERT_BEFORE);
>>>
>>> This might be too late, since it happens e.g. after addresses have
>>> been calculated for branch ranges, and after machine-specific passes
>>> (e.g. bundling on ia64).
>>>
>>> The stack size is final after reload, so inserting the pass after that
>>> might be better.
>>
>> Thanks for that notice. May I ask for the additional clarification?
>>
>> I specified -fdump-passes and see a lot of passes between reload and final:
...
>>
>> Where exactly would you recommend me to insert the stackleak_final pass, which
>> removes the unneeded track_stack() calls?
> 
> Directly after rtl-reload seems best.  That's the first point at which
> the frame size is final, and reload is one of the few rtl passes that
> always runs.  Doing it there could also help with things like shrink
> wrapping (part of rtl-pro_and_epilogue).
Thanks a lot for your detailed answer. I'll follow your advice in the next
version of the patch series.

Best regards,
Alexander

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 1/2] stackleak: Update for arm64
  2018-05-02 20:33 [PATCH 0/2] Stackleak for arm64 Laura Abbott
@ 2018-05-02 20:33   ` Laura Abbott
  0 siblings, 0 replies; 61+ messages in thread
From: Laura Abbott @ 2018-05-02 20:33 UTC (permalink / raw)
  To: Alexander Popov, Kees Cook, Mark Rutland, Ard Biesheuvel
  Cc: Laura Abbott, kernel-hardening, linux-arm-kernel, linux-kernel

arm64 has another layer of indirection in the RTL.
Account for this in the plugin.

Signed-off-by: Laura Abbott <labbott@redhat.com>
---
Fixed from previous version to be a vector expression.
---
 scripts/gcc-plugins/stackleak_plugin.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
index 6ac2a055ec61..0a55ecaf44df 100644
--- a/scripts/gcc-plugins/stackleak_plugin.c
+++ b/scripts/gcc-plugins/stackleak_plugin.c
@@ -253,6 +253,10 @@ static unsigned int stackleak_cleanup_execute(void)
 		 * that insn.
 		 */
 		body = PATTERN(insn);
+		/* arm64 is different */
+		if (GET_CODE(body) == PARALLEL)
+			body = XVECEXP(body, 0, 0);
+
 		if (GET_CODE(body) != CALL)
 			continue;
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 1/2] stackleak: Update for arm64
@ 2018-05-02 20:33   ` Laura Abbott
  0 siblings, 0 replies; 61+ messages in thread
From: Laura Abbott @ 2018-05-02 20:33 UTC (permalink / raw)
  To: linux-arm-kernel

arm64 has another layer of indirection in the RTL.
Account for this in the plugin.

Signed-off-by: Laura Abbott <labbott@redhat.com>
---
Fixed from previous version to be a vector expression.
---
 scripts/gcc-plugins/stackleak_plugin.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
index 6ac2a055ec61..0a55ecaf44df 100644
--- a/scripts/gcc-plugins/stackleak_plugin.c
+++ b/scripts/gcc-plugins/stackleak_plugin.c
@@ -253,6 +253,10 @@ static unsigned int stackleak_cleanup_execute(void)
 		 * that insn.
 		 */
 		body = PATTERN(insn);
+		/* arm64 is different */
+		if (GET_CODE(body) == PARALLEL)
+			body = XVECEXP(body, 0, 0);
+
 		if (GET_CODE(body) != CALL)
 			continue;
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2018-05-02 20:33 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-16 18:10 [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it Alexander Popov
2018-02-16 18:10 ` [PATCH RFC v8 1/6] x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls Alexander Popov
2018-02-21 13:24   ` Borislav Petkov
2018-02-21 21:49     ` Alexander Popov
2018-02-22 19:14       ` Borislav Petkov
2018-02-22 20:24         ` Alexander Popov
2018-02-16 18:10 ` [PATCH RFC v8 2/6] gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack Alexander Popov
2018-02-16 18:10 ` [PATCH RFC v8 3/6] x86/entry: Erase kernel stack in syscall_trace_enter() Alexander Popov
2018-02-16 18:10 ` [PATCH RFC v8 4/6] lkdtm: Add a test for STACKLEAK Alexander Popov
2018-02-16 18:10 ` [PATCH RFC v8 5/6] fs/proc: Show STACKLEAK metrics in the /proc file system Alexander Popov
2018-02-16 18:10 ` [PATCH RFC v8 6/6] doc: self-protection: Add information about STACKLEAK feature Alexander Popov
2018-02-20 10:29 ` [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it Alexander Popov
2018-02-20 23:17   ` Kees Cook
2018-02-20 23:33     ` Laura Abbott
2018-02-21  1:13       ` [PATCH 0/2] Stackleak for arm64 Laura Abbott
2018-02-21  1:13         ` Laura Abbott
2018-02-21  1:13         ` [PATCH 1/2] stackleak: Update " Laura Abbott
2018-02-21  1:13           ` Laura Abbott
2018-02-22 16:58           ` Will Deacon
2018-02-22 16:58             ` Will Deacon
2018-02-22 19:22             ` Alexander Popov
2018-02-22 19:22               ` Alexander Popov
2018-02-27 10:21               ` Richard Sandiford
2018-02-27 10:21                 ` Richard Sandiford
2018-02-27 10:21                 ` Richard Sandiford
2018-02-28 15:09                 ` Alexander Popov
2018-02-28 15:09                   ` Alexander Popov
2018-03-01 10:33                   ` Richard Sandiford
2018-03-01 10:33                     ` Richard Sandiford
2018-03-01 10:33                     ` Richard Sandiford
2018-03-02 11:14                     ` Alexander Popov
2018-03-02 11:14                       ` Alexander Popov
2018-02-22 19:38             ` Laura Abbott
2018-02-22 19:38               ` Laura Abbott
2018-02-21  1:13         ` [PATCH 2/2] arm64: Clear the stack Laura Abbott
2018-02-21  1:13           ` Laura Abbott
2018-02-21 15:38           ` Mark Rutland
2018-02-21 15:38             ` Mark Rutland
2018-02-21 23:53             ` Laura Abbott
2018-02-21 23:53               ` Laura Abbott
2018-02-22  1:35               ` Laura Abbott
2018-02-22  1:35                 ` Laura Abbott
2018-02-21 14:48         ` [PATCH 0/2] Stackleak for arm64 Alexander Popov
2018-02-21 14:48           ` Alexander Popov
2018-02-21 10:05     ` [PATCH RFC v8 0/6] Introduce the STACKLEAK feature and a test for it Borislav Petkov
2018-02-21 15:09       ` Alexander Popov
2018-02-21 14:43     ` Alexander Popov
2018-02-22  1:43 ` Laura Abbott
2018-02-22 23:14   ` [PATCH 0/2] Update stackleak for gcc-8 Laura Abbott
2018-02-22 23:14     ` [PATCH 1/2] gcc-plugins: Update cgraph_create_edge " Laura Abbott
2018-02-22 23:40       ` Kees Cook
2018-02-23 17:30         ` Laura Abbott
2018-02-24 12:36           ` Alexander Popov
2018-02-22 23:14     ` [PATCH 2/2] gcc-plugins: stackleak: Update " Laura Abbott
2018-02-24 14:04       ` Alexander Popov
2018-02-26 21:51         ` Laura Abbott
2018-02-27 10:30           ` Richard Sandiford
2018-02-28 10:27             ` Alexander Popov
2018-02-22 23:43     ` [PATCH 0/2] Update stackleak " Kees Cook
2018-05-02 20:33 [PATCH 0/2] Stackleak for arm64 Laura Abbott
2018-05-02 20:33 ` [PATCH 1/2] stackleak: Update " Laura Abbott
2018-05-02 20:33   ` Laura Abbott

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.