kernel-hardening.lists.openwall.com archive mirror
* [PATCH v2 0/5] Optionally randomize kernel stack offset each syscall
@ 2020-03-24 20:32 Kees Cook
  2020-03-24 20:32 ` [PATCH v2 1/5] jump_label: Provide CONFIG-driven build state defaults Kees Cook
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Kees Cook @ 2020-03-24 20:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kees Cook, Elena Reshetova, x86, Andy Lutomirski, Peter Zijlstra,
	Catalin Marinas, Will Deacon, Mark Rutland, Alexander Potapenko,
	Ard Biesheuvel, Jann Horn, Perla, Enrico, kernel-hardening,
	linux-arm-kernel, linux-mm, linux-kernel

Hi,

This is a continuation and refactoring of Elena's earlier effort to add
kernel stack base offset randomization. In the time since the previous
discussions, two attacks[1][2] were made public that depended on stack
determinism, so we're no longer in the position of "this is a good idea
but we have no examples of attacks". :)

Earlier discussions also devolved into debates on entropy sources, which
is mostly a red herring, given the already low entropy available due
to stack size. Regardless, entropy can be changed/improved separately
from this series as needed.

Earlier discussions also got stuck debating how much syscall overhead
was too much, but this is also a red herring since the feature itself
needs to be selectable at boot with no cost for those that don't want it:
this is solved here with static branches.
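
For anyone skimming, the zero-cost pattern this relies on is roughly
the following minimal sketch (the "example_feature" names are made up
purely for illustration; the real code is in patches 1 and 3):

#include <linux/init.h>
#include <linux/jump_label.h>

static DEFINE_STATIC_KEY_FALSE(example_feature);

/* Flip the key once at boot (e.g. via a hypothetical "example_feature"
 * kernel command line parameter). */
static int __init early_example_feature(char *buf)
{
	static_branch_enable(&example_feature);
	return 0;
}
early_param("example_feature", early_example_feature);

static void hot_path(void)
{
	/* Compiles down to a single NOP when the key is disabled. */
	if (static_branch_unlikely(&example_feature)) {
		/* ... optional, feature-specific work ... */
	}
}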

So, here is an improved version, made as arch-agnostic as possible,
with usage added for x86 and arm64. It also includes some small static
branch clean ups.

-Kees

[1] https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
[2] https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf

v2:
- move to per-cpu rdtsc() saved on syscall exit
- add static branches for zero-cost dynamic enabling
- Kconfig just selects the default state of static branch
- __builtin_alloca() produces ugly asm without -fno-stack-clash-protection
- made arch agnostic
rfc: https://lore.kernel.org/kernel-hardening/20190329081358.30497-1-elena.reshetova@intel.com/

Kees Cook (5):
  jump_label: Provide CONFIG-driven build state defaults
  init_on_alloc: Unpessimize default-on builds
  stack: Optionally randomize kernel stack offset each syscall
  x86/entry: Enable random_kstack_offset support
  arm64: entry: Enable random_kstack_offset support

 Makefile                         |  4 ++++
 arch/Kconfig                     | 19 +++++++++++++++
 arch/arm64/Kconfig               |  1 +
 arch/arm64/kernel/syscall.c      | 10 ++++++++
 arch/x86/Kconfig                 |  1 +
 arch/x86/entry/common.c          | 12 +++++++++-
 include/linux/jump_label.h       | 19 +++++++++++++++
 include/linux/mm.h               | 18 +++++---------
 include/linux/randomize_kstack.h | 40 ++++++++++++++++++++++++++++++++
 init/main.c                      | 23 ++++++++++++++++++
 mm/page_alloc.c                  | 12 ++--------
 11 files changed, 136 insertions(+), 23 deletions(-)
 create mode 100644 include/linux/randomize_kstack.h

-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 1/5] jump_label: Provide CONFIG-driven build state defaults
  2020-03-24 20:32 [PATCH v2 0/5] Optionally randomize kernel stack offset each syscall Kees Cook
@ 2020-03-24 20:32 ` Kees Cook
  2020-03-24 22:06   ` Peter Zijlstra
  2020-03-24 20:32 ` [PATCH v2 2/5] init_on_alloc: Unpessimize default-on builds Kees Cook
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Kees Cook @ 2020-03-24 20:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kees Cook, Elena Reshetova, x86, Andy Lutomirski, Peter Zijlstra,
	Catalin Marinas, Will Deacon, Mark Rutland, Alexander Potapenko,
	Ard Biesheuvel, Jann Horn, Perla, Enrico, kernel-hardening,
	linux-arm-kernel, linux-mm, linux-kernel

Choosing the initial state of static branches changes the assembly
layout: the code is placed inline if the condition is expected to be
likely, or out of line via a jump if unlikely. A few places in the
kernel use (or could be
using) a CONFIG to choose the default state, so provide the
infrastructure to do this and convert the existing cases (init_on_alloc
and init_on_free) to the new macros.
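
For illustration, usage of the new helpers looks roughly like this
(CONFIG_FOO_DEFAULT_ON and the "foo" key are hypothetical, shown only
to sketch the intended pattern):

/* In a header: */
DECLARE_STATIC_KEY_MAYBE(CONFIG_FOO_DEFAULT_ON, foo);

/* In one C file: */
DEFINE_STATIC_KEY_MAYBE(CONFIG_FOO_DEFAULT_ON, foo);

static inline bool foo_enabled(void)
{
	/*
	 * Inline (likely) when the CONFIG defaults the key to true,
	 * out of line (unlikely) when it defaults to false.
	 */
	return static_branch_maybe(CONFIG_FOO_DEFAULT_ON, &foo);
}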

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/jump_label.h | 19 +++++++++++++++++++
 include/linux/mm.h         | 12 ++----------
 mm/page_alloc.c            | 12 ++----------
 3 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 3526c0aee954..615fdfb871a3 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -382,6 +382,21 @@ struct static_key_false {
 		[0 ... (count) - 1] = STATIC_KEY_FALSE_INIT,	\
 	}
 
+#define _DEFINE_STATIC_KEY_1(name)	DEFINE_STATIC_KEY_TRUE(name)
+#define _DEFINE_STATIC_KEY_0(name)	DEFINE_STATIC_KEY_FALSE(name)
+#define DEFINE_STATIC_KEY_MAYBE(cfg, name)			\
+	__PASTE(_DEFINE_STATIC_KEY_, IS_ENABLED(cfg))(name)
+
+#define _DEFINE_STATIC_KEY_RO_1(name)	DEFINE_STATIC_KEY_TRUE_RO(name)
+#define _DEFINE_STATIC_KEY_RO_0(name)	DEFINE_STATIC_KEY_FALSE_RO(name)
+#define DEFINE_STATIC_KEY_MAYBE_RO(cfg, name)			\
+	__PASTE(_DEFINE_STATIC_KEY_RO_, IS_ENABLED(cfg))(name)
+
+#define _DECLARE_STATIC_KEY_1(name)	DECLARE_STATIC_KEY_TRUE(name)
+#define _DECLARE_STATIC_KEY_0(name)	DECLARE_STATIC_KEY_FALSE(name)
+#define DECLARE_STATIC_KEY_MAYBE(cfg, name)			\
+	__PASTE(_DECLARE_STATIC_KEY_, IS_ENABLED(cfg))(name)
+
 extern bool ____wrong_branch_error(void);
 
 #define static_key_enabled(x)							\
@@ -482,6 +497,10 @@ extern bool ____wrong_branch_error(void);
 
 #endif /* CONFIG_JUMP_LABEL */
 
+#define static_branch_maybe(config, x)					\
+	(IS_ENABLED(config) ? static_branch_likely(x)			\
+			    : static_branch_unlikely(x))
+
 /*
  * Advanced usage; refcount, branch is enabled when: count != 0
  */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c54fb96cb1e6..059658604dd6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2662,11 +2662,7 @@ static inline void kernel_poison_pages(struct page *page, int numpages,
 					int enable) { }
 #endif
 
-#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
-DECLARE_STATIC_KEY_TRUE(init_on_alloc);
-#else
-DECLARE_STATIC_KEY_FALSE(init_on_alloc);
-#endif
+DECLARE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_ALLOC_DEFAULT_ON, init_on_alloc);
 static inline bool want_init_on_alloc(gfp_t flags)
 {
 	if (static_branch_unlikely(&init_on_alloc) &&
@@ -2675,11 +2671,7 @@ static inline bool want_init_on_alloc(gfp_t flags)
 	return flags & __GFP_ZERO;
 }
 
-#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
-DECLARE_STATIC_KEY_TRUE(init_on_free);
-#else
-DECLARE_STATIC_KEY_FALSE(init_on_free);
-#endif
+DECLARE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_FREE_DEFAULT_ON, init_on_free);
 static inline bool want_init_on_free(void)
 {
 	return static_branch_unlikely(&init_on_free) &&
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3c4eb750a199..1f625e5a03c0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -135,18 +135,10 @@ unsigned long totalcma_pages __read_mostly;
 
 int percpu_pagelist_fraction;
 gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
-#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
-DEFINE_STATIC_KEY_TRUE(init_on_alloc);
-#else
-DEFINE_STATIC_KEY_FALSE(init_on_alloc);
-#endif
+DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_ALLOC_DEFAULT_ON, init_on_alloc);
 EXPORT_SYMBOL(init_on_alloc);
 
-#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
-DEFINE_STATIC_KEY_TRUE(init_on_free);
-#else
-DEFINE_STATIC_KEY_FALSE(init_on_free);
-#endif
+DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_FREE_DEFAULT_ON, init_on_free);
 EXPORT_SYMBOL(init_on_free);
 
 static int __init early_init_on_alloc(char *buf)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 2/5] init_on_alloc: Unpessimize default-on builds
  2020-03-24 20:32 [PATCH v2 0/5] Optionally randomize kernel stack offset each syscall Kees Cook
  2020-03-24 20:32 ` [PATCH v2 1/5] jump_label: Provide CONFIG-driven build state defaults Kees Cook
@ 2020-03-24 20:32 ` Kees Cook
  2020-03-26 15:48   ` Alexander Potapenko
  2020-03-24 20:32 ` [PATCH v2 3/5] stack: Optionally randomize kernel stack offset each syscall Kees Cook
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Kees Cook @ 2020-03-24 20:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kees Cook, Elena Reshetova, x86, Andy Lutomirski, Peter Zijlstra,
	Catalin Marinas, Will Deacon, Mark Rutland, Alexander Potapenko,
	Ard Biesheuvel, Jann Horn, Perla, Enrico, kernel-hardening,
	linux-arm-kernel, linux-mm, linux-kernel

Right now, the state of CONFIG_INIT_ON_ALLOC_DEFAULT_ON (and
...ON_FREE...) does not change the assembly ordering of the static branch
tests. Use the new jump_label macros to check the CONFIG settings and
default to the "expected" state, which unpessimizes the resulting
assembly code.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/mm.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 059658604dd6..64e911159ffa 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2665,7 +2665,8 @@ static inline void kernel_poison_pages(struct page *page, int numpages,
 DECLARE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_ALLOC_DEFAULT_ON, init_on_alloc);
 static inline bool want_init_on_alloc(gfp_t flags)
 {
-	if (static_branch_unlikely(&init_on_alloc) &&
+	if (static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
+				&init_on_alloc) &&
 	    !page_poisoning_enabled())
 		return true;
 	return flags & __GFP_ZERO;
@@ -2674,7 +2675,8 @@ static inline bool want_init_on_alloc(gfp_t flags)
 DECLARE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_FREE_DEFAULT_ON, init_on_free);
 static inline bool want_init_on_free(void)
 {
-	return static_branch_unlikely(&init_on_free) &&
+	return static_branch_maybe(CONFIG_INIT_ON_FREE_DEFAULT_ON,
+				   &init_on_free) &&
 	       !page_poisoning_enabled();
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 3/5] stack: Optionally randomize kernel stack offset each syscall
  2020-03-24 20:32 [PATCH v2 0/5] Optionally randomize kernel stack offset each syscall Kees Cook
  2020-03-24 20:32 ` [PATCH v2 1/5] jump_label: Provide CONFIG-driven build state defaults Kees Cook
  2020-03-24 20:32 ` [PATCH v2 2/5] init_on_alloc: Unpessimize default-on builds Kees Cook
@ 2020-03-24 20:32 ` Kees Cook
  2020-03-30 11:25   ` Mark Rutland
  2020-03-24 20:32 ` [PATCH v2 4/5] x86/entry: Enable random_kstack_offset support Kees Cook
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Kees Cook @ 2020-03-24 20:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kees Cook, Elena Reshetova, x86, Andy Lutomirski, Peter Zijlstra,
	Catalin Marinas, Will Deacon, Mark Rutland, Alexander Potapenko,
	Ard Biesheuvel, Jann Horn, Perla, Enrico, kernel-hardening,
	linux-arm-kernel, linux-mm, linux-kernel

This provides the ability for architectures to enable kernel stack base
address offset randomization. This feature is controlled by the boot
param "randomize_kstack_offset=on/off", with its default value set by
CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.

This feature is based on the original idea from the last public release
of PaX's RANDKSTACK feature: https://pax.grsecurity.net/docs/randkstack.txt
All the credit for the original idea goes to the PaX team. Note that
the design and implementation of this upstream randomize_kstack_offset
feature differs greatly from the RANDKSTACK feature (see below).

Reasoning for the feature:

This feature aims to make the various stack-based attacks that rely
on deterministic stack structure harder to carry out. We have had many
such attacks in the past (to name just a few):

https://jon.oberheide.org/files/infiltrate12-thestackisback.pdf
https://jon.oberheide.org/files/stackjacking-infiltrate11.pdf
https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html

As Linux kernel stack protections have been constantly improving
(vmap-based stack allocation with guard pages, removal of thread_info,
STACKLEAK), attackers have to find new ways for their exploits to work.
They have done so, continuing to rely on the kernel's stack determinism,
in situations where VMAP_STACK and THREAD_INFO_IN_TASK_STRUCT were not
relevant. For example, the following recent attacks would have been
hampered if the stack offset was non-deterministic between syscalls:

https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf
https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html

The main idea is that since the stack offset is randomized upon each
system call, it is hard for an attack to reliably land in any particular
place on the thread stack, even with address exposures, as the stack base
will change on the next syscall. Also, since randomization is performed
after placing pt_regs, the ptrace-based approach[1] to discover the
randomized offset during a long-running syscall should not be possible.

Design description:

During most of the kernel's execution, it runs on the "thread stack",
which is allocated in fork.c/dup_task_struct() and stored in a per-task
variable (tsk->stack). Since the stack grows downward, the stack top
can always be calculated using the task_top_of_stack(tsk) function,
which essentially returns the address of tsk->stack + stack size. When
VMAP_STACK is enabled, the thread stack is allocated from vmalloc space.

The thread stack is pretty deterministic in its structure: it is fixed
in size, and upon every entry from userspace to the kernel on a syscall,
the thread stack starts being constructed from an address fetched from
the per-cpu cpu_current_top_of_stack variable. The first element pushed
to the thread stack is the pt_regs struct, which stores all required
CPU registers and the syscall parameters.

The goal of the randomize_kstack_offset feature is to add a random
offset between the pt_regs that has been pushed onto the stack and the
rest of the thread stack (used during syscall processing), every time
a process issues a syscall. The source of randomness is currently
arch-defined (x86 uses the low byte of rdtsc()). Future improvements
to the entropy source are possible, but out of scope for this patch.
The offset is added using an alloca() call, since that avoids changes
to the assembly syscall entry code and the unwinder, and provides
correct stack alignment as defined by the compiler.
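
In other words, the entry and exit paths do roughly the following (a
simplified sketch of the idea only; the real add_random_kstack_offset()
and choose_random_kstack_offset() macros are in the diff below):

	/* Syscall entry: shift the stack by a bounded random amount. */
	u32 offset = this_cpu_read(kstack_offset);
	char *ptr = __builtin_alloca(offset & 0x3FF);
	/* Keep the allocation alive so the compiler cannot elide it. */
	asm volatile("" : "=m"(*ptr));
	/* ... dispatch the syscall on the shifted stack ... */

	/* Syscall exit: mix fresh entropy in for the next syscall. */
	this_cpu_write(kstack_offset, this_cpu_read(kstack_offset) ^ rand);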

In order to make this available by default with zero performance impact
for those that don't want it, it is now controlled by static branches.
This way, if the overhead is not wanted, the feature can simply be
turned off.

Using the per-cpu variable as the entropy source and __builtin_alloca()
for stack adjustment and alignment, the generated assembly for x86_64
with GCC looks like this:

...
ffffffff81003977: 65 8b 05 02 ea 00 7f  mov %gs:0x7f00ea02(%rip),%eax
					    # 12380 <kstack_offset>
ffffffff8100397e: 25 ff 03 00 00        and $0x3ff,%eax
ffffffff81003983: 48 83 c0 0f           add $0xf,%rax
ffffffff81003987: 25 f8 07 00 00        and $0x7f8,%eax
ffffffff8100398c: 48 29 c4              sub %rax,%rsp
ffffffff8100398f: 48 8d 44 24 0f        lea 0xf(%rsp),%rax
ffffffff81003994: 48 83 e0 f0           and $0xfffffffffffffff0,%rax
...

As a result of the above stack alignment, this patch introduces about
5 bits of randomness after pt_regs is spilled to the thread stack on
x86_64, and 6 bits on x86_32 (since it has 1 fewer bit required for
stack alignment). The amount of entropy could be adjusted based on how
much of the stack space we wish to trade for security.

My measure of syscall performance overhead (on x86_64):

lmbench: /usr/lib/lmbench/bin/x86_64-linux-gnu/lat_syscall -N 10000 null
    randomize_kstack_offset=y	Simple syscall: 0.7082 microseconds
    randomize_kstack_offset=n	Simple syscall: 0.7016 microseconds

So, roughly 0.9% overhead growth for a no-op syscall, which is very
manageable. And for people that don't want this, it's off by default.

Comparison to PaX RANDKSTACK feature:

The RANDKSTACK feature randomizes the location of the stack start
(cpu_current_top_of_stack), i.e. including the location of pt_regs
structure itself on the stack. Initially this patch followed the same
approach, but during the recent discussions[2] it was determined to be
of little value since, if ptrace functionality is available to an
attacker, they can use PTRACE_PEEKUSR/PTRACE_POKEUSR to read/write
different offsets in the pt_regs struct, observe the cache behavior of
the pt_regs accesses, and figure out the random stack offset. Another
difference is that the random offset is stored in a per-cpu variable
rather than per-thread. As a result, the two implementations differ a
fair bit in their details and results, though obviously the intent is
similar.

[1] https://lore.kernel.org/kernel-hardening/2236FBA76BA1254E88B949DDB74E612BA4BC57C1@IRSMSX102.ger.corp.intel.com/
[2] https://lore.kernel.org/kernel-hardening/20190329081358.30497-1-elena.reshetova@intel.com/

Co-developed-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Link: https://lore.kernel.org/r/20190415060918.3766-1-elena.reshetova@intel.com
Signed-off-by: Kees Cook <keescook@chromium.org>
---
v2:
- move to per-cpu rdtsc() saved on syscall exit
- add static branches for zero-cost dynamic enabling
- Kconfig just selects the default state of static branch
- __builtin_alloca() produces ugly asm without -fno-stack-clash-protection
- made arch agnostic
rfc: https://lore.kernel.org/kernel-hardening/20190329081358.30497-1-elena.reshetova@intel.com/
---
 Makefile                         |  4 ++++
 arch/Kconfig                     | 19 +++++++++++++++
 include/linux/randomize_kstack.h | 40 ++++++++++++++++++++++++++++++++
 init/main.c                      | 23 ++++++++++++++++++
 4 files changed, 86 insertions(+)
 create mode 100644 include/linux/randomize_kstack.h

diff --git a/Makefile b/Makefile
index 171f2b004c8a..c99463406522 100644
--- a/Makefile
+++ b/Makefile
@@ -779,6 +779,10 @@ ifdef CONFIG_INIT_STACK_ALL
 KBUILD_CFLAGS	+= -ftrivial-auto-var-init=pattern
 endif
 
+# While VLAs have been removed, GCC produces unreachable stack probes
+# for the random_kstack_offset feature. Disable it for all compilers.
+KBUILD_CFLAGS	+= $(call cc-option,-fno-stack-clash-protection,)
+
 DEBUG_CFLAGS	:= $(call cc-option, -fno-var-tracking-assignments)
 
 ifdef CONFIG_DEBUG_INFO
diff --git a/arch/Kconfig b/arch/Kconfig
index 17fe351cdde0..619a56da4b76 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -854,6 +854,25 @@ config VMAP_STACK
 	  virtual mappings with real shadow memory, and KASAN_VMALLOC must
 	  be enabled.
 
+config HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
+	def_bool n
+	help
+	  An arch should select this symbol if it can support kernel stack
+	  offset randomization with calls to add_random_kstack_offset()
+	  during syscall entry and choose_random_kstack_offset() during
+	  syscall exit.
+
+config RANDOMIZE_KSTACK_OFFSET_DEFAULT
+	bool "Randomize kernel stack offset on syscall entry"
+	depends on HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
+	help
+	  The kernel stack offset can be randomized (after pt_regs) by
+	  roughly 5 bits of entropy, frustrating memory corruption
+	  attacks that depend on stack address determinism or
+	  cross-syscall address exposures. This feature is controlled
+	  by kernel boot param "randomize_kstack_offset=on/off", and this
+	  config chooses the default boot state.
+
 config ARCH_OPTIONAL_KERNEL_RWX
 	def_bool n
 
diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
new file mode 100644
index 000000000000..651ba9504568
--- /dev/null
+++ b/include/linux/randomize_kstack.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _LINUX_RANDOMIZE_KSTACK_H
+#define _LINUX_RANDOMIZE_KSTACK_H
+
+#include <linux/kernel.h>
+#include <linux/jump_label.h>
+#include <linux/percpu-defs.h>
+
+DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
+			 randomize_kstack_offset);
+DECLARE_PER_CPU(u32, kstack_offset);
+
+/*
+ * Do not use this anywhere else in the kernel. This is used here because
+ * it provides an arch-agnostic way to grow the stack with correct
+ * alignment. Also, since this use is being explicitly masked to a max of
+ * 10 bits, stack-clash style attacks are unlikely. For more details see
+ * "VLAs" in Documentation/process/deprecated.rst
+ */
+void *__builtin_alloca(size_t size);
+
+#define add_random_kstack_offset() do {					\
+	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
+				&randomize_kstack_offset)) {		\
+		u32 offset = this_cpu_read(kstack_offset);		\
+		char *ptr = __builtin_alloca(offset & 0x3FF);		\
+		asm volatile("" : "=m"(*ptr));				\
+	}								\
+} while (0)
+
+#define choose_random_kstack_offset(rand) do {				\
+	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
+				&randomize_kstack_offset)) {		\
+		u32 offset = this_cpu_read(kstack_offset);		\
+		offset ^= (rand);					\
+		this_cpu_write(kstack_offset, offset);			\
+	}								\
+} while (0)
+
+#endif
diff --git a/init/main.c b/init/main.c
index ee4947af823f..78fe3aea00b0 100644
--- a/init/main.c
+++ b/init/main.c
@@ -777,6 +777,29 @@ static void __init mm_init(void)
 	pti_init();
 }
 
+#ifdef CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
+DEFINE_STATIC_KEY_MAYBE_RO(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
+			   randomize_kstack_offset);
+DEFINE_PER_CPU(u32, kstack_offset);
+
+static int __init early_randomize_kstack_offset(char *buf)
+{
+	int ret;
+	bool bool_result;
+
+	ret = kstrtobool(buf, &bool_result);
+	if (ret)
+		return ret;
+
+	if (bool_result)
+		static_branch_enable(&randomize_kstack_offset);
+	else
+		static_branch_disable(&randomize_kstack_offset);
+	return 0;
+}
+early_param("randomize_kstack_offset", early_randomize_kstack_offset);
+#endif
+
 void __init __weak arch_call_rest_init(void)
 {
 	rest_init();
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 4/5] x86/entry: Enable random_kstack_offset support
  2020-03-24 20:32 [PATCH v2 0/5] Optionally randomize kernel stack offset each syscall Kees Cook
                   ` (2 preceding siblings ...)
  2020-03-24 20:32 ` [PATCH v2 3/5] stack: Optionally randomize kernel stack offset each syscall Kees Cook
@ 2020-03-24 20:32 ` Kees Cook
  2020-03-28 22:26   ` Kees Cook
  2020-03-24 20:32 ` [PATCH v2 5/5] arm64: entry: " Kees Cook
  2020-03-24 21:28 ` [PATCH v2 0/5] Optionally randomize kernel stack offset each syscall Jann Horn
  5 siblings, 1 reply; 26+ messages in thread
From: Kees Cook @ 2020-03-24 20:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kees Cook, Elena Reshetova, x86, Andy Lutomirski, Peter Zijlstra,
	Catalin Marinas, Will Deacon, Mark Rutland, Alexander Potapenko,
	Ard Biesheuvel, Jann Horn, Perla, Enrico, kernel-hardening,
	linux-arm-kernel, linux-mm, linux-kernel

Allow for a randomized stack offset on a per-syscall basis, with roughly
5 bits of entropy.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/Kconfig        |  1 +
 arch/x86/entry/common.c | 12 +++++++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index beea77046f9b..b9d449581eb6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -150,6 +150,7 @@ config X86
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if X86_64
 	select HAVE_ARCH_VMAP_STACK		if X86_64
+	select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
 	select HAVE_ARCH_WITHIN_STACK_FRAMES
 	select HAVE_ASM_MODVERSIONS
 	select HAVE_CMPXCHG_DOUBLE
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 9747876980b5..086d7af570af 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -26,6 +26,7 @@
 #include <linux/livepatch.h>
 #include <linux/syscalls.h>
 #include <linux/uaccess.h>
+#include <linux/randomize_kstack.h>
 
 #include <asm/desc.h>
 #include <asm/traps.h>
@@ -189,6 +190,13 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
 	lockdep_assert_irqs_disabled();
 	lockdep_sys_exit();
 
+	/*
+	 * x86_64 stack alignment means 3 bits are ignored, so keep
+	 * the top 5 bits. x86_32 needs only 2 bits of alignment, so
+	 * the top 6 bits will be used.
+	 */
+	choose_random_kstack_offset(rdtsc() & 0xFF);
+
 	cached_flags = READ_ONCE(ti->flags);
 
 	if (unlikely(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
@@ -283,6 +291,7 @@ __visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
 {
 	struct thread_info *ti;
 
+	add_random_kstack_offset();
 	enter_from_user_mode();
 	local_irq_enable();
 	ti = current_thread_info();
@@ -355,6 +364,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
 /* Handles int $0x80 */
 __visible void do_int80_syscall_32(struct pt_regs *regs)
 {
+	add_random_kstack_offset();
 	enter_from_user_mode();
 	local_irq_enable();
 	do_syscall_32_irqs_on(regs);
@@ -378,8 +388,8 @@ __visible long do_fast_syscall_32(struct pt_regs *regs)
 	 */
 	regs->ip = landing_pad;
 
+	add_random_kstack_offset();
 	enter_from_user_mode();
-
 	local_irq_enable();
 
 	/* Fetch EBP from where the vDSO stashed it. */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 5/5] arm64: entry: Enable random_kstack_offset support
  2020-03-24 20:32 [PATCH v2 0/5] Optionally randomize kernel stack offset each syscall Kees Cook
                   ` (3 preceding siblings ...)
  2020-03-24 20:32 ` [PATCH v2 4/5] x86/entry: Enable random_kstack_offset support Kees Cook
@ 2020-03-24 20:32 ` Kees Cook
  2020-03-25 13:21   ` Mark Rutland
  2020-04-20 20:54   ` Will Deacon
  2020-03-24 21:28 ` [PATCH v2 0/5] Optionally randomize kernel stack offset each syscall Jann Horn
  5 siblings, 2 replies; 26+ messages in thread
From: Kees Cook @ 2020-03-24 20:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kees Cook, Elena Reshetova, x86, Andy Lutomirski, Peter Zijlstra,
	Catalin Marinas, Will Deacon, Mark Rutland, Alexander Potapenko,
	Ard Biesheuvel, Jann Horn, Perla, Enrico, kernel-hardening,
	linux-arm-kernel, linux-mm, linux-kernel

Allow for a randomized stack offset on a per-syscall basis, with roughly
5 bits of entropy.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/arm64/Kconfig          |  1 +
 arch/arm64/kernel/syscall.c | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0b30e884e088..4d5aa4959f72 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -127,6 +127,7 @@ config ARM64
 	select HAVE_ARCH_MMAP_RND_BITS
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
 	select HAVE_ARCH_PREL32_RELOCATIONS
+	select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_STACKLEAK
 	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index a12c0c88d345..238dbd753b44 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -5,6 +5,7 @@
 #include <linux/errno.h>
 #include <linux/nospec.h>
 #include <linux/ptrace.h>
+#include <linux/randomize_kstack.h>
 #include <linux/syscalls.h>
 
 #include <asm/daifflags.h>
@@ -42,6 +43,8 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
 {
 	long ret;
 
+	add_random_kstack_offset();
+
 	if (scno < sc_nr) {
 		syscall_fn_t syscall_fn;
 		syscall_fn = syscall_table[array_index_nospec(scno, sc_nr)];
@@ -51,6 +54,13 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
 	}
 
 	regs->regs[0] = ret;
+
+	/*
+	 * Since the compiler chooses a 4 bit alignment for the stack,
+	 * let's save one additional bit (9 total), which gets us up
+	 * near 5 bits of entropy.
+	 */
+	choose_random_kstack_offset(get_random_int() & 0x1FF);
 }
 
 static inline bool has_syscall_work(unsigned long flags)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/5] Optionally randomize kernel stack offset each syscall
  2020-03-24 20:32 [PATCH v2 0/5] Optionally randomize kernel stack offset each syscall Kees Cook
                   ` (4 preceding siblings ...)
  2020-03-24 20:32 ` [PATCH v2 5/5] arm64: entry: " Kees Cook
@ 2020-03-24 21:28 ` Jann Horn
  2020-03-24 23:07   ` Kees Cook
  5 siblings, 1 reply; 26+ messages in thread
From: Jann Horn @ 2020-03-24 21:28 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Elena Reshetova, the arch/x86 maintainers,
	Andy Lutomirski, Peter Zijlstra, Catalin Marinas, Will Deacon,
	Mark Rutland, Alexander Potapenko, Ard Biesheuvel, Perla, Enrico,
	Kernel Hardening, linux-arm-kernel, Linux-MM, kernel list

On Tue, Mar 24, 2020 at 9:32 PM Kees Cook <keescook@chromium.org> wrote:
> This is a continuation and refactoring of Elena's earlier effort to add
> kernel stack base offset randomization. In the time since the previous
> discussions, two attacks[1][2] were made public that depended on stack
> determinism, so we're no longer in the position of "this is a good idea
> but we have no examples of attacks". :)
[...]
> [1] https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html

This one only starts using the stack's location after having parsed
it out of dmesg (which in any environment that wants to provide a
reasonable level of security really ought to be restricted to root),
right? If you give people read access to dmesg, they can leak all
sorts of pointers; not just the stack pointer, but also whatever else
happens to be in the registers at that point - which is likely to give
the attacker more ways to place controlled data at a known location.
See e.g. <https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html>,
which leaks the pointer to a BPF map out of dmesg.

Also, are you sure that it isn't possible to make the syscall that
leaked its stack pointer never return to userspace (via ptrace or
SIGSTOP or something like that), and therefore never realign its
stack, while keeping some controlled data present on the syscall's
stack?

> [2] https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf

That's a moderately large document; which specific part are you referencing?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/5] jump_label: Provide CONFIG-driven build state defaults
  2020-03-24 20:32 ` [PATCH v2 1/5] jump_label: Provide CONFIG-driven build state defaults Kees Cook
@ 2020-03-24 22:06   ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2020-03-24 22:06 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Elena Reshetova, x86, Andy Lutomirski,
	Catalin Marinas, Will Deacon, Mark Rutland, Alexander Potapenko,
	Ard Biesheuvel, Jann Horn, Perla, Enrico, kernel-hardening,
	linux-arm-kernel, linux-mm, linux-kernel

On Tue, Mar 24, 2020 at 01:32:27PM -0700, Kees Cook wrote:
> Choosing the initial state of static branches changes the assembly
> layout: the code is placed inline if the condition is expected to be
> likely, or out of line via a jump if unlikely. A few places in the
> kernel use (or could be
> using) a CONFIG to choose the default state, so provide the
> infrastructure to do this and convert the existing cases (init_on_alloc
> and init_on_free) to the new macros.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Cute,

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/5] Optionally randomize kernel stack offset each syscall
  2020-03-24 21:28 ` [PATCH v2 0/5] Optionally randomize kernel stack offset each syscall Jann Horn
@ 2020-03-24 23:07   ` Kees Cook
  2020-03-25 12:15     ` Reshetova, Elena
  0 siblings, 1 reply; 26+ messages in thread
From: Kees Cook @ 2020-03-24 23:07 UTC (permalink / raw)
  To: Jann Horn
  Cc: Thomas Gleixner, Elena Reshetova, the arch/x86 maintainers,
	Andy Lutomirski, Peter Zijlstra, Catalin Marinas, Will Deacon,
	Mark Rutland, Alexander Potapenko, Ard Biesheuvel,
	Kernel Hardening, linux-arm-kernel, Linux-MM, kernel list

[-enrico, who is bouncing]

On Tue, Mar 24, 2020 at 10:28:35PM +0100, Jann Horn wrote:
> On Tue, Mar 24, 2020 at 9:32 PM Kees Cook <keescook@chromium.org> wrote:
> > This is a continuation and refactoring of Elena's earlier effort to add
> > kernel stack base offset randomization. In the time since the previous
> > discussions, two attacks[1][2] were made public that depended on stack
> > determinism, so we're no longer in the position of "this is a good idea
> > but we have no examples of attacks". :)
> [...]
> > [1] https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
> 
> This one only starts using the stack's location after having parsed
> it out of dmesg (which in any environment that wants to provide a
> reasonable level of security really ought to be restricted to root),
> right? If you give people read access to dmesg, they can leak all
> sorts of pointers; not just the stack pointer, but also whatever else
> happens to be in the registers at that point - which is likely to give
> the attacker more ways to place controlled data at a known location.
> See e.g. <https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html>,
> which leaks the pointer to a BPF map out of dmesg.

It was mentioned that it would re-use the base across syscalls, so this
defense would have frustrated it.

More to my point was that there still are attacks using a deterministic
stack as part of the exploit chain. We have a low-cost way to make that
go away.

> Also, are you sure that it isn't possible to make the syscall that
> leaked its stack pointer never return to userspace (via ptrace or
> SIGSTOP or something like that), and therefore never realign its
> stack, while keeping some controlled data present on the syscall's
> stack?
> 
> > [2] https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf
> 
> That's a moderately large document; which specific part are you referencing?

IIRC, section 3.3 discusses using the stack for CFI bypass, though
thinking about it again, it may have been targeting pt_regs. I'll
double check and remove this reference if that's the case.

But, as I mention, this is proactive and I'd like to stop yet more
things from being able to depend on the stack location.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v2 0/5] Optionally randomize kernel stack offset each syscall
  2020-03-24 23:07   ` Kees Cook
@ 2020-03-25 12:15     ` Reshetova, Elena
  2020-03-25 20:27       ` Kees Cook
  0 siblings, 1 reply; 26+ messages in thread
From: Reshetova, Elena @ 2020-03-25 12:15 UTC (permalink / raw)
  To: Kees Cook, Jann Horn
  Cc: Thomas Gleixner, the arch/x86 maintainers, Andy Lutomirski,
	Peter Zijlstra, Catalin Marinas, Will Deacon, Mark Rutland,
	Alexander Potapenko, Ard Biesheuvel, Kernel Hardening,
	linux-arm-kernel, Linux-MM, kernel list


> > Also, are you sure that it isn't possible to make the syscall that
> > leaked its stack pointer never return to userspace (via ptrace or
> > SIGSTOP or something like that), and therefore never realign its
> > stack, while keeping some controlled data present on the syscall's
> > stack?

How would you reliably detect that a stack pointer has been leaked
to userspace while it has been in a syscall? Does not seem to be a trivial
task to me. 

Best Regards,
Elena. 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/5] arm64: entry: Enable random_kstack_offset support
  2020-03-24 20:32 ` [PATCH v2 5/5] arm64: entry: " Kees Cook
@ 2020-03-25 13:21   ` Mark Rutland
  2020-03-25 20:22     ` Kees Cook
  2020-04-20 20:54   ` Will Deacon
  1 sibling, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2020-03-25 13:21 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Elena Reshetova, x86, Andy Lutomirski,
	Peter Zijlstra, Catalin Marinas, Will Deacon,
	Alexander Potapenko, Ard Biesheuvel, Jann Horn, Perla, Enrico,
	kernel-hardening, linux-arm-kernel, linux-mm, linux-kernel

On Tue, Mar 24, 2020 at 01:32:31PM -0700, Kees Cook wrote:
> Allow for a randomized stack offset on a per-syscall basis, with roughly
> 5 bits of entropy.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Just to check, do you have an idea of the impact on arm64? Patch 3 had
figures for x86 where it reads the TSC, and it's unclear to me how
get_random_int() compares to that.

Otherwise, this looks sound to me; I'd just like to know whether the
overhead is in the same ballpark.

Thanks
Mark.

> ---
>  arch/arm64/Kconfig          |  1 +
>  arch/arm64/kernel/syscall.c | 10 ++++++++++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 0b30e884e088..4d5aa4959f72 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -127,6 +127,7 @@ config ARM64
>  	select HAVE_ARCH_MMAP_RND_BITS
>  	select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
>  	select HAVE_ARCH_PREL32_RELOCATIONS
> +	select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
>  	select HAVE_ARCH_SECCOMP_FILTER
>  	select HAVE_ARCH_STACKLEAK
>  	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
> diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> index a12c0c88d345..238dbd753b44 100644
> --- a/arch/arm64/kernel/syscall.c
> +++ b/arch/arm64/kernel/syscall.c
> @@ -5,6 +5,7 @@
>  #include <linux/errno.h>
>  #include <linux/nospec.h>
>  #include <linux/ptrace.h>
> +#include <linux/randomize_kstack.h>
>  #include <linux/syscalls.h>
>  
>  #include <asm/daifflags.h>
> @@ -42,6 +43,8 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
>  {
>  	long ret;
>  
> +	add_random_kstack_offset();
> +
>  	if (scno < sc_nr) {
>  		syscall_fn_t syscall_fn;
>  		syscall_fn = syscall_table[array_index_nospec(scno, sc_nr)];
> @@ -51,6 +54,13 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
>  	}
>  
>  	regs->regs[0] = ret;
> +
> +	/*
> +	 * Since the compiler chooses a 4 bit alignment for the stack,
> +	 * let's save one additional bit (9 total), which gets us up
> +	 * near 5 bits of entropy.
> +	 */
> +	choose_random_kstack_offset(get_random_int() & 0x1FF);
>  }
>  
>  static inline bool has_syscall_work(unsigned long flags)
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/5] arm64: entry: Enable random_kstack_offset support
  2020-03-25 13:21   ` Mark Rutland
@ 2020-03-25 20:22     ` Kees Cook
  2020-03-26 11:15       ` Mark Rutland
  0 siblings, 1 reply; 26+ messages in thread
From: Kees Cook @ 2020-03-25 20:22 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Thomas Gleixner, Elena Reshetova, x86, Andy Lutomirski,
	Peter Zijlstra, Catalin Marinas, Will Deacon,
	Alexander Potapenko, Ard Biesheuvel, Jann Horn, Perla, Enrico,
	kernel-hardening, linux-arm-kernel, linux-mm, linux-kernel

On Wed, Mar 25, 2020 at 01:21:27PM +0000, Mark Rutland wrote:
> On Tue, Mar 24, 2020 at 01:32:31PM -0700, Kees Cook wrote:
> > Allow for a randomized stack offset on a per-syscall basis, with roughly
> > 5 bits of entropy.
> > 
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> 
> Just to check, do you have an idea of the impact on arm64? Patch 3 had
> figures for x86 where it reads the TSC, and it's unclear to me how
> get_random_int() compares to that.

I didn't do a measurement on arm64 since I don't have a good bare-metal
test environment. I know Andy Lutomirski has plans for making
get_random_int() as fast as possible, so that's why I used it here. I
couldn't figure out if there was a comparable instruction like rdtsc in
aarch64 (it seems there's a cycle counter, but I found nothing in the
kernel that seemed to actually use it)?

> Otherwise, this looks sound to me; I'd jsut like to know whether the
> overhead is in the same ballpark.

Thanks!

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/5] Optionally randomize kernel stack offset each syscall
  2020-03-25 12:15     ` Reshetova, Elena
@ 2020-03-25 20:27       ` Kees Cook
  2020-03-25 23:20         ` Jann Horn
  0 siblings, 1 reply; 26+ messages in thread
From: Kees Cook @ 2020-03-25 20:27 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Jann Horn, Thomas Gleixner, the arch/x86 maintainers,
	Andy Lutomirski, Peter Zijlstra, Catalin Marinas, Will Deacon,
	Mark Rutland, Alexander Potapenko, Ard Biesheuvel,
	Kernel Hardening, linux-arm-kernel, Linux-MM, kernel list

On Wed, Mar 25, 2020 at 12:15:12PM +0000, Reshetova, Elena wrote:
> > > Also, are you sure that it isn't possible to make the syscall that
> > > leaked its stack pointer never return to userspace (via ptrace or
> > > SIGSTOP or something like that), and therefore never realign its
> > > stack, while keeping some controlled data present on the syscall's
> > > stack?
> 
> How would you reliably detect that a stack pointer has been leaked
> to userspace while it has been in a syscall? Does not seem to be a trivial
> task to me. 

Well, my expectation is that folks using this defense are also using
panic_on_warn sysctl, etc, so attackers don't get a chance to actually
_use_ register values spilled to dmesg.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/5] Optionally randomize kernel stack offset each syscall
  2020-03-25 20:27       ` Kees Cook
@ 2020-03-25 23:20         ` Jann Horn
  2020-03-26 17:18           ` Kees Cook
  0 siblings, 1 reply; 26+ messages in thread
From: Jann Horn @ 2020-03-25 23:20 UTC (permalink / raw)
  To: Kees Cook
  Cc: Reshetova, Elena, Thomas Gleixner, the arch/x86 maintainers,
	Andy Lutomirski, Peter Zijlstra, Catalin Marinas, Will Deacon,
	Mark Rutland, Alexander Potapenko, Ard Biesheuvel,
	Kernel Hardening, linux-arm-kernel, Linux-MM, kernel list

On Wed, Mar 25, 2020 at 9:27 PM Kees Cook <keescook@chromium.org> wrote:
> On Wed, Mar 25, 2020 at 12:15:12PM +0000, Reshetova, Elena wrote:
> > > > Also, are you sure that it isn't possible to make the syscall that
> > > > leaked its stack pointer never return to userspace (via ptrace or
> > > > SIGSTOP or something like that), and therefore never realign its
> > > > stack, while keeping some controlled data present on the syscall's
> > > > stack?
> >
> > How would you reliably detect that a stack pointer has been leaked
> > to userspace while it has been in a syscall? Does not seem to be a trivial
> > task to me.
>
> Well, my expectation is that folks using this defense are also using
> panic_on_warn sysctl, etc, so attackers don't get a chance to actually
> _use_ register values spilled to dmesg.

Uh... I thought that thing was exclusively for stuff like syzkaller,
because nuking the entire system because of a WARN is far too
excessive? WARNs should be safe to add almost anywhere in the kernel,
so that developers can put their assumptions about system behavior
into code without having to worry about bringing down the entire
system if that assumption turns out to have been false in some
harmless edgecase.

Also, there are other places that dump register state. In particular
the soft lockup detection, which you can IIRC easily trip even
accidentally if you play around with stuff like FUSE filesystems, or
if a disk becomes unresponsive. Sure, *theoretically* you can also set
the "panic on soft lockup" flag, but that seems like a really terrible
idea to me.

As far as I can tell, the only clean way to fix this is to tell
distros that give non-root users access to dmesg (Ubuntu in
particular) that they have to stop doing that. E.g. Debian seems to
get by just fine with root-restricted dmesg.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/5] arm64: entry: Enable random_kstack_offset support
  2020-03-25 20:22     ` Kees Cook
@ 2020-03-26 11:15       ` Mark Rutland
  2020-03-26 16:31         ` Kees Cook
  0 siblings, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2020-03-26 11:15 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Elena Reshetova, x86, Andy Lutomirski,
	Peter Zijlstra, Catalin Marinas, Will Deacon,
	Alexander Potapenko, Ard Biesheuvel, Jann Horn, Perla, Enrico,
	kernel-hardening, linux-arm-kernel, linux-mm, linux-kernel

On Wed, Mar 25, 2020 at 01:22:07PM -0700, Kees Cook wrote:
> On Wed, Mar 25, 2020 at 01:21:27PM +0000, Mark Rutland wrote:
> > On Tue, Mar 24, 2020 at 01:32:31PM -0700, Kees Cook wrote:
> > > Allow for a randomized stack offset on a per-syscall basis, with roughly
> > > 5 bits of entropy.
> > > 
> > > Signed-off-by: Kees Cook <keescook@chromium.org>
> > 
> > Just to check, do you have an idea of the impact on arm64? Patch 3 had
> > figures for x86 where it reads the TSC, and it's unclear to me how
> > get_random_int() compares to that.
> 
> I didn't do a measurement on arm64 since I don't have a good bare-metal
> test environment. I know Andy Lutomirski has plans for making
> get_random_int() as fast as possible, so that's why I used it here.

Ok. I suspect I also won't get the chance to test that in the next few
days, but if I do I'll try to share the results.

My concern here was that get_random_int() has to grab a spinlock and
mess with IRQ masking, so it has the potential to block for much longer,
but that might not be an issue in practice, and I don't think that
should block these patches.

> I couldn't figure out if there was a comparable instruction like rdtsc
> in aarch64 (it seems there's a cycle counter, but I found nothing in
> the kernel that seemed to actually use it)?

AArch64 doesn't have a direct equivalent. The generic counter
(CNTxCT_EL0) is the closest thing, but its nominal frequency is
typically much lower than the nominal CPU clock frequency (unlike TSC
where they're the same). The cycle counter (PMCCNTR_EL0) is part of the
PMU, and can't be relied on in the same way (e.g. as perf reprograms it
to generate overflow events, and it can stop for things like WFI/WFE).
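
(For completeness, and purely as an illustration rather than anything
this series needs: the generic counter can be read directly from C with
a single mrs, e.g.

	/* Illustrative sketch only; assumes an AArch64 target. */
	static inline u64 read_cntvct(void)
	{
		u64 cnt;

		asm volatile("mrs %0, cntvct_el0" : "=r" (cnt));
		return cnt;
	}

but as noted above it typically ticks much more slowly than the CPU
clock.)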

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/5] init_on_alloc: Unpessimize default-on builds
  2020-03-24 20:32 ` [PATCH v2 2/5] init_on_alloc: Unpessimize default-on builds Kees Cook
@ 2020-03-26 15:48   ` Alexander Potapenko
  0 siblings, 0 replies; 26+ messages in thread
From: Alexander Potapenko @ 2020-03-26 15:48 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Elena Reshetova, the arch/x86 maintainers,
	Andy Lutomirski, Peter Zijlstra, Catalin Marinas, Will Deacon,
	Mark Rutland, Ard Biesheuvel, Jann Horn, Perla, Enrico,
	Kernel Hardening, linux-arm-kernel, Linux Memory Management List,
	LKML

On Tue, Mar 24, 2020 at 9:32 PM Kees Cook <keescook@chromium.org> wrote:
>
> Right now, the state of CONFIG_INIT_ON_ALLOC_DEFAULT_ON (and
> ...ON_FREE...) does not change the assembly ordering of the static branch
> tests. Use the new jump_label macros to check the CONFIG settings and
> default to the "expected" state, which unpessimizes the resulting
> assembly code.
>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  include/linux/mm.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 059658604dd6..64e911159ffa 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2665,7 +2665,8 @@ static inline void kernel_poison_pages(struct page *page, int numpages,
>  DECLARE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_ALLOC_DEFAULT_ON, init_on_alloc);
>  static inline bool want_init_on_alloc(gfp_t flags)
>  {
> -       if (static_branch_unlikely(&init_on_alloc) &&
> +       if (static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
> +                               &init_on_alloc) &&
>             !page_poisoning_enabled())
>                 return true;
>         return flags & __GFP_ZERO;
> @@ -2674,7 +2675,8 @@ static inline bool want_init_on_alloc(gfp_t flags)
>  DECLARE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_FREE_DEFAULT_ON, init_on_free);
>  static inline bool want_init_on_free(void)
>  {
> -       return static_branch_unlikely(&init_on_free) &&
> +       return static_branch_maybe(CONFIG_INIT_ON_FREE_DEFAULT_ON,
> +                                  &init_on_free) &&
>                !page_poisoning_enabled();
>  }
>
> --
> 2.20.1
>
Reviewed-by: Alexander Potapenko <glider@google.com>

-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/5] arm64: entry: Enable random_kstack_offset support
  2020-03-26 11:15       ` Mark Rutland
@ 2020-03-26 16:31         ` Kees Cook
  2020-03-30 11:26           ` Mark Rutland
  0 siblings, 1 reply; 26+ messages in thread
From: Kees Cook @ 2020-03-26 16:31 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Thomas Gleixner, Elena Reshetova, x86, Andy Lutomirski,
	Peter Zijlstra, Catalin Marinas, Will Deacon,
	Alexander Potapenko, Ard Biesheuvel, Jann Horn, Perla, Enrico,
	kernel-hardening, linux-arm-kernel, linux-mm, linux-kernel

On Thu, Mar 26, 2020 at 11:15:21AM +0000, Mark Rutland wrote:
> On Wed, Mar 25, 2020 at 01:22:07PM -0700, Kees Cook wrote:
> > On Wed, Mar 25, 2020 at 01:21:27PM +0000, Mark Rutland wrote:
> > > On Tue, Mar 24, 2020 at 01:32:31PM -0700, Kees Cook wrote:
> > > > Allow for a randomized stack offset on a per-syscall basis, with roughly
> > > > 5 bits of entropy.
> > > > 
> > > > Signed-off-by: Kees Cook <keescook@chromium.org>
> > > 
> > > Just to check, do you have an idea of the impact on arm64? Patch 3 had
> > > figures for x86 where it reads the TSC, and it's unclear to me how
> > > get_random_int() compares to that.
> > 
> > I didn't do a measurement on arm64 since I don't have a good bare-metal
> > test environment. I know Andy Lutomirski has plans for making
> > get_random_int() as fast as possible, so that's why I used it here.
> 
> Ok. I suspect I also won't get the chance to test that in the next few
> days, but if I do I'll try to share the results.

Okay, thanks! I can try a rough estimate under emulation, but I assume
that'll be mostly useless. :)

> My concern here was that get_random_int() has to grab a spinlock and
> mess with IRQ masking, so it has the potential to block for much longer,
> but that might not be an issue in practice, and I don't think that
> should block these patches.

Gotcha. I was already surprised by how "heavy" the per-cpu access was
when I looked at the resulting assembly (there looked to be preempt
stuff, etc). But my hope was that this is configurable so people can
measure for themselves if they want it, and most people who want this
feature have a high tolerance for performance trade-offs. ;)

> > I couldn't figure out if there was a comparable instruction like rdtsc
> > in aarch64 (it seems there's a cycle counter, but I found nothing in
> > the kernel that seemed to actually use it)?
> 
> AArch64 doesn't have a direct equivalent. The generic counter
> (CNTxCT_EL0) is the closest thing, but its nominal frequency is
> typically much lower than the nominal CPU clock frequency (unlike TSC
> where they're the same). The cycle counter (PMCCNTR_EL0) is part of the
> PMU, and can't be relied on in the same way (e.g. as perf reprograms it
> to generate overflow events, and it can stop for things like WFI/WFE).

Okay, cool; thanks for the details! It's always nice to confirm I didn't
miss some glaringly obvious solution. ;)

For a potential v2, should I add your reviewed-by or wait for your
timing analysis, etc?

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/5] Optionally randomize kernel stack offset each syscall
  2020-03-25 23:20         ` Jann Horn
@ 2020-03-26 17:18           ` Kees Cook
  0 siblings, 0 replies; 26+ messages in thread
From: Kees Cook @ 2020-03-26 17:18 UTC (permalink / raw)
  To: Jann Horn
  Cc: Reshetova, Elena, Thomas Gleixner, the arch/x86 maintainers,
	Andy Lutomirski, Peter Zijlstra, Catalin Marinas, Will Deacon,
	Mark Rutland, Alexander Potapenko, Ard Biesheuvel,
	Kernel Hardening, linux-arm-kernel, Linux-MM, kernel list

On Thu, Mar 26, 2020 at 12:20:19AM +0100, Jann Horn wrote:
> On Wed, Mar 25, 2020 at 9:27 PM Kees Cook <keescook@chromium.org> wrote:
> > On Wed, Mar 25, 2020 at 12:15:12PM +0000, Reshetova, Elena wrote:
> > > > > Also, are you sure that it isn't possible to make the syscall that
> > > > > leaked its stack pointer never return to userspace (via ptrace or
> > > > > SIGSTOP or something like that), and therefore never realign its
> > > > > stack, while keeping some controlled data present on the syscall's
> > > > > stack?
> > >
> > > How would you reliably detect that a stack pointer has been leaked
> > > to userspace while it has been in a syscall? Does not seem to be a trivial
> > > task to me.
> >
> > Well, my expectation is that folks using this defense are also using
> > panic_on_warn sysctl, etc, so attackers don't get a chance to actually
> > _use_ register values spilled to dmesg.
> 
> Uh... I thought that thing was exclusively for stuff like syzkaller,
> because nuking the entire system because of a WARN is far too
> excessive? WARNs should be safe to add almost anywhere in the kernel,
> so that developers can put their assumptions about system behavior
> into code without having to worry about bringing down the entire
> system if that assumption turns out to have been false in some
> harmless edgecase.

So, I'm caught in a tight spot between Linus's deprecation of BUG()[1],
and the desire for high-sensitivity security-oriented system builders
to have a "completely stop running that kernel thread" option. Linus's
entirely reasonable observation that BUG() destabilizes the kernel more
often than it doesn't means there isn't actually a safe "stop that kernel
thread" option, especially since many mitigations that detect badness span
a spectrum of "stops the badness before it happens" (e.g. NX memory) to
"I see badness has already happened" (e.g. stack protector). As a result,
the only way to serve the security-prioritized users is to
downgrade corruptions to DoSes via panic(). I wish there was a magic
way to have a perfect kernel state unwinder to get us the BUG() we
wanted it to be, but given the kernel's complexity, it doesn't exist
(and is unlikely to be worth developing). Right now, we either get
"WARN() and keep going as best we can" or we get "WARN() and panic".

And with regard to "WARNs should be safe to add", yes, that's generally
true, but the goal is to not make them reachable from userspace because
of this need to be able to "upgrade" them to panic(). I have tried to
document[1] this:

  Note that the WARN()-family should only be used for "expected to
  be unreachable" situations. If you want to warn about "reachable
  but undesirable" situations, please use the pr_warn()-family of
  functions. System owners may have set the *panic_on_warn* sysctl,
  to make sure their systems do not continue running in the face of
  "unreachable" conditions. (For example, see commits like `this one
  <https://git.kernel.org/linus/d4689846881d160a4d12a514e991a740bcb5d65a>`_.)

[1] https://lore.kernel.org/lkml/202003141524.59C619B51A@keescook/

> Also, there are other places that dump register state. In particular
> the soft lockup detection, which you can IIRC easily trip even
> accidentally if you play around with stuff like FUSE filesystems, or
> if a disk becomes unresponsive. Sure, *theoretically* you can also set
> the "panic on soft lockup" flag, but that seems like a really terrible
> idea to me.

I understand your general objection to non-deterministic defenses,
as there will always be ways to weaken them, but I don't think that's
reason enough to not have them. I prefer to look at mitigations as a
spectrum, and to recognize that some are more effective with certain
system configurations. They become tools to choose from when building
defense in depth.

> As far as I can tell, the only clean way to fix this is to tell
> distros that give non-root users access to dmesg (Ubuntu in
> particular) that they have to stop doing that. E.g. Debian seems to
> get by just fine with root-restricted dmesg.

Totally agreed about that. Ubuntu may be hard to convince as one of
their design principles has been to make the first user able to use the
system completely with as little interruption as possible. (e.g. pop-up
confirmation dialogs are strongly discouraged, etc.)

So, for this series, I think the benefit-to-complexity ratio is high.
It's a simple solution even if it's not perfect (most things can't be,
given the existing kernel design trade-offs).

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 4/5] x86/entry: Enable random_kstack_offset support
  2020-03-24 20:32 ` [PATCH v2 4/5] x86/entry: Enable random_kstack_offset support Kees Cook
@ 2020-03-28 22:26   ` Kees Cook
  0 siblings, 0 replies; 26+ messages in thread
From: Kees Cook @ 2020-03-28 22:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Elena Reshetova, x86, Andy Lutomirski, Peter Zijlstra,
	Catalin Marinas, Will Deacon, Mark Rutland, Alexander Potapenko,
	Ard Biesheuvel, Jann Horn, Perla, Enrico, kernel-hardening,
	linux-arm-kernel, linux-mm, linux-kernel

On Tue, Mar 24, 2020 at 01:32:30PM -0700, Kees Cook wrote:
> Allow for a randomized stack offset on a per-syscall basis, with roughly
> 5 bits of entropy.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  arch/x86/Kconfig        |  1 +
>  arch/x86/entry/common.c | 12 +++++++++++-
>  2 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index beea77046f9b..b9d449581eb6 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -150,6 +150,7 @@ config X86
>  	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
>  	select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if X86_64
>  	select HAVE_ARCH_VMAP_STACK		if X86_64
> +	select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
>  	select HAVE_ARCH_WITHIN_STACK_FRAMES
>  	select HAVE_ASM_MODVERSIONS
>  	select HAVE_CMPXCHG_DOUBLE
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 9747876980b5..086d7af570af 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -26,6 +26,7 @@
>  #include <linux/livepatch.h>
>  #include <linux/syscalls.h>
>  #include <linux/uaccess.h>
> +#include <linux/randomize_kstack.h>
>  
>  #include <asm/desc.h>
>  #include <asm/traps.h>
> @@ -189,6 +190,13 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
>  	lockdep_assert_irqs_disabled();
>  	lockdep_sys_exit();
>  
> +	/*
> +	 * x86_64 stack alignment means 3 bits are ignored, so keep
> +	 * the top 5 bits. x86_32 needs only 2 bits of alignment, so
> +	 * the top 6 bits will be used.
> +	 */
> +	choose_random_kstack_offset(rdtsc() & 0xFF);
> +
>  	cached_flags = READ_ONCE(ti->flags);
>  
>  	if (unlikely(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
> @@ -283,6 +291,7 @@ __visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
>  {
>  	struct thread_info *ti;
>  
> +	add_random_kstack_offset();
>  	enter_from_user_mode();
>  	local_irq_enable();
>  	ti = current_thread_info();

So, I got an email from 0day reporting that this caused a performance regression[1]
with things _turned off_. On closer examination:

Before (objdump -dS vmlinux):

__visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
{
ffffffff81003920:       41 54                   push   %r12
ffffffff81003922:       55                      push   %rbp
ffffffff81003923:       48 89 f5                mov    %rsi,%rbp
ffffffff81003926:       53                      push   %rbx
ffffffff81003927:       48 89 fb                mov    %rdi,%rbx
        struct thread_info *ti;

        enter_from_user_mode();
        local_irq_enable();
...


After:

__visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
{ 
ffffffff81003960:       55                      push   %rbp
ffffffff81003961:       48 89 e5                mov    %rsp,%rbp
ffffffff81003964:       41 55                   push   %r13
ffffffff81003966:       41 54                   push   %r12
ffffffff81003968:       49 89 f4                mov    %rsi,%r12
ffffffff8100396b:       53                      push   %rbx
ffffffff8100396c:       48 89 fb                mov    %rdi,%rbx
ffffffff8100396f:       48 83 ec 08             sub    $0x8,%rsp
ffffffff81003973:       65 48 8b 04 25 28 00    mov    %gs:0x28,%rax
ffffffff8100397a:       00 00 
ffffffff8100397c:       48 89 45 e0             mov    %rax,-0x20(%rbp)
ffffffff81003980:       31 c0                   xor    %eax,%eax
        asm_volatile_goto("1:"
ffffffff81003982:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
        struct thread_info *ti;

        add_random_kstack_offset();
        enter_from_user_mode();
        local_irq_enable();

The "nopl" there is the static_branch code that I'd expect. However, the
preamble is quite different. *drum roll* Anyone else recognize %gs:0x28?

That's the stack canary. :P It seems that GCC views this as an array:

                char *ptr = __builtin_alloca(offset & 0x3FF);
                asm volatile("" : "=m"(*ptr));

because it's locally allocated on the stack. *face palm*
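
For what it's worth, this reduces to a tiny standalone test case (the
file and function names below are made up): under
-fstack-protector-strong, any use of __builtin_alloca() should be
enough to make GCC instrument the frame with a canary, regardless of
the static branch around it:

	/* Hypothetical reduction; build with:
	 *   gcc -O2 -fstack-protector-strong -S reduce.c
	 */
	void eat_stack(unsigned int offset)
	{
		char *ptr = __builtin_alloca(offset & 0x3FF);

		/* Keep the allocation live without emitting a load/store. */
		asm volatile("" : "=m"(*ptr));
	}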

I'll go figure out a way to fix this...

-Kees

[1] https://lore.kernel.org/lkml/202003281505.0F481D3@keescook/

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 3/5] stack: Optionally randomize kernel stack offset each syscall
  2020-03-24 20:32 ` [PATCH v2 3/5] stack: Optionally randomize kernel stack offset each syscall Kees Cook
@ 2020-03-30 11:25   ` Mark Rutland
  2020-03-30 18:18     ` Kees Cook
  2020-03-30 18:27     ` Kees Cook
  0 siblings, 2 replies; 26+ messages in thread
From: Mark Rutland @ 2020-03-30 11:25 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Elena Reshetova, x86, Andy Lutomirski,
	Peter Zijlstra, Catalin Marinas, Will Deacon,
	Alexander Potapenko, Ard Biesheuvel, Jann Horn, Perla, Enrico,
	kernel-hardening, linux-arm-kernel, linux-mm, linux-kernel

On Tue, Mar 24, 2020 at 01:32:29PM -0700, Kees Cook wrote:
> +/*
> + * Do not use this anywhere else in the kernel. This is used here because
> + * it provides an arch-agnostic way to grow the stack with correct
> + * alignment. Also, since this use is being explicitly masked to a max of
> + * 10 bits, stack-clash style attacks are unlikely. For more details see
> + * "VLAs" in Documentation/process/deprecated.rst
> + */
> +void *__builtin_alloca(size_t size);
> +
> +#define add_random_kstack_offset() do {					\
> +	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
> +				&randomize_kstack_offset)) {		\
> +		u32 offset = this_cpu_read(kstack_offset);		\
> +		char *ptr = __builtin_alloca(offset & 0x3FF);		\
> +		asm volatile("" : "=m"(*ptr));				\

Is this asm() a homebrew OPTIMIZER_HIDE_VAR(*ptr)? If the asm
constraints generate better code, could we add those as alternative
constraints in OPTIMIZER_HIDE_VAR()?

Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/5] arm64: entry: Enable random_kstack_offset support
  2020-03-26 16:31         ` Kees Cook
@ 2020-03-30 11:26           ` Mark Rutland
  0 siblings, 0 replies; 26+ messages in thread
From: Mark Rutland @ 2020-03-30 11:26 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Elena Reshetova, x86, Andy Lutomirski,
	Peter Zijlstra, Catalin Marinas, Will Deacon,
	Alexander Potapenko, Ard Biesheuvel, Jann Horn, Perla, Enrico,
	kernel-hardening, linux-arm-kernel, linux-mm, linux-kernel

On Thu, Mar 26, 2020 at 09:31:32AM -0700, Kees Cook wrote:
> On Thu, Mar 26, 2020 at 11:15:21AM +0000, Mark Rutland wrote:
> > On Wed, Mar 25, 2020 at 01:22:07PM -0700, Kees Cook wrote:
> > > On Wed, Mar 25, 2020 at 01:21:27PM +0000, Mark Rutland wrote:
> > > > On Tue, Mar 24, 2020 at 01:32:31PM -0700, Kees Cook wrote:
> > > > > Allow for a randomized stack offset on a per-syscall basis, with roughly
> > > > > 5 bits of entropy.
> > > > > 
> > > > > Signed-off-by: Kees Cook <keescook@chromium.org>
> > > > 
> > > > Just to check, do you have an idea of the impact on arm64? Patch 3 had
> > > > figures for x86 where it reads the TSC, and it's unclear to me how
> > > > get_random_int() compares to that.
> > > 
> > > I didn't do a measurement on arm64 since I don't have a good bare-metal
> > > test environment. I know Andy Lutomirski has plans for making
> > > get_random_int() as fast as possible, so that's why I used it here.
> > 
> > Ok. I suspect I also won't get the chance to test that in the next few
> > days, but if I do I'll try to share the results.
> 
> Okay, thanks! I can try a rough estimate under emulation, but I assume
> that'll be mostly useless. :)
> 
> > My concern here was that, get_random_int() has to grab a spinlock and
> > mess with IRQ masking, so has the potential to block for much longer,
> > but that might not be an issue in practice, and I don't think that
> > should block these patches.
> 
> Gotcha. I was already surprised by how "heavy" the per-cpu access was
> when I looked at the resulting assembly (there looked to be preempt
> stuff, etc). But my hope was that this is configurable so people can
> measure for themselves if they want it, and most people who want this
> feature have a high tolerance for performance trade-offs. ;)
> 
> > > I couldn't figure out if there was a comparable instruction like rdtsc
> > > in aarch64 (it seems there's a cycle counter, but I found nothing in
> > > the kernel that seemed to actually use it)?
> > 
> > AArch64 doesn't have a direct equivalent. The generic counter
> > (CNTxCT_EL0) is the closest thing, but its nominal frequency is
> > typically much lower than the nominal CPU clock frequency (unlike TSC
> > where they're the same). The cycle counter (PMCCNTR_EL0) is part of the
> > PMU, and can't be relied on in the same way (e.g. as perf reprograms it
> > to generate overflow events, and it can stop for things like WFI/WFE).
> 
> Okay, cool; thanks for the details! It's always nice to confirm I didn't
> miss some glaringly obvious solution. ;)
> 
> For a potential v2, should I add your reviewed-by or wait for your
> timing analysis, etc?

I'd rather not give an R-b until I've seen numbers, but please don't
block waiting for that. For the moment, feel free to add:

Acked-by: Mark Rutland <mark.rutland@arm.com>

... and it's down to Will and Catalin to make the call for arm64.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 3/5] stack: Optionally randomize kernel stack offset each syscall
  2020-03-30 11:25   ` Mark Rutland
@ 2020-03-30 18:18     ` Kees Cook
  2020-03-30 18:27     ` Kees Cook
  1 sibling, 0 replies; 26+ messages in thread
From: Kees Cook @ 2020-03-30 18:18 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Thomas Gleixner, Elena Reshetova, x86, Andy Lutomirski,
	Peter Zijlstra, Catalin Marinas, Will Deacon,
	Alexander Potapenko, Ard Biesheuvel, Jann Horn, Perla, Enrico,
	kernel-hardening, linux-arm-kernel, linux-mm, linux-kernel

On Mon, Mar 30, 2020 at 12:25:36PM +0100, Mark Rutland wrote:
> On Tue, Mar 24, 2020 at 01:32:29PM -0700, Kees Cook wrote:
> > +/*
> > + * Do not use this anywhere else in the kernel. This is used here because
> > + * it provides an arch-agnostic way to grow the stack with correct
> > + * alignment. Also, since this use is being explicitly masked to a max of
> > + * 10 bits, stack-clash style attacks are unlikely. For more details see
> > + * "VLAs" in Documentation/process/deprecated.rst
> > + */
> > +void *__builtin_alloca(size_t size);
> > +
> > +#define add_random_kstack_offset() do {					\
> > +	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
> > +				&randomize_kstack_offset)) {		\
> > +		u32 offset = this_cpu_read(kstack_offset);		\
> > +		char *ptr = __builtin_alloca(offset & 0x3FF);		\
> > +		asm volatile("" : "=m"(*ptr));				\
> 
> Is this asm() a homebrew OPTIMIZER_HIDE_VAR(*ptr)? If the asm
> constraints generate better code, could we add those as alternative
> constraints in OPTIMIZER_HIDE_VAR()?

Hah, yes, it is. And this produces identical asm, so I've replaced it
with OPTIMIZER_HIDE_VAR() now. Now if I could figure out how to hide it
from stack protector. :(

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 3/5] stack: Optionally randomize kernel stack offset each syscall
  2020-03-30 11:25   ` Mark Rutland
  2020-03-30 18:18     ` Kees Cook
@ 2020-03-30 18:27     ` Kees Cook
  1 sibling, 0 replies; 26+ messages in thread
From: Kees Cook @ 2020-03-30 18:27 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Thomas Gleixner, Elena Reshetova, x86, Andy Lutomirski,
	Peter Zijlstra, Catalin Marinas, Will Deacon,
	Alexander Potapenko, Ard Biesheuvel, Jann Horn, Perla, Enrico,
	kernel-hardening, linux-arm-kernel, linux-mm, linux-kernel

On Mon, Mar 30, 2020 at 12:25:36PM +0100, Mark Rutland wrote:
> On Tue, Mar 24, 2020 at 01:32:29PM -0700, Kees Cook wrote:
> > +/*
> > + * Do not use this anywhere else in the kernel. This is used here because
> > + * it provides an arch-agnostic way to grow the stack with correct
> > + * alignment. Also, since this use is being explicitly masked to a max of
> > + * 10 bits, stack-clash style attacks are unlikely. For more details see
> > + * "VLAs" in Documentation/process/deprecated.rst
> > + */
> > +void *__builtin_alloca(size_t size);
> > +
> > +#define add_random_kstack_offset() do {					\
> > +	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
> > +				&randomize_kstack_offset)) {		\
> > +		u32 offset = this_cpu_read(kstack_offset);		\
> > +		char *ptr = __builtin_alloca(offset & 0x3FF);		\
> > +		asm volatile("" : "=m"(*ptr));				\
> 
> Is this asm() a homebrew OPTIMIZER_HIDE_VAR(*ptr)? If the asm
> constraints generate better code, could we add those as alternative
> constraints in OPTIMIZER_HIDE_VAR()?

Er, no, sorry, not the same. I disassembled the wrong binary. :)

With     asm volatile("" : "=m"(*ptr))

ffffffff810038bc:       48 8d 44 24 0f          lea    0xf(%rsp),%rax
ffffffff810038c1:       48 83 e0 f0             and    $0xfffffffffffffff0,%rax


With   __asm__ ("" : "=r" (var) : "0" (var))

ffffffff810038bc:       48 8d 54 24 0f          lea    0xf(%rsp),%rdx
ffffffff810038c1:       48 83 e2 f0             and    $0xfffffffffffffff0,%rdx
ffffffff810038c5:       0f b6 02                movzbl (%rdx),%eax
ffffffff810038c8:       88 02                   mov    %al,(%rdx)


It looks like OPTIMIZER_HIDE_VAR() is basically just:

	var = var;

In the former case, we avoid the extra load and store and retain the
allocation. So I don't think OPTIMIZER_HIDE_VAR() should be used here,
nor should OPTIMIZER_HIDE_VAR() be changed to remove the "0" (var) bit.
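
Side by side, the two constructs compared above look roughly like this
(the OPTIMIZER_HIDE_VAR() body is as quoted above; the second macro
name is made up purely for illustration):

	/*
	 * Forces the value through a register: the compiler must emit an
	 * actual load and store of *ptr, hence the extra movzbl/mov above.
	 */
	#define OPTIMIZER_HIDE_VAR(var) \
		__asm__ ("" : "=r" (var) : "0" (var))

	/*
	 * Output-only memory constraint (hypothetical name): the object
	 * must exist in memory, so the allocation is kept, but with an
	 * empty asm template no load or store is emitted.
	 */
	#define HIDE_STACK_ALLOCATION(ptr) \
		asm volatile("" : "=m"(*(ptr)))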

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/5] arm64: entry: Enable random_kstack_offset support
  2020-03-24 20:32 ` [PATCH v2 5/5] arm64: entry: " Kees Cook
  2020-03-25 13:21   ` Mark Rutland
@ 2020-04-20 20:54   ` Will Deacon
  2020-04-20 22:34     ` Kees Cook
  1 sibling, 1 reply; 26+ messages in thread
From: Will Deacon @ 2020-04-20 20:54 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Elena Reshetova, x86, Andy Lutomirski,
	Peter Zijlstra, Catalin Marinas, Mark Rutland,
	Alexander Potapenko, Ard Biesheuvel, Jann Horn, Perla, Enrico,
	kernel-hardening, linux-arm-kernel, linux-mm, linux-kernel

On Tue, Mar 24, 2020 at 01:32:31PM -0700, Kees Cook wrote:
> Allow for a randomized stack offset on a per-syscall basis, with roughly
> 5 bits of entropy.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  arch/arm64/Kconfig          |  1 +
>  arch/arm64/kernel/syscall.c | 10 ++++++++++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 0b30e884e088..4d5aa4959f72 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -127,6 +127,7 @@ config ARM64
>  	select HAVE_ARCH_MMAP_RND_BITS
>  	select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
>  	select HAVE_ARCH_PREL32_RELOCATIONS
> +	select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
>  	select HAVE_ARCH_SECCOMP_FILTER
>  	select HAVE_ARCH_STACKLEAK
>  	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
> diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> index a12c0c88d345..238dbd753b44 100644
> --- a/arch/arm64/kernel/syscall.c
> +++ b/arch/arm64/kernel/syscall.c
> @@ -5,6 +5,7 @@
>  #include <linux/errno.h>
>  #include <linux/nospec.h>
>  #include <linux/ptrace.h>
> +#include <linux/randomize_kstack.h>
>  #include <linux/syscalls.h>
>  
>  #include <asm/daifflags.h>
> @@ -42,6 +43,8 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
>  {
>  	long ret;
>  
> +	add_random_kstack_offset();
> +
>  	if (scno < sc_nr) {
>  		syscall_fn_t syscall_fn;
>  		syscall_fn = syscall_table[array_index_nospec(scno, sc_nr)];
> @@ -51,6 +54,13 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
>  	}
>  
>  	regs->regs[0] = ret;
> +
> +	/*
> +	 * Since the compiler chooses a 4 bit alignment for the stack,
> +	 * let's save one additional bit (9 total), which gets us up
> +	 * near 5 bits of entropy.
> +	 */
> +	choose_random_kstack_offset(get_random_int() & 0x1FF);

Hmm, this comment doesn't make any sense to me. I mean, I get that 0x1ff
is 9 bits, and that is 4+5 but so what?

Will

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/5] arm64: entry: Enable random_kstack_offset support
  2020-04-20 20:54   ` Will Deacon
@ 2020-04-20 22:34     ` Kees Cook
  2020-04-21  7:02       ` Will Deacon
  0 siblings, 1 reply; 26+ messages in thread
From: Kees Cook @ 2020-04-20 22:34 UTC (permalink / raw)
  To: Will Deacon
  Cc: Thomas Gleixner, Elena Reshetova, x86, Andy Lutomirski,
	Peter Zijlstra, Catalin Marinas, Mark Rutland,
	Alexander Potapenko, Ard Biesheuvel, Jann Horn, Perla, Enrico,
	kernel-hardening, linux-arm-kernel, linux-mm, linux-kernel

On Mon, Apr 20, 2020 at 09:54:58PM +0100, Will Deacon wrote:
> On Tue, Mar 24, 2020 at 01:32:31PM -0700, Kees Cook wrote:
> > +	/*
> > +	 * Since the compiler chooses a 4 bit alignment for the stack,
> > +	 * let's save one additional bit (9 total), which gets us up
> > +	 * near 5 bits of entropy.
> > +	 */
> > +	choose_random_kstack_offset(get_random_int() & 0x1FF);
> 
> Hmm, this comment doesn't make any sense to me. I mean, I get that 0x1ff
> is 9 bits, and that is 4+5 but so what?

Er, well, yes. I guess I was just trying to explain why there were 9
bits saved here and to document what I was seeing the compiler actually
doing with the values. (And it serves as a comparison to the x86 comment
which is explaining similar calculations in the face of x86_64 vs ia32.)

Would something like this be better?

/*
 * Since the compiler uses 4 bit alignment for the stack (1 more than
 * x86_64), let's try to match the 5ish-bit entropy seen in x86_64,
 * instead of having needlessly lower entropy. As a result, keep the
 * low 9 bits.
 */

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/5] arm64: entry: Enable random_kstack_offset support
  2020-04-20 22:34     ` Kees Cook
@ 2020-04-21  7:02       ` Will Deacon
  0 siblings, 0 replies; 26+ messages in thread
From: Will Deacon @ 2020-04-21  7:02 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Elena Reshetova, x86, Andy Lutomirski,
	Peter Zijlstra, Catalin Marinas, Mark Rutland,
	Alexander Potapenko, Ard Biesheuvel, Jann Horn, Perla, Enrico,
	kernel-hardening, linux-arm-kernel, linux-mm, linux-kernel

On Mon, Apr 20, 2020 at 03:34:57PM -0700, Kees Cook wrote:
> On Mon, Apr 20, 2020 at 09:54:58PM +0100, Will Deacon wrote:
> > On Tue, Mar 24, 2020 at 01:32:31PM -0700, Kees Cook wrote:
> > > +	/*
> > > +	 * Since the compiler chooses a 4 bit alignment for the stack,
> > > +	 * let's save one additional bit (9 total), which gets us up
> > > +	 * near 5 bits of entropy.
> > > +	 */
> > > +	choose_random_kstack_offset(get_random_int() & 0x1FF);
> > 
> > Hmm, this comment doesn't make any sense to me. I mean, I get that 0x1ff
> > is 9 bits, and that is 4+5 but so what?
> 
> Er, well, yes. I guess I was just trying to explain why there were 9
> bits saved here and to document what I was seeing the compiler actually
> doing with the values. (And it serves as a comparison to the x86 comment
> which is explaining similar calculations in the face of x86_64 vs ia32.)
> 
> Would something like this be better?
> 
> /*
>  * Since the compiler uses 4 bit alignment for the stack (1 more than
>  * x86_64), let's try to match the 5ish-bit entropy seen in x86_64,
>  * instead of having needlessly lower entropy. As a result, keep the
>  * low 9 bits.
>  */

Yes, thank you! I was missing the comparison to x86_64 and so the one
"additional" bit didn't make sense to me.

With the new comment:

Acked-by: Will Deacon <will@kernel.org>

I'm assuming you're merging this via some other tree, but let me know
if you need anything else from me.

Cheers,

Will

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-04-21  7:02 UTC | newest]

Thread overview: 26+ messages
2020-03-24 20:32 [PATCH v2 0/5] Optionally randomize kernel stack offset each syscall Kees Cook
2020-03-24 20:32 ` [PATCH v2 1/5] jump_label: Provide CONFIG-driven build state defaults Kees Cook
2020-03-24 22:06   ` Peter Zijlstra
2020-03-24 20:32 ` [PATCH v2 2/5] init_on_alloc: Unpessimize default-on builds Kees Cook
2020-03-26 15:48   ` Alexander Potapenko
2020-03-24 20:32 ` [PATCH v2 3/5] stack: Optionally randomize kernel stack offset each syscall Kees Cook
2020-03-30 11:25   ` Mark Rutland
2020-03-30 18:18     ` Kees Cook
2020-03-30 18:27     ` Kees Cook
2020-03-24 20:32 ` [PATCH v2 4/5] x86/entry: Enable random_kstack_offset support Kees Cook
2020-03-28 22:26   ` Kees Cook
2020-03-24 20:32 ` [PATCH v2 5/5] arm64: entry: " Kees Cook
2020-03-25 13:21   ` Mark Rutland
2020-03-25 20:22     ` Kees Cook
2020-03-26 11:15       ` Mark Rutland
2020-03-26 16:31         ` Kees Cook
2020-03-30 11:26           ` Mark Rutland
2020-04-20 20:54   ` Will Deacon
2020-04-20 22:34     ` Kees Cook
2020-04-21  7:02       ` Will Deacon
2020-03-24 21:28 ` [PATCH v2 0/5] Optionally randomize kernel stack offset each syscall Jann Horn
2020-03-24 23:07   ` Kees Cook
2020-03-25 12:15     ` Reshetova, Elena
2020-03-25 20:27       ` Kees Cook
2020-03-25 23:20         ` Jann Horn
2020-03-26 17:18           ` Kees Cook
