linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/5] Optionally randomize kernel stack offset each syscall
@ 2020-06-22 19:31 Kees Cook
  2020-06-22 19:31 ` [PATCH v4 1/5] jump_label: Provide CONFIG-driven build state defaults Kees Cook
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Kees Cook @ 2020-06-22 19:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kees Cook, Elena Reshetova, x86, Andy Lutomirski, Peter Zijlstra,
	Catalin Marinas, Will Deacon, Mark Rutland, Alexander Potapenko,
	Alexander Popov, Ard Biesheuvel, Jann Horn, kernel-hardening,
	linux-arm-kernel, linux-mm, linux-kernel

v4:
- rebase to v5.8-rc2
v3: https://lore.kernel.org/lkml/20200406231606.37619-1-keescook@chromium.org/
v2: https://lore.kernel.org/lkml/20200324203231.64324-1-keescook@chromium.org/
rfc: https://lore.kernel.org/kernel-hardening/20190329081358.30497-1-elena.reshetova@intel.com/

Hi,

This is a continuation and refactoring of Elena's earlier effort to add
kernel stack base offset randomization. In the time since the previous
discussions, two attacks[1][2] were made public that depended on stack
determinism, so we're no longer in the position of "this is a good idea
but we have no examples of attacks". :)

Earlier discussions also devolved into debates on entropy sources, which
is mostly a red herring, given the already low entropy available due
to stack size. Regardless, entropy can be changed/improved separately
from this series as needed.

Earlier discussions also got stuck debating how much syscall overhead
was too much, but this is also a red herring since the feature itself
needs to be selectable at boot with no cost for those that don't want it:
this is solved here with static branches.

So, here is an improved version, made as arch-agnostic as possible,
with usage added for x86 and arm64. It also includes some small static
branch cleanups, and addresses some surprising performance issues due to
the stack canary[3].

Note that for v5.8, this depends on this fix (due to how x86 changed its
stack protector removal for syscall entry):
https://lore.kernel.org/lkml/202006221201.3641ED037E@keescook/

Thanks!

-Kees

[1] https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
[2] https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf
[3] https://lore.kernel.org/lkml/202003281520.A9BFF461@keescook/


Kees Cook (5):
  jump_label: Provide CONFIG-driven build state defaults
  init_on_alloc: Unpessimize default-on builds
  stack: Optionally randomize kernel stack offset each syscall
  x86/entry: Enable random_kstack_offset support
  arm64: entry: Enable random_kstack_offset support

 Makefile                         |  4 ++++
 arch/Kconfig                     | 23 ++++++++++++++++++
 arch/arm64/Kconfig               |  1 +
 arch/arm64/kernel/Makefile       |  5 ++++
 arch/arm64/kernel/syscall.c      | 10 ++++++++
 arch/x86/Kconfig                 |  1 +
 arch/x86/entry/common.c          | 11 +++++++++
 include/linux/jump_label.h       | 19 +++++++++++++++
 include/linux/mm.h               | 18 +++++---------
 include/linux/randomize_kstack.h | 40 ++++++++++++++++++++++++++++++++
 init/main.c                      | 23 ++++++++++++++++++
 mm/page_alloc.c                  | 12 ++--------
 12 files changed, 145 insertions(+), 22 deletions(-)
 create mode 100644 include/linux/randomize_kstack.h

-- 
2.25.1



* [PATCH v4 1/5] jump_label: Provide CONFIG-driven build state defaults
  2020-06-22 19:31 [PATCH v4 0/5] Optionally randomize kernel stack offset each syscall Kees Cook
@ 2020-06-22 19:31 ` Kees Cook
  2020-06-22 19:31 ` [PATCH v4 2/5] init_on_alloc: Unpessimize default-on builds Kees Cook
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Kees Cook @ 2020-06-22 19:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kees Cook, Peter Zijlstra, Elena Reshetova, x86, Andy Lutomirski,
	Catalin Marinas, Will Deacon, Mark Rutland, Alexander Potapenko,
	Alexander Popov, Ard Biesheuvel, Jann Horn, kernel-hardening,
	linux-arm-kernel, linux-mm, linux-kernel

Choosing the initial state of static branches changes the assembly
layout: if the condition is expected to be likely, the branch body is
placed inline; if unlikely, it is placed out of line behind a jump. A
few places in the kernel use (or could use) a CONFIG to choose the
default state, so provide the infrastructure to do this and convert the
existing cases (init_on_alloc and init_on_free) to the new macros.
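
For illustration, a minimal (hypothetical) use of the new macros could
look like the sketch below; "CONFIG_FOO_DEFAULT_ON" and "foo_enabled"
are made-up names, not part of this patch, and <linux/jump_label.h> is
assumed to be included:

	/* In a header, declare the key with a CONFIG-chosen default: */
	DECLARE_STATIC_KEY_MAYBE(CONFIG_FOO_DEFAULT_ON, foo_enabled);

	/* In one compilation unit, define it with the same CONFIG: */
	DEFINE_STATIC_KEY_MAYBE(CONFIG_FOO_DEFAULT_ON, foo_enabled);

	/* At the call site, the branch layout follows that default: */
	static bool foo_is_enabled(void)
	{
		return static_branch_maybe(CONFIG_FOO_DEFAULT_ON,
					   &foo_enabled);
	}

When CONFIG_FOO_DEFAULT_ON=y the key starts true and the test compiles
as static_branch_likely(); otherwise it starts false and compiles as
static_branch_unlikely().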

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/jump_label.h | 19 +++++++++++++++++++
 include/linux/mm.h         | 12 ++----------
 mm/page_alloc.c            | 12 ++----------
 3 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 3526c0aee954..615fdfb871a3 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -382,6 +382,21 @@ struct static_key_false {
 		[0 ... (count) - 1] = STATIC_KEY_FALSE_INIT,	\
 	}
 
+#define _DEFINE_STATIC_KEY_1(name)	DEFINE_STATIC_KEY_TRUE(name)
+#define _DEFINE_STATIC_KEY_0(name)	DEFINE_STATIC_KEY_FALSE(name)
+#define DEFINE_STATIC_KEY_MAYBE(cfg, name)			\
+	__PASTE(_DEFINE_STATIC_KEY_, IS_ENABLED(cfg))(name)
+
+#define _DEFINE_STATIC_KEY_RO_1(name)	DEFINE_STATIC_KEY_TRUE_RO(name)
+#define _DEFINE_STATIC_KEY_RO_0(name)	DEFINE_STATIC_KEY_FALSE_RO(name)
+#define DEFINE_STATIC_KEY_MAYBE_RO(cfg, name)			\
+	__PASTE(_DEFINE_STATIC_KEY_RO_, IS_ENABLED(cfg))(name)
+
+#define _DECLARE_STATIC_KEY_1(name)	DECLARE_STATIC_KEY_TRUE(name)
+#define _DECLARE_STATIC_KEY_0(name)	DECLARE_STATIC_KEY_FALSE(name)
+#define DECLARE_STATIC_KEY_MAYBE(cfg, name)			\
+	__PASTE(_DECLARE_STATIC_KEY_, IS_ENABLED(cfg))(name)
+
 extern bool ____wrong_branch_error(void);
 
 #define static_key_enabled(x)							\
@@ -482,6 +497,10 @@ extern bool ____wrong_branch_error(void);
 
 #endif /* CONFIG_JUMP_LABEL */
 
+#define static_branch_maybe(config, x)					\
+	(IS_ENABLED(config) ? static_branch_likely(x)			\
+			    : static_branch_unlikely(x))
+
 /*
  * Advanced usage; refcount, branch is enabled when: count != 0
  */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index dc7b87310c10..0e6824fd4458 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2889,11 +2889,7 @@ static inline void kernel_poison_pages(struct page *page, int numpages,
 					int enable) { }
 #endif
 
-#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
-DECLARE_STATIC_KEY_TRUE(init_on_alloc);
-#else
-DECLARE_STATIC_KEY_FALSE(init_on_alloc);
-#endif
+DECLARE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_ALLOC_DEFAULT_ON, init_on_alloc);
 static inline bool want_init_on_alloc(gfp_t flags)
 {
 	if (static_branch_unlikely(&init_on_alloc) &&
@@ -2902,11 +2898,7 @@ static inline bool want_init_on_alloc(gfp_t flags)
 	return flags & __GFP_ZERO;
 }
 
-#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
-DECLARE_STATIC_KEY_TRUE(init_on_free);
-#else
-DECLARE_STATIC_KEY_FALSE(init_on_free);
-#endif
+DECLARE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_FREE_DEFAULT_ON, init_on_free);
 static inline bool want_init_on_free(void)
 {
 	return static_branch_unlikely(&init_on_free) &&
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 48eb0f1410d4..5885a612fa18 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -136,18 +136,10 @@ unsigned long totalcma_pages __read_mostly;
 
 int percpu_pagelist_fraction;
 gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
-#ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
-DEFINE_STATIC_KEY_TRUE(init_on_alloc);
-#else
-DEFINE_STATIC_KEY_FALSE(init_on_alloc);
-#endif
+DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_ALLOC_DEFAULT_ON, init_on_alloc);
 EXPORT_SYMBOL(init_on_alloc);
 
-#ifdef CONFIG_INIT_ON_FREE_DEFAULT_ON
-DEFINE_STATIC_KEY_TRUE(init_on_free);
-#else
-DEFINE_STATIC_KEY_FALSE(init_on_free);
-#endif
+DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_FREE_DEFAULT_ON, init_on_free);
 EXPORT_SYMBOL(init_on_free);
 
 static int __init early_init_on_alloc(char *buf)
-- 
2.25.1



* [PATCH v4 2/5] init_on_alloc: Unpessimize default-on builds
  2020-06-22 19:31 [PATCH v4 0/5] Optionally randomize kernel stack offset each syscall Kees Cook
  2020-06-22 19:31 ` [PATCH v4 1/5] jump_label: Provide CONFIG-driven build state defaults Kees Cook
@ 2020-06-22 19:31 ` Kees Cook
  2020-06-22 19:31 ` [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall Kees Cook
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Kees Cook @ 2020-06-22 19:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kees Cook, Alexander Potapenko, Elena Reshetova, x86,
	Andy Lutomirski, Peter Zijlstra, Catalin Marinas, Will Deacon,
	Mark Rutland, Alexander Popov, Ard Biesheuvel, Jann Horn,
	kernel-hardening, linux-arm-kernel, linux-mm, linux-kernel

Right now, the state of CONFIG_INIT_ON_ALLOC_DEFAULT_ON (and
...ON_FREE...) does not change the assembly ordering of the static
branch tests. Use the new jump_label macros to check the CONFIG
settings and default to the "expected" state, which unpessimizes the
resulting assembly code.

Reviewed-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/mm.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e6824fd4458..0a05b20870c2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2892,7 +2892,8 @@ static inline void kernel_poison_pages(struct page *page, int numpages,
 DECLARE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_ALLOC_DEFAULT_ON, init_on_alloc);
 static inline bool want_init_on_alloc(gfp_t flags)
 {
-	if (static_branch_unlikely(&init_on_alloc) &&
+	if (static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
+				&init_on_alloc) &&
 	    !page_poisoning_enabled())
 		return true;
 	return flags & __GFP_ZERO;
@@ -2901,7 +2902,8 @@ static inline bool want_init_on_alloc(gfp_t flags)
 DECLARE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_FREE_DEFAULT_ON, init_on_free);
 static inline bool want_init_on_free(void)
 {
-	return static_branch_unlikely(&init_on_free) &&
+	return static_branch_maybe(CONFIG_INIT_ON_FREE_DEFAULT_ON,
+				   &init_on_free) &&
 	       !page_poisoning_enabled();
 }
 
-- 
2.25.1



* [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall
  2020-06-22 19:31 [PATCH v4 0/5] Optionally randomize kernel stack offset each syscall Kees Cook
  2020-06-22 19:31 ` [PATCH v4 1/5] jump_label: Provide CONFIG-driven build state defaults Kees Cook
  2020-06-22 19:31 ` [PATCH v4 2/5] init_on_alloc: Unpessimize default-on builds Kees Cook
@ 2020-06-22 19:31 ` Kees Cook
  2020-06-22 19:40   ` Randy Dunlap
                     ` (3 more replies)
  2020-06-22 19:31 ` [PATCH v4 4/5] x86/entry: Enable random_kstack_offset support Kees Cook
  2020-06-22 19:31 ` [PATCH v4 5/5] arm64: entry: " Kees Cook
  4 siblings, 4 replies; 19+ messages in thread
From: Kees Cook @ 2020-06-22 19:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kees Cook, Elena Reshetova, x86, Andy Lutomirski, Peter Zijlstra,
	Catalin Marinas, Will Deacon, Mark Rutland, Alexander Potapenko,
	Alexander Popov, Ard Biesheuvel, Jann Horn, kernel-hardening,
	linux-arm-kernel, linux-mm, linux-kernel

This provides the ability for architectures to enable kernel stack base
address offset randomization. This feature is controlled by the boot
param "randomize_kstack_offset=on/off", with its default value set by
CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.

This feature is based on the original idea from the last public release
of PaX's RANDKSTACK feature: https://pax.grsecurity.net/docs/randkstack.txt
All the credit for the original idea goes to the PaX team. Note that
the design and implementation of this upstream randomize_kstack_offset
feature differs greatly from the RANDKSTACK feature (see below).

Reasoning for the feature:

This feature aims to make the various stack-based attacks that rely
on deterministic stack structure harder to carry out. We have had many
such attacks in the past (just to name a few):

https://jon.oberheide.org/files/infiltrate12-thestackisback.pdf
https://jon.oberheide.org/files/stackjacking-infiltrate11.pdf
https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html

As Linux kernel stack protections have been constantly improving
(vmap-based stack allocation with guard pages, removal of thread_info,
STACKLEAK), attackers have had to find new ways for their exploits
to work. They have done so, continuing to rely on the kernel's stack
determinism, in situations where VMAP_STACK and THREAD_INFO_IN_TASK_STRUCT
were not relevant. For example, the following recent attacks would have
been hampered if the stack offset was non-deterministic between syscalls:

https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf
(page 70: targeting the pt_regs copy with linear stack overflow)

https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
(leaked stack address from one syscall as a target during next syscall)

The main idea is that since the stack offset is randomized on each system
call, it is harder for an attack to reliably land in any particular place
on the thread stack, even with address exposures, as the stack base will
change on the next syscall. Also, since randomization is performed after
placing pt_regs, the ptrace-based approach[1] to discover the randomized
offset during a long-running syscall should not be possible.

Design description:

During most of the kernel's execution, it runs on the "thread stack",
which is pretty deterministic in its structure: it is fixed in size,
and on every entry from userspace to kernel on a syscall the thread
stack starts construction from an address fetched from the per-cpu
cpu_current_top_of_stack variable. The first element to be pushed to the
thread stack is the pt_regs struct that stores all required CPU registers
and syscall parameters. Finally the specific syscall function is called,
with the stack being used as the kernel executes the resulting request.

The goal of the randomize_kstack_offset feature is to add a random offset
after the pt_regs has been pushed to the stack and before the rest of the
thread stack is used during the syscall processing, and to change it every
time a process issues a syscall. The source of randomness is currently
architecture-defined (but x86 is using the low byte of rdtsc()). Future
improvements for different entropy sources are possible, but out of scope
for this patch. As suggested by Andy Lutomirski, the offset is added
using alloca() and an empty asm() statement with an output constraint,
since this avoids changes to assembly syscall entry code and to the
unwinder, and provides correct stack alignment as defined by the compiler.
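
For reference, stripped down to its core (this is essentially the body
of the add_random_kstack_offset() macro added below; the names match
the patch), the trick is:

	u32 offset = this_cpu_read(kstack_offset);	/* per-cpu random state */
	u8 *ptr = __builtin_alloca(offset & 0x3FF);	/* move the stack down */
	asm volatile("" : "=m"(*ptr));	/* keep the allocation from being dropped */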

In order to make this available by default with zero performance impact
for those that don't want it, it is boot-time selectable with static
branches. This way, if the overhead is not wanted, the feature can
simply be left turned off.

The generated assembly for x86_64 with GCC looks like this:

...
ffffffff81003977: 65 8b 05 02 ea 00 7f  mov %gs:0x7f00ea02(%rip),%eax
					    # 12380 <kstack_offset>
ffffffff8100397e: 25 ff 03 00 00        and $0x3ff,%eax
ffffffff81003983: 48 83 c0 0f           add $0xf,%rax
ffffffff81003987: 25 f8 07 00 00        and $0x7f8,%eax
ffffffff8100398c: 48 29 c4              sub %rax,%rsp
ffffffff8100398f: 48 8d 44 24 0f        lea 0xf(%rsp),%rax
ffffffff81003994: 48 83 e0 f0           and $0xfffffffffffffff0,%rax
...

As a result of the above stack alignment, this patch introduces about
5 bits of randomness after pt_regs is spilled to the thread stack on
x86_64, and 6 bits on x86_32 (since it has 1 fewer bit required for
stack alignment). The amount of entropy could be adjusted based on how
much of the stack space we wish to trade for security.

My measure of syscall performance overhead (on x86_64):

lmbench: /usr/lib/lmbench/bin/x86_64-linux-gnu/lat_syscall -N 10000 null
    randomize_kstack_offset=y	Simple syscall: 0.7082 microseconds
    randomize_kstack_offset=n	Simple syscall: 0.7016 microseconds

So, roughly 0.9% overhead growth for a no-op syscall, which is very
manageable. And for people that don't want this, it's off by default.

There are two gotchas with using the alloca() trick. First,
compilers that have Stack Clash protection (-fstack-clash-protection)
enabled by default (e.g. Ubuntu[3]) add pagesize stack probes to
any dynamic stack allocations. While the randomization offset is
always less than a page, the resulting assembly would still contain
(unreachable!) probing routines, bloating the resulting assembly. To
avoid this, -fno-stack-clash-protection is unconditionally added to
the kernel Makefile since this is the only dynamic stack allocation in
the kernel (now that VLAs have been removed) and it is provably safe
from Stack Clash style attacks.

The second gotcha with alloca() is a negative interaction with
-fstack-protector-strong, in that it sees the alloca() as an array
allocation, which triggers the unconditional addition of the stack
canary function pre/post-amble which slows down syscalls regardless of
the static branch. In order to avoid adding this unneeded check and its
associated performance impact, architectures need to downgrade uses of
-fstack-protector-strong to -fstack-protector (which only triggers for
char arrays) in the compilation units that use the add_random_kstack_offset()
macro and to audit the resulting stack mitigation coverage (to make sure
no desired coverage disappears). This change is not needed on x86
because stack protector is already unconditionally disabled for the
compilation unit, but is needed on arm64. There is, unfortunately,
no attribute that can be used to disable stack protector for specific
functions.

Comparison to PaX RANDKSTACK feature:

The RANDKSTACK feature randomizes the location of the stack start
(cpu_current_top_of_stack), i.e. including the location of pt_regs
structure itself on the stack. Initially this patch followed the same
approach, but during the recent discussions[2] it was determined to be
of little value since, if ptrace functionality is available to an
attacker, they can use PTRACE_PEEKUSR/PTRACE_POKEUSR to read/write
different offsets in the pt_regs struct, observe the cache behavior of
the pt_regs accesses, and figure out the random stack offset. Another
difference is that the random offset is stored in a per-cpu variable
rather than being per-thread. As a result, the two implementations
differ a fair bit in their details and results, though obviously the
intent is similar.

[1] https://lore.kernel.org/kernel-hardening/2236FBA76BA1254E88B949DDB74E612BA4BC57C1@IRSMSX102.ger.corp.intel.com/
[2] https://lore.kernel.org/kernel-hardening/20190329081358.30497-1-elena.reshetova@intel.com/
[3] https://lists.ubuntu.com/archives/ubuntu-devel/2019-June/040741.html

Co-developed-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Link: https://lore.kernel.org/r/20190415060918.3766-1-elena.reshetova@intel.com
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 Makefile                         |  4 ++++
 arch/Kconfig                     | 23 ++++++++++++++++++
 include/linux/randomize_kstack.h | 40 ++++++++++++++++++++++++++++++++
 init/main.c                      | 23 ++++++++++++++++++
 4 files changed, 90 insertions(+)
 create mode 100644 include/linux/randomize_kstack.h

diff --git a/Makefile b/Makefile
index b46e91bf0b0e..8cb7a1388950 100644
--- a/Makefile
+++ b/Makefile
@@ -809,6 +809,10 @@ ifdef CONFIG_INIT_STACK_ALL
 KBUILD_CFLAGS	+= -ftrivial-auto-var-init=pattern
 endif
 
+# While VLAs have been removed, GCC produces unreachable stack probes
+# for the randomize_kstack_offset feature. Disable it for all compilers.
+KBUILD_CFLAGS	+= $(call cc-option,-fno-stack-clash-protection,)
+
 DEBUG_CFLAGS	:= $(call cc-option, -fno-var-tracking-assignments)
 
 ifdef CONFIG_DEBUG_INFO
diff --git a/arch/Kconfig b/arch/Kconfig
index 1ea61290900a..1f52c9cfefca 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -883,6 +883,29 @@ config VMAP_STACK
 	  virtual mappings with real shadow memory, and KASAN_VMALLOC must
 	  be enabled.
 
+config HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
+	def_bool n
+	help
+	  An arch should select this symbol if it can support kernel stack
+	  offset randomization with calls to add_random_kstack_offset()
+	  during syscall entry and choose_random_kstack_offset() during
+	  syscall exit. Downgrading of -fstack-protector-strong to
+	  -fstack-protector should also be applied to the entry code and
+	  closely examined, as the artificial stack bump looks like an array
+	  to the compiler, so it will attempt to add canary checks regardless
+	  of the static branch state.
+
+config RANDOMIZE_KSTACK_OFFSET_DEFAULT
+	bool "Randomize kernel stack offset on syscall entry"
+	depends on HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
+	help
+	  The kernel stack offset can be randomized (after pt_regs) by
+	  roughly 5 bits of entropy, frustrating memory corruption
+	  attacks that depend on stack address determinism or
+	  cross-syscall address exposures. This feature is controlled
+	  by kernel boot param "randomize_kstack_offset=on/off", and this
+	  config chooses the default boot state.
+
 config ARCH_OPTIONAL_KERNEL_RWX
 	def_bool n
 
diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
new file mode 100644
index 000000000000..1df0dc52cadc
--- /dev/null
+++ b/include/linux/randomize_kstack.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _LINUX_RANDOMIZE_KSTACK_H
+#define _LINUX_RANDOMIZE_KSTACK_H
+
+#include <linux/kernel.h>
+#include <linux/jump_label.h>
+#include <linux/percpu-defs.h>
+
+DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
+			 randomize_kstack_offset);
+DECLARE_PER_CPU(u32, kstack_offset);
+
+/*
+ * Do not use this anywhere else in the kernel. This is used here because
+ * it provides an arch-agnostic way to grow the stack with correct
+ * alignment. Also, since this use is being explicitly masked to a max of
+ * 10 bits, stack-clash style attacks are unlikely. For more details see
+ * "VLAs" in Documentation/process/deprecated.rst
+ */
+void *__builtin_alloca(size_t size);
+
+#define add_random_kstack_offset() do {					\
+	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
+				&randomize_kstack_offset)) {		\
+		u32 offset = this_cpu_read(kstack_offset);		\
+		u8 *ptr = __builtin_alloca(offset & 0x3FF);		\
+		asm volatile("" : "=m"(*ptr));				\
+	}								\
+} while (0)
+
+#define choose_random_kstack_offset(rand) do {				\
+	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
+				&randomize_kstack_offset)) {		\
+		u32 offset = this_cpu_read(kstack_offset);		\
+		offset ^= (rand);					\
+		this_cpu_write(kstack_offset, offset);			\
+	}								\
+} while (0)
+
+#endif
diff --git a/init/main.c b/init/main.c
index 0ead83e86b5a..fa8ae0ae3ac2 100644
--- a/init/main.c
+++ b/init/main.c
@@ -822,6 +822,29 @@ static void __init mm_init(void)
 	pti_init();
 }
 
+#ifdef CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
+DEFINE_STATIC_KEY_MAYBE_RO(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
+			   randomize_kstack_offset);
+DEFINE_PER_CPU(u32, kstack_offset);
+
+static int __init early_randomize_kstack_offset(char *buf)
+{
+	int ret;
+	bool bool_result;
+
+	ret = kstrtobool(buf, &bool_result);
+	if (ret)
+		return ret;
+
+	if (bool_result)
+		static_branch_enable(&randomize_kstack_offset);
+	else
+		static_branch_disable(&randomize_kstack_offset);
+	return 0;
+}
+early_param("randomize_kstack_offset", early_randomize_kstack_offset);
+#endif
+
 void __init __weak arch_call_rest_init(void)
 {
 	rest_init();
-- 
2.25.1



* [PATCH v4 4/5] x86/entry: Enable random_kstack_offset support
  2020-06-22 19:31 [PATCH v4 0/5] Optionally randomize kernel stack offset each syscall Kees Cook
                   ` (2 preceding siblings ...)
  2020-06-22 19:31 ` [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall Kees Cook
@ 2020-06-22 19:31 ` Kees Cook
  2020-06-22 19:31 ` [PATCH v4 5/5] arm64: entry: " Kees Cook
  4 siblings, 0 replies; 19+ messages in thread
From: Kees Cook @ 2020-06-22 19:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kees Cook, Elena Reshetova, x86, Andy Lutomirski, Peter Zijlstra,
	Catalin Marinas, Will Deacon, Mark Rutland, Alexander Potapenko,
	Alexander Popov, Ard Biesheuvel, Jann Horn, kernel-hardening,
	linux-arm-kernel, linux-mm, linux-kernel

Allow for a randomized stack offset on a per-syscall basis, with roughly
5 bits of entropy.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/Kconfig        |  1 +
 arch/x86/entry/common.c | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6a0cc524882d..57c2a458150e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -156,6 +156,7 @@ config X86
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if X86_64
 	select HAVE_ARCH_USERFAULTFD_WP         if X86_64 && USERFAULTFD
 	select HAVE_ARCH_VMAP_STACK		if X86_64
+	select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
 	select HAVE_ARCH_WITHIN_STACK_FRAMES
 	select HAVE_ASM_MODVERSIONS
 	select HAVE_CMPXCHG_DOUBLE
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index bd3f14175193..681125bbf09a 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -26,6 +26,7 @@
 #include <linux/livepatch.h>
 #include <linux/syscalls.h>
 #include <linux/uaccess.h>
+#include <linux/randomize_kstack.h>
 
 #ifdef CONFIG_XEN_PV
 #include <xen/xen-ops.h>
@@ -240,6 +241,13 @@ static void __prepare_exit_to_usermode(struct pt_regs *regs)
 	lockdep_assert_irqs_disabled();
 	lockdep_sys_exit();
 
+	/*
+	 * x86_64 stack alignment means 3 bits are ignored, so keep
+	 * the top 5 bits. x86_32 needs only 2 bits of alignment, so
+	 * the top 6 bits will be used.
+	 */
+	choose_random_kstack_offset(rdtsc() & 0xFF);
+
 	cached_flags = READ_ONCE(ti->flags);
 
 	if (unlikely(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
@@ -346,6 +354,7 @@ __visible noinstr void do_syscall_64(unsigned long nr, struct pt_regs *regs)
 {
 	struct thread_info *ti;
 
+	add_random_kstack_offset();
 	enter_from_user_mode();
 	instrumentation_begin();
 
@@ -409,6 +418,7 @@ static void do_syscall_32_irqs_on(struct pt_regs *regs)
 /* Handles int $0x80 */
 __visible noinstr void do_int80_syscall_32(struct pt_regs *regs)
 {
+	add_random_kstack_offset();
 	enter_from_user_mode();
 	instrumentation_begin();
 
@@ -467,6 +477,7 @@ __visible noinstr long do_fast_syscall_32(struct pt_regs *regs)
 	 */
 	regs->ip = landing_pad;
 
+	add_random_kstack_offset();
 	enter_from_user_mode();
 	instrumentation_begin();
 
-- 
2.25.1



* [PATCH v4 5/5] arm64: entry: Enable random_kstack_offset support
  2020-06-22 19:31 [PATCH v4 0/5] Optionally randomize kernel stack offset each syscall Kees Cook
                   ` (3 preceding siblings ...)
  2020-06-22 19:31 ` [PATCH v4 4/5] x86/entry: Enable random_kstack_offset support Kees Cook
@ 2020-06-22 19:31 ` Kees Cook
  2020-06-23  9:40   ` Mark Rutland
  4 siblings, 1 reply; 19+ messages in thread
From: Kees Cook @ 2020-06-22 19:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kees Cook, Elena Reshetova, x86, Andy Lutomirski, Peter Zijlstra,
	Catalin Marinas, Will Deacon, Mark Rutland, Alexander Potapenko,
	Alexander Popov, Ard Biesheuvel, Jann Horn, kernel-hardening,
	linux-arm-kernel, linux-mm, linux-kernel

Allow for a randomized stack offset on a per-syscall basis, with roughly
5 bits of entropy.

In order to avoid unconditional stack canaries on syscall entry, also
downgrade from -fstack-protector-strong to -fstack-protector to avoid
triggering checks due to alloca(). Examining the resulting syscall.o
shows no changes in canary coverage (none before, none now).

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/arm64/Kconfig          |  1 +
 arch/arm64/kernel/Makefile  |  5 +++++
 arch/arm64/kernel/syscall.c | 10 ++++++++++
 3 files changed, 16 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a4a094bedcb2..2902e5316e1a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -135,6 +135,7 @@ config ARM64
 	select HAVE_ARCH_MMAP_RND_BITS
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
 	select HAVE_ARCH_PREL32_RELOCATIONS
+	select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_STACKLEAK
 	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 151f28521f1e..39fc23d3770b 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -11,6 +11,11 @@ CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_insn.o = $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_return_address.o = $(CC_FLAGS_FTRACE)
 
+# Downgrade to -fstack-protector to avoid triggering unneeded stack canary
+# checks due to randomize_kstack_offset.
+CFLAGS_REMOVE_syscall.o += -fstack-protector-strong
+CFLAGS_syscall.o	+= $(subst -fstack-protector-strong,-fstack-protector,$(filter -fstack-protector-strong,$(KBUILD_CFLAGS)))
+
 # Object file lists.
 obj-y			:= debug-monitors.o entry.o irq.o fpsimd.o		\
 			   entry-common.o entry-fpsimd.o process.o ptrace.o	\
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 5f5b868292f5..00d3c84db9cd 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -5,6 +5,7 @@
 #include <linux/errno.h>
 #include <linux/nospec.h>
 #include <linux/ptrace.h>
+#include <linux/randomize_kstack.h>
 #include <linux/syscalls.h>
 
 #include <asm/daifflags.h>
@@ -42,6 +43,8 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
 {
 	long ret;
 
+	add_random_kstack_offset();
+
 	if (scno < sc_nr) {
 		syscall_fn_t syscall_fn;
 		syscall_fn = syscall_table[array_index_nospec(scno, sc_nr)];
@@ -51,6 +54,13 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
 	}
 
 	regs->regs[0] = ret;
+
+	/*
+	 * Since the compiler chooses a 4 bit alignment for the stack,
+	 * let's save one additional bit (9 total), which gets us up
+	 * near 5 bits of entropy.
+	 */
+	choose_random_kstack_offset(get_random_int() & 0x1FF);
 }
 
 static inline bool has_syscall_work(unsigned long flags)
-- 
2.25.1



* Re: [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall
  2020-06-22 19:31 ` [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall Kees Cook
@ 2020-06-22 19:40   ` Randy Dunlap
  2020-06-22 21:26     ` Kees Cook
  2020-06-22 20:07   ` Jann Horn
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Randy Dunlap @ 2020-06-22 19:40 UTC (permalink / raw)
  To: Kees Cook, Thomas Gleixner
  Cc: Elena Reshetova, x86, Andy Lutomirski, Peter Zijlstra,
	Catalin Marinas, Will Deacon, Mark Rutland, Alexander Potapenko,
	Alexander Popov, Ard Biesheuvel, Jann Horn, kernel-hardening,
	linux-arm-kernel, linux-mm, linux-kernel

On 6/22/20 12:31 PM, Kees Cook wrote:
> This provides the ability for architectures to enable kernel stack base
> address offset randomization. This feature is controlled by the boot
> param "randomize_kstack_offset=on/off", with its default value set by
> CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
> 
> Co-developed-by: Elena Reshetova <elena.reshetova@intel.com>
> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> Link: https://lore.kernel.org/r/20190415060918.3766-1-elena.reshetova@intel.com
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  Makefile                         |  4 ++++
>  arch/Kconfig                     | 23 ++++++++++++++++++
>  include/linux/randomize_kstack.h | 40 ++++++++++++++++++++++++++++++++
>  init/main.c                      | 23 ++++++++++++++++++
>  4 files changed, 90 insertions(+)
>  create mode 100644 include/linux/randomize_kstack.h

Hi,
Please add documentation for the new kernel boot parameter to
Documentation/admin-guide/kernel-parameters.txt.


> diff --git a/arch/Kconfig b/arch/Kconfig
> index 1ea61290900a..1f52c9cfefca 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -883,6 +883,29 @@ config VMAP_STACK
>  	  virtual mappings with real shadow memory, and KASAN_VMALLOC must
>  	  be enabled.
>  
> +config HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
> +	def_bool n
> +	help
> +	  An arch should select this symbol if it can support kernel stack
> +	  offset randomization with calls to add_random_kstack_offset()
> +	  during syscall entry and choose_random_kstack_offset() during
> +	  syscall exit. Downgrading of -fstack-protector-strong to
> +	  -fstack-protector should also be applied to the entry code and
> +	  closely examined, as the artificial stack bump looks like an array
> +	  to the compiler, so it will attempt to add canary checks regardless
> +	  of the static branch state.
> +
> +config RANDOMIZE_KSTACK_OFFSET_DEFAULT
> +	bool "Randomize kernel stack offset on syscall entry"
> +	depends on HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
> +	help
> +	  The kernel stack offset can be randomized (after pt_regs) by
> +	  roughly 5 bits of entropy, frustrating memory corruption
> +	  attacks that depend on stack address determinism or
> +	  cross-syscall address exposures. This feature is controlled
> +	  by kernel boot param "randomize_kstack_offset=on/off", and this
> +	  config chooses the default boot state.


thanks.
-- 
~Randy



* Re: [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall
  2020-06-22 19:31 ` [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall Kees Cook
  2020-06-22 19:40   ` Randy Dunlap
@ 2020-06-22 20:07   ` Jann Horn
  2020-06-22 21:30     ` Kees Cook
  2020-06-22 22:56   ` Arvind Sankar
  2020-06-23 12:38   ` Alexander Popov
  3 siblings, 1 reply; 19+ messages in thread
From: Jann Horn @ 2020-06-22 20:07 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Elena Reshetova, the arch/x86 maintainers,
	Andy Lutomirski, Peter Zijlstra, Catalin Marinas, Will Deacon,
	Mark Rutland, Alexander Potapenko, Alexander Popov,
	Ard Biesheuvel, Kernel Hardening, Linux ARM, Linux-MM,
	kernel list

On Mon, Jun 22, 2020 at 9:31 PM Kees Cook <keescook@chromium.org> wrote:
> This provides the ability for architectures to enable kernel stack base
> address offset randomization. This feature is controlled by the boot
> param "randomize_kstack_offset=on/off", with its default value set by
> CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
[...]
> +#define add_random_kstack_offset() do {                                        \
> +       if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
> +                               &randomize_kstack_offset)) {            \
> +               u32 offset = this_cpu_read(kstack_offset);              \
> +               u8 *ptr = __builtin_alloca(offset & 0x3FF);             \
> +               asm volatile("" : "=m"(*ptr));                          \
> +       }                                                               \
> +} while (0)

clang generates better code here if the mask is stack-aligned -
otherwise it needs to round the stack pointer / the offset:

$ cat alloca_align.c
#include <alloca.h>
void callee(void);

void alloca_blah(unsigned long rand) {
  asm volatile(""::"r"(alloca(rand & MASK)));
  callee();
}
$ clang -O3 -c -o alloca_align.o alloca_align.c -DMASK=0x3ff
$ objdump -d alloca_align.o
[...]
   0: 55                    push   %rbp
   1: 48 89 e5              mov    %rsp,%rbp
   4: 81 e7 ff 03 00 00    and    $0x3ff,%edi
   a: 83 c7 0f              add    $0xf,%edi
   d: 83 e7 f0              and    $0xfffffff0,%edi
  10: 48 89 e0              mov    %rsp,%rax
  13: 48 29 f8              sub    %rdi,%rax
  16: 48 89 c4              mov    %rax,%rsp
  19: e8 00 00 00 00        callq  1e <alloca_blah+0x1e>
  1e: 48 89 ec              mov    %rbp,%rsp
  21: 5d                    pop    %rbp
  22: c3                    retq
$ clang -O3 -c -o alloca_align.o alloca_align.c -DMASK=0x3f0
$ objdump -d alloca_align.o
[...]
   0: 55                    push   %rbp
   1: 48 89 e5              mov    %rsp,%rbp
   4: 48 89 e0              mov    %rsp,%rax
   7: 81 e7 f0 03 00 00    and    $0x3f0,%edi
   d: 48 29 f8              sub    %rdi,%rax
  10: 48 89 c4              mov    %rax,%rsp
  13: e8 00 00 00 00        callq  18 <alloca_blah+0x18>
  18: 48 89 ec              mov    %rbp,%rsp
  1b: 5d                    pop    %rbp
  1c: c3                    retq
$

(From a glance at the assembly, gcc seems to always assume that the
length may be misaligned.)

Maybe this should be something along the lines of
__builtin_alloca(offset & (0x3ff & ARCH_STACK_ALIGN_MASK)) (with
appropriate definitions of the stack alignment mask depending on the
architecture's choice of stack alignment for kernel code).


* Re: [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall
  2020-06-22 19:40   ` Randy Dunlap
@ 2020-06-22 21:26     ` Kees Cook
  0 siblings, 0 replies; 19+ messages in thread
From: Kees Cook @ 2020-06-22 21:26 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Thomas Gleixner, Elena Reshetova, x86, Andy Lutomirski,
	Peter Zijlstra, Catalin Marinas, Will Deacon, Mark Rutland,
	Alexander Potapenko, Alexander Popov, Ard Biesheuvel, Jann Horn,
	kernel-hardening, linux-arm-kernel, linux-mm, linux-kernel

On Mon, Jun 22, 2020 at 12:40:49PM -0700, Randy Dunlap wrote:
> On 6/22/20 12:31 PM, Kees Cook wrote:
> > This provides the ability for architectures to enable kernel stack base
> > address offset randomization. This feature is controlled by the boot
> > param "randomize_kstack_offset=on/off", with its default value set by
> > CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
> > 
> > Co-developed-by: Elena Reshetova <elena.reshetova@intel.com>
> > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> > Link: https://lore.kernel.org/r/20190415060918.3766-1-elena.reshetova@intel.com
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > ---
> >  Makefile                         |  4 ++++
> >  arch/Kconfig                     | 23 ++++++++++++++++++
> >  include/linux/randomize_kstack.h | 40 ++++++++++++++++++++++++++++++++
> >  init/main.c                      | 23 ++++++++++++++++++
> >  4 files changed, 90 insertions(+)
> >  create mode 100644 include/linux/randomize_kstack.h
> 
> Please add documentation for the new kernel boot parameter to
> Documentation/admin-guide/kernel-parameters.txt.

Oops, yes. Thanks for the reminder!

(I wonder if checkpatch can notice "+early_param" and suggest the Doc
update hmmm)

-- 
Kees Cook


* Re: [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall
  2020-06-22 20:07   ` Jann Horn
@ 2020-06-22 21:30     ` Kees Cook
  2020-06-22 21:42       ` Jann Horn
  0 siblings, 1 reply; 19+ messages in thread
From: Kees Cook @ 2020-06-22 21:30 UTC (permalink / raw)
  To: Jann Horn
  Cc: Thomas Gleixner, Elena Reshetova, the arch/x86 maintainers,
	Andy Lutomirski, Peter Zijlstra, Catalin Marinas, Will Deacon,
	Mark Rutland, Alexander Potapenko, Alexander Popov,
	Ard Biesheuvel, Kernel Hardening, Linux ARM, Linux-MM,
	kernel list

On Mon, Jun 22, 2020 at 10:07:37PM +0200, Jann Horn wrote:
> On Mon, Jun 22, 2020 at 9:31 PM Kees Cook <keescook@chromium.org> wrote:
> > This provides the ability for architectures to enable kernel stack base
> > address offset randomization. This feature is controlled by the boot
> > param "randomize_kstack_offset=on/off", with its default value set by
> > CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
> [...]
> > +#define add_random_kstack_offset() do {                                        \
> > +       if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
> > +                               &randomize_kstack_offset)) {            \
> > +               u32 offset = this_cpu_read(kstack_offset);              \
> > +               u8 *ptr = __builtin_alloca(offset & 0x3FF);             \
> > +               asm volatile("" : "=m"(*ptr));                          \
> > +       }                                                               \
> > +} while (0)
> 
> clang generates better code here if the mask is stack-aligned -
> otherwise it needs to round the stack pointer / the offset:

Interesting. I was hoping to avoid needing to know the architecture
stack alignment (leaving it up to the compiler).

> 
> $ cat alloca_align.c
> #include <alloca.h>
> void callee(void);
> 
> void alloca_blah(unsigned long rand) {
>   asm volatile(""::"r"(alloca(rand & MASK)));
>   callee();
> }
> $ clang -O3 -c -o alloca_align.o alloca_align.c -DMASK=0x3ff
> $ objdump -d alloca_align.o
> [...]
>    0: 55                    push   %rbp
>    1: 48 89 e5              mov    %rsp,%rbp
>    4: 81 e7 ff 03 00 00    and    $0x3ff,%edi
>    a: 83 c7 0f              add    $0xf,%edi
>    d: 83 e7 f0              and    $0xfffffff0,%edi
>   10: 48 89 e0              mov    %rsp,%rax
>   13: 48 29 f8              sub    %rdi,%rax
>   16: 48 89 c4              mov    %rax,%rsp
>   19: e8 00 00 00 00        callq  1e <alloca_blah+0x1e>
>   1e: 48 89 ec              mov    %rbp,%rsp
>   21: 5d                    pop    %rbp
>   22: c3                    retq
> $ clang -O3 -c -o alloca_align.o alloca_align.c -DMASK=0x3f0
> $ objdump -d alloca_align.o
> [...]
>    0: 55                    push   %rbp
>    1: 48 89 e5              mov    %rsp,%rbp
>    4: 48 89 e0              mov    %rsp,%rax
>    7: 81 e7 f0 03 00 00    and    $0x3f0,%edi
>    d: 48 29 f8              sub    %rdi,%rax
>   10: 48 89 c4              mov    %rax,%rsp
>   13: e8 00 00 00 00        callq  18 <alloca_blah+0x18>
>   18: 48 89 ec              mov    %rbp,%rsp
>   1b: 5d                    pop    %rbp
>   1c: c3                    retq
> $
> 
> (From a glance at the assembly, gcc seems to always assume that the
> length may be misaligned.)

Right -- this is why I didn't bother with it, since it didn't seem to
notice what I'd already done to the alloca() argument. (But from what I
could measure on cycle counts, the additional ALU didn't seem to really
make much difference ... it _would_ be nice to avoid it, of course.)

> Maybe this should be something along the lines of
> __builtin_alloca(offset & (0x3ff & ARCH_STACK_ALIGN_MASK)) (with
> appropriate definitions of the stack alignment mask depending on the
> architecture's choice of stack alignment for kernel code).

Is that explicitly selected anywhere in the kernel? I thought the
alignment was left up to the compiler (as in I've seen bugs fixed where
the kernel had to deal with the alignment choices the compiler was
making...)

-- 
Kees Cook


* Re: [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall
  2020-06-22 21:30     ` Kees Cook
@ 2020-06-22 21:42       ` Jann Horn
  2020-06-22 22:04         ` Kees Cook
  0 siblings, 1 reply; 19+ messages in thread
From: Jann Horn @ 2020-06-22 21:42 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Elena Reshetova, the arch/x86 maintainers,
	Andy Lutomirski, Peter Zijlstra, Catalin Marinas, Will Deacon,
	Mark Rutland, Alexander Potapenko, Alexander Popov,
	Ard Biesheuvel, Kernel Hardening, Linux ARM, Linux-MM,
	kernel list

On Mon, Jun 22, 2020 at 11:30 PM Kees Cook <keescook@chromium.org> wrote:
> On Mon, Jun 22, 2020 at 10:07:37PM +0200, Jann Horn wrote:
> > On Mon, Jun 22, 2020 at 9:31 PM Kees Cook <keescook@chromium.org> wrote:
> > > This provides the ability for architectures to enable kernel stack base
> > > address offset randomization. This feature is controlled by the boot
> > > param "randomize_kstack_offset=on/off", with its default value set by
> > > CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
> > [...]
> > > +#define add_random_kstack_offset() do {                                        \
> > > +       if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
> > > +                               &randomize_kstack_offset)) {            \
> > > +               u32 offset = this_cpu_read(kstack_offset);              \
> > > +               u8 *ptr = __builtin_alloca(offset & 0x3FF);             \
> > > +               asm volatile("" : "=m"(*ptr));                          \
> > > +       }                                                               \
> > > +} while (0)
> >
> > clang generates better code here if the mask is stack-aligned -
> > otherwise it needs to round the stack pointer / the offset:
[...]
> > Maybe this should be something along the lines of
> > __builtin_alloca(offset & (0x3ff & ARCH_STACK_ALIGN_MASK)) (with
> > appropriate definitions of the stack alignment mask depending on the
> > architecture's choice of stack alignment for kernel code).
>
> Is that explicitly selected anywhere in the kernel? I thought the
> alignment was left up to the compiler (as in I've seen bugs fixed where
> the kernel had to deal with the alignment choices the compiler was
> making...)

No, at least on x86-64 and x86, Linux overrides the normal ABI. From
arch/x86/Makefile:

# For gcc stack alignment is specified with -mpreferred-stack-boundary,
# clang has the option -mstack-alignment for that purpose.
ifneq ($(call cc-option, -mpreferred-stack-boundary=4),)
      cc_stack_align4 := -mpreferred-stack-boundary=2
      cc_stack_align8 := -mpreferred-stack-boundary=3
else ifneq ($(call cc-option, -mstack-alignment=16),)
      cc_stack_align4 := -mstack-alignment=4
      cc_stack_align8 := -mstack-alignment=8
endif
[...]
ifeq ($(CONFIG_X86_32),y)
[...]
        # Align the stack to the register width instead of using the default
        # alignment of 16 bytes. This reduces stack usage and the number of
        # alignment instructions.
        KBUILD_CFLAGS += $(call cc-option,$(cc_stack_align4))
[...]
else
[...]
        # By default gcc and clang use a stack alignment of 16 bytes for x86.
        # However the standard kernel entry on x86-64 leaves the stack on an
        # 8-byte boundary. If the compiler isn't informed about the actual
        # alignment it will generate extra alignment instructions for the
        # default alignment which keep the stack *mis*aligned.
        # Furthermore an alignment to the register width reduces stack usage
        # and the number of alignment instructions.
        KBUILD_CFLAGS += $(call cc-option,$(cc_stack_align8))
[...]
endif

Normal x86-64 ABI has 16-byte stack alignment; Linux kernel x86-64 ABI
has 8-byte stack alignment.
Similarly, the normal Linux 32-bit x86 ABI is 16-byte aligned;
meanwhile Linux kernel x86 ABI has 4-byte stack alignment.

This is because userspace code wants the stack to be sufficiently
aligned for fancy SSE instructions and such; the kernel, on the other
hand, never uses those in normal code, and cares about stack usage and
such very much.


* Re: [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall
  2020-06-22 21:42       ` Jann Horn
@ 2020-06-22 22:04         ` Kees Cook
  0 siblings, 0 replies; 19+ messages in thread
From: Kees Cook @ 2020-06-22 22:04 UTC (permalink / raw)
  To: Jann Horn
  Cc: Thomas Gleixner, Elena Reshetova, the arch/x86 maintainers,
	Andy Lutomirski, Peter Zijlstra, Catalin Marinas, Will Deacon,
	Mark Rutland, Alexander Potapenko, Alexander Popov,
	Ard Biesheuvel, Kernel Hardening, Linux ARM, Linux-MM,
	kernel list

On Mon, Jun 22, 2020 at 11:42:29PM +0200, Jann Horn wrote:
> No, at least on x86-64 and x86 Linux overrides the normal ABI. From
> arch/x86/Makefile:

Ah! Thanks for the pointer.

> 
> # For gcc stack alignment is specified with -mpreferred-stack-boundary,
> # clang has the option -mstack-alignment for that purpose.
> ifneq ($(call cc-option, -mpreferred-stack-boundary=4),)
>       cc_stack_align4 := -mpreferred-stack-boundary=2
>       cc_stack_align8 := -mpreferred-stack-boundary=3
> else ifneq ($(call cc-option, -mstack-alignment=16),)
>       cc_stack_align4 := -mstack-alignment=4
>       cc_stack_align8 := -mstack-alignment=8
> endif
> [...]
> ifeq ($(CONFIG_X86_32),y)
> [...]
>         # Align the stack to the register width instead of using the default
>         # alignment of 16 bytes. This reduces stack usage and the number of
>         # alignment instructions.
>         KBUILD_CFLAGS += $(call cc-option,$(cc_stack_align4))
> [...]
> else
> [...]
>         # By default gcc and clang use a stack alignment of 16 bytes for x86.
>         # However the standard kernel entry on x86-64 leaves the stack on an
>         # 8-byte boundary. If the compiler isn't informed about the actual
>         # alignment it will generate extra alignment instructions for the
>         # default alignment which keep the stack *mis*aligned.
>         # Furthermore an alignment to the register width reduces stack usage
>         # and the number of alignment instructions.
>         KBUILD_CFLAGS += $(call cc-option,$(cc_stack_align8))
> [...]
> endif

And it seems that only x86 does this. No other architecture specifies
-mpreferred-stack-boundary...

> Normal x86-64 ABI has 16-byte stack alignment; Linux kernel x86-64 ABI
> has 8-byte stack alignment.
> Similarly, the normal Linux 32-bit x86 ABI is 16-byte aligned;
> meanwhile Linux kernel x86 ABI has 4-byte stack alignment.
> 
> This is because userspace code wants the stack to be sufficiently
> aligned for fancy SSE instructions and such; the kernel, on the other
> hand, never uses those in normal code, and cares about stack usage and
> such very much.

This makes it nicer for Clang:


diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
index 1df0dc52cadc..f7e1f68fb50c 100644
--- a/include/linux/randomize_kstack.h
+++ b/include/linux/randomize_kstack.h
@@ -10,6 +10,14 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
 			 randomize_kstack_offset);
 DECLARE_PER_CPU(u32, kstack_offset);
 
+#ifdef CONFIG_X86_64
+#define ARCH_STACK_ALIGN_MASK	~((1 << 8) - 1)
+#elif defined(CONFIG_X86_32)
+#define ARCH_STACK_ALIGN_MASK	~((1 << 4) - 1)
+#else
+#define ARCH_STACK_ALIGN_MASK	~(0)
+#endif
+
 /*
  * Do not use this anywhere else in the kernel. This is used here because
  * it provides an arch-agnostic way to grow the stack with correct
@@ -23,7 +31,8 @@ void *__builtin_alloca(size_t size);
 	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
 				&randomize_kstack_offset)) {		\
 		u32 offset = this_cpu_read(kstack_offset);		\
-		u8 *ptr = __builtin_alloca(offset & 0x3FF);		\
+		u8 *ptr = __builtin_alloca(offset & 0x3FF &		\
+					   ARCH_STACK_ALIGN_MASK);	\
 		asm volatile("" : "=m"(*ptr));				\
 	}								\
 } while (0)


But I don't like open-coding the x86-only stack alignment... it should be
in Kconfig or something, I think?

-- 
Kees Cook


* Re: [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall
  2020-06-22 19:31 ` [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall Kees Cook
  2020-06-22 19:40   ` Randy Dunlap
  2020-06-22 20:07   ` Jann Horn
@ 2020-06-22 22:56   ` Arvind Sankar
  2020-06-22 23:07     ` Kees Cook
  2020-06-23 12:38   ` Alexander Popov
  3 siblings, 1 reply; 19+ messages in thread
From: Arvind Sankar @ 2020-06-22 22:56 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Elena Reshetova, x86, Andy Lutomirski,
	Peter Zijlstra, Catalin Marinas, Will Deacon, Mark Rutland,
	Alexander Potapenko, Alexander Popov, Ard Biesheuvel, Jann Horn,
	kernel-hardening, linux-arm-kernel, linux-mm, linux-kernel

On Mon, Jun 22, 2020 at 12:31:44PM -0700, Kees Cook wrote:
> +
> +#define add_random_kstack_offset() do {					\
> +	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
> +				&randomize_kstack_offset)) {		\
> +		u32 offset = this_cpu_read(kstack_offset);		\
> +		u8 *ptr = __builtin_alloca(offset & 0x3FF);		\
> +		asm volatile("" : "=m"(*ptr));				\
> +	}								\
> +} while (0)

This feels a little fragile. ptr doesn't escape the block, so the
compiler is free to restore the stack immediately after this block. In
fact, given that all you've said is that the asm modifies *ptr, but
nothing uses that output, the compiler could eliminate the whole thing,
no?

https://godbolt.org/z/HT43F5

gcc restores the stack immediately, if no function calls come after it.

clang completely eliminates the code if no function calls come after.

I'm not sure why function calls should affect it, though, given that
those functions can't possibly access ptr or the memory it points to.

A full memory barrier (like barrier_data) should be better -- it gives
the compiler a reason to believe that ptr might escape and be accessed
by any code following the block?


* Re: [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall
  2020-06-22 22:56   ` Arvind Sankar
@ 2020-06-22 23:07     ` Kees Cook
  2020-06-23  0:05       ` Arvind Sankar
  0 siblings, 1 reply; 19+ messages in thread
From: Kees Cook @ 2020-06-22 23:07 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Thomas Gleixner, Elena Reshetova, x86, Andy Lutomirski,
	Peter Zijlstra, Catalin Marinas, Will Deacon, Mark Rutland,
	Alexander Potapenko, Alexander Popov, Ard Biesheuvel, Jann Horn,
	kernel-hardening, linux-arm-kernel, linux-mm, linux-kernel

On Mon, Jun 22, 2020 at 06:56:15PM -0400, Arvind Sankar wrote:
> On Mon, Jun 22, 2020 at 12:31:44PM -0700, Kees Cook wrote:
> > +
> > +#define add_random_kstack_offset() do {					\
> > +	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
> > +				&randomize_kstack_offset)) {		\
> > +		u32 offset = this_cpu_read(kstack_offset);		\
> > +		u8 *ptr = __builtin_alloca(offset & 0x3FF);		\
> > +		asm volatile("" : "=m"(*ptr));				\
> > +	}								\
> > +} while (0)
> 
> This feels a little fragile. ptr doesn't escape the block, so the
> compiler is free to restore the stack immediately after this block. In
> fact, given that all you've said is that the asm modifies *ptr, but
> nothing uses that output, the compiler could eliminate the whole thing,
> no?
> 
> https://godbolt.org/z/HT43F5
> 
> gcc restores the stack immediately, if no function calls come after it.
> 
> clang completely eliminates the code if no function calls come after.

nothing uses the stack in your example. And adding a barrier (which is
what the "=m" is) doesn't change it.

> I'm not sure why function calls should affect it, though, given that
> those functions can't possibly access ptr or the memory it points to.

It seems to work correctly:
https://godbolt.org/z/c3W5BW

-- 
Kees Cook


* Re: [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall
  2020-06-22 23:07     ` Kees Cook
@ 2020-06-23  0:05       ` Arvind Sankar
  2020-06-23  0:56         ` Kees Cook
  0 siblings, 1 reply; 19+ messages in thread
From: Arvind Sankar @ 2020-06-23  0:05 UTC (permalink / raw)
  To: Kees Cook
  Cc: Arvind Sankar, Thomas Gleixner, Elena Reshetova, x86,
	Andy Lutomirski, Peter Zijlstra, Catalin Marinas, Will Deacon,
	Mark Rutland, Alexander Potapenko, Alexander Popov,
	Ard Biesheuvel, Jann Horn, kernel-hardening, linux-arm-kernel,
	linux-mm, linux-kernel

On Mon, Jun 22, 2020 at 04:07:11PM -0700, Kees Cook wrote:
> On Mon, Jun 22, 2020 at 06:56:15PM -0400, Arvind Sankar wrote:
> > On Mon, Jun 22, 2020 at 12:31:44PM -0700, Kees Cook wrote:
> > > +
> > > +#define add_random_kstack_offset() do {					\
> > > +	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
> > > +				&randomize_kstack_offset)) {		\
> > > +		u32 offset = this_cpu_read(kstack_offset);		\
> > > +		u8 *ptr = __builtin_alloca(offset & 0x3FF);		\
> > > +		asm volatile("" : "=m"(*ptr));				\
> > > +	}								\
> > > +} while (0)
> > 
> > This feels a little fragile. ptr doesn't escape the block, so the
> > compiler is free to restore the stack immediately after this block. In
> > fact, given that all you've said is that the asm modifies *ptr, but
> > nothing uses that output, the compiler could eliminate the whole thing,
> > no?
> > 
> > https://godbolt.org/z/HT43F5
> > 
> > gcc restores the stack immediately, if no function calls come after it.
> > 
> > clang completely eliminates the code if no function calls come after.
> 
> Nothing uses the stack in your example. And adding a barrier (which is
> what the "=m" is) doesn't change it.

Yeah, I realized that that was what's going on. And clang isn't actually
DCE'ing it, it's taking advantage of the red zone since my alloca was
small enough.

But I still don't see anything _stopping_ the compiler from optimizing
this better in the future. The "=m" is not a barrier: it just informs
the compiler that the asm produces an output value in *ptr (and no other
outputs). If nothing can consume that output, it doesn't stop the
compiler from freeing the allocation immediately after the asm instead
of at the end of the function.

I'm talking about something like
	asm volatile("" : : "r" (ptr) : "memory");
which tells the compiler that the asm may change memory arbitrarily.

Here, we don't use it really as a barrier, but to tell the compiler that
the asm may have stashed the value of ptr somewhere in memory, so it's
not free to reuse the space that it pointed to until the function
returns (unless it can prove that nothing accesses memory, not just that
nothing accesses ptr).
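
Concretely, the block would then look something like this (sketch only,
keeping the names from the patch; the final form is of course up to Kees):

	#define add_random_kstack_offset() do {				\
		if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
					&randomize_kstack_offset)) {	\
			u32 offset = this_cpu_read(kstack_offset);	\
			u8 *ptr = __builtin_alloca(offset & 0x3FF);	\
			/* Pretend ptr escapes, so the alloca stays live. */ \
			asm volatile("" : : "r"(ptr) : "memory");	\
		}							\
	} while (0)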

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall
  2020-06-23  0:05       ` Arvind Sankar
@ 2020-06-23  0:56         ` Kees Cook
  2020-06-23 13:42           ` David Laight
  0 siblings, 1 reply; 19+ messages in thread
From: Kees Cook @ 2020-06-23  0:56 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Thomas Gleixner, Elena Reshetova, x86, Andy Lutomirski,
	Peter Zijlstra, Catalin Marinas, Will Deacon, Mark Rutland,
	Alexander Potapenko, Alexander Popov, Ard Biesheuvel, Jann Horn,
	kernel-hardening, linux-arm-kernel, linux-mm, linux-kernel

On Mon, Jun 22, 2020 at 08:05:10PM -0400, Arvind Sankar wrote:
> But I still don't see anything _stopping_ the compiler from optimizing
> this better in the future. The "=m" is not a barrier: it just informs
> the compiler that the asm produces an output value in *ptr (and no other
> outputs). If nothing can consume that output, it doesn't stop the
> compiler from freeing the allocation immediately after the asm instead
> of at the end of the function.

Ah, yeah, I get what you mean.

> I'm talking about something like
> 	asm volatile("" : : "r" (ptr) : "memory");
> which tells the compiler that the asm may change memory arbitrarily.

Yeah, I will adjust it.

> Here, we don't use it really as a barrier, but to tell the compiler that
> the asm may have stashed the value of ptr somewhere in memory, so it's
> not free to reuse the space that it pointed to until the function
> returns (unless it can prove that nothing accesses memory, not just that
> nothing accesses ptr).

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 5/5] arm64: entry: Enable random_kstack_offset support
  2020-06-22 19:31 ` [PATCH v4 5/5] arm64: entry: " Kees Cook
@ 2020-06-23  9:40   ` Mark Rutland
  0 siblings, 0 replies; 19+ messages in thread
From: Mark Rutland @ 2020-06-23  9:40 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Elena Reshetova, x86, Andy Lutomirski,
	Peter Zijlstra, Catalin Marinas, Will Deacon,
	Alexander Potapenko, Alexander Popov, Ard Biesheuvel, Jann Horn,
	kernel-hardening, linux-arm-kernel, linux-mm, linux-kernel

On Mon, Jun 22, 2020 at 12:31:46PM -0700, Kees Cook wrote:
> Allow for a randomized stack offset on a per-syscall basis, with roughly
> 5 bits of entropy.
> 
> In order to avoid unconditional stack canaries on syscall entry, also
> downgrade from -fstack-protector-strong to -fstack-protector to avoid
> triggering checks due to alloca(). Examining the resulting syscall.o
> shows no changes in canary coverage (none before, none now).

Is there any way we can do this on invoke_syscall() specifically with an
attribute? That'd help to keep all the concerns local in the file, and
means that we wouldn't potentially lose protection for other functions
that get added in future.
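
Something like the following, perhaps (purely a sketch: the helper name is
made up, and per-function stack-protector attribute support would need to
be verified for both GCC and clang):

	/* hypothetical per-function annotation */
	#define __no_kstack_protector	__attribute__((__no_stack_protector__))

	static void __no_kstack_protector invoke_syscall(struct pt_regs *regs,
							  unsigned int scno,
							  ...) /* rest unchanged */

which would keep the downgrade scoped to just this function.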

> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  arch/arm64/Kconfig          |  1 +
>  arch/arm64/kernel/Makefile  |  5 +++++
>  arch/arm64/kernel/syscall.c | 10 ++++++++++
>  3 files changed, 16 insertions(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a4a094bedcb2..2902e5316e1a 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -135,6 +135,7 @@ config ARM64
>  	select HAVE_ARCH_MMAP_RND_BITS
>  	select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
>  	select HAVE_ARCH_PREL32_RELOCATIONS
> +	select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
>  	select HAVE_ARCH_SECCOMP_FILTER
>  	select HAVE_ARCH_STACKLEAK
>  	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index 151f28521f1e..39fc23d3770b 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -11,6 +11,11 @@ CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE)
>  CFLAGS_REMOVE_insn.o = $(CC_FLAGS_FTRACE)
>  CFLAGS_REMOVE_return_address.o = $(CC_FLAGS_FTRACE)
>  
> +# Downgrade to -fstack-protector to avoid triggering unneeded stack canary
> +# checks due to randomize_kstack_offset.
> +CFLAGS_REMOVE_syscall.o += -fstack-protector-strong
> +CFLAGS_syscall.o	+= $(subst -fstack-protector-strong,-fstack-protector,$(filter -fstack-protector-strong,$(KBUILD_CFLAGS)))
> +
>  # Object file lists.
>  obj-y			:= debug-monitors.o entry.o irq.o fpsimd.o		\
>  			   entry-common.o entry-fpsimd.o process.o ptrace.o	\
> diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> index 5f5b868292f5..00d3c84db9cd 100644
> --- a/arch/arm64/kernel/syscall.c
> +++ b/arch/arm64/kernel/syscall.c
> @@ -5,6 +5,7 @@
>  #include <linux/errno.h>
>  #include <linux/nospec.h>
>  #include <linux/ptrace.h>
> +#include <linux/randomize_kstack.h>
>  #include <linux/syscalls.h>
>  
>  #include <asm/daifflags.h>
> @@ -42,6 +43,8 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
>  {
>  	long ret;
>  
> +	add_random_kstack_offset();
> +
>  	if (scno < sc_nr) {
>  		syscall_fn_t syscall_fn;
>  		syscall_fn = syscall_table[array_index_nospec(scno, sc_nr)];
> @@ -51,6 +54,13 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
>  	}
>  
>  	regs->regs[0] = ret;
> +
> +	/*
> +	 * Since the compiler chooses a 4 bit alignment for the stack,
> +	 * let's save one additional bit (9 total), which gets us up
> +	 * near 5 bits of entropy.
> +	 */

To explain the alignment requirement a bit better, how about:

	/*
	 * The AAPCS mandates a 16-byte (i.e. 4-bit) aligned SP at
	 * function boundaries. We want at least 5 bits of entropy so we
	 * must randomize at least SP[8:4].
	 */

> +	choose_random_kstack_offset(get_random_int() & 0x1FF);

Do we have a rationale for randomizing bits SP[3:0]? If not, we might
get better code gen with a 0x1F0 mask, since the compiler won't need to
round down the SP.

If we have a rationale that's fine, but we should spell it out more
explicitly in the comment. Even if that's just "randomizing SP[3:0]
isn't harmful, so we randomize those too".
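
I.e. something like (illustrative):

	/*
	 * The AAPCS keeps SP 16-byte aligned, so the low 4 bits of the
	 * alloca size would only get rounded up to the next 16-byte
	 * boundary anyway; mask them off up front to spare the compiler
	 * the rounding.
	 */
	choose_random_kstack_offset(get_random_int() & 0x1F0);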

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall
  2020-06-22 19:31 ` [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall Kees Cook
                     ` (2 preceding siblings ...)
  2020-06-22 22:56   ` Arvind Sankar
@ 2020-06-23 12:38   ` Alexander Popov
  3 siblings, 0 replies; 19+ messages in thread
From: Alexander Popov @ 2020-06-23 12:38 UTC (permalink / raw)
  To: Kees Cook, Thomas Gleixner
  Cc: Elena Reshetova, x86, Andy Lutomirski, Peter Zijlstra,
	Catalin Marinas, Will Deacon, Mark Rutland, Alexander Potapenko,
	Ard Biesheuvel, Jann Horn, kernel-hardening, linux-arm-kernel,
	linux-mm, linux-kernel

On 22.06.2020 22:31, Kees Cook wrote:
> As Linux kernel stack protections have been constantly improving
> (vmap-based stack allocation with guard pages, removal of thread_info,
> STACKLEAK), attackers have had to find new ways for their exploits
> to work. They have done so, continuing to rely on the kernel's stack
> determinism, in situations where VMAP_STACK and THREAD_INFO_IN_TASK_STRUCT
> were not relevant. For example, the following recent attacks would have
> been hampered if the stack offset was non-deterministic between syscalls:
> 
> https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf
> (page 70: targeting the pt_regs copy with linear stack overflow)
> 
> https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
> (leaked stack address from one syscall as a target during next syscall)
> 
> The main idea is that since the stack offset is randomized on each system
> call, it is harder for an attack to reliably land in any particular place
> on the thread stack, even with address exposures, as the stack base will
> change on the next syscall. Also, since randomization is performed after
> placing pt_regs, the ptrace-based approach[1] to discover the randomized
> offset during a long-running syscall should not be possible.

Hello Kees!

I would recommend disabling CONFIG_STACKLEAK_METRICS if kernel stack offset
randomization is enabled. It is a debugging feature that exposes information
about kernel stack usage, and that info could be used to calculate the
random offset.

I would also recommend checking whether there are other debugging or
statistics features that could be used to disclose the random stack offset.
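
One possible (untested) way to express the first point in Kconfig, assuming
the symbol names used in this series and paraphrasing the existing
STACKLEAK_METRICS entry from memory:

	config STACKLEAK_METRICS
		bool "Show STACKLEAK metrics in the /proc file system"
		depends on GCC_PLUGIN_STACKLEAK && PROC_FS
		# hypothetical extra guard:
		depends on !RANDOMIZE_KSTACK_OFFSET_DEFAULT

though since the offset randomization is a boot-time switch, a runtime check
against the randomize_kstack_offset static branch might fit better.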

Best regards,
Alexander

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall
  2020-06-23  0:56         ` Kees Cook
@ 2020-06-23 13:42           ` David Laight
  0 siblings, 0 replies; 19+ messages in thread
From: David Laight @ 2020-06-23 13:42 UTC (permalink / raw)
  To: 'Kees Cook', Arvind Sankar
  Cc: Thomas Gleixner, Elena Reshetova, x86, Andy Lutomirski,
	Peter Zijlstra, Catalin Marinas, Will Deacon, Mark Rutland,
	Alexander Potapenko, Alexander Popov, Ard Biesheuvel, Jann Horn,
	kernel-hardening, linux-arm-kernel, linux-mm, linux-kernel

From: Kees Cook
> Sent: 23 June 2020 01:56
> On Mon, Jun 22, 2020 at 08:05:10PM -0400, Arvind Sankar wrote:
> > But I still don't see anything _stopping_ the compiler from optimizing
> > this better in the future. The "=m" is not a barrier: it just informs
> > the compiler that the asm produces an output value in *ptr (and no other
> > outputs). If nothing can consume that output, it doesn't stop the
> > compiler from freeing the allocation immediately after the asm instead
> > of at the end of the function.
> 
> Ah, yeah, I get what you mean.
> 
> > I'm talking about something like
> > 	asm volatile("" : : "r" (ptr) : "memory");
> > which tells the compiler that the asm may change memory arbitrarily.
> 
> Yeah, I will adjust it.
> 
> > Here, we don't use it really as a barrier, but to tell the compiler that
> > the asm may have stashed the value of ptr somewhere in memory, so it's
> > not free to reuse the space that it pointed to until the function
> > returns (unless it can prove that nothing accesses memory, not just that
> > nothing accesses ptr).

Do you need another asm volatile("" : : "r" (ptr) : "memory");
(or similar) at the bottom of the function - one that the compiler thinks
might access the memory whose address it believes was saved earlier?

I wonder if it would be easier to allocate the stack space
in the asm wrapper? At least as an architecture option.
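
(For the first point, that would mean something like -- sketch only, with
illustrative names, and it would require ptr to outlive the current
do { } while (0) block:

	add_random_kstack_offset();
	...
	ret = do_the_syscall(...);
	...
	asm volatile("" : : "r"(ptr) : "memory");	/* "use" ptr until here */

so the macro would probably have to grow a begin/end pair.)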

	David

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-06-23 13:42 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-22 19:31 [PATCH v4 0/5] Optionally randomize kernel stack offset each syscall Kees Cook
2020-06-22 19:31 ` [PATCH v4 1/5] jump_label: Provide CONFIG-driven build state defaults Kees Cook
2020-06-22 19:31 ` [PATCH v4 2/5] init_on_alloc: Unpessimize default-on builds Kees Cook
2020-06-22 19:31 ` [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall Kees Cook
2020-06-22 19:40   ` Randy Dunlap
2020-06-22 21:26     ` Kees Cook
2020-06-22 20:07   ` Jann Horn
2020-06-22 21:30     ` Kees Cook
2020-06-22 21:42       ` Jann Horn
2020-06-22 22:04         ` Kees Cook
2020-06-22 22:56   ` Arvind Sankar
2020-06-22 23:07     ` Kees Cook
2020-06-23  0:05       ` Arvind Sankar
2020-06-23  0:56         ` Kees Cook
2020-06-23 13:42           ` David Laight
2020-06-23 12:38   ` Alexander Popov
2020-06-22 19:31 ` [PATCH v4 4/5] x86/entry: Enable random_kstack_offset support Kees Cook
2020-06-22 19:31 ` [PATCH v4 5/5] arm64: entry: " Kees Cook
2020-06-23  9:40   ` Mark Rutland
