From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-8.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,
	URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id E0941CA9EA0
	for ; Fri, 25 Oct 2019 10:56:50 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id A42A121929
	for ; Fri, 25 Oct 2019 10:56:50 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S2409788AbfJYK4t (ORCPT );
	Fri, 25 Oct 2019 06:56:49 -0400
Received: from foss.arm.com ([217.140.110.172]:38956 "EHLO foss.arm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S2407177AbfJYK4t (ORCPT );
	Fri, 25 Oct 2019 06:56:49 -0400
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 233811FB;
	Fri, 25 Oct 2019 03:56:48 -0700 (PDT)
Received: from lakrids.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 03B9C3F6C4;
	Fri, 25 Oct 2019 03:56:45 -0700 (PDT)
Date: Fri, 25 Oct 2019 11:56:43 +0100
From: Mark Rutland
To: samitolvanen@google.com
Cc: Will Deacon, Catalin Marinas, Steven Rostedt, Masami Hiramatsu,
	Ard Biesheuvel, Dave Martin, Kees Cook, Laura Abbott,
	Nick Desaulniers, Jann Horn, Miguel Ojeda, Masahiro Yamada,
	clang-built-linux@googlegroups.com,
	kernel-hardening@lists.openwall.com,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 05/17] add support for Clang's Shadow Call Stack (SCS)
Message-ID: <20191025105643.GD40270@lakrids.cambridge.arm.com>
References: <20191018161033.261971-1-samitolvanen@google.com>
	<20191024225132.13410-1-samitolvanen@google.com>
	<20191024225132.13410-6-samitolvanen@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20191024225132.13410-6-samitolvanen@google.com>
User-Agent: Mutt/1.11.1+11 (2f07cb52) (2018-12-01)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 24, 2019 at 03:51:20PM -0700, samitolvanen@google.com wrote:
> This change adds generic support for Clang's Shadow Call Stack,
> which uses a shadow stack to protect return addresses from being
> overwritten by an attacker. Details are available here:
>
>   https://clang.llvm.org/docs/ShadowCallStack.html
>
> Note that security guarantees in the kernel differ from the
> ones documented for user space. The kernel must store addresses
> of shadow stacks used by other tasks and interrupt handlers in
> memory, which means an attacker capable reading and writing
> arbitrary memory may be able to locate them and hijack control
> flow by modifying shadow stacks that are not currently in use.
>
> Signed-off-by: Sami Tolvanen
> ---
>  Makefile                       |   6 ++
>  arch/Kconfig                   |  33 +++++++
>  include/linux/compiler-clang.h |   6 ++
>  include/linux/compiler_types.h |   4 +
>  include/linux/scs.h            |  78 +++++++++++++++++
>  init/init_task.c               |   8 ++
>  kernel/Makefile                |   1 +
>  kernel/fork.c                  |   9 ++
>  kernel/sched/core.c            |   2 +
>  kernel/sched/sched.h           |   1 +
>  kernel/scs.c                   | 155 +++++++++++++++++++++++++++++++++
>  11 files changed, 303 insertions(+)
>  create mode 100644 include/linux/scs.h
>  create mode 100644 kernel/scs.c
>
> diff --git a/Makefile b/Makefile
> index 5475cdb6d57d..2b5c59fb18f2 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -846,6 +846,12 @@ ifdef CONFIG_LIVEPATCH
>  KBUILD_CFLAGS += $(call cc-option, -flive-patching=inline-clone)
>  endif
>
> +ifdef CONFIG_SHADOW_CALL_STACK
> +CC_FLAGS_SCS := -fsanitize=shadow-call-stack
> +KBUILD_CFLAGS += $(CC_FLAGS_SCS)
> +export CC_FLAGS_SCS
> +endif
> +
>  # arch Makefile may override CC so keep this after arch Makefile is included
>  NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) -print-file-name=include)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 5f8a5d84dbbe..5e34cbcd8d6a 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -521,6 +521,39 @@ config STACKPROTECTOR_STRONG
>  	  about 20% of all kernel functions, which increases the kernel code
>  	  size by about 2%.
>
> +config ARCH_SUPPORTS_SHADOW_CALL_STACK
> +	bool
> +	help
> +	  An architecture should select this if it supports Clang's Shadow
> +	  Call Stack, has asm/scs.h, and implements runtime support for shadow
> +	  stack switching.
> +
> +config SHADOW_CALL_STACK_VMAP
> +	bool
> +	depends on SHADOW_CALL_STACK
> +	help
> +	  Use virtually mapped shadow call stacks. Selecting this option
> +	  provides better stack exhaustion protection, but increases per-thread
> +	  memory consumption as a full page is allocated for each shadow stack.
> +
> +config SHADOW_CALL_STACK
> +	bool "Clang Shadow Call Stack"
> +	depends on ARCH_SUPPORTS_SHADOW_CALL_STACK
> +	help
> +	  This option enables Clang's Shadow Call Stack, which uses a
> +	  shadow stack to protect function return addresses from being
> +	  overwritten by an attacker. More information can be found from
> +	  Clang's documentation:
> +
> +	    https://clang.llvm.org/docs/ShadowCallStack.html
> +
> +	  Note that security guarantees in the kernel differ from the ones
> +	  documented for user space. The kernel must store addresses of shadow
> +	  stacks used by other tasks and interrupt handlers in memory, which
> +	  means an attacker capable reading and writing arbitrary memory may
> +	  be able to locate them and hijack control flow by modifying shadow
> +	  stacks that are not currently in use.
> +
>  config HAVE_ARCH_WITHIN_STACK_FRAMES
>  	bool
>  	help
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..afe5e24088b2 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -42,3 +42,9 @@
>   * compilers, like ICC.
>   */
>  #define barrier() __asm__ __volatile__("" : : : "memory")
> +
> +#if __has_feature(shadow_call_stack)
> +# define __noscs	__attribute__((no_sanitize("shadow-call-stack")))
> +#else
> +# define __noscs
> +#endif

Huh. I didn't realise it was valid to have a space after the `#` like
this. I see we're very inconsistent about style on that front, so this
is fine, I'll just have to get used to it. :)

> diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
> index 72393a8c1a6c..be5d5be4b1ae 100644
> --- a/include/linux/compiler_types.h
> +++ b/include/linux/compiler_types.h
> @@ -202,6 +202,10 @@ struct ftrace_likely_data {
>  # define randomized_struct_fields_end
>  #endif
>
> +#ifndef __noscs
> +# define __noscs
> +#endif
> +
>  #ifndef asm_volatile_goto
>  #define asm_volatile_goto(x...) asm goto(x)
>  #endif
> diff --git a/include/linux/scs.h b/include/linux/scs.h
> new file mode 100644
> index 000000000000..c8b0ccfdd803
> --- /dev/null
> +++ b/include/linux/scs.h
> @@ -0,0 +1,78 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Shadow Call Stack support.
> + *
> + * Copyright (C) 2018 Google LLC
> + */
> +
> +#ifndef _LINUX_SCS_H
> +#define _LINUX_SCS_H
> +
> +#include
> +#include
> +#include
> +
> +#ifdef CONFIG_SHADOW_CALL_STACK
> +
> +#define SCS_SIZE		1024

I think it'd be worth a comment on how this size was chosen. IIRC this
was empirical?

> +#define SCS_END_MAGIC		0xaf0194819b1635f6UL

Keyboard smash? ... or is there a prize for whoever figures out the
secret? ;)

> +
> +#define GFP_SCS			(GFP_KERNEL | __GFP_ZERO)
> +
> +static inline void *task_scs(struct task_struct *tsk)
> +{
> +	return task_thread_info(tsk)->shadow_call_stack;
> +}
> +
> +static inline void task_set_scs(struct task_struct *tsk, void *s)
> +{
> +	task_thread_info(tsk)->shadow_call_stack = s;
> +}

This should probably be named get and set, or have:

#define task_scs(tsk)	(task_thread_info(tsk)->shadow_call_stack)

... which can have a trivial implementation as NULL for the !SCS case.

> +
> +extern void scs_init(void);
> +extern void scs_task_init(struct task_struct *tsk);
> +extern void scs_task_reset(struct task_struct *tsk);
> +extern int scs_prepare(struct task_struct *tsk, int node);
> +extern bool scs_corrupted(struct task_struct *tsk);
> +extern void scs_release(struct task_struct *tsk);
> +
> +#else /* CONFIG_SHADOW_CALL_STACK */
> +
> +static inline void *task_scs(struct task_struct *tsk)
> +{
> +	return 0;
> +}

For all the trivial wrappers you can put the implementation on the same
line as the prototype. That makes it a bit easier to compare against the
prototypes on the other side of the ifdeffery.

e.g. this lot can be:

static inline void *task_scs(struct task_struct *tsk) { return 0; }
static inline void task_set_scs(struct task_struct *tsk, void *s) { }
static inline void scs_init(void) { }
...

> +#endif /* CONFIG_SHADOW_CALL_STACK */
> +
> +#endif /* _LINUX_SCS_H */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..cbd40460e903 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -11,6 +11,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
> @@ -184,6 +185,13 @@ struct task_struct init_task
>  };
>  EXPORT_SYMBOL(init_task);
>
> +#ifdef CONFIG_SHADOW_CALL_STACK
> +unsigned long init_shadow_call_stack[SCS_SIZE / sizeof(long)] __init_task_data
> +		__aligned(SCS_SIZE) = {
> +	[(SCS_SIZE / sizeof(long)) - 1] = SCS_END_MAGIC
> +};
> +#endif
> +
>  /*
>   * Initial thread structure. Alignment of this is handled by a special
>   * linker map entry.
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..313dbd44d576 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index bcdf53125210..ae7ebe9f0586 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -94,6 +94,7 @@
>  #include
>  #include
>  #include
> +#include

Nit: alphabetical order, please (this should come before stackleak.h).
>
>  #include
>  #include
> @@ -451,6 +452,8 @@ void put_task_stack(struct task_struct *tsk)
>
>  void free_task(struct task_struct *tsk)
>  {
> +	scs_release(tsk);
> +
>  #ifndef CONFIG_THREAD_INFO_IN_TASK
>  	/*
>  	 * The task is finally done with both the stack and thread_info,
> @@ -834,6 +837,8 @@ void __init fork_init(void)
>  			  NULL, free_vm_stack_cache);
>  #endif
>
> +	scs_init();
> +
>  	lockdep_init_task(&init_task);
>  	uprobes_init();
>  }
> @@ -907,6 +912,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>  	clear_user_return_notifier(tsk);
>  	clear_tsk_need_resched(tsk);
>  	set_task_stack_end_magic(tsk);
> +	scs_task_init(tsk);
>
>  #ifdef CONFIG_STACKPROTECTOR
>  	tsk->stack_canary = get_random_canary();
> @@ -2022,6 +2028,9 @@ static __latent_entropy struct task_struct *copy_process(
>  				 args->tls);
>  	if (retval)
>  		goto bad_fork_cleanup_io;
> +	retval = scs_prepare(p, node);
> +	if (retval)
> +		goto bad_fork_cleanup_thread;

Can we please fold scs_prepare() into scs_task_init() and do this in
dup_task_struct()? That way we set this up consistently in one place,
where we're also allocating the regular stack.

Arguably stackleak_task_init() would better fit there too.

>
>  	stackleak_task_init(p);
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index dd05a378631a..e7faeb383008 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6013,6 +6013,8 @@ void init_idle(struct task_struct *idle, int cpu)
>  	raw_spin_lock_irqsave(&idle->pi_lock, flags);
>  	raw_spin_lock(&rq->lock);
>
> +	scs_task_reset(idle);

I'm a bit confused by this -- please see comments below on
scs_task_reset().
> +
> +	__sched_fork(0, idle);
>  	idle->state = TASK_RUNNING;
>  	idle->se.exec_start = sched_clock();
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 0db2c1b3361e..c153003a011c 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -58,6 +58,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> diff --git a/kernel/scs.c b/kernel/scs.c
> new file mode 100644
> index 000000000000..383d29e8c199
> --- /dev/null
> +++ b/kernel/scs.c
> @@ -0,0 +1,155 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Shadow Call Stack support.
> + *
> + * Copyright (C) 2019 Google LLC
> + */
> +
> +#include
> +#include
> +#include
> +#include

Nit: alphabetical order, please.

> +#include
> +#include
> +
> +static inline void *__scs_base(struct task_struct *tsk)
> +{
> +	return (void *)((uintptr_t)task_scs(tsk) & ~(SCS_SIZE - 1));
> +}

We only ever assign the base to task_scs(tsk), with the current live
value being in a register that we don't read. Are we expecting arch code
to keep this up-to-date with the register value?

I would have expected that we just leave this as the base (as we do for
the regular stack in the task struct), and it's down to arch code to
save/restore the current value where necessary.

Am I missing some caveat with that approach?

> +
> +#ifdef CONFIG_SHADOW_CALL_STACK_VMAP
> +
> +/* Keep a cache of shadow stacks */
> +#define SCS_CACHE_SIZE 2
> +static DEFINE_PER_CPU(void *, scs_cache[SCS_CACHE_SIZE]);
> +
> +static void *scs_alloc(int node)
> +{
> +	int i;
> +
> +	for (i = 0; i < SCS_CACHE_SIZE; i++) {
> +		void *s;
> +
> +		s = this_cpu_xchg(scs_cache[i], NULL);
> +		if (s) {
> +			memset(s, 0, SCS_SIZE);
> +			return s;
> +		}
> +	}
> +
> +	BUILD_BUG_ON(SCS_SIZE > PAGE_SIZE);

It's probably worth a comment on why we rely on SCS_SIZE <= PAGE_SIZE.
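
One plausible reason, going by the __vmalloc_node_range() call just
below (size PAGE_SIZE, alignment SCS_SIZE) and the
SHADOW_CALL_STACK_VMAP help text: each shadow stack gets exactly one
page, so it has to fit in that page. A rough userspace sketch of the
constraint follows. EXAMPLE_PAGE_SIZE and both helper functions are
made up for illustration, and 4K pages are an assumption:

```c
#include <assert.h>
#include <stddef.h>

#define SCS_SIZE		1024	/* value from the patch */
#define EXAMPLE_PAGE_SIZE	4096	/* assumption: 4K pages */

/* One page is allocated per shadow stack, so the stack must fit in it. */
_Static_assert(SCS_SIZE <= EXAMPLE_PAGE_SIZE,
	       "shadow stack must fit in the single page allocated for it");

/* __scs_base() masks with ~(SCS_SIZE - 1), which needs a power of two. */
_Static_assert((SCS_SIZE & (SCS_SIZE - 1)) == 0,
	       "SCS_SIZE must be a power of two");

/* Bytes of each page not used by the shadow stack (hypothetical helper). */
static size_t scs_unused_bytes_per_page(void)
{
	return EXAMPLE_PAGE_SIZE - SCS_SIZE;
}

/* Runtime self-check for this sketch only; the patch does nothing like it. */
static void scs_layout_selfcheck(void)
{
	assert(scs_unused_bytes_per_page() == 3072);
}
```

If that is indeed the reason, a one-line comment above the
BUILD_BUG_ON() saying so would be enough.
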
> +
> +	return __vmalloc_node_range(PAGE_SIZE, SCS_SIZE,
> +				    VMALLOC_START, VMALLOC_END,
> +				    GFP_SCS, PAGE_KERNEL, 0,
> +				    node, __builtin_return_address(0));
> +}
> +
> +static void scs_free(void *s)
> +{
> +	int i;
> +
> +	for (i = 0; i < SCS_CACHE_SIZE; i++) {
> +		if (this_cpu_cmpxchg(scs_cache[i], 0, s) != 0)
> +			continue;
> +
> +		return;
> +	}
> +
> +	vfree_atomic(s);
> +}
> +
> +static int scs_cleanup(unsigned int cpu)
> +{
> +	int i;
> +	void **cache = per_cpu_ptr(scs_cache, cpu);
> +
> +	for (i = 0; i < SCS_CACHE_SIZE; i++) {
> +		vfree(cache[i]);
> +		cache[i] = NULL;
> +	}
> +
> +	return 0;
> +}
> +
> +void __init scs_init(void)
> +{
> +	cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "scs:scs_cache", NULL,
> +		scs_cleanup);
> +}
> +
> +#else /* !CONFIG_SHADOW_CALL_STACK_VMAP */
> +
> +static struct kmem_cache *scs_cache;
> +
> +static inline void *scs_alloc(int node)
> +{
> +	return kmem_cache_alloc_node(scs_cache, GFP_SCS, node);
> +}
> +
> +static inline void scs_free(void *s)
> +{
> +	kmem_cache_free(scs_cache, s);
> +}
> +
> +void __init scs_init(void)
> +{
> +	scs_cache = kmem_cache_create("scs_cache", SCS_SIZE, SCS_SIZE,
> +				0, NULL);
> +	WARN_ON(!scs_cache);
> +}
> +
> +#endif /* CONFIG_SHADOW_CALL_STACK_VMAP */
> +
> +static inline unsigned long *scs_magic(struct task_struct *tsk)
> +{
> +	return (unsigned long *)(__scs_base(tsk) + SCS_SIZE - sizeof(long));

Slightly simpler as:

	return (unsigned long *)(__scs_base(tsk) + SCS_SIZE) - 1;

Thanks,
Mark.
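
P.S. On the scs_magic() suggestion: a minimal userspace sketch checking
that the two forms yield the same address. The helper names are made up
for illustration, and char * stands in for the GNU-style void *
arithmetic that the kernel code relies on:

```c
#include <assert.h>

#define SCS_SIZE 1024	/* value from the patch */

/* Form used in the patch: subtract sizeof(long) bytes, then cast. */
static unsigned long *magic_by_bytes(void *base)
{
	return (unsigned long *)((char *)base + SCS_SIZE - sizeof(long));
}

/* Suggested form: cast the end address, then step back one element. */
static unsigned long *magic_by_elements(void *base)
{
	return (unsigned long *)((char *)base + SCS_SIZE) - 1;
}

/* Returns 1 if both forms point at the last slot of a test stack. */
static int magic_forms_agree(void)
{
	static unsigned long stack[SCS_SIZE / sizeof(long)];
	unsigned long *last = &stack[SCS_SIZE / sizeof(long) - 1];

	assert(magic_by_bytes(stack) == last);
	return magic_by_elements(stack) == last;
}
```

Both land on the final unsigned long slot, which is where
init_shadow_call_stack[] places SCS_END_MAGIC.
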