* [PATCH v8 2/4] x86/syscalls: Optimize address limit check
2017-04-26 18:34 [PATCH v8 1/4] syscalls: Verify address limit before returning to user-mode Thomas Garnier
@ 2017-04-26 18:34 ` Thomas Garnier
2017-04-26 18:34 ` [PATCH v8 3/4] arm/syscalls: " Thomas Garnier
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: Thomas Garnier @ 2017-04-26 18:34 UTC (permalink / raw)
To: Martin Schwidefsky, Heiko Carstens, Dave Hansen, Arnd Bergmann,
David Howells, René Nyffenegger, Thomas Garnier,
Andrew Morton, Paul E . McKenney, Ingo Molnar,
Eric W . Biederman, Thomas Gleixner, Oleg Nesterov,
Pavel Tikhomirov, Ingo Molnar, H . Peter Anvin, Andy Lutomirski,
Paolo Bonzini, Kees Cook, Rik van Riel, Josh Poimboeuf,
Borislav Petkov, Brian Gerst, Kirill A . Shutemov,
Christian Borntraeger, Russell King, Will Deacon,
Catalin Marinas, Mark Rutland, James Morse
Cc: linux-s390, linux-kernel, linux-api, x86, linux-arm-kernel,
kernel-hardening
Disable the generic address limit check in favor of an architecture
specific optimized implementation.
The user-mode state check is added to the prepare_exit_to_usermode
function. This function is called before any user-mode return on 32-bit
and on the 64-bit syscall slowpath. For the 64-bit syscall fast path, an
assembly address limit check redirects to the slow path if the address
limit is different.
The TASK_SIZE_MAX definition is moved to the pgtable_64_types header so
it can be used in assembly code.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
Based on next-20170426
---
arch/x86/Kconfig | 1 +
arch/x86/entry/common.c | 3 +++
arch/x86/entry/entry_64.S | 8 ++++++++
arch/x86/include/asm/pgtable_64_types.h | 11 +++++++++++
arch/x86/include/asm/processor.h | 11 -----------
5 files changed, 23 insertions(+), 11 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cd18994a9555..f3ce1859bd61 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -63,6 +63,7 @@ config X86
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
+ select ADDR_LIMIT_CHECK
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index cdefcfdd9e63..057d133d7b78 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -23,6 +23,7 @@
#include <linux/user-return-notifier.h>
#include <linux/uprobes.h>
#include <linux/livepatch.h>
+#include <linux/syscalls.h>
#include <asm/desc.h>
#include <asm/traps.h>
@@ -183,6 +184,8 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
struct thread_info *ti = current_thread_info();
u32 cached_flags;
+ addr_limit_check_syscall();
+
if (IS_ENABLED(CONFIG_PROVE_LOCKING) && WARN_ON(!irqs_disabled()))
local_irq_disable();
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 607d72c4a485..62aca6dcdaf4 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -218,6 +218,14 @@ entry_SYSCALL_64_fastpath:
testl $_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
jnz 1f
+ /*
+ * If address limit is not based on user-mode, jump to slow path for
+ * additional security checks.
+ */
+ movq $TASK_SIZE_MAX, %rcx
+ cmp %rcx, TASK_addr_limit(%r11)
+ jne 1f
+
LOCKDEP_SYS_EXIT
TRACE_IRQS_ON /* user mode is traced as IRQs on */
movq RIP(%rsp), %rcx
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 06470da156ba..78af4d43a7ce 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -104,4 +104,15 @@ typedef struct { pteval_t pte; } pte_t;
#define EARLY_DYNAMIC_PAGE_TABLES 64
+/*
+ * User space process size. 47bits minus one guard page. The guard
+ * page is necessary on Intel CPUs: if a SYSCALL instruction is at
+ * the highest possible canonical userspace address, then that
+ * syscall will enter the kernel with a non-canonical return
+ * address, and SYSRET will explode dangerously. We avoid this
+ * particular problem by preventing anything from being mapped
+ * at the maximum canonical address.
+ */
+#define TASK_SIZE_MAX ((_AC(1, UL) << 47) - PAGE_SIZE)
+
#endif /* _ASM_X86_PGTABLE_64_DEFS_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 3cada998a402..e80822582d3e 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -825,17 +825,6 @@ static inline void spin_lock_prefetch(const void *x)
#define KSTK_ESP(task) (task_pt_regs(task)->sp)
#else
-/*
- * User space process size. 47bits minus one guard page. The guard
- * page is necessary on Intel CPUs: if a SYSCALL instruction is at
- * the highest possible canonical userspace address, then that
- * syscall will enter the kernel with a non-canonical return
- * address, and SYSRET will explode dangerously. We avoid this
- * particular problem by preventing anything from being mapped
- * at the maximum canonical address.
- */
-#define TASK_SIZE_MAX ((1UL << 47) - PAGE_SIZE)
-
/* This decides where the kernel will search for a free chunk of vm
* space during mmap's.
*/
--
2.13.0.rc0.306.g87b477812d-goog
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v8 3/4] arm/syscalls: Optimize address limit check
2017-04-26 18:34 [PATCH v8 1/4] syscalls: Verify address limit before returning to user-mode Thomas Garnier
2017-04-26 18:34 ` [PATCH v8 2/4] x86/syscalls: Optimize address limit check Thomas Garnier
@ 2017-04-26 18:34 ` Thomas Garnier
2017-04-26 18:34 ` [PATCH v8 4/4] arm64/syscalls: " Thomas Garnier
2017-04-27 6:49 ` [PATCH v8 1/4] syscalls: Verify address limit before returning to user-mode Ingo Molnar
3 siblings, 0 replies; 9+ messages in thread
From: Thomas Garnier @ 2017-04-26 18:34 UTC (permalink / raw)
To: Martin Schwidefsky, Heiko Carstens, Dave Hansen, Arnd Bergmann,
David Howells, René Nyffenegger, Thomas Garnier,
Andrew Morton, Paul E . McKenney, Ingo Molnar,
Eric W . Biederman, Thomas Gleixner, Oleg Nesterov,
Pavel Tikhomirov, Ingo Molnar, H . Peter Anvin, Andy Lutomirski,
Paolo Bonzini, Kees Cook, Rik van Riel, Josh Poimboeuf,
Borislav Petkov, Brian Gerst, Kirill A . Shutemov,
Christian Borntraeger, Russell King, Will Deacon,
Catalin Marinas, Mark Rutland, James Morse
Cc: linux-s390, linux-kernel, linux-api, x86, linux-arm-kernel,
kernel-hardening
Disable the generic address limit check in favor of an architecture
specific optimized implementation.
The address limit is checked on each syscall return path to user-mode
path as well as the irq user-mode return function. If the address limit
was changed, a generic handler is called to stop the kernel on an
explicit check.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
Based on next-20170426
---
arch/arm/Kconfig | 1 +
arch/arm/kernel/entry-common.S | 10 +++++++++-
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 4c1a35f15838..f17f30084a4e 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -12,6 +12,7 @@ config ARM
select ARCH_HAVE_CUSTOM_GPIO_H
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_MIGHT_HAVE_PC_PARPORT
+ select ADDR_LIMIT_CHECK
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT if CPU_V7
select ARCH_SUPPORTS_ATOMIC_RMW
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index eb5cd77bf1d8..c83927498f40 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -12,6 +12,7 @@
#include <asm/unistd.h>
#include <asm/ftrace.h>
#include <asm/unwind.h>
+#include <asm/memory.h>
#ifdef CONFIG_AEABI
#include <asm/unistd-oabi.h>
#endif
@@ -27,7 +28,6 @@
#include "entry-header.S"
-
.align 5
#if !(IS_ENABLED(CONFIG_TRACE_IRQFLAGS) || IS_ENABLED(CONFIG_CONTEXT_TRACKING))
/*
@@ -40,9 +40,12 @@ ret_fast_syscall:
UNWIND(.fnstart )
UNWIND(.cantunwind )
disable_irq_notrace @ disable interrupts
+ ldr r2, [tsk, #TI_ADDR_LIMIT]
ldr r1, [tsk, #TI_FLAGS] @ re-check for syscall tracing
tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
bne fast_work_pending
+ cmp r2, #TASK_SIZE
+ blne addr_limit_check_failed
/* perform architecture specific actions before user return */
arch_ret_to_user r1, lr
@@ -66,6 +69,7 @@ ret_fast_syscall:
UNWIND(.cantunwind )
str r0, [sp, #S_R0 + S_OFF]! @ save returned r0
disable_irq_notrace @ disable interrupts
+ ldr r2, [tsk, #TI_ADDR_LIMIT]
ldr r1, [tsk, #TI_FLAGS] @ re-check for syscall tracing
tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
beq no_work_pending
@@ -82,6 +86,7 @@ slow_work_pending:
mov r2, why @ 'syscall'
bl do_work_pending
cmp r0, #0
+ ldreq r2, [tsk, #TI_ADDR_LIMIT]
beq no_work_pending
movlt scno, #(__NR_restart_syscall - __NR_SYSCALL_BASE)
ldmia sp, {r0 - r6} @ have to reload r0 - r6
@@ -99,9 +104,12 @@ ret_slow_syscall:
disable_irq_notrace @ disable interrupts
ENTRY(ret_to_user_from_irq)
ldr r1, [tsk, #TI_FLAGS]
+ ldr r2, [tsk, #TI_ADDR_LIMIT]
tst r1, #_TIF_WORK_MASK
bne slow_work_pending
no_work_pending:
+ cmp r2, #TASK_SIZE
+ blne addr_limit_check_failed
asm_trace_hardirqs_on save = 0
/* perform architecture specific actions before user return */
--
2.13.0.rc0.306.g87b477812d-goog
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v8 4/4] arm64/syscalls: Optimize address limit check
2017-04-26 18:34 [PATCH v8 1/4] syscalls: Verify address limit before returning to user-mode Thomas Garnier
2017-04-26 18:34 ` [PATCH v8 2/4] x86/syscalls: Optimize address limit check Thomas Garnier
2017-04-26 18:34 ` [PATCH v8 3/4] arm/syscalls: " Thomas Garnier
@ 2017-04-26 18:34 ` Thomas Garnier
2017-04-27 6:49 ` [PATCH v8 1/4] syscalls: Verify address limit before returning to user-mode Ingo Molnar
3 siblings, 0 replies; 9+ messages in thread
From: Thomas Garnier @ 2017-04-26 18:34 UTC (permalink / raw)
To: Martin Schwidefsky, Heiko Carstens, Dave Hansen, Arnd Bergmann,
David Howells, René Nyffenegger, Thomas Garnier,
Andrew Morton, Paul E . McKenney, Ingo Molnar,
Eric W . Biederman, Thomas Gleixner, Oleg Nesterov,
Pavel Tikhomirov, Ingo Molnar, H . Peter Anvin, Andy Lutomirski,
Paolo Bonzini, Kees Cook, Rik van Riel, Josh Poimboeuf,
Borislav Petkov, Brian Gerst, Kirill A . Shutemov,
Christian Borntraeger, Russell King, Will Deacon,
Catalin Marinas, Mark Rutland, James Morse
Cc: linux-s390, linux-kernel, linux-api, x86, linux-arm-kernel,
kernel-hardening
Disable the generic address limit check in favor of an architecture
specific optimized implementation.
The address limit is checked on each syscall return path to user-mode.
If it was changed, a generic handler is called to stop the kernel on an
explicit check.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
---
Based on next-20170426
---
arch/arm64/Kconfig | 1 +
arch/arm64/kernel/entry.S | 12 ++++++++++++
2 files changed, 13 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3dcd7ec69bca..fe9466d3bf94 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -25,6 +25,7 @@ config ARM64
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
select ARCH_WANT_FRAME_POINTERS
select ARCH_HAS_UBSAN_SANITIZE_ALL
+ select ADDR_LIMIT_CHECK
select ARM_AMBA
select ARM_ARCH_TIMER
select ARM_GIC
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 43512d4d7df2..c895c4402d32 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -744,6 +744,8 @@ ENDPROC(cpu_switch_to)
ret_fast_syscall:
disable_irq // disable interrupts
str x0, [sp, #S_X0] // returned x0
+ ldr x2, [tsk, #TSK_TI_ADDR_LIMIT] // check addr limit change
+ tbnz x2, #63, addr_limit_fail
ldr x1, [tsk, #TSK_TI_FLAGS] // re-check for syscall tracing
and x2, x1, #_TIF_SYSCALL_WORK
cbnz x2, ret_fast_syscall_trace
@@ -771,6 +773,8 @@ work_pending:
*/
ret_to_user:
disable_irq // disable interrupts
+ ldr x2, [tsk, #TSK_TI_ADDR_LIMIT] // check addr limit change
+ tbnz x2, #63, addr_limit_fail
ldr x1, [tsk, #TSK_TI_FLAGS]
and x2, x1, #_TIF_WORK_MASK
cbnz x2, work_pending
@@ -780,6 +784,14 @@ finish_ret_to_user:
ENDPROC(ret_to_user)
/*
+ * Address limit was incorrect before returning in user-mode which could be
+ * used to elevate privileges. Call the generic handler to stop the kernel on
+ * the appropriate check. This function does not return.
+ */
+addr_limit_fail:
+ b addr_limit_check_failed
+
+/*
* This is how we return from a fork.
*/
ENTRY(ret_from_fork)
--
2.13.0.rc0.306.g87b477812d-goog
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v8 1/4] syscalls: Verify address limit before returning to user-mode
2017-04-26 18:34 [PATCH v8 1/4] syscalls: Verify address limit before returning to user-mode Thomas Garnier
` (2 preceding siblings ...)
2017-04-26 18:34 ` [PATCH v8 4/4] arm64/syscalls: " Thomas Garnier
@ 2017-04-27 6:49 ` Ingo Molnar
2017-04-27 14:16 ` Thomas Garnier
3 siblings, 1 reply; 9+ messages in thread
From: Ingo Molnar @ 2017-04-27 6:49 UTC (permalink / raw)
To: Thomas Garnier
Cc: Martin Schwidefsky, Heiko Carstens, Dave Hansen, Arnd Bergmann,
David Howells, René Nyffenegger, Andrew Morton,
Paul E . McKenney, Eric W . Biederman, Thomas Gleixner,
Oleg Nesterov, Pavel Tikhomirov, Ingo Molnar, H . Peter Anvin,
Andy Lutomirski, Paolo Bonzini, Kees Cook, Rik van Riel,
Josh Poimboeuf, Borislav Petkov, Brian Gerst,
Kirill A . Shutemov, Christian Borntraeger, Russell King,
Will Deacon, Catalin Marinas, Mark Rutland, James Morse,
linux-s390, linux-kernel, linux-api, x86, linux-arm-kernel,
kernel-hardening
* Thomas Garnier <thgarnie@google.com> wrote:
> +
> +/*
> + * Called before coming back to user-mode. Returning to user-mode with an
> + * address limit different than USER_DS can allow to overwrite kernel memory.
> + */
> +static inline void addr_limit_check_syscall(void)
> +{
> + BUG_ON(!segment_eq(get_fs(), USER_DS));
> +}
> +
> +#ifndef CONFIG_ADDR_LIMIT_CHECK
> +#define __CHECK_USERMODE_SYSCALL() \
> + bool user_caller = segment_eq(get_fs(), USER_DS)
> +#define __VERIFY_ADDR_LIMIT() \
> + if (user_caller) addr_limit_check_syscall()
> +#else
> +#define __CHECK_USERMODE_SYSCALL()
> +#define __VERIFY_ADDR_LIMIT()
> +asmlinkage void addr_limit_check_failed(void) __noreturn;
> +#endif
_Please_ harmonize all the externally exposed names and symbols.
There's no reason for this mismash of names:
CONFIG_ADDR_LIMIT_CHECK
__CHECK_USERMODE_SYSCALL
__VERIFY_ADDR_LIMIT
When we could just as easily name them consistently, along the existing pattern:
CONFIG_ADDR_LIMIT_CHECK
__SYSCALL_ADDR_LIMIT_CHECK
__ADDR_LIMIT_CHECK
which should fit into existing nomenclature:
> #define __SYSCALL_DEFINEx(x, name, ...) \
But even with that fixed, the whole construct still looks pretty weird:
> { \
> - long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \
> + long ret; \
> + __CHECK_USERMODE_SYSCALL(); \
> + ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \
> + __ADDR_LIMIT_CHECK(); \
> __MAP(x,__SC_TEST,__VA_ARGS__); \
> __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \
> return ret; \
I think something like this would be more natural to read:
> + ADDR_LIMIT_CHECK_PRE(); \
> + ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \
> + ADDR_LIMIT_CHECK_POST(); \
it's a clear pre/post construct. Also note the lack of double underscores.
BTW., a further simplification would be:
#ifndef ADDR_LIMIT_CHECK_PRE
# define ADDR_LIMIT_CHECK_PRE ...
#endif
This way architectures could override this generic functionality simply by
defining the helpers. Architectures that don't do that get the generic version.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v8 1/4] syscalls: Verify address limit before returning to user-mode
2017-04-27 6:49 ` [PATCH v8 1/4] syscalls: Verify address limit before returning to user-mode Ingo Molnar
@ 2017-04-27 14:16 ` Thomas Garnier
2017-04-27 14:42 ` Thomas Garnier
2017-04-28 6:33 ` Ingo Molnar
0 siblings, 2 replies; 9+ messages in thread
From: Thomas Garnier @ 2017-04-27 14:16 UTC (permalink / raw)
To: Ingo Molnar
Cc: Martin Schwidefsky, Heiko Carstens, Dave Hansen, Arnd Bergmann,
David Howells, René Nyffenegger, Andrew Morton,
Paul E . McKenney, Eric W . Biederman, Thomas Gleixner,
Oleg Nesterov, Pavel Tikhomirov, Ingo Molnar, H . Peter Anvin,
Andy Lutomirski, Paolo Bonzini, Kees Cook, Rik van Riel,
Josh Poimboeuf, Borislav Petkov, Brian Gerst,
Kirill A . Shutemov, Christian Borntraeger, Russell King,
Will Deacon, Catalin Marinas, Mark Rutland, James Morse,
linux-s390, LKML, Linux API, the arch/x86 maintainers,
linux-arm-kernel, Kernel Hardening
On Wed, Apr 26, 2017 at 11:49 PM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> +
>> +/*
>> + * Called before coming back to user-mode. Returning to user-mode with an
>> + * address limit different than USER_DS can allow to overwrite kernel memory.
>> + */
>> +static inline void addr_limit_check_syscall(void)
>> +{
>> + BUG_ON(!segment_eq(get_fs(), USER_DS));
>> +}
>> +
>> +#ifndef CONFIG_ADDR_LIMIT_CHECK
>> +#define __CHECK_USERMODE_SYSCALL() \
>> + bool user_caller = segment_eq(get_fs(), USER_DS)
>> +#define __VERIFY_ADDR_LIMIT() \
>> + if (user_caller) addr_limit_check_syscall()
>> +#else
>> +#define __CHECK_USERMODE_SYSCALL()
>> +#define __VERIFY_ADDR_LIMIT()
>> +asmlinkage void addr_limit_check_failed(void) __noreturn;
>> +#endif
>
> _Please_ harmonize all the externally exposed names and symbols.
>
> There's no reason for this mismash of names:
>
> CONFIG_ADDR_LIMIT_CHECK
>
> __CHECK_USERMODE_SYSCALL
> __VERIFY_ADDR_LIMIT
>
> When we could just as easily name them consistently, along the existing pattern:
>
> CONFIG_ADDR_LIMIT_CHECK
>
> __SYSCALL_ADDR_LIMIT_CHECK
> __ADDR_LIMIT_CHECK
>
> which should fit into existing nomenclature:
>
>> #define __SYSCALL_DEFINEx(x, name, ...) \
>
> But even with that fixed, the whole construct still looks pretty weird:
>
>> { \
>> - long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \
>> + long ret; \
>> + __CHECK_USERMODE_SYSCALL(); \
>> + ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \
>> + __ADDR_LIMIT_CHECK(); \
>> __MAP(x,__SC_TEST,__VA_ARGS__); \
>> __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \
>> return ret; \
>
> I think something like this would be more natural to read:
>
>> + ADDR_LIMIT_CHECK_PRE(); \
>> + ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \
>> + ADDR_LIMIT_CHECK_POST(); \
>
> it's a clear pre/post construct. Also note the lack of double underscores.
I think this construct makes more sense because the first macro check
if the syscall was called by user-mode. I will send an update for this
on this thread.
>
> BTW., a further simplification would be:
>
> #ifndef ADDR_LIMIT_CHECK_PRE
> # define ADDR_LIMIT_CHECK_PRE ...
> #endif
>
> This way architectures could override this generic functionality simply by
> defining the helpers. Architectures that don't do that get the generic version.
I don't think architectures need to do that. The optimizations are
embedding the checks on their architecture-specific code to make it
faster and remove the size impact. The pre/post is fine for the rest.
>
> Thanks,
>
> Ingo
--
Thomas
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v8 1/4] syscalls: Verify address limit before returning to user-mode
2017-04-27 14:16 ` Thomas Garnier
@ 2017-04-27 14:42 ` Thomas Garnier
2017-04-28 6:35 ` Ingo Molnar
2017-04-28 6:33 ` Ingo Molnar
1 sibling, 1 reply; 9+ messages in thread
From: Thomas Garnier @ 2017-04-27 14:42 UTC (permalink / raw)
To: Martin Schwidefsky, Heiko Carstens, Dave Hansen, Arnd Bergmann,
Andrew Morton, Al Viro, Thomas Garnier, David Howells,
René Nyffenegger, Paul E . McKenney, Ingo Molnar,
Thomas Gleixner, Eric W . Biederman, Oleg Nesterov,
Pavel Tikhomirov, Ingo Molnar, H . Peter Anvin, Andy Lutomirski,
Paolo Bonzini, Rik van Riel, Kees Cook, Josh Poimboeuf,
Borislav Petkov, Brian Gerst, Kirill A . Shutemov,
Christian Borntraeger, Russell King, Will Deacon,
Catalin Marinas, Mark Rutland, James Morse
Cc: linux-s390, linux-kernel, linux-api, x86, linux-arm-kernel,
kernel-hardening
Ensure that a syscall does not return to user-mode with a kernel address
limit. If that happens, a process can corrupt kernel-mode memory and
elevate privileges [1].
The CONFIG_ADDR_LIMIT_CHECK option disables the generic check so each
architecture can create optimized versions.
[1] https://bugs.chromium.org/p/project-zero/issues/detail?id=990
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Tested-by: Kees Cook <keescook@chromium.org>
---
Based on next-20170426
---
arch/s390/Kconfig | 1 +
include/linux/syscalls.h | 27 ++++++++++++++++++++++++++-
init/Kconfig | 6 ++++++
kernel/sys.c | 13 +++++++++++++
4 files changed, 46 insertions(+), 1 deletion(-)
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index d25435d94b6e..164de1d24e92 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -103,6 +103,7 @@ config S390
select ARCH_INLINE_WRITE_UNLOCK_BH
select ARCH_INLINE_WRITE_UNLOCK_IRQ
select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE
+ select ADDR_LIMIT_CHECK
select ARCH_SAVE_PAGE_KEYS if HIBERNATION
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 980c3c9b06f8..ebde64f1622c 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -191,6 +191,28 @@ extern struct trace_event_functions exit_syscall_print_funcs;
SYSCALL_METADATA(sname, x, __VA_ARGS__) \
__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
+
+/*
+ * Called before coming back to user-mode. Returning to user-mode with an
+ * address limit different than USER_DS can allow to overwrite kernel memory.
+ */
+static inline void addr_limit_check_syscall(void)
+{
+ BUG_ON(!segment_eq(get_fs(), USER_DS));
+}
+
+#ifndef CONFIG_ADDR_LIMIT_CHECK
+#define ADDR_LIMIT_CHECK_PRE() \
+ bool user_caller = segment_eq(get_fs(), USER_DS)
+#define ADDR_LIMIT_CHECK_POST() \
+ if (user_caller) addr_limit_check_syscall()
+#else
+#define ADDR_LIMIT_CHECK_PRE()
+#define ADDR_LIMIT_CHECK_POST()
+asmlinkage void addr_limit_check_failed(void) __noreturn;
+#endif
+
+
#define __PROTECT(...) asmlinkage_protect(__VA_ARGS__)
#define __SYSCALL_DEFINEx(x, name, ...) \
asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
@@ -199,7 +221,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
{ \
- long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \
+ long ret; \
+ ADDR_LIMIT_CHECK_PRE(); \
+ ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \
+ ADDR_LIMIT_CHECK_POST(); \
__MAP(x,__SC_TEST,__VA_ARGS__); \
__PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \
return ret; \
diff --git a/init/Kconfig b/init/Kconfig
index 42a346b0df43..599d9fe30703 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1961,6 +1961,12 @@ config PROFILING
config TRACEPOINTS
bool
+config ADDR_LIMIT_CHECK
+ bool
+ help
+ Disable the generic address limit check. Allow each architecture to
+ optimize how and when the verification is done.
+
source "arch/Kconfig"
endmenu # General setup
diff --git a/kernel/sys.c b/kernel/sys.c
index 8a94b4eabcaa..a1cbcd715d62 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2458,3 +2458,16 @@ COMPAT_SYSCALL_DEFINE1(sysinfo, struct compat_sysinfo __user *, info)
return 0;
}
#endif /* CONFIG_COMPAT */
+
+#ifdef CONFIG_ADDR_LIMIT_CHECK
+/*
+ * Used when an architecture specific implementation detects an invalid address
+ * limit. This function does not return.
+ */
+asmlinkage void addr_limit_check_failed(void)
+{
+ /* Try to fail on the generic address limit check */
+ addr_limit_check_syscall();
+ panic("Invalid address limit before returning to user-mode");
+}
+#endif
--
2.13.0.rc0.306.g87b477812d-goog
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v8 1/4] syscalls: Verify address limit before returning to user-mode
2017-04-27 14:42 ` Thomas Garnier
@ 2017-04-28 6:35 ` Ingo Molnar
0 siblings, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2017-04-28 6:35 UTC (permalink / raw)
To: Thomas Garnier
Cc: Martin Schwidefsky, Heiko Carstens, Dave Hansen, Arnd Bergmann,
Andrew Morton, Al Viro, David Howells, René Nyffenegger,
Paul E . McKenney, Thomas Gleixner, Eric W . Biederman,
Oleg Nesterov, Pavel Tikhomirov, Ingo Molnar, H . Peter Anvin,
Andy Lutomirski, Paolo Bonzini, Rik van Riel, Kees Cook,
Josh Poimboeuf, Borislav Petkov, Brian Gerst,
Kirill A . Shutemov, Christian Borntraeger, Russell King,
Will Deacon, Catalin Marinas, Mark Rutland, James Morse,
linux-s390, linux-kernel, linux-api, x86, linux-arm-kernel,
kernel-hardening
* Thomas Garnier <thgarnie@google.com> wrote:
> Ensure that a syscall does not return to user-mode with a kernel address
> limit. If that happens, a process can corrupt kernel-mode memory and
> elevate privileges [1].
>
> The CONFIG_ADDR_LIMIT_CHECK option disables the generic check so each
> architecture can create optimized versions.
>
> [1] https://bugs.chromium.org/p/project-zero/issues/detail?id=990
>
> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> Tested-by: Kees Cook <keescook@chromium.org>
> ---
> Based on next-20170426
> ---
> arch/s390/Kconfig | 1 +
> include/linux/syscalls.h | 27 ++++++++++++++++++++++++++-
> init/Kconfig | 6 ++++++
> kernel/sys.c | 13 +++++++++++++
> 4 files changed, 46 insertions(+), 1 deletion(-)
>
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index d25435d94b6e..164de1d24e92 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -103,6 +103,7 @@ config S390
> select ARCH_INLINE_WRITE_UNLOCK_BH
> select ARCH_INLINE_WRITE_UNLOCK_IRQ
> select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE
> + select ADDR_LIMIT_CHECK
> select ARCH_SAVE_PAGE_KEYS if HIBERNATION
> select ARCH_SUPPORTS_ATOMIC_RMW
> select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 980c3c9b06f8..ebde64f1622c 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -191,6 +191,28 @@ extern struct trace_event_functions exit_syscall_print_funcs;
> SYSCALL_METADATA(sname, x, __VA_ARGS__) \
> __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
>
> +
> +/*
> + * Called before coming back to user-mode. Returning to user-mode with an
> + * address limit different than USER_DS can allow to overwrite kernel memory.
> + */
> +static inline void addr_limit_check_syscall(void)
> +{
> + BUG_ON(!segment_eq(get_fs(), USER_DS));
> +}
> +
> +#ifndef CONFIG_ADDR_LIMIT_CHECK
> +#define ADDR_LIMIT_CHECK_PRE() \
> + bool user_caller = segment_eq(get_fs(), USER_DS)
> +#define ADDR_LIMIT_CHECK_POST() \
> + if (user_caller) addr_limit_check_syscall()
> +#else
> +#define ADDR_LIMIT_CHECK_PRE()
> +#define ADDR_LIMIT_CHECK_POST()
> +asmlinkage void addr_limit_check_failed(void) __noreturn;
> +#endif
> +
> +
> #define __PROTECT(...) asmlinkage_protect(__VA_ARGS__)
> #define __SYSCALL_DEFINEx(x, name, ...) \
> asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
> @@ -199,7 +221,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
> asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
> asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
> { \
> - long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \
> + long ret; \
> + ADDR_LIMIT_CHECK_PRE(); \
> + ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \
> + ADDR_LIMIT_CHECK_POST(); \
> __MAP(x,__SC_TEST,__VA_ARGS__); \
> __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \
> return ret; \
> diff --git a/init/Kconfig b/init/Kconfig
> index 42a346b0df43..599d9fe30703 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1961,6 +1961,12 @@ config PROFILING
> config TRACEPOINTS
> bool
>
> +config ADDR_LIMIT_CHECK
> + bool
> + help
> + Disable the generic address limit check. Allow each architecture to
> + optimize how and when the verification is done.
> +
> source "arch/Kconfig"
>
> endmenu # General setup
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 8a94b4eabcaa..a1cbcd715d62 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2458,3 +2458,16 @@ COMPAT_SYSCALL_DEFINE1(sysinfo, struct compat_sysinfo __user *, info)
> return 0;
> }
> #endif /* CONFIG_COMPAT */
> +
> +#ifdef CONFIG_ADDR_LIMIT_CHECK
> +/*
> + * Used when an architecture specific implementation detects an invalid address
> + * limit. This function does not return.
> + */
> +asmlinkage void addr_limit_check_failed(void)
> +{
> + /* Try to fail on the generic address limit check */
> + addr_limit_check_syscall();
> + panic("Invalid address limit before returning to user-mode");
> +}
> +#endif
Ok, this version looks pretty good to me. Could you (re-)send a full series?
I assume some of these changes need to be propagated into the followup patches but
even if not it's better to pick up a clean series.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v8 1/4] syscalls: Verify address limit before returning to user-mode
2017-04-27 14:16 ` Thomas Garnier
2017-04-27 14:42 ` Thomas Garnier
@ 2017-04-28 6:33 ` Ingo Molnar
1 sibling, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2017-04-28 6:33 UTC (permalink / raw)
To: Thomas Garnier
Cc: Martin Schwidefsky, Heiko Carstens, Dave Hansen, Arnd Bergmann,
David Howells, René Nyffenegger, Andrew Morton,
Paul E . McKenney, Eric W . Biederman, Thomas Gleixner,
Oleg Nesterov, Pavel Tikhomirov, Ingo Molnar, H . Peter Anvin,
Andy Lutomirski, Paolo Bonzini, Kees Cook, Rik van Riel,
Josh Poimboeuf, Borislav Petkov, Brian Gerst,
Kirill A . Shutemov, Christian Borntraeger, Russell King,
Will Deacon, Catalin Marinas, Mark Rutland, James Morse,
linux-s390, LKML, Linux API, the arch/x86 maintainers,
linux-arm-kernel, Kernel Hardening
* Thomas Garnier <thgarnie@google.com> wrote:
> > BTW., a further simplification would be:
> >
> > #ifndef ADDR_LIMIT_CHECK_PRE
> > # define ADDR_LIMIT_CHECK_PRE ...
> > #endif
> >
> > This way architectures could override this generic functionality simply by
> > defining the helpers. Architectures that don't do that get the generic version.
>
> I don't think architectures need to do that. The optimizations are
> embedding the checks on their architecture-specific code to make it
> faster and remove the size impact. The pre/post is fine for the rest.
Indeed, only the generic code needs to turn off that code - architectures will
place these callbacks elsewhere.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 9+ messages in thread