From: Ingo Molnar <mingo@kernel.org>
To: Thomas Garnier <thgarnie@google.com>
Cc: "Martin Schwidefsky" <schwidefsky@de.ibm.com>,
	"Heiko Carstens" <heiko.carstens@de.ibm.com>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Dave Hansen" <dave.hansen@intel.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"David Howells" <dhowells@redhat.com>,
	"René Nyffenegger" <mail@renenyffenegger.ch>,
	"Paul E . McKenney" <paulmck@linux.vnet.ibm.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Oleg Nesterov" <oleg@redhat.com>,
	"Stephen Smalley" <sds@tycho.nsa.gov>,
	"Pavel Tikhomirov" <ptikhomirov@virtuozzo.com>,
	"Ingo Molnar" <mingo@redhat.com>,
	"H . Peter Anvin" <hpa@zytor.com>,
	"Andy Lutomirski" <luto@kernel.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Rik van Riel" <riel@redhat.com>,
	"Josh Poimboeuf" <jpoimboe@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Brian Gerst" <brgerst@gmail.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	"Christian Borntraeger" <borntraeger@de.ibm.com>,
	"Russell King" <linux@armlinux.org.uk>,
	"Will Deacon" <will.deacon@arm.com>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Mark Rutland" <mark.rutland@arm.com>,
	"James Morse" <james.morse@arm.com>,
	linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-api@vger.kernel.org, x86@kernel.org,
	linux-arm-kernel@lists.infradead.org,
	kernel-hardening@lists.openwall.com
Subject: Re: [PATCH v7 1/4] syscalls: Restore address limit after a syscall
Date: Tue, 25 Apr 2017 08:33:05 +0200
Message-ID: <20170425063305.hwjuxupa37rwe6zj@gmail.com>
In-Reply-To: <20170410164420.64003-1-thgarnie@google.com>


* Thomas Garnier <thgarnie@google.com> wrote:

> This patch ensures a syscall does not return to user-mode with a kernel
> address limit. If that happened, a process can corrupt kernel-mode
> memory and elevate privileges.

Don't start changelogs with 'This patch' - it's obvious that we are talking about
this patch. Write it in the imperative mood instead:

   Ensure that a syscall does not return to user-mode with a kernel address limit.
   If that happens, a process can corrupt kernel-mode memory and elevate
   privileges.

Also note the wording fix I did ('happened' -> 'happens'). There's another error
elsewhere in this changelog as well.

Please proofread changelogs!

> For example, it would mitigation this bug:
> 
> - https://bugs.chromium.org/p/project-zero/issues/detail?id=990
> 
> The CONFIG_ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE option is also
> added so each architecture can optimize this change.

As I pointed out in my previous reply, this Kconfig name is awfully long - but
that should have been obvious when this changelog was written ...

> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> Tested-by: Kees Cook <keescook@chromium.org>
> ---
> Based on next-20170410
> ---
>  arch/s390/Kconfig        |  1 +
>  include/linux/syscalls.h | 26 +++++++++++++++++++++++++-
>  init/Kconfig             |  6 ++++++
>  kernel/sys.c             | 13 +++++++++++++
>  4 files changed, 45 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index d25435d94b6e..489a0cc6e46b 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -103,6 +103,7 @@ config S390
>  	select ARCH_INLINE_WRITE_UNLOCK_BH
>  	select ARCH_INLINE_WRITE_UNLOCK_IRQ
>  	select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE
> +	select ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE
>  	select ARCH_SAVE_PAGE_KEYS if HIBERNATION
>  	select ARCH_SUPPORTS_ATOMIC_RMW
>  	select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 980c3c9b06f8..801a7a74fe28 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -191,6 +191,27 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>  	SYSCALL_METADATA(sname, x, __VA_ARGS__)			\
>  	__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
>  
> +
> +/*
> + * Called before coming back to user-mode. Returning to user-mode with an
> + * address limit different than USER_DS can allow to overwrite kernel memory.
> + */
> +static inline void verify_pre_usermode_state(void) {
> +	BUG_ON(!segment_eq(get_fs(), USER_DS));
> +}

Non-standard coding style: the opening brace of a function definition goes on
its own line.

> +
> +#ifndef CONFIG_ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE
> +#define __CHECK_USER_CALLER() \
> +	bool user_caller = segment_eq(get_fs(), USER_DS)
> +#define __VERIFY_PRE_USERMODE_STATE() \
> +	if (user_caller) verify_pre_usermode_state()
> +#else
> +#define __CHECK_USER_CALLER()
> +#define __VERIFY_PRE_USERMODE_STATE()
> +asmlinkage void address_limit_check_failed(void);
> +#endif
> +
> +
>  #define __PROTECT(...) asmlinkage_protect(__VA_ARGS__)
>  #define __SYSCALL_DEFINEx(x, name, ...)					\
>  	asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))	\
> @@ -199,7 +220,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>  	asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));	\
>  	asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))	\
>  	{								\
> -		long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));	\
> +		long ret;						\
> +		__CHECK_USER_CALLER();					\
> +		ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));	\
> +		__VERIFY_PRE_USERMODE_STATE();				\
>  		__MAP(x,__SC_TEST,__VA_ARGS__);				\
>  		__PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__));	\
>  		return ret;						\

BTW, the '__VERIFY_PRE_USERMODE_STATE()' name is highly misleading: the 'pre'
prefix suggests that this is done before a system call, while it's actually done
afterwards.

The solution is to not try to specify the exact call placement in the name: just
describe the functionality (and harmonize the names along a common prefix).
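
To illustrate the naming point, here is a minimal userspace sketch of the wrapper
logic with names that describe the functionality and share one prefix. Everything
in it is an assumption made for illustration: the ADDR_LIMIT_* macro names,
addr_limit_check(), the violation counter that stands in for BUG_ON(), and the
mock get_fs()/set_fs()/segment_eq() primitives are not taken from the patch or
from any tree.

```c
#include <stdbool.h>

/* Userspace stand-ins for the kernel primitives (illustrative assumptions). */
typedef struct { unsigned long seg; } mm_segment_t;
#define USER_DS			((mm_segment_t){ 0x7fffffffUL })
#define KERNEL_DS		((mm_segment_t){ ~0UL })
#define segment_eq(a, b)	((a).seg == (b).seg)

static mm_segment_t current_addr_limit = { 0x7fffffffUL };
static int addr_limit_violations;	/* counted here instead of BUG_ON() */

static mm_segment_t get_fs(void) { return current_addr_limit; }
static void set_fs(mm_segment_t fs) { current_addr_limit = fs; }

/* Hypothetical name: says what is checked, not where the call is placed. */
static inline void addr_limit_check(void)
{
	if (!segment_eq(get_fs(), USER_DS))
		addr_limit_violations++;
}

/* Hypothetical macro names, harmonized along the same prefix. */
#define ADDR_LIMIT_USER_CALLER() \
	bool user_caller = segment_eq(get_fs(), USER_DS)
#define ADDR_LIMIT_CHECK() \
	do { if (user_caller) addr_limit_check(); } while (0)

/* Stand-in syscall bodies: one well behaved, one that leaks KERNEL_DS. */
static long sysc_good(void) { return 0; }
static long sysc_leaky(void) { set_fs(KERNEL_DS); return 0; }

/* Models the body of the generated SyS##name() wrapper. */
static long syscall_wrapper(long (*body)(void))
{
	long ret;
	ADDR_LIMIT_USER_CALLER();

	ret = body();
	ADDR_LIMIT_CHECK();
	return ret;
}
```

Calling syscall_wrapper(sysc_leaky) from a user-limit context is the case the
check is meant to catch; in this sketch it only bumps the counter.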

> +config ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE
> +	bool
> +	help
> +	  Disable the generic pre-usermode state verification. Allow each
> +	  architecture to optimize how and when the verification is done.
> +

Please name the Kconfig symbols something like this:

	CONFIG_ADDR_LIMIT_CHECK
	CONFIG_ADDR_LIMIT_CHECK_ARCH

or so, which tells us whether the check is done by the architecture code, without
breaking the 80-column limit with a single Kconfig name.

BTW:

> +#ifdef CONFIG_ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE
> +/*
> + * This function is called when an architecture specific implementation detected
> + * an invalid address limit. The generic user-mode state checker will finish on
> + * the appropriate BUG_ON.
> + */
> +asmlinkage void address_limit_check_failed(void)
> +{
> +	verify_pre_usermode_state();
> +	panic("address_limit_check_failed called with a valid user-mode state");

It's very unconstructive to unconditionally panic the system, just because some 
kernel code leaked the address limit! Do a warn-once printout and kill the current 
task (i.e. don't continue execution), but don't crash everything else!
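
A rough userspace model of that failure policy, as a sketch only: struct task,
warn_once(), kill_task() and the one-argument addr_limit_check_failed() below are
hypothetical stand-ins. In the kernel the printout would presumably go through
something like WARN_ONCE() and the kill would target 'current', but the exact
calls are left open here.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for a task; in the kernel this would be 'current'. */
struct task {
	int pid;
	bool killed;
};

static int warnings_printed;

/* Stand-in for a warn-once printout: only the first call prints. */
static void warn_once(const char *msg)
{
	static bool warned;

	if (warned)
		return;
	warned = true;
	warnings_printed++;
	fprintf(stderr, "WARNING: %s\n", msg);
}

/* Stand-in for killing only the offending task. */
static void kill_task(struct task *t)
{
	t->killed = true;
}

/* Hypothetical failure path: warn once, kill the task, keep the system up. */
static void addr_limit_check_failed(struct task *t)
{
	warn_once("syscall returned with a kernel address limit");
	kill_task(t);
}
```

The point of the model is that a second offending task still gets killed, while
the log is not flooded and nothing else is taken down.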

Thanks,

	Ingo
