All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] syscalls: Restore address limit after a syscall
@ 2017-02-09 18:33 ` Thomas Garnier
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Garnier @ 2017-02-09 18:33 UTC (permalink / raw)
  To: Dave Hansen, Arnd Bergmann, René Nyffenegger, Stephen Bates,
	Jeff Moyer, Milosz Tanski, Thomas Garnier
  Cc: linux-api, linux-kernel, kernel-hardening

This patch prevents a syscall to modify the address limit of the
caller. The address limit is kept by the syscall wrapper and restored
just after the syscall ends.

For example, it would mitigation this bug:

- https://bugs.chromium.org/p/project-zero/issues/detail?id=990

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
Based on next-20170209
---
 include/linux/syscalls.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 91a740f6b884..a1b6a62a9849 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -198,7 +198,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
 	asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));	\
 	asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))	\
 	{								\
-		long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));	\
+		long ret;						\
+		mm_segment_t fs = get_fs();				\
+		ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));	\
+		set_fs(fs);						\
 		__MAP(x,__SC_TEST,__VA_ARGS__);				\
 		__PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__));	\
 		return ret;						\
-- 
2.11.0.483.g087da7b7c-goog

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [kernel-hardening] [RFC] syscalls: Restore address limit after a syscall
@ 2017-02-09 18:33 ` Thomas Garnier
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Garnier @ 2017-02-09 18:33 UTC (permalink / raw)
  To: Dave Hansen, Arnd Bergmann, René Nyffenegger, Stephen Bates,
	Jeff Moyer, Milosz Tanski, Thomas Garnier
  Cc: linux-api, linux-kernel, kernel-hardening

This patch prevents a syscall to modify the address limit of the
caller. The address limit is kept by the syscall wrapper and restored
just after the syscall ends.

For example, it would mitigation this bug:

- https://bugs.chromium.org/p/project-zero/issues/detail?id=990

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
Based on next-20170209
---
 include/linux/syscalls.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 91a740f6b884..a1b6a62a9849 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -198,7 +198,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
 	asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));	\
 	asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))	\
 	{								\
-		long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));	\
+		long ret;						\
+		mm_segment_t fs = get_fs();				\
+		ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));	\
+		set_fs(fs);						\
 		__MAP(x,__SC_TEST,__VA_ARGS__);				\
 		__PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__));	\
 		return ret;						\
-- 
2.11.0.483.g087da7b7c-goog

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC] syscalls: Restore address limit after a syscall
@ 2017-02-09 19:31   ` Kees Cook
  0 siblings, 0 replies; 22+ messages in thread
From: Kees Cook @ 2017-02-09 19:31 UTC (permalink / raw)
  To: Thomas Garnier, Andy Lutomirski
  Cc: Dave Hansen, Arnd Bergmann, René Nyffenegger, Stephen Bates,
	Jeff Moyer, Milosz Tanski, Linux API, LKML, kernel-hardening

On Thu, Feb 9, 2017 at 10:33 AM, Thomas Garnier <thgarnie@google.com> wrote:
> This patch prevents a syscall to modify the address limit of the
> caller. The address limit is kept by the syscall wrapper and restored
> just after the syscall ends.
>
> For example, it would mitigation this bug:
>
> - https://bugs.chromium.org/p/project-zero/issues/detail?id=990
>
> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> ---
> Based on next-20170209
> ---
>  include/linux/syscalls.h | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 91a740f6b884..a1b6a62a9849 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -198,7 +198,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));      \
>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))       \
>         {                                                               \
> -               long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));  \
> +               long ret;                                               \
> +               mm_segment_t fs = get_fs();                             \
> +               ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));       \
> +               set_fs(fs);                                             \
>                 __MAP(x,__SC_TEST,__VA_ARGS__);                         \
>                 __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__));       \
>                 return ret;                                             \
> --
> 2.11.0.483.g087da7b7c-goog
>

I have a memory of Andy looking at this before, and there was some
problem with how a bunch of compat code would set fs and then re-call
the syscall... but I can't quite find the conversation. Andy, do you
remember the details?

This seems like an entirely reasonable thing to enforce for syscalls,
though I'm sure there's a gotcha somewhere. :)

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] syscalls: Restore address limit after a syscall
@ 2017-02-09 19:31   ` Kees Cook
  0 siblings, 0 replies; 22+ messages in thread
From: Kees Cook @ 2017-02-09 19:31 UTC (permalink / raw)
  To: Thomas Garnier, Andy Lutomirski
  Cc: Dave Hansen, Arnd Bergmann, René Nyffenegger, Stephen Bates,
	Jeff Moyer, Milosz Tanski, Linux API, LKML,
	kernel-hardening-ZwoEplunGu1jrUoiu81ncdBPR1lH4CV8

On Thu, Feb 9, 2017 at 10:33 AM, Thomas Garnier <thgarnie-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> This patch prevents a syscall to modify the address limit of the
> caller. The address limit is kept by the syscall wrapper and restored
> just after the syscall ends.
>
> For example, it would mitigation this bug:
>
> - https://bugs.chromium.org/p/project-zero/issues/detail?id=990
>
> Signed-off-by: Thomas Garnier <thgarnie-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> ---
> Based on next-20170209
> ---
>  include/linux/syscalls.h | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 91a740f6b884..a1b6a62a9849 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -198,7 +198,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));      \
>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))       \
>         {                                                               \
> -               long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));  \
> +               long ret;                                               \
> +               mm_segment_t fs = get_fs();                             \
> +               ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));       \
> +               set_fs(fs);                                             \
>                 __MAP(x,__SC_TEST,__VA_ARGS__);                         \
>                 __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__));       \
>                 return ret;                                             \
> --
> 2.11.0.483.g087da7b7c-goog
>

I have a memory of Andy looking at this before, and there was some
problem with how a bunch of compat code would set fs and then re-call
the syscall... but I can't quite find the conversation. Andy, do you
remember the details?

This seems like an entirely reasonable thing to enforce for syscalls,
though I'm sure there's a gotcha somewhere. :)

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [kernel-hardening] Re: [RFC] syscalls: Restore address limit after a syscall
@ 2017-02-09 19:31   ` Kees Cook
  0 siblings, 0 replies; 22+ messages in thread
From: Kees Cook @ 2017-02-09 19:31 UTC (permalink / raw)
  To: Thomas Garnier, Andy Lutomirski
  Cc: Dave Hansen, Arnd Bergmann, René Nyffenegger, Stephen Bates,
	Jeff Moyer, Milosz Tanski, Linux API, LKML, kernel-hardening

On Thu, Feb 9, 2017 at 10:33 AM, Thomas Garnier <thgarnie@google.com> wrote:
> This patch prevents a syscall to modify the address limit of the
> caller. The address limit is kept by the syscall wrapper and restored
> just after the syscall ends.
>
> For example, it would mitigation this bug:
>
> - https://bugs.chromium.org/p/project-zero/issues/detail?id=990
>
> Signed-off-by: Thomas Garnier <thgarnie@google.com>
> ---
> Based on next-20170209
> ---
>  include/linux/syscalls.h | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 91a740f6b884..a1b6a62a9849 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -198,7 +198,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));      \
>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))       \
>         {                                                               \
> -               long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));  \
> +               long ret;                                               \
> +               mm_segment_t fs = get_fs();                             \
> +               ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));       \
> +               set_fs(fs);                                             \
>                 __MAP(x,__SC_TEST,__VA_ARGS__);                         \
>                 __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__));       \
>                 return ret;                                             \
> --
> 2.11.0.483.g087da7b7c-goog
>

I have a memory of Andy looking at this before, and there was some
problem with how a bunch of compat code would set fs and then re-call
the syscall... but I can't quite find the conversation. Andy, do you
remember the details?

This seems like an entirely reasonable thing to enforce for syscalls,
though I'm sure there's a gotcha somewhere. :)

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] syscalls: Restore address limit after a syscall
  2017-02-09 19:31   ` Kees Cook
@ 2017-02-09 23:05     ` Andy Lutomirski
  -1 siblings, 0 replies; 22+ messages in thread
From: Andy Lutomirski @ 2017-02-09 23:05 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Garnier, Dave Hansen, Arnd Bergmann,
	René Nyffenegger, Stephen Bates, Jeff Moyer, Milosz Tanski,
	Linux API, LKML, kernel-hardening

On Thu, Feb 9, 2017 at 11:31 AM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Feb 9, 2017 at 10:33 AM, Thomas Garnier <thgarnie@google.com> wrote:
>> This patch prevents a syscall to modify the address limit of the
>> caller. The address limit is kept by the syscall wrapper and restored
>> just after the syscall ends.
>>
>> For example, it would mitigation this bug:
>>
>> - https://bugs.chromium.org/p/project-zero/issues/detail?id=990
>>
>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>> ---
>> Based on next-20170209
>> ---
>>  include/linux/syscalls.h | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>> index 91a740f6b884..a1b6a62a9849 100644
>> --- a/include/linux/syscalls.h
>> +++ b/include/linux/syscalls.h
>> @@ -198,7 +198,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));      \
>>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))       \
>>         {                                                               \
>> -               long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));  \
>> +               long ret;                                               \
>> +               mm_segment_t fs = get_fs();                             \
>> +               ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));       \
>> +               set_fs(fs);                                             \
>>                 __MAP(x,__SC_TEST,__VA_ARGS__);                         \
>>                 __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__));       \
>>                 return ret;                                             \
>> --
>> 2.11.0.483.g087da7b7c-goog
>>
>
> I have a memory of Andy looking at this before, and there was some
> problem with how a bunch of compat code would set fs and then re-call
> the syscall... but I can't quite find the conversation. Andy, do you
> remember the details?
>
> This seems like an entirely reasonable thing to enforce for syscalls,
> though I'm sure there's a gotcha somewhere. :)

This sounds vaguely familiar, but that's about all.

Anyway, it seems reasonable that the SyS_foobar wrappers are genuinely
only used for syscalls and not for other things, so the code should
*work*.  That being said, I think there's room for several
improvements.

1. Why save the old "fs" value?  For that matter, why restore it?
IOW, I'd rather see BUG_ON(get_fs() != USER_DS) at the end.

2. I'd rather see the mechanism be more general.  If we had, effectively:

asmlinkage long SyS_foo(...) {
  sys_foo();
  verify_pre_usermode_state();
}

and let verify_pre_usermode_state() potentially do more things, we'd
get a more flexible mechanism.  On arches like x86_32, we could save a
decent amount of code size by moving verify_pre_usermode_state() into
prepare_exit_to_usermode(), but that would have to be a per-arch
opt-in.  x86_64 probably would *not* select this due to the fast path
(or it would do it in asm.  hmm.).

3. If this thing gets factored out, then arch code can call it for
non-syscall entries, too.

4. Can we make this configurable?


For x86, a nice implementation might be:

select ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE

... in prepare_exit_to_usermode():

verify_pre_usermode_state();  // right at the beginning

... in the asm syscall fast path:

#ifdef CONFIG_VERIFY_PRE_USERMODE_STATE
call verify_pre_usermode_staet
#endif

(or just inline the interesting bit)

--Andy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [kernel-hardening] Re: [RFC] syscalls: Restore address limit after a syscall
@ 2017-02-09 23:05     ` Andy Lutomirski
  0 siblings, 0 replies; 22+ messages in thread
From: Andy Lutomirski @ 2017-02-09 23:05 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Garnier, Dave Hansen, Arnd Bergmann,
	René Nyffenegger, Stephen Bates, Jeff Moyer, Milosz Tanski,
	Linux API, LKML, kernel-hardening

On Thu, Feb 9, 2017 at 11:31 AM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Feb 9, 2017 at 10:33 AM, Thomas Garnier <thgarnie@google.com> wrote:
>> This patch prevents a syscall to modify the address limit of the
>> caller. The address limit is kept by the syscall wrapper and restored
>> just after the syscall ends.
>>
>> For example, it would mitigation this bug:
>>
>> - https://bugs.chromium.org/p/project-zero/issues/detail?id=990
>>
>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>> ---
>> Based on next-20170209
>> ---
>>  include/linux/syscalls.h | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>> index 91a740f6b884..a1b6a62a9849 100644
>> --- a/include/linux/syscalls.h
>> +++ b/include/linux/syscalls.h
>> @@ -198,7 +198,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));      \
>>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))       \
>>         {                                                               \
>> -               long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));  \
>> +               long ret;                                               \
>> +               mm_segment_t fs = get_fs();                             \
>> +               ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));       \
>> +               set_fs(fs);                                             \
>>                 __MAP(x,__SC_TEST,__VA_ARGS__);                         \
>>                 __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__));       \
>>                 return ret;                                             \
>> --
>> 2.11.0.483.g087da7b7c-goog
>>
>
> I have a memory of Andy looking at this before, and there was some
> problem with how a bunch of compat code would set fs and then re-call
> the syscall... but I can't quite find the conversation. Andy, do you
> remember the details?
>
> This seems like an entirely reasonable thing to enforce for syscalls,
> though I'm sure there's a gotcha somewhere. :)

This sounds vaguely familiar, but that's about all.

Anyway, it seems reasonable that the SyS_foobar wrappers are genuinely
only used for syscalls and not for other things, so the code should
*work*.  That being said, I think there's room for several
improvements.

1. Why save the old "fs" value?  For that matter, why restore it?
IOW, I'd rather see BUG_ON(get_fs() != USER_DS) at the end.

2. I'd rather see the mechanism be more general.  If we had, effectively:

asmlinkage long SyS_foo(...) {
  sys_foo();
  verify_pre_usermode_state();
}

and let verify_pre_usermode_state() potentially do more things, we'd
get a more flexible mechanism.  On arches like x86_32, we could save a
decent amount of code size by moving verify_pre_usermode_state() into
prepare_exit_to_usermode(), but that would have to be a per-arch
opt-in.  x86_64 probably would *not* select this due to the fast path
(or it would do it in asm.  hmm.).

3. If this thing gets factored out, then arch code can call it for
non-syscall entries, too.

4. Can we make this configurable?


For x86, a nice implementation might be:

select ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE

... in prepare_exit_to_usermode():

verify_pre_usermode_state();  // right at the beginning

... in the asm syscall fast path:

#ifdef CONFIG_VERIFY_PRE_USERMODE_STATE
call verify_pre_usermode_staet
#endif

(or just inline the interesting bit)

--Andy

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] syscalls: Restore address limit after a syscall
  2017-02-09 23:05     ` [kernel-hardening] " Andy Lutomirski
@ 2017-02-09 23:41       ` Thomas Garnier
  -1 siblings, 0 replies; 22+ messages in thread
From: Thomas Garnier @ 2017-02-09 23:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Dave Hansen, Arnd Bergmann, René Nyffenegger,
	Stephen Bates, Jeff Moyer, Milosz Tanski, Linux API, LKML,
	kernel-hardening

On Thu, Feb 9, 2017 at 3:05 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Feb 9, 2017 at 11:31 AM, Kees Cook <keescook@chromium.org> wrote:
>> On Thu, Feb 9, 2017 at 10:33 AM, Thomas Garnier <thgarnie@google.com> wrote:
>>> This patch prevents a syscall to modify the address limit of the
>>> caller. The address limit is kept by the syscall wrapper and restored
>>> just after the syscall ends.
>>>
>>> For example, it would mitigation this bug:
>>>
>>> - https://bugs.chromium.org/p/project-zero/issues/detail?id=990
>>>
>>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>>> ---
>>> Based on next-20170209
>>> ---
>>>  include/linux/syscalls.h | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>>> index 91a740f6b884..a1b6a62a9849 100644
>>> --- a/include/linux/syscalls.h
>>> +++ b/include/linux/syscalls.h
>>> @@ -198,7 +198,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>>>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));      \
>>>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))       \
>>>         {                                                               \
>>> -               long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));  \
>>> +               long ret;                                               \
>>> +               mm_segment_t fs = get_fs();                             \
>>> +               ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));       \
>>> +               set_fs(fs);                                             \
>>>                 __MAP(x,__SC_TEST,__VA_ARGS__);                         \
>>>                 __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__));       \
>>>                 return ret;                                             \
>>> --
>>> 2.11.0.483.g087da7b7c-goog
>>>
>>
>> I have a memory of Andy looking at this before, and there was some
>> problem with how a bunch of compat code would set fs and then re-call
>> the syscall... but I can't quite find the conversation. Andy, do you
>> remember the details?
>>
>> This seems like an entirely reasonable thing to enforce for syscalls,
>> though I'm sure there's a gotcha somewhere. :)
>
> This sounds vaguely familiar, but that's about all.
>
> Anyway, it seems reasonable that the SyS_foobar wrappers are genuinely
> only used for syscalls and not for other things, so the code should
> *work*.  That being said, I think there's room for several
> improvements.
>
> 1. Why save the old "fs" value?  For that matter, why restore it?
> IOW, I'd rather see BUG_ON(get_fs() != USER_DS) at the end.
>

I guess that make sense in the wrapper.

> 2. I'd rather see the mechanism be more general.  If we had, effectively:
>
> asmlinkage long SyS_foo(...) {
>   sys_foo();
>   verify_pre_usermode_state();
> }
>
> and let verify_pre_usermode_state() potentially do more things, we'd
> get a more flexible mechanism.  On arches like x86_32, we could save a
> decent amount of code size by moving verify_pre_usermode_state() into
> prepare_exit_to_usermode(), but that would have to be a per-arch
> opt-in.  x86_64 probably would *not* select this due to the fast path
> (or it would do it in asm.  hmm.).
>

I will look into that. I like this design better.

> 3. If this thing gets factored out, then arch code can call it for
> non-syscall entries, too.
>

Yes, it makes sense.

> 4. Can we make this configurable?
>
>
> For x86, a nice implementation might be:
>
> select ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE
>
> ... in prepare_exit_to_usermode():
>
> verify_pre_usermode_state();  // right at the beginning
>
> ... in the asm syscall fast path:
>
> #ifdef CONFIG_VERIFY_PRE_USERMODE_STATE
> call verify_pre_usermode_staet
> #endif
>
> (or just inline the interesting bit)
>

So by default it is in the wrapper. If selected, an architecture can
disable the wrapper put it in the best places. Understood correctly?

> --Andy



-- 
Thomas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [kernel-hardening] Re: [RFC] syscalls: Restore address limit after a syscall
@ 2017-02-09 23:41       ` Thomas Garnier
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Garnier @ 2017-02-09 23:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Dave Hansen, Arnd Bergmann, René Nyffenegger,
	Stephen Bates, Jeff Moyer, Milosz Tanski, Linux API, LKML,
	kernel-hardening

On Thu, Feb 9, 2017 at 3:05 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Feb 9, 2017 at 11:31 AM, Kees Cook <keescook@chromium.org> wrote:
>> On Thu, Feb 9, 2017 at 10:33 AM, Thomas Garnier <thgarnie@google.com> wrote:
>>> This patch prevents a syscall to modify the address limit of the
>>> caller. The address limit is kept by the syscall wrapper and restored
>>> just after the syscall ends.
>>>
>>> For example, it would mitigation this bug:
>>>
>>> - https://bugs.chromium.org/p/project-zero/issues/detail?id=990
>>>
>>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>>> ---
>>> Based on next-20170209
>>> ---
>>>  include/linux/syscalls.h | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>>> index 91a740f6b884..a1b6a62a9849 100644
>>> --- a/include/linux/syscalls.h
>>> +++ b/include/linux/syscalls.h
>>> @@ -198,7 +198,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>>>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));      \
>>>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))       \
>>>         {                                                               \
>>> -               long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));  \
>>> +               long ret;                                               \
>>> +               mm_segment_t fs = get_fs();                             \
>>> +               ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));       \
>>> +               set_fs(fs);                                             \
>>>                 __MAP(x,__SC_TEST,__VA_ARGS__);                         \
>>>                 __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__));       \
>>>                 return ret;                                             \
>>> --
>>> 2.11.0.483.g087da7b7c-goog
>>>
>>
>> I have a memory of Andy looking at this before, and there was some
>> problem with how a bunch of compat code would set fs and then re-call
>> the syscall... but I can't quite find the conversation. Andy, do you
>> remember the details?
>>
>> This seems like an entirely reasonable thing to enforce for syscalls,
>> though I'm sure there's a gotcha somewhere. :)
>
> This sounds vaguely familiar, but that's about all.
>
> Anyway, it seems reasonable that the SyS_foobar wrappers are genuinely
> only used for syscalls and not for other things, so the code should
> *work*.  That being said, I think there's room for several
> improvements.
>
> 1. Why save the old "fs" value?  For that matter, why restore it?
> IOW, I'd rather see BUG_ON(get_fs() != USER_DS) at the end.
>

I guess that make sense in the wrapper.

> 2. I'd rather see the mechanism be more general.  If we had, effectively:
>
> asmlinkage long SyS_foo(...) {
>   sys_foo();
>   verify_pre_usermode_state();
> }
>
> and let verify_pre_usermode_state() potentially do more things, we'd
> get a more flexible mechanism.  On arches like x86_32, we could save a
> decent amount of code size by moving verify_pre_usermode_state() into
> prepare_exit_to_usermode(), but that would have to be a per-arch
> opt-in.  x86_64 probably would *not* select this due to the fast path
> (or it would do it in asm.  hmm.).
>

I will look into that. I like this design better.

> 3. If this thing gets factored out, then arch code can call it for
> non-syscall entries, too.
>

Yes, it makes sense.

> 4. Can we make this configurable?
>
>
> For x86, a nice implementation might be:
>
> select ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE
>
> ... in prepare_exit_to_usermode():
>
> verify_pre_usermode_state();  // right at the beginning
>
> ... in the asm syscall fast path:
>
> #ifdef CONFIG_VERIFY_PRE_USERMODE_STATE
> call verify_pre_usermode_staet
> #endif
>
> (or just inline the interesting bit)
>

So by default it is in the wrapper. If selected, an architecture can
disable the wrapper put it in the best places. Understood correctly?

> --Andy



-- 
Thomas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] syscalls: Restore address limit after a syscall
  2017-02-09 23:41       ` [kernel-hardening] " Thomas Garnier
  (?)
@ 2017-02-10  2:42         ` Andy Lutomirski
  -1 siblings, 0 replies; 22+ messages in thread
From: Andy Lutomirski @ 2017-02-10  2:42 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Kees Cook, Dave Hansen, Arnd Bergmann, René Nyffenegger,
	Stephen Bates, Jeff Moyer, Milosz Tanski, Linux API, LKML,
	kernel-hardening, Will Deacon, linux-arm-kernel, linux-s390

On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie@google.com> wrote:
> On Thu, Feb 9, 2017 at 3:05 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Thu, Feb 9, 2017 at 11:31 AM, Kees Cook <keescook@chromium.org> wrote:
>>> On Thu, Feb 9, 2017 at 10:33 AM, Thomas Garnier <thgarnie@google.com> wrote:
>>>> This patch prevents a syscall to modify the address limit of the
>>>> caller. The address limit is kept by the syscall wrapper and restored
>>>> just after the syscall ends.
>>>>
>>>> For example, it would mitigation this bug:
>>>>
>>>> - https://bugs.chromium.org/p/project-zero/issues/detail?id=990
>>>>
>>>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>>>> ---
>>>> Based on next-20170209
>>>> ---
>>>>  include/linux/syscalls.h | 5 ++++-
>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>>>> index 91a740f6b884..a1b6a62a9849 100644
>>>> --- a/include/linux/syscalls.h
>>>> +++ b/include/linux/syscalls.h
>>>> @@ -198,7 +198,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>>>>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));      \
>>>>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))       \
>>>>         {                                                               \
>>>> -               long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));  \
>>>> +               long ret;                                               \
>>>> +               mm_segment_t fs = get_fs();                             \
>>>> +               ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));       \
>>>> +               set_fs(fs);                                             \
>>>>                 __MAP(x,__SC_TEST,__VA_ARGS__);                         \
>>>>                 __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__));       \
>>>>                 return ret;                                             \
>>>> --
>>>> 2.11.0.483.g087da7b7c-goog
>>>>
>>>
>>> I have a memory of Andy looking at this before, and there was some
>>> problem with how a bunch of compat code would set fs and then re-call
>>> the syscall... but I can't quite find the conversation. Andy, do you
>>> remember the details?
>>>
>>> This seems like an entirely reasonable thing to enforce for syscalls,
>>> though I'm sure there's a gotcha somewhere. :)
>>
>> This sounds vaguely familiar, but that's about all.
>>
>> Anyway, it seems reasonable that the SyS_foobar wrappers are genuinely
>> only used for syscalls and not for other things, so the code should
>> *work*.  That being said, I think there's room for several
>> improvements.
>>
>> 1. Why save the old "fs" value?  For that matter, why restore it?
>> IOW, I'd rather see BUG_ON(get_fs() != USER_DS) at the end.
>>
>
> I guess that make sense in the wrapper.
>
>> 2. I'd rather see the mechanism be more general.  If we had, effectively:
>>
>> asmlinkage long SyS_foo(...) {
>>   sys_foo();
>>   verify_pre_usermode_state();
>> }
>>
>> and let verify_pre_usermode_state() potentially do more things, we'd
>> get a more flexible mechanism.  On arches like x86_32, we could save a
>> decent amount of code size by moving verify_pre_usermode_state() into
>> prepare_exit_to_usermode(), but that would have to be a per-arch
>> opt-in.  x86_64 probably would *not* select this due to the fast path
>> (or it would do it in asm.  hmm.).
>>
>
> I will look into that. I like this design better.
>
>> 3. If this thing gets factored out, then arch code can call it for
>> non-syscall entries, too.
>>
>
> Yes, it makes sense.
>
>> 4. Can we make this configurable?
>>
>>
>> For x86, a nice implementation might be:
>>
>> select ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE
>>
>> ... in prepare_exit_to_usermode():
>>
>> verify_pre_usermode_state();  // right at the beginning
>>
>> ... in the asm syscall fast path:
>>
>> #ifdef CONFIG_VERIFY_PRE_USERMODE_STATE
>> call verify_pre_usermode_staet
>> #endif
>>
>> (or just inline the interesting bit)
>>
>
> So by default it is in the wrapper. If selected, an architecture can
> disable the wrapper put it in the best places. Understood correctly?

Sounds good to me.

Presumably the result should go through -mm.  Want to cc: akpm and
linux-arch@ on the next version?

I've also cc'd arm and s390 folks -- those are the other arches that
try to be on top of hardening.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [kernel-hardening] Re: [RFC] syscalls: Restore address limit after a syscall
@ 2017-02-10  2:42         ` Andy Lutomirski
  0 siblings, 0 replies; 22+ messages in thread
From: Andy Lutomirski @ 2017-02-10  2:42 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Kees Cook, Dave Hansen, Arnd Bergmann, René Nyffenegger,
	Stephen Bates, Jeff Moyer, Milosz Tanski, Linux API, LKML,
	kernel-hardening, Will Deacon, linux-arm-kernel, linux-s390

On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie@google.com> wrote:
> On Thu, Feb 9, 2017 at 3:05 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Thu, Feb 9, 2017 at 11:31 AM, Kees Cook <keescook@chromium.org> wrote:
>>> On Thu, Feb 9, 2017 at 10:33 AM, Thomas Garnier <thgarnie@google.com> wrote:
>>>> This patch prevents a syscall to modify the address limit of the
>>>> caller. The address limit is kept by the syscall wrapper and restored
>>>> just after the syscall ends.
>>>>
>>>> For example, it would mitigation this bug:
>>>>
>>>> - https://bugs.chromium.org/p/project-zero/issues/detail?id=990
>>>>
>>>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>>>> ---
>>>> Based on next-20170209
>>>> ---
>>>>  include/linux/syscalls.h | 5 ++++-
>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>>>> index 91a740f6b884..a1b6a62a9849 100644
>>>> --- a/include/linux/syscalls.h
>>>> +++ b/include/linux/syscalls.h
>>>> @@ -198,7 +198,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>>>>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));      \
>>>>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))       \
>>>>         {                                                               \
>>>> -               long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));  \
>>>> +               long ret;                                               \
>>>> +               mm_segment_t fs = get_fs();                             \
>>>> +               ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));       \
>>>> +               set_fs(fs);                                             \
>>>>                 __MAP(x,__SC_TEST,__VA_ARGS__);                         \
>>>>                 __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__));       \
>>>>                 return ret;                                             \
>>>> --
>>>> 2.11.0.483.g087da7b7c-goog
>>>>
>>>
>>> I have a memory of Andy looking at this before, and there was some
>>> problem with how a bunch of compat code would set fs and then re-call
>>> the syscall... but I can't quite find the conversation. Andy, do you
>>> remember the details?
>>>
>>> This seems like an entirely reasonable thing to enforce for syscalls,
>>> though I'm sure there's a gotcha somewhere. :)
>>
>> This sounds vaguely familiar, but that's about all.
>>
>> Anyway, it seems reasonable that the SyS_foobar wrappers are genuinely
>> only used for syscalls and not for other things, so the code should
>> *work*.  That being said, I think there's room for several
>> improvements.
>>
>> 1. Why save the old "fs" value?  For that matter, why restore it?
>> IOW, I'd rather see BUG_ON(get_fs() != USER_DS) at the end.
>>
>
> I guess that make sense in the wrapper.
>
>> 2. I'd rather see the mechanism be more general.  If we had, effectively:
>>
>> asmlinkage long SyS_foo(...) {
>>   sys_foo();
>>   verify_pre_usermode_state();
>> }
>>
>> and let verify_pre_usermode_state() potentially do more things, we'd
>> get a more flexible mechanism.  On arches like x86_32, we could save a
>> decent amount of code size by moving verify_pre_usermode_state() into
>> prepare_exit_to_usermode(), but that would have to be a per-arch
>> opt-in.  x86_64 probably would *not* select this due to the fast path
>> (or it would do it in asm.  hmm.).
>>
>
> I will look into that. I like this design better.
>
>> 3. If this thing gets factored out, then arch code can call it for
>> non-syscall entries, too.
>>
>
> Yes, it makes sense.
>
>> 4. Can we make this configurable?
>>
>>
>> For x86, a nice implementation might be:
>>
>> select ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE
>>
>> ... in prepare_exit_to_usermode():
>>
>> verify_pre_usermode_state();  // right at the beginning
>>
>> ... in the asm syscall fast path:
>>
>> #ifdef CONFIG_VERIFY_PRE_USERMODE_STATE
>> call verify_pre_usermode_staet
>> #endif
>>
>> (or just inline the interesting bit)
>>
>
> So by default it is in the wrapper. If selected, an architecture can
> disable the wrapper put it in the best places. Understood correctly?

Sounds good to me.

Presumably the result should go through -mm.  Want to cc: akpm and
linux-arch@ on the next version?

I've also cc'd arm and s390 folks -- those are the other arches that
try to be on top of hardening.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC] syscalls: Restore address limit after a syscall
@ 2017-02-10  2:42         ` Andy Lutomirski
  0 siblings, 0 replies; 22+ messages in thread
From: Andy Lutomirski @ 2017-02-10  2:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie@google.com> wrote:
> On Thu, Feb 9, 2017 at 3:05 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Thu, Feb 9, 2017 at 11:31 AM, Kees Cook <keescook@chromium.org> wrote:
>>> On Thu, Feb 9, 2017 at 10:33 AM, Thomas Garnier <thgarnie@google.com> wrote:
>>>> This patch prevents a syscall to modify the address limit of the
>>>> caller. The address limit is kept by the syscall wrapper and restored
>>>> just after the syscall ends.
>>>>
>>>> For example, it would mitigation this bug:
>>>>
>>>> - https://bugs.chromium.org/p/project-zero/issues/detail?id=990
>>>>
>>>> Signed-off-by: Thomas Garnier <thgarnie@google.com>
>>>> ---
>>>> Based on next-20170209
>>>> ---
>>>>  include/linux/syscalls.h | 5 ++++-
>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>>>> index 91a740f6b884..a1b6a62a9849 100644
>>>> --- a/include/linux/syscalls.h
>>>> +++ b/include/linux/syscalls.h
>>>> @@ -198,7 +198,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>>>>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));      \
>>>>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))       \
>>>>         {                                                               \
>>>> -               long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));  \
>>>> +               long ret;                                               \
>>>> +               mm_segment_t fs = get_fs();                             \
>>>> +               ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));       \
>>>> +               set_fs(fs);                                             \
>>>>                 __MAP(x,__SC_TEST,__VA_ARGS__);                         \
>>>>                 __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__));       \
>>>>                 return ret;                                             \
>>>> --
>>>> 2.11.0.483.g087da7b7c-goog
>>>>
>>>
>>> I have a memory of Andy looking at this before, and there was some
>>> problem with how a bunch of compat code would set fs and then re-call
>>> the syscall... but I can't quite find the conversation. Andy, do you
>>> remember the details?
>>>
>>> This seems like an entirely reasonable thing to enforce for syscalls,
>>> though I'm sure there's a gotcha somewhere. :)
>>
>> This sounds vaguely familiar, but that's about all.
>>
>> Anyway, it seems reasonable that the SyS_foobar wrappers are genuinely
>> only used for syscalls and not for other things, so the code should
>> *work*.  That being said, I think there's room for several
>> improvements.
>>
>> 1. Why save the old "fs" value?  For that matter, why restore it?
>> IOW, I'd rather see BUG_ON(get_fs() != USER_DS) at the end.
>>
>
> I guess that make sense in the wrapper.
>
>> 2. I'd rather see the mechanism be more general.  If we had, effectively:
>>
>> asmlinkage long SyS_foo(...) {
>>   sys_foo();
>>   verify_pre_usermode_state();
>> }
>>
>> and let verify_pre_usermode_state() potentially do more things, we'd
>> get a more flexible mechanism.  On arches like x86_32, we could save a
>> decent amount of code size by moving verify_pre_usermode_state() into
>> prepare_exit_to_usermode(), but that would have to be a per-arch
>> opt-in.  x86_64 probably would *not* select this due to the fast path
>> (or it would do it in asm.  hmm.).
>>
>
> I will look into that. I like this design better.
>
>> 3. If this thing gets factored out, then arch code can call it for
>> non-syscall entries, too.
>>
>
> Yes, it makes sense.
>
>> 4. Can we make this configurable?
>>
>>
>> For x86, a nice implementation might be:
>>
>> select ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE
>>
>> ... in prepare_exit_to_usermode():
>>
>> verify_pre_usermode_state();  // right at the beginning
>>
>> ... in the asm syscall fast path:
>>
>> #ifdef CONFIG_VERIFY_PRE_USERMODE_STATE
>> call verify_pre_usermode_staet
>> #endif
>>
>> (or just inline the interesting bit)
>>
>
> So by default it is in the wrapper. If selected, an architecture can
> disable the wrapper put it in the best places. Understood correctly?

Sounds good to me.

Presumably the result should go through -mm.  Want to cc: akpm and
linux-arch@ on the next version?

I've also cc'd arm and s390 folks -- those are the other arches that
try to be on top of hardening.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] syscalls: Restore address limit after a syscall
  2017-02-10  2:42         ` [kernel-hardening] " Andy Lutomirski
  (?)
@ 2017-02-10 19:22           ` Russell King - ARM Linux
  -1 siblings, 0 replies; 22+ messages in thread
From: Russell King - ARM Linux @ 2017-02-10 19:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Garnier, Stephen Bates, linux-s390, Kees Cook,
	Arnd Bergmann, kernel-hardening, Linux API, Will Deacon, LKML,
	Dave Hansen, Jeff Moyer, René Nyffenegger, Milosz Tanski,
	linux-arm-kernel

On Thu, Feb 09, 2017 at 06:42:34PM -0800, Andy Lutomirski wrote:
> On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie@google.com> wrote:
> > So by default it is in the wrapper. If selected, an architecture can
> > disable the wrapper put it in the best places. Understood correctly?
> 
> Sounds good to me.
> 
> Presumably the result should go through -mm.  Want to cc: akpm and
> linux-arch@ on the next version?
> 
> I've also cc'd arm and s390 folks -- those are the other arches that
> try to be on top of hardening.

The best place for this on ARM is in the assembly code, rather than in
the hundreds of system calls - having it in one place is surely better
for reducing the cache impact.

This (untested) patch should be sufficient for ARM - there's two choices
which I think make sense to do this:
1. Immediately after returning the syscall
2. Immediately before any returning to userspace

(1) has the advantage that the address limit will be forced for the
exit-path works that we do, preventing those making accesses to kernel
space.

(2) has the advantage that we'd guarantee that the address limit will
be forced while userspace is running for the next entry into kernel
space.

There's actually a third option as well:

(3) forcing the address limit on entry to the kernel from userspace.

This patch implements option 1.

 arch/arm/kernel/entry-common.S | 6 ++++++
 1 files changed, 6 insertions(+)

diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index eb5cd77bf1d8..6a717a2ccb88 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -39,6 +39,8 @@
 ret_fast_syscall:
  UNWIND(.fnstart	)
  UNWIND(.cantunwind	)
+	mov	r1, #TASK_SIZE
+	str	r1, [tsk, #TI_ADDR_LIMIT]
 	disable_irq_notrace			@ disable interrupts
 	ldr	r1, [tsk, #TI_FLAGS]		@ re-check for syscall tracing
 	tst	r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
@@ -64,6 +66,8 @@ ENDPROC(ret_fast_syscall)
 ret_fast_syscall:
  UNWIND(.fnstart	)
  UNWIND(.cantunwind	)
+	mov	r1, #TASK_SIZE
+	str	r1, [tsk, #TI_ADDR_LIMIT]
 	str	r0, [sp, #S_R0 + S_OFF]!	@ save returned r0
 	disable_irq_notrace			@ disable interrupts
 	ldr	r1, [tsk, #TI_FLAGS]		@ re-check for syscall tracing
@@ -262,6 +266,8 @@ ENDPROC(vector_swi)
 	b	ret_slow_syscall
 
 __sys_trace_return:
+	mov	r1, #TASK_SIZE
+	str	r1, [tsk, #TI_ADDR_LIMIT]
 	str	r0, [sp, #S_R0 + S_OFF]!	@ save returned r0
 	mov	r0, sp
 	bl	syscall_trace_exit


-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [kernel-hardening] Re: [RFC] syscalls: Restore address limit after a syscall
@ 2017-02-10 19:22           ` Russell King - ARM Linux
  0 siblings, 0 replies; 22+ messages in thread
From: Russell King - ARM Linux @ 2017-02-10 19:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Garnier, Stephen Bates, linux-s390, Kees Cook,
	Arnd Bergmann, kernel-hardening, Linux API, Will Deacon, LKML,
	Dave Hansen, Jeff Moyer, René Nyffenegger, Milosz Tanski,
	linux-arm-kernel

On Thu, Feb 09, 2017 at 06:42:34PM -0800, Andy Lutomirski wrote:
> On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie@google.com> wrote:
> > So by default it is in the wrapper. If selected, an architecture can
> > disable the wrapper put it in the best places. Understood correctly?
> 
> Sounds good to me.
> 
> Presumably the result should go through -mm.  Want to cc: akpm and
> linux-arch@ on the next version?
> 
> I've also cc'd arm and s390 folks -- those are the other arches that
> try to be on top of hardening.

The best place for this on ARM is in the assembly code, rather than in
the hundreds of system calls - having it in one place is surely better
for reducing the cache impact.

This (untested) patch should be sufficient for ARM - there's two choices
which I think make sense to do this:
1. Immediately after returning the syscall
2. Immediately before any returning to userspace

(1) has the advantage that the address limit will be forced for the
exit-path works that we do, preventing those making accesses to kernel
space.

(2) has the advantage that we'd guarantee that the address limit will
be forced while userspace is running for the next entry into kernel
space.

There's actually a third option as well:

(3) forcing the address limit on entry to the kernel from userspace.

This patch implements option 1.

 arch/arm/kernel/entry-common.S | 6 ++++++
 1 files changed, 6 insertions(+)

diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index eb5cd77bf1d8..6a717a2ccb88 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -39,6 +39,8 @@
 ret_fast_syscall:
  UNWIND(.fnstart	)
  UNWIND(.cantunwind	)
+	mov	r1, #TASK_SIZE
+	str	r1, [tsk, #TI_ADDR_LIMIT]
 	disable_irq_notrace			@ disable interrupts
 	ldr	r1, [tsk, #TI_FLAGS]		@ re-check for syscall tracing
 	tst	r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
@@ -64,6 +66,8 @@ ENDPROC(ret_fast_syscall)
 ret_fast_syscall:
  UNWIND(.fnstart	)
  UNWIND(.cantunwind	)
+	mov	r1, #TASK_SIZE
+	str	r1, [tsk, #TI_ADDR_LIMIT]
 	str	r0, [sp, #S_R0 + S_OFF]!	@ save returned r0
 	disable_irq_notrace			@ disable interrupts
 	ldr	r1, [tsk, #TI_FLAGS]		@ re-check for syscall tracing
@@ -262,6 +266,8 @@ ENDPROC(vector_swi)
 	b	ret_slow_syscall
 
 __sys_trace_return:
+	mov	r1, #TASK_SIZE
+	str	r1, [tsk, #TI_ADDR_LIMIT]
 	str	r0, [sp, #S_R0 + S_OFF]!	@ save returned r0
 	mov	r0, sp
 	bl	syscall_trace_exit


-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC] syscalls: Restore address limit after a syscall
@ 2017-02-10 19:22           ` Russell King - ARM Linux
  0 siblings, 0 replies; 22+ messages in thread
From: Russell King - ARM Linux @ 2017-02-10 19:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Feb 09, 2017 at 06:42:34PM -0800, Andy Lutomirski wrote:
> On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie@google.com> wrote:
> > So by default it is in the wrapper. If selected, an architecture can
> > disable the wrapper put it in the best places. Understood correctly?
> 
> Sounds good to me.
> 
> Presumably the result should go through -mm.  Want to cc: akpm and
> linux-arch@ on the next version?
> 
> I've also cc'd arm and s390 folks -- those are the other arches that
> try to be on top of hardening.

The best place for this on ARM is in the assembly code, rather than in
the hundreds of system calls - having it in one place is surely better
for reducing the cache impact.

This (untested) patch should be sufficient for ARM - there's two choices
which I think make sense to do this:
1. Immediately after returning the syscall
2. Immediately before any returning to userspace

(1) has the advantage that the address limit will be forced for the
exit-path works that we do, preventing those making accesses to kernel
space.

(2) has the advantage that we'd guarantee that the address limit will
be forced while userspace is running for the next entry into kernel
space.

There's actually a third option as well:

(3) forcing the address limit on entry to the kernel from userspace.

This patch implements option 1.

 arch/arm/kernel/entry-common.S | 6 ++++++
 1 files changed, 6 insertions(+)

diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index eb5cd77bf1d8..6a717a2ccb88 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -39,6 +39,8 @@
 ret_fast_syscall:
  UNWIND(.fnstart	)
  UNWIND(.cantunwind	)
+	mov	r1, #TASK_SIZE
+	str	r1, [tsk, #TI_ADDR_LIMIT]
 	disable_irq_notrace			@ disable interrupts
 	ldr	r1, [tsk, #TI_FLAGS]		@ re-check for syscall tracing
 	tst	r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
@@ -64,6 +66,8 @@ ENDPROC(ret_fast_syscall)
 ret_fast_syscall:
  UNWIND(.fnstart	)
  UNWIND(.cantunwind	)
+	mov	r1, #TASK_SIZE
+	str	r1, [tsk, #TI_ADDR_LIMIT]
 	str	r0, [sp, #S_R0 + S_OFF]!	@ save returned r0
 	disable_irq_notrace			@ disable interrupts
 	ldr	r1, [tsk, #TI_FLAGS]		@ re-check for syscall tracing
@@ -262,6 +266,8 @@ ENDPROC(vector_swi)
 	b	ret_slow_syscall
 
 __sys_trace_return:
+	mov	r1, #TASK_SIZE
+	str	r1, [tsk, #TI_ADDR_LIMIT]
 	str	r0, [sp, #S_R0 + S_OFF]!	@ save returned r0
 	mov	r0, sp
 	bl	syscall_trace_exit


-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC] syscalls: Restore address limit after a syscall
  2017-02-10 19:22           ` [kernel-hardening] " Russell King - ARM Linux
  (?)
@ 2017-02-10 20:49             ` Kees Cook
  -1 siblings, 0 replies; 22+ messages in thread
From: Kees Cook @ 2017-02-10 20:49 UTC (permalink / raw)
  To: Russell King - ARM Linux, Mark Rutland
  Cc: Andy Lutomirski, Thomas Garnier, Stephen Bates, linux-s390,
	Arnd Bergmann, kernel-hardening, Linux API, Will Deacon, LKML,
	Dave Hansen, Jeff Moyer, René Nyffenegger, Milosz Tanski,
	linux-arm-kernel

On Fri, Feb 10, 2017 at 11:22 AM, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Thu, Feb 09, 2017 at 06:42:34PM -0800, Andy Lutomirski wrote:
>> On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie@google.com> wrote:
>> > So by default it is in the wrapper. If selected, an architecture can
>> > disable the wrapper put it in the best places. Understood correctly?
>>
>> Sounds good to me.
>>
>> Presumably the result should go through -mm.  Want to cc: akpm and
>> linux-arch@ on the next version?
>>
>> I've also cc'd arm and s390 folks -- those are the other arches that
>> try to be on top of hardening.
>
> The best place for this on ARM is in the assembly code, rather than in
> the hundreds of system calls - having it in one place is surely better
> for reducing the cache impact.
>
> This (untested) patch should be sufficient for ARM - there's two choices
> which I think make sense to do this:
> 1. Immediately after returning the syscall
> 2. Immediately before any returning to userspace
>
> (1) has the advantage that the address limit will be forced for the
> exit-path works that we do, preventing those making accesses to kernel
> space.
>
> (2) has the advantage that we'd guarantee that the address limit will
> be forced while userspace is running for the next entry into kernel
> space.
>
> There's actually a third option as well:
>
> (3) forcing the address limit on entry to the kernel from userspace.
>
> This patch implements option 1.
>
>  arch/arm/kernel/entry-common.S | 6 ++++++
>  1 files changed, 6 insertions(+)
>
> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> index eb5cd77bf1d8..6a717a2ccb88 100644
> --- a/arch/arm/kernel/entry-common.S
> +++ b/arch/arm/kernel/entry-common.S
> @@ -39,6 +39,8 @@
>  ret_fast_syscall:
>   UNWIND(.fnstart       )
>   UNWIND(.cantunwind    )
> +       mov     r1, #TASK_SIZE
> +       str     r1, [tsk, #TI_ADDR_LIMIT]
>         disable_irq_notrace                     @ disable interrupts
>         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
>         tst     r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
> @@ -64,6 +66,8 @@ ENDPROC(ret_fast_syscall)
>  ret_fast_syscall:
>   UNWIND(.fnstart       )
>   UNWIND(.cantunwind    )
> +       mov     r1, #TASK_SIZE
> +       str     r1, [tsk, #TI_ADDR_LIMIT]
>         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
>         disable_irq_notrace                     @ disable interrupts
>         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
> @@ -262,6 +266,8 @@ ENDPROC(vector_swi)
>         b       ret_slow_syscall
>
>  __sys_trace_return:
> +       mov     r1, #TASK_SIZE
> +       str     r1, [tsk, #TI_ADDR_LIMIT]
>         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
>         mov     r0, sp
>         bl      syscall_trace_exit
>

That looks pretty great! If I'm reading the macros correctly, this'll
only happen on _actual_ syscall exit, right? So all the crazy OABI
stuff won't suddenly break? e.g.:

asmlinkage long sys_oabi_semtimedop(int semid,
...
                mm_segment_t fs = get_fs();
                set_fs(KERNEL_DS);
                err = sys_semtimedop(semid, sops, nsops, timeout);
                set_fs(fs);
...
        return err;
}

Is there a similarly good place to do this for arm64?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [kernel-hardening] Re: [RFC] syscalls: Restore address limit after a syscall
@ 2017-02-10 20:49             ` Kees Cook
  0 siblings, 0 replies; 22+ messages in thread
From: Kees Cook @ 2017-02-10 20:49 UTC (permalink / raw)
  To: Russell King - ARM Linux, Mark Rutland
  Cc: Andy Lutomirski, Thomas Garnier, Stephen Bates, linux-s390,
	Arnd Bergmann, kernel-hardening, Linux API, Will Deacon, LKML,
	Dave Hansen, Jeff Moyer, René Nyffenegger, Milosz Tanski,
	linux-arm-kernel

On Fri, Feb 10, 2017 at 11:22 AM, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Thu, Feb 09, 2017 at 06:42:34PM -0800, Andy Lutomirski wrote:
>> On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie@google.com> wrote:
>> > So by default it is in the wrapper. If selected, an architecture can
>> > disable the wrapper put it in the best places. Understood correctly?
>>
>> Sounds good to me.
>>
>> Presumably the result should go through -mm.  Want to cc: akpm and
>> linux-arch@ on the next version?
>>
>> I've also cc'd arm and s390 folks -- those are the other arches that
>> try to be on top of hardening.
>
> The best place for this on ARM is in the assembly code, rather than in
> the hundreds of system calls - having it in one place is surely better
> for reducing the cache impact.
>
> This (untested) patch should be sufficient for ARM - there's two choices
> which I think make sense to do this:
> 1. Immediately after returning the syscall
> 2. Immediately before any returning to userspace
>
> (1) has the advantage that the address limit will be forced for the
> exit-path works that we do, preventing those making accesses to kernel
> space.
>
> (2) has the advantage that we'd guarantee that the address limit will
> be forced while userspace is running for the next entry into kernel
> space.
>
> There's actually a third option as well:
>
> (3) forcing the address limit on entry to the kernel from userspace.
>
> This patch implements option 1.
>
>  arch/arm/kernel/entry-common.S | 6 ++++++
>  1 files changed, 6 insertions(+)
>
> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> index eb5cd77bf1d8..6a717a2ccb88 100644
> --- a/arch/arm/kernel/entry-common.S
> +++ b/arch/arm/kernel/entry-common.S
> @@ -39,6 +39,8 @@
>  ret_fast_syscall:
>   UNWIND(.fnstart       )
>   UNWIND(.cantunwind    )
> +       mov     r1, #TASK_SIZE
> +       str     r1, [tsk, #TI_ADDR_LIMIT]
>         disable_irq_notrace                     @ disable interrupts
>         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
>         tst     r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
> @@ -64,6 +66,8 @@ ENDPROC(ret_fast_syscall)
>  ret_fast_syscall:
>   UNWIND(.fnstart       )
>   UNWIND(.cantunwind    )
> +       mov     r1, #TASK_SIZE
> +       str     r1, [tsk, #TI_ADDR_LIMIT]
>         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
>         disable_irq_notrace                     @ disable interrupts
>         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
> @@ -262,6 +266,8 @@ ENDPROC(vector_swi)
>         b       ret_slow_syscall
>
>  __sys_trace_return:
> +       mov     r1, #TASK_SIZE
> +       str     r1, [tsk, #TI_ADDR_LIMIT]
>         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
>         mov     r0, sp
>         bl      syscall_trace_exit
>

That looks pretty great! If I'm reading the macros correctly, this'll
only happen on _actual_ syscall exit, right? So all the crazy OABI
stuff won't suddenly break? e.g.:

asmlinkage long sys_oabi_semtimedop(int semid,
...
                mm_segment_t fs = get_fs();
                set_fs(KERNEL_DS);
                err = sys_semtimedop(semid, sops, nsops, timeout);
                set_fs(fs);
...
        return err;
}

Is there a similarly good place to do this for arm64?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC] syscalls: Restore address limit after a syscall
@ 2017-02-10 20:49             ` Kees Cook
  0 siblings, 0 replies; 22+ messages in thread
From: Kees Cook @ 2017-02-10 20:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 10, 2017 at 11:22 AM, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Thu, Feb 09, 2017 at 06:42:34PM -0800, Andy Lutomirski wrote:
>> On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie@google.com> wrote:
>> > So by default it is in the wrapper. If selected, an architecture can
>> > disable the wrapper put it in the best places. Understood correctly?
>>
>> Sounds good to me.
>>
>> Presumably the result should go through -mm.  Want to cc: akpm and
>> linux-arch@ on the next version?
>>
>> I've also cc'd arm and s390 folks -- those are the other arches that
>> try to be on top of hardening.
>
> The best place for this on ARM is in the assembly code, rather than in
> the hundreds of system calls - having it in one place is surely better
> for reducing the cache impact.
>
> This (untested) patch should be sufficient for ARM - there's two choices
> which I think make sense to do this:
> 1. Immediately after returning the syscall
> 2. Immediately before any returning to userspace
>
> (1) has the advantage that the address limit will be forced for the
> exit-path works that we do, preventing those making accesses to kernel
> space.
>
> (2) has the advantage that we'd guarantee that the address limit will
> be forced while userspace is running for the next entry into kernel
> space.
>
> There's actually a third option as well:
>
> (3) forcing the address limit on entry to the kernel from userspace.
>
> This patch implements option 1.
>
>  arch/arm/kernel/entry-common.S | 6 ++++++
>  1 files changed, 6 insertions(+)
>
> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> index eb5cd77bf1d8..6a717a2ccb88 100644
> --- a/arch/arm/kernel/entry-common.S
> +++ b/arch/arm/kernel/entry-common.S
> @@ -39,6 +39,8 @@
>  ret_fast_syscall:
>   UNWIND(.fnstart       )
>   UNWIND(.cantunwind    )
> +       mov     r1, #TASK_SIZE
> +       str     r1, [tsk, #TI_ADDR_LIMIT]
>         disable_irq_notrace                     @ disable interrupts
>         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
>         tst     r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
> @@ -64,6 +66,8 @@ ENDPROC(ret_fast_syscall)
>  ret_fast_syscall:
>   UNWIND(.fnstart       )
>   UNWIND(.cantunwind    )
> +       mov     r1, #TASK_SIZE
> +       str     r1, [tsk, #TI_ADDR_LIMIT]
>         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
>         disable_irq_notrace                     @ disable interrupts
>         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
> @@ -262,6 +266,8 @@ ENDPROC(vector_swi)
>         b       ret_slow_syscall
>
>  __sys_trace_return:
> +       mov     r1, #TASK_SIZE
> +       str     r1, [tsk, #TI_ADDR_LIMIT]
>         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
>         mov     r0, sp
>         bl      syscall_trace_exit
>

That looks pretty great! If I'm reading the macros correctly, this'll
only happen on _actual_ syscall exit, right? So all the crazy OABI
stuff won't suddenly break? e.g.:

asmlinkage long sys_oabi_semtimedop(int semid,
...
                mm_segment_t fs = get_fs();
                set_fs(KERNEL_DS);
                err = sys_semtimedop(semid, sops, nsops, timeout);
                set_fs(fs);
...
        return err;
}

Is there a similarly good place to do this for arm64?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] syscalls: Restore address limit after a syscall
  2017-02-10 20:49             ` [kernel-hardening] " Kees Cook
  (?)
  (?)
@ 2017-02-10 21:49               ` Russell King - ARM Linux
  -1 siblings, 0 replies; 22+ messages in thread
From: Russell King - ARM Linux @ 2017-02-10 21:49 UTC (permalink / raw)
  To: Kees Cook
  Cc: Mark Rutland, Stephen Bates, linux-s390, Arnd Bergmann,
	kernel-hardening, Linux API, Will Deacon, LKML, Andy Lutomirski,
	Dave Hansen, Jeff Moyer, René Nyffenegger, Milosz Tanski,
	Thomas Garnier, linux-arm-kernel

On Fri, Feb 10, 2017 at 12:49:34PM -0800, Kees Cook wrote:
> On Fri, Feb 10, 2017 at 11:22 AM, Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
> > On Thu, Feb 09, 2017 at 06:42:34PM -0800, Andy Lutomirski wrote:
> >> On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie@google.com> wrote:
> >> > So by default it is in the wrapper. If selected, an architecture can
> >> > disable the wrapper put it in the best places. Understood correctly?
> >>
> >> Sounds good to me.
> >>
> >> Presumably the result should go through -mm.  Want to cc: akpm and
> >> linux-arch@ on the next version?
> >>
> >> I've also cc'd arm and s390 folks -- those are the other arches that
> >> try to be on top of hardening.
> >
> > The best place for this on ARM is in the assembly code, rather than in
> > the hundreds of system calls - having it in one place is surely better
> > for reducing the cache impact.
> >
> > This (untested) patch should be sufficient for ARM - there's two choices
> > which I think make sense to do this:
> > 1. Immediately after returning the syscall
> > 2. Immediately before any returning to userspace
> >
> > (1) has the advantage that the address limit will be forced for the
> > exit-path works that we do, preventing those making accesses to kernel
> > space.
> >
> > (2) has the advantage that we'd guarantee that the address limit will
> > be forced while userspace is running for the next entry into kernel
> > space.
> >
> > There's actually a third option as well:
> >
> > (3) forcing the address limit on entry to the kernel from userspace.
> >
> > This patch implements option 1.
> >
> >  arch/arm/kernel/entry-common.S | 6 ++++++
> >  1 files changed, 6 insertions(+)
> >
> > diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> > index eb5cd77bf1d8..6a717a2ccb88 100644
> > --- a/arch/arm/kernel/entry-common.S
> > +++ b/arch/arm/kernel/entry-common.S
> > @@ -39,6 +39,8 @@
> >  ret_fast_syscall:
> >   UNWIND(.fnstart       )
> >   UNWIND(.cantunwind    )
> > +       mov     r1, #TASK_SIZE
> > +       str     r1, [tsk, #TI_ADDR_LIMIT]
> >         disable_irq_notrace                     @ disable interrupts
> >         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
> >         tst     r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
> > @@ -64,6 +66,8 @@ ENDPROC(ret_fast_syscall)
> >  ret_fast_syscall:
> >   UNWIND(.fnstart       )
> >   UNWIND(.cantunwind    )
> > +       mov     r1, #TASK_SIZE
> > +       str     r1, [tsk, #TI_ADDR_LIMIT]
> >         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
> >         disable_irq_notrace                     @ disable interrupts
> >         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
> > @@ -262,6 +266,8 @@ ENDPROC(vector_swi)
> >         b       ret_slow_syscall
> >
> >  __sys_trace_return:
> > +       mov     r1, #TASK_SIZE
> > +       str     r1, [tsk, #TI_ADDR_LIMIT]
> >         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
> >         mov     r0, sp
> >         bl      syscall_trace_exit
> >
> 
> That looks pretty great! If I'm reading the macros correctly, this'll
> only happen on _actual_ syscall exit, right? So all the crazy OABI
> stuff won't suddenly break? e.g.:

Correct.

> Is there a similarly good place to do this for arm64?

I'd imagine similar places exist in arm64:

ret_fast_syscall
ret_to_user

maybe?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [kernel-hardening] Re: [RFC] syscalls: Restore address limit after a syscall
@ 2017-02-10 21:49               ` Russell King - ARM Linux
  0 siblings, 0 replies; 22+ messages in thread
From: Russell King - ARM Linux @ 2017-02-10 21:49 UTC (permalink / raw)
  To: Kees Cook
  Cc: Mark Rutland, Stephen Bates, linux-s390, Arnd Bergmann,
	kernel-hardening, Linux API, Will Deacon, LKML, Andy Lutomirski,
	Dave Hansen, Jeff Moyer, René Nyffenegger, Milosz Tanski,
	Thomas Garnier, linux-arm-kernel

On Fri, Feb 10, 2017 at 12:49:34PM -0800, Kees Cook wrote:
> On Fri, Feb 10, 2017 at 11:22 AM, Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
> > On Thu, Feb 09, 2017 at 06:42:34PM -0800, Andy Lutomirski wrote:
> >> On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie@google.com> wrote:
> >> > So by default it is in the wrapper. If selected, an architecture can
> >> > disable the wrapper put it in the best places. Understood correctly?
> >>
> >> Sounds good to me.
> >>
> >> Presumably the result should go through -mm.  Want to cc: akpm and
> >> linux-arch@ on the next version?
> >>
> >> I've also cc'd arm and s390 folks -- those are the other arches that
> >> try to be on top of hardening.
> >
> > The best place for this on ARM is in the assembly code, rather than in
> > the hundreds of system calls - having it in one place is surely better
> > for reducing the cache impact.
> >
> > This (untested) patch should be sufficient for ARM - there's two choices
> > which I think make sense to do this:
> > 1. Immediately after returning the syscall
> > 2. Immediately before any returning to userspace
> >
> > (1) has the advantage that the address limit will be forced for the
> > exit-path works that we do, preventing those making accesses to kernel
> > space.
> >
> > (2) has the advantage that we'd guarantee that the address limit will
> > be forced while userspace is running for the next entry into kernel
> > space.
> >
> > There's actually a third option as well:
> >
> > (3) forcing the address limit on entry to the kernel from userspace.
> >
> > This patch implements option 1.
> >
> >  arch/arm/kernel/entry-common.S | 6 ++++++
> >  1 files changed, 6 insertions(+)
> >
> > diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> > index eb5cd77bf1d8..6a717a2ccb88 100644
> > --- a/arch/arm/kernel/entry-common.S
> > +++ b/arch/arm/kernel/entry-common.S
> > @@ -39,6 +39,8 @@
> >  ret_fast_syscall:
> >   UNWIND(.fnstart       )
> >   UNWIND(.cantunwind    )
> > +       mov     r1, #TASK_SIZE
> > +       str     r1, [tsk, #TI_ADDR_LIMIT]
> >         disable_irq_notrace                     @ disable interrupts
> >         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
> >         tst     r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
> > @@ -64,6 +66,8 @@ ENDPROC(ret_fast_syscall)
> >  ret_fast_syscall:
> >   UNWIND(.fnstart       )
> >   UNWIND(.cantunwind    )
> > +       mov     r1, #TASK_SIZE
> > +       str     r1, [tsk, #TI_ADDR_LIMIT]
> >         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
> >         disable_irq_notrace                     @ disable interrupts
> >         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
> > @@ -262,6 +266,8 @@ ENDPROC(vector_swi)
> >         b       ret_slow_syscall
> >
> >  __sys_trace_return:
> > +       mov     r1, #TASK_SIZE
> > +       str     r1, [tsk, #TI_ADDR_LIMIT]
> >         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
> >         mov     r0, sp
> >         bl      syscall_trace_exit
> >
> 
> That looks pretty great! If I'm reading the macros correctly, this'll
> only happen on _actual_ syscall exit, right? So all the crazy OABI
> stuff won't suddenly break? e.g.:

Correct.

> Is there a similarly good place to do this for arm64?

I'd imagine similar places exist in arm64:

ret_fast_syscall
ret_to_user

maybe?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] syscalls: Restore address limit after a syscall
@ 2017-02-10 21:49               ` Russell King - ARM Linux
  0 siblings, 0 replies; 22+ messages in thread
From: Russell King - ARM Linux @ 2017-02-10 21:49 UTC (permalink / raw)
  To: Kees Cook
  Cc: Mark Rutland, Stephen Bates, linux-s390-u79uwXL29TY76Z2rM5mHXA,
	Arnd Bergmann, kernel-hardening-ZwoEplunGu1jrUoiu81ncdBPR1lH4CV8,
	Linux API, Will Deacon, LKML, Andy Lutomirski, Dave Hansen,
	Jeff Moyer, René Nyffenegger, Milosz Tanski, Thomas Garnier,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Fri, Feb 10, 2017 at 12:49:34PM -0800, Kees Cook wrote:
> On Fri, Feb 10, 2017 at 11:22 AM, Russell King - ARM Linux
> <linux-I+IVW8TIWO2tmTQ+vhA3Yw@public.gmane.org> wrote:
> > On Thu, Feb 09, 2017 at 06:42:34PM -0800, Andy Lutomirski wrote:
> >> On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> >> > So by default it is in the wrapper. If selected, an architecture can
> >> > disable the wrapper put it in the best places. Understood correctly?
> >>
> >> Sounds good to me.
> >>
> >> Presumably the result should go through -mm.  Want to cc: akpm and
> >> linux-arch@ on the next version?
> >>
> >> I've also cc'd arm and s390 folks -- those are the other arches that
> >> try to be on top of hardening.
> >
> > The best place for this on ARM is in the assembly code, rather than in
> > the hundreds of system calls - having it in one place is surely better
> > for reducing the cache impact.
> >
> > This (untested) patch should be sufficient for ARM - there's two choices
> > which I think make sense to do this:
> > 1. Immediately after returning the syscall
> > 2. Immediately before any returning to userspace
> >
> > (1) has the advantage that the address limit will be forced for the
> > exit-path works that we do, preventing those making accesses to kernel
> > space.
> >
> > (2) has the advantage that we'd guarantee that the address limit will
> > be forced while userspace is running for the next entry into kernel
> > space.
> >
> > There's actually a third option as well:
> >
> > (3) forcing the address limit on entry to the kernel from userspace.
> >
> > This patch implements option 1.
> >
> >  arch/arm/kernel/entry-common.S | 6 ++++++
> >  1 files changed, 6 insertions(+)
> >
> > diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> > index eb5cd77bf1d8..6a717a2ccb88 100644
> > --- a/arch/arm/kernel/entry-common.S
> > +++ b/arch/arm/kernel/entry-common.S
> > @@ -39,6 +39,8 @@
> >  ret_fast_syscall:
> >   UNWIND(.fnstart       )
> >   UNWIND(.cantunwind    )
> > +       mov     r1, #TASK_SIZE
> > +       str     r1, [tsk, #TI_ADDR_LIMIT]
> >         disable_irq_notrace                     @ disable interrupts
> >         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
> >         tst     r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
> > @@ -64,6 +66,8 @@ ENDPROC(ret_fast_syscall)
> >  ret_fast_syscall:
> >   UNWIND(.fnstart       )
> >   UNWIND(.cantunwind    )
> > +       mov     r1, #TASK_SIZE
> > +       str     r1, [tsk, #TI_ADDR_LIMIT]
> >         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
> >         disable_irq_notrace                     @ disable interrupts
> >         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
> > @@ -262,6 +266,8 @@ ENDPROC(vector_swi)
> >         b       ret_slow_syscall
> >
> >  __sys_trace_return:
> > +       mov     r1, #TASK_SIZE
> > +       str     r1, [tsk, #TI_ADDR_LIMIT]
> >         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
> >         mov     r0, sp
> >         bl      syscall_trace_exit
> >
> 
> That looks pretty great! If I'm reading the macros correctly, this'll
> only happen on _actual_ syscall exit, right? So all the crazy OABI
> stuff won't suddenly break? e.g.:

Correct.

> Is there a similarly good place to do this for arm64?

I'd imagine similar places exist in arm64:

ret_fast_syscall
ret_to_user

maybe?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC] syscalls: Restore address limit after a syscall
@ 2017-02-10 21:49               ` Russell King - ARM Linux
  0 siblings, 0 replies; 22+ messages in thread
From: Russell King - ARM Linux @ 2017-02-10 21:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 10, 2017 at 12:49:34PM -0800, Kees Cook wrote:
> On Fri, Feb 10, 2017 at 11:22 AM, Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
> > On Thu, Feb 09, 2017 at 06:42:34PM -0800, Andy Lutomirski wrote:
> >> On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie@google.com> wrote:
> >> > So by default it is in the wrapper. If selected, an architecture can
> >> > disable the wrapper put it in the best places. Understood correctly?
> >>
> >> Sounds good to me.
> >>
> >> Presumably the result should go through -mm.  Want to cc: akpm and
> >> linux-arch@ on the next version?
> >>
> >> I've also cc'd arm and s390 folks -- those are the other arches that
> >> try to be on top of hardening.
> >
> > The best place for this on ARM is in the assembly code, rather than in
> > the hundreds of system calls - having it in one place is surely better
> > for reducing the cache impact.
> >
> > This (untested) patch should be sufficient for ARM - there's two choices
> > which I think make sense to do this:
> > 1. Immediately after returning the syscall
> > 2. Immediately before any returning to userspace
> >
> > (1) has the advantage that the address limit will be forced for the
> > exit-path works that we do, preventing those making accesses to kernel
> > space.
> >
> > (2) has the advantage that we'd guarantee that the address limit will
> > be forced while userspace is running for the next entry into kernel
> > space.
> >
> > There's actually a third option as well:
> >
> > (3) forcing the address limit on entry to the kernel from userspace.
> >
> > This patch implements option 1.
> >
> >  arch/arm/kernel/entry-common.S | 6 ++++++
> >  1 files changed, 6 insertions(+)
> >
> > diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> > index eb5cd77bf1d8..6a717a2ccb88 100644
> > --- a/arch/arm/kernel/entry-common.S
> > +++ b/arch/arm/kernel/entry-common.S
> > @@ -39,6 +39,8 @@
> >  ret_fast_syscall:
> >   UNWIND(.fnstart       )
> >   UNWIND(.cantunwind    )
> > +       mov     r1, #TASK_SIZE
> > +       str     r1, [tsk, #TI_ADDR_LIMIT]
> >         disable_irq_notrace                     @ disable interrupts
> >         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
> >         tst     r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
> > @@ -64,6 +66,8 @@ ENDPROC(ret_fast_syscall)
> >  ret_fast_syscall:
> >   UNWIND(.fnstart       )
> >   UNWIND(.cantunwind    )
> > +       mov     r1, #TASK_SIZE
> > +       str     r1, [tsk, #TI_ADDR_LIMIT]
> >         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
> >         disable_irq_notrace                     @ disable interrupts
> >         ldr     r1, [tsk, #TI_FLAGS]            @ re-check for syscall tracing
> > @@ -262,6 +266,8 @@ ENDPROC(vector_swi)
> >         b       ret_slow_syscall
> >
> >  __sys_trace_return:
> > +       mov     r1, #TASK_SIZE
> > +       str     r1, [tsk, #TI_ADDR_LIMIT]
> >         str     r0, [sp, #S_R0 + S_OFF]!        @ save returned r0
> >         mov     r0, sp
> >         bl      syscall_trace_exit
> >
> 
> That looks pretty great! If I'm reading the macros correctly, this'll
> only happen on _actual_ syscall exit, right? So all the crazy OABI
> stuff won't suddenly break? e.g.:

Correct.

> Is there a similarly good place to do this for arm64?

I'd imagine similar places exist in arm64:

ret_fast_syscall
ret_to_user

maybe?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-02-10 21:50 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-09 18:33 [RFC] syscalls: Restore address limit after a syscall Thomas Garnier
2017-02-09 18:33 ` [kernel-hardening] " Thomas Garnier
2017-02-09 19:31 ` Kees Cook
2017-02-09 19:31   ` [kernel-hardening] " Kees Cook
2017-02-09 19:31   ` Kees Cook
2017-02-09 23:05   ` Andy Lutomirski
2017-02-09 23:05     ` [kernel-hardening] " Andy Lutomirski
2017-02-09 23:41     ` Thomas Garnier
2017-02-09 23:41       ` [kernel-hardening] " Thomas Garnier
2017-02-10  2:42       ` Andy Lutomirski
2017-02-10  2:42         ` Andy Lutomirski
2017-02-10  2:42         ` [kernel-hardening] " Andy Lutomirski
2017-02-10 19:22         ` Russell King - ARM Linux
2017-02-10 19:22           ` Russell King - ARM Linux
2017-02-10 19:22           ` [kernel-hardening] " Russell King - ARM Linux
2017-02-10 20:49           ` Kees Cook
2017-02-10 20:49             ` Kees Cook
2017-02-10 20:49             ` [kernel-hardening] " Kees Cook
2017-02-10 21:49             ` Russell King - ARM Linux
2017-02-10 21:49               ` Russell King - ARM Linux
2017-02-10 21:49               ` Russell King - ARM Linux
2017-02-10 21:49               ` [kernel-hardening] " Russell King - ARM Linux

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.