All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH] exec: Use init rlimits for setuid exec
@ 2017-07-06  4:32 Kees Cook
  2017-07-06  4:59 ` Andy Lutomirski
                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Kees Cook @ 2017-07-06  4:32 UTC (permalink / raw)
  To: Linus Torvalds, Andy Lutomirski
  Cc: Michal Hocko, Ben Hutchings, Willy Tarreau, Hugh Dickins,
	Oleg Nesterov, Jason A. Donenfeld, Rik van Riel, Larry Woodman,
	Kirill A. Shutemov, Tony Luck, James E.J. Bottomley,
	Helge Diller, James Hogan, Laura Abbott, Greg KH, security,
	Qualys Security Advisory, LKML, Ximin Luo

In an attempt to provide sensible rlimit defaults for setuid execs, this
inherits the namespace's init rlimits:

$ ulimit -s
8192
$ ulimit -s unlimited
$ /bin/sh -c 'ulimit -s'
unlimited
$ sudo /bin/sh -c 'ulimit -s'
8192

This is modified from Brad Spengler/PaX Team's hard-coded setuid exec
stack rlimit (8MB) in the last public patch of grsecurity/PaX based on
my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
Instead of copying all rlimits, we could also pick specific ones to copy
(e.g. RLIMIT_STACK, or ones from Andy's list) or exclude from copying
(probably better to blacklist than whitelist).

I think this is the right way to find the ns init task, but maybe it
needs locking?
---
 fs/exec.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 904199086490..80e8b2bd4284 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1675,6 +1675,12 @@ static int exec_binprm(struct linux_binprm *bprm)
 	return ret;
 }
 
+static inline bool is_setuid_exec(struct linux_binprm *bprm)
+{
+	return (!uid_eq(bprm->cred->euid, current_euid()) ||
+		!gid_eq(bprm->cred->egid, current_egid()));
+}
+
 /*
  * sys_execve() executes a new program.
  */
@@ -1687,6 +1693,7 @@ static int do_execveat_common(int fd, struct filename *filename,
 	struct linux_binprm *bprm;
 	struct file *file;
 	struct files_struct *displaced;
+	struct rlimit saved_rlim[RLIM_NLIMITS];
 	int retval;
 
 	if (IS_ERR(filename))
@@ -1771,24 +1778,38 @@ static int do_execveat_common(int fd, struct filename *filename,
 	if (retval < 0)
 		goto out;
 
+	/*
+	 * From here forward, we've got credentials set up and we're
+	 * using resources, so do rlimit replacement before we start
+	 * copying strings. (Note that the RLIMIT_NPROC check has
+	 * already happened.)
+	 */
+	BUILD_BUG_ON(sizeof(saved_rlim) != sizeof(current->signal->rlim));
+	if (is_setuid_exec(bprm)) {
+		memcpy(saved_rlim, current->signal->rlim, sizeof(saved_rlim));
+		memcpy(current->signal->rlim,
+		       task_active_pid_ns(current)->child_reaper->signal->rlim,
+		       sizeof(current->signal->rlim));
+	}
+
 	retval = copy_strings_kernel(1, &bprm->filename, bprm);
 	if (retval < 0)
-		goto out;
+		goto out_restore;
 
 	bprm->exec = bprm->p;
 	retval = copy_strings(bprm->envc, envp, bprm);
 	if (retval < 0)
-		goto out;
+		goto out_restore;
 
 	retval = copy_strings(bprm->argc, argv, bprm);
 	if (retval < 0)
-		goto out;
+		goto out_restore;
 
 	would_dump(bprm, bprm->file);
 
 	retval = exec_binprm(bprm);
 	if (retval < 0)
-		goto out;
+		goto out_restore;
 
 	/* execve succeeded */
 	current->fs->in_exec = 0;
@@ -1802,6 +1823,11 @@ static int do_execveat_common(int fd, struct filename *filename,
 		put_files_struct(displaced);
 	return retval;
 
+out_restore:
+	if (is_setuid_exec(bprm)) {
+		memcpy(current->signal->rlim, saved_rlim, sizeof(saved_rlim));
+	}
+
 out:
 	if (bprm->mm) {
 		acct_arg_size(bprm, 0);
-- 
2.7.4


-- 
Kees Cook
Pixel Security

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-06  4:32 [RFC][PATCH] exec: Use init rlimits for setuid exec Kees Cook
@ 2017-07-06  4:59 ` Andy Lutomirski
  2017-07-06 12:45   ` Eric W. Biederman
  2017-07-06  5:47 ` Willy Tarreau
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 36+ messages in thread
From: Andy Lutomirski @ 2017-07-06  4:59 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Andy Lutomirski, Michal Hocko, Ben Hutchings,
	Willy Tarreau, Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld,
	Rik van Riel, Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Wed, Jul 5, 2017 at 9:32 PM, Kees Cook <keescook@chromium.org> wrote:
> In an attempt to provide sensible rlimit defaults for setuid execs, this
> inherits the namespace's init rlimits:
>
> $ ulimit -s
> 8192
> $ ulimit -s unlimited
> $ /bin/sh -c 'ulimit -s'
> unlimited
> $ sudo /bin/sh -c 'ulimit -s'
> 8192
>
> This is modified from Brad Spengler/PaX Team's hard-coded setuid exec
> stack rlimit (8MB) in the last public patch of grsecurity/PaX based on
> my understanding of the code. Changes or omissions from the original
> code are mine and don't reflect the original grsecurity/PaX code.
>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
> Instead of copying all rlimits, we could also pick specific ones to copy
> (e.g. RLIMIT_STACK, or ones from Andy's list) or exclude from copying
> (probably better to blacklist than whitelist).
>
> I think this is the right way to find the ns init task, but maybe it
> needs locking?
> ---
>  fs/exec.c | 34 ++++++++++++++++++++++++++++++----
>  1 file changed, 30 insertions(+), 4 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 904199086490..80e8b2bd4284 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1675,6 +1675,12 @@ static int exec_binprm(struct linux_binprm *bprm)
>         return ret;
>  }
>
> +static inline bool is_setuid_exec(struct linux_binprm *bprm)
> +{
> +       return (!uid_eq(bprm->cred->euid, current_euid()) ||
> +               !gid_eq(bprm->cred->egid, current_egid()));
> +}
> +

How about skipping this and using security_bprm_secureexec()?

>  /*
>   * sys_execve() executes a new program.
>   */
> @@ -1687,6 +1693,7 @@ static int do_execveat_common(int fd, struct filename *filename,
>         struct linux_binprm *bprm;
>         struct file *file;
>         struct files_struct *displaced;
> +       struct rlimit saved_rlim[RLIM_NLIMITS];
>         int retval;
>
>         if (IS_ERR(filename))
> @@ -1771,24 +1778,38 @@ static int do_execveat_common(int fd, struct filename *filename,
>         if (retval < 0)
>                 goto out;
>
> +       /*
> +        * From here forward, we've got credentials set up and we're
> +        * using resources, so do rlimit replacement before we start
> +        * copying strings. (Note that the RLIMIT_NPROC check has
> +        * already happened.)
> +        */
> +       BUILD_BUG_ON(sizeof(saved_rlim) != sizeof(current->signal->rlim));
> +       if (is_setuid_exec(bprm)) {
> +               memcpy(saved_rlim, current->signal->rlim, sizeof(saved_rlim));
> +               memcpy(current->signal->rlim,
> +                      task_active_pid_ns(current)->child_reaper->signal->rlim,
> +                      sizeof(current->signal->rlim));
> +       }
> +

Is there any locking needed here?  Is it possible that another thread
with the same current->signal is running?

I think the answer is that this needs to be after de_thread(), which
is called from exec_binprm(), which is rather annoying since the stack
rlimit is used before that.  It's not *that* bad, since I think you
could replace all the RLIMIT_STACK accesses with explciit code to look
at the rlimits that are actually in play, but you just can't actually
install them until you've flushed the old exec.

Perhaps, if you fix this, the changelog should say:

Changes, omissions, or bugfixes from the original
code are mine and don't reflect the original grsecurity/PaX code.

(Hmm, is the grsecurity/PaX code exploitable?  If you can exec a
setuid program and arrange for it to fail while you're in a threaded
program, it looks like the patch you sent will result in corruption of
the rlimits in use by your other threads.  Admittedy, bypassing
rlimits is not exactly a huge deal.)

Also, should this whole thing have a sysctl?

--Andy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-06  4:32 [RFC][PATCH] exec: Use init rlimits for setuid exec Kees Cook
  2017-07-06  4:59 ` Andy Lutomirski
@ 2017-07-06  5:47 ` Willy Tarreau
  2017-07-06 12:38 ` Eric W. Biederman
  2017-07-06 16:34 ` Linus Torvalds
  3 siblings, 0 replies; 36+ messages in thread
From: Willy Tarreau @ 2017-07-06  5:47 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Andy Lutomirski, Michal Hocko, Ben Hutchings,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Wed, Jul 05, 2017 at 09:32:35PM -0700, Kees Cook wrote:
> In an attempt to provide sensible rlimit defaults for setuid execs, this
> inherits the namespace's init rlimits:
> 
> $ ulimit -s
> 8192
> $ ulimit -s unlimited
> $ /bin/sh -c 'ulimit -s'
> unlimited
> $ sudo /bin/sh -c 'ulimit -s'
> 8192
> 
> This is modified from Brad Spengler/PaX Team's hard-coded setuid exec
> stack rlimit (8MB) in the last public patch of grsecurity/PaX based on
> my understanding of the code. Changes or omissions from the original
> code are mine and don't reflect the original grsecurity/PaX code.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
> Instead of copying all rlimits, we could also pick specific ones to copy
> (e.g. RLIMIT_STACK, or ones from Andy's list) or exclude from copying
> (probably better to blacklist than whitelist).
> 
> I think this is the right way to find the ns init task, but maybe it
> needs locking?
> ---
>  fs/exec.c | 34 ++++++++++++++++++++++++++++++----
>  1 file changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 904199086490..80e8b2bd4284 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1675,6 +1675,12 @@ static int exec_binprm(struct linux_binprm *bprm)
>  	return ret;
>  }
>  
> +static inline bool is_setuid_exec(struct linux_binprm *bprm)
> +{
> +	return (!uid_eq(bprm->cred->euid, current_euid()) ||
> +		!gid_eq(bprm->cred->egid, current_egid()));
> +}
> +
>  /*
>   * sys_execve() executes a new program.
>   */
> @@ -1687,6 +1693,7 @@ static int do_execveat_common(int fd, struct filename *filename,
>  	struct linux_binprm *bprm;
>  	struct file *file;
>  	struct files_struct *displaced;
> +	struct rlimit saved_rlim[RLIM_NLIMITS];
>  	int retval;
>  
>  	if (IS_ERR(filename))
> @@ -1771,24 +1778,38 @@ static int do_execveat_common(int fd, struct filename *filename,
>  	if (retval < 0)
>  		goto out;
>  
> +	/*
> +	 * From here forward, we've got credentials set up and we're
> +	 * using resources, so do rlimit replacement before we start
> +	 * copying strings. (Note that the RLIMIT_NPROC check has
> +	 * already happened.)
> +	 */
> +	BUILD_BUG_ON(sizeof(saved_rlim) != sizeof(current->signal->rlim));
> +	if (is_setuid_exec(bprm)) {
> +		memcpy(saved_rlim, current->signal->rlim, sizeof(saved_rlim));
> +		memcpy(current->signal->rlim,
> +		       task_active_pid_ns(current)->child_reaper->signal->rlim,
> +		       sizeof(current->signal->rlim));
> +	}
> +

But you're not replacing just RLIMIT_STACK but all rlimits here. Please
don't do this, as I mentionned elsewhere in the thread it will really
become a pain at least for RLIMIT_CORE and RLIMIT_NOFILE. Better only
apply RLIMIT_STACK as the commit message mentions it.

Willy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-06  4:32 [RFC][PATCH] exec: Use init rlimits for setuid exec Kees Cook
  2017-07-06  4:59 ` Andy Lutomirski
  2017-07-06  5:47 ` Willy Tarreau
@ 2017-07-06 12:38 ` Eric W. Biederman
  2017-07-06 15:30   ` Andy Lutomirski
  2017-07-06 16:34 ` Linus Torvalds
  3 siblings, 1 reply; 36+ messages in thread
From: Eric W. Biederman @ 2017-07-06 12:38 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Andy Lutomirski, Michal Hocko, Ben Hutchings,
	Willy Tarreau, Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld,
	Rik van Riel, Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

Kees Cook <keescook@chromium.org> writes:

> In an attempt to provide sensible rlimit defaults for setuid execs, this
> inherits the namespace's init rlimits:
>
> $ ulimit -s
> 8192
> $ ulimit -s unlimited
> $ /bin/sh -c 'ulimit -s'
> unlimited
> $ sudo /bin/sh -c 'ulimit -s'
> 8192
>
> This is modified from Brad Spengler/PaX Team's hard-coded setuid exec
> stack rlimit (8MB) in the last public patch of grsecurity/PaX based on
> my understanding of the code. Changes or omissions from the original
> code are mine and don't reflect the original grsecurity/PaX code.
>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
> Instead of copying all rlimits, we could also pick specific ones to copy
> (e.g. RLIMIT_STACK, or ones from Andy's list) or exclude from copying
> (probably better to blacklist than whitelist).
>
> I think this is the right way to find the ns init task, but maybe it
> needs locking?
> ---
>  fs/exec.c | 34 ++++++++++++++++++++++++++++++----
>  1 file changed, 30 insertions(+), 4 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 904199086490..80e8b2bd4284 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1675,6 +1675,12 @@ static int exec_binprm(struct linux_binprm *bprm)
>  	return ret;
>  }
>  
> +static inline bool is_setuid_exec(struct linux_binprm *bprm)
> +{
> +	return (!uid_eq(bprm->cred->euid, current_euid()) ||
> +		!gid_eq(bprm->cred->egid, current_egid()));
> +}

Awesome I can make an executable setuid to myself and get all of roots
rlimits!

Scratch inheritable rlimits as useful for any kind of policy decision.

>  /*
>   * sys_execve() executes a new program.
>   */
> @@ -1687,6 +1693,7 @@ static int do_execveat_common(int fd, struct filename *filename,
>  	struct linux_binprm *bprm;
>  	struct file *file;
>  	struct files_struct *displaced;
> +	struct rlimit saved_rlim[RLIM_NLIMITS];
>  	int retval;
>  
>  	if (IS_ERR(filename))
> @@ -1771,24 +1778,38 @@ static int do_execveat_common(int fd, struct filename *filename,
>  	if (retval < 0)
>  		goto out;
>  
> +	/*
> +	 * From here forward, we've got credentials set up and we're
> +	 * using resources, so do rlimit replacement before we start
> +	 * copying strings. (Note that the RLIMIT_NPROC check has
> +	 * already happened.)
> +	 */
> +	BUILD_BUG_ON(sizeof(saved_rlim) != sizeof(current->signal->rlim));
> +	if (is_setuid_exec(bprm)) {
> +		memcpy(saved_rlim, current->signal->rlim, sizeof(saved_rlim));
> +		memcpy(current->signal->rlim,
> +		       task_active_pid_ns(current)->child_reaper->signal->rlim,
> +		       sizeof(current->signal->rlim));
> +	}
> +

Caerful.  child_reaper can change if you are not holding the tasklist
lock.

It would be better if we could move any rlimit changes after de_thread.
Otherwise there are some really fun races you can play with.

After de_thread is past the point of no return so you would not need to
worry about restoring the rlimits either.

>  	retval = copy_strings_kernel(1, &bprm->filename, bprm);
>  	if (retval < 0)
> -		goto out;
> +		goto out_restore;
>  
>  	bprm->exec = bprm->p;
>  	retval = copy_strings(bprm->envc, envp, bprm);
>  	if (retval < 0)
> -		goto out;
> +		goto out_restore;
>  
>  	retval = copy_strings(bprm->argc, argv, bprm);
>  	if (retval < 0)
> -		goto out;
> +		goto out_restore;
>  
>  	would_dump(bprm, bprm->file);
>  
>  	retval = exec_binprm(bprm);
>  	if (retval < 0)
> -		goto out;
> +		goto out_restore;
>  
>  	/* execve succeeded */
>  	current->fs->in_exec = 0;
> @@ -1802,6 +1823,11 @@ static int do_execveat_common(int fd, struct filename *filename,
>  		put_files_struct(displaced);
>  	return retval;
>  
> +out_restore:
> +	if (is_setuid_exec(bprm)) {
> +		memcpy(current->signal->rlim, saved_rlim, sizeof(saved_rlim));
> +	}
> +
>  out:
>  	if (bprm->mm) {
>  		acct_arg_size(bprm, 0);
> -- 
> 2.7.4

Eric

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-06  4:59 ` Andy Lutomirski
@ 2017-07-06 12:45   ` Eric W. Biederman
  2017-07-06 15:27     ` Andy Lutomirski
  0 siblings, 1 reply; 36+ messages in thread
From: Eric W. Biederman @ 2017-07-06 12:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Linus Torvalds, Michal Hocko, Ben Hutchings,
	Willy Tarreau, Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld,
	Rik van Riel, Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

Andy Lutomirski <luto@kernel.org> writes:

> On Wed, Jul 5, 2017 at 9:32 PM, Kees Cook <keescook@chromium.org> wrote:
>> In an attempt to provide sensible rlimit defaults for setuid execs, this
>> inherits the namespace's init rlimits:
>>
>> $ ulimit -s
>> 8192
>> $ ulimit -s unlimited
>> $ /bin/sh -c 'ulimit -s'
>> unlimited
>> $ sudo /bin/sh -c 'ulimit -s'
>> 8192
>>
>> This is modified from Brad Spengler/PaX Team's hard-coded setuid exec
>> stack rlimit (8MB) in the last public patch of grsecurity/PaX based on
>> my understanding of the code. Changes or omissions from the original
>> code are mine and don't reflect the original grsecurity/PaX code.
>>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>> ---
>> Instead of copying all rlimits, we could also pick specific ones to copy
>> (e.g. RLIMIT_STACK, or ones from Andy's list) or exclude from copying
>> (probably better to blacklist than whitelist).
>>
>> I think this is the right way to find the ns init task, but maybe it
>> needs locking?
>> ---
>>  fs/exec.c | 34 ++++++++++++++++++++++++++++++----
>>  1 file changed, 30 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/exec.c b/fs/exec.c
>> index 904199086490..80e8b2bd4284 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1675,6 +1675,12 @@ static int exec_binprm(struct linux_binprm *bprm)
>>         return ret;
>>  }
>>
>> +static inline bool is_setuid_exec(struct linux_binprm *bprm)
>> +{
>> +       return (!uid_eq(bprm->cred->euid, current_euid()) ||
>> +               !gid_eq(bprm->cred->egid, current_egid()));
>> +}
>> +
>
> How about skipping this and using security_bprm_secureexec()?
>
>>  /*
>>   * sys_execve() executes a new program.
>>   */
>> @@ -1687,6 +1693,7 @@ static int do_execveat_common(int fd, struct filename *filename,
>>         struct linux_binprm *bprm;
>>         struct file *file;
>>         struct files_struct *displaced;
>> +       struct rlimit saved_rlim[RLIM_NLIMITS];
>>         int retval;
>>
>>         if (IS_ERR(filename))
>> @@ -1771,24 +1778,38 @@ static int do_execveat_common(int fd, struct filename *filename,
>>         if (retval < 0)
>>                 goto out;
>>
>> +       /*
>> +        * From here forward, we've got credentials set up and we're
>> +        * using resources, so do rlimit replacement before we start
>> +        * copying strings. (Note that the RLIMIT_NPROC check has
>> +        * already happened.)
>> +        */
>> +       BUILD_BUG_ON(sizeof(saved_rlim) != sizeof(current->signal->rlim));
>> +       if (is_setuid_exec(bprm)) {
>> +               memcpy(saved_rlim, current->signal->rlim, sizeof(saved_rlim));
>> +               memcpy(current->signal->rlim,
>> +                      task_active_pid_ns(current)->child_reaper->signal->rlim,
>> +                      sizeof(current->signal->rlim));
>> +       }
>> +
>
> Is there any locking needed here?  Is it possible that another thread
> with the same current->signal is running?

tasklist_lock protects child_reaper, and child_reaper changes.
I suppose rcu_read_lock should be enough for the case above if we just
want a snapshot in time.

It is 100% possible another thread is running and could take advantage
of the new limits.

> I think the answer is that this needs to be after de_thread(), which
> is called from exec_binprm(), which is rather annoying since the stack
> rlimit is used before that.  It's not *that* bad, since I think you
> could replace all the RLIMIT_STACK accesses with explciit code to look
> at the rlimits that are actually in play, but you just can't actually
> install them until you've flushed the old exec.

How this is handled elsewhere in the code is to put the new values in
bprm.  Putting new rlimits in bprm and changing them in flush_old_exec
or or setup_new_exec seems very sensible. It also allows for them to be
accessed before de_thread when setting up the new mm and we can still
fail.

Eric

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-06 12:45   ` Eric W. Biederman
@ 2017-07-06 15:27     ` Andy Lutomirski
  0 siblings, 0 replies; 36+ messages in thread
From: Andy Lutomirski @ 2017-07-06 15:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andy Lutomirski, Kees Cook, Linus Torvalds, Michal Hocko,
	Ben Hutchings, Willy Tarreau, Hugh Dickins, Oleg Nesterov,
	Jason A. Donenfeld, Rik van Riel, Larry Woodman,
	Kirill A. Shutemov, Tony Luck, James E.J. Bottomley,
	Helge Diller, James Hogan, Laura Abbott, Greg KH, security,
	Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 5:45 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> How this is handled elsewhere in the code is to put the new values in
> bprm.  Putting new rlimits in bprm and changing them in flush_old_exec
> or or setup_new_exec seems very sensible. It also allows for them to be
> accessed before de_thread when setting up the new mm and we can still
> fail.
>

Makes sense to me.

--Andy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-06 12:38 ` Eric W. Biederman
@ 2017-07-06 15:30   ` Andy Lutomirski
  0 siblings, 0 replies; 36+ messages in thread
From: Andy Lutomirski @ 2017-07-06 15:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Kees Cook, Linus Torvalds, Andy Lutomirski, Michal Hocko,
	Ben Hutchings, Willy Tarreau, Hugh Dickins, Oleg Nesterov,
	Jason A. Donenfeld, Rik van Riel, Larry Woodman,
	Kirill A. Shutemov, Tony Luck, James E.J. Bottomley,
	Helge Diller, James Hogan, Laura Abbott, Greg KH, security,
	Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 5:38 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Kees Cook <keescook@chromium.org> writes:
>
>> In an attempt to provide sensible rlimit defaults for setuid execs, this
>> inherits the namespace's init rlimits:
>>
>> $ ulimit -s
>> 8192
>> $ ulimit -s unlimited
>> $ /bin/sh -c 'ulimit -s'
>> unlimited
>> $ sudo /bin/sh -c 'ulimit -s'
>> 8192
>>
>> This is modified from Brad Spengler/PaX Team's hard-coded setuid exec
>> stack rlimit (8MB) in the last public patch of grsecurity/PaX based on
>> my understanding of the code. Changes or omissions from the original
>> code are mine and don't reflect the original grsecurity/PaX code.
>>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>> ---
>> Instead of copying all rlimits, we could also pick specific ones to copy
>> (e.g. RLIMIT_STACK, or ones from Andy's list) or exclude from copying
>> (probably better to blacklist than whitelist).
>>
>> I think this is the right way to find the ns init task, but maybe it
>> needs locking?
>> ---
>>  fs/exec.c | 34 ++++++++++++++++++++++++++++++----
>>  1 file changed, 30 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/exec.c b/fs/exec.c
>> index 904199086490..80e8b2bd4284 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1675,6 +1675,12 @@ static int exec_binprm(struct linux_binprm *bprm)
>>       return ret;
>>  }
>>
>> +static inline bool is_setuid_exec(struct linux_binprm *bprm)
>> +{
>> +     return (!uid_eq(bprm->cred->euid, current_euid()) ||
>> +             !gid_eq(bprm->cred->egid, current_egid()));
>> +}
>
> Awesome I can make an executable setuid to myself and get all of roots
> rlimits!
>
> Scratch inheritable rlimits as useful for any kind of policy decision.

Most of them are already fairly useless.  But it's a fair point.  We
don't want Jane to run a setuid-John program and thereby escape
whatever limits are imposed.  We don't really have a concept of
bona-fide-trustworthy-setuid-root, though.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-06  4:32 [RFC][PATCH] exec: Use init rlimits for setuid exec Kees Cook
                   ` (2 preceding siblings ...)
  2017-07-06 12:38 ` Eric W. Biederman
@ 2017-07-06 16:34 ` Linus Torvalds
  2017-07-06 16:50   ` Linus Torvalds
                     ` (2 more replies)
  3 siblings, 3 replies; 36+ messages in thread
From: Linus Torvalds @ 2017-07-06 16:34 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Michal Hocko, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Wed, Jul 5, 2017 at 9:32 PM, Kees Cook <keescook@chromium.org> wrote:
> In an attempt to provide sensible rlimit defaults for setuid execs, this
> inherits the namespace's init rlimits:

Yeah, so I have to admit to hating this patch.

As already mentioned by others, it's not only not clear that we want
to do this on every setuid exec, it's also not clear that init is the
right source of limits, or even which limits we'd want to copy.

I can easily see init doing a rlimit for its own use, and then when it
goes through the fork/exec process does it set up some other rlimit
for what it is going to run. You'd presumably want that for any
non-system thing, so it's actually fairly natural to do it for system
things too, so it's not at all obvious that "init" itself would run
with some generic "system limits".

So to me this feels like a bad hack that was brought on by this
particular attack.

I'd much rather see something like

 (a) minimal: just use our existing default stack (and stack _only_)
limit value for suid binaries that actually get extra permissions: {
_STK_LIM, RLIM_INFINITY }.

or

 (b) fancier: per-namespace defaults that can be explicitly set by
something, and enabled individually.

or

 (c) perhaps encourage people to annotate their suid binaries with
initial resource requirements (and for stack, I mean the existing
GNU_STACK ELF annotation in particular).

For an example of (a), that existing _STK_LIM define is what the
kernel defaults to, and it's a 8MB stack. And looking at my Fedora
install, I see that the default user rlimit is 8MB for the stack.

Is that just coincidence, or is that just a sign of "nobody ever even
modifies the default value"? So (a) feels like "nobody really cares,
and 8MB is fine, and nobody even bothers changing it - just do the
minimal thing".

As to (b), we could just have that whole INIT_RLIMITS per-namespace,
but only enable the stack limit by default. But then system admins
could cvhange those limits and enable/disable individual rlimits to be
used by suid binaries. That feels like the "give the admin tools"

And (c) would be the sane option, and what we already do for things
like GNU_STACK to enable/disable executable stacks. It really feels
like allowing the GNU_STACK segment to contain stack rlimit override
information would be the perfect tool for binaries to say "Yeah, I
need more stack than _STK_LIM".

So I see many different approaches (that could be combined: I like
combining (a) and (c), for example), and absolutely none of them
involve the random "take some values from init".

And yes, a large part of this may be that I no longer feel like I can
trust "init" to do the sane thing. You all presumably know why.

                   Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-06 16:34 ` Linus Torvalds
@ 2017-07-06 16:50   ` Linus Torvalds
  2017-07-06 17:29   ` Kees Cook
  2017-07-12 23:50   ` Alan Cox
  2 siblings, 0 replies; 36+ messages in thread
From: Linus Torvalds @ 2017-07-06 16:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Michal Hocko, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 9:34 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>  (a) minimal: just use our existing default stack (and stack _only_)
> limit value for suid binaries that actually get extra permissions: {
> _STK_LIM, RLIM_INFINITY }.
>
>  (c) perhaps encourage people to annotate their suid binaries with
> initial resource requirements (and for stack, I mean the existing
> GNU_STACK ELF annotation in particular).

The more I look at that combination, the more I like it.

We already parse PT_GNU_STACK, and we already take the permission
values from there.

So taking the "p_memsz" field from there to mean the stack limit (if
it is non-zero) would be not only simple, but logical and clean.

And so if something ever wants more stack than 8MB, it would be
trivial to just use elf tools to mark that segment.

Ok, I admit that I don't know if there are any sane elf tools that do
this easily, but it still sounds conceptually like the RightThing(tm)
to do.

                Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-06 16:34 ` Linus Torvalds
  2017-07-06 16:50   ` Linus Torvalds
@ 2017-07-06 17:29   ` Kees Cook
  2017-07-06 17:52     ` Linus Torvalds
  2017-07-12 23:50   ` Alan Cox
  2 siblings, 1 reply; 36+ messages in thread
From: Kees Cook @ 2017-07-06 17:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Michal Hocko, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 9:34 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Jul 5, 2017 at 9:32 PM, Kees Cook <keescook@chromium.org> wrote:
>> In an attempt to provide sensible rlimit defaults for setuid execs, this
>> inherits the namespace's init rlimits:
>
> Yeah, so I have to admit to hating this patch.

Yup! Totally fine. I just wanted to demonstrate the idea with some actual code.

> I'd much rather see something like
>
>  (a) minimal: just use our existing default stack (and stack _only_)
> limit value for suid binaries that actually get extra permissions: {
> _STK_LIM, RLIM_INFINITY }.

This would look a lot like the existing patch; it'd just not copy the
init process rlimits.

>  (c) perhaps encourage people to annotate their suid binaries with
> initial resource requirements (and for stack, I mean the existing
> GNU_STACK ELF annotation in particular).
>
> For an example of (a), that existing _STK_LIM define is what the
> kernel defaults to, and it's a 8MB stack. And looking at my Fedora
> install, I see that the default user rlimit is 8MB for the stack.
> [...]
> Is that just coincidence, or is that just a sign of "nobody ever even
> modifies the default value"? So (a) feels like "nobody really cares,
> and 8MB is fine, and nobody even bothers changing it - just do the
> minimal thing".

I would be terrified to see a setuid binary that needed >8MB stack. :P

> And (c) would be the sane option, and what we already do for things
> like GNU_STACK to enable/disable executable stacks. It really feels
> like allowing the GNU_STACK segment to contain stack rlimit override
> information would be the perfect tool for binaries to say "Yeah, I
> need more stack than _STK_LIM".
>
> So I see many different approaches (that could be combined: I like
> combining (a) and (c), for example), and absolutely none of them
> involve the random "take some values from init".
>
> And yes, a large part of this may be that I no longer feel like I can
> trust "init" to do the sane thing. You all presumably know why.

Heh. Yeah, I think a+c could make sense. "if not GNU_STACK program
header or 0 MemSiz GNU_STACK program header, use _STK_LIM, otherwise
use GNU_STACK MemSiz". I'll send a patch...

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-06 17:29   ` Kees Cook
@ 2017-07-06 17:52     ` Linus Torvalds
  2017-07-06 19:12       ` Kees Cook
  0 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2017-07-06 17:52 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Michal Hocko, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 10:29 AM, Kees Cook <keescook@chromium.org> wrote:
>>
>>  (a) minimal: just use our existing default stack (and stack _only_)
>> limit value for suid binaries that actually get extra permissions: {
>> _STK_LIM, RLIM_INFINITY }.
>
> This would look a lot like the existing patch; it'd just not copy the
> init process rlimits.

Can't we just do the final rlimit setting so late in execve that we
don't need that whole "saved_rlimit" thing?

If the issue is the "people can use argv/envp to already fill the
stack", then I'd actually be happier with just limiting that.

We already claim that our ARG_MAX is just 128kB (old legacy). And I
was really happy when we changed our execve() to not have that nasty
array of pages, and we could expand on the array sizes. But we could
*easily* just say "limit execve arrays to 8MB", because while our code
can handle more, you do have latency issues and just memory use issues
too.

So right now we already limit the stack size artificially to 1/4 the
stack rlimit (see get_arg_page()), and we could easily just further
cap it at 8M total - right now people obviously actually run in
practice with much less (ie for me that argument size is capped at a
quarter of that 8MB default rlimit).

I have heard of people who want a big stack due to crazy recursion or
due to just doing otherwise insane things. But needing more than 8MB
of arg/envp? Not happening.

So I think we could easily do that stack rlimit thing at the very last
minute, and not have to worry about restoring anything.

                 Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-06 17:52     ` Linus Torvalds
@ 2017-07-06 19:12       ` Kees Cook
  2017-07-07  4:48         ` Andy Lutomirski
  2017-07-10  8:44         ` Michal Hocko
  0 siblings, 2 replies; 36+ messages in thread
From: Kees Cook @ 2017-07-06 19:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Michal Hocko, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 10:52 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jul 6, 2017 at 10:29 AM, Kees Cook <keescook@chromium.org> wrote:
>>>
>>>  (a) minimal: just use our existing default stack (and stack _only_)
>>> limit value for suid binaries that actually get extra permissions: {
>>> _STK_LIM, RLIM_INFINITY }.
>>
>> This would look a lot like the existing patch; it'd just not copy the
>> init process rlimits.
>
> Can't we just do the final rlimit setting so late in execve that we
> don't need that whole "saved_rlimit" thing?

The stack rlimit defines the mmap layout too:

do_execveat_common() ->
exec_binprm() ->
search_binary_handler() ->
fmt->load_binary (load_elf_binary()) ->
setup_new_exec() ->
arch_pick_mmap_layout() ->
mmap_is_legacy() ->
rlimit(RLIMIT_STACK) == RLIM_INFINITY

exec_binprm() happens after the other stack setup (copy_strings()), so
if we wanted to avoid saved_rlimit, we'd have to replumb how
arch_pick_mmap_layout() works and how copy_strings() performs its
calculations (neither looks too terrible).

> If the issue is the "people can use argv/envp to already fill the
> stack", then I'd actually be happier with just limiting that.
>
> We already claim that our ARG_MAX is just 128kB (old legacy). And I
> was really happy when we changed our execve() to not have that nasty
> array of pages, and we could expand on the array sizes. But we could
> *easily* just say "limit execve arrays to 8MB", because while our code
> can handle more, you do have latency issues and just memory use issues
> too.

That would address the argv/envp calculation but not the layout control.

> So right now we already limit the stack size artificially to 1/4 the
> stack rlimit (see get_arg_page()), and we could easily just further
> cap it at 8M total - right now people obviously actually run in
> practice with much less (ie for me that argument size is capped at a
> quarter of that 8MB default rlimit).
>
> I have heard of people who want a big stack due to crazy recursion or
> due to just doing otherwise insane things. But needing more than 8MB
> of arg/envp? Not happening.
>
> So I think we could easily do that stack rlimit thing at the very last
> minute, and not have to worry about restoring anything.

We should double check there isn't more than just argv and layout, but
I think both cases could be passed down a value from above instead of
examining current's rlimits.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-06 19:12       ` Kees Cook
@ 2017-07-07  4:48         ` Andy Lutomirski
  2017-07-07  5:03           ` Linus Torvalds
  2017-07-07  5:10           ` Kees Cook
  2017-07-10  8:44         ` Michal Hocko
  1 sibling, 2 replies; 36+ messages in thread
From: Andy Lutomirski @ 2017-07-07  4:48 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Andy Lutomirski, Michal Hocko, Ben Hutchings,
	Willy Tarreau, Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld,
	Rik van Riel, Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 12:12 PM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Jul 6, 2017 at 10:52 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Thu, Jul 6, 2017 at 10:29 AM, Kees Cook <keescook@chromium.org> wrote:
>>>>
>>>>  (a) minimal: just use our existing default stack (and stack _only_)
>>>> limit value for suid binaries that actually get extra permissions: {
>>>> _STK_LIM, RLIM_INFINITY }.
>>>
>>> This would look a lot like the existing patch; it'd just not copy the
>>> init process rlimits.
>>
>> Can't we just do the final rlimit setting so late in execve that we
>> don't need that whole "saved_rlimit" thing?
>
> The stack rlimit defines the mmap layout too:
>
> do_execveat_common() ->
> exec_binprm() ->
> search_binary_handler() ->
> fmt->load_binary (load_elf_binary()) ->
> setup_new_exec() ->
> arch_pick_mmap_layout() ->
> mmap_is_legacy() ->
> rlimit(RLIMIT_STACK) == RLIM_INFINITY
>
> exec_binprm() happens after the other stack setup (copy_strings()), so
> if we wanted to avoid saved_rlimit, we'd have to replumb how
> arch_pick_mmap_layout() works and how copy_strings() performs its
> calculations (neither looks too terrible).

How about a much simpler solution: don't read rlimit at all in
copy_strings(), let alone try to enforce it.  Instead, just before the
point of no return, check how much stack space is already used and, if
it's more than an appropriate threshold (e.g. 1/4 of the rlimit),
abort.  Sure, this adds overhead if we're going to abort, but does
that really matter?

I don't see why using rlimit for layout control makes any sense
whatsoever.  Is there some historical reason we need that?  As far as
I can see (on insufficient inspection) is that the kernel is trying to
guarantee that, if we have so much arg crap that our remaining stack
is less than 128k, then we don't exceed our limit by a little bit.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-07  4:48         ` Andy Lutomirski
@ 2017-07-07  5:03           ` Linus Torvalds
  2017-07-07  5:10           ` Kees Cook
  1 sibling, 0 replies; 36+ messages in thread
From: Linus Torvalds @ 2017-07-07  5:03 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Michal Hocko, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 9:48 PM, Andy Lutomirski <luto@kernel.org> wrote:
>
> How about a much simpler solution: don't read rlimit at all in
> copy_strings(), let alone try to enforce it.

People have historically relied on E2BIG and then splitting things
into multiple chunks (ie do the whole 'xargs' thing).

But I agree that *if* we abort cleanly with E2BIG, then it's fine if
we copy a bit too much first. It's just that we need to check early
enough that an E2BIG error is possible (ie not after the "point of no
return").

And yes, I think we can get rid of the layout differences based on
rlimit. After all, if I read that right the thing currently clamps
that "gap" thing to 128MB anyway (and that's on the *low* side).

             Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-07  4:48         ` Andy Lutomirski
  2017-07-07  5:03           ` Linus Torvalds
@ 2017-07-07  5:10           ` Kees Cook
  2017-07-07  5:15             ` Kees Cook
  1 sibling, 1 reply; 36+ messages in thread
From: Kees Cook @ 2017-07-07  5:10 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Michal Hocko, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 9:48 PM, Andy Lutomirski <luto@kernel.org> wrote:
> How about a much simpler solution: don't read rlimit at all in
> copy_strings(), let alone try to enforce it.  Instead, just before the
> point of no return, check how much stack space is already used and, if
> it's more than an appropriate threshold (e.g. 1/4 of the rlimit),
> abort.  Sure, this adds overhead if we're going to abort, but does
> that really matter?

We should avoid using up tons of memory and then failing. Better to
cap it as we use it. Plumbing a sane value into this shouldn't be hard
at all. Just making this a hardcoded 2MB seems sane (1/4 of 8MB).

> I don't see why using rlimit for layout control makes any sense
> whatsoever.  Is there some historical reason we need that?  As far as
> I can see (on insufficient inspection) is that the kernel is trying to
> guarantee that, if we have so much arg crap that our remaining stack
> is less than 128k, then we don't exceed our limit by a little bit.

IIUC, this is a big deal on 32-bit. Unlimited stack triggers top-down
mmap instead of bottom-up. I mean, I'd be delighted to get rid of
this, but I thought it was relied on by userspace.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-07  5:10           ` Kees Cook
@ 2017-07-07  5:15             ` Kees Cook
  2017-07-07  5:36               ` Andy Lutomirski
  2017-07-07  5:39               ` Linus Torvalds
  0 siblings, 2 replies; 36+ messages in thread
From: Kees Cook @ 2017-07-07  5:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Michal Hocko, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 10:10 PM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Jul 6, 2017 at 9:48 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> How about a much simpler solution: don't read rlimit at all in
>> copy_strings(), let alone try to enforce it.  Instead, just before the
>> point of no return, check how much stack space is already used and, if
>> it's more than an appropriate threshold (e.g. 1/4 of the rlimit),
>> abort.  Sure, this adds overhead if we're going to abort, but does
>> that really matter?
>
> We should avoid using up tons of memory and then failing. Better to
> cap it as we use it. Plumbing a sane value into this shouldn't be hard
> at all. Just making this a hardcoded 2MB seems sane (1/4 of 8MB).
>
>> I don't see why using rlimit for layout control makes any sense
>> whatsoever.  Is there some historical reason we need that?  As far as
>> I can see (on insufficient inspection) is that the kernel is trying to
>> guarantee that, if we have so much arg crap that our remaining stack
>> is less than 128k, then we don't exceed our limit by a little bit.
>
> IIUC, this is a big deal on 32-bit. Unlimited stack triggers top-down
> mmap instead of bottom-up. I mean, I'd be delighted to get rid of
> this, but I thought it was relied on by userspace.

I always say this backwards. :P Default is top-down (allocate at high
addresses and work down toward low). With unlimited stack, allocations
start at low addresses and work up. Here's the results (shown with
randomize_va_space sysctl set to 0):

$ ulimit -s
8192
$ cat /proc/self/maps
08048000-08050000 r-xp 00000000 fc:01 843        /bin/cat
08050000-08051000 r--p 00007000 fc:01 843        /bin/cat
08051000-08052000 rw-p 00008000 fc:01 843        /bin/cat
08052000-08073000 rw-p 00000000 00:00 0          [heap]
b7be7000-b7de7000 r--p 00000000 fc:01 403307     /usr/lib/locale/locale-archive
b7de7000-b7f9a000 r-xp 00000000 fc:01 276980
/lib/i386-linux-gnu/libc-2.24.so
b7f9a000-b7f9b000 ---p 001b3000 fc:01 276980
/lib/i386-linux-gnu/libc-2.24.so
b7f9b000-b7f9d000 r--p 001b3000 fc:01 276980
/lib/i386-linux-gnu/libc-2.24.so
b7f9d000-b7f9e000 rw-p 001b5000 fc:01 276980
/lib/i386-linux-gnu/libc-2.24.so
b7f9e000-b7fa1000 rw-p 00000000 00:00 0
b7fb2000-b7fd7000 rw-p 00000000 00:00 0
b7fd7000-b7fd9000 r--p 00000000 00:00 0          [vvar]
b7fd9000-b7fdb000 r-xp 00000000 00:00 0          [vdso]
b7fdb000-b7ffd000 r-xp 00000000 fc:01 276959     /lib/i386-linux-gnu/ld-2.24.so
b7ffd000-b7ffe000 r--p 002d7000 fc:01 403307     /usr/lib/locale/locale-archive
b7ffe000-b7fff000 r--p 00022000 fc:01 276959     /lib/i386-linux-gnu/ld-2.24.so
b7fff000-b8000000 rw-p 00023000 fc:01 276959     /lib/i386-linux-gnu/ld-2.24.so
bffdf000-c0000000 rw-p 00000000 00:00 0          [stack]
$ ulimit -s unlimited
$ cat /proc/self/maps
08048000-08050000 r-xp 00000000 fc:01 843        /bin/cat
08050000-08051000 r--p 00007000 fc:01 843        /bin/cat
08051000-08052000 rw-p 00008000 fc:01 843        /bin/cat
08052000-08073000 rw-p 00000000 00:00 0          [heap]
40000000-40022000 r-xp 00000000 fc:01 276959     /lib/i386-linux-gnu/ld-2.24.so
40022000-40023000 r--p 002d7000 fc:01 403307     /usr/lib/locale/locale-archive
40023000-40024000 r--p 00022000 fc:01 276959     /lib/i386-linux-gnu/ld-2.24.so
40024000-40025000 rw-p 00023000 fc:01 276959     /lib/i386-linux-gnu/ld-2.24.so
40025000-40027000 r--p 00000000 00:00 0          [vvar]
40027000-40029000 r-xp 00000000 00:00 0          [vdso]
40029000-4004e000 rw-p 00000000 00:00 0
4005f000-40212000 r-xp 00000000 fc:01 276980
/lib/i386-linux-gnu/libc-2.24.so
40212000-40213000 ---p 001b3000 fc:01 276980
/lib/i386-linux-gnu/libc-2.24.so
40213000-40215000 r--p 001b3000 fc:01 276980
/lib/i386-linux-gnu/libc-2.24.so
40215000-40216000 rw-p 001b5000 fc:01 276980
/lib/i386-linux-gnu/libc-2.24.so
40216000-40219000 rw-p 00000000 00:00 0
40219000-40419000 r--p 00000000 fc:01 403307     /usr/lib/locale/locale-archive
bffdf000-c0000000 rw-p 00000000 00:00 0          [stack]


-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-07  5:15             ` Kees Cook
@ 2017-07-07  5:36               ` Andy Lutomirski
  2017-07-07  5:45                 ` Kees Cook
  2017-07-07  5:39               ` Linus Torvalds
  1 sibling, 1 reply; 36+ messages in thread
From: Andy Lutomirski @ 2017-07-07  5:36 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Linus Torvalds, Michal Hocko, Ben Hutchings,
	Willy Tarreau, Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld,
	Rik van Riel, Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 10:15 PM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Jul 6, 2017 at 10:10 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Thu, Jul 6, 2017 at 9:48 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>> How about a much simpler solution: don't read rlimit at all in
>>> copy_strings(), let alone try to enforce it.  Instead, just before the
>>> point of no return, check how much stack space is already used and, if
>>> it's more than an appropriate threshold (e.g. 1/4 of the rlimit),
>>> abort.  Sure, this adds overhead if we're going to abort, but does
>>> that really matter?
>>
>> We should avoid using up tons of memory and then failing. Better to
>> cap it as we use it. Plumbing a sane value into this shouldn't be hard
>> at all. Just making this a hardcoded 2MB seems sane (1/4 of 8MB).

Aren't there real use cases that use many megs of arguments?

We could probably get away with saying max(rlimit(RLIMIT_STACK), 2MB)
as long as we make sure later on that we don't screw up if we've
overallocated?

>>
>>> I don't see why using rlimit for layout control makes any sense
>>> whatsoever.  Is there some historical reason we need that?  As far as
>>> I can see (on insufficient inspection) is that the kernel is trying to
>>> guarantee that, if we have so much arg crap that our remaining stack
>>> is less than 128k, then we don't exceed our limit by a little bit.
>>
>> IIUC, this is a big deal on 32-bit. Unlimited stack triggers top-down
>> mmap instead of bottom-up. I mean, I'd be delighted to get rid of
>> this, but I thought it was relied on by userspace.
>
> I always say this backwards. :P Default is top-down (allocate at high
> addresses and work down toward low). With unlimited stack, allocations
> start at low addresses and work up. Here's the results (shown with
> randomize_va_space sysctl set to 0):

Uhh, crikey!  Where's the code that does that?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-07  5:15             ` Kees Cook
  2017-07-07  5:36               ` Andy Lutomirski
@ 2017-07-07  5:39               ` Linus Torvalds
  2017-07-07  5:49                 ` Kees Cook
  1 sibling, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2017-07-07  5:39 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Michal Hocko, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 10:15 PM, Kees Cook <keescook@chromium.org> wrote:
>
> I always say this backwards. :P Default is top-down (allocate at high
> addresses and work down toward low). With unlimited stack, allocations
> start at low addresses and work up. Here's the results (shown with
> randomize_va_space sysctl set to 0):

But this doesn't affect the stack layout itself.

So we could do the stack copying without much caring, because that
happens first, right?

So I think we can do all the envp/argv copying first, and then - as we
change the credentials, change the rlimit. And the string copies
wouldn't need to care much - although I guess they are also fine
checking against a possible *smaller* stack rlimit, which is actually
what we'd want.

And I think the credentials switch (which is the point of no return
anyway) happens before we start mmap'ing the executable etc. We used
to have some odd code there and do it in the completely wrong order
(checking that the binary was executable for the *old* user, which
makes no sense, iirc)

So I'm getting the sense that none of this should be a problem.

But it's entirely possible that I missed something, and am just full
of shit. Our execve() path has traditionally been very hard to read.
It's actually gotten a bit better, but the whole "jump back and forth
between the generic fs/exec.c code and the binfmt code" is certainly
still there.

             Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-07  5:36               ` Andy Lutomirski
@ 2017-07-07  5:45                 ` Kees Cook
  2017-07-07  6:02                   ` Linus Torvalds
  2017-07-07 14:48                   ` Andy Lutomirski
  0 siblings, 2 replies; 36+ messages in thread
From: Kees Cook @ 2017-07-07  5:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Michal Hocko, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 10:36 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Jul 6, 2017 at 10:15 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Thu, Jul 6, 2017 at 10:10 PM, Kees Cook <keescook@chromium.org> wrote:
>>> On Thu, Jul 6, 2017 at 9:48 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>>> How about a much simpler solution: don't read rlimit at all in
>>>> copy_strings(), let alone try to enforce it.  Instead, just before the
>>>> point of no return, check how much stack space is already used and, if
>>>> it's more than an appropriate threshold (e.g. 1/4 of the rlimit),
>>>> abort.  Sure, this adds overhead if we're going to abort, but does
>>>> that really matter?
>>>
>>> We should avoid using up tons of memory and then failing. Better to
>>> cap it as we use it. Plumbing a sane value into this shouldn't be hard
>>> at all. Just making this a hardcoded 2MB seems sane (1/4 of 8MB).
>
> Aren't there real use cases that use many megs of arguments?

They'd be relatively new since the args were pretty limited before.
I'd be curious to see them.

> We could probably get away with saying max(rlimit(RLIMIT_STACK), 2MB)
> as long as we make sure later on that we don't screw up if we've
> overallocated?

min, not max, but yeah. Here's part of what I have for get_arg_page():

                rlim = current->signal->rlim;
-               if (size > READ_ONCE(rlim[RLIMIT_STACK].rlim_cur) / 4)
+               arg_stack = READ_ONCE(rlim[RLIMIT_STACK].rlim_cur);
+               arg_stack = min_t(unsigned long, arg_stack, _STK_LIM) / 4;
+               if (size > arg_stack)
                        goto fail;

>>> IIUC, this is a big deal on 32-bit. Unlimited stack triggers top-down
>>> mmap instead of bottom-up. I mean, I'd be delighted to get rid of
>>> this, but I thought it was relied on by userspace.
>>
>> I always say this backwards. :P Default is top-down (allocate at high
>> addresses and work down toward low). With unlimited stack, allocations
>> start at low addresses and work up. Here's the results (shown with
>> randomize_va_space sysctl set to 0):
>
> Uhh, crikey!  Where's the code that does that?

That was the call path I quoted earlier:

> The stack rlimit defines the mmap layout too:
>
> do_execveat_common() ->
> exec_binprm() ->
> search_binary_handler() ->
> fmt->load_binary (load_elf_binary()) ->
> setup_new_exec() ->
> arch_pick_mmap_layout() ->
> mmap_is_legacy() ->
> rlimit(RLIMIT_STACK) == RLIM_INFINITY

i.e. arch_pick_mmap_layout().

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-07  5:39               ` Linus Torvalds
@ 2017-07-07  5:49                 ` Kees Cook
  2017-07-07  6:40                   ` Kees Cook
  0 siblings, 1 reply; 36+ messages in thread
From: Kees Cook @ 2017-07-07  5:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Michal Hocko, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 10:39 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jul 6, 2017 at 10:15 PM, Kees Cook <keescook@chromium.org> wrote:
>>
>> I always say this backwards. :P Default is top-down (allocate at high
>> addresses and work down toward low). With unlimited stack, allocations
>> start at low addresses and work up. Here's the results (shown with
>> randomize_va_space sysctl set to 0):
>
> But this doesn't affect the stack layout itself.
>
> So we could do the stack copying without much caring, because that
> happens first, right?
>
> So I think we can do all the envp/argv copying first, and then - as we
> change the credentials, change the rlimit. And the string copies
> wouldn't need to care much - although I guess they are also fine
> checking against a possible *smaller* stack rlimit, which is actually
> what we'd want.

Yup, agreed.

> And I think the credentials switch (which is the point of no return
> anyway) happens before we start mmap'ing the executable etc. We used
> to have some odd code there and do it in the completely wrong order
> (checking that the binary was executable for the *old* user, which
> makes no sense, iirc)

Yeah, it all happens in setup_new_exec(). The first thing is layout
selection, then switching credentials. It could be made to take a hint
from GNU_STACK (which was parsed before setup_new_exec() is called),
check security_bprm_secureexec() and then make the rlimit changes, all
before the layout selection.

> So I'm getting the sense that none of this should be a problem.
>
> But it's entirely possible that I missed something, and am just full
> of shit. Our execve() path has traditionally been very hard to read.
> It's actually gotten a bit better, but the whole "jump back and forth
> between the generic fs/exec.c code and the binfmt code" is certainly
> still there.

Yeah, there might be something else lurking here, but so far, I'm
satisfied this will work too.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-07  5:45                 ` Kees Cook
@ 2017-07-07  6:02                   ` Linus Torvalds
  2017-07-07  6:10                     ` Kees Cook
  2017-07-07 14:48                   ` Andy Lutomirski
  1 sibling, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2017-07-07  6:02 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Michal Hocko, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 10:45 PM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Jul 6, 2017 at 10:36 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>
>> Aren't there real use cases that use many megs of arguments?
>
> They'd be relatively new since the args were pretty limited before.
> I'd be curious to see them.

"megs" yes. "many megs" no.

The traditional kernel limit was 32 pages (so 128kB on x86, explaining
our MAX_ARG value).

We moved to the much nider "two active VM's at the same time" model a
fairly long time ago, though - it was back in v2.6.23 or so. So about
10 years ago.

I would have expected lots of scripts to have been written since that
just end up going *far* over the old 128kB limit, because it's really
easy to do.

Things like big directories and the shell expanding "*" can easily be
a megabyte of arguments. I know I used to have scripts where I had to
use "xargs" in the past, and with the > 128kB change I just stopped,
because "a couple of megabytes" is enough for a lot of things where
128kB wasn't necessarily.

Oh, one example is actually the kernel source tree. I don't do it any
more (because "git grep" is much better), but I used to do things like

    grep something $(find . -name '*.[ch]')

all the time.

And that actually currently *just* overflows the 2MB argument size,
but used to work (easily) ten years ago. Oh, how the kernel has
grown..

Yes, yes, *portably* you should always have done

  find . -print0 -name '*.[ch]' | xargs -0 grep

but be honest now: that first thing is what you actually write when
you do some throw-away one-liner.

So 2+MB is still definitely something people can do (and probably *do* do).

             Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-07  6:02                   ` Linus Torvalds
@ 2017-07-07  6:10                     ` Kees Cook
  2017-07-07 16:06                       ` Linus Torvalds
  0 siblings, 1 reply; 36+ messages in thread
From: Kees Cook @ 2017-07-07  6:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Michal Hocko, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 11:02 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> So 2+MB is still definitely something people can do (and probably *do* do).

With the default 8MB stack, most people are already limited to 2MB
here. I guess the question is, do people raise their stack rlimit to
gain more arguments? Should I pick a different value for the args?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-07  5:49                 ` Kees Cook
@ 2017-07-07  6:40                   ` Kees Cook
  2017-07-07 16:22                     ` Linus Torvalds
  0 siblings, 1 reply; 36+ messages in thread
From: Kees Cook @ 2017-07-07  6:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Michal Hocko, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 10:49 PM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Jul 6, 2017 at 10:39 PM, Linus Torvalds
>> And I think the credentials switch (which is the point of no return
>> anyway) happens before we start mmap'ing the executable etc. We used
>> to have some odd code there and do it in the completely wrong order
>> (checking that the binary was executable for the *old* user, which
>> makes no sense, iirc)
>
> Yeah, it all happens in setup_new_exec(). The first thing is layout
> selection, then switching credentials. It could be made to take a hint
> from GNU_STACK (which was parsed before setup_new_exec() is called),
> check security_bprm_secureexec() and then make the rlimit changes, all
> before the layout selection.

At Andy's suggestion I'm using security_bprm_secureexec() to test for
setuid-ness. However, this seems to expect the credentials to have
already been installed. And yet ... the following patch still works
correctly when I call it "early". I'm going to look again in the
morning.

diff --git a/fs/exec.c b/fs/exec.c
index b60804216b59..a4d2433a44ec 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1334,9 +1334,20 @@ EXPORT_SYMBOL(would_dump);

 void setup_new_exec(struct linux_binprm * bprm)
 {
+       /* This is the point of no return */
+
+       /*
+        * If this is a setuid execution, reset the stack limit to
+        * a sane default to avoid bad behavior from the prior rlimits.
+        */
+       if (security_bprm_secureexec(bprm)) {
+               struct rlimit default_stack = { _STK_LIM, RLIM_INFINITY };
+
+               current->signal->rlim[RLIMIT_STACK] = default_stack;
+       }
+
        arch_pick_mmap_layout(current->mm);

-       /* This is the point of no return */
        current->sas_ss_sp = current->sas_ss_size = 0;

        if (uid_eq(current_euid(), current_uid()) &&
gid_eq(current_egid(), current_gid()))



-- 
Kees Cook
Pixel Security

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-07  5:45                 ` Kees Cook
  2017-07-07  6:02                   ` Linus Torvalds
@ 2017-07-07 14:48                   ` Andy Lutomirski
  1 sibling, 0 replies; 36+ messages in thread
From: Andy Lutomirski @ 2017-07-07 14:48 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Linus Torvalds, Michal Hocko, Ben Hutchings,
	Willy Tarreau, Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld,
	Rik van Riel, Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 10:45 PM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Jul 6, 2017 at 10:36 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> Aren't there real use cases that use many megs of arguments?
>
> They'd be relatively new since the args were pretty limited before.
> I'd be curious to see them.
>
>> We could probably get away with saying max(rlimit(RLIMIT_STACK), 2MB)
>> as long as we make sure later on that we don't screw up if we've
>> overallocated?
>
> min, not max, but yeah. Here's part of what I have for get_arg_page():
>
>                 rlim = current->signal->rlim;
> -               if (size > READ_ONCE(rlim[RLIMIT_STACK].rlim_cur) / 4)
> +               arg_stack = READ_ONCE(rlim[RLIMIT_STACK].rlim_cur);
> +               arg_stack = min_t(unsigned long, arg_stack, _STK_LIM) / 4;
> +               if (size > arg_stack)
>                         goto fail;

I really did mean max, the idea being that, if we're going to increase
rlim_cur, it's a bit odd to fail the exec if it would have worked
under the higher value.  That being said, I see no real exploit vector
here if just rlimit(RLIMIT_STACK) is used.

(Can you just use rlimit()?  The open-coding seems entirely useless.)

I thought of another approach, though: change the rlimit macros so
that a secureexec program always gets at least 8MB stack.  Might be
less regression-prone.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-07  6:10                     ` Kees Cook
@ 2017-07-07 16:06                       ` Linus Torvalds
  2017-07-07 18:28                         ` Kees Cook
  0 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2017-07-07 16:06 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Michal Hocko, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 11:10 PM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Jul 6, 2017 at 11:02 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> So 2+MB is still definitely something people can do (and probably *do* do).
>
> With the default 8MB stack, most people are already limited to 2MB
> here. I guess the question is, do people raise their stack rlimit to
> gain more arguments? Should I pick a different value for the args?

So I would not be at all surprised if people just made the stack limit
higher when they hit the E2BIG issue in some script.

So yes, I'd make the max args cutoff be higher than 2MB.

I'd suggest we make the code do:

 (a) keep the existing rlimit/4 check (so *most* people will see the
exact same behavior)

 (b) add a static max arg check for something that is closer to 8MB
but leaves a somewhat reasonable stack size even if the stack size get
reset to 8MB

I'd suggest that (b) case just be 6MB or something. Maybe make it
(_STK_LIM/4*3) or whatever, in case we ever end up changing that
value.

Hmm?

                 Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-07  6:40                   ` Kees Cook
@ 2017-07-07 16:22                     ` Linus Torvalds
  2017-07-07 18:27                       ` Kees Cook
  0 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2017-07-07 16:22 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Michal Hocko, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu, Jul 6, 2017 at 11:40 PM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Jul 6, 2017 at 10:49 PM, Kees Cook <keescook@chromium.org> wrote:
>
> At Andy's suggestion I'm using security_bprm_secureexec() to test for
> setuid-ness. However, this seems to expect the credentials to have
> already been installed.

That function actually takes a look at current creds, so yes, it
currently works only after the creds have been installed.

It works for you because it *also* checks if the current cred is not
root, and then it looks at bprm->cap_effective too (indirectly - check
the commoncap code), which has been set earlier.

But it's crazy code. I literally don't know why it looks at
current_creds(), when it's all about the bprm, which has its own creds
- and then it would work any time.

That code is confusing. Somebody should check it. That whole
security_bprm_secureexec() hook seems mis-designed.

> And yet ... the following patch still works correctly when I call it "early".

So I definitely like this approach, as long as we clarify that crazy
security_bprm_secureexec() model. That code really is insane.

But I wonder:

> +       if (security_bprm_secureexec(bprm)) {
> +               struct rlimit default_stack = { _STK_LIM, RLIM_INFINITY };
> +
> +               current->signal->rlim[RLIMIT_STACK] = default_stack;

Should we override the whole thing? And in particular, should we
override the hard limit at all?

If we have an existing lower stack limit, we might as well leave it be
entirely. And if the soft stack limit is higher, why modify the hard
limit?

In particular, the scenario I'm thinking of is "system admin on a
small machine has set everybodys stack limits to just 8M as a _hard_
limit".

Now, Bob and Jill agree to help each other to escape that limit, so
they make suid binaries for their own account, and let each other use
them - boom, they now have _STK_LIM and RLIM_INFINITY stack limits.

Not good.

In contrast, if the code just did:

    if (security_bprm_secureexec(bprm)) {
        if (current->signal->rlim[RLIMIT_STACK].rlim_cur > _STK_LIM)
            current->signal->rlim[RLIMIT_STACK].rlim_cur = _STK_LIM;
    }

and leave the hard limit alone entirely. At least that doesn't let
anybody escape the limits that the sysadmin has set.

Hmm? Yes, this allows people to try to attack suid binaries with a
really small stack. But that's a pre-existing attack - do we have
worries about it?

                        Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-07 16:22                     ` Linus Torvalds
@ 2017-07-07 18:27                       ` Kees Cook
  0 siblings, 0 replies; 36+ messages in thread
From: Kees Cook @ 2017-07-07 18:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Michal Hocko, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Fri, Jul 7, 2017 at 9:22 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jul 6, 2017 at 11:40 PM, Kees Cook <keescook@chromium.org> wrote:
> So I definitely like this approach, as long as we clarify that crazy
> security_bprm_secureexec() model. That code really is insane.

Yeah, it really seems like security_bprm_secureexec() should be
checking the bprm cred, though there are comments in the LSMs about
why this isn't done. I'll dig through the history there; it may be an
artifact of the old cred installation ordering...

It seems like an LSM would want to see "current creds" and "new creds"
as distinct things for comparing the results, but I'll check it out.

> In contrast, if the code just did:
>
>     if (security_bprm_secureexec(bprm)) {
>         if (current->signal->rlim[RLIMIT_STACK].rlim_cur > _STK_LIM)
>             current->signal->rlim[RLIMIT_STACK].rlim_cur = _STK_LIM;
>     }
>
> and leave the hard limit alone entirely. At least that doesn't let
> anybody escape the limits that the sysadmin has set.

Sounds good to me.

>
> Hmm? Yes, this allows people to try to attack suid binaries with a
> really small stack. But that's a pre-existing attack - do we have
> worries about it?

Agreed, I think that's out of scope for this.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-07 16:06                       ` Linus Torvalds
@ 2017-07-07 18:28                         ` Kees Cook
  0 siblings, 0 replies; 36+ messages in thread
From: Kees Cook @ 2017-07-07 18:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Michal Hocko, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Fri, Jul 7, 2017 at 9:06 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jul 6, 2017 at 11:10 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Thu, Jul 6, 2017 at 11:02 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>> So 2+MB is still definitely something people can do (and probably *do* do).
>>
>> With the default 8MB stack, most people are already limited to 2MB
>> here. I guess the question is, do people raise their stack rlimit to
>> gain more arguments? Should I pick a different value for the args?
>
> So I would not be at all surprised if people just made the stack limit
> higher when they hit the E2BIG issue in some script.
>
> So yes, I'd make the max args cutoff be higher than 2MB.
>
> I'd suggest we make the code do:
>
>  (a) keep the existing rlimit/4 check (so *most* people will see the
> exact same behavior)
>
>  (b) add a static max arg check for something that is closer to 8MB
> but leaves a somewhat reasonable stack size even if the stack size get
> reset to 8MB
>
> I'd suggest that (b) case just be 6MB or something. Maybe make it
> (_STK_LIM/4*3) or whatever, in case we ever end up changing that
> value.

Sounds good. I'll send a patch...

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-06 19:12       ` Kees Cook
  2017-07-07  4:48         ` Andy Lutomirski
@ 2017-07-10  8:44         ` Michal Hocko
  2017-07-10 16:12           ` Kees Cook
  1 sibling, 1 reply; 36+ messages in thread
From: Michal Hocko @ 2017-07-10  8:44 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Andy Lutomirski, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Thu 06-07-17 12:12:55, Kees Cook wrote:
> On Thu, Jul 6, 2017 at 10:52 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Thu, Jul 6, 2017 at 10:29 AM, Kees Cook <keescook@chromium.org> wrote:
> >>>
> >>>  (a) minimal: just use our existing default stack (and stack _only_)
> >>> limit value for suid binaries that actually get extra permissions: {
> >>> _STK_LIM, RLIM_INFINITY }.
> >>
> >> This would look a lot like the existing patch; it'd just not copy the
> >> init process rlimits.
> >
> > Can't we just do the final rlimit setting so late in execve that we
> > don't need that whole "saved_rlimit" thing?
> 
> The stack rlimit defines the mmap layout too:
> 
> do_execveat_common() ->
> exec_binprm() ->
> search_binary_handler() ->
> fmt->load_binary (load_elf_binary()) ->
> setup_new_exec() ->
> arch_pick_mmap_layout() ->
> mmap_is_legacy() ->
> rlimit(RLIMIT_STACK) == RLIM_INFINITY

FWIW this is gone in tip tree. See
lkml.kernel.org/r/20170614082218.12450-1-mhocko@kernel.org

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-10  8:44         ` Michal Hocko
@ 2017-07-10 16:12           ` Kees Cook
  2017-07-10 16:18             ` Linus Torvalds
  2017-07-10 16:27             ` Michal Hocko
  0 siblings, 2 replies; 36+ messages in thread
From: Kees Cook @ 2017-07-10 16:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linus Torvalds, Andy Lutomirski, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Mon, Jul 10, 2017 at 1:44 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Thu 06-07-17 12:12:55, Kees Cook wrote:
>> On Thu, Jul 6, 2017 at 10:52 AM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> > On Thu, Jul 6, 2017 at 10:29 AM, Kees Cook <keescook@chromium.org> wrote:
>> >>>
>> >>>  (a) minimal: just use our existing default stack (and stack _only_)
>> >>> limit value for suid binaries that actually get extra permissions: {
>> >>> _STK_LIM, RLIM_INFINITY }.
>> >>
>> >> This would look a lot like the existing patch; it'd just not copy the
>> >> init process rlimits.
>> >
>> > Can't we just do the final rlimit setting so late in execve that we
>> > don't need that whole "saved_rlimit" thing?
>>
>> The stack rlimit defines the mmap layout too:
>>
>> do_execveat_common() ->
>> exec_binprm() ->
>> search_binary_handler() ->
>> fmt->load_binary (load_elf_binary()) ->
>> setup_new_exec() ->
>> arch_pick_mmap_layout() ->
>> mmap_is_legacy() ->
>> rlimit(RLIMIT_STACK) == RLIM_INFINITY
>
> FWIW this is gone in tip tree. See
> lkml.kernel.org/r/20170614082218.12450-1-mhocko@kernel.org

Sounds good to me, but won't large-memory users in 32-bit get annoyed?

A setuid/setgid exec will also get ADDR_COMPAT_LAYOUT cleared in
bprm_fill_uid() (but not for fs-caps), so that'll keep things from
getting layout-controlled. Thanks!

(ADDR_COMPAT_LAYOUT isn't cleared for capability elevations, though, I think.)

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-10 16:12           ` Kees Cook
@ 2017-07-10 16:18             ` Linus Torvalds
  2017-07-10 16:52               ` Willy Tarreau
  2017-07-10 16:27             ` Michal Hocko
  1 sibling, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2017-07-10 16:18 UTC (permalink / raw)
  To: Kees Cook
  Cc: Michal Hocko, Andy Lutomirski, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Mon, Jul 10, 2017 at 9:12 AM, Kees Cook <keescook@chromium.org> wrote:
>
> Sounds good to me, but won't large-memory users in 32-bit get annoyed?

We'll see.

I suspect that all large-memory users have long since upgraded to
x86-64 (rule of thumb: if you are upgrading kernels today, you
probably upgraded hardware ten years ago), and that this may be a
non-issue today.

But only time will tell.

I certainly prefer "keep it simple" over theoretical concerns. It's
why I prefer that unconditional stack limit too - we may have to make
it conditional on suid'ness or something like the ELF PT_GNU_STACK
setting, but before over-designing things, let's see if anybody even
cares.

             Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-10 16:12           ` Kees Cook
  2017-07-10 16:18             ` Linus Torvalds
@ 2017-07-10 16:27             ` Michal Hocko
  2017-07-10 18:16               ` Michal Hocko
  1 sibling, 1 reply; 36+ messages in thread
From: Michal Hocko @ 2017-07-10 16:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Andy Lutomirski, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Mon 10-07-17 09:12:11, Kees Cook wrote:
> On Mon, Jul 10, 2017 at 1:44 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Thu 06-07-17 12:12:55, Kees Cook wrote:
> >> On Thu, Jul 6, 2017 at 10:52 AM, Linus Torvalds
> >> <torvalds@linux-foundation.org> wrote:
> >> > On Thu, Jul 6, 2017 at 10:29 AM, Kees Cook <keescook@chromium.org> wrote:
> >> >>>
> >> >>>  (a) minimal: just use our existing default stack (and stack _only_)
> >> >>> limit value for suid binaries that actually get extra permissions: {
> >> >>> _STK_LIM, RLIM_INFINITY }.
> >> >>
> >> >> This would look a lot like the existing patch; it'd just not copy the
> >> >> init process rlimits.
> >> >
> >> > Can't we just do the final rlimit setting so late in execve that we
> >> > don't need that whole "saved_rlimit" thing?
> >>
> >> The stack rlimit defines the mmap layout too:
> >>
> >> do_execveat_common() ->
> >> exec_binprm() ->
> >> search_binary_handler() ->
> >> fmt->load_binary (load_elf_binary()) ->
> >> setup_new_exec() ->
> >> arch_pick_mmap_layout() ->
> >> mmap_is_legacy() ->
> >> rlimit(RLIMIT_STACK) == RLIM_INFINITY
> >
> > FWIW this is gone in tip tree. See
> > lkml.kernel.org/r/20170614082218.12450-1-mhocko@kernel.org
> 
> Sounds good to me, but won't large-memory users in 32-bit get annoyed?

Why would they? 32b do bottom up layouts by default.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-10 16:18             ` Linus Torvalds
@ 2017-07-10 16:52               ` Willy Tarreau
  0 siblings, 0 replies; 36+ messages in thread
From: Willy Tarreau @ 2017-07-10 16:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Michal Hocko, Andy Lutomirski, Ben Hutchings,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Mon, Jul 10, 2017 at 09:18:09AM -0700, Linus Torvalds wrote:
> On Mon, Jul 10, 2017 at 9:12 AM, Kees Cook <keescook@chromium.org> wrote:
> >
> > Sounds good to me, but won't large-memory users in 32-bit get annoyed?
> 
> We'll see.
> 
> I suspect that all large-memory users have long since upgraded to
> x86-64 (rule of thumb: if you are upgrading kernels today, you
> probably upgraded hardware ten years ago), and that this may be a
> non-issue today.

I tend to agree. We've been using 32-bit machines with "a lot" (=2GB)
of RAM and haproxy using something like 1.3GB in the past, and it
started to become a bit complex due to ASLR puching large holes between
each and every shared object, forcing us to stop setting strict
overcommit limits for example. We've abandonned them after kernel 3.10,
when the new models had been migrated to 64 bits a few years ago already
and I think anyone doing anything serious with memory doesn't use 32-bit
at all.

Well I know of one exception :-)  My netbook has 3 GB and is 32-bit,
running on 4.9 :

willy@eeepc:~$ uname -a
Linux eeepc 4.9.36-eeepc #1 SMP Mon Jul 10 07:33:29 CEST 2017 i686 Intel(R) Atom(TM) CPU N2800   @ 1.86GHz GenuineIntel GNU/Linux
willy@eeepc:~$ free
             total       used       free     shared    buffers     cached
Mem:       3097840     649816    2448024          0      52476     507216
-/+ buffers/cache:      90124    3007716
Swap:      1025440          0    1025440

It only runs end-user stuff (firefox) so it cannot be considered anything
serious.

Willy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-10 16:27             ` Michal Hocko
@ 2017-07-10 18:16               ` Michal Hocko
  2017-07-10 18:29                 ` Rik van Riel
  0 siblings, 1 reply; 36+ messages in thread
From: Michal Hocko @ 2017-07-10 18:16 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Andy Lutomirski, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Rik van Riel,
	Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

On Mon 10-07-17 18:27:51, Michal Hocko wrote:
> On Mon 10-07-17 09:12:11, Kees Cook wrote:
[...]
> > >> do_execveat_common() ->
> > >> exec_binprm() ->
> > >> search_binary_handler() ->
> > >> fmt->load_binary (load_elf_binary()) ->
> > >> setup_new_exec() ->
> > >> arch_pick_mmap_layout() ->
> > >> mmap_is_legacy() ->
> > >> rlimit(RLIMIT_STACK) == RLIM_INFINITY
> > >
> > > FWIW this is gone in tip tree. See
> > > lkml.kernel.org/r/20170614082218.12450-1-mhocko@kernel.org
> > 
> > Sounds good to me, but won't large-memory users in 32-bit get annoyed?
> 
> Why would they? 32b do bottom up layouts by default.

OK, I misread the code. 32b applications on 64b systems do top down by
default and only if they override this by ADDR_COMPAT_LAYOUT
personality. For some reason I thought that 32b userspace goes a
different path and makes sure that they are always doing bottom up.

Anyway even if somebody really needs to grow stack really large we have
the personality to give them the legacy layout.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-10 18:16               ` Michal Hocko
@ 2017-07-10 18:29                 ` Rik van Riel
  0 siblings, 0 replies; 36+ messages in thread
From: Rik van Riel @ 2017-07-10 18:29 UTC (permalink / raw)
  To: Michal Hocko, Kees Cook
  Cc: Linus Torvalds, Andy Lutomirski, Ben Hutchings, Willy Tarreau,
	Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld, Larry Woodman,
	Kirill A. Shutemov, Tony Luck, James E.J. Bottomley,
	Helge Diller, James Hogan, Laura Abbott, Greg KH, security,
	Qualys Security Advisory, LKML, Ximin Luo

On Mon, 2017-07-10 at 20:16 +0200, Michal Hocko wrote:

> OK, I misread the code. 32b applications on 64b systems do top down
> by
> default and only if they override this by ADDR_COMPAT_LAYOUT
> personality. For some reason I thought that 32b userspace goes a
> different path and makes sure that they are always doing bottom up.
> 
> Anyway even if somebody really needs to grow stack really large we
> have
> the personality to give them the legacy layout.

I think what will happen when rlimit_stack is RLIMIT_INFINITY
is that mmap_base will end up placing mm->mmap_base at 512MB
(task_size / 6 * 5 below the top of address space) for 32 bit
kernels, and we eventually fall back to a bottom-up search
if the space below mmap_base is exhausted (if it ever is).

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC][PATCH] exec: Use init rlimits for setuid exec
  2017-07-06 16:34 ` Linus Torvalds
  2017-07-06 16:50   ` Linus Torvalds
  2017-07-06 17:29   ` Kees Cook
@ 2017-07-12 23:50   ` Alan Cox
  2 siblings, 0 replies; 36+ messages in thread
From: Alan Cox @ 2017-07-12 23:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Andy Lutomirski, Michal Hocko, Ben Hutchings,
	Willy Tarreau, Hugh Dickins, Oleg Nesterov, Jason A. Donenfeld,
	Rik van Riel, Larry Woodman, Kirill A. Shutemov, Tony Luck,
	James E.J. Bottomley, Helge Diller, James Hogan, Laura Abbott,
	Greg KH, security, Qualys Security Advisory, LKML, Ximin Luo

>  (a) minimal: just use our existing default stack (and stack _only_)
> limit value for suid binaries that actually get extra permissions: {
> _STK_LIM, RLIM_INFINITY }.

Even that is dangerous because a setuid binary can be transitioning
between two users (none privileged) yet be subject to an rlimit attack.
There's even less reason to believe that non root setuid binaries are
properly hardened than obvious targets. CPU limit attacks in particular
can be used to do some quite clever things.

Also consider a binary that is gaining some minor right (eg network
rights) being targetted because giving it extra permissions allows the
attacker to gain access to infinite resources when that clearly isn't the
intent.

>  (c) perhaps encourage people to annotate their suid binaries with
> initial resource requirements (and for stack, I mean the existing
> GNU_STACK ELF annotation in particular).

Making this for setuid binaries only makes no sense. If a user can
annotate required resources and the execve() fails if those resources are
over the rlimit then that is a useful feature full stop, and there's no
reason to even make it setuid dependent.

Alan

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2017-07-12 23:52 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-06  4:32 [RFC][PATCH] exec: Use init rlimits for setuid exec Kees Cook
2017-07-06  4:59 ` Andy Lutomirski
2017-07-06 12:45   ` Eric W. Biederman
2017-07-06 15:27     ` Andy Lutomirski
2017-07-06  5:47 ` Willy Tarreau
2017-07-06 12:38 ` Eric W. Biederman
2017-07-06 15:30   ` Andy Lutomirski
2017-07-06 16:34 ` Linus Torvalds
2017-07-06 16:50   ` Linus Torvalds
2017-07-06 17:29   ` Kees Cook
2017-07-06 17:52     ` Linus Torvalds
2017-07-06 19:12       ` Kees Cook
2017-07-07  4:48         ` Andy Lutomirski
2017-07-07  5:03           ` Linus Torvalds
2017-07-07  5:10           ` Kees Cook
2017-07-07  5:15             ` Kees Cook
2017-07-07  5:36               ` Andy Lutomirski
2017-07-07  5:45                 ` Kees Cook
2017-07-07  6:02                   ` Linus Torvalds
2017-07-07  6:10                     ` Kees Cook
2017-07-07 16:06                       ` Linus Torvalds
2017-07-07 18:28                         ` Kees Cook
2017-07-07 14:48                   ` Andy Lutomirski
2017-07-07  5:39               ` Linus Torvalds
2017-07-07  5:49                 ` Kees Cook
2017-07-07  6:40                   ` Kees Cook
2017-07-07 16:22                     ` Linus Torvalds
2017-07-07 18:27                       ` Kees Cook
2017-07-10  8:44         ` Michal Hocko
2017-07-10 16:12           ` Kees Cook
2017-07-10 16:18             ` Linus Torvalds
2017-07-10 16:52               ` Willy Tarreau
2017-07-10 16:27             ` Michal Hocko
2017-07-10 18:16               ` Michal Hocko
2017-07-10 18:29                 ` Rik van Riel
2017-07-12 23:50   ` Alan Cox

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.