All of lore.kernel.org
 help / color / mirror / Atom feed
* get_arg_page() && ptr_size accounting
@ 2018-09-10 12:29 Oleg Nesterov
  2018-09-10 16:41 ` Kees Cook
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2018-09-10 12:29 UTC (permalink / raw)
  To: Kees Cook; +Cc: Rik van Riel, Michal Hocko, Stanislav Kozina, linux-kernel

Hi Kees,

I was thinking about backporting the commit 98da7d08850fb8bde
("fs/exec.c: account for argv/envp pointers"), but I am not sure
I understand it...

So get_arg_page() does

		/*
		 * Since the stack will hold pointers to the strings, we
		 * must account for them as well.
		 *
		 * The size calculation is the entire vma while each arg page is
		 * built, so each time we get here it's calculating how far it
		 * is currently (rather than each call being just the newly
		 * added size from the arg page).  As a result, we need to
		 * always add the entire size of the pointers, so that on the
		 * last call to get_arg_page() we'll actually have the entire
		 * correct size.
		 */
		ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
		if (ptr_size > ULONG_MAX - size)
			goto fail;
		size += ptr_size;

OK, but
		acct_arg_size(bprm, size / PAGE_SIZE);

after that doesn't look exactly right. This additional space will be used later
when the process already uses bprm->mm, right? so it shouldn't be accounted by
acct_arg_size().

Not to mention that ptr_size/PAGE_SIZE doesn't look right in any case...

In short. Am I totally confused or the patch below makes sense? This way we do
not need the fat comment.

Oleg.

--- x/fs/exec.c
+++ x/fs/exec.c
@@ -222,25 +222,17 @@ static struct page *get_arg_page(struct
 		unsigned long size = bprm->vma->vm_end - bprm->vma->vm_start;
 		unsigned long ptr_size, limit;
 
+		acct_arg_size(bprm, size / PAGE_SIZE);
+
 		/*
 		 * Since the stack will hold pointers to the strings, we
 		 * must account for them as well.
-		 *
-		 * The size calculation is the entire vma while each arg page is
-		 * built, so each time we get here it's calculating how far it
-		 * is currently (rather than each call being just the newly
-		 * added size from the arg page).  As a result, we need to
-		 * always add the entire size of the pointers, so that on the
-		 * last call to get_arg_page() we'll actually have the entire
-		 * correct size.
 		 */
 		ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
 		if (ptr_size > ULONG_MAX - size)
 			goto fail;
 		size += ptr_size;
 
-		acct_arg_size(bprm, size / PAGE_SIZE);
-
 		/*
 		 * We've historically supported up to 32 pages (ARG_MAX)
 		 * of argument strings even with small stacks


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: get_arg_page() && ptr_size accounting
  2018-09-10 12:29 get_arg_page() && ptr_size accounting Oleg Nesterov
@ 2018-09-10 16:41 ` Kees Cook
  2018-09-10 16:45   ` Kees Cook
  2018-09-10 17:18   ` Oleg Nesterov
  0 siblings, 2 replies; 16+ messages in thread
From: Kees Cook @ 2018-09-10 16:41 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Rik van Riel, Michal Hocko, Stanislav Kozina, LKML

On Mon, Sep 10, 2018 at 5:29 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> Hi Kees,
>
> I was thinking about backporting the commit 98da7d08850fb8bde
> ("fs/exec.c: account for argv/envp pointers"), but I am not sure
> I understand it...
>
> So get_arg_page() does
>
>                 /*
>                  * Since the stack will hold pointers to the strings, we
>                  * must account for them as well.
>                  *
>                  * The size calculation is the entire vma while each arg page is
>                  * built, so each time we get here it's calculating how far it
>                  * is currently (rather than each call being just the newly
>                  * added size from the arg page).  As a result, we need to
>                  * always add the entire size of the pointers, so that on the
>                  * last call to get_arg_page() we'll actually have the entire
>                  * correct size.
>                  */
>                 ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
>                 if (ptr_size > ULONG_MAX - size)
>                         goto fail;
>                 size += ptr_size;
>
> OK, but
>                 acct_arg_size(bprm, size / PAGE_SIZE);
>
> after that doesn't look exactly right. This additional space will be used later
> when the process already uses bprm->mm, right? so it shouldn't be accounted by
> acct_arg_size().

My understanding (based on the comment about acct_arg_size()) is that
before exec_mmap() happens, the memory used to build the new arguments
copy memory area gets accounted to the MM_ANONPAGES resource limit of
the execing process. I couldn't find any place where the argc/envc
pointers were being included in the count, so I believe this needs to
stay as-is otherwise it's a resource limit bypass.

> Not to mention that ptr_size/PAGE_SIZE doesn't look right in any case...

Hm? acct_arg_size() takes pages, not bytes. I think this is correct?
What doesn't look right to you?

> In short. Am I totally confused or the patch below makes sense? This way we do
> not need the fat comment.

Even if I'm wrong about acct_arg_size(), we need to keep the comment
because it still applies to "size" and the following ARG_MAX test and
the _STK_LIM test. I think it's very non-obvious that we need to
always keep the full argc/envc count each time we go through these
calculations.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: get_arg_page() && ptr_size accounting
  2018-09-10 16:41 ` Kees Cook
@ 2018-09-10 16:45   ` Kees Cook
  2018-09-10 17:21     ` Oleg Nesterov
  2018-09-10 17:18   ` Oleg Nesterov
  1 sibling, 1 reply; 16+ messages in thread
From: Kees Cook @ 2018-09-10 16:45 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Rik van Riel, Michal Hocko, Stanislav Kozina, LKML

On Mon, Sep 10, 2018 at 9:41 AM, Kees Cook <keescook@chromium.org> wrote:
> On Mon, Sep 10, 2018 at 5:29 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> Hi Kees,
>>
>> I was thinking about backporting the commit 98da7d08850fb8bde
>> ("fs/exec.c: account for argv/envp pointers"), but I am not sure
>> I understand it...

BTW, if you backport that, please get the rest associated with the
various Stack Clash related weaknesses:

c31dbb146dd4 exec: pin stack limit during exec
b83838313386 exec: introduce finalize_exec() before start_thread()
8f2af155b513 exec: pass stack rlimit into mm layout functions
7bd698b3c04e exec: Set file unwritable before LSM check
e816c201aed5 exec: Weaken dumpability for secureexec
779f4e1c6c7c Revert "exec: avoid RLIMIT_STACK races with prlimit()"
04e35f4495dd exec: avoid RLIMIT_STACK races with prlimit()
fe8993b3a05c exec: Consolidate pdeath_signal clearing
64701dee4178 exec: Use sane stack rlimit under secureexec
473d89639db0 exec: Consolidate dumpability logic
a70423dfbc58 exec: Use secureexec for clearing pdeath_signal
e37fdb785a5f exec: Use secureexec for setting dumpability
2af622802696 LSM: drop bprm_secureexec hook
46d98eb4e1d2 commoncap: Refactor to remove bprm_secureexec hook
c425e189ffd7 binfmt: Introduce secureexec flag
a9208e42ba99 exec: Correct comments about "point of no return"
ddb4a1442def exec: Rename bprm->cred_prepared to called_set_creds
da029c11e6b1 exec: Limit arg stack to at most 75% of _STK_LIM
98da7d08850f fs/exec.c: account for argv/envp pointers

(and there may be more related to the secureexec changes)

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: get_arg_page() && ptr_size accounting
  2018-09-10 16:41 ` Kees Cook
  2018-09-10 16:45   ` Kees Cook
@ 2018-09-10 17:18   ` Oleg Nesterov
  2018-09-11  4:23     ` Kees Cook
  1 sibling, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2018-09-10 17:18 UTC (permalink / raw)
  To: Kees Cook; +Cc: Rik van Riel, Michal Hocko, Stanislav Kozina, LKML

On 09/10, Kees Cook wrote:
>
> > So get_arg_page() does
> >
> >                 /*
> >                  * Since the stack will hold pointers to the strings, we
> >                  * must account for them as well.
> >                  *
> >                  * The size calculation is the entire vma while each arg page is
> >                  * built, so each time we get here it's calculating how far it
> >                  * is currently (rather than each call being just the newly
> >                  * added size from the arg page).  As a result, we need to
> >                  * always add the entire size of the pointers, so that on the
> >                  * last call to get_arg_page() we'll actually have the entire
> >                  * correct size.
> >                  */
> >                 ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
> >                 if (ptr_size > ULONG_MAX - size)
> >                         goto fail;
> >                 size += ptr_size;
> >
> > OK, but
> >                 acct_arg_size(bprm, size / PAGE_SIZE);
> >
> > after that doesn't look exactly right. This additional space will be used later
> > when the process already uses bprm->mm, right? so it shouldn't be accounted by
> > acct_arg_size().
>
> My understanding (based on the comment about acct_arg_size()) is that
> before exec_mmap() happens, the memory used to build the new arguments
> copy memory area gets accounted to the MM_ANONPAGES resource limit of
> the execing process.

Yes, because otherwise oom-killer can't account the memory populated by
get_arg_page() in bprm->mm.

> I couldn't find any place where the argc/envc
> pointers were being included in the count,

But why??? To clarify,

	size += ptr_size;

after acct_arg_size() is clear and correct, we are going to check rlim_stack
and thus the size should include the pointers we will add in create_elf_tables().


But acct_arg_size() should only account the pages we allocate for bprm->mm,
nothing more. create_elf_tables() does not allocate the memory when it populates
arg_start/arg_end/env_start/env_end. Plus at this time the process has already
switched to bprm->mm.

> > Not to mention that ptr_size/PAGE_SIZE doesn't look right in any case...
>
> Hm? acct_arg_size() takes pages, not bytes. I think this is correct?
> What doesn't look right to you?

Please forget. I meant that _if_ we actually wanted to account this additional
memory in bprm->pages, than we would probably need something like
acct_arg_size(size/PAGE_SIZE + DIV_ROUND_UP(ptr_size, PAGE_SIZE)).

> > In short. Am I totally confused or the patch below makes sense? This way we do
> > not need the fat comment.
>
> Even if I'm wrong about acct_arg_size(), we need to keep the comment

I won't argue, but to me evrything looks obvious as long as we don't pass
ptr_size acct_arg_size().

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: get_arg_page() && ptr_size accounting
  2018-09-10 16:45   ` Kees Cook
@ 2018-09-10 17:21     ` Oleg Nesterov
  2018-09-10 17:43       ` Oleg Nesterov
  2018-09-11  4:27       ` Kees Cook
  0 siblings, 2 replies; 16+ messages in thread
From: Oleg Nesterov @ 2018-09-10 17:21 UTC (permalink / raw)
  To: Kees Cook; +Cc: Rik van Riel, Michal Hocko, Stanislav Kozina, LKML

On 09/10, Kees Cook wrote:
>
> On Mon, Sep 10, 2018 at 9:41 AM, Kees Cook <keescook@chromium.org> wrote:
> > On Mon, Sep 10, 2018 at 5:29 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >> Hi Kees,
> >>
> >> I was thinking about backporting the commit 98da7d08850fb8bde
> >> ("fs/exec.c: account for argv/envp pointers"), but I am not sure
> >> I understand it...
>
> BTW, if you backport that, please get the rest associated with the
> various Stack Clash related weaknesses:

may be...

> da029c11e6b1 exec: Limit arg stack to at most 75% of _STK_LIM

and I have to admit that I do not understand this patch at all, the
changelog explains nothing.

Could you explain what this patch actually prevents from? Especially
now that we have stack_guard_gap?

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: get_arg_page() && ptr_size accounting
  2018-09-10 17:21     ` Oleg Nesterov
@ 2018-09-10 17:43       ` Oleg Nesterov
  2018-09-11  4:30         ` Kees Cook
  2018-09-11  4:27       ` Kees Cook
  1 sibling, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2018-09-10 17:43 UTC (permalink / raw)
  To: Kees Cook; +Cc: Rik van Riel, Michal Hocko, Stanislav Kozina, LKML

On 09/10, Oleg Nesterov wrote:
>
> On 09/10, Kees Cook wrote:
> >
> > On Mon, Sep 10, 2018 at 9:41 AM, Kees Cook <keescook@chromium.org> wrote:
> > > On Mon, Sep 10, 2018 at 5:29 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > >> Hi Kees,
> > >>
> > >> I was thinking about backporting the commit 98da7d08850fb8bde
> > >> ("fs/exec.c: account for argv/envp pointers"), but I am not sure
> > >> I understand it...
> >
> > BTW, if you backport that, please get the rest associated with the
> > various Stack Clash related weaknesses:
>
> may be...
>
> > da029c11e6b1 exec: Limit arg stack to at most 75% of _STK_LIM
>
> and I have to admit that I do not understand this patch at all, the
> changelog explains nothing.
>
> Could you explain what this patch actually prevents from? Especially
> now that we have stack_guard_gap?

forgot to mention...

with this patch

	#define MAX_ARG_STRINGS 0x7FFFFFFF

doesn't match the reality. perhaps something like below makes sense just
to make it clear, but this is cosmetic.

Oleg.

--- x/fs/exec.c
+++ x/fs/exec.c
@@ -1789,11 +1789,13 @@ static int __do_execve_file(int fd, stru
 	if (retval)
 		goto out_unmark;
 
-	bprm->argc = count(argv, MAX_ARG_STRINGS);
+	int max_arg_strings = _STK_LIM / 4 * 3 / 2; // actually even less than
+
+	bprm->argc = count(argv, max_arg_strings);
 	if ((retval = bprm->argc) < 0)
 		goto out;
 
-	bprm->envc = count(envp, MAX_ARG_STRINGS);
+	bprm->envc = count(envp, max_arg_strings - bprm->argc);
 	if ((retval = bprm->envc) < 0)
 		goto out;
 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: get_arg_page() && ptr_size accounting
  2018-09-10 17:18   ` Oleg Nesterov
@ 2018-09-11  4:23     ` Kees Cook
  2018-09-11 14:13       ` Oleg Nesterov
  0 siblings, 1 reply; 16+ messages in thread
From: Kees Cook @ 2018-09-11  4:23 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Rik van Riel, Michal Hocko, Stanislav Kozina, LKML

On Mon, Sep 10, 2018 at 10:18 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 09/10, Kees Cook wrote:
>>
>> > So get_arg_page() does
>> >
>> >                 /*
>> >                  * Since the stack will hold pointers to the strings, we
>> >                  * must account for them as well.
>> >                  *
>> >                  * The size calculation is the entire vma while each arg page is
>> >                  * built, so each time we get here it's calculating how far it
>> >                  * is currently (rather than each call being just the newly
>> >                  * added size from the arg page).  As a result, we need to
>> >                  * always add the entire size of the pointers, so that on the
>> >                  * last call to get_arg_page() we'll actually have the entire
>> >                  * correct size.
>> >                  */
>> >                 ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
>> >                 if (ptr_size > ULONG_MAX - size)
>> >                         goto fail;
>> >                 size += ptr_size;
>> >
>> > OK, but
>> >                 acct_arg_size(bprm, size / PAGE_SIZE);
>> >
>> > after that doesn't look exactly right. This additional space will be used later
>> > when the process already uses bprm->mm, right? so it shouldn't be accounted by
>> > acct_arg_size().
>>
>> My understanding (based on the comment about acct_arg_size()) is that
>> before exec_mmap() happens, the memory used to build the new arguments
>> copy memory area gets accounted to the MM_ANONPAGES resource limit of
>> the execing process.
>
> Yes, because otherwise oom-killer can't account the memory populated by
> get_arg_page() in bprm->mm.
>
>> I couldn't find any place where the argc/envc
>> pointers were being included in the count,
>
> But why??? To clarify,
>
>         size += ptr_size;
>
> after acct_arg_size() is clear and correct, we are going to check rlim_stack
> and thus the size should include the pointers we will add in create_elf_tables().
>
> But acct_arg_size() should only account the pages we allocate for bprm->mm,
> nothing more. create_elf_tables() does not allocate the memory when it populates
> arg_start/arg_end/env_start/env_end. Plus at this time the process has already
> switched to bprm->mm.

I've looked more closely now. So, while I agree with you about
resource limits, there's a corner case that is better handled here:
once we've called flush_old_exec(), we can no longer send errors back
to the parent. We just segfault. So, I think it's better to give a
resource limit error early, since it is able to do the math early.

If we move acct_arg_size() earlier, then the "immediate" resource
utilization is checked, but it means it can just segfault later. If we
leave it as-is, we account for later memory allocations "too early",
but we'll still not be able to run: but we can tell the parent why.

I prefer leave it as-is.

>> > Not to mention that ptr_size/PAGE_SIZE doesn't look right in any case...
>>
>> Hm? acct_arg_size() takes pages, not bytes. I think this is correct?
>> What doesn't look right to you?
>
> Please forget. I meant that _if_ we actually wanted to account this additional
> memory in bprm->pages, than we would probably need something like
> acct_arg_size(size/PAGE_SIZE + DIV_ROUND_UP(ptr_size, PAGE_SIZE)).

I'd need to study that more, but that change seems reasonable. :)

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: get_arg_page() && ptr_size accounting
  2018-09-10 17:21     ` Oleg Nesterov
  2018-09-10 17:43       ` Oleg Nesterov
@ 2018-09-11  4:27       ` Kees Cook
  2018-09-11 15:25         ` Oleg Nesterov
  1 sibling, 1 reply; 16+ messages in thread
From: Kees Cook @ 2018-09-11  4:27 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Rik van Riel, Michal Hocko, Stanislav Kozina, LKML

On Mon, Sep 10, 2018 at 10:21 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 09/10, Kees Cook wrote:
>>
>> On Mon, Sep 10, 2018 at 9:41 AM, Kees Cook <keescook@chromium.org> wrote:
>> > On Mon, Sep 10, 2018 at 5:29 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> >> Hi Kees,
>> >>
>> >> I was thinking about backporting the commit 98da7d08850fb8bde
>> >> ("fs/exec.c: account for argv/envp pointers"), but I am not sure
>> >> I understand it...
>>
>> BTW, if you backport that, please get the rest associated with the
>> various Stack Clash related weaknesses:
>
> may be...
>
>> da029c11e6b1 exec: Limit arg stack to at most 75% of _STK_LIM
>
> and I have to admit that I do not understand this patch at all, the
> changelog explains nothing.

The issue here is with keeping some stack space available for a
program to reasonably start execution without doing insane things. The
sizes were picked after discussion with Linus while examining the
various Stack Clash weaknesses.

> Could you explain what this patch actually prevents from? Especially
> now that we have stack_guard_gap?

One of the many Stack Clash abuses was that it was possible to jump
over the stack gap with outrageous environment variables that got
expanded in stupid ways by, IIRC, glibc or the dynamic linker. The
point here was to be defensive in the face of future weaknesses, and
try to be robust in the face of crazy execs but workable under normal
(but large) execs.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: get_arg_page() && ptr_size accounting
  2018-09-10 17:43       ` Oleg Nesterov
@ 2018-09-11  4:30         ` Kees Cook
  2018-09-11 15:29           ` Oleg Nesterov
  0 siblings, 1 reply; 16+ messages in thread
From: Kees Cook @ 2018-09-11  4:30 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Rik van Riel, Michal Hocko, Stanislav Kozina, LKML

On Mon, Sep 10, 2018 at 10:43 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 09/10, Oleg Nesterov wrote:
>>
>> On 09/10, Kees Cook wrote:
>> >
>> > On Mon, Sep 10, 2018 at 9:41 AM, Kees Cook <keescook@chromium.org> wrote:
>> > > On Mon, Sep 10, 2018 at 5:29 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> > >> Hi Kees,
>> > >>
>> > >> I was thinking about backporting the commit 98da7d08850fb8bde
>> > >> ("fs/exec.c: account for argv/envp pointers"), but I am not sure
>> > >> I understand it...
>> >
>> > BTW, if you backport that, please get the rest associated with the
>> > various Stack Clash related weaknesses:
>>
>> may be...
>>
>> > da029c11e6b1 exec: Limit arg stack to at most 75% of _STK_LIM
>>
>> and I have to admit that I do not understand this patch at all, the
>> changelog explains nothing.
>>
>> Could you explain what this patch actually prevents from? Especially
>> now that we have stack_guard_gap?
>
> forgot to mention...
>
> with this patch
>
>         #define MAX_ARG_STRINGS 0x7FFFFFFF
>
> doesn't match the reality. perhaps something like below makes sense just
> to make it clear, but this is cosmetic.

Part of the discussion from back then was basically "we don't have
hard-coded limits so programs need to check dynamically themselves".

I'd prefer to leave it all well enough alone since I don't want to
introduce regressions here in the face of the many many Stack Clash
style weaknesses.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: get_arg_page() && ptr_size accounting
  2018-09-11  4:23     ` Kees Cook
@ 2018-09-11 14:13       ` Oleg Nesterov
  2018-09-11 19:06         ` Kees Cook
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2018-09-11 14:13 UTC (permalink / raw)
  To: Kees Cook; +Cc: Rik van Riel, Michal Hocko, Stanislav Kozina, LKML

On 09/10, Kees Cook wrote:
>
> I've looked more closely now. So, while I agree with you about
> resource limits, there's a corner case that is better handled here:
> once we've called flush_old_exec(), we can no longer send errors back
> to the parent. We just segfault. So, I think it's better to give a
> resource limit error early, since it is able to do the math early.
>
> If we move acct_arg_size() earlier, then the "immediate" resource
> utilization is checked, but it means it can just segfault later. If we
> leave it as-is, we account for later memory allocations "too early",
> but we'll still not be able to run: but we can tell the parent why.

I don't follow. Could you spell please?

AFAICS, the trivial patch I proposed changes nothing except it fixes the
bprm->pages accounting. The problem is really minor, but this looks confusing
and wrong anyway.

> I prefer leave it as-is.

After this discussion, I strongly disagree.

And now I think we should remove this rlim crap from get_arg_page() altogether
to make the things more clear.

> > Please forget. I meant that _if_ we actually wanted to account this additional
> > memory in bprm->pages, than we would probably need something like
> > acct_arg_size(size/PAGE_SIZE + DIV_ROUND_UP(ptr_size, PAGE_SIZE)).
>
> I'd need to study that more, but that change seems reasonable. :)

Please forget. Not that it matters, but we simply can't account ptr_size
100% correctly if we do this in get_arg_page().

See the patch below. Completely untested, quite possibly wrong, but I think
this is what we should do.

Oleg.


diff --git a/fs/exec.c b/fs/exec.c
index 1ebf6e5..7804a5c 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -218,55 +218,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
 	if (ret <= 0)
 		return NULL;
 
-	if (write) {
-		unsigned long size = bprm->vma->vm_end - bprm->vma->vm_start;
-		unsigned long ptr_size, limit;
-
-		/*
-		 * Since the stack will hold pointers to the strings, we
-		 * must account for them as well.
-		 *
-		 * The size calculation is the entire vma while each arg page is
-		 * built, so each time we get here it's calculating how far it
-		 * is currently (rather than each call being just the newly
-		 * added size from the arg page).  As a result, we need to
-		 * always add the entire size of the pointers, so that on the
-		 * last call to get_arg_page() we'll actually have the entire
-		 * correct size.
-		 */
-		ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
-		if (ptr_size > ULONG_MAX - size)
-			goto fail;
-		size += ptr_size;
-
-		acct_arg_size(bprm, size / PAGE_SIZE);
-
-		/*
-		 * We've historically supported up to 32 pages (ARG_MAX)
-		 * of argument strings even with small stacks
-		 */
-		if (size <= ARG_MAX)
-			return page;
-
-		/*
-		 * Limit to 1/4 of the max stack size or 3/4 of _STK_LIM
-		 * (whichever is smaller) for the argv+env strings.
-		 * This ensures that:
-		 *  - the remaining binfmt code will not run out of stack space,
-		 *  - the program will have a reasonable amount of stack left
-		 *    to work from.
-		 */
-		limit = _STK_LIM / 4 * 3;
-		limit = min(limit, bprm->rlim_stack.rlim_cur / 4);
-		if (size > limit)
-			goto fail;
-	}
+	if (write)
+		acct_arg_size(bprm, vma_pages(bprm->vma));
 
 	return page;
-
-fail:
-	put_page(page);
-	return NULL;
 }
 
 static void put_arg_page(struct page *page)
@@ -410,11 +365,6 @@ static int bprm_mm_init(struct linux_binprm *bprm)
 	if (!mm)
 		goto err;
 
-	/* Save current stack limit for all calculations made during exec. */
-	task_lock(current->group_leader);
-	bprm->rlim_stack = current->signal->rlim[RLIMIT_STACK];
-	task_unlock(current->group_leader);
-
 	err = __bprm_mm_init(bprm);
 	if (err)
 		goto err;
@@ -492,6 +442,27 @@ static int count(struct user_arg_ptr argv, int max)
 	return i;
 }
 
+static int prepare_rlim_stack(struct linux_binprm *bprm)
+{
+	unsigned long limit, ptr_size;
+
+	task_lock(current->group_leader);
+	bprm->rlim_stack = current->signal->rlim[RLIMIT_STACK];
+	task_unlock(current->group_leader);
+
+	limit = _STK_LIM / 4 * 3;
+	limit = min(limit, bprm->rlim_stack.rlim_cur / 4);
+	limit = max(limit, (unsigned long)ARG_MAX);
+	/* COMMENT */
+	ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
+	if (limit <= ptr_size)
+		return -E2BIG;
+	limit -= ptr_size;
+
+	bprm->p_min = bprm->p - limit;
+	return 0;
+}
+
 /*
  * 'copy_strings()' copies argument/environment strings from the old
  * processes's memory to the new process's stack.  The call to get_user_pages()
@@ -527,6 +498,8 @@ static int copy_strings(int argc, struct user_arg_ptr argv,
 		pos = bprm->p;
 		str += len;
 		bprm->p -= len;
+		if (bprm->p <= bprm->p_min)
+			goto out;
 
 		while (len > 0) {
 			int offset, bytes_to_copy;
@@ -1801,6 +1774,10 @@ static int __do_execve_file(int fd, struct filename *filename,
 	if (retval < 0)
 		goto out;
 
+	retval = prepare_rlim_stack(bprm);
+	if (retval < 0)
+		goto out;
+
 	retval = copy_strings_kernel(1, &bprm->filename, bprm);
 	if (retval < 0)
 		goto out;
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index c05f24f..423e8c1 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -24,7 +24,7 @@ struct linux_binprm {
 	struct page *page[MAX_ARG_PAGES];
 #endif
 	struct mm_struct *mm;
-	unsigned long p; /* current top of mem */
+	unsigned long p, p_min; /* current top of mem */
 	unsigned int
 		/*
 		 * True after the bprm_set_creds hook has been called once


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: get_arg_page() && ptr_size accounting
  2018-09-11  4:27       ` Kees Cook
@ 2018-09-11 15:25         ` Oleg Nesterov
  0 siblings, 0 replies; 16+ messages in thread
From: Oleg Nesterov @ 2018-09-11 15:25 UTC (permalink / raw)
  To: Kees Cook; +Cc: Rik van Riel, Michal Hocko, Stanislav Kozina, LKML

On 09/10, Kees Cook wrote:
>
> On Mon, Sep 10, 2018 at 10:21 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> > Could you explain what this patch actually prevents from? Especially
> > now that we have stack_guard_gap?
>
> One of the many Stack Clash abuses was that it was possible to jump
> over the stack gap with outrageous environment variables that got
> expanded in stupid ways by, IIRC, glibc or the dynamic linker. The
> point here was to be defensive in the face of future weaknesses, and
> try to be robust in the face of crazy execs but workable under normal
> (but large) execs.

IOW, unlimited arg size makes many interesting attacks possible.

This is clear, I was confused because I thought this patch fixes some
particular problem not explained in the changelog.

Thanks,

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: get_arg_page() && ptr_size accounting
  2018-09-11  4:30         ` Kees Cook
@ 2018-09-11 15:29           ` Oleg Nesterov
  0 siblings, 0 replies; 16+ messages in thread
From: Oleg Nesterov @ 2018-09-11 15:29 UTC (permalink / raw)
  To: Kees Cook; +Cc: Rik van Riel, Michal Hocko, Stanislav Kozina, LKML

On 09/10, Kees Cook wrote:
>
> On Mon, Sep 10, 2018 at 10:43 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > with this patch
> >
> >         #define MAX_ARG_STRINGS 0x7FFFFFFF
> >
> > doesn't match the reality. perhaps something like below makes sense just
> > to make it clear, but this is cosmetic.
>
> Part of the discussion from back then was basically "we don't have
> hard-coded limits so programs need to check dynamically themselves".
>
> I'd prefer to leave it all well enough alone since I don't want to
> introduce regressions here in the face of the many many Stack Clash
> style weaknesses.

I simply can't understand... Perhaps you too misunderstood me, I only
tried to say that count() can stop earlier, it is pointless to continue
to count the arg/env strings after argc + envc > _STK_LIM / 4 * 3 / 2,
copy_strings() will fail anyway.

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: get_arg_page() && ptr_size accounting
  2018-09-11 14:13       ` Oleg Nesterov
@ 2018-09-11 19:06         ` Kees Cook
  2018-09-12 12:27           ` Oleg Nesterov
  0 siblings, 1 reply; 16+ messages in thread
From: Kees Cook @ 2018-09-11 19:06 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Rik van Riel, Michal Hocko, Stanislav Kozina, LKML

Oh, I like this patch! This is much cleaner. Notes below...

On Tue, Sep 11, 2018 at 7:13 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> diff --git a/fs/exec.c b/fs/exec.c
> index 1ebf6e5..7804a5c 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -218,55 +218,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
>         if (ret <= 0)
>                 return NULL;
>
> -       if (write) {
> -               unsigned long size = bprm->vma->vm_end - bprm->vma->vm_start;
> -               unsigned long ptr_size, limit;
> -
> -               /*
> -                * Since the stack will hold pointers to the strings, we
> -                * must account for them as well.
> -                *
> -                * The size calculation is the entire vma while each arg page is
> -                * built, so each time we get here it's calculating how far it
> -                * is currently (rather than each call being just the newly
> -                * added size from the arg page).  As a result, we need to
> -                * always add the entire size of the pointers, so that on the
> -                * last call to get_arg_page() we'll actually have the entire
> -                * correct size.
> -                */
> -               ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
> -               if (ptr_size > ULONG_MAX - size)
> -                       goto fail;
> -               size += ptr_size;
> -
> -               acct_arg_size(bprm, size / PAGE_SIZE);
> -
> -               /*
> -                * We've historically supported up to 32 pages (ARG_MAX)
> -                * of argument strings even with small stacks
> -                */
> -               if (size <= ARG_MAX)
> -                       return page;
> -
> -               /*
> -                * Limit to 1/4 of the max stack size or 3/4 of _STK_LIM
> -                * (whichever is smaller) for the argv+env strings.
> -                * This ensures that:
> -                *  - the remaining binfmt code will not run out of stack space,
> -                *  - the program will have a reasonable amount of stack left
> -                *    to work from.
> -                */
> -               limit = _STK_LIM / 4 * 3;
> -               limit = min(limit, bprm->rlim_stack.rlim_cur / 4);
> -               if (size > limit)
> -                       goto fail;
> -       }
> +       if (write)
> +               acct_arg_size(bprm, vma_pages(bprm->vma));
>
>         return page;
> -
> -fail:
> -       put_page(page);
> -       return NULL;
>  }
>
>  static void put_arg_page(struct page *page)
> @@ -410,11 +365,6 @@ static int bprm_mm_init(struct linux_binprm *bprm)
>         if (!mm)
>                 goto err;
>
> -       /* Save current stack limit for all calculations made during exec. */
> -       task_lock(current->group_leader);
> -       bprm->rlim_stack = current->signal->rlim[RLIMIT_STACK];
> -       task_unlock(current->group_leader);
> -

I would prefer this hunk stay here since it will be more robust
against weird arch-specific things happening against rlim_stack. I had
to clean up some of these tests in arch code, so I'm nervous about
moving this further away. Here is before we call arch_bprm_mm_init(),
and I think it's better to do this as early as possible.

>         err = __bprm_mm_init(bprm);
>         if (err)
>                 goto err;
> @@ -492,6 +442,27 @@ static int count(struct user_arg_ptr argv, int max)
>         return i;
>  }
>
> +static int prepare_rlim_stack(struct linux_binprm *bprm)

How about collapsing this with:

        bprm->argc = count(argv, MAX_ARG_STRINGS);
        if ((retval = bprm->argc) < 0)
                goto out;

        bprm->envc = count(envp, MAX_ARG_STRINGS);
        if ((retval = bprm->envc) < 0)
                goto out;

and call it prepare_arg_count(struct linux_binprm *bprm,
                                                struct user_arg_ptr argv,
                                                struct user_arg_ptr envp)

> +{
> +       unsigned long limit, ptr_size;
> +
> +       task_lock(current->group_leader);
> +       bprm->rlim_stack = current->signal->rlim[RLIMIT_STACK];
> +       task_unlock(current->group_leader);
> +

Let's try to retain the comments here:

               /*
                * Limit to 1/4 of the max stack size or 3/4 of _STK_LIM
                * (whichever is smaller) for the argv+env strings.
                * This ensures that:
                *  - the remaining binfmt code will not run out of stack space,
                *  - the program will have a reasonable amount of stack left
                *    to work from.
                */
> +       limit = _STK_LIM / 4 * 3;
> +       limit = min(limit, bprm->rlim_stack.rlim_cur / 4);

and here:

               /*
                * We've historically supported up to 32 pages (ARG_MAX)
                * of argument strings even with small stacks
                */
> +       limit = max(limit, (unsigned long)ARG_MAX);
> +       /* COMMENT */

This comment should likely be something like:

/*
 * We must account for the size of all the argv and envp pointers to
 * the argv and envp strings, since they will also take up space in
 * the stack. They aren't stored until much later when we can't
 * signal to the parent that the child has run out of stack space.
 * Instead, calculate it here so it's possible to fail gracefully.
 */

> +       ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);

BTW, in re-reading create_elf_tables() and its calculation of "items",
I realize the above should actually include the trailing NULL pointers
and argc, so it should be:

ptr_size = (1 + bprm->argc + 1 + bprm->envc + 1) * sizeof(void *);

> +       if (limit <= ptr_size)
> +               return -E2BIG;
> +       limit -= ptr_size;
> +
> +       bprm->p_min = bprm->p - limit;
> +       return 0;
> +}
> +
>  /*
>   * 'copy_strings()' copies argument/environment strings from the old
>   * processes's memory to the new process's stack.  The call to get_user_pages()
> @@ -527,6 +498,8 @@ static int copy_strings(int argc, struct user_arg_ptr argv,
>                 pos = bprm->p;
>                 str += len;
>                 bprm->p -= len;
> +               if (bprm->p <= bprm->p_min)
> +                       goto out;
>
>                 while (len > 0) {
>                         int offset, bytes_to_copy;
> @@ -1801,6 +1774,10 @@ static int __do_execve_file(int fd, struct filename *filename,
>         if (retval < 0)
>                 goto out;
>
> +       retval = prepare_rlim_stack(bprm);
> +       if (retval < 0)
> +               goto out;
> +
>         retval = copy_strings_kernel(1, &bprm->filename, bprm);
>         if (retval < 0)
>                 goto out;
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index c05f24f..423e8c1 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -24,7 +24,7 @@ struct linux_binprm {
>         struct page *page[MAX_ARG_PAGES];
>  #endif
>         struct mm_struct *mm;
> -       unsigned long p; /* current top of mem */
> +       unsigned long p, p_min; /* current top of mem */

Can you split this out to a separate line (with a new comment) to
avoid comment-confusion? Something like:

unsigned long p; /* current top of mem */
unsigned long p_min; /* the minimum allowed mem position */

>         unsigned int
>                 /*
>                  * True after the bprm_set_creds hook has been called once
>

I've also spent some more time convincing myself again that there
aren't races between count(), copy_strings(), and create_elf_tables().
A malicious parent would only be able to zero-fill the stack of the
child, but not escape the counts that I can see.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: get_arg_page() && ptr_size accounting
  2018-09-11 19:06         ` Kees Cook
@ 2018-09-12 12:27           ` Oleg Nesterov
  2018-09-12 14:23             ` Oleg Nesterov
  2018-09-12 20:42             ` Kees Cook
  0 siblings, 2 replies; 16+ messages in thread
From: Oleg Nesterov @ 2018-09-12 12:27 UTC (permalink / raw)
  To: Kees Cook; +Cc: Rik van Riel, Michal Hocko, Stanislav Kozina, LKML

On 09/11, Kees Cook wrote:
>
> Oh, I like this patch! This is much cleaner.

it's pity. cause this means I will have to actually test this change and
(worse) write the changelog ;)

> > @@ -410,11 +365,6 @@ static int bprm_mm_init(struct linux_binprm *bprm)
> >         if (!mm)
> >                 goto err;
> >
> > -       /* Save current stack limit for all calculations made during exec. */
> > -       task_lock(current->group_leader);
> > -       bprm->rlim_stack = current->signal->rlim[RLIMIT_STACK];
> > -       task_unlock(current->group_leader);
> > -
>
> I would prefer this hunk stay here since it will be more robust
> against weird arch-specific things happening against rlim_stack. I had
> to clean up some of these tests in arch code, so I'm nervous about
> moving this further away. Here is before we call arch_bprm_mm_init(),
> and I think it's better to do this as early as possible.

Well, I don't reaally agree but I won't argue, this is cosmetic at least
right now.

> > +static int prepare_rlim_stack(struct linux_binprm *bprm)
>
> How about collapsing this with:
>
>         bprm->argc = count(argv, MAX_ARG_STRINGS);
>         if ((retval = bprm->argc) < 0)
>                 goto out;
>
>         bprm->envc = count(envp, MAX_ARG_STRINGS);
>         if ((retval = bprm->envc) < 0)
>                 goto out;
>
> and call it prepare_arg_count(struct linux_binprm *bprm,
>                                                 struct user_arg_ptr argv,
>                                                 struct user_arg_ptr envp)

OK, agreed,

> Let's try to retain the comments here:

Yes, sure, I wasn't going to remove the comments,

> > +       /* COMMENT */
>
> This comment should likely be something like:
>
> /*
>  * We must account for the size of all the argv and envp pointers to
>  * the argv and envp strings, since they will also take up space in
>  * the stack. They aren't stored until much later when we can't
>  * signal to the parent that the child has run out of stack space.
>  * Instead, calculate it here so it's possible to fail gracefully.
>  */

Thanks!

> > +       ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
>
> BTW, in re-reading create_elf_tables() and its calculation of "items",
> I realize the above should actually include the trailing NULL pointers
> and argc, so it should be:
>
> ptr_size = (1 + bprm->argc + 1 + bprm->envc + 1) * sizeof(void *);

Yes, I noticed this too. But can we do this later please?

Firstly, this change needs a special note in the changelog, and thus I
think a separate patch makes more sense.

And in fact I am not sure we really care about "small" O(1) errors in
ptr_size calculations.

> > -       unsigned long p; /* current top of mem */
> > +       unsigned long p, p_min; /* current top of mem */
>
> Can you split this out to a separate line (with a new comment) to
> avoid comment-confusion? Something like:
>
> unsigned long p; /* current top of mem */
> unsigned long p_min; /* the minimum allowed mem position */

OK, but "minimum allowed mem position" explains nothing... The comment
should explain that ->p_min (can you suggest a better name?) is artificial
marker pre-computed for rlim-like checks in copy_strings()...

> I've also spent some more time convincing myself again that there
> aren't races between count(), copy_strings(), and create_elf_tables().

Yes, I thought about this too, do not see anything dangerous.

BTW. I think we can simply kill count(). But this needs another cleanup
and dicsussion.

Thanks for review!

Oleg.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: get_arg_page() && ptr_size accounting
  2018-09-12 12:27           ` Oleg Nesterov
@ 2018-09-12 14:23             ` Oleg Nesterov
  2018-09-12 20:42             ` Kees Cook
  1 sibling, 0 replies; 16+ messages in thread
From: Oleg Nesterov @ 2018-09-12 14:23 UTC (permalink / raw)
  To: Kees Cook; +Cc: Rik van Riel, Michal Hocko, Stanislav Kozina, LKML

On 09/12, Oleg Nesterov wrote:
>
> it's pity. cause this means I will have to actually test this change and
> (worse) write the changelog ;)

Seems to work, see v2 below, I tried to address your comments.

The new helper is prepare_arg_pages(), matches setup_arg_pages/shift_arg_pages.

s/p_min/argmin/. Or please suggest a better name and/or comment.

I still think the new helper should absorb the bprm->rlim_stack inititialization,
it is called right after bprm_mm_init() and I see no reason why arch_bprm_mm_init()
should ever look at rlim_stack, but this is minor and I won't insist.

If you do not see any problems I'll try to force myself to write the changelog
tomorrow.

Oleg.


diff --git a/fs/exec.c b/fs/exec.c
index 1ebf6e5..9a892ac 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -218,55 +218,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
 	if (ret <= 0)
 		return NULL;
 
-	if (write) {
-		unsigned long size = bprm->vma->vm_end - bprm->vma->vm_start;
-		unsigned long ptr_size, limit;
-
-		/*
-		 * Since the stack will hold pointers to the strings, we
-		 * must account for them as well.
-		 *
-		 * The size calculation is the entire vma while each arg page is
-		 * built, so each time we get here it's calculating how far it
-		 * is currently (rather than each call being just the newly
-		 * added size from the arg page).  As a result, we need to
-		 * always add the entire size of the pointers, so that on the
-		 * last call to get_arg_page() we'll actually have the entire
-		 * correct size.
-		 */
-		ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
-		if (ptr_size > ULONG_MAX - size)
-			goto fail;
-		size += ptr_size;
-
-		acct_arg_size(bprm, size / PAGE_SIZE);
-
-		/*
-		 * We've historically supported up to 32 pages (ARG_MAX)
-		 * of argument strings even with small stacks
-		 */
-		if (size <= ARG_MAX)
-			return page;
-
-		/*
-		 * Limit to 1/4 of the max stack size or 3/4 of _STK_LIM
-		 * (whichever is smaller) for the argv+env strings.
-		 * This ensures that:
-		 *  - the remaining binfmt code will not run out of stack space,
-		 *  - the program will have a reasonable amount of stack left
-		 *    to work from.
-		 */
-		limit = _STK_LIM / 4 * 3;
-		limit = min(limit, bprm->rlim_stack.rlim_cur / 4);
-		if (size > limit)
-			goto fail;
-	}
+	if (write)
+		acct_arg_size(bprm, vma_pages(bprm->vma));
 
 	return page;
-
-fail:
-	put_page(page);
-	return NULL;
 }
 
 static void put_arg_page(struct page *page)
@@ -492,6 +447,50 @@ static int count(struct user_arg_ptr argv, int max)
 	return i;
 }
 
+static int prepare_arg_pages(struct linux_binprm *bprm,
+			struct user_arg_ptr argv, struct user_arg_ptr envp)
+{
+	unsigned long limit, ptr_size;
+
+	bprm->argc = count(argv, MAX_ARG_STRINGS);
+	if (bprm->argc < 0)
+		return bprm->argc;
+
+	bprm->envc = count(envp, MAX_ARG_STRINGS);
+	if (bprm->envc < 0)
+		return bprm->envc;
+
+	/*
+	 * Limit to 1/4 of the max stack size or 3/4 of _STK_LIM
+	 * (whichever is smaller) for the argv+env strings.
+	 * This ensures that:
+	 *  - the remaining binfmt code will not run out of stack space,
+	 *  - the program will have a reasonable amount of stack left
+	 *    to work from.
+	 */
+	limit = _STK_LIM / 4 * 3;
+	limit = min(limit, bprm->rlim_stack.rlim_cur / 4);
+	/*
+	 * We've historically supported up to 32 pages (ARG_MAX)
+	 * of argument strings even with small stacks
+	 */
+	limit = max(limit, (unsigned long)ARG_MAX);
+	/*
+	 * We must account for the size of all the argv and envp pointers to
+	 * the argv and envp strings, since they will also take up space in
+	 * the stack. They aren't stored until much later when we can't
+	 * signal to the parent that the child has run out of stack space.
+	 * Instead, calculate it here so it's possible to fail gracefully.
+	 */
+	ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
+	if (limit <= ptr_size)
+		return -E2BIG;
+	limit -= ptr_size;
+
+	bprm->argmin = bprm->p - limit;
+	return 0;
+}
+
 /*
  * 'copy_strings()' copies argument/environment strings from the old
  * processes's memory to the new process's stack.  The call to get_user_pages()
@@ -527,6 +526,8 @@ static int copy_strings(int argc, struct user_arg_ptr argv,
 		pos = bprm->p;
 		str += len;
 		bprm->p -= len;
+		if (bprm->p < bprm->argmin)
+			goto out;
 
 		while (len > 0) {
 			int offset, bytes_to_copy;
@@ -1789,12 +1790,8 @@ static int __do_execve_file(int fd, struct filename *filename,
 	if (retval)
 		goto out_unmark;
 
-	bprm->argc = count(argv, MAX_ARG_STRINGS);
-	if ((retval = bprm->argc) < 0)
-		goto out;
-
-	bprm->envc = count(envp, MAX_ARG_STRINGS);
-	if ((retval = bprm->envc) < 0)
+	retval = prepare_arg_pages(bprm, argv, envp);
+	if (retval < 0)
 		goto out;
 
 	retval = prepare_binprm(bprm);
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index c05f24f..5ffc8da 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -25,6 +25,7 @@ struct linux_binprm {
 #endif
 	struct mm_struct *mm;
 	unsigned long p; /* current top of mem */
+	unsigned long argmin; /* rlimit marker for copy_strings() */
 	unsigned int
 		/*
 		 * True after the bprm_set_creds hook has been called once


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: get_arg_page() && ptr_size accounting
  2018-09-12 12:27           ` Oleg Nesterov
  2018-09-12 14:23             ` Oleg Nesterov
@ 2018-09-12 20:42             ` Kees Cook
  1 sibling, 0 replies; 16+ messages in thread
From: Kees Cook @ 2018-09-12 20:42 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Rik van Riel, Michal Hocko, Stanislav Kozina, LKML

On Wed, Sep 12, 2018 at 5:27 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 09/11, Kees Cook wrote:
>>
>> Oh, I like this patch! This is much cleaner.
>
> it's pity. cause this means I will have to actually test this change and
> (worse) write the changelog ;)

Hehe. I know this pain well! :)

>> > @@ -410,11 +365,6 @@ static int bprm_mm_init(struct linux_binprm *bprm)
>> >         if (!mm)
>> >                 goto err;
>> >
>> > -       /* Save current stack limit for all calculations made during exec. */
>> > -       task_lock(current->group_leader);
>> > -       bprm->rlim_stack = current->signal->rlim[RLIMIT_STACK];
>> > -       task_unlock(current->group_leader);
>> > -
>>
>> I would prefer this hunk stay here since it will be more robust
>> against weird arch-specific things happening against rlim_stack. I had
>> to clean up some of these tests in arch code, so I'm nervous about
>> moving this further away. Here is before we call arch_bprm_mm_init(),
>> and I think it's better to do this as early as possible.
>
> Well, I don't reaally agree but I won't argue, this is cosmetic at least
> right now.

Probably what it deserves is a better comment to capture what I said
above. Maybe:

-    /* Save current stack limit for all calculations made during exec. */
+    /* Do this before any arch-specific calls, like arch_bprm_mm_init(),
+     * so that bprm->rlim_stack is available for the architecture to use
+     * in case it needs it earlier that mm layout time.
+     */

>> BTW, in re-reading create_elf_tables() and its calculation of "items",
>> I realize the above should actually include the trailing NULL pointers
>> and argc, so it should be:
>>
>> ptr_size = (1 + bprm->argc + 1 + bprm->envc + 1) * sizeof(void *);
>
> Yes, I noticed this too. But can we do this later please?

Sure!

>> > -       unsigned long p; /* current top of mem */
>> > +       unsigned long p, p_min; /* current top of mem */
>>
>> Can you split this out to a separate line (with a new comment) to
>> avoid comment-confusion? Something like:
>>
>> unsigned long p; /* current top of mem */
>> unsigned long p_min; /* the minimum allowed mem position */
>
> OK, but "minimum allowed mem position" explains nothing... The comment
> should explain that ->p_min (can you suggest a better name?) is artificial
> marker pre-computed for rlim-like checks in copy_strings()...

How about something like:

... p; /* top of memory array reserved for stack */
... p_min; /* bottom of stack as computed in prepare_arg_pages() */

(Is "p" really only used for stack reservation tracking?)

> BTW. I think we can simply kill count(). But this needs another cleanup
> and dicsussion.

Hm, I think we need count for doing the sanity checking and allowing
the cond_resched() calls.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-09-12 20:42 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-10 12:29 get_arg_page() && ptr_size accounting Oleg Nesterov
2018-09-10 16:41 ` Kees Cook
2018-09-10 16:45   ` Kees Cook
2018-09-10 17:21     ` Oleg Nesterov
2018-09-10 17:43       ` Oleg Nesterov
2018-09-11  4:30         ` Kees Cook
2018-09-11 15:29           ` Oleg Nesterov
2018-09-11  4:27       ` Kees Cook
2018-09-11 15:25         ` Oleg Nesterov
2018-09-10 17:18   ` Oleg Nesterov
2018-09-11  4:23     ` Kees Cook
2018-09-11 14:13       ` Oleg Nesterov
2018-09-11 19:06         ` Kees Cook
2018-09-12 12:27           ` Oleg Nesterov
2018-09-12 14:23             ` Oleg Nesterov
2018-09-12 20:42             ` Kees Cook

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.