All of lore.kernel.org
 help / color / mirror / Atom feed
From: enh <enh@google.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rob Landley <rob@landley.net>, Kees Cook <keescook@chromium.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	toybox@lists.landley.net
Subject: Re: Regression: commit da029c11e6b1 broke toybox xargs.
Date: Wed, 15 Nov 2017 14:10:17 -0800	[thread overview]
Message-ID: <CAJgzZopVb5CC-T6GTYTAGKwY+vQ9Muvi3g6jKNmOC1+4BTMNvQ@mail.gmail.com> (raw)
In-Reply-To: <CA+55aFzUvbGjD8nQ-+3oiMBx14c_6zOj2n7KLN3UsJ-qsd4Dcw@mail.gmail.com>

On Sun, Nov 5, 2017 at 12:46 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sat, Nov 4, 2017 at 5:39 PM, Rob Landley <rob@landley.net> wrote:
>> On 11/03/2017 08:07 PM, Linus Torvalds wrote:
>>>>
>>>> But it boils down to "got the limit wrong, the exec failed after the
>>>> fork(), dynamic recovery from which is awkward so I'm trying to figure
>>>> out the right limit".
>>
>> Sounds later like dynamic recovery is what you recommend. (Awkward
>> doesn't mean I can't do it.)
>
> So my preferred solution would certainly be for xargs to realize that
> _SC_ARG_MAX is at most an "informed guess" rather than some absolute
> value.
>
> Or, alternatively, simply use a _SC_ARG_MAX that is small enough that
> we can say "yes, that's what we've always supported". That, in
> practice, is the 128kB value.
>
> It's probably also big enough that nobody cares any more. Sure, you
> *could* feed bigger arguments, and we do end up basically supporting
> it because people who don't write proper scripts will simply do
>
>     grep some-value $(find . -name '*.c')
>
> and that works pretty well if your code-base isn't humongous.
>
> It still works for the kernel, for example, but there's no question
> that it's still not something we *guarantee* works.
>
> It doesn't work if you don't limit it to *.c files:
>
>   [torvalds@i7 linux]$ grep some-value $(find .)
>   -bash: /usr/bin/grep: Argument list too long
>
> so when the kernel expanded the argument list from the traditional
> 128kB, it was for _convenience_, it was not meant for people who do
> proper scripting and use xargs.
>
> See the difference?
>
>>> What _is_ the stack limit when using toybox? Is it just entirely unlimited?
>>
>> Answer to second question on ubuntu 14.04:
>>
>>   landley@driftwood:~/linux/linux/fs$ ulimit -s 999999999
>>   landley@driftwood:~/linux/linux/fs$ ulimit -s
>>   999999999
>
> Oh, I can do that on Fedora too.
>
> But the thing is, nobody actually seems to do that.
>
> And our regression rule has never been "behavior doesn't change".
> That would mean that we could never make any changes at all.
>
> For example, we do things like add new error handling etc all the
> time, which we then sometimes even add tests for in our kselftest
> directory.
>
> So clearly behavior changes all the time and we don't consider that a
> regression per se.
>
> The rule for a regression for the kernel is that some real user
> workflow breaks. Not some test. Not a "look, I used to be able to do
> X, now I can't".
>
> So that's why this whole "is this a test or a real workload" question
> is important to me, and what odd RLIMIT_STACK people actually use.

Android's default RLIMIT_STACK is 8192.

the specific bug report was from an interactive shell user. i'm not
aware of any part of the platform itself that relies on this, nor any
third-party app.

> I'm not interested in "can you reproduce it", because that's simply
> not the issue for us from a regression standpoint.
>
> That said, I tried this under Fedora:
>
>   ulimit -s unlimited
>   find / -print0 2> /dev/null | xargs -0 /usr/bin/echo | wc
>
> and it reported 991 lines. That would match just using 128kB as the
> breaking point for xargs.
>
> So apparently at least Fedora doesn't seem to use that "stack limit /
> 4" thing, but the traditional 128kB value.
>
> I don't know if that is because of 'xargs', or because of library
> differences - I didn't check.
>
> And I have a hard time judging whether the regression report is a
> "look, this behavior changed", or a "we actually have a real
> regression visible to actual users".
>
> See above why that matters to me.
>
>> It sounds like "probe and recover" is my best option.
>
> I actually suspect "just use 128kB" is the actual best option in practice.

for libc's sysconf(_SC_ARG_MAX) too? i'm fine changing bionic back to
reporting 128KiB if there's an lkml "Linus says" mail that i can link
to in the comment. it certainly seems like an overly-conservative
choice is better than the current situation...

> Sure, "probe and recover" is the best option in theory, but it's a lot
> more complex, and there doesn't actually seem to be a lot of upside.
>
> So to take my silly example of piping "find /" to xargs: I can't
> imagine that anybody sane should care really whether it results in
> 900+ execve's (for that 128kB limit) or just 20 (for some "optimal"
> 6MB limit).
>
> And there is actually a somewhat real fear with the
> "probe-and-recover" model: the already mentioned nasty case "sometimes
> we don't return E2BIG, we might just succeed the execve, but then kill
> the process very early due to some _other_ limit being triggered".
>
> That nasty case really shouldn't happen, but it was the case we
> considered (and rejected) for suid binaries.
>
> So that's the real reason the kernel behavior changed: we had a
> security issue with the "we allow basically unlimited stack growth"
> where a huge argv/envp could be used to grow the stack into some other
> segment.
>
> Realistically, it was only a security issue with suid binaries, but as
> mentioned in this thread, the problem is that we do the argument
> copying even before we've decided whether the execve is going to be a
> suid execve.
>
> So then - exactly so that we do *not* have that nasty case of "execve
> succeeds, but we kill the process immediately if it turns into a
> potential security issue", we do that "limit the stack growth for
> everybody".
>
>                 Linus



-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.

  reply	other threads:[~2017-11-15 22:10 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-01 23:34 Regression: commit da029c11e6b1 broke toybox xargs Rob Landley
2017-11-02  3:30 ` Kees Cook
     [not found] ` <CA+55aFyw74DcPygS=SB0d-Fufz3j73zTVp2UXUUOUt4=1_He=Q@mail.gmail.com>
2017-11-02 15:40   ` Linus Torvalds
2017-11-03 23:58     ` Rob Landley
2017-11-04  0:03       ` [Toybox] " enh
2017-11-04  0:42       ` Kees Cook
2017-11-04  1:22         ` Linus Torvalds
2017-11-04  1:37           ` Kees Cook
2017-11-05  1:10             ` Rob Landley
2017-11-04  1:07       ` Linus Torvalds
2017-11-05  0:39         ` Rob Landley
2017-11-05 20:46           ` Linus Torvalds
2017-11-15 22:10             ` enh [this message]
2017-11-15 22:45               ` Linus Torvalds
2017-11-15 21:12           ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJgzZopVb5CC-T6GTYTAGKwY+vQ9Muvi3g6jKNmOC1+4BTMNvQ@mail.gmail.com \
    --to=enh@google.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rob@landley.net \
    --cc=torvalds@linux-foundation.org \
    --cc=toybox@lists.landley.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.