From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752467AbdKEAjn (ORCPT <rfc822;w@1wt.eu>);
        Sat, 4 Nov 2017 20:39:43 -0400
Received: from mail-pg0-f49.google.com ([74.125.83.49]:49756 "EHLO
        mail-pg0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751508AbdKEAjk (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Sat, 4 Nov 2017 20:39:40 -0400
X-Google-Smtp-Source: ABhQp+QuOv1oieN8mphy5tUYORw3gwo6HzTVnDXo0C2E+3BHxTTpjF0U90tfXUhjtn5HCB8rsBInjQ==
Subject: Re: Regression: commit da029c11e6b1 broke toybox xargs.
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        toybox@lists.landley.net, enh@google.com
References: <b33da177-a836-40eb-25c8-8134da83c63d@landley.net>
 <CA+55aFyw74DcPygS=SB0d-Fufz3j73zTVp2UXUUOUt4=1_He=Q@mail.gmail.com>
 <CA+55aFyWWMipF2sLdcUbz2JwMQ0YRK5YAHTy8HrUPevMF+6XZA@mail.gmail.com>
 <0b3a9bd0-3046-cdab-cfee-0ca45ee64e8d@landley.net>
 <CA+55aFyvL3Y3XE4t6O0-HVNTuzFwqAvZUW-6=HX7m4bf+qpJ1w@mail.gmail.com>
From: Rob Landley <rob@landley.net>
Message-ID: <59f9380b-fc9b-c6a5-998a-a603ef828d1d@landley.net>
Date: Sat, 4 Nov 2017 19:39:36 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.3.0
MIME-Version: 1.0
In-Reply-To: <CA+55aFyvL3Y3XE4t6O0-HVNTuzFwqAvZUW-6=HX7m4bf+qpJ1w@mail.gmail.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Correcting Elliott's email to google, not gmail. (Sorry, I'm in Tokyo for
work this month, almost over the jetlag...)

On 11/03/2017 08:07 PM, Linus Torvalds wrote:
> On Fri, Nov 3, 2017 at 4:58 PM, Rob Landley <rob@landley.net> wrote:
>> On 11/02/2017 10:40 AM, Linus Torvalds wrote:
>>
>> But it boils down to "got the limit wrong, the exec failed after the
>> fork(), dynamic recovery from which is awkward so I'm trying to figure
>> out the right limit".

Sounds later like dynamic recovery is what you recommend. (Awkward
doesn't mean I can't do it.)

> I suspect we _do_ have to raise that limit, because clearly this is a
> regression, but I absolutely _detest_ the fact that a stupid
> _embedded_ OS thinks that it should have a bigger stack limit than
> stuff that runs on supercomputers.
> 
> That just makes me go "there's something seriously wrong".

This was me trying not to assume what other people will do. I think
Android's default is still 8 MB (it was in M), but my test systems for
this are literally on the other side of the planet right now.

Google's internal frame of reference is very different from mine. I got
pointed at a podcast (Android Developers Backstage #53) where Elliott
and another android dev talked about toybox for a few minutes in the
second half. They shared a chuckle over my complaint that
downloading AOSP takes 150 gigabytes _before_ it tries to build
anything, and only the largest machine I own can build it at all (and
that very slowly). It was just so alien to them that this would be a
_problem_...

> For something like "xargs", I'm actually really saddened by the stupid
> decision to think it's a single value. The whole and *only* reason for
> xargs to exist is to just get it right,

Which is what I was trying very hard to do. :(

> and the natural thing for
> xargs to do would be to not ask, but simply try to do the whole thing,
> and if you get E2BIG, you decide to split it in half or something
> until it works. That kind of approach would just make it work
> _without_ depending on some magic value.
> 
> The fact that apparently xargs is too stupid to do that, and instead
> requires _SC_ARG_MAX to magically give it the "One True Value(tm)" is
> just all kinds of crap.

I'm writing this xargs, I can _make_ it do that, it just requires a pipe
back from the forked child to return status and is either slow (remove
one argument at a time) or inaccurate (cut it in half, result coulda
been longer). Either way xargs still needs an internal limit or "yes |
xargs" will try to fill all memory before ever calling exec().

The reason I wanted to support "exactly as big as possible" is that
calling a command as one invocation vs multiple invocations can change
behavior. Once you've decided to split, how BIG you split is much less
important, so falling back to an arbitrary limit would be fine except
I'd still have to check the stack size to see if it's _lower_ than that
arbitrary limit. (If you set the stack ulimit to 128k, which nommu
systems may wanna do, then the exec limit is 32k. It can be _anything_.)
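
A minimal sketch of that check, in Python for brevity (toybox itself is
C). It assumes the stack/4 rule described above and uses 131072, the old
hardwired value from earlier in this thread, as the arbitrary cap. The
names exec_limit and FALLBACK_LIMIT are made up for illustration, and
this is not a statement of what the kernel enforces in every version.

```python
import resource

# Hypothetical arbitrary cap: the old hardwired value from this thread.
FALLBACK_LIMIT = 131072

def exec_limit(stack_soft, fallback=FALLBACK_LIMIT):
    """Guess an argv+envp budget from a soft stack rlimit (stack/4 rule).

    The arbitrary cap wins unless stack/4 is lower than it.
    """
    if stack_soft == resource.RLIM_INFINITY:
        # "ulimit -s unlimited": nothing to divide, use the cap.
        return fallback
    return min(stack_soft // 4, fallback)

def current_exec_limit():
    """Apply the same rule to this process's own soft stack limit."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return exec_limit(soft)

if __name__ == "__main__":
    print(exec_limit(128 * 1024))   # 128k stack -> 32768
    print(current_exec_limit())
```

With a 128k stack this gives 32768, matching the 32k figure above; with
the usual 8 MB stack the 131072 cap is what applies.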

And this limit is shared with environment variables so the problem might
be that your environment's pathological and you can't run this command
line with even one argument because envp ate all the space, but that's
another story and the user can wash it through env -i to make it work.
Except:

  $ env -i {A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P}=$(printf '%0*d' 130657) \
      env | wc -c

Says 2090560 (of 2097152), but 130658 says argument list too long when
it's only 16 more bytes of the ~6k we should have left (envp[]=17*8,
argc=2*8, argv[0]=4...), and it sounds like you're saying I should
just stop _trying_ to figure out exact up-front measurements.
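
The arithmetic behind those numbers, as a small Python sketch. It
assumes 8-byte pointers and counts only the terms listed above (the
strings, the 17*8 and 2*8 terms, and "env" plus its NUL); whatever else
the kernel charges against the limit is not modeled here, which is
presumably where the missing space goes. The function names are
hypothetical.

```python
PTR = 8  # assumed pointer size on a 64-bit system

def string_bytes(nvars, value_len):
    """Bytes for nvars strings of the form X=<value_len chars>.

    Each costs 2 ("X=") + value_len + 1 terminator (a NUL in memory,
    a newline in env's output), so "env | wc -c" sees the same total.
    """
    return nvars * (2 + value_len + 1)

def headroom(limit, nvars, value_len):
    """Space left under limit after the terms listed in the email."""
    pointers = (nvars + 1) * PTR + 2 * PTR  # 17*8 plus the 2*8 term
    argv0 = len("env") + 1                  # "env" and its NUL
    return limit - string_bytes(nvars, value_len) - pointers - argv0

if __name__ == "__main__":
    limit = 2097152                          # 8 MB stack / 4
    print(string_bytes(16, 130657))          # 2090560
    print(headroom(limit, 16, 130657))       # 6436
    print(headroom(limit, 16, 130658))       # 6420
```

So by this count the failing case still has over 6k to spare.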

So stacksize /4, then split in half each time, and if it strips down to
one argument that can't run, have an error message for that. Ok.
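
Roughly this shape, sketched in Python rather than toybox's C. Python's
subprocess module already reports a failed exec in the child back to the
parent as an OSError, which stands in for the status pipe mentioned
above. run_batches and its runner parameter are hypothetical names, and
the caller is assumed to have already capped the input at the stack/4
budget so "yes | xargs" can't eat all memory first.

```python
import errno
import subprocess

def default_runner(argv):
    """Run one command; a failed exec surfaces here as OSError."""
    return subprocess.run(argv).returncode

def run_batches(command, args, runner=default_runner):
    """Run command over args, halving any batch that exec rejects.

    Returns the batches that actually ran, in input order.
    """
    pending = [list(args)]
    ran = []
    while pending:
        batch = pending.pop(0)
        if not batch:
            continue
        try:
            runner(list(command) + batch)
            ran.append(batch)
        except OSError as err:
            if err.errno != errno.E2BIG:
                raise
            if len(batch) == 1:
                # Stripped down to one argument and it still won't run.
                raise RuntimeError("argument too long to run alone")
            mid = len(batch) // 2
            # Retry both halves next, keeping the original order.
            pending[:0] = [batch[:mid], batch[mid:]]
    return ran
```

Halving is the "inaccurate" option: a batch of 8 that fails can end up
as four batches of 2 even when 3 arguments would have fit in one.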

> Oh well. Enough ranting.
> 
> What _is_ the stack limit when using toybox? Is it just entirely unlimited?

Answer to second question on ubuntu 14.04:

  landley@driftwood:~/linux/linux/fs$ ulimit -s 999999999
  landley@driftwood:~/linux/linux/fs$ ulimit -s
  999999999

Anybody can call ulimit to expand it as a normal user, so effectively
yes it is unlimited. I have no IDEA what my users are gonna do. (If they
do something stupid it's their fault, but I don't necessarily get to say
what stupid is from here.)

Answer to first: the default is whatever I inherited from the Android
fork du jour it's running on.

The google developers seem to be drinking from a firehose of
contributions from the half-dozen phone companies trying to get code
upstream. Elliott presumably says no to what he can but they're hugely
outnumbered and there's politics I'm only dimly aware of (never having
worked for google and only having met Elliott for lunch once a couple
years ago when I was in town for ELC anyway, this is all just the
impression of an interested outside party).

Then there's more companies in China and such (like Xiaomi,
https://www.youtube.com/watch?v=fR6K1l3sfm8#t=1m38s) that probably don't
even send code back to Google (because Great Firewall) but still use
their own forks of android infrastructure which they modify the hell out of.

I can theoretically test the vanilla AOSP build du jour under qemu, but
that's like the vanilla kernel: what winds up in the hands of 99% of end
users is a fork of a fork.

Mostly I trust Elliott to point out when I've violated an Android
assumption (often via the patch comment).

(And then there's the Android Native Development Kit, which _almost_
builds toybox outside of AOSP, but will never spray it down with the
full selinux environment ala
http://lists.landley.net/pipermail/toybox-landley.net/2016-December/008772.html
so it'll build a _partial_ toybox at best, by design. I think we left
off around
http://lists.landley.net/pipermail/toybox-landley.net/2017-April/008970.html
and I should poke at it again when I get time...)

I got _into_ this trying to transplant the android build to run under
android natively (http://landley.net/toybox/#21-03-2013) and the
discussions I had with Elliott about that on the toybox list involved
creating a "posix container" under minijail. (Each android app runs
under its own UID, but a build system sort of needs a UID range, so he
went off to teach android apps about this new concept and I got
distracted by $DAYJOB and we haven't gotten back to it since...) Such a
container would presumably have its own environment setup, and "is 8
megs the right stack size for that" is a question I never asked. Right now
AOSP can expand it as needed deep in some nested Ninja file unless they
added an selinux rule to stop it.

tl;dr You ask hard questions, I have no idea.

>> Should I just go back to hardwiring in 131072? It's no _less_ arbitrary
>> than 10 megs, and it sounds like getting it _right_ is unachievable.
> 
> So in a perfect world, nobody should use that value.
> 
> But we can certainly change the kernel behavior back too.
> 
> But you realize that then we still would limit suid binaries, and now
> your "xargs" would suddenly work with normal binaries, but break if
> it's a suid binary?

It sounds like "probe and recover" is my best option.

> So it would certainly just be nicer if toybox had a sane stack limit
> and none of this would matter.

That's like saying coreutils should have a sane stack limit. It's
command line utilities, not an OS. I inherit prefs and live with them.

Android has its own init subsystem that sprays the world down with
selinux rules before anything else gets to run, meaning the AOSP build
annotates the toybox binary it installs with enough extended attributes
to choke a cow in order for things like "ps" to work. All the other
android builds are forks off AOSP (the Android Open Source Project),
which is the "upstream vanilla" for that distro. It is a GIANT HAIRBALL
and dismantling it enough to figure out what it actually does is a todo
item for me, and I haven't even found time to watch all of
https://www.youtube.com/watch?v=dEKYZUgorWQ yet...

But I support standard Linux _too_, so right now I mostly develop on
vanilla and then Elliott tells me when I screwed up. :)

>                 Linus
Rob