From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: Regression: commit da029c11e6b1 broke toybox xargs.
To: Linus Torvalds
Cc: Kees Cook, Linux Kernel Mailing List, toybox@lists.landley.net, enh@google.com
References: <0b3a9bd0-3046-cdab-cfee-0ca45ee64e8d@landley.net>
From: Rob Landley
Message-ID: <59f9380b-fc9b-c6a5-998a-a603ef828d1d@landley.net>
Date: Sat, 4 Nov 2017 19:39:36 -0500
X-Mailing-List: linux-kernel@vger.kernel.org

Correcting Elliott's email to google, not gmail.

(Sorry, I'm in Tokyo for work this month, almost over the jetlag...)

On 11/03/2017 08:07 PM, Linus Torvalds wrote:
> On Fri, Nov 3, 2017 at 4:58 PM, Rob Landley wrote:
>> On 11/02/2017 10:40 AM, Linus Torvalds wrote:
>>
>> But it boils down to "got the limit wrong, the exec failed after the
>> fork(), dynamic recovery from which is awkward so I'm trying to figure
>> out the right limit".

Later on it sounds like dynamic recovery is what you recommend. (Awkward doesn't mean I can't do it.)

> I suspect we _do_ have to raise that limit, because clearly this is a
> regression, but I absolutely _detest_ the fact that a stupid
> _embedded_ OS thinks that it should have a bigger stack limit than
> stuff that runs on supercomputers.
>
> That just makes me go "there's something seriously wrong".
This was me trying not to assume what other people will do. I think Android's default is still 8 MB (it was in M), but my test systems for this are literally on the other side of the planet right now.

Google's internal frame of reference is very different from mine. I got pointed at a podcast (Android Developers Backstage #53) where Elliott and another Android dev talked about toybox for a few minutes in the second half, and they shared a chuckle over my complaint that downloading AOSP takes 150 gigabytes _before_ it tries to build anything, and only the largest machine I own can build it at all (and that very slowly). It was just so alien to them that this would be a _problem_...

> For something like "xargs", I'm actually really saddened by the stupid
> decision to think it's a single value. The whole and *only* reason for
> xargs to exist is to just get it right,

Which is what I was trying very hard to do. :(

> and the natural thing for
> xargs to do would be to not ask, but simply try to do the whole thing,
> and if you get E2BIG, you decide to split it in half or something
> until it works. That kind of approach would just make it work
> _without_ depending on some magic value.
>
> The fact that apparently xargs is too stupid to do that, and instead
> requires _SC_ARG_MAX to magically give it the "One True Value(tm)" is
> just all kinds of crap.

I'm writing this xargs, so I can _make_ it do that. It just requires a pipe back from the forked child to report exec status, and it's either slow (remove one argument at a time) or inaccurate (cut it in half, when the result could have been longer). Either way, xargs still needs an internal limit, or "yes | xargs" will try to fill all memory before ever calling exec().

The reason I wanted to support "exactly as big as possible" is that calling a command as one invocation vs. multiple invocations can change behavior.
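The "pipe back from the forked child" is the usual close-on-exec trick. Here is a minimal sketch, in Python for brevity (toybox itself is C, where the same thing would be pipe2() with O_CLOEXEC, fork() and execvp()); the helper name spawn_reporting_exec_errors is made up for illustration. The child writes errno into the pipe only if exec fails. A successful exec closes the write end automatically, so the parent's read() sees EOF and knows the new program is running.

```python
import errno
import os
import sys


def spawn_reporting_exec_errors(argv):
    """Fork and exec argv (hypothetical helper).

    Returns (pid, None) once the exec has succeeded, or (None, err) where err
    is the errno the child's execvp() failed with (E2BIG, ENOENT, ...).
    """
    # os.pipe() fds are non-inheritable (close-on-exec) since Python 3.4.
    rfd, wfd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        try:
            os.close(rfd)
            os.execvp(argv[0], argv)
        except OSError as e:
            # Only reached if exec failed: tell the parent why.
            os.write(wfd, e.errno.to_bytes(4, "little"))
        finally:
            os._exit(127)  # never fall back into the parent's code path
    os.close(wfd)
    data = b""
    while len(data) < 4:
        chunk = os.read(rfd, 4 - len(data))
        if not chunk:  # EOF: exec succeeded and closed the write end
            break
        data += chunk
    os.close(rfd)
    if data:
        os.waitpid(pid, 0)  # reap the child that failed to exec
        return None, int.from_bytes(data, "little")
    return pid, None


if __name__ == "__main__":
    pid, err = spawn_reporting_exec_errors(["/nonexistent/command"])
    print("exec failed:", errno.errorcode.get(err, err))
    pid, err = spawn_reporting_exec_errors([sys.executable, "-c", "pass"])
    print("started pid", pid, "wait status", os.waitpid(pid, 0)[1])
```

In an xargs built this way, getting E2BIG back is the signal to split the batch and retry; the 4-byte write is well under PIPE_BUF, so it arrives in one piece.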
Once you've decided to split, how BIG you split is much less important, so falling back to an arbitrary limit would be fine, except I'd still have to check the stack size to see if it's _lower_ than that arbitrary limit. (If you set the stack ulimit to 128k, which nommu systems may wanna do, then the exec limit is 32k. It can be _anything_.)

And this limit is shared with environment variables, so the problem might be that your environment's pathological and you can't run this command line with even one argument because envp ate all the space. But that's another story, and the user can wash it through env -i to make it work. Except:

$ env -i {A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P}=$(printf '%0*d' 130657) \
  env | wc -c

says 2090560 (of 2097152), but 130658 says "argument list too long" when it's only 16 more bytes out of the ~6k we should have left (envp[]=17*8, argc=2*8, argv[0]=4...).

And it sounds like you're saying I should just stop _trying_ to figure out exact up-front measurements. So: stack size / 4, then split in half each time, and if it gets down to one argument that still can't run, print an error message for that. Ok.

> Oh well. Enough ranting.
>
> What _is_ the stack limit when using toybox? Is it just entirely unlimited?

Answer to the second question, on Ubuntu 14.04:

landley@driftwood:~/linux/linux/fs$ ulimit -s 999999999
landley@driftwood:~/linux/linux/fs$ ulimit -s
999999999

Anybody can call ulimit to raise it as a normal user, so effectively, yes, it is unlimited. I have no IDEA what my users are gonna do. (If they do something stupid it's their fault, but I don't necessarily get to say what stupid is from here.)

Answer to the first: the default is whatever I inherited from the Android fork du jour it's running on. The Google developers seem to be drinking from a firehose of contributions from the half-dozen phone companies trying to get code upstream.
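A minimal sketch of that plan, again in Python for brevity, with every name hypothetical: exec_footprint() reproduces the byte arithmetic of the env -i experiment (NUL-terminated strings plus NULL-terminated pointer arrays, assuming 8-byte pointers), estimated_arg_budget() takes stack rlimit / 4 as a first guess, and run_batches() cuts the input at that budget and halves any batch that still gets E2BIG. The extra cap at 3/4 of an 8 MiB _STK_LIM is my assumption about what commit da029c11e6b1 changed, and try_exec stands in for the fork/exec/pipe dance, raising OSError(E2BIG) when the kernel says no.

```python
import errno
import resource

# Assumption: 64-bit pointers, so each argv[]/envp[] slot costs 8 bytes.
PTR_SIZE = 8
# Assumption: _STK_LIM is 8 MiB, and commit da029c11e6b1 caps argv+envp at
# 3/4 of it on top of the older "1/4 of the stack rlimit" rule.
STK_LIM = 8 * 1024 * 1024


def exec_footprint(argv, envp, ptr_size=PTR_SIZE):
    """Estimate bytes for NUL-terminated strings plus the two NULL-terminated
    pointer arrays. Only an estimate: the kernel can refuse earlier."""
    strings = sum(len(s.encode()) + 1 for s in argv + envp)
    pointers = (len(argv) + 1 + len(envp) + 1) * ptr_size
    return strings + pointers


def estimated_arg_budget():
    """First guess: stack rlimit / 4, capped (by assumption) at 3/4 of _STK_LIM."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    cap = STK_LIM // 4 * 3
    if soft == resource.RLIM_INFINITY:
        return cap
    return min(soft // 4, cap)


def run_batches(args, try_exec, budget, base_cost=0):
    """Run try_exec() on consecutive batches of args; return successful runs.

    Batches are cut at `budget` estimated bytes (base_cost covers the command
    itself and envp). A batch that still fails with E2BIG is split in half
    and retried, preserving argument order. Only one batch is held in memory.
    """
    runs = 0

    def run_with_halving(batch):
        nonlocal runs
        work = [batch]  # stack: the last element runs next
        while work:
            b = work.pop()
            try:
                try_exec(b)
            except OSError as e:
                if e.errno != errno.E2BIG or len(b) == 1:
                    raise  # a different error, or one argument alone is too big
                mid = len(b) // 2
                work.append(b[mid:])  # second half runs after...
                work.append(b[:mid])  # ...the first half
                continue
            runs += 1

    batch, used = [], base_cost
    for arg in args:
        cost = len(arg.encode()) + 1 + PTR_SIZE  # string + NUL + argv[] slot
        if batch and used + cost > budget:
            run_with_halving(batch)
            batch, used = [], base_cost
        batch.append(arg)
        used += cost
    if batch:
        run_with_halving(batch)
    return runs
```

The estimate only decides where to cut. As the env -i experiment shows, the kernel can refuse a few kilobytes before this arithmetic says it should, so the E2BIG retry is what actually guarantees progress. Because input is consumed lazily and handled one batch at a time, "yes | xargs" never holds more than one batch in memory.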
Elliott presumably says no to what he can, but they're hugely outnumbered and there's politics I'm only dimly aware of (never having worked for Google, and only having met Elliott for lunch once a couple years ago when I was in town for ELC anyway, this is all just the impression of an interested outside party). Then there are more companies in China and such (like Xiaomi, https://www.youtube.com/watch?v=fR6K1l3sfm8#t=1m38s) that probably don't even send code back to Google (because Great Firewall) but still use their own forks of Android infrastructure, which they modify the hell out of.

I can theoretically test the vanilla AOSP build du jour under qemu, but that's like the vanilla kernel: what winds up in the hands of 99% of end users is a fork of a fork. Mostly I trust Elliott to point out when I've violated an Android assumption (often via the patch comment).

(And then there's the Android Native Development Kit, which _almost_ builds toybox outside of AOSP, but will never spray it down with the full selinux environment a la http://lists.landley.net/pipermail/toybox-landley.net/2016-December/008772.html so it'll build a _partial_ toybox at best, by design. I think we left off around http://lists.landley.net/pipermail/toybox-landley.net/2017-April/008970.html and I should poke at it again when I get time...)

I got _into_ this trying to transplant the Android build to run under Android natively (http://landley.net/toybox/#21-03-2013), and the discussions I had with Elliott about that on the toybox list involved creating a "posix container" under minijail. (Each Android app runs under its own UID, but a build system sort of needs a UID range, so he went off to teach Android apps about this new concept, and I got distracted by $DAYJOB, and we haven't gotten back to it since...) Such a container would presumably have its own environment setup, and "is 8 megs the right stack size for that" is a question I never asked.
Right now AOSP can expand it as needed deep in some nested Ninja file, unless they added an selinux rule to stop it.

tl;dr You ask hard questions; I have no idea.

>> Should I just go back to hardwiring in 131072? It's no _less_ arbitrary
>> than 10 megs, and it sounds like getting it _right_ is unachievable.
>
> So in a perfect world, nobody should use that value.
>
> But we can certainly change the kernel behavior back too.
>
> But you realize that then we still would limit suid binaries, and now
> your "xargs" would suddenly work with normal binaries, but break if
> it's a suid binary?

It sounds like "probe and recover" is my best option.

> So it would certainly just be nicer if toybox had a sane stack limit
> and none of this would matter.

That's like saying coreutils should have a sane stack limit. These are command line utilities, not an OS. I inherit prefs and live with them.

Android has its own init subsystem that sprays the world down with selinux rules before anything else gets to run, meaning the AOSP build annotates the toybox binary it installs with enough extended attributes to choke a cow in order for things like "ps" to work. All the other Android builds are forks off AOSP (the Android Open Source Project), which is the "upstream vanilla" for that distro. It is a GIANT HAIRBALL, and dismantling it enough to figure out what it actually does is a todo item for me, and I haven't even found time to watch all of https://www.youtube.com/watch?v=dEKYZUgorWQ yet...

But I support standard Linux _too_, so right now I mostly develop on vanilla and then Elliott tells me when I screwed up. :)

> Linus

Rob