From: Linus Torvalds
Date: Fri, 3 Nov 2017 18:07:39 -0700
Subject: Re: Regression: commit da029c11e6b1 broke toybox xargs.
To: Rob Landley
Cc: Kees Cook, Linux Kernel Mailing List, toybox@lists.landley.net,
 enh@gmail.com

On Fri, Nov 3, 2017 at 4:58 PM, Rob Landley wrote:
> On 11/02/2017 10:40 AM, Linus Torvalds wrote:
>
> But it boils down to "got the limit wrong, the exec failed after the
> fork(), dynamic recovery from which is awkward so I'm trying to
> figure out the right limit".

Well, the thing is, you would only get the limit wrong if your
RLIMIT_STACK is set to some insane value.

>> Ahh. I should have read that email more carefully. If xargs broke,
>> that _will_ break actual scripts, yes. Do you actually set the stack
>> limit to insane values? Anybody using toybox really shouldn't be
>> doing 32MB stacks.
>
> Toybox is the default command line of android since M, which went 64
> bit in L, and the Pixel 2 phone has 4 gigs of ram.

So? My desktop has 32GB of ram, and is running a distro that sets the
kernel configuration to MAXSMP because the distro people don't want to
have multiple kernels, and some people run it on big hardware with
terabytes of RAM and thousands of cores.

And yet, on that distro, I do:

  [torvalds@i7 linux]$ ulimit -s
  8192

ie the stack limit hasn't been increased from the default 8MB.

So that whole "let's make the stack limit crazy" is actually the core
problem in your particular equation. If you have a sane stack limit
(anything less than 24MB, i.e. four times the 6MB the kernel now
accepts, since the C libraries report RLIMIT_STACK/4), you'd not have
seen the xargs issue.

That said, _SC_ARG_MAX really is badly defined. In many ways, 128k is
still the correct limit: not because it's the historical one, but
because it is MAX_ARG_STRLEN, the biggest single string we allow
(although strictly speaking it's not really 128kB, it's 32*PAGE_SIZE,
for historical reasons).

So there simply isn't a single limit, and never has been. The
traditional value is 128kB; then for a while we didn't have any limit
at all; then we did RLIMIT_STACK/4 (but only counting the strings);
then we did RLIMIT_STACK/4 (taking the pointers into account too); and
now we limit it to at most three quarters of _STK_LIM.

I suspect we _do_ have to raise that limit, because clearly this is a
regression, but I absolutely _detest_ the fact that a stupid
_embedded_ OS thinks that it should have a bigger stack limit than
stuff that runs on supercomputers.

That just makes me go "there's something seriously wrong".

> My problem here is it's hard to figure out what exec size the limit
> _is_. There's a sysconf(_SC_ARG_MAX) which bionic and glibc are
> currently returning as stack_limit/4, which is now too big and exec()
> will error out after the fork. Musl is returning the 131072 limit
> from 2011-ish, meaning "/bin/echo $(printf '%0*d' 131071)" works but
> "printf '%0*d' 131071 | xargs" fails, an inconsistency I was trying
> to avoid. Maybe I don't have that luxury...

Honestly, lots of the POSIX SC limits are questionable. In this case,
_SC_ARG_MAX is garbage because it's simply not even well-defined.

It really is 32*PAGE_SIZE if all you have is one single long argument,
because that's the largest single string we accept. Make it one byte
bigger, and we'll return E2BIG, as you found out. But at the same
time, it can clearly also be 6MB, since that's what we accept if the
stack limit is big enough, and yes, it used to be even bigger.

For something like "xargs", I'm actually really saddened by the stupid
decision to treat it as a single value. The whole and *only* reason
for xargs to exist is to just get it right, and the natural thing for
xargs to do would be to not ask, but simply try to exec the whole
thing, and if it gets E2BIG, split the list in half (or something like
it) until it works. That kind of approach would just make it work
_without_ depending on some magic value.

The fact that apparently xargs is too stupid to do that, and instead
requires _SC_ARG_MAX to magically give it the "One True Value(tm)", is
just all kinds of crap.
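A minimal sketch of that try-and-back-off idea, assuming a flat
items[] array and hypothetical helper names (this is not toybox's
actual code, just the shape of the loop):

    /* Sketch only: exec "cmd" over items[] in batches, halving the
     * batch size on E2BIG instead of trusting sysconf(_SC_ARG_MAX).
     * try_exec() and run_batches() are made-up names; a real xargs
     * would also collect exit statuses and handle fork() failure. */
    #include <errno.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int try_exec(char *cmd, char **items, size_t n)
    {
        pid_t pid = fork();

        if (pid == 0) {
            char **argv = calloc(n + 2, sizeof(*argv));

            argv[0] = cmd;
            for (size_t i = 0; i < n; i++) argv[i + 1] = items[i];
            execvp(cmd, argv);
            /* exec failed in the child; 42 signals "batch too big" */
            _exit(errno == E2BIG ? 42 : 127);
        }
        int status;
        waitpid(pid, &status, 0);
        return (WIFEXITED(status) && WEXITSTATUS(status) == 42) ? -1 : 0;
    }

    void run_batches(char *cmd, char **items, size_t count)
    {
        size_t batch = count ? count : 1, done = 0;

        while (done < count) {
            size_t n = count - done < batch ? count - done : batch;

            if (try_exec(cmd, items + done, n) && batch > 1)
                batch /= 2;        /* too big: halve and retry */
            else
                done += n;         /* ran (or batch of 1: give up) */
        }
    }

The awkwardness Rob mentions is visible here: the E2BIG only shows up
in the child, after the fork(), so the parent has to learn about it
indirectly (an exit-code convention in this sketch) before it can
shrink the batch and retry.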
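And the two numbers in question are easy to inspect side by side; a
small standalone probe (not from the thread, just an illustration):

    /* Print the stack rlimit and libc's claimed ARG_MAX; on an
     * affected system, rlim_cur/4 exceeds what execve() will take. */
    #include <stdio.h>
    #include <sys/resource.h>
    #include <unistd.h>

    int main(void)
    {
        struct rlimit rl;

        getrlimit(RLIMIT_STACK, &rl);
        printf("RLIMIT_STACK: %llu bytes\n",
               (unsigned long long)rl.rlim_cur);
        printf("_SC_ARG_MAX:  %ld bytes\n", sysconf(_SC_ARG_MAX));
        return 0;
    }

With a 32MB stack limit, a libc reporting RLIMIT_STACK/4 prints an
_SC_ARG_MAX of 8MB, which is past the 6MB the kernel now accepts.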
Oh well. Enough ranting. What _is_ the stack limit when using toybox?
Is it just entirely unlimited?

> Should I just go back to hardwiring in 131072? It's no _less_
> arbitrary than 10 megs, and it sounds like getting it _right_ is
> unachievable.

So in a perfect world, nobody should use that value.

But we can certainly change the kernel behavior back too. You realize,
though, that we would then still limit suid binaries, and your "xargs"
would suddenly work with normal binaries but break if it's a suid
binary?

So it would certainly just be nicer if toybox had a sane stack limit,
and none of this would matter.

                 Linus