From: ebiederm@xmission.com (Eric W. Biederman)
To: Al Viro
Cc: Linus Torvalds, "Serge E. Hallyn", Linux-Fsdevel, Kernel Mailing List,
    Andy Lutomirski, Rob Landley, Miklos Szeredi, Christoph Hellwig,
    Karel Zak, "J. Bruce Fields", Fengguang Wu
References: <8761v7h2pt.fsf@tw-ebiederman.twitter.com>
    <87li281wx6.fsf_-_@xmission.com>
    <87ob28kqks.fsf_-_@xmission.com>
    <874n3n7czm.fsf_-_@xmission.com>
    <87wqezl5df.fsf_-_@x220.int.ebiederm.org>
    <20140409023027.GX18016@ZenIV.linux.org.uk>
    <20140409023947.GY18016@ZenIV.linux.org.uk>
    <87sipmbe8x.fsf@x220.int.ebiederm.org>
    <20140409175322.GZ18016@ZenIV.linux.org.uk>
    <20140409182830.GA18016@ZenIV.linux.org.uk>
Date: Wed, 09 Apr 2014 15:49:09 -0700
In-Reply-To: <20140409182830.GA18016@ZenIV.linux.org.uk> (Al Viro's message of "Wed, 9 Apr 2014 19:28:32 +0100")
Message-ID: <87txa286fu.fsf@x220.int.ebiederm.org>
Subject: Re: [GIT PULL] Detaching mounts on unlink for 3.15-rc1
X-Mailing-List: linux-kernel@vger.kernel.org

Al Viro writes:

> On Wed, Apr 09, 2014 at 06:53:23PM +0100, Al Viro wrote:
>
>> For starters, put that ext4 on top of dm-raid or dm-multipath.  That alone
>> will very likely push you over the top.
>>
>> Keep in mind, BTW, that you do not have full 8K to play with - there's
>> struct thread_info that should not be stepped upon.  Not particulary large
>> (IIRC, restart_block is the largest piece in amd64 one), but it eats about
>> 100 bytes.
>>
>> I'd probably use renameat(2) in testing - i.e. trigger the shite when
>> resolving a deeply nested symlink in renameat() arguments.  That brings
>> extra struct nameidata into the game, i.e. extra 152 bytes chewed off the
>> stack.
>
> Come to think of that, some extra nastiness could be had by mixing it with
> execve().  You can have up to 4 levels of #! resolution there, each eating
> up at least 128 bytes (more, actually).  Compiler _might_ turn that
> tail call of search_binary_handler() into a jump, but it's not guaranteed
> at all.
>
> FWIW, it probably makes sense to turn load_script() into
> static int load_script(struct linux_binprm *bprm)
> {
>         int err = __load_script(bprm);
>         if (err)
>                 return err;
>         return search_binary_handler(bprm);
> }
>
> regardless of that issue; we don't need interp[] after the call of
> open_exec(), so it makes sense to reduce the footprint in mutual
> recursion loop.
>
> For extra pain, consider s/ext4/xfs/, possibly with iscsi thrown under the
> bus^Wdm-multipath.
>
> The thing is, we are already too close to stack overflow limit.
> Adding several kilobytes more is not survivable, and since you are taking
> somebody in a userns DoSing the system into consideration, you can't
> say "it takes malicious root to set up, so it's not serious" - the
> DoS you mentioned requires the same thing...

Thank you for the comments.  This makes it clear that the problem is
with mntput (and the filesystem I/O that it can trigger), not
particularly with detach_mounts.

As I read the code, all of the nasty cases we are concerned with today
can already be triggered with pathput/mntput, given an appropriate race
against umount.  That means my detach_mounts code doesn't introduce a
stack space usage regression, but seems to be the messenger that we
have such problems in the VFS.

There is also a big difference between what can be triggered using the
filesystems we allow unprivileged users to mount (devpts, proc, ramfs,
sysfs, mqueue, shmem, none of which have backing store) and filesystems
with a long I/O path.

So it looks like it still requires global root to trigger this,
although I still think it is serious.

One of the more interesting aspects of this userns work is running into
code that semantically should be safe for unprivileged users to use,
but because we haven't historically allowed unprivileged users to use
that code, there are silly assumptions and untested code paths.

> BTW, another thing to test would be this:
>         mount nfs on /mnt
>         mount a filesystem on /mnt/path that can be invalidated
>         cd to /mnt/path/foo
>         bind /mnt on /mnt/path/foo/bar
>         shoot /mnt/path (on server)
>         stat bar/path/foo
> That should rip the fs you are in out of the tree; it should work, but
> it's definitely a case worth testing.

Agreed, that is a case worth testing.  I wasn't looking at that case in
particular, as it is not the worst-case stack usage or even an
approximation of it.

Eric
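
The stack figures quoted in this thread can be tallied in a short sketch.
This is only an illustration under stated assumptions: the constants are
the approximate numbers from the quoted text (8K stack, about 100 bytes
for thread_info, 152 bytes for an extra struct nameidata, at least 128
bytes per #! level), the helper names are hypothetical, and treating the
costs as simply additive is itself an assumption.  It does not count the
filesystem or block-layer frames, which are the real problem.

```c
/* Approximate figures quoted in the thread; lower bounds, not measurements. */
enum {
	KSTACK_BYTES      = 8192, /* "full 8K" kernel stack */
	THREAD_INFO_BYTES = 100,  /* struct thread_info, "about 100 bytes" */
	NAMEIDATA_BYTES   = 152,  /* extra struct nameidata via renameat() */
	SCRIPT_LVL_BYTES  = 128,  /* "at least 128 bytes" per #! level */
	MAX_SCRIPT_LEVELS = 4     /* "up to 4 levels of #! resolution" */
};

/*
 * Hypothetical helper: bytes consumed before any filesystem or
 * block-layer frame is pushed.  script_levels is clamped to 0..4.
 */
long fixed_overhead_bytes(int script_levels)
{
	if (script_levels < 0)
		script_levels = 0;
	if (script_levels > MAX_SCRIPT_LEVELS)
		script_levels = MAX_SCRIPT_LEVELS;
	return THREAD_INFO_BYTES + NAMEIDATA_BYTES +
	       (long)script_levels * SCRIPT_LVL_BYTES;
}

/* Hypothetical helper: what is left of the 8K stack for everything else. */
long stack_headroom_bytes(int script_levels)
{
	return KSTACK_BYTES - fixed_overhead_bytes(script_levels);
}
```

With all four #! levels in play, the fixed overhead alone comes to 764
bytes, so a call chain that adds "several kilobytes more" on top of an
already deep I/O path has well under the nominal 8K to work with.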