From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751274Ab1GUEKF (ORCPT ); Thu, 21 Jul 2011 00:10:05 -0400 Received: from cantor2.suse.de ([195.135.220.15]:34092 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750704Ab1GUEJ7 (ORCPT ); Thu, 21 Jul 2011 00:09:59 -0400 Date: Thu, 21 Jul 2011 14:09:36 +1000 From: NeilBrown To: Stephen Smalley Cc: Vasiliy Kulikov , kernel-hardening@lists.openwall.com, Solar Designer , James Morris , Linus Torvalds , linux-kernel@vger.kernel.org, Greg Kroah-Hartman , Andrew Morton , "David S. Miller" , Jiri Slaby , Alexander Viro , linux-fsdevel@vger.kernel.org, KOSAKI Motohiro , Eric Paris , Willy Tarreau , Sebastian Krahmer Subject: Re: [kernel-hardening] Re: [PATCH] move RLIMIT_NPROC check from set_user() to do_execve_common() Message-ID: <20110721140936.632d2c8b@notabene.brown> In-Reply-To: <1310759683.30257.123.camel@moss-pluto> References: <20110713170657.59dae548@notabene.brown> <20110714112751.1bfd998f@notabene.brown> <20110714150602.GA30019@openwall.com> <20110715133013.4fa38d19@notabene.brown> <20110715063113.GA3166@albatros> <20110715170650.585f1dad@notabene.brown> <20110715073823.GA3821@albatros> <1310738313.30257.27.camel@moss-pluto> <20110715152641.GA6286@albatros> <1310759683.30257.123.camel@moss-pluto> X-Mailer: Claws Mail 3.7.9 (GTK+ 2.22.1; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 15 Jul 2011 15:54:43 -0400 Stephen Smalley wrote: > On Fri, 2011-07-15 at 19:26 +0400, Vasiliy Kulikov wrote: > > Hi Stephen, > > > > On Fri, Jul 15, 2011 at 09:58 -0400, Stephen Smalley wrote: > > > Does this have implications for Android's zygote model? There you have > > > a long running uid 0 / all caps process (the zygote), which forks itself > > > upon receiving a request to spawn an app and then calls > > > > > setgroups(); > > > setrlimit(); setgid(); setuid(); > > > > Is RLIMIT_NPROC forced in your model and setuid() is expected to fail > > because of NPROC exceeding? If no, then it is not touched at all. > > I don't know what their intent is. But it is an example of a case where > moving the enforcement from setuid() to a subsequent execve() causes the > check to never get applied. As to whether or not they care, I don't > know. An app that calls fork() repeatedly will still be stopped, but an > app that repeatedly connects to the zygote and asks to spawn another > instance of itself would be unlimited. > > OTOH, the current RLIMIT_NPROC check and lack of setuid() error checking > has been a repeated issue for Android. > So where does this leave us? Between a rock and a hard place? It says to me that moving the check from set_user to execve is simply Wrong(TM). It may be convenient and do TheRightThing in certain common cases, but it also can do the Wrong thing in other cases and I don't think that is an acceptable trade off. Having setuid succeed when it should fail is simply incorrect. The problem - as we all know - is that user space doesn't always check error status properly. If we were to look for precedent I would point to SIGPIPE. The only reason for that to exist is because programs don't always check that a "write" succeeds so we have a mechanism to kill off processes that don't check the error status and keep sending. I would really like to apply that to this problem ... but that has already been suggested (years ago) and found wanting. Maybe we can revisit it? The value of the SIGPIPE approach (whether it is SIGPIPE or SIGXCPU or SIGVEC ... if only there were a SIGXNPROC) is that the program remains in complete control. It can find out if the NPROC limit has been exceeded at the "right" time. The only other solution that I can think of which isn't "Wrong(TM)" is my first patch which introduced PF_SETUSER_FAILED. With this patch setuid() still fails if it should, so the calling process still remains in control. But if it fails to exercise that control, the kernel steps in. Vasiliy didn't like that because it allows a process to ignore the setuid failure, perform some in-process activity as root when expecting it to be as some-other-user, and only fails when execve is attempted - possibly too late. Against this I ask: what exactly is our goal here? Is it to stop all possible abuses? I think not. That is impossible. Is it to stop certain known and commonly repeated errors? I think so. That is clearly valuable. We know (Thanks to Solar Designer's list) that unchecked setuid followed by execve is a commonly repeated error, so trapping that can be justified. Do we know that accessing the filesystem first is a commonly repeated error? If not, there is no clear motive to deal with that problem. If, however, it is then maybe a check for PF_SETUSER_FAILED in inode_permission() would be the right thing. Or maybe we just set PF_SETUSER_FAILED and leave it up to some security module to decide what to disable or report when that is set? In short: I don't think there can be a solution that is both completely correct and completely safe. I would go for "as correct as possible" with "closes common vulnerabilities". NeilBrown