From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752584AbaLCXeW (ORCPT ); Wed, 3 Dec 2014 18:34:22 -0500 Received: from mx1.redhat.com ([209.132.183.28]:45317 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752323AbaLCXeU (ORCPT ); Wed, 3 Dec 2014 18:34:20 -0500 Message-ID: <1417649648.2581.32.camel@pluto.fritz.box> Subject: Re: [RFC PATCH 3/4] kmod - add call_usermodehelper_ns() helper From: Ian Kent To: "Eric W. Biederman" Cc: Benjamin Coddington , Oleg Nesterov , Kernel Mailing List , "J. Bruce Fields" , Stanislav Kinsbursky , Trond Myklebust , David Howells , Al Viro Date: Thu, 04 Dec 2014 07:34:08 +0800 In-Reply-To: <87zjb4k7b6.fsf@x220.int.ebiederm.org> References: <20141125005255.4974.54193.stgit@pluto.fritz.box> <20141125010734.4974.85347.stgit@pluto.fritz.box> <20141125215248.GA7958@redhat.com> <20141125220637.GA10008@redhat.com> <87y4qy7wci.fsf@x220.int.ebiederm.org> <1416956845.2509.38.camel@pluto.fritz.box> <871toq7tr6.fsf@x220.int.ebiederm.org> <1416959452.2509.52.camel@pluto.fritz.box> <87egsq3fna.fsf@x220.int.ebiederm.org> <1417563215.2531.25.camel@pluto.fritz.box> <87zjb4k7b6.fsf@x220.int.ebiederm.org> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2014-12-03 at 10:49 -0600, Eric W. Biederman wrote: > Ian Kent writes: > > > On Mon, 2014-12-01 at 16:56 -0500, Benjamin Coddington wrote: > >> n Tue, 25 Nov 2014, Eric W. Biederman wrote: > >> Hi, > >> > >> > Ian Kent writes: > >> > > >> > > On Tue, 2014-11-25 at 17:19 -0600, Eric W. Biederman wrote: > >> > >> Ian Kent writes: > >> > >> > >> > >> > On Tue, 2014-11-25 at 16:23 -0600, Eric W. Biederman wrote: > >> > >> >> Oleg Nesterov writes: > >> > >> >> > >> > >> >> > On 11/25, Oleg Nesterov wrote: > >> > >> >> >> > >> > >> >> >> Let me first apologize, I didn't actually read this series yet. > >> > >> >> >> > >> > >> >> >> But I have to admit that so far I do not like this approach... > >> > >> >> >> probably I am biased. > >> > >> >> > > >> > >> >> > Yes. > >> > >> >> > > >> > >> >> > And I have another concern... this is mostly a feeling, I can be > >> > >> >> > easily wrong but: > >> > >> >> > > >> > >> >> >> On 11/25, Ian Kent wrote: > >> > >> >> >> > > >> > >> >> >> > +static int umh_set_ns(struct subprocess_info *info, struct cred *new) > >> > >> >> >> > +{ > >> > >> >> >> > + struct nsproxy *ns = info->data; > >> > >> >> >> > + > >> > >> >> >> > + mntns_setfs(ns->mnt_ns); > >> > >> >> >> > >> > >> >> >> Firstly, it is not clear to me if we should use the caller's ->mnt_ns. > >> > >> >> >> Let me remind about the coredump. The dumping task can cloned with > >> > >> >> >> CLONE_NEWNS or it cam do unshare(NEWNS)... but OK, I do not understand > >> > >> >> >> this enough. > >> > >> >> > > >> > >> >> > And otoh. If we actually want to use the caller's mnt_ns/namespaces we > >> > >> >> > could simply fork/reparent a child which will do execve ? > >> > >> >> > >> > >> >> That would certainly be a better approach, and roughly equivalent to > >> > >> >> what exists here. That would even ensure we remain in the proper > >> > >> >> cgroups, and lsm context. > >> > >> >> > >> > >> >> The practical problem with the approach presented here is that I can > >> > >> >> hijack any user mode helper I wish, and make it run in any executable I > >> > >> >> wish as the global root user. > >> > >> >> > >> > >> >> Ian if we were to merge this I believe you would win the award for > >> > >> >> easiest path to a root shell. > >> > >> > > >> > >> > LOL, OK, so there's a problem with this. > >> > >> > > >> > >> > But, how should a user mode helper execute within a namespace (or more > >> > >> > specifically within a container)? > >> > >> > > >> > >> > Suppose a user mode helper program scans through the pid list and > >> > >> > somehow picks the correct process pid and then does an > >> > >> > open()/setns()/execve(). > >> > >> > > >> > >> > Does that then satisfy the requirements? > >> > >> > What needs to be done to safely do that in kernel? > >> > >> > > >> > >> > The other approach I've considered is doing a full open()/setns() in > >> > >> > kernel (since the caller already knows its pid) but it sounds like > >> > >> > that's not right either. > >> > >> > >> > >> The approach we agreed upon with the core dump helper was to provide > >> > >> enough information that userspace could figure out what was the > >> > >> appropriate policy and call nsenter/setns. > >> > > > >> > > Your recommending I have a look at the core dump helper, that's fine, > >> > > I'll do that. > >> > > >> > I am just describing it because it came up. Core dumps are a much > >> > easier case than nfs. > >> > > >> > Frankly if we can figure out how to run the user mode helpers from the > >> > kernel with an appropriate context and not involve userspace I think > >> > that will be better for everyone, as it involves fewer moving parts at > >> > the end of the day. > >> > > >> > >> The only sane approach I can think of in the context of nfs is to fork > >> > >> a kernel thread at mount time that has all of the appropriate context > >> > >> because it was captured from the privileged mounting process, and use > >> > >> that kernel as the equivalent of kthreadd. > >> > >> > >> > >> There may be some intermediate ground where we capture things or we use > >> > >> the init process of the pid namespace (captured at mount time) as our > >> > >> template/reference process. > >> > >> > >> > >> If we are going to set this stuff up in the kernel we need a reference > >> > >> process that we can create children of because what is possible with > >> > >> respect to containers keeps changing, and it is extremely error prone to > >> > >> figure out what all othe crazy little bits are, and to update everything > >> > >> every time someone tweaks the kernel's capabilities. We have kthreadd > >> > >> because it was too error prone to scrub a userspace thread of all of the > >> > >> userspace bits and make it the equivalent of what kthreadd is today. > >> > >> > >> > >> Of course it is also rather nice to have something to hang everything > >> > >> else on. > >> > >> > >> > >> In summary we need a reference struct task that is all setup properly > >> > >> so that we can create an appropriate kernel thread. > >> > > > >> > > I'm having trouble understanding what your getting at here but I'm not > >> > > that sharp so bear with me. > >> > > > >> > > When call_usermodehelper() is called it's called from a process that is > >> > > within the context within which the execution is required. > >> > > >> > No it is not. That is precisely why we have call_usermodehelper instead > >> > of just forking and exec'ing something. The context which triggers the > >> > call can be completely different from where you want to run. > >> > > >> > > So what information do we not have available for setup? > >> > > > >> > > Are you saying that the problem is that when the user mode helper run > >> > > thread is invoked we don't have the information available that was > >> > > present when call_usermodehelper() was called and that's where the > >> > > challenge lies? > >> > > >> > That is part of it. > >> > > >> > However in the context of nfs the correct context is determined at mount > >> > time, and so when call_usermodehelper() is invoked the only link to that > >> > correct context that we have is the nfs super block. > >> > > >> > In a pathological case a userspace application can create a new user > >> > namespace, and a new mount namespace and completely rearrange the > >> > mounts, and deliberately trigger an nfs user mode helper. If we were to > >> > use that applications context it could control which userspace > >> > application the kernel invoked. > >> > >> Then it seems it would never be safe to try to execve /sbin/request-key > >> within the calling process' mnt_ns, so if we want to do some work > >> within that namespace the transition to it needs to happen after the > >> upcall, after /sbin/request-key calls the appropriate helper. > > No. We never want to do work in the calling process's mnt_ns. > We want to work in the mounters mnt_ns which given mount propagation > can be something completely and totally different. > > >> And instead of transitioning into this context before execve, instead > >> pass it along to allow the userspace helper to do setns().. > > > > I still don't see the difference between, given an appropriate pid, > > using the open()/setns() mechanism in kernel similar to nsenter(1). > > > > In that case the setup is done before the first exec rather than between > > the first and second exec (the one in nsenter). Our other approach > > essentially did that but (incorrectly) used the current process pid and > > didn't deal with all the namespace types as nsenter does. > > > > Isn't the real problem getting hold of an appropriate pid to use for the > > open/setns that belongs to a process that's still running? > > What else is needed, Eric? > > Essentially that is the real problem. Getting ahold of the context of > the container in which a filesystem was mounted. > > Now it isn't just namespaces but also cgroups and uids and gids and > capabilities and process groups and sessions. There are a whole host of > little things that need to be right to run a user mode helper, and > in particular to run a user mode helper in a ``container''. And I think this is where the confusion starts because .... As it stands (whether executing within a container or not) the mounting process is mostly long gone when these calls are made and they are executed from the root init context. So doesn't adding a new init context, for example the init context of a container, then using that for the template amount to the same thing within the container? If we accept that then the open/setns is "fairly" straight forward. TBH I'm struggling to see where would be different. Ian