All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ian Kent <raven@themaw.net>
To: Stanislav Kinsbursky <skinsbursky@parallels.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jeff Layton <jlayton@redhat.com>,
	Greg KH <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-nfs@vger.kernel.org, devel@openvz.org, oleg@redhat.com,
	bfields@fieldses.org, bharrosh@panasas.com
Subject: Re: call_usermodehelper in containers
Date: Sat, 13 Feb 2016 07:39:33 +0800	[thread overview]
Message-ID: <1455320373.2890.5.camel@themaw.net> (raw)
In-Reply-To: <52860B6D.4090208@parallels.com>

On Fri, 2013-11-15 at 15:54 +0400, Stanislav Kinsbursky wrote:
> 15.11.2013 15:03, Eric W. Biederman пишет:
> > Stanislav Kinsbursky <skinsbursky@parallels.com> writes:
> > 
> > > 12.11.2013 17:30, Jeff Layton пишет:
> > > > On Tue, 12 Nov 2013 17:02:36 +0400
> > > > Stanislav Kinsbursky <skinsbursky@parallels.com> wrote:
> > > > 
> > > > > 12.11.2013 15:12, Jeff Layton пишет:
> > > > > > On Mon, 11 Nov 2013 16:47:03 -0800
> > > > > > Greg KH <gregkh@linuxfoundation.org> wrote:
> > > > > > 
> > > > > > > On Mon, Nov 11, 2013 at 07:18:25AM -0500, Jeff Layton
> > > > > > > wrote:
> > > > > > > > We have a bit of a problem wrt to upcalls that use
> > > > > > > > call_usermodehelper
> > > > > > > > with containers and I'd like to bring this to some sort
> > > > > > > > of resolution...
> > > > > > > > 
> > > > > > > > A particularly problematic case (though there are
> > > > > > > > others) is the
> > > > > > > > nfsdcltrack upcall. It basically uses
> > > > > > > > call_usermodehelper to run a
> > > > > > > > program in userland to track some information on stable
> > > > > > > > storage for
> > > > > > > > nfsd.
> > > > > > > 
> > > > > > > I thought the discussion at the kernel summit about this
> > > > > > > issue was:
> > > > > > > 	- don't do this.
> > > > > > > 	- don't do it.
> > > > > > > 	- if you really need to do this, fix nfsd
> > > > > > > 
> > > > > > 
> > > > > > Sorry, I couldn't make the kernel summit so I missed that
> > > > > > discussion. I
> > > > > > guess LWN didn't cover it?
> > > > > > 
> > > > > > In any case, I guess then that we'll either have to come up
> > > > > > with some
> > > > > > way to fix nfsd here, or simply ensure that nfsd can never
> > > > > > be started
> > > > > > unless root in the container has a full set of a full set of
> > > > > > capabilities.
> > > > > > 
> > > > > > One sort of Rube Goldberg possibility to fix nfsd is:
> > > > > > 
> > > > > > - when we start nfsd in a container, fork off an extra
> > > > > > kernel thread
> > > > > >      that just sits idle. That thread would need to be a
> > > > > > descendant of the
> > > > > >      userland process that started nfsd, so we'd need to
> > > > > > create it with
> > > > > >      kernel_thread().
> > > > > > 
> > > > > > - Have the kernel just start up the UMH program in the
> > > > > > init_ns mount
> > > > > >      namespace as it currently does, but also pass the pid
> > > > > > of the idle
> > > > > >      kernel thread to the UMH upcall.
> > > > > > 
> > > > > > - The program will then use /proc/<pid>/root and
> > > > > > /proc/<pid>/ns/* to set
> > > > > >      itself up for doing things properly.
> > > > > > 
> > > > > > Note that with this mechanism we can't actually run a
> > > > > > different binary
> > > > > > per container, but that's probably fine for most purposes.
> > > > > > 
> > > > > 
> > > > > Hmmm... Why we can't? We can go a bit further with userspace
> > > > > idea.
> > > > > 
> > > > > We use UMH some very limited number of user programs. For 2,
> > > > > actually:
> > > > > 1) /sbin/nfs_cache_getent
> > > > > 2) /sbin/nfsdcltrack
> > > > > 
> > > > 
> > > > No, the kernel uses them for a lot more than that. Pretty much
> > > > all of
> > > > the keys API upcalls use it. See all of the callers of
> > > > call_usermodehelper. All of them are running user binaries out
> > > > of the
> > > > kernel, and almost all of them are certainly broken wrt
> > > > containers.
> > > > 
> > > > > If we convert them into proxies, which use /proc/<pid>/root
> > > > > and /proc/<pid>/ns/*, this will allow us to lookup the right
> > > > > binary.
> > > > > The only limitation here is presence of this "proxy" binaries
> > > > > on "host".
> > > > > 
> > > > 
> > > > Suppose I spawn my own container as a user, using all of this
> > > > spiffy
> > > > new user namespace stuff. Then I make the kernel use
> > > > call_usermodehelper to call the upcall in the init_ns, and then
> > > > trick
> > > > it into running my new "escape_from_namespace" program with
> > > > "real" root
> > > > privileges.
> > > > 
> > > > I don't think we can reasonably assume that having the kernel
> > > > exec an
> > > > arbitrary binary inside of a container is safe. Doing so inside
> > > > of the
> > > > init_ns is marginally more safe, but only marginally so...
> > > > 
> > > > > And we don't need any significant changes in kernel.
> > > > > 
> > > > > BTW, Jeff, could you remind me, please, why exactly we need to
> > > > > use UMH to run the binary?
> > > > > What are this capabilities, which force us to do so?
> > > > > 
> > > > 
> > > > Nothing _forces_ us to do so, but upcalls are very difficult to
> > > > handle,
> > > > and UMH has a lot of advantages over a long-running daemon
> > > > launched by
> > > > userland.
> > > > 
> > > > Originally, I created the nfsdcltrack upcall as a running daemon
> > > > called
> > > > nfsdcld, and the kernel used rpc_pipefs to communicate with it.
> > > > 
> > > > Everyone hated it because no one likes to have to run daemons
> > > > for
> > > > infrequently used upcalls. It's a pain for users to ensure that
> > > > it's
> > > > running and it's a pain to handle when it isn't. So, I was
> > > > encouraged
> > > > to turn that instead into a UMH upcall.
> > > > 
> > > > But leaving that aside, this problem is a lot larger than just
> > > > nfsd. We
> > > > have a *lot* of UMH upcalls in the kernel, so this problem is
> > > > more
> > > > general than just "fixing" nfsd's.
> > > > 
> > > 
> > > Ok. So we are talking about generic approach to UMH support in a
> > > container (and/or namespace).
> > > 
> > > Actually, as far as I can see, there are more that one aspect,
> > > which is not supported.
> > > One one them is executing of the right binary. Another one is
> > > capabilities (and maybe there are more, like user namespaces), but
> > > I
> > > don't really care about them for now.
> > > Executing the right binary, actually, is not about namespaces at
> > > all. This is about lookup implementation in VFS
> > > (do_execve_common).
> > 
> > 
> > 
> > > Would be great to unshare FS for forked UHM kthread and swap it to
> > > desired root. This will solve the problem with proper lookup.
> > > However,
> > > as far as I understand, this approach is not welcome by the
> > > community.
> > 
> > I don't understand that one.  Having a preforked thread with the
> > proper
> > environment that can act like kthreadd in terms of spawning user
> > mode
> > helpers works and is simple.  The only downside I can see is that
> > there
> > is extra overhead.
> > 
> 
> What do you mean by "simple" here? Simple to implement?
> We already have a preforked thread, called "UMH", used exactly for
> this purpose.

Is there?

Can you explain how the pre-forking happens please?

AFAICS a workqueue is used to run UMH helpers, I can't see any pre
-forking going on there and it doesn't appear to be possible to do
either.

> And, if I'm not mistaken, we are trying to discuss, how to adapt
> existent infrastructure for namespaces, don't we?
> 
> > Beyond that though for the user mode helpers spawned to populate
> > security keys it is not clear which context they should be run in,
> > even if we do have kernel threads.
> > 
> 
> Regardless of the context itself, we need a way to pass it to kernel
> thread and to put kernel thread in this context. Or I'm missing
> something?
> 
> > > This problem, probably, can be solved by constructing full binary
> > > path
> > > (i.e. not in a container, but in kernel thread root context) in
> > > UMH
> > > "init" callack. However, this will help only is the dentry is
> > > accessible from "init" root. Which is usually no true in case on
> > > mount
> > > namespaces, if I understand them right.
> > 
> > You are correct it can not be assumed that what is visible in one
> > mount
> > namespace is visible in another.  And of course in addition to
> > picking
> > the correct binary to run you have to set up a proper environment
> > for
> > that binary to run in.  It may be that it's configuration file is
> > only
> > avaiable at the expected location in the proper mount namespace,
> > even
> > if the binary is available in all of the mount namespaces.
> > 
> 
> Yes, you are right. So, this solution can help only in case of very
> specific and simple "environment-less" programs.
> So, I believe, that we should modify UMH itself to support our needs.
> But I don't see, how to make the idea more pleasant for the community.
> IOW, when I was talking about UMH in NFS implementation on Ksummit,
> Linus's answer was something like "fix NFS".
> And I can't object it, actually, because for now NFS is the only
> corner case.
> 
> Jeff said, that there are a bunch of UMH calls in kernel, but this is
> not solid enough to prove UHM changes, since nobody is trying to use
> them in containers.
> 
> So, I doubt, that we can change UMH generically without additional use
> -cases for 'containerized" UMH.
> 
> > Eric
> > 
> 
> 

  reply	other threads:[~2016-02-12 23:39 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-11 12:18 call_usermodehelper in containers Jeff Layton
2013-11-11 12:43 ` [Devel] " Vasily Kulikov
2013-11-11 13:26   ` Jeff Layton
2013-11-12  0:47 ` Greg KH
2013-11-12 11:12   ` Jeff Layton
2013-11-12 13:02     ` Stanislav Kinsbursky
2013-11-12 13:30       ` Jeff Layton
2013-11-15  5:05         ` Eric W. Biederman
2013-11-15 10:40         ` Stanislav Kinsbursky
2013-11-15 11:03           ` Eric W. Biederman
2013-11-15 11:54             ` Stanislav Kinsbursky
2016-02-12 23:39               ` Ian Kent [this message]
2016-02-13 16:08                 ` Stanislav Kinsburskiy
2016-02-15  0:11                   ` Ian Kent
     [not found]                     ` <1455495082.2941.32.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org>
2016-02-18  3:17                       ` Eric W. Biederman
2016-02-18  3:17                         ` Eric W. Biederman
2013-11-18 17:28             ` Oleg Nesterov
2013-11-18 18:02               ` Oleg Nesterov
2013-11-19 14:51                 ` Jeff Layton
2016-02-11  0:17               ` Ian Kent
     [not found]                 ` <1455149857.2903.9.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org>
2016-02-18  2:57                   ` Eric W. Biederman
2016-02-18  2:57                     ` Eric W. Biederman
     [not found]                     ` <8737sq4teb.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-02-18  3:43                       ` Kamezawa Hiroyuki
2016-02-18  3:43                         ` Kamezawa Hiroyuki
2016-02-18  6:36                         ` Ian Kent
2016-02-18  7:37                           ` Ian Kent
     [not found]                             ` <1455781033.2908.5.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org>
2016-02-18 20:45                               ` Eric W. Biederman
2016-02-18 20:45                                 ` Eric W. Biederman
     [not found]                                 ` <87r3g9ychc.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-02-19  3:08                                   ` Kamezawa Hiroyuki
2016-02-19  3:08                                     ` Kamezawa Hiroyuki
     [not found]                                     ` <56C68714.2000900-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2016-02-19  5:37                                       ` Ian Kent
2016-02-19  5:37                                     ` Ian Kent
     [not found]                                       ` <1455860260.3356.31.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org>
2016-02-19  9:30                                         ` Kamezawa Hiroyuki
2016-02-19  9:30                                           ` Kamezawa Hiroyuki
     [not found]                                           ` <56C6E0A8.3010806-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2016-02-20  3:28                                             ` Ian Kent
2016-02-20  3:28                                               ` Ian Kent
2016-02-19  5:14                                   ` Ian Kent
2016-02-19  5:14                                     ` Ian Kent
2016-02-23  2:55                                     ` Ian Kent
     [not found]                                       ` <1456196130.2911.10.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org>
2016-02-23 14:36                                         ` J. Bruce Fields
2016-02-23 14:36                                       ` J. Bruce Fields
     [not found]                                         ` <20160223143627.GB31951-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2016-02-24  0:55                                           ` Ian Kent
2016-02-24  0:55                                             ` Ian Kent
     [not found]                                     ` <1455858850.3356.19.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org>
2016-02-23  2:55                                       ` Ian Kent
     [not found]                           ` <1455777387.3188.24.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org>
2016-02-18  7:37                             ` Ian Kent
     [not found]                         ` <56C53DE3.1070108-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2016-02-18  6:36                           ` Ian Kent
2016-03-24  7:45               ` Ian Kent
2016-03-25  1:28                 ` Oleg Nesterov
2016-03-25  7:25                   ` Ian Kent

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1455320373.2890.5.camel@themaw.net \
    --to=raven@themaw.net \
    --cc=bfields@fieldses.org \
    --cc=bharrosh@panasas.com \
    --cc=devel@openvz.org \
    --cc=ebiederm@xmission.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jlayton@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=skinsbursky@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.