* call_usermodehelper in containers @ 2013-11-11 12:18 Jeff Layton 2013-11-11 12:43 ` [Devel] " Vasily Kulikov 2013-11-12 0:47 ` Greg KH 0 siblings, 2 replies; 49+ messages in thread From: Jeff Layton @ 2013-11-11 12:18 UTC (permalink / raw) To: linux-kernel, linux-fsdevel, linux-nfs, Stanislav Kinsbursky Cc: devel, ebiederm, oleg, bfields, bharrosh We have a bit of a problem wrt to upcalls that use call_usermodehelper with containers and I'd like to bring this to some sort of resolution... A particularly problematic case (though there are others) is the nfsdcltrack upcall. It basically uses call_usermodehelper to run a program in userland to track some information on stable storage for nfsd. One could envision nfsd running on a machine with several containers, and each would need to track its own database of info on stable storage. When processing RPCs that come into the network address within a certain container, we want to ensure that it tracks this info inside the mount namespace within that container as well. So, we ideally need to ensure that when this upcall is run, that we run the correct binary in the container *and* that it does all of its file I/O within the correct mount namespace. We might also need other namespace swapping as well. Stanislav posted a patch a few months ago that tried to address this: https://lkml.org/lkml/2013/5/22/80 However, it failed to address some problems -- namely that we have to consider the case where a container may be running with reduced capabilities. We want to ensure that the UMH upcall inherits the correct capability set for its container as well. What's the correct approach to fix this? One possibility would be to keep a kernel thread around that sits in the correct namespace(s) and has the right privileges, and then use that to launch UMH programs. That thread could be spawned whenever someone runs rpc.nfsd inside a container. Not very elegant, but it seems like something that would work. Are there better approaches? -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Devel] call_usermodehelper in containers 2013-11-11 12:18 call_usermodehelper in containers Jeff Layton @ 2013-11-11 12:43 ` Vasily Kulikov 2013-11-11 13:26 ` Jeff Layton 2013-11-12 0:47 ` Greg KH 1 sibling, 1 reply; 49+ messages in thread From: Vasily Kulikov @ 2013-11-11 12:43 UTC (permalink / raw) To: Jeff Layton Cc: linux-kernel, linux-fsdevel, linux-nfs, Stanislav Kinsbursky, bfields, bharrosh, devel, oleg Hi Jeff, On Mon, Nov 11, 2013 at 07:18 -0500, Jeff Layton wrote: > What's the correct approach to fix this? One possibility would be to > keep a kernel thread around that sits in the correct namespace(s) and > has the right privileges, and then use that to launch UMH programs. > That thread could be spawned whenever someone runs rpc.nfsd inside a > container. > > Not very elegant, but it seems like something that would work. > > Are there better approaches? What's the reasoning behind this? I mean, it is not very obvious what we should keep here. Compare 2 cases: 1) root process with all caps spawns new ns, then drops some of caps; 2) root process with all caps drops some of his caps and then spawns new ns. >From the programmer's POV both cases are valid and lead to absolutely the same limitations inside of the new namespace. However, from kernel POV they differ -- if save cap set when ns is created then in (1) we'll have cap'ed UMH, in (2) we'll have UMH with only several caps. It might significantly influence on ability of UMH to do its job and ability of this limited ns to escape from the sandbox. So, what semantic should UMH privileges have? Also, an orthogonal addition: you might want to keep only minimum information about capabilities or something -- keep only cap_t field in namespace structure without explicit kernel thread for each ns. When UMH is created, just fill the required caps in it. Thanks, -- Vasily Kulikov http://www.openwall.com - bringing security into open computing environments ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [Devel] call_usermodehelper in containers 2013-11-11 12:43 ` [Devel] " Vasily Kulikov @ 2013-11-11 13:26 ` Jeff Layton 0 siblings, 0 replies; 49+ messages in thread From: Jeff Layton @ 2013-11-11 13:26 UTC (permalink / raw) To: Vasily Kulikov Cc: linux-kernel, linux-fsdevel, linux-nfs, Stanislav Kinsbursky, bfields, bharrosh, devel, oleg On Mon, 11 Nov 2013 16:43:21 +0400 Vasily Kulikov <segoon@openwall.com> wrote: > Hi Jeff, > > On Mon, Nov 11, 2013 at 07:18 -0500, Jeff Layton wrote: > > What's the correct approach to fix this? One possibility would be to > > keep a kernel thread around that sits in the correct namespace(s) and > > has the right privileges, and then use that to launch UMH programs. > > That thread could be spawned whenever someone runs rpc.nfsd inside a > > container. > > > > Not very elegant, but it seems like something that would work. > > > > Are there better approaches? > > What's the reasoning behind this? I mean, it is not very obvious what > we should keep here. Compare 2 cases: > > 1) root process with all caps spawns new ns, then drops some of caps; > > 2) root process with all caps drops some of his caps and then spawns new ns. > > From the programmer's POV both cases are valid and lead to absolutely > the same limitations inside of the new namespace. However, from kernel > POV they differ -- if save cap set when ns is created then in (1) we'll > have cap'ed UMH, in (2) we'll have UMH with only several caps. It might > significantly influence on ability of UMH to do its job and ability of > this limited ns to escape from the sandbox. > > So, what semantic should UMH privileges have? > > > Also, an orthogonal addition: you might want to keep only minimum > information about capabilities or something -- keep only cap_t field in > namespace structure without explicit kernel thread for each ns. When UMH is > created, just fill the required caps in it. > > Thanks, > I don't have a particular affinity for either approach. I think from a safety POV, it's less error prone to spawn the UMH programs as a descendent of the userland process that originally started up nfsd. That way we ensure that it inherits all of the limitations of the original task. If we try instead to start with a process that is running with all capabilities and then drop some of them selectively, then it seems quite possible to miss something. But...namespaces aren't really my "thing", and I might have the terminology wrong or be missing some other crux of the issue. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2013-11-11 12:18 call_usermodehelper in containers Jeff Layton 2013-11-11 12:43 ` [Devel] " Vasily Kulikov @ 2013-11-12 0:47 ` Greg KH 2013-11-12 11:12 ` Jeff Layton 1 sibling, 1 reply; 49+ messages in thread From: Greg KH @ 2013-11-12 0:47 UTC (permalink / raw) To: Jeff Layton Cc: linux-kernel, linux-fsdevel, linux-nfs, Stanislav Kinsbursky, devel, ebiederm, oleg, bfields, bharrosh On Mon, Nov 11, 2013 at 07:18:25AM -0500, Jeff Layton wrote: > We have a bit of a problem wrt to upcalls that use call_usermodehelper > with containers and I'd like to bring this to some sort of resolution... > > A particularly problematic case (though there are others) is the > nfsdcltrack upcall. It basically uses call_usermodehelper to run a > program in userland to track some information on stable storage for > nfsd. I thought the discussion at the kernel summit about this issue was: - don't do this. - don't do it. - if you really need to do this, fix nfsd thanks, greg k-h ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2013-11-12 0:47 ` Greg KH @ 2013-11-12 11:12 ` Jeff Layton 2013-11-12 13:02 ` Stanislav Kinsbursky 0 siblings, 1 reply; 49+ messages in thread From: Jeff Layton @ 2013-11-12 11:12 UTC (permalink / raw) To: Greg KH Cc: linux-kernel, linux-fsdevel, linux-nfs, Stanislav Kinsbursky, devel, ebiederm, oleg, bfields, bharrosh On Mon, 11 Nov 2013 16:47:03 -0800 Greg KH <gregkh@linuxfoundation.org> wrote: > On Mon, Nov 11, 2013 at 07:18:25AM -0500, Jeff Layton wrote: > > We have a bit of a problem wrt to upcalls that use call_usermodehelper > > with containers and I'd like to bring this to some sort of resolution... > > > > A particularly problematic case (though there are others) is the > > nfsdcltrack upcall. It basically uses call_usermodehelper to run a > > program in userland to track some information on stable storage for > > nfsd. > > I thought the discussion at the kernel summit about this issue was: > - don't do this. > - don't do it. > - if you really need to do this, fix nfsd > Sorry, I couldn't make the kernel summit so I missed that discussion. I guess LWN didn't cover it? In any case, I guess then that we'll either have to come up with some way to fix nfsd here, or simply ensure that nfsd can never be started unless root in the container has a full set of a full set of capabilities. One sort of Rube Goldberg possibility to fix nfsd is: - when we start nfsd in a container, fork off an extra kernel thread that just sits idle. That thread would need to be a descendant of the userland process that started nfsd, so we'd need to create it with kernel_thread(). - Have the kernel just start up the UMH program in the init_ns mount namespace as it currently does, but also pass the pid of the idle kernel thread to the UMH upcall. - The program will then use /proc/<pid>/root and /proc/<pid>/ns/* to set itself up for doing things properly. Note that with this mechanism we can't actually run a different binary per container, but that's probably fine for most purposes. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2013-11-12 11:12 ` Jeff Layton @ 2013-11-12 13:02 ` Stanislav Kinsbursky 2013-11-12 13:30 ` Jeff Layton 0 siblings, 1 reply; 49+ messages in thread From: Stanislav Kinsbursky @ 2013-11-12 13:02 UTC (permalink / raw) To: Jeff Layton, Greg KH Cc: linux-kernel, linux-fsdevel, linux-nfs, devel, ebiederm, oleg, bfields, bharrosh 12.11.2013 15:12, Jeff Layton пишет: > On Mon, 11 Nov 2013 16:47:03 -0800 > Greg KH <gregkh@linuxfoundation.org> wrote: > >> On Mon, Nov 11, 2013 at 07:18:25AM -0500, Jeff Layton wrote: >>> We have a bit of a problem wrt to upcalls that use call_usermodehelper >>> with containers and I'd like to bring this to some sort of resolution... >>> >>> A particularly problematic case (though there are others) is the >>> nfsdcltrack upcall. It basically uses call_usermodehelper to run a >>> program in userland to track some information on stable storage for >>> nfsd. >> >> I thought the discussion at the kernel summit about this issue was: >> - don't do this. >> - don't do it. >> - if you really need to do this, fix nfsd >> > > Sorry, I couldn't make the kernel summit so I missed that discussion. I > guess LWN didn't cover it? > > In any case, I guess then that we'll either have to come up with some > way to fix nfsd here, or simply ensure that nfsd can never be started > unless root in the container has a full set of a full set of > capabilities. > > One sort of Rube Goldberg possibility to fix nfsd is: > > - when we start nfsd in a container, fork off an extra kernel thread > that just sits idle. That thread would need to be a descendant of the > userland process that started nfsd, so we'd need to create it with > kernel_thread(). > > - Have the kernel just start up the UMH program in the init_ns mount > namespace as it currently does, but also pass the pid of the idle > kernel thread to the UMH upcall. > > - The program will then use /proc/<pid>/root and /proc/<pid>/ns/* to set > itself up for doing things properly. > > Note that with this mechanism we can't actually run a different binary > per container, but that's probably fine for most purposes. > Hmmm... Why we can't? We can go a bit further with userspace idea. We use UMH some very limited number of user programs. For 2, actually: 1) /sbin/nfs_cache_getent 2) /sbin/nfsdcltrack If we convert them into proxies, which use /proc/<pid>/root and /proc/<pid>/ns/*, this will allow us to lookup the right binary. The only limitation here is presence of this "proxy" binaries on "host". And we don't need any significant changes in kernel. BTW, Jeff, could you remind me, please, why exactly we need to use UMH to run the binary? What are this capabilities, which force us to do so? -- Best regards, Stanislav Kinsbursky ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2013-11-12 13:02 ` Stanislav Kinsbursky @ 2013-11-12 13:30 ` Jeff Layton 2013-11-15 5:05 ` Eric W. Biederman 2013-11-15 10:40 ` Stanislav Kinsbursky 0 siblings, 2 replies; 49+ messages in thread From: Jeff Layton @ 2013-11-12 13:30 UTC (permalink / raw) To: Stanislav Kinsbursky Cc: Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, ebiederm, oleg, bfields, bharrosh On Tue, 12 Nov 2013 17:02:36 +0400 Stanislav Kinsbursky <skinsbursky@parallels.com> wrote: > 12.11.2013 15:12, Jeff Layton пишет: > > On Mon, 11 Nov 2013 16:47:03 -0800 > > Greg KH <gregkh@linuxfoundation.org> wrote: > > > >> On Mon, Nov 11, 2013 at 07:18:25AM -0500, Jeff Layton wrote: > >>> We have a bit of a problem wrt to upcalls that use call_usermodehelper > >>> with containers and I'd like to bring this to some sort of resolution... > >>> > >>> A particularly problematic case (though there are others) is the > >>> nfsdcltrack upcall. It basically uses call_usermodehelper to run a > >>> program in userland to track some information on stable storage for > >>> nfsd. > >> > >> I thought the discussion at the kernel summit about this issue was: > >> - don't do this. > >> - don't do it. > >> - if you really need to do this, fix nfsd > >> > > > > Sorry, I couldn't make the kernel summit so I missed that discussion. I > > guess LWN didn't cover it? > > > > In any case, I guess then that we'll either have to come up with some > > way to fix nfsd here, or simply ensure that nfsd can never be started > > unless root in the container has a full set of a full set of > > capabilities. > > > > One sort of Rube Goldberg possibility to fix nfsd is: > > > > - when we start nfsd in a container, fork off an extra kernel thread > > that just sits idle. That thread would need to be a descendant of the > > userland process that started nfsd, so we'd need to create it with > > kernel_thread(). > > > > - Have the kernel just start up the UMH program in the init_ns mount > > namespace as it currently does, but also pass the pid of the idle > > kernel thread to the UMH upcall. > > > > - The program will then use /proc/<pid>/root and /proc/<pid>/ns/* to set > > itself up for doing things properly. > > > > Note that with this mechanism we can't actually run a different binary > > per container, but that's probably fine for most purposes. > > > > Hmmm... Why we can't? We can go a bit further with userspace idea. > > We use UMH some very limited number of user programs. For 2, actually: > 1) /sbin/nfs_cache_getent > 2) /sbin/nfsdcltrack > No, the kernel uses them for a lot more than that. Pretty much all of the keys API upcalls use it. See all of the callers of call_usermodehelper. All of them are running user binaries out of the kernel, and almost all of them are certainly broken wrt containers. > If we convert them into proxies, which use /proc/<pid>/root and /proc/<pid>/ns/*, this will allow us to lookup the right binary. > The only limitation here is presence of this "proxy" binaries on "host". > Suppose I spawn my own container as a user, using all of this spiffy new user namespace stuff. Then I make the kernel use call_usermodehelper to call the upcall in the init_ns, and then trick it into running my new "escape_from_namespace" program with "real" root privileges. I don't think we can reasonably assume that having the kernel exec an arbitrary binary inside of a container is safe. Doing so inside of the init_ns is marginally more safe, but only marginally so... > And we don't need any significant changes in kernel. > > BTW, Jeff, could you remind me, please, why exactly we need to use UMH to run the binary? > What are this capabilities, which force us to do so? > Nothing _forces_ us to do so, but upcalls are very difficult to handle, and UMH has a lot of advantages over a long-running daemon launched by userland. Originally, I created the nfsdcltrack upcall as a running daemon called nfsdcld, and the kernel used rpc_pipefs to communicate with it. Everyone hated it because no one likes to have to run daemons for infrequently used upcalls. It's a pain for users to ensure that it's running and it's a pain to handle when it isn't. So, I was encouraged to turn that instead into a UMH upcall. But leaving that aside, this problem is a lot larger than just nfsd. We have a *lot* of UMH upcalls in the kernel, so this problem is more general than just "fixing" nfsd's. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2013-11-12 13:30 ` Jeff Layton @ 2013-11-15 5:05 ` Eric W. Biederman 2013-11-15 10:40 ` Stanislav Kinsbursky 1 sibling, 0 replies; 49+ messages in thread From: Eric W. Biederman @ 2013-11-15 5:05 UTC (permalink / raw) To: Jeff Layton Cc: Stanislav Kinsbursky, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, oleg, bfields, bharrosh Jeff Layton <jlayton@redhat.com> writes: > On Tue, 12 Nov 2013 17:02:36 +0400 > Stanislav Kinsbursky <skinsbursky@parallels.com> wrote: > >> 12.11.2013 15:12, Jeff Layton пишет: >> > On Mon, 11 Nov 2013 16:47:03 -0800 >> > Greg KH <gregkh@linuxfoundation.org> wrote: >> > >> >> On Mon, Nov 11, 2013 at 07:18:25AM -0500, Jeff Layton wrote: >> >>> We have a bit of a problem wrt to upcalls that use call_usermodehelper >> >>> with containers and I'd like to bring this to some sort of resolution... >> >>> >> >>> A particularly problematic case (though there are others) is the >> >>> nfsdcltrack upcall. It basically uses call_usermodehelper to run a >> >>> program in userland to track some information on stable storage for >> >>> nfsd. >> >> >> >> I thought the discussion at the kernel summit about this issue was: >> >> - don't do this. >> >> - don't do it. >> >> - if you really need to do this, fix nfsd >> >> >> > >> > Sorry, I couldn't make the kernel summit so I missed that discussion. I >> > guess LWN didn't cover it? >> > >> > In any case, I guess then that we'll either have to come up with some >> > way to fix nfsd here, or simply ensure that nfsd can never be started >> > unless root in the container has a full set of a full set of >> > capabilities. >> > >> > One sort of Rube Goldberg possibility to fix nfsd is: >> > >> > - when we start nfsd in a container, fork off an extra kernel thread >> > that just sits idle. That thread would need to be a descendant of the >> > userland process that started nfsd, so we'd need to create it with >> > kernel_thread(). >> > >> > - Have the kernel just start up the UMH program in the init_ns mount >> > namespace as it currently does, but also pass the pid of the idle >> > kernel thread to the UMH upcall. >> > >> > - The program will then use /proc/<pid>/root and /proc/<pid>/ns/* to set >> > itself up for doing things properly. >> > >> > Note that with this mechanism we can't actually run a different binary >> > per container, but that's probably fine for most purposes. >> > >> >> Hmmm... Why we can't? We can go a bit further with userspace idea. >> >> We use UMH some very limited number of user programs. For 2, actually: >> 1) /sbin/nfs_cache_getent >> 2) /sbin/nfsdcltrack >> > > No, the kernel uses them for a lot more than that. Pretty much all of > the keys API upcalls use it. See all of the callers of > call_usermodehelper. All of them are running user binaries out of the > kernel, and almost all of them are certainly broken wrt containers. Broken in the sense that we don't run them in the container yes. I tried using the keys api for the uid mapping of containers and I wound up being very disappointed because for testing/debugging I could never flush any result I had ever returned a key for. Which rather soured me on the real-world usability of the key based user mode helpers. Perhaps I was doing it wrong but it seemed like a very brittle interface, that was intollerant of human failures. >> If we convert them into proxies, which use /proc/<pid>/root and /proc/<pid>/ns/*, this will allow us to lookup the right binary. >> The only limitation here is presence of this "proxy" binaries on "host". >> > > Suppose I spawn my own container as a user, using all of this spiffy > new user namespace stuff. Then I make the kernel use > call_usermodehelper to call the upcall in the init_ns, and then trick > it into running my new "escape_from_namespace" program with "real" root > privileges. > > I don't think we can reasonably assume that having the kernel exec an > arbitrary binary inside of a container is safe. Doing so inside of the > init_ns is marginally more safe, but only marginally so... One thing we have done with the core dump helper is because there is enough information to know the namespaces of the program dumping core have the root owned and installed helper use setns to get inside the namespaces so we can have a per namespace core dump policy. If we can provide enough context to the other helpers that is probably the easiest way to go. The question is can we truly pass enough state. >> And we don't need any significant changes in kernel. >> >> BTW, Jeff, could you remind me, please, why exactly we need to use UMH to run the binary? >> What are this capabilities, which force us to do so? >> > > Nothing _forces_ us to do so, but upcalls are very difficult to handle, > and UMH has a lot of advantages over a long-running daemon launched by > userland. > > Originally, I created the nfsdcltrack upcall as a running daemon called > nfsdcld, and the kernel used rpc_pipefs to communicate with it. > > Everyone hated it because no one likes to have to run daemons for > infrequently used upcalls. It's a pain for users to ensure that it's > running and it's a pain to handle when it isn't. So, I was encouraged > to turn that instead into a UMH upcall. > > But leaving that aside, this problem is a lot larger than just nfsd. We > have a *lot* of UMH upcalls in the kernel, so this problem is more > general than just "fixing" nfsd's. Yes. So far I don't think we can trigger any of these upcalls from inside of a user namespace so we aren't in trouble yet. But it is definitely worth looking at. Because we are basically one person scratching their itch to get some feature working from being at the point where nfs or something else that uses the upcalls needs the support. Eric ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2013-11-12 13:30 ` Jeff Layton 2013-11-15 5:05 ` Eric W. Biederman @ 2013-11-15 10:40 ` Stanislav Kinsbursky 2013-11-15 11:03 ` Eric W. Biederman 1 sibling, 1 reply; 49+ messages in thread From: Stanislav Kinsbursky @ 2013-11-15 10:40 UTC (permalink / raw) To: Jeff Layton Cc: Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, ebiederm, oleg, bfields, bharrosh 12.11.2013 17:30, Jeff Layton пишет: > On Tue, 12 Nov 2013 17:02:36 +0400 > Stanislav Kinsbursky <skinsbursky@parallels.com> wrote: > >> 12.11.2013 15:12, Jeff Layton пишет: >>> On Mon, 11 Nov 2013 16:47:03 -0800 >>> Greg KH <gregkh@linuxfoundation.org> wrote: >>> >>>> On Mon, Nov 11, 2013 at 07:18:25AM -0500, Jeff Layton wrote: >>>>> We have a bit of a problem wrt to upcalls that use call_usermodehelper >>>>> with containers and I'd like to bring this to some sort of resolution... >>>>> >>>>> A particularly problematic case (though there are others) is the >>>>> nfsdcltrack upcall. It basically uses call_usermodehelper to run a >>>>> program in userland to track some information on stable storage for >>>>> nfsd. >>>> >>>> I thought the discussion at the kernel summit about this issue was: >>>> - don't do this. >>>> - don't do it. >>>> - if you really need to do this, fix nfsd >>>> >>> >>> Sorry, I couldn't make the kernel summit so I missed that discussion. I >>> guess LWN didn't cover it? >>> >>> In any case, I guess then that we'll either have to come up with some >>> way to fix nfsd here, or simply ensure that nfsd can never be started >>> unless root in the container has a full set of a full set of >>> capabilities. >>> >>> One sort of Rube Goldberg possibility to fix nfsd is: >>> >>> - when we start nfsd in a container, fork off an extra kernel thread >>> that just sits idle. That thread would need to be a descendant of the >>> userland process that started nfsd, so we'd need to create it with >>> kernel_thread(). >>> >>> - Have the kernel just start up the UMH program in the init_ns mount >>> namespace as it currently does, but also pass the pid of the idle >>> kernel thread to the UMH upcall. >>> >>> - The program will then use /proc/<pid>/root and /proc/<pid>/ns/* to set >>> itself up for doing things properly. >>> >>> Note that with this mechanism we can't actually run a different binary >>> per container, but that's probably fine for most purposes. >>> >> >> Hmmm... Why we can't? We can go a bit further with userspace idea. >> >> We use UMH some very limited number of user programs. For 2, actually: >> 1) /sbin/nfs_cache_getent >> 2) /sbin/nfsdcltrack >> > > No, the kernel uses them for a lot more than that. Pretty much all of > the keys API upcalls use it. See all of the callers of > call_usermodehelper. All of them are running user binaries out of the > kernel, and almost all of them are certainly broken wrt containers. > >> If we convert them into proxies, which use /proc/<pid>/root and /proc/<pid>/ns/*, this will allow us to lookup the right binary. >> The only limitation here is presence of this "proxy" binaries on "host". >> > > Suppose I spawn my own container as a user, using all of this spiffy > new user namespace stuff. Then I make the kernel use > call_usermodehelper to call the upcall in the init_ns, and then trick > it into running my new "escape_from_namespace" program with "real" root > privileges. > > I don't think we can reasonably assume that having the kernel exec an > arbitrary binary inside of a container is safe. Doing so inside of the > init_ns is marginally more safe, but only marginally so... > >> And we don't need any significant changes in kernel. >> >> BTW, Jeff, could you remind me, please, why exactly we need to use UMH to run the binary? >> What are this capabilities, which force us to do so? >> > > Nothing _forces_ us to do so, but upcalls are very difficult to handle, > and UMH has a lot of advantages over a long-running daemon launched by > userland. > > Originally, I created the nfsdcltrack upcall as a running daemon called > nfsdcld, and the kernel used rpc_pipefs to communicate with it. > > Everyone hated it because no one likes to have to run daemons for > infrequently used upcalls. It's a pain for users to ensure that it's > running and it's a pain to handle when it isn't. So, I was encouraged > to turn that instead into a UMH upcall. > > But leaving that aside, this problem is a lot larger than just nfsd. We > have a *lot* of UMH upcalls in the kernel, so this problem is more > general than just "fixing" nfsd's. > Ok. So we are talking about generic approach to UMH support in a container (and/or namespace). Actually, as far as I can see, there are more that one aspect, which is not supported. One one them is executing of the right binary. Another one is capabilities (and maybe there are more, like user namespaces), but I don't really care about them for now. Executing the right binary, actually, is not about namespaces at all. This is about lookup implementation in VFS (do_execve_common). Would be great to unshare FS for forked UHM kthread and swap it to desired root. This will solve the problem with proper lookup. However, as far as I understand, this approach is not welcome by the community. This problem, probably, can be solved by constructing full binary path (i.e. not in a container, but in kernel thread root context) in UMH "init" callack. However, this will help only is the dentry is accessible from "init" root. Which is usually no true in case on mount namespaces, if I understand them right. -- Best regards, Stanislav Kinsbursky ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2013-11-15 10:40 ` Stanislav Kinsbursky @ 2013-11-15 11:03 ` Eric W. Biederman 2013-11-15 11:54 ` Stanislav Kinsbursky 2013-11-18 17:28 ` Oleg Nesterov 0 siblings, 2 replies; 49+ messages in thread From: Eric W. Biederman @ 2013-11-15 11:03 UTC (permalink / raw) To: Stanislav Kinsbursky Cc: Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, oleg, bfields, bharrosh Stanislav Kinsbursky <skinsbursky@parallels.com> writes: > 12.11.2013 17:30, Jeff Layton пишет: >> On Tue, 12 Nov 2013 17:02:36 +0400 >> Stanislav Kinsbursky <skinsbursky@parallels.com> wrote: >> >>> 12.11.2013 15:12, Jeff Layton пишет: >>>> On Mon, 11 Nov 2013 16:47:03 -0800 >>>> Greg KH <gregkh@linuxfoundation.org> wrote: >>>> >>>>> On Mon, Nov 11, 2013 at 07:18:25AM -0500, Jeff Layton wrote: >>>>>> We have a bit of a problem wrt to upcalls that use call_usermodehelper >>>>>> with containers and I'd like to bring this to some sort of resolution... >>>>>> >>>>>> A particularly problematic case (though there are others) is the >>>>>> nfsdcltrack upcall. It basically uses call_usermodehelper to run a >>>>>> program in userland to track some information on stable storage for >>>>>> nfsd. >>>>> >>>>> I thought the discussion at the kernel summit about this issue was: >>>>> - don't do this. >>>>> - don't do it. >>>>> - if you really need to do this, fix nfsd >>>>> >>>> >>>> Sorry, I couldn't make the kernel summit so I missed that discussion. I >>>> guess LWN didn't cover it? >>>> >>>> In any case, I guess then that we'll either have to come up with some >>>> way to fix nfsd here, or simply ensure that nfsd can never be started >>>> unless root in the container has a full set of a full set of >>>> capabilities. >>>> >>>> One sort of Rube Goldberg possibility to fix nfsd is: >>>> >>>> - when we start nfsd in a container, fork off an extra kernel thread >>>> that just sits idle. That thread would need to be a descendant of the >>>> userland process that started nfsd, so we'd need to create it with >>>> kernel_thread(). >>>> >>>> - Have the kernel just start up the UMH program in the init_ns mount >>>> namespace as it currently does, but also pass the pid of the idle >>>> kernel thread to the UMH upcall. >>>> >>>> - The program will then use /proc/<pid>/root and /proc/<pid>/ns/* to set >>>> itself up for doing things properly. >>>> >>>> Note that with this mechanism we can't actually run a different binary >>>> per container, but that's probably fine for most purposes. >>>> >>> >>> Hmmm... Why we can't? We can go a bit further with userspace idea. >>> >>> We use UMH some very limited number of user programs. For 2, actually: >>> 1) /sbin/nfs_cache_getent >>> 2) /sbin/nfsdcltrack >>> >> >> No, the kernel uses them for a lot more than that. Pretty much all of >> the keys API upcalls use it. See all of the callers of >> call_usermodehelper. All of them are running user binaries out of the >> kernel, and almost all of them are certainly broken wrt containers. >> >>> If we convert them into proxies, which use /proc/<pid>/root and /proc/<pid>/ns/*, this will allow us to lookup the right binary. >>> The only limitation here is presence of this "proxy" binaries on "host". >>> >> >> Suppose I spawn my own container as a user, using all of this spiffy >> new user namespace stuff. Then I make the kernel use >> call_usermodehelper to call the upcall in the init_ns, and then trick >> it into running my new "escape_from_namespace" program with "real" root >> privileges. >> >> I don't think we can reasonably assume that having the kernel exec an >> arbitrary binary inside of a container is safe. Doing so inside of the >> init_ns is marginally more safe, but only marginally so... >> >>> And we don't need any significant changes in kernel. >>> >>> BTW, Jeff, could you remind me, please, why exactly we need to use UMH to run the binary? >>> What are this capabilities, which force us to do so? >>> >> >> Nothing _forces_ us to do so, but upcalls are very difficult to handle, >> and UMH has a lot of advantages over a long-running daemon launched by >> userland. >> >> Originally, I created the nfsdcltrack upcall as a running daemon called >> nfsdcld, and the kernel used rpc_pipefs to communicate with it. >> >> Everyone hated it because no one likes to have to run daemons for >> infrequently used upcalls. It's a pain for users to ensure that it's >> running and it's a pain to handle when it isn't. So, I was encouraged >> to turn that instead into a UMH upcall. >> >> But leaving that aside, this problem is a lot larger than just nfsd. We >> have a *lot* of UMH upcalls in the kernel, so this problem is more >> general than just "fixing" nfsd's. >> > > Ok. So we are talking about generic approach to UMH support in a container (and/or namespace). > > Actually, as far as I can see, there are more that one aspect, which is not supported. > One one them is executing of the right binary. Another one is > capabilities (and maybe there are more, like user namespaces), but I > don't really care about them for now. > Executing the right binary, actually, is not about namespaces at all. This is about lookup implementation in VFS (do_execve_common). > Would be great to unshare FS for forked UHM kthread and swap it to > desired root. This will solve the problem with proper lookup. However, > as far as I understand, this approach is not welcome by the community. I don't understand that one. Having a preforked thread with the proper environment that can act like kthreadd in terms of spawning user mode helpers works and is simple. The only downside I can see is that there is extra overhead. Beyond that though for the user mode helpers spawned to populate security keys it is not clear which context they should be run in, even if we do have kernel threads. > This problem, probably, can be solved by constructing full binary path > (i.e. not in a container, but in kernel thread root context) in UMH > "init" callack. However, this will help only is the dentry is > accessible from "init" root. Which is usually no true in case on mount > namespaces, if I understand them right. You are correct it can not be assumed that what is visible in one mount namespace is visible in another. And of course in addition to picking the correct binary to run you have to set up a proper environment for that binary to run in. It may be that it's configuration file is only avaiable at the expected location in the proper mount namespace, even if the binary is available in all of the mount namespaces. Eric ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2013-11-15 11:03 ` Eric W. Biederman @ 2013-11-15 11:54 ` Stanislav Kinsbursky 2016-02-12 23:39 ` Ian Kent 2013-11-18 17:28 ` Oleg Nesterov 1 sibling, 1 reply; 49+ messages in thread From: Stanislav Kinsbursky @ 2013-11-15 11:54 UTC (permalink / raw) To: Eric W. Biederman Cc: Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, oleg, bfields, bharrosh 15.11.2013 15:03, Eric W. Biederman пишет: > Stanislav Kinsbursky <skinsbursky@parallels.com> writes: > >> 12.11.2013 17:30, Jeff Layton пишет: >>> On Tue, 12 Nov 2013 17:02:36 +0400 >>> Stanislav Kinsbursky <skinsbursky@parallels.com> wrote: >>> >>>> 12.11.2013 15:12, Jeff Layton пишет: >>>>> On Mon, 11 Nov 2013 16:47:03 -0800 >>>>> Greg KH <gregkh@linuxfoundation.org> wrote: >>>>> >>>>>> On Mon, Nov 11, 2013 at 07:18:25AM -0500, Jeff Layton wrote: >>>>>>> We have a bit of a problem wrt to upcalls that use call_usermodehelper >>>>>>> with containers and I'd like to bring this to some sort of resolution... >>>>>>> >>>>>>> A particularly problematic case (though there are others) is the >>>>>>> nfsdcltrack upcall. It basically uses call_usermodehelper to run a >>>>>>> program in userland to track some information on stable storage for >>>>>>> nfsd. >>>>>> >>>>>> I thought the discussion at the kernel summit about this issue was: >>>>>> - don't do this. >>>>>> - don't do it. >>>>>> - if you really need to do this, fix nfsd >>>>>> >>>>> >>>>> Sorry, I couldn't make the kernel summit so I missed that discussion. I >>>>> guess LWN didn't cover it? >>>>> >>>>> In any case, I guess then that we'll either have to come up with some >>>>> way to fix nfsd here, or simply ensure that nfsd can never be started >>>>> unless root in the container has a full set of a full set of >>>>> capabilities. >>>>> >>>>> One sort of Rube Goldberg possibility to fix nfsd is: >>>>> >>>>> - when we start nfsd in a container, fork off an extra kernel thread >>>>> that just sits idle. That thread would need to be a descendant of the >>>>> userland process that started nfsd, so we'd need to create it with >>>>> kernel_thread(). >>>>> >>>>> - Have the kernel just start up the UMH program in the init_ns mount >>>>> namespace as it currently does, but also pass the pid of the idle >>>>> kernel thread to the UMH upcall. >>>>> >>>>> - The program will then use /proc/<pid>/root and /proc/<pid>/ns/* to set >>>>> itself up for doing things properly. >>>>> >>>>> Note that with this mechanism we can't actually run a different binary >>>>> per container, but that's probably fine for most purposes. >>>>> >>>> >>>> Hmmm... Why we can't? We can go a bit further with userspace idea. >>>> >>>> We use UMH some very limited number of user programs. For 2, actually: >>>> 1) /sbin/nfs_cache_getent >>>> 2) /sbin/nfsdcltrack >>>> >>> >>> No, the kernel uses them for a lot more than that. Pretty much all of >>> the keys API upcalls use it. See all of the callers of >>> call_usermodehelper. All of them are running user binaries out of the >>> kernel, and almost all of them are certainly broken wrt containers. >>> >>>> If we convert them into proxies, which use /proc/<pid>/root and /proc/<pid>/ns/*, this will allow us to lookup the right binary. >>>> The only limitation here is presence of this "proxy" binaries on "host". >>>> >>> >>> Suppose I spawn my own container as a user, using all of this spiffy >>> new user namespace stuff. Then I make the kernel use >>> call_usermodehelper to call the upcall in the init_ns, and then trick >>> it into running my new "escape_from_namespace" program with "real" root >>> privileges. >>> >>> I don't think we can reasonably assume that having the kernel exec an >>> arbitrary binary inside of a container is safe. Doing so inside of the >>> init_ns is marginally more safe, but only marginally so... >>> >>>> And we don't need any significant changes in kernel. >>>> >>>> BTW, Jeff, could you remind me, please, why exactly we need to use UMH to run the binary? >>>> What are this capabilities, which force us to do so? >>>> >>> >>> Nothing _forces_ us to do so, but upcalls are very difficult to handle, >>> and UMH has a lot of advantages over a long-running daemon launched by >>> userland. >>> >>> Originally, I created the nfsdcltrack upcall as a running daemon called >>> nfsdcld, and the kernel used rpc_pipefs to communicate with it. >>> >>> Everyone hated it because no one likes to have to run daemons for >>> infrequently used upcalls. It's a pain for users to ensure that it's >>> running and it's a pain to handle when it isn't. So, I was encouraged >>> to turn that instead into a UMH upcall. >>> >>> But leaving that aside, this problem is a lot larger than just nfsd. We >>> have a *lot* of UMH upcalls in the kernel, so this problem is more >>> general than just "fixing" nfsd's. >>> >> >> Ok. So we are talking about generic approach to UMH support in a container (and/or namespace). >> >> Actually, as far as I can see, there are more that one aspect, which is not supported. >> One one them is executing of the right binary. Another one is >> capabilities (and maybe there are more, like user namespaces), but I >> don't really care about them for now. >> Executing the right binary, actually, is not about namespaces at all. This is about lookup implementation in VFS (do_execve_common). > > > >> Would be great to unshare FS for forked UHM kthread and swap it to >> desired root. This will solve the problem with proper lookup. However, >> as far as I understand, this approach is not welcome by the community. > > I don't understand that one. Having a preforked thread with the proper > environment that can act like kthreadd in terms of spawning user mode > helpers works and is simple. The only downside I can see is that there > is extra overhead. > What do you mean by "simple" here? Simple to implement? We already have a preforked thread, called "UMH", used exactly for this purpose. And, if I'm not mistaken, we are trying to discuss, how to adapt existent infrastructure for namespaces, don't we? > Beyond that though for the user mode helpers spawned to populate > security keys it is not clear which context they should be run in, > even if we do have kernel threads. > Regardless of the context itself, we need a way to pass it to kernel thread and to put kernel thread in this context. Or I'm missing something? >> This problem, probably, can be solved by constructing full binary path >> (i.e. not in a container, but in kernel thread root context) in UMH >> "init" callack. However, this will help only is the dentry is >> accessible from "init" root. Which is usually no true in case on mount >> namespaces, if I understand them right. > > You are correct it can not be assumed that what is visible in one mount > namespace is visible in another. And of course in addition to picking > the correct binary to run you have to set up a proper environment for > that binary to run in. It may be that it's configuration file is only > avaiable at the expected location in the proper mount namespace, even > if the binary is available in all of the mount namespaces. > Yes, you are right. So, this solution can help only in case of very specific and simple "environment-less" programs. So, I believe, that we should modify UMH itself to support our needs. But I don't see, how to make the idea more pleasant for the community. IOW, when I was talking about UMH in NFS implementation on Ksummit, Linus's answer was something like "fix NFS". And I can't object it, actually, because for now NFS is the only corner case. Jeff said, that there are a bunch of UMH calls in kernel, but this is not solid enough to prove UHM changes, since nobody is trying to use them in containers. So, I doubt, that we can change UMH generically without additional use-cases for 'containerized" UMH. > Eric > -- Best regards, Stanislav Kinsbursky ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2013-11-15 11:54 ` Stanislav Kinsbursky @ 2016-02-12 23:39 ` Ian Kent 2016-02-13 16:08 ` Stanislav Kinsburskiy 0 siblings, 1 reply; 49+ messages in thread From: Ian Kent @ 2016-02-12 23:39 UTC (permalink / raw) To: Stanislav Kinsbursky, Eric W. Biederman Cc: Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, oleg, bfields, bharrosh On Fri, 2013-11-15 at 15:54 +0400, Stanislav Kinsbursky wrote: > 15.11.2013 15:03, Eric W. Biederman пишет: > > Stanislav Kinsbursky <skinsbursky@parallels.com> writes: > > > > > 12.11.2013 17:30, Jeff Layton пишет: > > > > On Tue, 12 Nov 2013 17:02:36 +0400 > > > > Stanislav Kinsbursky <skinsbursky@parallels.com> wrote: > > > > > > > > > 12.11.2013 15:12, Jeff Layton пишет: > > > > > > On Mon, 11 Nov 2013 16:47:03 -0800 > > > > > > Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > On Mon, Nov 11, 2013 at 07:18:25AM -0500, Jeff Layton > > > > > > > wrote: > > > > > > > > We have a bit of a problem wrt to upcalls that use > > > > > > > > call_usermodehelper > > > > > > > > with containers and I'd like to bring this to some sort > > > > > > > > of resolution... > > > > > > > > > > > > > > > > A particularly problematic case (though there are > > > > > > > > others) is the > > > > > > > > nfsdcltrack upcall. It basically uses > > > > > > > > call_usermodehelper to run a > > > > > > > > program in userland to track some information on stable > > > > > > > > storage for > > > > > > > > nfsd. > > > > > > > > > > > > > > I thought the discussion at the kernel summit about this > > > > > > > issue was: > > > > > > > - don't do this. > > > > > > > - don't do it. > > > > > > > - if you really need to do this, fix nfsd > > > > > > > > > > > > > > > > > > > Sorry, I couldn't make the kernel summit so I missed that > > > > > > discussion. I > > > > > > guess LWN didn't cover it? > > > > > > > > > > > > In any case, I guess then that we'll either have to come up > > > > > > with some > > > > > > way to fix nfsd here, or simply ensure that nfsd can never > > > > > > be started > > > > > > unless root in the container has a full set of a full set of > > > > > > capabilities. > > > > > > > > > > > > One sort of Rube Goldberg possibility to fix nfsd is: > > > > > > > > > > > > - when we start nfsd in a container, fork off an extra > > > > > > kernel thread > > > > > > that just sits idle. That thread would need to be a > > > > > > descendant of the > > > > > > userland process that started nfsd, so we'd need to > > > > > > create it with > > > > > > kernel_thread(). > > > > > > > > > > > > - Have the kernel just start up the UMH program in the > > > > > > init_ns mount > > > > > > namespace as it currently does, but also pass the pid > > > > > > of the idle > > > > > > kernel thread to the UMH upcall. > > > > > > > > > > > > - The program will then use /proc/<pid>/root and > > > > > > /proc/<pid>/ns/* to set > > > > > > itself up for doing things properly. > > > > > > > > > > > > Note that with this mechanism we can't actually run a > > > > > > different binary > > > > > > per container, but that's probably fine for most purposes. > > > > > > > > > > > > > > > > Hmmm... Why we can't? We can go a bit further with userspace > > > > > idea. > > > > > > > > > > We use UMH some very limited number of user programs. For 2, > > > > > actually: > > > > > 1) /sbin/nfs_cache_getent > > > > > 2) /sbin/nfsdcltrack > > > > > > > > > > > > > No, the kernel uses them for a lot more than that. Pretty much > > > > all of > > > > the keys API upcalls use it. See all of the callers of > > > > call_usermodehelper. All of them are running user binaries out > > > > of the > > > > kernel, and almost all of them are certainly broken wrt > > > > containers. > > > > > > > > > If we convert them into proxies, which use /proc/<pid>/root > > > > > and /proc/<pid>/ns/*, this will allow us to lookup the right > > > > > binary. > > > > > The only limitation here is presence of this "proxy" binaries > > > > > on "host". > > > > > > > > > > > > > Suppose I spawn my own container as a user, using all of this > > > > spiffy > > > > new user namespace stuff. Then I make the kernel use > > > > call_usermodehelper to call the upcall in the init_ns, and then > > > > trick > > > > it into running my new "escape_from_namespace" program with > > > > "real" root > > > > privileges. > > > > > > > > I don't think we can reasonably assume that having the kernel > > > > exec an > > > > arbitrary binary inside of a container is safe. Doing so inside > > > > of the > > > > init_ns is marginally more safe, but only marginally so... > > > > > > > > > And we don't need any significant changes in kernel. > > > > > > > > > > BTW, Jeff, could you remind me, please, why exactly we need to > > > > > use UMH to run the binary? > > > > > What are this capabilities, which force us to do so? > > > > > > > > > > > > > Nothing _forces_ us to do so, but upcalls are very difficult to > > > > handle, > > > > and UMH has a lot of advantages over a long-running daemon > > > > launched by > > > > userland. > > > > > > > > Originally, I created the nfsdcltrack upcall as a running daemon > > > > called > > > > nfsdcld, and the kernel used rpc_pipefs to communicate with it. > > > > > > > > Everyone hated it because no one likes to have to run daemons > > > > for > > > > infrequently used upcalls. It's a pain for users to ensure that > > > > it's > > > > running and it's a pain to handle when it isn't. So, I was > > > > encouraged > > > > to turn that instead into a UMH upcall. > > > > > > > > But leaving that aside, this problem is a lot larger than just > > > > nfsd. We > > > > have a *lot* of UMH upcalls in the kernel, so this problem is > > > > more > > > > general than just "fixing" nfsd's. > > > > > > > > > > Ok. So we are talking about generic approach to UMH support in a > > > container (and/or namespace). > > > > > > Actually, as far as I can see, there are more that one aspect, > > > which is not supported. > > > One one them is executing of the right binary. Another one is > > > capabilities (and maybe there are more, like user namespaces), but > > > I > > > don't really care about them for now. > > > Executing the right binary, actually, is not about namespaces at > > > all. This is about lookup implementation in VFS > > > (do_execve_common). > > > > > > > > > Would be great to unshare FS for forked UHM kthread and swap it to > > > desired root. This will solve the problem with proper lookup. > > > However, > > > as far as I understand, this approach is not welcome by the > > > community. > > > > I don't understand that one. Having a preforked thread with the > > proper > > environment that can act like kthreadd in terms of spawning user > > mode > > helpers works and is simple. The only downside I can see is that > > there > > is extra overhead. > > > > What do you mean by "simple" here? Simple to implement? > We already have a preforked thread, called "UMH", used exactly for > this purpose. Is there? Can you explain how the pre-forking happens please? AFAICS a workqueue is used to run UMH helpers, I can't see any pre -forking going on there and it doesn't appear to be possible to do either. > And, if I'm not mistaken, we are trying to discuss, how to adapt > existent infrastructure for namespaces, don't we? > > > Beyond that though for the user mode helpers spawned to populate > > security keys it is not clear which context they should be run in, > > even if we do have kernel threads. > > > > Regardless of the context itself, we need a way to pass it to kernel > thread and to put kernel thread in this context. Or I'm missing > something? > > > > This problem, probably, can be solved by constructing full binary > > > path > > > (i.e. not in a container, but in kernel thread root context) in > > > UMH > > > "init" callack. However, this will help only is the dentry is > > > accessible from "init" root. Which is usually no true in case on > > > mount > > > namespaces, if I understand them right. > > > > You are correct it can not be assumed that what is visible in one > > mount > > namespace is visible in another. And of course in addition to > > picking > > the correct binary to run you have to set up a proper environment > > for > > that binary to run in. It may be that it's configuration file is > > only > > avaiable at the expected location in the proper mount namespace, > > even > > if the binary is available in all of the mount namespaces. > > > > Yes, you are right. So, this solution can help only in case of very > specific and simple "environment-less" programs. > So, I believe, that we should modify UMH itself to support our needs. > But I don't see, how to make the idea more pleasant for the community. > IOW, when I was talking about UMH in NFS implementation on Ksummit, > Linus's answer was something like "fix NFS". > And I can't object it, actually, because for now NFS is the only > corner case. > > Jeff said, that there are a bunch of UMH calls in kernel, but this is > not solid enough to prove UHM changes, since nobody is trying to use > them in containers. > > So, I doubt, that we can change UMH generically without additional use > -cases for 'containerized" UMH. > > > Eric > > > > ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2016-02-12 23:39 ` Ian Kent @ 2016-02-13 16:08 ` Stanislav Kinsburskiy 2016-02-15 0:11 ` Ian Kent 0 siblings, 1 reply; 49+ messages in thread From: Stanislav Kinsburskiy @ 2016-02-13 16:08 UTC (permalink / raw) To: Ian Kent, Stanislav Kinsbursky, Eric W. Biederman Cc: Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, oleg, bfields, bharrosh 13.02.2016 00:39, Ian Kent пишет: > On Fri, 2013-11-15 at 15:54 +0400, Stanislav Kinsbursky wrote: >> 15.11.2013 15:03, Eric W. Biederman пишет: >>> Stanislav Kinsbursky <skinsbursky@parallels.com> writes: >>> >>>> 12.11.2013 17:30, Jeff Layton пишет: >>>>> On Tue, 12 Nov 2013 17:02:36 +0400 >>>>> Stanislav Kinsbursky <skinsbursky@parallels.com> wrote: >>>>> >>>>>> 12.11.2013 15:12, Jeff Layton пишет: >>>>>>> On Mon, 11 Nov 2013 16:47:03 -0800 >>>>>>> Greg KH <gregkh@linuxfoundation.org> wrote: >>>>>>> >>>>>>>> On Mon, Nov 11, 2013 at 07:18:25AM -0500, Jeff Layton >>>>>>>> wrote: >>>>>>>>> We have a bit of a problem wrt to upcalls that use >>>>>>>>> call_usermodehelper >>>>>>>>> with containers and I'd like to bring this to some sort >>>>>>>>> of resolution... >>>>>>>>> >>>>>>>>> A particularly problematic case (though there are >>>>>>>>> others) is the >>>>>>>>> nfsdcltrack upcall. It basically uses >>>>>>>>> call_usermodehelper to run a >>>>>>>>> program in userland to track some information on stable >>>>>>>>> storage for >>>>>>>>> nfsd. >>>>>>>> I thought the discussion at the kernel summit about this >>>>>>>> issue was: >>>>>>>> - don't do this. >>>>>>>> - don't do it. >>>>>>>> - if you really need to do this, fix nfsd >>>>>>>> >>>>>>> Sorry, I couldn't make the kernel summit so I missed that >>>>>>> discussion. I >>>>>>> guess LWN didn't cover it? >>>>>>> >>>>>>> In any case, I guess then that we'll either have to come up >>>>>>> with some >>>>>>> way to fix nfsd here, or simply ensure that nfsd can never >>>>>>> be started >>>>>>> unless root in the container has a full set of a full set of >>>>>>> capabilities. >>>>>>> >>>>>>> One sort of Rube Goldberg possibility to fix nfsd is: >>>>>>> >>>>>>> - when we start nfsd in a container, fork off an extra >>>>>>> kernel thread >>>>>>> that just sits idle. That thread would need to be a >>>>>>> descendant of the >>>>>>> userland process that started nfsd, so we'd need to >>>>>>> create it with >>>>>>> kernel_thread(). >>>>>>> >>>>>>> - Have the kernel just start up the UMH program in the >>>>>>> init_ns mount >>>>>>> namespace as it currently does, but also pass the pid >>>>>>> of the idle >>>>>>> kernel thread to the UMH upcall. >>>>>>> >>>>>>> - The program will then use /proc/<pid>/root and >>>>>>> /proc/<pid>/ns/* to set >>>>>>> itself up for doing things properly. >>>>>>> >>>>>>> Note that with this mechanism we can't actually run a >>>>>>> different binary >>>>>>> per container, but that's probably fine for most purposes. >>>>>>> >>>>>> Hmmm... Why we can't? We can go a bit further with userspace >>>>>> idea. >>>>>> >>>>>> We use UMH some very limited number of user programs. For 2, >>>>>> actually: >>>>>> 1) /sbin/nfs_cache_getent >>>>>> 2) /sbin/nfsdcltrack >>>>>> >>>>> No, the kernel uses them for a lot more than that. Pretty much >>>>> all of >>>>> the keys API upcalls use it. See all of the callers of >>>>> call_usermodehelper. All of them are running user binaries out >>>>> of the >>>>> kernel, and almost all of them are certainly broken wrt >>>>> containers. >>>>> >>>>>> If we convert them into proxies, which use /proc/<pid>/root >>>>>> and /proc/<pid>/ns/*, this will allow us to lookup the right >>>>>> binary. >>>>>> The only limitation here is presence of this "proxy" binaries >>>>>> on "host". >>>>>> >>>>> Suppose I spawn my own container as a user, using all of this >>>>> spiffy >>>>> new user namespace stuff. Then I make the kernel use >>>>> call_usermodehelper to call the upcall in the init_ns, and then >>>>> trick >>>>> it into running my new "escape_from_namespace" program with >>>>> "real" root >>>>> privileges. >>>>> >>>>> I don't think we can reasonably assume that having the kernel >>>>> exec an >>>>> arbitrary binary inside of a container is safe. Doing so inside >>>>> of the >>>>> init_ns is marginally more safe, but only marginally so... >>>>> >>>>>> And we don't need any significant changes in kernel. >>>>>> >>>>>> BTW, Jeff, could you remind me, please, why exactly we need to >>>>>> use UMH to run the binary? >>>>>> What are this capabilities, which force us to do so? >>>>>> >>>>> Nothing _forces_ us to do so, but upcalls are very difficult to >>>>> handle, >>>>> and UMH has a lot of advantages over a long-running daemon >>>>> launched by >>>>> userland. >>>>> >>>>> Originally, I created the nfsdcltrack upcall as a running daemon >>>>> called >>>>> nfsdcld, and the kernel used rpc_pipefs to communicate with it. >>>>> >>>>> Everyone hated it because no one likes to have to run daemons >>>>> for >>>>> infrequently used upcalls. It's a pain for users to ensure that >>>>> it's >>>>> running and it's a pain to handle when it isn't. So, I was >>>>> encouraged >>>>> to turn that instead into a UMH upcall. >>>>> >>>>> But leaving that aside, this problem is a lot larger than just >>>>> nfsd. We >>>>> have a *lot* of UMH upcalls in the kernel, so this problem is >>>>> more >>>>> general than just "fixing" nfsd's. >>>>> >>>> Ok. So we are talking about generic approach to UMH support in a >>>> container (and/or namespace). >>>> >>>> Actually, as far as I can see, there are more that one aspect, >>>> which is not supported. >>>> One one them is executing of the right binary. Another one is >>>> capabilities (and maybe there are more, like user namespaces), but >>>> I >>>> don't really care about them for now. >>>> Executing the right binary, actually, is not about namespaces at >>>> all. This is about lookup implementation in VFS >>>> (do_execve_common). >>> >>> >>>> Would be great to unshare FS for forked UHM kthread and swap it to >>>> desired root. This will solve the problem with proper lookup. >>>> However, >>>> as far as I understand, this approach is not welcome by the >>>> community. >>> I don't understand that one. Having a preforked thread with the >>> proper >>> environment that can act like kthreadd in terms of spawning user >>> mode >>> helpers works and is simple. The only downside I can see is that >>> there >>> is extra overhead. >>> >> What do you mean by "simple" here? Simple to implement? >> We already have a preforked thread, called "UMH", used exactly for >> this purpose. > Is there? > > Can you explain how the pre-forking happens please? > > AFAICS a workqueue is used to run UMH helpers, I can't see any pre > -forking going on there and it doesn't appear to be possible to do > either. Hi Ian, I'm not sure, I understand your question. But there is a generic "khelper" thread, which is responsible for spawning new kthreads to execute some binary, requested by user. IOW, when you want to use UMH, you add a request to "khelper" workqueue, which, in turn, creates another thread. The new one call init callback and does the actual execve call. > >> And, if I'm not mistaken, we are trying to discuss, how to adapt >> existent infrastructure for namespaces, don't we? >> >>> Beyond that though for the user mode helpers spawned to populate >>> security keys it is not clear which context they should be run in, >>> even if we do have kernel threads. >>> >> Regardless of the context itself, we need a way to pass it to kernel >> thread and to put kernel thread in this context. Or I'm missing >> something? >> >>>> This problem, probably, can be solved by constructing full binary >>>> path >>>> (i.e. not in a container, but in kernel thread root context) in >>>> UMH >>>> "init" callack. However, this will help only is the dentry is >>>> accessible from "init" root. Which is usually no true in case on >>>> mount >>>> namespaces, if I understand them right. >>> You are correct it can not be assumed that what is visible in one >>> mount >>> namespace is visible in another. And of course in addition to >>> picking >>> the correct binary to run you have to set up a proper environment >>> for >>> that binary to run in. It may be that it's configuration file is >>> only >>> avaiable at the expected location in the proper mount namespace, >>> even >>> if the binary is available in all of the mount namespaces. >>> >> Yes, you are right. So, this solution can help only in case of very >> specific and simple "environment-less" programs. >> So, I believe, that we should modify UMH itself to support our needs. >> But I don't see, how to make the idea more pleasant for the community. >> IOW, when I was talking about UMH in NFS implementation on Ksummit, >> Linus's answer was something like "fix NFS". >> And I can't object it, actually, because for now NFS is the only >> corner case. >> >> Jeff said, that there are a bunch of UMH calls in kernel, but this is >> not solid enough to prove UHM changes, since nobody is trying to use >> them in containers. >> >> So, I doubt, that we can change UMH generically without additional use >> -cases for 'containerized" UMH. >> >>> Eric >>> >> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2016-02-13 16:08 ` Stanislav Kinsburskiy @ 2016-02-15 0:11 ` Ian Kent [not found] ` <1455495082.2941.32.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> 0 siblings, 1 reply; 49+ messages in thread From: Ian Kent @ 2016-02-15 0:11 UTC (permalink / raw) To: skinsbursky, Stanislav Kinsbursky, Eric W. Biederman Cc: Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, oleg, bfields, bharrosh On Sat, 2016-02-13 at 17:08 +0100, Stanislav Kinsburskiy wrote: > > 13.02.2016 00:39, Ian Kent пишет: > > On Fri, 2013-11-15 at 15:54 +0400, Stanislav Kinsbursky wrote: > > > 15.11.2013 15:03, Eric W. Biederman пишет: > > > > Stanislav Kinsbursky <skinsbursky@parallels.com> writes: > > > > > > > > > 12.11.2013 17:30, Jeff Layton пишет: > > > > > > On Tue, 12 Nov 2013 17:02:36 +0400 > > > > > > Stanislav Kinsbursky <skinsbursky@parallels.com> wrote: > > > > > > > > > > > > > 12.11.2013 15:12, Jeff Layton пишет: > > > > > > > > On Mon, 11 Nov 2013 16:47:03 -0800 > > > > > > > > Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > > > > > On Mon, Nov 11, 2013 at 07:18:25AM -0500, Jeff Layton > > > > > > > > > wrote: > > > > > > > > > > We have a bit of a problem wrt to upcalls that use > > > > > > > > > > call_usermodehelper > > > > > > > > > > with containers and I'd like to bring this to some > > > > > > > > > > sort > > > > > > > > > > of resolution... > > > > > > > > > > > > > > > > > > > > A particularly problematic case (though there are > > > > > > > > > > others) is the > > > > > > > > > > nfsdcltrack upcall. It basically uses > > > > > > > > > > call_usermodehelper to run a > > > > > > > > > > program in userland to track some information on > > > > > > > > > > stable > > > > > > > > > > storage for > > > > > > > > > > nfsd. > > > > > > > > > I thought the discussion at the kernel summit about > > > > > > > > > this > > > > > > > > > issue was: > > > > > > > > > - don't do this. > > > > > > > > > - don't do it. > > > > > > > > > - if you really need to do this, fix nfsd > > > > > > > > > > > > > > > > > Sorry, I couldn't make the kernel summit so I missed > > > > > > > > that > > > > > > > > discussion. I > > > > > > > > guess LWN didn't cover it? > > > > > > > > > > > > > > > > In any case, I guess then that we'll either have to come > > > > > > > > up > > > > > > > > with some > > > > > > > > way to fix nfsd here, or simply ensure that nfsd can > > > > > > > > never > > > > > > > > be started > > > > > > > > unless root in the container has a full set of a full > > > > > > > > set of > > > > > > > > capabilities. > > > > > > > > > > > > > > > > One sort of Rube Goldberg possibility to fix nfsd is: > > > > > > > > > > > > > > > > - when we start nfsd in a container, fork off an extra > > > > > > > > kernel thread > > > > > > > > that just sits idle. That thread would need to be > > > > > > > > a > > > > > > > > descendant of the > > > > > > > > userland process that started nfsd, so we'd need > > > > > > > > to > > > > > > > > create it with > > > > > > > > kernel_thread(). > > > > > > > > > > > > > > > > - Have the kernel just start up the UMH program in the > > > > > > > > init_ns mount > > > > > > > > namespace as it currently does, but also pass the > > > > > > > > pid > > > > > > > > of the idle > > > > > > > > kernel thread to the UMH upcall. > > > > > > > > > > > > > > > > - The program will then use /proc/<pid>/root and > > > > > > > > /proc/<pid>/ns/* to set > > > > > > > > itself up for doing things properly. > > > > > > > > > > > > > > > > Note that with this mechanism we can't actually run a > > > > > > > > different binary > > > > > > > > per container, but that's probably fine for most > > > > > > > > purposes. > > > > > > > > > > > > > > > Hmmm... Why we can't? We can go a bit further with > > > > > > > userspace > > > > > > > idea. > > > > > > > > > > > > > > We use UMH some very limited number of user programs. For > > > > > > > 2, > > > > > > > actually: > > > > > > > 1) /sbin/nfs_cache_getent > > > > > > > 2) /sbin/nfsdcltrack > > > > > > > > > > > > > No, the kernel uses them for a lot more than that. Pretty > > > > > > much > > > > > > all of > > > > > > the keys API upcalls use it. See all of the callers of > > > > > > call_usermodehelper. All of them are running user binaries > > > > > > out > > > > > > of the > > > > > > kernel, and almost all of them are certainly broken wrt > > > > > > containers. > > > > > > > > > > > > > If we convert them into proxies, which use > > > > > > > /proc/<pid>/root > > > > > > > and /proc/<pid>/ns/*, this will allow us to lookup the > > > > > > > right > > > > > > > binary. > > > > > > > The only limitation here is presence of this "proxy" > > > > > > > binaries > > > > > > > on "host". > > > > > > > > > > > > > Suppose I spawn my own container as a user, using all of > > > > > > this > > > > > > spiffy > > > > > > new user namespace stuff. Then I make the kernel use > > > > > > call_usermodehelper to call the upcall in the init_ns, and > > > > > > then > > > > > > trick > > > > > > it into running my new "escape_from_namespace" program with > > > > > > "real" root > > > > > > privileges. > > > > > > > > > > > > I don't think we can reasonably assume that having the > > > > > > kernel > > > > > > exec an > > > > > > arbitrary binary inside of a container is safe. Doing so > > > > > > inside > > > > > > of the > > > > > > init_ns is marginally more safe, but only marginally so... > > > > > > > > > > > > > And we don't need any significant changes in kernel. > > > > > > > > > > > > > > BTW, Jeff, could you remind me, please, why exactly we > > > > > > > need to > > > > > > > use UMH to run the binary? > > > > > > > What are this capabilities, which force us to do so? > > > > > > > > > > > > > Nothing _forces_ us to do so, but upcalls are very difficult > > > > > > to > > > > > > handle, > > > > > > and UMH has a lot of advantages over a long-running daemon > > > > > > launched by > > > > > > userland. > > > > > > > > > > > > Originally, I created the nfsdcltrack upcall as a running > > > > > > daemon > > > > > > called > > > > > > nfsdcld, and the kernel used rpc_pipefs to communicate with > > > > > > it. > > > > > > > > > > > > Everyone hated it because no one likes to have to run > > > > > > daemons > > > > > > for > > > > > > infrequently used upcalls. It's a pain for users to ensure > > > > > > that > > > > > > it's > > > > > > running and it's a pain to handle when it isn't. So, I was > > > > > > encouraged > > > > > > to turn that instead into a UMH upcall. > > > > > > > > > > > > But leaving that aside, this problem is a lot larger than > > > > > > just > > > > > > nfsd. We > > > > > > have a *lot* of UMH upcalls in the kernel, so this problem > > > > > > is > > > > > > more > > > > > > general than just "fixing" nfsd's. > > > > > > > > > > > Ok. So we are talking about generic approach to UMH support in > > > > > a > > > > > container (and/or namespace). > > > > > > > > > > Actually, as far as I can see, there are more that one aspect, > > > > > which is not supported. > > > > > One one them is executing of the right binary. Another one is > > > > > capabilities (and maybe there are more, like user namespaces), > > > > > but > > > > > I > > > > > don't really care about them for now. > > > > > Executing the right binary, actually, is not about namespaces > > > > > at > > > > > all. This is about lookup implementation in VFS > > > > > (do_execve_common). > > > > > > > > > > > > > Would be great to unshare FS for forked UHM kthread and swap > > > > > it to > > > > > desired root. This will solve the problem with proper lookup. > > > > > However, > > > > > as far as I understand, this approach is not welcome by the > > > > > community. > > > > I don't understand that one. Having a preforked thread with the > > > > proper > > > > environment that can act like kthreadd in terms of spawning user > > > > mode > > > > helpers works and is simple. The only downside I can see is > > > > that > > > > there > > > > is extra overhead. > > > > > > > What do you mean by "simple" here? Simple to implement? > > > We already have a preforked thread, called "UMH", used exactly for > > > this purpose. > > Is there? > > > > Can you explain how the pre-forking happens please? > > > > AFAICS a workqueue is used to run UMH helpers, I can't see any pre > > -forking going on there and it doesn't appear to be possible to do > > either. > > Hi Ian, > I'm not sure, I understand your question. > But there is a generic "khelper" thread, which is responsible for > spawning new kthreads to execute some binary, requested by user. > IOW, when you want to use UMH, you add a request to "khelper" > workqueue, > which, in turn, creates another thread. The new one call init callback > and does the actual execve call. AFAICS kernel/kmod.c used to use create_singlethread_workqueue() and queue_work() to perform umh calls, now it uses only queue_work() and the system_unbound_wq workqueue. Looking at the workqueue sub system there doesn't appear to be a way to create a workqueue with a thread runner thread, created within the process context at the time of workqueue creation, that then waits to run work. So there's no way to create a workqueue to run umh calls within a specific process context, such as that of a container, by using the workqueue subsystem as it is now. The problem being that the process context of the caller requesting umh isn't necessarily (and shouldn't be used because it could allow the caller to hijack the environment) the process context that needs to be used for the request. It looks like the reply to this thread from Oleg that demonstrates using child_reaper for the run context could be used though. Capturing the struct pid of child_reaper and then using that to locate the appropriate task context later (if it still exists) at request time could be used. That doesn't take care of working out when this should be captured or where to put it so it can be obtained at request time (which seems difficult in itself). > > > > > > And, if I'm not mistaken, we are trying to discuss, how to adapt > > > existent infrastructure for namespaces, don't we? > > > > > > > Beyond that though for the user mode helpers spawned to populate > > > > security keys it is not clear which context they should be run > > > > in, > > > > even if we do have kernel threads. > > > > > > > Regardless of the context itself, we need a way to pass it to > > > kernel > > > thread and to put kernel thread in this context. Or I'm missing > > > something? > > > > > > > > This problem, probably, can be solved by constructing full > > > > > binary > > > > > path > > > > > (i.e. not in a container, but in kernel thread root context) > > > > > in > > > > > UMH > > > > > "init" callack. However, this will help only is the dentry is > > > > > accessible from "init" root. Which is usually no true in case > > > > > on > > > > > mount > > > > > namespaces, if I understand them right. > > > > You are correct it can not be assumed that what is visible in > > > > one > > > > mount > > > > namespace is visible in another. And of course in addition to > > > > picking > > > > the correct binary to run you have to set up a proper > > > > environment > > > > for > > > > that binary to run in. It may be that it's configuration file > > > > is > > > > only > > > > avaiable at the expected location in the proper mount namespace, > > > > even > > > > if the binary is available in all of the mount namespaces. > > > > > > > Yes, you are right. So, this solution can help only in case of > > > very > > > specific and simple "environment-less" programs. > > > So, I believe, that we should modify UMH itself to support our > > > needs. > > > But I don't see, how to make the idea more pleasant for the > > > community. > > > IOW, when I was talking about UMH in NFS implementation on > > > Ksummit, > > > Linus's answer was something like "fix NFS". > > > And I can't object it, actually, because for now NFS is the only > > > corner case. > > > > > > Jeff said, that there are a bunch of UMH calls in kernel, but this > > > is > > > not solid enough to prove UHM changes, since nobody is trying to > > > use > > > them in containers. > > > > > > So, I doubt, that we can change UMH generically without additional > > > use > > > -cases for 'containerized" UMH. > > > > > > > Eric > > > > > > > > ^ permalink raw reply [flat|nested] 49+ messages in thread
[parent not found: <1455495082.2941.32.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org>]
* Re: call_usermodehelper in containers 2016-02-15 0:11 ` Ian Kent @ 2016-02-18 3:17 ` Eric W. Biederman 0 siblings, 0 replies; 49+ messages in thread From: Eric W. Biederman @ 2016-02-18 3:17 UTC (permalink / raw) To: Ian Kent Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky, Jeff Layton, Greg KH, Linux Containers, skinsbursky-5HdwGun5lf+gSpxsJD1C4w, linux-kernel-u79uwXL29TY76Z2rM5mHXA, oleg-H+wXaHxf7aLQT0dZR+AlfA, bfields-uC3wQj2KruNg9hUCZPvPmw, bharrosh-C4P08NqkoRlBDgjK7y7TUQ, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, devel-GEFAQzZX7r8dnm+yROfE0A Ian Kent <raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> writes: > AFAICS kernel/kmod.c used to use create_singlethread_workqueue() and > queue_work() to perform umh calls, now it uses only queue_work() and > the system_unbound_wq workqueue. > > Looking at the workqueue sub system there doesn't appear to be a way to > create a workqueue with a thread runner thread, created within the > process context at the time of workqueue creation, that then waits to > run work. So there's no way to create a workqueue to run umh calls > within a specific process context, such as that of a container, by using > the workqueue subsystem as it is now. > > The problem being that the process context of the caller requesting umh > isn't necessarily (and shouldn't be used because it could allow the > caller to hijack the environment) the process context that needs to be > used for the request. > > It looks like the reply to this thread from Oleg that demonstrates using > child_reaper for the run context could be used though. Capturing the > struct pid of child_reaper and then using that to locate the appropriate > task context later (if it still exists) at request time could be used. > > That doesn't take care of working out when this should be captured or > where to put it so it can be obtained at request time (which seems > difficult in itself). It would be really really nice if the user namespace could be used for the where do we look at case. As every other namespace already has a pointer to the user namespace, and fundamentally the user namespace is the permission boundary (from a namespace perspective). So for the equivalent of kthreadd in a user namespace we need a thread that has a full set of namespaces owned by the user namespaces. On one side this is very easy to obtain if we look at the process that sets core_pattern or mounts one of the nfs filesystems (such as the filesystem that when mounted starts nfsd), and just fork a kernel thread from it. On another side perhaps what we want is a syscall call it start_umhd that says repurpose the caller of this thread to handle future user mode helper calls. That we could tie to a user namespace quite easily. This definitely does not play particularly nice with queue work and friends, but that is just infrastructure and we can update user mode helper to use something else reasonable as long as we have a solid design. Perhaps there is a combination of the two ideas that could work. Instead of a syscall use the invocation of a service that needs a user mode helper as a trigger to create such a launcher thread. Eric ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers @ 2016-02-18 3:17 ` Eric W. Biederman 0 siblings, 0 replies; 49+ messages in thread From: Eric W. Biederman @ 2016-02-18 3:17 UTC (permalink / raw) To: Ian Kent Cc: skinsbursky, Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, oleg, bfields, bharrosh, Linux Containers Ian Kent <raven@themaw.net> writes: > AFAICS kernel/kmod.c used to use create_singlethread_workqueue() and > queue_work() to perform umh calls, now it uses only queue_work() and > the system_unbound_wq workqueue. > > Looking at the workqueue sub system there doesn't appear to be a way to > create a workqueue with a thread runner thread, created within the > process context at the time of workqueue creation, that then waits to > run work. So there's no way to create a workqueue to run umh calls > within a specific process context, such as that of a container, by using > the workqueue subsystem as it is now. > > The problem being that the process context of the caller requesting umh > isn't necessarily (and shouldn't be used because it could allow the > caller to hijack the environment) the process context that needs to be > used for the request. > > It looks like the reply to this thread from Oleg that demonstrates using > child_reaper for the run context could be used though. Capturing the > struct pid of child_reaper and then using that to locate the appropriate > task context later (if it still exists) at request time could be used. > > That doesn't take care of working out when this should be captured or > where to put it so it can be obtained at request time (which seems > difficult in itself). It would be really really nice if the user namespace could be used for the where do we look at case. As every other namespace already has a pointer to the user namespace, and fundamentally the user namespace is the permission boundary (from a namespace perspective). So for the equivalent of kthreadd in a user namespace we need a thread that has a full set of namespaces owned by the user namespaces. On one side this is very easy to obtain if we look at the process that sets core_pattern or mounts one of the nfs filesystems (such as the filesystem that when mounted starts nfsd), and just fork a kernel thread from it. On another side perhaps what we want is a syscall call it start_umhd that says repurpose the caller of this thread to handle future user mode helper calls. That we could tie to a user namespace quite easily. This definitely does not play particularly nice with queue work and friends, but that is just infrastructure and we can update user mode helper to use something else reasonable as long as we have a solid design. Perhaps there is a combination of the two ideas that could work. Instead of a syscall use the invocation of a service that needs a user mode helper as a trigger to create such a launcher thread. Eric ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2013-11-15 11:03 ` Eric W. Biederman 2013-11-15 11:54 ` Stanislav Kinsbursky @ 2013-11-18 17:28 ` Oleg Nesterov 2013-11-18 18:02 ` Oleg Nesterov ` (2 more replies) 1 sibling, 3 replies; 49+ messages in thread From: Oleg Nesterov @ 2013-11-18 17:28 UTC (permalink / raw) To: Eric W. Biederman Cc: Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bfields, bharrosh On 11/15, Eric W. Biederman wrote: > > I don't understand that one. Having a preforked thread with the proper > environment that can act like kthreadd in terms of spawning user mode > helpers works and is simple. Can't we ask ->child_reaper to create the non-daemonized kernel thread with the "right" ->nsproxy, ->fs, etc? IOW. Please the the "patch" below. It is obviously incomplete and wrong, and it can be more clear/clean. And probably we need another API. Just to explain what I mean. With this patch call_usermodehelper(..., UMH_IN_MY_NS) should do exec from the caller's namespace. Oleg. --- --- a/include/linux/kmod.h +++ b/include/linux/kmod.h @@ -24,6 +24,7 @@ #include <linux/errno.h> #include <linux/compiler.h> #include <linux/workqueue.h> +#include <linux/task_work.h> #include <linux/sysctl.h> #define KMOD_PATH_LEN 256 @@ -53,8 +54,14 @@ struct file; #define UMH_WAIT_PROC 2 /* wait for the process to complete */ #define UMH_KILLABLE 4 /* wait for EXEC/PROC killable */ +// FIXME: IMH_* is not actually a mask +#define UMH_IN_MY_NS 8 + struct subprocess_info { - struct work_struct work; + union { + struct work_struct work; + struct callback_head twork; + }; struct completion *complete; char *path; char **argv; --- a/kernel/kmod.c +++ b/kernel/kmod.c @@ -541,7 +541,6 @@ struct subprocess_info *call_usermodehelper_setup(char *path, char **argv, if (!sub_info) goto out; - INIT_WORK(&sub_info->work, __call_usermodehelper); sub_info->path = path; sub_info->argv = argv; sub_info->envp = envp; @@ -554,6 +553,24 @@ struct subprocess_info *call_usermodehelper_setup(char *path, char **argv, } EXPORT_SYMBOL(call_usermodehelper_setup); +static int call_call_usermodehelper(void *twork) +{ + struct subprocess_info *sub_info = + container_of(twork, struct subprocess_info, twork); + + __call_usermodehelper(&sub_info->work); + do_exit(0); + +} + +static void fork_umh_helper(struct callback_head *twork) +{ + if (current->flags & PF_EXITING) + return; // WRONG, FIXME + + kernel_thread(call_call_usermodehelper, twork, SIGCHLD); +} + /** * call_usermodehelper_exec - start a usermode application * @sub_info: information about the subprocessa @@ -570,6 +587,10 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) { DECLARE_COMPLETION_ONSTACK(done); int retval = 0; + bool in_my_ns; + + in_my_ns = wait & UMH_IN_MY_NS; + wait &= ~UMH_IN_MY_NS; if (!sub_info->path) { call_usermodehelper_freeinfo(sub_info); @@ -594,7 +615,21 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) sub_info->complete = &done; sub_info->wait = wait; - queue_work(khelper_wq, &sub_info->work); + if (likely(!in_my_ns)) { + INIT_WORK(&sub_info->work, __call_usermodehelper); + queue_work(khelper_wq, &sub_info->work); + } else { + // RACY, WRONG, ETC + struct task_struct *my_init = task_active_pid_ns(current)->child_reaper; + + init_task_work(&sub_info->twork, fork_umh_helper); + task_work_add(my_init, &sub_info->twork, false); + + // until we have task_work_add_interruptibel() + do_send_sig_info(SIGCHLD, SEND_SIG_FORCED, my_init, false); + + } + if (wait == UMH_NO_WAIT) /* task has freed sub_info */ goto unlock; ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2013-11-18 17:28 ` Oleg Nesterov @ 2013-11-18 18:02 ` Oleg Nesterov 2013-11-19 14:51 ` Jeff Layton 2016-02-11 0:17 ` Ian Kent 2016-03-24 7:45 ` Ian Kent 2 siblings, 1 reply; 49+ messages in thread From: Oleg Nesterov @ 2013-11-18 18:02 UTC (permalink / raw) To: Eric W. Biederman Cc: Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bfields, bharrosh On 11/18, Oleg Nesterov wrote: > > On 11/15, Eric W. Biederman wrote: > > > > I don't understand that one. Having a preforked thread with the proper > > environment that can act like kthreadd in terms of spawning user mode > > helpers works and is simple. > > Can't we ask ->child_reaper to create the non-daemonized kernel thread > with the "right" ->nsproxy, ->fs, etc? > > IOW. Please the the "patch" below. It is obviously incomplete and wrong, > and it can be more clear/clean. And probably we need another API. Just > to explain what I mean. Or, perhaps UMH_IN_MY_NS should only work if ->child_reaper explicitly does, say, prctl(PR_SPAWN_UMH_IN_NS_HELPER) which forks the non-daemonized kernel kthread_worker thread, I dunno. Oleg. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2013-11-18 18:02 ` Oleg Nesterov @ 2013-11-19 14:51 ` Jeff Layton 0 siblings, 0 replies; 49+ messages in thread From: Jeff Layton @ 2013-11-19 14:51 UTC (permalink / raw) To: Oleg Nesterov Cc: Eric W. Biederman, Stanislav Kinsbursky, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bfields, bharrosh On Mon, 18 Nov 2013 19:02:59 +0100 Oleg Nesterov <oleg@redhat.com> wrote: > On 11/18, Oleg Nesterov wrote: > > > > On 11/15, Eric W. Biederman wrote: > > > > > > I don't understand that one. Having a preforked thread with the proper > > > environment that can act like kthreadd in terms of spawning user mode > > > helpers works and is simple. > > > > Can't we ask ->child_reaper to create the non-daemonized kernel thread > > with the "right" ->nsproxy, ->fs, etc? > > > > IOW. Please the the "patch" below. It is obviously incomplete and wrong, > > and it can be more clear/clean. And probably we need another API. Just > > to explain what I mean. > > Or, perhaps UMH_IN_MY_NS should only work if ->child_reaper explicitly > does, say, prctl(PR_SPAWN_UMH_IN_NS_HELPER) which forks the non-daemonized > kernel kthread_worker thread, I dunno. > > Oleg. > Neat idea. So is it always the case that tasks in a container have the same namespace settings and capabilities as the child_reaper? We'll still have the basic problem for nfsd that we'll need to keep track of what the child_reaper is when nfsd is started, but I think that's not too hard to solve. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2013-11-18 17:28 ` Oleg Nesterov 2013-11-18 18:02 ` Oleg Nesterov @ 2016-02-11 0:17 ` Ian Kent [not found] ` <1455149857.2903.9.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> 2016-03-24 7:45 ` Ian Kent 2 siblings, 1 reply; 49+ messages in thread From: Ian Kent @ 2016-02-11 0:17 UTC (permalink / raw) To: Oleg Nesterov, Eric W. Biederman Cc: Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bfields, bharrosh, Eric W. Biederman On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: > On 11/15, Eric W. Biederman wrote: > > > > I don't understand that one. Having a preforked thread with the > > proper > > environment that can act like kthreadd in terms of spawning user > > mode > > helpers works and is simple. Forgive me replying to such an old thread but ... After realizing workqueues can't be used to pre-create threads to run usermode helpers I've returned to look at this. > > Can't we ask ->child_reaper to create the non-daemonized kernel thread > with the "right" ->nsproxy, ->fs, etc? Eric, do you think this approach would be sufficient too? Probably wouldn't be quite right for user namespaces but should provide what's needed for other cases? It certainly has the advantage of not having to maintain a plague of processes waiting around to execute helpers. > > IOW. Please the the "patch" below. It is obviously incomplete and > wrong, > and it can be more clear/clean. And probably we need another API. Just > to explain what I mean. > > With this patch call_usermodehelper(..., UMH_IN_MY_NS) should do exec > from the caller's namespace. > > Oleg. > --- > > --- a/include/linux/kmod.h > +++ b/include/linux/kmod.h > @@ -24,6 +24,7 @@ > #include <linux/errno.h> > #include <linux/compiler.h> > #include <linux/workqueue.h> > +#include <linux/task_work.h> > #include <linux/sysctl.h> > > #define KMOD_PATH_LEN 256 > @@ -53,8 +54,14 @@ struct file; > #define UMH_WAIT_PROC 2 /* wait for the process to > complete */ > #define UMH_KILLABLE 4 /* wait for EXEC/PROC killable > */ > > +// FIXME: IMH_* is not actually a mask > +#define UMH_IN_MY_NS 8 > + > struct subprocess_info { > - struct work_struct work; > + union { > + struct work_struct work; > + struct callback_head twork; > + }; > struct completion *complete; > char *path; > char **argv; > --- a/kernel/kmod.c > +++ b/kernel/kmod.c > @@ -541,7 +541,6 @@ struct subprocess_info > *call_usermodehelper_setup(char *path, char **argv, > if (!sub_info) > goto out; > > - INIT_WORK(&sub_info->work, __call_usermodehelper); > sub_info->path = path; > sub_info->argv = argv; > sub_info->envp = envp; > @@ -554,6 +553,24 @@ struct subprocess_info > *call_usermodehelper_setup(char *path, char **argv, > } > EXPORT_SYMBOL(call_usermodehelper_setup); > > +static int call_call_usermodehelper(void *twork) > +{ > + struct subprocess_info *sub_info = > + container_of(twork, struct subprocess_info, twork); > + > + __call_usermodehelper(&sub_info->work); > + do_exit(0); > + > +} > + > +static void fork_umh_helper(struct callback_head *twork) > +{ > + if (current->flags & PF_EXITING) > + return; // WRONG, FIXME > + > + kernel_thread(call_call_usermodehelper, twork, SIGCHLD); > +} > + > /** > * call_usermodehelper_exec - start a usermode application > * @sub_info: information about the subprocessa > @@ -570,6 +587,10 @@ int call_usermodehelper_exec(struct > subprocess_info *sub_info, int wait) > { > DECLARE_COMPLETION_ONSTACK(done); > int retval = 0; > + bool in_my_ns; > + > + in_my_ns = wait & UMH_IN_MY_NS; > + wait &= ~UMH_IN_MY_NS; > > if (!sub_info->path) { > call_usermodehelper_freeinfo(sub_info); > @@ -594,7 +615,21 @@ int call_usermodehelper_exec(struct > subprocess_info *sub_info, int wait) > sub_info->complete = &done; > sub_info->wait = wait; > > - queue_work(khelper_wq, &sub_info->work); > + if (likely(!in_my_ns)) { > + INIT_WORK(&sub_info->work, __call_usermodehelper); > + queue_work(khelper_wq, &sub_info->work); > + } else { > + // RACY, WRONG, ETC > + struct task_struct *my_init = > task_active_pid_ns(current)->child_reaper; > + > + init_task_work(&sub_info->twork, fork_umh_helper); > + task_work_add(my_init, &sub_info->twork, false); > + > + // until we have task_work_add_interruptibel() > + do_send_sig_info(SIGCHLD, SEND_SIG_FORCED, my_init, > false); > + > + } > + > if (wait == UMH_NO_WAIT) /* task has freed sub_info */ > goto unlock; > > > -- > To unsubscribe from this list: send the line "unsubscribe linux > -fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 49+ messages in thread
[parent not found: <1455149857.2903.9.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org>]
* Re: call_usermodehelper in containers 2016-02-11 0:17 ` Ian Kent @ 2016-02-18 2:57 ` Eric W. Biederman 0 siblings, 0 replies; 49+ messages in thread From: Eric W. Biederman @ 2016-02-18 2:57 UTC (permalink / raw) To: Ian Kent Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky, Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA, bfields-uC3wQj2KruNg9hUCZPvPmw, bharrosh-C4P08NqkoRlBDgjK7y7TUQ, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, devel-GEFAQzZX7r8dnm+yROfE0A Ccing The containers list because a related discussion is happening there and somehow this thread has never made it there. Ian Kent <raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> writes: > On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: >> On 11/15, Eric W. Biederman wrote: >> > >> > I don't understand that one. Having a preforked thread with the >> > proper >> > environment that can act like kthreadd in terms of spawning user >> > mode >> > helpers works and is simple. > > Forgive me replying to such an old thread but ... > > After realizing workqueues can't be used to pre-create threads to run > usermode helpers I've returned to look at this. If someone can wind up with a good implementation I will be happy. >> Can't we ask ->child_reaper to create the non-daemonized kernel thread >> with the "right" ->nsproxy, ->fs, etc? > > Eric, do you think this approach would be sufficient too? > > Probably wouldn't be quite right for user namespaces but should provide > what's needed for other cases? > > It certainly has the advantage of not having to maintain a plague of > processes waiting around to execute helpers. That certainly sounds attractive. Especially for the case of everyone who wants to set a core pattern in a container. I am fuzzy on all of the details right now, but what I do remember is that in the kernel the user mode helper concepts when they attempted to scrub a processes environment were quite error prone until we managed to get kthreadd(pid 2) on the scene which always had a clean environment. If we are going to tie this kind of thing to the pid namespace I recommend simplying denying it if you are in a user namespace without an approrpriate pid namespace. AKA simply not allowing thigns to be setup if current->pid_ns->user_ns != current->user_ns. That still leaves things a little hand-wavy but I hope that helps conceptually. Eric ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers @ 2016-02-18 2:57 ` Eric W. Biederman 0 siblings, 0 replies; 49+ messages in thread From: Eric W. Biederman @ 2016-02-18 2:57 UTC (permalink / raw) To: Ian Kent Cc: Oleg Nesterov, Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bfields, bharrosh, Linux Containers Ccing The containers list because a related discussion is happening there and somehow this thread has never made it there. Ian Kent <raven@themaw.net> writes: > On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: >> On 11/15, Eric W. Biederman wrote: >> > >> > I don't understand that one. Having a preforked thread with the >> > proper >> > environment that can act like kthreadd in terms of spawning user >> > mode >> > helpers works and is simple. > > Forgive me replying to such an old thread but ... > > After realizing workqueues can't be used to pre-create threads to run > usermode helpers I've returned to look at this. If someone can wind up with a good implementation I will be happy. >> Can't we ask ->child_reaper to create the non-daemonized kernel thread >> with the "right" ->nsproxy, ->fs, etc? > > Eric, do you think this approach would be sufficient too? > > Probably wouldn't be quite right for user namespaces but should provide > what's needed for other cases? > > It certainly has the advantage of not having to maintain a plague of > processes waiting around to execute helpers. That certainly sounds attractive. Especially for the case of everyone who wants to set a core pattern in a container. I am fuzzy on all of the details right now, but what I do remember is that in the kernel the user mode helper concepts when they attempted to scrub a processes environment were quite error prone until we managed to get kthreadd(pid 2) on the scene which always had a clean environment. If we are going to tie this kind of thing to the pid namespace I recommend simplying denying it if you are in a user namespace without an approrpriate pid namespace. AKA simply not allowing thigns to be setup if current->pid_ns->user_ns != current->user_ns. That still leaves things a little hand-wavy but I hope that helps conceptually. Eric ^ permalink raw reply [flat|nested] 49+ messages in thread
[parent not found: <8737sq4teb.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* Re: call_usermodehelper in containers 2016-02-18 2:57 ` Eric W. Biederman @ 2016-02-18 3:43 ` Kamezawa Hiroyuki -1 siblings, 0 replies; 49+ messages in thread From: Kamezawa Hiroyuki @ 2016-02-18 3:43 UTC (permalink / raw) To: Eric W. Biederman, Ian Kent Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky, Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA, bfields-uC3wQj2KruNg9hUCZPvPmw, bharrosh-C4P08NqkoRlBDgjK7y7TUQ, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, devel-GEFAQzZX7r8dnm+yROfE0A On 2016/02/18 11:57, Eric W. Biederman wrote: > > Ccing The containers list because a related discussion is happening there > and somehow this thread has never made it there. > > Ian Kent <raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> writes: > >> On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: >>> On 11/15, Eric W. Biederman wrote: >>>> >>>> I don't understand that one. Having a preforked thread with the >>>> proper >>>> environment that can act like kthreadd in terms of spawning user >>>> mode >>>> helpers works and is simple. >> >> Forgive me replying to such an old thread but ... >> >> After realizing workqueues can't be used to pre-create threads to run >> usermode helpers I've returned to look at this. > > If someone can wind up with a good implementation I will be happy. > >>> Can't we ask ->child_reaper to create the non-daemonized kernel thread >>> with the "right" ->nsproxy, ->fs, etc? >> >> Eric, do you think this approach would be sufficient too? >> >> Probably wouldn't be quite right for user namespaces but should provide >> what's needed for other cases? >> >> It certainly has the advantage of not having to maintain a plague of >> processes waiting around to execute helpers. > > That certainly sounds attractive. Especially for the case of everyone > who wants to set a core pattern in a container. > > I am fuzzy on all of the details right now, but what I do remember is > that in the kernel the user mode helper concepts when they attempted to > scrub a processes environment were quite error prone until we managed to > get kthreadd(pid 2) on the scene which always had a clean environment. > > If we are going to tie this kind of thing to the pid namespace I > recommend simplying denying it if you are in a user namespace without > an approrpriate pid namespace. AKA simply not allowing thigns to be setup > if current->pid_ns->user_ns != current->user_ns. > Can't be handled by simple capability like CAP_SYS_USERMODEHELPER ? User_ns check seems not to allow core-dump-cather in host will not work if user_ns is used. Thanks, -Kame ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers @ 2016-02-18 3:43 ` Kamezawa Hiroyuki 0 siblings, 0 replies; 49+ messages in thread From: Kamezawa Hiroyuki @ 2016-02-18 3:43 UTC (permalink / raw) To: Eric W. Biederman, Ian Kent Cc: Oleg Nesterov, Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bfields, bharrosh, Linux Containers On 2016/02/18 11:57, Eric W. Biederman wrote: > > Ccing The containers list because a related discussion is happening there > and somehow this thread has never made it there. > > Ian Kent <raven@themaw.net> writes: > >> On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: >>> On 11/15, Eric W. Biederman wrote: >>>> >>>> I don't understand that one. Having a preforked thread with the >>>> proper >>>> environment that can act like kthreadd in terms of spawning user >>>> mode >>>> helpers works and is simple. >> >> Forgive me replying to such an old thread but ... >> >> After realizing workqueues can't be used to pre-create threads to run >> usermode helpers I've returned to look at this. > > If someone can wind up with a good implementation I will be happy. > >>> Can't we ask ->child_reaper to create the non-daemonized kernel thread >>> with the "right" ->nsproxy, ->fs, etc? >> >> Eric, do you think this approach would be sufficient too? >> >> Probably wouldn't be quite right for user namespaces but should provide >> what's needed for other cases? >> >> It certainly has the advantage of not having to maintain a plague of >> processes waiting around to execute helpers. > > That certainly sounds attractive. Especially for the case of everyone > who wants to set a core pattern in a container. > > I am fuzzy on all of the details right now, but what I do remember is > that in the kernel the user mode helper concepts when they attempted to > scrub a processes environment were quite error prone until we managed to > get kthreadd(pid 2) on the scene which always had a clean environment. > > If we are going to tie this kind of thing to the pid namespace I > recommend simplying denying it if you are in a user namespace without > an approrpriate pid namespace. AKA simply not allowing thigns to be setup > if current->pid_ns->user_ns != current->user_ns. > Can't be handled by simple capability like CAP_SYS_USERMODEHELPER ? User_ns check seems not to allow core-dump-cather in host will not work if user_ns is used. Thanks, -Kame ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2016-02-18 3:43 ` Kamezawa Hiroyuki (?) @ 2016-02-18 6:36 ` Ian Kent 2016-02-18 7:37 ` Ian Kent [not found] ` <1455777387.3188.24.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> -1 siblings, 2 replies; 49+ messages in thread From: Ian Kent @ 2016-02-18 6:36 UTC (permalink / raw) To: Kamezawa Hiroyuki, Eric W. Biederman Cc: Oleg Nesterov, Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bfields, bharrosh, Linux Containers On Thu, 2016-02-18 at 12:43 +0900, Kamezawa Hiroyuki wrote: > On 2016/02/18 11:57, Eric W. Biederman wrote: > > > > Ccing The containers list because a related discussion is happening > > there > > and somehow this thread has never made it there. > > > > Ian Kent <raven@themaw.net> writes: > > > > > On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: > > > > On 11/15, Eric W. Biederman wrote: > > > > > > > > > > I don't understand that one. Having a preforked thread with > > > > > the > > > > > proper > > > > > environment that can act like kthreadd in terms of spawning > > > > > user > > > > > mode > > > > > helpers works and is simple. > > > > > > Forgive me replying to such an old thread but ... > > > > > > After realizing workqueues can't be used to pre-create threads to > > > run > > > usermode helpers I've returned to look at this. > > > > If someone can wind up with a good implementation I will be happy. > > > > > > Can't we ask ->child_reaper to create the non-daemonized kernel > > > > thread > > > > with the "right" ->nsproxy, ->fs, etc? > > > > > > Eric, do you think this approach would be sufficient too? > > > > > > Probably wouldn't be quite right for user namespaces but should > > > provide > > > what's needed for other cases? > > > > > > It certainly has the advantage of not having to maintain a plague > > > of > > > processes waiting around to execute helpers. > > > > That certainly sounds attractive. Especially for the case of > > everyone > > who wants to set a core pattern in a container. > > > > I am fuzzy on all of the details right now, but what I do remember > > is > > that in the kernel the user mode helper concepts when they attempted > > to > > scrub a processes environment were quite error prone until we > > managed to > > get kthreadd(pid 2) on the scene which always had a clean > > environment. > > > > If we are going to tie this kind of thing to the pid namespace I > > recommend simplying denying it if you are in a user namespace > > without > > an approrpriate pid namespace. AKA simply not allowing thigns to be > > setup > > if current->pid_ns->user_ns != current->user_ns. > > > Can't be handled by simple capability like CAP_SYS_USERMODEHELPER ? > > User_ns check seems not to allow core-dump-cather in host will not > work if user_ns is used. I don't think so but I'm not sure. The approach I was talking about assumes the init process of the caller (say within a container, corresponding to ->child_reaper) is an appropriate template for umh thread execution. But I don't think that covers the case where unshare has created different namespaces, like a mount namespace for example. The current workqueue sub system can't be used to pre-create a thread to be used for umh execution so, either is needs changes or yet another mechanism needs to be implemented. For uses other than core dumping capturing a reference to the struct pid of the environment init process and using that as an execution template should be sufficient and takes care of environment existence problems with some extra checks, not to mention eliminating the need for a potentially huge number of kernel threads needing to be created to provide execution templates. Where to store this and how to access it when needed is another problem. Not sure a usermode helper capability is the right thing either as I thought one important use of user namespaces was to allow unprivileged users to perform operations they otherwise can't. Maybe a CAP_SYS_USERNSCOREDUMP or similar would be sensible .... Still an appropriate execution template would be needed and IIUC we can't trust getting that from within a user created namespace environment. > > Thanks, > -Kame > ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2016-02-18 6:36 ` Ian Kent @ 2016-02-18 7:37 ` Ian Kent [not found] ` <1455781033.2908.5.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> [not found] ` <1455777387.3188.24.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> 1 sibling, 1 reply; 49+ messages in thread From: Ian Kent @ 2016-02-18 7:37 UTC (permalink / raw) To: Kamezawa Hiroyuki, Eric W. Biederman Cc: Oleg Nesterov, Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bfields, bharrosh, Linux Containers On Thu, 2016-02-18 at 14:36 +0800, Ian Kent wrote: > On Thu, 2016-02-18 at 12:43 +0900, Kamezawa Hiroyuki wrote: > > On 2016/02/18 11:57, Eric W. Biederman wrote: > > > > > > Ccing The containers list because a related discussion is > > > happening > > > there > > > and somehow this thread has never made it there. > > > > > > Ian Kent <raven@themaw.net> writes: > > > > > > > On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: > > > > > On 11/15, Eric W. Biederman wrote: > > > > > > > > > > > > I don't understand that one. Having a preforked thread with > > > > > > the > > > > > > proper > > > > > > environment that can act like kthreadd in terms of spawning > > > > > > user > > > > > > mode > > > > > > helpers works and is simple. > > > > > > > > Forgive me replying to such an old thread but ... > > > > > > > > After realizing workqueues can't be used to pre-create threads > > > > to > > > > run > > > > usermode helpers I've returned to look at this. > > > > > > If someone can wind up with a good implementation I will be happy. > > > > > > > > Can't we ask ->child_reaper to create the non-daemonized > > > > > kernel > > > > > thread > > > > > with the "right" ->nsproxy, ->fs, etc? > > > > > > > > Eric, do you think this approach would be sufficient too? > > > > > > > > Probably wouldn't be quite right for user namespaces but should > > > > provide > > > > what's needed for other cases? > > > > > > > > It certainly has the advantage of not having to maintain a > > > > plague > > > > of > > > > processes waiting around to execute helpers. > > > > > > That certainly sounds attractive. Especially for the case of > > > everyone > > > who wants to set a core pattern in a container. > > > > > > I am fuzzy on all of the details right now, but what I do remember > > > is > > > that in the kernel the user mode helper concepts when they > > > attempted > > > to > > > scrub a processes environment were quite error prone until we > > > managed to > > > get kthreadd(pid 2) on the scene which always had a clean > > > environment. > > > > > > If we are going to tie this kind of thing to the pid namespace I > > > recommend simplying denying it if you are in a user namespace > > > without > > > an approrpriate pid namespace. AKA simply not allowing thigns to > > > be > > > setup > > > if current->pid_ns->user_ns != current->user_ns. > > > > > Can't be handled by simple capability like CAP_SYS_USERMODEHELPER ? > > > > User_ns check seems not to allow core-dump-cather in host will not > > work if user_ns is used. > > I don't think so but I'm not sure. > > The approach I was talking about assumes the init process of the > caller > (say within a container, corresponding to ->child_reaper) is an > appropriate template for umh thread execution. > > But I don't think that covers the case where unshare has created > different namespaces, like a mount namespace for example. > > The current workqueue sub system can't be used to pre-create a thread > to > be used for umh execution so, either is needs changes or yet another > mechanism needs to be implemented. > > For uses other than core dumping capturing a reference to the struct > pid > of the environment init process and using that as an execution > template > should be sufficient and takes care of environment existence problems > with some extra checks, not to mention eliminating the need for a > potentially huge number of kernel threads needing to be created to > provide execution templates. > > Where to store this and how to access it when needed is another > problem. > > Not sure a usermode helper capability is the right thing either as I > thought one important use of user namespaces was to allow unprivileged > users to perform operations they otherwise can't. > > Maybe a CAP_SYS_USERNSCOREDUMP or similar would be sensible .... > > Still an appropriate execution template would be needed and IIUC we > can't trust getting that from within a user created namespace > environment. Perhaps, if a struct cred could be captured at some appropriate time that could be used to cater for user namespaces. Eric, do you think that would be possible to do without allowing users to circumvent security? > > > > > Thanks, > > -Kame ^ permalink raw reply [flat|nested] 49+ messages in thread
[parent not found: <1455781033.2908.5.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org>]
* Re: call_usermodehelper in containers 2016-02-18 7:37 ` Ian Kent @ 2016-02-18 20:45 ` Eric W. Biederman 0 siblings, 0 replies; 49+ messages in thread From: Eric W. Biederman @ 2016-02-18 20:45 UTC (permalink / raw) To: Ian Kent Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky, Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA, bfields-uC3wQj2KruNg9hUCZPvPmw, devel-GEFAQzZX7r8dnm+yROfE0A, bharrosh-C4P08NqkoRlBDgjK7y7TUQ, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA Ian Kent <raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> writes: > On Thu, 2016-02-18 at 14:36 +0800, Ian Kent wrote: >> On Thu, 2016-02-18 at 12:43 +0900, Kamezawa Hiroyuki wrote: >> > On 2016/02/18 11:57, Eric W. Biederman wrote: >> > > >> > > Ccing The containers list because a related discussion is >> > > happening >> > > there >> > > and somehow this thread has never made it there. >> > > >> > > Ian Kent <raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> writes: >> > > >> > > > On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: >> > > > > On 11/15, Eric W. Biederman wrote: >> > > > > > >> > > > > > I don't understand that one. Having a preforked thread with >> > > > > > the >> > > > > > proper >> > > > > > environment that can act like kthreadd in terms of spawning >> > > > > > user >> > > > > > mode >> > > > > > helpers works and is simple. >> > > > >> > > > Forgive me replying to such an old thread but ... >> > > > >> > > > After realizing workqueues can't be used to pre-create threads >> > > > to >> > > > run >> > > > usermode helpers I've returned to look at this. >> > > >> > > If someone can wind up with a good implementation I will be happy. >> > > >> > > > > Can't we ask ->child_reaper to create the non-daemonized >> > > > > kernel >> > > > > thread >> > > > > with the "right" ->nsproxy, ->fs, etc? >> > > > >> > > > Eric, do you think this approach would be sufficient too? >> > > > >> > > > Probably wouldn't be quite right for user namespaces but should >> > > > provide >> > > > what's needed for other cases? >> > > > >> > > > It certainly has the advantage of not having to maintain a >> > > > plague >> > > > of >> > > > processes waiting around to execute helpers. >> > > >> > > That certainly sounds attractive. Especially for the case of >> > > everyone >> > > who wants to set a core pattern in a container. >> > > >> > > I am fuzzy on all of the details right now, but what I do remember >> > > is >> > > that in the kernel the user mode helper concepts when they >> > > attempted >> > > to >> > > scrub a processes environment were quite error prone until we >> > > managed to >> > > get kthreadd(pid 2) on the scene which always had a clean >> > > environment. >> > > >> > > If we are going to tie this kind of thing to the pid namespace I >> > > recommend simplying denying it if you are in a user namespace >> > > without >> > > an approrpriate pid namespace. AKA simply not allowing thigns to >> > > be >> > > setup >> > > if current->pid_ns->user_ns != current->user_ns. >> > > >> > Can't be handled by simple capability like CAP_SYS_USERMODEHELPER ? I wasn't talking about a capability I was talking about how to identify where the user mode helper lives. >> > User_ns check seems not to allow core-dump-cather in host will not >> > work if user_ns is used. The bottom line is all of this approaches non-sense if user namespaces are not used. If you just have a pid namespace or a mount namespace (or perhaps both) and your fire off a new fangled user mode helper you get a deep problem. The user space process started to handle your core dump or your nfs callback will have a full set of capabilities (because it is still in the root user namespace). With a full set of capabilities and perhaps a little luck there is no containment. The imperfect solution that currently exists for the core dump helper is to provide enough information to the user space application that it can query and find out the context of the core dumping application and keep everything in that application sandbox if it so desires. I expect something similar could be done for other user mode helper style callbacks. To make starting the user space application other than how we do today needs a good argument that you are you can allow a lesser privileged process set things up and that it can be exploited to gain privielge. >> I don't think so but I'm not sure. >> >> The approach I was talking about assumes the init process of the >> caller >> (say within a container, corresponding to ->child_reaper) is an >> appropriate template for umh thread execution. >> >> But I don't think that covers the case where unshare has created >> different namespaces, like a mount namespace for example. >> >> The current workqueue sub system can't be used to pre-create a thread >> to >> be used for umh execution so, either is needs changes or yet another >> mechanism needs to be implemented. >> >> For uses other than core dumping capturing a reference to the struct >> pid >> of the environment init process and using that as an execution >> template >> should be sufficient and takes care of environment existence problems >> with some extra checks, not to mention eliminating the need for a >> potentially huge number of kernel threads needing to be created to >> provide execution templates. >> >> Where to store this and how to access it when needed is another >> problem. >> >> Not sure a usermode helper capability is the right thing either as I >> thought one important use of user namespaces was to allow unprivileged >> users to perform operations they otherwise can't. >> >> Maybe a CAP_SYS_USERNSCOREDUMP or similar would be sensible .... >> >> Still an appropriate execution template would be needed and IIUC we >> can't trust getting that from within a user created namespace >> environment. > > Perhaps, if a struct cred could be captured at some appropriate time > that could be used to cater for user namespaces. > > Eric, do you think that would be possible to do without allowing users > to circumvent security? The general problem with capturing less than a full process is that we always mess it up and forget to capture something important. In a lot of ways this is a very simpilar problem to setting up an at job or a cron job. You build a script you test it then you tell at to run it at a certain time and it fails, because your working environment did not include something important that was in your actuall environment. Unfortunately in this case the failures we are talking about are container escapes and privilege escalation, so we do need to tread carefully. We might be able to safely define the context as the context of the currently running init process (Which we can identifiy with a struct pid). Justifying that looks a little trickier but doable. After a mechanism is picked it simply becomes a case of making certain your permission checks for starting something are in sync with your mechanism. Personally I am a fan of the don't be clever and capture a kernel thread approach as it is very easy to see you what if any exploitation opportunities there are. The justifications for something more clever is trickier. Of course we do something that from this perspective would be considered ``clever'' today with kthreadd and user mode helpers. Eric ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers @ 2016-02-18 20:45 ` Eric W. Biederman 0 siblings, 0 replies; 49+ messages in thread From: Eric W. Biederman @ 2016-02-18 20:45 UTC (permalink / raw) To: Ian Kent Cc: Kamezawa Hiroyuki, Oleg Nesterov, Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bfields, bharrosh, Linux Containers Ian Kent <raven@themaw.net> writes: > On Thu, 2016-02-18 at 14:36 +0800, Ian Kent wrote: >> On Thu, 2016-02-18 at 12:43 +0900, Kamezawa Hiroyuki wrote: >> > On 2016/02/18 11:57, Eric W. Biederman wrote: >> > > >> > > Ccing The containers list because a related discussion is >> > > happening >> > > there >> > > and somehow this thread has never made it there. >> > > >> > > Ian Kent <raven@themaw.net> writes: >> > > >> > > > On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: >> > > > > On 11/15, Eric W. Biederman wrote: >> > > > > > >> > > > > > I don't understand that one. Having a preforked thread with >> > > > > > the >> > > > > > proper >> > > > > > environment that can act like kthreadd in terms of spawning >> > > > > > user >> > > > > > mode >> > > > > > helpers works and is simple. >> > > > >> > > > Forgive me replying to such an old thread but ... >> > > > >> > > > After realizing workqueues can't be used to pre-create threads >> > > > to >> > > > run >> > > > usermode helpers I've returned to look at this. >> > > >> > > If someone can wind up with a good implementation I will be happy. >> > > >> > > > > Can't we ask ->child_reaper to create the non-daemonized >> > > > > kernel >> > > > > thread >> > > > > with the "right" ->nsproxy, ->fs, etc? >> > > > >> > > > Eric, do you think this approach would be sufficient too? >> > > > >> > > > Probably wouldn't be quite right for user namespaces but should >> > > > provide >> > > > what's needed for other cases? >> > > > >> > > > It certainly has the advantage of not having to maintain a >> > > > plague >> > > > of >> > > > processes waiting around to execute helpers. >> > > >> > > That certainly sounds attractive. Especially for the case of >> > > everyone >> > > who wants to set a core pattern in a container. >> > > >> > > I am fuzzy on all of the details right now, but what I do remember >> > > is >> > > that in the kernel the user mode helper concepts when they >> > > attempted >> > > to >> > > scrub a processes environment were quite error prone until we >> > > managed to >> > > get kthreadd(pid 2) on the scene which always had a clean >> > > environment. >> > > >> > > If we are going to tie this kind of thing to the pid namespace I >> > > recommend simplying denying it if you are in a user namespace >> > > without >> > > an approrpriate pid namespace. AKA simply not allowing thigns to >> > > be >> > > setup >> > > if current->pid_ns->user_ns != current->user_ns. >> > > >> > Can't be handled by simple capability like CAP_SYS_USERMODEHELPER ? I wasn't talking about a capability I was talking about how to identify where the user mode helper lives. >> > User_ns check seems not to allow core-dump-cather in host will not >> > work if user_ns is used. The bottom line is all of this approaches non-sense if user namespaces are not used. If you just have a pid namespace or a mount namespace (or perhaps both) and your fire off a new fangled user mode helper you get a deep problem. The user space process started to handle your core dump or your nfs callback will have a full set of capabilities (because it is still in the root user namespace). With a full set of capabilities and perhaps a little luck there is no containment. The imperfect solution that currently exists for the core dump helper is to provide enough information to the user space application that it can query and find out the context of the core dumping application and keep everything in that application sandbox if it so desires. I expect something similar could be done for other user mode helper style callbacks. To make starting the user space application other than how we do today needs a good argument that you are you can allow a lesser privileged process set things up and that it can be exploited to gain privielge. >> I don't think so but I'm not sure. >> >> The approach I was talking about assumes the init process of the >> caller >> (say within a container, corresponding to ->child_reaper) is an >> appropriate template for umh thread execution. >> >> But I don't think that covers the case where unshare has created >> different namespaces, like a mount namespace for example. >> >> The current workqueue sub system can't be used to pre-create a thread >> to >> be used for umh execution so, either is needs changes or yet another >> mechanism needs to be implemented. >> >> For uses other than core dumping capturing a reference to the struct >> pid >> of the environment init process and using that as an execution >> template >> should be sufficient and takes care of environment existence problems >> with some extra checks, not to mention eliminating the need for a >> potentially huge number of kernel threads needing to be created to >> provide execution templates. >> >> Where to store this and how to access it when needed is another >> problem. >> >> Not sure a usermode helper capability is the right thing either as I >> thought one important use of user namespaces was to allow unprivileged >> users to perform operations they otherwise can't. >> >> Maybe a CAP_SYS_USERNSCOREDUMP or similar would be sensible .... >> >> Still an appropriate execution template would be needed and IIUC we >> can't trust getting that from within a user created namespace >> environment. > > Perhaps, if a struct cred could be captured at some appropriate time > that could be used to cater for user namespaces. > > Eric, do you think that would be possible to do without allowing users > to circumvent security? The general problem with capturing less than a full process is that we always mess it up and forget to capture something important. In a lot of ways this is a very simpilar problem to setting up an at job or a cron job. You build a script you test it then you tell at to run it at a certain time and it fails, because your working environment did not include something important that was in your actuall environment. Unfortunately in this case the failures we are talking about are container escapes and privilege escalation, so we do need to tread carefully. We might be able to safely define the context as the context of the currently running init process (Which we can identifiy with a struct pid). Justifying that looks a little trickier but doable. After a mechanism is picked it simply becomes a case of making certain your permission checks for starting something are in sync with your mechanism. Personally I am a fan of the don't be clever and capture a kernel thread approach as it is very easy to see you what if any exploitation opportunities there are. The justifications for something more clever is trickier. Of course we do something that from this perspective would be considered ``clever'' today with kthreadd and user mode helpers. Eric ^ permalink raw reply [flat|nested] 49+ messages in thread
[parent not found: <87r3g9ychc.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* Re: call_usermodehelper in containers 2016-02-18 20:45 ` Eric W. Biederman @ 2016-02-19 3:08 ` Kamezawa Hiroyuki -1 siblings, 0 replies; 49+ messages in thread From: Kamezawa Hiroyuki @ 2016-02-19 3:08 UTC (permalink / raw) To: Eric W. Biederman, Ian Kent Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky, Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA, bfields-uC3wQj2KruNg9hUCZPvPmw, bharrosh-C4P08NqkoRlBDgjK7y7TUQ, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, devel-GEFAQzZX7r8dnm+yROfE0A On 2016/02/19 5:45, Eric W. Biederman wrote: > Personally I am a fan of the don't be clever and capture a kernel thread > approach as it is very easy to see you what if any exploitation > opportunities there are. The justifications for something more clever > is trickier. Of course we do something that from this perspective would > be considered ``clever'' today with kthreadd and user mode helpers. > I read old discussion....let me allow clarification to create a helper kernel thread to run usermodehelper with using kthreadd. 0) define a trigger to create an independent usermodehelper environment for a container. Option A) at creating some namespace (pid, uid, etc...) Option B) at creating a new nsproxy Option C).at a new systemcall is called or some sysctl, make_private_usermode_helper() or some, It's expected this should be triggered by init process of a container with some capability. And scope of the effect should be defined. pid namespace ? nsporxy ? or new namespace ? 1) create a helper thread. task = kthread_create(kthread_work_fn, ?, ?, "usermodehelper") switch task's nsproxy to current.(swtich_task_namespaces()) switch task's cgroups to current (cgroup_attach_task_all()) switch task's cred to current. copy task's capability from current (and any other ?) wake_up_process() And create a link between kthread_wq and container. 2) modify call_usermodehelper() to use kthread_worker .... It seems the problem is which object container private user mode helper should be tied to. Regards, -Kame ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers @ 2016-02-19 3:08 ` Kamezawa Hiroyuki 0 siblings, 0 replies; 49+ messages in thread From: Kamezawa Hiroyuki @ 2016-02-19 3:08 UTC (permalink / raw) To: Eric W. Biederman, Ian Kent Cc: Oleg Nesterov, Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bfields, bharrosh, Linux Containers On 2016/02/19 5:45, Eric W. Biederman wrote: > Personally I am a fan of the don't be clever and capture a kernel thread > approach as it is very easy to see you what if any exploitation > opportunities there are. The justifications for something more clever > is trickier. Of course we do something that from this perspective would > be considered ``clever'' today with kthreadd and user mode helpers. > I read old discussion....let me allow clarification to create a helper kernel thread to run usermodehelper with using kthreadd. 0) define a trigger to create an independent usermodehelper environment for a container. Option A) at creating some namespace (pid, uid, etc...) Option B) at creating a new nsproxy Option C).at a new systemcall is called or some sysctl, make_private_usermode_helper() or some, It's expected this should be triggered by init process of a container with some capability. And scope of the effect should be defined. pid namespace ? nsporxy ? or new namespace ? 1) create a helper thread. task = kthread_create(kthread_work_fn, ?, ?, "usermodehelper") switch task's nsproxy to current.(swtich_task_namespaces()) switch task's cgroups to current (cgroup_attach_task_all()) switch task's cred to current. copy task's capability from current (and any other ?) wake_up_process() And create a link between kthread_wq and container. 2) modify call_usermodehelper() to use kthread_worker .... It seems the problem is which object container private user mode helper should be tied to. Regards, -Kame ^ permalink raw reply [flat|nested] 49+ messages in thread
[parent not found: <56C68714.2000900-+CUm20s59erQFUHtdCDX3A@public.gmane.org>]
* Re: call_usermodehelper in containers [not found] ` <56C68714.2000900-+CUm20s59erQFUHtdCDX3A@public.gmane.org> @ 2016-02-19 5:37 ` Ian Kent 0 siblings, 0 replies; 49+ messages in thread From: Ian Kent @ 2016-02-19 5:37 UTC (permalink / raw) To: Kamezawa Hiroyuki, Eric W. Biederman Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky, Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA, bfields-uC3wQj2KruNg9hUCZPvPmw, bharrosh-C4P08NqkoRlBDgjK7y7TUQ, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, devel-GEFAQzZX7r8dnm+yROfE0A On Fri, 2016-02-19 at 12:08 +0900, Kamezawa Hiroyuki wrote: > On 2016/02/19 5:45, Eric W. Biederman wrote: > > Personally I am a fan of the don't be clever and capture a kernel > > thread > > approach as it is very easy to see you what if any exploitation > > opportunities there are. The justifications for something more > > clever > > is trickier. Of course we do something that from this perspective > > would > > be considered ``clever'' today with kthreadd and user mode helpers. > > > > I read old discussion....let me allow clarification to create a > helper kernel thread > to run usermodehelper with using kthreadd. > > 0) define a trigger to create an independent usermodehelper > environment for a container. > Option A) at creating some namespace (pid, uid, etc...) > Option B) at creating a new nsproxy > Option C).at a new systemcall is called or some sysctl, > make_private_usermode_helper() or some, > > It's expected this should be triggered by init process of a > container with some capability. > And scope of the effect should be defined. pid namespace ? nsporxy ? > or new namespace ? > > 1) create a helper thread. > task = kthread_create(kthread_work_fn, ?, ?, "usermodehelper") > switch task's nsproxy to current.(swtich_task_namespaces()) > switch task's cgroups to current (cgroup_attach_task_all()) > switch task's cred to current. > copy task's capability from current > (and any other ?) > wake_up_process() > > And create a link between kthread_wq and container. Not sure I quite understand this but I thought the difficulty with this approach previously (even though the approach was very much incomplete) was knowing that all the "moving parts" would not allow vulnerabilities. And it looks like this would require a kernel thread for each instance. So for a thousand containers that each mount an NFS mount that means, at least, 1000 additional kernel threads. Might be able to sell that, if we were lucky, but from an system administration POV it's horrible. There's also the question of existence (aka. lifetime) to deal with since the thread above needs to be created at a time other than the usermode helper callback. What happens for SIGKILL on a container? > 2) modify call_usermodehelper() to use kthread_worker > .... > > It seems the problem is which object container private user mode > helper should be tied to. > > Regards, > -Kame ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2016-02-19 3:08 ` Kamezawa Hiroyuki (?) (?) @ 2016-02-19 5:37 ` Ian Kent [not found] ` <1455860260.3356.31.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> -1 siblings, 1 reply; 49+ messages in thread From: Ian Kent @ 2016-02-19 5:37 UTC (permalink / raw) To: Kamezawa Hiroyuki, Eric W. Biederman Cc: Oleg Nesterov, Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bfields, bharrosh, Linux Containers On Fri, 2016-02-19 at 12:08 +0900, Kamezawa Hiroyuki wrote: > On 2016/02/19 5:45, Eric W. Biederman wrote: > > Personally I am a fan of the don't be clever and capture a kernel > > thread > > approach as it is very easy to see you what if any exploitation > > opportunities there are. The justifications for something more > > clever > > is trickier. Of course we do something that from this perspective > > would > > be considered ``clever'' today with kthreadd and user mode helpers. > > > > I read old discussion....let me allow clarification to create a > helper kernel thread > to run usermodehelper with using kthreadd. > > 0) define a trigger to create an independent usermodehelper > environment for a container. > Option A) at creating some namespace (pid, uid, etc...) > Option B) at creating a new nsproxy > Option C).at a new systemcall is called or some sysctl, > make_private_usermode_helper() or some, > > It's expected this should be triggered by init process of a > container with some capability. > And scope of the effect should be defined. pid namespace ? nsporxy ? > or new namespace ? > > 1) create a helper thread. > task = kthread_create(kthread_work_fn, ?, ?, "usermodehelper") > switch task's nsproxy to current.(swtich_task_namespaces()) > switch task's cgroups to current (cgroup_attach_task_all()) > switch task's cred to current. > copy task's capability from current > (and any other ?) > wake_up_process() > > And create a link between kthread_wq and container. Not sure I quite understand this but I thought the difficulty with this approach previously (even though the approach was very much incomplete) was knowing that all the "moving parts" would not allow vulnerabilities. And it looks like this would require a kernel thread for each instance. So for a thousand containers that each mount an NFS mount that means, at least, 1000 additional kernel threads. Might be able to sell that, if we were lucky, but from an system administration POV it's horrible. There's also the question of existence (aka. lifetime) to deal with since the thread above needs to be created at a time other than the usermode helper callback. What happens for SIGKILL on a container? > 2) modify call_usermodehelper() to use kthread_worker > .... > > It seems the problem is which object container private user mode > helper should be tied to. > > Regards, > -Kame ^ permalink raw reply [flat|nested] 49+ messages in thread
[parent not found: <1455860260.3356.31.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org>]
* Re: call_usermodehelper in containers 2016-02-19 5:37 ` Ian Kent @ 2016-02-19 9:30 ` Kamezawa Hiroyuki 0 siblings, 0 replies; 49+ messages in thread From: Kamezawa Hiroyuki @ 2016-02-19 9:30 UTC (permalink / raw) To: Ian Kent, Eric W. Biederman Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky, Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA, bfields-uC3wQj2KruNg9hUCZPvPmw, bharrosh-C4P08NqkoRlBDgjK7y7TUQ, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, devel-GEFAQzZX7r8dnm+yROfE0A On 2016/02/19 14:37, Ian Kent wrote: > On Fri, 2016-02-19 at 12:08 +0900, Kamezawa Hiroyuki wrote: >> On 2016/02/19 5:45, Eric W. Biederman wrote: >>> Personally I am a fan of the don't be clever and capture a kernel >>> thread >>> approach as it is very easy to see you what if any exploitation >>> opportunities there are. The justifications for something more >>> clever >>> is trickier. Of course we do something that from this perspective >>> would >>> be considered ``clever'' today with kthreadd and user mode helpers. >>> >> >> I read old discussion....let me allow clarification to create a >> helper kernel thread >> to run usermodehelper with using kthreadd. >> >> 0) define a trigger to create an independent usermodehelper >> environment for a container. >> Option A) at creating some namespace (pid, uid, etc...) >> Option B) at creating a new nsproxy >> Option C).at a new systemcall is called or some sysctl, >> make_private_usermode_helper() or some, >> >> It's expected this should be triggered by init process of a >> container with some capability. >> And scope of the effect should be defined. pid namespace ? nsporxy ? >> or new namespace ? >> >> 1) create a helper thread. >> task = kthread_create(kthread_work_fn, ?, ?, "usermodehelper") >> switch task's nsproxy to current.(swtich_task_namespaces()) >> switch task's cgroups to current (cgroup_attach_task_all()) >> switch task's cred to current. >> copy task's capability from current >> (and any other ?) >> wake_up_process() >> >> And create a link between kthread_wq and container. > > Not sure I quite understand this but I thought the difficulty with this > approach previously (even though the approach was very much incomplete) > was knowing that all the "moving parts" would not allow vulnerabilities. > Ok, that was discussed. > And it looks like this would require a kernel thread for each instance. > So for a thousand containers that each mount an NFS mount that means, at > least, 1000 additional kernel threads. Might be able to sell that, if we > were lucky, but from an system administration POV it's horrible. > I agree. > There's also the question of existence (aka. lifetime) to deal with > since the thread above needs to be created at a time other than the > usermode helper callback. > > What happens for SIGKILL on a container? > It depends on how the helper kthread is tied to a container related object. If kthread is linked with some namespace, we can kill it when a namespace goes away. So, with your opinion, - a helper thread should be spawned on demand - the lifetime of it should be clear. It will be good to have as same life time as the container. I wonder there is no solution for "moving part" problem other than calling do_fork() or copy_process() with container's init process context if we do all in the kernel. Is that possible ? Thanks, -Kame ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers @ 2016-02-19 9:30 ` Kamezawa Hiroyuki 0 siblings, 0 replies; 49+ messages in thread From: Kamezawa Hiroyuki @ 2016-02-19 9:30 UTC (permalink / raw) To: Ian Kent, Eric W. Biederman Cc: Oleg Nesterov, Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bfields, bharrosh, Linux Containers On 2016/02/19 14:37, Ian Kent wrote: > On Fri, 2016-02-19 at 12:08 +0900, Kamezawa Hiroyuki wrote: >> On 2016/02/19 5:45, Eric W. Biederman wrote: >>> Personally I am a fan of the don't be clever and capture a kernel >>> thread >>> approach as it is very easy to see you what if any exploitation >>> opportunities there are. The justifications for something more >>> clever >>> is trickier. Of course we do something that from this perspective >>> would >>> be considered ``clever'' today with kthreadd and user mode helpers. >>> >> >> I read old discussion....let me allow clarification to create a >> helper kernel thread >> to run usermodehelper with using kthreadd. >> >> 0) define a trigger to create an independent usermodehelper >> environment for a container. >> Option A) at creating some namespace (pid, uid, etc...) >> Option B) at creating a new nsproxy >> Option C).at a new systemcall is called or some sysctl, >> make_private_usermode_helper() or some, >> >> It's expected this should be triggered by init process of a >> container with some capability. >> And scope of the effect should be defined. pid namespace ? nsporxy ? >> or new namespace ? >> >> 1) create a helper thread. >> task = kthread_create(kthread_work_fn, ?, ?, "usermodehelper") >> switch task's nsproxy to current.(swtich_task_namespaces()) >> switch task's cgroups to current (cgroup_attach_task_all()) >> switch task's cred to current. >> copy task's capability from current >> (and any other ?) >> wake_up_process() >> >> And create a link between kthread_wq and container. > > Not sure I quite understand this but I thought the difficulty with this > approach previously (even though the approach was very much incomplete) > was knowing that all the "moving parts" would not allow vulnerabilities. > Ok, that was discussed. > And it looks like this would require a kernel thread for each instance. > So for a thousand containers that each mount an NFS mount that means, at > least, 1000 additional kernel threads. Might be able to sell that, if we > were lucky, but from an system administration POV it's horrible. > I agree. > There's also the question of existence (aka. lifetime) to deal with > since the thread above needs to be created at a time other than the > usermode helper callback. > > What happens for SIGKILL on a container? > It depends on how the helper kthread is tied to a container related object. If kthread is linked with some namespace, we can kill it when a namespace goes away. So, with your opinion, - a helper thread should be spawned on demand - the lifetime of it should be clear. It will be good to have as same life time as the container. I wonder there is no solution for "moving part" problem other than calling do_fork() or copy_process() with container's init process context if we do all in the kernel. Is that possible ? Thanks, -Kame ^ permalink raw reply [flat|nested] 49+ messages in thread
[parent not found: <56C6E0A8.3010806-+CUm20s59erQFUHtdCDX3A@public.gmane.org>]
* Re: call_usermodehelper in containers 2016-02-19 9:30 ` Kamezawa Hiroyuki @ 2016-02-20 3:28 ` Ian Kent -1 siblings, 0 replies; 49+ messages in thread From: Ian Kent @ 2016-02-20 3:28 UTC (permalink / raw) To: Kamezawa Hiroyuki, Eric W. Biederman Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky, Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA, bfields-uC3wQj2KruNg9hUCZPvPmw, bharrosh-C4P08NqkoRlBDgjK7y7TUQ, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, devel-GEFAQzZX7r8dnm+yROfE0A On Fri, 2016-02-19 at 18:30 +0900, Kamezawa Hiroyuki wrote: > On 2016/02/19 14:37, Ian Kent wrote: > > On Fri, 2016-02-19 at 12:08 +0900, Kamezawa Hiroyuki wrote: > > > On 2016/02/19 5:45, Eric W. Biederman wrote: > > > > Personally I am a fan of the don't be clever and capture a > > > > kernel > > > > thread > > > > approach as it is very easy to see you what if any exploitation > > > > opportunities there are. The justifications for something more > > > > clever > > > > is trickier. Of course we do something that from this > > > > perspective > > > > would > > > > be considered ``clever'' today with kthreadd and user mode > > > > helpers. > > > > > > > > > > I read old discussion....let me allow clarification to create a > > > helper kernel thread > > > to run usermodehelper with using kthreadd. > > > > > > 0) define a trigger to create an independent usermodehelper > > > environment for a container. > > > Option A) at creating some namespace (pid, uid, etc...) > > > Option B) at creating a new nsproxy > > > Option C).at a new systemcall is called or some sysctl, > > > make_private_usermode_helper() or some, > > > > > > It's expected this should be triggered by init process of a > > > container with some capability. > > > And scope of the effect should be defined. pid namespace ? > > > nsporxy ? > > > or new namespace ? > > > > > > 1) create a helper thread. > > > task = kthread_create(kthread_work_fn, ?, ?, "usermodehelper") > > > switch task's nsproxy to current.(swtich_task_namespaces()) > > > switch task's cgroups to current (cgroup_attach_task_all()) > > > switch task's cred to current. > > > copy task's capability from current > > > (and any other ?) > > > wake_up_process() > > > > > > And create a link between kthread_wq and container. > > > > Not sure I quite understand this but I thought the difficulty with > > this > > approach previously (even though the approach was very much > > incomplete) > > was knowing that all the "moving parts" would not allow > > vulnerabilities. > > > Ok, that was discussed. > > > And it looks like this would require a kernel thread for each > > instance. > > So for a thousand containers that each mount an NFS mount that > > means, at > > least, 1000 additional kernel threads. Might be able to sell that, > > if we > > were lucky, but from an system administration POV it's horrible. > > > I agree. > > > There's also the question of existence (aka. lifetime) to deal with > > since the thread above needs to be created at a time other than the > > usermode helper callback. > > > > What happens for SIGKILL on a container? > > First understand that the fork and workqueue code is not something I've needed to look at in the past so it's still quite new to me even now. > It depends on how the helper kthread is tied to a container related > object. > If kthread is linked with some namespace, we can kill it when a > namespace > goes away. I don't know how to do that so without knowing any better I assume it could be difficult and complicated but, of course, I don't know. > > So, with your opinion, > - a helper thread should be spawned on demand > - the lifetime of it should be clear. It will be good to have as > same life time as the container. This was always what I believed to be the best way to do it but ... Not sure you've seen the other threads on this by me so let me provide some history. I started out posting a series (totally untested, an RFC only) in the hope of finding a way to do this. After a few iterations that lead to the conclusion that a kernel thread would need to be created to provide context for subsequent helper execution (for every distinct context), much the same as we have here, and that the init process of the required context would probably be sufficient for this, required as the environment of the thread requesting helper execution itself could be used subvert execution. I ended up accepting that even if I could work out what needed to be captured and work out what needed to be done to switch to the namspace(s) and other bits that would be high maintenance as it would be fairly complicated and subsystems may be added or changed over time. Also I had assumed a singlethread workqueue would create a single thread for helper execution which was wrong. After realizing what I had was far from what's needed I went back and started reviewing the previous threads. That lead me to following a link Oleg had posted to this thread where I finally saw his suggestion about using ->child_reaper as the execution template. That really got my attention because of its simplicity and that's why I want to give that a try now and see where it leads. However user namespaces do sound like a problem even with this. Having finally got a simple test scenario I see now that the palaces I use to capture the information used to run the helper is also wrong but that's less important than getting an execution method that works, is safe, and is as simple as it can be. > > I wonder there is no solution for "moving part" problem other than > calling > do_fork() or copy_process() with container's init process context if > we do all in the kernel. Not sure I understand this but I believe that ultimately there will be the equivalent of a fork (perhaps two) and exec (we need to exec the helper anyway) no matter how this is done. For example, IIUC, a fork must be done to change pid namespace but a template like the container init process would already have that pid namespace in cases other than possibly user namespaces. I hope I understood what you were asking and haven't needlessly rambled on, ;) Ian ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers @ 2016-02-20 3:28 ` Ian Kent 0 siblings, 0 replies; 49+ messages in thread From: Ian Kent @ 2016-02-20 3:28 UTC (permalink / raw) To: Kamezawa Hiroyuki, Eric W. Biederman Cc: Oleg Nesterov, Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bfields, bharrosh, Linux Containers On Fri, 2016-02-19 at 18:30 +0900, Kamezawa Hiroyuki wrote: > On 2016/02/19 14:37, Ian Kent wrote: > > On Fri, 2016-02-19 at 12:08 +0900, Kamezawa Hiroyuki wrote: > > > On 2016/02/19 5:45, Eric W. Biederman wrote: > > > > Personally I am a fan of the don't be clever and capture a > > > > kernel > > > > thread > > > > approach as it is very easy to see you what if any exploitation > > > > opportunities there are. The justifications for something more > > > > clever > > > > is trickier. Of course we do something that from this > > > > perspective > > > > would > > > > be considered ``clever'' today with kthreadd and user mode > > > > helpers. > > > > > > > > > > I read old discussion....let me allow clarification to create a > > > helper kernel thread > > > to run usermodehelper with using kthreadd. > > > > > > 0) define a trigger to create an independent usermodehelper > > > environment for a container. > > > Option A) at creating some namespace (pid, uid, etc...) > > > Option B) at creating a new nsproxy > > > Option C).at a new systemcall is called or some sysctl, > > > make_private_usermode_helper() or some, > > > > > > It's expected this should be triggered by init process of a > > > container with some capability. > > > And scope of the effect should be defined. pid namespace ? > > > nsporxy ? > > > or new namespace ? > > > > > > 1) create a helper thread. > > > task = kthread_create(kthread_work_fn, ?, ?, "usermodehelper") > > > switch task's nsproxy to current.(swtich_task_namespaces()) > > > switch task's cgroups to current (cgroup_attach_task_all()) > > > switch task's cred to current. > > > copy task's capability from current > > > (and any other ?) > > > wake_up_process() > > > > > > And create a link between kthread_wq and container. > > > > Not sure I quite understand this but I thought the difficulty with > > this > > approach previously (even though the approach was very much > > incomplete) > > was knowing that all the "moving parts" would not allow > > vulnerabilities. > > > Ok, that was discussed. > > > And it looks like this would require a kernel thread for each > > instance. > > So for a thousand containers that each mount an NFS mount that > > means, at > > least, 1000 additional kernel threads. Might be able to sell that, > > if we > > were lucky, but from an system administration POV it's horrible. > > > I agree. > > > There's also the question of existence (aka. lifetime) to deal with > > since the thread above needs to be created at a time other than the > > usermode helper callback. > > > > What happens for SIGKILL on a container? > > First understand that the fork and workqueue code is not something I've needed to look at in the past so it's still quite new to me even now. > It depends on how the helper kthread is tied to a container related > object. > If kthread is linked with some namespace, we can kill it when a > namespace > goes away. I don't know how to do that so without knowing any better I assume it could be difficult and complicated but, of course, I don't know. > > So, with your opinion, > - a helper thread should be spawned on demand > - the lifetime of it should be clear. It will be good to have as > same life time as the container. This was always what I believed to be the best way to do it but ... Not sure you've seen the other threads on this by me so let me provide some history. I started out posting a series (totally untested, an RFC only) in the hope of finding a way to do this. After a few iterations that lead to the conclusion that a kernel thread would need to be created to provide context for subsequent helper execution (for every distinct context), much the same as we have here, and that the init process of the required context would probably be sufficient for this, required as the environment of the thread requesting helper execution itself could be used subvert execution. I ended up accepting that even if I could work out what needed to be captured and work out what needed to be done to switch to the namspace(s) and other bits that would be high maintenance as it would be fairly complicated and subsystems may be added or changed over time. Also I had assumed a singlethread workqueue would create a single thread for helper execution which was wrong. After realizing what I had was far from what's needed I went back and started reviewing the previous threads. That lead me to following a link Oleg had posted to this thread where I finally saw his suggestion about using ->child_reaper as the execution template. That really got my attention because of its simplicity and that's why I want to give that a try now and see where it leads. However user namespaces do sound like a problem even with this. Having finally got a simple test scenario I see now that the palaces I use to capture the information used to run the helper is also wrong but that's less important than getting an execution method that works, is safe, and is as simple as it can be. > > I wonder there is no solution for "moving part" problem other than > calling > do_fork() or copy_process() with container's init process context if > we do all in the kernel. Not sure I understand this but I believe that ultimately there will be the equivalent of a fork (perhaps two) and exec (we need to exec the helper anyway) no matter how this is done. For example, IIUC, a fork must be done to change pid namespace but a template like the container init process would already have that pid namespace in cases other than possibly user namespaces. I hope I understood what you were asking and haven't needlessly rambled on, ;) Ian ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2016-02-18 20:45 ` Eric W. Biederman @ 2016-02-19 5:14 ` Ian Kent -1 siblings, 0 replies; 49+ messages in thread From: Ian Kent @ 2016-02-19 5:14 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky, Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA, bfields-uC3wQj2KruNg9hUCZPvPmw, devel-GEFAQzZX7r8dnm+yROfE0A, bharrosh-C4P08NqkoRlBDgjK7y7TUQ, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Thu, 2016-02-18 at 14:45 -0600, Eric W. Biederman wrote: > Ian Kent <raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> writes: > > > On Thu, 2016-02-18 at 14:36 +0800, Ian Kent wrote: > > > On Thu, 2016-02-18 at 12:43 +0900, Kamezawa Hiroyuki wrote: > > > > On 2016/02/18 11:57, Eric W. Biederman wrote: > > > > > > > > > > Ccing The containers list because a related discussion is > > > > > happening > > > > > there > > > > > and somehow this thread has never made it there. > > > > > > > > > > Ian Kent <raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> writes: > > > > > > > > > > > On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: > > > > > > > On 11/15, Eric W. Biederman wrote: > > > > > > > > > > > > > > > > I don't understand that one. Having a preforked thread > > > > > > > > with > > > > > > > > the > > > > > > > > proper > > > > > > > > environment that can act like kthreadd in terms of > > > > > > > > spawning > > > > > > > > user > > > > > > > > mode > > > > > > > > helpers works and is simple. > > > > > > > > > > > > Forgive me replying to such an old thread but ... > > > > > > > > > > > > After realizing workqueues can't be used to pre-create > > > > > > threads > > > > > > to > > > > > > run > > > > > > usermode helpers I've returned to look at this. > > > > > > > > > > If someone can wind up with a good implementation I will be > > > > > happy. > > > > > > > > > > > > Can't we ask ->child_reaper to create the non-daemonized > > > > > > > kernel > > > > > > > thread > > > > > > > with the "right" ->nsproxy, ->fs, etc? > > > > > > > > > > > > Eric, do you think this approach would be sufficient too? > > > > > > > > > > > > Probably wouldn't be quite right for user namespaces but > > > > > > should > > > > > > provide > > > > > > what's needed for other cases? > > > > > > > > > > > > It certainly has the advantage of not having to maintain a > > > > > > plague > > > > > > of > > > > > > processes waiting around to execute helpers. > > > > > > > > > > That certainly sounds attractive. Especially for the case of > > > > > everyone > > > > > who wants to set a core pattern in a container. > > > > > > > > > > I am fuzzy on all of the details right now, but what I do > > > > > remember > > > > > is > > > > > that in the kernel the user mode helper concepts when they > > > > > attempted > > > > > to > > > > > scrub a processes environment were quite error prone until we > > > > > managed to > > > > > get kthreadd(pid 2) on the scene which always had a clean > > > > > environment. > > > > > > > > > > If we are going to tie this kind of thing to the pid namespace > > > > > I > > > > > recommend simplying denying it if you are in a user namespace > > > > > without > > > > > an approrpriate pid namespace. AKA simply not allowing thigns > > > > > to > > > > > be > > > > > setup > > > > > if current->pid_ns->user_ns != current->user_ns. > > > > > > > > > Can't be handled by simple capability like > > > > CAP_SYS_USERMODEHELPER ? > > I wasn't talking about a capability I was talking about how to > identify > where the user mode helper lives. > > > > > User_ns check seems not to allow core-dump-cather in host will > > > > not > > > > work if user_ns is used. > > The bottom line is all of this approaches non-sense if user namespaces > are not used. If you just have a pid namespace or a mount namespace > (or > perhaps both) and your fire off a new fangled user mode helper you get > a > deep problem. The user space process started to handle your core dump > or > your nfs callback will have a full set of capabilities (because it is > still in the root user namespace). With a full set of capabilities > and perhaps a little luck there is no containment. > > The imperfect solution that currently exists for the core dump helper > is to provide enough information to the user space application that > it can query and find out the context of the core dumping application > and keep everything in that application sandbox if it so desires. > I expect something similar could be done for other user mode helper > style callbacks. > > To make starting the user space application other than how we do today > needs a good argument that you are you can allow a lesser privileged > process set things up and that it can be exploited to gain privielge. > > > > I don't think so but I'm not sure. > > > > > > The approach I was talking about assumes the init process of the > > > caller > > > (say within a container, corresponding to ->child_reaper) is an > > > appropriate template for umh thread execution. > > > > > > But I don't think that covers the case where unshare has created > > > different namespaces, like a mount namespace for example. > > > > > > The current workqueue sub system can't be used to pre-create a > > > thread > > > to > > > be used for umh execution so, either is needs changes or yet > > > another > > > mechanism needs to be implemented. > > > > > > For uses other than core dumping capturing a reference to the > > > struct > > > pid > > > of the environment init process and using that as an execution > > > template > > > should be sufficient and takes care of environment existence > > > problems > > > with some extra checks, not to mention eliminating the need for a > > > potentially huge number of kernel threads needing to be created to > > > provide execution templates. > > > > > > Where to store this and how to access it when needed is another > > > problem. > > > > > > Not sure a usermode helper capability is the right thing either as > > > I > > > thought one important use of user namespaces was to allow > > > unprivileged > > > users to perform operations they otherwise can't. > > > > > > Maybe a CAP_SYS_USERNSCOREDUMP or similar would be sensible .... > > > > > > Still an appropriate execution template would be needed and IIUC > > > we > > > can't trust getting that from within a user created namespace > > > environment. > > > > Perhaps, if a struct cred could be captured at some appropriate time > > that could be used to cater for user namespaces. > > > > Eric, do you think that would be possible to do without allowing > > users > > to circumvent security? > > The general problem with capturing less than a full process is that > we always mess it up and forget to capture something important. > > In a lot of ways this is a very simpilar problem to setting up an at > job > or a cron job. You build a script you test it then you tell at to run > it at a certain time and it fails, because your working environment > did > not include something important that was in your actuall environment. > > Unfortunately in this case the failures we are talking about are > container escapes and privilege escalation, so we do need to tread > carefully. > > We might be able to safely define the context as the context of the > currently running init process (Which we can identifiy with a struct > pid). Justifying that looks a little trickier but doable. Right, that seems like a fairly straight forward thing to implement based on Olegs' example patch. I'll put together a series based on that approach. Keep in mind that the patches in my previous posts for sub-system usage are definitely wrong but I can use them (and they will be only an initial example of how to use the mechanism) to verify that contained execution happens. They will need to change. I was thinking that also capturing a struct cred (although I need to look more at the relationship between the process cred, and the nsproxy locations) at a particular time combined with a double fork and exec could allow inclusion of user namespace. Perhaps at only one level deep, ie. only allowing the first user namesapec created from init or from container and not user namespaces created from within a user namespace (if I can work out how to identify that case). Again when these are captured and how to get at them when needed is going to be a challenge. > > After a mechanism is picked it simply becomes a case of making certain > your permission checks for starting something are in sync with your > mechanism. Hopefully yourself and others can help with that, ;) > > Personally I am a fan of the don't be clever and capture a kernel > thread > approach as it is very easy to see you what if any exploitation > opportunities there are. The justifications for something more clever > is trickier. Of course we do something that from this perspective > would > be considered ``clever'' today with kthreadd and user mode helpers. Indeed, a good policy, but it seems the choice of the init process context (of a given container) is fairly straight forward and much of the tricky stuff and a good measure of checks may already be done in thread creation and exec code. As you have pointed out before this is a very difficult problem to deal with ..... Ian ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers @ 2016-02-19 5:14 ` Ian Kent 0 siblings, 0 replies; 49+ messages in thread From: Ian Kent @ 2016-02-19 5:14 UTC (permalink / raw) To: Eric W. Biederman Cc: Kamezawa Hiroyuki, Oleg Nesterov, Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bfields, bharrosh, Linux Containers On Thu, 2016-02-18 at 14:45 -0600, Eric W. Biederman wrote: > Ian Kent <raven@themaw.net> writes: > > > On Thu, 2016-02-18 at 14:36 +0800, Ian Kent wrote: > > > On Thu, 2016-02-18 at 12:43 +0900, Kamezawa Hiroyuki wrote: > > > > On 2016/02/18 11:57, Eric W. Biederman wrote: > > > > > > > > > > Ccing The containers list because a related discussion is > > > > > happening > > > > > there > > > > > and somehow this thread has never made it there. > > > > > > > > > > Ian Kent <raven@themaw.net> writes: > > > > > > > > > > > On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: > > > > > > > On 11/15, Eric W. Biederman wrote: > > > > > > > > > > > > > > > > I don't understand that one. Having a preforked thread > > > > > > > > with > > > > > > > > the > > > > > > > > proper > > > > > > > > environment that can act like kthreadd in terms of > > > > > > > > spawning > > > > > > > > user > > > > > > > > mode > > > > > > > > helpers works and is simple. > > > > > > > > > > > > Forgive me replying to such an old thread but ... > > > > > > > > > > > > After realizing workqueues can't be used to pre-create > > > > > > threads > > > > > > to > > > > > > run > > > > > > usermode helpers I've returned to look at this. > > > > > > > > > > If someone can wind up with a good implementation I will be > > > > > happy. > > > > > > > > > > > > Can't we ask ->child_reaper to create the non-daemonized > > > > > > > kernel > > > > > > > thread > > > > > > > with the "right" ->nsproxy, ->fs, etc? > > > > > > > > > > > > Eric, do you think this approach would be sufficient too? > > > > > > > > > > > > Probably wouldn't be quite right for user namespaces but > > > > > > should > > > > > > provide > > > > > > what's needed for other cases? > > > > > > > > > > > > It certainly has the advantage of not having to maintain a > > > > > > plague > > > > > > of > > > > > > processes waiting around to execute helpers. > > > > > > > > > > That certainly sounds attractive. Especially for the case of > > > > > everyone > > > > > who wants to set a core pattern in a container. > > > > > > > > > > I am fuzzy on all of the details right now, but what I do > > > > > remember > > > > > is > > > > > that in the kernel the user mode helper concepts when they > > > > > attempted > > > > > to > > > > > scrub a processes environment were quite error prone until we > > > > > managed to > > > > > get kthreadd(pid 2) on the scene which always had a clean > > > > > environment. > > > > > > > > > > If we are going to tie this kind of thing to the pid namespace > > > > > I > > > > > recommend simplying denying it if you are in a user namespace > > > > > without > > > > > an approrpriate pid namespace. AKA simply not allowing thigns > > > > > to > > > > > be > > > > > setup > > > > > if current->pid_ns->user_ns != current->user_ns. > > > > > > > > > Can't be handled by simple capability like > > > > CAP_SYS_USERMODEHELPER ? > > I wasn't talking about a capability I was talking about how to > identify > where the user mode helper lives. > > > > > User_ns check seems not to allow core-dump-cather in host will > > > > not > > > > work if user_ns is used. > > The bottom line is all of this approaches non-sense if user namespaces > are not used. If you just have a pid namespace or a mount namespace > (or > perhaps both) and your fire off a new fangled user mode helper you get > a > deep problem. The user space process started to handle your core dump > or > your nfs callback will have a full set of capabilities (because it is > still in the root user namespace). With a full set of capabilities > and perhaps a little luck there is no containment. > > The imperfect solution that currently exists for the core dump helper > is to provide enough information to the user space application that > it can query and find out the context of the core dumping application > and keep everything in that application sandbox if it so desires. > I expect something similar could be done for other user mode helper > style callbacks. > > To make starting the user space application other than how we do today > needs a good argument that you are you can allow a lesser privileged > process set things up and that it can be exploited to gain privielge. > > > > I don't think so but I'm not sure. > > > > > > The approach I was talking about assumes the init process of the > > > caller > > > (say within a container, corresponding to ->child_reaper) is an > > > appropriate template for umh thread execution. > > > > > > But I don't think that covers the case where unshare has created > > > different namespaces, like a mount namespace for example. > > > > > > The current workqueue sub system can't be used to pre-create a > > > thread > > > to > > > be used for umh execution so, either is needs changes or yet > > > another > > > mechanism needs to be implemented. > > > > > > For uses other than core dumping capturing a reference to the > > > struct > > > pid > > > of the environment init process and using that as an execution > > > template > > > should be sufficient and takes care of environment existence > > > problems > > > with some extra checks, not to mention eliminating the need for a > > > potentially huge number of kernel threads needing to be created to > > > provide execution templates. > > > > > > Where to store this and how to access it when needed is another > > > problem. > > > > > > Not sure a usermode helper capability is the right thing either as > > > I > > > thought one important use of user namespaces was to allow > > > unprivileged > > > users to perform operations they otherwise can't. > > > > > > Maybe a CAP_SYS_USERNSCOREDUMP or similar would be sensible .... > > > > > > Still an appropriate execution template would be needed and IIUC > > > we > > > can't trust getting that from within a user created namespace > > > environment. > > > > Perhaps, if a struct cred could be captured at some appropriate time > > that could be used to cater for user namespaces. > > > > Eric, do you think that would be possible to do without allowing > > users > > to circumvent security? > > The general problem with capturing less than a full process is that > we always mess it up and forget to capture something important. > > In a lot of ways this is a very simpilar problem to setting up an at > job > or a cron job. You build a script you test it then you tell at to run > it at a certain time and it fails, because your working environment > did > not include something important that was in your actuall environment. > > Unfortunately in this case the failures we are talking about are > container escapes and privilege escalation, so we do need to tread > carefully. > > We might be able to safely define the context as the context of the > currently running init process (Which we can identifiy with a struct > pid). Justifying that looks a little trickier but doable. Right, that seems like a fairly straight forward thing to implement based on Olegs' example patch. I'll put together a series based on that approach. Keep in mind that the patches in my previous posts for sub-system usage are definitely wrong but I can use them (and they will be only an initial example of how to use the mechanism) to verify that contained execution happens. They will need to change. I was thinking that also capturing a struct cred (although I need to look more at the relationship between the process cred, and the nsproxy locations) at a particular time combined with a double fork and exec could allow inclusion of user namespace. Perhaps at only one level deep, ie. only allowing the first user namesapec created from init or from container and not user namespaces created from within a user namespace (if I can work out how to identify that case). Again when these are captured and how to get at them when needed is going to be a challenge. > > After a mechanism is picked it simply becomes a case of making certain > your permission checks for starting something are in sync with your > mechanism. Hopefully yourself and others can help with that, ;) > > Personally I am a fan of the don't be clever and capture a kernel > thread > approach as it is very easy to see you what if any exploitation > opportunities there are. The justifications for something more clever > is trickier. Of course we do something that from this perspective > would > be considered ``clever'' today with kthreadd and user mode helpers. Indeed, a good policy, but it seems the choice of the init process context (of a given container) is fairly straight forward and much of the tricky stuff and a good measure of checks may already be done in thread creation and exec code. As you have pointed out before this is a very difficult problem to deal with ..... Ian ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2016-02-19 5:14 ` Ian Kent (?) @ 2016-02-23 2:55 ` Ian Kent [not found] ` <1456196130.2911.10.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> 2016-02-23 14:36 ` J. Bruce Fields -1 siblings, 2 replies; 49+ messages in thread From: Ian Kent @ 2016-02-23 2:55 UTC (permalink / raw) To: Eric W. Biederman Cc: Kamezawa Hiroyuki, Oleg Nesterov, Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bfields, bharrosh, Linux Containers On Fri, 2016-02-19 at 13:14 +0800, Ian Kent wrote: > On Thu, 2016-02-18 at 14:45 -0600, Eric W. Biederman wrote: > > Ian Kent <raven@themaw.net> writes: > > > > > On Thu, 2016-02-18 at 14:36 +0800, Ian Kent wrote: > > > > On Thu, 2016-02-18 at 12:43 +0900, Kamezawa Hiroyuki wrote: > > > > > On 2016/02/18 11:57, Eric W. Biederman wrote: > > > > > > > > > > > > Ccing The containers list because a related discussion is > > > > > > happening > > > > > > there > > > > > > and somehow this thread has never made it there. > > > > > > > > > > > > Ian Kent <raven@themaw.net> writes: > > > > > > > > > > > > > On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: > > > > > > > > On 11/15, Eric W. Biederman wrote: > > > > > > > > > > > > > > > > > > I don't understand that one. Having a preforked > > > > > > > > > thread > > > > > > > > > with > > > > > > > > > the > > > > > > > > > proper > > > > > > > > > environment that can act like kthreadd in terms of > > > > > > > > > spawning > > > > > > > > > user > > > > > > > > > mode > > > > > > > > > helpers works and is simple. > > > > > > > > > > > > > > Forgive me replying to such an old thread but ... > > > > > > > > > > > > > > After realizing workqueues can't be used to pre-create > > > > > > > threads > > > > > > > to > > > > > > > run > > > > > > > usermode helpers I've returned to look at this. > > > > > > > > > > > > If someone can wind up with a good implementation I will be > > > > > > happy. > > > > > > > > > > > > > > Can't we ask ->child_reaper to create the non-daemonized > > > > > > > > kernel > > > > > > > > thread > > > > > > > > with the "right" ->nsproxy, ->fs, etc? > > > > > > > > > > > > > > Eric, do you think this approach would be sufficient too? > > > > > > > > > > > > > > Probably wouldn't be quite right for user namespaces but > > > > > > > should > > > > > > > provide > > > > > > > what's needed for other cases? > > > > > > > > > > > > > > It certainly has the advantage of not having to maintain a > > > > > > > plague > > > > > > > of > > > > > > > processes waiting around to execute helpers. > > > > > > > > > > > > That certainly sounds attractive. Especially for the case > > > > > > of > > > > > > everyone > > > > > > who wants to set a core pattern in a container. > > > > > > > > > > > > I am fuzzy on all of the details right now, but what I do > > > > > > remember > > > > > > is > > > > > > that in the kernel the user mode helper concepts when they > > > > > > attempted > > > > > > to > > > > > > scrub a processes environment were quite error prone until > > > > > > we > > > > > > managed to > > > > > > get kthreadd(pid 2) on the scene which always had a clean > > > > > > environment. > > > > > > > > > > > > If we are going to tie this kind of thing to the pid > > > > > > namespace > > > > > > I > > > > > > recommend simplying denying it if you are in a user > > > > > > namespace > > > > > > without > > > > > > an approrpriate pid namespace. AKA simply not allowing > > > > > > thigns > > > > > > to > > > > > > be > > > > > > setup > > > > > > if current->pid_ns->user_ns != current->user_ns. > > > > > > > > > > > Can't be handled by simple capability like > > > > > CAP_SYS_USERMODEHELPER ? > > > > I wasn't talking about a capability I was talking about how to > > identify > > where the user mode helper lives. > > > > > > > User_ns check seems not to allow core-dump-cather in host will > > > > > not > > > > > work if user_ns is used. > > > > The bottom line is all of this approaches non-sense if user > > namespaces > > are not used. If you just have a pid namespace or a mount namespace > > (or > > perhaps both) and your fire off a new fangled user mode helper you > > get > > a > > deep problem. The user space process started to handle your core > > dump > > or > > your nfs callback will have a full set of capabilities (because it > > is > > still in the root user namespace). With a full set of capabilities > > and perhaps a little luck there is no containment. > > > > The imperfect solution that currently exists for the core dump > > helper > > is to provide enough information to the user space application that > > it can query and find out the context of the core dumping > > application > > and keep everything in that application sandbox if it so desires. > > I expect something similar could be done for other user mode helper > > style callbacks. > > > > To make starting the user space application other than how we do > > today > > needs a good argument that you are you can allow a lesser privileged > > process set things up and that it can be exploited to gain > > privielge. > > > > > > I don't think so but I'm not sure. > > > > > > > > The approach I was talking about assumes the init process of the > > > > caller > > > > (say within a container, corresponding to ->child_reaper) is an > > > > appropriate template for umh thread execution. > > > > > > > > But I don't think that covers the case where unshare has created > > > > different namespaces, like a mount namespace for example. > > > > > > > > The current workqueue sub system can't be used to pre-create a > > > > thread > > > > to > > > > be used for umh execution so, either is needs changes or yet > > > > another > > > > mechanism needs to be implemented. > > > > > > > > For uses other than core dumping capturing a reference to the > > > > struct > > > > pid > > > > of the environment init process and using that as an execution > > > > template > > > > should be sufficient and takes care of environment existence > > > > problems > > > > with some extra checks, not to mention eliminating the need for > > > > a > > > > potentially huge number of kernel threads needing to be created > > > > to > > > > provide execution templates. > > > > > > > > Where to store this and how to access it when needed is another > > > > problem. > > > > > > > > Not sure a usermode helper capability is the right thing either > > > > as > > > > I > > > > thought one important use of user namespaces was to allow > > > > unprivileged > > > > users to perform operations they otherwise can't. > > > > > > > > Maybe a CAP_SYS_USERNSCOREDUMP or similar would be sensible .... > > > > > > > > Still an appropriate execution template would be needed and IIUC > > > > we > > > > can't trust getting that from within a user created namespace > > > > environment. > > > > > > Perhaps, if a struct cred could be captured at some appropriate > > > time > > > that could be used to cater for user namespaces. > > > > > > Eric, do you think that would be possible to do without allowing > > > users > > > to circumvent security? > > > > The general problem with capturing less than a full process is that > > we always mess it up and forget to capture something important. > > > > In a lot of ways this is a very simpilar problem to setting up an at > > job > > or a cron job. You build a script you test it then you tell at to > > run > > it at a certain time and it fails, because your working environment > > did > > not include something important that was in your actuall > > environment. > > > > Unfortunately in this case the failures we are talking about are > > container escapes and privilege escalation, so we do need to tread > > carefully. > > > > We might be able to safely define the context as the context of the > > currently running init process (Which we can identifiy with a struct > > pid). Justifying that looks a little trickier but doable. > > Right, that seems like a fairly straight forward thing to implement > based on Olegs' example patch. > > I'll put together a series based on that approach. > > Keep in mind that the patches in my previous posts for sub-system > usage > are definitely wrong but I can use them (and they will be only an > initial example of how to use the mechanism) to verify that contained > execution happens. They will need to change. > > I was thinking that also capturing a struct cred (although I need to > look more at the relationship between the process cred, and the > nsproxy > locations) at a particular time combined with a double fork and exec > could allow inclusion of user namespace. > > Perhaps at only one level deep, ie. only allowing the first user > namesapec created from init or from container and not user namespaces > created from within a user namespace (if I can work out how to > identify > that case). You know, wrt. the mechanism Oleg suggested, I've been wondering if it's even necessary to capture process template information for execution. Isn't the main issue the execution of unknown arbitrary objects getting access to a privileged context? Then perhaps it is sufficient to require registration of an SHA hash (of some sort) for these objects by a suitably privileged process and only allow helper execution of valid objects. If that is sufficient then helper execution from within a container or user namespace could just use the callers environment itself. What else do we need to be wary of, any thoughts Eric? Ian ^ permalink raw reply [flat|nested] 49+ messages in thread
[parent not found: <1456196130.2911.10.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org>]
* Re: call_usermodehelper in containers [not found] ` <1456196130.2911.10.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> @ 2016-02-23 14:36 ` J. Bruce Fields 0 siblings, 0 replies; 49+ messages in thread From: J. Bruce Fields @ 2016-02-23 14:36 UTC (permalink / raw) To: Ian Kent Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky, Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA, devel-GEFAQzZX7r8dnm+yROfE0A, Eric W. Biederman, bharrosh-C4P08NqkoRlBDgjK7y7TUQ, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Tue, Feb 23, 2016 at 10:55:30AM +0800, Ian Kent wrote: > You know, wrt. the mechanism Oleg suggested, I've been wondering if it's > even necessary to capture process template information for execution. > > Isn't the main issue the execution of unknown arbitrary objects getting > access to a privileged context? > > Then perhaps it is sufficient to require registration of an SHA hash (of > some sort) for these objects by a suitably privileged process and only > allow helper execution of valid objects. That executable probably also depends on libraries, services, and tons of other miscellaneous stuff in its environment. The NFSv4 client idmapper, for example, may be doing ldap calls. Unless the helper is created with incredible care, I don't think that it's enough just to verify that you're executing the correct helper. --b. > > If that is sufficient then helper execution from within a container or > user namespace could just use the callers environment itself. > > What else do we need to be wary of, any thoughts Eric? > > Ian ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2016-02-23 2:55 ` Ian Kent [not found] ` <1456196130.2911.10.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> @ 2016-02-23 14:36 ` J. Bruce Fields [not found] ` <20160223143627.GB31951-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 1 sibling, 1 reply; 49+ messages in thread From: J. Bruce Fields @ 2016-02-23 14:36 UTC (permalink / raw) To: Ian Kent Cc: Eric W. Biederman, Kamezawa Hiroyuki, Oleg Nesterov, Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bharrosh, Linux Containers On Tue, Feb 23, 2016 at 10:55:30AM +0800, Ian Kent wrote: > You know, wrt. the mechanism Oleg suggested, I've been wondering if it's > even necessary to capture process template information for execution. > > Isn't the main issue the execution of unknown arbitrary objects getting > access to a privileged context? > > Then perhaps it is sufficient to require registration of an SHA hash (of > some sort) for these objects by a suitably privileged process and only > allow helper execution of valid objects. That executable probably also depends on libraries, services, and tons of other miscellaneous stuff in its environment. The NFSv4 client idmapper, for example, may be doing ldap calls. Unless the helper is created with incredible care, I don't think that it's enough just to verify that you're executing the correct helper. --b. > > If that is sufficient then helper execution from within a container or > user namespace could just use the callers environment itself. > > What else do we need to be wary of, any thoughts Eric? > > Ian ^ permalink raw reply [flat|nested] 49+ messages in thread
[parent not found: <20160223143627.GB31951-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>]
* Re: call_usermodehelper in containers 2016-02-23 14:36 ` J. Bruce Fields @ 2016-02-24 0:55 ` Ian Kent 0 siblings, 0 replies; 49+ messages in thread From: Ian Kent @ 2016-02-24 0:55 UTC (permalink / raw) To: J. Bruce Fields Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky, Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA, devel-GEFAQzZX7r8dnm+yROfE0A, Eric W. Biederman, bharrosh-C4P08NqkoRlBDgjK7y7TUQ, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Tue, 2016-02-23 at 09:36 -0500, J. Bruce Fields wrote: > On Tue, Feb 23, 2016 at 10:55:30AM +0800, Ian Kent wrote: > > You know, wrt. the mechanism Oleg suggested, I've been wondering if > > it's > > even necessary to capture process template information for > > execution. > > > > Isn't the main issue the execution of unknown arbitrary objects > > getting > > access to a privileged context? > > > > Then perhaps it is sufficient to require registration of an SHA hash > > (of > > some sort) for these objects by a suitably privileged process and > > only > > allow helper execution of valid objects. > > That executable probably also depends on libraries, services, and tons > of other miscellaneous stuff in its environment. The NFSv4 client > idmapper, for example, may be doing ldap calls. Unless the helper is > created with incredible care, I don't think that it's enough just to > verify that you're executing the correct helper. Yeah, I was thinking the logistics of keeping something like this up to date would be hard but calculating this for every call would be too much overhead I think. > > --b. > > > > > If that is sufficient then helper execution from within a container > > or > > user namespace could just use the callers environment itself. > > > > What else do we need to be wary of, any thoughts Eric? > > > > Ian ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers @ 2016-02-24 0:55 ` Ian Kent 0 siblings, 0 replies; 49+ messages in thread From: Ian Kent @ 2016-02-24 0:55 UTC (permalink / raw) To: J. Bruce Fields Cc: Eric W. Biederman, Kamezawa Hiroyuki, Oleg Nesterov, Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bharrosh, Linux Containers On Tue, 2016-02-23 at 09:36 -0500, J. Bruce Fields wrote: > On Tue, Feb 23, 2016 at 10:55:30AM +0800, Ian Kent wrote: > > You know, wrt. the mechanism Oleg suggested, I've been wondering if > > it's > > even necessary to capture process template information for > > execution. > > > > Isn't the main issue the execution of unknown arbitrary objects > > getting > > access to a privileged context? > > > > Then perhaps it is sufficient to require registration of an SHA hash > > (of > > some sort) for these objects by a suitably privileged process and > > only > > allow helper execution of valid objects. > > That executable probably also depends on libraries, services, and tons > of other miscellaneous stuff in its environment. The NFSv4 client > idmapper, for example, may be doing ldap calls. Unless the helper is > created with incredible care, I don't think that it's enough just to > verify that you're executing the correct helper. Yeah, I was thinking the logistics of keeping something like this up to date would be hard but calculating this for every call would be too much overhead I think. > > --b. > > > > > If that is sufficient then helper execution from within a container > > or > > user namespace could just use the callers environment itself. > > > > What else do we need to be wary of, any thoughts Eric? > > > > Ian ^ permalink raw reply [flat|nested] 49+ messages in thread
[parent not found: <1455858850.3356.19.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org>]
* Re: call_usermodehelper in containers [not found] ` <1455858850.3356.19.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> @ 2016-02-23 2:55 ` Ian Kent 0 siblings, 0 replies; 49+ messages in thread From: Ian Kent @ 2016-02-23 2:55 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky, Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA, bfields-uC3wQj2KruNg9hUCZPvPmw, devel-GEFAQzZX7r8dnm+yROfE0A, bharrosh-C4P08NqkoRlBDgjK7y7TUQ, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Fri, 2016-02-19 at 13:14 +0800, Ian Kent wrote: > On Thu, 2016-02-18 at 14:45 -0600, Eric W. Biederman wrote: > > Ian Kent <raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> writes: > > > > > On Thu, 2016-02-18 at 14:36 +0800, Ian Kent wrote: > > > > On Thu, 2016-02-18 at 12:43 +0900, Kamezawa Hiroyuki wrote: > > > > > On 2016/02/18 11:57, Eric W. Biederman wrote: > > > > > > > > > > > > Ccing The containers list because a related discussion is > > > > > > happening > > > > > > there > > > > > > and somehow this thread has never made it there. > > > > > > > > > > > > Ian Kent <raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> writes: > > > > > > > > > > > > > On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: > > > > > > > > On 11/15, Eric W. Biederman wrote: > > > > > > > > > > > > > > > > > > I don't understand that one. Having a preforked > > > > > > > > > thread > > > > > > > > > with > > > > > > > > > the > > > > > > > > > proper > > > > > > > > > environment that can act like kthreadd in terms of > > > > > > > > > spawning > > > > > > > > > user > > > > > > > > > mode > > > > > > > > > helpers works and is simple. > > > > > > > > > > > > > > Forgive me replying to such an old thread but ... > > > > > > > > > > > > > > After realizing workqueues can't be used to pre-create > > > > > > > threads > > > > > > > to > > > > > > > run > > > > > > > usermode helpers I've returned to look at this. > > > > > > > > > > > > If someone can wind up with a good implementation I will be > > > > > > happy. > > > > > > > > > > > > > > Can't we ask ->child_reaper to create the non-daemonized > > > > > > > > kernel > > > > > > > > thread > > > > > > > > with the "right" ->nsproxy, ->fs, etc? > > > > > > > > > > > > > > Eric, do you think this approach would be sufficient too? > > > > > > > > > > > > > > Probably wouldn't be quite right for user namespaces but > > > > > > > should > > > > > > > provide > > > > > > > what's needed for other cases? > > > > > > > > > > > > > > It certainly has the advantage of not having to maintain a > > > > > > > plague > > > > > > > of > > > > > > > processes waiting around to execute helpers. > > > > > > > > > > > > That certainly sounds attractive. Especially for the case > > > > > > of > > > > > > everyone > > > > > > who wants to set a core pattern in a container. > > > > > > > > > > > > I am fuzzy on all of the details right now, but what I do > > > > > > remember > > > > > > is > > > > > > that in the kernel the user mode helper concepts when they > > > > > > attempted > > > > > > to > > > > > > scrub a processes environment were quite error prone until > > > > > > we > > > > > > managed to > > > > > > get kthreadd(pid 2) on the scene which always had a clean > > > > > > environment. > > > > > > > > > > > > If we are going to tie this kind of thing to the pid > > > > > > namespace > > > > > > I > > > > > > recommend simplying denying it if you are in a user > > > > > > namespace > > > > > > without > > > > > > an approrpriate pid namespace. AKA simply not allowing > > > > > > thigns > > > > > > to > > > > > > be > > > > > > setup > > > > > > if current->pid_ns->user_ns != current->user_ns. > > > > > > > > > > > Can't be handled by simple capability like > > > > > CAP_SYS_USERMODEHELPER ? > > > > I wasn't talking about a capability I was talking about how to > > identify > > where the user mode helper lives. > > > > > > > User_ns check seems not to allow core-dump-cather in host will > > > > > not > > > > > work if user_ns is used. > > > > The bottom line is all of this approaches non-sense if user > > namespaces > > are not used. If you just have a pid namespace or a mount namespace > > (or > > perhaps both) and your fire off a new fangled user mode helper you > > get > > a > > deep problem. The user space process started to handle your core > > dump > > or > > your nfs callback will have a full set of capabilities (because it > > is > > still in the root user namespace). With a full set of capabilities > > and perhaps a little luck there is no containment. > > > > The imperfect solution that currently exists for the core dump > > helper > > is to provide enough information to the user space application that > > it can query and find out the context of the core dumping > > application > > and keep everything in that application sandbox if it so desires. > > I expect something similar could be done for other user mode helper > > style callbacks. > > > > To make starting the user space application other than how we do > > today > > needs a good argument that you are you can allow a lesser privileged > > process set things up and that it can be exploited to gain > > privielge. > > > > > > I don't think so but I'm not sure. > > > > > > > > The approach I was talking about assumes the init process of the > > > > caller > > > > (say within a container, corresponding to ->child_reaper) is an > > > > appropriate template for umh thread execution. > > > > > > > > But I don't think that covers the case where unshare has created > > > > different namespaces, like a mount namespace for example. > > > > > > > > The current workqueue sub system can't be used to pre-create a > > > > thread > > > > to > > > > be used for umh execution so, either is needs changes or yet > > > > another > > > > mechanism needs to be implemented. > > > > > > > > For uses other than core dumping capturing a reference to the > > > > struct > > > > pid > > > > of the environment init process and using that as an execution > > > > template > > > > should be sufficient and takes care of environment existence > > > > problems > > > > with some extra checks, not to mention eliminating the need for > > > > a > > > > potentially huge number of kernel threads needing to be created > > > > to > > > > provide execution templates. > > > > > > > > Where to store this and how to access it when needed is another > > > > problem. > > > > > > > > Not sure a usermode helper capability is the right thing either > > > > as > > > > I > > > > thought one important use of user namespaces was to allow > > > > unprivileged > > > > users to perform operations they otherwise can't. > > > > > > > > Maybe a CAP_SYS_USERNSCOREDUMP or similar would be sensible .... > > > > > > > > Still an appropriate execution template would be needed and IIUC > > > > we > > > > can't trust getting that from within a user created namespace > > > > environment. > > > > > > Perhaps, if a struct cred could be captured at some appropriate > > > time > > > that could be used to cater for user namespaces. > > > > > > Eric, do you think that would be possible to do without allowing > > > users > > > to circumvent security? > > > > The general problem with capturing less than a full process is that > > we always mess it up and forget to capture something important. > > > > In a lot of ways this is a very simpilar problem to setting up an at > > job > > or a cron job. You build a script you test it then you tell at to > > run > > it at a certain time and it fails, because your working environment > > did > > not include something important that was in your actuall > > environment. > > > > Unfortunately in this case the failures we are talking about are > > container escapes and privilege escalation, so we do need to tread > > carefully. > > > > We might be able to safely define the context as the context of the > > currently running init process (Which we can identifiy with a struct > > pid). Justifying that looks a little trickier but doable. > > Right, that seems like a fairly straight forward thing to implement > based on Olegs' example patch. > > I'll put together a series based on that approach. > > Keep in mind that the patches in my previous posts for sub-system > usage > are definitely wrong but I can use them (and they will be only an > initial example of how to use the mechanism) to verify that contained > execution happens. They will need to change. > > I was thinking that also capturing a struct cred (although I need to > look more at the relationship between the process cred, and the > nsproxy > locations) at a particular time combined with a double fork and exec > could allow inclusion of user namespace. > > Perhaps at only one level deep, ie. only allowing the first user > namesapec created from init or from container and not user namespaces > created from within a user namespace (if I can work out how to > identify > that case). You know, wrt. the mechanism Oleg suggested, I've been wondering if it's even necessary to capture process template information for execution. Isn't the main issue the execution of unknown arbitrary objects getting access to a privileged context? Then perhaps it is sufficient to require registration of an SHA hash (of some sort) for these objects by a suitably privileged process and only allow helper execution of valid objects. If that is sufficient then helper execution from within a container or user namespace could just use the callers environment itself. What else do we need to be wary of, any thoughts Eric? Ian ^ permalink raw reply [flat|nested] 49+ messages in thread
[parent not found: <1455777387.3188.24.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org>]
* Re: call_usermodehelper in containers [not found] ` <1455777387.3188.24.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> @ 2016-02-18 7:37 ` Ian Kent 0 siblings, 0 replies; 49+ messages in thread From: Ian Kent @ 2016-02-18 7:37 UTC (permalink / raw) To: Kamezawa Hiroyuki, Eric W. Biederman Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky, Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA, bfields-uC3wQj2KruNg9hUCZPvPmw, bharrosh-C4P08NqkoRlBDgjK7y7TUQ, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, devel-GEFAQzZX7r8dnm+yROfE0A On Thu, 2016-02-18 at 14:36 +0800, Ian Kent wrote: > On Thu, 2016-02-18 at 12:43 +0900, Kamezawa Hiroyuki wrote: > > On 2016/02/18 11:57, Eric W. Biederman wrote: > > > > > > Ccing The containers list because a related discussion is > > > happening > > > there > > > and somehow this thread has never made it there. > > > > > > Ian Kent <raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> writes: > > > > > > > On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: > > > > > On 11/15, Eric W. Biederman wrote: > > > > > > > > > > > > I don't understand that one. Having a preforked thread with > > > > > > the > > > > > > proper > > > > > > environment that can act like kthreadd in terms of spawning > > > > > > user > > > > > > mode > > > > > > helpers works and is simple. > > > > > > > > Forgive me replying to such an old thread but ... > > > > > > > > After realizing workqueues can't be used to pre-create threads > > > > to > > > > run > > > > usermode helpers I've returned to look at this. > > > > > > If someone can wind up with a good implementation I will be happy. > > > > > > > > Can't we ask ->child_reaper to create the non-daemonized > > > > > kernel > > > > > thread > > > > > with the "right" ->nsproxy, ->fs, etc? > > > > > > > > Eric, do you think this approach would be sufficient too? > > > > > > > > Probably wouldn't be quite right for user namespaces but should > > > > provide > > > > what's needed for other cases? > > > > > > > > It certainly has the advantage of not having to maintain a > > > > plague > > > > of > > > > processes waiting around to execute helpers. > > > > > > That certainly sounds attractive. Especially for the case of > > > everyone > > > who wants to set a core pattern in a container. > > > > > > I am fuzzy on all of the details right now, but what I do remember > > > is > > > that in the kernel the user mode helper concepts when they > > > attempted > > > to > > > scrub a processes environment were quite error prone until we > > > managed to > > > get kthreadd(pid 2) on the scene which always had a clean > > > environment. > > > > > > If we are going to tie this kind of thing to the pid namespace I > > > recommend simplying denying it if you are in a user namespace > > > without > > > an approrpriate pid namespace. AKA simply not allowing thigns to > > > be > > > setup > > > if current->pid_ns->user_ns != current->user_ns. > > > > > Can't be handled by simple capability like CAP_SYS_USERMODEHELPER ? > > > > User_ns check seems not to allow core-dump-cather in host will not > > work if user_ns is used. > > I don't think so but I'm not sure. > > The approach I was talking about assumes the init process of the > caller > (say within a container, corresponding to ->child_reaper) is an > appropriate template for umh thread execution. > > But I don't think that covers the case where unshare has created > different namespaces, like a mount namespace for example. > > The current workqueue sub system can't be used to pre-create a thread > to > be used for umh execution so, either is needs changes or yet another > mechanism needs to be implemented. > > For uses other than core dumping capturing a reference to the struct > pid > of the environment init process and using that as an execution > template > should be sufficient and takes care of environment existence problems > with some extra checks, not to mention eliminating the need for a > potentially huge number of kernel threads needing to be created to > provide execution templates. > > Where to store this and how to access it when needed is another > problem. > > Not sure a usermode helper capability is the right thing either as I > thought one important use of user namespaces was to allow unprivileged > users to perform operations they otherwise can't. > > Maybe a CAP_SYS_USERNSCOREDUMP or similar would be sensible .... > > Still an appropriate execution template would be needed and IIUC we > can't trust getting that from within a user created namespace > environment. Perhaps, if a struct cred could be captured at some appropriate time that could be used to cater for user namespaces. Eric, do you think that would be possible to do without allowing users to circumvent security? > > > > > Thanks, > > -Kame ^ permalink raw reply [flat|nested] 49+ messages in thread
[parent not found: <56C53DE3.1070108-+CUm20s59erQFUHtdCDX3A@public.gmane.org>]
* Re: call_usermodehelper in containers [not found] ` <56C53DE3.1070108-+CUm20s59erQFUHtdCDX3A@public.gmane.org> @ 2016-02-18 6:36 ` Ian Kent 0 siblings, 0 replies; 49+ messages in thread From: Ian Kent @ 2016-02-18 6:36 UTC (permalink / raw) To: Kamezawa Hiroyuki, Eric W. Biederman Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, Stanislav Kinsbursky, Jeff Layton, Greg KH, Linux Containers, Oleg Nesterov, linux-kernel-u79uwXL29TY76Z2rM5mHXA, bfields-uC3wQj2KruNg9hUCZPvPmw, bharrosh-C4P08NqkoRlBDgjK7y7TUQ, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, devel-GEFAQzZX7r8dnm+yROfE0A On Thu, 2016-02-18 at 12:43 +0900, Kamezawa Hiroyuki wrote: > On 2016/02/18 11:57, Eric W. Biederman wrote: > > > > Ccing The containers list because a related discussion is happening > > there > > and somehow this thread has never made it there. > > > > Ian Kent <raven-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> writes: > > > > > On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: > > > > On 11/15, Eric W. Biederman wrote: > > > > > > > > > > I don't understand that one. Having a preforked thread with > > > > > the > > > > > proper > > > > > environment that can act like kthreadd in terms of spawning > > > > > user > > > > > mode > > > > > helpers works and is simple. > > > > > > Forgive me replying to such an old thread but ... > > > > > > After realizing workqueues can't be used to pre-create threads to > > > run > > > usermode helpers I've returned to look at this. > > > > If someone can wind up with a good implementation I will be happy. > > > > > > Can't we ask ->child_reaper to create the non-daemonized kernel > > > > thread > > > > with the "right" ->nsproxy, ->fs, etc? > > > > > > Eric, do you think this approach would be sufficient too? > > > > > > Probably wouldn't be quite right for user namespaces but should > > > provide > > > what's needed for other cases? > > > > > > It certainly has the advantage of not having to maintain a plague > > > of > > > processes waiting around to execute helpers. > > > > That certainly sounds attractive. Especially for the case of > > everyone > > who wants to set a core pattern in a container. > > > > I am fuzzy on all of the details right now, but what I do remember > > is > > that in the kernel the user mode helper concepts when they attempted > > to > > scrub a processes environment were quite error prone until we > > managed to > > get kthreadd(pid 2) on the scene which always had a clean > > environment. > > > > If we are going to tie this kind of thing to the pid namespace I > > recommend simplying denying it if you are in a user namespace > > without > > an approrpriate pid namespace. AKA simply not allowing thigns to be > > setup > > if current->pid_ns->user_ns != current->user_ns. > > > Can't be handled by simple capability like CAP_SYS_USERMODEHELPER ? > > User_ns check seems not to allow core-dump-cather in host will not > work if user_ns is used. I don't think so but I'm not sure. The approach I was talking about assumes the init process of the caller (say within a container, corresponding to ->child_reaper) is an appropriate template for umh thread execution. But I don't think that covers the case where unshare has created different namespaces, like a mount namespace for example. The current workqueue sub system can't be used to pre-create a thread to be used for umh execution so, either is needs changes or yet another mechanism needs to be implemented. For uses other than core dumping capturing a reference to the struct pid of the environment init process and using that as an execution template should be sufficient and takes care of environment existence problems with some extra checks, not to mention eliminating the need for a potentially huge number of kernel threads needing to be created to provide execution templates. Where to store this and how to access it when needed is another problem. Not sure a usermode helper capability is the right thing either as I thought one important use of user namespaces was to allow unprivileged users to perform operations they otherwise can't. Maybe a CAP_SYS_USERNSCOREDUMP or similar would be sensible .... Still an appropriate execution template would be needed and IIUC we can't trust getting that from within a user created namespace environment. > > Thanks, > -Kame > ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2013-11-18 17:28 ` Oleg Nesterov 2013-11-18 18:02 ` Oleg Nesterov 2016-02-11 0:17 ` Ian Kent @ 2016-03-24 7:45 ` Ian Kent 2016-03-25 1:28 ` Oleg Nesterov 2 siblings, 1 reply; 49+ messages in thread From: Ian Kent @ 2016-03-24 7:45 UTC (permalink / raw) To: Oleg Nesterov, Eric W. Biederman Cc: Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bfields, bharrosh On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: > On 11/15, Eric W. Biederman wrote: > > > > I don't understand that one. Having a preforked thread with the > > proper > > environment that can act like kthreadd in terms of spawning user > > mode > > helpers works and is simple. > > Can't we ask ->child_reaper to create the non-daemonized kernel thread > with the "right" ->nsproxy, ->fs, etc? > > IOW. Please the the "patch" below. It is obviously incomplete and > wrong, > and it can be more clear/clean. And probably we need another API. Just > to explain what I mean. > > With this patch call_usermodehelper(..., UMH_IN_MY_NS) should do exec > from the caller's namespace. Umm ... I don't think this can work. I don't think it can be assumed that the init process of a container will behave like an init process. If you try and do this with a Docker container that has /bin/bash as the init process signals never arrive and work doesn't start until some other signal arrives at which time it fails to create the kernel thread returning an error ERESTARTNOINTER (IIRC). In fact a number of other things relating to signalling processes to cleanly shutdown in a container suffer the same problem. I probably don't understand what's actually going on, this is just my impression of what I'm seeing. > > Oleg. > --- > > --- a/include/linux/kmod.h > +++ b/include/linux/kmod.h > @@ -24,6 +24,7 @@ > #include <linux/errno.h> > #include <linux/compiler.h> > #include <linux/workqueue.h> > +#include <linux/task_work.h> > #include <linux/sysctl.h> > > #define KMOD_PATH_LEN 256 > @@ -53,8 +54,14 @@ struct file; > #define UMH_WAIT_PROC 2 /* wait for the process to > complete */ > #define UMH_KILLABLE 4 /* wait for EXEC/PROC killable > */ > > +// FIXME: IMH_* is not actually a mask > +#define UMH_IN_MY_NS 8 > + > struct subprocess_info { > - struct work_struct work; > + union { > + struct work_struct work; > + struct callback_head twork; > + }; > struct completion *complete; > char *path; > char **argv; > --- a/kernel/kmod.c > +++ b/kernel/kmod.c > @@ -541,7 +541,6 @@ struct subprocess_info > *call_usermodehelper_setup(char *path, char **argv, > if (!sub_info) > goto out; > > - INIT_WORK(&sub_info->work, __call_usermodehelper); > sub_info->path = path; > sub_info->argv = argv; > sub_info->envp = envp; > @@ -554,6 +553,24 @@ struct subprocess_info > *call_usermodehelper_setup(char *path, char **argv, > } > EXPORT_SYMBOL(call_usermodehelper_setup); > > +static int call_call_usermodehelper(void *twork) > +{ > + struct subprocess_info *sub_info = > + container_of(twork, struct subprocess_info, twork); > + > + __call_usermodehelper(&sub_info->work); > + do_exit(0); > + > +} > + > +static void fork_umh_helper(struct callback_head *twork) > +{ > + if (current->flags & PF_EXITING) > + return; // WRONG, FIXME > + > + kernel_thread(call_call_usermodehelper, twork, SIGCHLD); > +} > + > /** > * call_usermodehelper_exec - start a usermode application > * @sub_info: information about the subprocessa > @@ -570,6 +587,10 @@ int call_usermodehelper_exec(struct > subprocess_info *sub_info, int wait) > { > DECLARE_COMPLETION_ONSTACK(done); > int retval = 0; > + bool in_my_ns; > + > + in_my_ns = wait & UMH_IN_MY_NS; > + wait &= ~UMH_IN_MY_NS; > > if (!sub_info->path) { > call_usermodehelper_freeinfo(sub_info); > @@ -594,7 +615,21 @@ int call_usermodehelper_exec(struct > subprocess_info *sub_info, int wait) > sub_info->complete = &done; > sub_info->wait = wait; > > - queue_work(khelper_wq, &sub_info->work); > + if (likely(!in_my_ns)) { > + INIT_WORK(&sub_info->work, __call_usermodehelper); > + queue_work(khelper_wq, &sub_info->work); > + } else { > + // RACY, WRONG, ETC > + struct task_struct *my_init = > task_active_pid_ns(current)->child_reaper; > + > + init_task_work(&sub_info->twork, fork_umh_helper); > + task_work_add(my_init, &sub_info->twork, false); > + > + // until we have task_work_add_interruptibel() > + do_send_sig_info(SIGCHLD, SEND_SIG_FORCED, my_init, > false); > + > + } > + > if (wait == UMH_NO_WAIT) /* task has freed sub_info */ > goto unlock; > > > -- > To unsubscribe from this list: send the line "unsubscribe linux > -fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2016-03-24 7:45 ` Ian Kent @ 2016-03-25 1:28 ` Oleg Nesterov 2016-03-25 7:25 ` Ian Kent 0 siblings, 1 reply; 49+ messages in thread From: Oleg Nesterov @ 2016-03-25 1:28 UTC (permalink / raw) To: Ian Kent Cc: Eric W. Biederman, Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bfields, bharrosh Hi Ian, I can't really recall this old discussion, so I can be easily wrong... On 03/24, Ian Kent wrote: > > On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: > > > > IOW. Please the the "patch" below. It is obviously incomplete and > > wrong, > > and it can be more clear/clean. And probably we need another API. Just > > to explain what I mean. I hope you didn't miss this part ;) In particular, we want to turn task_work_add(..., bool notify) into task_work_add(..., how_to_notify mask) and this "mask" should allow to force TIF_SIGPENDING. > > With this patch call_usermodehelper(..., UMH_IN_MY_NS) should do exec > > from the caller's namespace. > > Umm ... I don't think this can work. > > I don't think it can be assumed that the init process of a container > will behave like an init process. > > If you try and do this with a Docker container that has /bin/bash as the > init process signals never arrive and work doesn't start until some > other signal arrives only if it blocks/ignores SIGCHLD? But this doesn't matter, see above and note the "until we have task_work_add_interruptibel()" in the pseudo-code I showed. > I probably don't understand what's actually going on, this is just my > impression of what I'm seeing. Or perhaps it is me who misunderstands your concerns. Oleg. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: call_usermodehelper in containers 2016-03-25 1:28 ` Oleg Nesterov @ 2016-03-25 7:25 ` Ian Kent 0 siblings, 0 replies; 49+ messages in thread From: Ian Kent @ 2016-03-25 7:25 UTC (permalink / raw) To: Oleg Nesterov Cc: Eric W. Biederman, Stanislav Kinsbursky, Jeff Layton, Greg KH, linux-kernel, linux-fsdevel, linux-nfs, devel, bfields, bharrosh On Fri, 2016-03-25 at 02:28 +0100, Oleg Nesterov wrote: > Hi Ian, > > I can't really recall this old discussion, so I can be easily wrong... > > On 03/24, Ian Kent wrote: > > > > On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: > > > > > > IOW. Please the the "patch" below. It is obviously incomplete and > > > wrong, > > > and it can be more clear/clean. And probably we need another API. > > > Just > > > to explain what I mean. > > I hope you didn't miss this part ;) Not at all. > > In particular, we want to turn task_work_add(..., bool notify) into > task_work_add(..., how_to_notify mask) and this "mask" should allow > to force TIF_SIGPENDING. The point of posting the reply was to try and get some advice as my understanding of the signalling subsystem is fairly poor. LOL, I'll have another look at the task_work_add() code and see if I can understand what your trying to tell me. > > > > With this patch call_usermodehelper(..., UMH_IN_MY_NS) should do > > > exec > > > from the caller's namespace. > > > > Umm ... I don't think this can work. > > > > I don't think it can be assumed that the init process of a container > > will behave like an init process. > > > > If you try and do this with a Docker container that has /bin/bash as > > the > > init process signals never arrive and work doesn't start until some > > other signal arrives > > only if it blocks/ignores SIGCHLD? But this doesn't matter, see above > and > note the "until we have task_work_add_interruptibel()" in the pseudo > -code > I showed. It seems, and this is not the only case I've encountered, that the init process in docker containers can be a problem when you want to capture and handle signals. I've seen this with /bin/bash and supervisord so far. I don't know if it is the docker container creation doing this or something else .... certainly I can catch signals within subordinate processes. The other thing that occurs to me is that just about anything in a container could be subverted so the definition of a privileged process which can be used as a template form execution is essentially undefined. Mmm ... maybe I've got that wrong too, ;) Ian ^ permalink raw reply [flat|nested] 49+ messages in thread
end of thread, other threads:[~2016-03-25 7:26 UTC | newest] Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-11-11 12:18 call_usermodehelper in containers Jeff Layton 2013-11-11 12:43 ` [Devel] " Vasily Kulikov 2013-11-11 13:26 ` Jeff Layton 2013-11-12 0:47 ` Greg KH 2013-11-12 11:12 ` Jeff Layton 2013-11-12 13:02 ` Stanislav Kinsbursky 2013-11-12 13:30 ` Jeff Layton 2013-11-15 5:05 ` Eric W. Biederman 2013-11-15 10:40 ` Stanislav Kinsbursky 2013-11-15 11:03 ` Eric W. Biederman 2013-11-15 11:54 ` Stanislav Kinsbursky 2016-02-12 23:39 ` Ian Kent 2016-02-13 16:08 ` Stanislav Kinsburskiy 2016-02-15 0:11 ` Ian Kent [not found] ` <1455495082.2941.32.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> 2016-02-18 3:17 ` Eric W. Biederman 2016-02-18 3:17 ` Eric W. Biederman 2013-11-18 17:28 ` Oleg Nesterov 2013-11-18 18:02 ` Oleg Nesterov 2013-11-19 14:51 ` Jeff Layton 2016-02-11 0:17 ` Ian Kent [not found] ` <1455149857.2903.9.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> 2016-02-18 2:57 ` Eric W. Biederman 2016-02-18 2:57 ` Eric W. Biederman [not found] ` <8737sq4teb.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2016-02-18 3:43 ` Kamezawa Hiroyuki 2016-02-18 3:43 ` Kamezawa Hiroyuki 2016-02-18 6:36 ` Ian Kent 2016-02-18 7:37 ` Ian Kent [not found] ` <1455781033.2908.5.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> 2016-02-18 20:45 ` Eric W. Biederman 2016-02-18 20:45 ` Eric W. Biederman [not found] ` <87r3g9ychc.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2016-02-19 3:08 ` Kamezawa Hiroyuki 2016-02-19 3:08 ` Kamezawa Hiroyuki [not found] ` <56C68714.2000900-+CUm20s59erQFUHtdCDX3A@public.gmane.org> 2016-02-19 5:37 ` Ian Kent 2016-02-19 5:37 ` Ian Kent [not found] ` <1455860260.3356.31.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> 2016-02-19 9:30 ` Kamezawa Hiroyuki 2016-02-19 9:30 ` Kamezawa Hiroyuki [not found] ` <56C6E0A8.3010806-+CUm20s59erQFUHtdCDX3A@public.gmane.org> 2016-02-20 3:28 ` Ian Kent 2016-02-20 3:28 ` Ian Kent 2016-02-19 5:14 ` Ian Kent 2016-02-19 5:14 ` Ian Kent 2016-02-23 2:55 ` Ian Kent [not found] ` <1456196130.2911.10.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> 2016-02-23 14:36 ` J. Bruce Fields 2016-02-23 14:36 ` J. Bruce Fields [not found] ` <20160223143627.GB31951-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org> 2016-02-24 0:55 ` Ian Kent 2016-02-24 0:55 ` Ian Kent [not found] ` <1455858850.3356.19.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> 2016-02-23 2:55 ` Ian Kent [not found] ` <1455777387.3188.24.camel-PKsaG3nR2I+sTnJN9+BGXg@public.gmane.org> 2016-02-18 7:37 ` Ian Kent [not found] ` <56C53DE3.1070108-+CUm20s59erQFUHtdCDX3A@public.gmane.org> 2016-02-18 6:36 ` Ian Kent 2016-03-24 7:45 ` Ian Kent 2016-03-25 1:28 ` Oleg Nesterov 2016-03-25 7:25 ` Ian Kent
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.