From: Djalal Harouni <tixxdz@gmail.com>
To: Jeff Layton <jlayton@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
    David Howells <dhowells@redhat.com>,
    trondmy@primarydata.com,
    Miklos Szeredi <mszeredi@redhat.com>,
    linux-nfs@vger.kernel.org,
    linux-kernel <linux-kernel@vger.kernel.org>,
    Alexander Viro <viro@zeniv.linux.org.uk>,
    Linux FS Devel <linux-fsdevel@vger.kernel.org>,
    "open list:CONTROL GROUP (CGROUP)" <cgroups@vger.kernel.org>,
    Andy Lutomirski <luto@kernel.org>,
    Kees Cook <keescook@chromium.org>
Subject: Re: [RFC][PATCH 0/9] Make containers kernel objects
Date: Tue, 23 May 2017 16:23:55 +0200
Message-ID: <CAEiveUchdQVvFCVxSVcvpxnT0gj=sidfsJfW5mOOWKL6LWddYQ@mail.gmail.com>
In-Reply-To: <1495491733.25946.3.camel@redhat.com>

On Tue, May 23, 2017 at 12:22 AM, Jeff Layton <jlayton@redhat.com> wrote:
> On Mon, 2017-05-22 at 14:04 -0500, Eric W. Biederman wrote:
>> David Howells <dhowells@redhat.com> writes:
>>
>> > Here are a set of patches to define a container object for the kernel and
>> > to provide some methods to create and manipulate them.
>> >
>> > The reason I think this is necessary is that the kernel has no idea how to
>> > direct upcalls to what userspace considers to be a container - current
>> > Linux practice appears to make a "container" just an arbitrarily chosen
>> > junction of namespaces, control groups and files, which may be changed
>> > individually within the "container".
>> >
>>
>> I think this might possibly be a useful abstraction for solving the
>> keyring upcalls if it was something created implicitly.
>>
>> fork_into_container for use by keyring upcalls is currently a security
>> vulnerability as it allows escaping all of a containers cgroups. But
>> you have that on your list of things to fix. However you don't have
>> seccomp and a few other things.
>>
>> Before we had kthreadd in the kernel upcalls always had issues because
>> the code to reset all of the userspace bits and make the forked
>> task suitable for running upcalls was always missing some detail. It is
>> a very bug-prone kind of idiom that you are talking about. It is doubly
>> bug-prone because the wrongness is visible to userspace and as such
>> might get become a frozen KABI guarantee.
>>
>> Let me suggest a concrete alternative:
>>
>> - At the time of mount observer the mounters user namespace.
>> - Find the mounters pid namespace.
>> - If the mounters pid namespace is owned by the mounters user namespace
>>   walk up the pid namespace tree to the first pid namespace owned by
>>   that user namespace.
>> - If the mounters pid namespace is not owned by the mounters user
>>   namespace fail the mount it is going to need to make upcalls as
>>   will not be possible.
>> - Hold a reference to the pid namespace that was found.
>>
>> Then when an upcall needs to be made fork a child of the init process
>> of the specified pid namespace. Or fail if the init process of the
>> pid namespace has died.
>>
>> That should always work and it does not require keeping expensive state
>> where we did not have it previously. Further because the semantics are
>> fork a child of a particular pid namespace's init as features get added
>> to the kernel this code remains well defined.
>>
>> For ordinary request-key upcalls we should be able to use the same rules
>> and just not save/restore things in the kernel.
>>
>
> OK, that does seem like a reasonable idea. Note that it's not just
> request-key upcalls here that we're interested in, but anything that
> we'd typically spawn from kthreadd otherwise.

Generalizing it will expose the kernel to exploits. Today containers
set up the mount namespace for images from the net, outdated
filesystems, and users just do it; it is easy. Having a kthread running
inside such contexts is not a good idea. Those are today's use cases.

> That said, I worry a little about this. If the init process does a setns
> at the wrong time, suddenly you're doing the upcall in different
> namespaces than you intended.

That init process, or whatever process is inside, owns that context and
its files. Maybe for some cases it is better to use a userspace service
that you can talk to through a standard kernel bus endpoint to request a
resource, as is done in modern apps. The application at the other end
acts using kthread helpers in the appropriate context.

--
tixxdz