From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman)
To: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Linux Containers <containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	James Bottomley <James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org,
	trondmy-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [RFC][PATCH 0/9] Make containers kernel objects
Date: Wed, 24 May 2017 03:26:45 -0500
Message-ID: <87k256ek3e.fsf@xmission.com>
In-Reply-To: <3860.1495557363-S6HVgzuS8uM4Awkfq6JHfwNdhmdF6hFW@public.gmane.org> (David Howells's message of "Tue, 23 May 2017 17:36:03 +0100")

David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> James Bottomley <James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> wrote:
>
>> What David is pointing out is that the kernel has a DNS cache
>> (net/dns_resolver/) it can do name to IP translations, but isn't
>> namespaced.  Once it has one entry all containers would see it if they
>> cause a lookup to go through the kernel cache, so going through the
>> cache you can't have a name resolving to different IP addresses on a
>> per container basis.
>
> Yes - and the transport to userspace, the request_key() upcall, isn't
> namespaced either.  Namespacing it isn't entirely simple since we have to set
> the right mount namespace (for execve, config, etc.), plus any other relevant
> namespaces (such as network) - which is dependent on key type.
>
> I can't record the mount namespace in the network namespace because that would
> create a dependency loop:
>
>	mnt_ns -> mnt -> sb -> net_ns -> mnt_ns

I have already given a concrete suggestion on how this might be
untangled, so I won't repeat it here.

>> I think Eric's point is that if you need the same DNS names resolving
>> to different IP addresses on a per container basis, you can do this in
>> userspace today but you have to disable the in-kernel DNS cache.
>
> You could disable the in-kernel dns resolver in your config, but then you
> don't get referrals in NFS.  Also, CIFS, AFS and other filesystems would be
> affected.  If you're fine with the restrictions, then there is no
> problem.

I haven't been arguing that at all.  I was only pointing out that this
issue is not an issue with DNS.  Userspace handles this all fine.  The
issue is exclusively with this request_key API and, more generally, with
user mode upcalls.

I have no problem seeing that there is an issue with the kernel code.  I
am well aware of the problem.  Unfortunately the people who cared enough
to start addressing this have not been able to write kernel code that
fixes it.

My personal experience, when I tried to use the request_key API at the
beginning of this, was that it was too hard to test.  There was no room
for goofing up, as at that time it was impossible to invalidate a cached
reply from userspace even if you happened to know it was wrong.  Which
meant that if something incorrect was cached, getting rid of it required
a reboot.

I have a lot of sympathy with the view that the best way to do some of
this is with socket activation, or perhaps something with the rpc
portmapper, where something like inetd is used to start the user space
component on demand.  I won't call that a solution to this case but I do
think it makes a good example to compare with.
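For comparison, the inetd-style pattern mentioned above can be sketched in
userspace.  This is only an illustration of on-demand start from a
long-running listener; it is not kernel code and has nothing to do with how
request_key() is implemented.  The names spawn_helper, serve, CLEAN_ENV and
HELPER are hypothetical and invented for this sketch.

```python
import socket
import subprocess
import sys

# Hypothetical minimal environment, chosen by the supervisor, not the requester.
CLEAN_ENV = {"PATH": "/usr/bin:/bin"}

# Hypothetical helper: reads one line on stdin, writes it back upper-cased.
HELPER = [sys.executable, "-c",
          "import sys; sys.stdout.write(sys.stdin.readline().upper())"]


def spawn_helper(conn, argv=HELPER):
    """Start argv with the connection as its stdin and stdout.

    The child is forked from the supervisor, so it inherits the
    supervisor's namespaces, and it is given a fixed environment and
    working directory.  The requester controls only the bytes it sends.
    """
    return subprocess.Popen(argv, stdin=conn, stdout=conn,
                            env=CLEAN_ENV, cwd="/", close_fds=True)


def serve(listener):
    """inetd-style loop: start one helper per accepted connection."""
    while True:
        conn, _ = listener.accept()
        spawn_helper(conn)
        conn.close()  # the child holds its own copy of the descriptor
```

The point of the sketch is where the helper's context comes from: it is
whatever the listener was started in, fixed once, rather than something
reconstructed for each request.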

When you need to run something in a clean context, having that something
only need to worry about the contents of the data it is receiving, and
not about its environment as suid applications do, is a nice
simplification.

The entire user mode helper paradigm removes from user space the freedom
to specify what context its code should run in.  In a world where
everything is global that is fine.  But in a world with containers,
where not everything is global, it becomes a royal pain.  And I am very
very sympathetic to solving this.

The only solution that I know would work is to capture the context at
some point in a process and then to use that process to fork user mode
helpers.

So far no one has even bothered to seriously try the one solution that
is guaranteed to work, because it takes a lot of changes to kernel code.
I believe the last effort snagged on what a pain it is to refactor the
user mode helper infrastructure.

I don't see in your code any of that work.

I am glad to see that you also see the problem, at least when it comes
to the request_key API.

What I am hoping to see is someone who has the will to dig in,
understand all of the interactions, and refactor the kernel to solve the
problem.

This is not a case where our user space interfaces are preventing a
solution to this problem (as your patchset implies).  This is a case
where things need to be refactored kernel side to solve this.

So far this attempt is just another in the bazillion or so bad
half-assed attempts to solve this problem I have seen over the years.

Eric