From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman)
To: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Linux Containers
	<containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	James Bottomley
	<James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org,
	trondmy-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [RFC][PATCH 0/9] Make containers kernel objects
Date: Wed, 24 May 2017 03:26:45 -0500
Message-ID: <87k256ek3e.fsf@xmission.com>
In-Reply-To: <3860.1495557363-S6HVgzuS8uM4Awkfq6JHfwNdhmdF6hFW@public.gmane.org> (David Howells's message of "Tue, 23 May 2017 17:36:03 +0100")

David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> James Bottomley <James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org> wrote:
>
>> What David is pointing out is that the kernel has a DNS cache
>> (net/dns_resolver/) it can do name to IP translations, but isn't
>> namespaced.  Once it has one entry all containers would see it if they
>> cause a lookup to go through the kernel cache, so going through the
>> cache you can't have a name resolving to different IP addresses on a
>> per container basis.
>
> Yes - and the transport to userspace, the request_key() upcall, isn't
> namespaced either.  Namespacing it isn't entirely simple since we have to set
> the right mount namespace (for execve, config, etc.), plus any other relevant
> namespaces (such as network) - which is dependent on key type.
>
> I can't record the mount namespace in the network namespace because that would
> create a dependency loop:
>
> 	mnt_ns -> mnt -> sb -> net_ns -> mnt_ns

I have already given a concrete suggestion on how this might be
untangled, so I won't repeat it here.

>> I think Eric's point is that if you need the same DNS names resolving
>> to different IP addresses on a per container basis, you can do this in
>> userspace today but you have to disable the in-kernel DNS cache.
>
> You could disable the in-kernel dns resolver in your config, but then you
> don't get referrals in NFS.  Also, CIFS, AFS and other filesystems would be
> affected.  If you're fine with the restrictions, then there is no
> problem.


I haven't been arguing that at all.  I was only pointing out that this
is not an issue with DNS.  Userspace handles all of this fine.
The issue is exclusively with the request_key API and, more generally,
with user mode upcalls.

I have no problem seeing that there is an issue with the kernel code.
I am well aware of the problem.  Unfortunately the people who cared
enough to start addressing this have not been able to write kernel
code that fixes this.

My personal experience when I tried to use the request_key API at
the beginning of this was that it was too hard to test.  There was no
room for goofing up, because at that time it was impossible to
invalidate a cached reply from userspace even if you happened to know
it was wrong.  That meant that if something incorrect was cached,
getting rid of it required rebooting the kernel.

I have a lot of sympathy with the view that the best way to do
some of this is with socket activation, or perhaps something like the
rpc portmapper, where something like inetd is used to start the user
space component on demand.  I won't call that a solution to this case,
but I do think it makes a good example to compare with.

When you need to run something in a clean context, having that
something only need to worry about the contents of the data it is
receiving, and not about its environment as suid applications do, is a
nice simplification.
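
For comparison, here is a minimal userspace sketch of the inetd-style
pattern described above.  It is only an illustration: the names
make_listener(), handle() and serve_one() are made up for this example,
and a UNIX socket in a temporary directory stands in for whatever
rendezvous point a real service would use.  The handler is started on
demand and only looks at the data it receives.

```python
import os
import socket
import tempfile


def make_listener():
    # Hypothetical rendezvous point: a UNIX socket in a fresh temp directory.
    path = os.path.join(tempfile.mkdtemp(), "helper.sock")
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(8)
    return listener, path


def handle(conn):
    # The service only inspects the request data, not the caller's environment.
    request = conn.recv(4096)
    conn.sendall(b"reply:" + request)


def serve_one(listener):
    # Start a handler process on demand for one incoming connection,
    # roughly the way inetd would.
    conn, _ = listener.accept()
    pid = os.fork()
    if pid == 0:
        try:
            listener.close()
            handle(conn)
        finally:
            os._exit(0)  # never fall back into the parent's code path
    conn.close()
    os.waitpid(pid, 0)
```

A client connects to the socket path, sends a request, and reads the
reply once serve_one() has run.  The handler's environment is whatever
the listening process had when it was started, not the client's.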

The entire user mode helper paradigm removes from user space the freedom
to specify what context its code should run in.  In a world where
everything is global, that is fine.  But in a world with containers,
where not everything is global, it becomes a royal pain.

And I am very very sympathetic to solving this.  The only solution that
I know would work is to capture the context at some point in a process
and then to use that process to fork user mode helpers.
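
A rough userspace analogy of that approach, as a sketch only: a
long-lived process captures its context once and every helper is then
forked from it, so helpers inherit the captured context rather than the
requester's.  Everything here is an assumption made for illustration:
the function names are hypothetical, the working directory stands in
for the real context (namespaces, credentials and so on), and a
SOCK_SEQPACKET socketpair stands in for the kernel-side request path.
It says nothing about how the kernel would have to be refactored.

```python
import os
import socket


def run_helper(request):
    # Hypothetical helper: reports the context it ran in plus the request.
    return f"{os.getcwd()}|{request}".encode()


def serve(sock):
    # Runs inside the context-holding process.  Each helper is forked from
    # here, so it inherits this process's context, not the requester's.
    while True:
        request = sock.recv(4096)
        if not request:  # requester closed its end: shut down
            return
        pid = os.fork()
        if pid == 0:
            try:
                sock.send(run_helper(request.decode()))
            finally:
                os._exit(0)
        os.waitpid(pid, 0)


def start_context_server(context_dir):
    # Fork the process that captures the context once, up front.
    ours, theirs = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    pid = os.fork()
    if pid == 0:
        try:
            ours.close()
            os.chdir(context_dir)  # stand-in for entering the real context
            serve(theirs)
        finally:
            os._exit(0)
    theirs.close()
    return pid, ours


def request_helper(sock, request):
    # Ask the context-holding process to fork a helper for this request.
    sock.send(request.encode())
    return sock.recv(4096).decode()
```

The requester calls start_context_server() once and request_helper()
per request.  Its own working directory is never changed; only the
helpers see the captured one.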

So far no one has even bothered to seriously try the one solution that
is guaranteed to work because it takes a lot of changes to kernel code.
I believe the last effort snagged on what a pain it is to refactor the
user mode helper infrastructure.

I don't see in your code any of that work.

I am glad to see that you also see the problem, at least when it comes
to the request_key API.

What I am hoping to see is someone who has the will to dig in and
understand all of the interactions and refactor the kernel to solve
the problem.

This is not a case where our user space interfaces are preventing a
solution to this problem (as your patchset implies).  This is a case
where things need to be refactored kernel side to solve this.

So far this attempt is just another in the bazillion or so bad
half-assed attempts to solve this problem I have seen over the years.

Eric
