From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: David Howells <dhowells@redhat.com>, trondmy@primarydata.com
Cc: mszeredi@redhat.com, linux-nfs@vger.kernel.org,
	jlayton@redhat.com, linux-kernel@vger.kernel.org,
	viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org,
	cgroups@vger.kernel.org, ebiederm@xmission.com,
	Linux Containers <containers@lists.linux-foundation.org>
Subject: Re: [RFC][PATCH 0/9] Make containers kernel objects
Date: Mon, 22 May 2017 09:53:59 -0700
Message-ID: <1495472039.2757.19.camel@HansenPartnership.com>
In-Reply-To: <149547014649.10599.12025037906646164347.stgit@warthog.procyon.org.uk>

[Added missing cc to containers list]
On Mon, 2017-05-22 at 17:22 +0100, David Howells wrote:
> Here are a set of patches to define a container object for the kernel 
> and to provide some methods to create and manipulate them.
> 
> The reason I think this is necessary is that the kernel has no idea 
> how to direct upcalls to what userspace considers to be a container -
> current Linux practice appears to make a "container" just an 
> arbitrarily chosen junction of namespaces, control groups and files, 
> which may be changed individually within the "container".

This sounds like a step in the wrong direction: the strength of the
current container interfaces in Linux is that people who set up
containers don't have to agree what they look like.  So I can set up a
user namespace without a mount namespace or an architecture emulation
container with only a mount namespace.

But setting aside my own fun foibles with containers, here is a
concrete example in terms of a popular orchestration system: in
Kubernetes, where certain namespaces are shared between the containers
of a pod, do you imagine the kernel's view of the "container" to be the
pod or what Kubernetes thinks of as the container?  This is important,
because half the examples you give below are network related and the
containers in a pod usually share a network namespace.
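
To make the question concrete, here is a minimal sketch (not part of
the patch set; the helper names are mine) of how userspace can already
see which namespaces two processes share.  It assumes only the
/proc/<pid>/ns symlinks described in namespaces(7).  Two containers in
the same pod would typically report "net" as shared but not "mnt".

```python
import os


def namespace_ids(pid="self"):
    """Map namespace type to its link target, e.g. {'mnt': 'mnt:[4026531840]'}."""
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return ids  # no such process, not Linux, or /proc is not mounted
    for name in sorted(names):
        try:
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry not readable for this process; skip it
    return ids


def shared_namespaces(pid_a, pid_b):
    """Return the sorted namespace types that both processes are members of."""
    a, b = namespace_ids(pid_a), namespace_ids(pid_b)
    return sorted(name for name in a if name in b and a[name] == b[name])
```

Comparing a process with itself returns every namespace type it can
read; comparing processes from two different setups shows exactly which
pieces they have in common, which is the flexibility I'm talking about.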

> The kernel upcall mechanism then needs to decide which set of 
> namespaces, etc., it must exec the appropriate upcall program. 
>  Examples of this include:
> 
>  (1) The DNS resolver.  The DNS cache in the kernel should probably 
> be per-network namespace, but in userspace the program, its
> libraries and its config data are associated with a mount tree and a 
> user namespace and it gets run in a particular pid namespace.

All persistent data (i.e. data written to the fs) has to be associated
with a mount ns; there are no ifs, ands or buts about that.  I agree
this implies that if
you want to run a separate network namespace, you either take DNS from
the parent (a lot of containers do) or you set up a daemon to run
within the mount namespace.  I agree the latter is a slightly fiddly
operation you have to get right, but that's why we have orchestration
systems.

What is it we could do with the above that we cannot do today?

>  (2) NFS ID mapper.  The NFS ID mapping cache should also probably be
>      per-network namespace.

I think this is one view but not the only one.  Right now, NFS ID
mapping is used as one of the ways we can fix the problem of user
namespace ID mapping for writes to files ... that makes it a property
of the mount namespace for a lot of containers.  There are many other
instances where they do exactly as you say, but what I'm saying is that
we don't want to lose the flexibility we currently have.

>  (3) nfsdcltrack.  A way for NFSD to access stable storage for 
> tracking of persistent state.  Again, network-namespace dependent, 
> but also perhaps mount-namespace dependent.

So again, given we can set this up to work today, this sounds more like
a restriction that will bite us than an enhancement that gives us extra
features.

>  (4) General request-key upcalls.  Not particularly namespace 
> dependent, apart from keyrings being somewhat governed by the user
> namespace and the upcall being configured by the mount namespace.

All mount namespaces have an owning user namespace, so the data
relations are already there in the kernel.  Is the problem simply
finding them?
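
For what it's worth, userspace can already ask the kernel for this
relation: the NS_GET_USERNS ioctl described in ioctl_ns(2) (Linux 4.9
and later) returns a file descriptor for the user namespace that owns a
given namespace.  A hedged sketch follows.  The function name is
hypothetical, and the request value is hard-coded on the assumption of
the common _IO() encoding (x86, arm), so it may differ on other
architectures.

```python
import fcntl
import os

# NS_GET_USERNS is _IO(0xb7, 0x1) in <linux/nsfs.h>.  The literal below
# assumes the common ioctl encoding (x86, arm); that is an assumption of
# this sketch, not something taken from the patch set.
NS_GET_USERNS = 0xB701


def owning_userns(ns_path="/proc/self/ns/mnt"):
    """Return e.g. 'user:[4026531837]' for the owner of ns_path, or None."""
    try:
        ns_fd = os.open(ns_path, os.O_RDONLY)
    except OSError:
        return None  # not Linux, or /proc is not mounted
    try:
        # On success the ioctl's return value is a new fd for the owner.
        user_fd = fcntl.ioctl(ns_fd, NS_GET_USERNS)
    except OSError:
        return None  # e.g. EPERM: the owner is outside our namespace scope
    finally:
        os.close(ns_fd)
    try:
        return os.readlink(f"/proc/self/fd/{user_fd}")
    except OSError:
        return None
    finally:
        os.close(user_fd)
```

Calling owning_userns() with no argument asks about the caller's own
mount namespace; a None result only means the lookup was not possible
from where the caller sits, not that there is no owner.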

> These patches are built on top of the mount context patchset so that
> namespaces can be properly propagated over submounts/automounts.

I'll stop here ... you get the idea that I think this is imposing a set
of restrictions that will come back to bite us later.  If this is just
for the sake of figuring out how to get keyring upcalls to work, then
I'm sure we can come up with something.

James
