linux-kernel.vger.kernel.org archive mirror
From: Trond Myklebust <trondmy@hammerspace.com>
To: "sfrench@samba.org" <sfrench@samba.org>,
	"dhowells@redhat.com" <dhowells@redhat.com>,
	"keyrings@vger.kernel.org" <keyrings@vger.kernel.org>
Cc: "rgb@redhat.com" <rgb@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-security-module@vger.kernel.org" 
	<linux-security-module@vger.kernel.org>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"linux-cifs@vger.kernel.org" <linux-cifs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [RFC PATCH 02/27] containers: Implement containers as kernel objects
Date: Sun, 17 Feb 2019 18:57:54 +0000	[thread overview]
Message-ID: <8c95213ae0981bd7af928902fcb34d6a9dedaa6f.camel@hammerspace.com> (raw)
In-Reply-To: <155024685321.21651.1504201877881622756.stgit@warthog.procyon.org.uk>

Hi David,

On Fri, 2019-02-15 at 16:07 +0000, David Howells wrote:
> Implement a kernel container object such that it contains the
> following things:
> 
>  (1) Namespaces.
> 
>  (2) A root directory.
> 
>  (3) A set of processes, including one designated as the 'init'
>      process.
> 
> A container is created and attached to a file descriptor by:
> 
> 	int cfd = container_create(const char *name, unsigned int flags);
> 
> This inherits all the namespaces of the parent container unless the
> flags mask calls for new namespaces:
> 
> 	CONTAINER_NEW_FS_NS
> 	CONTAINER_NEW_EMPTY_FS_NS
> 	CONTAINER_NEW_CGROUP_NS [root only]
> 	CONTAINER_NEW_UTS_NS
> 	CONTAINER_NEW_IPC_NS
> 	CONTAINER_NEW_USER_NS
> 	CONTAINER_NEW_PID_NS
> 	CONTAINER_NEW_NET_NS
> 
> Other flags include:
> 
> 	CONTAINER_KILL_ON_CLOSE
> 	CONTAINER_CLOSE_ON_EXEC
> 
> Note that I've added a pointer to the current container to task_struct.
> This doesn't make the nsproxy pointer redundant, as you can still make new
> namespaces with clone().
> 
> I've also added a list_head to task_struct to form a list in the container
> of its member processes.  This is convenient, but redundant, since the code
> could iterate over all the tasks looking for ones that have a matching
> task->container.
> 
> It might make sense to use fsconfig() to configure the container:
> 
> 	fsconfig(cfd, FSCONFIG_SET_NAMESPACE, "user", NULL, userns_fd);
> 	fsconfig(cfd, FSCONFIG_SET_NAMESPACE, "mnt", NULL, mntns_fd);
> 	fsconfig(cfd, FSCONFIG_SET_FD, "rootfs", NULL, root_fd);
> 	fsconfig(cfd, FSCONFIG_CMD_CREATE_CONTAINER, NULL, NULL, 0);
> 
> 
> ==================
> FUTURE DEVELOPMENT
> ==================
> 
>  (1) Setting up the container.
> 
>      A container would be created with, say:
> 
> 	int cfd = container_create("fred", CONTAINER_NEW_EMPTY_FS_NS);
> 
>      Once created, it should then be possible for the supervising process
>      to modify the new container.  Mounts can be created inside of the
>      container's namespaces:
> 
> 	fsfd = fsopen("ext4", 0);
> 	fsconfig(fsfd, FSCONFIG_SET_CONTAINER, NULL, NULL, cfd);
> 	fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "/dev/sda3", 0);
> 	fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
> 	fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> 	mfd = fsmount(fsfd, 0, 0);
> 
>      and then mounted into the namespace:
> 
> 	move_mount(mfd, "", cfd, "/",
> 		   MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_CONTAINER_ROOT);
> 
>      Further mounts can be added by:
> 
> 	move_mount(mfd, "", cfd, "proc", MOVE_MOUNT_F_EMPTY_PATH);
> 
>      Files and devices can be created by supplying the container fd as the
>      dirfd argument:
> 
> 	mkdirat(int cfd, const char *path, mode_t mode);
> 	mknodat(int cfd, const char *path, mode_t mode, dev_t dev);
> 	int fd = openat(int cfd, const char *path,
> 			unsigned int flags, mode_t mode);
> 
>      [*] Note that when using cfd as dirfd, the path must not begin with
>          a '/'.
> 
>      Sockets, such as netlink, can be opened inside of the container's
>      namespaces:
> 
> 	int fd = container_socket(int cfd, int domain, int type,
> 				  int protocol);
> 
>      This should allow management of the container's network namespace
>      from outside.
> 
>  (2) Starting the container.
> 
>      Once all modifications are complete, the container's 'init' process
>      can be started by:
> 
> 	fork_into_container(int cfd);
> 
>      This precludes further external modification of the mount tree within
>      the container.  Before this point, the container is simply destroyed
>      if the container fd is closed.
> 
>  (3) Waiting for the container to complete.
> 
>      The container fd can then be polled to wait for the init process
>      therein to complete, and the exit code collected by:
> 
> 	container_wait(int container_fd, int *_wstatus, unsigned int wait,
> 		       struct rusage *rusage);
> 
>      The container and everything in it can be terminated or killed off:
> 
> 	container_kill(int container_fd, int initonly, int signal);
> 
>      If 'init' dies, all other processes in the container are preemptively
>      SIGKILL'd by the kernel.
> 
>      By default, if the container is active and its fd is closed, the
>      container is left running and will be cleaned up when its 'init' exits.
>      The default can be changed with the CONTAINER_KILL_ON_CLOSE flag.
> 
>  (4) Supervising the container.
> 
>      Given that we have an fd attached to the container, we could make it
>      such that the supervising process could monitor and override EPERM
>      returns for mount and other privileged operations within the
>      container.
> 
>  (5) Per-container keyring.
> 
>      Each container can point to a per-container keyring for the holding of
>      integrity keys and filesystem keys for use inside the container.  This
>      would be attached with:
> 
> 	keyctl(KEYCTL_SET_CONTAINER_KEYRING, cfd, keyring);
> 
>      This keyring would be searched by request_key() after it has searched
>      the thread, process and session keyrings.
> 
>  (6) Running different LSM policies by container.  This might particularly
>      make sense with something like AppArmor, where different path-based
>      rules might be required inside a container than inside the parent.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---

Do we really need a new system call to set up containers? That would
force changes to all existing orchestration software.

Given that the main thing we want to achieve is to direct messages from
the kernel to an appropriate handler, why not focus on adding
functionality to do just that?

Is there any reason why a syscall to allow an appropriately privileged
process to add a keyring-specific message queue to its own
user_namespace and obtain a file descriptor to that message queue might
not work? That forces the container to use a daemon if it cares to
intercept keyring traffic, rather than worrying about the kernel
running request_key (in fact, it might make sense to allow a trivial
implementation of the daemon that just reads the messages, parses
them and runs request_key).

With such an implementation, the fallback mechanism could be to walk
back up the hierarchy of user_namespaces until a message queue is
found, and to invoke the existing request_key mechanism if not.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com


