From: David Howells
To: Aleksa Sarai
Cc: dhowells@redhat.com, James Bottomley, trondmy@primarydata.com,
	mszeredi@redhat.com, linux-nfs@vger.kernel.org, jlayton@redhat.com,
	Linux Containers, linux-kernel@vger.kernel.org,
	viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org,
	cgroups@vger.kernel.org, ebiederm@xmission.com
Subject: Re: [RFC][PATCH 0/9] Make containers kernel objects
Date: Tue, 23 May 2017 15:53:36 +0100
Message-ID: <2446.1495551216@warthog.procyon.org.uk>
References: <149547014649.10599.12025037906646164347.stgit@warthog.procyon.org.uk>
	<1495472039.2757.19.camel@HansenPartnership.com>

Aleksa Sarai wrote:

> >> The reason I think this is necessary is that the kernel has no idea
> >> how to direct upcalls to what userspace considers to be a container -
> >> current Linux practice appears to make a "container" just an
> >> arbitrarily chosen junction of namespaces, control groups and files,
> >> which may be changed individually within the "container".
>
> Just want to point out that if the kernel APIs for containers massively
> change, then the OCI will have to completely rework how we describe containers
> (and so will all existing runtimes).
>
> Not to mention that while I don't like how hard it is (from a runtime
> perspective) to actually set up a container securely, there are undoubtedly
> benefits to having namespaces split out. The network namespace being separate
> means that in certain contexts you actually don't want to create a new network
> namespace when creating a container.

Yep, I quite agree.

However, certain things need to be made per-net namespace that *aren't*.  DNS
results, for instance.

As an example, I could set up a client machine with two ethernet ports, set up
two DNS+NFS servers, each of which think they're called "foo.bar" and attach
each server to a different port on the client machine.  Then I could create a
pair of containers on the client machine and route the network in each
container to a different port.

Now there's a problem because the names of the cached DNS records for each
port overlap.

Further, the NFS idmapper needs to be able to direct its calls to the
appropriate network.

> I had some ideas about how you could implement bridging in userspace (as an
> unprivileged user, for rootless containers) but if you can't join namespaces
> individually then such a setup is not practically possible.

I'm not proposing to take away the ability to arbitrarily set the namespaces
in a container.  I haven't implemented it yet, but it was on the to-do list:

 (7) Directly set a container's namespaces to allow cross-container sharing.

David
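
To make the "foo.bar" collision above concrete, here is a rough userspace
sketch in Python.  It is only an illustration: the class names, the
"netns-a"/"netns-b" labels and the addresses (taken from the documentation
ranges) are all made up for the example and have nothing to do with the actual
kernel DNS resolver cache or its key format.  It just contrasts a cache keyed
by hostname alone with one keyed by (network namespace, hostname).

```python
# Hypothetical illustration only; none of these names are kernel APIs.


class GlobalDnsCache:
    """Cache keyed by hostname alone, shared by every container."""

    def __init__(self):
        self._records = {}

    def store(self, hostname, address):
        self._records[hostname] = address

    def lookup(self, hostname):
        return self._records.get(hostname)


class PerNetnsDnsCache:
    """Cache keyed by (network namespace, hostname)."""

    def __init__(self):
        self._records = {}

    def store(self, netns, hostname, address):
        self._records[(netns, hostname)] = address

    def lookup(self, netns, hostname):
        return self._records.get((netns, hostname))


# Two servers both call themselves "foo.bar"; each container reaches a
# different one through its own ethernet port.
shared = GlobalDnsCache()
shared.store("foo.bar", "192.0.2.1")     # result seen from container A
shared.store("foo.bar", "198.51.100.1")  # container B's result replaces it
collided = shared.lookup("foo.bar")      # container A now gets B's server

split = PerNetnsDnsCache()
split.store("netns-a", "foo.bar", "192.0.2.1")
split.store("netns-b", "foo.bar", "198.51.100.1")
addr_a = split.lookup("netns-a", "foo.bar")
addr_b = split.lookup("netns-b", "foo.bar")
```

With the shared cache the second store silently replaces the first, so one
container ends up talking to the other's server.  Keying on the namespace as
well keeps the two records apart, which is the sort of separation the example
in the mail is asking for.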