From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755957AbcA2I4k (ORCPT ); Fri, 29 Jan 2016 03:56:40 -0500
Received: from h2.hallyn.com ([78.46.35.8]:43920 "EHLO h2.hallyn.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753569AbcA2IyP (ORCPT ); Fri, 29 Jan 2016 03:54:15 -0500
From: serge.hallyn@ubuntu.com
To: linux-kernel@vger.kernel.org
Cc: adityakali@google.com, tj@kernel.org, linux-api@vger.kernel.org,
	containers@lists.linux-foundation.org, cgroups@vger.kernel.org,
	lxc-devel@lists.linuxcontainers.org, akpm@linux-foundation.org,
	ebiederm@xmission.com, gregkh@linuxfoundation.org, lizefan@huawei.com,
	hannes@cmpxchg.org
Subject: CGroup Namespaces (v10)
Date: Fri, 29 Jan 2016 02:54:03 -0600
Message-Id: <1454057651-23959-1-git-send-email-serge.hallyn@ubuntu.com>
X-Mailer: git-send-email 1.7.9.5
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

following is a revised set of the CGroup Namespace patchset which Aditya
Kali has previously sent.  The code can also be found in the cgroupns.v10
branch of

https://git.kernel.org/cgit/linux/kernel/git/sergeh/linux-security.git/

To summarize the semantics:

1. CLONE_NEWCGROUP re-uses 0x02000000, which was previously CLONE_STOPPED

2. unsharing a cgroup namespace makes all your current cgroups your new
cgroup root.

3. /proc/pid/cgroup always shows cgroup paths relative to the reader's
cgroup namespace root.  A task outside of your cgroup looks like

	8:memory:/../../..

4. when a task mounts a cgroupfs, the cgroup which shows up as root depends
on the mounting task's cgroup namespace.

5. setns to a cgroup namespace switches your cgroup namespace but not
your cgroups.

With this, using github.com/hallyn/lxc #2015-11-09/cgns (and
github.com/hallyn/lxcfs #2015-11-10/cgns) we can start a container in a full
proper cgroup namespace, avoiding either cgmanager or lxcfs cgroup bind mounts.

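As an illustration of semantic 3 above, here is a rough userspace sketch
(Python, not the kernel code) of how a path relative to the reader's cgroup
namespace root could be derived: find the common ancestor, emit one ".."
per level from the namespace root up to it, then append the rest of the
target path.  The function name cgroup_path_from_ns_root and the sample
cgroup names are made up for this example; the exact kernel output format
is assumed from the "8:memory:/../../.." sample above.

```python
def cgroup_path_from_ns_root(ns_root: str, target: str) -> str:
    """Return `target` as it would appear from a namespace rooted at `ns_root`.

    Both arguments are absolute cgroup paths in the same hierarchy.
    Hypothetical helper for illustration only; not a kernel interface.
    """
    root_parts = [p for p in ns_root.split("/") if p]
    target_parts = [p for p in target.split("/") if p]

    # Depth of the common ancestor: length of the shared leading components.
    common = 0
    for a, b in zip(root_parts, target_parts):
        if a != b:
            break
        common += 1

    # One ".." per level between the namespace root and the common ancestor,
    # followed by the components leading from the ancestor to the target.
    ups = [".."] * (len(root_parts) - common)
    downs = target_parts[common:]
    return "/" + "/".join(ups + downs)


if __name__ == "__main__":
    # A reader whose namespace root is three levels deep, looking at a task
    # in the hierarchy root.
    print(cgroup_path_from_ns_root("/lxc/c1/payload", "/"))
```

A task inside the reader's namespace root yields an ordinary path such as
"/child", and a task in the root cgroup itself yields "/".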
This is completely backward compatible and will be completely invisible
to any existing cgroup users (except for those running inside a cgroup
namespace and looking at /proc/pid/cgroup of tasks outside their
namespace).

Changes from V9:
1. Update to latest Linus tree
2. A few locking fixes

Changes from V8:
1. Incorporate updated documentation from tj.
2. Put lookup_one_len() under inode lock
3. Make cgroup_path non-namespaced, so only calls to cgroup_path_ns() are
   namespaced.
4. Make cgroup_path{,_ns} take the needed locks, since external callers cannot
   do so.
5. Fix the bisectability problem of to_cg_ns() being defined after use

Changes from V7:
1. Rework kernfs_path_from_node_locked to return the string length
2. Rename and reorder args to kernfs_path_from_node
3. cgroup.c: undo accidental conversions to inline
4. cgroup.h: move ns declarations to bottom.
5. Rework the documentation to fit the style of the rest of cgroup.txt

Changes from V6:
1. Switch to some WARN_ONs to provide stack traces
2. Rename kernfs_node_distance to kernfs_depth
3. Make sure kernfs_common_ancestor() nodes are from same root
4. Split kernfs changes for cgroup_mount into separate patch
5. Rename kernfs_obtain_root to kernfs_node_dentry
(And more, see patch changelogs)

Changes from V5:
1. To get a root dentry for cgroup namespace mount, walk the path from the
   kernfs root dentry.

Changes from V4:
1. Move the FS_USERNS_MOUNT flag to last patch
2. Rebase onto cgroup/for-4.5
3. Don't allow non-init user namespaces to bind new subsystems when mounting.
4. Address feedback from Tejun (thanks).  Specifically, not addressed:
   . kernfs_obtain_root - walking dentry from kernfs root.
     (I think that's the only piece)
5. Dropped unused get_task_cgroup fn/patch.
6. Reworked kernfs_path_from_node_locked() to try to simplify the logic.
   It now finds a common ancestor, walks from the source to it, then back
   up to the target.

Changes from V3:
1. Rebased onto latest cgroup changes.
   In particular switch to css_set_lock and ns_common.
2. Support all hierarchies.

Changes from V2:
1. Added documentation in Documentation/cgroups/namespace.txt
2. Fixed a bug that caused a crash
3. Incorporated some other suggestions from last patchset:
   - removed use of threadgroup_lock() while creating new cgroupns
   - use task_lock() instead of rcu_read_lock() while accessing
     task->nsproxy
   - optimized setns() to own cgroupns
   - simplified code around sane-behavior mount option parsing
4. Restored ACKs from Serge Hallyn from v1 on a few patches that have
   not changed since then.

Changes from V1:
1. No pinning of processes within cgroupns. Tasks can be freely moved
   across cgroups even outside of their cgroupns-root. Usual DAC/MAC policies
   apply as before.
2. Path in /proc/<pid>/cgroup is now always shown and is relative to
   cgroupns-root. So path can contain '/..' strings depending on cgroupns-root
   of the reader and cgroup of <pid>.
3. setns() does not require the process to first move under target
   cgroupns-root.

Changes from RFC (V0):
1. setns support for cgroupns
2. 'mount -t cgroup cgroup <mntpt>' from inside a cgroupns now
   mounts the cgroup hierarchy with cgroupns-root as the filesystem root.
3. writes to cgroup files outside of cgroupns-root are not allowed
4. visibility of /proc/<pid>/cgroup is further restricted by not showing
   anything if the <pid> is in a sibling cgroupns and its cgroup falls outside
   your cgroupns-root.
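
Since paths in /proc/<pid>/cgroup can now contain '/..' components
(V1 item 2 above), a userspace reader may want to tell whether a task's
cgroup lies outside its own cgroupns-root.  A minimal sketch in Python,
assuming the usual "hierarchy-ID:controller-list:path" line format; the
names CgroupEntry, parse_proc_cgroup_line and is_outside_ns_root are
hypothetical and not part of this patchset:

```python
from typing import NamedTuple, Tuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: Tuple[str, ...]
    path: str


def parse_proc_cgroup_line(line: str) -> CgroupEntry:
    """Split one /proc/<pid>/cgroup line into its three fields."""
    # maxsplit=2 keeps any ':' inside the cgroup path intact.
    hid, controllers, path = line.rstrip("\n").split(":", 2)
    names = tuple(c for c in controllers.split(",") if c)
    return CgroupEntry(int(hid), names, path)


def is_outside_ns_root(entry: CgroupEntry) -> bool:
    """True if the path climbs above the reader's cgroupns-root."""
    # Match a leading ".." component exactly, so that a cgroup whose name
    # merely begins with ".." (e.g. "/..foo") is not misclassified.
    return entry.path == "/.." or entry.path.startswith("/../")


if __name__ == "__main__":
    entry = parse_proc_cgroup_line("8:memory:/../../..\n")
    print(entry, is_outside_ns_root(entry))
```

This only inspects the string the kernel reports; it makes no claim about
whether the reader is allowed to write to that cgroup.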