linux-kernel.vger.kernel.org archive mirror
From: Serge Hallyn <serge.hallyn@ubuntu.com>
To: Aditya Kali <adityakali@google.com>
Cc: tj@kernel.org, lizefan@huawei.com, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	mingo@redhat.com, containers@lists.linux-foundation.org
Subject: Re: [PATCH 0/5] RFC: CGroup Namespaces
Date: Thu, 24 Jul 2014 16:36:28 +0000
Message-ID: <20140724163628.GN26600@ubuntumail>
In-Reply-To: <1405626731-12220-1-git-send-email-adityakali@google.com>

Quoting Aditya Kali (adityakali@google.com):
> Background
>   Cgroups and Namespaces are used together to create “virtual”
>   containers that isolate the host environment from the processes
>   running in the container. But since cgroups themselves are not
>   “virtualized”, the task is always able to see the global cgroup view
>   through the cgroupfs mount and via the /proc/self/cgroup file.
> 
>   $ cat /proc/self/cgroup 
>   0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/c_job_id1
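The host-specific information that leaks is the last field of each line. As a minimal Python sketch, assuming the `hierarchy-ID:controller-list:path` layout shown above, a process inside a container can recover it like this (`CgroupEntry` and `parse_cgroup_line` are hypothetical names):

```python
# Minimal sketch: split one /proc/<pid>/cgroup line into its three fields.
# Assumes the "hierarchy-ID:controller-list:path" layout; the path may itself
# contain ':' so we split at most twice.
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]
    path: str


def parse_cgroup_line(line: str) -> CgroupEntry:
    hier, ctrls, path = line.rstrip("\n").split(":", 2)
    controllers = ctrls.split(",") if ctrls else []
    return CgroupEntry(int(hier), controllers, path)


entry = parse_cgroup_line(
    "0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/c_job_id1")
print(entry.path)  # /batchjobs/c_job_id1
```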
> 
>   This exposure of cgroup names to the processes running inside a
>   container results in some problems:
>   (1) The container names are typically data owned by the host's
>       container-management agent (systemd, docker/libcontainer, etc.),
>       and leaking them (or leaking the hierarchy) reveals too much
>       information about the host system.
>   (2) It makes container migration across machines (CRIU) more
>       difficult, as container names need to be unique across the
>       machines in the migration domain.
>   (3) It makes it difficult to run container management tools (like
>       docker/libcontainer, lmctfy, etc.) within virtual containers
>       without adding dependency on some state/agent present outside the
>       container.
> 
>   Note that the feature proposed here is completely different from the
>   “ns cgroup” feature which existed in the Linux kernel until recently.
>   The ns cgroup also attempted to connect cgroups and namespaces by
>   creating a new cgroup every time a new namespace was created. It did
>   not solve any of the above-mentioned problems and was later dropped
>   from the kernel.
> 
> Introducing CGroup Namespaces
>   With the unified cgroup hierarchy
>   (Documentation/cgroups/unified-hierarchy.txt), containers can now
>   have a much more coherent cgroup view, and it's easy to associate a
>   container with a single cgroup. This also allows us to virtualize the
>   cgroup view for tasks inside the container.
> 
>   The new CGroup Namespace allows a process to “unshare” its cgroup
>   hierarchy starting from the cgroup it's currently in.
>   For example:
>   $ cat /proc/self/cgroup
>   0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/c_job_id1
>   $ ls -l /proc/self/ns/cgroup
>   lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
>   $ ~/unshare -c  # calls unshare(CLONE_NEWCGROUP) and execs /bin/bash
>   [ns]$ ls -l /proc/self/ns/cgroup
>   lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
>   # From within the new cgroupns, the process sees that it's in the root cgroup
>   [ns]$ cat /proc/self/cgroup
>   0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/
> 
>   # From global cgroupns:
>   $ cat /proc/<pid>/cgroup
>   0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/c_job_id1
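The `~/unshare -c` helper itself is not included in the posting. A rough Python equivalent is sketched below, assuming it does nothing more than its comment says; it goes through `ctypes` to call unshare(2) and uses 0x02000000, the value `CLONE_NEWCGROUP` later received in the kernel's `<linux/sched.h>`. All function names are illustrative, and actually running it requires CAP_SYS_ADMIN.

```python
# Hypothetical re-creation of the "~/unshare -c" helper: unshare(2) with
# CLONE_NEWCGROUP via ctypes, then exec a shell.  Needs CAP_SYS_ADMIN.
import ctypes
import os

# Value of CLONE_NEWCGROUP in <linux/sched.h> as eventually merged.
CLONE_NEWCGROUP = 0x02000000

_libc = ctypes.CDLL(None, use_errno=True)  # symbols of the running process (libc)


def unshare_cgroupns() -> None:
    """Move the calling process into a new cgroup namespace."""
    if _libc.unshare(CLONE_NEWCGROUP) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def cgroupns_link(pid: str = "self") -> str:
    """Return e.g. 'cgroup:[4026531835]' for /proc/<pid>/ns/cgroup."""
    return os.readlink(f"/proc/{pid}/ns/cgroup")


def unshare_and_exec(shell: str = "/bin/bash") -> None:
    unshare_cgroupns()
    os.execv(shell, [shell])  # does not return on success
```

Calling `cgroupns_link()` before and after `unshare_cgroupns()` should show the same kind of inode change as in the session above.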
> 
>   The virtualization of the /proc/self/cgroup file, combined with
>   restricting the view of the cgroup hierarchy by bind-mounting the
>   $CGROUP_MOUNT/batchjobs/c_job_id1/ directory onto
>   $CONTAINER_CHROOT/sys/fs/cgroup/, should provide a completely isolated
>   cgroup view inside the container.
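The bind mount described here might look like the following Python sketch, which calls mount(2) with `MS_BIND` through `ctypes`. The host mount point `/sys/fs/cgroup` and the container root `/srv/c1` are assumed example values, and `container_bind_paths`/`bind_mount` are hypothetical helpers:

```python
# Sketch of the bind mount described above, calling mount(2) through ctypes.
# Directory names are placeholders; MS_BIND is the standard value from
# <sys/mount.h>.  Actually mounting needs CAP_SYS_ADMIN.
import ctypes
import os
from typing import Tuple

MS_BIND = 4096

_libc = ctypes.CDLL(None, use_errno=True)
_libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                        ctypes.c_ulong, ctypes.c_void_p]
_libc.mount.restype = ctypes.c_int


def container_bind_paths(cgroup_mount: str, container_cgroup: str,
                         container_chroot: str) -> Tuple[str, str]:
    """Return (source, target) for the container's cgroupfs bind mount."""
    # Source: the container's own cgroup directory on the host's mount.
    src = os.path.join(cgroup_mount, container_cgroup.lstrip("/"))
    # Target: where tools inside the container expect cgroupfs to live.
    dst = os.path.join(container_chroot, "sys/fs/cgroup")
    return src, dst


def bind_mount(src: str, dst: str) -> None:
    if _libc.mount(src.encode(), dst.encode(), None, MS_BIND, None) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), dst)


# e.g. ("/sys/fs/cgroup/batchjobs/c_job_id1", "/srv/c1/sys/fs/cgroup")
src, dst = container_bind_paths("/sys/fs/cgroup", "/batchjobs/c_job_id1", "/srv/c1")
```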
> 
>   In its current simplistic form, cgroup namespaces provide the
>   following behavior:
> 
>   (1) The “root” cgroup for a cgroup namespace is the cgroup in which
>       the process calling unshare is running.
>       For example, if a process in the /batchjobs/c_job_id1 cgroup calls
>       unshare, cgroup /batchjobs/c_job_id1 becomes the cgroupns-root.
>       For the init_cgroup_ns, this is the real root (“/”) cgroup
>       (identified in code as cgrp_dfl_root.cgrp).
> 
>   (2) The cgroupns-root cgroup does not change even if the namespace
>       creator process later moves to a different cgroup.
>       $ ~/unshare -c # unshare cgroupns in some cgroup
>       [ns]$ cat /proc/self/cgroup 
>       0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/ 
>       [ns]$ mkdir sub_cgrp_1
>       [ns]$ echo 0 > sub_cgrp_1/cgroup.procs
>       [ns]$ cat /proc/self/cgroup 
>       0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/sub_cgrp_1
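Behaviors (1) and (2) amount to capturing the root once, at unshare time. The following is a toy user-space model in Python (not the patch's kernel code; the class and method names are made up) that replays the session above:

```python
# Toy user-space model (not kernel code) of behaviors (1) and (2).
import posixpath


class CgroupNamespace:
    def __init__(self, root: str) -> None:
        self.root = root  # captured once; never updated afterwards


class Task:
    def __init__(self, cgroup: str, ns: CgroupNamespace) -> None:
        self.cgroup = cgroup
        self.ns = ns

    def unshare_cgroupns(self) -> None:
        # (1) The new cgroupns-root is the caller's cgroup at this moment.
        self.ns = CgroupNamespace(self.cgroup)

    def proc_self_cgroup_path(self) -> str:
        # Path as shown in /proc/self/cgroup, relative to the cgroupns-root.
        # Assumes the task stays under its root, which (4) enforces.
        rel = posixpath.relpath(self.cgroup, self.ns.root)
        return "/" if rel == "." else "/" + rel


task = Task("/batchjobs/c_job_id1", CgroupNamespace("/"))
task.unshare_cgroupns()
print(task.proc_self_cgroup_path())  # /
task.cgroup = "/batchjobs/c_job_id1/sub_cgrp_1"  # echo 0 > sub_cgrp_1/cgroup.procs
print(task.proc_self_cgroup_path())  # /sub_cgrp_1
```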
> 
>   (3) Each process gets its cgroupns-specific view of
>       /proc/<pid>/cgroup.
>   (a) Processes running inside the cgroup namespace will be able to see
>       cgroup paths (in /proc/self/cgroup) only inside their root cgroup:
>       [ns]$ sleep 100000 &  # From within unshared cgroupns
>       [1] 7353
>       [ns]$ echo 7353 > sub_cgrp_1/cgroup.procs
>       [ns]$ cat /proc/7353/cgroup
>       0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/sub_cgrp_1
> 
>   (b) From global cgroupns, the real cgroup path will be visible:
>       $ cat /proc/7353/cgroup
>       0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/c_job_id1/sub_cgrp_1
> 
>   (c) From a sibling cgroupns, the real path will be visible:
>       [ns2]$ cat /proc/7353/cgroup
>       0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/c_job_id1/sub_cgrp_1
>       (In a correct container setup, though, it should not be possible
>        to access PIDs in another container in the first place. This can
>        be changed if desired.)
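Rules (a) to (c) can be summarized as a single rendering function. This toy Python model assumes the reader's cgroupns-root and the target task's real path are both known; `is_under` and `render_cgroup_path` are hypothetical names, and the component-wise prefix test keeps `/batchjobs/c_job_id10` from being treated as a child of `/batchjobs/c_job_id1`:

```python
# Toy model (not kernel code) of behavior (3): how a task's cgroup path is
# shown to a reader, depending on the reader's cgroupns-root.
def is_under(path: str, root: str) -> bool:
    """True if path is root itself or one of its descendants."""
    if root == "/":
        return True
    root = root.rstrip("/")
    return path == root or path.startswith(root + "/")


def render_cgroup_path(task_path: str, reader_root: str) -> str:
    if reader_root == "/" or not is_under(task_path, reader_root):
        # (b) global cgroupns, or (c) a cgroup outside the reader's root:
        # the real path is shown unchanged.
        return task_path
    # (a) inside the reader's root: strip the root prefix.
    rel = task_path[len(reader_root.rstrip("/")):]
    return rel or "/"


real = "/batchjobs/c_job_id1/sub_cgrp_1"
print(render_cgroup_path(real, "/batchjobs/c_job_id1"))  # /sub_cgrp_1
print(render_cgroup_path(real, "/"))                     # full path
print(render_cgroup_path(real, "/batchjobs/c_job_id2"))  # full path
```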
> 
>   (4) Processes inside a cgroupns are not allowed to move out of the
>       cgroupns-root. This is true even if a privileged process in global
>       cgroupns tries to move the process out of its cgroupns-root.
> 
>       # From global cgroupns
>       $ cat /proc/7353/cgroup
>       0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/c_job_id1/sub_cgrp_1
>       # cgroupns-root for 7353 is /batchjobs/c_job_id1
>       $ echo 7353 > batchjobs/c_job_id2/cgroup.procs
>       -bash: echo: write error: Operation not permitted
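Restriction (4) reduces to a containment check against the target task's cgroupns-root, independent of who performs the write. A toy Python model, where returning `-EPERM` mirrors the error shown above and the helper names are hypothetical:

```python
# Toy model (not kernel code) of restriction (4): writing a PID into
# <dest>/cgroup.procs fails if <dest> is outside that task's cgroupns-root,
# whatever the writer's privileges.
import errno


def is_under(path: str, root: str) -> bool:
    if root == "/":
        return True
    root = root.rstrip("/")
    return path == root or path.startswith(root + "/")


def check_migration(dest_cgroup: str, target_ns_root: str) -> int:
    """Return 0 if the move is allowed, otherwise -EPERM."""
    return 0 if is_under(dest_cgroup, target_ns_root) else -errno.EPERM


# PID 7353 lives in a cgroupns rooted at /batchjobs/c_job_id1.
print(check_migration("/batchjobs/c_job_id2", "/batchjobs/c_job_id1"))  # -1 (EPERM)
```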
> 
>   (5) setns() is not supported for cgroup namespace in the initial
>       version.

This, combined with the full-path reporting for peer ns cgroups, could make
for fun antics when attaching to an existing container (since we'd have
to unshare into a new ns cgroup with the same root as the container).
I understand you are implying this will be fixed soon, though.

>   (6) When some thread from a multi-threaded process unshares its
>       cgroup namespace, the new cgroupns gets applied to the entire
>       process (all the threads). This should be OK since the unified
>       hierarchy only allows process-level containerization, so all the
>       threads in the process will have the same cgroup. Both operations
>       (changing cgroups and unsharing namespaces) are protected under
>       threadgroup_lock(task).
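A toy Python model of (6), with a `threading.Lock` standing in for threadgroup_lock(task); the `Process` class is illustrative only and simply shows that the namespace root is stored once per process rather than per thread:

```python
# Toy model of behavior (6), not kernel code: the cgroupns root is kept per
# process, so an unshare issued from any one thread applies to all threads.
import threading


class Process:
    def __init__(self, cgroup: str, cgroupns_root: str) -> None:
        self.cgroup = cgroup                # one cgroup per process
        self.cgroupns_root = cgroupns_root  # shared by every thread
        self._threadgroup_lock = threading.Lock()  # stand-in for threadgroup_lock()

    def move_to(self, cgroup: str) -> None:
        with self._threadgroup_lock:
            self.cgroup = cgroup

    def unshare_cgroupns(self) -> None:
        # Serialized against move_to(), so the new root is a consistent
        # snapshot of the process's current cgroup.
        with self._threadgroup_lock:
            self.cgroupns_root = self.cgroup


proc = Process("/batchjobs/c_job_id1", "/")
worker = threading.Thread(target=proc.unshare_cgroupns)  # "some thread"
worker.start()
worker.join()
print(proc.cgroupns_root)  # /batchjobs/c_job_id1
```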
> 
>   (7) The cgroup namespace is alive as long as there is at least one
>       process inside it. When the last process exits, the cgroup
>       namespace is destroyed. The cgroupns-root and the actual cgroups
>       remain though.
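The lifetime rule in (7) is ordinary reference counting on the namespace object. A toy Python model with hypothetical names, showing that dropping the last reference leaves the hierarchy untouched:

```python
# Toy model of behavior (7): the namespace object is reference-counted by
# its member processes.  Freeing it does not touch the cgroup hierarchy.
from typing import Set


class CgroupNamespace:
    def __init__(self, root: str) -> None:
        self.root = root
        self.nr_procs = 0
        self.alive = True

    def get(self) -> None:       # a process enters (unshare, or fork inside)
        self.nr_procs += 1

    def put(self) -> None:       # a process exits
        self.nr_procs -= 1
        if self.nr_procs == 0:
            self.alive = False   # only the namespace goes away


hierarchy: Set[str] = {"/", "/batchjobs", "/batchjobs/c_job_id1"}
ns = CgroupNamespace("/batchjobs/c_job_id1")
for _ in range(2):
    ns.get()                     # two processes inside the cgroupns
for _ in range(2):
    ns.put()                     # ...both exit
print(ns.alive, ns.root in hierarchy)  # False True
```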
> 
> Implementation
>   The current patch-set is based on top of Tejun's cgroup tree (for-next
>   branch). It's fairly non-intrusive and provides the above-mentioned
>   features.
> 
> Possible extensions of CGROUPNS:
>   (1) The Documentation/cgroups/unified-hierarchy.txt mentions use of
>       capabilities to restrict cgroups to administrative users. CGroup
>       namespaces could be of help here. With cgroup namespaces, it might
>       be possible to delegate administration of sub-cgroups under a
>       cgroupns-root to the cgroupns owner.

That would be nice.

>   (2) Provide a cgroupns-specific cgroupfs mount, i.e., the following
>       command, when run from inside a cgroupns, should only mount the
>       hierarchy from the cgroupns-root cgroup:
>       $ mount -t cgroup cgroup <cgroup-mountpoint>
>       # -o __DEVEL__sane_behavior should be implicit
> 
>       This is similar to how procfs can be mounted for every PIDNS. This
>       may have some use cases.
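For comparison with the procfs analogy, a toy Python model of what such a per-cgroupns mount would expose: only the subtree under the cgroupns-root, re-rooted at `/`. `visible_hierarchy` is a hypothetical helper, not part of the patches:

```python
# Toy model (not kernel code) of extension (2): the directories a cgroupfs
# mount made inside a cgroupns would show, re-rooted at the cgroupns-root.
from typing import Iterable, List


def visible_hierarchy(all_cgroups: Iterable[str], ns_root: str) -> List[str]:
    if ns_root == "/":
        return sorted(all_cgroups)      # init cgroupns: everything
    root = ns_root.rstrip("/")
    visible = []
    for path in all_cgroups:
        if path == root:
            visible.append("/")         # the cgroupns-root becomes "/"
        elif path.startswith(root + "/"):
            visible.append(path[len(root):])
    return sorted(visible)


host = ["/", "/batchjobs", "/batchjobs/c_job_id1",
        "/batchjobs/c_job_id1/sub_cgrp_1", "/batchjobs/c_job_id2"]
print(visible_hierarchy(host, "/batchjobs/c_job_id1"))  # ['/', '/sub_cgrp_1']
```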

Sorry, I see this answers the first part of a question in my previous email.
However, the question remains of whether changes to limits in cgroups which
are not under our cgroup-ns-root are allowed.

Admittedly the current case with cgmanager is the same - in that it depends
on proper setup of the container - but cgmanager is geared to recommend
not mounting the cgroups in the container at all (and we can reject such
mounts in the container altogether with no loss in functionality) whereas
you are here encouraging such mounts.  Which is fine - so long as you then
fully address the potential issues.


Thread overview: 157+ messages
     [not found] <adityakali-cgroupns>
2014-07-17 19:52 ` [PATCH 0/5] RFC: CGroup Namespaces Aditya Kali
2014-07-17 19:52   ` [PATCH 1/5] kernfs: Add API to get generate relative kernfs path Aditya Kali
2014-07-24 15:10     ` Serge Hallyn
2014-07-17 19:52   ` [PATCH 2/5] sched: new clone flag CLONE_NEWCGROUP for cgroup namespace Aditya Kali
2014-07-24 17:01     ` Serge Hallyn
2014-07-31 19:48       ` Aditya Kali
2014-08-04 23:12         ` Serge Hallyn
2014-07-17 19:52   ` [PATCH 3/5] cgroup: add function to get task's cgroup on default hierarchy Aditya Kali
2014-07-24 16:59     ` Serge Hallyn
2014-07-17 19:52   ` [PATCH 4/5] cgroup: export cgroup_get() and cgroup_put() Aditya Kali
2014-07-24 17:03     ` Serge Hallyn
2014-07-17 19:52   ` [PATCH 5/5] cgroup: introduce cgroup namespaces Aditya Kali
2014-07-17 19:57     ` Andy Lutomirski
2014-07-17 20:55       ` Aditya Kali
2014-07-18 16:51         ` Andy Lutomirski
2014-07-18 18:51           ` Aditya Kali
2014-07-18 18:57             ` Andy Lutomirski
2014-07-21 22:11               ` Aditya Kali
2014-07-21 22:16                 ` Andy Lutomirski
2014-07-23 19:52                   ` Aditya Kali
2014-07-18 16:00   ` [PATCH 0/5] RFC: CGroup Namespaces Serge Hallyn
2014-07-24 16:10   ` Serge Hallyn
2014-07-24 16:36   ` Serge Hallyn [this message]
2014-07-25 19:29     ` Aditya Kali
2014-07-25 20:27       ` Andy Lutomirski
2014-07-29  4:51       ` Serge E. Hallyn
2014-07-29 15:08         ` Andy Lutomirski
2014-07-29 16:06           ` Serge E. Hallyn
2014-10-13 21:23 ` [PATCHv1 0/8] " Aditya Kali
2014-10-13 21:23   ` [PATCHv1 1/8] kernfs: Add API to generate relative kernfs path Aditya Kali
2014-10-16 16:07     ` Serge E. Hallyn
2014-10-13 21:23   ` [PATCHv1 2/8] sched: new clone flag CLONE_NEWCGROUP for cgroup namespace Aditya Kali
2014-10-16 16:08     ` Serge E. Hallyn
2014-10-13 21:23   ` [PATCHv1 3/8] cgroup: add function to get task's cgroup on default hierarchy Aditya Kali
2014-10-16 16:13     ` Serge E. Hallyn
2014-10-13 21:23   ` [PATCHv1 4/8] cgroup: export cgroup_get() and cgroup_put() Aditya Kali
2014-10-16 16:14     ` Serge E. Hallyn
2014-10-13 21:23   ` [PATCHv1 5/8] cgroup: introduce cgroup namespaces Aditya Kali
2014-10-16 16:37     ` Serge E. Hallyn
2014-10-24  1:03       ` Aditya Kali
2014-10-25  3:16         ` Serge E. Hallyn
2014-10-13 21:23   ` [PATCHv1 6/8] cgroup: restrict cgroup operations within task's cgroupns Aditya Kali
2014-10-17  9:28     ` Serge E. Hallyn
2014-10-22 19:06       ` Aditya Kali
2014-10-19  4:57     ` Eric W. Biederman
2014-10-13 21:23   ` [PATCHv1 7/8] cgroup: cgroup namespace setns support Aditya Kali
2014-10-16 21:12     ` Serge E. Hallyn
2014-10-16 21:17       ` Andy Lutomirski
2014-10-16 21:22       ` Aditya Kali
2014-10-16 21:47         ` Serge E. Hallyn
2014-10-19  5:23           ` Eric W. Biederman
2014-10-19 18:26             ` Andy Lutomirski
2014-10-20  4:55               ` Eric W.Biederman
2014-10-21  0:20                 ` Andy Lutomirski
2014-10-21  4:49                   ` Eric W. Biederman
2014-10-21  5:03                     ` Andy Lutomirski
2014-10-21  5:42                       ` Eric W. Biederman
2014-10-21  5:49                         ` Andy Lutomirski
2014-10-21 18:49                           ` Aditya Kali
2014-10-21 19:02                             ` Andy Lutomirski
2014-10-21 22:33                               ` Aditya Kali
2014-10-21 22:42                                 ` Andy Lutomirski
2014-10-22  0:46                                   ` Aditya Kali
2014-10-22  0:58                                     ` Andy Lutomirski
2014-10-22 18:37                                       ` Aditya Kali
2014-10-22 18:50                                         ` Andy Lutomirski
2014-10-22 19:42                                         ` Tejun Heo
2014-10-17  9:52     ` Serge E. Hallyn
2014-10-13 21:23   ` [PATCHv1 8/8] cgroup: mount cgroupns-root when inside non-init cgroupns Aditya Kali
2014-10-17 12:19     ` Serge E. Hallyn
2014-10-14 22:42   ` [PATCHv1 0/8] CGroup Namespaces Andy Lutomirski
2014-10-14 23:33     ` Aditya Kali
2014-10-19  4:54   ` Eric W. Biederman
2015-07-22 18:10     ` Vincent Batts
2014-10-31 19:18 ` [PATCHv2 0/7] " Aditya Kali
2014-10-31 19:18   ` [PATCHv2 1/7] kernfs: Add API to generate relative kernfs path Aditya Kali
2014-10-31 19:18   ` [PATCHv2 2/7] sched: new clone flag CLONE_NEWCGROUP for cgroup namespace Aditya Kali
2014-10-31 19:18   ` [PATCHv2 3/7] cgroup: add function to get task's cgroup on default hierarchy Aditya Kali
2014-10-31 19:18   ` [PATCHv2 4/7] cgroup: export cgroup_get() and cgroup_put() Aditya Kali
2014-10-31 19:18   ` [PATCHv2 5/7] cgroup: introduce cgroup namespaces Aditya Kali
2014-11-01  0:02     ` Andy Lutomirski
2014-11-01  0:58       ` Eric W. Biederman
2014-11-03 23:42         ` Aditya Kali
2014-11-03 23:40       ` Aditya Kali
2014-11-04  1:56     ` Aditya Kali
2014-10-31 19:19   ` [PATCHv2 6/7] cgroup: cgroup namespace setns support Aditya Kali
2014-10-31 19:19   ` [PATCHv2 7/7] cgroup: mount cgroupns-root when inside non-init cgroupns Aditya Kali
2014-11-01  0:07     ` Andy Lutomirski
2014-11-01  2:59       ` Eric W. Biederman
2014-11-01  3:29         ` Andy Lutomirski
2014-11-03 23:12       ` Aditya Kali
2014-11-03 23:15         ` Andy Lutomirski
2014-11-03 23:23           ` Aditya Kali
2014-11-03 23:48             ` Andy Lutomirski
2014-11-04  0:12               ` Aditya Kali
2014-11-04  0:17                 ` Andy Lutomirski
2014-11-04  0:49                   ` Aditya Kali
2014-11-04 13:57         ` Tejun Heo
2014-11-06 17:28           ` Aditya Kali
2014-11-01  1:09     ` Eric W. Biederman
2014-11-03 22:46       ` Aditya Kali
     [not found]       ` <CAGr1F2Hd_PS_AscBGMXdZC9qkHGRUp-MeQvJksDOQkRBB3RGoA@mail.gmail.com>
2014-11-03 22:56         ` Andy Lutomirski
2014-11-04 13:46         ` Tejun Heo
2014-11-04 15:00           ` Andy Lutomirski
2014-11-04 15:50             ` Serge E. Hallyn
2014-11-12 17:48               ` Aditya Kali
2014-11-04  1:59     ` Aditya Kali
2014-11-04 13:10   ` [PATCHv2 0/7] CGroup Namespaces Vivek Goyal
2014-11-06 17:33     ` Aditya Kali
2014-11-26 22:58       ` Richard Weinberger
2014-12-02 19:14         ` Aditya Kali
2014-12-05  1:55 ` [PATCHv3 0/8] " Aditya Kali
2014-12-05  1:55   ` [PATCHv3 1/8] kernfs: Add API to generate relative kernfs path Aditya Kali
2014-12-05  1:55   ` [PATCHv3 2/8] sched: new clone flag CLONE_NEWCGROUP for cgroup namespace Aditya Kali
2014-12-05  1:55   ` [PATCHv3 3/8] cgroup: add function to get task's cgroup on default hierarchy Aditya Kali
2014-12-05  1:55   ` [PATCHv3 4/8] cgroup: export cgroup_get() and cgroup_put() Aditya Kali
2014-12-05  1:55   ` [PATCHv3 5/8] cgroup: introduce cgroup namespaces Aditya Kali
2014-12-12  8:54     ` Zefan Li
2014-12-05  1:55   ` [PATCHv3 6/8] cgroup: cgroup namespace setns support Aditya Kali
2014-12-05  1:55   ` [PATCHv3 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns Aditya Kali
2014-12-12  8:55     ` Zefan Li
2014-12-05  1:55   ` [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces Aditya Kali
2014-12-12  8:54     ` Zefan Li
2015-01-05 22:54       ` Aditya Kali
2014-12-14 23:05     ` Richard Weinberger
2015-01-05 22:48       ` Aditya Kali
2015-01-05 22:52         ` Richard Weinberger
2015-01-05 23:53           ` Eric W. Biederman
2015-01-06  0:07             ` Richard Weinberger
2015-01-06  0:10             ` Aditya Kali
2015-01-06  0:17               ` Richard Weinberger
2015-01-06 23:20                 ` Aditya Kali
2015-01-06 23:39                   ` Richard Weinberger
2015-01-07  9:28                   ` Richard Weinberger
2015-01-07 14:45                     ` Eric W. Biederman
2015-01-07 19:30                       ` Serge E. Hallyn
2015-01-07 22:14                         ` Eric W. Biederman
2015-01-07 22:45                           ` Tejun Heo
2015-01-07 23:02                             ` Eric W. Biederman
2015-01-07 23:06                               ` Tejun Heo
2015-01-07 23:09                                 ` Eric W. Biederman
2015-01-07 23:16                                   ` Tejun Heo
2015-01-07 23:27                                   ` Eric W. Biederman
2015-01-07 23:35                                     ` Tejun Heo
2015-02-11  3:46                                       ` Serge E. Hallyn
2015-02-11  4:09                                         ` Tejun Heo
2015-02-11  4:29                                           ` Serge E. Hallyn
2015-02-11  5:02                                             ` Eric W. Biederman
2015-02-11  5:17                                               ` Tejun Heo
2015-02-11  6:29                                                 ` Eric W. Biederman
2015-02-11 14:36                                                   ` Tejun Heo
2015-02-11 16:00                                                 ` Serge E. Hallyn
2015-02-11 16:03                                                   ` Tejun Heo
2015-02-11 16:18                                                     ` Serge E. Hallyn
2015-02-11  5:10                                             ` Tejun Heo
2015-01-07 18:57                     ` Aditya Kali
2014-12-05  3:20   ` [PATCHv3 0/8] CGroup Namespaces Aditya Kali
