From: Andy Lutomirski <luto@amacapital.net>
To: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>, Mike Galbraith <umgwanakikbuti@gmail.com>,
    "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
    kernel-team@fb.com,
    "open list:CONTROL GROUP (CGROUP)" <cgroups@vger.kernel.org>,
    Andrew Morton <akpm@linux-foundation.org>, Paul Turner <pjt@google.com>,
    Li Zefan <lizefan@huawei.com>, Linux API <linux-api@vger.kernel.org>,
    Peter Zijlstra <peterz@infradead.org>,
    Johannes Weiner <hannes@cmpxchg.org>,
    Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [Documentation] State of CPU controller in cgroup v2
Date: Tue, 30 Aug 2016 20:42:20 -0700
Message-ID: <CALCETrUEygWrJbG25wSfG3zMG_+TNeP8+gAkcbh4_=ZNWHQCkw@mail.gmail.com>
In-Reply-To: <20160829222048.GH28713@mtj.duckdns.org>

On Mon, Aug 29, 2016 at 3:20 PM, Tejun Heo <tj@kernel.org> wrote:
>> > These base-system operations are special regardless of cgroup and we
>> > already have sometimes crude ways to affect their behaviors where
>> > necessary through sysctl knobs, priorities on specific kernel threads
>> > and so on. cgroup doesn't change the situation all that much. What
>> > gets left in the root cgroup usually are the base-system operations
>> > which are outside the scope of cgroup resource control in the first
>> > place and cgroup resource graph can treat the root as an opaque anchor
>> > point.
>>
>> This seems to explain why the controllers need to be able to handle
>> things being charged to the root cgroup (or to an unidentifiable
>> cgroup, anyway). That isn't quite the same thing as allowing, from an
>> ABI point of view, the root cgroup to contain processes and cgroups
>> but not allowing other cgroups to do the same thing. Consider:
>
> The points are 1. we need the root to be a special container anyway

But you don't need to let userspace see that.

> 2. allowing it to be special and contain system-wide consumptions
> doesn't make the resource graph inconsistent once all non-system-wide
> consumptions are put in non-root cgroups, and 3. this is the most
> natural way to handle the situation both from implementation and
> interface standpoints as it makes non-cgroup configuration a natural
> degenerate case of cgroup configuration.
>
>> suppose that systemd (or some competing cgroup manager) is designed to
>> run in the root cgroup namespace. It presumably expects *itself* to
>> be in the root cgroup. Now try to run it using cgroups v2 in a
>> non-root namespace. I don't see how it can possibly work if the
>> hierarchy constraints don't permit it to create sub-cgroups while it's
>> still in the root. In fact, this seems impossible to fix even with
>> user code changes. The manager would need to simultaneously create a
>> new child cgroup to contain itself and assign itself to that child
>> cgroup, because the intermediate state is illegal.
>
> Please re-read the constraint. It doesn't prevent any organizational
> operations before resource control is enabled.
>
>> I really, really think that cgroup v2 should supply the same
>> *interface* inside and outside of a non-root namespace. If this is
>
> It *does*. That's what I tried to explain, that it's exactly
> isomorphic once you discount the system-wide consumptions.
>

I don't think I agree.

Suppose I wrote an init program or a cgroup manager. I can expect
that init program to be started in the root cgroup. The program can
be lazy and write +io to /cgroup/cgroup.subtree_control and then
create some new cgroup /cgroup/a and it will work (I just tried it).

Now I run that program in a namespace. It will not work because it'll
get -EBUSY when it tries to write to cgroup.subtree_control. (I just
tried this, too, only using cd instead of a namespace.) So it's *not*
isomorphic.

It *also* won't work (I think) if subtree control is enabled on the
root, but I don't think this is a problem in practice because subtree
control won't be enabled on the namespace root by a sensible cgroup
manager.

--Andy
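The disagreement turns on the cgroup v2 "no internal processes" rule: a non-root cgroup may not both contain processes and enable controllers in its cgroup.subtree_control. Below is a minimal C model of that rule as this thread describes it. It is not kernel code; struct cg, cg_mkdir(), enable_io(), move_self(), lazy_manager() and careful_manager() are hypothetical names. It assumes, consistent with the -EBUSY Andy reports, that only the real system root (no parent) is exempt, so a cgroup that looks like the root from inside a namespace is still checked. The real kernel checks more conditions than this.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model of a cgroup v2 node (not kernel code).
 * It captures only the "no internal processes" rule discussed above.
 */
struct cg {
    struct cg *parent; /* NULL only for the real, system-wide root */
    int nr_procs;      /* processes that are members of this cgroup */
    bool io_enabled;   /* "+io" written to this cgroup's cgroup.subtree_control */
};

/* Model of mkdir(2): organizational operations are always allowed. */
static void cg_mkdir(struct cg *child, struct cg *parent)
{
    child->parent = parent;
    child->nr_procs = 0;
    child->io_enabled = false;
}

/* Model of writing "+io" to c/cgroup.subtree_control. */
static int enable_io(struct cg *c)
{
    /* Only the real root is exempt; a namespace root still has a parent. */
    if (c->parent != NULL && c->nr_procs > 0)
        return -EBUSY;
    c->io_enabled = true;
    return 0;
}

/* Model of a process writing its own PID to dst/cgroup.procs. */
static int move_self(struct cg *src, struct cg *dst)
{
    assert(src->nr_procs > 0); /* the mover must currently live in src */
    /* A non-root cgroup that distributes controllers may not gain members. */
    if (dst->parent != NULL && dst->io_enabled)
        return -EBUSY;
    src->nr_procs--;
    dst->nr_procs++;
    return 0;
}

/* The "lazy" manager: enable +io where it runs, then create child "a". */
static int lazy_manager(struct cg *self, struct cg *a)
{
    int ret = enable_io(self);
    if (ret)
        return ret;
    cg_mkdir(a, self);
    return 0;
}

/* A manager that first moves itself into a leaf, then enables +io. */
static int careful_manager(struct cg *self, struct cg *leaf, struct cg *a)
{
    cg_mkdir(leaf, self);
    cg_mkdir(a, self);
    int ret = move_self(self, leaf);
    if (ret)
        return ret;
    return enable_io(self);
}
```

In this model lazy_manager() succeeds when self has no parent but returns -EBUSY for a cgroup that has a parent and a member process, which is the asymmetry Andy describes. careful_manager() succeeds in both cases because it creates a leaf and moves itself there before enabling +io, the kind of ordering Tejun's remark about organizational operations allows. move_self() into a non-root cgroup that already has +io enabled also returns -EBUSY, matching Andy's second "(I think)" case.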