All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@amacapital.net>
To: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	kernel-team@fb.com,
	"open list:CONTROL GROUP (CGROUP)" <cgroups@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Paul Turner <pjt@google.com>, Li Zefan <lizefan@huawei.com>,
	Linux API <linux-api@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [Documentation] State of CPU controller in cgroup v2
Date: Tue, 30 Aug 2016 20:42:20 -0700	[thread overview]
Message-ID: <CALCETrUEygWrJbG25wSfG3zMG_+TNeP8+gAkcbh4_=ZNWHQCkw@mail.gmail.com> (raw)
In-Reply-To: <20160829222048.GH28713@mtj.duckdns.org>

On Mon, Aug 29, 2016 at 3:20 PM, Tejun Heo <tj@kernel.org> wrote:
>> > These base-system operations are special regardless of cgroup and we
>> > already have sometimes crude ways to affect their behaviors where
>> > necessary through sysctl knobs, priorities on specific kernel threads
>> > and so on.  cgroup doesn't change the situation all that much.  What
>> > gets left in the root cgroup usually are the base-system operations
>> > which are outside the scope of cgroup resource control in the first
>> > place and cgroup resource graph can treat the root as an opaque anchor
>> > point.
>>
>> This seems to explain why the controllers need to be able to handle
>> things being charged to the root cgroup (or to an unidentifiable
>> cgroup, anyway).  That isn't quite the same thing as allowing, from an
>> ABI point of view, the root cgroup to contain processes and cgroups
>> but not allowing other cgroups to do the same thing.  Consider:
>
> The points are 1. we need the root to be a special container anyway

But you don't need to let userspace see that.

> 2. allowing it to be special and contain system-wide consumptions
> doesn't make the resource graph inconsistent once all non-system-wide
> consumptions are put in non-root cgroups, and 3. this is the most
> natural way to handle the situation both from implementation and
> interface standpoints as it makes non-cgroup configuration a natural
> degenerate case of cgroup configuration.
>
>> suppose that systemd (or some competing cgroup manager) is designed to
>> run in the root cgroup namespace.  It presumably expects *itself* to
>> be in the root cgroup.  Now try to run it using cgroups v2 in a
>> non-root namespace.  I don't see how it can possibly work if it the
>> hierarchy constraints don't permit it to create sub-cgroups while it's
>> still in the root.  In fact, this seems impossible to fix even with
>> user code changes.  The manager would need to simultaneously create a
>> new child cgroup to contain itself and assign itself to that child
>> cgroup, because the intermediate state is illegal.
>
> Please re-read the constraint.  It doesn't prevent any organizational
> operations before resource control is enabled.
>
>> I really, really think that cgroup v2 should supply the same
>> *interface* inside and outside of a non-root namespace.  If this is
>
> It *does*.  That's what I tried to explain, that it's exactly
> isomorhpic once you discount the system-wide consumptions.
>

I don't think I agree.

Suppose I wrote an init program or a cgroup manager.  I can expect
that init program to be started in the root cgroup.  The program can
be lazy and write +io to /cgroup/cgroup.subtree_control and then
create some new cgroup /cgroup/a and it will work (I just tried it).

Now I run that program in a namespace.  It will not work because it'll
get -EBUSY when it tries to write to cgroup.subtree_control.  (I just
tried this, too, only using cd instead of a namespace.)  So it's *not*
isomorphic.

It *also* won't work (I think) if subtree control is enabled on the
root, but I don't think this is a problem in practice because subtree
control won't be enabled on the namespace root by a sensible cgroup
manager.

--Andy

WARNING: multiple messages have this Message-ID (diff)
From: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
To: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Mike Galbraith
	<umgwanakikbuti-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	kernel-team-b10kYP2dOMg@public.gmane.org,
	"open list:CONTROL GROUP (CGROUP)"
	<cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Paul Turner <pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>,
	Linux API <linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	Linus Torvalds
	<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Subject: Re: [Documentation] State of CPU controller in cgroup v2
Date: Tue, 30 Aug 2016 20:42:20 -0700	[thread overview]
Message-ID: <CALCETrUEygWrJbG25wSfG3zMG_+TNeP8+gAkcbh4_=ZNWHQCkw@mail.gmail.com> (raw)
In-Reply-To: <20160829222048.GH28713-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>

On Mon, Aug 29, 2016 at 3:20 PM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>> > These base-system operations are special regardless of cgroup and we
>> > already have sometimes crude ways to affect their behaviors where
>> > necessary through sysctl knobs, priorities on specific kernel threads
>> > and so on.  cgroup doesn't change the situation all that much.  What
>> > gets left in the root cgroup usually are the base-system operations
>> > which are outside the scope of cgroup resource control in the first
>> > place and cgroup resource graph can treat the root as an opaque anchor
>> > point.
>>
>> This seems to explain why the controllers need to be able to handle
>> things being charged to the root cgroup (or to an unidentifiable
>> cgroup, anyway).  That isn't quite the same thing as allowing, from an
>> ABI point of view, the root cgroup to contain processes and cgroups
>> but not allowing other cgroups to do the same thing.  Consider:
>
> The points are 1. we need the root to be a special container anyway

But you don't need to let userspace see that.

> 2. allowing it to be special and contain system-wide consumptions
> doesn't make the resource graph inconsistent once all non-system-wide
> consumptions are put in non-root cgroups, and 3. this is the most
> natural way to handle the situation both from implementation and
> interface standpoints as it makes non-cgroup configuration a natural
> degenerate case of cgroup configuration.
>
>> suppose that systemd (or some competing cgroup manager) is designed to
>> run in the root cgroup namespace.  It presumably expects *itself* to
>> be in the root cgroup.  Now try to run it using cgroups v2 in a
>> non-root namespace.  I don't see how it can possibly work if it the
>> hierarchy constraints don't permit it to create sub-cgroups while it's
>> still in the root.  In fact, this seems impossible to fix even with
>> user code changes.  The manager would need to simultaneously create a
>> new child cgroup to contain itself and assign itself to that child
>> cgroup, because the intermediate state is illegal.
>
> Please re-read the constraint.  It doesn't prevent any organizational
> operations before resource control is enabled.
>
>> I really, really think that cgroup v2 should supply the same
>> *interface* inside and outside of a non-root namespace.  If this is
>
> It *does*.  That's what I tried to explain, that it's exactly
> isomorhpic once you discount the system-wide consumptions.
>

I don't think I agree.

Suppose I wrote an init program or a cgroup manager.  I can expect
that init program to be started in the root cgroup.  The program can
be lazy and write +io to /cgroup/cgroup.subtree_control and then
create some new cgroup /cgroup/a and it will work (I just tried it).

Now I run that program in a namespace.  It will not work because it'll
get -EBUSY when it tries to write to cgroup.subtree_control.  (I just
tried this, too, only using cd instead of a namespace.)  So it's *not*
isomorphic.

It *also* won't work (I think) if subtree control is enabled on the
root, but I don't think this is a problem in practice because subtree
control won't be enabled on the namespace root by a sensible cgroup
manager.

--Andy

  reply	other threads:[~2016-08-31  3:42 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-05 17:07 [Documentation] State of CPU controller in cgroup v2 Tejun Heo
2016-08-05 17:07 ` Tejun Heo
2016-08-05 17:09 ` [PATCH 1/2] sched: Misc preps for cgroup unified hierarchy interface Tejun Heo
2016-08-05 17:09   ` Tejun Heo
2016-08-05 17:09 ` [PATCH 2/2] sched: Implement interface for cgroup unified hierarchy Tejun Heo
2016-08-05 17:09   ` Tejun Heo
2016-08-06  9:04 ` [Documentation] State of CPU controller in cgroup v2 Mike Galbraith
2016-08-06  9:04   ` Mike Galbraith
2016-08-10 22:09   ` Johannes Weiner
2016-08-10 22:09     ` Johannes Weiner
2016-08-11  6:25     ` Mike Galbraith
2016-08-11  6:25       ` Mike Galbraith
2016-08-12 22:17       ` Johannes Weiner
2016-08-12 22:17         ` Johannes Weiner
2016-08-13  5:08         ` Mike Galbraith
2016-08-13  5:08           ` Mike Galbraith
2016-08-16 14:07     ` Peter Zijlstra
2016-08-16 14:07       ` Peter Zijlstra
2016-08-16 14:58       ` Chris Mason
2016-08-16 14:58         ` Chris Mason
2016-08-16 16:30       ` Johannes Weiner
2016-08-16 16:30         ` Johannes Weiner
2016-08-17  9:33         ` Mike Galbraith
2016-08-16 21:59       ` Tejun Heo
2016-08-16 21:59         ` Tejun Heo
2016-08-17 20:18 ` Andy Lutomirski
2016-08-20 15:56   ` Tejun Heo
2016-08-20 15:56     ` Tejun Heo
2016-08-20 18:45     ` Andy Lutomirski
2016-08-29 22:20       ` Tejun Heo
2016-08-29 22:20         ` Tejun Heo
2016-08-31  3:42         ` Andy Lutomirski [this message]
2016-08-31  3:42           ` Andy Lutomirski
2016-08-31 17:32           ` Tejun Heo
2016-08-31 19:11             ` Andy Lutomirski
2016-08-31 19:11               ` Andy Lutomirski
2016-08-31 21:07               ` Tejun Heo
2016-08-31 21:07                 ` Tejun Heo
2016-08-31 21:46                 ` Andy Lutomirski
2016-09-03 22:05                   ` Tejun Heo
2016-09-03 22:05                     ` Tejun Heo
2016-09-05 17:37                     ` Andy Lutomirski
2016-09-06 10:29                       ` Peter Zijlstra
2016-09-06 10:29                         ` Peter Zijlstra
2016-10-04 14:47                         ` Tejun Heo
2016-10-05  8:07                           ` Peter Zijlstra
2016-10-05  8:07                             ` Peter Zijlstra
2016-09-09 22:57                       ` Tejun Heo
2016-09-10  8:54                         ` Mike Galbraith
2016-09-10  8:54                           ` Mike Galbraith
2016-09-10 10:08                         ` Mike Galbraith
2016-09-10 10:08                           ` Mike Galbraith
2016-09-30  9:06                           ` Tejun Heo
2016-09-30  9:06                             ` Tejun Heo
2016-09-30 14:53                             ` Mike Galbraith
2016-09-30 14:53                               ` Mike Galbraith
2016-09-12 15:20                         ` Austin S. Hemmelgarn
2016-09-12 15:20                           ` Austin S. Hemmelgarn
2016-09-19 21:34                           ` Tejun Heo
2016-09-19 21:34                             ` Tejun Heo
     [not found]                         ` <CALCETrUhpPQdyZ-6WRjdB+iLbpGBduRZMWXQtCuS+R7Cq7rygg@mail.gmail.com>
2016-09-14 20:00                           ` Tejun Heo
2016-09-15 20:08                             ` Andy Lutomirski
2016-09-15 20:08                               ` Andy Lutomirski
2016-09-16  7:51                               ` Peter Zijlstra
2016-09-16  7:51                                 ` Peter Zijlstra
2016-09-16 15:12                                 ` Andy Lutomirski
2016-09-16 15:12                                   ` Andy Lutomirski
2016-09-16 16:19                                   ` Peter Zijlstra
2016-09-16 16:19                                     ` Peter Zijlstra
2016-09-16 16:29                                     ` Andy Lutomirski
2016-09-16 16:29                                       ` Andy Lutomirski
2016-09-16 16:50                                       ` Peter Zijlstra
2016-09-16 16:50                                         ` Peter Zijlstra
2016-09-16 18:19                                         ` Andy Lutomirski
2016-09-16 18:19                                           ` Andy Lutomirski
2016-09-17  1:47                                           ` Peter Zijlstra
2016-09-17  1:47                                             ` Peter Zijlstra
2016-09-19 21:53                               ` Tejun Heo
2016-09-19 21:53                                 ` Tejun Heo
2016-08-31 19:57         ` Andy Lutomirski
2016-08-31 19:57           ` Andy Lutomirski
2016-08-22 10:12     ` Mike Galbraith
2016-08-22 10:12       ` Mike Galbraith
2016-08-21  5:34   ` James Bottomley
2016-08-21  5:34     ` James Bottomley
2016-08-29 22:35     ` Tejun Heo
2016-08-29 22:35       ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALCETrUEygWrJbG25wSfG3zMG_+TNeP8+gAkcbh4_=ZNWHQCkw@mail.gmail.com' \
    --to=luto@amacapital.net \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=umgwanakikbuti@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.