linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@amacapital.net>
To: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	kernel-team@fb.com,
	"open list:CONTROL GROUP (CGROUP)" <cgroups@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Paul Turner <pjt@google.com>, Li Zefan <lizefan@huawei.com>,
	Linux API <linux-api@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [Documentation] State of CPU controller in cgroup v2
Date: Tue, 30 Aug 2016 20:42:20 -0700	[thread overview]
Message-ID: <CALCETrUEygWrJbG25wSfG3zMG_+TNeP8+gAkcbh4_=ZNWHQCkw@mail.gmail.com> (raw)
In-Reply-To: <20160829222048.GH28713@mtj.duckdns.org>

On Mon, Aug 29, 2016 at 3:20 PM, Tejun Heo <tj@kernel.org> wrote:
>> > These base-system operations are special regardless of cgroup and we
>> > already have sometimes crude ways to affect their behaviors where
>> > necessary through sysctl knobs, priorities on specific kernel threads
>> > and so on.  cgroup doesn't change the situation all that much.  What
>> > gets left in the root cgroup usually are the base-system operations
>> > which are outside the scope of cgroup resource control in the first
>> > place and cgroup resource graph can treat the root as an opaque anchor
>> > point.
>>
>> This seems to explain why the controllers need to be able to handle
>> things being charged to the root cgroup (or to an unidentifiable
>> cgroup, anyway).  That isn't quite the same thing as allowing, from an
>> ABI point of view, the root cgroup to contain processes and cgroups
>> but not allowing other cgroups to do the same thing.  Consider:
>
> The points are 1. we need the root to be a special container anyway

But you don't need to let userspace see that.

> 2. allowing it to be special and contain system-wide consumptions
> doesn't make the resource graph inconsistent once all non-system-wide
> consumptions are put in non-root cgroups, and 3. this is the most
> natural way to handle the situation both from implementation and
> interface standpoints as it makes non-cgroup configuration a natural
> degenerate case of cgroup configuration.
>
>> suppose that systemd (or some competing cgroup manager) is designed to
>> run in the root cgroup namespace.  It presumably expects *itself* to
>> be in the root cgroup.  Now try to run it using cgroups v2 in a
>> non-root namespace.  I don't see how it can possibly work if it the
>> hierarchy constraints don't permit it to create sub-cgroups while it's
>> still in the root.  In fact, this seems impossible to fix even with
>> user code changes.  The manager would need to simultaneously create a
>> new child cgroup to contain itself and assign itself to that child
>> cgroup, because the intermediate state is illegal.
>
> Please re-read the constraint.  It doesn't prevent any organizational
> operations before resource control is enabled.
>
>> I really, really think that cgroup v2 should supply the same
>> *interface* inside and outside of a non-root namespace.  If this is
>
> It *does*.  That's what I tried to explain, that it's exactly
> isomorhpic once you discount the system-wide consumptions.
>

I don't think I agree.

Suppose I wrote an init program or a cgroup manager.  I can expect
that init program to be started in the root cgroup.  The program can
be lazy and write +io to /cgroup/cgroup.subtree_control and then
create some new cgroup /cgroup/a and it will work (I just tried it).

Now I run that program in a namespace.  It will not work because it'll
get -EBUSY when it tries to write to cgroup.subtree_control.  (I just
tried this, too, only using cd instead of a namespace.)  So it's *not*
isomorphic.

It *also* won't work (I think) if subtree control is enabled on the
root, but I don't think this is a problem in practice because subtree
control won't be enabled on the namespace root by a sensible cgroup
manager.

--Andy

  reply	other threads:[~2016-08-31  3:42 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-05 17:07 [Documentation] State of CPU controller in cgroup v2 Tejun Heo
2016-08-05 17:09 ` [PATCH 1/2] sched: Misc preps for cgroup unified hierarchy interface Tejun Heo
2016-08-05 17:09 ` [PATCH 2/2] sched: Implement interface for cgroup unified hierarchy Tejun Heo
2016-08-06  9:04 ` [Documentation] State of CPU controller in cgroup v2 Mike Galbraith
2016-08-10 22:09   ` Johannes Weiner
2016-08-11  6:25     ` Mike Galbraith
2016-08-12 22:17       ` Johannes Weiner
2016-08-13  5:08         ` Mike Galbraith
2016-08-16 14:07     ` Peter Zijlstra
2016-08-16 14:58       ` Chris Mason
2016-08-16 16:30       ` Johannes Weiner
2016-08-17  9:33         ` Mike Galbraith
2016-08-16 21:59       ` Tejun Heo
2016-08-17 20:18 ` Andy Lutomirski
2016-08-20 15:56   ` Tejun Heo
2016-08-20 18:45     ` Andy Lutomirski
2016-08-29 22:20       ` Tejun Heo
2016-08-31  3:42         ` Andy Lutomirski [this message]
2016-08-31 17:32           ` Tejun Heo
2016-08-31 19:11             ` Andy Lutomirski
2016-08-31 21:07               ` Tejun Heo
2016-08-31 21:46                 ` Andy Lutomirski
2016-09-03 22:05                   ` Tejun Heo
2016-09-05 17:37                     ` Andy Lutomirski
2016-09-06 10:29                       ` Peter Zijlstra
2016-10-04 14:47                         ` Tejun Heo
2016-10-05  8:07                           ` Peter Zijlstra
2016-09-09 22:57                       ` Tejun Heo
2016-09-10  8:54                         ` Mike Galbraith
2016-09-10 10:08                         ` Mike Galbraith
2016-09-30  9:06                           ` Tejun Heo
2016-09-30 14:53                             ` Mike Galbraith
2016-09-12 15:20                         ` Austin S. Hemmelgarn
2016-09-19 21:34                           ` Tejun Heo
     [not found]                         ` <CALCETrUhpPQdyZ-6WRjdB+iLbpGBduRZMWXQtCuS+R7Cq7rygg@mail.gmail.com>
2016-09-14 20:00                           ` Tejun Heo
2016-09-15 20:08                             ` Andy Lutomirski
2016-09-16  7:51                               ` Peter Zijlstra
2016-09-16 15:12                                 ` Andy Lutomirski
2016-09-16 16:19                                   ` Peter Zijlstra
2016-09-16 16:29                                     ` Andy Lutomirski
2016-09-16 16:50                                       ` Peter Zijlstra
2016-09-16 18:19                                         ` Andy Lutomirski
2016-09-17  1:47                                           ` Peter Zijlstra
2016-09-19 21:53                               ` Tejun Heo
2016-08-31 19:57         ` Andy Lutomirski
2016-08-22 10:12     ` Mike Galbraith
2016-08-21  5:34   ` James Bottomley
2016-08-29 22:35     ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALCETrUEygWrJbG25wSfG3zMG_+TNeP8+gAkcbh4_=ZNWHQCkw@mail.gmail.com' \
    --to=luto@amacapital.net \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=umgwanakikbuti@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).