linux-kernel.vger.kernel.org archive mirror
* [RFD] cgroup: about multiple hierarchies
@ 2012-02-21 21:19 Tejun Heo
  2012-02-21 21:21 ` Tejun Heo
                   ` (7 more replies)
  0 siblings, 8 replies; 84+ messages in thread
From: Tejun Heo @ 2012-02-21 21:19 UTC (permalink / raw)
  To: Li Zefan, containers, cgroups
  Cc: Andrew Morton, Kay Sievers, Lennart Poettering,
	Frederic Weisbecker, linux-kernel

Hello, guys.

I've been thinking about multiple hierarchy support in cgroup for a
while, especially after Frederic's pending task counter patchset.
This is a write up of what I've been thinking.  I don't know what to
do yet and simply continuing the current situation definitely is an
option, so please read on and throw in your 20 Won (or whatever amount
in whatever currency you want).

* The problems.

The support for multiple process hierarchies always struck me as
rather strange.  If you forget about the current cgroup controllers
and their implementations, the *only* reason to support multiple
hierarchies is if you want to apply resource limits based on different
orthogonal categorizations.

Documentation/cgroups.txt seems to be written with this consideration
in mind.  It's giving an example of applying limits according to two
orthogonal categorizations - user groups (professors, students...)
and applications (WWW, NFS...).  While it may sound like a valid use
case, I'm very skeptical how useful or common mixing such orthogonal
categorizations in a single setup would be.

If support for multiple hierarchies comes for free, at least in terms
of features, maybe it can be better but of course it isn't so.  Any
given cgroup subsystem (or controller) can only be applied to a single
hierarchy, which makes sense for a lot of things - what would two
different limits on the same resource from different hierarchies mean?
But, there also are things which can be used and useful in all
hierarchies - e.g. cgroup freezer and task counter.
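
As a rough illustration of the one-hierarchy-per-controller rule described
above, the sketch below parses text laid out like /proc/cgroups (columns:
subsys_name, hierarchy, num_cgroups, enabled) and groups controllers by
hierarchy ID.  The sample text and the function name are made up for
illustration; real output depends on the kernel and on how things are
mounted.

```python
from collections import defaultdict

# Made-up sample in the /proc/cgroups column layout; not from a real system.
SAMPLE_PROC_CGROUPS = """\
#subsys_name\thierarchy\tnum_cgroups\tenabled
cpuset\t1\t4\t1
cpu\t2\t10\t1
cpuacct\t2\t10\t1
memory\t3\t7\t1
freezer\t4\t2\t1
blkio\t0\t1\t1
"""


def controllers_by_hierarchy(text):
    """Map hierarchy ID -> sorted list of controllers attached to it.

    Hierarchy ID 0 is treated here as "not currently mounted" and skipped.
    """
    result = defaultdict(list)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header line
        name, hierarchy, _num_cgroups, _enabled = line.split()
        hierarchy_id = int(hierarchy)
        if hierarchy_id == 0:
            continue
        result[hierarchy_id].append(name)
    return {hid: sorted(names) for hid, names in result.items()}


grouped = controllers_by_hierarchy(SAMPLE_PROC_CGROUPS)
for hid in sorted(grouped):
    print(hid, ",".join(grouped[hid]))
```

Each controller shows up under exactly one hierarchy ID, which is why a
utility controller such as freezer cannot simply be reused in every tree.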

While the current cgroup implementation and conventions can probably
allow admins and engineers to tailor cgroup configuration for a
specific setup, it is very difficult to use in generic and automated
way.  I mean, who owns the freezer or task counter?  If they're
mounted on their own hierarchies, how should they be structured?
Should the different hierarchies be structured such that they are
projections of one unified hierarchy so that those generic mechanisms
can be applied uniformly?  If so, why do we need multiple hierarchies
at all?

A related limitation is that as different subsystems don't know which
hierarchies they'll end up on, they can't cooperate.  Wouldn't it make
more sense if task counter is a separate thing watching the resources
and triggers different actions as configured - be it failing forks or
freezing?

And yet another oddity is how cgroup handles nested cgroups - some
care about nesting but others just treat both internal and leaf nodes
equally.  They don't care about the topology at all.  This, too, can
be fine if you approach things subsys by subsys and use them in
different ways but if you try to combine them in generic way you get
sucked into the lala land of whatevers.

The following is a "best practices" document on using cgroups.

  http://www.freedesktop.org/wiki/Software/systemd/PaxControlGroups

To me, it seems to demonstrate the rather ugly situation that the
current cgroup is providing.  Everyone should tip-toe around cgroup
hierarchies and nobody has full knowledge or control over them.
e.g. base system management (e.g. systemd) can't use freezer or task
counter as someone else might want to use it for different hierarchy
layout.

It seems to me that cgroup interface is too complicated and inflexible
at the same time to be useful in generic manner.  Sure, it can be
useful for setups individually crafted by engineers and admins to
match specific sites or applications but as soon as you try to do
something automatic and generic with it, there just are too many
different scenarios and limitations to consider.


* So, what to do?

Heh, I don't know.  IIRC, last year at LinuxCon Japan, I heard
Christoph saying that the biggest problem w/ cgroup was that it was
building completely separate hierarchies out of the traditional
process hierarchies.  After thinking about this stuff for a while, I
fully agree with him.  I think this whole thing should have been a
layer over the process tree like sessions or program groups.

Unfortunately, that ship sailed long ago and we gotta make do with
what we have on our collective hands.  Here are some paths that we can
take.

1. We're screwed anyway.  Just don't worry about it and continue down
   on this path.  Can't get much worse, right?

   This approach has the apparent advantage of not having to do
   anything and is probably most likely to be taken.  This isn't ideal
   but hey nothing is. :P

2. Make it more flexible (and likely more complex, unfortunately).
   Allow the utility type subsystems to be used in multiple
   hierarchies.  The easiest and probably dirtiest way to achieve that
   would be embedding them into cgroup core.

   Thinking about doing this depresses me and it's not like I have a
   cheerful personality to begin with. :(

3. Head towards single hierarchy with the pie-in-the-sky goal of
   merging things into process hierarchy in some distant future.

   The first step would be herding people to use a unified hierarchy
   (ie. all subsystems mounted on a single cgroup tree) which is
   controlled by single entity in userland (be it systemd or cgroupd,
   cgroup-kit or whatever); however, even if we exclude supporting
   orthogonal categorizations, there are good number of non-trivial
   hurdles to clear before this can be realized.

   Most importantly, we would need to clean up how nesting is handled
   across different subsystems.  Handling internal and leaf nodes as
   equals simply can't work.  Membership should be recursive, and for
   subsystems which can't support proper nesting, the right thing to
   do would be somehow ensuring that only single node in the path from
   root to leaf is active for the controller.  We may even have to
   introduce an alternative of operation to support this (yuck).

   This path would require the most amount of work and we would be
   excluding a feature - support for multiple orthogonal
   categorizations - which has been available till now, probably
   through deprecation process spanning years; however, this at least
   gives us hope that we may reach sanity in the end, how distant that
   end may be.  Oh, hope. :)
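
To make the "only a single node in the path from root to leaf is active"
constraint from option 3 a bit more concrete, here is a toy model.  The
Node class and the checker are hypothetical and only illustrate the rule;
nothing here corresponds to an existing kernel or userland interface.

```python
class Node:
    """Toy cgroup node: a name, children, and whether the controller is active here."""

    def __init__(self, name, active=False, children=()):
        self.name = name
        self.active = active
        self.children = list(children)


def paths_with_multiple_active(node, active_above=0, path=()):
    """Return root-to-leaf paths on which more than one node is active."""
    path = path + (node.name,)
    active_here = active_above + (1 if node.active else 0)
    if not node.children:
        # Leaf: the path violates the rule if it crossed two or more active nodes.
        return ["/".join(path)] if active_here > 1 else []
    bad = []
    for child in node.children:
        bad.extend(paths_with_multiple_active(child, active_here, path))
    return bad


tree = Node("root", children=[
    Node("a", active=True, children=[Node("a1"), Node("a2", active=True)]),
    Node("b", children=[Node("b1", active=True)]),
])
print(paths_with_multiple_active(tree))
```

In this made-up tree only root/a/a2 breaks the rule, since both "a" and
"a2" are active on that path.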

So, I mean, I don't know.  What do other people think?  Is this an
unnecessary worry?  Are people generally happy with the way things
are?  Lennart, Kay, what do you guys think?

Thanks.

--
tejun


* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-21 21:19 [RFD] cgroup: about multiple hierarchies Tejun Heo
@ 2012-02-21 21:21 ` Tejun Heo
  2012-02-22 13:34   ` Glauber Costa
  2012-02-26  4:59   ` Konstantin Khlebnikov
  2012-02-22 13:30 ` Peter Zijlstra
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 84+ messages in thread
From: Tejun Heo @ 2012-02-21 21:21 UTC (permalink / raw)
  To: Li Zefan, containers, cgroups
  Cc: Andrew Morton, Kay Sievers, Lennart Poettering,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

Sorry, forgot to cc hch.  Cc'ing him and quoting whole message.

On Tue, Feb 21, 2012 at 01:19:38PM -0800, Tejun Heo wrote:
> Hello, guys.
> 
> I've been thinking about multiple hierarchy support in cgroup for a
> while, especially after Frederic's pending task counter patchset.
> This is a write up of what I've been thinking.  I don't know what to
> do yet and simply continuing the current situation definitely is an
> option, so please read on and throw in your 20 Won (or whatever amount
> in whatever currency you want).
> 
> * The problems.
> 
> The support for multiple process hierarchies always struck me as
> rather strange.  If you forget about the current cgroup controllers
> and their implementations, the *only* reason to support multiple
> hierarchies is if you want to apply resource limits based on different
> orthogonal categorizations.
> 
> Documentation/cgroups.txt seems to be written with this consideration
> in mind.  It's giving an example of applying limits according to two
> orthogonal categorizations - user groups (professors, students...)
> and applications (WWW, NFS...).  While it may sound like a valid use
> case, I'm very skeptical how useful or common mixing such orthogonal
> categorizations in a single setup would be.
> 
> If support for multiple hierarchies comes for free, at least in terms
> of features, maybe it can be better but of course it isn't so.  Any
> given cgroup subsystem (or controller) can only be applied to a single
> hierarchy, which makes sense for a lot of things - what would two
> different limits on the same resource from different hierarchies mean?
> But, there also are things which can be used and useful in all
> hierarchies - e.g. cgroup freezer and task counter.
> 
> While the current cgroup implementation and conventions can probably
> allow admins and engineers to tailor cgroup configuration for a
> specific setup, it is very difficult to use in generic and automated
> way.  I mean, who owns the freezer or task counter?  If they're
> mounted on their own hierarchies, how should they be structured?
> Should the different hierarchies be structured such that they are
> projections of one unified hierarchy so that those generic mechanisms
> can be applied uniformly?  If so, why do we need multiple hierarchies
> at all?
> 
> A related limitation is that as different subsystems don't know which
> hierarchies they'll end up on, they can't cooperate.  Wouldn't it make
> more sense if task counter is a separate thing watching the resources
> and triggers different actions as configured - be it failing forks or
> freezing?
> 
> And yet another oddity is how cgroup handles nested cgroups - some
> care about nesting but others just treat both internal and leaf nodes
> equally.  They don't care about the topology at all.  This, too, can
> be fine if you approach things subsys by subsys and use them in
> different ways but if you try to combine them in generic way you get
> sucked into the lala land of whatevers.
> 
> The following is a "best practices" document on using cgroups.
> 
>   http://www.freedesktop.org/wiki/Software/systemd/PaxControlGroups
> 
> To me, it seems to demonstrate the rather ugly situation that the
> current cgroup is providing.  Everyone should tip-toe around cgroup
> hierarchies and nobody has full knowledge or control over them.
> e.g. base system management (e.g. systemd) can't use freezer or task
> counter as someone else might want to use it for different hierarchy
> layout.
> 
> It seems to me that cgroup interface is too complicated and inflexible
> at the same time to be useful in generic manner.  Sure, it can be
> useful for setups individually crafted by engineers and admins to
> match specific sites or applications but as soon as you try to do
> something automatic and generic with it, there just are too many
> different scenarios and limitations to consider.
> 
> 
> * So, what to do?
> 
> Heh, I don't know.  IIRC, last year at LinuxCon Japan, I heard
> Christoph saying that the biggest problem w/ cgroup was that it was
> building completely separate hierarchies out of the traditional
> process hierarchies.  After thinking about this stuff for a while, I
> fully agree with him.  I think this whole thing should have been a
> layer over the process tree like sessions or program groups.
> 
> Unfortunately, that ship sailed long ago and we gotta make do with
> what we have on our collective hands.  Here are some paths that we can
> take.
> 
> 1. We're screwed anyway.  Just don't worry about it and continue down
>    on this path.  Can't get much worse, right?
> 
>    This approach has the apparent advantage of not having to do
>    anything and is probably most likely to be taken.  This isn't ideal
>    but hey nothing is. :P
> 
> 2. Make it more flexible (and likely more complex, unfortunately).
>    Allow the utility type subsystems to be used in multiple
>    hierarchies.  The easiest and probably dirtiest way to achieve that
>    would be embedding them into cgroup core.
> 
>    Thinking about doing this depresses me and it's not like I have a
>    cheerful personality to begin with. :(
> 
> 3. Head towards single hierarchy with the pie-in-the-sky goal of
>    merging things into process hierarchy in some distant future.
> 
>    The first step would be herding people to use a unified hierarchy
>    (ie. all subsystems mounted on a single cgroup tree) which is
>    controlled by single entity in userland (be it systemd or cgroupd,
>    cgroup-kit or whatever); however, even if we exclude supporting
>    orthogonal categorizations, there are good number of non-trivial
>    hurdles to clear before this can be realized.
> 
>    Most importantly, we would need to clean up how nesting is handled
>    across different subsystems.  Handling internal and leaf nodes as
>    equals simply can't work.  Membership should be recursive, and for
>    subsystems which can't support proper nesting, the right thing to
>    do would be somehow ensuring that only single node in the path from
>    root to leaf is active for the controller.  We may even have to
>    introduce an alternative of operation to support this (yuck).
> 
>    This path would require the most amount of work and we would be
>    excluding a feature - support for multiple orthogonal
>    categorizations - which has been available till now, probably
>    through deprecation process spanning years; however, this at least
>    gives us hope that we may reach sanity in the end, how distant that
>    end may be.  Oh, hope. :)
> 
> So, I mean, I don't know.  What do other people think?  Is this an
> unnecessary worry?  Are people generally happy with the way things
> are?  Lennart, Kay, what do you guys think?
> 
> Thanks.
> 
> --
> tejun

-- 
tejun


* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-21 21:19 [RFD] cgroup: about multiple hierarchies Tejun Heo
  2012-02-21 21:21 ` Tejun Heo
@ 2012-02-22 13:30 ` Peter Zijlstra
  2012-02-22 13:37   ` Glauber Costa
                     ` (2 more replies)
  2012-02-22 15:45 ` Frederic Weisbecker
                   ` (5 subsequent siblings)
  7 siblings, 3 replies; 84+ messages in thread
From: Peter Zijlstra @ 2012-02-22 13:30 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Christoph Hellwig

On Tue, 2012-02-21 at 13:19 -0800, Tejun Heo wrote:
> So, I mean, I don't know.  What do other people think?  Is this an
> unnecessary worry?  Are people generally happy with the way things
> are?  Lennart, Kay, what do you guys think? 

FWIW I'm all for ripping the orthogonal hierarchy crap out, I hate it
just about as much as you do judging from your write-up.

Yes it will make some people unhappy, but I can live with that since my
life will be easier.. :-)

I'm not sure on your process hierarchy pie though, I rather like being
able to assign tasks to cgroups of my making without having to mirror
that in the process hierarchy.

Having seen what userspace does (libvirt in particular, I've still
managed to not get infected by the systemd crap) it's utterly and
completely insane. Now I don't think any of my machines actually still
have libvirt on them, so I don't care if we break that either ;-)

Another thing I dislike about all the cgroup crap is all the dozens of
tiny controllers being proposed left right and center. Like WTF isn't
the hugetlb controller part of memcg? It's all memory, right?

Now I appreciate all this is new and exciting and Linux does the
evolutionary development thing so it's bound to be a mess sometimes, but
sheesh..

So +1 on just ripping everything apart and trying again.




* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-21 21:21 ` Tejun Heo
@ 2012-02-22 13:34   ` Glauber Costa
  2012-02-23  7:45     ` Serge E. Hallyn
  2012-02-26  4:59   ` Konstantin Khlebnikov
  1 sibling, 1 reply; 84+ messages in thread
From: Glauber Costa @ 2012-02-22 13:34 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Christoph Hellwig

I am afraid I also don't have too many answers for your questions, but
I do have more questions =)

On 02/22/2012 01:21 AM, Tejun Heo wrote:
> Sorry, forgot to cc hch.  Cc'ing him and quoting whole message.
>
> On Tue, Feb 21, 2012 at 01:19:38PM -0800, Tejun Heo wrote:
>> Hello, guys.
>>
>> I've been thinking about multiple hierarchy support in cgroup for a
>> while, especially after Frederic's pending task counter patchset.
>> This is a write up of what I've been thinking.  I don't know what to
>> do yet and simply continuing the current situation definitely is an
>> option, so please read on and throw in your 20 Won (or whatever amount
>> in whatever currency you want).

I've said this before, but to this day the need for it still strikes 
me as odd. I mean: the use case is pretty clear. But every single cgroup 
is counting forks in one way or another. So for me, it would be better to 
simply count it as a cgroup property and act on it accordingly.

But then, of course, if you have multiple hierarchies, in which of them 
should you put that? How ugly is it that you'll fail a fork, then check 
a hierarchy - no problem - only to later find out that this was 
configured in another hierarchy?

>>
>> * The problems.
>>
>> The support for multiple process hierarchies always struck me as
>> rather strange.  If you forget about the current cgroup controllers
>> and their implementations, the *only* reason to support multiple
>> hierarchies is if you want to apply resource limits based on different
>> orthogonal categorizations.
>>
>> Documentation/cgroups.txt seems to be written with this consideration
>> in mind.  It's giving an example of applying limits according to two
>> orthogonal categorizations - user groups (professors, students...)
>> and applications (WWW, NFS...).  While it may sound like a valid use
>> case, I'm very skeptical how useful or common mixing such orthogonal
>> categorizations in a single setup would be.
>>
>> If support for multiple hierarchies comes for free, at least in terms
>> of features, maybe it can be better but of course it isn't so.  Any
>> given cgroup subsystem (or controller) can only be applied to a single
>> hierarchy, which makes sense for a lot of things - what would two
>> different limits on the same resource from different hierarchies mean?
>> But, there also are things which can be used and useful in all
>> hierarchies - e.g. cgroup freezer and task counter.
>>
>> While the current cgroup implementation and conventions can probably
>> allow admins and engineers to tailor cgroup configuration for a
>> specific setup, it is very difficult to use in generic and automated
>> way.  I mean, who owns the freezer or task counter?  If they're
>> mounted on their own hierarchies, how should they be structured?
>> Should the different hierarchies be structured such that they are
>> projections of one unified hierarchy so that those generic mechanisms
>> can be applied uniformly?  If so, why do we need multiple hierarchies
>> at all?
>>
>> A related limitation is that as different subsystems don't know which
>> hierarchies they'll end up on, they can't cooperate.  Wouldn't it make
>> more sense if task counter is a separate thing watching the resources
>> and triggers different actions as configured - be it failing forks or
>> freezing?

Well, there is more. The use case we have in mind here is containers. 
To spawn a container, we put processes in cgroups - we don't care about 
hierarchies, they are all the same - but then also need to put those 
same processes in different namespaces.

This is quite cumbersome, because those are two completely different 
ways of achieving more or less the same thing, resource visibility. At 
some point, we need to allow the container admin to interface with those 
resources - traditionally done via /proc. And now the mess begins:

Part of /proc is namespace aware. So if you are reading your 
/proc/mounts file, this is okay. But part of the data coming from there, 
like /proc/cpuinfo, /proc/stat, or /proc/meminfo, really belongs to 
cgroups. And in some cases, information comes from more than one cgroup. 
No consensus has been reached yet about what to do with it.
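
To illustrate the "more than one cgroup" point, the sketch below parses 
text in the /proc/<pid>/cgroup layout (one line per hierarchy, in the 
form hierarchy-ID:controller-list:cgroup-path) and shows that a single 
task can sit in a different cgroup for each controller. The sample 
content and the function name are made up for illustration.

```python
# Made-up sample in the /proc/<pid>/cgroup layout; not from a real system.
SAMPLE_PROC_PID_CGROUP = """\
4:freezer:/
3:memory:/containers/c1
2:cpu,cpuacct:/containers/c1/web
"""


def cgroup_membership(text):
    """Map controller name -> cgroup path for one task."""
    membership = {}
    for line in text.splitlines():
        if not line:
            continue
        # Split at most twice so a path containing ':' stays intact.
        _hierarchy_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            if controller:
                membership[controller] = path
    return membership


membership = cgroup_membership(SAMPLE_PROC_PID_CGROUP)
for controller in sorted(membership):
    print(controller, membership[controller])
```

With this sample, memory figures for the task would come from 
/containers/c1 while cpu figures would come from /containers/c1/web, so 
a container-aware /proc/meminfo and /proc/stat would have to consult 
different cgroups.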

>> And yet another oddity is how cgroup handles nested cgroups - some
>> care about nesting but others just treat both internal and leaf nodes
>> equally.
To be honest, I don't like that very much. I think once you have a 
directory-like structure, nesting of controlled resources should be 
assumed. But since I don't understand why this is this way to begin 
with, I'll leave it to someone else.

>> They don't care about the topology at all.  This, too, can
>> be fine if you approach things subsys by subsys and use them in
>> different ways but if you try to combine them in generic way you get
>> sucked into the lala land of whatevers.
>>
>> The following is a "best practices" document on using cgroups.
>>
>>    http://www.freedesktop.org/wiki/Software/systemd/PaxControlGroups
>>
>> To me, it seems to demonstrate the rather ugly situation that the
>> current cgroup is providing.  Everyone should tip-toe around cgroup
>> hierarchies and nobody has full knowledge or control over them.
>> e.g. base system management (e.g. systemd) can't use freezer or task
>> counter as someone else might want to use it for different hierarchy
>> layout.
>>
>> It seems to me that cgroup interface is too complicated and inflexible
>> at the same time to be useful in generic manner.  Sure, it can be
>> useful for setups individually crafted by engineers and admins to
>> match specific sites or applications but as soon as you try to do
>> something automatic and generic with it, there just are too many
>> different scenarios and limitations to consider.
>>
>>
>> * So, what to do?
>>
>> Heh, I don't know.  IIRC, last year at LinuxCon Japan, I heard
>> Christoph saying that the biggest problem w/ cgroup was that it was
>> building completely separate hierarchies out of the traditional
>> process hierarchies.  After thinking about this stuff for a while, I
>> fully agree with him.  I think this whole thing should have been a
>> layer over the process tree like sessions or program groups.
>>
>> Unfortunately, that ship sailed long ago and we gotta make do with
>> what we have on our collective hands.  Here are some paths that we can
>> take.
>>
>> 1. We're screwed anyway.  Just don't worry about it and continue down
>>     on this path.  Can't get much worse, right?
Wrong. =)

>>
>>     This approach has the apparent advantage of not having to do
>>     anything and is probably most likely to be taken.  This isn't ideal
>>     but hey nothing is. :P
>>
>> 2. Make it more flexible (and likely more complex, unfortunately).
It sounds like the guys on TV proposing more debt to end the debt crisis...

>>     Allow the utility type subsystems to be used in multiple
>>     hierarchies.  The easiest and probably dirtiest way to achieve that
>>     would be embedding them into cgroup core.
>>
>>     Thinking about doing this depresses me and it's not like I have a
>>     cheerful personality to begin with. :(
>>
>> 3. Head towards single hierarchy with the pie-in-the-sky goal of
>>     merging things into process hierarchy in some distant future.
>>
>>     The first step would be herding people to use a unified hierarchy
>>     (ie. all subsystems mounted on a single cgroup tree) which is
>>     controlled by single entity in userland (be it systemd or cgroupd,
>>     cgroup-kit or whatever); however, even if we exclude supporting
>>     orthogonal categorizations, there are good number of non-trivial
>>     hurdles to clear before this can be realized.
>>
>>     Most importantly, we would need to clean up how nesting is handled
>>     across different subsystems.  Handling internal and leaf nodes as
>>     equals simply can't work.
Agree here.

>>     Membership should be recursive, and for
>>     subsystems which can't support proper nesting, the right thing to
>>     do would be somehow ensuring that only single node in the path from
>>     root to leaf is active for the controller.  We may even have to
>>     introduce an alternative of operation to support this (yuck).
>>
>>     This path would require the most amount of work and we would be
>>     excluding a feature - support for multiple orthogonal
>>     categorizations - which has been available till now, probably
>>     through deprecation process spanning years; however, this at least
>>     gives us hope that we may reach sanity in the end, how distant that
>>     end may be.  Oh, hope. :)
>>
>> So, I mean, I don't know.  What do other people think?  Is this an
>> unnecessary worry?  Are people generally happy with the way things
>> are?  Lennart, Kay, what do you guys think?
>>

Well, most of the controllers can, in practice, be enabled or disabled. 
The mere fact that you live on a cgroup controller doesn't do anything 
until you start to set limits - with the big exception being the cpu 
controller - once you're there, it treats you as a sched entity. Maybe 
we should ensure that all cgroups can be either on/off. Then after that, 
we can group processes the way we want, and they may or may not be 
resource constrained, depending on what you put on your files.

This can be combined with a mechanism to lock the tasks file for 
removal, then maybe we can end up in a better awareness situation - 
maybe it would be saner if you can be sure that once you put a task on a 
group, it won't just disappear...



* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-22 13:30 ` Peter Zijlstra
@ 2012-02-22 13:37   ` Glauber Costa
  2012-02-22 18:01   ` Tejun Heo
  2012-02-23  7:39   ` Li Zefan
  2 siblings, 0 replies; 84+ messages in thread
From: Glauber Costa @ 2012-02-22 13:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Christoph Hellwig

On 02/22/2012 05:30 PM, Peter Zijlstra wrote:
> On Tue, 2012-02-21 at 13:19 -0800, Tejun Heo wrote:
>> So, I mean, I don't know.  What do other people think?  Is this an
>> unnecessary worry?  Are people generally happy with the way things
>> are?  Lennart, Kay, what do you guys think?
>
> FWIW I'm all for ripping the orthogonal hierarchy crap out, I hate it
> just about as much as you do judging from your write-up.
>
> Yes it will make some people unhappy, but I can live with that since my
> life will be easier.. :-)
>
> I'm not sure on your process hierarchy pie though, I rather like being
> able to assign tasks to cgroups of my making without having to mirror
> that in the process hierarchy.
>
> Having seen what userspace does (libvirt in particular, I've still
> managed to not get infected by the systemd crap) it's utterly and
> completely insane. Now I don't think any of my machines actually still
> have libvirt on them, so I don't care if we break that either ;-)
>
> Another thing I dislike about all the cgroup crap is all the dozens of
> tiny controllers being proposed left right and center. Like WTF isn't
> the hugetlb controller part of memcg? It's all memory, right?
>
Right. But this is easy to solve.
People are usually pointing out that "Hey, but that's not how my 
controller works, I need it to be slightly different here and there".
If we agree this is a bad thing - I think it is - we can at least adopt 
as a policy not to take any patches that create another hierarchy unless 
the need is utterly demonstrated.




* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-21 21:19 [RFD] cgroup: about multiple hierarchies Tejun Heo
  2012-02-21 21:21 ` Tejun Heo
  2012-02-22 13:30 ` Peter Zijlstra
@ 2012-02-22 15:45 ` Frederic Weisbecker
  2012-02-22 18:22   ` Tejun Heo
  2012-02-22 16:38 ` Vivek Goyal
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 84+ messages in thread
From: Frederic Weisbecker @ 2012-02-22 15:45 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, linux-kernel, Paul Menage

On Tue, Feb 21, 2012 at 01:19:38PM -0800, Tejun Heo wrote:
> Hello, guys.
> 
> I've been thinking about multiple hierarchy support in cgroup for a
> while, especially after Frederic's pending task counter patchset.
> This is a write up of what I've been thinking.  I don't know what to
> do yet and simply continuing the current situation definitely is an
> option, so please read on and throw in your 20 Won (or whatever amount
> in whatever currency you want).
> 
> * The problems.
> 
> The support for multiple process hierarchies always struck me as
> rather strange.  If you forget about the current cgroup controllers
> and their implementations, the *only* reason to support multiple
> hierarchies is if you want to apply resource limits based on different
> orthogonal categorizations.
> 
> Documentation/cgroups.txt seems to be written with this consideration
> in mind.  It's giving an example of applying limits according to two
> orthogonal categorizations - user groups (professors, students...)
> and applications (WWW, NFS...).  While it may sound like a valid use
> case, I'm very skeptical how useful or common mixing such orthogonal
> categorizations in a single setup would be.
> 
> If support for multiple hierarchies comes for free, at least in terms
> of features, maybe it can be better but of course it isn't so.  Any
> given cgroup subsystem (or controller) can only be applied to a single
> hierarchy, which makes sense for a lot of things - what would two
> different limits on the same resource from different hierarchies mean?
> But, there also are things which can be used and useful in all
> hierarchies - e.g. cgroup freezer and task counter.
> 
> While the current cgroup implementation and conventions can probably
> allow admins and engineers to tailor cgroup configuration for a
> specific setup, it is very difficult to use in generic and automated
> way.  I mean, who owns the freezer or task counter?  If they're
> mounted on their own hierarchies, how should they be structured?
> Should the different hierarchies be structured such that they are
> projections of one unified hierarchy so that those generic mechanisms
> can be applied uniformly?  If so, why do we need multiple hierarchies
> at all?
> 
> A related limitation is that as different subsystems don't know which
> hierarchies they'll end up on, they can't cooperate.  Wouldn't it make
> more sense if task counter is a separate thing watching the resources
> and triggers different actions as configured - be it failing forks or
> freezing?

For this particular example, I think we'd better have a file in which
a task can poll and get woken up when the task limit has been reached.
Then that task can decide to freeze or whatever.
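
A minimal sketch of what the userspace side of such a poll-and-wake
scheme might look like.  The notification file itself is hypothetical
(no such interface is implied to exist), so a plain pipe stands in for
it here; only the select.poll() usage is real.

```python
import os
import select


def wait_for_event(fd, timeout_ms):
    """Wait until fd is readable or the timeout expires; True if readable."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    events = poller.poll(timeout_ms)
    return any(mask & select.POLLIN for _fd, mask in events)


# A pipe stands in for the hypothetical "task limit reached" file.
read_fd, write_fd = os.pipe()

before = wait_for_event(read_fd, 0)       # nothing signalled yet
os.write(write_fd, b"limit reached\n")    # pretend the limit was just hit
after = wait_for_event(read_fd, 1000)     # the waiter is now woken up

os.close(read_fd)
os.close(write_fd)
print(before, after)
```

Once woken, the watching task would be free to apply whatever policy it
wants, e.g. freezing the group, which keeps that policy out of the
counter itself.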

> 
> And yet another oddity is how cgroup handles nested cgroups - some
> care about nesting but others just treat both internal and leaf nodes
> equally.  They don't care about the topology at all.  This, too, can
> be fine if you approach things subsys by subsys and use them in
> different ways but if you try to combine them in generic way you get
> sucked into the lala land of whatevers.
> 
> The following is a "best practices" document on using cgroups.
> 
>   http://www.freedesktop.org/wiki/Software/systemd/PaxControlGroups
> 
> To me, it seems to demonstrate the rather ugly situation that the
> current cgroup is providing.  Everyone should tip-toe around cgroup
> hierarchies and nobody has full knowledge or control over them.
> e.g. base system management (e.g. systemd) can't use freezer or task
> counter as someone else might want to use it for different hierarchy
> layout.
> 
> It seems to me that cgroup interface is too complicated and inflexible
> at the same time to be useful in generic manner.  Sure, it can be
> useful for setups individually crafted by engineers and admins to
> match specific sites or applications but as soon as you try to do
> something automatic and generic with it, there just are too many
> different scenarios and limitations to consider.
> 
> 
> * So, what to do?
> 
> Heh, I don't know.  IIRC, last year at LinuxCon Japan, I heard
> Christoph saying that the biggest problem w/ cgroup was that it was
> building completely separate hierarchies out of the traditional
> process hierarchies.  After thinking about this stuff for a while, I
> fully agree with him.  I think this whole thing should have been a
> layer over the process tree like sessions or process groups.
> 
> Unfortunately, that ship sailed long ago and we gotta make do with
> what we have on our collective hands.  Here are some paths that we can
> take.
> 
> 1. We're screwed anyway.  Just don't worry about it and continue down
>    on this path.  Can't get much worse, right?
> 
>    This approach has the apparent advantage of not having to do
>    anything and is probably most likely to be taken.  This isn't ideal
>    but hey nothing is. :P

Thing is we have an ABI now and it has been there for a while now. Aren't
we stuck with it? I'm no big fan of that multiple hierarchies thing either
but now I fear we have to support it.

> 
> 2. Make it more flexible (and likely more complex, unfortunately).
>    Allow the utility type subsystems to be used in multiple
>    hierarchies.  The easiest and probably dirtiest way to achieve that
>    would be embedding them into cgroup core.
> 
>    Thinking about doing this depresses me and it's not like I have a
>    cheerful personality to begin with. :(

Another solution is to support a class of multi-bindable subsystems as in
this old patch from Paul:

	https://lkml.org/lkml/2009/7/1/578

It sounds to me more healthy to iterate only over subsystems in fork/exit.
We probably don't want to add a new iteration over cgroups themselves
on these fast path.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-21 21:19 [RFD] cgroup: about multiple hierarchies Tejun Heo
                   ` (2 preceding siblings ...)
  2012-02-22 15:45 ` Frederic Weisbecker
@ 2012-02-22 16:38 ` Vivek Goyal
  2012-02-22 16:57   ` Vivek Goyal
                     ` (2 more replies)
  2012-02-23  8:22 ` Li Zefan
                   ` (3 subsequent siblings)
  7 siblings, 3 replies; 84+ messages in thread
From: Vivek Goyal @ 2012-02-22 16:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Christoph Hellwig

On Tue, Feb 21, 2012 at 01:19:38PM -0800, Tejun Heo wrote:

[..]
> 3. Head towards single hierarchy with the pie-in-the-sky goal of
>    merging things into process hierarchy in some distant future.
> 
>    The first step would be herding people to use a unified hierarchy
>    (ie. all subsystems mounted on a single cgroup tree) which is
>    controlled by single entity in userland (be it systemd or cgroupd,
>    cgroup-kit or whatever); however, even if we exclude supporting
>    orthogonal categorizations, there are good number of non-trivial
>    hurdles to clear before this can be realized.

Apart from orthogonal categorizations, one advantage of multiple
hierarchies is that you don't have to use a controller if you don't
want to. (Just don't create cgroup in controller's respective hierarchy).

This is not ideal but practically it might be helpful, in the sense that
cgroups might not come cheap and different controllers might have different
overheads associated with them. For example, in blkio controller we can end
up idling a lot with increasing number of cgroups. In that case a better
way might be to use blkio controller cgroups selectively, that is, move any
workload which is destroying the performance of others out into a separate
blkio group.

This is not ideal situation but that's how things currently are.

systemd by default creates cgroups only in the cpu hierarchy (apart from the
named systemd hierarchy to keep track of groups/processes). By default it does
not make use of other controllers or put any restrictions on
processes/services apart from cpu. Having a separate hierarchy for every
controller at least easily allows that.

> 
>    Most importantly, we would need to clean up how nesting is handled
>    across different subsystems.  Handling internal and leaf nodes as
>    equals simply can't work.  Membership should be recursive, and for
>    subsystems which can't support proper nesting, the right thing to
>    do would be somehow ensuring that only single node in the path from
>    root to leaf is active for the controller.  We may even have to
>    introduce an alternative mode of operation to support this (yuck).
> 
>    This path would require the most amount of work and we would be
>    excluding a feature - support for multiple orthogonal
>    categorizations - which has been available till now, probably
>    through deprecation process spanning years; however, this at least
>    gives us hope that we may reach sanity in the end, how distant that
>    end may be.  Oh, hope. :)

Yes, this is something that needs to be cleaned up. Everybody seems to have
dealt with hierarchy in their own way.

For blkio controller, initially we provided fully nested hierarchies like
cpu controller but then implementation became too complex (CFQ is already
complicated and implementing fully nested hierarchies made it much more
complicated without any significant gain). So, I converted it into a
flat model where internally we treat the whole hierarchy flat. (It
might have been a bad decision though).

So for blkio controller we can convert it into fully nested hierarchy
at the expense of more complex code in CFQ. I think memory cgroup
controller provides both flat and hierarchical mode. Keeping it fully
hierarchical also increases the cost as we need to traverse a lot more
pointers for simple things like nested stats. On a system having
both systemd and libvirt, every virtual machine is already 3-4 level
deep in cgroup hierarchy.

Trying to make all the controllers uniform in terms of their treatment
of cgroup hierarchy sounds like a good thing to do. Once that is done,
one can probably see if it is worth putting all the controllers in a
single hierarchy.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-22 16:38 ` Vivek Goyal
@ 2012-02-22 16:57   ` Vivek Goyal
  2012-02-22 18:43     ` Tejun Heo
  2012-02-23  9:41     ` Peter Zijlstra
  2012-02-22 18:33   ` Tejun Heo
  2012-02-23  7:59   ` Li Zefan
  2 siblings, 2 replies; 84+ messages in thread
From: Vivek Goyal @ 2012-02-22 16:57 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Christoph Hellwig

On Wed, Feb 22, 2012 at 11:38:58AM -0500, Vivek Goyal wrote:

[..]
> > 
> >    Most importantly, we would need to clean up how nesting is handled
> >    across different subsystems.  Handling internal and leaf nodes as
> >    equals simply can't work.  Membership should be recursive, and for
> >    subsystems which can't support proper nesting, the right thing to
> >    do would be somehow ensuring that only single node in the path from
> >    root to leaf is active for the controller.  We may even have to
> >    introduce an alternative mode of operation to support this (yuck).
> > 
> >    This path would require the most amount of work and we would be
> >    excluding a feature - support for multiple orthogonal
> >    categorizations - which has been available till now, probably
> >    through deprecation process spanning years; however, this at least
> >    gives us hope that we may reach sanity in the end, how distant that
> >    end may be.  Oh, hope. :)
> 
> Yes, this is something that needs to be cleaned up. Everybody seems to have
> dealt with hierarchy in their own way.
> 
> For blkio controller, initially we provided fully nested hierarchies like
> cpu controller but then implementation became too complex (CFQ is already
> complicated and implementing fully nested hierarchies made it much more
> complicated without any significant gain). So, I converted it into a
> flat model where internally we treat the whole hierarchy flat. (It
> might have been a bad decision though).

IIRC, another reason to implement flat hierarchy was that some people
believed that's a more natural way of doing things. For example, when
you talk about cgroup, people ask, ok, give me a cgroup with 25% IO
bandwidth. Now this does not come naturally with completely nested
hierarchies where tasks and groups are treated at the same level. As a
group's peer tasks share the bandwidth and tasks come and go, the group's
% share varies dynamically.

Again, it does not mean I am advocating flat hierarchy. I am just wondering
in case of fully nested hierarchies (task at same level as groups), how
does one explain it to a layman user who understands things in terms of
% of resources.

Just saying that your group has weight X does not mean much in absolute
terms. And % bandwidth achieved by group will vary dynamically. (Hey,
you told me that one can divide the system resources somewhat
deterministically. But bandwidth varying dynamically does not sound the
same).
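
A small worked example of the point above, as a Python sketch. It
assumes that peer tasks are weighed like groups and that every peer
task carries the same weight as the group; the numbers are made up for
illustration only.

```python
def share_percent(weight, peer_weights):
    """Percentage of the parent's bandwidth an entity with `weight` gets
    when competing against entities with `peer_weights`."""
    total = weight + sum(peer_weights)
    return 100.0 * weight / total

group_weight = 500
task_weight = 500   # assumption: each peer task weighs as much as the group

alone = share_percent(group_weight, [])
with_one_task = share_percent(group_weight, [task_weight])
with_three_tasks = share_percent(group_weight, [task_weight] * 3)

print(alone, with_one_task, with_three_tasks)
```

The weight of the group never changes, yet its share moves from 100%
to 50% to 25% as peer tasks show up.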

Thanks
Vivek

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-22 13:30 ` Peter Zijlstra
  2012-02-22 13:37   ` Glauber Costa
@ 2012-02-22 18:01   ` Tejun Heo
  2012-02-23  7:39   ` Li Zefan
  2 siblings, 0 replies; 84+ messages in thread
From: Tejun Heo @ 2012-02-22 18:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Christoph Hellwig

Hey, Peter.

On Wed, Feb 22, 2012 at 02:30:59PM +0100, Peter Zijlstra wrote:
> FWIW I'm all for ripping the orthogonal hierarchy crap out, I hate it
> just about as much as you do judging from your write-up.

I just don't get why it's there.  Maybe, there can be some remote use
cases where orthogonal hierarchies can be useful but structuring whole
cgroup around that seems really extreme.

> I'm not sure on your process hierarchy pie though, I rather like being
> able to assign tasks to cgroups of my making without having to mirror
> that in the process hierarchy.

The only question is whether we want to allow cgroup hierarchy to be
completely orthogonal from process tree structure, which I don't think
is a good idea.  It shouldn't affect trivial use cases.  If not
explicitly configured, all tasks would live in a single root cgroup -
much like every process would belong to the same session if nobody
does setsid() since boot (or container).

I don't know how the implementation would turn out and it may as well
stay separate as it is now but I still think the topology should match
pstree.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-22 15:45 ` Frederic Weisbecker
@ 2012-02-22 18:22   ` Tejun Heo
  2012-02-27 17:46     ` Frederic Weisbecker
  0 siblings, 1 reply; 84+ messages in thread
From: Tejun Heo @ 2012-02-22 18:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, linux-kernel, Paul Menage

Hey, Frederic.

On Wed, Feb 22, 2012 at 04:45:04PM +0100, Frederic Weisbecker wrote:
> > A related limitation is that as different subsystems don't know which
> > hierarchies they'll end up on, they can't cooperate.  Wouldn't it make
> > more sense if task counter is a separate thing watching the resources
> > and triggers different actions as configured - be it failing forks or
> > freezing?
> 
> For this particular example, I think we'd better have a file in which
> a task can poll and get woken up when the task limit has been reached.
> Then that task can decide to freeze or whatever.

Yes, that may be a solution but to "guarantee" that the limit is never
breached, we need to stop it first somehow.  Probably making freezing
the default behavior with userland notifier (inotify event should
suffice) should do, which we can't do now. :(

> > 1. We're screwed anyway.  Just don't worry about it and continue down
> >    on this path.  Can't get much worse, right?
> > 
> >    This approach has the apparent advantage of not having to do
> >    anything and is probably most likely to be taken.  This isn't ideal
> >    but hey nothing is. :P
> 
> Thing is we have an ABI now and it has been there for a while now. Aren't
> we stuck with it? I'm no big fan of that multiple hierarchies thing either
> but now I fear we have to support it.

Well, yes and no.  While maintaining userland ABI is very important,
its importance isn't infinite and there are different types of
userland ABIs.  We definitely don't want to screw with syscalls.  We
should keep userland visible dynamic files which are used by common
usertools stable at almost all costs.  When it comes over to system
interface which is used mostly by base system tools, it can be a bit
flexible.  If the ABI in question is an optional thing, we probably
can be slightly more flexible.

We of course can't change things drastically.  It should be done
carefully with rather long deprecation period, but it can be done and
in fact isn't too uncommon.  Stuff under /sys tends to be somewhat
volatile and sysfs itself went through several ABI incompatible
iterations.

So, we can transition in baby steps.  e.g. we can first implement
proper nesting behavior without changing the default behavior and then
the base system can be updated to mount and control all subsystems by
default (with configuration opt-outs) so that the hierarchy reflects
pstree, effectively driving people away from multiple hierarchies and
we can implement new features assuming the new structure.  After a few
years, the kernel can start whining about non-standard hierarchies and
then eventually remove the support.  It's a long process but
definitely doable.

> > 2. Make it more flexible (and likely more complex, unfortunately).
> >    Allow the utility type subsystems to be used in multiple
> >    hierarchies.  The easiest and probably dirtiest way to achieve that
> >    would be embedding them into cgroup core.
> > 
> >    Thinking about doing this depresses me and it's not like I have a
> >    cheerful personality to begin with. :(
> 
> Another solution is to support a class of multi-bindable subsystems as in
> this old patch from Paul:
> 
> 	https://lkml.org/lkml/2009/7/1/578

Heh, yeah, this would be closer to the proper way to achieve
multi-attach but I can't help feeling that this just buries ourselves
deeper into s*it and we're already knee-deep.  If multiple hierarchies
is an essential feature, maybe, but, if it's not, and I'm extremely
skeptical that it is, why the hell would we want to go that way?

> It sounds to me more healthy to iterate only over subsystems in fork/exit.
> We probably don't want to add a new iteration over cgroups themselves
> on these fast path.

Hmmm?  Don't follow why this is relevant.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-22 16:38 ` Vivek Goyal
  2012-02-22 16:57   ` Vivek Goyal
@ 2012-02-22 18:33   ` Tejun Heo
  2012-02-23 19:41     ` Vivek Goyal
  2012-02-23  7:59   ` Li Zefan
  2 siblings, 1 reply; 84+ messages in thread
From: Tejun Heo @ 2012-02-22 18:33 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Christoph Hellwig

Hey, Vivek.

On Wed, Feb 22, 2012 at 11:38:58AM -0500, Vivek Goyal wrote:
> Apart from orthogonal categorizations, one advantage of multiple
> hierarchies is that you don't have to use a controller if you don't
> want to. (Just don't create cgroup in controller's respective hierarchy).
> 
> This is not ideal but practically it might be helpful, in the sense that
> cgroups might not come cheap and different controllers might have different
> overheads associated with them. For example, in blkio controller we can end
> up idling a lot with increasing number of cgroups. In that case a better
> way might be to use blkio controller cgroups selectively, that is, move any
> workload which is destroying the performance of others out into a separate
> blkio group.

It should of course be possible to apply selective grouping on
different cgroups.  It's like any other layer on top of pstree -
sessions, process groups or containers.  Just group subtrees as you
see fit for each subsystem (there gotta be some fancy CS word for this
thing).  As long as those grouped trees are from the same base tree,
we can represent it in a single tree, just like we can just annotate
sessions and process groups in pstree.

So, as long as you don't want something orthogonal from pstree, it
should be fine.
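
One possible reading of "group subtrees as you see fit for each
subsystem", sketched in Python. The Node class, the controller names
attached to each node and effective_group() are all hypothetical; the
only idea illustrated is that a single base tree can carry
per-controller annotations, and a task resolves to the nearest
annotated ancestor for each controller.

```python
class Node:
    def __init__(self, name, parent=None, enabled=()):
        self.name = name
        self.parent = parent
        self.enabled = set(enabled)  # controllers configured at this node

def effective_group(node, controller):
    """Walk up to the nearest node (starting at `node` itself) where
    `controller` is configured; fall back to the root."""
    while node.parent is not None and controller not in node.enabled:
        node = node.parent
    return node.name

root = Node("root", enabled={"cpu", "blkio"})
services = Node("services", root, enabled={"cpu"})
httpd = Node("httpd", services, enabled={"cpu"})
backup = Node("backup", services, enabled={"cpu", "blkio"})

print(effective_group(httpd, "cpu"))     # own cpu group
print(effective_group(httpd, "blkio"))   # no blkio grouping below root
print(effective_group(backup, "blkio"))  # singled out for blkio
```

Here only the backup workload pays for a separate blkio group, while
everything still lives in one tree.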

> So for blkio controller we can convert it into fully nested hierarchy
> at the expense of more complex code in CFQ. I think memory cgroup
> controller provides both flat and hierarchical mode. Keeping it fully
> hierarchical also increases the cost as we need to traverse a lot more
> pointers for simple things like nested stats. On a system having
> both systemd and libvirt, every virtual machine is already 3-4 level
> deep in cgroup hierarchy.

I don't think every controller should implement full nesting and
sharing the same hierarchy doesn't require it.  ie. if a controller
only wants to support flat hierarchy, just allow a single subgroup to
be active on any path between root and leaf.  We can add a flag or
helpers to support such mode of operation and controllers themselves
can treat all cgroups equally.
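
The "single active subgroup on any path" rule could be checked along
these lines. This is a Python sketch with hypothetical names
(path_ok(), the dict-based tree), not a proposal for the actual helper.

```python
def path_ok(children, active, node="root", seen_active=False):
    """Return True if no root-to-leaf path holds more than one active node.

    children: dict mapping a node name to a list of child names
    active:   set of non-root nodes where the flat controller is configured
    """
    if node in active:
        if seen_active:
            return False          # a second active node on the same path
        seen_active = True
    return all(path_ok(children, active, child, seen_active)
               for child in children.get(node, []))

tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": []}

print(path_ok(tree, {"a", "b"}))         # siblings: fine
print(path_ok(tree, {"a1", "a2", "b"}))  # leaves under different parents: fine
print(path_ok(tree, {"a", "a1"}))        # parent and child both active: rejected
```

With such a check in place the controller only ever sees a flat set of
active groups, whatever the depth of the shared tree.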

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-22 16:57   ` Vivek Goyal
@ 2012-02-22 18:43     ` Tejun Heo
  2012-02-23  9:41     ` Peter Zijlstra
  1 sibling, 0 replies; 84+ messages in thread
From: Tejun Heo @ 2012-02-22 18:43 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Christoph Hellwig

Hello,

On Wed, Feb 22, 2012 at 11:57:14AM -0500, Vivek Goyal wrote:
> IIRC, another reason to implement flat hierarchy was that some people
> believed that's a more natural way of doing things. For example, when
> you talk about cgroup, people ask, ok, give me a cgroup with 25% IO
> bandwidth. Now this does not come naturally with completely nested
> hierarchies where tasks and groups are treated at the same level. As a
> group's peer tasks share the bandwidth and tasks come and go, the group's
> % share varies dynamically.

I don't see how that is more "natural".  While I don't think
supporting full nesting is necessary for all controllers, the
semantics is very clear - build grouped trees according to active
configurations and distribute resources top to bottom (network qdiscs
do exactly this).  Flat case is proper degenerate case of nesting.
There's nothing more or less natural.  It's just a matter of trade-off
between complexity and requirements.
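
A minimal sketch of "distribute resources top to bottom", in Python.
The tree layout, the weights and distribute() are made up for
illustration; the flat case falls out as a tree of depth one.

```python
def distribute(tree, weights, node, amount, out=None):
    """Split `amount` among the children of `node` by weight, recursively."""
    if out is None:
        out = {}
    out[node] = amount
    kids = tree.get(node, [])
    total = sum(weights[k] for k in kids)
    for k in kids:
        distribute(tree, weights, k, amount * weights[k] / total, out)
    return out

# Nested configuration: a and b under root, a1 and a2 under a.
nested = distribute({"root": ["a", "b"], "a": ["a1", "a2"]},
                    {"a": 1, "b": 3, "a1": 1, "a2": 1},
                    "root", 100.0)

# Flat configuration: the degenerate case with a single level.
flat = distribute({"root": ["x", "y"]}, {"x": 1, "y": 1}, "root", 100.0)

print(nested)
print(flat)
```

The same function handles both configurations, which is the sense in
which flat is just a special case of nesting.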

> Again, it does not mean I am advocating flat hierarchy. I am just wondering
> in case of fully nested hierarchies (task at same level as groups), how
> does one explain it to a layman user who understands things in terms of
> % of resources.

I don't know whether we want nesting for block cgroup or not but at
the same time that doesn't really matter.  Sharing hierarchy doesn't
require every controller supporting full hierarchy.  I'm not sure how
the interface should be tho - maybe we can fail specifying config if
there already is an effective config encompassing that node or maybe
we can just break the existing config, I don't know.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-22 13:30 ` Peter Zijlstra
  2012-02-22 13:37   ` Glauber Costa
  2012-02-22 18:01   ` Tejun Heo
@ 2012-02-23  7:39   ` Li Zefan
  2 siblings, 0 replies; 84+ messages in thread
From: Li Zefan @ 2012-02-23  7:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Christoph Hellwig

> Another thing I dislike about all the cgroup crap is all the dozens of
> tiny controllers being proposed left right and center. Like WTF isn't
> the hugetlb controller part of memcg? Its all memory, right?
> 

We also have two network controllers - net_cls and net_prio.
Patches were sent to netdev only, so I didn't see them until
they hit mainline.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-22 13:34   ` Glauber Costa
@ 2012-02-23  7:45     ` Serge E. Hallyn
  2012-02-23 17:29       ` Tejun Heo
  0 siblings, 1 reply; 84+ messages in thread
From: Serge E. Hallyn @ 2012-02-23  7:45 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Tejun Heo, Frederic Weisbecker, containers, Kay Sievers,
	linux-kernel, Christoph Hellwig, Lennart Poettering, cgroups,
	Andrew Morton

Quoting Glauber Costa (glommer@parallels.com):
> >>The support for multiple process hierarchies always struck me as
> >>rather strange.  If you forget about the current cgroup controllers
> >>and their implementations, the *only* reason to support multiple
> >>hierarchies is if you want to apply resource limits based on different
> >>orthogonal categorizations.
> >>

Right, the old lwn writeup took the same approach:
http://lwn.net/Articles/236038/

> >>Documentation/cgroups.txt seems to be written with this consideration
> >>in mind.  It's giving an example of applying limits according to two
> >>orthogonal categorizations - user groups (professors, students...)
> >>and applications (WWW, NFS...).  While it may sound like a valid use
> >>case, I'm very skeptical how useful or common mixing such orthogonal
> >>categorizations in a single setup would be.

My first inclination is to agree, but counterexamples do come to mind.

I could imagine a site saying "users can run (X) (say, ftpds), but the
memory consumed by all those ftpds must not be > 10% total RAM".  At
the same time, they may run several apaches but want them all locked to
two of the cpus.

It might be worth a formal description of the new limits on use cases
such changes (both dropping support for orthogonal cgroups, and limiting
cgroup hierarchies to mirror pstree, separately) would bring.

To me personally the hierarchy limitation is more worrying.  There have
been times when I've simply created cgroups for 'compile' and 'image
build', with particular cpu and memory limits.  If I started a second
simultaneous compile, I'd want both compiles confined together.  (That's
not to say the simplification might not be worth it, just bringing up
the other side)

-serge

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-22 16:38 ` Vivek Goyal
  2012-02-22 16:57   ` Vivek Goyal
  2012-02-22 18:33   ` Tejun Heo
@ 2012-02-23  7:59   ` Li Zefan
  2012-02-23 20:32     ` Vivek Goyal
  2 siblings, 1 reply; 84+ messages in thread
From: Li Zefan @ 2012-02-23  7:59 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Tejun Heo, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Christoph Hellwig

> Trying to make all the controllers uniform in terms of their treatment
> of cgroup hierarchy sounds like a good thing to do.

Agreed.

Apart from nesting cgroups, there're other inconsistencies.

- Some controllers disallow more than one cgroup layer. That's the new
net_prio controller, and I don't know why it's made so, but I guess
it's fine to eliminate this restriction.

- Some controllers move resource charges when a task is moved to
a different cgroup, but some don't?

- Some controllers disallow task attaching under some circumstances.
So if we have a single hierarchy with all subsystems, the chance
that attaching a task to a cgroup fails may be bigger.
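
The last point can be illustrated with a toy model in Python. The
controller names ctl_a and ctl_b and their acceptance rules are
hypothetical; the sketch only assumes that an attach succeeds when
every controller bound to the hierarchy accepts the task.

```python
def attach(task, controllers):
    """Return (ok, refusing): ok is True only if every controller accepts."""
    refusing = [name for name, can_attach in controllers.items()
                if not can_attach(task)]
    return (len(refusing) == 0, refusing)

task = {"pid": 42, "realtime": True}

# Hypothetical controllers: ctl_a accepts anything, ctl_b refuses
# realtime tasks.
ctl_a = lambda t: True
ctl_b = lambda t: not t["realtime"]

separate = attach(task, {"ctl_a": ctl_a})                  # own hierarchy
unified = attach(task, {"ctl_a": ctl_a, "ctl_b": ctl_b})   # shared hierarchy

print(separate)
print(unified)
```

On its own hierarchy ctl_a takes the task; on the shared one the same
attach is vetoed by ctl_b.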

> Once that is done,
> one can probably see if it is worth putting all the controllers in a
> single hierarchy.
> 

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-21 21:19 [RFD] cgroup: about multiple hierarchies Tejun Heo
                   ` (3 preceding siblings ...)
  2012-02-22 16:38 ` Vivek Goyal
@ 2012-02-23  8:22 ` Li Zefan
  2012-02-23 17:33   ` Tejun Heo
       [not found] ` <m162em2efy.fsf@fess.ebiederm.org>
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 84+ messages in thread
From: Li Zefan @ 2012-02-23  8:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel

> The following is a "best practices" document on using cgroups.
> 
>   http://www.freedesktop.org/wiki/Software/systemd/PaxControlGroups
> 
> To me, it seems to demonstrate the rather ugly situation that the
> current cgroup is providing.  Everyone should tip-toe around cgroup
> hierarchies and nobody has full knowledge or control over them.
> e.g. base system management (e.g. systemd) can't use freezer or task
> counter as someone else might want to use it for different hierarchy
> layout.
> 

This issue still exists if we allow a single hierarchy only, right?
Different cgroup users/applications have to struggle not to step
on each other's toes.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-22 16:57   ` Vivek Goyal
  2012-02-22 18:43     ` Tejun Heo
@ 2012-02-23  9:41     ` Peter Zijlstra
  2012-02-23 14:13       ` Peter Zijlstra
  2012-02-23 21:38       ` Vivek Goyal
  1 sibling, 2 replies; 84+ messages in thread
From: Peter Zijlstra @ 2012-02-23  9:41 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Tejun Heo, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Christoph Hellwig

On Wed, 2012-02-22 at 11:57 -0500, Vivek Goyal wrote:
> 
> Again, it does not mean I am advocating flat hierarchy. I am just wondering
> in case of fully nested hierarchies (task at same level as groups), how
> does one explain it to a layman user who understands things in terms of
> % of resources. 

If your complete control is % based then I would assume it's a % of a %.
Simple enough.

If it's bandwidth based then simply don't allow a child to consume more
bandwidth than its parent, also simple.

If your layman isn't capable of grokking that, he should stay the f*ck
away from it.

I'm really thinking that if we stick with the full hierarchical thing we
should mandate all controllers be fully hierarchical. And yes that
sucks, but so be it.

The scheduler thing tries to be completely hierarchical and yes it will
run into the ground if you push it hard enough simply because we're
hitting the limits of fixed point arithmetic, fractions can only go so
far, so the deeper you nest the crappier things get -- not that any
userspace cares about this.
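
A toy illustration of "a % of a %" running out of precision with
depth, in Python. The 10-bit fixed-point format and the 25% per-level
share are assumptions picked for the example and are not meant to
describe the scheduler's actual arithmetic.

```python
SHIFT = 10          # assumed precision: 1.0 is represented as 1 << 10
ONE = 1 << SHIFT

def nested_share_fixed(fractions_fixed):
    """Multiply per-level fractions (fixed point) from root to leaf."""
    share = ONE
    for f in fractions_fixed:
        share = (share * f) >> SHIFT
    return share

quarter = ONE // 4  # every level gets 25% of its parent

# Share of the whole machine at nesting depth 1..6.
shares = [nested_share_fixed([quarter] * depth) for depth in range(1, 7)]
print(shares)
```

With these assumed parameters the representable share reaches zero at
the sixth level.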



^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-23  9:41     ` Peter Zijlstra
@ 2012-02-23 14:13       ` Peter Zijlstra
  2012-03-01 17:19         ` Michal Schmidt
  2012-02-23 21:38       ` Vivek Goyal
  1 sibling, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2012-02-23 14:13 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Tejun Heo, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Christoph Hellwig

On Thu, 2012-02-23 at 10:41 +0100, Peter Zijlstra wrote:
> If your complete control is % based then I would assume it's a % of a %.
> Simple enough.
> 
> If it's bandwidth based then simply don't allow a child to consume more
> bandwidth than its parent, also simple.
> 
> If your layman isn't capable of grokking that, he should stay the f*ck
> away from it. 

Fact is, the scheduler does both these things, so there's absolutely no
reason for other controllers not to do so too. It's the only sensible
thing if you want hierarchy.

My utter disregard for cgroups comes from having to actually implement a
controller for them, it's a frigging nightmare. The systemd retards
mandating all this nonsense for booting a machine is completely bonghit
inspired and hasn't made me feel any better about it.



^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-23  7:45     ` Serge E. Hallyn
@ 2012-02-23 17:29       ` Tejun Heo
  2012-02-23 18:47         ` Serge Hallyn
  0 siblings, 1 reply; 84+ messages in thread
From: Tejun Heo @ 2012-02-23 17:29 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Glauber Costa, Frederic Weisbecker, containers, Kay Sievers,
	linux-kernel, Christoph Hellwig, Lennart Poettering, cgroups,
	Andrew Morton

Hey, Serge.

On Thu, Feb 23, 2012 at 07:45:26AM +0000, Serge E. Hallyn wrote:
> > >>Documentation/cgroups.txt seems to be written with this consideration
> > >>in mind.  It's giving an example of applying limits according to two
> > >>orthogonal categorizations - user groups (professors, students...)
> > >>and applications (WWW, NFS...).  While it may sound like a valid use
> > >>case, I'm very skeptical how useful or common mixing such orthogonal
> > >>categorizations in a single setup would be.
> 
> My first inclination is to agree, but counterexamples do come to mind.
> 
> I could imagine a site saying "users can run (X) (say, ftpds), but the
> memory consumed by all those ftpds must not be > 10% total RAM".  At
> the same time, they may run several apaches but want them all locked to
> two of the cpus.

Orthogonal hierarchies is a feature and it does allow use cases which
aren't possible to support otherwise.  It's not too difficult to come
up with a use case crafted to exploit the feature.  The main thing is
whether the added functionality justifies the complexity and other
disadvantages described earlier in the thread.  To me, the scenarios
seem not realistic, common place or essential enough.

Also, it's not like there's only one way to solve these issues.
It may not be exactly the same thing but that's just part of the
trade-off game we all play.

> It might be worth a formal description of the new limits on use cases
> such changes (both dropping support for orthogonal cgroups, and limiting
> cgroups hierarchies to a mirror pstrees, separately) would bring.

The word "formal" scares me. :)

> To me personally the hierarchy limitation is more worrying.  There have
> been times when I've simply created cgroups for 'compile' and 'image
> build', with particular cpu and memory limits.  If I started a second
> simultaneous compile, I'd want both compiles confined together.  (That's
> not to say the simplification might not be worth it, just bringing up
> the other side)

Yeah, that's an interesting point, but wouldn't something like the
following work too?

1. create_cgroup --cpu 40% --mem 20% screen
2. tell screen to create as many build screens as you want
3. issue builds from those screens

To me, something like the above seems far more consistent with
everything else we have on the system than moving tasks around by
echoing pids to some sysfs file.
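
The three steps above rely on a child process starting out in the same
cgroup as its parent.  A minimal Python sketch of that inheritance
follows.  The Proc class, the "/build" group name and the process names
are made up for illustration; this is a toy model, not kernel code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Proc:
    """A toy process: remembers its cgroup and its children."""
    name: str
    cgroup: str
    children: List["Proc"] = field(default_factory=list)

    def fork(self, name: str) -> "Proc":
        # A child starts in the same cgroup as its parent.
        child = Proc(name, self.cgroup)
        self.children.append(child)
        return child


def members(root: Proc, cgroup: str) -> List[str]:
    """Names of all processes under root that sit in the given cgroup."""
    found = [root.name] if root.cgroup == cgroup else []
    for child in root.children:
        found.extend(members(child, cgroup))
    return found


init = Proc("init", cgroup="/")
shell = init.fork("shell")
screen = init.fork("screen")
screen.cgroup = "/build"         # step 1: the limited group is set up once
make1 = screen.fork("make-1")    # steps 2-3: builds are started from screen
make2 = screen.fork("make-2")
cc = make1.fork("cc")

print(members(init, "/build"))
```

Every build started from screen lands in "/build" without anyone
writing a pid anywhere, while the unrelated shell stays in "/".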

Thanks.

-- 
tejun

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-23  8:22 ` Li Zefan
@ 2012-02-23 17:33   ` Tejun Heo
  0 siblings, 0 replies; 84+ messages in thread
From: Tejun Heo @ 2012-02-23 17:33 UTC (permalink / raw)
  To: Li Zefan
  Cc: containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel

Hello, Li.

On Thu, Feb 23, 2012 at 04:22:26PM +0800, Li Zefan wrote:
> > The following is a "best practices" document on using cgroups.
> > 
> >   http://www.freedesktop.org/wiki/Software/systemd/PaxControlGroups
> > 
> > To me, it seems to demonstrate the rather ugly situation that the
> > current cgroup is providing.  Everyone should tip-toe around cgroup
> > hierarchies and nobody has full knowledge or control over them.
> > e.g. base system management (e.g. systemd) can't use freezer or task
> > counter as someone else might want to use it for different hierarchy
> > layout.
> > 
> 
> This issue still exists if we allow a single hierarchy only, right?
> Different cgroup users/applications have to struggle not to step
> on each other's toe.

Oh sure, having a single hierarchy doesn't solve that problem but makes
it clear that there's a single representation that the kernel understands
and deals with.  I think the problem now is that the kernel tries to
multiplex multiple users.  Unfortunately, it does that half-way and
badly and I think the nature of the problem doesn't really allow a
proper muxed interface at the kernel layer.  So, I'm suggesting we let go
of the broken pretense and just have a single unified interface and let
userland deal with resource allocation policies.

Thanks.

-- 
tejun

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-23 17:29       ` Tejun Heo
@ 2012-02-23 18:47         ` Serge Hallyn
  0 siblings, 0 replies; 84+ messages in thread
From: Serge Hallyn @ 2012-02-23 18:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Serge E. Hallyn, Frederic Weisbecker, containers, Kay Sievers,
	linux-kernel, Christoph Hellwig, Lennart Poettering, cgroups,
	Andrew Morton

Quoting Tejun Heo (tj@kernel.org):
> Hey, Serge.
> 
> On Thu, Feb 23, 2012 at 07:45:26AM +0000, Serge E. Hallyn wrote:
> > > >>Documentation/cgroups.txt seems to be written with this consideration
> > > >>on mind.  It's giving an example of applying limits accoring to two
> > > >>orthogonal categorizations - user groups (profressors, students...)
> > > >>and applications (WWW, NFS...).  While it may sound like a valid use
> > > >>case, I'm very skeptical how useful or common mixing such orthogonal
> > > >>categorizations in a single setup would be.
> > 
> > My first inclination is to agree, but counterexamples do come to mind.
> > 
> > I could imagine a site saying "users can run (X) (say, ftpds), but the
> > memory consumed by all those ftpds must not be > 10% total RAM".  At
> > the same time, they may run several apaches but want them all locked to
> > two of the cpus.
> 
> Orthogonal hierarchies is a feature and it does allow use cases which

Of course.  Note that while I used myself in the examples, I'm not
opposed to any of what you've suggested.  Just trying to raise
discussion.

> aren't possible to support otherwise.  It's not too difficult to come
> up with a use case crafted to exploit the feature.  The main thing is
> whether the added functionality justifies the complexity and other

And (somehow) I think we need to get input from the users - the ones not
on lkml.  There is an end-user summit coming up, right?  Perhaps this
question should be floated there?

> disadvantages described earlier in the thread.  To me, the scenarios
> seem not realistic, common place or essential enough.
> 
> Also, it's not like there's only one problem to solve these issues.
> It may not be exactly the same thing but that's just part of the
> trade-off game we all play.
> 
> > It might be worth a formal description of the new limits on use cases
> > such changes (both dropping support for orthogonal cgroups, and limiting
> > cgroups hierarchies to a mirror pstrees, separately) would bring.
> 
> The word "formal" scares me. :)

The upside would be a clear explanation of what userspace can do to
work around the more limited kernel functionality.

> > To me personally the hierarchy limitation is more worrying.  There have
> > been times when I've simply created cgroups for 'compile' and 'image
> > build', with particular cpu and memory limits.  If I started a second
> > simultaneous compile, I'd want both compiles confined together.  (That's
> > not to say the simplification might not be worth it, just bringing up
> > the other side)
> 
> Yeah, that's an interesting point, but wouldn't something like the
> following work too?
> 
> 1. create_cgroup --cpu 40% --mem 20% screen
> 2. tell screen to create as many build screens you want
> 3. issue builds from those screens

That works for a single user.  Gets more complicated if you have multiple
users but still want to confine compiles differently from other workloads.

Still, we now have 'namespace attach', so even if we generally shadow
pstree with the cgroups, perhaps we could implement a cgroup transfer
much more cleanly than the current cgroup attach stuff.

Or, maybe it's just not something users would deem worthwhile.  *I*
will be fine either way.

> To me, something like the above seems far more consistent with
> everything else we have on the system than moving tasks around by
> echoing pids to some sysfs file.

-serge

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-22 18:33   ` Tejun Heo
@ 2012-02-23 19:41     ` Vivek Goyal
  2012-02-23 22:38       ` Tejun Heo
  0 siblings, 1 reply; 84+ messages in thread
From: Vivek Goyal @ 2012-02-23 19:41 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Christoph Hellwig

On Wed, Feb 22, 2012 at 10:33:51AM -0800, Tejun Heo wrote:

[..]
> 
> > So for blkio controller we can convert it into fully nested hierarchy
> > at the expense of more complex code in CFQ. I think memory cgroup
> > controller provides both flat and hierarchical mode. Keeping it fully
> > hierarchical also increases the cost as we need to traverse lot more
> > pointers for simple things like nested stats. On a system having
> > both systemd and libvirt, every virtual machine is already 3-4 level
> > deep in cgroup hierarchy.
> 
> I don't think every controller should implement full nesting and
> sharing the same hierarchy doesn't require it.  ie. if a controller
> only wants to support flat hierarchy, just allow a single subgroup to
> be active on any path between root and leaf.  We can add a flag or
> helpers to support such mode of operation and controllers themselves
> can treat all cgroups equally.

I am not sure I understand "allow a single subgroup to be active on any
path"

So if a hierarchy looks as follows.

				root
			        / | \
			      g1  g2 g3
			          |   |	
				  g4  g5

So you are saying that only one of g2 or g4 can be active on path 2, and
similarly only one of g3 or g5.  IOW, if a task is in g5 and g3 is the
active group, the task will effectively be considered to be in g3?  So in
the above diagram, if g1, g4 and g3 are the active groups, the controller
will see them as follows.

				root
				/ | \
			       g1 g4 g3
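
One way to read the "single active group per path" rule is sketched
below in Python.  The PARENT table mirrors the five-group tree drawn
above.  The function names are made up, and falling back to root when no
group on the path is active is an assumption, not something the thread
settles.

```python
# Parent links for the five-group tree drawn above (root has no parent).
PARENT = {"root": None, "g1": "root", "g2": "root", "g3": "root",
          "g4": "g2", "g5": "g3"}


def effective_group(group, active):
    """Return the group the controller would account a task to.

    Walk up from the task's own group until an active group is found;
    fall back to root if none is active on that path (an assumption).
    """
    node = group
    while node != "root" and node not in active:
        node = PARENT[node]
    return node


def single_active_per_path(active):
    """True if no active group has another active group as an ancestor."""
    for group in active:
        node = PARENT[group]
        while node is not None:
            if node in active:
                return False
            node = PARENT[node]
    return True


active = {"g1", "g4", "g3"}
for g in ("g1", "g2", "g3", "g4", "g5"):
    print(g, "->", effective_group(g, active))
```

With g1, g4 and g3 active, tasks in g5 are accounted to g3 and tasks in
g2 fall through to root, which matches the collapsed picture above.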

Did I understand it right, or did you mean something else?  But this is
still not flat and has 2 levels of hierarchy.  Tasks in the root group and
tasks in the child groups (g1, g2 and g3) are at different levels, hence
the controller needs to implement hierarchy.  For it to be truly flat, it
needs to look like this.
			     pivot_point
				/ | \  \
			       g1 g4 g3 root

Now the notion that only one group is active in each path from root to
leaf does not mean much.

Isn't it simpler to consider everything internally as flat?  So the cgroup
tree still might look hierarchical but the controller actually treats it
as follows.
				root
		              / / | \ \
			    g1 g2 g3 g4 g5

Well, the above is not exactly flat as it has 2 levels of hierarchy.  This
is how the blkio controller currently views the cgroup hierarchy.

				pivot_point
		              / / | \   \  \ 
			    g1 g2 g3 g4 g5 root

Thanks
Vivek

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-23  7:59   ` Li Zefan
@ 2012-02-23 20:32     ` Vivek Goyal
  0 siblings, 0 replies; 84+ messages in thread
From: Vivek Goyal @ 2012-02-23 20:32 UTC (permalink / raw)
  To: Li Zefan
  Cc: Tejun Heo, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Christoph Hellwig

On Thu, Feb 23, 2012 at 03:59:44PM +0800, Li Zefan wrote:
> > Trying to make all the controllers uniform in terms of their treatment
> > of cgroup hiearchy sounds like a good thing to do.
> 
> Agreed.
> 
> Apart from nesting cgroups, there're other inconsistencies.
> 
> - Some controllers disallow more than one cgroup layer. That's the new
> net_prio controller, and I don't know why it's made so, but I guess
> it's fine to eliminate this restriction.

You mean don't allow creating deeper levels in the cgroup hierarchy? That will
fail with libvirt + systemd as they create much deeper levels. I had to
change that for blkio.

> 
> - Some controllers move resource charges when a task is moved to
> a different cgroup, but some don't?

I think for some controllers it does not even apply. For cpu and blkio,
resources are renewable, so there is no moving around of charges.

Thanks
Vivek

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-23  9:41     ` Peter Zijlstra
  2012-02-23 14:13       ` Peter Zijlstra
@ 2012-02-23 21:38       ` Vivek Goyal
  2012-02-23 22:34         ` Tejun Heo
  2012-02-24 11:33         ` Peter Zijlstra
  1 sibling, 2 replies; 84+ messages in thread
From: Vivek Goyal @ 2012-02-23 21:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Christoph Hellwig

On Thu, Feb 23, 2012 at 10:41:34AM +0100, Peter Zijlstra wrote:
> On Wed, 2012-02-22 at 11:57 -0500, Vivek Goyal wrote:
> > 
> > Again, it does not mean I am advocating flat hiearchy. I am just wondering
> > in case of fully nested hierarchies (task at same level as groups), how
> > does one explain it to a layman user who understands things in terms of
> > % of resources. 
> 
> If your complete control is % based then I would assume its a % of a %.
> Simple enough.

But % of % will vary dynamically and not be static. So if root has got
100% of resources and we want 25% of that for a group, then hierarchy
might look as follows.

				root
				/ | \
			       T1 T2 g1

T1, T2 are tasks and g1 is the group needing 25% of root's resources. Now
the number of tasks running in parallel to g1 will determine its effective %,
and tasks come and go. So the only way to do this would be to move T1
and T2 into a child group under root and make sure new tasks don't show up
in root.

Otherwise creating a group under root does not ensure that you get a minimum
% of resources. It just makes sure that you can't get more than 25% of
resources when things are tight.
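
The arithmetic behind the varying share can be sketched in a few lines
of Python.  This assumes plain weight-proportional division among
sibling entities, with tasks and groups competing at the same level.
The weights (100 per entity, 300 for g0) and the group name g0 are
illustrative only, not kernel defaults.

```python
def effective_share(weight, sibling_weights):
    """Fraction of the parent's resource an entity gets under contention."""
    return weight / (weight + sum(sibling_weights))


G1, TASK = 100, 100   # illustrative weights, not kernel defaults

for n_tasks in (1, 3, 7):
    share = effective_share(G1, [TASK] * n_tasks)
    print(f"{n_tasks} sibling tasks -> g1 gets {share:.1%}")

# Moving the tasks into one sibling group g0 pins the ratio,
# no matter how many tasks g0 contains.
G0 = 300
print(f"g0 sibling group -> g1 gets {effective_share(G1, [G0]):.1%}")
```

With one, three and seven sibling tasks g1 gets 50%, 25% and 12.5%
respectively.  Once the tasks sit in a single sibling group of weight
300, g1 stays at 25% regardless of how many tasks come and go.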

> 
> If its bandwidth based then simply don't allow a child to consume more
> bandwidth than its parent, also simple.

In the case of absolute limits, things are somewhat simpler. A group is not
impacted by its peer tasks/groups that much. Well, there is still an issue:
how do all the children of a group share the resources? So assume the
following.

				  g1
				/ | \
			       T1 T2 g2

Assume g1 has a 100MB/s limit and g2 has a 90MB/s limit. Now how is this
100MB/s divided among T1, T2 and g2? Round robin, or proportional
division based on weights?  I think the cpu scheduler can do
proportional division as everything is implemented in a single layer. For
blkio, throttling is stacked on top of proportional. So I guess I can
do round robin between T1, T2 and g2 and also make sure the total of T1, T2
and g2 does not cross g1's bandwidth.

So the upper limit is not that big an issue. The proportional one does become
one, with the effective % varying dynamically.
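
A rough Python sketch of the upper-limit case follows.  It models
"round robin" as an even (max-min fair) split among children that all
want as much bandwidth as they are allowed, while never exceeding the
parent's limit.  The fair_split helper is hypothetical and this is not
how blkio throttling is actually implemented.

```python
import math


def fair_split(capacity, caps):
    """Max-min fair split of `capacity` among children with upper caps."""
    alloc = {}
    pending = dict(caps)
    remaining = float(capacity)
    while pending:
        share = remaining / len(pending)
        done = [name for name, cap in pending.items() if cap <= share]
        if not done:
            # Nobody is limited by its own cap: split what is left evenly.
            for name in pending:
                alloc[name] = share
            break
        for name in done:
            # A child capped below the even share keeps only its cap.
            alloc[name] = pending.pop(name)
            remaining -= alloc[name]
    return alloc


# g1 is limited to 100MB/s; g2 carries its own 90MB/s limit.
print(fair_split(100, {"T1": math.inf, "T2": math.inf, "g2": 90}))

# If T1 were limited to 10MB/s, T2 and g2 would share the rest.
print(fair_split(100, {"T1": 10, "T2": math.inf, "g2": 90}))
```

In both cases the children's total stays at g1's 100MB/s, and g2's own
90MB/s limit only matters once its even share would exceed it.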

Thanks
Vivek

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-23 21:38       ` Vivek Goyal
@ 2012-02-23 22:34         ` Tejun Heo
  2012-02-28 21:16           ` Vivek Goyal
  2012-02-24 11:33         ` Peter Zijlstra
  1 sibling, 1 reply; 84+ messages in thread
From: Tejun Heo @ 2012-02-23 22:34 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Peter Zijlstra, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Christoph Hellwig

On Thu, Feb 23, 2012 at 04:38:47PM -0500, Vivek Goyal wrote:
> On Thu, Feb 23, 2012 at 10:41:34AM +0100, Peter Zijlstra wrote:
> > On Wed, 2012-02-22 at 11:57 -0500, Vivek Goyal wrote:
> > > 
> > > Again, it does not mean I am advocating flat hiearchy. I am just wondering
> > > in case of fully nested hierarchies (task at same level as groups), how
> > > does one explain it to a layman user who understands things in terms of
> > > % of resources. 
> > 
> > If your complete control is % based then I would assume its a % of a %.
> > Simple enough.
> 
> But % of % will vary dynamically and not be static. So if root has got
> 100% of resources and we want 25% of that for a group, then hierarchy
> might look as follows.

It is complex but semantics is pretty well defined.  It should behave
exactly the same as HTB.  Whether the complexity would be justifiable
is a different issue.

Thanks.

-- 
tejun

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-23 19:41     ` Vivek Goyal
@ 2012-02-23 22:38       ` Tejun Heo
  0 siblings, 0 replies; 84+ messages in thread
From: Tejun Heo @ 2012-02-23 22:38 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Christoph Hellwig

Hello,

On Thu, Feb 23, 2012 at 02:41:10PM -0500, Vivek Goyal wrote:
> Considering everything internally as flat, isn't it simpler. So cgroup 
> tree still might look hierarchical but actually controller treats it
> as.
> 				root
> 		              / / | \ \
> 			    g1 g2 g3 g4 g5

I don't know.  Mixing the above with controllers which implement
proper nesting makes my head explode (why is there a hierarchy at
all?).  Root is always special anyway.  Just treating root differently
and collapsing the rest of hierarchies should do, right?

Thanks.

-- 
tejun

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-23 21:38       ` Vivek Goyal
  2012-02-23 22:34         ` Tejun Heo
@ 2012-02-24 11:33         ` Peter Zijlstra
  1 sibling, 0 replies; 84+ messages in thread
From: Peter Zijlstra @ 2012-02-24 11:33 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Tejun Heo, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Christoph Hellwig

On Thu, 2012-02-23 at 16:38 -0500, Vivek Goyal wrote:
> > > Again, it does not mean I am advocating flat hiearchy. I am just wondering
> > > in case of fully nested hierarchies (task at same level as groups), how
> > > does one explain it to a layman user who understands things in terms of
> > > % of resources. 
> > 
> > If your complete control is % based then I would assume its a % of a %.
> > Simple enough.
> 
> But % of % will vary dynamically and not be static. So if root has got
> 100% of resources and we want 25% of that for a group, then hierarchy
> might look as follows.
> 
>                                 root
>                                 / | \
>                                T1 T2 g1
> 
> T1, T2 are tasks and g1 is the group needing 25% of root's resources. Now
> number of tasks running in parallel to g1 will determine its effective %
> and tasks come and go. So the only way to do this would be that move T1
> and T2 in a child group under root and make sure new tasks don't show up
> in root. 

Which is exactly what the scheduler stuff does.. so tough luck for the
sysad who can't grasp it.

> Otherwise creating a group under root does not ensure that you get minimum
> % of resource. It just makes sure that you can't get more than 25% of
> % resources when things are tight. 

You never said anything about minimum resource guarantees in the initial
problem statement.

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-21 21:21 ` Tejun Heo
  2012-02-22 13:34   ` Glauber Costa
@ 2012-02-26  4:59   ` Konstantin Khlebnikov
  1 sibling, 0 replies; 84+ messages in thread
From: Konstantin Khlebnikov @ 2012-02-26  4:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Christoph Hellwig

Tejun Heo wrote:
> Sorry, forgot to cc hch.  Cc'ing him and quoting whole message.
>
> On Tue, Feb 21, 2012 at 01:19:38PM -0800, Tejun Heo wrote:
>> Hello, guys.
>>
>> I've been thinking about multiple hierarchy support in cgroup for a
>> while, especially after Frederic's pending task counter patchset.
>> This is a write up of what I've been thinking.  I don't know what to
>> do yet and simply continuing the current situation definitely is an
>> option, so please read on and throw in your 20 Won (or whatever amount
>> in whatever currency you want).
>>
>> * The problems.
>>
>> The support for multiple process hierarchies always struck me as
>> rather strange.  If you forget about the current cgroup controllers
>> and their implementations, the *only* reason to support multiple
>> hierarchies is if you want to apply resource limits based on different
>> orthogonal categorizations.
>>
>> Documentation/cgroups.txt seems to be written with this consideration
>> on mind.  It's giving an example of applying limits accoring to two
>> orthogonal categorizations - user groups (profressors, students...)
>> and applications (WWW, NFS...).  While it may sound like a valid use
>> case, I'm very skeptical how useful or common mixing such orthogonal
>> categorizations in a single setup would be.
>>
>> If support for multiple hierarchies comes for free, at least in terms
>> of features, maybe it can be better but of course it isn't so.  Any
>> given cgroup subsystem (or controller) can only be applied to a single
>> hierarchy, which makes sense for a lot of things - what would two
>> different limits on the same resource from different hierarchies mean?
>> But, there also are things which can be used and useful in all
>> hierarchies - e.g. cgroup freezer and task counter.
>>
>> While the current cgroup implementation and conventions can probably
>> allow admins and engineers to tailor cgroup configuration for a
>> specific setup, it is very difficult to use in generic and automated
>> way.  I mean, who owns the freezer or task counter?  If they're
>> mounted on their own hierarchies, how should they be structured?
>> Should the different hierarchies be structured such that they are
>> projections of one unified hierarchy so that those generic mechanisms
>> can be applied uniformly?  If so, why do we need multiple hierarchies
>> at all?

We can keep orthogonal categorization in a single hierarchy if we allow a task
to live in several cgroups simultaneously, with each controller in an independent
cgroup.  Task-to-cgroup links are already organized through css, which can store
any combination of subsystems. I think it might be easier than the current
multiple hierarchies.
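
A toy Python model of that idea: each task points at one shared record
that maps every controller to a cgroup, and tasks with the same
combination share the record.  CssSetTable, attach() and the paths are
invented for illustration and only loosely echo the kernel's css_set.

```python
class CssSetTable:
    """Interns controller->cgroup combinations so equal ones are shared."""

    def __init__(self):
        self._sets = {}

    def lookup(self, mapping):
        # Equal combinations map to the same stored dict object.
        key = frozenset(mapping.items())
        return self._sets.setdefault(key, dict(mapping))


table = CssSetTable()
tasks = {}   # pid -> shared controller->cgroup record


def attach(pid, controller, path):
    """Move one task to `path` for a single controller only."""
    current = tasks.get(pid, {})
    tasks[pid] = table.lookup({**current, controller: path})


attach(100, "memory", "/students")
attach(100, "cpu", "/www")
attach(101, "memory", "/students")
attach(101, "cpu", "/www")
attach(102, "memory", "/professors")

print(tasks[100])
print(tasks[100] is tasks[101])
```

Tasks 100 and 101 end up sharing one record, while task 102 has a memory
placement and no cpu placement at all, all within a single tree of paths.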

>>
>> A related limitation is that as different subsystems don't know which
>> hierarchies they'll end up on, they can't cooperate.  Wouldn't it make
>> more sense if task counter is a separate thing watching the resources
>> and triggers different actions as conifgured - be it failing forks or
>> freezing?
>>
>> And yet another oddity is how cgroup handles nested cgroups - some
>> care about nesting but others just treat both internal and leaf nodes
>> equally.  They don't care about the topology at all.  This, too, can
>> be fine if you approach things subsys by subsys and use them in
>> different ways but if you try to combine them in generic way you get
>> sucked into the lala land of whatevers.
>>
>> The following is a "best practices" document on using cgroups.
>>
>>    http://www.freedesktop.org/wiki/Software/systemd/PaxControlGroups
>>
>> To me, it seems to demonstrate the rather ugly situation that the
>> current cgroup is providing.  Everyone should tip-toe around cgroup
>> hierarchies and nobody has full knowledge or control over them.
>> e.g. base system management (e.g. systemd) can't use freezer or task
>> counter as someone else might want to use it for different hierarchy
>> layout.
>>
>> It seems to me that cgroup interface is too complicated and inflexible
>> at the same time to be useful in generic manner.  Sure, it can be
>> useful for setups individually crafted by engineers and admins to
>> match specific sites or applications but as soon as you try to do
>> something automatic and generic with it, there just are too many
>> different scenarios and limitations to consider.
>>
>>
>> * So, what to do?
>>
>> Heh, I don't know.  IIRC, last year at LinuxCon Japan, I heard
>> Christoph saying that the biggest problem w/ cgroup was that it was
>> building completely separate hierarchies out of the traditional
>> process hierarchies.  After thinking about this stuff for a while, I
>> fully agree with him.  I think this whole thing should have been a
>> layer over the process tree like sessions or program groups.

I agree too. Zombies can not live in cgroups, this is not fair!
It seems that, to integrate cgroups into normal process hierarchies, we should
link cgroup-css with struct pid rather than struct task.
Struct pid is always rcu-protected and well managed. This change should
simplify cgroup iteration and allow us to drop the ugly "use_task_css_set_links"
together with "css_set_lock" on fork/exit paths.

>>
>> Unfortunately, that ship sailed long ago and we gotta make do with
>> what we have on our collective hands.  Here are some paths that we can
>> take.
>>
>> 1. We're screwed anyway.  Just don't worry about it and continue down
>>     on this path.  Can't get much worse, right?
>>
>>     This approach has the apparent advantage of not having to do
>>     anything and is probably most likely to be taken.  This isn't ideal
>>     but hey nothing is. :P
>>
>> 2. Make it more flexible (and likely more complex, unfortunately).
>>     Allow the utility type subsystems to be used in multiple
>>     hierarchies.  The easiest and probably dirtiest way to achieve that
>>     would be embedding them into cgroup core.
>>
>>     Thinking about doing this depresses me and it's not like I have a
>>     cheerful personality to begin with. :(
>>
>> 3. Head towards single hierarchy with the pie-in-the-sky goal of
>>     merging things into process hierarchy in some distant future.
>>
>>     The first step would be herding people to use a unified hierarchy
>>     (ie. all subsystems mounted on a single cgroup tree) which is
>>     controlled by single entity in userland (be it systemd or cgroupd,
>>     cgroup-kit or whatever); however, even if we exclude supporting
>>     orthogonal categorizations, there are good number of non-trivial
>>     hurdles to clear before this can be realized.
>>
>>     Most importantly, we would need to clean up how nesting is handled
>>     across different subsystems.  Handling internal and leaf nodes as
>>     equals simply can't work.  Membership should be recursive, and for
>>     subsystems which can't support proper nesting, the right thing to
>>     do would be somehow ensuring that only single node in the path from
>>     root to leaf is active for the controller.  We may even have to
>>     introduce an alternative of operation to support this (yuck).
>>
>>     This path would require the most amount of work and we would be
>>     excluding a feature - support for multiple orthogonal
>>     categorizations - which has been available till now, probably
>>     through deprecation process spanning years; however, this at least
>>     gives us hope that we may reach sanity in the end, how distant that
>>     end may be.  Oh, hope. :)
>>
>> So, I mean, I don't know.  What do other people think?  Is this a
>> unnecessary worry?  Are people generally happy with the way things
>> are?  Lennart, Kay, what do you guys think?
>>
>> Thanks.
>>
>> --
>> tejun
>


* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-22 18:22   ` Tejun Heo
@ 2012-02-27 17:46     ` Frederic Weisbecker
  0 siblings, 0 replies; 84+ messages in thread
From: Frederic Weisbecker @ 2012-02-27 17:46 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, linux-kernel, Paul Menage

On Wed, Feb 22, 2012 at 10:22:07AM -0800, Tejun Heo wrote:
> Hey, Frederic.
> 
> On Wed, Feb 22, 2012 at 04:45:04PM +0100, Frederic Weisbecker wrote:
> > > A related limitation is that as different subsystems don't know which
> > > hierarchies they'll end up on, they can't cooperate.  Wouldn't it make
> > > more sense if task counter is a separate thing watching the resources
> > > and triggers different actions as conifgured - be it failing forks or
> > > freezing?
> > 
> > For this particular example, I think we'd better have a file in which
> > a task can poll and get woken up when the task limit has been reached.
> > Then that task can decide to freeze or whatever.
> 
> Yes, that may be a solution but to "guarantee" that the limit is never
> breached, we need to stop it first somehow.  Probably making freezing
> the default behavior with userland notifier (inotify event should
> suffice) should do, which we can't do now. :(

The limit can't be breached because forks are rejected once we reach the
limit given by the user.

With this rejection, another task can take control and freeze the
cgroup.
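
The reject-then-react flow can be sketched in Python as below.  The
TaskCounter class, the on_reject hook and charging every ancestor on
fork are assumptions made for illustration; this is not the code from
the task counter patchset.

```python
class TaskCounter:
    """Toy per-cgroup task counter with an optional parent counter."""

    def __init__(self, limit, parent=None, on_reject=None):
        self.limit = limit
        self.parent = parent
        self.on_reject = on_reject   # hypothetical notification hook
        self.count = 0

    def _chain(self):
        # This counter followed by all of its ancestors.
        node = self
        while node is not None:
            yield node
            node = node.parent

    def try_fork(self):
        """Charge one task to this group and its ancestors, or reject."""
        for node in self._chain():
            if node.count >= node.limit:
                if node.on_reject:
                    node.on_reject(node)
                return False
        for node in self._chain():
            node.count += 1
        return True

    def task_exit(self):
        """Uncharge one task from this group and its ancestors."""
        for node in self._chain():
            node.count -= 1


to_freeze = []   # stands in for a watcher that reacts to rejections
root = TaskCounter(limit=3)
child = TaskCounter(limit=2, parent=root, on_reject=to_freeze.append)

results = [child.try_fork() for _ in range(3)]
print(results)   # [True, True, False]
```

The third fork is refused before the count can exceed the limit, and the
watcher learns which group hit it, so freezing can happen afterwards
without the limit ever being breached.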

> 
> > > 1. We're screwed anyway.  Just don't worry about it and continue down
> > >    on this path.  Can't get much worse, right?
> > > 
> > >    This approach has the apparent advantage of not having to do
> > >    anything and is probably most likely to be taken.  This isn't ideal
> > >    but hey nothing is. :P
> > 
> > Thing is we have an ABI now and it has been there for a while now. Aren't
> > we stuck with it? I'm no big fan of that multiple hierarchies thing either
> > but now I fear we have to support it.
> 
> Well, yes and no.  While maintaining userland ABI is very important,
> its importance isn't infinite and there are different types of
> userland ABIs.  We definitely don't want to screw with syscalls.  We
> should keep userland visible dynamic files which are used by common
> usertools stable at almost all costs.  When it comes over to system
> interface which is used mostly by base system tools, it can be a bit
> flexible.  If the ABI in question is an optional thing, we probably
> can be slightly more flexible.

But cgroups fall into the general-purpose category to me, not something
that is used only by a small circle of a few well-known and well-defined tools.

> We of course can't change things drastically.  It should be done
> carefully with rather long deprecation period, but it can be done and
> in fact isn't too uncommon.  Stuff under /sysfs tends to be somewhat
> volatile and sysfs itself went through several ABI incompatible
> iterations.
> 
> So, we can transition in baby steps.  e.g. we can first implement
> proper nesting behavior without changing the default behavior and then
> the base system can be updated to mount and control all subsystems by
> default (with configuration opt-outs) so that the hierarchy reflects
> pstree, effectively driving people away from multiple hierarchies and
> we can implement new features assuming the new structure.  After a few
> years, the kernel can start whining about non-start hierarchies and
> then eventually remove the support.  It's a long process but
> definitely doable.

Well, if we can I'll be glad.

> 
> > > 2. Make it more flexible (and likely more complex, unfortunately).
> > >    Allow the utility type subsystems to be used in multiple
> > >    hierarchies.  The easiest and probably dirtiest way to achieve that
> > >    would be embedding them into cgroup core.
> > > 
> > >    Thinking about doing this depresses me and it's not like I have a
> > >    cheerful personality to begin with. :(
> > 
> > Another solution is to support a class of multi-bindable subsystems as in
> > this old patch from Paul:
> > 
> > 	https://lkml.org/lkml/2009/7/1/578
> 
> Heh, yeah, this would be closer to the proper way to achieve
> multi-attach but I can't help feeling that this just buries ourselves
> deeper into s*it and we're already knee-deep.  If multiple hierarchies
> is an essential feature, maybe, but, if it's not, and I'm extremely
> skeptical that it is, why the hell would we want to go that way?

I don't know, it just depends on what will happen with these multiple
hierarchies.

> 
> > It sounds to me more healthy to iterate only over subsystems in fork/exit.
> > We probably don't want to add a new iteration over cgroups themselves
> > on these fast path.
> 
> Hmmm?  Don't follow why this is relevant.

If you make something a cgroup core feature instead of a subsystem and you
need to do something on these cgroups during forks, then you need to
iterate over these as well as the subsystems.

Typically, adding yet another loop on fork is not very welcome.

> 
> Thanks.
> 
> -- 
> tejun

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-23 22:34         ` Tejun Heo
@ 2012-02-28 21:16           ` Vivek Goyal
  2012-02-28 21:21             ` Peter Zijlstra
  0 siblings, 1 reply; 84+ messages in thread
From: Vivek Goyal @ 2012-02-28 21:16 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Christoph Hellwig

On Thu, Feb 23, 2012 at 02:34:57PM -0800, Tejun Heo wrote:
> On Thu, Feb 23, 2012 at 04:38:47PM -0500, Vivek Goyal wrote:
> > On Thu, Feb 23, 2012 at 10:41:34AM +0100, Peter Zijlstra wrote:
> > > On Wed, 2012-02-22 at 11:57 -0500, Vivek Goyal wrote:
> > > > 
> > > > Again, it does not mean I am advocating flat hiearchy. I am just wondering
> > > > in case of fully nested hierarchies (task at same level as groups), how
> > > > does one explain it to a layman user who understands things in terms of
> > > > % of resources. 
> > > 
> > > If your complete control is % based then I would assume its a % of a %.
> > > Simple enough.
> > 
> > But % of % will vary dynamically and not be static. So if root has got
> > 100% of resources and we want 25% of that for a group, then hierarchy
> > might look as follows.
> 
> It is complex but semantics is pretty well defined.  It should behave
> exactly the same as HTB.  Whether the complexity would be justifiable
> is a different issue.

I don't know much about HTB, but a quick read on the internet seems to
suggest that the hierarchy we set up is pretty static and does not change
as more tasks come in or go out. That means the share/configured bandwidth
of each queue in the hierarchy is fixed until and unless the tree is changed.

But in this case, if tasks and groups are treated at the same level, things
are not static and the % share will change dynamically.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-28 21:16           ` Vivek Goyal
@ 2012-02-28 21:21             ` Peter Zijlstra
  2012-02-28 21:35               ` Vivek Goyal
  0 siblings, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2012-02-28 21:21 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Tejun Heo, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Christoph Hellwig

On Tue, 2012-02-28 at 16:16 -0500, Vivek Goyal wrote:
> 
> But in this case, if task and groups are treated at same level, things
> are not static and % share will change dynamically. 

which is exactly how the scheduler stuff behaves for the proportional
bits.. so there's no reason not to do it too.



^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-28 21:21             ` Peter Zijlstra
@ 2012-02-28 21:35               ` Vivek Goyal
  2012-02-28 21:43                 ` Peter Zijlstra
  2012-02-28 21:53                 ` Peter Zijlstra
  0 siblings, 2 replies; 84+ messages in thread
From: Vivek Goyal @ 2012-02-28 21:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Christoph Hellwig

On Tue, Feb 28, 2012 at 10:21:40PM +0100, Peter Zijlstra wrote:
> On Tue, 2012-02-28 at 16:16 -0500, Vivek Goyal wrote:
> > 
> > But in this case, if task and groups are treated at same level, things
> > are not static and % share will change dynamically. 
> 
> which is exactly how the scheduler stuff behaves for the proportional
> bits.. so there's no reason not to do it too.

Yes, this is how the scheduler handles hierarchy: it treats tasks and groups
at the same level. Tejun was giving the example of HTB and I was saying that
there the classes/queues, or whatever they are called, seem to be static and
are not created dynamically as tasks come and go. So it's not the same.

So coming back to the scheduler, handling tasks and groups at the same level
only provides us with a notion of priority for a group. It does not provide
any notion of % (neither minimum nor maximum). To calculate the % one needs
to know the proportional share/weight of all entities at the same level, and
currently the number of entities varies, hence the % share can't be determined.
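
To put rough numbers on this, here is a minimal Python sketch. It is not
kernel code: share() is a made-up helper, and the only values taken from the
real system are the default weight of 1024 for both a group (cpu.shares) and a
nice-0 task. It shows how one group's share shrinks as more tasks appear next
to it at the same level.

```python
# Hypothetical helper, not kernel code: fraction of CPU an entity gets
# among its siblings under pure proportional (weight-based) sharing.
def share(weight, sibling_weights):
    """Return weight divided by the total weight of all entities at this level."""
    total = sum(sibling_weights)
    return weight / total

GROUP_WEIGHT = 1024   # assumed: default cpu.shares of a group
TASK_WEIGHT = 1024    # assumed: weight of a nice-0 task

# One group competing with a varying number of tasks at the same level.
for ntasks in (1, 3, 7):
    siblings = [GROUP_WEIGHT] + [TASK_WEIGHT] * ntasks
    pct = 100 * share(GROUP_WEIGHT, siblings)
    print(f"{ntasks} task(s) next to the group: group gets {pct:.1f}%")
```

The weight of the group never changes here, yet its percentage does, purely
because the number of sibling entities changed.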

Whether it is a good thing or a bad thing, I don't know. I think the previous
design was allocating a group for every user. I guess in that case we
will have a fixed % share for each user (until and unless users are
created/removed).

So I don't know what the right behavior is. With this discussion, I am just
trying to make explicit what to expect out of cgroup controllers. For the
cpu controller, it is priority at the group level, with no fixed
minimum/maximum % shares. And that's a limitation of treating tasks and
groups at the same level.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-28 21:35               ` Vivek Goyal
@ 2012-02-28 21:43                 ` Peter Zijlstra
  2012-02-28 21:54                   ` Vivek Goyal
  2012-02-28 21:53                 ` Peter Zijlstra
  1 sibling, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2012-02-28 21:43 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Tejun Heo, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Christoph Hellwig

On Tue, 2012-02-28 at 16:35 -0500, Vivek Goyal wrote:
> For
> cpu controller, it is priority at the group level no fixed minimum/maximum
> % shares. And that's a limitation of treating task and group at same level.

Depends on what you mean by min/max %, you can do it on the group level
by using bandwidth caps (for max) or inverted (max on everybody else,
for min).



^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-28 21:35               ` Vivek Goyal
  2012-02-28 21:43                 ` Peter Zijlstra
@ 2012-02-28 21:53                 ` Peter Zijlstra
  2012-02-28 22:09                   ` Vivek Goyal
  1 sibling, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2012-02-28 21:53 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Tejun Heo, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Christoph Hellwig

On Tue, 2012-02-28 at 16:35 -0500, Vivek Goyal wrote:
> Yes this is how scheduler does to handle hierarchy. Treat task and group
> at same level. 

...

> Whether it is a good thing or bad thing, I don't know. 

That's IMO what the cgroupfs interface provides for. If you do anything
different, there's this shadow group that contains the tasks, for which
you then have to provide extra parameter control.

Furthermore, by treating tasks and groups at the same level you can
create the extra group, but you can't do the reverse. So it's the more
versatile solution as well.

> I think previous
> design was allocating a group for every user. I guess, in that case we
> will have fixed % share of each user (until and unless users are created/
> removed). 

Not even, it depended on if the user had anything runnable or not. It
was very much like the current cgroup stuff if you create a cgroup for
each user and stick the tasks in.

The cpu-cgroup stuff is purely runnable based, so every wakeup/sleep
changes the entire weight distribution, yay! :-)
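
A toy model of that behaviour, as a hedged Python sketch: the dict layout
and the user names are invented for illustration and do not mirror any kernel
structure. The only assumption carried over from the text is that entities
with nothing runnable do not take part in the weight distribution.

```python
# Illustrative model only; the data layout is an assumption.
def runnable_shares(entities):
    """entities maps name -> (weight, runnable). Only runnable entities compete."""
    active = {name: w for name, (w, runnable) in entities.items() if runnable}
    total = sum(active.values())
    return {name: w / total for name, w in active.items()}

users = {"alice": (1024, True), "bob": (1024, True), "carol": (2048, True)}
print(runnable_shares(users))    # all three groups have runnable tasks

users["carol"] = (2048, False)   # carol's last runnable task goes to sleep
print(runnable_shares(users))    # alice and bob now split the CPU between them
```

A single sleep changes everybody's fraction even though no weight was touched.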

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-28 21:43                 ` Peter Zijlstra
@ 2012-02-28 21:54                   ` Vivek Goyal
  2012-02-28 22:00                     ` Peter Zijlstra
  0 siblings, 1 reply; 84+ messages in thread
From: Vivek Goyal @ 2012-02-28 21:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Christoph Hellwig

On Tue, Feb 28, 2012 at 10:43:54PM +0100, Peter Zijlstra wrote:
> On Tue, 2012-02-28 at 16:35 -0500, Vivek Goyal wrote:
> > For
> > cpu controller, it is priority at the group level no fixed minimum/maximum
> > % shares. And that's a limitation of treating task and group at same level.
> 
> Depends on what you mean by min/max %, you can do it on the group level
> by using bandwidth caps (for max) or inverted (max on everybody else,
> for min).

I was referring to using a pure proportional controller. Max bandwidth is
new and I am looking for a quick documentation file which describes
what the knobs are and how to use them. I did not find any in
Documentation/cgroups/. Is there any documentation available?

I am assuming that the max is specified for groups in some absolute
quantity. That is fine. It will still not be a max %, as again for % you
need a fixed number of entities at any level and that's not the case with
tasks.

A minimum for one group (max for everyone else) will also only work if
tasks and groups are not at the same level.

I think the only way to get a fixed % share is not to put tasks and groups
at the same level during system configuration.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-28 21:54                   ` Vivek Goyal
@ 2012-02-28 22:00                     ` Peter Zijlstra
  2012-02-28 22:31                       ` Vivek Goyal
  0 siblings, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2012-02-28 22:00 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Tejun Heo, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Christoph Hellwig

On Tue, 2012-02-28 at 16:54 -0500, Vivek Goyal wrote:
> On Tue, Feb 28, 2012 at 10:43:54PM +0100, Peter Zijlstra wrote:
> > On Tue, 2012-02-28 at 16:35 -0500, Vivek Goyal wrote:
> > > For
> > > cpu controller, it is priority at the group level no fixed minimum/maximum
> > > % shares. And that's a limitation of treating task and group at same level.
> > 
> > Depends on what you mean by min/max %, you can do it on the group level
> > by using bandwidth caps (for max) or inverted (max on everybody else,
> > for min).
> 
> I was referring to using pure proportional controller. max bandwidth is
> new and I am looking for a quick documentation file which describes
> what are the knobs and how to use it. Did not find any in
> Documentation/cgroups/. Is there any documentation available?

It's written in C, it's at kernel/sched/fair.c ;-)

> I am assuming that max are being specified for groups in some absolute
> quantity. That is fine. It will not still be max %, as again for % you
> need fixed number of entities at any level and that's not the case with
> tasks.
> 
> Minimum for one group (max for everyone else) will also only work if 
> task and groups are not at same level.

I'm really not seeing this.

> I think the only way to get fixed % share is not to put task and group
> at same level during system configuration.

Still doesn't matter; like I said, it's all runnable based. If a group has
0 runnable entities it doesn't exist (more or less).

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-28 21:53                 ` Peter Zijlstra
@ 2012-02-28 22:09                   ` Vivek Goyal
  0 siblings, 0 replies; 84+ messages in thread
From: Vivek Goyal @ 2012-02-28 22:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Christoph Hellwig

On Tue, Feb 28, 2012 at 10:53:59PM +0100, Peter Zijlstra wrote:
> On Tue, 2012-02-28 at 16:35 -0500, Vivek Goyal wrote:
> > Yes this is how scheduler does to handle hierarchy. Treat task and group
> > at same level. 
> 
> ...
> 
> > Whether it is a good thing or bad thing, I don't know. 
> 
> That's IMO what the cgroupfs interface provides for, if you do anything
> different there's this shadow group that contains the tasks for which
> you then have to provide extra parameter control.
> 
> Furthermore, by treating tasks and groups at the same level you can
> create the extra group, but you can't do the reverse. So its the more
> versatile solution as well.

Agreed that it is more versatile. And one can move all the tasks to a
new group to achieve what a shadow group will do.

The only question is what a good default is. If we are thinking of dividing
resources in terms of % and writing a user space tool, then in the default
model we just don't know what the % is. Maybe it is a dynamically varying
% and should be shown accordingly.

Or, if the idea of a minimum % of proportional bandwidth is more natural, then
we shall have to change userspace and things like systemd to not run
any tasks in /. Then a user space tool can go through the cgroup hierarchy,
calculate the minimum % share of a group, and display it.
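
Such a tool could look roughly like the Python sketch below. Everything in it
is hypothetical: the tree, the group names and the weights are made up, and it
assumes that no task runs directly in an internal group (including /), so the
set of competing entities at each level is fixed. Under that assumption the
minimum share of a group is the "% of a %" product along its path.

```python
# Hypothetical model of a cgroup tree: name -> (parent, weight).
# Assumes no tasks live directly in internal groups, including "/".
TREE = {
    "/":       (None, 1024),
    "/G1":     ("/", 1024),
    "/G2":     ("/", 3072),
    "/G2/web": ("/G2", 1024),
    "/G2/db":  ("/G2", 1024),
}

def min_share(tree, name):
    """Worst-case fraction for `name`: every sibling at every level is busy."""
    parent, weight = tree[name]
    if parent is None:
        return 1.0
    # Total weight of all groups sharing the same parent, `name` included.
    level_total = sum(w for p, w in tree.values() if p == parent)
    return min_share(tree, parent) * weight / level_total

for group in sorted(TREE):
    print(f"{group:10s} {100 * min_share(TREE, group):5.1f}%")
```

As soon as a task is allowed to sit next to G1 and G2 in /, level_total is no
longer known in advance and the computed figure stops being a guarantee.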

> 
> > I think previous
> > design was allocating a group for every user. I guess, in that case we
> > will have fixed % share of each user (until and unless users are created/
> > removed). 
> 
> Not even, it depended on if the user had anything runnable or not. It
> was very much like the current cgroup stuff if you create a cgroup for
> each user and stick the tasks in.
> 
> The cpu-cgroup stuff is purely runnable based, so every wakeup/sleep
> changes the entire weight distribution, yay! :-)

:-). That's fine. If a group is not using its bandwidth because there is
no runnable task, then other groups get more cpu. I thought that's the 
proportional definition.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-28 22:00                     ` Peter Zijlstra
@ 2012-02-28 22:31                       ` Vivek Goyal
  0 siblings, 0 replies; 84+ messages in thread
From: Vivek Goyal @ 2012-02-28 22:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Christoph Hellwig

On Tue, Feb 28, 2012 at 11:00:21PM +0100, Peter Zijlstra wrote:
> On Tue, 2012-02-28 at 16:54 -0500, Vivek Goyal wrote:
> > On Tue, Feb 28, 2012 at 10:43:54PM +0100, Peter Zijlstra wrote:
> > > On Tue, 2012-02-28 at 16:35 -0500, Vivek Goyal wrote:
> > > > For
> > > > cpu controller, it is priority at the group level no fixed minimum/maximum
> > > > % shares. And that's a limitation of treating task and group at same level.
> > > 
> > > Depends on what you mean by min/max %, you can do it on the group level
> > > by using bandwidth caps (for max) or inverted (max on everybody else,
> > > for min).
> > 
> > I was referring to using pure proportional controller. max bandwidth is
> > new and I am looking for a quick documentation file which describes
> > what are the knobs and how to use it. Did not find any in
> > Documentation/cgroups/. Is there any documentation available?
> 
> Its written in C, its at kernel/sched/fair.c ;-)

/me does not know enough of the scheduler code to parse it fast. But now I
have found Documentation/scheduler/sched-bwc.txt, which explains what max
bandwidth control is.

> 
> > I am assuming that max are being specified for groups in some absolute
> > quantity. That is fine. It will not still be max %, as again for % you
> > need fixed number of entities at any level and that's not the case with
> > tasks.
> > 
> > Minimum for one group (max for everyone else) will also only work if 
> > task and groups are not at same level.
> 
> I'm really not seeing this.

Assume I have the following hierarchy.

				 root
			       /  |  \
			   Tasks  G1  G2

Assume I have given an upper limit of 20% to G1 (period=50ms, quota=10ms);
that still does not guarantee a minimum bandwidth for G2. If I put an
upper limit on G2 as well, then I am assuming it just gives a minimum
guarantee for any tasks in root. (Right now I am assuming just 1 CPU in the
system for simplicity.)

So to get a minimum % bandwidth guarantee for G2, I shall have to move
"Tasks" into a child group. Now both the upper limit and the proportional
weight on G3 will help to determine the minimum % share of G2.

				 root
			       /  |  \
			      G3  G1  G2
			      |
			     Tasks
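
The arithmetic behind this can be sketched in Python as follows. This is only
a model under the assumptions already stated (a single CPU, no tasks left
directly in root); cap() and min_guarantee() are made-up helper names, and the
15ms quota for G3 is an invented value chosen for the example.

```python
# Sketch under stated assumptions: one CPU, every group other than the one
# of interest has a bandwidth cap, and no tasks run directly in root.
def cap(quota_ms, period_ms):
    """Upper limit of a group as a fraction of one CPU."""
    return quota_ms / period_ms

def min_guarantee(caps, group):
    """CPU left for `group` when every other capped group uses its full cap."""
    others = sum(c for name, c in caps.items() if name != group)
    return max(0.0, 1.0 - others)

# G1 uses the values from the text; G3's quota is hypothetical.
caps = {"G1": cap(10, 50), "G3": cap(15, 50)}
print(f"G1 max: {100 * caps['G1']:.0f}%")
print(f"G2 min: {100 * min_guarantee(caps, 'G2'):.0f}%")
```

With "Tasks" still sitting in root, there is no cap to add to the sum for
them, which is why the minimum for G2 cannot be computed in the first layout.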

Please correct me if I have understood the whole thing wrong.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-23 14:13       ` Peter Zijlstra
@ 2012-03-01 17:19         ` Michal Schmidt
  2012-03-01 18:03           ` Peter Zijlstra
  2012-03-01 20:26           ` Mike Galbraith
  0 siblings, 2 replies; 84+ messages in thread
From: Michal Schmidt @ 2012-03-01 17:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vivek Goyal, Tejun Heo, Li Zefan, containers, cgroups,
	Andrew Morton, Kay Sievers, Lennart Poettering,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

On 23.2.2012 15:13, Peter Zijlstra wrote:
> My utter disregard for cgroups comes from having to actually implement a
> controller for them, its a frigging nightmare. The systemd retards
> mandating all this nonsense for booting a machine is completely bonghit
> inspired and hasn't made me feel any better about it.

systemd requires only CONFIG_CGROUPS=y. It does not need any controllers.

The insults are entirely unnecessary.

Michal

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-01 17:19         ` Michal Schmidt
@ 2012-03-01 18:03           ` Peter Zijlstra
  2012-03-02 11:08             ` Michal Schmidt
  2012-03-01 20:26           ` Mike Galbraith
  1 sibling, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2012-03-01 18:03 UTC (permalink / raw)
  To: Michal Schmidt
  Cc: Vivek Goyal, Tejun Heo, Li Zefan, containers, cgroups,
	Andrew Morton, Kay Sievers, Lennart Poettering,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

On Thu, 2012-03-01 at 18:19 +0100, Michal Schmidt wrote:
> On 23.2.2012 15:13, Peter Zijlstra wrote:
> > My utter disregard for cgroups comes from having to actually implement a
> > controller for them, its a frigging nightmare. The systemd retards
> > mandating all this nonsense for booting a machine is completely bonghit
> > inspired and hasn't made me feel any better about it.
> 
> systemd requires only CONFIG_CGROUPS=y. It does not need any controllers.

And that makes it better how?

> The insults are entirely unnecessary.

I think not, booting a machine should depend on the smallest possible
subset of features. Doing anything else is completely bonkers.

Luckily it looks like the Debian guys made the right decision -- albeit
for the wrong reasons.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-01 17:19         ` Michal Schmidt
  2012-03-01 18:03           ` Peter Zijlstra
@ 2012-03-01 20:26           ` Mike Galbraith
  2012-03-01 21:02             ` Vivek Goyal
                               ` (2 more replies)
  1 sibling, 3 replies; 84+ messages in thread
From: Mike Galbraith @ 2012-03-01 20:26 UTC (permalink / raw)
  To: Michal Schmidt
  Cc: Peter Zijlstra, Vivek Goyal, Tejun Heo, Li Zefan, containers,
	cgroups, Andrew Morton, Kay Sievers, Lennart Poettering,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

On Thu, 2012-03-01 at 18:19 +0100, Michal Schmidt wrote: 
> On 23.2.2012 15:13, Peter Zijlstra wrote:
> > My utter disregard for cgroups comes from having to actually implement a
> > controller for them, its a frigging nightmare. The systemd retards
> > mandating all this nonsense for booting a machine is completely bonghit
> > inspired and hasn't made me feel any better about it.
> 
> systemd requires only CONFIG_CGROUPS=y. It does not need any controllers.
> 
> The insults are entirely unnecessary.

At the risk of insulting any systemd person, I recently upgraded my box,
and had my very first encounter with systemd.  It didn't go well at all,
to say the very least.  In fact, it quickly became a violent removal.

After the fact, when I queried, I was told straight out that I should
live in harmony with the cgroups configuration systemd set up for me and
be happy.  For the nonce, you can remove it, and here's how (thanks for
that guys), but that removal option is _going_ to go away.  No, you
can't simply turn our cgroup setup off and control your box as if you
actually _own_ the thing, because cgroups is an integral part of the
systemd concept.

Really.  I hope that was idiotic fanboy tripe, because that flat ain't
gonna happen here, ever.

Q: you say systemd requires CONFIG_CGROUPS=y.  Why is that?  It's taking
over sysvinit's job afaiui, what does that have to do with cgroups?

-Mike


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-01 20:26           ` Mike Galbraith
@ 2012-03-01 21:02             ` Vivek Goyal
  2012-03-01 22:04               ` Mike Galbraith
  2012-03-02  2:43             ` Kay Sievers
  2012-03-02 11:16             ` Michal Schmidt
  2 siblings, 1 reply; 84+ messages in thread
From: Vivek Goyal @ 2012-03-01 21:02 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Michal Schmidt, Peter Zijlstra, Tejun Heo, Li Zefan, containers,
	cgroups, Andrew Morton, Kay Sievers, Lennart Poettering,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

On Thu, Mar 01, 2012 at 09:26:43PM +0100, Mike Galbraith wrote:
> On Thu, 2012-03-01 at 18:19 +0100, Michal Schmidt wrote: 
> > On 23.2.2012 15:13, Peter Zijlstra wrote:
> > > My utter disregard for cgroups comes from having to actually implement a
> > > controller for them, its a frigging nightmare. The systemd retards
> > > mandating all this nonsense for booting a machine is completely bonghit
> > > inspired and hasn't made me feel any better about it.
> > 
> > systemd requires only CONFIG_CGROUPS=y. It does not need any controllers.
> > 
> > The insults are entirely unnecessary.
> 
> At the risk of insulting any systemd person, I recently upgraded my box,
> and had my very first encounter with systemd.  It didn't go well at all,
> to say the very least.  In fact, it quickly became a violent removal.
> 
> After the fact, when I queried, I was told straight out that I should
> live in harmony with the cgroups configuration systemd set up for me and
> be happy.  For the nonce, you can remove it, and here's how (thanks for
> that guys), but that removal option is _going_ to go away.  No, you
> can't simply turn our cgroup setup off and control your box as if you
> actually _own_ the thing, because cgroups is an integral part of the
> systemd concept.
> 
> Really.  I hope that was idiotic fanboy tripe, because that flat ain't
> gonna happen here, ever.
> 
> Q: you say systemd requires CONFIG_CGROUPS=y.  Why is that?  It's taking
> over sysvinits job afaiui, what does that have to do with cgroups?

I think they were using it to track all the children forked by a service
and clean up all of them if need be. So they just need it for the logical
grouping functionality and don't require any controllers as such.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-01 21:02             ` Vivek Goyal
@ 2012-03-01 22:04               ` Mike Galbraith
  2012-03-01 22:38                 ` C Anthony Risinger
                                   ` (2 more replies)
  0 siblings, 3 replies; 84+ messages in thread
From: Mike Galbraith @ 2012-03-01 22:04 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Michal Schmidt, Peter Zijlstra, Tejun Heo, Li Zefan, containers,
	cgroups, Andrew Morton, Kay Sievers, Lennart Poettering,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

On Thu, 2012-03-01 at 16:02 -0500, Vivek Goyal wrote: 
> On Thu, Mar 01, 2012 at 09:26:43PM +0100, Mike Galbraith wrote:

> > Q: you say systemd requires CONFIG_CGROUPS=y.  Why is that?  It's taking
> > over sysvinits job afaiui, what does that have to do with cgroups?
> 
> I think they were using it to track all the children forked by a service
> and cleanup all of them if need be. So they just need it for logical
> grouping functionality and don't require any controllers as such.

Hm.  Controllers are perhaps not required, but cpu controller was
configured and used without consent.  I didn't receive an offer.

-Mike


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-01 22:04               ` Mike Galbraith
@ 2012-03-01 22:38                 ` C Anthony Risinger
  2012-03-02 10:51                 ` Michal Schmidt
  2012-03-05 12:43                 ` Lennart Poettering
  2 siblings, 0 replies; 84+ messages in thread
From: C Anthony Risinger @ 2012-03-01 22:38 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Vivek Goyal, Kay Sievers, Frederic Weisbecker, containers,
	Michal Schmidt, linux-kernel, Christoph Hellwig,
	Lennart Poettering, Tejun Heo, cgroups, Andrew Morton

On Thu, Mar 1, 2012 at 4:04 PM, Mike Galbraith <efault@gmx.de> wrote:
> On Thu, 2012-03-01 at 16:02 -0500, Vivek Goyal wrote:
>> On Thu, Mar 01, 2012 at 09:26:43PM +0100, Mike Galbraith wrote:
>
>> > Q: you say systemd requires CONFIG_CGROUPS=y.  Why is that?  It's taking
>> > over sysvinits job afaiui, what does that have to do with cgroups?
>>
>> I think they were using it to track all the children forked by a service
>> and cleanup all of them if need be. So they just need it for logical
>> grouping functionality and don't require any controllers as such.
>
> Hm.  Controllers are perhaps not required, but cpu controller was
> configured and used without consent.  I didn't receive an offer.

AFAIK it does in fact only require a `name` cgroup for its own
monitoring purposes.  i believe the systemd folks also tried (and are
trying? see TODO) to get PR_SET_ANCHOR merged upstream:

https://lkml.org/lkml/2010/2/2/165

... which is a sort of recursive/persistent parenting flag; without
that or cgroups, there is no way to reliably supervise processes under
Linux.

the other problem is there is no way for a process to enumerate the
available cgroups -- IIRC a list had to be hard-coded in systemd
sources -- and mounting the cgroupfs without specifying a specific
subsystem simply mounts everything in one whack.

you should be able to tell systemd to ignore that specific controller,
or tell it to use existing mounts.  i for one have been using it
*exclusively* on my personal machines/home servers [archlinux] and am
very pleased ... it's very flexible and gives you an unprecedented
level of control and introspection into the system (man systemd.*) ...
i can create new services in about 3-5 lines.

... obviously this thread is not about systemd, but since it makes
such extensive use of cgroup facilities it only serves to highlight
their deficiencies.  i think the notes and practices systemd has
established should be viewed as a good reference for what's clumsy, at
the very least, and should not be attributed to systemd, but to
cgroups.

-- 

C Anthony

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-01 20:26           ` Mike Galbraith
  2012-03-01 21:02             ` Vivek Goyal
@ 2012-03-02  2:43             ` Kay Sievers
  2012-03-02 10:15               ` Peter Zijlstra
  2012-03-02 11:16             ` Michal Schmidt
  2 siblings, 1 reply; 84+ messages in thread
From: Kay Sievers @ 2012-03-02  2:43 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Michal Schmidt, Peter Zijlstra, Vivek Goyal, Tejun Heo, Li Zefan,
	containers, cgroups, Andrew Morton, Lennart Poettering,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

On Thu, Mar 1, 2012 at 21:26, Mike Galbraith <efault@gmx.de> wrote:
> At the risk of insulting any systemd person

You do.

> what does that have to do with cgroups?

Do your homework before writing such emails please. It's ridiculous.

Kay

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-02  2:43             ` Kay Sievers
@ 2012-03-02 10:15               ` Peter Zijlstra
  0 siblings, 0 replies; 84+ messages in thread
From: Peter Zijlstra @ 2012-03-02 10:15 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Mike Galbraith, Michal Schmidt, Vivek Goyal, Tejun Heo, Li Zefan,
	containers, cgroups, Andrew Morton, Lennart Poettering,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

On Fri, 2012-03-02 at 03:43 +0100, Kay Sievers wrote:
> It's ridiculous.

systemd in a nutshell, well put! :-)

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-01 22:04               ` Mike Galbraith
  2012-03-01 22:38                 ` C Anthony Risinger
@ 2012-03-02 10:51                 ` Michal Schmidt
  2012-03-02 11:52                   ` Mike Galbraith
  2012-03-05 12:43                 ` Lennart Poettering
  2 siblings, 1 reply; 84+ messages in thread
From: Michal Schmidt @ 2012-03-02 10:51 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Vivek Goyal, Peter Zijlstra, Tejun Heo, Li Zefan, containers,
	cgroups, Andrew Morton, Kay Sievers, Lennart Poettering,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

On 03/01/2012 11:04 PM, Mike Galbraith wrote:
> Hm.  Controllers are perhaps not required, but cpu controller was
> configured and used without consent.  I didn't receive an offer.

This is optional.

"DefaultControllers=" in /etc/systemd/system.conf

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-01 18:03           ` Peter Zijlstra
@ 2012-03-02 11:08             ` Michal Schmidt
  2012-03-02 11:23               ` Peter Zijlstra
  0 siblings, 1 reply; 84+ messages in thread
From: Michal Schmidt @ 2012-03-02 11:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vivek Goyal, Tejun Heo, Li Zefan, containers, cgroups,
	Andrew Morton, Kay Sievers, Lennart Poettering,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

On 03/01/2012 07:03 PM, Peter Zijlstra wrote:
> On Thu, 2012-03-01 at 18:19 +0100, Michal Schmidt wrote:
>> On 23.2.2012 15:13, Peter Zijlstra wrote:
>>> My utter disregard for cgroups comes from having to actually implement a
>>> controller for them, its a frigging nightmare. The systemd retards
>>> mandating all this nonsense for booting a machine is completely bonghit
>>> inspired and hasn't made me feel any better about it.
>>
>> systemd requires only CONFIG_CGROUPS=y. It does not need any controllers.
>
> And that makes it better how?

Because it is not involved in the controllers nightmare you have.
systemd does not require "all this nonsense".

>> The insults are entirely unnecessary.
>
> I think not, booting a machine should depend on the smallest possible
> subset of features. Doing anything else is completely bonkers.

Your disagreement does not justify calling the people on the other side 
retards.

Your statement about the smallest feature subset could use some 
clarification. Should we make sure everything works fine without, say, 
CONFIG_UNIX?
And what exactly do you mean by booting? Obviously not booting into a 
full desktop environment, because that requires a lot of features.
If on the other hand you are satisfied with booting into a getty with 
not many services around, in this sense systemd will boot without 
CONFIG_CGROUPS. It's recommended not to do that and nobody actively 
tests this setup, but at least systemd will not abort. So it can be used 
to check if the kernel boots and to run some tests.

Michal

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-01 20:26           ` Mike Galbraith
  2012-03-01 21:02             ` Vivek Goyal
  2012-03-02  2:43             ` Kay Sievers
@ 2012-03-02 11:16             ` Michal Schmidt
  2012-03-02 11:24               ` Peter Zijlstra
  2 siblings, 1 reply; 84+ messages in thread
From: Michal Schmidt @ 2012-03-02 11:16 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Peter Zijlstra, Vivek Goyal, Tejun Heo, Li Zefan, containers,
	cgroups, Andrew Morton, Kay Sievers, Lennart Poettering,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

On 03/01/2012 09:26 PM, Mike Galbraith wrote:
> Q: you say systemd requires CONFIG_CGROUPS=y.  Why is that?  It's taking
> over sysvinits job afaiui, what does that have to do with cgroups?

It does more than sysvinit. It does more than fork off services and then 
forget about them. It keeps track of them all the time.
It notices when all processes of a service have exited.
It can kill all processes descended from a service.
It can always tell what service a given process belongs to.
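
For the last point, a rough Python sketch of how such a lookup can be done
from userspace: each line of /proc/<pid>/cgroup has the form
hierarchy-ID:controller-list:path, so the group of a process in a named
hierarchy can be read back directly. This is an illustration of the idea, not
a description of systemd's actual code; the helper name, the sample content
and the unit path are made up.

```python
# Minimal sketch: find a process's cgroup in one hierarchy by parsing text
# in the /proc/<pid>/cgroup format (hierarchy-ID:controller-list:path).
def cgroup_path(proc_cgroup_text, controller="name=systemd"):
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)   # the path itself may contain ':'
        if len(parts) != 3:
            continue                 # skip malformed or empty lines
        _hier_id, controllers, path = parts
        if controller in controllers.split(","):
            return path
    return None

# Made-up sample of what /proc/<pid>/cgroup might contain for a daemon.
sample = "3:cpu:/\n2:freezer:/\n1:name=systemd:/system/sshd.service\n"
print(cgroup_path(sample))
```

Because a forked child stays in its parent's cgroup, the same lookup works for
any descendant of the service, which plain parent/child tracking cannot offer
once a process double-forks.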

Michal

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-02 11:08             ` Michal Schmidt
@ 2012-03-02 11:23               ` Peter Zijlstra
  2012-03-02 11:28                 ` Michal Schmidt
  0 siblings, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2012-03-02 11:23 UTC (permalink / raw)
  To: Michal Schmidt
  Cc: Vivek Goyal, Tejun Heo, Li Zefan, containers, cgroups,
	Andrew Morton, Kay Sievers, Lennart Poettering,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

On Fri, 2012-03-02 at 12:08 +0100, Michal Schmidt wrote:
> And what exactly do you mean by booting? Obviously not booting into a 
> full desktop environment, because that requires a lot of features.

Nah, who needs that stuff anyway :-) Note that even my desktop and
laptop work fine without initrd, systemd and CONFIG_CGROUP nonsense. And
while they are bloated with useless crap like *Kit and others, simply
because I can't get myself to rebuild enough to get rid of the
dependencies, I utterly hate them being around.

> If on the other hand you are satisfied with booting into a getty with 
> not many services around, in this sense systemd will boot without 
> CONFIG_CGROUPS. 

Except that it waits a random amount of time, long enough for me to
think the machine didn't come back up and power cycle again. The random
delay is _waay_ longer than a regular reboot cycle and totally destroys
the usability.

> It's recommended not to do that and nobody actively 
> tests this setup, but at least systemd will not abort. So it can be used 
> to check if the kernel boots and to run some tests. 

So you don't recommend people use server type setups? Quality
engineering that!

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-02 11:16             ` Michal Schmidt
@ 2012-03-02 11:24               ` Peter Zijlstra
  0 siblings, 0 replies; 84+ messages in thread
From: Peter Zijlstra @ 2012-03-02 11:24 UTC (permalink / raw)
  To: Michal Schmidt
  Cc: Mike Galbraith, Vivek Goyal, Tejun Heo, Li Zefan, containers,
	cgroups, Andrew Morton, Kay Sievers, Lennart Poettering,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

On Fri, 2012-03-02 at 12:16 +0100, Michal Schmidt wrote:
> On 03/01/2012 09:26 PM, Mike Galbraith wrote:
> > Q: you say systemd requires CONFIG_CGROUPS=y.  Why is that?  It's taking
> > over sysvinits job afaiui, what does that have to do with cgroups?
> 
> It does more than sysvinit. It does more than fork off services and then 
> forget about them. It keeps track of them all the time.
> It notices when all processes of a service have exited.
> It can kill all processes descended from a service.
> It can always tell what service a given process belongs to.

And I don't give a crap about any of that.. sysvinit was sufficient. 

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-02 11:23               ` Peter Zijlstra
@ 2012-03-02 11:28                 ` Michal Schmidt
  2012-03-02 11:34                   ` Peter Zijlstra
  0 siblings, 1 reply; 84+ messages in thread
From: Michal Schmidt @ 2012-03-02 11:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vivek Goyal, Tejun Heo, Li Zefan, containers, cgroups,
	Andrew Morton, Kay Sievers, Lennart Poettering,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

On 03/02/2012 12:23 PM, Peter Zijlstra wrote:
> On Fri, 2012-03-02 at 12:08 +0100, Michal Schmidt wrote:
>> It's recommended not to do that and nobody actively
>> tests this setup, but at least systemd will not abort. So it can be used
>> to check if the kernel boots and to run some tests.
>
> So you don't recommend people use server type setups? Quality
> engineering that!

I don't follow. Are you saying that CONFIG_CGROUPS=y is incompatible 
with server setups?

Michal

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-02 11:28                 ` Michal Schmidt
@ 2012-03-02 11:34                   ` Peter Zijlstra
  0 siblings, 0 replies; 84+ messages in thread
From: Peter Zijlstra @ 2012-03-02 11:34 UTC (permalink / raw)
  To: Michal Schmidt
  Cc: Vivek Goyal, Tejun Heo, Li Zefan, containers, cgroups,
	Andrew Morton, Kay Sievers, Lennart Poettering,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

On Fri, 2012-03-02 at 12:28 +0100, Michal Schmidt wrote:
> On 03/02/2012 12:23 PM, Peter Zijlstra wrote:
> > On Fri, 2012-03-02 at 12:08 +0100, Michal Schmidt wrote:
> >> It's recommended not to do that and nobody actively
> >> tests this setup, but at least systemd will not abort. So it can be used
> >> to check if the kernel boots and to run some tests.
> >
> > So you don't recommend people use server type setups? Quality
> > engineering that!
> 
> I don't follow. Are you saying that CONFIG_CGROUPS=y is incompatible 
> with server setups?

It is for mine, I don't need it, not building it gives a smaller kernel
and thus less attack/bug surface. 

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-02 10:51                 ` Michal Schmidt
@ 2012-03-02 11:52                   ` Mike Galbraith
  0 siblings, 0 replies; 84+ messages in thread
From: Mike Galbraith @ 2012-03-02 11:52 UTC (permalink / raw)
  To: Michal Schmidt
  Cc: Vivek Goyal, Peter Zijlstra, Tejun Heo, Li Zefan, containers,
	cgroups, Andrew Morton, Kay Sievers, Lennart Poettering,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

On Fri, 2012-03-02 at 11:51 +0100, Michal Schmidt wrote: 
> On 03/01/2012 11:04 PM, Mike Galbraith wrote:
> > Hm.  Controllers are perhaps not required, but cpu controller was
> > configured and used without consent.  I didn't receive an offer.
> 
> This is optional.
> 
> "DefaultControllers=" in /etc/systemd/system.conf

#DefaultControllers=cpu is what was there.  But we're OT, and my
annoyance went *poof* when systemd-sysvinit did.

-Mike


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
       [not found] ` <m162em2efy.fsf@fess.ebiederm.org>
@ 2012-03-03 14:26   ` Serge Hallyn
  0 siblings, 0 replies; 84+ messages in thread
From: Serge Hallyn @ 2012-03-03 14:26 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Tejun Heo, Frederic Weisbecker, containers, Kay Sievers,
	linux-kernel, Christoph Hellwig, Lennart Poettering, cgroups,
	Andrew Morton

Quoting Eric W. Biederman (ebiederm@xmission.com):
> 4.  Only allow a single controller per hierarchy.  So that we can
> make reasonable choice points.  If the hierarchy is to be about
> policy I am baffled by concept of putting multiple controllers in
> a single hierarchy.  Because where you clump things together for
> policy for one controller in general does not seem to be where you
> want to clump things together for another controller.

This I agree with.  The only reasonable use I've seen for the
composed hierarchies is the ns cgroup, which was the wrong tool
for what it wanted to do anyway.  With ns cgroup deprecated, every
modern setup I've seen mounts each cgroup separately.

> > So, I mean, I don't know.  What do other people think?  Is this a
> > unnecessary worry?  Are people generally happy with the way things
> > are?  Lennart, Kay, what do you guys think?
> 
> I think the current situation is crazy.
> 
> I especially think it is crazy that inside a container I can't create
> a fresh set of cgroup mounts, and establish a fresh "hierarchy" relative
> to the process that created the cgroup mounts.  It sucks that
> controllers may nest fine but hierarchies don't nest.

Again I agree here.  The idea of fake cgroup roots has been brought up
in the past ( http://thread.gmane.org/gmane.linux.kernel/1197643 ).
I haven't looked in detail, but I know some people hated this particular
idea.  But it tried to solve this problem.

Perhaps we can attack this in two stages.

1. get rid of the ability to compose cgroups.  See how much this
simplifies the code.

2. add the ability to somehow namespace the cgroup mounts to allow a
container to freshly mount cgroups.

-serge

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-21 21:19 [RFD] cgroup: about multiple hierarchies Tejun Heo
                   ` (5 preceding siblings ...)
       [not found] ` <m162em2efy.fsf@fess.ebiederm.org>
@ 2012-03-05 11:37 ` Lennart Poettering
  2012-03-12 22:10 ` Tejun Heo
  7 siblings, 0 replies; 84+ messages in thread
From: Lennart Poettering @ 2012-03-05 11:37 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Frederic Weisbecker, linux-kernel

On Tue, 21.02.12 13:19, Tejun Heo (tj@kernel.org) wrote:

> Hello, guys.

Heya,

> I've been thinking about multiple hierarchy support in cgroup for a
> while, especially after Frederic's pending task counter patchset.
> This is a write up of what I've been thinking.  I don't know what to
> do yet and simply continuing the current situation definitely is an
> option, so please read on and throw in your 20 Won (or whatever amount
> in whatever currency you want).

Sorry for responding to this thread only this late, but here are my 2
eurocents on this.

Yes, I think it would make a ton of sense to simplify the cgroup logic
drastically. I see no need to maintain a number of entirely orthogonal
hierarchies in parallel. That said, I think that allowing some kind of
deviation from the main cgroup tree for some controllers might be a good
idea. More specifically, even if a task is placed in some
specific cgroup down the tree it might make sense for some controllers
to consider it in the root cgroup. e.g. since some controllers are more
expensive than others it might make sense to leave task A in
/foobar/waldo for the memory controller, but in / for the block IO
controller. But placing A in /foobar/waldo for the memory controller and
in /piep/papo for the block IO controller makes little sense, if you
understand what I mean.

To implement something like this it might be enough to enforce a single
hierarchy only but then allow tasks to either live in the tree or live
outside the tree for a specific controller, and that's it.

> Heh, I don't know.  IIRC, last year at LinuxCon Japan, I heard
> Christoph saying that the biggest problem w/ cgroup was that it was
> building completely separate hierarchies out of the traditional
> process hierarchies.  After thinking about this stuff for a while, I
> fully agree with him.  I think this whole thing should have been a
> layer over the process tree like sessions or program groups.

Hmm, so while in general I think I agree with this sentiment, I am not
entirely sure. For example, on systemd systems we currently set up one
cgroup per user (/user/lennart/), and in that cgroup a cgroup per
session (/user/lennart/4711/). This implies that when the same user logs
in twice his processes will be placed beneath the same parent cgroup
(/user/lennart/), which I kinda read would be conflicting with what you
suggest, no?

>    This path would require the most amount of work and we would be
>    excluding a feature - support for multiple orthogonal
>    categorizations - which has been available till now, probably
>    through deprecation process spanning years; however, this at least
>    gives us hope that we may reach sanity in the end, how distant that
>    end may be.  Oh, hope. :)
> 
> So, I mean, I don't know.  What do other people think?  Is this a
> unnecessary worry?  Are people generally happy with the way things
> are?  Lennart, Kay, what do you guys think?

A clean-up like the one you suggest as 3 is something we'd be happy to
support. I am happy to make any change necessary to adapt systemd to
whatever makes sense in the kernel. To me the current flexibility of the
cgroup interface appears way over-the-top, and a much simpler design
would suffice too and be more understandable to the user. (starting from
the fact that ps output currently looks really awful, since it lists
cgroup membership for all hierarchies, which just looks crazy)

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-01 22:04               ` Mike Galbraith
  2012-03-01 22:38                 ` C Anthony Risinger
  2012-03-02 10:51                 ` Michal Schmidt
@ 2012-03-05 12:43                 ` Lennart Poettering
  2012-03-05 15:47                   ` Mike Galbraith
  2 siblings, 1 reply; 84+ messages in thread
From: Lennart Poettering @ 2012-03-05 12:43 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Vivek Goyal, Michal Schmidt, Peter Zijlstra, Tejun Heo, Li Zefan,
	containers, cgroups, Andrew Morton, Kay Sievers,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

On Thu, 01.03.12 23:04, Mike Galbraith (efault@gmx.de) wrote:

> 
> On Thu, 2012-03-01 at 16:02 -0500, Vivek Goyal wrote: 
> > On Thu, Mar 01, 2012 at 09:26:43PM +0100, Mike Galbraith wrote:
> 
> > > Q: you say systemd requires CONFIG_CGROUPS=y.  Why is that?  It's taking
> > > over sysvinits job afaiui, what does that have to do with cgroups?
> > 
> > I think they were using it to track all the children forked by a service
> > and cleanup all of them if need be. So they just need it for logical
> > grouping functionality and don't require any controllers as such.
> 
> Hm.  Controllers are perhaps not required, but cpu controller was
> configured and used without consent.  I didn't receive an offer.

Just set DefaultControllers= in /etc/systemd/system.conf to an empty
string and systemd will not make use of any hierarchy beyond its private
name=systemd named hierarchy.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-05 12:43                 ` Lennart Poettering
@ 2012-03-05 15:47                   ` Mike Galbraith
  2012-03-05 19:58                     ` Mike Galbraith
  0 siblings, 1 reply; 84+ messages in thread
From: Mike Galbraith @ 2012-03-05 15:47 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Vivek Goyal, Michal Schmidt, Peter Zijlstra, Tejun Heo, Li Zefan,
	containers, cgroups, Andrew Morton, Kay Sievers,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

On Mon, 2012-03-05 at 13:43 +0100, Lennart Poettering wrote: 
> On Thu, 01.03.12 23:04, Mike Galbraith (efault@gmx.de) wrote:
> 
> > 
> > On Thu, 2012-03-01 at 16:02 -0500, Vivek Goyal wrote: 
> > > On Thu, Mar 01, 2012 at 09:26:43PM +0100, Mike Galbraith wrote:
> > 
> > > > Q: you say systemd requires CONFIG_CGROUPS=y.  Why is that?  It's taking
> > > > over sysvinits job afaiui, what does that have to do with cgroups?
> > > 
> > > I think they were using it to track all the children forked by a service
> > > and cleanup all of them if need be. So they just need it for logical
> > > grouping functionality and don't require any controllers as such.
> > 
> > Hm.  Controllers are perhaps not required, but cpu controller was
> > configured and used without consent.  I didn't receive an offer.
> 
> Just set DefaultControllers= in /etc/systemd/system.conf to an empty
> string and systemd will not make use of any hierarchy beyond its private
> name=systemd named hierarchy.

I updated my laptop to openSUSE 12.1 over the weekend, so tried it.  It
didn't work, so I tried setting JoinControllers= as well.  That split up
cpu and cpuacct, but didn't stop cpu from being used.  I then moved both
system.conf and user.conf to /etc/systemd/save, and got the original
setup back.. not surprising given everything was commented out in both
files to begin with.  The little bugger is stubborn.

maggy:/etc/systemd # mount|grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/ns type cgroup (rw,nosuid,nodev,noexec,relatime,ns)
cgroup on /sys/fs/cgroup/cpu type cgroup (rw,nosuid,nodev,noexec,relatime,cpu)
cgroup on /sys/fs/cgroup/cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
maggy:/etc/systemd # cat /sys/fs/cgroup/cpu/tasks|wc -l
262
maggy:/etc/systemd # cat system.conf
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
# See systemd.conf(5) for details

[Manager]
#LogLevel=info
#LogTarget=syslog-or-kmsg
#LogColor=yes
#LogLocation=no
#DumpCore=yes
#CrashShell=no
#ShowStatus=yes
#SysVConsole=yes
#CrashChVT=1
#CPUAffinity=1 2
#MountAuto=yes
#SwapAuto=yes
DefaultControllers=
#DefaultStandardOutput=syslog
#DefaultStandardError=inherit
JoinControllers=
maggy:/etc/systemd #

reboot...

maggy:/etc/systemd # mount|grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/ns type cgroup (rw,nosuid,nodev,noexec,relatime,ns)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
maggy:/etc/systemd # cat /sys/fs/cgroup/cpu/tasks|wc -l
146
maggy:/etc/systemd # ls
save  system  systemd-logind.conf  user
maggy:/etc/systemd #
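
To compare listings like the two above without eyeballing them, the
"mount | grep cgroup" output can be reduced to a map from mount point to the
controllers attached there. This is only a sketch: the controller names are
the ones visible in the listings above, and the line layout is assumed to
match what mount printed here.

```python
import re

# Controller names taken from the listings above; other kernels may have more.
KNOWN = {"cpuset", "ns", "cpu", "cpuacct", "memory", "devices", "freezer", "blkio"}
LINE = re.compile(r"^cgroup on (\S+) type cgroup \(([^)]*)\)$")


def controllers_by_mount(mount_output):
    """Map each cgroup mount point to the sorted list of controllers on it."""
    result = {}
    for line in mount_output.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # e.g. the tmpfs line for /sys/fs/cgroup itself
        mount_point, options = match.groups()
        result[mount_point] = sorted(o for o in options.split(",") if o in KNOWN)
    return result


sample = """\
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
"""
for mount_point, controllers in controllers_by_mount(sample).items():
    print(mount_point, controllers)
```

A named hierarchy such as name=systemd shows up with an empty controller
list, which is the "logical grouping only" case discussed earlier.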



^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-05 15:47                   ` Mike Galbraith
@ 2012-03-05 19:58                     ` Mike Galbraith
  0 siblings, 0 replies; 84+ messages in thread
From: Mike Galbraith @ 2012-03-05 19:58 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Vivek Goyal, Michal Schmidt, Peter Zijlstra, Tejun Heo, Li Zefan,
	containers, cgroups, Andrew Morton, Kay Sievers,
	Frederic Weisbecker, linux-kernel, Christoph Hellwig

On Mon, 2012-03-05 at 16:47 +0100, Mike Galbraith wrote: 
> On Mon, 2012-03-05 at 13:43 +0100, Lennart Poettering wrote: 

> > Just set DefaultControllers= in /etc/systemd/system.conf to an empty
> > string and systemd will not make use of any hierarchy beyond its private
> > name=systemd named hierarchy.
> 
> I updated my laptop to openSUSE 12.1 over the weekend, so tried it.  It
> didn't work

Bah, yes it did, subdir is gone.  I shouldn't multitask.

-Mike



^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-02-21 21:19 [RFD] cgroup: about multiple hierarchies Tejun Heo
                   ` (6 preceding siblings ...)
  2012-03-05 11:37 ` Lennart Poettering
@ 2012-03-12 22:10 ` Tejun Heo
  2012-03-12 22:22   ` Peter Zijlstra
                     ` (2 more replies)
  7 siblings, 3 replies; 84+ messages in thread
From: Tejun Heo @ 2012-03-12 22:10 UTC (permalink / raw)
  To: Li Zefan, containers, cgroups
  Cc: Andrew Morton, Kay Sievers, Lennart Poettering,
	Frederic Weisbecker, linux-kernel, Vivek Goyal, Michal Schmidt,
	Peter Zijlstra

Hello, guys.

Thanks a lot for the discussion and here are my takeaways:

* At least to me, nobody seems to have strong enough justification for
  orthogonal multiple hierarchies, so, yeah, unless something else
  happens, I'm scheduling multiple hierarchy support for the chopping
  block.  This is a long term thing (think years), so no need to panic
  right now and, as is life, plans may change and fail to materialize,
  but I intend to at least move away from it.

* Several people pointed out that it would be inconvenient to require
  cgroup hierarchy to be a strict superimposed tree on top of process
  tree and that program groups / sessions aren't like that either.  I
  agree, so it will hopefully be a single hierarchy which more or less
  behaves the same as the current hierarchy.

* How to map controllers which aren't aware of full hierarchy is still
  an open question but I'm still standing by one active node on any
  root-to-leaf path w/ root group serving as the special rest group.

  This should happen first for the long migration to begin.  I might
  get to it someday but if anyone can beat me to it, please go ahead.
  I'll be ecstatic to review and merge the patches.

Also, I'll slowly be marking features which don't seem essential,
especially the convenience features for multiple hierarchies, as
deprecated and eventually chop them.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-12 22:10 ` Tejun Heo
@ 2012-03-12 22:22   ` Peter Zijlstra
  2012-03-12 22:28     ` Tejun Heo
  2012-03-12 22:37   ` Serge Hallyn
  2012-03-13 13:49   ` Vivek Goyal
  2 siblings, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2012-03-12 22:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Vivek Goyal, Michal Schmidt

On Mon, 2012-03-12 at 15:10 -0700, Tejun Heo wrote:
> 
> * How to map controllers which aren't aware of full hierarchy is still
>   an open question but I'm still standing by one active node on any
>   root-to-leaf path w/ root group serving as the special rest group. 

What does this mean?

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-12 22:22   ` Peter Zijlstra
@ 2012-03-12 22:28     ` Tejun Heo
  2012-03-12 22:31       ` Lennart Poettering
                         ` (2 more replies)
  0 siblings, 3 replies; 84+ messages in thread
From: Tejun Heo @ 2012-03-12 22:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Vivek Goyal, Michal Schmidt

Hey,

On Mon, Mar 12, 2012 at 11:22:18PM +0100, Peter Zijlstra wrote:
> On Mon, 2012-03-12 at 15:10 -0700, Tejun Heo wrote:
> > 
> > * How to map controllers which aren't aware of full hierarchy is still
> >   an open question but I'm still standing by one active node on any
> >   root-to-leaf path w/ root group serving as the special rest group. 
> 
> What does this mean?

Let's say we have a tree like the following.

         root
      /   |   \
     G1  G2   G3
             /  \
           G31  G32

So, for cgroups which don't support full hierarchy, it'll be viewed as
either,

         root
      /   |   \
     G1  G2   G3

or

          root
      /   |   |  \
     G1  G2  G31 G32

With root being treated specially, probably as just an equal group
alongside the other groups; I haven't fully decided on that yet.
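
The "one active node on any root-to-leaf path" rule can be checked
mechanically. The sketch below encodes the example tree and tests a candidate
set of active groups; reading the rule as "at most one active non-root group
per path, with root as the rest group" is an interpretation, and the function
names are made up for illustration.

```python
# The example tree: parent -> children.
TREE = {"root": ["G1", "G2", "G3"], "G3": ["G31", "G32"]}


def leaf_paths(tree, node="root", prefix=()):
    """Yield every root-to-leaf path as a tuple of group names."""
    path = prefix + (node,)
    children = tree.get(node, [])
    if not children:
        yield path
        return
    for child in children:
        yield from leaf_paths(tree, child, path)


def is_valid_flat_view(tree, active):
    """True if no root-to-leaf path holds more than one active non-root group."""
    return all(
        sum(1 for name in path[1:] if name in active) <= 1
        for path in leaf_paths(tree)
    )


print(is_valid_flat_view(TREE, {"G1", "G2", "G3"}))          # first view
print(is_valid_flat_view(TREE, {"G1", "G2", "G31", "G32"}))  # second view
print(is_valid_flat_view(TREE, {"G3", "G31"}))               # nested, rejected
```

Both views drawn above pass the check, while a set containing G3 and G31
together does not, since that would require real nesting.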

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-12 22:28     ` Tejun Heo
@ 2012-03-12 22:31       ` Lennart Poettering
  2012-03-12 23:00         ` Tejun Heo
  2012-03-12 22:32       ` Peter Zijlstra
  2012-03-13 14:03       ` Vivek Goyal
  2 siblings, 1 reply; 84+ messages in thread
From: Lennart Poettering @ 2012-03-12 22:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Frederic Weisbecker, linux-kernel, Vivek Goyal,
	Michal Schmidt

On Mon, 12.03.12 15:28, Tejun Heo (tj@kernel.org) wrote:

> 
> Hey,
> 
> On Mon, Mar 12, 2012 at 11:22:18PM +0100, Peter Zijlstra wrote:
> > On Mon, 2012-03-12 at 15:10 -0700, Tejun Heo wrote:
> > > 
> > > * How to map controllers which aren't aware of full hierarchy is still
> > >   an open question but I'm still standing by one active node on any
> > >   root-to-leaf path w/ root group serving as the special rest group. 
> > 
> > What does this mean?
> 
> Let's say we have a tree like the following.
> 
>          root
>       /   |   \
>      G1  G2   G3
>              /  \
> 	   G31  G32
> 
> So, for cgroups which don't support full hierarchy, it'll be viewed as
> either,
> 
>          root
>       /   |   \
>      G1  G2   G3
> 
> or
> 
>           root
>       /   |   |  \
>      G1  G2  G31 G32
> 
> With root being treated specially, probably as just being a equal
> group as other groups, I'm not fully determined about that yet.

Note that at least systemd places all services by default beneath a
single "super" group (/system/), hence the first suggestion would make
little sense for us. The second suggestion would be fine however.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-12 22:28     ` Tejun Heo
  2012-03-12 22:31       ` Lennart Poettering
@ 2012-03-12 22:32       ` Peter Zijlstra
  2012-03-12 22:39         ` Tejun Heo
  2012-03-13 14:03       ` Vivek Goyal
  2 siblings, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2012-03-12 22:32 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Vivek Goyal, Michal Schmidt

On Mon, 2012-03-12 at 15:28 -0700, Tejun Heo wrote:
> Hey,
> 
> On Mon, Mar 12, 2012 at 11:22:18PM +0100, Peter Zijlstra wrote:
> > On Mon, 2012-03-12 at 15:10 -0700, Tejun Heo wrote:
> > > 
> > > * How to map controllers which aren't aware of full hierarchy is still
> > >   an open question but I'm still standing by one active node on any
> > >   root-to-leaf path w/ root group serving as the special rest group. 
> > 
> > What does this mean?
> 
> Let's say we have a tree like the following.
> 
>          root
>       /   |   \
>      G1  G2   G3
>              /  \
> 	   G31  G32
> 
> So, for cgroups which don't support full hierarchy, it'll be viewed as
> either,
> 
>          root
>       /   |   \
>      G1  G2   G3
> 
> or
> 
>           root
>       /   |   |  \
>      G1  G2  G31 G32
> 
> With root being treated specially, probably as just being a equal
> group as other groups, I'm not fully determined about that yet.

I'm assuming that G31/G32's tasks end up in G3 in the first case, but
where do the tasks in G3 go to in the second case?

Also, why allow non-hierarchical controllers to begin with? I would very
much argue for mandating that all controllers work the same wrt
hierarchy and if that means ditching hierarchy support we should do that
and modify cgroupfs to not allow creation of directories deeper than 1.

But allowing controllers that implement hierarchy proper and controllers
that do not and then force them in the same mount point, that just
doesn't make any friggin sense whatsoever.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-12 22:10 ` Tejun Heo
  2012-03-12 22:22   ` Peter Zijlstra
@ 2012-03-12 22:37   ` Serge Hallyn
  2012-03-12 22:55     ` Tejun Heo
  2012-03-13 13:49   ` Vivek Goyal
  2 siblings, 1 reply; 84+ messages in thread
From: Serge Hallyn @ 2012-03-12 22:37 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, Michal Schmidt,
	Frederic Weisbecker, Kay Sievers, linux-kernel,
	Lennart Poettering, Andrew Morton, Vivek Goyal

Quoting Tejun Heo (tj@kernel.org):
> Hello, guys.
> 
> Thanks a lot for the discussion and here are my take aways:
> 
> * At least to me, nobody seems to have strong enough justification for
>   orthogonal multiple hierarchies, so, yeah, unless something else

First off, can you (sorry) show an example of exactly what would no
longer be supported?  Is this the ability to not separately mount
freezer and cpusets (for instance)?

I've submitted a topic for the upcoming Linux Foundation end user summit
in New York, to get feedback from end users.  (Frankly I don't want to go
so would love if someone else wanted to do this :).  I think it'd be nice
to wait and see if this gets accepted, and see whether any end users are
relying on this.

(IMO it's wrong to say that "if you are using a current feature, you'd
better be reading lkml so you can speak up lest that feature might get
removed.")

>   happens, I'm scheduling multiple hierarchy support for the chopping
>   block.  This is a long term thing (think years), so no need to panic
>   right now and as is life plans may change and fail to materialize,
>   but I intend to at least move away from it.
> 
> * Several people pointed out that it would be inconvenient to require
>   cgroup hierarchy to be a strict super-imposed tree on top of process
>   tree and that program groups / sessions aren't like that either.  I
>   agree, so it will hopefully be single hierarchy which more or less
>   behaves the same as the current hierarchy.
> 
> * How to map controllers which aren't aware of full hierarchy is still
>   an open question but I'm still standing by one active node on any
>   root-to-leaf path w/ root group serving as the special rest group.
> 
>   This should happen first for the long migration to begin.  I might
>   get to it someday but if anyone can beat me to it, please go ahead.
>   I'll be ecstatic to review and merge the patches.
> 
> Also, I'll slowly be marking features which don't seem essential,
> especially the convenience features for multiple hierarchies, as
> deprecated and eventually chop them.
> 
> Thank you.
> 
> -- 
> tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-12 22:32       ` Peter Zijlstra
@ 2012-03-12 22:39         ` Tejun Heo
  2012-03-12 22:44           ` Peter Zijlstra
  0 siblings, 1 reply; 84+ messages in thread
From: Tejun Heo @ 2012-03-12 22:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Vivek Goyal, Michal Schmidt

Hello,

On Mon, Mar 12, 2012 at 11:32:48PM +0100, Peter Zijlstra wrote:
> I'm assuming that G31/G32's tasks end up in G3 in the first case, but
> where do the tasks in G3 go to in the second case?

Collapsed into the root group.  The controller simply doesn't have
anything configured at that layer.
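
Put as a sketch, a flat controller would resolve a task's group by walking up
from the cgroup the task lives in until it hits a group the controller
actually has configured, falling back to root. The parent table and function
name are made up for illustration, and the first case (G31/G32 tasks landing
in G3) is the assumption from the question above, not something settled.

```python
# Child -> parent for the example tree.
PARENT = {"G1": "root", "G2": "root", "G3": "root", "G31": "G3", "G32": "G3"}


def effective_group(group, active, parent=PARENT):
    """Return the group a flat controller would account a task to.

    Walk from the task's cgroup towards root and stop at the first group
    the controller has configured (is "active" in); otherwise use root.
    """
    node = group
    while node != "root":
        if node in active:
            return node
        node = parent[node]
    return "root"


first_view = {"G1", "G2", "G3"}
second_view = {"G1", "G2", "G31", "G32"}
print(effective_group("G31", first_view))   # task in G31, first view
print(effective_group("G3", second_view))   # task in G3, second view
```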

> Also, why allow non-hierarchical controllers to begin with? I would very
> much argue for mandating that all controllers work the same wrt
> hierarchy and if that means ditching hierarchy support we should do that
> and modify cgroupfs to not allow creation of directories deeper than 1.
> 
> But allowing controllers that implement hierarchy proper and controllers
> that do not and then force them in the same mount point, that just
> doesn't make any friggin sense what so ever.

Hmmm... that could be a good final goal but I think supporting mapping
to a flat structure will make the transition much easier, or even
possible.  That way, core transition can be mostly decoupled from
controller updates.  If we can get to the point where nesting is fully
supported by every controller first, that would be awesome too.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-12 22:39         ` Tejun Heo
@ 2012-03-12 22:44           ` Peter Zijlstra
  2012-03-12 23:04             ` Tejun Heo
  2012-03-13 10:11             ` Glauber Costa
  0 siblings, 2 replies; 84+ messages in thread
From: Peter Zijlstra @ 2012-03-12 22:44 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Vivek Goyal, Michal Schmidt

On Mon, 2012-03-12 at 15:39 -0700, Tejun Heo wrote:
> If we can get to the point where nesting is fully
> supported by every controller first, that would be awesome too. 

As long as that is the goal.. otherwise, I'd be overjoyed if I can rip
nesting support out of the cpu-controller.. that stuff is such a pain.
Then again, I don't think the container people like this proposal --
they were the ones pushing for full hierarchy back when.

Another way there is simply deprecating and removing all
non-hierarchical controllers (and not adding any more while we're at
it).

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-12 22:37   ` Serge Hallyn
@ 2012-03-12 22:55     ` Tejun Heo
  0 siblings, 0 replies; 84+ messages in thread
From: Tejun Heo @ 2012-03-12 22:55 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Li Zefan, containers, cgroups, Michal Schmidt,
	Frederic Weisbecker, Kay Sievers, linux-kernel,
	Lennart Poettering, Andrew Morton, Vivek Goyal

Hello,

On Mon, Mar 12, 2012 at 05:37:07PM -0500, Serge Hallyn wrote:
> Quoting Tejun Heo (tj@kernel.org):
> > Hello, guys.
> > 
> > Thanks a lot for the discussion and here are my take aways:
> > 
> > * At least to me, nobody seems to have strong enough justification for
> >   orthogonal multiple hierarchies, so, yeah, unless something else
> 
> First off, can you (sorry) show an example of exactly what would no
> longer be supported?  Is this the ability to not separately mount
> freezer and cpusets (for instance)?

I think there are two aspects of it - there's actual functionality
loss and there's loss of one of the ways to achieve something.

The former is applying completely orthogonal categorizations to
processes depending on the controller in use - ie. memory limits by
user and disk IO limits by the program binary.  AFAICS, nobody seems
to have strong enough justification for this.

The latter is probably more material.  Even when people aren't using
orthogonal categorizations, they probably are using separate
hierarchies - currently, it's almost unavoidable to do so.  So, that
part would be more painful.  What I hope for is that some userland
system management tooling takes ownership of the cgroup fs interface
to impose some sane hierarchy and defaults, and exposes a more
policy-aware interface to the rest of the system.

> I've submitted a topic for the upcoming linux foundation end user summit
> in new york, to get feedback from end users.  (Frankly I don't want to go
> so would love if someone else wanted to do this :).  I think it'd be nice
> to wait and see if this gets accepted, and see whether any end users are
> relying on this.
> 
> (IMO it's wrong to say that "if you are using a current feature, you'd
> better be reading lkml so you can speak up lest that feature might get
> removed.")

Well, these mailing lists are the widest comm channel I can make use
of and I expect people to chain and propagate the communication to
many different channels as necessary.  So, yes, it would be very much
appreciated if you can bring up the topic with a larger / different
crowd. :)

Thank you.

-- 
tejun

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-12 22:31       ` Lennart Poettering
@ 2012-03-12 23:00         ` Tejun Heo
  2012-03-12 23:02           ` Peter Zijlstra
  0 siblings, 1 reply; 84+ messages in thread
From: Tejun Heo @ 2012-03-12 23:00 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Peter Zijlstra, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Frederic Weisbecker, linux-kernel, Vivek Goyal,
	Michal Schmidt

Hey, Lennart.

On Mon, Mar 12, 2012 at 11:31:14PM +0100, Lennart Poettering wrote:
> > On Mon, Mar 12, 2012 at 11:22:18PM +0100, Peter Zijlstra wrote:
> > > On Mon, 2012-03-12 at 15:10 -0700, Tejun Heo wrote:
> > > > 
> > > > * How to map controllers which aren't aware of full hierarchy is still
> > > >   an open question but I'm still standing by one active node on any
> > > >   root-to-leaf path w/ root group serving as the special rest group. 
> > > 
> > > What does this mean?
> > 
> > Let's say we have a tree like the following.
> > 
> >          root
> >       /   |   \
> >      G1  G2   G3
> >              /  \
> > 	   G31  G32
> > 
> > So, for cgroups which don't support full hierarchy, it'll be viewed as
> > either,
> > 
> >          root
> >       /   |   \
> >      G1  G2   G3
> > 
> > or
> > 
> >           root
> >       /   |   |  \
> >      G1  G2  G31 G32
> > 
> > With root being treated specially, probably as just being a equal
> > group as other groups, I'm not fully determined about that yet.
> 
> Note that at least systemd places all services by default beneath a
> single "super" group (/system/), hence the first suggestion would make
> little sense for us. The second suggestion would be fine however.

Ooh, both will be available to choose from.  I was trying to explain
that there can be configuration only at one layer for any task so that
it can be mapped to flat hierarchy.  Where to apply the config will be
selected by the user (or system tool).
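
A minimal sketch of the mapping described above, assuming a toy
dict-based tree and a user-chosen set of "active" groups.  The names
TREE and flat_view are hypothetical and only illustrate the "one
active node on any root-to-leaf path" rule; this is not kernel code.

```python
# Toy model: which flat group does a non-hierarchical controller see
# for each cgroup, given the set of groups carrying configuration?
TREE = {
    "root": ["G1", "G2", "G3"],
    "G1": [], "G2": [],
    "G3": ["G31", "G32"],
    "G31": [], "G32": [],
}

def flat_view(tree, active, root="root"):
    """Map every group to the flat group that accounts for its tasks."""
    view = {}

    def walk(node, owner, seen_active):
        if node in active:
            # Assumed rule: at most one active node per root-to-leaf path.
            if seen_active:
                raise ValueError(f"two active nodes on the path to {node}")
            owner, seen_active = node, True
        view[node] = owner
        for child in tree[node]:
            walk(child, owner, seen_active)

    # Groups above any active node fall back to the root "rest" group.
    walk(root, root, False)
    return view

first = flat_view(TREE, {"G1", "G2", "G3"})           # G31/G32 fold into G3
second = flat_view(TREE, {"G1", "G2", "G31", "G32"})  # G3 collapses into root
```

With the first choice G31 and G32 are accounted to G3; with the second
one, tasks sitting directly in G3 are accounted to the root group.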

Thanks.

-- 
tejun

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-12 23:00         ` Tejun Heo
@ 2012-03-12 23:02           ` Peter Zijlstra
  2012-03-12 23:09             ` Tejun Heo
  2012-03-12 23:43             ` Lennart Poettering
  0 siblings, 2 replies; 84+ messages in thread
From: Peter Zijlstra @ 2012-03-12 23:02 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Lennart Poettering, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Frederic Weisbecker, linux-kernel, Vivek Goyal,
	Michal Schmidt

On Mon, 2012-03-12 at 16:00 -0700, Tejun Heo wrote:
> 
> Ooh, both will be available to choose from.  I was trying to explain
> that there can be configuration only at one layer for any task so that
> it can be mapped to flat hierarchy.  Where to apply the config will be
> selected by the user (or system tool). 

Thus in effect this is a false choice, since Lennart and assorted idiots
conspire against sanity by pushing systemd into our every orifice, and
since he just said systemd requires one of the two, the choice will be
made for us, lest we forfeit wanting to boot our system.

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-12 22:44           ` Peter Zijlstra
@ 2012-03-12 23:04             ` Tejun Heo
  2012-03-13 14:10               ` Vivek Goyal
  2012-03-13 10:11             ` Glauber Costa
  1 sibling, 1 reply; 84+ messages in thread
From: Tejun Heo @ 2012-03-12 23:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Vivek Goyal, Michal Schmidt

On Mon, Mar 12, 2012 at 11:44:01PM +0100, Peter Zijlstra wrote:
> On Mon, 2012-03-12 at 15:39 -0700, Tejun Heo wrote:
> > If we can get to the point where nesting is fully
> > supported by every controller first, that would be awesome too. 
> 
> As long as that is the goal.. otherwise, I'd be overjoyed if I can rip
> nesting support out of the cpu-controller.. that stuff is such a pain.
> Then again, I don't think the container people like this proposal --
> they were the ones pushing for full hierarchy back when.

Yeah, the great pain of full hierarchy support is one of the reasons
why I keep thinking about supporting mapping to flat hierarchy.  Full
hierarchy could be too painful and not useful enough for some
controllers.  Then again, cpu and memcg already have it and according
to Vivek blkcg also had a proposed implementation, so maybe it's okay.
Let's see.

Thanks.

-- 
tejun

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-12 23:02           ` Peter Zijlstra
@ 2012-03-12 23:09             ` Tejun Heo
  2012-03-12 23:43             ` Lennart Poettering
  1 sibling, 0 replies; 84+ messages in thread
From: Tejun Heo @ 2012-03-12 23:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Lennart Poettering, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Frederic Weisbecker, linux-kernel, Vivek Goyal,
	Michal Schmidt

Hello, Peter.

On Tue, Mar 13, 2012 at 12:02:47AM +0100, Peter Zijlstra wrote:
> On Mon, 2012-03-12 at 16:00 -0700, Tejun Heo wrote:
> > 
> > Ooh, both will be available to choose from.  I was trying to explain
> > that there can be configuration only at one layer for any task so that
> > it can be mapped to flat hierarchy.  Where to apply the config will be
> > selected by the user (or system tool). 
> 
> Thus in effect this is a false choice, since Lennart and assorted idiots
> conspire against sanity by pushing systemd into our every orifice, and
> since he just said systemd requires one of the two, the choice will be
> made for us, lest we forfeit wanting to boot our system.

I think it should be fine as long as systemd or whatever system cgroup
manager can be told to stay aside about limits.  Everyone doing their
own thing and ending up competing directly under /sys/fs/cgroup/
worries me more.  cgroup config fs doesn't have enough flexibility or
proper provisions for sharing while encouraging direct usage.  That's
not a good combination.

Thanks.

-- 
tejun

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-12 23:02           ` Peter Zijlstra
  2012-03-12 23:09             ` Tejun Heo
@ 2012-03-12 23:43             ` Lennart Poettering
  1 sibling, 0 replies; 84+ messages in thread
From: Lennart Poettering @ 2012-03-12 23:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Frederic Weisbecker, linux-kernel, Vivek Goyal,
	Michal Schmidt

On Tue, 13.03.12 00:02, Peter Zijlstra (peterz@infradead.org) wrote:

> 
> On Mon, 2012-03-12 at 16:00 -0700, Tejun Heo wrote:
> > 
> > Ooh, both will be available to choose from.  I was trying to explain
> > that there can be configuration only at one layer for any task so that
> > it can be mapped to flat hierarchy.  Where to apply the config will be
> > selected by the user (or system tool). 
> 
> Thus in effect this is a false choice, since Lennart and assorted idiots
> conspire against sanity by pushing systemd into our every orifice, and
> since he just said systemd requires one of the two, the choice will be
> made for us, lest we forfeit wanting to boot our system.

I didn't say that we require one of the two. I just pointed out
that for us the second option makes more sense. Also, as I pointed out I
am happy to adapt systemd to whatever Tejun decides.

BTW, I actually believe the hierarchical design of cgroups is pretty neat,
since it allows us to label things hierarchically, so that for example
user services can have their own labels all beneath a per-user label. So
for the purpose of grouping things and naming them I very much
appreciate hierarchical cgroups. For the purpose of actually applying
resource controls I care much less for it, but I still do see its
use.

Lennart

PS: Awesome choice of words! I totally appreciate how you talk to and
about me. This creates such a strong urge inside of me to care about the
problems you have with systemd and fix them for you.

-- 
Lennart Poettering - Red Hat, Inc.

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-12 22:44           ` Peter Zijlstra
  2012-03-12 23:04             ` Tejun Heo
@ 2012-03-13 10:11             ` Glauber Costa
  1 sibling, 0 replies; 84+ messages in thread
From: Glauber Costa @ 2012-03-13 10:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Vivek Goyal, Michal Schmidt

On 03/13/2012 02:44 AM, Peter Zijlstra wrote:
> On Mon, 2012-03-12 at 15:39 -0700, Tejun Heo wrote:
>> >  If we can get to the point where nesting is fully
>> >  supported by every controller first, that would be awesome too.
> As long as that is the goal.. otherwise, I'd be overjoyed if I can rip
> nesting support out of the cpu-controller.. that stuff is such a pain.
> Then again, I don't think the container people like this proposal --
> they were the ones pushing for full hierarchy back when.

Indeed. At some point it is desirable that an admin inside the container 
be able to divide resources between the tasks they see.

Maybe it doesn't even make sense for things like I/O, for which he at
most has an illusion of control anyway. But cpu is a much more tangible
resource in this regard.

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-12 22:10 ` Tejun Heo
  2012-03-12 22:22   ` Peter Zijlstra
  2012-03-12 22:37   ` Serge Hallyn
@ 2012-03-13 13:49   ` Vivek Goyal
  2012-03-13 16:02     ` Tejun Heo
  2 siblings, 1 reply; 84+ messages in thread
From: Vivek Goyal @ 2012-03-13 13:49 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Michal Schmidt, Peter Zijlstra

On Mon, Mar 12, 2012 at 03:10:50PM -0700, Tejun Heo wrote:
> Hello, guys.
> 
> Thanks a lot for the discussion and here are my take aways:
> 
> * At least to me, nobody seems to have strong enough justification for
>   orthogonal multiple hierarchies, so, yeah, unless something else
>   happens, I'm scheduling multiple hierarchy support for the chopping
>   block.  This is a long term thing (think years), so no need to panic
>   right now and as is life plans may change and fail to materialize,
>   but I intend to at least move away from it.

So everything will be under a single hierarchy? How will we control
which controllers are active on which cgroups? Or are all controllers
going to be active on all cgroups in that hierarchy?

That would be bad for IO cgroups, where a better way to use them is to
isolate the trouble-making workload and run the rest in root or a common
cgroup.

Thanks
Vivek

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-12 22:28     ` Tejun Heo
  2012-03-12 22:31       ` Lennart Poettering
  2012-03-12 22:32       ` Peter Zijlstra
@ 2012-03-13 14:03       ` Vivek Goyal
  2012-03-13 15:59         ` Tejun Heo
  2 siblings, 1 reply; 84+ messages in thread
From: Vivek Goyal @ 2012-03-13 14:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Michal Schmidt

On Mon, Mar 12, 2012 at 03:28:17PM -0700, Tejun Heo wrote:
> Hey,
> 
> On Mon, Mar 12, 2012 at 11:22:18PM +0100, Peter Zijlstra wrote:
> > On Mon, 2012-03-12 at 15:10 -0700, Tejun Heo wrote:
> > > 
> > > * How to map controllers which aren't aware of full hierarchy is still
> > >   an open question but I'm still standing by one active node on any
> > >   root-to-leaf path w/ root group serving as the special rest group. 
> > 
> > What does this mean?
> 
> Let's say we have a tree like the following.
> 
>          root
>       /   |   \
>      G1  G2   G3
>              /  \
> 	   G31  G32
> 
> So, for cgroups which don't support full hierarchy, it'll be viewed as
> either,
> 
>          root
>       /   |   \
>      G1  G2   G3
> 
> or
> 
>           root
>       /   |   |  \
>      G1  G2  G31 G32
> 
> With root being treated specially, probably as just being a equal
> group as other groups, I'm not fully determined about that yet.

So what's wrong with flattening the whole hierarchy and all groups being
active in the path? It is not worse than the second option?

             root
       /   |  |  |  \
      G1  G2  G3 G31 G32

One problem with the above is that a child can create its own child
cgroups and compete at the highest level for resources.

But the second choice you gave has the same problem. Maybe you are
thinking of introducing another knob to configure the active point in
the path? I guess that will make cgroup configuration even harder.

Maybe we flatten the whole hierarchy. When we launch user sessions, we
don't give them permissions to create child cgroups for IO. For services
running with admin privileges, we still don't have a good solution for
flat controllers. If nobody has a use case, I guess we just live with
that, as that's the limitation of a flat controller, and one needs to
make it hierarchical if need be.

Thanks
Vivek

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-12 23:04             ` Tejun Heo
@ 2012-03-13 14:10               ` Vivek Goyal
  2012-03-13 16:11                 ` C Anthony Risinger
  2012-03-13 17:25                 ` Peter Zijlstra
  0 siblings, 2 replies; 84+ messages in thread
From: Vivek Goyal @ 2012-03-13 14:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Michal Schmidt

On Mon, Mar 12, 2012 at 04:04:16PM -0700, Tejun Heo wrote:
> On Mon, Mar 12, 2012 at 11:44:01PM +0100, Peter Zijlstra wrote:
> > On Mon, 2012-03-12 at 15:39 -0700, Tejun Heo wrote:
> > > If we can get to the point where nesting is fully
> > > supported by every controller first, that would be awesome too. 
> > 
> > As long as that is the goal.. otherwise, I'd be overjoyed if I can rip
> > nesting support out of the cpu-controller.. that stuff is such a pain.
> > Then again, I don't think the container people like this proposal --
> > they were the ones pushing for full hierarchy back when.
> 
> Yeah, the great pain of full hierarchy support is one of the reasons
> why I keep thinking about supporting mapping to flat hierarchy.  Full
> hierarchy could be too painful and not useful enough for some
> controllers.  Then again, cpu and memcg already have it and according
> to Vivek blkcg also had a proposed implementation, so maybe it's okay.
> Let's see.

Implementing hierarchy is a pain and is expensive at run time. Supporting
a flat structure will provide a path for a smooth transition.

We had some RFC patches for blkcg hierarchy and that made things even more
complicated and we might not gain much. So why complicate the code
unless we have a good use case?

Thanks
Vivek

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-13 14:03       ` Vivek Goyal
@ 2012-03-13 15:59         ` Tejun Heo
  2012-03-16 23:14           ` James Bottomley
  0 siblings, 1 reply; 84+ messages in thread
From: Tejun Heo @ 2012-03-13 15:59 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Peter Zijlstra, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Michal Schmidt

Hey, Vivek.

On Tue, Mar 13, 2012 at 10:03:45AM -0400, Vivek Goyal wrote:
> > With root being treated specially, probably as just being a equal
> > group as other groups, I'm not fully determined about that yet.
> 
> So what's wrong with flattening the whole hierarchy and all groups being
> active in the path? It is not worse than the second option?
> 
>              root
>        /   |  |  |  \
>       G1  G2  G3 G31 G32

It is worse because while there isn't much need for orthogonal
hierarchies, people often need to apply different limits at different
levels of the hierarchy for different controllers.  ie. it often
happens that the distinction between G31 and G32 matters for one
controller but not for others.  The problem with flattening like you
suggested above is that it isn't a hierarchy at all - membership isn't
recursive.

Imposing limits at a single level is an additional restriction and may
cause some config complexity but it'll be at least explicit and can
co-exist with full hierarchy in a meaningful way.
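
To make the "membership isn't recursive" point concrete, here is a
small sketch that charges usage once with real parent links and once
with every group hung directly off root.  The PARENT tables and helper
names are assumptions made up for illustration, not an existing
interface.

```python
# Real hierarchy: G31 and G32 are children of G3.
PARENT = {"G1": "root", "G2": "root", "G3": "root", "G31": "G3", "G32": "G3"}
# Fully flattened variant: every group is a direct child of root.
FLAT_PARENT = {group: "root" for group in PARENT}

def ancestors(group, parent):
    """Return the group followed by all of its ancestors, nearest first."""
    chain = [group]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def usage_per_group(task_usage, parent):
    """Charge each (group, amount) pair to the group and every ancestor."""
    totals = {}
    for group, used in task_usage:
        for g in ancestors(group, parent):
            totals[g] = totals.get(g, 0) + used
    return totals

USAGE = [("G31", 10), ("G32", 20), ("G3", 5)]
hier = usage_per_group(USAGE, PARENT)        # G3 covers G31 and G32 too
flat = usage_per_group(USAGE, FLAT_PARENT)   # G3 only sees its own tasks
```

In the hierarchical case a controller that only cares about the G3
level still sees everything below it; in the fully flattened case G3,
G31 and G32 are unrelated siblings.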

Thanks.

-- 
tejun

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-13 13:49   ` Vivek Goyal
@ 2012-03-13 16:02     ` Tejun Heo
  0 siblings, 0 replies; 84+ messages in thread
From: Tejun Heo @ 2012-03-13 16:02 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Li Zefan, containers, cgroups, Andrew Morton, Kay Sievers,
	Lennart Poettering, Frederic Weisbecker, linux-kernel,
	Michal Schmidt, Peter Zijlstra

Hello,

On Tue, Mar 13, 2012 at 09:49:22AM -0400, Vivek Goyal wrote:
> So everything will be under a single hierarchy? How will we control
> which controllers are active on which cgroups? Or are all controllers
> going to be active on all cgroups in that hierarchy?
> 
> That would be bad for IO cgroups, where a better way to use them is to
> isolate the trouble-making workload and run the rest in root or a common
> cgroup.

Depending on active configs, I guess, or maybe we'll have an active
controllers mask which can also be used for single-level restriction
for controllers which don't support full hierarchy.
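
One way to picture such a mask, purely as a sketch: each cgroup lists
the controllers that are active on it, and a task is accounted to the
nearest group on its path with that controller enabled, falling back
to root.  The ACTIVE table and effective_group() are hypothetical names
invented for this illustration; nothing like them exists in the kernel
at the time of this thread.

```python
PARENT = {"G1": "root", "G2": "root", "G3": "root", "G31": "G3", "G32": "G3"}

# Hypothetical per-cgroup mask of active controllers.
ACTIVE = {
    "G1": {"cpu"},
    "G3": {"cpu", "memory"},
    "G32": {"blkio"},  # only the trouble-making workload gets IO control
}

def effective_group(group, controller, parent=PARENT, active=ACTIVE):
    """Nearest group on the path to root with the controller active."""
    while group != "root":
        if controller in active.get(group, set()):
            return group
        group = parent[group]
    # Nothing active on the path: the task stays in the root group.
    return "root"
```

Here effective_group("G32", "blkio") is "G32" while
effective_group("G31", "blkio") is "root", which matches the "isolate
one workload and run the rest in root" use of the IO controller.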

Thanks.

-- 
tejun

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-13 14:10               ` Vivek Goyal
@ 2012-03-13 16:11                 ` C Anthony Risinger
  2012-03-13 16:30                   ` C Anthony Risinger
  2012-03-13 17:25                 ` Peter Zijlstra
  1 sibling, 1 reply; 84+ messages in thread
From: C Anthony Risinger @ 2012-03-13 16:11 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Tejun Heo, Michal Schmidt, Frederic Weisbecker, containers,
	Kay Sievers, linux-kernel, Lennart Poettering, cgroups,
	Andrew Morton

On Tue, Mar 13, 2012 at 9:10 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Mon, Mar 12, 2012 at 04:04:16PM -0700, Tejun Heo wrote:
>> On Mon, Mar 12, 2012 at 11:44:01PM +0100, Peter Zijlstra wrote:
>> > On Mon, 2012-03-12 at 15:39 -0700, Tejun Heo wrote:
>> > > If we can get to the point where nesting is fully
>> > > supported by every controller first, that would be awesome too.
>> >
>> > As long as that is the goal.. otherwise, I'd be overjoyed if I can rip
>> > nesting support out of the cpu-controller.. that stuff is such a pain.
>> > Then again, I don't think the container people like this proposal --
>> > they were the ones pushing for full hierarchy back when.
>>
>> Yeah, the great pain of full hierarchy support is one of the reasons
>> why I keep thinking about supporting mapping to flat hierarchy.  Full
>> hierarchy could be too painful and not useful enough for some
>> controllers.  Then again, cpu and memcg already have it and according
>> to Vivek blkcg also had a proposed implementation, so maybe it's okay.
>> Let's see.
>
> Implementing hierarchy is a pain and is expensive at run time. Supporting
> a flat structure will provide a path for a smooth transition.
>
> We had some RFC patches for blkcg hierarchy and that made things even more
> complicated and we might not gain much. So why complicate the code
> unless we have a good use case?

how about ditching the idea of an FS altogether?

the way `mkdir` creates and nests groups has always felt awkward to me.
maybe instead we flatten everything out, and bind to the process tree,
but enable a tag-like system to "mark" processes, and attach meaning to
them.  akin to marking+processing packets (netfilter), or maybe like
sysfs tags(?).

maybe a trivial example, but bear with me here ... other controllers
are bound to a `name` controller ...

# my pid?
$ echo $$
123

# what controllers are available for this process?
$ cat /proc/self/tags/TYPE

# create a new `name` base controller
$ touch /proc/self/tags/admin

# create a new `name` base controller
$ touch /proc/self/tags/users

# begin tracking cpu shares at some default level
$ touch /proc/self/tags/admin.cpuacct.cpu.shares

# explicit assign `admin` 150 shares
$ echo 150 > /proc/self/tags/admin.cpuacct.cpu.shares

# explicit assign `users` 50 shares
$ echo 50 > /proc/self/tags/admin.cpuacct.cpu.shares

# tag will propagate to children
$ echo 1 > /proc/self/tags/admin.cpuacct.cpu.PERSISTENT

# `name`'s priority relative to sibling `name` groups (like shares)
$ echo 100 > /proc/self/tags/admin.cpuacct.cpu.PRIORITY

# `name`'s priority relative to sibling `name` groups (like shares)
$ echo 100 > /proc/self/tags/admin.cpuacct.cpu.PRIORITY

[... system ...]

# what controllers are available system-wide?
$ cat /sys/fs/cgroup/TYPE
cpuacct = monitor resources
memory = monitor memory
blkio = io stuffs
[...]

# what knobs are available?
$ cat /sys/fs/cgroup/cpuacct.TYPE
shares = relative assignment of resources
stat = some stats
[...]

# how many total shares requested (system)
$ cat /sys/fs/cgroup/cpuacct.cpu.shares
200

# how many total shares requested (admin)
$ cat /sys/fs/cgroup/admin.cpuacct.cpu.shares
150

# how many total shares requested (users)
$ cat /sys/fs/cgroup/users.cpuacct.cpu.shares
50

# *all* processes
$ cat /sys/fs/cgroup/TASKS
1
123
[...]

# which processes have `admin` tag?
$ cat /sys/fs/cgroup/cpuacct/admin.TASKS
123

# which processes have `users` tag?
$ cat /sys/fs/cgroup/cpuacct/users.TASKS
123

# link to pid
$ readlink -f /sys/fs/cgroup/cpuacct/users.TASKS.123
/proc/123

# which user owns `users` tag?
$ cat /sys/fs/cgroup/cpuacct/users.UID
1000

# default mode for `user` controls?
$ cat /sys/fs/cgroup/users.MODE
0664

# default mode for `user` cpuacct controls?
$ cat /sys/fs/cgroup/users.cpuacct.MODE
0600

# mask some controllers to `users` tag?
$ echo -e "cpuacct\nmemory" > /sys/fs/cgroup/users.MASK

# ... did the above work? (look at last call to TYPE above)
$ cat /sys/fs/cgroup/users.TYPE
blkio
[...]

# assign a whitelist instead
$ echo -e "cpu\nmemory" > /sys/fs/cgroup/users.TYPE

# mask some knobs to `users` tag
$ echo -e "shares" > /sys/fs/cgroup/users.cpuacct.MASK

# ... did the above work?
$ cat /sys/fs/cgroup/users.cpuacct.TYPE
stat = some stats
[...]

... in this way there is still a sort of hierarchy, but each
controller is free to choose:

) if there is any meaning to multiple `names` per process
) ... or if only one should be allowed
) how to combine laterally
) how to combine descendants
) ... maybe even assignable strategies!
) controller semantics independent of other controllers

when a new pid namespace is created, the `tags` dir is "cleared out"
and that person can assign new values (or maybe a directory is created
in `tags`?).  the effective value is the union of both, and identical
to whatever the process would have had *without* a namespace (no
difference, on visibility).

thus, cgroupfs becomes a simple mount that has aggregate stats and
system-wide settings.
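
as a rough illustration only, the flat tag model could be pictured
like this in a few lines of python.  the TagTable class and its methods
are made up for this sketch (nothing here corresponds to a real kernel
or /proc interface); the numbers reuse the admin=150 / users=50 shares
from the example above.

```python
from collections import defaultdict

class TagTable:
    """Toy model of a flat, per-process tag namespace."""

    def __init__(self):
        self.shares = {}               # tag name -> requested cpu shares
        self.tasks = defaultdict(set)  # tag name -> pids carrying the tag

    def tag(self, pid, name, shares=None):
        """Attach tag `name` to `pid`, optionally setting its shares."""
        self.tasks[name].add(pid)
        if shares is not None:
            self.shares[name] = shares

    def total_shares(self):
        """System-wide aggregate, like the cgroupfs view would report."""
        return sum(self.shares.values())

    def tags_of(self, pid):
        """All tags on a process; the control space itself stays flat."""
        return sorted(n for n, pids in self.tasks.items() if pid in pids)

table = TagTable()
table.tag(123, "admin", shares=150)
table.tag(123, "users", shares=50)
```

the aggregate view (table.total_shares()) then plays the role of the
system-wide cgroupfs mount, and table.tags_of(123) the role of the
per-process `tags` dir.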

recap:

) bound to process hierarchy
) ... but control space is flat
) does not force every controller to use same paradigm (eg, "you must
behave like a directory tree")
) ... but orthogonal multiplexing of a controller is possible if the
controller allows it
) allows same permission-based ACL
) easy to see all controls affecting a process or `name` group with a
simple `ls -l`
) additional possibilities that didn't exist with directory/arbitrary
mounts paradigm

does this make sense? it makes much more sense to me at least, and i
think it allows greater flexibility with less complexity (if my
experience with FUSE is any indication) ...

... or is this the same wolf in sheep's skin?

-- 

C Anthony

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-13 16:11                 ` C Anthony Risinger
@ 2012-03-13 16:30                   ` C Anthony Risinger
  0 siblings, 0 replies; 84+ messages in thread
From: C Anthony Risinger @ 2012-03-13 16:30 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Tejun Heo, Michal Schmidt, Frederic Weisbecker, containers,
	Kay Sievers, linux-kernel, Lennart Poettering, cgroups,
	Andrew Morton

On Tue, Mar 13, 2012 at 11:11 AM, C Anthony Risinger <anthony@xtfx.me> wrote:
>
> # what controllers are available for this process?
> $ cat /proc/self/tags/TYPE
^^^^^ should list the controllers either assigned to TYPE or not masked by MASK

> # explicit assign `users` 50 shares
> $ echo 50 > /proc/self/tags/admin.cpuacct.cpu.shares
                                               ^^^^^ s,admin,users,

> # `name`'s priority relative to sibling `name` groups (like shares)
> $ echo 100 > /proc/self/tags/admin.cpuacct.cpu.PRIORITY
^^^^^ maybe redundant, my original reasoning for it escapes me, but
could be useful for controllers that accept multiple `name`s

-- 

C Anthony

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-13 14:10               ` Vivek Goyal
  2012-03-13 16:11                 ` C Anthony Risinger
@ 2012-03-13 17:25                 ` Peter Zijlstra
  2012-03-13 17:31                   ` Peter Zijlstra
  1 sibling, 1 reply; 84+ messages in thread
From: Peter Zijlstra @ 2012-03-13 17:25 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Tejun Heo, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Michal Schmidt

On Tue, 2012-03-13 at 10:10 -0400, Vivek Goyal wrote:
> Implementing hierarchy is a pain and is expensive at run time. 

Yeah, suck it up :-)

I would really rather we mandate one implementation standard for
controllers for the sake of consistency and uniformity. A direct result
of doing away with the multiple hierarchy crap is that all controllers
are co-mounted. Allowing differences like this just doesn't make any
sense.

So either we drop full hierarchy support from all controllers or we
deprecate and remove all non-hierarchical controllers.

I'm fine with either, but I'm not fine with the half-arsed
solutions proposed here.

* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-13 17:25                 ` Peter Zijlstra
@ 2012-03-13 17:31                   ` Peter Zijlstra
  0 siblings, 0 replies; 84+ messages in thread
From: Peter Zijlstra @ 2012-03-13 17:31 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Tejun Heo, Li Zefan, containers, cgroups, Andrew Morton,
	Kay Sievers, Lennart Poettering, Frederic Weisbecker,
	linux-kernel, Michal Schmidt

On Tue, 2012-03-13 at 18:25 +0100, Peter Zijlstra wrote:
> On Tue, 2012-03-13 at 10:10 -0400, Vivek Goyal wrote:
> > Implementing hierarchy is a pain and is expensive at run time. 
> 
> Yeah, suck it up :-)
> 
> I would really rather we mandate one implementation standard for
> controllers for the sake of consistency and uniformity. A direct result
> of doing away with the multiple hierarchy crap is that all controllers
> are co-mounted. Allowing differences like this just doesn't make any
> sense.
> 
> So either we drop full hierarchy support from all controllers or we
> deprecate and remove all non-hierarchical controllers.
> 
> I'm fine with either, but I'm not fine with the half-arsed
> solutions proposed here.

Note that before this whole discussion I was under the impression it
was mandated for a controller to be fully hierarchical. I'm very much
surprised people were allowed to merge incomplete controllers like that.



* Re: [RFD] cgroup: about multiple hierarchies
  2012-03-13 15:59         ` Tejun Heo
@ 2012-03-16 23:14           ` James Bottomley
  0 siblings, 0 replies; 84+ messages in thread
From: James Bottomley @ 2012-03-16 23:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Vivek Goyal, Peter Zijlstra, Li Zefan, containers, cgroups,
	Andrew Morton, Kay Sievers, Lennart Poettering,
	Frederic Weisbecker, linux-kernel, Michal Schmidt

On Tue, 2012-03-13 at 08:59 -0700, Tejun Heo wrote:
> Hey, Vivek.
> 
> On Tue, Mar 13, 2012 at 10:03:45AM -0400, Vivek Goyal wrote:
> > > With root being treated specially, probably as just being an equal
> > > group as other groups, I'm not fully determined about that yet.
> > 
> > So what's wrong with flattening the whole hierarchy and all groups being active
> > in the path? It is not worse than the second option?
> > 
> >              root
> >        /   |  |  |  \
> >       G1  G2  G3 G31 G32
> 
> It is worse because while there isn't much need for orthogonal
> hierarchies, people often need to apply different limits at different
> levels of the hierarchy for different controllers.  ie. it often
> happens that the distinction between G31 and G32 matters for one
> controller but not for others.  The problem with flattening like you
> suggested above is that it isn't a hierarchy at all - membership isn't
> recursive.
> 
> Imposing limits at a single level is an additional restriction and may
> cause some config complexity but it'll be at least explicit and can
> co-exist with full hierarchy in a meaningful way.

Isn't there a simple fix for this?  Each controller can decide whether
to pay attention to its cgroup parent in calculating the resource limits
or counting usage.  If the controller elects not to pay attention to its
parents when counting resources and enforcing limits, it effectively
gives you a flat hierarchy from the point of view of the controller.

What actually happens depends on how the controller calculates the
limits: if it's a global fraction, then it's completely flat, if it's
just an absolute limit, then it wouldn't pay attention to the parent
anyway, if it's a proportion, then the controller has to decide how to
divide up the parent's allocation.

James
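
The per-controller choice described in the message above can be sketched as
a toy model. This is only an illustration under stated assumptions, not
kernel code: Group, try_charge(), build() and the hierarchical flag are
hypothetical names, the limits are absolute values in arbitrary units, and
the G3/G31/G32 nesting is assumed from the group names used in the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Group:
    """Hypothetical stand-in for one cgroup as seen by a single controller."""
    name: str
    limit: int                       # absolute limit, arbitrary units
    parent: Optional["Group"] = None
    usage: int = 0                   # amount currently charged to this group


def try_charge(group: Group, amount: int, hierarchical: bool) -> bool:
    """Charge `amount` to `group`; refuse if any consulted limit would be exceeded.

    A hierarchical controller consults the group and every ancestor.
    A flat controller consults only the group itself.
    """
    chain = []
    node = group
    while node is not None:
        chain.append(node)
        if not hierarchical:
            break                    # flat: ignore the parents entirely
        node = node.parent

    # Check first, then commit, so a refused charge leaves no partial state.
    if any(g.usage + amount > g.limit for g in chain):
        return False
    for g in chain:
        g.usage += amount
    return True


def build():
    """Assumed layout: root -> G3 -> {G31, G32}, with made-up limits."""
    root = Group("root", limit=100)
    g3 = Group("G3", limit=10, parent=root)
    g31 = Group("G31", limit=8, parent=g3)
    g32 = Group("G32", limit=8, parent=g3)
    return g3, g31, g32
```

With hierarchical=True, charging 8 units to G31 and then 8 to G32 fails on
the second charge because G3's limit of 10 is consulted. With
hierarchical=False both charges succeed and G3's usage is never touched,
which is the "flat from the controller's point of view" behaviour. The
proportional case mentioned above is not modelled here, since it needs a
policy for dividing the parent's allocation.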



^ permalink raw reply	[flat|nested] 84+ messages in thread

end of thread, other threads:[~2012-03-16 23:14 UTC | newest]

Thread overview: 84+ messages
2012-02-21 21:19 [RFD] cgroup: about multiple hierarchies Tejun Heo
2012-02-21 21:21 ` Tejun Heo
2012-02-22 13:34   ` Glauber Costa
2012-02-23  7:45     ` Serge E. Hallyn
2012-02-23 17:29       ` Tejun Heo
2012-02-23 18:47         ` Serge Hallyn
2012-02-26  4:59   ` Konstantin Khlebnikov
2012-02-22 13:30 ` Peter Zijlstra
2012-02-22 13:37   ` Glauber Costa
2012-02-22 18:01   ` Tejun Heo
2012-02-23  7:39   ` Li Zefan
2012-02-22 15:45 ` Frederic Weisbecker
2012-02-22 18:22   ` Tejun Heo
2012-02-27 17:46     ` Frederic Weisbecker
2012-02-22 16:38 ` Vivek Goyal
2012-02-22 16:57   ` Vivek Goyal
2012-02-22 18:43     ` Tejun Heo
2012-02-23  9:41     ` Peter Zijlstra
2012-02-23 14:13       ` Peter Zijlstra
2012-03-01 17:19         ` Michal Schmidt
2012-03-01 18:03           ` Peter Zijlstra
2012-03-02 11:08             ` Michal Schmidt
2012-03-02 11:23               ` Peter Zijlstra
2012-03-02 11:28                 ` Michal Schmidt
2012-03-02 11:34                   ` Peter Zijlstra
2012-03-01 20:26           ` Mike Galbraith
2012-03-01 21:02             ` Vivek Goyal
2012-03-01 22:04               ` Mike Galbraith
2012-03-01 22:38                 ` C Anthony Risinger
2012-03-02 10:51                 ` Michal Schmidt
2012-03-02 11:52                   ` Mike Galbraith
2012-03-05 12:43                 ` Lennart Poettering
2012-03-05 15:47                   ` Mike Galbraith
2012-03-05 19:58                     ` Mike Galbraith
2012-03-02  2:43             ` Kay Sievers
2012-03-02 10:15               ` Peter Zijlstra
2012-03-02 11:16             ` Michal Schmidt
2012-03-02 11:24               ` Peter Zijlstra
2012-02-23 21:38       ` Vivek Goyal
2012-02-23 22:34         ` Tejun Heo
2012-02-28 21:16           ` Vivek Goyal
2012-02-28 21:21             ` Peter Zijlstra
2012-02-28 21:35               ` Vivek Goyal
2012-02-28 21:43                 ` Peter Zijlstra
2012-02-28 21:54                   ` Vivek Goyal
2012-02-28 22:00                     ` Peter Zijlstra
2012-02-28 22:31                       ` Vivek Goyal
2012-02-28 21:53                 ` Peter Zijlstra
2012-02-28 22:09                   ` Vivek Goyal
2012-02-24 11:33         ` Peter Zijlstra
2012-02-22 18:33   ` Tejun Heo
2012-02-23 19:41     ` Vivek Goyal
2012-02-23 22:38       ` Tejun Heo
2012-02-23  7:59   ` Li Zefan
2012-02-23 20:32     ` Vivek Goyal
2012-02-23  8:22 ` Li Zefan
2012-02-23 17:33   ` Tejun Heo
     [not found] ` <m162em2efy.fsf@fess.ebiederm.org>
2012-03-03 14:26   ` Serge Hallyn
2012-03-05 11:37 ` Lennart Poettering
2012-03-12 22:10 ` Tejun Heo
2012-03-12 22:22   ` Peter Zijlstra
2012-03-12 22:28     ` Tejun Heo
2012-03-12 22:31       ` Lennart Poettering
2012-03-12 23:00         ` Tejun Heo
2012-03-12 23:02           ` Peter Zijlstra
2012-03-12 23:09             ` Tejun Heo
2012-03-12 23:43             ` Lennart Poettering
2012-03-12 22:32       ` Peter Zijlstra
2012-03-12 22:39         ` Tejun Heo
2012-03-12 22:44           ` Peter Zijlstra
2012-03-12 23:04             ` Tejun Heo
2012-03-13 14:10               ` Vivek Goyal
2012-03-13 16:11                 ` C Anthony Risinger
2012-03-13 16:30                   ` C Anthony Risinger
2012-03-13 17:25                 ` Peter Zijlstra
2012-03-13 17:31                   ` Peter Zijlstra
2012-03-13 10:11             ` Glauber Costa
2012-03-13 14:03       ` Vivek Goyal
2012-03-13 15:59         ` Tejun Heo
2012-03-16 23:14           ` James Bottomley
2012-03-12 22:37   ` Serge Hallyn
2012-03-12 22:55     ` Tejun Heo
2012-03-13 13:49   ` Vivek Goyal
2012-03-13 16:02     ` Tejun Heo
