linux-kernel.vger.kernel.org archive mirror
From: Tejun Heo <tj@kernel.org>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>,
	containers@lists.linux-foundation.org, cgroups@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Kay Sievers <kay.sievers@vrfy.org>,
	Lennart Poettering <lennart@poettering.net>,
	linux-kernel@vger.kernel.org, Paul Menage <paul@paulmenage.org>
Subject: Re: [RFD] cgroup: about multiple hierarchies
Date: Wed, 22 Feb 2012 10:22:07 -0800
Message-ID: <20120222182207.GC32694@google.com>
In-Reply-To: <20120222154501.GA1693@somewhere.redhat.com>

Hey, Frederic.

On Wed, Feb 22, 2012 at 04:45:04PM +0100, Frederic Weisbecker wrote:
> > A related limitation is that as different subsystems don't know which
> > hierarchies they'll end up on, they can't cooperate.  Wouldn't it make
> > more sense if task counter is a separate thing watching the resources
> > and triggers different actions as configured - be it failing forks or
> > freezing?
> 
> For this particular example, I think we'd better have a file in which
> a task can poll and get woken up when the task limit has been reached.
> Then that task can decide to freeze or whatever.

Yes, that may be a solution, but to "guarantee" that the limit is never
breached, we need to stop it first somehow.  Making freezing the
default behavior, with a userland notifier (an inotify event should
suffice), would probably do, but that's something we can't do now. :(

> > 1. We're screwed anyway.  Just don't worry about it and continue down
> >    on this path.  Can't get much worse, right?
> > 
> >    This approach has the apparent advantage of not having to do
> >    anything and is probably most likely to be taken.  This isn't ideal
> >    but hey nothing is. :P
> 
> Thing is we have an ABI now and it has been there for a while now. Aren't
> we stuck with it? I'm no big fan of that multiple hierarchies thing either
> but now I fear we have to support it.

Well, yes and no.  While maintaining userland ABI is very important,
its importance isn't infinite and there are different types of
userland ABIs.  We definitely don't want to screw with syscalls.  We
should keep userland-visible dynamic files which are used by common
user tools stable at almost all costs.  When it comes to system
interfaces which are used mostly by base system tools, we can be a bit
more flexible.  If the ABI in question is an optional thing, we can
probably be slightly more flexible still.

We of course can't change things drastically.  It should be done
carefully with a rather long deprecation period, but it can be done
and in fact isn't too uncommon.  Stuff under /sys tends to be somewhat
volatile and sysfs itself went through several ABI-incompatible
iterations.

So, we can transition in baby steps.  E.g. we can first implement
proper nesting behavior without changing the default behavior.  Then
the base system can be updated to mount and control all subsystems by
default (with configuration opt-outs) so that the hierarchy reflects
pstree, effectively driving people away from multiple hierarchies, and
we can implement new features assuming the new structure.  After a few
years, the kernel can start whining about non-standard hierarchies and
then eventually remove the support.  It's a long process but
definitely doable.

> > 2. Make it more flexible (and likely more complex, unfortunately).
> >    Allow the utility type subsystems to be used in multiple
> >    hierarchies.  The easiest and probably dirtiest way to achieve that
> >    would be embedding them into cgroup core.
> > 
> >    Thinking about doing this depresses me and it's not like I have a
> >    cheerful personality to begin with. :(
> 
> Another solution is to support a class of multi-bindable subsystems as in
> this old patch from Paul:
> 
> 	https://lkml.org/lkml/2009/7/1/578

Heh, yeah, this would be closer to the proper way to achieve
multi-attach but I can't help feeling that this just buries us
deeper into s*it and we're already knee-deep.  If multiple hierarchies
is an essential feature, maybe, but, if it's not, and I'm extremely
skeptical that it is, why the hell would we want to go that way?

> It sounds to me more healthy to iterate only over subsystems in fork/exit.
> We probably don't want to add a new iteration over cgroups themselves
> on these fast paths.

Hmmm?  Don't follow why this is relevant.

Thanks.

-- 
tejun
