From: Herbert Poetzl <herbert@13thfloor.at>
To: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
ckrm-tech@lists.sourceforge.net, linux-kernel@vger.kernel.org,
xemul@sw.ru, pj@sgi.com, winget@google.com,
containers@lists.osdl.org, akpm@linux-foundation.org,
menage@google.com
Subject: Re: [PATCH 1/2] rcfs core patch
Date: Fri, 9 Mar 2007 01:48:16 +0100
Message-ID: <20070309004816.GB4506@MAIL.13thfloor.at>
In-Reply-To: <20070308101347.GA29051@in.ibm.com>

On Thu, Mar 08, 2007 at 03:43:47PM +0530, Srivatsa Vaddagiri wrote:
> On Wed, Mar 07, 2007 at 08:12:00PM -0700, Eric W. Biederman wrote:
> > The review is still largely happening at the why level but no
> > one is addressing that yet. So please can we have a why.
>
> Here's a brief summary of what's happening and why. If it's not clear,
> please get back to us with specific questions.
>
> There have been various projects attempting to provide resource
> management support in Linux, including CKRM/Resource Groups and UBC.

let me note here, once again, that you forgot Linux-VServer
which does quite non-intrusive resource management ...

> Each had its own task-grouping mechanism.

the basic 'context' (pid space) is the grouping mechanism
we use for resource management too

> Paul Menage observed [1] that cpusets in the kernel already has a
> grouping mechanism which was working well for cpusets. He went ahead
> and generalized the grouping code in cpusets so that it could be used
> for overall resource management purposes.
> With his patches, it is possible to even create multiple hierarchies
> of groups (see [2] on why multiple hierarchies) as follows:

do we need or even want that? IMHO the hierarchical
concept CKRM was designed with was also the reason
for it being slow, unusable and complicated

> mount -t container -o cpuset none /dev/cpuset <- cpuset hierarchy
> mount -t container -o mem,cpu none /dev/mem <- memory/cpu hierarchy
> mount -t container -o disk none /dev/disk <- disk hierarchy
>
> In each hierarchy, you can create task groups and manipulate the
> resource parameters of each group. You can also move tasks between
> groups at run-time (see [3] on why this is required).
> Each hierarchy is also manipulated independently of the others.
> Paul's patches also introduced a 'struct container' in the kernel,
> which serves these key purposes:
>
> - Task-grouping
> 'struct container' represents a task-group created in each hierarchy.
> So every directory created under /dev/cpuset or /dev/mem above will
> have a corresponding 'struct container' inside the kernel. All tasks
> pointing to the same 'struct container' are considered to be part of
> a group
>
> The 'struct container' in turn has pointers to resource objects which
> store actual resource parameters for that group. In above example,
> 'struct container' created under /dev/cpuset will have a pointer to
> 'struct cpuset' while 'struct container' created under /dev/disk will
> have pointer to 'struct disk_quota_or_whatever'.
>
> - Maintain hierarchical information
> The 'struct container' also keeps track of hierarchical relationship
> between groups.
>
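
For illustration only, a minimal user-space sketch of the two roles described above (task grouping and hierarchy tracking). All names ending in _demo, the MAX_SUBSYS limit and the field layout are assumptions made for this sketch; they are not taken from Paul's patches.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBSYS 4   /* hypothetical limit on controllers per hierarchy */

/* Stand-in for a group created by mkdir in one hierarchy. */
struct container_demo {
    struct container_demo *parent;  /* hierarchical relationship, NULL at the root */
    void *subsys[MAX_SUBSYS];       /* resource objects, e.g. a cpuset or a disk quota */
};

/* Stand-in for a task; only the group pointer matters here. */
struct task_demo {
    struct container_demo *container;
};

/* Tasks pointing to the same container form one group. */
static int same_group(const struct task_demo *a, const struct task_demo *b)
{
    assert(a != NULL && b != NULL);
    return a->container == b->container;
}

/* Depth of a group below the root of its hierarchy. */
static int container_depth(const struct container_demo *c)
{
    int depth = 0;

    assert(c != NULL);
    for (; c->parent != NULL; c = c->parent)
        depth++;
    return depth;
}
```

In this sketch a group such as /dev/cpuset/C1/C11 would be a container_demo whose parent chain has two links, and two tasks are in the same group exactly when their container pointers are equal.
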
> The filesystem interface in the patches essentially serves these
> purposes:
>
> - Provide an interface to manipulate task-groups. This includes
> creating/deleting groups, listing tasks present in a group and
> moving tasks across groups
>
> - Provide an interface to manipulate the resource objects
> (limits etc) pointed to by 'struct container'.
>
> As you know, the introduction of 'struct container' was objected
> to and was felt redundant as a means to group tasks. That's where I
> took a shot at converting over Paul Menage's patch to avoid the 'struct
> container' abstraction and instead work with 'struct nsproxy'.

which IMHO isn't a step in the right direction, as
you will need to handle different nsproxies within
the same 'resource container' (see previous email)

> In the rcfs patch, each directory (in /dev/cpuset or /dev/disk) is
> associated with a 'struct nsproxy' instead. The most important need
> of the filesystem interface is not to manipulate the nsproxy objects
> directly, but to manipulate the resource objects (nsproxy->ctlr_data[]
> in the patches) which store information like limit etc.
>
> > I have a question? What does rcfs look like if we start with
> > the code that is in the kernel? That is start with namespaces
> > and nsproxy and just build a filesystem to display/manipulate them?
> > With the code built so it will support adding resource controllers
> > when they are ready?
>
> If I am not mistaken, Serge did attempt something in that direction,
> only that it was based on Paul's container patches. rcfs can no doubt
> support the same feature.
>
> > > struct ipc_namespace *ipc_ns;
> > > struct mnt_namespace *mnt_ns;
> > > struct pid_namespace *pid_ns;
> > > +#ifdef CONFIG_RCFS
> > > + struct list_head list;
> >
> > This extra list of nsproxy's is unneeded and a performance problem the
> > way it is used. In general we want to talk about the individual resource
> > controllers not the nsproxy.
>
> I think if you consider the multiple hierarchy picture, the need
> becomes obvious.
>
> Let's say that you had these hierarchies: /dev/cpuset, /dev/mem, /dev/disk
> and the various resource classes (task-groups) under them as below:
>
> /dev/cpuset/C1, /dev/cpuset/C1/C11, /dev/cpuset/C2
> /dev/mem/M1, /dev/mem/M2, /dev/mem/M3
> /dev/disk/D1, /dev/disk/D2, /dev/disk/D3
>
> The nsproxy structure basically has a pointer to a resource object in
> each of these hierarchies.
>
> nsproxy { ..., C1, M1, D1} could be one nsproxy
> nsproxy { ..., C1, M2, D3} could be another nsproxy and so on
>
> So you see, because of multiple hierarchies, we can have different
> combinations of resource classes.
>
> When we support task movement across resource classes, we need to find a
> nsproxy which has the right combination of resource classes that the
> task's nsproxy can be hooked to.

no, not necessarily, we can simply create a new one
and give it the proper resource or whatever-spaces

> That's where we need the nsproxy list. Hope this makes it clear.
>
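
A rough user-space sketch of the two options discussed here: search a list of existing nsproxies for the wanted combination, and create a new one when none matches. The _demo names, the singly linked list (the patch uses a list_head) and MAX_RC_SUBSYS standing in for CONFIG_MAX_RC_SUBSYS are assumptions for illustration, not the rcfs code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RC_SUBSYS 3  /* stand-in for CONFIG_MAX_RC_SUBSYS */

struct nsproxy_demo {
    void *ctlr_data[MAX_RC_SUBSYS];  /* one resource object per hierarchy */
    struct nsproxy_demo *next;       /* simplified list linkage */
};

static struct nsproxy_demo *nsproxy_list;  /* all known combinations */

/* Linear search for an nsproxy holding exactly the wanted combination. */
static struct nsproxy_demo *find_nsproxy(void *const *want)
{
    struct nsproxy_demo *ns;
    int i;

    for (ns = nsproxy_list; ns != NULL; ns = ns->next) {
        for (i = 0; i < MAX_RC_SUBSYS; i++)
            if (ns->ctlr_data[i] != want[i])
                break;
        if (i == MAX_RC_SUBSYS)
            return ns;
    }
    return NULL;
}

/*
 * Move a task to new_obj in hierarchy 'subsys': reuse a matching
 * nsproxy if one exists, otherwise allocate a new one.
 * Returns NULL only if allocation fails.
 */
static struct nsproxy_demo *move_task(const struct nsproxy_demo *cur,
                                      int subsys, void *new_obj)
{
    void *want[MAX_RC_SUBSYS];
    struct nsproxy_demo *ns;

    assert(subsys >= 0 && subsys < MAX_RC_SUBSYS);
    memcpy(want, cur->ctlr_data, sizeof(want));
    want[subsys] = new_obj;

    ns = find_nsproxy(want);
    if (ns != NULL)
        return ns;

    ns = malloc(sizeof(*ns));
    if (ns == NULL)
        return NULL;
    memcpy(ns->ctlr_data, want, sizeof(want));
    ns->next = nsproxy_list;
    nsproxy_list = ns;
    return ns;
}
```

Dropping the find_nsproxy() call would give the "simply create a new one" variant: no list walk on task movement, at the cost of possibly holding several nsproxies with identical contents.
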
> > > + void *ctlr_data[CONFIG_MAX_RC_SUBSYS];
> >
> > I still don't understand why these pointers are so abstract,
> > and why we need an array lookup into them?
>
> we can avoid these abstract pointers and instead have a set of pointers
> like this:
>
> struct nsproxy {
> 	...
> 	struct cpu_limit *cpu;	/* cpu control namespace */
> 	struct rss_limit *rss;	/* rss control namespace */
> 	struct cpuset *cs;	/* cpuset namespace */
> };
>
> But that will make some code (like searching for a right nsproxy when a
> task moves across classes/groups) very awkward.
>
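
To make the trade-off concrete, a small sketch of what "does this nsproxy hold the same combination?" might look like under each layout. The struct bodies, the _named/_array suffixes and MAX_RC_SUBSYS are hypothetical stand-ins, not definitions from the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the controller objects. */
struct cpu_limit { int shares; };
struct rss_limit { long pages; };
struct cpuset    { unsigned long cpus_allowed; };

/* Layout with one typed pointer per controller. */
struct nsproxy_named {
    struct cpu_limit *cpu;
    struct rss_limit *rss;
    struct cpuset *cs;
};

/* Every new controller adds one more comparison here. */
static int named_match(const struct nsproxy_named *a,
                       const struct nsproxy_named *b)
{
    assert(a != NULL && b != NULL);
    return a->cpu == b->cpu && a->rss == b->rss && a->cs == b->cs;
}

#define MAX_RC_SUBSYS 3  /* stand-in for CONFIG_MAX_RC_SUBSYS */

/* Layout with an array of untyped pointers. */
struct nsproxy_array {
    void *ctlr_data[MAX_RC_SUBSYS];
};

/* Generic loop; unchanged when a controller is added. */
static int array_match(const struct nsproxy_array *a,
                       const struct nsproxy_array *b)
{
    int i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < MAX_RC_SUBSYS; i++)
        if (a->ctlr_data[i] != b->ctlr_data[i])
            return 0;
    return 1;
}
```

The named layout keeps type checking on each pointer, while the array layout lets the matching code stay generic; which matters more is exactly what is being argued in this thread.
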
> > I'm still inclined to think this should be part of /proc, instead of a purely
> > separate fs. But I might be missing something.
>
> A separate filesystem would give us more flexibility, such as
> implementing the multi-hierarchy support described above.

why is the filesystem approach so favored for this
kind of manipulation?

IMHO it is one of the worst interfaces I can imagine
(to move tasks between spaces and/or assign resources)
but yes, I'm aware that filesystems are 'in' nowadays

best,
Herbert

> --
> Regards,
> vatsa
>
>
> References:
>
> 1. http://lkml.org/lkml/2006/09/20/200
> 2. http://lkml.org/lkml/2006/11/6/95
> 3. http://lkml.org/lkml/2006/09/5/178