linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Herbert Poetzl <herbert@13thfloor.at>
To: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	ckrm-tech@lists.sourceforge.net, linux-kernel@vger.kernel.org,
	xemul@sw.ru, pj@sgi.com, winget@google.com,
	containers@lists.osdl.org, akpm@linux-foundation.org,
	menage@google.com
Subject: Re: [PATCH 1/2] rcfs core patch
Date: Fri, 9 Mar 2007 01:48:16 +0100	[thread overview]
Message-ID: <20070309004816.GB4506@MAIL.13thfloor.at> (raw)
In-Reply-To: <20070308101347.GA29051@in.ibm.com>

On Thu, Mar 08, 2007 at 03:43:47PM +0530, Srivatsa Vaddagiri wrote:
> On Wed, Mar 07, 2007 at 08:12:00PM -0700, Eric W. Biederman wrote:
> > The review is still largely happening at the why level but no
> > one is addressing that yet.  So please can we have a why.
> 
> Here's a brief summary of what's happening and why. If its not clear,
> pls get back to us with specific questions.
> 
> There have been various projects attempting to provide resource
> management support in Linux, including CKRM/Resource Groups and UBC.

let me note here, once again, that you forgot Linux-VServer
which does quite non-intrusive resource management ...

> Each had its own task-grouping mechanism. 

the basic 'context' (pid space) is the grouping mechanism
we use for resource management too

> Paul Menage observed [1] that cpusets in the kernel already has a
> grouping mechanism which was working well for cpusets. He went ahead
> and generalized the grouping code in cpusets so that it could be used
> for overall resource management purpose. 

> With his patches, it is possible to even create multiple hierarchies
> of groups (see [2] on why multiple hierarchies) as follows:

do we need or even want that? IMHO the hierarchical
concept CKRM was designed with, was also the reason
for it being slow, unuseable and complicated

> mount -t container -o cpuset none /dev/cpuset	<- cpuset hierarchy
> mount -t container -o mem,cpu none /dev/mem	<- memory/cpu hierarchy
> mount -t container -o disk none /dev/disk	<- disk hierarchy
> 
> In each hierarchy, you can create task groups and manipulate the
> resource parameters of each group. You can also move tasks between
> groups at run-time (see [3] on why this is required). 

> Each hierarchy is also manipulated independent of the other.          

> Paul's patches also introduced a 'struct container' in the kernel,
> which serves these key purposes:
> 
> - Task-grouping
>   'struct container' represents a task-group created in each hierarchy.
>   So every directory created under /dev/cpuset or /dev/mem above will
>   have a corresponding 'struct container' inside the kernel. All tasks
>   pointing to the same 'struct container' are considered to be part of
>   a group
> 
>   The 'struct container' in turn has pointers to resource objects which
>   store actual resource parameters for that group. In above example,
>   'struct container' created under /dev/cpuset will have a pointer to
>   'struct cpuset' while 'struct container' created under /dev/disk will
>   have pointer to 'struct disk_quota_or_whatever'.
> 
> - Maintain hierarchical information
>   The 'struct container' also keeps track of hierarchical relationship
>   between groups.
> 
> The filesystem interface in the patches essentially serves these
> purposes:
> 
> 	- Provide an interface to manipulate task-groups. This includes
> 	  creating/deleting groups, listing tasks present in a group and 
> 	  moving tasks across groups
> 
> 	- Provdes an interface to manipulate the resource objects
> 	  (limits etc) pointed to by 'struct container'.
> 
> As you know, the introduction of 'struct container' was objected
> to and was felt redundant as a means to group tasks. Thats where I
> took a shot at converting over Paul Menage's patch to avoid 'struct
> container' abstraction and insead work with 'struct nsproxy'.

which IMHO isn't a step in the right direction, as
you will need to handle different nsproxies within
the same 'resource container' (see previous email)

> In the rcfs patch, each directory (in /dev/cpuset or /dev/disk) is
> associated with a 'struct nsproxy' instead. The most important need
> of the filesystem interface is not to manipulate the nsproxy objects
> directly, but to manipulate the resource objects (nsproxy->ctlr_data[]
> in the patches) which store information like limit etc.
> 
> > I have a question?  What does rcfs look like if we start with
> > the code that is in the kernel?  That is start with namespaces
> > and nsproxy and just build a filesystem to display/manipulate them?
> > With the code built so it will support adding resource controllers
> > when they are ready?
> 
> If I am not mistaken, Serge did attempt something in that direction,
> only that it was based on Paul's container patches. rcfs can no doubt
> support the same feature.
> 
> > >  	struct ipc_namespace *ipc_ns;
> > >  	struct mnt_namespace *mnt_ns;
> > >  	struct pid_namespace *pid_ns;
> > > +#ifdef CONFIG_RCFS
> > > +	struct list_head list;
> > 
> > This extra list of nsproxy's is unneeded and a performance problem the
> > way it is used.  In general we want to talk about the individual resource
> > controllers not the nsproxy.
> 
> I think if you consider the multiple hierarchy picture, the need
> becomes obvious.
> 
> Lets say that you had these hierarchies : /dev/cpuset, /dev/mem, /dev/disk
> and the various resource classes (task-groups) under them as below:
> 
> 	/dev/cpuset/C1, /dev/cpuset/C1/C11, /dev/cpuset/C2
> 	/dev/mem/M1, /dev/mem/M2, /dev/mem/M3
> 	/dev/disk/D1, /dev/disk/D2, /dev/disk/D3
> 
> The nsproxy structure basically has pointers to a resource objects in
> each of these hierarchies. 
> 
> 	nsproxy { ..., C1, M1, D1} could be one nsproxy
> 	nsproxy { ..., C1, M2, D3} could be another nsproxy and so on
> 
> So you see, because of multi-hierachies, we can have different
> combinations of resource classes.
> 
> When we support task movement across resource classes, we need to find a
> nsproxy which has the right combination of resource classes that the
> task's nsproxy can be hooked to.

no, not necessarily, we can simply create a new one
and give it the proper resource or whatever-spaces

> That's where we need the nsproxy list. Hope this makes it clear.
> 
> > > +	void *ctlr_data[CONFIG_MAX_RC_SUBSYS];
> > 
> > I still don't understand why these pointers are so abstract,
> > and why we need an array lookup into them?
> 
> we can avoid these abstract pointers and instead have a set of pointers
> like this:
> 
> 	struct nsproxy {
> 		...
> 		struct cpu_limit *cpu;	/* cpu control namespace */
> 		struct rss_limit *rss;	/* rss control namespace */
> 		struct cpuset *cs;	/* cpuset namespace */
> 
> 	}
> 
> But that will make some code (like searching for a right nsproxy when a
> task moves across classes/groups) very awkward.
> 
> > I'm still inclined to think this should be part of /proc, instead of a purely
> > separate fs.  But I might be missing something.
> 
> A separate filesystem would give us more flexibility like the
> implementing multi-hierarchy support described above.

why is the filesystem approach so favored for this
kind of manipulations?

IMHO it is one of the worst interfaces I can imagine
(to move tasks between spaces and/or assign resources)
but yes, I'm aware that filesystems are 'in' nowadays

best,
Herbert

> -- 
> Regards,
> vatsa
> 
> 
> References:
> 
> 1. http://lkml.org/lkml/2006/09/20/200 
> 2. http://lkml.org/lkml/2006/11/6/95
> 3. http://lkml.org/lkml/2006/09/5/178
> 
> _______________________________________________
> Containers mailing list
> Containers@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/containers

  reply	other threads:[~2007-03-09  0:48 UTC|newest]

Thread overview: 125+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-01 13:35 [PATCH 0/2] resource control file system - aka containers on top of nsproxy! Srivatsa Vaddagiri
2007-03-01 13:45 ` [PATCH 1/2] rcfs core patch Srivatsa Vaddagiri
2007-03-01 16:31   ` Serge E. Hallyn
2007-03-01 16:46     ` Srivatsa Vaddagiri
2007-03-02  5:06   ` [ckrm-tech] " Balbir Singh
2007-03-03  9:38     ` Srivatsa Vaddagiri
2007-03-08  3:12   ` Eric W. Biederman
2007-03-08  9:10     ` Paul Menage
2007-03-09  0:38       ` Herbert Poetzl
2007-03-09  9:07         ` Kirill Korotaev
2007-03-09 13:29           ` Herbert Poetzl
2007-03-09 17:57         ` Srivatsa Vaddagiri
2007-03-10  1:19           ` Herbert Poetzl
2007-03-11 16:36             ` Serge E. Hallyn
2007-03-12 23:16               ` Herbert Poetzl
2007-03-08 10:13     ` Srivatsa Vaddagiri
2007-03-09  0:48       ` Herbert Poetzl [this message]
2007-03-09  2:35         ` Paul Jackson
2007-03-09  9:23         ` Kirill Korotaev
2007-03-09  9:38           ` Paul Jackson
2007-03-09 13:21           ` Herbert Poetzl
2007-03-11 17:09             ` Kirill Korotaev
2007-03-12 23:00               ` Herbert Poetzl
2007-03-13  8:28                 ` Kirill Korotaev
2007-03-13 13:55                   ` Herbert Poetzl
2007-03-13 14:11                     ` [ckrm-tech] " Srivatsa Vaddagiri
2007-03-13 15:52                       ` Herbert Poetzl
2007-03-09 18:14         ` Srivatsa Vaddagiri
2007-03-09 19:25           ` Paul Jackson
2007-03-10  1:00             ` Herbert Poetzl
2007-03-10  1:31               ` Paul Jackson
2007-03-10  0:56           ` Herbert Poetzl
2007-03-09 16:16       ` Serge E. Hallyn
2007-03-01 13:50 ` [PATCH 2/2] cpu_accounting controller Srivatsa Vaddagiri
2007-03-01 19:39 ` [PATCH 0/2] resource control file system - aka containers on top of nsproxy! Paul Jackson
2007-03-02 15:45   ` Kirill Korotaev
2007-03-02 16:52     ` Andrew Morton
2007-03-02 17:25       ` Kirill Korotaev
2007-03-03 17:45     ` Herbert Poetzl
2007-03-03 21:22       ` Paul Jackson
2007-03-05 17:47         ` Srivatsa Vaddagiri
2007-03-03  9:36   ` Srivatsa Vaddagiri
2007-03-03 10:21     ` Paul Jackson
2007-03-05 17:02       ` Srivatsa Vaddagiri
2007-03-03 17:32     ` Herbert Poetzl
2007-03-05 17:34       ` [ckrm-tech] " Srivatsa Vaddagiri
2007-03-05 18:39         ` Herbert Poetzl
2007-03-06 10:39           ` Srivatsa Vaddagiri
2007-03-06 13:28             ` Herbert Poetzl
2007-03-06 16:21               ` Srivatsa Vaddagiri
2007-03-07  2:32 ` Paul Menage
2007-03-07 17:30   ` Srivatsa Vaddagiri
2007-03-07 17:29     ` Paul Menage
2007-03-07 17:52       ` [ckrm-tech] " Srivatsa Vaddagiri
2007-03-07 17:32     ` Srivatsa Vaddagiri
2007-03-07 17:43     ` Serge E. Hallyn
2007-03-07 17:46       ` Paul Menage
2007-03-07 23:16         ` Eric W. Biederman
2007-03-08 11:39           ` Srivatsa Vaddagiri
2007-03-07 18:00       ` Srivatsa Vaddagiri
2007-03-07 20:58         ` Serge E. Hallyn
2007-03-07 21:20           ` Paul Menage
2007-03-07 21:59             ` Serge E. Hallyn
2007-03-07 22:13               ` Dave Hansen
2007-03-07 23:13                 ` Eric W. Biederman
2007-03-12 14:11               ` [ckrm-tech] " Srivatsa Vaddagiri
2007-03-07 22:32             ` Eric W. Biederman
2007-03-07 23:18               ` Paul Menage
2007-03-08  0:35                 ` Sam Vilain
2007-03-08  0:42                   ` Paul Menage
2007-03-08  0:53                     ` Sam Vilain
2007-03-08  0:58                       ` [ckrm-tech] " Paul Menage
2007-03-08  1:32                         ` Eric W. Biederman
2007-03-08  1:35                           ` Paul Menage
2007-03-08  2:25                             ` Eric W. Biederman
2007-03-09  0:56                             ` Herbert Poetzl
2007-03-09  0:53                           ` Herbert Poetzl
2007-03-09 18:19                             ` Srivatsa Vaddagiri
2007-03-09 19:36                               ` Paul Jackson
2007-03-09 21:52                               ` Herbert Poetzl
2007-03-09 22:06                                 ` Paul Jackson
2007-03-12 14:01                                   ` Srivatsa Vaddagiri
2007-03-12 15:15                                     ` Srivatsa Vaddagiri
2007-03-12 20:26                                     ` Paul Jackson
2007-03-09  4:30                           ` Paul Jackson
2007-03-08  2:47                         ` Sam Vilain
2007-03-08  2:57                           ` Paul Menage
2007-03-08  3:32                             ` Sam Vilain
2007-03-08  6:10                               ` Matt Helsley
2007-03-08  6:44                                 ` Eric W. Biederman
2007-03-09  1:06                                   ` Herbert Poetzl
2007-03-10  9:06                                     ` Sam Vilain
2007-03-11 21:15                                       ` Paul Jackson
2007-03-12  9:35                                         ` Sam Vilain
2007-03-12 10:00                                         ` Paul Menage
2007-03-12 23:21                                           ` Herbert Poetzl
2007-03-13  2:25                                             ` Paul Menage
2007-03-13 15:57                                               ` Herbert Poetzl
2007-03-09  4:37                                 ` Paul Jackson
2007-03-08  6:32                               ` Eric W. Biederman
2007-03-08  9:10                               ` Paul Menage
2007-03-09 16:50                                 ` Serge E. Hallyn
2007-03-22 14:08                                   ` Srivatsa Vaddagiri
2007-03-22 14:39                                     ` Serge E. Hallyn
2007-03-22 14:56                                       ` Srivatsa Vaddagiri
2007-03-09  4:27                       ` Paul Jackson
2007-03-10  8:52                         ` Sam Vilain
2007-03-10  9:11                           ` Paul Jackson
2007-03-09 16:34             ` Srivatsa Vaddagiri
2007-03-09 16:41               ` [ckrm-tech] " Srivatsa Vaddagiri
2007-03-09 22:09               ` Paul Menage
2007-03-10  2:02                 ` Srivatsa Vaddagiri
2007-03-10  3:19                   ` [ckrm-tech] " Srivatsa Vaddagiri
2007-03-12 15:07                 ` Srivatsa Vaddagiri
2007-03-12 15:56                   ` Serge E. Hallyn
2007-03-12 16:20                     ` Srivatsa Vaddagiri
2007-03-12 17:25                       ` Serge E. Hallyn
2007-03-12 21:15                       ` Sam Vilain
2007-03-12 23:31                       ` Herbert Poetzl
2007-03-13  2:22                         ` Srivatsa Vaddagiri
2007-03-08  0:50     ` Sam Vilain
2007-03-08 11:30       ` Srivatsa Vaddagiri
2007-03-09  1:16         ` Herbert Poetzl
2007-03-09 18:41           ` Srivatsa Vaddagiri
2007-03-10  2:03             ` Herbert Poetzl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070309004816.GB4506@MAIL.13thfloor.at \
    --to=herbert@13thfloor.at \
    --cc=akpm@linux-foundation.org \
    --cc=ckrm-tech@lists.sourceforge.net \
    --cc=containers@lists.osdl.org \
    --cc=ebiederm@xmission.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=menage@google.com \
    --cc=pj@sgi.com \
    --cc=vatsa@in.ibm.com \
    --cc=winget@google.com \
    --cc=xemul@sw.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).