linux-kernel.vger.kernel.org archive mirror
From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	linux kernel mailing list <linux-kernel@vger.kernel.org>,
	Libcg Devel Mailing List <libcg-devel@lists.sourceforge.net>,
	Dhaval Giani <dhaval@linux.vnet.ibm.com>,
	Paul Menage <menage@google.com>,
	Peter Zijlstra <pzijlstr@redhat.com>,
	Kazunaga Ikeno <k-ikeno@ak.jp.nec.com>,
	Morton Andrew Morton <akpm@linux-foundation.org>,
	Thomas Graf <tgraf@redhat.com>, Rik Van Riel <riel@redhat.com>
Subject: Re: [RFC] How to handle the rules engine for cgroups
Date: Tue, 08 Jul 2008 15:05:47 +0530
Message-ID: <487334F3.6040000@linux.vnet.ibm.com>
In-Reply-To: <20080703155446.GB9275@redhat.com>

Vivek Goyal wrote:
> On Thu, Jul 03, 2008 at 10:19:57AM +0900, KAMEZAWA Hiroyuki wrote:
>> On Tue, 1 Jul 2008 15:11:26 -0400
>> Vivek Goyal <vgoyal@redhat.com> wrote:
>>
>>> Hi,
>>>
>>> While development is going on for cgroups and various controllers, we also
>>> need a facility so that an admin/user can specify group creation and
>>> also the rules based on which tasks should be placed in their respective
>>> groups. Group creation will be handled by libcg, which is already
>>> under development. We still need to tackle the issue of how to specify
>>> the rules and how these rules are enforced (the rules engine).
>>>
>>> I have gathered a few views on how the rules engine could be
>>> implemented; I am listing them below.
>>>
>>> Proposal 1
>>> ==========
>>> Let a user space daemon handle all of that. The daemon will open a netlink
>>> socket and receive notifications for various kernel events. It will
>>> also parse an admin-specified rules config file and place processes in
>>> the right cgroup based on the rules as events happen.
>>>
>>> I have written a prototype user space program which does that. The
>>> program can be found here; currently it is in very crude shape.
>>>
>>> http://people.redhat.com/vgoyal/misc/rules-engine-daemon/user-id-based-namespaces.patch
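A minimal C sketch of the subscription step in Proposal 1, assuming a Linux system with the proc connector enabled and the kernel's uapi headers installed. This is not the prototype linked above; build_listen_msg(), open_proc_connector() and union listen_buf are hypothetical names. The constants (NETLINK_CONNECTOR, CN_IDX_PROC, CN_VAL_PROC, PROC_CN_MCAST_LISTEN) come from <linux/netlink.h>, <linux/connector.h> and <linux/cn_proc.h>.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/connector.h>
#include <linux/cn_proc.h>

/* Payload: connector header followed by the proc connector op code. */
#define LISTEN_PAYLOAD (sizeof(struct cn_msg) + sizeof(enum proc_cn_mcast_op))

/* Hypothetical buffer type; the nlmsghdr member forces netlink alignment. */
union listen_buf {
	struct nlmsghdr nl;
	unsigned char raw[NLMSG_SPACE(LISTEN_PAYLOAD)];
};

/* Hypothetical helper: build a PROC_CN_MCAST_LISTEN request and return
 * its length in bytes. */
size_t build_listen_msg(union listen_buf *buf, __u32 portid)
{
	const size_t total = NLMSG_LENGTH(LISTEN_PAYLOAD);
	enum proc_cn_mcast_op op = PROC_CN_MCAST_LISTEN;
	struct cn_msg *cn;

	assert(total <= sizeof buf->raw);
	memset(buf, 0, sizeof *buf);
	buf->nl.nlmsg_len = total;
	buf->nl.nlmsg_type = NLMSG_DONE;
	buf->nl.nlmsg_pid = portid;

	cn = NLMSG_DATA(&buf->nl);
	cn->id.idx = CN_IDX_PROC;          /* address the proc connector */
	cn->id.val = CN_VAL_PROC;
	cn->len = sizeof op;
	memcpy(cn->data, &op, sizeof op);  /* "start sending me events" */
	return total;
}

/* Hypothetical helper: open a connector socket, join the proc events
 * multicast group and send the listen request. Returns the fd or -1. */
int open_proc_connector(void)
{
	struct sockaddr_nl addr;
	union listen_buf msg;
	size_t len;
	int fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof addr);
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = CN_IDX_PROC;
	addr.nl_pid = (__u32)getpid();
	len = build_listen_msg(&msg, addr.nl_pid);
	if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
	    send(fd, msg.raw, len, 0) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	return fd;
}
```

After open_proc_connector() succeeds, the daemon would recv() in a loop and decode a struct proc_event from the cn_msg payload of each message. On current kernels the subscription typically requires CAP_NET_ADMIN.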
>>>
>>> Various people have raised two main issues with this approach.
>>>
>>> - netlink is not a reliable protocol.
>>> 	- Messages can be dropped, so events can be lost. That means a
>>> 	  newly forked process might never be placed in the intended group.
>>>
>>> - How to handle delays in rule execution?
>>> 	- For example, if an "exec" happens, then by the time the process is
>>> 	  moved to the right group it might have forked off a few more processes
>>> 	  or done a fair amount of memory allocation, which will be charged to
>>> 	  the wrong group. Or, the newly exec'ed process might get killed in
>>> 	  the existing cgroup for lack of memory (despite the fact that the
>>> 	  destination cgroup has sufficient memory).
>>>
>> Hmm, can't we rework the process event connector to use some reliable,
>> fast interface other than netlink? (I mean an interface like eventpoll.)
>> (Or enhance netlink? ;)
> 
> I see the following text in the netlink man page.
> 
> "However, reliable transmissions from kernel to user are impossible in
>  any case. The kernel can’t send a netlink message if the socket buffer
>  is full: the message will be dropped and the kernel and  the userspace
>  process will no longer have the same view of kernel state. It is up to
>  the application to detect when this  happens  (via  the  ENOBUFS error
>  returned by recvmsg(2)) and resynchronize."
> 
> So at the end of the day, it looks like the unreliability comes from the
> fact that when memory cannot be allocated at that moment, the packet is
> discarded.
> 
> Are there alternatives to dropping packets?
> 
> - Let the sender cache the packet and retry later. Maybe the netlink layer
>   can return an error if the packet cannot be queued, and the connector can
>   cache the event and try sending it later (hopefully by then the memory
>   situation has improved because of OOM, some process exiting, or
>   something else).
> 
>   This looks like a band-aid for temporary congestion problems. It will
>   not help if the consumer is inherently slow and events are generated
>   faster than it can consume them.
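To make the band-aid argument concrete, here is a user-space model of a sender-side retry cache. Everything in it (RETRY_SLOTS, struct retry_cache, deliver()) is hypothetical and models the consumer's socket buffer as a count of free slots rather than using the real connector code. It shows that a bounded cache absorbs a short stall but still drops events once the consumer stays behind.

```c
#include <assert.h>
#include <stddef.h>

#define RETRY_SLOTS 4   /* hypothetical cache size */

/* Illustrative event record; real proc events carry much more. */
struct pending_event { int pid; };

/* Hypothetical FIFO of events the sender could not queue yet. */
struct retry_cache {
	struct pending_event slot[RETRY_SLOTS];
	size_t head, count;
	unsigned long sent, dropped;
};

static void cache_event(struct retry_cache *c, struct pending_event ev)
{
	assert(c->count <= RETRY_SLOTS);
	if (c->count == RETRY_SLOTS) {
		c->dropped++;   /* cache full: the event is lost anyway */
		return;
	}
	c->slot[(c->head + c->count) % RETRY_SLOTS] = ev;
	c->count++;
}

/* Deliver a new event when the consumer has `room` free slots.
 * Cached events go first so ordering is preserved.
 * Returns the room left over. */
size_t deliver(struct retry_cache *c, struct pending_event ev, size_t room)
{
	while (c->count > 0 && room > 0) {
		c->head = (c->head + 1) % RETRY_SLOTS;  /* oldest cached event sent */
		c->count--;
		c->sent++;
		room--;
	}
	if (c->count == 0 && room > 0) {
		c->sent++;      /* nothing pending: send the new event directly */
		return room - 1;
	}
	cache_event(c, ev);
	return room;
}
```

With a four-slot cache and a consumer that accepts nothing for six events, four are parked and two are dropped; once room appears, the parked events drain in order before the new one is sent.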
> 
> This could be one possible enhancement to the connector, but at the end
> of the day any user space daemon will have to accept the fact that
> packets can be dropped, leading to lost events. It can detect that
> situation (using ENOBUFS) and then let the admin know about it (logging).
> I am not sure what the admin is supposed to do after that.
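A sketch of the detection-and-logging step, assuming the daemon saves errno right after each recv()/recvmsg() call. classify_recv() and handle_lost_events() are hypothetical names; the resync hook (for example, rescanning /proc and re-applying every rule) is an assumption about what "resynchronize" could mean for a rules engine, not something the man page prescribes.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

enum recv_outcome { RECV_DATA, RECV_RETRY, RECV_LOST_EVENTS, RECV_FATAL };

/* Hypothetical: classify a recv()/recvmsg() result on the netlink socket.
 * `n` is the return value, `err` the errno saved right after the call. */
enum recv_outcome classify_recv(ssize_t n, int err)
{
	if (n > 0)
		return RECV_DATA;
	if (n == 0)
		return RECV_RETRY;            /* empty datagram: nothing to do */
	switch (err) {
	case EINTR:
	case EAGAIN:
		return RECV_RETRY;
	case ENOBUFS:
		return RECV_LOST_EVENTS;      /* socket buffer overran: events dropped */
	default:
		return RECV_FATAL;
	}
}

/* Hypothetical recovery hook, e.g. walk /proc and re-apply all rules. */
typedef void (*resync_fn)(void);

/* Count the overrun, log it for the admin, then optionally resync. */
void handle_lost_events(unsigned long *lost, resync_fn resync)
{
	assert(lost != NULL);
	(*lost)++;
	fprintf(stderr, "rules engine: netlink overrun (ENOBUFS), resyncing\n");
	if (resync != NULL)
		resync();
}
```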
> 
> I am CCing Thomas Graf. He might have a better idea of netlink's
> limitations and whether there is a way to overcome them.
> 

One thing we did with the delay accounting framework was to add the ability for
clients to listen on a per-cpu basis; that helped us scale well (user space
buffers are per client and, in turn, per cpu).
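A sketch of how a rules daemon might borrow the per-cpu idea, assuming an interface along the lines of taskstats, which accepts a cpulist string (such as "1-3,5") in the TASKSTATS_CMD_ATTR_REGISTER_CPUMASK attribute. The genetlink plumbing is omitted, and listener_cpulist() with its even-split policy is hypothetical: it only computes which contiguous CPU range listener k of n would register for.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: give listener `k` of `nlisteners` a contiguous share of
 * `ncpus` CPUs and write it as a cpulist string such as "2-3".
 * Returns 0 on success, -1 on bad arguments or a too-small buffer. */
int listener_cpulist(int k, int nlisteners, int ncpus, char *out, size_t outlen)
{
	int per, extra, first, last, n;

	assert(out != NULL || outlen == 0);
	if (nlisteners <= 0 || k < 0 || k >= nlisteners || ncpus < nlisteners)
		return -1;
	per = ncpus / nlisteners;
	extra = ncpus % nlisteners;
	/* The first `extra` listeners each take one additional CPU. */
	first = k * per + (k < extra ? k : extra);
	last = first + per - 1 + (k < extra ? 1 : 0);
	if (first == last)
		n = snprintf(out, outlen, "%d", first);
	else
		n = snprintf(out, outlen, "%d-%d", first, last);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Each listener would then hold its own socket for its CPU range, so a slow listener overruns only its own buffer rather than a single shared one.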

>> Because the "a child inherits its parent's group" rule is very strong, I
>> think the number of events we actually have to check is much smaller than
>> the number we get reported. Can't we add some filter/assumption here?
>>
> 
> I am not sure whether the proc connector currently allows filtering of
> events like fork, exec, exit etc. From a quick look, it does not. But that
> can probably be worked out. Even then, it will only help reduce the number
> of messages queued for user space on that socket; it will not remove the
> fact that messages can be dropped under memory pressure.
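A user-space version of the filter suggested above, assuming the event codes from <linux/cn_proc.h>. It does not reduce what the kernel queues on the socket (so drops remain possible); it only skips rule evaluation for events that cannot change a task's group under a "child inherits parent" policy. needs_classification() is a hypothetical name, and treating UID/GID changes as relevant assumes uid/gid-based rules exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <linux/cn_proc.h>

/* Hypothetical filter on proc_event.what: a forked child inherits its
 * parent's cgroup, so only events that may change which rule matches
 * need a rule lookup. */
bool needs_classification(unsigned int what)
{
	switch (what) {
	case PROC_EVENT_EXEC:   /* new executable: exe-based rules may match */
	case PROC_EVENT_UID:    /* credential change: uid/gid rules may match */
	case PROC_EVENT_GID:
		return true;
	default:                /* FORK (inherit), EXIT, others: nothing to do */
		return false;
	}
}
```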
> 
>> BTW, isn't the placement of proc_exec_connector() too late? It seems the
>> memory for creating the exec image is charged to the original group...
>>
> 
> As of today that is what happens, because the newly exec'ed process runs in
> the same cgroup as its parent. But that's probably what we need to avoid.
> For example, if an admin has created three cgroups, "database", "browser"
> and "others", and a user launches "firefox" from a shell (assuming the
> shell is originally running in the "others" cgroup), then any memory
> allocation for firefox should be charged to the "browser" cgroup and not
> to "others".
> 
> I am assuming that this will be a requirement for enterprise-class
> systems. It would be good to hear about the experiences of people who are
> already doing some kind of workload management.
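A minimal sketch of the exec-time rule lookup for the firefox example, assuming exact-path rules and a cgroup v1 hierarchy mounted at /cgroup/memory. Both assumptions are hypothetical, as are classify_exe(), move_to_cgroup() and the mysqld path; moving a task by writing its PID to the group's tasks file is the cgroup v1 interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical rule: executable path -> destination cgroup name. */
struct exe_rule { const char *exe; const char *cgroup; };

static const struct exe_rule rules[] = {
	{ "/usr/bin/firefox", "browser" },
	{ "/usr/sbin/mysqld", "database" },   /* illustrative path */
};

/* Return the cgroup for an executable, defaulting to "others". */
const char *classify_exe(const char *exe)
{
	assert(exe != NULL);
	for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
		if (strcmp(rules[i].exe, exe) == 0)
			return rules[i].cgroup;
	return "others";
}

/* Move a task by writing its PID to the group's tasks file.
 * The /cgroup/memory mount point is an assumption. */
int move_to_cgroup(pid_t pid, const char *cgroup)
{
	char path[256];
	FILE *f;
	int ok;
	int n = snprintf(path, sizeof path, "/cgroup/memory/%s/tasks", cgroup);

	if (n < 0 || (size_t)n >= sizeof path)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = fprintf(f, "%d\n", (int)pid) > 0;
	/* fclose() flushes, so a rejected write surfaces here too. */
	return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

Because a daemon like this runs only after the exec notification arrives, memory allocated before move_to_cgroup() completes is still charged to "others"; that is the delay problem raised earlier in the thread.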

CKRM had a kernel module for rule-based classification, called the rule-based
classification engine (RBCE). We should consider a simple cgroups client that
can share a rules database supplied from user space and use the fork callback
for classification.
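A user-space model of the decision such a fork callback would make, assuming a uid-keyed rules database. struct uid_rule and classify_at_fork() are hypothetical, and RBCE's actual rule format is not reproduced; real code would live in a cgroup subsystem's fork hook inside the kernel, so this only shows the lookup-then-inherit policy.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical entry in a rules database shared from user space. */
struct uid_rule { uid_t uid; int group; };

/* Decide the child's group at fork time: a matching rule wins,
 * otherwise the child inherits its parent's group. */
int classify_at_fork(const struct uid_rule *db, size_t n,
                     uid_t child_uid, int parent_group)
{
	assert(db != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (db[i].uid == child_uid)
			return db[i].group;
	return parent_group;
}
```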

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


