From: Vivek Goyal <vgoyal@redhat.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: linux kernel mailing list <linux-kernel@vger.kernel.org>,
	Libcg Devel Mailing List <libcg-devel@lists.sourceforge.net>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Dhaval Giani <dhaval@linux.vnet.ibm.com>,
	Paul Menage <menage@google.com>,
	Peter Zijlstra <pzijlstr@redhat.com>,
	Kazunaga Ikeno <k-ikeno@ak.jp.nec.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Graf <tgraf@redhat.com>, Rik Van Riel <riel@redhat.com>
Subject: Re: [RFC] How to handle the rules engine for cgroups
Date: Thu, 3 Jul 2008 11:54:46 -0400
Message-ID: <20080703155446.GB9275@redhat.com>
In-Reply-To: <20080703101957.b3856904.kamezawa.hiroyu@jp.fujitsu.com>

On Thu, Jul 03, 2008 at 10:19:57AM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 1 Jul 2008 15:11:26 -0400
> Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > Hi,
> > 
> > While development is going on for cgroups and the various controllers, we
> > also need a facility so that an admin/user can specify the group creation
> > and also specify the rules based on which tasks should be placed in the
> > respective groups. The group creation part will be handled by libcg, which
> > is already under development. We still need to tackle the issue of how to
> > specify the rules and how these rules are enforced (the rules engine).
> > 
> > I have gathered a few views with regard to how the rules engine could
> > be implemented; I am listing them below.
> > 
> > Proposal 1
> > ==========
> > Let a user space daemon handle all of that. The daemon will open a
> > netlink socket and receive notifications for various kernel events. It
> > will also parse an admin-specified rules config file and place
> > processes in the right cgroup, based on those rules, as and when
> > events happen.
> > 
> > I have written a prototype user space program which does that. The
> > program can be found here; currently it is in very crude shape.
> > 
> > http://people.redhat.com/vgoyal/misc/rules-engine-daemon/user-id-based-namespaces.patch
> > 
> > Various people have raised two main issues with this approach.
> > 
> > - netlink is not a reliable protocol.
> > 	- Messages can be dropped and one can lose messages. That means a
> > 	  newly forked process might never go into the right group as
> > 	  intended.
> > 
> > - How to handle delays in rule execution?
> > 	- For example, if an "exec" happens, then by the time the process
> > 	  is moved to the right group it might have forked off a few more
> > 	  processes or done quite some amount of memory allocation, which
> > 	  will be charged to the wrong group. Or, the newly exec'ed
> > 	  process might get killed in the existing cgroup because of lack
> > 	  of memory (despite the fact that the destination cgroup has
> > 	  sufficient memory).
> > 
> Hmm, can't we rework the process event connector to use some reliable,
> fast interface besides netlink? (I mean an interface like eventpoll.)
> (Or enhance netlink? ;)

I see the following text in the netlink man page.

"However, reliable transmissions from kernel to user are impossible in
 any case. The kernel can’t send a netlink message if the socket buffer
 is full: the message will be dropped and the kernel and  the userspace
 process will no longer have the same view of kernel state. It is up to
 the application to detect when this  happens  (via  the  ENOBUFS error
 returned by recvmsg(2)) and resynchronize."

So at the end of the day, the unreliability comes from the fact that the
kernel cannot buffer the message at that moment (socket buffer full, or
no memory to allocate), so it discards the packet.

Are there alternatives to dropping packets?

- Let the sender cache the packet and retry later. The netlink layer
  could return an error when a packet cannot be queued, and the connector
  could cache the event and try sending it again later (hopefully the
  memory situation improves in the meantime because the OOM killer freed
  something, some process exited, or something else...). A crude sketch
  of the idea follows below.

  This looks like a band-aid for temporary congestion kind of problems;
  it will not help if the consumer is inherently slow and event
  generation is faster.
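
Roughly, on the connector side the caching could look like the following.
This is only a crude sketch, not actual connector code: cn_netlink_send()
is the real send helper, but the caching structures and names here are
made up for illustration, and a real version would also have to preserve
event ordering while entries are pending.

#include <linux/connector.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

/* Copy of an event that cn_netlink_send() failed to deliver. */
struct pending_event {
	struct list_head list;
	struct cn_msg msg;		/* followed by msg.len data bytes */
};

static LIST_HEAD(pending_events);
static DEFINE_SPINLOCK(pending_lock);

static void retry_pending(struct work_struct *work);
static DECLARE_DELAYED_WORK(retry_work, retry_pending);

static void send_or_cache(struct cn_msg *msg)
{
	struct pending_event *ev;

	if (cn_netlink_send(msg, CN_IDX_PROC, GFP_ATOMIC) >= 0)
		return;

	/* Receiver congested or no memory for the skb: cache a copy.
	 * (A real version should not cache -ESRCH, i.e. "no listener".) */
	ev = kmalloc(sizeof(*ev) + msg->len, GFP_ATOMIC);
	if (!ev)
		return;		/* still lossy under real memory pressure */
	memcpy(&ev->msg, msg, sizeof(*msg) + msg->len);

	spin_lock(&pending_lock);
	list_add_tail(&ev->list, &pending_events);
	spin_unlock(&pending_lock);

	schedule_delayed_work(&retry_work, HZ);	/* retry in a second */
}

static void retry_pending(struct work_struct *work)
{
	struct pending_event *ev, *tmp;

	spin_lock(&pending_lock);
	list_for_each_entry_safe(ev, tmp, &pending_events, list) {
		if (cn_netlink_send(&ev->msg, CN_IDX_PROC, GFP_ATOMIC) < 0)
			break;		/* still congested, keep the rest */
		list_del(&ev->list);
		kfree(ev);
	}
	if (!list_empty(&pending_events))
		schedule_delayed_work(&retry_work, HZ);
	spin_unlock(&pending_lock);
}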

This could be one possible enhancement to the connector, but at the end
of the day, any kind of user space daemon will have to accept the fact
that packets can be dropped, leading to lost events. The daemon can
detect that situation (via the ENOBUFS error) and then let the admin
know about it through logging; I am not sure what the admin is supposed
to do after that, though.
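
For completeness, the detection on the daemon side would look roughly
like this (a sketch; socket setup and the actual resynchronization are
omitted, and handle_event() is a placeholder):

/* Receive proc connector events; detect lost messages via ENOBUFS. */
#include <errno.h>
#include <syslog.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/connector.h>
#include <linux/cn_proc.h>

static void handle_event(struct proc_event *ev);

static void recv_loop(int sock)
{
	char buf[4096];
	struct nlmsghdr *nlh;
	int len;

	for (;;) {
		len = recv(sock, buf, sizeof(buf), 0);
		if (len < 0) {
			if (errno == ENOBUFS) {
				/* The kernel dropped messages: events were
				 * lost, so log it and resynchronize (e.g.
				 * rescan /proc) instead of trusting our
				 * current view of the system. */
				syslog(LOG_WARNING, "proc events lost");
				continue;
			}
			if (errno == EINTR)
				continue;
			return;
		}
		for (nlh = (struct nlmsghdr *)buf; NLMSG_OK(nlh, len);
		     nlh = NLMSG_NEXT(nlh, len)) {
			struct cn_msg *cn = NLMSG_DATA(nlh);

			handle_event((struct proc_event *)cn->data);
		}
	}
}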

I am CCing Thomas Graf. He might have a better idea of the netlink
limitations and whether there is a way to overcome them.

> 
> Because "a child inherits parent's" rule is very strong, I think the amount
> of events we have to check is much less than we get report. Can't we add some
> filter/assumption here ?
> 

I am not sure whether the proc connector currently allows filtering of
the various events like fork, exec, exit etc. From a quick look it seems
it does not, but that could probably be worked out. Even then, it would
only help reduce the number of messages queued for user space on that
socket; it would not take away the fact that messages can be dropped
under memory pressure.
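
Even so, a daemon can at least discard the uninteresting events cheaply
once they arrive. A handle_event() like the placeholder in the sketch
above might do something along these lines (apply_rules() is a
hypothetical rules-engine hook). Note this only saves processing; to
save socket buffer space, the filtering would have to happen in the
kernel before the message is queued.

/* Act only on fork/exec/exit; ignore everything else in user space. */
static void handle_event(struct proc_event *ev)
{
	switch (ev->what) {
	case PROC_EVENT_FORK:
	case PROC_EVENT_EXEC:
	case PROC_EVENT_EXIT:
		apply_rules(ev);	/* hypothetical rules-engine hook */
		break;
	default:
		break;			/* uid/gid changes etc.: not needed */
	}
}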

> BTW, isn't the placement of proc_exec_connector() too late? It seems the
> memory for creating the exec image is charged to the original group...
> 

As of today that is what happens, because the newly exec'ed process runs
in the same cgroup as its parent. But that is probably what we need to
avoid. For example, if an admin has created three cgroups "database",
"browser" and "others", and a user launches "firefox" from a shell
(assuming the shell originally runs in the "others" cgroup), then any
memory allocation for firefox should come from the "browser" cgroup and
not from "others".

I am assuming that this will be a requirement for enterprise class
systems. It would be good to know the experiences of people who are
already doing some kind of workload management.

Thanks
Vivek

