From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: linux kernel mailing list <linux-kernel@vger.kernel.org>,
Libcg Devel Mailing List <libcg-devel@lists.sourceforge.net>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Dhaval Giani <dhaval@linux.vnet.ibm.com>,
Paul Menage <menage@google.com>,
Peter Zijlstra <pzijlstr@redhat.com>,
Kazunaga Ikeno <k-ikeno@ak.jp.nec.com>,
Morton Andrew Morton <akpm@linux-foundation.org>,
Thomas Graf <tgraf@redhat.com>, Rik Van Riel <riel@redhat.com>
Subject: Re: [RFC] How to handle the rules engine for cgroups
Date: Fri, 4 Jul 2008 09:34:16 +0900
Message-ID: <20080704093416.ed3d1951.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20080703155446.GB9275@redhat.com>
On Thu, 3 Jul 2008 11:54:46 -0400
Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Jul 03, 2008 at 10:19:57AM +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 1 Jul 2008 15:11:26 -0400
> > Vivek Goyal <vgoyal@redhat.com> wrote:
> > > - How to handle delays in rule execution?
> > > - For example, if an "exec" happens, then by the time the process is moved to
> > > the right group, it might have forked off a few more processes or might
> > > have done a significant amount of memory allocation, which will be
> > > charged to the wrong group. Or, the newly exec'd process might get
> > > killed in the existing cgroup because of lack of memory (despite the
> > > fact that the destination cgroup has sufficient memory).
> > >
> > Hmm, can't we rework the process event connector to use some reliable,
> > fast interface other than netlink? (I mean an interface like eventpoll.)
> > (Or enhance netlink? ;)
>
> I see following text in netlink man page.
>
> "However, reliable transmissions from kernel to user are impossible in
> any case. The kernel can’t send a netlink message if the socket buffer
> is full: the message will be dropped and the kernel and the userspace
> process will no longer have the same view of kernel state. It is up to
> the application to detect when this happens (via the ENOBUFS error
> returned by recvmsg(2)) and resynchronize."
>
> So at the end of the day, it looks like the unreliability comes from the
> fact that if we cannot allocate memory at that moment, we discard the
> packet.
>
> Are there alternatives as compared to dropping packets?
>
If it's just a problem of memory allocation, preallocate the socket buffer and
use it later, as radix_tree_preload() does.
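==
int foo(void)
{
	/* reserve the skb before changing any state */
	if (preallocate())
		return -ENOBUFS;
	.......
	/* send the event using the preallocated skb */
	proc_xxxx_connector();
	return 0;
}
==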
(This means setuid() could return -ENOBUFS, which is an undocumented error code for it.)
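A minimal userspace sketch of the preload pattern, under stated assumptions: the reserve step is the only place that can fail, and it runs before any state changes, so the notification step itself never allocates. Every name here (event_preload(), event_notify(), do_setuid_like_change(), the single global buffer) is hypothetical and only stands in for the kernel pieces; it is not how the connector is actually implemented.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message buffer standing in for a preallocated skb. */
struct event_buf {
	char data[64];
	size_t len;
};

/* One reserved buffer; a kernel version would keep this per-CPU. */
static struct event_buf *preloaded;

/* Step 1: reserve memory up front. Returns 0 or -ENOBUFS. */
static int event_preload(void)
{
	if (preloaded)
		return 0;
	preloaded = malloc(sizeof(*preloaded));
	return preloaded ? 0 : -ENOBUFS;
}

/* Step 3: consume the reserved buffer; no allocation happens here. */
static struct event_buf *event_notify(const char *msg)
{
	struct event_buf *buf = preloaded;

	preloaded = NULL;
	buf->len = strlen(msg);
	if (buf->len >= sizeof(buf->data))
		buf->len = sizeof(buf->data) - 1;
	memcpy(buf->data, msg, buf->len);
	buf->data[buf->len] = '\0';
	return buf;	/* caller "sends" it and frees it */
}

/* Hypothetical stand-in for setuid(): fail before changing any state. */
static int current_uid = 1000;

static int do_setuid_like_change(int new_uid, struct event_buf **out)
{
	int err = event_preload();

	if (err)
		return err;	/* the -ENOBUFS case noted above */
	current_uid = new_uid;	/* Step 2: commit the change */
	*out = event_notify("uid changed");
	return 0;
}
```

Moving the allocation in front of the commit means the change and its notification either both happen or neither does; the price is the extra error return from the syscall.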
But the af_netlink layer has other causes of dropped packets:
1. failure to copy the skb at broadcast time;
2. receive buffer overrun.
(2) cannot be avoided in the kernel.
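Since (2) cannot be avoided, the listener has to notice it. A rough sketch of how a userspace rules daemon might react to recv() results on its netlink socket, assuming "resynchronize" means rescanning all tasks. resync_all_tasks() and the recv_action enum are hypothetical; ENOBUFS on overrun is the behaviour described in the netlink man page quoted above.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

enum recv_action { RECV_OK, RECV_RETRY, RECV_RESYNC, RECV_FATAL };

/* Map the result of recv()/recvmsg() on a netlink socket to an action. */
static enum recv_action classify_recv(ssize_t n, int err)
{
	if (n >= 0)
		return RECV_OK;
	switch (err) {
	case EINTR:
	case EAGAIN:
		return RECV_RETRY;
	case ENOBUFS:
		/* receive buffer overran: events were lost, our view is stale */
		return RECV_RESYNC;
	default:
		return RECV_FATAL;
	}
}

/* Hypothetical hook: rescan every process and re-apply the rules. */
static int resyncs;

static void resync_all_tasks(void)
{
	resyncs++;
	fprintf(stderr, "netlink overrun: events lost, rescanning tasks\n");
}

/* Read one batch; returns 0 to keep going, -1 on a fatal error. */
static int read_events(int fd, char *buf, size_t len)
{
	ssize_t n = recv(fd, buf, len, 0);

	switch (classify_recv(n, n < 0 ? errno : 0)) {
	case RECV_OK:
		/* parse n bytes of netlink messages here */
		return 0;
	case RECV_RETRY:
		return 0;
	case RECV_RESYNC:
		resync_all_tasks();
		return 0;
	default:
		return -1;
	}
}
```

Logging inside the resync hook covers the "let the admin know" part; the rescan is one possible answer to what should happen after that.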
> - Let the sender cache the packet and retry later. So maybe the netlink layer
> can return an error if the packet cannot be queued, and the connector can cache the
> event and try sending it later. (Hopefully by then the memory situation will have
> improved because of the OOM killer, some process exiting, or something else...)
>
> This looks like a band-aid for temporary congestion
> problems. It will not help if the consumer is inherently slow and
> events are generated faster than it can consume them.
>
> This could be one possible enhancement to the connector, but at the end
> of the day, any kind of user space daemon will have to accept the fact
> that packets can be dropped, leading to lost events. It can detect that situation
> (using ENOBUFS) and then let the admin know about it (logging). I am not sure
> what the admin is supposed to do after that.
>
Neither am I ;)
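The sender-side cache-and-retry idea quoted above could look roughly like this bounded queue. It is a sketch only: struct event, send_fn and toy_send() are hypothetical stand-ins, not connector code. The cache is itself finite, so once it fills, events are dropped and merely counted, which is the inherently-slow-consumer case described above.

```c
#include <stdbool.h>
#include <stddef.h>

#define PENDING_MAX 4

struct event { int pid; int what; };

struct pending_queue {
	struct event ev[PENDING_MAX];
	size_t head, count;
	unsigned long dropped;	/* events lost because the cache was full */
};

/* Hypothetical transport: returns false when the message cannot be queued. */
typedef bool (*send_fn)(const struct event *ev);

static void cache_event(struct pending_queue *q, const struct event *ev)
{
	if (q->count == PENDING_MAX) {
		q->dropped++;
		return;
	}
	q->ev[(q->head + q->count) % PENDING_MAX] = *ev;
	q->count++;
}

/* Flush cached events in order, stopping at the first failure. */
static size_t flush_pending(struct pending_queue *q, send_fn try_send)
{
	size_t sent = 0;

	while (q->count && try_send(&q->ev[q->head])) {
		q->head = (q->head + 1) % PENDING_MAX;
		q->count--;
		sent++;
	}
	return sent;
}

/* Send a new event, keeping it ordered behind anything already cached. */
static void emit_event(struct pending_queue *q, const struct event *ev,
		       send_fn try_send)
{
	flush_pending(q, try_send);
	if (q->count || !try_send(ev))
		cache_event(q, ev);
}

/* Toy transport for illustration: fails while the "link" is congested. */
static bool link_up;
static int delivered;

static bool toy_send(const struct event *ev)
{
	(void)ev;
	if (!link_up)
		return false;
	delivered++;
	return true;
}
```

Retrying before every new send preserves event order, which matters for a rules engine (an exec must not be applied before the fork that created the task).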
> I am CCing Thomas Graf. He might have a better idea of netlink's limitations
> and whether there is a way to overcome them.
>
> >
> > Because the "a child inherits its parent's group" rule is very strong, I think the number
> > of events we actually have to check is much smaller than the number reported. Can't we add some
> > filter/assumption here?
> >
>
> I am not sure if the proc connector currently allows filtering of various
> events like fork, exec, exit etc. At a quick look, it does not seem to.
> But that can probably be worked out. Even then, it will only
> help reduce the number of messages queued for user space on that socket
> but will not remove the fact that messages can be dropped under
> memory pressure.
>
agreed.
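The filter suggested above could reduce to a predicate like the following, assuming rules match only on the executable and on uid/gid. The event kinds are a hypothetical enum modelled on the fork/exec/uid/gid/exit events the proc connector reports. Running it in the daemon does not reduce socket traffic; to shrink the queue, the same test would have to be applied before the message is sent.

```c
#include <stdbool.h>

/* Hypothetical event kinds, modelled on what the proc connector reports. */
enum proc_event_kind { EV_FORK, EV_EXEC, EV_UID, EV_GID, EV_EXIT };

/*
 * A forked child starts in its parent's cgroup, which the rules engine
 * has already chosen, so fork (and exit) need no lookup. Only events that
 * change what the rules match on (the executable, or the uid/gid) do.
 */
static bool needs_rule_check(enum proc_event_kind kind)
{
	switch (kind) {
	case EV_EXEC:
	case EV_UID:
	case EV_GID:
		return true;
	case EV_FORK:
	case EV_EXIT:
	default:
		return false;
	}
}

/* How many events in a trace would actually reach the rule matcher. */
static unsigned count_checks(const enum proc_event_kind *ev, unsigned n)
{
	unsigned i, checks = 0;

	for (i = 0; i < n; i++)
		if (needs_rule_check(ev[i]))
			checks++;
	return checks;
}
```

A fork-heavy workload (shell scripts, build jobs) is where this pays off most, since most of its events are forks and exits.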
> > BTW, isn't proc_exec_connector() placed too late? It seems the memory for
> > creating the exec image is charged to the original group...
> >
>
> As of today that is what happens, because the newly exec'd process runs in the
> same cgroup as its parent. But that's probably what we need to avoid.
I think so.
> For example, if an admin has created three cgroups, "database", "browser"
> and "others", and a user launches "firefox" from a shell (assuming the shell is
> running in the "others" cgroup), then any memory allocation for firefox should
> come from the "browser" cgroup and not from "others".
>
yes.
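The firefox example amounts to a rule lookup like the one sketched below. The rule table, the "postgres" entry and the "/cgroups" mount point are illustrative assumptions; the "tasks" file is the cgroup interface where writing a PID moves that task. The sketch only decides where the task goes; if it runs in a daemon after the exec event, the exec-time allocations are still charged to "others", which is the placement problem raised above.

```c
#include <stdio.h>
#include <string.h>

struct exec_rule {
	const char *exe;	/* basename of the executed binary */
	const char *cgroup;	/* destination group */
};

/* Hypothetical admin policy for the example above. */
static const struct exec_rule rules[] = {
	{ "firefox",  "browser"  },
	{ "postgres", "database" },
};

static const char *default_cgroup = "others";

static const char *basename_of(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}

/* First matching rule wins; unmatched programs fall back to the default. */
static const char *match_cgroup(const char *exe_path)
{
	const char *name = basename_of(exe_path);
	size_t i;

	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
		if (strcmp(rules[i].exe, name) == 0)
			return rules[i].cgroup;
	return default_cgroup;
}

/* Build "<mnt>/<group>/tasks"; returns -1 if it does not fit in buf. */
static int tasks_path(char *buf, size_t len, const char *mnt,
		      const char *exe_path)
{
	int n = snprintf(buf, len, "%s/%s/tasks", mnt, match_cgroup(exe_path));

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```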
> I am assuming that this will be a requirement for enterprise-class
> systems. It would be good to hear about the experiences of people who are already
> doing some kind of workload management.
>
Thanks,
-Kame
> Thanks
> Vivek
>