From mboxrd@z Thu Jan 1 00:00:00 1970
From: Johannes Weiner
Subject: Re: [RFD] Merge task counter into memcg
Date: Wed, 18 Apr 2012 12:39:30 +0200
Message-ID: <20120418103930.GA1771__34574.683258734$1334745621$gmane$org@cmpxchg.org>
References: <20120412153055.GL1787@cmpxchg.org>
 <20120412163825.GB13069@google.com>
 <20120412172309.GM1787@cmpxchg.org>
 <20120412174155.GC13069@google.com>
 <4F878480.60505@jp.fujitsu.com>
 <20120417154117.GE32402@google.com>
 <4F8D9FC4.3080800@parallels.com>
 <4F8E646B.1020807@jp.fujitsu.com>
 <4F8E7E76.3020202@jp.fujitsu.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <4F8E7E76.3020202-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
Sender: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: KAMEZAWA Hiroyuki
Cc: "Daniel P. Berrange", Frederic Weisbecker, Containers, Daniel Walsh,
 Hugh Dickins, LKML, Tejun Heo, Cgroups, Andrew Morton
List-Id: containers.vger.kernel.org

On Wed, Apr 18, 2012 at 05:42:30PM +0900, KAMEZAWA Hiroyuki wrote:
> (2012/04/18 16:53), Frederic Weisbecker wrote:
>
> > 2012/4/18 KAMEZAWA Hiroyuki :
> >> (2012/04/18 1:52), Glauber Costa wrote:
> >>
> >>>
> >>>>> In short, I don't think it's better to have task-counting and fd-counting in memcg.
> >>>>> It's kmem, but it's more than that, I think.
> >>>>> Please provide subsys like ulimit.
> >>>>
> >>>> So, you think that while kmem would be enough to prevent fork-bombs,
> >>>> it would still make sense to limit in more traditional ways
> >>>> (ie. ulimit style object limits). Hmmm....
> >>>>
> >>>
> >>> I personally think this is namespaces business, not cgroups.
> >>> If you have a process namespace, an interface that works to limit the
> >>> number of processes should keep working given the constraints you are
> >>> given.
> >>>
> >>> What doesn't make sense, is to create a *new* interface to limit
> >>> something that doesn't really need to be limited, just because you
> >>> limited a similar resource before.
> >>>
> >>
> >>
> >> Ok, limitiing forkbomb is unnecessary. ulimit+namespace should work.
> >> What we need is user-id namespace, isn't it ? If we have that, ulimit
> >> works enough fine, no overheads.
> >
> > I have considered using NR_PROC rlimit on top of user namespaces to
> > fight forkbombs inside a container.
> > ie: one user namespace per container with its own rlimit.
> >
> > But it doesn't work because we can have multiuser apps running in a
> > single container.
> >
>
> Ok, then, requirements is different from ulimit. ok, please forget my words.
>
> My concern for using 'kmem' is that size of object can be changed, and set up
> may be more complicated than limiting 'number' of tasks.
> It's very architecture dependent....But hmm...

BECAUSE it is architecture/kernel version/runtime dependent how big a
task really is, limiting available kernel memory is much more
meaningful than limiting a container to a number of units of unknown
and dynamically changing size. How could this argument ever work IN
FAVOR of limiting the number of tasks?

> If slab accounting can handle task_struct accounting, all you wants can be
> done by it (maybe). And implementation can be duplicated.
> (But another aspect of the problem will be speed of development..)
>
> One idea is (I'm not sure good or bad)...having following control files.
>
> - memory.kmem.task_struct.limit_in_bytes
> - memory.kmem.task_struct.usage_in_bytes
> - memory.kmem.task_struct.size_in_bytes # size of task struct.

A task's memory impact is not just its task_struct.

> At 1st, implement this by accounting task struct(or some) directly.
> Later, if we can, replace the implementation with slab(kmem) cgroup..
> and unify interfaces.....a long way to go.
>
> 2nd idea is
>
> - memory.object.task.limit_in_number # limit the number of tasks.
> - memory.object.task.usage_in_number # usage
>
> If I'm a user, I prefer #2.

The memory controller is there to partition physical memory. This is
usually measured in bytes and that's why the user-visible object size
in the memory controller is a byte. When you add other types of
objects, you force the user to know about them and give them a method
of knowing the object size in bytes, which in case of a task, can vary
at runtime. I will agree to this interface the moment I can buy RAM
whose quantity is measured in number of tasks.

> Hmm,
> global kmem limiting           -> done by bytes.
> special kernel object limiting -> done by the number of objects.
>
> is...complicated ?

Yes, and you don't provide any arguments! What are you trying to do
that would make limiting the number of tasks a useful mechanism? Why
should some kernel objects be special?
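
To make the point about units concrete, here is a minimal sketch in
Python. It is not kernel code and was not part of the original thread:
the function names and the per-task footprint figures (16 KiB, 64 KiB)
are hypothetical, chosen only to show that a fixed task count bounds a
different number of bytes depending on how big a task happens to be,
while a byte limit bounds the memory directly.

```python
# Illustrative only: not kernel code. All sizes are made-up assumptions.

KIB = 1024


def worst_case_bytes_under_task_limit(max_tasks, footprint_bytes):
    """Memory that a pure task-count limit still allows to be consumed."""
    return max_tasks * footprint_bytes


def tasks_admitted_under_byte_limit(limit_bytes, footprints):
    """Charge each task's footprint against a byte limit, in order.

    Returns (number of tasks admitted, bytes charged). The first task
    whose charge would exceed the limit is refused, like a failed fork.
    """
    used = 0
    admitted = 0
    for size in footprints:
        if used + size > limit_bytes:
            break
        used += size
        admitted += 1
    return admitted, used


# Same limit of 1000 tasks, two hypothetical per-task footprints.
small = worst_case_bytes_under_task_limit(1000, 16 * KIB)
large = worst_case_bytes_under_task_limit(1000, 64 * KIB)
print("task limit 1000:", small, "bytes vs", large, "bytes")

# A 16 MiB byte limit holds no matter how large each task turns out to be.
limit = 16 * 1024 * KIB
admitted, used = tasks_admitted_under_byte_limit(limit, [64 * KIB] * 1000)
print("byte limit:", admitted, "tasks admitted,", used, "bytes charged")
```

In this sketch the same task limit permits four times as much memory
when the assumed footprint grows from 16 KiB to 64 KiB, whereas the
byte limit never lets the charged memory exceed the configured value.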