From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753645AbXCPUFf (ORCPT ); Fri, 16 Mar 2007 16:05:35 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753646AbXCPUFf (ORCPT ); Fri, 16 Mar 2007 16:05:35 -0400 Received: from ebiederm.dsl.xmission.com ([166.70.28.69]:34898 "EHLO ebiederm.dsl.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753638AbXCPUFe (ORCPT ); Fri, 16 Mar 2007 16:05:34 -0400 From: ebiederm@xmission.com (Eric W. Biederman) To: vatsa@in.ibm.com Cc: "Paul Menage" , xemul@sw.ru, dev@sw.ru, pj@sgi.com, sam@vilain.net, winget@google.com, serue@us.ibm.com, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, ckrm-tech@lists.sourceforge.net, containers@lists.osdl.org Subject: Re: Summary of resource management discussion References: <20070312124226.GD17151@in.ibm.com> <6599ad830703150424t3478cd55mf9d2699f3669c9f0@mail.gmail.com> <20070315170435.GA28692@in.ibm.com> <6599ad830703151212o524af40es6cc6893c4304175f@mail.gmail.com> <20070316014024.GC28692@in.ibm.com> Date: Fri, 16 Mar 2007 14:03:03 -0600 In-Reply-To: <20070316014024.GC28692@in.ibm.com> (Srivatsa Vaddagiri's message of "Fri, 16 Mar 2007 07:10:24 +0530") Message-ID: User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/21.4 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Srivatsa Vaddagiri writes: > On Thu, Mar 15, 2007 at 12:12:50PM -0700, Paul Menage wrote: >> >Yes me too. But maybe to keep in simple in initial versions, we should >> >avoid that optimisation and at the same time get statistics on duplicates?. >> >> That's an implementation detail - we have more important points to >> agree on right now ... > > yes :) > > Eric, did you have any opinion on this thread? A little. Sorry for the delay this is my low priority discussion to follow. Here comes a brain dump... I do know of one case outside the context of containers/vservers where I would have liked a nice multi-process memory limit. HPC jobs doing rdma have a bad habit of wanting to mlock all of memory and thus send the memory allocator into a tail spin. So the separate use would be nice. I do suspect that we are likely to look at having a hierarchy for most of the limits. For limits if we can stand the cost having a hierarchy makes sense. However I don't see the advantage of sharing the hierarchy between different classes of resource groups. I do know that implementing hierarchical counters are inefficient for tracking resource consumption, and if we support deep hierarchies I expect we will want to optimize that, in non-trivial ways. At which point the hierarchy will be composed of more that just a pointer to it's parent. We could easily end up with children taking a lease from their parents saying you can have this many more before you look upwards in the hierarchy again. So with non-trivial hierarchy information adding a pointer at a generic level doesn't seem to be much of a help. Further my gut feel is that in general we will want to limit all resources on a per container basis. At the same time I expect we will want to limit other resources to select process groups or users even farther inside of a container. So a single hierarchy for multiple resources seems a little questionable. Which suggests that we want what I think was called multi-hierarchy support. I guess also with hierarchies and entering we probably need a limit such that if you enter another resource group you can't do anything except stay at the same level or descend into the hierarchy. But you can never remove your parent of that resource type. With the whole shared subtree concept the mount namespace stores things you can enter into in the mount namespace. This overcomes the difficulty of having to find a process who has the resources you want when you want to enter someplace. I think that concept is a benefit of a filesystem interface. Using mount and unmount to pin magic subsystem state seems a little more natural to me then having to do other filesystem manipulations. Especially since the mount namespace is reference counted so you can be certain everything you have pinned will either remain visible to someone or automatically disappear. I don't like interfaces like sysvipc that require manual destruction. I do think there is some sense in layout out a palette into the mount namespace we can enter into. There is another issue I'm not quite certain how to address. To some extent it makes sense to be able to compile out the resource controllers ensuring their overhead goes away completely. However for the most part I think it makes sense to assume that when they are compiled in we have an initial set of resource controllers that either doesn't limit us at all or has very liberal limits that have the same effect. Dealing with the case of possibly limiting things when we have the support compiled in does not seem to make a lot of sense to me. I hope that helps a little. I'm still coming to terms with the issues brought on by resource groups and controlling filesystems. Eric