From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752269AbXCLXQV (ORCPT ); Mon, 12 Mar 2007 19:16:21 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752275AbXCLXQV (ORCPT ); Mon, 12 Mar 2007 19:16:21 -0400 Received: from MAIL.13thfloor.at ([213.145.232.33]:41617 "EHLO MAIL.13thfloor.at" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752269AbXCLXQU (ORCPT ); Mon, 12 Mar 2007 19:16:20 -0400 Date: Tue, 13 Mar 2007 00:16:18 +0100 From: Herbert Poetzl To: "Serge E. Hallyn" Cc: Srivatsa Vaddagiri , Paul Menage , "Eric W. Biederman" , ckrm-tech@lists.sourceforge.net, linux-kernel@vger.kernel.org, xemul@sw.ru, pj@sgi.com, winget@google.com, containers@lists.osdl.org, akpm@linux-foundation.org Subject: Re: [PATCH 1/2] rcfs core patch Message-ID: <20070312231618.GB6832@MAIL.13thfloor.at> Mail-Followup-To: "Serge E. Hallyn" , Srivatsa Vaddagiri , Paul Menage , "Eric W. Biederman" , ckrm-tech@lists.sourceforge.net, linux-kernel@vger.kernel.org, xemul@sw.ru, pj@sgi.com, winget@google.com, containers@lists.osdl.org, akpm@linux-foundation.org References: <20070301133543.GK15509@in.ibm.com> <20070301134528.GL15509@in.ibm.com> <6599ad830703080110j15aa961anc39954928cbc3851@mail.gmail.com> <20070309003819.GA4506@MAIL.13thfloor.at> <20070309175707.GA21661@in.ibm.com> <20070310011953.GE3647@MAIL.13thfloor.at> <20070311163604.GB31350@sergelap.austin.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20070311163604.GB31350@sergelap.austin.ibm.com> User-Agent: Mutt/1.5.11 Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Mar 11, 2007 at 11:36:04AM -0500, Serge E. Hallyn wrote: > Quoting Herbert Poetzl (herbert@13thfloor.at): > > On Fri, Mar 09, 2007 at 11:27:07PM +0530, Srivatsa Vaddagiri wrote: > > > On Fri, Mar 09, 2007 at 01:38:19AM +0100, Herbert Poetzl wrote: > > > > > 2) you allow a task to selectively reshare namespaces/subsystems with > > > > > another task, i.e. you can update current->task_proxy to point to > > > > > a proxy that matches your existing task_proxy in some ways and the > > > > > task_proxy of your destination in others. In that case a trivial > > > > > implementation would be to allocate a new task_proxy and copy some > > > > > pointers from the old task_proxy and some from the new. But then > > > > > whenever a task moves between different groupings it acquires a > > > > > new unique task_proxy. So moving a bunch of tasks between two > > > > > groupings, they'd all end up with unique task_proxy objects with > > > > > identical contents. > > > > > > this is exactly what Linux-VServer does right now, and I'm > > > > still not convinced that the nsproxy really buys us anything > > > > compared to a number of different pointers to various spaces > > > > (located in the task struct) > > > > > Are you saying that the current scheme of storing pointers to > > > different spaces (uts_ns, ipc_ns etc) in nsproxy doesn't buy > > > anything? > > > > > Or are you referring to storage of pointers to resource > > > (name)spaces in nsproxy doesn't buy anything? > > > > > In either case, doesn't it buy speed and storage space? > > > > let's do a few examples here, just to illustrate the > > advantages and disadvantages of nsproxy as separate > > structure over nsproxy as part of the task_struct > > But you're forgetting the *common* case, which is hundreds or > thousands of tasks with just one nsproxy. That's case for > which we have to optimize. yes, I agree here, maybe we should do something I suggested (and submitted a patch for some time ago) and add some kind of accounting for the various spaces (and the nsproxy) so that we can get a feeling how many of them are there and how many create/destroy cycles really happen ... those things will definitely be accounted in the Linux-VServer devel versions, don't know about OVZ > When that case is no longer the common case, we can yank the > nsproxy. As I keep saying, it *is* just an optimization. yes, fine with me, just wanted to paint a picture ... best, Herbert > -serge > > > 1) typical setup, 100 guests as shell servers, 5 > > tasks each when unused, 10 tasks when used 10% > > used in average > > > > a) separate nsproxy, we need at least 100 > > structs to handle that (saves some space) > > > > we might end up with ~500 nsproxies, if > > the shell clones a new namespace (so might > > not save that much space) > > > > we do a single inc/dec when the nsproxy > > is reused, but do the full N inc/dec when > > we have to copy an nsproxy (might save > > some refcounting) > > > > we need to do the indirection step, from > > task to nsproxy to space (and data) > > > > b) we have ~600 tasks with 600 times the > > nsproxy data (uses up some more space) > > > > we have to do the full N inc/dev when > > we create a new task (more refcounting) > > > > we do not need to do the indirection, we > > access spaces directly from the 'hot' > > task struct (makes hot pathes quite fast) > > > > so basically we trade a little more space and > > overhead on task creation for having no > > indirection to the data accessed quite often > > throughout the tasks life (hopefully) > > > > 2) context migration: for whatever reason, we decide > > to migrate a task into a subset (space mix) of a > > context 1000 times > > > > a) separate nsproxy, we need to create a new one > > consisting of the 'new' mix, which will > > > > - allocate the nsproxy struct > > - inc refcounts to all copied spaces > > - inc refcount nsproxy and assign to task > > - dec refcount existing task nsproxy > > > > after task completion > > - dec nsproxy refcount > > - dec refcounts for all spaces > > - free up nsproxy struct > > > > b) nsproxy data in task struct > > > > - inc/dec refcounts to changed spaces > > > > after task completion > > - dec refcounts to spaces > > > > so here we gain nothing with the nsproxy, unless > > the chosen subset is identical to the one already > > used, where we end up with a single refcount > > instead of N > > > > > > I'd prefer to do accounting (and limits) in a very simple > > > > and especially performant way, and the reason for doing > > > > so is quite simple: > > > > > Can you elaborate on the relationship between data structures > > > used to store those limits to the task_struct? > > > > sure it is one to many, i.e. each task points to > > exactly one context struct, while a context can > > consist of zero, one or many tasks (no back- > > pointers there) > > > > > Does task_struct store pointers to those objects directly? > > > > it contains a single pointer to the context struct, > > and that contains (as a substruct) the accounting > > and limit information > > > > HTC, > > Herbert > > > > > -- > > > Regards, > > > vatsa > > > _______________________________________________ > > > Containers mailing list > > > Containers@lists.osdl.org > > > https://lists.osdl.org/mailman/listinfo/containers > > _______________________________________________ > > Containers mailing list > > Containers@lists.osdl.org > > https://lists.osdl.org/mailman/listinfo/containers > _______________________________________________ > Containers mailing list > Containers@lists.osdl.org > https://lists.osdl.org/mailman/listinfo/containers