From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S964870AbWAWSit (ORCPT ); Mon, 23 Jan 2006 13:38:49 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S964875AbWAWSit (ORCPT ); Mon, 23 Jan 2006 13:38:49 -0500 Received: from e5.ny.us.ibm.com ([32.97.182.145]:8940 "EHLO e5.ny.us.ibm.com") by vger.kernel.org with ESMTP id S964870AbWAWSit (ORCPT ); Mon, 23 Jan 2006 13:38:49 -0500 Message-ID: <43D522B2.5060308@watson.ibm.com> Date: Mon, 23 Jan 2006 13:38:42 -0500 From: Hubertus Franke User-Agent: Mozilla Thunderbird 1.0.2 (Windows/20050317) X-Accept-Language: en-us, en MIME-Version: 1.0 To: "Eric W. Biederman" CC: Dave Hansen , Greg KH , Alan Cox , "Serge E. Hallyn" , Arjan van de Ven , linux-kernel@vger.kernel.org, Cedric Le Goater Subject: Re: RFC [patch 13/34] PID Virtualization Define new task_pid api References: <20060117143258.150807000@sergelap> <20060117143326.283450000@sergelap> <1137511972.3005.33.camel@laptopd505.fenrus.org> <20060117155600.GF20632@sergelap.austin.ibm.com> <1137513818.14135.23.camel@localhost.localdomain> <1137518714.5526.8.camel@localhost.localdomain> <20060118045518.GB7292@kroah.com> <1137601395.7850.9.camel@localhost.localdomain> <43D14578.6060801@watson.ibm.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Eric W. Biederman wrote: > Hubertus Franke writes: > > ... > >>Actions: The vpid_to_pid will disappear and the check for whether we are in the >>same >>container needs to be pushed down into the task lookup. question remains to >>figure out >>whether the context of the task lookup (will always remain the caller ?). > > > You don't need a same container check. If something is in another container > it becomes invisible to you. > Eric, agreed.... that was implied by me (but poorly worded). What I meant (lets try this again) is that the context defines/provides the namespace in which the lookup is performed, hence as you say state.. naturally things in different containers (namespaces) are invisible to you.. > >>Doing so has an implication, namely that we are moving over to "system >>containers". >>The current implementation requires the vpid/pid only for the boundary condition >>at the >>top of the container (to rewrite pid=1) and its parent and the fact that we >>wanted >>a global look through container=0. >>If said boundary would be eliminated and we simply make a container a child of >>the >>initproc (pid=1), this would be unnecessary. >> >>all together this would provide private namespaces (as just suggested by Eric). >> >>The feeling would be that large parts of patch could be reduce by this. > > > I concur. Except I think the initial impact could still be large. > It may be worth breaking all users of pids just so we audit them. > > But that will certainly result in no long term cost, or runtime overhead. > > >>What we need is a new system calls (similar to vserver) or maybe we can continue >>the /proc approach for now... >> >>sys_exec_container(const *char container_name, pid_t pid, unsigned int flags, >>const *char argv, const *char envp); >> >>exec_container creates a new container (if indicated in flags) and a new task in >>it that reports to parent initproc. >>if a non-zero pid is specified we use that pid, otherwise the system will >>allocate it. Finally >>it create new session id ; chroot and exec's the specified program. >> >>What we loose with this is the session and the tty, which Cedric described as >>application >>container... >> >>The sys_exec_container(...) seems to be similar to what Eric just called >>clone_namespace() > > > Similar. But I was actually talking about just adding another flag to > sys_clone the syscall underlying fork(). Basically it is just another > resource not share or not-share. > > Eric > That's a good idea .. right now we simply did this through a flag left by the call to the /proc/container fs ... (awkward at best, but didn't break the API). I have a concern wrt doing it in during fork namely the sharing of resources. Whe obviously are looking at some constraints here wrt to sharing. We need to ensure that this ain't a thread etc that will share resources across "containers" (which then later aren't migratable due to that sharing). So doing the fork_exec() atomically would avoid that problem.