Date: Tue, 21 Jun 2005 14:52:38 -0600 (MDT)
From: "Ronald G. Minnich"
To: linux-kernel@vger.kernel.org
cc: Eric Van Hensbergen
Subject: Re: Fwd: v9fs (-mm -> 2.6.13 merge status)
References: <20050620235458.5b437274.akpm@osdl.org>

On Tue, 21 Jun 2005, Eric Van Hensbergen wrote:

> On 6/21/05, Andrew Morton wrote:
> >
> > v9fs
> >
> > I'm not sure that this has a sufficiently high
> > usefulness-to-maintenance-cost ratio.
>

I got pointed at this discussion. Here are my $.02 on why we at LANL are interested in v9fs.

We build clusters on the order of 2000 machines at present, with larger systems coming along. The system we use to run these clusters is bproc. While bproc has proven very powerful to date, it does have its limits:

- it requires a homogeneous system
- the network protocols it uses, while simple, are somewhat ad hoc (as is common in this type of system)
- if you are on a bproc system as user x, using 25% of the system, you still see 100% of the processes. This is a bit of a security issue.

We want to build single-system-image-style clusters along the bproc model, but at the same time compose those clusters of, e.g., Opterons and G5s. This mixing is highly desirable for computations that have phases, some of which belong on one type of machine and some on another.

We are going to use v9fs as the glue for our next-generation cluster software, called 'xcpu'. Xcpu has been implemented on Plan 9 and works there. I have ported xcpu to Linux, using v9fs as the client side and Russ Cox's plan9ports to write the servers. xcpu presents a remote execution service as a 9p server. It has been tested across architectures and it works very well. By summer 2006, we hope to have cut our bproc systems over to xcpu.

That's one use for v9fs. We also plan to use v9fs to provide servers for a global /proc, and for monitoring and control systems for our clusters.

The global /proc is interesting. bproc provides a global /proc, but it is incomplete; entries for, e.g., exe and maps are not filled in. bproc also caches part of /proc, but the rules about what is cached and what the timeouts are are set in the kernel module and not easily changed. We are going to have an "aggregating" user-level 9p server based on Mirtchovski's aggrfs, which will both aggregate all the cluster nodes and apply caching rules that make sense in clusters of 1000s of nodes (for example, it is OK to cache /proc/x/status; there is no need to cache /proc/x/maps, and you probably don't want to anyway).

A neat capability is that if we give a user, e.g., 25% of the cluster, we can tailor that user's name space so that they see only their own processes and the 25% of the cluster they own. This is good for security, but also good for convenience: most users don't really care that some other user is on the other 75% of the cluster.
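To make the name space idea a bit more concrete, here is a minimal sketch of the kind of mechanism involved: put a child in a private mount namespace and mount a per-user 9p export into it, so only that child and its descendants see that slice of the cluster. This is illustration only, not xcpu code; the server address and mount point are made up, and the "9p" filesystem type and option string follow the v9fs documentation and may not match exactly what the module accepts today.

#define _GNU_SOURCE
#include <sched.h>      /* clone(), CLONE_NEWNS */
#include <signal.h>     /* SIGCHLD */
#include <stdio.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int user_session(void *arg)
{
	/* We are now in a private mount namespace: mounts made here are
	 * invisible to the rest of the system.  Address, mount point and
	 * options are placeholders for illustration. */
	if (mount("10.0.0.1", "/mnt/xcpu", "9p", 0,
		  "trans=tcp,port=564") < 0) {
		perror("mount 9p");
		return 1;
	}
	/* Run the user's shell; everything started from it sees only the
	 * 9p tree mounted above. */
	execl("/bin/sh", "sh", (char *)NULL);
	perror("execl");
	return 1;
}

int main(void)
{
	static char stack[64 * 1024];
	pid_t pid;

	/* CLONE_NEWNS gives the child its own copy of the mount table;
	 * this requires CAP_SYS_ADMIN. */
	pid = clone(user_session, stack + sizeof(stack),
		    CLONE_NEWNS | SIGCHLD, NULL);
	if (pid < 0) {
		perror("clone");
		return 1;
	}
	waitpid(pid, NULL, 0);
	return 0;
}

Everything the user runs under that session inherits the private namespace, so their view of the cluster is exactly the 9p tree we chose to mount for them.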
Global pid spaces are neat in theory, messy in practice at large scale. I want my global pid space to be global to *me*, meaning I see the global space of the nodes I care about. The sysadmin, of course, wants to see everything. All of this is possible. v9fs, along with Linux private name spaces, will allow us to provide this model: users can see some or all of the global pid space, depending on need, or can be constrained to see only part of it, depending on other issues. 9p will also replace the Supermon protocol, allowing people to easily view status information in a file system.

In addition to the cluster usage, there is also grid usage. The 9grid, composed of Plan 9 systems, is connected by 9p servers. Linux systems can join the 9grid with no problem, once Linux has v9fs.

Were v9fs just a file system, I would not really be interested in it one way or another; we have NFS, after all. But v9fs is really the key piece of a new model of cluster services we are building at LANL. 9p will be the glue, and v9fs will be the needed client side for hooking 9p servers into the file system name space.

I'm hoping we can see v9fs in the kernel someday.

thanks

ron