To: Larry McVoy
Cc: "Martin J. Bligh", William Lee Irwin III, Alan Cox, "Brown, Len",
    Giuliano Pochini, Linux Kernel Mailing List
Subject: Re: Scaling noise
From: ebiederm@xmission.com (Eric W. Biederman)
Date: 07 Sep 2003 21:55:47 -0600
In-Reply-To: <20030908005749.GA24714@work.bitmover.com>

Larry McVoy writes:

> On Sun, Sep 07, 2003 at 05:47:04PM -0600, Eric W. Biederman wrote:
> > I have already built a 2304 cpu machine and am working on a 2900+ cpu
> > machine.
>
> That's not "a machine" that's ~1150 machines on a network.  This business
> of describing a bunch of boxes on a network as "a machine" is nonsense.

Every bit as much as describing a scalable NUMA box with replaceable
nodes as a single machine is nonsense.

When things are built and run as a single machine, it is a single
machine.  The fact that it uses standardized parts many times over does
not change that.  The only real difference is cache coherency, and the
price.

I won't argue that at the lowest end the vendor delivers you a pile of
boxes and walks away, at which point you must do everything yourself
and it is a real maintenance pain.  But that is the lowest end and
certainly not what I sell.  The systems are built and tested as a
single machine before delivery.

> Don't get me wrong, I love clusters, in fact, I think what you are doing
> is great.  It doesn't screw up the OS, it forces the OS to stay lean and
> mean.  Goodness.
>
> All the CC cluster stuff is about making sure that the SMP fanatics don't
> screw up the OS for you.  We're on the same side.  Try not to be so rude
> and have a bit more vision.

And I agree, except on some small details.  Although I have yet to see
the large-way SMP folks causing problems.

But as far as doing the work goes, there are two different ends it can
be started from.
a) SMP and make the locks finer grained.
b) Cluster and add the few necessary locks.

Both solutions run fine on a NUMA machine.  And both eventually lead to
good SSI solutions.  But except for some magic piece that only works on
cc NUMA nodes, you can develop all of the SSI software on an ordinary
cluster.  On an ordinary cluster that is the only option, and so the
people with clusters are going to do the work.

One reason you don't see more SSI work out of the cluster guys is that
they are willing to sacrifice some coherency for scalability.  But
mostly it is because clusters are only slowly catching on.

So assuming the non coherent cluster guys do their part, you get SSI
software that works out of the box and does everything except optimize
the page cache for the shared physical hardware.  And the software will
scale awesomely because each generation of cluster hardware is larger
than the last.

The only piece that is unique to the cache coherent hardware is CCFS,
which builds a shared page cache.  And even then the non coherent
cluster guys may come up with a better solution.

So my argument is this, if you are going to do it right:
Start with an ordinary non-coherent cluster.  Build the SSI support.
Then build CCFS, the global shared page cache, as an optimization.

I fail to see how starting with CCFS will help, or how assuming CCFS
will be there will help.  Unless you think the R&D budgets of all of
the non coherent cluster guys are insubstantial, and somehow not up to
the task.

Eric