From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S264624AbTIDDTa (ORCPT ); Wed, 3 Sep 2003 23:19:30 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S264616AbTIDDT3 (ORCPT ); Wed, 3 Sep 2003 23:19:29 -0400 Received: from [209.195.52.120] ([209.195.52.120]:53738 "HELO warden2.diginsite.com") by vger.kernel.org with SMTP id S264624AbTIDDTV (ORCPT ); Wed, 3 Sep 2003 23:19:21 -0400 From: David Lang To: "Martin J. Bligh" Cc: Larry McVoy , "Brown, Len" , Giuliano Pochini , linux-kernel@vger.kernel.org Date: Wed, 3 Sep 2003 20:16:16 -0700 (PDT) Subject: Re: Scaling noise In-Reply-To: <9110000.1062643682@[10.10.2.4]> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 3 Sep 2003, Martin J. Bligh wrote: > >> --Larry McVoy wrote (on Wednesday, September 03, 2003 17:36:33 -0700): > >> > They have to be, CPUs are fast enough > >> > to handle most problems, clustering has worked for lots of big companies > >> > like Google, Amazon, Yahoo, and the HPC market has been flat for years. > >> > So where's the growth? Nowhere I can see. If I'm not seeing it, show > >> > me the data. I may be a pain in the ass but I'll change my mind instantly > >> > when you show me data that says something different than what I believe. > >> > So far, all I've seen is people having fun proving that their ego is > >> > bigger than the next guys, no real data. Come on, you'd love nothing > >> > better than to prove me wrong. Do it. Or admit that you can't. > >> > >> Not quite sure why the onus is on the rest of us to disprove your pet > >> theory, rather than you to prove it. > > > > Maybe because history has shown over and over again that your pet theory > > doesn't work. Mine might be wrong but it hasn't been proven wrong. Yours > > has. Multiple times. > > Please, this makes no sense. Why do you think IBM and others make large > machines? Stupidity? From my experience they're hard assed "if it don't > make a profit, nor is likely to, then it can piss off" marketeers. Which > often pisses me off, but still ... if it didn't make money, they wouldn't > do it. And no, I can't go get you internal confidential sales figures, > but I'll bet you we're not selling these things at a loss for our own > general self-flagellating amusement. > > I don't think you're stupid, but please ... who do you think has better > data on this? IBM market research people? or you? I think I'd bet on IBM. there are some problems that don't scale well to multiple machines/images (large databases, huge working sets, etc). and there are other cases where the problem may be able to scale to multiple machines, but the software was badly written and so the software won't. in these cases all you can do is to buy a bigger box, however inefficantly you use the extra processors. I know one company a couple years ago that looked at buying a 24 CPU IBM machine but found that it was outperformed by a pair of 4 CPU IBM machines (all running the same version of AIX, and the same application software). in this case they hit a AIX limit that more CPU's/RAM couldn't help with, but more copies of the OS could. as CPU speeds climb and the relative penalty for going off the chip for any reason climb even faster the hardware overhead of keeping a SMP machine coherant becomes more significant. sometimes this can be solved by changing the algorithm (the O(1) scheduler for example) but other times you just have to accept the costof bouncing a cache line from one CPU to another so that you can aquire a lock. the advantage of multiple images is that the hardware only needs to maintain full speed consistancy between CPU's in one image, a higher level (the OS or the partitioning software) can deal with the issues of letting one image know what it needs to about what the others are doing. some of these problems can be addressed in hardware (the Opteron could be called SSI-NUMA that has it's partitioning layer running in hardware for up to 8 CPU's) but addressing it in hardware runs into scaling problems becouse you don't want to pay to much at the low end for the features you need on the high end (which is why the opteron doesn't directly scale to 128+ CPU's in one image) David Lang