* Re: Minutes from Feb 21 LSE Call
@ 2003-02-24  2:04 linux
  2003-02-24  2:39 ` Linus Torvalds
  0 siblings, 1 reply; 266+ messages in thread
From: linux @ 2003-02-24 2:04 UTC (permalink / raw)
To: linux-kernel, torvalds

Linus brought back tablets from the mount on which were graven:

> The x86 is a hell of a lot nicer than the ppc32, for example. On the
> x86, you get good performance and you can ignore the design mistakes (ie
> segmentation) by just basically turning them off.

Now wait a minute. I thought you worked at Transmeta.

There were no development and debugging costs associated with getting all
those different kinds of gates working, and all the segmentation checking
right? Wouldn't it have been easier to build the system, and shift the
effort where it would really do some good, if you didn't have to support
all that crap? An extra base/bounds check doesn't take any die area? An
extra exception source doesn't complicate exception handling?

> And the baroque instruction encoding on the x86 is actually a _good_
> thing: it's a rather dense encoding, which means that you win on icache.
> It's a bit hard to decode, but who cares? Existing chips do well at
> decoding, and thanks to the icache win they tend to perform better - and
> they load faster too (which is important - you can make your CPU have
> big caches, but _nothing_ saves you from the cold-cache costs).

I *really* thought you worked at Transmeta.

Transmeta's software decoding is an extreme example of what all modern
x86 processors are doing in their L1 caches, namely predecoding the
instructions and storing them in expanded form. This varies from just
adding boundary tags (Pentium) and instruction type (K7) through
converting them to uops and caching those (P4). This exactly undoes any
L1 cache size benefits. The win, of course, is that you don't have as
much shifting and aligning on your i-fetch path, which all the
fixed-instruction-size architectures already started with.

So your comments only apply to the L2 cache. And for the expense of all
the instruction predecoding logic between L2 and L1, don't you think
someone could build an instruction compressor to fit more into the
die-size-limited L2 cache? With the sizes cache lines are getting to
these days, you should be able to do pretty well. It seems like six of
one, half a dozen of the other, and it would save the compiler writers a
lot of pain.

> The low register count isn't an issue when you code in any high-level
> language, and it has actually forced x86 implementors to do a hell of a
> lot better job than the competition when it comes to memory loads and
> stores - which helps in general. While the RISC people were off trying
> to optimize their compilers to generate loops that used all 32 registers
> efficiently, the x86 implementors instead made the chip run fast on
> varied loads and used tons of register renaming hardware (and looking at
> _memory_ renaming too).

I don't disagree that chip designers have managed to do very well with
the x86, and there's nothing wrong with making a virtue out of a
necessity, but that doesn't make the necessity good.

I was about to raise the same point. L1 dcache access tends to be a
cycle-limiting bottleneck, and as early as the original Pentium, the x86
had to go to a 2-access-per-cycle L1 dcache to avoid bottlenecking with
only 2 pipes!

The low register count *does* affect you when using a high-level
language, because if you have too many live variables floating around,
you start suffering.
Handling these spills is why you need memory renaming.

It's true that x86 processors have had fancy architectural features
sooner than similar-performance RISCs, but I think there's a fair case
that that's because they've *needed* them. Why do the P4 and K7/K8 have
such enormous reorder buffers, able to keep around 100 instructions in
flight at a time? Because they need it to extract parallelism out of an
instruction stream serialized by a miserly register file.

They've developed some great technology to compensate for the weaknesses,
but it's sure nice to dream of an architecture with all that great
technology but with fewer initial warts. (Alpha seemed like the best
hope, but *sigh*. Still, however you apportion blame for its demise,
performance was clearly not one of its problems.)

I think the same claim applies much more powerfully to the ppc32's MMU.
It may be stupid, but it is only visible from inside the kernel, and a
fairly small piece of the kernel at that. It could be scrapped and
replaced with something better without any effect on existing user-level
code at all.

Do you think you can replace the x86's register problems as easily?

> The only real major failure of the x86 is the PAE crud.

So you think AMD extended the register file just for fun?

Hell, the "PAE crud" is the *same* problem as the tiny register file:
insufficient virtual address space leading to physical > virtual kludges.
And, as you've noticed, there are limits to the physical/virtual ratio
above which it gets really painful. And the 64G:4G ratio of PAE is
mirrored in the 128:8 ratio of P4 integer registers.

I wish the original Intel designers could have left a "no heroic
measures" living will, because that design is on more life support than
Darth Vader.

^ permalink raw reply [flat|nested] 266+ messages in thread
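(To make the live-variable point concrete, here is a minimal hypothetical
C fragment - not taken from the thread - in which eight accumulators stay
live across a loop. On 32-bit x86, with eight general-purpose registers
and fewer still available to the allocator once the stack and frame
pointers are accounted for, a compiler has to spill several of them to
the stack and reload them every iteration; on a 32-register RISC they all
stay in registers. The renaming and memory-renaming hardware argued about
above is largely there to hide exactly that spill traffic.)

/* Hypothetical illustration: eight sums stay live across every
 * iteration (n is assumed to be a multiple of 8 for brevity).
 * With only ~6-7 allocatable registers on 32-bit x86, several of
 * the accumulators get spilled to the stack each time around the
 * loop; a 32-register machine keeps them all in registers. */
void dot8(const int *a, const int *b, int n, int out[8])
{
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int s4 = 0, s5 = 0, s6 = 0, s7 = 0;
    int i;

    for (i = 0; i < n; i += 8) {
        s0 += a[i + 0] * b[i + 0];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
        s4 += a[i + 4] * b[i + 4];
        s5 += a[i + 5] * b[i + 5];
        s6 += a[i + 6] * b[i + 6];
        s7 += a[i + 7] * b[i + 7];
    }
    out[0] = s0; out[1] = s1; out[2] = s2; out[3] = s3;
    out[4] = s4; out[5] = s5; out[6] = s6; out[7] = s7;
}

(Compiling something like this with gcc -O2 -S for i386 and for a
32-register target, and comparing the stack traffic in the two listings,
shows the difference directly.)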
* Re: Minutes from Feb 21 LSE Call
  2003-02-24 2:04 Minutes from Feb 21 LSE Call linux
@ 2003-02-24  2:39 ` Linus Torvalds
  2003-02-24  3:28   ` David Lang
  2003-02-24  4:42   ` Martin J. Bligh
  0 siblings, 2 replies; 266+ messages in thread
From: Linus Torvalds @ 2003-02-24 2:39 UTC (permalink / raw)
To: linux; +Cc: linux-kernel

On 24 Feb 2003 linux@horizon.com wrote:
>
> Now wait a minute. I thought you worked at Transmeta.
>
> There were no development and debugging costs associated with getting
> all those different kinds of gates working, and all the segmentation
> checking right?

So? The only thing that matters is the end result.

> Wouldn't it have been easier to build the system, and shift the effort
> where it would really do some good, if you didn't have to support
> all that crap?

Probably not appreciably.

You forget - it's been tried. Over and over again. The whole RISC
philosophy was all about "wouldn't it perform better if you didn't have
to support that crap".

The fact is, the "crap" doesn't matter that much. As proven by the fact
that the "crap" processor family ends up being the one that eats pretty
much everybody else for lunch on performance issues.

Yes, the "crap" does end up making it a harder market to enter. There's a
lot of IP involved in knowing what all the rules are, and having
literally _millions_ of tests that check for conformance to the
architecture (and much of the "architecture" is a de-facto thing, not
really written down in architecture manuals).

But clearly even that is not insurmountable, as shown by the fact that
not only does the x86 perform well, it's also one of the few CPU's that
are actively worked on by multiple different companies (including
Transmeta, as you point out - although clearly the "crap" is one reason
why the sw approach works at all).

> Transmeta's software decoding is an extreme example of what all modern
> x86 processors are doing in their L1 caches, namely predecoding the
> instructions and storing them in expanded form. This varies from
> just adding boundary tags (Pentium) and instruction type (K7) through
> converting them to uops and caching those (P4).

But you seem to imply that that is somehow a counter-argument to _my_
argument. And I don't agree.

I think what Transmeta (and AMD, and VIA etc) show is that the ugliness
doesn't really matter - there are different ways of handling it, and you
can either throw hardware at it or software at it, but it's still worth
doing, because in the end what matters is not the bad parts of it, but
the good parts.

Btw, the P4 trace cache does pretty much exactly the same thing that
Transmeta does, except in hardware. It's based on a very simple reality:
decoding _is_ going to be the bottleneck for _any_ instruction set, once
you've pushed the rest hard enough. If you're not doing predecoding, that
only means that you haven't pushed hard enough yet - _regardless_ of your
architecture.

> This exactly undoes any L1 cache size benefits. The win, of course, is
> that you don't have as much shifting and aligning on your i-fetch path,
> which all the fixed-instruction-size architectures already started with.

No. You don't understand what "cold-cache" case really means. It's more
than just bringing the thing in from memory to the cache. It's also all
about loading the dang thing from disk.

> So your comments only apply to the L2 cache.

And the disk.
> And for the expense of all the instruction predecoding logic between
> L2 and L1, don't you think someone could build an instruction compressor
> to fit more into the die-size-limited L2 cache?

It's been done. See the PPC stuff. I've read the papers (it's been a long
time, admittedly - it's not something new), and the fact is, it's not
apparently being used that much. Because it's quite painful, unlike the
x86 approach.

> > stores - which helps in general. While the RISC people were off trying
> > to optimize their compilers to generate loops that used all 32 registers
> > efficiently, the x86 implementors instead made the chip run fast on
> > varied loads and used tons of register renaming hardware (and looking at
> > _memory_ renaming too).
>
> I don't disagree that chip designers have managed to do very well with
> the x86, and there's nothing wrong with making a virtue out of a necessity,
> but that doesn't make the necessity good.

Actually, you miss my point.

The necessity is good because it _forced_ people to look at what really
matters. Instead of wasting 15 years and countless PhD's on things that
are, in the end, just engineering-masturbation (nr of registers etc).

> The low register count *does* affect you when using a high-level language,
> because if you have too many live variables floating around, you start
> suffering. Handling these spills is why you need memory renaming.

Bzzt. Wrong answer.

The right answer is that you need memory renaming and memory alias
hardware _anyway_, because doing dynamic scheduling of loads vs stores is
something that is _required_ to get the kind of performance that people
expect today. And all the RISC stuff that tried to avoid it was just a
BIG WASTE OF TIME. Because the _only_ thing the RISC approach ended up
showing was that eventually you have to do the hard stuff anyway, so you
might as well design for doing it in the first place.

Which is what ia-64 did wrong - and what I mean by making the same
mistakes that everybody else did 15 years ago. Look at all the crap that
ia64 does in order to do compiler-driven loop modulo-optimizations.
That's part of the whole design, with predication and those horrible
register windows. Can you say "risc mistakes all over again"?

My strong suspicion (and that makes it a "fact" ;) is that in another 5
years they'll get to where the x86 has been for the last 10 years, and
they'll realize that they will need to do out-of-order accesses etc,
which makes all of that modulo optimization pretty much useless, since
the hardware pretty much has to do it _anyway_.

> It's true that x86 processors have had fancy architectural features
> sooner than similar-performance RISCs, but I think there's a fair case
> that that's because they've *needed* them.

Which is exactly my point. And by the time you implement them, you notice
that the half-way measures don't mean anything, and in fact make for more
problems.

For example, that small register state is a pain in the ass, no? But
since you basically need register renaming _anyway_, the small register
state actually has some advantages in that it makes it easier to have
tons of read ports and still keep the register file fast. And once you do
renaming (including memory state renaming), IT DOESN'T MUCH MATTER.

> Why do the P4 and K7/K8 have
> such enormous reorder buffers, able to keep around 100 instructions
> in flight at a time? Because they need it to extract parallelism out
> of an instruction stream serialized by a miserly register file.

You think this is bad?
Look at it another way: once you have hundreds of instructions in flight,
you have hardware that automatically

 - executes legacy applications reasonably well, since compilers aren't
   the most important thing.

   End result: users are happy.

 - you don't need to have compilers that do stupid things like unrolling
   loops, thus keeping your icache pressure down, since you do loop
   unrolling in hardware thanks to deep pipelines.

Even the RISC people are doing hundreds of instructions in flight (ie
Power5), but they started doing it years after the x86 did, because they
claimed that they could force their users to recompile their binaries
every few years. And look where it actually got them..

> They've developed some great technology to compensate for the weaknesses,
> but it's sure nice to dream of an architecture with all that great
> technology but with fewer initial warts. (Alpha seemed like the
> best hope, but *sigh*. Still, however you apportion blame for its
> demise, performance was clearly not one of its problems.)

So my premise is that you always end up doing the hard things anyway, and
the "crap" _really_ doesn't matter.

Alpha was nice, no question about it. But it took them way too long to
get to the whole OoO thing, because they tried to take a short-cut that
in the end wasn't the answer. It _looked_ like the answer (the original
alpha design was done explicitly to not _need_ things like complex
out-of-order execution), but it was all just wrong.

The thing about the x86 is that hard cold reality (ie millions of
customers that have existing applications) really _forces_ you to look at
what matters, and so far it clearly appears that the things you are
complaining about (registers and segmentation) simply do _not_ matter.

> I think the same claim applies much more powerfully to the ppc32's MMU.
> It may be stupid, but it is only visible from inside the kernel, and
> a fairly small piece of the kernel at that.
>
> It could be scrapped and replaced with something better without any
> effect on existing user-level code at all.
>
> Do you think you can replace the x86's register problems as easily?

They _have_ been solved. The x86 performs about twice as well as any
ppc32 on the market. End of discussion.

> > The only real major failure of the x86 is the PAE crud.
>
> So you think AMD extended the register file just for fun?

I think the AMD register file extension was unnecessary, yes. They did it
because they could, and it wasn't a big deal. That's not the part that
makes the architecture interesting. As you should well know.

> Hell, the "PAE crud" is the *same* problem as the tiny register
> file. Insufficient virtual address space leading to physical > virtual
> kludges.

Nope. The small register file is a non-issue. Trust me. I do work for
Transmeta, and we do the register renaming in software, and it doesn't
matter in the end.

		Linus

^ permalink raw reply [flat|nested] 266+ messages in thread
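(A hypothetical C sketch, not from the thread, of what the loop-unrolling
point looks like in practice: the unrolled routine below does the same
work as the plain one with roughly four times the loop body, which is the
icache cost being referred to. The argument above is that a core keeping
on the order of 100 instructions in flight can already overlap iterations
of the plain loop on its own, so the compiler-side unrolling buys little.)

/* Illustrative only: the same operation written plainly and unrolled
 * four times by hand, the way an aggressive compiler might emit it.
 * Iterations are independent, so out-of-order hardware can overlap
 * them in either form; the unrolled form mainly trades code size for
 * reduced loop overhead. */
void scale_plain(long *a, long n, long k)
{
    long i;
    for (i = 0; i < n; i++)
        a[i] *= k;
}

void scale_unrolled(long *a, long n, long k)
{
    long i;
    for (i = 0; i + 4 <= n; i += 4) {   /* unrolled by four */
        a[i + 0] *= k;
        a[i + 1] *= k;
        a[i + 2] *= k;
        a[i + 3] *= k;
    }
    for (; i < n; i++)                  /* leftover elements */
        a[i] *= k;
}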
* Re: Minutes from Feb 21 LSE Call
  2003-02-24  2:39 ` Linus Torvalds
@ 2003-02-24  3:28   ` David Lang
  2003-02-26  5:30     ` Bernd Eckenfels
  2003-02-24  4:42   ` Martin J. Bligh
  1 sibling, 1 reply; 266+ messages in thread
From: David Lang @ 2003-02-24 3:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux, linux-kernel

Also, the RISC argument was started back when the CPU ran at the same
speed as memory and CISC was limited to slow clock speeds. The RISC folks
figured that if they could eliminate the CISC complexity they could
ratchet up the clock speeds and more than make up for the other
inefficiencies. Unfortunately for them, the core CPU speeds became
uncoupled from the memory speeds and skyrocketed up to the point where
CISC cores are as fast or faster than the 'high speed' RISC cores.

So the remaining difference between CISC and RISC really isn't the core
functions, it's where the optimizations take place. In CISC they take
place on the CPU (either in software or hardware); in RISC they take
place in the compiler.

Unfortunately for the RISC folks, the optimizations change significantly
from chip to chip, and so the code needs to be compiled for each chip (in
many cases each motherboard) to take full advantage of them. While this
could be acceptable in an Open Source world, in the real world of
pre-compiled binaries companies are not interested in shipping a dozen
versions of a program, so there is no pressure for the compilers to get
optimized this well (besides the problem of people not wanting to use the
'blessed' compiler), and software development always lags hardware, so
the compilers are never available when the CPUs are (and at the pace of
development, by the time they are available new CPUs are out).

On the CISC side of the house these optimizations are done in the CPU by
the designers of the chips, and since the performance of the chips is
judged including these optimizations (no way to weasel out by claiming
it's a compiler problem) a lot of attention is paid to this work, and
it's ready when the chip is released (at this point both the Transmeta
and Intel CPUs have the ability to update this code, but it's really only
done to fix major bugs).

The other big claim of the RISC folks was that they didn't waste
transistors on this since it was done in the compiler, but the transistor
budget has climbed so high since then that this cost is minimal. I seem
to remember that Intel claims that enabling hyperthreading (which not
only duplicates this entire section, but adds even more to keep them
straight) adds less than 10% to the transistor count of the CPU.

Yes, it would be nicer to have those transistors available to be used for
other things, but experience with RISC has shown that overall the
tradeoff isn't worth it.

David Lang
^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call
  2003-02-24  3:28   ` David Lang
@ 2003-02-26  5:30     ` Bernd Eckenfels
  2003-02-26  5:42       ` William Lee Irwin III
  2003-02-27 17:50       ` Daniel Egger
  0 siblings, 2 replies; 266+ messages in thread
From: Bernd Eckenfels @ 2003-02-26 5:30 UTC (permalink / raw)
To: linux-kernel

In article <Pine.LNX.4.44.0302231906360.8609-100000@dlang.diginsite.com> you wrote:
> unfortunately for them the core CPU speeds became uncoupled from the
> memory speeds and skyrocketed up to the point where CISC cores are as fast
> or faster than the 'high speed' RISC cores.

Hmm.. are there any RISC cores which run even close to CISC speeds?
And why not? Is this only the financial power of Intel?

Bernd

^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-26 5:30 ` Bernd Eckenfels @ 2003-02-26 5:42 ` William Lee Irwin III 2003-02-26 7:22 ` David Lang 2003-02-27 17:50 ` Daniel Egger 1 sibling, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-26 5:42 UTC (permalink / raw) To: Bernd Eckenfels; +Cc: linux-kernel In article <...> someone wrote: >> unfortunantly for them the core CPU speeds became uncoupled from the >> memory speeds and skyrocketed up to the point where CISC cores are as fast >> or faster then the 'high speed' RISC cores. On Wed, Feb 26, 2003 at 06:30:50AM +0100, Bernd Eckenfels wrote: > Hmm.. are there any RISC Cores which run even closely to CISC Speeds? > And why not? Is this only the financial power of Intel? There is one other: x86 binary compatibility. Looks like the beginning and end of it to me. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call
  2003-02-26  5:42 ` William Lee Irwin III
@ 2003-02-26  7:22   ` David Lang
  0 siblings, 0 replies; 266+ messages in thread
From: David Lang @ 2003-02-26 7:22 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Bernd Eckenfels, linux-kernel

On Tue, 25 Feb 2003, William Lee Irwin III wrote:
> In article <...> someone wrote:
> >> unfortunately for them the core CPU speeds became uncoupled from the
> >> memory speeds and skyrocketed up to the point where CISC cores are as fast
> >> or faster than the 'high speed' RISC cores.
>
> On Wed, Feb 26, 2003 at 06:30:50AM +0100, Bernd Eckenfels wrote:
> > Hmm.. are there any RISC cores which run even close to CISC speeds?
> > And why not? Is this only the financial power of Intel?
>
> There is one other: x86 binary compatibility.
>
> Looks like the beginning and end of it to me.

It's more than just the financial power of Intel; AMD is also in many
ways above the performance of the 'high-end' processors.

Aces Hardware has a chart showing several different processors (dated in
October 2002, so it's not _that_ out of date):
http://www.aceshardware.com/read_news.jsp?id=60000436

One interesting thing I see from this chart is that the x86 processors
are well ahead in integer performance and pulling further ahead (the pace
of development is significantly faster than for the other processors).
While they do lag in floating point (but not by that much), there are a
LOT of workloads where the floating point performance is not as important
(the K2 showed that it can't lag _too_ far behind).

The x86 binary compatibility means that even a 'low volume' x86
compatible chip has a large potential market, and a company can do
reasonably well getting a small percentage of the market (see the
Transmeta and Cyrix chips), while the non-x86 chips (including ia64) have
to invent a new market segment for themselves.

David Lang

^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call
  2003-02-26  5:30 ` Bernd Eckenfels
  2003-02-26  5:42   ` William Lee Irwin III
@ 2003-02-27 17:50   ` Daniel Egger
  2003-02-27 18:25     ` David Lang
  1 sibling, 1 reply; 266+ messages in thread
From: Daniel Egger @ 2003-02-27 17:50 UTC (permalink / raw)
To: Bernd Eckenfels; +Cc: linux-kernel

On Wed, 2003-02-26 at 06:30, Bernd Eckenfels wrote:

> > unfortunately for them the core CPU speeds became uncoupled from the
> > memory speeds and skyrocketed up to the point where CISC cores are as fast
> > or faster than the 'high speed' RISC cores.

> Hmm.. are there any RISC cores which run even close to CISC speeds?

Define RISC and CISC: do you mean pure RISC implementations or RISC
implementations with a CISC frontend?

Define speed: felt speed, clock speed or measurable speed?

I'm convinced that for each (sensible) combination of definitions above
there's a clear indication that your question is wrong.

--
Servus,
   Daniel

^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call
  2003-02-27 17:50 ` Daniel Egger
@ 2003-02-27 18:25   ` David Lang
  2003-02-28  8:58     ` Filip Van Raemdonck
  2003-02-28 19:48     ` Arador
  0 siblings, 2 replies; 266+ messages in thread
From: David Lang @ 2003-02-27 18:25 UTC (permalink / raw)
To: Daniel Egger; +Cc: Bernd Eckenfels, linux-kernel

On Thu, 27 Feb 2003, Daniel Egger wrote:

> On Wed, 2003-02-26 at 06:30, Bernd Eckenfels wrote:
>
> > > unfortunately for them the core CPU speeds became uncoupled from the
> > > memory speeds and skyrocketed up to the point where CISC cores are as fast
> > > or faster than the 'high speed' RISC cores.
>
> > Hmm.. are there any RISC cores which run even close to CISC speeds?
>
> Define RISC and CISC: do you mean pure RISC implementations or RISC
> implementations with a CISC frontend?

As far as programmers and users are concerned there is no difference
between CISC and RISC with a CISC front-end (Transmeta is the most
obvious example of this, but all current CISC chips use this technique).

> Define speed: felt speed, clock speed or measurable speed?

For my original post I was referring to pure clock speeds. Remember that
the original RISC chips came out when CISC chips were just starting to
hit 60MHz, and a large part of their claim was that it didn't matter if
the chip got less done per clock because they could run at much higher
speeds (a couple hundred MHz). Also, the instructions-per-clock figure
for CISC chips was very low at the time, with the RISC chips pushing
towards 1 IPC. The reasoning was that there was no way to implement all
the complicated CISC instruction set decoding and options and achieve
anything close to the clock speeds that the nice streamlined RISC chips
could reach.

Now, when the RISC chip cores are just over 1GHz and talking about
possibly hitting 1.8GHz within a year or so, the Intel chips are pushing
3GHz while the AMD chips are pushing 2GHz (true speed; I'll avoid
commenting on the mistakes that Intel made on the P4 that make these
chips competitive with each other :-). And the IPC of current CISC
implementations is pushing towards 1 as well (insert disclaimer about
benchmarks), so RISC no longer has a huge advantage there either.

Obviously higher clock speeds do not directly equate to higher
performance, but as Linus has pointed out there are a lot of efficiencies
in the CISC command set that mean that if you have two chips running at
the same clock speed with the same IPC, the CISC command set will
outperform the RISC one.

The only serious advantage the RISC chips have today is the fact that
they are 64 bit instead of 32 bit, and x86-64 will erase that limitation.

David Lang

^ permalink raw reply [flat|nested] 266+ messages in thread
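(Rough arithmetic to put the clock-versus-IPC comparison above in
perspective - illustrative round numbers, not benchmark results:
sustained throughput is roughly IPC x clock, so a 3.0 GHz x86 core
averaging 0.9 IPC retires about 2.7 billion instructions per second,
while a 1.0 GHz RISC core averaging 1.1 IPC retires about 1.1 billion.
Once the IPC figures are in the same ballpark, the factor of two or three
in clock dominates - even before counting the point made earlier in the
thread that the denser x86 instructions tend to do more work apiece.)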
* Re: Minutes from Feb 21 LSE Call
  2003-02-27 18:25 ` David Lang
@ 2003-02-28  8:58   ` Filip Van Raemdonck
  0 siblings, 0 replies; 266+ messages in thread
From: Filip Van Raemdonck @ 2003-02-28 8:58 UTC (permalink / raw)
To: David Lang; +Cc: Daniel Egger, linux-kernel, Bernd Eckenfels

On Thu, Feb 27, 2003 at 10:25:23AM -0800, David Lang wrote:
> On Thu, 27 Feb 2003, Daniel Egger wrote:
> > On Wed, 2003-02-26 at 06:30, Bernd Eckenfels wrote:
> > >
> > > Hmm.. are there any RISC cores which run even close to CISC speeds?
> >
> > Define speed: felt speed, clock speed or measurable speed?
>
> Now, when the RISC chip cores are just over 1GHz and talking about
> possibly hitting 1.8GHz within a year or so, the Intel chips are pushing
> 3GHz while the AMD chips are pushing 2GHz (true speed; I'll avoid
> commenting on the mistakes that Intel made on the P4 that make these
> chips competitive with each other :-)

And they need what kind of cooling to get there? Compare that to a
passively cooled G4.[1]

This is AFAIK not true for every RISC chip; I believe current UltraSPARC,
and Alpha even more so, do need significant cooling as well. But IMO your
question is not really relevant to the discussion, as especially the
common CISC examples you are referring to are artificially pushed to
higher clock speeds. Compare those to a passively cooled Transmeta or Via
chip and you can ask about clock speeds CISC vs CISC instead of CISC vs
RISC. Or compare a Transmeta or Via chip with RISC cores and see who wins
the clock speed race then :-)

Regards,

Filip

[1] Yes, those dual PowerMacs have a case exhaust fan which draws air
over the CPUs' fins. So does the fastest passively cooled, not clocked
down Intel desktop I've ever seen, a 300MHz PII Dell Dimension. (That's
also still the most silent "high-end" Intel desktop I've ever seen.)

--
"The only stupid question is the unasked one."
	-- Martin Schulze

^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call
  2003-02-27 18:25 ` David Lang
  2003-02-28  8:58   ` Filip Van Raemdonck
@ 2003-02-28 19:48   ` Arador
  2003-03-01  0:51     ` Chris Wedgwood
  1 sibling, 1 reply; 266+ messages in thread
From: Arador @ 2003-02-28 19:48 UTC (permalink / raw)
To: David Lang; +Cc: degger, ecki, linux-kernel

On Thu, 27 Feb 2003 10:25:23 -0800 (PST), David Lang
<david.lang@digitalinsight.com> wrote:

> Now, when the RISC chip cores are just over 1GHz and talking about
> possibly hitting 1.8GHz within a year or so, the Intel chips are pushing
> 3GHz while

I suppose everybody has seen this... (today on slashdot ;):
http://apple.slashdot.org/article.pl?sid=03/02/27/2227257&mode=thread&tid=136
"PowerPC 970 Running at 2.5 GHz"

^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-28 19:48 ` Arador @ 2003-03-01 0:51 ` Chris Wedgwood 2003-03-01 1:14 ` Davide Libenzi 2003-03-01 1:27 ` David Lang 0 siblings, 2 replies; 266+ messages in thread From: Chris Wedgwood @ 2003-03-01 0:51 UTC (permalink / raw) To: Arador; +Cc: David Lang, degger, ecki, linux-kernel On Fri, Feb 28, 2003 at 08:48:26PM +0100, Arador wrote: > I suppose eveybody has seen this... (today on slashdot ;): > http://apple.slashdot.org/article.pl?sid=03/02/27/2227257&mode=thread&tid=136 In a lab... who cares. I would guess the P4s or whatever are at 5GHz+ in Intel's labs. --cw ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-03-01 0:51 ` Chris Wedgwood @ 2003-03-01 1:14 ` Davide Libenzi 2003-03-01 1:27 ` David Lang 1 sibling, 0 replies; 266+ messages in thread From: Davide Libenzi @ 2003-03-01 1:14 UTC (permalink / raw) To: Chris Wedgwood; +Cc: Linux Kernel Mailing List On Fri, 28 Feb 2003, Chris Wedgwood wrote: > On Fri, Feb 28, 2003 at 08:48:26PM +0100, Arador wrote: > > > I suppose eveybody has seen this... (today on slashdot ;): > > http://apple.slashdot.org/article.pl?sid=03/02/27/2227257&mode=thread&tid=136 > > In a lab... who cares. > > I would guess the P4s or whatever are at 5GHz+ in Intel's labs. Last time I checked ( one year ago ), they were running a cooled ALU at 10GHz. - Davide ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call
  2003-03-01  0:51 ` Chris Wedgwood
  2003-03-01  1:14   ` Davide Libenzi
@ 2003-03-01  1:27   ` David Lang
  2003-03-01 14:15     ` Daniel Egger
  1 sibling, 1 reply; 266+ messages in thread
From: David Lang @ 2003-03-01 1:27 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: Arador, degger, ecki, linux-kernel

This implies that when the new chip is released it may be up to 2.5GHz
instead of the 1.8GHz previously listed. OK, so a year or so from now,
when the chip is released, it will have clock speeds in the range of
today's x86 chips. That is at least in the ballpark to remain
competitive.

David Lang

On Fri, 28 Feb 2003, Chris Wedgwood wrote:

> On Fri, Feb 28, 2003 at 08:48:26PM +0100, Arador wrote:
>
> > I suppose everybody has seen this... (today on slashdot ;):
> > http://apple.slashdot.org/article.pl?sid=03/02/27/2227257&mode=thread&tid=136
>
> In a lab... who cares.
>
> I would guess the P4s or whatever are at 5GHz+ in Intel's labs.
>
> --cw

^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call
  2003-03-01  1:27 ` David Lang
@ 2003-03-01 14:15   ` Daniel Egger
  0 siblings, 0 replies; 266+ messages in thread
From: Daniel Egger @ 2003-03-01 14:15 UTC (permalink / raw)
To: David Lang; +Cc: Chris Wedgwood, linux-kernel

On Sat, 2003-03-01 at 02:27, David Lang wrote:

> This implies that when the new chip is released it may be up to 2.5GHz
> instead of the 1.8GHz previously listed. OK, so a year or so from now,
> when the chip is released, it will have clock speeds in the range of
> today's x86 chips. That is at least in the ballpark to remain
> competitive.

From a pure clock point of view. For real-life performance, rather look
at the SPEC numbers. Actually I don't care about the clocking of a
processor, especially when vendors are pushing it with longer pipelines,
as this is all marketing crap (and let's see what problems Intel will
face with the Centrino architecture in explaining that lower clocks can
bring higher performance with a different design).

BTW: the distributed.net client now has an AltiVec implementation which
makes a G4/500 eat an Athlon/1.8GHz for breakfast...

--
Servus,
   Daniel

^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call
  2003-02-24  2:39 ` Linus Torvalds
  2003-02-24  3:28   ` David Lang
@ 2003-02-24  4:42   ` Martin J. Bligh
  2003-02-24  4:58     ` Linus Torvalds
  1 sibling, 1 reply; 266+ messages in thread
From: Martin J. Bligh @ 2003-02-24 4:42 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel

> The fact is, the "crap" doesn't matter that much. As proven by the fact
> that the "crap" processor family ends up being the one that eats pretty
> much everybody else for lunch on performance issues.

But is that because it's a better design? Or because it has more money
thrown at it? I suspect it's merely its mass-market dominance generating
huge amounts of cash to improve it ... and it got there through history,
not technical prowess.

Of course, to be pragmatic about it, none of this matters. The chip with
the best price:performance and market presence wins, not the best
technical design.

M.

^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call
  2003-02-24  4:42 ` Martin J. Bligh
@ 2003-02-24  4:58   ` Linus Torvalds
  0 siblings, 0 replies; 266+ messages in thread
From: Linus Torvalds @ 2003-02-24 4:58 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: linux-kernel

On Sun, 23 Feb 2003, Martin J. Bligh wrote:
>
> > The fact is, the "crap" doesn't matter that much. As proven by the fact
> > that the "crap" processor family ends up being the one that eats pretty
> > much everybody else for lunch on performance issues.
>
> But is that because it's a better design? Or because it has more money
> thrown at it? I suspect it's merely its mass-market dominance generating
> huge amounts of cash to improve it ... and it got there through history,
> not technical prowess.

Sure. It's to a large degree "more money and resources", no question
about that.

But what is "better design"? Would it have been possible to put as much
effort as Intel (and others) put into the x86 architecture into something
else, and make it even better?

MY standpoint is that the above question is _meaningless_ and stupid.
People did try. Very hard. Claiming anything else is clearly misguided.

But compatibility and price matter as much as - and often more than - raw
performance. Which means that even _if_ another architecture performed
better (and it certainly happened, in the heyday of the alpha), it
wouldn't much matter. People still stayed away from it in droves.

And in the end, that's why I don't like IA-64. I'll take back every
single bad thing I've ever said about IA-64 if Intel were just to sell
those things to the mass market instead of P4's. But clearly the IA-64
can't make it in that market, and thus it is made irrelevant. The same
way alpha was made irrelevant, _despite_ having had much better
performance - an advantage that ia-64 clearly doesn't have.

(Admittedly, alpha didn't have hugely better performance for very long.
Intel came out with the PPro, and took a _lot_ of people by surprise).

AMD's x86-64 approach is a lot more interesting not so much because of
any technical issues, but because AMD _can_ try to avoid the "irrelevant"
part. By having a part that _can_ potentially compete in the market
against a P4, AMD has something that is worth hoping for. Something that
can make a difference.

IBM with Power5 and Apple could be the same thing (yeah yeah, I
personally suspect it goes enough against IBM's normal approach that it
will cause some friction). A CPU that actually competes in a market that
is relevant.

Because server CPU's simply aren't very interesting from a technical
standpoint. I don't know of a _single_ CPU that ever grew down. But we've
seen a _lot_ of CPU's grow _up_. In other words: the small machines tend
to eat into the large ones, not the other way around. And if you start
from the large ones, you aren't going to make it in the long run.

Put yet another way: if I was on Intel's IA-32 team, I'd be a lot more
worried about those XScale people finally getting their act together than
I would be about IA-64.

		Linus

^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call
  [not found] ` <1510000.1045942974@[10.10.2.4]>
@ 2003-02-22 19:56   ` Larry McVoy
  2003-02-22 20:24     ` William Lee Irwin III
  ` (2 more replies)
  0 siblings, 3 replies; 266+ messages in thread
From: Larry McVoy @ 2003-02-22 19:56 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Mark Hahn, David S. Miller, Larry McVoy, linux-kernel

On Sat, Feb 22, 2003 at 11:42:55AM -0800, Martin J. Bligh wrote:
> >> Dell makes money on many things other than thin-margin PCs. And lo'
> >
> > Dell's revenue is 53/29/18% desktop/notebook/server;
> > 80% of US sales are to businesses. their annual report doesn't
> > break out service revenue.
>
> Interesting. Given the profit margins involved, I bet they still
> make more money on servers than desktops and notebooks combined
> (the annual report doesn't seem to list that). And that's before
> you take account of the "linux weighting" on top of that ...

Err, here's a news flash. Dell has just one server with more than 4 CPUs
and it tops out at 8. Everything else is clusters. And they call any
machine that doesn't have a head a server; they have servers starting at
$299. Yeah, that's right, $299.

http://www.dell.com/us/en/bsd/products/series_pedge_servers.htm

How much do you want to bet that more than 95% of their server revenue
comes from 4-CPU or smaller boxes? I wouldn't be surprised if it is more
like 99.5%. And you can configure yourself a pretty nice quad Xeon box
for $25K. Yeah, there is some profit in there, but nowhere near the huge
margins you are counting on to make your case.

--
---
Larry McVoy              lm at bitmover.com       http://www.bitmover.com/lm

^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 19:56 ` Larry McVoy @ 2003-02-22 20:24 ` William Lee Irwin III 2003-02-22 21:02 ` Martin J. Bligh 2003-02-22 21:29 ` Jeff Garzik 2 siblings, 0 replies; 266+ messages in thread From: William Lee Irwin III @ 2003-02-22 20:24 UTC (permalink / raw) To: Larry McVoy, Martin J. Bligh, Mark Hahn, David S. Miller, Larry McVoy, linux-kernel On Sat, Feb 22, 2003 at 11:56:42AM -0800, Larry McVoy wrote: > Err, here's a news flash. Dell has just one server with more than > 4 CPUS and it tops out at 8. Everything else is clusters. And they > call any machine that doesn't have a head a server, they have servers > starting $299. Yeah, that's right, $299. > http://www.dell.com/us/en/bsd/products/series_pedge_servers.htm Sounds like low-capacity boxen meant to minimize colocation costs via rackspace minimization. On Sat, Feb 22, 2003 at 11:56:42AM -0800, Larry McVoy wrote: > How much do you want to bet that more than 95% of their server revenue > comes from 4CPU or less boxes? I wouldn't be surprised if it is more > like 99.5%. And you can configure yourself a pretty nice quad xeon box > for $25K. Yeah, there is some profit in there but nowhere near the huge > margins you are counting on to make your case. Ask their marketing dept. or something. I can maximize utility integrals and find Nash equilibria, but can't tell you Dell's secrets. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call
  2003-02-22 19:56 ` Larry McVoy
  2003-02-22 20:24   ` William Lee Irwin III
@ 2003-02-22 21:02   ` Martin J. Bligh
  2003-02-22 22:06     ` Mark Hahn
  2003-02-22 23:15     ` Larry McVoy
  1 sibling, 2 replies; 266+ messages in thread
From: Martin J. Bligh @ 2003-02-22 21:02 UTC (permalink / raw)
To: Larry McVoy; +Cc: Mark Hahn, David S. Miller, linux-kernel

>> Interesting. Given the profit margins involved, I bet they still
>> make more money on servers than desktops and notebooks combined
>> (the annual report doesn't seem to list that). And that's before
>> you take account of the "linux weighting" on top of that ...
>
> Err, here's a news flash. Dell has just one server with more than 4 CPUs
> and it tops out at 8. Everything else is clusters. And they call any
> machine that doesn't have a head a server; they have servers starting at
> $299. Yeah, that's right, $299.
>
> http://www.dell.com/us/en/bsd/products/series_pedge_servers.htm
>
> How much do you want to bet that more than 95% of their server revenue
> comes from 4-CPU or smaller boxes? I wouldn't be surprised if it is more
> like 99.5%. And you can configure yourself a pretty nice quad Xeon box
> for $25K. Yeah, there is some profit in there, but nowhere near the huge
> margins you are counting on to make your case.

OK, so now you've slid from talking about PCs to 2-way to 4-way ...
perhaps because your original argument was fatally flawed.

The work we're doing on scalability has big impacts on 4-way systems as
well as the high end. We're also simultaneously dramatically improving
stability for smaller SMP machines by finding and reproducing, in 5
minutes, races that smaller machines might hit once every year or so, and
running high-stress workloads that thrash the hell out of various
subsystems, exposing bugs.

Some applications work well on clusters, which will give them cheaper
hardware, at the expense of a lot more complexity in userspace ...
depending on the scale of the system, that's a tradeoff that might go
either way. For applications that don't work well on clusters, you have
no real choice but to go with the high-end systems.

I'd like to see Linux across the board, as would many others. You don't
believe we can make it scale without screwing up the low end; I do
believe we can do that. Time will tell ... Linus et al are not stupid ...
we're not going to be able to submit stuff that screwed up the low end,
even if we wanted to.

M.

^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 21:02 ` Martin J. Bligh @ 2003-02-22 22:06 ` Mark Hahn 2003-02-22 22:17 ` William Lee Irwin III ` (3 more replies) 2003-02-22 23:15 ` Larry McVoy 1 sibling, 4 replies; 266+ messages in thread From: Mark Hahn @ 2003-02-22 22:06 UTC (permalink / raw) To: Martin J. Bligh; +Cc: linux-kernel > OK, so now you've slid from talking about PCs to 2-way to 4-way ... > perhaps because your original arguement was fatally flawed. oh, come on. the issue is whether memory is fast and flat. most "scalability" efforts are mainly trying to code around the fact that any ccNUMA (and most 4-ways) is going to be slow/bumpy. it is reasonable to worry that optimizations for imbalanced machines will hurt "normal" ones. is it worth hurting uni by 5% to give a 50% speedup to IBM's 32-way? I think not, simply because low-end machines are more important to Linux. the best way to kill Linux is to turn it into an OS best suited for $6+-digit machines. > For applications that don't work well on clusters, you have no real ccNUMA worst-case latencies are not much different from decent cluster (message-passing) latencies. getting an app to work on a cluster is a matter of programming will. regards, mark hahn. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 22:06 ` Mark Hahn @ 2003-02-22 22:17 ` William Lee Irwin III 2003-02-22 23:28 ` Larry McVoy 2003-02-22 22:44 ` Ben Greear ` (2 subsequent siblings) 3 siblings, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-22 22:17 UTC (permalink / raw) To: Mark Hahn; +Cc: Martin J. Bligh, linux-kernel On Sat, Feb 22, 2003 at 05:06:27PM -0500, Mark Hahn wrote: > ccNUMA worst-case latencies are not much different from decent > cluster (message-passing) latencies. Not even close, by several orders of magnitude. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call
  2003-02-22 22:17 ` William Lee Irwin III
@ 2003-02-22 23:28   ` Larry McVoy
  2003-02-22 23:47     ` Martin J. Bligh
  ` (2 more replies)
  0 siblings, 3 replies; 266+ messages in thread
From: Larry McVoy @ 2003-02-22 23:28 UTC (permalink / raw)
To: William Lee Irwin III, Mark Hahn, Martin J. Bligh, linux-kernel

On Sat, Feb 22, 2003 at 02:17:39PM -0800, William Lee Irwin III wrote:
> On Sat, Feb 22, 2003 at 05:06:27PM -0500, Mark Hahn wrote:
> > ccNUMA worst-case latencies are not much different from decent
> > cluster (message-passing) latencies.
>
> Not even close, by several orders of magnitude.

Err, I think you're wrong. It's been a long time since I looked, but I'm
pretty sure Myrinet had single-digit microseconds. Yup, Google rocks:
7.6 usecs, user to user. Last I checked, Sequent's worst case was around
there, right?

--
---
Larry McVoy              lm at bitmover.com       http://www.bitmover.com/lm

^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 23:28 ` Larry McVoy @ 2003-02-22 23:47 ` Martin J. Bligh 2003-02-23 0:09 ` Gerrit Huizenga 2003-02-24 18:36 ` Andy Pfiffer 2 siblings, 0 replies; 266+ messages in thread From: Martin J. Bligh @ 2003-02-22 23:47 UTC (permalink / raw) To: Larry McVoy, William Lee Irwin III, Mark Hahn, linux-kernel >> > ccNUMA worst-case latencies are not much different from decent >> > cluster (message-passing) latencies. >> >> Not even close, by several orders of magnitude. > > Err, I think you're wrong. It's been a long time since I looked, but I'm > pretty sure myrinet had single digit microseconds. Yup, google rocks, > 7.6 usecs, user to user. Last I checked, Sequents worst case was around > there, right? Sequent hardware is very old. Go time a Regatta. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 23:28 ` Larry McVoy 2003-02-22 23:47 ` Martin J. Bligh @ 2003-02-23 0:09 ` Gerrit Huizenga 2003-02-23 8:01 ` Larry McVoy 2003-02-24 18:36 ` Andy Pfiffer 2 siblings, 1 reply; 266+ messages in thread From: Gerrit Huizenga @ 2003-02-23 0:09 UTC (permalink / raw) To: Larry McVoy Cc: William Lee Irwin III, Mark Hahn, Martin J. Bligh, linux-kernel On Sat, 22 Feb 2003 15:28:59 PST, Larry McVoy wrote: > On Sat, Feb 22, 2003 at 02:17:39PM -0800, William Lee Irwin III wrote: > > On Sat, Feb 22, 2003 at 05:06:27PM -0500, Mark Hahn wrote: > > > ccNUMA worst-case latencies are not much different from decent > > > cluster (message-passing) latencies. > > > > Not even close, by several orders of magnitude. > > Err, I think you're wrong. It's been a long time since I looked, but I'm > pretty sure myrinet had single digit microseconds. Yup, google rocks, > 7.6 usecs, user to user. Last I checked, Sequents worst case was around > there, right? You are going to drag 1994 technology into this to compare against something in 2003? Hmm. You might win on that comparison. But yeah, Sequent way back then was in that ballpark. World has moved forwards since then... gerrit ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 0:09 ` Gerrit Huizenga @ 2003-02-23 8:01 ` Larry McVoy 2003-02-23 8:05 ` William Lee Irwin III 0 siblings, 1 reply; 266+ messages in thread From: Larry McVoy @ 2003-02-23 8:01 UTC (permalink / raw) To: Gerrit Huizenga Cc: Larry McVoy, William Lee Irwin III, Mark Hahn, Martin J. Bligh, linux-kernel On Sat, Feb 22, 2003 at 04:09:15PM -0800, Gerrit Huizenga wrote: > On Sat, 22 Feb 2003 15:28:59 PST, Larry McVoy wrote: > > On Sat, Feb 22, 2003 at 02:17:39PM -0800, William Lee Irwin III wrote: > > > On Sat, Feb 22, 2003 at 05:06:27PM -0500, Mark Hahn wrote: > > > > ccNUMA worst-case latencies are not much different from decent > > > > cluster (message-passing) latencies. > > > > > > Not even close, by several orders of magnitude. > > > > Err, I think you're wrong. It's been a long time since I looked, but I'm > > pretty sure myrinet had single digit microseconds. Yup, google rocks, > > 7.6 usecs, user to user. Last I checked, Sequents worst case was around > > there, right? > > You are going to drag 1994 technology into this to compare against > something in 2003? Hmm. You might win on that comparison. But yeah, > Sequent way back then was in that ballpark. World has moved forwards > since then... Really? "Several orders of magnitude"? Show me the data. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 8:01 ` Larry McVoy @ 2003-02-23 8:05 ` William Lee Irwin III 0 siblings, 0 replies; 266+ messages in thread From: William Lee Irwin III @ 2003-02-23 8:05 UTC (permalink / raw) To: Larry McVoy, Gerrit Huizenga, Larry McVoy, Mark Hahn, Martin J. Bligh, linux-kernel On Sat, Feb 22, 2003 at 04:09:15PM -0800, Gerrit Huizenga wrote: >> You are going to drag 1994 technology into this to compare against >> something in 2003? Hmm. You might win on that comparison. But yeah, >> Sequent way back then was in that ballpark. World has moved forwards >> since then... On Sun, Feb 23, 2003 at 12:01:43AM -0800, Larry McVoy wrote: > Really? "Several orders of magnitude"? Show me the data. I was assuming ethernet when I said that. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 23:28 ` Larry McVoy 2003-02-22 23:47 ` Martin J. Bligh 2003-02-23 0:09 ` Gerrit Huizenga @ 2003-02-24 18:36 ` Andy Pfiffer 2 siblings, 0 replies; 266+ messages in thread From: Andy Pfiffer @ 2003-02-24 18:36 UTC (permalink / raw) To: Larry McVoy Cc: William Lee Irwin III, Mark Hahn, Martin J. Bligh, linux-kernel On Sat, 2003-02-22 at 15:28, Larry McVoy wrote: > On Sat, Feb 22, 2003 at 02:17:39PM -0800, William Lee Irwin III wrote: > > On Sat, Feb 22, 2003 at 05:06:27PM -0500, Mark Hahn wrote: > > > ccNUMA worst-case latencies are not much different from decent > > > cluster (message-passing) latencies. > > > > Not even close, by several orders of magnitude. > > Err, I think you're wrong. It's been a long time since I looked, but I'm > pretty sure myrinet had single digit microseconds. Yup, google rocks, > 7.6 usecs, user to user. Last I checked, Sequents worst case was around > there, right? FYI: The Intel/DOE ASCI Red system (>1 TFLOPS) delivered user-to-user messaging of < 5us. With a tail wind, peak point-to-point data rates, delivered from a user-mode buffer into another user-mode buffer anywhere else on the system were just shy of 400 megabytes/second (actual rates could be affected by several factors -- obviously). Andy ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 22:06 ` Mark Hahn 2003-02-22 22:17 ` William Lee Irwin III @ 2003-02-22 22:44 ` Ben Greear 2003-02-23 23:29 ` Bill Davidsen 2003-02-22 23:10 ` Martin J. Bligh 2003-02-25 2:19 ` Hans Reiser 3 siblings, 1 reply; 266+ messages in thread From: Ben Greear @ 2003-02-22 22:44 UTC (permalink / raw) To: Mark Hahn; +Cc: Martin J. Bligh, linux-kernel Mark Hahn wrote: >>OK, so now you've slid from talking about PCs to 2-way to 4-way ... >>perhaps because your original arguement was fatally flawed. > > > oh, come on. the issue is whether memory is fast and flat. > most "scalability" efforts are mainly trying to code around the fact > that any ccNUMA (and most 4-ways) is going to be slow/bumpy. > it is reasonable to worry that optimizations for imbalanced machines > will hurt "normal" ones. is it worth hurting uni by 5% to give > a 50% speedup to IBM's 32-way? I think not, simply because > low-end machines are more important to Linux. > > the best way to kill Linux is to turn it into an OS best suited > for $6+-digit machines. Linux has a key feature that most other OS's lack: It can (easily, and by all) be recompiled for a particular architecture. So, there is no particular reason why optimizing for a high-end system has to kill performance on uni-processor machines. For instance, don't locks simply get compiled away to nothing on uni-processor machines? -- Ben Greear <greearb@candelatech.com> <Ben_Greear AT excite.com> President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear ^ permalink raw reply [flat|nested] 266+ messages in thread
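(To make Ben's question concrete -- a minimal sketch assuming a simplified lock type, not the real <linux/spinlock.h> headers -- a spinlock can indeed compile down to nothing when CONFIG_SMP is off, since no other CPU can be inside the critical section:)

/* Hypothetical sketch, not the actual kernel headers: on an SMP build the
 * lock really spins; on a UP build it costs nothing at run time. */
#ifdef CONFIG_SMP
typedef struct { volatile int locked; } spinlock_t;
static inline void spin_lock(spinlock_t *l)
{
        while (__sync_lock_test_and_set(&l->locked, 1))
                ;       /* busy-wait until the current holder releases */
}
static inline void spin_unlock(spinlock_t *l)
{
        __sync_lock_release(&l->locked);
}
#else   /* UP: no other CPU can race with us, so the "lock" vanishes */
typedef struct { } spinlock_t;  /* empty struct; a GCC extension the kernel relies on */
#define spin_lock(l)    do { (void)(l); } while (0)
#define spin_unlock(l)  do { (void)(l); } while (0)
#endif

(With kernel preemption in the mix, even the UP case has to become a preempt-disable/enable pair rather than nothing at all, which is part of what the rest of the thread argues about.)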
* Re: Minutes from Feb 21 LSE Call 2003-02-22 22:44 ` Ben Greear @ 2003-02-23 23:29 ` Bill Davidsen 2003-02-23 23:37 ` Martin J. Bligh 0 siblings, 1 reply; 266+ messages in thread From: Bill Davidsen @ 2003-02-23 23:29 UTC (permalink / raw) To: Ben Greear; +Cc: Martin J. Bligh, Linux Kernel Mailing List On Sat, 22 Feb 2003, Ben Greear wrote: > Mark Hahn wrote: > > oh, come on. the issue is whether memory is fast and flat. > > most "scalability" efforts are mainly trying to code around the fact > > that any ccNUMA (and most 4-ways) is going to be slow/bumpy. > > it is reasonable to worry that optimizations for imbalanced machines > > will hurt "normal" ones. is it worth hurting uni by 5% to give > > a 50% speedup to IBM's 32-way? I think not, simply because > > low-end machines are more important to Linux. > > > > the best way to kill Linux is to turn it into an OS best suited > > for $6+-digit machines. > > Linux has a key feature that most other OS's lack: It can (easily, and by all) > be recompiled for a particular architecture. So, there is no particular reason why > optimizing for a high-end system has to kill performance on uni-processor > machines. This is exactly correct, although building just the optimal kernel for a machine is still somewhat art rather than science. You have to choose the trade-offs carefully. > For instance, don't locks simply get compiled away to nothing on > uni-processor machines? Preempt causes most of the issues of SMP with few of the benefits. There are loads for which it's ideal, but for general use it may not be the right feature, and I ran it during the time when it was just a patch, but lately I'm convinced it's for special occasions. -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 23:29 ` Bill Davidsen @ 2003-02-23 23:37 ` Martin J. Bligh 2003-02-24 4:57 ` Larry McVoy 0 siblings, 1 reply; 266+ messages in thread From: Martin J. Bligh @ 2003-02-23 23:37 UTC (permalink / raw) To: Bill Davidsen, Ben Greear; +Cc: Linux Kernel Mailing List >> For instance, don't locks simply get compiled away to nothing on >> uni-processor machines? > > Preempt causes most of the issues of SMP with few of the benefits. There > are loads for which it's ideal, but for general use it may not be the > right feature, and I ran it during the time when it was just a patch, but > lately I'm convinced it's for special occasions. Note that preemption was pushed by the embedded people Larry was advocating for, not the big-machine crowd .... ironic, eh? M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 23:37 ` Martin J. Bligh @ 2003-02-24 4:57 ` Larry McVoy 2003-02-24 6:10 ` Gerhard Mack 2003-02-24 7:44 ` Bill Huey 0 siblings, 2 replies; 266+ messages in thread From: Larry McVoy @ 2003-02-24 4:57 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Bill Davidsen, Ben Greear, Linux Kernel Mailing List On Sun, Feb 23, 2003 at 03:37:49PM -0800, Martin J. Bligh wrote: > >> For instance, don't locks simply get compiled away to nothing on > >> uni-processor machines? > > > > Preempt causes most of the issues of SMP with few of the benefits. There > > are loads for which it's ideal, but for general use it may not be the > > right feature, and I ran it during the time when it was just a patch, but > > lately I'm convinced it's for special occasions. > > Note that preemption was pushed by the embedded people Larry was advocating > for, not the big-machine crowd .... ironic, eh? Dig through the mail logs and you'll see that I was completely against the preemption patch. I think it is a bad idea, if you want real time, use rt/linux, it solves the problem right. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 4:57 ` Larry McVoy @ 2003-02-24 6:10 ` Gerhard Mack 2003-02-24 6:52 ` Larry McVoy 2003-02-24 7:44 ` Bill Huey 1 sibling, 1 reply; 266+ messages in thread From: Gerhard Mack @ 2003-02-24 6:10 UTC (permalink / raw) To: Larry McVoy Cc: Martin J. Bligh, Bill Davidsen, Ben Greear, Linux Kernel Mailing List On Sun, 23 Feb 2003, Larry McVoy wrote: > Date: Sun, 23 Feb 2003 20:57:17 -0800 > From: Larry McVoy <lm@bitmover.com> > To: Martin J. Bligh <mbligh@aracnet.com> > Cc: Bill Davidsen <davidsen@tmr.com>, Ben Greear <greearb@candelatech.com>, > Linux Kernel Mailing List <linux-kernel@vger.kernel.org> > Subject: Re: Minutes from Feb 21 LSE Call > > On Sun, Feb 23, 2003 at 03:37:49PM -0800, Martin J. Bligh wrote: > > >> For instance, don't locks simply get compiled away to nothing on > > >> uni-processor machines? > > > > > > Preempt causes most of the issues of SMP with few of the benefits. There > > > are loads for which it's ideal, but for general use it may not be the > > > right feature, and I ran it during the time when it was just a patch, but > > > lately I'm convinced it's for special occasions. > > > > Note that preemption was pushed by the embedded people Larry was advocating > > for, not the big-machine crowd .... ironic, eh? > > Dig through the mail logs and you'll see that I was completely against the > preemption patch. I think it is a bad idea, if you want real time, use > rt/linux, it solves the problem right. So you're saying I need to switch to rt/linux to run games or an mp3 player? Gerhard -- Gerhard Mack gmack@innerfire.net <>< As a computer I find your faith in technology amusing. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 6:10 ` Gerhard Mack @ 2003-02-24 6:52 ` Larry McVoy 2003-02-24 7:46 ` Bill Huey 0 siblings, 1 reply; 266+ messages in thread From: Larry McVoy @ 2003-02-24 6:52 UTC (permalink / raw) To: Gerhard Mack Cc: Larry McVoy, Martin J. Bligh, Bill Davidsen, Ben Greear, Linux Kernel Mailing List > > Dig through the mail logs and you'll see that I was completely against the > > preemption patch. I think it is a bad idea, if you want real time, use > > rt/linux, it solves the problem right. > > So you're saying I need to switch to rt/linux to run games or an mp3 player? It depends on the quality you want. If you want it to work without exception, yeah, I guess that is what I'm saying. People seem to be willing to put up with sloppy playback on a computer that they would freak out over if it happened on their TV. rt/linux will make your el cheapo laptop actually deliver what you need. I think there has been a fair amount of discussion of this sort of stuff in the games world. Some game company got taken to task recently because even 2GHz machines couldn't run their game properly. Makes me wonder if a real time system is what they need. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 6:52 ` Larry McVoy @ 2003-02-24 7:46 ` Bill Huey 0 siblings, 0 replies; 266+ messages in thread From: Bill Huey @ 2003-02-24 7:46 UTC (permalink / raw) To: Larry McVoy, Gerhard Mack, Larry McVoy, Martin J. Bligh, Bill Davidsen, Ben Greear, Linux Kernel Mailing List Cc: Bill Huey (Hui) On Sun, Feb 23, 2003 at 10:52:04PM -0800, Larry McVoy wrote: > willing to put up with sloppy playback on a computer that they would > freak out over if it happened on their TV. rt/linux will make your > el cheapo laptop actually deliver what you need. > > I think there has been a fair amount of discussion of this sort of stuff > in the games world. Some game company got taken to task recently because > even 2Ghz machines couldn't run their game properly. Makes me wonder if > a real time system is what they need. RT for TV, mp3 player and game performance ? What the hell happened to network, disk QoS and carrier grade issues with modern operating systems as it concerns telecoms ? VoIP ? My god. bill ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 4:57 ` Larry McVoy 2003-02-24 6:10 ` Gerhard Mack @ 2003-02-24 7:44 ` Bill Huey 2003-02-24 7:54 ` William Lee Irwin III 1 sibling, 1 reply; 266+ messages in thread From: Bill Huey @ 2003-02-24 7:44 UTC (permalink / raw) To: Larry McVoy, Martin J. Bligh, Bill Davidsen, Ben Greear, Linux Kernel Mailing List Cc: Bill Huey (Hui) On Sun, Feb 23, 2003 at 08:57:17PM -0800, Larry McVoy wrote: > Dig through the mail logs and you'll see that I was completely against the > preemption patch. I think it is a bad idea, if you want real time, use > rt/linux, it solves the problem right. And large unbounded operation on data structures. DOS, a single tasking operating system is fast running a single thread of execution too, it just happens to also be completely useless. Whether folks like it or not, embedded RT is the future of Linux much more so than any single NUMA machine that's sold or can be sold by IBM, SGI and any other vendor of that type. bill ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 7:44 ` Bill Huey @ 2003-02-24 7:54 ` William Lee Irwin III 2003-02-24 8:00 ` Bill Huey 0 siblings, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-24 7:54 UTC (permalink / raw) To: Bill Huey Cc: Larry McVoy, Martin J. Bligh, Bill Davidsen, Ben Greear, Linux Kernel Mailing List On Sun, Feb 23, 2003 at 08:57:17PM -0800, Larry McVoy wrote: >> Dig through the mail logs and you'll see that I was completely against the >> preemption patch. I think it is a bad idea, if you want real time, use >> rt/linux, it solves the problem right. On Sun, Feb 23, 2003 at 11:44:47PM -0800, Bill Huey wrote: > And large unbounded operation on data structures. DOS, a single tasking > operating system is fast running a single thread of execution too, it just > happens to also be completely useless. > Whether folks like it or not, embedded RT is the future of Linux much more > so than any single NUMA machine that's sold or can be sold by IBM, SGI and > any other vendor of that type. And scalability is as essential there as it is on 512x/16TB O2K's. For this, it's _downward_ scalability, where "downward" is relative to "typical" UP x86 boxen. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 7:54 ` William Lee Irwin III @ 2003-02-24 8:00 ` Bill Huey 2003-02-24 8:40 ` Andrew Morton 2003-02-24 8:43 ` William Lee Irwin III 0 siblings, 2 replies; 266+ messages in thread From: Bill Huey @ 2003-02-24 8:00 UTC (permalink / raw) To: William Lee Irwin III, Larry McVoy, Martin J. Bligh, Bill Davidsen, Ben Greear, Linux Kernel Mailing List Cc: Bill Huey (Hui) On Sun, Feb 23, 2003 at 11:54:30PM -0800, William Lee Irwin III wrote: > On Sun, Feb 23, 2003 at 11:44:47PM -0800, Bill Huey wrote: > > And large unbounded operation on data structures. DOS, a single tasking > > operating system is fast running a single thread of execution too, it just > > happens to also be completely useless. > > Whether folks like it or not, embedded RT is the future of Linux much more > > so than any single NUMA machine that's sold or can be sold by IBM, SGI and > > any other vendor of that type. > > And scalability is as essential there as it is on 512x/16TB O2K's. > > For this, it's _downward_ scalability, where "downward" is relative to > "typical" UP x86 boxen. The good thing about Linux is that, with some compile options, stuff (scalability) can be inserted and removed at any time. One shouldn't narrow their view of what an OS can be out of a strict tradition. I don't buy this spinlock-for-all-locking tradition with no preemption, especially given some of the IO performance improvement that happened as a courtesy of preempt. Somehow that was forgotten in Larry's discussion. bill ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 8:00 ` Bill Huey @ 2003-02-24 8:40 ` Andrew Morton 2003-02-24 8:50 ` William Lee Irwin III 2003-02-24 8:56 ` Bill Huey 2003-02-24 8:43 ` William Lee Irwin III 1 sibling, 2 replies; 266+ messages in thread From: Andrew Morton @ 2003-02-24 8:40 UTC (permalink / raw) To: Bill Huey; +Cc: wli, lm, mbligh, davidsen, greearb, linux-kernel, billh Bill Huey (Hui) <billh@gnuppy.monkey.org> wrote: > > especially given some of the IO performance improvement that happened as a courtesy > of preempt. There is no evidence for any such thing. Nor has any plausible theory been put forward as to why such an improvement should occur. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 8:40 ` Andrew Morton @ 2003-02-24 8:50 ` William Lee Irwin III 2003-02-24 16:17 ` yodaiken 2003-02-24 8:56 ` Bill Huey 1 sibling, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-24 8:50 UTC (permalink / raw) To: Andrew Morton; +Cc: Bill Huey, lm, mbligh, davidsen, greearb, linux-kernel Bill Huey (Hui) <billh@gnuppy.monkey.org> wrote: >> especially given some of the IO performance improvement that >> happened as a courtesy of preempt. On Mon, Feb 24, 2003 at 12:40:05AM -0800, Andrew Morton wrote: > There is no evidence for any such thing. Nor has any plausible > theory been put forward as to why such an improvement should occur. There's a vague notion in my head that it should decrease scheduling latencies in general, possibly including responses to io completion. No idea how that lines up with reality. You've actually tracked scheduling latencies at least at some point in the past. What kind of results have you seen from the stuff (if any)? -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 8:50 ` William Lee Irwin III @ 2003-02-24 16:17 ` yodaiken 2003-02-24 23:13 ` William Lee Irwin III 2003-02-25 2:07 ` Bill Huey 0 siblings, 2 replies; 266+ messages in thread From: yodaiken @ 2003-02-24 16:17 UTC (permalink / raw) To: William Lee Irwin III, Andrew Morton, Bill Huey, lm, mbligh, davidsen, greearb, linux-kernel On Mon, Feb 24, 2003 at 12:50:31AM -0800, William Lee Irwin III wrote: > Bill Huey (Hui) <billh@gnuppy.monkey.org> wrote: > >> especially given some of the IO performance improvement that > >> happened as a courtesy of preempt. > > On Mon, Feb 24, 2003 at 12:40:05AM -0800, Andrew Morton wrote: > > There is no evidence for any such thing. Nor has any plausible > > theory been put forward as to why such an improvement should occur. > > There's a vague notion in my head that it should decrease scheduling Vague notions seems to be the level of data on this topic. -- --------------------------------------------------------- Victor Yodaiken Finite State Machine Labs: The RTLinux Company. www.fsmlabs.com www.rtlinux.com 1+ 505 838 9109 ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 16:17 ` yodaiken @ 2003-02-24 23:13 ` William Lee Irwin III 2003-02-24 23:27 ` yodaiken 2003-02-25 2:07 ` Bill Huey 1 sibling, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-24 23:13 UTC (permalink / raw) To: yodaiken Cc: Andrew Morton, Bill Huey, lm, mbligh, davidsen, greearb, linux-kernel On Mon, Feb 24, 2003 at 12:50:31AM -0800, William Lee Irwin III wrote: >> There's a vague notion in my head that it should decrease scheduling On Mon, Feb 24, 2003 at 09:17:58AM -0700, yodaiken@fsmlabs.com wrote: > Vague notions seems to be the level of data on this topic. Which, if you had bothered reading the rest of my post, is why I asked for data. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 23:13 ` William Lee Irwin III @ 2003-02-24 23:27 ` yodaiken 2003-02-24 23:54 ` William Lee Irwin III 2003-02-25 2:17 ` Bill Huey 0 siblings, 2 replies; 266+ messages in thread From: yodaiken @ 2003-02-24 23:27 UTC (permalink / raw) To: William Lee Irwin III, yodaiken, Andrew Morton, Bill Huey, lm, mbligh, davidsen, greearb, linux-kernel On Mon, Feb 24, 2003 at 03:13:41PM -0800, William Lee Irwin III wrote: > On Mon, Feb 24, 2003 at 12:50:31AM -0800, William Lee Irwin III wrote: > >> There's a vague notion in my head that it should decrease scheduling > > On Mon, Feb 24, 2003 at 09:17:58AM -0700, yodaiken@fsmlabs.com wrote: > > Vague notions seems to be the level of data on this topic. > > Which, if you had bothered reading the rest of my post, is why I asked > for data. I'm not sure what you are complaining about. I don't think there is good or even marginal data or explanations of this "effect". -- --------------------------------------------------------- Victor Yodaiken Finite State Machine Labs: The RTLinux Company. www.fsmlabs.com www.rtlinux.com 1+ 505 838 9109 ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 23:27 ` yodaiken @ 2003-02-24 23:54 ` William Lee Irwin III 2003-02-24 23:54 ` yodaiken 2003-02-25 2:17 ` Bill Huey 1 sibling, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-24 23:54 UTC (permalink / raw) To: yodaiken Cc: Andrew Morton, Bill Huey, lm, mbligh, davidsen, greearb, linux-kernel On Mon, Feb 24, 2003 at 03:13:41PM -0800, William Lee Irwin III wrote: >> Which, if you had bothered reading the rest of my post, is why I asked >> for data. On Mon, Feb 24, 2003 at 04:27:54PM -0700, yodaiken@fsmlabs.com wrote: > I'm not sure what you are complaining about. I don't think there is good > or even marginal data or explanations of this "effect". I'm complaining about being quoted out of context and the animus against unsupported preempt claims being directed against me. Re-stating preempt's "ostensible purpose" is the purpose of the "vague notion", not adding to the pile of speculation. For the data, akpm has apparently tracked scheduling latency, so there is a chance he actually knows whether it's serving its ostensible purpose as opposed to having a large stockpile of overwrought wisecracks and a propensity for quoting out of context. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 23:54 ` William Lee Irwin III @ 2003-02-24 23:54 ` yodaiken 0 siblings, 0 replies; 266+ messages in thread From: yodaiken @ 2003-02-24 23:54 UTC (permalink / raw) To: William Lee Irwin III, yodaiken, Andrew Morton, Bill Huey, lm, mbligh, davidsen, greearb, linux-kernel On Mon, Feb 24, 2003 at 03:54:33PM -0800, William Lee Irwin III wrote: > On Mon, Feb 24, 2003 at 03:13:41PM -0800, William Lee Irwin III wrote: > >> Which, if you had bothered reading the rest of my post, is why I asked > >> for data. > > On Mon, Feb 24, 2003 at 04:27:54PM -0700, yodaiken@fsmlabs.com wrote: > > I'm not sure what you are complaining about. I don't think there is good > > or even marginal data or explanations of this "effect". > > I'm complaining about being quoted out of context and the animus against > unsupported preempt claims being directed against me. I did not quote you out of context. > For the data, akpm has apparently tracked scheduling latency, so there > is a chance he actually knows whether it's serving its ostensible > purpose as opposed to having a large stockpile of overwrought wisecracks > and a propensity for quoting out of context. You seem determined to pick a fight. Goodbye. -- --------------------------------------------------------- Victor Yodaiken Finite State Machine Labs: The RTLinux Company. www.fsmlabs.com www.rtlinux.com 1+ 505 838 9109 ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 23:27 ` yodaiken 2003-02-24 23:54 ` William Lee Irwin III @ 2003-02-25 2:17 ` Bill Huey 2003-02-25 2:24 ` yodaiken ` (4 more replies) 1 sibling, 5 replies; 266+ messages in thread From: Bill Huey @ 2003-02-25 2:17 UTC (permalink / raw) To: yodaiken Cc: William Lee Irwin III, Andrew Morton, lm, mbligh, davidsen, greearb, linux-kernel, Bill Huey (Hui) On Mon, Feb 24, 2003 at 04:27:54PM -0700, yodaiken@fsmlabs.com wrote: > I'm not sure what you are complaining about. I don't think there is good > or even marginal data or explanations of this "effect". You don't need data. It's conceptually obvious. If you have a higher priority thread that's not running because another thread of lower priority is hogging the CPU for some unknown operation in the kernel, then you're going to be less able to respond to external events from the IO system and other things with respect to a Unix style priority scheduler. That's why we have fully preemptive RTOSes to deal with that and priority inheritance, both of which are fundamental to any kind of fixed-priority RTOS. If your scheduler is scheduling crap, then it's not going to be very effective at scheduling... Rhetorical question... what the hell do you think this is about? http://linuxdevices.com/articles/AT5698775833.html It's about getting relationships inside the kernel to respect and be controllable by the scheduler in some formal manner, not some random not-so-well-thought-out hack of the day. bill ^ permalink raw reply [flat|nested] 266+ messages in thread
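(For reference, the priority-inheritance protocol Bill mentions is the same idea POSIX exposes to user space. A minimal sketch using the standard pthreads API follows -- a user-space analogy, not the in-kernel locking being debated, and the scenario is hypothetical:)

#include <pthread.h>

/* Sketch: a mutex with the priority-inheritance protocol.  While a
 * low-priority thread holds "lock", any higher-priority waiter temporarily
 * boosts the holder's priority, bounding the inversion described above. */
int main(void)
{
        pthread_mutexattr_t attr;
        pthread_mutex_t lock;

        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        pthread_mutex_init(&lock, &attr);

        pthread_mutex_lock(&lock);
        /* ... critical section: the holder runs at the waiters' priority ... */
        pthread_mutex_unlock(&lock);

        pthread_mutex_destroy(&lock);
        pthread_mutexattr_destroy(&attr);
        return 0;
}

(Whether a given libc of that era actually implemented PTHREAD_PRIO_INHERIT is another matter; the point is only to show the mechanism being argued about.)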
* Re: Minutes from Feb 21 LSE Call 2003-02-25 2:17 ` Bill Huey @ 2003-02-25 2:24 ` yodaiken 2003-02-25 2:35 ` Bill Huey 2003-02-25 2:43 ` Bill Huey 2003-02-25 2:32 ` Larry McVoy ` (3 subsequent siblings) 4 siblings, 2 replies; 266+ messages in thread From: yodaiken @ 2003-02-25 2:24 UTC (permalink / raw) To: Bill Huey Cc: yodaiken, William Lee Irwin III, Andrew Morton, lm, mbligh, davidsen, greearb, linux-kernel On Mon, Feb 24, 2003 at 06:17:36PM -0800, Bill Huey wrote: > On Mon, Feb 24, 2003 at 04:27:54PM -0700, yodaiken@fsmlabs.com wrote: > > I'm not sure what you are complaining about. I don't think there is good > > or even marginal data or explanations of this "effect". > > You don't need data. It's conceptually obvious. If you have a higher Oh. Well that makes things clear enough. Goodbye. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 2:24 ` yodaiken @ 2003-02-25 2:35 ` Bill Huey 2003-02-25 2:43 ` Bill Huey 1 sibling, 0 replies; 266+ messages in thread From: Bill Huey @ 2003-02-25 2:35 UTC (permalink / raw) To: yodaiken Cc: William Lee Irwin III, Andrew Morton, lm, mbligh, davidsen, greearb, linux-kernel On Mon, Feb 24, 2003 at 07:24:45PM -0700, yodaiken@fsmlabs.com wrote: > Oh. Well that makes things clear enough. Goodbye. It's completely clear. Ok, now I know you're a completely screwed narrow minded asshole. Good grief. bill ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 2:24 ` yodaiken 2003-02-25 2:35 ` Bill Huey @ 2003-02-25 2:43 ` Bill Huey 1 sibling, 0 replies; 266+ messages in thread From: Bill Huey @ 2003-02-25 2:43 UTC (permalink / raw) To: yodaiken Cc: William Lee Irwin III, Andrew Morton, lm, mbligh, davidsen, greearb, linux-kernel, Bill Huey (Hui) On Mon, Feb 24, 2003 at 07:24:45PM -0700, yodaiken@fsmlabs.com wrote: > Oh. Well that makes things clear enough. Goodbye. I'd be worried about why you don't have a competent reply to this article again: http://linuxdevices.com/articles/AT5698775833.html Whether you, Larry and other so-called Unix traditionalists realize it or not, "resource kernels" from the folks like CMU's RTOS group are going to rule you and the rest of the RT community. It's the future. bill ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 2:17 ` Bill Huey 2003-02-25 2:24 ` yodaiken @ 2003-02-25 2:32 ` Larry McVoy 2003-02-25 2:40 ` Bill Huey 2003-02-25 5:24 ` Rik van Riel ` (2 subsequent siblings) 4 siblings, 1 reply; 266+ messages in thread From: Larry McVoy @ 2003-02-25 2:32 UTC (permalink / raw) To: Bill Huey Cc: yodaiken, William Lee Irwin III, Andrew Morton, lm, mbligh, davidsen, greearb, linux-kernel > Rhetorical question... what the hell do you think this is about? > > http://linuxdevices.com/articles/AT5698775833.html Hmm, maybe someone who is advertising their company's mistaken approach? -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 2:32 ` Larry McVoy @ 2003-02-25 2:40 ` Bill Huey 0 siblings, 0 replies; 266+ messages in thread From: Bill Huey @ 2003-02-25 2:40 UTC (permalink / raw) To: Larry McVoy, yodaiken, William Lee Irwin III, Andrew Morton, lm, mbligh, davidsen, greearb, linux-kernel Cc: Bill Huey (Hui) On Mon, Feb 24, 2003 at 06:32:26PM -0800, Larry McVoy wrote: > Hmm, maybe someone who is advertising their company's mistaken approach? Or maybe your understanding of this is faded and you haven't kept up with your generational contemporaries, like our BSD/OS engineers, about schedulers, preemption and priority inheritance. Again, assuming that you actually understand what this means, read this: http://linuxdevices.com/articles/AT5698775833.html ...because I don't think you really do understand it. bill ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 2:17 ` Bill Huey 2003-02-25 2:24 ` yodaiken 2003-02-25 2:32 ` Larry McVoy @ 2003-02-25 5:24 ` Rik van Riel 2003-02-25 15:30 ` Alan Cox 2003-02-26 19:31 ` Bill Davidsen 4 siblings, 0 replies; 266+ messages in thread From: Rik van Riel @ 2003-02-25 5:24 UTC (permalink / raw) To: Bill Huey Cc: yodaiken, William Lee Irwin III, Andrew Morton, lm, mbligh, davidsen, greearb, linux-kernel On Mon, 24 Feb 2003, Bill Huey wrote: > You don't need data. It's conceptually obvious. I hope you realise this is about as good as a real godwination? Rik -- Engineers don't grow up, they grow sideways. http://www.surriel.com/ http://kernelnewbies.org/ ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 2:17 ` Bill Huey ` (2 preceding siblings ...) 2003-02-25 5:24 ` Rik van Riel @ 2003-02-25 15:30 ` Alan Cox 2003-02-25 14:59 ` Bill Huey 2003-02-26 19:31 ` Bill Davidsen 4 siblings, 1 reply; 266+ messages in thread From: Alan Cox @ 2003-02-25 15:30 UTC (permalink / raw) To: Bill Huey Cc: yodaiken, William Lee Irwin III, Andrew Morton, lm, mbligh, davidsen, greearb, Linux Kernel Mailing List On Tue, 2003-02-25 at 02:17, Bill Huey wrote: > You don't need data. It's conceptually obvious. If you have a higher > priority thread that's not running because another thread of lower priority > is hogging the CPU for some unknown operation in the kernel, then you're > going to be less able to respond to external events from the IO system and > other things with respect to a Unix style priority scheduler. Nothing is conceptually obvious. That's the difference between 'science' and engineering. Our bridges have to stay up. > It's about getting relationships inside the kernel to respect and be > controllable by the scheduler in some formal manner, not some random > not-so-well-thought-out hack of the day. Prove it, compute the bounded RT worst case. You can't do it. Linux, NT, VMS and so on are all basically "armwaved real time". Now for a lot of things armwaved realtime is ok, one 'click' an hour on a phone call from a DSP load miss isn't a big deal. Just don't try the same with precision heavy machinery. It's not a lack of competence, we genuinely don't yet have the understanding in computing to solve some of the problems people are content to armwave about. If I need extremely high provable precision, Victor's approach is right, if I want armwaved realtimeish behaviour with a more convenient way of working then Victor's approach may not be the best. It's called engineering. There are multiple ways to build most things, each with different advantages, there are multiple ways to model it each with more accuracy in some areas. Knowing how to use the right tool is a lot more important than having some religion about it. Alan ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 15:30 ` Alan Cox @ 2003-02-25 14:59 ` Bill Huey 2003-02-25 15:44 ` yodaiken 0 siblings, 1 reply; 266+ messages in thread From: Bill Huey @ 2003-02-25 14:59 UTC (permalink / raw) To: Alan Cox Cc: yodaiken, William Lee Irwin III, Andrew Morton, lm, mbligh, davidsen, greearb, Linux Kernel Mailing List, Bill Huey (Hui) On Tue, Feb 25, 2003 at 03:30:59PM +0000, Alan Cox wrote: > Nothing is conceptually obvious. That's the difference between 'science' > and engineering. Our bridges have to stay up. Yes, I absolutely agree with this. It shouldn't be the case where one is over the other; they should have a complementary relationship. > > It's about getting relationships inside the kernel to respect and be > > controllable by the scheduler in some formal manner, not some random > > not-so-well-thought-out hack of the day. > > Prove it, compute the bounded RT worst case. You can't do it. Linux, NT, > VMS and so on are all basically "armwaved real time". Now for a lot of > things armwaved realtime is ok, one 'click' an hour on a phone call > from a DSP load miss isn't a big deal. Just don't try the same with > precision heavy machinery. > > It's not a lack of competence, we genuinely don't yet have the understanding > in computing to solve some of the problems people are content to armwave > about. > > If I need extremely high provable precision, Victor's approach is right, if > I want armwaved realtimeish behaviour with a more convenient way of working > then Victor's approach may not be the best. I spoke to some folks related to CMU's RTOS group about a year ago and was influenced by their preemption design in that they claimed to get tight RT latency characteristics by what seems like some mild changes to the Linux kernel. I recently started to investigate their stuff, took a clue from them and became convinced that this approach was very neat and elegant. MontaVista apparently uses this approach over other groups that run Linux as a thread in another RT kernel. Whether this, static analysis tools doing rate{deadline}-monotonic analysis and scheduler "reservations" (born from that RT theory I believe) are unclear to me at this moment. I just find this particular track neat and reminiscent of some FreeBSD ideals that I'd like to see fully working in an open source kernel. Top level link to many papers: http://linuxdevices.com/articles/AT6476691775.html A paper I've taken interest in recently from the top-level link: http://www.linuxdevices.com/articles/AT6078481804.html People I originally talked to that influenced my view on this: http://www-2.cs.cmu.edu/~rajkumar/linux-rk.html > It's called engineering. There are multiple ways to build most things, each > with different advantages, there are multiple ways to model it each with > more accuracy in some areas. Knowing how to use the right tool is a lot > more important than having some religion about it. Yes, I agree. I'm not trying to make a religious assertion and I don't function that way. I just want things to work smoother and explore some interesting ideas that I think eventually will be highly relevant to a very broad embedded arena. bill ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 14:59 ` Bill Huey @ 2003-02-25 15:44 ` yodaiken 0 siblings, 0 replies; 266+ messages in thread From: yodaiken @ 2003-02-25 15:44 UTC (permalink / raw) To: Bill Huey Cc: Alan Cox, yodaiken, William Lee Irwin III, Andrew Morton, lm, mbligh, davidsen, greearb, Linux Kernel Mailing List On Tue, Feb 25, 2003 at 06:59:12AM -0800, Bill Huey wrote: > latency characteristics by what seems like some mild changes to the Linux > kernel. I recently started to investigate their stuff, took a clue from them > and became convinced that this approach was very neat and elegant. MontaVista > apparently uses this approach over other groups that run Linux as a thread > in another RT kernel. Whether this, static analysis tools doing rate{deadline}-monotonic > analysis and scheduler "reservations" (born from that RT theory I believe) > are unclear to me at this moment. I just find this particular track neat > and reminiscent of some FreeBSD ideals that I'd like to see fully working in > an open source kernel. There are two easy tests: 1) Run a millisecond period real-time task on a system under heavy load (not just compute load) and ping flood and find worst case jitter. In our experience tests run for less than 24 hours are worthless. (I've seen a lot of numbers based on 1 million interrupts - do the math and laugh) It's not fair to throttle the network to make the numbers come out better. Please also make clear how much of the kernel you had to rewrite to get your numbers: e.g. specially configured network drivers are nice, but have an impact on usability. BTW: a version of this test is distributed with RTLinux. 2) Run the same real-time task and run a known compute/I/O load such as the standard kernel compile to see the overhead of real-time. Remember: hard cli: run RT code only produces great numbers for (1) at the expense of (2), so no reconfiguration allowed between these tests. Now try these on some embedded processors that run under 1GHz and 1G memory. FWIW: RTLinux numbers are 18 microseconds jitter and about 15 seconds slowdown of a 5 minute kernel compile on a kinda wimpy P3. On a 2.4GHz we do slightly better. I got 12 microseconds on a K7, the drop for embedded processors is low. PowerPCs are generally excellent. The second test requires a little more work on things like StrongArms because nobody has the patience to time a kernel compile on those. As for RMA, it's a nice trick, but of limited use. Instead of test (and design for testability) you get a formula for calculating schedulability from the computation times of the tasks. But since we have no good way to estimate compute times of code without test, it has the result of moving ignorance instead of removing it. Also, the idea that frequency and priority are lock-step is simply incorrect for many applications. When you start dealing with really esoteric concepts: like demand driven tasks and shared resources, RMA wheezes mightily. Pre-allocation of resources is good for RT, although not especially revolutionary. Traditional RT systems were written using cyclic schedulers. Many of our simulation customers use a "slot" or "frame" scheduler. Fortunately, these are really old ideas so I know about them. Probably because of the well advertised low level of my knowledge and abilities, I advocate that RT systems be designed with simplicity and testability in mind. We have found that exceptionally complex RT control systems can be developed on such a basis. 
Making the tools more complicated does not seem to improve reliability or performance: the application performance is more interesting than features of the OS. You can see a nice illustration of the differences between RTLinux and the TimeSys approach in my paper on priority inheritance http://www.fsmlabs.com/articles/inherit/inherit.html (Originally http://www.linuxdevices.com/articles/AT7168794919.html) and Doug Locke's response http://www.linuxdevices.com/articles/AT5698775833.html > > Top level link to many papers: > http://linuxdevices.com/articles/AT6476691775.html > > A paper I've taken interest in recently from the top-level link: > http://www.linuxdevices.com/articles/AT6078481804.html > > People I originally talked to that influenced my view on this: > http://www-2.cs.cmu.edu/~rajkumar/linux-rk.html > > > It's called engineering. There are multiple ways to build most things, each > > with different advantages, there are multiple ways to model it each with > > more accuracy in some areas. Knowing how to use the right tool is a lot > > more important than having some religion about it. > > Yes, I agree. I'm not trying to make a religious assertion and I don't > function that way. I just want things to work smoother and explore some > interesting ideas that I think eventually will be highly relevant to a > very broad embedded arena. > > bill > -- --------------------------------------------------------- Victor Yodaiken Finite State Machine Labs: The RTLinux Company. www.fsmlabs.com www.rtlinux.com 1+ 505 838 9109 ^ permalink raw reply [flat|nested] 266+ messages in thread
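(A rough user-space sketch of test (1) above -- hypothetical code, not the jitter test shipped with RTLinux: wake up every millisecond and record the worst-case lateness. It assumes POSIX clock_nanosleep; a serious run would use an RT scheduling class, run under heavy load and a ping flood, and last far longer, as Victor says:)

#include <stdio.h>
#include <stdint.h>
#include <time.h>

#define PERIOD_NS 1000000LL             /* 1 ms period */

/* Convert a timespec to nanoseconds for easy arithmetic. */
static int64_t ts_ns(const struct timespec *ts)
{
        return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

int main(void)
{
        struct timespec next, now;
        int64_t worst = 0;
        long i;

        clock_gettime(CLOCK_MONOTONIC, &next);
        for (i = 0; i < 60000; i++) {   /* ~1 minute; run for many hours for real */
                next.tv_nsec += PERIOD_NS;
                while (next.tv_nsec >= 1000000000L) {
                        next.tv_nsec -= 1000000000L;
                        next.tv_sec++;
                }
                /* Sleep until the absolute deadline, then see how late we woke. */
                clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
                clock_gettime(CLOCK_MONOTONIC, &now);
                if (ts_ns(&now) - ts_ns(&next) > worst)
                        worst = ts_ns(&now) - ts_ns(&next);
        }
        printf("worst-case wakeup jitter: %lld ns\n", (long long)worst);
        return 0;
}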
* Re: Minutes from Feb 21 LSE Call 2003-02-25 2:17 ` Bill Huey ` (3 preceding siblings ...) 2003-02-25 15:30 ` Alan Cox @ 2003-02-26 19:31 ` Bill Davidsen 2003-02-27 0:56 ` Bill Huey 4 siblings, 1 reply; 266+ messages in thread From: Bill Davidsen @ 2003-02-26 19:31 UTC (permalink / raw) To: Bill Huey; +Cc: Linux Kernel Mailing List On Mon, 24 Feb 2003, Bill Huey wrote: > You don't need data. It's conceptually obvious. The mantra of doomed IPOs, ill-fated software projects, and the guy down the street who has never invested in a company which was still in business 24 months later. No matter how great the concept, it still has to work. It's conceptually obvious that professional programmers working for a major software house will write a better OS than a grad student fighting off boredom one summer... in the end you always need data. -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-26 19:31 ` Bill Davidsen @ 2003-02-27 0:56 ` Bill Huey 2003-02-27 20:04 ` Bill Davidsen 0 siblings, 1 reply; 266+ messages in thread From: Bill Huey @ 2003-02-27 0:56 UTC (permalink / raw) To: Bill Davidsen; +Cc: Linux Kernel Mailing List, Bill Huey (Hui) On Wed, Feb 26, 2003 at 02:31:33PM -0500, Bill Davidsen wrote: > On Mon, 24 Feb 2003, Bill Huey wrote: > > You don't need data. It's conceptually obvious. > > The mantra of doomed IPOs, ill-fated software projects, and the guy down > the street who has never invested in a company which was still in business > 24 months later. No matter how great the concept, it still has to work. I'm not disagreeing with that, but if you read the previous exchange you'd see that I was reacting to what seemed to be an obviously rude dismissal of how latency affects both IO performance of a system and trashes the usability of a priority driven scheduler. It's basic computer science. > It's conceptually obvious that professional programmers working for a > major software house will write a better OS than a grad student fighting > off boredom one summer... in the end you always need data. Had to read your post a couple of times to make sure that the tone of it wasn't charged. :) All I can say now is that I'm working on it. We'll see if it's vaporware in the near future. bill ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-27 0:56 ` Bill Huey @ 2003-02-27 20:04 ` Bill Davidsen 0 siblings, 0 replies; 266+ messages in thread From: Bill Davidsen @ 2003-02-27 20:04 UTC (permalink / raw) To: Bill Huey; +Cc: Linux Kernel Mailing List On Wed, 26 Feb 2003, Bill Huey wrote: > On Wed, Feb 26, 2003 at 02:31:33PM -0500, Bill Davidsen wrote: > > On Mon, 24 Feb 2003, Bill Huey wrote: > > > You don't need data. It's conceptually obvious. > > > > The mantra of doomed IPOs, ill-fated software projects, and the guy down > > the street who has never invested in a company which was still in business > > 24 months later. No matter how great the concept, it still has to work. > > I'm not disagreeing with that, but if you read the previous exchange you'd see > that I was reacting to what seemed to be an obviously rude dismissal of how > latency affects both IO performance of a system and trashes the usability of > a priority driven scheduler. It's basic computer science. No argument from me, but I have seen systems driving up the system time and beating the cache with scheduling logic and context switches. There's a balance to be had there, and in timeslice size, and other places as well, and real data are always useful. > > It's conceptually obvious that professional programmers working for a > > major software house will write a better OS than a grad student fighting > > off boredom one summer... in the end you always need data. > > Had to read your post a couple of times to make sure that the tone of it > wasn't charged. :) It's always more effective if it's subtle and people take an instant to get it. > All I can say now is that I'm working on it. We'll see if it's vaporware > in the near future. Great. I have no doubt that when you have convinced yourself one way or the other you won't have any problem convincing me. When the IO was slow, the VM was primitive, and the scheduler was a doorknob, preempt made a big improvement. Now that the rest of the kernel doesn't suck, it's a lot harder to make a big improvement. -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 16:17 ` yodaiken 2003-02-24 23:13 ` William Lee Irwin III @ 2003-02-25 2:07 ` Bill Huey 2003-02-25 2:14 ` Larry McVoy 1 sibling, 1 reply; 266+ messages in thread From: Bill Huey @ 2003-02-25 2:07 UTC (permalink / raw) To: yodaiken Cc: William Lee Irwin III, Andrew Morton, lm, mbligh, davidsen, greearb, linux-kernel, Bill Huey (Hui) On Mon, Feb 24, 2003 at 09:17:58AM -0700, yodaiken@fsmlabs.com wrote: > On Mon, Feb 24, 2003 at 12:50:31AM -0800, William Lee Irwin III wrote: > > There's a vague notion in my head that it should decrease scheduling > > Vague notions seems to be the level of data on this topic. Ok, replace "vague notion" with latency and scheduling concepts that everybody else except you understands and you'll be a bit more relevant. It's not even about the IO system, it's about consumer-producer relationships between threads and some kind of generic IPC mechanism. You'd run into the same problems by having two threads communicating in a priority capable scheduler, since the temporal granularity of "things that the scheduler manages" gets clobbered by inherently brain damaged locking. Say, how would the scheduler properly order the priority relationships for a non-preemptable thread that holds that critical section for 100ms under an extreme (or normal) case? The effectiveness of the scheduler in these cases would be meaningless. Shit, just replace that SOB with a stochastic-insert-round-robin system and it'll be just as effective if this current state of Linux locking stays in place. There's probably more truth than exaggeration from what I've seen both in the code and running Linux as a desktop OS. > Victor Yodaiken bill ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 2:07 ` Bill Huey @ 2003-02-25 2:14 ` Larry McVoy 2003-02-25 2:24 ` Bill Huey 0 siblings, 1 reply; 266+ messages in thread From: Larry McVoy @ 2003-02-25 2:14 UTC (permalink / raw) To: Bill Huey Cc: yodaiken, William Lee Irwin III, Andrew Morton, lm, mbligh, davidsen, greearb, linux-kernel On Mon, Feb 24, 2003 at 06:07:30PM -0800, Bill Huey wrote: > On Mon, Feb 24, 2003 at 09:17:58AM -0700, yodaiken@fsmlabs.com wrote: > > On Mon, Feb 24, 2003 at 12:50:31AM -0800, William Lee Irwin III wrote: > > > There's a vague notion in my head that it should decrease scheduling > > > > Vague notions seems to be the level of data on this topic. > > Ok, replace "vague notion" with latency and scheduling concepts that > everybody else except you understands and you'll be a bit more relevant. Victor has forgotten more than most people know about operating systems. Dig into his background, he tends to know what he is talking about even if he is a little terse at times. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 2:14 ` Larry McVoy @ 2003-02-25 2:24 ` Bill Huey 2003-02-25 2:46 ` Valdis.Kletnieks 0 siblings, 1 reply; 266+ messages in thread From: Bill Huey @ 2003-02-25 2:24 UTC (permalink / raw) To: Larry McVoy, yodaiken, William Lee Irwin III, Andrew Morton, lm, mbligh, davidsen, greearb, linux-kernel Cc: Bill Huey (Hui) On Mon, Feb 24, 2003 at 06:14:26PM -0800, Larry McVoy wrote: > > Ok, replace "vague notion" with latency and scheduling concepts that > > everybody else except you understands and you'll be a bit more relevant. > > Victor has forgotten more than most people know about operating systems. > Dig into his background, he tends to know what he is talking about even > if he is a little terse at times. But apparently what he knows is not very modern. I'm no slouch either being a former BSDi (the original Unix folks) engineer, but I don't go dismissing folks implicitly like he did to "The Will", William Irwin... and then not adding anything usable in the conversation. That's just no excuse for an adult running a company or in a public forum that's discussing these very important issues. Frankly, I don't care what he has or what traditional so-called "Unix folks" think. Even FreeBSD's SMPng project, using BSD/OS's 5.0 code, deals with these issues respectfully. These old school Unix folks seem to have a much more modern attitude towards this stuff than either you or Victor. bill ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 2:24 ` Bill Huey @ 2003-02-25 2:46 ` Valdis.Kletnieks 2003-02-25 14:47 ` Mr. James W. Laferriere 0 siblings, 1 reply; 266+ messages in thread From: Valdis.Kletnieks @ 2003-02-25 2:46 UTC (permalink / raw) To: Bill Huey Cc: Larry McVoy, yodaiken, William Lee Irwin III, Andrew Morton, lm, mbligh, davidsen, greearb, linux-kernel On Mon, 24 Feb 2003 18:24:38 PST, Bill Huey said: > But apparently what he knows is not very modern. I'm no slouch either being a > former BSDi (the original Unix folks) engineer, but I don't go dismissing And here I thought "the original Unix folks" was Dennis and Ken mailing you an RL05 with a "Good luck, let us know if it works" cover letter... ;) ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 2:46 ` Valdis.Kletnieks @ 2003-02-25 14:47 ` Mr. James W. Laferriere 2003-02-25 15:59 ` Jesse Pollard 0 siblings, 1 reply; 266+ messages in thread From: Mr. James W. Laferriere @ 2003-02-25 14:47 UTC (permalink / raw) To: Valdis.Kletnieks; +Cc: Linux Kernel Maillist Hello Valdis, One, in those days there were no RL05's (never were, if my memory serves). They were RL02's, 10MB packs. Maybe RM05? *nix definitely was NOT known as BSD then. JimL On Mon, 24 Feb 2003 Valdis.Kletnieks@vt.edu wrote: > On Mon, 24 Feb 2003 18:24:38 PST, Bill Huey said: > > But apparently what he knows is not very modern. I'm no slouch either being a > > former BSDi (the original Unix folks) engineer, but I don't go dismissing > And here I thought "the original Unix folks" was Dennis and Ken mailing you > an RL05 with a "Good luck, let us know if it works" cover letter... ;) -- +------------------------------------------------------------------+ | James W. Laferriere | System Techniques | Give me VMS | | Network Engineer | P.O. Box 854 | Give me Linux | | babydr@baby-dragons.com | Coudersport PA 16915 | only on AXP | +------------------------------------------------------------------+ ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 14:47 ` Mr. James W. Laferriere @ 2003-02-25 15:59 ` Jesse Pollard 0 siblings, 0 replies; 266+ messages in thread From: Jesse Pollard @ 2003-02-25 15:59 UTC (permalink / raw) To: Mr. James W. Laferriere, Valdis.Kletnieks; +Cc: Linux Kernel Maillist On Tuesday 25 February 2003 08:47 am, Mr. James W. Laferriere wrote: > Hello Valdis, One, in those days there were no RL05's (never were, > if my memory serves). They were RL02's, 10MB packs. Maybe RM05? > *nix definitely was NOT known as BSD then. JimL nope - it was RK05 (2.5 MB disk). The distribution was on tape, not disk. (9 track tapes were only 20 bucks, RKs were several thousand). RLs did not show up for almost 10 years. RM05 was a 600 MB disk, and didn't show up until after the PDP11/70 and VAX/11 existed (it was a relabeled CDC9766 disk I believe). And it was UNIX v x, where x varied from null (not labeled) to 1..7. RL distributions did not come from AT&T (Yourden, Inc. was where I got one) -- ------------------------------------------------------------------------- Jesse I Pollard, II Email: pollard@navo.hpc.mil Any opinions expressed are solely my own. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 8:40 ` Andrew Morton 2003-02-24 8:50 ` William Lee Irwin III @ 2003-02-24 8:56 ` Bill Huey 2003-02-24 9:09 ` Andrew Morton ` (2 more replies) 1 sibling, 3 replies; 266+ messages in thread From: Bill Huey @ 2003-02-24 8:56 UTC (permalink / raw) To: Andrew Morton Cc: wli, lm, mbligh, davidsen, greearb, linux-kernel, Bill Huey (Hui) On Mon, Feb 24, 2003 at 12:40:05AM -0800, Andrew Morton wrote: > There is no evidence for any such thing. Nor has any plausible > theory been put forward as to why such an improvement should occur. I find what you're saying rather unbelievable given some of the benchmarks I saw when the preempt patch started floating around. If you search linuxdevices.com for articles on preempt, you'll see a claim about IO performance improvements with the patch. If something's changed then I'd like to know. The numbers are here: http://kpreempt.sourceforge.net/ bill ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 8:56 ` Bill Huey @ 2003-02-24 9:09 ` Andrew Morton 2003-02-24 9:24 ` Bill Huey 2003-02-24 14:40 ` Bill Davidsen 2003-02-24 21:10 ` Andrea Arcangeli 2 siblings, 1 reply; 266+ messages in thread From: Andrew Morton @ 2003-02-24 9:09 UTC (permalink / raw) To: Bill Huey; +Cc: wli, lm, mbligh, davidsen, greearb, linux-kernel, billh Bill Huey (Hui) <billh@gnuppy.monkey.org> wrote: > > On Mon, Feb 24, 2003 at 12:40:05AM -0800, Andrew Morton wrote: > > There is no evidence for any such thing. Nor has any plausible > > theory been put forward as to why such an improvement should occur. > > I find what you're saying rather unbelievable given some of the > benchmarks I saw when the preempt patch started floating around. > > If you search linuxdevices.com for articles on preempt, you'll see a > claim about IO performance improvements with the patch. If something's > changed then I'd like to know. > > The numbers are here: > http://kpreempt.sourceforge.net/ > That's a 5% difference across five dbench runs. If it is even statistically significant, dbench is notoriously prone to chaotic effects (less so in 2.5). It is a long stretch to say that any increase in dbench numbers can be generalised to "improved IO performance" across the board. The preempt stuff is all about *worst-case* latency. I doubt if it shifts the average latency (which is in the 50-100 microsecond range) by more than 50 microseconds. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 9:09 ` Andrew Morton @ 2003-02-24 9:24 ` Bill Huey 2003-02-24 9:56 ` Andrew Morton 0 siblings, 1 reply; 266+ messages in thread From: Bill Huey @ 2003-02-24 9:24 UTC (permalink / raw) To: Andrew Morton Cc: wli, lm, mbligh, davidsen, greearb, linux-kernel, Bill Huey (Hui) On Mon, Feb 24, 2003 at 01:09:38AM -0800, Andrew Morton wrote: > That's a 5% difference across five dbench runs, if it is even > statistically significant; dbench is notoriously prone to chaotic > effects (less so in 2.5). It is a long stretch to say that any > increase in dbench numbers can be generalised to "improved IO > performance" across the board. I think the test is valid. If the scheduler can't deal with some kind of IO event in a very tight time window, then you'd think that it might influence the performance of that IO system. > The preempt stuff is all about *worst-case* latency. I doubt if > it shifts the average latency (which is in the 50-100 microsecond > range) by more than 50 microseconds. You obviously don't know what the current patch is supposed to do, and I'm assuming that's what you're referring to at this point. A fully preemptive kernel, like the one from TimeSys, is about constraining worst-case latency by using sleeping locks that enable preemption across critical sections where that's normally turned off courtesy of spinlocks. Combine that with heavyweight interrupts and you have a mix for constraining maximum latency to about 50us in their kernel. The patch and locking schema in Linux in its current form only reduces the latency on "average", which is the inverse of your claim regarding maximum latency. The last time I looked at 2.5.62 there were still quite a few places where a critical section bounded by spinlocks (with interrupts turned off) could iterate over a data structure (the VM), or copy and move memory, with a very large upper bound. I can't believe an engineer of your stature would blow something this basic to the understanding of locking. You can't mean what you just said above. Read: http://linuxdevices.com/articles/AT6106723802.html That's basically what I'm referring to... bill ^ permalink raw reply [flat|nested] 266+ messages in thread
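Since the disagreement is really about which latency number matters, here is a small user-space sketch that measures both at once: it repeatedly asks for a 1 ms sleep and records how late the wakeups are, reporting the mean and the maximum separately. It is only an illustration, since timer granularity, HZ, and background load dominate the absolute figures, but the shape of the output (a modest average with occasional much larger spikes) is exactly the average-versus-worst-case split being argued over.

/* Measure average vs worst-case oversleep of a 1 ms nanosleep.
 * Build with: cc -O2 lat.c -o lat -lrt (older glibc needs librt) */
#include <stdio.h>
#include <time.h>

static long long ns(const struct timespec *t)
{
        return (long long)t->tv_sec * 1000000000LL + t->tv_nsec;
}

int main(void)
{
        struct timespec req = { 0, 1000000 };   /* ask for 1 ms */
        long long sum = 0, worst = 0;
        int i, iters = 1000;

        for (i = 0; i < iters; i++) {
                struct timespec before, after;
                long long late;

                clock_gettime(CLOCK_MONOTONIC, &before);
                nanosleep(&req, NULL);
                clock_gettime(CLOCK_MONOTONIC, &after);

                late = ns(&after) - ns(&before) - 1000000LL;    /* oversleep */
                if (late < 0)
                        late = 0;
                sum += late;
                if (late > worst)
                        worst = late;
        }
        printf("average oversleep %lld us, worst case %lld us\n",
               sum / iters / 1000, worst / 1000);
        return 0;
}

The preempt and low-latency work is aimed at the second number; dbench throughput mostly reflects the first.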
* Re: Minutes from Feb 21 LSE Call 2003-02-24 9:24 ` Bill Huey @ 2003-02-24 9:56 ` Andrew Morton 2003-02-24 10:11 ` Bill Huey 0 siblings, 1 reply; 266+ messages in thread From: Andrew Morton @ 2003-02-24 9:56 UTC (permalink / raw) To: Bill Huey; +Cc: wli, lm, mbligh, davidsen, greearb, linux-kernel, billh Bill Huey (Hui) <billh@gnuppy.monkey.org> wrote: > > On Mon, Feb 24, 2003 at 01:09:38AM -0800, Andrew Morton wrote: > > That's a 5% difference across five dbench runs. If it is even > > statistically significant, dbench is notoriously prone to chaotic > > effects (less so in 2.5) It is a long stretch to say that any > > increase in dbench numbers can be generalised to "improved IO > > performance" across the board. > > I think the test is valid. If the scheduler can't deal with some > kind IO event in a very tight time window, then you'd think that > it might influence the performance of that IO system. > On the contrary. If the disk request queue is plugged and the task which is submitting writeback is preempted, the IO system could remain artificially idle for hundreds of milliseconds while the CPU is off calculating pi. This is one of the reasons why I converted the 2.5 request queues to unplug autonomously. But that is speculation as well - I never observed this aspect to be a real problem. Probably, it was not. Substantiation of your claim requires quality testing and a plausible explanation. I do not believe we have seen either, OK? > Read: > http://linuxdevices.com/articles/AT6106723802.html I did, briefly. It appears to be claiming that the average scheduling latency of the non-preemptible kernel is ten milliseconds! Maybe I need to read that again in the morning. ^ permalink raw reply [flat|nested] 266+ messages in thread
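The plugging scenario Andrew describes can be caricatured outside the kernel. The sketch below is not the block layer, just a pthread producer/consumer toy whose names merely mirror the idea, but it shows the mechanism: requests are queued behind a plug, the submitter goes away before unplugging (a one-second sleep stands in for being preempted), and the "disk" thread sits idle for exactly that long even though work is pending.

/* Toy model of a plugged request queue whose submitter disappears
 * before unplugging.  Build with: cc -O2 plug.c -o plug -lpthread */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  unplug_wait = PTHREAD_COND_INITIALIZER;
static int queued;              /* requests sitting behind the plug */
static int plugged = 1;         /* queue starts out plugged */

static double now(void)
{
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
}

/* the "disk": it can only service requests once the queue is unplugged */
static void *disk(void *arg)
{
        double start = *(double *)arg;

        pthread_mutex_lock(&lock);
        while (plugged || queued == 0)
                pthread_cond_wait(&unplug_wait, &lock);
        printf("disk finally saw %d requests after %.2f s of forced idleness\n",
               queued, now() - start);
        pthread_mutex_unlock(&lock);
        return NULL;
}

int main(void)
{
        double start = now();
        pthread_t t;

        pthread_create(&t, NULL, disk, &start);

        /* submitter queues writeback but is "preempted" before unplugging */
        pthread_mutex_lock(&lock);
        queued = 8;
        pthread_mutex_unlock(&lock);

        sleep(1);       /* the CPU goes off to calculate pi for a while */

        pthread_mutex_lock(&lock);
        plugged = 0;    /* the explicit unplug finally lets the IO start */
        pthread_cond_broadcast(&unplug_wait);
        pthread_mutex_unlock(&lock);

        pthread_join(t, NULL);
        return 0;
}

Autonomous unplugging, as described for the 2.5 request queues, removes the dependence on the submitter coming back promptly.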
* Re: Minutes from Feb 21 LSE Call 2003-02-24 9:56 ` Andrew Morton @ 2003-02-24 10:11 ` Bill Huey 0 siblings, 0 replies; 266+ messages in thread From: Bill Huey @ 2003-02-24 10:11 UTC (permalink / raw) To: Andrew Morton Cc: wli, lm, mbligh, davidsen, greearb, linux-kernel, Bill Huey (Hui) On Mon, Feb 24, 2003 at 01:56:25AM -0800, Andrew Morton wrote: > But that is speculation as well - I never observed this aspect to be > a real problem. Probably, it was not. > > Substantiation of your claim requires quality testing and a plausible > explanation. I do not believe we have seen either, OK? Well, let's back off here. It's not my claim, it's Robert Love's in that URL. Not to arrange a fight, but I had to point that out. :) > > http://linuxdevices.com/articles/AT6106723802.html > > I did, briefly. It appears to be claiming that the average scheduling > latency of the non-preemptible kernel is ten milliseconds! They mention that this is related to the console code. Obviously, if you're not checking for reschedule in a big pix map scroll blit, then it's going to stick out boldly as a big latency spike. A fully preemptive system would only turn off preemption in places that would break drivers and other obvious places like scheduler run-queues, etc... > Maybe I need to read that again in the morning. It's also an old article, but goes over a lot of the basics of a fully preemptable kernel like that. Things might not be as dramatic now with 2.5.62. Not sure how things are now... bill ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 8:56 ` Bill Huey 2003-02-24 9:09 ` Andrew Morton @ 2003-02-24 14:40 ` Bill Davidsen 2003-02-24 21:10 ` Andrea Arcangeli 2 siblings, 0 replies; 266+ messages in thread From: Bill Davidsen @ 2003-02-24 14:40 UTC (permalink / raw) To: Bill Huey; +Cc: Andrew Morton, wli, lm, mbligh, greearb, linux-kernel On Mon, 24 Feb 2003, Bill Huey wrote: > On Mon, Feb 24, 2003 at 12:40:05AM -0800, Andrew Morton wrote: > > There is no evidence for any such thing. Nor has any plausible > > theory been put forward as to why such an improvement should occur. > > I find what you're saying rather unbelievable given some of the > benchmarks I saw when the preempt patch started floating around. > > If you search linuxdevices.com for articles on preempt, you'll see a > claim about IO performance improvements with the patch. If something's > changed then I'd like to know. Clearly you do know... preempt started out when 2.4 was the only game in town. It made improvements to some degree because the rest of the kernel had some real latency issues. Skip forward through low latency patches, several flavors of elevator improvements, faster clock rate, rmap, better VM, object rmap, finer grained locking, io scheduling of several types including latency limiting and prevention of write blocking, and the O(1) scheduler. Preempt was a great way to get the right thing running sooner because there was a lot of latency in many places. That just doesn't seem to be true anymore. Preempt doesn't make as much difference anymore because many things have been improved. I'm sure that there are applications which benefit greatly from preempt, but the days of vast improvement seem to be gone; the low-hanging fruit has been picked. Context switching latency is still way higher than in 2.4, but that isn't hurting IO as much as all the other improvements have helped. -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 8:56 ` Bill Huey 2003-02-24 9:09 ` Andrew Morton 2003-02-24 14:40 ` Bill Davidsen @ 2003-02-24 21:10 ` Andrea Arcangeli 2 siblings, 0 replies; 266+ messages in thread From: Andrea Arcangeli @ 2003-02-24 21:10 UTC (permalink / raw) To: Bill Huey; +Cc: Andrew Morton, wli, lm, mbligh, davidsen, greearb, linux-kernel On Mon, Feb 24, 2003 at 12:56:17AM -0800, Bill Huey wrote: > On Mon, Feb 24, 2003 at 12:40:05AM -0800, Andrew Morton wrote: > > There is no evidence for any such thing. Nor has any plausible > > theory been put forward as to why such an improvement should occur. > > I find what you're saying a rather unbelievable given some of the > benchmarks I saw when the preempt patch started to floating around. > > If you search linuxdevices.com for articles on preempt, you'll see a > claim about IO performance improvements with the patch. If somethings > changed then I'd like to know. > > The numbers are here: > http://kpreempt.sourceforge.net/ most kernels out there are buggy w/o preempt. 2.4.21pre4aa3 has most of the needed preemption checks in the kernel loops instead. It's quite pointless to compare preempt with an otherwise buggy kernel. Andrea ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 8:00 ` Bill Huey 2003-02-24 8:40 ` Andrew Morton @ 2003-02-24 8:43 ` William Lee Irwin III 1 sibling, 0 replies; 266+ messages in thread From: William Lee Irwin III @ 2003-02-24 8:43 UTC (permalink / raw) To: Bill Huey Cc: Larry McVoy, Martin J. Bligh, Bill Davidsen, Ben Greear, Linux Kernel Mailing List On Sun, Feb 23, 2003 at 11:54:30PM -0800, William Lee Irwin III wrote: >> And scalability is as essential there as it is on 512x/16TB O2K's. >> For this, it's _downward_ scalability, where "downward" is relative to >> "typical" UP x86 boxen. On Mon, Feb 24, 2003 at 12:00:52AM -0800, Bill Huey wrote: > The good thing about Linux is that, with some compile options, stuff > (scalability) can be insert and removed and any time. One shouldn't > narrow their view of how an OS can be out of a strict tradition. No!! Scalability means the kernel figures out how to adapt to the box. Removing scalability means it no longer adapts to the size of your box. Scalability includes scaling "downward" to smaller systems. On Mon, Feb 24, 2003 at 12:00:52AM -0800, Bill Huey wrote: > I don't buy this spinlock-for-all-locking things tradition with no > preemption, especially given some of the IO performance improvement > that happened as a courtesy of preempt. Some how that was forgotten > in Larry's discussion. I've largely not been a party to the preempt business. Advances in scheduling semantics are good, but are not my focus. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 22:06 ` Mark Hahn 2003-02-22 22:17 ` William Lee Irwin III 2003-02-22 22:44 ` Ben Greear @ 2003-02-22 23:10 ` Martin J. Bligh 2003-02-22 23:20 ` Larry McVoy 2003-02-25 2:19 ` Hans Reiser 3 siblings, 1 reply; 266+ messages in thread From: Martin J. Bligh @ 2003-02-22 23:10 UTC (permalink / raw) To: Mark Hahn; +Cc: linux-kernel >> OK, so now you've slid from talking about PCs to 2-way to 4-way ... >> perhaps because your original arguement was fatally flawed. > > oh, come on. the issue is whether memory is fast and flat. > most "scalability" efforts are mainly trying to code around the fact > that any ccNUMA (and most 4-ways) is going to be slow/bumpy. Scalability is not just NUMA machines by any stretch of the imagination. It's 2x, 4x, 8x SMP as well. > it is reasonable to worry that optimizations for imbalanced machines > will hurt "normal" ones. is it worth hurting uni by 5% to give > a 50% speedup to IBM's 32-way? I think not, simply because > low-end machines are more important to Linux. We would never try to propose such a change, and never have. Name a scalability change that's hurt the performance of UP by 5%. There isn't one. > ccNUMA worst-case latencies are not much different from decent > cluster (message-passing) latencies. getting an app to work on a cluster > is a matter of programming will. It's a matter of repeatedly reimplementing a bunch of stuff in userspace, instead of doing things in kernel space once, properly, with all the machine specific knowledge that's needed. It's *so* much easier to program over a single OS image. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 23:10 ` Martin J. Bligh @ 2003-02-22 23:20 ` Larry McVoy 2003-02-22 23:46 ` Martin J. Bligh 0 siblings, 1 reply; 266+ messages in thread From: Larry McVoy @ 2003-02-22 23:20 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Mark Hahn, linux-kernel > We would never try to propose such a change, and never have. > Name a scalability change that's hurt the performance of UP by 5%. > There isn't one. This is *exactly* the reasoning that every OS marketing weenie has used for the last 20 years to justify their "feature" of the week. The road to slow bloated code is paved one cache miss at a time. You may quote me on that. In fact, print it out and put it above your monitor and look at it every day. One cache miss at a time. How much does one cache miss add to any benchmark? .001%? Less. But your pet features didn't slow the system down. Nope, they just made the cache smaller, which you didn't notice because whatever artificial benchmark you ran didn't happen to need the whole cache. You need to understand that system resources belong to the user. Not the kernel. The goal is to have all of the kernel code running under any load be less than 1% of the CPU. Your 5% number up there would pretty much double the amount of time we spend in the kernel for most workloads. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 23:20 ` Larry McVoy @ 2003-02-22 23:46 ` Martin J. Bligh 0 siblings, 0 replies; 266+ messages in thread From: Martin J. Bligh @ 2003-02-22 23:46 UTC (permalink / raw) To: Larry McVoy; +Cc: Mark Hahn, linux-kernel >> We would never try to propose such a change, and never have. >> Name a scalability change that's hurt the performance of UP by 5%. >> There isn't one. > > This is *exactly* the reasoning that every OS marketing weenie has used > for the last 20 years to justify their "feature" of the week. Fine, stick 'em all together. I bet it's either an improvement or doesn't even register on the scale. Knock yourself out. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 22:06 ` Mark Hahn ` (2 preceding siblings ...) 2003-02-22 23:10 ` Martin J. Bligh @ 2003-02-25 2:19 ` Hans Reiser 2003-02-25 3:49 ` Martin J. Bligh 3 siblings, 1 reply; 266+ messages in thread From: Hans Reiser @ 2003-02-25 2:19 UTC (permalink / raw) To: Mark Hahn; +Cc: Martin J. Bligh, linux-kernel I expect to have 16-32 CPUs in my $3000 desktop in 5 years . If you all start planning for that now, you might get it debugged before it happens to me.;-) I don't expect to connect the 16-32 CPUs with ethernet.... but it won't surprise me if they have non-uniform memory. It is just a matter of time before the users need Reiser4 to be highly scalable, and I don't want to rewrite when they do, so we are worrying about it now. -- Hans ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 2:19 ` Hans Reiser @ 2003-02-25 3:49 ` Martin J. Bligh 2003-02-25 5:12 ` Steven Cole 0 siblings, 1 reply; 266+ messages in thread From: Martin J. Bligh @ 2003-02-25 3:49 UTC (permalink / raw) To: Hans Reiser; +Cc: linux-kernel > I expect to have 16-32 CPUs in my $3000 desktop in 5 years . If you all > start planning for that now, you might get it debugged before it happens > to me.;-) Thank you ... some sanity amongst the crowd > I don't expect to connect the 16-32 CPUs with ethernet.... but it won't > surprise me if they have non-uniform memory. Indeed. Just look at AMD hammer for NUMA effects, and SMT and multiple chip on die technologies for the way things are going. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 3:49 ` Martin J. Bligh @ 2003-02-25 5:12 ` Steven Cole 2003-02-25 20:37 ` Scott Robert Ladd 0 siblings, 1 reply; 266+ messages in thread From: Steven Cole @ 2003-02-25 5:12 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Hans Reiser, LKML, Larry McVoy On Mon, 2003-02-24 at 20:49, Martin J. Bligh wrote: > > I expect to have 16-32 CPUs in my $3000 desktop in 5 years . If you all > > start planning for that now, you might get it debugged before it happens > > to me.;-) > > Thank you ... some sanity amongst the crowd > > > I don't expect to connect the 16-32 CPUs with ethernet.... but it won't > > surprise me if they have non-uniform memory. > > Indeed. Just look at AMD hammer for NUMA effects, and SMT and multiple > chip on die technologies for the way things are going. > > M. Hans may have 32 CPUs in his $3000 box, and I expect to have 8 CPUs in my $500 Walmart special 5 or 6 years hence. And multiple chip on die along with HT is what will make it possible. What concerns me is that this will make it possible to put insane numbers of CPUs in those $250,000 and higher boxes. If Martin et al can scale Linux to 64 CPUs, can they make it scale several binary orders of magnitude higher? Why do this? NUMA memory is much faster than even very fast network connections any day. Is there a market for such a thing? I won't pretend to know that answer. But the capability to do it will be there, and in 5 years the 3.2 kernel probably won't be quite stable yet, so decisions made in the next year for 2.9/3.0 may have to last until then. Please listen to Larry. When he says you can't scale endlessly, I have a feeling he knows what he's talking about. The Nirvana machine has 48 SGI boxes with 128 CPUs in each. I don't hear about many 128 CPU machines nowadays. Perhaps Irix just wasn't quite up to the job. But new technologies will make this kind of machine affordable (by the government and financial institutions) in the not too distant future. Just my two cents. Enough ranting for today. Steven ^ permalink raw reply [flat|nested] 266+ messages in thread
* RE: Minutes from Feb 21 LSE Call 2003-02-25 5:12 ` Steven Cole @ 2003-02-25 20:37 ` Scott Robert Ladd 2003-02-25 21:36 ` Hans Reiser 2003-02-26 0:44 ` Alan Cox 0 siblings, 2 replies; 266+ messages in thread From: Scott Robert Ladd @ 2003-02-25 20:37 UTC (permalink / raw) To: Steven Cole, Martin J. Bligh; +Cc: Hans Reiser, LKML, Larry McVoy Steven Cole wrote: > Hans may have 32 CPUs in his $3000 box, and I expect to have 8 CPUs in > my $500 Walmart special 5 or 6 years hence. And multiple chip on die > along with HT is what will make it possible. Or will Walmart be selling systems with one CPU for $62.50? "Normal" folk simply have no use for an 8 CPU system. Sure, the technology is great -- but not many people are buying HDTV, let alone a computer system that could do real-time 3D holographic imaging. What Walmart is selling today for $199 is a 1.1 GHz Duron system with minimal memory and a 10GB hard drive. Not exactly state of the art (although it might make a nice node in a super-cheap cluster!) Of course, you'll have your Joe Normals who will buy multiprocessor machines with neon lights and case windows -- but those are the same people who drive a Ford Excessive 4WD SuperCab pickup when the only thing they ever "haul" is groceries. (Note: I drive a big SUV because I *do* haul stuff, and I've got lots of kids -- the right tool for the job, as Alan stated.) > What concerns me is that this will make it possible to put insane > numbers of CPUs in those $250,000 and higher boxes. If Martin et al can > scale Linux to 64 CPUs, can they make it scale several binary orders of > magnitude higher? Why do this? NUMA memory is much faster than even > very fast network connections any day. > > Is there a market for such a thing? Such systems will be very useful in limited markets. If I need to simulate the global climate or the evolution of galaxies, I can damned-well use 65,536 quad-core CPUs, and I'll be happy to install Linux on such a box. Writing e-mail or scanning my kids' drawings doesn't require that sort of power. > Please listen to Larry. When he says you can't scale endlessly, I have > a feeling he knows what he's talking about. The Nirvana machine has 48 > SGI boxes with 128 CPUs in each. I don't hear about many 128 CPU > machines nowadays. Perhaps Irix just wasn't quite up to the job. But > new technologies will make this kind of machine affordable (by the > government and financial institutions) in the not too distant future. Linux needs a roadmap; perhaps it has one, and I just haven't seen it? I'm not entirely certain that Linux can scale from toasters to Deep Thought; the needs of an office worker don't coincide well with the needs of a scientist trying to simulate the dynamics of hurricanes. I've worked both ends of that spectrum; they really are two different universes that may not be effectively addressed by one Linux. I, for one, would rather see Linux work best on high-end systems; I have no problem leaving the low end of the spectrum to consumer-oriented companies like Microsoft. Linux has the most potential of any extant OS, in my opinion, for handling the types of systems you envision. And to achieve such a goal, some planning needs to be done *now* to avoid quagmires and minefields in the future. ..Scott -- Scott Robert Ladd Coyote Gulch Productions (http://www.coyotegulch.com) ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 20:37 ` Scott Robert Ladd @ 2003-02-25 21:36 ` Hans Reiser 2003-02-25 23:28 ` Scott Robert Ladd 2003-02-26 0:44 ` Alan Cox 1 sibling, 1 reply; 266+ messages in thread From: Hans Reiser @ 2003-02-25 21:36 UTC (permalink / raw) To: Scott Robert Ladd; +Cc: Steven Cole, Martin J. Bligh, LKML, Larry McVoy Scott Robert Ladd wrote: >"Normal" folk simply have no use for an 8 CPU system. > I had this argument over whether normal people would ever really need a 10mb hard drive when I was 21. Once was enough, sorry, I didn't convince the other guy then, and I don't think I have gotten more eloquent since then. I'll just say that entertainment will drive computing for the next 5-15 years, and game designers won't have enough CPU that whole time. Hollywood is dying like radio did, and immersive experiences are replacing it. HDTV might not make it. I personally don't really want any audio or video devices or sources which are not well integrated into my computer, and HDTV is not. I am not sure if the rest of the market will think like me, but the gamers might.... I am getting a La Cie 4 monitor next week which will do 2048x1536 without blurring pixels for $960, and I just don't think I will want to use an HDTV for anything except maybe the kitchen. I try to watch a high quality movie once a week with a friend because I don't want to miss out on our culture (and games are not yet as culturally rich as movies), but games are more engaging, and I am not really managing to watch the movie a week. I seem to be at the extreme of a growing trend. Scott Robert Ladd wrote: (Note: I drive a big SUV because I *do** haul stuff, and I've got lots of kids -- the right tool for the job, as Alan stated.) You didn't say whether you typically haul stuff and kids over rough roads. If you don't (and very few SUV owners do), then what you need is called a "mini-van", which is what people who are functionally oriented buy for city hauling of kids and stuff ;-), and I bought my wife one. It has more than 16 CPUs in it.... -- Hans ^ permalink raw reply [flat|nested] 266+ messages in thread
* RE: Minutes from Feb 21 LSE Call 2003-02-25 21:36 ` Hans Reiser @ 2003-02-25 23:28 ` Scott Robert Ladd 2003-02-25 23:41 ` Hans Reiser 2003-02-26 6:04 ` Aaron Lehmann 0 siblings, 2 replies; 266+ messages in thread From: Scott Robert Ladd @ 2003-02-25 23:28 UTC (permalink / raw) To: Hans Reiser; +Cc: Steven Cole, Martin J. Bligh, LKML, Larry McVoy > >"Normal" folk simply have no use for an 8 CPU system. > I had this argument over whether normal people would ever really need a > 10mb hard drive when I was 21. Once was enough, sorry, I didn't > convince the other guy then, and I don't think I have gotten more > eloquent since then. I should be more careful in what I say. I remember fighting to get 20MB drives in systems for an early Novell LAN, when management thought that no one would ever need more than 10MB. To be more precise in my reasoning: High-powered, multiprocessor computers will be an essential part of people's lives -- in medical equipment, possibly guiding transportation, in various tools that affect people's lives. As for what we see today as a "home computer": The vast majority of people don't use what they already have. This is one reason that sales of "home computers" have slowed; people just don't need a 3GHz system (with or without HT or SMP) for checking e-mail and writing a letter to Aunt Edna. > I'll just say that entertainment will drive computing for the next 5-15 > years, and game designers won't have enough CPU that whole time. > Hollywood is dying like radio did, and immersive experiences are > replacing it. You are correct. Gaming, file sharing, digital imaging -- those applications eat horsepower. But I honestly can't see how 8 processors can possibly make Abiword run "better." Technologies tend to hit a point where they're "good enough" for the majority of users. For example, houses haven't really changed much in 50 years, in spite of Disney visions and the HGTV. I haven't seen too many push-button houses (like people predicted in the 1950s); and I still want my flying cars, dang-it! > You didn't say whether you typically haul stuff and kids over rough > roads. If you don't (and very few SUV owners do), then what you need is > called a "mini-van", which is what people who are functionally oriented > buy for city hauling of kids and stuff ;-), and I bought my wife one. > It has more than 16 CPUs in it.... I live half-time in rural Colorado -- at 9800 feet above sea level, on rough highways 60 miles from the nearest grocery store. I've also done Search & Rescue, and I'm involved in work on Indian Reservations (where roads just plain stink). I do need a different vehicle for when I'm in Florida -- we usually leave my behemoth parked and drive a boring Taurus. My 4x4 SUV is kinda raggy; it's 18 years old, and I maintain it myself. People who buy $75,000 Cadillac SUVs with leather seats do it for prestige and "mine is bigger than yours" competition. Kinda like folks who buy dual-processor systems with 250GB drives, so they can web surf or impress people at LAN parties... ;) This point does fit with our discussion of multiprocessor computers. Minivans are *not* marvels of high technology; they're actually quite prosaic. But they do the job well for many people who have no need for a high-tech car. Meanwhile, the best-technology vehicles don't sell very well. I suspect the same rule holds true for computers. ..Scott ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 23:28 ` Scott Robert Ladd @ 2003-02-25 23:41 ` Hans Reiser 2003-02-26 0:19 ` Scott Robert Ladd ` (2 more replies) 2003-02-26 6:04 ` Aaron Lehmann 1 sibling, 3 replies; 266+ messages in thread From: Hans Reiser @ 2003-02-25 23:41 UTC (permalink / raw) To: Scott Robert Ladd; +Cc: Steven Cole, Martin J. Bligh, LKML, Larry McVoy Scott Robert Ladd wrote: > But I honestly can't see how 8 processors can possibly make >Abiword run "better." > They can't, but you know it was before 1980 that hardware exceeded what was really needed for email. What happened then? People needed more horsepower for wysiwyg editors, the new thing of that time..... Now it is games that hardware is too slow for. After games, maybe AI assistants?.... Will you be saying, "My AI doesn't have enough horsepower to run on, its databases are small and out of date, and it is providing me worse advice than my wealthy friends get, and providing it later."? How much will you pay for a good AI to advise you? (I really like my I-Nav GPS adviser in my mini-van.... money well spent....) >I live half-time in rural Colorado -- at 9800 feet above sea level, on rough >highways 60 miles from the nearest grocery store. > Ok, you win that one.;-) >Kinda like folks who buy >dual-processor systems with 250GB drives, so they can web surf or impress >people at LAN parties... ;) > I am buying a new monitor so that I can do head-shots more easily in tribes 2;-). I suppose I should be more motivated by having bigger emacs windows and thereby increasing the size of my visual cache, and maybe when I was younger I would have been more motivated by that, and it does prevent me from feeling guilty about spending that money, but at this phase of my life ;-) I hate it when pixelization prevents me from lining up on the head.... It is interesting that games are the only compelling motivation for faster desktop hardware these days. It may be part of why we are in a tech bust. When AIs become hardware purchase drivers, there will likely be a boom again. -- Hans ^ permalink raw reply [flat|nested] 266+ messages in thread
* RE: Minutes from Feb 21 LSE Call 2003-02-25 23:41 ` Hans Reiser @ 2003-02-26 0:19 ` Scott Robert Ladd 2003-02-26 0:35 ` Hans Reiser 2003-02-26 0:47 ` Steven Cole 2003-02-26 16:07 ` Horst von Brand 2 siblings, 1 reply; 266+ messages in thread From: Scott Robert Ladd @ 2003-02-26 0:19 UTC (permalink / raw) To: Hans Reiser; +Cc: Steven Cole, Martin J. Bligh, LKML, Larry McVoy Hans Reiser wrote > Now it is games that hardware is too slow for. After games, maybe AI > assistants?.... Will you be saying, "My AI doesn't have enough > horsepower to run on, its databases are small and out of date, and it is > providing me worse advice than my wealthy friends get, and providing it > later."? How much will you pay for a good AI to advise you? (I really > like my I-Nav GPS adviser in my mini-van.... money well spent....) Really good AI is predicated on the invention of better algorithms. I do a bit of work in this area; we're a long way from any useful AI -- unless you think Microsoft's "Clippy" qualifies. :) I would love to see "intelligence" in software; IBM's recent "autonomic computing" initiative is marketing hype for a good idea. Programs (including Linux!) should be self-diagnosing, fault tolerant, and self-correcting. We're not there yet on the software side (again). And "smart AI" may not be something people want. Many people distrust machines -- and in gaming, a really good AI simply isn't as important (or desirable) as are pretty graphics (handled by a GPU). > Ok, you win that one.;-) Yeah! ;) > It is interesting that games are the only compelling motivation for > faster desktop hardware these days. It may be part of why we are in a > tech bust. When AIs become hardware purchase drivers, there will likely > be a boom again. I've worked with several game companies; AI just isn't a priority. Games need to be "good enough" to challenge average gamers; people who want a real challenge play online against other humans. Excellent breast physics (Extreme Beach Volleyball) sells games; a crafty, hard-to-defeat AI actually turns off casual players and just isn't "sexy". And now I think we're getting *WAY* off topic. :) ..Scott ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-26 0:19 ` Scott Robert Ladd @ 2003-02-26 0:35 ` Hans Reiser 2003-02-26 16:31 ` Horst von Brand 0 siblings, 1 reply; 266+ messages in thread From: Hans Reiser @ 2003-02-26 0:35 UTC (permalink / raw) To: Scott Robert Ladd; +Cc: Steven Cole, Martin J. Bligh, LKML, Larry McVoy Scott Robert Ladd wrote: > >I've worked with several game companies; AI just isn't a priority. Games >need to be "good enough" to challenge average gamers; people who want a real >challenge play online against other humans. > I didn't mean game AI. In real life, computers aim better than humans do. In real life, that makes people want the AI; in a game, when the robot is faster they don't buy the game. I predict the US military will drive AI research over the next 10 years because AIs can shoot better and faster. After the AIs mature on the battlefield they'll start being more useful to industry (replacing bus drivers, etc.) In 15-30 years, AIs will be a big market, a huge one. Of course, people said that 30 years ago and it seemed reasonable then.... -- Hans ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-26 0:35 ` Hans Reiser @ 2003-02-26 16:31 ` Horst von Brand 0 siblings, 0 replies; 266+ messages in thread From: Horst von Brand @ 2003-02-26 16:31 UTC (permalink / raw) To: Hans Reiser; +Cc: LKML [Massive cutdown on Cc:] Hans Reiser <reiser@namesys.com> [...] > In 15-30 years, AIs will be a big market, a huge one. Of course, people > said that 30 years ago and it seemed reasonable then.... It won't. Because AI is handwaving patch over hack with the odd kludge for lack of a decent, structured solution. If the problem is important, some solution is found eventually, and the area doesn't qualify anymore ;-) Happened to "automatic programming", to get a program written from a high-level specification was an AI problem, until compiler technology was born and matured. To be able to manage a computer system required a human, until modern OSes. Today you have machines reading handwriting (sort of) as part of PDAs, there is even some limited voice input available. Automatic recognition of failed parts from video cameras is routine, work is progressing on face recognition. It just isn't called AI anymore. -- Dr. Horst H. von Brand User #22616 counter.li.org Departamento de Informatica Fono: +56 32 654431 Universidad Tecnica Federico Santa Maria +56 32 654239 Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513 ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 23:41 ` Hans Reiser 2003-02-26 0:19 ` Scott Robert Ladd @ 2003-02-26 0:47 ` Steven Cole 2003-02-26 16:07 ` Horst von Brand 2 siblings, 0 replies; 266+ messages in thread From: Steven Cole @ 2003-02-26 0:47 UTC (permalink / raw) To: Hans Reiser; +Cc: Scott Robert Ladd, LKML cc list trimmed. On Tue, 2003-02-25 at 16:41, Hans Reiser wrote: > Scott Robert Ladd wrote: > > > But I honestly can't see how 8 processors can possibly make > >Abiword run "better." > > > They can't, but you know it was before 1980 that hardware exceeded what > was really needed for email. What happened then? People needed more > horsepower for wysiwyg editors, the new thing of that time..... > > Now it is games that hardware is too slow for. After games, maybe AI > assistants?.... Will you be saying, "My AI doesn't have enough > horsepower to run on, its databases are small and out of date, and it is > providing me worse advice than my wealthy friends get, and providing it > later."? How much will you pay for a good AI to advise you? (I really > like my I-Nav GPS adviser in my mini-van.... money well spent....) > [snippage] > > It is interesting that games are the only compelling motivation for > faster desktop hardware these days. It may be part of why we are in a > tech bust. When AIs become hardware purchase drivers, there will likely > be a boom again. > > -- > Hans It's easy to say that people don't need a multiple Ghz processor to run most applications (games and AI aside) because it's true. But human nature is such that people bought muscle cars with way more horsepower than needed in the 60's and 70's before environmental concerns intervened. The current slowdown in PC purchases may be more due to a cyclical bear market than due to satiation of need. When the economy turns around, and it always does, many people will opt for the $600 2.4 Ghz P4 instead of the $200 1.1 Ghz Duron. And not because they need it, but because of other factors. Now, fast forward five years. If AMD is still around, Intel will be forced to offer ridiculously fast hardware just to stay in business. My original point is that the Ghz race may be supplemented by a SMP/HT race, not because of need (AI and games may help provide an excuse), but because of greed and envy. Never underestimate those last two. And that SMP/HT race could have an important impact on future kernel design. Steven (Looking forward to his 2.4 Ghz P4 which will compile a 2.5 kernel faster than the 15 minutes it takes his 450 Mhz PIII today, especially with Reiser4 patched in.;) ) ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 23:41 ` Hans Reiser 2003-02-26 0:19 ` Scott Robert Ladd 2003-02-26 0:47 ` Steven Cole @ 2003-02-26 16:07 ` Horst von Brand 2003-02-26 19:47 ` Alan Cox 2 siblings, 1 reply; 266+ messages in thread From: Horst von Brand @ 2003-02-26 16:07 UTC (permalink / raw) To: Hans Reiser; +Cc: LKML [Massive snippage of Cc:] Hans Reiser <reiser@namesys.com> said: [...] > It is interesting that games are the only compelling motivation for > faster desktop hardware these days. It may be part of why we are in a > tech bust. When AIs become hardware purchase drivers, there will likely > be a boom again. Oh, it was always that way. When it was Apple ][+, nobody complained about the spreadsheet being too small/slow; it was games which were CPU and display hungry. The machines of that vintage that still have a following around here are the Atari 800XL and such, which had special hardware for managing graphics on display. With the first PCs it was color displays. Then came CDs and multimedia. Today it is fast CPUs and accelerated video cards my students want for running the latest crop of games. Many keep Win98 just for running games; for work they use Linux ;-) Some people say that most new computing stuff is first introduced for gaming. That does make sense to me, as in a game you'll be more tolerant of rough edges; plus games do have a much wider appeal than office suites or databases, and are a much more competitive market to boot. ;-) -- Dr. Horst H. von Brand User #22616 counter.li.org Departamento de Informatica Fono: +56 32 654431 Universidad Tecnica Federico Santa Maria +56 32 654239 Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513 ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-26 16:07 ` Horst von Brand @ 2003-02-26 19:47 ` Alan Cox 0 siblings, 0 replies; 266+ messages in thread From: Alan Cox @ 2003-02-26 19:47 UTC (permalink / raw) To: Horst von Brand; +Cc: Hans Reiser, LKML On Wed, 2003-02-26 at 16:07, Horst von Brand wrote: > Some people say that most new computing stuff is first introduced for > gaming. That does make sense to me, as in a game you'll be more tolerant of > rough edges; plus games do have a much wider appeal than office suites or > databases, and are a much more competitive market to boot. ;-) If you've ever seen Master Thief run on a 16Mhz palmpilot you might want to ask the game folks some *hard* questions too 8) ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 23:28 ` Scott Robert Ladd 2003-02-25 23:41 ` Hans Reiser @ 2003-02-26 6:04 ` Aaron Lehmann 1 sibling, 0 replies; 266+ messages in thread From: Aaron Lehmann @ 2003-02-26 6:04 UTC (permalink / raw) To: Scott Robert Ladd Cc: Hans Reiser, Steven Cole, Martin J. Bligh, LKML, Larry McVoy On Tue, Feb 25, 2003 at 06:28:08PM -0500, Scott Robert Ladd wrote: > You are correct. Gaming, file sharing, digital imaging -- those application > eat horsepower. But I honestly can't see how 8 processors can possibly make > Abiword run "better." With the current (and historic) state of Abiword performance, anything would be an improvement. ^ permalink raw reply [flat|nested] 266+ messages in thread
* RE: Minutes from Feb 21 LSE Call 2003-02-25 20:37 ` Scott Robert Ladd 2003-02-25 21:36 ` Hans Reiser @ 2003-02-26 0:44 ` Alan Cox 2003-02-25 23:58 ` Scott Robert Ladd 1 sibling, 1 reply; 266+ messages in thread From: Alan Cox @ 2003-02-26 0:44 UTC (permalink / raw) To: Scott Robert Ladd Cc: Steven Cole, Martin J. Bligh, Hans Reiser, LKML, Larry McVoy On Tue, 2003-02-25 at 20:37, Scott Robert Ladd wrote: > Steven Cole wrote: > > Hans may have 32 CPUs in his $3000 box, and I expect to have 8 CPUs in > > my $500 Walmart special 5 or 6 years hence. And multiple chip on die > > along with HT is what will make it possible. > > Or will Walmart be selling systems with one CPU for $62.50? > > "Normal" folk simply have no use for an 8 CPU system. Sure, the technology > is great -- but no many people are buying HDTV, let alone a computer system > that could do real-time 3D holographic imaging. What Walmart is selling > today for $199 is a 1.1 GHz Duron system with minimal memory and a 10GB hard Last time I checked it was an 800Mhz VIA C3 with onboard everything (EPIA variant). Even the CPU is BGA mounted to keep cost down ^ permalink raw reply [flat|nested] 266+ messages in thread
* RE: Minutes from Feb 21 LSE Call 2003-02-26 0:44 ` Alan Cox @ 2003-02-25 23:58 ` Scott Robert Ladd 0 siblings, 0 replies; 266+ messages in thread From: Scott Robert Ladd @ 2003-02-25 23:58 UTC (permalink / raw) To: Alan Cox; +Cc: Steven Cole, Martin J. Bligh, Hans Reiser, LKML, Larry McVoy Alan Cox wrote: SRL> that could do real-time 3D holographic imaging. What Walmart is SRL> selling today for $199 is a 1.1 GHz Duron system with minimal SRL> memory and a 10GB hard. AC> Last time I checked it was an 800Mhz VIA C3 with onboard everything AC> (EPIA variant). Even the CPU is BGA mounted to keep cost down My reference is: http://www.walmart.com/catalog/product.gsp?product_id=2138700&cat=3951&type= 19&dept=3944&path=0:3944:3951 1.1 GHz Duron 128 MB RAM 10 GB drive CD-ROM Ethernet For $199.98. ..Scott ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 21:02 ` Martin J. Bligh 2003-02-22 22:06 ` Mark Hahn @ 2003-02-22 23:15 ` Larry McVoy 2003-02-22 23:23 ` Christoph Hellwig ` (3 more replies) 1 sibling, 4 replies; 266+ messages in thread From: Larry McVoy @ 2003-02-22 23:15 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Larry McVoy, Mark Hahn, David S. Miller, linux-kernel On Sat, Feb 22, 2003 at 01:02:12PM -0800, Martin J. Bligh wrote: > > How much do you want to bet that more than 95% of their server revenue > > comes from 4CPU or less boxes? I wouldn't be surprised if it is more > > like 99.5%. And you can configure yourself a pretty nice quad xeon box > > for $25K. Yeah, there is some profit in there but nowhere near the huge > > margins you are counting on to make your case. > > OK, so now you've slid from talking about PCs to 2-way to 4-way ... > perhaps because your original arguement was fatally flawed. Nice attempt at deflection but it won't work. Your position is that there is no money in PC's only in big iron. Last I checked, "big iron" doesn't include $25K 4 way machines, now does it? You claimed that Dell was making the majority of their profits from servers. To refresh your memory: "I bet they still make more money on servers than desktops and notebooks combined". Are you still claiming that? If so, please provide some data to back it up because, as Mark and others have pointed out, the bulk of their servers are headless desktop machines in tower or rackmount cases. I fail to see how there are better margins on the same hardware in a rackmount box for $800 when the desktop costs $750. Those rack mount power supplies and cases are not as cheap as the desktop ones, so I see no difference in the margins. Let's get back to your position. You want to shovel stuff in the kernel for the benefit of the 32 way / 64 way etc boxes. I don't see that as wise. You could prove me wrong. Here's how you do it: go get oprofile or whatever that tool is which lets you run apps and count cache misses. Start including before/after runs of each microbench in lmbench and some time sharing loads with and without your changes. When you can do that and you don't add any more bus traffic, you're a genius and I'll shut up. But that's a false promise because by definition, fine grained threading adds more bus traffic. It's kind of hard to not have that happen, the caches have to stay coherent somehow. > Some applications work well on clusters, which will give them cheaper > hardware, at the expense of a lot more complexity in userspace ... > depending on the scale of the system, that's a tradeoff that might go > either way. Tell it to Google. That's probably one of the largest applications in the world; I was the 4th engineer there, and I didn't think that the cluster added complexity at all. On the contrary, it made things go one hell of a lot faster. > You don't believe we can make it scale without screwing up the low end, > I do believe we can do that. I'd like a little more than "I think I can, I think I can, I think I can". The people who are saying "no you can't, no you can't, no you can't" have seen this sort of work done before and there is no data which shows that it is possible and all sorts of data which shows that it is not. Show me one OS which scales to 32 CPUs on an I/O load and run lmbench on a single CPU. Then take that same CPU and stuff it into a uniprocessor motherboard and run the same benchmarks on under Linux. The Linux one will blow away the multi threaded one. 
Come on, prove me wrong, show me the data. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
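For reference, the kind of microbenchmark being demanded here is tiny. The program below is a stripped-down stand-in for one of them, the null system call latency test in the spirit of lmbench's lat_syscall; it is not the real tool, which does proper warm-up and repetition, but running something like this (and its cousins for context switches, pipes, and page faults) before and after a patch series is precisely the before/after evidence being asked for.

/* Time a do-nothing system call in a tight loop.
 * Build with: cc -O2 nullsys.c -o nullsys */
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>

int main(void)
{
        struct timeval t0, t1;
        long i, iters = 5000000;
        double ns;

        gettimeofday(&t0, NULL);
        for (i = 0; i < iters; i++)
                getppid();      /* cheap, unprivileged, not cached by libc */
        gettimeofday(&t1, NULL);

        ns = ((t1.tv_sec - t0.tv_sec) * 1e6 +
              (t1.tv_usec - t0.tv_usec)) * 1000.0 / iters;
        printf("null syscall: %.1f ns\n", ns);
        return 0;
}

A measurable regression here on the same hardware is the "one cache miss at a time" tax showing up where a macro benchmark would hide it.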
* Re: Minutes from Feb 21 LSE Call 2003-02-22 23:15 ` Larry McVoy @ 2003-02-22 23:23 ` Christoph Hellwig 2003-02-22 23:54 ` Mark Hahn 2003-02-22 23:44 ` Martin J. Bligh ` (2 subsequent siblings) 3 siblings, 1 reply; 266+ messages in thread From: Christoph Hellwig @ 2003-02-22 23:23 UTC (permalink / raw) To: Larry McVoy, Martin J. Bligh, Larry McVoy, Mark Hahn, David S. Miller, linux-kernel On Sat, Feb 22, 2003 at 03:15:52PM -0800, Larry McVoy wrote: > Show me one OS which scales to 32 CPUs on an I/O load and run lmbench > on a single CPU. Then take that same CPU and stuff it into a uniprocessor > motherboard and run the same benchmarks on under Linux. The Linux one > will blow away the multi threaded one. Come on, prove me wrong, show > me the data. I could ask the SGI Eagan folks to do that with an Altix and a IA64 Whitebox - oh wait, both OSes would be Linux.. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 23:23 ` Christoph Hellwig @ 2003-02-22 23:54 ` Mark Hahn 0 siblings, 0 replies; 266+ messages in thread From: Mark Hahn @ 2003-02-22 23:54 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-kernel > I could ask the SGI Eagan folks to do that with an Altix and a IA64 > Whitebox - oh wait, both OSes would be Linux.. the only public info I've seen is "round-trip in as little as 40ns", which is too vague to be useful. and sounds WAY optimistic - perhaps that's just between two CPUs in a single brick. remember that LMBench shows memory latencies of O(100ns) for even fast uniprocessors. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 23:15 ` Larry McVoy 2003-02-22 23:23 ` Christoph Hellwig @ 2003-02-22 23:44 ` Martin J. Bligh 2003-02-24 4:56 ` Larry McVoy 2003-02-22 23:57 ` Jeff Garzik 2003-02-23 23:57 ` Bill Davidsen 3 siblings, 1 reply; 266+ messages in thread From: Martin J. Bligh @ 2003-02-22 23:44 UTC (permalink / raw) To: Larry McVoy; +Cc: Mark Hahn, David S. Miller, linux-kernel >> OK, so now you've slid from talking about PCs to 2-way to 4-way ... >> perhaps because your original arguement was fatally flawed. > > Nice attempt at deflection but it won't work. On your part or mine? seemingly yours. > Your position is that > there is no money in PC's only in big iron. Last I checked, "big iron" > doesn't include $25K 4 way machines, now does it? I would call 4x a "big machine" which is what I originally said. > You claimed that > Dell was making the majority of their profits from servers. I think that's probably true (nobody can be certain, as we don't have the numbers). > To refresh > your memory: "I bet they still make more money on servers than desktops > and notebooks combined". Are you still claiming that? Yup. > If so, please > provide some data to back it up because, as Mark and others have pointed > out, the bulk of their servers are headless desktop machines in tower > or rackmount cases. So what? they're still servers. I can no more provide data to back it up than you can to contradict it, because they don't release those figures. Note my sentence began "I bet", not "I have cast iron evidence". > Let's get back to your position. You want to shovel stuff in the kernel > for the benefit of the 32 way / 64 way etc boxes. Actually, I'm focussed on 16-way at the moment, and have never run on, or published numbers for anything higher. If you need to exaggerate to make your point, then go ahead, but it's pretty transparent. > I don't see that as wise. You could prove me wrong. > Here's how you do it: go get oprofile > or whatever that tool is which lets you run apps and count cache misses. > Start including before/after runs of each microbench in lmbench and > some time sharing loads with and without your changes. When you can do > that and you don't add any more bus traffic, you're a genius and > I'll shut up. I don't feel the need to do that to prove my point, but if you feel the need to do it to prove yours, go ahead. > But that's a false promise because by definition, fine grained threading > adds more bus traffic. It's kind of hard to not have that happen, the > caches have to stay coherent somehow. Adding more bus traffic is fine if you increase throughput. Focussing on just one tiny aspect of performance is ludicrous. Look at the big picture. Run some non-micro benchmarks. Analyse the results. Compare 2.4 vs 2.5 (or any set of patches I've put into the kernel of your choice) On UP, 2P or whatever you care about. You seem to think the maintainers are morons that we can just slide crap straight by ... give them a little more credit than that. > Tell it to Google. That's probably one of the largest applications in > the world; I was the 4th engineer there, and I didn't think that the > cluster added complexity at all. On the contrary, it made things go > one hell of a lot faster. As I've explained to you many times before, it depends on the system. Some things split easily, some don't. >> You don't believe we can make it scale without screwing up the low end, >> I do believe we can do that. 
> > I'd like a little more than "I think I can, I think I can, I think I can". > The people who are saying "no you can't, no you can't, no you can't" have > seen this sort of work done before and there is no data which shows that > it is possible and all sorts of data which shows that it is not. The only data that's relevant is what we've done to Linux. If you want to run the numbers, and show some useful metric on a semi-realistic benchmark, I'd love to see them. > Show me one OS which scales to 32 CPUs on an I/O load and run lmbench > on a single CPU. Then take that same CPU and stuff it into a uniprocessor > motherboard and run the same benchmarks on under Linux. The Linux one > will blow away the multi threaded one. Nobody has ever really focussed before on an OS that scales across the board from UP to big iron ... a closed development system is bad at resolving that sort of thing. The real interesting comparison is UP or 2x SMP on Linux with and without the scalability changes that have made it into the tree. > Come on, prove me wrong, show me the data. I don't have to *prove* you wrong. I'm happy in my own personal knowledge that you're wrong, and things seem to be going along just fine, thanks. If you want to change the attitude of the maintainers, I suggest you generate the data yourself. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 23:44 ` Martin J. Bligh @ 2003-02-24 4:56 ` Larry McVoy 2003-02-24 5:06 ` William Lee Irwin III 2003-02-24 5:16 ` Martin J. Bligh 0 siblings, 2 replies; 266+ messages in thread From: Larry McVoy @ 2003-02-24 4:56 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Larry McVoy, Mark Hahn, David S. Miller, linux-kernel > > Your position is that > > there is no money in PC's only in big iron. Last I checked, "big iron" > > doesn't include $25K 4 way machines, now does it? > > I would call 4x a "big machine" which is what I originally said. Nonsense. You were talking about 16/32/64 way boxes, go read your own mail. In fact, you said so in this message. Furthermore, I can prove that isn't what you are talking about. Show me the performance gains you are getting on 4way systems from your changes. Last I checked, things scaled pretty nicely on 4 ways. > > You claimed that > > Dell was making the majority of their profits from servers. > > I think that's probably true (nobody can be certain, as we don't have the > numbers). Yes, we do. You just don't like what the numbers are saying. You can work backward from the size of the server market and the percentages claimed by Sun, HP, IBM, etc. If you do that, you'll see that even if Dell was making 100% margins on every server they sold, that still wouldn't be 51% of their profits. It's not "probably true", it's not physically possible that it is true and if you don't know that you are simply waving your hands and not doing any math. > > To refresh > > your memory: "I bet they still make more money on servers than desktops > > and notebooks combined". Are you still claiming that? > > Yup. Well, you are flat out 100% wrong. > > If so, please > > provide some data to back it up because, as Mark and others have pointed > > out, the bulk of their servers are headless desktop machines in tower > > or rackmount cases. > > So what? they're still servers. I can no more provide data to back it up > than you can to contradict it, because they don't release those figures. Read the mail I've posted on topic, the data is there. Or better yet, don't trust me, go work it out for yourself, it isn't hard. > > I don't see that as wise. You could prove me wrong. > > Here's how you do it: go get oprofile > > or whatever that tool is which lets you run apps and count cache misses. > > Start including before/after runs of each microbench in lmbench and > > some time sharing loads with and without your changes. When you can do > > that and you don't add any more bus traffic, you're a genius and > > I'll shut up. > > I don't feel the need to do that to prove my point, but if you feel the > need to do it to prove yours, go ahead. Ahh, now we're getting somewhere. As soon as we get anywhere near real numbers, you don't want anything to do with it. Why is that? > You seem to think the maintainers are morons that we can just slide crap > straight by ... give them a little more credit than that. It happens all the time. > > Come on, prove me wrong, show me the data. > > I don't have to *prove* you wrong. I'm happy in my own personal knowledge > that you're wrong, and things seem to be going along just fine, thanks. Wow. Compelling. "It is so because I say it is so". Jeez, forgive me if I'm not falling all over myself to have that sort of engineering being the basis for scaling work. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 4:56 ` Larry McVoy @ 2003-02-24 5:06 ` William Lee Irwin III 2003-02-24 6:00 ` Mark Hahn 2003-02-24 15:06 ` Alan Cox 2003-02-24 5:16 ` Martin J. Bligh 1 sibling, 2 replies; 266+ messages in thread From: William Lee Irwin III @ 2003-02-24 5:06 UTC (permalink / raw) To: Larry McVoy, Martin J. Bligh, Larry McVoy, Mark Hahn, David S. Miller, linux-kernel On Sun, Feb 23, 2003 at 08:56:16PM -0800, Larry McVoy wrote: > Furthermore, I can prove that isn't what you are talking about. Show me > the performance gains you are getting on 4way systems from your changes. > Last I checked, things scaled pretty nicely on 4 ways. Try 4 or 8 mkfs's in parallel on a 4x box running virgin 2.4.x. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 5:06 ` William Lee Irwin III @ 2003-02-24 6:00 ` Mark Hahn 2003-02-24 6:02 ` William Lee Irwin III 2003-02-24 15:06 ` Alan Cox 1 sibling, 1 reply; 266+ messages in thread From: Mark Hahn @ 2003-02-24 6:00 UTC (permalink / raw) To: William Lee Irwin III; +Cc: Larry McVoy, linux-kernel > > Last I checked, things scaled pretty nicely on 4 ways. > > Try 4 or 8 mkfs's in parallel on a 4x box running virgin 2.4.x. "Doctor, it hurts..." ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 6:00 ` Mark Hahn @ 2003-02-24 6:02 ` William Lee Irwin III 0 siblings, 0 replies; 266+ messages in thread From: William Lee Irwin III @ 2003-02-24 6:02 UTC (permalink / raw) To: Mark Hahn; +Cc: Larry McVoy, linux-kernel At some point in the past, Larry McVoy wrote: >>> Last I checked, things scaled pretty nicely on 4 ways. At some point in the past, I wrote: >> Try 4 or 8 mkfs's in parallel on a 4x box running virgin 2.4.x. On Mon, Feb 24, 2003 at 01:00:22AM -0500, Mark Hahn wrote: > "Doctor, it hurts..." Doing disk io is supposed to hurt? I'll file this in the "sick and wrong" category along with RBJ and Hohensee. In the meantime, compare to 2.5.x. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 5:06 ` William Lee Irwin III 2003-02-24 6:00 ` Mark Hahn @ 2003-02-24 15:06 ` Alan Cox 2003-02-24 23:18 ` William Lee Irwin III 1 sibling, 1 reply; 266+ messages in thread From: Alan Cox @ 2003-02-24 15:06 UTC (permalink / raw) To: William Lee Irwin III Cc: Larry McVoy, Martin J. Bligh, Larry McVoy, Mark Hahn, David S. Miller, Linux Kernel Mailing List On Mon, 2003-02-24 at 05:06, William Lee Irwin III wrote: > On Sun, Feb 23, 2003 at 08:56:16PM -0800, Larry McVoy wrote: > > Furthermore, I can prove that isn't what you are talking about. Show me > > the performance gains you are getting on 4way systems from your changes. > > Last I checked, things scaled pretty nicely on 4 ways. > > Try 4 or 8 mkfs's in parallel on a 4x box running virgin 2.4.x. You have strange ideas of typical workloads. The parallel mkfs one is a good one, though, because it's also a lot better on one CPU in 2.5. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 15:06 ` Alan Cox @ 2003-02-24 23:18 ` William Lee Irwin III 0 siblings, 0 replies; 266+ messages in thread From: William Lee Irwin III @ 2003-02-24 23:18 UTC (permalink / raw) To: Alan Cox Cc: Larry McVoy, Martin J. Bligh, Larry McVoy, Mark Hahn, David S. Miller, Linux Kernel Mailing List On Mon, 2003-02-24 at 05:06, William Lee Irwin III wrote: >> Try 4 or 8 mkfs's in parallel on a 4x box running virgin 2.4.x. On Mon, Feb 24, 2003 at 03:06:53PM +0000, Alan Cox wrote: > You have strange ideas of typical workloads. The parallel mkfs one is a good > one, though, because it's also a lot better on one CPU in 2.5. The results I saw were that this did not affect 2.5 in any interesting way and 2.4 behaved "very badly". It's a simple way to get lots of disk io going without a complex benchmark. There are good reasons, and real workloads, behind the things that were done to fix this. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 4:56 ` Larry McVoy 2003-02-24 5:06 ` William Lee Irwin III @ 2003-02-24 5:16 ` Martin J. Bligh 2003-02-24 6:58 ` Larry McVoy 1 sibling, 1 reply; 266+ messages in thread From: Martin J. Bligh @ 2003-02-24 5:16 UTC (permalink / raw) To: Larry McVoy; +Cc: linux-kernel > Nonsense. You were talking about 16/32/64 way boxes, go read your own > mail. In fact, you said so in this message. Where? I never mentioned 32 / 64 way boxes, for starters ... > Furthermore, I can prove that isn't what you are talking about. Show me > the performance gains you are getting on 4way systems from your changes. > Last I checked, things scaled pretty nicely on 4 ways. Depends what you mean by "your changes". If you do a before and after comparison on a 4x machine on the scalability changes IBM LTC has made, I think you'd find a dramatic difference. Of course, it depends to some extent on what tests you run. Maybe running bitkeeper (or whatever you're testing) just eats cpu, and doesn't do much interprocess communication or disk IO (compared to the CPU load), in which case it'll scale pretty well on anything as long as it's multithreaded enough. If you're just worried about one particular app, yes of course you could tweak the system to go faster for it ... but that's not what a general purpose OS is about. > Yes, we do. You just don't like what the numbers are saying. You can > work backward from the size of the server market and the percentages > claimed by Sun, HP, IBM, etc. If you do that, you'll see that even > if Dell was making 100% margins on every server they sold, that still > wouldn't be 51% of their profits. Ummm ... now go back to what we were actually talking about. Linux margins. You think a significant percentage of the desktops they sell run Linux? >> > To refresh >> > your memory: "I bet they still make more money on servers than desktops >> > and notebooks combined". Are you still claiming that? >> >> Yup. > > Well, you are flat out 100% wrong. In the context we were talking about (Linux), I seriously doubt it. Apologies if I didn't feel the need to continuously restate the context in every email to stop you from trying to twist the argument. > Ahh, now we're getting somewhere. As soon as we get anywhere near real > numbers, you don't want anything to do with it. Why is that? Because I don't see why I should waste my time running benchmarks just to prove you wrong. I don't respect you that much, and it seems the maintainers don't either. When you become somebody with the stature in the Linux community of, say, Linus or Andrew I'd be prepared to spend a lot more time running benchmarks on any concerns you might have. >> I don't have to *prove* you wrong. I'm happy in my own personal knowledge >> that you're wrong, and things seem to be going along just fine, thanks. > > Wow. Compelling. "It is so because I say it is so". Jeez, forgive me > if I'm not falling all over myself to have that sort of engineering being > the basis for scaling work. Ummm ... and your argument is different because of what? You've run some tiny little microfocused benchmark, seen a couple of bus cycles, and projected the results out? Not very impressive, really, is it? Go run a real benchmark and prove it makes a difference if you want to sway people's opinions. Until then, I suspect the current status quo will continue in terms of us getting patches accepted. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 5:16 ` Martin J. Bligh @ 2003-02-24 6:58 ` Larry McVoy 2003-02-24 7:39 ` Martin J. Bligh ` (3 more replies) 0 siblings, 4 replies; 266+ messages in thread From: Larry McVoy @ 2003-02-24 6:58 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Larry McVoy, linux-kernel On Sun, Feb 23, 2003 at 09:16:38PM -0800, Martin J. Bligh wrote: > Ummm ... now go back to what we were actually talking about. Linux margins. > You think a significant percentage of the desktops they sell run Linux? The real discussion was the justification for scaling work beyond the small SMPs. You tried to make the point that there is no money in PC's so any work to scale Linux up would help hardware companies stay financially healthy. I and others pointed out that there is indeed a pile of money in PC's, that's the vast majority of the hardware Dell sells. They don't sell anything bigger than an 8 way and they only have one of those. We went on to do the digging to figure out that it's impossible that Dell makes a substantial portion of their profits from the big servers. The point being that there is a company generating $32B/year in sales and almost all of that is in uniprocessors. Directly countering your statement that there is no margin in PC's. They are making $2B/year in profits, QED. Which brings us back to the point. If the world is not heading towards an 8 way on every desk then it is really questionable to make a lot of changes to the kernel to make it work really well on 8-ways. Yeah, I'm sure it makes you feel good, but it's more of an intellectual exercise than anything which really benefits the vast majority of the kernel user base. > > Ahh, now we're getting somewhere. As soon as we get anywhere near real > > numbers, you don't want anything to do with it. Why is that? > > Because I don't see why I should waste my time running benchmarks just to > prove you wrong. I don't respect you that much, and it seems the > maintainers don't either. When you become somebody with the stature in the > Linux community of, say, Linus or Andrew I'd be prepared to spend a lot > more time running benchmarks on any concerns you might have. Who cares if you respect me, what does that have to do with proper engineering? Do you think that I'm the only person who wants to see numbers? You think Linus doesn't care about this? Maybe you missed the whole IA32 vs IA64 instruction cache thread. It sure sounded like he cares. How about Alan? He stepped up and pointed out that less is more. How about Mark? He knows a thing or two about the topic? In fact, I think you'd be hard pressed to find anyone who wouldn't be interested in seeing the cache effects of a patch. People care about performance, both scaling up and scaling down. A lot of performance changes are measured poorly, in a way that makes the changes look good but doesn't expose the hidden costs of the change. What I'm saying is that those sorts of measurements screwed over performance in the past, why are you trying to repeat old mistakes? > > Wow. Compelling. "It is so because I say it is so". Jeez, forgive me > > if I'm not falling all over myself to have that sort of engineering being > > the basis for scaling work. > > Ummm ... and your argument is different because of what? You've run some > tiny little microfocused benchmark, seen a couple of bus cycles, and > projected the results out? 
My argument is different because every effort which has gone in the direction you are going has ended up with a kernel that worked well on big boxes and sucked rocks on little boxes. And all of them started with kernels which performed quite nicely on uniprocessors. If I was waving my hands and saying "I'm an old fart and I think this won't work" and that was it, you'd have every right to tell me to piss off. I'd tell me to piss off. But that's not what is going on here. What's going on is that a pile of smart people have tried over and over to do what you claim you will do and they all failed. They all ended up with kernels that gave up lots of uniprocessor performance and justified it by throwing more processors at that problem. You haven't said a single thing to refute that and when challenged to measure the parts which lead to those results you respond with "nah, nah, I don't respect you so I don't have to measure it". Come on, *you* should want to know if what I'm saying is true. You're an engineer, not a marketing drone, of course you should want to know, why wouldn't you? Linux is a really fast system right now. The code paths are short and it is possible to use the OS almost as if it were a library, the cost is so little that you really can mmap stuff in as you need, something that people have wanted since Multics. There will always be many more uses of Linux in small systems than large, simply because there will always be more small systems. Keeping Linux working well on small systems is going to have a dramatically larger positive benefit for the world than scaling it to 64 processors. So who do you want to help? An elite few or everyone? -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 6:58 ` Larry McVoy @ 2003-02-24 7:39 ` Martin J. Bligh 2003-02-24 16:17 ` Larry McVoy 2003-02-24 7:51 ` William Lee Irwin III ` (2 subsequent siblings) 3 siblings, 1 reply; 266+ messages in thread From: Martin J. Bligh @ 2003-02-24 7:39 UTC (permalink / raw) To: Larry McVoy; +Cc: linux-kernel >> Ummm ... now go back to what we were actually talking about. Linux >> margins. You think a significant percentage of the desktops they sell >> run Linux? > > The real discussion was the justification for scaling work beyond the > small SMPs. You tried to make the point that there is no money in PC's so > any work to scale Linux up would help hardware companies stay financially > healthy. More or less, yes. > The point being that there is a company generating $32B/year in sales and > almost all of that is in uniprocessors. Directly countering your > statement that there is no margin in PC's. They are making $2B/year in > profits, QED. Which is totally irrelevant. It's the *LINUX* market that matters. What part of that do you find so hard to understand? > Which brings us back to the point. If the world is not heading towards > an 8 way on every desk then it is really questionable to make a lot of > changes to the kernel to make it work really well on 8-ways. Yeah, I'm > sure it makes you feel good, but it's more of a intellectual exercise than > anything which really benefits the vast majority of the kernel user base. It makes IBM money, ergo they pay me. I enjoy doing it, ergo I work for them. Most of the work benefits smaller systems as well, ergo we get our patches accepted. So everyone's happy, apart from you, who keeps whining. >> Because I don't see why I should waste my time running benchmarks just to >> prove you wrong. I don't respect you that much, and it seems the >> maintainers don't either. When you become somebody with the stature in >> the Linux community of, say, Linus or Andrew I'd be prepared to spend a >> lot more time running benchmarks on any concerns you might have. > > Who cares if you respect me, what does that have to do with proper > engineering? Do you think that I'm the only person who wants to see > numbers? You think Linus doesn't care about this? Maybe you missed > the whole IA32 vs IA64 instruction cache thread. It sure sounded like > he cares. How about Alan? He stepped up and pointed out that less > is more. How about Mark? He knows a thing or two about the topic? > In fact, I think you'd be hard pressed to find anyone who wouldn't be > interested in seeing the cache effects of a patch. So now we've slid from talking about bus traffic from fine-grained locking, which is mostly just you whining in ignorance of the big picture, to cache effects, which are obviously important. Nice try at twisting the conversation. Again. > People care about performance, both scaling up and scaling down. A lot of > performance changes are measured poorly, in a way that makes the changes > look good but doesn't expose the hidden costs of the change. What I'm > saying is that those sorts of measurements screwed over performance in > the past, why are you trying to repeat old mistakes? One way to measure those changes poorly would be to do what you were advocating earlier - look at one tiny metric of a microbenchmark, rather than the actual throughput of the machine. So pardon me if I take your concerns, and file them in the appropriate place. 
> My argument is different because every effort which has gone in the > direction you are going has ended up with a kernel that worked well on > big boxes and sucked rocks on little boxes. And all of them started > with kernels which performed quite nicely on uniprocessors. So you're trying to say that fine-grained locking ruins uniprocessor performance now? Or did you have some other change in mind? > If I was waving my hands and saying "I'm an old fart and I think this > won't work" and that was it, you'd have every right to tell me to piss > off. I'd tell me to piss off. But that's not what is going on here. > What's going on is that a pile of smart people have tried over and over > to do what you claim you will do and they all failed. They all ended up > with kernels that gave up lots of uniprocessor performance and justified > it by throwing more processors at that problem. You haven't said a > single thing to refute that and when challenged to measure the parts > which lead to those results you respond with "nah, nah, I don't respect > you so I don't have to measure it". Come on, *you* should want to know > if what I'm saying is true. You're an engineer, not a marketing drone, > of course you should want to know, why wouldn't you? You just don't get it, do you? Your head is so vastly inflated that you think everyone should run around researching whatever *you* happen to think is interesting. Do your own benchmarking if you think it's a problem. You're the one whining about this. > Linux is a really fast system right now. The code paths are short and > it is possible to use the OS almost as if it were a library, the cost is > so little that you really can mmap stuff in as you need, something that > people have wanted since Multics. There will always be many more uses > of Linux in small systems than large, simply because there will always > be more small systems. Keeping Linux working well on small systems is > going to have a dramatically larger positive benefit for the world than > scaling it to 64 processors. So who do you want to help? An elite > few or everyone? Everyone. And we can do that, and make large systems work at the same time. Despite the fact you don't believe me. And despite the fact that you can't grasp the difference between the number 16 and the number 64. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 7:39 ` Martin J. Bligh @ 2003-02-24 16:17 ` Larry McVoy 2003-02-24 16:49 ` Martin J. Bligh 2003-02-24 18:22 ` John W. M. Stevens 0 siblings, 2 replies; 266+ messages in thread From: Larry McVoy @ 2003-02-24 16:17 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Larry McVoy, linux-kernel On Sun, Feb 23, 2003 at 11:39:34PM -0800, Martin J. Bligh wrote: > > The point being that there is a company generating $32B/year in sales and > > almost all of that is in uniprocessors. Directly countering your > > statement that there is no margin in PC's. They are making $2B/year in > > profits, QED. > > Which is totally irrelevant. It's the *LINUX* market that matters. What > part of that do you find so hard to understand? OK, so you can't handle the reality that the server market overall doesn't make your point so you retreat to the Linux market. OK, fine. All the data anyone has ever seen has Linux running on *smaller* servers, not larger. Show me all the cases where people replaced 4 CPU NT boxes with 8 CPU Linux boxes. The point being that if in the overall market place, big iron isn't dominating, you have one hell of a tough time making the case that the Linux market place is somehow profoundly different and needs larger boxes to do the same job. In fact, the opposite is true. Linux squeezes substantially more performance out of the same hardware than the commercial OS offerings, NT or Unix. So where is the market force which says "oh, switching to Linux? Better get more CPUs". > It makes IBM money, ergo they pay me. I enjoy doing it, ergo I work for > them. Most of the work benefits smaller systems as well, ergo we get our > patches accepted. So everyone's happy, apart from you, who keeps whining. Indeed I do, I'm good at it. You're about to find out how good. It's quite effective to simply focus attention on a problem area. Here's my promise to you: there will be a ton of attention focussed on the scaling patches until you and anyone else doing them starts showing up with cache miss counters as part of the submission process. > So now we've slid from talking about bus traffic from fine-grained locking, > which is mostly just you whining in ignorance of the big picture, to cache > effects, which are obviously important. Nice try at twisting the > conversation. Again. You need to take a deep breath and try and understand that the focus of the conversation is Linux, not your ego or mine. Getting mad at me just wastes energy, stay focussed on the real issue, Linux. > > People care about performance, both scaling up and scaling down. A lot of > > performance changes are measured poorly, in a way that makes the changes > > look good but doesn't expose the hidden costs of the change. What I'm > > saying is that those sorts of measurements screwed over performance in > > the past, why are you trying to repeat old mistakes? > > One way to measure those changes poorly would be to do what you were > advocating earlier - look at one tiny metric of a microbenchmark, rather > than the actual throughput of the machine. So pardon me if I take your > concerns, and file them in the appropriate place. You apparently missed the point where I have said (a bunch of times): run the benchmarks you want and report before-and-after-the-patch cache miss counters for the same runs. Microbenchmarks would be a really bad way to do that, you really want to run a real application because you need it fighting for the cache. 
> > My argument is different because every effort which has gone in the > > direction you are going has ended up with a kernel that worked well on > > big boxes and sucked rocks on little boxes. And all of them started > > with kernels which performed quite nicely on uniprocessors. > > So you're trying to say that fine-grained locking ruins uniprocessor > performance now? I've been saying that for almost 10 years, check the archives. > You just don't get it, do you? Your head is so vastly inflated that you > think everyone should run around researching whatever *you* happen to think > is interesting. Do your own benchmarking if you think it's a problem. That's exactly what I'll do if you don't learn how to do it yourself. I'm astounded that any competent engineer wouldn't want to know the effects of their changes, I think you actually do but are just too pissed right now to see it. > > Linux is a really fast system right now. [etc] > > Everyone. And we can do that, and make large systems work at the same time. > Despite the fact you don't believe me. And despite the fact that you can't > grasp the difference between the number 16 and the number 64. See other postings on this one. All engineers in your position have said "we're just trying to get to N cpus where N = ~2x where we are today and it won't hurt uniprocessor performance". They *all* say that. And they all end up with a slow uniprocessor OS. Unlike security and a number of other invasive features, the SMP stuff can't be configed out or you end up with an #ifdef-ed mess like IRIX. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
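Larry's "report before-and-after cache miss counters" request is straightforward to automate. The sketch below is one hedged, present-day way to do it: perf_event_open(2) postdates this thread (in 2003 the equivalent would have been oprofile or perfctr counters), and the file name, the choice of PERF_COUNT_HW_CACHE_MISSES as the event, and the command-line handling are illustrative assumptions, not anything proposed in the thread. Run the same workload on a kernel with and without the patch under test and compare the totals.

/* count_misses.c -- hedged sketch, not from the thread: run
 * "./count_misses <command> [args...]" on a pre-patch and a post-patch
 * kernel and compare the totals.  Uses perf_event_open(2), the modern
 * interface; may need /proc/sys/kernel/perf_event_paranoid lowered. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <linux/perf_event.h>
#include <asm/unistd.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(int argc, char **argv)
{
        struct perf_event_attr attr;
        long long misses = 0;
        int pipefd[2], fd, status;
        pid_t child;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <command> [args...]\n", argv[0]);
                return 1;
        }

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CACHE_MISSES; /* last-level cache misses */
        attr.disabled = 1;
        attr.enable_on_exec = 1;  /* start counting exactly when the child execs */
        attr.inherit = 1;         /* fold in the workload's children and threads */
        attr.exclude_kernel = 0;  /* kernel-side misses are the whole point here */

        if (pipe(pipefd) < 0) { perror("pipe"); return 1; }

        child = fork();
        if (child == 0) {
                char go;
                close(pipefd[1]);
                read(pipefd[0], &go, 1);  /* wait until the counter is attached */
                execvp(argv[1], &argv[1]);
                _exit(127);
        }

        close(pipefd[0]);
        fd = perf_event_open(&attr, child, -1, -1, 0);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        write(pipefd[1], "x", 1); /* let the child go ahead and exec the workload */
        close(pipefd[1]);
        waitpid(child, &status, 0);

        read(fd, &misses, sizeof(misses));
        printf("cache misses: %lld\n", misses);
        return 0;
}

The same number is available with no code at all from "perf stat -e cache-misses <command>"; the point is only that the comparison being asked for is cheap to bolt onto any benchmark run.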
* Re: Minutes from Feb 21 LSE Call 2003-02-24 16:17 ` Larry McVoy @ 2003-02-24 16:49 ` Martin J. Bligh 2003-02-24 18:22 ` John W. M. Stevens 1 sibling, 0 replies; 266+ messages in thread From: Martin J. Bligh @ 2003-02-24 16:49 UTC (permalink / raw) To: Larry McVoy; +Cc: linux-kernel >> > The point being that there is a company generating $32B/year in sales >> > and almost all of that is in uniprocessors. Directly countering your >> > statement that there is no margin in PC's. They are making $2B/year in >> > profits, QED. >> >> Which is totally irrelevant. It's the *LINUX* market that matters. What >> part of that do you find so hard to understand? > > OK, so you can't handle the reality that the server market overall doesn't > make your point so you retreat to the Linux market. OK, fine. All the Errm, no. That was the conversation all along - you just took some remarks out of context. > The point being that if in the overall market place, big iron isn't > dominating, you have one hell of a tough time making the case that the > Linux market place is somehow profoundly different and needs larger > boxes to do the same job. Dominating in terms of volume? No. My position is that hardware companies' Linux sales make more money on servers than on desktops. We're working on scalability ... that means CPUs, memory, disk IO, networking, everything. That improves both the efficiency of servers ... "large machines" (which your original message had as, and I quote, "4 or more CPU SMP machines"), 2x and even larger 1x machines. If you're being more specific as to things like NUMA changes, please point to examples of patches you think degrade performance on UP / 2x or whatever. > Indeed I do, I'm good at it. You're about to find out how good. It's > quite effective to simply focus attention on a problem area. Here's > my promise to you: there will be a ton of attention focussed on the > scaling patches until you and anyone else doing them starts showing > up with cache miss counters as part of the submission process. Here's my promise to you: people listen to you far less than you think, and our patches will continue to go into the kernel. >> So now we've slid from talking about bus traffic from fine-grained >> locking, which is mostly just you whining in ignorance of the big >> picture, to cache effects, which are obviously important. Nice try at >> twisting the conversation. Again. > > You need to take a deep breath and try and understand that the focus of > the conversation is Linux, not your ego or mine. Getting mad at me just > wastes energy, stay focussed on the real issue, Linux. So exactly what do you think is the problem? It seems to keep shifting mysteriously. Name some patches that got accepted into mainline ... if they're broken, that'll give us some clues what is bad for the future, and we can fix them. >> One way to measure those changes poorly would be to do what you were >> advocating earlier - look at one tiny metric of a microbenchmark, rather >> than the actual throughput of the machine. So pardon me if I take your >> concerns, and file them in the appropriate place. > > You apparently missed the point where I have said (a bunch of times): > run the benchmarks you want and report before-and-after-the-patch > cache miss counters for the same runs. Microbenchmarks would be > a really bad way to do that, you really want to run a real application > because you need it fighting for the cache. One statistic (e.g. cache miss counters) isn't the big picture. 
If throughput goes up or remains the same on all machines, that's what's important. >> So you're trying to say that fine-grained locking ruins uniprocessor >> performance now? > I've been saying that for almost 10 years, check the archives. And you haven't worked out that locks compile away to nothing on UP yet? I think you might be better off pulling your head out of where it's currently residing, and pointing it at the source code. >> You just don't get it, do you? Your head is so vastly inflated that you >> think everyone should run around researching whatever *you* happen to >> think is interesting. Do your own benchmarking if you think it's a >> problem. > That's exactly what I'll do if you don't learn how to do it yourself. I'm > astounded that any competent engineer wouldn't want to know the effects of > their changes, I think you actually do but are just too pissed right now > to see it. Cool, I'd love to see some benchmarks ... and real throughput numbers from them, not just microstatistics. > See other postings on this one. All engineers in your position have said > "we're just trying to get to N cpus where N = ~2x where we are today and > it won't hurt uniprocessor performance". They *all* say that. And they > all end up with a slow uniprocessor OS. Unlike security and a number of > other invasive features, the SMP stuff can't be configed out or you end > up with an #ifdef-ed mess like IRIX. Try looking up "abstraction" in a dictionary. Linus doesn't take #ifdef's in the main code. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
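For anyone who has not chased the headers Martin is alluding to: the UP/SMP split is made once, in the lock primitives, so the same driver or filesystem code pays nothing for locking on a uniprocessor build. The fragment below is only a simplified sketch of that mechanism, modeled loosely on include/linux/spinlock.h rather than copied from it; spinlock_t's contents and arch_spin_lock() are stand-ins, and it assumes a non-preemptible UP configuration (with CONFIG_PREEMPT the UP variants still reduce to preempt-count bumps rather than literally nothing).

/* Simplified sketch, not the kernel's actual definitions.  Generic code
 * calls spin_lock()/spin_unlock() unconditionally; only the SMP build
 * expands them into real atomic operations. */
#ifdef CONFIG_SMP

typedef struct {
        volatile unsigned int slock;      /* real lock word */
} spinlock_t;

/* arch_spin_lock()/arch_spin_unlock(): stand-ins for the architecture's
 * atomic test-and-set implementation. */
#define spin_lock(lp)    arch_spin_lock(lp)
#define spin_unlock(lp)  arch_spin_unlock(lp)

#else  /* !CONFIG_SMP */

typedef struct { } spinlock_t;            /* empty struct: zero bytes (gcc) */

#define spin_lock(lp)    do { (void)(lp); } while (0)  /* compiles to nothing */
#define spin_unlock(lp)  do { (void)(lp); } while (0)

#endif

So the direct cost of the lock calls really is zero on a UP build; the residual cost Larry is arguing about is indirect, i.e. code restructured around finer-grained locks, the extra data structures, and their cache footprint, which is why both sides keep coming back to cache miss numbers rather than lock counts.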
* Re: Minutes from Feb 21 LSE Call 2003-02-24 16:17 ` Larry McVoy 2003-02-24 16:49 ` Martin J. Bligh @ 2003-02-24 18:22 ` John W. M. Stevens 1 sibling, 0 replies; 266+ messages in thread From: John W. M. Stevens @ 2003-02-24 18:22 UTC (permalink / raw) To: Larry McVoy, Martin J. Bligh, Larry McVoy, linux-kernel On Mon, Feb 24, 2003 at 08:17:16AM -0800, Larry McVoy wrote: > On Sun, Feb 23, 2003 at 11:39:34PM -0800, Martin J. Bligh wrote: > > See other postings on this one. All engineers in your position have said > "we're just trying to get to N cpus where N = ~2x where we are today and > it won't hurt uniprocessor performance". They *all* say that. And they > all end up with a slow uniprocessor OS. Unlike security and a number of > other invasive features, the SMP stuff can't be configed out Heck, you can't even configure it out on so-called UP systems. The moment you introduce DMA into a system, you have an (admittedly, constrained) SMP system. And of course, simple interruption is another, constrained, kind of "virtual SMP", yes? Anybody who's done any USB HC programming is horribly aware of this fact, trust me! ;-) > or you end > up with an #ifdef-ed mess like IRIX. Why ifdef it everywhere? #ifdef SMP #define lock( mutex ) smpLock( mutex ) #else #define lock( mutex ) #endif Do that once, use the lock macro, and forget about it (except in cases where you have to worry about DMA, interruption, or some other kind of MP, of course). My (limited, only about 600 machines) experience is that Linux is inevitably less stable on non-Intel, and on non-UP machines. Before worrying about scalability, my opinion is that worrying about getting the simplest (dual processor) machines as stable as UP machines, first, would be both a better ROI, and a good basis for higher levels of scalability. Mind you, there is a perfectly simple reason (for Linux being less stable on non-Intel, non-UP machines) that this is true: the Linux development methodology pretty much makes this an emergent property. Interesting discussion, though . . . from my experience, the commercial Unices use fine-grained locking. Luck, John S. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 6:58 ` Larry McVoy 2003-02-24 7:39 ` Martin J. Bligh @ 2003-02-24 7:51 ` William Lee Irwin III 2003-02-24 15:47 ` Larry McVoy 2003-02-24 13:28 ` Alan Cox 2003-02-24 18:44 ` Davide Libenzi 3 siblings, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-24 7:51 UTC (permalink / raw) To: Larry McVoy, Martin J. Bligh, Larry McVoy, linux-kernel On Sun, Feb 23, 2003 at 10:58:26PM -0800, Larry McVoy wrote: > Linux is a really fast system right now. The code paths are short and > it is possible to use the OS almost as if it were a library, the cost is > so little that you really can mmap stuff in as you need, something that > people have wanted since Multics. There will always be many more uses > of Linux in small systems than large, simply because there will always > be more small systems. Keeping Linux working well on small systems is > going to have a dramatically larger positive benefit for the world than > scaling it to 64 processors. So who do you want to help? An elite > few or everyone? I don't know what kind of joke you think I'm trying to play here. "Scalability" is about making the kernel properly adapt to the size of the system. This means UP. This means embedded. This means mid-range x86 bigfathighmem turds. This means SGI Altix. I have _personally_ written patches to decrease the space footprint of pidhashes and other data structures so that embedded systems function more optimally. It's not about crapping all over the low end. It's not about degrading performance on commonly available systems. It's about increasing the range of systems on which Linux performs well and is useful. Maintaining the performance of Linux on commonly available systems is not only deeply ingrained as one of a set of personal standards amongst all kernel hackers involved with scalability, it's also a prerequisite for patch acceptance that is rigorously enforced by maintainers. To further demonstrate this, look at the pgd_ctor patches, which markedly reduced the overhead of pgd setup and teardown on UP lowmem systems and were very minor improvements on PAE systems. Now it's time to turn the question back around on you. Why do you not want Linux to work well on a broader range of systems than it does now? -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 7:51 ` William Lee Irwin III @ 2003-02-24 15:47 ` Larry McVoy 2003-02-24 16:00 ` Martin J. Bligh ` (2 more replies) 0 siblings, 3 replies; 266+ messages in thread From: Larry McVoy @ 2003-02-24 15:47 UTC (permalink / raw) To: William Lee Irwin III, Martin J. Bligh, Larry McVoy, linux-kernel On Sun, Feb 23, 2003 at 11:51:42PM -0800, William Lee Irwin III wrote: > Now it's time to turn the question back around on you. Why do you not > want Linux to work well on a broader range of systems than it does now? I never said that I didn't. I'm just taking issue with the chosen path which has been demonstrated to not work. "Let's scale Linux by multi threading" "Err, that really sucked for everyone who has tried it in the past, all the code paths got long and uniprocessor performance suffered" "Oh, but we won't do that, that would be bad". "Great, how about you measure the changes carefully and really show that?" "We don't need to measure the changes, we know we'll do it right". And just like every other time this comes up in every other engineering organization, the focus is on 2x wherever we are today. It is *never* about getting to 100x or 1000x. If you were looking at the problem assuming that the same code had to run on uniprocessor and a 1000 way smp, right now, today, and designing for it, I doubt very much we'd have anything to argue about. A lot of what I'm saying starts to become obviously true as you increase the number of CPUs but engineers are always seduced into making it go 2x farther than it does today. Unfortunately, each of those 2x increases comes at some cost and they add up. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 15:47 ` Larry McVoy @ 2003-02-24 16:00 ` Martin J. Bligh 2003-02-24 16:23 ` Benjamin LaHaise 2003-02-24 23:36 ` William Lee Irwin III 2 siblings, 0 replies; 266+ messages in thread From: Martin J. Bligh @ 2003-02-24 16:00 UTC (permalink / raw) To: Larry McVoy, William Lee Irwin III, linux-kernel > I never said that I didn't. I'm just taking issue with the choosen path > which has been demonstrated to not work. > > "Let's scale Linux by multi threading" > > "Err, that really sucked for everyone who has tried it in the past, > all the code paths got long and uniprocessor performance suffered" > > "Oh, but we won't do that, that would be bad". > > "Great, how about you measure the changes carefully and really show > that?" > > "We don't need to measure the changes, we know we'll do it right". Most of the threading changes have been things like 1 thread per cpu, which would seem to scale up and down rather well to me ... could you illustrate by pointing to an example of something that's changed in that area which you think is bad? Yes, if Linux started 2000 kernel threads on a UP system, that would obviously be bad. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 15:47 ` Larry McVoy 2003-02-24 16:00 ` Martin J. Bligh @ 2003-02-24 16:23 ` Benjamin LaHaise 2003-02-24 16:25 ` yodaiken 2003-02-24 16:31 ` Larry McVoy 2003-02-24 23:36 ` William Lee Irwin III 2 siblings, 2 replies; 266+ messages in thread From: Benjamin LaHaise @ 2003-02-24 16:23 UTC (permalink / raw) To: Larry McVoy, William Lee Irwin III, Martin J. Bligh, Larry McVoy, linux-kernel On Mon, Feb 24, 2003 at 07:47:25AM -0800, Larry McVoy wrote: > If you were looking at the problem assuming that the same code had to > run on uniprocessor and a 1000 way smp, right now, today, and designing > for it, I doubt very much we'd have anything to argue about. A lot of > what I'm saying starts to become obviously true as you increase the > number of CPUs but engineers are always seduced into making it go 2x > farther than it does today. Unfortunately, each of those 2x increases > comes at some cost and they add up. Good point. However, we are in a position to compare test results of older linux kernels against newer, and to recompile code out of the kernel for specific applications. I'm curious if there is a collection of lmbench results of hand configured and compiled kernels vs the vendor module based kernels across 2.0, 2.2, 2.4 and recent 2.5 on the same uniprocessor and dual processor configuration. That would really give us a better idea of how a properly tuned kernel vs what people actually use for support reasons is costing us, and if we're winning or losing. -ben -- Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a> ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 16:23 ` Benjamin LaHaise @ 2003-02-24 16:25 ` yodaiken 2003-02-24 18:20 ` Gerrit Huizenga 2003-02-24 16:31 ` Larry McVoy 1 sibling, 1 reply; 266+ messages in thread From: yodaiken @ 2003-02-24 16:25 UTC (permalink / raw) To: Benjamin LaHaise Cc: Larry McVoy, William Lee Irwin III, Martin J. Bligh, Larry McVoy, linux-kernel On Mon, Feb 24, 2003 at 11:23:14AM -0500, Benjamin LaHaise wrote: > Good point. However, we are in a position to compare test results of > older linux kernels against newer, and to recompile code out of the > kernel for specific applications. I'm curious if there is a collection > of lmbench results of hand configured and compiled kernels vs the vendor > module based kernels across 2.0, 2.2, 2.4 and recent 2.5 on the same > uniprocessor and dual processor configuration. That would really give > us a better idea of how a properly tuned kernel vs what people actually > use for support reasons is costing us, and if we're winning or losing. It's interesting to me that the people supporting the scale-up do not carefully do such benchmarks and indeed have a rather cavalier attitude to testing and benchmarking: or perhaps they don't think it's worth publishing. -- --------------------------------------------------------- Victor Yodaiken Finite State Machine Labs: The RTLinux Company. www.fsmlabs.com www.rtlinux.com 1+ 505 838 9109 ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 16:25 ` yodaiken @ 2003-02-24 18:20 ` Gerrit Huizenga 0 siblings, 0 replies; 266+ messages in thread From: Gerrit Huizenga @ 2003-02-24 18:20 UTC (permalink / raw) To: yodaiken Cc: Benjamin LaHaise, Larry McVoy, William Lee Irwin III, Martin J. Bligh, Larry McVoy, linux-kernel On Mon, 24 Feb 2003 09:25:33 MST, yodaiken@fsmlabs.com wrote: > It's interesting to me that the people supporting the scale-up do not > carefully do such benchmarks and indeed have a rather cavalier attitude > to testing and benchmarking: or perhaps they don't think it's worth > publishing. I'm afraid it is the latter half that is closer to correct. Within IBM's Linux Technology Center, we have a good-sized performance team and a tightly coupled set of developers who can internally share a lot of real benchmark data. Unfortunately, the rules of SPEC and TPC don't allow us to release data unless it is carefully (and time-consumingly) audited, and IBM has a history of not dumping the output of a few hundred runs of benchmarks out in the open and then claiming that it is all valid, without doing a lot of internal validation first. I'm sure other large companies doing Linux stuff have similar hurdles. In some cases, ours are probably higher than average (IBM as an entity has zero interest in pissing off the TPC or SPEC). We do have a few papers out there, check OLS for the large database workload one that steps through 2.4 performance changes (stock 2.4 vs. a set of patches we pushed to UL & RHAT) that increase database performance about, oh, I forget, 5-fold... And there is occasional other data sent out on web server stuff, some microbenchmark data (see the continuing stream of data from mbligh, for instance). Also, the contest data, OSDL data, etc. etc. shows comparisons and trends for anyone who cares to pay attention. It *would* be nice if someone could publish a compendium of performance data, but that would be asking a lot... gerrit ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 16:23 ` Benjamin LaHaise 2003-02-24 16:25 ` yodaiken @ 2003-02-24 16:31 ` Larry McVoy 1 sibling, 0 replies; 266+ messages in thread From: Larry McVoy @ 2003-02-24 16:31 UTC (permalink / raw) To: Benjamin LaHaise Cc: Larry McVoy, William Lee Irwin III, Martin J. Bligh, linux-kernel On Mon, Feb 24, 2003 at 11:23:14AM -0500, Benjamin LaHaise wrote: > kernel for specific applications. I'm curious if there is a collection > of lmbench results of hand configured and compiled kernels vs the vendor > module based kernels across 2.0, 2.2, 2.4 and recent 2.5 on the same > uniprocessor and dual processor configuration. If someone were willing to build the init script infrastructure to reboot to a new kernel, run the test, etc., I'll buy a couple of machines and just let them run through this. I'd like to do it with the cache miss counters turned on so if P4's do a nicer job of counting than Athlons, I'll get those. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 15:47 ` Larry McVoy 2003-02-24 16:00 ` Martin J. Bligh 2003-02-24 16:23 ` Benjamin LaHaise @ 2003-02-24 23:36 ` William Lee Irwin III 2003-02-25 0:23 ` Larry McVoy 2 siblings, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-24 23:36 UTC (permalink / raw) To: Larry McVoy, Martin J. Bligh, Larry McVoy, linux-kernel On Sun, Feb 23, 2003 at 11:51:42PM -0800, William Lee Irwin III wrote: >> Now it's time to turn the question back around on you. Why do you not >> want Linux to work well on a broader range of systems than it does now? On Mon, Feb 24, 2003 at 07:47:25AM -0800, Larry McVoy wrote: > I never said that I didn't. I'm just taking issue with the chosen path > which has been demonstrated to not work. > "Let's scale Linux by multi threading" > "Err, that really sucked for everyone who has tried it in the past, all > the code paths got long and uniprocessor performance suffered" > "Oh, but we won't do that, that would be bad". > "Great, how about you measure the changes carefully and really show that?" > "We don't need to measure the changes, we know we'll do it right". The changes are getting measured. By and large if it's slower on UP it's rejected. There's a dedicated benchmark crew, of which Randy Hron is an important member, that benchmarks such things very consistently. Internal benchmarking includes both free and non-free benchmarks. dbench, tiobench, kernel compiles, contest, and so on are the publishable bits. Also, code paths are not necessarily getting longer. Single-threaded efficiency lowers lock hold time and helps small systems too, and numerous improvements with buffer_heads, task searching, file truncation, and the like, are of that flavor. On Mon, Feb 24, 2003 at 07:47:25AM -0800, Larry McVoy wrote: > And just like every other time this comes up in every other engineering > organization, the focus is on 2x wherever we are today. It is *never* > about getting to 100x or 1000x. > If you were looking at the problem assuming that the same code had to > run on uniprocessor and a 1000 way smp, right now, today, and designing > for it, I doubt very much we'd have anything to argue about. A lot of > what I'm saying starts to become obviously true as you increase the > number of CPUs but engineers are always seduced into making it go 2x > farther than it does today. Unfortunately, each of those 2x increases > comes at some cost and they add up. Linux is a patchwork kernel. No coherent design will ever shine through. Scaling the kernel incrementally merely becomes that much more difficult. The small system performance standards aren't getting lowered. Also note there are various efforts to scale the kernel _downward_ to smaller embedded systems, partly by controlling "bloated" hash tables' sizes and partly by making major subsystems optional and partly by supporting systems with no MMU. This is not a one-way street, though I myself am clearly pointed in the upward direction. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 23:36 ` William Lee Irwin III @ 2003-02-25 0:23 ` Larry McVoy 2003-02-25 2:37 ` Werner Almesberger 2003-02-25 4:42 ` William Lee Irwin III 0 siblings, 2 replies; 266+ messages in thread From: Larry McVoy @ 2003-02-25 0:23 UTC (permalink / raw) To: William Lee Irwin III, Martin J. Bligh, Larry McVoy, linux-kernel > The changes are getting measured. By and large if it's slower on UP > it's rejected. Suppose I have an application which has a working set which just exactly fits in the I+D caches, including the related OS stuff. Someone makes some change to the OS and the benchmark for that change is smaller than the I+D caches but the change increased the I+D cache space needed. The benchmark will not show any slowdown, correct? My application no longer fits and will suffer, correct? The point is that if you are putting SMP changes into the system, you have to be held to a higher standard for measurement given the past track record of SMP changes increasing code length and cache footprints. So "measuring" doesn't mean "it's not slower on XYZ microbenchmark". It means "under the following work loads the cache misses went down or stayed the same for before and after tests". And if you said that all changes should be held to this standard, not just scaling changes, I'd agree with you. But scaling changes are the "bad guy" in my mind, they are not to be trusted, so they should be held to this standard first. If we can get everyone to step up to this bat, that's all to the good. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
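To put made-up numbers on Larry's scenario (none of these figures come from the thread): say the machine has a 512KB L2 cache, the application's combined working set is 490KB, and a scaling change adds 30KB of kernel text and per-call data to the paths that application exercises. A microbenchmark touching only a few kilobytes of user data fits comfortably on both the old and the new kernel and reports no slowdown; the application, which used to just fit, now overflows L2 on every trip through the kernel and pays main-memory latency to pull its lines back in. That is the case for measuring cache misses under a workload that actually fights for the cache, rather than timing a small benchmark alone.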
* Re: Minutes from Feb 21 LSE Call 2003-02-25 0:23 ` Larry McVoy @ 2003-02-25 2:37 ` Werner Almesberger 2003-02-25 4:42 ` William Lee Irwin III 1 sibling, 0 replies; 266+ messages in thread From: Werner Almesberger @ 2003-02-25 2:37 UTC (permalink / raw) To: William Lee Irwin III, Martin J. Bligh, Larry McVoy, linux-kernel Larry McVoy wrote: > The point is that if you are putting SMP changes into the system, you > have to be held to a higher standard for measurement given the past > track record of SMP changes increasing code length and cache footprints. So you probably want to run this benchmark on a synthetic CPU a la cachegrind. The difficult part would be to come up with a reasonably understandable additive metric for cache pressure. (I guess there goes another call to arms to academia :-) - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 0:23 ` Larry McVoy 2003-02-25 2:37 ` Werner Almesberger @ 2003-02-25 4:42 ` William Lee Irwin III 2003-02-25 4:54 ` Larry McVoy 1 sibling, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-25 4:42 UTC (permalink / raw) To: Larry McVoy, Martin J. Bligh, Larry McVoy, linux-kernel At some point in the past, I wrote: >> The changes are getting measured. By and large if it's slower on UP >> it's rejected. On Mon, Feb 24, 2003 at 04:23:09PM -0800, Larry McVoy wrote: > Suppose I have an application which has a working set which just exactly > fits in the I+D caches, including the related OS stuff. > Someone makes some change to the OS and the benchmark for that change is > smaller than the I+D caches but the change increased the I+D cache space > needed. > The benchmark will not show any slowdown, correct? > My application no longer fits and will suffer, correct? Well, it's often clear from the code whether it'll have a larger cache footprint or not, so it's probably not that large a problem. OTOH it is a real problem that little cache or TLB profiling is going on. I tried once or twice and actually came up with a function or two that should be inlined instead of uninlined in very short order. Much low-hanging fruit could be gleaned from those kinds of profiles. It's also worthwhile noting increased cache footprints are actually very often degradations on SMP and especially NUMA. The notion that optimizing for SMP and/or NUMA involves increasing cache footprint on anything doesn't really sound plausible, though I'll admit that the mistake of trusting microbenchmarks too far on SMP has probably already been committed at least once. Userspace owns the cache; using cache for the kernel is "cache pollution", which should be minimized. Going too far out on the space end of time/space tradeoff curves is every bit as bad for SMP as UP, and really horrible for NUMA. On Mon, Feb 24, 2003 at 04:23:09PM -0800, Larry McVoy wrote: > The point is that if you are putting SMP changes into the system, you > have to be held to a higher standard for measurement given the past > track record of SMP changes increasing code length and cache footprints. > So "measuring" doesn't mean "it's not slower on XYZ microbenchmark". > It means "under the following work loads the cache misses went down or > stayed the same for before and after tests". This kind of measurement is actually relatively unusual. I'm definitely interested in it, as there appear to be some deficits wrt. locality of reference that show up as big profile spikes on NUMA boxen. With care exercised good solutions should also trim down cache misses on UP also. Cache and TLB miss profile driven development sounds very attractive. On Mon, Feb 24, 2003 at 04:23:09PM -0800, Larry McVoy wrote: > And if you said that all changes should be held to this standard, not > just scaling changes, I'd agree with you. But scaling changes are the > "bad guy" in my mind, they are not to be trusted, so they should be held > to this standard first. If we can get everyone to step up to this bat, > that's all to the good. Let me put it this way: IBM sells tiny boxen too, from 4x, to UP, to whatever. And people are simultaneously actively trying to scale downward to embedded bacteria or whatever. So the small systems are being neither ignored nor sacrificed for anything else. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 4:42 ` William Lee Irwin III @ 2003-02-25 4:54 ` Larry McVoy 2003-02-25 6:00 ` William Lee Irwin III 0 siblings, 1 reply; 266+ messages in thread From: Larry McVoy @ 2003-02-25 4:54 UTC (permalink / raw) To: William Lee Irwin III, Martin J. Bligh, Larry McVoy, linux-kernel > Userspace owns the cache; using > cache for the kernel is "cache pollution", which should be minimized. > Going too far out on the space end of time/space tradeoff curves is > every bit as bad for SMP as UP, and really horrible for NUMA. Cool, I agree 100% with this. > > So "measuring" doesn't mean "it's not slower on XYZ microbenchmark". > > It means "under the following work loads the cache misses went down or > > stayed the same for before and after tests". > > This kind of measurement is actually relatively unusual. I'm definitely > interested in it, as there appear to be some deficits wrt. locality of > reference that show up as big profile spikes on NUMA boxen. With care > exercised good solutions should also trim down cache misses on UP also. > Cache and TLB miss profile driven development sounds very attractive. Again, I'm with you all the way on this. If the scale up guys can adopt this as a mantra, I'm a lot less concerned that anything bad will happen. Tim at OSDL and I have been talking about trying to work out some benchmarks to test for this. I came up with the idea of adding a "-s XXX" which means "touch XXX bytes between each iteration" to each LMbench test. One problem is the lack of page coloring will make the numbers bounce around too much. We talked that over with Linus and he suggested using the big TLB hack to get around that. Assuming we can deal with the page coloring, do you think that there is any merit in taking microbenchmarks, adding an artificial working set, and running those? > Let me put it this way: IBM sells tiny boxen too, from 4x, to UP, to > whatever. And people are simultaneously actively trying to scale > downward to embedded bacteria or whatever. That's really great, I know it's a lot less sexy but it's important. I'd love to see as much attention on making Linux work on tiny embedded platforms as there is on making it work on big iron. Small is cool too. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
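The "-s XXX" idea is small enough to sketch. The fragment below is not LMbench code; the file name, the getppid() stand-in for "the operation being measured", and the 64-byte stride are illustrative assumptions. It only shows the shape of the thing: dirty a caller-chosen working set between iterations so the benchmark has to compete with the kernel for the cache the way a real application would.

/* touch_ws.c -- hedged sketch of the "-s XXX" idea, not LMbench code.
 * Usage: ./touch_ws <working-set-bytes>; compare ns/iter across kernels. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        size_t ws = (argc > 1) ? strtoul(argv[1], NULL, 0) : 0;  /* the "-s XXX" bytes */
        long iters = 1000000, i;
        size_t j;
        volatile char *buf = ws ? calloc(1, ws) : NULL;
        struct timespec t0, t1;
        double ns;

        if (ws && !buf) { perror("calloc"); return 1; }

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < iters; i++) {
                getppid();                     /* stand-in for the operation under test */
                for (j = 0; j < ws; j += 64)   /* drag the working set back into cache, */
                        buf[j]++;              /* one line at a time */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("%.1f ns/iter with a %lu-byte working set\n", ns / iters, (unsigned long)ws);
        return 0;
}

A real harness would also time the working-set walk by itself and subtract it, and, as the rest of this exchange says, wants page coloring (or at least large pages) before run-to-run numbers are stable enough to compare.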
* Re: Minutes from Feb 21 LSE Call 2003-02-25 4:54 ` Larry McVoy @ 2003-02-25 6:00 ` William Lee Irwin III 2003-02-25 7:00 ` Val Henson 0 siblings, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-25 6:00 UTC (permalink / raw) To: Larry McVoy, Martin J. Bligh, Larry McVoy, linux-kernel At some point in the past, I wrote: >> This kind of measurement is actually relatively unusual. I'm definitely >> interested in it, as there appear to be some deficits wrt. locality of >> reference that show up as big profile spikes on NUMA boxen. With care >> exercised good solutions should also trim down cache misses on UP also. >> Cache and TLB miss profile driven development sounds very attractive. On Mon, Feb 24, 2003 at 08:54:04PM -0800, Larry McVoy wrote: > Again, I'm with you all the way on this. If the scale up guys can adopt > this as a mantra, I'm a lot less concerned that anything bad will happen. I don't know about mantras, but we're getting to the point where lock contention is a non-issue on midrange SMP and straight line efficiency is beyond the range of "obviously it should be done some other way." The time to chase cache pollution is certainly coming. On Mon, Feb 24, 2003 at 08:54:04PM -0800, Larry McVoy wrote: > Tim at OSDL and I have been talking about trying to work out some benchmarks > to test for this. I came up with the idea of adding a "-s XXX" which means > "touch XXX bytes between each iteration" to each LMbench test. One problem > is the lack of page coloring will make the numbers bounce around too much. > We talked that over with Linus and he suggested using the big TLB hack to > get around that. Assuming we can deal with the page coloring, do you think > that there is any merit in taking microbenchmarks, adding an artificial > working set, and running those? Page coloring needs to get into the kernel at some point. Using large TLB entries will artificially tie this to TLB effects and fragmentation, in addition to pagetable space conservation (on x86 anyway). So I really don't see any way to deal with reproducibility issues on this front other than just doing page coloring. Everything else that does it as a side effect would unduly disturb the results, IMHO. At some point in the past, I wrote: >> Let me put it this way: IBM sells tiny boxen too, from 4x, to UP, to >> whatever. And people are simultaneously actively trying to scale >> downward to embedded bacteria or whatever. On Mon, Feb 24, 2003 at 08:54:04PM -0800, Larry McVoy wrote: > That's really great, I know it's a lot less sexy but it's important. > I'd love to see as much attention on making Linux work on tiny embedded > platforms as there is on making it work on big iron. Small is cool too. There is; unfortunately, the participation in the development cycle of embedded vendors is not as visible as it is with large system vendors. More direct, frequent, and vocal input from embedded kernel hackers would be very valuable, as many "corner cases" with automatic kernel scaling should occur on the small end, not just the large end. I've had some brief attempts to explain to me the motives and methods of embedded system vendors and the like, but I've failed to absorb enough to get a "big picture" or much of any notion as to why embedded kernel hackers aren't participating as much in the development cycle. On the large system side, it's very clear that issues in the core VM and other parts of the kernel must be addressed to achieve the goals, and hence participation in the development cycle is outright mandatory. 
It's not "working effectively". It's a requirement. And part of that "requirement" bit is we have to work with constraints never enforced before, including maintaining the scalability curve on the low end. It's hard, and probably not impossible, but absolutely required. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 6:00 ` William Lee Irwin III @ 2003-02-25 7:00 ` Val Henson 0 siblings, 0 replies; 266+ messages in thread From: Val Henson @ 2003-02-25 7:00 UTC (permalink / raw) To: William Lee Irwin III, linux-kernel On Mon, Feb 24, 2003 at 10:00:53PM -0800, William Lee Irwin III wrote: > On Mon, Feb 24, 2003 at 08:54:04PM -0800, Larry McVoy wrote: > > That's really great, I know it's a lot less sexy but it's important. > > I'd love to see as much attention on making Linux work on tiny embedded > > platforms as there is on making it work on big iron. Small is cool too. > > There is, unfortunately the participation in the development cycle of > embedded vendors is not as visible as it is with large system vendors. > More direct, frequent, and vocal input from embedded kernel hackers > would be very valuable, as many "corner cases" with automatic kernel > scaling should occur on the small end, not just the large end. > > I've had some brief attempts to explain to me the motives and methods > of embedded system vendors and the like, but I've failed to absorb > enough to get a "big picture" or much of any notion as to why embedded > kernel hackers aren't participating as much in the development cycle. Speaking as a former Linux developer for an embedded[1] systems vendor, it's because embedded companies aren't the size of IBM and don't have money to spend on software development beyond the "make it work on our boards" point. One of the many reasons I'm a _former_ embedded Linux developer. -VAL [1] Okay, our boards had up to 4 processors and 1GB memory. But the same principles applied. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 6:58 ` Larry McVoy 2003-02-24 7:39 ` Martin J. Bligh 2003-02-24 7:51 ` William Lee Irwin III @ 2003-02-24 13:28 ` Alan Cox 2003-02-25 5:19 ` Chris Wedgwood 2003-02-24 18:44 ` Davide Libenzi 3 siblings, 1 reply; 266+ messages in thread From: Alan Cox @ 2003-02-24 13:28 UTC (permalink / raw) To: Larry McVoy; +Cc: Martin J. Bligh, Linux Kernel Mailing List On Mon, 2003-02-24 at 06:58, Larry McVoy wrote: > Which brings us back to the point. If the world is not heading towards > an 8 way on every desk then it is really questionable to make a lot of > changes to the kernel to make it work really well on 8-ways. _If_ it harms performance on small boxes. Otherwise you turn Linux into Irix and your market doesn't look so hot in 3 or 4 years' time. Featuritis is a slow creeping death. The definitive Linux box appears to be $199 from Walmart right now, and it's not SMP. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 13:28 ` Alan Cox @ 2003-02-25 5:19 ` Chris Wedgwood 2003-02-25 5:26 ` William Lee Irwin III ` (3 more replies) 0 siblings, 4 replies; 266+ messages in thread From: Chris Wedgwood @ 2003-02-25 5:19 UTC (permalink / raw) To: Alan Cox; +Cc: Larry McVoy, Martin J. Bligh, Linux Kernel Mailing List On Mon, Feb 24, 2003 at 01:28:30PM +0000, Alan Cox wrote: > _If_ it harms performance on small boxes. You mean like the general slowdown from 2.4 -> 2.5? It seems to me that for small boxes, 2.5.x is marginally slower at most things than 2.4.x. I'm hoping that as the code solidifies and things are tuned this gap will go away and 2.5.x will inch ahead... hoping.... > The definitive Linux box appears to be $199 from Walmart right now, > and its not SMP. In two years this kind of hardware will probably be SMP (HT or some variant). --cw ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 5:19 ` Chris Wedgwood @ 2003-02-25 5:26 ` William Lee Irwin III 2003-02-25 21:21 ` Chris Wedgwood 2003-02-25 6:17 ` Martin J. Bligh ` (2 subsequent siblings) 3 siblings, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-25 5:26 UTC (permalink / raw) To: Chris Wedgwood Cc: Alan Cox, Larry McVoy, Martin J. Bligh, Linux Kernel Mailing List On Mon, Feb 24, 2003 at 01:28:30PM +0000, Alan Cox wrote: >> _If_ it harms performance on small boxes. On Mon, Feb 24, 2003 at 09:19:56PM -0800, Chris Wedgwood wrote: > You mean like the general slowdown from 2.4 - >2.5? > It seems to me for small boxes, 2.5.x is margianlly slower at most > things than 2.4.x. > I'm hoping and the code solidifes and things are tuned this gap will > go away and 2.5.x will inch ahead... hoping.... Could you help identify the regressions? Profiles? Workload? On Mon, Feb 24, 2003 at 01:28:30PM +0000, Alan Cox wrote: >> The definitive Linux box appears to be $199 from Walmart right now, >> and its not SMP. On Mon, Feb 24, 2003 at 09:19:56PM -0800, Chris Wedgwood wrote: > In two year this kind of hardware probably will be SMP (HT or some I'm a programmer not an economist (despite utility functions and Nash equilibria). Don't tell me what's definitive, give me some profiles. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 5:26 ` William Lee Irwin III @ 2003-02-25 21:21 ` Chris Wedgwood 2003-02-25 21:14 ` Martin J. Bligh 2003-02-25 21:21 ` William Lee Irwin III 0 siblings, 2 replies; 266+ messages in thread From: Chris Wedgwood @ 2003-02-25 21:21 UTC (permalink / raw) To: William Lee Irwin III, Alan Cox, Larry McVoy, Martin J. Bligh, Linux Kernel Mailing List On Mon, Feb 24, 2003 at 09:26:02PM -0800, William Lee Irwin III wrote: > Could you help identify the regressions? Profiles? Workload? Is the OSDL data that Cliff White pointed out sufficient to work with, or do you want specific tests run with oprofile outputs? --cw ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 21:21 ` Chris Wedgwood @ 2003-02-25 21:14 ` Martin J. Bligh 2003-02-25 21:21 ` William Lee Irwin III 1 sibling, 0 replies; 266+ messages in thread From: Martin J. Bligh @ 2003-02-25 21:14 UTC (permalink / raw) To: Chris Wedgwood, William Lee Irwin III, Linux Kernel Mailing List >> Could you help identify the regressions? Profiles? Workload? > > I the OSDL data that Cliff White pointed out sufficient to work-with, > or do you want specific tests run with oprofile outputs? It's a great start, but profiles would really help if you can grab them. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 21:21 ` Chris Wedgwood 2003-02-25 21:14 ` Martin J. Bligh @ 2003-02-25 21:21 ` William Lee Irwin III 2003-02-25 22:08 ` Larry McVoy 1 sibling, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-25 21:21 UTC (permalink / raw) To: Chris Wedgwood Cc: Alan Cox, Larry McVoy, Martin J. Bligh, Linux Kernel Mailing List On Mon, Feb 24, 2003 at 09:26:02PM -0800, William Lee Irwin III wrote: >> Could you help identify the regressions? Profiles? Workload? On Tue, Feb 25, 2003 at 01:21:15PM -0800, Chris Wedgwood wrote: > I the OSDL data that Cliff White pointed out sufficient to work-with, > or do you want specific tests run with oprofile outputs? oprofile is what's needed. Looks like he's taking care of that too. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 21:21 ` William Lee Irwin III @ 2003-02-25 22:08 ` Larry McVoy 2003-02-25 22:10 ` William Lee Irwin III 2003-02-25 22:37 ` Chris Wedgwood 0 siblings, 2 replies; 266+ messages in thread From: Larry McVoy @ 2003-02-25 22:08 UTC (permalink / raw) To: William Lee Irwin III, Chris Wedgwood, Alan Cox, Larry McVoy, Martin J. Bligh, Linux Kernel Mailing List On Tue, Feb 25, 2003 at 01:21:34PM -0800, William Lee Irwin III wrote: > On Mon, Feb 24, 2003 at 09:26:02PM -0800, William Lee Irwin III wrote: > >> Could you help identify the regressions? Profiles? Workload? > > On Tue, Feb 25, 2003 at 01:21:15PM -0800, Chris Wedgwood wrote: > > I the OSDL data that Cliff White pointed out sufficient to work-with, > > or do you want specific tests run with oprofile outputs? > > oprofile is what's needed. Looks like he's taking care of that too. Without doing something about the page coloring problem (and he might be) the numbers will be fairly meaningless. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 22:08 ` Larry McVoy @ 2003-02-25 22:10 ` William Lee Irwin III 2003-02-25 22:37 ` Chris Wedgwood 1 sibling, 0 replies; 266+ messages in thread From: William Lee Irwin III @ 2003-02-25 22:10 UTC (permalink / raw) To: Larry McVoy, Chris Wedgwood, Alan Cox, Larry McVoy, Martin J. Bligh, Linux Kernel Mailing List On Tue, Feb 25, 2003 at 01:21:34PM -0800, William Lee Irwin III wrote: >> oprofile is what's needed. Looks like he's taking care of that too. On Tue, Feb 25, 2003 at 02:08:11PM -0800, Larry McVoy wrote: > Without doing something about the page coloring problem (and he might be) > the numbers will be fairly meaningless. Hmm, point. Let's see if we can get Cliff to apply the new patch that one guy put out yesterday or so. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 22:08 ` Larry McVoy 2003-02-25 22:10 ` William Lee Irwin III @ 2003-02-25 22:37 ` Chris Wedgwood 2003-02-25 22:58 ` Larry McVoy 1 sibling, 1 reply; 266+ messages in thread From: Chris Wedgwood @ 2003-02-25 22:37 UTC (permalink / raw) To: Larry McVoy, William Lee Irwin III, Alan Cox, Larry McVoy, Martin J. Bligh, Linux Kernel Mailing List On Tue, Feb 25, 2003 at 02:08:11PM -0800, Larry McVoy wrote: > Without doing something about the page coloring problem (and he > might be) the numbers will be fairly meaningless. Page coloring problem? I was under the impression that on anything 8-way-associative or better the page coloring improvements were negligible for real-world benchmarks (i.e. kernel compiles) ... or is this more an artifact that even though the improvements for real-world workloads are negligible, micro-benchmarks are susceptible to these variations, thus making things like the std. dev. larger than it would otherwise be? --cw ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 22:37 ` Chris Wedgwood @ 2003-02-25 22:58 ` Larry McVoy 0 siblings, 0 replies; 266+ messages in thread From: Larry McVoy @ 2003-02-25 22:58 UTC (permalink / raw) To: Chris Wedgwood Cc: Larry McVoy, William Lee Irwin III, Alan Cox, Martin J. Bligh, Linux Kernel Mailing List > ... or is this more an artifact that even though the improvements for > real-world are negligible, micro-benchmarks are susceptible to these > variations this making things like the std. dev. larger than it would > otherwise be? Bingo. If you are trying to measure whether something adds cache misses you really want reproducible runs. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
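For reference, here is a rough sketch of what "page coloring" means in this context (illustrative only, not any of the patches under discussion): a physically indexed cache has a fixed number of page-sized "colors", and a coloring allocator hands out physical pages so that consecutive virtual pages cycle through those colors, which makes cache behavior, and therefore microbenchmark numbers, repeatable from run to run.

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Number of page colors in a physically indexed cache of cache_size
 * bytes with assoc ways; two pages of the same color compete for the
 * same cache sets. */
static unsigned long page_color(unsigned long pfn,
                                unsigned long cache_size,
                                unsigned long assoc)
{
        unsigned long colors = cache_size / assoc / PAGE_SIZE;

        return colors ? pfn % colors : 0;
}

int main(void)
{
        /* A 512K 8-way L2 has 16 colors, so pfn 0 and pfn 16 collide. */
        printf("pfn 0 -> color %lu, pfn 16 -> color %lu\n",
               page_color(0, 512 * 1024, 8),
               page_color(16, 512 * 1024, 8));
        return 0;
}

A coloring page allocator would pick, for virtual page number vpn, a free physical page whose color is vpn modulo the number of colors, falling back to any free page when that color's list is empty. As Chris notes above, on a highly associative cache the raw performance win may be small; the reduction in run-to-run variance is the property Larry is after here.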
* Re: Minutes from Feb 21 LSE Call 2003-02-25 5:19 ` Chris Wedgwood 2003-02-25 5:26 ` William Lee Irwin III @ 2003-02-25 6:17 ` Martin J. Bligh 2003-02-25 17:11 ` Cliff White 2003-02-25 21:28 ` William Lee Irwin III 2003-02-25 19:20 ` Alan Cox 2003-02-25 19:59 ` Scott Robert Ladd 3 siblings, 2 replies; 266+ messages in thread From: Martin J. Bligh @ 2003-02-25 6:17 UTC (permalink / raw) To: Chris Wedgwood, Alan Cox; +Cc: Larry McVoy, Linux Kernel Mailing List >> _If_ it harms performance on small boxes. > > You mean like the general slowdown from 2.4 - >2.5? > > It seems to me for small boxes, 2.5.x is margianlly slower at most > things than 2.4.x. Can you name a benchmark, or at least do something reproducible between versions, and produce a 2.4 vs 2.5 profile? Let's at least try to fix it ... M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 6:17 ` Martin J. Bligh @ 2003-02-25 17:11 ` Cliff White 2003-02-25 17:17 ` William Lee Irwin III ` (2 more replies) 2003-02-25 21:28 ` William Lee Irwin III 1 sibling, 3 replies; 266+ messages in thread From: Cliff White @ 2003-02-25 17:11 UTC (permalink / raw) To: Martin J. Bligh Cc: Chris Wedgwood, Alan Cox, Larry McVoy, Linux Kernel Mailing List, cliffw > >> _If_ it harms performance on small boxes. > > > > You mean like the general slowdown from 2.4 - >2.5? > > > > It seems to me for small boxes, 2.5.x is margianlly slower at most > > things than 2.4.x. > > Can you name a benchmark, or at least do something reproducible between > versions, and produce a 2.4 vs 2.5 profile? Let's at least try to fix it ... > > M.

Well, here's one bit of data. Easy enough to do if you have a web browser. LMBench 2.0 on 1-way and 2-way, kernels 2.4.18 and 2.5.60

1-way (stp1-003 stp1-002)
2.4.18 http://khack.osdl.org/stp/7443/
2.5.60 http://khack.osdl.org/stp/265622/

2-way (stp2-003 stp2-000)
2.4.18 http://khack.osdl.org/stp/3165/
2.5.60 http://khack.osdl.org/stp/265643/

Interesting items for me are the fork/exec/sh times and some of the file + VM numbers

LMBench 2.0 Data ( items selected from total of five runs )

Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host      OS            Mhz  null null open selct  sig  sig fork exec   sh
                             call  I/O stat clos   TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
stp2-003. Linux 2.4.18  1000 0.39 0.67 3.89 4.99  30.4 0.93 3.06 344. 1403 4465
stp2-000. Linux 2.5.60  1000 0.41 0.77 4.34 5.57  32.6 1.15 3.59 245. 1406 5795
stp1-003. Linux 2.4.18  1000 0.32 0.46 2.60 3.21  16.6 0.79 2.52 104. 918. 4460
stp1-002. Linux 2.5.60  1000 0.33 0.47 2.83 3.47  16.0 0.94 2.70 143. 1212 5292

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host      OS            2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
stp2-003. Linux 2.4.18  2.680 6.2100   15.8 7.9400  110.7    26.4   111.1
stp2-000. Linux 2.5.60  1.590 5.0700   17.6 7.5800   79.8    11.0   113.6
stp1-003. Linux 2.4.18  0.590 3.4700   11.1 4.8200  134.3    30.8   131.7
stp1-002. Linux 2.5.60  1.000 3.5400   11.2 4.1400  129.6    30.4   127.8

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host      OS            2p/0K  Pipe    AF   UDP  RPC/   TCP  RPC/  TCP
                        ctxsw        UNIX        UDP         TCP  conn
--------- ------------- ----- ----- ----- ----- ----- ----- ----- ----
stp2-003. Linux 2.4.18  2.680 9.071  17.5  26.9  46.2  34.4  60.0 62.9
stp2-000. Linux 2.5.60  1.590 8.414  13.2  21.2  43.2  28.3  54.1 97.1
stp1-003. Linux 2.4.18  0.590 3.623  6.98  11.7  28.2  17.8  38.4 300K
stp1-002. Linux 2.5.60  1.050 4.591  8.54  14.8  31.8  20.0  41.0 67.1

File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host      OS            0K File        10K File      Mmap    Prot  Page
                        Create Delete  Create Delete Latency Fault Fault
--------- ------------- ------ ------  ------ ------ ------- ----- -----
stp2-003. Linux 2.4.18    34.6 7.2490   110.9   17.9  2642.0 0.771 3.00000
stp2-000. Linux 2.5.60    40.0 9.2780   113.3   23.3  4592.0 0.543 3.00000
stp1-003. Linux 2.4.18    28.8 4.8890   107.5   11.3   686.0 0.621 2.00000
stp1-002. Linux 2.5.60    32.4 6.4290   112.9   16.2  1455.0 0.465 2.00000

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host      OS            Pipe   AF  TCP  File   Mmap   Bcopy  Bcopy  Mem   Mem
                              UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
stp2-003. Linux 2.4.18  563. 277. 263.  437.0  552.8  249.1  180.7 553. 215.2
stp2-000. Linux 2.5.60  603. 516. 151.  436.3  549.0  238.0  171.9 548. 233.7
stp1-003. Linux 2.4.18  1009 820. 404.  414.3  467.0  167.2  154.1 466. 236.2
stp1-002. Linux 2.5.60  806. 584. 69.1  408.0  461.7  161.1  149.1 461. 233.5

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
---------------------------------------------------
Host      OS            Mhz  L1 $   L2 $   Main mem Guesses
--------- ------------- ---- ----- ------ -------- -------
stp2-003. Linux 2.4.18  1000 3.464 8.0820    110.9
stp2-000. Linux 2.5.60  1000 3.545 8.2790    110.6
stp1-003. Linux 2.4.18  1000 2.994 6.9850    121.4
stp1-002. Linux 2.5.60  1000 3.023 7.0530    122.5

------------------
cliffw

^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 17:11 ` Cliff White @ 2003-02-25 17:17 ` William Lee Irwin III 2003-02-25 17:38 ` Linus Torvalds 2003-02-25 19:48 ` Martin J. Bligh 2 siblings, 0 replies; 266+ messages in thread From: William Lee Irwin III @ 2003-02-25 17:17 UTC (permalink / raw) To: Cliff White Cc: Martin J. Bligh, Chris Wedgwood, Alan Cox, Larry McVoy, Linux Kernel Mailing List On Tue, Feb 25, 2003 at 09:11:38AM -0800, Cliff White wrote: > Interesting items for me are the fork/exec/sh times and some of the file + VM > numbers > LMBench 2.0 Data ( items selected from total of five runs ) Okay, got profiles for the individual tests you're interested in? Also, what are the statistical significance cutoffs? -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 17:11 ` Cliff White 2003-02-25 17:17 ` William Lee Irwin III @ 2003-02-25 17:38 ` Linus Torvalds 2003-02-25 19:54 ` Dave Jones 2003-02-25 19:48 ` Martin J. Bligh 2 siblings, 1 reply; 266+ messages in thread From: Linus Torvalds @ 2003-02-25 17:38 UTC (permalink / raw) To: linux-kernel In article <200302251711.h1PHBct16624@mail.osdl.org>, Cliff White <cliffw@osdl.org> wrote:
>
>Well, here's one bit of data. Easy enough to do if you have a web browser.
>LMBench 2.0 on 1-way and 2-way, kernels 2.4.18 and 2.5.60
>1-way (stp1-003 stp1-002)
>2.4.18 http://khack.osdl.org/stp/7443/
>2.5.60 http://khack.osdl.org/stp/265622/
>
>2-way (stp2-003 stp2-000)
>2.4.18 http://khack.osdl.org/stp/3165/
>2.5.60 http://khack.osdl.org/stp/265643/
>
>Interesting items for me are the fork/exec/sh times and some of the file + VM
>numbers
>LMBench 2.0 Data ( items selected from total of five runs )
>
>Processor, Processes - times in microseconds - smaller is better
>----------------------------------------------------------------
>Host      OS            Mhz  null null open selct  sig  sig fork exec   sh
>                             call  I/O stat clos   TCP inst hndl proc proc proc
>--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
>stp2-003. Linux 2.4.18  1000 0.39 0.67 3.89 4.99  30.4 0.93 3.06 344. 1403 4465
>stp2-000. Linux 2.5.60  1000 0.41 0.77 4.34 5.57  32.6 1.15 3.59 245. 1406 5795

Note that those numbers will look quite different (at least on a P4) if you use a modern library that uses the "sysenter" stuff. The difference ends up being something like this:

Host      OS            Mhz  null null open selct  sig  sig fork exec   sh
                             call  I/O stat clos       inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
i686-linu Linux 2.5.30  2380  0.8  1.1    3    5 0.04K  1.1    3 0.2K   1K   3K
i686-linu Linux 2.5.62  2380  0.2  0.6    3    4 0.04K  0.7    3 0.2K   1K   3K

(Yeah, I've never run a 2.4.x kernel on this machine, so..)

In other words, the system call has been speeded up quite noticeably. Yes, if you don't take advantage of sysenter, then all the sysenter support will just make us look worse ;(

I'm surprised by your "sh proc" changes, they are quite big. I guess it's rmap and highmem that bites us, and yes, we've gotten slower there.

Linus ^ permalink raw reply [flat|nested] 266+ messages in thread
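For reference, numbers like the "null call" column above come from timing loops along these lines (a rough sketch, not LMbench's actual harness; getppid() as the cheap call and the iteration count are my choices). Whether the loop ends up going through int $0x80 or sysenter is decided by the C library, which is Linus's point about needing a modern library to see the improvement.

#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
        struct timeval start, end;
        long i, iters = 1000000;
        double usecs;

        /* Time a large number of cheap system calls and divide; the
         * real harness is more careful about loop and timer overhead. */
        gettimeofday(&start, NULL);
        for (i = 0; i < iters; i++)
                getppid();
        gettimeofday(&end, NULL);

        usecs = (end.tv_sec - start.tv_sec) * 1e6 +
                (end.tv_usec - start.tv_usec);
        printf("null call: %.3f usec\n", usecs / iters);
        return 0;
}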
* Re: Minutes from Feb 21 LSE Call 2003-02-25 17:38 ` Linus Torvalds @ 2003-02-25 19:54 ` Dave Jones 2003-02-26 2:04 ` Linus Torvalds 0 siblings, 1 reply; 266+ messages in thread From: Dave Jones @ 2003-02-25 19:54 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel On Tue, Feb 25, 2003 at 05:38:31PM +0000, Linus Torvalds wrote: > Yes, if you don't take advantage of sysenter, then all the sysenter > support will just make us look worse ;( Andi's patch[1] to remove one of the wrmsr's from the context switch fast path should win back at least some of the lost microbenchmark points. (Full info at http://bugzilla.kernel.org/show_bug.cgi?id=350) Dave [1] http://bugzilla.kernel.org/attachment.cgi?id=140&action=view ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 19:54 ` Dave Jones @ 2003-02-26 2:04 ` Linus Torvalds 0 siblings, 0 replies; 266+ messages in thread From: Linus Torvalds @ 2003-02-26 2:04 UTC (permalink / raw) To: Dave Jones; +Cc: linux-kernel On Tue, 25 Feb 2003, Dave Jones wrote: > > > Yes, if you don't take advantage of sysenter, then all the sysenter > > support will just make us look worse ;( > > Andi's patch[1] to remove one of the wrmsr's from the context switch > fast path should win back at least some of the lost microbenchmark > points. But the patch is fundamentally broken wrt preemption at least, and it looks totally unfixable. It's also overly complex, for no apparent reason. The simple way to avoid the wrmsr of SYSENTER_CS is to just cache a per-cpu copy in memory, preferably in some location that is already in the cache at context switch time for other reasons. Linus ^ permalink raw reply [flat|nested] 266+ messages in thread
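A minimal sketch of the caching approach Linus describes, assuming kernel context (<asm/msr.h> and friends); the names and data layout here are made up for illustration and this is not the actual patch:

/* Keep a per-CPU shadow of the last value written to the SYSENTER_CS
 * MSR and only issue the expensive wrmsr when the value really has to
 * change at context-switch time.  Placing the shadow somewhere that is
 * already touched during a context switch keeps the compare nearly free. */
struct sysenter_shadow {
        unsigned int last_sysenter_cs;
} ____cacheline_aligned;

static struct sysenter_shadow sysenter_shadow[NR_CPUS];

static inline void set_sysenter_cs(int cpu, unsigned int cs)
{
        if (sysenter_shadow[cpu].last_sysenter_cs != cs) {
                sysenter_shadow[cpu].last_sysenter_cs = cs;
                wrmsr(MSR_IA32_SYSENTER_CS, cs, 0);
        }
}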
* Re: Minutes from Feb 21 LSE Call 2003-02-25 17:11 ` Cliff White 2003-02-25 17:17 ` William Lee Irwin III 2003-02-25 17:38 ` Linus Torvalds @ 2003-02-25 19:48 ` Martin J. Bligh 2 siblings, 0 replies; 266+ messages in thread From: Martin J. Bligh @ 2003-02-25 19:48 UTC (permalink / raw) To: Cliff White Cc: Chris Wedgwood, Alan Cox, Larry McVoy, Linux Kernel Mailing List > Interesting items for me are the fork/exec/sh times and some of the file > + VM numbers For the ones where you see degradation in fork/exec type stuff, any chance you could rerun them with 62-mjb3 with the objrmap stuff in it? That should fix a lot of the overhead. Thanks, M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 6:17 ` Martin J. Bligh 2003-02-25 17:11 ` Cliff White @ 2003-02-25 21:28 ` William Lee Irwin III 1 sibling, 0 replies; 266+ messages in thread From: William Lee Irwin III @ 2003-02-25 21:28 UTC (permalink / raw) To: Martin J. Bligh Cc: Chris Wedgwood, Alan Cox, Larry McVoy, Linux Kernel Mailing List At some point in the past, Chris Wedgewood wrote: >> It seems to me for small boxes, 2.5.x is margianlly slower at most >> things than 2.4.x. On Mon, Feb 24, 2003 at 10:17:05PM -0800, Martin J. Bligh wrote: > Can you name a benchmark, or at least do something reproducible between > versions, and produce a 2.4 vs 2.5 profile? Let's at least try to fix it ... Looks like Cliff's got some good data. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 5:19 ` Chris Wedgwood 2003-02-25 5:26 ` William Lee Irwin III 2003-02-25 6:17 ` Martin J. Bligh @ 2003-02-25 19:20 ` Alan Cox 2003-02-25 19:59 ` Scott Robert Ladd 3 siblings, 0 replies; 266+ messages in thread From: Alan Cox @ 2003-02-25 19:20 UTC (permalink / raw) To: Chris Wedgwood; +Cc: Larry McVoy, Martin J. Bligh, Linux Kernel Mailing List On Tue, 2003-02-25 at 05:19, Chris Wedgwood wrote: > > The definitive Linux box appears to be $199 from Walmart right now, > > and its not SMP. > > In two year this kind of hardware probably will be SMP (HT or some > variant). Not if it costs money. If the cheapest reasonable x86 cpu is one that has chosen to avoid HT and SMP it won't have HT and SMP. Think 4xUSB2 connectors, brick PSU and no user adjustable components. ^ permalink raw reply [flat|nested] 266+ messages in thread
* RE: Minutes from Feb 21 LSE Call 2003-02-25 5:19 ` Chris Wedgwood ` (2 preceding siblings ...) 2003-02-25 19:20 ` Alan Cox @ 2003-02-25 19:59 ` Scott Robert Ladd 2003-02-25 20:18 ` jlnance 2003-02-25 21:19 ` Chris Wedgwood 3 siblings, 2 replies; 266+ messages in thread From: Scott Robert Ladd @ 2003-02-25 19:59 UTC (permalink / raw) To: Chris Wedgwood, Alan Cox Cc: Larry McVoy, Martin J. Bligh, Linux Kernel Mailing List Chris Wedgwood wrote: > > The definitive Linux box appears to be $199 from Walmart right now, > > and its not SMP. > > In two year this kind of hardware probably will be SMP (HT or some > variant). HT is not the same thing as SMP; while the chip may appear to be two processors, it is actually equivalent to 1.1 to 1.3 processors, depending on the application. Multicore processors and true SMP systems are unlikely to become mainstream consumer items, given the premium price charged for such systems. That given, I see some value in a stripped-down, low-overhead, consumer-focused Linux that targets uniprocessor and HT systems, to be used in the typical business or gaming PC. I'm not sure such is achievable with the current config options; perhaps I should try to see how small a kernel I can build for a simple ia32 system... ..Scott Scott Robert Ladd Coyote Gulch Productions (http://www.coyotegulch.com) Professional programming for science and engineering; Interesting and unusual bits of very free code. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 19:59 ` Scott Robert Ladd @ 2003-02-25 20:18 ` jlnance 2003-02-25 20:59 ` Scott Robert Ladd 0 siblings, 1 reply; 266+ messages in thread From: jlnance @ 2003-02-25 20:18 UTC (permalink / raw) To: linux-kernel On Tue, Feb 25, 2003 at 02:59:05PM -0500, Scott Robert Ladd wrote: > > In two year this kind of hardware probably will be SMP (HT or some > > variant). > > HT is not the same thing as SMP; while the chip may appear to be two > processors, it is actually equivalent 1.1 to 1.3 processors, depending on > the application. > > Multicore processors and true SMP systems are unlikely to become mainstream > consumer items, given the premium price charged for such systems. I think the difference between SMP and HT is likely to decrease rather than increase in the future. Even now people want to put multiple CPUs on the same piece of silicon. Once you do that it only makes sense to start sharing things between them. If you had a system with 2 CPUs which shared a common L1 cache is that going to be an HT or an SMP system? Or you could go further and have 2 CPUs which share an FPU. There are all sorts of combinations you could come up with. I think designers will experiment and find the one that gives the most throughput for the least money. Jim ^ permalink raw reply [flat|nested] 266+ messages in thread
* RE: Minutes from Feb 21 LSE Call 2003-02-25 20:18 ` jlnance @ 2003-02-25 20:59 ` Scott Robert Ladd 0 siblings, 0 replies; 266+ messages in thread From: Scott Robert Ladd @ 2003-02-25 20:59 UTC (permalink / raw) To: jlnance, linux-kernel jlnance@unity.ncsu.edu wrote: > I think the difference between SMP and HT is likely to decrease rather > than increase in the future. Even now people want to put multiple CPUs > on the same piece of silicon. Once you do that it only makes sense to > start sharning things between them. If you had a system with 2 CPUs > which shared a common L1 cache is that going to be a HT or an SMP system? > Or you could go further and have 2 CPUs which share an FPU. There are > all sorts of combinations you could come up with. I think designers > will experiment and find the one that gives the most throughput for > the least money. IBM's forthcoming Power5 will have two cores, each with SMT (the generic term for HyperThreading); it will present itself to the OS as four processors. Those four processors, however, are not equal; SMT is certainly valuable, but it can only be as effective as multiple cores if it in effect *becomes* multiple cores (and, as such, turns into SMP). I'm writing a chapter on memory architectures in my parallel programming book; it's giving me a bit of a headache, as the issues you raise are both important and complex. We have multiple levels of caches, NUMA architectures, clusters, SMP, HT... the list just goes on and on, infinite in diversity and combinations. Vendors will continue to experiment; I doubt very much that any one architecture will take center stage. I hope Linux handles the brain-sprain better than I am at the moment! ;) ..Scott Scott Robert Ladd Coyote Gulch Productions (http://www.coyotegulch.com) ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 19:59 ` Scott Robert Ladd 2003-02-25 20:18 ` jlnance @ 2003-02-25 21:19 ` Chris Wedgwood 2003-02-25 21:38 ` Scott Robert Ladd 1 sibling, 1 reply; 266+ messages in thread From: Chris Wedgwood @ 2003-02-25 21:19 UTC (permalink / raw) To: Scott Robert Ladd Cc: Alan Cox, Larry McVoy, Martin J. Bligh, Linux Kernel Mailing List On Tue, Feb 25, 2003 at 02:59:05PM -0500, Scott Robert Ladd wrote: > HT is not the same thing as SMP; while the chip may appear to be two > processors, it is actually equivalent 1.1 to 1.3 processors, > depending on the application. You can't have non-integer numbers of processors. HT is a hack that makes what appears to be two processors using common silicon. The fact that it's slower than a real dual CPU box is irrelevant in some sense; you still need SMP smarts to deal with it. It's only important when you want to know why performance increases aren't apparent or you lose performance in some cases... (i.e. the other virtual CPU thrashing the cache). > Multicore processors and true SMP systems are unlikely to become > mainstream consumer items, given the premium price charged for such > systems. I overstated things thinking SMP/HT would be in low-end hardware given two years. As Alan pointed out, since the 'Walmart' class hardware is 'whatever is cheapest' then perhaps HT/SMT/whatever won't be commonplace for super-low end boxes in two years --- but I would be surprised if it didn't gain considerable market share elsewhere. > That given, I see some value in a stripped-down, low-overhead, > consumer-focused Linux that targets uniprocessor and HT systems, to > be used in the typical business or gaming PC. UP != HT. HT is SMP with magic requirements. For multiple physical CPUs the requirements become even more complex; you want to try to group tasks to physical CPUs, not logical ones, lest you thrash the cache. Presumably there are other tweaks possible too; cache lines don't bounce between logical CPUs on a physical CPU, for example, so some locks and other data structures will be much faster to access than those which actually do need cache lines to migrate between different physical CPUs. I'm not sure if these specific properties can be exploited in the general case though. > I'm not sure such is achievable with the current config options; > perhaps I should try to see how small a kernel I can build for a > simple ia32 system... Present 2.5.x looks like it will have smarts for HT as a subset of NUMA. If HT does become more common and similar things abound, I'm not sure if it even makes sense to have a UP kernel for certain platforms and/or CPUs --- since a mere BIOS change will affect what is 'virtually' apparent to the OS. --cw ^ permalink raw reply [flat|nested] 266+ messages in thread
* RE: Minutes from Feb 21 LSE Call 2003-02-25 21:19 ` Chris Wedgwood @ 2003-02-25 21:38 ` Scott Robert Ladd 0 siblings, 0 replies; 266+ messages in thread From: Scott Robert Ladd @ 2003-02-25 21:38 UTC (permalink / raw) To: Chris Wedgwood Cc: Alan Cox, Larry McVoy, Martin J. Bligh, Linux Kernel Mailing List Chris Wedgwood wrote: SRL>HT is not the same thing as SMP; while the chip may appear to be SRL>two processors, it is actually equivalent 1.1 to 1.3 processors, SRL>depending on the application. > CW> You can't have non-integer numbers of processors. HT is a hack CW> that makes what appears to be two processors using common CW> silicon. I'm aware of that. ;) I'm well aware of the architecture needed to support HT. > The fact it's slower than a really dual CPU box is irrelevant in some > sense, you still need SMP smart to deal with it; it's only important > when you want to know why performance increases aren't apparent or you > loose performance in some cases... (ie. other virtual CPU thrashing > the cache). Performance differences *are* quite relevant when it comes to thread scheduling; the two virtual CPUs are not necessarily equivalent in performance. > As Alan pointed out, since the 'Walmart' class hardware is 'whatever > is cheapest' then perhaps HT/SMT/whatever won't be common place for > super-low end boxes in two years --- but I would be surprised if it > didn't gain considerable market share elsewhere. I suspect HT/SMT will be common for people who have multimedia systems, for video editing and high-end gaming. I doubt we'll see SMT toasters, though. > UP != HT An HT system is still a single, physical processor; HT is not equivalent to a multicore chip, either. Much depends on memory and connection models; a dual-core chip may be faster or slower than two similar physical SMP processors, depending on the architecture. I was speaking in terms of Intel's push to add HT to all of their P4s. Systems with a single CPU will likely have HT; that still doesn't make them as powerful as a true dual processor (or dual core CPU) system. > HT is SMP with magic requirements. For multiple physical CPUs the > requirements become even more complex; you want to try to group tasks > to physical CPUs, not logical ones lest you thrash the cache. Exactly. This is why HT is not the same thing as two physical CPUs. The OS must be aware of this to effectively schedule jobs. So I think we generally agree. > If HT does become more common and similar things abound, I'm not sure > if it even makes sense to have a UP kernel for certain platforms > and/or CPUs --- since a mere BIOS change will affect what is > 'virtually' apparent to the OS. A good point. ..Scott ^ permalink raw reply [flat|nested] 266+ messages in thread
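To put the "group tasks onto physical CPUs, not logical ones" point from the exchange above into code, here is a small self-contained sketch (not the 2.5 scheduler; the topology representation and pick_cpu() are invented for the example): an idle logical CPU on a fully idle package is preferred over an idle sibling that shares a package with a busy logical CPU.

#include <stdio.h>

struct logical_cpu {
        int package;    /* physical package this logical CPU lives on */
        int busy;       /* nonzero if something is running here */
};

static int pick_cpu(const struct logical_cpu *cpu, int ncpus)
{
        int i, j;

        /* First choice: a logical CPU whose whole package is idle. */
        for (i = 0; i < ncpus; i++) {
                int package_idle = 1;

                for (j = 0; j < ncpus; j++)
                        if (cpu[j].package == cpu[i].package && cpu[j].busy)
                                package_idle = 0;
                if (!cpu[i].busy && package_idle)
                        return i;
        }
        /* Second choice: any idle sibling, even on a busy package. */
        for (i = 0; i < ncpus; i++)
                if (!cpu[i].busy)
                        return i;
        return -1;      /* everything is busy */
}

int main(void)
{
        /* Two packages with two siblings each; CPU 1 (package 0) is busy,
         * so CPU 2 or 3 on the idle package is chosen over CPU 0. */
        struct logical_cpu cpu[4] = {
                { 0, 0 }, { 0, 1 }, { 1, 0 }, { 1, 0 },
        };

        printf("picked cpu %d\n", pick_cpu(cpu, 4));
        return 0;
}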
* Re: Minutes from Feb 21 LSE Call 2003-02-24 6:58 ` Larry McVoy ` (2 preceding siblings ...) 2003-02-24 13:28 ` Alan Cox @ 2003-02-24 18:44 ` Davide Libenzi 3 siblings, 0 replies; 266+ messages in thread From: Davide Libenzi @ 2003-02-24 18:44 UTC (permalink / raw) To: Larry McVoy; +Cc: Linux Kernel Mailing List On Sun, 23 Feb 2003, Larry McVoy wrote: > > Because I don't see why I should waste my time running benchmarks just to > > prove you wrong. I don't respect you that much, and it seems the > > maintainers don't either. When you become somebody with the stature in the > > Linux community of, say, Linus or Andrew I'd be prepared to spend a lot > > more time running benchmarks on any concerns you might have. > > Who cares if you respect me, what does that have to do with proper > engineering? Do you think that I'm the only person who wants to see > numbers? You think Linus doesn't care about this? Maybe you missed > the whole IA32 vs IA64 instruction cache thread. It sure sounded like > he cares. How about Alan? He stepped up and pointed out that less > is more. How about Mark? He knows a thing or two about the topic? > In fact, I think you'd be hard pressed to find anyone who wouldn't be > interested in seeing the cache effects of a patch. > > People care about performance, both scaling up and scaling down. A lot of > performance changes are measured poorly, in a way that makes the changes > look good but doesn't expose the hidden costs of the change. What I'm > saying is that those sorts of measurements screwed over performance in > the past, why are you trying to repeat old mistakes? Larry, how many times has this kind of discussion gone on during the last few years? I think you should remember pretty well, because it was always you on that side of the river pushing back "Barbarians" with your UP sword. The point is that people ( especially young ones ) like to dig where others failed; it's normal. It's attractive like honey for bears. Let them try; many will fail, but chances are that someone will succeed, making it worth the try. And trust Linus, who is more on your wavelength than on the huge scalability one. - Davide ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 23:15 ` Larry McVoy 2003-02-22 23:23 ` Christoph Hellwig 2003-02-22 23:44 ` Martin J. Bligh @ 2003-02-22 23:57 ` Jeff Garzik 2003-02-23 23:57 ` Bill Davidsen 3 siblings, 0 replies; 266+ messages in thread From: Jeff Garzik @ 2003-02-22 23:57 UTC (permalink / raw) To: Larry McVoy, Martin J. Bligh, Larry McVoy, Mark Hahn, David S. Miller, linux-kernel On Sat, Feb 22, 2003 at 03:15:52PM -0800, Larry McVoy wrote: > or rackmount cases. I fail to see how there are better margins on the > same hardware in a rackmount box for $800 when the desktop costs $750. > Those rack mount power supplies and cases are not as cheap as the desktop > ones, so I see no difference in the margins. Oh, it's definitely different hardware. Maybe the 16550-related portion of the ASIC is the same :) but just do an lspci to see huge differences in motherboard chipsets, on-board parts, more complicated BIOS, remote management bells and whistles, etc. Even the low-end rackmounts. But the better margins come simply from the mentality, IMO. Desktops just aren't "as important" to a business compared to servers, so IT shops are willing to spend more money to not only get better hardware, but also the support services that accompany it. Selling servers to enterprise data centers means bigger, more concentrated cash pool. Jeff ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 23:15 ` Larry McVoy ` (2 preceding siblings ...) 2003-02-22 23:57 ` Jeff Garzik @ 2003-02-23 23:57 ` Bill Davidsen 2003-02-24 6:22 ` Val Henson 3 siblings, 1 reply; 266+ messages in thread From: Bill Davidsen @ 2003-02-23 23:57 UTC (permalink / raw) To: Larry McVoy; +Cc: Linux Kernel Mailing List On Sat, 22 Feb 2003, Larry McVoy wrote: > > We would never try to propose such a change, and never have. > > Name a scalability change that's hurt the performance of UP by 5%. > > There isn't one. > > This is *exactly* the reasoning that every OS marketing weenie has used > for the last 20 years to justify their "feature" of the week. > > The road to slow bloated code is paved one cache miss at a time. You > may quote me on that. In fact, print it out and put it above your > monitor and look at it every day. One cache miss at a time. How much > does one cache miss add to any benchmark? .001%? Less. > > But your pet features didn't slow the system down. Nope, they just made > the cache smaller, which you didn't notice because whatever artificial > benchmark you ran didn't happen to need the whole cache. Clearly this is the case, the benefit of a change must balance the negative effects. Making the code paths longer hurts free cache, having more of them should not. More code is not always slower code, and doesn't always have more impact on cache use. You identify something which must be considered, but it's not the only thing to consider. Linux shouild be stable, not moribund. > You need to understand that system resources belong to the user. Not the > kernel. The goal is to have all of the kernel code running under any > load be less than 1% of the CPU. Your 5% number up there would pretty > much double the amount of time we spend in the kernel for most workloads. Who profits? For most users a bit more system time resulting in better disk performance would be a win, or at least non-lose. This isn't black and white. On Sat, 22 Feb 2003, Larry McVoy wrote: > Let's get back to your position. You want to shovel stuff in the kernel > for the benefit of the 32 way / 64 way etc boxes. I don't see that as > wise. You could prove me wrong. Here's how you do it: go get oprofile > or whatever that tool is which lets you run apps and count cache misses. > Start including before/after runs of each microbench in lmbench and > some time sharing loads with and without your changes. When you can do > that and you don't add any more bus traffic, you're a genius and > I'll shut up. Code only costs when it's executed. Linux is somewhat heading to the place where a distro has a few useful configs and then people who care for the last bit of whatever they see as a bottleneck can build their own fro "make config." So it is possible to add features for big machines without any impact on the builds which don't use the features. it goes without saying that this is hard. I would guess that it results in more bugs as well, if one path or another is "the less-traveled way." > > But that's a false promise because by definition, fine grained threading > adds more bus traffic. It's kind of hard to not have that happen, the > caches have to stay coherent somehow. Clearly. And things which require more locking will pay some penalty for this. But a quick scan of this list on keyword "lockless' will show that people are thinking about this. I don't think developers will buy ignoring part of the market to completely optimize for another. 
Linux will grow by being ubiquitous, not by winning some battle and losing the war. It's not a niche-market OS. -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 23:57 ` Bill Davidsen @ 2003-02-24 6:22 ` Val Henson 2003-02-24 6:41 ` William Lee Irwin III 0 siblings, 1 reply; 266+ messages in thread From: Val Henson @ 2003-02-24 6:22 UTC (permalink / raw) To: Bill Davidsen; +Cc: Larry McVoy, Linux Kernel Mailing List On Sun, Feb 23, 2003 at 06:57:09PM -0500, Bill Davidsen wrote: > On Sat, 22 Feb 2003, Larry McVoy wrote: > > > > But that's a false promise because by definition, fine grained threading > > adds more bus traffic. It's kind of hard to not have that happen, the > > caches have to stay coherent somehow. > > Clearly. And things which require more locking will pay some penalty for > this. But a quick scan of this list on keyword "lockless' will show that > people are thinking about this. Lockless algorithms still generate bus traffic when you do the atomic compare-and-swap or load-linked or whatever hardware instruction you use to implement your lockless algorithm. Caches still have to stay coherent, lock or no lock. -VAL ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 6:22 ` Val Henson @ 2003-02-24 6:41 ` William Lee Irwin III 0 siblings, 0 replies; 266+ messages in thread From: William Lee Irwin III @ 2003-02-24 6:41 UTC (permalink / raw) To: Val Henson; +Cc: Bill Davidsen, Larry McVoy, Linux Kernel Mailing List On Sun, Feb 23, 2003 at 06:57:09PM -0500, Bill Davidsen wrote: >> Clearly. And things which require more locking will pay some penalty for >> this. But a quick scan of this list on keyword "lockless' will show that >> people are thinking about this. On Sun, Feb 23, 2003 at 11:22:30PM -0700, Val Henson wrote: > Lockless algorithms still generate bus traffic when you do the atomic > compare-and-swap or load-linked or whatever hardware instruction you > use to implement your lockless algorithm. Caches still have to stay > coherent, lock or no lock. Not all lockless algorithms operate on the "access everything with atomic operations" principle. RCU, for example, uses no atomic operations on the read side, which is actually fewer atomic operations than standard rwlocks use for the read side. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
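A sketch of the contrast being drawn in the last two messages, assuming kernel context; struct item, the_list, list_lock, and the find_item*()/use_item() helpers are hypothetical stand-ins:

/* rwlock read side: two atomic operations on a shared lock word, so
 * every reader generates cache-coherency write traffic. */
static void lookup_with_rwlock(int key)
{
        struct item *it;

        read_lock(&list_lock);
        it = find_item(&the_list, key);
        use_item(it);
        read_unlock(&list_lock);
}

/* RCU read side: no atomic operations and no writes to shared lines;
 * writers update a copy and free the old version only after a grace
 * period (e.g. via call_rcu()), so readers never block them either. */
static void lookup_with_rcu(int key)
{
        struct item *it;

        rcu_read_lock();
        it = find_item_rcu(&the_list, key);
        use_item(it);
        rcu_read_unlock();
}

This is the sense in which the read side adds no bus traffic; the write side still pays for coherency, which is where Val's point about atomic operations continues to apply.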
* Re: Minutes from Feb 21 LSE Call 2003-02-22 19:56 ` Larry McVoy 2003-02-22 20:24 ` William Lee Irwin III 2003-02-22 21:02 ` Martin J. Bligh @ 2003-02-22 21:29 ` Jeff Garzik 2 siblings, 0 replies; 266+ messages in thread From: Jeff Garzik @ 2003-02-22 21:29 UTC (permalink / raw) To: Larry McVoy, Martin J. Bligh, Mark Hahn, David S. Miller, Larry McVoy, linux-kernel Oh, come on :) It's all vague handwaving because people either don't know real numbers, or sure as heck won't post them on a public list... Jeff ^ permalink raw reply [flat|nested] 266+ messages in thread
* Minutes from Feb 21 LSE Call @ 2003-02-21 23:48 Hanna Linder 2003-02-22 0:16 ` Larry McVoy ` (2 more replies) 0 siblings, 3 replies; 266+ messages in thread From: Hanna Linder @ 2003-02-21 23:48 UTC (permalink / raw) To: lse-tech; +Cc: linux-kernel LSE Con Call Minutes from Feb21 Minutes compiled by Hanna Linder hannal@us.ibm.com, please post corrections to lse-tech@lists.sf.net. Object Based Reverse Mapping: (Dave McCracken, Ben LaHaise, Rik van Riel, Martin Bligh, Gerrit Huizenga) Dave coded up an initial patch for partial object based rmap which he sent to linux-mm yesterday. Rik pointed out there is a scalability problem with the full object based approach. However, a hybrid approach between regular rmap and object based may not be too radical for the 2.5/2.6 timeframe. Ben said none of the users have been complaining about performance with the existing rmap. Martin disagreed and said Linus, Andrew Morton and himself have all agreed there is a problem. One of the problems Martin is already hitting on high cpu machines with large memory is the space consumption by all the pte-chains filling up memory and killing the machine. There is also a performance impact of maintaining the chains. Ben said they shouldn't be using fork and bash is the main user of fork and should be changed to use clone instead. Gerrit said bash is not used as much as Ben might think on these large systems running real world applications. Ben said he doesn't see the large systems problems with the users he talks to and doesn't agree the full object based rmap is needed. Gerrit explained we have very complex workloads running on very large systems and we are already hitting the space consumption problem which is a blocker for running Linux on them. Ben said none of the distros are supporting these large systems right now. Martin said UL is already starting to support them. Then it degraded into a distro discussion and Hanna asked for them to bring it back to the technical side. In order to show the problem with object based rmap you have to add vm pressure to existing benchmarks to see what happens. Martin agreed to run multiple benchmarks on the same systems to simulate this. Cliff White of the OSDL offered to help Martin with this. At the end Ben said the solution for now needs to be a hybrid with existing rmap. Martin, Rik, and Dave all agreed with Ben. Then we all agreed to move on to other things. *ActionItem - someone needs to change bash to use clone instead of fork. Scheduler Hang as discovered by restarting a large Web application multiple times: Rick Lindsley / Hanna Linder We were seeing a hard hang after restarting a large web serving application 3-6 times on the 2.5.59 (and up) kernels (also seen as far back as 2.5.44). It was mainly caused when two threads each have interrupts disabled and one is spinning on a lock that the other is holding. The one holding the lock has sent an IPI to all the other processors telling them to flush their TLBs. But the one waiting for the spinlock has interrupts turned off and does not receive that IPI request. So they both sit there waiting forever. The final fix will be in kernel.org mainline kernel version 2.5.63. Here are the individual patches which should apply with fuzz to older kernel versions: http://linux.bkbits.net:8080/linux-2.5/cset@1.1005?nav=index.html http://linux.bkbits.net:8080/linux-2.5/cset@1.1004?nav=index.html Shared Memory Binding : Matt Dobson - Shared memory binding API (new). A way for an application to bind shared memory to Nodes.
Motivation is support for large databases that want more control over their shared memory. The current allocation scheme is that each process gets a chunk of shared memory from the same node the process is located on. Instead of page faulting around to different nodes dynamically, this API will allow a process to specify which node or set of nodes to bind the shared memory to. Work in progress. Martin - gcc 2.95 vs 3.2. Martin has done some testing which indicates that gcc 3.2 produces slightly worse code for the kernel than 2.95 and takes a bit longer to do so. gcc 3.2 -Os produces larger code than gcc 2.95 -O2. On his machines -O2 was faster than -Os, but on a cpu with smaller caches the inverse may be true. More testing may be needed. ^ permalink raw reply [flat|nested] 266+ messages in thread
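The scheduler-hang item in the minutes above is a classic IPI deadlock; a stripped-down sketch of the pattern (hypothetical names, kernel context assumed, not the actual 2.5.59 code) looks like this:

static spinlock_t mm_lock = SPIN_LOCK_UNLOCKED;

/* CPU A: takes the lock with interrupts off, then spins waiting for
 * every other CPU to acknowledge a TLB-flush IPI. */
static void cpu_a_path(void)
{
        unsigned long flags;

        spin_lock_irqsave(&mm_lock, flags);
        send_tlb_flush_ipi_and_wait();  /* hypothetical: send IPI, wait for acks */
        spin_unlock_irqrestore(&mm_lock, flags);
}

/* CPU B: also has interrupts off while it spins for the same lock, so
 * it never handles the IPI, CPU A never gets its ack, and neither CPU
 * makes progress. */
static void cpu_b_path(void)
{
        unsigned long flags;

        spin_lock_irqsave(&mm_lock, flags);
        /* ... */
        spin_unlock_irqrestore(&mm_lock, flags);
}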
* Re: Minutes from Feb 21 LSE Call 2003-02-21 23:48 Hanna Linder @ 2003-02-22 0:16 ` Larry McVoy 2003-02-22 0:25 ` William Lee Irwin III ` (4 more replies) 2003-02-23 0:42 ` Eric W. Biederman 2003-02-23 3:24 ` Andrew Morton 2 siblings, 5 replies; 266+ messages in thread From: Larry McVoy @ 2003-02-22 0:16 UTC (permalink / raw) To: Hanna Linder; +Cc: lse-tech, linux-kernel > Ben said none of the distros are supporting these large > systems right now. Martin said UL is already starting to support > them. Ben is right. I think IBM and the other big iron companies would be far better served looking at what they have done with running multiple instances of Linux on one big machine, like the 390 work. Figure out how to use that model to scale up. There is simply not a big enough market to justify shoveling lots of scaling stuff in for huge machines that only a handful of people can afford. That's the same path which has sunk all the workstation companies, they all have bloated OS's and Linux runs circles around them. In terms of the money and in terms of installed seats, the small Linux machines out number the 4 or more CPU SMP machines easily 10,000:1. And with the embedded market being one of the few real money makers for Linux, there will be huge pushback from those companies against changes which increase memory footprint. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 0:16 ` Larry McVoy @ 2003-02-22 0:25 ` William Lee Irwin III 2003-02-22 2:24 ` Steven Cole 2003-02-22 0:44 ` Martin J. Bligh ` (3 subsequent siblings) 4 siblings, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-22 0:25 UTC (permalink / raw) To: Larry McVoy, Hanna Linder, lse-tech, linux-kernel On Fri, Feb 21, 2003 at 04:16:18PM -0800, Larry McVoy wrote: > Ben is right. I think IBM and the other big iron companies would be > far better served looking at what they have done with running multiple > instances of Linux on one big machine, like the 390 work. Figure out > how to use that model to scale up. There is simply not a big enough > market to justify shoveling lots of scaling stuff in for huge machines > that only a handful of people can afford. That's the same path which > has sunk all the workstation companies, they all have bloated OS's and > Linux runs circles around them. Scalability done properly should not degrade performance on smaller machines, Pee Cees, or even microscopic organisms. On Fri, Feb 21, 2003 at 04:16:18PM -0800, Larry McVoy wrote: > In terms of the money and in terms of installed seats, the small Linux > machines out number the 4 or more CPU SMP machines easily 10,000:1. > And with the embedded market being one of the few real money makers > for Linux, there will be huge pushback from those companies against > changes which increase memory footprint. There's quite a bit of commonality with large x86 highmem there, as the highmem crew is extremely concerned about the kernel's memory footprint and is looking to trim kernel memory overhead from every aspect of its operation they can. Reducing kernel memory footprint is a crucial part of scalability, in both scaling down to the low end and scaling up to highmem. =) -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 0:25 ` William Lee Irwin III @ 2003-02-22 2:24 ` Steven Cole 0 siblings, 0 replies; 266+ messages in thread From: Steven Cole @ 2003-02-22 2:24 UTC (permalink / raw) To: William Lee Irwin III; +Cc: Larry McVoy, Hanna Linder, lse-tech, LKML On Fri, 2003-02-21 at 17:25, William Lee Irwin III wrote: > On Fri, Feb 21, 2003 at 04:16:18PM -0800, Larry McVoy wrote: > > Ben is right. I think IBM and the other big iron companies would be > > far better served looking at what they have done with running multiple > > instances of Linux on one big machine, like the 390 work. Figure out > > how to use that model to scale up. There is simply not a big enough > > market to justify shoveling lots of scaling stuff in for huge machines > > that only a handful of people can afford. That's the same path which > > has sunk all the workstation companies, they all have bloated OS's and > > Linux runs circles around them. mjb> Unfortunately, as I've pointed out to you before, this doesn't work mjb> in practice. Workloads may not be easily divisible amongst mjb> machines, and you're just pushing all the complex problems out for mjb> every userspace app to solve itself, instead of fixing it once in mjb> the kernel. Please permit an observer from the sidelines a few comments. I think all four of you are right, for different reasons. > > Scalability done properly should not degrade performance on smaller > machines, Pee Cees, or even microscopic organisms. s/should/must/ in the above. That must be a guiding principle. > > > On Fri, Feb 21, 2003 at 04:16:18PM -0800, Larry McVoy wrote: > > In terms of the money and in terms of installed seats, the small Linux > > machines out number the 4 or more CPU SMP machines easily 10,000:1. > > And with the embedded market being one of the few real money makers > > for Linux, there will be huge pushback from those companies against > > changes which increase memory footprint. > > There's quite a bit of commonality with large x86 highmem there, as > the highmem crew is extremely concerned about the kernel's memory > footprint and is looking to trim kernel memory overhead from every > aspect of its operation they can. Reducing kernel memory footprint > is a crucial part of scalability, in both scaling down to the low end > and scaling up to highmem. =) > > > -- wli Since the time between major releases of the kernel seems to be two to three years now (counting to where the new kernel is really stable), it is probably worthwhile to think about what high-end systems will be like when 3.0 is expected. My guess is that a trend will be machines with increasingly greater cpu counts with access to the same memory. Why? Because if it can be done, it will be done. The ability to put more cpus on a single chip may translate into a Moore's law of increasing cpu counts per machine. And as Martin points out, the high end machines are where the money is. In my own unsophisticated opinion, Larry's concept of Cache Coherent Clusters seems worth further development. And Martin is right about the need for fixing it in the kernel, again IMHO. But how to fix it in the kernel? Would something similar to OpenMosix or OpenSSI in a future kernel be appropriate to get Larry's CCCluster members to cooperate? Or is it possible to continue the scalability race when cpu counts get to 256, 512, etc. Just some thoughts from the sidelines. Best regards, Steven ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 0:16 ` Larry McVoy 2003-02-22 0:25 ` William Lee Irwin III @ 2003-02-22 0:44 ` Martin J. Bligh 2003-02-22 2:47 ` Larry McVoy 2003-02-22 8:32 ` David S. Miller ` (2 subsequent siblings) 4 siblings, 1 reply; 266+ messages in thread From: Martin J. Bligh @ 2003-02-22 0:44 UTC (permalink / raw) To: Larry McVoy, Hanna Linder; +Cc: lse-tech, linux-kernel > Ben is right. I think IBM and the other big iron companies would be > far better served looking at what they have done with running multiple > instances of Linux on one big machine, like the 390 work. Figure out > how to use that model to scale up. There is simply not a big enough > market to justify shoveling lots of scaling stuff in for huge machines > that only a handful of people can afford. That's the same path which > has sunk all the workstation companies, they all have bloated OS's and > Linux runs circles around them. In your humble opinion. Unfortunately, as I've pointed out to you before, this doesn't work in practice. Workloads may not be easily divisible amongst machines, and you're just pushing all the complex problems out for every userspace app to solve itself, instead of fixing it once in the kernel. The fact that you were never able to do this before doesn't mean it's impossible, it just means that you failed. > In terms of the money and in terms of installed seats, the small Linux > machines out number the 4 or more CPU SMP machines easily 10,000:1. > And with the embedded market being one of the few real money makers > for Linux, there will be huge pushback from those companies against > changes which increase memory footprint. And the profit margin on the big machines will outpace the smaller machines by a similar ratio, inverted. The high-end space is where most of the money is made by the Linux distros, by selling products like SLES or Advanced Server to people who can afford to pay for it. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 0:44 ` Martin J. Bligh @ 2003-02-22 2:47 ` Larry McVoy 2003-02-22 4:32 ` Martin J. Bligh 0 siblings, 1 reply; 266+ messages in thread From: Larry McVoy @ 2003-02-22 2:47 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Larry McVoy, Hanna Linder, lse-tech, linux-kernel On Fri, Feb 21, 2003 at 04:44:13PM -0800, Martin J. Bligh wrote: > > Ben is right. I think IBM and the other big iron companies would be > > far better served looking at what they have done with running multiple > > instances of Linux on one big machine, like the 390 work. Figure out > > how to use that model to scale up. There is simply not a big enough > > market to justify shoveling lots of scaling stuff in for huge machines > > that only a handful of people can afford. That's the same path which > > has sunk all the workstation companies, they all have bloated OS's and > > Linux runs circles around them. > > In your humble opinion. My opinion has nothing to do with it; go benchmark them and see for yourself. I'm in a pretty good position to back up my statements with data: we support BitKeeper on AIX, Solaris, IRIX, HP-UX, Tru64, as well as a pile of others, so we have both the hardware and the software to do the comparisons. I stand by my statement above and so does anyone else who has done the measurements. It is much, much more pleasant to have Linux versus any other Unix implementation on the same platform. Let's keep it that way. > Unfortunately, as I've pointed out to you before, this doesn't work in > practice. Workloads may not be easily divisible amongst machines, and > you're just pushing all the complex problems out for every userspace > app to solve itself, instead of fixing it once in the kernel. "fixing it", huh? Your "fixes" may be great for your tiny segment of the market but they are not going to be welcome if they turn Linux into BloatOS 9.8. > The fact that you were never able to do this before doesn't mean it's > impossible, it just means that you failed. Thanks for the vote of confidence. I think the thing to focus on, however, is that *no one* has ever succeeded at what you are trying to do. And there have been many, many attempts. Your opinion, it would appear, is that you are smarter than all of the people in all of those past failed attempts, but you'll forgive me if I'm not impressed with your optimism. > > In terms of the money and in terms of installed seats, the small Linux > > machines out number the 4 or more CPU SMP machines easily 10,000:1. > > And with the embedded market being one of the few real money makers > > for Linux, there will be huge pushback from those companies against > > changes which increase memory footprint. > > And the profit margin on the big machines will outpace the smaller > machines by a similar ratio, inverted. Really? How about some figures? You'd need HUGE profit margins to justify your position; how about some actual cold, hard numbers? -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 2:47 ` Larry McVoy @ 2003-02-22 4:32 ` Martin J. Bligh 2003-02-22 5:05 ` Larry McVoy 0 siblings, 1 reply; 266+ messages in thread From: Martin J. Bligh @ 2003-02-22 4:32 UTC (permalink / raw) To: Larry McVoy; +Cc: Hanna Linder, lse-tech, linux-kernel >> In your humble opinion. > > My opinion has nothing to do with it, go benchmark them and see for > yourself. Nope, I was referring to this: >> > Ben is right. I think IBM and the other big iron companies would be >> > far better served looking at what they have done with running multiple >> > instances of Linux on one big machine, like the 390 work. Figure out >> > how to use that model to scale up. There is simply not a big enough >> > market to justify shoveling lots of scaling stuff in for huge machines >> > that only a handful of people can afford. Which I totally disagree with. >> >That's the same path which >> > has sunk all the workstation companies, they all have bloated OS's and >> > Linux runs circles around them. Not the fact that Linux is capable of stellar things, which I totally agree with. > I'm in a pretty good position to back up my statements with > data, we support BitKeeper on AIX, Solaris, IRIX, HP-UX, Tru64, as well > as a pile of others, so we have both the hardware and the software to > do the comparisons. I stand by statement above and so does anyone else > who has done the measurements. Oh, I don't doubt it - But I'd be amused to see the measurements, if you have them to hand. > It is much much more pleasant to have Linux versus any other Unix > implementation on the same platform. Let's keep it that way. Absolutely. >> Unfortunately, as I've pointed out to you before, this doesn't work in >> practice. Workloads may not be easily divisible amongst machines, and >> you're just pushing all the complex problems out for every userspace >> app to solve itself, instead of fixing it once in the kernel. > > "fixing it", huh? Your "fixes" may be great for your tiny segment of > the market but they are not going to be welcome if they turn Linux into > BloatOS 9.8. They won't - the maintainers would never allow us to do that. >> The fact that you were never able to do this before doesn't mean it's >> impossible, it just means that you failed. > > Thanks for the vote of confidence. I think the thing to focus on, > however, is that *noone* has ever succeeded at what you are trying > to do. And there have been many, many attempts. Your opinion, it > would appear, is that you are smarter than all of the people in all > of those past failed attempts, but you'll forgive me if I'm not > impressed with your optimism. Who said that I was going to single-handedly change the world? What's different with Linux is the development model. That's why *we* will succeed where others have failed before. There's some incredible intellect all around Linux, but that's not all it takes, as you've pointed out. >> > In terms of the money and in terms of installed seats, the small Linux >> > machines out number the 4 or more CPU SMP machines easily 10,000:1. >> > And with the embedded market being one of the few real money makers >> > for Linux, there will be huge pushback from those companies against >> > changes which increase memory footprint. >> >> And the profit margin on the big machines will outpace the smaller >> machines by a similar ratio, inverted. > > Really? How about some figures? You'd need HUGE profit margins to > justify your position, how about some actual hard cold numbers? 
I don't have them to hand, but if you think anyone's making money on PCs nowadays, you're delusional (with respect to hardware). With respect to Linux, what makes you think distros are going to make large amounts of money from a freely replicatable OS, for tiny embedded systems? Support for servers, on the other hand, is a different game ... M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 4:32 ` Martin J. Bligh @ 2003-02-22 5:05 ` Larry McVoy 2003-02-22 6:39 ` Martin J. Bligh 2003-02-22 8:38 ` David S. Miller 0 siblings, 2 replies; 266+ messages in thread From: Larry McVoy @ 2003-02-22 5:05 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Larry McVoy, Hanna Linder, lse-tech, linux-kernel On Fri, Feb 21, 2003 at 08:32:30PM -0800, Martin J. Bligh wrote: > > "fixing it", huh? Your "fixes" may be great for your tiny segment of > > the market but they are not going to be welcome if they turn Linux into > > BloatOS 9.8. > > They won't - the maintainers would never allow us to do that. The path to hell is paved with good intentions. > > Really? How about some figures? You'd need HUGE profit margins to > > justify your position, how about some actual hard cold numbers? > > I don't have them to hand, but if you think anyone's making money on > PCs nowadays, you're delusional (with respect to hardware). Let's see, Dell has a $66B market cap, revenues of $8B/quarter and $500M/quarter in profit. Lots of people working for companies who haven't figured out how to do it as well as Dell *say* it can't be done but numbers say differently. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 5:05 ` Larry McVoy @ 2003-02-22 6:39 ` Martin J. Bligh 2003-02-22 8:38 ` Jeff Garzik 2003-02-22 8:38 ` David S. Miller 2003-02-22 8:38 ` David S. Miller 1 sibling, 2 replies; 266+ messages in thread From: Martin J. Bligh @ 2003-02-22 6:39 UTC (permalink / raw) To: Larry McVoy; +Cc: Hanna Linder, lse-tech, linux-kernel >> I don't have them to hand, but if you think anyone's making money on >> PCs nowadays, you're delusional (with respect to hardware). > > Let's see, Dell has a $66B market cap, revenues of $8B/quarter and > $500M/quarter in profit. > > Lots of people working for companies who haven't figured out how to do > it as well as Dell *say* it can't be done but numbers say differently. And how much of that was profit on PCs running Linux? M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 6:39 ` Martin J. Bligh @ 2003-02-22 8:38 ` Jeff Garzik 2003-02-22 22:18 ` William Lee Irwin III 2003-02-22 8:38 ` David S. Miller 1 sibling, 1 reply; 266+ messages in thread From: Jeff Garzik @ 2003-02-22 8:38 UTC (permalink / raw) To: linux-kernel ia32 big iron. sigh. I think that's so unfortunate in a number of ways, but the main reason, of course, is that highmem is evil :) Intel can use PAE to "turn back the clock" on ia32. Although googling doesn't support this speculation, I am willing to bet Intel will eventually unveil a new PAE that busts the 64GB barrier -- instead of trying harder to push consumers to 64-bit processors. Processor speed, FSB speed, PCI bus bandwidth, all these are issues -- but ones that pale in comparison to the long-term effects of highmem on the market. Enterprise customers will see this as a signal to continue building around ia32 for the next few years, thoroughly damaging 64-bit technology sales and development. I bet even IA64 suffers... at Intel's own hands. Rumors of a "Pentium64" at Intel are constantly floating around The Register and various rumor web sites, but Intel is gonna miss that huge profit opportunity too by trying to hack the ia32 ISA to scale up to big iron -- where it doesn't belong. Being cynical, one might guess that Intel will treat IA64 as a loss leader until the other 64-bit competition dies, keeping ia32 at the top end of the market via silly PAE/PSE hacks. When the existing 64-bit competition disappears, five years down the road, compilers will have matured sufficiently to make using IA64 boxes feasible. If you really want to scale, just go to 64-bits, darn it. Don't keep hacking the ia32 ISA -- leave it alone, it's fine as it is, and will live a nice long life as the future's preferred embedded platform. 64-bit. alpha is old tech, and dead. *sniff* sparc64 is mostly old tech, and mostly dead. IA64 isn't, yet. x86-64 is _nice_ tech, but who knows if AMD will survive competition with Intel. PPC64 is the wild card in all this. I hope it succeeds. Jeff, feeling like a silly, random rant after a long drive ...and from a technical perspective, highmem grots up the code, too :) ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 8:38 ` Jeff Garzik @ 2003-02-22 22:18 ` William Lee Irwin III 2003-02-23 0:50 ` Martin J. Bligh 2003-02-23 1:17 ` Benjamin LaHaise 0 siblings, 2 replies; 266+ messages in thread From: William Lee Irwin III @ 2003-02-22 22:18 UTC (permalink / raw) To: Jeff Garzik; +Cc: linux-kernel On Sat, Feb 22, 2003 at 03:38:10AM -0500, Jeff Garzik wrote: > ia32 big iron. sigh. I think that's so unfortunately in a number > of ways, but the main reason, of course, is that highmem is evil :) > Intel can use PAE to "turn back the clock" on ia32. Although googling > doesn't support this speculation, I am willing to bet Intel will > eventually unveil a new PAE that busts the 64GB barrier -- instead of > trying harder to push consumers to 64-bit processors. Processor speed, > FSB speed, PCI bus bandwidth, all these are issues -- but ones that > pale in comparison to the long term effects of highmem on the market. PAE is a relatively minor insult compared to the FPU, the 50,000 psi register pressure, variable-length instruction encoding with extremely difficult to optimize for instruction decoder trickiness, the nauseating bastardization of segmentation, the microscopic caches and TLB's, the lack of TLB context tags, frankly bizarre and just-barely-fixable gate nonsense, the interrupt controller, and ISA DMA. I've got no idea why this particular system-level ugliness which is nothing more than a routine pitstop in any bring your own barfbag reading session of x86 manuals fascinates you so much. At any rate, if systems (or any other) programming difficulties were any concern at all, x86 wouldn't be used at all. On Sat, Feb 22, 2003 at 03:38:10AM -0500, Jeff Garzik wrote: > Enterprise customers will see this as a signal to continue building > around ia32 for the next few years, thoroughly damaging 64-bit > technology sales and development. I bet even IA64 suffers... > at Intel's own hands. Rumors of a "Pentium64" at Intel are constantly > floating around The Register and various rumor web sites, but Intel > is gonna miss that huge profit opportunity too by trying to hack the > ia32 ISA to scale up to big iron -- where it doesn't belong. What power do you suppose we have to resist any of this? Intel, the 800lb gorilla, shoves what it wants where it wants to shove it, and all the "exit only" signs in the world attached to our backsides do absolutely nothing to deter it whatsoever. On Sat, Feb 22, 2003 at 03:38:10AM -0500, Jeff Garzik wrote: > Being cynical, one might guess that Intel will treat IA64 as a loss > leader until the other 64-bit competition dies, keeping ia32 at the > top end of the market via silly PAE/PSE hacks. When the existing > 64-bit compettion disappears, five years down the road, compilers > will have matured sufficiently to make using IA64 boxes feasible. Sounds relatively natural. I don't have a good notion of the legality boundaries wrt. to antitrust, but I'd assume they would otherwise do whatever it takes to either defeat or wipe out competitors. On Sat, Feb 22, 2003 at 03:38:10AM -0500, Jeff Garzik wrote: > If you really want to scale, just go to 64-bits, darn it. Don't keep > hacking ia32 ISA -- leave it alone, it's fine as it is, and will live > a nice long life as the future's preferred embedded platform. Take this up with Intel. The rest of us are at their mercy. Good luck finding anyone there to listen to it, you'll need it. On Sat, Feb 22, 2003 at 03:38:10AM -0500, Jeff Garzik wrote: > 64-bit. alpha is old tech, and dead. 
*sniff* sparc64 is mostly > old tech, and mostly dead. IA64 isn't, yet. x86-64 is _nice_ tech, > but who knows if AMD will survive competition with Intel. PPC64 is > the wild card in all this. I hope it succeeds. Alpha is old, dead, and kicking most other cpus' asses from the grave. I always did like DEC hardware. =( I'm not sure what's so nice about x86-64; another opcode prefix controlled extension atop the festering pile of existing x86 crud sounds every bit as bad as any other attempt to prolong x86. Some of the system device-level cleanups like the HPET look nice, though. This success/failure stuff sounds a lot like economics, which is pretty much even further out of our control than the weather or the government. What prompted this bit? -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 22:18 ` William Lee Irwin III @ 2003-02-23 0:50 ` Martin J. Bligh 2003-02-23 11:22 ` Magnus Danielson 2003-02-23 19:54 ` Eric W. Biederman 2003-02-23 1:17 ` Benjamin LaHaise 1 sibling, 2 replies; 266+ messages in thread From: Martin J. Bligh @ 2003-02-23 0:50 UTC (permalink / raw) To: William Lee Irwin III, Jeff Garzik; +Cc: linux-kernel > On Sat, Feb 22, 2003 at 03:38:10AM -0500, Jeff Garzik wrote: >> ia32 big iron. sigh. I think that's so unfortunately in a number >> of ways, but the main reason, of course, is that highmem is evil :) One phrase ... "price:performance ratio". That's all it's about. The only thing that will kill 32-bit big iron is the availability of cheap 64 bit chips. It's a free-market economy. It's ugly to program, but it's cheap, and it works. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 0:50 ` Martin J. Bligh @ 2003-02-23 11:22 ` Magnus Danielson 0 siblings, 0 replies; 266+ messages in thread From: Magnus Danielson @ 2003-02-23 11:22 UTC (permalink / raw) To: mbligh; +Cc: wli, jgarzik, linux-kernel From: "Martin J. Bligh" <mbligh@aracnet.com> Subject: Re: Minutes from Feb 21 LSE Call Date: Sat, 22 Feb 2003 16:50:36 -0800 > > On Sat, Feb 22, 2003 at 03:38:10AM -0500, Jeff Garzik wrote: > >> ia32 big iron. sigh. I think that's so unfortunately in a number > >> of ways, but the main reason, of course, is that highmem is evil :) > > One phrase ... "price:performance ratio". That's all it's about. > The only thing that will kill 32-bit big iron is the availability of > cheap 64 bit chips. It's a free-market economy. > > It's ugly to program, but it's cheap, and it works. Not all heavy-duty problems are dying for 64 bit; some fit nicely into 32 bit. There are, however, different 32-bit architectures into which they fit more or less nicely. SIMD may or may not give the boost, just as 64 bit in itself may or may not. This is just like clustering vs. SMP: it depends on the application. Cheers, Magnus ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 0:50 ` Martin J. Bligh 2003-02-23 11:22 ` Magnus Danielson @ 2003-02-23 19:54 ` Eric W. Biederman 1 sibling, 0 replies; 266+ messages in thread From: Eric W. Biederman @ 2003-02-23 19:54 UTC (permalink / raw) To: Martin J. Bligh; +Cc: William Lee Irwin III, Jeff Garzik, linux-kernel "Martin J. Bligh" <mbligh@aracnet.com> writes: > > On Sat, Feb 22, 2003 at 03:38:10AM -0500, Jeff Garzik wrote: > >> ia32 big iron. sigh. I think that's so unfortunately in a number > >> of ways, but the main reason, of course, is that highmem is evil :) > > One phrase ... "price:performance ratio". That's all it's about. > The only thing that will kill 32-bit big iron is the availability of > cheap 64 bit chips. It's a free-market economy. > > It's ugly to program, but it's cheap, and it works. I guess "ugly to program" is in the eye of the beholder. The big platforms have always seemed much worse to me, where every box feels free to change things in arbitrary ways for no good reason, or where the OS and other low-level software must know exactly which motherboard they are running on to work properly. Gratuitous incompatibilities are the ugliest thing I have ever seen; the warts a real platform accumulates because it is designed to actually be used are much less ugly. Eric ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 22:18 ` William Lee Irwin III 2003-02-23 0:50 ` Martin J. Bligh @ 2003-02-23 1:17 ` Benjamin LaHaise 2003-02-23 5:21 ` Gerrit Huizenga 2003-02-23 9:37 ` William Lee Irwin III 1 sibling, 2 replies; 266+ messages in thread From: Benjamin LaHaise @ 2003-02-23 1:17 UTC (permalink / raw) To: William Lee Irwin III, Jeff Garzik, linux-kernel On Sat, Feb 22, 2003 at 02:18:20PM -0800, William Lee Irwin III wrote: > I'm not sure what's so nice about x86-64; another opcode prefix > controlled extension atop the festering pile of existing x86 crud What's nice about x86-64 is that it runs existing 32 bit apps fast and doesn't suffer from the blisteringly small caches that were part of your rant. Plus, x86-64 binaries are not horrifically bloated like ia64. Not to mention that the amount of reengineering in compilers like gcc required to get decent performance out of it is actually sane. > sounds every bit as bad any other attempt to prolong x86. Some of > the system device -level cleanups like the HPET look nice, though. HPET is part of one of the PCYY specs and is even available on 32 bit x86; there are just not that many bug-free implementations yet. Since x86-64 made it part of the base platform and is testing it from launch, those implementations actually have a chance of being debugged in the mass-market versions. -ben -- Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a> ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 1:17 ` Benjamin LaHaise @ 2003-02-23 5:21 ` Gerrit Huizenga 2003-02-23 8:07 ` David Lang 0 siblings, 1 reply; 266+ messages in thread From: Gerrit Huizenga @ 2003-02-23 5:21 UTC (permalink / raw) To: Benjamin LaHaise; +Cc: William Lee Irwin III, Jeff Garzik, linux-kernel On Sat, 22 Feb 2003 20:17:24 EST, Benjamin LaHaise wrote: > On Sat, Feb 22, 2003 at 02:18:20PM -0800, William Lee Irwin III wrote: > > I'm not sure what's so nice about x86-64; another opcode prefix > > controlled extension atop the festering pile of existing x86 crud > > What's nice about x86-64 is that it runs existing 32 bit apps fast and > doesn't suffer from the blisteringly small caches that were part of your > rant. Plus, x86-64 binaries are not horrifically bloated like ia64. > Not to mention that the amount of reengineering in compilers like > gcc required to get decent performance out of it is actually sane. Four or five years ago the claim was that IA64 would solve all the large memory problems. Commercial viability and substantial market presence are still lacking. x86-64 has the same uphill battle. It has a better architecture for highmem and potentially better architecture for large systems in general (compared to IA32, not substantially better than, say, IA64 or PPC64). It also has at least one manufacturer looking at high-end systems. But until those systems have some recognized market share, the boys with the big pockets aren't likely to make them ubiquitous. The whole issue of design and development expense, combined with the ROI model, has more influence on their deployment than the fact that it is technically a useful architecture. gerrit ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 5:21 ` Gerrit Huizenga @ 2003-02-23 8:07 ` David Lang 2003-02-23 8:20 ` William Lee Irwin III ` (2 more replies) 0 siblings, 3 replies; 266+ messages in thread From: David Lang @ 2003-02-23 8:07 UTC (permalink / raw) To: Gerrit Huizenga Cc: Benjamin LaHaise, William Lee Irwin III, Jeff Garzik, linux-kernel On Sat, 22 Feb 2003, Gerrit Huizenga wrote: > On Sat, 22 Feb 2003 20:17:24 EST, Benjamin LaHaise wrote: > > On Sat, Feb 22, 2003 at 02:18:20PM -0800, William Lee Irwin III wrote: > > > I'm not sure what's so nice about x86-64; another opcode prefix > > > controlled extension atop the festering pile of existing x86 crud > > > > What's nice about x86-64 is that it runs existing 32 bit apps fast and > > doesn't suffer from the blisteringly small caches that were part of your > > rant. Plus, x86-64 binaries are not horrifically bloated like ia64. > > Not to mention that the amount of reengineering in compilers like > > gcc required to get decent performance out of it is actually sane. > > Four or five years ago the claim was that IA64 would solve all the large > memory problems. Commercial viability and substantial market presence > is still lacking. x86-64 has the same uphill battle. It has a better > architecture for highmem and potentially better architecture for large > systems in general (compared to IA32, not substantially better than, say, > IA64 or PPC64). It also has at least one manufacturer looking at high > end systems. But until those systems have some recognized market share, > the boys with the big pockets aren't likely to make the ubiquitous. > The whole thing about expenses to design and develop combined with the > ROI model have more influence on their deployment than the fact that it > is technically a useful architecture. Gerrit, you missed the prior poster's point. IA64 had the same fundamental problem as the Alpha, PPC, and Sparc processors: it doesn't run x86 binaries. The 8086/8088 CPU was nothing special when it was picked to be used on the IBM PC, but once it was picked it hit a critical mass that has meant that compatibility with it is critical to a new CPU. The 286 and 386 CPUs were arguably inferior to other options available at the time, but they had one feature that absolutely trumped everything else: they could run existing programs with no modifications faster than anything else available. With the IA64, Intel forgot this (or decided their name value was so high that they were immune to the issue). x86-64 takes the same approach that the 286 and 386 did and will be used by people who couldn't care less about 64 bit stuff simply because it looks to be the fastest x86 cpu available (and if the SMP features work as advertised it will again give a big boost to the price/performance of SMP machines due to much cheaper MLB designs). If it were being marketed by Intel it would be a shoo-in, but AMD does have a bit of an uphill struggle. David Lang ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 8:07 ` David Lang @ 2003-02-23 8:20 ` William Lee Irwin III 2003-02-23 19:17 ` Linus Torvalds 2003-02-23 19:13 ` David Mosberger 2003-02-23 20:48 ` Gerrit Huizenga 2 siblings, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-23 8:20 UTC (permalink / raw) To: David Lang; +Cc: Gerrit Huizenga, Benjamin LaHaise, Jeff Garzik, linux-kernel On Sun, Feb 23, 2003 at 12:07:50AM -0800, David Lang wrote: > Garrit, you missed the preior posters point. IA64 had the same fundamental > problem as the Alpha, PPC, and Sparc processors, it doesn't run x86 > binaries. If I didn't know this mattered I wouldn't bother with the barfbags. I just wouldn't deal with it. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 8:20 ` William Lee Irwin III @ 2003-02-23 19:17 ` Linus Torvalds 2003-02-23 19:29 ` David Mosberger ` (3 more replies) 0 siblings, 4 replies; 266+ messages in thread From: Linus Torvalds @ 2003-02-23 19:17 UTC (permalink / raw) To: linux-kernel In article <20030223082036.GI10411@holomorphy.com>, William Lee Irwin III <wli@holomorphy.com> wrote: >On Sun, Feb 23, 2003 at 12:07:50AM -0800, David Lang wrote: >> Garrit, you missed the preior posters point. IA64 had the same fundamental >> problem as the Alpha, PPC, and Sparc processors, it doesn't run x86 >> binaries. > >If I didn't know this mattered I wouldn't bother with the barfbags. >I just wouldn't deal with it. Why? The x86 is a hell of a lot nicer than the ppc32, for example. On the x86, you get good performance and you can ignore the design mistakes (ie segmentation) by just basically turning them off. On the ppc32, the MMU braindamage is not something you can ignore, you have to write your OS for it and if you turn it off (ie enable soft-fill on the ones that support it) you now have to have separate paths in the OS for it. And the baroque instruction encoding on the x86 is actually a _good_ thing: it's a rather dense encoding, which means that you win on icache. It's a bit hard to decode, but who cares? Existing chips do well at decoding, and thanks to the icache win they tend to perform better - and they load faster too (which is important - you can make your CPU have big caches, but _nothing_ saves you from the cold-cache costs). The low register count isn't an issue when you code in any high-level language, and it has actually forced x86 implementors to do a hell of a lot better job than the competition when it comes to memory loads and stores - which helps in general. While the RISC people were off trying to optimize their compilers to generate loops that used all 32 registers efficiently, the x86 implementors instead made the chip run fast on varied loads and used tons of register renaming hardware (and looking at _memory_ renaming too). IA64 made all the mistakes anybody else did, and threw out all the good parts of the x86 because people thought those parts were ugly. They aren't ugly, they're the "charming oddity" that makes it do well. Look at them the right way and you realize that a lot of the grottyness is exactly _why_ the x86 works so well (yeah, and the fact that they are everywhere ;). The only real major failure of the x86 is the PAE crud. Let's hope we'll get to forget it, the same way the DOS people eventually forgot about their memory extenders. (Yeah, and maybe IBM will make their ppc64 chips cheap enough that they will matter, and people can overlook the grottiness there. Right now Intel doesn't even seem to be interested in "64-bit for the masses", and maybe IBM will be. AMD certainly seems to be serious about the "masses" part, which in the end is the only part that really matters). Linus ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 19:17 ` Linus Torvalds @ 2003-02-23 19:29 ` David Mosberger 2003-02-23 20:13 ` Martin J. Bligh 2003-02-23 21:34 ` Linus Torvalds 0 siblings, 2 replies; 266+ messages in thread From: David Mosberger @ 2003-02-23 19:29 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel >>>>> On Sun, 23 Feb 2003 19:17:30 +0000 (UTC), torvalds@transmeta.com (Linus Torvalds) said: Linus> Look at them the right way and you realize that a lot of the Linus> grottyness is exactly _why_ the x86 works so well (yeah, and Linus> the fact that they are everywhere ;). But does x86 really work so well? Itanium 2 on 0.13um performs a lot better than P4 on 0.13um. As far as I can guess, the only reason P4 comes out on 0.13um (and 0.09um) before anything else is due to the latter part you mention: it's where the volume is today. --david ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 19:29 ` David Mosberger @ 2003-02-23 20:13 ` Martin J. Bligh 2003-02-23 22:01 ` David Mosberger 2003-02-23 21:34 ` Linus Torvalds 1 sibling, 1 reply; 266+ messages in thread From: Martin J. Bligh @ 2003-02-23 20:13 UTC (permalink / raw) To: davidm, Linus Torvalds; +Cc: linux-kernel > Linus> Look at them the right way and you realize that a lot of the > Linus> grottyness is exactly _why_ the x86 works so well (yeah, and > Linus> the fact that they are everywhere ;). > > But does x86 reall work so well? Itanium 2 on 0.13um performs a lot > better than P4 on 0.13um. As far as I can guess, the only reason P4 > comes out on 0.13um (and 0.09um) before anything else is due to the > latter part you mention: it's where the volume is today. Care to share those impressive benchmark numbers (for macro-benchmarks)? Would be interesting to see the difference, and where it wins. Thanks, M ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 20:13 ` Martin J. Bligh @ 2003-02-23 22:01 ` David Mosberger 2003-02-23 22:12 ` Martin J. Bligh 0 siblings, 1 reply; 266+ messages in thread From: David Mosberger @ 2003-02-23 22:01 UTC (permalink / raw) To: Martin J. Bligh; +Cc: davidm, Linus Torvalds, linux-kernel >>>>> On Sun, 23 Feb 2003 12:13:00 -0800, "Martin J. Bligh" <mbligh@aracnet.com> said: Linus> Look at them the right way and you realize that a lot of the Linus> grottyness is exactly _why_ the x86 works so well (yeah, and Linus> the fact that they are everywhere ;). >> But does x86 reall work so well? Itanium 2 on 0.13um performs a >> lot better than P4 on 0.13um. As far as I can guess, the only >> reason P4 comes out on 0.13um (and 0.09um) before anything else >> is due to the latter part you mention: it's where the volume is >> today. Martin> Care to share those impressive benchmark numbers (for Martin> macro-benchmarks)? Would be interesting to see the Martin> difference, and where it wins. You can do it two ways: you can look at the numbers Intel has publicly projected for Madison, or you can compare McKinley with the 0.18um Pentium 4. --david ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 22:01 ` David Mosberger @ 2003-02-23 22:12 ` Martin J. Bligh 0 siblings, 0 replies; 266+ messages in thread From: Martin J. Bligh @ 2003-02-23 22:12 UTC (permalink / raw) To: davidm; +Cc: Linus Torvalds, linux-kernel > >> But does x86 reall work so well? Itanium 2 on 0.13um performs a > >> lot better than P4 on 0.13um. As far as I can guess, the only > >> reason P4 comes out on 0.13um (and 0.09um) before anything else > >> is due to the latter part you mention: it's where the volume is > >> today. > > Martin> Care to share those impressive benchmark numbers (for > Martin> macro-benchmarks)? Would be interesting to see the > Martin> difference, and where it wins. > > You can do it two ways: you can look at the numbers Intel is publicly > projected for Madison, or you can compare McKinley with 0.18um Pentium 4. Ummm ... I'm not exactly happy working with Intel's own projections on the performance of their Itanium chips ... seems a little unscientific ;-) Presumably when you said "Itanium 2 on 0.13um performs a lot better than P4 on 0.13um." you were referring to some benchmarks you have the results of? If you can't publish them, fair enough. But if you can, I'd love to see how it compares ... Itanium seems to be "more interesting" nowadays, though I can't say I'm happy about the complexity of it. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 19:29 ` David Mosberger 2003-02-23 20:13 ` Martin J. Bligh @ 2003-02-23 21:34 ` Linus Torvalds 2003-02-23 22:40 ` David Mosberger 1 sibling, 1 reply; 266+ messages in thread From: Linus Torvalds @ 2003-02-23 21:34 UTC (permalink / raw) To: davidm; +Cc: linux-kernel On Sun, 23 Feb 2003, David Mosberger wrote: > > But does x86 reall work so well? Itanium 2 on 0.13um performs a lot > better than P4 on 0.13um. On WHAT benchmark? Itanium 2 doesn't hold a candle to a P4 on any real-world benchmarks. As far as I know, the _only_ things Itanium 2 does better on is (a) FP kernels, partly due to a huge cache and (b) big databases, entirely because the P4 is crippled with lots of memory because Intel refuses to do a 64-bit version (because they know it would totally kill ia-64). Last I saw P4 was kicking ia-64 butt on specint and friends. That's also ignoring the fact that ia-64 simply CANNOT DO the things a P4 does every single day. You can't put an ia-64 in a reasonable desktop machine, partly because of pricing, but partly because it would just suck so horribly at things people expect not to suck (games spring to mind). And I further bet that using a native distribution (ie totally ignoring the power and price and bad x86 performance issues), ia-64 will work a lot worse for people simply because the binaries are bigger. That was quite painful on alpha, and ia-64 is even worse - to offset the bigger binaries, you need a faster disk subsystem etc just to not feel slower than a bog-standard PC. Code size matters. Price matters. Real world matters. And ia-64 at least so far falls flat on its face on ALL of these. > As far as I can guess, the only reason P4 > comes out on 0.13um (and 0.09um) before anything else is due to the > latter part you mention: it's where the volume is today. It's where all the money is ("ia-64: 5 billion dollars in the red and still sinking") so of _course_ it's where the efforts get put. Linus ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 21:34 ` Linus Torvalds @ 2003-02-23 22:40 ` David Mosberger 2003-02-23 22:48 ` David Lang 2003-02-23 23:06 ` Martin J. Bligh 0 siblings, 2 replies; 266+ messages in thread From: David Mosberger @ 2003-02-23 22:40 UTC (permalink / raw) To: Linus Torvalds; +Cc: davidm, linux-kernel >>>>> On Sun, 23 Feb 2003 13:34:32 -0800 (PST), Linus Torvalds <torvalds@transmeta.com> said: Linus> Last I saw P4 was kicking ia-64 butt on specint and friends. I don't think so. According to Intel [1], the highest clock frequency for a 0.18um part is 2GHz (both for Xeon and P4, for Xeon MP it's 1.5GHz). The highest reported SPECint for a 2GHz Xeon seems to be 701 [2]. In comparison, a 1GHz McKinley gets a SPECint of 810 [3]. --david [1] http://www.intel.com/support/processors/xeon/corespeeds.htm [2] http://www.specbench.org/cpu2000/results/res2002q1/cpu2000-20020128-01232.html [3] http://www.specbench.org/cpu2000/results/res2002q3/cpu2000-20020711-01469.html ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 22:40 ` David Mosberger @ 2003-02-23 22:48 ` David Lang 2003-02-23 22:54 ` David Mosberger 0 siblings, 1 reply; 266+ messages in thread From: David Lang @ 2003-02-23 22:48 UTC (permalink / raw) To: davidm; +Cc: Linus Torvalds, linux-kernel I would call a 15% lead over the ia64 pretty substantial. Yes, it's not the same clock speed, but if that's the clock speed they can achieve on that process, it's equivalent. The P4 covers a LOT of sins by ratcheting up its speed; what matters is the final capability, not the capability/clock (if capability/clock were what mattered, the AMD chips would have put Intel out of business and the P4 would be as common as ia-64). David Lang On Sun, 23 Feb 2003, David Mosberger wrote: > Date: Sun, 23 Feb 2003 14:40:44 -0800 > From: David Mosberger <davidm@napali.hpl.hp.com> > Reply-To: davidm@hpl.hp.com > To: Linus Torvalds <torvalds@transmeta.com> > Cc: davidm@hpl.hp.com, linux-kernel@vger.kernel.org > Subject: Re: Minutes from Feb 21 LSE Call > > >>>>> On Sun, 23 Feb 2003 13:34:32 -0800 (PST), Linus Torvalds <torvalds@transmeta.com> said: > > Linus> Last I saw P4 was kicking ia-64 butt on specint and friends. > > I don't think so. According to Intel [1], the highest clockfrequency > for a 0.18um part is 2GHz (both for Xeon and P4, for Xeon MP it's > 1.5GHz). The highest reported SPECint for a 2GHz Xeon seems to be 701 > [2]. In comparison, a 1GHz McKinley gets a SPECint of 810 [3]. > > --david > > [1] http://www.intel.com/support/processors/xeon/corespeeds.htm > [2] http://www.specbench.org/cpu2000/results/res2002q1/cpu2000-20020128-01232.html > [3] http://www.specbench.org/cpu2000/results/res2002q3/cpu2000-20020711-01469.html > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 22:48 ` David Lang @ 2003-02-23 22:54 ` David Mosberger 2003-02-23 22:56 ` David Lang ` (2 more replies) 0 siblings, 3 replies; 266+ messages in thread From: David Mosberger @ 2003-02-23 22:54 UTC (permalink / raw) To: David Lang; +Cc: davidm, Linus Torvalds, linux-kernel >>>>> On Sun, 23 Feb 2003 14:48:48 -0800 (PST), David Lang <david.lang@digitalinsight.com> said: David.L> I would call a 15% lead over the ia64 pretty substantial. Huh? Did you misread my mail? 2 GHz Xeon: 701 SPECint 1 GHz Itanium 2: 810 SPECint That is, Itanium 2 is 15% faster. --david ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 22:54 ` David Mosberger @ 2003-02-23 22:56 ` David Lang 0 siblings, 0 replies; 266+ messages in thread From: David Lang @ 2003-02-23 22:56 UTC (permalink / raw) To: davidm; +Cc: Linus Torvalds, linux-kernel Yep, I reversed the numbers. David Lang On Sun, 23 Feb 2003, David Mosberger wrote: > Date: Sun, 23 Feb 2003 14:54:12 -0800 > From: David Mosberger <davidm@napali.hpl.hp.com> > Reply-To: davidm@hpl.hp.com > To: David Lang <david.lang@digitalinsight.com> > Cc: davidm@hpl.hp.com, Linus Torvalds <torvalds@transmeta.com>, > linux-kernel@vger.kernel.org > Subject: Re: Minutes from Feb 21 LSE Call > > >>>>> On Sun, 23 Feb 2003 14:48:48 -0800 (PST), David Lang <david.lang@digitalinsight.com> said: > > David.L> I would call a 15% lead over the ia64 pretty substantial. > > Huh? Did you misread my mail? > > 2 GHz Xeon: 701 SPECint > 1 GHz Itanium 2: 810 SPECint > > That is, Itanium 2 is 15% faster. > > --david > ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 22:54 ` David Mosberger 2003-02-23 22:56 ` David Lang @ 2003-02-24 0:40 ` Linus Torvalds 2003-02-24 2:32 ` David Mosberger 2003-02-24 1:06 ` dean gaudet 2 siblings, 1 reply; 266+ messages in thread From: Linus Torvalds @ 2003-02-24 0:40 UTC (permalink / raw) To: davidm; +Cc: David Lang, linux-kernel On Sun, 23 Feb 2003, David Mosberger wrote: > > 2 GHz Xeon: 701 SPECint > 1 GHz Itanium 2: 810 SPECint > > That is, Itanium 2 is 15% faster. Ehh, and this is with how much cache? Last I saw, the Itanium 2 machines came with 3MB of integrated L3 caches, and I suspect that whatever 0.13 Itanium numbers you're looking at are with the new 6MB caches. So your "apples to apples" comparison isn't exactly that. The only thing that is meaningful is "performance at the same time of general availability". At which point the P4 beats the Itanium 2 senseless with a 25% higher SpecInt. And last I heard, by the time Itanium 2 is up at 2GHz, the P4 is apparently going to be at 5GHz, comfortably keeping that 25% lead. Linus ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 0:40 ` Linus Torvalds @ 2003-02-24 2:32 ` David Mosberger 2003-02-24 2:54 ` Linus Torvalds 0 siblings, 1 reply; 266+ messages in thread From: David Mosberger @ 2003-02-24 2:32 UTC (permalink / raw) To: Linus Torvalds; +Cc: davidm, David Lang, linux-kernel >>>>> On Sun, 23 Feb 2003 16:40:40 -0800 (PST), Linus Torvalds <torvalds@transmeta.com> said: Linus> On Sun, 23 Feb 2003, David Mosberger wrote: >> 2 GHz Xeon: 701 SPECint >> 1 GHz Itanium 2: 810 SPECint >> That is, Itanium 2 is 15% faster. Linus> Ehh, and this is with how much cache? Linus> Last I saw, the Itanium 2 machines came with 3MB of Linus> integrated L3 caches, and I suspect that whatever 0.13 Linus> Itanium numbers you're looking at are with the new 6MB Linus> caches. Unfortunately, HP doesn't sell 1.5MB/1GHz Itanium 2 workstations, but we can do some educated guessing: 1GHz Itanium 2, 3MB cache: 810 SPECint 900MHz Itanium 2, 1.5MB cache: 674 SPECint Assuming pure frequency scaling, a 1GHz/1.5MB Itanium 2 would get around 750 SPECint. In reality, it would get slightly less, but most likely substantially more than 701. Linus> So your "apples to apples" comparison isn't exactly that. I never claimed it's an apples to apples comparison. But comparing same-process chips from the same manufacturer does make for a fairer "architectural" comparison because it factors out at least some of the effects caused by volume (there is no reason other than (a) volume and (b) being designed as a server chip for Itanium chips to come out on the same process later than the corresponding x86 chips). Linus> The only thing that is meaningful is "performace at the same Linus> time of general availability". You claimed that x86 is inherently superior. I provided data that shows that much of this apparent superiority is simply an effect of the larger volume that x86 achieves today. Please don't claim that x86 wins on technical grounds when it really wins on economic grounds. --david ^ permalink raw reply [flat|nested] 266+ messages in thread
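A quick check of the arithmetic behind the estimate above, using nothing beyond the two SPECint figures quoted in that message and the stated (and admittedly optimistic) assumption of pure linear scaling with clock frequency:

    674 \times \frac{1.0\ \mathrm{GHz}}{0.9\ \mathrm{GHz}} \approx 749

which is where the "around 750 SPECint" figure comes from; since real scaling is sub-linear, an actual 1GHz/1.5MB part would be expected to land somewhat below that, as the message itself notes.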
* Re: Minutes from Feb 21 LSE Call 2003-02-24 2:32 ` David Mosberger @ 2003-02-24 2:54 ` Linus Torvalds 2003-02-24 3:08 ` David Mosberger 2003-02-24 21:42 ` Andrea Arcangeli 0 siblings, 2 replies; 266+ messages in thread From: Linus Torvalds @ 2003-02-24 2:54 UTC (permalink / raw) To: davidm; +Cc: David Lang, linux-kernel On Sun, 23 Feb 2003, David Mosberger wrote: > >> 2 GHz Xeon: 701 SPECint > >> 1 GHz Itanium 2: 810 SPECint > > >> That is, Itanium 2 is 15% faster. > > Unfortunately, HP doesn't sell 1.5MB/1GHz Itanium 2 workstations, but > we can do some educated guessing: > > 1GHz Itanium 2, 3MB cache: 810 SPECint > 900MHz Itanium 2, 1.5MB cache: 674 SPECint > > Assuming pure frequency scaling, a 1GHz/1.5MB Itanium 2 would get > around 750 SPECint. In reality, it would get slightly less, but most > likely substantially more than 701. And as Dean pointed out: 2Ghz Xeon MP with 2MB L3 cache: 842 SPECint In other words, the P4 eats the Itanium for breakfast even if you limit it to 2GHz due to some "process" rule. And if you don't make up any silly rules, but simply look at "what's available today", you get 2.8Ghz Xeon MP with 2MB L3 cache: 907 SPECint or even better (much cheaper CPUs): 3.06 GHz P4 with 512kB L2 cache: 1074 SPECint AMD Athlon XP 2800+: 933 SPECint These are systems that you can buy today, with _less_ cache and clearly much higher performance (comparing the best-performing published ia-64 with the best P4 on specint, the P4 is 32% faster). Even with the "you can only run the P4 at 2GHz because that is all it ever ran at in 0.18" thing, the ia-64 falls behind. > Linus> The only thing that is meaningful is "performace at the same > Linus> time of general availability". > > You claimed that x86 is inherently superior. I provided data that > shows that much of this apparent superiority is simply an effect of > the larger volume that x86 achieves today. And I showed that your data is flawed. Clearly the P4 outperforms ia-64 on an architectural level _even_ when taking process into account. Linus ^ permalink raw reply [flat|nested] 266+ messages in thread
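For reference, the 32% figure follows directly from the two best published scores cited in the message above (1074 SPECint for the 3.06 GHz P4 and 810 for the 1GHz/3MB Itanium 2):

    \frac{1074}{810} \approx 1.33

i.e. the best published P4 score is roughly a third higher than the best published ia-64 score, which is the gap being rounded to 32% here.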
* Re: Minutes from Feb 21 LSE Call 2003-02-24 2:54 ` Linus Torvalds @ 2003-02-24 3:08 ` David Mosberger 2003-02-24 21:42 ` Andrea Arcangeli 1 sibling, 0 replies; 266+ messages in thread From: David Mosberger @ 2003-02-24 3:08 UTC (permalink / raw) To: Linus Torvalds; +Cc: davidm, David Lang, linux-kernel >>>>> On Sun, 23 Feb 2003 18:54:41 -0800 (PST), Linus Torvalds <torvalds@transmeta.com> said: Linus> In other words, the P4 eats the Itanium for breakfast even if Linus> you limit it to 2GHz due to some "process" rule. Ugh, 842 vs 810 is "eating for breakfast"? In my lexicon, that's "in the same ballpark". Besides the 2GHz Xeon MP is a 0.13um part. >> You claimed that x86 is inherently superior. I provided data that >> shows that much of this apparent superiority is simply an effect of >> the larger volume that x86 achieves today. Linus> And I showed that your data is flawed. No, you did not. --david ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 2:54 ` Linus Torvalds 2003-02-24 3:08 ` David Mosberger @ 2003-02-24 21:42 ` Andrea Arcangeli 1 sibling, 0 replies; 266+ messages in thread From: Andrea Arcangeli @ 2003-02-24 21:42 UTC (permalink / raw) To: Linus Torvalds; +Cc: davidm, David Lang, linux-kernel On Sun, Feb 23, 2003 at 06:54:41PM -0800, Linus Torvalds wrote: > > On Sun, 23 Feb 2003, David Mosberger wrote: > > >> 2 GHz Xeon: 701 SPECint > > >> 1 GHz Itanium 2: 810 SPECint > > > > >> That is, Itanium 2 is 15% faster. > > > > Unfortunately, HP doesn't sell 1.5MB/1GHz Itanium 2 workstations, but > > we can do some educated guessing: > > > > 1GHz Itanium 2, 3MB cache: 810 SPECint > > 900MHz Itanium 2, 1.5MB cache: 674 SPECint > > > > Assuming pure frequency scaling, a 1GHz/1.5MB Itanium 2 would get > > around 750 SPECint. In reality, it would get slightly less, but most > > likely substantially more than 701. > > And as Dean pointed out: > > 2Ghz Xeon MP with 2MB L3 cache: 842 SPECint > > In other words, the P4 eats the Itanium for breakfast even if you limit it > to 2GHz due to some "process" rule. > > And if you don't make up any silly rules, but simply look at "what's > available today", you get > > 2.8Ghz Xeon MP with 2MB L3 cache: 907 SPECint > > or even better (much cheaper CPUs): > > 3.06 GHz P4 with 512kB L2 cache: 1074 SPECint > AMD Athlon XP 2800+: 933 SPECint > > These are systems that you can buy today. With _less_ cache, and clearly > much higher performance (the difference between the best-performing > published ia-64 and the best P4 on specint, the P4 is 32% faster. Even > with the "you can only run the P4 at 2GHz because that is all it ever ran > at in 0.18" thing the ia-64 falls behind. I agree; the cache difference especially makes any comparison uninteresting to my eyes (it's similar to running dbench with different pagecache sizes and comparing the results). But I've a side note on these matters in favour of the 64bit platforms. I could be wrong, but AFAIK some of the specint testcases generate a doubled data memory footprint if compiled 64bit, so I guess some of the testcases should really be called speclong and not specint. (however I don't think those testcases alone can explain a global 32% difference, but still there would be some difference in favour of the 32bit platform) So in short, I currently believe specint is not a good benchmark to compare a 64bit cpu to a 32bit cpu: 64bit can only lose in specint if the cpu is exactly the same and only the data 'longs' are changed to 64bit. To do a really fair comparison one should first change the source, replacing every "long" with either a "long long" or an "int"; only then will it be fair to compare specint results between 32bit and 64bit cpus. I never used specint myself, so don't ask me more details on this, and again I could be wrong, but really - if I'm right - somebody should go over the source and make a kind of unofficial (but official) patch available to people to generate a specint testsuite usable to compare 32bit with 64bit results, or lots of effort will be wasted by people attempting to do the impossible. I mean, if the memory bus is the same hardware in both the 32bit and 64bit runs, the double memory footprint will run slower and there's nothing the OS or the hardware can do about it (and dozens of mbytes of ram won't fit in l1 cache, not even on the itanium 8). 
The benchmark suite really must be fixed to ensure the 32bit and 64bit compilation will generate the same _data_ memory footprint if one wants to make comparisons between the two. Andrea ^ permalink raw reply [flat|nested] 266+ messages in thread
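A minimal sketch of the kind of source change Andrea is suggesting, assuming an ILP32 ABI for the 32bit build (long is 4 bytes) and an LP64 ABI for the 64bit build (long and pointers are 8 bytes); the array and its name are hypothetical illustrations, not taken from the actual SPEC sources:

#include <stdio.h>
#include <stdint.h>

/*
 * On an ILP32 target sizeof(long) == 4, on an LP64 target sizeof(long) == 8,
 * so a large array of longs, and with it the benchmark's data footprint,
 * doubles when the same source is simply recompiled 64bit.
 */
static long counters_long[1 << 20];     /* 4MB on ILP32, 8MB on LP64 */

/*
 * The suggested fix: use a type whose size is the same on both builds,
 * e.g. plain int or a fixed-width type such as int32_t.
 */
static int32_t counters_fixed[1 << 20]; /* 4MB on both ILP32 and LP64 */

int main(void)
{
	printf("sizeof(long)    = %zu\n", sizeof(long));
	printf("long array      = %zu bytes\n", sizeof(counters_long));
	printf("fixed array     = %zu bytes\n", sizeof(counters_fixed));
	return 0;
}

Whether SPEC's run rules would accept such a patched source as a reportable result is a separate question; the point is only that with a change like this the data footprint, rather than the width of the ISA, stops differing between the 32bit and 64bit runs.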
* Re: Minutes from Feb 21 LSE Call 2003-02-23 22:54 ` David Mosberger 2003-02-23 22:56 ` David Lang 2003-02-24 0:40 ` Linus Torvalds @ 2003-02-24 1:06 ` dean gaudet 2003-02-24 1:56 ` David Mosberger 2 siblings, 1 reply; 266+ messages in thread From: dean gaudet @ 2003-02-24 1:06 UTC (permalink / raw) To: davidm; +Cc: David Lang, Linus Torvalds, linux-kernel On Sun, 23 Feb 2003, David Mosberger wrote: > >>>>> On Sun, 23 Feb 2003 14:48:48 -0800 (PST), David Lang <david.lang@digitalinsight.com> said: > > David.L> I would call a 15% lead over the ia64 pretty substantial. > > Huh? Did you misread my mail? > > 2 GHz Xeon: 701 SPECint > 1 GHz Itanium 2: 810 SPECint > > That is, Itanium 2 is 15% faster. according to pricewatch i could buy ten 2GHz Xeons for about the cost of one Itanium 2 900MHz. that's not even considering the cost of the motherboards i'd need to plug those into. -dean ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 1:06 ` dean gaudet @ 2003-02-24 1:56 ` David Mosberger 2003-02-24 2:15 ` dean gaudet 0 siblings, 1 reply; 266+ messages in thread From: David Mosberger @ 2003-02-24 1:56 UTC (permalink / raw) To: dean gaudet; +Cc: davidm, David Lang, Linus Torvalds, linux-kernel >>>>> On Sun, 23 Feb 2003 17:06:29 -0800 (PST), dean gaudet <dean-list-linux-kernel@arctic.org> said: Dean> On Sun, 23 Feb 2003, David Mosberger wrote: >> >>>>> On Sun, 23 Feb 2003 14:48:48 -0800 (PST), David Lang <david.lang@digitalinsight.com> said: David.L> I would call a 15% lead over the ia64 pretty substantial. >> Huh? Did you misread my mail? >> 2 GHz Xeon: 701 SPECint >> 1 GHz Itanium 2: 810 SPECint >> That is, Itanium 2 is 15% faster. Dean> according to pricewatch i could buy ten 2GHz Xeons for about Dean> the cost of one Itanium 2 900MHz. Not if you want comparable cache-sizes [1]: Intel Xeon MP, 2MB L3 cache: $3692 Itanium 2, 1 GHZ, 3MB L3 cache: $4226 Itanium 2, 1 GHZ, 1.5MB L3 cache: $2247 Itanium 2, 900 MHZ, 1.5MB L3 cache: $1338 Intel basically prices things by the cache size. --david [1]: http://www.intel.com/intel/finance/pricelist/ ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 1:56 ` David Mosberger @ 2003-02-24 2:15 ` dean gaudet 2003-02-24 3:11 ` David Mosberger 0 siblings, 1 reply; 266+ messages in thread From: dean gaudet @ 2003-02-24 2:15 UTC (permalink / raw) To: davidm; +Cc: David Lang, Linus Torvalds, linux-kernel On Sun, 23 Feb 2003, David Mosberger wrote: > >>>>> On Sun, 23 Feb 2003 17:06:29 -0800 (PST), dean gaudet <dean-list-linux-kernel@arctic.org> said: > > Dean> On Sun, 23 Feb 2003, David Mosberger wrote: > >> >>>>> On Sun, 23 Feb 2003 14:48:48 -0800 (PST), David Lang <david.lang@digitalinsight.com> said: > > David.L> I would call a 15% lead over the ia64 pretty substantial. > > >> Huh? Did you misread my mail? > > >> 2 GHz Xeon: 701 SPECint > >> 1 GHz Itanium 2: 810 SPECint > > >> That is, Itanium 2 is 15% faster. > > Dean> according to pricewatch i could buy ten 2GHz Xeons for about > Dean> the cost of one Itanium 2 900MHz. > > Not if you want comparable cache-sizes [1]: somehow i doubt you're quoting Xeon numbers w/2MB of cache above. in fact, here's a 701 specint with only 512KB of cache @ 2GHz: http://www.spec.org/osg/cpu2000/results/res2002q1/cpu2000-20020128-01232.html my point was that if you had comparable die sizes the 15% "advantage" would disappear. there's a hell of a lot which could be done with the approximately double die size that the itanium 2 has compared to any of the commodity x86 parts. but then the cost per part would be correspondingly higher... which is exactly what is shown in the intel cost numbers. a more fair comparison would be your itanium 2 number with this: http://www.spec.org/osg/cpu2000/results/res2002q4/cpu2000-20021021-01742.html 2MB L2 Xeon @ 2GHz, scores 842. is this the itanium 2 number you're quoting us? http://www.spec.org/osg/cpu2000/results/res2002q3/cpu2000-20020711-01469.html 'cause that's with 3MB L3. -dean > > Intel Xeon MP, 2MB L3 cache: $3692 > > Itanium 2, 1 GHZ, 3MB L3 cache: $4226 > Itanium 2, 1 GHZ, 1.5MB L3 cache: $2247 > Itanium 2, 900 MHZ, 1.5MB L3 cache: $1338 > > Intel basically prices things by the cache size. > > --david > > [1]: http://www.intel.com/intel/finance/pricelist/ > ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 2:15 ` dean gaudet @ 2003-02-24 3:11 ` David Mosberger 0 siblings, 0 replies; 266+ messages in thread From: David Mosberger @ 2003-02-24 3:11 UTC (permalink / raw) To: dean gaudet; +Cc: davidm, David Lang, Linus Torvalds, linux-kernel >>>>> On Sun, 23 Feb 2003 18:15:29 -0800 (PST), dean gaudet <dean-list-linux-kernel@arctic.org> said: Dean> somehow i doubt you're quoting Xeon numbers w/2MB of cache above. I quoted the Xeon 0.13um price because there was no 0.18um part with >512KB cache (for better or worse, Intel basically prices CPUs by cache-size). --david ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 22:40 ` David Mosberger 2003-02-23 22:48 ` David Lang @ 2003-02-23 23:06 ` Martin J. Bligh 2003-02-23 23:59 ` David Mosberger 1 sibling, 1 reply; 266+ messages in thread From: Martin J. Bligh @ 2003-02-23 23:06 UTC (permalink / raw) To: davidm, Linus Torvalds; +Cc: linux-kernel > Linus> Last I saw P4 was kicking ia-64 butt on specint and friends. > > I don't think so. According to Intel [1], the highest clockfrequency > for a 0.18um part is 2GHz (both for Xeon and P4, for Xeon MP it's > 1.5GHz). The highest reported SPECint for a 2GHz Xeon seems to be 701 > [2]. In comparison, a 1GHz McKinley gets a SPECint of 810 [3]. > > --david > > [1] http://www.intel.com/support/processors/xeon/corespeeds.htm > [2] > http://www.specbench.org/cpu2000/results/res2002q1/cpu2000-20020128-01232 > .html [3] > http://www.specbench.org/cpu2000/results/res2002q3/cpu2000-20020711-01469 > .html - Got anything more real-world than SPECint type microbenchmarks? M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 23:06 ` Martin J. Bligh @ 2003-02-23 23:59 ` David Mosberger 2003-02-24 3:49 ` Gerrit Huizenga 0 siblings, 1 reply; 266+ messages in thread From: David Mosberger @ 2003-02-23 23:59 UTC (permalink / raw) To: Martin J. Bligh; +Cc: davidm, Linus Torvalds, linux-kernel >>>>> On Sun, 23 Feb 2003 15:06:56 -0800, "Martin J. Bligh" <mbligh@aracnet.com> said: Linus> Last I saw P4 was kicking ia-64 butt on specint and friends. >> I don't think so. According to Intel [1], the highest >> clock frequency for a 0.18um part is 2GHz (both for Xeon and P4, >> for Xeon MP it's 1.5GHz). The highest reported SPECint for a >> 2GHz Xeon seems to be 701 [2]. In comparison, a 1GHz McKinley >> gets a SPECint of 810 [3]. Martin> Got anything more real-world than SPECint type Martin> microbenchmarks? SPECint a microbenchmark? You seem to be redefining the meaning of the word (last time I checked, lmbench was a microbenchmark). Ironically, Itanium 2 seems to do even better in the "real world" than suggested by benchmarks, partly because of the large caches, memory bandwidth and, I'm guessing, partly because of its straightforward micro-architecture (e.g., a synchronization operation takes on the order of 10 cycles, as compared to on the order of dozens and hundreds of cycles on the Pentium 4). BTW: I hope I don't sound too negative on the Pentium 4/Xeon. It's certainly an excellent performer for many things. I just want to point out that Itanium 2 also is a good performer, probably more so than many on this list seem to be willing to give it credit for. --david ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 23:59 ` David Mosberger @ 2003-02-24 3:49 ` Gerrit Huizenga 2003-02-24 4:07 ` David Mosberger 0 siblings, 1 reply; 266+ messages in thread From: Gerrit Huizenga @ 2003-02-24 3:49 UTC (permalink / raw) To: davidm; +Cc: Martin J. Bligh, Linus Torvalds, linux-kernel On Sun, 23 Feb 2003 15:59:12 PST, David Mosberger wrote: > >>>>> On Sun, 23 Feb 2003 15:06:56 -0800, "Martin J. Bligh" <mbligh@aracnet.com> said: > Martin> Got anything more real-world than SPECint type > Martin> microbenchmarks? > > SPECint a microbenchmark? You seem to be redefining the meaning of > the word (last time I checked, lmbench was a microbenchmark). > > Ironically, Itanium 2 seems to do even better in the "real world" than > suggested by benchmarks, partly because of the large caches, memory > bandwidth and, I'm guessing, partly because of its straightforward > micro-architecture (e.g., a synchronization operation takes on the > order of 10 cycles, as compared to on the order of dozens and hundreds of > cycles on the Pentium 4). Two major types of high end workloads here (and IA64 is definitely still in the "high end" category). There are the scientific and technical style workloads, which SPECcpu (of which CINT and CFP are the integer and floating point subsets) might reasonably categorize, and some of the "system" workloads, such as those roughly categorized by things like TPC-C/H/W/etc, or SPECweb/jbb/jvm/jAppServer which exercise some more complex, multi-tier interactions. I haven't seen anything recently on the higher level System benchmarks for IA64 - I'm not sure that anyone is doing much that is significant in this space, where IA32 results practically saturate the overall reported results. I know SGI is generally more interested in the scientific and technical area. I would assume that HP would be more interested in the broader system deployment, except that too much activity in that area might endanger parisc sales. IBM is doing some stuff in the IA64 space, but more in IA32 and obviously PPC64. That leaves NEC and a few others that I don't know about. It may be that IA64 isn't really ready for the system level stuff or that it competes with too many entrenched platforms to make it economically viable. But, I would be really interested in seeing anything other than "scientific and technical" based benchmarks for IA64. I don't think there is much out there. That implies that nobody is interested in IA64 or that it doesn't perform "competitively" in that space... gerrit ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 3:49 ` Gerrit Huizenga @ 2003-02-24 4:07 ` David Mosberger 2003-02-24 4:34 ` Martin J. Bligh 2003-02-24 5:02 ` Gerrit Huizenga 0 siblings, 2 replies; 266+ messages in thread From: David Mosberger @ 2003-02-24 4:07 UTC (permalink / raw) To: Gerrit Huizenga; +Cc: davidm, Martin J. Bligh, Linus Torvalds, linux-kernel >>>>> On Sun, 23 Feb 2003 19:49:38 -0800, Gerrit Huizenga <gh@us.ibm.com> said: Gerrit> I haven't seen anything recently on the higher level System bencmarks Gerrit> for IA64 Did you miss the TPC-C announcement from last November & December? rx5670 4-way Itanium 2: 80498 tpmC @ $5.30/transaction (Oracle 10 on Linux). rx5670 4-way Itanium 2: 87741 tpmC @ $5.03/transaction (MS SQL on Windows). Both world-records for 4-way machines when they were announced (not sure if that's still true). --david ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 4:07 ` David Mosberger @ 2003-02-24 4:34 ` Martin J. Bligh 2003-02-24 5:02 ` Gerrit Huizenga 1 sibling, 0 replies; 266+ messages in thread From: Martin J. Bligh @ 2003-02-24 4:34 UTC (permalink / raw) To: davidm, Gerrit Huizenga; +Cc: Linus Torvalds, linux-kernel > Gerrit> I haven't seen anything recently on the higher level System > bencmarks Gerrit> for IA64 > > Did you miss the TPC-C announcement from last November & December? > > rx5670 4-way Itanium 2: 80498 tpmC @ $5.30/transaction (Oracle 10 on > Linux). rx5670 4-way Itanium 2: 87741 tpmC @ $5.03/transaction (MS SQL > on Windows). > > Both world-records for 4-way machines when they were announced (not > sure if that's still true). Cool - thanks. That's more what I was looking for. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 4:07 ` David Mosberger 2003-02-24 4:34 ` Martin J. Bligh @ 2003-02-24 5:02 ` Gerrit Huizenga 1 sibling, 0 replies; 266+ messages in thread From: Gerrit Huizenga @ 2003-02-24 5:02 UTC (permalink / raw) To: davidm; +Cc: Martin J. Bligh, Linus Torvalds, linux-kernel On Sun, 23 Feb 2003 20:07:43 PST, David Mosberger wrote: > >>> On Sun, 23 Feb 2003 19:49:38 -0800, Gerrit Huizenga <gh@us.ibm.com> said: > > Gerrit> I haven't seen anything recently on the higher level System bencmarks > Gerrit> for IA64 > > Did you miss the TPC-C announcement from last November & December? > > rx5670 4-way Itanium 2: 80498 tpmC @ $5.30/transaction (Oracle 10 on Linux). > rx5670 4-way Itanium 2: 87741 tpmC @ $5.03/transaction (MS SQL on Windows). > > Both world-records for 4-way machines when they were announced (not > sure if that's still true). Yeah, I missed that. And my spot checking didn't catch anything IA64 related. Was there anything else on IA64 that competed with the current rack of 8-way IA32 boxen, or the upcoming 16-way stuff rolling out this year? Seems like the larger phys memory support should help on several of those benchmarks... The thin number of IA64 results indicates the difference in marketing/sales, although better price/performance should be able to change that... ;) Odd that MS is still outdoing Linux (or SQL is outdoing Oracle on Linux). Will be nice when that changes... gerrit ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 19:17 ` Linus Torvalds 2003-02-23 19:29 ` David Mosberger @ 2003-02-23 20:21 ` Xavier Bestel 2003-02-23 20:50 ` Martin J. Bligh ` (4 more replies) 2003-02-23 21:15 ` John Bradford 2003-02-23 21:55 ` William Lee Irwin III 3 siblings, 5 replies; 266+ messages in thread From: Xavier Bestel @ 2003-02-23 20:21 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel Mailing List Le dim 23/02/2003 à 20:17, Linus Torvalds a écrit : > And the baroque instruction encoding on the x86 is actually a _good_ > thing: it's a rather dense encoding, which means that you win on icache. > It's a bit hard to decode, but who cares? Existing chips do well at > decoding, and thanks to the icache win they tend to perform better - and > they load faster too (which is important - you can make your CPU have > big caches, but _nothing_ saves you from the cold-cache costs). Next step: hardware gzip ? ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 20:21 ` Xavier Bestel @ 2003-02-23 20:50 ` Martin J. Bligh 2003-02-23 23:57 ` Alan Cox 2003-02-23 21:35 ` Alan Cox ` (3 subsequent siblings) 4 siblings, 1 reply; 266+ messages in thread From: Martin J. Bligh @ 2003-02-23 20:50 UTC (permalink / raw) To: Xavier Bestel; +Cc: Linux Kernel Mailing List >> And the baroque instruction encoding on the x86 is actually a _good_ >> thing: it's a rather dense encoding, which means that you win on icache. >> It's a bit hard to decode, but who cares? Existing chips do well at >> decoding, and thanks to the icache win they tend to perform better - and >> they load faster too (which is important - you can make your CPU have >> big caches, but _nothing_ saves you from the cold-cache costs). > > Next step: hardware gzip ? They did that already ... IBM were demonstrating such a thing a couple of years ago. Don't see it helping with icache though, as it unpacks between memory and the processor, IIRC. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 20:50 ` Martin J. Bligh @ 2003-02-23 23:57 ` Alan Cox 2003-02-24 1:26 ` Kenneth Johansson 0 siblings, 1 reply; 266+ messages in thread From: Alan Cox @ 2003-02-23 23:57 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Xavier Bestel, Linux Kernel Mailing List On Sun, 2003-02-23 at 20:50, Martin J. Bligh wrote: > >> And the baroque instruction encoding on the x86 is actually a _good_ > >> thing: it's a rather dense encoding, which means that you win on icache. > >> It's a bit hard to decode, but who cares? Existing chips do well at > >> decoding, and thanks to the icache win they tend to perform better - and > >> they load faster too (which is important - you can make your CPU have > >> big caches, but _nothing_ saves you from the cold-cache costs). > > > > Next step: hardware gzip ? > > They did that already ... IBM were demonstrating such a thing a couple of > years ago. Don't see it helping with icache though, as it unpacks between > memory and the processory, IIRC. I saw the L2/L3 compressed cache thing, and I thought "doh!", and I watched and I've not seen it for a long time. What happened to it ? ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 23:57 ` Alan Cox @ 2003-02-24 1:26 ` Kenneth Johansson 2003-02-24 1:53 ` dean gaudet 0 siblings, 1 reply; 266+ messages in thread From: Kenneth Johansson @ 2003-02-24 1:26 UTC (permalink / raw) To: Alan Cox; +Cc: Martin J. Bligh, Xavier Bestel, Linux Kernel Mailing List On Mon, 2003-02-24 at 00:57, Alan Cox wrote: > On Sun, 2003-02-23 at 20:50, Martin J. Bligh wrote: > > >> And the baroque instruction encoding on the x86 is actually a _good_ > > >> thing: it's a rather dense encoding, which means that you win on icache. > > >> It's a bit hard to decode, but who cares? Existing chips do well at > > >> decoding, and thanks to the icache win they tend to perform better - and > > >> they load faster too (which is important - you can make your CPU have > > >> big caches, but _nothing_ saves you from the cold-cache costs). > > > > > > Next step: hardware gzip ? > > > > They did that already ... IBM were demonstrating such a thing a couple of > > years ago. Don't see it helping with icache though, as it unpacks between > > memory and the processor, IIRC. > > I saw the L2/L3 compressed cache thing, and I thought "doh!", and I watched and > I've not seen it for a long time. What happened to it ? > http://www-3.ibm.com/chips/techlib/techlib.nsf/products/CodePack If you are thinking of this, it does look like people were not using it (I know I'm not). It reduces memory for instructions, but that is all, and memory, it seems, is not a problem, at least not for instructions. It does not exist in new CPUs from IBM; I don't know the official reason for the removal. If you really do mean compressed cache, I don't think anybody has done that for real. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 1:26 ` Kenneth Johansson @ 2003-02-24 1:53 ` dean gaudet 0 siblings, 0 replies; 266+ messages in thread From: dean gaudet @ 2003-02-24 1:53 UTC (permalink / raw) To: Kenneth Johansson Cc: Alan Cox, Martin J. Bligh, Xavier Bestel, Linux Kernel Mailing List On Sun, 24 Feb 2003, Kenneth Johansson wrote: > If you really do mean compressed cache I don't think anybody has done > that for real. people are doing this *for real* -- it really depends on what you define as compressed. ARM thumb is definitely a compression function for code. x86 native instructions are compressed compared to the RISC-like micro-ops which a processor like athlon, p3, and p4 actually execute. for similar operations, an x86 would average probably 1.5 bytes to encode what a 32-bit RISC would need 4 bytes to encode. -dean ^ permalink raw reply [flat|nested] 266+ messages in thread
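dean's density figure is easy to sanity-check yourself. Below is a minimal sketch, assuming GCC plus some fixed-width 32-bit RISC cross-compiler (the powerpc-linux-gcc name is an assumption; any comparable target works): compile the same function both ways and compare the .text sizes that size(1) reports. The exact ratio varies with the code and flags, but the x86 object normally comes out noticeably smaller.

    /*
     * tiny_sum.c - rough check of the code-density claim above.
     *
     *     gcc -O2 -c tiny_sum.c && size tiny_sum.o              (native x86)
     *     powerpc-linux-gcc -O2 -c tiny_sum.c && size tiny_sum.o
     */
    int sum(const int *p, int n)
    {
            int i, s = 0;

            for (i = 0; i < n; i++)
                    s += p[i];      /* load + add: a few bytes on x86, 8 on a 32-bit RISC */
            return s;
    }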
* Re: Minutes from Feb 21 LSE Call 2003-02-23 20:21 ` Xavier Bestel 2003-02-23 20:50 ` Martin J. Bligh @ 2003-02-23 21:35 ` Alan Cox 2003-02-23 21:41 ` Linus Torvalds ` (2 subsequent siblings) 4 siblings, 0 replies; 266+ messages in thread From: Alan Cox @ 2003-02-23 21:35 UTC (permalink / raw) To: Xavier Bestel; +Cc: Linus Torvalds, Linux Kernel Mailing List On Sun, 2003-02-23 at 20:21, Xavier Bestel wrote: > > they load faster too (which is important - you can make your CPU have > > big caches, but _nothing_ saves you from the cold-cache costs). > > Next step: hardware gzip ? gzip doesn't work because it's not unpackable from an arbitrary point. x86 in many ways is compressed, with common codes carefully bitpacked. A horrible cisc design constraint for size has come full circle and turned into a very nice memory/cache optimisation. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 20:21 ` Xavier Bestel 2003-02-23 20:50 ` Martin J. Bligh 2003-02-23 21:35 ` Alan Cox @ 2003-02-23 21:41 ` Linus Torvalds 2003-02-24 0:01 ` Bill Davidsen 2003-02-24 0:36 ` yodaiken 4 siblings, 0 replies; 266+ messages in thread From: Linus Torvalds @ 2003-02-23 21:41 UTC (permalink / raw) To: Xavier Bestel; +Cc: Linux Kernel Mailing List On 23 Feb 2003, Xavier Bestel wrote: > Le dim 23/02/2003 à 20:17, Linus Torvalds a écrit : > > > And the baroque instruction encoding on the x86 is actually a _good_ > > thing: it's a rather dense encoding, which means that you win on icache. > > It's a bit hard to decode, but who cares? Existing chips do well at > > decoding, and thanks to the icache win they tend to perform better - and > > they load faster too (which is important - you can make your CPU have > > big caches, but _nothing_ saves you from the cold-cache costs). > > Next step: hardware gzip ? Not gzip, no. It needs to be a random-access compression with reasonably small blocks, not something designed for streaming. Which makes it harder to do right and efficiently. But ARM has Thumb (not the same thing, but same idea), and at least some PPC chips have a page-based compressor - IBM calls it "CodePack" in case you want to google for it. Linus ^ permalink raw reply [flat|nested] 266+ messages in thread
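To make the random-access requirement concrete, here is a minimal userspace sketch of per-block compression: each fixed-size block is compressed independently and indexed, so any one block can be rebuilt without decompressing the rest of the image. zlib stands in for whatever codec real hardware would use, and the block size, structure and function names are invented for illustration; none of this is taken from CodePack.

    #include <stdlib.h>
    #include <zlib.h>

    #define BLOCK 256                     /* decompression granularity */

    struct packed_image {
            unsigned char *data;          /* concatenated compressed blocks */
            unsigned long *off;           /* off[i] = start of block i in data[] */
            unsigned long *len;           /* len[i] = compressed size of block i */
            unsigned long nblocks;
    };

    /* Compress 'size' bytes (assumed a multiple of BLOCK) block by block. */
    int pack(struct packed_image *img, const unsigned char *src, unsigned long size)
    {
            unsigned long i, pos = 0;

            img->nblocks = size / BLOCK;
            img->data = malloc(img->nblocks * compressBound(BLOCK));
            img->off  = malloc(img->nblocks * sizeof(*img->off));
            img->len  = malloc(img->nblocks * sizeof(*img->len));
            if (!img->data || !img->off || !img->len)
                    return -1;

            for (i = 0; i < img->nblocks; i++) {
                    uLongf clen = compressBound(BLOCK);

                    if (compress(img->data + pos, &clen, src + i * BLOCK, BLOCK) != Z_OK)
                            return -1;
                    img->off[i] = pos;
                    img->len[i] = clen;
                    pos += clen;
            }
            return 0;
    }

    /* Random access: rebuild only the block containing byte 'addr'. */
    int fetch_block(const struct packed_image *img, unsigned long addr,
                    unsigned char out[BLOCK])
    {
            unsigned long i = addr / BLOCK;
            uLongf dlen = BLOCK;

            if (i >= img->nblocks)
                    return -1;
            return uncompress(out, &dlen, img->data + img->off[i],
                              img->len[i]) == Z_OK ? 0 : -1;
    }

Compressing blocks independently costs some compression ratio compared with one long stream, but it is exactly what buys the "unpack from an arbitrary point" property the streaming formats lack.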
* Re: Minutes from Feb 21 LSE Call 2003-02-23 20:21 ` Xavier Bestel ` (2 preceding siblings ...) 2003-02-23 21:41 ` Linus Torvalds @ 2003-02-24 0:01 ` Bill Davidsen 2003-02-24 0:36 ` yodaiken 4 siblings, 0 replies; 266+ messages in thread From: Bill Davidsen @ 2003-02-24 0:01 UTC (permalink / raw) To: Xavier Bestel; +Cc: Linus Torvalds, Linux Kernel Mailing List On 23 Feb 2003, Xavier Bestel wrote: > Le dim 23/02/2003 à 20:17, Linus Torvalds a écrit : > > > And the baroque instruction encoding on the x86 is actually a _good_ > > thing: it's a rather dense encoding, which means that you win on icache. > > It's a bit hard to decode, but who cares? Existing chips do well at > > decoding, and thanks to the icache win they tend to perform better - and > > they load faster too (which is important - you can make your CPU have > > big caches, but _nothing_ saves you from the cold-cache costs). > > Next step: hardware gzip ? If the firmware issues were better defined in Intel ia32 chips, I could see a gzip instruction pointing to blocks in memory. As a proof of concept, not a big win. -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 20:21 ` Xavier Bestel ` (3 preceding siblings ...) 2003-02-24 0:01 ` Bill Davidsen @ 2003-02-24 0:36 ` yodaiken 4 siblings, 0 replies; 266+ messages in thread From: yodaiken @ 2003-02-24 0:36 UTC (permalink / raw) To: Xavier Bestel; +Cc: Linus Torvalds, Linux Kernel Mailing List On Sun, Feb 23, 2003 at 09:21:27PM +0100, Xavier Bestel wrote: > Le dim 23/02/2003 à 20:17, Linus Torvalds a écrit : > > > And the baroque instruction encoding on the x86 is actually a _good_ > > thing: it's a rather dense encoding, which means that you win on icache. > > It's a bit hard to decode, but who cares? Existing chips do well at > > decoding, and thanks to the icache win they tend to perform better - and > > they load faster too (which is important - you can make your CPU have > > big caches, but _nothing_ saves you from the cold-cache costs). > > Next step: hardware gzip ? See ARM "thumb" ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 19:17 ` Linus Torvalds 2003-02-23 19:29 ` David Mosberger 2003-02-23 20:21 ` Xavier Bestel @ 2003-02-23 21:15 ` John Bradford 2003-02-23 21:45 ` Linus Torvalds 2003-02-23 21:55 ` William Lee Irwin III 3 siblings, 1 reply; 266+ messages in thread From: John Bradford @ 2003-02-23 21:15 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel > >If I didn't know this mattered I wouldn't bother with the barfbags. > >I just wouldn't deal with it. > > Why? > > The x86 is a hell of a lot nicer than the ppc32, for example. On the > x86, you get good performance and you can ignore the design mistakes (ie > segmentation) by just basically turning them off. I could be wrong, but I always thought that Sparc, and a lot of other architectures could mark arbitrary areas of memory, (such as the stack), as non-executable, whereas x86 only lets you have one non-executable segment. John. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 21:15 ` John Bradford @ 2003-02-23 21:45 ` Linus Torvalds 2003-02-24 1:25 ` Benjamin LaHaise 0 siblings, 1 reply; 266+ messages in thread From: Linus Torvalds @ 2003-02-23 21:45 UTC (permalink / raw) To: John Bradford; +Cc: linux-kernel On Sun, 23 Feb 2003, John Bradford wrote: > > I could be wrong, but I always thought that Sparc, and a lot of other > architectures could mark arbitrary areas of memory, (such as the > stack), as non-executable, whereas x86 only lets you have one > non-executable segment. The x86 has that stupid "executability is tied to a segment" thing, which means that you cannot make things executable on a page-per-page level. It's a mistake, but it's one that _could_ be fixed in the architecture if it really mattered, the same way the WP bit got fixed in the i486. I'm definitely not saying that the x86 is perfect. It clearly isn't. But a lot of people complain about the wrong things, and a lot of people who tried to "fix" things just made them worse by throwing out the good parts too. Linus ^ permalink raw reply [flat|nested] 266+ messages in thread
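For reference, page-granular execute permission as seen from userspace is just a missing PROT_EXEC bit; a minimal Linux sketch follows. On an architecture with a per-page execute bit the kernel can enforce the mapping below as data-only; on classic x86, for the reason described above, it can at best be approximated with segment limits, so jumping into the mapping may silently succeed.

    #include <sys/mman.h>
    #include <stdio.h>

    int main(void)
    {
            size_t len = 4096;
            /* readable and writable, but deliberately not PROT_EXEC */
            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            if (p == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }
            /* Jumping into 'p' should fault only where per-page execute
             * permission is actually enforced by the hardware. */
            munmap(p, len);
            return 0;
    }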
* Re: Minutes from Feb 21 LSE Call 2003-02-23 21:45 ` Linus Torvalds @ 2003-02-24 1:25 ` Benjamin LaHaise 0 siblings, 0 replies; 266+ messages in thread From: Benjamin LaHaise @ 2003-02-24 1:25 UTC (permalink / raw) To: Linus Torvalds; +Cc: John Bradford, linux-kernel On Sun, Feb 23, 2003 at 01:45:16PM -0800, Linus Torvalds wrote: > The x86 has that stupid "executability is tied to a segment" thing, which > means that you cannot make things executable on a page-per-page level. > It's a mistake, but it's one that _could_ be fixed in the architecture if > it really mattered, the same way the WP bit got fixed in the i486. I've been thinking about this recently, and it turns out that the whole point is moot with a fixed address vsyscall page: non-exec stacks are trivially circumvented by using the vsyscall page as a known starting point for the exploit. All the other tricks of changing the starting stack offset and using randomized load addresses don't help at all, since the exploit can merely use the vsyscall page to perform various operations. Personally, I'm still a fan of the shared library vsyscall trick, which would allow us to randomize its load address and defeat this problem. -ben ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 19:17 ` Linus Torvalds ` (2 preceding siblings ...) 2003-02-23 21:15 ` John Bradford @ 2003-02-23 21:55 ` William Lee Irwin III 3 siblings, 0 replies; 266+ messages in thread From: William Lee Irwin III @ 2003-02-23 21:55 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel On Sun, Feb 23, 2003 at 07:17:30PM +0000, Linus Torvalds wrote: >> If I didn't know this mattered I wouldn't bother with the barfbags. >> I just wouldn't deal with it. On Sun, Feb 23, 2003 at 07:17:30PM +0000, Linus Torvalds wrote: > The x86 is a hell of a lot nicer than the ppc32, for example. On the > x86, you get good performance and you can ignore the design mistakes (ie > segmentation) by just basically turning them off. We "basically" turn it off, but I was recently reminded it existed, as LDT's are apparently wanted by something in userspace. There seem to be various other unwelcome reminders floating around performance critical paths as well. I vaguely remember segmentation being the only way to enforce execution permissions for mmap(), which we just don't bother doing. On Sun, Feb 23, 2003 at 07:17:30PM +0000, Linus Torvalds wrote: > On the ppc32, the MMU braindamage is not something you can ignore, you > have to write your OS for it and if you turn it off (ie enable soft-fill > on the ones that support it) you now have to have separate paths in the > OS for it. The hashtables don't bother me very much. They can relatively easily be front-ended by radix tree pagetables anyway, and if it sucks, well, no software in the world can save sucky hardware. Hopefully later models fix it to be fast or disablable. I'm more bothered by x86 lacking ASN's. On Sun, Feb 23, 2003 at 07:17:30PM +0000, Linus Torvalds wrote: > And the baroque instruction encoding on the x86 is actually a _good_ > thing: it's a rather dense encoding, which means that you win on icache. > It's a bit hard to decode, but who cares? Existing chips do well at > decoding, and thanks to the icache win they tend to perform better - and > they load faster too (which is important - you can make your CPU have > big caches, but _nothing_ saves you from the cold-cache costs). I'm not so sure, between things cacheline aligning branch targets and space/time tradeoffs with smaller instructions running slower than large sequences of instructions, this stuff gets pretty strange. It still comes out smaller in the end but by a smaller-than-expected though probably still significant margin. There's a good chunk of the instruction set that should probably just be dumped outright, too. On Sun, Feb 23, 2003 at 07:17:30PM +0000, Linus Torvalds wrote: > The low register count isn't an issue when you code in any high-level > language, and it has actually forced x86 implementors to do a hell of a > lot better job than the competition when it comes to memory loads and > stores - which helps in general. While the RISC people were off trying > to optimize their compilers to generate loops that used all 32 registers > efficiently, the x86 implementors instead made the chip run fast on > varied loads and used tons of register renaming hardware (and looking at > _memory_ renaming too). Invariably we get stuck diving into assembly anyway. =) This one is basically me getting irked by looking at disassemblies of random x86 binaries and seeing vast amounts of register spilling. It's probably not a performance issue aside from code bloat esp. given the amount of trickery with the weird L1 cache stack magic and so on. 
On Sun, Feb 23, 2003 at 07:17:30PM +0000, Linus Torvalds wrote: > IA64 made all the mistakes anybody else did, and threw out all the good > parts of the x86 because people thought those parts were ugly. They > aren't ugly, they're the "charming oddity" that makes it do well. Look > at them the right way and you realize that a lot of the grottyness is > exactly _why_ the x86 works so well (yeah, and the fact that they are > everywhere ;). Count me as "not charmed". We've actually tripped over this stuff, and for the most part you've been personally squashing the super low-level bugs like the NT flag business and vsyscall segmentation oddities. IA64 suffers from truly excessive featuritis and there are relatively good chances some (or all) of them will be every bit as unused and hated as segmentation if it actually survives. On Sun, Feb 23, 2003 at 07:17:30PM +0000, Linus Torvalds wrote: > The only real major failure of the x86 is the PAE crud. Let's hope > we'll get to forget it, the same way the DOS people eventually forgot > about their memory extenders. We've not really been able to forget about segments or ISA DMA... The pessimist in me has more or less already resigned me to PAE as a fact of life. On Sun, Feb 23, 2003 at 07:17:30PM +0000, Linus Torvalds wrote: > (Yeah, and maybe IBM will make their ppc64 chips cheap enough that they > will matter, and people can overlook the grottiness there. Right now > Intel doesn't even seem to be interested in "64-bit for the masses", and > maybe IBM will be. AMD certainly seems to be serious about the "masses" > part, which in the end is the only part that really matters). ppc64 is sane in my book (not vendor nepotism, the other "vanilla RISC" machines get the same rating in my book). No idea about marketing stuff. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
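A minimal sketch of what "front-ending the hash table with radix tree pagetables" amounts to in practice: a software-walked, multi-level tree keyed by virtual address, consulted (or used as the refill source) before any hardware hash lookup. The two-level split, type names and 32-bit layout here are invented for illustration and are not the kernel's own.

    #include <stdint.h>
    #include <stddef.h>

    #define PAGE_SHIFT  12
    #define LEVEL_BITS  10                      /* 10 + 10 + 12 = 32-bit VA */
    #define LEVEL_SIZE  (1UL << LEVEL_BITS)
    #define LEVEL_MASK  (LEVEL_SIZE - 1)

    typedef uint32_t pte_t;                     /* 0 means "not present" */

    struct pgdir {
            pte_t *dir[LEVEL_SIZE];             /* pointers to second-level tables */
    };

    /* Return the pte for 'va', or 0 if the translation is absent;
     * a hash-MMU port would use this result to refill the hash table. */
    static pte_t lookup(const struct pgdir *pgd, uint32_t va)
    {
            unsigned int top = (va >> (PAGE_SHIFT + LEVEL_BITS)) & LEVEL_MASK;
            unsigned int mid = (va >> PAGE_SHIFT) & LEVEL_MASK;
            pte_t *pt = pgd->dir[top];

            return pt ? pt[mid] : 0;
    }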
* Re: Minutes from Feb 21 LSE Call 2003-02-23 8:07 ` David Lang 2003-02-23 8:20 ` William Lee Irwin III @ 2003-02-23 19:13 ` David Mosberger 2003-02-23 23:28 ` Benjamin LaHaise 2003-02-26 8:46 ` Eric W. Biederman 2003-02-23 20:48 ` Gerrit Huizenga 2 siblings, 2 replies; 266+ messages in thread From: David Mosberger @ 2003-02-23 19:13 UTC (permalink / raw) To: David Lang Cc: Gerrit Huizenga, Benjamin LaHaise, William Lee Irwin III, Jeff Garzik, linux-kernel >>>>> On Sun, 23 Feb 2003 00:07:50 -0800 (PST), David Lang <david.lang@digitalinsight.com> said: David.L> Garrit, you missed the preior posters point. IA64 had the David.L> same fundamental problem as the Alpha, PPC, and Sparc David.L> processors, it doesn't run x86 binaries. This simply isn't true. Itanium and Itanium 2 have full x86 hardware built into the chip (for better or worse ;-). The speed isn't as good as the fastest x86 chips today, but it's faster (~300MHz P6) than the PCs many of us are using and it certainly meets my needs better than any other x86 "emulation" I have used in the past (which includes FX!32 and its relatives for Alpha). --david ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 19:13 ` David Mosberger @ 2003-02-23 23:28 ` Benjamin LaHaise 2003-02-26 8:46 ` Eric W. Biederman 1 sibling, 0 replies; 266+ messages in thread From: Benjamin LaHaise @ 2003-02-23 23:28 UTC (permalink / raw) To: David Mosberger Cc: David Lang, Gerrit Huizenga, William Lee Irwin III, Jeff Garzik, linux-kernel On Sun, Feb 23, 2003 at 11:13:03AM -0800, David Mosberger wrote: > This simply isn't true. Itanium and Itanium 2 have full x86 hardware > built into the chip (for better or worse ;-). The speed isn't as good > as the fastest x86 chips today, but it's faster (~300MHz P6) than the That hardly counts as reasonably performant: the slowest mainstream chips from Intel and AMD are clocked well over 1 GHz. At least x86-64 will improve the performance of the 32 bit databases people have already invested large amounts of money in, and it will do so without the need for a massive outlay of funds for a new 64 bit license. Why accept more than 10x the cost to migrate to ia64 when a new x86-64 will improve the speed of existing applications, and improve scalability with the transparent addition of a 64 bit kernel? -ben -- Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a> ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 19:13 ` David Mosberger 2003-02-23 23:28 ` Benjamin LaHaise @ 2003-02-26 8:46 ` Eric W. Biederman 1 sibling, 0 replies; 266+ messages in thread From: Eric W. Biederman @ 2003-02-26 8:46 UTC (permalink / raw) To: davidm Cc: David Lang, Gerrit Huizenga, Benjamin LaHaise, William Lee Irwin III, Jeff Garzik, linux-kernel David Mosberger <davidm@napali.hpl.hp.com> writes: > >>>>> On Sun, 23 Feb 2003 00:07:50 -0800 (PST), David Lang > <david.lang@digitalinsight.com> said: > > > David.L> Garrit, you missed the preior posters point. IA64 had the > David.L> same fundamental problem as the Alpha, PPC, and Sparc > David.L> processors, it doesn't run x86 binaries. > > This simply isn't true. Itanium and Itanium 2 have full x86 hardware > built into the chip (for better or worse ;-). The speed isn't as good > as the fastest x86 chips today, but it's faster (~300MHz P6) than the > PCs many of us are using and it certainly meets my needs better than > any other x86 "emulation" I have used in the past (which includes > FX!32 and its relatives for Alpha). I have various random x86 binaries that do not work. My 32bit x86 user space does not run. A 32bit kernel doesn't have a chance. So for me at least the 32bit support is not useful in avoiding converting binaries. For the handful of apps that cannot be recompiled I suspect the support is good enough so you can get them to run somehow. Eric ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 8:07 ` David Lang 2003-02-23 8:20 ` William Lee Irwin III 2003-02-23 19:13 ` David Mosberger @ 2003-02-23 20:48 ` Gerrit Huizenga 2 siblings, 0 replies; 266+ messages in thread From: Gerrit Huizenga @ 2003-02-23 20:48 UTC (permalink / raw) To: David Lang Cc: Benjamin LaHaise, William Lee Irwin III, Jeff Garzik, linux-kernel On Sun, 23 Feb 2003 00:07:50 PST, David Lang wrote: > Garrit, you missed the preior posters point. IA64 had the same fundamental > problem as the Alpha, PPC, and Sparc processors, it doesn't run x86 > binaries. IA64 *can* run IA32 binaries, just more slowly than native IA64 code. gerrit ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 1:17 ` Benjamin LaHaise 2003-02-23 5:21 ` Gerrit Huizenga @ 2003-02-23 9:37 ` William Lee Irwin III 1 sibling, 0 replies; 266+ messages in thread From: William Lee Irwin III @ 2003-02-23 9:37 UTC (permalink / raw) To: Benjamin LaHaise; +Cc: Jeff Garzik, linux-kernel On Sat, Feb 22, 2003 at 02:18:20PM -0800, William Lee Irwin III wrote: >> I'm not sure what's so nice about x86-64; another opcode prefix >> controlled extension atop the festering pile of existing x86 crud On Sat, Feb 22, 2003 at 08:17:24PM -0500, Benjamin LaHaise wrote: > What's nice about x86-64 is that it runs existing 32 bit apps fast and > doesn't suffer from the blisteringly small caches that were part of your > rant. Plus, x86-64 binaries are not horrifically bloated like ia64. > Not to mention that the amount of reengineering in compilers like > gcc required to get decent performance out of it is actually sane. Rant? It was just a catalogue of other things that are nasty. The point was that PAE's not special, it's one of a very long list of very ugly uglinesses, and my list wasn't anywhere near exhaustive. But yes, more cache is good. Unfortunately the amount of baggage from 32-bit x86 stuff still puts a good chunk of systems programming into the old bring your own barfbag territory. On Sat, Feb 22, 2003 at 02:18:20PM -0800, William Lee Irwin III wrote: >> sounds every bit as bad any other attempt to prolong x86. Some of >> the system device -level cleanups like the HPET look nice, though. On Sat, Feb 22, 2003 at 08:17:24PM -0500, Benjamin LaHaise wrote: > HPET is part of one of the PCYY specs and even available on 32 bit x86, > there are just not that many bug free implements yet. Since x86-64 made > it part of the base platform and is testing it from launch, they actually > have a chance at being debugged in the mass market versions. Well, it beats the heck out of the TSC and the PIT, and x86-64 is apparently supposed to have it "for real". I'm not excited at all about another opcode prefix and pagetable format. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
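For contrast with the HPET discussion, the TSC it is being compared against is read with a single instruction from userspace; a minimal sketch assuming GCC inline assembly on an x86 target.

    #include <stdint.h>
    #include <stdio.h>

    static inline uint64_t rdtsc(void)
    {
            uint32_t lo, hi;

            /* rdtsc returns the 64-bit cycle counter in edx:eax */
            __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
            return ((uint64_t)hi << 32) | lo;
    }

    int main(void)
    {
            uint64_t a = rdtsc(), b = rdtsc();

            printf("two back-to-back reads: %llu -> %llu\n",
                   (unsigned long long)a, (unsigned long long)b);
            return 0;
    }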
* Re: Minutes from Feb 21 LSE Call 2003-02-22 6:39 ` Martin J. Bligh 2003-02-22 8:38 ` Jeff Garzik @ 2003-02-22 8:38 ` David S. Miller 1 sibling, 0 replies; 266+ messages in thread From: David S. Miller @ 2003-02-22 8:38 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Larry McVoy, Hanna Linder, lse-tech, linux-kernel On Fri, 2003-02-21 at 22:39, Martin J. Bligh wrote: > > Lots of people working for companies who haven't figured out how to do > > it as well as Dell *say* it can't be done but numbers say differently. > > And how much of that was profit on PCs running Linux? Or PCs period, they make tons of bucks on servers and associated support contracts. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 5:05 ` Larry McVoy 2003-02-22 6:39 ` Martin J. Bligh @ 2003-02-22 8:38 ` David S. Miller 2003-02-22 14:34 ` Larry McVoy 1 sibling, 1 reply; 266+ messages in thread From: David S. Miller @ 2003-02-22 8:38 UTC (permalink / raw) To: Larry McVoy; +Cc: Martin J. Bligh, Hanna Linder, lse-tech, linux-kernel On Fri, 2003-02-21 at 21:05, Larry McVoy wrote: > Let's see, Dell has a $66B market cap, revenues of $8B/quarter and > $500M/quarter in profit. While I understand these numbers are on the mark, there is a tertiary issue to realize. Dell makes money on many things other than thin-margin PCs. And lo' and behold one of those things is selling the larger Intel based servers and support contracts to go along with that. And so you're nearly supporting Martin's arguments for supporting large servers better under Linux by bringing up Dell's balance sheet :-) ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 8:38 ` David S. Miller @ 2003-02-22 14:34 ` Larry McVoy 2003-02-22 15:47 ` Martin J. Bligh 0 siblings, 1 reply; 266+ messages in thread From: Larry McVoy @ 2003-02-22 14:34 UTC (permalink / raw) To: David S. Miller Cc: Larry McVoy, Martin J. Bligh, Hanna Linder, lse-tech, linux-kernel On Sat, Feb 22, 2003 at 12:38:33AM -0800, David S. Miller wrote: > On Fri, 2003-02-21 at 21:05, Larry McVoy wrote: > > Let's see, Dell has a $66B market cap, revenues of $8B/quarter and > > $500M/quarter in profit. > > While I understand these numbers are on the mark, there is a tertiary > issue to realize. > > Dell makes money on many things other than thin-margin PCs. And lo' > and behold one of those things is selling the larger Intel based > servers and support contracts to go along with that. I did some digging trying to find that ratio before I posted last night and couldn't. You obviously think that the servers are a significant part of their business. I'd be surprised at that, but that's cool, what are the numbers? PC's, monitors, disks, laptops, anything with less than 4 cpus is in the little bucket, so how much revenue does Dell generate on the 4 CPU and larger servers? -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 14:34 ` Larry McVoy @ 2003-02-22 15:47 ` Martin J. Bligh 2003-02-22 16:13 ` Larry McVoy 0 siblings, 1 reply; 266+ messages in thread From: Martin J. Bligh @ 2003-02-22 15:47 UTC (permalink / raw) To: Larry McVoy, David S. Miller; +Cc: lse-tech, linux-kernel >> > Let's see, Dell has a $66B market cap, revenues of $8B/quarter and >> > $500M/quarter in profit. >> >> While I understand these numbers are on the mark, there is a tertiary >> issue to realize. >> >> Dell makes money on many things other than thin-margin PCs. And lo' >> and behold one of those things is selling the larger Intel based >> servers and support contracts to go along with that. > > I did some digging trying to find that ratio before I posted last night > and couldn't. You obviously think that the servers are a significant > part of their business. I'd be surprised at that, but that's cool, > what are the numbers? PC's, monitors, disks, laptops, anything with less > than 4 cpus is in the little bucket, so how much revenue does Dell generate > on the 4 CPU and larger servers? It's not a question of revenue, it's one of profit. Very few people buy desktops for use with Linux, compared to those that buy them for Windows. The profit on each PC is small, thus I still think a substantial proportion of the profit made by hardware vendors by Linux is on servers rather than desktop PCs. The numbers will be smaller for high end machines, but the profit margins are much higher. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 15:47 ` Martin J. Bligh @ 2003-02-22 16:13 ` Larry McVoy 2003-02-22 16:29 ` Martin J. Bligh 2003-02-24 18:00 ` Timothy D. Witham 0 siblings, 2 replies; 266+ messages in thread From: Larry McVoy @ 2003-02-22 16:13 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Larry McVoy, David S. Miller, lse-tech, linux-kernel On Sat, Feb 22, 2003 at 07:47:53AM -0800, Martin J. Bligh wrote: > >> > Let's see, Dell has a $66B market cap, revenues of $8B/quarter and > >> > $500M/quarter in profit. > >> > >> While I understand these numbers are on the mark, there is a tertiary > >> issue to realize. > >> > >> Dell makes money on many things other than thin-margin PCs. And lo' > >> and behold one of those things is selling the larger Intel based > >> servers and support contracts to go along with that. > > > > I did some digging trying to find that ratio before I posted last night > > and couldn't. You obviously think that the servers are a significant > > part of their business. I'd be surprised at that, but that's cool, > > what are the numbers? PC's, monitors, disks, laptops, anything with less > > than 4 cpus is in the little bucket, so how much revenue does Dell generate > > on the 4 CPU and larger servers? > > It's not a question of revenue, it's one of profit. Very few people buy > desktops for use with Linux, compared to those that buy them for Windows. > The profit on each PC is small, thus I still think a substantial proportion > of the profit made by hardware vendors by Linux is on servers rather than > desktop PCs. The numbers will be smaller for high end machines, but the > profit margins are much higher. That's all handwaving and has no meaning without numbers. I could care less if Dell has 99.99% margins on their servers, if they only sell $50M of servers a quarter that is still less than 10% of their quarterly profit. So what are the actual *numbers*? Your point makes sense if and only if people sell lots of server. I spent a few minutes in google: world wide server sales are $40B at the moment. The overwhelming majority of that revenue is small servers. Let's say that Dell has 20% of that market, that's $2B/quarter. Now let's chop off the 1-2 CPU systems. I'll bet you long long odds that that is 90% of their revenue in the server space. Supposing that's right, that's $200M/quarter in big iron sales. Out of $8000M/quarter. I'd love to see data which is different than this but you'll have a tough time finding it. More and more companies are looking at the cost of big iron and deciding it doesn't make sense to spend $20K/CPU when they could be spending $1K/CPU. Look at Google, try selling them some big iron. Look at Wall Street - abandoning big iron as fast as they can. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 16:13 ` Larry McVoy @ 2003-02-22 16:29 ` Martin J. Bligh 2003-02-22 16:33 ` Larry McVoy 2003-02-24 18:00 ` Timothy D. Witham 1 sibling, 1 reply; 266+ messages in thread From: Martin J. Bligh @ 2003-02-22 16:29 UTC (permalink / raw) To: Larry McVoy; +Cc: David S. Miller, lse-tech, linux-kernel > That's all handwaving and has no meaning without numbers. I could care less > if Dell has 99.99% margins on their servers, if they only sell $50M of servers > a quarter that is still less than 10% of their quarterly profit. > > So what are the actual *numbers*? Your point makes sense if and only if > people sell lots of server. I spent a few minutes in google: world wide > server sales are $40B at the moment. The overwhelming majority of that > revenue is small servers. Let's say that Dell has 20% of that market, > that's $2B/quarter. Now let's chop off the 1-2 CPU systems. I'll bet > you long long odds that that is 90% of their revenue in the server space. > Supposing that's right, that's $200M/quarter in big iron sales. Out of > $8000M/quarter. > > I'd love to see data which is different than this but you'll have a tough > time finding it. More and more companies are looking at the cost of > big iron and deciding it doesn't make sense to spend $20K/CPU when they > could be spending $1K/CPU. Look at Google, try selling them some big > iron. Look at Wall Street - abandoning big iron as fast as they can. But we're talking about linux ... and we're talking about profit, not revenue. I'd guess that 99% of their desktop sales are for Windows. And I'd guess they make 100 times as much profit on a big server as they do on a desktop PC. Would be nice if someone had real numbers, but I doubt they're published except in non-free corporate research reports. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 16:29 ` Martin J. Bligh @ 2003-02-22 16:33 ` Larry McVoy 2003-02-22 16:39 ` Martin J. Bligh 0 siblings, 1 reply; 266+ messages in thread From: Larry McVoy @ 2003-02-22 16:33 UTC (permalink / raw) To: Martin J. Bligh; +Cc: Larry McVoy, David S. Miller, lse-tech, linux-kernel On Sat, Feb 22, 2003 at 08:29:34AM -0800, Martin J. Bligh wrote: > > people sell lots of server. I spent a few minutes in google: world wide > > server sales are $40B at the moment. The overwhelming majority of that > > revenue is small servers. Let's say that Dell has 20% of that market, > > that's $2B/quarter. Now let's chop off the 1-2 CPU systems. I'll bet > > you long long odds that that is 90% of their revenue in the server space. > > Supposing that's right, that's $200M/quarter in big iron sales. Out of > > $8000M/quarter. > > > > I'd love to see data which is different than this but you'll have a tough > > time finding it. More and more companies are looking at the cost of > > big iron and deciding it doesn't make sense to spend $20K/CPU when they > > could be spending $1K/CPU. Look at Google, try selling them some big > > iron. Look at Wall Street - abandoning big iron as fast as they can. > > But we're talking about linux ... and we're talking about profit, not > revenue. I'd guess that 99% of their desktop sales are for Windows. > And I'd guess they make 100 times as much profit on a big server as they > do on a desktop PC. You are thinking in today's terms. Find the asymptote and project out. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 16:33 ` Larry McVoy @ 2003-02-22 16:39 ` Martin J. Bligh 2003-02-22 16:59 ` John Bradford 0 siblings, 1 reply; 266+ messages in thread From: Martin J. Bligh @ 2003-02-22 16:39 UTC (permalink / raw) To: Larry McVoy; +Cc: David S. Miller, lse-tech, linux-kernel >> But we're talking about linux ... and we're talking about profit, not >> revenue. I'd guess that 99% of their desktop sales are for Windows. >> And I'd guess they make 100 times as much profit on a big server as they >> do on a desktop PC. > > You are thinking in today's terms. Find the asymptote and project out. OK, I predict that Linux will take over the whole of the high end server market ... if people stop complaining about us fixing scalability. That should give some nicer numbers .... M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 16:39 ` Martin J. Bligh @ 2003-02-22 16:59 ` John Bradford 0 siblings, 0 replies; 266+ messages in thread From: John Bradford @ 2003-02-22 16:59 UTC (permalink / raw) To: Martin J. Bligh; +Cc: lm, davem, lse-tech, linux-kernel > OK, I predict that Linux will take over the whole of the high end server > market ... if people stop complaining about us fixing scalability. That > should give some nicer numbers .... Extending the useful life of current hardware will shift profit even further towards support contracts, and away from hardware sales. Imagine the performance gain a webserver serving mostly static content, with light database and scripting usage, is going to see moving from a 2.4 -> 2.6 kernel. Zero-copy and filesystem improvements alone will extend its useful life dramatically, in my opinion. John. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 16:13 ` Larry McVoy 2003-02-22 16:29 ` Martin J. Bligh @ 2003-02-24 18:00 ` Timothy D. Witham 1 sibling, 0 replies; 266+ messages in thread From: Timothy D. Witham @ 2003-02-24 18:00 UTC (permalink / raw) To: Larry McVoy; +Cc: Martin J. Bligh, David S. Miller, lse-tech, linux-kernel On Sat, 2003-02-22 at 08:13, Larry McVoy wrote: > On Sat, Feb 22, 2003 at 07:47:53AM -0800, Martin J. Bligh wrote: > > >> > Let's see, Dell has a $66B market cap, revenues of $8B/quarter and > > >> > $500M/quarter in profit. > > >> > > >> While I understand these numbers are on the mark, there is a tertiary > > >> issue to realize. > > >> > > >> Dell makes money on many things other than thin-margin PCs. And lo' > > >> and behold one of those things is selling the larger Intel based > > >> servers and support contracts to go along with that. > > > > > > I did some digging trying to find that ratio before I posted last night > > > and couldn't. You obviously think that the servers are a significant > > > part of their business. I'd be surprised at that, but that's cool, > > > what are the numbers? PC's, monitors, disks, laptops, anything with less > > > than 4 cpus is in the little bucket, so how much revenue does Dell generate > > > on the 4 CPU and larger servers? > > > > It's not a question of revenue, it's one of profit. Very few people buy > > desktops for use with Linux, compared to those that buy them for Windows. > > The profit on each PC is small, thus I still think a substantial proportion > > of the profit made by hardware vendors by Linux is on servers rather than > > desktop PCs. The numbers will be smaller for high end machines, but the > > profit margins are much higher. > > That's all handwaving and has no meaning without numbers. I could care less > if Dell has 99.99% margins on their servers, if they only sell $50M of servers > a quarter that is still less than 10% of their quarterly profit. > > So what are the actual *numbers*? Your point makes sense if and only if > people sell lots of server. I spent a few minutes in google: world wide > server sales are $40B at the moment. The overwhelming majority of that > revenue is small servers. Let's say that Dell has 20% of that market, > that's $2B/quarter. Now let's chop off the 1-2 CPU systems. I'll bet > you long long odds that that is 90% of their revenue in the server space. > Supposing that's right, that's $200M/quarter in big iron sales. Out of > $8000M/quarter. > The numbers that I have seen are covered under an NDA so I can't put them out, but an important point to note is that while there is a very sharp decrease in the number of servers sold as you go higher up into the price bands, the total $ in revenue is hourglass shaped, with the neck being in a price band that corresponds to a 4 way server. The total $ spent on the highest band of servers is about equal to the total $ spent on the lowest price band of servers. But the margins for the high end are much better than the margins for the lowest band. > I'd love to see data which is different than this but you'll have a tough > time finding it. More and more companies are looking at the cost of > big iron and deciding it doesn't make sense to spend $20K/CPU when they > could be spending $1K/CPU. Look at Google, try selling them some big > iron. Look at Wall Street - abandoning big iron as fast as they can. Oh, you can see it, it will just cost you about $50,000 to get the survey from the company that spends all the money putting it together.
On the size of the system, every system should be as big as it needs to be. Some problems partition nicely, like Google, but other ones do not, like accounts receivable. It all seems to come down to the question, "Does the data _naturally_ partition?" If it does, then you should either use lots of small servers or a s/390 type solution with lots of instances. However, if the data doesn't naturally partition, you should use one large machine, as you will spend more money on people trying to manage the servers than you would have spent initially on the hardware. Also you need to look at the backend systems in places like Wall Street; those are big machines, have been for a long time, and aren't changing out. But it doesn't make a good story. Tim -- Timothy D. Witham <wookie@osdl.org> Open Source Development Lab, Inc ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 0:16 ` Larry McVoy 2003-02-22 0:25 ` William Lee Irwin III 2003-02-22 0:44 ` Martin J. Bligh @ 2003-02-22 8:32 ` David S. Miller 2003-02-22 18:20 ` Alan Cox 2003-02-23 0:37 ` Eric W. Biederman 4 siblings, 0 replies; 266+ messages in thread From: David S. Miller @ 2003-02-22 8:32 UTC (permalink / raw) To: Larry McVoy; +Cc: Hanna Linder, lse-tech, linux-kernel On Fri, 2003-02-21 at 16:16, Larry McVoy wrote: > In terms of the money and in terms of installed seats, the small Linux > machines out number the 4 or more CPU SMP machines easily 10,000:1. While I totally agree with your points, I want to mention that although this ratio is true, the exact opposite ratio applies to the price of the service contracts a company can land with the big machines :-) ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 0:16 ` Larry McVoy ` (2 preceding siblings ...) 2003-02-22 8:32 ` David S. Miller @ 2003-02-22 18:20 ` Alan Cox 2003-02-22 20:05 ` William Lee Irwin III 2003-02-22 21:36 ` Gerrit Huizenga 2003-02-23 0:37 ` Eric W. Biederman 4 siblings, 2 replies; 266+ messages in thread From: Alan Cox @ 2003-02-22 18:20 UTC (permalink / raw) To: Larry McVoy; +Cc: Hanna Linder, lse-tech, Linux Kernel Mailing List On Sat, 2003-02-22 at 00:16, Larry McVoy wrote: > In terms of the money and in terms of installed seats, the small Linux > machines out number the 4 or more CPU SMP machines easily 10,000:1. > And with the embedded market being one of the few real money makers > for Linux, there will be huge pushback from those companies against > changes which increase memory footprint. I think people overestimate the number of large boxes badly. Several IDE pre-patches didn't work on highmem boxes. It took *ages* for people to actually notice there was a problem. The desktop world is still 128-256Mb and some of the crap people push is problematic even there. In the embedded space where there is a *ton* of money to be made by smart people a lot of the 2.5 choices look very questionable indeed - but not all by any means, we are for example close to being able to dump the block layer, shrink stacks down by using IRQ stacks and other good stuff. I'm hoping the Montavista and IBM people will swat each other's bogons 8) Alan ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 18:20 ` Alan Cox @ 2003-02-22 20:05 ` William Lee Irwin III 2003-02-22 21:35 ` Alan Cox 2003-02-22 21:36 ` Gerrit Huizenga 1 sibling, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-22 20:05 UTC (permalink / raw) To: Alan Cox; +Cc: Larry McVoy, Hanna Linder, lse-tech, Linux Kernel Mailing List On Sat, 2003-02-22 at 00:16, Larry McVoy wrote: >> And with the embedded market being one of the few real money makers >> for Linux, there will be huge pushback from those companies against >> changes which increase memory footprint. On Sat, Feb 22, 2003 at 06:20:19PM +0000, Alan Cox wrote: > I think people overestimate the numbner of large boxes badly. Several IDE > pre-patches didn't work on highmem boxes. It took *ages* for people to > actually notice there was a problem. The desktop world is still 128-256Mb > and some of the crap people push is problematic even there. In the embedded > space where there is a *ton* of money to be made by smart people a lot > of the 2.5 choices look very questionable indeed - but not all by any > means, we are for example close to being able to dump the block layer, > shrink stacks down by using IRQ stacks and other good stuff. Well, I've never seen IDE in a highmem box, and there's probably a good reason for it. The space trimmings sound pretty interesting. IRQ stacks in general sound good just to mitigate stackblowings due to IRQ pounding. On Sat, Feb 22, 2003 at 06:20:19PM +0000, Alan Cox wrote: > I'm hoping the Montavista and IBM people will swat each others bogons 8) Sounds like a bigger win for the bigboxen, since space matters there, but large-scale SMP efficiency probably doesn't make a difference to embedded (though I think some 2x embedded systems are floating around). -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 20:05 ` William Lee Irwin III @ 2003-02-22 21:35 ` Alan Cox 0 siblings, 0 replies; 266+ messages in thread From: Alan Cox @ 2003-02-22 21:35 UTC (permalink / raw) To: William Lee Irwin III Cc: Larry McVoy, Hanna Linder, lse-tech, Linux Kernel Mailing List On Sat, 2003-02-22 at 20:05, William Lee Irwin III wrote: > On Sat, Feb 22, 2003 at 06:20:19PM +0000, Alan Cox wrote: > > I'm hoping the Montavista and IBM people will swat each others bogons 8) > > Sounds like a bigger win for the bigboxen, since space matters there, > but large-scale SMP efficiency probably doesn't make a difference to > embedded (though I think some 2x embedded systems are floating around). Smaller cleaner code is a win for everyone, and it often pays off in ways that are not immediately obvious. For example having your entire kernel working set and running app fitting in the L2 cache happens to be very good news to most people. Alan ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 18:20 ` Alan Cox 2003-02-22 20:05 ` William Lee Irwin III @ 2003-02-22 21:36 ` Gerrit Huizenga 2003-02-22 21:42 ` Christoph Hellwig 2003-02-23 23:23 ` Bill Davidsen 1 sibling, 2 replies; 266+ messages in thread From: Gerrit Huizenga @ 2003-02-22 21:36 UTC (permalink / raw) To: Alan Cox; +Cc: Larry McVoy, Hanna Linder, lse-tech, Linux Kernel Mailing List On 22 Feb 2003 18:20:19 GMT, Alan Cox wrote: > I think people overestimate the numbner of large boxes badly. Several IDE > pre-patches didn't work on highmem boxes. It took *ages* for people to > actually notice there was a problem. The desktop world is still 128-256Mb IDE on big boxes? Is that crack I smell burning? A desktop with 4 GB is a fun toy, but bigger than *I* need, even for development purposes. But I don't think EMC, Clariion (low end EMC), Shark, etc. have any IDE products for my 8-proc 16 GB machine... And running pre-patches in a production environment that might expose this would be a little silly as well. Probably a bad example to extrapolate large system numbers from. gerrit ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 21:36 ` Gerrit Huizenga @ 2003-02-22 21:42 ` Christoph Hellwig 2003-02-23 23:23 ` Bill Davidsen 1 sibling, 0 replies; 266+ messages in thread From: Christoph Hellwig @ 2003-02-22 21:42 UTC (permalink / raw) To: Gerrit Huizenga Cc: Alan Cox, Larry McVoy, Hanna Linder, lse-tech, Linux Kernel Mailing List On Sat, Feb 22, 2003 at 01:36:31PM -0800, Gerrit Huizenga wrote: > IDE on big boxes? Is that crack I smell burning? A desktop with 4 GB > is a fun toy, but bigger than *I* need, even for development purposes. > But I don't think EMC, Clariion (low end EMC), Shark, etc. have any > IDE products for my 8-proc 16 GB machine... And running pre-patches in > a production environment that might expose this would be a little > silly as well. > > Probably a bad example to extrapolate large system numbers from. At least the SGI Altix does have an IDE/ATAPI CDROM drive :) ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 21:36 ` Gerrit Huizenga 2003-02-22 21:42 ` Christoph Hellwig @ 2003-02-23 23:23 ` Bill Davidsen 2003-02-24 3:31 ` Gerrit Huizenga 1 sibling, 1 reply; 266+ messages in thread From: Bill Davidsen @ 2003-02-23 23:23 UTC (permalink / raw) To: Gerrit Huizenga; +Cc: lse-tech, Linux Kernel Mailing List On Sat, 22 Feb 2003, Gerrit Huizenga wrote: > On 22 Feb 2003 18:20:19 GMT, Alan Cox wrote: > > I think people overestimate the numbner of large boxes badly. Several IDE > > pre-patches didn't work on highmem boxes. It took *ages* for people to > > actually notice there was a problem. The desktop world is still 128-256Mb > > IDE on big boxes? Is that crack I smell burning? A desktop with 4 GB > is a fun toy, but bigger than *I* need, even for development purposes. > But I don't think EMC, Clariion (low end EMC), Shark, etc. have any > IDE products for my 8-proc 16 GB machine... And running pre-patches in > a production environment that might expose this would be a little > silly as well. I don't disagree with most of your point, however there certainly are legitimate uses for big boxes with small (IDE) disk. Those which first come to mind are all computational problems, in which a small dataset is read from disk and then processors beat on the data. More or less common examples are graphics transformations (original and final data compressed), engineering calculations such as finite element analysis, rendering (raytracing) type calculations, and data analysis (things like setiathome or automated medical image analysis). IDE drives are very cost effective, and low cost motherboard RAID is certainly useful for preserving the results of large calculations on small (relatively) datasets. -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 23:23 ` Bill Davidsen @ 2003-02-24 3:31 ` Gerrit Huizenga 2003-02-24 4:02 ` Larry McVoy 0 siblings, 1 reply; 266+ messages in thread From: Gerrit Huizenga @ 2003-02-24 3:31 UTC (permalink / raw) To: Bill Davidsen; +Cc: lse-tech, Linux Kernel Mailing List On Sun, 23 Feb 2003 18:23:01 EST, Bill Davidsen wrote: > On Sat, 22 Feb 2003, Gerrit Huizenga wrote: > > > On 22 Feb 2003 18:20:19 GMT, Alan Cox wrote: > > > I think people overestimate the numbner of large boxes badly. Several IDE > > > pre-patches didn't work on highmem boxes. It took *ages* for people to > > > actually notice there was a problem. The desktop world is still 128-256Mb > > > > IDE on big boxes? Is that crack I smell burning? A desktop with 4 GB > > is a fun toy, but bigger than *I* need, even for development purposes. > > But I don't think EMC, Clariion (low end EMC), Shark, etc. have any > > IDE products for my 8-proc 16 GB machine... And running pre-patches in > > a production environment that might expose this would be a little > > silly as well. > > I don't disagree with most of your point, however there certainly are > legitimate uses for big boxes with small (IDE) disk. Those which first > come to mind are all computational problems, in which a small dataset is > read from disk and then processors beat on the data. More or less common > examples are graphics transformations (original and final data > compressed), engineering calculations such as finite element analysis, > rendering (raytracing) type calculations, and data analysis (things like > setiathome or automated medical image analysis). Yeah and as Christoph pointed out, a lot of big machines have IDE based CD-ROMs. And, there *are* some IDE disk subsystems with 1 TB on an IDE bus and such, but there just aren't enough IDE busses or PCI slots on most big machines to span out to the really high disk capacities or large numbers of spindles. But some of the compute engines could either be net-booted (no local disk) or have a cheap, small disk for boot, small static storage (couple hundred GB range) etc. But most people don't connect big machines to IDE drive subsystems. gerrit ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 3:31 ` Gerrit Huizenga @ 2003-02-24 4:02 ` Larry McVoy 2003-02-24 4:15 ` Russell Leighton ` (2 more replies) 0 siblings, 3 replies; 266+ messages in thread From: Larry McVoy @ 2003-02-24 4:02 UTC (permalink / raw) To: Gerrit Huizenga; +Cc: Bill Davidsen, lse-tech, Linux Kernel Mailing List On Sun, Feb 23, 2003 at 07:31:26PM -0800, Gerrit Huizenga wrote: > But most > people don't connect big machines to IDE drive subsystems. 3ware controllers. They look like SCSI to the host, but use cheap IDE drives on the back end. Really nice cards. bkbits.net runs on one. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 4:02 ` Larry McVoy @ 2003-02-24 4:15 ` Russell Leighton 2003-02-24 5:11 ` William Lee Irwin III 2003-02-24 8:07 ` Christoph Hellwig 2 siblings, 0 replies; 266+ messages in thread From: Russell Leighton @ 2003-02-24 4:15 UTC (permalink / raw) To: Larry McVoy Cc: Gerrit Huizenga, Bill Davidsen, lse-tech, Linux Kernel Mailing List Yup. Great price and super price/performance. Gotta luv it. Larry McVoy wrote: >On Sun, Feb 23, 2003 at 07:31:26PM -0800, Gerrit Huizenga wrote: > >>But most >>people don't connect big machines to IDE drive subsystems. >> > >3ware controllers. They look like SCSI to the host, but use cheap IDE >drives on the back end. Really nice cards. bkbits.net runs on one. > ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 4:02 ` Larry McVoy 2003-02-24 4:15 ` Russell Leighton @ 2003-02-24 5:11 ` William Lee Irwin III 2003-02-24 8:07 ` Christoph Hellwig 2 siblings, 0 replies; 266+ messages in thread From: William Lee Irwin III @ 2003-02-24 5:11 UTC (permalink / raw) To: Larry McVoy, Gerrit Huizenga, Bill Davidsen, lse-tech, Linux Kernel Mailing List On Sun, Feb 23, 2003 at 07:31:26PM -0800, Gerrit Huizenga wrote: >> But most people don't connect big machines to IDE drive subsystems. > On Sun, Feb 23, 2003 at 08:02:46PM -0800, Larry McVoy wrote: > 3ware controllers. They look like SCSI to the host, but use cheap IDE > drives on the back end. Really nice cards. bkbits.net runs on one. A quick back of the napkin estimate guesstimates that this 3ware stuff would max at 6 racks of disks on NUMA-Q or 3/8 of a rack per node (ignoring cabling, which looks infeasible, but never mind that), which is a smaller capacity than I remember FC having. NUMA-Q's a bit optimistic for 3ware because it has buttloads of PCI slots in comparison to more modern machines. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-24 4:02 ` Larry McVoy 2003-02-24 4:15 ` Russell Leighton 2003-02-24 5:11 ` William Lee Irwin III @ 2003-02-24 8:07 ` Christoph Hellwig 2 siblings, 0 replies; 266+ messages in thread From: Christoph Hellwig @ 2003-02-24 8:07 UTC (permalink / raw) To: Larry McVoy, Gerrit Huizenga, Bill Davidsen, lse-tech, Linux Kernel Mailing List On Sun, Feb 23, 2003 at 08:02:46PM -0800, Larry McVoy wrote: > On Sun, Feb 23, 2003 at 07:31:26PM -0800, Gerrit Huizenga wrote: > > But most > > people don't connect big machines to IDE drive subsystems. > > 3ware controllers. They look like SCSI to the host, but use cheap IDE > drives on the back end. Really nice cards. bkbits.net runs on one. That's true (similar for some nice scsi2ide external raid boxens), but Alan's original argument was about the Linux IDE driver on big machines, which is used by neither. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-22 0:16 ` Larry McVoy ` (3 preceding siblings ...) 2003-02-22 18:20 ` Alan Cox @ 2003-02-23 0:37 ` Eric W. Biederman 4 siblings, 0 replies; 266+ messages in thread From: Eric W. Biederman @ 2003-02-23 0:37 UTC (permalink / raw) To: Larry McVoy; +Cc: Hanna Linder, lse-tech, linux-kernel Larry McVoy <lm@bitmover.com> writes: > > Ben said none of the distros are supporting these large > > systems right now. Martin said UL is already starting to support > > them. > > Ben is right. I think IBM and the other big iron companies would be > far better served looking at what they have done with running multiple > instances of Linux on one big machine, like the 390 work. Figure out > how to use that model to scale up. There is simply not a big enough > market to justify shoveling lots of scaling stuff in for huge machines > that only a handful of people can afford. That's the same path which > has sunk all the workstation companies, they all have bloated OS's and > Linux runs circles around them. Larry, it isn't that Linux isn't being scaled in the way you suggest. But for the people who really care about scalability, having a single system image is not the most important thing, so making it look like one system is secondary. Linux clusters are currently among the top 5 supercomputers in the world. And there the question is how you make 1200 machines look like one, and how you handle the reliability issues. When MTBF becomes a predictor for how many times a week someone needs to replace hardware, the problem is very different from a simple SMP. And there seems to be a fairly substantial market for huge machines, for people who need high performance. All kinds of problems require enormous amounts of data crunching. So far the low-hanging fruit on large clusters is still in making the hardware and the systems actually work. But increasingly, having a single high-performance distributed filesystem is becoming important. Look at projects like bproc, mosix, and lustre. Not the best things in the world, but the work is getting done. Scalability is easy. The hard part is making it look like one machine when you are done. Eric ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-21 23:48 Hanna Linder 2003-02-22 0:16 ` Larry McVoy @ 2003-02-23 0:42 ` Eric W. Biederman 2003-02-23 14:29 ` Rik van Riel 2003-02-23 3:24 ` Andrew Morton 2 siblings, 1 reply; 266+ messages in thread From: Eric W. Biederman @ 2003-02-23 0:42 UTC (permalink / raw) To: Hanna Linder; +Cc: lse-tech, linux-kernel Hanna Linder <hannal@us.ibm.com> writes: > LSE Con Call Minutes from Feb21 > > Minutes compiled by Hanna Linder hannal@us.ibm.com, please post > corrections to lse-tech@lists.sf.net. > > Object Based Reverse Mapping: > (Dave McCracken, Ben LaHaise, Rik van Riel, Martin Bligh, Gerrit Huizenga) > > Ben said none of the users have been complaining about > performance with the existing rmap. Martin disagreed and said Linus, > Andrew Morton and himself have all agreed there is a problem. > One of the problems Martin is already hitting on high cpu machines with > large memory is the space consumption by all the pte-chains filling up > memory and killing the machine. There is also a performance impact of > maintaining the chains. Note: rmap chains can be restricted to an arbitrary length, or an arbitrary total count trivially. All you have to do is allow a fixed limit on the number of people who can map a page simultaneously. The selection of which chain to unmap can be a bit tricky but is relatively straight forward. Why doesn't someone who is seeing this just hack this up? Eric ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 0:42 ` Eric W. Biederman @ 2003-02-23 14:29 ` Rik van Riel 2003-02-23 17:28 ` Eric W. Biederman 0 siblings, 1 reply; 266+ messages in thread From: Rik van Riel @ 2003-02-23 14:29 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Hanna Linder, lse-tech, linux-kernel On Sat, 22 Feb 2003, Eric W. Biederman wrote: > Note: rmap chains can be restricted to an arbitrary length, or an > arbitrary total count trivially. All you have to do is allow a fixed > limit on the number of people who can map a page simultaneously. > > The selection of which chain to unmap can be a bit tricky but is > relatively straight forward. Why doesn't someone who is seeing > this just hack this up? I'm not sure how useful this feature would be. Also, there are a bunch of corner cases in which you cannot limit the number of processes mapping a page, think about eg. mlock, nonlinear vmas and anonymous memory. All in all I suspect that the cost of such a feature might be higher than any benefits. cheers, Rik -- Engineers don't grow up, they grow sideways. http://www.surriel.com/ http://kernelnewbies.org/ ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-23 14:29 ` Rik van Riel @ 2003-02-23 17:28 ` Eric W. Biederman 2003-02-24 1:42 ` Benjamin LaHaise 0 siblings, 1 reply; 266+ messages in thread From: Eric W. Biederman @ 2003-02-23 17:28 UTC (permalink / raw) To: Rik van Riel; +Cc: Hanna Linder, lse-tech, linux-kernel Rik van Riel <riel@imladris.surriel.com> writes: > On Sat, 22 Feb 2003, Eric W. Biederman wrote: > > > Note: rmap chains can be restricted to an arbitrary length, or an > > arbitrary total count trivially. All you have to do is allow a fixed > > limit on the number of people who can map a page simultaneously. > > > > The selection of which chain to unmap can be a bit tricky but is > > relatively straight forward. Why doesn't someone who is seeing > > this just hack this up? > > I'm not sure how useful this feature would be. The problem. There is no upper bound to how many rmap entries there can be at one time. And the unbounded growth can overwhelm a machine. The goal is to provide an overall system cap on the number of rmap entries. > Also, > there are a bunch of corner cases in which you cannot > limit the number of processes mapping a page, think > about eg. mlock, nonlinear vmas and anonymous memory. Unless something has changed for nonlinear vmas, and anonymous memory we have been storing enough information to recover the page in the page tables for ages. For mlock we want a cap on the number of pages that are locked, so it should not be a problem. But even then we don't have to guarantee the page is constantly in the processes page table, simply that the mlocked page is never swapped out. > All in all I suspect that the cost of such a feature > might be higher than any benefits. Cost? What Cost? The simple implementation is to walk the page lists and unmap the pages that are least likely to be used next. This is not something new. We have been doing this in 2.4.x and before for years. Before it just never freed up rmap entries, as well as preparing a page to be paged out. Eric ^ permalink raw reply [flat|nested] 266+ messages in thread
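Eric's capping scheme above is described only in prose; the sketch below is one minimal way it could look. page_add_rmap() and page_remove_rmap() are the real 2.5 rmap entry points referred to elsewhere in the thread, but nr_rmap_entries, RMAP_ENTRY_LIMIT and unmap_one_victim_page() are invented names, and none of this comes from a posted patch.

/*
 * Illustrative sketch only: a system-wide cap on rmap entries.
 * When the cap is reached, tear down someone else's mapping before
 * adding a new one.  The victim is not lost -- as Eric notes, its
 * page tables still contain enough information to fault it back in.
 */
#define RMAP_ENTRY_LIMIT (1 << 20)   /* arbitrary global cap */

static atomic_t nr_rmap_entries = ATOMIC_INIT(0);

static void capped_page_add_rmap(struct page *page, pte_t *ptep)
{
	while (atomic_read(&nr_rmap_entries) >= RMAP_ENTRY_LIMIT)
		unmap_one_victim_page();     /* hypothetical: unmap a "least likely used" page */

	atomic_inc(&nr_rmap_entries);
	page_add_rmap(page, ptep);
}

static void capped_page_remove_rmap(struct page *page, pte_t *ptep)
{
	page_remove_rmap(page, ptep);
	atomic_dec(&nr_rmap_entries);
}

The interesting part, as Eric says, is the victim selection inside unmap_one_victim_page(), not the bookkeeping itself.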
* Re: Minutes from Feb 21 LSE Call 2003-02-23 17:28 ` Eric W. Biederman @ 2003-02-24 1:42 ` Benjamin LaHaise 0 siblings, 0 replies; 266+ messages in thread From: Benjamin LaHaise @ 2003-02-24 1:42 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Rik van Riel, Hanna Linder, lse-tech, linux-kernel On Sun, Feb 23, 2003 at 10:28:04AM -0700, Eric W. Biederman wrote: > The problem. There is no upper bound to how many rmap > entries there can be at one time. And the unbounded > growth can overwhelm a machine. Eh? By that logic there's no bound to the number of vmas that can exist at a given time. But there is a bound on the number that a single process can force the system into using, and that limit also caps the number of rmap entries the process can bring into existence. Virtual address space is not free, and there are already mechanisms in place to limit it which, given that the number of rmap entries is directly proportional to the amount of virtual address space in use, probably need proper configuration. > The goal is to provide an overall system cap on the number > of rmap entries. No, the goal is to have a stable system under a variety of workloads that performs well. User-exploitable worst case behaviour is a bad idea. Hybrid solves that at the expense of added complexity. -ben -- Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a> ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-21 23:48 Hanna Linder 2003-02-22 0:16 ` Larry McVoy 2003-02-23 0:42 ` Eric W. Biederman @ 2003-02-23 3:24 ` Andrew Morton 2003-02-25 17:17 ` Andrea Arcangeli 2 siblings, 1 reply; 266+ messages in thread From: Andrew Morton @ 2003-02-23 3:24 UTC (permalink / raw) To: Hanna Linder; +Cc: lse-tech, linux-kernel Hanna Linder <hannal@us.ibm.com> wrote: > > > Dave coded up an initial patch for partial object based rmap > which he sent to linux-mm yesterday.

I've run some numbers on this. Looks like it reclaims most of the fork/exec/exit rmap overhead. The testcase is applying and removing 64 kernel patches using my patch management scripts. I use this because a) it's a real workload, which someone cares about, and b) it's about as forky as anything is ever likely to be, without being a stupid microbenchmark. Testing is on the fast P4-HT, everything in pagecache.

2.4.21-pre4:                  8.10 seconds
2.5.62-mm3 with objrmap:      9.95 seconds (+1.85)
2.5.62-mm3 without objrmap:  10.86 seconds (+0.91)

Current 2.5 is 2.76 seconds slower, and this patch reclaims 0.91 of those seconds. So who stole the remaining 1.85 seconds? Looks like pte_highmem. Here is 2.5.62-mm3, with objrmap:

c013042c find_get_page           601    10.7321
c01333dc free_hot_cold_page      641     2.7629
c0207130 __copy_to_user_ll       687     6.6058
c011450c flush_tlb_page          725     6.4732
c0139ba0 clear_page_tables       841     2.4735
c011718c pte_alloc_one           910     6.5000
c013b56c do_anonymous_page       954     1.7667
c013b788 do_no_page             1044     1.6519
c015b59c d_lookup               1096     3.2619
c013ba00 handle_mm_fault        1098     4.6525
c0108d14 system_call            1116    25.3636
c0137240 release_pages          1828     6.4366
c013a1f4 zap_pte_range          2616     4.8806
c013f5c0 page_add_rmap          2776     8.3614
c0139eac copy_page_range        2994     3.5643
c013f70c page_remove_rmap       3132     6.2640
c013adb4 do_wp_page             6712     8.4322
c01172e0 do_page_fault          8788     7.7496
c0106ed8 poll_idle             99878  1189.0238
00000000 total                158601     0.0869

Note one second spent in pte_alloc_one(). Here is 2.4.21-pre4, with the following functions uninlined:

pte_t *pte_alloc_one(struct mm_struct *mm, unsigned long address);
pte_t *pte_alloc_one_fast(struct mm_struct *mm, unsigned long address);
void pte_free_fast(pte_t *pte);
void pte_free_slow(pte_t *pte);

c0252950 atomic_dec_and_lock      36     0.4800
c0111778 flush_tlb_mm             37     0.3304
c0129c3c file_read_actor          37     0.2569
c025282c strnlen_user             43     0.5119
c012b35c generic_file_write       46     0.0283
c0114c78 schedule                 48     0.0361
c0129050 unlock_page              53     0.4907
c0140974 link_path_walk           57     0.0237
c0116740 copy_mm                  62     0.0852
c0130740 __free_pages_ok          62     0.0963
c0126afc handle_mm_fault          63     0.3424
c01254c0 __free_pte               67     0.8816
c0129198 __find_get_page          67     0.9853
c01309c4 rmqueue                  70     0.1207
c011ae0c exit_notify              77     0.1075
c0149b34 d_lookup                 81     0.2774
c0126874 do_anonymous_page        83     0.3517
c0126960 do_no_page               86     0.2087
c01117e8 flush_tlb_page          105     0.8750
c0106f54 system_call             138     2.4643
c01255c8 copy_page_range         197     0.4603
c0130ffc __free_pages            204     5.6667
c0125774 zap_page_range          262     0.3104
c0126330 do_wp_page              775     1.4904
c0113c18 do_page_fault           864     0.7030
c01052f8 poll_idle              6803   170.0750
00000000 total                 11923     0.0087

Note the lack of pte_alloc_one_slow(). So we need the page table cache back. We cannot put it in slab, because slab does not do highmem. I believe the best way to solve this is to implement a per-cpu LIFO head array of known-to-be-zeroed pages in the page allocator. Populate it with free_zeroed_page(), grab pages from it with __GFP_ZEROED. This is a simple extension to the existing hot and cold head arrays, and I have patches, and they don't work.
Something in the pagetable freeing path seems to be putting back pages which are not fully zeroed, and I didn't get onto debugging it. It would be nice to get it going, because a number of architectures can perhaps nuke their private pagetable caches. I shall drop the patches in next-mm/experimental and look hopefully at Dave ;) ^ permalink raw reply [flat|nested] 266+ messages in thread
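To make the zeroed-page cache idea concrete, here is a rough sketch of the shape such a thing might take. free_zeroed_page() and __GFP_ZEROED are Andrew's names from the message above; the struct layout, the alloc_zeroed_page() wrapper (standing in for a real __GFP_ZEROED fast path inside the allocator) and every other detail are illustrative guesses, not his patches.

/*
 * Sketch only: a per-cpu LIFO of pages known to contain only zeroes,
 * sitting alongside the existing hot/cold per-cpu pagesets.
 */
#define ZEROED_BATCH 16

struct zeroed_pages {
	int count;
	struct page *pages[ZEROED_BATCH];
};

static DEFINE_PER_CPU(struct zeroed_pages, zeroed_cache);

/* Free a page that the caller knows is all-zero (e.g. a cleared pagetable page). */
void free_zeroed_page(struct page *page)
{
	struct zeroed_pages *zp = &get_cpu_var(zeroed_cache);

	if (zp->count < ZEROED_BATCH)
		zp->pages[zp->count++] = page;
	else
		__free_page(page);		/* cache full: ordinary free path */
	put_cpu_var(zeroed_cache);
}

/* Satisfy a request for a zeroed page from the per-cpu stash if possible,
 * falling back to allocating and clearing a fresh page. */
struct page *alloc_zeroed_page(unsigned int gfp_mask)
{
	struct zeroed_pages *zp = &get_cpu_var(zeroed_cache);
	struct page *page = NULL;

	if (zp->count)
		page = zp->pages[--zp->count];
	put_cpu_var(zeroed_cache);

	if (!page) {
		page = alloc_page(gfp_mask);
		if (page)
			clear_highpage(page);
	}
	return page;
}

The bug Andrew mentions would correspond to free_zeroed_page() being handed a page that is not actually zero, which is why the freeing side has to be restricted to callers that can genuinely guarantee it.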
* Re: Minutes from Feb 21 LSE Call 2003-02-23 3:24 ` Andrew Morton @ 2003-02-25 17:17 ` Andrea Arcangeli 2003-02-25 17:43 ` William Lee Irwin III 0 siblings, 1 reply; 266+ messages in thread From: Andrea Arcangeli @ 2003-02-25 17:17 UTC (permalink / raw) To: Andrew Morton; +Cc: Hanna Linder, lse-tech, linux-kernel On Sat, Feb 22, 2003 at 07:24:24PM -0800, Andrew Morton wrote: > 2.4.21-pre4: 8.10 seconds > 2.5.62-mm3 with objrmap: 9.95 seconds (+1.85) > 2.5.62-mm3 without objrmap: 10.86 seconds (+0.91) > > Current 2.5 is 2.76 seconds slower, and this patch reclaims 0.91 of those > seconds. > > > So whole stole the remaining 1.85 seconds? Looks like pte_highmem. would you mind adding the line for 2.4.21-pre4aa3? it has pte-highmem, so you can easily find out for sure whether it is pte_highmem that stole >10% of your fast cpu. A line for the 2.4-rmap patch would also be interesting. > Note one second spent in pte_alloc_one(). note the seconds spent in the rmap-affected paths too. Andrea ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 17:17 ` Andrea Arcangeli @ 2003-02-25 17:43 ` William Lee Irwin III 2003-02-25 17:59 ` Andrea Arcangeli 0 siblings, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-25 17:43 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Andrew Morton, Hanna Linder, lse-tech, linux-kernel On Sat, Feb 22, 2003 at 07:24:24PM -0800, Andrew Morton wrote: >> So whole stole the remaining 1.85 seconds? Looks like pte_highmem. On Tue, Feb 25, 2003 at 06:17:27PM +0100, Andrea Arcangeli wrote: > would you mind to add the line for 2.4.21-pre4aa3? it has pte-highmem so > you can easily find it out for sure if it is pte_highmem that stole >10% > of your fast cpu. A line for the 2.4-rmap patch would be also > interesting. On Sat, Feb 22, 2003 at 07:24:24PM -0800, Andrew Morton wrote: >> Note one second spent in pte_alloc_one(). On Tue, Feb 25, 2003 at 06:17:27PM +0100, Andrea Arcangeli wrote: > note the seconds spent in the rmap affected paths too. The pagetable cache is gone in 2.5, so pte_alloc_one() takes the bitblitting hit for pagetables. I didn't catch the whole profile, so I'll need numbers for rmap paths. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 17:43 ` William Lee Irwin III @ 2003-02-25 17:59 ` Andrea Arcangeli 2003-02-25 18:04 ` William Lee Irwin III 2003-02-25 18:50 ` William Lee Irwin III 0 siblings, 2 replies; 266+ messages in thread From: Andrea Arcangeli @ 2003-02-25 17:59 UTC (permalink / raw) To: William Lee Irwin III, Andrew Morton, Hanna Linder, lse-tech, linux-kernel On Tue, Feb 25, 2003 at 09:43:59AM -0800, William Lee Irwin III wrote: > On Sat, Feb 22, 2003 at 07:24:24PM -0800, Andrew Morton wrote: > >> So whole stole the remaining 1.85 seconds? Looks like pte_highmem. > > On Tue, Feb 25, 2003 at 06:17:27PM +0100, Andrea Arcangeli wrote: > > would you mind to add the line for 2.4.21-pre4aa3? it has pte-highmem so > > you can easily find it out for sure if it is pte_highmem that stole >10% > > of your fast cpu. A line for the 2.4-rmap patch would be also > > interesting. > > On Sat, Feb 22, 2003 at 07:24:24PM -0800, Andrew Morton wrote: > >> Note one second spent in pte_alloc_one(). > > On Tue, Feb 25, 2003 at 06:17:27PM +0100, Andrea Arcangeli wrote: > > note the seconds spent in the rmap affected paths too. > > The pagetable cache is gone in 2.5, so pte_alloc_one() takes the > bitblitting hit for pagetables. I'm talking about do_anonymous_page, do_wp_page, do_no_page fork and all the other places that introduces spinlocks (per-page) and allocations of 2 pieces of ram rather than just 1 (and in turn potentially global spinlocks too if the cpu-caches are empty). Just grep for pte_chain_alloc or page_add_rmap in mm/memory.c, that's what I mean, I'm not talking about pagetables. Andrea ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 17:59 ` Andrea Arcangeli @ 2003-02-25 18:04 ` William Lee Irwin III 2003-02-25 18:50 ` William Lee Irwin III 1 sibling, 0 replies; 266+ messages in thread From: William Lee Irwin III @ 2003-02-25 18:04 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Andrew Morton, Hanna Linder, lse-tech, linux-kernel On Tue, Feb 25, 2003 at 09:43:59AM -0800, William Lee Irwin III wrote: >> The pagetable cache is gone in 2.5, so pte_alloc_one() takes the >> bitblitting hit for pagetables. On Tue, Feb 25, 2003 at 06:59:28PM +0100, Andrea Arcangeli wrote: > I'm talking about do_anonymous_page, do_wp_page, do_no_page fork and all > the other places that introduces spinlocks (per-page) and allocations of > 2 pieces of ram rather than just 1 (and in turn potentially global > spinlocks too if the cpu-caches are empty). Just grep for > pte_chain_alloc or page_add_rmap in mm/memory.c, that's what I mean, I'm > not talking about pagetables. Well, pte_alloc_one() has a clear explanation. The fact that the rmap accounting is not free is not news. For anonymous pages performing the analogous vma-based lookup as with Dave McCracken's patch for file-backed pages would require a significant anonymous page accounting rework. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 17:59 ` Andrea Arcangeli 2003-02-25 18:04 ` William Lee Irwin III @ 2003-02-25 18:50 ` William Lee Irwin III 2003-02-25 19:18 ` Andrea Arcangeli 1 sibling, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-25 18:50 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Andrew Morton, Hanna Linder, lse-tech, linux-kernel On Tue, Feb 25, 2003 at 09:43:59AM -0800, William Lee Irwin III wrote: >> The pagetable cache is gone in 2.5, so pte_alloc_one() takes the >> bitblitting hit for pagetables. On Tue, Feb 25, 2003 at 06:59:28PM +0100, Andrea Arcangeli wrote: > I'm talking about do_anonymous_page, do_wp_page, do_no_page fork and all > the other places that introduces spinlocks (per-page) and allocations of > 2 pieces of ram rather than just 1 (and in turn potentially global > spinlocks too if the cpu-caches are empty). Just grep for > pte_chain_alloc or page_add_rmap in mm/memory.c, that's what I mean, I'm > not talking about pagetables. Okay, fished out the profiles (w/Dave's optimization): 00000000 total 158601 0.0869 c0106ed8 poll_idle 99878 1189.0238 c01172e0 do_page_fault 8788 7.7496 c013adb4 do_wp_page 6712 8.4322 c013f70c page_remove_rmap 3132 6.2640 c0139eac copy_page_range 2994 3.5643 c013f5c0 page_add_rmap 2776 8.3614 c013a1f4 zap_pte_range 2616 4.8806 c0137240 release_pages 1828 6.4366 c0108d14 system_call 1116 25.3636 c013ba00 handle_mm_fault 1098 4.6525 c015b59c d_lookup 1096 3.2619 c013b788 do_no_page 1044 1.6519 c013b56c do_anonymous_page 954 1.7667 c011718c pte_alloc_one 910 6.5000 c0139ba0 clear_page_tables 841 2.4735 c011450c flush_tlb_page 725 6.4732 c0207130 __copy_to_user_ll 687 6.6058 c01333dc free_hot_cold_page 641 2.7629 c013042c find_get_page 601 10.7321 Just taking the exception dwarfs anything written in C. page_add_rmap() absorbs hits from all of the fault routines and copy_page_range(). page_remove_rmap() absorbs hits from zap_pte_range(). do_wp_page() is huge because it's doing bitblitting in-line. These things aren't cheap with or without rmap. Trimming down accounting overhead could raise search problems elsewhere. Whether avoiding the search problem is worth the accounting overhead could probably use some more investigation, like actually trying the anonymous page handling rework needed to use vma-based ptov resolution. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 18:50 ` William Lee Irwin III @ 2003-02-25 19:18 ` Andrea Arcangeli 2003-02-25 19:27 ` Martin J. Bligh 2003-02-25 20:10 ` William Lee Irwin III 0 siblings, 2 replies; 266+ messages in thread From: Andrea Arcangeli @ 2003-02-25 19:18 UTC (permalink / raw) To: William Lee Irwin III, Andrew Morton, Hanna Linder, lse-tech, linux-kernel On Tue, Feb 25, 2003 at 10:50:08AM -0800, William Lee Irwin III wrote: > On Tue, Feb 25, 2003 at 09:43:59AM -0800, William Lee Irwin III wrote: > >> The pagetable cache is gone in 2.5, so pte_alloc_one() takes the > >> bitblitting hit for pagetables. > > On Tue, Feb 25, 2003 at 06:59:28PM +0100, Andrea Arcangeli wrote: > > I'm talking about do_anonymous_page, do_wp_page, do_no_page fork and all > > the other places that introduces spinlocks (per-page) and allocations of > > 2 pieces of ram rather than just 1 (and in turn potentially global > > spinlocks too if the cpu-caches are empty). Just grep for > > pte_chain_alloc or page_add_rmap in mm/memory.c, that's what I mean, I'm > > not talking about pagetables. > > Okay, fished out the profiles (w/Dave's optimization): > > 00000000 total 158601 0.0869 > c0106ed8 poll_idle 99878 1189.0238 > c01172e0 do_page_fault 8788 7.7496 > c013adb4 do_wp_page 6712 8.4322 > c013f70c page_remove_rmap 3132 6.2640 > c0139eac copy_page_range 2994 3.5643 > c013f5c0 page_add_rmap 2776 8.3614 > c013a1f4 zap_pte_range 2616 4.8806 > c0137240 release_pages 1828 6.4366 > c0108d14 system_call 1116 25.3636 > c013ba00 handle_mm_fault 1098 4.6525 > c015b59c d_lookup 1096 3.2619 > c013b788 do_no_page 1044 1.6519 > c013b56c do_anonymous_page 954 1.7667 > c011718c pte_alloc_one 910 6.5000 > c0139ba0 clear_page_tables 841 2.4735 > c011450c flush_tlb_page 725 6.4732 > c0207130 __copy_to_user_ll 687 6.6058 > c01333dc free_hot_cold_page 641 2.7629 > c013042c find_get_page 601 10.7321 > > Just taking the exception dwarfs anything written in C. > > page_add_rmap() absorbs hits from all of the fault routines and > copy_page_range(). page_remove_rmap() absorbs hits from zap_pte_range(). > do_wp_page() is huge because it's doing bitblitting in-line. "absorbing" is a nice word for it. The way I see it, page_add_rmap and page_remove_rmap are even more expensive than the pagtable zapping. They're even more expensive than copy_page_range. Also focus on the numbers on the right that are even more interesting to find what is worth to optimize away first IMHO > > These things aren't cheap with or without rmap. Trimming down lots of things aren't cheap, but this isn't a good reason to make them twice more expensive, especially if they were as cheap as possible and they're critical hot paths. > accounting overhead could raise search problems elsewhere. this is the point indeed, but at least in 2.4 I don't see any cpu saving advantage during swapping because during swapping the cpu is always idle anyways. Infact I had to drop the lru_cache_add too from the anonymous page fault path because it was wasting way too much cpu to get peak performance (of course you're using per-page spinlocks by hand with rmap, and lru_cache_add needs a global spinlock, so at least rmap shouldn't introduce very big scalability issue unlike the lru_cache_add) > Whether avoiding the search problem is worth the accounting overhead > could probably use some more investigation, like actually trying the > anonymous page handling rework needed to use vma-based ptov resolution. the only solution is to do rmap lazily, i.e. 
to start building the rmap during swapping by walking the pagetables, basically exactly like I refill the lru with anonymous pages only after I start to need this information recently in my 2.4 tree, so if you never need to pageout heavily several giga of ram (like most of very high end numa servers), you'll never waste a single cycle in locking or whatever other worthless accounting overhead that hurts performance of all common workloads Andrea ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 19:18 ` Andrea Arcangeli @ 2003-02-25 19:27 ` Martin J. Bligh 2003-02-25 20:30 ` Andrea Arcangeli 2003-02-25 20:10 ` William Lee Irwin III 1 sibling, 1 reply; 266+ messages in thread From: Martin J. Bligh @ 2003-02-25 19:27 UTC (permalink / raw) To: Andrea Arcangeli, William Lee Irwin III, Andrew Morton, Hanna Linder, lse-tech, linux-kernel > the only solution is to do rmap lazily, i.e. to start building the rmap > during swapping by walking the pagetables, basically exactly like I > refill the lru with anonymous pages only after I start to need this > information recently in my 2.4 tree, so if you never need to pageout > heavily several giga of ram (like most of very high end numa servers), > you'll never waste a single cycle in locking or whatever other worthless > accounting overhead that hurts performance of all common workloads Did you see the partially object-based rmap stuff? I think that does very close to what you want already. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 19:27 ` Martin J. Bligh @ 2003-02-25 20:30 ` Andrea Arcangeli 2003-02-25 20:53 ` Martin J. Bligh 0 siblings, 1 reply; 266+ messages in thread From: Andrea Arcangeli @ 2003-02-25 20:30 UTC (permalink / raw) To: Martin J. Bligh Cc: William Lee Irwin III, Andrew Morton, Hanna Linder, lse-tech, linux-kernel On Tue, Feb 25, 2003 at 11:27:40AM -0800, Martin J. Bligh wrote: > > the only solution is to do rmap lazily, i.e. to start building the rmap > > during swapping by walking the pagetables, basically exactly like I > > refill the lru with anonymous pages only after I start to need this > > information recently in my 2.4 tree, so if you never need to pageout > > heavily several giga of ram (like most of very high end numa servers), > > you'll never waste a single cycle in locking or whatever other worthless > > accounting overhead that hurts performance of all common workloads > > Did you see the partially object-based rmap stuff? I think that does > very close to what you want already. I don't see how it can optimize away the overhead but I didn't look at it for long. Andrea ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 20:30 ` Andrea Arcangeli @ 2003-02-25 20:53 ` Martin J. Bligh 2003-02-25 21:17 ` Andrea Arcangeli 0 siblings, 1 reply; 266+ messages in thread From: Martin J. Bligh @ 2003-02-25 20:53 UTC (permalink / raw) To: Andrea Arcangeli Cc: William Lee Irwin III, Andrew Morton, Hanna Linder, lse-tech, linux-kernel >> > the only solution is to do rmap lazily, i.e. to start building the rmap >> > during swapping by walking the pagetables, basically exactly like I >> > refill the lru with anonymous pages only after I start to need this >> > information recently in my 2.4 tree, so if you never need to pageout >> > heavily several giga of ram (like most of very high end numa servers), >> > you'll never waste a single cycle in locking or whatever other >> > worthless accounting overhead that hurts performance of all common >> > workloads >> >> Did you see the partially object-based rmap stuff? I think that does >> very close to what you want already. > > I don't see how it can optimize away the overhead but I didn't look at > it for long. Because you don't set up and tear down the rmap pte-chains for every fault in / delete of any page ... it just works off the vmas. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 20:53 ` Martin J. Bligh @ 2003-02-25 21:17 ` Andrea Arcangeli 2003-02-25 21:12 ` Martin J. Bligh 2003-02-25 21:26 ` William Lee Irwin III 0 siblings, 2 replies; 266+ messages in thread From: Andrea Arcangeli @ 2003-02-25 21:17 UTC (permalink / raw) To: Martin J. Bligh Cc: William Lee Irwin III, Andrew Morton, Hanna Linder, lse-tech, linux-kernel On Tue, Feb 25, 2003 at 12:53:44PM -0800, Martin J. Bligh wrote: > >> > the only solution is to do rmap lazily, i.e. to start building the rmap > >> > during swapping by walking the pagetables, basically exactly like I > >> > refill the lru with anonymous pages only after I start to need this > >> > information recently in my 2.4 tree, so if you never need to pageout > >> > heavily several giga of ram (like most of very high end numa servers), > >> > you'll never waste a single cycle in locking or whatever other > >> > worthless accounting overhead that hurts performance of all common > >> > workloads > >> > >> Did you see the partially object-based rmap stuff? I think that does > >> very close to what you want already. > > > > I don't see how it can optimize away the overhead but I didn't look at > > it for long. > > Because you don't set up and tear down the rmap pte-chains for every > fault in / delete of any page ... it just works off the vmas. so basically it uses the rmap that we always had since at least 2.2 for everything but anon mappings, right? this is what DaveM did a few years back too. This makes lots of sense to me, so at least we avoid the duplication of rmap information, even if it won't fix the anonymous page overhead, but clearly it's much lower cost for everything but anonymous pages. Andrea ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 21:17 ` Andrea Arcangeli @ 2003-02-25 21:12 ` Martin J. Bligh 2003-02-25 22:16 ` Andrea Arcangeli 2003-02-25 21:26 ` William Lee Irwin III 1 sibling, 1 reply; 266+ messages in thread From: Martin J. Bligh @ 2003-02-25 21:12 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: William Lee Irwin III, Andrew Morton, linux-kernel >> Because you don't set up and tear down the rmap pte-chains for every >> fault in / delete of any page ... it just works off the vmas. > > so basically it uses the rmap that we always had since at least 2.2 for > everything but anon mappings, right? this is what DaveM did a few years > back too. This makes lots of sense to me, so at least we avoid the > duplication of rmap information, even if it won't fix the anonymous page > overhead, but clearly it's much lower cost for everything but anonymous > pages. Right ... and anonymous chains are about 95% single-reference (at least for the case I looked at), so they're direct mapped from the struct page with no chain at all. Cuts out something like 95% of the space overhead of pte-chains, and 65% of the time (for kernel compile -j256 on 16x system). However, it's going to be a little more expensive to *use* the mappings, so we need to measure that carefully. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
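For readers who have not seen the hybrid patch, the space saving Martin describes for singly-mapped anonymous pages comes from keeping the lone pte pointer directly in struct page and only allocating a chain when a second mapping appears. The sketch below shows that shape only; the PG_direct-style flag helpers, the union layout and sketch_add_rmap() are illustrative (locking and allocation-failure handling omitted), and pte_chain_alloc() merely stands in for whatever allocator the real code uses.

/* Sketch of the direct-vs-chained representation. */
struct pte_chain {
	struct pte_chain *next;
	pte_t *ptep;
};

union rmap_union {
	pte_t *direct;			/* valid while PG_direct is set  */
	struct pte_chain *chain;	/* valid once PG_direct is clear */
};

static void sketch_add_rmap(struct page *page, union rmap_union *rmap,
			    pte_t *ptep)
{
	struct pte_chain *pc;

	if (!rmap->direct) {
		/* first mapping: store the pte pointer inline, allocate nothing */
		rmap->direct = ptep;
		SetPageDirect(page);
		return;
	}

	if (PageDirect(page)) {
		/* second mapping arrives: demote the direct pointer to a
		 * one-entry chain before chaining the new pte */
		pc = pte_chain_alloc();
		pc->ptep = rmap->direct;
		pc->next = NULL;
		rmap->chain = pc;
		ClearPageDirect(page);
	}

	/* second and later mappings: push onto the chain */
	pc = pte_chain_alloc();
	pc->ptep = ptep;
	pc->next = rmap->chain;
	rmap->chain = pc;
}

Since, as Martin measured, roughly 95% of anonymous pages never leave the "direct" state, the chain allocator is rarely touched for them at all.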
* Re: Minutes from Feb 21 LSE Call 2003-02-25 21:12 ` Martin J. Bligh @ 2003-02-25 22:16 ` Andrea Arcangeli 2003-02-25 22:17 ` Martin J. Bligh 0 siblings, 1 reply; 266+ messages in thread From: Andrea Arcangeli @ 2003-02-25 22:16 UTC (permalink / raw) To: Martin J. Bligh; +Cc: William Lee Irwin III, Andrew Morton, linux-kernel On Tue, Feb 25, 2003 at 01:12:55PM -0800, Martin J. Bligh wrote: > >> Because you don't set up and tear down the rmap pte-chains for every > >> fault in / delete of any page ... it just works off the vmas. > > > > so basically it uses the rmap that we always had since at least 2.2 for > > everything but anon mappings, right? this is what DaveM did a few years > > back too. This makes lots of sense to me, so at least we avoid the > > duplication of rmap information, even if it won't fix the anonymous page > > overhead, but clearly it's much lower cost for everything but anonymous > > pages. > > Right ... and anonymous chains are about 95% single-reference (at least for > the case I looked at), so they're direct mapped from the struct page with > no chain at all. Cuts out something like 95% of the space overhead of > pte-chains, and 65% of the time (for kernel compile -j256 on 16x system). > However, it's going to be a little more expensive to *use* the mappings, > so we need to measure that carefully. Sure, it is more expensive to use them, but all we care about is complexity, and they solve the complexity problem just fine, so I definitely prefer it. Cpu utilization during heavy swapping isn't a big deal IMHO Andrea ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 22:16 ` Andrea Arcangeli @ 2003-02-25 22:17 ` Martin J. Bligh 2003-02-25 22:37 ` Andrea Arcangeli 0 siblings, 1 reply; 266+ messages in thread From: Martin J. Bligh @ 2003-02-25 22:17 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: William Lee Irwin III, Andrew Morton, linux-kernel > Sure, it is more expensive to use them, but all we care about is > complexity, and they solve the complexity problem just fine, so I > definitely prefer it. Cpu utilization during heavy swapping isn't a big > deal IMHO I totally agree with you. However, the concerns others raised were over page aging and page stealing (eg from pagecache), which might not involve disk, but would also be slower. It probably needs some tuning and tweaking, but I'm pretty sure it's fundamentally the right approach. M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 22:17 ` Martin J. Bligh @ 2003-02-25 22:37 ` Andrea Arcangeli 0 siblings, 0 replies; 266+ messages in thread From: Andrea Arcangeli @ 2003-02-25 22:37 UTC (permalink / raw) To: Martin J. Bligh; +Cc: William Lee Irwin III, Andrew Morton, linux-kernel On Tue, Feb 25, 2003 at 02:17:48PM -0800, Martin J. Bligh wrote: > > Sure, it is more expensive to use them, but all we care about is > > complexity, and they solve the complexity problem just fine, so I > > definitely prefer it. Cpu utilization during heavy swapping isn't a big > > deal IMHO > > I totally agree with you. However the concerns others raised were over > page aging and page stealing (eg from pagecache), which might not involve > disk, but would also be slower. It probably need some tuning and tweaking, > but I'm pretty sure it's fundamentally the right approach. there's no slowdown at all when we don't need to unmap anything. We just need to avoid watching the pte young bit in the pagetables unless we're about to start unmapping stuff. Most machines won't reach the point where they need to start unmapping stuff. Watching the ptes during normal pagecache recycling would be wasteful anyways, regardless what chain we take to reach the pte. Andrea ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 21:17 ` Andrea Arcangeli 2003-02-25 21:12 ` Martin J. Bligh @ 2003-02-25 21:26 ` William Lee Irwin III 2003-02-25 22:18 ` Andrea Arcangeli 2003-02-26 5:24 ` Rik van Riel 1 sibling, 2 replies; 266+ messages in thread From: William Lee Irwin III @ 2003-02-25 21:26 UTC (permalink / raw) To: Andrea Arcangeli Cc: Martin J. Bligh, Andrew Morton, Hanna Linder, lse-tech, linux-kernel On Tue, Feb 25, 2003 at 12:53:44PM -0800, Martin J. Bligh wrote: >> Because you don't set up and tear down the rmap pte-chains for every >> fault in / delete of any page ... it just works off the vmas. On Tue, Feb 25, 2003 at 10:17:18PM +0100, Andrea Arcangeli wrote: > so basically it uses the rmap that we always had since at least 2.2 for > everything but anon mappings, right? this is what DaveM did a few years > back too. This makes lots of sense to me, so at least we avoid the > duplication of rmap information, even if it won't fix the anonymous page > overhead, but clearly it's much lower cost for everything but anonymous > pages. This is what the "anonymous rework" is about. There is already a fix extant for the file-backed case, which I presumed you knew of already, and so we were speaking of issues with the anonymous case. My impression thus far is that the anonymous case has not been pressing with respect to space consumption or cpu time once the file-backed code is in place, though if it resurfaces as a serious concern the anonymous rework can be pursued (along with other things). -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 21:26 ` William Lee Irwin III @ 2003-02-25 22:18 ` Andrea Arcangeli 2003-02-26 5:24 ` Rik van Riel 1 sibling, 0 replies; 266+ messages in thread From: Andrea Arcangeli @ 2003-02-25 22:18 UTC (permalink / raw) To: William Lee Irwin III, Martin J. Bligh, Andrew Morton, Hanna Linder, lse-tech, linux-kernel On Tue, Feb 25, 2003 at 01:26:35PM -0800, William Lee Irwin III wrote: > My impression thus far is that the anonymous case has not been pressing > with respect to space consumption or cpu time once the file-backed code > is in place, though if it resurfaces as a serious concern the anonymous > rework can be pursued (along with other things). sounds good to me ;) Andrea ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 21:26 ` William Lee Irwin III 2003-02-25 22:18 ` Andrea Arcangeli @ 2003-02-26 5:24 ` Rik van Riel 2003-02-26 5:38 ` William Lee Irwin III 1 sibling, 1 reply; 266+ messages in thread From: Rik van Riel @ 2003-02-26 5:24 UTC (permalink / raw) To: William Lee Irwin III Cc: Andrea Arcangeli, Martin J. Bligh, Andrew Morton, Hanna Linder, lse-tech, linux-kernel On Tue, 25 Feb 2003, William Lee Irwin III wrote: > My impression thus far is that the anonymous case has not been pressing > with respect to space consumption or cpu time once the file-backed code > is in place, though if it resurfaces as a serious concern the anonymous > rework can be pursued (along with other things). ... but making the anonymous pages use an object based scheme probably will make things too expensive. IIRC the object based reverse map patches by bcrl and davem both failed on the complexities needed to deal with anonymous pages. My instinct is that a hybrid system will work well in most cases and the worst case with mapped files won't be too bad. cheers, Rik -- Engineers don't grow up, they grow sideways. http://www.surriel.com/ http://kernelnewbies.org/ ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-26 5:24 ` Rik van Riel @ 2003-02-26 5:38 ` William Lee Irwin III 2003-02-26 6:01 ` Martin J. Bligh 0 siblings, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-26 5:38 UTC (permalink / raw) To: Rik van Riel Cc: Andrea Arcangeli, Martin J. Bligh, Andrew Morton, Hanna Linder, lse-tech, linux-kernel On Tue, 25 Feb 2003, William Lee Irwin III wrote: >> My impression thus far is that the anonymous case has not been pressing >> with respect to space consumption or cpu time once the file-backed code >> is in place, though if it resurfaces as a serious concern the anonymous >> rework can be pursued (along with other things). On Wed, Feb 26, 2003 at 02:24:18AM -0300, Rik van Riel wrote: > ... but making the anonymous pages use an object based > scheme probably will make things too expensive. > IIRC the object based reverse map patches by bcrl and > davem both failed on the complexities needed to deal > with anonymous pages. > My instinct is that a hybrid system will work well in > most cases and the worst case with mapped files won't > be too bad. The boxen I'm supposed to babysit need a high degree of resource consciousness wrt. lowmem allocations, so there is a clear voice on this issue. IMHO it's still an open question as to whether this is efficient for replacement concerns, which may yet favor objects. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-26 5:38 ` William Lee Irwin III @ 2003-02-26 6:01 ` Martin J. Bligh 2003-02-26 6:14 ` William Lee Irwin III 2003-02-26 16:02 ` Rik van Riel 0 siblings, 2 replies; 266+ messages in thread From: Martin J. Bligh @ 2003-02-26 6:01 UTC (permalink / raw) To: William Lee Irwin III, Rik van Riel Cc: Andrea Arcangeli, Andrew Morton, Hanna Linder, lse-tech, linux-kernel >>> My impression thus far is that the anonymous case has not been pressing >>> with respect to space consumption or cpu time once the file-backed code >>> is in place, though if it resurfaces as a serious concern the anonymous >>> rework can be pursued (along with other things). > > On Wed, Feb 26, 2003 at 02:24:18AM -0300, Rik van Riel wrote: >> ... but making the anonymous pages use an object based >> scheme probably will make things too expensive. >> IIRC the object based reverse map patches by bcrl and >> davem both failed on the complexities needed to deal >> with anonymous pages. >> My instinct is that a hybrid system will work well in >> most cases and the worst case with mapped files won't >> be too bad. > > The boxen I'm supposed to babysit need a high degree of resource > consciousness wrt. lowmem allocations, so there is a clear voice It seemed, at least on the simple kernel compile tests that I did, that all the long chains are not anonymous. It killed 95% of the space issue, which given the simplicity of the patch was pretty damned stunning. Yes, there's a pointer per page I guess we could kill in the struct page itself, but I think you already have a better method for killing mem_map bloat ;-) M. ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-26 6:01 ` Martin J. Bligh @ 2003-02-26 6:14 ` William Lee Irwin III 2003-02-26 6:32 ` William Lee Irwin III 2003-02-26 16:02 ` Rik van Riel 1 sibling, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-26 6:14 UTC (permalink / raw) To: Martin J. Bligh Cc: Rik van Riel, Andrea Arcangeli, Andrew Morton, Hanna Linder, lse-tech, linux-kernel At some point in the past, I wrote: >> The boxen I'm supposed to babysit need a high degree of resource >> consciousness wrt. lowmem allocations, so there is a clear voice On Tue, Feb 25, 2003 at 10:01:20PM -0800, Martin J. Bligh wrote: > It seemed, at least on the simple kernel compile tests that I did, that all > the long chains are not anonymous. It killed 95% of the space issue, which > given the simplicity of the patch was pretty damned stunning. Yes, there's > a pointer per page I guess we could kill in the struct page itself, but I > think you already have a better method for killing mem_map bloat ;-) I'm not going to get up in arms about this unless there's a serious performance issue that's going to get smacked down that I want to have a say in how it gets smacked down. aa is happy with the filebacked stuff, so I'm not pressing it (much) further. And yes, page clustering is certainly on its way and fast. I'm getting very close to the point where a general announcement will be in order. There's basically "one last big bug" and two bits of gross suboptimality I want to clean up before bringing the world to bear on it. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-26 6:14 ` William Lee Irwin III @ 2003-02-26 6:32 ` William Lee Irwin III 0 siblings, 0 replies; 266+ messages in thread From: William Lee Irwin III @ 2003-02-26 6:32 UTC (permalink / raw) To: Martin J. Bligh, Rik van Riel, Andrea Arcangeli, Andrew Morton, Hanna Linder, lse-tech, linux-kernel On Tue, Feb 25, 2003 at 10:01:20PM -0800, Martin J. Bligh wrote: >> It seemed, at least on the simple kernel compile tests that I did, that all >> the long chains are not anonymous. It killed 95% of the space issue, which >> given the simplicity of the patch was pretty damned stunning. Yes, there's >> a pointer per page I guess we could kill in the struct page itself, but I >> think you already have a better method for killing mem_map bloat ;-) On Tue, Feb 25, 2003 at 10:14:40PM -0800, William Lee Irwin III wrote: > I'm not going to get up in arms about this unless there's a serious > performance issue that's going to get smacked down that I want to have > a say in how it gets smacked down. aa is happy with the filebacked > stuff, so I'm not pressing it (much) further. > And yes, page clustering is certainly on its way and fast. I'm getting > very close to the point where a general announcement will be in order. > There's basically "one last big bug" and two bits of gross suboptimality > I want to clean up before bringing the world to bear on it. Screw it. Here it comes, ready or not. hch, I hope you were right... -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-26 6:01 ` Martin J. Bligh 2003-02-26 6:14 ` William Lee Irwin III @ 2003-02-26 16:02 ` Rik van Riel 2003-02-27 3:48 ` Daniel Phillips 1 sibling, 1 reply; 266+ messages in thread From: Rik van Riel @ 2003-02-26 16:02 UTC (permalink / raw) To: Martin J. Bligh Cc: William Lee Irwin III, Andrea Arcangeli, Andrew Morton, Hanna Linder, lse-tech, linux-kernel On Tue, 25 Feb 2003, Martin J. Bligh wrote: > > On Wed, Feb 26, 2003 at 02:24:18AM -0300, Rik van Riel wrote: > >> ... but making the anonymous pages use an object based > >> scheme probably will make things too expensive. > >> My instinct is that a hybrid system will work well in [snip] "wli wrote something" > It seemed, at least on the simple kernel compile tests that I did, that > all the long chains are not anonymous. It killed 95% of the space issue, > which given the simplicity of the patch was pretty damned stunning. Yes, > there's a pointer per page I guess we could kill in the struct page > itself, but I think you already have a better method for killing mem_map > bloat ;-) Also, with copy-on-write and mremap after fork, doing an object based rmap scheme for anonymous pages is just complex, almost certainly far too complex to be worth it, since it just has too many issues. Just read the patches by bcrl and davem, things get hairy fast. The pte chain rmap scheme is clean, but suffers from too much overhead for file mappings. As shown by Dave's patch, a hybrid system really is simple and clean, and it removes most of the pte chain overhead while still keeping the code nice and efficient. I think this hybrid system is the way to go, possibly with a few more tweaks left and right... regards, Rik -- Engineers don't grow up, they grow sideways. http://www.surriel.com/ http://kernelnewbies.org/ ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-26 16:02 ` Rik van Riel @ 2003-02-27 3:48 ` Daniel Phillips 0 siblings, 0 replies; 266+ messages in thread From: Daniel Phillips @ 2003-02-27 3:48 UTC (permalink / raw) To: Rik van Riel, Martin J. Bligh Cc: William Lee Irwin III, Andrea Arcangeli, Andrew Morton, Hanna Linder, lse-tech, linux-kernel On Wednesday 26 February 2003 17:02, Rik van Riel wrote: > On Tue, 25 Feb 2003, Martin J. Bligh wrote: > > It seemed, at least on the simple kernel compile tests that I did, that > > all the long chains are not anonymous. It killed 95% of the space issue, > > which given the simplicity of the patch was pretty damned stunning. Yes, > > there's a pointer per page I guess we could kill in the struct page > > itself, but I think you already have a better method for killing mem_map > > bloat ;-) > > Also, with copy-on-write and mremap after fork, doing an > object based rmap scheme for anonymous pages is just complex, > almost certainly far too complex to be worth it, since it just > has too many issues. Just read the patches by bcrl and davem, > things get hairy fast. > > The pte chain rmap scheme is clean, but suffers from too much > overhead for file mappings. There is a lot of redundancy in the rmap chains that could be exploited. If a pte page happens to reference a group of (say) 32 anon pages, then you can set each anon page's page->index to its position in the group and let a pte_chain node point at the pte of the first page of the group. You can then find each page's pte by adding its page->index to the pte_chain node's pte pointer. This allows a single rmap chain to be shared by all the pages in the group. This much of the idea is simple, however there are some tricky details to take care of. How does a copy-on-write break out one page of the group from one of the pte pages? I tried putting a (32 bit) bitmap in each pte_chain node to indicate which pte entries actually belong to the group, and that wasn't too bad except for doubling the per-link memory usage, turning a best case 32x gain into only 16x. It's probably better to break the group up, creating log2(groupsize) new chains. (This can be avoided in the common case that you already know every page in the group is going to be copied, as with a copy_from_user.) Getting rid of the bitmaps makes the single-page case the same as the current arrangement and makes it easy to let the size of a page be as large as the capacity of a whole pte page. There's also the problem of detecting groupable clusters of pages, e.g., in do_anon_page. Swap-out and swap-in introduce more messiness, as does mremap. In the end, I decided it's not needed in the current cycle, but probably worth investigating later. My purpose in bringing it up now is to show that there are still some more incremental gains to be had without needing radical surgery. > As shown by Dave's patch, a hybrid system really is simple and > clean, and it removes most of the pte chain overhead while still > keeping the code nice and efficient. > > I think this hybrid system is the way to go, possibly with a few > more tweaks left and right... Emphatically, yes. Regards, Daniel ^ permalink raw reply [flat|nested] 266+ messages in thread
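The pte-recovery step in Daniel's chain-sharing idea is compact enough to show in code. This is a sketch of that one step only, with invented names; the per-node bitmap variant and the copy-on-write break-up he discusses are deliberately left out, and nothing here is from an actual patch.

/*
 * Sketch: one shared chain node covers a naturally aligned group of
 * up to RMAP_GROUP_SIZE anonymous pages mapped by the same pte page.
 * Each page records its slot within the group in page->index, so its
 * pte is recovered by offsetting from the group's base pte.  This
 * only works while the group's ptes stay contiguous; a COW that
 * breaks one page out of the group forces the group to be split.
 */
#define RMAP_GROUP_SIZE 32

struct grouped_pte_chain {
	struct grouped_pte_chain *next;
	pte_t *base_ptep;		/* pte of slot 0 of the group */
};

static pte_t *group_ptep(struct grouped_pte_chain *pc, struct page *page)
{
	/* page->index holds this page's slot, 0..RMAP_GROUP_SIZE-1 */
	return pc->base_ptep + page->index;
}

In the best case the scheme replaces RMAP_GROUP_SIZE per-page chain nodes with a single shared node, which is where the 32x figure in the message above comes from; the bitmap variant halves that again, hence 16x.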
* Re: Minutes from Feb 21 LSE Call 2003-02-25 19:18 ` Andrea Arcangeli 2003-02-25 19:27 ` Martin J. Bligh @ 2003-02-25 20:10 ` William Lee Irwin III 2003-02-25 20:23 ` Andrea Arcangeli 1 sibling, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-25 20:10 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Andrew Morton, Hanna Linder, lse-tech, linux-kernel On Tue, Feb 25, 2003 at 10:50:08AM -0800, William Lee Irwin III wrote: >> Just taking the exception dwarfs anything written in C. >> page_add_rmap() absorbs hits from all of the fault routines and >> copy_page_range(). page_remove_rmap() absorbs hits from zap_pte_range(). >> do_wp_page() is huge because it's doing bitblitting in-line. On Tue, Feb 25, 2003 at 08:18:17PM +0100, Andrea Arcangeli wrote: > "absorbing" is a nice word for it. The way I see it, page_add_rmap and > page_remove_rmap are even more expensive than the pagetable zapping. > They're even more expensive than copy_page_range. Also focus on the > numbers on the right that are even more interesting to find what is > worth optimizing away first, IMHO Those just divide the number of hits by the size of the function IIRC, which is useless for some codepath spinning hard in the middle of a large function or in the presence of over-inlining. It's also greatly disturbed by spinlock section hackery (as are most profilers). On Tue, Feb 25, 2003 at 10:50:08AM -0800, William Lee Irwin III wrote: >> These things aren't cheap with or without rmap. Trimming down On Tue, Feb 25, 2003 at 08:18:17PM +0100, Andrea Arcangeli wrote: > lots of things aren't cheap, but this isn't a good reason to make them > twice as expensive, especially if they were as cheap as possible and > they're critical hot paths. They weren't as cheap as possible and it's a bad idea to make them so. SVR4 proved there are limits to the usefulness of lazy evaluation wrt. pagetable copying and the like. You're also looking at sampling hits, not end-to-end timings. After all these disclaimers, trimming down cpu cost is a good idea. On Tue, Feb 25, 2003 at 10:50:08AM -0800, William Lee Irwin III wrote: >> accounting overhead could raise search problems elsewhere. On Tue, Feb 25, 2003 at 08:18:17PM +0100, Andrea Arcangeli wrote: > this is the point indeed, but at least in 2.4 I don't see any cpu saving > advantage during swapping because during swapping the cpu is always idle > anyways. It's probably not swapping that matters, but high turnover of clean data. No one can really make a concrete assertion without some implementations of the alternatives, which is why I think they need to be done soon. Once one or more are there, we're set. I'm personally in favor of the anonymous handling rework as the alternative to pursue, since that actually retains the locality of reference as opposed to wild pagetable scanning over random processes, which is highly unpredictable with respect to locality and even worse with respect to cpu consumption. On Tue, Feb 25, 2003 at 08:18:17PM +0100, Andrea Arcangeli wrote: > In fact I had to drop the lru_cache_add too from the anonymous page fault > path because it was wasting way too much cpu to get peak performance (of > course you're using per-page spinlocks by hand with rmap, and > lru_cache_add needs a global spinlock, so at least rmap shouldn't > introduce a very big scalability issue, unlike the lru_cache_add) The high arrival rates to LRU lists in do_anonymous_page() etc.
were dealt with by the pagevec batching infrastructure in 2.5.x, which is the primary method by which pagemap_lru_lock contention was addressed. The "breakup", so to speak, is primarily for locality of reference. Which reminds me, my node-local pgdat allocation patch is pending... On Tue, Feb 25, 2003 at 10:50:08AM -0800, William Lee Irwin III wrote: >> Whether avoiding the search problem is worth the accounting overhead >> could probably use some more investigation, like actually trying the >> anonymous page handling rework needed to use vma-based ptov resolution. On Tue, Feb 25, 2003 at 08:18:17PM +0100, Andrea Arcangeli wrote: > the only solution is to do rmap lazily, i.e. to start building the rmap > during swapping by walking the pagetables, basically exactly like I recently > changed my 2.4 tree to refill the lru with anonymous pages only after I > start to need this information. So if you never need to heavily page out > several gigabytes of ram (like most very high end numa servers), you'll > never waste a single cycle on locking or any other worthless accounting > overhead that hurts the performance of all common workloads. I'd just bite the bullet and do the anonymous rework. Building pte_chains lazily raises the issue of needing to allocate in order to free, which is relatively thorny. Maintaining any level of accuracy of the things with lazy buildup is also problematic. Both that and the whole space issue wrt. pte_chains are blown away by the anonymous rework, which is a significant advantage. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
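For context on the pagevec batching mentioned above, here is a minimal sketch of the batching pattern: pages are staged in a small vector and the LRU lock is taken once per batch instead of once per page. This is not the actual 2.5 mm/swap.c code; the real infrastructure keeps per-CPU pagevecs, the 2.5 lock is per-zone rather than the global pagemap_lru_lock used here, and lru_cache_add_batched() is an invented name.

/*
 * Rough sketch of pagevec-style batching: instead of taking the LRU
 * lock once per page in do_anonymous_page() and friends, pages are
 * staged in a small vector and the lock is taken once per batch.
 * Simplified; the real code uses per-CPU vectors, not a static one.
 */
#define PAGEVEC_SIZE 16

struct pagevec {
	unsigned int nr;
	struct page *pages[PAGEVEC_SIZE];
};

static void pagevec_flush_lru(struct pagevec *pvec)
{
	unsigned int i;

	spin_lock(&pagemap_lru_lock);	/* one acquisition per batch */
	for (i = 0; i < pvec->nr; i++)
		add_page_to_inactive_list(pvec->pages[i]);
	spin_unlock(&pagemap_lru_lock);
	pvec->nr = 0;
}

void lru_cache_add_batched(struct page *page)
{
	/* per-CPU (with preemption disabled) in the real kernel */
	static struct pagevec pvec;

	pvec.pages[pvec.nr++] = page;
	if (pvec.nr == PAGEVEC_SIZE)
		pagevec_flush_lru(&pvec);
}

The effect is that the global-lock cost Andrea objects to is amortized over PAGEVEC_SIZE insertions, which is why the contention argument against lru_cache_add() carries less weight in 2.5 than in 2.4.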
* Re: Minutes from Feb 21 LSE Call 2003-02-25 20:10 ` William Lee Irwin III @ 2003-02-25 20:23 ` Andrea Arcangeli 2003-02-25 20:46 ` William Lee Irwin III 0 siblings, 1 reply; 266+ messages in thread From: Andrea Arcangeli @ 2003-02-25 20:23 UTC (permalink / raw) To: William Lee Irwin III, Andrew Morton, Hanna Linder, lse-tech, linux-kernel On Tue, Feb 25, 2003 at 12:10:23PM -0800, William Lee Irwin III wrote: > I'd just bite the bullet and do the anonymous rework. Building > pte_chains lazily raises the issue of needing to allocate in order to note that there is no need to allocate in order to free. Andrea ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 20:23 ` Andrea Arcangeli @ 2003-02-25 20:46 ` William Lee Irwin III 2003-02-25 20:52 ` Andrea Arcangeli 0 siblings, 1 reply; 266+ messages in thread From: William Lee Irwin III @ 2003-02-25 20:46 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Andrew Morton, Hanna Linder, lse-tech, linux-kernel On Tue, Feb 25, 2003 at 12:10:23PM -0800, William Lee Irwin III wrote: >> I'd just bite the bullet and do the anonymous rework. Building >> pte_chains lazily raises the issue of needing to allocate in order to On Tue, Feb 25, 2003 at 09:23:35PM +0100, Andrea Arcangeli wrote: > note that there is no need to allocate in order to free. I've no longer got any idea what you're talking about, then. -- wli ^ permalink raw reply [flat|nested] 266+ messages in thread
* Re: Minutes from Feb 21 LSE Call 2003-02-25 20:46 ` William Lee Irwin III @ 2003-02-25 20:52 ` Andrea Arcangeli 0 siblings, 0 replies; 266+ messages in thread From: Andrea Arcangeli @ 2003-02-25 20:52 UTC (permalink / raw) To: William Lee Irwin III, Andrew Morton, Hanna Linder, lse-tech, linux-kernel On Tue, Feb 25, 2003 at 12:46:16PM -0800, William Lee Irwin III wrote: > On Tue, Feb 25, 2003 at 12:10:23PM -0800, William Lee Irwin III wrote: > >> I'd just bite the bullet and do the anonymous rework. Building > >> pte_chains lazily raises the issue of needing to allocate in order to > > On Tue, Feb 25, 2003 at 09:23:35PM +0100, Andrea Arcangeli wrote: > > note that there is no need to allocate in order to free. > > I've no longer got any idea what you're talking about, then. Were we able to release memory w/o rmap: yes. Can we do it again: yes. Can we use a bit of the released memory to release further memory more efficiently with rmap: yes. I'm not saying it's easy to implement that, but the problem that we'll need memory to release memory doesn't exist, since it also never existed before rmap was introduced into the kernel. Sure, the early stage of the swapping would be more cpu-intensive, but that is the feature. Andrea ^ permalink raw reply [flat|nested] 266+ messages in thread
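For reference, the pre-rmap behaviour Andrea is pointing at: 2.4 reclaims anonymous memory by scanning the page tables of the processes themselves, so nothing has to be allocated on the way to freeing a page. The sketch below is a heavily condensed imitation of that style, not the real 2.4 swap_out() code; locking, swap-entry allocation and the pgd/pmd/pte walk are omitted, and lookup_pte() and try_to_unmap_one_pte() are illustrative helpers.

/*
 * Condensed sketch of a 2.4-style virtual scan: reclaim walks the page
 * tables of an mm and unmaps not-recently-used anonymous pages
 * directly, so no reverse-map structure (and hence no allocation) is
 * needed in order to free memory.  Names are approximate.
 */
static int swap_out_mm_sketch(struct mm_struct *mm, int count)
{
	struct vm_area_struct *vma;
	unsigned long addr;

	for (vma = mm->mmap; vma && count; vma = vma->vm_next) {
		for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
			pte_t *pte = lookup_pte(mm, addr);	/* illustrative helper */

			if (!pte || !pte_present(*pte))
				continue;
			if (ptep_test_and_clear_young(pte))
				continue;		/* recently used, skip it */

			try_to_unmap_one_pte(vma, addr, pte);	/* illustrative helper */
			count--;
		}
	}
	return count;
}

The cost Andrea concedes is also visible here: without any reverse map, finding reclaimable pages means burning CPU walking page tables of processes that may not own any, which is exactly the locality and CPU-consumption objection raised earlier in the thread.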