* Re: kernel + gcc 4.1 = several problems
@ 2007-01-04 7:11 Albert Cahalan
2007-01-04 16:43 ` Segher Boessenkool
0 siblings, 1 reply; 60+ messages in thread
From: Albert Cahalan @ 2007-01-04 7:11 UTC (permalink / raw)
To: mikpe, s0348365, torvalds, linux-kernel, akpm, bunk
Linus Torvalds writes:
> [probably Mikael Pettersson] writes:
>> The suggestions I've had so far which I have not yet tried:
>>
>> - Select a different x86 CPU in the config.
>> - Unfortunately the C3-2 flags seem to simply tell GCC to
>> schedule for ppro (like i686) and enabled MMX and SSE
>> - Probably useless
>
> Actually, try this one. Try using something that doesn't like "cmov".
> Maybe the C3-2 simply has some internal cmov bugginess.
Of course that changes register usage, register spilling,
and thus ultimately even the stack layout. :-(
Adjusting gcc flags to eliminate optimizations is another way to go.
Adding -fwrapv would be an excellent start. Lack of this flag breaks
most code which checks for integer wrap-around. The compiler "knows"
that signed integers don't ever wrap, and thus eliminates any code
which checks for values going negative after a wrap-around. I could
imagine this affecting a switch() or other jump table.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems
2007-01-04 7:11 kernel + gcc 4.1 = several problems Albert Cahalan
@ 2007-01-04 16:43 ` Segher Boessenkool
2007-01-04 17:04 ` Albert Cahalan
0 siblings, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2007-01-04 16:43 UTC (permalink / raw)
To: Albert Cahalan; +Cc: akpm, linux-kernel, s0348365, bunk, mikpe, torvalds
> Adjusting gcc flags to eliminate optimizations is another way to go.
> Adding -fwrapv would be an excellent start. Lack of this flag breaks
> most code which checks for integer wrap-around.
Lack of the flag does not break any valid C code, only code
making unwarranted assumptions (i.e., buggy code).
> The compiler "knows"
> that signed integers don't ever wrap, and thus eliminates any code
> which checks for values going negative after a wrap-around.
You cannot assume it eliminates such code; the compiler is free
to do whatever it wants in such a case.
You should typically write such a computation using unsigned
types, FWIW.
Anyway, with 4.1 you shouldn't see frequent problems due to
"not using -fwrapv while my code is broken WRT signed overflow"
yet; and if/when problems start to happen, the "correct" action
to take is not to add the compiler flag, but to fix the code.
Segher
^ permalink raw reply [flat|nested] 60+ messages in thread
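Editorial illustration of the point argued in the message above. A check such as "a + b < a" on signed ints relies on wrap-around, which ISO C leaves undefined, so a compiler may treat it as never true unless -fwrapv is given. The sketch below shows two ways to write the check that do not depend on the flag. The function names are hypothetical and do not come from the thread or from the kernel.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: reports whether a + b would overflow an int.
 * The overflowing addition itself is never performed, so no undefined
 * behaviour is involved. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* a + b would exceed INT_MAX */
    return a < INT_MIN - b;       /* a + b would fall below INT_MIN */
}

/* Hypothetical helper: the classic wrap-around test, done on unsigned
 * operands, where arithmetic modulo 2^N is defined by the language. */
bool uadd_wraps(unsigned int a, unsigned int b)
{
    return a + b < a;
}
```

The first form keeps the operands signed and tests the bounds before adding; the second is the "use unsigned types" suggestion made above.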
* Re: kernel + gcc 4.1 = several problems
2007-01-04 16:43 ` Segher Boessenkool
@ 2007-01-04 17:04 ` Albert Cahalan
2007-01-04 17:24 ` Segher Boessenkool
` (2 more replies)
0 siblings, 3 replies; 60+ messages in thread
From: Albert Cahalan @ 2007-01-04 17:04 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: akpm, linux-kernel, s0348365, bunk, mikpe, torvalds
On 1/4/07, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > Adjusting gcc flags to eliminate optimizations is another way to go.
> > Adding -fwrapv would be an excellent start. Lack of this flag breaks
> > most code which checks for integer wrap-around.
>
> Lack of the flag does not break any valid C code, only code
> making unwarranted assumptions (i.e., buggy code).
Right, if "C" means "strictly conforming ISO C" to you.
(in which case, nearly all real-world code is broken)
FYI, the kernel also assumes that a "char" is 8 bits.
Maybe you should run away screaming.
> > The compiler "knows"
> > that signed integers don't ever wrap, and thus eliminates any code
> > which checks for values going negative after a wrap-around.
>
> You cannot assume it eliminates such code; the compiler is free
> to do whatever it wants in such a case.
>
> You should typically write such a computation using unsigned
> types, FWIW.
>
> Anyway, with 4.1 you shouldn't see frequent problems due to
Right, it gets much worse with the current gcc snapshots.
IMHO you should play such games with "g++ -O9", but that's
a discussion for a different mailing list.
> "not using -fwrapv while my code is broken WRT signed overflow"
> yet; and if/when problems start to happen, the "correct" action
> to take is not to add the compiler flag, but to fix the code.
Nope, unless we decide that the performance advantages of
a language change are worth the risk and pain.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems
2007-01-04 17:04 ` Albert Cahalan
@ 2007-01-04 17:24 ` Segher Boessenkool
2007-01-04 17:47 ` Linus Torvalds
2007-01-05 17:17 ` Pavel Machek
2007-01-04 17:37 ` Linus Torvalds
2007-01-04 18:08 ` Andreas Schwab
2 siblings, 2 replies; 60+ messages in thread
From: Segher Boessenkool @ 2007-01-04 17:24 UTC (permalink / raw)
To: Albert Cahalan; +Cc: akpm, linux-kernel, s0348365, bunk, mikpe, torvalds
>> Lack of the flag does not break any valid C code, only code
>> making unwarranted assumptions (i.e., buggy code).
>
> Right, if "C" means "strictly conforming ISO C" to you.
Without any further qualification, it of course does, yes.
> (in which case, nearly all real-world code is broken)
Not "nearly all" -- but lots of code, yes.
> FYI, the kernel also assumes that a "char" is 8 bits.
> Maybe you should run away screaming.
No, that's fine with me. It's fine with GCC as well of course.
>> Anyway, with 4.1 you shouldn't see frequent problems due to
>
> Right, it gets much worse with the current gcc snapshots.
Yes. And that problem will be fixed some way pretty soon --
simply because it _has_ to be fixed.
> IMHO you should play such games with "g++ -O9", but that's
> a discussion for a different mailing list.
For a different mailing list indeed; let me just point out
that for certain important quite common cases it's an ~50%
overall speedup.
>> "not using -fwrapv while my code is broken WRT signed overflow"
>> yet; and if/when problems start to happen, the "correct" action
>> to take is not to add the compiler flag, but to fix the code.
>
> Nope, unless we decide that the performance advantages of
> a language change are worth the risk and pain.
If the kernel breaks all over the place, of course you should
add the flag. But it won't, it would break *all* programs all
over the place then, and that wouldn't be acceptable to GCC.
If instead only a few kernel code bugs pop up, it's easy to fix.
Aaaaanyway -- my only real point was to point out that there's
no doomsday scenario here, yes current GCC TOT seems to regress
here (for some definition of that word), but GCC development is
in stage 1, that sort of thing happens. It'll stabilise again.
In the meantime, building git HEAD kernels with GCC 4.1 and 4.2
will probably rattle out quite a few bugs still, both in the
kernel and in GCC -- neither is used all that often it seems?
Segher
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems
2007-01-04 17:24 ` Segher Boessenkool
@ 2007-01-04 17:47 ` Linus Torvalds
2007-01-04 18:53 ` Segher Boessenkool
2007-01-04 19:10 ` Al Viro
2007-01-05 17:17 ` Pavel Machek
1 sibling, 2 replies; 60+ messages in thread
From: Linus Torvalds @ 2007-01-04 17:47 UTC (permalink / raw)
To: Segher Boessenkool
Cc: Albert Cahalan, akpm, linux-kernel, s0348365, bunk, mikpe
On Thu, 4 Jan 2007, Segher Boessenkool wrote:
>
> > (in which case, nearly all real-world code is broken)
>
> Not "nearly all" -- but lots of code, yes.
I wouldn't say "lots of code". I would say "all real projects".
NOBODY will guarantee you that they follow all standards to the letter.
Some use compiler extensions knowingly, but pretty much _everybody_ ends
up depending on subtle issues without even realizing it. It's almost
impossible to write a real program that has no bugs, and if they don't
show up in testing (because the compiler didn't generate buggy assembly
code from source code that had the _potential_ for bugs), they often won't
get fixed.
The kernel does things like compare pointers across objects, and the
kernel EXPECTS it to work. I seriously doubt that the kernel is even
unusual in this. The common way to avoid AB-BA deadlocks in any threaded
code (whether kernel or user space) is to just take two locks in a
specific order, and the common way to do that for locks of the same type
is simply to compare the addresses).
The fact that this is "undefined" behaviour matters not a _whit_. Not for
the kernel, and I bet not for a lot of other applications either.
So "nearly all" is probably _understating_ things rather than overstating
it as you claim. Anybody who thinks that they have proven the correctness
of their program is likely lying. It's a good thing if they have _tested_
all the code-paths, but they've invariably been tested with a compiler
that doesn't go out of its way to try to generate "legal but idiotic"
code. So the testing won't generally find cases where the compiler may
have been _allowed_ to do something else.
The end result: any nontrivial project always has dodgy code. Because
people simply don't write perfect code.
Compiler people who don't realize this aren't compiler people. They're
academics involved with mental masturbation.
Linus
^ permalink raw reply [flat|nested] 60+ messages in thread
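Editorial illustration of the lock-ordering idiom described in the message above: two locks of the same type are always taken lowest address first, so two threads can never hold them in opposite (AB-BA) order. This is a user-space sketch using POSIX mutexes, not kernel code; lock_pair() and unlock_pair() are hypothetical names. The addresses are compared after conversion to uintptr_t, which is an assumption made here to sidestep the relational comparison of pointers to unrelated objects; the result of that conversion is implementation-defined.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical helper: acquire two mutexes in a fixed global order,
 * lowest address first. Every caller uses the same order, so an
 * AB-BA deadlock between two callers cannot occur. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if (a == b) {                          /* same lock: take it once */
        pthread_mutex_lock(a);
        return;
    }
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(a);
        pthread_mutex_lock(b);
    } else {
        pthread_mutex_lock(b);
        pthread_mutex_lock(a);
    }
}

/* Hypothetical helper: release both mutexes. The release order does
 * not matter for deadlock avoidance. */
void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    if (a != b)
        pthread_mutex_unlock(b);
}
```

Calling lock_pair(&x, &y) in one thread and lock_pair(&y, &x) in another results in the same acquisition order in both.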
* Re: kernel + gcc 4.1 = several problems
2007-01-04 17:47 ` Linus Torvalds
@ 2007-01-04 18:53 ` Segher Boessenkool
2007-01-04 19:10 ` Al Viro
1 sibling, 0 replies; 60+ messages in thread
From: Segher Boessenkool @ 2007-01-04 18:53 UTC (permalink / raw)
To: Linus Torvalds; +Cc: akpm, Albert Cahalan, linux-kernel, s0348365, bunk, mikpe
>>> (in which case, nearly all real-world code is broken)
>>
>> Not "nearly all" -- but lots of code, yes.
>
> I wouldn't say "lots of code". I would say "all real projects".
All projects that tell the compiler they're written in ISO C,
while they're not, can easily break, sure. You can't say this
is GCC's fault; sure in some cases decisions were made that
resulted in more of those programs breaking than was really
necessary, but it's obviously *impossible* to prevent all
from breaking.
And yes it's true: most people do not program in ISO C at all,
_even if they think they do_, simply because they are not
aware of all the rules.
For some of the areas where most of the mistakes are made, for
example aliasing rules and signed overflow, GCC provides helpful
options to switch behaviour to something that makes those people's
programs work. You can also use those options if you have made a
conscious decision that you want to write your code in one of the
resulting dialects of C.
Segher
p.s. If it's decided to not use -fwrapv, a debug option that
sets -ftrapv can be introduced -- it will make it a BUG() if any
(accidental) signed overflow happens after all.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems
2007-01-04 17:47 ` Linus Torvalds
2007-01-04 18:53 ` Segher Boessenkool
@ 2007-01-04 19:10 ` Al Viro
1 sibling, 0 replies; 60+ messages in thread
From: Al Viro @ 2007-01-04 19:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: Segher Boessenkool, Albert Cahalan, akpm, linux-kernel, s0348365, bunk, mikpe
On Thu, Jan 04, 2007 at 09:47:01AM -0800, Linus Torvalds wrote:
> NOBODY will guarantee you that they follow all standards to the letter.
> Some use compiler extensions knowingly, but pretty much _everybody_ ends
> up depending on subtle issues without even realizing it. It's almost
> impossible to write a real program that has no bugs, and if they don't
> show up in testing (because the compiler didn't generate buggy assembly
> code from source code that had the _potential_ for bugs), they often won't
> get fixed.
>
> The kernel does things like compare pointers across objects, and the
> kernel EXPECTS it to work. I seriously doubt that the kernel is even
> unusual in this. The common way to avoid AB-BA deadlocks in any threaded
> code (whether kernel or user space) is to just take two locks in a
> specific order, and the common way to do that for locks of the same type
> is simply to compare the addresses).
>
> The fact that this is "undefined" behaviour matters not a _whit_. Not for
> the kernel, and I bet not for a lot of other applications either.
True, but we'd better understand what assumptions we are making. I have
seen patches seriously attempting to _subtract_ unrelated pointers. And
that simply doesn't work for obvious reasons...
^ permalink raw reply [flat|nested] 60+ messages in thread
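Editorial illustration of the pointer-subtraction remark in the message above. The message does not spell out the "obvious reasons"; one plausible reading, offered here as an assumption, is that subtracting two pointers yields a count of elements, which is only meaningful when both point into the same array. For unrelated objects the byte distance need not be a multiple of the element size. The struct and function names below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type whose size (12 bytes) is not a power of two. */
struct item {
    char payload[12];
};

/* Well-defined: both pointers must point into the same array, and the
 * result is a number of elements, not bytes. */
ptrdiff_t index_in(const struct item *base, const struct item *p)
{
    return p - base;
}

/* Byte distance between two addresses, computed on integers rather than
 * on pointers. The pointer-to-integer conversion is implementation-
 * defined; on flat-address-space targets it gives the expected value. */
uintptr_t byte_distance(const void *lo, const void *hi)
{
    return (uintptr_t)hi - (uintptr_t)lo;
}
```

index_in() is the supported use; byte_distance() is what code usually wants when the two objects are not part of one array.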
* Re: kernel + gcc 4.1 = several problems
2007-01-04 17:24 ` Segher Boessenkool
2007-01-04 17:47 ` Linus Torvalds
@ 2007-01-05 17:17 ` Pavel Machek
2007-01-06 8:23 ` Segher Boessenkool
1 sibling, 1 reply; 60+ messages in thread
From: Pavel Machek @ 2007-01-05 17:17 UTC (permalink / raw)
To: Segher Boessenkool
Cc: Albert Cahalan, akpm, linux-kernel, s0348365, bunk, mikpe, torvalds
Hi!
> >IMHO you should play such games with "g++ -O9", but that's
> >a discussion for a different mailing list.
>
> For a different mailing list indeed; let me just point out
> that for certain important quite common cases it's an ~50%
> overall speedup.
Hmm, what code was that? 'signed int does not wrap around' does not
seem to provide _that_ much info...
Pavel
--
Thanks for all the (sleeping) penguins.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems
2007-01-05 17:17 ` Pavel Machek
@ 2007-01-06 8:23 ` Segher Boessenkool
0 siblings, 0 replies; 60+ messages in thread
From: Segher Boessenkool @ 2007-01-06 8:23 UTC (permalink / raw)
To: Pavel Machek
Cc: akpm, Albert Cahalan, linux-kernel, s0348365, bunk, mikpe, torvalds
>> For a different mailing list indeed; let me just point out
>> that for certain important quite common cases it's an ~50%
>> overall speedup.
>
> Hmm, what code was that? 'signed int does not wrap around' does not
> seem to provide _that_ much info...
One of the recent huge threads on the GCC dev list has a post
that says *some other* compiler gets a result like this from
this optimisation (I don't have a link to the exact post and I
don't remember the details; perhaps it was XLC?)
Sorry if I wasn't clear enough and you understood I meant that
GCC exploits this optimisation opportunity well enough for such
nice results already.
- - -
So I searched for it anyway:
<http://gcc.gnu.org/ml/gcc/2006-12/msg00768.html>
It looks like the result for *integer* code wasn't *all* that
dramatic a difference.
Anyway, it's obvious that the optimisation can certainly give
nice results and it wouldn't be a good idea for the Linux kernel
to dismiss it without really evaluating the impact first; and
anyway, this is for some future date, GCC-4.2 isn't here yet.
Segher
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems
2007-01-04 17:04 ` Albert Cahalan
2007-01-04 17:24 ` Segher Boessenkool
@ 2007-01-04 17:37 ` Linus Torvalds
2007-01-04 18:34 ` Segher Boessenkool
2007-01-07 4:25 ` Denis Vlasenko
2007-01-04 18:08 ` Andreas Schwab
2 siblings, 2 replies; 60+ messages in thread
From: Linus Torvalds @ 2007-01-04 17:37 UTC (permalink / raw)
To: Albert Cahalan
Cc: Segher Boessenkool, akpm, linux-kernel, s0348365, bunk, mikpe
On Thu, 4 Jan 2007, Albert Cahalan wrote:
> On 1/4/07, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >
> > Lack of the flag does not break any valid C code, only code
> > making unwarranted assumptions (i.e., buggy code).
>
> Right, if "C" means "strictly conforming ISO C" to you.
> (in which case, nearly all real-world code is broken)
Indeed. The gcc people seem to often think that "language lawyering" is a
good idea, and totally overrides "real world". The whole flap about the
completely idiotic things they do (or at least did) for alias analysis on
the grounds that "they can" is an example of this.
> FYI, the kernel also assumes that a "char" is 8 bits.
> Maybe you should run away screaming.
Gcc people are quick to condemn others for assumptions that break
standards, but it has tons of assumptions very deeply embedded itself. I
don't think it could realistically work very well on setups where pointers
aren't the same size as long, and it has various deep assumptions itself
about what is "realistic".
The kernel does the same. Some of it intentional and by design, much of it
probably totally unintentional, but the result of "it worked, and nobody
even thought about anything else".
With 7+ million lines of C code and headers, I'm not interested in
compilers that read the letter of the law. We don't want some really
clever code generation that gets us .5% on some unrealistic load. We want
good _solid_ code generation that does the obvious thing.
Compiler writers seem to seldom even realize this. A lot of commercial
code gets shipped with basically no optimizations at all (or with specific
optimizations turned off), because people want to ship what they debug and
work with.
I'll happily turn off compiler features that are "clever optimizations
that never actually matter in practice, but are just likely to possibly
cause problems".
The sad part is that "straightforward optimizations" (as opposed to
"really clever ones") often work better in practice too. At least with
kernel code, which is not that high-level to begin with.
> > to take is not to add the compiler flag, but to fix the code.
>
> Nope, unless we decide that the performance advantages of
> a language change are worth the risk and pain.
Indeed. We'd happily fix the code if:
(a) it's reasonably easy to find places that are buggy.
(b) there are syntactically sane ways to fix it
(c) the optimization actually makes sense and is worthwhile
An example of where _none_ of these things were true was the old gcc alias
analysis. I think gcc eventually added a sane way to mark pointers as
being possible aliases (ie case (b): give a syntactically acceptable way
for code maintainability to actually fix things), but since neither (a)
nor (b) are there, the _correct_ solution was just to tell the compiler to
stop doing that.
With integer overflow optimizations, the same situation may be true. The
kernel has never been "strict ANSI C". We've always used C extensions. The
extension of "signed integer arithmetic follows 2's-complement-arithmetic"
is a perfectly sane extension to the language, and quite possibly worth
it.
And the fact that it's not "strict ANSI C" has absolutely _zero_
relevance.
Linus
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems
2007-01-04 17:37 ` Linus Torvalds
@ 2007-01-04 18:34 ` Segher Boessenkool
2007-01-04 22:02 ` Geert Bosch
2007-01-07 4:25 ` Denis Vlasenko
1 sibling, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2007-01-04 18:34 UTC (permalink / raw)
To: Linus Torvalds; +Cc: akpm, Albert Cahalan, linux-kernel, s0348365, bunk, mikpe
> I'll happily turn off compiler features that are "clever optimizations
> that never actually matter in practice, but are just likely to possibly
> cause problems".
The "signed wrap is undefined" thing doesn't fit in this category
though:
-- It is an important optimisation for loops with a signed
induction variable;
-- "Random code" where it causes problems is typically buggy
already (i.e., code that doesn't take overflow into account
at all won't expect wraparound either);
-- Code that explicitly depends on signed overflow two's complement
wraparound can be trivially converted to use unsigned arithmetic
(and in almost all cases it really should have used that already).
If GCC can generate warnings for things in the second bullet point
(and it probably will, but nothing is finalised yet), I don't see a
reason for the kernel to turn off the optimisation. Why not try it
out and only _if_ it causes troubles (after the compiler version is
stable) turn it off.
>>> to take is not to add the compiler flag, but to fix the code.
>>
>> Nope, unless we decide that the performance advantages of
>> a language change are worth the risk and pain.
But it's not a language change -- GCC has worked like this
for a _long_ time already, since May 2003 if I read the
ChangeLog correctly -- it's just that it starts to optimise
some things more aggressively now.
> With integer overflow optimizations, the same situation may be true. The
> kernel has never been "strict ANSI C". We've always used C extensions. The
> extension of "signed integer arithmetic follows 2's-complement-arithmetic"
> is a perfectly sane extension to the language, and quite possibly worth
> it.
Could be. Who knows, without testing. I'm just saying to not add
-fwrapv purely as a knee-jerk reaction.
> And the fact that it's not "strict ANSI C" has absolutely _zero_
> relevance.
I certainly never claimed so, that's all in Albert's mind it seems :-)
Segher
^ permalink raw reply [flat|nested] 60+ messages in thread
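Editorial illustration of the "loops with a signed induction variable" bullet in the message above. The function below is a hypothetical example, and the description of what an optimiser may do with it is an assumption, not a statement about any particular GCC version. With a signed counter stepping by 2, the compiler is allowed to assume that "i += 2" never overflows, so the trip count follows directly from n. With -fwrapv, or with an unsigned counter, the case where i wraps past the maximum before reaching n has to be considered as well.

```c
/* Hypothetical example: sum every other element of v[0..n-1].
 * Because signed overflow is undefined, the compiler may assume the
 * counter never wraps and treat the loop as running a number of
 * iterations that depends only on n. */
long sum_every_other(const int *v, int n)
{
    long total = 0;

    for (int i = 0; i < n; i += 2)
        total += v[i];
    return total;
}
```

For realistic array sizes the two readings behave identically at run time; the difference is only in what the compiler is entitled to assume while transforming the loop.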
* Re: kernel + gcc 4.1 = several problems
2007-01-04 18:34 ` Segher Boessenkool
@ 2007-01-04 22:02 ` Geert Bosch
0 siblings, 0 replies; 60+ messages in thread
From: Geert Bosch @ 2007-01-04 22:02 UTC (permalink / raw)
To: Segher Boessenkool
Cc: Linus Torvalds, akpm, Albert Cahalan, linux-kernel, s0348365, bunk, mikpe
On Jan 4, 2007, at 13:34, Segher Boessenkool wrote:
> The "signed wrap is undefined" thing doesn't fit in this category
> though:
>
> -- It is an important optimisation for loops with a signed
> induction variable;
It certainly isn't that important. Even SpecINT compiled
with -O3 and top-of-tree GCC *improves* 1% by adding -fwrapv.
If the compiler itself can rely on wrap-around semantics and
doesn't have to worry about introducing overflows between
optimization passes, it can reorder simple chains of additions.
This is more important for many real-world applications than
being able to perform some complex loop-interchange.
Compiler developers always make the mistake of overrating
their optimizations.
If GCC does really poorly on a few important loops that matter,
that issue is easily addressed. If GCC generates unreliable
code for millions of boring lines of important real-world C,
the compiler is worthless.
-Geert
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems
2007-01-04 17:37 ` Linus Torvalds
2007-01-04 18:34 ` Segher Boessenkool
@ 2007-01-07 4:25 ` Denis Vlasenko
2007-01-07 4:45 ` Linus Torvalds
2007-01-07 15:10 ` Segher Boessenkool
1 sibling, 2 replies; 60+ messages in thread
From: Denis Vlasenko @ 2007-01-07 4:25 UTC (permalink / raw)
To: Linus Torvalds
Cc: Albert Cahalan, Segher Boessenkool, akpm, linux-kernel, s0348365, bunk, mikpe
On Thursday 04 January 2007 18:37, Linus Torvalds wrote:
> With 7+ million lines of C code and headers, I'm not interested in
> compilers that read the letter of the law. We don't want some really
> clever code generation that gets us .5% on some unrealistic load. We want
> good _solid_ code generation that does the obvious thing.
>
> Compiler writers seem to seldom even realize this. A lot of commercial
> code gets shipped with basically no optimizations at all (or with specific
> optimizations turned off), because people want to ship what they debug and
> work with.
I'd say "care about obvious, safe optimizations which we still do not do".
I want this:
    char v[4];
    ...
    memcmp(v, "abcd", 4) == 0
to compile to a single cmpl on i386. This (gcc 4.1.1) is ridiculous:
    .LC0:
    .string "abcd"
    .text
    ...
    pushl $4
    pushl $.LC0
    pushl $v
    call memcmp
    addl $12, %esp
    testl %eax, %eax
There are tons of examples where you can improve code generation.
--
vda
^ permalink raw reply [flat|nested] 60+ messages in thread
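Editorial illustration of the code generation asked for in the message above. A 4-byte memcmp() against a constant is equivalent to comparing two 32-bit words, which is what a single cmpl does. The sketch below writes that comparison by hand; both function names are hypothetical. Loading the bytes with memcpy() avoids alignment and aliasing problems, and because both operands are loaded the same way the result does not depend on byte order.

```c
#include <stdint.h>
#include <string.h>

/* The form from the message: a library-style comparison of 4 bytes. */
int is_abcd_memcmp(const char v[4])
{
    return memcmp(v, "abcd", 4) == 0;
}

/* Hand-written equivalent: load both operands as 32-bit words and
 * compare them with one integer comparison. */
int is_abcd_word(const char v[4])
{
    uint32_t have, want;

    memcpy(&have, v, sizeof have);
    memcpy(&want, "abcd", sizeof want);
    return have == want;
}
```

Whether a given compiler reduces either form to one compare instruction depends on its version and flags; the two functions always return the same answer.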
* Re: kernel + gcc 4.1 = several problems
2007-01-07 4:25 ` Denis Vlasenko
@ 2007-01-07 4:45 ` Linus Torvalds
2007-01-07 5:26 ` Jeff Garzik
2007-01-07 15:10 ` Segher Boessenkool
1 sibling, 1 reply; 60+ messages in thread
From: Linus Torvalds @ 2007-01-07 4:45 UTC (permalink / raw)
To: Denis Vlasenko
Cc: Albert Cahalan, Segher Boessenkool, akpm, linux-kernel, s0348365, bunk, mikpe
On Sun, 7 Jan 2007, Denis Vlasenko wrote:
>
> I'd say "care about obvious, safe optimizations which we still do not do".
> I want this:
>
> char v[4];
> ...
> memcmp(v, "abcd", 4) == 0
>
> compile to single cmpl on i386.
Yeah. For a more relevant case, look at the hoops we used to jump through
to get "memcpy()" to generate ok code for trivial fixed-sized cases.
(That said, I think __builtin_memcpy() does a reasonable job these days
with gcc, and we might drop the crap one day when we can trust the
compiler to do ok. It didn't use to, and we continued using our
ridiculous macro/__builtin_constant_p misuses just because it works with
_all_ relevant gcc versions).
Linus
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems
2007-01-07 4:45 ` Linus Torvalds
@ 2007-01-07 5:26 ` Jeff Garzik
0 siblings, 0 replies; 60+ messages in thread
From: Jeff Garzik @ 2007-01-07 5:26 UTC (permalink / raw)
To: Linus Torvalds
Cc: Denis Vlasenko, Albert Cahalan, Segher Boessenkool, akpm, linux-kernel, s0348365, bunk, mikpe
Linus Torvalds wrote:
> (That said, I think __builtin_memcpy() does a reasonable job these days
> with gcc, and we might drop the crap one day when we can trust the
> compiler to do ok. It didn't use to, and we continued using our
> ridiculous macro/__builtin_constant_p misuses just because it works with
> _all_ relevant gcc versions).
Yep, a ton of work by Roger Sayle, among others, really matured the gcc
str*/mem* builtins in the 4.x series. They are definitely worth another
look.
Jeff
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems
2007-01-07 4:25 ` Denis Vlasenko
2007-01-07 4:45 ` Linus Torvalds
@ 2007-01-07 15:10 ` Segher Boessenkool
2007-01-26 22:05 ` Michael K. Edwards
1 sibling, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2007-01-07 15:10 UTC (permalink / raw)
To: Denis Vlasenko
Cc: akpm, Albert Cahalan, linux-kernel, s0348365, Linus Torvalds, bunk, mikpe
> I want this:
>
> char v[4];
> ...
> memcmp(v, "abcd", 4) == 0
>
> compile to single cmpl on i386. This (gcc 4.1.1) is ridiculous:
> call memcmp
i686-linux-gcc (GCC) 4.2.0 20060410 (experimental)
    movl $4, %ecx #, tmp65
    cld
    movl $v, %esi #, tmp63
    movl $.LC0, %edi #, tmp64
    repz cmpsb
    sete %al #, tmp68
Still not perfect, but better already. If you have any
specific examples that you'd like to have compiled to better
code, please report them in GCC bugzilla (with a self-contained
testcase, please).
Segher
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems
2007-01-07 15:10 ` Segher Boessenkool
@ 2007-01-26 22:05 ` Michael K. Edwards
0 siblings, 0 replies; 60+ messages in thread
From: Michael K. Edwards @ 2007-01-26 22:05 UTC (permalink / raw)
To: Segher Boessenkool
Cc: Denis Vlasenko, akpm, Albert Cahalan, linux-kernel, s0348365, Linus Torvalds, bunk, mikpe
ALSA + GCC 4.1.1 + -Os is known to be a bad combination on some arches;
see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27363 . (I tripped
over it on an ARM target, but my limited understanding of GCC
internals does not allow me to conclude that it is ARM-specific.) A
patch claiming to fix the bug was integrated into the 4.1 branch, but
my tests with a recent (20070115) gcc-4.1 snapshot indicate that it
has regressed again.
You might also check /proc/cpu/alignment; we have seen the alignment
fixup code trigger for alignment errors in both kernel and userspace.
The default appears to be to IGNORE alignment traps from userspace,
which results in bogus data and potentially a wacky series of system
calls, which could conceivably trigger an oops. I am told that echo 2
> /proc/cpu/alignment activates the kernel alignment fixup code, and
that 3 turns on some sort of logging in addition to the fixup (haven't
pursued this myself). No idea whether this is relevant to your CPU.
Cheers,
- Michael
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems
2007-01-04 17:04 ` Albert Cahalan
2007-01-04 17:24 ` Segher Boessenkool
2007-01-04 17:37 ` Linus Torvalds
@ 2007-01-04 18:08 ` Andreas Schwab
2 siblings, 0 replies; 60+ messages in thread
From: Andreas Schwab @ 2007-01-04 18:08 UTC (permalink / raw)
To: Albert Cahalan
Cc: Segher Boessenkool, akpm, linux-kernel, s0348365, bunk, mikpe, torvalds
"Albert Cahalan" <acahalan@gmail.com> writes:
> FYI, the kernel also assumes that a "char" is 8 bits.
> Maybe you should run away screaming.
You are confusing "undefined" with "implementation defined". Those are
two quite different concepts.
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 60+ messages in thread
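Editorial illustration of the distinction drawn in the message above. The width of char is implementation-defined: each implementation documents its choice in CHAR_BIT, so code that depends on 8-bit chars can state and check that assumption at build time. Signed overflow is undefined: there is no documented result to test for. The sketch assumes a C11 compiler for _Static_assert; char_is_octet() is a hypothetical name.

```c
#include <limits.h>

/* Implementation-defined behaviour can be checked: the build fails on
 * a target where char is not 8 bits wide (requires C11). */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit char");

/* Hypothetical helper: the same fact as a run-time query. */
int char_is_octet(void)
{
    return CHAR_BIT == 8;
}

/* By contrast, there is no macro or assertion that tells a program
 * what INT_MAX + 1 evaluates to: the standard places no requirement
 * on that expression at all. */
```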
* Re: kernel + gcc 4.1 = several problems
@ 2007-01-03 2:12 Mikael Pettersson
2007-01-03 2:20 ` Alistair John Strachan
` (2 more replies)
0 siblings, 3 replies; 60+ messages in thread
From: Mikael Pettersson @ 2007-01-03 2:12 UTC (permalink / raw)
To: s0348365, torvalds
Cc: 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang
On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
> > The suggestions I've had so far which I have not yet tried:
> >
> > - Select a different x86 CPU in the config.
> > - Unfortunately the C3-2 flags seem to simply tell GCC
> > to schedule for ppro (like i686) and enabled MMX and SSE
> > - Probably useless
>
> Actually, try this one. Try using something that doesn't like "cmov".
> Maybe the C3-2 simply has some internal cmov bugginess.
That's a good suggestion. Earlier C3s didn't have cmov so it's
not entirely unlikely that cmov in C3-2 is broken in some cases.
Configuring for P5MMX or 486 should be good safe alternatives.
/Mikael
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems
2007-01-03 2:12 Mikael Pettersson
@ 2007-01-03 2:20 ` Alistair John Strachan
2007-01-05 15:53 ` Alistair John Strachan
2007-01-03 5:55 ` Willy Tarreau
2007-01-03 10:29 ` Alan
2 siblings, 1 reply; 60+ messages in thread
From: Alistair John Strachan @ 2007-01-03 2:20 UTC (permalink / raw)
To: Mikael Pettersson
Cc: torvalds, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang
On Wednesday 03 January 2007 02:12, Mikael Pettersson wrote:
> On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
> > > The suggestions I've had so far which I have not yet tried:
> > >
> > > - Select a different x86 CPU in the config.
> > > - Unfortunately the C3-2 flags seem to simply tell GCC
> > > to schedule for ppro (like i686) and enabled MMX and SSE
> > > - Probably useless
> >
> > Actually, try this one. Try using something that doesn't like "cmov".
> > Maybe the C3-2 simply has some internal cmov bugginess.
>
> That's a good suggestion. Earlier C3s didn't have cmov so it's
> not entirely unlikely that cmov in C3-2 is broken in some cases.
> Configuring for P5MMX or 486 should be good safe alternatives.
Or just C3 (not C3-2), which is what I've done.
I'll report back whether it crashes or not.
--
Cheers,
Alistair.
Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems
2007-01-03 2:20 ` Alistair John Strachan
@ 2007-01-05 15:53 ` Alistair John Strachan
2007-01-05 16:02 ` Linus Torvalds
0 siblings, 1 reply; 60+ messages in thread
From: Alistair John Strachan @ 2007-01-05 15:53 UTC (permalink / raw)
To: Mikael Pettersson
Cc: torvalds, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang
On Wednesday 03 January 2007 02:20, Alistair John Strachan wrote:
> On Wednesday 03 January 2007 02:12, Mikael Pettersson wrote:
> > On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
> > > > The suggestions I've had so far which I have not yet tried:
> > > >
> > > > - Select a different x86 CPU in the config.
> > > > - Unfortunately the C3-2 flags seem to simply tell GCC
> > > > to schedule for ppro (like i686) and enabled MMX and SSE
> > > > - Probably useless
> > >
> > > Actually, try this one. Try using something that doesn't like "cmov".
> > > Maybe the C3-2 simply has some internal cmov bugginess.
> >
> > That's a good suggestion. Earlier C3s didn't have cmov so it's
> > not entirely unlikely that cmov in C3-2 is broken in some cases.
> > Configuring for P5MMX or 486 should be good safe alternatives.
>
> Or just C3 (not C3-2), which is what I've done.
>
> I'll report back whether it crashes or not.
This didn't help. After about 14 hours, the machine crashed again.
cmov is not the culprit.
--
Cheers,
Alistair.
Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-05 15:53 ` Alistair John Strachan @ 2007-01-05 16:02 ` Linus Torvalds 2007-01-05 16:19 ` Alistair John Strachan 0 siblings, 1 reply; 60+ messages in thread From: Linus Torvalds @ 2007-01-05 16:02 UTC (permalink / raw) To: Alistair John Strachan Cc: Mikael Pettersson, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang On Fri, 5 Jan 2007, Alistair John Strachan wrote: > > This didn't help. After about 14 hours, the machine crashed again. > > cmov is not the culprit. Ok. Have you ever tried to limit the drivers you have loaded? I notice you had the prism54 wireless thing in your modules list and the vt1211 hw monitoring thing. I'm wondering about the vt1211 thing - it probably isn't too common. But if you can use that machine without the wireless too, it might be good to try without either. (The rest of your module list looked bog-standard, so if it's not hardware-specific, I don't think it's there) Turning off the VIA sound driver just in case would be good too. The reason I mention vt1211 in particular is that it does things like regulate fan activity etc. Is the problem perhaps heat-related? Linus ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-05 16:02 ` Linus Torvalds @ 2007-01-05 16:19 ` Alistair John Strachan 2007-01-05 16:49 ` Linus Torvalds 0 siblings, 1 reply; 60+ messages in thread From: Alistair John Strachan @ 2007-01-05 16:19 UTC (permalink / raw) To: Linus Torvalds Cc: Mikael Pettersson, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang On Friday 05 January 2007 16:02, Linus Torvalds wrote: > On Fri, 5 Jan 2007, Alistair John Strachan wrote: > > This didn't help. After about 14 hours, the machine crashed again. > > > > cmov is not the culprit. > > Ok. Have you ever tried to limit the drivers you have loaded? I notice you > had the prism54 wireless thing in your modules list and the vt1211 hw > monitoring thing. I'm wondering about the vt1211 thing - it probably isn't > too common. Sure, and it only got added to 2.6.19 anyway (however GCC 3.4.6 really does seem to have no problem with it). > But if you can use that machine without the wireless too, it > might be good to try without either. Required, plus I've been running prism54 on three different machines with a huge number of compilers since the early 2.6 days with no problems. > (The rest of your module list looked bog-standard, so if it's not > hardware-specific, I don't think it's there) Agreed, the config is already _very_ minimal for this machine. > Turning of the VIA sound driver just in case would be good too. I'm not even really sure why that's enabled. I can do that. > The reason I mention vt1211 in particular is that it does things like > regulate fan activity etc. Is the problem perhaps heat-related? It definitely isn't heat related. This CPU puts out 7-10W, has a ridiculous 5000 RPM fan on it (that works) and the temp never exceeds 40C. If anything, the -O2, 3.4.6 kernel with CMOV ran the chip a little hotter. As far as I can see, all the other components are either cool to touch or have stupidly big heatsinks on them. 
(I realise with problems like these it's almost always some sort of obscure hardware problem, but I find that very difficult to believe when I can toggle from 3 years of stability to 6-18 hours crashing by switching compiler. I've also run extensive stability test programs on the hardware with absolutely no negative results.) -- Cheers, Alistair. Final year Computer Science undergraduate. 1F2 55 South Clerk Street, Edinburgh, UK. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-05 16:19 ` Alistair John Strachan @ 2007-01-05 16:49 ` Linus Torvalds 2007-01-07 0:36 ` Pavel Machek 0 siblings, 1 reply; 60+ messages in thread From: Linus Torvalds @ 2007-01-05 16:49 UTC (permalink / raw) To: Alistair John Strachan Cc: Mikael Pettersson, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang On Fri, 5 Jan 2007, Alistair John Strachan wrote: > > (I realise with problems like these it's almost always some sort of obscure > hardware problem, but I find that very difficult to believe when I can toggle > from 3 years of stability to 6-18 hours crashing by switching compiler. I've > also ran extensive stability test programs on the hardware with absolutely no > negative results.) The thing is, I agree with you - it does seem to be compiler-related. But at the same time, I'm almost positive that it's not in "pipe_poll()" itself, because that function is just too simple, and looking at the assembly code, I don't see how what you describe could happen in THAT function. HOWEVER. I can easily see an NMI coming in, or another interrupt, or something, and that one corrupting the stack under it because of a compiler bug (or a kernel bug that just needs a specific compiler to trigger). For example, we've had problems before with the compiler thinking it owns the stack frame for an "asmlinkage" function, and us having no way to tell the compiler to keep its hands off - so the compiler ended up touching registers that were actually in the "save area" of the interrupt or system call, and then returning with corrupted state. Here's a stupid patch. It just adds more debugging to the oops message, and shows all the code pointers it can find on the WHOLE stack. It also makes the raw stack dumping print out as much of the stack contents _under_ the stack pointer as it does above it too. 
However, this patch is mostly useless if you have a separate stack for IRQ's (since if that happens, any interrupt will be taken on a different stack which we don't see any more), so you should NOT enable the 4KSTACKS config option if you try this out.

I'm not sure how enlightening any of the output might be, but it is probably worth trying.

Linus
---
diff --git a/arch/i386/kernel/traps.c b/arch/i386/kernel/traps.c
index 0efad8a..2359eed 100644
--- a/arch/i386/kernel/traps.c
+++ b/arch/i386/kernel/traps.c
@@ -243,6 +243,20 @@ void show_trace(struct task_struct *task, struct pt_regs *regs,
 	show_trace_log_lvl(task, regs, stack, "");
 }
 
+static void show_all_stack_addresses(unsigned long *esp)
+{
+	struct thread_info *tinfo = (void *) ((unsigned long)esp & (~(THREAD_SIZE - 1)));
+	unsigned long *stack = (unsigned long *)(tinfo+1);
+
+	printk("All stack code pointers:\n");
+	while (valid_stack_ptr(tinfo, stack)) {
+		unsigned long addr = *stack++;
+		if (__kernel_text_address(addr))
+			print_symbol(" %s", addr);
+	}
+	printk("\n");
+}
+
 static void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			       unsigned long *esp, char *log_lvl)
 {
@@ -256,8 +270,10 @@ static void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 		esp = (unsigned long *)&esp;
 	}
 
+	show_all_stack_addresses(esp);
 	stack = esp;
-	for(i = 0; i < kstack_depth_to_print; i++) {
+	stack -= kstack_depth_to_print;
+	for(i = 0; i < 2*kstack_depth_to_print; i++) {
 		if (kstack_end(stack))
 			break;
 		if (i && ((i % 8) == 0))

^ permalink raw reply related [flat|nested] 60+ messages in thread
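[Editorial sketch] The idea behind show_all_stack_addresses() in the patch above can be tried outside the kernel: walk every word of a stack region and keep the values that fall inside the text section. The block below is only a user-space illustration under stated assumptions. The names text_range, looks_like_code and scan_stack_words are hypothetical, and the range check is a stand-in for the kernel's __kernel_text_address(), not a copy of it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel text section: the half-open
 * address range [start, end). */
struct text_range {
    uintptr_t start;
    uintptr_t end;
};

/* Assumed replacement for __kernel_text_address(): a plain range check. */
static int looks_like_code(const struct text_range *text, uintptr_t value)
{
    return value >= text->start && value < text->end;
}

/* Walk every word of a stack region and copy out the values that look
 * like code addresses. Returns how many were found in total; at most
 * max_out of them are stored in out[]. */
static size_t scan_stack_words(const uintptr_t *stack, size_t nwords,
                               const struct text_range *text,
                               uintptr_t *out, size_t max_out)
{
    size_t found = 0;

    for (size_t i = 0; i < nwords; i++) {
        if (!looks_like_code(text, stack[i]))
            continue;
        if (found < max_out)
            out[found] = stack[i];
        found++;
    }
    return found;
}
```

As in the patch, the scan reports stale return addresses left behind by earlier calls as well as live ones, which is the point: it shows what ran on that stack recently, not only the current call chain.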
* Re: kernel + gcc 4.1 = several problems 2007-01-05 16:49 ` Linus Torvalds @ 2007-01-07 0:36 ` Pavel Machek 2007-01-07 0:57 ` Alistair John Strachan 0 siblings, 1 reply; 60+ messages in thread From: Pavel Machek @ 2007-01-07 0:36 UTC (permalink / raw) To: Linus Torvalds Cc: Alistair John Strachan, Mikael Pettersson, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang Hi! > > (I realise with problems like these it's almost always some sort of obscure > > hardware problem, but I find that very difficult to believe when I can toggle > > from 3 years of stability to 6-18 hours crashing by switching compiler. I've > > also ran extensive stability test programs on the hardware with absolutely no > > negative results.) > > The thing is, I agree with you - it does seem to be compiler-related. But > at the same time, I'm almost positive that it's not in "pipe_poll()" > itself, because that function is just too simple, and looking at the > assembly code, I don't see how what you describe could happen in THAT > function. > > HOWEVER. > > I can easily see an NMI coming in, or another interrupt, or something, and > that one corrupting the stack under it because of a compiler bug (or a > kernel bug that just needs a specific compiler to trigger). For example, > we've had problems before with the compiler thinking it owns the stack > frame for an "asmlinkage" function, and us having no way to tell the > compiler to keep its hands off - so the compiler ended up touching > registers that were actually in the "save area" of the interrupt or system > call, and then returning with corrupted state. > > Here's a stupid patch. It just adds more debugging to the oops message, > and shows all the code pointers it can find on the WHOLE stack. > > It also makes the raw stack dumping print out as much of the stack > contents _under_ the stack pointer as it does above it too. 
> > However, this patch is mostly useless if you have a separate stack for > IRQ's (since if that happens, any interrupt will be taken on a different > stack which we don't see any more), so you should NOT enable the 4KSTACKS > config option if you try this out. stupid idea... perhaps gcc-4.1 generates bigger stackframe somewhere, and stack overflows? that hw monitoring thingie... I'd turn it off. Its interactions with acpi are non-trivial and dangerous. Pavel -- Thanks for all the (sleeping) penguins. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-07 0:36 ` Pavel Machek @ 2007-01-07 0:57 ` Alistair John Strachan 0 siblings, 0 replies; 60+ messages in thread From: Alistair John Strachan @ 2007-01-07 0:57 UTC (permalink / raw) To: Pavel Machek Cc: Linus Torvalds, Mikael Pettersson, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang On Sunday 07 January 2007 00:36, Pavel Machek wrote: [snip] > > However, this patch is mostly useless if you have a separate stack for > > IRQ's (since if that happens, any interrupt will be taken on a different > > stack which we don't see any more), so you should NOT enable the 4KSTACKS > > config option if you try this out. > > stupid idea... perhaps gcc-4.1 generates bigger stackframe somewhere, > and stack overflows? The primary reason it's not 4KSTACKS already is that I run multiple XFS partitions on top of an md RAID 1. LVM isn't involved, however, and I'm not using any other filesystem overlays like dm. I'm fairly sceptical that it's a stack overflow, but I'll be sure to enable the debugging option on the next try. > that hw monitoring thingie... I'd turn it off. Its interactions with > acpi are non-trivial and dangerous. Well, GCC 3.4 kernels seem to run fine with it, but as I said to Linus I'll be sure to turn this and the sound drivers off in the next build. -- Cheers, Alistair. Final year Computer Science undergraduate. 1F2 55 South Clerk Street, Edinburgh, UK. ^ permalink raw reply [flat|nested] 60+ messages in thread
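[Editorial sketch] Pavel's stack-overflow theory relies on the same alignment trick the patch uses: kernel stacks are THREAD_SIZE-aligned, so masking the stack pointer gives both the base of the stack area and the number of bytes still free below it. The block below is a minimal sketch of that arithmetic, assuming 8 KiB stacks (4KSTACKS disabled). The function names and the reserved and warn_margin parameters are hypothetical, and it is not the kernel's actual overflow check.

```c
#include <stdint.h>

/* Assumed stack size: 8 KiB, i.e. the 4KSTACKS option is disabled. */
#define THREAD_SIZE 8192u

/* Base of the stack area that contains sp. This works only because
 * every stack area is aligned to THREAD_SIZE. */
static uintptr_t stack_base(uintptr_t sp)
{
    return sp & ~(uintptr_t)(THREAD_SIZE - 1);
}

/* Bytes still free below sp (the stack grows down). "reserved" is the
 * space at the bottom of the area taken by bookkeeping data, which in
 * the kernel is struct thread_info. */
static uintptr_t stack_bytes_left(uintptr_t sp, uintptr_t reserved)
{
    uintptr_t offset = sp & (THREAD_SIZE - 1);

    return offset > reserved ? offset - reserved : 0;
}

/* Nonzero when fewer than warn_margin bytes remain. */
static int stack_nearly_full(uintptr_t sp, uintptr_t reserved,
                             uintptr_t warn_margin)
{
    return stack_bytes_left(sp, reserved) < warn_margin;
}
```

A check like this only catches an overflow if it runs while the stack is deep, so a larger frame generated by a different compiler could still slip past it between checks.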
* Re: kernel + gcc 4.1 = several problems 2007-01-03 2:12 Mikael Pettersson 2007-01-03 2:20 ` Alistair John Strachan @ 2007-01-03 5:55 ` Willy Tarreau 2007-01-03 10:29 ` Alan 2 siblings, 0 replies; 60+ messages in thread From: Willy Tarreau @ 2007-01-03 5:55 UTC (permalink / raw) To: Mikael Pettersson Cc: s0348365, torvalds, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang On Wed, Jan 03, 2007 at 03:12:13AM +0100, Mikael Pettersson wrote: > On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote: > > > The suggestions I've had so far which I have not yet tried: > > > > > > - Select a different x86 CPU in the config. > > > - Unfortunately the C3-2 flags seem to simply tell GCC > > > to schedule for ppro (like i686) and enabled MMX and SSE > > > - Probably useless > > > > Actually, try this one. Try using something that doesn't like "cmov". > > Maybe the C3-2 simply has some internal cmov bugginess. > > That's a good suggestion. Earlier C3s didn't have cmov so it's > not entirely unlikely that cmov in C3-2 is broken in some cases. Agreed! When I developed the cmov emulator, I used an early C3 for the tests (well, a "Samuel2" to be precise), because it did not report "cmov" in its flags. I first thought "wow, my emulator is amazingly fast!" because it took something like 50 cycles to do cmovne %eax,%ebx. Then I realized that this processor performed cmov itself between registers, and only triggered the invalid opcode when one of the operands was a memory reference. And this time, for a hard-coded instruction, it was really slow... For this reason, I would not be surprised at all that there would be some buggy behaviour in the cmov right there. Maybe a bug in the decoder unit making it skip a byte when the next instruction in the prefetch queue is a cmov affecting the same registers... When vendors can do dirty things such as executing unsupported instructions, we can expect anything from them. > Configuring for P5MMX or 486 should be good safe alternatives. 
I generally use the P5MMX target for such processors. > /Mikael Regards, Willy ^ permalink raw reply [flat|nested] 60+ messages in thread
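[Editorial sketch] Willy's emulator is not shown in this thread, so the block below is not his code. It is a minimal illustration of what emulating the register-to-register form he mentions (cmovne %eax,%ebx, encoded as 0F 45 D8) could look like inside an invalid-opcode handler. The cpu_state structure and both function names are hypothetical. The sketch assumes 32-bit operands, ignores prefixes, and refuses memory operands, which is the case the early C3 actually trapped on.

```c
#include <stdint.h>

/* EFLAGS bits consulted by the x86 condition codes. */
#define FLAG_CF (1u << 0)
#define FLAG_PF (1u << 2)
#define FLAG_ZF (1u << 6)
#define FLAG_SF (1u << 7)
#define FLAG_OF (1u << 11)

/* Hypothetical saved register state. regs[] is indexed by the x86
 * register number: 0 = eax, 1 = ecx, 2 = edx, 3 = ebx, ... */
struct cpu_state {
    uint32_t regs[8];
    uint32_t eflags;
    uint32_t eip;
};

/* Evaluate one of the 16 x86 condition codes. The low bit of cc
 * negates the test selected by the upper three bits. */
static int condition_holds(unsigned cc, uint32_t eflags)
{
    int cf = (eflags & FLAG_CF) != 0;
    int pf = (eflags & FLAG_PF) != 0;
    int zf = (eflags & FLAG_ZF) != 0;
    int sf = (eflags & FLAG_SF) != 0;
    int of = (eflags & FLAG_OF) != 0;
    int result;

    switch (cc >> 1) {
    case 0:  result = of; break;                /* O  / NO */
    case 1:  result = cf; break;                /* B  / AE */
    case 2:  result = zf; break;                /* E  / NE */
    case 3:  result = cf || zf; break;          /* BE / A  */
    case 4:  result = sf; break;                /* S  / NS */
    case 5:  result = pf; break;                /* P  / NP */
    case 6:  result = sf != of; break;          /* L  / GE */
    default: result = zf || (sf != of); break;  /* LE / G  */
    }
    return (cc & 1) ? !result : result;
}

/* Emulate "0F 4x /r" with a register source operand. Returns the number
 * of instruction bytes consumed (3), or 0 if the bytes are not a
 * register-to-register CMOVcc and must be handled elsewhere. */
static int emulate_cmov_reg(struct cpu_state *cpu, const uint8_t *insn)
{
    if (insn[0] != 0x0f || (insn[1] & 0xf0) != 0x40)
        return 0;

    uint8_t modrm = insn[2];
    if ((modrm >> 6) != 3)
        return 0;   /* memory operand: not handled in this sketch */

    unsigned dst = (modrm >> 3) & 7;    /* ModRM.reg is the destination */
    unsigned src = modrm & 7;           /* ModRM.rm is the source */

    if (condition_holds(insn[1] & 0x0f, cpu->eflags))
        cpu->regs[dst] = cpu->regs[src];
    cpu->eip += 3;
    return 3;
}
```

A real handler would also have to decode the memory addressing modes and account for the cost of the trap itself, which is why a trapped cmov is far slower than the 50 cycles Willy first measured.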
* Re: kernel + gcc 4.1 = several problems 2007-01-03 2:12 Mikael Pettersson 2007-01-03 2:20 ` Alistair John Strachan 2007-01-03 5:55 ` Willy Tarreau @ 2007-01-03 10:29 ` Alan 2007-01-03 10:32 ` Grzegorz Kulewski 2 siblings, 1 reply; 60+ messages in thread From: Alan @ 2007-01-03 10:29 UTC (permalink / raw) To: Mikael Pettersson Cc: s0348365, torvalds, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang > That's a good suggestion. Earlier C3s didn't have cmov so it's > not entirely unlikely that cmov in C3-2 is broken in some cases. > Configuring for P5MMX or 486 should be good safe alternatives. The proper fix for all of this mess is to fix the gcc compiler suite to actually generate i686 code when told to use i686. CMOV is an optional i686 extension which gcc uses without checking. In early PIV days it made sense but on modern processors CMOV is so pointless the bug should be fixed. At that point an i686 kernel would contain i686 instructions and actually run on all i686 processors ending all the i586 pain for most users and distributions. Unfortunately the compiler people don't appear to care about their years old bug. Alan ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-03 10:29 ` Alan @ 2007-01-03 10:32 ` Grzegorz Kulewski 2007-01-03 11:51 ` Jeff Garzik ` (2 more replies) 0 siblings, 3 replies; 60+ messages in thread From: Grzegorz Kulewski @ 2007-01-03 10:32 UTC (permalink / raw) To: Alan Cc: Mikael Pettersson, s0348365, torvalds, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang On Wed, 3 Jan 2007, Alan wrote: > The proper fix for all of this mess is to fix the gcc compiler suite to > actually generate i686 code when told to use i686. CMOV is an optional > i686 extension which gcc uses without checking. In early PIV days it made > sense but on modern processors CMOV is so pointless the bug should be > fixed. At that point an i686 kernel would contain i686 instructions and > actually run on all i686 processors ending all the i586 pain for most > users and distributions. Could you explain why CMOV is pointless now? Are there any benchmarks proving that? Thanks, Grzegorz Kulewski ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-03 10:32 ` Grzegorz Kulewski @ 2007-01-03 11:51 ` Jeff Garzik 2007-01-03 12:44 ` Alan 2007-01-03 16:03 ` Linus Torvalds 2 siblings, 0 replies; 60+ messages in thread From: Jeff Garzik @ 2007-01-03 11:51 UTC (permalink / raw) To: Grzegorz Kulewski Cc: Alan, Mikael Pettersson, s0348365, torvalds, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang Grzegorz Kulewski wrote: > On Wed, 3 Jan 2007, Alan wrote: >> The proper fix for all of this mess is to fix the gcc compiler suite to >> actually generate i686 code when told to use i686. CMOV is an optional >> i686 extension which gcc uses without checking. In early PIV days it made >> sense but on modern processors CMOV is so pointless the bug should be >> fixed. At that point an i686 kernel would contain i686 instructions and >> actually run on all i686 processors ending all the i586 pain for most >> users and distributions. > > Could you explain why CMOV is pointless now? Are there any benchmarks > proving that? In theory modern processors should have no trouble converting a test/move sequence into the same uops generated by a cmov instruction, for one. Jeff ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-03 10:32 ` Grzegorz Kulewski 2007-01-03 11:51 ` Jeff Garzik @ 2007-01-03 12:44 ` Alan 2007-01-03 13:32 ` Arjan van de Ven 2007-01-03 16:03 ` Linus Torvalds 2 siblings, 1 reply; 60+ messages in thread From: Alan @ 2007-01-03 12:44 UTC (permalink / raw) To: Grzegorz Kulewski Cc: Mikael Pettersson, s0348365, torvalds, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang > > fixed. At that point an i686 kernel would contain i686 instructions and > > actually run on all i686 processors ending all the i586 pain for most > > users and distributions. > > Could you explain why CMOV is pointless now? Are there any benchmarks > proving that? Take a look at the recent ffmpeg bits on the mplayer list for one example I have to hand - P4 cmov is pretty slow. The crypto folks find the same things. Alan ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-03 12:44 ` Alan @ 2007-01-03 13:32 ` Arjan van de Ven 2007-01-03 13:58 ` Jakub Jelinek 2007-01-03 14:28 ` Alan 0 siblings, 2 replies; 60+ messages in thread From: Arjan van de Ven @ 2007-01-03 13:32 UTC (permalink / raw) To: Alan Cc: Grzegorz Kulewski, Mikael Pettersson, s0348365, torvalds, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang On Wed, 2007-01-03 at 12:44 +0000, Alan wrote: > > > fixed. At that point an i686 kernel would contain i686 instructions and > > > actually run on all i686 processors ending all the i586 pain for most > > > users and distributions. > > > > Could you explain why CMOV is pointless now? Are there any benchmarks > > proving that? > > Take a look at the recent ffmpeg bits on the mplayer list for one example > I have to hand - P4 cmov is pretty slow. The crypto folks find the same > things. cmov is effectively the same cost as a compare and jump, in both cases the cpu needs to do a prediction, and on a mispredict, restart. the reason cmov can make sense is because it's smaller code... -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-03 13:32 ` Arjan van de Ven @ 2007-01-03 13:58 ` Jakub Jelinek 2007-01-03 14:28 ` Alan 1 sibling, 0 replies; 60+ messages in thread From: Jakub Jelinek @ 2007-01-03 13:58 UTC (permalink / raw) To: Arjan van de Ven Cc: Alan, Grzegorz Kulewski, Mikael Pettersson, s0348365, torvalds, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang On Wed, Jan 03, 2007 at 05:32:16AM -0800, Arjan van de Ven wrote: > On Wed, 2007-01-03 at 12:44 +0000, Alan wrote: > > > > fixed. At that point an i686 kernel would contain i686 instructions and > > > > actually run on all i686 processors ending all the i586 pain for most > > > > users and distributions. > > > > > > Could you explain why CMOV is pointless now? Are there any benchmarks > > > proving that? > > > > Take a look at the recent ffmpeg bits on the mplayer list for one example > > I have to hand - P4 cmov is pretty slow. The crypto folks find the same > > things. > > cmov is effectively the same cost as a compare and jump, in both cases > the cpu needs to do a prediction, and on a mispredict, restart. > > the reason cmov can make sense is because it's smaller code... BTW, from GCC POV availability of CMOV is the only difference between -march=i586 -mtune=something and -march=i686 -mtune=something. So this is just a naming thing, it could be called -march=i686cmov to make it more obvious but it is too late (and too unimportant) to change it now. Perhaps adding a note to info gcc/man gcc ought to be enough? If you don't want CMOV being emitted, compile with -march=i586 -mtune=generic (or whatever other tuning you pick up), with -march=i686 -mtune=generic you tell GCC you have CMOV. Whether CMOV is actually used in generated code is another matter, which should be decided based on the selected -mtune. 
For -Os, CMOV should be used whenever available, as that usually means smaller code. Otherwise, if on some particular chip CMOV is actually slower than compare, jump and assignment, then CMOV should not be selected for that particular tuning (say, if Pentium4 has slower CMOV than compare+jump+assignment, -mtune=pentium4 should not emit CMOV, at least not often). If you have examples of that, please file a bug at http://gcc.gnu.org/bugzilla/. -mtune=generic should emit or not emit CMOV depending on whether it is a win on the currently common CPUs. Jakub ^ permalink raw reply [flat|nested] 60+ messages in thread
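[Editorial sketch] Jakub's point about -march versus -mtune can be checked on a one-line function. The block below is a hypothetical test case, not something from the thread: a conditional assignment that a compiler may turn into a CMOV when the selected -march allows it. Whether a given GCC release actually emits CMOV here depends on its version, the optimization level and the -mtune in effect, so the comments describe what to look for rather than a guaranteed result.

```c
/* Candidate for if-conversion: the result is one of two values that are
 * both already available, chosen by a comparison.
 *
 * To inspect the generated code on a 32-bit x86 target, compare the
 * assembly produced by, for example:
 *
 *     gcc -m32 -O2 -march=i586 -S clamp.c
 *     gcc -m32 -O2 -march=i686 -S clamp.c
 *
 * With -march=i586 the compiler must not emit cmov at all; with
 * -march=i686 it is allowed to, and the -mtune choice decides whether
 * it considers that worthwhile. (The file name clamp.c is arbitrary.) */
int clamp_to_limit(int value, int limit)
{
    return value > limit ? limit : value;
}
```

Searching the two .s files for "cmov" is enough to see which decision the compiler made for that combination of flags.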
* Re: kernel + gcc 4.1 = several problems 2007-01-03 13:32 ` Arjan van de Ven 2007-01-03 13:58 ` Jakub Jelinek @ 2007-01-03 14:28 ` Alan 2007-01-03 16:06 ` Linus Torvalds 1 sibling, 1 reply; 60+ messages in thread From: Alan @ 2007-01-03 14:28 UTC (permalink / raw) To: Arjan van de Ven Cc: Grzegorz Kulewski, Mikael Pettersson, s0348365, torvalds, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang > cmov is effectively the same cost as a compare and jump, in both cases > the cpu needs to do a prediction, and on a mispredict, restart. On a P4 it appears to be slower than compare/jump in most cases ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-03 14:28 ` Alan @ 2007-01-03 16:06 ` Linus Torvalds 0 siblings, 0 replies; 60+ messages in thread From: Linus Torvalds @ 2007-01-03 16:06 UTC (permalink / raw) To: Alan Cc: Arjan van de Ven, Grzegorz Kulewski, Mikael Pettersson, s0348365, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang On Wed, 3 Jan 2007, Alan wrote: > > > cmov is effectively the same cost as a compare and jump, in both cases > > the cpu needs to do a prediction, and on a mispredict, restart. > > On a P4 it appears to be slower than compare/jump in most cases On just about EVERYTHING it's slower than compare/jump. See my other post on why, together with a (largely untested) test app. Linus ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-03 10:32 ` Grzegorz Kulewski 2007-01-03 11:51 ` Jeff Garzik 2007-01-03 12:44 ` Alan @ 2007-01-03 16:03 ` Linus Torvalds 2007-01-03 17:01 ` l.genoni ` (5 more replies) 2 siblings, 6 replies; 60+ messages in thread From: Linus Torvalds @ 2007-01-03 16:03 UTC (permalink / raw) To: Grzegorz Kulewski Cc: Alan, Mikael Pettersson, s0348365, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4765 bytes --]

On Wed, 3 Jan 2007, Grzegorz Kulewski wrote:
>
> Could you explain why CMOV is pointless now? Are there any benchmarks proving
> that?

CMOV (and, more generically, any "predicated instruction") tends to generally be a bad idea on an aggressively out-of-order CPU. It doesn't always have to be horrible, but in practice it is seldom very nice, and (as usual) on the P4 it can be really quite bad.

On a P4, I think a cmov basically takes 10 cycles.

But even ignoring the usual P4 "I suck at things that aren't totally normal", cmov is actually not a great idea. You can always replace it by

	j<negated condition> forward
	mov ..., %reg
forward:

and assuming the branch is AT ALL predictable (and 95+% of all branches are), the branch-over will actually be a LOT better for a CPU.

Why? Because branches can be predicted, and when they are predicted they basically go away. They go away on many levels, too. Not just the branch itself, but the _conditional_ for the branch goes away as far as the critical path of code is concerned: the CPU still has to calculate it and check it, but from a performance angle it "doesn't exist any more", because it's not holding anything else up (well, you want to do it in _some_ reasonable time, but the point stands..)

Similarly, whichever side of the branch wasn't taken goes away. Again, in an out-of-order machine with register renaming, this means that even if the branch isn't taken above, and you end up executing all the non-branch instructions, because you now UNCONDITIONALLY over-write the register, the old data in the register is now DEAD, so now all the OTHER writes to that register are off the critical path too!

So the end result is that with a conditional branch, on a good CPU, the _only_ part of the code that is actually performance-sensitive is the actual calculation of the value that gets used!

In contrast, if you use a predicated instruction, ALL of it is on the critical path. Calculating the conditional is on the critical path. Calculating the value that gets used is obviously ALSO on the critical path, but so is the calculation for the value that DOESN'T get used too. So the cmov - rather than speeding things up - actually slows things down, because it makes more code be dependent on each other.

So here's the basic rule:

 - cmov is sometimes nice for code density. It's not a big win, but it
   certainly can be a win.

 - if you KNOW the branch is totally unpredictable, cmov is often good for
   performance. But a compiler almost never knows that, and even if you
   train it with input data and profiling, remember that not very many
   branches _are_ totally unpredictable, so even if you were to know that
   something is unpredictable, it's going to be very rare.

 - on a P4, branch mispredictions are expensive, but so is cmov, so all
   the above is to some degree exaggerated. On nicer microarchitectures
   (the Intel Core 2 in particular is something I have to say is very nice
   indeed), the difference will be a lot less noticeable. The loss from
   cmov isn't very big (it's not as sucky as P4), but neither is the win
   (branch misprediction isn't that expensive either).

Here's an example program that you can test and time yourself.

On my Core 2, I get

	[torvalds@woody ~]$ gcc -DCMOV -Wall -O2 t.c
	[torvalds@woody ~]$ time ./a.out
	600000000

	real	0m0.194s
	user	0m0.192s
	sys	0m0.000s

	[torvalds@woody ~]$ gcc -Wall -O2 t.c
	[torvalds@woody ~]$ time ./a.out
	600000000

	real	0m0.167s
	user	0m0.168s
	sys	0m0.000s

ie the cmov is quite a bit slower. Maybe I did something wrong. But note how cmov not only is slower, it's fundamentally more limited too (ie the branch-over can actually do a lot of things cmov simply cannot do).

So don't use cmov. Except for non-performance-critical code, or if you really care about code-size, and it helps (which is actually fairly rare: quite often cmov isn't even smaller than a conditional jump and a regular move, partly because a regular move can take arguments that a cmov cannot: move to memory, move from an immediate etc etc, so depending on what you're moving, cmov simply isn't good even if it's _just_ a move).

(For me, the "cmov" version of the function ends up being three bytes shorter. So it's actually a good example of everything above)

		Linus

(*) x86 only has "move to register" as a predicated instruction, but some other architectures have lots of them, potentially all instructions. I don't count conditional branches as "predicated", although some crazy people do. ARM has predicated instructions (but they are gone in Thumb, I think), and ia64 obviously has predicated instructions (but it will be gone in a few years ;)

[-- Attachment #2: Type: TEXT/PLAIN, Size: 806 bytes --]

#include <stdio.h>

/* How many iterations? */
#define ITERATIONS (100000000)

/* Which bit of the counter to test? */
#define BIT 1

#ifdef CMOV

#define choose(i, a, b) ({                      \
    unsigned long result;                       \
    asm("testl %1,%2 ; cmovne %3,%0"            \
        :"=r" (result)                          \
        :"i" (BIT),                             \
         "g" (i),                               \
         "rm" (a),                              \
         "0" (b));                              \
    result; })

#else

#define choose(i, a, b) ({                      \
    unsigned long result;                       \
    asm("testl %1,%2 ; je 1f ; mov %3,%0\n1:"   \
        :"=r" (result)                          \
        :"i" (BIT),                             \
         "g" (i),                               \
         "g" (a),                               \
         "0" (b));                              \
    result; })

#endif

int main(int argc, char **argv)
{
    int i;
    unsigned long sum = 0;

    for (i = 0; i < ITERATIONS; i++) {
        unsigned long a = 5, b = 7;
        sum += choose(i, a, b);
    }
    printf("%lu\n", sum);
    return 0;
}

^ permalink raw reply [flat|nested] 60+ messages in thread
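[Editorial sketch] The attachment above depends on x86 inline assembly. For readers without such a machine, the block below restates the same choice in portable C, as an illustration of the dependency argument only. The function names are hypothetical. choose_branch mirrors the branch-over form (start with b, overwrite with a if the bit is set), and choose_select is a branch-free form whose result always depends on the condition and on both inputs, which is the property that puts a cmov on the critical path. A compiler is free to turn either function into either instruction sequence, so this does not reproduce the timings in the mail.

```c
/* Which bit of the counter to test, as in the attachment. */
#define BIT 1u

/* Branch-over form: the result starts as b and is overwritten with a
 * only when the tested bit is set. */
static unsigned long choose_branch(unsigned long i, unsigned long a,
                                   unsigned long b)
{
    unsigned long result = b;

    if (i & BIT)
        result = a;
    return result;
}

/* Branch-free form: mask is all ones when the bit is set and zero
 * otherwise, so the result is always computed from i, a and b. */
static unsigned long choose_select(unsigned long i, unsigned long a,
                                   unsigned long b)
{
    unsigned long mask = 0UL - (unsigned long)((i & BIT) != 0);

    return b ^ ((a ^ b) & mask);
}

/* Same loop as the attachment, with the iteration count as a parameter:
 * odd counters contribute 5, even counters contribute 7. */
static unsigned long sum_choices(unsigned long iterations, int use_select)
{
    unsigned long sum = 0;

    for (unsigned long i = 0; i < iterations; i++)
        sum += use_select ? choose_select(i, 5, 7)
                          : choose_branch(i, 5, 7);
    return sum;
}
```

Note that testing bit 0 of a loop counter gives a strictly alternating pattern, which branch predictors handle well. Under the "basic rule" in the mail, that is the case where the branch form should win; a benchmark fed unpredictable conditions could look quite different.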
* Re: kernel + gcc 4.1 = several problems 2007-01-03 16:03 ` Linus Torvalds @ 2007-01-03 17:01 ` l.genoni 2007-01-03 17:45 ` Tim Schmielau 2007-01-03 17:06 ` l.genoni ` (4 subsequent siblings) 5 siblings, 1 reply; 60+ messages in thread From: l.genoni @ 2007-01-03 17:01 UTC (permalink / raw) To: Linus Torvalds Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang Just curious why on Opteron dual core 2600MHZ I get: phoenix:{root}:/tmp> gcc -DCMOV -Wall -O2 t.c phoenix:{root}:/tmp>time ./a.out 600000000 real 0m0.117s user 0m0.120s sys 0m0.000s phoenix:{root}:/tmp>gcc -Wall -O2 t.c phoenix:{root}:/tmp> time ./a.out 600000000 real 0m0.136s user 0m0.130s sys 0m0.010s Regards (I understand it is very different from P4) Luigi Genoni On Wed, 3 Jan 2007, Linus Torvalds wrote: > Date: Wed, 3 Jan 2007 08:03:37 -0800 (PST) > From: Linus Torvalds <torvalds@osdl.org> > To: Grzegorz Kulewski <kangur@polcom.net> > Cc: Alan <alan@lxorguk.ukuu.org.uk>, Mikael Pettersson <mikpe@it.uu.se>, > s0348365@sms.ed.ac.uk, 76306.1226@compuserve.com, akpm@osdl.org, > bunk@stusta.de, greg@kroah.com, linux-kernel@vger.kernel.org, > yanmin_zhang@linux.intel.com > Subject: Re: kernel + gcc 4.1 = several problems > Resent-Date: Wed, 03 Jan 2007 17:16:00 +0100 > Resent-From: <l.genoni@sns.it> > > > > On Wed, 3 Jan 2007, Grzegorz Kulewski wrote: >> >> Could you explain why CMOV is pointless now? Are there any benchmarks proving >> that? > > CMOV (and, more generically, any "predicated instruction") tends to > generally a bad idea on an aggressively out-of-order CPU. It doesn't > always have to be horrible, but in practice it is seldom very nice, and > (as usual) on the P4 it can be really quite bad. > > On a P4, I think a cmov basically takes 10 cycles. > > But even ignoring the usual P4 "I suck at things that aren't totally > normal", cmov is actually not a great idea. 
You can always replace it by > > j<negated condition> forward > mov ..., %reg > forward: > > and assuming the branch is AT ALL predictable (and 95+% of all branches > are), the branch-over will actually be a LOT better for a CPU. > > Why? Becuase branches can be predicted, and when they are predicted they > basically go away. They go away on many levels, too. Not just the branch > itself, but the _conditional_ for the branch goes away as far as the > critical path of code is concerned: the CPU still has to calculate it and > check it, but from a performance angle it "doesn't exist any more", > because it's not holding anything else up (well, you want to do it in > _some_ reasonable time, but the point stands..) > > Similarly, whichever side of the branch wasn't taken goes away. Again, in > an out-of-order machine with register renaming, this means that even if > the branch isn't taken above, and you end up executing all the non-branch > instructions, because you now UNCONDITIONALLY over-write the register, the > old data in the register is now DEAD, so now all the OTHER writes to that > register are off the critical path too! > > So the end result is that with a conditional branch, ona good CPU, the > _only_ part of the code that is actually performance-sensitive is the > actual calculation of the value that gets used! > > In contrast, if you use a predicated instruction, ALL of it is on the > critical path. Calculating the conditional is on the critical path. > Calculating the value that gets used is obviously ALSO on the critical > path, but so is the calculation for the value that DOESN'T get used too. > So the cmov - rather than speeding things up - actually slows things down, > because it makes more code be dependent on each other. > > So here's the basic rule: > > - cmov is sometimes nice for code density. It's not a big win, but it > certainly can be a win. > > - if you KNOW the branch is totally unpredictable, cmov is often good for > performance. 
But a compiler almost never knows that, and even if you > train it with input data and profiling, remember that not very many > branches _are_ totally unpredictable, so even if you were to know that > something is unpredictable, it's going to be very rare. > > - on a P4, branch mispredictions are expensive, but so is cmov, so all > the above is to some degree exaggerated. On nicer microarchitectures > (the Intel Core 2 in particular is something I have to say is very nice > indeed), the difference will be a lot less noticeable. The loss from > cmov isn't very big (it's not as sucky as P4), but neither is the win > (branch misprediction isn't that expensive either). > > Here's an example program that you can test and time yourself. > > On my Core 2, I get > > [torvalds@woody ~]$ gcc -DCMOV -Wall -O2 t.c > [torvalds@woody ~]$ time ./a.out > 600000000 > > real 0m0.194s > user 0m0.192s > sys 0m0.000s > > [torvalds@woody ~]$ gcc -Wall -O2 t.c > [torvalds@woody ~]$ time ./a.out > 600000000 > > real 0m0.167s > user 0m0.168s > sys 0m0.000s > > ie the cmov is quite a bit slower. Maybe I did something wrong. But note > how cmov not only is slower, it's fundamentally more limited too (ie the > branch-over can actually do a lot of things cmov simply cannot do). > > So don't use cmov. Except for non-performance-critical code, or if you > really care about code-size, and it helps (which is actually fairly rare: > quite often cmov isn't even smaller than a conditional jump and a regular > move, partly because a regular move can take arguments that a cmov cannot: > move to memory, move from an immediate etc etc, so depending on what > you're moving, cmov simply isn't good even if it's _just_ a move). > > (For me, the "cmov" version of the function ends up being three bytes > shorter. 
So it's actually a good example of everything above) > > Linus > > (*) x86 only has "move to register" as a predicated instruction, but some > other architectures have lots of them, potentially all instructions. I > don't count conditional branches as "predicated", although some crazy > people do. ARM has predicated instructions (but they are gone in Thumb, I > think), and ia64 obviously has predicated instructions (but it will be > gone in a few years ;) ^ permalink raw reply [flat|nested] 60+ messages in thread
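[Editorial sketch, not part of the thread.] The t.c program timed above is not reproduced in this part of the archive. The following is only a minimal portable sketch of the kind of comparison being discussed: a select written with an explicit branch against a select with no branch at all. The names select_branch, select_mask and run_select are hypothetical, and the mask-based select merely stands in for a predicated move; it is not cmov itself. The constants 5 and 7 are likewise an assumption, chosen so that 100000000 iterations would add up to the 600000000 printed in the timings. A compiler is free to emit a jump or a cmov for either helper, so inspect the generated assembly (for example with gcc -S) before reading anything into a timing.

```c
#include <stdint.h>

/* Hypothetical helper: select with an explicit branch. The compiler may
 * turn this into a conditional jump over a move, or into a cmov. */
uint32_t select_branch(uint32_t cond, uint32_t a, uint32_t b)
{
	uint32_t result = b;

	if (cond)
		result = a;
	return result;
}

/* Hypothetical helper: select without any branch, using a bit mask.
 * Both a and b feed the result, so both are always computed. */
uint32_t select_mask(uint32_t cond, uint32_t a, uint32_t b)
{
	/* All ones when cond is non-zero, all zeroes otherwise. */
	uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);

	return (a & mask) | (b & ~mask);
}

/* Sum a selected constant (5 or 7, picked by bit 0 of the counter)
 * over the requested number of iterations. */
uint64_t run_select(uint32_t iterations, int use_mask)
{
	uint64_t sum = 0;
	uint32_t i;

	for (i = 0; i < iterations; i++) {
		uint32_t cond = i & 1u;

		sum += use_mask ? select_mask(cond, 5, 7)
				: select_branch(cond, 5, 7);
	}
	return sum;
}
```

Calling run_select(100000000, 0) and run_select(100000000, 1) from a small main() and timing each with time(1) gives a rough equivalent of the two runs shown above. Bit 0 of a counter is a perfectly regular pattern, so this is close to the easiest possible case for a branch predictor.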
* Re: kernel + gcc 4.1 = several problems 2007-01-03 17:01 ` l.genoni @ 2007-01-03 17:45 ` Tim Schmielau 2007-01-03 20:24 ` Linus Torvalds 0 siblings, 1 reply; 60+ messages in thread From: Tim Schmielau @ 2007-01-03 17:45 UTC (permalink / raw) To: l.genoni Cc: Linus Torvalds, Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang Well, on a P4 (which is supposed to be soo bad) I get: > gcc -O2 t.c -o t > foreach x ( 1 2 3 4 5 ) >> time ./t > /dev/null >> end 0.196u 0.004s 0:00.19 100.0% 0+0k 0+0io 0pf+0w 0.168u 0.004s 0:00.16 100.0% 0+0k 0+0io 0pf+0w 0.168u 0.000s 0:00.16 100.0% 0+0k 0+0io 0pf+0w 0.160u 0.000s 0:00.15 106.6% 0+0k 0+0io 0pf+0w 0.180u 0.000s 0:00.18 100.0% 0+0k 0+0io 0pf+0w > gcc -DCMOV -O2 t.c -o t > foreach x ( 1 2 3 4 5 ) >> time ./t > /dev/null >> end 0.168u 0.000s 0:00.17 94.1% 0+0k 0+0io 0pf+0w 0.152u 0.000s 0:00.15 100.0% 0+0k 0+0io 0pf+0w 0.136u 0.004s 0:00.13 100.0% 0+0k 0+0io 0pf+0w 0.168u 0.000s 0:00.16 100.0% 0+0k 0+0io 0pf+0w 0.172u 0.000s 0:00.17 100.0% 0+0k 0+0io 0pf+0w see? ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-03 17:45 ` Tim Schmielau @ 2007-01-03 20:24 ` Linus Torvalds 0 siblings, 0 replies; 60+ messages in thread From: Linus Torvalds @ 2007-01-03 20:24 UTC (permalink / raw) To: Tim Schmielau Cc: l.genoni, Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang On Wed, 3 Jan 2007, Tim Schmielau wrote: > > Well, on a P4 (which is supposed to be soo bad) I get: Interesting. My P4 gets basically exactly the same timings for the cmov and branch cases. And my Core 2 is consistently faster (something like 15%) for the branch version. Btw, the test-case should be the best possible one for cmov, since there are no data-dependencies except for ALU operations, and everything is totally independent (the actual values have no data dependencies at all, since they are constants). So the critical path issue never shows up. Linus ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-03 16:03 ` Linus Torvalds 2007-01-03 17:01 ` l.genoni @ 2007-01-03 17:06 ` l.genoni 2007-01-03 17:53 ` Mariusz Kozlowski ` (3 subsequent siblings) 5 siblings, 0 replies; 60+ messages in thread From: l.genoni @ 2007-01-03 17:06 UTC (permalink / raw) To: Linus Torvalds Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang Just to make clearer why I am so curious, this from X86_64 X2 3800+: DarkStar:{venom}:/tmp> gcc -DCMOV -Wall -O2 t.c DarkStar:{venom}:/tmp>time ./a.out 600000000 real 0m0.151s user 0m0.150s sys 0m0.000s DarkStar:{venom}:/tmp> gcc -Wall -O2 t.c DarkStar:{venom}:/tmp> time ./a.out 600000000 real 0m0.176s user 0m0.180s sys 0m0.000s DarkStar:{venom}:/tmp>gcc -m32 -DCMOV -Wall -O2 t.c DarkStar:{venom}:/tmp>time ./a.out 600000000 real 0m0.152s user 0m0.160s sys 0m0.000s DarkStar:{venom}:/tmp>gcc -m32 -Wall -O2 t.c DarkStar:{venom}:/tmp>time ./a.out 600000000 real 0m0.200s user 0m0.200s sys 0m0.000s ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-03 16:03 ` Linus Torvalds 2007-01-03 17:01 ` l.genoni 2007-01-03 17:06 ` l.genoni @ 2007-01-03 17:53 ` Mariusz Kozlowski 2007-01-03 19:47 ` Denis Vlasenko ` (2 subsequent siblings) 5 siblings, 0 replies; 60+ messages in thread From: Mariusz Kozlowski @ 2007-01-03 17:53 UTC (permalink / raw) To: Linus Torvalds Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang [-- Attachment #1: Type: text/plain, Size: 1404 bytes --] Hello, > Here's an example program that you can test and time yourself. > > On my Core 2, I get > > [torvalds@woody ~]$ gcc -DCMOV -Wall -O2 t.c > [torvalds@woody ~]$ time ./a.out > 600000000 > > real 0m0.194s > user 0m0.192s > sys 0m0.000s > > [torvalds@woody ~]$ gcc -Wall -O2 t.c > [torvalds@woody ~]$ time ./a.out > 600000000 > > real 0m0.167s > user 0m0.168s > sys 0m0.000s Test was done on my laptop with gcc 4.1.1 and CPU: processor : 0 vendor_id : GenuineIntel cpu family : 15 model : 2 model name : Intel(R) Pentium(R) 4 CPU 2.40GHz stepping : 9 cpu MHz : 2392.349 cache size : 512 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 2 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr bogomips : 4786.36 clflush size : 64 I wrote a simple script that ran each version of your code 100 times, measuring the execution time. Then some simple gnuplot magic was applied. The result is attached (png file). - cmovne was faster, with an almost stable execution time (~171ms) - je-mov was slower and its execution time varied Interpretation is up to you ;-) -- Regards, Mariusz Kozlowski [-- Attachment #2: benchmark.png --] [-- Type: image/png, Size: 6165 bytes --] ^ permalink raw reply [flat|nested] 60+ messages in thread
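[Editorial sketch, not part of the thread.] The script and the plot from the message above are not included in the archive. As a rough illustration of the same idea (repeat a measurement many times and look at the spread, not at a single run), the C sketch below times a piece of work with the standard clock() function and reduces a set of samples to minimum, maximum and mean. The names timing_summary, time_once_ms and summarize are hypothetical. clock() reports CPU time, so it corresponds more closely to the "user" column of time(1) than to "real".

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical summary of repeated timings, in milliseconds. */
struct timing_summary {
	double min_ms;
	double max_ms;
	double mean_ms;
};

/* Run work() once and return the CPU time it consumed, in ms. */
double time_once_ms(void (*work)(void))
{
	clock_t start = clock();

	work();
	return 1000.0 * (double)(clock() - start) / (double)CLOCKS_PER_SEC;
}

/* Reduce n samples (n must be at least 1) to min, max and mean. */
struct timing_summary summarize(const double *samples_ms, size_t n)
{
	struct timing_summary s = { samples_ms[0], samples_ms[0], 0.0 };
	double total = 0.0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (samples_ms[i] < s.min_ms)
			s.min_ms = samples_ms[i];
		if (samples_ms[i] > s.max_ms)
			s.max_ms = samples_ms[i];
		total += samples_ms[i];
	}
	s.mean_ms = total / (double)n;
	return s;
}
```

A caller would fill an array with 100 results of time_once_ms() for each variant and compare the two summaries. A wide gap between min_ms and max_ms is the "execution time varies" effect described above.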
* Re: kernel + gcc 4.1 = several problems 2007-01-03 16:03 ` Linus Torvalds ` (2 preceding siblings ...) 2007-01-03 17:53 ` Mariusz Kozlowski @ 2007-01-03 19:47 ` Denis Vlasenko 2007-01-03 20:38 ` Linus Torvalds 2007-01-03 21:44 ` Thomas Sailer 2007-01-04 3:08 ` Zou, Nanhai 5 siblings, 1 reply; 60+ messages in thread From: Denis Vlasenko @ 2007-01-03 19:47 UTC (permalink / raw) To: Linus Torvalds Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang On Wednesday 03 January 2007 17:03, Linus Torvalds wrote: > On Wed, 3 Jan 2007, Grzegorz Kulewski wrote: > > Could you explain why CMOV is pointless now? Are there any benchmarks proving > > that? > > CMOV (and, more generically, any "predicated instruction") tends to be > generally a bad idea on an aggressively out-of-order CPU. It doesn't > always have to be horrible, but in practice it is seldom very nice, and > (as usual) on the P4 it can be really quite bad. > > On a P4, I think a cmov basically takes 10 cycles. > > But even ignoring the usual P4 "I suck at things that aren't totally > normal", cmov is actually not a great idea. You can always replace it by > > j<negated condition> forward > mov ..., %reg > forward: ... ... > In contrast, if you use a predicated instruction, ALL of it is on the > critical path. Calculating the conditional is on the critical path. > Calculating the value that gets used is obviously ALSO on the critical > path, but so is the calculation for the value that DOESN'T get used too. > So the cmov - rather than speeding things up - actually slows things down, > because it makes more code be dependent on each other. Why CPU people do not internally convert cmov into jmp,mov pair? -- vda ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-03 19:47 ` Denis Vlasenko @ 2007-01-03 20:38 ` Linus Torvalds 2007-01-03 21:48 ` Denis Vlasenko 0 siblings, 1 reply; 60+ messages in thread From: Linus Torvalds @ 2007-01-03 20:38 UTC (permalink / raw) To: Denis Vlasenko Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang On Wed, 3 Jan 2007, Denis Vlasenko wrote: > > Why CPU people do not internally convert cmov into jmp,mov pair? Probably because - it's not worth it. cmov's certainly _can_ be faster for unpredictable input. So especially if you teach your compiler (by using profiling) to use cmov's mainly for unpredictable cases, turning it into a conditional jump internally would likely be a bad idea. - the biggest reason to do it would likely be microarchitectural: if you have an ALU or a bypass network that just isn't suitable for bypassing the flags that way (because you designed your pipeline for a conditional branch), you might decide that it just simplifies things to turn the cmov internally into a branch+mov uop pair. - cmov's simply aren't common enough to be worth worrying about, especially as it's not likely that the difference is all that big in the end. The limitations on cmov's mean that the compiler can only use them under certain fairly limited circumstances anyway, so it's not like you'll make a huge difference by doing anything clever. So see above - it's simply a wash, and likely ends up just depending on other issues. And don't get me wrong. cmov's can make a difference. You can use them to avoid polluting your branch prediction tables, you can use them to make code smaller, and you can use them when they simply just fit the problem really well. It's just _not_ the case that they are "obviously better". They simply aren't. Conditional branches aren't "evil". There are many MUCH worse things you can do, and other things you should avoid. 
It really all boils down to: there's simply no real reason to use cmov. It's not horrible either, so go ahead and use it if you want to, but don't expect your code to really magically run any faster. Linus ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-03 20:38 ` Linus Torvalds @ 2007-01-03 21:48 ` Denis Vlasenko 2007-01-03 22:13 ` Linus Torvalds 0 siblings, 1 reply; 60+ messages in thread From: Denis Vlasenko @ 2007-01-03 21:48 UTC (permalink / raw) To: Linus Torvalds Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang On Wednesday 03 January 2007 21:38, Linus Torvalds wrote: > On Wed, 3 Jan 2007, Denis Vlasenko wrote: > > > > Why CPU people do not internally convert cmov into jmp,mov pair? > ... > It really all boils down to: there's simply no real reason to use cmov. > It's not horrible either, so go ahead and use it if you want to, but don't > expect your code to really magically run any faster. IOW: yet another slot in instruction opcode matrix and thousands of transistors in instruction decoders are wasted because of this "clever invention", eh? -- vda ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-03 21:48 ` Denis Vlasenko @ 2007-01-03 22:13 ` Linus Torvalds 0 siblings, 0 replies; 60+ messages in thread From: Linus Torvalds @ 2007-01-03 22:13 UTC (permalink / raw) To: Denis Vlasenko Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang On Wed, 3 Jan 2007, Denis Vlasenko wrote: > > IOW: yet another slot in instruction opcode matrix and thousands of > transistors in instruction decoders are wasted because of this > "clever invention", eh? Well, in all fairness, it can probably help more on certain microarchitectures. Intel is fairly aggressively OoO, especially in Core 2, and predicted branches are not only free, they allow OoO to do a great job around them. But an in-order architecture doesn't have that, and cmov might show more of an advantage there. Linus ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-03 16:03 ` Linus Torvalds ` (3 preceding siblings ...) 2007-01-03 19:47 ` Denis Vlasenko @ 2007-01-03 21:44 ` Thomas Sailer 2007-01-03 22:08 ` Linus Torvalds 2007-01-04 3:08 ` Zou, Nanhai 5 siblings, 1 reply; 60+ messages in thread From: Thomas Sailer @ 2007-01-03 21:44 UTC (permalink / raw) To: Linus Torvalds Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang On Wed, 2007-01-03 at 08:03 -0800, Linus Torvalds wrote: > and assuming the branch is AT ALL predictable (and 95+% of all branches > are), the branch-over will actually be a LOT better for a CPU. IF... Counterexample: Add-Compare-Select in a Viterbi Decoder. If the compare can be predicted, you botched the compression of the data (if you can predict the data, you could have compressed it better), or your noise is not white, i.e. you f*** up the whitening filter. So in any practical viterbi decoder, the compares cannot be predicted. I remember cmov made a big difference in Viterbi Decoder performance on a Cyrix 6x86. But granted, nowadays these things are usually done with SIMD and masks. Tom ^ permalink raw reply [flat|nested] 60+ messages in thread
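[Editorial sketch, not part of the thread.] For readers unfamiliar with the add-compare-select (ACS) step mentioned above, here is a minimal scalar sketch in C. It assumes unsigned path metrics where a smaller value is better and where the sums do not overflow 32 bits; the names acs_result and acs_step are hypothetical. The select is written with a mask so that no data-dependent branch is required, which is the property that makes a conditional move (or, as noted above, SIMD with masks) attractive here. A compiler may still choose its own instruction sequence for it.

```c
#include <stdint.h>

/* Hypothetical result of one add-compare-select step. */
struct acs_result {
	uint32_t metric;   /* surviving path metric */
	uint32_t decision; /* 0: predecessor 0 survived, 1: predecessor 1 */
};

/* One ACS step for a single trellis state with two predecessors.
 * pm0/pm1 are the predecessors' path metrics, bm0/bm1 the branch
 * metrics of the transitions into this state. Smaller is better. */
struct acs_result acs_step(uint32_t pm0, uint32_t bm0,
			   uint32_t pm1, uint32_t bm1)
{
	uint32_t cand0 = pm0 + bm0;                   /* add */
	uint32_t cand1 = pm1 + bm1;
	uint32_t take1 = (uint32_t)(cand1 < cand0);   /* compare */
	uint32_t mask = (uint32_t)0 - take1;          /* all ones if take1 */
	struct acs_result r;

	r.metric = (cand1 & mask) | (cand0 & ~mask);  /* select */
	r.decision = take1;
	return r;
}
```

On a tie the sketch keeps predecessor 0. With noisy input the outcome of the compare is close to random, which is why a branch predictor gains little on this step.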
* Re: kernel + gcc 4.1 = several problems 2007-01-03 21:44 ` Thomas Sailer @ 2007-01-03 22:08 ` Linus Torvalds 0 siblings, 0 replies; 60+ messages in thread From: Linus Torvalds @ 2007-01-03 22:08 UTC (permalink / raw) To: Thomas Sailer Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang On Wed, 3 Jan 2007, Thomas Sailer wrote: > > IF... Counterexample: Add-Compare-Select in a Viterbi Decoder. Yes. [De]compression stuff tends to be (a) totally unpredictable and (b) a situation where people care about performance. It's fairly rare in many other situations. That said, any real performance these days is about avoiding cache misses. There cmov really can help more, if it results in denser code (fairly big if, though). Linus ^ permalink raw reply [flat|nested] 60+ messages in thread
* RE: kernel + gcc 4.1 = several problems 2007-01-03 16:03 ` Linus Torvalds ` (4 preceding siblings ...) 2007-01-03 21:44 ` Thomas Sailer @ 2007-01-04 3:08 ` Zou, Nanhai 2007-01-04 15:34 ` Linus Torvalds 5 siblings, 1 reply; 60+ messages in thread From: Zou, Nanhai @ 2007-01-04 3:08 UTC (permalink / raw) To: Linus Torvalds, Grzegorz Kulewski Cc: Alan, Mikael Pettersson, s0348365, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang > -----Original Message----- > From: linux-kernel-owner@vger.kernel.org > [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Linus Torvalds > Sent: January 4, 2007 0:04 > To: Grzegorz Kulewski > Cc: Alan; Mikael Pettersson; s0348365@sms.ed.ac.uk; > 76306.1226@compuserve.com; akpm@osdl.org; bunk@stusta.de; greg@kroah.com; > linux-kernel@vger.kernel.org; yanmin_zhang@linux.intel.com > Subject: Re: kernel + gcc 4.1 = several problems > > > > On Wed, 3 Jan 2007, Grzegorz Kulewski wrote: > > > > Could you explain why CMOV is pointless now? Are there any benchmarks proving > > that? > > CMOV (and, more generically, any "predicated instruction") tends to be > generally a bad idea on an aggressively out-of-order CPU. It doesn't > always have to be horrible, but in practice it is seldom very nice, and > (as usual) on the P4 it can be really quite bad. > > On a P4, I think a cmov basically takes 10 cycles. > > But even ignoring the usual P4 "I suck at things that aren't totally > normal", cmov is actually not a great idea. You can always replace it by > > j<negated condition> forward > mov ..., %reg > forward: > > and assuming the branch is AT ALL predictable (and 95+% of all branches > are), the branch-over will actually be a LOT better for a CPU. > > Why? Because branches can be predicted, and when they are predicted they > basically go away. They go away on many levels, too. 
Not just the branch > itself, but the _conditional_ for the branch goes away as far as the > critical path of code is concerned: the CPU still has to calculate it and > check it, but from a performance angle it "doesn't exist any more", > because it's not holding anything else up (well, you want to do it in > _some_ reasonable time, but the point stands..) > > Similarly, whichever side of the branch wasn't taken goes away. Again, in > an out-of-order machine with register renaming, this means that even if > the branch isn't taken above, and you end up executing all the non-branch > instructions, because you now UNCONDITIONALLY over-write the register, the > old data in the register is now DEAD, so now all the OTHER writes to that > register are off the critical path too! > > So the end result is that with a conditional branch, on a good CPU, the > _only_ part of the code that is actually performance-sensitive is the > actual calculation of the value that gets used! > > In contrast, if you use a predicated instruction, ALL of it is on the > critical path. Calculating the conditional is on the critical path. > Calculating the value that gets used is obviously ALSO on the critical > path, but so is the calculation for the value that DOESN'T get used too. > So the cmov - rather than speeding things up - actually slows things down, > because it makes more code be dependent on each other. > > So here's the basic rule: > > - cmov is sometimes nice for code density. It's not a big win, but it > certainly can be a win. > > - if you KNOW the branch is totally unpredictable, cmov is often good for > performance. But a compiler almost never knows that, and even if you > train it with input data and profiling, remember that not very many > branches _are_ totally unpredictable, so even if you were to know that > something is unpredictable, it's going to be very rare. 
> > - on a P4, branch mispredictions are expensive, but so is cmov, so all > the above is to some degree exaggerated. On nicer microarchitectures > (the Intel Core 2 in particular is something I have to say is very nice > indeed), the difference will be a lot less noticeable. The loss from > cmov isn't very big (it's not as sucky as P4), but neither is the win > (branch misprediction isn't that expensive either). > > Here's an example program that you can test and time yourself. > > On my Core 2, I get > > [torvalds@woody ~]$ gcc -DCMOV -Wall -O2 t.c > [torvalds@woody ~]$ time ./a.out > 600000000 > > real 0m0.194s > user 0m0.192s > sys 0m0.000s > > [torvalds@woody ~]$ gcc -Wall -O2 t.c > [torvalds@woody ~]$ time ./a.out > 600000000 > > real 0m0.167s > user 0m0.168s > sys 0m0.000s > > ie the cmov is quite a bit slower. Maybe I did something wrong. But note > how cmov not only is slower, it's fundamentally more limited too (ie the > branch-over can actually do a lot of things cmov simply cannot do). Hi, cmov will stall on eflags in your test program. I think you will see the benefit of cmov if you can manage to put some instructions which do NOT modify eflags between testl and cmov. Thanks Zou Nan hai ^ permalink raw reply [flat|nested] 60+ messages in thread
* RE: kernel + gcc 4.1 = several problems 2007-01-04 3:08 ` Zou, Nanhai @ 2007-01-04 15:34 ` Linus Torvalds 0 siblings, 0 replies; 60+ messages in thread From: Linus Torvalds @ 2007-01-04 15:34 UTC (permalink / raw) To: Zou, Nanhai Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang On Thu, 4 Jan 2007, Zou, Nanhai wrote: > > cmov will stall on eflags in your test program. And that is EXACTLY my point. CMOV is a piece of CRAP for most things, exactly because it serializes three streams of data: the two inputs, and the conditional. My test-case was actually _good_ for cmov, because there was just the one conditional (which was 100% ALU) thing that was serialized. In real life, the two data sources also come from memory, and _any_ of them being delayed ends up delaying the cmov, and screwing up your out-of-order pipeline because you now introduced a serialization point that was very possibly not necessary at all. In contrast, a conditional branch-around serializes absolutely NOTHING, because branches get predicted. > I think you will see benefit of cmov if you can manage to put some > instructions which does NOT modify eflags between testl and cmov. A lot of the time, the conditional _is_ the critical path. The whole point of this discussion was that cmov isn't really all that great. It has fundamental problems that a conditional branch that gets predicted simply does not have. That's quite apart from the fact that cmov has rather limited semantics, and that in 99% of all cases you have to use a conditional branch anyway. Linus ^ permalink raw reply [flat|nested] 60+ messages in thread
* Oops in 2.6.19.1 @ 2006-12-20 14:21 Alistair John Strachan 2006-12-30 16:59 ` Alistair John Strachan 0 siblings, 1 reply; 60+ messages in thread From: Alistair John Strachan @ 2006-12-20 14:21 UTC (permalink / raw) To: LKML Hi, Any ideas? BUG: unable to handle kernel NULL pointer dereference at virtual address 00000009 printing eip: c0156f60 *pde = 00000000 Oops: 0002 [#1] Modules linked in: ipt_recent ipt_REJECT xt_tcpudp ipt_MASQUERADE iptable_nat xt_state iptable_filter ip_tables x_tables prism54 yenta_socket rsrc_nonstatic pcmcia_core snd_via82xx snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd soundcore ehci_hcd usblp eth1394 uhci_hcd usbcore ohci1394 ieee1394 via_agp agpgart vt1211 hwmon_vid hwmon ip_nat_ftp ip_nat ip_conntrack_ftp ip_conntrack CPU: 0 EIP: 0060:[<c0156f60>] Not tainted VLI EFLAGS: 00010246 (2.6.19.1 #1) EIP is at pipe_poll+0xa0/0xb0 eax: 00000008 ebx: 00000000 ecx: 00000008 edx: 00000000 esi: f70f3e9c edi: f7017c00 ebp: f70f3c1c esp: f70f3c0c ds: 007b es: 007b ss: 0068 Process python (pid: 4178, ti=f70f2000 task=f70c4a90 task.ti=f70f2000) Stack: 00000000 00000000 f70f3e9c f6e111c0 f70f3fa4 c015d7f3 f70f3c54 f70f3fac 084c44a0 00000030 084c44d0 00000000 f70f3e94 f70f3e94 00000006 f70f3ecc 00000000 f70f3e94 c015e580 00000000 00000000 00000006 f6e111c0 00000000 Call Trace: [<c015d7f3>] do_sys_poll+0x253/0x480 [<c015da53>] sys_poll+0x33/0x50 [<c0102c97>] syscall_call+0x7/0xb [<b7f6b402>] 0xb7f6b402 ======================= Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8 8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f 45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00 EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:f70f3c0c -- Cheers, Alistair. Final year Computer Science undergraduate. 1F2 55 South Clerk Street, Edinburgh, UK. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: Oops in 2.6.19.1 @ 2006-12-30 16:59 ` Alistair John Strachan 2006-12-31 16:27 ` Adrian Bunk 0 siblings, 1 reply; 60+ messages in thread From: Alistair John Strachan @ 2006-12-30 16:59 UTC (permalink / raw) To: Zhang, Yanmin; +Cc: LKML, Greg KH, Chuck Ebbert On Thursday 28 December 2006 04:14, Alistair John Strachan wrote: > On Thursday 28 December 2006 04:02, Alistair John Strachan wrote: > > On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote: > > [snip] > > > > > > Here's a current decompilation of vmlinux/pipe_poll() from the > > > > running kernel, the addresses have changed slightly. There's no xchg > > > > there either: > > > > > > Could you reproduce the bug by the new kernel, so we could get the > > > exact address and instruction of the bug? > > > > It crashed again, but this time with no output (machine locked solid). To > > be honest, the disassembly looks right (it's like Chuck said, it's > > jumping back half way through an instruction): > > > > c0156f5f: 3b 87 68 01 00 00 cmp 0x168(%edi),%eax > > > > So c0156f60 is 87 68 01 00 00.. > > > > This is with the GCC recompile, so it's not a distro problem. It could > > still either be GCC 4.x, or a 2.6.19.1 specific bug, but it's serious. > > 2.6.19 with GCC 3.4.3 is 100% stable. > > Looks like a similar crash here: > > http://ubuntuforums.org/showthread.php?p=1803389 I've eliminated 2.6.19.1 as the culprit, and also tried toggling "optimize for size", various debug options. 2.6.19 compiled with GCC 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably, within approximately 12 hours. The machine passes 6 hours of Prime95 (a CPU stability tester), four memtest86 passes, and there are no heat problems. I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config using this compiler (but the same binutils), and will report back if it crashes. My bet is that it won't, however. -- Cheers, Alistair. Final year Computer Science undergraduate. 
1F2 55 South Clerk Street, Edinburgh, UK. ^ permalink raw reply [flat|nested] 60+ messages in thread
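[Editorial sketch, not part of the thread.] The "jumping back half way through an instruction" observation can be checked by hand. The compiled instruction starts at c0156f5f with bytes 3b 87 68 01 00 00, while the oops reports EIP c0156f60, one byte later, where the bytes read 87 68 01. In the IA-32 encoding 0x87 is the opcode for xchg r/m32,r32, and 0x68 is its ModRM byte. The small C sketch below splits a ModRM byte into its fields and computes the effective address for the base-plus-disp8 form. The helper names are hypothetical and only this one addressing form is handled (no SIB byte, no disp32).

```c
#include <stdint.h>

/* The three fields of an IA-32 ModRM byte. */
struct modrm {
	uint8_t mod; /* bits 7..6: addressing form */
	uint8_t reg; /* bits 5..3: register operand */
	uint8_t rm;  /* bits 2..0: base register or register operand */
};

/* Hypothetical helper: split a ModRM byte into its fields. */
struct modrm decode_modrm(uint8_t byte)
{
	struct modrm m;

	m.mod = (uint8_t)((byte >> 6) & 0x3);
	m.reg = (uint8_t)((byte >> 3) & 0x7);
	m.rm = (uint8_t)(byte & 0x7);
	return m;
}

/* How far into the intended instruction the faulting EIP points. */
uint32_t offset_into_insn(uint32_t insn_start, uint32_t eip)
{
	return eip - insn_start;
}

/* Effective address for mod == 1: base register plus a sign-extended
 * 8-bit displacement. */
uint32_t ea_base_disp8(uint32_t base, uint8_t disp8)
{
	return base + (uint32_t)(int32_t)(int8_t)disp8;
}
```

For 0x68 the fields are mod = 1, reg = 5 (ebp) and rm = 0 (eax), so the misdecoded bytes 87 68 01 read as xchg %ebp,0x1(%eax). With eax = 00000008 from the register dump, the effective address is 00000009, which matches the address in the oops, and an xchg to memory is a write, consistent with the error code 0002.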
* Re: Oops in 2.6.19.1 2006-12-30 16:59 ` Alistair John Strachan @ 2006-12-31 16:27 ` Adrian Bunk 2006-12-31 16:55 ` Alistair John Strachan 0 siblings, 1 reply; 60+ messages in thread From: Adrian Bunk @ 2006-12-31 16:27 UTC (permalink / raw) To: Alistair John Strachan; +Cc: Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert On Sat, Dec 30, 2006 at 04:59:35PM +0000, Alistair John Strachan wrote: > On Thursday 28 December 2006 04:14, Alistair John Strachan wrote: > > On Thursday 28 December 2006 04:02, Alistair John Strachan wrote: > > > On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote: > > > [snip] > > > > > > > > Here's a current decompilation of vmlinux/pipe_poll() from the > > > > > running kernel, the addresses have changed slightly. There's no xchg > > > > > there either: > > > > > > > > Could you reproduce the bug by the new kernel, so we could get the > > > > exact address and instruction of the bug? > > > > > > It crashed again, but this time with no output (machine locked solid). To > > > be honest, the disassembly looks right (it's like Chuck said, it's > > > jumping back half way through an instruction): > > > > > > c0156f5f: 3b 87 68 01 00 00 cmp 0x168(%edi),%eax > > > > > > So c0156f60 is 87 68 01 00 00.. > > > > > > This is with the GCC recompile, so it's not a distro problem. It could > > > still either be GCC 4.x, or a 2.6.19.1 specific bug, but it's serious. > > > 2.6.19 with GCC 3.4.3 is 100% stable. > > > > Looks like a similar crash here: > > > > http://ubuntuforums.org/showthread.php?p=1803389 > > I've eliminated 2.6.19.1 as the culprit, and also tried toggling "optimize for > size", various debug options. 2.6.19 compiled with GCC 4.1.1 on an Via > Nehemiah C3-2 seems to crash in pipe_poll reliably, within approximately 12 > hours. > > The machine passes 6 hours of Prime95 (a CPU stability tester), four memtest86 > passes, and there are no heat problems. 
> > I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config using > this compiler (but the same binutils), and will report back if it crashes. My > bet is that it won't, however. There are occasional reports of problems with kernels compiled with gcc 4.1 that vanish when using older versions of gcc. AFAIK, until now noone has ever debugged whether that's a gcc bug, gcc exposing a kernel bug or gcc exposing a hardware bug. Comparing your report and [1], it seems that if these are the same problem, it's not a hardware bug but a gcc or kernel bug. > Cheers, > Alistair. cu Adrian [1] http://bugzilla.kernel.org/show_bug.cgi?id=7176 -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: Oops in 2.6.19.1 2006-12-31 16:27 ` Adrian Bunk @ 2006-12-31 16:55 ` Alistair John Strachan 2007-01-02 21:10 ` kernel + gcc 4.1 = several problems Adrian Bunk 0 siblings, 1 reply; 60+ messages in thread From: Alistair John Strachan @ 2006-12-31 16:55 UTC (permalink / raw) To: Adrian Bunk; +Cc: Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert On Sunday 31 December 2006 16:27, Adrian Bunk wrote: > On Sat, Dec 30, 2006 at 04:59:35PM +0000, Alistair John Strachan wrote: > > On Thursday 28 December 2006 04:14, Alistair John Strachan wrote: > > > On Thursday 28 December 2006 04:02, Alistair John Strachan wrote: > > > > On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote: > > > > [snip] > > > > > > > > > > Here's a current decompilation of vmlinux/pipe_poll() from the > > > > > > running kernel, the addresses have changed slightly. There's no > > > > > > xchg there either: > > > > > > > > > > Could you reproduce the bug by the new kernel, so we could get the > > > > > exact address and instruction of the bug? > > > > > > > > It crashed again, but this time with no output (machine locked > > > > solid). To be honest, the disassembly looks right (it's like Chuck > > > > said, it's jumping back half way through an instruction): > > > > > > > > c0156f5f: 3b 87 68 01 00 00 cmp 0x168(%edi),%eax > > > > > > > > So c0156f60 is 87 68 01 00 00.. > > > > > > > > This is with the GCC recompile, so it's not a distro problem. It > > > > could still either be GCC 4.x, or a 2.6.19.1 specific bug, but it's > > > > serious. 2.6.19 with GCC 3.4.3 is 100% stable. > > > > > > Looks like a similar crash here: > > > > > > http://ubuntuforums.org/showthread.php?p=1803389 > > > > I've eliminated 2.6.19.1 as the culprit, and also tried toggling > > "optimize for size", various debug options. 2.6.19 compiled with GCC > > 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably, > > within approximately 12 hours. 
> > > > The machine passes 6 hours of Prime95 (a CPU stability tester), four > > memtest86 passes, and there are no heat problems. > > > > I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config > > using this compiler (but the same binutils), and will report back if it > > crashes. My bet is that it won't, however. > > There are occasional reports of problems with kernels compiled with > gcc 4.1 that vanish when using older versions of gcc. > > AFAIK, until now noone has ever debugged whether that's a gcc bug, > gcc exposing a kernel bug or gcc exposing a hardware bug. > > Comparing your report and [1], it seems that if these are the same > problem, it's not a hardware bug but a gcc or kernel bug. This bug specifically indicates some kind of miscompilation in a driver, causing boot time hangs. My problem is quite different, and more subtle. The crash happens in the same place every time, which does suggest determinism (even with various options toggled on and off, and a 300K smaller kernel image), but it takes 8-12 hours to manifest and only happens with GCC 4.1.1. Unless we can start narrowing this down, it would be a mammoth task to seek out either the kernel or GCC change that first exhibited this bug, due to the non-immediate reproducibility of the bug, the lack of clues, and this machine's role as a stable, high-availability server. (If I had another Epia M10000 or another computer I could reproduce the bug on, I would be only too happy to boot as many kernels as required to fix it; however I cannot spare this machine). -- Cheers, Alistair. Final year Computer Science undergraduate. 1F2 55 South Clerk Street, Edinburgh, UK. ^ permalink raw reply [flat|nested] 60+ messages in thread
* kernel + gcc 4.1 = several problems 2006-12-31 16:55 ` Alistair John Strachan @ 2007-01-02 21:10 ` Adrian Bunk 2007-01-02 21:56 ` Alistair John Strachan 2007-01-02 22:01 ` Linus Torvalds 0 siblings, 2 replies; 60+ messages in thread From: Adrian Bunk @ 2007-01-02 21:10 UTC (permalink / raw) To: Alistair John Strachan Cc: Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert, Linus Torvalds, Andrew Morton On Sun, Dec 31, 2006 at 04:55:51PM +0000, Alistair John Strachan wrote: > On Sunday 31 December 2006 16:27, Adrian Bunk wrote: > > On Sat, Dec 30, 2006 at 04:59:35PM +0000, Alistair John Strachan wrote: > > > On Thursday 28 December 2006 04:14, Alistair John Strachan wrote: > > > > On Thursday 28 December 2006 04:02, Alistair John Strachan wrote: > > > > > On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote: > > > > > [snip] > > > > > > > > > > > > Here's a current decompilation of vmlinux/pipe_poll() from the > > > > > > > running kernel, the addresses have changed slightly. There's no > > > > > > > xchg there either: > > > > > > > > > > > > Could you reproduce the bug by the new kernel, so we could get the > > > > > > exact address and instruction of the bug? > > > > > > > > > > It crashed again, but this time with no output (machine locked > > > > > solid). To be honest, the disassembly looks right (it's like Chuck > > > > > said, it's jumping back half way through an instruction): > > > > > > > > > > c0156f5f: 3b 87 68 01 00 00 cmp 0x168(%edi),%eax > > > > > > > > > > So c0156f60 is 87 68 01 00 00.. > > > > > > > > > > This is with the GCC recompile, so it's not a distro problem. It > > > > > could still either be GCC 4.x, or a 2.6.19.1 specific bug, but it's > > > > > serious. 2.6.19 with GCC 3.4.3 is 100% stable. > > > > > > > > Looks like a similar crash here: > > > > > > > > http://ubuntuforums.org/showthread.php?p=1803389 > > > > > > I've eliminated 2.6.19.1 as the culprit, and also tried toggling > > > "optimize for size", various debug options. 
2.6.19 compiled with GCC > > > 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably, > > > within approximately 12 hours. > > > > > > The machine passes 6 hours of Prime95 (a CPU stability tester), four > > > memtest86 passes, and there are no heat problems. > > > > > > I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config > > > using this compiler (but the same binutils), and will report back if it > > > crashes. My bet is that it won't, however. > > > > There are occasional reports of problems with kernels compiled with > > gcc 4.1 that vanish when using older versions of gcc. > > > > AFAIK, until now noone has ever debugged whether that's a gcc bug, > > gcc exposing a kernel bug or gcc exposing a hardware bug. > > > > Comparing your report and [1], it seems that if these are the same > > problem, it's not a hardware bug but a gcc or kernel bug. > > This bug specifically indicates some kind of miscompilation in a driver, > causing boot time hangs. My problem is quite different, and more subtle. The > crash happens in the same place every time, which does suggest determinism > (even with various options toggled on and off, and a 300K smaller kernel > image), but it takes 8-12 hours to manifest and only happens with GCC 4.1.1. >... Sorry if my point goes a bit away from your problem: My point is that we have several reported problems only visible with gcc 4.1. Other bug reports are e.g. [2] and [3], but they are only present with using gcc 4.1 _and_ using -Os. There's simply a bunch of bugs only present with gcc 4.1, and what worries me most is that the estimated number of unknown cases is most likely very high since most people won't check different compiler versions when running into a problem. > Cheers, > Alistair. cu Adrian [1] http://bugzilla.kernel.org/show_bug.cgi?id=7176 [2] http://bugzilla.kernel.org/show_bug.cgi?id=7106 [3] https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=186852 -- "Is there not promise of rain?" 
Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-02 21:10 ` kernel + gcc 4.1 = several problems Adrian Bunk @ 2007-01-02 21:56 ` Alistair John Strachan 2007-01-02 22:06 ` D. Hazelton 2007-01-02 22:13 ` Linus Torvalds 2007-01-02 22:01 ` Linus Torvalds 1 sibling, 2 replies; 60+ messages in thread From: Alistair John Strachan @ 2007-01-02 21:56 UTC (permalink / raw) To: Adrian Bunk Cc: Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert, Linus Torvalds, Andrew Morton On Tuesday 02 January 2007 21:10, Adrian Bunk wrote: [snip] > > > Comparing your report and [1], it seems that if these are the same > > > problem, it's not a hardware bug but a gcc or kernel bug. > > > > This bug specifically indicates some kind of miscompilation in a driver, > > causing boot time hangs. My problem is quite different, and more subtle. > > The crash happens in the same place every time, which does suggest > > determinism (even with various options toggled on and off, and a 300K > > smaller kernel image), but it takes 8-12 hours to manifest and only > > happens with GCC 4.1.1. ... > > Sorry if my point goes a bit away from your problem: > > My point is that we have several reported problems only visible > with gcc 4.1. > > Other bug reports are e.g. [2] and [3], but they are only present with > using gcc 4.1 _and_ using -Os. I find [2] most compelling, and I can confirm that I do have the same problem with or without optimisation for size. I don't use selinux nor has it ever been enabled. At any rate, I have absolute confirmation that it is GCC 4.1.1, because with GCC 3.4.6 the same kernel I reported booting three days ago is still cheerfully working. I regularly get uptimes of 60+ days on that machine, rebooting only for kernel upgrades. 2.6.19 seems to be no worse in this regard. 
Perhaps fortunately, the configs I've tried have consistently failed to shake the crash, so I have a semi-reproducible test case here on C3-2 hardware if somebody wants to investigate the problem (though it still takes 6-12 hours). -- Cheers, Alistair. Final year Computer Science undergraduate. 1F2 55 South Clerk Street, Edinburgh, UK. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-02 21:56 ` Alistair John Strachan @ 2007-01-02 22:06 ` D. Hazelton 2007-01-02 23:24 ` Adrian Bunk 2007-01-02 22:13 ` Linus Torvalds 1 sibling, 1 reply; 60+ messages in thread From: D. Hazelton @ 2007-01-02 22:06 UTC (permalink / raw) To: Alistair John Strachan Cc: Adrian Bunk, Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert, Linus Torvalds, Andrew Morton On Tuesday 02 January 2007 16:56, Alistair John Strachan wrote: > On Tuesday 02 January 2007 21:10, Adrian Bunk wrote: > [snip] > > > > > Comparing your report and [1], it seems that if these are the same > > > > problem, it's not a hardware bug but a gcc or kernel bug. > > > > > > This bug specifically indicates some kind of miscompilation in a > > > driver, causing boot time hangs. My problem is quite different, and > > > more subtle. The crash happens in the same place every time, which does > > > suggest determinism (even with various options toggled on and off, and > > > a 300K smaller kernel image), but it takes 8-12 hours to manifest and > > > only happens with GCC 4.1.1. ... > > > > Sorry if my point goes a bit away from your problem: > > > > My point is that we have several reported problems only visible > > with gcc 4.1. > > > > Other bug reports are e.g. [2] and [3], but they are only present with > > using gcc 4.1 _and_ using -Os. > > I find [2] most compelling, and I can confirm that I do have the same > problem with or without optimisation for size. I don't use selinux nor has > it ever been enabled. > > At any rate, I have absolute confirmation that it is GCC 4.1.1, because > with GCC 3.4.6 the same kernel I reported booting three days ago is still > cheerfully working. I regularly get uptimes of 60+ days on that machine, > rebooting only for kernel upgrades. 2.6.19 seems to be no worse in this > regard. 
> > Perhaps fortunately, the configs I've tried have consistently failed to > shake the crash, so I have a semi-reproducible test case here on C3-2 > hardware if somebody wants to investigate the problem (though it still > takes 6-12 hours). The GCC code generator appears to have been rewritten between 3.4.6 and 4.1.1.... I took a look at the dump he posted and there are some minor and some massive differences between the code. In one case some of the code is swapped, in another there is code in the 3.4.6 version that isn't in the 4.1.1... Finally the 4.1.1 version of the function has what appears to be function calls and these don't appear in the code generated by 3.4.6 In other words - the code generation for 4.1.1 appears to be broken when it comes to generating system code. DRH ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-02 22:06 ` D. Hazelton @ 2007-01-02 23:24 ` Adrian Bunk 2007-01-02 23:41 ` D. Hazelton 0 siblings, 1 reply; 60+ messages in thread From: Adrian Bunk @ 2007-01-02 23:24 UTC (permalink / raw) To: D. Hazelton Cc: Alistair John Strachan, Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert, Linus Torvalds, Andrew Morton On Tue, Jan 02, 2007 at 05:06:14PM -0500, D. Hazelton wrote: > On Tuesday 02 January 2007 16:56, Alistair John Strachan wrote: > > On Tuesday 02 January 2007 21:10, Adrian Bunk wrote: > > [snip] > > > > > > > Comparing your report and [1], it seems that if these are the same > > > > > problem, it's not a hardware bug but a gcc or kernel bug. > > > > > > > > This bug specifically indicates some kind of miscompilation in a > > > > driver, causing boot time hangs. My problem is quite different, and > > > > more subtle. The crash happens in the same place every time, which does > > > > suggest determinism (even with various options toggled on and off, and > > > > a 300K smaller kernel image), but it takes 8-12 hours to manifest and > > > > only happens with GCC 4.1.1. ... > > > > > > Sorry if my point goes a bit away from your problem: > > > > > > My point is that we have several reported problems only visible > > > with gcc 4.1. > > > > > > Other bug reports are e.g. [2] and [3], but they are only present with > > > using gcc 4.1 _and_ using -Os. > > > > I find [2] most compelling, and I can confirm that I do have the same > > problem with or without optimisation for size. I don't use selinux nor has > > it ever been enabled. > > > > At any rate, I have absolute confirmation that it is GCC 4.1.1, because > > with GCC 3.4.6 the same kernel I reported booting three days ago is still > > cheerfully working. I regularly get uptimes of 60+ days on that machine, > > rebooting only for kernel upgrades. 2.6.19 seems to be no worse in this > > regard. 
> > > > Perhaps fortunately, the configs I've tried have consistently failed to > > shake the crash, so I have a semi-reproducible test case here on C3-2 > > hardware if somebody wants to investigate the problem (though it still > > takes 6-12 hours). > > The GCC code generator appears to have been rewritten between 3.4.6 and > 4.1.1.... > > I took a look at the dump he posted and there are some minor and some massive > differences between the code. In one case some of the code is swapped, in > another there is code in the 3.4.6 version that isn't in the 4.1.1... Finally > the 4.1.1 version of the function has what appears to be function calls and > these don't appear in the code generated by 3.4.6 Differences are expected since we disable unit-at-a-time for gcc < 4 and gcc development didn't stall between 3.4 and 4.1. > In other words - the code generation for 4.1.1 appears to be broken when it > comes to generating system code. Bug number for an either already open or created by you bug in the gcc Bugzilla for what you claim to be a bug in gcc? > DRH cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-02 23:24 ` Adrian Bunk @ 2007-01-02 23:41 ` D. Hazelton 2007-01-03 2:05 ` Horst H. von Brand 0 siblings, 1 reply; 60+ messages in thread From: D. Hazelton @ 2007-01-02 23:41 UTC (permalink / raw) To: Adrian Bunk Cc: Alistair John Strachan, Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert, Linus Torvalds, Andrew Morton On Tuesday 02 January 2007 18:24, you wrote: > On Tue, Jan 02, 2007 at 05:06:14PM -0500, D. Hazelton wrote: > > On Tuesday 02 January 2007 16:56, Alistair John Strachan wrote: > > > On Tuesday 02 January 2007 21:10, Adrian Bunk wrote: > > > [snip] > > > > > > > > > Comparing your report and [1], it seems that if these are the > > > > > > same problem, it's not a hardware bug but a gcc or kernel bug. > > > > > > > > > > This bug specifically indicates some kind of miscompilation in a > > > > > driver, causing boot time hangs. My problem is quite different, and > > > > > more subtle. The crash happens in the same place every time, which > > > > > does suggest determinism (even with various options toggled on and > > > > > off, and a 300K smaller kernel image), but it takes 8-12 hours to > > > > > manifest and only happens with GCC 4.1.1. ... > > > > > > > > Sorry if my point goes a bit away from your problem: > > > > > > > > My point is that we have several reported problems only visible > > > > with gcc 4.1. > > > > > > > > Other bug reports are e.g. [2] and [3], but they are only present > > > > with using gcc 4.1 _and_ using -Os. > > > > > > I find [2] most compelling, and I can confirm that I do have the same > > > problem with or without optimisation for size. I don't use selinux nor > > > has it ever been enabled. > > > > > > At any rate, I have absolute confirmation that it is GCC 4.1.1, because > > > with GCC 3.4.6 the same kernel I reported booting three days ago is > > > still cheerfully working. I regularly get uptimes of 60+ days on that > > > machine, rebooting only for kernel upgrades. 
2.6.19 seems to be no > > > worse in this regard. > > > > > > Perhaps fortunately, the configs I've tried have consistently failed to > > > shake the crash, so I have a semi-reproducible test case here on C3-2 > > > hardware if somebody wants to investigate the problem (though it still > > > takes 6-12 hours). > > > > The GCC code generator appears to have been rewritten between 3.4.6 and > > 4.1.1.... > > > > I took a look at the dump he posted and there are some minor and some > > massive differences between the code. In one case some of the code is > > swapped, in another there is code in the 3.4.6 version that isn't in the > > 4.1.1... Finally the 4.1.1 version of the function has what appears to be > > function calls and these don't appear in the code generated by 3.4.6 > > Differences are expected since we disable unit-at-a-time for gcc < 4 > and gcc development didn't stall between 3.4 and 4.1. Okay. Thing is that these noted differences, aside from where 4.1.1 doesn't generate an opcode that 3.4.6 does, aren't all that fatal, IMHO. The fact that it does generate calls rather than jumps for local pointer moves (IIRC - been a while since I looked at the dump of pipe_poll that he provided) might be part of the problem. > > In other words - the code generation for 4.1.1 appears to be broken when > > it comes to generating system code. > > Bug number for an either already open or created by you bug in the gcc > Bugzilla for what you claim to be a bug in gcc? None. I didn't file a report on this because I didn't find the bug, just noted a problem that appears to occur. In this case the calls generated seem to wrap loops - something I've never heard of anyone doing. These *might* be causing the off-by-one that is causing the function to re-enter in the middle of an instruction. Seeing this I'd guess that this follows for all system-level code generated by 4.1.1 and this is exactly what I was reporting. 
If you'd like I'll go dig up the dumps he posted and post the two related segments side-by-side to give you a better example what I'm referring to. DRH ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-02 23:41 ` D. Hazelton @ 2007-01-03 2:05 ` Horst H. von Brand 0 siblings, 0 replies; 60+ messages in thread From: Horst H. von Brand @ 2007-01-03 2:05 UTC (permalink / raw) To: D. Hazelton Cc: Adrian Bunk, Alistair John Strachan, Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert, Linus Torvalds, Andrew Morton D. Hazelton <dhazelton@enter.net> wrote: [...] > None. I didn't file a report on this because I didn't find the bug, just > noted a problem that appears to occur. In this case the calls generated > seem to wrap loops - something I've never heard of anyone doing. Example code showing this weirdness? > These > *might* be causing the off-by-one that is causing the function to > re-enter in the middle of an instruction. If something like this happened, programs would be crashing left and right. > Seeing this I'd guess that this follows for all system-level code > generated by 4.1.1 Define "system-level code". What makes it different from, say, run-of-the-mill compiler code (yes, gcc compiles itself as part of its sanity checking)? > and this is exactly what I was reporting. If you'd > like I'll go dig up the dumps he posted and post the two related segments > side-by-side to give you a better example what I'm referring to. If the related segments show code that is somehow wrong, by all means report it /with your detailed analysis/ to the compiler people. Just a warning, gcc is pretty smart in what it does, its code is often surprising to the unwashed. Also, the C standard is subtle, the error might be in an unwarranted assumption in the source code. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-02 21:56 ` Alistair John Strachan 2007-01-02 22:06 ` D. Hazelton @ 2007-01-02 22:13 ` Linus Torvalds 2007-01-02 23:18 ` Alistair John Strachan 1 sibling, 1 reply; 60+ messages in thread From: Linus Torvalds @ 2007-01-02 22:13 UTC (permalink / raw) To: Alistair John Strachan Cc: Adrian Bunk, Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert, Andrew Morton On Tue, 2 Jan 2007, Alistair John Strachan wrote: > > At any rate, I have absolute confirmation that it is GCC 4.1.1, because with > GCC 3.4.6 the same kernel I reported booting three days ago is still > cheerfully working. I regularly get uptimes of 60+ days on that machine, > rebooting only for kernel upgrades. 2.6.19 seems to be no worse in this > regard. > > Perhaps fortunately, the configs I've tried have consistently failed to shake > the crash, so I have a semi-reproducible test case here on C3-2 hardware if > somebody wants to investigate the problem (though it still takes 6-12 hours). Historically, some people have actually used horrible hacks like trying to figure out which particular C file gets miscompiled by basically having both compilers installed, and then trying out different subdirectories with different compilers. And once the subdirectory has been pinpointed, pinpointing which particular file it is.. etc. Pretty damn horrible to do, and I'm afraid we don't have any real helpful scripts to do any of the work for you. So it's all effectively manual (basically boils down to: "compile everything with known-good compiler. Then replace the good compiler with the bad one, remove the object files from one directory, and recompile the kernel". "Rinse and repeat"). I don't think anybody has ever done that with something where triggering the cause then also takes that long - that just ends up making the whole thing even more painful. What are the exact crash details? 
That might narrow things down enough that maybe you could try just one or two files that are "suspect". Linus ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-02 22:13 ` Linus Torvalds @ 2007-01-02 23:18 ` Alistair John Strachan 2007-01-03 1:43 ` Linus Torvalds 0 siblings, 1 reply; 60+ messages in thread From: Alistair John Strachan @ 2007-01-02 23:18 UTC (permalink / raw) To: Linus Torvalds Cc: Adrian Bunk, Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert, Andrew Morton Linus, On Tuesday 02 January 2007 22:13, Linus Torvalds wrote: [snip] > What are the exact crash details? That might narrow things down enough > that maybe you could try just one or two files that are "suspect". I'll do a digest of the problem for you and anybody else that's lost track of the debugging story so far.. There are no hardware problems evidenced by any testing I have performed (memtest, prime95 CPU torture tests, temp monitors). Furthermore, kernels compiled with older GCCs have been running without problems for literally years on this machine. Here is an example of an oops. The kernel continued to limp along after this. 
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000009 printing eip: c0156f60 *pde = 00000000 Oops: 0002 [#1] Modules linked in: ipt_recent ipt_REJECT xt_tcpudp ipt_MASQUERADE iptable_nat xt_state iptable_filter ip_tables x_tables prism54 yenta_socket rsrc_nonstatic pcmcia_core snd_via82xx snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd soundcore ehci_hcd usblp eth1394 uhci_hcd usbcore ohci1394 ieee1394 via_agp agpgart vt1211 hwmon_vid hwmon ip_nat_ftp ip_nat ip_conntrack_ftp ip_conntrack CPU: 0 EIP: 0060:[<c0156f60>] Not tainted VLI EFLAGS: 00010246 (2.6.19.1 #1) EIP is at pipe_poll+0xa0/0xb0 eax: 00000008 ebx: 00000000 ecx: 00000008 edx: 00000000 esi: f70f3e9c edi: f7017c00 ebp: f70f3c1c esp: f70f3c0c ds: 007b es: 007b ss: 0068 Process python (pid: 4178, ti=f70f2000 task=f70c4a90 task.ti=f70f2000) Stack: 00000000 00000000 f70f3e9c f6e111c0 f70f3fa4 c015d7f3 f70f3c54 f70f3fac 084c44a0 00000030 084c44d0 00000000 f70f3e94 f70f3e94 00000006 f70f3ecc 00000000 f70f3e94 c015e580 00000000 00000000 00000006 f6e111c0 00000000 Call Trace: [<c015d7f3>] do_sys_poll+0x253/0x480 [<c015da53>] sys_poll+0x33/0x50 [<c0102c97>] syscall_call+0x7/0xb [<b7f6b402>] 0xb7f6b402 ======================= Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8 8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f 45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00 EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:f70f3c0c Chuck observed that the kernel tries to reenter pipe_poll half way through an instruction (c0156f5f->c0156f60); it's not a single-bit error but an off-by-one. On Wednesday 20 December 2006 20:48, Chuck Ebbert wrote: > In-Reply-To: <200612201421.03514.s0348365@sms.ed.ac.uk> > > On Wed, 20 Dec 2006 14:21:03 +0000, Alistair John Strachan wrote: > > Any ideas? 
> > > > BUG: unable to handle kernel NULL pointer dereference at virtual address > > 00000009 > > 83 ca 10 or $0x10,%edx > 3b .byte 0x3b > 87 68 01 xchg %ebp,0x1(%eax) <===== > 00 00 add %al,(%eax) > > Somehow it is trying to execute code in the middle of an instruction. > That almost never works, even when the resulting fragment is a legal > opcode. :) > > The real instruction is: > > 3b 87 68 01 00 00 cmp 0x168(%edi),%eax I've tried a multitude of kernel configs and compiler options, but none have made any difference. That first oops was pretty lucky; very often the machine locks up after oopsing (panic_on_oops=1 doesn't work). I've not seen oopses anywhere but in pipe_poll, but I've not seen many oopses. The machine runs jabberd 2.x which uses separate python processes as transports to different networks. The server hosts 50-100 users. One of my oops reports had Java crashing in the same place; that was Azureus. I've got binutils 2.17, gcc 4.1.1 hand bootstrapped from GNU sources (not distro versions). I've got another, secondary compiler (3.4.6), also compiled from GNU sources, installed elsewhere which I have used to build working kernels. So the only variable, for sure, is GCC itself. Both compilers were built with "make bootstrap" and I built binutils with the resulting GCC, and GCC with the resulting binutils, just to be sure. The only slightly non-standard thing I do is to compile everything (GCC, binutils, the kernels) on a dual-opteron box, inside a 32bit chroot, which is rsync'ed over to the Via C3-2 box with the problem. I can't see how this would cause any problems (and indeed have done it successfully for years), but I thought I'd point it out. The crashes take time to appear, which is why so many people suspected hardware initially. But the uptime of a GCC 4.1.1 kernel will always be less than 12 hours, where a 3.4.6 kernel will survive for months. I've had no other mysterious software crashes, ever. 
On Sunday 31 December 2006 22:16, Alistair John Strachan wrote: > On Sunday 31 December 2006 21:43, Chuck Ebbert wrote: > > Those were compiled without frame pointers. Can you post them compiled > > with frame pointers so they match your original bug report? And confirm > > that pipe_poll() is still at 0xc0156ec0 in vmlinux? > > c0156ec0 <pipe_poll>: > > I used the config I originally sent you to rebuild it again. This time I've > put up the whole vmlinux for both kernels, the config is replaced, the > decompilation is re-done, I've confirmed the offset in the GCC 4.1.1 kernel > is identical. Sorry for the confusion. [snip] > http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/ At the above URL can be found vmlinux images, the config used to build both, and decompilations of the fs/pipe.o file (with relocation information). The suggestions I've had so far which I have not yet tried: - Select a different x86 CPU in the config. - Unfortunately the C3-2 flags seem to simply tell GCC to schedule for ppro (like i686) and enable MMX and SSE - Probably useless - Enable as many debug options as possible ("a shot in the dark") - Try compiling a minimal kernel config, sans modules that are not required for booting. The problem with this one (whilst it might uncover some bizarre memory scribbling or stack corruption) is that the machine's primary role is that of a router, so I require most of the modules loaded for the oops to be reproduced (chicken, egg?). If I can provide any more information, please do let me know. -- Cheers, Alistair. Final year Computer Science undergraduate. 1F2 55 South Clerk Street, Edinburgh, UK. ^ permalink raw reply [flat|nested] 60+ messages in thread
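The mis-decode Chuck describes in the message above can be checked by hand from the bytes in the oops. The following is only a sketch in C of that arithmetic, not anything posted in the thread: the helper names (split_modrm, intended_disp32, misdecoded_address) are made up for illustration, and the only inputs are the six bytes at c0156f5f and the eax value (00000008) from the register dump. It splits the x86 ModRM byte into its fields for both decodes: starting at byte 0 the bytes form cmp 0x168(%edi),%eax, while starting one byte later they form xchg %ebp,0x1(%eax).

```c
#include <assert.h>
#include <stdint.h>

/* The six bytes at c0156f5f, as shown in the oops "Code:" line. */
static const uint8_t insn[6] = { 0x3b, 0x87, 0x68, 0x01, 0x00, 0x00 };

struct modrm { unsigned mod, reg, rm; };

/* Split an x86 ModRM byte into mod (2 bits), reg (3 bits), rm (3 bits). */
static struct modrm split_modrm(uint8_t b)
{
    struct modrm m = { (b >> 6) & 3u, (b >> 3) & 7u, b & 7u };
    return m;
}

/*
 * Intended decode, starting at byte 0: opcode 0x3b is "cmp r32, r/m32".
 * ModRM 0x87 is mod=10 (disp32 follows), reg=000 (%eax), rm=111 (%edi).
 * Returns the little-endian 32-bit displacement.
 */
static uint32_t intended_disp32(void)
{
    struct modrm m = split_modrm(insn[1]);
    assert(insn[0] == 0x3b && m.mod == 2 && m.reg == 0 && m.rm == 7);
    return (uint32_t)insn[2] | ((uint32_t)insn[3] << 8) |
           ((uint32_t)insn[4] << 16) | ((uint32_t)insn[5] << 24);
}

/*
 * Decode starting one byte late, at c0156f60: opcode 0x87 is "xchg".
 * ModRM 0x68 is mod=01 (disp8 follows), reg=101 (%ebp), rm=000 (%eax),
 * so the memory operand is disp8(%eax). Returns the address it touches.
 */
static uint32_t misdecoded_address(uint32_t eax)
{
    struct modrm m = split_modrm(insn[2]);
    assert(insn[1] == 0x87 && m.mod == 1 && m.reg == 5 && m.rm == 0);
    return eax + (uint32_t)(int32_t)(int8_t)insn[3]; /* sign-extended disp8 */
}
```

With eax = 0x00000008 from the register dump, misdecoded_address() gives 0x00000009, which matches the "virtual address 00000009" in the BUG line. An xchg with a memory operand also writes to memory, which appears consistent with the "Oops: 0002" error code (a kernel-mode write to a non-present page).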
* Re: kernel + gcc 4.1 = several problems 2007-01-02 23:18 ` Alistair John Strachan @ 2007-01-03 1:43 ` Linus Torvalds 0 siblings, 0 replies; 60+ messages in thread From: Linus Torvalds @ 2007-01-03 1:43 UTC (permalink / raw) To: Alistair John Strachan Cc: Adrian Bunk, Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert, Andrew Morton On Tue, 2 Jan 2007, Alistair John Strachan wrote: > > eax: 00000008 ebx: 00000000 ecx: 00000008 edx: 00000000 > esi: f70f3e9c edi: f7017c00 ebp: f70f3c1c esp: f70f3c0c > > Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8 > 8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f > 45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00 > EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:f70f3c0c > > Chuck observed that the kernel tries to reenter pipe_poll half way through an > instruction (c0156f5f->c0156f60); it's not a single-bit error but an > off-by-one. It's not an off-by-one either (eg say we're taking an exception and screwing up %eip by one somehow). The code sequence in question is
	mov    %ecx,%edx
	mov    0x6c(%esi),%eax
	or     $0x10,%edx
	cmp    0x168(%edi),%eax    <--
	cmovne %edx,%ecx
	jmp    ...
and it's in the second byte of the "cmp". And yes, it definitely entered there, because trying other random entry-points will have either invalid instructions or instructions that would fault due to NULL pointers. HOWEVER, it's also not as simple as "took an interrupt, and returned with %eip incremented by one", because your %edx is zero, so it won't have done that "or $0x10,%edx" and then some interrupt happened and screwed up just %eip. So it's literally a random %eip, but since you say it's consistently in that function, it's not truly "random". There's something that triggers it just _there_. However, that's a damn simple function. There's _nothing_ there. The particular code that is involved right there is literally
	if (!pipe->writers && filp->f_version != pipe->w_counter)
		mask |= POLLHUP;
and that's it. 
There's not even anything half-way interesting around it, except for the "poll_wait()" call, but even that is about as common as you can humanly get.. Looking at the register set and the stack, I see:
	Stack:	00000000
		00000000	<- saved %ebx (dunno, seems dead in caller)
		f70f3e9c	<- saved %esi (== pollfd in do_pollfd)
		f6e111c0	<- saved %edi (== filp)
		f70f3fa4	<- outer EBP (looks reasonable)
		c015d7f3	<- return address (do_sys_poll+0x253/0x480)
and the strange thing is that when the oops happens, it really looks like %esi _still_ contains the value it had originally (and that is saved on the stack). But afaik, from your disassembly, it should have been overwritten by the initial %eax, which should have had the same value as %edi on entry... IOW, none of it really makes any sense. The stack frames look fine, so we _did_ enter at the beginning of the function (and it wasn't the *poll fn pointer that was corrupt). > The suggestions I've had so far which I have not yet tried: > > - Select a different x86 CPU in the config. > - Unfortunately the C3-2 flags seem to simply tell GCC > to schedule for ppro (like i686) and enabled MMX and SSE > - Probably useless Actually, try this one. Try using something that doesn't like "cmov". Maybe the C3-2 simply has some internal cmov bugginess. Linus ^ permalink raw reply [flat|nested] 60+ messages in thread
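For readers who want to see how the source line Linus quotes relates to the cmov sequence, here is a rough, self-contained sketch in C. It is an illustration under stated assumptions, not the kernel's code: the struct and function names are hypothetical stand-ins, SKETCH_POLLHUP is defined locally as 0x10 (the value of POLLHUP on Linux, matching the "or $0x10"), and the mapping of 0x6c(%esi) to filp->f_version and 0x168(%edi) to pipe->w_counter is a guess from the message, not something the thread confirms. The second function mirrors the emitted instructions one by one: the POLLHUP variant of the mask is computed unconditionally, and the compare only selects which value is kept.

```c
#include <assert.h>

/* Local stand-in for POLLHUP; 0x10 is its value on Linux. */
#define SKETCH_POLLHUP 0x10u

/* Hypothetical stand-ins for the two fields the quoted line reads. */
struct sketch_pipe { int writers; unsigned long w_counter; };
struct sketch_file { unsigned long f_version; };

/* The source form, as quoted in the message above. */
static unsigned mask_branch(unsigned mask, const struct sketch_pipe *pipe,
                            const struct sketch_file *filp)
{
    if (!pipe->writers && filp->f_version != pipe->w_counter)
        mask |= SKETCH_POLLHUP;
    return mask;
}

/*
 * The same logic written in the shape of the emitted sequence. The test
 * of pipe->writers is assumed to be the branch that leads into the block;
 * the register comments are an assumed mapping, labeled as such.
 */
static unsigned mask_select(unsigned mask, const struct sketch_pipe *pipe,
                            const struct sketch_file *filp)
{
    if (!pipe->writers) {
        unsigned with_hup = mask;                 /* mov    %ecx,%edx        */
        unsigned long version = filp->f_version;  /* mov    0x6c(%esi),%eax  */
        with_hup |= SKETCH_POLLHUP;               /* or     $0x10,%edx       */
        if (version != pipe->w_counter)           /* cmp    0x168(%edi),%eax */
            mask = with_hup;                      /* cmovne %edx,%ecx        */
    }
    return mask;
}
```

Both functions return the same mask for every input. The point of the second form is that the "or" runs before the compare, which is why an %edx of zero at the time of the oops suggests the preceding instructions in the block had not executed.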
* Re: kernel + gcc 4.1 = several problems 2007-01-02 21:10 ` kernel + gcc 4.1 = several problems Adrian Bunk 2007-01-02 21:56 ` Alistair John Strachan @ 2007-01-02 22:01 ` Linus Torvalds 2007-01-02 23:09 ` David Rientjes 1 sibling, 1 reply; 60+ messages in thread From: Linus Torvalds @ 2007-01-02 22:01 UTC (permalink / raw) To: Adrian Bunk Cc: Alistair John Strachan, Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert, Andrew Morton On Tue, 2 Jan 2007, Adrian Bunk wrote: > > My point is that we have several reported problems only visible > with gcc 4.1. > > Other bug reports are e.g. [2] and [3], but they are only present with > using gcc 4.1 _and_ using -Os. Traditionally, afaik, -Os has tended to show compiler problems that _could_ happen with -O2 too, but never do in practice. It may be that gcc-4.1 without -Os miscompiles some very unusual code, and then with -Os we just hit more cases of that. That said, I think gcc-4.1.1 is very common - I know it's the Fedora compiler. Also, CC_OPTIMIZE_FOR_SIZE defaults to 'y' if you have EXPERIMENTAL on, and from all the bug-reports about other features that are marked EXPERIMENTAL, I know that a lot of people do seem to select for it. So I would expect that gcc-4.1.1 and -Os is actually a fairly common combination. I just checked, and it's what I use personally, for example. Of course, my main machine is an x86-64, and it has more registers. At least some historical -Os bug was about bad things happening under register pressure, iirc, and so x86-64 would show fewer problems than regular 32-bit x86 (which has far fewer registers for the compiler to use). It is a bit worrisome. These things seem to be about 50:50 real kernel bugs (just hidden by some common code generation sequence) and real honest-to-goodness compiler bugs. But they are hard as hell to find. Linus ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: kernel + gcc 4.1 = several problems 2007-01-02 22:01 ` Linus Torvalds @ 2007-01-02 23:09 ` David Rientjes 0 siblings, 0 replies; 60+ messages in thread From: David Rientjes @ 2007-01-02 23:09 UTC (permalink / raw) To: Linus Torvalds Cc: Adrian Bunk, Alistair John Strachan, Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert, Andrew Morton On Tue, 2 Jan 2007, Linus Torvalds wrote: > Traditionally, afaik, -Os has tended to show compiler problems that > _could_ happen with -O2 too, but never do in practice. It may be that > gcc-4.1 without -Os miscompiles some very unusual code, and then with -Os > we just hit more cases of that. > gcc optimizations were almost completely rewritten between 3.4.6 and 4.1, and one of the subtle changes that may have been introduced is with regard to the heuristics used to determine whether to inline an 'inline' function or not when using -Os. This problem can show up in dynamic linking and break on certain architectures but should be detectable by using -Winline. David ^ permalink raw reply [flat|nested] 60+ messages in thread