linux-kernel.vger.kernel.org archive mirror
* Re: kernel + gcc 4.1 = several problems
@ 2007-01-04  7:11 Albert Cahalan
  2007-01-04 16:43 ` Segher Boessenkool
  0 siblings, 1 reply; 60+ messages in thread
From: Albert Cahalan @ 2007-01-04  7:11 UTC (permalink / raw)
  To: mikpe, s0348365, torvalds, linux-kernel, akpm, bunk

Linus Torvalds writes:
> [probably Mikael Pettersson] writes:

>> The suggestions I've had so far which I have not yet tried:
>>
>> - Select a different x86 CPU in the config.
>>   - Unfortunately the C3-2 flags seem to simply tell GCC to
>>     schedule for ppro (like i686) and enable MMX and SSE
>>   - Probably useless
>
> Actually, try this one. Try using something that doesn't like "cmov".
> Maybe the C3-2 simply has some internal cmov bugginess.

Of course that changes register usage, register spilling,
and thus ultimately even the stack layout. :-(

Adjusting gcc flags to eliminate optimizations is another way to go.
Adding -fwrapv would be an excellent start. Lack of this flag breaks
most code which checks for integer wrap-around. The compiler "knows"
that signed integers don't ever wrap, and thus eliminates any code
which checks for values going negative after a wrap-around. I could
imagine this affecting a switch() or other jump table.
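
A minimal sketch of the kind of check at issue. The function names are made up for illustration and are not taken from the kernel; the first variant is the pattern that depends on wrap-around, the second asks the same question without ever overflowing:

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Relies on signed wrap-around. Without -fwrapv the overflow in a + b is
 * undefined, so the compiler may assume it never happens and is allowed
 * to fold this test to "false".
 * Shown for comparison only; do not call it with values that overflow.
 */
static bool add_wraps_fragile(int a, int b)
{
	return b > 0 && a + b < a;
}

/*
 * Same question asked without overflowing: compare against the headroom
 * left below INT_MAX (or above INT_MIN) before adding.
 */
static bool add_would_overflow(int a, int b)
{
	if (b > 0)
		return a > INT_MAX - b;
	return a < INT_MIN - b;
}
```

The second form gives the same answer with or without -fwrapv, at the cost of being less obvious to read.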


* Re: kernel + gcc 4.1 = several problems
  2007-01-04  7:11 kernel + gcc 4.1 = several problems Albert Cahalan
@ 2007-01-04 16:43 ` Segher Boessenkool
  2007-01-04 17:04   ` Albert Cahalan
  0 siblings, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2007-01-04 16:43 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: akpm, linux-kernel, s0348365, bunk, mikpe, torvalds

> Adjusting gcc flags to eliminate optimizations is another way to go.
> Adding -fwrapv would be an excellent start. Lack of this flag breaks
> most code which checks for integer wrap-around.

Lack of the flag does not break any valid C code, only code
making unwarranted assumptions (i.e., buggy code).

> The compiler "knows"
> that signed integers don't ever wrap, and thus eliminates any code
> which checks for values going negative after a wrap-around.

You cannot assume it eliminates such code; the compiler is free
to do whatever it wants in such a case.

You should typically write such a computation using unsigned
types, FWIW.
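
As a rough sketch of what that looks like (the helper name is hypothetical), an addition on unsigned operands can simply be checked after the fact:

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Unsigned arithmetic is defined to wrap modulo 2^N, so the result of
 * the addition can be inspected afterwards: it wrapped exactly when the
 * sum is smaller than one of the operands.
 */
static bool uadd_wrapped(unsigned int a, unsigned int b, unsigned int *sum)
{
	*sum = a + b;
	return *sum < a;
}
```

No compiler flag changes the meaning of this code, which is the reason to prefer it.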

Anyway, with 4.1 you shouldn't see frequent problems due to
"not using -fwrapv while my code is broken WRT signed overflow"
yet; and if/when problems start to happen, the "correct" action
to take is not to add the compiler flag, but to fix the code.


Segher



* Re: kernel + gcc 4.1 = several problems
  2007-01-04 16:43 ` Segher Boessenkool
@ 2007-01-04 17:04   ` Albert Cahalan
  2007-01-04 17:24     ` Segher Boessenkool
                       ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread
From: Albert Cahalan @ 2007-01-04 17:04 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: akpm, linux-kernel, s0348365, bunk, mikpe, torvalds

On 1/4/07, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > Adjusting gcc flags to eliminate optimizations is another way to go.
> > Adding -fwrapv would be an excellent start. Lack of this flag breaks
> > most code which checks for integer wrap-around.
>
> Lack of the flag does not break any valid C code, only code
> making unwarranted assumptions (i.e., buggy code).

Right, if "C" means "strictly conforming ISO C" to you.
(in which case, nearly all real-world code is broken)

FYI, the kernel also assumes that a "char" is 8 bits.
Maybe you should run away screaming.

> > The compiler "knows"
> > that signed integers don't ever wrap, and thus eliminates any code
> > which checks for values going negative after a wrap-around.
>
> You cannot assume it eliminates such code; the compiler is free
> to do whatever it wants in such a case.
>
> You should typically write such a computation using unsigned
> types, FWIW.
>
> Anyway, with 4.1 you shouldn't see frequent problems due to

Right, it gets much worse with the current gcc snapshots.

IMHO you should play such games with "g++ -O9", but that's
a discussion for a different mailing list.

> "not using -fwrapv while my code is broken WRT signed overflow"
> yet; and if/when problems start to happen, the "correct" action
> to take is not to add the compiler flag, but to fix the code.

Nope, unless we decide that the performance advantages of
a language change are worth the risk and pain.


* Re: kernel + gcc 4.1 = several problems
  2007-01-04 17:04   ` Albert Cahalan
@ 2007-01-04 17:24     ` Segher Boessenkool
  2007-01-04 17:47       ` Linus Torvalds
  2007-01-05 17:17       ` Pavel Machek
  2007-01-04 17:37     ` Linus Torvalds
  2007-01-04 18:08     ` Andreas Schwab
  2 siblings, 2 replies; 60+ messages in thread
From: Segher Boessenkool @ 2007-01-04 17:24 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: akpm, linux-kernel, s0348365, bunk, mikpe, torvalds

>> Lack of the flag does not break any valid C code, only code
>> making unwarranted assumptions (i.e., buggy code).
>
> Right, if "C" means "strictly conforming ISO C" to you.

Without any further qualification, it of course does, yes.

> (in which case, nearly all real-world code is broken)

Not "nearly all" -- but lots of code, yes.

> FYI, the kernel also assumes that a "char" is 8 bits.
> Maybe you should run away screaming.

No, that's fine with me.  It's fine with GCC as well
of course.

>> Anyway, with 4.1 you shouldn't see frequent problems due to
>
> Right, it gets much worse with the current gcc snapshots.

Yes.  And that problem will be fixed some way pretty soon --
simply because it _has_ to be fixed.

> IMHO you should play such games with "g++ -O9", but that's
> a discussion for a different mailing list.

For a different mailing list indeed; let me just point out
that for certain important quite common cases it's an ~50%
overall speedup.

>> "not using -fwrapv while my code is broken WRT signed overflow"
>> yet; and if/when problems start to happen, the "correct" action
>> to take is not to add the compiler flag, but to fix the code.
>
> Nope, unless we decide that the performance advantages of
> a language change are worth the risk and pain.

If the kernel breaks all over the place, of course you should add
the flag.  But it won't, it would break *all* programs all over
the place then, and that wouldn't be acceptable to GCC.  If instead
only a few kernel code bugs pop up, it's easy to fix.

Aaaaanyway -- my only real point was to point out that there's
no doomsday scenario here, yes current GCC TOT seems to regress
here (for some definition of that word), but GCC development
is in stage 1, that sort of thing happens.  It'll stabilise
again.

In the meantime, building git HEAD kernels with GCC 4.1 and
4.2 will probably rattle out quite a few bugs still, both
in the kernel and in GCC -- neither is used all that often
it seems?


Segher



* Re: kernel + gcc 4.1 = several problems
  2007-01-04 17:04   ` Albert Cahalan
  2007-01-04 17:24     ` Segher Boessenkool
@ 2007-01-04 17:37     ` Linus Torvalds
  2007-01-04 18:34       ` Segher Boessenkool
  2007-01-07  4:25       ` Denis Vlasenko
  2007-01-04 18:08     ` Andreas Schwab
  2 siblings, 2 replies; 60+ messages in thread
From: Linus Torvalds @ 2007-01-04 17:37 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: Segher Boessenkool, akpm, linux-kernel, s0348365, bunk, mikpe



On Thu, 4 Jan 2007, Albert Cahalan wrote:

> On 1/4/07, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >
> > Lack of the flag does not break any valid C code, only code
> > making unwarranted assumptions (i.e., buggy code).
> 
> Right, if "C" means "strictly conforming ISO C" to you.
> (in which case, nearly all real-world code is broken)

Indeed. The gcc people seem to often think that "language lawyering" is a 
good idea, and totally overrides "real world". The whole flap about the 
completely idiotic things they do (or at least did) for alias analysis on 
the grounds that "they can" is an example of this.

> FYI, the kernel also assumes that a "char" is 8 bits.
> Maybe you should run away screaming.

Gcc people are quick to condemn others for assumptions that break 
standards, but it has tons of assumptions very deeply embedded itself. I 
don't think it could realistically work very well on setups where pointers 
aren't the same size as long, and it has various deep assumptions itself 
about what is "realistic".

The kernel does the same. Some of it intentional and by design, much of it 
probably totally unintentional, but the result of "it worked, and nobody 
even thought about anything else". 

With 7+ million lines of C code and headers, I'm not interested in 
compilers that read the letter of the law. We don't want some really 
clever code generation that gets us .5% on some unrealistic load. We want 
good _solid_ code generation that does the obvious thing.

Compiler writers seem to seldom even realize this. A lot of commercial 
code gets shipped with basically no optimizations at all (or with specific 
optimizations turned off), because people want to ship what they debug and 
work with.

I'll happily turn off compiler features that are "clever optimizations 
that never actually matter in practice, but are just likely to possibly 
cause problems".

The sad part is that "straightforward optimizations" (as opposed to 
"really clever ones") often work better in practice too. At least with 
kernel code, which is not that high-level to begin with. 

> > to take is not to add the compiler flag, but to fix the code.
> 
> Nope, unless we decide that the performance advantages of
> a language change are worth the risk and pain.

Indeed. We'd happily fix the code if:
 (a) it's reasonably easy to find places that are buggy.
 (b) there are syntactically sane ways to fix it
 (c) the optimization actually makes sense and is worthwhile

An example of where _none_ of these things were true was the old gcc alias 
analysis. I think gcc eventually added a sane way to mark pointers as 
being possible aliases (ie case (b): give a syntactically acceptable way 
for code maintainability to actually fix things), but since neither (a) 
nor (c) are there, the _correct_ solution was just to tell the compiler to 
stop doing that.
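
For reference, a sketch of what such a marking can look like. This assumes the mechanism meant here is GCC's may_alias type attribute; the typedef and function names are invented for the example, and it assumes a 4-byte IEEE 754 float:

```c
#include <stdint.h>
#include <string.h>

/* GCC extension: accesses through this type may alias any other type. */
typedef uint32_t __attribute__((may_alias)) u32_alias;

/* Reads the bits of a float through a may_alias pointer. */
static uint32_t float_bits_alias(const float *f)
{
	return *(const u32_alias *)f;
}

/* Portable alternative: copy the bytes instead of punning the pointer. */
static uint32_t float_bits_memcpy(const float *f)
{
	uint32_t bits;

	memcpy(&bits, f, sizeof(bits));
	return bits;
}
```

Either form tells the compiler about the aliasing explicitly; building with -fno-strict-aliasing instead switches the type-based analysis off for the whole translation unit.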

With integer overflow optimizations, the same situation may be true. The 
kernel has never been "strict ANSI C". We've always used C extensions. The 
extension of "signed integer arithmetic follows 2's-complement-arithmetic" 
is a perfectly sane extension to the language, and quite possibly worth 
it.

And the fact that it's not "strict ANSI C" has absolutely _zero_ 
relevance.

		Linus


* Re: kernel + gcc 4.1 = several problems
  2007-01-04 17:24     ` Segher Boessenkool
@ 2007-01-04 17:47       ` Linus Torvalds
  2007-01-04 18:53         ` Segher Boessenkool
  2007-01-04 19:10         ` Al Viro
  2007-01-05 17:17       ` Pavel Machek
  1 sibling, 2 replies; 60+ messages in thread
From: Linus Torvalds @ 2007-01-04 17:47 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Albert Cahalan, akpm, linux-kernel, s0348365, bunk, mikpe



On Thu, 4 Jan 2007, Segher Boessenkool wrote:
> 
> > (in which case, nearly all real-world code is broken)
> 
> Not "nearly all" -- but lots of code, yes.

I wouldn't say "lots of code". I would say "all real projects".

NOBODY will guarantee you that they follow all standards to the letter. 
Some use compiler extensions knowingly, but pretty much _everybody_ ends 
up depending on subtle issues without even realizing it. It's almost 
impossible to write a real program that has no bugs, and if they don't 
show up in testing (because the compiler didn't generate buggy assembly 
code from source code that had the _potential_ for bugs), they often won't 
get fixed.

The kernel does things like compare pointers across objects, and the 
kernel EXPECTS it to work. I seriously doubt that the kernel is even 
unusual in this. The common way to avoid AB-BA deadlocks in any threaded 
code (whether kernel or user space) is to just take two locks in a 
specific order, and the common way to do that for locks of the same type 
is simply to compare the addresses.

The fact that this is "undefined" behaviour matters not a _whit_. Not for 
the kernel, and I bet not for a lot of other applications either.
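
A toy sketch of that address-ordering idiom. The toy_lock type is a hypothetical stand-in that only records acquisition order, not a real kernel or pthread lock, and the comparison is routed through uintptr_t rather than comparing the raw pointers:

```c
#include <stdint.h>

/* Hypothetical stand-in for a real lock; it only records acquisition order. */
struct toy_lock {
	int held;
	int acquired_seq;
};

static int next_seq;

static void toy_lock_acquire(struct toy_lock *l)
{
	l->held = 1;
	l->acquired_seq = ++next_seq;
}

/*
 * Take two locks of the same type in address order, so that every
 * caller agrees on the order no matter how the arguments are passed.
 * Comparing the raw pointers with "<" is the part the standard leaves
 * undefined for unrelated objects.
 */
static void toy_lock_pair(struct toy_lock *a, struct toy_lock *b)
{
	if (a == b) {
		toy_lock_acquire(a);
		return;
	}
	if ((uintptr_t)a > (uintptr_t)b) {
		struct toy_lock *tmp = a;

		a = b;
		b = tmp;
	}
	toy_lock_acquire(a);
	toy_lock_acquire(b);
}
```

Calling toy_lock_pair(&x, &y) and toy_lock_pair(&y, &x) acquires the two locks in the same order, which is what rules out the AB-BA case.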

So "nearly all" is probably _understating_ things rather than overstating 
it as you claim. Anybody who thinks that they have proven the correctness 
of their program is likely lying. It's a good thing if they have _tested_ 
all the code-paths, but they've invariably been tested with a compiler 
that doesn't go out of its way to try to generate "legal but idiotic" 
code. So the testing won't generally find cases where the compiler may 
have been _allowed_ to do something else.

The end result: any nontrivial project always has dodgy code. Because 
people simply don't write perfect code.

Compiler people who don't realize this aren't compiler people. They're 
academics involved with mental masturbation.

		Linus


* Re: kernel + gcc 4.1 = several problems
  2007-01-04 17:04   ` Albert Cahalan
  2007-01-04 17:24     ` Segher Boessenkool
  2007-01-04 17:37     ` Linus Torvalds
@ 2007-01-04 18:08     ` Andreas Schwab
  2 siblings, 0 replies; 60+ messages in thread
From: Andreas Schwab @ 2007-01-04 18:08 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: Segher Boessenkool, akpm, linux-kernel, s0348365, bunk, mikpe, torvalds

"Albert Cahalan" <acahalan@gmail.com> writes:

> FYI, the kernel also assumes that a "char" is 8 bits.
> Maybe you should run away screaming.

You are confusing "undefined" with "implementation defined".  Those are
two quite different concepts.
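
A small sketch of the difference, with invented names: an implementation-defined property can be queried and checked at build time, while an undefined operation can only be avoided.

```c
#include <limits.h>

/*
 * Implementation-defined: the implementation must pick a value and
 * document it, so the assumption can be checked when compiling.
 * The negative array size makes the build fail if char is not 8 bits.
 */
typedef char assert_char_is_8_bits[(CHAR_BIT == 8) ? 1 : -1];

/*
 * Undefined: there is nothing to query. INT_MAX + 1 has no documented
 * result at all, so the only option is never to evaluate it.
 */
static int increment_saturating(int x)
{
	return (x == INT_MAX) ? INT_MAX : x + 1;
}
```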

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: kernel + gcc 4.1 = several problems
  2007-01-04 17:37     ` Linus Torvalds
@ 2007-01-04 18:34       ` Segher Boessenkool
  2007-01-04 22:02         ` Geert Bosch
  2007-01-07  4:25       ` Denis Vlasenko
  1 sibling, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2007-01-04 18:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: akpm, Albert Cahalan, linux-kernel, s0348365, bunk, mikpe

> I'll happily turn off compiler features that are "clever optimizations
> that never actually matter in practice, but are just likely to possibly
> cause problems".

The "signed wrap is undefined" thing doesn't fit in this category
though:

-- It is an important optimisation for loops with a signed
    induction variable;
-- "Random code" where it causes problems is typically buggy
    already (i.e., code that doesn't take overflow into account
    at all won't expect wraparound either);
-- Code that explicitly depends on signed overflow two's complement
    wraparound can be trivially converted to use unsigned arithmetic
    (and in almost all cases it really should have used that already).
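
To illustrate the third point, here is a sketch of converting a wraparound-dependent comparison of two counters to unsigned arithmetic. The function names are hypothetical and 32-bit counters are assumed:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Fragile version: a - b overflows (undefined behaviour) once the two
 * counters are far enough apart, e.g. after one of them has wrapped.
 */
static bool seq_before_fragile(int32_t a, int32_t b)
{
	return a - b < 0;
}

/*
 * Do the subtraction on unsigned values, where wrap-around is defined,
 * and test the top bit of the difference instead of its sign.
 */
static bool seq_before(uint32_t a, uint32_t b)
{
	return ((uint32_t)(a - b) & UINT32_C(0x80000000)) != 0;
}
```

Both versions agree whenever the signed subtraction does not overflow; only the unsigned one has a defined result in the remaining cases.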

If GCC can generate warnings for things in the second bullet point
(and it probably will, but nothing is finalised yet), I don't see
a reason for the kernel to turn off the optimisation.  Why not try
it out and only _if_ it causes troubles (after the compiler version
is stable) turn it off.

>>> to take is not to add the compiler flag, but to fix the code.
>>
>> Nope, unless we decide that the performance advantages of
>> a language change are worth the risk and pain.

But it's not a language change -- GCC has worked like this
for a _long_ time already, since May 2003 if I read the
ChangeLog correctly -- it's just that it starts to optimise
some things more aggressively now.

> With integer overflow optimizations, the same situation may be true. The
> kernel has never been "strict ANSI C". We've always used C extensions. The
> extension of "signed integer arithmetic follows 2's-complement-arithmetic"
> is a perfectly sane extension to the language, and quite possibly worth
> it.

Could be.  Who knows, without testing.  I'm just saying to
not add -fwrapv purely as a knee-jerk reaction.

> And the fact that it's not "strict ANSI C" has absolutely _zero_
> relevance.

I certainly never claimed so, that's all in Albert's mind it seems :-)


Segher



* Re: kernel + gcc 4.1 = several problems
  2007-01-04 17:47       ` Linus Torvalds
@ 2007-01-04 18:53         ` Segher Boessenkool
  2007-01-04 19:10         ` Al Viro
  1 sibling, 0 replies; 60+ messages in thread
From: Segher Boessenkool @ 2007-01-04 18:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: akpm, Albert Cahalan, linux-kernel, s0348365, bunk, mikpe

>>> (in which case, nearly all real-world code is broken)
>>
>> Not "nearly all" -- but lots of code, yes.
>
> I wouldn't say "lots of code". I would say "all real projects".

All projects that tell the compiler they're written in ISO C,
while they're not, can easily break, sure.  You can't say this
is GCC's fault; sure in some cases decisions were made that
resulted in more of those programs breaking than was really
necessary, but it's obviously *impossible* to prevent all
from breaking.

And yes it's true: most people do not program in ISO C at all,
_even if they think they do_, simply because they are not aware
of all the rules.  For some of the areas where most of the
mistakes are made, for example aliasing rules and signed overflow,
GCC provides helpful options to switch behaviour to something
that makes those people's programs work.  You can also use those
options if you have made a conscious decision that you want to
write your code in one of the resulting dialects of C.


Segher

p.s.  If it's decided to not use -fwrapv, a debug option that
sets -ftrapv can be introduced -- it will make it a BUG() if
any (accidental) signed overflow happens after all.



* Re: kernel + gcc 4.1 = several problems
  2007-01-04 17:47       ` Linus Torvalds
  2007-01-04 18:53         ` Segher Boessenkool
@ 2007-01-04 19:10         ` Al Viro
  1 sibling, 0 replies; 60+ messages in thread
From: Al Viro @ 2007-01-04 19:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Segher Boessenkool, Albert Cahalan, akpm, linux-kernel, s0348365,
	bunk, mikpe

On Thu, Jan 04, 2007 at 09:47:01AM -0800, Linus Torvalds wrote:
> NOBODY will guarantee you that they follow all standards to the letter. 
> Some use compiler extensions knowingly, but pretty much _everybody_ ends 
> up depending on subtle issues without even realizing it. It's almost 
> impossible to write a real program that has no bugs, and if they don't 
> show up in testing (because the compiler didn't generate buggy assembly 
> code from source code that had the _potential_ for bugs), they often won't 
> get fixed.
> 
> The kernel does things like compare pointers across objects, and the 
> kernel EXPECTS it to work. I seriously doubt that the kernel is even 
> unusual in this. The common way to avoid AB-BA deadlocks in any threaded 
> code (whether kernel or user space) is to just take two locks in a 
> specific order, and the common way to do that for locks of the same type 
> is simply to compare the addresses.
> 
> The fact that this is "undefined" behaviour matters not a _whit_. Not for 
> the kernel, and I bet not for a lot of other applications either.

True, but we'd better understand what assumptions we are making.  I have
seen patches seriously attempting to _subtract_ unrelated pointers.  And
that simply doesn't work for obvious reasons...


* Re: kernel + gcc 4.1 = several problems
  2007-01-04 18:34       ` Segher Boessenkool
@ 2007-01-04 22:02         ` Geert Bosch
  0 siblings, 0 replies; 60+ messages in thread
From: Geert Bosch @ 2007-01-04 22:02 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Linus Torvalds, akpm, Albert Cahalan, linux-kernel, s0348365,
	bunk, mikpe


On Jan 4, 2007, at 13:34, Segher Boessenkool wrote:

> The "signed wrap is undefined" thing doesn't fit in this category
> though:
>
> -- It is an important optimisation for loops with a signed
>    induction variable;

It certainly isn't that important. Even SpecINT compiled with
-O3 and top-of-tree GCC *improves* 1% by adding -fwrapv.
If the compiler itself can rely on wrap-around semantics and
doesn't have to worry about introducing overflows between
optimization passes, it can reorder simple chains of additions.
This is more important for many real-world applications than
being able to perform some complex loop-interchange.
Compiler developers always make the mistake of overrating
their optimizations.
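
A sketch of the reordering point, using unsigned operands to stand in for wrapping arithmetic (the function names are made up). With wrap-around semantics the two groupings always agree, even when an intermediate sum wraps:

```c
#include <stdint.h>

/* Evaluate a + b + c left to right. */
static uint32_t sum_left(uint32_t a, uint32_t b, uint32_t c)
{
	return (a + b) + c;
}

/* Same sum, regrouped as a reordering pass might do. */
static uint32_t sum_regrouped(uint32_t a, uint32_t b, uint32_t c)
{
	return a + (b + c);
}
```

With signed operands and no -fwrapv, INT_MAX + (1 + -1) is fine while (INT_MAX + 1) + -1 overflows, so a pass that regroups has to keep track of whether it just introduced an overflow that a later pass might treat as impossible.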

If GCC does really poorly on a few important loops that matter,
that issue is easily addressed. If GCC generates unreliable
code for millions of boring lines of important real-world C,
the compiler is worthless.

   -Geert


* Re: kernel + gcc 4.1 = several problems
  2007-01-04 17:24     ` Segher Boessenkool
  2007-01-04 17:47       ` Linus Torvalds
@ 2007-01-05 17:17       ` Pavel Machek
  2007-01-06  8:23         ` Segher Boessenkool
  1 sibling, 1 reply; 60+ messages in thread
From: Pavel Machek @ 2007-01-05 17:17 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Albert Cahalan, akpm, linux-kernel, s0348365, bunk, mikpe, torvalds

Hi!

> >IMHO you should play such games with "g++ -O9", but that's
> >a discussion for a different mailing list.
> 
> For a different mailing list indeed; let me just point out
> that for certain important quite common cases it's an ~50%
> overall speedup.

Hmm, what code was that? 'signed int does not wrap around' does not
seem to provide _that_ much info...
							Pavel
-- 
Thanks for all the (sleeping) penguins.


* Re: kernel + gcc 4.1 = several problems
  2007-01-05 17:17       ` Pavel Machek
@ 2007-01-06  8:23         ` Segher Boessenkool
  0 siblings, 0 replies; 60+ messages in thread
From: Segher Boessenkool @ 2007-01-06  8:23 UTC (permalink / raw)
  To: Pavel Machek
  Cc: akpm, Albert Cahalan, linux-kernel, s0348365, bunk, mikpe, torvalds

>> For a different mailing list indeed; let me just point out
>> that for certain important quite common cases it's an ~50%
>> overall speedup.
>
> Hmm, what code was that? 'signed int does not wrap around' does not
> seem to provide _that_ much info...

One of the recent huge threads on the GCC dev list has a
post that says *some other* compiler gets a result like
this from this optimisation (I don't have a link to the
exact post and I don't remember the details; perhaps it
was XLC?)

Sorry if I wasn't clear enough and you understood I meant
that GCC exploits this optimisation opportunity well
enough for such nice results already.

  - - -

So I searched for it anyway:

<http://gcc.gnu.org/ml/gcc/2006-12/msg00768.html>

It looks like the result for *integer* code wasn't *all*
that dramatic a difference.  Anyway, it's obvious that
the optimisation can certainly give nice results and it
wouldn't be a good idea for the Linux kernel to dismiss
it without really evaluating the impact first; and anyway,
this is for some future date, GCC-4.2 isn't here yet.


Segher



* Re: kernel + gcc 4.1 = several problems
  2007-01-04 17:37     ` Linus Torvalds
  2007-01-04 18:34       ` Segher Boessenkool
@ 2007-01-07  4:25       ` Denis Vlasenko
  2007-01-07  4:45         ` Linus Torvalds
  2007-01-07 15:10         ` Segher Boessenkool
  1 sibling, 2 replies; 60+ messages in thread
From: Denis Vlasenko @ 2007-01-07  4:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Albert Cahalan, Segher Boessenkool, akpm, linux-kernel, s0348365,
	bunk, mikpe

On Thursday 04 January 2007 18:37, Linus Torvalds wrote:
> With 7+ million lines of C code and headers, I'm not interested in 
> compilers that read the letter of the law. We don't want some really 
> clever code generation that gets us .5% on some unrealistic load. We want 
> good _solid_ code generation that does the obvious thing.
> 
> Compiler writers seem to seldom even realize this. A lot of commercial 
> code gets shipped with basically no optimizations at all (or with specific 
> optimizations turned off), because people want to ship what they debug and 
> work with.

I'd say "care about obvious, safe optimizations which we still do not do".
I want this:

char v[4];
...
	memcmp(v, "abcd", 4) == 0

compile to single cmpl on i386. This (gcc 4.1.1) is ridiculous:

.LC0:
        .string "abcd"
        .text
...
        pushl   $4
        pushl   $.LC0
        pushl   $v
        call    memcmp
        addl    $12, %esp
        testl   %eax, %eax

There are tons of examples where you can improve code generation.
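
For comparison, a hand-written sketch of the single 32-bit compare asked for above. The function name is invented, and whether a given compiler reduces the constant-size memcpy() calls to plain loads is an assumption, not something measured here:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hand-written equivalent of memcmp(v, "abcd", 4) == 0: load both
 * 4-byte sequences into 32-bit integers and compare once. Loading both
 * sides the same way keeps the comparison independent of byte order.
 */
static bool is_abcd(const char v[4])
{
	uint32_t got, want;

	memcpy(&got, v, sizeof(got));
	memcpy(&want, "abcd", sizeof(want));
	return got == want;
}
```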
--
vda


* Re: kernel + gcc 4.1 = several problems
  2007-01-07  4:25       ` Denis Vlasenko
@ 2007-01-07  4:45         ` Linus Torvalds
  2007-01-07  5:26           ` Jeff Garzik
  2007-01-07 15:10         ` Segher Boessenkool
  1 sibling, 1 reply; 60+ messages in thread
From: Linus Torvalds @ 2007-01-07  4:45 UTC (permalink / raw)
  To: Denis Vlasenko
  Cc: Albert Cahalan, Segher Boessenkool, akpm, linux-kernel, s0348365,
	bunk, mikpe



On Sun, 7 Jan 2007, Denis Vlasenko wrote:
> 
> I'd say "care about obvious, safe optimizations which we still do not do".
> I want this:
> 
> char v[4];
> ...
> 	memcmp(v, "abcd", 4) == 0
> 
> compile to single cmpl on i386.

Yeah. For a more relevant case, look at the hoops we used to jump through 
to get "memcpy()" to generate ok code for trivial fixed-sized cases.

(That said, I think __builtin_memcpy() does a reasonable job these days 
with gcc, and we might drop the crap one day when we can trust the 
compiler to do ok. It didn't use to, and we continued using our 
ridiculous macro/__builtin_constant_p misuses just because it works with 
_all_ relevant gcc versions).
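
A stripped-down sketch of that style of dispatch, for readers who have not seen it. This is not the kernel's actual macro; copy4 and small_copy are hypothetical names, and only the 4-byte case is special-cased:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy exactly four bytes through a temporary of known size. */
static inline void *copy4(void *dst, const void *src)
{
	uint32_t tmp;

	memcpy(&tmp, src, sizeof(tmp));
	memcpy(dst, &tmp, sizeof(tmp));
	return dst;
}

/*
 * Hypothetical dispatch macro: when the length is a compile-time
 * constant equal to 4, use the fixed-size helper; otherwise fall back
 * to the general memcpy(). __builtin_constant_p is a GCC builtin.
 */
#define small_copy(dst, src, n)                        \
	((__builtin_constant_p(n) && (n) == 4)         \
		? copy4((dst), (src))                  \
		: memcpy((dst), (src), (n)))
```

The real thing had to cover many more sizes, which is where the macro ugliness came from.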

			Linus


* Re: kernel + gcc 4.1 = several problems
  2007-01-07  4:45         ` Linus Torvalds
@ 2007-01-07  5:26           ` Jeff Garzik
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff Garzik @ 2007-01-07  5:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Denis Vlasenko, Albert Cahalan, Segher Boessenkool, akpm,
	linux-kernel, s0348365, bunk, mikpe

Linus Torvalds wrote:
> (That said, I think __builtin_memcpy() does a reasonable job these days 
> with gcc, and we might drop the crap one day when we can trust the 
> compiler to do ok. It didn't use to, and we continued using our 
> ridiculous macro/__builtin_constant_p misuses just because it works with 
> _all_ relevant gcc versions).


Yep, a ton of work by Roger Sayle, among others, really matured the gcc 
str*/mem* builtins in the 4.x series.  They are definitely worth another 
look.

	Jeff




* Re: kernel + gcc 4.1 = several problems
  2007-01-07  4:25       ` Denis Vlasenko
  2007-01-07  4:45         ` Linus Torvalds
@ 2007-01-07 15:10         ` Segher Boessenkool
  2007-01-26 22:05           ` Michael K. Edwards
  1 sibling, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2007-01-07 15:10 UTC (permalink / raw)
  To: Denis Vlasenko
  Cc: akpm, Albert Cahalan, linux-kernel, s0348365, Linus Torvalds,
	bunk, mikpe

> I want this:
>
> char v[4];
> ...
> 	memcmp(v, "abcd", 4) == 0
>
> compile to single cmpl on i386. This (gcc 4.1.1) is ridiculous:

>         call    memcmp

i686-linux-gcc (GCC) 4.2.0 20060410 (experimental)

         movl    $4, %ecx        #, tmp65
         cld
         movl    $v, %esi        #, tmp63
         movl    $.LC0, %edi     #, tmp64
         repz
         cmpsb
         sete    %al     #, tmp68

Still not perfect, but better already.  If you have any
specific examples that you'd like to have compiled to
better code, please report them in GCC bugzilla (with a
self-contained testcase, please).


Segher



* Re: kernel + gcc 4.1 = several problems
  2007-01-07 15:10         ` Segher Boessenkool
@ 2007-01-26 22:05           ` Michael K. Edwards
  0 siblings, 0 replies; 60+ messages in thread
From: Michael K. Edwards @ 2007-01-26 22:05 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Denis Vlasenko, akpm, Albert Cahalan, linux-kernel, s0348365,
	Linus Torvalds, bunk, mikpe

ALSA + GCC 4.1.1 + -Os is known to be a bad combination on some
arches; see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27363 .  (I
tripped over it on an ARM target, but my limited understanding of GCC
internals does not allow me to conclude that it is ARM-specific.)  A
patch claiming to fix the bug was integrated into the 4.1 branch, but
my tests with a recent (20070115) gcc-4.1 snapshot indicate that it
has regressed again.

You might also check /proc/cpu/alignment; we have seen the alignment
fixup code trigger for alignment errors in both kernel and userspace.
The default appears to be to IGNORE alignment traps from userspace,
which results in bogus data and potentially a wacky series of system
calls, which could conceivably trigger an oops.  I am told that
echo 2 > /proc/cpu/alignment activates the kernel alignment fixup code, and
that 3 turns on some sort of logging in addition to the fixup (haven't
pursued this myself).  No idea whether this is relevant to your CPU.
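
For userspace code that triggers such traps, a common source-level fix is sketched below (the function name is made up): instead of casting a byte pointer to a wider type and dereferencing it, copy the bytes out.

```c
#include <stdint.h>
#include <string.h>

/*
 * Casting an arbitrary byte pointer to uint32_t * and dereferencing it
 * can trap or return bogus data on CPUs that require aligned loads.
 * Copying the bytes lets the compiler pick an access sequence that is
 * safe for the target.
 */
static uint32_t load_u32_unaligned(const unsigned char *p)
{
	uint32_t value;

	memcpy(&value, p, sizeof(value));
	return value;
}
```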

Cheers,
- Michael


* Re: kernel + gcc 4.1 = several problems
  2007-01-07  0:36           ` Pavel Machek
@ 2007-01-07  0:57             ` Alistair John Strachan
  0 siblings, 0 replies; 60+ messages in thread
From: Alistair John Strachan @ 2007-01-07  0:57 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Mikael Pettersson, 76306.1226, akpm, bunk, greg,
	linux-kernel, yanmin_zhang

On Sunday 07 January 2007 00:36, Pavel Machek wrote:
[snip]
> > However, this patch is mostly useless if you have a separate stack for
> > IRQ's (since if that happens, any interrupt will be taken on a different
> > stack which we don't see any more), so you should NOT enable the 4KSTACKS
> > config option if you try this out.
>
> stupid idea... perhaps gcc-4.1 generates bigger stackframe somewhere,
> and stack overflows?

The primary reason it's not 4KSTACKS already is that I run multiple XFS 
partitions on top of an md RAID 1. LVM isn't involved, however, and I'm not 
using any other filesystem overlays like dm.

I'm fairly sceptical that it's a stack overflow, but I'll be sure to enable 
the debugging option on the next try.

> that hw monitoring thingie... I'd turn it off. Its interactions with
> acpi are non-trivial and dangerous.

Well, GCC 3.4 kernels seem to run fine with it, but as I said to Linus I'll be 
sure to turn this and the sound drivers off in the next build.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


* Re: kernel + gcc 4.1 = several problems
  2007-01-05 16:49         ` Linus Torvalds
@ 2007-01-07  0:36           ` Pavel Machek
  2007-01-07  0:57             ` Alistair John Strachan
  0 siblings, 1 reply; 60+ messages in thread
From: Pavel Machek @ 2007-01-07  0:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alistair John Strachan, Mikael Pettersson, 76306.1226, akpm,
	bunk, greg, linux-kernel, yanmin_zhang

Hi!

> > (I realise with problems like these it's almost always some sort of obscure 
> > hardware problem, but I find that very difficult to believe when I can toggle 
> > from 3 years of stability to 6-18 hours crashing by switching compiler. I've 
> > also run extensive stability test programs on the hardware with absolutely no 
> > negative results.)
> 
> The thing is, I agree with you - it does seem to be compiler-related. But 
> at the same time, I'm almost positive that it's not in "pipe_poll()" 
> itself, because that function is just too simple, and looking at the 
> assembly code, I don't see how what you describe could happen in THAT 
> function.
> 
> HOWEVER.
> 
> I can easily see an NMI coming in, or another interrupt, or something, and 
> that one corrupting the stack under it because of a compiler bug (or a 
> kernel bug that just needs a specific compiler to trigger). For example, 
> we've had problems before with the compiler thinking it owns the stack 
> frame for an "asmlinkage" function, and us having no way to tell the 
> compiler to keep its hands off - so the compiler ended up touching 
> registers that were actually in the "save area" of the interrupt or system 
> call, and then returning with corrupted state.
> 
> Here's a stupid patch. It just adds more debugging to the oops message, 
> and shows all the code pointers it can find on the WHOLE stack.
> 
> It also makes the raw stack dumping print out as much of the stack 
> contents _under_ the stack pointer as it does above it too.
> 
> However, this patch is mostly useless if you have a separate stack for 
> IRQ's (since if that happens, any interrupt will be taken on a different 
> stack which we don't see any more), so you should NOT enable the 4KSTACKS 
> config option if you try this out.

stupid idea... perhaps gcc-4.1 generates bigger stackframe somewhere,
and stack overflows?

that hw monitoring thingie... I'd turn it off. Its interactions with
acpi are non-trivial and dangerous.
						Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-05 16:19       ` Alistair John Strachan
@ 2007-01-05 16:49         ` Linus Torvalds
  2007-01-07  0:36           ` Pavel Machek
  0 siblings, 1 reply; 60+ messages in thread
From: Linus Torvalds @ 2007-01-05 16:49 UTC (permalink / raw)
  To: Alistair John Strachan
  Cc: Mikael Pettersson, 76306.1226, akpm, bunk, greg, linux-kernel,
	yanmin_zhang



On Fri, 5 Jan 2007, Alistair John Strachan wrote:
> 
> (I realise with problems like these it's almost always some sort of obscure 
> hardware problem, but I find that very difficult to believe when I can toggle 
> from 3 years of stability to 6-18 hours crashing by switching compiler. I've 
> also run extensive stability test programs on the hardware with absolutely no 
> negative results.)

The thing is, I agree with you - it does seem to be compiler-related. But 
at the same time, I'm almost positive that it's not in "pipe_poll()" 
itself, because that function is just too simple, and looking at the 
assembly code, I don't see how what you describe could happen in THAT 
function.

HOWEVER.

I can easily see an NMI coming in, or another interrupt, or something, and 
that one corrupting the stack under it because of a compiler bug (or a 
kernel bug that just needs a specific compiler to trigger). For example, 
we've had problems before with the compiler thinking it owns the stack 
frame for an "asmlinkage" function, and us having no way to tell the 
compiler to keep its hands off - so the compiler ended up touching 
registers that were actually in the "save area" of the interrupt or system 
call, and then returning with corrupted state.

Here's a stupid patch. It just adds more debugging to the oops message, 
and shows all the code pointers it can find on the WHOLE stack.

It also makes the raw stack dumping print out as much of the stack 
contents _under_ the stack pointer as it does above it too.

However, this patch is mostly useless if you have a separate stack for 
IRQ's (since if that happens, any interrupt will be taken on a different 
stack which we don't see any more), so you should NOT enable the 4KSTACKS 
config option if you try this out.

I'm not sure how enlightening any of the output might be, but it is 
probably worth trying.

		Linus
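
The address arithmetic in the patch below can be tried out in user space.
What follows is a minimal sketch, not kernel code. It assumes 8 KiB stacks
(THREAD_SIZE of 8192, i.e. 4KSTACKS off); stack_base(), count_code_pointers()
and the text range passed to it are hypothetical stand-ins for the tinfo
mask and the __kernel_text_address() check used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: 8 KiB, THREAD_SIZE-aligned kernel stacks. */
#define THREAD_SIZE 8192u

/* Round any address inside an aligned stack down to the stack's base,
 * the same masking the patch applies to esp to find thread_info. */
static uintptr_t stack_base(uintptr_t sp)
{
	return sp & ~((uintptr_t)THREAD_SIZE - 1);
}

/* Count the words of a stack image that fall inside [text_start, text_end).
 * This stands in for walking the stack and testing each word as a possible
 * code pointer. */
static size_t count_code_pointers(const uintptr_t *stack, size_t nwords,
				  uintptr_t text_start, uintptr_t text_end)
{
	size_t hits = 0;

	assert(stack != NULL && text_start <= text_end);
	for (size_t i = 0; i < nwords; i++) {
		if (stack[i] >= text_start && stack[i] < text_end)
			hits++;
	}
	return hits;
}
```

Because every word is tested, stale return addresses left behind by earlier
calls are counted too, so the result is a superset of the real call chain.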

---
diff --git a/arch/i386/kernel/traps.c b/arch/i386/kernel/traps.c
index 0efad8a..2359eed 100644
--- a/arch/i386/kernel/traps.c
+++ b/arch/i386/kernel/traps.c
@@ -243,6 +243,20 @@ void show_trace(struct task_struct *task, struct pt_regs *regs,
 	show_trace_log_lvl(task, regs, stack, "");
 }
 
+static void show_all_stack_addresses(unsigned long *esp)
+{
+	struct thread_info *tinfo = (void *) ((unsigned long)esp & (~(THREAD_SIZE - 1)));
+	unsigned long *stack = (unsigned long *)(tinfo+1);
+
+	printk("All stack code pointers:\n");
+	while (valid_stack_ptr(tinfo, stack)) {
+		unsigned long addr = *stack++;
+		if (__kernel_text_address(addr))
+			print_symbol(" %s", addr);
+	}
+	printk("\n");
+}
+
 static void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			       unsigned long *esp, char *log_lvl)
 {
@@ -256,8 +270,10 @@ static void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			esp = (unsigned long *)&esp;
 	}
 
+	show_all_stack_addresses(esp);
 	stack = esp;
-	for(i = 0; i < kstack_depth_to_print; i++) {
+	stack -= kstack_depth_to_print;
+	for(i = 0; i < 2*kstack_depth_to_print; i++) {
 		if (kstack_end(stack))
 			break;
 		if (i && ((i % 8) == 0))

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-05 16:02     ` Linus Torvalds
@ 2007-01-05 16:19       ` Alistair John Strachan
  2007-01-05 16:49         ` Linus Torvalds
  0 siblings, 1 reply; 60+ messages in thread
From: Alistair John Strachan @ 2007-01-05 16:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mikael Pettersson, 76306.1226, akpm, bunk, greg, linux-kernel,
	yanmin_zhang

On Friday 05 January 2007 16:02, Linus Torvalds wrote:
> On Fri, 5 Jan 2007, Alistair John Strachan wrote:
> > This didn't help. After about 14 hours, the machine crashed again.
> >
> > cmov is not the culprit.
>
> Ok. Have you ever tried to limit the drivers you have loaded? I notice you
> had the prism54 wireless thing in your modules list and the vt1211 hw
> monitoring thing. I'm wondering about the vt1211 thing - it probably isn't
> too common.

Sure, and it only got added to 2.6.19 anyway (however GCC 3.4.6 really does 
seem to have no problem with it).

> But if you can use that machine without the wireless too, it
> might be good to try without either.

Required, plus I've been running prism54 on three different machines with a 
huge number of compilers since the early 2.6 days with no problems.

> (The rest of your module list looked bog-standard, so if it's not
> hardware-specific, I don't think it's there)

Agreed, the config is already _very_ minimal for this machine.

> Turning off the VIA sound driver just in case would be good too.

I'm not even really sure why that's enabled. I can do that.

> The reason I mention vt1211 in particular is that it does things like
> regulate fan activity etc. Is the problem perhaps heat-related?

It definitely isn't heat related. This CPU puts out 7-10W, has a ridiculous 
5000 RPM fan on it (that works) and the temp never exceeds 40C. If anything, 
the -O2, 3.4.6 kernel with CMOV ran the chip a little hotter.

As far as I can see, all the other components are either cool to touch or have 
stupidly big heatsinks on them.

(I realise with problems like these it's almost always some sort of obscure 
hardware problem, but I find that very difficult to believe when I can toggle 
from 3 years of stability to 6-18 hours crashing by switching compiler. I've 
also run extensive stability test programs on the hardware with absolutely no 
negative results.)

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-05 15:53   ` Alistair John Strachan
@ 2007-01-05 16:02     ` Linus Torvalds
  2007-01-05 16:19       ` Alistair John Strachan
  0 siblings, 1 reply; 60+ messages in thread
From: Linus Torvalds @ 2007-01-05 16:02 UTC (permalink / raw)
  To: Alistair John Strachan
  Cc: Mikael Pettersson, 76306.1226, akpm, bunk, greg, linux-kernel,
	yanmin_zhang



On Fri, 5 Jan 2007, Alistair John Strachan wrote:
> 
> This didn't help. After about 14 hours, the machine crashed again.
> 
> cmov is not the culprit.

Ok. Have you ever tried to limit the drivers you have loaded? I notice you 
had the prism54 wireless thing in your modules list and the vt1211 hw 
monitoring thing. I'm wondering about the vt1211 thing - it probably isn't 
too common. But if you can use that machine without the wireless too, it 
might be good to try without either.

(The rest of your module list looked bog-standard, so if it's not 
hardware-specific, I don't think it's there)

Turning off the VIA sound driver just in case would be good too.

The reason I mention vt1211 in particular is that it does things like 
regulate fan activity etc. Is the problem perhaps heat-related? 

		Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03  2:20 ` Alistair John Strachan
@ 2007-01-05 15:53   ` Alistair John Strachan
  2007-01-05 16:02     ` Linus Torvalds
  0 siblings, 1 reply; 60+ messages in thread
From: Alistair John Strachan @ 2007-01-05 15:53 UTC (permalink / raw)
  To: Mikael Pettersson
  Cc: torvalds, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang

On Wednesday 03 January 2007 02:20, Alistair John Strachan wrote:
> On Wednesday 03 January 2007 02:12, Mikael Pettersson wrote:
> > On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
> > > > The suggestions I've had so far which I have not yet tried:
> > > >
> > > > -	Select a different x86 CPU in the config.
> > > > 		-	Unfortunately the C3-2 flags seem to simply tell GCC
> > > > 			to schedule for ppro (like i686) and enabled MMX and SSE
> > > > 		-	Probably useless
> > >
> > > Actually, try this one. Try using something that doesn't like "cmov".
> > > Maybe the C3-2 simply has some internal cmov bugginess.
> >
> > That's a good suggestion. Earlier C3s didn't have cmov so it's
> > not entirely unlikely that cmov in C3-2 is broken in some cases.
> > Configuring for P5MMX or 486 should be good safe alternatives.
>
> Or just C3 (not C3-2), which is what I've done.
>
> I'll report back whether it crashes or not.

This didn't help. After about 14 hours, the machine crashed again.

cmov is not the culprit.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: kernel + gcc 4.1 = several problems
  2007-01-04  3:08       ` Zou, Nanhai
@ 2007-01-04 15:34         ` Linus Torvalds
  0 siblings, 0 replies; 60+ messages in thread
From: Linus Torvalds @ 2007-01-04 15:34 UTC (permalink / raw)
  To: Zou, Nanhai
  Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226,
	akpm, bunk, greg, linux-kernel, yanmin_zhang



On Thu, 4 Jan 2007, Zou, Nanhai wrote:
> 
> cmov will stall on eflags in your test program.

And that is EXACTLY my point.

CMOV is a piece of CRAP for most things, exactly because it serializes 
three streams of data: the two inputs, and the conditional.

My test-case was actually _good_ for cmov, because there was just the one 
conditional (which was 100% ALU) thing that was serialized. In real life, 
the two data sources also come from memory, and _any_ of them being 
delayed ends up delaying the cmov, and screwing up your out-of-order 
pipeline because you now introduced a serialization point that was very 
possibly not necessary at all.

In contrast, a conditional branch-around serializes absolutely NOTHING, 
because branches get predicted.

> I think you will see benefit of cmov if you can manage to put some 
> instructions which do NOT modify eflags between testl and cmov.

A lot of the time, the conditional _is_ the critical path.

The whole point of this discussion was that cmov isn't really all that 
great. It has fundamental problems that a conditional branch that gets 
predicted simply does not have.

That's quite apart from the fact that cmov has rather limited semantics, 
and that in 99% of all cases you have to use a conditional branch anyway.

			Linus
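
The dependency argument can be made concrete in C. This is only a sketch
under assumptions: sum_branch() and sum_select() are hypothetical names, the
two data sources come from memory as described above, and nothing here
forces the compiler to emit a branch or a cmov for either loop (inspect the
output of gcc -S to see what it chose). The second loop uses a mask rather
than cmov itself, but it has the same shape: each result depends on both
inputs and on the condition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch form: as written, only the chosen array element is read, so a
 * slow load from the other array cannot hold up the sum. */
static uint32_t sum_branch(const uint32_t *a, const uint32_t *b,
			   const uint8_t *cond, size_t n)
{
	uint32_t sum = 0;

	assert(n == 0 || (a != NULL && b != NULL && cond != NULL));
	for (size_t i = 0; i < n; i++) {
		if (cond[i])
			sum += a[i];
		else
			sum += b[i];
	}
	return sum;
}

/* Select form: both elements are read every time, and the value added
 * depends on a[i], b[i] and cond[i] together. */
static uint32_t sum_select(const uint32_t *a, const uint32_t *b,
			   const uint8_t *cond, size_t n)
{
	uint32_t sum = 0;

	assert(n == 0 || (a != NULL && b != NULL && cond != NULL));
	for (size_t i = 0; i < n; i++) {
		/* mask is all ones when cond[i] is non-zero, else zero */
		uint32_t mask = (uint32_t)0 - (uint32_t)(cond[i] != 0);

		sum += (a[i] & mask) | (b[i] & ~mask);
	}
	return sum;
}
```

Both functions return the same sum for the same inputs. They differ only in
which loads and comparisons each iteration's result has to wait for.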

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: kernel + gcc 4.1 = several problems
  2007-01-03 16:03     ` Linus Torvalds
                         ` (4 preceding siblings ...)
  2007-01-03 21:44       ` Thomas Sailer
@ 2007-01-04  3:08       ` Zou, Nanhai
  2007-01-04 15:34         ` Linus Torvalds
  5 siblings, 1 reply; 60+ messages in thread
From: Zou, Nanhai @ 2007-01-04  3:08 UTC (permalink / raw)
  To: Linus Torvalds, Grzegorz Kulewski
  Cc: Alan, Mikael Pettersson, s0348365, 76306.1226, akpm, bunk, greg,
	linux-kernel, yanmin_zhang

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Linus Torvalds
> Sent: January 4, 2007 0:04
> To: Grzegorz Kulewski
> Cc: Alan; Mikael Pettersson; s0348365@sms.ed.ac.uk;
> 76306.1226@compuserve.com; akpm@osdl.org; bunk@stusta.de; greg@kroah.com;
> linux-kernel@vger.kernel.org; yanmin_zhang@linux.intel.com
> Subject: Re: kernel + gcc 4.1 = several problems
> 
> 
> 
> On Wed, 3 Jan 2007, Grzegorz Kulewski wrote:
> >
> > Could you explain why CMOV is pointless now? Are there any benchmarks proving
> > that?
> 
> CMOV (and, more generically, any "predicated instruction") tends to
> generally be a bad idea on an aggressively out-of-order CPU. It doesn't
> always have to be horrible, but in practice it is seldom very nice, and
> (as usual) on the P4 it can be really quite bad.
> 
> On a P4, I think a cmov basically takes 10 cycles.
> 
> But even ignoring the usual P4 "I suck at things that aren't totally
> normal", cmov is actually not a great idea. You can always replace it by
> 
> 		j<negated condition> forward
> 		mov ..., %reg
> 	forward:
> 
> and assuming the branch is AT ALL predictable (and 95+% of all branches
> are), the branch-over will actually be a LOT better for a CPU.
> 
> Why? Because branches can be predicted, and when they are predicted they
> basically go away. They go away on many levels, too. Not just the branch
> itself, but the _conditional_ for the branch goes away as far as the
> critical path of code is concerned: the CPU still has to calculate it and
> check it, but from a performance angle it "doesn't exist any more",
> because it's not holding anything else up (well, you want to do it in
> _some_ reasonable time, but the point stands..)
> 
> Similarly, whichever side of the branch wasn't taken goes away. Again, in
> an out-of-order machine with register renaming, this means that even if
> the branch isn't taken above, and you end up executing all the non-branch
> instructions, because you now UNCONDITIONALLY over-write the register, the
> old data in the register is now DEAD, so now all the OTHER writes to that
> register are off the critical path too!
> 
> So the end result is that with a conditional branch, on a good CPU, the
> _only_ part of the code that is actually performance-sensitive is the
> actual calculation of the value that gets used!
> 
> In contrast, if you use a predicated instruction, ALL of it is on the
> critical path. Calculating the conditional is on the critical path.
> Calculating the value that gets used is obviously ALSO on the critical
> path, but so is the calculation for the value that DOESN'T get used too.
> So the cmov - rather than speeding things up - actually slows things down,
> because it makes more code be dependent on each other.
> 
> So here's the basic rule:
> 
>  - cmov is sometimes nice for code density. It's not a big win, but it
>    certainly can be a win.
> 
>  - if you KNOW the branch is totally unpredictable, cmov is often good for
>    performance. But a compiler almost never knows that, and even if you
>    train it with input data and profiling, remember that not very many
>    branches _are_ totally unpredictable, so even if you were to know that
>    something is unpredictable, it's going to be very rare.
> 
>  - on a P4, branch mispredictions are expensive, but so is cmov, so all
>    the above is to some degree exaggerated. On nicer microarchitectures
>    (the Intel Core 2 in particular is something I have to say is very nice
>    indeed), the difference will be a lot less noticeable. The loss from
>    cmov isn't very big (it's not as sucky as P4), but neither is the win
>    (branch misprediction isn't that expensive either).
> 
> Here's an example program that you can test and time yourself.
> 
> On my Core 2, I get
> 
> 	[torvalds@woody ~]$ gcc -DCMOV -Wall -O2 t.c
> 	[torvalds@woody ~]$ time ./a.out
> 	600000000
> 
> 	real    0m0.194s
> 	user    0m0.192s
> 	sys     0m0.000s
> 
> 	[torvalds@woody ~]$ gcc -Wall -O2 t.c
> 	[torvalds@woody ~]$ time ./a.out
> 	600000000
> 
> 	real    0m0.167s
> 	user    0m0.168s
> 	sys     0m0.000s
> 
> ie the cmov is quite a bit slower. Maybe I did something wrong. But note
> how cmov not only is slower, it's fundamentally more limited too (ie the
> branch-over can actually do a lot of things cmov simply cannot do).


Hi,
cmov will stall on eflags in your test program.
I think you will see benefit of cmov if you can manage to put some instructions which do NOT modify eflags between testl and cmov. 

Thanks
Zou Nan hai

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 21:48           ` Denis Vlasenko
@ 2007-01-03 22:13             ` Linus Torvalds
  0 siblings, 0 replies; 60+ messages in thread
From: Linus Torvalds @ 2007-01-03 22:13 UTC (permalink / raw)
  To: Denis Vlasenko
  Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226,
	akpm, bunk, greg, linux-kernel, yanmin_zhang



On Wed, 3 Jan 2007, Denis Vlasenko wrote:
> 
> IOW: yet another slot in instruction opcode matrix and thousands of
> transistors in instruction decoders are wasted because of this
> "clever invention", eh?

Well, in all fairness, it can probably help more on certain 
microarchitectures. Intel is fairly aggressively OoO, especially in Core 
2, and predicted branches are not only free, they allow OoO to do a great 
job around them. But an in-order architecture doesn't have that, and cmov 
might show more of an advantage there.

		Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 21:44       ` Thomas Sailer
@ 2007-01-03 22:08         ` Linus Torvalds
  0 siblings, 0 replies; 60+ messages in thread
From: Linus Torvalds @ 2007-01-03 22:08 UTC (permalink / raw)
  To: Thomas Sailer
  Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226,
	akpm, bunk, greg, linux-kernel, yanmin_zhang



On Wed, 3 Jan 2007, Thomas Sailer wrote:
> 
> IF... Counterexample: Add-Compare-Select in a Viterbi Decoder.

Yes. [De]compression stuff tends to be (a) totally unpredictable and (b) a 
situation where people care about performance. It's fairly rare in many 
other situations.

That said, any real performance these days is about avoiding cache misses. 
There cmov really can help more, if it results in denser code (fairly big 
if, though).

			Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 20:38         ` Linus Torvalds
@ 2007-01-03 21:48           ` Denis Vlasenko
  2007-01-03 22:13             ` Linus Torvalds
  0 siblings, 1 reply; 60+ messages in thread
From: Denis Vlasenko @ 2007-01-03 21:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226,
	akpm, bunk, greg, linux-kernel, yanmin_zhang

On Wednesday 03 January 2007 21:38, Linus Torvalds wrote:
> On Wed, 3 Jan 2007, Denis Vlasenko wrote:
> > 
> > Why don't CPU people internally convert cmov into a jmp,mov pair?
> 
...
> It really all boils down to: there's simply no real reason to use cmov. 
> It's not horrible either, so go ahead and use it if you want to, but don't 
> expect your code to really magically run any faster.

IOW: yet another slot in instruction opcode matrix and thousands of
transistors in instruction decoders are wasted because of this
"clever invention", eh?
--
vda

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 16:03     ` Linus Torvalds
                         ` (3 preceding siblings ...)
  2007-01-03 19:47       ` Denis Vlasenko
@ 2007-01-03 21:44       ` Thomas Sailer
  2007-01-03 22:08         ` Linus Torvalds
  2007-01-04  3:08       ` Zou, Nanhai
  5 siblings, 1 reply; 60+ messages in thread
From: Thomas Sailer @ 2007-01-03 21:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226,
	akpm, bunk, greg, linux-kernel, yanmin_zhang

On Wed, 2007-01-03 at 08:03 -0800, Linus Torvalds wrote:

> and assuming the branch is AT ALL predictable (and 95+% of all branches 
> are), the branch-over will actually be a LOT better for a CPU.

IF... Counterexample: Add-Compare-Select in a Viterbi Decoder. If the
compare can be predicted, you botched the compression of the data (if
you can predict the data, you could have compressed it better), or your
noise is not white, i.e. you f*** up the whitening filter. So in any
practical viterbi decoder, the compares cannot be predicted. I remember
cmov made a big difference in Viterbi Decoder performance on a Cyrix
6x86. But granted, nowadays these things are usually done with SIMD and
masks.

Tom
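
For readers who have not met the term, one add-compare-select step can be
sketched in C as follows. This is an illustration under assumptions, not
code from any particular decoder: acs() and its parameter names are
hypothetical, metrics are plain unsigned 32-bit costs where lower is better,
ties go to path 0, and metric overflow is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One add-compare-select step: extend two candidate paths by their branch
 * metrics and keep the cheaper one. *decision records which predecessor
 * survived (0 or 1) so the path can be traced back later. */
static uint32_t acs(uint32_t pm0, uint32_t bm0,
		    uint32_t pm1, uint32_t bm1, unsigned *decision)
{
	assert(decision != NULL);

	uint32_t c0 = pm0 + bm0;	/* add */
	uint32_t c1 = pm1 + bm1;
	unsigned take1 = c1 < c0;	/* compare */

	*decision = take1;
	return take1 ? c1 : c0;		/* select */
}
```

The compare depends on the received data, so with a well-whitened input it
is close to a coin flip, which is the case where a select has an edge over a
predicted branch.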


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 19:47       ` Denis Vlasenko
@ 2007-01-03 20:38         ` Linus Torvalds
  2007-01-03 21:48           ` Denis Vlasenko
  0 siblings, 1 reply; 60+ messages in thread
From: Linus Torvalds @ 2007-01-03 20:38 UTC (permalink / raw)
  To: Denis Vlasenko
  Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226,
	akpm, bunk, greg, linux-kernel, yanmin_zhang



On Wed, 3 Jan 2007, Denis Vlasenko wrote:
> 
> Why don't CPU people internally convert cmov into a jmp,mov pair?

Probably because

 - it's not worth it. cmov's certainly _can_ be faster for unpredictable 
   input. So especially if you teach your compiler (by using profiling) to 
   use cmov's mainly for unpredictable cases, turning it into a 
   conditional jump internally would likely be a bad idea.

 - the biggest reason to do it would likely be microarchitectural: if you 
   have an ALU or a bypass network that just isn't suitable for bypassing 
   the flags that way (because you designed your pipeline for a 
   conditional branch), you might decide that it just simplifies things to 
   turn the cmov internally into a branch+mov uop pair. 

 - cmov's simply aren't common enough to be worth worrying about, 
   especially as it's not likely that the difference is all that big in 
   the end. The limitations on cmov's means that the compiler can only use 
   them under certain fairly limited circumstances anyway, so it's not 
   like you'll make a huge difference by doing anything clever.  So see 
   above - it's simply a wash, and likely ends up just depending on other 
   issues.

And don't get me wrong. cmov's can make a difference. You can use them to 
avoid polluting your branch prediction tables, you can use them to make 
code smaller, and you can use them when they simply just fit the problem 
really well. It's just _not_ the case that they are "obviously better". 
They simply aren't. Conditional branches aren't "evil". There are many 
MUCH worse things you can do, and other things you should avoid.

It really all boils down to: there's simply no real reason to use cmov. 
It's not horrible either, so go ahead and use it if you want to, but don't 
expect your code to really magically run any faster.

			Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 17:45         ` Tim Schmielau
@ 2007-01-03 20:24           ` Linus Torvalds
  0 siblings, 0 replies; 60+ messages in thread
From: Linus Torvalds @ 2007-01-03 20:24 UTC (permalink / raw)
  To: Tim Schmielau
  Cc: l.genoni, Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365,
	76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang



On Wed, 3 Jan 2007, Tim Schmielau wrote:
>
> Well, on a P4 (which is supposed to be soo bad) I get:

Interesting. My P4 gets basically exactly the same timings for the cmov 
and branch cases.  And my Core 2 is consistently faster (something like 
15%) for the branch version.

Btw, the test-case should be the best possible one for cmov, since there 
are no data-dependencies except for ALU operations, and everything is 
totally independent (the actual values have no data dependencies at all, 
since they are constants). So the critical path issues never show up.

		Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 16:03     ` Linus Torvalds
                         ` (2 preceding siblings ...)
  2007-01-03 17:53       ` Mariusz Kozlowski
@ 2007-01-03 19:47       ` Denis Vlasenko
  2007-01-03 20:38         ` Linus Torvalds
  2007-01-03 21:44       ` Thomas Sailer
  2007-01-04  3:08       ` Zou, Nanhai
  5 siblings, 1 reply; 60+ messages in thread
From: Denis Vlasenko @ 2007-01-03 19:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226,
	akpm, bunk, greg, linux-kernel, yanmin_zhang

On Wednesday 03 January 2007 17:03, Linus Torvalds wrote:
> On Wed, 3 Jan 2007, Grzegorz Kulewski wrote:
> > Could you explain why CMOV is pointless now? Are there any benchmarks proving
> > that?
> 
> CMOV (and, more generically, any "predicated instruction") tends to 
> generally be a bad idea on an aggressively out-of-order CPU. It doesn't 
> always have to be horrible, but in practice it is seldom very nice, and 
> (as usual) on the P4 it can be really quite bad.
> 
> On a P4, I think a cmov basically takes 10 cycles.
> 
> But even ignoring the usual P4 "I suck at things that aren't totally 
> normal", cmov is actually not a great idea. You can always replace it by
> 
> 		j<negated condition> forward
> 		mov ..., %reg
> 	forward:
...
...
> In contrast, if you use a predicated instruction, ALL of it is on the 
> critical path. Calculating the conditional is on the critical path. 
> Calculating the value that gets used is obviously ALSO on the critical 
> path, but so is the calculation for the value that DOESN'T get used too. 
> So the cmov - rather than speeding things up - actually slows things down, 
> because it makes more code be dependent on each other.

Why don't CPU people internally convert cmov into a jmp,mov pair?
--
vda

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 16:03     ` Linus Torvalds
  2007-01-03 17:01       ` l.genoni
  2007-01-03 17:06       ` l.genoni
@ 2007-01-03 17:53       ` Mariusz Kozlowski
  2007-01-03 19:47       ` Denis Vlasenko
                         ` (2 subsequent siblings)
  5 siblings, 0 replies; 60+ messages in thread
From: Mariusz Kozlowski @ 2007-01-03 17:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226,
	akpm, bunk, greg, linux-kernel, yanmin_zhang

[-- Attachment #1: Type: text/plain, Size: 1404 bytes --]

Hello, 

> Here's an example program that you can test and time yourself. 
> 
> On my Core 2, I get
> 
> 	[torvalds@woody ~]$ gcc -DCMOV -Wall -O2 t.c
> 	[torvalds@woody ~]$ time ./a.out
> 	600000000
> 
> 	real    0m0.194s
> 	user    0m0.192s
> 	sys     0m0.000s
> 
> 	[torvalds@woody ~]$ gcc -Wall -O2 t.c
> 	[torvalds@woody ~]$ time ./a.out
> 	600000000
> 	
> 	real    0m0.167s
> 	user    0m0.168s
> 	sys     0m0.000s

Test was done on my laptop with gcc 4.1.1 and CPU:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.40GHz
stepping        : 9
cpu MHz         : 2392.349
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips        : 4786.36
clflush size    : 64

I wrote a simple script that ran each version of your code 100
times measuring the execution time. Then some simple gnuplot
magic was applied. The result is attached (png file).

- cmovne was faster with almost stable execution time (~171ms)
- je-mov was slower and its execution time varied

Interpretation is up to you ;-)
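
The script itself is not included in the message. As a rough stand-in, the
repeated measurement could be done in-process along these lines. This is a
sketch under assumptions: time_runs() and workload() are hypothetical names,
clock() measures CPU time rather than wall-clock time, and timing a function
in-process is not the same as timing 100 separate launches of ./a.out.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Run fn() `runs` times and store the CPU time of each run, in seconds. */
static void time_runs(void (*fn)(void), double *secs, size_t runs)
{
	assert(fn != NULL && secs != NULL);
	for (size_t i = 0; i < runs; i++) {
		clock_t start = clock();

		fn();
		secs[i] = (double)(clock() - start) / CLOCKS_PER_SEC;
	}
}

/* Placeholder workload; a real comparison would call the code under test.
 * The volatile sink keeps the loop from being optimised away. */
static volatile unsigned long sink;

static void workload(void)
{
	for (unsigned long i = 0; i < 100000ul; i++)
		sink += i;
}
```

Keeping every sample, rather than only an average, is what makes a plot like
the attached one possible: it shows the spread as well as the typical value.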

-- 
Regards,

	Mariusz Kozlowski

[-- Attachment #2: benchmark.png --]
[-- Type: image/png, Size: 6165 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 17:01       ` l.genoni
@ 2007-01-03 17:45         ` Tim Schmielau
  2007-01-03 20:24           ` Linus Torvalds
  0 siblings, 1 reply; 60+ messages in thread
From: Tim Schmielau @ 2007-01-03 17:45 UTC (permalink / raw)
  To: l.genoni
  Cc: Linus Torvalds, Grzegorz Kulewski, Alan, Mikael Pettersson,
	s0348365, 76306.1226, akpm, bunk, greg, linux-kernel,
	yanmin_zhang

Well, on a P4 (which is supposed to be soo bad) I get:

> gcc -O2 t.c -o t
> foreach x ( 1 2 3 4 5 )
>> time ./t > /dev/null
>> end
0.196u 0.004s 0:00.19 100.0%    0+0k 0+0io 0pf+0w
0.168u 0.004s 0:00.16 100.0%    0+0k 0+0io 0pf+0w
0.168u 0.000s 0:00.16 100.0%    0+0k 0+0io 0pf+0w
0.160u 0.000s 0:00.15 106.6%    0+0k 0+0io 0pf+0w
0.180u 0.000s 0:00.18 100.0%    0+0k 0+0io 0pf+0w
> gcc -DCMOV -O2 t.c -o t
> foreach x ( 1 2 3 4 5 )
>> time ./t > /dev/null
>> end
0.168u 0.000s 0:00.17 94.1%     0+0k 0+0io 0pf+0w
0.152u 0.000s 0:00.15 100.0%    0+0k 0+0io 0pf+0w
0.136u 0.004s 0:00.13 100.0%    0+0k 0+0io 0pf+0w
0.168u 0.000s 0:00.16 100.0%    0+0k 0+0io 0pf+0w
0.172u 0.000s 0:00.17 100.0%    0+0k 0+0io 0pf+0w

see?
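
A quick way to compare the two sets of runs above is to take the mean and
the spread of the user-time column. The sketch below does that; the sample
values are copied from the output above, while mean() and spread() are
hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* User-time column from the runs above, in seconds. */
static const double branch_user[] = { 0.196, 0.168, 0.168, 0.160, 0.180 };
static const double cmov_user[]   = { 0.168, 0.152, 0.136, 0.168, 0.172 };

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(v != NULL && n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Difference between the largest and smallest sample. */
static double spread(const double *v, size_t n)
{
	assert(v != NULL && n > 0);

	double lo = v[0], hi = v[0];

	for (size_t i = 1; i < n; i++) {
		if (v[i] < lo)
			lo = v[i];
		if (v[i] > hi)
			hi = v[i];
	}
	return hi - lo;
}
```

With these numbers the means come out at 0.1744 s (branch) and 0.1592 s
(cmov), a gap of about 15 ms, while each set spreads over 36 ms. Five
samples are probably too few to settle the question either way.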

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 16:03     ` Linus Torvalds
  2007-01-03 17:01       ` l.genoni
@ 2007-01-03 17:06       ` l.genoni
  2007-01-03 17:53       ` Mariusz Kozlowski
                         ` (3 subsequent siblings)
  5 siblings, 0 replies; 60+ messages in thread
From: l.genoni @ 2007-01-03 17:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226,
	akpm, bunk, greg, linux-kernel, yanmin_zhang


Just to make clearer why I am so curious, this is from an X86_64 X2 3800+:

DarkStar:{venom}:/tmp> gcc -DCMOV -Wall -O2 t.c
DarkStar:{venom}:/tmp>time ./a.out
600000000

real    0m0.151s
user    0m0.150s
sys     0m0.000s
DarkStar:{venom}:/tmp> gcc -Wall -O2 t.c
DarkStar:{venom}:/tmp> time ./a.out
600000000

real    0m0.176s
user    0m0.180s
sys     0m0.000s
DarkStar:{venom}:/tmp>gcc -m32 -DCMOV -Wall -O2 t.c
DarkStar:{venom}:/tmp>time ./a.out
600000000

real    0m0.152s
user    0m0.160s
sys     0m0.000s
DarkStar:{venom}:/tmp>gcc -m32  -Wall -O2 t.c
DarkStar:{venom}:/tmp>time ./a.out
600000000

real    0m0.200s
user    0m0.200s
sys     0m0.000s


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 16:03     ` Linus Torvalds
@ 2007-01-03 17:01       ` l.genoni
  2007-01-03 17:45         ` Tim Schmielau
  2007-01-03 17:06       ` l.genoni
                         ` (4 subsequent siblings)
  5 siblings, 1 reply; 60+ messages in thread
From: l.genoni @ 2007-01-03 17:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Grzegorz Kulewski, Alan, Mikael Pettersson, s0348365, 76306.1226,
	akpm, bunk, greg, linux-kernel, yanmin_zhang


Just curious why on Opteron dual core 2600MHZ I get:

phoenix:{root}:/tmp> gcc -DCMOV -Wall -O2 t.c
phoenix:{root}:/tmp>time ./a.out
600000000

real    0m0.117s
user    0m0.120s
sys     0m0.000s
phoenix:{root}:/tmp>gcc -Wall -O2 t.c
phoenix:{root}:/tmp> time ./a.out
600000000

real    0m0.136s
user    0m0.130s
sys     0m0.010s

Regards

(I understand it is very different from P4)

Luigi Genoni

On Wed, 3 Jan 2007, Linus Torvalds wrote:

> Date: Wed, 3 Jan 2007 08:03:37 -0800 (PST)
> From: Linus Torvalds <torvalds@osdl.org>
> To: Grzegorz Kulewski <kangur@polcom.net>
> Cc: Alan <alan@lxorguk.ukuu.org.uk>, Mikael Pettersson <mikpe@it.uu.se>,
>     s0348365@sms.ed.ac.uk, 76306.1226@compuserve.com, akpm@osdl.org,
>     bunk@stusta.de, greg@kroah.com, linux-kernel@vger.kernel.org,
>     yanmin_zhang@linux.intel.com
> Subject: Re: kernel + gcc 4.1 = several problems
> Resent-Date: Wed, 03 Jan 2007 17:16:00 +0100
> Resent-From: <l.genoni@sns.it>
> 
>
>
> On Wed, 3 Jan 2007, Grzegorz Kulewski wrote:
>>
>> Could you explain why CMOV is pointless now? Are there any benchmarks proving
>> that?
>
> CMOV (and, more generically, any "predicated instruction") tends to
> generally be a bad idea on an aggressively out-of-order CPU. It doesn't
> always have to be horrible, but in practice it is seldom very nice, and
> (as usual) on the P4 it can be really quite bad.
>
> On a P4, I think a cmov basically takes 10 cycles.
>
> But even ignoring the usual P4 "I suck at things that aren't totally
> normal", cmov is actually not a great idea. You can always replace it by
>
> 		j<negated condition> forward
> 		mov ..., %reg
> 	forward:
>
> and assuming the branch is AT ALL predictable (and 95+% of all branches
> are), the branch-over will actually be a LOT better for a CPU.
>
> Why? Because branches can be predicted, and when they are predicted they
> basically go away. They go away on many levels, too. Not just the branch
> itself, but the _conditional_ for the branch goes away as far as the
> critical path of code is concerned: the CPU still has to calculate it and
> check it, but from a performance angle it "doesn't exist any more",
> because it's not holding anything else up (well, you want to do it in
> _some_ reasonable time, but the point stands..)
>
> Similarly, whichever side of the branch wasn't taken goes away. Again, in
> an out-of-order machine with register renaming, this means that even if
> the branch isn't taken above, and you end up executing all the non-branch
> instructions, because you now UNCONDITIONALLY over-write the register, the
> old data in the register is now DEAD, so now all the OTHER writes to that
> register are off the critical path too!
>
> So the end result is that with a conditional branch, on a good CPU, the
> _only_ part of the code that is actually performance-sensitive is the
> actual calculation of the value that gets used!
>
> In contrast, if you use a predicated instruction, ALL of it is on the
> critical path. Calculating the conditional is on the critical path.
> Calculating the value that gets used is obviously ALSO on the critical
> path, but so is the calculation for the value that DOESN'T get used too.
> So the cmov - rather than speeding things up - actually slows things down,
> because it makes more code be dependent on each other.
>
> So here's the basic rule:
>
> - cmov is sometimes nice for code density. It's not a big win, but it
>   certainly can be a win.
>
> - if you KNOW the branch is totally unpredictable, cmov is often good for
>   performance. But a compiler almost never knows that, and even if you
>   train it with input data and profiling, remember that not very many
>   branches _are_ totally unpredictable, so even if you were to know that
>   something is unpredictable, it's going to be very rare.
>
> - on a P4, branch mispredictions are expensive, but so is cmov, so all
>   the above is to some degree exaggerated. On nicer microarchitectures
>   (the Intel Core 2 in particular is something I have to say is very nice
>   indeed), the difference will be a lot less noticeable. The loss from
>   cmov isn't very big (it's not as sucky as P4), but neither is the win
>   (branch misprediction isn't that expensive either).
>
> Here's an example program that you can test and time yourself.
>
> On my Core 2, I get
>
> 	[torvalds@woody ~]$ gcc -DCMOV -Wall -O2 t.c
> 	[torvalds@woody ~]$ time ./a.out
> 	600000000
>
> 	real    0m0.194s
> 	user    0m0.192s
> 	sys     0m0.000s
>
> 	[torvalds@woody ~]$ gcc -Wall -O2 t.c
> 	[torvalds@woody ~]$ time ./a.out
> 	600000000
>
> 	real    0m0.167s
> 	user    0m0.168s
> 	sys     0m0.000s
>
> ie the cmov is quite a bit slower. Maybe I did something wrong. But note
> how cmov not only is slower, it's fundamentally more limited too (ie the
> branch-over can actually do a lot of things cmov simply cannot do).
>
> So don't use cmov. Except for non-performance-critical code, or if you
> really care about code-size, and it helps (which is actually fairly rare:
> quite often cmov isn't even smaller than a conditional jump and a regular
> move, partly because a regular move can take arguments that a cmov cannot:
> move to memory, move from an immediate etc etc, so depending on what
> you're moving, cmov simply isn't good even if it's _just_ a move).
>
> (For me, the "cmov" version of the function ends up being three bytes
> shorter. So it's actually a good example of everything above)
>
> 			Linus
>
> (*) x86 only has "move to register" as a predicated instruction, but some
> other architectures have lots of them, potentially all instructions. I
> don't count conditional branches as "predicated", although some crazy
> people do. ARM has predicated instructions (but they are gone in Thumb, I
> think), and ia64 obviously has predicated instructions (but it will be
> gone in a few years ;)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 14:28         ` Alan
@ 2007-01-03 16:06           ` Linus Torvalds
  0 siblings, 0 replies; 60+ messages in thread
From: Linus Torvalds @ 2007-01-03 16:06 UTC (permalink / raw)
  To: Alan
  Cc: Arjan van de Ven, Grzegorz Kulewski, Mikael Pettersson, s0348365,
	76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang



On Wed, 3 Jan 2007, Alan wrote:
>
> > cmov is effectively the same cost as a compare and jump, in both cases
> > the cpu needs to do a prediction, and on a mispredict, restart.
> 
> On a P4 it appears to be slower than compare/jump in most cases

On just about EVERYTHING it's slower than compare/jump. See my other post 
on why, together with a (largely untested) test app.

		Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 10:32   ` Grzegorz Kulewski
  2007-01-03 11:51     ` Jeff Garzik
  2007-01-03 12:44     ` Alan
@ 2007-01-03 16:03     ` Linus Torvalds
  2007-01-03 17:01       ` l.genoni
                         ` (5 more replies)
  2 siblings, 6 replies; 60+ messages in thread
From: Linus Torvalds @ 2007-01-03 16:03 UTC (permalink / raw)
  To: Grzegorz Kulewski
  Cc: Alan, Mikael Pettersson, s0348365, 76306.1226, akpm, bunk, greg,
	linux-kernel, yanmin_zhang

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4765 bytes --]



On Wed, 3 Jan 2007, Grzegorz Kulewski wrote:
> 
> Could you explain why CMOV is pointless now? Are there any benchmarks proving
> that?

CMOV (and, more generically, any "predicated instruction") tends to 
generally be a bad idea on an aggressively out-of-order CPU. It doesn't 
always have to be horrible, but in practice it is seldom very nice, and 
(as usual) on the P4 it can be really quite bad.

On a P4, I think a cmov basically takes 10 cycles.

But even ignoring the usual P4 "I suck at things that aren't totally 
normal", cmov is actually not a great idea. You can always replace it by

		j<negated condition> forward
		mov ..., %reg
	forward:

and assuming the branch is AT ALL predictable (and 95+% of all branches 
are), the branch-over will actually be a LOT better for a CPU.

Why? Because branches can be predicted, and when they are predicted they 
basically go away. They go away on many levels, too. Not just the branch 
itself, but the _conditional_ for the branch goes away as far as the 
critical path of code is concerned: the CPU still has to calculate it and 
check it, but from a performance angle it "doesn't exist any more", 
because it's not holding anything else up (well, you want to do it in 
_some_ reasonable time, but the point stands..)

Similarly, whichever side of the branch wasn't taken goes away. Again, in 
an out-of-order machine with register renaming, this means that even if 
the branch isn't taken above, and you end up executing all the non-branch 
instructions, because you now UNCONDITIONALLY over-write the register, the 
old data in the register is now DEAD, so now all the OTHER writes to that 
register are off the critical path too!

So the end result is that with a conditional branch, on a good CPU, the 
_only_ part of the code that is actually performance-sensitive is the 
actual calculation of the value that gets used!

In contrast, if you use a predicated instruction, ALL of it is on the 
critical path. Calculating the conditional is on the critical path. 
Calculating the value that gets used is obviously ALSO on the critical 
path, but so is the calculation for the value that DOESN'T get used too. 
So the cmov - rather than speeding things up - actually slows things down, 
because it makes more code be dependent on each other.

So here's the basic rule:

 - cmov is sometimes nice for code density. It's not a big win, but it 
   certainly can be a win.

 - if you KNOW the branch is totally unpredictable, cmov is often good for 
   performance. But a compiler almost never knows that, and even if you 
   train it with input data and profiling, remember that not very many 
   branches _are_ totally unpredictable, so even if you were to know that 
   something is unpredictable, it's going to be very rare.

 - on a P4, branch mispredictions are expensive, but so is cmov, so all 
   the above is to some degree exaggerated. On nicer microarchitectures 
   (the Intel Core 2 in particular is something I have to say is very nice 
   indeed), the difference will be a lot less noticeable. The loss from 
   cmov isn't very big (it's not as sucky as P4), but neither is the win 
   (branch misprediction isn't that expensive either).

Here's an example program that you can test and time yourself. 

On my Core 2, I get

	[torvalds@woody ~]$ gcc -DCMOV -Wall -O2 t.c
	[torvalds@woody ~]$ time ./a.out
	600000000

	real    0m0.194s
	user    0m0.192s
	sys     0m0.000s

	[torvalds@woody ~]$ gcc -Wall -O2 t.c
	[torvalds@woody ~]$ time ./a.out
	600000000
	
	real    0m0.167s
	user    0m0.168s
	sys     0m0.000s

ie the cmov is quite a bit slower. Maybe I did something wrong. But note 
how cmov not only is slower, it's fundamentally more limited too (ie the 
branch-over can actually do a lot of things cmov simply cannot do).

So don't use cmov. Except for non-performance-critical code, or if you 
really care about code-size, and it helps (which is actually fairly rare: 
quite often cmov isn't even smaller than a conditional jump and a regular 
move, partly because a regular move can take arguments that a cmov cannot: 
move to memory, move from an immediate etc etc, so depending on what 
you're moving, cmov simply isn't good even if it's _just_ a move).

(For me, the "cmov" version of the function ends up being three bytes 
shorter. So it's actually a good example of everything above)

			Linus

(*) x86 only has "move to register" as a predicated instruction, but some 
other architectures have lots of them, potentially all instructions. I 
don't count conditional branches as "predicated", although some crazy 
people do. ARM has predicated instructions (but they are gone in Thumb, I 
think), and ia64 obviously has predicated instructions (but it will be 
gone in a few years ;)

[-- Attachment #2: Type: TEXT/PLAIN, Size: 806 bytes --]

#include <stdio.h>

/* How many iterations? */
#define ITERATIONS (100000000)

/* Which bit of the counter to test? */
#define BIT 1

#ifdef CMOV

#define choose(i, a, b) ({			\
	unsigned long result;			\
	asm("testl %1,%2 ; cmovne %3,%0"	\
		:"=r" (result)			\
		:"i" (BIT),			\
		 "g" (i),			\
		 "rm" (a),			\
		 "0" (b));			\
	result; })

#else

#define choose(i, a, b) ({			\
	unsigned long result;			\
	asm("testl %1,%2 ; je 1f ; mov %3,%0\n1:"	\
		:"=r" (result)			\
		:"i" (BIT),			\
		 "g" (i),			\
		 "g" (a),			\
		 "0" (b));			\
	result; })

#endif

int main(int argc, char **argv)
{
	int i;
	unsigned long sum = 0;

	for (i = 0; i < ITERATIONS; i++) {
		unsigned long a = 5, b = 7;
		sum += choose(i, a, b);
	}
	printf("%lu\n", sum);
	return 0;
}

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 13:32       ` Arjan van de Ven
  2007-01-03 13:58         ` Jakub Jelinek
@ 2007-01-03 14:28         ` Alan
  2007-01-03 16:06           ` Linus Torvalds
  1 sibling, 1 reply; 60+ messages in thread
From: Alan @ 2007-01-03 14:28 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Grzegorz Kulewski, Mikael Pettersson, s0348365, torvalds,
	76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang

> cmov is effectively the same cost as a compare and jump, in both cases
> the cpu needs to do a prediction, and on a mispredict, restart.

On a P4 it appears to be slower than compare/jump in most cases


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 13:32       ` Arjan van de Ven
@ 2007-01-03 13:58         ` Jakub Jelinek
  2007-01-03 14:28         ` Alan
  1 sibling, 0 replies; 60+ messages in thread
From: Jakub Jelinek @ 2007-01-03 13:58 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Alan, Grzegorz Kulewski, Mikael Pettersson, s0348365, torvalds,
	76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang

On Wed, Jan 03, 2007 at 05:32:16AM -0800, Arjan van de Ven wrote:
> On Wed, 2007-01-03 at 12:44 +0000, Alan wrote:
> > > > fixed. At that point an i686 kernel would contain i686 instructions and
> > > > actually run on all i686 processors ending all the i586 pain for most
> > > > users and distributions.
> > > 
> > > Could you explain why CMOV is pointless now? Are there any benchmarks 
> > > proving that?
> > 
> > Take a look at the recent ffmpeg bits on the mplayer list for one example
> > I have to hand - P4 cmov is pretty slow. The crypto folks find the same
> > things.
> 
> cmov is effectively the same cost as a compare and jump, in both cases
> the cpu needs to do a prediction, and on a mispredict, restart.
> 
> the reason cmov can make sense is because it's smaller code...

BTW, from GCC POV availability of CMOV is the only difference between
-march=i586 -mtune=something and -march=i686 -mtune=something.  So this is
just a naming thing, it could be called -march=i686cmov to make it more
obvious but it is too late (and too unimportant) to change it now.
Perhaps adding a note to info gcc/man gcc ought to be enough?
If you don't want CMOV being emitted, compile with -march=i586 -mtune=generic
(or whatever other tuning you pick up), with -march=i686 -mtune=generic
you tell GCC you have CMOV.  Whether CMOV is actually used in generated
code is another matter, which should be decided based on the selected
-mtune.  For -Os CMOV should be used whenever available, as that means
usually smaller code, otherwise if on some particular chip CMOV is actually
slower than compare, jump and assignment, then CMOV should not be selected
for that particular tuning (say if Pentium4 has slower CMOV than
compare+jump+assignment, -mtune=pentium4 should not emit CMOV, at least not
often), if you have examples of that, please file a bug to
http://gcc.gnu.org/bugzilla/.  -mtune=generic should emit resp. not emit
CMOV depending on whether it is a win on the currently common CPUs.

	Jakub

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 12:44     ` Alan
@ 2007-01-03 13:32       ` Arjan van de Ven
  2007-01-03 13:58         ` Jakub Jelinek
  2007-01-03 14:28         ` Alan
  0 siblings, 2 replies; 60+ messages in thread
From: Arjan van de Ven @ 2007-01-03 13:32 UTC (permalink / raw)
  To: Alan
  Cc: Grzegorz Kulewski, Mikael Pettersson, s0348365, torvalds,
	76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang

On Wed, 2007-01-03 at 12:44 +0000, Alan wrote:
> > > fixed. At that point an i686 kernel would contain i686 instructions and
> > > actually run on all i686 processors ending all the i586 pain for most
> > > users and distributions.
> > 
> > Could you explain why CMOV is pointless now? Are there any benchmarks 
> > proving that?
> 
> Take a look at the recent ffmpeg bits on the mplayer list for one example
> I have to hand - P4 cmov is pretty slow. The crypto folks find the same
> things.

cmov is effectively the same cost as a compare and jump, in both cases
the cpu needs to do a prediction, and on a mispredict, restart.

the reason cmov can make sense is because it's smaller code...

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 10:32   ` Grzegorz Kulewski
  2007-01-03 11:51     ` Jeff Garzik
@ 2007-01-03 12:44     ` Alan
  2007-01-03 13:32       ` Arjan van de Ven
  2007-01-03 16:03     ` Linus Torvalds
  2 siblings, 1 reply; 60+ messages in thread
From: Alan @ 2007-01-03 12:44 UTC (permalink / raw)
  To: Grzegorz Kulewski
  Cc: Mikael Pettersson, s0348365, torvalds, 76306.1226, akpm, bunk,
	greg, linux-kernel, yanmin_zhang

> > fixed. At that point an i686 kernel would contain i686 instructions and
> > actually run on all i686 processors ending all the i586 pain for most
> > users and distributions.
> 
> Could you explain why CMOV is pointless now? Are there any benchmarks 
> proving that?

Take a look at the recent ffmpeg bits on the mplayer list for one example
I have to hand - P4 cmov is pretty slow. The crypto folks find the same
things.

Alan


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 10:32   ` Grzegorz Kulewski
@ 2007-01-03 11:51     ` Jeff Garzik
  2007-01-03 12:44     ` Alan
  2007-01-03 16:03     ` Linus Torvalds
  2 siblings, 0 replies; 60+ messages in thread
From: Jeff Garzik @ 2007-01-03 11:51 UTC (permalink / raw)
  To: Grzegorz Kulewski
  Cc: Alan, Mikael Pettersson, s0348365, torvalds, 76306.1226, akpm,
	bunk, greg, linux-kernel, yanmin_zhang

Grzegorz Kulewski wrote:
> On Wed, 3 Jan 2007, Alan wrote:
>> The proper fix for all of this mess is to fix the gcc compiler suite to
>> actually generate i686 code when told to use i686. CMOV is an optional
>> i686 extension which gcc uses without checking. In early PIV days it made
>> sense but on modern processors CMOV is so pointless the bug should be
>> fixed. At that point an i686 kernel would contain i686 instructions and
>> actually run on all i686 processors ending all the i586 pain for most
>> users and distributions.
> 
> Could you explain why CMOV is pointless now? Are there any benchmarks 
> proving that?

In theory modern processors should have no trouble converting a 
test/move sequence into the same uops generated by a cmov instruction, 
for one.

	Jeff




^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03 10:29 ` Alan
@ 2007-01-03 10:32   ` Grzegorz Kulewski
  2007-01-03 11:51     ` Jeff Garzik
                       ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread
From: Grzegorz Kulewski @ 2007-01-03 10:32 UTC (permalink / raw)
  To: Alan
  Cc: Mikael Pettersson, s0348365, torvalds, 76306.1226, akpm, bunk,
	greg, linux-kernel, yanmin_zhang

On Wed, 3 Jan 2007, Alan wrote:
> The proper fix for all of this mess is to fix the gcc compiler suite to
> actually generate i686 code when told to use i686. CMOV is an optional
> i686 extension which gcc uses without checking. In early PIV days it made
> sense but on modern processors CMOV is so pointless the bug should be
> fixed. At that point an i686 kernel would contain i686 instructions and
> actually run on all i686 processors ending all the i586 pain for most
> users and distributions.

Could you explain why CMOV is pointless now? Are there any benchmarks 
proving that?


Thanks,

Grzegorz Kulewski


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03  2:12 Mikael Pettersson
  2007-01-03  2:20 ` Alistair John Strachan
  2007-01-03  5:55 ` Willy Tarreau
@ 2007-01-03 10:29 ` Alan
  2007-01-03 10:32   ` Grzegorz Kulewski
  2 siblings, 1 reply; 60+ messages in thread
From: Alan @ 2007-01-03 10:29 UTC (permalink / raw)
  To: Mikael Pettersson
  Cc: s0348365, torvalds, 76306.1226, akpm, bunk, greg, linux-kernel,
	yanmin_zhang

> That's a good suggestion. Earlier C3s didn't have cmov so it's 
> not entirely unlikely that cmov in C3-2 is broken in some cases.
> Configuring for P5MMX or 486 should be good safe alternatives.

The proper fix for all of this mess is to fix the gcc compiler suite to
actually generate i686 code when told to use i686. CMOV is an optional
i686 extension which gcc uses without checking. In early PIV days it made
sense but on modern processors CMOV is so pointless the bug should be
fixed. At that point an i686 kernel would contain i686 instructions and
actually run on all i686 processors ending all the i586 pain for most
users and distributions.

Unfortunately the compiler people don't appear to care about their years
old bug.

Alan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03  2:12 Mikael Pettersson
  2007-01-03  2:20 ` Alistair John Strachan
@ 2007-01-03  5:55 ` Willy Tarreau
  2007-01-03 10:29 ` Alan
  2 siblings, 0 replies; 60+ messages in thread
From: Willy Tarreau @ 2007-01-03  5:55 UTC (permalink / raw)
  To: Mikael Pettersson
  Cc: s0348365, torvalds, 76306.1226, akpm, bunk, greg, linux-kernel,
	yanmin_zhang

On Wed, Jan 03, 2007 at 03:12:13AM +0100, Mikael Pettersson wrote:
> On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
> > > The suggestions I've had so far which I have not yet tried:
> > > 
> > > -	Select a different x86 CPU in the config.
> > > 		-	Unfortunately the C3-2 flags seem to simply tell GCC
> > > 			to schedule for ppro (like i686) and enabled MMX and SSE
> > > 		-	Probably useless
> > 
> > Actually, try this one. Try using something that doesn't like "cmov". 
> > Maybe the C3-2 simply has some internal cmov bugginess. 
> 
> That's a good suggestion. Earlier C3s didn't have cmov so it's 
> not entirely unlikely that cmov in C3-2 is broken in some cases.

Agreed! When I developed the cmov emulator, I used an early C3 for the
tests (well, a "Samuel2" to be precise), because it did not report "cmov"
in its flags. I first thought "wow, my emulator is amazingly fast!" because
it took something like 50 cycles to do cmovne %eax,%ebx.

Then I realized that this processor performed cmov itself between
registers, and only triggered the invalid opcode when one of the operands
was a memory reference. And this time, for a hard-coded instruction, it
was really slow...

For this reason, I would not be surprised at all that there would be some
buggy behaviour in the cmov right there. Maybe a bug in the decoder unit
making it skip a byte when the next instruction in the prefetch queue is
a cmov affecting same registers... When vendors can do dirty things such
as executing unsupported instructions, we can expect anything from them.

> Configuring for P5MMX or 486 should be good safe alternatives.

I generally use the P5MMX target for such processors.

> /Mikael

Regards,
Willy


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-03  2:12 Mikael Pettersson
@ 2007-01-03  2:20 ` Alistair John Strachan
  2007-01-05 15:53   ` Alistair John Strachan
  2007-01-03  5:55 ` Willy Tarreau
  2007-01-03 10:29 ` Alan
  2 siblings, 1 reply; 60+ messages in thread
From: Alistair John Strachan @ 2007-01-03  2:20 UTC (permalink / raw)
  To: Mikael Pettersson
  Cc: torvalds, 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang

On Wednesday 03 January 2007 02:12, Mikael Pettersson wrote:
> On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
> > > The suggestions I've had so far which I have not yet tried:
> > >
> > > -	Select a different x86 CPU in the config.
> > > 		-	Unfortunately the C3-2 flags seem to simply tell GCC
> > > 			to schedule for ppro (like i686) and enabled MMX and SSE
> > > 		-	Probably useless
> >
> > Actually, try this one. Try using something that doesn't like "cmov".
> > Maybe the C3-2 simply has some internal cmov bugginess.
>
> That's a good suggestion. Earlier C3s didn't have cmov so it's
> not entirely unlikely that cmov in C3-2 is broken in some cases.
> Configuring for P5MMX or 486 should be good safe alternatives.

Or just C3 (not C3-2), which is what I've done.

I'll report back whether it crashes or not.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
@ 2007-01-03  2:12 Mikael Pettersson
  2007-01-03  2:20 ` Alistair John Strachan
                   ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread
From: Mikael Pettersson @ 2007-01-03  2:12 UTC (permalink / raw)
  To: s0348365, torvalds
  Cc: 76306.1226, akpm, bunk, greg, linux-kernel, yanmin_zhang

On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
> > The suggestions I've had so far which I have not yet tried:
> > 
> > -	Select a different x86 CPU in the config.
> > 		-	Unfortunately the C3-2 flags seem to simply tell GCC
> > 			to schedule for ppro (like i686) and enabled MMX and SSE
> > 		-	Probably useless
> 
> Actually, try this one. Try using something that doesn't like "cmov". 
> Maybe the C3-2 simply has some internal cmov bugginess. 

That's a good suggestion. Earlier C3s didn't have cmov so it's 
not entirely unlikely that cmov in C3-2 is broken in some cases.
Configuring for P5MMX or 486 should be good safe alternatives.

/Mikael

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-02 23:41               ` D. Hazelton
@ 2007-01-03  2:05                 ` Horst H. von Brand
  0 siblings, 0 replies; 60+ messages in thread
From: Horst H. von Brand @ 2007-01-03  2:05 UTC (permalink / raw)
  To: D. Hazelton
  Cc: Adrian Bunk, Alistair John Strachan, Zhang, Yanmin, LKML,
	Greg KH, Chuck Ebbert, Linus Torvalds, Andrew Morton

D. Hazelton <dhazelton@enter.net> wrote:

[...]

> None. I didn't file a report on this because I didn't find the bug, just
> noted a problem that appears to occur. In this case the calls generated
> seem to wrap loops - something I've never heard of anyone doing.

Example code showing this weirdness?

>                                                                  These
> *might* be causing the off-by-one that is causing the function to
> re-enter in the middle of an instruction.

If something like this happened, programs would be crashing left and right.

> Seeing this I'd guess that this follows for all system-level code
> generated by 4.1.1

Define "system-level code". What makes it different from, say,
run-of-the-mill compiler code (yes, gcc compiles itself as part of its
sanity checking)?

>                    and this is exactly what I was reporting. If you'd
> like I'll go dig up the dumps he posted and post the two related segments
> side-by-side to give you a better example what I'm referring to.

If the related segments show code that is somehow wrong, by all means
report it /with your detailed analysis/ to the compiler people. Just a
warning, gcc is pretty smart in what it does, its code is often surprising
to the unwashed. Also, the C standard is subtle; the error might be in an
unwarranted assumption in the source code.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: kernel + gcc 4.1 = several problems
  2007-01-02 23:18             ` Alistair John Strachan
@ 2007-01-03  1:43               ` Linus Torvalds
  0 siblings, 0 replies; 60+ messages in thread
From: Linus Torvalds @ 2007-01-03  1:43 UTC (permalink / raw)
  To: Alistair John Strachan
  Cc: Adrian Bunk, Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert, Andrew Morton



On Tue, 2 Jan 2007, Alistair John Strachan wrote:
>
> eax: 00000008   ebx: 00000000   ecx: 00000008   edx: 00000000
> esi: f70f3e9c   edi: f7017c00   ebp: f70f3c1c   esp: f70f3c0c
>
> Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8 
> 8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f 
> 45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
> EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:f70f3c0c
> 
> Chuck observed that the kernel tries to reenter pipe_poll half way through an 
> instruction (c0156f5f->c0156f60); it's not a single-bit error but an 
> off-by-one.

It's not an off-by-one either (eg say we're taking an exception and 
screwing up %eip by one somehow).

The code sequence in question is

	mov    %ecx,%edx
	mov    0x6c(%esi),%eax
	or     $0x10,%edx
	cmp    0x168(%edi),%eax		<--
	cmovne %edx,%ecx
	jmp    ...

and it's in the second byte of the "cmp".

And yes, it definitely entered there, because trying other random 
entry-points will have either invalid instructions or instructions that 
would fault due to NULL pointers. HOWEVER, it's also not as simple as 
"took an interrupt, and returned with %eip incremented by one", because 
your %edx is zero, so it won't have done that "or $0x10,%edx" and then some 
interrupt happened and screwed up just %eip.

So it's literally a random %eip, but since you say it's consistently in 
that function, it's not truly "random". There's something that triggers it 
just _there_.

However, that's a damn simple function. There's _nothing_ there. The 
particular code that is involved right there is literally

	if (!pipe->writers && filp->f_version != pipe->w_counter)
		mask |= POLLHUP;

and that's it.  There's not even anything half-way interesting around it, 
except for the "poll_wait()" call, but even that is about as common as
you can humanly get..

Looking at the register set and the stack, I see:

	Stack:	00000000
		00000000  <- saved %ebx (dunno, seems dead in caller)
		f70f3e9c  <- saved %esi (== pollfd in do_pollfd)
		f6e111c0  <- saved %edi	(== filp)
		f70f3fa4  <- outer EBP (looks reasonable) 
		c015d7f3  <- return address (do_sys_poll+0x253/0x480)

and the strange thing is that when the oops happens, it really looks like 
%esi _still_ contains the value it had originally (and that is saved on 
the stack). But afaik, from your disassembly, it should have been 
overwritten by the initial %eax, which should have had the same value as 
%edi on entry...

IOW, none of it really makes any sense. The stack frames look fine, so we 
_did_ enter at the beginning of the function (and it wasn't the *poll fn 
pointer that was corrupt).

> The suggestions I've had so far which I have not yet tried:
> 
> -	Select a different x86 CPU in the config.
> 		-	Unfortunately the C3-2 flags seem to simply tell GCC
> 			to schedule for ppro (like i686) and enabled MMX and SSE
> 		-	Probably useless

Actually, try this one. Try using something that doesn't like "cmov". 
Maybe the C3-2 simply has some internal cmov bugginess. 

		Linus

* Re: kernel + gcc 4.1 = several problems
  2007-01-02 23:24             ` Adrian Bunk
@ 2007-01-02 23:41               ` D. Hazelton
  2007-01-03  2:05                 ` Horst H. von Brand
  0 siblings, 1 reply; 60+ messages in thread
From: D. Hazelton @ 2007-01-02 23:41 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Alistair John Strachan, Zhang, Yanmin, LKML, Greg KH,
	Chuck Ebbert, Linus Torvalds, Andrew Morton

On Tuesday 02 January 2007 18:24, you wrote:
> On Tue, Jan 02, 2007 at 05:06:14PM -0500, D. Hazelton wrote:
> > On Tuesday 02 January 2007 16:56, Alistair John Strachan wrote:
> > > On Tuesday 02 January 2007 21:10, Adrian Bunk wrote:
> > > [snip]
> > >
> > > > > > Comparing your report and [1], it seems that if these are the
> > > > > > same problem, it's not a hardware bug but a gcc or kernel bug.
> > > > >
> > > > > This bug specifically indicates some kind of miscompilation in a
> > > > > driver, causing boot time hangs. My problem is quite different, and
> > > > > more subtle. The crash happens in the same place every time, which
> > > > > does suggest determinism (even with various options toggled on and
> > > > > off, and a 300K smaller kernel image), but it takes 8-12 hours to
> > > > > manifest and only happens with GCC 4.1.1. ...
> > > >
> > > > Sorry if my point goes a bit away from your problem:
> > > >
> > > > My point is that we have several reported problems only visible
> > > > with gcc 4.1.
> > > >
> > > > Other bug reports are e.g. [2] and [3], but they are only present
> > > > with using gcc 4.1 _and_ using -Os.
> > >
> > > I find [2] most compelling, and I can confirm that I do have the same
> > > problem with or without optimisation for size. I don't use selinux nor
> > > has it ever been enabled.
> > >
> > > At any rate, I have absolute confirmation that it is GCC 4.1.1, because
> > > with GCC 3.4.6 the same kernel I reported booting three days ago is
> > > still cheerfully working. I regularly get uptimes of 60+ days on that
> > > machine, rebooting only for kernel upgrades. 2.6.19 seems to be no
> > > worse in this regard.
> > >
> > > Perhaps fortunately, the configs I've tried have consistently failed to
> > > shake the crash, so I have a semi-reproducible test case here on C3-2
> > > hardware if somebody wants to investigate the problem (though it still
> > > takes 6-12 hours).
> >
> > The GCC code generator appears to have been rewritten between 3.4.6 and
> > 4.1.1....
> >
> > I took a look at the dump he posted and there are some minor and some
> > massive differences between the code. In one case some of the code is
> > swapped, in another there is code in the 3.4.6 version that isn't in the
> > 4.1.1... Finally the 4.1.1 version of the function has what appears to be
> > function calls and these don't appear in the code generated by 3.4.6
>
> Differences are expected since we disable unit-at-a-time for gcc < 4
> and gcc development didn't stall between 3.4 and 4.1.

Okay. Thing is that these noted differences, aside from where 4.1.1 doesn't 
generate an opcode that 3.4.6 does, aren't all that fatal, IMHO. The fact 
that it generates calls rather than jumps for local pointer moves 
(IIRC - been a while since I looked at the dump of pipe_poll that he 
provided) might be part of the problem.

> > In other words - the code generation for 4.1.1 appears to be broken when
> > it comes to generating system code.
>
> Bug number for an either already open or created by you bug in the gcc
> Bugzilla for what you claim to be a bug in gcc?

None. I didn't file a report on this because I didn't find the bug, just noted 
a problem that appears to occur. In this case the calls generated seem to 
wrap loops - something I've never heard of anyone doing. These *might* be 
causing the off-by-one that is causing the function to re-enter in the middle 
of an instruction.

Seeing this I'd guess that this follows for all system-level code generated by 
4.1.1 and this is exactly what I was reporting. If you'd like I'll go dig up 
the dumps he posted and post the two related segments side-by-side to give 
you a better example of what I'm referring to.

DRH

* Re: kernel + gcc 4.1 = several problems
  2007-01-02 22:06           ` D. Hazelton
@ 2007-01-02 23:24             ` Adrian Bunk
  2007-01-02 23:41               ` D. Hazelton
  0 siblings, 1 reply; 60+ messages in thread
From: Adrian Bunk @ 2007-01-02 23:24 UTC (permalink / raw)
  To: D. Hazelton
  Cc: Alistair John Strachan, Zhang, Yanmin, LKML, Greg KH,
	Chuck Ebbert, Linus Torvalds, Andrew Morton

On Tue, Jan 02, 2007 at 05:06:14PM -0500, D. Hazelton wrote:
> On Tuesday 02 January 2007 16:56, Alistair John Strachan wrote:
> > On Tuesday 02 January 2007 21:10, Adrian Bunk wrote:
> > [snip]
> >
> > > > > Comparing your report and [1], it seems that if these are the same
> > > > > problem, it's not a hardware bug but a gcc or kernel bug.
> > > >
> > > > This bug specifically indicates some kind of miscompilation in a
> > > > driver, causing boot time hangs. My problem is quite different, and
> > > > more subtle. The crash happens in the same place every time, which does
> > > > suggest determinism (even with various options toggled on and off, and
> > > > a 300K smaller kernel image), but it takes 8-12 hours to manifest and
> > > > only happens with GCC 4.1.1. ...
> > >
> > > Sorry if my point goes a bit away from your problem:
> > >
> > > My point is that we have several reported problems only visible
> > > with gcc 4.1.
> > >
> > > Other bug reports are e.g. [2] and [3], but they are only present with
> > > using gcc 4.1 _and_ using -Os.
> >
> > I find [2] most compelling, and I can confirm that I do have the same
> > problem with or without optimisation for size. I don't use selinux nor has
> > it ever been enabled.
> >
> > At any rate, I have absolute confirmation that it is GCC 4.1.1, because
> > with GCC 3.4.6 the same kernel I reported booting three days ago is still
> > cheerfully working. I regularly get uptimes of 60+ days on that machine,
> > rebooting only for kernel upgrades. 2.6.19 seems to be no worse in this
> > regard.
> >
> > Perhaps fortunately, the configs I've tried have consistently failed to
> > shake the crash, so I have a semi-reproducible test case here on C3-2
> > hardware if somebody wants to investigate the problem (though it still
> > takes 6-12 hours).
> 
> The GCC code generator appears to have been rewritten between 3.4.6 and 
> 4.1.1....
> 
> I took a look at the dump he posted and there are some minor and some massive 
> differences between the code. In one case some of the code is swapped, in 
> another there is code in the 3.4.6 version that isn't in the 4.1.1... Finally 
> the 4.1.1 version of the function has what appears to be function calls and 
> these don't appear in the code generated by 3.4.6

Differences are expected since we disable unit-at-a-time for gcc < 4 
and gcc development didn't stall between 3.4 and 4.1.

> In other words - the code generation for 4.1.1 appears to be broken when it 
> comes to generating system code.

Bug number for an either already open or created by you bug in the gcc 
Bugzilla for what you claim to be a bug in gcc?

> DRH

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


* Re: kernel + gcc 4.1 = several problems
  2007-01-02 22:13           ` Linus Torvalds
@ 2007-01-02 23:18             ` Alistair John Strachan
  2007-01-03  1:43               ` Linus Torvalds
  0 siblings, 1 reply; 60+ messages in thread
From: Alistair John Strachan @ 2007-01-02 23:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Adrian Bunk, Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert, Andrew Morton

Linus,

On Tuesday 02 January 2007 22:13, Linus Torvalds wrote:
[snip]
> What are the exact crash details? That might narrow things down enough
> that maybe you could try just one or two files that are "suspect".

I'll do a digest of the problem for you and anybody else that's lost track of 
the debugging story so far..

There are no hardware problems evidenced by any testing I have performed 
(memtest, prime95 CPU torture tests, temp monitors). Furthermore, kernels 
compiled with older GCCs have been running without problems for literally 
years on this machine.

Here is an example of an oops. The kernel continued to limp along after this.

BUG: unable to handle kernel NULL pointer dereference at virtual address 
00000009
 printing eip:
c0156f60
*pde = 00000000
Oops: 0002 [#1]
Modules linked in: ipt_recent ipt_REJECT xt_tcpudp ipt_MASQUERADE iptable_nat 
xt_state iptable_filter ip_tables x_tables prism54 yenta_socket 
rsrc_nonstatic pcmcia_core snd_via82xx snd_ac97_codec snd_ac97_bus snd_pcm 
snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd soundcore ehci_hcd 
usblp eth1394 uhci_hcd usbcore ohci1394 ieee1394 via_agp agpgart vt1211 
hwmon_vid hwmon ip_nat_ftp ip_nat ip_conntrack_ftp ip_conntrack
CPU:    0
EIP:    0060:[<c0156f60>]    Not tainted VLI
EFLAGS: 00010246   (2.6.19.1 #1)
EIP is at pipe_poll+0xa0/0xb0
eax: 00000008   ebx: 00000000   ecx: 00000008   edx: 00000000
esi: f70f3e9c   edi: f7017c00   ebp: f70f3c1c   esp: f70f3c0c
ds: 007b   es: 007b   ss: 0068
Process python (pid: 4178, ti=f70f2000 task=f70c4a90 task.ti=f70f2000)
Stack: 00000000 00000000 f70f3e9c f6e111c0 f70f3fa4 c015d7f3 f70f3c54 f70f3fac
       084c44a0 00000030 084c44d0 00000000 f70f3e94 f70f3e94 00000006 f70f3ecc
       00000000 f70f3e94 c015e580 00000000 00000000 00000006 f6e111c0 00000000
Call Trace:
 [<c015d7f3>] do_sys_poll+0x253/0x480
 [<c015da53>] sys_poll+0x33/0x50
 [<c0102c97>] syscall_call+0x7/0xb
 [<b7f6b402>] 0xb7f6b402
 =======================
Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8 
8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f 
45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:f70f3c0c

Chuck observed that the kernel tries to reenter pipe_poll half way through an 
instruction (c0156f5f->c0156f60); it's not a single-bit error but an 
off-by-one.

On Wednesday 20 December 2006 20:48, Chuck Ebbert wrote:
> In-Reply-To: <200612201421.03514.s0348365@sms.ed.ac.uk>
>
> On Wed, 20 Dec 2006 14:21:03 +0000, Alistair John Strachan wrote:
> > Any ideas?
> >
> > BUG: unable to handle kernel NULL pointer dereference at virtual address
> > 00000009
>
>     83 ca 10                  or     $0x10,%edx
>     3b                        .byte 0x3b
>     87 68 01                  xchg   %ebp,0x1(%eax)   <=====
>     00 00                     add    %al,(%eax)
>
> Somehow it is trying to execute code in the middle of an instruction.
> That almost never works, even when the resulting fragment is a legal
> opcode. :)
>
> The real instruction is:
>
>     3b 87 68 01 00 00           cmp    0x168(%edi),%eax

I've tried a multitude of kernel configs and compiler options, but none have 
made any difference. That first oops was pretty lucky; very often the machine 
locks up after oopsing (panic_on_oops=1 doesn't work). I've not seen oopses 
anywhere but in pipe_poll, but I've not seen many oopses.

The machine runs jabberd 2.x which uses separate python processes as 
transports to different networks. The server hosts 50-100 users. One of my 
oops reports had Java crashing in the same place, that's Azureus.

I've got binutils 2.17, gcc 4.1.1 hand bootstrapped from GNU sources (not 
distro versions). I've got another, secondary compiler (3.4.6), also compiled 
from GNU sources, installed elsewhere which I have used to build working 
kernels. So the only variable, for sure, is GCC itself.

Both compilers were built with "make bootstrap" and I built binutils with the 
resulting GCC, and GCC with the resulting binutils, just to be sure. The only 
slightly non-standard thing I do is to compile everything (GCC, binutils, the 
kernels) on a dual-opteron box, inside a 32bit chroot, which is rsync'ed over 
to the Via C3-2 box with the problem. I can't see how this would cause any 
problems (and indeed have done it successfully for years), but I thought I'd 
point it out.

The crashes take time to appear, which is why so many people suspected 
hardware initially. But the uptime of a GCC 4.1.1 kernel will always be less 
than 12 hours, where a 3.4.6 kernel will survive for months. I've had no 
other mysterious software crashes, ever.

On Sunday 31 December 2006 22:16, Alistair John Strachan wrote:
> On Sunday 31 December 2006 21:43, Chuck Ebbert wrote:
> > Those were compiled without frame pointers.  Can you post them compiled
> > with frame pointers so they match your original bug report? And confirm
> > that pipe_poll() is still at 0xc0156ec0 in vmlinux?
>
> c0156ec0 <pipe_poll>:
>
> I used the config I originally sent you to rebuild it again. This time I've
> put up the whole vmlinux for both kernels, the config is replaced, the
> decompilation is re-done, I've confirmed the offset in the GCC 4.1.1 kernel
> is identical. Sorry for the confusion.
[snip]
> http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/

At the above URL can be found vmlinux images, the config used to build both, 
and decompilations of the fs/pipe.o file (with relocation information).

The suggestions I've had so far which I have not yet tried:

-	Select a different x86 CPU in the config.
		-	Unfortunately the C3-2 flags seem to simply tell GCC
			to schedule for ppro (like i686) and enabled MMX and SSE
		-	Probably useless

-	Enable as many debug options as possible ("a shot in the dark")

-	Try compiling a minimal kernel config, sans modules that are not required
	for booting. The problem with this one (whilst it might uncover some bizarre
	memory scribbling or stack corruption) is that the machine's primary role is
	that of a router, so I require most of the modules loaded for the oops to be
	reproduced (chicken, egg?).

If I can provide any more information, please do let me know.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

* Re: kernel + gcc 4.1 = several problems
  2007-01-02 22:01         ` Linus Torvalds
@ 2007-01-02 23:09           ` David Rientjes
  0 siblings, 0 replies; 60+ messages in thread
From: David Rientjes @ 2007-01-02 23:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Adrian Bunk, Alistair John Strachan, Zhang, Yanmin, LKML,
	Greg KH, Chuck Ebbert, Andrew Morton

On Tue, 2 Jan 2007, Linus Torvalds wrote:

> Traditionally, afaik, -Os has tended to show compiler problems that 
> _could_ happen with -O2 too, but never do in practice. It may be that 
> gcc-4.1 without -Os miscompiles some very unusual code, and then with -Os 
> we just hit more cases of that.
> 

gcc optimizations were almost completely rewritten between 3.4.6 and 4.1, 
and one of the subtle changes that may have been introduced is with regard 
to the heuristics used to determine whether to inline an 'inline' function 
or not when using -Os.  This problem can show up in dynamic linking and 
break on certain architectures but should be detectable by using -Winline.

		David

* Re: kernel + gcc 4.1 = several problems
  2007-01-02 21:56         ` Alistair John Strachan
  2007-01-02 22:06           ` D. Hazelton
@ 2007-01-02 22:13           ` Linus Torvalds
  2007-01-02 23:18             ` Alistair John Strachan
  1 sibling, 1 reply; 60+ messages in thread
From: Linus Torvalds @ 2007-01-02 22:13 UTC (permalink / raw)
  To: Alistair John Strachan
  Cc: Adrian Bunk, Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert, Andrew Morton



On Tue, 2 Jan 2007, Alistair John Strachan wrote:
> 
> At any rate, I have absolute confirmation that it is GCC 4.1.1, because with 
> GCC 3.4.6 the same kernel I reported booting three days ago is still 
> cheerfully working. I regularly get uptimes of 60+ days on that machine, 
> rebooting only for kernel upgrades. 2.6.19 seems to be no worse in this 
> regard.
> 
> Perhaps fortunately, the configs I've tried have consistently failed to shake 
> the crash, so I have a semi-reproducible test case here on C3-2 hardware if 
> somebody wants to investigate the problem (though it still takes 6-12 hours).

Historically, some people have actually used horrible hacks like trying to 
figure out which particular C file gets miscompiled by basically having 
both compilers installed, and then trying out different subdirectories 
with different compilers. And once the subdirectory has been pinpointed, 
pinpointing which particular file it is.. etc.

Pretty damn horrible to do, and I'm afraid we don't have any real helpful 
scripts to do any of the work for you. So it's all effectively manual 
(basically boils down to: "compile everything with known-good compiler. 
Then replace the good compiler with the bad one, remove the object files 
from one directory, and recompile the kernel". "Rinse and repeat").

I don't think anybody has ever done that with something where triggering 
the cause then also takes that long - that just ends up making the whole 
thing even more painful. 

What are the exact crash details? That might narrow things down enough 
that maybe you could try just one or two files that are "suspect".

		Linus

* Re: kernel + gcc 4.1 = several problems
  2007-01-02 21:56         ` Alistair John Strachan
@ 2007-01-02 22:06           ` D. Hazelton
  2007-01-02 23:24             ` Adrian Bunk
  2007-01-02 22:13           ` Linus Torvalds
  1 sibling, 1 reply; 60+ messages in thread
From: D. Hazelton @ 2007-01-02 22:06 UTC (permalink / raw)
  To: Alistair John Strachan
  Cc: Adrian Bunk, Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert,
	Linus Torvalds, Andrew Morton

On Tuesday 02 January 2007 16:56, Alistair John Strachan wrote:
> On Tuesday 02 January 2007 21:10, Adrian Bunk wrote:
> [snip]
>
> > > > Comparing your report and [1], it seems that if these are the same
> > > > problem, it's not a hardware bug but a gcc or kernel bug.
> > >
> > > This bug specifically indicates some kind of miscompilation in a
> > > driver, causing boot time hangs. My problem is quite different, and
> > > more subtle. The crash happens in the same place every time, which does
> > > suggest determinism (even with various options toggled on and off, and
> > > a 300K smaller kernel image), but it takes 8-12 hours to manifest and
> > > only happens with GCC 4.1.1. ...
> >
> > Sorry if my point goes a bit away from your problem:
> >
> > My point is that we have several reported problems only visible
> > with gcc 4.1.
> >
> > Other bug reports are e.g. [2] and [3], but they are only present with
> > using gcc 4.1 _and_ using -Os.
>
> I find [2] most compelling, and I can confirm that I do have the same
> problem with or without optimisation for size. I don't use selinux nor has
> it ever been enabled.
>
> At any rate, I have absolute confirmation that it is GCC 4.1.1, because
> with GCC 3.4.6 the same kernel I reported booting three days ago is still
> cheerfully working. I regularly get uptimes of 60+ days on that machine,
> rebooting only for kernel upgrades. 2.6.19 seems to be no worse in this
> regard.
>
> Perhaps fortunately, the configs I've tried have consistently failed to
> shake the crash, so I have a semi-reproducible test case here on C3-2
> hardware if somebody wants to investigate the problem (though it still
> takes 6-12 hours).

The GCC code generator appears to have been rewritten between 3.4.6 and 
4.1.1....

I took a look at the dump he posted and there are some minor and some massive 
differences between the code. In one case some of the code is swapped, in 
another there is code in the 3.4.6 version that isn't in the 4.1.1... Finally 
the 4.1.1 version of the function has what appears to be function calls and 
these don't appear in the code generated by 3.4.6

In other words - the code generation for 4.1.1 appears to be broken when it 
comes to generating system code.

DRH

* Re: kernel + gcc 4.1 = several problems
  2007-01-02 21:10       ` kernel + gcc 4.1 = several problems Adrian Bunk
  2007-01-02 21:56         ` Alistair John Strachan
@ 2007-01-02 22:01         ` Linus Torvalds
  2007-01-02 23:09           ` David Rientjes
  1 sibling, 1 reply; 60+ messages in thread
From: Linus Torvalds @ 2007-01-02 22:01 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Alistair John Strachan, Zhang, Yanmin, LKML, Greg KH,
	Chuck Ebbert, Andrew Morton



On Tue, 2 Jan 2007, Adrian Bunk wrote:
> 
> My point is that we have several reported problems only visible
> with gcc 4.1.
> 
> Other bug reports are e.g. [2] and [3], but they are only present with
> using gcc 4.1 _and_ using -Os.

Traditionally, afaik, -Os has tended to show compiler problems that 
_could_ happen with -O2 too, but never do in practice. It may be that 
gcc-4.1 without -Os miscompiles some very unusual code, and then with -Os 
we just hit more cases of that.

That said, I think gcc-4.1.1 is very common - I know it's the Fedora 
compiler. Also, CC_OPTIMIZE_FOR_SIZE defaults to 'y' if you have 
EXPERIMENTAL on, and from all the bug-reports about other features that 
are marked EXPERIMENTAL, I know that a lot of people do seem to select for 
it. So I would expect that gcc-4.1.1 and -Os is actually a fairly common 
combination. I just checked, and it's what I use personally, for example.

Of course, my main machine is an x86-64, and it has more registers. At 
least some historical -Os bug was about bad things happening under 
register pressure, iirc, and so x86-64 would show fewer problems than 
regular 32-bit x86 (which has far fewer registers for the compiler to 
use).

It is a bit worrisome. These things seem to be about 50:50 real kernel 
bugs (just hidden by some common code generation sequence) and real 
honest-to-goodness compiler bugs. But they are hard as hell to find.

		Linus

* Re: kernel + gcc 4.1 = several problems
  2007-01-02 21:10       ` kernel + gcc 4.1 = several problems Adrian Bunk
@ 2007-01-02 21:56         ` Alistair John Strachan
  2007-01-02 22:06           ` D. Hazelton
  2007-01-02 22:13           ` Linus Torvalds
  2007-01-02 22:01         ` Linus Torvalds
  1 sibling, 2 replies; 60+ messages in thread
From: Alistair John Strachan @ 2007-01-02 21:56 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert, Linus Torvalds,
	Andrew Morton

On Tuesday 02 January 2007 21:10, Adrian Bunk wrote:
[snip]
> > > Comparing your report and [1], it seems that if these are the same
> > > problem, it's not a hardware bug but a gcc or kernel bug.
> >
> > This bug specifically indicates some kind of miscompilation in a driver,
> > causing boot time hangs. My problem is quite different, and more subtle.
> > The crash happens in the same place every time, which does suggest
> > determinism (even with various options toggled on and off, and a 300K
> > smaller kernel image), but it takes 8-12 hours to manifest and only
> > happens with GCC 4.1.1. ...
>
> Sorry if my point goes a bit away from your problem:
>
> My point is that we have several reported problems only visible
> with gcc 4.1.
>
> Other bug reports are e.g. [2] and [3], but they are only present with
> using gcc 4.1 _and_ using -Os.

I find [2] most compelling, and I can confirm that I do have the same problem 
with or without optimisation for size. I don't use selinux nor has it ever 
been enabled.

At any rate, I have absolute confirmation that it is GCC 4.1.1, because with 
GCC 3.4.6 the same kernel I reported booting three days ago is still 
cheerfully working. I regularly get uptimes of 60+ days on that machine, 
rebooting only for kernel upgrades. 2.6.19 seems to be no worse in this 
regard.

Perhaps fortunately, the configs I've tried have consistently failed to shake 
the crash, so I have a semi-reproducible test case here on C3-2 hardware if 
somebody wants to investigate the problem (though it still takes 6-12 hours).

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

* kernel + gcc 4.1 = several problems
  2006-12-31 16:55     ` Alistair John Strachan
@ 2007-01-02 21:10       ` Adrian Bunk
  2007-01-02 21:56         ` Alistair John Strachan
  2007-01-02 22:01         ` Linus Torvalds
  0 siblings, 2 replies; 60+ messages in thread
From: Adrian Bunk @ 2007-01-02 21:10 UTC (permalink / raw)
  To: Alistair John Strachan
  Cc: Zhang, Yanmin, LKML, Greg KH, Chuck Ebbert, Linus Torvalds,
	Andrew Morton

On Sun, Dec 31, 2006 at 04:55:51PM +0000, Alistair John Strachan wrote:
> On Sunday 31 December 2006 16:27, Adrian Bunk wrote:
> > On Sat, Dec 30, 2006 at 04:59:35PM +0000, Alistair John Strachan wrote:
> > > On Thursday 28 December 2006 04:14, Alistair John Strachan wrote:
> > > > On Thursday 28 December 2006 04:02, Alistair John Strachan wrote:
> > > > > On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote:
> > > > > [snip]
> > > > >
> > > > > > > Here's a current decompilation of vmlinux/pipe_poll() from the
> > > > > > > running kernel, the addresses have changed slightly. There's no
> > > > > > > xchg there either:
> > > > > >
> > > > > > Could you reproduce the bug by the new kernel, so we could get the
> > > > > > exact address and instruction of the bug?
> > > > >
> > > > > It crashed again, but this time with no output (machine locked
> > > > > solid). To be honest, the disassembly looks right (it's like Chuck
> > > > > said, it's jumping back half way through an instruction):
> > > > >
> > > > > c0156f5f:       3b 87 68 01 00 00       cmp    0x168(%edi),%eax
> > > > >
> > > > > So c0156f60 is 87 68 01 00 00..
> > > > >
> > > > > This is with the GCC recompile, so it's not a distro problem. It
> > > > > could still either be GCC 4.x, or a 2.6.19.1 specific bug, but it's
> > > > > serious. 2.6.19 with GCC 3.4.3 is 100% stable.
> > > >
> > > > Looks like a similar crash here:
> > > >
> > > > http://ubuntuforums.org/showthread.php?p=1803389
> > >
> > > I've eliminated 2.6.19.1 as the culprit, and also tried toggling
> > > "optimize for size", various debug options. 2.6.19 compiled with GCC
> > > 4.1.1 on a Via Nehemiah C3-2 seems to crash in pipe_poll reliably,
> > > within approximately 12 hours.
> > >
> > > The machine passes 6 hours of Prime95 (a CPU stability tester), four
> > > memtest86 passes, and there are no heat problems.
> > >
> > > I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config
> > > using this compiler (but the same binutils), and will report back if it
> > > crashes. My bet is that it won't, however.
> >
> > There are occasional reports of problems with kernels compiled with
> > gcc 4.1 that vanish when using older versions of gcc.
> >
> > AFAIK, until now no one has ever debugged whether that's a gcc bug,
> > gcc exposing a kernel bug or gcc exposing a hardware bug.
> >
> > Comparing your report and [1], it seems that if these are the same
> > problem, it's not a hardware bug but a gcc or kernel bug.
> 
> This bug specifically indicates some kind of miscompilation in a driver, 
> causing boot time hangs. My problem is quite different, and more subtle. The 
> crash happens in the same place every time, which does suggest determinism 
> (even with various options toggled on and off, and a 300K smaller kernel 
> image), but it takes 8-12 hours to manifest and only happens with GCC 4.1.1.
>...

Sorry if my point goes a bit away from your problem:

My point is that we have several reported problems only visible
with gcc 4.1.

Other bug reports are e.g. [2] and [3], but they are only present with
using gcc 4.1 _and_ using -Os.

There's simply a bunch of bugs only present with gcc 4.1, and what 
worries me most is that the estimated number of unknown cases is most 
likely very high since most people won't check different compiler 
versions when running into a problem.

> Cheers,
> Alistair.

cu
Adrian

[1] http://bugzilla.kernel.org/show_bug.cgi?id=7176
[2] http://bugzilla.kernel.org/show_bug.cgi?id=7106
[3] https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=186852

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


end of thread, other threads:[~2007-01-26 22:05 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-01-04  7:11 kernel + gcc 4.1 = several problems Albert Cahalan
2007-01-04 16:43 ` Segher Boessenkool
2007-01-04 17:04   ` Albert Cahalan
2007-01-04 17:24     ` Segher Boessenkool
2007-01-04 17:47       ` Linus Torvalds
2007-01-04 18:53         ` Segher Boessenkool
2007-01-04 19:10         ` Al Viro
2007-01-05 17:17       ` Pavel Machek
2007-01-06  8:23         ` Segher Boessenkool
2007-01-04 17:37     ` Linus Torvalds
2007-01-04 18:34       ` Segher Boessenkool
2007-01-04 22:02         ` Geert Bosch
2007-01-07  4:25       ` Denis Vlasenko
2007-01-07  4:45         ` Linus Torvalds
2007-01-07  5:26           ` Jeff Garzik
2007-01-07 15:10         ` Segher Boessenkool
2007-01-26 22:05           ` Michael K. Edwards
2007-01-04 18:08     ` Andreas Schwab
  -- strict thread matches above, loose matches on Subject: below --
2007-01-03  2:12 Mikael Pettersson
2007-01-03  2:20 ` Alistair John Strachan
2007-01-05 15:53   ` Alistair John Strachan
2007-01-05 16:02     ` Linus Torvalds
2007-01-05 16:19       ` Alistair John Strachan
2007-01-05 16:49         ` Linus Torvalds
2007-01-07  0:36           ` Pavel Machek
2007-01-07  0:57             ` Alistair John Strachan
2007-01-03  5:55 ` Willy Tarreau
2007-01-03 10:29 ` Alan
2007-01-03 10:32   ` Grzegorz Kulewski
2007-01-03 11:51     ` Jeff Garzik
2007-01-03 12:44     ` Alan
2007-01-03 13:32       ` Arjan van de Ven
2007-01-03 13:58         ` Jakub Jelinek
2007-01-03 14:28         ` Alan
2007-01-03 16:06           ` Linus Torvalds
2007-01-03 16:03     ` Linus Torvalds
2007-01-03 17:01       ` l.genoni
2007-01-03 17:45         ` Tim Schmielau
2007-01-03 20:24           ` Linus Torvalds
2007-01-03 17:06       ` l.genoni
2007-01-03 17:53       ` Mariusz Kozlowski
2007-01-03 19:47       ` Denis Vlasenko
2007-01-03 20:38         ` Linus Torvalds
2007-01-03 21:48           ` Denis Vlasenko
2007-01-03 22:13             ` Linus Torvalds
2007-01-03 21:44       ` Thomas Sailer
2007-01-03 22:08         ` Linus Torvalds
2007-01-04  3:08       ` Zou, Nanhai
2007-01-04 15:34         ` Linus Torvalds
2006-12-20 14:21 Oops in 2.6.19.1 Alistair John Strachan
2006-12-30 16:59 ` Alistair John Strachan
2006-12-31 16:27   ` Adrian Bunk
2006-12-31 16:55     ` Alistair John Strachan
2007-01-02 21:10       ` kernel + gcc 4.1 = several problems Adrian Bunk
2007-01-02 21:56         ` Alistair John Strachan
2007-01-02 22:06           ` D. Hazelton
2007-01-02 23:24             ` Adrian Bunk
2007-01-02 23:41               ` D. Hazelton
2007-01-03  2:05                 ` Horst H. von Brand
2007-01-02 22:13           ` Linus Torvalds
2007-01-02 23:18             ` Alistair John Strachan
2007-01-03  1:43               ` Linus Torvalds
2007-01-02 22:01         ` Linus Torvalds
2007-01-02 23:09           ` David Rientjes
