linux-kernel.vger.kernel.org archive mirror
From: "Richard B. Johnson" <root@chaos.analogic.com>
To: "J.A. Magallon" <jamagallon@able.es>
Cc: "Martin J. Bligh" <mbligh@aracnet.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	lse-tech <lse-tech@lists.sourceforge.net>
Subject: Re: gcc 2.95 vs 3.21 performance
Date: Tue, 4 Feb 2003 08:42:18 -0500 (EST)
Message-ID: <Pine.LNX.3.95.1030204083627.10035B-100000@chaos.analogic.com>
In-Reply-To: <20030204004321.GA12038@werewolf.able.es>

On Tue, 4 Feb 2003, J.A. Magallon wrote:

> 
> On 2003.02.04 Richard B. Johnson wrote:
> > On Mon, 3 Feb 2003, Martin J. Bligh wrote:
> > 
> > > People keep extolling the virtues of gcc 3.2 to me, which I'm
> > > reluctant to switch to, since it compiles so much slower. But
> > > it supposedly generates better code, so I thought I'd compile
> > > the kernel with both and compare the results. This is gcc 2.95
> > > and 3.2.1 from debian unstable on a 16-way NUMA-Q. The kernbench
> > > tests still use 2.95 for the compile-time stuff.
> > >
> > [SNIPPED tests...]
> > 
> > Don't let this get out, but egcs-2.91.66 compiled FFT code
> > runs at about 50 percent of the speed of whatever M$ uses for
> > Visual C++ Version 6.0. I was awfully disheartened when I
> > found that identical code executed twice as fast on M$ as
> > it does on Linux. I tried to isolate what was causing the
> > difference. So I replaced 'hypot()' with some 'C' code that
> > does sqrt(x^2 + y^2) just to see if it was the 'C' library.
> > It didn't help. When I find out what type (section) of code
> > is running slower, I'll report. In the meantime, it's fast
> > enough, but I don't like being beat by M$.
> > 
> 
> I face a similar problem. As everybody says that SSE is so marvelous,
> we are trying to put some SSE code in our render engine, to speed it up.
> But look at the results of the code below (box is a P4@1.8, Xeon with ht):

[SNIPPED good demo code]

I'm going to answer all the comments on this topic with just
one observation. Sorry that I don't have the time to answer
all who responded personally, but I have to take a "work break"
today and tomorrow (design review).

gcc is a marvelous compiler because it was designed
to be readily ported to different architectures. However,
it is not an optimum compiler for ix86 machines and probably
is not optimum for any one kind of machine.

I often hear complaints about the ix86 processors as being
"register starved", etc. This could not be further from
the truth. There are enough registers. However, various registers
were designed to do various things. Once you decide that
you know more than the processor developers, and start
using registers for things they were not designed for,
you start to have excellent test benchmarks, but awful
overall performance.

For example, the ECX register was designed to be used as
a counter. It can be told to decrement and perform a
conditional jump with the 'loop' instruction. The loop
instruction also comes in other flavors, such as loopz and
loopnz. Somebody decided that 'dec ecx; jnz' was faster.
They measured this to "prove" that it's faster. In the
meantime, other code suffers (stumbles) because there
was really no spare time to be grabbed. Data need to
be moved to and from memory. The instruction unit
ends up being starved while data are acquired. This
would not normally hurt anything because the RAM bandwidth
ends up being the dominant pole in the transfer function,
but you end up with something I call the "accordion problem".

I will first demonstrate the accordion problem and then
explain where it comes from. Note a smooth flow of traffic
on a highway. All the cars are traveling at the same speed.
Their speed increases until they don't dare go any faster.
They are now "bandwidth limited". Somebody sees a traffic
cop and slows down. It takes a few hundred milliseconds
for the next car to slow down. This transient moves backwards
through the line of cars until cars several miles back actually
have to perform emergency braking to stay off the bumper
ahead. Then, the cars start accelerating again. This
acceleration/deceleration ripple moves through the line of cars like the
bellows of an accordion. The average speed of the line of
traffic is now reduced even though there are oscillatory
accelerations above the speed-limit.

Now, visualize a CPU and RAM combination running in lock-step.
The speed of the execution unit is matched to the speed of the
processor I/O so the instructions are fetched and executed in
a more-or-less synchronized manner. This is like the high-speed
line of cars before somebody sees the traffic cop. Now, perturb
this execution by throwing in some faster-than-normal program
sequences. You may start the accordion effect. The problem is
that both instructions and data come through the same
hole-in-the-wall, regardless of caching. When the prefetch unit
needs more data (instructions) it must contend with the data I/O.
This may cause an oscillatory condition, actually reducing
throughput.

Anybody who uses CPUs in laboratories with sensitive receiving
equipment knows that, regardless of the FCC rules, these
machines generate great gobs of radio frequency interference.
That's why they need to be in shielded boxes. If you want
to "hear" the stumble I'm talking about, just listen to
the AM audio output using a field-intensity meter. When you
have a fast smoothly-running machine, the interference sounds
like noise. When you have the accordion effect, the interference
has a repetitive pattern to it, a tone, usually low-frequency.
If you capture enough data in a logic analyzer, you will see
the pattern and can see actual pauses in bus I/O where the
CPU just isn't doing a damn thing at all!

FYI, there is a difference in the power supply current required
to write 0xffffffff to RAM versus 0x00000000 (honest!). If you
are doing a memory test and write a pattern such that the
load on the power supply changes at a rate that disturbs
the power supply servo-loop, you can make the voltage bounce!
This has nothing to do with slow CPU execution speed, but
just demonstrates that there are a lot of interactions that
should be considered when designing or proving-out a system.
It's not just a local bench-mark that counts.
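
As a rough illustration of the kind of memory-test pattern described
above, the sketch below fills a buffer with runs of all-ones words
followed by runs of all-zeros words and then reads them back. The
names fill_alternating and verify_alternating and the 'run' parameter
are hypothetical. Whether any particular run length disturbs a given
power supply depends entirely on the hardware; the code only shows the
shape of such a pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write 'run' words of 0xffffffff, then 'run' words of 0x00000000,
 * repeating across the buffer. Returns how many all-ones words were
 * written. A longer run means a slower change in the written data. */
static size_t fill_alternating(volatile uint32_t *buf, size_t words,
                               size_t run)
{
    size_t ones = 0;

    assert(run > 0);
    for (size_t i = 0; i < words; i++) {
        uint32_t pattern = ((i / run) % 2 == 0) ? 0xffffffffu
                                                : 0x00000000u;
        buf[i] = pattern;
        if (pattern != 0)
            ones++;
    }
    return ones;
}

/* Read the pattern back. Returns the index of the first word that
 * does not match, or 'words' if the whole buffer is intact. */
static size_t verify_alternating(const volatile uint32_t *buf,
                                 size_t words, size_t run)
{
    assert(run > 0);
    for (size_t i = 0; i < words; i++) {
        uint32_t expect = ((i / run) % 2 == 0) ? 0xffffffffu
                                               : 0x00000000u;
        if (buf[i] != expect)
            return i;
    }
    return words;
}
```

Varying 'run' between passes changes how quickly the written data
flips between the two extremes, which is the knob the paragraph above
is talking about.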

The Intel compilers I have used generate code that uses
the registers just like Intel specified. They use EBX, ESI, EDI
as index registers just like the 16-bit BX, SI, DI. I have
never seen code output from an Intel 'C' compiler that uses
EAX as an index register, even though it's available and
"faster". They seem to stick with the "un-optimized" string
instructions like rep movsb, repnz cmpsb, etc., and they
use 'loop'. Maybe, just maybe, Intel knows something about
their processor that shouldn't be second-guessed by clever
programmers.
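
For readers who have not used the string instructions, here is a
minimal sketch of a byte copy built on 'rep movsb', written in C with
GCC-style inline assembly. The name copy_rep_movsb is made up, and the
plain-C fallback for non-x86 targets is an assumption added so the
file builds anywhere. It is not output from any Intel compiler; it
only shows which registers the instruction expects.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n bytes with 'rep movsb'. The instruction takes its source
 * from ESI/RSI, its destination from EDI/RDI and its count from
 * ECX/RCX, so the constraints below pin the operands there. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;

    assert(dst != NULL && src != NULL);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* The direction flag is clear here per the usual ABI, so the
     * copy runs forward. */
    __asm__ volatile(
        "rep movsb"
        : "+D"(dst), "+S"(src), "+c"(n)
        :
        : "memory");
#else
    unsigned char *d = dst;             /* fallback: plain byte loop */
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return ret;
}
```

The regions must not overlap in a way that a forward copy would
corrupt; this sketch makes no attempt to handle that case.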
 

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.


