linux-kernel.vger.kernel.org archive mirror
* gcc 2.95 vs 3.21 performance
@ 2003-02-03 23:05 Martin J. Bligh
  2003-02-03 23:22 ` [Lse-tech] " Andi Kleen
                   ` (3 more replies)
  0 siblings, 4 replies; 84+ messages in thread
From: Martin J. Bligh @ 2003-02-03 23:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: lse-tech

People keep extolling the virtues of gcc 3.2 to me, which I'm
reluctant to switch to, since it compiles so much slower. But
it supposedly generates better code, so I thought I'd compile
the kernel with both and compare the results. This is gcc 2.95
and 3.2.1 from debian unstable on a 16-way NUMA-Q. The kernbench
tests still use 2.95 for the compile-time stuff.

The results below leave me distinctly unconvinced by the supposed
merits of modern gcc's. Not really better or worse, within experimental
error. But much slower to compile things with.

Kernbench-2: (make -j N vmlinux, where N = 2 x num_cpus)
                                   Elapsed        User      System         CPU
                        2.5.59       46.08      563.88      118.38     1480.00
                 2.5.59-gcc3.2       45.86      563.63      119.58     1489.33

Kernbench-16: (make -j N vmlinux, where N = 16 x num_cpus)
                                   Elapsed        User      System         CPU
                        2.5.59       47.45      568.02      143.17     1498.17
                 2.5.59-gcc3.2       47.15      567.41      143.72     1507.50

DISCLAIMER: SPEC(tm) and the benchmark name SDET(tm) are registered
trademarks of the Standard Performance Evaluation Corporation. This 
benchmarking was performed for research purposes only, and the run results
are non-compliant and not-comparable with any published results.

Results are shown as percentages of the first set displayed

SDET 1  (see disclaimer)
                                Throughput    Std. Dev
                        2.5.59       100.0%         0.8%
                 2.5.59-gcc3.2        95.3%         5.2%

SDET 2  (see disclaimer)
                                Throughput    Std. Dev
                        2.5.59       100.0%         0.6%
                 2.5.59-gcc3.2        91.9%         7.1%

SDET 4  (see disclaimer)
                                Throughput    Std. Dev
                        2.5.59       100.0%         5.7%
                 2.5.59-gcc3.2        98.8%         5.3%

SDET 8  (see disclaimer)
                                Throughput    Std. Dev
                        2.5.59       100.0%         1.4%
                 2.5.59-gcc3.2       105.3%         4.7%

SDET 16  (see disclaimer)
                                Throughput    Std. Dev
                        2.5.59       100.0%         1.7%
                 2.5.59-gcc3.2       103.1%         1.8%

SDET 32  (see disclaimer)
                                Throughput    Std. Dev
                        2.5.59       100.0%         1.5%
                 2.5.59-gcc3.2       101.0%         1.6%

SDET 64  (see disclaimer)
                                Throughput    Std. Dev
                        2.5.59       100.0%         0.7%
                 2.5.59-gcc3.2       103.1%         1.1%

SDET 128  (see disclaimer)
                                Throughput    Std. Dev

NUMA schedbench 4:
                                   AvgUser     Elapsed   TotalUser    TotalSys
                        2.5.59        0.00       38.88       82.78        0.65
                 2.5.59-gcc3.2        0.00       41.80      107.76        0.73

NUMA schedbench 8:
                                   AvgUser     Elapsed   TotalUser    TotalSys
                        2.5.59        0.00       49.30      247.80        1.93
                 2.5.59-gcc3.2        0.00       38.00      229.83        2.11

NUMA schedbench 16:
                                   AvgUser     Elapsed   TotalUser    TotalSys
                        2.5.59        0.00       57.37      843.12        3.77
                 2.5.59-gcc3.2        0.00       57.28      839.21        2.85

NUMA schedbench 32:
                                   AvgUser     Elapsed   TotalUser    TotalSys
                        2.5.59        0.00      116.99     1805.79        6.05
                 2.5.59-gcc3.2        0.00      118.44     1788.09        6.25

NUMA schedbench 64:
                                   AvgUser     Elapsed   TotalUser    TotalSys
                        2.5.59        0.00      235.18     3632.73       15.45
                 2.5.59-gcc3.2        0.00      234.55     3633.76       15.02



------------------------------------------------------------------------------


And with the same kernel, comparing the compile times for gcc 2.95 to 3.2

Kernbench-2: (make -j N vmlinux, where N = 2 x num_cpus)
                                   Elapsed        User      System         CPU
                        gcc2.95      46.08      563.88      118.38     1480.00
                        gcc3.21      69.93      923.17      114.36     1483.17

Kernbench-16: (make -j N vmlinux, where N = 16 x num_cpus)
                                   Elapsed        User      System         CPU
                        gcc2.95      47.45      568.02      143.17     1498.17
                        gcc3.21      71.44      926.45      134.89     1485.33

pft.
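
To put a number on the compile-time gap, here is a minimal sketch that turns the Kernbench-2 figures quoted above into percentage changes. The helper names (percent_change, print_kernbench2_deltas) are made up for illustration; only the timings come from the tables. For the Kernbench-2 run it works out to roughly 52% more elapsed time and 64% more user time with gcc 3.2.1, with system time slightly lower.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage change from `before` to `after`; positive means `after`
 * is larger. Hypothetical helper, not part of kernbench. */
double percent_change(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}

/* Print the Kernbench-2 compile-time deltas (gcc 2.95 -> gcc 3.2.1)
 * using the Elapsed/User/System values from the table above. */
void print_kernbench2_deltas(void)
{
    printf("Elapsed: %+.1f%%\n", percent_change(46.08, 69.93));
    printf("User:    %+.1f%%\n", percent_change(563.88, 923.17));
    printf("System:  %+.1f%%\n", percent_change(118.38, 114.36));
}
```

Calling print_kernbench2_deltas() from main() prints the three deltas; the same helper can be pointed at the Kernbench-16 rows.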



* Re: [Lse-tech] gcc 2.95 vs 3.21 performance
  2003-02-03 23:05 gcc 2.95 vs 3.21 performance Martin J. Bligh
@ 2003-02-03 23:22 ` Andi Kleen
  2003-02-03 23:31 ` Richard B. Johnson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 84+ messages in thread
From: Andi Kleen @ 2003-02-03 23:22 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel, lse-tech

On Mon, Feb 03, 2003 at 03:05:06PM -0800, Martin J. Bligh wrote:
> The results below leave me distinctly unconvinced by the supposed
> merits of modern gcc's. Not really better or worse, within experimental
> error. But much slower to compile things with.

Curious - could you compare it with a gcc 3.3 snapshot too?

It should be even slower at compiling, but generate better code.

-Andi


* Re: gcc 2.95 vs 3.21 performance
  2003-02-03 23:05 gcc 2.95 vs 3.21 performance Martin J. Bligh
  2003-02-03 23:22 ` [Lse-tech] " Andi Kleen
@ 2003-02-03 23:31 ` Richard B. Johnson
  2003-02-04  0:43   ` J.A. Magallon
                     ` (2 more replies)
  2003-02-04 12:20 ` [Lse-tech] " Dave Jones
  2003-02-06 15:42 ` gcc -O2 vs gcc -Os performance Martin J. Bligh
  3 siblings, 3 replies; 84+ messages in thread
From: Richard B. Johnson @ 2003-02-03 23:31 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel, lse-tech

On Mon, 3 Feb 2003, Martin J. Bligh wrote:

> People keep extolling the virtues of gcc 3.2 to me, which I'm
> reluctant to switch to, since it compiles so much slower. But
> it supposedly generates better code, so I thought I'd compile
> the kernel with both and compare the results. This is gcc 2.95
> and 3.2.1 from debian unstable on a 16-way NUMA-Q. The kernbench
> tests still use 2.95 for the compile-time stuff.
>
[SNIPPED tests...]

Don't let this get out, but FFT code compiled with egcs-2.91.66
runs at about 50 percent of the speed of whatever M$ uses for
Visual C++ Version 6.0. I was awfully disheartened when I
found that identical code executed twice as fast on M$ as
it does on Linux. I tried to isolate what was causing the
difference, so I replaced 'hypot()' with some 'C' code that
does sqrt(x^2 + y^2) just to see if it was the 'C' library.
It didn't help. When I find out what type (section) of code
is running slower, I'll report. In the meantime, it's fast
enough, but I don't like being beaten by M$.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.




* Re: gcc 2.95 vs 3.21 performance
  2003-02-03 23:31 ` Richard B. Johnson
@ 2003-02-04  0:43   ` J.A. Magallon
  2003-02-04 13:42     ` Richard B. Johnson
  2003-02-04  6:54   ` Denis Vlasenko
  2003-02-04 10:57   ` Padraig
  2 siblings, 1 reply; 84+ messages in thread
From: J.A. Magallon @ 2003-02-04  0:43 UTC (permalink / raw)
  To: root; +Cc: Martin J. Bligh, linux-kernel, lse-tech


On 2003.02.04 Richard B. Johnson wrote:
> On Mon, 3 Feb 2003, Martin J. Bligh wrote:
> 
> > People keep extolling the virtues of gcc 3.2 to me, which I'm
> > reluctant to switch to, since it compiles so much slower. But
> > it supposedly generates better code, so I thought I'd compile
> > the kernel with both and compare the results. This is gcc 2.95
> > and 3.2.1 from debian unstable on a 16-way NUMA-Q. The kernbench
> > tests still use 2.95 for the compile-time stuff.
> >
> [SNIPPED tests...]
> 
> Don't let this get out, but egcs-2.91.66 compiled FFT code
> works about 50 percent of the speed of whatever M$ uses for
> Visual C++ Version 6.0  I was awfully disheartened when I
> found that identical code executed twice as fast on M$ than
> it does on Linux. I tried to isolate what was causing the
> difference. So I replaced 'hypot()' with some 'C' code that
> does sqrt(x^2 + y^2) just to see if it was the 'C' library.
> It didn't help. When I find out what type (section) of code
> is running slower, I'll report. In the meantime, it's fast
> enough, but I don't like being beat by M$.
> 

I face a similar problem. As everybody says that SSE is so marvelous,
we are trying to put some SSE code in our render engine, to speed it up.
But look at the results of the code below (box is a P4@1.8, Xeon with ht):
annwn:~/sse> ss-g
Proc std:
      5020 kticks
Proc std inline:
      4320 kticks
Proc sse:
      4290 kticks
Proc sse inline:
      3890 kticks

So what? Just around 500 kticks gained by moving to SSE? As the Computer
Architecture people at school say, it is something called 'spill code' (did I
write that right?). In short: lots of SSE but too few registers, so Intel ia32
turns into crap when you need some indexes, run out of registers, and have to
copy to and from the stack.

#include <stdlib.h>
#include <time.h>
#include <stdio.h>
#if defined(__INTEL_COMPILER)
#include <xmmintrin.h>
#endif

#define LOOPS	1000
#define SZ		100000

#if defined(__GNUC__) && defined(__SSE__)
typedef void __ve_reg __attribute__((__mode__(V4SF)));
#endif

typedef struct point point;
struct point { 
	float v[4];
};

void mulp_std(const point* a,const point* b,point* r)
{
	int i;
	for (i=0; i<4; i++)
		r->v[i] = a->v[i] * b->v[i];
}

/* static: a plain 'inline' has no external definition under C99 rules */
static inline void mulpi_std(const point* a,const point* b,point* r)
{
	int i;
	for (i=0; i<4; i++)
		r->v[i] = a->v[i] * b->v[i];
}

void mulp_sse(const point* a,const point* b,point* r)
{
#if defined(__GNUC__) && defined(__SSE__)
	__ve_reg xmm0,xmm1,xmm2;
	xmm0 = __builtin_ia32_loadups((float*)a->v);
	xmm1 = __builtin_ia32_loadups((float*)b->v);
	xmm2 = __builtin_ia32_mulps(xmm0,xmm1);
	__builtin_ia32_storeups(r->v,xmm2);
#endif
#if defined(__INTEL_COMPILER)
	__m128 xmm0,xmm1,xmm2;
	xmm0 = _mm_loadu_ps((float*)a->v);
	xmm1 = _mm_loadu_ps((float*)b->v);
	xmm2 = _mm_mul_ps(xmm0,xmm1);
	_mm_storeu_ps(r->v,xmm2);
#endif
}

/* static: a plain 'inline' has no external definition under C99 rules */
static inline void mulpi_sse(const point* a,const point* b,point* r)
{
#if defined(__GNUC__) && defined(__SSE__)
	__ve_reg xmm0,xmm1,xmm2;
	xmm0 = __builtin_ia32_loadups((float*)a->v);
	xmm1 = __builtin_ia32_loadups((float*)b->v);
	xmm2 = __builtin_ia32_mulps(xmm0,xmm1);
	__builtin_ia32_storeups(r->v,xmm2);
#endif
#if defined(__INTEL_COMPILER)
	__m128 xmm0,xmm1,xmm2;
	xmm0 = _mm_loadu_ps((float*)a->v);
	xmm1 = _mm_loadu_ps((float*)b->v);
	xmm2 = _mm_mul_ps(xmm0,xmm1);
	_mm_storeu_ps(r->v,xmm2);
#endif
}

int main(int argc, char** argv)
{
	point *a;
	point *b;
	point *c;
	int i,j;
	unsigned long t0,t1;

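	a = malloc(SZ*sizeof(point));
	b = malloc(SZ*sizeof(point));
	c = malloc(SZ*sizeof(point));
	if (!a || !b || !c)
		return 1;

	/* Initialize the inputs: malloc() memory is indeterminate, and stray
	   NaNs or denormals could distort the timings. */
	for (j=0; j<SZ; j++)
		for (i=0; i<4; i++)
			a[j].v[i] = b[j].v[i] = 1.0f;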

	printf("Proc std:\n");
	t0 = clock();
	for (i=0; i<LOOPS; i++)
	{
		for (j=0; j<SZ; j++)
			mulp_std(&a[j],&b[j],&c[j]);
		for (j=0; j<SZ; j++)
			mulp_std(&b[j],&b[j],&a[j]);
	}
	t1 = clock();
	printf("%10lu kticks\n",(t1-t0)/1000);	/* %lu matches unsigned long */

	printf("Proc std inline:\n");
	t0 = clock();
	for (i=0; i<LOOPS; i++)
	{
		for (j=0; j<SZ; j++)
			mulpi_std(&a[j],&b[j],&c[j]);
		for (j=0; j<SZ; j++)
			mulpi_std(&b[j],&b[j],&a[j]);
	}
	t1 = clock();
	printf("%10lu kticks\n",(t1-t0)/1000);	/* %lu matches unsigned long */

	printf("Proc sse:\n");
	t0 = clock();
	for (i=0; i<LOOPS; i++)
	{
		for (j=0; j<SZ; j++)
			mulp_sse(&a[j],&b[j],&c[j]);
		for (j=0; j<SZ; j++)
			mulp_sse(&b[j],&b[j],&a[j]);
	}
	t1 = clock();
	printf("%10lu kticks\n",(t1-t0)/1000);	/* %lu matches unsigned long */

	printf("Proc sse inline:\n");
	t0 = clock();
	for (i=0; i<LOOPS; i++)
	{
		for (j=0; j<SZ; j++)
			mulpi_sse(&a[j],&b[j],&c[j]);
		for (j=0; j<SZ; j++)
			mulpi_sse(&b[j],&b[j],&a[j]);
	}
	t1 = clock();
	printf("%10lu kticks\n",(t1-t0)/1000);	/* %lu matches unsigned long */

	free(c);
	free(b);
	free(a);

	return 0;
}
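
As a point of comparison, here is a hedged sketch of the same four-float multiply written with the GCC/Clang vector extension instead of the __builtin_ia32_* calls, and applied to a whole array per call so that call overhead is paid once rather than once per point. The names point4, v4sf and mul_points are assumptions for illustration, and arithmetic on vector types needs a newer compiler than the gcc 3.2.1 used above. Whether this narrows the gap between the scalar and SSE numbers is something to measure, not assume.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same layout as the `point` struct in the benchmark above. */
typedef struct { float v[4]; } point4;

/* Four packed floats, via the GCC/Clang vector_size attribute
 * (assumption: the compiler supports this extension). */
typedef float v4sf __attribute__((vector_size(16)));

/* Element-wise multiply of n points: r[i] = a[i] * b[i].
 * memcpy() expresses an unaligned load/store without aliasing
 * problems; compilers typically lower it to a single vector move. */
void mul_points(const point4 *a, const point4 *b, point4 *r, size_t n)
{
    size_t i;

    assert(a != NULL && b != NULL && r != NULL);
    for (i = 0; i < n; i++) {
        v4sf x, y, z;
        memcpy(&x, a[i].v, sizeof x);
        memcpy(&y, b[i].v, sizeof y);
        z = x * y;                      /* one packed multiply */
        memcpy(r[i].v, &z, sizeof z);
    }
}
```

In the benchmark, each inner "for (j=0; j<SZ; j++)" loop would become a single call such as mul_points(a, b, c, SZ).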


-- 
J.A. Magallon <jamagallon@able.es>      \                 Software is like sex:
werewolf.able.es                         \           It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.21-pre4-jam1 (gcc 3.2.1 (Mandrake Linux 9.1 3.2.1-5mdk))


* Re: gcc 2.95 vs 3.21 performance
  2003-02-03 23:31 ` Richard B. Johnson
  2003-02-04  0:43   ` J.A. Magallon
@ 2003-02-04  6:54   ` Denis Vlasenko
  2003-02-04  7:13     ` Martin J. Bligh
                       ` (2 more replies)
  2003-02-04 10:57   ` Padraig
  2 siblings, 3 replies; 84+ messages in thread
From: Denis Vlasenko @ 2003-02-04  6:54 UTC (permalink / raw)
  To: root, Martin J. Bligh; +Cc: linux-kernel, lse-tech

On 4 February 2003 01:31, Richard B. Johnson wrote:
> On Mon, 3 Feb 2003, Martin J. Bligh wrote:
> > People keep extolling the virtues of gcc 3.2 to me, which I'm
> > reluctant to switch to, since it compiles so much slower. But
> > it supposedly generates better code, so I thought I'd compile
> > the kernel with both and compare the results. This is gcc 2.95
> > and 3.2.1 from debian unstable on a 16-way NUMA-Q. The kernbench
> > tests still use 2.95 for the compile-time stuff.
>
> [SNIPPED tests...]

What was the size of uncompressed kernel binaries?
This is a simple (and somewhat inaccurate) measure of compiler
improvement ;)

> Don't let this get out, but egcs-2.91.66 compiled FFT code
> works about 50 percent of the speed of whatever M$ uses for
> Visual C++ Version 6.0  I was awfully disheartened when I

Yes. M$ (and some other compilers) beat GCC badly.

> found that identical code executed twice as fast on M$ than
> it does on Linux. I tried to isolate what was causing the
> difference. So I replaced 'hypot()' with some 'C' code that
> does sqrt(x^2 + y^2) just to see if it was the 'C' library.
> It didn't help. When I find out what type (section) of code
> is running slower, I'll report. In the meantime, it's fast
> enough, but I don't like being beat by M$.

I'm afraid it's the code generation engine. It is just worse than
M$'s or Intel's. It is not easily fixable;
GCC folks have a tremendous task at hand.

I wonder whether some big companies supposedly supporting
Linux (e.g. Intel) can help the GCC team (for example by giving
away some code and/or developer time).
--
vda


* Re: gcc 2.95 vs 3.21 performance
  2003-02-04  6:54   ` Denis Vlasenko
@ 2003-02-04  7:13     ` Martin J. Bligh
  2003-02-04 12:25       ` Adrian Bunk
  2003-02-04  9:54     ` Bryan Andersen
  2003-02-04 19:09     ` Timothy D. Witham
  2 siblings, 1 reply; 84+ messages in thread
From: Martin J. Bligh @ 2003-02-04  7:13 UTC (permalink / raw)
  To: vda; +Cc: linux-kernel, lse-tech

> I'm afraid it's the code generation engine. It is just worse than
> M$'s or Intel's. It is not easily fixable;
> GCC folks have a tremendous task at hand.
>
> I wonder whether some big companies supposedly supporting
> Linux (e.g. Intel) can help the GCC team (for example by giving
> away some code and/or developer time).

Comparing Intel's compiler vs GCC on Linux would be more interesting.
Anyone got a copy and some time to burn?

M.




* Re: gcc 2.95 vs 3.21 performance
  2003-02-04  6:54   ` Denis Vlasenko
  2003-02-04  7:13     ` Martin J. Bligh
@ 2003-02-04  9:54     ` Bryan Andersen
  2003-02-04 15:46       ` Martin J. Bligh
  2003-02-04 19:09     ` Timothy D. Witham
  2 siblings, 1 reply; 84+ messages in thread
From: Bryan Andersen @ 2003-02-04  9:54 UTC (permalink / raw)
  To: linux-kernel; +Cc: vda, root, Martin J. Bligh, lse-tech


Personal opinion here but I know it is also held by many developers I
know and work with.  I'd rather have a compiler that produces correct
and fast code but runs slowly than one that produces slow or bad code and
runs fast.  Remember compilation is done far less often than run time
execution.  Yes I too noticed a difference when I switched over to 3.2
but I also noticed some of my code speed up.

>>>People keep extolling the virtues of gcc 3.2 to me, which I'm
>>>reluctant to switch to, since it compiles so much slower. But
>>>it supposedly generates better code, so I thought I'd compile
>>>the kernel with both and compare the results. This is gcc 2.95
>>>and 3.2.1 from debian unstable on a 16-way NUMA-Q. The kernbench
>>>tests still use 2.95 for the compile-time stuff.
>>
>>[SNIPPED tests...]
> 
> 
> What was the size of uncompressed kernel binaries?
> This is a simple (and somewhat inaccurate) measure of compiler
> improvement ;)

While I too like smaller tighter output code, I'd trade it for code that
runs faster in real world situations.  As an example, identifying the
most likely execution path through a routine and keeping it contiguous
in memory will do more for average execution speed than optimizing to
use the smallest number of bytes.  If the compiler could tell which
blocks of code are for handling exceptions it could then place them outside
of the main execution path.  This makes the normal code execution path
smaller and more compact.  In doing so it also reduces the number of
memory fetch operations and cache space needed to run the code.  With
cache misses being 100+ clock cycles and page faults well into the
millions, keeping that normal execution path short means a lot.
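
One way to give the compiler that information by hand is a branch hint. The sketch below uses the GCC/Clang builtin __builtin_expect, wrapped in likely()/unlikely() macros in the same spirit as the kernel's own; the function sum_checked and its error convention are invented for illustration. Whether the hinted branch actually gets moved out of line depends on the compiler version and optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hints built on the GCC/Clang builtin __builtin_expect. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sum the entries of buf; a negative entry is treated as a rare
 * error. Returns 0 on success (sum stored in *out), -1 on error. */
int sum_checked(const int *buf, size_t n, long *out)
{
    long total = 0;
    size_t i;

    assert(buf != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        if (unlikely(buf[i] < 0))
            return -1;          /* cold path: hinted as rarely taken */
        total += buf[i];        /* hot path stays straight-line */
    }
    *out = total;
    return 0;
}
```

The hint does not change what the code computes, only how the compiler is encouraged to lay it out, so a wrong hint costs speed rather than correctness.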

>>Don't let this get out, but egcs-2.91.66 compiled FFT code
>>works about 50 percent of the speed of whatever M$ uses for
>>Visual C++ Version 6.0  I was awfully disheartened when I
> 
> Yes. M$ (and some other compilers) beat GCC badly.

But can M$'s compiler produce code for many radically different CPU
architectures?  Most people only work with gcc on one type of CPU so
they never think about just how flexible and good GCC really is.  I see
it often compared against compilers that are dedicated to a single CPU
where the development team only has to worry about one CPU type.  GCC's
development team needs to worry about many different architectures.  Some
are radically different in their fundamental structure.  This really
complicates the job of producing a compiler that works correctly.

- Bryan





* Re: gcc 2.95 vs 3.21 performance
  2003-02-03 23:31 ` Richard B. Johnson
  2003-02-04  0:43   ` J.A. Magallon
  2003-02-04  6:54   ` Denis Vlasenko
@ 2003-02-04 10:57   ` Padraig
  2003-02-04 13:11     ` Helge Hafting
  2 siblings, 1 reply; 84+ messages in thread
From: Padraig @ 2003-02-04 10:57 UTC (permalink / raw)
  To: root; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1156 bytes --]

Richard B. Johnson wrote:
> On Mon, 3 Feb 2003, Martin J. Bligh wrote:
> 
>>People keep extolling the virtues of gcc 3.2 to me, which I'm
>>reluctant to switch to, since it compiles so much slower. But
>>it supposedly generates better code, so I thought I'd compile
>>the kernel with both and compare the results. This is gcc 2.95
>>and 3.2.1 from debian unstable on a 16-way NUMA-Q. The kernbench
>>tests still use 2.95 for the compile-time stuff.
>>
> 
> [SNIPPED tests...]
> 
> Don't let this get out, but egcs-2.91.66 compiled FFT code
> works about 50 percent of the speed of whatever M$ uses for
> Visual C++ Version 6.0

Interesting. I just noticed that I get a 50% decrease in
the speed of my program if I just insert a printf(), i.e.
my program is like:

printf()
for(;;) {
     do_sorting_loop_test();
}

If I remove the initial printf it doubles in speed?
I assume this is some weird caching thing?
gcc is 3.2.1 (same happens for 2.95..)

<boggle>
Note this is with -O3. If I don't specify -O then
leaving the printf in speeds things up by about 15%
</boggle>

attached is the assembly for the slow and fast
in case anyone's interested.

Pádraig.

[-- Attachment #2: slow.s --]
[-- Type: text/plain, Size: 4466 bytes --]

	.file	"testfunc.c"
.globl TEST_NUMBER
	.data
	.align 2
	.type	TEST_NUMBER,@object
	.size	TEST_NUMBER,2
TEST_NUMBER:
	.value	256
.globl count
	.align 4
	.type	count,@object
	.size	count,4
count:
	.long	0
.globl exit_flag
	.align 4
	.type	exit_flag,@object
	.size	exit_flag,4
exit_flag:
	.long	0
	.align 4
	.type	throttle_print.0,@object
	.size	throttle_print.0,4
throttle_print.0:
	.long	0
	.section	.rodata.str1.1,"aMS",@progbits,1
.LC0:
	.string	"\033[H\033[2J"
	.section	.rodata.str1.32,"aMS",@progbits,1
	.align 32
.LC3:
	.string	"\nAdding & dropping random array elements,(from a set of 000..%03u)\n"
	.section	.rodata.str1.1
.LC4:
	.string	"Ctrl C to exit"
	.section	.rodata.str1.32
	.align 32
.LC1:
	.string	"\n%lu array elements randomly dropped and added in %lus"
	.align 32
.LC2:
	.string	" (%lu/s)\n                                                                          \n"
	.text
	.p2align 2,,3
.globl main
	.type	main,@function
main:
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%edi
	pushl	%esi
	pushl	%ebx
	subl	$12, %esp
	andl	$-16, %esp
	cmpl	$1, 8(%ebp)
	movl	$1, %edi
	jle	.L2
	pushl	$0
	pushl	$10
	pushl	$0
	movl	12(%ebp), %eax
	pushl	4(%eax)
	call	__strtol_internal
	addl	$16, %esp
	testl	%eax, %eax
	jle	.L2
	movw	%ax, TEST_NUMBER
.L2:
	subl	$12, %esp
	pushl	$.LC0
	call	printf
	popl	%eax
	pushl	stdout
	call	fflush
	movzwl	TEST_NUMBER, %edx
	sall	$1, %edx
	movl	%edx, (%esp)
	call	malloc
	movl	%eax, %esi
	movl	$0, (%esp)
	call	time
	popl	%ebx
	movl	%eax, start
	popl	%eax
	pushl	$exit_info_sig
	pushl	$2
	call	signal
	xorl	%edx, %edx
	movw	TEST_NUMBER, %cx
	addl	$16, %esp
	cmpw	%cx, %dx
	jae	.L24
.L10:
	movzwl	%dx, %ebx
	movw	%dx, (%esi,%ebx,2)
	incl	%edx
	cmpw	%cx, %dx
	jb	.L10
	.p2align 2,,3
.L24:
	incl	count
	call	rand
	movw	TEST_NUMBER, %bx
	movzwl	%bx, %edx
	movl	%edx, %ecx
	cltd
	idivl	%ecx
	cmpw	%bx, %dx
	movl	%edx, %ecx
	jae	.L27
	.p2align 2,,3
.L18:
	movzwl	%cx, %edx
	incl	%ecx
	movw	(%esi,%edx,2), %ax
	cmpw	%bx, %cx
	movw	%ax, -2(%esi,%edx,2)
	jb	.L18
.L27:
	leal	-1(%ebx), %ecx
	subl	$8, %esp
	movzwl	%cx, %edx
	pushl	%edx
	pushl	%esi
	call	GetLowestValueAvailable
	movzwl	TEST_NUMBER, %edx
	movw	%ax, -2(%esi,%edx,2)
	movl	exit_flag, %eax
	addl	$16, %esp
	testl	%eax, %eax
	jne	.L28
	testl	%edi, %edi
	je	.L24
	subl	$8, %esp
	leal	-1(%edx), %ebx
	pushl	%ebx
	pushl	$.LC3
	call	printf
	xorl	%edi, %edi
	movl	$.LC4, (%esp)
	call	puts
	addl	$16, %esp
	jmp	.L24
.L28:
	subl	$12, %esp
	pushl	$0
	call	time
	movl	%eax, %esi
	addl	$12, %esp
	subl	start, %esi
	pushl	%esi
	pushl	count
	pushl	$.LC1
	call	printf
	popl	%eax
	popl	%edx
	movl	count, %eax
	xorl	%edx, %edx
	divl	%esi
	pushl	%eax
	pushl	$.LC2
	call	printf
	movl	$1, (%esp)
	call	exit
.Lfe1:
	.size	main,.Lfe1-main
	.p2align 2,,3
.globl RemoveNumber
	.type	RemoveNumber,@function
RemoveNumber:
	pushl	%ebp
	movl	%esp, %ebp
	movl	12(%ebp), %ecx
	cmpw	TEST_NUMBER, %cx
	pushl	%ebx
	movl	8(%ebp), %ebx
	jae	.L69
	.p2align 2,,3
.L67:
	movzwl	%cx, %edx
	movw	(%ebx,%edx,2), %ax
	movw	%ax, -2(%ebx,%edx,2)
	incl	%ecx
	cmpw	TEST_NUMBER, %cx
	jb	.L67
.L69:
	popl	%ebx
	leave
	ret
.Lfe2:
	.size	RemoveNumber,.Lfe2-RemoveNumber
	.section	.rodata.str1.1
.LC5:
	.string	"\033[H"
.LC6:
	.string	"%03d "
	.text
	.p2align 2,,3
.globl printArray
	.type	printArray,@function
printArray:
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%esi
	pushl	%ebx
	subl	$12, %esp
	pushl	$.LC5
	movl	8(%ebp), %esi
	call	printf
	popl	%eax
	pushl	stdout
	xorl	%ebx, %ebx
	call	fflush
	addl	$16, %esp
	cmpw	TEST_NUMBER, %bx
	jb	.L75
.L77:
	leal	-8(%ebp), %esp
	popl	%ebx
	popl	%esi
	leave
	ret
	.p2align 2,,3
.L75:
	movzwl	%bx, %ecx
	subl	$8, %esp
	movzwl	(%esi,%ecx,2), %edx
	pushl	%edx
	pushl	$.LC6
	incl	%ebx
	call	printf
	addl	$16, %esp
	cmpw	TEST_NUMBER, %bx
	jb	.L75
	jmp	.L77
.Lfe3:
	.size	printArray,.Lfe3-printArray
	.p2align 2,,3
.globl exit_info
	.type	exit_info,@function
exit_info:
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ebx
	subl	$16, %esp
	pushl	$0
	call	time
	movl	%eax, %ebx
	addl	$12, %esp
	subl	start, %ebx
	pushl	%ebx
	pushl	count
	pushl	$.LC1
	call	printf
	popl	%eax
	popl	%edx
	movl	count, %eax
	xorl	%edx, %edx
	divl	%ebx
	pushl	%eax
	pushl	$.LC2
	call	printf
	movl	$1, (%esp)
	call	exit
.Lfe4:
	.size	exit_info,.Lfe4-exit_info
	.p2align 2,,3
.globl exit_info_sig
	.type	exit_info_sig,@function
exit_info_sig:
	pushl	%ebp
	movl	%esp, %ebp
	movl	$1, exit_flag
	leave
	ret
.Lfe5:
	.size	exit_info_sig,.Lfe5-exit_info_sig
	.comm	start,4,4
	.ident	"GCC: (GNU) 3.2.1 20021207 (Red Hat Linux 8.0 3.2.1-2)"

[-- Attachment #3: fast.s --]
[-- Type: text/plain, Size: 4339 bytes --]

	.file	"testfunc.c"
.globl TEST_NUMBER
	.data
	.align 2
	.type	TEST_NUMBER,@object
	.size	TEST_NUMBER,2
TEST_NUMBER:
	.value	256
.globl count
	.align 4
	.type	count,@object
	.size	count,4
count:
	.long	0
.globl exit_flag
	.align 4
	.type	exit_flag,@object
	.size	exit_flag,4
exit_flag:
	.long	0
	.align 4
	.type	throttle_print.0,@object
	.size	throttle_print.0,4
throttle_print.0:
	.long	0
	.section	.rodata.str1.32,"aMS",@progbits,1
	.align 32
.LC2:
	.string	"\nAdding & dropping random array elements,(from a set of 000..%03u)\n"
	.section	.rodata.str1.1,"aMS",@progbits,1
.LC3:
	.string	"Ctrl C to exit"
	.section	.rodata.str1.32
	.align 32
.LC0:
	.string	"\n%lu array elements randomly dropped and added in %lus"
	.align 32
.LC1:
	.string	" (%lu/s)\n                                                                          \n"
	.text
	.p2align 2,,3
.globl main
	.type	main,@function
main:
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%edi
	pushl	%esi
	pushl	%ebx
	subl	$12, %esp
	andl	$-16, %esp
	cmpl	$1, 8(%ebp)
	movl	$1, %edi
	jle	.L2
	pushl	$0
	pushl	$10
	pushl	$0
	movl	12(%ebp), %eax
	pushl	4(%eax)
	call	__strtol_internal
	addl	$16, %esp
	testl	%eax, %eax
	jle	.L2
	movw	%ax, TEST_NUMBER
.L2:
	movzwl	TEST_NUMBER, %edx
	subl	$12, %esp
	sall	$1, %edx
	pushl	%edx
	call	malloc
	movl	%eax, %esi
	movl	$0, (%esp)
	call	time
	popl	%ebx
	movl	%eax, start
	popl	%eax
	pushl	$exit_info_sig
	pushl	$2
	call	signal
	xorl	%edx, %edx
	movw	TEST_NUMBER, %cx
	addl	$16, %esp
	cmpw	%cx, %dx
	jae	.L24
.L10:
	movzwl	%dx, %ebx
	movw	%dx, (%esi,%ebx,2)
	incl	%edx
	cmpw	%cx, %dx
	jb	.L10
	.p2align 2,,3
.L24:
	incl	count
	call	rand
	movw	TEST_NUMBER, %bx
	movzwl	%bx, %edx
	movl	%edx, %ecx
	cltd
	idivl	%ecx
	cmpw	%bx, %dx
	movl	%edx, %ecx
	jae	.L27
	.p2align 2,,3
.L18:
	movzwl	%cx, %edx
	incl	%ecx
	movw	(%esi,%edx,2), %ax
	cmpw	%bx, %cx
	movw	%ax, -2(%esi,%edx,2)
	jb	.L18
.L27:
	leal	-1(%ebx), %ecx
	subl	$8, %esp
	movzwl	%cx, %edx
	pushl	%edx
	pushl	%esi
	call	GetLowestValueAvailable
	movzwl	TEST_NUMBER, %edx
	movw	%ax, -2(%esi,%edx,2)
	movl	exit_flag, %eax
	addl	$16, %esp
	testl	%eax, %eax
	jne	.L28
	testl	%edi, %edi
	je	.L24
	subl	$8, %esp
	leal	-1(%edx), %ebx
	pushl	%ebx
	pushl	$.LC2
	call	printf
	xorl	%edi, %edi
	movl	$.LC3, (%esp)
	call	puts
	addl	$16, %esp
	jmp	.L24
.L28:
	subl	$12, %esp
	pushl	$0
	call	time
	movl	%eax, %esi
	addl	$12, %esp
	subl	start, %esi
	pushl	%esi
	pushl	count
	pushl	$.LC0
	call	printf
	popl	%eax
	popl	%edx
	movl	count, %eax
	xorl	%edx, %edx
	divl	%esi
	pushl	%eax
	pushl	$.LC1
	call	printf
	movl	$1, (%esp)
	call	exit
.Lfe1:
	.size	main,.Lfe1-main
	.p2align 2,,3
.globl RemoveNumber
	.type	RemoveNumber,@function
RemoveNumber:
	pushl	%ebp
	movl	%esp, %ebp
	movl	12(%ebp), %ecx
	cmpw	TEST_NUMBER, %cx
	pushl	%ebx
	movl	8(%ebp), %ebx
	jae	.L69
	.p2align 2,,3
.L67:
	movzwl	%cx, %edx
	movw	(%ebx,%edx,2), %ax
	movw	%ax, -2(%ebx,%edx,2)
	incl	%ecx
	cmpw	TEST_NUMBER, %cx
	jb	.L67
.L69:
	popl	%ebx
	leave
	ret
.Lfe2:
	.size	RemoveNumber,.Lfe2-RemoveNumber
	.section	.rodata.str1.1
.LC4:
	.string	"\033[H"
.LC5:
	.string	"%03d "
	.text
	.p2align 2,,3
.globl printArray
	.type	printArray,@function
printArray:
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%esi
	pushl	%ebx
	subl	$12, %esp
	pushl	$.LC4
	movl	8(%ebp), %esi
	call	printf
	popl	%eax
	pushl	stdout
	xorl	%ebx, %ebx
	call	fflush
	addl	$16, %esp
	cmpw	TEST_NUMBER, %bx
	jb	.L75
.L77:
	leal	-8(%ebp), %esp
	popl	%ebx
	popl	%esi
	leave
	ret
	.p2align 2,,3
.L75:
	movzwl	%bx, %ecx
	subl	$8, %esp
	movzwl	(%esi,%ecx,2), %edx
	pushl	%edx
	pushl	$.LC5
	incl	%ebx
	call	printf
	addl	$16, %esp
	cmpw	TEST_NUMBER, %bx
	jb	.L75
	jmp	.L77
.Lfe3:
	.size	printArray,.Lfe3-printArray
	.p2align 2,,3
.globl exit_info
	.type	exit_info,@function
exit_info:
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ebx
	subl	$16, %esp
	pushl	$0
	call	time
	movl	%eax, %ebx
	addl	$12, %esp
	subl	start, %ebx
	pushl	%ebx
	pushl	count
	pushl	$.LC0
	call	printf
	popl	%eax
	popl	%edx
	movl	count, %eax
	xorl	%edx, %edx
	divl	%ebx
	pushl	%eax
	pushl	$.LC1
	call	printf
	movl	$1, (%esp)
	call	exit
.Lfe4:
	.size	exit_info,.Lfe4-exit_info
	.p2align 2,,3
.globl exit_info_sig
	.type	exit_info_sig,@function
exit_info_sig:
	pushl	%ebp
	movl	%esp, %ebp
	movl	$1, exit_flag
	leave
	ret
.Lfe5:
	.size	exit_info_sig,.Lfe5-exit_info_sig
	.comm	start,4,4
	.ident	"GCC: (GNU) 3.2.1 20021207 (Red Hat Linux 8.0 3.2.1-2)"


* Re: [Lse-tech] gcc 2.95 vs 3.21 performance
  2003-02-03 23:05 gcc 2.95 vs 3.21 performance Martin J. Bligh
  2003-02-03 23:22 ` [Lse-tech] " Andi Kleen
  2003-02-03 23:31 ` Richard B. Johnson
@ 2003-02-04 12:20 ` Dave Jones
  2003-02-04 15:50   ` Martin J. Bligh
  2003-02-06 15:42 ` gcc -O2 vs gcc -Os performance Martin J. Bligh
  3 siblings, 1 reply; 84+ messages in thread
From: Dave Jones @ 2003-02-04 12:20 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel, lse-tech

On Mon, Feb 03, 2003 at 03:05:06PM -0800, Martin J. Bligh wrote:
 > People keep extolling the virtues of gcc 3.2 to me, which I'm
 > reluctant to switch to, since it compiles so much slower. But
 > it supposedly generates better code, so I thought I'd compile
 > the kernel with both and compare the results. This is gcc 2.95
 > and 3.2.1 from debian unstable on a 16-way NUMA-Q. The kernbench
 > tests still use 2.95 for the compile-time stuff.
 > 
 > The results below leave me distinctly unconvinced by the supposed
 > merits of modern gcc's. Not really better or worse, within experimental
 > error. But much slower to compile things with.

What kernel was kernbench compiling? The reason I'm asking is that
2.5 kernels (and more recent 2.4.21pre's) will use -march flags for more
aggressive optimisation on newer gcc's.
If you want to compare apples to apples, make sure you choose
something like i386 in the processor menu, and then it'll always
use -march=i386 instead of getting fancy with things like -march=pentium4

        Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs


* Re: gcc 2.95 vs 3.21 performance
  2003-02-04  7:13     ` Martin J. Bligh
@ 2003-02-04 12:25       ` Adrian Bunk
  2003-02-04 15:51         ` Martin J. Bligh
  0 siblings, 1 reply; 84+ messages in thread
From: Adrian Bunk @ 2003-02-04 12:25 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: vda, linux-kernel, lse-tech

On Mon, Feb 03, 2003 at 11:13:31PM -0800, Martin J. Bligh wrote:
> > I'm afraid it's the code generation engine. It is just worse than
> > M$'s or Intel's. It is not easily fixable;
> > GCC folks have a tremendous task at hand.
> >
> > I wonder whether some big companies supposedly supporting
> > Linux (e.g. Intel) can help the GCC team (for example by giving
> > away some code and/or developer time).
> 
> Comparing Intel's compiler vs GCC on Linux would be more interesting.
> Anyone got a copy and some time to burn?

There are already people who have done this, e.g.

  http://www.coyotegulch.com/reviews/intel_comp/intel_gcc_bench2.html

compares g++ and Intel's C++ compiler with C++ code.

> M.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 10:57   ` Padraig
@ 2003-02-04 13:11     ` Helge Hafting
  2003-02-04 13:29       ` Jörn Engel
  2003-02-04 14:05       ` P
  0 siblings, 2 replies; 84+ messages in thread
From: Helge Hafting @ 2003-02-04 13:11 UTC (permalink / raw)
  To: Padraig; +Cc: linux-kernel

Padraig@Linux.ie wrote:
[...]
> Interesting. I just noticed that I get 50% decrease in
> the speed of my program if I just insert a printf(). I.E.
> my program is like:
> 
> printf()
> for(;;) {
>      do_sorting_loop_test();
> }
> 
> If I remove the initial printf it doubles in speed?
> I assume this is some weird caching thing?

Looks like a cacheline alignment issue to me.
This loop of yours occupies x cachelines on your cpu;
moving it in memory by adding the printf
might cause it to occupy x+1 cachelines.
That might be noticeable if x is a really small number,
such as 1.

> gcc is 3.2.1 (same happens for 2.95..)
> 
> <boggle>
> Note this is with -O3. If I don't specify -O then
> leaving the printf in speeds things up by about 15%
> </boggle>

Sure - going from -O3 to -O changes code generation so
your loop code hits the cachelines differently.
In this case the printf moved the loop into
better alignment.

My advice is to put your test loop in a function of its own,
and do the printing in the function that calls it.
Functions are always aligned the same (good) way so
that calling them will be fast.
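
A minimal sketch of that layout, not Padraig's actual program: the body
of do_sorting_loop_test() here is a hypothetical stand-in, and
run_test_loop()/report_test_loop() are names made up for illustration.
The noinline attribute is an assumption about the toolchain (GCC-style
attributes); it only keeps the compiler from folding the loop back into
the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define N_ITEMS 256

/* qsort comparison callback for ints. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical stand-in for do_sorting_loop_test(): fill a buffer with a
 * permutation of 0..N_ITEMS-1, sort it, and return the largest element. */
static int do_sorting_loop_test(void)
{
    int buf[N_ITEMS];
    size_t i;

    for (i = 0; i < N_ITEMS; i++)
        buf[i] = (int)((i * 7919u) % N_ITEMS);
    qsort(buf, N_ITEMS, sizeof buf[0], cmp_int);
    return buf[N_ITEMS - 1];
}

/* The hot loop lives in a function of its own, so its placement does not
 * shift when unrelated code is added to the caller. */
__attribute__((noinline))
long run_test_loop(long iterations)
{
    long sum = 0, i;

    assert(iterations >= 0);
    for (i = 0; i < iterations; i++)
        sum += do_sorting_loop_test();
    return sum;
}

/* All printing happens here, outside the function under test. */
long report_test_loop(long iterations)
{
    long sum;

    printf("starting %ld iterations\n", iterations);
    sum = run_test_loop(iterations);
    printf("checksum %ld\n", sum);
    return sum;
}
```

Calling report_test_loop() from main() keeps the printf in the caller;
whether the loop itself ends up well aligned still depends on the
compiler and its flags.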

You can tune the speed of your inner loop by experimenting
with the insertion of one or more NOP asms in front
of the loop.  Just be aware that all such tuning is wasted once
you change anything at all in that function - you'll have to
re-do the tuning each time. 

The compiler should ideally align the loops for maximum performance.
That can be hard though, considering all the different processors
that might run your program.  And aligning everything optimally
could waste a _lot_ of code space - so do this only for
small loops with lots of iterations.

Helge Hafting

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 13:11     ` Helge Hafting
@ 2003-02-04 13:29       ` Jörn Engel
  2003-02-04 14:05       ` P
  1 sibling, 0 replies; 84+ messages in thread
From: Jörn Engel @ 2003-02-04 13:29 UTC (permalink / raw)
  To: Helge Hafting; +Cc: Padraig, linux-kernel

On Tue, 4 February 2003 14:11:56 +0100, Helge Hafting wrote:
> 
> Looks like a cacheline alignment issue to me.
> This loop of yours occupy x cachelines on your cpu,
> moving it in memory by adding the printf
> might cause it to ocupy x+1 cachelines.
> That might be noticeable if x is a really small number,
> such as 1.

Makes a lot of sense.

> My advice is to put your test loop in a function of its own,
> and do the printing in the function that calls it.
> functions are always aligned the same (good) way so
> that calling them will be fast.
> 
> You can tune the speed of your inner loop by experimenting
> with the insertion of one or more NOP asms in front
> of the loop.  Just be aware that all such tuning is wasted once
> you change anything at all in that function - you'll have to
> re-do the tuning each time. 
> 
> The compiler should ideally align the loops for maximum performance.
> That can be hard though, considering all the different processors
> that might run your program.  And aligning everything optimally
> could waste a _lot_ of code space - so do this only for
> small loops with lots of iterations.

The compiler has a hard time identifying those loops that affect
performance as opposed to those that are run 2-3 times.

But the developer can usually profile and figure out where those
loops are. I wonder if the following would be possible.

printf();
__cacheline_aligned_code;
for(;;)
	do_sorting_loop_test();

include/linux/cache.h appears to define such for data structures, but
not for code.
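
One way such a marker could be approximated, as a rough sketch only:
the macro below is hypothetical (it is not in include/linux/cache.h),
it assumes a GNU-style assembler that accepts .p2align inside a
function body, and the 32-byte boundary is an assumed line size. The
compiler remains free to move or transform the loop, so the padding is
not guaranteed to land directly in front of the loop head.

```c
#include <assert.h>

/* Hypothetical marker: pads the instruction stream up to the next
 * 2^5 = 32 byte boundary at the point where it appears. */
#define __cacheline_aligned_code __asm__ __volatile__(".p2align 5")

/* Sums 1..n; the loop is preceded by the alignment marker. */
unsigned long sum_to(unsigned long n)
{
    unsigned long i, total = 0;

    assert(n < 100000UL);      /* keep the sum well inside unsigned long */
    __cacheline_aligned_code;  /* padding is emitted here */
    for (i = 1; i <= n; i++)
        total += i;
    return total;
}
```

Checking the disassembly is the only way to tell whether the loop
really starts on the boundary for a given compiler and flag set.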

Jörn

-- 
ticks = jiffies;
while (ticks == jiffies);
ticks = jiffies;
-- /usr/src/linux/init/main.c

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04  0:43   ` J.A. Magallon
@ 2003-02-04 13:42     ` Richard B. Johnson
  2003-02-04 14:20       ` John Bradford
  0 siblings, 1 reply; 84+ messages in thread
From: Richard B. Johnson @ 2003-02-04 13:42 UTC (permalink / raw)
  To: J.A. Magallon; +Cc: Martin J. Bligh, linux-kernel, lse-tech

On Tue, 4 Feb 2003, J.A. Magallon wrote:

> 
> On 2003.02.04 Richard B. Johnson wrote:
> > On Mon, 3 Feb 2003, Martin J. Bligh wrote:
> > 
> > > People keep extolling the virtues of gcc 3.2 to me, which I'm
> > > reluctant to switch to, since it compiles so much slower. But
> > > it supposedly generates better code, so I thought I'd compile
> > > the kernel with both and compare the results. This is gcc 2.95
> > > and 3.2.1 from debian unstable on a 16-way NUMA-Q. The kernbench
> > > tests still use 2.95 for the compile-time stuff.
> > >
> > [SNIPPED tests...]
> > 
> > Don't let this get out, but egcs-2.91.66 compiled FFT code
> > works about 50 percent of the speed of whatever M$ uses for
> > Visual C++ Version 6.0  I was awfully disheartened when I
> > found that identical code executed twice as fast on M$ than
> > it does on Linux. I tried to isolate what was causing the
> > difference. So I replaced 'hypot()' with some 'C' code that
> > does sqrt(x^2 + y^2) just to see if it was the 'C' library.
> > It didn't help. When I find out what type (section) of code
> > is running slower, I'll report. In the meantime, it's fast
> > enough, but I don't like being beat by M$.
> > 
> 
> I face a simliar problem. As everybody says that SSE is so marvelous,
> we are trying to put some SSE code in our render engine, to speed up this.
> But look at the results of the code below (box is a P4@1.8, Xeon with ht):

[SNIPPED good demo code]

I'm going to answer all the comments on this topic with just
one observation. Sorry that I don't have the time to answer
all who responded personally, but I have to take a "work break"
today and tomorrow (design review).

gcc is a marvelous compiler because it was designed
to be readily ported to different architectures. However,
it is not an optimum compiler for ix86 machines and probably
is not optimum for any one kind of machine.

I often hear complaints about the ix86 processors as being
"register starved", etc. This could not be further from
fact. There are enough registers. However, various registers
were designed to do various things. Once you decide that
you know more than the processor developers, and start
using registers for things they were not designed for,
you start to have excellent test benchmarks, but awful
overall performance.

For example, the ECX register was designed to be used as
a counter. It can be told to decrement and perform a
conditional jump with the 'loop' instruction. The loop
instruction comes in various flavors, also, like loopz,
loopnz. Somebody decided that 'dec ecx; jnz' was faster.
They measured this to "prove" that it's faster. In the
meantime, other code suffers (stumbles) because there
was really no spare time to be grabbed. Data needs to
be fetched to and from memory. The instruction unit
ends up being starved while data are acquired. This
would not normally hurt anything because the RAM bandwidth
ends up being the dominant pole in the transfer function,
but you end up with something I call the "accordion problem".
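
For readers who have not seen the two loop forms side by side, here is
a small sketch. The function names are made up for illustration, the
inline assembly assumes a GCC-compatible compiler on x86, and other
targets fall back to plain C. It only shows that both forms count the
same number of iterations; it makes no claim about which is faster.

```c
#include <assert.h>

/* Counts n iterations with the 'loop' instruction, which decrements the
 * count register (ECX/RCX) and jumps while it is non-zero. */
unsigned long count_with_loop(unsigned long n)
{
    unsigned long iters = 0;

    assert(n > 0);  /* 'loop' with a zero counter would wrap around */
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(
        "1:\n\t"
        "add $1, %0\n\t"
        "loop 1b"
        : "+r"(iters), "+c"(n)   /* "c" pins n to the count register */
        :
        : "cc");
#else
    while (n--)
        iters++;
#endif
    return iters;
}

/* Same count using an explicit decrement and conditional jump; the
 * counter may live in any general-purpose register. */
unsigned long count_with_dec_jnz(unsigned long n)
{
    unsigned long iters = 0;

    assert(n > 0);
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(
        "1:\n\t"
        "add $1, %0\n\t"
        "dec %1\n\t"
        "jnz 1b"
        : "+r"(iters), "+r"(n)
        :
        : "cc");
#else
    while (n--)
        iters++;
#endif
    return iters;
}
```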

I will first demonstrate the accordion problem and then
explain where it comes from. Note a smooth flow of traffic
on a highway. All the cars are traveling at the same speed.
Their speed increases until they don't dare go any faster.
They are now "bandwidth limited". Somebody sees a traffic
cop. Somebody slows down, it takes a few hundred milliseconds
for the next car to slow down, this transient moves backwards
through the line of cars until cars several miles back actually
have to perform emergency braking to stay off the bumper
ahead. Then, the cars start accelerating again. This acceleration,
deceleration ripple moves through the line of cars like the
bellows of an accordion. The average speed of the line of
traffic is now reduced even though there are oscillatory
accelerations above the speed-limit.

Now, visualize a CPU and RAM combination running in lock-step.
The speed of the execution unit is matched to the speed of the
processor I/O so the instructions are fetched and executed in
a more-or-less synchronized manner. This is like the high-speed
line of cars before somebody sees the traffic cop. Now, perturb
this execution by throwing in some faster-than-normal program
sequences. You may start the accordion effect. The problem is
that both instructions and data come through the same hole-in-the-wall,
regardless of caching. When the prefetch unit needs
more data (instructions) it must contend with the data I/O.
This may cause an oscillatory condition, actually reducing
throughput.

Anybody who uses CPUs in laboratories with sensitive receiving
equipment knows that, regardless of the FCC rules, these
machines generate great gobs of radio frequency interference.
That's why they need to be in shielded boxes. If you want
to "hear" the stumble I'm talking about, just listen to
the AM audio output using a field-intensity meter. When you
have a fast smoothly-running machine, the interference sounds
like noise. When you have the accordion effect, the interference
has a repetitive pattern to it, a tone, usually low-frequency.
If you capture enough data in a logic analyzer, you will see
the pattern and can see actual pauses in bus I/O where the
CPU just isn't doing a damn thing at all!

FYI, there is a difference in power supply current required
to write 0xffffffff to RAM than 0x00000000 (honest!). If you
are doing a memory-test, writing such a pattern that the
load on the power supply changes at a rate that will disturb
the power supply servo-loop, you can make the voltage bounce!
This has nothing to do with slow CPU execution speed, but
just demonstrates that there are a lot of interactions that
should be considered when designing or proving-out a system.
It's not just a local bench-mark that counts.

The Intel Compiler(s) I have used generate code that uses
the registers just like Intel specified. It uses EBX, ESI, EDI
as index registers just like the 16-bit BX, SI, DI. I have
never seen code output from an Intel 'C' compiler that uses
EAX as an index register, even though it's available and
"faster". They seem to stick with the "un-optimized" string
instructions like rep movsb, repnz cmpsb, etc., and they
use 'loop'. Maybe, just maybe, Intel knows something about
their processor that shouldn't be second-guessed by clever
programmers.
 

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.



^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 13:11     ` Helge Hafting
  2003-02-04 13:29       ` Jörn Engel
@ 2003-02-04 14:05       ` P
  2003-02-04 20:36         ` Herman Oosthuysen
  1 sibling, 1 reply; 84+ messages in thread
From: P @ 2003-02-04 14:05 UTC (permalink / raw)
  To: Helge Hafting; +Cc: linux-kernel

Helge Hafting wrote:
> Padraig@Linux.ie wrote:
> [...]
> 
>>Interesting. I just noticed that I get 50% decrease in
>>the speed of my program if I just insert a printf(). I.E.
>>my program is like:
>>
>>printf()
>>for(;;) {
>>     do_sorting_loop_test();
>>}
>>
>>If I remove the initial printf it doubles in speed?
>>I assume this is some weird caching thing?
> 
> 
> Looks like a cacheline alignment issue to me.
> This loop of yours occupy x cachelines on your cpu,
> moving it in memory by adding the printf
> might cause it to ocupy x+1 cachelines.
> That might be noticeable if x is a really small number,
> such as 1.

OK it is (as I suspected and as you explained nicely)
related to the cachelines on my CPU (866 celery).

===============================
GCC options		loops/s
===============================
gcc			2283
gcc -O3 -falign-loops=2	3451
gcc -O3 -falign-loops=4	3443
gcc -O3 -falign-loops=8	7045
gcc -march=i686 -O3	9101
===============================
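
The post does not say how loops/s was measured. One plausible harness,
as a sketch under stated assumptions: one_loop() is a hypothetical
workload (a small insertion sort), measure_loops_per_second() is a
made-up name, and clock() reports CPU time at a resolution that varies
by platform.

```c
#include <assert.h>
#include <time.h>

/* Keeps the result observable so the work is not optimised away. */
static volatile unsigned sink;

/* Hypothetical workload: fill 64 ints from a simple linear congruential
 * generator, insertion-sort them, and return an updated seed. */
static unsigned one_loop(unsigned seed)
{
    int buf[64];
    int i, j;

    for (i = 0; i < 64; i++) {
        seed = seed * 1103515245u + 12345u;
        buf[i] = (int)(seed >> 16) & 0x7fff;
    }
    for (i = 1; i < 64; i++) {
        int key = buf[i];

        for (j = i - 1; j >= 0 && buf[j] > key; j--)
            buf[j + 1] = buf[j];
        buf[j + 1] = key;
    }
    return seed + (unsigned)buf[0];
}

/* Runs the workload 'loops' times and returns loops per CPU second,
 * or 0.0 if the run was too short for clock() to resolve. */
double measure_loops_per_second(long loops)
{
    clock_t start, end;
    unsigned seed = 1;
    double seconds;
    long i;

    assert(loops > 0);
    start = clock();
    for (i = 0; i < loops; i++)
        seed = one_loop(seed);
    end = clock();
    sink = seed;

    seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return (double)loops / seconds;
}
```

Building the same source once per row of the table (for example with
-O3 -falign-loops=8) and comparing the returned figure would reproduce
the shape of the experiment, though not necessarily the numbers.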

cheers,
Pádraig.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 13:42     ` Richard B. Johnson
@ 2003-02-04 14:20       ` John Bradford
  0 siblings, 0 replies; 84+ messages in thread
From: John Bradford @ 2003-02-04 14:20 UTC (permalink / raw)
  To: root; +Cc: jamagallon, mbligh, linux-kernel, lse-tech

There is some discussion about compiler optimisations in this Linux
Journal article:

http://www.linuxjournal.com/article.php?sid=4885

John.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04  9:54     ` Bryan Andersen
@ 2003-02-04 15:46       ` Martin J. Bligh
  0 siblings, 0 replies; 84+ messages in thread
From: Martin J. Bligh @ 2003-02-04 15:46 UTC (permalink / raw)
  To: Bryan Andersen, linux-kernel; +Cc: lse-tech

> Personal opinion here but I know it is also held by many developers I
> know and work with.  I'd rather have a compiler that produces correct and
> fast code but ran slow than one that produces slow or bad code and runs
> fast.  Remember compilation is done far less often than run time
> execution.  

Yeah, I'd make that tradeoff too, but gcc 3.2 doesn't give me that.
People keep saying it does, but I see no real evidence of it.
Show me the money.

M.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Lse-tech] gcc 2.95 vs 3.21 performance
  2003-02-04 12:20 ` [Lse-tech] " Dave Jones
@ 2003-02-04 15:50   ` Martin J. Bligh
  2003-02-10 12:13     ` Momchil Velikov
  0 siblings, 1 reply; 84+ messages in thread
From: Martin J. Bligh @ 2003-02-04 15:50 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-kernel, lse-tech

>  > People keep extolling the virtues of gcc 3.2 to me, which I'm
>  > reluctant to switch to, since it compiles so much slower. But
>  > it supposedly generates better code, so I thought I'd compile
>  > the kernel with both and compare the results. This is gcc 2.95
>  > and 3.2.1 from debian unstable on a 16-way NUMA-Q. The kernbench
>  > tests still use 2.95 for the compile-time stuff.
>  > 
>  > The results below leaves me distinctly unconvinced by the supposed 
>  > merits of modern gcc's. Not really better or worse, within experimental
>  > error. But much slower to compile things with.
> 
> What kernel was kernbench compiling ? The reason I'm asking is that
> 2.5s (and more recent 2.4.21pre's) will use -march flags for more
> aggressive optimisation on newer gcc's.
> If you want to compare apples to apples, make sure you choose
> something like i386 in the processor menu, and then it'll always
> use -march=i386 instead of getting fancy with things like -march=pentium4

Kernbench compiles 2.4.17, because I'm old, slow and lazy, and that
was what was around when I started doing this test ;-)

But the point is still the same ... even if it is doing more aggressive
optimisation, it's not actually buying us anything (at least for the kernel)

M.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 12:25       ` Adrian Bunk
@ 2003-02-04 15:51         ` Martin J. Bligh
  2003-02-04 16:27           ` [Lse-tech] " Martin J. Bligh
  0 siblings, 1 reply; 84+ messages in thread
From: Martin J. Bligh @ 2003-02-04 15:51 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: vda, linux-kernel, lse-tech

>> Comparing Intel's compiler vs GCC on Linux would be more interesting.
>> Anyone got a copy and some time to burn?
> 
> There are already people who have done this, e.g.
> 
>   http://www.coyotegulch.com/reviews/intel_comp/intel_gcc_bench2.html
> 
> compares g++ and Intel's C++ compiler with C++ code.

C would be infinitely more interesting ;-)

M.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Lse-tech] Re: gcc 2.95 vs 3.21 performance
  2003-02-04 15:51         ` Martin J. Bligh
@ 2003-02-04 16:27           ` Martin J. Bligh
  2003-02-04 17:40             ` Patrick Mansfield
  0 siblings, 1 reply; 84+ messages in thread
From: Martin J. Bligh @ 2003-02-04 16:27 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: linux-kernel, lse-tech

>>> Comparing Intel's compiler vs GCC on Linux would be more interesting.
>>> Anyone got a copy and some time to burn?
>> 
>> There are already people who have done this, e.g.
>> 
>>   http://www.coyotegulch.com/reviews/intel_comp/intel_gcc_bench2.html
>> 
>> compares g++ and Intel's C++ compiler with C++ code.
> 
> C would be infinitely more interesting ;-)

Speaking of which, has anyone ever compiled the ia32 Linux kernel with the
Intel compiler? I thought I saw some patches floating around to make it
compile the ia64 kernel .... that'd be an interesting test case ... might
give us some ideas about what could be tweaked in GCC (or code rejiggled in
the kernel).

M.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Lse-tech] Re: gcc 2.95 vs 3.21 performance
  2003-02-04 16:27           ` [Lse-tech] " Martin J. Bligh
@ 2003-02-04 17:40             ` Patrick Mansfield
  2003-02-04 17:55               ` Martin J. Bligh
  0 siblings, 1 reply; 84+ messages in thread
From: Patrick Mansfield @ 2003-02-04 17:40 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Adrian Bunk, linux-kernel, lse-tech

On Tue, Feb 04, 2003 at 08:27:28AM -0800, Martin J. Bligh wrote:
> >>> Comparing Intel's compiler vs GCC on Linux would be more interesting.
> >>> Anyone got a copy and some time to burn?
> >> 
> >> There are already people who have done this, e.g.
> >> 
> >>   http://www.coyotegulch.com/reviews/intel_comp/intel_gcc_bench2.html
> >> 
> >> compares g++ and Intel's C++ compiler with C++ code.
> > 
> > C would be infinitely more interesting ;-)
> 
> Speaking of which, has anyone ever compiled the ia32 Linux kernel with the
> Intel compiler? I thought I saw some patches floating around to make it
> compile the ia64 kernel .... that'd be an interesting test case ... might
> give us some ideas about what could be tweaked in GCC (or code rejiggled in
> the kernel).
> 
> M.

Martin -

Like this?

http://marc.theaimsgroup.com/?l=linux-kernel&m=103559880923586&w=2

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Lse-tech] Re: gcc 2.95 vs 3.21 performance
  2003-02-04 17:40             ` Patrick Mansfield
@ 2003-02-04 17:55               ` Martin J. Bligh
  0 siblings, 0 replies; 84+ messages in thread
From: Martin J. Bligh @ 2003-02-04 17:55 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: Adrian Bunk, linux-kernel, lse-tech

>> Speaking of which, has anyone ever compiled the ia32 Linux kernel with
>> the Intel compiler? I thought I saw some patches floating around to make
>> it compile the ia64 kernel .... that'd be an interesting test case ...
>> might give us some ideas about what could be tweaked in GCC (or code
>> rejiggled in the kernel).
>> 
>> M.
> 
> Martin -
> 
> Like this?
> 
> http://marc.theaimsgroup.com/?l=linux-kernel&m=103559880923586&w=2

Yeah, something very like that ;-) Thanks.
Preferably less micro-benchmarky though ....

M.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04  6:54   ` Denis Vlasenko
  2003-02-04  7:13     ` Martin J. Bligh
  2003-02-04  9:54     ` Bryan Andersen
@ 2003-02-04 19:09     ` Timothy D. Witham
  2003-02-04 19:35       ` John Bradford
  2 siblings, 1 reply; 84+ messages in thread
From: Timothy D. Witham @ 2003-02-04 19:09 UTC (permalink / raw)
  To: vda; +Cc: root, Martin J. Bligh, linux-kernel, lse-tech

On Mon, 2003-02-03 at 22:54, Denis Vlasenko wrote:
snip

> 
> I'm afraid it's code generation engine. It is just worse than
> M$ or Intel's one. It is not easily fixable,
> GCC folks have tremendous task at hand.
> 
> I wonder whether some big companies supposedly supporting 
> Linux (e.g. Intel) can help GCC team (for example by giving
> away some code and/or developer time).
> --

   I'm hesitant to enter into this.  But from my own experience
the issue with big companies supporting these sorts of changes
in gcc has more to do with the acceptance process of changes
into gcc than a lack of desire on the large companies' part.

Tim

> vda
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-- 
Timothy D. Witham - Lab Director - wookie@osdlab.org
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office)    (503)-702-2871     (cell)
(503)-626-2436     (fax)


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 19:09     ` Timothy D. Witham
@ 2003-02-04 19:35       ` John Bradford
  2003-02-04 19:44         ` Dave Jones
  2003-02-04 21:38         ` Linus Torvalds
  0 siblings, 2 replies; 84+ messages in thread
From: John Bradford @ 2003-02-04 19:35 UTC (permalink / raw)
  To: Timothy D. Witham; +Cc: vda, root, mbligh, linux-kernel, lse-tech

>    I'm hesitant to enter into this.  But from my own experience
> the issue with big companies supporting these sort of changes 
> in gcc have more to do with the acceptance process of changes 
> into gcc than a lack of desire on the large companies part.

Maybe we should create a KGCC fork, optimise it for kernel
compilations, then try to get our changes merged back in to GCC
mainline at a later date.

John.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 19:35       ` John Bradford
@ 2003-02-04 19:44         ` Dave Jones
  2003-02-04 20:11           ` John Bradford
  2003-02-04 21:38         ` Linus Torvalds
  1 sibling, 1 reply; 84+ messages in thread
From: Dave Jones @ 2003-02-04 19:44 UTC (permalink / raw)
  To: John Bradford
  Cc: Timothy D. Witham, vda, root, mbligh, linux-kernel, lse-tech

On Tue, Feb 04, 2003 at 07:35:06PM +0000, John Bradford wrote:

 > Maybe we should create a KGCC fork, optimise it for kernel
 > complilations, then try to get our changes merged back in to GCC
 > mainline at a later date.

What exactly do you mean by "optimise for kernel compilations" ?

		Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 19:44         ` Dave Jones
@ 2003-02-04 20:11           ` John Bradford
  2003-02-04 20:20             ` John Bradford
  2003-02-04 20:45             ` Herman Oosthuysen
  0 siblings, 2 replies; 84+ messages in thread
From: John Bradford @ 2003-02-04 20:11 UTC (permalink / raw)
  To: Dave Jones; +Cc: john, wookie, vda, root, mbligh, linux-kernel, lse-tech

>  > Maybe we should create a KGCC fork, optimise it for kernel
>  > complilations, then try to get our changes merged back in to GCC
>  > mainline at a later date.
> 
> What exactly do you mean by "optimise for kernel compilations" ?

I don't, that was a bad way of phrasing it - I didn't mean fork GCC
just to create one which compiles the kernel so it runs faster, as the
expense of other code.

What I was thinking was that if we forked GCC, we could try out all of
these ideas that have been floating around in this thread, and if, as
was hinted at earlier in this thread, $bigcompanies[] have not offered
contributions because of reluctance to accept them by the GCC team, we
would be more in a position to try them out, because we only need to
concern ourselves with breaking the compilation of the kernel, not
every single program that currently compiles with GCC.

The way I see it, the development series would be optimised for KGCC,
and when we start to think about stabilising that development series,
we try to get our KGCC changes merged back in to GCC mainline.  If
they are not accepted, either KGCC becomes the recommended kernel
compiler, which should cause no great difficulties, (having one
compiler for kernels, and one for userland applications), or we start
making sure that we haven't broken compilation with GCC, (and since
there would probably always be people compiling with GCC anyway, even
if there was a KGCC, we would effectively always know if we broke
compilation with GCC), and then the recommended compiler is just not
the optimal one, and it would be up to the various distributions to
decide which one they are going to use.

John.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 20:11           ` John Bradford
@ 2003-02-04 20:20             ` John Bradford
  2003-02-04 20:45             ` Herman Oosthuysen
  1 sibling, 0 replies; 84+ messages in thread
From: John Bradford @ 2003-02-04 20:20 UTC (permalink / raw)
  To: John Bradford; +Cc: davej, wookie, vda, root, mbligh, linux-kernel, lse-tech

Sorry, that last post didn't make sense, please apply this diff:

- just to create one which compiles the kernel so it runs faster, as the
+ just to create one which compiles the kernel so it runs faster, at the
  expense of other code.

John.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 14:05       ` P
@ 2003-02-04 20:36         ` Herman Oosthuysen
  0 siblings, 0 replies; 84+ messages in thread
From: Herman Oosthuysen @ 2003-02-04 20:36 UTC (permalink / raw)
  To: P; +Cc: Helge Hafting, linux-kernel

Hi there,

More than anything else, the execution speed on modern processors seems
to be a factor of code and data alignment.  Some processors are OK with
16 bit word alignment, others require 32 bit word alignment and the new
crop of processors will probably require 64 bit word alignment.

If the data accesses are not aligned for your type of processor, then
SDRAM accesses go to hell as the bursting gets upset.

Unfortunately, this is a factor of processor architecture and the MS and 
Intel compilers support a small number of processors and can therefore 
be more easily optimized than GCC, which supports every processor in the 
whole world.

If some application of yours is very speed sensitive, then you'll have 
to insert specific alignment control switches/pragmas to force GCC to
do things the right way for speed, but that will typically increase the 
code and data size a little.
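
As an illustration of source-level alignment control, the sketch below
uses GCC's aligned attribute on a data object and on a function. The
64-byte figure is an assumed cache-line size, and hot_table,
sum_hot_table() and hot_table_is_aligned() are names invented for the
example; support for the attribute on functions depends on the
compiler version.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed line size; the real value varies by CPU */

/* Data placed on a cache-line boundary. */
static int hot_table[16] __attribute__((aligned(CACHE_LINE))) = {
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
};

/* Function entry point requested on the same boundary. */
__attribute__((aligned(CACHE_LINE)))
int sum_hot_table(void)
{
    int i, sum = 0;

    for (i = 0; i < 16; i++)
        sum += hot_table[i];
    return sum;
}

/* Returns non-zero when p sits on a multiple of 'boundary' bytes. */
int is_aligned(const void *p, unsigned long boundary)
{
    assert(boundary != 0);
    return ((uintptr_t)p % boundary) == 0;
}

/* Convenience check for the table above. */
int hot_table_is_aligned(void)
{
    return is_aligned(hot_table, CACHE_LINE);
}
```

The padding needed to reach each boundary is where the extra code and
data size mentioned above comes from.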

Cheers,
-- 

------------------------------------------------------------------------
Herman Oosthuysen
B.Eng.(E), Member of IEEE
Wireless Networks Inc.
http://www.WirelessNetworksInc.com
E-mail: Herman@WirelessNetworksInc.com
Phone: 1.403.569-5687, Fax: 1.403.235-3965
------------------------------------------------------------------------


P@draigBrady.com wrote:
> Helge Hafting wrote:
> 
>>Padraig@Linux.ie wrote:
>>[...]
>>
>>
>>>Interesting. I just noticed that I get 50% decrease in
>>>the speed of my program if I just insert a printf(). I.E.
>>>my program is like:
>>>
>>>printf()
>>>for(;;) {
>>>    do_sorting_loop_test();
>>>}
>>>
>>>If I remove the initial printf it doubles in speed?
>>>I assume this is some weird caching thing?
>>
>>
>>Looks like a cacheline alignment issue to me.
>>This loop of yours occupy x cachelines on your cpu,
>>moving it in memory by adding the printf
>>might cause it to ocupy x+1 cachelines.
>>That might be noticeable if x is a really small number,
>>such as 1.
> 
> 
> OK it is (as I suspected and as you explained nicely)
> related to the cachelines on my CPU (866 celery).
> 
> ===============================
> GCC options		loops/s
> ===============================
> gcc			2283
> gcc -O3 -falign-loops=2	3451
> gcc -O3 -falign-loops=4	3443
> gcc -O3 -falign-loops=8	7045
> gcc -march=i686 -O3	9101
> ===============================
> 
> cheers,
> Pádraig.
> 




^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 20:11           ` John Bradford
  2003-02-04 20:20             ` John Bradford
@ 2003-02-04 20:45             ` Herman Oosthuysen
  2003-02-04 21:44               ` Timothy D. Witham
  2003-02-05  7:15               ` Denis Vlasenko
  1 sibling, 2 replies; 84+ messages in thread
From: Herman Oosthuysen @ 2003-02-04 20:45 UTC (permalink / raw)
  To: John Bradford
  Cc: Dave Jones, wookie, vda, root, mbligh, linux-kernel, lse-tech

Hi there,

From my experience, the speed issue is caused by misaligned memory
accesses, causing inefficient SDRAM to Cache movement of data and
instructions.

I don't think that you necessarily need a modification to the compiler.
What you can do is carefully place the ALIGN switch in a few critical
places in the kernel code, to ensure that the code and data will be
properly aligned for whatever processor it is compiled for, be that a
Pentium, an ARM, a MIPS or whatever.

It would be nice if GCC can be suitably improved to do this correctly for
all architectures, but a little bit of human help can do wonders,
without having to fork the GCC project.

Cheers,
-- 

------------------------------------------------------------------------
Herman Oosthuysen
B.Eng.(E), Member of IEEE
Wireless Networks Inc.
http://www.WirelessNetworksInc.com
E-mail: Herman@WirelessNetworksInc.com
Phone: 1.403.569-5687, Fax: 1.403.235-3965
------------------------------------------------------------------------



John Bradford wrote:
>> > Maybe we should create a KGCC fork, optimise it for kernel
>> > complilations, then try to get our changes merged back in to GCC
>> > mainline at a later date.
>>
>>What exactly do you mean by "optimise for kernel compilations" ?
> 
> 
> I don't, that was a bad way of phrasing it - I didn't mean fork GCC
> just to create one which compiles the kernel so it runs faster, as the
> expense of other code.
> 
> What I was thinking was that if we forked GCC, we could try out all of
> these ideas that have been floating around in this thread, and if, as
> was hinted at earlier in this thread, $bigcompanies[] have not offered
> contributions because of reluctance to accept them by the GCC team, we
> would be more in a position to try them out, because we only need to
> concern ourselves with breaking the compilation of the kernel, not
> every single program that currently compiles with GCC.
> 
> The way I see it, the development series would be optimised for KGCC,
> and when we start to think about stabilising that development series,
> we try to get our KGCC changes merged back in to GCC mainline.  If
> they are not accepted, either KGCC becomes the recommended kernel
> compiler, which should cause no great difficulties, (having one
> compiler for kernels, and one for userland applications), or we start
> making sure that we haven't broken compilation with GCC, (and since
> there would probably always be people compiling with GCC anyway, even
> if there was a KGCC, we would effectively always know if we broke
> compilation with GCC), and then the recommended compiler is just not
> the optimal one, and it would be up to the various distributions to
> decide which one they are going to use.
> 
> John.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 19:35       ` John Bradford
  2003-02-04 19:44         ` Dave Jones
@ 2003-02-04 21:38         ` Linus Torvalds
  2003-02-04 21:54           ` John Bradford
                             ` (2 more replies)
  1 sibling, 3 replies; 84+ messages in thread
From: Linus Torvalds @ 2003-02-04 21:38 UTC (permalink / raw)
  To: linux-kernel

In article <200302041935.h14JZ69G002675@darkstar.example.net>,
John Bradford  <john@grabjohn.com> wrote:
>>    I'm hesitant to enter into this.  But from my own experience
>> the issue with big companies supporting these sort of changes 
>> in gcc have more to do with the acceptance process of changes 
>> into gcc than a lack of desire on the large companies part.
>
>Maybe we should create a KGCC fork, optimise it for kernel
>compilations, then try to get our changes merged back in to GCC
>mainline at a later date.

That's not really the problem.

I think the problem with gcc is that many of the developers are actually
much more interested in Ada or C++ (or even Fortran!), than in plain
old-fashioned C.  So it's not a kernel issue per se, gcc is slow to
compile _any_ C project. 

And a lot of the optimizations gcc does aren't even interesting to most
C projects.  Most "old-fashioned" C projects tend to be written in ways
that mean that the most important optimizations are the truly trivial
ones, and then doing good register allocation.

I'd love to see a small - and fast - C compiler, and I'd be willing to
make kernel changes to make it work with it.  

Let's see. There have been some noises on the gcc lists about splitting up
the languages for easier maintenance, we'll see what happens.

		Linus


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 20:45             ` Herman Oosthuysen
@ 2003-02-04 21:44               ` Timothy D. Witham
  2003-02-05  7:15               ` Denis Vlasenko
  1 sibling, 0 replies; 84+ messages in thread
From: Timothy D. Witham @ 2003-02-04 21:44 UTC (permalink / raw)
  To: Herman Oosthuysen
  Cc: John Bradford, Dave Jones, vda, root, mbligh, linux-kernel, lse-tech


On Tue, 2003-02-04 at 12:45, Herman Oosthuysen wrote:
> Hi there,
> 
>  From my experience, the speed issue is caused by misaligned memory 
> accesses, causing inefficient SDRAM to Cache movement of data and 
> instructions.
> 
> I don't think that you necessarily need a modification to the compiler. 
>   What you can do is carefully place the ALIGN switch in a few critical 
> places in the kernel code, to ensure that the code and data will be 
> properly aligned for whatever processor it is compiled for, be that a 
> Pentium, an ARM, a MIPS or whatever.
> 
 
  I guess I would like the compiler to do that without having to go
in and futz the code.  

> It would be nice if GCC can be suitably improved to do this correctly for 
> all architectures, but a little bit of human help can do wonders, 
> without having to fork the GCC project.
> 
> Cheers,
-- 
Timothy D. Witham - Lab Director - wookie@osdlab.org
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office)    (503)-702-2871     (cell)
(503)-626-2436     (fax)


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 21:38         ` Linus Torvalds
@ 2003-02-04 21:54           ` John Bradford
  2003-02-04 22:11             ` Linus Torvalds
  2003-02-04 23:21           ` Larry McVoy
  2003-02-07 16:09           ` Pavel Machek
  2 siblings, 1 reply; 84+ messages in thread
From: John Bradford @ 2003-02-04 21:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

> I'd love to see a small - and fast - C compiler, and I'd be willing to
> make kernel changes to make it work with it.  

How IA-32 centric would your preferred compiler choice be?  In other
words, if a small and fast C compiler turns up, which lacks support
for some currently ported to architectures, are you likely to
encourage kernel changes which will make it difficult for the other
architectures that have to stay with GCC to keep up?

John.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 21:54           ` John Bradford
@ 2003-02-04 22:11             ` Linus Torvalds
  2003-02-04 23:27               ` Timothy D. Witham
  0 siblings, 1 reply; 84+ messages in thread
From: Linus Torvalds @ 2003-02-04 22:11 UTC (permalink / raw)
  To: John Bradford; +Cc: linux-kernel


On Tue, 4 Feb 2003, John Bradford wrote:
> > I'd love to see a small - and fast - C compiler, and I'd be willing to
> > make kernel changes to make it work with it.  
> 
> How IA-32 centric would your preferred compiler choice be?  In other
> words, if a small and fast C compiler turns up, which lacks support
> for some currently ported to architectures, are you likely to
> encourage kernel changes which will make it difficult for the other
> architectures that have to stay with GCC to keep up?

I don't think being architecture-specific is necessarily a bad thing in 
compilers, although most compiler writers obviously try to avoid it.

The kernel shouldn't really care: it does want to have a compiler with
support for inline functions, but other than that it's fairly close to
ANSI C.

Yes, I know we use a _lot_ of gcc extensions (inline asms, variadic macros
etc), but that's at least partly because there simply aren't any really
viable alternatives to gcc, so we've had no incentives to abstract any of
that out.

So the gcc'isms aren't really fundamental per se. Although, quite frankly,
even inline asms are pretty much a "standard" thing for any reasonable C
compiler (since C is often used for things that really want it), and the
main issue tends to be the exact syntax rather than anything else. So I
don't think I'd like to use a compiler that is _so_ limited that it
doesn't have some support for something like that. I certainly would 
refuse to use a C compiler that didn't support inline functions.
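
To illustrate why inline functions matter so much here, a hedged sketch in C99 (which standardized both `inline` and variadic macros). All names are hypothetical; nothing below is kernel code.

```c
#include <stdio.h>

/* Macro version: the winning argument is evaluated twice. */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* Inline-function version: each argument is evaluated exactly once,
 * and the compiler can still expand the body at the call site. */
static inline int max_int(int a, int b)
{
	return a > b ? a : b;
}

/* C99 variadic macro; GCC also accepts an older named form (args...). */
#define format_into(buf, size, ...) snprintf((buf), (size), __VA_ARGS__)
```

With `i = 1`, `max_int(i++, 0)` leaves `i` at 2, while `MAX_MACRO(j++, 0)` with `j = 1` increments twice and leaves `j` at 3.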

		Linus


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 21:38         ` Linus Torvalds
  2003-02-04 21:54           ` John Bradford
@ 2003-02-04 23:21           ` Larry McVoy
  2003-02-04 23:42             ` b_adlakha
                               ` (4 more replies)
  2003-02-07 16:09           ` Pavel Machek
  2 siblings, 5 replies; 84+ messages in thread
From: Larry McVoy @ 2003-02-04 23:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

> I'd love to see a small - and fast - C compiler, and I'd be willing to
> make kernel changes to make it work with it.  

I can't offer any immediate help with this but I want the same thing.  At
some point, we're planning on funding some extensions into GCC or whatever
reasonable C compiler is around:

    - associative arrays as a builtin type

      {
      	  assoc	bar = {};	// anonymous, no file backing

	  bar{"some key"} = "some value";
	  if (defined(bar{"some other value"})) ...
      }

    - regular expressions

      {
      	  char	*foo = "blech";

	  if (foo =~ /regex are nice/) {
	  	printf("Well isn't that special?\n");
	  }
      }

    - tk bindings built in

and then we'll port BK to that compiler.  It's likely to be GCC because we
want to support all the different architectures but if a kernel sponsored
cc shows up we'll happily throw money at that.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 22:11             ` Linus Torvalds
@ 2003-02-04 23:27               ` Timothy D. Witham
  0 siblings, 0 replies; 84+ messages in thread
From: Timothy D. Witham @ 2003-02-04 23:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: John Bradford, linux-kernel

  If needed we could build this compiler's tree into our testing
process. (PLM/STP) So that patches or changes could be automatically
tested against a matrix of kernels, hardware configurations on 
different regression and stress tests.

Tim
 
On Tue, 2003-02-04 at 14:11, Linus Torvalds wrote:
> On Tue, 4 Feb 2003, John Bradford wrote:
> > > I'd love to see a small - and fast - C compiler, and I'd be willing to
> > > make kernel changes to make it work with it.  
> > 
> > How IA-32 centric would your preferred compiler choice be?  In other
> > words, if a small and fast C compiler turns up, which lacks support
> > for some currently ported to architectures, are you likely to
> > encourage kernel changes which will make it difficult for the other
> > architectures that have to stay with GCC to keep up?
> 
> I don't think being architecture-specific is necessarily a bad thing in 
> compilers, although most compiler writers obviously try to avoid it.
> 
> The kernel shouldn't really care: it does want to have a compiler with
> support for inline functions, but other than that it's fairly close to
> ANSI C.
> 
> Yes, I know we use a _lot_ of gcc extensions (inline asms, variadic macros
> etc), but that's at least partly because there simply aren't any really
> viable alternatives to gcc, so we've had no incentives to abstract any of
> that out.
> 
> So the gcc'isms aren't really fundamental per se. Although, quite frankly,
> even inline asms are pretty much a "standard" thing for any reasonable C
> compiler (since C is often used for things that really want it), and the
> main issue tends to be the exact syntax rather than anything else. So I
> don't think I'd like to use a compiler that is _so_ limited that it
> doesn't have some support for something like that. I certainly would 
> refuse to use a C compiler that didn't support inline functions.
> 
> 		Linus
> 
-- 
Timothy D. Witham - Lab Director - wookie@osdlab.org
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office)    (503)-702-2871     (cell)
(503)-626-2436     (fax)


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 23:21           ` Larry McVoy
@ 2003-02-04 23:42             ` b_adlakha
  2003-02-05  0:19               ` Andy Pfiffer
  2003-02-04 23:51             ` Jakob Oestergaard
                               ` (3 subsequent siblings)
  4 siblings, 1 reply; 84+ messages in thread
From: b_adlakha @ 2003-02-04 23:42 UTC (permalink / raw)
  To: linux-kernel

>> I'd love to see a small - and fast - C compiler, and I'd be willing to
>> make kernel changes to make it work with it.  

tcc looks like a cool project to me...
It's small enough to be distributed through this mailing list! 

and the "C scripts" look like a cool feature... 


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 23:21           ` Larry McVoy
  2003-02-04 23:42             ` b_adlakha
@ 2003-02-04 23:51             ` Jakob Oestergaard
  2003-02-05  1:03               ` Hugo Mills
  2003-02-10 22:26               ` Andrea Arcangeli
  2003-02-04 23:51             ` Eli Carter
                               ` (2 subsequent siblings)
  4 siblings, 2 replies; 84+ messages in thread
From: Jakob Oestergaard @ 2003-02-04 23:51 UTC (permalink / raw)
  To: Larry McVoy, linux-kernel

On Tue, Feb 04, 2003 at 03:21:01PM -0800, Larry McVoy wrote:
> > I'd love to see a small - and fast - C compiler, and I'd be willing to
> > make kernel changes to make it work with it.  
> 
> I can't offer any immediate help with this but I want the same thing.  At
> some point, we're planning on funding some extensions into GCC or whatever
> reasonable C compiler is around:

[snipping Linus from To:]

Cool.

> 
>     - associative arrays as a builtin type
> 
>       {
>       	  assoc	bar = {};	// anonymous, no file backing
> 
> 	  bar{"some key"} = "some value";
> 	  if (defined(bar{"some other value"})) ...
>       }

Allow me:

{
 std::map<std::string,std::string> bar;

 bar["some key"] = "some value";
 if (bar.find("some other value") != bar.end()) ...
}

Works beautifully, all you need is to pick the existing language which
allows for the existing standard library which already provides that
functionality.

I doubt there's much need for a C+ or C 2+/3 language variant  ;)

> 
>     - regular expressions
> 
>       {
>       	  char	*foo = "blech";
> 
> 	  if (foo =~ /regex are nice/) {
> 	  	printf("Well isn't that special?\n");
> 	  }
>       }

Ok, I can't help you with that.

You have probably seen a Perl program before... Now imagine a two
million line Perl program... That is why the above is not a good idea ;)

It's still your right to want it of course...

> 
>     - tk bindings built in

Built into the language (not a library)?

<sarcasm>
Then I'd want the compiler in a kernel module  ;)
</>

> and then we'll port BK to that compiler.  It's likely to be GCC because we
> want to support all the different architectures but if a kernel sponsored
> cc shows up we'll happily throw money at that.

If you look at http://www.codesourcery.com, you can see that there
really are some people who do GCC extensions or optimizations for money
- various institutions have funded additions to GCC this way.

It's a cool idea - I have a few things I'd like my company to fund as
well... Some time in the future...  Unless someone beats us to it.

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 23:21           ` Larry McVoy
  2003-02-04 23:42             ` b_adlakha
  2003-02-04 23:51             ` Jakob Oestergaard
@ 2003-02-04 23:51             ` Eli Carter
  2003-02-05  0:27               ` Larry McVoy
  2003-02-05  3:03             ` Tomas Szepe
  2003-02-05  6:03             ` Mark Mielke
  4 siblings, 1 reply; 84+ messages in thread
From: Eli Carter @ 2003-02-04 23:51 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Linus Torvalds, linux-kernel

Larry McVoy wrote:
>>I'd love to see a small - and fast - C compiler, and I'd be willing to
>>make kernel changes to make it work with it.  
> 
> 
> I can't offer any immediate help with this but I want the same thing.  At
> some point, we're planning on funding some extensions into GCC or whatever
> reasonable C compiler is around:
> 
>     - associative arrays as a builtin type
[snip]
>     - regular expressions
[snip]
>     - tk bindings built in
> 
> and then we'll port BK to that compiler.  It's likely to be GCC because we
> want to support all the different architectures but if a kernel sponsored
> cc shows up we'll happily throw money at that.

Ok, dumb, (and probably flamebait) question time:  I read your list and 
thought "In C? Why not Python?"  I'm guessing speed issues?

Eli
--------------------. "If it ain't broke now,
Eli Carter           \                  it will be soon." -- crypto-gram
eli.carter(a)inet.com `-------------------------------------------------


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 23:42             ` b_adlakha
@ 2003-02-05  0:19               ` Andy Pfiffer
  0 siblings, 0 replies; 84+ messages in thread
From: Andy Pfiffer @ 2003-02-05  0:19 UTC (permalink / raw)
  To: b_adlakha; +Cc: linux-kernel

On Tue, 2003-02-04 at 15:42, b_adlakha@softhome.net wrote:
> >> I'd love to see a small - and fast - C compiler, and I'd be willing to
> >> make kernel changes to make it work with it.  
> 
> tcc looks like a cool project to me...
> It's small enough to be distributed through this mailing list! 

Don't overlook lcc -- last I knew most users were using GNU's cpp, but
other than that, it is available for the curious:

http://www.cs.princeton.edu/software/lcc/





^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 23:51             ` Eli Carter
@ 2003-02-05  0:27               ` Larry McVoy
  2003-02-06 20:42                 ` Paul Jakma
  0 siblings, 1 reply; 84+ messages in thread
From: Larry McVoy @ 2003-02-05  0:27 UTC (permalink / raw)
  To: Eli Carter; +Cc: Larry McVoy, Linus Torvalds, linux-kernel

> Ok, dumb, (and probably flamebait) question time:  I read your list and 
> thought "In C? Why not Python?"  I'm guessing speed issues?

Scripting languages are unacceptable for products.  Flat out unacceptable.
I spoke to Chip when he was running the perl effort, his answer was "if
you are worried about new releases of perl breaking your scripts, ship
your own version of perl".  I spoke with Guido or some other Python 
luminary and he said the same thing.

For something which a company has to support, it needs to be a compiled 
language with fairly minimal dependencies.  Otherwise the customer 
upgrades and the tool breaks.

Don't get me wrong, I love perl (well, perl 4, perl 5 got a bit weird
for my tastes but some people seem to like it) and python looks cool as
well.  They are great for prototyping but they are just useless as a 
application platform.  Our support costs would be through the roof.

Before the inevitable flameage, please consider that we have to support
people who insist on using all sorts of weird things.  Richard Gooch
maintains his own a.out based linux distribution, for example.  Do we
get to tell him to upgrade?  Nope.  And it just gets worse from there.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 23:51             ` Jakob Oestergaard
@ 2003-02-05  1:03               ` Hugo Mills
  2003-02-10 22:26               ` Andrea Arcangeli
  1 sibling, 0 replies; 84+ messages in thread
From: Hugo Mills @ 2003-02-05  1:03 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2166 bytes --]

On Wed, Feb 05, 2003 at 12:51:12AM +0100, Jakob Oestergaard wrote:
> On Tue, Feb 04, 2003 at 03:21:01PM -0800, Larry McVoy wrote:
> > I can't offer any immediate help with this but I want the same thing.  At
> > some point, we're planning on funding some extensions into GCC or whatever
> > reasonable C compiler is around:
> > 
> >     - regular expressions
> > 
> >       {
> >       	  char	*foo = "blech";
> > 
> > 	  if (foo =~ /regex are nice/) {
> > 	  	printf("Well isn't that special?\n");
> > 	  }
> >       }
> 
> Ok, I can't help you with that.

   I wanted something like that a while ago, so I wrote a couple of
classes in C++ to handle regexps. Some of the test code looks like
this:

	string str = "fum foo";
	rejex exp("f(o*)");
	// Search for a regex
	if( str/exp )
		cout << "Found it!" << endl;
	// Count matches
	cout << str/exp << " matches" << endl;

	replace rep("g$0");

	// Search & replace
	str/exp/rep;
	cout << str << endl;

	// All in one
	"foo bar"/rejex("ba")/replace();

   It's not perfect by any stretch of the imagination, but it works.
I've not released it, because I haven't had a chance to get it into a
releasable form yet. Actually, looking at it, I should probably play a
couple of tricks with overloading operators to give you instead

   str =~ search/replace;

or even

   "str" =~ "search"/"replace";

> You have probably seen a Perl program before... Now imagine a two
> million line Perl program... That is why the above is not a good idea ;)
> 
> It's still your right to want it of course...

   That's a good point, but I've always felt that the main problem
with perl isn't the regexes, but the rest of the language(*).

   Hugo.

(*) Some may feel that, coming from a C++ programmer, this is a case
of the pot calling the kettle black. :)

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 1C335860 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Our so-called leaders speak/with words they try to jail ya/ ---   
        They subjugate the meek/but it's the rhetoric of failure.        
                                                                         

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 23:21           ` Larry McVoy
                               ` (2 preceding siblings ...)
  2003-02-04 23:51             ` Eli Carter
@ 2003-02-05  3:03             ` Tomas Szepe
  2003-02-05  6:03             ` Mark Mielke
  4 siblings, 0 replies; 84+ messages in thread
From: Tomas Szepe @ 2003-02-05  3:03 UTC (permalink / raw)
  To: Larry McVoy, Linus Torvalds, linux-kernel

> [lm@bitmover.com]
> 
> I can't offer any immediate help with this but I want the same thing.  At
> some point, we're planning on funding some extensions into GCC or whatever
> reasonable C compiler is around:
> 
>     - associative arrays as a builtin type
>     - regular expressions
>     - tk bindings built in

Is it April 1st already?

I can't see why this should be a language extension other than you want
to make a real mess out of it.

> and then we'll port BK to that compiler.  It's likely to be GCC because we
> want to support all the different architectures but if a kernel sponsored
> cc shows up we'll happily throw money at that.

Ever heard of glib?
#include <glib.h> and be done with it.
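
A library-only version of the regex example can be sketched in plain C. Note the substitution: this uses POSIX `<regex.h>` (`regcomp`, `regexec`, `regfree`) rather than glib, purely to keep the sketch free of external dependencies. The helper name `matches` is made up for illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if text matches the extended regex pattern,
 * 0 if it does not, and -1 if the pattern fails to compile. */
int matches(const char *text, const char *pattern)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

The `foo =~ /regex are nice/` test from the proposal then becomes `matches(foo, "regex are nice")`, with no change to the language itself.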

-- 
Tomas Szepe <szepe@pinerecords.com>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 23:21           ` Larry McVoy
                               ` (3 preceding siblings ...)
  2003-02-05  3:03             ` Tomas Szepe
@ 2003-02-05  6:03             ` Mark Mielke
  4 siblings, 0 replies; 84+ messages in thread
From: Mark Mielke @ 2003-02-05  6:03 UTC (permalink / raw)
  To: Larry McVoy, Linus Torvalds, linux-kernel

On Tue, Feb 04, 2003 at 03:21:01PM -0800, Larry McVoy wrote:
> > I'd love to see a small - and fast - C compiler, and I'd be willing to
> > make kernel changes to make it work with it.  
> I can't offer any immediate help with this but I want the same thing.  At
> some point, we're planning on funding some extensions into GCC or whatever
> reasonable C compiler is around:
>     - associative arrays as a builtin type
>     - regular expressions
>     - tk bindings built in

What is the problem with C++ or objective C?

I doubt that the GCC people would accept these sort of additions, even
if complete.

mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 20:45             ` Herman Oosthuysen
  2003-02-04 21:44               ` Timothy D. Witham
@ 2003-02-05  7:15               ` Denis Vlasenko
  2003-02-05 10:36                 ` Andreas Schwab
  2003-02-05 15:30                 ` Martin J. Bligh
  1 sibling, 2 replies; 84+ messages in thread
From: Denis Vlasenko @ 2003-02-05  7:15 UTC (permalink / raw)
  To: Herman Oosthuysen, John Bradford
  Cc: Dave Jones, wookie, root, mbligh, linux-kernel, lse-tech

On 4 February 2003 22:45, Herman Oosthuysen wrote:
> Hi there,
>
>  From my experience, the speed issue is caused by misaligned memory
> accesses, causing inefficient SDRAM to Cache movement of data and
> instructions.
>
> I don't think that you necessarily need a modification to the
> compiler. What you can do is carefully place the ALIGN switch in a
> few critical places in the kernel code, to ensure that the code and
> data will be properly aligned for whatever processor it is compiled
> for, be that a Pentium, an ARM, a MIPS or whatever.
>
> It would be nice if GCC can be suitably improved to do this correctly
> for all architectures, but a little bit of human help can do wonders,
> without having to fork the GCC project.

			NO.

GCC already went this way, i.e. it aligns functions and loops by
ridiculous (IMHO) amounts like 16 bytes. That's 7.5 bytes per alignment
on average. Now count lk functions and loops and mourn for lost icache.
Or just disassemble any .o module and read the damn code.

This is the primary reason why people report larger kernels for GCC 3.x

I am damn sure that if you compile with less sadistic alignment
you will get smaller *and* faster kernel.
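
The 7.5-byte figure can be checked with a short sketch. It assumes, as the estimate implicitly does, that code ends at a uniformly distributed offset within each aligned block; the function names are hypothetical.

```c
/* Bytes of padding needed to round offset up to the next multiple of align. */
unsigned int padding_for(unsigned int offset, unsigned int align)
{
	return (align - offset % align) % align;
}

/* Mean padding over every possible starting offset within one aligned block. */
double average_padding(unsigned int align)
{
	unsigned int offset, total = 0;

	for (offset = 0; offset < align; offset++)
		total += padding_for(offset, align);
	return (double)total / align;
}
```

Under that assumption the mean is (align - 1) / 2: 7.5 bytes for 16-byte alignment and 15.5 bytes for 32-byte alignment. In GCC 3.x the boundaries themselves are controlled by options such as -falign-functions=N, -falign-loops=N and -falign-jumps=N.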
--
vda

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-05  7:15               ` Denis Vlasenko
@ 2003-02-05 10:36                 ` Andreas Schwab
  2003-02-05 11:41                   ` Denis Vlasenko
  2003-02-05 15:30                 ` Martin J. Bligh
  1 sibling, 1 reply; 84+ messages in thread
From: Andreas Schwab @ 2003-02-05 10:36 UTC (permalink / raw)
  To: vda; +Cc: linux-kernel, lse-tech

Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> writes:

|> I am damn sure that if you compile with less sadistic alignment
|> you will get smaller *and* faster kernel.

So why don't you try it out?  GCC offers everything you need for this
experiment.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-05 10:36                 ` Andreas Schwab
@ 2003-02-05 11:41                   ` Denis Vlasenko
  2003-02-05 12:20                     ` Dave Jones
  2003-02-05 13:10                     ` [Lse-tech] " Dipankar Sarma
  0 siblings, 2 replies; 84+ messages in thread
From: Denis Vlasenko @ 2003-02-05 11:41 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: linux-kernel, lse-tech

On 5 February 2003 12:36, Andreas Schwab wrote:
> Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> writes:
> |> I am damn sure that if you compile with less sadistic alignment
> |> you will get smaller *and* faster kernel.
>
> So why don't you try it out?  GCC offers everything you need for this
> experiment.

I did. Others did it too on occasion.

My argument was against overusing optimization techniques.
You cannot speed up kernel by aligning *everything* to 32 bytes,
or by unrolling all loops, or by aggressive inlining.
That's too easy to work. You get kernel which is bigger
*and* slower.
--
vda

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-05 11:41                   ` Denis Vlasenko
@ 2003-02-05 12:20                     ` Dave Jones
  2003-02-05 13:10                     ` [Lse-tech] " Dipankar Sarma
  1 sibling, 0 replies; 84+ messages in thread
From: Dave Jones @ 2003-02-05 12:20 UTC (permalink / raw)
  To: Denis Vlasenko; +Cc: Andreas Schwab, linux-kernel, lse-tech

On Wed, Feb 05, 2003 at 01:41:34PM +0200, Denis Vlasenko wrote:

 > > So why don't you try it out?  GCC offers everything you need for this
 > > experiment.
 > 
 > I did. Others did it too on occasion.

You seem to have forgotten to attach the numbers to your mail.

		Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Lse-tech] Re: gcc 2.95 vs 3.21 performance
  2003-02-05 11:41                   ` Denis Vlasenko
  2003-02-05 12:20                     ` Dave Jones
@ 2003-02-05 13:10                     ` Dipankar Sarma
  1 sibling, 0 replies; 84+ messages in thread
From: Dipankar Sarma @ 2003-02-05 13:10 UTC (permalink / raw)
  To: Denis Vlasenko; +Cc: Andreas Schwab, linux-kernel, lse-tech

On Wed, Feb 05, 2003 at 01:41:34PM +0200, Denis Vlasenko wrote:
> My argument was against overusing optimization techniques.
> You cannot speed up kernel by aligning *everything* to 32 bytes,
> or by unrolling all loops, or by aggressive inlining.
> That's too easy to work. You get kernel which is bigger
> *and* slower.

I am not getting into this debate, just wanted to point out that
the effect of compiler optimization on UNIX kernels has been studied
before. One paper I recall is:

http://www.usenix.org/publications/library/proceedings/sf94/full_papers/partridge.ps

They used profile-guided optimization, so that is a whole other angle altogether.

Thanks
Dipankar

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-05  7:15               ` Denis Vlasenko
  2003-02-05 10:36                 ` Andreas Schwab
@ 2003-02-05 15:30                 ` Martin J. Bligh
  1 sibling, 0 replies; 84+ messages in thread
From: Martin J. Bligh @ 2003-02-05 15:30 UTC (permalink / raw)
  To: vda, Herman Oosthuysen; +Cc: linux-kernel, lse-tech

> GCC already went this way, i.e. it aligns functions and loops by
> ridiculous (IMHO) amounts like 16 bytes. That's 7.5 bytes per alignment
> on average. Now count lk functions and loops and mourn for lost icache.
> Or just disassemble any .o module and read the damn code.
> 
> This is the primary reason why people report larger kernels for GCC 3.x
> 
> I am damn sure that if you compile with less sadistic alignment
> you will get smaller *and* faster kernel.

There's only one real way to know that. Do it, test it.

M.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* gcc -O2 vs gcc -Os performance
  2003-02-03 23:05 gcc 2.95 vs 3.21 performance Martin J. Bligh
                   ` (2 preceding siblings ...)
  2003-02-04 12:20 ` [Lse-tech] " Dave Jones
@ 2003-02-06 15:42 ` Martin J. Bligh
  2003-02-06 15:51   ` [Lse-tech] " Andi Kleen
  2003-02-06 17:48   ` Alan Cox
  3 siblings, 2 replies; 84+ messages in thread
From: Martin J. Bligh @ 2003-02-06 15:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: lse-tech

Compiled the kernel with gcc -O2 (default) vs -Os
(which people sometimes predict will be faster due to better
cache usage). Didn't bother to measure how much time the compile
itself took like that, but the resultant kernels were compared.
Summary ... -Os is a little slower (note system times on kernbench,
SDET and NUMAschedbench I consider within experimental error),
but not drastically. I wouldn't switch to it though ;-)

All done with gcc-2.95.4 (Debian Woody). These machines (16x NUMA-Q) have 
700MHz P3 Xeons with 2Mb L2 cache ... -Os might fare better on celeron 
with a puny cache if someone wants to try that out.

M.

sizes:

894822 Feb  5 23:50 /boot/vmlinuz-2.5.59-mjb3-Os
906203 Feb  5 22:46 /boot/vmlinuz-2.5.59-mjb3.old

Kernbench-2: (make -j N vmlinux, where N = 2 x num_cpus)
                                   Elapsed        User      System         CPU
                   2.5.59-mjb3       45.66      565.33      110.18     1479.00
                2.5.59-mjb3-Os       45.58      565.38      111.42     1484.33

Kernbench-16: (make -j N vmlinux, where N = 16 x num_cpus)
                                   Elapsed        User      System         CPU
                   2.5.59-mjb3       46.87      569.77      133.32     1499.67
                2.5.59-mjb3-Os       46.86      569.30      134.63     1501.50

DISCLAIMER: SPEC(tm) and the benchmark name SDET(tm) are registered
trademarks of the Standard Performance Evaluation Corporation. This 
benchmarking was performed for research purposes only, and the run results
are non-compliant and not-comparable with any published results.

Results are shown as percentages of the first set displayed

SDET 1  (see disclaimer)
                                Throughput    Std. Dev
                   2.5.59-mjb3       100.0%         4.1%
                2.5.59-mjb3-Os        95.1%         6.7%

SDET 2  (see disclaimer)
                                Throughput    Std. Dev
                   2.5.59-mjb3       100.0%         8.0%
                2.5.59-mjb3-Os       101.2%         5.8%

SDET 4  (see disclaimer)
                                Throughput    Std. Dev
                   2.5.59-mjb3       100.0%         6.2%
                2.5.59-mjb3-Os        99.4%        14.1%

SDET 8  (see disclaimer)
                                Throughput    Std. Dev
                   2.5.59-mjb3       100.0%         3.3%
                2.5.59-mjb3-Os       100.5%         2.2%

SDET 16  (see disclaimer)
                                Throughput    Std. Dev
                   2.5.59-mjb3       100.0%         3.2%
                2.5.59-mjb3-Os        98.9%         2.4%

SDET 32  (see disclaimer)
                                Throughput    Std. Dev
                   2.5.59-mjb3       100.0%         2.2%
                2.5.59-mjb3-Os        97.2%         1.6%

SDET 64  (see disclaimer)
                                Throughput    Std. Dev
                   2.5.59-mjb3       100.0%         0.4%
                2.5.59-mjb3-Os        99.9%         0.3%

SDET 128  (see disclaimer)
                                Throughput    Std. Dev

NUMA schedbench 4:
                                   AvgUser     Elapsed   TotalUser    TotalSys
                   2.5.59-mjb3        0.00       34.62       90.63        0.91
                2.5.59-mjb3-Os        0.00       40.35       81.94        0.69

NUMA schedbench 8:
                                   AvgUser     Elapsed   TotalUser    TotalSys
                   2.5.59-mjb3        0.00       52.16      266.45        1.51
                2.5.59-mjb3-Os        0.00       46.61      248.47        1.49

NUMA schedbench 16:
                                   AvgUser     Elapsed   TotalUser    TotalSys
                   2.5.59-mjb3        0.00       57.38      845.30        3.58
                2.5.59-mjb3-Os        0.00       58.34      851.12        2.94

NUMA schedbench 32:
                                   AvgUser     Elapsed   TotalUser    TotalSys
                   2.5.59-mjb3        0.00      118.05     1806.79        6.24
                2.5.59-mjb3-Os        0.00      115.85     1803.72        6.29

NUMA schedbench 64:
                                   AvgUser     Elapsed   TotalUser    TotalSys
                   2.5.59-mjb3        0.00      236.59     3627.47       15.24
                2.5.59-mjb3-Os        0.00      236.90     3631.11       15.35
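
One crude way to read "within experimental error" for the SDET rows above is
to compare each throughput difference with the two standard deviations
combined in quadrature. This is only a sketch under loud assumptions: that
both deviations are in the same percentage units as the throughput, that the
runs are independent, and that one combined standard deviation is a sensible
cutoff. The number of runs is not given, so this is not a proper significance
test. The SDET 128 entry has no data and is left out.

```python
import math

# (SDET load, -Os throughput %, -O2 std dev %, -Os std dev %),
# copied from the gcc 2.95 tables above. -O2 throughput is 100.0 by definition.
sdet_rows = [
    (1, 95.1, 4.1, 6.7),
    (2, 101.2, 8.0, 5.8),
    (4, 99.4, 6.2, 14.1),
    (8, 100.5, 3.3, 2.2),
    (16, 98.9, 3.2, 2.4),
    (32, 97.2, 2.2, 1.6),
    (64, 99.9, 0.4, 0.3),
]


def exceeds_noise(throughput_os, dev_o2, dev_os, baseline=100.0):
    """True if the difference is larger than the combined standard deviation."""
    combined = math.hypot(dev_o2, dev_os)  # sqrt(dev_o2**2 + dev_os**2)
    return abs(throughput_os - baseline) > combined


outside_noise = [
    load for load, tput, d_o2, d_os in sdet_rows
    if exceeds_noise(tput, d_o2, d_os)
]
print(outside_noise)
```

By this rough yardstick only the SDET 32 row sticks out, and only barely,
which fits the "a little slower, but not drastically" summary.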


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Lse-tech] gcc -O2 vs gcc -Os performance
  2003-02-06 15:42 ` gcc -O2 vs gcc -Os performance Martin J. Bligh
@ 2003-02-06 15:51   ` Andi Kleen
  2003-02-06 17:48   ` Alan Cox
  1 sibling, 0 replies; 84+ messages in thread
From: Andi Kleen @ 2003-02-06 15:51 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel, lse-tech

> All done with gcc-2.95.4 (Debian Woody). These machines (16x NUMA-Q) have 
> 700MHz P3 Xeons with 2Mb L2 cache ... -Os might fare better on celeron 
> with a puny cache if someone wants to try that out.

-Os on 2.95 is not too useful. It only started becoming useful on 3.1+,
even more so on the upcoming 3.3.

e.g. there was one report of ACPI shrinking by >60k by recompiling it
with -Os on 3.1. ACPI is only slow path code so that is completely reasonable.

Best would be of course to use profile feedback to let the compiler
decide where to generate small and where to generate fast&big code.
But that has problems with the maintainability (it will be hard to generate
the same vmlinux as users for debugging/ksymoops reading purposes)

-Andi

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc -O2 vs gcc -Os performance
  2003-02-06 17:48   ` Alan Cox
@ 2003-02-06 17:06     ` Martin J. Bligh
  2003-02-06 20:38     ` Martin J. Bligh
  1 sibling, 0 replies; 84+ messages in thread
From: Martin J. Bligh @ 2003-02-06 17:06 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List, lse-tech

>> All done with gcc-2.95.4 (Debian Woody). These machines (16x NUMA-Q) have 
>> 700MHz P3 Xeons with 2Mb L2 cache ... -Os might fare better on celeron 
>> with a puny cache if someone wants to try that out
> 
> gcc 3.2 is a lot smarter about -Os and it makes a very big size
> difference according to the numbers from the ACPI guys.
> 
> I'm not sure testing with a gcc from the last millennium is useful 8)

I'll retest with gcc-3.2 ... maybe it'll finally show a case where it's
better than 2.95 this way?

<ducks> <runs>

M.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc -O2 vs gcc -Os performance
  2003-02-06 15:42 ` gcc -O2 vs gcc -Os performance Martin J. Bligh
  2003-02-06 15:51   ` [Lse-tech] " Andi Kleen
@ 2003-02-06 17:48   ` Alan Cox
  2003-02-06 17:06     ` Martin J. Bligh
  2003-02-06 20:38     ` Martin J. Bligh
  1 sibling, 2 replies; 84+ messages in thread
From: Alan Cox @ 2003-02-06 17:48 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Linux Kernel Mailing List, lse-tech

On Thu, 2003-02-06 at 15:42, Martin J. Bligh wrote:
> All done with gcc-2.95.4 (Debian Woody). These machines (16x NUMA-Q) have 
> 700MHz P3 Xeons with 2Mb L2 cache ... -Os might fare better on celeron 
> with a puny cache if someone wants to try that out

gcc 3.2 is a lot smarter about -Os and it makes a very big size
difference according to the numbers from the ACPI guys.

I'm not sure testing with a gcc from the last millennium is useful 8)


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc -O2 vs gcc -Os performance
  2003-02-06 17:48   ` Alan Cox
  2003-02-06 17:06     ` Martin J. Bligh
@ 2003-02-06 20:38     ` Martin J. Bligh
  2003-02-06 21:32       ` John Bradford
                         ` (2 more replies)
  1 sibling, 3 replies; 84+ messages in thread
From: Martin J. Bligh @ 2003-02-06 20:38 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List, lse-tech

>> All done with gcc-2.95.4 (Debian Woody). These machines (16x NUMA-Q) have 
>> 700MHz P3 Xeons with 2Mb L2 cache ... -Os might fare better on celeron 
>> with a puny cache if someone wants to try that out
> 
> gcc 3.2 is a lot smarter about -Os and it makes a very big size
> difference according to the numbers from the ACPI guys.
> 
> I'm not sure testing with a gcc from the last millennium is useful 8)

Still no use.
/me throws gcc-3.2 in the trash can.

2901299 vmlinux.O2
2667827 vmlinux.Os
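
For scale, the relative saving in the two vmlinux sizes above can be worked
out directly. A small sketch (the helper name percent_smaller is made up
here; the byte counts are the ones listed above):

```python
def percent_smaller(baseline_bytes: int, candidate_bytes: int) -> float:
    """How much smaller the candidate is, as a percentage of the baseline."""
    return 100.0 * (baseline_bytes - candidate_bytes) / baseline_bytes


vmlinux_o2 = 2901299  # gcc 3.2, -O2
vmlinux_os = 2667827  # gcc 3.2, -Os

saved_bytes = vmlinux_o2 - vmlinux_os
saved_percent = percent_smaller(vmlinux_o2, vmlinux_os)
print(saved_bytes, round(saved_percent, 1))
```

That is a saving of roughly 8% of the uncompressed image, which is the figure
to hold against the benchmark deltas below.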


Kernbench-2: (make -j N vmlinux, where N = 2 x num_cpus)
                                   Elapsed        User      System         CPU
          2.5.59-mjb3-gcc32-O2       45.86      564.75      110.91     1472.67
          2.5.59-mjb3-gcc32-Os       45.74      563.96      111.06     1475.17

Kernbench-16: (make -j N vmlinux, where N = 16 x num_cpus)
                                   Elapsed        User      System         CPU
          2.5.59-mjb3-gcc32-O2       46.83      569.15      133.88     1500.50
          2.5.59-mjb3-gcc32-Os       46.90      568.17      134.58     1497.83

DISCLAIMER: SPEC(tm) and the benchmark name SDET(tm) are registered
trademarks of the Standard Performance Evaluation Corporation. This 
benchmarking was performed for research purposes only, and the run results
are non-compliant and not-comparable with any published results.

Results are shown as percentages of the first set displayed

SDET 1  (see disclaimer)
                                Throughput    Std. Dev
          2.5.59-mjb3-gcc32-O2       100.0%         3.4%
          2.5.59-mjb3-gcc32-Os        99.8%         2.8%

SDET 2  (see disclaimer)
                                Throughput    Std. Dev
          2.5.59-mjb3-gcc32-O2       100.0%         6.7%
          2.5.59-mjb3-gcc32-Os       101.2%         4.9%

SDET 4  (see disclaimer)
                                Throughput    Std. Dev
          2.5.59-mjb3-gcc32-O2       100.0%         3.8%
          2.5.59-mjb3-gcc32-Os        95.1%         3.0%

SDET 8  (see disclaimer)
                                Throughput    Std. Dev
          2.5.59-mjb3-gcc32-O2       100.0%         1.1%
          2.5.59-mjb3-gcc32-Os        98.1%         1.4%

SDET 16  (see disclaimer)
                                Throughput    Std. Dev
          2.5.59-mjb3-gcc32-O2       100.0%         1.6%
          2.5.59-mjb3-gcc32-Os        97.7%         1.7%

SDET 32  (see disclaimer)
                                Throughput    Std. Dev
          2.5.59-mjb3-gcc32-O2       100.0%         1.1%
          2.5.59-mjb3-gcc32-Os       103.7%         1.9%

SDET 64  (see disclaimer)
                                Throughput    Std. Dev
          2.5.59-mjb3-gcc32-O2       100.0%         1.4%
          2.5.59-mjb3-gcc32-Os        96.6%         9.7%

NUMA schedbench 4:
                                   AvgUser     Elapsed   TotalUser    TotalSys
          2.5.59-mjb3-gcc32-O2        0.00       36.93       88.84        0.62
          2.5.59-mjb3-gcc32-Os        0.00       44.28       96.95        0.67

NUMA schedbench 8:
                                   AvgUser     Elapsed   TotalUser    TotalSys
          2.5.59-mjb3-gcc32-O2        0.00       54.16      327.57        1.58
          2.5.59-mjb3-gcc32-Os        0.00       50.66      248.42        1.89

NUMA schedbench 16:
                                   AvgUser     Elapsed   TotalUser    TotalSys
          2.5.59-mjb3-gcc32-O2        0.00       57.17      851.44        3.09
          2.5.59-mjb3-gcc32-Os        0.00       57.25      849.20        3.14

NUMA schedbench 32:
                                   AvgUser     Elapsed   TotalUser    TotalSys
          2.5.59-mjb3-gcc32-O2        0.00      117.82     1808.42        6.34
          2.5.59-mjb3-gcc32-Os        0.00      130.02     1814.74        6.52

NUMA schedbench 64:
                                   AvgUser     Elapsed   TotalUser    TotalSys
          2.5.59-mjb3-gcc32-O2        0.00      236.82     3616.04       15.17
          2.5.59-mjb3-gcc32-Os        0.00      241.34     3624.50       16.39


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-05  0:27               ` Larry McVoy
@ 2003-02-06 20:42                 ` Paul Jakma
  0 siblings, 0 replies; 84+ messages in thread
From: Paul Jakma @ 2003-02-06 20:42 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Eli Carter, Linux Kernel

On Tue, 4 Feb 2003, Larry McVoy wrote:

> Scripting languages are unacceptable for products.  Flat out unacceptable.
> I spoke to Chip when he was running the perl effort, his answer was "if
> you are worried about new releases of perl breaking your scripts, ship
> your own version of perl". 

There is a perl compiler, perlcc, but it's not perfect. Why not fund it
to have it made perfect? Then you get the best of all worlds: perl and
interpretation at run time for developers, and the ability to ship binary
files to customers.

regards,
-- 
Paul Jakma	Sys Admin	Alphyra
	paulj@alphyra.ie
Warning: /never/ send email to spam@dishone.st or trap@dishone.st


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc -O2 vs gcc -Os performance
  2003-02-06 20:38     ` Martin J. Bligh
@ 2003-02-06 21:32       ` John Bradford
  2003-02-06 22:12       ` Linus Torvalds
  2003-02-06 23:17       ` Roger Larsson
  2 siblings, 0 replies; 84+ messages in thread
From: John Bradford @ 2003-02-06 21:32 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: alan, linux-kernel, lse-tech

> >> All done with gcc-2.95.4 (Debian Woody). These machines (16x NUMA-Q) have 
> >> 700MHz P3 Xeons with 2Mb L2 cache ... -Os might fare better on celeron 
> >> with a puny cache if someone wants to try that out
> > 
> > gcc 3.2 is a lot smarter about -Os and it makes a very big size
> > difference according to the numbers from the ACPI guys.
> > 
> > I'm not sure testing with a gcc from the last millennium is useful 8)
> 
> Still no use.
> /me throws gcc-3.2 in the trash can.

What submodel options are you using?  If you're compiling with
-march=i386, I wouldn't expect -Os to have much effect.

Note that, of all architectures, GCC is almost certainly most
efficient on IA-32.  Although I haven't done any benchmarks against
other compilers on $arch!=IA32, the ones I've seen claim that the
native compiler generates much better code.

John.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc -O2 vs gcc -Os performance
  2003-02-06 20:38     ` Martin J. Bligh
  2003-02-06 21:32       ` John Bradford
@ 2003-02-06 22:12       ` Linus Torvalds
  2003-02-06 22:58         ` Martin J. Bligh
  2003-02-06 23:17       ` Roger Larsson
  2 siblings, 1 reply; 84+ messages in thread
From: Linus Torvalds @ 2003-02-06 22:12 UTC (permalink / raw)
  To: linux-kernel

In article <263740000.1044563891@[10.10.2.4]>,
Martin J. Bligh <mbligh@aracnet.com> wrote:
>>> All done with gcc-2.95.4 (Debian Woody). These machines (16x NUMA-Q) have 
>>> 700MHz P3 Xeons with 2Mb L2 cache ... -Os might fare better on celeron 
>>> with a puny cache if someone wants to try that out
>> 
>> gcc 3.2 is a lot smarter about -Os and it makes a very big size
>> difference according to the numbers from the ACPI guys.
>> 
>> I'm not sure testing with a gcc from the last millennium is useful 8)
>
>Still no use.
>/me throws gcc-3.2 in the trash can.
>
>2901299 vmlinux.O2
>2667827 vmlinux.Os

Well, Os is certainly smaller.  One thing to look out for is that
microbenchmarks for kernels are usually the _worst_ things to test with
Os.

That's since a large part of the premise of the -Os speed advantage is
that it is better for icache (usually not an issue for microbenchmarks)
and that it is better for load/startup times (generally not a huge issue
for kernels, since the real startup costs of kernels tend to be entirely
elsewhere).

So I suspect -Os tends to be more appropriate for user-mode code, and
especially code with low repeat rates.  Possibly the "low repeat rate"
thing ends up being true of certain kernel subsystems too.

Think of it this way: if you win 10% in size, you're likely to map and
load 10% less code pages at run-time. Which is not a big issue for
traditional data-centric loads, but can be a _huge_ deal for things like
GUI programs etc where there is often more code than data.

			Linus

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc -O2 vs gcc -Os performance
  2003-02-06 22:12       ` Linus Torvalds
@ 2003-02-06 22:58         ` Martin J. Bligh
  2003-02-06 23:16           ` Linus Torvalds
  0 siblings, 1 reply; 84+ messages in thread
From: Martin J. Bligh @ 2003-02-06 22:58 UTC (permalink / raw)
  To: Linus Torvalds, linux-kernel

>> 2901299 vmlinux.O2
>> 2667827 vmlinux.Os
> 
> Well, Os is certainly smaller.  

Yup. I have lots of RAM though, so unless I can see the perf increase
from cache effects, it's not desperately interesting to me personally.
If someone could do similar measurements with a puny-cache celeron chip, 
it would be interesting ... 

> So I suspect -Os tends to be more appropriate for user-mode code, and
> especially code with low repeat rates.  Possibly the "low repeat rate"
> thing ends up being true of certain kernel subsystems too.

Fair enough. I'm not desperately interested in user-land code at the
moment, personally, but gcc is admittedly more general. Maybe we should
compile gcc itself with -Os ;-) Andi (I think) also made the observation
that the garbage-collect size for gcc3.2 may be rather small.

The observation re low repeat rate is interesting ... might be amusing 
to do some really basic profile-guided optimisation on these grounds:
take readprofile / oprofile output, and compile the files that don't
get hammered at all with -Os rather than -O2. Given their low frequency
(by definition), I'm not sure that improving their icache footprint will
have a measurable effect though.

M.
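
A rough sketch of the per-file selection idea above, for illustration only.
It assumes the profile has already been reduced to a sample count per source
file; how to get that from readprofile or oprofile output is deliberately
left out, and the file names, sample counts, the choose_flags helper and the
1% threshold are all hypothetical.

```python
def choose_flags(samples_by_file, cold_fraction=0.01):
    """Pick -Os for files whose share of profile samples is below cold_fraction.

    samples_by_file maps a source file name to its number of profile samples.
    With an empty or all-zero profile every file counts as cold.
    """
    total = sum(samples_by_file.values())
    flags = {}
    for path, samples in samples_by_file.items():
        share = samples / total if total else 0.0
        flags[path] = "-Os" if share < cold_fraction else "-O2"
    return flags


# Hypothetical profile: one hot file, one warm file, one that is barely touched.
profile = {"hot_path.c": 9000, "warm_path.c": 995, "cold_init.c": 5}
flags = choose_flags(profile)
for path, flag in flags.items():
    print(path, flag)
```

Whether the resulting kernel is measurably faster is exactly the open
question raised above; the sketch only shows how the split could be made.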


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc -O2 vs gcc -Os performance
  2003-02-06 22:58         ` Martin J. Bligh
@ 2003-02-06 23:16           ` Linus Torvalds
  2003-02-06 23:59             ` Martin J. Bligh
  0 siblings, 1 reply; 84+ messages in thread
From: Linus Torvalds @ 2003-02-06 23:16 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel


On Thu, 6 Feb 2003, Martin J. Bligh wrote:
> 
> The observation re low repeat rate is interesting ... might be amusing 
> to do some really basic profile-guided optimisation on these grounds:
> take readprofile / oprofile output, and compile the files that don't
> get hammered at all with -Os rather than -O2. Given their low frequency
> (by definition), I'm not sure that improving their icache footprint will
> have a measurable effect though.

Icache footprint has nothing to do with repeat rates, which is exactly why 
repeat rates are interesting for -Os.

Icache footprint is directly proportional to the _static_ size of the code 
(ie exactly the thing that -Os is supposed to optimize for), while 
instruction-level performance measurement is only valid on the _dynamic_ 
code.

And with modern CPUs with big caches, a _lot_ of cache misses are the 
forced kind - the startup costs, not the actual runtime cost. That's not 
always true (if you touch big data sets, you'll have replacement misses 
too, of course), but it's not really false either.

So think of the I$ (and TLB, and page load/map - all the same) cost as a 
fixed cost that will always be there, but that -Os tries to minimize. 
That's _one_ dimension in the total cost.

The "traditional" -O2 kind of "try to make the code run fast" 
optimizations tend to try to minimize a totally different dimension, 
namely the dynamic code speed.

And the time required for running the program is the sum of the static and 
dynamic factors. In other words, a _good_ optimization should try to 
minimize not one or the other, but the sum.

And low repeat rates means that the dynamic component is smaller, which 
clearly makes the static component more important.

For example, if you are doing mp3 encoding, the repeat rates for the core 
loop are huge, and the code is small, so clearly the static component is 
largely insignificant. Use -O2.

But if you're running a GUI program then just the loading time is often
quite noticeable, and if you can improve that by, say, 10%, then that can
_more_ than make up for almost any amount of stupidity in your code.  
Especially since a lot of the code isn't even all that loopy and tends to
have low repeat rates. You're almost guaranteed to be better off using -Os
than -O2.

If you've got performance counter data, check the I$ and ITLB miss ratios, 
and if they are at all noticeable, think about the fact that an I$ miss 
tends to cost a lot more than a few more dynamic instructions. 

I suspect the kernel I$ behaviour is generally pretty good, and the ITLB 
behaviour is improved even further thanks to large pages etc. That said, a 
user app that blows the I$ will blow the kernel out of the I$ too, so 
small is always beautiful, even in the kernel.

		Linus
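
The static-plus-dynamic argument above can be written down as a toy model:
total cost is a fixed, size-dependent part plus a per-repeat part times the
repeat count. Every number below is hypothetical and picked only to show the
crossover, and the function names are made up for illustration.

```python
def total_cost(static_cost, per_repeat_cost, repeats):
    """Fixed cost (I$/TLB/page-load misses) plus the dynamic cost of running."""
    return static_cost + per_repeat_cost * repeats


def break_even_repeats(static_o2, dyn_o2, static_os, dyn_os):
    """Repeat count at which -O2 and -Os cost the same.

    Assumes -Os has the smaller static cost and the larger per-repeat cost.
    """
    return (static_o2 - static_os) / (dyn_os - dyn_o2)


# Hypothetical units: -Os code is 10% smaller but each repeat is 10% slower.
O2 = {"static": 1000.0, "dyn": 10.0}
OS = {"static": 900.0, "dyn": 11.0}

crossover = break_even_repeats(O2["static"], O2["dyn"], OS["static"], OS["dyn"])
for repeats in (10, 100, 1000):
    cost_o2 = total_cost(O2["static"], O2["dyn"], repeats)
    cost_os = total_cost(OS["static"], OS["dyn"], repeats)
    print(repeats, cost_o2, cost_os)
```

Below the crossover the smaller code wins (the GUI-program case); far above
it the faster inner loop wins (the mp3-encoder case).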


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc -O2 vs gcc -Os performance
  2003-02-06 20:38     ` Martin J. Bligh
  2003-02-06 21:32       ` John Bradford
  2003-02-06 22:12       ` Linus Torvalds
@ 2003-02-06 23:17       ` Roger Larsson
  2003-02-06 23:33         ` Martin J. Bligh
  2 siblings, 1 reply; 84+ messages in thread
From: Roger Larsson @ 2003-02-06 23:17 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel

On Thursday 06 February 2003 21:38, Martin J. Bligh wrote:
> gcc-3.2
> 
> 2901299 vmlinux.O2
> 2667827 vmlinux.Os
>

In an earlier message, Martin J. Bligh wrote: 
>
> 894822 Feb  5 23:50 /boot/vmlinuz-2.5.59-mjb3-Os
> 906203 Feb  5 22:46 /boot/vmlinuz-2.5.59-mjb3.old

And if you compare both with the same (or no) compression?

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc -O2 vs gcc -Os performance
  2003-02-06 23:17       ` Roger Larsson
@ 2003-02-06 23:33         ` Martin J. Bligh
  0 siblings, 0 replies; 84+ messages in thread
From: Martin J. Bligh @ 2003-02-06 23:33 UTC (permalink / raw)
  To: Roger Larsson; +Cc: linux-kernel

>> gcc-3.2
>> 
>> 2901299 vmlinux.O2
>> 2667827 vmlinux.Os
>> 
> 
> In an earlier message, Martin J. Bligh wrote: 
>> 
>> 894822 Feb  5 23:50 /boot/vmlinuz-2.5.59-mjb3-Os
>> 906203 Feb  5 22:46 /boot/vmlinuz-2.5.59-mjb3.old
> 
> And if you compare both with the same (or no) compression?

 980233 Feb  6 11:15 /boot/vmlinuz-2.5.59-mjb3
 914965 Feb  6 09:34 /boot/vmlinuz-2.5.59-mjb3.old

Those were probably the right files. (O2 and Os respectively)
I didn't look too closely at the time. Looks like 2.95 produces
smaller files with -O2 than 3.2 does with -Os. Bah.

/me cheers for gcc 2.95.4

M.
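
The four compressed image sizes quoted in this thread can be lined up to
check that last remark. The mapping of files to compiler and flag is an
assumption taken from the "probably the right files" note above, not
something verified here.

```python
# Compressed vmlinuz sizes in bytes, keyed by (assumed compiler, assumed flag).
vmlinuz_sizes = {
    ("gcc 2.95", "-O2"): 906203,
    ("gcc 2.95", "-Os"): 894822,
    ("gcc 3.2", "-O2"): 980233,
    ("gcc 3.2", "-Os"): 914965,
}

smallest = min(vmlinuz_sizes, key=vmlinuz_sizes.get)

# The claim: 2.95 with -O2 is still smaller than 3.2 with -Os.
gap = vmlinuz_sizes[("gcc 3.2", "-Os")] - vmlinuz_sizes[("gcc 2.95", "-O2")]

for build, size in sorted(vmlinuz_sizes.items(), key=lambda item: item[1]):
    print(build, size)
```

These are compressed images, so the comparison says as much about how well
the code compresses as about how much code there is.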


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc -O2 vs gcc -Os performance
  2003-02-06 23:16           ` Linus Torvalds
@ 2003-02-06 23:59             ` Martin J. Bligh
  0 siblings, 0 replies; 84+ messages in thread
From: Martin J. Bligh @ 2003-02-06 23:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

>> The observation re low repeat rate is interesting ... might be amusing 
>> to do some really basic profile-guided optimisation on these grounds:
>> take readprofile / oprofile output, and compile the files that don't
>> get hammered at all with -Os rather than -O2. Given their low frequency
>> (by definition), I'm not sure that improving their icache footprint will
>> have a measurable effect though.
> 
> Icache footprint has nothing to do with repeat rates, which is exactly why 
> repeat rates are interesting for -Os.

Reading the below, I think I just misinterpreted what you meant by 
"repeat rate". My point was that if you hardly ever run that section
of code, -Os might be better. If we call how often you call that code
section its "frequency" (nothing to do with how tightly it loops inside
it), then if the frequency of the code is low, the icache footprint 
might be better off smaller, as it'll just blow the icache when we do
run it and those cachelines are fetched. On the other hand, that won't
happen often, so it may well be unobservable for real loads.

M.




^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 21:38         ` Linus Torvalds
  2003-02-04 21:54           ` John Bradford
  2003-02-04 23:21           ` Larry McVoy
@ 2003-02-07 16:09           ` Pavel Machek
  2 siblings, 0 replies; 84+ messages in thread
From: Pavel Machek @ 2003-02-07 16:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Hi!

> >>    I'm hesitant to enter into this.  But from my own experience
> >> the issue with big companies supporting these sort of changes 
> >> in gcc have more to do with the acceptance process of changes 
> >> into gcc than a lack of desire on the large companies part.
> >
> >Maybe we should create a KGCC fork, optimise it for kernel
> >complilations, then try to get our changes merged back in to GCC
> >mainline at a later date.
> 
> That's not really the problem.
> 
> I think the problem with gcc is that many of the developers are actually
> much more interested in Ada or C++ (or even Fortran!), than in plain
> old-fashioned C.  So it's not a kernel issue per se, gcc is slow to
> compile _any_ C project. 
> 
> And a lot of the optimizations gcc does aren't even interesting to most
> C projects.  Most "old-fashioned" C projects tend to be written in ways
> that mean that the most important optimizations are the truly trivial
> ones, and then doing good register allocation.
> 
> I'd love to see a small - and fast - C compiler, and I'd be willing to
> make kernel changes to make it work with it.  

What about gcc-1.4 or something like that? If you go back in time,
you'll find gcc is getting smaller and faster ;-). Actually, making
the kernel compile with gcc-2.7.2 should make the build a few times
faster than with gcc-3.2...
								Pavel
-- 
Worst form of spam? Adding advertisment signatures ala sourceforge.net.
What goes next? Inserting advertisment *into* email?

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [Lse-tech] gcc 2.95 vs 3.21 performance
  2003-02-04 15:50   ` Martin J. Bligh
@ 2003-02-10 12:13     ` Momchil Velikov
  0 siblings, 0 replies; 84+ messages in thread
From: Momchil Velikov @ 2003-02-10 12:13 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Dave Jones, linux-kernel, lse-tech

>>>>> "Martin" == Martin J Bligh <mbligh@aracnet.com> writes:

    Martin> But the point is still the same ... even if it is doing
    Martin> more aggressive optimisation, it's not actually buying us
    Martin> anything (at least for the kernel)

which might be due in part to ``-fno-strict-aliasing'' used to compile
the Linux kernel.

~velco

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 23:51             ` Jakob Oestergaard
  2003-02-05  1:03               ` Hugo Mills
@ 2003-02-10 22:26               ` Andrea Arcangeli
  2003-02-10 23:28                 ` J.A. Magallon
  1 sibling, 1 reply; 84+ messages in thread
From: Andrea Arcangeli @ 2003-02-10 22:26 UTC (permalink / raw)
  To: Jakob Oestergaard, Larry McVoy, linux-kernel

On Wed, Feb 05, 2003 at 12:51:12AM +0100, Jakob Oestergaard wrote:
> On Tue, Feb 04, 2003 at 03:21:01PM -0800, Larry McVoy wrote:
> > > I'd love to see a small - and fast - C compiler, and I'd be willing to
> > > make kernel changes to make it work with it.  
> > 
> > I can't offer any immediate help with this but I want the same thing.  At
> > some point, we're planning on funding some extensions into GCC or whatever
> > reasonable C compiler is around:
> 
> [snipping Linus from To:]
> 
> Cool.
> 
> > 
> >     - associative arrays as a builtin type
> > 
> >       {
> >       	  assoc	bar = {};	// anonymous, no file backing
> > 
> > 	  bar{"some key"} = "some value";
> > 	  if (defined(bar{"some other value"})) ...
> >       }
> 
> Allow me:
> 
> {
>  std::map<std::string,std::string> bar;
> 
>  bar["some key"] = "some value";
>  if (bar.find("some other value") != bar.end()) ...
> }

Indeed. Hardcoding map and multimap templates with string,string
parameters in the language sounds like a very worthless effort. If he
wants a high-level syntax on top of the abstractions he should use a
higher-level language. C can do everything, but it's going to be a
syntax like what we do in the kernel, with lists, rbtrees, structures of
pointers to functions etc.

> Works beautifully, all you need is to pick the existing language which
> allows for the existing standard library which already provide that
> functionality.
> 
> I doubt there's much need for a C+ or C 2+/3 language variant  ;)
> 
> > 
> >     - regular expressions
> > 
> >       {
> >       	  char	*foo = "blech";
> > 
> > 	  if (foo =~ /regex are nice/) {
> > 	  	printf("Well isn't that special?\n");
> > 	  }
> >       }
> 
> Ok, I can't help you with that.
> 
> You have probably seen a Perl program before... Now imagine a two
> million line Perl program... That is why the above is not a good idea ;)

Actually the Python syntax for re is quite nice, and would be pretty
compatible with C: no magic Perl =~ operator etc. Again, a library like
the STL in a high-level language would do the trick just fine.
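
For comparison, here is a small sketch of how the two quoted wish-list items
look in Python using only the standard library: a plain dict for the
associative array and re.search for the regex test. The variable names and
strings mirror the quoted pseudo-code; nothing here is a proposal for C.

```python
import re

# Associative array: an ordinary dict, no special operator needed.
bar = {}
bar["some key"] = "some value"
has_other = "some other value" in bar  # membership test instead of defined()

# Regular expression: a library call rather than a built-in =~ operator.
foo = "blech"
match = re.search(r"regex are nice", foo)

if match:
    print("Well isn't that special?")
else:
    print("no match")
```

The point is only that both features live in the library or in ordinary
types, without any extra operators in the language grammar.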

> 
> It's still your right to want it of course...
> 
> > 
> >     - tk bindings built in
> 
> Built into the language (not a library)?

Oh my.

> 
> <sarcasm>
> Then I'd want the compiler in a kernel module  ;)
> </>

then I want insmod kde.o too ;)

Andrea

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-10 22:26               ` Andrea Arcangeli
@ 2003-02-10 23:28                 ` J.A. Magallon
  0 siblings, 0 replies; 84+ messages in thread
From: J.A. Magallon @ 2003-02-10 23:28 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Jakob Oestergaard, Larry McVoy, linux-kernel


On 2003.02.10 Andrea Arcangeli wrote:
> On Wed, Feb 05, 2003 at 12:51:12AM +0100, Jakob Oestergaard wrote:
> > On Tue, Feb 04, 2003 at 03:21:01PM -0800, Larry McVoy wrote:
> > > > I'd love to see a small - and fast - C compiler, and I'd be willing to
> > > > make kernel changes to make it work with it.  
> > > 
> > > I can't offer any immediate help with this but I want the same thing.  At
> > > some point, we're planning on funding some extensions into GCC or whatever
> > > reasonable C compiler is around:
> > 
> > [snipping Linus from To:]
> > 
> > Cool.
> > 
> > > 
> > >     - associative arrays as a builtin type
> > > 
> > >       {
> > >       	  assoc	bar = {};	// anonymous, no file backing
> > > 
> > > 	  bar{"some key"} = "some value";
> > > 	  if (defined(bar{"some other value"})) ...
> > >       }
> > 
> > Allow me:
> > 
> > {
> >  std::map<std::string,std::string> bar;
> > 
> >  bar["some key"] = "some value";
> >  if (bar.find("some other value") != bar.end()) ...
> > }
> 

And don't forget smart pointers with reference counting so you can get rid of
all those stupid kfree's... ;)

-- 
J.A. Magallon <jamagallon@able.es>      \                 Software is like sex:
werewolf.able.es                         \           It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.21-pre4-jam1 (gcc 3.2.1 (Mandrake Linux 9.1 3.2.1-5mdk))

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-10  2:14           ` Jeff Garzik
@ 2003-02-10  9:19             ` Tomas Szepe
  0 siblings, 0 replies; 84+ messages in thread
From: Tomas Szepe @ 2003-02-10  9:19 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Neil Booth, Jeff Muizelaar, Andi Kleen, Linus Torvalds, linux-kernel

> [jgarzik@pobox.com]
> 
> Given the existing TinyCC source base, function inlining is a big step 
> (since tcc doesn't do AST-like things currently), so don't expect that 
> very soon.  TinyCC is a fun little project to watch and play around 
> with, though, and can compile most major open source projects, as well 
> as itself.

I wonder how that can be, though, because I've failed to get it to
compile code as trivial as

	walk_de = (dirent_t *) debug_malloc(sizeof(dirent_t));

where dirent_t is a simple structure and debug_malloc is prototyped
as void *debug_malloc(size_t size);

-- 
Tomas Szepe <szepe@pinerecords.com>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: gcc 2.95 vs 3.21 performance
  2003-02-06  7:02         ` Neil Booth
       [not found]           ` <courier.3E423112.00007219@softhome.net>
@ 2003-02-10  2:14           ` Jeff Garzik
  2003-02-10  9:19             ` Tomas Szepe
  1 sibling, 1 reply; 84+ messages in thread
From: Jeff Garzik @ 2003-02-10  2:14 UTC (permalink / raw)
  To: Neil Booth; +Cc: Jeff Muizelaar, Andi Kleen, Linus Torvalds, linux-kernel

Neil Booth wrote:
> Jeff Muizelaar wrote:-
> 
> 
>>There is also tcc (http://fabrice.bellard.free.fr/tcc/)
>>It claims to support gcc-like inline assembler, appears to be much 
>>smaller and faster than gcc. Plus it is GPL so the license isn't a
>>problem either.
> 
> 
> It doesn't expand macros correctly, however, and accepts an enormous
> range of invalid code without a single diagnostic.  I'm pretty sure
> its arithmetic rules are incorrect, too.  It's certainly nowhere
> near C89 compliance.


100% agreed.

However, for our purposes, TinyCC is only missing two pieces needed for 
successfully building a bootable kernel:

* __builtin_constant_p
* function inlining

Given the existing TinyCC source base, function inlining is a big step 
(since tcc doesn't do AST-like things currently), so don't expect that 
very soon.  TinyCC is a fun little project to watch and play around 
with, though, and can compile most major open source projects, as well 
as itself.
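
To illustrate why those two pieces go together, here is a rough sketch of the
pattern: a macro that picks a foldable expression for compile-time constants
and an inline function otherwise. __builtin_constant_p is a GCC extension;
the names CONST_SWAB16, runtime_swab16, SWAB16 and swab16_of are invented for
this example and are not the kernel's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Pure-expression form: the compiler can fold this when x is a literal. */
#define CONST_SWAB16(x) \
    ((uint16_t)((((uint16_t)(x) & 0x00ffU) << 8) | \
                (((uint16_t)(x) & 0xff00U) >> 8)))

/* Run-time form; real kernel code might use inline assembly here. */
static inline uint16_t runtime_swab16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/*
 * __builtin_constant_p(x) evaluates to 1 when the compiler can prove that
 * x is a compile-time constant, so the choice costs nothing at run time.
 */
#define SWAB16(x) \
    (__builtin_constant_p(x) ? CONST_SWAB16(x) : runtime_swab16(x))

/* With a variable argument the inline function is normally the one used. */
uint16_t swab16_of(uint16_t v)
{
    return SWAB16(v);
}
```

Without inlining, the run-time path becomes a real function call; without
__builtin_constant_p, the macro cannot be written this way at all.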

	Jeff





* Re: gcc 2.95 vs 3.21 performance
  2003-02-07 10:31               ` b_adlakha
  2003-02-07 18:46                 ` Horst von Brand
@ 2003-02-07 21:49                 ` Neil Booth
  1 sibling, 0 replies; 84+ messages in thread
From: Neil Booth @ 2003-02-07 21:49 UTC (permalink / raw)
  To: b_adlakha; +Cc: linux-kernel

b_adlakha@softhome.net wrote:-

> Cool (you're trying to fix it), maybe you can modify tcc so it is optimized 
> for compiling linux (optimized for compiling speed and runtime speed for 
> linux). I think it'll be easier and quicker to just make it compile linux 
> properly first, then do the testing/fixing for other things, as there are so
> many compilers for other things anyway... And maybe it can be called "Linux
> C Compiler"? lol. 

Sorry, I only care about GCC.

Neil.


* Re: gcc 2.95 vs 3.21 performance
  2003-02-07 10:31               ` b_adlakha
@ 2003-02-07 18:46                 ` Horst von Brand
  2003-02-07 21:49                 ` Neil Booth
  1 sibling, 0 replies; 84+ messages in thread
From: Horst von Brand @ 2003-02-07 18:46 UTC (permalink / raw)
  To: b_adlakha; +Cc: linux-kernel

b_adlakha@softhome.net said:
> Neil Booth writes: 
> > b_adlakha@softhome.net wrote:- 
> >> Maybe that's why it's a 0.9* version, and the author has stated on his site
> >> that not all C98 features are implemented... but then even GCC doesn't
> >> implement them...

> > No, I said C89.  He's got a *long* way to go for that.  Forget C99. 

> > However, he does claim C89 compliance, which is quite disingenuous. 

> >> I checked tcc out, and it's damn fast, much much much much faster than
> >> gcc. gcc is bloated and it's slow even on my Pentium 4 machine, let
> >> alone my 1.2 Celeron. It takes 20 minutes to compile a new kernel on
> >> that, now if you're gonna test kernels/patches, you can wait 20
> >> minutes for every compile!

Come on, quit whining already. When I started out fooling around with egcs
and the kernel, it took 45 to 60 minutes to build a kernel for me. And the
kernel was a lot smaller, and the compiler much faster.

> > I agree.  I'm trying to fix it. 
> > 
> > GCC is larger for a reason: it does things properly.  It's easy to be
> > fast if you're willing to be wrong, and not emit warnings or errors, and
> > not implement half the standard.  And not optimize. 

> >> Even icc is much better than gcc, but it's very particular about code (and
> >> it's not gcc compatible as the Intel site says)
> >> And it's non-free also...

Pour manpower, and people who _know_ that _one_ CPU you are targeting inside
and out, into the project, and it sure will get further along...

> > Only better in terms of compile speed.
> 
> Cool (you're trying to fix it), maybe you can modify tcc so it is optimized 
> for compiling linux (optimized for compiling speed and runtime speed for 
> linux).

Sorry, you can pick just one. Either you compile very fast (because you don't
analyze the code you are compiling very much, i.e., generate lousy code) or
generate excellent code (that requires complex analysis, large data
structures to build and use, and takes time).

>         I think it'll be easier and quicker to just make it compile linux 
> properly first, then do the testing/fixing for other things, as there are so
> many compilers for other things anyway... And maybe it can be called "Linux C
> Compiler"? lol. 

"Easier and quicker" as in 5 or 6 years of hard work. Sure enough, come
back when you're done.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513


* Re: gcc 2.95 vs 3.21 performance
       [not found]             ` <20030206212218.GA4891@daikokuya.co.uk>
@ 2003-02-07 10:31               ` b_adlakha
  2003-02-07 18:46                 ` Horst von Brand
  2003-02-07 21:49                 ` Neil Booth
  0 siblings, 2 replies; 84+ messages in thread
From: b_adlakha @ 2003-02-07 10:31 UTC (permalink / raw)
  To: Neil Booth; +Cc: linux-kernel

Neil Booth writes: 

> b_adlakha@softhome.net wrote:- 
> 
>> Maybe that's why it's a 0.9* version, and the author has stated on his site
>> that not all C98 features are implemented... but then even GCC doesn't
>> implement them...
> 
> No, I said C89.  He's got a *long* way to go for that.  Forget C99. 
> 
> However, he does claim C89 compliance, which is quite disingenuous. 
> 
>> I checked tcc out, and it's damn fast, much much much much faster than gcc.
>> gcc is bloated and it's slow even on my Pentium 4 machine, let alone my 1.2
>> Celeron. It takes 20 minutes to compile a new kernel on that, now if you're
>> gonna test kernels/patches, you can wait 20 minutes for every compile!
> 
> I agree.  I'm trying to fix it. 
> 
> GCC is larger for a reason: it does things properly.  It's easy to be
> fast if you're willing to be wrong, and not emit warnings or errors, and
> not implement half the standard.  And not optimize. 
> 
>> Even icc is much better than gcc, but it's very particular about code (and
>> it's not gcc compatible as the Intel site says)
>> And it's non-free also...
> 
> Only better in terms of compile speed.

Cool (you're trying to fix it), maybe you can modify tcc so it is optimized 
for compiling linux (optimized for compiling speed and runtime speed for 
linux). I think it'll be easier and quicker to just make it compile linux 
properly first, then do the testing/fixing for other things, as there are so
many compilers for other things anyway... And maybe it can be called "Linux C
Compiler"? lol. 



* Re: gcc 2.95 vs 3.21 performance
  2003-02-05 10:04         ` Pavel Janík
  2003-02-05 20:07           ` Linus Torvalds
@ 2003-02-06 15:00           ` Horst von Brand
  1 sibling, 0 replies; 84+ messages in thread
From: Horst von Brand @ 2003-02-06 15:00 UTC (permalink / raw)
  To: Pavel Janík; +Cc: Linux Kernel Mailing List

Pavel@Janik.cz (Pavel Janík) said:
> Linus Torvalds <torvalds@transmeta.com> said:
>    > lcc isn't really something I want to use, since the license is so
>    > strange, and thus can't be improved upon if there are issues with it.

> what is the difference between compiler and source management system
> regarding licenses and improvements?

That bk was designed around Linus' and other head kernel hackers' ideas of
how it should work, and they are still bending over backwards to keep this
biggest _*non*_customer of theirs happy.

OTOH, lcc as a project seems to be dead for all practical purposes (it
looks like 4.2 will be the end of the line; no patches or updates have
shown up for quite some time). Its licence
<http://www.cs.princeton.edu/software/lcc/pkg/CPYRIGHT> is vaguely BSDish,
but with a "you can't make money off this or any modified versions/software
based on it" clause.

I've been inside lcc 4.1 (current version is 4.2, somewhat different, so
YMMV...) myself a bit, and while it is a marvelous showpiece for classroom
use, it is sorely lacking in what makes a _real_ C compiler (for kernel
use).  For one, it only knows about i486-ish ia32 CPUs; to get others
supported in its current incarnation would be a massive exercise in
duplication or macro-massaging the backend source.  Other than the (very
good) optimal instruction selection there is very little optimization (what
there is is a bit of strength reduction).  The organization of the compiler
makes adding additional higher-level optimization almost impossible; a
separate SSA or similar intermediate form would have to be retrofitted.  The
register selection is very simplistic and doesn't work correctly (some
experimental patches we had for generating PIC code on ia32 kept it
crashing by running out of registers; the code for fixing this case up just
doesn't work).  No hint of instruction scheduling or such.
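
For readers who haven't met the term, strength reduction replaces an expensive
operation inside a loop with a cheaper one. The sketch below shows the
transformation done by hand; it does not reproduce lcc's actual output, and the
function names fill_naive and fill_reduced are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Naive form: one multiplication per iteration to compute i * stride. */
void fill_naive(int *out, size_t n, int stride)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = (int)i * stride;
}

/* Strength-reduced form: a running sum replaces the multiplication. */
void fill_reduced(int *out, size_t n, int stride)
{
    size_t i;
    int acc = 0;            /* always equals i * stride at the top of the loop */

    for (i = 0; i < n; i++) {
        out[i] = acc;
        acc += stride;      /* an addition instead of a multiplication */
    }
}
```

Both functions fill the array with the same values as long as the products
stay well within the range of int.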
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513


* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 22:59       ` Jeff Muizelaar
                           ` (2 preceding siblings ...)
  2003-02-05 19:09         ` Linus Torvalds
@ 2003-02-06  7:02         ` Neil Booth
       [not found]           ` <courier.3E423112.00007219@softhome.net>
  2003-02-10  2:14           ` Jeff Garzik
  3 siblings, 2 replies; 84+ messages in thread
From: Neil Booth @ 2003-02-06  7:02 UTC (permalink / raw)
  To: Jeff Muizelaar; +Cc: Andi Kleen, Linus Torvalds, linux-kernel

Jeff Muizelaar wrote:-

> There is also tcc (http://fabrice.bellard.free.fr/tcc/)
> It claims to support gcc-like inline assembler, appears to be much 
> smaller and faster than gcc. Plus it is GPL so the license isn't a
> problem either.

It doesn't expand macros correctly, however, and accepts an enormous
range of invalid code without a single diagnostic.  I'm pretty sure
its arithmetic rules are incorrect, too.  It's certainly nowhere
near C89 compliance.
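
As a sketch of what macro expansion checks can look like, the cases below
exercise stringizing, token pasting and the no-recursion rescanning rule of
the standard preprocessor. They are illustrative only and are not the inputs
that actually broke tcc; all macro, variable and function names are invented.

```c
#include <assert.h>
#include <string.h>

#define STR_(x)    #x
#define STR(x)     STR_(x)       /* extra level: x is expanded before # applies */
#define CAT_(a, b) a##b
#define CAT(a, b)  CAT_(a, b)    /* extra level: arguments expanded before ## */
#define VERSION    3

int value3 = 42;                 /* the identifier that CAT(value, VERSION) names */

/* "#" applied directly does not expand its operand. */
const char *raw_str(void)      { return STR_(VERSION); }

/* Through the extra level, VERSION is replaced by 3 first. */
const char *expanded_str(void) { return STR(VERSION); }

/* Pastes "value" and "3" into the identifier value3. */
int pasted(void)               { return CAT(value, VERSION); }

/* A macro's own name is not expanded again inside its replacement. */
int tally = 5;
#define tally (tally + 1)
int self_ref(void)             { return tally; }
```

A preprocessor that follows the standard yields "VERSION", "3", 42 and 6 from
the four functions.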

Neil.


* Re: gcc 2.95 vs 3.21 performance
       [not found] <200302052021.h15KLrXv000881@darkstar.example.net>
@ 2003-02-05 20:28 ` b_adlakha
  0 siblings, 0 replies; 84+ messages in thread
From: b_adlakha @ 2003-02-05 20:28 UTC (permalink / raw)
  To: John Bradford; +Cc: linux-kernel

John Bradford writes: 

>> No really, I downloaded tcc yesterday, compiled a few things with it and it 
>> is REALLY fast... and as I wrote yesterday, it's small enough so people might
>> say:  
>> 
>> A: "I can't compile linux, what is wrong?"
>> B: "Here, compile it with the compiler attached to this message"  
>> 
>> Sounds like fun, doesn't it? I mean, tcc is a working C compiler (that's
>> supposed to be a great thing), and it's only a 170 kB gzipped tar!
> 
> I haven't actually had a chance to test tcc yet, but I'll try to
> tomorrow.  How close is it to being able to compile the kernel?
> 
> John.

Far away, it doesn't even compile the ncurses-based menuconfig... I think we
need to hack (seriously) either tcc or Linux... Since tcc is so small, it
would be easier to make it behave a bit more like gcc than to modify the
whole kernel...



* Re: gcc 2.95 vs 3.21 performance
  2003-02-05 10:04         ` Pavel Janík
@ 2003-02-05 20:07           ` Linus Torvalds
  2003-02-06 15:00           ` Horst von Brand
  1 sibling, 0 replies; 84+ messages in thread
From: Linus Torvalds @ 2003-02-05 20:07 UTC (permalink / raw)
  To: Pavel Janík; +Cc: linux-kernel


On Wed, 5 Feb 2003, Pavel Janík wrote:
> 
> Hi Linus,
> 
>    > lcc isn't really something I want to use, since the license is so
>    > strange, and thus can't be improved upon if there are issues with it.
> 
> what is the difference between compiler and source management system
> regarding licenses and improvements?

You snipped the part where I said that the intel compiler is likely to be 
more interesting to a number of people, since it's at a higher level. So 
no, I'm not religious about licenses.

But the real issue is "does it do what we want it to do?" and "do we have
a choice?". There are no open-source SCM's that work for me. But there
_is_ an open-source compiler that does work for me. At which point the
license matters - simply because there is choice in the matter.

Gcc mostly works. But it's slower than I'd like. And it prioritizes things
I don't care about. And competition is always good. So I would definitely 
love to see some alternatives.

And if you have issues with BK, maybe you can try to encourage the SCM
people to see why I consider BK to not even have alternatives right now. 

		Linus



* Re: gcc 2.95 vs 3.21 performance
  2003-02-05 19:09         ` Linus Torvalds
  2003-02-05 19:22           ` Randy.Dunlap
@ 2003-02-05 19:24           ` John Bradford
  1 sibling, 0 replies; 84+ messages in thread
From: John Bradford @ 2003-02-05 19:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

> >There is also tcc (http://fabrice.bellard.free.fr/tcc/)
> >It claims to support gcc-like inline assembler, appears to be much 
> >smaller and faster than gcc. Plus it is GPL so the license isn't a
> >problem either.
> >Though, I am not really sure of the quality of code generated or of how 
> >mature it is.
> 
> tcc is interesting.  The code generation is pretty simplistic (read:
> trivially horrible for most things), but it sure is fast and small.  And
> judging by the changelog, Fabrice is trying to compile the kernel with
> it. 
> 
> For a lot of problems, small-and-fast is good.

Maybe otcc is a better choice, then?

http://fabrice.bellard.free.fr/otcc/

:-)

John.


* Re: gcc 2.95 vs 3.21 performance
  2003-02-05 19:09         ` Linus Torvalds
@ 2003-02-05 19:22           ` Randy.Dunlap
  2003-02-05 19:24           ` John Bradford
  1 sibling, 0 replies; 84+ messages in thread
From: Randy.Dunlap @ 2003-02-05 19:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Wed, 5 Feb 2003, Linus Torvalds wrote:

| In article <3E4045D1.4010704@rogers.com>,
| Jeff Muizelaar  <muizelaar@rogers.com> wrote:
| >
| >There is also tcc (http://fabrice.bellard.free.fr/tcc/)
| >It claims to support gcc-like inline assembler, appears to be much
| >smaller and faster than gcc. Plus it is GPL so the license isn't a
| >problem either.
| >Though, I am not really sure of the quality of code generated or of how
| >mature it is.
|
| tcc is interesting.  The code generation is pretty simplistic (read:
| trivially horrible for most things), but it sure is fast and small.  And
| judging by the changelog, Fabrice is trying to compile the kernel with
| it.
|
| For a lot of problems, small-and-fast is good.  Hell, some of the things
| I'd personally find interesting don't have any code generation part at
| all (static analysis of annotated source-code - stanford checker on the
| cheap).

Yep, that's exactly why I'm interested...

| And development doesn't always need good code generation (right
| now some people use "gcc -O0" for that, because anything else hurts too
| much.  Now, the code from tcc will probably look more like "-O-1", but
| at least you can test out things _quickly_).

-- 
~Randy



* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 22:59       ` Jeff Muizelaar
  2003-02-04 23:12         ` b_adlakha
  2003-02-05  8:41         ` Horst von Brand
@ 2003-02-05 19:09         ` Linus Torvalds
  2003-02-05 19:22           ` Randy.Dunlap
  2003-02-05 19:24           ` John Bradford
  2003-02-06  7:02         ` Neil Booth
  3 siblings, 2 replies; 84+ messages in thread
From: Linus Torvalds @ 2003-02-05 19:09 UTC (permalink / raw)
  To: linux-kernel

In article <3E4045D1.4010704@rogers.com>,
Jeff Muizelaar  <muizelaar@rogers.com> wrote:
>
>There is also tcc (http://fabrice.bellard.free.fr/tcc/)
>It claims to support gcc-like inline assembler, appears to be much 
>smaller and faster than gcc. Plus it is GPL so the license isn't a
>problem either.
>Though, I am not really sure of the quality of code generated or of how 
>mature it is.

tcc is interesting.  The code generation is pretty simplistic (read:
trivially horrible for most things), but it sure is fast and small.  And
judging by the changelog, Fabrice is trying to compile the kernel with
it. 

For a lot of problems, small-and-fast is good.  Hell, some of the things
I'd personally find interesting don't have any code generation part at
all (static analysis of annotated source-code - stanford checker on the
cheap).  And development doesn't always need good code generation (right
now some people use "gcc -O0" for that, because anything else hurts too
much.  Now, the code from tcc will probably look more like "-O-1", but
at least you can test out things _quickly_). 

		Linus


* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 22:14       ` Linus Torvalds
@ 2003-02-05 10:04         ` Pavel Janík
  2003-02-05 20:07           ` Linus Torvalds
  2003-02-06 15:00           ` Horst von Brand
  0 siblings, 2 replies; 84+ messages in thread
From: Pavel Janík @ 2003-02-05 10:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

   From: Linus Torvalds <torvalds@transmeta.com>
   Date: Tue, 4 Feb 2003 14:14:06 -0800 (PST)

Hi Linus,

   > lcc isn't really something I want to use, since the license is so
   > strange, and thus can't be improved upon if there are issues with it.

what is the difference between compiler and source management system
regarding licenses and improvements?
-- 
Pavel Janík

I think I started with hitting C-h a lot.  Really a LOT.
                  -- Kai Grossjohann in gnu.emacs.help about Emacs knowledge


* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 22:59       ` Jeff Muizelaar
  2003-02-04 23:12         ` b_adlakha
@ 2003-02-05  8:41         ` Horst von Brand
  2003-02-05 19:09         ` Linus Torvalds
  2003-02-06  7:02         ` Neil Booth
  3 siblings, 0 replies; 84+ messages in thread
From: Horst von Brand @ 2003-02-05  8:41 UTC (permalink / raw)
  To: Jeff Muizelaar; +Cc: linux-kernel

[Massive Cc: snippage]

Jeff Muizelaar <muizelaar@rogers.com> said:

[...]

> There is also tcc (http://fabrice.bellard.free.fr/tcc/)
> It claims to support gcc-like inline assembler, appears to be much 
> smaller and faster than gcc. Plus it is GPL so the license isn't a
> problem either.
> Though, I am not really sure of the quality of code generated

Horrible.

>                                                               or of how 
> mature it is.

Nice for one-file throwaway C proggies. But then again, Perl is so much
better at what you'd want to do most of the time...

Look, people, the gcc folks have recently redone the guts of the compiler
to make more advanced optimizations possible/easier (look at the news for
2000-2002 at <http://gcc.gnu.org>). It still needs a lot of porting over of
optimizations and developing new ones, plus tuning, AFAIU.

The other open(ish) C compilers I know about are mere toys.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513


* Re: gcc 2.95 vs 3.21 performance
       [not found] <120432836@toto.iv>
@ 2003-02-05  2:45 ` Peter Chubb
  0 siblings, 0 replies; 84+ messages in thread
From: Peter Chubb @ 2003-02-05  2:45 UTC (permalink / raw)
  To: Bryan Andersen; +Cc: linux-kernel, vda, root, Martin J. Bligh, lse-tech

>>>>> "Bryan" == Bryan Andersen <bryan@bogonomicon.net> writes:

Bryan> Personal opinion here but I know it is also held by many
Bryan> developers I know and work with.  I'd rather have a compiler
Bryan> that produces correct and fast code but ran slow than one that
Bryan> produces slow or bad code and runs fast.  Remember compilation
Bryan> is done far less often than run time execution.  Yes I too
Bryan> noticed a difference when I switched over to 3.2 but I also
Bryan> noticed some of my code speed up.

A different personal opinion:  I'd prefer a compiler that can be told
either to run fast and produce correct but suboptimal code or to
produce the fastest correct code it can.

While developing, the compile/test/think/edit cycle is dominated by compile
time for me.  So fast compilation is important while developing
algorithms.

--
Dr Peter Chubb				    peterc@gelato.unsw.edu.au
You are lost in a maze of BitKeeper repositories, all almost the same.


* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 22:59       ` Jeff Muizelaar
@ 2003-02-04 23:12         ` b_adlakha
  2003-02-05  8:41         ` Horst von Brand
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 84+ messages in thread
From: b_adlakha @ 2003-02-04 23:12 UTC (permalink / raw)
  To: Jeff Muizelaar; +Cc: linux-kernel

Jeff Muizelaar writes: 

> Andi Kleen wrote: 
> 
>> If you want small and fast use lcc. 
>> 
>> Unfortunately it's not completely free (some weird license), doesn't
>> really support real inline assembly and generates rather bad code 
>> compared to gcc. 
>> 
>> I'm still looking forward to Open Watcom (http://www.openwatcom.org) - 
>> they are near self hosting on Linux. The inline assembly is very VC++ 
>> style though; very different from gcc and worse you have to write it in
>> Intel syntax. 
>> 
>> Another alternative would be TenDRA, but it also has no inline assembly
>> and its C understanding can be only described as "fascist".
>> 
>> If you don't care about free software you could also use the Intel
>> compiler, which seems to be often faster in compile time than gcc now
>> and can already compile kernels. 
>> 
> There is also tcc (http://fabrice.bellard.free.fr/tcc/)
> It claims to support gcc-like inline assembler, appears to be much smaller 
> and faster than gcc. Plus it is GPL so the license isn't a problem
> either.
> Though, I am not really sure of the quality of code generated or of how 
> mature it is. 
> 
> -Jeff

Wow, looks like some teenage kid like me made it...
It's a 170 kB gzipped tar!
Nice for a C compiler... But I'm not sure if it could compile half of the
Linux kernel successfully...


* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 22:05     ` gcc 2.95 vs 3.21 performance Andi Kleen
  2003-02-04 22:14       ` Linus Torvalds
@ 2003-02-04 22:59       ` Jeff Muizelaar
  2003-02-04 23:12         ` b_adlakha
                           ` (3 more replies)
  1 sibling, 4 replies; 84+ messages in thread
From: Jeff Muizelaar @ 2003-02-04 22:59 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Linus Torvalds, linux-kernel

Andi Kleen wrote:

>If you want small and fast use lcc.
>
>Unfortunately it's not completely free (some weird license), doesn't
>really support real inline assembly and generates rather bad code compared 
>to gcc.
>
>I'm still looking forward to Open Watcom (http://www.openwatcom.org) - 
>they are near self hosting on Linux. The inline assembly is very VC++ style 
>though; very different from gcc and worse you have to write it in
>Intel syntax.
>
>Another alternative would be TenDRA, but it also has no inline assembly
>and its C understanding can be only described as "fascist".
>
>If you don't care about free software you could also use the Intel
>compiler, which seems to be often faster in compile time than gcc now
>and can already compile kernels.
>
There is also tcc (http://fabrice.bellard.free.fr/tcc/)
It claims to support gcc-like inline assembler, appears to be much 
smaller and faster than gcc. Plus it is GPL so the license isn't a
problem either.
Though, I am not really sure of the quality of code generated or of how 
mature it is.

-Jeff




* Re: gcc 2.95 vs 3.21 performance
  2003-02-04 22:05     ` gcc 2.95 vs 3.21 performance Andi Kleen
@ 2003-02-04 22:14       ` Linus Torvalds
  2003-02-05 10:04         ` Pavel Janík
  2003-02-04 22:59       ` Jeff Muizelaar
  1 sibling, 1 reply; 84+ messages in thread
From: Linus Torvalds @ 2003-02-04 22:14 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel


On 4 Feb 2003, Andi Kleen wrote:
> 
> If you want small and fast use lcc.

lcc isn't really something I want to use, since the license is so strange, 
and thus can't be improved upon if there are issues with it.

Some people have used the Intel compiler - which obviously also cannot be
improved upon, but which is likely to start off pretty good. I don't
really want to use it myself - what I'd really like to see is gcc
splitting up just the C compiler as a separate project with more attention
to size and speed.

		Linus



* Re: gcc 2.95 vs 3.21 performance
       [not found]   ` <b1pbt8$2ll$1@penguin.transmeta.com.suse.lists.linux.kernel>
@ 2003-02-04 22:05     ` Andi Kleen
  2003-02-04 22:14       ` Linus Torvalds
  2003-02-04 22:59       ` Jeff Muizelaar
  0 siblings, 2 replies; 84+ messages in thread
From: Andi Kleen @ 2003-02-04 22:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

torvalds@transmeta.com (Linus Torvalds) writes:
> 
> I'd love to see a small - and fast - C compiler, and I'd be willing to
> make kernel changes to make it work with it.  

If you want small and fast use lcc.

Unfortunately it's not completely free (some weird license), doesn't
really support real inline assembly and generates rather bad code compared 
to gcc.

I'm still looking forward to Open Watcom (http://www.openwatcom.org) - 
they are near self hosting on Linux. The inline assembly is very VC++ style 
though; very different from gcc and worse you have to write it in
Intel syntax.
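
To make the syntax difference concrete, here is a small sketch of gcc-style
extended inline assembly, which uses AT&T operand order (source first, then
destination). The function name add_ints is invented for the example, and the
assembly path is only compiled on x86 with a GCC-compatible compiler; elsewhere
it falls back to plain C.

```c
#include <assert.h>

/* Returns a + b, using extended inline assembly where available. */
int add_ints(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* AT&T order: "addl src, dst" adds b into a. */
    __asm__("addl %1, %0"
            : "+r"(a)       /* %0: read-write operand kept in a register */
            : "r"(b)        /* %1: read-only input in a register */
            : "cc");        /* the condition flags are clobbered */
    return a;
#else
    return a + b;           /* portable fallback for other targets */
#endif
}
```

In Intel syntax the same instruction would be written destination first, for
example "add eax, ebx", and the VC++ style names registers or variables
directly instead of using constraint strings.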

Another alternative would be TenDRA, but it also has no inline assembly
and its C understanding can be only described as "fascist".

If you don't care about free software you could also use the Intel
compiler, which seems to be often faster in compile time than gcc now
and can already compile kernels.

-Andi


end of thread, other threads:[~2003-02-10 23:19 UTC | newest]

Thread overview: 84+ messages
2003-02-03 23:05 gcc 2.95 vs 3.21 performance Martin J. Bligh
2003-02-03 23:22 ` [Lse-tech] " Andi Kleen
2003-02-03 23:31 ` Richard B. Johnson
2003-02-04  0:43   ` J.A. Magallon
2003-02-04 13:42     ` Richard B. Johnson
2003-02-04 14:20       ` John Bradford
2003-02-04  6:54   ` Denis Vlasenko
2003-02-04  7:13     ` Martin J. Bligh
2003-02-04 12:25       ` Adrian Bunk
2003-02-04 15:51         ` Martin J. Bligh
2003-02-04 16:27           ` [Lse-tech] " Martin J. Bligh
2003-02-04 17:40             ` Patrick Mansfield
2003-02-04 17:55               ` Martin J. Bligh
2003-02-04  9:54     ` Bryan Andersen
2003-02-04 15:46       ` Martin J. Bligh
2003-02-04 19:09     ` Timothy D. Witham
2003-02-04 19:35       ` John Bradford
2003-02-04 19:44         ` Dave Jones
2003-02-04 20:11           ` John Bradford
2003-02-04 20:20             ` John Bradford
2003-02-04 20:45             ` Herman Oosthuysen
2003-02-04 21:44               ` Timothy D. Witham
2003-02-05  7:15               ` Denis Vlasenko
2003-02-05 10:36                 ` Andreas Schwab
2003-02-05 11:41                   ` Denis Vlasenko
2003-02-05 12:20                     ` Dave Jones
2003-02-05 13:10                     ` [Lse-tech] " Dipankar Sarma
2003-02-05 15:30                 ` Martin J. Bligh
2003-02-04 21:38         ` Linus Torvalds
2003-02-04 21:54           ` John Bradford
2003-02-04 22:11             ` Linus Torvalds
2003-02-04 23:27               ` Timothy D. Witham
2003-02-04 23:21           ` Larry McVoy
2003-02-04 23:42             ` b_adlakha
2003-02-05  0:19               ` Andy Pfiffer
2003-02-04 23:51             ` Jakob Oestergaard
2003-02-05  1:03               ` Hugo Mills
2003-02-10 22:26               ` Andrea Arcangeli
2003-02-10 23:28                 ` J.A. Magallon
2003-02-04 23:51             ` Eli Carter
2003-02-05  0:27               ` Larry McVoy
2003-02-06 20:42                 ` Paul Jakma
2003-02-05  3:03             ` Tomas Szepe
2003-02-05  6:03             ` Mark Mielke
2003-02-07 16:09           ` Pavel Machek
2003-02-04 10:57   ` Padraig
2003-02-04 13:11     ` Helge Hafting
2003-02-04 13:29       ` Jörn Engel
2003-02-04 14:05       ` P
2003-02-04 20:36         ` Herman Oosthuysen
2003-02-04 12:20 ` [Lse-tech] " Dave Jones
2003-02-04 15:50   ` Martin J. Bligh
2003-02-10 12:13     ` Momchil Velikov
2003-02-06 15:42 ` gcc -O2 vs gcc -Os performance Martin J. Bligh
2003-02-06 15:51   ` [Lse-tech] " Andi Kleen
2003-02-06 17:48   ` Alan Cox
2003-02-06 17:06     ` Martin J. Bligh
2003-02-06 20:38     ` Martin J. Bligh
2003-02-06 21:32       ` John Bradford
2003-02-06 22:12       ` Linus Torvalds
2003-02-06 22:58         ` Martin J. Bligh
2003-02-06 23:16           ` Linus Torvalds
2003-02-06 23:59             ` Martin J. Bligh
2003-02-06 23:17       ` Roger Larsson
2003-02-06 23:33         ` Martin J. Bligh
     [not found] <1044385759.1861.46.camel@localhost.localdomain.suse.lists.linux.kernel>
     [not found] ` <200302041935.h14JZ69G002675@darkstar.example.net.suse.lists.linux.kernel>
     [not found]   ` <b1pbt8$2ll$1@penguin.transmeta.com.suse.lists.linux.kernel>
2003-02-04 22:05     ` gcc 2.95 vs 3.21 performance Andi Kleen
2003-02-04 22:14       ` Linus Torvalds
2003-02-05 10:04         ` Pavel Janík
2003-02-05 20:07           ` Linus Torvalds
2003-02-06 15:00           ` Horst von Brand
2003-02-04 22:59       ` Jeff Muizelaar
2003-02-04 23:12         ` b_adlakha
2003-02-05  8:41         ` Horst von Brand
2003-02-05 19:09         ` Linus Torvalds
2003-02-05 19:22           ` Randy.Dunlap
2003-02-05 19:24           ` John Bradford
2003-02-06  7:02         ` Neil Booth
     [not found]           ` <courier.3E423112.00007219@softhome.net>
     [not found]             ` <20030206212218.GA4891@daikokuya.co.uk>
2003-02-07 10:31               ` b_adlakha
2003-02-07 18:46                 ` Horst von Brand
2003-02-07 21:49                 ` Neil Booth
2003-02-10  2:14           ` Jeff Garzik
2003-02-10  9:19             ` Tomas Szepe
     [not found] <120432836@toto.iv>
2003-02-05  2:45 ` Peter Chubb
     [not found] <200302052021.h15KLrXv000881@darkstar.example.net>
2003-02-05 20:28 ` b_adlakha
