* [PATCH] trivial: the memset operation on a automatic array variable should be optimized out by data initialization
@ 2007-06-23 5:15 Denis Cheng
  2007-06-23 7:59 ` Oleg Verych
  0 siblings, 1 reply; 35+ messages in thread
From: Denis Cheng @ 2007-06-23 5:15 UTC
To: trivial; +Cc: linux-kernel

From: Denis Cheng <crquan@gmail.com>

the explicit memset call could be optimized out by data initialization,
thus all the fill working can be done by the compiler implicitly.

and C standard guaranteed all the unspecified data field initialized to zero.

Signed-off-by: Denis Cheng <crquan@gmail.com>

---
After comments in the former threads:
http://lkml.org/lkml/2007/6/18/119
http://lkml.org/lkml/2007/6/18/48

On 6/18/07, Jan Engelhardt <jengelh@computergmbh.de> wrote:
> The cost is the same. "= {0}" is transformed into a bunch of movs,
> or a rep mov, (At least for x86), so is equivalent to memset (which
> will get transformed to __builtin_memset anyway). So I wonder
> what this really buys.
>
> And, you do not even need the zero. Just write
> ...[MAX_NR_ZONES] = {};
> Jan

I also think this style of zero initialization would be better.
so the patch is little different:

--- arch/x86_64/mm/init.c.orig	2007-06-07 10:08:04.000000000 +0800
+++ arch/x86_64/mm/init.c	2007-06-23 13:12:26.000000000 +0800
@@ -406,8 +406,8 @@ void __cpuinit zap_low_mappings(int cpu)
 #ifndef CONFIG_NUMA
 void __init paging_init(void)
 {
-	unsigned long max_zone_pfns[MAX_NR_ZONES];
-	memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
+	unsigned long max_zone_pfns[MAX_NR_ZONES] = {};
+
 	max_zone_pfns[ZONE_DMA] = MAX_DMA_PFN;
 	max_zone_pfns[ZONE_DMA32] = MAX_DMA32_PFN;
 	max_zone_pfns[ZONE_NORMAL] = end_pfn;
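For readers who want to compare the two forms in the patch above outside the kernel, here is a minimal sketch in plain C. It is not kernel code: NR_SLOTS and both function names are made up for illustration. It uses "= {0}", which ISO C guarantees will zero the remaining elements; the empty-brace form "= {}" used in the patch was a GNU C extension at the time.

```c
#include <string.h>

#define NR_SLOTS 4	/* hypothetical stand-in for MAX_NR_ZONES */

/* Old style: declare the array, then clear it with an explicit memset(). */
static void slots_via_memset(unsigned long out[NR_SLOTS])
{
	unsigned long slots[NR_SLOTS];

	memset(slots, 0, sizeof(slots));
	slots[1] = 42;
	memcpy(out, slots, sizeof(slots));
}

/* New style: let the initializer zero every element not named explicitly. */
static void slots_via_initializer(unsigned long out[NR_SLOTS])
{
	unsigned long slots[NR_SLOTS] = {0};

	slots[1] = 42;
	memcpy(out, slots, sizeof(slots));
}
```

Both functions should leave the same contents in the caller's array. Whether the generated code differs depends on the compiler version and flags, which is what the rest of the thread argues about.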
* Re: [PATCH] trivial: the memset operation on a automatic array variable should be optimized out by data initialization
  2007-06-23 5:15 [PATCH] trivial: the memset operation on a automatic array variable should be optimized out by data initialization Denis Cheng
@ 2007-06-23 7:59 ` Oleg Verych
  2007-06-23 13:13 ` Adrian Bunk
  2007-06-24 12:58 ` rae l
  0 siblings, 2 replies; 35+ messages in thread
From: Oleg Verych @ 2007-06-23 7:59 UTC
To: Denis Cheng; +Cc: trivial, linux-kernel

* From: Denis Cheng
* Newsgroups: linux.kernel
* Date: Fri, 22 Jun 2007 22:15:49 -0700 (PDT)

> From: Denis Cheng <crquan@gmail.com>
>
> the explicit memset call could be optimized out by data initialization,
> thus all the fill working can be done by the compiler implicitly.

Can be optimized and can be done by compiler are just words;

> and C standard guaranteed all the unspecified data field initialized to zero.

standards and implementation are on opposite poles of magnet

> Signed-off-by: Denis Cheng <crquan@gmail.com>
>
> ---
> After comments in the former threads:
> http://lkml.org/lkml/2007/6/18/119

i see a patch

> http://lkml.org/lkml/2007/6/18/48

same patch.

> @@ -406,8 +406,8 @@ void __cpuinit zap_low_mappings(int cpu)
>  #ifndef CONFIG_NUMA
>  void __init paging_init(void)
>  {
> -	unsigned long max_zone_pfns[MAX_NR_ZONES];
> -	memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
> +	unsigned long max_zone_pfns[MAX_NR_ZONES] = {};
> +
>  	max_zone_pfns[ZONE_DMA] = MAX_DMA_PFN;
>  	max_zone_pfns[ZONE_DMA32] = MAX_DMA32_PFN;
>  	max_zone_pfns[ZONE_NORMAL] = end_pfn;

Why not just show actual objdump output on code (maybe with different
oxygen atoms used in gcc), rather than *talking* about optimization and
standards, hm?

I bet, that will be a key for success. And if you are interested in such
optimizations, why not to grep whole source tree for this kind of
things? I'm not sure one function in arch/x86_64 is only such ``unoptimized''.
And after doing that maybe you will see, that "{}" initializer can be
applied not only to integer values (you did init with of *long int*,
with *int*, btw), but to structs and others.

Ahh, one more thing about _optimizing_ your time, i.e. not wasting one.

Add to CC list people, who already did reply on you patch. Otherwise
you are showing your disrespect for them and hiding from further
discussion.

I think you do not, but Linux development not have an automatic system
for patch tracking, so you are on your own with your text editor and
e-mail client on this. Please take care for your time.

--
frenzy
-o--=O`C
 #oo'L O
<___=E M
* Re: [PATCH] trivial: the memset operation on a automatic array variable should be optimized out by data initialization
  2007-06-23 7:59 ` Oleg Verych
@ 2007-06-23 13:13 ` Adrian Bunk
  2007-06-23 13:41 ` Oleg Verych
  2007-06-24 12:58 ` rae l
  1 sibling, 1 reply; 35+ messages in thread
From: Adrian Bunk @ 2007-06-23 13:13 UTC
To: Oleg Verych; +Cc: Denis Cheng, trivial, linux-kernel

On Sat, Jun 23, 2007 at 09:59:33AM +0200, Oleg Verych wrote:
> * From: Denis Cheng
> * Newsgroups: linux.kernel
> * Date: Fri, 22 Jun 2007 22:15:49 -0700 (PDT)
>
> > From: Denis Cheng <crquan@gmail.com>
> >
> > the explicit memset call could be optimized out by data initialization,
> > thus all the fill working can be done by the compiler implicitly.
>
> Can be optimized and can be done by compiler are just words;
>
> > and C standard guaranteed all the unspecified data field initialized to zero.
>
> standards and implementation are on opposite poles of magnet

Bullshit.

We expect a C compiler, and if a C compiler violates the C standard
that's a bug in the compiler that has to be fixed.

And gcc is usually quite good in following the C standard.

> > Signed-off-by: Denis Cheng <crquan@gmail.com>
> >
> > ---
> > After comments in the former threads:
> > http://lkml.org/lkml/2007/6/18/119
>
> i see a patch
>
> > http://lkml.org/lkml/2007/6/18/48
>
> same patch.
>...

Open your eyes and you'll find thread overviews at the left side of
the URLs he gave...

cu
Adrian

--

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed
* Re: [PATCH] trivial: the memset operation on a automatic array variable should be optimized out by data initialization
  2007-06-23 13:13 ` Adrian Bunk
@ 2007-06-23 13:41 ` Oleg Verych
  2007-06-23 13:57 ` Adrian Bunk
  0 siblings, 1 reply; 35+ messages in thread
From: Oleg Verych @ 2007-06-23 13:41 UTC
To: Adrian Bunk; +Cc: Denis Cheng, trivial, linux-kernel

On Sat, Jun 23, 2007 at 03:13:55PM +0200, Adrian Bunk wrote:
> On Sat, Jun 23, 2007 at 09:59:33AM +0200, Oleg Verych wrote:
[]
> > > From: Denis Cheng <crquan@gmail.com>
> > >
> > > the explicit memset call could be optimized out by data initialization,
> > > thus all the fill working can be done by the compiler implicitly.
> >
> > Can be optimized and can be done by compiler are just words;
> >
> > > and C standard guaranteed all the unspecified data field initialized to zero.
> >
> > standards and implementation are on opposite poles of magnet
>
> Bullshit.
>
> We expect a C compiler, and if a C compiler violates the C standard
> that's a bug in the compiler that has to be fixed.

If you are serious, please consider last kernel headers vs ANSI C
discussion, then GNU extensions of the GCC C compiler and relevant "if
ICC doesn't support GCC extensions it's ICC's bug". That was about
implementation. About standards you are not serious, aren't you?
(Please don't see this as for this particular case, but as general
viewpoint)

> And gcc is usually quite good in following the C standard.
>
> > > Signed-off-by: Denis Cheng <crquan@gmail.com>
> > >
> > > ---
> > > After comments in the former threads:
> > > http://lkml.org/lkml/2007/6/18/119
> >
> > i see a patch
> >
> > > http://lkml.org/lkml/2007/6/18/48
> >
> > same patch.
> >...
>
> Open your eyes and you'll find thread overviews at the left side of
> the URLs he gave...

Two threads with *different* URLs but with *same* patch...

>
> cu
> Adrian

Where's constructive context and support of yet another patch author,
Adrian?

> --
>
>        "Is there not promise of rain?" Ling Tan asked suddenly out
>         of the darkness. There had been need of rain for many days.
>        "Only a promise," Lao Er said.
>                                        Pearl S. Buck - Dragon Seed
>
____
* Re: [PATCH] trivial: the memset operation on a automatic array variable should be optimized out by data initialization
  2007-06-23 13:41 ` Oleg Verych
@ 2007-06-23 13:57 ` Adrian Bunk
  2007-06-23 15:21 ` Segher Boessenkool
  0 siblings, 1 reply; 35+ messages in thread
From: Adrian Bunk @ 2007-06-23 13:57 UTC
To: Oleg Verych; +Cc: Denis Cheng, trivial, linux-kernel

On Sat, Jun 23, 2007 at 03:41:26PM +0200, Oleg Verych wrote:
> On Sat, Jun 23, 2007 at 03:13:55PM +0200, Adrian Bunk wrote:
> > On Sat, Jun 23, 2007 at 09:59:33AM +0200, Oleg Verych wrote:
> []
> > > > From: Denis Cheng <crquan@gmail.com>
> > > >
> > > > the explicit memset call could be optimized out by data initialization,
> > > > thus all the fill working can be done by the compiler implicitly.
> > >
> > > Can be optimized and can be done by compiler are just words;
> > >
> > > > and C standard guaranteed all the unspecified data field initialized to zero.
> > >
> > > standards and implementation are on opposite poles of magnet
> >
> > Bullshit.
> >
> > We expect a C compiler, and if a C compiler violates the C standard
> > that's a bug in the compiler that has to be fixed.
>
> If you are serious, please consider last kernel headers vs ANSI C
> discussion,

If only Joerg would tell us where the problem exactly is...

There might be a bug in the kernel header, but this simply has to be
fixed.

> then GNU extensions of the GCC C compiler and relevant "if
> ICC doesn't support GCC extensions it's ICC's bug".

gcc is a C compiler and claims to follow the C standard.

The kernel does not claim to be compilable by a plain C compiler.

Spot the difference?

> That was about
> implementation. About standards you are not serious, aren't you?
> (Please don't see this as for this particular case, but as general
> viewpoint)

And as with many generalizations, that's often wrong...

> > And gcc is usually quite good in following the C standard.
> >
> > > > Signed-off-by: Denis Cheng <crquan@gmail.com>
> > > >
> > > > ---
> > > > After comments in the former threads:
> > > > http://lkml.org/lkml/2007/6/18/119
> > >
> > > i see a patch
> > >
> > > > http://lkml.org/lkml/2007/6/18/48
> > >
> > > same patch.
> > >...
> >
> > Open your eyes and you'll find thread overviews at the left side of
> > the URLs he gave...
>
> Two threads with *different* URLs but with *same* patch...
>...

The comments are in the _threads_.

The patches are only the roots of the threads.

cu
Adrian

--

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed
* Re: [PATCH] trivial: the memset operation on a automatic array variable should be optimized out by data initialization
  2007-06-23 13:57 ` Adrian Bunk
@ 2007-06-23 15:21 ` Segher Boessenkool
  0 siblings, 0 replies; 35+ messages in thread
From: Segher Boessenkool @ 2007-06-23 15:21 UTC
To: Adrian Bunk; +Cc: Denis Cheng, Oleg Verych, linux-kernel, trivial

> gcc is a C compiler and claims to follow the C standard.

Not with the options the kernel build uses. But, close
enough -- the differences are really minor stuff.

Segher
* Re: [PATCH] trivial: the memset operation on a automatic array variable should be optimized out by data initialization
  2007-06-23 7:59 ` Oleg Verych
  2007-06-23 13:13 ` Adrian Bunk
@ 2007-06-24 12:58 ` rae l
  2007-06-24 22:25 ` Oleg Verych
  1 sibling, 1 reply; 35+ messages in thread
From: rae l @ 2007-06-24 12:58 UTC
To: Oleg Verych; +Cc: trivial, linux-kernel

On 6/23/07, Oleg Verych <olecom@flower.upol.cz> wrote:
> Why not just show actual objdump output on code (maybe with different
> oxygen atoms used in gcc), rather than *talking* about optimization and
> standards, hm?
here is the objdump output of the two object files:
As you could see, the older one used 0x38 bytes stack space while the
new one used 0x28 bytes,
and the object code is two bytes less,
I think all these benefits are the gcc's __builtin_memset optimization
than the explicit call to memset.

$ objdump -d /tmp/init.orig.o|grep -A23 -nw '<paging_init>'
525:0000000000000395 <paging_init>:
526-     395:   48 83 ec 38             sub    $0x38,%rsp
527-     399:   48 8d 54 24 10          lea    0x10(%rsp),%rdx
528-     39e:   fc                      cld
529-     39f:   31 c0                   xor    %eax,%eax
530-     3a1:   48 89 d7                mov    %rdx,%rdi
531-     3a4:   ab                      stos   %eax,%es:(%rdi)
532-     3a5:   ab                      stos   %eax,%es:(%rdi)
533-     3a6:   ab                      stos   %eax,%es:(%rdi)
534-     3a7:   ab                      stos   %eax,%es:(%rdi)
535-     3a8:   ab                      stos   %eax,%es:(%rdi)
536-     3a9:   48 89 7c 24 08          mov    %rdi,0x8(%rsp)
537-     3ae:   ab                      stos   %eax,%es:(%rdi)
538-     3af:   48 c7 44 24 10 00 10    movq   $0x1000,0x10(%rsp)
539-     3b6:   00 00
540-     3b8:   48 c7 44 24 18 00 00    movq   $0x100000,0x18(%rsp)
541-     3bf:   10 00
542-     3c1:   48 8b 05 00 00 00 00    mov    0(%rip),%rax        # 3c8 <paging_init+0x33>
543-     3c8:   48 89 44 24 20          mov    %rax,0x20(%rsp)
544-     3cd:   48 89 d7                mov    %rdx,%rdi
545-     3d0:   e8 00 00 00 00          callq  3d5 <paging_init+0x40>
546-     3d5:   48 83 c4 38             add    $0x38,%rsp
547-     3d9:   c3                      retq
548-

$ objdump -d /tmp/init.new.o|grep -A23 -nw '<paging_init>'
525:0000000000000395 <paging_init>:
526-     395:   48 83 ec 28             sub    $0x28,%rsp
527-     399:   48 89 e7                mov    %rsp,%rdi
528-     39c:   fc                      cld
529-     39d:   31 c0                   xor    %eax,%eax
530-     39f:   ab                      stos   %eax,%es:(%rdi)
531-     3a0:   ab                      stos   %eax,%es:(%rdi)
532-     3a1:   ab                      stos   %eax,%es:(%rdi)
533-     3a2:   ab                      stos   %eax,%es:(%rdi)
534-     3a3:   ab                      stos   %eax,%es:(%rdi)
535-     3a4:   ab                      stos   %eax,%es:(%rdi)
536-     3a5:   48 c7 04 24 00 10 00    movq   $0x1000,(%rsp)
537-     3ac:   00
538-     3ad:   48 c7 44 24 08 00 00    movq   $0x100000,0x8(%rsp)
539-     3b4:   10 00
540-     3b6:   48 8b 05 00 00 00 00    mov    0(%rip),%rax        # 3bd <paging_init+0x28>
541-     3bd:   48 89 44 24 10          mov    %rax,0x10(%rsp)
542-     3c2:   48 89 e7                mov    %rsp,%rdi
543-     3c5:   e8 00 00 00 00          callq  3ca <paging_init+0x35>
544-     3ca:   48 83 c4 28             add    $0x28,%rsp
545-     3ce:   c3                      retq
546-
547-00000000000003cf <alloc_low_page>:
548-     3cf:   41 56                   push   %r14

>
> I bet, that will be a key for success. And if you are interested in such
> optimizations, why not to grep whole source tree for this kind of
> things? I'm not sure one function in arch/x86_64 is only such ``unoptimized''.
> And after doing that maybe you will see, that "{}" initializer can be
> applied not only to integer values (you did init with of *long int*,
> with *int*, btw), but to structs and others.
with '{}' initializer, gcc will fill its memory with zeros.

to other potential points to be optimized, I only see this trivial as
the first point, I wonder how people gives comments on this; and if
this optimization can be tested correctly, this can be done as an
optimization example and I'll try others.

>
> Ahh, one more thing about _optimizing_ your time, i.e. not wasting one.
>
> Add to CC list people, who already did reply on you patch. Otherwise
> you are showing your disrespect for them and hiding from further
> discussion.
Thank you, I know it and I've already subscribed the linux kernel
mailing list(linux-kernel@vger.kernel.org) so that I won't miss any
further discussion about it.

>
> I think you do not, but Linux development not have an automatic system
> for patch tracking, so you are on your own with your text editor and
> e-mail client on this. Please take care for your time.
What about that?
Do you mean something such as git by "an automatic system"?

>
> --
> frenzy
> -o--=O`C
>  #oo'L O
> <___=E M
>

--
Denis Cheng
Linux Application Developer
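The size figures in the message above can be checked from the two listings themselves. A small sketch in C, using only addresses and immediates that appear in the objdump output; the macro and function names are made up for illustration:

```c
/* Values copied from the two objdump listings in the message above. */
#define FUNC_START	0x395UL	/* <paging_init> starts here in both files */
#define RETQ_ORIG	0x3d9UL	/* address of retq, memset() version */
#define RETQ_NEW	0x3ceUL	/* address of retq, "= {}" version */
#define STACK_ORIG	0x38UL	/* sub $0x38,%rsp */
#define STACK_NEW	0x28UL	/* sub $0x28,%rsp */

/* retq is a single byte (c3), so the function ends one byte past it. */
static unsigned long func_size(unsigned long retq_addr)
{
	return retq_addr + 1 - FUNC_START;
}

/* Bytes of object code saved by the new version. */
static unsigned long code_bytes_saved(void)
{
	return func_size(RETQ_ORIG) - func_size(RETQ_NEW);
}

/* Bytes of stack frame saved by the new version. */
static unsigned long stack_bytes_saved(void)
{
	return STACK_ORIG - STACK_NEW;
}
```

With these numbers the function shrinks from 69 to 58 bytes, a saving of 11 bytes rather than two, and the stack frame shrinks from 56 to 40 bytes, a saving of 16 bytes.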
* Re: [PATCH] trivial: the memset operation on a automatic array variable should be optimized out by data initialization
  2007-06-24 12:58 ` rae l
@ 2007-06-24 22:25 ` Oleg Verych
  2007-06-24 22:15 ` Arjan van de Ven
  0 siblings, 1 reply; 35+ messages in thread
From: Oleg Verych @ 2007-06-24 22:25 UTC
To: rae l; +Cc: trivial, linux-kernel

On Sun, Jun 24, 2007 at 08:58:10PM +0800, rae l wrote:
> On 6/23/07, Oleg Verych <olecom@flower.upol.cz> wrote:
> >Why not just show actual objdump output on code (maybe with different
> >oxygen atoms used in gcc), rather than *talking* about optimization and
> >standards, hm?
> here is the objdump output of the two object files:
> As you could see, the older one used 0x38 bytes stack space while the
> new one used 0x28 bytes,
> and the object code is two bytes less,

Actually more: $((0x3d9 - 0x3ce))

> I think all these benefits are the gcc's __builtin_memset optimization
> than the explicit call to memset.

... or from complex memset() implementation (some chips even didn't do
`rep' fast enough somehow). Maybe code like below will be acceptable for
both optimizers and maintainers?

|-*-
	unsigned long max_zone_pfns[MAX_NR_ZONES] = {
		[ZONE_DMA] = MAX_DMA_PFN,
		[ZONE_DMA32] = MAX_DMA32_PFN,
		[ZONE_NORMAL] = end_pfn,
		[ZONE_MOVABLE] = 0UL
	};
|-*-

> $ objdump -d /tmp/init.orig.o|grep -A23 -nw '<paging_init>'
[]
> 547-     3d9:   c3                      retq
[]
> 545-     3ce:   c3                      retq
[]
> >
> >I bet, that will be a key for success. And if you are interested in such
> >optimizations, why not to grep whole source tree for this kind of
> >things? I'm not sure one function in arch/x86_64 is only such
> >``unoptimized''.
> >And after doing that maybe you will see, that "{}" initializer can be
> >applied not only to integer values (you did init with of *long int*,
> >with *int*, btw), but to structs and others.
> with '{}' initializer, gcc will fill its memory with zeros.
>
> to other potential points to be optimized, I only see this trivial as
> the first point, I wonder how people gives comments on this; and if
> this optimization can be tested correctly, this can be done as an
> optimization example and I'll try others.

Yes, comments and discussion is most important thing. But with such
propositions you will be better in the kernel-janitors list.

> >
> >Ahh, one more thing about _optimizing_ your time, i.e. not wasting one.
> >
> >Add to CC list people, who already did reply on you patch. Otherwise
> >you are showing your disrespect for them and hiding from further
> >discussion.
> Thank you, I know it and I've already subscribed the linux kernel
> mailing list(linux-kernel@vger.kernel.org) so that I won't miss any
> further discussion about it.

OK, but news<=>e-mail service, like Gmane is much nicer.

> >
> >I think you do not, but Linux development not have an automatic system
> >for patch tracking, so you are on your own with your text editor and
> >e-mail client on this. Please take care for your time.
> What about that?
> Do you mean something such as git by "an automatic system"?

That was a side note.
____
* Re: [PATCH] trivial: the memset operation on a automatic array variable should be optimized out by data initialization
  2007-06-24 22:25 ` Oleg Verych
@ 2007-06-24 22:15 ` Arjan van de Ven
  2007-06-24 23:23 ` Benjamin LaHaise
  2007-06-24 23:33 ` memset() with zeroes (Re: [PATCH] trivial: the memset operation on a automatic array variable should be optimized out by data initialization) Oleg Verych
  0 siblings, 2 replies; 35+ messages in thread
From: Arjan van de Ven @ 2007-06-24 22:15 UTC
To: Oleg Verych; +Cc: rae l, trivial, linux-kernel

> > I think all these benefits are the gcc's __builtin_memset optimization
> > than the explicit call to memset.
>
> ... or from complex memset() implementation (some chips even didn't do
> `rep' fast enough somehow). Maybe code like below will be acceptable for
> both optimizers and maintainers?

we should just alias our memset to the __builtin one, and then provide a
generic one from lib/ for the cases gcc needs to do a fallback.
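The designated-initializer form Oleg proposed just before this message ("code like below") can be tried outside the kernel. The following is only a sketch under stated assumptions: the enum, the DEMO_ names and the function are hypothetical stand-ins, and the two constants simply mirror the immediates visible in the objdump listings earlier in the thread.

```c
#include <string.h>

/* Hypothetical stand-ins for the kernel's zone indices. */
enum demo_zone {
	DEMO_ZONE_DMA,
	DEMO_ZONE_DMA32,
	DEMO_ZONE_NORMAL,
	DEMO_ZONE_MOVABLE,
	DEMO_NR_ZONES
};

#define DEMO_DMA_PFN	0x1000UL	/* assumption, taken from the listing */
#define DEMO_DMA32_PFN	0x100000UL	/* assumption, taken from the listing */

static void demo_fill_zone_limits(unsigned long out[DEMO_NR_ZONES],
				  unsigned long end_pfn)
{
	/* Elements not named here (DEMO_ZONE_MOVABLE) are zero-initialized. */
	unsigned long max_zone_pfns[DEMO_NR_ZONES] = {
		[DEMO_ZONE_DMA]    = DEMO_DMA_PFN,
		[DEMO_ZONE_DMA32]  = DEMO_DMA32_PFN,
		[DEMO_ZONE_NORMAL] = end_pfn,
	};

	memcpy(out, max_zone_pfns, sizeof(max_zone_pfns));
}
```

The sketch leaves out the explicit "= 0UL" entry from the proposal to show that unnamed elements come out as zero anyway; spelling it out, as Oleg did, documents the intent at the cost of one more line.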
* Re: [PATCH] trivial: the memset operation on a automatic array variable should be optimized out by data initialization
  2007-06-24 22:15 ` Arjan van de Ven
@ 2007-06-24 23:23 ` Benjamin LaHaise
  2007-06-25 0:09 ` Arjan van de Ven
  2007-06-24 23:33 ` memset() with zeroes (Re: [PATCH] trivial: the memset operation on a automatic array variable should be optimized out by data initialization) Oleg Verych
  1 sibling, 1 reply; 35+ messages in thread
From: Benjamin LaHaise @ 2007-06-24 23:23 UTC
To: Arjan van de Ven; +Cc: Oleg Verych, rae l, trivial, linux-kernel

On Sun, Jun 24, 2007 at 03:15:17PM -0700, Arjan van de Ven wrote:
> we should just alias our memset to the __builtin one, and then provide a
> generic one from lib/ for the cases gcc needs to do a fallback.

The last time I checked, gcc generated horrible badly performing code for
builtin memset/memcpy() when -Os is specified.

		-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <zyntrop@kvack.org>.
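The aliasing idea quoted above might look roughly like the sketch below. This is an illustration only, not the kernel's actual headers or lib/ code: demo_memset and demo_generic_memset are made-up names, and __builtin_memset is the GCC builtin mentioned in the thread.

```c
#include <stddef.h>

/* Hypothetical wrapper: route callers to the compiler's builtin. */
#define demo_memset(dst, c, n)	__builtin_memset((dst), (c), (n))

/*
 * Plain byte-by-byte fallback, of the kind a generic library file could
 * provide for the cases where the compiler emits a real function call.
 */
static void *demo_generic_memset(void *dst, int c, size_t n)
{
	unsigned char *p = dst;

	while (n--)
		*p++ = (unsigned char)c;
	return dst;
}
```

With a constant size the compiler is free to expand the builtin inline, which is where the quality of the generated code under -Os versus -O2 becomes the question debated in the following messages.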
* Re: [PATCH] trivial: the memset operation on a automatic array variable should be optimized out by data initialization
  2007-06-24 23:23 ` Benjamin LaHaise
@ 2007-06-25 0:09 ` Arjan van de Ven
  2007-06-25 0:12 ` Benjamin LaHaise
  0 siblings, 1 reply; 35+ messages in thread
From: Arjan van de Ven @ 2007-06-25 0:09 UTC
To: Benjamin LaHaise; +Cc: Oleg Verych, rae l, trivial, linux-kernel

On Sun, 2007-06-24 at 19:23 -0400, Benjamin LaHaise wrote:
> On Sun, Jun 24, 2007 at 03:15:17PM -0700, Arjan van de Ven wrote:
> > we should just alias our memset to the __builtin one, and then provide a
> > generic one from lib/ for the cases gcc needs to do a fallback.
>
> The last time I checked, gcc generated horrible badly performing code for
> builtin memset/memcpy() when -Os is specified.

if you care about the last cycle, don't specify -Os but -O2.
simple as that... you get what you tell the compiler you want.

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
* Re: [PATCH] trivial: the memset operation on a automatic array variable should be optimized out by data initialization
  2007-06-25 0:09 ` Arjan van de Ven
@ 2007-06-25 0:12 ` Benjamin LaHaise
  2007-06-25 0:23 ` Arjan van de Ven
  0 siblings, 1 reply; 35+ messages in thread
From: Benjamin LaHaise @ 2007-06-25 0:12 UTC
To: Arjan van de Ven; +Cc: Oleg Verych, rae l, trivial, linux-kernel

On Sun, Jun 24, 2007 at 05:09:16PM -0700, Arjan van de Ven wrote:
> if you care about the last cycle, don't specify -Os but -O2.
> simple as that... you get what you tell the compiler you want.

Certain distros are shipping kernels compiled with -Os. And it's more
than just a couple of cycles.

		-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <zyntrop@kvack.org>.
* Re: [PATCH] trivial: the memset operation on a automatic array variable should be optimized out by data initialization
  2007-06-25 0:12 ` Benjamin LaHaise
@ 2007-06-25 0:23 ` Arjan van de Ven
  2007-06-25 0:41 ` -Os versus -O2 Adrian Bunk
  0 siblings, 1 reply; 35+ messages in thread
From: Arjan van de Ven @ 2007-06-25 0:23 UTC
To: Benjamin LaHaise; +Cc: Oleg Verych, rae l, trivial, linux-kernel

On Sun, 2007-06-24 at 20:12 -0400, Benjamin LaHaise wrote:
> On Sun, Jun 24, 2007 at 05:09:16PM -0700, Arjan van de Ven wrote:
> > if you care about the last cycle, don't specify -Os but -O2.
> > simple as that... you get what you tell the compiler you want.
>
> Certain distros are shipping kernels compiled with -Os. And it's more
> than just a couple of cycles.

so those distros pick space over some cycles. Who are you to then
override that choice ? ;-)

seriously, why are we even talking about overriding a choice the user
(or distro vendor as user) made here?

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
* -Os versus -O2
  2007-06-25 0:23 ` Arjan van de Ven
@ 2007-06-25 0:41 ` Adrian Bunk
  2007-06-25 0:58 ` Arjan van de Ven
  ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Adrian Bunk @ 2007-06-25 0:41 UTC
To: Arjan van de Ven; +Cc: Benjamin LaHaise, Oleg Verych, rae l, linux-kernel

On Sun, Jun 24, 2007 at 05:23:42PM -0700, Arjan van de Ven wrote:
> On Sun, 2007-06-24 at 20:12 -0400, Benjamin LaHaise wrote:
> > On Sun, Jun 24, 2007 at 05:09:16PM -0700, Arjan van de Ven wrote:
> > > if you care about the last cycle, don't specify -Os but -O2.
> > > simple as that... you get what you tell the compiler you want.
> >
> > Certain distros are shipping kernels compiled with -Os. And it's more
> > than just a couple of cycles.
>
> so those distros pick space over some cycles. Who are you to then
> override that choice ? ;-)
>
> seriously, why are we even talking about overriding a choice the user
> (or distro vendor as user) made here?

There is a real issue in the fact that compiling with -Os is available
through a kconfig option and AFAIR used by some distributions.

I doubt distros enable CONFIG_CC_OPTIMIZE_FOR_SIZE due to size
considerations, but due to speed considerations.

I wouldn't care if CONFIG_CC_OPTIMIZE_FOR_SIZE was hidden behind
CONFIG_EMBEDDED, but as long as it's available as a general purpose
option we have to consider it's performance.

The interesting questions are:
Does -Os still sometimes generate faster code with gcc 4.2?
If yes, why?

cu
Adrian

--

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed
* Re: -Os versus -O2
  2007-06-25 0:41 ` -Os versus -O2 Adrian Bunk
@ 2007-06-25 0:58 ` Arjan van de Ven
  2007-06-25 1:08 ` david
  2007-06-25 1:33 ` Adrian Bunk
  2007-06-25 1:23 ` Rene Herman
  2007-06-25 1:34 ` Jeff Garzik
  2 siblings, 2 replies; 35+ messages in thread
From: Arjan van de Ven @ 2007-06-25 0:58 UTC
To: Adrian Bunk; +Cc: Benjamin LaHaise, Oleg Verych, rae l, linux-kernel

> I wouldn't care if CONFIG_CC_OPTIMIZE_FOR_SIZE was hidden behind
> CONFIG_EMBEDDED, but as long as it's available as a general purpose
> option we have to consider it's performance.

I think you are missing the point. You tell the kernel to
OPTIMIZE_FOR_SIZE. *over performance*. Sure. Performance shouldn't be
EXTREMELY pathetic, but it's not; and if it were, it's a problem with
the gcc version you have (and if you are a distro, you can surely fix
that)

>
> The interesting questions are:
> Does -Os still sometimes generate faster code with gcc 4.2?
> If yes, why?

on a system level, size can help performance because you have more
memory available for other things. It also reduces download size and
gives you more space on the live CD....

if you want to make things bigger again, please do this OUTSIDE the
"optimize for size" option. Because that TELLS you to go for size.

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
* Re: -Os versus -O2
  2007-06-25 0:58 ` Arjan van de Ven
@ 2007-06-25 1:08 ` david
  2007-06-25 1:17 ` Arjan van de Ven
  2007-06-25 7:03 ` Segher Boessenkool
  2007-06-25 1:33 ` Adrian Bunk
  1 sibling, 2 replies; 35+ messages in thread
From: david @ 2007-06-25 1:08 UTC
To: Arjan van de Ven
Cc: Adrian Bunk, Benjamin LaHaise, Oleg Verych, rae l, linux-kernel

On Sun, 24 Jun 2007, Arjan van de Ven wrote:

>> I wouldn't care if CONFIG_CC_OPTIMIZE_FOR_SIZE was hidden behind
>> CONFIG_EMBEDDED, but as long as it's available as a general purpose
>> option we have to consider it's performance.
>
> I think you are missing the point. You tell the kernel to
> OPTIMIZE_FOR_SIZE. *over performance*. Sure. Performance shouldn't be
> EXTREMELY pathetic, but it's not; and if it were, it's a problem with
> the gcc version you have (and if you are a distro, you can surely fix
> that)
>
>>
>> The interesting questions are:
>> Does -Os still sometimes generate faster code with gcc 4.2?
>> If yes, why?
>
> on a system level, size can help performance because you have more
> memory available for other things. It also reduces download size and
> gives you more space on the live CD....
>
> if you want to make things bigger again, please do this OUTSIDE the
> "optimize for size" option. Because that TELLS you to go for size.

then do we need a new option 'optimize for best overall performance' that
goes for size (and the corresponding wins there) most of the time, but is
ignored where it makes a huge difference?

I started using Os several years ago, even when it was hidden in the
embedded menu because in many cases the smaller binary ended up being
faster.

in reality this was a flaw in gcc that on modern CPUs with the larger
difference between CPU speed and memory speed it still preferred to unroll
loops (eating more memory and blowing out the cpu cache) when it shouldn't
have.

if that has been fixed on later versions of gcc this would be a good
thing. if it hasn't (possibly in part due to gcc optimizations being
designed to be cross platform) then either the current 'go for size' or a
hybrid 'performance' option is needed.

David Lang
* Re: -Os versus -O2
  2007-06-25 1:08 ` david
@ 2007-06-25 1:17 ` Arjan van de Ven
  2007-06-25 1:33 ` david
  2007-06-25 7:03 ` Segher Boessenkool
  1 sibling, 1 reply; 35+ messages in thread
From: Arjan van de Ven @ 2007-06-25 1:17 UTC
To: david; +Cc: Adrian Bunk, Benjamin LaHaise, Oleg Verych, rae l, linux-kernel

On Sun, 2007-06-24 at 18:08 -0700, david@lang.hm wrote:
> >
> > on a system level, size can help performance because you have more
> > memory available for other things. It also reduces download size and
> > gives you more space on the live CD....
> >
> > if you want to make things bigger again, please do this OUTSIDE the
> > "optimize for size" option. Because that TELLS you to go for size.
>
> then do we need a new option 'optimize for best overall performance' that
> goes for size (and the corresponding wins there) most of the time, but is
> ignored where it makes a huge difference?

that isn't so easy. Anything which doesn't have a performance tradeoff
is in -O2 already. So every single thing in -Os costs you performance on
a micro level.

The translation to macro level depends greatly on how things are used
(you even have to factor in download times etc)... so that is a fair
question to leave up to the user... which is what there is today.

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
* Re: -Os versus -O2
  2007-06-25 1:17 ` Arjan van de Ven
@ 2007-06-25 1:33 ` david
  2007-06-25 1:41 ` Rene Herman
  2007-06-25 5:04 ` Willy Tarreau
  0 siblings, 2 replies; 35+ messages in thread
From: david @ 2007-06-25 1:33 UTC
To: Arjan van de Ven
Cc: Adrian Bunk, Benjamin LaHaise, Oleg Verych, rae l, linux-kernel

On Sun, 24 Jun 2007, Arjan van de Ven wrote:

> On Sun, 2007-06-24 at 18:08 -0700, david@lang.hm wrote:
>>>
>>> on a system level, size can help performance because you have more
>>> memory available for other things. It also reduces download size and
>>> gives you more space on the live CD....
>>>
>>> if you want to make things bigger again, please do this OUTSIDE the
>>> "optimize for size" option. Because that TELLS you to go for size.
>>
>> then do we need a new option 'optimize for best overall performance' that
>> goes for size (and the corresponding wins there) most of the time, but is
>> ignored where it makes a huge difference?
>
> that isn't so easy. Anything which doesn't have a performance tradeoff
> is in -O2 already. So every single thing in -Os costs you performance on
> a micro level.

this has not been true in the past (assuming that it's true today)

ok, if you look at a micro-enough level this may be true, but completely
ignoring things like download times, the optimizations almost always boil
down to trying to avoid jumps, loops, and decision logic at the expense of
space.

however recent cpu's are significantly better as handling jumps and loops,
and the cost of cache misses is significantly worse.

is the list of what's included in -O2 vs -Os different for different
CPU's? what about within a single family of processors? (even in the x86
family the costs of jumps, loops, and cache misses varies drasticly)

my understanding was that the optimizations for O2 were pretty fixed.

> The translation to macro level depends greatly on how things are used
> (you even have to factor in download times etc)... so that is a fair
> question to leave up to the user... which is what there is today.

ignore things like download time for the moment. it's not significant to
most people as they don't download things that often, and when they do
they are almost always downloading lots of stuff they don't need (drivers
for example)

users are trying to get better performance 90+% of the time when they
select -Os. That's why it got moved out of CONFIG_EMBEDDED.

David Lang
* Re: -Os versus -O2 2007-06-25 1:33 ` david @ 2007-06-25 1:41 ` Rene Herman 2007-06-25 5:04 ` Willy Tarreau 1 sibling, 0 replies; 35+ messages in thread From: Rene Herman @ 2007-06-25 1:41 UTC (permalink / raw) To: david Cc: Arjan van de Ven, Adrian Bunk, Benjamin LaHaise, Oleg Verych, rae l, linux-kernel On 06/25/2007 03:33 AM, david@lang.hm wrote: > is the list of what's included in -O2 vs -Os different for different > CPU's? what about within a single family of processors? (even in the x86 > family the costs of jumps, loops, and cache misses vary drastically) At least not in the example Duron/Athlon case. Both are -march=athlon{,-4}, but with 64K versus 256K of L2, which I'd expect to be an important difference in the -Os versus -O2 behaviour. Rene. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: -Os versus -O2 2007-06-25 1:33 ` david 2007-06-25 1:41 ` Rene Herman @ 2007-06-25 5:04 ` Willy Tarreau 2007-06-25 7:08 ` Segher Boessenkool 1 sibling, 1 reply; 35+ messages in thread From: Willy Tarreau @ 2007-06-25 5:04 UTC (permalink / raw) To: david Cc: Arjan van de Ven, Adrian Bunk, Benjamin LaHaise, Oleg Verych, rae l, linux-kernel On Sun, Jun 24, 2007 at 06:33:15PM -0700, david@lang.hm wrote: > On Sun, 24 Jun 2007, Arjan van de Ven wrote: > > >On Sun, 2007-06-24 at 18:08 -0700, david@lang.hm wrote: > >>> > >>>on a system level, size can help performance because you have more > >>>memory available for other things. It also reduces download size and > >>>gives you more space on the live CD.... > >>> > >>>if you want to make things bigger again, please do this OUTSIDE the > >>>"optimize for size" option. Because that TELLS you to go for size. > >> > >>then do we need a new option 'optimize for best overall performance' that > >>goes for size (and the corresponding wins there) most of the time, but is > >>ignored where it makes a huge difference? > > > >that isn't so easy. Anything which doesn't have a performance tradeoff > >is in -O2 already. So every single thing in -Os costs you performance on > >a micro level. > > this has not been true in the past (assuming that it's true today) > > ok, if you look at a micro-enough level this may be true, but completely > ignoring things like download times, the optimizations almost always boil > down to trying to avoid jumps, loops, and decision logic at the expense of > space. > > however recent cpu's are significantly better as handling jumps and loops, > and the cost of cache misses is significantly worse. > > is the list of what's included in -O2 vs -Os different for different > CPU's? what about within a single family of processors? (even in the x86 > family the costs of jumps, loops, and cache misses varies drasticly) > > my understanding was that the optimizations for O2 were pretty fixed. 
> > >The translation to macro level depends greatly on how things are used > >(you even have to factor in download times etc)... so that is a fair > >question to leave up to the user... which is what there is today. > > ignore things like download time for the moment. it's not significant to > most people as they don't download things that often, and when they do > they are almost always downloading lots of stuff they don't need (drivers > for example) > > users are trying to get better performance 90+% of the time when they > select -Os. That's why it got moved out of CONFIG_EMBEDDED. In my experience, -Os produced faster code on gcc-2.95 than -O2 or -O3. It was not only because of cache considerations, but because gcc used different tricks to avoid poor optimizations, and at the end, the CPU ended executing the alternative code faster. With gcc-3.3, -Os show roughly the same performance as -O2 for me on various programs. However, with gcc-3.4, I noticed a slow down with -Os. And with gcc-4, using -Os optimizes only for size, even if the output code is slow as hell. I've had programs whose speed dropped by 70% using -Os on gcc-4. But their size was smaller than with older versions. But in some situtations, it's desirable to have the smallest possible kernel whatever its performance. This goes for installation CDs for instance. Willy ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: -Os versus -O2 2007-06-25 5:04 ` Willy Tarreau @ 2007-06-25 7:08 ` Segher Boessenkool 2007-06-25 7:15 ` david 2007-06-25 8:19 ` Willy Tarreau 0 siblings, 2 replies; 35+ messages in thread From: Segher Boessenkool @ 2007-06-25 7:08 UTC (permalink / raw) To: Willy Tarreau Cc: Benjamin LaHaise, linux-kernel, Arjan van de Ven, Adrian Bunk, david, Oleg Verych, rae l > In my experience, -Os produced faster code on gcc-2.95 than -O2 or -O3. On what CPU? The effect of different optimisations varies hugely between different CPUs (and architectures). > It was not only because of cache considerations, but because gcc used > different tricks to avoid poor optimizations, and at the end, the CPU > ended executing the alternative code faster. -Os is "as fast as you can without bloating the code size", so that is the expected result for CPUs that don't need special hand-holding around certain performance pitfalls. > With gcc-3.3, -Os show roughly the same performance as -O2 for me on > various programs. However, with gcc-3.4, I noticed a slow down with > -Os. And with gcc-4, using -Os optimizes only for size, even if the > output code is slow as hell. I've had programs whose speed dropped > by 70% using -Os on gcc-4. Well you better report those! <http://gcc.gnu.org/bugzilla> > But in some situtations, it's desirable to have the smallest possible > kernel whatever its performance. This goes for installation CDs for > instance. There are much better ways to achieve that. Segher ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: -Os versus -O2 2007-06-25 7:08 ` Segher Boessenkool @ 2007-06-25 7:15 ` david 2007-06-25 7:41 ` Segher Boessenkool 2007-06-25 8:19 ` Willy Tarreau 1 sibling, 1 reply; 35+ messages in thread From: david @ 2007-06-25 7:15 UTC (permalink / raw) To: Segher Boessenkool Cc: Willy Tarreau, Benjamin LaHaise, linux-kernel, Arjan van de Ven, Adrian Bunk, Oleg Verych, rae l On Mon, 25 Jun 2007, Segher Boessenkool wrote: >> In my experience, -Os produced faster code on gcc-2.95 than -O2 or -O3. > > On what CPU? The effect of different optimisations varies > hugely between different CPUs (and architectures). > >> It was not only because of cache considerations, but because gcc used >> different tricks to avoid poor optimizations, and at the end, the CPU >> ended executing the alternative code faster. > > -Os is "as fast as you can without bloating the code size", > so that is the expected result for CPUs that don't need > special hand-holding around certain performance pitfalls. this sounds like you are saying that people wanting performance should pick -Os. what should people pick who care more about code size than anything else? (examples being embedded development where you may be willing to sacrifice speed to avoid having to add additional chips to the design) David Lang ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: -Os versus -O2 2007-06-25 7:15 ` david @ 2007-06-25 7:41 ` Segher Boessenkool 0 siblings, 0 replies; 35+ messages in thread From: Segher Boessenkool @ 2007-06-25 7:41 UTC (permalink / raw) To: david Cc: Benjamin LaHaise, linux-kernel, Willy Tarreau, Arjan van de Ven, Adrian Bunk, Oleg Verych, rae l >> -Os is "as fast as you can without bloating the code size", >> so that is the expected result for CPUs that don't need >> special hand-holding around certain performance pitfalls. > > this sounds like you are saying that people wanting performance should > pick -Os. That is true on most CPUs. Some CPUs really, really need some of the things that -Os disables (compared to -O2) for decent performance, though (branch target alignment...) > what should people pick who care more about code size than anything > else? (examples being embedded development where you may be willing to > sacrifice speed to avoid having to add additional chips to the design) -Os and tune some options. There has been extensive work done over the last few years to make GCC more suitable for embedded targets, btw. But -O1/-O2/-O3/-Os gives you only four choices; I hope it's really not so hard to understand that for more specific goals you need to add more specific options? Segher ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: -Os versus -O2 2007-06-25 7:08 ` Segher Boessenkool 2007-06-25 7:15 ` david @ 2007-06-25 8:19 ` Willy Tarreau 2007-06-25 8:41 ` Segher Boessenkool 1 sibling, 1 reply; 35+ messages in thread From: Willy Tarreau @ 2007-06-25 8:19 UTC (permalink / raw) To: Segher Boessenkool Cc: Benjamin LaHaise, linux-kernel, Arjan van de Ven, Adrian Bunk, david, Oleg Verych, rae l On Mon, Jun 25, 2007 at 09:08:23AM +0200, Segher Boessenkool wrote: > >In my experience, -Os produced faster code on gcc-2.95 than -O2 or -O3. > > On what CPU? The effect of different optimisations varies > hugely between different CPUs (and architectures). x86 > >It was not only because of cache considerations, but because gcc used > >different tricks to avoid poor optimizations, and at the end, the CPU > >ended executing the alternative code faster. > > -Os is "as fast as you can without bloating the code size", > so that is the expected result for CPUs that don't need > special hand-holding around certain performance pitfalls. > > >With gcc-3.3, -Os show roughly the same performance as -O2 for me on > >various programs. However, with gcc-3.4, I noticed a slow down with > >-Os. And with gcc-4, using -Os optimizes only for size, even if the > >output code is slow as hell. I've had programs whose speed dropped > >by 70% using -Os on gcc-4. > > Well you better report those! <http://gcc.gnu.org/bugzilla> No, -Os is for size only : -Os Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size. So it is expected that speed can be reduced using -Os. I won't report a thing which is already documented ! > >But in some situtations, it's desirable to have the smallest possible > >kernel whatever its performance. This goes for installation CDs for > >instance. > > There are much better ways to achieve that. Optimizing is not a matter of choosing *one* way, but cumulating everything you have. 
For instance, on a smart boot loader, I have a kernel which is about 300 kB, or 700 kB with the initramfs. Among the tricks I used : - -Os - -march=i386 - align everything to 0 - replace gzip with p7zip Even if each of them reduces overall size by 5%, the net result is 0.95^4 = 0.81 = 19% gain, for the same set of features. This is something to consider. Regards, Willy ^ permalink raw reply [flat|nested] 35+ messages in thread
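The compounding arithmetic in the message above (four tricks at roughly 5% each giving about 19% overall) can be checked with a small sketch. The helper names below are hypothetical and are not from the thread; the 5% figure is Willy's illustrative assumption, not a measurement.

```c
#include <assert.h>

/* Fraction of the original size left after `steps` independent
 * reductions, each shrinking the image by `per_step` (0.05 means 5%).
 * Hypothetical helper for illustration only. */
double remaining_fraction(double per_step, int steps)
{
    assert(steps >= 0 && per_step >= 0.0 && per_step <= 1.0);

    double remaining = 1.0;
    for (int i = 0; i < steps; i++)
        remaining *= 1.0 - per_step;   /* reductions multiply, they do not add */
    return remaining;
}

/* Overall size gain as a fraction of the original size. */
double overall_gain(double per_step, int steps)
{
    return 1.0 - remaining_fraction(per_step, steps);
}
```

With `per_step = 0.05` and `steps = 4` the remaining fraction is 0.95 multiplied by itself four times, which is slightly less than the 20% one would get by simply adding the four reductions.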
* Re: -Os versus -O2 2007-06-25 8:19 ` Willy Tarreau @ 2007-06-25 8:41 ` Segher Boessenkool 0 siblings, 0 replies; 35+ messages in thread From: Segher Boessenkool @ 2007-06-25 8:41 UTC (permalink / raw) To: Willy Tarreau Cc: Benjamin LaHaise, linux-kernel, Arjan van de Ven, Adrian Bunk, david, Oleg Verych, rae l >>> In my experience, -Os produced faster code on gcc-2.95 than -O2 or >>> -O3. >> >> On what CPU? The effect of different optimisations varies >> hugely between different CPUs (and architectures). > > x86 That's not a CPU, that's an architecture. I hope you understand there are very big differences between different members of the x86 family and you don't compare 2.95 on a Pentium class CPU to 3.x on an Opteron or 4.x on a Pentium4 or something like that. >>> With gcc-3.3, -Os show roughly the same performance as -O2 for me on >>> various programs. However, with gcc-3.4, I noticed a slow down with >>> -Os. And with gcc-4, using -Os optimizes only for size, even if the >>> output code is slow as hell. I've had programs whose speed dropped >>> by 70% using -Os on gcc-4. >> >> Well you better report those! <http://gcc.gnu.org/bugzilla> > > No, -Os is for size only : > > -Os Optimize for size. -Os enables all -O2 optimizations > that do not typically increase code size. It also > performs further optimizations designed to reduce code > size. That is not "for size only". Please read again. A 70% speed decrease is something that should be at least investigated, even if then perhaps it is decided GCC already does the "right thing". > So it is expected that speed can be reduced using -Os. I won't report > a thing which is already documented ! A few percent points slower is expected, 20% would be explainable, but 70%? -O2 and -Os are supposed to differ in _minor_ ways. Such a huge performance drop is unexpected. If you file the PR, feel free to blame me for reporting it at all. 
>>> But in some situtations, it's desirable to have the smallest possible >>> kernel whatever its performance. This goes for installation CDs for >>> instance. >> >> There are much better ways to achieve that. > > Optimizing is not a matter of choosing *one* way, but cumulating > everything you have. Yes of course. I'm just saying -Os is a pretty minor step in the overall making-things-smaller game. Leaving out XFS helps a whole megabyte on my default target, for example. > For instance, on a smart boot loader, I have > a kernel which is about 300 kB, or 700 kB with the initramfs. Among > the tricks I used : > - -Os > - -march=i386 > - align everything to 0 > - replace gzip with p7zip > > Even if each of them reduces overall size by 5%, the net result is > 0.95^4 = 0.81 = 19% gain, for the same set of features. This is > something to consider. Sure. I don't think making -Os mean "as small as possible in all cases" (or, rather, introducing a new option for that) would help terribly much over the current -Os meaning -- a few percent at most. That's not to say that no such optimisations are added anymore, but mostly they turn out not to decrease speed at all and so are enabled at any -O level :-) Segher ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: -Os versus -O2 2007-06-25 1:08 ` david 2007-06-25 1:17 ` Arjan van de Ven @ 2007-06-25 7:03 ` Segher Boessenkool 2007-06-25 7:13 ` david 1 sibling, 1 reply; 35+ messages in thread From: Segher Boessenkool @ 2007-06-25 7:03 UTC (permalink / raw) To: david Cc: Benjamin LaHaise, linux-kernel, Arjan van de Ven, Adrian Bunk, Oleg Verych, rae l > then do we need a new option 'optimize for best overall performance' > that goes for size (and the corresponding wins there) most of the > time, but is ignored where it makes a huge difference? That's -Os mostly. Some awful CPUs really need higher loop/label/function alignment though to get any performance; you could add -falign-xxx options for those. > in reality this was a flaw in gcc that on modern CPU's with the larger > difference between CPU speed and memory speed it still preferred to > unroll loops (eating more memory and blowing out the cpu cache) when > it shouldn't have. You told it to unroll loops, so it did. No flaw. If you feel the optimisations enabled by -O2 should depend on the CPU tuning selected, please file a PR. Also note that whether or not it is profitable to unroll a particular loop depends largely on how "hot" that loop is, and GCC doesn't know much about that if you don't feed it profiling information (it can guess a bit, sure, but it can guess wrong too). Segher ^ permalink raw reply [flat|nested] 35+ messages in thread
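To make the unrolling trade-off discussed above concrete, here is a minimal hand-written sketch of the same loop in rolled and 4x unrolled form. It is only an illustration of the size-versus-branch-count trade-off; it is not what GCC emits for any particular option, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled form: small code, one compare-and-branch per element. */
unsigned long sum_rolled(const unsigned long *a, size_t n)
{
    assert(a != NULL || n == 0);

    unsigned long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hand-unrolled by four: fewer branches per element, but roughly four
 * times the loop body plus a tail loop, so more instruction-cache
 * footprint. Only worth it if the loop is actually hot. */
unsigned long sum_unrolled4(const unsigned long *a, size_t n)
{
    assert(a != NULL || n == 0);

    unsigned long sum = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4)
        sum += a[i] + a[i + 1] + a[i + 2] + a[i + 3];

    for (; i < n; i++)          /* tail: the 0 to 3 leftover elements */
        sum += a[i];

    return sum;
}
```

Both functions return the same result; which one runs faster depends on the CPU and on how often the loop executes, which is the information the compiler lacks without profile feedback.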
* Re: -Os versus -O2 2007-06-25 7:03 ` Segher Boessenkool @ 2007-06-25 7:13 ` david 2007-06-25 7:35 ` Segher Boessenkool 0 siblings, 1 reply; 35+ messages in thread From: david @ 2007-06-25 7:13 UTC (permalink / raw) To: Segher Boessenkool Cc: Benjamin LaHaise, linux-kernel, Arjan van de Ven, Adrian Bunk, Oleg Verych, rae l On Mon, 25 Jun 2007, Segher Boessenkool wrote: >> then do we need a new option 'optimize for best overall performance' that >> goes for size (and the corresponding wins there) most of the time, but is >> ignored where it makes a huge difference? > > That's -Os mostly. Some awful CPUs really need higher > loop/label/function alignment though to get any > performance; you could add -falign-xxx options for those. > >> in reality this was a flaw in gcc that on modern CPU's with the larger >> difference between CPU speed and memory speed it still preferred to unroll >> loops (eating more memory and blowing out the cpu cache) when it shouldn't >> have. > > You told it to unroll loops, so it did. No flaw. If you > feel the optimisations enabled by -O2 should depend on the > CPU tuning selected, please file a PR. > > Also note that whether or not it is profitable to unroll > a particular loop depends largely on how "hot" that loop > is, and GCC doesn't know much about that if you don't feed > it profiling information (it can guess a bit, sure, but it > can guess wrong too). actually, what you are saying is that the compiler can't know enough to figure out how to optimize for speed. it will just do what you tell it to, either unroll loops or not. this argues that both O2 and Os are incorrect for a project to use and instead the project needs to make it's own decisions on this. if this is the true feeling of the gcc team I'm very disappointed, it feels like a huge step backwards. David Lang ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: -Os versus -O2 2007-06-25 7:13 ` david @ 2007-06-25 7:35 ` Segher Boessenkool 0 siblings, 0 replies; 35+ messages in thread From: Segher Boessenkool @ 2007-06-25 7:35 UTC (permalink / raw) To: david Cc: Benjamin LaHaise, linux-kernel, Arjan van de Ven, Adrian Bunk, Oleg Verych, rae l >> Also note that whether or not it is profitable to unroll >> a particular loop depends largely on how "hot" that loop >> is, and GCC doesn't know much about that if you don't feed >> it profiling information (it can guess a bit, sure, but it >> can guess wrong too). > > actually, what you are saying is that the compiler can't know enough > to figure out how to optimize for speed. it will just do what you tell > it to, either unroll loops or not. It bases its optimisation decisions on the options you give it, the profile feedback information you either or not gave it, and a whole bunch of heuristics. > this argues that both O2 and Os are incorrect for a project to use and > instead the project needs to make it's own decisions on this. For optimal performance, you need to fine-tune options yes, per file (or per function even!) > if this is the true feeling of the gcc team I'm very disappointed, it > feels like a huge step backwards. I speak only for myself. However this is the only way it _can_ be, the compiler isn't clairvoyant. Some of the heuristics sure could use some tuning, but they stay heuristics. Segher ^ permalink raw reply [flat|nested] 35+ messages in thread
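As a rough sketch of the "per function even" tuning mentioned above: GCC releases newer than the ones discussed in this thread accept an `optimize` function attribute, alongside the `hot` and `cold` attributes. The macros and function names below are hypothetical, and the GCC documentation recommends the `optimize` attribute for debugging rather than production use, so treat this as an illustration of the idea rather than a recommendation.

```c
#include <assert.h>
#include <stddef.h>

/* Only GCC proper understands optimize(); fall back to nothing elsewhere. */
#if defined(__GNUC__) && !defined(__clang__)
#define DEMO_OPT_SIZE  __attribute__((optimize("Os"), cold))
#define DEMO_OPT_SPEED __attribute__((optimize("O2"), hot))
#else
#define DEMO_OPT_SIZE
#define DEMO_OPT_SPEED
#endif

/* Rarely executed setup code: ask for small code. */
DEMO_OPT_SIZE int demo_parse_mode(const char *s)
{
    assert(s != NULL);
    return s[0] == 'f';         /* 1 for "fast", 0 for anything else */
}

/* Inner-loop code: ask for speed. */
DEMO_OPT_SPEED unsigned long demo_checksum(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```

The per-file equivalent is to pass different optimisation flags to individual translation units from the build system.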
* Re: -Os versus -O2 2007-06-25 0:58 ` Arjan van de Ven 2007-06-25 1:08 ` david @ 2007-06-25 1:33 ` Adrian Bunk 1 sibling, 0 replies; 35+ messages in thread From: Adrian Bunk @ 2007-06-25 1:33 UTC (permalink / raw) To: Arjan van de Ven; +Cc: Benjamin LaHaise, Oleg Verych, rae l, linux-kernel On Sun, Jun 24, 2007 at 05:58:46PM -0700, Arjan van de Ven wrote: > > > I wouldn't care if CONFIG_CC_OPTIMIZE_FOR_SIZE was hidden behind > > CONFIG_EMBEDDED, but as long as it's available as a general purpose > > option we have to consider it's performance. > > I think you are missing the point. You tell the kernel to > OPTIMIZE_FOR_SIZE. *over performance*. Sure. Performance shouldn't be > EXTREMELY pathetic, but it's not; and if it were, it's a problem with > the gcc version you have (and if you are a distro, you can surely fix > that) My point is commit c45b4f1f1e149c023762ac4be166ead1818cefef CC_OPTIMIZE_FOR_SIZE is currently known as an experimental feature to improve the _performance_. > > The interesting questions are: > > Does -Os still sometimes generate faster code with gcc 4.2? > > If yes, why? > > on a system level, size can help performance because you have more > memory available for other things. For a given gcc version, there's a finite number of differences between -Os and -O2. The interesting question is for which differences with gcc 4.2 we want the -Os version in the kernel for best performance. This should then be controllable through gcc options. > It also reduces download size and > gives you more space on the live CD.... That's a different point. If you don't care about performance but care about size then -Os is the best choice. > if you want to make things bigger again, please do this OUTSIDE the > "optimize for size" option. Because that TELLS you to go for size. Agreed, but CONFIG_CC_OPTIMIZE_FOR_SIZE should again be under CONFIG_EMBEDDED. cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. 
There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: -Os versus -O2 2007-06-25 0:41 ` -Os versus -O2 Adrian Bunk 2007-06-25 0:58 ` Arjan van de Ven @ 2007-06-25 1:23 ` Rene Herman 2007-06-25 1:31 ` Rene Herman 2007-06-25 1:34 ` Jeff Garzik 2 siblings, 1 reply; 35+ messages in thread From: Rene Herman @ 2007-06-25 1:23 UTC (permalink / raw) To: Adrian Bunk Cc: Arjan van de Ven, Benjamin LaHaise, Oleg Verych, rae l, linux-kernel On 06/25/2007 02:41 AM, Adrian Bunk wrote: > The interesting questions are: > Does -Os still sometimes generate faster code with gcc 4.2? > If yes, why? I would wager that the CPU type makes more of a difference than the compiler version. That is, I'd expect my Duron with it's "puny" 64K L1 to have a very different profile than it's Athlon brother with 256K L1. Not to mention CPUs with as little as 8K L1 (P1). I can't quote numbers -- it's a bit hard to test those things anyway as it's a system-global effect and not su much that's easily isolated in a dedicated benchmark. Rene. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: -Os versus -O2 2007-06-25 1:23 ` Rene Herman @ 2007-06-25 1:31 ` Rene Herman 0 siblings, 0 replies; 35+ messages in thread From: Rene Herman @ 2007-06-25 1:31 UTC (permalink / raw) To: Adrian Bunk Cc: Arjan van de Ven, Benjamin LaHaise, Oleg Verych, rae l, linux-kernel On 06/25/2007 03:23 AM, Rene Herman wrote: > On 06/25/2007 02:41 AM, Adrian Bunk wrote: > >> The interesting questions are: >> Does -Os still sometimes generate faster code with gcc 4.2? >> If yes, why? > > I would wager that the CPU type makes more of a difference than the > compiler version. That is, I'd expect my Duron with it's "puny" 64K L1 > to have a very different profile than it's Athlon brother with 256K L1. Sorry, that should've been L2. And "its" ... > Not to mention CPUs with as little as 8K L1 (P1). > > I can't quote numbers -- it's a bit hard to test those things anyway as > it's a system-global effect and not su much that's easily isolated in a > dedicated benchmark. And while I'm at it, "and not so much one that's [ ...]". Rene. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: -Os versus -O2 2007-06-25 0:41 ` -Os versus -O2 Adrian Bunk 2007-06-25 0:58 ` Arjan van de Ven 2007-06-25 1:23 ` Rene Herman @ 2007-06-25 1:34 ` Jeff Garzik 2007-06-25 1:46 ` Adrian Bunk 2 siblings, 1 reply; 35+ messages in thread From: Jeff Garzik @ 2007-06-25 1:34 UTC (permalink / raw) To: Adrian Bunk Cc: Arjan van de Ven, Benjamin LaHaise, Oleg Verych, rae l, linux-kernel Adrian Bunk wrote: > The interesting questions are: > Does -Os still sometimes generate faster code with gcc 4.2? > If yes, why? Smaller code can mean fewer page faults, fewer cache invalidations, etc. It's not just a matter of compiler code generation, gotta look at the whole picture. Jeff ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: -Os versus -O2 2007-06-25 1:34 ` Jeff Garzik @ 2007-06-25 1:46 ` Adrian Bunk 2007-06-25 2:19 ` david 0 siblings, 1 reply; 35+ messages in thread From: Adrian Bunk @ 2007-06-25 1:46 UTC (permalink / raw) To: Jeff Garzik Cc: Arjan van de Ven, Benjamin LaHaise, Oleg Verych, rae l, linux-kernel On Sun, Jun 24, 2007 at 09:34:05PM -0400, Jeff Garzik wrote: > Adrian Bunk wrote: >> The interesting questions are: >> Does -Os still sometimes generate faster code with gcc 4.2? >> If yes, why? > > Smaller code can mean fewer page faults, fewer cache invalidations, etc. > > It's not just a matter of compiler code generation, gotta look at the whole > picture. Sure, but my point is that if the kernel is considered special and the best optimization for the kernel is therefore between -Os and -O2, we should try to find this point of best optimization. This should address Arjans point that -Os might not be best choice for best performance (and it's actually our fault if gcc generates stupid but small code when we use -Os). > Jeff cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: -Os versus -O2 2007-06-25 1:46 ` Adrian Bunk @ 2007-06-25 2:19 ` david 0 siblings, 0 replies; 35+ messages in thread From: david @ 2007-06-25 2:19 UTC (permalink / raw) To: Adrian Bunk Cc: Jeff Garzik, Arjan van de Ven, Benjamin LaHaise, Oleg Verych, rae l, linux-kernel On Mon, 25 Jun 2007, Adrian Bunk wrote: > On Sun, Jun 24, 2007 at 09:34:05PM -0400, Jeff Garzik wrote: >> Adrian Bunk wrote: >>> The interesting questions are: >>> Does -Os still sometimes generate faster code with gcc 4.2? >>> If yes, why? >> >> Smaller code can mean fewer page faults, fewer cache invalidations, etc. >> >> It's not just a matter of compiler code generation, gotta look at the whole >> picture. the picture gets even murkier when you consider that, even if neither option overflows the cpu cache, the one that takes more space in the cache leaves less space in the cache for the userspace code that the system is actually there to run. > Sure, but my point is that if the kernel is considered special and the > best optimization for the kernel is therefore between -Os and -O2, we > should try to find this point of best optimization. > > This should address Arjans point that -Os might not be best choice for > best performance (and it's actually our fault if gcc generates stupid > but small code when we use -Os). what can be done to find the horribly bad but small code among the "it's smaller and would be less efficient if you didn't consider the cache" majority? David Lang ^ permalink raw reply [flat|nested] 35+ messages in thread
* memset() with zeroes (Re: [PATCH] trivial: the memset operation on a automatic array variable should be optimized out by data initialization) 2007-06-24 22:15 ` Arjan van de Ven 2007-06-24 23:23 ` Benjamin LaHaise @ 2007-06-24 23:33 ` Oleg Verych 1 sibling, 0 replies; 35+ messages in thread From: Oleg Verych @ 2007-06-24 23:33 UTC (permalink / raw) To: Arjan van de Ven; +Cc: rae l, trivial, linux-kernel On Sun, Jun 24, 2007 at 03:15:17PM -0700, Arjan van de Ven wrote: > > > > I think all these benefits are the gcc's __builtin_memset optimization > > > than the explicit call to memset. > > > > ... or from complex memset() implementation (some chips even didn't do > > `rep' fast enough somehow). Maybe code like below will be acceptable for > > both optimizers and maintainers? > > > we should just alias our memset to the __builtin one, and then provide a > generic one from lib/ for the cases gcc needs to do a fallback. In x86_64 there's infrastructure to check and select the right memset(). Therefore it's needed, I think. But if one takes a look at the usage, a zero-memset() optimization becomes obvious: one argument fewer means one register is free from clobbering. |-*- flower-:22-rc4-mm2/arch/x86_64$ grep memset -R . | grep "[ 0,]0," | wc -l 42 flower-:22-rc4-mm2/arch/x86_64$ flower-:22-rc4-mm2/arch/x86_64$ cd .. flower-:22-rc4-mm2/arch$ grep memset -R . | grep "[ 0,]0," | wc -l 735 flower-:22-rc4-mm2/arch$ flower-:22-rc4-mm2$ grep memset -R . | grep "[ 0,]0," | wc -l 6679 flower-:22-rc4-mm2$ |-*- ____ ^ permalink raw reply [flat|nested] 35+ messages in thread
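For reference, the equivalence this thread started from (an explicit memset() of an automatic array versus a zero initializer) can be sketched in standalone C. The enum and function names are hypothetical stand-ins for the kernel's zone constants, and the sketch uses `= {0}` because the empty `= {}` form from the patch is a GNU C extension in the C dialects of that time.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, MAX_NR_ZONES. */
enum { DEMO_ZONE_DMA, DEMO_ZONE_DMA32, DEMO_ZONE_NORMAL, DEMO_MAX_NR_ZONES };

/* Returns 1 if a zero initializer and an explicit memset() leave the
 * array with identical contents. */
int zero_init_matches_memset(void)
{
    unsigned long by_init[DEMO_MAX_NR_ZONES] = {0};  /* remaining elements are zeroed */
    unsigned long by_memset[DEMO_MAX_NR_ZONES];

    memset(by_memset, 0, sizeof(by_memset));
    return memcmp(by_init, by_memset, sizeof(by_init)) == 0;
}

/* With a designated initializer, every element not named is still
 * initialized to zero, so the later per-slot assignments in the patch
 * could also be folded into the declaration. */
unsigned long designated_init_unset_slot(void)
{
    unsigned long pfns[DEMO_MAX_NR_ZONES] = { [DEMO_ZONE_DMA] = 4096UL };

    assert(pfns[DEMO_ZONE_DMA] == 4096UL);
    return pfns[DEMO_ZONE_NORMAL];   /* never written explicitly */
}
```

Whether the compiler emits the same instructions for both forms is a separate question, which is what the request for objdump output earlier in the thread was about.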