* PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
From: Jesper Juhl @ 2011-03-21 20:08 UTC
To: linux-kernel
Cc: Andrew Morton, Paul E. McKenney, Ingo Molnar, Daniel Lezcano,
    Eric Paris, Roman Zippel, linux-kbuild

I believe that the majority of systems we are built on want a -O2 compiled
kernel. Optimizing for size (-Os) is mainly beneficial for embedded
systems and systems with very small CPU caches (correct me if I'm wrong).
So it seems wrong to me that CC_OPTIMIZE_FOR_SIZE defaults to 'y' and
recommends saying 'Y' if unsure. I believe it should default to 'n' and
recommend that if unsure. People who benefit from -Os know who they are
and can enable the option if needed/wanted - the majority shouldn't
select this. Right?

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
---
 Kconfig |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 56240e7..0d63dfa 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -903,12 +903,12 @@ endif
 
 config CC_OPTIMIZE_FOR_SIZE
 	bool "Optimize for size"
-	default y
+	default n
 	help
 	  Enabling this option will pass "-Os" instead of "-O2" to gcc
 	  resulting in a smaller kernel.
 
-	  If unsure, say Y.
+	  If unsure, say N.
 
 config SYSCTL
 	bool

-- 
Jesper Juhl <jj@chaosbits.net>       http://www.chaosbits.net/
Don't top-post http://www.catb.org/jargon/html/T/top-post.html
Plain text mails only, please.

* Re: PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
From: Steven Rostedt @ 2011-03-22 2:52 UTC
To: Jesper Juhl
Cc: linux-kernel, Andrew Morton, Paul E. McKenney, Ingo Molnar,
    Daniel Lezcano, Eric Paris, Roman Zippel, linux-kbuild

On Mon, Mar 21, 2011 at 09:08:24PM +0100, Jesper Juhl wrote:
> I believe that the majority of systems we are built on want a -O2 compiled
> kernel. Optimizing for size (-Os) is mainly beneficial for embedded
> systems and systems with very small CPU caches (correct me if I'm wrong).
> So it seems wrong to me that CC_OPTIMIZE_FOR_SIZE defaults to 'y' and
> recommends saying 'Y' if unsure. I believe it should default to 'n' and
> recommend that if unsure. People who benefit from -Os know who they are
> and can enable the option if needed/wanted - the majority shouldn't
> select this. Right?
>
> Signed-off-by: Jesper Juhl <jj@chaosbits.net>

I've actually seen nothing but problems with -Os.

Acked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

* Re: PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
From: Pekka Enberg @ 2011-03-22 8:21 UTC
To: Jesper Juhl
Cc: linux-kernel, Andrew Morton, Paul E. McKenney, Ingo Molnar,
    Daniel Lezcano, Eric Paris, Roman Zippel, linux-kbuild,
    Linus Torvalds, Steven Rostedt

Hi Jesper,

On Mon, Mar 21, 2011 at 10:08 PM, Jesper Juhl <jj@chaosbits.net> wrote:
> I believe that the majority of systems we are built on want a -O2 compiled
> kernel. Optimizing for size (-Os) is mainly beneficial for embedded
> systems and systems with very small CPU caches (correct me if I'm wrong).

Please take a look at commit 0910b44 ("Expose "Optimize for size"
option for everybody") for the reasoning behind defaulting to -Os.

                        Pekka

* Re: PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
From: Jesper Juhl @ 2011-03-22 8:25 UTC
To: Pekka Enberg
Cc: linux-kernel, Andrew Morton, Paul E. McKenney, Ingo Molnar,
    Daniel Lezcano, Eric Paris, Roman Zippel, linux-kbuild,
    Linus Torvalds, Steven Rostedt

On Tue, 22 Mar 2011, Pekka Enberg wrote:

> Hi Jesper,
>
> On Mon, Mar 21, 2011 at 10:08 PM, Jesper Juhl <jj@chaosbits.net> wrote:
> > I believe that the majority of systems we are built on want a -O2 compiled
> > kernel. Optimizing for size (-Os) is mainly beneficial for embedded
> > systems and systems with very small CPU caches (correct me if I'm wrong).
>
> Please take a look at commit 0910b44 ("Expose "Optimize for size"
> option for everybody") for the reasoning behind defaulting to -Os.
>

Thank you for that pointer, Pekka. I guess I need to update my view on -Os.

-- 
Jesper Juhl <jj@chaosbits.net>       http://www.chaosbits.net/
Don't top-post http://www.catb.org/jargon/html/T/top-post.html
Plain text mails only, please.

* Re: PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
From: Ingo Molnar @ 2011-03-22 10:27 UTC
To: Pekka Enberg
Cc: Jesper Juhl, linux-kernel, Andrew Morton, Paul E. McKenney,
    Daniel Lezcano, Eric Paris, Roman Zippel, linux-kbuild,
    Linus Torvalds, Steven Rostedt

* Pekka Enberg <penberg@kernel.org> wrote:

> Hi Jesper,
>
> On Mon, Mar 21, 2011 at 10:08 PM, Jesper Juhl <jj@chaosbits.net> wrote:
> > I believe that the majority of systems we are built on want a -O2 compiled
> > kernel. Optimizing for size (-Os) is mainly beneficial for embedded
> > systems and systems with very small CPU caches (correct me if I'm wrong).
>
> Please take a look at commit 0910b44 ("Expose "Optimize for size"
> option for everybody") for the reasoning behind defaulting to -Os.

If that situation has changed - if GCC has regressed in this area then a commit
changing the default IMHO gains a lot of credibility if it is backed by careful
measurements using perf stat --repeat or similar tools.

See the hard numbers in this upstream commit for example:

  ea7145477a46: x86: Separate out entry text section

There we were able to prove the positive effects of a pretty subtle change to
the layout of the instruction cache, with a measurement noise in the 0.1% range.

Thanks,

	Ingo

* Re: PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
From: Linus Torvalds @ 2011-03-22 16:59 UTC
To: Ingo Molnar
Cc: Pekka Enberg, Jesper Juhl, linux-kernel, Andrew Morton,
    Paul E. McKenney, Daniel Lezcano, Eric Paris, Roman Zippel,
    linux-kbuild, Steven Rostedt

On Tue, Mar 22, 2011 at 3:27 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> If that situation has changed - if GCC has regressed in this area then a commit
> changing the default IMHO gains a lot of credibility if it is backed by careful
> measurements using perf stat --repeat or similar tools.

Also, please don't back up any numbers for the "-O2 is faster than
-Os" case with some benchmark that is hot in the caches.

The thing is, many optimizations that make the code larger look really
good if there are no cache misses, and the code is run a million times
in a tight loop.

But kernel code in particular tends to not be like that. Yes, there
are cases where we spend 75% of the time in the kernel (my own
personal favorite is "git diff") basically having user space loop
around just one single operation. But it is _really_ quite rare in
real life. Most of the time, user space will blow the kernel caches
out of the water, and the kernel loops will be on the order of a few
entries (eg a "loop" may be the loop around a pathname lookup, and
loops over three path components). Not millions.

The rule-of-thumb should be simple: 10% larger code likely means 10%
more I$ misses. Does the larger -O2 code make up for it?

Now, the downside of -Os has always been that it's not all that widely
used, so we've hit compiler bugs several times. That's been almost
enough to make me think that it's not worth it.

But currently I don't think we have any known issues, and probably
exactly _because_ we use -Os it seems that gcc hasn't that many
regressions. It was much more painful when we started trying to use
-Os.

(That said, gcc -Os isn't all that wonderful. It tends to sometimes
generate really crappy code just because it's smaller, ie using a
multiply instruction in a critical code window just because doing a
few shifts and adds is larger. And that can be _so_ much slower that
it really hurts. So we might be better off with a model where we can
say "this code is important and really core kernel code that everybody
uses, do -O2 for this", and just compile _most_ of the kernel with
-Os)

                    Linus

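The rule of thumb in the message above ("10% larger code likely means 10%
more I$ misses") can be turned into a rough break-even estimate. The sketch
below is not from the thread: it assumes a toy model in which runtime is
compute time plus instruction-cache stall time, stall time scales linearly
with the number of misses, and -O2 shaves a fixed fraction off the compute
time. The function names and the 10% compute gain are placeholders chosen
for illustration, not measurements.

```python
def breakeven_miss_share(compute_gain, extra_misses):
    """Share of -Os runtime spent in I-cache stalls at which -O2 and -Os tie.

    Toy model (an assumption, not something stated in the thread):
        t_Os = C + M
        t_O2 = C * (1 - compute_gain) + M * (1 + extra_misses)
    Setting t_Os == t_O2 and solving for M / (C + M) gives the result.
    """
    return compute_gain / (compute_gain + extra_misses)


def o2_relative_runtime(miss_share, compute_gain, extra_misses):
    """-O2 runtime relative to an -Os runtime of 1.0 under the same model."""
    compute = (1.0 - miss_share) * (1.0 - compute_gain)
    stalls = miss_share * (1.0 + extra_misses)
    return compute + stalls


if __name__ == "__main__":
    gain, extra = 0.10, 0.10  # hypothetical: 10% faster compute, 10% more misses
    print("break-even stall share:", breakeven_miss_share(gain, extra))
    for share in (0.2, 0.5, 0.8):
        rel = o2_relative_runtime(share, gain, extra)
        print(f"stall share {share:.1f}: -O2 runtime = {rel:.2f} x -Os runtime")
```

Under these assumptions -O2 only loses once I-cache stalls already account
for more than half of the -Os runtime, which is why the question of how
cache-cold the kernel really runs matters so much to the argument.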
* Re: PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
From: Andi Kleen @ 2011-03-23 17:45 UTC
To: Linus Torvalds
Cc: Ingo Molnar, Pekka Enberg, Jesper Juhl, linux-kernel,
    Andrew Morton, Paul E. McKenney, Daniel Lezcano, Eric Paris,
    Roman Zippel, linux-kbuild, Steven Rostedt

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, Mar 22, 2011 at 3:27 AM, Ingo Molnar <mingo@elte.hu> wrote:
>>
>> If that situation has changed - if GCC has regressed in this area then a commit
>> changing the default IMHO gains a lot of credibility if it is backed by careful
>> measurements using perf stat --repeat or similar tools.
>
> Also, please don't back up any numbers for the "-O2 is faster than
> -Os" case with some benchmark that is hot in the caches.

Haven't done it recently, but some time last year -O2 vs -Os
made a measurable difference in large OLTP benchmarks.
(-Os being worse of course -- friends don't let friends use -Os)

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

* Re: PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
From: Ingo Molnar @ 2011-03-23 21:14 UTC
To: Linus Torvalds
Cc: Pekka Enberg, Jesper Juhl, linux-kernel, Andrew Morton,
    Paul E. McKenney, Daniel Lezcano, Eric Paris, Roman Zippel,
    linux-kbuild, Steven Rostedt

* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Mar 22, 2011 at 3:27 AM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > If that situation has changed - if GCC has regressed in this area then a commit
> > changing the default IMHO gains a lot of credibility if it is backed by careful
> > measurements using perf stat --repeat or similar tools.
>
> Also, please don't back up any numbers for the "-O2 is faster than
> -Os" case with some benchmark that is hot in the caches.
>
> The thing is, many optimizations that make the code larger look really
> good if there are no cache misses, and the code is run a million times
> in a tight loop.
>
> But kernel code in particular tends to not be like that. [...]

To throw some numbers into the discussion, here's the size versus speed
comparison for 'hackbench 15' - which is more on the microbenchmark side of the
equation - but has macrobenchmark properties as well, because it runs 3000 tasks
and moves a lot of data, hence thrashes the caches constantly:

 CONFIG_CC_OPTIMIZE_FOR_SIZE=y
 ----------------------------------------
     6,757,858,145  cycles          # 2525.983 M/sec   ( +- 0.388% )
     2,949,907,036  instructions    #    0.437 IPC     ( +- 0.191% )
       595,955,367  branches        #  222.759 M/sec   ( +- 0.238% )
        31,504,981  branch-misses   #    5.286 %       ( +- 0.187% )

       0.164320722  seconds time elapsed   ( +- 0.524% )

 # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
 ----------------------------------------
     6,061,867,073  cycles          # 2510.283 M/sec   ( +- 0.494% )
     2,510,505,732  instructions    #    0.414 IPC     ( +- 0.243% )
       493,721,089  branches        #  204.455 M/sec   ( +- 0.302% )
        38,731,708  branch-misses   #    7.845 %       ( +- 0.206% )

       0.148203574  seconds time elapsed   ( +- 0.673% )

They were perf stat --repeat 100 runs - repeated a couple of times to make sure
it's all real. I have used GCC 4.6.0, a relatively recent compiler. (64-bit x86,
typical .config, etc.)

The text size differences:

     text     data      bss       dec  filename
 -------------------------------------------------------------------------
  8809558  1790428  2719744  13319730  vmlinux.optimize_for_size
 10268082  1825292  2727936  14821310  vmlinux.optimize_for_speed

So by enabling CONFIG_CC_OPTIMIZE_FOR_SIZE=y, we get this total effect:

   -16.5% text size reduction
   +17.5% instruction count increase
   +20.7% branches executed increase
   -22.9% branch-miss reduction
   +11.5% cycle count increase
   +10.8% total runtime increase

A few observations:

 - the branch-miss reduction suggests that almost none of the new branches
   introduced by -Os generates a branch miss.

 - the cycle count increase is in line with the total runtime increase.

 - workloads where 16.5% more instruction cache footprint slows down the
   workload by more than ~11% would win from enabling
   CONFIG_CC_OPTIMIZE_FOR_SIZE=y.

Looking at these numbers I became more pessimistic about the usefulness of the
current implementation of CONFIG_CC_OPTIMIZE_FOR_SIZE=y - it would need some
*serious* icache thrashing to cause a larger than 11% slowdown, right?

I'm not sure what the best way would be to measure realistic macro workloads
where the kernel's instructions generate a lot of instruction-cache misses.
Most of the 'real' workloads tend to be hard to measure precisely, tend to be
very noisy and take a long time to run.

I could perhaps try to simulate them: I could patch a debug-only 'icache
flusher' function into every system call, and compare the perf stat results -
would that be an acceptable simulation of cache-cold kernel execution?

The 'icache flusher' would be something simple, like 10,000x 5-byte NOP
instructions in a row, or so. This would slow things down immensely, but this
particular slowdown is the same for both OPTIMIZE_FOR_SIZE=y and
OPTIMIZE_FOR_SIZE=n.

Any better ideas?

	Ingo

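As a cross-check on the summary in the message above, the percentages can be
recomputed from the raw perf stat counters and the text sizes. The sketch
below is not part of the original mail; the dictionary and function names are
illustrative only. It prints each delta with both builds as the base, because
the two "reduction" rows (-16.5% text size, -22.9% branch misses) appear to
match the -O2 build measured relative to the -Os build, while the four
"increase" rows are measured relative to -O2.

```python
# Raw figures copied from the perf stat and size output in the message above.
SIZE = {  # CONFIG_CC_OPTIMIZE_FOR_SIZE=y (-Os)
    "text": 8809558,
    "instructions": 2949907036,
    "branches": 595955367,
    "branch-misses": 31504981,
    "cycles": 6757858145,
    "seconds": 0.164320722,
}
SPEED = {  # CONFIG_CC_OPTIMIZE_FOR_SIZE not set (-O2)
    "text": 10268082,
    "instructions": 2510505732,
    "branches": 493721089,
    "branch-misses": 38731708,
    "cycles": 6061867073,
    "seconds": 0.148203574,
}


def pct_change(new, base):
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100.0


if __name__ == "__main__":
    for key in SIZE:
        os_vs_o2 = pct_change(SIZE[key], SPEED[key])   # base: -O2 build
        o2_vs_os = pct_change(SPEED[key], SIZE[key])   # base: -Os build
        print(f"{key:14s} -Os vs -O2: {os_vs_o2:+6.1f}%   "
              f"-O2 vs -Os: {o2_vs_os:+6.1f}%")
```

With -O2 as the base throughout, the text size shrinks by roughly 14% and
branch misses drop by roughly 19%. The cycle, instruction, branch and runtime
increases agree with the summary to within rounding.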