* PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
From: Jesper Juhl @ 2011-03-21 20:08 UTC
To: linux-kernel
Cc: Andrew Morton, Paul E. McKenney, Ingo Molnar, Daniel Lezcano,
Eric Paris, Roman Zippel, linux-kbuild
I believe that the majority of systems we are built on want a -O2 compiled
kernel. Optimizing for size (-Os) is mainly beneficial for embedded
systems and systems with very small CPU caches (correct me if I'm wrong).
So it seems wrong to me that CC_OPTIMIZE_FOR_SIZE defaults to 'y' and
recommends saying 'Y' if unsure. I believe it should default to 'n' and
recommend that if unsure. People who benefit from -Os know who they are
and can enable the option if needed/wanted - the majority shouldn't
select this. Right?
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
---
Kconfig | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig
index 56240e7..0d63dfa 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -903,12 +903,12 @@ endif
 
 config CC_OPTIMIZE_FOR_SIZE
 	bool "Optimize for size"
-	default y
+	default n
 	help
 	  Enabling this option will pass "-Os" instead of "-O2" to gcc
 	  resulting in a smaller kernel.
 
-	  If unsure, say Y.
+	  If unsure, say N.
 
 config SYSCTL
 	bool
--
Jesper Juhl <jj@chaosbits.net> http://www.chaosbits.net/
Don't top-post http://www.catb.org/jargon/html/T/top-post.html
Plain text mails only, please.
* Re: PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
From: Steven Rostedt @ 2011-03-22 2:52 UTC
To: Jesper Juhl
Cc: linux-kernel, Andrew Morton, Paul E. McKenney, Ingo Molnar,
Daniel Lezcano, Eric Paris, Roman Zippel, linux-kbuild
On Mon, Mar 21, 2011 at 09:08:24PM +0100, Jesper Juhl wrote:
> I believe that the majority of systems we are built on want a -O2 compiled
> kernel. Optimizing for size (-Os) is mainly beneficial for embedded
> systems and systems with very small CPU caches (correct me if I'm wrong).
> So it seems wrong to me that CC_OPTIMIZE_FOR_SIZE defaults to 'y' and
> recommends saying 'Y' if unsure. I believe it should default to 'n' and
> recommend that if unsure. People who benefit from -Os know who they are
> and can enable the option if needed/wanted - the majority shouldn't
> select this. Right?
>
> Signed-off-by: Jesper Juhl <jj@chaosbits.net>
I've actually seen nothing but problems with -Os.
Acked-by: Steven Rostedt <rostedt@goodmis.org>
-- Steve
* Re: PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
From: Pekka Enberg @ 2011-03-22 8:21 UTC
To: Jesper Juhl
Cc: linux-kernel, Andrew Morton, Paul E. McKenney, Ingo Molnar,
Daniel Lezcano, Eric Paris, Roman Zippel, linux-kbuild,
Linus Torvalds, Steven Rostedt
Hi Jesper,
On Mon, Mar 21, 2011 at 10:08 PM, Jesper Juhl <jj@chaosbits.net> wrote:
> I believe that the majority of systems we are built on want a -O2 compiled
> kernel. Optimizing for size (-Os) is mainly beneficial for embedded
> systems and systems with very small CPU caches (correct me if I'm wrong).
Please take a look at commit 0910b44 ("Expose "Optimize for size"
option for everybody") for the reasoning behind defaulting to -Os.
Pekka
* Re: PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
From: Jesper Juhl @ 2011-03-22 8:25 UTC
To: Pekka Enberg
Cc: linux-kernel, Andrew Morton, Paul E. McKenney, Ingo Molnar,
Daniel Lezcano, Eric Paris, Roman Zippel, linux-kbuild,
Linus Torvalds, Steven Rostedt
On Tue, 22 Mar 2011, Pekka Enberg wrote:
> Hi Jesper,
>
> On Mon, Mar 21, 2011 at 10:08 PM, Jesper Juhl <jj@chaosbits.net> wrote:
> > I believe that the majority of systems we are built on want a -O2 compiled
> > kernel. Optimizing for size (-Os) is mainly beneficial for embedded
> > systems and systems with very small CPU caches (correct me if I'm wrong).
>
> Please take a look at commit 0910b44 ("Expose "Optimize for size"
> option for everybody") for the reasoning behind defaulting to -Os.
>
Thank you for that pointer Pekka. I guess I need to update my view on -Os.
* Re: PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
From: Ingo Molnar @ 2011-03-22 10:27 UTC
To: Pekka Enberg
Cc: Jesper Juhl, linux-kernel, Andrew Morton, Paul E. McKenney,
Daniel Lezcano, Eric Paris, Roman Zippel, linux-kbuild,
Linus Torvalds, Steven Rostedt
* Pekka Enberg <penberg@kernel.org> wrote:
> Hi Jesper,
>
> On Mon, Mar 21, 2011 at 10:08 PM, Jesper Juhl <jj@chaosbits.net> wrote:
> > I believe that the majority of systems we are built on want a -O2 compiled
> > kernel. Optimizing for size (-Os) is mainly beneficial for embedded
> > systems and systems with very small CPU caches (correct me if I'm wrong).
>
> Please take a look at commit 0910b44 ("Expose "Optimize for size"
> option for everybody") for the reasoning behind defaulting to -Os.
If that situation has changed - if GCC has regressed in this area then a commit
changing the default IMHO gains a lot of credibility if it is backed by careful
measurements using perf stat --repeat or similar tools.
See the hard numbers in this upstream commit for example:
ea7145477a46: x86: Separate out entry text section
there we were able to prove the positive effects of a pretty subtle change to
the layout of the instruction cache, with a measurement noise in the 0.1%
range.
Thanks,
Ingo
* Re: PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
From: Linus Torvalds @ 2011-03-22 16:59 UTC
To: Ingo Molnar
Cc: Pekka Enberg, Jesper Juhl, linux-kernel, Andrew Morton,
Paul E. McKenney, Daniel Lezcano, Eric Paris, Roman Zippel,
linux-kbuild, Steven Rostedt
On Tue, Mar 22, 2011 at 3:27 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> If that situation has changed - if GCC has regressed in this area then a commit
> changing the default IMHO gains a lot of credibility if it is backed by careful
> measurements using perf stat --repeat or similar tools.
Also, please don't back up any numbers for the "-O2 is faster than
-Os" case with some benchmark that is hot in the caches.
The thing is, many optimizations that make the code larger look really
good if there are no cache misses, and the code is run a million times
in a tight loop.
But kernel code in particular tends to not be like that. Yes, there
are cases where we spend 75% of the time in the kernel (my own
personal favorite is "git diff") basically having user space loop
around just one single operation. But it is _really_ quite rare in
real life. Most of the time, user space will blow the kernel caches
out of the water, and the kernel loops will be on the order of a few
entries (eg a "loop" may be the loop around a pathname lookup, and
loops over three path components). Not millions.
The rule-of-thumb should be simple: 10% larger code likely means 10%
more I$ misses. Does the larger -O2 code make up for it?
Now, the downside of -Os has always been that it's not all that widely
used, so we've hit compiler bugs several times. That's been almost
enough to make me think that it's not worth it. But currently I don't
think we have any known issues, and probably exactly _because_ we use
-Os it seems that gcc hasn't that many regressions. It was much more
painful when we started trying to use -Os.
(That said, gcc -Os isn't all that wonderful. It tends to sometimes
generate really crappy code just because it's smaller, ie using a
multiply instruction in a critical code window just because doing a
few shifts and adds is larger. And that can be _so_ much slower that
it really hurts. So we might be better off with a model where we can
say "this code is important and really core kernel code that everybody
uses, do -O2 for this", and just compile _most_ of the kernel with
-Os)
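The multiply-versus-shifts trade-off mentioned above can be illustrated with a small sketch. The constant 10 and the function names are hypothetical, chosen only for illustration; the thread does not name a specific case. Python can only show that the two forms compute the same value; the size and latency difference is a property of the generated machine code, not of this script.

```python
def times10_mul(x: int) -> int:
    # Single multiply: the compact form a size-optimizing compiler may pick.
    return x * 10


def times10_shift_add(x: int) -> int:
    # Two shifts and one add: x*8 + x*2. More instructions, so larger code,
    # but each operation is cheap on most CPUs.
    return (x << 3) + (x << 1)


# Both forms agree for every input in this (arbitrary) test range.
mismatches = [x for x in range(-1000, 1000)
              if times10_mul(x) != times10_shift_add(x)]
print("mismatches:", len(mismatches))
```

Which form a given GCC version emits under -Os or -O2 depends on the target and the constant, so this is only a picture of the kind of choice being described.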
Linus
* Re: PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
From: Andi Kleen @ 2011-03-23 17:45 UTC
To: Linus Torvalds
Cc: Ingo Molnar, Pekka Enberg, Jesper Juhl, linux-kernel,
Andrew Morton, Paul E. McKenney, Daniel Lezcano, Eric Paris,
Roman Zippel, linux-kbuild, Steven Rostedt
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Tue, Mar 22, 2011 at 3:27 AM, Ingo Molnar <mingo@elte.hu> wrote:
>>
>> If that situation has changed - if GCC has regressed in this area then a commit
>> changing the default IMHO gains a lot of credibility if it is backed by careful
>> measurements using perf stat --repeat or similar tools.
>
> Also, please don't back up any numbers for the "-O2 is faster than
> -Os" case with some benchmark that is hot in the caches.
Haven't done it recently, but some time last year -O2 vs -Os
made a measurable difference in large OLTP benchmarks.
(-Os being worse of course -- friends don't let friends use -Os)
-Andi
--
ak@linux.intel.com -- Speaking for myself only
* Re: PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
From: Ingo Molnar @ 2011-03-23 21:14 UTC
To: Linus Torvalds
Cc: Pekka Enberg, Jesper Juhl, linux-kernel, Andrew Morton,
Paul E. McKenney, Daniel Lezcano, Eric Paris, Roman Zippel,
linux-kbuild, Steven Rostedt
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Tue, Mar 22, 2011 at 3:27 AM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > If that situation has changed - if GCC has regressed in this area then a commit
> > changing the default IMHO gains a lot of credibility if it is backed by careful
> > measurements using perf stat --repeat or similar tools.
>
> Also, please don't back up any numbers for the "-O2 is faster than
> -Os" case with some benchmark that is hot in the caches.
>
> The thing is, many optimizations that make the code larger look really
> good if there are no cache misses, and the code is run a million times
> in a tight loop.
>
> But kernel code in particular tends to not be like that. [...]
To throw some numbers into the discussion, here's the size versus speed
comparison for 'hackbench 15' - which is more on the microbenchmark side of the
equation - but has macrobenchmark properties as well, because it runs 3000
tasks and moves a lot of data, hence thrashes the caches constantly:
  CONFIG_CC_OPTIMIZE_FOR_SIZE=y
  ----------------------------------------
     6,757,858,145  cycles          # 2525.983 M/sec   ( +-  0.388% )
     2,949,907,036  instructions    #    0.437 IPC     ( +-  0.191% )
       595,955,367  branches        #  222.759 M/sec   ( +-  0.238% )
        31,504,981  branch-misses   #    5.286 %       ( +-  0.187% )

       0.164320722  seconds time elapsed               ( +-  0.524% )

  # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
  ----------------------------------------
     6,061,867,073  cycles          # 2510.283 M/sec   ( +-  0.494% )
     2,510,505,732  instructions    #    0.414 IPC     ( +-  0.243% )
       493,721,089  branches        #  204.455 M/sec   ( +-  0.302% )
        38,731,708  branch-misses   #    7.845 %       ( +-  0.206% )

       0.148203574  seconds time elapsed               ( +-  0.673% )
They were perf stat --repeat 100 runs - repeated a couple of times to make sure
it's all real. I have used GCC 4.6.0, a relatively recent compiler. (64-bit
x86, typical .config, etc.)
The text size differences:
      text      data       bss       dec   filename
  -------------------------------------------------------------------------
   8809558   1790428   2719744  13319730   vmlinux.optimize_for_size
  10268082   1825292   2727936  14821310   vmlinux.optimize_for_speed
So by enabling CONFIG_CC_OPTIMIZE_FOR_SIZE=y, we get this total effect:
  -16.5%  text size reduction
  +17.5%  instruction count increase
  +20.7%  branches executed increase
  -22.9%  branch-miss reduction
  +11.5%  cycle count increase
  +10.8%  total runtime increase
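For readers who want to re-derive these percentages, here is a minimal sketch that recomputes them from the counters quoted above. The dictionary keys and the helper name are illustrative, not perf output. The sketch uses the -O2 build as the baseline throughout; on that basis the text shrinks by about 14.2% and branch misses drop by about 18.7%, so the 16.5% and 22.9% figures appear to describe the -O2 build measured relative to -Os instead.

```python
# Counter values copied from the two perf stat runs and the size table above.
size_on = {  # CONFIG_CC_OPTIMIZE_FOR_SIZE=y (-Os)
    "cycles": 6_757_858_145,
    "instructions": 2_949_907_036,
    "branches": 595_955_367,
    "branch-misses": 31_504_981,
    "seconds": 0.164320722,
    "text": 8_809_558,
}
size_off = {  # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set (-O2)
    "cycles": 6_061_867_073,
    "instructions": 2_510_505_732,
    "branches": 493_721_089,
    "branch-misses": 38_731_708,
    "seconds": 0.148203574,
    "text": 10_268_082,
}


def pct_change(new, base):
    """Percentage change of `new` relative to `base`."""
    return (new / base - 1.0) * 100.0


# Effect of switching from -O2 to -Os, with -O2 as the baseline.
deltas = {key: pct_change(size_on[key], size_off[key]) for key in size_on}

for name, delta in deltas.items():
    print(f"{name:14s} {delta:+6.1f}%")
```

The cycle, instruction, branch and runtime increases come out the same either way because those were already computed against the -O2 numbers.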
A few observations:
- the branch-miss reduction suggests that almost none of the new branches
introduced by -Os generates a branch miss.
- the cycles count increase is in line with the total runtime increase.
- workloads where 16.5% more instruction cache footprint slows down the
workload by more than ~11% would win from enabling
CONFIG_CC_OPTIMIZE_FOR_SIZE=y.
Looking at these numbers I became more pessimistic about the usefulness of the
current implementation of CONFIG_CC_OPTIMIZE_FOR_SIZE=y - it would need some
*serious* icache thrashing to cause a larger than 11% slowdown, right?
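To put a rough number on "serious", here is a toy break-even model. It is an assumption layered on the figures above, not something measured in this thread: it takes the rule of thumb that icache-miss stalls scale in proportion to text size, normalizes the cache-hot -O2 runtime to 1, and asks how much stall time the -Os build must already be suffering before the larger -O2 build falls behind. The function and parameter names are hypothetical.

```python
def break_even_stall(hot_slowdown, footprint_growth):
    """Icache stall time (as a fraction of the cache-hot -O2 runtime)
    above which the -Os build comes out ahead.

    Toy model (an assumption, not a measurement):
        t_Os = (1 + hot_slowdown) + stall
        t_O2 = 1 + (1 + footprint_growth) * stall
    Setting t_Os < t_O2 gives stall > hot_slowdown / footprint_growth.
    """
    return hot_slowdown / footprint_growth


# 10.8% slower when cache-hot, 16.5% larger footprint for -O2 (figures above).
stall = break_even_stall(hot_slowdown=0.108, footprint_growth=0.165)
print(f"-Os wins once icache stalls exceed {stall:.0%} of the hot -O2 runtime")
```

Under these assumptions the -Os kernel would need to spend roughly two thirds of the cache-hot -O2 runtime again in instruction-cache stalls before it wins, which is consistent with the pessimism above. A real workload may not follow the proportionality assumption at all.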
I'm not sure what the best way would be to measure realistic macro workloads
where the kernel's instructions generate a lot of instruction-cache misses.
Most of the 'real' workloads tend to be hard to measure precisely, tend to be
very noisy and take a long time to run.
I could perhaps try to simulate them: I could patch a debug-only 'icache
flusher' function into every system call, and compare the perf stat results -
would that be an acceptable simulation of cache-cold kernel execution?
The 'icache flusher' would be something simple, like 10,000x 5-byte NOP
instructions in a row, or so. This would slow things down immensely, but this
particular slowdown is the same for both OPTIMIZE_FOR_SIZE=y and
OPTIMIZE_FOR_SIZE=n.
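A quick sizing sketch for such a flusher, under two assumptions that are not stated in the thread: the 5-byte NOP is the common x86 encoding 0F 1F 44 00 00, and the L1 instruction cache is 32 KiB. The constant names are hypothetical.

```python
# Assumed encoding of one 5-byte x86 NOP: nopl 0x0(%rax,%rax,1)
NOP5 = bytes([0x0F, 0x1F, 0x44, 0x00, 0x00])
NOP_COUNT = 10_000

# Assumption: a 32 KiB L1 instruction cache; check the actual target CPU.
ASSUMED_L1I_BYTES = 32 * 1024

sled = NOP5 * NOP_COUNT   # the straight-line run of NOPs
sled_bytes = len(sled)

print(f"sled size: {sled_bytes} bytes "
      f"({sled_bytes / ASSUMED_L1I_BYTES:.2f}x the assumed L1i)")
```

With these numbers the sled is about 1.5 times the assumed L1i, so executing it once should displace most previously cached kernel text from L1i, though lower cache levels would likely still hold it.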
Any better ideas?
Ingo