linux-kernel.vger.kernel.org archive mirror
* [PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
@ 2011-03-21 20:08 Jesper Juhl
  2011-03-22  2:52 ` Steven Rostedt
  2011-03-22  8:21 ` Pekka Enberg
  0 siblings, 2 replies; 8+ messages in thread
From: Jesper Juhl @ 2011-03-21 20:08 UTC
  To: linux-kernel
  Cc: Andrew Morton, Paul E. McKenney, Ingo Molnar, Daniel Lezcano,
	Eric Paris, Roman Zippel, linux-kbuild

I believe that the majority of systems the kernel is built for want a -O2
compiled kernel. Optimizing for size (-Os) is mainly beneficial for embedded
systems and systems with very small CPU caches (correct me if I'm wrong).
So it seems wrong to me that CC_OPTIMIZE_FOR_SIZE defaults to 'y' and
recommends saying 'Y' if unsure. I believe it should default to 'n' and
recommend that if unsure. People who benefit from -Os know who they are
and can enable the option if needed/wanted - the majority shouldn't
select this. Right?

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
---
 Kconfig |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 56240e7..0d63dfa 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -903,12 +903,12 @@ endif
 
 config CC_OPTIMIZE_FOR_SIZE
 	bool "Optimize for size"
-	default y
+	default n
 	help
 	  Enabling this option will pass "-Os" instead of "-O2" to gcc
 	  resulting in a smaller kernel.
 
-	  If unsure, say Y.
+	  If unsure, say N.
 
 config SYSCTL
 	bool


-- 
Jesper Juhl <jj@chaosbits.net>       http://www.chaosbits.net/
Don't top-post http://www.catb.org/jargon/html/T/top-post.html
Plain text mails only, please.
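The help text in the patch says the option passes "-Os" instead of "-O2" to gcc. As a small illustration of what that switch changes, the sketch below lets a single C file report which mode it was compiled in. It relies on the __OPTIMIZE__ and __OPTIMIZE_SIZE__ macros that GCC and Clang predefine; the function name build_opt_mode is hypothetical and not part of the kernel.

```c
/* Report the optimization mode this translation unit was built with.
 * GCC and Clang predefine __OPTIMIZE__ for any -O level above -O0 and,
 * in addition, __OPTIMIZE_SIZE__ when optimizing for size (-Os). */
const char *build_opt_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
	return "-Os";
#elif defined(__OPTIMIZE__)
	return "-O1 or higher (e.g. -O2)";
#else
	return "-O0";
#endif
}
```

Building the same file once with -Os and once with -O2 and calling build_opt_mode() from each build shows which branch the preprocessor selected.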



* Re: [PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
  2011-03-21 20:08 [PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N Jesper Juhl
@ 2011-03-22  2:52 ` Steven Rostedt
  2011-03-22  8:21 ` Pekka Enberg
  1 sibling, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2011-03-22  2:52 UTC
  To: Jesper Juhl
  Cc: linux-kernel, Andrew Morton, Paul E. McKenney, Ingo Molnar,
	Daniel Lezcano, Eric Paris, Roman Zippel, linux-kbuild

On Mon, Mar 21, 2011 at 09:08:24PM +0100, Jesper Juhl wrote:
> I believe that the majority of systems the kernel is built for want a -O2
> compiled kernel. Optimizing for size (-Os) is mainly beneficial for embedded
> systems and systems with very small CPU caches (correct me if I'm wrong).
> So it seems wrong to me that CC_OPTIMIZE_FOR_SIZE defaults to 'y' and
> recommends saying 'Y' if unsure. I believe it should default to 'n' and
> recommend that if unsure. People who benefit from -Os know who they are
> and can enable the option if needed/wanted - the majority shouldn't
> select this. Right?
> 
> Signed-off-by: Jesper Juhl <jj@chaosbits.net>

I've actually seen nothing but problems with -Os.

Acked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve



* Re: [PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
  2011-03-21 20:08 [PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N Jesper Juhl
  2011-03-22  2:52 ` Steven Rostedt
@ 2011-03-22  8:21 ` Pekka Enberg
  2011-03-22  8:25   ` Jesper Juhl
  2011-03-22 10:27   ` Ingo Molnar
  1 sibling, 2 replies; 8+ messages in thread
From: Pekka Enberg @ 2011-03-22  8:21 UTC
  To: Jesper Juhl
  Cc: linux-kernel, Andrew Morton, Paul E. McKenney, Ingo Molnar,
	Daniel Lezcano, Eric Paris, Roman Zippel, linux-kbuild,
	Linus Torvalds, Steven Rostedt

Hi Jesper,

On Mon, Mar 21, 2011 at 10:08 PM, Jesper Juhl <jj@chaosbits.net> wrote:
> I believe that the majority of systems the kernel is built for want a -O2
> compiled kernel. Optimizing for size (-Os) is mainly beneficial for embedded
> systems and systems with very small CPU caches (correct me if I'm wrong).

Please take a look at commit 0910b44 ("Expose "Optimize for size"
option for everybody") for the reasoning behind defaulting to -Os.

                        Pekka


* Re: [PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
  2011-03-22  8:21 ` Pekka Enberg
@ 2011-03-22  8:25   ` Jesper Juhl
  2011-03-22 10:27   ` Ingo Molnar
  1 sibling, 0 replies; 8+ messages in thread
From: Jesper Juhl @ 2011-03-22  8:25 UTC
  To: Pekka Enberg
  Cc: linux-kernel, Andrew Morton, Paul E. McKenney, Ingo Molnar,
	Daniel Lezcano, Eric Paris, Roman Zippel, linux-kbuild,
	Linus Torvalds, Steven Rostedt

On Tue, 22 Mar 2011, Pekka Enberg wrote:

> Hi Jesper,
> 
> On Mon, Mar 21, 2011 at 10:08 PM, Jesper Juhl <jj@chaosbits.net> wrote:
> > I believe that the majority of systems the kernel is built for want a -O2
> > compiled kernel. Optimizing for size (-Os) is mainly beneficial for embedded
> > systems and systems with very small CPU caches (correct me if I'm wrong).
> 
> Please take a look at commit 0910b44 ("Expose "Optimize for size"
> option for everybody") for the reasoning behind defaulting to -Os.
> 
Thank you for that pointer, Pekka. I guess I need to update my view on -Os.

-- 
Jesper Juhl <jj@chaosbits.net>       http://www.chaosbits.net/
Don't top-post http://www.catb.org/jargon/html/T/top-post.html
Plain text mails only, please.



* Re: [PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
  2011-03-22  8:21 ` Pekka Enberg
  2011-03-22  8:25   ` Jesper Juhl
@ 2011-03-22 10:27   ` Ingo Molnar
  2011-03-22 16:59     ` Linus Torvalds
  1 sibling, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2011-03-22 10:27 UTC
  To: Pekka Enberg
  Cc: Jesper Juhl, linux-kernel, Andrew Morton, Paul E. McKenney,
	Daniel Lezcano, Eric Paris, Roman Zippel, linux-kbuild,
	Linus Torvalds, Steven Rostedt


* Pekka Enberg <penberg@kernel.org> wrote:

> Hi Jesper,
> 
> On Mon, Mar 21, 2011 at 10:08 PM, Jesper Juhl <jj@chaosbits.net> wrote:
> > I believe that the majority of systems the kernel is built for want a -O2
> > compiled kernel. Optimizing for size (-Os) is mainly beneficial for embedded
> > systems and systems with very small CPU caches (correct me if I'm wrong).
> 
> Please take a look at commit 0910b44 ("Expose "Optimize for size"
> option for everybody") for the reasoning behind defaulting to -Os.

If that situation has changed - if GCC has regressed in this area then a commit 
changing the default IMHO gains a lot of credibility if it is backed by careful 
measurements using perf stat --repeat or similar tools.

See the hard numbers in this upstream commit for example:

  ea7145477a46: x86: Separate out entry text section

There we were able to demonstrate the positive effect of a pretty subtle change
to the kernel's text layout in the instruction cache, with measurement noise in
the 0.1% range.
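Deciding whether a difference is "real" means comparing it against the run-to-run noise. perf stat --repeat prints a "( +- x% )" figure next to each mean; our understanding is that this is the standard error of the mean expressed as a percentage of the mean, but treat that as an assumption. The minimal sketch below (struct run_stats and all function names are hypothetical) accumulates one sample per run with Welford's running-variance algorithm and asks whether two builds' means differ by more than k combined standard errors. It compares squared quantities, so no sqrt() or -lm is needed; the choice of k (2 to 3 is a common rule of thumb) is also an assumption.

```c
#include <stddef.h>

/* Running mean and variance over repeated runs (Welford's algorithm). */
struct run_stats {
	size_t n;
	double mean;
	double m2;	/* sum of squared deviations from the running mean */
};

void run_stats_add(struct run_stats *s, double x)
{
	double delta = x - s->mean;

	s->n++;
	s->mean += delta / (double)s->n;
	s->m2 += delta * (x - s->mean);
}

/* Variance of the mean (the squared standard error); 0 for fewer than 2 runs. */
double run_stats_var_of_mean(const struct run_stats *s)
{
	if (s->n < 2)
		return 0.0;
	return s->m2 / (double)(s->n - 1) / (double)s->n;
}

/* Nonzero if the two means differ by more than k combined standard errors.
 * Both sides are squared, so no square root is taken. */
int differs_beyond_noise(const struct run_stats *a, const struct run_stats *b,
			 double k)
{
	double d = a->mean - b->mean;
	double noise = run_stats_var_of_mean(a) + run_stats_var_of_mean(b);

	return d * d > k * k * noise;
}
```

Usage: start two zero-initialized run_stats, feed one elapsed-time sample per run for each kernel build, then call differs_beyond_noise(&os_runs, &o2_runs, 3.0).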

Thanks,

	Ingo


* Re: [PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
  2011-03-22 10:27   ` Ingo Molnar
@ 2011-03-22 16:59     ` Linus Torvalds
  2011-03-23 17:45       ` Andi Kleen
  2011-03-23 21:14       ` Ingo Molnar
  0 siblings, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2011-03-22 16:59 UTC
  To: Ingo Molnar
  Cc: Pekka Enberg, Jesper Juhl, linux-kernel, Andrew Morton,
	Paul E. McKenney, Daniel Lezcano, Eric Paris, Roman Zippel,
	linux-kbuild, Steven Rostedt

On Tue, Mar 22, 2011 at 3:27 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> If that situation has changed - if GCC has regressed in this area then a commit
> changing the default IMHO gains a lot of credibility if it is backed by careful
> measurements using perf stat --repeat or similar tools.

Also, please don't back up any numbers for the "-O2 is faster than
-Os" case with some benchmark that is hot in the caches.

The thing is, many optimizations that make the code larger look really
good if there are no cache misses, and the code is run a million times
in a tight loop.

But kernel code in particular tends to not be like that. Yes, there
are cases where we spend 75% of the time in the kernel (my own
personal favorite is "git diff") basically having user space loop
around just one single operation. But it is _really_ quite rare in
real life. Most of the time, user space will blow the kernel caches
out of the water, and the kernel loops will be on the order of a few
entries (e.g. a "loop" may be the loop around a pathname lookup, which
iterates over three path components). Not millions.

The rule-of-thumb should be simple: 10% larger code likely means 10%
more I$ misses. Does the larger -O2 code make up for it?
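The rule of thumb can be turned into a back-of-the-envelope check. This is a toy model, not a measurement: it assumes I$ misses grow linearly with code size and that every miss costs a fixed number of cycles. struct icache_model, its fields and the function name are hypothetical, and every input would have to be estimated for a real workload.

```c
/* Toy model of "10% larger code likely means 10% more I$ misses".
 * All fields are hypothetical inputs, not measured kernel numbers. */
struct icache_model {
	double base_misses;	/* I$ misses per run with the smaller code */
	double miss_penalty;	/* cycles lost per I$ miss */
	double code_growth;	/* fractional size increase, e.g. 0.10 for 10% */
	double cycles_saved;	/* cycles the larger code saves with a hot cache */
};

/* Net cycles gained by the larger code per run, assuming extra misses
 * scale linearly with code growth. Negative means the larger code loses. */
double net_gain_of_larger_code(const struct icache_model *m)
{
	double extra_misses = m->base_misses * m->code_growth;

	return m->cycles_saved - extra_misses * m->miss_penalty;
}
```

For example, with 1,000 misses per run, a 100-cycle penalty and 10% growth, the larger code has to save more than 10,000 cycles per run to come out ahead.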

Now, the downside of -Os has always been that it's not all that widely
used, so we've hit compiler bugs several times. That's been almost
enough to make me think that it's not worth it. But currently I don't
think we have any known issues, and, probably exactly _because_ we use
-Os, gcc doesn't seem to have that many regressions. It was much more
painful when we started trying to use -Os.

(That said, gcc -Os isn't all that wonderful. It sometimes tends to
generate really crappy code just because it's smaller, i.e. using a
multiply instruction in a critical code window just because doing a
few shifts and adds is larger. And that can be _so_ much slower that
it really hurts. So we might be better off with a model where we can
say "this code is important and really core kernel code that everybody
uses, do -O2 for this", and just compile _most_ of the kernel with
-Os)
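The multiply-versus-shifts trade-off can be sketched with two equivalent spellings of x * 10. Whether GCC actually emits an imul under -Os, or an lea/shift sequence under -O2, depends on the compiler version and target, so the comments below are expectations rather than guarantees. HOT_O2 is a hypothetical marker for the "do -O2 for this" idea, built on GCC's optimize function attribute, which the GCC manual recommends for debugging rather than production use; kbuild's per-object CFLAGS_<file>.o variables would be the file-level alternative.

```c
#include <stdint.h>

/* Hypothetical HOT_O2 marker: ask GCC to optimize one function with -O2
 * even when the file is built with -Os. Clang does not implement the
 * "optimize" attribute, so the marker expands to nothing there. */
#if defined(__GNUC__) && !defined(__clang__)
#define HOT_O2 __attribute__((optimize("O2")))
#else
#define HOT_O2
#endif

/* Plain multiply by a constant. Built with -Os, the compiler may pick a
 * single, short imul; built with -O2 it may pick a longer but lower-latency
 * lea/add or shift/add sequence. */
uint32_t scale10(uint32_t x)
{
	return x * 10u;
}

/* The same value as explicit shifts and adds: x * 10 == (x << 3) + (x << 1),
 * which also holds modulo 2^32 for unsigned arithmetic. The compiler may
 * treat this exactly like scale10(), so the optimization level (here
 * requested per function via HOT_O2), not the source spelling, is what
 * mainly steers the instruction choice. */
HOT_O2 uint32_t scale10_hot(uint32_t x)
{
	return (x << 3) + (x << 1);
}
```

Comparing the disassembly of an -Os build and an -O2 build (for example with objdump -d) is the way to see what a given compiler actually chose.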

                             Linus


* Re: [PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
  2011-03-22 16:59     ` Linus Torvalds
@ 2011-03-23 17:45       ` Andi Kleen
  2011-03-23 21:14       ` Ingo Molnar
  1 sibling, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2011-03-23 17:45 UTC
  To: Linus Torvalds
  Cc: Ingo Molnar, Pekka Enberg, Jesper Juhl, linux-kernel,
	Andrew Morton, Paul E. McKenney, Daniel Lezcano, Eric Paris,
	Roman Zippel, linux-kbuild, Steven Rostedt

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, Mar 22, 2011 at 3:27 AM, Ingo Molnar <mingo@elte.hu> wrote:
>>
>> If that situation has changed - if GCC has regressed in this area then a commit
>> changing the default IMHO gains a lot of credibility if it is backed by careful
>> measurements using perf stat --repeat or similar tools.
>
> Also, please don't back up any numbers for the "-O2 is faster than
> -Os" case with some benchmark that is hot in the caches.

Haven't done it recently, but some time last year -O2 vs -Os
made a measurable difference in large OLTP benchmarks.

(-Os being worse of course -- friends don't let friends use -Os)

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only


* Re: [PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N
  2011-03-22 16:59     ` Linus Torvalds
  2011-03-23 17:45       ` Andi Kleen
@ 2011-03-23 21:14       ` Ingo Molnar
  1 sibling, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2011-03-23 21:14 UTC
  To: Linus Torvalds
  Cc: Pekka Enberg, Jesper Juhl, linux-kernel, Andrew Morton,
	Paul E. McKenney, Daniel Lezcano, Eric Paris, Roman Zippel,
	linux-kbuild, Steven Rostedt


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Mar 22, 2011 at 3:27 AM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > If that situation has changed - if GCC has regressed in this area then a commit
> > changing the default IMHO gains a lot of credibility if it is backed by careful
> > measurements using perf stat --repeat or similar tools.
> 
> Also, please don't back up any numbers for the "-O2 is faster than
> -Os" case with some benchmark that is hot in the caches.
> 
> The thing is, many optimizations that make the code larger look really
> good if there are no cache misses, and the code is run a million times
> in a tight loop.
> 
> But kernel code in particular tends to not be like that. [...]

To throw some numbers into the discussion, here's the size versus speed 
comparison for 'hackbench 15' - which is more on the microbenchmark side of the 
equation - but has macrobenchmark properties as well, because it runs 3000 
tasks and moves a lot of data, hence thrashes the caches constantly:

     CONFIG_CC_OPTIMIZE_FOR_SIZE=y
     ----------------------------------------
     6,757,858,145 cycles                   #   2525.983 M/sec   ( +-   0.388% )
     2,949,907,036 instructions             #      0.437 IPC     ( +-   0.191% )
       595,955,367 branches                 #    222.759 M/sec   ( +-   0.238% )
        31,504,981 branch-misses            #      5.286 %       ( +-   0.187% )

        0.164320722  seconds time elapsed   ( +-   0.524% )


     # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
     ----------------------------------------
     6,061,867,073 cycles                   #   2510.283 M/sec   ( +-   0.494% )
     2,510,505,732 instructions             #      0.414 IPC     ( +-   0.243% )
       493,721,089 branches                 #    204.455 M/sec   ( +-   0.302% )
        38,731,708 branch-misses            #      7.845 %       ( +-   0.206% )

        0.148203574  seconds time elapsed   ( +-   0.673% )

These were perf stat --repeat 100 runs, repeated a couple of times to make
sure the results are real. I used GCC 4.6.0, a relatively recent compiler
(64-bit x86, typical .config, etc.).

The text size differences:

      text	   data	    bss	    dec	         filename
  -------------------------------------------------------------------------
   8809558	1790428	2719744	13319730	 vmlinux.optimize_for_size
  10268082	1825292	2727936	14821310	 vmlinux.optimize_for_speed

So by enabling CONFIG_CC_OPTIMIZE_FOR_SIZE=y we get this total effect,
relative to the -O2 build:

  -14.2% text size reduction (i.e. the -O2 text is 16.6% larger)
  +17.5% instruction count increase
  +20.7% branches executed increase
  -18.7% branch-miss reduction
  +11.5% cycle count increase
  +10.9% total runtime increase
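Percentages like these depend on which build is taken as the baseline: measured against the -O2 build the -Os text is about 14.2% smaller, while measured against the -Os build the -O2 text is about 16.6% larger. The minimal sketch below recomputes a few of the ratios from the perf stat and size figures quoted in this message; pct_change and the constant names are ours, not part of perf or any other tool.

```c
/* Percent change of `value` relative to `baseline`. Swapping the arguments
 * gives a different magnitude, so every ratio has to state its baseline. */
double pct_change(double baseline, double value)
{
	return (value - baseline) / baseline * 100.0;
}

/* Figures copied from the perf stat and size output in this message. */
static const double text_os = 8809558.0, text_o2 = 10268082.0;
static const double insns_os = 2949907036.0, insns_o2 = 2510505732.0;
static const double bmiss_os = 31504981.0, bmiss_o2 = 38731708.0;
static const double secs_os = 0.164320722, secs_o2 = 0.148203574;
```

With these inputs, pct_change(text_o2, text_os) is about -14.2% and pct_change(text_os, text_o2) about +16.6%; for branch misses the two baselines give about -18.7% and +22.9%.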

A few observations:

 - the branch-miss reduction suggests that almost none of the new branches
   introduced by -Os generate a branch miss.

 - the cycle count increase is in line with the total runtime increase.

 - workloads where the 16.6% larger instruction cache footprint of the -O2
   kernel slows things down by more than ~11% would win from enabling
   CONFIG_CC_OPTIMIZE_FOR_SIZE=y.

Looking at these numbers I became more pessimistic about the usefulness of the
current implementation of CONFIG_CC_OPTIMIZE_FOR_SIZE=y - it would need some
*serious* icache thrashing to cause a larger than 11% slowdown, right?

I'm not sure what the best way would be to measure realistic macro workloads
where the kernel's instructions generate a lot of instruction-cache misses.
Most of the 'real' workloads tend to be hard to measure precisely, are very
noisy and take a long time to run.

I could perhaps try to simulate them: I could patch a debug-only 'icache
flusher' function into every system call and compare the perf stat results -
would that be an acceptable simulation of cache-cold kernel execution?

The 'icache flusher' would be something simple, like 10,000 5-byte NOP
instructions in a row or so. This would slow things down immensely, but this
particular slowdown is the same for both OPTIMIZE_FOR_SIZE=y and
OPTIMIZE_FOR_SIZE=n.
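A user-space sketch of such a flusher, assuming x86, GCC or Clang inline assembly, and the GNU assembler's .rept directive (on other architectures it compiles to an empty function). The bytes 0f 1f 44 00 00 encode a 5-byte x86 NOP (nopl 0x0(%rax,%rax,1)); 10,000 of them occupy 50,000 bytes, more than a typical 32 KiB L1 instruction cache, though that cache size is an assumption about the machine. The name icache_flusher is hypothetical, and wiring it into every system call is not attempted here.

```c
/* Hypothetical debug-only "icache flusher": a long straight run of 5-byte
 * NOPs whose only purpose is to occupy instruction-cache lines. */
#define ICACHE_FLUSHER_NOPS 10000
#define ICACHE_FLUSHER_BYTES (ICACHE_FLUSHER_NOPS * 5)

/* Turn the NOP count into a string literal for the .rept directive. */
#define FLUSHER_STR(x) #x
#define FLUSHER_XSTR(x) FLUSHER_STR(x)

void icache_flusher(void)
{
#if defined(__x86_64__) || defined(__i386__)
	/* Emit ICACHE_FLUSHER_NOPS copies of the 5-byte NOP inline, so
	 * executing them streams ICACHE_FLUSHER_BYTES of code through the
	 * instruction cache. */
	__asm__ __volatile__(".rept " FLUSHER_XSTR(ICACHE_FLUSHER_NOPS) "\n\t"
			     ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\t"
			     ".endr");
#endif
}
```

Because the same flusher runs in both builds, its own cost cancels out when comparing perf stat results for OPTIMIZE_FOR_SIZE=y and =n; only the difference in how each kernel copes with a cold cache should remain.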

Any better ideas?

	Ingo


Thread overview: 8+ messages
2011-03-21 20:08 [PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N Jesper Juhl
2011-03-22  2:52 ` Steven Rostedt
2011-03-22  8:21 ` Pekka Enberg
2011-03-22  8:25   ` Jesper Juhl
2011-03-22 10:27   ` Ingo Molnar
2011-03-22 16:59     ` Linus Torvalds
2011-03-23 17:45       ` Andi Kleen
2011-03-23 21:14       ` Ingo Molnar
