* [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
@ 2020-05-07 22:45 Jason A. Donenfeld
  2020-05-08  8:35 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Jason A. Donenfeld @ 2020-05-07 22:45 UTC (permalink / raw)
  To: linux-kernel, x86; +Cc: Jason A. Donenfeld

GCC 10 appears to have changed -O2 in order to make compilation time
faster when using -flto, seemingly at the expense of performance, in
particular with regards to how the inliner works. Since -O3 these days
shouldn't have the same set of bugs as 10 years ago, this commit
defaults new kernel compiles to -O3 when using gcc >= 10.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 init/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 9e22ee8fbd75..fab3f810a68d 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1245,7 +1245,8 @@ config BOOT_CONFIG
 
 choice
 	prompt "Compiler optimization level"
-	default CC_OPTIMIZE_FOR_PERFORMANCE
+	default CC_OPTIMIZE_FOR_PERFORMANCE_O3 if GCC_VERSION >= 100000
+	default CC_OPTIMIZE_FOR_PERFORMANCE if (GCC_VERSION < 100000 || CC_IS_CLANG)
 
 config CC_OPTIMIZE_FOR_PERFORMANCE
 	bool "Optimize for performance (-O2)"
-- 
2.26.2
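The GCC_VERSION symbol in the hunk above encodes the compiler version as major * 10000 + minor * 100 + patchlevel, so 100000 means gcc 10.0.0. Kconfig uses the first `default` line whose condition holds. Below is a minimal C sketch of that selection. It assumes that GCC_VERSION is 0 when the compiler is not gcc, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the GCC_VERSION encoding, e.g. 10.1.0 -> 100100. */
static unsigned int gcc_version_code(unsigned int major, unsigned int minor,
				     unsigned int patch)
{
	assert(minor < 100 && patch < 100);	/* two digits each */
	return major * 10000 + minor * 100 + patch;
}

/*
 * Hypothetical mirror of the two "default" lines in the patch.
 * Assumption: clang builds see GCC_VERSION == 0, so they get -O2.
 */
static const char *default_opt_flag(bool cc_is_clang, unsigned int gcc_version)
{
	if (!cc_is_clang && gcc_version >= 100000)
		return "-O3";
	return "-O2";
}
```

For example, `default_opt_flag(false, gcc_version_code(10, 1, 0))` yields "-O3", and gcc 9.3.0 (90300) yields "-O2".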


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-07 22:45 [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10 Jason A. Donenfeld
@ 2020-05-08  8:35 ` Peter Zijlstra
  2020-05-08  9:02 ` Oleksandr Natalenko
  2020-05-13 11:27 ` [PATCH] " Artem S. Tashkinov
  2 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2020-05-08  8:35 UTC (permalink / raw)
  To: Jason A. Donenfeld, Jakub Jelinek; +Cc: linux-kernel, x86, hjl.tools

On Thu, May 07, 2020 at 04:45:30PM -0600, Jason A. Donenfeld wrote:
> GCC 10 appears to have changed -O2 in order to make compilation time
> faster when using -flto, seemingly at the expense of performance, in
> particular with regards to how the inliner works. Since -O3 these days
> shouldn't have the same set of bugs as 10 years ago, this commit
> defaults new kernel compiles to -O3 when using gcc >= 10.

Would be nice to get some GCC person's feedback on this. But in general,
I think you're right in that O3 isn't the code-gen disaster it used to
be.

> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  init/Kconfig | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index 9e22ee8fbd75..fab3f810a68d 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1245,7 +1245,8 @@ config BOOT_CONFIG
>  
>  choice
>  	prompt "Compiler optimization level"
> -	default CC_OPTIMIZE_FOR_PERFORMANCE
> +	default CC_OPTIMIZE_FOR_PERFORMANCE_O3 if GCC_VERSION >= 100000
> +	default CC_OPTIMIZE_FOR_PERFORMANCE if (GCC_VERSION < 100000 || CC_IS_CLANG)
>  
>  config CC_OPTIMIZE_FOR_PERFORMANCE
>  	bool "Optimize for performance (-O2)"
> -- 
> 2.26.2
> 


* Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-07 22:45 [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10 Jason A. Donenfeld
  2020-05-08  8:35 ` Peter Zijlstra
@ 2020-05-08  9:02 ` Oleksandr Natalenko
  2020-05-08 11:21   ` Jason A. Donenfeld
  2020-05-11 21:57   ` [PATCH v2] " Jason A. Donenfeld
  2020-05-13 11:27 ` [PATCH] " Artem S. Tashkinov
  2 siblings, 2 replies; 20+ messages in thread
From: Oleksandr Natalenko @ 2020-05-08  9:02 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: linux-kernel, x86

On Thu, May 07, 2020 at 04:45:30PM -0600, Jason A. Donenfeld wrote:
> GCC 10 appears to have changed -O2 in order to make compilation time
> faster when using -flto, seemingly at the expense of performance, in
> particular with regards to how the inliner works. Since -O3 these days
> shouldn't have the same set of bugs as 10 years ago, this commit
> defaults new kernel compiles to -O3 when using gcc >= 10.
> 
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  init/Kconfig | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index 9e22ee8fbd75..fab3f810a68d 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1245,7 +1245,8 @@ config BOOT_CONFIG
>  
>  choice
>  	prompt "Compiler optimization level"
> -	default CC_OPTIMIZE_FOR_PERFORMANCE
> +	default CC_OPTIMIZE_FOR_PERFORMANCE_O3 if GCC_VERSION >= 100000
> +	default CC_OPTIMIZE_FOR_PERFORMANCE if (GCC_VERSION < 100000 || CC_IS_CLANG)
>  
>  config CC_OPTIMIZE_FOR_PERFORMANCE
>  	bool "Optimize for performance (-O2)"
> -- 
> 2.26.2
> 

Should we untangle -O3 from depending on ARC first maybe?

-- 
  Best regards,
    Oleksandr Natalenko (post-factum)
    Principal Software Maintenance Engineer



* Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-08  9:02 ` Oleksandr Natalenko
@ 2020-05-08 11:21   ` Jason A. Donenfeld
  2020-05-08 11:33     ` Oleksandr Natalenko
  2020-05-11 21:57   ` [PATCH v2] " Jason A. Donenfeld
  1 sibling, 1 reply; 20+ messages in thread
From: Jason A. Donenfeld @ 2020-05-08 11:21 UTC (permalink / raw)
  To: Oleksandr Natalenko; +Cc: LKML, X86 ML

On Fri, May 8, 2020 at 3:08 AM Oleksandr Natalenko <oleksandr@redhat.com> wrote:
>
> Should we untangle -O3 from depending on ARC first maybe?

Oh, hah, good point. Yes, I'll do that for a v2, but will wait another
day for feedback first.

Jason


* Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-08 11:21   ` Jason A. Donenfeld
@ 2020-05-08 11:33     ` Oleksandr Natalenko
  2020-05-08 11:49       ` Arnd Bergmann
  0 siblings, 1 reply; 20+ messages in thread
From: Oleksandr Natalenko @ 2020-05-08 11:33 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: LKML, X86 ML, Arnd Bergmann, Andrew Morton

On Fri, May 08, 2020 at 05:21:47AM -0600, Jason A. Donenfeld wrote:
> > Should we untangle -O3 from depending on ARC first maybe?
> 
> Oh, hah, good point. Yes, I'll do that for a v2, but will wait another
> day for feedback first.

Just keep in mind that my previous attempt [1] failed because of too
many false positive warnings, even though -O3 really did uncover a
couple of bugs in the codebase.

Let's hope your attempt will be more successful. I'll happily offer my
review tag ;).

Also Cc'ing Andrew, who (IIRC) tried to take my submission, and Arnd, who
tried to clean up the mess afterwards.

[1] https://lore.kernel.org/lkml/20191211104619.114557-1-oleksandr@redhat.com/

-- 
  Best regards,
    Oleksandr Natalenko (post-factum)
    Principal Software Maintenance Engineer



* Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-08 11:33     ` Oleksandr Natalenko
@ 2020-05-08 11:49       ` Arnd Bergmann
  2020-05-08 12:07         ` Jason A. Donenfeld
  2020-05-08 15:06         ` Joe Perches
  0 siblings, 2 replies; 20+ messages in thread
From: Arnd Bergmann @ 2020-05-08 11:49 UTC (permalink / raw)
  To: Oleksandr Natalenko; +Cc: Jason A. Donenfeld, LKML, X86 ML, Andrew Morton

On Fri, May 8, 2020 at 1:33 PM Oleksandr Natalenko <oleksandr@redhat.com> wrote:
>
> On Fri, May 08, 2020 at 05:21:47AM -0600, Jason A. Donenfeld wrote:
> > > Should we untangle -O3 from depending on ARC first maybe?
> >
> > Oh, hah, good point. Yes, I'll do that for a v2, but will wait another
> > day for feedback first.
>
> Just keep in mind that my previous attempt [1] failed because of too
> many false positive warnings, even though -O3 really did uncover a
> couple of bugs in the codebase.

I think my warning fixes were mostly picked up in the meantime, but
if there are any remaining, they would be mixed in with random other
fixes in my testing tree, so it's hard to know for sure.

I also want to hear the feedback from the gcc developers about what
the general recommendations are between O2 and O3, and how
they may have changed over time. According to the gcc-10 documentation,
the difference between -O2 and -O3 is exactly this set of options:

-fgcse-after-reload
-fipa-cp-clone
-floop-interchange
-floop-unroll-and-jam
-fpeel-loops
-fpredictive-commoning
-fsplit-loops
-fsplit-paths
-ftree-loop-distribution
-ftree-loop-vectorize
-ftree-partial-pre
-ftree-slp-vectorize
-funswitch-loops
-fvect-cost-model
-fvect-cost-model=dynamic
-fversion-loops-for-strides

It's a relatively short list, so someone familiar with the options could
perhaps look into whether we want to change the default for all
of them, or if it makes sense to be more selective.
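One entry of that list can be made concrete. -funswitch-loops moves a loop-invariant condition out of a loop and duplicates the loop for each outcome. The sketch below does that transformation by hand in C. The function names are hypothetical, and this is an illustration of the technique, not actual gcc output. The duplicated loop body shows one way such passes can increase code size.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the invariant "negate" is tested on every iteration. */
static long sum_plain(const int *v, size_t n, int negate)
{
	long s = 0;

	for (size_t i = 0; i < n; i++)
		s += negate ? -v[i] : v[i];
	return s;
}

/* After unswitching: the test runs once, and each loop copy is branch-free. */
static long sum_unswitched(const int *v, size_t n, int negate)
{
	long s = 0;

	assert(v != NULL || n == 0);
	if (negate) {
		for (size_t i = 0; i < n; i++)
			s -= v[i];
	} else {
		for (size_t i = 0; i < n; i++)
			s += v[i];
	}
	return s;
}
```

Both functions compute the same result. The trade is one branch per iteration against a second copy of the loop.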

Personally, I'm more interested in improving compile speed of the kernel
and eventually supporting -Og or some variant of it for my own build
testing, but of course I also want to make sure that the other optimization
levels do not produce warnings, and -Og leads to more problems than
-O3 at the moment.

       Arnd


* Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-08 11:49       ` Arnd Bergmann
@ 2020-05-08 12:07         ` Jason A. Donenfeld
  2020-05-08 13:04           ` Arnd Bergmann
  2020-05-08 15:06         ` Joe Perches
  1 sibling, 1 reply; 20+ messages in thread
From: Jason A. Donenfeld @ 2020-05-08 12:07 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Oleksandr Natalenko, LKML, X86 ML, Andrew Morton

On Fri, May 8, 2020 at 5:56 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Fri, May 8, 2020 at 1:33 PM Oleksandr Natalenko <oleksandr@redhat.com> wrote:
> >
> > On Fri, May 08, 2020 at 05:21:47AM -0600, Jason A. Donenfeld wrote:
> > > > Should we untangle -O3 from depending on ARC first maybe?
> > >
> > > Oh, hah, good point. Yes, I'll do that for a v2, but will wait another
> > > day for feedback first.
> >
> > Just keep in mind that my previous attempt [1] failed because of too
> > many false positive warnings, even though -O3 really did uncover a
> > couple of bugs in the codebase.
>
> I think my warning fixes were mostly picked up in the meantime, but
> if there are any remaining, they would be mixed in with random other
> fixes in my testing tree, so it's hard to know for sure.
>
> I also want to hear the feedback from the gcc developers about what
> the general recommendations are between O2 and O3, and how
> they may have changed over time. According to the gcc-10 documentation,
> the difference between -O2 and -O3 is exactly this set of options:
>
> -fgcse-after-reload
> -fipa-cp-clone
> -floop-interchange
> -floop-unroll-and-jam
> -fpeel-loops
> -fpredictive-commoning
> -fsplit-loops
> -fsplit-paths
> -ftree-loop-distribution
> -ftree-loop-vectorize
> -ftree-partial-pre
> -ftree-slp-vectorize
> -funswitch-loops
> -fvect-cost-model
> -fvect-cost-model=dynamic
> -fversion-loops-for-strides

The other significant thing -- and what prompted this patchset -- is
it looks like gcc 10 has lowered the inlining degree for -O2, and put
gcc 9's inlining parameters from -O2 into gcc-10's -O3.


* Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-08 12:07         ` Jason A. Donenfeld
@ 2020-05-08 13:04           ` Arnd Bergmann
  0 siblings, 0 replies; 20+ messages in thread
From: Arnd Bergmann @ 2020-05-08 13:04 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Oleksandr Natalenko, LKML, X86 ML, Andrew Morton

On Fri, May 8, 2020 at 2:07 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> On Fri, May 8, 2020 at 5:56 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> The other significant thing -- and what prompted this patchset -- is
> it looks like gcc 10 has lowered the inlining degree for -O2, and put
> gcc 9's inlining parameters from -O2 into gcc-10's -O3.

I suspect it is more complicated than that, as there are a number of
parameters that determine inlining decisions. It's also not clear whether
the ones for -O3 are generally better than the ones with -O2, or if it's
just that whatever changed caused a few surprises but is otherwise
preferable.

Did you see regressions in specific modules, or just a general slowdown
or growth in object size as the result?

      Arnd


* Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-08 11:49       ` Arnd Bergmann
  2020-05-08 12:07         ` Jason A. Donenfeld
@ 2020-05-08 15:06         ` Joe Perches
  2020-05-08 15:09           ` Arnd Bergmann
  2020-05-10 12:47           ` David Laight
  1 sibling, 2 replies; 20+ messages in thread
From: Joe Perches @ 2020-05-08 15:06 UTC (permalink / raw)
  To: Arnd Bergmann, Oleksandr Natalenko
  Cc: Jason A. Donenfeld, LKML, X86 ML, Andrew Morton

On Fri, 2020-05-08 at 13:49 +0200, Arnd Bergmann wrote:
> Personally, I'm more interested in improving compile speed of the kernel

Any opinion on precompiled header support?




* Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-08 15:06         ` Joe Perches
@ 2020-05-08 15:09           ` Arnd Bergmann
  2020-05-10 12:47           ` David Laight
  1 sibling, 0 replies; 20+ messages in thread
From: Arnd Bergmann @ 2020-05-08 15:09 UTC (permalink / raw)
  To: Joe Perches
  Cc: Oleksandr Natalenko, Jason A. Donenfeld, LKML, X86 ML, Andrew Morton

On Fri, May 8, 2020 at 5:06 PM Joe Perches <joe@perches.com> wrote:
>
> On Fri, 2020-05-08 at 13:49 +0200, Arnd Bergmann wrote:
> > Personally, I'm more interested in improving compile speed of the kernel
>
> Any opinion on precompiled header support?

I have not tried it. IIRC precompiled headers usually work best for projects
that have a large header with all the global declarations that gets included
everywhere, while Linux has always tried (with different amounts of success)
to minimize the number of headers that get included per file.

       Arnd


* RE: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-08 15:06         ` Joe Perches
  2020-05-08 15:09           ` Arnd Bergmann
@ 2020-05-10 12:47           ` David Laight
  2020-05-10 17:45             ` Joe Perches
  2020-05-12  1:10             ` Masahiro Yamada
  1 sibling, 2 replies; 20+ messages in thread
From: David Laight @ 2020-05-10 12:47 UTC (permalink / raw)
  To: 'Joe Perches', Arnd Bergmann, Oleksandr Natalenko
  Cc: Jason A. Donenfeld, LKML, X86 ML, Andrew Morton

From: Joe Perches
> Sent: 08 May 2020 16:06
> On Fri, 2020-05-08 at 13:49 +0200, Arnd Bergmann wrote:
> > Personally, I'm more interested in improving compile speed of the kernel
> 
> Any opinion on precompiled header support?

Whenever I've been anywhere near it, it has always been a disaster.
It may make sense for C++, where there is lots of complicated
code to parse in .h files. Parsing C headers is usually easier.

One thing I have done that significantly speeds up .h file
processing is to take the long list of '-I directory' parameters
that are passed to the compiler and copy the first version
of each file into a separate 'object headers' directory.
This saves the compiler doing lots of 'failed opens'.

If each fragment makefile lists its 'public' headers, make
can generate dependency rules that do the copies.

FWIW make is much faster if you delete all the built-in and
suffix rules and rely on explicit rules for each file.
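The cost of those 'failed opens' can be sketched in C, assuming POSIX open(). Each -I directory is tried in order, and every directory that lacks the header costs one failed open() before the first match is used. The helper name is hypothetical, and the search is simplified: it ignores quote-include and system-directory handling.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: try each directory in order and return an fd for
 * the first match (or -1). *failed_opens counts the misses along the way.
 */
static int open_in_search_path(const char *const dirs[], int ndirs,
			       const char *name, int *failed_opens)
{
	char path[4096];

	assert(failed_opens != NULL);
	*failed_opens = 0;
	for (int i = 0; i < ndirs; i++) {
		int n = snprintf(path, sizeof(path), "%s/%s", dirs[i], name);

		if (n < 0 || (size_t)n >= sizeof(path))
			continue;	/* path too long: skip this directory */
		int fd = open(path, O_RDONLY);
		if (fd >= 0)
			return fd;	/* first match wins */
		(*failed_opens)++;
	}
	return -1;
}
```

With a header that lives only in the last of N directories, every include of it costs N - 1 failed opens. Copying the first match of each header into one directory, as described above, reduces that to a single successful open.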

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)



* Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-10 12:47           ` David Laight
@ 2020-05-10 17:45             ` Joe Perches
  2020-05-10 18:58               ` David Laight
  2020-05-12  1:10             ` Masahiro Yamada
  1 sibling, 1 reply; 20+ messages in thread
From: Joe Perches @ 2020-05-10 17:45 UTC (permalink / raw)
  To: David Laight, Arnd Bergmann, Oleksandr Natalenko
  Cc: Jason A. Donenfeld, LKML, X86 ML, Andrew Morton

On Sun, 2020-05-10 at 12:47 +0000, David Laight wrote:
> From: Joe Perches
> > Sent: 08 May 2020 16:06
> > On Fri, 2020-05-08 at 13:49 +0200, Arnd Bergmann wrote:
> > > Personally, I'm more interested in improving compile speed of the kernel
> > 
> > Any opinion on precompiled header support?
> 
> Whenever I've been anywhere near it, it has always been a disaster.

A disaster? Why?

For a large commercial C-only project, it worked well
by reducing a combined multi-include file, similar to
kernel.h here, to a single file.

That was before SSDs, though, and file open times
might have been rather longer then.




* RE: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-10 17:45             ` Joe Perches
@ 2020-05-10 18:58               ` David Laight
  0 siblings, 0 replies; 20+ messages in thread
From: David Laight @ 2020-05-10 18:58 UTC (permalink / raw)
  To: 'Joe Perches', Arnd Bergmann, Oleksandr Natalenko
  Cc: Jason A. Donenfeld, LKML, X86 ML, Andrew Morton

From: Joe Perches
> Sent: 10 May 2020 18:45
> 
> On Sun, 2020-05-10 at 12:47 +0000, David Laight wrote:
> > From: Joe Perches
> > > Sent: 08 May 2020 16:06
> > > On Fri, 2020-05-08 at 13:49 +0200, Arnd Bergmann wrote:
> > > > Personally, I'm more interested in improving compile speed of the kernel
> > >
> > > Any opinion on precompiled header support?
> >
> > Whenever I've been anywhere near it, it has always been a disaster.
>
> A disaster? Why?

The only times I've had systems that used them, they always got
out of step with the headers - probably due to #define changes.
If they are auto-generated by the compiler, parallel makes also
cause problems.

> For a large commercial C-only project, it worked well
> by reducing a combined multi-include file, similar to
> kernel.h here, to a single file.

Certainly reducing the number of directories searched
can make a big difference.

I've also compiled a .so by merging all the sources into a
single file.

> That was before SSDs, though, and file open times
> might have been rather longer then.

The real killer is lots of directories in the -I <paths>,
especially over NFS.

I've also looked at system call stats during a kernel compile.
open() dominated, and my 'gut feeling' was that most were
failing opens.

I also suspect that modern compilers remember that an include
file contained an include guard - and don't even bother looking
for it a second time.

	David




* [PATCH v2] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-08  9:02 ` Oleksandr Natalenko
  2020-05-08 11:21   ` Jason A. Donenfeld
@ 2020-05-11 21:57   ` Jason A. Donenfeld
  2020-05-12  0:04     ` Linus Torvalds
  1 sibling, 1 reply; 20+ messages in thread
From: Jason A. Donenfeld @ 2020-05-11 21:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jason A. Donenfeld, linux-kbuild, x86, stable, hjl.tools,
	Peter Zijlstra, Jakub Jelinek, Oleksandr Natalenko,
	Arnd Bergmann, Andrew Morton, David Laight, Linus Torvalds,
	Masahiro Yamada

GCC 10 appears to have changed -O2 in order to make compilation time
faster when using -flto, seemingly at the expense of performance, in
particular with regards to how the inliner works. Since -O3 these days
shouldn't have the same set of bugs as 10 years ago, this commit
defaults new kernel compiles to -O3 when using gcc >= 10.

Cc: linux-kbuild@vger.kernel.org
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Cc: hjl.tools@gmail.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Laight <David.Laight@aculab.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Changes v1->v2:
 - [Oleksandr] Remove O3 dependency on ARC.

 init/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 9e22ee8fbd75..f76ec3ccc883 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1245,7 +1245,8 @@ config BOOT_CONFIG
 
 choice
 	prompt "Compiler optimization level"
-	default CC_OPTIMIZE_FOR_PERFORMANCE
+	default CC_OPTIMIZE_FOR_PERFORMANCE_O3 if GCC_VERSION >= 100000
+	default CC_OPTIMIZE_FOR_PERFORMANCE if (GCC_VERSION < 100000 || CC_IS_CLANG)
 
 config CC_OPTIMIZE_FOR_PERFORMANCE
 	bool "Optimize for performance (-O2)"
@@ -1256,7 +1257,6 @@ config CC_OPTIMIZE_FOR_PERFORMANCE
 
 config CC_OPTIMIZE_FOR_PERFORMANCE_O3
 	bool "Optimize more for performance (-O3)"
-	depends on ARC
 	imply CC_DISABLE_WARN_MAYBE_UNINITIALIZED  # avoid false positives
 	help
 	  Choosing this option will pass "-O3" to your compiler to optimize
-- 
2.26.2



* Re: [PATCH v2] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-11 21:57   ` [PATCH v2] " Jason A. Donenfeld
@ 2020-05-12  0:04     ` Linus Torvalds
  2020-05-12  0:09       ` Linus Torvalds
                         ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Linus Torvalds @ 2020-05-12  0:04 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Linux Kernel Mailing List, Linux Kbuild mailing list,
	the arch/x86 maintainers, stable, H.J. Lu, Peter Zijlstra,
	Jakub Jelinek, Oleksandr Natalenko, Arnd Bergmann, Andrew Morton,
	David Laight, Masahiro Yamada

On Mon, May 11, 2020 at 2:57 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> GCC 10 appears to have changed -O2 in order to make compilation time
> faster when using -flto, seemingly at the expense of performance, in
> particular with regards to how the inliner works. Since -O3 these days
> shouldn't have the same set of bugs as 10 years ago, this commit
> defaults new kernel compiles to -O3 when using gcc >= 10.

I'm not convinced this is sensible.

-O3 historically does bad things with gcc. Including bad things for
performance. It traditionally makes code larger and often SLOWER.

And I don't mean slower to compile (although that's an issue). I mean
actually generating slower code.

Things like trying to unroll loops etc. make very little sense in the
kernel, where we very seldom have high loop counts for pretty much
anything.
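The point about low loop counts can be illustrated with a hand-unrolled loop. This is a sketch: the function name and the unroll factor of 4 are arbitrary choices and do not show what gcc necessarily emits. With short trip counts the unrolled body never runs, and all the work falls through to the remainder loop. The code is still larger.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical 4x unrolled sum. *unrolled_iters reports how many times
 * the unrolled body actually executed.
 */
static unsigned int sum_unrolled(const unsigned int *v, size_t n,
				 size_t *unrolled_iters)
{
	unsigned int s = 0;
	size_t i = 0;

	assert(unrolled_iters != NULL);
	*unrolled_iters = 0;
	for (; i + 4 <= n; i += 4) {	/* unrolled body */
		s += v[i] + v[i + 1] + v[i + 2] + v[i + 3];
		(*unrolled_iters)++;
	}
	for (; i < n; i++)		/* remainder loop */
		s += v[i];
	return s;
}
```

For n = 3 the unrolled body runs zero times. Only long arrays benefit from it.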

There's a reason -O3 isn't even offered as an option.

Maybe things have changed, and maybe they've improved. But I'd like to
see actual numbers for something like this.

Not inlining as aggressively is not necessarily a bad thing. It can
be, of course. But I've actually also done gcc bug reports about gcc
inlining too much, and generating _worse_ code as a result (ie
inlining things that were behind an "if (unlikely())" test, and
causing the likely path to grow a stack frame and stack spills as a
result).

So just "O3 inlines more" is not a valid argument.

              Linus


* Re: [PATCH v2] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-12  0:04     ` Linus Torvalds
@ 2020-05-12  0:09       ` Linus Torvalds
  2020-05-12  0:43       ` Jason A. Donenfeld
  2020-05-12  8:44       ` Richard Biener
  2 siblings, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2020-05-12  0:09 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Linux Kernel Mailing List, Linux Kbuild mailing list,
	the arch/x86 maintainers, stable, H.J. Lu, Peter Zijlstra,
	Jakub Jelinek, Oleksandr Natalenko, Arnd Bergmann, Andrew Morton,
	David Laight, Masahiro Yamada

On Mon, May 11, 2020 at 5:04 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Not inlining as aggressively is not necessarily a bad thing. It can
> be, of course. But I've actually also done gcc bug reports about gcc
> inlining too much, and generating _worse_ code as a result (ie
> inlining things that were behind an "if (unlikely())" test, and
> causing the likely path to grow a stack frame and stack spills as a
> result).

In case people care, the bugzilla case I mentioned is this one:

    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

with example code on why it's actively wrong to inline.

Obviously, in the kernel, we can fix the obvious cases with "noinline"
and "always_inline", but those take care of the outliers.  Having a
compiler that does reasonably well by default is a good thing, and
that very much includes *not* inlining mindlessly.
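Here is a sketch of the pattern described above, in C for GCC or clang. The rare path sits behind an unlikely() branch and is marked noinline, so the common path does not inherit its stack frame. my_unlikely and my_noinline are local stand-ins, assumed to behave like the kernel's unlikely() and noinline, which are built on __builtin_expect and __attribute__((__noinline__)). The function names are hypothetical.

```c
#include <assert.h>

/* Local stand-ins for the kernel's unlikely() and noinline. */
#define my_unlikely(x) __builtin_expect(!!(x), 0)
#define my_noinline __attribute__((__noinline__))

/* Rare path with a large frame that should stay out of the hot caller. */
static my_noinline int slow_path(int x)
{
	volatile int scratch[64];
	int sum = 0;

	assert(x < 0);			/* only reached on the rare path */
	for (int i = 0; i < 64; i++)
		scratch[i] = x;
	for (int i = 0; i < 64; i++)
		sum += scratch[i];
	return sum;			/* 64 * x */
}

/* Hot path: stays small because slow_path() is never inlined into it. */
static int fast_path(int x)
{
	if (my_unlikely(x < 0))
		return slow_path(x);
	return x * 2;
}
```

Without the attribute, whether slow_path() gets inlined depends on the compiler version and flags. That variability is why explicit annotations are used for the outliers.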

                  Linus


* Re: [PATCH v2] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-12  0:04     ` Linus Torvalds
  2020-05-12  0:09       ` Linus Torvalds
@ 2020-05-12  0:43       ` Jason A. Donenfeld
  2020-05-12  8:44       ` Richard Biener
  2 siblings, 0 replies; 20+ messages in thread
From: Jason A. Donenfeld @ 2020-05-12  0:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Linux Kbuild mailing list,
	the arch/x86 maintainers, stable, H.J. Lu, Peter Zijlstra,
	Jakub Jelinek, Oleksandr Natalenko, Arnd Bergmann, Andrew Morton,
	David Laight, Masahiro Yamada

On Mon, May 11, 2020 at 6:05 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> There's a reason -O3 isn't even offered as an option.
>
> Maybe things have changed, and maybe they've improved. But I'd like to
> see actual numbers for something like this.
>
> Not inlining as aggressively is not necessarily a bad thing. It can
> be, of course. But I've actually also done gcc bug reports about gcc
> inlining too much, and generating _worse_ code as a result (ie
> inlining things that were behind an "if (unlikely())" test, and
> causing the likely path to grow a stack frame and stack spills as a
> result).
>
> So just "O3 inlines more" is not a valid argument.

Alright. It might be possible to produce some benchmarks, and then
isolate the precise inlining parameter that makes the difference, and
include that for gcc-10. But you made a compelling argument in that
old gcc bug report about not going down the finicky rabbit hole of gcc
inlining switches that seem to change meaning between releases, which
is persuasive.

The other possibility would be if -O3 actually isn't bad like it used
to be and the codegen is markedly better, alongside some numbers to
back it up. I'm not presently making that argument and don't have
those numbers, but perhaps others who were interested in this patch
for other reasons do have strong arguments there and want to chime in.
Otherwise, no problem dropping this.


* Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-10 12:47           ` David Laight
  2020-05-10 17:45             ` Joe Perches
@ 2020-05-12  1:10             ` Masahiro Yamada
  1 sibling, 0 replies; 20+ messages in thread
From: Masahiro Yamada @ 2020-05-12  1:10 UTC (permalink / raw)
  To: David Laight
  Cc: Joe Perches, Arnd Bergmann, Oleksandr Natalenko,
	Jason A. Donenfeld, LKML, X86 ML, Andrew Morton

On Sun, May 10, 2020 at 9:47 PM David Laight <David.Laight@aculab.com> wrote:
>
> From: Joe Perches
> > Sent: 08 May 2020 16:06
> > On Fri, 2020-05-08 at 13:49 +0200, Arnd Bergmann wrote:
> > > Personally, I'm more interested in improving compile speed of the kernel
> >
> > Any opinion on precompiled header support?
>
> Whenever I've been anywhere near it, it has always been a disaster.
> It may make sense for C++, where there is lots of complicated
> code to parse in .h files. Parsing C headers is usually easier.
>
> One thing I have done that significantly speeds up .h file
> processing is to take the long list of '-I directory' parameters
> that are passed to the compiler and copy the first version
> of each file into a separate 'object headers' directory.
> This saves the compiler doing lots of 'failed opens'.
>
> If each fragment makefile lists its 'public' headers, make
> can generate dependency rules that do the copies.
>
> FWIW make is much faster if you delete all the built-in and
> suffix rules and rely on explicit rules for each file.


Kbuild does at least disable make's built-in rules:


# Do not use make's built-in rules and variables
# (this increases performance and avoids hard-to-debug behaviour)
MAKEFLAGS += -rR



--
Best Regards
Masahiro Yamada


* Re: [PATCH v2] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-12  0:04     ` Linus Torvalds
  2020-05-12  0:09       ` Linus Torvalds
  2020-05-12  0:43       ` Jason A. Donenfeld
@ 2020-05-12  8:44       ` Richard Biener
  2 siblings, 0 replies; 20+ messages in thread
From: Richard Biener @ 2020-05-12  8:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason A. Donenfeld, Linux Kernel Mailing List,
	Linux Kbuild mailing list, the arch/x86 maintainers, stable,
	H.J. Lu, Peter Zijlstra, Jakub Jelinek, Oleksandr Natalenko,
	Arnd Bergmann, Andrew Morton, David Laight, Masahiro Yamada


On Mon, 11 May 2020, Linus Torvalds wrote:

> On Mon, May 11, 2020 at 2:57 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > GCC 10 appears to have changed -O2 in order to make compilation time
> > faster when using -flto, seemingly at the expense of performance, in
> > particular with regards to how the inliner works. Since -O3 these days
> > shouldn't have the same set of bugs as 10 years ago, this commit
> > defaults new kernel compiles to -O3 when using gcc >= 10.
> 
> I'm not convinced this is sensible.

Note the real thing that changed for GCC 10 at -O2 is that -O2
now includes -finline-functions which means GCC considers inlining
of functions not marked with 'inline' at -O2.  To counter code-size
growth and tune that back to previous levels the inlining limits
in effect at -O2 have been lowered.

Note this has been done based on analyzing larger C++ code bases, and
obviously not because the kernel would benefit (IIRC kernel folks like
'inline' to behave as written and thus may rather dislike the change to
enabling -finline-functions by default).

> -O3 historically does bad things with gcc. Including bad things for
> performance. It traditionally makes code larger and often SLOWER.
> 
> And I don't mean slower to compile (although that's an issue). I mean
> actually generating slower code.
> 
> Things like trying to unroll loops etc. make very little sense in the
> kernel, where we very seldom have high loop counts for pretty much
> anything.
> 
> There's a reason -O3 isn't even offered as an option.

And I think that's completely sensible.  I would not recommend
to use -O3 for the kernel.  Somehow feeding back profile data
might help - though getting such data at all and with enough
coverage is probably hard.

As you said in the follow-up, I wouldn't recommend tweaking GCC's
defaults for the various --param options affecting inlining.  Their
behavior is not consistent across releases.

Richard.

> Maybe things have changed, and maybe they've improved. But I'd like to
> see actual numbers for something like this.
> 
> Not inlining as aggressively is not necessarily a bad thing. It can
> be, of course. But I've actually also done gcc bug reports about gcc
> inlining too much, and generating _worse_ code as a result (ie
> inlining things that were behind an "if (unlikely())" test, and
> causing the likely path to grow a stack frame and stack spills as a
> result).
> 
> So just "O3 inlines more" is not a valid argument.
> 
>               Linus
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


* [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
  2020-05-07 22:45 [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10 Jason A. Donenfeld
  2020-05-08  8:35 ` Peter Zijlstra
  2020-05-08  9:02 ` Oleksandr Natalenko
@ 2020-05-13 11:27 ` Artem S. Tashkinov
  2 siblings, 0 replies; 20+ messages in thread
From: Artem S. Tashkinov @ 2020-05-13 11:27 UTC (permalink / raw)
  To: Jason; +Cc: Linux Kernel Mailing List

> GCC 10 appears to have changed -O2 in order to make compilation time
> faster when using -flto, seemingly at the expense of performance, in
> particular with regards to how the inliner works. Since -O3 these days
> shouldn't have the same set of bugs as 10 years ago, this commit
> defaults new kernel compiles to -O3 when using gcc >= 10.

It's a strong "no" from me.

1) Aside from a few Gentoo users, no one has extensively tested -O3 with
the kernel; even Gentoo defaults to -O2 for kernel compilation.

2) -O3 _always_ bloats the code by a large amount, which means both
vmlinux/bzImage and modules will become bigger, and slower to load from
the disk.

3) -O3 does _not_ necessarily make the code run faster.

4) If GCC 10 has removed certain options from the -O2 optimization level,
you could just re-add them as compilation flags without forcing -O3 on
everyone by default.

5) If you still insist on -O3 I guess everyone would be happy if you
just made two KConfig options:

OPTIMIZE_O2 (-O2)
OPTIMIZE_O3_EVEN_MOAR (-O3)

Best regards,
Artem


Thread overview: 20+ messages
2020-05-07 22:45 [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10 Jason A. Donenfeld
2020-05-08  8:35 ` Peter Zijlstra
2020-05-08  9:02 ` Oleksandr Natalenko
2020-05-08 11:21   ` Jason A. Donenfeld
2020-05-08 11:33     ` Oleksandr Natalenko
2020-05-08 11:49       ` Arnd Bergmann
2020-05-08 12:07         ` Jason A. Donenfeld
2020-05-08 13:04           ` Arnd Bergmann
2020-05-08 15:06         ` Joe Perches
2020-05-08 15:09           ` Arnd Bergmann
2020-05-10 12:47           ` David Laight
2020-05-10 17:45             ` Joe Perches
2020-05-10 18:58               ` David Laight
2020-05-12  1:10             ` Masahiro Yamada
2020-05-11 21:57   ` [PATCH v2] " Jason A. Donenfeld
2020-05-12  0:04     ` Linus Torvalds
2020-05-12  0:09       ` Linus Torvalds
2020-05-12  0:43       ` Jason A. Donenfeld
2020-05-12  8:44       ` Richard Biener
2020-05-13 11:27 ` [PATCH] " Artem S. Tashkinov
