linux-kernel.vger.kernel.org archive mirror
* [PATCH] expand micro-optimizations in kernel to newer model CPUs
@ 2013-12-08 15:53 John
  2013-12-10  3:51 ` David Heidelberger
  0 siblings, 1 reply; 11+ messages in thread
From: John @ 2013-12-08 15:53 UTC
  To: lkml; +Cc: david.heidelberger

[-- Attachment #1: Type: text/plain, Size: 876 bytes --]

I have been maintaining the attached patch since kernel v3.9 [1] and submit for review its most recent incarnation, which works with the v3.12 tree.  As the ANOVA plots referenced in the patch description show, these micro-optimizations give a statistically significant improvement on a compilation (make) benchmark and are on par with the "core2" option already included in the mainline kernel.

I maintain an unofficial Arch Linux kernel repo and have been building and packaging kernels with this patch for many different CPUs and architectures.  I believe this code has been well tested by the more than 2,500 users of my repo, on many different CPUs and on both x86 and x86_64 systems [2], and I feel it is worthy of inclusion in the mainline kernel.

Please CC me on replies, as I am NOT a regular subscriber to LKML.  Thank you.

1. https://github.com/graysky2/kernel_gcc_patch
2. http://repo-ck.com/stats.pdf

[-- Attachment #2: kernel-312-gcc48-1.patch --]
[-- Type: text/x-patch; name="kernel-312-gcc48-1.patch", Size: 14066 bytes --]

The original patch was written by jeroen@linuxforge.net
http://www.linuxforge.net/linux/kernel/kernel-33-gcc47-0.patch

Benchmarks by graysky

Three different machines were tested with a make-based benchmark, each running first a generic x86-64 kernel and then an otherwise identical kernel built with the optimized GCC options.

Conclusion:
Running a kernel built with this patch gives small but real speed increases on a make-based benchmark.

Details:
1) Three test machines: Intel Xeon X3360, Intel i7-2620M, Intel Core i7-3660K.
2) Each machine ran the make benchmark (linked below) 35 times while booted into a 'generic' kernel, then ran the same benchmark another 35 times after booting into an optimized kernel. The optimization chosen for each machine was:
2a) X3360 = core2
2b) i7-2620M = corei7-avx
2c) i7-3660K = core-avx-i
3) The resulting distributions were analyzed for statistical significance by ANOVA; the plots show differences that are small but statistically significant.

Links to ANOVA plots:
http://s19.postimage.org/68urcofzn/corei7_avx.png
http://s19.postimage.org/ozwomuak3/core_avx_i.png
http://s19.postimage.org/d0l6fj4z7/core2.png

Discussion:
1) The assumptions for ANOVA are met:
* The data are normally distributed, as shown in the normal quantile plots.
* The population variances are approximately equal (Levene and Bartlett tests).

2) The ANOVA plots clearly show significance.
* Pairwise analysis by Tukey-Kramer shows significance at the 0.05 level for all CPUs compared.
Below are the differences in median values:

core2       +87.5 ms
corei7-avx  +79.7 ms
core-avx-i  +257.2 ms
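Each figure above compares the medians of the two 35-run samples for one machine. A minimal C sketch of that computation, assuming the per-run times (in ms) have already been parsed out of the benchmark log. The sign convention (generic minus optimized, so a positive value means the optimized kernel was faster) is an assumption, and `median()` / `median_difference()` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort() comparator for doubles, ascending order. */
static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a;
	double y = *(const double *)b;

	return (x > y) - (x < y);
}

/* Median of n samples (n > 0). Note: sorts v in place. */
double median(double *v, size_t n)
{
	assert(n > 0);
	qsort(v, n, sizeof(*v), cmp_double);
	if (n % 2)
		return v[n / 2];
	return (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

/*
 * Difference in medians, generic minus optimized (assumed sign
 * convention): positive means the optimized kernel's median run
 * finished sooner.
 */
double median_difference(double *generic, size_t ng,
			 double *optimized, size_t no)
{
	return median(generic, ng) - median(optimized, no);
}
```

For example, with five made-up runs per kernel, `{10250, 10310, 10280, 10300, 10270}` (generic) and `{10190, 10200, 10230, 10180, 10210}` (optimized), the medians are 10280 ms and 10200 ms, so `median_difference()` returns 80.0. These numbers are illustrative only and are not taken from the log. The ANOVA and Tukey-Kramer tests themselves are not reproduced here.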

References:
Bash script that controls the benchmark: https://github.com/graysky2/bin/blob/master/bench
Log file generated by script: http://repo-ck.com/bench/compile_time_optimization.txt.gz

---
--- linux-3.12/arch/x86/include/asm/module.h	2013-02-18 18:58:34.000000000 -0500
+++ linux-3.12.mod/arch/x86/include/asm/module.h	2013-04-11 17:40:04.064910866 -0400
@@ -15,6 +15,16 @@
 #define MODULE_PROC_FAMILY "586MMX "
 #elif defined CONFIG_MCORE2
 #define MODULE_PROC_FAMILY "CORE2 "
+#elif defined CONFIG_MNATIVE
+#define MODULE_PROC_FAMILY "NATIVE "
+#elif defined CONFIG_MCOREI7
+#define MODULE_PROC_FAMILY "COREI7 "
+#elif defined CONFIG_MCOREI7AVX
+#define MODULE_PROC_FAMILY "COREI7AVX "
+#elif defined CONFIG_MCOREAVXI
+#define MODULE_PROC_FAMILY "COREAVXI "
+#elif defined CONFIG_MCOREAVX2
+#define MODULE_PROC_FAMILY "COREAVX2 "
 #elif defined CONFIG_MATOM
 #define MODULE_PROC_FAMILY "ATOM "
 #elif defined CONFIG_M686
@@ -33,6 +43,16 @@
 #define MODULE_PROC_FAMILY "K7 "
 #elif defined CONFIG_MK8
 #define MODULE_PROC_FAMILY "K8 "
+#elif defined CONFIG_MK10
+#define MODULE_PROC_FAMILY "K10 "
+#elif defined CONFIG_MBARCELONA
+#define MODULE_PROC_FAMILY "BARCELONA "
+#elif defined CONFIG_MBOBCAT
+#define MODULE_PROC_FAMILY "BOBCAT "
+#elif defined CONFIG_MBULLDOZER
+#define MODULE_PROC_FAMILY "BULLDOZER "
+#elif defined CONFIG_MPILEDRIVER
+#define MODULE_PROC_FAMILY "PILEDRIVER "
 #elif defined CONFIG_MELAN
 #define MODULE_PROC_FAMILY "ELAN "
 #elif defined CONFIG_MCRUSOE
--- linux-3.12/arch/x86/Kconfig.cpu	2013-02-18 18:58:34.000000000 -0500
+++ linux-3.12.mod/arch/x86/Kconfig.cpu	2013-04-06 08:25:58.095745643 -0400
@@ -139,7 +139,7 @@
 
 
 config MK6
-	bool "K6/K6-II/K6-III"
+	bool "AMD K6/K6-II/K6-III"
 	depends on X86_32
 	---help---
 	  Select this for an AMD K6-family processor.  Enables use of
@@ -147,7 +147,7 @@
 	  flags to GCC.
 
 config MK7
-	bool "Athlon/Duron/K7"
+	bool "AMD Athlon/Duron/K7"
 	depends on X86_32
 	---help---
 	  Select this for an AMD Athlon K7-family processor.  Enables use of
@@ -155,12 +155,48 @@
 	  flags to GCC.
 
 config MK8
-	bool "Opteron/Athlon64/Hammer/K8"
+	bool "AMD Opteron/Athlon64/Hammer/K8"
 	---help---
 	  Select this for an AMD Opteron or Athlon64 Hammer-family processor.
 	  Enables use of some extended instructions, and passes appropriate
 	  optimization flags to GCC.
 
+config MK10
+	bool "AMD 61xx/7x50/PhenomX3/X4/II/K10"
+	---help---
+	  Select this for an AMD 61xx Eight-Core Magny-Cours, Athlon X2 7x50,
+		Phenom X3/X4/II, Athlon II X2/X3/X4, or Turion II-family processor.
+	  Enables use of some extended instructions, and passes appropriate
+	  optimization flags to GCC.
+
+config MBARCELONA
+	bool "AMD Barcelona"
+	---help---
+	  Select this for AMD Barcelona and newer processors.
+
+	  Enables -march=barcelona
+
+config MBOBCAT
+	bool "AMD Bobcat"
+	---help---
+	  Select this for AMD Bobcat processors.
+
+	  Enables -march=btver1
+
+config MBULLDOZER
+	bool "AMD Bulldozer"
+	---help---
+	  Select this for AMD Bulldozer processors.
+
+	  Enables -march=bdver1
+
+config MPILEDRIVER
+	bool "AMD Piledriver"
+	---help---
+	  Select this for AMD Piledriver processors.
+
+	  Enables -march=bdver2
+
 config MCRUSOE
 	bool "Crusoe"
 	depends on X86_32
@@ -251,8 +287,17 @@
 	  using the cpu family field
 	  in /proc/cpuinfo. Family 15 is an older Xeon, Family 6 a newer one.
 
+config MATOM
+	bool "Intel Atom"
+	---help---
+
+	  Select this for the Intel Atom platform. Intel Atom CPUs have an
+	  in-order pipelining architecture and thus can benefit from
+	  accordingly optimized code. Use a recent GCC with specific Atom
+	  support in order to fully benefit from selecting this option.
+
 config MCORE2
-	bool "Core 2/newer Xeon"
+	bool "Intel Core 2"
 	---help---
 
 	  Select this for Intel Core 2 and newer Core 2 Xeons (Xeon 51xx and
@@ -260,14 +305,40 @@
 	  family in /proc/cpuinfo. Newer ones have 6 and older ones 15
 	  (not a typo)
 
-config MATOM
-	bool "Intel Atom"
+	  Enables -march=core2
+
+config MCOREI7
+	bool "Intel Core i7"
 	---help---
 
-	  Select this for the Intel Atom platform. Intel Atom CPUs have an
-	  in-order pipelining architecture and thus can benefit from
-	  accordingly optimized code. Use a recent GCC with specific Atom
-	  support in order to fully benefit from selecting this option.
+	  Select this for the Intel Nehalem platform. Intel Nehalem processors
+	  include Core i3, i5, i7, Xeon: 34xx, 35xx, 55xx, 56xx, 75xx processors.
+
+	  Enables -march=corei7
+
+config MCOREI7AVX
+	bool "Intel Core 2nd Gen AVX"
+	---help---
+
+	  Select this for 2nd Gen Core processors including Sandy Bridge.
+
+	  Enables -march=corei7-avx
+
+config MCOREAVXI
+	bool "Intel Core 3rd Gen AVX"
+	---help---
+
+	  Select this for 3rd Gen Core processors including Ivy Bridge.
+
+	  Enables -march=core-avx-i
+
+config MCOREAVX2
+	bool "Intel Core AVX2"
+	---help---
+
+	  Select this for AVX2 enabled processors including Haswell.
+
+	  Enables -march=core-avx2
 
 config GENERIC_CPU
 	bool "Generic-x86-64"
@@ -276,6 +347,19 @@
 	  Generic x86-64 CPU.
 	  Run equally well on all x86-64 CPUs.
 
+config MNATIVE
+ bool "Native optimizations autodetected by GCC"
+ ---help---
+
+   GCC 4.2 and above support -march=native, which automatically detects
+   the optimum settings to use based on your processor. -march=native
+   also detects and applies additional settings beyond -march that are
+   specific to your CPU (e.g. -msse4). Unless you have a specific reason
+   not to (e.g. distcc cross-compiling), you should probably be using
+   -march=native rather than any of the options listed above.
+
+   Enables -march=native
+
 endchoice
 
 config X86_GENERIC
@@ -300,7 +384,7 @@
 config X86_L1_CACHE_SHIFT
 	int
 	default "7" if MPENTIUM4 || MPSC
-	default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
+	default "6" if MK7 || MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MPENTIUMM || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MATOM || MVIAC7 || X86_GENERIC || MNATIVE || GENERIC_CPU
 	default "4" if MELAN || M486 || MGEODEGX1
 	default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
 
@@ -331,11 +415,11 @@
 
 config X86_INTEL_USERCOPY
 	def_bool y
-	depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2
+	depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || MNATIVE || X86_GENERIC || MK8 || MK7 || MK10 || MBARCELONA || MEFFICEON || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2
 
 config X86_USE_PPRO_CHECKSUM
 	def_bool y
-	depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MATOM
+	depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MK10 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MATOM || MNATIVE
 
 config X86_USE_3DNOW
 	def_bool y
@@ -363,17 +447,17 @@
 
 config X86_TSC
 	def_bool y
-	depends on ((MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MATOM) && !X86_NUMAQ) || X86_64
+	depends on ((MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MATOM) && !X86_NUMAQ) || X86_64 || MNATIVE
 
 config X86_CMPXCHG64
 	def_bool y
-	depends on X86_PAE || X86_64 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM
+	depends on X86_PAE || X86_64 || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM || MNATIVE
 
 # this should be set for all -march=.. options where the compiler
 # generates cmov.
 config X86_CMOV
 	def_bool y
-	depends on (MK8 || MK7 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MATOM || MGEODE_LX)
+	depends on (MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MK7 || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MNATIVE || MATOM || MGEODE_LX)
 
 config X86_MINIMUM_CPU_FAMILY
 	int
--- linux-3.12/arch/x86/Makefile 2012-12-10 22:30:57.000000000 -0500
+++ linux-3.12.mod/arch/x86/Makefile	2013-04-06 07:36:39.349203123 -0400
@@ -57,11 +57,25 @@
 	KBUILD_CFLAGS += $(call cc-option,-mno-sse -mpreferred-stack-boundary=3)
 
         # FIXME - should be integrated in Makefile.cpu (Makefile_32.cpu)
+        cflags-$(CONFIG_MNATIVE) += $(call cc-option,-march=native)
         cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8)
+        cflags-$(CONFIG_MK10) += $(call cc-option,-march=amdfam10)
+        cflags-$(CONFIG_MBARCELONA) += $(call cc-option,-march=barcelona)
+        cflags-$(CONFIG_MBOBCAT) += $(call cc-option,-march=btver1)
+        cflags-$(CONFIG_MBULLDOZER) += $(call cc-option,-march=bdver1)
+        cflags-$(CONFIG_MPILEDRIVER) += $(call cc-option,-march=bdver2)
         cflags-$(CONFIG_MPSC) += $(call cc-option,-march=nocona)
 
         cflags-$(CONFIG_MCORE2) += \
-                $(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
+                $(call cc-option,-march=core2,$(call cc-option,-mtune=core2))
+        cflags-$(CONFIG_MCOREI7) += \
+                $(call cc-option,-march=corei7,$(call cc-option,-mtune=corei7))
+        cflags-$(CONFIG_MCOREI7AVX) += \
+                $(call cc-option,-march=corei7-avx,$(call cc-option,-mtune=corei7-avx))
+        cflags-$(CONFIG_MCOREAVXI) += \
+                $(call cc-option,-march=core-avx-i,$(call cc-option,-mtune=core-avx-i))
+        cflags-$(CONFIG_MCOREAVX2) += \
+                $(call cc-option,-march=core-avx2,$(call cc-option,-mtune=core-avx2))
 	cflags-$(CONFIG_MATOM) += $(call cc-option,-march=atom) \
 		$(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
         cflags-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=generic)
--- linux-3.12/arch/x86/Makefile_32.cpu  2012-12-10 22:30:57.000000000 -0500
+++ linux-3.12.mod/arch/x86/Makefile_32.cpu	2013-04-06 07:37:31.754423693 -0400
@@ -23,7 +23,13 @@
 # Please note, that patches that add -march=athlon-xp and friends are pointless.
 # They make zero difference whatsosever to performance at this time.
 cflags-$(CONFIG_MK7)		+= -march=athlon
+cflags-$(CONFIG_MNATIVE) += $(call cc-option,-march=native)
 cflags-$(CONFIG_MK8)		+= $(call cc-option,-march=k8,-march=athlon)
+cflags-$(CONFIG_MK10)	+= $(call cc-option,-march=amdfam10,-march=athlon)
+cflags-$(CONFIG_MBARCELONA)	+= $(call cc-option,-march=barcelona,-march=athlon)
+cflags-$(CONFIG_MBOBCAT)	+= $(call cc-option,-march=btver1,-march=athlon)
+cflags-$(CONFIG_MBULLDOZER)	+= $(call cc-option,-march=bdver1,-march=athlon)
+cflags-$(CONFIG_MPILEDRIVER)	+= $(call cc-option,-march=bdver2,-march=athlon)
 cflags-$(CONFIG_MCRUSOE)	+= -march=i686 $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
 cflags-$(CONFIG_MEFFICEON)	+= -march=i686 $(call tune,pentium3) $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
 cflags-$(CONFIG_MWINCHIPC6)	+= $(call cc-option,-march=winchip-c6,-march=i586)
@@ -32,6 +38,10 @@
 cflags-$(CONFIG_MVIAC3_2)	+= $(call cc-option,-march=c3-2,-march=i686)
 cflags-$(CONFIG_MVIAC7)		+= -march=i686
 cflags-$(CONFIG_MCORE2)		+= -march=i686 $(call tune,core2)
+cflags-$(CONFIG_MCOREI7)	+= -march=i686 $(call tune,corei7)
+cflags-$(CONFIG_MCOREI7AVX)	+= -march=i686 $(call tune,corei7-avx)
+cflags-$(CONFIG_MCOREAVXI)	+= -march=i686 $(call tune,core-avx-i)
+cflags-$(CONFIG_MCOREAVX2)	+= -march=i686 $(call tune,core-avx2)
 cflags-$(CONFIG_MATOM)		+= $(call cc-option,-march=atom,$(call cc-option,-march=core2,-march=i686)) \
 	$(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
 


* Re: [PATCH] expand micro-optimizations in kernel to newer model CPUs
  2013-12-08 15:53 [PATCH] expand micro-optimizations in kernel to newer model CPUs John
@ 2013-12-10  3:51 ` David Heidelberger
  2013-12-13 12:37   ` Austin S Hemmelgarn
  0 siblings, 1 reply; 11+ messages in thread
From: David Heidelberger @ 2013-12-10  3:51 UTC
  To: John; +Cc: lkml

Hello,

This patch is used in Gentoo's geek-sources and mptcp-sources (and
probably in many other kernel source sets), so it has seen a lot of testing.

It's not a _big_ performance difference, but -mtune=native is nice to
have.
It would be great if this kernel option could be fully utilized without
the need for additional patching.

Thanks for your work, guys.

Cheers,
David

On 2013-12-08 16:53, John wrote:
> I have been maintaining the attached patch since kernel v3.9.[1]  I
> submit for review the most recent incarnation which works with the
> v3.12 tree.  As you can see by the ANOVA plots referenced in the
> comments, these micro optimizations are value-added statistically
> based on a compilation endpoint and are on-par with the included
> "core2" option in the mainline kernel itself.
> 
> I maintain an unofficial Arch Linux kernel repo and have been
> building/packaging kernels using this patch for many different CPUs
> and Arches.  I feel this code has been tested by the >2,500 users of
> my repo on many different CPUs and under both x86 and x86_64
> systems[2] and feel it is worth for inclusion into the mainline
> kernel.
> 
> Please cc me on replies as I am NOT a regular subscriber to lkml. 
>  Thank you.
> 
> 1. https://github.com/graysky2/kernel_gcc_patch
> 2, http://repo-ck.com/stats.pdf


* Re: [PATCH] expand micro-optimizations in kernel to newer model CPUs
  2013-12-10  3:51 ` David Heidelberger
@ 2013-12-13 12:37   ` Austin S Hemmelgarn
       [not found]     ` <1387051250.86178.YahooMailNeo@web140005.mail.bf1.yahoo.com>
  2013-12-15 12:23     ` John
  0 siblings, 2 replies; 11+ messages in thread
From: Austin S Hemmelgarn @ 2013-12-13 12:37 UTC
  To: David Heidelberger, John; +Cc: lkml

On 2013-12-09 22:51, David Heidelberger wrote:
> Hello,
> 
> this patch is used in Gentoo geek-sources and mptcp-sources (and
> probably in lot other kernel sources sets). So, there was lot testing.
> 
> It's not _big_ performance difference, but -mtune=native is nice to have.
> It could be great, if this kernel option could be fully utilized without
> need of additional patching.

This really depends on how you define 'big'.  Even a 1% performance
increase is significant for a realtime system, and for someone with a
cluster of systems, getting even a small improvement in performance on
all of the systems could mean a huge increase in effective processing power.
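The cluster argument is simple arithmetic. A minimal C sketch, assuming the gain appears as the same fractional runtime reduction r on every node and that the workload is throughput-bound. The helper names are hypothetical, and the node count and percentage in the usage note are illustrative, not measured.

```c
#include <assert.h>

/*
 * If a job's runtime shrinks by the fraction r (0 <= r < 1), a node
 * finishes 1 / (1 - r) times as many jobs in the same wall-clock
 * time, so its throughput rises by r / (1 - r).
 */
double throughput_gain(double r)
{
	assert(r >= 0.0 && r < 1.0);
	return r / (1.0 - r);
}

/* Extra capacity across a cluster, expressed in whole-node equivalents. */
double extra_node_equivalents(unsigned int nodes, double r)
{
	return nodes * throughput_gain(r);
}
```

For a hypothetical 1,000-node cluster, a 1% runtime reduction (r = 0.01) gives `extra_node_equivalents(1000, 0.01)` of about 10.1. That is roughly the output of ten additional machines, with no new hardware.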


* Re: [PATCH] expand micro-optimizations in kernel to newer model CPUs
       [not found]           ` <52ACECC4.208@zytor.com>
@ 2013-12-15 12:00             ` John
  2013-12-15 12:27               ` Richard Weinberger
  2013-12-16 14:28               ` Ingo Molnar
  0 siblings, 2 replies; 11+ messages in thread
From: John @ 2013-12-15 12:00 UTC
  To: H. Peter Anvin, akpm, david.heidelberger, tglx, mingo, x86; +Cc: lkml

----- Original Message -----

> From: H. Peter Anvin <hpa@zytor.com>
> Sent: Saturday, December 14, 2013 6:41 PM
> Subject: Re: Fw: [PATCH] expand micro-optimizations in kernel to newer model CPUs

> 
> Please submit in the email form requested by the
> Documentation/SubmittingPatches email; in particular we need the
> Signed-off-by: statements.
> 
> 
>     -hpa
> 

From: John Audia <da_audiophile@yahoo.com>


Signed-off-by: John Audia <da_audiophile@yahoo.com>


This patch has been tested on, and is known to work with, kernel versions from 3.2
up to the latest git version (pulled on 12/14/2013).

This patch expands the set of supported microarchitectures to include the
following processors: AMD K10-family, AMD Family 10h (Barcelona), AMD Family
14h (Bobcat), AMD Family 15h (Bulldozer), AMD Family 15h (Piledriver), AMD
Family 16h (Jaguar), Intel 1st Gen Core i3/i5/i7 (Nehalem), Intel 2nd Gen Core
i3/i5/i7 (Sandy Bridge), Intel 3rd Gen Core i3/i5/i7 (Ivy Bridge), and Intel 4th
Gen Core i3/i5/i7 (Haswell). It also adds an option that passes GCC's 'native'
flag (-march=native).
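On x86_64, you can read off the GCC flag that each processor choice adds from the arch/x86/Makefile hunk of this patch. A minimal C sketch of that mapping as a lookup table. The entries are copied from the patch. The `march_map` struct, `march_table`, and `march_for()` are hypothetical names used only for illustration. The sketch ignores the cc-option fallbacks and the 32-bit flags in Makefile_32.cpu, which for the Intel choices use -march=i686 plus a tuning option instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Kconfig symbol -> -march value added by arch/x86/Makefile (x86_64). */
struct march_map {
	const char *kconfig;
	const char *march;
};

static const struct march_map march_table[] = {
	{ "CONFIG_MNATIVE",     "native" },
	{ "CONFIG_MK10",        "amdfam10" },
	{ "CONFIG_MBARCELONA",  "barcelona" },
	{ "CONFIG_MBOBCAT",     "btver1" },
	{ "CONFIG_MBULLDOZER",  "bdver1" },
	{ "CONFIG_MPILEDRIVER", "bdver2" },
	{ "CONFIG_MJAGUAR",     "btver2" },
	{ "CONFIG_MCORE2",      "core2" },
	{ "CONFIG_MCOREI7",     "corei7" },
	{ "CONFIG_MCOREI7AVX",  "corei7-avx" },
	{ "CONFIG_MCOREAVXI",   "core-avx-i" },
	{ "CONFIG_MCOREAVX2",   "core-avx2" },
};

/* Returns the -march value for a Kconfig symbol, or NULL if not listed. */
const char *march_for(const char *kconfig)
{
	size_t i;

	assert(kconfig != NULL);
	for (i = 0; i < sizeof(march_table) / sizeof(march_table[0]); i++)
		if (strcmp(march_table[i].kconfig, kconfig) == 0)
			return march_table[i].march;
	return NULL;
}
```

For instance, `march_for("CONFIG_MJAGUAR")` yields `"btver2"`. A symbol the patch does not touch, such as `CONFIG_MK6`, yields `NULL`.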

Small but real speed increases are measurable on a make-based benchmark when
comparing a generic kernel to one built for the respective microarchitecture.

Experimental evidence supporting this statement is available at:
https://github.com/graysky2/kernel_gcc_patch

---
diff -uprN a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
--- a/arch/x86/include/asm/module.h	2013-11-03 18:41:51.000000000 -0500
+++ b/arch/x86/include/asm/module.h	2013-12-15 06:21:24.351122516 -0500
@@ -15,6 +15,16 @@
 #define MODULE_PROC_FAMILY "586MMX "
 #elif defined CONFIG_MCORE2
 #define MODULE_PROC_FAMILY "CORE2 "
+#elif defined CONFIG_MNATIVE
+#define MODULE_PROC_FAMILY "NATIVE "
+#elif defined CONFIG_MCOREI7
+#define MODULE_PROC_FAMILY "COREI7 "
+#elif defined CONFIG_MCOREI7AVX
+#define MODULE_PROC_FAMILY "COREI7AVX "
+#elif defined CONFIG_MCOREAVXI
+#define MODULE_PROC_FAMILY "COREAVXI "
+#elif defined CONFIG_MCOREAVX2
+#define MODULE_PROC_FAMILY "COREAVX2 "
 #elif defined CONFIG_MATOM
 #define MODULE_PROC_FAMILY "ATOM "
 #elif defined CONFIG_M686
@@ -33,6 +43,18 @@
 #define MODULE_PROC_FAMILY "K7 "
 #elif defined CONFIG_MK8
 #define MODULE_PROC_FAMILY "K8 "
+#elif defined CONFIG_MK10
+#define MODULE_PROC_FAMILY "K10 "
+#elif defined CONFIG_MBARCELONA
+#define MODULE_PROC_FAMILY "BARCELONA "
+#elif defined CONFIG_MBOBCAT
+#define MODULE_PROC_FAMILY "BOBCAT "
+#elif defined CONFIG_MBULLDOZER
+#define MODULE_PROC_FAMILY "BULLDOZER "
+#elif defined CONFIG_MPILEDRIVER
+#define MODULE_PROC_FAMILY "PILEDRIVER "
+#elif defined CONFIG_MJAGUAR
+#define MODULE_PROC_FAMILY "JAGUAR "
 #elif defined CONFIG_MELAN
 #define MODULE_PROC_FAMILY "ELAN "
 #elif defined CONFIG_MCRUSOE
diff -uprN a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
--- a/arch/x86/Kconfig.cpu	2013-11-03 18:41:51.000000000 -0500
+++ b/arch/x86/Kconfig.cpu	2013-12-15 06:21:24.351122516 -0500
@@ -139,7 +139,7 @@ config MPENTIUM4
 
 
 config MK6
-bool "K6/K6-II/K6-III"
+bool "AMD K6/K6-II/K6-III"
 depends on X86_32
 ---help---
   Select this for an AMD K6-family processor.  Enables use of
@@ -147,7 +147,7 @@ config MK6
   flags to GCC.
 
 config MK7
-bool "Athlon/Duron/K7"
+bool "AMD Athlon/Duron/K7"
 depends on X86_32
 ---help---
   Select this for an AMD Athlon K7-family processor.  Enables use of
@@ -155,12 +155,55 @@ config MK7
   flags to GCC.
 
 config MK8
-bool "Opteron/Athlon64/Hammer/K8"
+bool "AMD Opteron/Athlon64/Hammer/K8"
 ---help---
   Select this for an AMD Opteron or Athlon64 Hammer-family processor.
   Enables use of some extended instructions, and passes appropriate
   optimization flags to GCC.
 
+config MK10
+bool "AMD 61xx/7x50/PhenomX3/X4/II/K10"
+---help---
+  Select this for an AMD 61xx Eight-Core Magny-Cours, Athlon X2 7x50,
+  Phenom X3/X4/II, Athlon II X2/X3/X4, or Turion II-family processor.
+  Enables use of some extended instructions, and passes appropriate
+  optimization flags to GCC.
+
+config MBARCELONA
+bool "AMD Barcelona"
+---help---
+  Select this for AMD Barcelona and newer processors.
+
+  Enables -march=barcelona
+
+config MBOBCAT
+bool "AMD Bobcat"
+---help---
+  Select this for AMD Bobcat processors.
+
+  Enables -march=btver1
+
+config MBULLDOZER
+bool "AMD Bulldozer"
+---help---
+  Select this for AMD Bulldozer processors.
+
+  Enables -march=bdver1
+
+config MPILEDRIVER
+bool "AMD Piledriver"
+---help---
+  Select this for AMD Piledriver processors.
+
+  Enables -march=bdver2
+
+config MJAGUAR
+bool "AMD Jaguar"
+---help---
+  Select this for AMD Jaguar processors.
+
+  Enables -march=btver2
+
 config MCRUSOE
 bool "Crusoe"
 depends on X86_32
@@ -251,8 +294,17 @@ config MPSC
   using the cpu family field
   in /proc/cpuinfo. Family 15 is an older Xeon, Family 6 a newer one.
 
+config MATOM
+bool "Intel Atom"
+---help---
+
+  Select this for the Intel Atom platform. Intel Atom CPUs have an
+  in-order pipelining architecture and thus can benefit from
+  accordingly optimized code. Use a recent GCC with specific Atom
+  support in order to fully benefit from selecting this option.
+
 config MCORE2
-bool "Core 2/newer Xeon"
+bool "Intel Core 2"
 ---help---
 
   Select this for Intel Core 2 and newer Core 2 Xeons (Xeon 51xx and
@@ -260,14 +312,40 @@ config MCORE2
   family in /proc/cpuinfo. Newer ones have 6 and older ones 15
   (not a typo)
 
-config MATOM
-bool "Intel Atom"
+  Enables -march=core2
+
+config MCOREI7
+bool "Intel Core i7"
 ---help---
 
-  Select this for the Intel Atom platform. Intel Atom CPUs have an
-  in-order pipelining architecture and thus can benefit from
-  accordingly optimized code. Use a recent GCC with specific Atom
-  support in order to fully benefit from selecting this option.
+  Select this for the Intel Nehalem platform. Intel Nehalem processors
+  include Core i3, i5, i7, Xeon: 34xx, 35xx, 55xx, 56xx, 75xx processors.
+
+  Enables -march=corei7
+
+config MCOREI7AVX
+bool "Intel Core 2nd Gen AVX"
+---help---
+
+  Select this for 2nd Gen Core processors including Sandy Bridge.
+
+  Enables -march=corei7-avx
+
+config MCOREAVXI
+bool "Intel Core 3rd Gen AVX"
+---help---
+
+  Select this for 3rd Gen Core processors including Ivy Bridge.
+
+  Enables -march=core-avx-i
+
+config MCOREAVX2
+bool "Intel Core AVX2"
+---help---
+
+  Select this for AVX2 enabled processors including Haswell.
+
+  Enables -march=core-avx2
 
 config GENERIC_CPU
 bool "Generic-x86-64"
@@ -276,6 +354,19 @@ config GENERIC_CPU
   Generic x86-64 CPU.
   Run equally well on all x86-64 CPUs.
 
+config MNATIVE
+ bool "Native optimizations autodetected by GCC"
+ ---help---
+
+   GCC 4.2 and above support -march=native, which automatically detects
+   the optimum settings to use based on your processor. -march=native
+   also detects and applies additional settings beyond -march that are
+   specific to your CPU (e.g. -msse4). Unless you have a specific reason
+   not to (e.g. distcc cross-compiling), you should probably be using
+   -march=native rather than any of the options listed above.
+
+   Enables -march=native
+
 endchoice
 
 config X86_GENERIC
@@ -300,7 +391,7 @@ config X86_INTERNODE_CACHE_SHIFT
 config X86_L1_CACHE_SHIFT
 int
 default "7" if MPENTIUM4 || MPSC
-default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
+default "6" if MK7 || MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MJAGUAR || MPENTIUMM || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MATOM || MVIAC7 || X86_GENERIC || MNATIVE || GENERIC_CPU
 default "4" if MELAN || M486 || MGEODEGX1
 default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
 
@@ -331,11 +422,11 @@ config X86_ALIGNMENT_16
 
 config X86_INTEL_USERCOPY
 def_bool y
-depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2
+depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || MNATIVE || X86_GENERIC || MK8 || MK7 || MK10 || MBARCELONA || MEFFICEON || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2
 
 config X86_USE_PPRO_CHECKSUM
 def_bool y
-depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MATOM
+depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MK10 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MATOM || MNATIVE
 
 config X86_USE_3DNOW
 def_bool y
@@ -363,17 +454,17 @@ config X86_P6_NOP
 
 config X86_TSC
 def_bool y
-depends on ((MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MATOM) && !X86_NUMAQ) || X86_64
+depends on ((MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MJAGUAR || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MATOM) && !X86_NUMAQ) || X86_64 || MNATIVE
 
 config X86_CMPXCHG64
 def_bool y
-depends on X86_PAE || X86_64 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM
+depends on X86_PAE || X86_64 || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM || MNATIVE
 
 # this should be set for all -march=.. options where the compiler
 # generates cmov.
 config X86_CMOV
 def_bool y
-depends on (MK8 || MK7 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MATOM || MGEODE_LX)
+depends on (MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MJAGUAR || MK7 || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MNATIVE || MATOM || MGEODE_LX)
 
 config X86_MINIMUM_CPU_FAMILY
 int
diff -uprN a/arch/x86/Makefile b/arch/x86/Makefile
--- a/arch/x86/Makefile	2013-11-03 18:41:51.000000000 -0500
+++ b/arch/x86/Makefile	2013-12-15 06:21:24.354455723 -0500
@@ -61,11 +61,26 @@ else
 KBUILD_CFLAGS += $(call cc-option,-mno-sse -mpreferred-stack-boundary=3)
 
         # FIXME - should be integrated in Makefile.cpu (Makefile_32.cpu)
+        cflags-$(CONFIG_MNATIVE) += $(call cc-option,-march=native)
         cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8)
+        cflags-$(CONFIG_MK10) += $(call cc-option,-march=amdfam10)
+        cflags-$(CONFIG_MBARCELONA) += $(call cc-option,-march=barcelona)
+        cflags-$(CONFIG_MBOBCAT) += $(call cc-option,-march=btver1)
+        cflags-$(CONFIG_MBULLDOZER) += $(call cc-option,-march=bdver1)
+        cflags-$(CONFIG_MPILEDRIVER) += $(call cc-option,-march=bdver2)
+        cflags-$(CONFIG_MJAGUAR) += $(call cc-option,-march=btver2)
         cflags-$(CONFIG_MPSC) += $(call cc-option,-march=nocona)
 
         cflags-$(CONFIG_MCORE2) += \
-                $(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
+                $(call cc-option,-march=core2,$(call cc-option,-mtune=core2))
+        cflags-$(CONFIG_MCOREI7) += \
+                $(call cc-option,-march=corei7,$(call cc-option,-mtune=corei7))
+        cflags-$(CONFIG_MCOREI7AVX) += \
+                $(call cc-option,-march=corei7-avx,$(call cc-option,-mtune=corei7-avx))
+        cflags-$(CONFIG_MCOREAVXI) += \
+                $(call cc-option,-march=core-avx-i,$(call cc-option,-mtune=core-avx-i))
+        cflags-$(CONFIG_MCOREAVX2) += \
+                $(call cc-option,-march=core-avx2,$(call cc-option,-mtune=core-avx2))
 cflags-$(CONFIG_MATOM) += $(call cc-option,-march=atom) \
 $(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
         cflags-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=generic)
diff -uprN a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
--- a/arch/x86/Makefile_32.cpu	2013-11-03 18:41:51.000000000 -0500
+++ b/arch/x86/Makefile_32.cpu	2013-12-15 06:21:24.354455723 -0500
@@ -23,7 +23,14 @@ cflags-$(CONFIG_MK6)+= -march=k6
 # Please note, that patches that add -march=athlon-xp and friends are pointless.
 # They make zero difference whatsosever to performance at this time.
 cflags-$(CONFIG_MK7)+= -march=athlon
+cflags-$(CONFIG_MNATIVE) += $(call cc-option,-march=native)
 cflags-$(CONFIG_MK8)+= $(call cc-option,-march=k8,-march=athlon)
+cflags-$(CONFIG_MK10)+= $(call cc-option,-march=amdfam10,-march=athlon)
+cflags-$(CONFIG_MBARCELONA)+= $(call cc-option,-march=barcelona,-march=athlon)
+cflags-$(CONFIG_MBOBCAT)+= $(call cc-option,-march=btver1,-march=athlon)
+cflags-$(CONFIG_MBULLDOZER)+= $(call cc-option,-march=bdver1,-march=athlon)
+cflags-$(CONFIG_MPILEDRIVER)+= $(call cc-option,-march=bdver2,-march=athlon)
+cflags-$(CONFIG_MJAGUAR)+= $(call cc-option,-march=btver2,-march=athlon)
 cflags-$(CONFIG_MCRUSOE)+= -march=i686 $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
 cflags-$(CONFIG_MEFFICEON)+= -march=i686 $(call tune,pentium3) $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
 cflags-$(CONFIG_MWINCHIPC6)+= $(call cc-option,-march=winchip-c6,-march=i586)
@@ -32,6 +39,10 @@ cflags-$(CONFIG_MCYRIXIII)+= $(call cc-
 cflags-$(CONFIG_MVIAC3_2)+= $(call cc-option,-march=c3-2,-march=i686)
 cflags-$(CONFIG_MVIAC7)+= -march=i686
 cflags-$(CONFIG_MCORE2)+= -march=i686 $(call tune,core2)
+cflags-$(CONFIG_MCOREI7)+= -march=i686 $(call tune,corei7)
+cflags-$(CONFIG_MCOREI7AVX)+= -march=i686 $(call tune,corei7-avx)
+cflags-$(CONFIG_MCOREAVXI)+= -march=i686 $(call tune,core-avx-i)
+cflags-$(CONFIG_MCOREAVX2)+= -march=i686 $(call tune,core-avx2)
 cflags-$(CONFIG_MATOM)+= $(call cc-option,-march=atom,$(call cc-option,-march=core2,-march=i686)) \
 $(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))

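A note on the 32-bit hunk above: it keeps `-march=i686` for the new Intel options and adds only `-mtune=...`, while the 64-bit Makefile passes `-march=corei7` and friends. `-march` allows GCC to emit newer instructions; `-mtune` only changes scheduling and instruction choice within what the base architecture already allows. The userspace sketch below (not kernel code; the kernel itself is built with `-mno-sse` and related flags, so these extensions are unavailable inside it) reports which ISA extensions a given set of flags enabled, using the `__SSE2__`, `__SSE4_2__`, `__AVX__` and `__AVX2__` macros that GCC and Clang predefine. The function and enum names are illustrative.

```c
#include <stdio.h>

/* Bits returned by isa_mask(). */
enum {
    ISA_SSE2   = 1,
    ISA_SSE4_2 = 2,
    ISA_AVX    = 4,
    ISA_AVX2   = 8
};

/* ISA extensions the compiler may use in this translation unit.
 * GCC and Clang predefine these macros when -march (or an explicit
 * -m<isa> flag) enables the extension; -mtune alone never does. */
static unsigned isa_mask(void)
{
    unsigned mask = 0;
#ifdef __SSE2__
    mask |= ISA_SSE2;
#endif
#ifdef __SSE4_2__
    mask |= ISA_SSE4_2;
#endif
#ifdef __AVX__
    mask |= ISA_AVX;
#endif
#ifdef __AVX2__
    mask |= ISA_AVX2;
#endif
    return mask;
}

/* Prints one "name: yes/no" line per extension. */
static void print_isa(void)
{
    static const struct { unsigned bit; const char *name; } isa[] = {
        { ISA_SSE2, "SSE2" }, { ISA_SSE4_2, "SSE4.2" },
        { ISA_AVX, "AVX" },   { ISA_AVX2, "AVX2" },
    };
    unsigned m = isa_mask();

    for (size_t i = 0; i < sizeof isa / sizeof isa[0]; i++)
        printf("%s: %s\n", isa[i].name, (m & isa[i].bit) ? "yes" : "no");
}
```

Calling `print_isa()` from a `main` compiled with `gcc -m32 -march=i686 -mtune=corei7` should report no extensions; with `-march=corei7`, SSE2 and SSE4.2; with `-march=core-avx2`, all four.
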

* Re: [PATCH] expand micro-optimizations in kernel to newer model CPUs
  2013-12-13 12:37   ` Austin S Hemmelgarn
       [not found]     ` <1387051250.86178.YahooMailNeo@web140005.mail.bf1.yahoo.com>
@ 2013-12-15 12:23     ` John
  1 sibling, 0 replies; 11+ messages in thread
From: John @ 2013-12-15 12:23 UTC (permalink / raw)
  To: Austin S Hemmelgarn, David Heidelberger; +Cc: lkml

> From: Austin S Hemmelgarn <>
> This really depends on how you define 'big'.  Even a 1% performance
> increase is significant for a realtime system, and for someone with a
> cluster of systems, getting even a small improvement in performance on
> all of the systems could mean a huge increase in effective processing power.
> 

I have resubmitted per H. Peter Anvin's request:

http://lkml.indiana.edu/hypermail/linux/kernel/1312.1/04844.html


Please cc me on any replies as I am not subscribed to lkml.


* Re: [PATCH] expand micro-optimizations in kernel to newer model CPUs
  2013-12-15 12:00             ` John
@ 2013-12-15 12:27               ` Richard Weinberger
  2013-12-15 12:42                 ` John
  2013-12-16 14:28               ` Ingo Molnar
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Weinberger @ 2013-12-15 12:27 UTC (permalink / raw)
  To: John; +Cc: H. Peter Anvin, akpm, david.heidelberger, tglx, mingo, x86, lkml

On Sun, Dec 15, 2013 at 1:00 PM, John <da_audiophile@yahoo.com> wrote:
> ----- Original Message -----
>
>> From: H. Peter Anvin <hpa@zytor.com>
>> Sent: Saturday, December 14, 2013 6:41 PM
>> Subject: Re: Fw: [PATCH] expand micro-optimizations in kernel to newer model CPUs
>
>>
>> Please submit in the email form requested by the
>> Documentation/SubmittingPatches email; in particular we need the
>> Signed-off-by: statements.
>>
>>
>>     -hpa
>>
>
> From: John Audia <da_audiophile@yahoo.com>
>
>
> Signed-off-by: John Audia <da_audiophile@yahoo.com>
>
>
> This patch has been tested on and known to work with kernel versions from 3.2
> up to the latest git version (pulled on 12/14/2013).
>
> This patch will expand the number of microarchitectures to include new
> processors including: AMD K10-family, AMD Family 10h (Barcelona), AMD Family
> 14h (Bobcat), AMD Family 15h (Bulldozer), AMD Family 15h (Piledriver), AMD
> Family 16h (Jaguar), Intel 1st Gen Core i3/i5/i7 (Nehalem), Intel 2nd Gen Core
> i3/i5/i7 (Sandybridge), Intel 3rd Gen Core i3/i5/i7 (Ivybridge), and Intel 4th
> Gen Core i3/i5/i7 (Haswell). It also offers the compiler the 'native' flag.
>
> Small but real speed increases are measurable using a make endpoint comparing
> a generic kernel to one built with one of the respective microarchs.

A *very* small speedup.

And I really doubt your numbers.
Why are you using ANOVA? You're comparing *two* groups not more than two.
I had a quick look at your raw numbers, they don't seem to be normally
distributed at all.
Did you remove some peaks?

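Richard's concern is that the timings may not be normally distributed. A common distribution-free alternative for two independent samples is the Mann-Whitney U (Wilcoxon rank-sum) test; nobody in the thread proposed it, so this is only an illustration. The sketch below computes U by pairwise counting (ties count one half); converting U to a p-value needs tables or a normal approximation and is left out. The function name is illustrative.

```c
#include <stddef.h>

/* Mann-Whitney U statistic for sample x (size nx) versus sample y
 * (size ny): the number of pairs (x[i], y[j]) with x[i] > y[j], plus
 * 0.5 for every tied pair. U lies in [0, nx * ny]; values near
 * nx * ny / 2 mean neither group tends to be larger. O(nx * ny),
 * which is fine for samples of a few dozen runs. */
static double mann_whitney_u(const double *x, size_t nx,
                             const double *y, size_t ny)
{
    double u = 0.0;

    for (size_t i = 0; i < nx; i++) {
        for (size_t j = 0; j < ny; j++) {
            if (x[i] > y[j])
                u += 1.0;
            else if (x[i] == y[j])
                u += 0.5;
        }
    }
    return u;
}
```

Swapping the arguments gives the complementary statistic; the two always sum to nx * ny.
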
> See the following experimental evidence of this statement:
> https://github.com/graysky2/kernel_gcc_patch
>
> ---
> diff -uprN a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
> --- a/arch/x86/include/asm/module.h	2013-11-03 18:41:51.000000000 -0500
> +++ b/arch/x86/include/asm/module.h	2013-12-15 06:21:24.351122516 -0500
> @@ -15,6 +15,16 @@
>  #define MODULE_PROC_FAMILY "586MMX "
>  #elif defined CONFIG_MCORE2
>  #define MODULE_PROC_FAMILY "CORE2 "
> +#elif defined CONFIG_MNATIVE
> +#define MODULE_PROC_FAMILY "NATIVE "
> +#elif defined CONFIG_MCOREI7
> +#define MODULE_PROC_FAMILY "COREI7 "
> +#elif defined CONFIG_MCOREI7AVX
> +#define MODULE_PROC_FAMILY "COREI7AVX "
> +#elif defined CONFIG_MCOREAVXI
> +#define MODULE_PROC_FAMILY "COREAVXI "
> +#elif defined CONFIG_MCOREAVX2
> +#define MODULE_PROC_FAMILY "COREAVX2 "
>  #elif defined CONFIG_MATOM
>  #define MODULE_PROC_FAMILY "ATOM "
>  #elif defined CONFIG_M686
> @@ -33,6 +43,18 @@
>  #define MODULE_PROC_FAMILY "K7 "
>  #elif defined CONFIG_MK8
>  #define MODULE_PROC_FAMILY "K8 "
> +#elif defined CONFIG_MK10
> +#define MODULE_PROC_FAMILY "K10 "
> +#elif defined CONFIG_MBARCELONA
> +#define MODULE_PROC_FAMILY "BARCELONA "
> +#elif defined CONFIG_MBOBCAT
> +#define MODULE_PROC_FAMILY "BOBCAT "
> +#elif defined CONFIG_MBULLDOZER
> +#define MODULE_PROC_FAMILY "BULLDOZER "
> +#elif defined CONFIG_MPILEDRIVER
> +#define MODULE_PROC_FAMILY "PILEDRIVER "
> +#elif defined CONFIG_MJAGUAR
> +#define MODULE_PROC_FAMILY "JAGUAR "
>  #elif defined CONFIG_MELAN
>  #define MODULE_PROC_FAMILY "ELAN "
>  #elif defined CONFIG_MCRUSOE
> diff -uprN a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
> --- a/arch/x86/Kconfig.cpu	2013-11-03 18:41:51.000000000 -0500
> +++ b/arch/x86/Kconfig.cpu	2013-12-15 06:21:24.351122516 -0500
> @@ -139,7 +139,7 @@ config MPENTIUM4
>  
>  
>  config MK6
> -bool "K6/K6-II/K6-III"
> +bool "AMD K6/K6-II/K6-III"
>  depends on X86_32
>  ---help---
>    Select this for an AMD K6-family processor.  Enables use of
> @@ -147,7 +147,7 @@ config MK6
>    flags to GCC.
>  
>  config MK7
> -bool "Athlon/Duron/K7"
> +bool "AMD Athlon/Duron/K7"
>  depends on X86_32
>  ---help---
>    Select this for an AMD Athlon K7-family processor.  Enables use of
> @@ -155,12 +155,55 @@ config MK7
>    flags to GCC.
>  
>  config MK8
> -bool "Opteron/Athlon64/Hammer/K8"
> +bool "AMD Opteron/Athlon64/Hammer/K8"
>  ---help---
>    Select this for an AMD Opteron or Athlon64 Hammer-family processor.
>    Enables use of some extended instructions, and passes appropriate
>    optimization flags to GCC.
>  
> +config MK10
> +bool "AMD 61xx/7x50/PhenomX3/X4/II/K10"
> +---help---
> +  Select this for an AMD 61xx Eight-Core Magny-Cours, Athlon X2 7x50,
> +Phenom X3/X4/II, Athlon II X2/X3/X4, or Turion II-family processor.
> +  Enables use of some extended instructions, and passes appropriate
> +  optimization flags to GCC.
> +
> +config MBARCELONA
> +bool "AMD Barcelona"
> +---help---
> +  Select this for AMD Barcelona and newer processors.
> +
> +  Enables -march=barcelona
> +
> +config MBOBCAT
> +bool "AMD Bobcat"
> +---help---
> +  Select this for AMD Bobcat processors.
> +
> +  Enables -march=btver1
> +
> +config MBULLDOZER
> +bool "AMD Bulldozer"
> +---help---
> +  Select this for AMD Bulldozer processors.
> +
> +  Enables -march=bdver1
> +
> +config MPILEDRIVER
> +bool "AMD Piledriver"
> +---help---
> +  Select this for AMD Piledriver processors.
> +
> +  Enables -march=bdver2
> +
> +config MJAGUAR
> +bool "AMD Jaguar"
> +---help---
> +  Select this for AMD Jaguar processors.
> +
> +  Enables -march=btver2
> +
>  config MCRUSOE
>  bool "Crusoe"
>  depends on X86_32
> @@ -251,8 +294,17 @@ config MPSC
>    using the cpu family field
>    in /proc/cpuinfo. Family 15 is an older Xeon, Family 6 a newer one.
>  
> +config MATOM
> +bool "Intel Atom"
> +---help---
> +
> +  Select this for the Intel Atom platform. Intel Atom CPUs have an
> +  in-order pipelining architecture and thus can benefit from
> +  accordingly optimized code. Use a recent GCC with specific Atom
> +  support in order to fully benefit from selecting this option.
> +
>  config MCORE2
> -bool "Core 2/newer Xeon"
> +bool "Intel Core 2"
>  ---help---
>  
>    Select this for Intel Core 2 and newer Core 2 Xeons (Xeon 51xx and
> @@ -260,14 +312,40 @@ config MCORE2
>    family in /proc/cpuinfo. Newer ones have 6 and older ones 15
>    (not a typo)
>  
> -config MATOM
> -bool "Intel Atom"
> +  Enables -march=core2
> +
> +config MCOREI7
> +bool "Intel Core i7"
>  ---help---
>  
> -  Select this for the Intel Atom platform. Intel Atom CPUs have an
> -  in-order pipelining architecture and thus can benefit from
> -  accordingly optimized code. Use a recent GCC with specific Atom
> -  support in order to fully benefit from selecting this option.
> +  Select this for the Intel Nehalem platform. Intel Nehalem processors
> +  include Core i3, i5, i7, Xeon: 34xx, 35xx, 55xx, 56xx, 75xx processors.
> +
> +  Enables -march=corei7
> +
> +config MCOREI7AVX
> +bool "Intel Core 2nd Gen AVX"
> +---help---
> +
> +  Select this for 2nd Gen Core processors including Sandy Bridge.
> +
> +  Enables -march=corei7-avx
> +
> +config MCOREAVXI
> +bool "Intel Core 3rd Gen AVX"
> +---help---
> +
> +  Select this for 3rd Gen Core processors including Ivy Bridge.
> +
> +  Enables -march=core-avx-i
> +
> +config MCOREAVX2
> +bool "Intel Core AVX2"
> +---help---
> +
> +  Select this for AVX2 enabled processors including Haswell.
> +
> +  Enables -march=core-avx2
>  
>  config GENERIC_CPU
>  bool "Generic-x86-64"
> @@ -276,6 +354,19 @@ config GENERIC_CPU
>    Generic x86-64 CPU.
>    Run equally well on all x86-64 CPUs.
>  
> +config MNATIVE
> + bool "Native optimizations autodetected by GCC"
> + ---help---
> +
> +   GCC 4.2 and above support -march=native, which automatically detects
> +   the optimum settings to use based on your processor. -march=native
> +   also detects and applies additional settings beyond -march specific
> +   to your CPU (e.g. -msse4). Unless you have a specific reason not to
> +   (e.g. distcc cross-compiling), you should probably be using
> +   -march=native rather than anything listed below.
> +
> +   Enables -march=native
> +
>  endchoice
>  
>  config X86_GENERIC
> @@ -300,7 +391,7 @@ config X86_INTERNODE_CACHE_SHIFT
>  config X86_L1_CACHE_SHIFT
>  int
>  default "7" if MPENTIUM4 || MPSC
> -default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
> +default "6" if MK7 || MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MJAGUAR || MPENTIUMM || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MATOM || MVIAC7 || X86_GENERIC || MNATIVE || GENERIC_CPU
>  default "4" if MELAN || M486 || MGEODEGX1
>  default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
>  
> @@ -331,11 +422,11 @@ config X86_ALIGNMENT_16
>  
>  config X86_INTEL_USERCOPY
>  def_bool y
> -depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2
> +depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || MNATIVE || X86_GENERIC || MK8 || MK7 || MK10 || MBARCELONA || MEFFICEON || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2
>  
>  config X86_USE_PPRO_CHECKSUM
>  def_bool y
> -depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MATOM
> +depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MK10 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MATOM || MNATIVE
>  
>  config X86_USE_3DNOW
>  def_bool y
> @@ -363,17 +454,17 @@ config X86_P6_NOP
>  
>  config X86_TSC
>  def_bool y
> -depends on ((MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MATOM) && !X86_NUMAQ) || X86_64
> +depends on ((MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MJAGUAR || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MATOM) && !X86_NUMAQ) || X86_64 || MNATIVE
>  
>  config X86_CMPXCHG64
>  def_bool y
> -depends on X86_PAE || X86_64 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM
> +depends on X86_PAE || X86_64 || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM || MNATIVE
>  
>  # this should be set for all -march=.. options where the compiler
>  # generates cmov.
>  config X86_CMOV
>  def_bool y
> -depends on (MK8 || MK7 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MATOM || MGEODE_LX)
> +depends on (MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MJAGUAR || MK7 || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MNATIVE || MATOM || MGEODE_LX)
>  
>  config X86_MINIMUM_CPU_FAMILY
>  int
> diff -uprN a/arch/x86/Makefile b/arch/x86/Makefile
> --- a/arch/x86/Makefile	2013-11-03 18:41:51.000000000 -0500
> +++ b/arch/x86/Makefile	2013-12-15 06:21:24.354455723 -0500
> @@ -61,11 +61,26 @@ else
>  KBUILD_CFLAGS += $(call cc-option,-mno-sse -mpreferred-stack-boundary=3)
>  
>          # FIXME - should be integrated in Makefile.cpu (Makefile_32.cpu)
> +        cflags-$(CONFIG_MNATIVE) += $(call cc-option,-march=native)
>          cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8)
> +        cflags-$(CONFIG_MK10) += $(call cc-option,-march=amdfam10)
> +        cflags-$(CONFIG_MBARCELONA) += $(call cc-option,-march=barcelona)
> +        cflags-$(CONFIG_MBOBCAT) += $(call cc-option,-march=btver1)
> +        cflags-$(CONFIG_MBULLDOZER) += $(call cc-option,-march=bdver1)
> +        cflags-$(CONFIG_MPILEDRIVER) += $(call cc-option,-march=bdver2)
> +        cflags-$(CONFIG_MJAGUAR) += $(call cc-option,-march=btver2)
>          cflags-$(CONFIG_MPSC) += $(call cc-option,-march=nocona)
>  
>          cflags-$(CONFIG_MCORE2) += \
> -                $(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
> +                $(call cc-option,-march=core2,$(call cc-option,-mtune=core2))
> +        cflags-$(CONFIG_MCOREI7) += \
> +                $(call cc-option,-march=corei7,$(call cc-option,-mtune=corei7))
> +        cflags-$(CONFIG_MCOREI7AVX) += \
> +                $(call cc-option,-march=corei7-avx,$(call cc-option,-mtune=corei7-avx))
> +        cflags-$(CONFIG_MCOREAVXI) += \
> +                $(call cc-option,-march=core-avx-i,$(call cc-option,-mtune=core-avx-i))
> +        cflags-$(CONFIG_MCOREAVX2) += \
> +                $(call cc-option,-march=core-avx2,$(call cc-option,-mtune=core-avx2))
>  cflags-$(CONFIG_MATOM) += $(call cc-option,-march=atom) \
>  $(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
>          cflags-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=generic)
> diff -uprN a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
> --- a/arch/x86/Makefile_32.cpu	2013-11-03 18:41:51.000000000 -0500
> +++ b/arch/x86/Makefile_32.cpu	2013-12-15 06:21:24.354455723 -0500
> @@ -23,7 +23,14 @@ cflags-$(CONFIG_MK6)+= -march=k6
>  # Please note, that patches that add -march=athlon-xp and friends are pointless.
>  # They make zero difference whatsosever to performance at this time.
>  cflags-$(CONFIG_MK7)+= -march=athlon
> +cflags-$(CONFIG_MNATIVE) += $(call cc-option,-march=native)
> ácflags-$(CONFIG_MK8)+= $(call cc-option,-march=k8,-march=athlon)
> +cflags-$(CONFIG_MK10)+= $(call cc-option,-march=amdfam10,-march=athlon)
> +cflags-$(CONFIG_MBARCELONA)+= $(call cc-option,-march=barcelona,-march=athlon)
> +cflags-$(CONFIG_MBOBCAT)+= $(call cc-option,-march=btver1,-march=athlon)
> +cflags-$(CONFIG_MBULLDOZER)+= $(call cc-option,-march=bdver1,-march=athlon)
> +cflags-$(CONFIG_MPILEDRIVER)+= $(call cc-option,-march=bdver2,-march=athlon)
> +cflags-$(CONFIG_MJAGUAR)+= $(call cc-option,-march=btver2,-march=athlon)
>  cflags-$(CONFIG_MCRUSOE)+= -march=i686 $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
>  cflags-$(CONFIG_MEFFICEON)+= -march=i686 $(call tune,pentium3) $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
>  cflags-$(CONFIG_MWINCHIPC6)+= $(call cc-option,-march=winchip-c6,-march=i586)
> @@ -32,6 +39,10 @@ cflags-$(CONFIG_MCYRIXIII)+= $(call cc-
>  cflags-$(CONFIG_MVIAC3_2)+= $(call cc-option,-march=c3-2,-march=i686)
>  cflags-$(CONFIG_MVIAC7)+= -march=i686
>  cflags-$(CONFIG_MCORE2)+= -march=i686 $(call tune,core2)
> +cflags-$(CONFIG_MCOREI7)+= -march=i686 $(call tune,corei7)
> +cflags-$(CONFIG_MCOREI7AVX)+= -march=i686 $(call tune,corei7-avx)
> +cflags-$(CONFIG_MCOREAVXI)+= -march=i686 $(call tune,core-avx-i)
> +cflags-$(CONFIG_MCOREAVX2)+= -march=i686 $(call tune,core-avx2)
>  cflags-$(CONFIG_MATOM)+= $(call cc-option,-march=atom,$(call cc-option,-march=core2,-march=i686)) \
>  $(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

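The quoted Kconfig hunk selects `X86_L1_CACHE_SHIFT` = 6, i.e. 64-byte cache lines, for every new option. In the kernel this value becomes `L1_CACHE_BYTES` (`1 << L1_CACHE_SHIFT` in `arch/x86/include/asm/cache.h`), which sizes cache-line alignment. A userspace sketch of that relationship; the hard-coded `CONFIG_X86_L1_CACHE_SHIFT` and the `struct padded_counter` type are stand-ins for illustration, not kernel code:

```c
#include <stdalign.h>
#include <stddef.h>

/* Stand-in for the value Kconfig would generate; the quoted patch
 * selects "6" for the K10/Bulldozer/Core i7/... options. */
#define CONFIG_X86_L1_CACHE_SHIFT 6

/* Same derivation as arch/x86/include/asm/cache.h. */
#define L1_CACHE_SHIFT (CONFIG_X86_L1_CACHE_SHIFT)
#define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)

/* Hypothetical per-CPU counter: giving each instance its own cache
 * line keeps CPUs updating neighbouring counters from contending for
 * the same line (false sharing). */
struct padded_counter {
    alignas(L1_CACHE_BYTES) unsigned long count;
};

/* Bytes between consecutive counters in an array. */
static size_t counter_stride(void)
{
    return sizeof(struct padded_counter);
}
```

A wrong shift value does not break correctness, it only changes padding, which is one reason such options are hard to evaluate by benchmark alone.
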


-- 
Thanks,
//richard


* Re: [PATCH] expand micro-optimizations in kernel to newer model CPUs
  2013-12-15 12:27               ` Richard Weinberger
@ 2013-12-15 12:42                 ` John
  2013-12-15 13:23                   ` Borislav Petkov
  2013-12-15 17:31                   ` Richard Weinberger
  0 siblings, 2 replies; 11+ messages in thread
From: John @ 2013-12-15 12:42 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: H. Peter Anvin, akpm, david.heidelberger, tglx, mingo, x86, lkml





----- Original Message -----
> From: Richard Weinberger <>
>
> A *very* small speedup.
> 
> And I really doubt your numbers.
> Why are you using ANOVA? You're comparing *two* groups not more than two.
> I had a quick look at your raw numbers, they don't seem to be normally
> distributed at all.
> Did you remove some peaks?
> 


Hi Richard.  Thank you for your interest.  Yes, it is a small speedup, as I mentioned, but I'll note that the current kernel code includes the MCORE2 option.  I tested this against some of the newer ones and they are all on par with each other.  For example, here are differences in median values:

CPU          Difference in median value
core2        +87.5 ms
corei7-avx   +79.7 ms
core-avx-i   +257.2 ms
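
One way to compute a difference in medians like those in the table: sort each group's run times and take the middle value (the mean of the two middle values for an even count). The sign convention here (generic minus optimized, so a positive value means the optimized kernel had the lower median) is an assumption; the message does not state which convention the table uses. Function names are illustrative.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n > 0 values; sorts v in place. */
static double median(double *v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_double);
    if (n % 2)
        return v[n / 2];
    return (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

/* median(generic) - median(optimized), in the input's units.
 * Positive means the optimized build had the lower median time.
 * Both arrays are reordered. */
static double median_difference(double *generic, size_t ng,
                                double *optimized, size_t no)
{
    return median(generic, ng) - median(optimized, no);
}
```
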

I am using ANOVA to establish that the generic group differs from the optimized group.  I have always used ANOVA for this sort of comparison whether using two or more groups.  In fact, thumb through any medical or scientific journal and you'll see peer-reviewed articles doing the same.

I did not remove any data points; I do not understand why you don't think the sets are normally distributed.  Did you see the normal quantile plots?  Additionally, the population variances are fairly equal (Levene and Bartlett tests).
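
On the two-group question: with exactly two groups, the one-way ANOVA F statistic equals the square of Student's pooled-variance t statistic, so the F test and the two-sided t-test give the same p-value and rest on the same normality and equal-variance assumptions. The Levene and Bartlett results speak to the variance assumption, not to normality. A sketch that computes both statistics from raw samples so the identity can be checked numerically; it works with t squared so no `sqrt` from libm is needed, and the function names are illustrative.

```c
#include <stddef.h>

/* Arithmetic mean of n > 0 values. */
static double mean(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s / (double)n;
}

/* Sum of squared deviations of v from m. */
static double sum_sq_dev(const double *v, size_t n, double m)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += (v[i] - m) * (v[i] - m);
    return s;
}

/* One-way ANOVA F for two groups (requires na + nb > 2):
 * between-group mean square (1 df) over within-group mean square
 * (na + nb - 2 df). */
static double anova_f_two_groups(const double *a, size_t na,
                                 const double *b, size_t nb)
{
    double ma = mean(a, na), mb = mean(b, nb);
    double grand = (na * ma + nb * mb) / (double)(na + nb);
    double ssb = na * (ma - grand) * (ma - grand)
               + nb * (mb - grand) * (mb - grand);
    double ssw = sum_sq_dev(a, na, ma) + sum_sq_dev(b, nb, mb);
    return ssb / (ssw / (double)(na + nb - 2));
}

/* Square of Student's pooled-variance t statistic. */
static double student_t_squared(const double *a, size_t na,
                                const double *b, size_t nb)
{
    double ma = mean(a, na), mb = mean(b, nb);
    double sp2 = (sum_sq_dev(a, na, ma) + sum_sq_dev(b, nb, mb))
                 / (double)(na + nb - 2);
    double d = ma - mb;
    return d * d / (sp2 * (1.0 / na + 1.0 / nb));
}
```

For the samples {1, 2, 3} and {4, 5, 6, 7} both functions return 15, up to rounding.
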


* Re: [PATCH] expand micro-optimizations in kernel to newer model CPUs
  2013-12-15 12:42                 ` John
@ 2013-12-15 13:23                   ` Borislav Petkov
  2013-12-15 17:31                   ` Richard Weinberger
  1 sibling, 0 replies; 11+ messages in thread
From: Borislav Petkov @ 2013-12-15 13:23 UTC (permalink / raw)
  To: John
  Cc: Richard Weinberger, H. Peter Anvin, akpm, david.heidelberger,
	tglx, mingo, x86, lkml

On Sun, Dec 15, 2013 at 04:42:50AM -0800, John wrote:
> I am using ANOVA to establish that the generic group differs from the
> optimized group.

You probably should run a couple of other benchmarks, in addition, for
greater confidence that this optimization actually brings any palpable
improvement and is not just causing code bloat.

For example, last time I did a perf profile of a kernel build on AMD
F15h, the PILEDRIVER "optimization" was actually even a bit worse than
the standard MK8 one.

http://marc.info/?l=linux-kernel&m=138081947417204

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH] expand micro-optimizations in kernel to newer model CPUs
  2013-12-15 12:42                 ` John
  2013-12-15 13:23                   ` Borislav Petkov
@ 2013-12-15 17:31                   ` Richard Weinberger
  1 sibling, 0 replies; 11+ messages in thread
From: Richard Weinberger @ 2013-12-15 17:31 UTC (permalink / raw)
  To: John
  Cc: Richard Weinberger, H. Peter Anvin, akpm, david.heidelberger,
	tglx, mingo, x86, lkml

On Sunday, 15 December 2013, 04:42:50, John wrote:
> ----- Original Message -----
> 
> > From: Richard Weinberger <>
> > 
> > A *very* small speedup.
> > 
> > And I really doubt your numbers.
> > Why are you using ANOVA? You're comparing *two* groups not more than two.
> > I had a quick look at your raw numbers, they don't seem to be normally
> > distributed at all.
> > Did you remove some peaks?
> 
> Hi Richard.  Thank you for your interest.  Yes, it is a small speedup, as I
> mentioned, but I'll note that the current kernel code includes the MCORE2
> option.  I tested this against some of the newer ones and they are all on
> par with each other.  For example, here are differences in median values:
> 
> CPU          Difference in median value
> core2        +87.5 ms
> corei7-avx   +79.7 ms
> core-avx-i   +257.2 ms
> 
> I am using ANOVA to establish that the generic group differs from the
> optimized group.  I have always used ANOVA for this sort of comparison
> whether using two or more groups.  In fact, thumb through any medical or
> scientific journal and you'll see peer-reviewed articles doing the
> same.

Just because others do so does not make it valid.
Why not a plain T-test?

> I did not remove any datapoints; I do not understand why you don't think the
> sets are normally distributed.  Did you see the normal quantile plots?
>  Additionally, the population variances are fairly equal (Levene and
> Bartlett tests).

Just perform a simple Kolmogorov-Smirnov test, e.g. with
http://jumk.de/statistic-calculator or http://www.physics.csbsju.edu/stats/KS-test.n.plot_form.html, and you'll find out.
IIRC from my statistics 101 you'd have to perform a two-sided ANOVA test
if your data points are not normally distributed.
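
The Kolmogorov-Smirnov statistic referred to here is the largest vertical gap between the sample's empirical CDF and a reference CDF. A sketch of the one-sample statistic D against any reference CDF passed as a function pointer; for a normality check you would supply a normal CDF with the sample's mean and standard deviation (for example built on `erf` from `<math.h>`). Estimating those parameters from the same data makes the standard KS critical values too conservative (the test rarely rejects normality); Lilliefors' variant corrects for this, so D alone is not a verdict. `uniform01_cdf` is only a stand-in reference distribution, and all names are illustrative.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
static int ks_cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* One-sample Kolmogorov-Smirnov statistic
 *   D = max_i max(i/n - F(x_(i)), F(x_(i)) - (i-1)/n)
 * over the sorted sample x_(1) <= ... <= x_(n). Sorts x in place. */
static double ks_statistic(double *x, size_t n, double (*cdf)(double))
{
    double d = 0.0;

    qsort(x, n, sizeof *x, ks_cmp_double);
    for (size_t i = 0; i < n; i++) {
        double f = cdf(x[i]);
        double above = (double)(i + 1) / n - f; /* ECDF just after x[i] */
        double below = f - (double)i / n;       /* ECDF just before x[i] */
        if (above > d)
            d = above;
        if (below > d)
            d = below;
    }
    return d;
}

/* Stand-in reference CDF: uniform on [0, 1]. */
static double uniform01_cdf(double x)
{
    if (x < 0.0)
        return 0.0;
    return x > 1.0 ? 1.0 : x;
}
```
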

That said, we should not waste time with statistics games.
What we need are reliable and reproducible results.
As Boris requested many times before...

Thanks,
//richard


* Re: [PATCH] expand micro-optimizations in kernel to newer model CPUs
  2013-12-15 12:00             ` John
  2013-12-15 12:27               ` Richard Weinberger
@ 2013-12-16 14:28               ` Ingo Molnar
  2013-12-17 12:59                 ` Austin S Hemmelgarn
  1 sibling, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2013-12-16 14:28 UTC (permalink / raw)
  To: John
  Cc: H. Peter Anvin, akpm, david.heidelberger, tglx, mingo, x86, lkml,
	Linus Torvalds, Peter Zijlstra


* John <da_audiophile@yahoo.com> wrote:

> This patch has been tested on and known to work with kernel versions 
> from 3.2 up to the latest git version (pulled on 12/14/2013).
>
> This patch will expand the number of microarchitectures to include 
> new processors including: AMD K10-family, AMD Family 10h 
> (Barcelona), AMD Family 14h (Bobcat), AMD Family 15h (Bulldozer), 
> AMD Family 15h (Piledriver), AMD Family 16h (Jaguar), Intel 1st Gen 
> Core i3/i5/i7 (Nehalem), Intel 2nd Gen Core i3/i5/i7 (Sandybridge), 
> Intel 3rd Gen Core i3/i5/i7 (Ivybridge), and Intel 4th Gen Core 
> i3/i5/i7 (Haswell). It also offers the compiler the 'native' flag.

So let me (again) follow Linus's general advice to say 'no' to patches 
more forcefully, so that people don't go down potential dead ends for 
too long without strong negative feedback from upstream. :-)

This series does not look convincing enough to me. My complaints:

 - I'm not convinced the numbers are right. Rarely are such tiny
   compiler optimizations measurable in integer-only kernel code ...
   Too noisy benchmarks were used. More precise measurements done by
   Boris showed no statistically significant improvements:

      http://marc.info/?l=linux-kernel&m=138081947417204

 - Modern CPUs have inherently high noise: boot-to-boot variance is 
   often higher on modern systems with large caches than the speedup 
   claimed by optimization options ...

 - I'm not convinced the whole concept is long term maintainable to 
   begin with. When Linux on x86 began we used to have just 2-3 major 
   CPU models to care about, so it made sense. That count grew rapidly 
   and today we have dozens (if not hundreds) of models, families and
   variants and our 'optimization' options are just one big 
   fragmented, rarely tested mess with essentially random compiler 
   flags thrown at it.

 - The cost of getting optimizations wrong by going away from sane 
   defaults is probably high as well: see the case where Boris
   measured a regression from an 'optimization' option.

 - GCC itself changes as well, so a seemingly good but rarely used
   optimization flag could get out of sync and hurt performance on 
   rarer, rarely tested CPU models. It's sometimes safer to go with 
   the herd and use good, sensible defaults in most situations.

For those reasons I think we should just strip out all the current 
outdated micro-management of models and go to more logical, much
broader optimization categories such as:

   "Optimize for modern Intel CPUs"
   "Optimize for modern AMD CPUs"

because most of the day to day measurement and testing work is 
concentrated on modern CPUs.

We might not even want to make a vendor differentiation there and just 
do a generic:

   "Optimize for modern x86 CPUs"
   
With perhaps a "workarounds" sub-option opening up:

   "Optimization workarounds" [x]
      "Intel Atom CPUs" [x]

Because occasionally there will be oddball yet common CPUs that need 
starkly different optimizations/workarounds. Naming it a 'workaround' 
creates an incentive to return such platforms to the common options.

I.e. handle and document the exceptions, and try to minimize them - 
instead of trying to enumerate every CPU model which is IMHO a losing 
game ...

[ If that is done then we also need much more statistically convincing
  methods to test how well a kernel's compiler options perform. ]

Thanks,

        Ingo


* Re: [PATCH] expand micro-optimizations in kernel to newer model CPUs
  2013-12-16 14:28               ` Ingo Molnar
@ 2013-12-17 12:59                 ` Austin S Hemmelgarn
  0 siblings, 0 replies; 11+ messages in thread
From: Austin S Hemmelgarn @ 2013-12-17 12:59 UTC (permalink / raw)
  To: Ingo Molnar, John
  Cc: H. Peter Anvin, akpm, david.heidelberger, tglx, mingo, x86, lkml,
	Linus Torvalds, Peter Zijlstra

On 2013-12-16 09:28, Ingo Molnar wrote:
> 
> * John <da_audiophile@yahoo.com> wrote:
> 
>> This patch has been tested on and known to work with kernel versions 
>> from 3.2 up to the latest git version (pulled on 12/14/2013).
>>
>> This patch will expand the number of microarchitectures to include 
>> new processors including: AMD K10-family, AMD Family 10h 
>> (Barcelona), AMD Family 14h (Bobcat), AMD Family 15h (Bulldozer), 
>> AMD Family 15h (Piledriver), AMD Family 16h (Jaguar), Intel 1st Gen 
>> Core i3/i5/i7 (Nehalem), Intel 2nd Gen Core i3/i5/i7 (Sandybridge), 
>> Intel 3rd Gen Core i3/i5/i7 (Ivybridge), and Intel 4th Gen Core 
>> i3/i5/i7 (Haswell). It also offers the compiler the 'native' flag.
> 
> So let me (again) follow Linus's general advice to say 'no' to patches 
> more forcefully, so that people don't go down potential dead ends for 
> too long without strong negative feedback from upstream. :-)
> 
> This series does not look convincing enough to me. My complaints:
> 
>  - I'm not convinced the numbers are right. Rarely are such tiny
>    compiler optimizations measurable in integer-only kernel code ...
>    Too noisy benchmarks were used. More precise measurements done by
>    Boris showed no statistically significant improvements:
> 
>       http://marc.info/?l=linux-kernel&m=138081947417204
> 
>  - Modern CPUs have inherently high noise: boot-to-boot variance is 
>    often higher on modern systems with large caches than the speedup 
>    claimed by optimization options ...
> 
>  - I'm not convinced the whole concept is long term maintainable to 
>    begin with. When Linux on x86 began we used to have just 2-3 major 
>    CPU models to care about, so it made sense. That count grew rapidly 
>    and today we have dozens (if not hundreds) of models, families and
>    variants and our 'optimization' options are just one big 
>    fragmented, rarely tested mess with essentially random compiler 
>    flags thrown at it.
> 
>  - The cost of getting optimizations wrong by going away from sane 
>    defaults is probably high as well: see the case where Boris
>    measured a regression from an 'optimization' option.
> 
>  - GCC itself changes as well, so a seemingly good but rarely used
>    optimization flag could get out of sync and hurt performance on 
>    rarer, rarely tested CPU models. It's sometimes safer to go with 
>    the herd and use good, sensible defaults in most situations.
> 
> For those reasons I think we should just strip out all the current 
> outdated micro-management of models and go to more logical, much
> broader optimization categories such as:
> 
>    "Optimize for modern Intel CPUs"
>    "Optimize for modern AMD CPUs"
> 
> because most of the day to day measurement and testing work is 
> concentrated on modern CPUs.
> 
> We might not even want to make a vendor differentiation there and just 
> do a generic:
> 
>    "Optimize for modern x86 CPUs"
>    
> With perhaps a "workarounds" sub-option opening up:
> 
>    "Optimization workarounds" [x]
>       "Intel Atom CPUs" [x]
> 
> Because occasionally there will be oddball yet common CPUs that need 
> starkly different optimizations/workarounds. Naming it a 'workaround' 
> creates an incentive to return such platforms to the common options.
> 
> I.e. handle and document the exceptions, and try to minimize them - 
> instead of trying to enumerate every CPU model which is IMHO a losing 
> game ...
> 
> [ If that is done then we also need much more statistically convincing
>   methods to test how well a kernel's compiler options perform. ]

As an alternative to removing them altogether, why not make options
other than CONFIG_GENERIC_CPU and the big exceptions (such as Atom CPUs)
depend on CONFIG_EXPERT and mark the kernel as tainted?  This way
people can still use the optimizations, and the developers have recourse
for dismissing bugs that are probably caused by them.

As far as I see things, people who are using these options fall into
four general categories:
1. People running HPC clusters.
2. People using embedded systems.
3. Power users (who actually do testing to determine how worthwhile the
optimization is).
4. Idiots who think that building a custom kernel is cool, but don't
have the ability to deal with potential fallout.

People in the first category will usually have sufficient resources to
hire someone to fix problems caused by the optimizations.  People in the
second and third categories usually know what they are doing and can
deal with any problems that may arise. People in the fourth category
shouldn't be using a custom kernel build to begin with.

In general, following the above suggestion, people in the first three
groups will likely be happy to rebuild the kernel with CONFIG_GENERIC_CPU
instead of system-specific optimizations to try to reproduce a bug.

I understand wanting to make it harder for people to shoot themselves
in the foot, but there are still a multitude of legitimate uses for these
options, and I would personally think that providing a standardized
methodology for optimizing the kernel is generally preferable to making
everyone who wants to do so hack things together.

end of thread [~2013-12-17 12:59 UTC]

Thread overview: 11+ messages
2013-12-08 15:53 [PATCH] expand micro-optimizations in kernel to newer model CPUs John
2013-12-10  3:51 ` David Heidelberger
2013-12-13 12:37   ` Austin S Hemmelgarn
     [not found]     ` <1387051250.86178.YahooMailNeo@web140005.mail.bf1.yahoo.com>
     [not found]       ` <ce679ca5-bf83-4c2a-9234-859cc7a4206b@email.android.com>
     [not found]         ` <1387057337.97000.YahooMailNeo@web140004.mail.bf1.yahoo.com>
     [not found]           ` <52ACECC4.208@zytor.com>
2013-12-15 12:00             ` John
2013-12-15 12:27               ` Richard Weinberger
2013-12-15 12:42                 ` John
2013-12-15 13:23                   ` Borislav Petkov
2013-12-15 17:31                   ` Richard Weinberger
2013-12-16 14:28               ` Ingo Molnar
2013-12-17 12:59                 ` Austin S Hemmelgarn
2013-12-15 12:23     ` John