linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John <da_audiophile@yahoo.com>
To: lkml <linux-kernel@vger.kernel.org>
Cc: "david.heidelberger@ixit.cz" <david.heidelberger@ixit.cz>
Subject: [PATCH] expand micro-optimizations in kernel to newer model CPUs
Date: Sun, 8 Dec 2013 07:53:20 -0800 (PST)	[thread overview]
Message-ID: <1386518000.90452.YahooMailNeo@web140004.mail.bf1.yahoo.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 876 bytes --]

I have been maintaining the attached patch since kernel v3.9.[1]  I submit for review the most recent incarnation which works with the v3.12 tree.  As you can see by the ANOVA plots referenced in the comments, these micro optimizations are value-added statistically based on a compilation endpoint and are on-par with the included "core2" option in the mainline kernel itself.

I maintain an unofficial Arch Linux kernel repo and have been building/packaging kernels using this patch for many different CPUs and Arches.  I feel this code has been tested by the >2,500 users of my repo on many different CPUs and under both x86 and x86_64 systems[2] and feel it is worth for inclusion into the mainline kernel.

Please cc me on replies as I am NOT a regular subscriber to lkml.  Thank you.

1. https://github.com/graysky2/kernel_gcc_patch
2, http://repo-ck.com/stats.pdf

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: kernel-312-gcc48-1.patch --]
[-- Type: text/x-patch; name="kernel-312-gcc48-1.patch", Size: 14066 bytes --]

The original patch was written by jeroen@linuxforge.net
http://www.linuxforge.net/linux/kernel/kernel-33-gcc47-0.patch

Benchmarks by graysky

Three different machines running a generic x86-64 kernel and an otherwise identical kernel running with the optimized gcc options were tested using a make based endpoint.

Conclusion:
There are small but real speed increases using a make endpoint to running with this patch.

Details:
1) Three test machines: Intel Xeon X3360, Intel i7-2620M, Intel Core i7-3660K.
2) All ran the make benchmark (linked below) 35 times while booted into a 'generic' kernel. Then all ran the same make benchmark 35 times after booting into an optimized kernel. Below are the optimizations chosen for each machine.
2a) X3360 = core2
2b) i7-2620M = corei7-avx
2c) i7-3660K = core-avx-i
3) Analyzed resulting distributions for statistical significance via ANOVA plots that clearly show statistically significant albeit small differences.

Links to ANOVA plots:
http://s19.postimage.org/68urcofzn/corei7_avx.png
http://s19.postimage.org/ozwomuak3/core_avx_i.png
http://s19.postimage.org/d0l6fj4z7/core2.png

Discussion:
1) All the assumptions for ANOVA are met:
*Data are normally distributed as show in the normal quantile plots.
*The population variances are fairly equal (Levene and Barlett tests).

2) The ANOVA plots clearly show significance.
*Pair-wise analysis by Tukey-Kramer shows significance at the 0.05 level for all CPUs compared.
Below are the differences in median values:

core2       +87.5 ms
corei7-avx  +79.7 ms
core-avx-i  +257.2 ms

References:
Bash script that controls the benchmark: https://github.com/graysky2/bin/blob/master/bench
Log file generated by script: http://repo-ck.com/bench/compile_time_optimization.txt.gz

---
--- linux-3.12/arch/x86/include/asm/module.h	2013-02-18 18:58:34.000000000 -0500
+++ linux-3.12.mod/arch/x86/include/asm/module.h	2013-04-11 17:40:04.064910866 -0400
@@ -15,6 +15,16 @@
 #define MODULE_PROC_FAMILY "586MMX "
 #elif defined CONFIG_MCORE2
 #define MODULE_PROC_FAMILY "CORE2 "
+#elif defined CONFIG_MNATIVE
+#define MODULE_PROC_FAMILY "NATIVE "
+#elif defined CONFIG_MCOREI7
+#define MODULE_PROC_FAMILY "COREI7 "
+#elif defined CONFIG_MCOREI7AVX
+#define MODULE_PROC_FAMILY "COREI7AVX "
+#elif defined CONFIG_MCOREAVXI
+#define MODULE_PROC_FAMILY "COREAVXI "
+#elif defined CONFIG_MCOREAVX2
+#define MODULE_PROC_FAMILY "COREAVX2 "
 #elif defined CONFIG_MATOM
 #define MODULE_PROC_FAMILY "ATOM "
 #elif defined CONFIG_M686
@@ -33,6 +43,16 @@
 #define MODULE_PROC_FAMILY "K7 "
 #elif defined CONFIG_MK8
 #define MODULE_PROC_FAMILY "K8 "
+#elif defined CONFIG_MK10
+#define MODULE_PROC_FAMILY "K10 "
+#elif defined CONFIG_MBARCELONA
+#define MODULE_PROC_FAMILY "BARCELONA "
+#elif defined CONFIG_MBOBCAT
+#define MODULE_PROC_FAMILY "BOBCAT "
+#elif defined CONFIG_MBULLDOZER
+#define MODULE_PROC_FAMILY "BULLDOZER "
+#elif defined CONFIG_MPILEDRIVER
+#define MODULE_PROC_FAMILY "PILEDRIVER "
 #elif defined CONFIG_MELAN
 #define MODULE_PROC_FAMILY "ELAN "
 #elif defined CONFIG_MCRUSOE
--- linux-3.12/arch/x86/Kconfig.cpu	2013-02-18 18:58:34.000000000 -0500
+++ linux-3.12.mod/arch/x86/Kconfig.cpu	2013-04-06 08:25:58.095745643 -0400
@@ -139,7 +139,7 @@
 
 
 config MK6
-	bool "K6/K6-II/K6-III"
+	bool "AMD K6/K6-II/K6-III"
 	depends on X86_32
 	---help---
 	  Select this for an AMD K6-family processor.  Enables use of
@@ -147,7 +147,7 @@
 	  flags to GCC.
 
 config MK7
-	bool "Athlon/Duron/K7"
+	bool "AMD Athlon/Duron/K7"
 	depends on X86_32
 	---help---
 	  Select this for an AMD Athlon K7-family processor.  Enables use of
@@ -155,12 +155,48 @@
 	  flags to GCC.
 
 config MK8
-	bool "Opteron/Athlon64/Hammer/K8"
+	bool "AMD Opteron/Athlon64/Hammer/K8"
 	---help---
 	  Select this for an AMD Opteron or Athlon64 Hammer-family processor.
 	  Enables use of some extended instructions, and passes appropriate
 	  optimization flags to GCC.
 
+config MK10
+	bool "AMD 61xx/7x50/PhenomX3/X4/II/K10"
+	---help---
+	  Select this for an AMD 61xx Eight-Core Magny-Cours, Athlon X2 7x50,
+		Phenom X3/X4/II, Athlon II X2/X3/X4, or Turion II-family processor.
+	  Enables use of some extended instructions, and passes appropriate
+	  optimization flags to GCC.
+
+config MBARCELONA
+	bool "AMD Barcelona"
+	---help---
+	  Select this for AMD Barcelona and newer processors.
+
+	  Enables -march=barcelona
+
+config MBOBCAT
+	bool "AMD Bobcat"
+	---help---
+	  Select this for AMD Bobcat processors.
+
+	  Enables -march=btver1
+
+config MBULLDOZER
+	bool "AMD Bulldozer"
+	---help---
+	  Select this for AMD Bulldozer processors.
+
+	  Enables -march=bdver1
+
+config MPILEDRIVER
+	bool "AMD Piledriver"
+	---help---
+	  Select this for AMD Piledriver processors.
+
+	  Enables -march=bdver2
+
 config MCRUSOE
 	bool "Crusoe"
 	depends on X86_32
@@ -251,8 +287,17 @@
 	  using the cpu family field
 	  in /proc/cpuinfo. Family 15 is an older Xeon, Family 6 a newer one.
 
+config MATOM
+	bool "Intel Atom"
+	---help---
+
+	  Select this for the Intel Atom platform. Intel Atom CPUs have an
+	  in-order pipelining architecture and thus can benefit from
+	  accordingly optimized code. Use a recent GCC with specific Atom
+	  support in order to fully benefit from selecting this option.
+
 config MCORE2
-	bool "Core 2/newer Xeon"
+	bool "Intel Core 2"
 	---help---
 
 	  Select this for Intel Core 2 and newer Core 2 Xeons (Xeon 51xx and
@@ -260,14 +305,40 @@
 	  family in /proc/cpuinfo. Newer ones have 6 and older ones 15
 	  (not a typo)
 
-config MATOM
-	bool "Intel Atom"
+	  Enables -march=core2
+
+config MCOREI7
+	bool "Intel Core i7"
 	---help---
 
-	  Select this for the Intel Atom platform. Intel Atom CPUs have an
-	  in-order pipelining architecture and thus can benefit from
-	  accordingly optimized code. Use a recent GCC with specific Atom
-	  support in order to fully benefit from selecting this option.
+	  Select this for the Intel Nehalem platform. Intel Nehalem proecessors
+	  include Core i3, i5, i7, Xeon: 34xx, 35xx, 55xx, 56xx, 75xx processors.
+
+	  Enables -march=corei7
+
+config MCOREI7AVX
+	bool "Intel Core 2nd Gen AVX"
+	---help---
+
+	  Select this for 2nd Gen Core processors including Sandy Bridge.
+
+	  Enables -march=corei7-avx
+
+config MCOREAVXI
+	bool "Intel Core 3rd Gen AVX"
+	---help---
+
+	  Select this for 3rd Gen Core processors including Ivy Bridge.
+
+	  Enables -march=core-avx-i
+
+config MCOREAVX2
+	bool "Intel Core AVX2"
+	---help---
+
+	  Select this for AVX2 enabled processors including Haswell.
+
+	  Enables -march=core-avx2
 
 config GENERIC_CPU
 	bool "Generic-x86-64"
@@ -276,6 +347,19 @@
 	  Generic x86-64 CPU.
 	  Run equally well on all x86-64 CPUs.
 
+config MNATIVE
+ bool "Native optimizations autodetected by GCC"
+ ---help---
+
+   GCC 4.2 and above support -march=native, which automatically detects
+   the optimum settings to use based on your processor. -march=native 
+   also detects and applies additional settings beyond -march specific
+   to your CPU, (eg. -msse4). Unless you have a specific reason not to
+   (e.g. distcc cross-compiling), you should probably be using
+   -march=native rather than anything listed below.
+
+   Enables -march=native
+
 endchoice
 
 config X86_GENERIC
@@ -300,7 +384,7 @@
 config X86_L1_CACHE_SHIFT
 	int
 	default "7" if MPENTIUM4 || MPSC
-	default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
+	default "6" if MK7 || MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MPENTIUMM || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MATOM || MVIAC7 || X86_GENERIC || MNATIVE || GENERIC_CPU
 	default "4" if MELAN || M486 || MGEODEGX1
 	default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
 
@@ -331,11 +415,11 @@
 
 config X86_INTEL_USERCOPY
 	def_bool y
-	depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2
+	depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || MNATIVE || X86_GENERIC || MK8 || MK7 || MK10 || MBARCELONA || MEFFICEON || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2
 
 config X86_USE_PPRO_CHECKSUM
 	def_bool y
-	depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MATOM
+	depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MK10 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MATOM || MNATIVE
 
 config X86_USE_3DNOW
 	def_bool y
@@ -363,17 +447,17 @@
 
 config X86_TSC
 	def_bool y
-	depends on ((MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MATOM) && !X86_NUMAQ) || X86_64
+	depends on ((MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MCOREI7 || MCOREI7-AVX || MATOM) && !X86_NUMAQ) || X86_64 || MNATIVE
 
 config X86_CMPXCHG64
 	def_bool y
-	depends on X86_PAE || X86_64 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM
+	depends on X86_PAE || X86_64 || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM || MNATIVE
 
 # this should be set for all -march=.. options where the compiler
 # generates cmov.
 config X86_CMOV
 	def_bool y
-	depends on (MK8 || MK7 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MATOM || MGEODE_LX)
+	depends on (MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MK7 || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MNATIVE || MATOM || MGEODE_LX)
 
 config X86_MINIMUM_CPU_FAMILY
 	int
--- linux-3.12/arch/x86/Makefile 2012-12-10 22:30:57.000000000 -0500
+++ linux-3.12.mod/arch/x86/Makefile	2013-04-06 07:36:39.349203123 -0400
@@ -57,11 +57,25 @@
 	KBUILD_CFLAGS += $(call cc-option,-mno-sse -mpreferred-stack-boundary=3)
 
         # FIXME - should be integrated in Makefile.cpu (Makefile_32.cpu)
+        cflags-$(CONFIG_MNATIVE) += $(call cc-option,-march=native)
         cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8)
+        cflags-$(CONFIG_MK10) += $(call cc-option,-march=amdfam10)
+        cflags-$(CONFIG_MBARCELONA) += $(call cc-option,-march=barcelona)
+        cflags-$(CONFIG_MBOBCAT) += $(call cc-option,-march=btver1)
+        cflags-$(CONFIG_MBULLDOZER) += $(call cc-option,-march=bdver1)
+        cflags-$(CONFIG_MPILEDRIVER) += $(call cc-option,-march=bdver2)
         cflags-$(CONFIG_MPSC) += $(call cc-option,-march=nocona)
 
         cflags-$(CONFIG_MCORE2) += \
-                $(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
+                $(call cc-option,-march=core2,$(call cc-option,-mtune=core2))
+        cflags-$(CONFIG_MCOREI7) += \
+                $(call cc-option,-march=corei7,$(call cc-option,-mtune=corei7))
+        cflags-$(CONFIG_MCOREI7AVX) += \
+                $(call cc-option,-march=corei7-avx,$(call cc-option,-mtune=corei7-avx))
+        cflags-$(CONFIG_MCOREAVXI) += \
+                $(call cc-option,-march=core-avx-i,$(call cc-option,-mtune=core-avx-i))
+        cflags-$(CONFIG_MCOREAVX2) += \
+                $(call cc-option,-march=core-avx2,$(call cc-option,-mtune=core-avx2))
 	cflags-$(CONFIG_MATOM) += $(call cc-option,-march=atom) \
 		$(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
         cflags-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=generic)
--- linux-3.12/arch/x86/Makefile_32.cpu  2012-12-10 22:30:57.000000000 -0500
+++ linux-3.12.mod/arch/x86/Makefile_32.cpu	2013-04-06 07:37:31.754423693 -0400
@@ -23,7 +23,13 @@
 # Please note, that patches that add -march=athlon-xp and friends are pointless.
 # They make zero difference whatsosever to performance at this time.
 cflags-$(CONFIG_MK7)		+= -march=athlon
+cflags-$(CONFIG_MNATIVE) += $(call cc-option,-march=native)
 cflags-$(CONFIG_MK8)		+= $(call cc-option,-march=k8,-march=athlon)
+cflags-$(CONFIG_MK10)	+= $(call cc-option,-march=amdfam10,-march=athlon)
+cflags-$(CONFIG_MBARCELONA)	+= $(call cc-option,-march=barcelona,-march=athlon)
+cflags-$(CONFIG_MBOBCAT)	+= $(call cc-option,-march=btver1,-march=athlon)
+cflags-$(CONFIG_MBULLDOZER)	+= $(call cc-option,-march=bdver1,-march=athlon)
+cflags-$(CONFIG_MPILEDRIVER)	+= $(call cc-option,-march=bdver2,-march=athlon)
 cflags-$(CONFIG_MCRUSOE)	+= -march=i686 $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
 cflags-$(CONFIG_MEFFICEON)	+= -march=i686 $(call tune,pentium3) $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
 cflags-$(CONFIG_MWINCHIPC6)	+= $(call cc-option,-march=winchip-c6,-march=i586)
@@ -32,6 +38,10 @@
 cflags-$(CONFIG_MVIAC3_2)	+= $(call cc-option,-march=c3-2,-march=i686)
 cflags-$(CONFIG_MVIAC7)		+= -march=i686
 cflags-$(CONFIG_MCORE2)		+= -march=i686 $(call tune,core2)
+cflags-$(CONFIG_MCOREI7)	+= -march=i686 $(call tune,corei7)
+cflags-$(CONFIG_MCOREI7AVX)	+= -march=i686 $(call tune,corei7-avx)
+cflags-$(CONFIG_MCOREAVXI)	+= -march=i686 $(call tune,core-avx-i)
+cflags-$(CONFIG_MCOREAVX2)	+= -march=i686 $(call tune,core-avx2)
 cflags-$(CONFIG_MATOM)		+= $(call cc-option,-march=atom,$(call cc-option,-march=core2,-march=i686)) \
 	$(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
 

             reply	other threads:[~2013-12-08 15:58 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-08 15:53 John [this message]
2013-12-10  3:51 ` [PATCH] expand micro-optimizations in kernel to newer model CPUs David Heidelberger
2013-12-13 12:37   ` Austin S Hemmelgarn
     [not found]     ` <1387051250.86178.YahooMailNeo@web140005.mail.bf1.yahoo.com>
     [not found]       ` <ce679ca5-bf83-4c2a-9234-859cc7a4206b@email.android.com>
     [not found]         ` <1387057337.97000.YahooMailNeo@web140004.mail.bf1.yahoo.com>
     [not found]           ` <52ACECC4.208@zytor.com>
2013-12-15 12:00             ` John
2013-12-15 12:27               ` Richard Weinberger
2013-12-15 12:42                 ` John
2013-12-15 13:23                   ` Borislav Petkov
2013-12-15 17:31                   ` Richard Weinberger
2013-12-16 14:28               ` Ingo Molnar
2013-12-17 12:59                 ` Austin S Hemmelgarn
2013-12-15 12:23     ` John

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1386518000.90452.YahooMailNeo@web140004.mail.bf1.yahoo.com \
    --to=da_audiophile@yahoo.com \
    --cc=david.heidelberger@ixit.cz \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).