linux-kbuild.vger.kernel.org archive mirror
* Link Time Optimization patchkit v3
@ 2014-02-18 14:28 Andi Kleen
  2014-02-18 14:28 ` [PATCH 01/20] x86, lto: Disable LTO for the x86 VDSO Andi Kleen
                   ` (20 more replies)
  0 siblings, 21 replies; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC
  To: linux-kernel; +Cc: sam, x86, linux-kbuild

LTO allows the compiler to do global optimization over the whole kernel.

Updated version of the LTO patchkit, mainly to address Sam's review
comments.  I also rebased to 3.14-rc3 and added a fix for bloat-o-meter
with gcc 4.9.

See the individual patches for a detailed description.

Dependencies: asmlinkage patchkit (posted two weeks ago), kallsyms patchkit
(plus LTO capable toolchain, see documentation)

Full git tree is in
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc lto-3.14

-Andi



* [PATCH 01/20] x86, lto: Disable LTO for the x86 VDSO
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:28 ` [PATCH 02/20] lto: Disable LTO for hweight functions Andi Kleen
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

The VDSO does not play well with LTO, so just disable LTO for it.
Also pass a 32bit linker flag for the 32bit version.

Cc: x86@kernel.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/vdso/Makefile | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index fd14be1..db0626c 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -2,6 +2,8 @@
 # Building vDSO images for x86.
 #
 
+KBUILD_CFLAGS += $(DISABLE_LTO)
+
 VDSO64-$(CONFIG_X86_64)		:= y
 VDSOX32-$(CONFIG_X86_X32_ABI)	:= y
 VDSO32-$(CONFIG_X86_32)		:= y
@@ -35,7 +37,8 @@ export CPPFLAGS_vdso.lds += -P -C
 
 VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
 			-Wl,--no-undefined \
-		      	-Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096
+			-Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 \
+			$(DISABLE_LTO)
 
 $(obj)/vdso.o: $(src)/vdso.S $(obj)/vdso.so
 
@@ -127,7 +130,7 @@ vdso32.so-$(VDSO32-y)		+= sysenter
 vdso32-images			= $(vdso32.so-y:%=vdso32-%.so)
 
 CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds)
-VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-soname=linux-gate.so.1
+VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux-gate.so.1
 
 # This makes sure the $(obj) subdirectory exists even though vdso32/
 # is not a kbuild sub-make subdirectory.
@@ -181,7 +184,8 @@ quiet_cmd_vdso = VDSO    $@
 		       -Wl,-T,$(filter %.lds,$^) $(filter %.o,$^) && \
 		 sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
 
-VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=sysv)
+VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=sysv) \
+		${LTO_CFLAGS}
 GCOV_PROFILE := n
 
 #
-- 
1.8.5.2



* [PATCH 02/20] lto: Disable LTO for hweight functions
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
  2014-02-18 14:28 ` [PATCH 01/20] x86, lto: Disable LTO for the x86 VDSO Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:28 ` [PATCH 03/20] lto: Make asmlinkage __visible Andi Kleen
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

x86 calls the hweight library functions with special calling conventions.
LTO doesn't support compiling individual files with special options.
Just disable LTO for the file.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 lib/Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/Makefile b/lib/Makefile
index 48140e3..2057304 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -46,7 +46,9 @@ obj-$(CONFIG_CHECK_SIGNATURE) += check_signature.o
 obj-$(CONFIG_DEBUG_LOCKING_API_SELFTESTS) += locking-selftest.o
 
 GCOV_PROFILE_hweight.o := n
-CFLAGS_hweight.o = $(subst $(quote),,$(CONFIG_ARCH_HWEIGHT_CFLAGS))
+CFLAGS_hweight.o = $(subst $(quote),,$(CONFIG_ARCH_HWEIGHT_CFLAGS)) \
+		   $(DISABLE_LTO)
+
 obj-$(CONFIG_GENERIC_HWEIGHT) += hweight.o
 
 obj-$(CONFIG_BTREE) += btree.o
-- 
1.8.5.2



* [PATCH 03/20] lto: Make asmlinkage __visible
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
  2014-02-18 14:28 ` [PATCH 01/20] x86, lto: Disable LTO for the x86 VDSO Andi Kleen
  2014-02-18 14:28 ` [PATCH 02/20] lto: Disable LTO for hweight functions Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:28 ` [PATCH 04/20] lto, workaround: Add workaround for initcall reordering Andi Kleen
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

With LTO the compiler needs to know which functions can be
called from assembler, otherwise it would optimize
those functions away. We use the existing asmlinkage
for this, which is already used widely.

Note this causes warnings for static asmlinkage, which
is used in some places. These can be cleaned up later.
static asmlinkage usually makes no sense.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 include/linux/linkage.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/linkage.h b/include/linux/linkage.h
index a6a42dd..34a513a 100644
--- a/include/linux/linkage.h
+++ b/include/linux/linkage.h
@@ -12,9 +12,9 @@
 #endif
 
 #ifdef __cplusplus
-#define CPP_ASMLINKAGE extern "C"
+#define CPP_ASMLINKAGE extern "C" __visible
 #else
-#define CPP_ASMLINKAGE
+#define CPP_ASMLINKAGE __visible
 #endif
 
 #ifndef asmlinkage
-- 
1.8.5.2



* [PATCH 04/20] lto, workaround: Add workaround for initcall reordering
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
                   ` (2 preceding siblings ...)
  2014-02-18 14:28 ` [PATCH 03/20] lto: Make asmlinkage __visible Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:28 ` [PATCH 05/20] lto: Handle LTO common symbols in module loader Andi Kleen
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Work around a LTO gcc problem: when there is no reference to a variable
in a module it will be moved to the end of the program. This causes
reordering of initcalls which the kernel does not like.
Add a dummy reference function to avoid this. The function is
deleted by the linker.

This replaces a previous much slower workaround.

Thanks to Honza Hubicka for suggesting this technique.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 include/linux/init.h | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/init.h b/include/linux/init.h
index e168880..a3ba270 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -163,6 +163,23 @@ extern bool initcall_debug;
 
 #ifndef __ASSEMBLY__
 
+#ifdef CONFIG_LTO
+/* Work around a LTO gcc problem: when there is no reference to a variable
+ * in a module it will be moved to the end of the program. This causes
+ * reordering of initcalls which the kernel does not like.
+ * Add a dummy reference function to avoid this. The function is
+ * deleted by the linker.
+ */
+#define LTO_REFERENCE_INITCALL(x) \
+	; /* yes this is needed */			\
+	static __used __exit void *reference_##x(void)	\
+	{						\
+		return &x;				\
+	}
+#else
+#define LTO_REFERENCE_INITCALL(x)
+#endif
+
 /* initcalls are now grouped by functionality into separate 
  * subsections. Ordering inside the subsections is determined
  * by link order. 
@@ -175,7 +192,8 @@ extern bool initcall_debug;
 
 #define __define_initcall(fn, id) \
 	static initcall_t __initcall_##fn##id __used \
-	__attribute__((__section__(".initcall" #id ".init"))) = fn
+	__attribute__((__section__(".initcall" #id ".init"))) = fn; \
+	LTO_REFERENCE_INITCALL(__initcall_##fn##id)
 
 /*
  * Early initcalls run before initializing SMP.
-- 
1.8.5.2



* [PATCH 05/20] lto: Handle LTO common symbols in module loader
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
                   ` (3 preceding siblings ...)
  2014-02-18 14:28 ` [PATCH 04/20] lto, workaround: Add workaround for initcall reordering Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:53   ` Konrad Rzeszutek Wilk
  2014-02-18 14:28 ` [PATCH 06/20] lto: Disable LTO for sys_ni Andi Kleen
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Joe Mario, rusty, Andi Kleen

From: Joe Mario <jmario@redhat.com>

Here is the workaround I made for having the kernel not reject modules
built with -flto.  The clean solution would be to get the compiler to not
emit the symbol.  Or if it has to emit the symbol, then emit it as
initialized data but put it into a comdat/linkonce section.

Minor tweaks by AK over Joe's patch.

Cc: rusty@rustcorp.com.au
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 kernel/module.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/module.c b/kernel/module.c
index d24fcf2..b99e801 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1948,6 +1948,10 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
 
 		switch (sym[i].st_shndx) {
 		case SHN_COMMON:
+			/* Ignore common symbols */
+			if (!strncmp(name, "__gnu_lto", 9))
+				break;
+
 			/* We compiled with -fno-common.  These are not
 			   supposed to happen.  */
 			pr_debug("Common symbol: %s\n", name);
-- 
1.8.5.2



* [PATCH 06/20] lto: Disable LTO for sys_ni
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
                   ` (4 preceding siblings ...)
  2014-02-18 14:28 ` [PATCH 05/20] lto: Handle LTO common symbols in module loader Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:28 ` [PATCH 07/20] lto: Don't let LATENCYTOP and LOCKDEP select KALLSYMS_ALL Andi Kleen
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

The assembler alias code in cond_syscall does not work
when compiled for LTO. Just disable LTO for that file.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 kernel/Makefile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/Makefile b/kernel/Makefile
index bc010ee..31c26c6 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -18,6 +18,9 @@ CFLAGS_REMOVE_cgroup-debug.o = -pg
 CFLAGS_REMOVE_irq_work.o = -pg
 endif
 
+# cond_syscall is currently not LTO compatible
+CFLAGS_sys_ni.o = $(DISABLE_LTO)
+
 obj-y += sched/
 obj-y += locking/
 obj-y += power/
-- 
1.8.5.2



* [PATCH 07/20] lto: Don't let LATENCYTOP and LOCKDEP select KALLSYMS_ALL
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
                   ` (5 preceding siblings ...)
  2014-02-18 14:28 ` [PATCH 06/20] lto: Disable LTO for sys_ni Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:28 ` [PATCH 08/20] Kbuild, lto, workaround: Don't warn for initcall_reference in modpost Andi Kleen
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

KALLSYMS_ALL enables including data variables into KALLSYMS.
With plain KALLSYMS only functions are included.

LATENCYTOP and LOCKDEP select KALLSYMS_ALL in addition to KALLSYMS.
It's unclear what they actually need _ALL for; they should
only need function backtraces and afaik never touch variables.

LTO currently does not support KALLSYMS_ALL, which prevents
LATENCYTOP and LOCKDEP from working and gives Kconfig errors.
Disable the requirement for KALLSYMS_ALL for them, just use
KALLSYMS.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 lib/Kconfig.debug | 2 --
 1 file changed, 2 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a48abea..5cbf0c5 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -928,7 +928,6 @@ config LOCKDEP
 	select STACKTRACE
 	select FRAME_POINTER if !MIPS && !PPC && !ARM_UNWIND && !S390 && !MICROBLAZE && !ARC
 	select KALLSYMS
-	select KALLSYMS_ALL
 
 config LOCK_STAT
 	bool "Lock usage statistics"
@@ -1396,7 +1395,6 @@ config LATENCYTOP
 	depends on PROC_FS
 	select FRAME_POINTER if !MIPS && !PPC && !S390 && !MICROBLAZE && !ARM_UNWIND && !ARC
 	select KALLSYMS
-	select KALLSYMS_ALL
 	select STACKTRACE
 	select SCHEDSTATS
 	select SCHED_DEBUG
-- 
1.8.5.2



* [PATCH 08/20] Kbuild, lto, workaround: Don't warn for initcall_reference in modpost
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
                   ` (6 preceding siblings ...)
  2014-02-18 14:28 ` [PATCH 07/20] lto: Don't let LATENCYTOP and LOCKDEP select KALLSYMS_ALL Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:28 ` [PATCH 09/20] Kbuild, lto: Drop .number postfixes " Andi Kleen
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

This reference is discarded, but can cause warnings when it refers to
exit. Ignore for now.

This is a workaround and can be removed once we get rid of
-fno-toplevel-reorder

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 scripts/mod/modpost.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 4061098..bd06857 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1455,6 +1455,10 @@ static void check_section_mismatch(const char *modname, struct elf_info *elf,
 		to = find_elf_symbol(elf, r->r_addend, sym);
 		tosym = sym_name(elf, to);
 
+		if (!strncmp(fromsym, "reference___initcall",
+				sizeof("reference___initcall") - 1))
+			return;
+
 		/* check whitelist - we may ignore it */
 		if (secref_whitelist(mismatch,
 					fromsec, fromsym, tosec, tosym)) {
-- 
1.8.5.2



* [PATCH 09/20] Kbuild, lto: Drop .number postfixes in modpost
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
                   ` (7 preceding siblings ...)
  2014-02-18 14:28 ` [PATCH 08/20] Kbuild, lto, workaround: Don't warn for initcall_reference in modpost Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:28 ` [PATCH 10/20] Kbuild, lto: add ld-version and ld-ifversion macros Andi Kleen
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

LTO turns all global symbols effectively into statics. This
has the side effect that they all have a .NUMBER postfix to make
them unique. Drop this postfix in modpost, because it confuses
modpost.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 scripts/mod/modpost.c | 15 ++++++++++++++-
 scripts/mod/modpost.h |  2 +-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index bd06857..d0fc656 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1684,6 +1684,19 @@ static void check_sec_ref(struct module *mod, const char *modname,
 	}
 }
 
+static char *remove_dot(char *s)
+{
+	char *end;
+	int n = strcspn(s, ".");
+
+	if (n > 0 && s[n] != 0) {
+		strtoul(s + n + 1, &end, 10);
+		if  (end > s + n + 1 && (*end == '.' || *end == 0))
+			s[n] = 0;
+	}
+	return s;
+}
+
 static void read_symbols(char *modname)
 {
 	const char *symname;
@@ -1722,7 +1735,7 @@ static void read_symbols(char *modname)
 	}
 
 	for (sym = info.symtab_start; sym < info.symtab_stop; sym++) {
-		symname = info.strtab + sym->st_name;
+		symname = remove_dot(info.strtab + sym->st_name);
 
 		handle_modversions(mod, &info, sym, symname);
 		handle_moddevtable(mod, &info, sym, symname);
diff --git a/scripts/mod/modpost.h b/scripts/mod/modpost.h
index 51207e4..168b43d 100644
--- a/scripts/mod/modpost.h
+++ b/scripts/mod/modpost.h
@@ -127,7 +127,7 @@ struct elf_info {
 	Elf_Section  export_gpl_sec;
 	Elf_Section  export_unused_gpl_sec;
 	Elf_Section  export_gpl_future_sec;
-	const char   *strtab;
+	char         *strtab;
 	char	     *modinfo;
 	unsigned int modinfo_len;
 
-- 
1.8.5.2
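
The behaviour of remove_dot() is easiest to see on a few sample names. The following is a standalone sketch of the same logic, lifted out of modpost so that it compiles on its own. It is made non-static here only for that purpose, and the sample names in the note below are invented for illustration, not taken from a real LTO build.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Same logic as the remove_dot() helper in the patch above:
 * truncate the name at its first '.' when that dot is followed by a
 * decimal number that runs to the end of the string or to another '.'.
 * The string is modified in place and returned.
 */
char *remove_dot(char *s)
{
	char *end;
	size_t n = strcspn(s, ".");

	/* n == 0: name starts with '.'; s[n] == 0: name has no '.' at all */
	if (n > 0 && s[n] != '\0') {
		strtoul(s + n + 1, &end, 10);
		/* at least one digit was consumed, then end of string or '.' */
		if (end > s + n + 1 && (*end == '.' || *end == '\0'))
			s[n] = '\0';
	}
	return s;
}
```

With this logic "do_fork.1234" and "do_fork.12.3" both become "do_fork", while "do_fork.lto_priv.0" and "plain" are left alone, because the text after the first dot is not a number or there is no dot at all.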



* [PATCH 10/20] Kbuild, lto: add ld-version and ld-ifversion macros
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
                   ` (8 preceding siblings ...)
  2014-02-18 14:28 ` [PATCH 09/20] Kbuild, lto: Drop .number postfixes " Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:28 ` [PATCH 11/20] Kbuild, lto: Add a gcc-ld script to let run gcc as ld Andi Kleen
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add ld-version and ld-ifversion macros to check the linker version.
Used by the LTO makefile.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 Documentation/kbuild/makefiles.txt | 10 ++++++++++
 scripts/Kbuild.include             |  9 +++++++++
 scripts/ld-version.sh              |  8 ++++++++
 3 files changed, 27 insertions(+)
 create mode 100755 scripts/ld-version.sh

diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt
index d567a7c..40d51f3 100644
--- a/Documentation/kbuild/makefiles.txt
+++ b/Documentation/kbuild/makefiles.txt
@@ -595,6 +595,16 @@ more details, with real examples.
 		#Makefile
 		LDFLAGS_vmlinux += $(call ld-option, -X)
 
+    ld-version
+	ld-version returns the full version number of $(LD), for three
+	number binutils versions, like the Linux binutils
+
+    ld-ifversion
+	Check the version number of the $(LD) linker in an if.
+
+	Example:
+		#Makefile.lto
+		ifeq ($(call ld-ifversion,-ge,22710001,y),y)
 
 === 4 Host Program support
 
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index 547e15d..93a0da2 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -155,6 +155,15 @@ ld-option = $(call try-run,\
 # Important: no spaces around options
 ar-option = $(call try-run, $(AR) rc$(1) "$$TMP",$(1),$(2))
 
+# ld-version
+# Usage: $(call ld-version)
+# Note this is mainly for HJ Lu's 3 number binutil versions
+ld-version = $(shell $(LD) --version | $(srctree)/scripts/ld-version.sh)
+
+# ld-ifversion
+# Usage:  $(call ld-ifversion, -ge, 22252, y)
+ld-ifversion = $(shell [ $(call ld-version) $(1) $(2) ] && echo $(3))
+
 ######
 
 ###
diff --git a/scripts/ld-version.sh b/scripts/ld-version.sh
new file mode 100755
index 0000000..198580d
--- /dev/null
+++ b/scripts/ld-version.sh
@@ -0,0 +1,8 @@
+#!/usr/bin/awk -f
+# extract linker version number from stdin and turn into single number
+	{
+	gsub(".*)", "");
+	split($1,a, ".");
+	print a[1]*10000000 + a[2]*100000 + a[3]*10000 + a[4]*100 + a[5];
+	exit
+	}
-- 
1.8.5.2
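
To make the number that ld-version produces concrete, here is a small sketch of the same arithmetic in C. The function name ld_version_number is hypothetical and not part of the patch. It assumes the caller has already isolated the dotted version string, which the awk script does by stripping everything up to the closing parenthesis of the first line of `ld --version`.

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): fold a dotted version
 * string such as "2.23.52.0.1" into one integer, using the same
 * weights as scripts/ld-version.sh. Missing components count as 0.
 */
long ld_version_number(const char *version)
{
	static const long weight[5] = { 10000000, 100000, 10000, 100, 1 };
	long total = 0;
	char *end;
	int i;

	for (i = 0; i < 5; i++) {
		total += strtol(version, &end, 10) * weight[i];
		if (*end != '.')
			break;		/* no further component */
		version = end + 1;
	}
	return total;
}
```

For "2.24" this gives 22400000 and for "2.23.52.0.1" it gives 22820001. Note that the weight of the third component (10000) is only ten times smaller than that of the second (100000), so a third component of 10 or more carries into the digits of the second; the result is still one number that can be compared with -ge in ld-ifversion.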



* [PATCH 11/20] Kbuild, lto: Add a gcc-ld script to let run gcc as ld
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
                   ` (9 preceding siblings ...)
  2014-02-18 14:28 ` [PATCH 10/20] Kbuild, lto: add ld-version and ld-ifversion macros Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:28 ` [PATCH 12/20] Kbuild, lto: Disable LTO for asm-offsets.c Andi Kleen
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

For LTO we need to run the link step with gcc, not ld.
Since there are a lot of linker options passed to it, add a gcc-ld wrapper
that wraps them as -Wl,

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 scripts/gcc-ld | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)
 create mode 100644 scripts/gcc-ld

diff --git a/scripts/gcc-ld b/scripts/gcc-ld
new file mode 100644
index 0000000..1e71ef3
--- /dev/null
+++ b/scripts/gcc-ld
@@ -0,0 +1,34 @@
+#!/bin/sh
+# run gcc with ld options
+# used as a wrapper to execute link time optimizations
+# yes virginia, this is not pretty
+
+case "${KBUILD_VERBOSE}" in
+*1*) set -x ;;
+esac
+
+ARGS="-nostdlib"
+
+while [ "$1" != "" ] ; do
+	case "$1" in
+	-save-temps|-m32|-m64) N="$1" ;;
+	-r) N="$1" ;;
+	-[Wg]*) N="$1" ;;
+	-[olv]|-[Ofd]*|-nostdlib) N="$1" ;;
+	--end-group|--start-group)
+		 N="-Wl,$1" ;;
+	-[RTFGhIezcbyYu]*|\
+--script|--defsym|-init|-Map|--oformat|-rpath|\
+-rpath-link|--sort-section|--section-start|-Tbss|-Tdata|-Ttext|\
+--version-script|--dynamic-list|--version-exports-symbol|--wrap|-m)
+		A="$1" ; shift ; N="-Wl,$A,$1" ;;
+	--param) shift ; N="--param $1" ;;
+	-[m]*) N="$1" ;;
+	-*) N="-Wl,$1" ;;
+	*)  N="$1" ;;
+	esac
+	ARGS="$ARGS $N"
+	shift
+done
+
+exec $CC $ARGS
-- 
1.8.5.2
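
The case statement in the wrapper sorts each command line word into one of a few classes. As a rough illustration, the sketch below expresses the same patterns, in the same order, as a table matched with POSIX fnmatch(), which implements shell-style patterns. The enum, struct and function names are hypothetical; only the patterns come from the script.

```c
#include <fnmatch.h>
#include <stddef.h>

enum ld_arg_kind {
	ARG_PASS,		/* handed to gcc unchanged */
	ARG_WRAP,		/* becomes -Wl,<opt> */
	ARG_WRAP_WITH_ARG,	/* consumes the next word: -Wl,<opt>,<arg> */
	ARG_PARAM		/* --param plus the next word, not wrapped */
};

struct rule {
	const char *pattern;
	enum ld_arg_kind kind;
};

/* Patterns in the order of the case statement in scripts/gcc-ld. */
static const struct rule rules[] = {
	{ "-save-temps", ARG_PASS }, { "-m32", ARG_PASS }, { "-m64", ARG_PASS },
	{ "-r", ARG_PASS },
	{ "-[Wg]*", ARG_PASS },
	{ "-[olv]", ARG_PASS }, { "-[Ofd]*", ARG_PASS }, { "-nostdlib", ARG_PASS },
	{ "--end-group", ARG_WRAP }, { "--start-group", ARG_WRAP },
	{ "-[RTFGhIezcbyYu]*", ARG_WRAP_WITH_ARG },
	{ "--script", ARG_WRAP_WITH_ARG }, { "--defsym", ARG_WRAP_WITH_ARG },
	{ "-init", ARG_WRAP_WITH_ARG }, { "-Map", ARG_WRAP_WITH_ARG },
	{ "--oformat", ARG_WRAP_WITH_ARG }, { "-rpath", ARG_WRAP_WITH_ARG },
	{ "-rpath-link", ARG_WRAP_WITH_ARG },
	{ "--sort-section", ARG_WRAP_WITH_ARG },
	{ "--section-start", ARG_WRAP_WITH_ARG },
	{ "-Tbss", ARG_WRAP_WITH_ARG }, { "-Tdata", ARG_WRAP_WITH_ARG },
	{ "-Ttext", ARG_WRAP_WITH_ARG },
	{ "--version-script", ARG_WRAP_WITH_ARG },
	{ "--dynamic-list", ARG_WRAP_WITH_ARG },
	{ "--version-exports-symbol", ARG_WRAP_WITH_ARG },
	{ "--wrap", ARG_WRAP_WITH_ARG }, { "-m", ARG_WRAP_WITH_ARG },
	{ "--param", ARG_PARAM },
	{ "-[m]*", ARG_PASS },
	{ "-*", ARG_WRAP },
	{ "*", ARG_PASS },
};

/* Return the class of one command line word; the first match wins. */
enum ld_arg_kind classify_ld_arg(const char *arg)
{
	size_t i;

	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
		if (fnmatch(rules[i].pattern, arg, 0) == 0)
			return rules[i].kind;
	return ARG_PASS;
}
```

Under these rules "-O2" and "vmlinux.o" go to gcc unchanged, "--build-id" becomes "-Wl,--build-id", and "-T" or "-m" pull in the following word, for example "-m elf_i386" becomes "-Wl,-m,elf_i386".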



* [PATCH 12/20] Kbuild, lto: Disable LTO for asm-offsets.c
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
                   ` (10 preceding siblings ...)
  2014-02-18 14:28 ` [PATCH 11/20] Kbuild, lto: Add a gcc-ld script to let run gcc as ld Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:28 ` [PATCH 13/20] Kbuild, lto: Set TMPDIR for LTO v2 Andi Kleen
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

The asm-offsets.c technique to fish data out of the assembler file
does not work with LTO. Just disable LTO for the asm-offsets.c build.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 scripts/Makefile.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index d5d859c..9f0ee22 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -198,7 +198,7 @@ $(multi-objs-y:.o=.s)   : modname = $(modname-multi)
 $(multi-objs-y:.o=.lst) : modname = $(modname-multi)
 
 quiet_cmd_cc_s_c = CC $(quiet_modtag)  $@
-cmd_cc_s_c       = $(CC) $(c_flags) -fverbose-asm -S -o $@ $<
+cmd_cc_s_c       = $(CC) $(c_flags) $(DISABLE_LTO) -fverbose-asm -S -o $@ $<
 
 $(obj)/%.s: $(src)/%.c FORCE
 	$(call if_changed_dep,cc_s_c)
-- 
1.8.5.2



* [PATCH 13/20] Kbuild, lto: Set TMPDIR for LTO v2
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
                   ` (11 preceding siblings ...)
  2014-02-18 14:28 ` [PATCH 12/20] Kbuild, lto: Disable LTO for asm-offsets.c Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:35   ` H. Peter Anvin
  2014-02-18 14:28 ` [PATCH 14/20] Kbuild, lto: Handle basic LTO in modpost Andi Kleen
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

LTO gcc puts a lot of data into $TMPDIR, essentially another copy
of the object directory to pass the repartitioned object files
to the code generation processes.

TMPDIR defaults to /tmp. With /tmp as tmpfs it's easy to drive systems
out of memory, because the temporary files compete with the already high
anonymous memory consumption of the wpa LTO pass.

When LTO is enabled, default TMPDIR to the object directory unless the
user has already set it. This could be slightly slower, but is far safer
and eliminates another parameter the LTO user would need to set manually.

I made it conditional on LTO for now.

v2: Allow user to override (H. Peter Anvin)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 Makefile | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/Makefile b/Makefile
index 893d6f0..0fd460b 100644
--- a/Makefile
+++ b/Makefile
@@ -407,6 +407,14 @@ export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
 export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
 export KBUILD_ARFLAGS
 
+ifdef CONFIG_LTO
+# LTO gcc creates a lot of files in TMPDIR, and with /tmp as tmpfs
+# it's easy to drive the machine OOM. Use the object directory
+# instead
+TMPDIR ?= ${objtree}
+export TMPDIR
+endif
+
 # When compiling out-of-tree modules, put MODVERDIR in the module
 # tree rather than in the kernel tree. The kernel tree might
 # even be read-only.
-- 
1.8.5.2



* [PATCH 14/20] Kbuild, lto: Handle basic LTO in modpost
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
                   ` (12 preceding siblings ...)
  2014-02-18 14:28 ` [PATCH 13/20] Kbuild, lto: Set TMPDIR for LTO v2 Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:28 ` [PATCH 15/20] Kbuild, lto: Fix single pass kallsyms for LTO Andi Kleen
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

- Don't warn about LTO marker symbols. modpost runs before
the linker, so the module is not necessarily LTOed yet.
- Don't complain about .gnu.lto* sections

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 scripts/mod/modpost.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index d0fc656..4445f59 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -623,7 +623,11 @@ static void handle_modversions(struct module *mod, struct elf_info *info,
 
 	switch (sym->st_shndx) {
 	case SHN_COMMON:
-		warn("\"%s\" [%s] is COMMON symbol\n", symname, mod->name);
+		if (!strncmp(symname,
+			     "__gnu_lto_", sizeof("__gnu_lto_") - 1)) {
+			/* Should warn here, but modpost runs before the linker */
+		} else
+			warn("\"%s\" [%s] is COMMON symbol\n", symname, mod->name);
 		break;
 	case SHN_UNDEF:
 		/* undefined symbol */
@@ -849,6 +853,7 @@ static const char *section_white_list[] =
 	".xt.lit",         /* xtensa */
 	".arcextmap*",			/* arc */
 	".gnu.linkonce.arcext*",	/* arc : modules */
+	".gnu.lto*",
 	NULL
 };
 
-- 
1.8.5.2
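
The marker check in handle_modversions() uses a prefix comparison with the length taken from the string literal itself. A minimal sketch of that idiom follows; the macro STARTS_WITH and the function is_lto_marker are hypothetical names used only for illustration.

```c
#include <string.h>

/*
 * Hypothetical helper: true when s starts with the string literal
 * prefix. sizeof on a literal counts the trailing NUL, hence the
 * "- 1". The second argument must be a literal, not a pointer.
 */
#define STARTS_WITH(s, prefix) \
	(strncmp((s), (prefix), sizeof(prefix) - 1) == 0)

/* Nonzero for the LTO marker symbols that modpost does not warn about. */
int is_lto_marker(const char *symname)
{
	return STARTS_WITH(symname, "__gnu_lto_");
}
```

Taking the length from sizeof keeps the prefix and its length in one place, so they cannot drift apart the way a hand-counted length could.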



* [PATCH 15/20] Kbuild, lto: Fix single pass kallsyms for LTO
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
                   ` (13 preceding siblings ...)
  2014-02-18 14:28 ` [PATCH 14/20] Kbuild, lto: Handle basic LTO in modpost Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:28 ` [PATCH 16/20] Kbuild, lto: Add Link Time Optimization support v2 Andi Kleen
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

gcc-nm on slim LTO objects does not output static functions or variables.

This causes the first pass estimation of the kallsyms table
to be too far off. Add a hack using the LTO function sections
to retrieve all functions instead. I wrote that hack in perl,
as it exceeded my awk-fu (this adds a build dependency on perl,
but only if LTO is active). The hack also doesn't support
variables, but we handle that by disabling KALLSYMS_ALL with LTO.
The hack also depends somewhat on the internal LTO
object format.

Hopefully at some future point gcc-nm will be fixed and this
won't be necessary anymore.

One issue is that LTO can generate new symbols in the
final link, for example when cloning functions. These clones
are just copies of existing names with a postfix (which
we remove), so they compress well in the kallsyms compression.

With that we can get away with just adding a 5% safety factor
to the first pass kallsyms estimation, and hope all clones
fit into that.  If some obscure build has more clones than that
the PAD_RATIO value in kallsyms.c can be lowered to increase it.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 init/Kconfig            |  5 ++++-
 scripts/link-vmlinux.sh | 24 +++++++++++++++++++++---
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 009a797..ea00c0e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1317,7 +1317,10 @@ config KALLSYMS
 
 config KALLSYMS_ALL
 	bool "Include all symbols in kallsyms"
-	depends on DEBUG_KERNEL && KALLSYMS
+	# the method LTO uses to predict the symbol table
+	# only supports functions for now
+	# This can be removed once http://gcc.gnu.org/PR60016 is fixed
+	depends on DEBUG_KERNEL && KALLSYMS && !LTO
 	help
 	   Normally kallsyms only contains the symbols of functions for nicer
 	   OOPS messages and backtraces (i.e., symbols from the text and inittext
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index b299fdd..5a28f2a 100644
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -90,10 +90,28 @@ kallsyms()
 	local aflags="${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL}               \
 		      ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS}"
 
-	${NM} -n ${1} | \
-		awk 'NF == 3 { print}' |
-		scripts/kallsyms ${kallsymopt} | \
+	# workaround for slim LTO gcc-nm not outputing static symbols
+	# http://gcc.gnu.org/PR60016
+	# generate a fake symbol table based on the LTO function sections.
+	# This unfortunately "knows" about the internal LTO file format
+	# and only works for functions
+	# needs perl for now when building for LTO
+	(
+	if $OBJDUMP --section-headers ${1} | grep -q \.gnu\.lto_ ; then
+		${OBJDUMP} --section-headers ${1} |
+		perl -ne '
+@n = split;
+next unless $n[1] =~ /\.gnu\.lto_([_a-zA-Z][^.]+)/;
+next if $n[1] eq $prev;
+$prev = $n[1];
+print "0 T ",$1,"\n"'
+	fi
+	${NM} -n ${1} | awk 'NF == 3 { print }'
+	)  > ${2}_sym
+	# run without pipe to make kallsyms errors stop the script
+	./scripts/kallsyms ${kallsymopt} < ${2}_sym |
 		${CC} ${aflags} -c -o ${2} -x assembler-with-cpp -
+
 }
 
 # Create map file with all symbols from ${1}
-- 
1.8.5.2
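
The perl filter in kallsyms() recovers a function name from each LTO section name with the regular expression \.gnu\.lto_([_a-zA-Z][^.]+). As an illustration only, the sketch below does the same extraction in C, simplified to match at the start of the section name. The function name lto_section_function and the sample section names in the note are hypothetical.

```c
#include <string.h>

/*
 * Hypothetical C rendering of the perl filter's regular expression,
 * anchored at the start of the section name for simplicity. Copies the
 * captured name into out and returns 1, or returns 0 when the section
 * name does not match (or the name does not fit into out).
 */
int lto_section_function(const char *section, char *out, size_t outlen)
{
	static const char prefix[] = ".gnu.lto_";
	const char *name;
	size_t len;

	if (strncmp(section, prefix, sizeof(prefix) - 1) != 0)
		return 0;
	name = section + sizeof(prefix) - 1;

	/* first character: letter or underscore */
	if (!(*name == '_' || (*name >= 'a' && *name <= 'z') ||
	      (*name >= 'A' && *name <= 'Z')))
		return 0;

	/* the name runs up to the next '.', and needs at least two characters */
	len = strcspn(name, ".");
	if (len < 2 || len >= outlen)
		return 0;

	memcpy(out, name, len);
	out[len] = '\0';
	return 1;
}
```

A section named ".gnu.lto_do_fork.3f2a" yields "do_fork", which the shell code then prints as a fake "0 T do_fork" symbol line. A section such as ".gnu.lto_.decls.3f2a" is skipped, because the character after the prefix is a dot.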



* [PATCH 16/20] Kbuild, lto: Add Link Time Optimization support v2
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
                   ` (14 preceding siblings ...)
  2014-02-18 14:28 ` [PATCH 15/20] Kbuild, lto: Fix single pass kallsyms for LTO Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:28 ` [PATCH 17/20] Kbuild, bloat-o-meter: Ignore .lto_priv postfix Andi Kleen
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

With LTO gcc will do whole program optimizations for
the whole kernel and each module. This increases compile time,
but can generate faster and smaller code and allows
the compiler to do global checking. For example the compiler
can now complain about type mismatches for symbols between
different files.

LTO allows gcc to inline functions between different files and
do various other optimizations across the whole binary.
It also allows gcc to drop unused code.

It might also trigger bugs due to more aggressive optimizations.

The compile time is definitely slower. For gcc 4.8 on a
typical monolithic config it is about 58% slower. 4.9
drastically improved performance, with the slowdown being
38% or so. Also incremental rebuilds are somewhat
slower, as the whole kernel always needs to be reoptimized.
Very modular kernels have less build time slowdown, as
the LTO will run for each module individually.

This adds the basic Kbuild plumbing for LTO:

- In Kbuild add a new scripts/Makefile.lto that checks
  the tool chain (note the checks may not be fully bulletproof)
  and when the tests pass sets the LTO options.
  Currently LTO is very finicky about the tool chain.
- Add a new LDFINAL variable that controls the final link
  for vmlinux or module. In this case we call gcc-ld instead
  of ld, to run the LTO step.
- For slim LTO builds (object files containing no backup
  executable) force AR to gcc-ar.
- Theoretically LTO should pass through compiler options from
  the compiler to the link step, but this doesn't work for all options.
  So the Makefile sets most of these options manually.
- Kconfigs:
  LTO with allyesconfig needs more than 4G of memory (~8G)
  and has the potential to make people's systems swap to death,
  so I used a nested config that ensures that a simple
  allyesconfig disables LTO. It has to be explicitly
  enabled.
- Some dependencies on other Kconfigs:
  MODVERSIONS, GCOV, FUNCTION_TRACER, KALLSYMS_ALL, single chain WCHAN are
  incompatible with LTO currently, mostly because
  they require setting special compiler options
  for specific files, which LTO currently doesn't support.
  MODVERSIONS should in principle work with gcc 4.9, but is still disabled.
  FUNCTION_TRACER/GCOV can be fixed with an unmerged gcc patch.
- Also disable strict copy user checks because they trigger
  errors with LTO.
- modpost symbol checking is downgraded to a warning,
  as in some cases modpost runs before the final link
  and it cannot resolve LTO symbols at this point.
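
To illustrate the option pass-through item above, here is a rough Python
sketch of what the $(filter -g%,...) style lines in scripts/Makefile.lto
select from KBUILD_CFLAGS. It is only a model of the make logic: the
function name lto_final_flags and the sample flag string are made up for
this example and are not part of the patch.

```python
# Illustrative sketch only: mimics the $(filter -X%,${KBUILD_CFLAGS}) lines
# in scripts/Makefile.lto. The function name and sample flags are hypothetical.

# Option families that the Makefile copies to the final LTO link step.
PASSTHROUGH_PREFIXES = ("-g", "-O", "-f", "-m", "-W")


def lto_final_flags(kbuild_cflags):
    """Return the flags forwarded to the link step, grouped by prefix.

    Like the Makefile, each prefix is filtered in turn, so the result is
    ordered by option family rather than by original position.
    """
    words = kbuild_cflags.split()
    selected = []
    for prefix in PASSTHROUGH_PREFIXES:
        selected.extend(w for w in words if w.startswith(prefix))
    return selected


if __name__ == "__main__":
    sample = "-Wall -O2 -mno-red-zone -fno-common -gdwarf-2 -DFOO -Iinclude"
    print(" ".join(lto_final_flags(sample)))
```

Preprocessor options such as -D and -I are deliberately not forwarded in
this model, matching the five prefixes the Makefile filters on.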

For more information see Documentation/lto-build

Thanks to HJ Lu, Joe Mario, Honza Hubicka, Richard Guenther,
Don Zickus, Changlong Xie who helped with this project
(and probably some more who I forgot, sorry)

v2:
Merge documentation file into this patch
Improve documentation and Kconfig, fix a lot of obsolete comments.
Exclude READABLE_ASM
Some random fixes

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 Documentation/lto-build  | 173 +++++++++++++++++++++++++++++++++++++++++++++++
 Makefile                 |   9 ++-
 arch/x86/Kconfig         |   2 +-
 init/Kconfig             |  81 ++++++++++++++++++++++
 kernel/gcov/Kconfig      |   2 +-
 lib/Kconfig.debug        |   2 +-
 scripts/Makefile.lto     |  85 +++++++++++++++++++++++
 scripts/Makefile.modpost |   7 +-
 scripts/link-vmlinux.sh  |   2 +-
 9 files changed, 355 insertions(+), 8 deletions(-)
 create mode 100644 Documentation/lto-build
 create mode 100644 scripts/Makefile.lto

diff --git a/Documentation/lto-build b/Documentation/lto-build
new file mode 100644
index 0000000..5dcce1e
--- /dev/null
+++ b/Documentation/lto-build
@@ -0,0 +1,173 @@
+Link time optimization (LTO) for the Linux kernel
+
+This is an experimental feature.
+
+Link Time Optimization allows the compiler to optimize the complete program
+instead of just each file.  LTO requires at least gcc 4.8 (but
+works more efficiently with 4.9+). LTO also requires Linux binutils (the normal
+FSF releases used in many distributions do not work at the moment).
+
+The compiler can inline functions between files and do various other global
+optimizations, like specializing functions for common parameters,
+determining when global variables are clobbered, making functions pure/const,
+propagating constants globally, removing unneeded data and others.
+
+It will also drop unused functions which can make the kernel
+image smaller in some circumstances, in particular for small kernel
+configurations.
+
+For small monolithic kernels it can throw away unused code very effectively
+(especially when modules are disabled) and usually shrinks
+the code size.
+
+Build time and memory consumption at build time will increase, depending
+on the size of the largest binary. Modular kernels are less affected.
+With LTO incremental builds are less incremental, as the whole
+binary always needs to be re-optimized (but not re-parsed).
+
+Oops can be somewhat more difficult to read, due to the more aggressive
+inlining.
+
+Normal "reasonable" builds work with less than 4GB of RAM, but very large
+configurations like allyesconfig may need more memory. The actual
+memory needed depends on the available memory (gcc sizes its garbage
+collector pools based on that or on the ulimit -m limits) and
+the compiler version.
+
+gcc 4.9+ has much better build performance and less memory consumption
+
+- A few kernel features are currently incompatible with LTO, in particular
+function tracing, because they require special compiler flags for
+specific files, which is not supported in LTO right now.
+- Jobserver control for -j does not work correctly for the final
+LTO phase due to some problems with the kernel's pipe code.
+The makefile hard-codes -j<number of online cpus> for the final
+LTO phase to work around this.
+
+Configuration:
+- Enable CONFIG_LTO_MENU and then disable CONFIG_LTO_DISABLE.
+This is mainly to not have allyesconfig default to LTO.
+- FUNCTION_TRACER, STACK_TRACER, FUNCTION_GRAPH_TRACER, KALLSYMS_ALL, GCOV
+have to be disabled because they are currently incompatible with LTO.
+- MODVERSIONS has to be disabled (may work with 4.9+)
+
+Requirements:
+- Enough memory: 4GB for a standard build, more for allyesconfig
+The peak memory usage happens single threaded (when lto-wpa merges types),
+so dialing back -j options will not help much.
+
+A 32bit compiler is unlikely to work due to the memory requirements.
+You can however build a kernel targeted at 32bit on a 64bit host.
+
+Example build procedure:
+
+Simplified procedure for distributions that have gcc 4.8, but not
+the Linux binutils (for example openSUSE 13.1 or FC20):
+
+LTO builds require gcc-nm/gcc-ar. Some distributions ship
+those in separate packages, which may need to be explicitly installed.
+
+- Get the latest Linux binutils from
+http://www.kernel.org/pub/linux/devel/binutils/
+and unpack it.
+
+We install it in a separate directory to not overwrite the system binutils.
+
+# replace VERSION with respective version numbers
+
+cd binutils*
+# don't forget the --enable-plugins!
+./configure --prefix=/opt/binutils-VERSION --enable-plugins
+make -j $(getconf _NPROCESSORS_ONLN) && sudo make install
+
+Fix up the kernel configuration to allow LTO:
+
+<start with a suitable kernel configuration>
+./source/scripts/config --disable function_tracer \
+			--disable function_graph_tracer \
+			--disable stack_tracer --enable lto_menu \
+                        --disable lto_disable \
+			--disable gcov \
+			--disable kallsyms_all \
+			--disable modversions
+make oldconfig
+
+Then you can build with
+
+# The COMPILER_PATH is needed to let gcc use the new binutils
+# as the LTO plugin linker
+# if you installed gcc in a separate directory like below also
+# add it to the PATH line below before the regular $PATH
+# The COMPILER_PATH setting is only needed if the gcc was not built
+# with --with-plugin-ld pointing to the Linux binutils ld
+# The AR/NM setting works around a Makefile bug
+COMPILER_PATH=/opt/binutils-VERSION/bin PATH=$COMPILER_PATH:$PATH \
+make -j$(getconf _NPROCESSORS_ONLN) AR=gcc-ar NM=gcc-nm
+
+If you don't have gcc 4.8+ as system compiler you would also need
+to install that compiler. In this case I recommend getting
+a gcc 4.9+ snapshot from http://gcc.gnu.org (or release when available),
+as it builds much faster for LTO than 4.8.
+
+Here's an example build procedure:
+
+Assuming gcc is unpacked in gcc-VERSION
+
+cd gcc-VERSION
+./contrib/download_prerequisites
+cd ..
+
+mkdir obj-gcc
+# please don't skip this cd. the build will not work correctly in the
+# source dir, you have to use the separate object dir
+cd obj-gcc
+../gcc-VERSION/configure --prefix=/opt/gcc-VERSION --enable-lto \
+--with-plugin-ld=/opt/binutils-VERSION/bin/ld \
+--disable-nls --enable-languages=c,c++ \
+--disable-libstdcxx-pch
+make -j$(getconf _NPROCESSORS_ONLN)
+sudo make install-no-fixedincludes
+
+FAQs:
+
+Q: I get a section type attribute conflict
+A: Usually because of someone doing
+const __initdata (should be const __initconst) or const __read_mostly
+(should be just const). Check both symbols reported by gcc.
+
+Q: I see lots of undefined symbols for memcmp etc.
+A: Usually because NM=gcc-nm AR=gcc-ar are missing.
+The Makefile tries to set those automatically, but it doesn't always
+work. Better to set it manually on the make command line.
+
+Q: It's quite slow / uses too much memory.
+A: Consider a gcc 4.9 snapshot/release (not released yet)
+The main problem in 4.8 is the type merging in the single threaded WPA pass,
+which has been improved considerably in 4.9 by running it distributed.
+
+Q: It's still slow
+A: It'll always be somewhat slower than non-LTO, sorry.
+
+Q: What's up with .XXXXX numeric postfixes?
+A: This is due to LTO turning (nearly) all symbols into statics.
+Use gcc 4.9, it avoids them in most cases. They are also filtered out
+in kallsyms.
+
+References:
+
+Presentation on Kernel LTO
+(note, performance numbers/details outdated.  In particular gcc 4.9 fixed
+most of the build time problems):
+http://halobates.de/kernel-lto.pdf
+
+Generic gcc LTO:
+http://www.ucw.cz/~hubicka/slides/labs2013.pdf
+http://www.hipeac.net/system/files/barcelona.pdf
+
+Somewhat outdated too:
+http://gcc.gnu.org/projects/lto/lto.pdf
+http://gcc.gnu.org/projects/lto/whopr.pdf
+
+Happy Link-Time-Optimizing!
+
+Andi Kleen
diff --git a/Makefile b/Makefile
index 0fd460b..5173d5e 100644
--- a/Makefile
+++ b/Makefile
@@ -335,9 +335,14 @@ include $(srctree)/scripts/Kbuild.include
 
 AS		= $(CROSS_COMPILE)as
 LD		= $(CROSS_COMPILE)ld
+LDFINAL	= $(LD)
 CC		= $(CROSS_COMPILE)gcc
 CPP		= $(CC) -E
+ifdef CONFIG_LTO_SLIM
+AR		= $(CROSS_COMPILE)gcc-ar
+else
 AR		= $(CROSS_COMPILE)ar
+endif
 NM		= $(CROSS_COMPILE)nm
 STRIP		= $(CROSS_COMPILE)strip
 OBJCOPY		= $(CROSS_COMPILE)objcopy
@@ -396,7 +401,7 @@ KERNELVERSION = $(VERSION)$(if $(PATCHLEVEL),.$(PATCHLEVEL)$(if $(SUBLEVEL),.$(S
 
 export VERSION PATCHLEVEL SUBLEVEL KERNELRELEASE KERNELVERSION
 export ARCH SRCARCH CONFIG_SHELL HOSTCC HOSTCFLAGS CROSS_COMPILE AS LD CC
-export CPP AR NM STRIP OBJCOPY OBJDUMP
+export CPP AR NM STRIP OBJCOPY OBJDUMP LDFINAL
 export MAKE AWK GENKSYMS INSTALLKERNEL PERL UTS_MACHINE
 export HOSTCXX HOSTCXXFLAGS LDFLAGS_MODULE CHECK CHECKFLAGS
 
@@ -707,6 +712,8 @@ ifeq ($(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-goto.sh $(CC)), y)
 	KBUILD_CFLAGS += -DCC_HAVE_ASM_GOTO
 endif
 
+include ${srctree}/scripts/Makefile.lto
+
 # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
 KBUILD_CPPFLAGS += $(KCPPFLAGS)
 KBUILD_AFLAGS += $(KAFLAGS)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0af5250..a5928cd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -577,7 +577,7 @@ config X86_32_IRIS
 
 config SCHED_OMIT_FRAME_POINTER
 	def_bool y
-	prompt "Single-depth WCHAN output"
+	prompt "Single-depth WCHAN output" if !LTO && !FRAME_POINTER
 	depends on X86
 	---help---
 	  Calculate simpler /proc/<PID>/wchan values. If this option
diff --git a/init/Kconfig b/init/Kconfig
index ea00c0e..7e8910d 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1241,6 +1241,85 @@ config CC_OPTIMIZE_FOR_SIZE
 
 	  If unsure, say N.
 
+config LTO_MENU
+	bool "Enable gcc link time optimization (LTO)"
+	# Only tested on X86 for now. For other architectures you likely
+	# have to fix some things first, like adding asmlinkages etc.
+	depends on X86
+	# lto does not support excluding flags for specific files
+	# right now. Can be removed if that is fixed.
+	depends on !FUNCTION_TRACER
+	help
+	  With this option gcc will do whole program optimizations for
+	  the whole kernel and module. This increases compile time, but can
+	  lead to better code. It allows gcc to inline functions between
+	  different files and do other optimization.  It might also trigger
+	  bugs due to more aggressive optimization. It allows gcc to drop unused
+	  code. On smaller monolithic kernel configurations
+	  it usually leads to smaller kernels, especially when modules
+	  are disabled.
+
+	  With this option gcc will also do some global checking over
+	  different source files. It also disables a number of kernel
+	  features.
+
+	  This option is recommended for release builds. With LTO
+	  the kernel always has to be re-optimized (but not re-parsed)
+	  on each build.
+
+	  This requires a gcc 4.8 or later compiler and
+	  Linux binutils 2.21.51.0.3 or later.  gcc 4.9 builds significantly
+	  faster than 4.8. It does not currently work with an FSF release of
+	  binutils or with the gold linker.
+
+	  On larger configurations this may need more than 4GB of RAM.
+	  It will likely not work on those with a 32bit compiler.
+
+	  When the toolchain support is not available this will (hopefully)
+	  be automatically disabled.
+
+	  For more information see Documentation/lto-build
+
+config LTO_DISABLE
+         bool "Disable LTO again"
+         depends on LTO_MENU
+         default n
+         help
+           This option is merely here so that allyesconfig or allmodconfig do
+           not enable LTO. If you want to actually use LTO, say N here.
+
+config LTO
+	bool
+	default y
+	depends on LTO_MENU && !LTO_DISABLE
+
+config LTO_DEBUG
+	bool "Enable LTO compile time debugging"
+	depends on LTO
+	help
+	  Enable LTO debugging in the compiler. The compiler dumps
+	  some log files that make it easier to figure out LTO
+	  behavior. The log files also allow reconstructing
+	  the global inlining and a global callgraph.
+	  They however add some (single threaded) cost to the
+	  compilation.  When in doubt do not enable.
+
+config LTO_CP_CLONE
+	bool "Allow aggressive cloning for function specialization"
+	depends on LTO
+	help
+	  Allow the compiler to clone and specialize functions for specific
+	  arguments when it determines these arguments are very commonly
+	  used.  Experimental. Will increase text size.
+
+config LTO_SLIM
+	#bool "Use slim lto"
+	def_bool y
+	depends on LTO
+	help
+	  Do not generate all code twice. The object files will only contain
+	  LTO information. This lowers build time.
+
 config SYSCTL
 	bool
 
@@ -1715,6 +1794,8 @@ config MODULE_FORCE_UNLOAD
 
 config MODVERSIONS
 	bool "Module versioning support"
+	# LTO should work with gcc 4.9
+	depends on !LTO
 	help
 	  Usually, you have to use modules compiled with your kernel.
 	  Saying Y here makes it sometimes possible to use modules
diff --git a/kernel/gcov/Kconfig b/kernel/gcov/Kconfig
index d04ce8a..32f65b7 100644
--- a/kernel/gcov/Kconfig
+++ b/kernel/gcov/Kconfig
@@ -2,7 +2,7 @@ menu "GCOV-based kernel profiling"
 
 config GCOV_KERNEL
 	bool "Enable gcov-based kernel profiling"
-	depends on DEBUG_FS
+	depends on DEBUG_FS && !LTO
 	select CONSTRUCTORS if !UML
 	default n
 	---help---
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 5cbf0c5..40e8a3b 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -180,7 +180,7 @@ config STRIP_ASM_SYMS
 
 config READABLE_ASM
         bool "Generate readable assembler code"
-        depends on DEBUG_KERNEL
+        depends on DEBUG_KERNEL && !LTO
         help
           Disable some compiler optimizations that tend to generate human unreadable
           assembler output. This may make the kernel slightly slower, but it helps
diff --git a/scripts/Makefile.lto b/scripts/Makefile.lto
new file mode 100644
index 0000000..0beeba1
--- /dev/null
+++ b/scripts/Makefile.lto
@@ -0,0 +1,85 @@
+#
+# Support for gcc link time optimization
+#
+
+DISABLE_LTO :=
+LTO_CFLAGS :=
+
+export DISABLE_LTO
+export LTO_CFLAGS
+
+ifdef CONFIG_LTO
+# 4.7 works mostly, but it sometimes loses symbols on large builds
+# This can be worked around by marking those symbols visible,
+# but that is fairly ugly and the problem is gone with 4.8
+# So only allow it with 4.8 for now.
+ifeq ($(call cc-ifversion, -ge, 0408,y),y)
+ifneq ($(call cc-option,-flto,n),n)
+# We need HJ Lu's Linux binutils because mainline binutils does not
+# support mixing assembler and LTO code in the same ld -r object.
+# XXX check if the gcc plugin ld is the expected one too
+# XXX some Fedora binutils should also support it. How to check for that?
+ifeq ($(call ld-ifversion,-ge,22710001,y),y)
+        LTO_CFLAGS := -flto -fno-toplevel-reorder
+	LTO_FINAL_CFLAGS := -fuse-linker-plugin
+
+# the -fno-toplevel-reorder is to preserve the order of initcalls
+# everything else should tolerate reordering
+        LTO_FINAL_CFLAGS +=-fno-toplevel-reorder
+
+# enable LTO and set the jobs used by the LTO phase
+# this should be -flto=jobserver to coordinate with the
+# parent make, but work around
+# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50639
+# use as many jobs as processors are online for now
+# this actually seems to be a kernel bug with the pipe code
+	LTO_FINAL_CFLAGS += -flto=$(shell getconf _NPROCESSORS_ONLN)
+	#LTO_FINAL_CFLAGS += -flto=jobserver
+
+ifdef CONFIG_LTO_SLIM
+	# requires plugin ar passed and very recent HJ binutils
+        LTO_CFLAGS += -fno-fat-lto-objects
+endif
+# Used to disable LTO for specific files (e.g. vdso)
+	DISABLE_LTO := -fno-lto
+
+	LTO_FINAL_CFLAGS += ${LTO_CFLAGS} -fwhole-program
+
+ifdef CONFIG_LTO_DEBUG
+	LTO_FINAL_CFLAGS += -dH -fdump-ipa-cgraph -fdump-ipa-inline-details
+	# -Wl,-plugin-save-temps -save-temps
+	LTO_CFLAGS +=
+endif
+ifdef CONFIG_LTO_CP_CLONE
+	LTO_FINAL_CFLAGS += -fipa-cp-clone
+	LTO_CFLAGS += -fipa-cp-clone
+endif
+
+	# In principle gcc should pass through options in the object files,
+	# but it doesn't always work. So do it here manually
+	# Note that special options for individual files do not
+	# work currently (except for some special cases that only
+	# affect the compiler frontend)
+	# The main offenders are FTRACE and GCOV -- we exclude
+	# those in the config.
+	LTO_FINAL_CFLAGS += $(filter -g%,${KBUILD_CFLAGS})
+	LTO_FINAL_CFLAGS += $(filter -O%,${KBUILD_CFLAGS})
+	LTO_FINAL_CFLAGS += $(filter -f%,${KBUILD_CFLAGS})
+	LTO_FINAL_CFLAGS += $(filter -m%,${KBUILD_CFLAGS})
+	LTO_FINAL_CFLAGS += $(filter -W%,${KBUILD_CFLAGS})
+
+	KBUILD_CFLAGS += ${LTO_CFLAGS}
+
+	LDFINAL := ${CONFIG_SHELL} ${srctree}/scripts/gcc-ld \
+                  ${LTO_FINAL_CFLAGS}
+
+else
+        $(warning "WARNING: Too old linker version $(call ld-version) for kernel LTO. You need Linux binutils. CONFIG_LTO disabled.")
+endif
+else
+        $(warning "WARNING: Compiler/Linker does not support LTO/WHOPR with linker plugin. CONFIG_LTO disabled.")
+endif
+else
+        $(warning "WARNING: GCC $(call cc-version) too old for LTO/WHOPR. CONFIG_LTO disabled")
+endif
+endif
diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
index 69f0a14..9c40dae 100644
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -77,7 +77,8 @@ modpost = scripts/mod/modpost                    \
  $(if $(KBUILD_EXTRA_SYMBOLS), $(patsubst %, -e %,$(KBUILD_EXTRA_SYMBOLS))) \
  $(if $(KBUILD_EXTMOD),-o $(modulesymfile))      \
  $(if $(CONFIG_DEBUG_SECTION_MISMATCH),,-S)      \
- $(if $(KBUILD_EXTMOD)$(KBUILD_MODPOST_WARN),-w)
+ $(if $(KBUILD_EXTMOD)$(KBUILD_MODPOST_WARN),-w) \
+ $(if $(CONFIG_LTO),-w)
 
 MODPOST_OPT=$(subst -i,-n,$(filter -i,$(MAKEFLAGS)))
 
@@ -115,8 +116,8 @@ $(modules:.ko=.mod.o): %.mod.o: %.mod.c FORCE
 targets += $(modules:.ko=.mod.o)
 
 # Step 6), final link of the modules
-quiet_cmd_ld_ko_o = LD [M]  $@
-      cmd_ld_ko_o = $(LD) -r $(LDFLAGS)                                 \
+quiet_cmd_ld_ko_o = LDFINAL [M]  $@
+      cmd_ld_ko_o = $(LDFINAL) -r $(LDFLAGS)                            \
                              $(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE) \
                              -o $@ $(filter-out FORCE,$^)
 
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 5a28f2a..177034b 100644
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -53,7 +53,7 @@ vmlinux_link()
 	local lds="${objtree}/${KBUILD_LDS}"
 
 	if [ "${SRCARCH}" != "um" ]; then
-		${LD} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2}                  \
+		${LDFINAL} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2}                  \
 			-T ${lds} ${KBUILD_VMLINUX_INIT}                     \
 			--start-group ${KBUILD_VMLINUX_MAIN} --end-group ${1}
 	else
-- 
1.8.5.2



* [PATCH 17/20] Kbuild, bloat-o-meter: Ignore .lto_priv postfix
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
                   ` (15 preceding siblings ...)
  2014-02-18 14:28 ` [PATCH 16/20] Kbuild, lto: Add Link Time Optimization support v2 Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:51   ` Konrad Rzeszutek Wilk
  2014-02-18 14:28 ` [PATCH 18/20] lto: Mark spinlocks noinline when inline spinlocks are disabled Andi Kleen
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

gcc 4.9 may generate .lto_priv postfixes for LTO functions.
Ignore those for bloat-o-meter comparisons.
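
As a rough illustration of the effect, here is a small standalone Python
sketch of the name normalization. The two substitutions are the ones
bloat-o-meter applies with this patch; the helper names normalize_symbol
and merge_sizes and the sample symbols are made up for this example and
do not exist in the script.

```python
import re


def normalize_symbol(name):
    """Strip compiler-generated postfixes so both builds use the same key."""
    # statics and some other optimizations add a random .NUMBER postfix
    name = re.sub(r'\.[0-9]+', '', name)
    # gcc 4.9 LTO may additionally add a .lto_priv postfix
    return name.replace(".lto_priv", "")


def merge_sizes(entries):
    """Sum hex sizes per normalized name, like the sym dict in getsizes().

    entries is an iterable of (hex_size_string, symbol_name) pairs; this
    input format is a simplification chosen for the sketch.
    """
    sym = {}
    for size, name in entries:
        key = normalize_symbol(name)
        sym[key] = sym.get(key, 0) + int(size, 16)
    return sym


if __name__ == "__main__":
    sample = [("10", "do_work.lto_priv.1234"), ("20", "do_work.77"),
              ("08", "helper")]
    for name, size in sorted(merge_sizes(sample).items()):
        print(name, size)
```

Without the .lto_priv step, do_work.lto_priv.1234 would be counted under
do_work.lto_priv and show up as a spurious add/remove pair when comparing
an LTO build against a non LTO one.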

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 scripts/bloat-o-meter | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/bloat-o-meter b/scripts/bloat-o-meter
index 549d0ab..2becdf6 100755
--- a/scripts/bloat-o-meter
+++ b/scripts/bloat-o-meter
@@ -23,6 +23,7 @@ def getsizes(file):
             if name == "linux_banner": continue
             # statics and some other optimizations adds random .NUMBER
             name = re.sub(r'\.[0-9]+', '', name)
+            name = name.replace(".lto_priv", "")
             sym[name] = sym.get(name, 0) + int(size, 16)
     return sym
 
-- 
1.8.5.2



* [PATCH 18/20] lto: Mark spinlocks noinline when inline spinlocks are disabled
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
                   ` (16 preceding siblings ...)
  2014-02-18 14:28 ` [PATCH 17/20] Kbuild, bloat-o-meter: Ignore .lto_priv postfix Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:28 ` [PATCH 19/20] lto, module: Warn about modules that are not fully LTOed Andi Kleen
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen, mingo

From: Andi Kleen <ak@linux.intel.com>

Otherwise LTO will inline them anyway.

Cc: mingo@kernel.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 kernel/locking/spinlock.c | 56 +++++++++++++++++++++++------------------------
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c
index 4b082b5..975bfe9 100644
--- a/kernel/locking/spinlock.c
+++ b/kernel/locking/spinlock.c
@@ -130,7 +130,7 @@ BUILD_LOCK_OPS(write, rwlock);
 #endif
 
 #ifndef CONFIG_INLINE_SPIN_TRYLOCK
-int __lockfunc _raw_spin_trylock(raw_spinlock_t *lock)
+noinline int __lockfunc _raw_spin_trylock(raw_spinlock_t *lock)
 {
 	return __raw_spin_trylock(lock);
 }
@@ -138,7 +138,7 @@ EXPORT_SYMBOL(_raw_spin_trylock);
 #endif
 
 #ifndef CONFIG_INLINE_SPIN_TRYLOCK_BH
-int __lockfunc _raw_spin_trylock_bh(raw_spinlock_t *lock)
+noinline int __lockfunc _raw_spin_trylock_bh(raw_spinlock_t *lock)
 {
 	return __raw_spin_trylock_bh(lock);
 }
@@ -146,7 +146,7 @@ EXPORT_SYMBOL(_raw_spin_trylock_bh);
 #endif
 
 #ifndef CONFIG_INLINE_SPIN_LOCK
-void __lockfunc _raw_spin_lock(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_lock(raw_spinlock_t *lock)
 {
 	__raw_spin_lock(lock);
 }
@@ -154,7 +154,7 @@ EXPORT_SYMBOL(_raw_spin_lock);
 #endif
 
 #ifndef CONFIG_INLINE_SPIN_LOCK_IRQSAVE
-unsigned long __lockfunc _raw_spin_lock_irqsave(raw_spinlock_t *lock)
+noinline unsigned long __lockfunc _raw_spin_lock_irqsave(raw_spinlock_t *lock)
 {
 	return __raw_spin_lock_irqsave(lock);
 }
@@ -162,7 +162,7 @@ EXPORT_SYMBOL(_raw_spin_lock_irqsave);
 #endif
 
 #ifndef CONFIG_INLINE_SPIN_LOCK_IRQ
-void __lockfunc _raw_spin_lock_irq(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_lock_irq(raw_spinlock_t *lock)
 {
 	__raw_spin_lock_irq(lock);
 }
@@ -170,7 +170,7 @@ EXPORT_SYMBOL(_raw_spin_lock_irq);
 #endif
 
 #ifndef CONFIG_INLINE_SPIN_LOCK_BH
-void __lockfunc _raw_spin_lock_bh(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_lock_bh(raw_spinlock_t *lock)
 {
 	__raw_spin_lock_bh(lock);
 }
@@ -178,7 +178,7 @@ EXPORT_SYMBOL(_raw_spin_lock_bh);
 #endif
 
 #ifdef CONFIG_UNINLINE_SPIN_UNLOCK
-void __lockfunc _raw_spin_unlock(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_unlock(raw_spinlock_t *lock)
 {
 	__raw_spin_unlock(lock);
 }
@@ -186,7 +186,7 @@ EXPORT_SYMBOL(_raw_spin_unlock);
 #endif
 
 #ifndef CONFIG_INLINE_SPIN_UNLOCK_IRQRESTORE
-void __lockfunc _raw_spin_unlock_irqrestore(raw_spinlock_t *lock, unsigned long flags)
+noinline void __lockfunc _raw_spin_unlock_irqrestore(raw_spinlock_t *lock, unsigned long flags)
 {
 	__raw_spin_unlock_irqrestore(lock, flags);
 }
@@ -194,7 +194,7 @@ EXPORT_SYMBOL(_raw_spin_unlock_irqrestore);
 #endif
 
 #ifndef CONFIG_INLINE_SPIN_UNLOCK_IRQ
-void __lockfunc _raw_spin_unlock_irq(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_unlock_irq(raw_spinlock_t *lock)
 {
 	__raw_spin_unlock_irq(lock);
 }
@@ -202,7 +202,7 @@ EXPORT_SYMBOL(_raw_spin_unlock_irq);
 #endif
 
 #ifndef CONFIG_INLINE_SPIN_UNLOCK_BH
-void __lockfunc _raw_spin_unlock_bh(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_unlock_bh(raw_spinlock_t *lock)
 {
 	__raw_spin_unlock_bh(lock);
 }
@@ -210,7 +210,7 @@ EXPORT_SYMBOL(_raw_spin_unlock_bh);
 #endif
 
 #ifndef CONFIG_INLINE_READ_TRYLOCK
-int __lockfunc _raw_read_trylock(rwlock_t *lock)
+noinline int __lockfunc _raw_read_trylock(rwlock_t *lock)
 {
 	return __raw_read_trylock(lock);
 }
@@ -218,7 +218,7 @@ EXPORT_SYMBOL(_raw_read_trylock);
 #endif
 
 #ifndef CONFIG_INLINE_READ_LOCK
-void __lockfunc _raw_read_lock(rwlock_t *lock)
+noinline void __lockfunc _raw_read_lock(rwlock_t *lock)
 {
 	__raw_read_lock(lock);
 }
@@ -226,7 +226,7 @@ EXPORT_SYMBOL(_raw_read_lock);
 #endif
 
 #ifndef CONFIG_INLINE_READ_LOCK_IRQSAVE
-unsigned long __lockfunc _raw_read_lock_irqsave(rwlock_t *lock)
+noinline unsigned long __lockfunc _raw_read_lock_irqsave(rwlock_t *lock)
 {
 	return __raw_read_lock_irqsave(lock);
 }
@@ -234,7 +234,7 @@ EXPORT_SYMBOL(_raw_read_lock_irqsave);
 #endif
 
 #ifndef CONFIG_INLINE_READ_LOCK_IRQ
-void __lockfunc _raw_read_lock_irq(rwlock_t *lock)
+noinline void __lockfunc _raw_read_lock_irq(rwlock_t *lock)
 {
 	__raw_read_lock_irq(lock);
 }
@@ -242,7 +242,7 @@ EXPORT_SYMBOL(_raw_read_lock_irq);
 #endif
 
 #ifndef CONFIG_INLINE_READ_LOCK_BH
-void __lockfunc _raw_read_lock_bh(rwlock_t *lock)
+noinline void __lockfunc _raw_read_lock_bh(rwlock_t *lock)
 {
 	__raw_read_lock_bh(lock);
 }
@@ -250,7 +250,7 @@ EXPORT_SYMBOL(_raw_read_lock_bh);
 #endif
 
 #ifndef CONFIG_INLINE_READ_UNLOCK
-void __lockfunc _raw_read_unlock(rwlock_t *lock)
+noinline void __lockfunc _raw_read_unlock(rwlock_t *lock)
 {
 	__raw_read_unlock(lock);
 }
@@ -258,7 +258,7 @@ EXPORT_SYMBOL(_raw_read_unlock);
 #endif
 
 #ifndef CONFIG_INLINE_READ_UNLOCK_IRQRESTORE
-void __lockfunc _raw_read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
+noinline void __lockfunc _raw_read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 {
 	__raw_read_unlock_irqrestore(lock, flags);
 }
@@ -266,7 +266,7 @@ EXPORT_SYMBOL(_raw_read_unlock_irqrestore);
 #endif
 
 #ifndef CONFIG_INLINE_READ_UNLOCK_IRQ
-void __lockfunc _raw_read_unlock_irq(rwlock_t *lock)
+noinline void __lockfunc _raw_read_unlock_irq(rwlock_t *lock)
 {
 	__raw_read_unlock_irq(lock);
 }
@@ -274,7 +274,7 @@ EXPORT_SYMBOL(_raw_read_unlock_irq);
 #endif
 
 #ifndef CONFIG_INLINE_READ_UNLOCK_BH
-void __lockfunc _raw_read_unlock_bh(rwlock_t *lock)
+noinline void __lockfunc _raw_read_unlock_bh(rwlock_t *lock)
 {
 	__raw_read_unlock_bh(lock);
 }
@@ -282,7 +282,7 @@ EXPORT_SYMBOL(_raw_read_unlock_bh);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_TRYLOCK
-int __lockfunc _raw_write_trylock(rwlock_t *lock)
+noinline int __lockfunc _raw_write_trylock(rwlock_t *lock)
 {
 	return __raw_write_trylock(lock);
 }
@@ -290,7 +290,7 @@ EXPORT_SYMBOL(_raw_write_trylock);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_LOCK
-void __lockfunc _raw_write_lock(rwlock_t *lock)
+noinline void __lockfunc _raw_write_lock(rwlock_t *lock)
 {
 	__raw_write_lock(lock);
 }
@@ -298,7 +298,7 @@ EXPORT_SYMBOL(_raw_write_lock);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_LOCK_IRQSAVE
-unsigned long __lockfunc _raw_write_lock_irqsave(rwlock_t *lock)
+noinline unsigned long __lockfunc _raw_write_lock_irqsave(rwlock_t *lock)
 {
 	return __raw_write_lock_irqsave(lock);
 }
@@ -306,7 +306,7 @@ EXPORT_SYMBOL(_raw_write_lock_irqsave);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_LOCK_IRQ
-void __lockfunc _raw_write_lock_irq(rwlock_t *lock)
+noinline void __lockfunc _raw_write_lock_irq(rwlock_t *lock)
 {
 	__raw_write_lock_irq(lock);
 }
@@ -314,7 +314,7 @@ EXPORT_SYMBOL(_raw_write_lock_irq);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_LOCK_BH
-void __lockfunc _raw_write_lock_bh(rwlock_t *lock)
+noinline void __lockfunc _raw_write_lock_bh(rwlock_t *lock)
 {
 	__raw_write_lock_bh(lock);
 }
@@ -322,7 +322,7 @@ EXPORT_SYMBOL(_raw_write_lock_bh);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_UNLOCK
-void __lockfunc _raw_write_unlock(rwlock_t *lock)
+noinline void __lockfunc _raw_write_unlock(rwlock_t *lock)
 {
 	__raw_write_unlock(lock);
 }
@@ -330,7 +330,7 @@ EXPORT_SYMBOL(_raw_write_unlock);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_UNLOCK_IRQRESTORE
-void __lockfunc _raw_write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
+noinline void __lockfunc _raw_write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 {
 	__raw_write_unlock_irqrestore(lock, flags);
 }
@@ -338,7 +338,7 @@ EXPORT_SYMBOL(_raw_write_unlock_irqrestore);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_UNLOCK_IRQ
-void __lockfunc _raw_write_unlock_irq(rwlock_t *lock)
+noinline void __lockfunc _raw_write_unlock_irq(rwlock_t *lock)
 {
 	__raw_write_unlock_irq(lock);
 }
@@ -346,7 +346,7 @@ EXPORT_SYMBOL(_raw_write_unlock_irq);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_UNLOCK_BH
-void __lockfunc _raw_write_unlock_bh(rwlock_t *lock)
+noinline void __lockfunc _raw_write_unlock_bh(rwlock_t *lock)
 {
 	__raw_write_unlock_bh(lock);
 }
-- 
1.8.5.2



* [PATCH 19/20] lto, module: Warn about modules that are not fully LTOed
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
                   ` (17 preceding siblings ...)
  2014-02-18 14:28 ` [PATCH 18/20] lto: Mark spinlocks noinline when inline spinlocks are disabled Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:50   ` Konrad Rzeszutek Wilk
  2014-02-18 14:28 ` [PATCH 20/20] lto: Don't inline __const_udelay Andi Kleen
  2014-02-18 14:34 ` Link Time Optimization patchkit v3 H. Peter Anvin
  20 siblings, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen, rusty

From: Andi Kleen <ak@linux.intel.com>

When a __gnu_lto_* symbol is present, the module has not been run
through LTO yet.

Cc: rusty@rustcorp.com.au
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 kernel/module.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/module.c b/kernel/module.c
index b99e801..4f3eff7 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1949,8 +1949,11 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
 		switch (sym[i].st_shndx) {
 		case SHN_COMMON:
 			/* Ignore common symbols */
-			if (!strncmp(name, "__gnu_lto", 9))
+			if (!strncmp(name, "__gnu_lto", 9)) {
+				pr_info("%s: module not link time optimized\n",
+				       mod->name);
 				break;
+			}
 
 			/* We compiled with -fno-common.  These are not
 			   supposed to happen.  */
-- 
1.8.5.2
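
The prefix test in this patch can also be mirrored outside the kernel, for
example to inspect a module's symbol list before loading it. A minimal
sketch in Python, assuming nm-style input lines ("value type name"); the
helper name and the sample lines are hypothetical and not part of the patch:

```python
# Prefix the patch compares against with strncmp(name, "__gnu_lto", 9).
LTO_PREFIX = "__gnu_lto"


def lto_marker_symbols(nm_lines):
    """Return the symbol names that start with the LTO marker prefix."""
    found = []
    for line in nm_lines:
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        name = fields[-1]  # the symbol name is the last column
        if name.startswith(LTO_PREFIX):
            found.append(name)
    return found


# Hypothetical symbol listing, for illustration only.
sample = [
    "0000000000000001 C __gnu_lto_v1",
    "0000000000000000 T init_module",
    "                 U printk",
]

for name in lto_marker_symbols(sample):
    print("module not link time optimized, marker symbol:", name)
```

An empty result would suggest that no marker symbol is left in the listing.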



* [PATCH 20/20] lto: Don't inline __const_udelay
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
                   ` (18 preceding siblings ...)
  2014-02-18 14:28 ` [PATCH 19/20] lto, module: Warn about modules that are not fully LTOed Andi Kleen
@ 2014-02-18 14:28 ` Andi Kleen
  2014-02-18 14:34 ` Link Time Optimization patchkit v3 H. Peter Anvin
  20 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 14:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

__const_udelay is marked inline, and LTO will happily inline it everywhere.
Dropping the inline saves ~44k of text in a non-LTO build.

13999560        1740864 1499136 17239560        1070e08 vmlinux-with-udelay-inline
13954764        1736768 1499136 17190668        1064f0c vmlinux-wo-udelay-inline

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/lib/delay.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index 39d6a3d..540a320 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -112,7 +112,7 @@ void __delay(unsigned long loops)
 }
 EXPORT_SYMBOL(__delay);
 
-inline void __const_udelay(unsigned long xloops)
+void __const_udelay(unsigned long xloops)
 {
 	int d0;
 
-- 
1.8.5.2
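
The ~44k figure in the commit message can be re-derived from the two size
rows it quotes. A minimal sketch, assuming the columns follow the usual
`size` layout (text, data, bss, dec, hex, filename), since the message does
not show a header row; the helper name is hypothetical:

```python
# Rows copied from the commit message. The column order is an assumption:
# text, data, bss, dec, hex, filename.
ROWS = """\
13999560 1740864 1499136 17239560 1070e08 vmlinux-with-udelay-inline
13954764 1736768 1499136 17190668 1064f0c vmlinux-wo-udelay-inline
"""


def parse_row(line):
    """Split one row into named integer columns."""
    text, data, bss, dec, hexval, name = line.split()
    return {
        "text": int(text),
        "data": int(data),
        "bss": int(bss),
        "dec": int(dec),
        "hex": int(hexval, 16),
        "name": name,
    }


with_inline, without_inline = [parse_row(l) for l in ROWS.splitlines()]

# Per-column difference between the build with and without the inline.
delta = {
    key: with_inline[key] - without_inline[key]
    for key in ("text", "data", "bss", "dec")
}

for key, value in delta.items():
    print(f"{key}: {value} bytes")
```

The text column differs by 44796 bytes, which is the ~44k quoted above; the
data column differs by a further 4096 bytes.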



* Re: Link Time Optimization patchkit v3
  2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
                   ` (19 preceding siblings ...)
  2014-02-18 14:28 ` [PATCH 20/20] lto: Don't inline __const_udelay Andi Kleen
@ 2014-02-18 14:34 ` H. Peter Anvin
  20 siblings, 0 replies; 27+ messages in thread
From: H. Peter Anvin @ 2014-02-18 14:34 UTC (permalink / raw)
  To: Andi Kleen, linux-kernel; +Cc: sam, x86, linux-kbuild, Michal Marek

On 02/18/2014 06:28 AM, Andi Kleen wrote:
> LTO allows the compiler to do global optimization over the whole kernel.
> 
> Updated version of the LTO patchkit, mainly for fixing Sam's review
> comments.  I also rebased to 3.14-rc3 and added a fix for bloat-o-meter
> with gcc 4.9
> 
> See the individual patches for a detailed description
> 
> Dependencies: asmlinkage patchkit (posted two weeks ago), kallsyms patchkit
> (plus LTO capable toolchain, see documentation)
> 
> Full git tree is in
> git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc lto-3.14
> 

Michal, Sam, when this patchset is done, do you want to take it in your
tree or should I?

	-hpa





* Re: [PATCH 13/20] Kbuild, lto: Set TMPDIR for LTO v2
  2014-02-18 14:28 ` [PATCH 13/20] Kbuild, lto: Set TMPDIR for LTO v2 Andi Kleen
@ 2014-02-18 14:35   ` H. Peter Anvin
  0 siblings, 0 replies; 27+ messages in thread
From: H. Peter Anvin @ 2014-02-18 14:35 UTC (permalink / raw)
  To: Andi Kleen, linux-kernel; +Cc: sam, x86, linux-kbuild, Andi Kleen

On 02/18/2014 06:28 AM, Andi Kleen wrote:
>  
> +ifdef CONFIG_LTO
> +# LTO gcc creates a lot of files in TMPDIR, and with /tmp as tmpfs
> +# it's easy to drive the machine OOM. Use the object directory
> +# instead
> +TMPDIR ?= ${objtree}
> +export TMPDIR
> +endif
> +

We still prefer $(...) in Makefiles, no?

	-hpa




* Re: [PATCH 19/20] lto, module: Warn about modules that are not fully LTOed
  2014-02-18 14:28 ` [PATCH 19/20] lto, module: Warn about modules that are not fully LTOed Andi Kleen
@ 2014-02-18 14:50   ` Konrad Rzeszutek Wilk
  2014-02-18 18:52     ` Andi Kleen
  0 siblings, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-02-18 14:50 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, sam, x86, linux-kbuild, Andi Kleen, rusty

On Tue, Feb 18, 2014 at 03:28:57PM +0100, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> When a __gnu_lto_* symbol is present, the module has not been run
> through LTO yet.

The title says 'warn', but this is not a warning, just information.

Can you actually build modules against the kernel with different compiler
options? I thought it would complain about some form of mismatch when
trying to load them?

> 
> Cc: rusty@rustcorp.com.au
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  kernel/module.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index b99e801..4f3eff7 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -1949,8 +1949,11 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
>  		switch (sym[i].st_shndx) {
>  		case SHN_COMMON:
>  			/* Ignore common symbols */
> -			if (!strncmp(name, "__gnu_lto", 9))
> +			if (!strncmp(name, "__gnu_lto", 9)) {
> +				pr_info("%s: module not link time optimized\n",
> +				       mod->name);
>  				break;
> +			}
>  
>  			/* We compiled with -fno-common.  These are not
>  			   supposed to happen.  */
> -- 
> 1.8.5.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


* Re: [PATCH 17/20] Kbuild, bloat-o-meter: Ignore .lto_priv postfix
  2014-02-18 14:28 ` [PATCH 17/20] Kbuild, bloat-o-meter: Ignore .lto_priv postfix Andi Kleen
@ 2014-02-18 14:51   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-02-18 14:51 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, sam, x86, linux-kbuild, Andi Kleen

On Tue, Feb 18, 2014 at 03:28:55PM +0100, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> gcc 4.9 may generate .lto_priv post fixes for LTO functions.
> Ignore those for bloat-o-meter comparisons.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  scripts/bloat-o-meter | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/scripts/bloat-o-meter b/scripts/bloat-o-meter
> index 549d0ab..2becdf6 100755
> --- a/scripts/bloat-o-meter
> +++ b/scripts/bloat-o-meter
> @@ -23,6 +23,7 @@ def getsizes(file):
>              if name == "linux_banner": continue
>              # statics and some other optimizations adds random .NUMBER
>              name = re.sub(r'\.[0-9]+', '', name)
> +	    name = name.replace(".lto_priv", "")

Is it my editor or are you using tabs instead of spaces there?

Not that it matters that much - but it makes the patch look
odd.

>              sym[name] = sym.get(name, 0) + int(size, 16)
>      return sym
>  
> -- 
> 1.8.5.2
> 
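
The name normalization under review can be sketched on its own. This is a
minimal standalone version, not the script itself: `normalize` and
`sum_sizes` are hypothetical helper names, the input pairs are made up, and
only the two substitutions are taken from the quoted diff. It is indented
with spaces throughout:

```python
import re


def normalize(name):
    """Strip compiler-generated postfixes from a symbol name."""
    # Statics and some other optimizations add a random .NUMBER postfix.
    name = re.sub(r'\.[0-9]+', '', name)
    # gcc 4.9 may add a .lto_priv postfix to LTO functions.
    name = name.replace(".lto_priv", "")
    return name


def sum_sizes(pairs):
    """Accumulate hex sizes per normalized symbol name."""
    sym = {}
    for size, name in pairs:
        name = normalize(name)
        sym[name] = sym.get(name, 0) + int(size, 16)
    return sym


# Hypothetical (size, name) pairs, for illustration only.
pairs = [("10", "foo.lto_priv.0"), ("20", "foo"), ("8", "bar.1234")]
print(sum_sizes(pairs))
```

With both substitutions applied, `foo.lto_priv.0` and `foo` are counted as
the same symbol, which is what lets an LTO build be compared against a
non-LTO one.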


* Re: [PATCH 05/20] lto: Handle LTO common symbols in module loader
  2014-02-18 14:28 ` [PATCH 05/20] lto: Handle LTO common symbols in module loader Andi Kleen
@ 2014-02-18 14:53   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-02-18 14:53 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, sam, x86, linux-kbuild, Joe Mario, rusty, Andi Kleen

On Tue, Feb 18, 2014 at 03:28:43PM +0100, Andi Kleen wrote:
> From: Joe Mario <jmario@redhat.com>
> 
> Here is the workaround I made for having the kernel not reject modules
> built with -flto.  The clean solution would be to get the compiler to not
> emit the symbol.  Or if it has to emit the symbol, then emit it as
> initialized data but put it into a comdat/linkonce section.
> 
> Minor tweaks by AK over Joe's patch.

Should Joe's SOB be on this patch?
> 
> Cc: rusty@rustcorp.com.au
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  kernel/module.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index d24fcf2..b99e801 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -1948,6 +1948,10 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
>  
>  		switch (sym[i].st_shndx) {
>  		case SHN_COMMON:
> +			/* Ignore common symbols */
> +			if (!strncmp(name, "__gnu_lto", 9))
> +				break;
> +
>  			/* We compiled with -fno-common.  These are not
>  			   supposed to happen.  */
>  			pr_debug("Common symbol: %s\n", name);
> -- 
> 1.8.5.2
> 


* Re: [PATCH 19/20] lto, module: Warn about modules that are not fully LTOed
  2014-02-18 14:50   ` Konrad Rzeszutek Wilk
@ 2014-02-18 18:52     ` Andi Kleen
  0 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2014-02-18 18:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andi Kleen, linux-kernel, sam, x86, linux-kbuild, Andi Kleen, rusty

On Tue, Feb 18, 2014 at 09:50:39AM -0500, Konrad Rzeszutek Wilk wrote:
> On Tue, Feb 18, 2014 at 03:28:57PM +0100, Andi Kleen wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> > 
> > When a __gnu_lto_* symbol is present, the module has not been run
> > through LTO yet.
> 
> The title says 'warn', but this is not a warning, just information.
> 
> Can you actually build modules against the kernel with different compiler
> options? I thought it would complain about some form of mismatch when
> trying to load them?

Sure, it has worked forever. (I think the only reason we still check
compiler versions is some ancient, long-obsolete ABI problem.)

LTO is also fully ABI compatible.

Hmm, I think this patch can actually be removed now, because I default
to slim LTO, and with that the module wouldn't even load if this happens.
I'll remove it.

-Andi


end of thread (last message 2014-02-18 18:52 UTC)

Thread overview: 27+ messages
2014-02-18 14:28 Link Time Optimization patchkit v3 Andi Kleen
2014-02-18 14:28 ` [PATCH 01/20] x86, lto: Disable LTO for the x86 VDSO Andi Kleen
2014-02-18 14:28 ` [PATCH 02/20] lto: Disable LTO for hweight functions Andi Kleen
2014-02-18 14:28 ` [PATCH 03/20] lto: Make asmlinkage __visible Andi Kleen
2014-02-18 14:28 ` [PATCH 04/20] lto, workaround: Add workaround for initcall reordering Andi Kleen
2014-02-18 14:28 ` [PATCH 05/20] lto: Handle LTO common symbols in module loader Andi Kleen
2014-02-18 14:53   ` Konrad Rzeszutek Wilk
2014-02-18 14:28 ` [PATCH 06/20] lto: Disable LTO for sys_ni Andi Kleen
2014-02-18 14:28 ` [PATCH 07/20] lto: Don't let LATENCYTOP and LOCKDEP select KALLSYMS_ALL Andi Kleen
2014-02-18 14:28 ` [PATCH 08/20] Kbuild, lto, workaround: Don't warn for initcall_reference in modpost Andi Kleen
2014-02-18 14:28 ` [PATCH 09/20] Kbuild, lto: Drop .number postfixes " Andi Kleen
2014-02-18 14:28 ` [PATCH 10/20] Kbuild, lto: add ld-version and ld-ifversion macros Andi Kleen
2014-02-18 14:28 ` [PATCH 11/20] Kbuild, lto: Add a gcc-ld script to let run gcc as ld Andi Kleen
2014-02-18 14:28 ` [PATCH 12/20] Kbuild, lto: Disable LTO for asm-offsets.c Andi Kleen
2014-02-18 14:28 ` [PATCH 13/20] Kbuild, lto: Set TMPDIR for LTO v2 Andi Kleen
2014-02-18 14:35   ` H. Peter Anvin
2014-02-18 14:28 ` [PATCH 14/20] Kbuild, lto: Handle basic LTO in modpost Andi Kleen
2014-02-18 14:28 ` [PATCH 15/20] Kbuild, lto: Fix single pass kallsyms for LTO Andi Kleen
2014-02-18 14:28 ` [PATCH 16/20] Kbuild, lto: Add Link Time Optimization support v2 Andi Kleen
2014-02-18 14:28 ` [PATCH 17/20] Kbuild, bloat-o-meter: Ignore .lto_priv postfix Andi Kleen
2014-02-18 14:51   ` Konrad Rzeszutek Wilk
2014-02-18 14:28 ` [PATCH 18/20] lto: Mark spinlocks noinline when inline spinlocks are disabled Andi Kleen
2014-02-18 14:28 ` [PATCH 19/20] lto, module: Warn about modules that are not fully LTOed Andi Kleen
2014-02-18 14:50   ` Konrad Rzeszutek Wilk
2014-02-18 18:52     ` Andi Kleen
2014-02-18 14:28 ` [PATCH 20/20] lto: Don't inline __const_udelay Andi Kleen
2014-02-18 14:34 ` Link Time Optimization patchkit v3 H. Peter Anvin
