* Link time optimization for LTO/x86
@ 2017-11-27 21:34 Andi Kleen
  2017-11-27 21:34 ` [PATCH 01/21] x86/xen: Mark pv stub assembler symbol visible Andi Kleen
                   ` (22 more replies)
  0 siblings, 23 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm

This is an updated version of my older LTO patchkit for gcc/x86.
This version doesn't need special binutils, but requires gcc 5+.
It is also compatible with nearly all options (except MODVERSIONS).

This allows the compiler to optimize across source files and throw
away unnecessary functions.

It also found various problems in source files; these are
fixed in the first few patches.

There are still some minor issues; see the individual files.
It also still does a double/triple link for KALLSYMS, which increases
the build time.

Available at

git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc lto-415-2

-Andi


* [PATCH 01/21] x86/xen: Mark pv stub assembler symbol visible
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-27 21:34 ` [PATCH 02/21] afs: Fix const confusion in AFS Andi Kleen
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

With LTO any external assembler symbol has to be marked __visible.
Mark the generated asm PV stubs __visible to prevent a linker error.
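
For illustration only (this is not part of the patch), a minimal
user-space sketch of the same situation: a C function that is reached
only by name from a top-level asm stub. All demo_* names are
hypothetical. DEMO_VISIBLE stands in for the kernel's __visible, which
expands to GCC's externally_visible attribute; the asm stub assumes
x86-64 ELF, with a plain C fallback elsewhere.

```c
#include <assert.h>

/* The kernel's __visible expands to externally_visible on GCC; compilers
 * without that attribute get "used" here (an assumption of this sketch). */
#if defined(__GNUC__) && !defined(__clang__)
#define DEMO_VISIBLE __attribute__((externally_visible))
#else
#define DEMO_VISIBLE __attribute__((used))
#endif

/* Referenced only by name from the assembler stub below, so the
 * compiler must be told not to localize or discard it under LTO. */
DEMO_VISIBLE long demo_pv_target(long x)
{
	return x + 1;
}

/* C prototype for a stub whose body lives in top-level asm. */
extern long demo_pv_thunk(long x);

#if defined(__x86_64__) && defined(__ELF__)
__asm__(".pushsection .text\n\t"
	".globl demo_pv_thunk\n\t"
	".type demo_pv_thunk, @function\n"
	"demo_pv_thunk:\n\t"
	"jmp demo_pv_target\n\t"
	".size demo_pv_thunk, .-demo_pv_thunk\n\t"
	".popsection");
#else
/* Portable fallback so the sketch still builds on other targets. */
long demo_pv_thunk(long x)
{
	return demo_pv_target(x);
}
#endif

/* Calling the stub reaches the C function through the asm reference. */
void demo_pv_check(void)
{
	assert(demo_pv_thunk(41) == 42);
}
```

Without the attribute, an LTO build is free to turn demo_pv_target into
a local symbol because it sees no C caller, and the asm reference then
fails to resolve at link time.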

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/include/asm/paravirt.h | 3 ++-
 drivers/xen/time.c              | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 283efcaac8af..cf2861aa4df0 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -745,7 +745,8 @@ static __always_inline bool pv_vcpu_is_preempted(long cpu)
  */
 #define PV_THUNK_NAME(func) "__raw_callee_save_" #func
 #define PV_CALLEE_SAVE_REGS_THUNK(func)					\
-	extern typeof(func) __raw_callee_save_##func;			\
+	extern __visible typeof(func) __raw_callee_save_##func;		\
+	extern __visible typeof(func) func;				\
 									\
 	asm(".pushsection .text;"					\
 	    ".globl " PV_THUNK_NAME(func) ";"				\
diff --git a/drivers/xen/time.c b/drivers/xen/time.c
index 3e741cd1409c..708a00c337d7 100644
--- a/drivers/xen/time.c
+++ b/drivers/xen/time.c
@@ -144,7 +144,7 @@ void xen_get_runstate_snapshot(struct vcpu_runstate_info *res)
 }
 
 /* return true when a vcpu could run but has no real cpu to run on */
-bool xen_vcpu_stolen(int vcpu)
+__visible bool xen_vcpu_stolen(int vcpu)
 {
 	return per_cpu(xen_runstate, vcpu).state == RUNSTATE_runnable;
 }
-- 
2.13.6



* [PATCH 02/21] afs: Fix const confusion in AFS
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
  2017-11-27 21:34 ` [PATCH 01/21] x86/xen: Mark pv stub assembler symbol visible Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-27 21:34 ` [PATCH 03/21] x86/timer: Don't inline __const_udelay Andi Kleen
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen, dhowells

From: Andi Kleen <ak@linux.intel.com>

A trace point string cannot be const because the underlying special
section is not marked const. An LTO build complains about the
section attribute mismatch. Fix it by not marking the trace point
string in afs const.
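
A small user-space sketch of the rule, assuming an ELF target and GCC
or clang. The section name and all demo_* names are made up for the
example. A section that already holds a writable object (here a
pointer, similar to what the kernel's tracepoint_string() helper
places in its section) cannot also take a const object, so the string
array is left non-const.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for __tracepoint_string. */
#define DEMO_TP_STRING __attribute__((section("demo_tracepoint_str"), used))

/* A writable pointer object: this makes the section writable. */
static const char *demo_tp_fmt DEMO_TP_STRING = "CB.%s";

/* Deliberately not const: a const array in the same section would give
 * the section conflicting read-only and writable flags. */
char demo_CB_Probe_name[] DEMO_TP_STRING = "CB.Probe";

/* Both objects are ordinary data and can be read as usual. */
void demo_tp_check(void)
{
	assert(strcmp(demo_tp_fmt, "CB.%s") == 0);
	assert(strlen(demo_CB_Probe_name) == 8);
}
```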

Cc: dhowells@redhat.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 fs/afs/cmservice.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index 41e277f57b20..0e9ea0f8d620 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -31,7 +31,7 @@ static void SRXAFSCB_ProbeUuid(struct work_struct *);
 static void SRXAFSCB_TellMeAboutYourself(struct work_struct *);
 
 #define CM_NAME(name) \
-	const char afs_SRXCB##name##_name[] __tracepoint_string =	\
+	char afs_SRXCB##name##_name[] __tracepoint_string =	\
 		"CB." #name
 
 /*
-- 
2.13.6



* [PATCH 03/21] x86/timer: Don't inline __const_udelay
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
  2017-11-27 21:34 ` [PATCH 01/21] x86/xen: Mark pv stub assembler symbol visible Andi Kleen
  2017-11-27 21:34 ` [PATCH 02/21] afs: Fix const confusion in AFS Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-27 21:34 ` [PATCH 04/21] locking/spinlocks: Mark spinlocks noinline when inline spinlocks are disabled Andi Kleen
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

__const_udelay is marked inline, and LTO will happily inline it everywhere.
Dropping the inline saves ~44k text in an LTO build.

    text           data     bss     dec             hex     filename
13999560        1740864 1499136 17239560        1070e08 vmlinux-with-udelay-inline
13954764        1736768 1499136 17190668        1064f0c vmlinux-wo-udelay-inline

Even without LTO I believe dropping the inline documents the intent correctly.
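
The ~44k figure can be re-derived from the two rows above. The sketch
below assumes the columns are in size(1) order (text, data, bss, dec,
hex); the struct and function names are made up for the example.

```c
#include <assert.h>

/* One row of size(1) output: text, data and bss in bytes. */
struct demo_size_row {
	unsigned long text, data, bss;
};

static const struct demo_size_row demo_with_inline = {
	13999560UL, 1740864UL, 1499136UL
};
static const struct demo_size_row demo_without_inline = {
	13954764UL, 1736768UL, 1499136UL
};

/* The "dec" column is the sum of the three sections. */
unsigned long demo_total(const struct demo_size_row *r)
{
	return r->text + r->data + r->bss;
}

/* Text bytes saved by dropping the inline. */
unsigned long demo_text_saved(void)
{
	return demo_with_inline.text - demo_without_inline.text;
}

void demo_size_check(void)
{
	assert(demo_total(&demo_with_inline) == 17239560UL);
	assert(demo_total(&demo_without_inline) == 17190668UL);
	assert(demo_text_saved() == 44796UL);	/* roughly 44k */
}
```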

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/lib/delay.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index 553f8fd23cc4..09c83b2f80d2 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -162,7 +162,7 @@ void __delay(unsigned long loops)
 }
 EXPORT_SYMBOL(__delay);
 
-inline void __const_udelay(unsigned long xloops)
+void __const_udelay(unsigned long xloops)
 {
 	unsigned long lpj = this_cpu_read(cpu_info.loops_per_jiffy) ? : loops_per_jiffy;
 	int d0;
-- 
2.13.6



* [PATCH 04/21] locking/spinlocks: Mark spinlocks noinline when inline spinlocks are disabled
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (2 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 03/21] x86/timer: Don't inline __const_udelay Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-27 21:34 ` [PATCH 05/21] x86/kvm: Make steal_time visible Andi Kleen
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Otherwise LTO will inline them anyway and cause a large
kernel text increase.

Since the explicit intention here is to not inline them, marking
them noinline is good documentation even for the non-LTO case.
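
The pattern can be sketched outside the kernel as follows. The lock
type and all demo_* names are hypothetical, and the lock is a plain
flag with no atomics, so this shows only the attribute placement. The
fast path stays a static inline helper, while the out-of-line wrapper
carries an explicit noinline so that a whole-program optimizer does
not fold it back into every caller.

```c
#include <assert.h>

#define demo_noinline __attribute__((noinline))

typedef struct {
	int locked;
} demo_lock_t;

/* Inline fast path, analogous to the __raw_* helpers. */
static inline int demo_trylock_inline(demo_lock_t *lock)
{
	if (lock->locked)
		return 0;
	lock->locked = 1;
	return 1;
}

/* Out-of-line entry point: one copy of the code, called everywhere. */
demo_noinline int demo_trylock(demo_lock_t *lock)
{
	return demo_trylock_inline(lock);
}

void demo_lock_check(void)
{
	demo_lock_t lock = { 0 };

	assert(demo_trylock(&lock) == 1);	/* first attempt succeeds */
	assert(demo_trylock(&lock) == 0);	/* already held */
}
```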

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 kernel/locking/spinlock.c | 56 +++++++++++++++++++++++------------------------
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c
index 1fd1a7543cdd..09d0dce7fa5d 100644
--- a/kernel/locking/spinlock.c
+++ b/kernel/locking/spinlock.c
@@ -130,7 +130,7 @@ BUILD_LOCK_OPS(write, rwlock);
 #endif
 
 #ifndef CONFIG_INLINE_SPIN_TRYLOCK
-int __lockfunc _raw_spin_trylock(raw_spinlock_t *lock)
+noinline int __lockfunc _raw_spin_trylock(raw_spinlock_t *lock)
 {
 	return __raw_spin_trylock(lock);
 }
@@ -138,7 +138,7 @@ EXPORT_SYMBOL(_raw_spin_trylock);
 #endif
 
 #ifndef CONFIG_INLINE_SPIN_TRYLOCK_BH
-int __lockfunc _raw_spin_trylock_bh(raw_spinlock_t *lock)
+noinline int __lockfunc _raw_spin_trylock_bh(raw_spinlock_t *lock)
 {
 	return __raw_spin_trylock_bh(lock);
 }
@@ -146,7 +146,7 @@ EXPORT_SYMBOL(_raw_spin_trylock_bh);
 #endif
 
 #ifndef CONFIG_INLINE_SPIN_LOCK
-void __lockfunc _raw_spin_lock(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_lock(raw_spinlock_t *lock)
 {
 	__raw_spin_lock(lock);
 }
@@ -154,7 +154,7 @@ EXPORT_SYMBOL(_raw_spin_lock);
 #endif
 
 #ifndef CONFIG_INLINE_SPIN_LOCK_IRQSAVE
-unsigned long __lockfunc _raw_spin_lock_irqsave(raw_spinlock_t *lock)
+noinline unsigned long __lockfunc _raw_spin_lock_irqsave(raw_spinlock_t *lock)
 {
 	return __raw_spin_lock_irqsave(lock);
 }
@@ -162,7 +162,7 @@ EXPORT_SYMBOL(_raw_spin_lock_irqsave);
 #endif
 
 #ifndef CONFIG_INLINE_SPIN_LOCK_IRQ
-void __lockfunc _raw_spin_lock_irq(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_lock_irq(raw_spinlock_t *lock)
 {
 	__raw_spin_lock_irq(lock);
 }
@@ -170,7 +170,7 @@ EXPORT_SYMBOL(_raw_spin_lock_irq);
 #endif
 
 #ifndef CONFIG_INLINE_SPIN_LOCK_BH
-void __lockfunc _raw_spin_lock_bh(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_lock_bh(raw_spinlock_t *lock)
 {
 	__raw_spin_lock_bh(lock);
 }
@@ -178,7 +178,7 @@ EXPORT_SYMBOL(_raw_spin_lock_bh);
 #endif
 
 #ifdef CONFIG_UNINLINE_SPIN_UNLOCK
-void __lockfunc _raw_spin_unlock(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_unlock(raw_spinlock_t *lock)
 {
 	__raw_spin_unlock(lock);
 }
@@ -186,7 +186,7 @@ EXPORT_SYMBOL(_raw_spin_unlock);
 #endif
 
 #ifndef CONFIG_INLINE_SPIN_UNLOCK_IRQRESTORE
-void __lockfunc _raw_spin_unlock_irqrestore(raw_spinlock_t *lock, unsigned long flags)
+noinline void __lockfunc _raw_spin_unlock_irqrestore(raw_spinlock_t *lock, unsigned long flags)
 {
 	__raw_spin_unlock_irqrestore(lock, flags);
 }
@@ -194,7 +194,7 @@ EXPORT_SYMBOL(_raw_spin_unlock_irqrestore);
 #endif
 
 #ifndef CONFIG_INLINE_SPIN_UNLOCK_IRQ
-void __lockfunc _raw_spin_unlock_irq(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_unlock_irq(raw_spinlock_t *lock)
 {
 	__raw_spin_unlock_irq(lock);
 }
@@ -202,7 +202,7 @@ EXPORT_SYMBOL(_raw_spin_unlock_irq);
 #endif
 
 #ifndef CONFIG_INLINE_SPIN_UNLOCK_BH
-void __lockfunc _raw_spin_unlock_bh(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_unlock_bh(raw_spinlock_t *lock)
 {
 	__raw_spin_unlock_bh(lock);
 }
@@ -210,7 +210,7 @@ EXPORT_SYMBOL(_raw_spin_unlock_bh);
 #endif
 
 #ifndef CONFIG_INLINE_READ_TRYLOCK
-int __lockfunc _raw_read_trylock(rwlock_t *lock)
+noinline int __lockfunc _raw_read_trylock(rwlock_t *lock)
 {
 	return __raw_read_trylock(lock);
 }
@@ -218,7 +218,7 @@ EXPORT_SYMBOL(_raw_read_trylock);
 #endif
 
 #ifndef CONFIG_INLINE_READ_LOCK
-void __lockfunc _raw_read_lock(rwlock_t *lock)
+noinline void __lockfunc _raw_read_lock(rwlock_t *lock)
 {
 	__raw_read_lock(lock);
 }
@@ -226,7 +226,7 @@ EXPORT_SYMBOL(_raw_read_lock);
 #endif
 
 #ifndef CONFIG_INLINE_READ_LOCK_IRQSAVE
-unsigned long __lockfunc _raw_read_lock_irqsave(rwlock_t *lock)
+noinline unsigned long __lockfunc _raw_read_lock_irqsave(rwlock_t *lock)
 {
 	return __raw_read_lock_irqsave(lock);
 }
@@ -234,7 +234,7 @@ EXPORT_SYMBOL(_raw_read_lock_irqsave);
 #endif
 
 #ifndef CONFIG_INLINE_READ_LOCK_IRQ
-void __lockfunc _raw_read_lock_irq(rwlock_t *lock)
+noinline void __lockfunc _raw_read_lock_irq(rwlock_t *lock)
 {
 	__raw_read_lock_irq(lock);
 }
@@ -242,7 +242,7 @@ EXPORT_SYMBOL(_raw_read_lock_irq);
 #endif
 
 #ifndef CONFIG_INLINE_READ_LOCK_BH
-void __lockfunc _raw_read_lock_bh(rwlock_t *lock)
+noinline void __lockfunc _raw_read_lock_bh(rwlock_t *lock)
 {
 	__raw_read_lock_bh(lock);
 }
@@ -250,7 +250,7 @@ EXPORT_SYMBOL(_raw_read_lock_bh);
 #endif
 
 #ifndef CONFIG_INLINE_READ_UNLOCK
-void __lockfunc _raw_read_unlock(rwlock_t *lock)
+noinline void __lockfunc _raw_read_unlock(rwlock_t *lock)
 {
 	__raw_read_unlock(lock);
 }
@@ -258,7 +258,7 @@ EXPORT_SYMBOL(_raw_read_unlock);
 #endif
 
 #ifndef CONFIG_INLINE_READ_UNLOCK_IRQRESTORE
-void __lockfunc _raw_read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
+noinline void __lockfunc _raw_read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 {
 	__raw_read_unlock_irqrestore(lock, flags);
 }
@@ -266,7 +266,7 @@ EXPORT_SYMBOL(_raw_read_unlock_irqrestore);
 #endif
 
 #ifndef CONFIG_INLINE_READ_UNLOCK_IRQ
-void __lockfunc _raw_read_unlock_irq(rwlock_t *lock)
+noinline void __lockfunc _raw_read_unlock_irq(rwlock_t *lock)
 {
 	__raw_read_unlock_irq(lock);
 }
@@ -274,7 +274,7 @@ EXPORT_SYMBOL(_raw_read_unlock_irq);
 #endif
 
 #ifndef CONFIG_INLINE_READ_UNLOCK_BH
-void __lockfunc _raw_read_unlock_bh(rwlock_t *lock)
+noinline void __lockfunc _raw_read_unlock_bh(rwlock_t *lock)
 {
 	__raw_read_unlock_bh(lock);
 }
@@ -282,7 +282,7 @@ EXPORT_SYMBOL(_raw_read_unlock_bh);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_TRYLOCK
-int __lockfunc _raw_write_trylock(rwlock_t *lock)
+noinline int __lockfunc _raw_write_trylock(rwlock_t *lock)
 {
 	return __raw_write_trylock(lock);
 }
@@ -290,7 +290,7 @@ EXPORT_SYMBOL(_raw_write_trylock);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_LOCK
-void __lockfunc _raw_write_lock(rwlock_t *lock)
+noinline void __lockfunc _raw_write_lock(rwlock_t *lock)
 {
 	__raw_write_lock(lock);
 }
@@ -298,7 +298,7 @@ EXPORT_SYMBOL(_raw_write_lock);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_LOCK_IRQSAVE
-unsigned long __lockfunc _raw_write_lock_irqsave(rwlock_t *lock)
+noinline unsigned long __lockfunc _raw_write_lock_irqsave(rwlock_t *lock)
 {
 	return __raw_write_lock_irqsave(lock);
 }
@@ -306,7 +306,7 @@ EXPORT_SYMBOL(_raw_write_lock_irqsave);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_LOCK_IRQ
-void __lockfunc _raw_write_lock_irq(rwlock_t *lock)
+noinline void __lockfunc _raw_write_lock_irq(rwlock_t *lock)
 {
 	__raw_write_lock_irq(lock);
 }
@@ -314,7 +314,7 @@ EXPORT_SYMBOL(_raw_write_lock_irq);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_LOCK_BH
-void __lockfunc _raw_write_lock_bh(rwlock_t *lock)
+noinline void __lockfunc _raw_write_lock_bh(rwlock_t *lock)
 {
 	__raw_write_lock_bh(lock);
 }
@@ -322,7 +322,7 @@ EXPORT_SYMBOL(_raw_write_lock_bh);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_UNLOCK
-void __lockfunc _raw_write_unlock(rwlock_t *lock)
+noinline void __lockfunc _raw_write_unlock(rwlock_t *lock)
 {
 	__raw_write_unlock(lock);
 }
@@ -330,7 +330,7 @@ EXPORT_SYMBOL(_raw_write_unlock);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_UNLOCK_IRQRESTORE
-void __lockfunc _raw_write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
+noinline void __lockfunc _raw_write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 {
 	__raw_write_unlock_irqrestore(lock, flags);
 }
@@ -338,7 +338,7 @@ EXPORT_SYMBOL(_raw_write_unlock_irqrestore);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_UNLOCK_IRQ
-void __lockfunc _raw_write_unlock_irq(rwlock_t *lock)
+noinline void __lockfunc _raw_write_unlock_irq(rwlock_t *lock)
 {
 	__raw_write_unlock_irq(lock);
 }
@@ -346,7 +346,7 @@ EXPORT_SYMBOL(_raw_write_unlock_irq);
 #endif
 
 #ifndef CONFIG_INLINE_WRITE_UNLOCK_BH
-void __lockfunc _raw_write_unlock_bh(rwlock_t *lock)
+noinline void __lockfunc _raw_write_unlock_bh(rwlock_t *lock)
 {
 	__raw_write_unlock_bh(lock);
 }
-- 
2.13.6



* [PATCH 05/21] x86/kvm: Make steal_time visible
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (3 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 04/21] locking/spinlocks: Mark spinlocks noinline when inline spinlocks are disabled Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-27 21:34 ` [PATCH 06/21] x86/syscalls: Make x86 syscalls use real prototypes Andi Kleen
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

This per-cpu variable is accessed from assembler code, so it needs
to be visible.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index b40ffbf156c1..8484e3e41d36 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -76,7 +76,7 @@ static int parse_no_kvmclock_vsyscall(char *arg)
 early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
 
 static DEFINE_PER_CPU_DECRYPTED(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
-static DEFINE_PER_CPU_DECRYPTED(struct kvm_steal_time, steal_time) __aligned(64);
+DEFINE_PER_CPU_DECRYPTED(struct kvm_steal_time, steal_time) __aligned(64) __visible;
 static int has_steal_clock = 0;
 
 /*
-- 
2.13.6



* [PATCH 06/21] x86/syscalls: Make x86 syscalls use real prototypes
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (4 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 05/21] x86/kvm: Make steal_time visible Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-27 21:34 ` [PATCH 07/21] x86: Make exception handler functions visible Andi Kleen
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

LTO complains very loudly that the x86 syscalls use their own different
prototypes. Switch the tables to use the real prototypes instead. This
requires adding a few extra prototypes to asm/syscalls.h.

This is a generic cleanup, useful even without LTO.
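
A reduced sketch of the table construction, with hypothetical demo_*
names and a three-argument pointer type standing in for
sys_call_ptr_t. Each handler keeps its real prototype and is cast to
the table's pointer type at the point of use, instead of being
redeclared with a uniform fake signature. The [first ... last] range
designator is a GNU C extension. The sketch only inspects the table;
it does not call through the cast pointers.

```c
#include <assert.h>

typedef long (*demo_call_ptr_t)(unsigned long, unsigned long, unsigned long);

/* Handlers declared once, with their real and differing prototypes. */
long demo_ni_syscall(void)
{
	return -38;	/* -ENOSYS on Linux */
}

long demo_getid(void)
{
	return 1000;
}

long demo_add(unsigned long a, unsigned long b)
{
	return (long)(a + b);
}

enum { DEMO_NR_getid = 1, DEMO_NR_add = 3, DEMO_NR_max = 7 };

/* Default every slot to the "not implemented" handler, then override. */
const demo_call_ptr_t demo_call_table[DEMO_NR_max + 1] = {
	[0 ... DEMO_NR_max] = (demo_call_ptr_t)&demo_ni_syscall,
	[DEMO_NR_getid] = (demo_call_ptr_t)demo_getid,
	[DEMO_NR_add] = (demo_call_ptr_t)demo_add,
};

/* A slot counts as implemented if it no longer holds the default. */
int demo_is_implemented(unsigned int nr)
{
	return nr <= (unsigned int)DEMO_NR_max &&
	       demo_call_table[nr] != (demo_call_ptr_t)&demo_ni_syscall;
}

void demo_table_check(void)
{
	assert(demo_is_implemented(DEMO_NR_add));
	assert(!demo_is_implemented(0));
}
```

Because the declarations seen by the table are the same ones seen by
the definitions, a whole-program type check has nothing to complain
about; the cast is confined to the table initializer.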

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/entry/syscall_32.c     | 13 +++++--------
 arch/x86/entry/syscall_64.c     | 13 +++++--------
 arch/x86/include/asm/syscalls.h | 42 ++++++++++++++++++++++++++++++++++++-----
 3 files changed, 47 insertions(+), 21 deletions(-)

diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
index 95c294963612..f7ffd4b100fd 100644
--- a/arch/x86/entry/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -6,20 +6,17 @@
 #include <linux/cache.h>
 #include <asm/asm-offsets.h>
 #include <asm/syscall.h>
+#include <linux/syscalls.h>
+#include <asm/syscalls.h>
+#include <asm/sys_ia32.h>
 
-#define __SYSCALL_I386(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
-#include <asm/syscalls_32.h>
-#undef __SYSCALL_I386
-
-#define __SYSCALL_I386(nr, sym, qual) [nr] = sym,
-
-extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_I386(nr, sym, qual) [nr] = (sys_call_ptr_t)sym,
 
 __visible const sys_call_ptr_t ia32_sys_call_table[__NR_syscall_compat_max+1] = {
 	/*
 	 * Smells like a compiler bug -- it doesn't work
 	 * when the & below is removed.
 	 */
-	[0 ... __NR_syscall_compat_max] = &sys_ni_syscall,
+	[0 ... __NR_syscall_compat_max] = (sys_call_ptr_t)&sys_ni_syscall,
 #include <asm/syscalls_32.h>
 };
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 9c09775e589d..1ccf4cd300a6 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -4,25 +4,22 @@
 #include <linux/linkage.h>
 #include <linux/sys.h>
 #include <linux/cache.h>
+#include <linux/syscalls.h>
+#include <linux/compat.h>
 #include <asm/asm-offsets.h>
 #include <asm/syscall.h>
+#include <asm/syscalls.h>
 
 #define __SYSCALL_64_QUAL_(sym) sym
 #define __SYSCALL_64_QUAL_ptregs(sym) ptregs_##sym
 
-#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long __SYSCALL_64_QUAL_##qual(sym)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
-#include <asm/syscalls_64.h>
-#undef __SYSCALL_64
-
-#define __SYSCALL_64(nr, sym, qual) [nr] = __SYSCALL_64_QUAL_##qual(sym),
-
-extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_64(nr, sym, qual) [nr] = (sys_call_ptr_t)__SYSCALL_64_QUAL_##qual(sym),
 
 asmlinkage const sys_call_ptr_t sys_call_table[__NR_syscall_max+1] = {
 	/*
 	 * Smells like a compiler bug -- it doesn't work
 	 * when the & below is removed.
 	 */
-	[0 ... __NR_syscall_max] = &sys_ni_syscall,
+	[0 ... __NR_syscall_max] = (sys_call_ptr_t)&sys_ni_syscall,
 #include <asm/syscalls_64.h>
 };
diff --git a/arch/x86/include/asm/syscalls.h b/arch/x86/include/asm/syscalls.h
index bad25bb80679..76bf99c8b151 100644
--- a/arch/x86/include/asm/syscalls.h
+++ b/arch/x86/include/asm/syscalls.h
@@ -14,6 +14,7 @@
 #include <linux/linkage.h>
 #include <linux/signal.h>
 #include <linux/types.h>
+#include <linux/compat.h>
 
 /* Common in X86_32 and X86_64 */
 /* kernel/ioport.c */
@@ -30,6 +31,8 @@ asmlinkage long sys_rt_sigreturn(void);
 asmlinkage long sys_set_thread_area(struct user_desc __user *);
 asmlinkage long sys_get_thread_area(struct user_desc __user *);
 
+asmlinkage long sys_arch_prctl(int, unsigned long);
+
 /* X86_32 only */
 #ifdef CONFIG_X86_32
 
@@ -43,13 +46,42 @@ asmlinkage long sys_vm86(unsigned long, unsigned long);
 
 #else /* CONFIG_X86_32 */
 
-/* X86_64 only */
-/* kernel/process_64.c */
-asmlinkage long sys_arch_prctl(int, unsigned long);
-
 /* kernel/sys_x86_64.c */
 asmlinkage long sys_mmap(unsigned long, unsigned long, unsigned long,
 			 unsigned long, unsigned long, unsigned long);
 
-#endif /* CONFIG_X86_32 */
+asmlinkage long ptregs_sys_rt_sigreturn(struct pt_regs *regs);
+asmlinkage long ptregs_sys_fork(struct pt_regs *regs);
+asmlinkage long ptregs_sys_vfork(struct pt_regs *regs);
+asmlinkage long ptregs_sys_execve(const char __user *filename,
+		const char __user *const __user *argv,
+		const char __user *const __user *envp);
+asmlinkage long ptregs_sys_iopl(unsigned int);
+asmlinkage long ptregs_sys_execveat(int dfd, const char __user *filename,
+			const char __user *const __user *argv,
+			const char __user *const __user *envp, int flags);
+asmlinkage long ptregs_sys_clone(unsigned long, unsigned long, int __user *,
+	       int __user *, unsigned long);
+
+#ifdef CONFIG_COMPAT
+asmlinkage long compat_sys_preadv64v2(unsigned long fd,
+	       const struct compat_iovec __user *vec,
+	       unsigned long vlen, loff_t pos, int flags);
+asmlinkage long ptregs_compat_sys_execve(unsigned long dfd,
+		 const char __user *filename,
+		 const compat_uptr_t __user *argv,
+		 const compat_uptr_t __user *envp);
+asmlinkage long ptregs_compat_sys_execveat(int dfd, const char __user *filename,
+		     const compat_uptr_t __user *argv,
+		     const compat_uptr_t __user *envp, int flags);
+asmlinkage long compat_sys_old_getrlimit(unsigned int resource,
+	struct compat_rlimit __user *rlim);
+asmlinkage long stub32_clone(unsigned, unsigned, int __user *,
+	       compat_uptr_t __user *, unsigned);
+#endif
+
+asmlinkage long sys32_x32_rt_sigreturn(void);
+
+
+#endif /* !CONFIG_X86_32 */
 #endif /* _ASM_X86_SYSCALLS_H */
-- 
2.13.6



* [PATCH 07/21] x86: Make exception handler functions visible
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (5 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 06/21] x86/syscalls: Make x86 syscalls use real prototypes Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-27 21:34 ` [PATCH 08/21] x86/idt: Make const __initconst Andi Kleen
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Make the C exception handler functions that are directly called through
exception tables visible. LTO needs to know they are accessed from assembler.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/mm/extable.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 3321b446b66c..abe60607e8b9 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -20,7 +20,7 @@ ex_fixup_handler(const struct exception_table_entry *x)
 	return (ex_handler_t)((unsigned long)&x->handler + x->handler);
 }
 
-bool ex_handler_default(const struct exception_table_entry *fixup,
+__visible bool ex_handler_default(const struct exception_table_entry *fixup,
 		       struct pt_regs *regs, int trapnr)
 {
 	regs->ip = ex_fixup_addr(fixup);
@@ -28,7 +28,7 @@ bool ex_handler_default(const struct exception_table_entry *fixup,
 }
 EXPORT_SYMBOL(ex_handler_default);
 
-bool ex_handler_fault(const struct exception_table_entry *fixup,
+__visible bool ex_handler_fault(const struct exception_table_entry *fixup,
 		     struct pt_regs *regs, int trapnr)
 {
 	regs->ip = ex_fixup_addr(fixup);
@@ -41,7 +41,7 @@ EXPORT_SYMBOL_GPL(ex_handler_fault);
  * Handler for UD0 exception following a failed test against the
  * result of a refcount inc/dec/add/sub.
  */
-bool ex_handler_refcount(const struct exception_table_entry *fixup,
+__visible bool ex_handler_refcount(const struct exception_table_entry *fixup,
 			 struct pt_regs *regs, int trapnr)
 {
 	/* First unconditionally saturate the refcount. */
@@ -94,6 +94,7 @@ EXPORT_SYMBOL_GPL(ex_handler_refcount);
  * of vulnerability by restoring from the initial state (essentially, zeroing
  * out all the FPU registers) if we can't restore from the task's FPU state.
  */
+__visible
 bool ex_handler_fprestore(const struct exception_table_entry *fixup,
 			  struct pt_regs *regs, int trapnr)
 {
@@ -107,7 +108,7 @@ bool ex_handler_fprestore(const struct exception_table_entry *fixup,
 }
 EXPORT_SYMBOL_GPL(ex_handler_fprestore);
 
-bool ex_handler_ext(const struct exception_table_entry *fixup,
+__visible bool ex_handler_ext(const struct exception_table_entry *fixup,
 		   struct pt_regs *regs, int trapnr)
 {
 	/* Special hack for uaccess_err */
@@ -117,7 +118,7 @@ bool ex_handler_ext(const struct exception_table_entry *fixup,
 }
 EXPORT_SYMBOL(ex_handler_ext);
 
-bool ex_handler_rdmsr_unsafe(const struct exception_table_entry *fixup,
+__visible bool ex_handler_rdmsr_unsafe(const struct exception_table_entry *fixup,
 			     struct pt_regs *regs, int trapnr)
 {
 	if (pr_warn_once("unchecked MSR access error: RDMSR from 0x%x at rIP: 0x%lx (%pF)\n",
@@ -132,7 +133,7 @@ bool ex_handler_rdmsr_unsafe(const struct exception_table_entry *fixup,
 }
 EXPORT_SYMBOL(ex_handler_rdmsr_unsafe);
 
-bool ex_handler_wrmsr_unsafe(const struct exception_table_entry *fixup,
+__visible bool ex_handler_wrmsr_unsafe(const struct exception_table_entry *fixup,
 			     struct pt_regs *regs, int trapnr)
 {
 	if (pr_warn_once("unchecked MSR access error: WRMSR to 0x%x (tried to write 0x%08x%08x) at rIP: 0x%lx (%pF)\n",
@@ -146,7 +147,7 @@ bool ex_handler_wrmsr_unsafe(const struct exception_table_entry *fixup,
 }
 EXPORT_SYMBOL(ex_handler_wrmsr_unsafe);
 
-bool ex_handler_clear_fs(const struct exception_table_entry *fixup,
+__visible bool ex_handler_clear_fs(const struct exception_table_entry *fixup,
 			 struct pt_regs *regs, int trapnr)
 {
 	if (static_cpu_has(X86_BUG_NULL_SEG))
@@ -156,7 +157,7 @@ bool ex_handler_clear_fs(const struct exception_table_entry *fixup,
 }
 EXPORT_SYMBOL(ex_handler_clear_fs);
 
-bool ex_has_fault_handler(unsigned long ip)
+__visible bool ex_has_fault_handler(unsigned long ip)
 {
 	const struct exception_table_entry *e;
 	ex_handler_t handler;
-- 
2.13.6



* [PATCH 08/21] x86/idt: Make const __initconst
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (6 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 07/21] x86: Make exception handler functions visible Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-27 21:34 ` [PATCH 09/21] lto: Use C version for SYSCALL_ALIAS Andi Kleen
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

const variables must use __initconst, not __initdata. Fix this up
for the new IDT tables recently added, which got it consistently wrong.

Fixes a whole range of commits between
16bc18d895ce x86/idt: Move 32-bit idt_descr to C code
and
dc20b2d52653 x86/idt: Move interrupt gate initialization to IDT code
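
The const rule can be sketched in user space, assuming an ELF target.
The section names and all demo_* names are invented for the example;
the two macros stand in for __initdata and __initconst. Writable and
read-only objects go to separate sections, so neither section ends up
with mixed flags.

```c
#include <assert.h>

/* Hypothetical stand-ins for __initdata and __initconst. */
#define DEMO_INITDATA  __attribute__((section(".demo.init.data")))
#define DEMO_INITCONST __attribute__((section(".demo.init.rodata")))

struct demo_idt_data {
	unsigned int vector;
	unsigned int ist;
};

/* Writable init-time variable: belongs in the data section. */
static unsigned int demo_early_count DEMO_INITDATA = 2;

/* const table: belongs in the read-only section, not the data one. */
static const struct demo_idt_data demo_early_idts[] DEMO_INITCONST = {
	{ 1, 0 },
	{ 3, 0 },
};

/* Sum the vectors of the first demo_early_count entries. */
unsigned int demo_vector_sum(void)
{
	unsigned int i, sum = 0;

	for (i = 0; i < demo_early_count; i++)
		sum += demo_early_idts[i].vector;
	return sum;
}

void demo_init_check(void)
{
	assert(demo_vector_sum() == 4);
}
```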

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/idt.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index d985cef3984f..56d99be3706a 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -56,7 +56,7 @@ struct idt_data {
  * Early traps running on the DEFAULT_STACK because the other interrupt
  * stacks work only after cpu_init().
  */
-static const __initdata struct idt_data early_idts[] = {
+static const __initconst struct idt_data early_idts[] = {
 	INTG(X86_TRAP_DB,		debug),
 	SYSG(X86_TRAP_BP,		int3),
 #ifdef CONFIG_X86_32
@@ -70,7 +70,7 @@ static const __initdata struct idt_data early_idts[] = {
  * the traps which use them are reinitialized with IST after cpu_init() has
  * set up TSS.
  */
-static const __initdata struct idt_data def_idts[] = {
+static const __initconst struct idt_data def_idts[] = {
 	INTG(X86_TRAP_DE,		divide_error),
 	INTG(X86_TRAP_NMI,		nmi),
 	INTG(X86_TRAP_BR,		bounds),
@@ -108,7 +108,7 @@ static const __initdata struct idt_data def_idts[] = {
 /*
  * The APIC and SMP idt entries
  */
-static const __initdata struct idt_data apic_idts[] = {
+static const __initconst struct idt_data apic_idts[] = {
 #ifdef CONFIG_SMP
 	INTG(RESCHEDULE_VECTOR,		reschedule_interrupt),
 	INTG(CALL_FUNCTION_VECTOR,	call_function_interrupt),
@@ -150,7 +150,7 @@ static const __initdata struct idt_data apic_idts[] = {
  * Early traps running on the DEFAULT_STACK because the other interrupt
  * stacks work only after cpu_init().
  */
-static const __initdata struct idt_data early_pf_idts[] = {
+static const __initconst struct idt_data early_pf_idts[] = {
 	INTG(X86_TRAP_PF,		page_fault),
 };
 
@@ -158,7 +158,7 @@ static const __initdata struct idt_data early_pf_idts[] = {
  * Override for the debug_idt. Same as the default, but with interrupt
  * stack set to DEFAULT_STACK (0). Required for NMI trap handling.
  */
-static const __initdata struct idt_data dbg_idts[] = {
+static const __initconst struct idt_data dbg_idts[] = {
 	INTG(X86_TRAP_DB,	debug),
 	INTG(X86_TRAP_BP,	int3),
 };
@@ -180,7 +180,7 @@ gate_desc debug_idt_table[IDT_ENTRIES] __page_aligned_bss;
  * The exceptions which use Interrupt stacks. They are setup after
  * cpu_init() when the TSS has been initialized.
  */
-static const __initdata struct idt_data ist_idts[] = {
+static const __initconst struct idt_data ist_idts[] = {
 	ISTG(X86_TRAP_DB,	debug,		DEBUG_STACK),
 	ISTG(X86_TRAP_NMI,	nmi,		NMI_STACK),
 	SISTG(X86_TRAP_BP,	int3,		DEBUG_STACK),
-- 
2.13.6



* [PATCH 09/21] lto: Use C version for SYSCALL_ALIAS
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (7 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 08/21] x86/idt: Make const __initconst Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-27 21:34 ` [PATCH 10/21] Fix read buffer overflow in delta-ipc Andi Kleen
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

LTO doesn't like the assembler aliasing used for SYSCALL_ALIAS.
Replace it with C aliasing. Also mark the only user visible.
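
For illustration, here is a minimal stand-alone sketch of the C aliasing
technique, assuming gcc on an ELF target. All demo_* and DEMO_* names are
hypothetical and not part of the patch; the kernel macro additionally marks
the alias __visible.

```c
/* Illustrative names only; the kernel macro is SYSCALL_ALIAS. */
#define DEMO_STR_1(x) #x
#define DEMO_STR(x) DEMO_STR_1(x)

/* Declare 'a' as a second name for the already defined symbol 'name'. */
#define DEMO_ALIAS(a, name) \
	__typeof__(a) a __attribute__((alias(DEMO_STR(name))))

/* The real implementation (stands in for sys_ni_posix_timers). */
long demo_ni_syscall(void)
{
	return -38;	/* -ENOSYS on Linux */
}

/* Prototype first, so __typeof__(a) can pick up the function type. */
long demo_timer_create(void);
DEMO_ALIAS(demo_timer_create, demo_ni_syscall);
```

Because the alias is expressed in C rather than in a top-level asm
statement, the compiler sees both names and can keep them consistent
during LTO.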

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 include/linux/linkage.h   | 6 ++----
 kernel/time/posix-stubs.c | 2 +-
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/include/linux/linkage.h b/include/linux/linkage.h
index f68db9e450eb..24fb6468b1a5 100644
--- a/include/linux/linkage.h
+++ b/include/linux/linkage.h
@@ -30,10 +30,8 @@
 #endif
 
 #ifndef SYSCALL_ALIAS
-#define SYSCALL_ALIAS(alias, name) asm(			\
-	".globl " VMLINUX_SYMBOL_STR(alias) "\n\t"	\
-	".set   " VMLINUX_SYMBOL_STR(alias) ","		\
-		  VMLINUX_SYMBOL_STR(name))
+#define SYSCALL_ALIAS(a, name) \
+	__visible typeof(a) a __attribute__((alias(__stringify(name))))
 #endif
 
 #define __page_aligned_data	__section(.data..page_aligned) __aligned(PAGE_SIZE)
diff --git a/kernel/time/posix-stubs.c b/kernel/time/posix-stubs.c
index b258bee13b02..5e36cb75f3be 100644
--- a/kernel/time/posix-stubs.c
+++ b/kernel/time/posix-stubs.c
@@ -19,7 +19,7 @@
 #include <linux/posix-timers.h>
 #include <linux/compat.h>
 
-asmlinkage long sys_ni_posix_timers(void)
+__visible asmlinkage long sys_ni_posix_timers(void)
 {
 	pr_err_once("process %d (%s) attempted a POSIX timer syscall "
 		    "while CONFIG_POSIX_TIMERS is not set\n",
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 10/21] Fix read buffer overflow in delta-ipc
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (8 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 09/21] lto: Use C version for SYSCALL_ALIAS Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-27 21:34 ` [PATCH 11/21] trace: Use -mcount-record for dynamic ftrace Andi Kleen
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen, hugues.fruchet, mchehab

From: Andi Kleen <ak@linux.intel.com>

The single caller passes a string to delta_ipc_open, which copies with a
fixed size larger than the string. So it also copies some random data
located after the original string in the ro segment.

If the string was at the end of a page it may fault.

Just copy the string with a normal strcpy after clearing the field.
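
As a side note, a bounded variant of the same idea could look like the
sketch below. This is an assumption for illustration, not what the patch
does (the patch uses memset plus a plain strcpy); demo_copy_name is a
hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: clear dst, then copy at
 * most dst_size - 1 bytes of src so the result is always NUL-terminated.
 * src is never read past its terminating NUL.
 */
static void demo_copy_name(char *dst, size_t dst_size, const char *src)
{
	size_t len;

	if (dst_size == 0)
		return;

	memset(dst, 0, dst_size);
	len = strlen(src);
	if (len > dst_size - 1)
		len = dst_size - 1;
	memcpy(dst, src, len);
}
```

Unlike the original memcpy with sizeof(msg.name), this reads only the
bytes that belong to the source string, and unlike a bare strcpy it
truncates instead of overflowing the destination.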

Found by an LTO build (which errors out)
because the compiler inlines the functions, can resolve
the string sizes, and triggers the compile time checks in memcpy.

In function ‘memcpy’,
    inlined from ‘delta_ipc_open.constprop’ at linux/drivers/media/platform/sti/delta/delta-ipc.c:178:0,
    inlined from ‘delta_mjpeg_ipc_open’ at linux/drivers/media/platform/sti/delta/delta-mjpeg-dec.c:227:0,
    inlined from ‘delta_mjpeg_decode’ at linux/drivers/media/platform/sti/delta/delta-mjpeg-dec.c:403:0:
/home/andi/lsrc/linux/include/linux/string.h:337:0: error: call to ‘__read_overflow2’ declared with attribute error: detected read beyond size of object passed as 2nd parameter
    __read_overflow2();

Cc: hugues.fruchet@st.com
Cc: mchehab@s-opensource.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/platform/intel-mid/device_libs/platform_bt.c | 2 +-
 certs/blacklist_nohashes.c                            | 2 +-
 drivers/media/platform/sti/delta/delta-ipc.c          | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/platform/intel-mid/device_libs/platform_bt.c b/arch/x86/platform/intel-mid/device_libs/platform_bt.c
index dc036e511f48..2b5d86ce24c2 100644
--- a/arch/x86/platform/intel-mid/device_libs/platform_bt.c
+++ b/arch/x86/platform/intel-mid/device_libs/platform_bt.c
@@ -60,7 +60,7 @@ static int __init tng_bt_sfi_setup(struct bt_sfi_data *ddata)
 	return 0;
 }
 
-static const struct bt_sfi_data tng_bt_sfi_data __initdata = {
+static const struct bt_sfi_data tng_bt_sfi_data __initconst = {
 	.setup	= tng_bt_sfi_setup,
 };
 
diff --git a/certs/blacklist_nohashes.c b/certs/blacklist_nohashes.c
index 73fd99098ad7..753b703ef0ef 100644
--- a/certs/blacklist_nohashes.c
+++ b/certs/blacklist_nohashes.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "blacklist.h"
 
-const char __initdata *const blacklist_hashes[] = {
+const char __initconst *const blacklist_hashes[] = {
 	NULL
 };
diff --git a/drivers/media/platform/sti/delta/delta-ipc.c b/drivers/media/platform/sti/delta/delta-ipc.c
index 41e4a4c259b3..b6c256e3ceb6 100644
--- a/drivers/media/platform/sti/delta/delta-ipc.c
+++ b/drivers/media/platform/sti/delta/delta-ipc.c
@@ -175,8 +175,8 @@ int delta_ipc_open(struct delta_ctx *pctx, const char *name,
 	msg.ipc_buf_size = ipc_buf_size;
 	msg.ipc_buf_paddr = ctx->ipc_buf->paddr;
 
-	memcpy(msg.name, name, sizeof(msg.name));
-	msg.name[sizeof(msg.name) - 1] = 0;
+	memset(msg.name, 0, sizeof(msg.name));
+	strcpy(msg.name, name);
 
 	msg.param_size = param->size;
 	memcpy(ctx->ipc_buf->vaddr, param->data, msg.param_size);
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 11/21] trace: Use -mcount-record for dynamic ftrace
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (9 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 10/21] Fix read buffer overflow in delta-ipc Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-12-01  0:22   ` Steven Rostedt
  2017-11-27 21:34 ` [PATCH 12/21] ftrace: Mark function tracer test functions noinline/noclone Andi Kleen
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen, rostedt

From: Andi Kleen <ak@linux.intel.com>

gcc 5 supports a new -mrecord-mcount option to generate ftrace
tables directly. This avoids the need to run recordmcount
manually.

Use this option when available.

So far this doesn't use -mnop-mcount, which also exists now.

This is needed to make ftrace work with LTO because the
normal recordmcount script doesn't run over the link
time output.

It should also improve build times slightly in the general
case.

Cc: rostedt@goodmis.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 scripts/Makefile.build | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index cb8997ed0149..8179563bcd85 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -219,6 +219,11 @@ cmd_modversions_c =								\
 endif
 
 ifdef CONFIG_FTRACE_MCOUNT_RECORD
+# gcc 5 supports generating the mcount tables directly
+ifneq ($(call cc-option,-mrecord-mcount,y),y)
+KBUILD_CFLAGS += -mrecord-mcount
+else
+# else do it all manually
 ifdef BUILD_C_RECORDMCOUNT
 ifeq ("$(origin RECORDMCOUNT_WARN)", "command line")
   RECORDMCOUNT_FLAGS = -w
@@ -264,6 +269,7 @@ objtool_args += --no-unreachable
 else
 objtool_args += $(call cc-ifversion, -lt, 0405, --no-unreachable)
 endif
+endif
 
 # 'OBJECT_FILES_NON_STANDARD := y': skip objtool checking for a directory
 # 'OBJECT_FILES_NON_STANDARD_foo.o := 'y': skip objtool checking for a file
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 12/21] ftrace: Mark function tracer test functions noinline/noclone
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (10 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 11/21] trace: Use -mcount-record for dynamic ftrace Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-27 21:34 ` [PATCH 13/21] ftrace: Disable LTO for ftrace self tests Andi Kleen
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

The ftrace function tracer self tests call some functions to verify
that they get traced. This relies on them not being inlined. Previously
this was ensured by putting them into another file, but with LTO
the compiler can inline across files, which makes the tests fail.

Mark these functions as noinline and noclone.
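
A minimal user-space sketch of the same annotations, assuming gcc (the
noclone attribute is gcc-specific, so the sketch falls back to noinline
alone elsewhere). The DEMO_/demo_ names are hypothetical.

```c
/* Illustrative macro; the kernel spells these noinline and __noclone. */
#if defined(__GNUC__) && !defined(__clang__)
#define DEMO_NOINLINE_NOCLONE __attribute__((noinline, noclone))
#else
#define DEMO_NOINLINE_NOCLONE __attribute__((noinline))
#endif

/* Must stay a real, separately callable function so a tracer can hook it. */
DEMO_NOINLINE_NOCLONE int demo_trace_target(void)
{
	/* Empty asm: keeps calls from being dropped as side-effect free. */
	__asm__ volatile("");
	return 0;
}

/* Caller that would otherwise be a candidate for inlining the target. */
int demo_call_target(void)
{
	return demo_trace_target();
}
```

noinline alone is not sufficient under LTO, since the compiler may still
create a specialized clone of the function; noclone forbids that as well.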

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 kernel/trace/trace_selftest_dynamic.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_selftest_dynamic.c b/kernel/trace/trace_selftest_dynamic.c
index 8cda06a10d66..c364cf777e1a 100644
--- a/kernel/trace/trace_selftest_dynamic.c
+++ b/kernel/trace/trace_selftest_dynamic.c
@@ -1,13 +1,14 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <linux/compiler.h>
 #include "trace.h"
 
-int DYN_FTRACE_TEST_NAME(void)
+noinline __noclone int DYN_FTRACE_TEST_NAME(void)
 {
 	/* used to call mcount */
 	return 0;
 }
 
-int DYN_FTRACE_TEST_NAME2(void)
+noinline __noclone int DYN_FTRACE_TEST_NAME2(void)
 {
 	/* used to call mcount */
 	return 0;
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 13/21] ftrace: Disable LTO for ftrace self tests
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (11 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 12/21] ftrace: Mark function tracer test functions noinline/noclone Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-27 21:34 ` [PATCH 14/21] lto, fs: Avoid static variable in linux/fs.h Andi Kleen
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Even when the test functions are not inlined something makes the ftrace
self tests fail with LTO. Manual ftrace tests seem to work fine.
Disable LTO for the self test file, which makes the self tests work
again.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 kernel/trace/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index e2538c7638d4..a471a08305e9 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -8,7 +8,7 @@ KBUILD_CFLAGS = $(subst $(CC_FLAGS_FTRACE),,$(ORIG_CFLAGS))
 
 ifdef CONFIG_FTRACE_SELFTEST
 # selftest needs instrumentation
-CFLAGS_trace_selftest_dynamic.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_selftest_dynamic.o = $(CC_FLAGS_FTRACE) ${DISABLE_LTO}
 obj-y += trace_selftest_dynamic.o
 endif
 endif
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 14/21] lto, fs: Avoid static variable in linux/fs.h
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (12 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 13/21] ftrace: Disable LTO for ftrace self tests Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-27 21:34 ` [PATCH 15/21] lto, x86, mm: Disable vmalloc BUILD_BUG_ON for LTO Andi Kleen
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen, viro

From: Andi Kleen <ak@linux.intel.com>

linux/fs.h has an initialized static variable, kernel_read_file_str. It doesn't
make much sense to have a static variable in a frequently included
header file. With LTO and -fno-toplevel-reorder, gcc is unable to eliminate
it, which leads to a lot of unnecessary duplicated copies.

Move the static into the scope of the only inline that uses it;
this tells the compiler enough to not duplicate it. Right now
the inline is only called from one place, so that is ok. If it were
called from more places, it would need to be moved somewhere else
to avoid unnecessary copies.

With LTO this avoids ~100k of unnecessary data segment for an x86 defconfig
build. Even without LTO it doesn't make any sense.
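
The pattern can be sketched stand-alone as follows. The ID list and all
demo_/DEMO_ names are made up for illustration; only the structure mirrors
the fs.h code.

```c
/* Hypothetical ID list, modeled on the __kernel_read_file_id() X-macro. */
#define DEMO_FILE_IDS(id)		\
	id(UNKNOWN, unknown)		\
	id(FIRMWARE, firmware)		\
	id(MODULE, kernel-module)	\
	id(MAX_ID, )

#define DEMO_ENUMIFY(ENUM, dummy) DEMO_ ## ENUM,
#define DEMO_STRINGIFY(dummy, str) #str,

enum demo_file_id {
	DEMO_FILE_IDS(DEMO_ENUMIFY)
};

static inline const char *demo_file_id_str(enum demo_file_id id)
{
	/* Function scope: the table is only emitted where this is used. */
	static const char * const demo_file_str[] = {
		DEMO_FILE_IDS(DEMO_STRINGIFY)
	};

	/* Out-of-range IDs map to the "unknown" entry. */
	if ((unsigned)id >= DEMO_MAX_ID)
		return demo_file_str[DEMO_UNKNOWN];

	return demo_file_str[id];
}
```

A file that includes the header but never calls the inline no longer
carries its own copy of the string table.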

Cc: viro@zeniv.linux.org.uk
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 include/linux/fs.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2995a271ec46..2f02f1c991c9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2821,12 +2821,12 @@ enum kernel_read_file_id {
 	__kernel_read_file_id(__fid_enumify)
 };
 
-static const char * const kernel_read_file_str[] = {
-	__kernel_read_file_id(__fid_stringify)
-};
-
 static inline const char *kernel_read_file_id_str(enum kernel_read_file_id id)
 {
+	static const char * const kernel_read_file_str[] = {
+		__kernel_read_file_id(__fid_stringify)
+	};
+
 	if ((unsigned)id >= READING_MAX_ID)
 		return kernel_read_file_str[READING_UNKNOWN];
 
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 15/21] lto, x86, mm: Disable vmalloc BUILD_BUG_ON for LTO
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (13 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 14/21] lto, fs: Avoid static variable in linux/fs.h Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-27 21:34 ` [PATCH 16/21] lto: Add __noreorder and mark initcalls __noreorder Andi Kleen
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

On 32bit builds this BUILD_BUG_ON often fires with LTO for unknown
reasons. As far as I can tell it's a false positive. So disable
it for LTO.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/mm/init_32.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 8a64a6f2848d..8187d8ee98ee 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -808,7 +808,9 @@ void __init mem_init(void)
 	BUILD_BUG_ON(VMALLOC_END			> PKMAP_BASE);
 #endif
 #define high_memory (-128UL << 20)
+#ifndef CONFIG_LTO
 	BUILD_BUG_ON(VMALLOC_START			>= VMALLOC_END);
+#endif
 #undef high_memory
 #undef __FIXADDR_TOP
 
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 16/21] lto: Add __noreorder and mark initcalls __noreorder
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (14 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 15/21] lto, x86, mm: Disable vmalloc BUILD_BUG_ON for LTO Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-27 21:34 ` [PATCH 17/21] lto, workaround: Disable LTO for BPF Andi Kleen
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

gcc 5 has a new no_reorder attribute that prevents top level
reordering only for that symbol.

Kernels don't like any reordering of initcalls between files, as several
initcalls depend on each other. LTO previously needed to use
-fno-toplevel-reorder to prevent boot failures.

Add a __noreorder wrapper for the no_reorder attribute and use
it for initcalls.
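
A rough user-space sketch of such a wrapper with an empty fallback is
shown below. It uses __has_attribute instead of the __GNUC__ version check
in the patch, which is an assumption made for portability of the example;
all DEMO_/demo_ names are hypothetical.

```c
/* Illustrative wrapper; the patch names the kernel macro __noreorder. */
#if defined(__has_attribute)
# if __has_attribute(no_reorder)
#  define DEMO_NOREORDER __attribute__((no_reorder))
# endif
#endif
#ifndef DEMO_NOREORDER
# define DEMO_NOREORDER	/* attribute unavailable: expand to nothing */
#endif

typedef int (*demo_initcall_t)(void);

static int demo_order;	/* counts how many initcalls have run */

static int demo_init_first(void)  { return ++demo_order; }
static int demo_init_second(void) { return ++demo_order; }

/*
 * The attribute asks the compiler to keep these top-level objects in
 * source order relative to each other in the output.
 */
static demo_initcall_t demo_initcall_first DEMO_NOREORDER = demo_init_first;
static demo_initcall_t demo_initcall_second DEMO_NOREORDER = demo_init_second;

/* Calls both entries in program order; the first call returns 12. */
int demo_run_initcalls(void)
{
	int a = demo_initcall_first();
	int b = demo_initcall_second();

	return a * 10 + b;
}
```

In the kernel the ordering matters because the initcall sections are
walked in link order at boot; the sketch only shows how the attribute is
attached, not that boot-time mechanism.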

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 include/linux/compiler-gcc.h   | 5 +++++
 include/linux/compiler_types.h | 3 +++
 include/linux/init.h           | 2 +-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 2272ded07496..0d562baeb379 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -324,6 +324,11 @@
 #define __no_sanitize_address
 #endif
 
+#if __GNUC__ >= 5
+/* Avoid reordering a top level statement */
+#define __noreorder    __attribute__((no_reorder))
+#endif
+
 /*
  * A trick to suppress uninitialized variable warning without generating any
  * code
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 6b79a9bba9a7..fe4604603f5e 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -260,6 +260,9 @@ struct ftrace_likely_data {
 #define __assume_aligned(a, ...)
 #endif
 
+#ifndef __noreorder
+#define __noreorder
+#endif
 
 /* Are two types/vars the same type (ignoring qualifiers)? */
 #ifndef __same_type
diff --git a/include/linux/init.h b/include/linux/init.h
index ea1b31101d9e..abf1980367b7 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -161,7 +161,7 @@ extern bool initcall_debug;
  */
 
 #define __define_initcall(fn, id) \
-	static initcall_t __initcall_##fn##id __used \
+	static initcall_t __initcall_##fn##id __used __noreorder \
 	__attribute__((__section__(".initcall" #id ".init"))) = fn;
 
 /*
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 17/21] lto, workaround: Disable LTO for BPF
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (15 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 16/21] lto: Add __noreorder and mark initcalls __noreorder Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-27 21:34 ` [PATCH 18/21] lto, crypto: Disable LTO for camelia glue Andi Kleen
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Disable LTO for the BPF interpreter. This works around a gcc bug in the LTO
partitioner that partitions the jumptable used by the BPF interpreter
into a different LTO unit. This in turn causes assembler
errors because the jump table contains references to the
code labels in the original file.

gcc problem tracked in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50676

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 kernel/bpf/Makefile | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index e691da0b3bab..409d4b6762ee 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -16,3 +16,8 @@ ifeq ($(CONFIG_PERF_EVENTS),y)
 obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
 endif
 obj-$(CONFIG_CGROUP_BPF) += cgroup.o
+
+# Various versions of gcc have an LTO bug where the &&labels used in the
+# BPF interpreter can cause linker errors when spread incorrectly over
+# partitions. Disable LTO for BPF for now.
+CFLAGS_core.o = $(DISABLE_LTO)
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 18/21] lto, crypto: Disable LTO for camelia glue
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (16 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 17/21] lto, workaround: Disable LTO for BPF Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-27 21:34 ` [PATCH 19/21] lto, x86: Disable LTO for realmode / vDSO / head64 Andi Kleen
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

The camellia assembler glue functions don't like LTO
and cause missing symbols. Just disable LTO for them.
I tried adding some __visible annotations, but simply
disabling LTO is good enough and works fine.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/crypto/Makefile | 1 +
 crypto/Makefile          | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 5f07333bb224..9dd30dd062f1 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -64,6 +64,7 @@ serpent-sse2-i586-y := serpent-sse2-i586-asm_32.o serpent_sse2_glue.o
 
 aes-x86_64-y := aes-x86_64-asm_64.o aes_glue.o
 des3_ede-x86_64-y := des3_ede-asm_64.o des3_ede_glue.o
+CFLAGS_camellia_glue.o += $(DISABLE_LTO)
 camellia-x86_64-y := camellia-x86_64-asm_64.o camellia_glue.o
 blowfish-x86_64-y := blowfish-x86_64-asm_64.o blowfish_glue.o
 twofish-x86_64-y := twofish-x86_64-asm_64.o twofish_glue.o
diff --git a/crypto/Makefile b/crypto/Makefile
index d674884b2d51..86ae10bb468b 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -101,6 +101,7 @@ CFLAGS_serpent_generic.o := $(call cc-option,-fsched-pressure)  # https://gcc.gn
 obj-$(CONFIG_CRYPTO_AES) += aes_generic.o
 obj-$(CONFIG_CRYPTO_AES_TI) += aes_ti.o
 obj-$(CONFIG_CRYPTO_CAMELLIA) += camellia_generic.o
+CFLAGS_cast_common.o += $(DISABLE_LTO)
 obj-$(CONFIG_CRYPTO_CAST_COMMON) += cast_common.o
 obj-$(CONFIG_CRYPTO_CAST5) += cast5_generic.o
 obj-$(CONFIG_CRYPTO_CAST6) += cast6_generic.o
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 19/21] lto, x86: Disable LTO for realmode / vDSO / head64
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (17 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 18/21] lto, crypto: Disable LTO for camelia glue Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-27 21:34 ` [PATCH 20/21] Kbuild, lto: Add Link Time Optimization support Andi Kleen
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

None of these files like being compiled with LTO. Disable
it for them.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/entry/vdso/Makefile | 3 +--
 arch/x86/kernel/Makefile     | 2 ++
 arch/x86/realmode/Makefile   | 1 +
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 1943aebadede..76caa1159ce1 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -3,7 +3,6 @@
 # Building vDSO images for x86.
 #
 
-KBUILD_CFLAGS += $(DISABLE_LTO)
 KASAN_SANITIZE			:= n
 UBSAN_SANITIZE			:= n
 OBJECT_FILES_NON_STANDARD	:= y
@@ -74,7 +73,7 @@ $(obj)/vdso-image-%.c: $(obj)/vdso%.so.dbg $(obj)/vdso%.so $(obj)/vdso2c FORCE
 CFL := $(PROFILING) -mcmodel=small -fPIC -O2 -fasynchronous-unwind-tables -m64 \
        $(filter -g%,$(KBUILD_CFLAGS)) $(call cc-option, -fno-stack-protector) \
        -fno-omit-frame-pointer -foptimize-sibling-calls \
-       -DDISABLE_BRANCH_PROFILING -DBUILD_VDSO
+       -DDISABLE_BRANCH_PROFILING -DBUILD_VDSO $(DISABLE_LTO)
 
 $(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS)) $(CFL)
 
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 81bb565f4497..18106858d4c3 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -22,6 +22,8 @@ CFLAGS_REMOVE_early_printk.o = -pg
 CFLAGS_REMOVE_head64.o = -pg
 endif
 
+CFLAGS_head64.o += $(DISABLE_LTO)
+
 KASAN_SANITIZE_head$(BITS).o				:= n
 KASAN_SANITIZE_dumpstack.o				:= n
 KASAN_SANITIZE_dumpstack_$(BITS).o			:= n
diff --git a/arch/x86/realmode/Makefile b/arch/x86/realmode/Makefile
index 682c895753d9..719f423da3a0 100644
--- a/arch/x86/realmode/Makefile
+++ b/arch/x86/realmode/Makefile
@@ -6,6 +6,7 @@
 # for more details.
 #
 #
+KBUILD_CFLAGS += $(DISABLE_LTO)
 KASAN_SANITIZE			:= n
 OBJECT_FILES_NON_STANDARD	:= y
 
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 20/21] Kbuild, lto: Add Link Time Optimization support
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (18 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 19/21] lto, x86: Disable LTO for realmode / vDSO / head64 Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2018-01-27  0:15   ` Arnd Bergmann
  2017-11-27 21:34 ` [PATCH 21/21] x86: Enable Link Time Optimization Andi Kleen
                   ` (2 subsequent siblings)
  22 siblings, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

With LTO gcc will do whole program optimizations for
the whole kernel and each module. This increases compile time,
and makes incremental builds slower, but can generate faster and
smaller code and allows the compiler to do some global checking.

gcc can complain now about type mismatches for symbols between
different files.

The main advantage is that it allows cross file inlining, which
enables a range of new optimizations. It also allows the compiler
to throw away unused functions, which typically shrinks the kernel
somewhat.

It also enables a range of advanced and future optimizations
in the compiler.

Unlike earlier, this version doesn't require special
binutils, but relies on THIN_ARCHIVES instead.

This adds the basic Kbuild plumbing for LTO:

- In Kbuild add a new scripts/Makefile.lto that checks
the tool chain and, when the tests pass, sets the LTO options.
We enable it only for gcc 5.0+ and reasonably new binutils.

- Add a new LDFINAL variable that controls the final link
for vmlinux or module. In this case we call gcc-ld instead
of ld, to run the LTO step.

- Kconfigs:
LTO with allyesconfig needs more than 4G of memory (~8G)
and has the potential to make people's systems swap to death.
Smaller configs typically work with 4G.
I used a nested config that ensures that a simple
allyesconfig disables LTO. It has to be explicitly
enabled.

- This version runs modpost on the LTO object files.
This currently breaks MODVERSIONS and causes some warnings
and requires disabling the module resolution checks.
MODVERSIONS is excluded with LTO here. A solution would be to
reorganize the linking step to do an LDFINAL -r link
on all modules before running modpost.

- Since this kernel version links the final kernel two to three
times for kallsyms, all optimization steps are done multiple
times.

Thanks to HJ Lu, Joe Mario, Honza Hubicka, Richard Guenther,
Don Zickus, Changlong Xie, Gleb Schukin who helped with this project
(and probably some more who I forgot, sorry)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 Documentation/lto-build  | 76 ++++++++++++++++++++++++++++++++++++++
 Makefile                 |  6 ++-
 init/Kconfig             | 68 ++++++++++++++++++++++++++++++++++
 scripts/Makefile.lto     | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
 scripts/Makefile.modpost |  7 ++--
 scripts/gcc-ld           |  4 +-
 scripts/link-vmlinux.sh  |  6 +--
 7 files changed, 252 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/lto-build
 create mode 100644 scripts/Makefile.lto

diff --git a/Documentation/lto-build b/Documentation/lto-build
new file mode 100644
index 000000000000..f33f008b23db
--- /dev/null
+++ b/Documentation/lto-build
@@ -0,0 +1,76 @@
+Link time optimization (LTO) for the Linux kernel
+
+This is an experimental feature.
+
+Link Time Optimization allows the compiler to optimize the complete program
+instead of just each file.
+
+The compiler can inline functions between files and do various other global
+optimizations, like specializing functions for common parameters,
+determining when global variables are clobbered, making functions pure/const,
+propagating constants globally, removing unneeded data and others.
+
+It will also drop unused functions which can make the kernel
+image smaller in some circumstances, in particular for small kernel
+configurations.
+
+For small monolithic kernels it can throw away unused code very effectively
+(especially when modules are disabled) and usually shrinks
+the code size.
+
+Build time and memory consumption at build time will increase, depending
+on the size of the largest binary. Modular kernels are less affected.
+With LTO, incremental builds are less incremental, as the whole
+binary always needs to be re-optimized (but not re-parsed).
+
+Oopses can be somewhat more difficult to read, due to the more aggressive
+inlining (it helps to use scripts/faddr2line).
+
+Normal "reasonable" builds work with less than 4GB of RAM, but very large
+configurations like allyesconfig typically need more memory. The actual
+memory needed depends on the available memory (gcc sizes its garbage
+collector pools based on that or on the ulimit -m limits) and
+the compiler version.
+
+Configuration:
+- Enable CONFIG_LTO_MENU and then disable CONFIG_LTO_DISABLE.
+This is mainly to not have allyesconfig default to LTO.
+
+Requirements:
+- Enough memory: 4GB for a standard build, more for allyesconfig
+The peak memory usage happens single threaded (when lto-wpa merges types),
+so dialing back -j options will not help much.
+
+A 32bit compiler is unlikely to work due to the memory requirements.
+You can however build a kernel targeted at 32bit on a 64bit host.
+
+FAQs:
+
+Q: I get a section type attribute conflict
+A: Usually because of someone doing
+const __initdata (should be const __initconst) or const __read_mostly
+(should be just const). Check both symbols reported by gcc.
+
+Q: What's up with the .XXXXX numeric postfixes?
+A: This is due to LTO turning (near) all symbols to static.
+gcc 4.9 and later avoid them in most cases. They are also filtered out
+in kallsyms. There are still some .lto_priv left.
+
+References:
+
+Presentation on Kernel LTO
+(note, performance numbers/details outdated.  In particular gcc 4.9 fixed
+most of the build time problems):
+http://halobates.de/kernel-lto.pdf
+
+Generic gcc LTO:
+http://www.ucw.cz/~hubicka/slides/labs2013.pdf
+http://www.hipeac.net/system/files/barcelona.pdf
+
+Somewhat outdated too:
+http://gcc.gnu.org/projects/lto/lto.pdf
+http://gcc.gnu.org/projects/lto/whopr.pdf
+
+Happy Link-Time-Optimizing!
+
+Andi Kleen
diff --git a/Makefile b/Makefile
index f761bf475ba5..685a638bc3cd 100644
--- a/Makefile
+++ b/Makefile
@@ -370,6 +370,7 @@ HOST_LOADLIBES := $(HOST_LFS_LIBS)
 # Make variables (CC, etc...)
 AS		= $(CROSS_COMPILE)as
 LD		= $(CROSS_COMPILE)ld
+LDFINAL		= $(LD)
 CC		= $(CROSS_COMPILE)gcc
 CPP		= $(CC) -E
 AR		= $(CROSS_COMPILE)ar
@@ -427,7 +428,7 @@ KBUILD_LDFLAGS_MODULE := -T $(srctree)/scripts/module-common.lds
 GCC_PLUGINS_CFLAGS :=
 
 export ARCH SRCARCH CONFIG_SHELL HOSTCC HOSTCFLAGS CROSS_COMPILE AS LD CC
-export CPP AR NM STRIP OBJCOPY OBJDUMP HOSTLDFLAGS HOST_LOADLIBES
+export CPP AR NM STRIP OBJCOPY OBJDUMP HOSTLDFLAGS HOST_LOADLIBES LDFINAL
 export MAKE AWK GENKSYMS INSTALLKERNEL PERL PYTHON UTS_MACHINE
 export HOSTCXX HOSTCXXFLAGS LDFLAGS_MODULE CHECK CHECKFLAGS
 
@@ -813,6 +814,7 @@ KBUILD_ARFLAGS := $(call ar-option,D)
 include scripts/Makefile.kasan
 include scripts/Makefile.extrawarn
 include scripts/Makefile.ubsan
+include scripts/Makefile.lto
 
 # Add any arch overrides and user supplied CPPFLAGS, AFLAGS and CFLAGS as the
 # last assignments
@@ -986,7 +988,7 @@ ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
 
 # Final link of vmlinux with optional arch pass after final link
 cmd_link-vmlinux =                                                 \
-	$(CONFIG_SHELL) $< $(LD) $(LDFLAGS) $(LDFLAGS_vmlinux) ;    \
+	$(CONFIG_SHELL) $< $(LDFINAL) $(LDFLAGS) $(LDFLAGS_vmlinux) ;    \
 	$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
 
 vmlinux: scripts/link-vmlinux.sh vmlinux_prereq $(vmlinux-deps) FORCE
diff --git a/init/Kconfig b/init/Kconfig
index 2934249fba46..36f79d2bbcdb 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1034,6 +1034,73 @@ config CC_OPTIMIZE_FOR_SIZE
 
 endchoice
 
+config ARCH_SUPPORTS_LTO
+	bool
+
+config LTO_MENU
+	bool "Enable gcc link time optimization (LTO)"
+	depends on ARCH_SUPPORTS_LTO
+	help
+	  With this option gcc will do whole program optimization for
+	  the whole kernel and its modules. This increases compile time,
+	  but can lead to better code. It allows gcc to inline functions
+	  across different files, to do other cross-file optimizations,
+	  and to drop unused code. It might also trigger bugs due to more
+	  aggressive optimization. On smaller monolithic kernel
+	  configurations it usually leads to smaller kernels, especially
+	  when modules are disabled.
+
+	  With this option gcc will also do some global checking over
+	  different source files. It also disables a number of kernel
+	  features.
+
+	  This option is recommended for release builds. With LTO
+	  the kernel always has to be re-optimized (but not re-parsed)
+	  on each build.
+
+	  This requires gcc 5.0 or later, or gcc 6.0 or later
+	  if UBSAN is used.
+
+	  On larger configurations this may need more than 4GB of RAM.
+	  Such configurations will likely not build with a 32-bit compiler.
+
+	  When the toolchain support is not available this will (hopefully)
+	  be automatically disabled.
+
+	  For more information see Documentation/lto-build
+
+config LTO_DISABLE
+	bool "Disable LTO again"
+	depends on LTO_MENU
+	default n
+	help
+	  This option is merely here so that allyesconfig or allmodconfig do
+	  not enable LTO. If you actually want to use LTO, do not enable it.
+
+config LTO
+	bool
+	default y
+	depends on LTO_MENU && !LTO_DISABLE
+
+config LTO_DEBUG
+	bool "Enable LTO compile time debugging"
+	depends on LTO
+	help
+	  Enable LTO debugging in the compiler. The compiler dumps
+	  some log files that make it easier to figure out LTO
+	  behavior. The log files also make it possible to reconstruct
+	  the global inlining decisions and a global callgraph.
+	  However, they add some (single threaded) cost to the
+	  compilation.  When in doubt do not enable.
+
+config LTO_CP_CLONE
+	bool "Allow aggressive cloning for function specialization"
+	depends on LTO
+	help
+	  Allow the compiler to clone and specialize functions for specific
+	  arguments when it determines these arguments are very commonly
+	  passed.  Experimental. Will increase text size.
+
 config SYSCTL
 	bool
 
@@ -1716,6 +1783,7 @@ config MODULE_FORCE_UNLOAD
 
 config MODVERSIONS
 	bool "Module versioning support"
+	depends on !LTO
 	help
 	  Usually, you have to use modules compiled with your kernel.
 	  Saying Y here makes it sometimes possible to use modules
diff --git a/scripts/Makefile.lto b/scripts/Makefile.lto
new file mode 100644
index 000000000000..2d6995ba7d0b
--- /dev/null
+++ b/scripts/Makefile.lto
@@ -0,0 +1,95 @@
+#
+# Support for gcc link time optimization
+#
+
+DISABLE_LTO :=
+LTO_CFLAGS :=
+
+export DISABLE_LTO
+export LTO_CFLAGS
+
+ifdef CONFIG_LTO
+ifdef CONFIG_UBSAN
+ifeq ($(call cc-ifversion,-lt,0600,y),y)
+        # work around compiler asserts due to UBSAN
+        $(warning Disabling LTO for gcc 5.x because UBSAN is active)
+        undefine CONFIG_LTO
+endif
+endif
+endif
+
+ifdef CONFIG_LTO
+# 4.7 works mostly, but it sometimes loses symbols on large builds
+# This can be worked around by marking those symbols visible,
+# but that is fairly ugly and the problem is gone with 4.8
+# 4.8 was very slow
+# 4.9 was missing __attribute__((noreorder)) for ordering initcalls,
+# and needed -fno-toplevel-reorder, which can lead to missing symbols
+# so only support 5.0+
+ifeq ($(call cc-ifversion, -ge, 0500,y),y)
+# is the compiler compiled with LTO?
+ifneq ($(call cc-option,${LTO_CFLAGS},n),n)
+# binutils before 2.27 has various problems with plugins
+ifeq ($(call ld-ifversion,-ge,227000000,y),y)
+
+	LTO_CFLAGS := -flto $(DISABLE_TL_REORDER)
+	LTO_FINAL_CFLAGS := -fuse-linker-plugin
+
+# would be needed to support < 5.0
+#	LTO_FINAL_CFLAGS += -fno-toplevel-reorder
+
+	LTO_FINAL_CFLAGS += -flto=jobserver
+
+	# don't compile everything twice
+	# requires plugin ar
+	LTO_CFLAGS += -fno-fat-lto-objects
+
+	# Used to disable LTO for specific files (e.g. vdso)
+	DISABLE_LTO := -fno-lto
+
+	# shut up lots of warnings for the compat syscalls
+	LTO_CFLAGS += $(call cc-disable-warning,attribute-alias,)
+
+	LTO_FINAL_CFLAGS += ${LTO_CFLAGS} -fwhole-program
+
+	# most options are passed through implicitly in the LTO
+	# files per function, but not all.
+	# should not pass any that may need to be disabled for
+	# individual files.
+	LTO_FINAL_CFLAGS += $(filter -pg,${KBUILD_CFLAGS})
+	LTO_FINAL_CFLAGS += $(filter -fno-strict-aliasing,${KBUILD_CFLAGS})
+
+ifdef CONFIG_LTO_DEBUG
+	LTO_FINAL_CFLAGS += -fdump-ipa-cgraph -fdump-ipa-inline-details
+	# add for debugging compiler crashes:
+	# LTO_FINAL_CFLAGS += -dH -save-temps
+endif
+ifdef CONFIG_LTO_CP_CLONE
+	LTO_FINAL_CFLAGS += -fipa-cp-clone
+	LTO_CFLAGS += -fipa-cp-clone
+endif
+
+	KBUILD_CFLAGS += ${LTO_CFLAGS}
+
+	LDFINAL := ${CONFIG_SHELL} ${srctree}/scripts/gcc-ld \
+                  ${LTO_FINAL_CFLAGS}
+
+	# LTO gcc creates a lot of files in TMPDIR, and with /tmp as tmpfs
+	# it's easy to drive the machine OOM. Use the object directory
+	# instead.
+	TMPDIR ?= $(objtree)
+	export TMPDIR
+
+	# use plugin aware tools
+	AR = $(CROSS_COMPILE)gcc-ar
+	NM = $(CROSS_COMPILE)gcc-nm
+else
+        $(warning WARNING old binutils. LTO disabled)
+endif
+else
+        $(warning "WARNING: Compiler/Linker does not support LTO/WHOPR with linker plugin. CONFIG_LTO disabled.")
+endif
+else
+        $(warning "WARNING: GCC $(call cc-version) too old for LTO/WHOPR. CONFIG_LTO disabled")
+endif
+endif
diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
index df4174405feb..d1d3b2cfc9ce 100644
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -79,7 +79,8 @@ modpost = scripts/mod/modpost                    \
  $(if $(KBUILD_EXTMOD),-o $(modulesymfile))      \
  $(if $(CONFIG_DEBUG_SECTION_MISMATCH),,-S)      \
  $(if $(CONFIG_SECTION_MISMATCH_WARN_ONLY),,-E)  \
- $(if $(KBUILD_EXTMOD)$(KBUILD_MODPOST_WARN),-w)
+ $(if $(KBUILD_EXTMOD)$(KBUILD_MODPOST_WARN),-w) \
+ $(if $(CONFIG_LTO),-w)
 
 MODPOST_OPT=$(subst -i,-n,$(filter -i,$(MAKEFLAGS)))
 
@@ -118,9 +119,9 @@ targets += $(modules:.ko=.mod.o)
 ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
 
 # Step 6), final link of the modules with optional arch pass after final link
-quiet_cmd_ld_ko_o = LD [M]  $@
+quiet_cmd_ld_ko_o = LDFINAL [M]  $@
       cmd_ld_ko_o =                                                     \
-	$(LD) -r $(LDFLAGS)                                             \
+	$(LDFINAL) -r $(LDFLAGS)                                             \
                  $(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE)             \
                  -o $@ $(filter-out FORCE,$^) ;                         \
 	$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
diff --git a/scripts/gcc-ld b/scripts/gcc-ld
index 997b818c3962..d95dd0be38e7 100755
--- a/scripts/gcc-ld
+++ b/scripts/gcc-ld
@@ -8,7 +8,7 @@ ARGS="-nostdlib"
 
 while [ "$1" != "" ] ; do
 	case "$1" in
-	-save-temps|-m32|-m64) N="$1" ;;
+	-save-temps*|-m32|-m64) N="$1" ;;
 	-r) N="$1" ;;
 	-[Wg]*) N="$1" ;;
 	-[olv]|-[Ofd]*|-nostdlib) N="$1" ;;
@@ -19,7 +19,7 @@ while [ "$1" != "" ] ; do
 -rpath-link|--sort-section|--section-start|-Tbss|-Tdata|-Ttext|\
 --version-script|--dynamic-list|--version-exports-symbol|--wrap|-m)
 		A="$1" ; shift ; N="-Wl,$A,$1" ;;
-	-[m]*) N="$1" ;;
+	-[mp]*) N="$1" ;;
 	-*) N="-Wl,$1" ;;
 	*)  N="$1" ;;
 	esac
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index c0d129d7f430..964b2ee855dd 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -84,7 +84,7 @@ modpost_link()
 			${KBUILD_VMLINUX_LIBS}				\
 			--end-group"
 	fi
-	${LD} ${LDFLAGS} -r -o ${1} ${objects}
+	${LDFINAL} ${LDFLAGS} -r -o ${1} ${objects}
 }
 
 # Link of vmlinux
@@ -113,7 +113,7 @@ vmlinux_link()
 				${1}"
 		fi
 
-		${LD} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2}		\
+		${LDFINAL} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2}		\
 			-T ${lds} ${objects}
 	else
 		if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
@@ -309,7 +309,7 @@ if [ -n "${CONFIG_KALLSYMS}" ]; then
 	fi
 fi
 
-info LD vmlinux
+info LDFINAL vmlinux
 vmlinux_link "${kallsymso}" vmlinux
 
 if [ -n "${CONFIG_BUILDTIME_EXTABLE_SORT}" ]; then
-- 
2.13.6
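
The toolchain gating in scripts/Makefile.lto above can be summarized as a small decision function. This is only a sketch in Python and is not part of the patch: the function name `lto_decision` and the tuple encoding of versions are made up for illustration, and the checks are listed in the order the Makefile evaluates them.

```python
def lto_decision(gcc_version, binutils_version, compiler_has_lto, ubsan_enabled):
    """Return (enabled, reason), mirroring the checks in scripts/Makefile.lto.

    Versions are (major, minor) tuples, e.g. (5, 4) for gcc 5.4.
    Hypothetical helper for illustration only.
    """
    # The UBSAN check comes first: with UBSAN, anything older than gcc 6.0
    # is rejected to work around compiler asserts.
    if ubsan_enabled and gcc_version < (6, 0):
        return False, "UBSAN is active and gcc is older than 6.0"
    # gcc before 5.0 lacks __attribute__((noreorder)) for initcall ordering.
    if gcc_version < (5, 0):
        return False, "gcc too old for LTO"
    # The compiler itself must have been built with LTO support.
    if not compiler_has_lto:
        return False, "compiler does not support LTO with linker plugin"
    # binutils before 2.27 has various problems with plugins.
    if binutils_version < (2, 27):
        return False, "old binutils"
    return True, "LTO enabled"


print(lto_decision((5, 4), (2, 27), True, False))
print(lto_decision((5, 4), (2, 27), True, True))
```

In every rejected case the Makefile only warns and leaves the build running without LTO, which is what the Kconfig help means by "automatically disabled".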


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 21/21] x86: Enable Link Time Optimization
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (19 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 20/21] Kbuild, lto: Add Link Time Optimization support Andi Kleen
@ 2017-11-27 21:34 ` Andi Kleen
  2017-11-28 16:04 ` [PATCH 02/21] afs: Fix const confusion in AFS David Howells
  2017-11-29 23:09 ` Link time optimization for LTO/x86 Sami Tolvanen
  22 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-27 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, samitolvanen, alxmtvv, linux-kbuild, yamada.masahiro, akpm,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

LTO is opt-in per architecture because it usually needs some
fixes.

LTO needs THIN_ARCHIVES because standard binutils doesn't like mixing
assembler and LTO code with ld -r.

Enable LTO and THIN_ARCHIVES for x86

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8eed3f94bfc7..92650726f908 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -122,6 +122,8 @@ config X86
 	select HAVE_ARCH_VMAP_STACK		if X86_64
 	select HAVE_ARCH_WITHIN_STACK_FRAMES
 	select HAVE_CC_STACKPROTECTOR
+	select THIN_ARCHIVES			if LTO
+	select ARCH_SUPPORTS_LTO
 	select HAVE_CMPXCHG_DOUBLE
 	select HAVE_CMPXCHG_LOCAL
 	select HAVE_CONTEXT_TRACKING		if X86_64
-- 
2.13.6
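
How the Kconfig symbols from patch 20 and the two selects above combine can be sketched as plain boolean logic. This is an illustration in Python, not a Kconfig evaluator: the function name `resolve_lto` and the result keys are hypothetical, and only the dependencies visible in this series are modeled.

```python
def resolve_lto(arch_supports_lto, lto_menu, lto_disable):
    """Model LTO_MENU, LTO_DISABLE, LTO and their effect on other symbols."""
    # LTO_MENU depends on ARCH_SUPPORTS_LTO (selected by x86 in this patch).
    lto_menu = lto_menu and arch_supports_lto
    # LTO_DISABLE depends on LTO_MENU.
    lto_disable = lto_disable and lto_menu
    # config LTO: default y, depends on LTO_MENU && !LTO_DISABLE
    lto = lto_menu and not lto_disable
    return {
        "LTO": lto,
        # x86: select THIN_ARCHIVES if LTO
        "THIN_ARCHIVES_selected": lto,
        # config MODVERSIONS: depends on !LTO
        "MODVERSIONS_allowed": not lto,
    }


# allyesconfig answers "y" to every prompt, including LTO_DISABLE.
print(resolve_lto(arch_supports_lto=True, lto_menu=True, lto_disable=True))
# A deliberate LTO build leaves LTO_DISABLE off.
print(resolve_lto(arch_supports_lto=True, lto_menu=True, lto_disable=False))
```

The first call shows why LTO_DISABLE exists: answering yes to everything ends up with LTO off.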


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/21] afs: Fix const confusion in AFS
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (20 preceding siblings ...)
  2017-11-27 21:34 ` [PATCH 21/21] x86: Enable Link Time Optimization Andi Kleen
@ 2017-11-28 16:04 ` David Howells
  2017-11-28 16:50   ` Andi Kleen
  2017-11-29 23:09 ` Link time optimization for LTO/x86 Sami Tolvanen
  22 siblings, 1 reply; 31+ messages in thread
From: David Howells @ 2017-11-28 16:04 UTC (permalink / raw)
  To: Andi Kleen
  Cc: dhowells, linux-kernel, x86, samitolvanen, alxmtvv, linux-kbuild,
	yamada.masahiro, akpm, Andi Kleen

Andi Kleen <andi@firstfloor.org> wrote:

> A trace point string cannot be const because the underlying special
> section is not marked const. An LTO build complains about the
> section attribute mismatch. Fix it by not marking the trace point
> string in afs const.

Do you want this to go through my tree?

David

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/21] afs: Fix const confusion in AFS
  2017-11-28 16:04 ` [PATCH 02/21] afs: Fix const confusion in AFS David Howells
@ 2017-11-28 16:50   ` Andi Kleen
  0 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2017-11-28 16:50 UTC (permalink / raw)
  To: David Howells
  Cc: Andi Kleen, linux-kernel, x86, samitolvanen, alxmtvv,
	linux-kbuild, yamada.masahiro, akpm, Andi Kleen

On Tue, Nov 28, 2017 at 04:04:38PM +0000, David Howells wrote:
> Andi Kleen <andi@firstfloor.org> wrote:
> 
> > A trace point string cannot be const because the underlying special
> > section is not marked const. An LTO build complains about the
> > section attribute mismatch. Fix it by not marking the trace point
> > string in afs const.
> 
> Do you want this to go through my tree?

Yes please.

-Andi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Link time optimization for LTO/x86
  2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
                   ` (21 preceding siblings ...)
  2017-11-28 16:04 ` [PATCH 02/21] afs: Fix const confusion in AFS David Howells
@ 2017-11-29 23:09 ` Sami Tolvanen
  22 siblings, 0 replies; 31+ messages in thread
From: Sami Tolvanen @ 2017-11-29 23:09 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, x86, alxmtvv, linux-kbuild, yamada.masahiro, akpm

Hi Andi,

On Mon, Nov 27, 2017 at 01:34:02PM -0800, Andi Kleen wrote:
> This is an updated version of my older LTO patchkit for gcc/x86
> This version doesn't need special binutils, but requires gcc 5+.
> It also is compatible with near all options (except MODVERSIONS)

Would you mind changing the config option prefix to LTO_GCC, for
example? It looks like there are a few places where gcc LTO needs
to be disabled, but clang LTO works fine, and vice versa.

My earlier clang LTO patches also had a solution for MODVERSIONS,
which might work with gcc as well.

Sami

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 11/21] trace: Use -mcount-record for dynamic ftrace
  2017-11-27 21:34 ` [PATCH 11/21] trace: Use -mcount-record for dynamic ftrace Andi Kleen
@ 2017-12-01  0:22   ` Steven Rostedt
  2018-05-01 18:42     ` Steven Rostedt
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2017-12-01  0:22 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, x86, samitolvanen, alxmtvv, linux-kbuild,
	yamada.masahiro, akpm, Andi Kleen

On Mon, 27 Nov 2017 13:34:13 -0800
Andi Kleen <andi@firstfloor.org> wrote:

> From: Andi Kleen <ak@linux.intel.com>
> 
> gcc 5 supports a new -mcount-record option to generate ftrace
> tables directly. This avoids the need to run record_mcount
> manually.
> 
> Use this option when available.
> 
> So far doesn't use -mcount-nop, which also exists now.
> 
> This is needed to make ftrace work with LTO because the
> normal record-mcount script doesn't run over the link
> time output.
> 
> It should also improve build times slightly in the general
> case.
> 
> Cc: rostedt@goodmis.org
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---

Hi Andi,

Thanks for sending this. I'm currently trying to catch up on all the
changes that need to get into 4.15 and will be doing a bit of
traveling.

I plan on looking at this in a week or two, and add it to the 4.16
queue.

-- Steve

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 20/21] Kbuild, lto: Add Link Time Optimization support
  2017-11-27 21:34 ` [PATCH 20/21] Kbuild, lto: Add Link Time Optimization support Andi Kleen
@ 2018-01-27  0:15   ` Arnd Bergmann
  2018-01-27  0:55     ` Andi Kleen
  0 siblings, 1 reply; 31+ messages in thread
From: Arnd Bergmann @ 2018-01-27  0:15 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linux Kernel Mailing List, the arch/x86 maintainers,
	samitolvanen, alxmtvv, Linux Kbuild mailing list,
	Masahiro Yamada, Andrew Morton, Andi Kleen

On Mon, Nov 27, 2017 at 10:34 PM, Andi Kleen <andi@firstfloor.org> wrote:
> From: Andi Kleen <ak@linux.intel.com>
> - Add a new LDFINAL variable that controls the final link
> for vmlinux or module. In this case we call gcc-ld instead
> of ld, to run the LTO step.

When I tried this out on allmodconfig (following the lwn article), I ran into
a number of warnings:

WARNING: modpost: missing MODULE_LICENSE() in
drivers/isdn/hardware/eicon/diva_mnt.o
see include/linux/module.h for more information
WARNING: modpost: missing MODULE_LICENSE() in drivers/isdn/hysdn/hysdn.o
see include/linux/module.h for more information
WARNING: modpost: missing MODULE_LICENSE() in drivers/vhost/vhost_scsi.o
see include/linux/module.h for more information
WARNING: modpost: missing MODULE_LICENSE() in drivers/scsi/pcmcia/qlogic_cs.o
see include/linux/module.h for more information
WARNING: modpost: missing MODULE_LICENSE() in drivers/scsi/pcmcia/fdomain_cs.o
...

These are apparently all compound modules that do have a valid
license, and modinfo
shows it correctly:

modinfo build/tmp/drivers/isdn/hardware/eicon/diva_mnt.ko
filename:
/home/arnd/arm-soc/build/tmp/drivers/isdn/hardware/eicon/diva_mnt.ko
license:        GPL
author:         Cytronics & Melware, Eicon Networks
description:    Maint driver for Eicon DIVA Server cards
...

but something in the LTO build confuses modpost to the point that it doesn't
see it.

I also get 131 errors in x86 allmodconfig along the lines of

/git/arm-soc/include/linux/slab.h:298:0: error: type of
'kmalloc_caches' does not match original declaration
[-Werror=lto-type-mismatch]
 extern struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];
 /git/arm-soc/mm/slab_common.c:958:1: note: 'kmalloc_caches' was
previously declared here
 EXPORT_SYMBOL(kmalloc_caches);

/git/arm-soc/include/linux/seq_file.h:109:5: error: type of 'seq_open'
does not match original declaration [-Werror=lto-type-mismatch]
 int seq_open(struct file *, const struct seq_operations *);
     ^
/git/arm-soc/fs/seq_file.c:48:0: note: 'seq_open' was previously declared here
 int seq_open(struct file *file, const struct seq_operations *op)

I found a fix for some of these issues attached to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83954
but am still doing more randconfig build testing.

        Arnd

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 20/21] Kbuild, lto: Add Link Time Optimization support
  2018-01-27  0:15   ` Arnd Bergmann
@ 2018-01-27  0:55     ` Andi Kleen
  2018-01-27 14:26       ` Arnd Bergmann
  0 siblings, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2018-01-27  0:55 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Andi Kleen, Linux Kernel Mailing List, the arch/x86 maintainers,
	samitolvanen, alxmtvv, Linux Kbuild mailing list,
	Masahiro Yamada, Andrew Morton

On Sat, Jan 27, 2018 at 01:15:49AM +0100, Arnd Bergmann wrote:
> On Mon, Nov 27, 2017 at 10:34 PM, Andi Kleen <andi@firstfloor.org> wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> > - Add a new LDFINAL variable that controls the final link
> > for vmlinux or module. In this case we call gcc-ld instead
> > of ld, to run the LTO step.
> 
> When I tried this out on allmodconfig (following the lwn article), I ran into
> a number of warnings:

Thanks for testing. Yes it's a known issue: during one module build
when modpost looks at the file it is still in LTO format, and
modpost doesn't understand the LTO symbol table. I had a patch
to teach it that at some point, but it got lost somewhere.
The LLVM LTO patchkit has a different solution that actually
fixes the sequence to run modpost only after an LTO final link,
but I haven't gotten around to porting that one.

It seems to work already for the single file modules.
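
A rough way to see the situation described above is to look at which sections of an object actually carry data. The sketch below is a simplified model in Python, not how modpost works: it assumes gcc keeps its LTO intermediate representation in sections whose names start with `.gnu.lto_`, and the function name, the prefix list, and the input format (section name mapped to size, as could be collected from `readelf -S`) are chosen for illustration.

```python
# Section name prefixes that would hold ordinary compiled code or data.
NATIVE_PREFIXES = (".text", ".data", ".rodata", ".modinfo")


def classify_object(sections):
    """Classify an object file from a mapping of section name -> size in bytes."""
    # Assumption: gcc LTO bytecode lives in ".gnu.lto_*" sections.
    has_lto_ir = any(
        name.startswith(".gnu.lto_") and size > 0
        for name, size in sections.items()
    )
    has_native = any(
        name.startswith(NATIVE_PREFIXES) and size > 0
        for name, size in sections.items()
    )
    if has_lto_ir and not has_native:
        # Built with -fno-fat-lto-objects: a tool that only reads the regular
        # ELF contents finds no .modinfo data, hence no MODULE_LICENSE().
        return "slim-lto"
    if has_lto_ir:
        return "fat-lto"
    return "native"


slim = {".text": 0, ".data": 0, ".bss": 0, ".gnu.lto_.decls.1a2b": 300}
final = {".text": 4096, ".modinfo": 64, ".data": 128}
print(classify_object(slim), classify_object(final))
```

In this model the intermediate object classifies as "slim-lto" while the output of the final LTO link classifies as "native", which is consistent with modinfo reporting the license on the finished .ko.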

-Andi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 20/21] Kbuild, lto: Add Link Time Optimization support
  2018-01-27  0:55     ` Andi Kleen
@ 2018-01-27 14:26       ` Arnd Bergmann
  2018-01-28 18:33         ` Andi Kleen
  0 siblings, 1 reply; 31+ messages in thread
From: Arnd Bergmann @ 2018-01-27 14:26 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andi Kleen, Linux Kernel Mailing List, the arch/x86 maintainers,
	samitolvanen, alxmtvv, Linux Kbuild mailing list,
	Masahiro Yamada, Andrew Morton

On Sat, Jan 27, 2018 at 1:55 AM, Andi Kleen <ak@linux.intel.com> wrote:
> On Sat, Jan 27, 2018 at 01:15:49AM +0100, Arnd Bergmann wrote:
>> On Mon, Nov 27, 2017 at 10:34 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> > From: Andi Kleen <ak@linux.intel.com>
>> > - Add a new LDFINAL variable that controls the final link
>> > for vmlinux or module. In this case we call gcc-ld instead
>> > of ld, to run the LTO step.
>>
>> When I tried this out on allmodconfig (following the lwn article), I ran into
>> a number of warnings:
>
> Thanks for testing. Yes it's a known issue: during one module build
> when modpost looks at the file it is still in LTO format, and
> modpost doesn't understand the LTO symbol table. I had a patch
> to teach it that at some point, but it got lost somewhere.
> The LLVM LTO patchkit has a different solution that actually
> fixes the sequence to run modpost only after an LTO final link,
> but I haven't gotten around to porting that one.
>
> It seems to work already for the single file modules.

Ok, I see. I've turned off that warning now for testing.

What about this one:

kallsyms failure: relative symbol value 0xffffffff81000000 out of
range in relative mode

I seem to get that all the time here, and have also disabled it for now,
but it sounds important (and breaks the build).
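
For context on what the message checks: with relative kallsyms, each symbol is stored as a 32-bit value relative to a base address instead of as a full pointer. The Python sketch below is a simplified model of that range check only. The function name is hypothetical, the real scripts/kallsyms.c handles further cases (such as absolute per-cpu symbols) that are left out here, and nothing in it is meant as a diagnosis of this particular build.

```python
UINT32_MAX = 0xFFFFFFFF


def relative_offsets(addresses, base):
    """Encode each address as an unsigned 32-bit offset from base.

    Simplified model: raises OverflowError when an address cannot be
    represented, similar in spirit to the kallsyms failure above.
    """
    offsets = []
    for addr in addresses:
        offset = addr - base
        if not 0 <= offset <= UINT32_MAX:
            raise OverflowError(
                f"relative symbol value {addr:#x} out of range in relative mode"
            )
        offsets.append(offset)
    return offsets


text_start = 0xFFFFFFFF81000000
# Base at the start of kernel text: every offset fits in 32 bits.
print(relative_offsets([text_start, text_start + 0x10], base=text_start))

# If the base ended up far below kernel text (0 here, purely as an example),
# the same address no longer fits.
try:
    relative_offsets([text_start], base=0)
except OverflowError as err:
    print(err)
```

The point of the model is that the error depends on the distance between a symbol and the chosen base, so a single unexpected symbol address can push otherwise ordinary symbols out of range.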

      Arnd

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 20/21] Kbuild, lto: Add Link Time Optimization support
  2018-01-27 14:26       ` Arnd Bergmann
@ 2018-01-28 18:33         ` Andi Kleen
  0 siblings, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2018-01-28 18:33 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Andi Kleen, Linux Kernel Mailing List, the arch/x86 maintainers,
	samitolvanen, alxmtvv, Linux Kbuild mailing list,
	Masahiro Yamada, Andrew Morton

> 
> kallsyms failure: relative symbol value 0xffffffff81000000 out of
> range in relative mode
> 
> I seem to get that all the time here, and have also disabled it for now,
> but it sounds important (and breaks the build).

Need to take a look at it. 

I had some patches that completely revamped kallsyms as part of the single
pass linking, but I dropped them again because they still had too many 
issues.

-Andi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 11/21] trace: Use -mcount-record for dynamic ftrace
  2017-12-01  0:22   ` Steven Rostedt
@ 2018-05-01 18:42     ` Steven Rostedt
  0 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2018-05-01 18:42 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, x86, samitolvanen, alxmtvv, linux-kbuild,
	yamada.masahiro, akpm, Andi Kleen

On Thu, 30 Nov 2017 19:22:33 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 27 Nov 2017 13:34:13 -0800
> Andi Kleen <andi@firstfloor.org> wrote:
> 
> > From: Andi Kleen <ak@linux.intel.com>
> > 
> > gcc 5 supports a new -mcount-record option to generate ftrace
> > tables directly. This avoids the need to run record_mcount
> > manually.
> > 
> > Use this option when available.
> > 
> > So far doesn't use -mcount-nop, which also exists now.
> > 
> > This is needed to make ftrace work with LTO because the
> > normal record-mcount script doesn't run over the link
> > time output.
> > 
> > It should also improve build times slightly in the general
> > case.
> > 
> > Cc: rostedt@goodmis.org
> > Signed-off-by: Andi Kleen <ak@linux.intel.com>
> > ---  
> 
> Hi Andi,
> 
> Thanks for sending this. I'm currently trying to catch up on all the
> changes that need to get into 4.15 and will be doing a bit of
> traveling.
> 
> I plan on looking at this in a week or two, and add it to the 4.16
> queue.
>

Looks like I forgot to add it to the queue (actually, I remember
having issues with that queue and restarted it), and this was dropped.

Adding it this time for 4.18

Sorry about that. :-/

-- Steve

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2018-05-01 18:42 UTC | newest]

Thread overview: 31+ messages
2017-11-27 21:34 Link time optimization for LTO/x86 Andi Kleen
2017-11-27 21:34 ` [PATCH 01/21] x86/xen: Mark pv stub assembler symbol visible Andi Kleen
2017-11-27 21:34 ` [PATCH 02/21] afs: Fix const confusion in AFS Andi Kleen
2017-11-27 21:34 ` [PATCH 03/21] x86/timer: Don't inline __const_udelay Andi Kleen
2017-11-27 21:34 ` [PATCH 04/21] locking/spinlocks: Mark spinlocks noinline when inline spinlocks are disabled Andi Kleen
2017-11-27 21:34 ` [PATCH 05/21] x86/kvm: Make steal_time visible Andi Kleen
2017-11-27 21:34 ` [PATCH 06/21] x86/syscalls: Make x86 syscalls use real prototypes Andi Kleen
2017-11-27 21:34 ` [PATCH 07/21] x86: Make exception handler functions visible Andi Kleen
2017-11-27 21:34 ` [PATCH 08/21] x86/idt: Make const __initconst Andi Kleen
2017-11-27 21:34 ` [PATCH 09/21] lto: Use C version for SYSCALL_ALIAS Andi Kleen
2017-11-27 21:34 ` [PATCH 10/21] Fix read buffer overflow in delta-ipc Andi Kleen
2017-11-27 21:34 ` [PATCH 11/21] trace: Use -mcount-record for dynamic ftrace Andi Kleen
2017-12-01  0:22   ` Steven Rostedt
2018-05-01 18:42     ` Steven Rostedt
2017-11-27 21:34 ` [PATCH 12/21] ftrace: Mark function tracer test functions noinline/noclone Andi Kleen
2017-11-27 21:34 ` [PATCH 13/21] ftrace: Disable LTO for ftrace self tests Andi Kleen
2017-11-27 21:34 ` [PATCH 14/21] lto, fs: Avoid static variable in linux/fs.h Andi Kleen
2017-11-27 21:34 ` [PATCH 15/21] lto, x86, mm: Disable vmalloc BUILD_BUG_ON for LTO Andi Kleen
2017-11-27 21:34 ` [PATCH 16/21] lto: Add __noreorder and mark initcalls __noreorder Andi Kleen
2017-11-27 21:34 ` [PATCH 17/21] lto, workaround: Disable LTO for BPF Andi Kleen
2017-11-27 21:34 ` [PATCH 18/21] lto, crypto: Disable LTO for camelia glue Andi Kleen
2017-11-27 21:34 ` [PATCH 19/21] lto, x86: Disable LTO for realmode / vDSO / head64 Andi Kleen
2017-11-27 21:34 ` [PATCH 20/21] Kbuild, lto: Add Link Time Optimization support Andi Kleen
2018-01-27  0:15   ` Arnd Bergmann
2018-01-27  0:55     ` Andi Kleen
2018-01-27 14:26       ` Arnd Bergmann
2018-01-28 18:33         ` Andi Kleen
2017-11-27 21:34 ` [PATCH 21/21] x86: Enable Link Time Optimization Andi Kleen
2017-11-28 16:04 ` [PATCH 02/21] afs: Fix const confusion in AFS David Howells
2017-11-28 16:50   ` Andi Kleen
2017-11-29 23:09 ` Link time optimization for LTO/x86 Sami Tolvanen
