linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] ARM: hacks for link-time optimization
@ 2018-02-20 21:59 Arnd Bergmann
  2018-02-20 21:59 ` [PATCH 1/7] ARM: disallow combining XIP and LTO Arnd Bergmann
                   ` (7 more replies)
  0 siblings, 8 replies; 35+ messages in thread
From: Arnd Bergmann @ 2018-02-20 21:59 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Andi Kleen, linux-arm-kernel, linux-kernel, Arnd Bergmann

Hi Nico, all,

I was playing with ARM link-time optimization handling earlier this
month, and eventually got it to build cleanly with randconfig kernels,
but ended up with a lot of ugly hacks to actually pull it off.

Here are the ones that I don't think we actually want to merge,
but it may be helpful to have these for reference for the next
person that tries to make it work.

      Arnd

Arnd Bergmann (7):
  ARM: disallow combining XIP and LTO
  ARM: LTO: avoid THUMB2_KERNEL+LTO
  [HACK] pass endianess flag to LTO linker
  ARM: io-acorn: fix LTO linking without CONFIG_PRINTK
  ARM: fix __inflate_kernel_data stack warning for LTO
  ARM: mark assembler-referenced symbols as __visible
  efi: disable LTO for EFI stub

 arch/arm/Kconfig                      |  4 +++-
 arch/arm/Makefile                     |  2 ++
 arch/arm/kernel/Makefile              |  3 ---
 arch/arm/kernel/head-inflate-data.c   |  3 ++-
 arch/arm/kernel/process.c             |  2 +-
 arch/arm/kernel/suspend.c             |  2 ++
 arch/arm/kernel/unwind.c              |  1 +
 arch/arm/lib/io-acorn.S               |  4 ++++
 arch/arm/probes/kprobes/core.c        |  2 +-
 arch/arm/probes/kprobes/test-core.c   | 11 ++++++-----
 arch/arm/vdso/vgettimeofday.c         |  2 ++
 drivers/bus/arm-cci.c                 |  6 +++---
 drivers/firmware/efi/libstub/Makefile |  4 +++-
 drivers/soc/bcm/brcmstb/pm/pm-arm.c   |  2 +-
 lib/clz_ctz.c                         | 20 ++++++++++----------
 15 files changed, 41 insertions(+), 27 deletions(-)

-- 
2.9.0

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/7] ARM: disallow combining XIP and LTO
  2018-02-20 21:59 [PATCH 0/7] ARM: hacks for link-time optimization Arnd Bergmann
@ 2018-02-20 21:59 ` Arnd Bergmann
  2018-02-21  3:01   ` Nicolas Pitre
  2018-03-12  2:40   ` Nicolas Pitre
  2018-02-20 21:59 ` [PATCH 2/7] ARM: LTO: avoid THUMB2_KERNEL+LTO Arnd Bergmann
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 35+ messages in thread
From: Arnd Bergmann @ 2018-02-20 21:59 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Andi Kleen, linux-arm-kernel, linux-kernel, Arnd Bergmann

This fails during deflate_xip_data.sh

  /home/arnd/cross-gcc/bin/arm-linux-gnueabi-objcopy -O binary -R .comment -S  vmlinux arch/arm/boot/xipImage && /bin/bash -c '/git/arm-soc/arch/arm/boot/deflate_xip_data.sh vmlinux arch/arm/boot/xipImage || { rm -f arch/arm/boot/xipImage; false; }'
make -f /git/arm-soc/scripts/Makefile.modpost
+ sym_val __data_loc
+ sed -n / __data_loc$/{s/ .*$//p;q}
+ /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-nm vmlinux
/home/arnd/cross-gcc/lib/gcc/arm-linux-gnueabi/8.0.1/../../../../arm-linux-gnueabi/bin/nm terminated with signal 13 [Broken pipe]
+ local val=ac74c0f4
+ [ ac74c0f4 ]
+ echo 2893332724
+ __data_loc=2893332724
+ sym_val _edata_loc
+ /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-nm vmlinux
+ sed -n / _edata_loc$/{s/ .*$//p;q}
/home/arnd/cross-gcc/lib/gcc/arm-linux-gnueabi/8.0.1/../../../../arm-linux-gnueabi/bin/nm terminated with signal 13 [Broken pipe]
+ local val=ac7b8744
+ [ ac7b8744 ]
+ echo 2893776708
+ _edata_loc=2893776708
+ sym_val _xiprom
+ sed -n / _xiprom$/{s/ .*$//p;q}
+ /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-nm vmlinux
/home/arnd/cross-gcc/lib/gcc/arm-linux-gnueabi/8.0.1/../../../../arm-linux-gnueabi/bin/nm terminated with signal 13 [Broken pipe]

Obviously we want to make the combination work, no idea why it doesn't.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 823e397ee0f3..8ed0f664f86f 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1976,6 +1976,7 @@ endchoice
 config XIP_KERNEL
 	bool "Kernel Execute-In-Place from ROM"
 	depends on !ARM_LPAE && !ARCH_MULTIPLATFORM
+	depends on !LTO
 	help
 	  Execute-In-Place allows the kernel to run from non-volatile storage
 	  directly addressable by the CPU, such as NOR flash. This saves RAM
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/7] ARM: LTO: avoid THUMB2_KERNEL+LTO
  2018-02-20 21:59 [PATCH 0/7] ARM: hacks for link-time optimization Arnd Bergmann
  2018-02-20 21:59 ` [PATCH 1/7] ARM: disallow combining XIP and LTO Arnd Bergmann
@ 2018-02-20 21:59 ` Arnd Bergmann
  2018-02-21  3:12   ` Nicolas Pitre
  2018-03-07 18:30   ` Matthias Kaehlcke
  2018-02-20 21:59 ` [PATCH 3/7] [HACK] pass endianess flag to LTO linker Arnd Bergmann
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 35+ messages in thread
From: Arnd Bergmann @ 2018-02-20 21:59 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Andi Kleen, linux-arm-kernel, linux-kernel, Arnd Bergmann

Trying to build an LTO-Enabled kernel with Thumb2 instructions failed
horribly for me, with an endless output of things like

ccVnNycO.s:2665: Error: thumb conditional instruction should be in IT block -- `bxne lr'
ccVnNycO.s:7128: Error: thumb conditional instruction should be in IT block -- `strexeq r5,r2,[r3]'
ccVnNycO.s:7258: Error: thumb conditional instruction should be in IT block -- `strexeq lr,r0,[r3]'
ccVnNycO.s:17380: Error: thumb conditional instruction should be in IT block -- `strexeq r1,r2,[r6]'
ccVnNycO.s:19163: Error: thumb conditional instruction should be in IT block -- `strexeq r8,r6,[r3]'
ccVnNycO.s:22722: Error: thumb conditional instruction should be in IT block -- `strexeq r7,r1,[r0]'
ccVnNycO.s:24105: conditional infixes are deprecated in unified syntax
ccVnNycO.s:24105: Error: thumb conditional instruction should be in IT block -- `sbcccs r1,r1,r3'
ccVnNycO.s:24105: Error: thumb conditional instruction should be in IT block -- `movcc r3,#0'
ccVnNycO.s:24210: conditional infixes are deprecated in unified syntax
ccVnNycO.s:24210: Error: thumb conditional instruction should be in IT block -- `sbcccs r2,r2,r3'
ccVnNycO.s:24210: Error: thumb conditional instruction should be in IT block -- `movcc r3,#0'

I did not investigate this too much, disabling Thumb2 support when LTO is
set lets me build randconfig kernels.

Since ARM_SINGLE_ARMV7M is Thumb2-only, I have to disallow LTO for V7-M
targets.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 8ed0f664f86f..fbf2c3ab9a97 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -18,7 +18,7 @@ config ARM
 	select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT if CPU_V7
 	select ARCH_SUPPORTS_ATOMIC_RMW
-	select ARCH_SUPPORTS_LTO
+	select ARCH_SUPPORTS_LTO if !ARM_SINGLE_ARMV7M
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_WANT_IPC_PARSE_VERSION
@@ -1533,6 +1533,7 @@ config SCHED_HRTICK
 config THUMB2_KERNEL
 	bool "Compile the kernel in Thumb-2 mode" if !CPU_THUMBONLY
 	depends on (CPU_V7 || CPU_V7M) && !CPU_V6 && !CPU_V6K
+	depends on !LTO
 	default y if CPU_THUMBONLY
 	select ARM_UNWIND
 	help
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 3/7] [HACK] pass endianess flag to LTO linker
  2018-02-20 21:59 [PATCH 0/7] ARM: hacks for link-time optimization Arnd Bergmann
  2018-02-20 21:59 ` [PATCH 1/7] ARM: disallow combining XIP and LTO Arnd Bergmann
  2018-02-20 21:59 ` [PATCH 2/7] ARM: LTO: avoid THUMB2_KERNEL+LTO Arnd Bergmann
@ 2018-02-20 21:59 ` Arnd Bergmann
  2018-02-21  3:15   ` Nicolas Pitre
  2018-02-21  8:37   ` Ard Biesheuvel
  2018-02-20 21:59 ` [PATCH 4/7] ARM: io-acorn: fix LTO linking without CONFIG_PRINTK Arnd Bergmann
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 35+ messages in thread
From: Arnd Bergmann @ 2018-02-20 21:59 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Andi Kleen, linux-arm-kernel, linux-kernel, Arnd Bergmann

We need some way to pass -mbig-endian to the linker during the
LTO link stage, otherwise we get a waning like

arm-linux-gnueabi/bin/ld: arch/arm/lib/clearbit.o: compiled for a big endian system and target is little endian

for each file we link in.

There is probably a better method of passing that flag, I'm just
adding it to a different hack that I added earlier for x86 LTO
here.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 33b7eb4502aa..f39c2e2d55c0 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -49,11 +49,13 @@ endif
 
 ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
 KBUILD_CPPFLAGS	+= -mbig-endian
+KBUILD_BIARCHFLAGS += -mbig-endian
 CHECKFLAGS	+= -D__ARMEB__
 AS		+= -EB
 LD		+= -EB
 else
 KBUILD_CPPFLAGS	+= -mlittle-endian
+KBUILD_BIARCHFLAGS += -mlittle-endian
 CHECKFLAGS	+= -D__ARMEL__
 AS		+= -EL
 LD		+= -EL
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 4/7] ARM: io-acorn: fix LTO linking without CONFIG_PRINTK
  2018-02-20 21:59 [PATCH 0/7] ARM: hacks for link-time optimization Arnd Bergmann
                   ` (2 preceding siblings ...)
  2018-02-20 21:59 ` [PATCH 3/7] [HACK] pass endianess flag to LTO linker Arnd Bergmann
@ 2018-02-20 21:59 ` Arnd Bergmann
  2018-02-20 21:59 ` [PATCH 5/7] ARM: fix __inflate_kernel_data stack warning for LTO Arnd Bergmann
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Arnd Bergmann @ 2018-02-20 21:59 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Andi Kleen, linux-arm-kernel, linux-kernel, Arnd Bergmann

When CONFIG_LTO is enabled, we get a link error for this file
that contains a reference to printk():

arch/arm/lib/io-acorn.o: In function `outsl':
(.text+0x38): undefined reference to `printk'

Normally the file is simply dropped, but that doesn't happen
with LTO. Making the reference conditional helps, but perhaps
a better fix would be to make sure the LTO linker drops the
entire file in the same way that we normally do.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/lib/io-acorn.S | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/lib/io-acorn.S b/arch/arm/lib/io-acorn.S
index 69719bad674d..3522a899460b 100644
--- a/arch/arm/lib/io-acorn.S
+++ b/arch/arm/lib/io-acorn.S
@@ -27,6 +27,10 @@
  */
 ENTRY(insl)
 ENTRY(outsl)
+#ifdef CONFIG_PRINTK
 		adr	r0, .Liosl_warning
 		mov	r1, lr
 		b	printk
+#else
+		ret	lr
+#endif
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 5/7] ARM: fix __inflate_kernel_data stack warning for LTO
  2018-02-20 21:59 [PATCH 0/7] ARM: hacks for link-time optimization Arnd Bergmann
                   ` (3 preceding siblings ...)
  2018-02-20 21:59 ` [PATCH 4/7] ARM: io-acorn: fix LTO linking without CONFIG_PRINTK Arnd Bergmann
@ 2018-02-20 21:59 ` Arnd Bergmann
  2018-02-21  3:26   ` Nicolas Pitre
  2018-02-20 21:59 ` [PATCH 6/7] ARM: mark assembler-referenced symbols as __visible Arnd Bergmann
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Arnd Bergmann @ 2018-02-20 21:59 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Andi Kleen, linux-arm-kernel, linux-kernel, Arnd Bergmann

Commit ca8b5d97d6bf ("ARM: XIP kernel: store .data compressed in ROM")
moved the decompressor workspace to the stack and added a compiler
flag to avoid the stack size warning.

With LTO, that warning comes back. Moving the workspace into an initdata
variable avoids that warning but presumably also undoes the optimization.

We could also try disabling the warning locally in that file with
_Pragma("GCC disagnostic"), but we lack a little bit of infrastructure
to do that nicely.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/kernel/Makefile            | 3 ---
 arch/arm/kernel/head-inflate-data.c | 3 ++-
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index b59ac4bf82b8..2e8d40d442a2 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -88,9 +88,6 @@ head-y			:= head$(MMUEXT).o
 obj-$(CONFIG_DEBUG_LL)	+= debug.o
 obj-$(CONFIG_EARLY_PRINTK)	+= early_printk.o
 
-# This is executed very early using a temporary stack when no memory allocator
-# nor global data is available. Everything has to be allocated on the stack.
-CFLAGS_head-inflate-data.o := $(call cc-option,-Wframe-larger-than=10240)
 obj-$(CONFIG_XIP_DEFLATED_DATA) += head-inflate-data.o
 
 obj-$(CONFIG_ARM_VIRT_EXT)	+= hyp-stub.o
diff --git a/arch/arm/kernel/head-inflate-data.c b/arch/arm/kernel/head-inflate-data.c
index 6dd0ce5e6058..b208c4541bd1 100644
--- a/arch/arm/kernel/head-inflate-data.c
+++ b/arch/arm/kernel/head-inflate-data.c
@@ -35,10 +35,11 @@ extern char _sdata[];
  * stack then there is no need to clean up before returning.
  */
 
+static __initdata struct inflate_state state;
+
 int __init __inflate_kernel_data(void)
 {
 	struct z_stream_s stream, *strm = &stream;
-	struct inflate_state state;
 	char *in = __data_loc;
 	int rc;
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 6/7] ARM: mark assembler-referenced symbols as __visible
  2018-02-20 21:59 [PATCH 0/7] ARM: hacks for link-time optimization Arnd Bergmann
                   ` (4 preceding siblings ...)
  2018-02-20 21:59 ` [PATCH 5/7] ARM: fix __inflate_kernel_data stack warning for LTO Arnd Bergmann
@ 2018-02-20 21:59 ` Arnd Bergmann
  2018-02-20 21:59 ` [PATCH 7/7] efi: disable LTO for EFI stub Arnd Bergmann
  2018-12-17 22:50 ` [PATCH 0/7] ARM: hacks for link-time optimization Peter Zijlstra
  7 siblings, 0 replies; 35+ messages in thread
From: Arnd Bergmann @ 2018-02-20 21:59 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Andi Kleen, linux-arm-kernel, linux-kernel, Arnd Bergmann

I got link errors for references to local symbols from
inline assembler, and also for referencing local symbols
inside of inline assembler fragments from C.

In both cases, making the symbols globally visible fixes
the link process, but this seems a bit ugly, so I hope there
is a better way to do this.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/kernel/process.c           |  2 +-
 arch/arm/kernel/suspend.c           |  2 ++
 arch/arm/kernel/unwind.c            |  1 +
 arch/arm/probes/kprobes/core.c      |  2 +-
 arch/arm/probes/kprobes/test-core.c | 11 ++++++-----
 arch/arm/vdso/vgettimeofday.c       |  2 ++
 drivers/bus/arm-cci.c               |  6 +++---
 drivers/soc/bcm/brcmstb/pm/pm-arm.c |  2 +-
 lib/clz_ctz.c                       | 20 ++++++++++----------
 9 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 1523cb18b109..38969cb279df 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -41,7 +41,7 @@
 
 #ifdef CONFIG_CC_STACKPROTECTOR
 #include <linux/stackprotector.h>
-unsigned long __stack_chk_guard __read_mostly;
+unsigned long __stack_chk_guard __read_mostly __visible;
 EXPORT_SYMBOL(__stack_chk_guard);
 #endif
 
diff --git a/arch/arm/kernel/suspend.c b/arch/arm/kernel/suspend.c
index a40ebb7c0896..6679ffa0a3ef 100644
--- a/arch/arm/kernel/suspend.c
+++ b/arch/arm/kernel/suspend.c
@@ -16,6 +16,7 @@ extern int __cpu_suspend(unsigned long, int (*)(unsigned long), u32 cpuid);
 extern void cpu_resume_mmu(void);
 
 #ifdef CONFIG_MMU
+__visible
 int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
 {
 	struct mm_struct *mm = current->active_mm;
@@ -41,6 +42,7 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
 	return ret;
 }
 #else
+__visible
 int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
 {
 	u32 __mpidr = cpu_logical_map(smp_processor_id());
diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
index 0bee233fef9a..e3bc44b35028 100644
--- a/arch/arm/kernel/unwind.c
+++ b/arch/arm/kernel/unwind.c
@@ -55,6 +55,7 @@ void __aeabi_unwind_cpp_pr0(void)
 };
 EXPORT_SYMBOL(__aeabi_unwind_cpp_pr0);
 
+__visible
 void __aeabi_unwind_cpp_pr1(void)
 {
 };
diff --git a/arch/arm/probes/kprobes/core.c b/arch/arm/probes/kprobes/core.c
index e90cc8a08186..70ce40b5aa40 100644
--- a/arch/arm/probes/kprobes/core.c
+++ b/arch/arm/probes/kprobes/core.c
@@ -242,7 +242,7 @@ singlestep(struct kprobe *p, struct pt_regs *regs, struct kprobe_ctlblk *kcb)
  * kprobe, and that level is reserved for user kprobe handlers, so we can't
  * risk encountering a new kprobe in an interrupt handler.
  */
-void __kprobes kprobe_handler(struct pt_regs *regs)
+void __kprobes __visible __used kprobe_handler(struct pt_regs *regs)
 {
 	struct kprobe *p, *cur;
 	struct kprobe_ctlblk *kcb;
diff --git a/arch/arm/probes/kprobes/test-core.c b/arch/arm/probes/kprobes/test-core.c
index 9ed0129bed3c..bc40eededab9 100644
--- a/arch/arm/probes/kprobes/test-core.c
+++ b/arch/arm/probes/kprobes/test-core.c
@@ -245,6 +245,7 @@ static void __used __naked __arm_kprobes_test_func(void)
 	__asm__ __volatile__ (
 		".arm					\n\t"
 		".type arm_func, %%function		\n\t"
+		".globl arm_func			\n\t"
 		"arm_func:				\n\t"
 		"adds	r0, r0, r1			\n\t"
 		"mov	pc, lr				\n\t"
@@ -917,7 +918,7 @@ static void coverage_end(void)
  * Framework for instruction set test cases
  */
 
-void __naked __kprobes_test_case_start(void)
+void __naked __used __visible __kprobes_test_case_start(void)
 {
 	__asm__ __volatile__ (
 		"mov	r2, sp					\n\t"
@@ -934,7 +935,7 @@ void __naked __kprobes_test_case_start(void)
 
 #ifndef CONFIG_THUMB2_KERNEL
 
-void __naked __kprobes_test_case_end_32(void)
+void __naked __used __visible __kprobes_test_case_end_32(void)
 {
 	__asm__ __volatile__ (
 		"mov	r4, lr					\n\t"
@@ -951,7 +952,7 @@ void __naked __kprobes_test_case_end_32(void)
 
 #else /* CONFIG_THUMB2_KERNEL */
 
-void __naked __kprobes_test_case_end_16(void)
+void __naked __used __visible __kprobes_test_case_end_16(void)
 {
 	__asm__ __volatile__ (
 		"mov	r4, lr					\n\t"
@@ -966,7 +967,7 @@ void __naked __kprobes_test_case_end_16(void)
 	);
 }
 
-void __naked __kprobes_test_case_end_32(void)
+void __naked __used __visible __kprobes_test_case_end_32(void)
 {
 	__asm__ __volatile__ (
 		".arm						\n\t"
@@ -1315,7 +1316,7 @@ static unsigned long next_instruction(unsigned long pc)
 	return pc + 4;
 }
 
-static uintptr_t __used kprobes_test_case_start(const char **title, void *stack)
+uintptr_t __used __visible kprobes_test_case_start(const char **title, void *stack)
 {
 	struct test_arg *args;
 	struct test_arg_end *end_arg;
diff --git a/arch/arm/vdso/vgettimeofday.c b/arch/arm/vdso/vgettimeofday.c
index b79dc05b0b4d..df7e4b4d8719 100644
--- a/arch/arm/vdso/vgettimeofday.c
+++ b/arch/arm/vdso/vgettimeofday.c
@@ -273,6 +273,8 @@ void __aeabi_unwind_cpp_pr0(void)
 {
 }
 
+__visible
+void __aeabi_unwind_cpp_pr1(void);
 void __aeabi_unwind_cpp_pr1(void)
 {
 }
diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
index 5426c04fe24b..e149590bce50 100644
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@@ -1842,7 +1842,7 @@ struct cci_ace_port {
 	struct device_node *dn;
 };
 
-static struct cci_ace_port *ports;
+struct cci_ace_port *ports;
 static unsigned int nb_cci_ports;
 
 struct cpu_port {
@@ -1877,7 +1877,7 @@ static inline bool cpu_port_match(struct cpu_port *port, u64 mpidr)
 	return port->mpidr == (mpidr & MPIDR_HWID_BITMASK);
 }
 
-static struct cpu_port cpu_port[NR_CPUS];
+struct cpu_port cpu_port[NR_CPUS];
 
 /**
  * __cci_ace_get_port - Function to retrieve the port index connected to
@@ -2027,7 +2027,7 @@ EXPORT_SYMBOL_GPL(cci_disable_port_by_cpu);
  * any failure this never returns as the inability to enable the CCI is
  * fatal and there is no possible recovery at this stage.
  */
-asmlinkage void __naked cci_enable_port_for_self(void)
+asmlinkage void __naked __visible cci_enable_port_for_self(void)
 {
 	asm volatile ("\n"
 "	.arch armv7-a\n"
diff --git a/drivers/soc/bcm/brcmstb/pm/pm-arm.c b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
index dcf8c8065508..863f3a6b2279 100644
--- a/drivers/soc/bcm/brcmstb/pm/pm-arm.c
+++ b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
@@ -400,7 +400,7 @@ static int brcmstb_pm_s2(void)
  * generate stack references on the old stack). It cannot be made static because
  * it is referenced from brcmstb_pm_s3()
  */
-noinline int brcmstb_pm_s3_finish(void)
+__visible noinline int brcmstb_pm_s3_finish(void)
 {
 	struct brcmstb_s3_params *params = ctrl.s3_params;
 	dma_addr_t params_pa = ctrl.s3_params_pa;
diff --git a/lib/clz_ctz.c b/lib/clz_ctz.c
index 2e11e48446ab..516751d12217 100644
--- a/lib/clz_ctz.c
+++ b/lib/clz_ctz.c
@@ -16,31 +16,31 @@
 #include <linux/export.h>
 #include <linux/kernel.h>
 
-int __weak __ctzsi2(int val);
-int __weak __ctzsi2(int val)
+int __visible __ctzsi2(int val);
+int __visible __ctzsi2(int val)
 {
 	return __ffs(val);
 }
 EXPORT_SYMBOL(__ctzsi2);
 
-int __weak __clzsi2(int val);
-int __weak __clzsi2(int val)
+int __visible __clzsi2(int val);
+int __visible __clzsi2(int val)
 {
 	return 32 - fls(val);
 }
 EXPORT_SYMBOL(__clzsi2);
 
-int __weak __clzdi2(long val);
-int __weak __ctzdi2(long val);
+int __visible __clzdi2(long val);
+int __visible __ctzdi2(long val);
 #if BITS_PER_LONG == 32
 
-int __weak __clzdi2(long val)
+int __visible __clzdi2(long val)
 {
 	return 32 - fls((int)val);
 }
 EXPORT_SYMBOL(__clzdi2);
 
-int __weak __ctzdi2(long val)
+int __visible __ctzdi2(long val)
 {
 	return __ffs((u32)val);
 }
@@ -48,13 +48,13 @@ EXPORT_SYMBOL(__ctzdi2);
 
 #elif BITS_PER_LONG == 64
 
-int __weak __clzdi2(long val)
+int __visible __clzdi2(long val)
 {
 	return 64 - fls64((u64)val);
 }
 EXPORT_SYMBOL(__clzdi2);
 
-int __weak __ctzdi2(long val)
+int __visible __ctzdi2(long val)
 {
 	return __ffs64((u64)val);
 }
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 7/7] efi: disable LTO for EFI stub
  2018-02-20 21:59 [PATCH 0/7] ARM: hacks for link-time optimization Arnd Bergmann
                   ` (5 preceding siblings ...)
  2018-02-20 21:59 ` [PATCH 6/7] ARM: mark assembler-referenced symbols as __visible Arnd Bergmann
@ 2018-02-20 21:59 ` Arnd Bergmann
  2018-12-17 22:50 ` [PATCH 0/7] ARM: hacks for link-time optimization Peter Zijlstra
  7 siblings, 0 replies; 35+ messages in thread
From: Arnd Bergmann @ 2018-02-20 21:59 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Andi Kleen, linux-arm-kernel, linux-kernel, Arnd Bergmann

Building the EFI stub on 32-bit ARM with LTO resulted in a build error:

arm-linux-gnueabi-ld: drivers/firmware/efi/libstub/arm-stub.stub.o: plugin needed to handle lto object
arch/arm/boot/compressed/head.o: In function `efi_stub_entry':

Locally disabling LTO for this directory seems to do the trick.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/firmware/efi/libstub/Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 7b3ba40f0745..cc4c53f1ff0a 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -20,7 +20,9 @@ cflags-$(CONFIG_EFI_ARMSTUB)	+= -I$(srctree)/scripts/dtc/libfdt
 KBUILD_CFLAGS			:= $(cflags-y) -DDISABLE_BRANCH_PROFILING \
 				   -D__NO_FORTIFY \
 				   $(call cc-option,-ffreestanding) \
-				   $(call cc-option,-fno-stack-protector)
+				   $(call cc-option,-fno-stack-protector) \
+				   $(DISABLE_LTO)
+
 
 GCOV_PROFILE			:= n
 KASAN_SANITIZE			:= n
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] ARM: disallow combining XIP and LTO
  2018-02-20 21:59 ` [PATCH 1/7] ARM: disallow combining XIP and LTO Arnd Bergmann
@ 2018-02-21  3:01   ` Nicolas Pitre
  2018-02-21 11:50     ` Arnd Bergmann
  2018-03-12  2:40   ` Nicolas Pitre
  1 sibling, 1 reply; 35+ messages in thread
From: Nicolas Pitre @ 2018-02-21  3:01 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Andi Kleen, linux-arm-kernel, linux-kernel

On Tue, 20 Feb 2018, Arnd Bergmann wrote:

> This fails during deflate_xip_data.sh
> 
>   /home/arnd/cross-gcc/bin/arm-linux-gnueabi-objcopy -O binary -R .comment -S  vmlinux arch/arm/boot/xipImage && /bin/bash -c '/git/arm-soc/arch/arm/boot/deflate_xip_data.sh vmlinux arch/arm/boot/xipImage || { rm -f arch/arm/boot/xipImage; false; }'
> make -f /git/arm-soc/scripts/Makefile.modpost
> + sym_val __data_loc
> + sed -n / __data_loc$/{s/ .*$//p;q}
> + /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-nm vmlinux
> /home/arnd/cross-gcc/lib/gcc/arm-linux-gnueabi/8.0.1/../../../../arm-linux-gnueabi/bin/nm terminated with signal 13 [Broken pipe]
> + local val=ac74c0f4
> + [ ac74c0f4 ]
> + echo 2893332724
> + __data_loc=2893332724
> + sym_val _edata_loc
> + /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-nm vmlinux
> + sed -n / _edata_loc$/{s/ .*$//p;q}
> /home/arnd/cross-gcc/lib/gcc/arm-linux-gnueabi/8.0.1/../../../../arm-linux-gnueabi/bin/nm terminated with signal 13 [Broken pipe]
> + local val=ac7b8744
> + [ ac7b8744 ]
> + echo 2893776708
> + _edata_loc=2893776708
> + sym_val _xiprom
> + sed -n / _xiprom$/{s/ .*$//p;q}
> + /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-nm vmlinux
> /home/arnd/cross-gcc/lib/gcc/arm-linux-gnueabi/8.0.1/../../../../arm-linux-gnueabi/bin/nm terminated with signal 13 [Broken pipe]
> 
> Obviously we want to make the combination work, no idea why it doesn't.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arm/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 823e397ee0f3..8ed0f664f86f 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1976,6 +1976,7 @@ endchoice
>  config XIP_KERNEL
>  	bool "Kernel Execute-In-Place from ROM"
>  	depends on !ARM_LPAE && !ARCH_MULTIPLATFORM
> +	depends on !LTO

You should move this to config XIP_DEFLATED_DATA instead.


Nicolas

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/7] ARM: LTO: avoid THUMB2_KERNEL+LTO
  2018-02-20 21:59 ` [PATCH 2/7] ARM: LTO: avoid THUMB2_KERNEL+LTO Arnd Bergmann
@ 2018-02-21  3:12   ` Nicolas Pitre
  2018-02-21 11:48     ` Arnd Bergmann
  2018-03-07 18:30   ` Matthias Kaehlcke
  1 sibling, 1 reply; 35+ messages in thread
From: Nicolas Pitre @ 2018-02-21  3:12 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Andi Kleen, linux-arm-kernel, linux-kernel

On Tue, 20 Feb 2018, Arnd Bergmann wrote:

> Trying to build an LTO-Enabled kernel with Thumb2 instructions failed
> horribly for me, with an endless output of things like
> 
> ccVnNycO.s:2665: Error: thumb conditional instruction should be in IT block -- `bxne lr'
> ccVnNycO.s:7128: Error: thumb conditional instruction should be in IT block -- `strexeq r5,r2,[r3]'
> ccVnNycO.s:7258: Error: thumb conditional instruction should be in IT block -- `strexeq lr,r0,[r3]'
> ccVnNycO.s:17380: Error: thumb conditional instruction should be in IT block -- `strexeq r1,r2,[r6]'
> ccVnNycO.s:19163: Error: thumb conditional instruction should be in IT block -- `strexeq r8,r6,[r3]'
> ccVnNycO.s:22722: Error: thumb conditional instruction should be in IT block -- `strexeq r7,r1,[r0]'
> ccVnNycO.s:24105: conditional infixes are deprecated in unified syntax
> ccVnNycO.s:24105: Error: thumb conditional instruction should be in IT block -- `sbcccs r1,r1,r3'
> ccVnNycO.s:24105: Error: thumb conditional instruction should be in IT block -- `movcc r3,#0'
> ccVnNycO.s:24210: conditional infixes are deprecated in unified syntax
> ccVnNycO.s:24210: Error: thumb conditional instruction should be in IT block -- `sbcccs r2,r2,r3'
> ccVnNycO.s:24210: Error: thumb conditional instruction should be in IT block -- `movcc r3,#0'
> 
> I did not investigate this too much, disabling Thumb2 support when LTO is
> set lets me build randconfig kernels.
> 
> Since ARM_SINGLE_ARMV7M is Thumb2-only, I have to disallow LTO for V7-M
> targets.

Here's the workaround I sent you on January 2rd:

----- >*
#!/bin/bash

# work around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78353

GCC_ROOT=/opt/gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf

set -e
set -x

cd $GCC_ROOT//arm-linux-gnueabihf/bin
[ -e fat-as ] && exit 1
mv as fat-as
cat > as << EOF
#!/bin/bash
exec -a "\$0" "\$(dirname "\$0")/fat-as" -mimplicit-it=always "\$@"
EOF
chmod +x as
----- >8


Nicolas

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] [HACK] pass endianess flag to LTO linker
  2018-02-20 21:59 ` [PATCH 3/7] [HACK] pass endianess flag to LTO linker Arnd Bergmann
@ 2018-02-21  3:15   ` Nicolas Pitre
  2018-02-21  9:44     ` Arnd Bergmann
  2018-02-21  8:37   ` Ard Biesheuvel
  1 sibling, 1 reply; 35+ messages in thread
From: Nicolas Pitre @ 2018-02-21  3:15 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Andi Kleen, linux-arm-kernel, linux-kernel

On Tue, 20 Feb 2018, Arnd Bergmann wrote:

> We need some way to pass -mbig-endian to the linker during the
> LTO link stage, otherwise we get a waning like
> 
> arm-linux-gnueabi/bin/ld: arch/arm/lib/clearbit.o: compiled for a big endian system and target is little endian
> 
> for each file we link in.
> 
> There is probably a better method of passing that flag, I'm just
> adding it to a different hack that I added earlier for x86 LTO
> here.

Didn't the patch below fix it for you already?

----- >8
Date: Fri, 1 Sep 2017 18:37:52 -0400
Subject: [PATCH] scripts/gcc-ld: LTO on ARM needs arch specific gcc flags

Otherwise the final link where code generation happens produces code
for the wrong ISA when the default CPU configured into gcc is not the
one we need.

Also display the actual command when invoked with "make V=1".

Signed-off-by: Nicolas Pitre <nico@linaro.org>

diff --git a/scripts/gcc-ld b/scripts/gcc-ld
index d95dd0be38..fa53be2a34 100755
--- a/scripts/gcc-ld
+++ b/scripts/gcc-ld
@@ -27,4 +27,10 @@ while [ "$1" != "" ] ; do
 	shift
 done
 
-exec $CC $ARGS
+case "${KBUILD_VERBOSE}" in
+*1*)
+	set -x
+	;;
+esac
+
+exec $CC $KBUILD_CFLAGS $ARGS

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/7] ARM: fix __inflate_kernel_data stack warning for LTO
  2018-02-20 21:59 ` [PATCH 5/7] ARM: fix __inflate_kernel_data stack warning for LTO Arnd Bergmann
@ 2018-02-21  3:26   ` Nicolas Pitre
  0 siblings, 0 replies; 35+ messages in thread
From: Nicolas Pitre @ 2018-02-21  3:26 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Andi Kleen, linux-arm-kernel, linux-kernel

On Tue, 20 Feb 2018, Arnd Bergmann wrote:

> Commit ca8b5d97d6bf ("ARM: XIP kernel: store .data compressed in ROM")
> moved the decompressor workspace to the stack and added a compiler
> flag to avoid the stack size warning.
> 
> With LTO, that warning comes back. Moving the workspace into an initdata
> variable avoids that warning but presumably also undoes the optimization.

Not only that, but it will probably crash at run time. What this code 
does is uncompressing initialized data to memory, _including_ initdata. 
So you'll end up overwriting your inflate_state while decompressing.

> We could also try disabling the warning locally in that file with
> _Pragma("GCC disagnostic"), but we lack a little bit of infrastructure
> to do that nicely.

Your patch #1/7 showed issues with the final part of this feature 
anyway, so my suggestion for that patch will take care of this one too 
for the time being.

> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arm/kernel/Makefile            | 3 ---
>  arch/arm/kernel/head-inflate-data.c | 3 ++-
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
> index b59ac4bf82b8..2e8d40d442a2 100644
> --- a/arch/arm/kernel/Makefile
> +++ b/arch/arm/kernel/Makefile
> @@ -88,9 +88,6 @@ head-y			:= head$(MMUEXT).o
>  obj-$(CONFIG_DEBUG_LL)	+= debug.o
>  obj-$(CONFIG_EARLY_PRINTK)	+= early_printk.o
>  
> -# This is executed very early using a temporary stack when no memory allocator
> -# nor global data is available. Everything has to be allocated on the stack.
> -CFLAGS_head-inflate-data.o := $(call cc-option,-Wframe-larger-than=10240)
>  obj-$(CONFIG_XIP_DEFLATED_DATA) += head-inflate-data.o
>  
>  obj-$(CONFIG_ARM_VIRT_EXT)	+= hyp-stub.o
> diff --git a/arch/arm/kernel/head-inflate-data.c b/arch/arm/kernel/head-inflate-data.c
> index 6dd0ce5e6058..b208c4541bd1 100644
> --- a/arch/arm/kernel/head-inflate-data.c
> +++ b/arch/arm/kernel/head-inflate-data.c
> @@ -35,10 +35,11 @@ extern char _sdata[];
>   * stack then there is no need to clean up before returning.
>   */
>  
> +static __initdata struct inflate_state state;
> +
>  int __init __inflate_kernel_data(void)
>  {
>  	struct z_stream_s stream, *strm = &stream;
> -	struct inflate_state state;
>  	char *in = __data_loc;
>  	int rc;
>  
> -- 
> 2.9.0
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] [HACK] pass endianess flag to LTO linker
  2018-02-20 21:59 ` [PATCH 3/7] [HACK] pass endianess flag to LTO linker Arnd Bergmann
  2018-02-21  3:15   ` Nicolas Pitre
@ 2018-02-21  8:37   ` Ard Biesheuvel
  2018-02-21  9:48     ` Arnd Bergmann
  1 sibling, 1 reply; 35+ messages in thread
From: Ard Biesheuvel @ 2018-02-21  8:37 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nicolas Pitre, Andi Kleen, Linux Kernel Mailing List, linux-arm-kernel

On 20 February 2018 at 21:59, Arnd Bergmann <arnd@arndb.de> wrote:
> We need some way to pass -mbig-endian to the linker during the
> LTO link stage, otherwise we get a waning like
>
> arm-linux-gnueabi/bin/ld: arch/arm/lib/clearbit.o: compiled for a big endian system and target is little endian
>
> for each file we link in.
>
> There is probably a better method of passing that flag, I'm just
> adding it to a different hack that I added earlier for x86 LTO
> here.
>

In general, LTO requires that *all* C flags are passed to the linker.
Given that linking now involves code generation, any C flag that
affects code generation must be visible to the linker as well, which
includes all the tweaks and overrides that we add per-file or
per-directory. It is not clear to me how much of this is carried in
the intermediate representation as metadata, but we should probably
err on the side of caution here, and update the Kbuild routines to
pass the complete value of KBUILD_CFLAGS (or whatever it is called) to
ld as well.



> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arm/Makefile | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm/Makefile b/arch/arm/Makefile
> index 33b7eb4502aa..f39c2e2d55c0 100644
> --- a/arch/arm/Makefile
> +++ b/arch/arm/Makefile
> @@ -49,11 +49,13 @@ endif
>
>  ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
>  KBUILD_CPPFLAGS        += -mbig-endian
> +KBUILD_BIARCHFLAGS += -mbig-endian
>  CHECKFLAGS     += -D__ARMEB__
>  AS             += -EB
>  LD             += -EB
>  else
>  KBUILD_CPPFLAGS        += -mlittle-endian
> +KBUILD_BIARCHFLAGS += -mlittle-endian
>  CHECKFLAGS     += -D__ARMEL__
>  AS             += -EL
>  LD             += -EL
> --
> 2.9.0
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] [HACK] pass endianess flag to LTO linker
  2018-02-21  3:15   ` Nicolas Pitre
@ 2018-02-21  9:44     ` Arnd Bergmann
  0 siblings, 0 replies; 35+ messages in thread
From: Arnd Bergmann @ 2018-02-21  9:44 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Andi Kleen, Linux ARM, Linux Kernel Mailing List

On Wed, Feb 21, 2018 at 4:15 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Tue, 20 Feb 2018, Arnd Bergmann wrote:
>
>> We need some way to pass -mbig-endian to the linker during the
>> LTO link stage, otherwise we get a waning like
>>
>> arm-linux-gnueabi/bin/ld: arch/arm/lib/clearbit.o: compiled for a big endian system and target is little endian
>>
>> for each file we link in.
>>
>> There is probably a better method of passing that flag, I'm just
>> adding it to a different hack that I added earlier for x86 LTO
>> here.
>
> Didn't the patch below fix it for you already?

I think the problem here is that -mbig-endian is part of KBUILD_CPPFLAGS,
not KBUILD_CFLAGS. We add the latter to the gcc-ld command line, but not
the former.

      Arnd

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] [HACK] pass endianess flag to LTO linker
  2018-02-21  8:37   ` Ard Biesheuvel
@ 2018-02-21  9:48     ` Arnd Bergmann
  2018-02-21 10:09       ` Ard Biesheuvel
  0 siblings, 1 reply; 35+ messages in thread
From: Arnd Bergmann @ 2018-02-21  9:48 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Nicolas Pitre, Andi Kleen, Linux Kernel Mailing List, linux-arm-kernel

On Wed, Feb 21, 2018 at 9:37 AM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 20 February 2018 at 21:59, Arnd Bergmann <arnd@arndb.de> wrote:
>> We need some way to pass -mbig-endian to the linker during the
>> LTO link stage, otherwise we get a waning like
>>
>> arm-linux-gnueabi/bin/ld: arch/arm/lib/clearbit.o: compiled for a big endian system and target is little endian
>>
>> for each file we link in.
>>
>> There is probably a better method of passing that flag, I'm just
>> adding it to a different hack that I added earlier for x86 LTO
>> here.
>>
>
> In general, LTO requires that *all* C flags are passed to the linker.
> Given that linking now involves code generation, any C flag that
> affects code generation must be visible to the linker as well, which
> includes all the tweaks and overrides that we add per-file or
> per-directory. It is not clear to me how much of this is carried in
> the intermediate representation as metadata, but we should probably
> err on the side of caution here, and update the Kbuild routines to
> pass the complete value of KBUILD_CFLAGS (or whatever it is called) to
> ld as well.

It looks like we're just missing KBUILD_CPPFLAGS.

However, I wonder for the more general case what happens to files
that require non-standard CFLAGS. In some cases we turn off
some optimization step for a file, we might remove '-pg', or build for
a particular target architecture. Do we have to turn off -flto for any file
that requires this for correct behavior?

      Arnd

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] [HACK] pass endianess flag to LTO linker
  2018-02-21  9:48     ` Arnd Bergmann
@ 2018-02-21 10:09       ` Ard Biesheuvel
  2018-02-21 13:00         ` Arnd Bergmann
  0 siblings, 1 reply; 35+ messages in thread
From: Ard Biesheuvel @ 2018-02-21 10:09 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nicolas Pitre, Andi Kleen, Linux Kernel Mailing List, linux-arm-kernel

On 21 February 2018 at 09:48, Arnd Bergmann <arnd@arndb.de> wrote:
> On Wed, Feb 21, 2018 at 9:37 AM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> On 20 February 2018 at 21:59, Arnd Bergmann <arnd@arndb.de> wrote:
>>> We need some way to pass -mbig-endian to the linker during the
>>> LTO link stage, otherwise we get a waning like
>>>
>>> arm-linux-gnueabi/bin/ld: arch/arm/lib/clearbit.o: compiled for a big endian system and target is little endian
>>>
>>> for each file we link in.
>>>
>>> There is probably a better method of passing that flag, I'm just
>>> adding it to a different hack that I added earlier for x86 LTO
>>> here.
>>>
>>
>> In general, LTO requires that *all* C flags are passed to the linker.
>> Given that linking now involves code generation, any C flag that
>> affects code generation must be visible to the linker as well, which
>> includes all the tweaks and overrides that we add per-file or
>> per-directory. It is not clear to me how much of this is carried in
>> the intermediate representation as metadata, but we should probably
>> err on the side of caution here, and update the Kbuild routines to
>> pass the complete value of KBUILD_CFLAGS (or whatever it is called) to
>> ld as well.
>
> It looks like we're just missing KBUILD_CPPFLAGS.
>
> However, I wonder for the more general case what happens to files
> that require non-standard CFLAGS.

That was kind of my point, yes.

> In some cases we turn off
> some optimization step for a file, we might remove '-pg', or build for
> a particular target architecture. Do we have to turn off -flto for any file
> that requires this for correct behavior?
>

Excellent question. I think the problem is that the file boundary is
becoming blurred with LTO, and so any option that is absolutely
required to build a single file correctly may need to be applied to
vmlinux as a whole. Whether that is the case for any particular option
depends on which compiler pass(es) is (are) affected by the option,
i.e., whether it is taken into account when creating the intermediate
representation.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/7] ARM: LTO: avoid THUMB2_KERNEL+LTO
  2018-02-21  3:12   ` Nicolas Pitre
@ 2018-02-21 11:48     ` Arnd Bergmann
  0 siblings, 0 replies; 35+ messages in thread
From: Arnd Bergmann @ 2018-02-21 11:48 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Andi Kleen, Linux ARM, Linux Kernel Mailing List

On Wed, Feb 21, 2018 at 4:12 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Tue, 20 Feb 2018, Arnd Bergmann wrote:
>
>> Trying to build an LTO-Enabled kernel with Thumb2 instructions failed
>> horribly for me, with an endless output of things like
>>
>> ccVnNycO.s:2665: Error: thumb conditional instruction should be in IT block -- `bxne lr'
>> ccVnNycO.s:7128: Error: thumb conditional instruction should be in IT block -- `strexeq r5,r2,[r3]'
>> ccVnNycO.s:7258: Error: thumb conditional instruction should be in IT block -- `strexeq lr,r0,[r3]'
>> ccVnNycO.s:17380: Error: thumb conditional instruction should be in IT block -- `strexeq r1,r2,[r6]'
>> ccVnNycO.s:19163: Error: thumb conditional instruction should be in IT block -- `strexeq r8,r6,[r3]'
>> ccVnNycO.s:22722: Error: thumb conditional instruction should be in IT block -- `strexeq r7,r1,[r0]'
>> ccVnNycO.s:24105: conditional infixes are deprecated in unified syntax
>> ccVnNycO.s:24105: Error: thumb conditional instruction should be in IT block -- `sbcccs r1,r1,r3'
>> ccVnNycO.s:24105: Error: thumb conditional instruction should be in IT block -- `movcc r3,#0'
>> ccVnNycO.s:24210: conditional infixes are deprecated in unified syntax
>> ccVnNycO.s:24210: Error: thumb conditional instruction should be in IT block -- `sbcccs r2,r2,r3'
>> ccVnNycO.s:24210: Error: thumb conditional instruction should be in IT block -- `movcc r3,#0'
>>
>> I did not investigate this too much, disabling Thumb2 support when LTO is
>> set lets me build randconfig kernels.
>>
>> Since ARM_SINGLE_ARMV7M is Thumb2-only, I have to disallow LTO for V7-M
>> targets.
>
> Here's the workaround I sent you on January 2rd:

Hmm, I thought I had applied that correctly but looking again now I
must have overwritten
the assembler when I reinstalled the toolchain from source to get a
fixed version of the
compiler.

       Arnd

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] ARM: disallow combining XIP and LTO
  2018-02-21  3:01   ` Nicolas Pitre
@ 2018-02-21 11:50     ` Arnd Bergmann
  2018-02-21 15:13       ` Nicolas Pitre
  0 siblings, 1 reply; 35+ messages in thread
From: Arnd Bergmann @ 2018-02-21 11:50 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Andi Kleen, Linux ARM, Linux Kernel Mailing List

On Wed, Feb 21, 2018 at 4:01 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Tue, 20 Feb 2018, Arnd Bergmann wrote:
>
>> This fails during deflate_xip_data.sh
>>
>>   /home/arnd/cross-gcc/bin/arm-linux-gnueabi-objcopy -O binary -R .comment -S  vmlinux arch/arm/boot/xipImage && /bin/bash -c '/git/arm-soc/arch/arm/boot/deflate_xip_data.sh vmlinux arch/arm/boot/xipImage || { rm -f arch/arm/boot/xipImage; false; }'
>> make -f /git/arm-soc/scripts/Makefile.modpost
>> + sym_val __data_loc
>> + sed -n / __data_loc$/{s/ .*$//p;q}
>> + /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-nm vmlinux
>> /home/arnd/cross-gcc/lib/gcc/arm-linux-gnueabi/8.0.1/../../../../arm-linux-gnueabi/bin/nm terminated with signal 13 [Broken pipe]
>> + local val=ac74c0f4
>> + [ ac74c0f4 ]
>> + echo 2893332724
>> + __data_loc=2893332724
>> + sym_val _edata_loc
>> + /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-nm vmlinux
>> + sed -n / _edata_loc$/{s/ .*$//p;q}
>> /home/arnd/cross-gcc/lib/gcc/arm-linux-gnueabi/8.0.1/../../../../arm-linux-gnueabi/bin/nm terminated with signal 13 [Broken pipe]
>> + local val=ac7b8744
>> + [ ac7b8744 ]
>> + echo 2893776708
>> + _edata_loc=2893776708
>> + sym_val _xiprom
>> + sed -n / _xiprom$/{s/ .*$//p;q}
>> + /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-nm vmlinux
>> /home/arnd/cross-gcc/lib/gcc/arm-linux-gnueabi/8.0.1/../../../../arm-linux-gnueabi/bin/nm terminated with signal 13 [Broken pipe]
>>
>> Obviously we want to make the combination work, no idea why it doesn't.
>>
>
> You should move this to config XIP_DEFLATED_DATA instead.

Right, makes sense. I'd still prever nm to not crash, but that would be a small
improvement.

       Arnd

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] [HACK] pass endianess flag to LTO linker
  2018-02-21 10:09       ` Ard Biesheuvel
@ 2018-02-21 13:00         ` Arnd Bergmann
  0 siblings, 0 replies; 35+ messages in thread
From: Arnd Bergmann @ 2018-02-21 13:00 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Nicolas Pitre, Andi Kleen, Linux Kernel Mailing List, linux-arm-kernel

On Wed, Feb 21, 2018 at 11:09 AM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 21 February 2018 at 09:48, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Wed, Feb 21, 2018 at 9:37 AM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>>> On 20 February 2018 at 21:59, Arnd Bergmann <arnd@arndb.de> wrote:
>
>> In some cases we turn off
>> some optimization step for a file, we might remove '-pg', or build for
>> a particular target architecture. Do we have to turn off -flto for any file
>> that requires this for correct behavior?
>>
>
> Excellent question. I think the problem is that the file boundary is
> becoming blurred with LTO, and so any option that is absolutely
> required to build a single file correctly may need to be applied to
> vmlinux as a whole. Whether that is the case for any particular option
> depends on which compiler pass(es) is (are) affected by the option,
> i.e., whether it is taken into account when creating the intermediate
> representation.

Newer compilers are able to change both the optimization and warning
flags per function using a #pragma or _Pragma() directive or
function attributes.
There has been some recent discussion about requiring a newer gcc
version, so if we do that, we could perhaps replace all the existing
CFLAGS overrides with those pragmas.

     Arnd

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] ARM: disallow combining XIP and LTO
  2018-02-21 11:50     ` Arnd Bergmann
@ 2018-02-21 15:13       ` Nicolas Pitre
  0 siblings, 0 replies; 35+ messages in thread
From: Nicolas Pitre @ 2018-02-21 15:13 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Andi Kleen, Linux ARM, Linux Kernel Mailing List

On Wed, 21 Feb 2018, Arnd Bergmann wrote:

> On Wed, Feb 21, 2018 at 4:01 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > On Tue, 20 Feb 2018, Arnd Bergmann wrote:
> >
> >> This fails during deflate_xip_data.sh
> >>
> >>   /home/arnd/cross-gcc/bin/arm-linux-gnueabi-objcopy -O binary -R .comment -S  vmlinux arch/arm/boot/xipImage && /bin/bash -c '/git/arm-soc/arch/arm/boot/deflate_xip_data.sh vmlinux arch/arm/boot/xipImage || { rm -f arch/arm/boot/xipImage; false; }'
> >> make -f /git/arm-soc/scripts/Makefile.modpost
> >> + sym_val __data_loc
> >> + sed -n / __data_loc$/{s/ .*$//p;q}
> >> + /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-nm vmlinux
> >> /home/arnd/cross-gcc/lib/gcc/arm-linux-gnueabi/8.0.1/../../../../arm-linux-gnueabi/bin/nm terminated with signal 13 [Broken pipe]
> >> + local val=ac74c0f4
> >> + [ ac74c0f4 ]
> >> + echo 2893332724
> >> + __data_loc=2893332724
> >> + sym_val _edata_loc
> >> + /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-nm vmlinux
> >> + sed -n / _edata_loc$/{s/ .*$//p;q}
> >> /home/arnd/cross-gcc/lib/gcc/arm-linux-gnueabi/8.0.1/../../../../arm-linux-gnueabi/bin/nm terminated with signal 13 [Broken pipe]
> >> + local val=ac7b8744
> >> + [ ac7b8744 ]
> >> + echo 2893776708
> >> + _edata_loc=2893776708
> >> + sym_val _xiprom
> >> + sed -n / _xiprom$/{s/ .*$//p;q}
> >> + /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-nm vmlinux
> >> /home/arnd/cross-gcc/lib/gcc/arm-linux-gnueabi/8.0.1/../../../../arm-linux-gnueabi/bin/nm terminated with signal 13 [Broken pipe]
> >>
> >> Obviously we want to make the combination work, no idea why it doesn't.
> >>
> >
> > You should move this to config XIP_DEFLATED_DATA instead.
> 
> Right, makes sense. I'd still prever nm to not crash, but that would be a small
> improvement.

I'll have a look at nm.

However this feature is linked to head-inflate-data.c (your patch #5/7) 
and I have no obvious quick solution in mind for that one.


Nicolas

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/7] ARM: LTO: avoid THUMB2_KERNEL+LTO
  2018-02-20 21:59 ` [PATCH 2/7] ARM: LTO: avoid THUMB2_KERNEL+LTO Arnd Bergmann
  2018-02-21  3:12   ` Nicolas Pitre
@ 2018-03-07 18:30   ` Matthias Kaehlcke
  2018-03-07 18:52     ` Nicolas Pitre
  1 sibling, 1 reply; 35+ messages in thread
From: Matthias Kaehlcke @ 2018-03-07 18:30 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Nicolas Pitre, Andi Kleen, linux-arm-kernel, linux-kernel

El Tue, Feb 20, 2018 at 10:59:49PM +0100 Arnd Bergmann ha dit:

> Trying to build an LTO-Enabled kernel with Thumb2 instructions failed
> horribly for me, with an endless output of things like
> 
> ccVnNycO.s:2665: Error: thumb conditional instruction should be in IT block -- `bxne lr'
> ccVnNycO.s:7128: Error: thumb conditional instruction should be in IT block -- `strexeq r5,r2,[r3]'
> ccVnNycO.s:7258: Error: thumb conditional instruction should be in IT block -- `strexeq lr,r0,[r3]'
> ccVnNycO.s:17380: Error: thumb conditional instruction should be in IT block -- `strexeq r1,r2,[r6]'
> ccVnNycO.s:19163: Error: thumb conditional instruction should be in IT block -- `strexeq r8,r6,[r3]'
> ccVnNycO.s:22722: Error: thumb conditional instruction should be in IT block -- `strexeq r7,r1,[r0]'
> ccVnNycO.s:24105: conditional infixes are deprecated in unified syntax
> ccVnNycO.s:24105: Error: thumb conditional instruction should be in IT block -- `sbcccs r1,r1,r3'
> ccVnNycO.s:24105: Error: thumb conditional instruction should be in IT block -- `movcc r3,#0'
> ccVnNycO.s:24210: conditional infixes are deprecated in unified syntax
> ccVnNycO.s:24210: Error: thumb conditional instruction should be in IT block -- `sbcccs r2,r2,r3'
> ccVnNycO.s:24210: Error: thumb conditional instruction should be in IT block -- `movcc r3,#0'

For the record:

The errors about sbcccs and movcc probably stem from here:

/* We use 33-bit arithmetic here... */
#define __range_ok(addr, size) ({ \
        unsigned long flag, roksum; \
        __chk_user_ptr(addr);   \
        __asm__("adds %1, %2, %3; sbcccs %1, %1, %0; movcc %0, #0" \
                : "=&r" (flag), "=&r" (roksum) \
                : "r" (addr), "Ir" (size), "0" (current_thread_info()->addr_limit) \
                : "cc"); \
        flag; })

arch/arm/include/asm/uaccess.h

I stumbled across this when trying to build a 32-bit ARM kernel with
clang. This post has some more information:

https://lists.linuxfoundation.org/pipermail/llvmlinux/2012-November/000073.html

"This is a problem with the above Linux inline ASM code, not with Clang,
nor with gas. It appears the new "Unified syntax" for ARM ASM code now
used by gas doesn't allow the use of conditional infixes anymore (the CC
part)."

Unfortunately I'm probably not fluent enough with ARM inline assembly
to come up with a good alternative.

Matthias

> I did not investigate this too much, disabling Thumb2 support when LTO is
> set lets me build randconfig kernels.
> 
> Since ARM_SINGLE_ARMV7M is Thumb2-only, I have to disallow LTO for V7-M
> targets.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/arm/Kconfig | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 8ed0f664f86f..fbf2c3ab9a97 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -18,7 +18,7 @@ config ARM
>  	select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
>  	select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT if CPU_V7
>  	select ARCH_SUPPORTS_ATOMIC_RMW
> -	select ARCH_SUPPORTS_LTO
> +	select ARCH_SUPPORTS_LTO if !ARM_SINGLE_ARMV7M
>  	select ARCH_USE_BUILTIN_BSWAP
>  	select ARCH_USE_CMPXCHG_LOCKREF
>  	select ARCH_WANT_IPC_PARSE_VERSION
> @@ -1533,6 +1533,7 @@ config SCHED_HRTICK
>  config THUMB2_KERNEL
>  	bool "Compile the kernel in Thumb-2 mode" if !CPU_THUMBONLY
>  	depends on (CPU_V7 || CPU_V7M) && !CPU_V6 && !CPU_V6K
> +	depends on !LTO
>  	default y if CPU_THUMBONLY
>  	select ARM_UNWIND
>  	help

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/7] ARM: LTO: avoid THUMB2_KERNEL+LTO
  2018-03-07 18:30   ` Matthias Kaehlcke
@ 2018-03-07 18:52     ` Nicolas Pitre
  0 siblings, 0 replies; 35+ messages in thread
From: Nicolas Pitre @ 2018-03-07 18:52 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: Arnd Bergmann, Andi Kleen, linux-arm-kernel, linux-kernel

On Wed, 7 Mar 2018, Matthias Kaehlcke wrote:

> El Tue, Feb 20, 2018 at 10:59:49PM +0100 Arnd Bergmann ha dit:
> 
> > Trying to build an LTO-Enabled kernel with Thumb2 instructions failed
> > horribly for me, with an endless output of things like
> > 
> > ccVnNycO.s:2665: Error: thumb conditional instruction should be in IT block -- `bxne lr'
> > ccVnNycO.s:7128: Error: thumb conditional instruction should be in IT block -- `strexeq r5,r2,[r3]'
> > ccVnNycO.s:7258: Error: thumb conditional instruction should be in IT block -- `strexeq lr,r0,[r3]'
> > ccVnNycO.s:17380: Error: thumb conditional instruction should be in IT block -- `strexeq r1,r2,[r6]'
> > ccVnNycO.s:19163: Error: thumb conditional instruction should be in IT block -- `strexeq r8,r6,[r3]'
> > ccVnNycO.s:22722: Error: thumb conditional instruction should be in IT block -- `strexeq r7,r1,[r0]'
> > ccVnNycO.s:24105: conditional infixes are deprecated in unified syntax
> > ccVnNycO.s:24105: Error: thumb conditional instruction should be in IT block -- `sbcccs r1,r1,r3'
> > ccVnNycO.s:24105: Error: thumb conditional instruction should be in IT block -- `movcc r3,#0'
> > ccVnNycO.s:24210: conditional infixes are deprecated in unified syntax
> > ccVnNycO.s:24210: Error: thumb conditional instruction should be in IT block -- `sbcccs r2,r2,r3'
> > ccVnNycO.s:24210: Error: thumb conditional instruction should be in IT block -- `movcc r3,#0'
> 
> For the record:
> 
> The errors about sbcccs and movcc probably stem from here:
> 
> /* We use 33-bit arithmetic here... */
> #define __range_ok(addr, size) ({ \
>         unsigned long flag, roksum; \
>         __chk_user_ptr(addr);   \
>         __asm__("adds %1, %2, %3; sbcccs %1, %1, %0; movcc %0, #0" \
>                 : "=&r" (flag), "=&r" (roksum) \
>                 : "r" (addr), "Ir" (size), "0" (current_thread_info()->addr_limit) \
>                 : "cc"); \
>         flag; })
> 
> arch/arm/include/asm/uaccess.h
> 
> I stumbled across this when trying to build a 32-bit ARM kernel with
> clang.

You have to tell clang to pass -mno-warn-deprecated to gas. That's what 
the gcc build currently does.

In this particular case with LTO, the same trick as done for 
-mimplicit-it=always would do it to work around 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78353.


Nicolas

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] ARM: disallow combining XIP and LTO
  2018-02-20 21:59 ` [PATCH 1/7] ARM: disallow combining XIP and LTO Arnd Bergmann
  2018-02-21  3:01   ` Nicolas Pitre
@ 2018-03-12  2:40   ` Nicolas Pitre
  2018-03-12 13:52     ` Arnd Bergmann
  1 sibling, 1 reply; 35+ messages in thread
From: Nicolas Pitre @ 2018-03-12  2:40 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Andi Kleen, linux-arm-kernel, linux-kernel

On Tue, 20 Feb 2018, Arnd Bergmann wrote:

> This fails during deflate_xip_data.sh
> 
>   /home/arnd/cross-gcc/bin/arm-linux-gnueabi-objcopy -O binary -R .comment -S  vmlinux arch/arm/boot/xipImage && /bin/bash -c '/git/arm-soc/arch/arm/boot/deflate_xip_data.sh vmlinux arch/arm/boot/xipImage || { rm -f arch/arm/boot/xipImage; false; }'
> make -f /git/arm-soc/scripts/Makefile.modpost
> + sym_val __data_loc
> + sed -n / __data_loc$/{s/ .*$//p;q}
> + /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-nm vmlinux
> /home/arnd/cross-gcc/lib/gcc/arm-linux-gnueabi/8.0.1/../../../../arm-linux-gnueabi/bin/nm terminated with signal 13 [Broken pipe]
> + local val=ac74c0f4
> + [ ac74c0f4 ]
> + echo 2893332724
> + __data_loc=2893332724
> + sym_val _edata_loc
> + /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-nm vmlinux
> + sed -n / _edata_loc$/{s/ .*$//p;q}
> /home/arnd/cross-gcc/lib/gcc/arm-linux-gnueabi/8.0.1/../../../../arm-linux-gnueabi/bin/nm terminated with signal 13 [Broken pipe]
> + local val=ac7b8744
> + [ ac7b8744 ]
> + echo 2893776708
> + _edata_loc=2893776708
> + sym_val _xiprom
> + sed -n / _xiprom$/{s/ .*$//p;q}
> + /home/arnd/cross-gcc/bin/arm-linux-gnueabi-gcc-nm vmlinux
> /home/arnd/cross-gcc/lib/gcc/arm-linux-gnueabi/8.0.1/../../../../arm-linux-gnueabi/bin/nm terminated with signal 13 [Broken pipe]
> 
> Obviously we want to make the combination work, no idea why it doesn't.

Well, it does work regardless of the noise. Here the nm output is piped 
into sed, and the later exits early when it finds what it is looking 
for, causing nm to complain about the broken pipe.

Here's a patch silencing this bogus error message and fixing other minor 
issues.

----- >8
Subject: [PATCH] ARM: deflate_xip_data.sh: minor fixes

Send nm complaints about broken pipe (when sed exits early) to /dev/null.
All errors should be printed to stderr.
Don't trap on normal exit so the trap can return an error code.

Signed-off-by: Nicolas Pitre <nico@linaro.org>

diff --git a/arch/arm/boot/deflate_xip_data.sh b/arch/arm/boot/deflate_xip_data.sh
index 1189598a25..5e7d758ebd 100755
--- a/arch/arm/boot/deflate_xip_data.sh
+++ b/arch/arm/boot/deflate_xip_data.sh
@@ -30,7 +30,7 @@ esac
 
 sym_val() {
 	# extract hex value for symbol in $1
-	local val=$($NM "$VMLINUX" | sed -n "/ $1$/{s/ .*$//p;q}")
+	local val=$($NM "$VMLINUX" 2>/dev/null | sed -n "/ $1\$/{s/ .*$//p;q}")
 	[ "$val" ] || { echo "can't find $1 in $VMLINUX" 1>&2; exit 1; }
 	# convert from hex to decimal
 	echo $((0x$val))
@@ -48,12 +48,12 @@ data_end=$(($_edata_loc - $base_offset))
 file_end=$(stat -c "%s" "$XIPIMAGE")
 if [ "$file_end" != "$data_end" ]; then
 	printf "end of xipImage doesn't match with _edata_loc (%#x vs %#x)\n" \
-	       $(($file_end + $base_offset)) $_edata_loc 2>&1
+	       $(($file_end + $base_offset)) $_edata_loc 1>&2
 	exit 1;
 fi
 
 # be ready to clean up
-trap 'rm -f "$XIPIMAGE.tmp"' 0 1 2 3
+trap 'rm -f "$XIPIMAGE.tmp"; exit 1' 1 2 3
 
 # substitute the data section by a compressed version
 $DD if="$XIPIMAGE" count=$data_start iflag=count_bytes of="$XIPIMAGE.tmp"

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] ARM: disallow combining XIP and LTO
  2018-03-12  2:40   ` Nicolas Pitre
@ 2018-03-12 13:52     ` Arnd Bergmann
  2018-03-12 16:46       ` Arnd Bergmann
  0 siblings, 1 reply; 35+ messages in thread
From: Arnd Bergmann @ 2018-03-12 13:52 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Andi Kleen, Linux ARM, Linux Kernel Mailing List

On Mon, Mar 12, 2018 at 3:40 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Tue, 20 Feb 2018, Arnd Bergmann wrote:

>> Obviously we want to make the combination work, no idea why it doesn't.
>
> Well, it does work regardless of the noise. Here the nm output is piped
> into sed, and the later exits early when it finds what it is looking
> for, causing nm to complain about the broken pipe.
>
> Here's a patch silencing this bogus error message and fixing other minor
> issues.
>
> ----- >8
> Subject: [PATCH] ARM: deflate_xip_data.sh: minor fixes
>
> Send nm complaints about broken pipe (when sed exits early) to /dev/null.
> All errors should be printed to stderr.
> Don't trap on normal exit so the trap can return an error code.
>
> Signed-off-by: Nicolas Pitre <nico@linaro.org>

Ah, that explains it, thanks!

Tested-by: Arnd Bergmann <arnd@arndb.de>

> diff --git a/arch/arm/boot/deflate_xip_data.sh b/arch/arm/boot/deflate_xip_data.sh
> index 1189598a25..5e7d758ebd 100755
> --- a/arch/arm/boot/deflate_xip_data.sh
> +++ b/arch/arm/boot/deflate_xip_data.sh
> @@ -30,7 +30,7 @@ esac
>
>  sym_val() {
>         # extract hex value for symbol in $1
> -       local val=$($NM "$VMLINUX" | sed -n "/ $1$/{s/ .*$//p;q}")
> +       local val=$($NM "$VMLINUX" 2>/dev/null | sed -n "/ $1\$/{s/ .*$//p;q}")
>         [ "$val" ] || { echo "can't find $1 in $VMLINUX" 1>&2; exit 1; }
>         # convert from hex to decimal
>         echo $((0x$val))
> @@ -48,12 +48,12 @@ data_end=$(($_edata_loc - $base_offset))
>  file_end=$(stat -c "%s" "$XIPIMAGE")
>  if [ "$file_end" != "$data_end" ]; then
>         printf "end of xipImage doesn't match with _edata_loc (%#x vs %#x)\n" \
> -              $(($file_end + $base_offset)) $_edata_loc 2>&1
> +              $(($file_end + $base_offset)) $_edata_loc 1>&2
>         exit 1;
>  fi
>
>  # be ready to clean up
> -trap 'rm -f "$XIPIMAGE.tmp"' 0 1 2 3
> +trap 'rm -f "$XIPIMAGE.tmp"; exit 1' 1 2 3
>
>  # substitute the data section by a compressed version
>  $DD if="$XIPIMAGE" count=$data_start iflag=count_bytes of="$XIPIMAGE.tmp"

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] ARM: disallow combining XIP and LTO
  2018-03-12 13:52     ` Arnd Bergmann
@ 2018-03-12 16:46       ` Arnd Bergmann
  2018-03-12 17:00         ` Nicolas Pitre
  0 siblings, 1 reply; 35+ messages in thread
From: Arnd Bergmann @ 2018-03-12 16:46 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Andi Kleen, Linux ARM, Linux Kernel Mailing List

On Mon, Mar 12, 2018 at 2:52 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Mon, Mar 12, 2018 at 3:40 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>> On Tue, 20 Feb 2018, Arnd Bergmann wrote:
>
>>> Obviously we want to make the combination work, no idea why it doesn't.
>>
>> Well, it does work regardless of the noise. Here the nm output is piped
>> into sed, and the later exits early when it finds what it is looking
>> for, causing nm to complain about the broken pipe.
>>
>> Here's a patch silencing this bogus error message and fixing other minor
>> issues.
>>
>> ----- >8
>> Subject: [PATCH] ARM: deflate_xip_data.sh: minor fixes
>>
>> Send nm complaints about broken pipe (when sed exits early) to /dev/null.
>> All errors should be printed to stderr.
>> Don't trap on normal exit so the trap can return an error code.
>>
>> Signed-off-by: Nicolas Pitre <nico@linaro.org>
>
> Ah, that explains it, thanks!
>
> Tested-by: Arnd Bergmann <arnd@arndb.de>


Nevermind, I confused it with a different problem that I'm running into
with randconfig builds:

arm-linux-gnueabi-nm: 'arch/arm/boot/compressed/../../../../vmlinux':
No such file

I applied your patch, and it did not happen for a couple of randconfigs,
but that is now back. So I did not test the bug you are fixing, and the
thing that I thought you fixed is still there.

        Arnd

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] ARM: disallow combining XIP and LTO
  2018-03-12 16:46       ` Arnd Bergmann
@ 2018-03-12 17:00         ` Nicolas Pitre
  2018-03-12 17:05           ` Nicolas Pitre
  2018-03-12 17:07           ` Arnd Bergmann
  0 siblings, 2 replies; 35+ messages in thread
From: Nicolas Pitre @ 2018-03-12 17:00 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Andi Kleen, Linux ARM, Linux Kernel Mailing List

On Mon, 12 Mar 2018, Arnd Bergmann wrote:

> On Mon, Mar 12, 2018 at 2:52 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Mon, Mar 12, 2018 at 3:40 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> >> On Tue, 20 Feb 2018, Arnd Bergmann wrote:
> >
> >>> Obviously we want to make the combination work, no idea why it doesn't.
> >>
> >> Well, it does work regardless of the noise. Here the nm output is piped
> >> into sed, and the later exits early when it finds what it is looking
> >> for, causing nm to complain about the broken pipe.
> >>
> >> Here's a patch silencing this bogus error message and fixing other minor
> >> issues.
> >>
> >> ----- >8
> >> Subject: [PATCH] ARM: deflate_xip_data.sh: minor fixes
> >>
> >> Send nm complaints about broken pipe (when sed exits early) to /dev/null.
> >> All errors should be printed to stderr.
> >> Don't trap on normal exit so the trap can return an error code.
> >>
> >> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> >
> > Ah, that explains it, thanks!
> >
> > Tested-by: Arnd Bergmann <arnd@arndb.de>
> 
> 
> Nevermind, I confused it with a different problem that I'm running into
> with randconfig builds:

At least you no longer get the "broken pipe" warning as your initial 
report showed, right?

> arm-linux-gnueabi-nm: 'arch/arm/boot/compressed/../../../../vmlinux':
> No such file

That is weird. The Makefile has:

cmd_mkxip = $(cmd_objcopy) && $(cmd_deflate_xip_data)

$(obj)/xipImage: vmlinux FORCE
        $(call if_changed,mkxip)

So the objcopy must succeed at producing vmlinux for deflate_xip_data 
(where nm is used) to be called.

Do you have a .config for this issue?


Nicolas

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] ARM: disallow combining XIP and LTO
  2018-03-12 17:00         ` Nicolas Pitre
@ 2018-03-12 17:05           ` Nicolas Pitre
  2018-03-12 17:07           ` Arnd Bergmann
  1 sibling, 0 replies; 35+ messages in thread
From: Nicolas Pitre @ 2018-03-12 17:05 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Andi Kleen, Linux ARM, Linux Kernel Mailing List

On Mon, 12 Mar 2018, Nicolas Pitre wrote:

> On Mon, 12 Mar 2018, Arnd Bergmann wrote:
> 
> > arm-linux-gnueabi-nm: 'arch/arm/boot/compressed/../../../../vmlinux':
> > No such file
> 
> That is weird. The Makefile has:
> 
> cmd_mkxip = $(cmd_objcopy) && $(cmd_deflate_xip_data)
> 
> $(obj)/xipImage: vmlinux FORCE
>         $(call if_changed,mkxip)
> 
> So the objcopy must succeed at producing vmlinux for deflate_xip_data 
> (where nm is used) to be called.

Correction: vmlinux is a dependency not a product, so it must be there 
for this rule to be executed, and it is input to both objcopy and nm.


Nicolas

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] ARM: disallow combining XIP and LTO
  2018-03-12 17:00         ` Nicolas Pitre
  2018-03-12 17:05           ` Nicolas Pitre
@ 2018-03-12 17:07           ` Arnd Bergmann
  1 sibling, 0 replies; 35+ messages in thread
From: Arnd Bergmann @ 2018-03-12 17:07 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Andi Kleen, Linux ARM, Linux Kernel Mailing List

On Mon, Mar 12, 2018 at 6:00 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Mon, 12 Mar 2018, Arnd Bergmann wrote:
>
>> On Mon, Mar 12, 2018 at 2:52 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > On Mon, Mar 12, 2018 at 3:40 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>> >> On Tue, 20 Feb 2018, Arnd Bergmann wrote:
>> >
>> >>> Obviously we want to make the combination work, no idea why it doesn't.
>> >>
>> >> Well, it does work regardless of the noise. Here the nm output is piped
>> >> into sed, and the later exits early when it finds what it is looking
>> >> for, causing nm to complain about the broken pipe.
>> >>
>> >> Here's a patch silencing this bogus error message and fixing other minor
>> >> issues.
>> >>
>> >> ----- >8
>> >> Subject: [PATCH] ARM: deflate_xip_data.sh: minor fixes
>> >>
>> >> Send nm complaints about broken pipe (when sed exits early) to /dev/null.
>> >> All errors should be printed to stderr.
>> >> Don't trap on normal exit so the trap can return an error code.
>> >>
>> >> Signed-off-by: Nicolas Pitre <nico@linaro.org>
>> >
>> > Ah, that explains it, thanks!
>> >
>> > Tested-by: Arnd Bergmann <arnd@arndb.de>
>>
>>
>> Nevermind, I confused it with a different problem that I'm running into
>> with randconfig builds:
>
> At least you no longer get the "broken pipe" warning as your initial
> report showed, right?

I don't know, my recent tests are all without the LTO patch series,
so I don't get that one anyway.

>> arm-linux-gnueabi-nm: 'arch/arm/boot/compressed/../../../../vmlinux':
>> No such file
>
> That is weird. The Makefile has:
>
> cmd_mkxip = $(cmd_objcopy) && $(cmd_deflate_xip_data)
>
> $(obj)/xipImage: vmlinux FORCE
>         $(call if_changed,mkxip)
>
> So the objcopy must succeed at producing vmlinux for deflate_xip_data
> (where nm is used) to be called.
>
> Do you have a .config for this issue?

I only get it while doing randconfig builds with my scripts, maybe
one out of three times, but don't ever get when rebuilding the
configs later. I have never figured out how to reproduce it reliably
other than with my randconfig script:

#!/bin/bash

RAND_DIR=build/rand
LOG_DIR=rand
PARALLEL="-j30" # number of CPUs on build system
MAKE="make O=${RAND_DIR} -sk ${PARALLEL}"
export CCACHE_DISABLE=1 # no point in ccache for random builds

buildone()
{
        mkdir -p ${RAND_DIR} ${LOG_DIR}

        eval `${MAKE} randconfig 2>&1 | grep KCONFIG_SEED=`
        export KCONFIG_SEED
        ${MAKE} allrandom.config > /dev/null

        rm -rf ${RAND_DIR}/arch/arm64/kernel/vdso
        ID=${KCONFIG_SEED}
        cp ${RAND_DIR}/.config ${LOG_DIR}/$ID-config

        if ${MAKE} > ${LOG_DIR}/$ID-output 2>&1 ; then
                mv ${LOG_DIR}/$ID-output ${LOG_DIR}/$ID-success
                echo $ID `date` success
        else
                mv ${LOG_DIR}/$ID-output ${LOG_DIR}/$ID-failure
                echo $ID `date` failed
                ${MAKE}
        fi

        ${MAKE} clean
}

buildone



      Arnd

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/7] ARM: hacks for link-time optimization
  2018-02-20 21:59 [PATCH 0/7] ARM: hacks for link-time optimization Arnd Bergmann
                   ` (6 preceding siblings ...)
  2018-02-20 21:59 ` [PATCH 7/7] efi: disable LTO for EFI stub Arnd Bergmann
@ 2018-12-17 22:50 ` Peter Zijlstra
  2018-12-18  0:08   ` Andi Kleen
  7 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2018-12-17 22:50 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nicolas Pitre, Andi Kleen, linux-arm-kernel, linux-kernel,
	Paul McKenney, Will Deacon, Ingo Molnar, Thomas Gleixner

On Tue, Feb 20, 2018 at 10:59:47PM +0100, Arnd Bergmann wrote:
> Hi Nico, all,
> 
> I was playing with ARM link-time optimization handling earlier this
> month, and eventually got it to build cleanly with randconfig kernels,
> but ended up with a lot of ugly hacks to actually pull it off.

How are we dealing with the fact that LTO can break RCU in very subtle
and scary ways?

Do we have a compiler guy on board that has given us a compiler switch
that kills that optimization (and thereby guarantees that behaviour for
future compilers etc..) ?

Also see the thread here:

  https://lkml.kernel.org/r/20171116115810.GH9361@arm.com

(and yes, this is a fine example of how lore sucks for reading email)



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/7] ARM: hacks for link-time optimization
  2018-12-17 22:50 ` [PATCH 0/7] ARM: hacks for link-time optimization Peter Zijlstra
@ 2018-12-18  0:08   ` Andi Kleen
  2018-12-18  9:18     ` Peter Zijlstra
  0 siblings, 1 reply; 35+ messages in thread
From: Andi Kleen @ 2018-12-18  0:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnd Bergmann, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	Paul McKenney, Will Deacon, Ingo Molnar, Thomas Gleixner

On Mon, Dec 17, 2018 at 11:50:20PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 20, 2018 at 10:59:47PM +0100, Arnd Bergmann wrote:
> > Hi Nico, all,
> > 
> > I was playing with ARM link-time optimization handling earlier this
> > month, and eventually got it to build cleanly with randconfig kernels,
> > but ended up with a lot of ugly hacks to actually pull it off.
> 
> How are we dealing with the fact that LTO can break RCU in very subtle
> and scary ways?
> 
> Do we have a compiler guy on board that has given us a compiler switch
> that kills that optimization (and thereby guarantees that behaviour for
> future compilers etc..) ?

Can you actually define what optimization you are worred about?

If there are optimizations that cause problems they likely happen
even without LTO inside files. The only difference with LTO is that it
does them between files too.

-Andi

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/7] ARM: hacks for link-time optimization
  2018-12-18  0:08   ` Andi Kleen
@ 2018-12-18  9:18     ` Peter Zijlstra
  2018-12-18 10:00       ` Peter Zijlstra
  2018-12-21 17:20       ` Andi Kleen
  0 siblings, 2 replies; 35+ messages in thread
From: Peter Zijlstra @ 2018-12-18  9:18 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arnd Bergmann, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	Paul McKenney, Will Deacon, Ingo Molnar, Thomas Gleixner

On Mon, Dec 17, 2018 at 04:08:00PM -0800, Andi Kleen wrote:
> On Mon, Dec 17, 2018 at 11:50:20PM +0100, Peter Zijlstra wrote:
> > On Tue, Feb 20, 2018 at 10:59:47PM +0100, Arnd Bergmann wrote:
> > > Hi Nico, all,
> > > 
> > > I was playing with ARM link-time optimization handling earlier this
> > > month, and eventually got it to build cleanly with randconfig kernels,
> > > but ended up with a lot of ugly hacks to actually pull it off.
> > 
> > How are we dealing with the fact that LTO can break RCU in very subtle
> > and scary ways?
> > 
> > Do we have a compiler guy on board that has given us a compiler switch
> > that kills that optimization (and thereby guarantees that behaviour for
> > future compilers etc..) ?
> 
> Can you actually define what optimization you are worred about?
> 
> If there are optimizations that cause problems they likely happen
> even without LTO inside files. The only difference with LTO is that it
> does them between files too.

In particular turning an address-dependency into a control-dependency,
which is something allowed by the C language, since it doesn't recognise
these concepts as such.

The 'optimization' is allowed currently, but LTO will make it much more
likely since it will have a much wider view of things. Esp. when combined
with PGO.

Specifically; if you have something like:

int idx;
struct object objs[2];

the statement:

  val = objs[idx & 1].ponies;

which you 'need' to be translated like:

  struct object *obj = objs;
  obj += (idx & 1);
  val = obj->ponies;

Such that the load of obj->ponies depends on the load of idx. However
our dear compiler is allowed to make it:

  if (idx & 1)
    obj = &objs[1];
  else
    obj = &objs[0];

  val = obj->ponies;

Because C doesn't recognise this as being different. However this is
utterly broken, because in this translation we can speculate the load
of obj->ponies such that it no longer depends on the load of idx, which
breaks RCU.

Note that further 'optimization' is possible and the compiler could even
make it:

  if (idx & 1)
    val = objs[1].ponies;
  else
    val = objs[0].ponies;

Now, granted, this is a fairly artificial example, but it does
illustrate the exact problem.

The more the compiler can see of the complete program, the more likely
it can make inferrences like this, esp. when coupled with PGO.

Now, we're (usually) very careful to wrap things in READ_ONCE() and
rcu_dereference() and the like, which makes it harder on the compiler
(because 'volatile' is special), but nothing really stops it from doing
this.

Paul has been trying to beat clue into the language people, but given
he's been at it for 10 years now, and there's no resolution, I figure we
ought to get compiler implementations to give us a knob.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/7] ARM: hacks for link-time optimization
  2018-12-18  9:18     ` Peter Zijlstra
@ 2018-12-18 10:00       ` Peter Zijlstra
  2018-12-21 14:23         ` Paul E. McKenney
  2018-12-21 17:20       ` Andi Kleen
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2018-12-18 10:00 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arnd Bergmann, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	Paul McKenney, Will Deacon, Ingo Molnar, Thomas Gleixner

On Tue, Dec 18, 2018 at 10:18:24AM +0100, Peter Zijlstra wrote:
> In particular turning an address-dependency into a control-dependency,
> which is something allowed by the C language, since it doesn't recognise
> these concepts as such.
> 
> The 'optimization' is allowed currently, but LTO will make it much more
> likely since it will have a much wider view of things. Esp. when combined
> with PGO.
> 
> Specifically; if you have something like:
> 
> int idx;
> struct object objs[2];
> 
> the statement:
> 
>   val = objs[idx & 1].ponies;
> 
> which you 'need' to be translated like:
> 
>   struct object *obj = objs;
>   obj += (idx & 1);
>   val = obj->ponies;
> 
> Such that the load of obj->ponies depends on the load of idx. However
> our dear compiler is allowed to make it:
> 
>   if (idx & 1)
>     obj = &objs[1];
>   else
>     obj = &objs[0];
> 
>   val = obj->ponies;
> 
> Because C doesn't recognise this as being different. However this is
> utterly broken, because in this translation we can speculate the load
> of obj->ponies such that it no longer depends on the load of idx, which
> breaks RCU.
> 
> Note that further 'optimization' is possible and the compiler could even
> make it:
> 
>   if (idx & 1)
>     val = objs[1].ponies;
>   else
>     val = objs[0].ponies;

A variant that is actually broken on x86 too (due to issuing the loads
in the 'wrong' order):

  val = objs[0].ponies;
  if (idx & 1)
    val = objs[1].ponies;

Which is a translation that makes sense if we either marked
unlikely(idx & 1) or if PGO found the same.

> Now, granted, this is a fairly artificial example, but it does
> illustrate the exact problem.
> 
> The more the compiler can see of the complete program, the more likely
> it can make inferrences like this, esp. when coupled with PGO.
> 
> Now, we're (usually) very careful to wrap things in READ_ONCE() and
> rcu_dereference() and the like, which makes it harder on the compiler
> (because 'volatile' is special), but nothing really stops it from doing
> this.
> 
> Paul has been trying to beat clue into the language people, but given
> he's been at it for 10 years now, and there's no resolution, I figure we
> ought to get compiler implementations to give us a knob.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/7] ARM: hacks for link-time optimization
  2018-12-18 10:00       ` Peter Zijlstra
@ 2018-12-21 14:23         ` Paul E. McKenney
  0 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2018-12-21 14:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Arnd Bergmann, Nicolas Pitre, linux-arm-kernel,
	linux-kernel, Will Deacon, Ingo Molnar, Thomas Gleixner

On Tue, Dec 18, 2018 at 11:00:14AM +0100, Peter Zijlstra wrote:
> On Tue, Dec 18, 2018 at 10:18:24AM +0100, Peter Zijlstra wrote:
> > In particular turning an address-dependency into a control-dependency,
> > which is something allowed by the C language, since it doesn't recognise
> > these concepts as such.
> > 
> > The 'optimization' is allowed currently, but LTO will make it much more
> > likely since it will have a much wider view of things. Esp. when combined
> > with PGO.
> > 
> > Specifically; if you have something like:
> > 
> > int idx;
> > struct object objs[2];
> > 
> > the statement:
> > 
> >   val = objs[idx & 1].ponies;
> > 
> > which you 'need' to be translated like:
> > 
> >   struct object *obj = objs;
> >   obj += (idx & 1);
> >   val = obj->ponies;
> > 
> > Such that the load of obj->ponies depends on the load of idx. However
> > our dear compiler is allowed to make it:
> > 
> >   if (idx & 1)
> >     obj = &objs[1];
> >   else
> >     obj = &objs[0];
> > 
> >   val = obj->ponies;
> > 
> > Because C doesn't recognise this as being different. However this is
> > utterly broken, because in this translation we can speculate the load
> > of obj->ponies such that it no longer depends on the load of idx, which
> > breaks RCU.

Hence the following in Documentation/RCU/rcu_dereference.txt:

	You are only permitted to use rcu_dereference on pointer values.
	The compiler simply knows too much about integral values to
	trust it to carry dependencies through integer operations.

I got rid of the carrying of dependencies via non-pointers in 2014.
You are telling me that they have crept back?  Sigh!!!  :-/

							Thanx, Paul

> > Note that further 'optimization' is possible and the compiler could even
> > make it:
> > 
> >   if (idx & 1)
> >     val = objs[1].ponies;
> >   else
> >     val = objs[0].ponies;
> 
> A variant that is actually broken on x86 too (due to issuing the loads
> in the 'wrong' order):
> 
>   val = objs[0].ponies;
>   if (idx & 1)
>     val = objs[1].ponies;
> 
> Which is a translation that makes sense if we either marked
> unlikely(idx & 1) or if PGO found the same.
> 
> > Now, granted, this is a fairly artificial example, but it does
> > illustrate the exact problem.
> > 
> > The more the compiler can see of the complete program, the more likely
> > it can make inferrences like this, esp. when coupled with PGO.
> > 
> > Now, we're (usually) very careful to wrap things in READ_ONCE() and
> > rcu_dereference() and the like, which makes it harder on the compiler
> > (because 'volatile' is special), but nothing really stops it from doing
> > this.
> > 
> > Paul has been trying to beat clue into the language people, but given
> > he's been at it for 10 years now, and there's no resolution, I figure we
> > ought to get compiler implementations to give us a knob.
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/7] ARM: hacks for link-time optimization
  2018-12-18  9:18     ` Peter Zijlstra
  2018-12-18 10:00       ` Peter Zijlstra
@ 2018-12-21 17:20       ` Andi Kleen
  2018-12-21 18:00         ` Paul E. McKenney
  1 sibling, 1 reply; 35+ messages in thread
From: Andi Kleen @ 2018-12-21 17:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnd Bergmann, Nicolas Pitre, linux-arm-kernel, linux-kernel,
	Paul McKenney, Will Deacon, Ingo Molnar, Thomas Gleixner,
	hubicka

> In particular turning an address-dependency into a control-dependency,
> which is something allowed by the C language, since it doesn't recognise
> these concepts as such.
> 
> The 'optimization' is allowed currently, but LTO will make it much more
> likely since it will have a much wider view of things. Esp. when combined
> with PGO.
> 
> Specifically; if you have something like:
> 
> int idx;
> struct object objs[2];
> 
> the statement:
> 
>   val = objs[idx & 1].ponies;
> 
> which you 'need' to be translated like:
> 
>   struct object *obj = objs;
>   obj += (idx & 1);
>   val = obj->ponies;
> 
> Such that the load of obj->ponies depends on the load of idx. However
> our dear compiler is allowed to make it:
> 
>   if (idx & 1)
>     obj = &objs[1];
>   else
>     obj = &objs[0];
> 
>   val = obj->ponies;

I don't see why a compiler would do such an optimization. Clearly
the second variant is worse than the first, bigger and needs
branch prediction resources.

In fact compilers usually try hard to go into the other direction
and apply if conversion.

Has anyone seen real world examples of such changes being done, or is this
all language lawyering theory?

-Andi

> 
> Because C doesn't recognise this as being different. However this is
> utterly broken, because in this translation we can speculate the load
> of obj->ponies such that it no longer depends on the load of idx, which
> breaks RCU.
> 
> Note that further 'optimization' is possible and the compiler could even
> make it:
> 
>   if (idx & 1)
>     val = objs[1].ponies;
>   else
>     val = objs[0].ponies;
> 
> Now, granted, this is a fairly artificial example, but it does
> illustrate the exact problem.
> 
> The more the compiler can see of the complete program, the more likely
> it can make inferrences like this, esp. when coupled with PGO.
> 
> Now, we're (usually) very careful to wrap things in READ_ONCE() and
> rcu_dereference() and the like, which makes it harder on the compiler
> (because 'volatile' is special), but nothing really stops it from doing
> this.
> 
> Paul has been trying to beat clue into the language people, but given
> he's been at it for 10 years now, and there's no resolution, I figure we
> ought to get compiler implementations to give us a knob.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/7] ARM: hacks for link-time optimization
  2018-12-21 17:20       ` Andi Kleen
@ 2018-12-21 18:00         ` Paul E. McKenney
  0 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2018-12-21 18:00 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Arnd Bergmann, Nicolas Pitre, linux-arm-kernel,
	linux-kernel, Will Deacon, Ingo Molnar, Thomas Gleixner, hubicka

On Fri, Dec 21, 2018 at 09:20:44AM -0800, Andi Kleen wrote:
> > In particular turning an address-dependency into a control-dependency,
> > which is something allowed by the C language, since it doesn't recognise
> > these concepts as such.
> > 
> > The 'optimization' is allowed currently, but LTO will make it much more
> > likely since it will have a much wider view of things. Esp. when combined
> > with PGO.
> > 
> > Specifically; if you have something like:
> > 
> > int idx;
> > struct object objs[2];
> > 
> > the statement:
> > 
> >   val = objs[idx & 1].ponies;
> > 
> > which you 'need' to be translated like:
> > 
> >   struct object *obj = objs;
> >   obj += (idx & 1);
> >   val = obj->ponies;
> > 
> > Such that the load of obj->ponies depends on the load of idx. However
> > our dear compiler is allowed to make it:
> > 
> >   if (idx & 1)
> >     obj = &objs[1];
> >   else
> >     obj = &objs[0];
> > 
> >   val = obj->ponies;
> 
> I don't see why a compiler would do such an optimization. Clearly
> the second variant is worse than the first, bigger and needs
> branch prediction resources.
> 
> In fact compilers usually try hard to go into the other direction
> and apply if conversion.
> 
> Has anyone seen real world examples of such changes being done, or is this
> all language lawyering theory?

I have not seen it myself, but I have heard others claim to.  For example,
if "idx & 1" had to be computed for some other reason, especially if there
was a pre-exiting "if" statement with this as its condition.  Or if you
have hardware that has a conditional-move instruction.  And so on.

Do you have a way to guarantee that it never happens?

							Thanx, Paul

> -Andi
> 
> > 
> > Because C doesn't recognise this as being different. However this is
> > utterly broken, because in this translation we can speculate the load
> > of obj->ponies such that it no longer depends on the load of idx, which
> > breaks RCU.
> > 
> > Note that further 'optimization' is possible and the compiler could even
> > make it:
> > 
> >   if (idx & 1)
> >     val = objs[1].ponies;
> >   else
> >     val = objs[0].ponies;
> > 
> > Now, granted, this is a fairly artificial example, but it does
> > illustrate the exact problem.
> > 
> > The more the compiler can see of the complete program, the more likely
> > it can make inferrences like this, esp. when coupled with PGO.
> > 
> > Now, we're (usually) very careful to wrap things in READ_ONCE() and
> > rcu_dereference() and the like, which makes it harder on the compiler
> > (because 'volatile' is special), but nothing really stops it from doing
> > this.
> > 
> > Paul has been trying to beat clue into the language people, but given
> > he's been at it for 10 years now, and there's no resolution, I figure we
> > ought to get compiler implementations to give us a knob.
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2018-12-21 18:00 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-20 21:59 [PATCH 0/7] ARM: hacks for link-time optimization Arnd Bergmann
2018-02-20 21:59 ` [PATCH 1/7] ARM: disallow combining XIP and LTO Arnd Bergmann
2018-02-21  3:01   ` Nicolas Pitre
2018-02-21 11:50     ` Arnd Bergmann
2018-02-21 15:13       ` Nicolas Pitre
2018-03-12  2:40   ` Nicolas Pitre
2018-03-12 13:52     ` Arnd Bergmann
2018-03-12 16:46       ` Arnd Bergmann
2018-03-12 17:00         ` Nicolas Pitre
2018-03-12 17:05           ` Nicolas Pitre
2018-03-12 17:07           ` Arnd Bergmann
2018-02-20 21:59 ` [PATCH 2/7] ARM: LTO: avoid THUMB2_KERNEL+LTO Arnd Bergmann
2018-02-21  3:12   ` Nicolas Pitre
2018-02-21 11:48     ` Arnd Bergmann
2018-03-07 18:30   ` Matthias Kaehlcke
2018-03-07 18:52     ` Nicolas Pitre
2018-02-20 21:59 ` [PATCH 3/7] [HACK] pass endianess flag to LTO linker Arnd Bergmann
2018-02-21  3:15   ` Nicolas Pitre
2018-02-21  9:44     ` Arnd Bergmann
2018-02-21  8:37   ` Ard Biesheuvel
2018-02-21  9:48     ` Arnd Bergmann
2018-02-21 10:09       ` Ard Biesheuvel
2018-02-21 13:00         ` Arnd Bergmann
2018-02-20 21:59 ` [PATCH 4/7] ARM: io-acorn: fix LTO linking without CONFIG_PRINTK Arnd Bergmann
2018-02-20 21:59 ` [PATCH 5/7] ARM: fix __inflate_kernel_data stack warning for LTO Arnd Bergmann
2018-02-21  3:26   ` Nicolas Pitre
2018-02-20 21:59 ` [PATCH 6/7] ARM: mark assembler-referenced symbols as __visible Arnd Bergmann
2018-02-20 21:59 ` [PATCH 7/7] efi: disable LTO for EFI stub Arnd Bergmann
2018-12-17 22:50 ` [PATCH 0/7] ARM: hacks for link-time optimization Peter Zijlstra
2018-12-18  0:08   ` Andi Kleen
2018-12-18  9:18     ` Peter Zijlstra
2018-12-18 10:00       ` Peter Zijlstra
2018-12-21 14:23         ` Paul E. McKenney
2018-12-21 17:20       ` Andi Kleen
2018-12-21 18:00         ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).