linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/18] Add support for clang LTO
@ 2017-11-15 21:34 Sami Tolvanen
  2017-11-15 21:34 ` [PATCH v2 01/18] kbuild: add ld-name macro and support for GNU gold Sami Tolvanen
                   ` (19 more replies)
  0 siblings, 20 replies; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

This series adds build system support for compiling the kernel with clang
Link Time Optimization (LTO), using GNU gold with the LLVMgold plug-in
for linking. Some background for clang's LTO support is available here:

  https://llvm.org/docs/LinkTimeOptimization.html

With -flto, clang produces LLVM bitcode instead of object files, and
the compilation to native code happens at link time. In addition, clang
cannot use an external assembler for inline assembly when LTO is enabled,
which causes further compatibility issues.

The patches in this series remove intermediate linking steps when LTO is
used, postpone processing done on object files until after the LTO link
step, add workarounds for GNU gold incompatibilities, and address inline
assembly incompatibilities for arm64.

These changes allow arm64 defconfig to be compiled with LTO, but other
architectures are not enabled until compatibility issues have been
addressed. In particular, x86 inline assembly doesn't currently compile
with clang's integrated assembler due to this LLVM bug:

  https://bugs.llvm.org/show_bug.cgi?id=24487

Due to recent bug fixes in the toolchain, it's recommended to use clang
5.0 or later, and GNU gold from binutils 2.27 or later, although older
versions may also work depending on your kernel configuration. Here are
the steps for compiling arm64 defconfig with LTO, assuming LLVMgold.so is
in LD_LIBRARY_PATH and the rest of the toolchain is in PATH:

  $ make ARCH=arm64 defconfig
  $ ./scripts/config -e LTO_CLANG
  $ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- CC=clang

--

Changes in v2:
 - Fixed issues found by the 0-day bot.
 - Moved clang and gold fixes that don't depend on LTO earlier in the
   patch set.
 - Added clang support to cc-version.
 - Renamed CLANG_LTO to LTO_CLANG, added CONFIG_LTO for sharing code
   with potential gcc LTO work, and added checks for compiler support
   to prepare-compiler-check.
 - Added linker selection to Makefile, so LD doesn't have to be passed
   in make command line.
 - arm64 specific:
   - Added -fno-jump-tables to arch/arm64/kvm/hyp to prevent clang
     from generating jump tables with EL1 virtual addresses for code
     that runs at EL2.
   - Adopted Alex's version of the mrs_s / msr_s fix, which uses the
     same code path for both gcc and clang.
   - Enabled LLVM bug 30792 workaround only for clang < 6.0.
   - Instead of disabling PLTs, removed NOLOAD from PLT sections in
     modules to work around a bug in gold.
   - Instead of disabling ARM64_ERRATUM_843419 with gold, enabled the
     previously disabled ADR_PREL_PG_HI21* relocations.
   - Instead of omitting -m for gold, pass aarch64_elf64_(be|le)_vec.

--

Alex Matveev (1):
  arm64: make mrs_s and msr_s macros work with LTO

Greg Hackmann (1):
  arm64: use -mno-implicit-float instead of -mgeneral-regs-only

Sami Tolvanen (16):
  kbuild: add ld-name macro and support for GNU gold
  kbuild: fix LD_DEAD_CODE_DATA_ELIMINATION with GNU gold
  kbuild: move gcc-version.sh to cc-version.sh and add clang support
  arm64: fix -m for GNU gold
  arm64: kvm: use -fno-jump-tables with clang
  arm64: keep .altinstructions and .altinstr_replacement
  arm64: don't disable ADR_PREL_PG_HI21* with ARM64_ERRATUM_843419
  arm64: explicitly pass --no-fix-cortex-a53-843419 to GNU gold
  arm64: add a workaround for GNU gold with ARM64_MODULE_PLTS
  kbuild: add support for clang LTO
  kbuild: fix dynamic ftrace with clang LTO
  scripts/mod: disable LTO for empty.c
  efi/libstub: disable LTO
  arm64: crypto: disable LTO for aes-ce-cipher.c
  arm64: disable RANDOMIZE_MODULE_REGION_FULL with LTO_CLANG
  arm64: select ARCH_SUPPORTS_LTO_CLANG

 .gitignore                            |   2 +
 Makefile                              |  59 ++++++++++++++++++-
 arch/Kconfig                          |  32 ++++++++++
 arch/arm64/Kconfig                    |   3 +-
 arch/arm64/Makefile                   |  21 ++++++-
 arch/arm64/crypto/Makefile            |   2 +-
 arch/arm64/include/asm/kvm_hyp.h      |   8 ++-
 arch/arm64/include/asm/sysreg.h       |  55 +++++++++++------
 arch/arm64/kernel/module.c            |   2 -
 arch/arm64/kernel/module.lds          |   4 +-
 arch/arm64/kernel/vmlinux.lds.S       |   4 +-
 arch/arm64/kvm/hyp/Makefile           |   4 ++
 drivers/firmware/efi/libstub/Makefile |   3 +-
 include/asm-generic/vmlinux.lds.h     |   9 +--
 include/linux/compiler-clang.h        |   7 +++
 include/linux/compiler_types.h        |   4 ++
 kernel/trace/ftrace.c                 |   6 +-
 scripts/Kbuild.include                |   8 ++-
 scripts/Makefile.build                |  80 ++++++++++++++++++++++++-
 scripts/Makefile.modpost              |  67 ++++++++++++++++++---
 scripts/cc-version.sh                 |  45 ++++++++++++++
 scripts/gcc-version.sh                |  33 -----------
 scripts/link-vmlinux.sh               | 108 ++++++++++++++++++++++++++++++----
 scripts/mod/Makefile                  |   1 +
 scripts/recordmcount.c                |   3 +-
 25 files changed, 475 insertions(+), 95 deletions(-)
 create mode 100755 scripts/cc-version.sh
 delete mode 100755 scripts/gcc-version.sh

-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply	[flat|nested] 103+ messages in thread

* [PATCH v2 01/18] kbuild: add ld-name macro and support for GNU gold
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
@ 2017-11-15 21:34 ` Sami Tolvanen
  2017-11-15 21:53   ` Kees Cook
  2017-11-15 21:34 ` [PATCH v2 02/18] kbuild: fix LD_DEAD_CODE_DATA_ELIMINATION with " Sami Tolvanen
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

GNU gold may require different flags than GNU ld. Add a macro for
detecting the linker and conditionally add gold specific flags from
LDFLAGS_GOLD.

Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 Makefile               | 5 +++++
 scripts/Kbuild.include | 4 ++++
 2 files changed, 9 insertions(+)

diff --git a/Makefile b/Makefile
index 763ab35df12a..f976af9525bf 100644
--- a/Makefile
+++ b/Makefile
@@ -827,6 +827,11 @@ include scripts/Makefile.kasan
 include scripts/Makefile.extrawarn
 include scripts/Makefile.ubsan
 
+# Add any flags specific to ld.gold
+ifeq ($(ld-name),gold)
+LDFLAGS		+= $(LDFLAGS_GOLD)
+endif
+
 # Add any arch overrides and user supplied CPPFLAGS, AFLAGS and CFLAGS as the
 # last assignments
 KBUILD_CPPFLAGS += $(ARCH_CPPFLAGS) $(KCPPFLAGS)
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index 9ffd3dda3889..584d6cecd7c0 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -172,6 +172,10 @@ ld-option = $(call try-run,\
 # Important: no spaces around options
 ar-option = $(call try-run, $(AR) rc$(1) "$$TMP",$(1),$(2))
 
+# ld-name
+# Expands to either bfd or gold
+ld-name = $(shell $(LD) -v 2>&1 | grep -q "GNU gold" && echo gold || echo bfd)
+
 # ld-version
 # Note this is mainly for HJ Lu's 3 number binutil versions
 ld-version = $(shell $(LD) --version | $(srctree)/scripts/ld-version.sh)
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v2 02/18] kbuild: fix LD_DEAD_CODE_DATA_ELIMINATION with GNU gold
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
  2017-11-15 21:34 ` [PATCH v2 01/18] kbuild: add ld-name macro and support for GNU gold Sami Tolvanen
@ 2017-11-15 21:34 ` Sami Tolvanen
  2017-11-15 21:34 ` [PATCH v2 03/18] kbuild: move gcc-version.sh to cc-version.sh and add clang support Sami Tolvanen
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

Don't remove .head.text or .exitcall.exit when linking with --gc-sections,
and include .init.text.* in .init.text and .init.rodata.* in .init.rodata.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 include/asm-generic/vmlinux.lds.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index bdcd1caae092..ce0244780988 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -512,7 +512,7 @@
 		VMLINUX_SYMBOL(__softirqentry_text_end) = .;
 
 /* Section used for early init (in .S files) */
-#define HEAD_TEXT  *(.head.text)
+#define HEAD_TEXT  KEEP(*(.head.text))
 
 #define HEAD_TEXT_SECTION							\
 	.head.text : AT(ADDR(.head.text) - LOAD_OFFSET) {		\
@@ -557,7 +557,7 @@
 	MEM_DISCARD(init.data)						\
 	KERNEL_CTORS()							\
 	MCOUNT_REC()							\
-	*(.init.rodata)							\
+	*(.init.rodata .init.rodata.*)					\
 	FTRACE_EVENTS()							\
 	TRACE_SYSCALLS()						\
 	KPROBE_BLACKLIST()						\
@@ -576,7 +576,7 @@
 	EARLYCON_TABLE()
 
 #define INIT_TEXT							\
-	*(.init.text)							\
+	*(.init.text .init.text.*)					\
 	*(.text.startup)						\
 	MEM_DISCARD(init.text)
 
@@ -593,7 +593,7 @@
 	MEM_DISCARD(exit.text)
 
 #define EXIT_CALL							\
-	*(.exitcall.exit)
+	KEEP(*(.exitcall.exit))
 
 /*
  * bss (Block Started by Symbol) - uninitialized data
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v2 03/18] kbuild: move gcc-version.sh to cc-version.sh and add clang support
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
  2017-11-15 21:34 ` [PATCH v2 01/18] kbuild: add ld-name macro and support for GNU gold Sami Tolvanen
  2017-11-15 21:34 ` [PATCH v2 02/18] kbuild: fix LD_DEAD_CODE_DATA_ELIMINATION with " Sami Tolvanen
@ 2017-11-15 21:34 ` Sami Tolvanen
  2017-11-15 21:48   ` Kees Cook
  2017-11-15 21:34 ` [PATCH v2 04/18] arm64: use -mno-implicit-float instead of -mgeneral-regs-only Sami Tolvanen
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 scripts/Kbuild.include |  4 ++--
 scripts/cc-version.sh  | 45 +++++++++++++++++++++++++++++++++++++++++++++
 scripts/gcc-version.sh | 33 ---------------------------------
 3 files changed, 47 insertions(+), 35 deletions(-)
 create mode 100755 scripts/cc-version.sh
 delete mode 100755 scripts/gcc-version.sh

diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index 584d6cecd7c0..de41ab74121e 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -143,11 +143,11 @@ cc-disable-warning = $(call try-run,\
 cc-name = $(shell $(CC) -v 2>&1 | grep -q "clang version" && echo clang || echo gcc)
 
 # cc-version
-cc-version = $(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-version.sh $(CC))
+cc-version = $(shell $(CONFIG_SHELL) $(srctree)/scripts/cc-version.sh $(CC))
 
 # cc-fullversion
 cc-fullversion = $(shell $(CONFIG_SHELL) \
-	$(srctree)/scripts/gcc-version.sh -p $(CC))
+	$(srctree)/scripts/cc-version.sh -p $(CC))
 
 # cc-ifversion
 # Usage:  EXTRA_CFLAGS += $(call cc-ifversion, -lt, 0402, -O1)
diff --git a/scripts/cc-version.sh b/scripts/cc-version.sh
new file mode 100755
index 000000000000..6d085a8a07e8
--- /dev/null
+++ b/scripts/cc-version.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# cc-version [-p] cc-command
+#
+# Prints the compiler version of `command' in a canonical 4-digit form
+# such as `0295' for gcc-2.95, `0303' for gcc-3.3, etc.
+#
+# With the -p option, prints the patchlevel as well, for example `029503' for
+# gcc-2.95.3, `030301' for gcc-3.3.1, etc.
+#
+
+if [ "$1" = "-p" ] ; then
+	with_patchlevel=1;
+	shift;
+fi
+
+compiler="$*"
+
+if [ ${#compiler} -eq 0 ]; then
+	echo "Error: No compiler specified."
+	printf "Usage:\n\t$0 <cc-command>\n"
+	exit 1
+fi
+
+clang=$(echo __clang__ | $compiler -E -x c - | tail -n1)
+
+if [ "$clang" == "1" ]; then
+	major_macro="__clang_major__"
+	minor_macro="__clang_minor__"
+	patchlevel_macro="__clang_patchlevel__"
+else
+	major_macro="__GNUC__"
+	minor_macro="__GNUC_MINOR__"
+	patchlevel_macro="__GNUC_PATCHLEVEL__"
+fi
+
+MAJOR=$(echo $major_macro | $compiler -E -x c - | tail -n 1)
+MINOR=$(echo $minor_macro | $compiler -E -x c - | tail -n 1)
+if [ "x$with_patchlevel" != "x" ] ; then
+	PATCHLEVEL=$(echo $patchlevel_macro | $compiler -E -x c - | tail -n 1)
+	printf "%02d%02d%02d\\n" $MAJOR $MINOR $PATCHLEVEL
+else
+	printf "%02d%02d\\n" $MAJOR $MINOR
+fi
diff --git a/scripts/gcc-version.sh b/scripts/gcc-version.sh
deleted file mode 100755
index 11bb909845e7..000000000000
--- a/scripts/gcc-version.sh
+++ /dev/null
@@ -1,33 +0,0 @@
-#!/bin/sh
-# SPDX-License-Identifier: GPL-2.0
-#
-# gcc-version [-p] gcc-command
-#
-# Prints the gcc version of `gcc-command' in a canonical 4-digit form
-# such as `0295' for gcc-2.95, `0303' for gcc-3.3, etc.
-#
-# With the -p option, prints the patchlevel as well, for example `029503' for
-# gcc-2.95.3, `030301' for gcc-3.3.1, etc.
-#
-
-if [ "$1" = "-p" ] ; then
-	with_patchlevel=1;
-	shift;
-fi
-
-compiler="$*"
-
-if [ ${#compiler} -eq 0 ]; then
-	echo "Error: No compiler specified."
-	printf "Usage:\n\t$0 <gcc-command>\n"
-	exit 1
-fi
-
-MAJOR=$(echo __GNUC__ | $compiler -E -x c - | tail -n 1)
-MINOR=$(echo __GNUC_MINOR__ | $compiler -E -x c - | tail -n 1)
-if [ "x$with_patchlevel" != "x" ] ; then
-	PATCHLEVEL=$(echo __GNUC_PATCHLEVEL__ | $compiler -E -x c - | tail -n 1)
-	printf "%02d%02d%02d\\n" $MAJOR $MINOR $PATCHLEVEL
-else
-	printf "%02d%02d\\n" $MAJOR $MINOR
-fi
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v2 04/18] arm64: use -mno-implicit-float instead of -mgeneral-regs-only
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
                   ` (2 preceding siblings ...)
  2017-11-15 21:34 ` [PATCH v2 03/18] kbuild: move gcc-version.sh to cc-version.sh and add clang support Sami Tolvanen
@ 2017-11-15 21:34 ` Sami Tolvanen
  2017-11-15 21:34 ` [PATCH v2 05/18] arm64: fix -m for GNU gold Sami Tolvanen
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

From: Greg Hackmann <ghackmann@google.com>

LLVM bug 30792 causes clang's AArch64 backend to crash compiling
arch/arm64/crypto/aes-ce-cipher.c.  Replacing -mgeneral-regs-only with
-mno-implicit-float is the suggested workaround.

Signed-off-by: Greg Hackmann <ghackmann@google.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
[added cc-ifversion to enable the workaround only for clang <6.0]
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 arch/arm64/Makefile | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index b35788c909f1..ecd5ed11c764 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -49,7 +49,13 @@ $(warning Detected assembler with broken .inst; disassembly will be unreliable)
   endif
 endif
 
-KBUILD_CFLAGS	+= -mgeneral-regs-only $(lseinstr) $(brokengasinst)
+ifeq ($(cc-name),clang)
+# This is a workaround for https://bugs.llvm.org/show_bug.cgi?id=30792.
+KBUILD_CFLAGS	+= $(call cc-ifversion, -lt, 0600, -mno-implicit-float, -mgeneral-regs-only)
+else
+KBUILD_CFLAGS	+= -mgeneral-regs-only
+endif
+KBUILD_CFLAGS	+= $(lseinstr) $(brokengasinst)
 KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
 KBUILD_CFLAGS	+= $(call cc-option, -mpc-relative-literal-loads)
 KBUILD_AFLAGS	+= $(lseinstr) $(brokengasinst)
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v2 05/18] arm64: fix -m for GNU gold
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
                   ` (3 preceding siblings ...)
  2017-11-15 21:34 ` [PATCH v2 04/18] arm64: use -mno-implicit-float instead of -mgeneral-regs-only Sami Tolvanen
@ 2017-11-15 21:34 ` Sami Tolvanen
  2017-11-16 10:55   ` Yury Norov
  2017-11-15 21:34 ` [PATCH v2 06/18] arm64: kvm: use -fno-jump-tables with clang Sami Tolvanen
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

GNU gold supports different emulations than bfd. Use aarch64_elf64_*_vec
instead of aarch64linux.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 arch/arm64/Makefile | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index ecd5ed11c764..6059c8169513 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -70,14 +70,22 @@ KBUILD_CPPFLAGS	+= -mbig-endian
 CHECKFLAGS	+= -D__AARCH64EB__
 AS		+= -EB
 LD		+= -EB
+ifeq ($(ld-name),gold)
+LDFLAGS		+= -maarch64_elf64_be_vec
+else
 LDFLAGS		+= -maarch64linuxb
+endif
 UTS_MACHINE	:= aarch64_be
 else
 KBUILD_CPPFLAGS	+= -mlittle-endian
 CHECKFLAGS	+= -D__AARCH64EL__
 AS		+= -EL
 LD		+= -EL
+ifeq ($(ld-name),gold)
+LDFLAGS		+= -maarch64_elf64_le_vec
+else
 LDFLAGS		+= -maarch64linux
+endif
 UTS_MACHINE	:= aarch64
 endif
 
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v2 06/18] arm64: kvm: use -fno-jump-tables with clang
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
                   ` (4 preceding siblings ...)
  2017-11-15 21:34 ` [PATCH v2 05/18] arm64: fix -m for GNU gold Sami Tolvanen
@ 2017-11-15 21:34 ` Sami Tolvanen
  2017-11-16 11:46   ` Will Deacon
  2017-11-20 14:41   ` Mark Rutland
  2017-11-15 21:34 ` [PATCH v2 07/18] arm64: keep .altinstructions and .altinstr_replacement Sami Tolvanen
                   ` (13 subsequent siblings)
  19 siblings, 2 replies; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

Use -fno-jump-tables to make sure clang doesn't generate branches
to EL1 virtual addresses.

Suggested-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 arch/arm64/kvm/hyp/Makefile | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index f04400d494b7..19fa1c6b6b69 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -5,6 +5,10 @@
 
 ccflags-y += -fno-stack-protector -DDISABLE_BRANCH_PROFILING
 
+ifeq ($(cc-name),clang)
+ccflags-y += -fno-jump-tables
+endif
+
 KVM=../../../../virt/kvm
 
 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v2-sr.o
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v2 07/18] arm64: keep .altinstructions and .altinstr_replacement
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
                   ` (5 preceding siblings ...)
  2017-11-15 21:34 ` [PATCH v2 06/18] arm64: kvm: use -fno-jump-tables with clang Sami Tolvanen
@ 2017-11-15 21:34 ` Sami Tolvanen
  2017-11-15 21:34 ` [PATCH v2 08/18] arm64: don't disable ADR_PREL_PG_HI21* with ARM64_ERRATUM_843419 Sami Tolvanen
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

Make sure the linker doesn't remove .altinstructions or
.altinstr_replacement when CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is
enabled.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 arch/arm64/kernel/vmlinux.lds.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 7da3e5c366a0..15479995869c 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -138,11 +138,11 @@ SECTIONS
 	. = ALIGN(4);
 	.altinstructions : {
 		__alt_instructions = .;
-		*(.altinstructions)
+		KEEP(*(.altinstructions))
 		__alt_instructions_end = .;
 	}
 	.altinstr_replacement : {
-		*(.altinstr_replacement)
+		KEEP(*(.altinstr_replacement))
 	}
 
 	. = ALIGN(PAGE_SIZE);
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v2 08/18] arm64: don't disable ADR_PREL_PG_HI21* with ARM64_ERRATUM_843419
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
                   ` (6 preceding siblings ...)
  2017-11-15 21:34 ` [PATCH v2 07/18] arm64: keep .altinstructions and .altinstr_replacement Sami Tolvanen
@ 2017-11-15 21:34 ` Sami Tolvanen
  2017-11-15 22:29   ` Ard Biesheuvel
  2017-11-15 21:34 ` [PATCH v2 09/18] arm64: explicitly pass --no-fix-cortex-a53-843419 to GNU gold Sami Tolvanen
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

CONFIG_LTO_CLANG depends on GNU gold, which can generate ADR_PREL_PG_HI21*
relocations even with --fix-cortex-a53-843419.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 arch/arm64/kernel/module.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index f469e0435903..fec9a578f122 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -336,14 +336,12 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 			ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 0, 21,
 					     AARCH64_INSN_IMM_ADR);
 			break;
-#ifndef CONFIG_ARM64_ERRATUM_843419
 		case R_AARCH64_ADR_PREL_PG_HI21_NC:
 			overflow_check = false;
 		case R_AARCH64_ADR_PREL_PG_HI21:
 			ovf = reloc_insn_imm(RELOC_OP_PAGE, loc, val, 12, 21,
 					     AARCH64_INSN_IMM_ADR);
 			break;
-#endif
 		case R_AARCH64_ADD_ABS_LO12_NC:
 		case R_AARCH64_LDST8_ABS_LO12_NC:
 			overflow_check = false;
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v2 09/18] arm64: explicitly pass --no-fix-cortex-a53-843419 to GNU gold
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
                   ` (7 preceding siblings ...)
  2017-11-15 21:34 ` [PATCH v2 08/18] arm64: don't disable ADR_PREL_PG_HI21* with ARM64_ERRATUM_843419 Sami Tolvanen
@ 2017-11-15 21:34 ` Sami Tolvanen
  2017-11-16 11:47   ` Will Deacon
  2017-11-15 21:34 ` [PATCH v2 10/18] arm64: add a workaround for GNU gold with ARM64_MODULE_PLTS Sami Tolvanen
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

Some versions of GNU gold are known to produce broken code with
--fix-cortex-a53-843419 as explained in this bug:

  https://sourceware.org/bugzilla/show_bug.cgi?id=21491

If ARM64_ERRATUM_843419 is disabled and we're using GNU gold, pass
--no-fix-cortex-a53-843419 to the linker to ensure the erratum fix is not
used even if the linker is configured to enable it by default.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 arch/arm64/Makefile | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 6059c8169513..ca700b201736 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -30,6 +30,11 @@ LDFLAGS_vmlinux	+= --fix-cortex-a53-843419
   endif
 endif
 
+ifeq ($(CONFIG_ARM64_ERRATUM_843419),)
+# https://sourceware.org/bugzilla/show_bug.cgi?id=21491
+LDFLAGS_GOLD	+= --no-fix-cortex-a53-843419
+endif
+
 KBUILD_DEFCONFIG := defconfig
 
 # Check for binutils support for specific extensions
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v2 10/18] arm64: add a workaround for GNU gold with ARM64_MODULE_PLTS
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
                   ` (8 preceding siblings ...)
  2017-11-15 21:34 ` [PATCH v2 09/18] arm64: explicitly pass --no-fix-cortex-a53-843419 to GNU gold Sami Tolvanen
@ 2017-11-15 21:34 ` Sami Tolvanen
  2017-11-16 11:50   ` Will Deacon
  2017-11-15 21:34 ` [PATCH v2 11/18] arm64: make mrs_s and msr_s macros work with LTO Sami Tolvanen
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

CONFIG_CLANG_LTO depends on GNU gold and due to a known bug, the
linker crashes when ARM64_MODULE_PLTS is enabled:

  https://sourceware.org/bugzilla/show_bug.cgi?id=14592

To work around the problem, this change removes NOLOAD from .plt
and .init.plt, which allows us to link modules with ld.gold.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 arch/arm64/kernel/module.lds | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/module.lds b/arch/arm64/kernel/module.lds
index f7c9781a9d48..eacb5c67f61e 100644
--- a/arch/arm64/kernel/module.lds
+++ b/arch/arm64/kernel/module.lds
@@ -1,4 +1,4 @@
 SECTIONS {
-	.plt (NOLOAD) : { BYTE(0) }
-	.init.plt (NOLOAD) : { BYTE(0) }
+	.plt : { BYTE(0) }
+	.init.plt : { BYTE(0) }
 }
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v2 11/18] arm64: make mrs_s and msr_s macros work with LTO
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
                   ` (9 preceding siblings ...)
  2017-11-15 21:34 ` [PATCH v2 10/18] arm64: add a workaround for GNU gold with ARM64_MODULE_PLTS Sami Tolvanen
@ 2017-11-15 21:34 ` Sami Tolvanen
  2017-11-16 11:54   ` Will Deacon
  2017-11-15 21:34 ` [PATCH v2 12/18] kbuild: add support for clang LTO Sami Tolvanen
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

From: Alex Matveev <alxmtvv@gmail.com>

Use UNDEFINE_MRS_S and UNDEFINE_MSR_S to define corresponding macros
in-place and workaround gcc and clang limitations on redefining macros
across different assembler blocks.

Signed-off-by: Alex Matveev <alxmtvv@gmail.com>
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 arch/arm64/include/asm/kvm_hyp.h |  8 ++++--
 arch/arm64/include/asm/sysreg.h  | 55 +++++++++++++++++++++++++++-------------
 2 files changed, 43 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 4572a9b560fa..20bfb8e676e0 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -29,7 +29,9 @@
 	({								\
 		u64 reg;						\
 		asm volatile(ALTERNATIVE("mrs %0, " __stringify(r##nvh),\
-					 "mrs_s %0, " __stringify(r##vh),\
+					 DEFINE_MRS_S			\
+					 "mrs_s %0, " __stringify(r##vh) "\n"\
+					 UNDEFINE_MRS_S,		\
 					 ARM64_HAS_VIRT_HOST_EXTN)	\
 			     : "=r" (reg));				\
 		reg;							\
@@ -39,7 +41,9 @@
 	do {								\
 		u64 __val = (u64)(v);					\
 		asm volatile(ALTERNATIVE("msr " __stringify(r##nvh) ", %x0",\
-					 "msr_s " __stringify(r##vh) ", %x0",\
+					 DEFINE_MSR_S			\
+					 "msr_s " __stringify(r##vh) ", %x0\n"\
+					 UNDEFINE_MSR_S,		\
 					 ARM64_HAS_VIRT_HOST_EXTN)	\
 					 : : "rZ" (__val));		\
 	} while (0)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 08cc88574659..3ae147c7e160 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -584,20 +584,39 @@
 
 #include <linux/types.h>
 
-asm(
-"	.irp	num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30\n"
-"	.equ	.L__reg_num_x\\num, \\num\n"
-"	.endr\n"
+#define __DEFINE_MRS_MSR_S_REGNUM				\
+"	.irp	num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30\n" \
+"	.equ	.L__reg_num_x\\num, \\num\n"			\
+"	.endr\n"						\
 "	.equ	.L__reg_num_xzr, 31\n"
-"\n"
-"	.macro	mrs_s, rt, sreg\n"
-	__emit_inst(0xd5200000|(\\sreg)|(.L__reg_num_\\rt))
+
+#define DEFINE_MRS_S						\
+	__DEFINE_MRS_MSR_S_REGNUM				\
+"	.macro	mrs_s, rt, sreg\n"				\
+	__emit_inst(0xd5200000|(\\sreg)|(.L__reg_num_\\rt))	\
 "	.endm\n"
-"\n"
-"	.macro	msr_s, sreg, rt\n"
-	__emit_inst(0xd5000000|(\\sreg)|(.L__reg_num_\\rt))
+
+#define DEFINE_MSR_S						\
+	__DEFINE_MRS_MSR_S_REGNUM				\
+"	.macro	msr_s, sreg, rt\n"				\
+	__emit_inst(0xd5000000|(\\sreg)|(.L__reg_num_\\rt))	\
 "	.endm\n"
-);
+
+#define UNDEFINE_MRS_S						\
+"	.purgem	mrs_s\n"
+
+#define UNDEFINE_MSR_S						\
+"	.purgem	msr_s\n"
+
+#define __mrs_s(r, v)						\
+	DEFINE_MRS_S						\
+"	mrs_s %0, " __stringify(r) "\n"				\
+	UNDEFINE_MRS_S : "=r" (v)
+
+#define __msr_s(r, v)						\
+	DEFINE_MSR_S						\
+"	msr_s " __stringify(r) ", %x0\n"			\
+	UNDEFINE_MSR_S : : "rZ" (v)
 
 /*
  * Unlike read_cpuid, calls to read_sysreg are never expected to be
@@ -623,15 +642,15 @@ asm(
  * For registers without architectural names, or simply unsupported by
  * GAS.
  */
-#define read_sysreg_s(r) ({						\
-	u64 __val;							\
-	asm volatile("mrs_s %0, " __stringify(r) : "=r" (__val));	\
-	__val;								\
+#define read_sysreg_s(r) ({					\
+	u64 __val;						\
+	asm volatile(__mrs_s(r, __val));			\
+	__val;							\
 })
 
-#define write_sysreg_s(v, r) do {					\
-	u64 __val = (u64)(v);						\
-	asm volatile("msr_s " __stringify(r) ", %x0" : : "rZ" (__val));	\
+#define write_sysreg_s(v, r) do {				\
+	u64 __val = (u64)(v);					\
+	asm volatile(__msr_s(r, __val));			\
 } while (0)
 
 static inline void config_sctlr_el1(u32 clear, u32 set)
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v2 12/18] kbuild: add support for clang LTO
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
                   ` (10 preceding siblings ...)
  2017-11-15 21:34 ` [PATCH v2 11/18] arm64: make mrs_s and msr_s macros work with LTO Sami Tolvanen
@ 2017-11-15 21:34 ` Sami Tolvanen
  2017-11-15 22:06   ` Kees Cook
  2017-11-18  3:21   ` [v2,12/18] " Nicholas Piggin
  2017-11-15 21:34 ` [PATCH v2 13/18] kbuild: fix dynamic ftrace with " Sami Tolvanen
                   ` (7 subsequent siblings)
  19 siblings, 2 replies; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

This change adds the configuration option CONFIG_LTO_CLANG, and
build system support for clang's Link Time Optimization (LTO). In
preparation for LTO support for other compilers, potentially common
parts of the changes are gated behind CONFIG_LTO instead.

With -flto, instead of object files, clang produces LLVM bitcode,
which is compiled into a native object at link time, allowing the
final binary to be optimized globally. For more details, see:

  https://llvm.org/docs/LinkTimeOptimization.html

While the kernel normally uses GNU ld for linking, LLVM supports LTO
only with lld or GNU gold linkers. This patch set assumes gold will
be used with the LLVMgold plug-in to perform the LTO link step. Due
to potential incompatibilities with GNU ld, this change also adds
LDFINAL_vmlinux for using a different linker for the vmlinux_link
step, and defaults to using GNU ld.

Assuming LLVMgold.so is in LD_LIBRARY_PATH and CONFIG_LTO_CLANG has
been selected, an LTO kernel can be built simply by running make
CC=clang. Recommended versions are >= 5.0 for clang, and >= 2.27 for
binutils.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 .gitignore               |  2 ++
 Makefile                 | 54 +++++++++++++++++++++++++++-
 arch/Kconfig             | 32 +++++++++++++++++
 scripts/Makefile.build   | 66 +++++++++++++++++++++++++++++++++-
 scripts/Makefile.modpost | 63 ++++++++++++++++++++++++++++-----
 scripts/link-vmlinux.sh  | 92 ++++++++++++++++++++++++++++++++++++++++++------
 6 files changed, 288 insertions(+), 21 deletions(-)

diff --git a/.gitignore b/.gitignore
index 6c119eab5d46..ac236e2bb9b1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -11,6 +11,7 @@
 #
 .*
 *.a
+*.a.*
 *.bin
 *.bz2
 *.c.[012]*.*
@@ -28,6 +29,7 @@
 *.lzma
 *.lzo
 *.mod.c
+*.modversions
 *.o
 *.o.*
 *.order
diff --git a/Makefile b/Makefile
index f976af9525bf..8141b4c8f1bf 100644
--- a/Makefile
+++ b/Makefile
@@ -350,6 +350,7 @@ include scripts/Kbuild.include
 # Make variables (CC, etc...)
 AS		= $(CROSS_COMPILE)as
 LD		= $(CROSS_COMPILE)ld
+LDGOLD		= $(CROSS_COMPILE)ld.gold
 CC		= $(CROSS_COMPILE)gcc
 CPP		= $(CC) -E
 AR		= $(CROSS_COMPILE)ar
@@ -623,6 +624,15 @@ endif
 # Defaults to vmlinux, but the arch makefile usually adds further targets
 all: vmlinux
 
+# Make toolchain changes before including arch/$(SRCARCH)/Makefile to ensure
+# cc-/ld-* macros return correct values.
+ifdef CONFIG_LTO_CLANG
+# use GNU gold with LLVMgold for LTO linking, and LD for vmlinux_link
+LDFINAL_vmlinux := $(LD)
+LD		:= $(LDGOLD)
+LDFLAGS_GOLD	+= -plugin LLVMgold.so
+endif
+
 # The arch Makefile can set ARCH_{CPP,A,C}FLAGS to override the default
 # values of the respective KBUILD_* variables
 ARCH_CPPFLAGS :=
@@ -789,6 +799,32 @@ KBUILD_CFLAGS	+= $(call cc-option,-ffunction-sections,)
 KBUILD_CFLAGS	+= $(call cc-option,-fdata-sections,)
 endif
 
+ifdef CONFIG_LTO_CLANG
+lto-clang-flags	:= -flto -fvisibility=hidden
+
+# allow disabling only clang LTO where needed 
+DISABLE_LTO_CLANG := -fno-lto
+export DISABLE_LTO_CLANG
+
+ifdef CONFIG_MODVERSIONS
+# llvm-dis is used instead of objdump to process LLVM IR files
+LLVM_DIS	:= llvm-dis
+export LLVM_DIS
+endif
+endif
+
+ifdef CONFIG_LTO
+lto-flags	:= $(lto-clang-flags)
+KBUILD_CFLAGS	+= $(lto-flags)
+
+DISABLE_LTO	:= $(DISABLE_LTO_CLANG)
+export DISABLE_LTO
+
+# LDFINAL_vmlinux and LDFLAGS_FINAL_vmlinux can be set to override
+# the linker and flags for vmlinux_link.
+export LDFINAL_vmlinux LDFLAGS_FINAL_vmlinux
+endif
+
 # arch Makefile may override CC so keep this after arch Makefile is included
 NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) -print-file-name=include)
 CHECKFLAGS     += $(NOSTDINC_FLAGS)
@@ -1090,6 +1126,19 @@ prepare-objtool: $(objtool_target)
 # CC_STACKPROTECTOR_STRONG! Why did it build with _REGULAR?!")
 PHONY += prepare-compiler-check
 prepare-compiler-check: FORCE
+# Make sure we're using clang with LTO_CLANG
+ifdef CONFIG_LTO_CLANG
+  ifneq ($(cc-name),clang)
+	@echo Cannot use CONFIG_LTO_CLANG without CC=clang >&2 && exit 1
+  endif
+endif
+# Make sure compiler supports LTO flags
+ifdef lto-flags
+  ifeq ($(call cc-option, $(lto-flags)),)
+	@echo Cannot use CONFIG_LTO: $(lto-flags) not supported by compiler \
+		>&2 && exit 1
+  endif
+endif
 # Make sure compiler supports requested stack protector flag.
 ifdef stackp-name
   ifeq ($(call cc-option, $(stackp-flag)),)
@@ -1564,7 +1613,10 @@ clean: $(clean-dirs)
 		-o -name modules.builtin -o -name '.tmp_*.o.*' \
 		-o -name '*.c.[012]*.*' \
 		-o -name '*.ll' \
-		-o -name '*.gcno' \) -type f -print | xargs rm -f
+		-o -name '*.gcno' \
+		-o -name '*.[oa].objects' \
+		-o -name '*.o.symversions' \
+		-o -name '*.modversions' \) -type f -print | xargs rm -f
 
 # Generate tags for editors
 # ---------------------------------------------------------------------------
diff --git a/arch/Kconfig b/arch/Kconfig
index 400b9e1b2f27..bb5296ecebdd 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -585,6 +585,7 @@ config CC_STACKPROTECTOR_STRONG
 endchoice
 
 config THIN_ARCHIVES
+	depends on !LTO_CLANG
 	def_bool y
 	help
 	  Select this if the architecture wants to use thin archives
@@ -605,6 +606,37 @@ config LD_DEAD_CODE_DATA_ELIMINATION
 	  sections (e.g., '.text.init'). Typically '.' in section names
 	  is used to distinguish them from label names / C identifiers.
 
+config LTO
+	bool
+
+config ARCH_SUPPORTS_LTO_CLANG
+	bool
+	help
+	  An architecture should select this option it supports:
+	  - compiling with clang,
+	  - compiling inline assembly with clang's integrated assembler,
+	  - and linking with either lld or GNU gold w/ LLVMgold.
+
+config LTO_CLANG
+	bool "Use clang Link Time Optimization (LTO)"
+	depends on ARCH_SUPPORTS_LTO_CLANG
+	depends on !FTRACE_MCOUNT_RECORD
+	select LTO
+	select LD_DEAD_CODE_DATA_ELIMINATION
+	help
+          This option enables clang's Link Time Optimization (LTO), which allows
+          the compiler to optimize the kernel globally at link time. If you
+          enable this option, the compiler generates LLVM IR instead of object
+          files, and the actual compilation from IR occurs at the LTO link step,
+          which may take several minutes.
+
+          If you select this option, you must compile the kernel with clang
+          (make CC=clang) and have the LVMgold plug-in in LD_LIBRARY_PATH.
+
+          Using clang >= 5.0 and GNU gold from binutils >= 2.27 is recommended
+          for this option.
+
+
 config HAVE_ARCH_WITHIN_STACK_FRAMES
 	bool
 	help
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index e63af4e19382..e8bf5c440612 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -210,6 +210,23 @@ else
 
 cmd_cc_o_c = $(CC) $(c_flags) -c -o $(@D)/.tmp_$(@F) $<
 
+ifdef CONFIG_LTO_CLANG
+# Generate .o.symversions files for each .o with exported symbols, and link these
+# to the kernel and/or modules at the end.
+cmd_modversions_c =								\
+	if echo '$(c_flags)' | grep -q -- '$(DISABLE_LTO_CLANG)'; then		\
+		if $(OBJDUMP) -h $(@D)/.tmp_$(@F) | grep -q __ksymtab; then	\
+			$(call cmd_gensymtypes_c,$(KBUILD_SYMTYPES),$(@:.o=.symtypes)) \
+			    > $(@D)/$(@F).symversions;				\
+		fi;								\
+	else									\
+		if $(LLVM_DIS) -o=- $(@D)/.tmp_$(@F) | grep -q __ksymtab; then	\
+			$(call cmd_gensymtypes_c,$(KBUILD_SYMTYPES),$(@:.o=.symtypes)) \
+			    > $(@D)/$(@F).symversions;				\
+		fi;								\
+	fi;									\
+	mv -f $(@D)/.tmp_$(@F) $@;
+else
 cmd_modversions_c =								\
 	if $(OBJDUMP) -h $(@D)/.tmp_$(@F) | grep -q __ksymtab; then		\
 		$(call cmd_gensymtypes_c,$(KBUILD_SYMTYPES),$(@:.o=.symtypes))	\
@@ -222,6 +239,7 @@ cmd_modversions_c =								\
 		mv -f $(@D)/.tmp_$(@F) $@;					\
 	fi;
 endif
+endif
 
 ifdef CONFIG_FTRACE_MCOUNT_RECORD
 ifdef BUILD_C_RECORDMCOUNT
@@ -443,6 +461,13 @@ $(obj)/%-asn1.c $(obj)/%-asn1.h: $(src)/%.asn1 $(objtree)/scripts/asn1_compiler
 # To build objects in subdirs, we need to descend into the directories
 $(sort $(subdir-obj-y)): $(subdir-ym) ;
 
+ifdef CONFIG_LTO_CLANG
+# If LTO is enabled, we remove all intermediate linking steps and instead
+# collect a list of all objects to be linked at the end.
+cc_lto_objects = $(foreach o,$(1),\
+			[ -f ${o}.objects ] && cat ${o}.objects || echo ${o};)
+endif
+
 #
 # Rule to compile a set of .o files into one .o file
 #
@@ -458,6 +483,16 @@ else
   quiet_cmd_link_o_target = LD      $@
 endif
 
+ifdef CONFIG_LTO_CLANG
+builtin-cmds = $(call cc_lto_objects,$(filter $(obj-y), $^))
+
+quiet_cmd_update_builtin = GEN     $@
+cmd_update_builtin = (cat /dev/null; $(builtin-cmds)) > $@.objects && \
+			cat /dev/null > $@
+
+$(builtin-target): $(obj-y) FORCE
+	$(call if_changed,update_builtin)
+else
 # If the list of objects to link is empty, just create an empty built-in.o
 cmd_link_o_target = $(if $(strip $(obj-y)),\
 		      $(cmd_make_builtin) $@ $(filter $(obj-y), $^) \
@@ -466,6 +501,7 @@ cmd_link_o_target = $(if $(strip $(obj-y)),\
 
 $(builtin-target): $(obj-y) FORCE
 	$(call if_changed,link_o_target)
+endif
 
 targets += $(builtin-target)
 endif # builtin-target
@@ -487,6 +523,16 @@ $(modorder-target): $(subdir-ym) FORCE
 # Rule to compile a set of .o files into one .a file
 #
 ifdef lib-target
+ifdef CONFIG_LTO_CLANG
+lib-target-cmds = $(call cc_lto_objects,$(lib-y))
+
+quiet_cmd_update_lib_target = GEN     $@
+cmd_update_lib_target = (cat /dev/null; $(lib-target-cmds)) > $@.objects && \
+			cat /dev/null > $@
+
+$(lib-target): $(lib-y) FORCE
+	$(call if_changed,update_lib_target)
+else
 quiet_cmd_link_l_target = AR      $@
 
 ifdef CONFIG_THIN_ARCHIVES
@@ -497,6 +543,7 @@ endif
 
 $(lib-target): $(lib-y) FORCE
 	$(call if_changed,link_l_target)
+endif
 
 targets += $(lib-target)
 
@@ -552,14 +599,31 @@ endif
 quiet_cmd_link_multi-m = LD [M]  $@
 cmd_link_multi-m = $(cmd_link_multi-link)
 
+ifdef CONFIG_LTO_CLANG
+multi-deps-cmds = $(call cc_lto_objects,$(link_multi_deps))
+
+quiet_cmd_update_multi_deps = GEN     $@
+cmd_update_multi_deps = (cat /dev/null; $(multi-deps-cmds)) > $@.objects && \
+			cat /dev/null > $@
+
+$(multi-used-y): FORCE
+	$(call if_changed,update_multi_deps)
+
+$(multi-used-m): FORCE
+	$(call if_changed,update_multi_deps)
+	@{ echo $(@:.o=.ko); echo $(link_multi_deps); \
+	   $(cmd_undef_syms); } > $(MODVERDIR)/$(@F:.o=.mod)
+else
 $(multi-used-y): FORCE
 	$(call if_changed,link_multi-y)
-$(call multi_depend, $(multi-used-y), .o, -objs -y)
 
 $(multi-used-m): FORCE
 	$(call if_changed,link_multi-m)
 	@{ echo $(@:.o=.ko); echo $(link_multi_deps); \
 	   $(cmd_undef_syms); } > $(MODVERDIR)/$(@F:.o=.mod)
+endif
+
+$(call multi_depend, $(multi-used-y), .o, -objs -y)
 $(call multi_depend, $(multi-used-m), .o, -objs -y -m)
 
 targets += $(multi-used-y) $(multi-used-m)
diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
index 991db7d6e4df..cb1c040a006c 100644
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -83,12 +83,46 @@ modpost = scripts/mod/modpost                    \
 
 MODPOST_OPT=$(subst -i,-n,$(filter -i,$(MAKEFLAGS)))
 
+# If CONFIG_LTO_CLANG is enabled, .o files are either LLVM IR, or empty, so we
+# need to link them into actual objects before passing them to modpost
+modpost-ext = $(if $(CONFIG_LTO_CLANG),.lto,)
+
+ifdef CONFIG_LTO_CLANG
+quiet_cmd_cc_lto_modversions = GEN [M]  $@
+cmd_cc_lto_modversions = 						\
+	rm -f $(@); 							\
+	if [ -f $(@:.modversions=.o).objects ]; then 			\
+		for i in `cat $(@:.modversions=.o).objects`; do 	\
+			[ -s $$i.symversions ] &&			\
+				cat $$i.symversions >> $(@);		\
+		done;							\
+	else								\
+		[ -s $(@:.modversions=.o).symversions ] &&		\
+			cat $(@:.modversions=.o).symversions >> $(@);	\
+	fi
+
+$(modules:.ko=.modversions): FORCE
+	$(call if_changed,cc_lto_modversions)
+
+quiet_cmd_cc_lto_link_modules = LD [M]  $@
+cmd_cc_lto_link_modules =						\
+	$(LD) $(ld_flags) -r -o $(@)					\
+		$(shell [ -s $(@:$(modpost-ext).o=.modversions) ] &&	\
+			echo -T $(@:$(modpost-ext).o=.modversions))	\
+		$(shell [ -f $(@:$(modpost-ext).o=.o).objects ] &&	\
+			 cat $(@:$(modpost-ext).o=.o).objects ||	\
+			echo $(@:$(modpost-ext).o=.o))
+
+$(modules:.ko=$(modpost-ext).o): %$(modpost-ext).o: %.o %.modversions FORCE
+	$(call if_changed,cc_lto_link_modules)
+endif
+
 # We can go over command line length here, so be careful.
 quiet_cmd_modpost = MODPOST $(words $(filter-out vmlinux FORCE, $^)) modules
-      cmd_modpost = $(MODLISTCMD) | sed 's/\.ko$$/.o/' | $(modpost) $(MODPOST_OPT) -s -T -
+      cmd_modpost = $(MODLISTCMD) | sed 's/\.ko$$/$(modpost-ext)\.o/' | $(modpost) $(MODPOST_OPT) -s -T -
 
 PHONY += __modpost
-__modpost: $(modules:.ko=.o) FORCE
+__modpost: $(modules:.ko=$(modpost-ext).o) FORCE
 	$(call cmd,modpost) $(wildcard vmlinux)
 
 quiet_cmd_kernel-mod = MODPOST $@
@@ -98,8 +132,7 @@ vmlinux.o: FORCE
 	$(call cmd,kernel-mod)
 
 # Declare generated files as targets for modpost
-$(modules:.ko=.mod.c): __modpost ;
-
+$(modules:.ko=$(modpost-ext).mod.c): __modpost ;
 
 # Step 5), compile all *.mod.c files
 
@@ -110,23 +143,37 @@ quiet_cmd_cc_o_c = CC      $@
       cmd_cc_o_c = $(CC) $(c_flags) $(KBUILD_CFLAGS_MODULE) $(CFLAGS_MODULE) \
 		   -c -o $@ $<
 
-$(modules:.ko=.mod.o): %.mod.o: %.mod.c FORCE
+$(modules:.ko=.mod.o): %.mod.o: %$(modpost-ext).mod.c FORCE
 	$(call if_changed_dep,cc_o_c)
 
-targets += $(modules:.ko=.mod.o)
+targets += $(modules:.ko=$(modpost-ext).mod.o)
 
 ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
 
 # Step 6), final link of the modules with optional arch pass after final link
 quiet_cmd_ld_ko_o = LD [M]  $@
+
+ifdef CONFIG_LTO_CLANG
+lto_ko_objects = $(foreach o,$(1:$(modpost-ext).o=.o),			\
+			$(shell [ -f $(o).objects ] && 			\
+				 cat $(o).objects || echo $(o)))
+
+      cmd_ld_ko_o = 							\
+	$(LD) -r $(LDFLAGS)                                 		\
+		 $(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE) 		\
+		 $(shell [ -s $(@:.ko=.modversions) ] &&		\
+			echo -T $(@:.ko=.modversions))  		\
+		 -o $@ $(call lto_ko_objects, $(filter-out FORCE,$^))
+else
       cmd_ld_ko_o =                                                     \
 	$(LD) -r $(LDFLAGS)                                             \
                  $(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE)             \
                  -o $@ $(filter-out FORCE,$^) ;                         \
 	$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
+endif
 
-$(modules): %.ko :%.o %.mod.o FORCE
-	+$(call if_changed,ld_ko_o)
+$(modules): %.ko: %$(modpost-ext).o %.mod.o FORCE
+	$(call if_changed,ld_ko_o)
 
 targets += $(modules)
 
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index e6818b8e7141..fccc123b691f 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -64,6 +64,53 @@ archive_builtin()
 	fi
 }
 
+# If CONFIG_LTO_CLANG is selected, the compiler produces LLVM IR files instead
+# of ELF object files. This function expands individual IR files from a list of
+# objects that would have otherwise been linked already.
+expand()
+{
+	if [ -z "${CONFIG_LTO_CLANG}" ]; then
+		echo $*
+	fi
+
+	local objs
+
+	for o in $*; do
+		if [ -f ${o}.objects ]; then
+			objs="${objs} $(xargs < ${o}.objects)"
+		else
+			objs="${objs} ${o}"
+		fi
+	done
+
+	echo "${objs}"
+}
+
+# If CONFIG_LTO_CLANG is selected, collect generated symbol versions into
+# .tmp_symversions
+modversions()
+{
+	if [ -z "${CONFIG_LTO_CLANG}" ]; then
+		return
+	fi
+
+	if [ -z "${CONFIG_MODVERSIONS}" ]; then
+		return
+	fi
+
+	rm -f .tmp_symversions
+
+	for o in $(expand ${KBUILD_VMLINUX_INIT}) \
+		 $(expand ${KBUILD_VMLINUX_MAIN}) \
+		 $(expand ${KBUILD_VMLINUX_LIBS}); do
+		if [ -f ${o}.symversions ]; then
+			cat ${o}.symversions >> .tmp_symversions
+		fi
+	done
+
+	echo "-T .tmp_symversions"
+}
+
 # Link of vmlinux.o used for section mismatch analysis
 # ${1} output file
 modpost_link()
@@ -78,13 +125,22 @@ modpost_link()
 			${KBUILD_VMLINUX_LIBS}				\
 			--end-group"
 	else
-		objects="${KBUILD_VMLINUX_INIT}				\
+		objects="$(expand ${KBUILD_VMLINUX_INIT})		\
 			--start-group					\
-			${KBUILD_VMLINUX_MAIN}				\
-			${KBUILD_VMLINUX_LIBS}				\
+			$(expand ${KBUILD_VMLINUX_MAIN})		\
+			$(expand ${KBUILD_VMLINUX_LIBS})		\
 			--end-group"
 	fi
-	${LD} ${LDFLAGS} -r -o ${1} ${objects}
+
+	if [ -n "${CONFIG_LTO_CLANG}" ]; then
+		# This might take a while, so indicate that we're doing
+		# an LTO link
+		info LTO vmlinux.o
+	else
+		info LD vmlinux.o
+	fi
+
+	${LD} ${LDFLAGS} -r -o ${1} $(modversions) ${objects}
 }
 
 # Link of vmlinux
@@ -96,6 +152,14 @@ vmlinux_link()
 	local objects
 
 	if [ "${SRCARCH}" != "um" ]; then
+		local ld=${LD}
+		local ldflags="${LDFLAGS} ${LDFLAGS_vmlinux}"
+
+		if [ -n "${LDFINAL_vmlinux}" ]; then
+			ld=${LDFINAL_vmlinux}
+			ldflags="${LDFLAGS_FINAL_vmlinux} ${LDFLAGS_vmlinux}"
+		fi
+
 		if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
 			objects="--whole-archive			\
 				built-in.o				\
@@ -105,16 +169,15 @@ vmlinux_link()
 				--end-group				\
 				${1}"
 		else
-			objects="${KBUILD_VMLINUX_INIT}			\
+			objects="$(expand ${KBUILD_VMLINUX_INIT})	\
 				--start-group				\
-				${KBUILD_VMLINUX_MAIN}			\
-				${KBUILD_VMLINUX_LIBS}			\
+				$(expand ${KBUILD_VMLINUX_MAIN})	\
+				$(expand ${KBUILD_VMLINUX_LIBS})	\
 				--end-group				\
 				${1}"
 		fi
 
-		${LD} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2}		\
-			-T ${lds} ${objects}
+		${ld} ${ldflags} -o ${2} -T ${lds} ${objects}
 	else
 		if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
 			objects="-Wl,--whole-archive			\
@@ -141,7 +204,6 @@ vmlinux_link()
 	fi
 }
 
-
 # Create ${2} .o file with all symbols from the ${1} object file
 kallsyms()
 {
@@ -192,6 +254,7 @@ cleanup()
 	rm -f .tmp_System.map
 	rm -f .tmp_kallsyms*
 	rm -f .tmp_version
+	rm -f .tmp_symversions
 	rm -f .tmp_vmlinux*
 	rm -f built-in.o
 	rm -f System.map
@@ -253,12 +316,19 @@ ${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init GCC_PLUGINS_CFLAGS="${GC
 archive_builtin
 
 #link vmlinux.o
-info LD vmlinux.o
 modpost_link vmlinux.o
 
 # modpost vmlinux.o to check for section mismatches
 ${MAKE} -f "${srctree}/scripts/Makefile.modpost" vmlinux.o
 
+if [ -n "${CONFIG_LTO_CLANG}" ]; then
+	# Re-use vmlinux.o, so we can avoid the slow LTO link step in
+	# vmlinux_link
+	KBUILD_VMLINUX_INIT=
+	KBUILD_VMLINUX_MAIN=vmlinux.o
+	KBUILD_VMLINUX_LIBS=
+fi
+
 kallsymso=""
 kallsyms_vmlinux=""
 if [ -n "${CONFIG_KALLSYMS}" ]; then
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v2 13/18] kbuild: fix dynamic ftrace with clang LTO
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
                   ` (11 preceding siblings ...)
  2017-11-15 21:34 ` [PATCH v2 12/18] kbuild: add support for clang LTO Sami Tolvanen
@ 2017-11-15 21:34 ` Sami Tolvanen
  2017-11-15 21:34 ` [PATCH v2 14/18] scripts/mod: disable LTO for empty.c Sami Tolvanen
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

With CONFIG_LTO_CLANG enabled, LLVM IR won't be compiled into object
files until modpost_link. This change postpones calls to recordmcount
until after this step.

In order to exclude ftrace_process_locs from inspection, we add a new
code section .text..ftrace, which we tell recordmcount to ignore, and
a __norecordmcount attribute for moving functions to this section.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 arch/Kconfig                      |  2 +-
 include/asm-generic/vmlinux.lds.h |  1 +
 include/linux/compiler-clang.h    |  7 +++++++
 include/linux/compiler_types.h    |  4 ++++
 kernel/trace/ftrace.c             |  6 +++---
 scripts/Makefile.build            | 14 +++++++++++++-
 scripts/Makefile.modpost          |  4 ++++
 scripts/link-vmlinux.sh           | 16 ++++++++++++++++
 scripts/recordmcount.c            |  3 ++-
 9 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index bb5296ecebdd..e0d0084308e2 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -620,7 +620,7 @@ config ARCH_SUPPORTS_LTO_CLANG
 config LTO_CLANG
 	bool "Use clang Link Time Optimization (LTO)"
 	depends on ARCH_SUPPORTS_LTO_CLANG
-	depends on !FTRACE_MCOUNT_RECORD
+	depends on !FTRACE_MCOUNT_RECORD || HAVE_C_RECORDMCOUNT
 	select LTO
 	select LD_DEAD_CODE_DATA_ELIMINATION
 	help
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index ce0244780988..c4c44ea73930 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -459,6 +459,7 @@
 #define TEXT_TEXT							\
 		ALIGN_FUNCTION();					\
 		*(.text.hot TEXT_MAIN .text.fixup .text.unlikely)	\
+		*(.text..ftrace)					\
 		*(.text..refcount)					\
 		*(.ref.text)						\
 	MEM_KEEP(init.text)						\
diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index a06583e41f80..62604537ea66 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -16,3 +16,10 @@
  * with any version that can compile the kernel
  */
 #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
+
+#ifdef CONFIG_CLANG_LTO
+#ifdef CONFIG_FTRACE_MCOUNT_RECORD
+#define __norecordmcount \
+	__attribute__((__section__(".text..ftrace")))
+#endif
+#endif
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 6b79a9bba9a7..231c413c615a 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -253,6 +253,10 @@ struct ftrace_likely_data {
 # define __nostackprotector
 #endif
 
+#ifndef __norecordmcount
+#define __norecordmcount
+#endif
+
 /*
  * Assume alignment of return value.
  */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8319e09e15b9..e117b849f9dc 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5573,9 +5573,9 @@ static int ftrace_cmp_ips(const void *a, const void *b)
 	return 0;
 }
 
-static int ftrace_process_locs(struct module *mod,
-			       unsigned long *start,
-			       unsigned long *end)
+static int __norecordmcount ftrace_process_locs(struct module *mod,
+						unsigned long *start,
+						unsigned long *end)
 {
 	struct ftrace_page *start_pg;
 	struct ftrace_page *pg;
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index e8bf5c440612..4d990c7cbe34 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -246,6 +246,12 @@ ifdef BUILD_C_RECORDMCOUNT
 ifeq ("$(origin RECORDMCOUNT_WARN)", "command line")
   RECORDMCOUNT_FLAGS = -w
 endif
+
+ifdef CONFIG_LTO_CLANG
+# With LTO, we postpone running recordmcount until after the LTO link step, so
+# let's export the parameters for the link script.
+export RECORDMCOUNT_FLAGS
+else
 # Due to recursion, we must skip empty.o.
 # The empty.o file is created in the make process in order to determine
 # the target endianness and word size. It is made before all other C
@@ -254,17 +260,22 @@ sub_cmd_record_mcount =					\
 	if [ $(@) != "scripts/mod/empty.o" ]; then	\
 		$(objtree)/scripts/recordmcount $(RECORDMCOUNT_FLAGS) "$(@)";	\
 	fi;
+endif
+
 recordmcount_source := $(srctree)/scripts/recordmcount.c \
 		    $(srctree)/scripts/recordmcount.h
-else
+else # !BUILD_C_RECORDMCOUNT
 sub_cmd_record_mcount = set -e ; perl $(srctree)/scripts/recordmcount.pl "$(ARCH)" \
 	"$(if $(CONFIG_CPU_BIG_ENDIAN),big,little)" \
 	"$(if $(CONFIG_64BIT),64,32)" \
 	"$(OBJDUMP)" "$(OBJCOPY)" "$(CC) $(KBUILD_CFLAGS)" \
 	"$(LD)" "$(NM)" "$(RM)" "$(MV)" \
 	"$(if $(part-of-module),1,0)" "$(@)";
+
 recordmcount_source := $(srctree)/scripts/recordmcount.pl
 endif # BUILD_C_RECORDMCOUNT
+
+ifndef CONFIG_LTO_CLANG
 cmd_record_mcount =						\
 	if [ "$(findstring $(CC_FLAGS_FTRACE),$(_c_flags))" =	\
 	     "$(CC_FLAGS_FTRACE)" ]; then			\
@@ -287,6 +298,7 @@ objtool_args += --no-unreachable
 else
 objtool_args += $(call cc-ifversion, -lt, 0405, --no-unreachable)
 endif
+endif # CONFIG_FTRACE_MCOUNT_RECORD
 
 # 'OBJECT_FILES_NON_STANDARD := y': skip objtool checking for a directory
 # 'OBJECT_FILES_NON_STANDARD_foo.o := 'y': skip objtool checking for a file
diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
index cb1c040a006c..5896e07c499b 100644
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -164,6 +164,10 @@ lto_ko_objects = $(foreach o,$(1:$(modpost-ext).o=.o),			\
 		 $(shell [ -s $(@:.ko=.modversions) ] &&		\
 			echo -T $(@:.ko=.modversions))  		\
 		 -o $@ $(call lto_ko_objects, $(filter-out FORCE,$^))
+
+ifdef CONFIG_FTRACE_MCOUNT_RECORD
+cmd_ld_ko_o += ; $(objtree)/scripts/recordmcount $(RECORDMCOUNT_FLAGS) $@
+endif
 else
       cmd_ld_ko_o =                                                     \
 	$(LD) -r $(LDFLAGS)                                             \
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index fccc123b691f..44ec43c781e0 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -143,6 +143,19 @@ modpost_link()
 	${LD} ${LDFLAGS} -r -o ${1} $(modversions) ${objects}
 }
 
+# If CONFIG_LTO_CLANG is selected, we postpone running recordmcount until
+# we have compiled LLVM IR to an object file.
+recordmcount()
+{
+	if [ -z "${CONFIG_LTO_CLANG}" ]; then
+		return
+	fi
+
+	if [ -n "${CONFIG_FTRACE_MCOUNT_RECORD}" ]; then
+		scripts/recordmcount ${RECORDMCOUNT_FLAGS} $*
+	fi
+}
+
 # Link of vmlinux
 # ${1} - optional extra .o files
 # ${2} - output file
@@ -327,6 +340,9 @@ if [ -n "${CONFIG_LTO_CLANG}" ]; then
 	KBUILD_VMLINUX_INIT=
 	KBUILD_VMLINUX_MAIN=vmlinux.o
 	KBUILD_VMLINUX_LIBS=
+
+	# Call recordmcount if needed
+	recordmcount vmlinux.o
 fi
 
 kallsymso=""
diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c
index 16e086dcc567..69a769904da7 100644
--- a/scripts/recordmcount.c
+++ b/scripts/recordmcount.c
@@ -420,7 +420,8 @@ is_mcounted_section_name(char const *const txtname)
 		strcmp(".softirqentry.text", txtname) == 0 ||
 		strcmp(".kprobes.text", txtname) == 0 ||
 		strcmp(".cpuidle.text", txtname) == 0 ||
-		strcmp(".text.unlikely", txtname) == 0;
+		(strncmp(".text.",       txtname, 6) == 0 &&
+		 strcmp(".text..ftrace", txtname) != 0);
 }
 
 /* 32 bit and 64 bit are very similar */
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v2 14/18] scripts/mod: disable LTO for empty.c
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
                   ` (12 preceding siblings ...)
  2017-11-15 21:34 ` [PATCH v2 13/18] kbuild: fix dynamic ftrace with " Sami Tolvanen
@ 2017-11-15 21:34 ` Sami Tolvanen
  2017-11-15 21:34 ` [PATCH v2 15/18] efi/libstub: disable LTO Sami Tolvanen
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

With CONFIG_LTO_CLANG, clang generates LLVM IR instead of ELF object
files. As empty.o is used for probing target properties, disable LTO
for it to produce an object file instead.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 scripts/mod/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/mod/Makefile b/scripts/mod/Makefile
index 42c5d50f2bcc..e014b2fdd069 100644
--- a/scripts/mod/Makefile
+++ b/scripts/mod/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 OBJECT_FILES_NON_STANDARD := y
+CFLAGS_empty.o += $(DISABLE_LTO)
 
 hostprogs-y	:= modpost mk_elfconfig
 always		:= $(hostprogs-y) empty.o
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v2 15/18] efi/libstub: disable LTO
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
                   ` (13 preceding siblings ...)
  2017-11-15 21:34 ` [PATCH v2 14/18] scripts/mod: disable LTO for empty.c Sami Tolvanen
@ 2017-11-15 21:34 ` Sami Tolvanen
  2017-11-15 21:34 ` [PATCH v2 16/18] arm64: crypto: disable LTO for aes-ce-cipher.c Sami Tolvanen
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

With CONFIG_LTO_CLANG, we produce LLVM IR instead of object files. Since LTO
is not really needed here and the Makefile assumes we produce an object file,
disable LTO for libstub.

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 drivers/firmware/efi/libstub/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index adaa4a964f0c..69b3fbfb7f97 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -20,7 +20,8 @@ cflags-$(CONFIG_EFI_ARMSTUB)	+= -I$(srctree)/scripts/dtc/libfdt
 KBUILD_CFLAGS			:= $(cflags-y) -DDISABLE_BRANCH_PROFILING \
 				   -D__NO_FORTIFY \
 				   $(call cc-option,-ffreestanding) \
-				   $(call cc-option,-fno-stack-protector)
+				   $(call cc-option,-fno-stack-protector) \
+				   $(DISABLE_LTO)
 
 GCOV_PROFILE			:= n
 KASAN_SANITIZE			:= n
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v2 16/18] arm64: crypto: disable LTO for aes-ce-cipher.c
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
                   ` (14 preceding siblings ...)
  2017-11-15 21:34 ` [PATCH v2 15/18] efi/libstub: disable LTO Sami Tolvanen
@ 2017-11-15 21:34 ` Sami Tolvanen
  2017-11-20 15:20   ` Mark Rutland
  2017-11-15 21:34 ` [PATCH v2 17/18] arm64: disable RANDOMIZE_MODULE_REGION_FULL with LTO_CLANG Sami Tolvanen
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

CONFIG_LTO_CLANG requires the use of clang's integrated assembler, which
doesn't understand the inline assembly in aes-ce-cipher.c. Disable LTO for
the file to work around the issue.

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 arch/arm64/crypto/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index b5edc5918c28..af08508521a3 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -24,7 +24,7 @@ obj-$(CONFIG_CRYPTO_CRC32_ARM64_CE) += crc32-ce.o
 crc32-ce-y:= crc32-ce-core.o crc32-ce-glue.o
 
 obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
-CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto
+CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto $(DISABLE_LTO_CLANG)
 
 obj-$(CONFIG_CRYPTO_AES_ARM64_CE_CCM) += aes-ce-ccm.o
 aes-ce-ccm-y := aes-ce-ccm-glue.o aes-ce-ccm-core.o
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v2 17/18] arm64: disable RANDOMIZE_MODULE_REGION_FULL with LTO_CLANG
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
                   ` (15 preceding siblings ...)
  2017-11-15 21:34 ` [PATCH v2 16/18] arm64: crypto: disable LTO for aes-ce-cipher.c Sami Tolvanen
@ 2017-11-15 21:34 ` Sami Tolvanen
  2017-11-15 21:34 ` [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG Sami Tolvanen
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

RANDOMIZE_MODULE_REGION_FULL results in "overflow in relocation type 275"
when loading a module linked with GNU gold. As a workaround, disable when
LTO_CLANG is selected.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 arch/arm64/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index ba6aab55d464..3a70f763e18a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1064,7 +1064,7 @@ config RANDOMIZE_BASE
 
 config RANDOMIZE_MODULE_REGION_FULL
 	bool "Randomize the module region independently from the core kernel"
-	depends on RANDOMIZE_BASE
+	depends on RANDOMIZE_BASE && !LTO_CLANG
 	default y
 	help
 	  Randomizes the location of the module region without considering the
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
                   ` (16 preceding siblings ...)
  2017-11-15 21:34 ` [PATCH v2 17/18] arm64: disable RANDOMIZE_MODULE_REGION_FULL with LTO_CLANG Sami Tolvanen
@ 2017-11-15 21:34 ` Sami Tolvanen
  2017-11-16 11:58   ` Will Deacon
  2017-11-16 20:53 ` [PATCH v2 00/18] Add support for clang LTO Yury Norov
  2017-11-20 15:21 ` Mark Rutland
  19 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 21:34 UTC (permalink / raw)
  To: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke
  Cc: Sami Tolvanen

Allow CONFIG_LTO_CLANG to be enabled for the architecture.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3a70f763e18a..58504327b9f6 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -40,6 +40,7 @@ config ARM64
 	select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPT
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_USE_QUEUED_RWLOCKS
+	select ARCH_SUPPORTS_LTO_CLANG
 	select ARCH_SUPPORTS_MEMORY_FAILURE
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_NUMA_BALANCING
-- 
2.15.0.448.gf294e3d99a-goog

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 03/18] kbuild: move gcc-version.sh to cc-version.sh and add clang support
  2017-11-15 21:34 ` [PATCH v2 03/18] kbuild: move gcc-version.sh to cc-version.sh and add clang support Sami Tolvanen
@ 2017-11-15 21:48   ` Kees Cook
  2017-11-15 22:06     ` Sami Tolvanen
  0 siblings, 1 reply; 103+ messages in thread
From: Kees Cook @ 2017-11-15 21:48 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	linux-arm-kernel, linux-kbuild, LKML, Mark Rutland,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Yury Norov, Matthias Kaehlcke

On Wed, Nov 15, 2017 at 1:34 PM, Sami Tolvanen <samitolvanen@google.com> wrote:
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>

It might make sense to split this patch: do the move and refactoring,
then add clang support.

Though, won't this confuse some tests? A lot of cc-version tests are
expecting only gcc, yes?

-Kees

> ---
>  scripts/Kbuild.include |  4 ++--
>  scripts/cc-version.sh  | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  scripts/gcc-version.sh | 33 ---------------------------------
>  3 files changed, 47 insertions(+), 35 deletions(-)
>  create mode 100755 scripts/cc-version.sh
>  delete mode 100755 scripts/gcc-version.sh
>
> diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
> index 584d6cecd7c0..de41ab74121e 100644
> --- a/scripts/Kbuild.include
> +++ b/scripts/Kbuild.include
> @@ -143,11 +143,11 @@ cc-disable-warning = $(call try-run,\
>  cc-name = $(shell $(CC) -v 2>&1 | grep -q "clang version" && echo clang || echo gcc)
>
>  # cc-version
> -cc-version = $(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-version.sh $(CC))
> +cc-version = $(shell $(CONFIG_SHELL) $(srctree)/scripts/cc-version.sh $(CC))
>
>  # cc-fullversion
>  cc-fullversion = $(shell $(CONFIG_SHELL) \
> -       $(srctree)/scripts/gcc-version.sh -p $(CC))
> +       $(srctree)/scripts/cc-version.sh -p $(CC))
>
>  # cc-ifversion
>  # Usage:  EXTRA_CFLAGS += $(call cc-ifversion, -lt, 0402, -O1)
> diff --git a/scripts/cc-version.sh b/scripts/cc-version.sh
> new file mode 100755
> index 000000000000..6d085a8a07e8
> --- /dev/null
> +++ b/scripts/cc-version.sh
> @@ -0,0 +1,45 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# cc-version [-p] cc-command
> +#
> +# Prints the compiler version of `command' in a canonical 4-digit form
> +# such as `0295' for gcc-2.95, `0303' for gcc-3.3, etc.
> +#
> +# With the -p option, prints the patchlevel as well, for example `029503' for
> +# gcc-2.95.3, `030301' for gcc-3.3.1, etc.
> +#
> +
> +if [ "$1" = "-p" ] ; then
> +       with_patchlevel=1;
> +       shift;
> +fi
> +
> +compiler="$*"
> +
> +if [ ${#compiler} -eq 0 ]; then
> +       echo "Error: No compiler specified."
> +       printf "Usage:\n\t$0 <cc-command>\n"
> +       exit 1
> +fi
> +
> +clang=$(echo __clang__ | $compiler -E -x c - | tail -n1)
> +
> +if [ "$clang" == "1" ]; then
> +       major_macro="__clang_major__"
> +       minor_macro="__clang_minor__"
> +       patchlevel_macro="__clang_patchlevel__"
> +else
> +       major_macro="__GNUC__"
> +       minor_macro="__GNUC_MINOR__"
> +       patchlevel_macro="__GNUC_PATCHLEVEL__"
> +fi
> +
> +MAJOR=$(echo $major_macro | $compiler -E -x c - | tail -n 1)
> +MINOR=$(echo $minor_macro | $compiler -E -x c - | tail -n 1)
> +if [ "x$with_patchlevel" != "x" ] ; then
> +       PATCHLEVEL=$(echo $patchlevel_macro | $compiler -E -x c - | tail -n 1)
> +       printf "%02d%02d%02d\\n" $MAJOR $MINOR $PATCHLEVEL
> +else
> +       printf "%02d%02d\\n" $MAJOR $MINOR
> +fi
> diff --git a/scripts/gcc-version.sh b/scripts/gcc-version.sh
> deleted file mode 100755
> index 11bb909845e7..000000000000
> --- a/scripts/gcc-version.sh
> +++ /dev/null
> @@ -1,33 +0,0 @@
> -#!/bin/sh
> -# SPDX-License-Identifier: GPL-2.0
> -#
> -# gcc-version [-p] gcc-command
> -#
> -# Prints the gcc version of `gcc-command' in a canonical 4-digit form
> -# such as `0295' for gcc-2.95, `0303' for gcc-3.3, etc.
> -#
> -# With the -p option, prints the patchlevel as well, for example `029503' for
> -# gcc-2.95.3, `030301' for gcc-3.3.1, etc.
> -#
> -
> -if [ "$1" = "-p" ] ; then
> -       with_patchlevel=1;
> -       shift;
> -fi
> -
> -compiler="$*"
> -
> -if [ ${#compiler} -eq 0 ]; then
> -       echo "Error: No compiler specified."
> -       printf "Usage:\n\t$0 <gcc-command>\n"
> -       exit 1
> -fi
> -
> -MAJOR=$(echo __GNUC__ | $compiler -E -x c - | tail -n 1)
> -MINOR=$(echo __GNUC_MINOR__ | $compiler -E -x c - | tail -n 1)
> -if [ "x$with_patchlevel" != "x" ] ; then
> -       PATCHLEVEL=$(echo __GNUC_PATCHLEVEL__ | $compiler -E -x c - | tail -n 1)
> -       printf "%02d%02d%02d\\n" $MAJOR $MINOR $PATCHLEVEL
> -else
> -       printf "%02d%02d\\n" $MAJOR $MINOR
> -fi
> --
> 2.15.0.448.gf294e3d99a-goog
>



-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 01/18] kbuild: add ld-name macro and support for GNU gold
  2017-11-15 21:34 ` [PATCH v2 01/18] kbuild: add ld-name macro and support for GNU gold Sami Tolvanen
@ 2017-11-15 21:53   ` Kees Cook
  0 siblings, 0 replies; 103+ messages in thread
From: Kees Cook @ 2017-11-15 21:53 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	linux-arm-kernel, linux-kbuild, LKML, Mark Rutland,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Yury Norov, Matthias Kaehlcke

On Wed, Nov 15, 2017 at 1:34 PM, Sami Tolvanen <samitolvanen@google.com> wrote:
> GNU gold may require different flags than GNU ld. Add a macro for
> detecting the linker and conditionally add gold specific flags from
> LDFLAGS_GOLD.
>
> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

(It does seem weird to have only LDFLAGS instead of KBUILD_LDFLAGS,
but that's not really a problem for this patch exactly.)

-Kees

> ---
>  Makefile               | 5 +++++
>  scripts/Kbuild.include | 4 ++++
>  2 files changed, 9 insertions(+)
>
> diff --git a/Makefile b/Makefile
> index 763ab35df12a..f976af9525bf 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -827,6 +827,11 @@ include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
>
> +# Add any flags specific to ld.gold
> +ifeq ($(ld-name),gold)
> +LDFLAGS                += $(LDFLAGS_GOLD)
> +endif
> +
>  # Add any arch overrides and user supplied CPPFLAGS, AFLAGS and CFLAGS as the
>  # last assignments
>  KBUILD_CPPFLAGS += $(ARCH_CPPFLAGS) $(KCPPFLAGS)
> diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
> index 9ffd3dda3889..584d6cecd7c0 100644
> --- a/scripts/Kbuild.include
> +++ b/scripts/Kbuild.include
> @@ -172,6 +172,10 @@ ld-option = $(call try-run,\
>  # Important: no spaces around options
>  ar-option = $(call try-run, $(AR) rc$(1) "$$TMP",$(1),$(2))
>
> +# ld-name
> +# Expands to either bfd or gold
> +ld-name = $(shell $(LD) -v 2>&1 | grep -q "GNU gold" && echo gold || echo bfd)
> +
>  # ld-version
>  # Note this is mainly for HJ Lu's 3 number binutil versions
>  ld-version = $(shell $(LD) --version | $(srctree)/scripts/ld-version.sh)
> --
> 2.15.0.448.gf294e3d99a-goog
>



-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 12/18] kbuild: add support for clang LTO
  2017-11-15 21:34 ` [PATCH v2 12/18] kbuild: add support for clang LTO Sami Tolvanen
@ 2017-11-15 22:06   ` Kees Cook
  2017-11-18  3:21   ` [v2,12/18] " Nicholas Piggin
  1 sibling, 0 replies; 103+ messages in thread
From: Kees Cook @ 2017-11-15 22:06 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	linux-arm-kernel, linux-kbuild, LKML, Mark Rutland,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Yury Norov, Matthias Kaehlcke

On Wed, Nov 15, 2017 at 1:34 PM, Sami Tolvanen <samitolvanen@google.com> wrote:
> This change adds the configuration option CONFIG_LTO_CLANG, and
> build system support for clang's Link Time Optimization (LTO). In
> preparation for LTO support for other compilers, potentially common
> parts of the changes are gated behind CONFIG_LTO instead.
>
> With -flto, instead of object files, clang produces LLVM bitcode,
> which is compiled into a native object at link time, allowing the
> final binary to be optimized globally. For more details, see:
>
>   https://llvm.org/docs/LinkTimeOptimization.html
>
> While the kernel normally uses GNU ld for linking, LLVM supports LTO
> only with lld or GNU gold linkers. This patch set assumes gold will
> be used with the LLVMgold plug-in to perform the LTO link step. Due
> to potential incompatibilities with GNU ld, this change also adds
> LDFINAL_vmlinux for using a different linker for the vmlinux_link
> step, and defaults to using GNU ld.
>
> Assuming LLVMgold.so is in LD_LIBRARY_PATH and CONFIG_LTO_CLANG has
> been selected, an LTO kernel can be built simply by running make
> CC=clang. Recommended versions are >= 5.0 for clang, and >= 2.27 for
> binutils.
>
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

I wonder if this patch could be split into a few pieces. e.g. linker
script changes, modpost changes, core build changes, and Kconfig? The
changes area already pretty confined to specific files, so maybe this
isn't really needed, but it might make review easier.

-Kees

> ---
>  .gitignore               |  2 ++
>  Makefile                 | 54 +++++++++++++++++++++++++++-
>  arch/Kconfig             | 32 +++++++++++++++++
>  scripts/Makefile.build   | 66 +++++++++++++++++++++++++++++++++-
>  scripts/Makefile.modpost | 63 ++++++++++++++++++++++++++++-----
>  scripts/link-vmlinux.sh  | 92 ++++++++++++++++++++++++++++++++++++++++++------
>  6 files changed, 288 insertions(+), 21 deletions(-)
>
> diff --git a/.gitignore b/.gitignore
> index 6c119eab5d46..ac236e2bb9b1 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -11,6 +11,7 @@
>  #
>  .*
>  *.a
> +*.a.*
>  *.bin
>  *.bz2
>  *.c.[012]*.*
> @@ -28,6 +29,7 @@
>  *.lzma
>  *.lzo
>  *.mod.c
> +*.modversions
>  *.o
>  *.o.*
>  *.order
> diff --git a/Makefile b/Makefile
> index f976af9525bf..8141b4c8f1bf 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -350,6 +350,7 @@ include scripts/Kbuild.include
>  # Make variables (CC, etc...)
>  AS             = $(CROSS_COMPILE)as
>  LD             = $(CROSS_COMPILE)ld
> +LDGOLD         = $(CROSS_COMPILE)ld.gold
>  CC             = $(CROSS_COMPILE)gcc
>  CPP            = $(CC) -E
>  AR             = $(CROSS_COMPILE)ar
> @@ -623,6 +624,15 @@ endif
>  # Defaults to vmlinux, but the arch makefile usually adds further targets
>  all: vmlinux
>
> +# Make toolchain changes before including arch/$(SRCARCH)/Makefile to ensure
> +# cc-/ld-* macros return correct values.
> +ifdef CONFIG_LTO_CLANG
> +# use GNU gold with LLVMgold for LTO linking, and LD for vmlinux_link
> +LDFINAL_vmlinux := $(LD)
> +LD             := $(LDGOLD)
> +LDFLAGS_GOLD   += -plugin LLVMgold.so
> +endif
> +
>  # The arch Makefile can set ARCH_{CPP,A,C}FLAGS to override the default
>  # values of the respective KBUILD_* variables
>  ARCH_CPPFLAGS :=
> @@ -789,6 +799,32 @@ KBUILD_CFLAGS      += $(call cc-option,-ffunction-sections,)
>  KBUILD_CFLAGS  += $(call cc-option,-fdata-sections,)
>  endif
>
> +ifdef CONFIG_LTO_CLANG
> +lto-clang-flags        := -flto -fvisibility=hidden
> +
> +# allow disabling only clang LTO where needed
> +DISABLE_LTO_CLANG := -fno-lto
> +export DISABLE_LTO_CLANG
> +
> +ifdef CONFIG_MODVERSIONS
> +# llvm-dis is used instead of objdump to process LLVM IR files
> +LLVM_DIS       := llvm-dis
> +export LLVM_DIS
> +endif
> +endif
> +
> +ifdef CONFIG_LTO
> +lto-flags      := $(lto-clang-flags)
> +KBUILD_CFLAGS  += $(lto-flags)
> +
> +DISABLE_LTO    := $(DISABLE_LTO_CLANG)
> +export DISABLE_LTO
> +
> +# LDFINAL_vmlinux and LDFLAGS_FINAL_vmlinux can be set to override
> +# the linker and flags for vmlinux_link.
> +export LDFINAL_vmlinux LDFLAGS_FINAL_vmlinux
> +endif
> +
>  # arch Makefile may override CC so keep this after arch Makefile is included
>  NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) -print-file-name=include)
>  CHECKFLAGS     += $(NOSTDINC_FLAGS)
> @@ -1090,6 +1126,19 @@ prepare-objtool: $(objtool_target)
>  # CC_STACKPROTECTOR_STRONG! Why did it build with _REGULAR?!")
>  PHONY += prepare-compiler-check
>  prepare-compiler-check: FORCE
> +# Make sure we're using clang with LTO_CLANG
> +ifdef CONFIG_LTO_CLANG
> +  ifneq ($(cc-name),clang)
> +       @echo Cannot use CONFIG_LTO_CLANG without CC=clang >&2 && exit 1
> +  endif
> +endif
> +# Make sure compiler supports LTO flags
> +ifdef lto-flags
> +  ifeq ($(call cc-option, $(lto-flags)),)
> +       @echo Cannot use CONFIG_LTO: $(lto-flags) not supported by compiler \
> +               >&2 && exit 1
> +  endif
> +endif
>  # Make sure compiler supports requested stack protector flag.
>  ifdef stackp-name
>    ifeq ($(call cc-option, $(stackp-flag)),)
> @@ -1564,7 +1613,10 @@ clean: $(clean-dirs)
>                 -o -name modules.builtin -o -name '.tmp_*.o.*' \
>                 -o -name '*.c.[012]*.*' \
>                 -o -name '*.ll' \
> -               -o -name '*.gcno' \) -type f -print | xargs rm -f
> +               -o -name '*.gcno' \
> +               -o -name '*.[oa].objects' \
> +               -o -name '*.o.symversions' \
> +               -o -name '*.modversions' \) -type f -print | xargs rm -f
>
>  # Generate tags for editors
>  # ---------------------------------------------------------------------------
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 400b9e1b2f27..bb5296ecebdd 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -585,6 +585,7 @@ config CC_STACKPROTECTOR_STRONG
>  endchoice
>
>  config THIN_ARCHIVES
> +       depends on !LTO_CLANG
>         def_bool y
>         help
>           Select this if the architecture wants to use thin archives
> @@ -605,6 +606,37 @@ config LD_DEAD_CODE_DATA_ELIMINATION
>           sections (e.g., '.text.init'). Typically '.' in section names
>           is used to distinguish them from label names / C identifiers.
>
> +config LTO
> +       bool
> +
> +config ARCH_SUPPORTS_LTO_CLANG
> +       bool
> +       help
> +         An architecture should select this option it supports:
> +         - compiling with clang,
> +         - compiling inline assembly with clang's integrated assembler,
> +         - and linking with either lld or GNU gold w/ LLVMgold.
> +
> +config LTO_CLANG
> +       bool "Use clang Link Time Optimization (LTO)"
> +       depends on ARCH_SUPPORTS_LTO_CLANG
> +       depends on !FTRACE_MCOUNT_RECORD
> +       select LTO
> +       select LD_DEAD_CODE_DATA_ELIMINATION
> +       help
> +          This option enables clang's Link Time Optimization (LTO), which allows
> +          the compiler to optimize the kernel globally at link time. If you
> +          enable this option, the compiler generates LLVM IR instead of object
> +          files, and the actual compilation from IR occurs at the LTO link step,
> +          which may take several minutes.
> +
> +          If you select this option, you must compile the kernel with clang
> +          (make CC=clang) and have the LVMgold plug-in in LD_LIBRARY_PATH.
> +
> +          Using clang >= 5.0 and GNU gold from binutils >= 2.27 is recommended
> +          for this option.
> +
> +
>  config HAVE_ARCH_WITHIN_STACK_FRAMES
>         bool
>         help
> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> index e63af4e19382..e8bf5c440612 100644
> --- a/scripts/Makefile.build
> +++ b/scripts/Makefile.build
> @@ -210,6 +210,23 @@ else
>
>  cmd_cc_o_c = $(CC) $(c_flags) -c -o $(@D)/.tmp_$(@F) $<
>
> +ifdef CONFIG_LTO_CLANG
> +# Generate .o.symversions files for each .o with exported symbols, and link these
> +# to the kernel and/or modules at the end.
> +cmd_modversions_c =                                                            \
> +       if echo '$(c_flags)' | grep -q -- '$(DISABLE_LTO_CLANG)'; then          \
> +               if $(OBJDUMP) -h $(@D)/.tmp_$(@F) | grep -q __ksymtab; then     \
> +                       $(call cmd_gensymtypes_c,$(KBUILD_SYMTYPES),$(@:.o=.symtypes)) \
> +                           > $(@D)/$(@F).symversions;                          \
> +               fi;                                                             \
> +       else                                                                    \
> +               if $(LLVM_DIS) -o=- $(@D)/.tmp_$(@F) | grep -q __ksymtab; then  \
> +                       $(call cmd_gensymtypes_c,$(KBUILD_SYMTYPES),$(@:.o=.symtypes)) \
> +                           > $(@D)/$(@F).symversions;                          \
> +               fi;                                                             \
> +       fi;                                                                     \
> +       mv -f $(@D)/.tmp_$(@F) $@;
> +else
>  cmd_modversions_c =                                                            \
>         if $(OBJDUMP) -h $(@D)/.tmp_$(@F) | grep -q __ksymtab; then             \
>                 $(call cmd_gensymtypes_c,$(KBUILD_SYMTYPES),$(@:.o=.symtypes))  \
> @@ -222,6 +239,7 @@ cmd_modversions_c =                                                         \
>                 mv -f $(@D)/.tmp_$(@F) $@;                                      \
>         fi;
>  endif
> +endif
>
>  ifdef CONFIG_FTRACE_MCOUNT_RECORD
>  ifdef BUILD_C_RECORDMCOUNT
> @@ -443,6 +461,13 @@ $(obj)/%-asn1.c $(obj)/%-asn1.h: $(src)/%.asn1 $(objtree)/scripts/asn1_compiler
>  # To build objects in subdirs, we need to descend into the directories
>  $(sort $(subdir-obj-y)): $(subdir-ym) ;
>
> +ifdef CONFIG_LTO_CLANG
> +# If LTO is enabled, we remove all intermediate linking steps and instead
> +# collect a list of all objects to be linked at the end.
> +cc_lto_objects = $(foreach o,$(1),\
> +                       [ -f ${o}.objects ] && cat ${o}.objects || echo ${o};)
> +endif
> +
>  #
>  # Rule to compile a set of .o files into one .o file
>  #
> @@ -458,6 +483,16 @@ else
>    quiet_cmd_link_o_target = LD      $@
>  endif
>
> +ifdef CONFIG_LTO_CLANG
> +builtin-cmds = $(call cc_lto_objects,$(filter $(obj-y), $^))
> +
> +quiet_cmd_update_builtin = GEN     $@
> +cmd_update_builtin = (cat /dev/null; $(builtin-cmds)) > $@.objects && \
> +                       cat /dev/null > $@
> +
> +$(builtin-target): $(obj-y) FORCE
> +       $(call if_changed,update_builtin)
> +else
>  # If the list of objects to link is empty, just create an empty built-in.o
>  cmd_link_o_target = $(if $(strip $(obj-y)),\
>                       $(cmd_make_builtin) $@ $(filter $(obj-y), $^) \
> @@ -466,6 +501,7 @@ cmd_link_o_target = $(if $(strip $(obj-y)),\
>
>  $(builtin-target): $(obj-y) FORCE
>         $(call if_changed,link_o_target)
> +endif
>
>  targets += $(builtin-target)
>  endif # builtin-target
> @@ -487,6 +523,16 @@ $(modorder-target): $(subdir-ym) FORCE
>  # Rule to compile a set of .o files into one .a file
>  #
>  ifdef lib-target
> +ifdef CONFIG_LTO_CLANG
> +lib-target-cmds = $(call cc_lto_objects,$(lib-y))
> +
> +quiet_cmd_update_lib_target = GEN     $@
> +cmd_update_lib_target = (cat /dev/null; $(lib-target-cmds)) > $@.objects && \
> +                       cat /dev/null > $@
> +
> +$(lib-target): $(lib-y) FORCE
> +       $(call if_changed,update_lib_target)
> +else
>  quiet_cmd_link_l_target = AR      $@
>
>  ifdef CONFIG_THIN_ARCHIVES
> @@ -497,6 +543,7 @@ endif
>
>  $(lib-target): $(lib-y) FORCE
>         $(call if_changed,link_l_target)
> +endif
>
>  targets += $(lib-target)
>
> @@ -552,14 +599,31 @@ endif
>  quiet_cmd_link_multi-m = LD [M]  $@
>  cmd_link_multi-m = $(cmd_link_multi-link)
>
> +ifdef CONFIG_LTO_CLANG
> +multi-deps-cmds = $(call cc_lto_objects,$(link_multi_deps))
> +
> +quiet_cmd_update_multi_deps = GEN     $@
> +cmd_update_multi_deps = (cat /dev/null; $(multi-deps-cmds)) > $@.objects && \
> +                       cat /dev/null > $@
> +
> +$(multi-used-y): FORCE
> +       $(call if_changed,update_multi_deps)
> +
> +$(multi-used-m): FORCE
> +       $(call if_changed,update_multi_deps)
> +       @{ echo $(@:.o=.ko); echo $(link_multi_deps); \
> +          $(cmd_undef_syms); } > $(MODVERDIR)/$(@F:.o=.mod)
> +else
>  $(multi-used-y): FORCE
>         $(call if_changed,link_multi-y)
> -$(call multi_depend, $(multi-used-y), .o, -objs -y)
>
>  $(multi-used-m): FORCE
>         $(call if_changed,link_multi-m)
>         @{ echo $(@:.o=.ko); echo $(link_multi_deps); \
>            $(cmd_undef_syms); } > $(MODVERDIR)/$(@F:.o=.mod)
> +endif
> +
> +$(call multi_depend, $(multi-used-y), .o, -objs -y)
>  $(call multi_depend, $(multi-used-m), .o, -objs -y -m)
>
>  targets += $(multi-used-y) $(multi-used-m)
> diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
> index 991db7d6e4df..cb1c040a006c 100644
> --- a/scripts/Makefile.modpost
> +++ b/scripts/Makefile.modpost
> @@ -83,12 +83,46 @@ modpost = scripts/mod/modpost                    \
>
>  MODPOST_OPT=$(subst -i,-n,$(filter -i,$(MAKEFLAGS)))
>
> +# If CONFIG_LTO_CLANG is enabled, .o files are either LLVM IR, or empty, so we
> +# need to link them into actual objects before passing them to modpost
> +modpost-ext = $(if $(CONFIG_LTO_CLANG),.lto,)
> +
> +ifdef CONFIG_LTO_CLANG
> +quiet_cmd_cc_lto_modversions = GEN [M]  $@
> +cmd_cc_lto_modversions =                                               \
> +       rm -f $(@);                                                     \
> +       if [ -f $(@:.modversions=.o).objects ]; then                    \
> +               for i in `cat $(@:.modversions=.o).objects`; do         \
> +                       [ -s $$i.symversions ] &&                       \
> +                               cat $$i.symversions >> $(@);            \
> +               done;                                                   \
> +       else                                                            \
> +               [ -s $(@:.modversions=.o).symversions ] &&              \
> +                       cat $(@:.modversions=.o).symversions >> $(@);   \
> +       fi
> +
> +$(modules:.ko=.modversions): FORCE
> +       $(call if_changed,cc_lto_modversions)
> +
> +quiet_cmd_cc_lto_link_modules = LD [M]  $@
> +cmd_cc_lto_link_modules =                                              \
> +       $(LD) $(ld_flags) -r -o $(@)                                    \
> +               $(shell [ -s $(@:$(modpost-ext).o=.modversions) ] &&    \
> +                       echo -T $(@:$(modpost-ext).o=.modversions))     \
> +               $(shell [ -f $(@:$(modpost-ext).o=.o).objects ] &&      \
> +                        cat $(@:$(modpost-ext).o=.o).objects ||        \
> +                       echo $(@:$(modpost-ext).o=.o))
> +
> +$(modules:.ko=$(modpost-ext).o): %$(modpost-ext).o: %.o %.modversions FORCE
> +       $(call if_changed,cc_lto_link_modules)
> +endif
> +
>  # We can go over command line length here, so be careful.
>  quiet_cmd_modpost = MODPOST $(words $(filter-out vmlinux FORCE, $^)) modules
> -      cmd_modpost = $(MODLISTCMD) | sed 's/\.ko$$/.o/' | $(modpost) $(MODPOST_OPT) -s -T -
> +      cmd_modpost = $(MODLISTCMD) | sed 's/\.ko$$/$(modpost-ext)\.o/' | $(modpost) $(MODPOST_OPT) -s -T -
>
>  PHONY += __modpost
> -__modpost: $(modules:.ko=.o) FORCE
> +__modpost: $(modules:.ko=$(modpost-ext).o) FORCE
>         $(call cmd,modpost) $(wildcard vmlinux)
>
>  quiet_cmd_kernel-mod = MODPOST $@
> @@ -98,8 +132,7 @@ vmlinux.o: FORCE
>         $(call cmd,kernel-mod)
>
>  # Declare generated files as targets for modpost
> -$(modules:.ko=.mod.c): __modpost ;
> -
> +$(modules:.ko=$(modpost-ext).mod.c): __modpost ;
>
>  # Step 5), compile all *.mod.c files
>
> @@ -110,23 +143,37 @@ quiet_cmd_cc_o_c = CC      $@
>        cmd_cc_o_c = $(CC) $(c_flags) $(KBUILD_CFLAGS_MODULE) $(CFLAGS_MODULE) \
>                    -c -o $@ $<
>
> -$(modules:.ko=.mod.o): %.mod.o: %.mod.c FORCE
> +$(modules:.ko=.mod.o): %.mod.o: %$(modpost-ext).mod.c FORCE
>         $(call if_changed_dep,cc_o_c)
>
> -targets += $(modules:.ko=.mod.o)
> +targets += $(modules:.ko=$(modpost-ext).mod.o)
>
>  ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
>
>  # Step 6), final link of the modules with optional arch pass after final link
>  quiet_cmd_ld_ko_o = LD [M]  $@
> +
> +ifdef CONFIG_LTO_CLANG
> +lto_ko_objects = $(foreach o,$(1:$(modpost-ext).o=.o),                 \
> +                       $(shell [ -f $(o).objects ] &&                  \
> +                                cat $(o).objects || echo $(o)))
> +
> +      cmd_ld_ko_o =                                                    \
> +       $(LD) -r $(LDFLAGS)                                             \
> +                $(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE)             \
> +                $(shell [ -s $(@:.ko=.modversions) ] &&                \
> +                       echo -T $(@:.ko=.modversions))                  \
> +                -o $@ $(call lto_ko_objects, $(filter-out FORCE,$^))
> +else
>        cmd_ld_ko_o =                                                     \
>         $(LD) -r $(LDFLAGS)                                             \
>                   $(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE)             \
>                   -o $@ $(filter-out FORCE,$^) ;                         \
>         $(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
> +endif
>
> -$(modules): %.ko :%.o %.mod.o FORCE
> -       +$(call if_changed,ld_ko_o)
> +$(modules): %.ko: %$(modpost-ext).o %.mod.o FORCE
> +       $(call if_changed,ld_ko_o)
>
>  targets += $(modules)
>
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index e6818b8e7141..fccc123b691f 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -64,6 +64,53 @@ archive_builtin()
>         fi
>  }
>
> +# If CONFIG_LTO_CLANG is selected, the compiler produces LLVM IR files instead
> +# of ELF object files. This function expands individual IR files from a list of
> +# objects that would have otherwise been linked already.
> +expand()
> +{
> +       if [ -z "${CONFIG_LTO_CLANG}" ]; then
> +               echo $*
> +       fi
> +
> +       local objs
> +
> +       for o in $*; do
> +               if [ -f ${o}.objects ]; then
> +                       objs="${objs} $(xargs < ${o}.objects)"
> +               else
> +                       objs="${objs} ${o}"
> +               fi
> +       done
> +
> +       echo "${objs}"
> +}
> +
> +# If CONFIG_LTO_CLANG is selected, collect generated symbol versions into
> +# .tmp_symversions
> +modversions()
> +{
> +       if [ -z "${CONFIG_LTO_CLANG}" ]; then
> +               return
> +       fi
> +
> +       if [ -z "${CONFIG_MODVERSIONS}" ]; then
> +               return
> +       fi
> +
> +       rm -f .tmp_symversions
> +
> +       for o in $(expand ${KBUILD_VMLINUX_INIT}) \
> +                $(expand ${KBUILD_VMLINUX_MAIN}) \
> +                $(expand ${KBUILD_VMLINUX_LIBS}); do
> +               if [ -f ${o}.symversions ]; then
> +                       cat ${o}.symversions >> .tmp_symversions
> +               fi
> +       done
> +
> +       echo "-T .tmp_symversions"
> +}
> +
>  # Link of vmlinux.o used for section mismatch analysis
>  # ${1} output file
>  modpost_link()
> @@ -78,13 +125,22 @@ modpost_link()
>                         ${KBUILD_VMLINUX_LIBS}                          \
>                         --end-group"
>         else
> -               objects="${KBUILD_VMLINUX_INIT}                         \
> +               objects="$(expand ${KBUILD_VMLINUX_INIT})               \
>                         --start-group                                   \
> -                       ${KBUILD_VMLINUX_MAIN}                          \
> -                       ${KBUILD_VMLINUX_LIBS}                          \
> +                       $(expand ${KBUILD_VMLINUX_MAIN})                \
> +                       $(expand ${KBUILD_VMLINUX_LIBS})                \
>                         --end-group"
>         fi
> -       ${LD} ${LDFLAGS} -r -o ${1} ${objects}
> +
> +       if [ -n "${CONFIG_LTO_CLANG}" ]; then
> +               # This might take a while, so indicate that we're doing
> +               # an LTO link
> +               info LTO vmlinux.o
> +       else
> +               info LD vmlinux.o
> +       fi
> +
> +       ${LD} ${LDFLAGS} -r -o ${1} $(modversions) ${objects}
>  }
>
>  # Link of vmlinux
> @@ -96,6 +152,14 @@ vmlinux_link()
>         local objects
>
>         if [ "${SRCARCH}" != "um" ]; then
> +               local ld=${LD}
> +               local ldflags="${LDFLAGS} ${LDFLAGS_vmlinux}"
> +
> +               if [ -n "${LDFINAL_vmlinux}" ]; then
> +                       ld=${LDFINAL_vmlinux}
> +                       ldflags="${LDFLAGS_FINAL_vmlinux} ${LDFLAGS_vmlinux}"
> +               fi
> +
>                 if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
>                         objects="--whole-archive                        \
>                                 built-in.o                              \
> @@ -105,16 +169,15 @@ vmlinux_link()
>                                 --end-group                             \
>                                 ${1}"
>                 else
> -                       objects="${KBUILD_VMLINUX_INIT}                 \
> +                       objects="$(expand ${KBUILD_VMLINUX_INIT})       \
>                                 --start-group                           \
> -                               ${KBUILD_VMLINUX_MAIN}                  \
> -                               ${KBUILD_VMLINUX_LIBS}                  \
> +                               $(expand ${KBUILD_VMLINUX_MAIN})        \
> +                               $(expand ${KBUILD_VMLINUX_LIBS})        \
>                                 --end-group                             \
>                                 ${1}"
>                 fi
>
> -               ${LD} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2}             \
> -                       -T ${lds} ${objects}
> +               ${ld} ${ldflags} -o ${2} -T ${lds} ${objects}
>         else
>                 if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
>                         objects="-Wl,--whole-archive                    \
> @@ -141,7 +204,6 @@ vmlinux_link()
>         fi
>  }
>
> -
>  # Create ${2} .o file with all symbols from the ${1} object file
>  kallsyms()
>  {
> @@ -192,6 +254,7 @@ cleanup()
>         rm -f .tmp_System.map
>         rm -f .tmp_kallsyms*
>         rm -f .tmp_version
> +       rm -f .tmp_symversions
>         rm -f .tmp_vmlinux*
>         rm -f built-in.o
>         rm -f System.map
> @@ -253,12 +316,19 @@ ${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init GCC_PLUGINS_CFLAGS="${GC
>  archive_builtin
>
>  #link vmlinux.o
> -info LD vmlinux.o
>  modpost_link vmlinux.o
>
>  # modpost vmlinux.o to check for section mismatches
>  ${MAKE} -f "${srctree}/scripts/Makefile.modpost" vmlinux.o
>
> +if [ -n "${CONFIG_LTO_CLANG}" ]; then
> +       # Re-use vmlinux.o, so we can avoid the slow LTO link step in
> +       # vmlinux_link
> +       KBUILD_VMLINUX_INIT=
> +       KBUILD_VMLINUX_MAIN=vmlinux.o
> +       KBUILD_VMLINUX_LIBS=
> +fi
> +
>  kallsymso=""
>  kallsyms_vmlinux=""
>  if [ -n "${CONFIG_KALLSYMS}" ]; then
> --
> 2.15.0.448.gf294e3d99a-goog
>



-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 03/18] kbuild: move gcc-version.sh to cc-version.sh and add clang support
  2017-11-15 21:48   ` Kees Cook
@ 2017-11-15 22:06     ` Sami Tolvanen
  0 siblings, 0 replies; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-15 22:06 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	linux-arm-kernel, linux-kbuild, LKML, Mark Rutland,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Yury Norov, Matthias Kaehlcke

On Wed, Nov 15, 2017 at 01:48:52PM -0800, Kees Cook wrote:
> It might make sense to split this patch: do the move and refactoring,
> then add clang support.

Sure.

> Though, won't this confuse some tests? A lot of cc-version tests are
> expecting only gcc, yes?

There's already a chance of this happening with cc-version. Currently,
gcc-version.sh returns 0402 for clang 5.0, which probably doesn't have
the same issues as gcc 4.2 did.

While I didn't see anything new that would break on platforms that
clang can currently compile, you're correct, we should probably have a
macro that also checks for the compiler, or have separate macros for
different compilers.

I'll address these in v3.

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 08/18] arm64: don't disable ADR_PREL_PG_HI21* with ARM64_ERRATUM_843419
  2017-11-15 21:34 ` [PATCH v2 08/18] arm64: don't disable ADR_PREL_PG_HI21* with ARM64_ERRATUM_843419 Sami Tolvanen
@ 2017-11-15 22:29   ` Ard Biesheuvel
  2017-11-16 11:44     ` Will Deacon
  0 siblings, 1 reply; 103+ messages in thread
From: Ard Biesheuvel @ 2017-11-15 22:29 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alex Matveev, Andi Kleen, Greg Hackmann, Kees Cook,
	linux-arm-kernel, linux-kbuild, linux-kernel, Mark Rutland,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Yury Norov, Matthias Kaehlcke

On 15 November 2017 at 21:34, Sami Tolvanen <samitolvanen@google.com> wrote:
> CONFIG_LTO_CLANG depends on GNU gold, which can generate ADR_PREL_PG_HI21*
> relocations even with --fix-cortex-a53-843419.
>
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> ---
>  arch/arm64/kernel/module.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
> index f469e0435903..fec9a578f122 100644
> --- a/arch/arm64/kernel/module.c
> +++ b/arch/arm64/kernel/module.c
> @@ -336,14 +336,12 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
>                         ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 0, 21,
>                                              AARCH64_INSN_IMM_ADR);
>                         break;
> -#ifndef CONFIG_ARM64_ERRATUM_843419
>                 case R_AARCH64_ADR_PREL_PG_HI21_NC:
>                         overflow_check = false;
>                 case R_AARCH64_ADR_PREL_PG_HI21:
>                         ovf = reloc_insn_imm(RELOC_OP_PAGE, loc, val, 12, 21,
>                                              AARCH64_INSN_IMM_ADR);
>                         break;
> -#endif
>                 case R_AARCH64_ADD_ABS_LO12_NC:
>                 case R_AARCH64_LDST8_ABS_LO12_NC:
>                         overflow_check = false;
> --
> 2.15.0.448.gf294e3d99a-goog
>

I think this change is reasonable in itself, but i do wonder how we
can ensure that the adrp instructions that GNU gold does generate are
not affected by the erratum, given that modules are partially linked
object files, not shared libraries.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 05/18] arm64: fix -m for GNU gold
  2017-11-15 21:34 ` [PATCH v2 05/18] arm64: fix -m for GNU gold Sami Tolvanen
@ 2017-11-16 10:55   ` Yury Norov
  2017-11-16 16:55     ` Nick Desaulniers
  0 siblings, 1 reply; 103+ messages in thread
From: Yury Norov @ 2017-11-16 10:55 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Matthias Kaehlcke

On Wed, Nov 15, 2017 at 01:34:15PM -0800, Sami Tolvanen wrote:
> GNU gold supports different emulations than bfd. Use aarch64_elf64_*_vec
> instead of aarch64linux.
> 
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>

Acked-by: Yury Norov <ynorov@caviumnetworks.com>

> ---
>  arch/arm64/Makefile | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> index ecd5ed11c764..6059c8169513 100644
> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -70,14 +70,22 @@ KBUILD_CPPFLAGS	+= -mbig-endian
>  CHECKFLAGS	+= -D__AARCH64EB__
>  AS		+= -EB
>  LD		+= -EB
> +ifeq ($(ld-name),gold)
> +LDFLAGS		+= -maarch64_elf64_be_vec
> +else
>  LDFLAGS		+= -maarch64linuxb
> +endif
>  UTS_MACHINE	:= aarch64_be
>  else
>  KBUILD_CPPFLAGS	+= -mlittle-endian
>  CHECKFLAGS	+= -D__AARCH64EL__
>  AS		+= -EL
>  LD		+= -EL
> +ifeq ($(ld-name),gold)
> +LDFLAGS		+= -maarch64_elf64_le_vec
> +else
>  LDFLAGS		+= -maarch64linux
> +endif
>  UTS_MACHINE	:= aarch64
>  endif
>  
> -- 
> 2.15.0.448.gf294e3d99a-goog

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 08/18] arm64: don't disable ADR_PREL_PG_HI21* with ARM64_ERRATUM_843419
  2017-11-15 22:29   ` Ard Biesheuvel
@ 2017-11-16 11:44     ` Will Deacon
  2017-11-16 16:32       ` Sami Tolvanen
  0 siblings, 1 reply; 103+ messages in thread
From: Will Deacon @ 2017-11-16 11:44 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Sami Tolvanen, Mark Rutland, Andi Kleen, Kees Cook, linux-kbuild,
	Nick Desaulniers, linux-kernel, Greg Hackmann, Masahiro Yamada,
	Michal Marek, Yury Norov, Alex Matveev, Matthias Kaehlcke,
	linux-arm-kernel, Maxim Kuvyrkov

On Wed, Nov 15, 2017 at 10:29:12PM +0000, Ard Biesheuvel wrote:
> On 15 November 2017 at 21:34, Sami Tolvanen <samitolvanen@google.com> wrote:
> > CONFIG_LTO_CLANG depends on GNU gold, which can generate ADR_PREL_PG_HI21*
> > relocations even with --fix-cortex-a53-843419.
> >
> > Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> > ---
> >  arch/arm64/kernel/module.c | 2 --
> >  1 file changed, 2 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
> > index f469e0435903..fec9a578f122 100644
> > --- a/arch/arm64/kernel/module.c
> > +++ b/arch/arm64/kernel/module.c
> > @@ -336,14 +336,12 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
> >                         ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 0, 21,
> >                                              AARCH64_INSN_IMM_ADR);
> >                         break;
> > -#ifndef CONFIG_ARM64_ERRATUM_843419
> >                 case R_AARCH64_ADR_PREL_PG_HI21_NC:
> >                         overflow_check = false;
> >                 case R_AARCH64_ADR_PREL_PG_HI21:
> >                         ovf = reloc_insn_imm(RELOC_OP_PAGE, loc, val, 12, 21,
> >                                              AARCH64_INSN_IMM_ADR);
> >                         break;
> > -#endif
> >                 case R_AARCH64_ADD_ABS_LO12_NC:
> >                 case R_AARCH64_LDST8_ABS_LO12_NC:
> >                         overflow_check = false;
> > --
> > 2.15.0.448.gf294e3d99a-goog
> >
> 
> I think this change is reasonable in itself, but i do wonder how we
> can ensure that the adrp instructions that GNU gold does generate are
> not affected by the erratum, given that modules are partially linked
> object files, not shared libraries.

Right, and this would also mean that we silently load vulnerable modules
that are linked with either LD that doesn't support --fix-cortex-a53-843419
or simply wasn't passed.

Will

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 06/18] arm64: kvm: use -fno-jump-tables with clang
  2017-11-15 21:34 ` [PATCH v2 06/18] arm64: kvm: use -fno-jump-tables with clang Sami Tolvanen
@ 2017-11-16 11:46   ` Will Deacon
  2017-11-16 16:25     ` Sami Tolvanen
  2017-11-20 14:41   ` Mark Rutland
  1 sibling, 1 reply; 103+ messages in thread
From: Will Deacon @ 2017-11-16 11:46 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke

On Wed, Nov 15, 2017 at 01:34:16PM -0800, Sami Tolvanen wrote:
> Use -fno-jump-tables to make sure clang doesn't generate branches
> to EL1 virtual addresses.

Can you elaborate a bit more on exactly what you saw failing here, please?
Whilst it's obviously broken to jump to EL1 from EL2 w/o VHE, the commit
message is a bit cryptic.

Will

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 09/18] arm64: explicitly pass --no-fix-cortex-a53-843419 to GNU gold
  2017-11-15 21:34 ` [PATCH v2 09/18] arm64: explicitly pass --no-fix-cortex-a53-843419 to GNU gold Sami Tolvanen
@ 2017-11-16 11:47   ` Will Deacon
  2017-11-16 16:35     ` Sami Tolvanen
  0 siblings, 1 reply; 103+ messages in thread
From: Will Deacon @ 2017-11-16 11:47 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke

On Wed, Nov 15, 2017 at 01:34:19PM -0800, Sami Tolvanen wrote:
> Some versions of GNU gold are known to produce broken code with
> --fix-cortex-a53-843419 as explained in this bug:
> 
>   https://sourceware.org/bugzilla/show_bug.cgi?id=21491
> 
> If ARM64_ERRATUM_843419 is disabled and we're using GNU gold, pass
> --no-fix-cortex-a53-843419 to the linker to ensure the erratum fix is not
> used even if the linker is configured to enable it by default.

But if ARM64_ERRATUM_843419 is enabled, we'll go ahead and generate broken
code?

Will

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 10/18] arm64: add a workaround for GNU gold with ARM64_MODULE_PLTS
  2017-11-15 21:34 ` [PATCH v2 10/18] arm64: add a workaround for GNU gold with ARM64_MODULE_PLTS Sami Tolvanen
@ 2017-11-16 11:50   ` Will Deacon
  2017-11-16 16:41     ` Sami Tolvanen
  0 siblings, 1 reply; 103+ messages in thread
From: Will Deacon @ 2017-11-16 11:50 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke

On Wed, Nov 15, 2017 at 01:34:20PM -0800, Sami Tolvanen wrote:
> CONFIG_CLANG_LTO depends on GNU gold and due to a known bug, the
> linker crashes when ARM64_MODULE_PLTS is enabled:
> 
>   https://sourceware.org/bugzilla/show_bug.cgi?id=14592
> 
> To work around the problem, this change removes NOLOAD from .plt
> and .init.plt, which allows us to link modules with ld.gold.

Why don't we just not do LTO if the toolchain is busted? This feels like
it will end up being a game of whack-a-mole as code could be introduced
that tickles known bugs on older toolchains.

Will

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 11/18] arm64: make mrs_s and msr_s macros work with LTO
  2017-11-15 21:34 ` [PATCH v2 11/18] arm64: make mrs_s and msr_s macros work with LTO Sami Tolvanen
@ 2017-11-16 11:54   ` Will Deacon
  2017-11-16 13:07     ` Yury Norov
                       ` (2 more replies)
  0 siblings, 3 replies; 103+ messages in thread
From: Will Deacon @ 2017-11-16 11:54 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke

On Wed, Nov 15, 2017 at 01:34:21PM -0800, Sami Tolvanen wrote:
> From: Alex Matveev <alxmtvv@gmail.com>
> 
> Use UNDEFINE_MRS_S and UNDEFINE_MSR_S to define corresponding macros
> in-place and workaround gcc and clang limitations on redefining macros
> across different assembler blocks.

What limitations? Can you elaborate please? Is this a fix?

Will

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-15 21:34 ` [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG Sami Tolvanen
@ 2017-11-16 11:58   ` Will Deacon
  2017-11-16 16:17     ` Sami Tolvanen
  0 siblings, 1 reply; 103+ messages in thread
From: Will Deacon @ 2017-11-16 11:58 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke, peterz, paulmck

Hi Sami,

On Wed, Nov 15, 2017 at 01:34:28PM -0800, Sami Tolvanen wrote:
> Allow CONFIG_LTO_CLANG to be enabled for the architecture.
> 
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> ---
>  arch/arm64/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 3a70f763e18a..58504327b9f6 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -40,6 +40,7 @@ config ARM64
>  	select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPT
>  	select ARCH_USE_CMPXCHG_LOCKREF
>  	select ARCH_USE_QUEUED_RWLOCKS
> +	select ARCH_SUPPORTS_LTO_CLANG

I'll be honest with you: I'm absolutely terrified about enabling this.
How much testing has this seen?

The main thing that worries me is that this gives the toolchain a lot
more freedom to break dependency ordering with RCU, leading to subtle
concurrency issues that would actually break on arm64. Specifically,
I'm worried about value analysis that could potentially convert an
address dependency into a control dependency. Right now, the C standard
isn't on our side here and we're relying on the compiler not doing this
kind of thing. Can we continue to rely on that in the face of LTO?

Will

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 11/18] arm64: make mrs_s and msr_s macros work with LTO
  2017-11-16 11:54   ` Will Deacon
@ 2017-11-16 13:07     ` Yury Norov
  2017-11-16 13:55       ` Robin Murphy
  2017-11-16 13:56       ` Segher Boessenkool
  2017-11-16 16:43     ` Sami Tolvanen
  2017-11-16 18:14     ` Alex Matveev
  2 siblings, 2 replies; 103+ messages in thread
From: Yury Norov @ 2017-11-16 13:07 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sami Tolvanen, Alex Matveev, Andi Kleen, Ard Biesheuvel,
	Greg Hackmann, Kees Cook, linux-arm-kernel, linux-kbuild,
	linux-kernel, Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov,
	Michal Marek, Nick Desaulniers, Matthias Kaehlcke

On Thu, Nov 16, 2017 at 11:54:33AM +0000, Will Deacon wrote:
> On Wed, Nov 15, 2017 at 01:34:21PM -0800, Sami Tolvanen wrote:
> > From: Alex Matveev <alxmtvv@gmail.com>
> > 
> > Use UNDEFINE_MRS_S and UNDEFINE_MSR_S to define corresponding macros
> > in-place and workaround gcc and clang limitations on redefining macros
> > across different assembler blocks.
> 
> What limitations? Can you elaborate please? Is this a fix?

Hi Will,

Regarding GCC.

When it joins preprocessed source files into single asm file,
mrs_s/msr_s becomes either not declared or declared multiple times.

./ccuFb68h.s:33120: Error: Macro `mrs_s' was already defined
./ccuFb68h.s:33124: Error: Macro `msr_s' was already defined

I'm not sure that GCC works correctly in this case, and I sent the
email to Linaro toolchain group to clarify it. See below.

Yury

[...]

Links:
My unfinished branch:
https://github.com/norov/linux/tree/lto
Andi Kleen tree:
https://github.com/andikleen/linux-misc/tree/lto-411-1
Sami Tolvanen's recent work for clang:
https://lkml.org/lkml/2017/11/3/606

Question we have for now:
There's mrs_s/msr_s macro that doesn't work with LTO - linker
complains very loudly that macro is either not declared, or declared
multiple times. (To reproduce - try to build my kernel branch w/o last
patch).

The same (?) problem is observed with clang, and people there
considered it as feature, not a bug.

https://bugs.llvm.org/show_bug.cgi?id=19749

We have the fix for both clang and gcc, but it looks hacky. Maybe it
worth to fix mrs/msr issue on toolchain side?

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 11/18] arm64: make mrs_s and msr_s macros work with LTO
  2017-11-16 13:07     ` Yury Norov
@ 2017-11-16 13:55       ` Robin Murphy
  2017-11-16 21:29         ` Yury Norov
  2017-11-16 13:56       ` Segher Boessenkool
  1 sibling, 1 reply; 103+ messages in thread
From: Robin Murphy @ 2017-11-16 13:55 UTC (permalink / raw)
  To: Yury Norov, Will Deacon
  Cc: Mark Rutland, Andi Kleen, Kees Cook, Ard Biesheuvel,
	linux-kbuild, Nick Desaulniers, linux-kernel, Greg Hackmann,
	Masahiro Yamada, Michal Marek, Sami Tolvanen, Alex Matveev,
	Matthias Kaehlcke, linux-arm-kernel, Maxim Kuvyrkov

On 16/11/17 13:07, Yury Norov wrote:
> On Thu, Nov 16, 2017 at 11:54:33AM +0000, Will Deacon wrote:
>> On Wed, Nov 15, 2017 at 01:34:21PM -0800, Sami Tolvanen wrote:
>>> From: Alex Matveev <alxmtvv@gmail.com>
>>>
>>> Use UNDEFINE_MRS_S and UNDEFINE_MSR_S to define corresponding macros
>>> in-place and workaround gcc and clang limitations on redefining macros
>>> across different assembler blocks.
>>
>> What limitations? Can you elaborate please? Is this a fix?
> 
> Hi Will,
> 
> Regarding GCC.
> 
> When it joins preprocessed source files into single asm file,
> mrs_s/msr_s becomes either not declared or declared multiple times.
> 
> ./ccuFb68h.s:33120: Error: Macro `mrs_s' was already defined
> ./ccuFb68h.s:33124: Error: Macro `msr_s' was already defined
> 
> I'm not sure that GCC works correctly in this case, and I sent the
> email to Linaro toolchain group to clarify it. See below.
> 
> Yury
> 
> [...]
> 
> Links:
> My unfinished branch:
> https://github.com/norov/linux/tree/lto
> Andi Kleen tree:
> https://github.com/andikleen/linux-misc/tree/lto-411-1
> Sami Tolvanen's recent work for clang:
> https://lkml.org/lkml/2017/11/3/606
> 
> Question we have for now:
> There's mrs_s/msr_s macro that doesn't work with LTO - linker
> complains very loudly that macro is either not declared, or declared
> multiple times. (To reproduce - try to build my kernel branch w/o last
> patch).
> 
> The same (?) problem is observed with clang, and people there
> considered it as feature, not a bug.
> 
> https://bugs.llvm.org/show_bug.cgi?id=19749
> 
> We have the fix for both clang and gcc, but it looks hacky. Maybe it
> worth to fix mrs/msr issue on toolchain side?

Given that this whole mrs_s infrastructure is a workaround for older 
assemblers which don't support the "S<op0>_<op1>_<Cn>_<Cm>_<op2>" syntax 
for arbitrary unnamed system registers (which IIRC was a fairly late 
addition to the architecture), the only way it could be "fixed" on the 
toolchain side is by removing all those older toolchains from existence. 
Good luck with that ;)

In *theory*, it might be possible to do something similar to what we do 
with CONFIG_BROKEN_GAS_INST to detect offending assemblers and only 
define and use these macros when necessary (hopefully Clang and other 
LTO-capable toolchains do accept the proper syntax), but I've no idea 
how invasive or difficult that might turn out to be.

Robin.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 11/18] arm64: make mrs_s and msr_s macros work with LTO
  2017-11-16 13:07     ` Yury Norov
  2017-11-16 13:55       ` Robin Murphy
@ 2017-11-16 13:56       ` Segher Boessenkool
  2017-11-16 16:46         ` Sami Tolvanen
  1 sibling, 1 reply; 103+ messages in thread
From: Segher Boessenkool @ 2017-11-16 13:56 UTC (permalink / raw)
  To: Yury Norov
  Cc: Will Deacon, Sami Tolvanen, Alex Matveev, Andi Kleen,
	Ard Biesheuvel, Greg Hackmann, Kees Cook, linux-arm-kernel,
	linux-kbuild, linux-kernel, Mark Rutland, Masahiro Yamada,
	Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Matthias Kaehlcke

On Thu, Nov 16, 2017 at 04:07:42PM +0300, Yury Norov wrote:
> > > Use UNDEFINE_MRS_S and UNDEFINE_MSR_S to define corresponding macros
> > > in-place and workaround gcc and clang limitations on redefining macros
> > > across different assembler blocks.
> > 
> > What limitations? Can you elaborate please? Is this a fix?

> Regarding GCC.
> 
> When it joins preprocessed source files into single asm file,
> mrs_s/msr_s becomes either not declared or declared multiple times.
> 
> ./ccuFb68h.s:33120: Error: Macro `mrs_s' was already defined
> ./ccuFb68h.s:33124: Error: Macro `msr_s' was already defined
> 
> I'm not sure that GCC works correctly in this case, and I sent the
> email to Linaro toolchain group to clarify it. See below.

The compiler is free to inline code, reorder code, duplicate code, etc.
Inline assembler is not exempt from this.

The compiler is fine, the assembler is fine (and the linker has nothing
to do with it).  Your code is not fine.


Segher

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-16 11:58   ` Will Deacon
@ 2017-11-16 16:17     ` Sami Tolvanen
  2017-11-16 16:30       ` Peter Zijlstra
  0 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-16 16:17 UTC (permalink / raw)
  To: Will Deacon
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke, peterz, paulmck

On Thu, Nov 16, 2017 at 11:58:11AM +0000, Will Deacon wrote:
> I'll be honest with you: I'm absolutely terrified about enabling this.

That's understandable, I wouldn't want to enable this by default
quite yet either. This patch doesn't enable LTO for arm64, just makes
it possible to enable the feature. I'm perfectly fine with marking
CONFIG_LTO_CLANG experimental if it makes people more comfortable.

> How much testing has this seen?

I've been running clang LTO kernels for a few months on a Pixel 2 device
without any issues. This is on a 4.4 kernel though.

> Right now, the C standard isn't on our side here and we're relying on
> the compiler not doing this kind of thing. Can we continue to rely on
> that in the face of LTO?

I'll have to check with our LLVM experts, but I have not run into these
issues with current compiler versions. Looking at Andi's old patches,
looks like gcc might be more aggressive in reordering things with LTO
than clang.

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 06/18] arm64: kvm: use -fno-jump-tables with clang
  2017-11-16 11:46   ` Will Deacon
@ 2017-11-16 16:25     ` Sami Tolvanen
  0 siblings, 0 replies; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-16 16:25 UTC (permalink / raw)
  To: Will Deacon
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke

On Thu, Nov 16, 2017 at 11:46:17AM +0000, Will Deacon wrote:
> Can you elaborate a bit more on exactly what you saw failing here,
> please?

Mark noticed that clang built kernels fail to boot when the kernel
starts at EL2:

  http://lkml.iu.edu/hypermail/linux/kernel/1711.0/02817.html

Turns out starting with LLVM r308050, the compiler decides to use
a jump table in __init_stage2_translation, and generates code that
branches into an EL1 virtual address.

> Whilst it's obviously broken to jump to EL1 from EL2 w/o VHE, the
> commit message is a bit cryptic.

I agree, I'll add a more descriptive commit message for v3.

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-16 16:17     ` Sami Tolvanen
@ 2017-11-16 16:30       ` Peter Zijlstra
  2017-11-16 16:50         ` Nick Desaulniers
  0 siblings, 1 reply; 103+ messages in thread
From: Peter Zijlstra @ 2017-11-16 16:30 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Will Deacon, Alex Matveev, Andi Kleen, Ard Biesheuvel,
	Greg Hackmann, Kees Cook, linux-arm-kernel, linux-kbuild,
	linux-kernel, Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov,
	Michal Marek, Nick Desaulniers, Yury Norov, Matthias Kaehlcke,
	paulmck

On Thu, Nov 16, 2017 at 08:17:31AM -0800, Sami Tolvanen wrote:
> On Thu, Nov 16, 2017 at 11:58:11AM +0000, Will Deacon wrote:
> > I'll be honest with you: I'm absolutely terrified about enabling this.
> 
> That's understandable, I wouldn't want to enable this by default
> quite yet either. This patch doesn't enable LTO for arm64, just makes
> it possible to enable the feature. I'm perfectly fine with marking
> CONFIG_LTO_CLANG experimental if it makes people more comfortable.
> 
> > How much testing has this seen?
> 
> I've been running clang LTO kernels for a few months on a Pixel 2 device
> without any issues. This is on a 4.4 kernel though.
> 
> > Right now, the C standard isn't on our side here and we're relying on
> > the compiler not doing this kind of thing. Can we continue to rely on
> > that in the face of LTO?
> 
> I'll have to check with our LLVM experts, but I have not run into these
> issues with current compiler versions. Looking at Andi's old patches,
> looks like gcc might be more aggressive in reordering things with LTO
> than clang.

Ideally we'd get the toolchain people to commit to supporting the kernel
memory model along side the C11 one. That would help a ton.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 08/18] arm64: don't disable ADR_PREL_PG_HI21* with ARM64_ERRATUM_843419
  2017-11-16 11:44     ` Will Deacon
@ 2017-11-16 16:32       ` Sami Tolvanen
  2017-11-16 16:34         ` Ard Biesheuvel
  0 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-16 16:32 UTC (permalink / raw)
  To: Will Deacon
  Cc: Ard Biesheuvel, Mark Rutland, Andi Kleen, Kees Cook,
	linux-kbuild, Nick Desaulniers, linux-kernel, Greg Hackmann,
	Masahiro Yamada, Michal Marek, Yury Norov, Alex Matveev,
	Matthias Kaehlcke, linux-arm-kernel, Maxim Kuvyrkov

On Thu, Nov 16, 2017 at 11:44:06AM +0000, Will Deacon wrote:
> Right, and this would also mean that we silently load vulnerable
> modules that are linked with either LD that doesn't support
> --fix-cortex-a53-843419 or simply wasn't passed.

You'll see a warning at least if the linker doesn't support the flag,
but yes, you're correct. In v1 of this patch set, LTO depended on this
erratum not being selected, but I changed it in v2 based on Ard's
suggestion.

I'm fine with not being able to use LTO on devices that are affected
by this erratum, so either option works for me. I can even change
this so the user must explicitly disable the erratum in order to use
LTO. Thoughts?

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 08/18] arm64: don't disable ADR_PREL_PG_HI21* with ARM64_ERRATUM_843419
  2017-11-16 16:32       ` Sami Tolvanen
@ 2017-11-16 16:34         ` Ard Biesheuvel
  2017-11-16 21:37           ` Sami Tolvanen
  0 siblings, 1 reply; 103+ messages in thread
From: Ard Biesheuvel @ 2017-11-16 16:34 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Will Deacon, Mark Rutland, Andi Kleen, Kees Cook, linux-kbuild,
	Nick Desaulniers, linux-kernel, Greg Hackmann, Masahiro Yamada,
	Michal Marek, Yury Norov, Alex Matveev, Matthias Kaehlcke,
	linux-arm-kernel, Maxim Kuvyrkov

On 16 November 2017 at 16:32, Sami Tolvanen <samitolvanen@google.com> wrote:
> On Thu, Nov 16, 2017 at 11:44:06AM +0000, Will Deacon wrote:
>> Right, and this would also mean that we silently load vulnerable
>> modules that are linked with either LD that doesn't support
>> --fix-cortex-a53-843419 or simply wasn't passed.
>
> You'll see a warning at least if the linker doesn't support the flag,
> but yes, you're correct. In v1 of this patch set, LTO depended on this
> erratum not being selected, but I changed it in v2 based on Ard's
> suggestion.
>
> I'm fine with not being able to use LTO on devices that are affected
> by this erratum, so either option works for me. I can even change
> this so the user must explicitly disable the erratum in order to use
> LTO. Thoughts?
>

You still have not explained to us how GOLD avoids the erratum.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 09/18] arm64: explicitly pass --no-fix-cortex-a53-843419 to GNU gold
  2017-11-16 11:47   ` Will Deacon
@ 2017-11-16 16:35     ` Sami Tolvanen
  2017-11-20 14:47       ` Mark Rutland
  0 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-16 16:35 UTC (permalink / raw)
  To: Will Deacon
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke

On Thu, Nov 16, 2017 at 11:47:23AM +0000, Will Deacon wrote:
> But if ARM64_ERRATUM_843419 is enabled, we'll go ahead and generate
> broken code?

This bug has been fixed in all current versions of GNU gold, but of
course, if someone is using a broken linker, it will generate broken
code. We can't do anything about that in the kernel.

The reason for this patch is that the linker can be configured to
enable the erratum fix by default, and we must explicitly disable it
if we don't want the fix, whether it's because of a broken linker, or
in our case, because of issues with kernel modules.

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 10/18] arm64: add a workaround for GNU gold with ARM64_MODULE_PLTS
  2017-11-16 11:50   ` Will Deacon
@ 2017-11-16 16:41     ` Sami Tolvanen
  2017-11-16 18:47       ` Will Deacon
  0 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-16 16:41 UTC (permalink / raw)
  To: Will Deacon
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke

On Thu, Nov 16, 2017 at 11:50:12AM +0000, Will Deacon wrote:
> Why don't we just not do LTO if the toolchain is busted?

Because LTO can not only potentially improve performance, especially
when combined with PGO (Profile Guided Optimization), but it also
makes it possible to enable features like Control Flow Integrity that
can make kernel vulnerabilities more difficult to exploit:

  https://clang.llvm.org/docs/ControlFlowIntegrity.html

> This feels like it will end up being a game of whack-a-mole as code
> could be introduced that tickles known bugs on older toolchains.

I'm not sure this is much different from dealing with older versions
of the existing toolchain. Otherwise, we wouldn't need cc-version or
other similar macros, for example.

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 11/18] arm64: make mrs_s and msr_s macros work with LTO
  2017-11-16 11:54   ` Will Deacon
  2017-11-16 13:07     ` Yury Norov
@ 2017-11-16 16:43     ` Sami Tolvanen
  2017-11-16 16:44       ` Nick Desaulniers
  2017-11-16 18:14     ` Alex Matveev
  2 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-16 16:43 UTC (permalink / raw)
  To: Will Deacon
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke

On Thu, Nov 16, 2017 at 11:54:33AM +0000, Will Deacon wrote:
> What limitations? Can you elaborate please? Is this a fix?

The commit message in v1 was more informative, I'll change this back for v3:

--

Clang's integrated assembler does not allow assembly macros defined
in one inline asm block using the .macro directive to be used across
separate asm blocks. LLVM developers consider this a feature and not a
bug, recommending code refactoring:

https://bugs.llvm.org/show_bug.cgi?id=19749

As binutils doesn't allow macros to be redefined, this change adds C
preprocessor macros that define the assembly macros globally for binutils
and locally for clang's integrated assembler.

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 11/18] arm64: make mrs_s and msr_s macros work with LTO
  2017-11-16 16:43     ` Sami Tolvanen
@ 2017-11-16 16:44       ` Nick Desaulniers
  0 siblings, 0 replies; 103+ messages in thread
From: Nick Desaulniers @ 2017-11-16 16:44 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Will Deacon, Alex Matveev, Andi Kleen, Ard Biesheuvel,
	Greg Hackmann, Kees Cook, linux-arm-kernel,
	Linux Kbuild mailing list, LKML, Mark Rutland, Masahiro Yamada,
	Maxim Kuvyrkov, Michal Marek, Yury Norov, Matthias Kaehlcke

I need this patch to assemble the kernel for arm64 with clang regardless of LTO.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 11/18] arm64: make mrs_s and msr_s macros work with LTO
  2017-11-16 13:56       ` Segher Boessenkool
@ 2017-11-16 16:46         ` Sami Tolvanen
  2017-11-16 17:01           ` Segher Boessenkool
  0 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-16 16:46 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Yury Norov, Will Deacon, Alex Matveev, Andi Kleen,
	Ard Biesheuvel, Greg Hackmann, Kees Cook, linux-arm-kernel,
	linux-kbuild, linux-kernel, Mark Rutland, Masahiro Yamada,
	Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Matthias Kaehlcke

On Thu, Nov 16, 2017 at 07:56:50AM -0600, Segher Boessenkool wrote:
> The compiler is fine, the assembler is fine (and the linker has
> nothing to do with it).  Your code is not fine.

Would you care to elaborate? The current code assumes that macros are
visible in other inline assembly blocks, and LLVM developers seem to
feel this isn't correct behavior. This patch fixes the current code so
it works with both assemblers.

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-16 16:30       ` Peter Zijlstra
@ 2017-11-16 16:50         ` Nick Desaulniers
  2017-11-16 16:59           ` Peter Zijlstra
  0 siblings, 1 reply; 103+ messages in thread
From: Nick Desaulniers @ 2017-11-16 16:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sami Tolvanen, Will Deacon, Alex Matveev, Andi Kleen,
	Ard Biesheuvel, Greg Hackmann, Kees Cook, linux-arm-kernel,
	Linux Kbuild mailing list, LKML, Mark Rutland, Masahiro Yamada,
	Maxim Kuvyrkov, Michal Marek, Yury Norov, Matthias Kaehlcke,
	paulmck

On Thu, Nov 16, 2017 at 8:30 AM, Peter Zijlstra <peterz@infradead.org> wrote:

> Ideally we'd get the toolchain people to commit to supporting the kernel
> memory model along side the C11 one. That would help a ton.

Does anyone from the kernel side participate in the C standardization process?

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 05/18] arm64: fix -m for GNU gold
  2017-11-16 10:55   ` Yury Norov
@ 2017-11-16 16:55     ` Nick Desaulniers
  0 siblings, 0 replies; 103+ messages in thread
From: Nick Desaulniers @ 2017-11-16 16:55 UTC (permalink / raw)
  To: Yury Norov
  Cc: Sami Tolvanen, Alex Matveev, Andi Kleen, Ard Biesheuvel,
	Greg Hackmann, Kees Cook, linux-arm-kernel,
	Linux Kbuild mailing list, LKML, Mark Rutland, Masahiro Yamada,
	Maxim Kuvyrkov, Michal Marek, Matthias Kaehlcke

This and the ld-name patch can probably be taken separately from the
rest in the LTO series simply for enabling linking w/ gold.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-16 16:50         ` Nick Desaulniers
@ 2017-11-16 16:59           ` Peter Zijlstra
  2017-11-16 17:16             ` Nick Desaulniers
  0 siblings, 1 reply; 103+ messages in thread
From: Peter Zijlstra @ 2017-11-16 16:59 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Sami Tolvanen, Will Deacon, Alex Matveev, Andi Kleen,
	Ard Biesheuvel, Greg Hackmann, Kees Cook, linux-arm-kernel,
	Linux Kbuild mailing list, LKML, Mark Rutland, Masahiro Yamada,
	Maxim Kuvyrkov, Michal Marek, Yury Norov, Matthias Kaehlcke,
	paulmck

On Thu, Nov 16, 2017 at 08:50:41AM -0800, Nick Desaulniers wrote:
> On Thu, Nov 16, 2017 at 8:30 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Ideally we'd get the toolchain people to commit to supporting the kernel
> > memory model along side the C11 one. That would help a ton.
> 
> Does anyone from the kernel side participate in the C standardization process?

Yes, Paul McKenney and Will Deacon. Doesn't mean these two can still be
reconciled though. From what I understand C11 (and onwards) are
incompatible with the kernel model on a number of subtle points.

Not to mention that there's people in the C11 process that strongly
argue for stuff that would break every single multi-threaded program
written since the 70s, which would include pretty much all OS kernels.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 11/18] arm64: make mrs_s and msr_s macros work with LTO
  2017-11-16 16:46         ` Sami Tolvanen
@ 2017-11-16 17:01           ` Segher Boessenkool
  2017-11-16 17:11             ` Sami Tolvanen
  0 siblings, 1 reply; 103+ messages in thread
From: Segher Boessenkool @ 2017-11-16 17:01 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Yury Norov, Will Deacon, Alex Matveev, Andi Kleen,
	Ard Biesheuvel, Greg Hackmann, Kees Cook, linux-arm-kernel,
	linux-kbuild, linux-kernel, Mark Rutland, Masahiro Yamada,
	Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Matthias Kaehlcke

On Thu, Nov 16, 2017 at 08:46:08AM -0800, Sami Tolvanen wrote:
> On Thu, Nov 16, 2017 at 07:56:50AM -0600, Segher Boessenkool wrote:
> > The compiler is fine, the assembler is fine (and the linker has
> > nothing to do with it).  Your code is not fine.
> 
> Would you care to elaborate? The current code assumes that macros are
> visible in other inline assembly blocks, and LLVM developers seem to
> feel this isn't correct behavior. This patch fixes the current code so
> it works with both assemblers.

If you say e.g.

void f(void)
{
	asm(".macro something\n\t.endm");
}

there is nothing that prevents the compiler from emitting this more
than once.  Expecting things to be emitted in whatever order is a bad
idea, too.

The thing with .purgem can work.  Inelegant, sure, but it can work :-)
Just make sure you do the macro define, the code that uses it, and the
undefine, all in the same inline asm statement.


Segher

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 11/18] arm64: make mrs_s and msr_s macros work with LTO
  2017-11-16 17:01           ` Segher Boessenkool
@ 2017-11-16 17:11             ` Sami Tolvanen
  0 siblings, 0 replies; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-16 17:11 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Yury Norov, Will Deacon, Alex Matveev, Andi Kleen,
	Ard Biesheuvel, Greg Hackmann, Kees Cook, linux-arm-kernel,
	linux-kbuild, linux-kernel, Mark Rutland, Masahiro Yamada,
	Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Matthias Kaehlcke

On Thu, Nov 16, 2017 at 11:01:44AM -0600, Segher Boessenkool wrote:
> The thing with .purgem can work.  Inelegant, sure, but it can work :-)

It works, there are already functions in the kernel that use these
macros more than once. I agree that this might not be the most elegant
solution, but at least it allows us to use the same code path for both
compilers.

> Just make sure you do the macro define, the code that uses it, and the
> undefine, all in the same inline asm statement.

Yes, that's what we're doing here. Only KVM uses msr_s/msr_s directly,
everything else uses the (read|write)_sysreg_s wrappers instead, which
means the DEFINE/UNDEFINE macros are only needed in few places.

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-16 16:59           ` Peter Zijlstra
@ 2017-11-16 17:16             ` Nick Desaulniers
  2017-11-16 17:34               ` Peter Zijlstra
  0 siblings, 1 reply; 103+ messages in thread
From: Nick Desaulniers @ 2017-11-16 17:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sami Tolvanen, Will Deacon, Alex Matveev, Andi Kleen,
	Ard Biesheuvel, Greg Hackmann, Kees Cook, linux-arm-kernel,
	Linux Kbuild mailing list, LKML, Mark Rutland, Masahiro Yamada,
	Maxim Kuvyrkov, Michal Marek, Yury Norov, Matthias Kaehlcke,
	paulmck

On Thu, Nov 16, 2017 at 8:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Nov 16, 2017 at 08:50:41AM -0800, Nick Desaulniers wrote:
>> On Thu, Nov 16, 2017 at 8:30 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> > Ideally we'd get the toolchain people to commit to supporting the kernel
>> > memory model along side the C11 one. That would help a ton.
>>
>> Does anyone from the kernel side participate in the C standardization process?
>
> Yes, Paul McKenney and Will Deacon. Doesn't mean these two can still be
> reconciled though. From what I understand C11 (and onwards) are
> incompatible with the kernel model on a number of subtle points.

It would be good to have these incompatibilities written down, then
for the sake of argument, they can be cited both for discussions on
LKML and in the C standardization process.  For example, a running
list in Documentation/ or something would make it so that anyone could
understand and cite current issues with the latest C standard.

I don't understand why we'd block patches for enabling experimental
features.  We've been running this patch-set on actual devices for
months and would love to provide them to the community for further
testing.  If bugs are found, then there's more evidence to bring to
the C standards committee.  Otherwise we're shutting down feature
development for the sake of potential bugs in a C standard we're not
even using.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-16 17:16             ` Nick Desaulniers
@ 2017-11-16 17:34               ` Peter Zijlstra
  2017-11-16 17:48                 ` Paul E. McKenney
  0 siblings, 1 reply; 103+ messages in thread
From: Peter Zijlstra @ 2017-11-16 17:34 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Sami Tolvanen, Will Deacon, Alex Matveev, Andi Kleen,
	Ard Biesheuvel, Greg Hackmann, Kees Cook, linux-arm-kernel,
	Linux Kbuild mailing list, LKML, Mark Rutland, Masahiro Yamada,
	Maxim Kuvyrkov, Michal Marek, Yury Norov, Matthias Kaehlcke,
	paulmck

On Thu, Nov 16, 2017 at 09:16:49AM -0800, Nick Desaulniers wrote:
> On Thu, Nov 16, 2017 at 8:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, Nov 16, 2017 at 08:50:41AM -0800, Nick Desaulniers wrote:
> >> On Thu, Nov 16, 2017 at 8:30 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >>
> >> > Ideally we'd get the toolchain people to commit to supporting the kernel
> >> > memory model along side the C11 one. That would help a ton.
> >>
> >> Does anyone from the kernel side participate in the C standardization process?
> >
> > Yes, Paul McKenney and Will Deacon. Doesn't mean these two can still be
> > reconciled though. From what I understand C11 (and onwards) are
> > incompatible with the kernel model on a number of subtle points.
> 
> It would be good to have these incompatibilities written down, then
> for the sake of argument, they can be cited both for discussions on
> LKML and in the C standardization process.  For example, a running
> list in Documentation/ or something would make it so that anyone could
> understand and cite current issues with the latest C standard.

Will should be able to produce this list; I know he's done before, I
just can't find it -- my Google-foo isn't strong today.

> I don't understand why we'd block patches for enabling experimental
> features.  We've been running this patch-set on actual devices for
> months and would love to provide them to the community for further
> testing.  If bugs are found, then there's more evidence to bring to
> the C standards committee.  Otherwise we're shutting down feature
> development for the sake of potential bugs in a C standard we're not
> even using.

So the problem is that its very very hard (and painful) to find these
bugs. Getting the tools people to comment on these specific
optimizations would really help lots.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-16 17:34               ` Peter Zijlstra
@ 2017-11-16 17:48                 ` Paul E. McKenney
  2017-11-16 18:16                   ` Nick Desaulniers
  0 siblings, 1 reply; 103+ messages in thread
From: Paul E. McKenney @ 2017-11-16 17:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nick Desaulniers, Sami Tolvanen, Will Deacon, Alex Matveev,
	Andi Kleen, Ard Biesheuvel, Greg Hackmann, Kees Cook,
	linux-arm-kernel, Linux Kbuild mailing list, LKML, Mark Rutland,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Yury Norov,
	Matthias Kaehlcke

On Thu, Nov 16, 2017 at 06:34:17PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 16, 2017 at 09:16:49AM -0800, Nick Desaulniers wrote:
> > On Thu, Nov 16, 2017 at 8:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Thu, Nov 16, 2017 at 08:50:41AM -0800, Nick Desaulniers wrote:
> > >> On Thu, Nov 16, 2017 at 8:30 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > >>
> > >> > Ideally we'd get the toolchain people to commit to supporting the kernel
> > >> > memory model along side the C11 one. That would help a ton.
> > >>
> > >> Does anyone from the kernel side participate in the C standardization process?
> > >
> > > Yes, Paul McKenney and Will Deacon. Doesn't mean these two can still be
> > > reconciled though. From what I understand C11 (and onwards) are
> > > incompatible with the kernel model on a number of subtle points.
> > 
> > It would be good to have these incompatibilities written down, then
> > for the sake of argument, they can be cited both for discussions on
> > LKML and in the C standardization process.  For example, a running
> > list in Documentation/ or something would make it so that anyone could
> > understand and cite current issues with the latest C standard.
> 
> Will should be able to produce this list; I know he's done before, I
> just can't find it -- my Google-foo isn't strong today.

Here you go:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0124r4.html

> > I don't understand why we'd block patches for enabling experimental
> > features.  We've been running this patch-set on actual devices for
> > months and would love to provide them to the community for further
> > testing.  If bugs are found, then there's more evidence to bring to
> > the C standards committee.  Otherwise we're shutting down feature
> > development for the sake of potential bugs in a C standard we're not
> > even using.
> 
> So the problem is that its very very hard (and painful) to find these
> bugs. Getting the tools people to comment on these specific
> optimizations would really help lots.

It would be good to get something similar to LKMM into KTSAN, for
example.  There would probably be a few differences due to efficiency
concerns, but closer is better than less close.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 11/18] arm64: make mrs_s and msr_s macros work with LTO
  2017-11-16 11:54   ` Will Deacon
  2017-11-16 13:07     ` Yury Norov
  2017-11-16 16:43     ` Sami Tolvanen
@ 2017-11-16 18:14     ` Alex Matveev
  2 siblings, 0 replies; 103+ messages in thread
From: Alex Matveev @ 2017-11-16 18:14 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sami Tolvanen, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke

On Thu, 16 Nov 2017 11:54:33 +0000
Will Deacon <will.deacon@arm.com> wrote:

> On Wed, Nov 15, 2017 at 01:34:21PM -0800, Sami Tolvanen wrote:
> > From: Alex Matveev <alxmtvv@gmail.com>
> > 
> > Use UNDEFINE_MRS_S and UNDEFINE_MSR_S to define corresponding macros
> > in-place and workaround gcc and clang limitations on redefining
> > macros across different assembler blocks.  
> 
> What limitations? Can you elaborate please? Is this a fix?

Ok, here's the detailed explanation. Current state after preprocessing
is like this:

asm volatile(".macro mXX_s ... .endm");
void f()
{
	asm volatile("mXX_s a, b");
}

With GCC, it gives macro redefinition error because sysreg.h is
included in multiple source files, and assembler code for all of them
is later combined for LTO (I've seen an intermediate file with hundreds
of identical definitions).

With clang, it gives macro undefined error because clang doesn't allow
sharing macros between inline asm statements.

I also seem to remember catching another sort of undefined error with
GCC due to reordering of macro definition asm statement and generated
asm code for function that uses the macro.

Solution with defining and undefining for each use, while certainly not
elegant, satisfies both GCC and clang, LTO and non-LTO.

Regards,
Alex

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-16 17:48                 ` Paul E. McKenney
@ 2017-11-16 18:16                   ` Nick Desaulniers
  2017-11-16 18:39                     ` Paul E. McKenney
  2017-11-23 13:42                     ` Alexander Potapenko
  0 siblings, 2 replies; 103+ messages in thread
From: Nick Desaulniers @ 2017-11-16 18:16 UTC (permalink / raw)
  To: paulmck
  Cc: Peter Zijlstra, Sami Tolvanen, Will Deacon, Alex Matveev,
	Andi Kleen, Ard Biesheuvel, Greg Hackmann, Kees Cook,
	linux-arm-kernel, Linux Kbuild mailing list, LKML, Mark Rutland,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Yury Norov,
	Matthias Kaehlcke, Alexander Potapenko, Stephen Hines,
	Pirama Arumuga Nainar, Manoj Gupta

On Thu, Nov 16, 2017 at 9:48 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Thu, Nov 16, 2017 at 06:34:17PM +0100, Peter Zijlstra wrote:
>> On Thu, Nov 16, 2017 at 09:16:49AM -0800, Nick Desaulniers wrote:
>> > On Thu, Nov 16, 2017 at 8:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > > On Thu, Nov 16, 2017 at 08:50:41AM -0800, Nick Desaulniers wrote:
>> > >> On Thu, Nov 16, 2017 at 8:30 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > >>
>> > >> > Ideally we'd get the toolchain people to commit to supporting the kernel
>> > >> > memory model along side the C11 one. That would help a ton.
>> > >>
>> > >> Does anyone from the kernel side participate in the C standardization process?
>> > >
>> > > Yes, Paul McKenney and Will Deacon. Doesn't mean these two can still be
>> > > reconciled though. From what I understand C11 (and onwards) are
>> > > incompatible with the kernel model on a number of subtle points.
>> >
>> > It would be good to have these incompatibilities written down, then
>> > for the sake of argument, they can be cited both for discussions on
>> > LKML and in the C standardization process.  For example, a running
>> > list in Documentation/ or something would make it so that anyone could
>> > understand and cite current issues with the latest C standard.
>>
>> Will should be able to produce this list; I know he's done before, I
>> just can't find it -- my Google-foo isn't strong today.
>
> Here you go:
>
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0124r4.html

Great, thanks! Will take some time to digest, but happy to refer
others to this important work in the future.

I wonder if we have anything like a case study that shows specifically
how a compiler generated a subtle bug due to specifics of the memory
model, like a "if you do this, here's the problematic code that will
get generated, and why it's problematic due to the memory model."
That may be a good way to raise issues with toolchain developers with
concrete and actionable examples.

>> > I don't understand why we'd block patches for enabling experimental
>> > features.  We've been running this patch-set on actual devices for
>> > months and would love to provide them to the community for further
>> > testing.  If bugs are found, then there's more evidence to bring to
>> > the C standards committee.  Otherwise we're shutting down feature
>> > development for the sake of potential bugs in a C standard we're not
>> > even using.
>>
>> So the problem is that its very very hard (and painful) to find these
>> bugs. Getting the tools people to comment on these specific
>> optimizations would really help lots.

No doubt; I do not disagree with you.  Kernel developers have very
important use cases for the language.

But the core point I'm trying to make is "do we need to block this
patch set until issues with the C standards body in regards to the
kernels memory model are resolved?"  I would hope the two are
orthogonal and that we'd take them and then test them even more
extensively than the developer has in order to find out.

> It would be good to get something similar to LKMM into KTSAN, for
> example.  There would probably be a few differences due to efficiency
> concerns, but closer is better than less close.  ;-)

+ glider, who may be able to comment on the state of KTSAN.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-16 18:16                   ` Nick Desaulniers
@ 2017-11-16 18:39                     ` Paul E. McKenney
  2017-11-16 18:45                       ` Will Deacon
  2017-11-23 13:42                     ` Alexander Potapenko
  1 sibling, 1 reply; 103+ messages in thread
From: Paul E. McKenney @ 2017-11-16 18:39 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Peter Zijlstra, Sami Tolvanen, Will Deacon, Alex Matveev,
	Andi Kleen, Ard Biesheuvel, Greg Hackmann, Kees Cook,
	linux-arm-kernel, Linux Kbuild mailing list, LKML, Mark Rutland,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Yury Norov,
	Matthias Kaehlcke, Alexander Potapenko, Stephen Hines,
	Pirama Arumuga Nainar, Manoj Gupta

On Thu, Nov 16, 2017 at 10:16:22AM -0800, Nick Desaulniers wrote:
> On Thu, Nov 16, 2017 at 9:48 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Thu, Nov 16, 2017 at 06:34:17PM +0100, Peter Zijlstra wrote:
> >> On Thu, Nov 16, 2017 at 09:16:49AM -0800, Nick Desaulniers wrote:
> >> > On Thu, Nov 16, 2017 at 8:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> > > On Thu, Nov 16, 2017 at 08:50:41AM -0800, Nick Desaulniers wrote:
> >> > >> On Thu, Nov 16, 2017 at 8:30 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> > >>
> >> > >> > Ideally we'd get the toolchain people to commit to supporting the kernel
> >> > >> > memory model along side the C11 one. That would help a ton.
> >> > >>
> >> > >> Does anyone from the kernel side participate in the C standardization process?
> >> > >
> >> > > Yes, Paul McKenney and Will Deacon. Doesn't mean these two can still be
> >> > > reconciled though. From what I understand C11 (and onwards) are
> >> > > incompatible with the kernel model on a number of subtle points.
> >> >
> >> > It would be good to have these incompatibilities written down, then
> >> > for the sake of argument, they can be cited both for discussions on
> >> > LKML and in the C standardization process.  For example, a running
> >> > list in Documentation/ or something would make it so that anyone could
> >> > understand and cite current issues with the latest C standard.
> >>
> >> Will should be able to produce this list; I know he's done before, I
> >> just can't find it -- my Google-foo isn't strong today.
> >
> > Here you go:
> >
> > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0124r4.html
> 
> Great, thanks! Will take some time to digest, but happy to refer
> others to this important work in the future.

Glad you like it!

> I wonder if we have anything like a case study that shows specifically
> how a compiler generated a subtle bug due to specifics of the memory
> model, like a "if you do this, here's the problematic code that will
> get generated, and why it's problematic due to the memory model."
> That may be a good way to raise issues with toolchain developers with
> concrete and actionable examples.

Well, the above is an official working paper from the C++ standards
committee.

The first priority is to fix memory_order_consume.  Which is getting a
bit more mindshare of late.  As Fedor Pikus said at CPPCON: "If you have
a large software project, you are probably already using RCU.  But you
don't know that you are using it, and you are probably doing it wrong."

> >> > I don't understand why we'd block patches for enabling experimental
> >> > features.  We've been running this patch-set on actual devices for
> >> > months and would love to provide them to the community for further
> >> > testing.  If bugs are found, then there's more evidence to bring to
> >> > the C standards committee.  Otherwise we're shutting down feature
> >> > development for the sake of potential bugs in a C standard we're not
> >> > even using.
> >>
> >> So the problem is that its very very hard (and painful) to find these
> >> bugs. Getting the tools people to comment on these specific
> >> optimizations would really help lots.
> 
> No doubt; I do not disagree with you.  Kernel developers have very
> important use cases for the language.
> 
> But the core point I'm trying to make is "do we need to block this
> patch set until issues with the C standards body in regards to the
> kernels memory model are resolved?"  I would hope the two are
> orthogonal and that we'd take them and then test them even more
> extensively than the developer has in order to find out.

Given that I have been working on getting the C and C++ standards to
correctly handle rcu_dereference() for more than ten years, I recommend
-against- waiting on standardization in the strongest possible terms.
And if you think that ten years is bad, the Java standards community has
been struggling with out-of-thin-air (OOTA) values for almost 20 years.
And the C and C++ standards haven't solved OOTA, either.

							Thanx, Paul

> > It would be good to get something similar to LKMM into KTSAN, for
> > example.  There would probably be a few differences due to efficiency
> > concerns, but closer is better than less close.  ;-)
> 
> + glider, who may be able to comment on the state of KTSAN.
> 

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-16 18:39                     ` Paul E. McKenney
@ 2017-11-16 18:45                       ` Will Deacon
  2017-11-16 19:13                         ` Paul E. McKenney
  0 siblings, 1 reply; 103+ messages in thread
From: Will Deacon @ 2017-11-16 18:45 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Nick Desaulniers, Peter Zijlstra, Sami Tolvanen, Alex Matveev,
	Andi Kleen, Ard Biesheuvel, Greg Hackmann, Kees Cook,
	linux-arm-kernel, Linux Kbuild mailing list, LKML, Mark Rutland,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Yury Norov,
	Matthias Kaehlcke, Alexander Potapenko, Stephen Hines,
	Pirama Arumuga Nainar, Manoj Gupta

On Thu, Nov 16, 2017 at 10:39:51AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 16, 2017 at 10:16:22AM -0800, Nick Desaulniers wrote:
> > > On Thu, Nov 16, 2017 at 06:34:17PM +0100, Peter Zijlstra wrote:
> > >> So the problem is that its very very hard (and painful) to find these
> > >> bugs. Getting the tools people to comment on these specific
> > >> optimizations would really help lots.
> > 
> > No doubt; I do not disagree with you.  Kernel developers have very
> > important use cases for the language.
> > 
> > But the core point I'm trying to make is "do we need to block this
> > patch set until issues with the C standards body in regards to the
> > kernels memory model are resolved?"  I would hope the two are
> > orthogonal and that we'd take them and then test them even more
> > extensively than the developer has in order to find out.
> 
> Given that I have been working on getting the C and C++ standards to
> correctly handle rcu_dereference() for more than ten years, I recommend
> -against- waiting on standardization in the strongest possible terms.
> And if you think that ten years is bad, the Java standards community has
> been struggling with out-of-thin-air (OOTA) values for almost 20 years.
> And the C and C++ standards haven't solved OOTA, either.

The problem is, if we go ahead with this change, the compiler *will* break
some address dependencies and something will eventually go wrong. At that
point, what do we do? Turn off some random compiler optimisation? Add a
random barrier()?

We don't necessarily need standardisation, but we at least need guarantees
from the compiler implementation that LTO/PGO will respect source level
address dependencies. I don't think we have that today.

Will

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 10/18] arm64: add a workaround for GNU gold with ARM64_MODULE_PLTS
  2017-11-16 16:41     ` Sami Tolvanen
@ 2017-11-16 18:47       ` Will Deacon
  0 siblings, 0 replies; 103+ messages in thread
From: Will Deacon @ 2017-11-16 18:47 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke

On Thu, Nov 16, 2017 at 08:41:01AM -0800, Sami Tolvanen wrote:
> On Thu, Nov 16, 2017 at 11:50:12AM +0000, Will Deacon wrote:
> > Why don't we just not do LTO if the toolchain is busted?
> 
> Because LTO can not only potentially improve performance, especially
> when combined with PGO (Profile Guided Optimization), but it also
> makes it possible to enable features like Control Flow Integrity that
> can make kernel vulnerabilities more difficult to exploit:
> 
>   https://clang.llvm.org/docs/ControlFlowIntegrity.html
> 
> > This feels like it will end up being a game of whack-a-mole as code
> > could be introduced that tickles known bugs on older toolchains.
> 
> I'm not sure this is much different from dealing with older versions
> of the existing toolchain. Otherwise, we wouldn't need cc-version or
> other similar macros, for example.

I think the big difference is that we have no compelling need to support
older versions of clang or gold.

Will

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-16 18:45                       ` Will Deacon
@ 2017-11-16 19:13                         ` Paul E. McKenney
  2017-11-16 20:17                           ` Sami Tolvanen
  0 siblings, 1 reply; 103+ messages in thread
From: Paul E. McKenney @ 2017-11-16 19:13 UTC (permalink / raw)
  To: Will Deacon
  Cc: Nick Desaulniers, Peter Zijlstra, Sami Tolvanen, Alex Matveev,
	Andi Kleen, Ard Biesheuvel, Greg Hackmann, Kees Cook,
	linux-arm-kernel, Linux Kbuild mailing list, LKML, Mark Rutland,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Yury Norov,
	Matthias Kaehlcke, Alexander Potapenko, Stephen Hines,
	Pirama Arumuga Nainar, Manoj Gupta

On Thu, Nov 16, 2017 at 06:45:08PM +0000, Will Deacon wrote:
> On Thu, Nov 16, 2017 at 10:39:51AM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 16, 2017 at 10:16:22AM -0800, Nick Desaulniers wrote:
> > > > On Thu, Nov 16, 2017 at 06:34:17PM +0100, Peter Zijlstra wrote:
> > > >> So the problem is that its very very hard (and painful) to find these
> > > >> bugs. Getting the tools people to comment on these specific
> > > >> optimizations would really help lots.
> > > 
> > > No doubt; I do not disagree with you.  Kernel developers have very
> > > important use cases for the language.
> > > 
> > > But the core point I'm trying to make is "do we need to block this
> > > patch set until issues with the C standards body in regards to the
> > > kernels memory model are resolved?"  I would hope the two are
> > > orthogonal and that we'd take them and then test them even more
> > > extensively than the developer has in order to find out.
> > 
> > Given that I have been working on getting the C and C++ standards to
> > correctly handle rcu_dereference() for more than ten years, I recommend
> > -against- waiting on standardization in the strongest possible terms.
> > And if you think that ten years is bad, the Java standards community has
> > been struggling with out-of-thin-air (OOTA) values for almost 20 years.
> > And the C and C++ standards haven't solved OOTA, either.
> 
> The problem is, if we go ahead with this change, the compiler *will* break
> some address dependencies and something will eventually go wrong. At that
> point, what do we do? Turn off some random compiler optimisation? Add a
> random barrier()?
> 
> We don't necessarily need standardisation, but we at least need guarantees
> from the compiler implementation that LTO/PGO will respect source level
> address dependencies. I don't think we have that today.

Ah, if "this patch set" meant "adding LTO", I stand corrected and I
apologize for my confusion.

I agree that we need LTO/PGO to be housebroken from an LKMM viewpoint
before it is enabled.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-16 19:13                         ` Paul E. McKenney
@ 2017-11-16 20:17                           ` Sami Tolvanen
  2017-11-20 18:05                             ` Will Deacon
  0 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-16 20:17 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Will Deacon, Nick Desaulniers, Peter Zijlstra, Alex Matveev,
	Andi Kleen, Ard Biesheuvel, Greg Hackmann, Kees Cook,
	linux-arm-kernel, Linux Kbuild mailing list, LKML, Mark Rutland,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Yury Norov,
	Matthias Kaehlcke, Alexander Potapenko, Stephen Hines,
	Pirama Arumuga Nainar, Manoj Gupta

On Thu, Nov 16, 2017 at 11:13:07AM -0800, Paul E. McKenney wrote:
> Ah, if "this patch set" meant "adding LTO", I stand corrected and I
> apologize for my confusion.

Again, I'm not proposing for LTO to be enabled by default. These patches
just make it possible to enable it. Are you saying the possibility
that a future compiler update breaks something is a blocker even for
experimental features?

> I agree that we need LTO/PGO to be housebroken from an LKMM viewpoint
> before it is enabled.

Can you elaborate what's needed from clang before this can move
forward? For example, if you have specific test cases in mind, we can
always work on including them in the LLVM test suite.

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 00/18] Add support for clang LTO
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
                   ` (17 preceding siblings ...)
  2017-11-15 21:34 ` [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG Sami Tolvanen
@ 2017-11-16 20:53 ` Yury Norov
  2017-11-16 21:38   ` Sami Tolvanen
  2017-11-20 15:21 ` Mark Rutland
  19 siblings, 1 reply; 103+ messages in thread
From: Yury Norov @ 2017-11-16 20:53 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Matthias Kaehlcke

On Wed, Nov 15, 2017 at 01:34:10PM -0800, Sami Tolvanen wrote:
> This series adds build system support for compiling the kernel with clang
> Link Time Optimization (LTO), using GNU gold with the LLVMgold plug-in
> for linking. Some background for clang's LTO support is available here:
> 
>   https://llvm.org/docs/LinkTimeOptimization.html
> 
> With -flto, clang produces LLVM bitcode instead of object files, and
> the compilation to native code happens at link time. In addition, clang
> cannot use an external assembler for inline assembly when LTO is enabled,
> which causes further compatibility issues.
> 
> The patches in this series remove intermediate linking steps when LTO is
> used, postpone processing done on object files until after the LTO link
> step, add workarounds for GNU gold incompatibilities, and address inline
> assembly incompatibilities for arm64.
> 
> These changes allow arm64 defconfig to be compiled with LTO, but other
> architectures are not enabled until compatibility issues have been
> addressed. In particular, x86 inline assembly doesn't currently compile
> with clang's integrated assembler due to this LLVM bug:
> 
>   https://bugs.llvm.org/show_bug.cgi?id=24487
> 
> Due to recent bug fixes in the toolchain, it's recommended to use clang
> 5.0 or later, and GNU gold from binutils 2.27 or later, although older
> versions may also work depending on your kernel configuration.

So, you don't guarantee that kernel will work with old compiler and
toolchain? If so, I would suggest you to add the patch that checks
their versions and disables LTO if needed.

Yury

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 11/18] arm64: make mrs_s and msr_s macros work with LTO
  2017-11-16 13:55       ` Robin Murphy
@ 2017-11-16 21:29         ` Yury Norov
  2017-11-16 22:54           ` Alex Matveev
  0 siblings, 1 reply; 103+ messages in thread
From: Yury Norov @ 2017-11-16 21:29 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Will Deacon, Mark Rutland, Andi Kleen, Kees Cook, Ard Biesheuvel,
	linux-kbuild, Nick Desaulniers, linux-kernel, Greg Hackmann,
	Masahiro Yamada, Michal Marek, Sami Tolvanen, Alex Matveev,
	Matthias Kaehlcke, linux-arm-kernel, Maxim Kuvyrkov

On Thu, Nov 16, 2017 at 01:55:31PM +0000, Robin Murphy wrote:
> On 16/11/17 13:07, Yury Norov wrote:
> > On Thu, Nov 16, 2017 at 11:54:33AM +0000, Will Deacon wrote:
> > > On Wed, Nov 15, 2017 at 01:34:21PM -0800, Sami Tolvanen wrote:
> > > > From: Alex Matveev <alxmtvv@gmail.com>
> > > > 
> > > > Use UNDEFINE_MRS_S and UNDEFINE_MSR_S to define corresponding macros
> > > > in-place and workaround gcc and clang limitations on redefining macros
> > > > across different assembler blocks.
> > > 
> > > What limitations? Can you elaborate please? Is this a fix?
> > 
> > Hi Will,
> > 
> > Regarding GCC.
> > 
> > When it joins preprocessed source files into single asm file,
> > mrs_s/msr_s becomes either not declared or declared multiple times.
> > 
> > ./ccuFb68h.s:33120: Error: Macro `mrs_s' was already defined
> > ./ccuFb68h.s:33124: Error: Macro `msr_s' was already defined
> > 
> > I'm not sure that GCC works correctly in this case, and I sent the
> > email to Linaro toolchain group to clarify it. See below.
> > 
> > Yury
> > 
> > [...]
> > 
> > Links:
> > My unfinished branch:
> > https://github.com/norov/linux/tree/lto
> > Andi Kleen tree:
> > https://github.com/andikleen/linux-misc/tree/lto-411-1
> > Sami Tolvanen's recent work for clang:
> > https://lkml.org/lkml/2017/11/3/606
> > 
> > Question we have for now:
> > There's mrs_s/msr_s macro that doesn't work with LTO - linker
> > complains very loudly that macro is either not declared, or declared
> > multiple times. (To reproduce - try to build my kernel branch w/o last
> > patch).
> > 
> > The same (?) problem is observed with clang, and people there
> > considered it as feature, not a bug.
> > 
> > https://bugs.llvm.org/show_bug.cgi?id=19749
> > 
> > We have the fix for both clang and gcc, but it looks hacky. Maybe it
> > worth to fix mrs/msr issue on toolchain side?
> 
> Given that this whole mrs_s infrastructure is a workaround for older
> assemblers which don't support the "S<op0>_<op1>_<Cn>_<Cm>_<op2>" syntax for
> arbitrary unnamed system registers (which IIRC was a fairly late addition to
> the architecture), the only way it could be "fixed" on the toolchain side is
> by removing all those older toolchains from existence. Good luck with that
> ;)
> 
> In *theory*, it might be possible to do something similar to what we do with
> CONFIG_BROKEN_GAS_INST to detect offending assemblers and only define and
> use these macros when necessary (hopefully Clang and other LTO-capable
> toolchains do accept the proper syntax), but I've no idea how invasive or
> difficult that might turn out to be.

Thank you Robin. The "S<op0>_<op1>_<Cn>_<Cm>_<op2>" is the feature I
was looking for. I'm not going to remove old toolchain from every user
machine - I believe if I do, I'll be put into jail finally. :)

But I think we should use this syntax for {read,write}_sysreg_s
and other functions if assembler supports it. The existing mrs_s/msr_s
machinery will be deprecated then, and it will be generally good.

Sami told that LTO doesn't work with old toolchains, so I think that
it would be OK to disable LTO for old assemblers that doesn't support
all necessary features. Anyway, this is not blocking issue because we
have patch that workarounds it (again).

Yury

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 08/18] arm64: don't disable ADR_PREL_PG_HI21* with ARM64_ERRATUM_843419
  2017-11-16 16:34         ` Ard Biesheuvel
@ 2017-11-16 21:37           ` Sami Tolvanen
  2017-11-16 22:14             ` Ard Biesheuvel
  0 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-16 21:37 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Will Deacon, Mark Rutland, Andi Kleen, Kees Cook, linux-kbuild,
	Nick Desaulniers, linux-kernel, Greg Hackmann, Masahiro Yamada,
	Michal Marek, Yury Norov, Alex Matveev, Matthias Kaehlcke,
	linux-arm-kernel, Maxim Kuvyrkov

On Thu, Nov 16, 2017 at 04:34:03PM +0000, Ard Biesheuvel wrote:
> You still have not explained to us how GOLD avoids the erratum.

Sorry, I didn't realize you were asking that. If gold spots erratum
sequences, looks like it creates stubs to break them up:

  https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gold/aarch64.cc#l8396

It also attempts to optimize the code by replacing adrps in these
sequences with adr where possible, but otherwise doesn't appear to
touch them:

  https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gold/aarch64.cc#l2053

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 00/18] Add support for clang LTO
  2017-11-16 20:53 ` [PATCH v2 00/18] Add support for clang LTO Yury Norov
@ 2017-11-16 21:38   ` Sami Tolvanen
  0 siblings, 0 replies; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-16 21:38 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Matthias Kaehlcke

On Thu, Nov 16, 2017 at 11:53:00PM +0300, Yury Norov wrote:
> I would suggest you to add the patch that checks their versions and
> disables LTO if needed.

Sure, sounds reasonable. I'll add this in the next version.

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 08/18] arm64: don't disable ADR_PREL_PG_HI21* with ARM64_ERRATUM_843419
  2017-11-16 21:37           ` Sami Tolvanen
@ 2017-11-16 22:14             ` Ard Biesheuvel
  2017-11-16 22:25               ` Ard Biesheuvel
  2017-11-16 23:09               ` Sami Tolvanen
  0 siblings, 2 replies; 103+ messages in thread
From: Ard Biesheuvel @ 2017-11-16 22:14 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Will Deacon, Mark Rutland, Andi Kleen, Kees Cook, linux-kbuild,
	Nick Desaulniers, linux-kernel, Greg Hackmann, Masahiro Yamada,
	Michal Marek, Yury Norov, Alex Matveev, Matthias Kaehlcke,
	linux-arm-kernel, Maxim Kuvyrkov

On 16 November 2017 at 21:37, Sami Tolvanen <samitolvanen@google.com> wrote:
> On Thu, Nov 16, 2017 at 04:34:03PM +0000, Ard Biesheuvel wrote:
>> You still have not explained to us how GOLD avoids the erratum.
>
> Sorry, I didn't realize you were asking that. If gold spots erratum
> sequences, looks like it creates stubs to break them up:
>
>   https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gold/aarch64.cc#l8396
>
> It also attempts to optimize the code by replacing adrps in these
> sequences with adr where possible, but otherwise doesn't appear to
> touch them:
>
>   https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gold/aarch64.cc#l2053
>

OK, so my concern here is that this code probably only operates on
fully linked binaries, and not partially linked object files like
kernel modules. The same applies to ld.bfd, which is why we need to
use the large module instead of a code model that may emit adrp
instructions..

What is preventing us from using the large model with clang? I know it
uses movk/movz pairs rather than literals, but this shouldn't matter
for modules, given that we support static ELF relocations.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 08/18] arm64: don't disable ADR_PREL_PG_HI21* with ARM64_ERRATUM_843419
  2017-11-16 22:14             ` Ard Biesheuvel
@ 2017-11-16 22:25               ` Ard Biesheuvel
  2017-11-16 23:09               ` Sami Tolvanen
  1 sibling, 0 replies; 103+ messages in thread
From: Ard Biesheuvel @ 2017-11-16 22:25 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Will Deacon, Mark Rutland, Andi Kleen, Kees Cook, linux-kbuild,
	Nick Desaulniers, linux-kernel, Greg Hackmann, Masahiro Yamada,
	Michal Marek, Yury Norov, Alex Matveev, Matthias Kaehlcke,
	linux-arm-kernel, Maxim Kuvyrkov

On 16 November 2017 at 22:14, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 16 November 2017 at 21:37, Sami Tolvanen <samitolvanen@google.com> wrote:
>> On Thu, Nov 16, 2017 at 04:34:03PM +0000, Ard Biesheuvel wrote:
>>> You still have not explained to us how GOLD avoids the erratum.
>>
>> Sorry, I didn't realize you were asking that. If gold spots erratum
>> sequences, looks like it creates stubs to break them up:
>>
>>   https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gold/aarch64.cc#l8396
>>
>> It also attempts to optimize the code by replacing adrps in these
>> sequences with adr where possible, but otherwise doesn't appear to
>> touch them:
>>
>>   https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gold/aarch64.cc#l2053
>>
>
> OK, so my concern here is that this code probably only operates on
> fully linked binaries, and not partially linked object files like
> kernel modules. The same applies to ld.bfd, which is why we need to
> use the large module

*model* not module

> instead of a code model that may emit adrp
> instructions..
>
> What is preventing us from using the large model with clang? I know it
> uses movk/movz pairs rather than literals, but this shouldn't matter
> for modules, given that we support static ELF relocations.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 11/18] arm64: make mrs_s and msr_s macros work with LTO
  2017-11-16 21:29         ` Yury Norov
@ 2017-11-16 22:54           ` Alex Matveev
  2017-12-04 17:34             ` Nick Desaulniers
  0 siblings, 1 reply; 103+ messages in thread
From: Alex Matveev @ 2017-11-16 22:54 UTC (permalink / raw)
  To: Yury Norov
  Cc: Robin Murphy, Will Deacon, Mark Rutland, Andi Kleen, Kees Cook,
	Ard Biesheuvel, linux-kbuild, Nick Desaulniers, linux-kernel,
	Greg Hackmann, Masahiro Yamada, Michal Marek, Sami Tolvanen,
	Matthias Kaehlcke, linux-arm-kernel, Maxim Kuvyrkov

On Fri, 17 Nov 2017 00:29:20 +0300
Yury Norov <ynorov@caviumnetworks.com> wrote:

> On Thu, Nov 16, 2017 at 01:55:31PM +0000, Robin Murphy wrote:
> > Given that this whole mrs_s infrastructure is a workaround for older
> > assemblers which don't support the "S<op0>_<op1>_<Cn>_<Cm>_<op2>"
> > syntax for arbitrary unnamed system registers (which IIRC was a
> > fairly late addition to the architecture), the only way it could be
> > "fixed" on the toolchain side is by removing all those older
> > toolchains from existence. Good luck with that ;)
> > 
> > In *theory*, it might be possible to do something similar to what
> > we do with CONFIG_BROKEN_GAS_INST to detect offending assemblers
> > and only define and use these macros when necessary (hopefully
> > Clang and other LTO-capable toolchains do accept the proper
> > syntax), but I've no idea how invasive or difficult that might turn
> > out to be.  
> 
> Thank you Robin. The "S<op0>_<op1>_<Cn>_<Cm>_<op2>" is the feature I
> was looking for. I'm not going to remove old toolchain from every user
> machine - I believe if I do, I'll be put into jail finally. :)
> 
> But I think we should use this syntax for {read,write}_sysreg_s
> and other functions if assembler supports it. The existing mrs_s/msr_s
> machinery will be deprecated then, and it will be generally good.

Well, "S<op>_..." is how registers were accessed before implementation
of mrs_s/msr_s - here's the patch from mid-2014 by Catalin Marinas:

commit 72c583951526
Author: Catalin Marinas <catalin.marinas@arm.com>
Date:   Thu Jul 24 14:14:42 2014 +0100

    arm64: gicv3: Allow GICv3 compilation with older binutils
    
    GICv3 introduces new system registers accessible with the full
    msr/mrs syntax (e.g. mrs x0, Sop0_op1_CRm_CRn_op2). However, only
    recent binutils understand the new syntax. This patch introduces
    msr_s/mrs_s assembly macros which generate the equivalent
    instructions above and converts the existing GICv3 code (both
    drivers/irqchip/ and arch/arm64/kernel/).

The question is - is it OK to drop compatibility with old versions of
binutils (which were already "older" back in 2014)? It's not my call to
make. If yes, then it should be possible to make this change more
aesthetic by reverting to "S<op>" (however, it will affect more places
as now some users of register definitions expect them to be numbers, not
"S<op>" strings).

Going the way of CONFIG_BROKEN_GAS_INST and selecting between "S<op>"
and mrs_s/msr_s is also possible, but then we'd have to keep both
mechanisms around. I'm not quite sure what that would give us.

Regards,
Alex

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 08/18] arm64: don't disable ADR_PREL_PG_HI21* with ARM64_ERRATUM_843419
  2017-11-16 22:14             ` Ard Biesheuvel
  2017-11-16 22:25               ` Ard Biesheuvel
@ 2017-11-16 23:09               ` Sami Tolvanen
  2017-11-16 23:20                 ` Ard Biesheuvel
  1 sibling, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-16 23:09 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Will Deacon, Mark Rutland, Andi Kleen, Kees Cook, linux-kbuild,
	Nick Desaulniers, linux-kernel, Greg Hackmann, Masahiro Yamada,
	Michal Marek, Yury Norov, Alex Matveev, Matthias Kaehlcke,
	linux-arm-kernel, Maxim Kuvyrkov

On Thu, Nov 16, 2017 at 10:14:17PM +0000, Ard Biesheuvel wrote:
> OK, so my concern here is that this code probably only operates on
> fully linked binaries, and not partially linked object files like
> kernel modules.

Right. That makes sense.

> What is preventing us from using the large model with clang?

We pass -mcmodel=large to clang, but I just confirmed that the
attribute isn't stored in LLVM IR, which means it's not used during
link time compilation. I'll see if we can solve this problem by
passing the correct code model directly to LLVMgold instead. Thanks
for pointing this out.

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 08/18] arm64: don't disable ADR_PREL_PG_HI21* with ARM64_ERRATUM_843419
  2017-11-16 23:09               ` Sami Tolvanen
@ 2017-11-16 23:20                 ` Ard Biesheuvel
  2017-11-16 23:50                   ` Sami Tolvanen
  0 siblings, 1 reply; 103+ messages in thread
From: Ard Biesheuvel @ 2017-11-16 23:20 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Will Deacon, Mark Rutland, Andi Kleen, Kees Cook, linux-kbuild,
	Nick Desaulniers, linux-kernel, Greg Hackmann, Masahiro Yamada,
	Michal Marek, Yury Norov, Alex Matveev, Matthias Kaehlcke,
	linux-arm-kernel, Maxim Kuvyrkov

On 16 November 2017 at 23:09, Sami Tolvanen <samitolvanen@google.com> wrote:
> On Thu, Nov 16, 2017 at 10:14:17PM +0000, Ard Biesheuvel wrote:
>> OK, so my concern here is that this code probably only operates on
>> fully linked binaries, and not partially linked object files like
>> kernel modules.
>
> Right. That makes sense.
>
>> What is preventing us from using the large model with clang?
>
> We pass -mcmodel=large to clang, but I just confirmed that the
> attribute isn't stored in LLVM IR, which means it's not used during
> link time compilation. I'll see if we can solve this problem by
> passing the correct code model directly to LLVMgold instead. Thanks
> for pointing this out.
>

So at which point is the IR in a partially linked object file
converted into executable code?

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 08/18] arm64: don't disable ADR_PREL_PG_HI21* with ARM64_ERRATUM_843419
  2017-11-16 23:20                 ` Ard Biesheuvel
@ 2017-11-16 23:50                   ` Sami Tolvanen
  2017-11-17  9:54                     ` Ard Biesheuvel
  0 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-16 23:50 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Will Deacon, Mark Rutland, Andi Kleen, Kees Cook, linux-kbuild,
	Nick Desaulniers, linux-kernel, Greg Hackmann, Masahiro Yamada,
	Michal Marek, Yury Norov, Alex Matveev, Matthias Kaehlcke,
	linux-arm-kernel, Maxim Kuvyrkov

On Thu, Nov 16, 2017 at 11:20:40PM +0000, Ard Biesheuvel wrote:
> So at which point is the IR in a partially linked object file
> converted into executable code?

At the final module link step (cmd_ld_ko_o) in scripts/Makefile.modpost,
added in patch 12.

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 08/18] arm64: don't disable ADR_PREL_PG_HI21* with ARM64_ERRATUM_843419
  2017-11-16 23:50                   ` Sami Tolvanen
@ 2017-11-17  9:54                     ` Ard Biesheuvel
  2017-11-17 18:49                       ` Sami Tolvanen
  0 siblings, 1 reply; 103+ messages in thread
From: Ard Biesheuvel @ 2017-11-17  9:54 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Will Deacon, Mark Rutland, Andi Kleen, Kees Cook, linux-kbuild,
	Nick Desaulniers, linux-kernel, Greg Hackmann, Masahiro Yamada,
	Michal Marek, Yury Norov, Alex Matveev, Matthias Kaehlcke,
	linux-arm-kernel, Maxim Kuvyrkov

On 16 November 2017 at 23:50, Sami Tolvanen <samitolvanen@google.com> wrote:
> On Thu, Nov 16, 2017 at 11:20:40PM +0000, Ard Biesheuvel wrote:
>> So at which point is the IR in a partially linked object file
>> converted into executable code?
>
> At the final module link step (cmd_ld_ko_o) in scripts/Makefile.modpost,
> added in patch 12.
>

OK, so all IR objects are converted into a single .o file
encapsulating the module image. Does this give the same benefits as
LTO linking IR objects to a fully linked executable? Even if it does,
partial linking is not something the toolchain people are usually
crazy about, so it would be nice to have some confirmation that this
is a usage model that is fully supported.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 08/18] arm64: don't disable ADR_PREL_PG_HI21* with ARM64_ERRATUM_843419
  2017-11-17  9:54                     ` Ard Biesheuvel
@ 2017-11-17 18:49                       ` Sami Tolvanen
  0 siblings, 0 replies; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-17 18:49 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Will Deacon, Mark Rutland, Andi Kleen, Kees Cook, linux-kbuild,
	Nick Desaulniers, linux-kernel, Greg Hackmann, Masahiro Yamada,
	Michal Marek, Yury Norov, Alex Matveev, Matthias Kaehlcke,
	linux-arm-kernel, Maxim Kuvyrkov

On Fri, Nov 17, 2017 at 09:54:48AM +0000, Ard Biesheuvel wrote:
> OK, so all IR objects are converted into a single .o file
> encapsulating the module image. Does this give the same benefits as
> LTO linking IR objects to a fully linked executable?

Yes, it does.

> Even if it does, partial linking is not something the toolchain
> people are usually crazy about, so it would be nice to have some
> confirmation that this is a usage model that is fully supported.

I confirmed with our LLVM developers that while this is less common,
it's fully supported.

I will also drop this patch in v3 as passing code model to LLVMgold
fixes the issue with LTO.

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [v2,12/18] kbuild: add support for clang LTO
  2017-11-15 21:34 ` [PATCH v2 12/18] kbuild: add support for clang LTO Sami Tolvanen
  2017-11-15 22:06   ` Kees Cook
@ 2017-11-18  3:21   ` Nicholas Piggin
  2017-11-20 20:21     ` Sami Tolvanen
  1 sibling, 1 reply; 103+ messages in thread
From: Nicholas Piggin @ 2017-11-18  3:21 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke

On Wed, 15 Nov 2017 13:34:22 -0800
Sami Tolvanen <samitolvanen@google.com> wrote:

> This change adds the configuration option CONFIG_LTO_CLANG, and
> build system support for clang's Link Time Optimization (LTO). In
> preparation for LTO support for other compilers, potentially common
> parts of the changes are gated behind CONFIG_LTO instead.
> 
> With -flto, instead of object files, clang produces LLVM bitcode,
> which is compiled into a native object at link time, allowing the
> final binary to be optimized globally. For more details, see:
> 
>   https://llvm.org/docs/LinkTimeOptimization.html
> 
> While the kernel normally uses GNU ld for linking, LLVM supports LTO
> only with lld or GNU gold linkers. This patch set assumes gold will
> be used with the LLVMgold plug-in to perform the LTO link step. Due
> to potential incompatibilities with GNU ld, this change also adds
> LDFINAL_vmlinux for using a different linker for the vmlinux_link
> step, and defaults to using GNU ld.
> 
> Assuming LLVMgold.so is in LD_LIBRARY_PATH and CONFIG_LTO_CLANG has
> been selected, an LTO kernel can be built simply by running make
> CC=clang. Recommended versions are >= 5.0 for clang, and >= 2.27 for
> binutils.

Do you have any kind of numbers for this, out of curiosity? Binary
size, performance, build time?

Also

> @@ -585,6 +585,7 @@ config CC_STACKPROTECTOR_STRONG
>  endchoice
>  
>  config THIN_ARCHIVES
> +	depends on !LTO_CLANG
>  	def_bool y
>  	help
>  	  Select this if the architecture wants to use thin archives

Why is this needed? It would have been nice to get rid of the
!THIN_ARCHIVES option if you can make the patches work with the
thin archives paths.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 06/18] arm64: kvm: use -fno-jump-tables with clang
  2017-11-15 21:34 ` [PATCH v2 06/18] arm64: kvm: use -fno-jump-tables with clang Sami Tolvanen
  2017-11-16 11:46   ` Will Deacon
@ 2017-11-20 14:41   ` Mark Rutland
  2017-11-20 14:43     ` Mark Rutland
  1 sibling, 1 reply; 103+ messages in thread
From: Mark Rutland @ 2017-11-20 14:41 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Yury Norov, Matthias Kaehlcke

On Wed, Nov 15, 2017 at 01:34:16PM -0800, Sami Tolvanen wrote:
> Use -fno-jump-tables to make sure clang doesn't generate branches
> to EL1 virtual addresses.

Are there any other reasons that clang might generate absolute
references/relocations?

It would be nice if there was the option to disable that more generally,
rather than disabling individual optimizations. Is there any PIC/PIE
option that we could use?

We might need something simnilar for GCC, even if we're not seeing
problems today.

Thanks,
Mark.

> Suggested-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> ---
>  arch/arm64/kvm/hyp/Makefile | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index f04400d494b7..19fa1c6b6b69 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -5,6 +5,10 @@
>  
>  ccflags-y += -fno-stack-protector -DDISABLE_BRANCH_PROFILING
>  
> +ifeq ($(cc-name),clang)
> +ccflags-y += -fno-jump-tables
> +endif
> +
>  KVM=../../../../virt/kvm
>  
>  obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v2-sr.o
> -- 
> 2.15.0.448.gf294e3d99a-goog
> 

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 06/18] arm64: kvm: use -fno-jump-tables with clang
  2017-11-20 14:41   ` Mark Rutland
@ 2017-11-20 14:43     ` Mark Rutland
  2017-11-20 14:47       ` Ard Biesheuvel
  0 siblings, 1 reply; 103+ messages in thread
From: Mark Rutland @ 2017-11-20 14:43 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Andi Kleen, Kees Cook, Ard Biesheuvel, linux-kbuild,
	Nick Desaulniers, linux-kernel, Greg Hackmann, Masahiro Yamada,
	Michal Marek, Yury Norov, Alex Matveev, Matthias Kaehlcke,
	linux-arm-kernel, Maxim Kuvyrkov

On Mon, Nov 20, 2017 at 02:41:47PM +0000, Mark Rutland wrote:
> On Wed, Nov 15, 2017 at 01:34:16PM -0800, Sami Tolvanen wrote:
> > Use -fno-jump-tables to make sure clang doesn't generate branches
> > to EL1 virtual addresses.
> 
> Are there any other reasons that clang might generate absolute
> references/relocations?
> 
> It would be nice if there was the option to disable that more generally,
> rather than disabling individual optimizations. Is there any PIC/PIE
> option that we could use?
> 
> We might need something simnilar for GCC, even if we're not seeing
> problems today.

... and likewise for the EFI stub, which needs to be
position-independent much like the hyp code.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 06/18] arm64: kvm: use -fno-jump-tables with clang
  2017-11-20 14:43     ` Mark Rutland
@ 2017-11-20 14:47       ` Ard Biesheuvel
  0 siblings, 0 replies; 103+ messages in thread
From: Ard Biesheuvel @ 2017-11-20 14:47 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Sami Tolvanen, Andi Kleen, Kees Cook, linux-kbuild,
	Nick Desaulniers, linux-kernel, Greg Hackmann, Masahiro Yamada,
	Michal Marek, Yury Norov, Alex Matveev, Matthias Kaehlcke,
	linux-arm-kernel, Maxim Kuvyrkov

On 20 November 2017 at 14:43, Mark Rutland <mark.rutland@arm.com> wrote:
> On Mon, Nov 20, 2017 at 02:41:47PM +0000, Mark Rutland wrote:
>> On Wed, Nov 15, 2017 at 01:34:16PM -0800, Sami Tolvanen wrote:
>> > Use -fno-jump-tables to make sure clang doesn't generate branches
>> > to EL1 virtual addresses.
>>
>> Are there any other reasons that clang might generate absolute
>> references/relocations?
>>
>> It would be nice if there was the option to disable that more generally,
>> rather than disabling individual optimizations. Is there any PIC/PIE
>> option that we could use?
>>
>> We might need something simnilar for GCC, even if we're not seeing
>> problems today.
>
> ... and likewise for the EFI stub, which needs to be
> position-independent much like the hyp code.
>

Actually, we are already using -fpie for the EFI stub, but it is a bit
fiddly because it requires visibility overrides, given that PIC symbol
references are usually indirected via GOT entries. At the time, it
didn't occur to me that the HYP code has similar issues. Of course, in
the EFI stub we already check for ABS64 relocations to make sure that
the resulting code is correct (and /me makes mental note to add
R_AARCH64_GOT_xxx relocations to that as well)

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 09/18] arm64: explicitly pass --no-fix-cortex-a53-843419 to GNU gold
  2017-11-16 16:35     ` Sami Tolvanen
@ 2017-11-20 14:47       ` Mark Rutland
  2017-11-20 20:35         ` Sami Tolvanen
  0 siblings, 1 reply; 103+ messages in thread
From: Mark Rutland @ 2017-11-20 14:47 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Will Deacon, Alex Matveev, Andi Kleen, Ard Biesheuvel,
	Greg Hackmann, Kees Cook, linux-arm-kernel, linux-kbuild,
	linux-kernel, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke

On Thu, Nov 16, 2017 at 08:35:49AM -0800, Sami Tolvanen wrote:
> On Thu, Nov 16, 2017 at 11:47:23AM +0000, Will Deacon wrote:
> > But if ARM64_ERRATUM_843419 is enabled, we'll go ahead and generate
> > broken code?
> 
> This bug has been fixed in all current versions of GNU gold, but of
> course, if someone is using a broken linker, it will generate broken
> code. We can't do anything about that in the kernel.

However, we could instead check ld-version, produce a warning, and abort
the build in such cases.

Which would be better than silently producing a kernel that won't work
correctly.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 16/18] arm64: crypto: disable LTO for aes-ce-cipher.c
  2017-11-15 21:34 ` [PATCH v2 16/18] arm64: crypto: disable LTO for aes-ce-cipher.c Sami Tolvanen
@ 2017-11-20 15:20   ` Mark Rutland
  2017-11-20 15:25     ` Ard Biesheuvel
                       ` (2 more replies)
  0 siblings, 3 replies; 103+ messages in thread
From: Mark Rutland @ 2017-11-20 15:20 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Yury Norov, Matthias Kaehlcke

On Wed, Nov 15, 2017 at 01:34:26PM -0800, Sami Tolvanen wrote:
> CONFIG_LTO_CLANG requires the use of clang's integrated assembler, which
> doesn't understand the inline assembly in aes-ce-cipher.c. Disable LTO for
> the file to work around the issue.

Could you elaborate on what the integrated asembler doesn't like?

It's not entirely clear at a glance, as the asm in that file doesn't
seem to do anything that obscure.

Is it a bug?

Thanks,
Mark.

> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> ---
>  arch/arm64/crypto/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
> index b5edc5918c28..af08508521a3 100644
> --- a/arch/arm64/crypto/Makefile
> +++ b/arch/arm64/crypto/Makefile
> @@ -24,7 +24,7 @@ obj-$(CONFIG_CRYPTO_CRC32_ARM64_CE) += crc32-ce.o
>  crc32-ce-y:= crc32-ce-core.o crc32-ce-glue.o
>  
>  obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
> -CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto
> +CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto $(DISABLE_LTO_CLANG)
>  
>  obj-$(CONFIG_CRYPTO_AES_ARM64_CE_CCM) += aes-ce-ccm.o
>  aes-ce-ccm-y := aes-ce-ccm-glue.o aes-ce-ccm-core.o
> -- 
> 2.15.0.448.gf294e3d99a-goog
> 

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 00/18] Add support for clang LTO
  2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
                   ` (18 preceding siblings ...)
  2017-11-16 20:53 ` [PATCH v2 00/18] Add support for clang LTO Yury Norov
@ 2017-11-20 15:21 ` Mark Rutland
  2017-11-20 21:04   ` Sami Tolvanen
  19 siblings, 1 reply; 103+ messages in thread
From: Mark Rutland @ 2017-11-20 15:21 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Yury Norov, Matthias Kaehlcke


Hi,

On Wed, Nov 15, 2017 at 01:34:10PM -0800, Sami Tolvanen wrote:
> This series adds build system support for compiling the kernel with clang
> Link Time Optimization (LTO), using GNU gold with the LLVMgold plug-in
> for linking. 

As a high-level comment, this path series is rather confusing for
maintainers, since it's a mixture of basic support, workarounds, and a
new feature, and the commit messages aren't that helpful for those of us
unfamiliar with clang.

It would be *very* helpful if this could be split into a few smaller
patch series that built atop of each other.

Would it be possible to split this into:

(1) (basic) arm64 clang support
(2) gold support (no LTO)
(3) LTO support

... with any necessary workarounds added as-required to the relevant
series?

I think (1) is just the clang-version, -mno-implicit-float, and KVM
patches, and that might stand a chance of getting in soon. IIUC that
fixes the outstanding issues with basic clang support, and would give us
all a common baseline for review, testing, and future debugging.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 16/18] arm64: crypto: disable LTO for aes-ce-cipher.c
  2017-11-20 15:20   ` Mark Rutland
@ 2017-11-20 15:25     ` Ard Biesheuvel
  2017-11-20 21:01       ` Sami Tolvanen
  2017-11-20 20:51     ` Sami Tolvanen
  2017-11-20 21:29     ` Alex Matveev
  2 siblings, 1 reply; 103+ messages in thread
From: Ard Biesheuvel @ 2017-11-20 15:25 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Sami Tolvanen, Alex Matveev, Andi Kleen, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Yury Norov, Matthias Kaehlcke

On 20 November 2017 at 15:20, Mark Rutland <mark.rutland@arm.com> wrote:
> On Wed, Nov 15, 2017 at 01:34:26PM -0800, Sami Tolvanen wrote:
>> CONFIG_LTO_CLANG requires the use of clang's integrated assembler, which
>> doesn't understand the inline assembly in aes-ce-cipher.c. Disable LTO for
>> the file to work around the issue.
>
> Could you elaborate on what the integrated asembler doesn't like?
>
> It's not entirely clear at a glance, as the asm in that file doesn't
> seem to do anything that obscure.
>

Actually, it just occurred to me that this code does not adhere 100%
to the kernel mode neon rules, by putting kernel_neon_begin/end and
the code itself into the same source file. At the time, it seemed
harmless, given that these functions are only exposed via function
pointers and never called locally, and the compiler cannot really
reorder the function calls with the asm block. However, under LTO this
all changes, and it is no longer guaranteed that the NEON registers
are only touched between the kernel mode neon begin/end calls.

So the correct way to fix this would be to move the asm into its own
.S file, and call it from between the kernel_mode_neon_begin/end
calls. That should also fix your compat issue.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-16 20:17                           ` Sami Tolvanen
@ 2017-11-20 18:05                             ` Will Deacon
  2017-11-20 19:25                               ` Peter Zijlstra
                                                 ` (2 more replies)
  0 siblings, 3 replies; 103+ messages in thread
From: Will Deacon @ 2017-11-20 18:05 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Paul E. McKenney, Nick Desaulniers, Peter Zijlstra, Alex Matveev,
	Andi Kleen, Ard Biesheuvel, Greg Hackmann, Kees Cook,
	linux-arm-kernel, Linux Kbuild mailing list, LKML, Mark Rutland,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Yury Norov,
	Matthias Kaehlcke, Alexander Potapenko, Stephen Hines,
	Pirama Arumuga Nainar, Manoj Gupta

Hi Sami,

On Thu, Nov 16, 2017 at 12:17:01PM -0800, Sami Tolvanen wrote:
> On Thu, Nov 16, 2017 at 11:13:07AM -0800, Paul E. McKenney wrote:
> > Ah, if "this patch set" meant "adding LTO", I stand corrected and I
> > apologize for my confusion.
> 
> Again, I'm not proposing for LTO to be enabled by default. These patches
> just make it possible to enable it. Are you saying the possibility
> that a future compiler update breaks something is a blocker even for
> experimental features?

My opinion here is that it's a blocker for LTO on arm64, as the symptoms
aren't different to building with a compiler that has subtle code generation
issues. If you *really* want to support LTO, we could consider giving
acquire semantics to READ_ONCE for LTO builds, but that means trading
RCU read-side performance for whatever gains are provided by LTO.

Please don't confuse this with a reluctance to support clang; I'm keen to
see that supported, so let's focus on that for the moment (although I don't
see a good reason to support old clang builds with known issues).

> > I agree that we need LTO/PGO to be housebroken from an LKMM viewpoint
> > before it is enabled.
> 
> Can you elaborate what's needed from clang before this can move
> forward? For example, if you have specific test cases in mind, we can
> always work on including them in the LLVM test suite.

This is a thorny issue, but RCU (specifically rcu_dereference but probably
also some READ_ONCEs) relies on being able to utilise syntactic dependency
chains to order local accesses to shared variables. Paul has spent a
fair amount of time working to define these dependency chains here:

  https://wg21.link/P0190

Although the current direction of the C++ committee is to prefer
that dependencies are explicitly "marked", this is not deemed to be
acceptable for the kernel (in other words, everything is always considered
"marked").

If a compiler is capable of breaking unmarked dependency chains, then we
will run into subtle concurrency issues on arm64 because the hardware is
also capable of reordering independent accesses. My understanding is that
LTO makes these sort of optimisations far more likely because whole-program
analysis can enable value prediction, which can convert address dependencies
into control dependcies; the latter not ordering reads on arm64.

Will

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-20 18:05                             ` Will Deacon
@ 2017-11-20 19:25                               ` Peter Zijlstra
  2017-11-20 19:28                               ` Peter Zijlstra
  2017-11-20 19:32                               ` Peter Zijlstra
  2 siblings, 0 replies; 103+ messages in thread
From: Peter Zijlstra @ 2017-11-20 19:25 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sami Tolvanen, Paul E. McKenney, Nick Desaulniers, Alex Matveev,
	Andi Kleen, Ard Biesheuvel, Greg Hackmann, Kees Cook,
	linux-arm-kernel, Linux Kbuild mailing list, LKML, Mark Rutland,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Yury Norov,
	Matthias Kaehlcke, Alexander Potapenko, Stephen Hines,
	Pirama Arumuga Nainar, Manoj Gupta

On Mon, Nov 20, 2017 at 06:05:55PM +0000, Will Deacon wrote:
> Please don't confuse this with a reluctance to support clang; I'm keen to
> see that supported, 

As an aside; as long as clang doesn't do asm-goto and asm-flag-output
(as examples of features that clang lacks and developers have at times
stated they would never support) I can't take them serious as a target
compiler.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-20 18:05                             ` Will Deacon
  2017-11-20 19:25                               ` Peter Zijlstra
@ 2017-11-20 19:28                               ` Peter Zijlstra
  2017-11-20 20:52                                 ` Paul E. McKenney
  2017-11-20 19:32                               ` Peter Zijlstra
  2 siblings, 1 reply; 103+ messages in thread
From: Peter Zijlstra @ 2017-11-20 19:28 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sami Tolvanen, Paul E. McKenney, Nick Desaulniers, Alex Matveev,
	Andi Kleen, Ard Biesheuvel, Greg Hackmann, Kees Cook,
	linux-arm-kernel, Linux Kbuild mailing list, LKML, Mark Rutland,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Yury Norov,
	Matthias Kaehlcke, Alexander Potapenko, Stephen Hines,
	Pirama Arumuga Nainar, Manoj Gupta

On Mon, Nov 20, 2017 at 06:05:55PM +0000, Will Deacon wrote:
> This is a thorny issue, but RCU (specifically rcu_dereference but probably
> also some READ_ONCEs) relies on being able to utilise syntactic dependency
> chains to order local accesses to shared variables.

Well, we used to have READ_ONCE() and smp_read_barrier_depends(), but
we recently munged them together, in the process getting rid of
lockless_dereference().

So for sure, READ_ONCE() must be able to do the address dependency
thing, otherwise tons of code comes apart.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-20 18:05                             ` Will Deacon
  2017-11-20 19:25                               ` Peter Zijlstra
  2017-11-20 19:28                               ` Peter Zijlstra
@ 2017-11-20 19:32                               ` Peter Zijlstra
  2017-11-20 20:53                                 ` Paul E. McKenney
  2 siblings, 1 reply; 103+ messages in thread
From: Peter Zijlstra @ 2017-11-20 19:32 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sami Tolvanen, Paul E. McKenney, Nick Desaulniers, Alex Matveev,
	Andi Kleen, Ard Biesheuvel, Greg Hackmann, Kees Cook,
	linux-arm-kernel, Linux Kbuild mailing list, LKML, Mark Rutland,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Yury Norov,
	Matthias Kaehlcke, Alexander Potapenko, Stephen Hines,
	Pirama Arumuga Nainar, Manoj Gupta

On Mon, Nov 20, 2017 at 06:05:55PM +0000, Will Deacon wrote:
> Although the current direction of the C++ committee is to prefer
> that dependencies are explicitly "marked", this is not deemed to be
> acceptable for the kernel (in other words, everything is always considered
> "marked").

Yeah, that is an attitude not compatible with existing code. Much like
the proposal to allow temporary/wide stores on everything not explicitly
declared atomic. Such stuff instantly breaks all extant code that does
multi-threading with no recourse.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [v2,12/18] kbuild: add support for clang LTO
  2017-11-18  3:21   ` [v2,12/18] " Nicholas Piggin
@ 2017-11-20 20:21     ` Sami Tolvanen
  2017-11-21  1:01       ` Nicholas Piggin
  0 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-20 20:21 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke

On Sat, Nov 18, 2017 at 01:21:39PM +1000, Nicholas Piggin wrote:
> Do you have any kind of numbers for this, out of curiosity? Binary
> size, performance, build time?

I don't have performance numbers to share. Are there any specific
benchmarks you'd be interested in seeing? Build time typically
increases with LTO and in my experience, binary size tends to increase
by ~10-15% as well.

> Why is this needed? It would have been nice to get rid of the
> !THIN_ARCHIVES option if you can make the patches work with the thin
> archives paths.

I believe LLVMgold doesn't know how to deal with an archive of LLVM IR
files, but I can certainly use thin archives as an index and extract
the path names for linking. I'll look into it.

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 09/18] arm64: explicitly pass --no-fix-cortex-a53-843419 to GNU gold
  2017-11-20 14:47       ` Mark Rutland
@ 2017-11-20 20:35         ` Sami Tolvanen
  2017-11-21 11:15           ` Mark Rutland
  0 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-20 20:35 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Will Deacon, Alex Matveev, Andi Kleen, Ard Biesheuvel,
	Greg Hackmann, Kees Cook, linux-arm-kernel, linux-kbuild,
	linux-kernel, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke

On Mon, Nov 20, 2017 at 02:47:20PM +0000, Mark Rutland wrote:
> However, we could instead check ld-version, produce a warning, and
> abort the build in such cases.

I believe the version number of gold didn't change in binutils 2.28.1,
where this was fixed, but we could certainly warn about older versions
of gold.

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 16/18] arm64: crypto: disable LTO for aes-ce-cipher.c
  2017-11-20 15:20   ` Mark Rutland
  2017-11-20 15:25     ` Ard Biesheuvel
@ 2017-11-20 20:51     ` Sami Tolvanen
  2017-11-20 21:29     ` Alex Matveev
  2 siblings, 0 replies; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-20 20:51 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Yury Norov, Matthias Kaehlcke

On Mon, Nov 20, 2017 at 03:20:14PM +0000, Mark Rutland wrote:
> Could you elaborate on what the integrated asembler doesn't like?

Here's the error, looks like in aes_sub:

<inline asm>:1:69: error: invalid operand for instruction
        dup     v1.4s, w12              ;movi   v0.16b, #0              ;aese   v0.16b, v1.16b          ;umov   w12, v0.4s[0]   ;
                                                                                                                     ^
LLVM ERROR: Error parsing inline asm

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-20 19:28                               ` Peter Zijlstra
@ 2017-11-20 20:52                                 ` Paul E. McKenney
  0 siblings, 0 replies; 103+ messages in thread
From: Paul E. McKenney @ 2017-11-20 20:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Sami Tolvanen, Nick Desaulniers, Alex Matveev,
	Andi Kleen, Ard Biesheuvel, Greg Hackmann, Kees Cook,
	linux-arm-kernel, Linux Kbuild mailing list, LKML, Mark Rutland,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Yury Norov,
	Matthias Kaehlcke, Alexander Potapenko, Stephen Hines,
	Pirama Arumuga Nainar, Manoj Gupta

On Mon, Nov 20, 2017 at 08:28:06PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 20, 2017 at 06:05:55PM +0000, Will Deacon wrote:
> > This is a thorny issue, but RCU (specifically rcu_dereference but probably
> > also some READ_ONCEs) relies on being able to utilise syntactic dependency
> > chains to order local accesses to shared variables.
> 
> Well, we used to have READ_ONCE() and smp_read_barrier_depends(), but
> we recently munged them together, in the process getting rid of
> lockless_dereference().
> 
> So for sure, READ_ONCE() must be able to do the address dependency
> thing, otherwise tons of code comes apart.

I do hope that they track the volatile accesses produced by READ_ONCE().
Otherwise, it would not be good to apply this to anything touching MMIO.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-20 19:32                               ` Peter Zijlstra
@ 2017-11-20 20:53                                 ` Paul E. McKenney
  2017-11-21 17:23                                   ` David Laight
  0 siblings, 1 reply; 103+ messages in thread
From: Paul E. McKenney @ 2017-11-20 20:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Sami Tolvanen, Nick Desaulniers, Alex Matveev,
	Andi Kleen, Ard Biesheuvel, Greg Hackmann, Kees Cook,
	linux-arm-kernel, Linux Kbuild mailing list, LKML, Mark Rutland,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Yury Norov,
	Matthias Kaehlcke, Alexander Potapenko, Stephen Hines,
	Pirama Arumuga Nainar, Manoj Gupta

On Mon, Nov 20, 2017 at 08:32:56PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 20, 2017 at 06:05:55PM +0000, Will Deacon wrote:
> > Although the current direction of the C++ committee is to prefer
> > that dependencies are explicitly "marked", this is not deemed to be
> > acceptable for the kernel (in other words, everything is always considered
> > "marked").
> 
> Yeah, that is an attitude not compatible with existing code. Much like
> the proposal to allow temporary/wide stores on everything not explicitly
> declared atomic. Such stuff instantly breaks all extant code that does
> multi-threading with no recourse.

If someone suggests temporary/wide stores, even on non-atomics, tell
them that the standard does not permit them to introduce data races.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 16/18] arm64: crypto: disable LTO for aes-ce-cipher.c
  2017-11-20 15:25     ` Ard Biesheuvel
@ 2017-11-20 21:01       ` Sami Tolvanen
  2017-11-21 11:47         ` Mark Rutland
  0 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-20 21:01 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Mark Rutland, Alex Matveev, Andi Kleen, Greg Hackmann, Kees Cook,
	linux-arm-kernel, linux-kbuild, linux-kernel, Masahiro Yamada,
	Maxim Kuvyrkov, Michal Marek, Nick Desaulniers, Yury Norov,
	Matthias Kaehlcke

On Mon, Nov 20, 2017 at 03:25:31PM +0000, Ard Biesheuvel wrote:
> However, under LTO this all changes, and it is no longer guaranteed
> that the NEON registers are only touched between the kernel mode
> neon begin/end calls.

LTO operates on LLVM IR, so disabling LTO for this file should make
sure there won't be any unsafe optimizations. Are there other places
in the kernel that might have this issue?

> So the correct way to fix this would be to move the asm into its own
> .S file, and call it from between the kernel_mode_neon_begin/end
> calls. That should also fix your compat issue.

Sure, that would also work.
 
Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 00/18] Add support for clang LTO
  2017-11-20 15:21 ` Mark Rutland
@ 2017-11-20 21:04   ` Sami Tolvanen
  2017-11-21 11:53     ` Mark Rutland
  0 siblings, 1 reply; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-20 21:04 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Yury Norov, Matthias Kaehlcke

On Mon, Nov 20, 2017 at 03:21:40PM +0000, Mark Rutland wrote:
> Would it be possible to split this into:
> 
> (1) (basic) arm64 clang support
> (2) gold support (no LTO)
> (3) LTO support
> 
> ... with any necessary workarounds added as-required to the relevant
> series?

Absolutely. I'll split the patches into more sensible sets.

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 16/18] arm64: crypto: disable LTO for aes-ce-cipher.c
  2017-11-20 15:20   ` Mark Rutland
  2017-11-20 15:25     ` Ard Biesheuvel
  2017-11-20 20:51     ` Sami Tolvanen
@ 2017-11-20 21:29     ` Alex Matveev
  2017-11-20 21:31       ` Ard Biesheuvel
  2 siblings, 1 reply; 103+ messages in thread
From: Alex Matveev @ 2017-11-20 21:29 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Sami Tolvanen, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Yury Norov, Matthias Kaehlcke

On Mon, 20 Nov 2017 15:20:14 +0000
Mark Rutland <mark.rutland@arm.com> wrote:

> On Wed, Nov 15, 2017 at 01:34:26PM -0800, Sami Tolvanen wrote:
> > CONFIG_LTO_CLANG requires the use of clang's integrated assembler,
> > which doesn't understand the inline assembly in aes-ce-cipher.c.
> > Disable LTO for the file to work around the issue.  
> 
> Could you elaborate on what the integrated asembler doesn't like?
> 
> It's not entirely clear at a glance, as the asm in that file doesn't
> seem to do anything that obscure.
> 
> Is it a bug?

Turns out, integrated assembler doesn't like this instruction in
aes_sub():
	umov	%w[out], v0.4s[0]

Specifically, it barks at "v0.4s[0]" part. And the way I read the spec,
it's quite correct in not accepting this argument. From UMOV
description:

UMOV <Wd>, <Vn>.<Ts>[<index>]

...

<Ts> For the 32-bit variant: is an element size specifier, encoded in
the "imm5" field. It can have the following values:
	B	when imm5 = xxxx1
	H	when imm5 = xxx10
	S	when imm5 = xx100


With "v0.s[0]" it builds fine.

Ard, since this is your code, can you comment? Feels like a typo.

Regards,
Alex

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 16/18] arm64: crypto: disable LTO for aes-ce-cipher.c
  2017-11-20 21:29     ` Alex Matveev
@ 2017-11-20 21:31       ` Ard Biesheuvel
  2017-11-20 22:03         ` Alex Matveev
  0 siblings, 1 reply; 103+ messages in thread
From: Ard Biesheuvel @ 2017-11-20 21:31 UTC (permalink / raw)
  To: Alex Matveev
  Cc: Mark Rutland, Sami Tolvanen, Andi Kleen, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Yury Norov, Matthias Kaehlcke

On 20 November 2017 at 21:29, Alex Matveev <alxmtvv@gmail.com> wrote:
> On Mon, 20 Nov 2017 15:20:14 +0000
> Mark Rutland <mark.rutland@arm.com> wrote:
>
>> On Wed, Nov 15, 2017 at 01:34:26PM -0800, Sami Tolvanen wrote:
>> > CONFIG_LTO_CLANG requires the use of clang's integrated assembler,
>> > which doesn't understand the inline assembly in aes-ce-cipher.c.
>> > Disable LTO for the file to work around the issue.
>>
>> Could you elaborate on what the integrated asembler doesn't like?
>>
>> It's not entirely clear at a glance, as the asm in that file doesn't
>> seem to do anything that obscure.
>>
>> Is it a bug?
>
> Turns out, integrated assembler doesn't like this instruction in
> aes_sub():
>         umov    %w[out], v0.4s[0]
>
> Specifically, it barks at "v0.4s[0]" part. And the way I read the spec,
> it's quite correct in not accepting this argument. From UMOV
> description:
>
> UMOV <Wd>, <Vn>.<Ts>[<index>]
>
> ...
>
> <Ts> For the 32-bit variant: is an element size specifier, encoded in
> the "imm5" field. It can have the following values:
>         B       when imm5 = xxxx1
>         H       when imm5 = xxx10
>         S       when imm5 = xx100
>
>
> With "v0.s[0]" it builds fine.
>
> Ard, since this is your code, can you comment? Feels like a typo.
>

Yep.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 16/18] arm64: crypto: disable LTO for aes-ce-cipher.c
  2017-11-20 21:31       ` Ard Biesheuvel
@ 2017-11-20 22:03         ` Alex Matveev
  0 siblings, 0 replies; 103+ messages in thread
From: Alex Matveev @ 2017-11-20 22:03 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Mark Rutland, Sami Tolvanen, Andi Kleen, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Yury Norov, Matthias Kaehlcke

Sami, this seems to be better solution for aes-ce-cipher.c problem.

Regards,
Alex


>From 6bcdd763b56ce10a77a79373a46fc0e8d9026178 Mon Sep 17 00:00:00 2001
From: Alex Matveev <alxmtvv@gmail.com>
Date: Mon, 20 Nov 2017 21:30:38 +0000
Subject: [PATCH] arm64: crypto: fix typo in aes_sub()

Clang's integrated assembler can't parse "v0.4s[0]" argument of the UMOV
instruction. And, as per ARM ARM, this is incorrect usage:

UMOV <Wd>, <Vn>.<Ts>[<index>]

...

<Ts> For the 32-bit variant: is an element size specifier, encoded in
the "imm5" field. It can have the following values:
	B	when imm5 = xxxx1
	H	when imm5 = xxx10
	S	when imm5 = xx100

Signed-off-by: Alex Matveev <alxmtvv@gmail.com>
---
 arch/arm64/crypto/aes-ce-cipher.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/crypto/aes-ce-cipher.c
b/arch/arm64/crypto/aes-ce-cipher.c index 6a75cd75ed11..e26bedee2c45
100644 --- a/arch/arm64/crypto/aes-ce-cipher.c
+++ b/arch/arm64/crypto/aes-ce-cipher.c
@@ -152,7 +152,7 @@ static u32 aes_sub(u32 input)
 	__asm__("dup	v1.4s, %w[in]		;"
 		"movi	v0.16b, #0		;"
 		"aese	v0.16b, v1.16b		;"
-		"umov	%w[out], v0.4s[0]	;"
+		"umov	%w[out], v0.s[0]	;"
 
 	:	[out]	"=r"(ret)
 	:	[in]	"r"(input)
-- 
2.15.0

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [v2,12/18] kbuild: add support for clang LTO
  2017-11-20 20:21     ` Sami Tolvanen
@ 2017-11-21  1:01       ` Nicholas Piggin
  2017-11-29 23:30         ` Sami Tolvanen
  0 siblings, 1 reply; 103+ messages in thread
From: Nicholas Piggin @ 2017-11-21  1:01 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke

On Mon, 20 Nov 2017 12:21:52 -0800
Sami Tolvanen <samitolvanen@google.com> wrote:

> On Sat, Nov 18, 2017 at 01:21:39PM +1000, Nicholas Piggin wrote:
> > Do you have any kind of numbers for this, out of curiosity? Binary
> > size, performance, build time?  
> 
> I don't have performance numbers to share. Are there any specific
> benchmarks you'd be interested in seeing? Build time typically
> increases with LTO and in my experience, binary size tends to increase
> by ~10-15% as well.

By deduction, then you must see some performance improvement? :) 

I just wonder are you doing this because there is some worthwhile
performance gain? Or to enable more testing and development of LTO?
Any clues for why a user would want to enable it.

> 
> > Why is this needed? It would have been nice to get rid of the
> > !THIN_ARCHIVES option if you can make the patches work with the thin
> > archives paths.  
> 
> I believe LLVMgold doesn't know how to deal with an archive of LLVM IR
> files, but I can certainly use thin archives as an index and extract
> the path names for linking. I'll look into it.

Thanks, if you could. Possibly file a request with LLVMgold too, it
seems to be that toolchain support for archives is quite strong, so it
will be good to keep pushing for that.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 09/18] arm64: explicitly pass --no-fix-cortex-a53-843419 to GNU gold
  2017-11-20 20:35         ` Sami Tolvanen
@ 2017-11-21 11:15           ` Mark Rutland
  0 siblings, 0 replies; 103+ messages in thread
From: Mark Rutland @ 2017-11-21 11:15 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Will Deacon, Alex Matveev, Andi Kleen, Ard Biesheuvel,
	Greg Hackmann, Kees Cook, linux-arm-kernel, linux-kbuild,
	linux-kernel, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke

On Mon, Nov 20, 2017 at 12:35:49PM -0800, Sami Tolvanen wrote:
> On Mon, Nov 20, 2017 at 02:47:20PM +0000, Mark Rutland wrote:
> > However, we could instead check ld-version, produce a warning, and
> > abort the build in such cases.
> 
> I believe the version number of gold didn't change in binutils 2.28.1,
> where this was fixed, but we could certainly warn about older versions
> of gold.

If that's the case, can we mandate gold version > 2.28.1?

The latest version seems to be 2.29.1, so there's at least one
(hopefully) working release.

That might rule out a few working gold binaries, but it would save on
unwelcome surprises.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 16/18] arm64: crypto: disable LTO for aes-ce-cipher.c
  2017-11-20 21:01       ` Sami Tolvanen
@ 2017-11-21 11:47         ` Mark Rutland
  0 siblings, 0 replies; 103+ messages in thread
From: Mark Rutland @ 2017-11-21 11:47 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Ard Biesheuvel, Alex Matveev, Andi Kleen, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Yury Norov, Matthias Kaehlcke

On Mon, Nov 20, 2017 at 01:01:43PM -0800, Sami Tolvanen wrote:
> On Mon, Nov 20, 2017 at 03:25:31PM +0000, Ard Biesheuvel wrote:
> > However, under LTO this all changes, and it is no longer guaranteed
> > that the NEON registers are only touched between the kernel mode
> > neon begin/end calls.

Just to check, I take it that the feat is that LTO can merge the
begin/asm/end, reordering bits to the begin/end relative to the asm?

AFAICT, assuming that LTO respects our compiler barriers:

* the preempt_disable() in kernel_neon_begin() should prevent the asm
  block from being moved earlier, but it looks like it could be moved
  somewhere in the middle of local_bh_enable().

* the __this_cpu_xchg() in kernel_neon_end() *isn't* ordered w.r.t the
  asm, as it doesn't have a full memory clobber, and could be
  re-ordered before the asm block.

We *could* solve this case with a barrier() at the end of
kernel_neon_begin() and the start of kernel_neon_end(), but it is a
whack-a-mole solution. :/

... this also raises the question as to how the {__,}this_cpu*() ops are
expected to be ordered w.r.t. other local operations, as that's not
clear to me even in the absence of LTO.

> LTO operates on LLVM IR, so disabling LTO for this file should make
> sure there won't be any unsafe optimizations. Are there other places
> in the kernel that might have this issue?

I suspect that as above, there are a number of places that implicitly
rely on compilation-unit boundaries enforcing (local) ordering w.r.t.
asynchronous events, as the compiler won't otherwise be able to reorder
code such as cpu-local flag manipulation.

I think we have a much bigger problem here.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 00/18] Add support for clang LTO
  2017-11-20 21:04   ` Sami Tolvanen
@ 2017-11-21 11:53     ` Mark Rutland
  0 siblings, 0 replies; 103+ messages in thread
From: Mark Rutland @ 2017-11-21 11:53 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Nick Desaulniers,
	Yury Norov, Matthias Kaehlcke

On Mon, Nov 20, 2017 at 01:04:00PM -0800, Sami Tolvanen wrote:
> On Mon, Nov 20, 2017 at 03:21:40PM +0000, Mark Rutland wrote:
> > Would it be possible to split this into:
> > 
> > (1) (basic) arm64 clang support
> > (2) gold support (no LTO)
> > (3) LTO support
> > 
> > ... with any necessary workarounds added as-required to the relevant
> > series?
> 
> Absolutely. I'll split the patches into more sensible sets.

Thanks, that's much appreciated!

Mark.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* RE: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-20 20:53                                 ` Paul E. McKenney
@ 2017-11-21 17:23                                   ` David Laight
  2017-11-21 18:51                                     ` Paul E. McKenney
  0 siblings, 1 reply; 103+ messages in thread
From: David Laight @ 2017-11-21 17:23 UTC (permalink / raw)
  To: 'paulmck@linux.vnet.ibm.com', Peter Zijlstra
  Cc: Will Deacon, Sami Tolvanen, Nick Desaulniers, Alex Matveev,
	Andi Kleen, Ard Biesheuvel, Greg Hackmann, Kees Cook,
	linux-arm-kernel, Linux Kbuild mailing list, LKML, Mark Rutland,
	Masahiro Yamada, Maxim Kuvyrkov, Michal Marek, Yury Norov,
	Matthias Kaehlcke, Alexander Potapenko, Stephen Hines,
	Pirama Arumuga Nainar, Manoj Gupta

From: Paul E. McKenney
> Sent: 20 November 2017 20:54
> 
> On Mon, Nov 20, 2017 at 08:32:56PM +0100, Peter Zijlstra wrote:
> > On Mon, Nov 20, 2017 at 06:05:55PM +0000, Will Deacon wrote:
> > > Although the current direction of the C++ committee is to prefer
> > > that dependencies are explicitly "marked", this is not deemed to be
> > > acceptable for the kernel (in other words, everything is always considered
> > > "marked").
> >
> > Yeah, that is an attitude not compatible with existing code. Much like
> > the proposal to allow temporary/wide stores on everything not explicitly
> > declared atomic. Such stuff instantly breaks all extant code that does
> > multi-threading with no recourse.
> 
> If someone suggests temporary/wide stores, even on non-atomics, tell
> them that the standard does not permit them to introduce data races.

The C standard doesn't say anything about multi-threading.

The x86 bis (bit set) family are well known for being problematic
because they always do a 32bit wide rmw cycle.

	David

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-21 17:23                                   ` David Laight
@ 2017-11-21 18:51                                     ` Paul E. McKenney
  0 siblings, 0 replies; 103+ messages in thread
From: Paul E. McKenney @ 2017-11-21 18:51 UTC (permalink / raw)
  To: David Laight
  Cc: Peter Zijlstra, Will Deacon, Sami Tolvanen, Nick Desaulniers,
	Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, Linux Kbuild mailing list, LKML,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Yury Norov, Matthias Kaehlcke, Alexander Potapenko,
	Stephen Hines, Pirama Arumuga Nainar, Manoj Gupta

On Tue, Nov 21, 2017 at 05:23:52PM +0000, David Laight wrote:
> From: Paul E. McKenney
> > Sent: 20 November 2017 20:54
> > 
> > On Mon, Nov 20, 2017 at 08:32:56PM +0100, Peter Zijlstra wrote:
> > > On Mon, Nov 20, 2017 at 06:05:55PM +0000, Will Deacon wrote:
> > > > Although the current direction of the C++ committee is to prefer
> > > > that dependencies are explicitly "marked", this is not deemed to be
> > > > acceptable for the kernel (in other words, everything is always considered
> > > > "marked").
> > >
> > > Yeah, that is an attitude not compatible with existing code. Much like
> > > the proposal to allow temporary/wide stores on everything not explicitly
> > > declared atomic. Such stuff instantly breaks all extant code that does
> > > multi-threading with no recourse.
> > 
> > If someone suggests temporary/wide stores, even on non-atomics, tell
> > them that the standard does not permit them to introduce data races.
> 
> The C standard doesn't say anything about multi-threading.

Actually, recent versions of the C standard really do cover
multi-threading, and have for some years.  For example, the June 2010
draft has this to say in section 5.1.2.4:

	Under a hosted implementation, a program can have more than one
	thread of execution (or thread) running concurrently.

Later, in paragraph 25 of this same section:

	The execution of a program contains a data race if it contains
	two conflicting actions in different threads, at least one of
	which is not atomic, and neither happens before the other. Any
	such data race results in undefined behavior.

Because the compiler is not allowed to introduce undefined behavior in a
program that does not already contain undefined behavior, the compiler
is absolutely forbidden from inventing stores unless it can prove that
doing so does not introduce a data race.

One (painful and annoying) case in which it can prove this is just before
a normal (non-volatile and non-atomic) store.

> The x86 bis (bit set) family are well known for being problematic
> because they always do a 32bit wide rmw cycle.

If the compiler is careful, it can invent atomic read-modify-write cycles
to uninvolved variables.  Here "is careful" includes ensuring that any
read from or write to one of those uninvolved variables acts just as it
would in the absence of the atomic read-modify-write cycle.

But I did say "store" above, not atomic read-modify-write operation.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-16 18:16                   ` Nick Desaulniers
  2017-11-16 18:39                     ` Paul E. McKenney
@ 2017-11-23 13:42                     ` Alexander Potapenko
  2017-11-24  7:52                       ` Dmitry Vyukov
  1 sibling, 1 reply; 103+ messages in thread
From: Alexander Potapenko @ 2017-11-23 13:42 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Paul McKenney, Peter Zijlstra, Sami Tolvanen, Will Deacon,
	Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, Linux Kbuild mailing list, LKML,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Yury Norov, Matthias Kaehlcke, Stephen Hines,
	Pirama Arumuga Nainar, Manoj Gupta, Dmitriy Vyukov,
	Andrey Konovalov

On Thu, Nov 16, 2017 at 7:16 PM, Nick Desaulniers
<ndesaulniers@google.com> wrote:
> On Thu, Nov 16, 2017 at 9:48 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
>> On Thu, Nov 16, 2017 at 06:34:17PM +0100, Peter Zijlstra wrote:
>>> On Thu, Nov 16, 2017 at 09:16:49AM -0800, Nick Desaulniers wrote:
>>> > On Thu, Nov 16, 2017 at 8:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> > > On Thu, Nov 16, 2017 at 08:50:41AM -0800, Nick Desaulniers wrote:
>>> > >> On Thu, Nov 16, 2017 at 8:30 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> > >>
>>> > >> > Ideally we'd get the toolchain people to commit to supporting the kernel
>>> > >> > memory model along side the C11 one. That would help a ton.
>>> > >>
>>> > >> Does anyone from the kernel side participate in the C standardization process?
>>> > >
>>> > > Yes, Paul McKenney and Will Deacon. Doesn't mean these two can still be
>>> > > reconciled though. From what I understand C11 (and onwards) are
>>> > > incompatible with the kernel model on a number of subtle points.
>>> >
>>> > It would be good to have these incompatibilities written down, then
>>> > for the sake of argument, they can be cited both for discussions on
>>> > LKML and in the C standardization process.  For example, a running
>>> > list in Documentation/ or something would make it so that anyone could
>>> > understand and cite current issues with the latest C standard.
>>>
>>> Will should be able to produce this list; I know he's done before, I
>>> just can't find it -- my Google-foo isn't strong today.
>>
>> Here you go:
>>
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0124r4.html
>
> Great, thanks! Will take some time to digest, but happy to refer
> others to this important work in the future.
>
> I wonder if we have anything like a case study that shows specifically
> how a compiler generated a subtle bug due to specifics of the memory
> model, like a "if you do this, here's the problematic code that will
> get generated, and why it's problematic due to the memory model."
> That may be a good way to raise issues with toolchain developers with
> concrete and actionable examples.
>
>>> > I don't understand why we'd block patches for enabling experimental
>>> > features.  We've been running this patch-set on actual devices for
>>> > months and would love to provide them to the community for further
>>> > testing.  If bugs are found, then there's more evidence to bring to
>>> > the C standards committee.  Otherwise we're shutting down feature
>>> > development for the sake of potential bugs in a C standard we're not
>>> > even using.
>>>
>>> So the problem is that its very very hard (and painful) to find these
>>> bugs. Getting the tools people to comment on these specific
>>> optimizations would really help lots.
>
> No doubt; I do not disagree with you.  Kernel developers have very
> important use cases for the language.
>
> But the core point I'm trying to make is "do we need to block this
> patch set until issues with the C standards body in regards to the
> kernels memory model are resolved?"  I would hope the two are
> orthogonal and that we'd take them and then test them even more
> extensively than the developer has in order to find out.
>
>> It would be good to get something similar to LKMM into KTSAN, for
>> example.  There would probably be a few differences due to efficiency
>> concerns, but closer is better than less close.  ;-)
>
> + glider, who may be able to comment on the state of KTSAN.
We haven't touched KTSAN for a while, so it's probably broken right now.
It should be possible to revive it, the question is how much code will
need to be annotated to prevent the tool from overwhelming the
developers with reports.
+Dima and Andrey, who should know better.


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG
  2017-11-23 13:42                     ` Alexander Potapenko
@ 2017-11-24  7:52                       ` Dmitry Vyukov
  0 siblings, 0 replies; 103+ messages in thread
From: Dmitry Vyukov @ 2017-11-24  7:52 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Nick Desaulniers, Paul McKenney, Peter Zijlstra, Sami Tolvanen,
	Will Deacon, Alex Matveev, Andi Kleen, Ard Biesheuvel,
	Greg Hackmann, Kees Cook, linux-arm-kernel,
	Linux Kbuild mailing list, LKML, Mark Rutland, Masahiro Yamada,
	Maxim Kuvyrkov, Michal Marek, Yury Norov, Matthias Kaehlcke,
	Stephen Hines, Pirama Arumuga Nainar, Manoj Gupta,
	Andrey Konovalov

On Thu, Nov 23, 2017 at 2:42 PM, Alexander Potapenko <glider@google.com> wrote:
>>>> > >> > Ideally we'd get the toolchain people to commit to supporting the kernel
>>>> > >> > memory model along side the C11 one. That would help a ton.
>>>> > >>
>>>> > >> Does anyone from the kernel side participate in the C standardization process?
>>>> > >
>>>> > > Yes, Paul McKenney and Will Deacon. Doesn't mean these two can still be
>>>> > > reconciled though. From what I understand C11 (and onwards) are
>>>> > > incompatible with the kernel model on a number of subtle points.
>>>> >
>>>> > It would be good to have these incompatibilities written down, then
>>>> > for the sake of argument, they can be cited both for discussions on
>>>> > LKML and in the C standardization process.  For example, a running
>>>> > list in Documentation/ or something would make it so that anyone could
>>>> > understand and cite current issues with the latest C standard.
>>>>
>>>> Will should be able to produce this list; I know he's done before, I
>>>> just can't find it -- my Google-foo isn't strong today.
>>>
>>> Here you go:
>>>
>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0124r4.html
>>
>> Great, thanks! Will take some time to digest, but happy to refer
>> others to this important work in the future.
>>
>> I wonder if we have anything like a case study that shows specifically
>> how a compiler generated a subtle bug due to specifics of the memory
>> model, like a "if you do this, here's the problematic code that will
>> get generated, and why it's problematic due to the memory model."
>> That may be a good way to raise issues with toolchain developers with
>> concrete and actionable examples.
>>
>>>> > I don't understand why we'd block patches for enabling experimental
>>>> > features.  We've been running this patch-set on actual devices for
>>>> > months and would love to provide them to the community for further
>>>> > testing.  If bugs are found, then there's more evidence to bring to
>>>> > the C standards committee.  Otherwise we're shutting down feature
>>>> > development for the sake of potential bugs in a C standard we're not
>>>> > even using.
>>>>
>>>> So the problem is that its very very hard (and painful) to find these
>>>> bugs. Getting the tools people to comment on these specific
>>>> optimizations would really help lots.
>>
>> No doubt; I do not disagree with you.  Kernel developers have very
>> important use cases for the language.
>>
>> But the core point I'm trying to make is "do we need to block this
>> patch set until issues with the C standards body in regards to the
>> kernels memory model are resolved?"  I would hope the two are
>> orthogonal and that we'd take them and then test them even more
>> extensively than the developer has in order to find out.
>>
>>> It would be good to get something similar to LKMM into KTSAN, for
>>> example.  There would probably be a few differences due to efficiency
>>> concerns, but closer is better than less close.  ;-)
>>
>> + glider, who may be able to comment on the state of KTSAN.
> We haven't touched KTSAN for a while, so it's probably broken right now.
> It should be possible to revive it, the question is how much code will
> need to be annotated to prevent the tool from overwhelming the
> developers with reports.
> +Dima and Andrey, who should know better.

Hi,

KTSAN checks acquire/release pairs, and that's very useful. But as far
as I understand this thread is about more subtle things and areas of
kernel/compiler tension. I afraid this in this context KTSAN is in the
same boat as compiler. Because, well, we need to write code that
implements precise algorithms. And if control-dependencies are defined
as "extreme care is required to avoid control-dependency-destroying
compiler optimizations" (that is, code is correct if it does not break
against the current set of enabled optimizations in the current
compiler, what?) and data-dependencies are defined akin to C11
definition (read -- non-implementable, unicorns); then KTSAN can't
help.

When/if C provides implementable rules for data-dependencies
(_Dependency) and that's implemented in compilers and kernel sticks to
this model, then I guess it should be possible to extract that info
from compiler and check against it in KTSAN (e.g. 2 synchronization
domains, one for dependent accesses and one for everything else).
Kernel could as well define own model, and KTSAN could check against
it. But that model must be implemented in compilers first anyway.
Because (1) doing it just for KTSAN does not look reasonable, (2)
until compiler supports that model there is little point in checking
(the fact that compiler uses a different model is the major gaping
hole and we are aware of it without tooling help).

And, yes, I agree that we should not block this LTO patch. All
problems are already there and are orthogonal to LTO. Compiler sees
enough code already (large TUs, lots of code in headers) and we move
code. I also have not seen any special rules wrt rcu and translation
units, I have not seen developers doing any additional analysis re
TUs, move code to separate files, nor I seen comments says that this
code must be in separate TU than that code. From what I see usually
it's assumed that things will just work. If anything LTO will be
useful to shake out latent bugs that will pop up later.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [v2,12/18] kbuild: add support for clang LTO
  2017-11-21  1:01       ` Nicholas Piggin
@ 2017-11-29 23:30         ` Sami Tolvanen
  0 siblings, 0 replies; 103+ messages in thread
From: Sami Tolvanen @ 2017-11-29 23:30 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Alex Matveev, Andi Kleen, Ard Biesheuvel, Greg Hackmann,
	Kees Cook, linux-arm-kernel, linux-kbuild, linux-kernel,
	Mark Rutland, Masahiro Yamada, Maxim Kuvyrkov, Michal Marek,
	Nick Desaulniers, Yury Norov, Matthias Kaehlcke

On Tue, Nov 21, 2017 at 11:01:52AM +1000, Nicholas Piggin wrote:
> I just wonder are you doing this because there is some worthwhile
> performance gain? Or to enable more testing and development of LTO?
> Any clues for why a user would want to enable it.

I'm primarily interested in CFI, which with clang requires LTO; not
for the optimizations, but for source visibility. We do expect to see
performance improvements with LTO though, especially if combined with
PGO.

> Thanks, if you could. Possibly file a request with LLVMgold too, it
> seems to be that toolchain support for archives is quite strong, so it
> will be good to keep pushing for that.

It turns out LLVMgold is fine with mixed IR/object archives, but we need
to use llvm-ar to generate symbol tables for them, and there are some
compatibility issues with objdump that I had to work around. I'll send
v3 for review once I receive some feedback for the clang/gold patches we
need first.

Sami

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v2 11/18] arm64: make mrs_s and msr_s macros work with LTO
  2017-11-16 22:54           ` Alex Matveev
@ 2017-12-04 17:34             ` Nick Desaulniers
  0 siblings, 0 replies; 103+ messages in thread
From: Nick Desaulniers @ 2017-12-04 17:34 UTC (permalink / raw)
  To: Alex Matveev
  Cc: Yury Norov, Robin Murphy, Will Deacon, Mark Rutland, Andi Kleen,
	Kees Cook, Ard Biesheuvel, Linux Kbuild mailing list, LKML,
	Greg Hackmann, Masahiro Yamada, Michal Marek, Sami Tolvanen,
	Matthias Kaehlcke, linux-arm-kernel, Maxim Kuvyrkov

On Thu, Nov 16, 2017 at 2:54 PM, Alex Matveev <alxmtvv@gmail.com> wrote:
> On Fri, 17 Nov 2017 00:29:20 +0300
> Yury Norov <ynorov@caviumnetworks.com> wrote:
>
>> On Thu, Nov 16, 2017 at 01:55:31PM +0000, Robin Murphy wrote:
>> > Given that this whole mrs_s infrastructure is a workaround for older
>> > assemblers which don't support the "S<op0>_<op1>_<Cn>_<Cm>_<op2>"
>> > syntax for arbitrary unnamed system registers (which IIRC was a
>> > fairly late addition to the architecture), the only way it could be
>> > "fixed" on the toolchain side is by removing all those older
>> > toolchains from existence. Good luck with that ;)
> commit 72c583951526
> Author: Catalin Marinas <catalin.marinas@arm.com>
> Date:   Thu Jul 24 14:14:42 2014 +0100
>
>     arm64: gicv3: Allow GICv3 compilation with older binutils
>
>     GICv3 introduces new system registers accessible with the full
>     msr/mrs syntax (e.g. mrs x0, Sop0_op1_CRm_CRn_op2). However, only
>     recent binutils understand the new syntax. This patch introduces
>     msr_s/mrs_s assembly macros which generate the equivalent
>     instructions above and converts the existing GICv3 code (both
>     drivers/irqchip/ and arch/arm64/kernel/).
>
> The question is - is it OK to drop compatibility with old versions of
> binutils (which were already "older" back in 2014)? It's not my call to
> make. If yes, then it should be possible to make this change more
> aesthetic by reverting to "S<op>" (however, it will affect more places
> as now some users of register definitions expect them to be numbers, not
> "S<op>" strings).

I don't think we found a resolution to the compatibility question
posed.  Given that the affected file is only in use for arm64, I think
the arm64 maintainers should make the call.  I encourage them to drop
support for old toolchains; the use of ld-version macros can help warn
users using old toolchains on newer kernel versions.

^ permalink raw reply	[flat|nested] 103+ messages in thread

end of thread, other threads:[~2017-12-04 17:34 UTC | newest]

Thread overview: 103+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-15 21:34 [PATCH v2 00/18] Add support for clang LTO Sami Tolvanen
2017-11-15 21:34 ` [PATCH v2 01/18] kbuild: add ld-name macro and support for GNU gold Sami Tolvanen
2017-11-15 21:53   ` Kees Cook
2017-11-15 21:34 ` [PATCH v2 02/18] kbuild: fix LD_DEAD_CODE_DATA_ELIMINATION with " Sami Tolvanen
2017-11-15 21:34 ` [PATCH v2 03/18] kbuild: move gcc-version.sh to cc-version.sh and add clang support Sami Tolvanen
2017-11-15 21:48   ` Kees Cook
2017-11-15 22:06     ` Sami Tolvanen
2017-11-15 21:34 ` [PATCH v2 04/18] arm64: use -mno-implicit-float instead of -mgeneral-regs-only Sami Tolvanen
2017-11-15 21:34 ` [PATCH v2 05/18] arm64: fix -m for GNU gold Sami Tolvanen
2017-11-16 10:55   ` Yury Norov
2017-11-16 16:55     ` Nick Desaulniers
2017-11-15 21:34 ` [PATCH v2 06/18] arm64: kvm: use -fno-jump-tables with clang Sami Tolvanen
2017-11-16 11:46   ` Will Deacon
2017-11-16 16:25     ` Sami Tolvanen
2017-11-20 14:41   ` Mark Rutland
2017-11-20 14:43     ` Mark Rutland
2017-11-20 14:47       ` Ard Biesheuvel
2017-11-15 21:34 ` [PATCH v2 07/18] arm64: keep .altinstructions and .altinstr_replacement Sami Tolvanen
2017-11-15 21:34 ` [PATCH v2 08/18] arm64: don't disable ADR_PREL_PG_HI21* with ARM64_ERRATUM_843419 Sami Tolvanen
2017-11-15 22:29   ` Ard Biesheuvel
2017-11-16 11:44     ` Will Deacon
2017-11-16 16:32       ` Sami Tolvanen
2017-11-16 16:34         ` Ard Biesheuvel
2017-11-16 21:37           ` Sami Tolvanen
2017-11-16 22:14             ` Ard Biesheuvel
2017-11-16 22:25               ` Ard Biesheuvel
2017-11-16 23:09               ` Sami Tolvanen
2017-11-16 23:20                 ` Ard Biesheuvel
2017-11-16 23:50                   ` Sami Tolvanen
2017-11-17  9:54                     ` Ard Biesheuvel
2017-11-17 18:49                       ` Sami Tolvanen
2017-11-15 21:34 ` [PATCH v2 09/18] arm64: explicitly pass --no-fix-cortex-a53-843419 to GNU gold Sami Tolvanen
2017-11-16 11:47   ` Will Deacon
2017-11-16 16:35     ` Sami Tolvanen
2017-11-20 14:47       ` Mark Rutland
2017-11-20 20:35         ` Sami Tolvanen
2017-11-21 11:15           ` Mark Rutland
2017-11-15 21:34 ` [PATCH v2 10/18] arm64: add a workaround for GNU gold with ARM64_MODULE_PLTS Sami Tolvanen
2017-11-16 11:50   ` Will Deacon
2017-11-16 16:41     ` Sami Tolvanen
2017-11-16 18:47       ` Will Deacon
2017-11-15 21:34 ` [PATCH v2 11/18] arm64: make mrs_s and msr_s macros work with LTO Sami Tolvanen
2017-11-16 11:54   ` Will Deacon
2017-11-16 13:07     ` Yury Norov
2017-11-16 13:55       ` Robin Murphy
2017-11-16 21:29         ` Yury Norov
2017-11-16 22:54           ` Alex Matveev
2017-12-04 17:34             ` Nick Desaulniers
2017-11-16 13:56       ` Segher Boessenkool
2017-11-16 16:46         ` Sami Tolvanen
2017-11-16 17:01           ` Segher Boessenkool
2017-11-16 17:11             ` Sami Tolvanen
2017-11-16 16:43     ` Sami Tolvanen
2017-11-16 16:44       ` Nick Desaulniers
2017-11-16 18:14     ` Alex Matveev
2017-11-15 21:34 ` [PATCH v2 12/18] kbuild: add support for clang LTO Sami Tolvanen
2017-11-15 22:06   ` Kees Cook
2017-11-18  3:21   ` [v2,12/18] " Nicholas Piggin
2017-11-20 20:21     ` Sami Tolvanen
2017-11-21  1:01       ` Nicholas Piggin
2017-11-29 23:30         ` Sami Tolvanen
2017-11-15 21:34 ` [PATCH v2 13/18] kbuild: fix dynamic ftrace with " Sami Tolvanen
2017-11-15 21:34 ` [PATCH v2 14/18] scripts/mod: disable LTO for empty.c Sami Tolvanen
2017-11-15 21:34 ` [PATCH v2 15/18] efi/libstub: disable LTO Sami Tolvanen
2017-11-15 21:34 ` [PATCH v2 16/18] arm64: crypto: disable LTO for aes-ce-cipher.c Sami Tolvanen
2017-11-20 15:20   ` Mark Rutland
2017-11-20 15:25     ` Ard Biesheuvel
2017-11-20 21:01       ` Sami Tolvanen
2017-11-21 11:47         ` Mark Rutland
2017-11-20 20:51     ` Sami Tolvanen
2017-11-20 21:29     ` Alex Matveev
2017-11-20 21:31       ` Ard Biesheuvel
2017-11-20 22:03         ` Alex Matveev
2017-11-15 21:34 ` [PATCH v2 17/18] arm64: disable RANDOMIZE_MODULE_REGION_FULL with LTO_CLANG Sami Tolvanen
2017-11-15 21:34 ` [PATCH v2 18/18] arm64: select ARCH_SUPPORTS_LTO_CLANG Sami Tolvanen
2017-11-16 11:58   ` Will Deacon
2017-11-16 16:17     ` Sami Tolvanen
2017-11-16 16:30       ` Peter Zijlstra
2017-11-16 16:50         ` Nick Desaulniers
2017-11-16 16:59           ` Peter Zijlstra
2017-11-16 17:16             ` Nick Desaulniers
2017-11-16 17:34               ` Peter Zijlstra
2017-11-16 17:48                 ` Paul E. McKenney
2017-11-16 18:16                   ` Nick Desaulniers
2017-11-16 18:39                     ` Paul E. McKenney
2017-11-16 18:45                       ` Will Deacon
2017-11-16 19:13                         ` Paul E. McKenney
2017-11-16 20:17                           ` Sami Tolvanen
2017-11-20 18:05                             ` Will Deacon
2017-11-20 19:25                               ` Peter Zijlstra
2017-11-20 19:28                               ` Peter Zijlstra
2017-11-20 20:52                                 ` Paul E. McKenney
2017-11-20 19:32                               ` Peter Zijlstra
2017-11-20 20:53                                 ` Paul E. McKenney
2017-11-21 17:23                                   ` David Laight
2017-11-21 18:51                                     ` Paul E. McKenney
2017-11-23 13:42                     ` Alexander Potapenko
2017-11-24  7:52                       ` Dmitry Vyukov
2017-11-16 20:53 ` [PATCH v2 00/18] Add support for clang LTO Yury Norov
2017-11-16 21:38   ` Sami Tolvanen
2017-11-20 15:21 ` Mark Rutland
2017-11-20 21:04   ` Sami Tolvanen
2017-11-21 11:53     ` Mark Rutland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).