linux-pci.vger.kernel.org archive mirror
* [PATCH v8 00/16] Add support for Clang LTO
@ 2020-12-01 21:36 Sami Tolvanen
  2020-12-01 21:36 ` [PATCH v8 01/16] tracing: move function tracer options to Kconfig Sami Tolvanen
                   ` (18 more replies)
  0 siblings, 19 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-01 21:36 UTC
  To: Masahiro Yamada, Steven Rostedt, Will Deacon
  Cc: Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	kernel-hardening, linux-arch, linux-arm-kernel, linux-kbuild,
	linux-kernel, linux-pci, Sami Tolvanen

This patch series adds support for building the kernel with Clang's
Link Time Optimization (LTO). Besides improving performance, LTO is
required for Clang's Control-Flow Integrity (CFI), and enabling CFI
in the kernel is the primary motivation for this series. Google has
shipped millions of Pixel devices running three major kernel versions
with LTO+CFI since 2018.

Most of the patches are build system changes for handling LLVM
bitcode, which Clang produces with LTO instead of ELF object files,
postponing ELF processing until a later stage, and ensuring initcall
ordering.
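
As an illustration of why the build system has to tell the two
formats apart, here is a minimal sketch that classifies an object
file by its leading magic bytes. The helper name obj_kind is
hypothetical and not part of this series; the sketch only looks at
raw files and does not handle archives or the bitcode wrapper format.

```shell
#!/bin/sh
# Classify an object file by its first four bytes.
#   ELF:          7f 45 4c 46  ("\177ELF")
#   LLVM bitcode: 42 43 c0 de  ("BC" 0xC0DE)
# obj_kind is a hypothetical helper, not something this series adds.
obj_kind() {
	# Dump the first four bytes as hex and strip od's whitespace.
	magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
	case "$magic" in
	7f454c46) echo elf ;;
	4243c0de) echo bitcode ;;
	*)        echo unknown ;;
	esac
}
```

With -flto, running obj_kind on a freshly compiled .o would be
expected to report bitcode rather than elf.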

Note that arm64 support depends on Will's memory ordering patches
[1]. I will post x86_64 patches separately after we have fixed the
remaining objtool warnings [2][3].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto
[2] https://lore.kernel.org/lkml/20201120040424.a3wctajzft4ufoiw@treble/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=objtool-vmlinux

You can also pull this series from

  https://github.com/samitolvanen/linux.git lto-v8

---
Changes in v8:

  - Cleaned up the LTO Kconfig options based on suggestions from
    Nick and Kees.

  - Dropped the patch to disable LTO for the arm64 nVHE KVM code as
    David pointed out it's not needed anymore.

Changes in v7:

  - Rebased to master again.

  - Added back arm64 patches as the prerequisites are now staged,
    and dropped x86_64 support until the remaining objtool issues
    are resolved.

  - Dropped ifdefs from module.lds.S.

Changes in v6:

  - Added the missing --mcount flag to patch 5.

  - Dropped the arm64 patches from this series and will repost them
    later.

Changes in v5:

  - Rebased on top of tip/master.

  - Changed the command line for objtool to use --vmlinux --duplicate
    to disable warnings about retpoline thunks and to fix .orc_unwind
    generation for vmlinux.o.

  - Added --noinstr flag to objtool, so we can use --vmlinux without
    also enabling noinstr validation.

  - Disabled objtool's unreachable instruction warnings with LTO to
    disable false positives for the int3 padding in vmlinux.o.

  - Added ANNOTATE_RETPOLINE_SAFE annotations to the indirect jumps
    in x86 assembly code to fix objtool warnings with retpoline.

  - Fixed modpost warnings about missing version information with
    CONFIG_MODVERSIONS.

  - Included Makefile.lib into Makefile.modpost for ld_flags. Thanks
    to Sedat for pointing this out.

  - Updated the help text for ThinLTO to better explain the trade-offs.

  - Updated commit messages with better explanations.

Changes in v4:

  - Fixed a typo in Makefile.lib to correctly pass --no-fp to objtool.

  - Moved ftrace configs related to generating __mcount_loc to Kconfig,
    so they are available also in Makefile.modfinal.

  - Dropped two prerequisite patches that were merged to Linus' tree.

Changes in v3:

  - Added a separate patch to remove the unused DISABLE_LTO treewide,
    as filtering out CC_FLAGS_LTO instead is preferred.

  - Updated the Kconfig help to explain why LTO is behind a choice
    and disabled by default.

  - Dropped CC_FLAGS_LTO_CLANG, compiler-specific LTO flags are now
    appended directly to CC_FLAGS_LTO.

  - Updated $(AR) flags as KBUILD_ARFLAGS was removed earlier.

  - Fixed ThinLTO cache handling for external module builds.

  - Rebased on top of Masahiro's patch for preprocessing modules.lds,
    and moved the contents of module-lto.lds to modules.lds.S.

  - Moved objtool_args to Makefile.lib to avoid duplication of the
    command line parameters in Makefile.modfinal.

  - Clarified in the commit message for the initcall ordering patch
    that the initcall order remains the same as without LTO.

  - Changed link-vmlinux.sh to use jobserver-exec to control the
    number of jobs started by generate_initcall_ordering.pl.

  - Dropped the x86/relocs patch to whitelist L4_PAGE_OFFSET as it's
    no longer needed with ToT kernel.

  - Disabled LTO for arch/x86/power/cpu.c to work around a Clang bug
    with stack protector attributes.

Changes in v2:

  - Fixed -Wmissing-prototypes warnings with W=1.

  - Dropped cc-option from -fsplit-lto-unit and added .thinlto-cache
    scrubbing to make distclean.

  - Added a comment about Clang >=11 being required.

  - Added a patch to disable LTO for the arm64 KVM nVHE code.

  - Disabled objtool's noinstr validation with LTO unless enabled.

  - Included Peter's proposed objtool mcount patch in the series
    and replaced recordmcount with the objtool pass to avoid
    whitelisting relocations that are not calls.

  - Updated several commit messages with better explanations.


Sami Tolvanen (16):
  tracing: move function tracer options to Kconfig
  kbuild: add support for Clang LTO
  kbuild: lto: fix module versioning
  kbuild: lto: limit inlining
  kbuild: lto: merge module sections
  kbuild: lto: remove duplicate dependencies from .mod files
  init: lto: ensure initcall ordering
  init: lto: fix PREL32 relocations
  PCI: Fix PREL32 relocations for LTO
  modpost: lto: strip .lto from module names
  scripts/mod: disable LTO for empty.c
  efi/libstub: disable LTO
  drivers/misc/lkdtm: disable LTO for rodata.o
  arm64: vdso: disable LTO
  arm64: disable recordmcount with DYNAMIC_FTRACE_WITH_REGS
  arm64: allow LTO to be selected

 .gitignore                            |   1 +
 Makefile                              |  45 +++--
 arch/Kconfig                          |  87 +++++++++
 arch/arm64/Kconfig                    |   4 +
 arch/arm64/kernel/vdso/Makefile       |   3 +-
 drivers/firmware/efi/libstub/Makefile |   2 +
 drivers/misc/lkdtm/Makefile           |   1 +
 include/asm-generic/vmlinux.lds.h     |  11 +-
 include/linux/init.h                  |  79 +++++++-
 include/linux/pci.h                   |  19 +-
 kernel/trace/Kconfig                  |  16 ++
 scripts/Makefile.build                |  50 ++++-
 scripts/Makefile.lib                  |   6 +-
 scripts/Makefile.modfinal             |   9 +-
 scripts/Makefile.modpost              |  25 ++-
 scripts/generate_initcall_order.pl    | 270 ++++++++++++++++++++++++++
 scripts/link-vmlinux.sh               |  70 ++++++-
 scripts/mod/Makefile                  |   1 +
 scripts/mod/modpost.c                 |  16 +-
 scripts/mod/modpost.h                 |   9 +
 scripts/mod/sumversion.c              |   6 +-
 scripts/module.lds.S                  |  24 +++
 22 files changed, 688 insertions(+), 66 deletions(-)
 create mode 100755 scripts/generate_initcall_order.pl


base-commit: b65054597872ce3aefbc6a666385eabdf9e288da
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v8 01/16] tracing: move function tracer options to Kconfig
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
@ 2020-12-01 21:36 ` Sami Tolvanen
  2020-12-01 21:47   ` Steven Rostedt
  2020-12-01 21:36 ` [PATCH v8 02/16] kbuild: add support for Clang LTO Sami Tolvanen
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-01 21:36 UTC
  To: Masahiro Yamada, Steven Rostedt, Will Deacon
  Cc: Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	kernel-hardening, linux-arch, linux-arm-kernel, linux-kbuild,
	linux-kernel, linux-pci, Sami Tolvanen

Move function tracer options to Kconfig to make it easier to add
new methods for generating __mcount_loc, and to make the options
available also when building kernel modules.

Note that the FTRACE_MCOUNT_USE_* options are updated on rebuild and
therefore work even if the .config was generated in a different
environment.
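
The selection order that the new Kconfig entries encode can be
sketched as a small shell function. This is only an illustration of
the dependency chain in the kernel/trace/Kconfig hunk below; the
function name mcount_method and its "y"/"n" arguments are
hypothetical, and the sketch assumes that
FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY is selected elsewhere.

```shell
#!/bin/sh
# mcount_method RECORD PATCHABLE CC_HAS_MRECORD
# Each argument is "y" or "n":
#   $1  FTRACE_MCOUNT_RECORD
#   $2  FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
#   $3  whether $(cc-option,-mrecord-mcount) succeeds
# Prints which method would generate __mcount_loc.
mcount_method() {
	# All three FTRACE_MCOUNT_USE_* options depend on FTRACE_MCOUNT_RECORD.
	[ "$1" = y ] || { echo none; return; }
	if [ "$2" = y ]; then
		echo patchable-function-entry
	elif [ "$3" = y ]; then
		# FTRACE_MCOUNT_USE_CC: the compiler emits the table itself.
		echo cc
	else
		# FTRACE_MCOUNT_USE_RECORDMCOUNT: fall back to recordmcount.
		echo recordmcount
	fi
}
```

Because the compiler check is evaluated by Kconfig, the result
follows the toolchain used for the current build.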

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 Makefile               | 20 ++++++++------------
 kernel/trace/Kconfig   | 16 ++++++++++++++++
 scripts/Makefile.build |  6 ++----
 3 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/Makefile b/Makefile
index 43ecedeb3f02..16b7f0890e75 100644
--- a/Makefile
+++ b/Makefile
@@ -849,12 +849,8 @@ KBUILD_CFLAGS += $(DEBUG_CFLAGS)
 export DEBUG_CFLAGS
 
 ifdef CONFIG_FUNCTION_TRACER
-ifdef CONFIG_FTRACE_MCOUNT_RECORD
-  # gcc 5 supports generating the mcount tables directly
-  ifeq ($(call cc-option-yn,-mrecord-mcount),y)
-    CC_FLAGS_FTRACE	+= -mrecord-mcount
-    export CC_USING_RECORD_MCOUNT := 1
-  endif
+ifdef CONFIG_FTRACE_MCOUNT_USE_CC
+  CC_FLAGS_FTRACE	+= -mrecord-mcount
   ifdef CONFIG_HAVE_NOP_MCOUNT
     ifeq ($(call cc-option-yn, -mnop-mcount),y)
       CC_FLAGS_FTRACE	+= -mnop-mcount
@@ -862,6 +858,12 @@ ifdef CONFIG_FTRACE_MCOUNT_RECORD
     endif
   endif
 endif
+ifdef CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT
+  ifdef CONFIG_HAVE_C_RECORDMCOUNT
+    BUILD_C_RECORDMCOUNT := y
+    export BUILD_C_RECORDMCOUNT
+  endif
+endif
 ifdef CONFIG_HAVE_FENTRY
   ifeq ($(call cc-option-yn, -mfentry),y)
     CC_FLAGS_FTRACE	+= -mfentry
@@ -871,12 +873,6 @@ endif
 export CC_FLAGS_FTRACE
 KBUILD_CFLAGS	+= $(CC_FLAGS_FTRACE) $(CC_FLAGS_USING)
 KBUILD_AFLAGS	+= $(CC_FLAGS_USING)
-ifdef CONFIG_DYNAMIC_FTRACE
-	ifdef CONFIG_HAVE_C_RECORDMCOUNT
-		BUILD_C_RECORDMCOUNT := y
-		export BUILD_C_RECORDMCOUNT
-	endif
-endif
 endif
 
 # We trigger additional mismatches with less inlining
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index a4020c0b4508..927ad004888a 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -595,6 +595,22 @@ config FTRACE_MCOUNT_RECORD
 	depends on DYNAMIC_FTRACE
 	depends on HAVE_FTRACE_MCOUNT_RECORD
 
+config FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
+	bool
+	depends on FTRACE_MCOUNT_RECORD
+
+config FTRACE_MCOUNT_USE_CC
+	def_bool y
+	depends on $(cc-option,-mrecord-mcount)
+	depends on !FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
+	depends on FTRACE_MCOUNT_RECORD
+
+config FTRACE_MCOUNT_USE_RECORDMCOUNT
+	def_bool y
+	depends on !FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
+	depends on !FTRACE_MCOUNT_USE_CC
+	depends on FTRACE_MCOUNT_RECORD
+
 config TRACING_MAP
 	bool
 	depends on ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index ae647379b579..2175ddb1ee0c 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -178,8 +178,7 @@ cmd_modversions_c =								\
 	fi
 endif
 
-ifdef CONFIG_FTRACE_MCOUNT_RECORD
-ifndef CC_USING_RECORD_MCOUNT
+ifdef CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT
 # compiler will not generate __mcount_loc use recordmcount or recordmcount.pl
 ifdef BUILD_C_RECORDMCOUNT
 ifeq ("$(origin RECORDMCOUNT_WARN)", "command line")
@@ -206,8 +205,7 @@ recordmcount_source := $(srctree)/scripts/recordmcount.pl
 endif # BUILD_C_RECORDMCOUNT
 cmd_record_mcount = $(if $(findstring $(strip $(CC_FLAGS_FTRACE)),$(_c_flags)),	\
 	$(sub_cmd_record_mcount))
-endif # CC_USING_RECORD_MCOUNT
-endif # CONFIG_FTRACE_MCOUNT_RECORD
+endif # CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT
 
 ifdef CONFIG_STACK_VALIDATION
 ifneq ($(SKIP_STACK_VALIDATION),1)
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v8 02/16] kbuild: add support for Clang LTO
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
  2020-12-01 21:36 ` [PATCH v8 01/16] tracing: move function tracer options to Kconfig Sami Tolvanen
@ 2020-12-01 21:36 ` Sami Tolvanen
  2020-12-02  2:59   ` Masahiro Yamada
  2020-12-03  0:07   ` Nick Desaulniers
  2020-12-01 21:36 ` [PATCH v8 03/16] kbuild: lto: fix module versioning Sami Tolvanen
                   ` (16 subsequent siblings)
  18 siblings, 2 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-01 21:36 UTC
  To: Masahiro Yamada, Steven Rostedt, Will Deacon
  Cc: Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	kernel-hardening, linux-arch, linux-arm-kernel, linux-kbuild,
	linux-kernel, linux-pci, Sami Tolvanen

This change adds build system support for Clang's Link Time
Optimization (LTO). With -flto, instead of ELF object files, Clang
produces LLVM bitcode, which is compiled into native code at link
time, allowing the final binary to be optimized globally. For more
details, see:

  https://llvm.org/docs/LinkTimeOptimization.html

LTO is configured through a Kconfig choice, which defaults to
LTO_NONE (LTO disabled). The LTO_CLANG_FULL and LTO_CLANG_THIN
entries select CONFIG_LTO_CLANG. To use LTO, the architecture
must select ARCH_SUPPORTS_LTO_CLANG and support:

  - compiling with Clang,
  - compiling inline assembly with Clang's integrated assembler,
  - and linking with LLD.

While using full LTO results in the best runtime performance, the
compilation is not scalable in time or memory. CONFIG_LTO_CLANG_THIN
enables ThinLTO, which allows parallel optimization and faster
incremental builds. ThinLTO is available only if the architecture
also selects ARCH_SUPPORTS_LTO_CLANG_THIN:

  https://clang.llvm.org/docs/ThinLTO.html
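
The difference between the two modes comes down to a few compiler
flags. The following sketch restates the Makefile hunk of this patch
as a shell function; lto_cflags and its mode names are hypothetical
and only meant to show which flags end up in CC_FLAGS_LTO.

```shell
#!/bin/sh
# lto_cflags MODE, where MODE is "thin", "full", or "none".
# Prints the flags that the Makefile hunk in this patch appends to
# CC_FLAGS_LTO for the corresponding Kconfig choice.
lto_cflags() {
	case "$1" in
	thin) echo "-flto=thin -fsplit-lto-unit -fvisibility=default" ;;
	full) echo "-flto -fvisibility=default" ;;
	none) echo "" ;;
	*)    echo "unknown LTO mode: $1" >&2; return 1 ;;
	esac
}
```

With ThinLTO the Makefile additionally passes --thinlto-cache-dir to
the linker through KBUILD_LDFLAGS, which the sketch leaves out.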

To enable LTO, LLVM tools must be used to handle bitcode files. The
easiest way is to pass the LLVM=1 option to make:

  $ make LLVM=1 defconfig
  $ scripts/config -e LTO_CLANG_THIN
  $ make LLVM=1

Alternatively, at least the following LLVM tools must be used:

  CC=clang LD=ld.lld AR=llvm-ar NM=llvm-nm
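
Before configuring, it can help to confirm that the selected tools
really are the LLVM ones. The sketch below applies the same test the
HAS_LTO_CLANG entry in this patch uses for $(NM) and $(AR), namely
whether the first line of --help mentions "llvm". The helper names
is_llvm_tool and check_lto_tools are hypothetical, and the llvm-nm
and llvm-ar defaults are assumptions.

```shell
#!/bin/sh
# is_llvm_tool TOOL
# Succeeds if the first line of "TOOL --help" mentions llvm.
is_llvm_tool() {
	"$1" --help 2>/dev/null | head -n 1 | grep -qi llvm
}

# check_lto_tools
# Checks $NM and $AR, falling back to llvm-nm and llvm-ar when the
# variables are unset. Returns non-zero on the first failing tool.
check_lto_tools() {
	for tool in "${NM:-llvm-nm}" "${AR:-llvm-ar}"; do
		if ! is_llvm_tool "$tool"; then
			echo "$tool does not look like an LLVM tool" >&2
			return 1
		fi
	done
}
```

If check_lto_tools fails, HAS_LTO_CLANG would be expected to stay
disabled and the LTO choices would not be offered.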

To prepare for LTO support with other compilers, common parts are
gated behind the CONFIG_LTO option, and LTO can be disabled for
specific files by filtering out CC_FLAGS_LTO.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 Makefile                          | 19 ++++++-
 arch/Kconfig                      | 88 +++++++++++++++++++++++++++++++
 include/asm-generic/vmlinux.lds.h | 11 ++--
 scripts/Makefile.build            |  9 +++-
 scripts/Makefile.modfinal         |  9 +++-
 scripts/Makefile.modpost          | 21 +++++++-
 scripts/link-vmlinux.sh           | 32 ++++++++---
 7 files changed, 171 insertions(+), 18 deletions(-)

diff --git a/Makefile b/Makefile
index 16b7f0890e75..f5cac2428efc 100644
--- a/Makefile
+++ b/Makefile
@@ -891,6 +891,21 @@ KBUILD_CFLAGS	+= $(CC_FLAGS_SCS)
 export CC_FLAGS_SCS
 endif
 
+ifdef CONFIG_LTO_CLANG
+ifdef CONFIG_LTO_CLANG_THIN
+CC_FLAGS_LTO	+= -flto=thin -fsplit-lto-unit
+KBUILD_LDFLAGS	+= --thinlto-cache-dir=$(extmod-prefix).thinlto-cache
+else
+CC_FLAGS_LTO	+= -flto
+endif
+CC_FLAGS_LTO	+= -fvisibility=default
+endif
+
+ifdef CONFIG_LTO
+KBUILD_CFLAGS	+= $(CC_FLAGS_LTO)
+export CC_FLAGS_LTO
+endif
+
 ifdef CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_32B
 KBUILD_CFLAGS += -falign-functions=32
 endif
@@ -1471,7 +1486,7 @@ MRPROPER_FILES += include/config include/generated          \
 		  *.spec
 
 # Directories & files removed with 'make distclean'
-DISTCLEAN_FILES += tags TAGS cscope* GPATH GTAGS GRTAGS GSYMS
+DISTCLEAN_FILES += tags TAGS cscope* GPATH GTAGS GRTAGS GSYMS .thinlto-cache
 
 # clean - Delete most, but leave enough to build external modules
 #
@@ -1717,7 +1732,7 @@ PHONY += compile_commands.json
 
 clean-dirs := $(KBUILD_EXTMOD)
 clean: rm-files := $(KBUILD_EXTMOD)/Module.symvers $(KBUILD_EXTMOD)/modules.nsdeps \
-	$(KBUILD_EXTMOD)/compile_commands.json
+	$(KBUILD_EXTMOD)/compile_commands.json $(KBUILD_EXTMOD)/.thinlto-cache
 
 PHONY += help
 help:
diff --git a/arch/Kconfig b/arch/Kconfig
index 56b6ccc0e32d..30907b554451 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -598,6 +598,94 @@ config SHADOW_CALL_STACK
 	  reading and writing arbitrary memory may be able to locate them
 	  and hijack control flow by modifying the stacks.
 
+config LTO
+	bool
+	help
+	  Selected if the kernel will be built using the compiler's LTO feature.
+
+config LTO_CLANG
+	bool
+	select LTO
+	help
+	  Selected if the kernel will be built using Clang's LTO feature.
+
+config ARCH_SUPPORTS_LTO_CLANG
+	bool
+	help
+	  An architecture should select this option if it supports:
+	  - compiling with Clang,
+	  - compiling inline assembly with Clang's integrated assembler,
+	  - and linking with LLD.
+
+config ARCH_SUPPORTS_LTO_CLANG_THIN
+	bool
+	help
+	  An architecture should select this option if it can support Clang's
+	  ThinLTO mode.
+
+config HAS_LTO_CLANG
+	def_bool y
+	# Clang >= 11: https://github.com/ClangBuiltLinux/linux/issues/510
+	depends on CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD
+	depends on $(success,$(NM) --help | head -n 1 | grep -qi llvm)
+	depends on $(success,$(AR) --help | head -n 1 | grep -qi llvm)
+	depends on ARCH_SUPPORTS_LTO_CLANG
+	depends on !FTRACE_MCOUNT_USE_RECORDMCOUNT
+	depends on !KASAN
+	depends on !GCOV_KERNEL
+	depends on !MODVERSIONS
+	help
+	  The compiler and Kconfig options support building with Clang's
+	  LTO.
+
+choice
+	prompt "Link Time Optimization (LTO)"
+	default LTO_NONE
+	help
+	  This option enables Link Time Optimization (LTO), which allows the
+	  compiler to optimize binaries globally.
+
+	  If unsure, select LTO_NONE. Note that LTO is very resource-intensive
+	  so it's disabled by default.
+
+config LTO_NONE
+	bool "None"
+	help
+	  Build the kernel normally, without Link Time Optimization (LTO).
+
+config LTO_CLANG_FULL
+	bool "Clang Full LTO (EXPERIMENTAL)"
+	depends on HAS_LTO_CLANG
+	select LTO_CLANG
+	help
+	  This option enables Clang's full Link Time Optimization (LTO), which
+	  allows the compiler to optimize the kernel globally. If you enable
+	  this option, the compiler generates LLVM bitcode instead of ELF
+	  object files, and the actual compilation from bitcode happens at
+	  the LTO link step, which may take several minutes depending on the
+	  kernel configuration. More information can be found from LLVM's
+	  documentation:
+
+	    https://llvm.org/docs/LinkTimeOptimization.html
+
+	  During link time, this option can use a large amount of RAM, and
+	  may take much longer than the ThinLTO option.
+
+config LTO_CLANG_THIN
+	bool "Clang ThinLTO (EXPERIMENTAL)"
+	depends on HAS_LTO_CLANG && ARCH_SUPPORTS_LTO_CLANG_THIN
+	select LTO_CLANG
+	help
+	  This option enables Clang's ThinLTO, which allows for parallel
+	  optimization and faster incremental compiles compared to the
+	  CONFIG_LTO_CLANG_FULL option. More information can be found
+	  from Clang's documentation:
+
+	    https://clang.llvm.org/docs/ThinLTO.html
+
+	  If unsure, say Y.
+endchoice
+
 config HAVE_ARCH_WITHIN_STACK_FRAMES
 	bool
 	help
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535..8988a2e445d8 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -90,15 +90,18 @@
  * .data. We don't want to pull in .data..other sections, which Linux
  * has defined. Same for text and bss.
  *
+ * With LTO_CLANG, the linker also splits sections by default, so we need
+ * these macros to combine the sections during the final link.
+ *
  * RODATA_MAIN is not used because existing code already defines .rodata.x
  * sections to be brought in with rodata.
  */
-#ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
+#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG)
 #define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
-#define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..LPBX*
+#define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..L* .data..compoundliteral*
 #define SDATA_MAIN .sdata .sdata.[0-9a-zA-Z_]*
-#define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]*
-#define BSS_MAIN .bss .bss.[0-9a-zA-Z_]*
+#define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]* .rodata..L*
+#define BSS_MAIN .bss .bss.[0-9a-zA-Z_]* .bss..compoundliteral*
 #define SBSS_MAIN .sbss .sbss.[0-9a-zA-Z_]*
 #else
 #define TEXT_MAIN .text
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 2175ddb1ee0c..ed74b2f986f7 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -111,7 +111,7 @@ endif
 # ---------------------------------------------------------------------------
 
 quiet_cmd_cc_s_c = CC $(quiet_modtag)  $@
-      cmd_cc_s_c = $(CC) $(filter-out $(DEBUG_CFLAGS), $(c_flags)) -fverbose-asm -S -o $@ $<
+      cmd_cc_s_c = $(CC) $(filter-out $(DEBUG_CFLAGS) $(CC_FLAGS_LTO), $(c_flags)) -fverbose-asm -S -o $@ $<
 
 $(obj)/%.s: $(src)/%.c FORCE
 	$(call if_changed_dep,cc_s_c)
@@ -425,8 +425,15 @@ $(obj)/lib.a: $(lib-y) FORCE
 # Do not replace $(filter %.o,^) with $(real-prereqs). When a single object
 # module is turned into a multi object module, $^ will contain header file
 # dependencies recorded in the .*.cmd file.
+ifdef CONFIG_LTO_CLANG
+quiet_cmd_link_multi-m = AR [M]  $@
+cmd_link_multi-m =						\
+	rm -f $@; 						\
+	$(AR) cDPrsT $@ $(filter %.o,$^)
+else
 quiet_cmd_link_multi-m = LD [M]  $@
       cmd_link_multi-m = $(LD) $(ld_flags) -r -o $@ $(filter %.o,$^)
+endif
 
 $(multi-used-m): FORCE
 	$(call if_changed,link_multi-m)
diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index ae01baf96f4e..2cb9a1d88434 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -6,6 +6,7 @@
 PHONY := __modfinal
 __modfinal:
 
+include $(objtree)/include/config/auto.conf
 include $(srctree)/scripts/Kbuild.include
 
 # for c_flags
@@ -29,6 +30,12 @@ quiet_cmd_cc_o_c = CC [M]  $@
 
 ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
 
+ifdef CONFIG_LTO_CLANG
+# With CONFIG_LTO_CLANG, reuse the object file we compiled for modpost to
+# avoid a second slow LTO link
+prelink-ext := .lto
+endif
+
 quiet_cmd_ld_ko_o = LD [M]  $@
       cmd_ld_ko_o =                                                     \
 	$(LD) -r $(KBUILD_LDFLAGS)					\
@@ -36,7 +43,7 @@ quiet_cmd_ld_ko_o = LD [M]  $@
 		-T scripts/module.lds -o $@ $(filter %.o, $^);		\
 	$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
 
-$(modules): %.ko: %.o %.mod.o scripts/module.lds FORCE
+$(modules): %.ko: %$(prelink-ext).o %.mod.o scripts/module.lds FORCE
 	+$(call if_changed,ld_ko_o)
 
 targets += $(modules) $(modules:.ko=.mod.o)
diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
index f54b6ac37ac2..9ff8bfdb574d 100644
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -43,6 +43,9 @@ __modpost:
 include include/config/auto.conf
 include scripts/Kbuild.include
 
+# for ld_flags
+include scripts/Makefile.lib
+
 MODPOST = scripts/mod/modpost								\
 	$(if $(CONFIG_MODVERSIONS),-m)							\
 	$(if $(CONFIG_MODULE_SRCVERSION_ALL),-a)					\
@@ -102,12 +105,26 @@ $(input-symdump):
 	@echo >&2 'WARNING: Symbol version dump "$@" is missing.'
 	@echo >&2 '         Modules may not have dependencies or modversions.'
 
+ifdef CONFIG_LTO_CLANG
+# With CONFIG_LTO_CLANG, .o files might be LLVM bitcode, so we need to run
+# LTO to compile them into native code before running modpost
+prelink-ext := .lto
+
+quiet_cmd_cc_lto_link_modules = LTO [M] $@
+cmd_cc_lto_link_modules = $(LD) $(ld_flags) -r -o $@ --whole-archive $^
+
+%.lto.o: %.o
+	$(call if_changed,cc_lto_link_modules)
+endif
+
+modules := $(sort $(shell cat $(MODORDER)))
+
 # Read out modules.order to pass in modpost.
 # Otherwise, allmodconfig would fail with "Argument list too long".
 quiet_cmd_modpost = MODPOST $@
-      cmd_modpost = sed 's/ko$$/o/' $< | $(MODPOST) -T -
+      cmd_modpost = sed 's/\.ko$$/$(prelink-ext)\.o/' $< | $(MODPOST) -T -
 
-$(output-symdump): $(MODORDER) $(input-symdump) FORCE
+$(output-symdump): $(MODORDER) $(input-symdump) $(modules:.ko=$(prelink-ext).o) FORCE
 	$(call if_changed,modpost)
 
 targets += $(output-symdump)
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 6eded325c837..596507573a48 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -56,6 +56,14 @@ modpost_link()
 		${KBUILD_VMLINUX_LIBS}				\
 		--end-group"
 
+	if [ -n "${CONFIG_LTO_CLANG}" ]; then
+		# This might take a while, so indicate that we're doing
+		# an LTO link
+		info LTO ${1}
+	else
+		info LD ${1}
+	fi
+
 	${LD} ${KBUILD_LDFLAGS} -r -o ${1} ${objects}
 }
 
@@ -103,13 +111,22 @@ vmlinux_link()
 	fi
 
 	if [ "${SRCARCH}" != "um" ]; then
-		objects="--whole-archive			\
-			${KBUILD_VMLINUX_OBJS}			\
-			--no-whole-archive			\
-			--start-group				\
-			${KBUILD_VMLINUX_LIBS}			\
-			--end-group				\
-			${@}"
+		if [ -n "${CONFIG_LTO_CLANG}" ]; then
+			# Use vmlinux.o instead of performing the slow LTO
+			# link again.
+			objects="--whole-archive		\
+				vmlinux.o 			\
+				--no-whole-archive		\
+				${@}"
+		else
+			objects="--whole-archive		\
+				${KBUILD_VMLINUX_OBJS}		\
+				--no-whole-archive		\
+				--start-group			\
+				${KBUILD_VMLINUX_LIBS}		\
+				--end-group			\
+				${@}"
+		fi
 
 		${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux}	\
 			${strip_debug#-Wl,}			\
@@ -274,7 +291,6 @@ fi;
 ${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init need-builtin=1
 
 #link vmlinux.o
-info LD vmlinux.o
 modpost_link vmlinux.o
 objtool_link vmlinux.o
 
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v8 03/16] kbuild: lto: fix module versioning
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
  2020-12-01 21:36 ` [PATCH v8 01/16] tracing: move function tracer options to Kconfig Sami Tolvanen
  2020-12-01 21:36 ` [PATCH v8 02/16] kbuild: add support for Clang LTO Sami Tolvanen
@ 2020-12-01 21:36 ` Sami Tolvanen
  2020-12-01 21:36 ` [PATCH v8 04/16] kbuild: lto: limit inlining Sami Tolvanen
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-01 21:36 UTC
  To: Masahiro Yamada, Steven Rostedt, Will Deacon
  Cc: Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	kernel-hardening, linux-arch, linux-arm-kernel, linux-kbuild,
	linux-kernel, linux-pci, Sami Tolvanen

With CONFIG_MODVERSIONS, version information is linked into each
compilation unit that exports symbols. With LTO, we cannot use this
method as all C code is compiled into LLVM bitcode instead. This
change collects symbol versions into .symversions files and merges
them in link-vmlinux.sh where they are all linked into vmlinux.o at
the same time.
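
The collection step can be sketched in shell as follows. This mirrors
cmd_update_lto_symversions from the scripts/Makefile.build hunk
below, but the function name merge_symversions is hypothetical. The
sketch treats the .symversions contents as opaque text; the
__crc_* assignments in the usage note are an assumption about what
genksyms emits.

```shell
#!/bin/sh
# merge_symversions OUT OBJ...
# Concatenate OBJ.symversions, for every OBJ that has one, into
# OUT.symversions. Objects without exported symbols have no
# .symversions file and are skipped.
merge_symversions() {
	out=$1
	shift
	# Start from a clean slate so stale entries do not accumulate.
	rm -f "$out.symversions"
	for obj in "$@"; do
		if [ -f "$obj.symversions" ]; then
			cat "$obj.symversions" >> "$out.symversions"
		fi
	done
}
```

For example, merge_symversions built-in.a a.o b.o would produce
built-in.a.symversions containing the lines of a.o.symversions and
b.o.symversions, such as "__crc_foo = 0x11111111;", in that order.
If none of the objects has a .symversions file, no output file is
created.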

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 .gitignore               |  1 +
 Makefile                 |  3 ++-
 arch/Kconfig             |  1 -
 scripts/Makefile.build   | 33 +++++++++++++++++++++++++++++++--
 scripts/Makefile.modpost |  6 +++++-
 scripts/link-vmlinux.sh  | 23 ++++++++++++++++++++++-
 6 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/.gitignore b/.gitignore
index d01cda8e1177..44e34991875e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -41,6 +41,7 @@
 *.so.dbg
 *.su
 *.symtypes
+*.symversions
 *.tab.[ch]
 *.tar
 *.xz
diff --git a/Makefile b/Makefile
index f5cac2428efc..222ae96d179d 100644
--- a/Makefile
+++ b/Makefile
@@ -1829,7 +1829,8 @@ clean: $(clean-dirs)
 		-o -name '.tmp_*.o.*' \
 		-o -name '*.c.[012]*.*' \
 		-o -name '*.ll' \
-		-o -name '*.gcno' \) -type f -print | xargs rm -f
+		-o -name '*.gcno' \
+		-o -name '*.*.symversions' \) -type f -print | xargs rm -f
 
 # Generate tags for editors
 # ---------------------------------------------------------------------------
diff --git a/arch/Kconfig b/arch/Kconfig
index 30907b554451..c3c13ec9a74c 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -633,7 +633,6 @@ config HAS_LTO_CLANG
 	depends on !FTRACE_MCOUNT_USE_RECORDMCOUNT
 	depends on !KASAN
 	depends on !GCOV_KERNEL
-	depends on !MODVERSIONS
 	help
 	  The compiler and Kconfig options support building with Clang's
 	  LTO.
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index ed74b2f986f7..eae2f5386a03 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -166,6 +166,15 @@ ifdef CONFIG_MODVERSIONS
 #   the actual value of the checksum generated by genksyms
 # o remove .tmp_<file>.o to <file>.o
 
+ifdef CONFIG_LTO_CLANG
+# Generate .o.symversions files for each .o with exported symbols, and link these
+# to the kernel and/or modules at the end.
+cmd_modversions_c =								\
+	if $(NM) $@ 2>/dev/null | grep -q __ksymtab; then			\
+		$(call cmd_gensymtypes_c,$(KBUILD_SYMTYPES),$(@:.o=.symtypes))	\
+		    > $@.symversions;						\
+	fi;
+else
 cmd_modversions_c =								\
 	if $(OBJDUMP) -h $@ | grep -q __ksymtab; then				\
 		$(call cmd_gensymtypes_c,$(KBUILD_SYMTYPES),$(@:.o=.symtypes))	\
@@ -177,6 +186,7 @@ cmd_modversions_c =								\
 		rm -f $(@D)/.tmp_$(@F:.o=.ver);					\
 	fi
 endif
+endif
 
 ifdef CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT
 # compiler will not generate __mcount_loc use recordmcount or recordmcount.pl
@@ -390,6 +400,18 @@ $(obj)/%.asn1.c $(obj)/%.asn1.h: $(src)/%.asn1 $(objtree)/scripts/asn1_compiler
 $(subdir-builtin): $(obj)/%/built-in.a: $(obj)/% ;
 $(subdir-modorder): $(obj)/%/modules.order: $(obj)/% ;
 
+# combine symversions for later processing
+quiet_cmd_update_lto_symversions = SYMVER  $@
+ifeq ($(CONFIG_LTO_CLANG) $(CONFIG_MODVERSIONS),y y)
+      cmd_update_lto_symversions =					\
+	rm -f $@.symversions						\
+	$(foreach n, $(filter-out FORCE,$^),				\
+		$(if $(wildcard $(n).symversions),			\
+			; cat $(n).symversions >> $@.symversions))
+else
+      cmd_update_lto_symversions = echo >/dev/null
+endif
+
 #
 # Rule to compile a set of .o files into one .a file (without symbol table)
 #
@@ -397,8 +419,11 @@ $(subdir-modorder): $(obj)/%/modules.order: $(obj)/% ;
 quiet_cmd_ar_builtin = AR      $@
       cmd_ar_builtin = rm -f $@; $(AR) cDPrST $@ $(real-prereqs)
 
+quiet_cmd_ar_and_symver = AR      $@
+      cmd_ar_and_symver = $(cmd_update_lto_symversions); $(cmd_ar_builtin)
+
 $(obj)/built-in.a: $(real-obj-y) FORCE
-	$(call if_changed,ar_builtin)
+	$(call if_changed,ar_and_symver)
 
 #
 # Rule to create modules.order file
@@ -418,8 +443,11 @@ $(obj)/modules.order: $(obj-m) FORCE
 #
 # Rule to compile a set of .o files into one .a file (with symbol table)
 #
+quiet_cmd_ar_lib = AR      $@
+      cmd_ar_lib = $(cmd_update_lto_symversions); $(cmd_ar)
+
 $(obj)/lib.a: $(lib-y) FORCE
-	$(call if_changed,ar)
+	$(call if_changed,ar_lib)
 
 # NOTE:
 # Do not replace $(filter %.o,^) with $(real-prereqs). When a single object
@@ -428,6 +456,7 @@ $(obj)/lib.a: $(lib-y) FORCE
 ifdef CONFIG_LTO_CLANG
 quiet_cmd_link_multi-m = AR [M]  $@
 cmd_link_multi-m =						\
+	$(cmd_update_lto_symversions);				\
 	rm -f $@; 						\
 	$(AR) cDPrsT $@ $(filter %.o,$^)
 else
diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
index 9ff8bfdb574d..066beffca09a 100644
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -111,7 +111,11 @@ ifdef CONFIG_LTO_CLANG
 prelink-ext := .lto
 
 quiet_cmd_cc_lto_link_modules = LTO [M] $@
-cmd_cc_lto_link_modules = $(LD) $(ld_flags) -r -o $@ --whole-archive $^
+cmd_cc_lto_link_modules =						\
+	$(LD) $(ld_flags) -r -o $@					\
+		$(shell [ -s $(@:.lto.o=.o.symversions) ] &&		\
+			echo -T $(@:.lto.o=.o.symversions))		\
+		--whole-archive $^
 
 %.lto.o: %.o
 	$(call if_changed,cc_lto_link_modules)
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 596507573a48..78e55fe7210b 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -43,11 +43,26 @@ info()
 	fi
 }
 
+# If CONFIG_LTO_CLANG is selected, collect generated symbol versions into
+# .tmp_symversions.lds
+gen_symversions()
+{
+	info GEN .tmp_symversions.lds
+	rm -f .tmp_symversions.lds
+
+	for o in ${KBUILD_VMLINUX_OBJS} ${KBUILD_VMLINUX_LIBS}; do
+		if [ -f ${o}.symversions ]; then
+			cat ${o}.symversions >> .tmp_symversions.lds
+		fi
+	done
+}
+
 # Link of vmlinux.o used for section mismatch analysis
 # ${1} output file
 modpost_link()
 {
 	local objects
+	local lds=""
 
 	objects="--whole-archive				\
 		${KBUILD_VMLINUX_OBJS}				\
@@ -57,6 +72,11 @@ modpost_link()
 		--end-group"
 
 	if [ -n "${CONFIG_LTO_CLANG}" ]; then
+		if [ -n "${CONFIG_MODVERSIONS}" ]; then
+			gen_symversions
+			lds="${lds} -T .tmp_symversions.lds"
+		fi
+
 		# This might take a while, so indicate that we're doing
 		# an LTO link
 		info LTO ${1}
@@ -64,7 +84,7 @@ modpost_link()
 		info LD ${1}
 	fi
 
-	${LD} ${KBUILD_LDFLAGS} -r -o ${1} ${objects}
+	${LD} ${KBUILD_LDFLAGS} -r -o ${1} ${lds} ${objects}
 }
 
 objtool_link()
@@ -242,6 +262,7 @@ cleanup()
 {
 	rm -f .btf.*
 	rm -f .tmp_System.map
+	rm -f .tmp_symversions.lds
 	rm -f .tmp_vmlinux*
 	rm -f System.map
 	rm -f vmlinux
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v8 04/16] kbuild: lto: limit inlining
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
                   ` (2 preceding siblings ...)
  2020-12-01 21:36 ` [PATCH v8 03/16] kbuild: lto: fix module versioning Sami Tolvanen
@ 2020-12-01 21:36 ` Sami Tolvanen
  2020-12-01 21:36 ` [PATCH v8 05/16] kbuild: lto: merge module sections Sami Tolvanen
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-01 21:36 UTC (permalink / raw)
  To: Masahiro Yamada, Steven Rostedt, Will Deacon
  Cc: Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	kernel-hardening, linux-arch, linux-arm-kernel, linux-kbuild,
	linux-kernel, linux-pci, Sami Tolvanen

This change limits function inlining across translation unit boundaries
in order to reduce the binary size with LTO. The -import-instr-limit
flag defines a size limit, as the number of LLVM IR instructions, for
importing functions from other TUs, defaulting to 100.

Based on testing with arm64 defconfig, we found that a limit of 5 is a
reasonable compromise between performance and binary size, reducing the
size of a stripped vmlinux by 11%.

Suggested-by: George Burgess IV <gbiv@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 Makefile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Makefile b/Makefile
index 222ae96d179d..ac836907d8b1 100644
--- a/Makefile
+++ b/Makefile
@@ -899,6 +899,9 @@ else
 CC_FLAGS_LTO	+= -flto
 endif
 CC_FLAGS_LTO	+= -fvisibility=default
+
+# Limit inlining across translation units to reduce binary size
+KBUILD_LDFLAGS += -mllvm -import-instr-limit=5
 endif
 
 ifdef CONFIG_LTO
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v8 05/16] kbuild: lto: merge module sections
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
                   ` (3 preceding siblings ...)
  2020-12-01 21:36 ` [PATCH v8 04/16] kbuild: lto: limit inlining Sami Tolvanen
@ 2020-12-01 21:36 ` Sami Tolvanen
  2020-12-01 21:36 ` [PATCH v8 06/16] kbuild: lto: remove duplicate dependencies from .mod files Sami Tolvanen
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-01 21:36 UTC (permalink / raw)
  To: Masahiro Yamada, Steven Rostedt, Will Deacon
  Cc: Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	kernel-hardening, linux-arch, linux-arm-kernel, linux-kbuild,
	linux-kernel, linux-pci, Sami Tolvanen

LLD always splits sections with LTO, which increases module sizes. This
change adds linker script rules to merge the split sections in the final
module.

Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 scripts/module.lds.S | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/scripts/module.lds.S b/scripts/module.lds.S
index 69b9b71a6a47..18d5b8423635 100644
--- a/scripts/module.lds.S
+++ b/scripts/module.lds.S
@@ -23,6 +23,30 @@ SECTIONS {
 	.init_array		0 : ALIGN(8) { *(SORT(.init_array.*)) *(.init_array) }
 
 	__jump_table		0 : ALIGN(8) { KEEP(*(__jump_table)) }
+
+	__patchable_function_entries : { *(__patchable_function_entries) }
+
+	/*
+	 * With CONFIG_LTO_CLANG, LLD always enables -fdata-sections and
+	 * -ffunction-sections, which increases the size of the final module.
+	 * Merge the split sections in the final binary.
+	 */
+	.bss : {
+		*(.bss .bss.[0-9a-zA-Z_]*)
+		*(.bss..L*)
+	}
+
+	.data : {
+		*(.data .data.[0-9a-zA-Z_]*)
+		*(.data..L*)
+	}
+
+	.rodata : {
+		*(.rodata .rodata.[0-9a-zA-Z_]*)
+		*(.rodata..L*)
+	}
+
+	.text : { *(.text .text.[0-9a-zA-Z_]*) }
 }
 
 /* bring in arch-specific sections */
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v8 06/16] kbuild: lto: remove duplicate dependencies from .mod files
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
                   ` (4 preceding siblings ...)
  2020-12-01 21:36 ` [PATCH v8 05/16] kbuild: lto: merge module sections Sami Tolvanen
@ 2020-12-01 21:36 ` Sami Tolvanen
  2020-12-01 21:36 ` [PATCH v8 07/16] init: lto: ensure initcall ordering Sami Tolvanen
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-01 21:36 UTC (permalink / raw)
  To: Masahiro Yamada, Steven Rostedt, Will Deacon
  Cc: Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	kernel-hardening, linux-arch, linux-arm-kernel, linux-kbuild,
	linux-kernel, linux-pci, Sami Tolvanen

With LTO, llvm-nm prints out symbols for each archive member
separately, which results in a lot of duplicate dependencies in the
.mod file when CONFIG_TRIM_UNUSED_KSYMS is enabled. When a module
consists of several compilation units, the output can exceed the
default xargs command size limit and split the dependency list into
multiple lines, which results in used symbols getting trimmed.

This change removes duplicate dependencies, which reduces the
probability of this happening and makes .mod files smaller and
easier to read.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 scripts/Makefile.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index eae2f5386a03..f80ada58271d 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -281,7 +281,7 @@ endef
 
 # List module undefined symbols (or empty line if not enabled)
 ifdef CONFIG_TRIM_UNUSED_KSYMS
-cmd_undef_syms = $(NM) $< | sed -n 's/^  *U //p' | xargs echo
+cmd_undef_syms = $(NM) $< | sed -n 's/^  *U //p' | sort -u | xargs echo
 else
 cmd_undef_syms = echo
 endif
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v8 07/16] init: lto: ensure initcall ordering
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
                   ` (5 preceding siblings ...)
  2020-12-01 21:36 ` [PATCH v8 06/16] kbuild: lto: remove duplicate dependencies from .mod files Sami Tolvanen
@ 2020-12-01 21:36 ` Sami Tolvanen
  2020-12-01 21:36 ` [PATCH v8 08/16] init: lto: fix PREL32 relocations Sami Tolvanen
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-01 21:36 UTC (permalink / raw)
  To: Masahiro Yamada, Steven Rostedt, Will Deacon
  Cc: Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	kernel-hardening, linux-arch, linux-arm-kernel, linux-kbuild,
	linux-kernel, linux-pci, Sami Tolvanen

With LTO, the compiler doesn't necessarily obey the link order for
initcalls, and initcall variables need globally unique names to avoid
collisions at link time.

This change exports __KBUILD_MODNAME and adds the initcall_id() macro,
which uses it together with __COUNTER__ and __LINE__ to help ensure
these variables have unique names, and moves each variable to its own
section when LTO is enabled, so the correct order can be specified using
a linker script.

The generate_initcall_order.pl script uses nm to find initcalls from
the object files passed to the linker, and generates a linker script
that specifies the same order for initcalls that we would have without
LTO. With LTO enabled, the script is called in link-vmlinux.sh through
jobserver-exec to limit the number of jobs spawned.
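
The naming scheme described above can be tried outside the kernel. The
following is a minimal userspace sketch, not the kernel's code: PASTE,
STRINGIFY, INITCALL_ID, MODNAME_TOKEN and the kmod_demo value are
stand-ins chosen for illustration (the patch itself uses __PASTE,
__stringify, __initcall_id and a Kbuild-provided __KBUILD_MODNAME), and
__COUNTER__ is a GCC/Clang extension.

```c
/* Two-level helpers so arguments are macro-expanded before ## or # apply. */
#define PASTE_(a, b)	a##b
#define PASTE(a, b)	PASTE_(a, b)
#define STRINGIFY_(x)	#x
#define STRINGIFY(x)	STRINGIFY_(x)

/* Stand-in for the token Kbuild would pass as -D__KBUILD_MODNAME=... */
#define MODNAME_TOKEN	kmod_demo

/* Format: <modname>__<counter>_<line>_<fn> */
#define INITCALL_ID(fn)					\
	PASTE(MODNAME_TOKEN,				\
	PASTE(__,					\
	PASTE(__COUNTER__,				\
	PASTE(_,					\
	PASTE(__LINE__,					\
	PASTE(_, fn))))))

/* Each expansion site yields a different identifier for the same fn. */
static const char *first_id(void)
{
	return STRINGIFY(INITCALL_ID(my_init));
}

static const char *second_id(void)
{
	return STRINGIFY(INITCALL_ID(my_init));
}
```

Both helpers name the same function, yet the generated identifiers
differ because __COUNTER__ and __LINE__ change between expansion sites.
The module prefix is what keeps names from different objects apart.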

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 include/linux/init.h               |  52 +++++-
 scripts/Makefile.lib               |   6 +-
 scripts/generate_initcall_order.pl | 270 +++++++++++++++++++++++++++++
 scripts/link-vmlinux.sh            |  15 ++
 4 files changed, 334 insertions(+), 9 deletions(-)
 create mode 100755 scripts/generate_initcall_order.pl

diff --git a/include/linux/init.h b/include/linux/init.h
index 7b53cb3092ee..d466bea7ecba 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -184,19 +184,57 @@ extern bool initcall_debug;
  * as KEEP() in the linker script.
  */
 
+/* Format: <modname>__<counter>_<line>_<fn> */
+#define __initcall_id(fn)					\
+	__PASTE(__KBUILD_MODNAME,				\
+	__PASTE(__,						\
+	__PASTE(__COUNTER__,					\
+	__PASTE(_,						\
+	__PASTE(__LINE__,					\
+	__PASTE(_, fn))))))
+
+/* Format: __<prefix>__<iid><id> */
+#define __initcall_name(prefix, __iid, id)			\
+	__PASTE(__,						\
+	__PASTE(prefix,						\
+	__PASTE(__,						\
+	__PASTE(__iid, id))))
+
+#ifdef CONFIG_LTO_CLANG
+/*
+ * With LTO, the compiler doesn't necessarily obey link order for
+ * initcalls. In order to preserve the correct order, we add each
+ * variable into its own section and generate a linker script (in
+ * scripts/link-vmlinux.sh) to specify the order of the sections.
+ */
+#define __initcall_section(__sec, __iid)			\
+	#__sec ".init.." #__iid
+#else
+#define __initcall_section(__sec, __iid)			\
+	#__sec ".init"
+#endif
+
 #ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
-#define ___define_initcall(fn, id, __sec)			\
+#define ____define_initcall(fn, __name, __sec)			\
 	__ADDRESSABLE(fn)					\
-	asm(".section	\"" #__sec ".init\", \"a\"	\n"	\
-	"__initcall_" #fn #id ":			\n"	\
+	asm(".section	\"" __sec "\", \"a\"		\n"	\
+	    __stringify(__name) ":			\n"	\
 	    ".long	" #fn " - .			\n"	\
 	    ".previous					\n");
 #else
-#define ___define_initcall(fn, id, __sec) \
-	static initcall_t __initcall_##fn##id __used \
-		__attribute__((__section__(#__sec ".init"))) = fn;
+#define ____define_initcall(fn, __name, __sec)			\
+	static initcall_t __name __used 			\
+		__attribute__((__section__(__sec))) = fn;
 #endif
 
+#define __unique_initcall(fn, id, __sec, __iid)			\
+	____define_initcall(fn,					\
+		__initcall_name(initcall, __iid, id),		\
+		__initcall_section(__sec, __iid))
+
+#define ___define_initcall(fn, id, __sec)			\
+	__unique_initcall(fn, id, __sec, __initcall_id(fn))
+
 #define __define_initcall(fn, id) ___define_initcall(fn, id, .initcall##id)
 
 /*
@@ -236,7 +274,7 @@ extern bool initcall_debug;
 #define __exitcall(fn)						\
 	static exitcall_t __exitcall_##fn __exit_call = fn
 
-#define console_initcall(fn)	___define_initcall(fn,, .con_initcall)
+#define console_initcall(fn)	___define_initcall(fn, con, .con_initcall)
 
 struct obs_kernel_param {
 	const char *str;
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 94133708889d..53aa3e18ce8a 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -117,9 +117,11 @@ target-stem = $(basename $(patsubst $(obj)/%,%,$@))
 # These flags are needed for modversions and compiling, so we define them here
 # $(modname_flags) defines KBUILD_MODNAME as the name of the module it will
 # end up in (or would, if it gets compiled in)
-name-fix = $(call stringify,$(subst $(comma),_,$(subst -,_,$1)))
+name-fix-token = $(subst $(comma),_,$(subst -,_,$1))
+name-fix = $(call stringify,$(call name-fix-token,$1))
 basename_flags = -DKBUILD_BASENAME=$(call name-fix,$(basetarget))
-modname_flags  = -DKBUILD_MODNAME=$(call name-fix,$(modname))
+modname_flags  = -DKBUILD_MODNAME=$(call name-fix,$(modname)) \
+		 -D__KBUILD_MODNAME=kmod_$(call name-fix-token,$(modname))
 modfile_flags  = -DKBUILD_MODFILE=$(call stringify,$(modfile))
 
 _c_flags       = $(filter-out $(CFLAGS_REMOVE_$(target-stem).o), \
diff --git a/scripts/generate_initcall_order.pl b/scripts/generate_initcall_order.pl
new file mode 100755
index 000000000000..1a88d3f1b913
--- /dev/null
+++ b/scripts/generate_initcall_order.pl
@@ -0,0 +1,270 @@
+#!/usr/bin/env perl
+# SPDX-License-Identifier: GPL-2.0
+#
+# Generates a linker script that specifies the correct initcall order.
+#
+# Copyright (C) 2019 Google LLC
+
+use strict;
+use warnings;
+use IO::Handle;
+use IO::Select;
+use POSIX ":sys_wait_h";
+
+my $nm = $ENV{'NM'} || die "$0: ERROR: NM not set?";
+my $objtree = $ENV{'objtree'} || '.';
+
+## currently active child processes
+my $jobs = {};		# child process pid -> file handle
+## results from child processes
+my $results = {};	# object index -> [ { level, secname }, ... ]
+
+## reads _NPROCESSORS_ONLN to determine the maximum number of processes to
+## start
+sub get_online_processors {
+	open(my $fh, "getconf _NPROCESSORS_ONLN 2>/dev/null |")
+		or die "$0: ERROR: failed to execute getconf: $!";
+	my $procs = <$fh>;
+	close($fh);
+
+	if (!($procs =~ /^\d+$/)) {
+		return 1;
+	}
+
+	return int($procs);
+}
+
+## writes results to the parent process
+## format: <file index> <initcall level> <base initcall section name>
+sub write_results {
+	my ($index, $initcalls) = @_;
+
+	# sort by the counter value to ensure the order of initcalls within
+	# each object file is correct
+	foreach my $counter (sort { $a <=> $b } keys(%{$initcalls})) {
+		my $level = $initcalls->{$counter}->{'level'};
+
+		# section name for the initcall function
+		my $secname = $initcalls->{$counter}->{'module'} . '__' .
+			      $counter . '_' .
+			      $initcalls->{$counter}->{'line'} . '_' .
+			      $initcalls->{$counter}->{'function'};
+
+		print "$index $level $secname\n";
+	}
+}
+
+## reads a result line from a child process and adds it to the $results array
+sub read_results{
+	my ($fh) = @_;
+
+	# each child prints out a full line w/ autoflush and exits after the
+	# last line, so even if buffered I/O blocks here, it shouldn't block
+	# very long
+	my $data = <$fh>;
+
+	if (!defined($data)) {
+		return 0;
+	}
+
+	chomp($data);
+
+	my ($index, $level, $secname) = $data =~
+		/^(\d+)\ ([^\ ]+)\ (.*)$/;
+
+	if (!defined($index) ||
+		!defined($level) ||
+		!defined($secname)) {
+		die "$0: ERROR: child process returned invalid data: $data\n";
+	}
+
+	$index = int($index);
+
+	if (!exists($results->{$index})) {
+		$results->{$index} = [];
+	}
+
+	push (@{$results->{$index}}, {
+		'level'   => $level,
+		'secname' => $secname
+	});
+
+	return 1;
+}
+
+## finds initcalls from an object file or all object files in an archive, and
+## writes results back to the parent process
+sub find_initcalls {
+	my ($index, $file) = @_;
+
+	die "$0: ERROR: file $file doesn't exist?" if (! -f $file);
+
+	open(my $fh, "\"$nm\" --defined-only \"$file\" 2>/dev/null |")
+		or die "$0: ERROR: failed to execute \"$nm\": $!";
+
+	my $initcalls = {};
+
+	while (<$fh>) {
+		chomp;
+
+		# check for the start of a new object file (if processing an
+		# archive)
+		my ($path)= $_ =~ /^(.+)\:$/;
+
+		if (defined($path)) {
+			write_results($index, $initcalls);
+			$initcalls = {};
+			next;
+		}
+
+		# look for an initcall
+		my ($module, $counter, $line, $symbol) = $_ =~
+			/[a-z]\s+__initcall__(\S*)__(\d+)_(\d+)_(.*)$/;
+
+		if (!defined($module)) {
+			$module = ''
+		}
+
+		if (!defined($counter) ||
+			!defined($line) ||
+			!defined($symbol)) {
+			next;
+		}
+
+		# parse initcall level
+		my ($function, $level) = $symbol =~
+			/^(.*)((early|rootfs|con|[0-9])s?)$/;
+
+		die "$0: ERROR: invalid initcall name $symbol in $file($path)"
+			if (!defined($function) || !defined($level));
+
+		$initcalls->{$counter} = {
+			'module'   => $module,
+			'line'     => $line,
+			'function' => $function,
+			'level'    => $level,
+		};
+	}
+
+	close($fh);
+	write_results($index, $initcalls);
+}
+
+## waits for any child process to complete, reads the results, and adds them to
+## the $results array for later processing
+sub wait_for_results {
+	my ($select) = @_;
+
+	my $pid = 0;
+	do {
+		# unblock children that may have a full write buffer
+		foreach my $fh ($select->can_read(0)) {
+			read_results($fh);
+		}
+
+		# check for children that have exited, read the remaining data
+		# from them, and clean up
+		$pid = waitpid(-1, WNOHANG);
+		if ($pid > 0) {
+			if (!exists($jobs->{$pid})) {
+				next;
+			}
+
+			my $fh = $jobs->{$pid};
+			$select->remove($fh);
+
+			while (read_results($fh)) {
+				# until eof
+			}
+
+			close($fh);
+			delete($jobs->{$pid});
+		}
+	} while ($pid > 0);
+}
+
+## forks a child to process each file passed in the command line and collects
+## the results
+sub process_files {
+	my $index = 0;
+	my $njobs = $ENV{'PARALLELISM'} || get_online_processors();
+	my $select = IO::Select->new();
+
+	while (my $file = shift(@ARGV)) {
+		# fork a child process and read its stdout
+		my $pid = open(my $fh, '-|');
+
+		if (!defined($pid)) {
+			die "$0: ERROR: failed to fork: $!";
+		} elsif ($pid) {
+			# save the child process pid and the file handle
+			$select->add($fh);
+			$jobs->{$pid} = $fh;
+		} else {
+			# in the child process
+			STDOUT->autoflush(1);
+			find_initcalls($index, "$objtree/$file");
+			exit;
+		}
+
+		$index++;
+
+		# limit the number of children to $njobs
+		if (scalar(keys(%{$jobs})) >= $njobs) {
+			wait_for_results($select);
+		}
+	}
+
+	# wait for the remaining children to complete
+	while (scalar(keys(%{$jobs})) > 0) {
+		wait_for_results($select);
+	}
+}
+
+sub generate_initcall_lds() {
+	process_files();
+
+	my $sections = {};	# level -> [ secname, ...]
+
+	# sort results to retain link order and split to sections per
+	# initcall level
+	foreach my $index (sort { $a <=> $b } keys(%{$results})) {
+		foreach my $result (@{$results->{$index}}) {
+			my $level = $result->{'level'};
+
+			if (!exists($sections->{$level})) {
+				$sections->{$level} = [];
+			}
+
+			push(@{$sections->{$level}}, $result->{'secname'});
+		}
+	}
+
+	die "$0: ERROR: no initcalls?" if (!keys(%{$sections}));
+
+	# print out a linker script that defines the order of initcalls for
+	# each level
+	print "SECTIONS {\n";
+
+	foreach my $level (sort(keys(%{$sections}))) {
+		my $section;
+
+		if ($level eq 'con') {
+			$section = '.con_initcall.init';
+		} else {
+			$section = ".initcall${level}.init";
+		}
+
+		print "\t${section} : {\n";
+
+		foreach my $secname (@{$sections->{$level}}) {
+			print "\t\t*(${section}..${secname}) ;\n";
+		}
+
+		print "\t}\n";
+	}
+
+	print "}\n";
+}
+
+generate_initcall_lds();
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 78e55fe7210b..c5919d5a0b4f 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -43,6 +43,17 @@ info()
 	fi
 }
 
+# Generate a linker script to ensure correct ordering of initcalls.
+gen_initcalls()
+{
+	info GEN .tmp_initcalls.lds
+
+	${PYTHON} ${srctree}/scripts/jobserver-exec		\
+	${PERL} ${srctree}/scripts/generate_initcall_order.pl	\
+		${KBUILD_VMLINUX_OBJS} ${KBUILD_VMLINUX_LIBS}	\
+		> .tmp_initcalls.lds
+}
+
 # If CONFIG_LTO_CLANG is selected, collect generated symbol versions into
 # .tmp_symversions.lds
 gen_symversions()
@@ -72,6 +83,9 @@ modpost_link()
 		--end-group"
 
 	if [ -n "${CONFIG_LTO_CLANG}" ]; then
+		gen_initcalls
+		lds="-T .tmp_initcalls.lds"
+
 		if [ -n "${CONFIG_MODVERSIONS}" ]; then
 			gen_symversions
 			lds="${lds} -T .tmp_symversions.lds"
@@ -262,6 +276,7 @@ cleanup()
 {
 	rm -f .btf.*
 	rm -f .tmp_System.map
+	rm -f .tmp_initcalls.lds
 	rm -f .tmp_symversions.lds
 	rm -f .tmp_vmlinux*
 	rm -f System.map
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v8 08/16] init: lto: fix PREL32 relocations
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
                   ` (6 preceding siblings ...)
  2020-12-01 21:36 ` [PATCH v8 07/16] init: lto: ensure initcall ordering Sami Tolvanen
@ 2020-12-01 21:36 ` Sami Tolvanen
  2020-12-01 21:37 ` [PATCH v8 09/16] PCI: Fix PREL32 relocations for LTO Sami Tolvanen
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-01 21:36 UTC (permalink / raw)
  To: Masahiro Yamada, Steven Rostedt, Will Deacon
  Cc: Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	kernel-hardening, linux-arch, linux-arm-kernel, linux-kbuild,
	linux-kernel, linux-pci, Sami Tolvanen

With LTO, the compiler can rename static functions to avoid global
naming collisions. As initcall functions are typically static,
renaming can break references to them in inline assembly. This
change adds a global stub with a stable name for each initcall to
fix the issue when PREL32 relocations are used.
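
As a rough illustration of the stub idea, here is a userspace sketch.
The names DEFINE_INITCALL_STUB, my_driver_init, init_calls and
initstub_demo_my_driver_init are hypothetical and only mirror the shape
of __define_initcall_stub(). The kernel version additionally uses
__init and __ADDRESSABLE, which are omitted here.

```c
/* Counts how often the hypothetical initcall has run. */
static int init_calls;

/* Hypothetical initcall; static, so LTO is free to rename it. */
static int my_driver_init(void)
{
	init_calls++;
	return 0;
}

/*
 * Define a global function with a stable name that calls fn. The call is
 * an ordinary C reference, so it keeps working if the compiler renames fn,
 * whereas a name spelled out in an inline assembly string would not.
 */
#define DEFINE_INITCALL_STUB(stub, fn)		\
	int stub(void);				\
	int stub(void)				\
	{					\
		return fn();			\
	}

DEFINE_INITCALL_STUB(initstub_demo_my_driver_init, my_driver_init)
```

Inline assembly can then refer to initstub_demo_my_driver_init by name,
since a global symbol is not subject to the renaming applied to static
functions.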

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 include/linux/init.h | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/include/linux/init.h b/include/linux/init.h
index d466bea7ecba..27b9478dcdef 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -209,26 +209,49 @@ extern bool initcall_debug;
  */
 #define __initcall_section(__sec, __iid)			\
 	#__sec ".init.." #__iid
+
+/*
+ * With LTO, the compiler can rename static functions to avoid
+ * global naming collisions. We use a global stub function for
+ * initcalls to create a stable symbol name whose address can be
+ * taken in inline assembly when PREL32 relocations are used.
+ */
+#define __initcall_stub(fn, __iid, id)				\
+	__initcall_name(initstub, __iid, id)
+
+#define __define_initcall_stub(__stub, fn)			\
+	int __init __stub(void);				\
+	int __init __stub(void)					\
+	{ 							\
+		return fn();					\
+	}							\
+	__ADDRESSABLE(__stub)
 #else
 #define __initcall_section(__sec, __iid)			\
 	#__sec ".init"
+
+#define __initcall_stub(fn, __iid, id)	fn
+
+#define __define_initcall_stub(__stub, fn)			\
+	__ADDRESSABLE(fn)
 #endif
 
 #ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
-#define ____define_initcall(fn, __name, __sec)			\
-	__ADDRESSABLE(fn)					\
+#define ____define_initcall(fn, __stub, __name, __sec)		\
+	__define_initcall_stub(__stub, fn)			\
 	asm(".section	\"" __sec "\", \"a\"		\n"	\
 	    __stringify(__name) ":			\n"	\
-	    ".long	" #fn " - .			\n"	\
+	    ".long	" __stringify(__stub) " - .	\n"	\
 	    ".previous					\n");
 #else
-#define ____define_initcall(fn, __name, __sec)			\
+#define ____define_initcall(fn, __unused, __name, __sec)	\
 	static initcall_t __name __used 			\
 		__attribute__((__section__(__sec))) = fn;
 #endif
 
 #define __unique_initcall(fn, id, __sec, __iid)			\
 	____define_initcall(fn,					\
+		__initcall_stub(fn, __iid, id),			\
 		__initcall_name(initcall, __iid, id),		\
 		__initcall_section(__sec, __iid))
 
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v8 09/16] PCI: Fix PREL32 relocations for LTO
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
                   ` (7 preceding siblings ...)
  2020-12-01 21:36 ` [PATCH v8 08/16] init: lto: fix PREL32 relocations Sami Tolvanen
@ 2020-12-01 21:37 ` Sami Tolvanen
  2020-12-01 21:37 ` [PATCH v8 10/16] modpost: lto: strip .lto from module names Sami Tolvanen
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-01 21:37 UTC (permalink / raw)
  To: Masahiro Yamada, Steven Rostedt, Will Deacon
  Cc: Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	kernel-hardening, linux-arch, linux-arm-kernel, linux-kbuild,
	linux-kernel, linux-pci, Sami Tolvanen

With Clang's Link Time Optimization (LTO), the compiler can rename
static functions to avoid global naming collisions. As PCI fixup
functions are typically static, renaming can break references
to them in inline assembly. This change adds a global stub to
DECLARE_PCI_FIXUP_SECTION to fix the issue when PREL32 relocations
are used.
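
The diff adds an extra macro level, presumably so that the unique stub
name is expanded before it is stringified with '#'. The userspace sketch
below shows that preprocessor behaviour in isolation. UNIQUE_NAME,
NAME_DIRECT and NAME_VIA_WRAPPER are hypothetical names, and UNIQUE_NAME
is only a simplified stand-in for the kernel's __UNIQUE_ID().

```c
#define PASTE_(a, b)		a##b
#define PASTE(a, b)		PASTE_(a, b)
/* Simplified stand-in for the kernel's __UNIQUE_ID(). */
#define UNIQUE_NAME(prefix)	PASTE(prefix, __LINE__)

/* '#' applies to the argument as written, before macro expansion. */
#define NAME_DIRECT(stub)	#stub
/* One extra level: stub is expanded here, then stringified below. */
#define NAME_VIA_WRAPPER(stub)	NAME_DIRECT(stub)

/* Stringifies the macro call itself, not the generated name. */
static const char *direct_name(void)
{
	return NAME_DIRECT(UNIQUE_NAME(stub_));
}

/* Stringifies the generated name, e.g. "stub_<line number>". */
static const char *wrapped_name(void)
{
	return NAME_VIA_WRAPPER(UNIQUE_NAME(stub_));
}
```

Without the wrapper, the assembly would reference a symbol literally
named after the macro invocation instead of the generated stub.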

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 include/linux/pci.h | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 22207a79762c..5b8505a5ca5f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1912,19 +1912,28 @@ enum pci_fixup_pass {
 };
 
 #ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
-#define __DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class,	\
-				    class_shift, hook)			\
-	__ADDRESSABLE(hook)						\
+#define ___DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class,	\
+				    class_shift, hook, stub)		\
+	void stub(struct pci_dev *dev);					\
+	void stub(struct pci_dev *dev)					\
+	{ 								\
+		hook(dev); 						\
+	}								\
 	asm(".section "	#sec ", \"a\"				\n"	\
 	    ".balign	16					\n"	\
 	    ".short "	#vendor ", " #device "			\n"	\
 	    ".long "	#class ", " #class_shift "		\n"	\
-	    ".long "	#hook " - .				\n"	\
+	    ".long "	#stub " - .				\n"	\
 	    ".previous						\n");
+
+#define __DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class,	\
+				  class_shift, hook, stub)		\
+	___DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class,	\
+				  class_shift, hook, stub)
 #define DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class,	\
 				  class_shift, hook)			\
 	__DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class,	\
-				  class_shift, hook)
+				  class_shift, hook, __UNIQUE_ID(hook))
 #else
 /* Anonymous variables would be nice... */
 #define DECLARE_PCI_FIXUP_SECTION(section, name, vendor, device, class,	\
-- 
2.29.2.576.ga3fc446d84-goog


* [PATCH v8 10/16] modpost: lto: strip .lto from module names
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
                   ` (8 preceding siblings ...)
  2020-12-01 21:37 ` [PATCH v8 09/16] PCI: Fix PREL32 relocations for LTO Sami Tolvanen
@ 2020-12-01 21:37 ` Sami Tolvanen
  2020-12-01 21:37 ` [PATCH v8 11/16] scripts/mod: disable LTO for empty.c Sami Tolvanen
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-01 21:37 UTC (permalink / raw)
  To: Masahiro Yamada, Steven Rostedt, Will Deacon
  Cc: Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	kernel-hardening, linux-arch, linux-arm-kernel, linux-kbuild,
	linux-kernel, linux-pci, Sami Tolvanen

With LTO, everything is compiled into LLVM bitcode, so we have to link
each module into native code before modpost. Kbuild uses the .lto.o
suffix for these files, which also ends up in module information. This
change strips the unnecessary .lto suffix from the module name.
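
The suffix handling can be sketched in isolation as follows. strends()
is copied from the diff below, while strip_module_suffixes() is a
hypothetical helper that combines the two stripping steps from
read_symbols() for illustration. It is not part of the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if str ends with postfix (same logic as in modpost.h). */
static inline bool strends(const char *str, const char *postfix)
{
	if (strlen(str) < strlen(postfix))
		return false;

	return strcmp(str + strlen(str) - strlen(postfix), postfix) == 0;
}

/* Hypothetical helper: strip a trailing .o, then a trailing .lto, in place. */
static void strip_module_suffixes(char *name)
{
	if (strends(name, ".o"))
		name[strlen(name) - 2] = '\0';
	if (strends(name, ".lto"))
		name[strlen(name) - 4] = '\0';
}
```

With this, both "foo.lto.o" and "foo.o" reduce to "foo", so the module
name is the same whether or not LTO is enabled.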

Suggested-by: Bill Wendling <morbo@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 scripts/mod/modpost.c    | 16 +++++++---------
 scripts/mod/modpost.h    |  9 +++++++++
 scripts/mod/sumversion.c |  6 +++++-
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index f882ce0d9327..ebb15cc3f262 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -17,7 +17,6 @@
 #include <ctype.h>
 #include <string.h>
 #include <limits.h>
-#include <stdbool.h>
 #include <errno.h>
 #include "modpost.h"
 #include "../../include/linux/license.h"
@@ -80,14 +79,6 @@ modpost_log(enum loglevel loglevel, const char *fmt, ...)
 		exit(1);
 }
 
-static inline bool strends(const char *str, const char *postfix)
-{
-	if (strlen(str) < strlen(postfix))
-		return false;
-
-	return strcmp(str + strlen(str) - strlen(postfix), postfix) == 0;
-}
-
 void *do_nofail(void *ptr, const char *expr)
 {
 	if (!ptr)
@@ -1984,6 +1975,10 @@ static char *remove_dot(char *s)
 		size_t m = strspn(s + n + 1, "0123456789");
 		if (m && (s[n + m] == '.' || s[n + m] == 0))
 			s[n] = 0;
+
+		/* strip trailing .lto */
+		if (strends(s, ".lto"))
+			s[strlen(s) - 4] = '\0';
 	}
 	return s;
 }
@@ -2007,6 +2002,9 @@ static void read_symbols(const char *modname)
 		/* strip trailing .o */
 		tmp = NOFAIL(strdup(modname));
 		tmp[strlen(tmp) - 2] = '\0';
+		/* strip trailing .lto */
+		if (strends(tmp, ".lto"))
+			tmp[strlen(tmp) - 4] = '\0';
 		mod = new_module(tmp);
 		free(tmp);
 	}
diff --git a/scripts/mod/modpost.h b/scripts/mod/modpost.h
index 3aa052722233..fab30d201f9e 100644
--- a/scripts/mod/modpost.h
+++ b/scripts/mod/modpost.h
@@ -2,6 +2,7 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <stdarg.h>
+#include <stdbool.h>
 #include <string.h>
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -180,6 +181,14 @@ static inline unsigned int get_secindex(const struct elf_info *info,
 	return info->symtab_shndx_start[sym - info->symtab_start];
 }
 
+static inline bool strends(const char *str, const char *postfix)
+{
+	if (strlen(str) < strlen(postfix))
+		return false;
+
+	return strcmp(str + strlen(str) - strlen(postfix), postfix) == 0;
+}
+
 /* file2alias.c */
 extern unsigned int cross_build;
 void handle_moddevtable(struct module *mod, struct elf_info *info,
diff --git a/scripts/mod/sumversion.c b/scripts/mod/sumversion.c
index d587f40f1117..760e6baa7eda 100644
--- a/scripts/mod/sumversion.c
+++ b/scripts/mod/sumversion.c
@@ -391,10 +391,14 @@ void get_src_version(const char *modname, char sum[], unsigned sumlen)
 	struct md4_ctx md;
 	char *fname;
 	char filelist[PATH_MAX + 1];
+	int postfix_len = 1;
+
+	if (strends(modname, ".lto.o"))
+		postfix_len = 5;
 
 	/* objects for a module are listed in the first line of *.mod file. */
 	snprintf(filelist, sizeof(filelist), "%.*smod",
-		 (int)strlen(modname) - 1, modname);
+		 (int)strlen(modname) - postfix_len, modname);
 
 	buf = read_text_file(filelist);
 
-- 
2.29.2.576.ga3fc446d84-goog
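
For readers following along, the suffix handling in this patch can be sketched as a small stand-alone C fragment. strends() follows the same logic as the helper the patch moves into modpost.h; module_name() and mod_file_name() are hypothetical wrappers written only for this illustration, and the object names in the usage note are made up. They mimic what read_symbols() and get_src_version() do to a name ending in ".lto.o".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Same logic as the strends() helper the patch moves into modpost.h. */
static inline bool strends(const char *str, const char *postfix)
{
	size_t len = strlen(str);
	size_t plen = strlen(postfix);

	if (len < plen)
		return false;

	return strcmp(str + len - plen, postfix) == 0;
}

/*
 * Hypothetical helper: turn an object name into a module name by dropping
 * the trailing ".o" and then an optional ".lto", as read_symbols() does.
 */
static inline void module_name(const char *objname, char *out, size_t outlen)
{
	assert(strends(objname, ".o"));
	assert(outlen > strlen(objname));

	snprintf(out, outlen, "%s", objname);
	out[strlen(out) - 2] = '\0';		/* strip trailing .o */
	if (strends(out, ".lto"))
		out[strlen(out) - 4] = '\0';	/* strip trailing .lto */
}

/*
 * Hypothetical helper: derive the *.mod file name the way get_src_version()
 * does, keeping the dot that precedes the suffix being replaced.
 */
static inline void mod_file_name(const char *objname, char *out, size_t outlen)
{
	int postfix_len = 1;			/* strlen("o") */

	assert(strends(objname, ".o"));
	if (strends(objname, ".lto.o"))
		postfix_len = 5;		/* strlen("lto.o") */

	snprintf(out, outlen, "%.*smod",
		 (int)strlen(objname) - postfix_len, objname);
}
```

With these helpers, both "drivers/foo.o" and "drivers/foo.lto.o" map to the module name "drivers/foo" and to the file name "drivers/foo.mod", which is the behaviour the patch aims for.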



* [PATCH v8 11/16] scripts/mod: disable LTO for empty.c
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
                   ` (9 preceding siblings ...)
  2020-12-01 21:37 ` [PATCH v8 10/16] modpost: lto: strip .lto from module names Sami Tolvanen
@ 2020-12-01 21:37 ` Sami Tolvanen
  2020-12-01 21:37 ` [PATCH v8 12/16] efi/libstub: disable LTO Sami Tolvanen
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-01 21:37 UTC (permalink / raw)
  To: Masahiro Yamada, Steven Rostedt, Will Deacon
  Cc: Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	kernel-hardening, linux-arch, linux-arm-kernel, linux-kbuild,
	linux-kernel, linux-pci, Sami Tolvanen

With CONFIG_LTO_CLANG, clang generates LLVM IR instead of ELF object
files. As empty.o is used for probing target properties, disable LTO
for it to produce an object file instead.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 scripts/mod/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/mod/Makefile b/scripts/mod/Makefile
index 78071681d924..c9e38ad937fd 100644
--- a/scripts/mod/Makefile
+++ b/scripts/mod/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 OBJECT_FILES_NON_STANDARD := y
+CFLAGS_REMOVE_empty.o += $(CC_FLAGS_LTO)
 
 hostprogs-always-y	+= modpost mk_elfconfig
 always-y		+= empty.o
-- 
2.29.2.576.ga3fc446d84-goog
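
To illustrate why a tool that probes empty.o needs a real object file, here is a minimal sketch that classifies a file by its first four bytes. The enum and the function name classify_object() are hypothetical and not part of the series. The magic values are the usual ones: 0x7f 'E' 'L' 'F' for ELF, and 'B' 'C' 0xC0 0xDE for raw LLVM bitcode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification used only in this sketch. */
enum obj_kind { OBJ_UNKNOWN, OBJ_ELF, OBJ_LLVM_BITCODE };

/* Classify a buffer by its leading magic bytes. */
static inline enum obj_kind classify_object(const unsigned char *buf, size_t len)
{
	static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };
	static const unsigned char bc_magic[4] = { 'B', 'C', 0xc0, 0xde };

	assert(buf != NULL);

	if (len < 4)
		return OBJ_UNKNOWN;
	if (memcmp(buf, elf_magic, 4) == 0)
		return OBJ_ELF;
	if (memcmp(buf, bc_magic, 4) == 0)
		return OBJ_LLVM_BITCODE;

	return OBJ_UNKNOWN;
}
```

A consumer that expects an ELF header would get OBJ_LLVM_BITCODE for an object compiled with the LTO flags, so it could not read target properties such as class or endianness from it. Removing CC_FLAGS_LTO for empty.o sidesteps that.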



* [PATCH v8 12/16] efi/libstub: disable LTO
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
                   ` (10 preceding siblings ...)
  2020-12-01 21:37 ` [PATCH v8 11/16] scripts/mod: disable LTO for empty.c Sami Tolvanen
@ 2020-12-01 21:37 ` Sami Tolvanen
  2020-12-01 21:37 ` [PATCH v8 13/16] drivers/misc/lkdtm: disable LTO for rodata.o Sami Tolvanen
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-01 21:37 UTC (permalink / raw)
  To: Masahiro Yamada, Steven Rostedt, Will Deacon
  Cc: Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	kernel-hardening, linux-arch, linux-arm-kernel, linux-kbuild,
	linux-kernel, linux-pci, Sami Tolvanen

With CONFIG_LTO_CLANG, we produce LLVM bitcode instead of ELF object
files. Since LTO is not really needed here and the Makefile assumes we
produce an object file, disable LTO for libstub.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 drivers/firmware/efi/libstub/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 8a94388e38b3..c23466e05e60 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -38,6 +38,8 @@ KBUILD_CFLAGS			:= $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \
 
 # remove SCS flags from all objects in this directory
 KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
+# disable LTO
+KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS))
 
 GCOV_PROFILE			:= n
 # Sanitizer runtimes are unavailable and cannot be linked here.
-- 
2.29.2.576.ga3fc446d84-goog



* [PATCH v8 13/16] drivers/misc/lkdtm: disable LTO for rodata.o
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
                   ` (11 preceding siblings ...)
  2020-12-01 21:37 ` [PATCH v8 12/16] efi/libstub: disable LTO Sami Tolvanen
@ 2020-12-01 21:37 ` Sami Tolvanen
  2020-12-01 21:37 ` [PATCH v8 14/16] arm64: vdso: disable LTO Sami Tolvanen
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-01 21:37 UTC (permalink / raw)
  To: Masahiro Yamada, Steven Rostedt, Will Deacon
  Cc: Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	kernel-hardening, linux-arch, linux-arm-kernel, linux-kbuild,
	linux-kernel, linux-pci, Sami Tolvanen

Disable LTO for rodata.o to allow objcopy to be used to
manipulate sections.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
---
 drivers/misc/lkdtm/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/misc/lkdtm/Makefile b/drivers/misc/lkdtm/Makefile
index c70b3822013f..dd4c936d4d73 100644
--- a/drivers/misc/lkdtm/Makefile
+++ b/drivers/misc/lkdtm/Makefile
@@ -13,6 +13,7 @@ lkdtm-$(CONFIG_LKDTM)		+= cfi.o
 
 KASAN_SANITIZE_stackleak.o	:= n
 KCOV_INSTRUMENT_rodata.o	:= n
+CFLAGS_REMOVE_rodata.o		+= $(CC_FLAGS_LTO)
 
 OBJCOPYFLAGS :=
 OBJCOPYFLAGS_rodata_objcopy.o	:= \
-- 
2.29.2.576.ga3fc446d84-goog



* [PATCH v8 14/16] arm64: vdso: disable LTO
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
                   ` (12 preceding siblings ...)
  2020-12-01 21:37 ` [PATCH v8 13/16] drivers/misc/lkdtm: disable LTO for rodata.o Sami Tolvanen
@ 2020-12-01 21:37 ` Sami Tolvanen
  2020-12-01 21:37 ` [PATCH v8 15/16] arm64: disable recordmcount with DYNAMIC_FTRACE_WITH_REGS Sami Tolvanen
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-01 21:37 UTC (permalink / raw)
  To: Masahiro Yamada, Steven Rostedt, Will Deacon
  Cc: Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	kernel-hardening, linux-arch, linux-arm-kernel, linux-kbuild,
	linux-kernel, linux-pci, Sami Tolvanen

Disable LTO for the vDSO by filtering out CC_FLAGS_LTO, as there's no
point in using link-time optimization for the small amount of C code.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/vdso/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index d65f52264aba..50fe49fb4d95 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -30,7 +30,8 @@ ldflags-y := -shared -nostdlib -soname=linux-vdso.so.1 --hash-style=sysv	\
 ccflags-y := -fno-common -fno-builtin -fno-stack-protector -ffixed-x18
 ccflags-y += -DDISABLE_BRANCH_PROFILING
 
-CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) $(GCC_PLUGINS_CFLAGS)
+CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) $(GCC_PLUGINS_CFLAGS) \
+				$(CC_FLAGS_LTO)
 KASAN_SANITIZE			:= n
 UBSAN_SANITIZE			:= n
 OBJECT_FILES_NON_STANDARD	:= y
-- 
2.29.2.576.ga3fc446d84-goog



* [PATCH v8 15/16] arm64: disable recordmcount with DYNAMIC_FTRACE_WITH_REGS
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
                   ` (13 preceding siblings ...)
  2020-12-01 21:37 ` [PATCH v8 14/16] arm64: vdso: disable LTO Sami Tolvanen
@ 2020-12-01 21:37 ` Sami Tolvanen
  2020-12-01 21:37 ` [PATCH v8 16/16] arm64: allow LTO to be selected Sami Tolvanen
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-01 21:37 UTC (permalink / raw)
  To: Masahiro Yamada, Steven Rostedt, Will Deacon
  Cc: Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	kernel-hardening, linux-arch, linux-arm-kernel, linux-kbuild,
	linux-kernel, linux-pci, Sami Tolvanen

DYNAMIC_FTRACE_WITH_REGS uses -fpatchable-function-entry, which makes
running recordmcount unnecessary as there are no mcount calls in object
files, and __mcount_loc doesn't need to be generated.

While there's normally no harm in running recordmcount even when it's
not strictly needed, this won't work with LTO as we have LLVM bitcode
instead of ELF objects.

This change selects FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY, which
disables recordmcount when patchable function entries are used instead.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Will Deacon <will@kernel.org>
---
 arch/arm64/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1515f6f153a0..c7f07978f5b6 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -158,6 +158,8 @@ config ARM64
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS \
 		if $(cc-option,-fpatchable-function-entry=2)
+	select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
+		if DYNAMIC_FTRACE_WITH_REGS
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
 	select HAVE_FAST_GUP
 	select HAVE_FTRACE_MCOUNT_RECORD
-- 
2.29.2.576.ga3fc446d84-goog



* [PATCH v8 16/16] arm64: allow LTO to be selected
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
                   ` (14 preceding siblings ...)
  2020-12-01 21:37 ` [PATCH v8 15/16] arm64: disable recordmcount with DYNAMIC_FTRACE_WITH_REGS Sami Tolvanen
@ 2020-12-01 21:37 ` Sami Tolvanen
  2020-12-03  0:01 ` [PATCH v8 00/16] Add support for Clang LTO Nick Desaulniers
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-01 21:37 UTC (permalink / raw)
  To: Masahiro Yamada, Steven Rostedt, Will Deacon
  Cc: Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	kernel-hardening, linux-arch, linux-arm-kernel, linux-kbuild,
	linux-kernel, linux-pci, Sami Tolvanen

Allow CONFIG_LTO_CLANG to be enabled.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Will Deacon <will@kernel.org>
---
 arch/arm64/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c7f07978f5b6..9d29c48ecd4f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -73,6 +73,8 @@ config ARM64
 	select ARCH_USE_SYM_ANNOTATIONS
 	select ARCH_SUPPORTS_MEMORY_FAILURE
 	select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
+	select ARCH_SUPPORTS_LTO_CLANG
+	select ARCH_SUPPORTS_LTO_CLANG_THIN
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 && (GCC_VERSION >= 50000 || CC_IS_CLANG)
 	select ARCH_SUPPORTS_NUMA_BALANCING
-- 
2.29.2.576.ga3fc446d84-goog



* Re: [PATCH v8 01/16] tracing: move function tracer options to Kconfig
  2020-12-01 21:36 ` [PATCH v8 01/16] tracing: move function tracer options to Kconfig Sami Tolvanen
@ 2020-12-01 21:47   ` Steven Rostedt
  0 siblings, 0 replies; 48+ messages in thread
From: Steven Rostedt @ 2020-12-01 21:47 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Masahiro Yamada, Will Deacon, Josh Poimboeuf, Peter Zijlstra,
	Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	Nick Desaulniers, clang-built-linux, kernel-hardening,
	linux-arch, linux-arm-kernel, linux-kbuild, linux-kernel,
	linux-pci

On Tue,  1 Dec 2020 13:36:52 -0800
Sami Tolvanen <samitolvanen@google.com> wrote:

> Move function tracer options to Kconfig to make it easier to add
> new methods for generating __mcount_loc, and to make the options
> available also when building kernel modules.
> 
> Note that FTRACE_MCOUNT_USE_* options are updated on rebuild and
> therefore, work even if the .config was generated in a different
> environment.

Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

> 
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> ---


-- Steve


* Re: [PATCH v8 02/16] kbuild: add support for Clang LTO
  2020-12-01 21:36 ` [PATCH v8 02/16] kbuild: add support for Clang LTO Sami Tolvanen
@ 2020-12-02  2:59   ` Masahiro Yamada
  2020-12-03  0:07   ` Nick Desaulniers
  1 sibling, 0 replies; 48+ messages in thread
From: Masahiro Yamada @ 2020-12-02  2:59 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Steven Rostedt, Will Deacon, Josh Poimboeuf, Peter Zijlstra,
	Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	Nick Desaulniers, clang-built-linux, Kernel Hardening,
	linux-arch, linux-arm-kernel, Linux Kbuild mailing list,
	Linux Kernel Mailing List, linux-pci

On Wed, Dec 2, 2020 at 6:37 AM 'Sami Tolvanen' via Clang Built Linux
<clang-built-linux@googlegroups.com> wrote:
>
> This change adds build system support for Clang's Link Time
> Optimization (LTO). With -flto, instead of ELF object files, Clang
> produces LLVM bitcode, which is compiled into native code at link
> time, allowing the final binary to be optimized globally. For more
> details, see:
>
>   https://llvm.org/docs/LinkTimeOptimization.html
>
> The Kconfig option CONFIG_LTO_CLANG is implemented as a choice,
> which defaults to LTO being disabled. To use LTO, the architecture
> must select ARCH_SUPPORTS_LTO_CLANG and support:
>
>   - compiling with Clang,
>   - compiling inline assembly with Clang's integrated assembler,
>   - and linking with LLD.
>
> While using full LTO results in the best runtime performance, the
> compilation is not scalable in time or memory. CONFIG_THINLTO
> enables ThinLTO, which allows parallel optimization and faster
> incremental builds. ThinLTO is used by default if the architecture
> also selects ARCH_SUPPORTS_THINLTO:
>
>   https://clang.llvm.org/docs/ThinLTO.html
>
> To enable LTO, LLVM tools must be used to handle bitcode files. The
> easiest way is to pass the LLVM=1 option to make:
>
>   $ make LLVM=1 defconfig
>   $ scripts/config -e LTO_CLANG
>   $ make LLVM=1
>
> Alternatively, at least the following LLVM tools must be used:
>
>   CC=clang LD=ld.lld AR=llvm-ar NM=llvm-nm
>
> To prepare for LTO support with other compilers, common parts are
> gated behind the CONFIG_LTO option, and LTO can be disabled for
> specific files by filtering out CC_FLAGS_LTO.
>
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> ---
>  Makefile                          | 19 ++++++-
>  arch/Kconfig                      | 88 +++++++++++++++++++++++++++++++
>  include/asm-generic/vmlinux.lds.h | 11 ++--
>  scripts/Makefile.build            |  9 +++-
>  scripts/Makefile.modfinal         |  9 +++-
>  scripts/Makefile.modpost          | 21 +++++++-
>  scripts/link-vmlinux.sh           | 32 ++++++++---
>  7 files changed, 171 insertions(+), 18 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index 16b7f0890e75..f5cac2428efc 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -891,6 +891,21 @@ KBUILD_CFLAGS      += $(CC_FLAGS_SCS)
>  export CC_FLAGS_SCS
>  endif
>
> +ifdef CONFIG_LTO_CLANG
> +ifdef CONFIG_LTO_CLANG_THIN
> +CC_FLAGS_LTO   += -flto=thin -fsplit-lto-unit
> +KBUILD_LDFLAGS += --thinlto-cache-dir=$(extmod-prefix).thinlto-cache
> +else
> +CC_FLAGS_LTO   += -flto
> +endif
> +CC_FLAGS_LTO   += -fvisibility=default
> +endif
> +
> +ifdef CONFIG_LTO
> +KBUILD_CFLAGS  += $(CC_FLAGS_LTO)
> +export CC_FLAGS_LTO
> +endif
> +
>  ifdef CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_32B
>  KBUILD_CFLAGS += -falign-functions=32
>  endif
> @@ -1471,7 +1486,7 @@ MRPROPER_FILES += include/config include/generated          \
>                   *.spec
>
>  # Directories & files removed with 'make distclean'
> -DISTCLEAN_FILES += tags TAGS cscope* GPATH GTAGS GRTAGS GSYMS
> +DISTCLEAN_FILES += tags TAGS cscope* GPATH GTAGS GRTAGS GSYMS .thinlto-cache
>
>  # clean - Delete most, but leave enough to build external modules
>  #
> @@ -1717,7 +1732,7 @@ PHONY += compile_commands.json
>
>  clean-dirs := $(KBUILD_EXTMOD)
>  clean: rm-files := $(KBUILD_EXTMOD)/Module.symvers $(KBUILD_EXTMOD)/modules.nsdeps \
> -       $(KBUILD_EXTMOD)/compile_commands.json
> +       $(KBUILD_EXTMOD)/compile_commands.json $(KBUILD_EXTMOD)/.thinlto-cache
>
>  PHONY += help
>  help:
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 56b6ccc0e32d..30907b554451 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -598,6 +598,94 @@ config SHADOW_CALL_STACK
>           reading and writing arbitrary memory may be able to locate them
>           and hijack control flow by modifying the stacks.
>
> +config LTO
> +       bool
> +       help
> +         Selected if the kernel will be built using the compiler's LTO feature.
> +
> +config LTO_CLANG
> +       bool
> +       select LTO
> +       help
> +         Selected if the kernel will be built using Clang's LTO feature.
> +
> +config ARCH_SUPPORTS_LTO_CLANG
> +       bool
> +       help
> +         An architecture should select this option if it supports:
> +         - compiling with Clang,
> +         - compiling inline assembly with Clang's integrated assembler,
> +         - and linking with LLD.
> +
> +config ARCH_SUPPORTS_LTO_CLANG_THIN
> +       bool
> +       help
> +         An architecture should select this option if it can support Clang's
> +         ThinLTO mode.
> +
> +config HAS_LTO_CLANG
> +       def_bool y
> +       # Clang >= 11: https://github.com/ClangBuiltLinux/linux/issues/510
> +       depends on CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD
> +       depends on $(success,$(NM) --help | head -n 1 | grep -qi llvm)
> +       depends on $(success,$(AR) --help | head -n 1 | grep -qi llvm)
> +       depends on ARCH_SUPPORTS_LTO_CLANG
> +       depends on !FTRACE_MCOUNT_USE_RECORDMCOUNT
> +       depends on !KASAN
> +       depends on !GCOV_KERNEL
> +       depends on !MODVERSIONS
> +       help
> +         The compiler and Kconfig options support building with Clang's
> +         LTO.
> +
> +choice
> +       prompt "Link Time Optimization (LTO)"
> +       default LTO_NONE
> +       help
> +         This option enables Link Time Optimization (LTO), which allows the
> +         compiler to optimize binaries globally.
> +
> +         If unsure, select LTO_NONE. Note that LTO is very resource-intensive
> +         so it's disabled by default.
> +
> +config LTO_NONE
> +       bool "None"
> +       help
> +         Build the kernel normally, without Link Time Optimization (LTO).
> +
> +config LTO_CLANG_FULL
> +       bool "Clang Full LTO (EXPERIMENTAL)"
> +       depends on HAS_LTO_CLANG
> +       select LTO_CLANG
> +       help
> +          This option enables Clang's full Link Time Optimization (LTO), which
> +          allows the compiler to optimize the kernel globally. If you enable
> +          this option, the compiler generates LLVM bitcode instead of ELF
> +          object files, and the actual compilation from bitcode happens at
> +          the LTO link step, which may take several minutes depending on the
> +          kernel configuration. More information can be found from LLVM's
> +          documentation:
> +
> +           https://llvm.org/docs/LinkTimeOptimization.html
> +

This help document is misleading.
People who read the document would misunderstand how great this feature would be.

This should be added in the commit log and Kconfig help:

            In contrast to the example in the documentation, Clang LTO
            for the kernel cannot remove any unreachable function or data.
            In fact, this results in even bigger vmlinux and modules.




-- 
Best Regards
Masahiro Yamada


* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
                   ` (15 preceding siblings ...)
  2020-12-01 21:37 ` [PATCH v8 16/16] arm64: allow LTO to be selected Sami Tolvanen
@ 2020-12-03  0:01 ` Nick Desaulniers
  2020-12-03 11:26 ` Will Deacon
  2020-12-08 12:15 ` Arnd Bergmann
  18 siblings, 0 replies; 48+ messages in thread
From: Nick Desaulniers @ 2020-12-03  0:01 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Masahiro Yamada, Steven Rostedt, Will Deacon, Josh Poimboeuf,
	Peter Zijlstra, Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	clang-built-linux, Kernel Hardening, linux-arch, Linux ARM,
	Linux Kbuild mailing list, LKML, PCI

On Tue, Dec 1, 2020 at 1:37 PM Sami Tolvanen <samitolvanen@google.com> wrote:
>
> This patch series adds support for building the kernel with Clang's
> Link Time Optimization (LTO). In addition to performance, the primary
> motivation for LTO is to allow Clang's Control-Flow Integrity (CFI)
> to be used in the kernel. Google has shipped millions of Pixel
> devices running three major kernel versions with LTO+CFI since 2018.
>
> Most of the patches are build system changes for handling LLVM
> bitcode, which Clang produces with LTO instead of ELF object files,
> postponing ELF processing until a later stage, and ensuring initcall
> ordering.
>
> Note that arm64 support depends on Will's memory ordering patches
> [1]. I will post x86_64 patches separately after we have fixed the
> remaining objtool warnings [2][3].
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto
> [2] https://lore.kernel.org/lkml/20201120040424.a3wctajzft4ufoiw@treble/
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=objtool-vmlinux
>
> You can also pull this series from
>
>   https://github.com/samitolvanen/linux.git lto-v8
>
> ---
> Changes in v8:
>
>   - Cleaned up the LTO Kconfig options based on suggestions from
>     Nick and Kees.

Thanks Sami, for the series:

Tested-by: Nick Desaulniers <ndesaulniers@google.com>

(build and boot tested under emulation with
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto
additionally rebased on top).

As with v7, if the series changes drastically for v9, please consider
dropping my Tested-by tag from the individual patches that change, and I
will help re-test them.
-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH v8 02/16] kbuild: add support for Clang LTO
  2020-12-01 21:36 ` [PATCH v8 02/16] kbuild: add support for Clang LTO Sami Tolvanen
  2020-12-02  2:59   ` Masahiro Yamada
@ 2020-12-03  0:07   ` Nick Desaulniers
  1 sibling, 0 replies; 48+ messages in thread
From: Nick Desaulniers @ 2020-12-03  0:07 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Masahiro Yamada, Steven Rostedt, Will Deacon, Josh Poimboeuf,
	Peter Zijlstra, Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	clang-built-linux, Kernel Hardening, linux-arch, Linux ARM,
	Linux Kbuild mailing list, LKML, PCI

On Tue, Dec 1, 2020 at 1:37 PM Sami Tolvanen <samitolvanen@google.com> wrote:
>
> This change adds build system support for Clang's Link Time
> Optimization (LTO). With -flto, instead of ELF object files, Clang
> produces LLVM bitcode, which is compiled into native code at link
> time, allowing the final binary to be optimized globally. For more
> details, see:
>
>   https://llvm.org/docs/LinkTimeOptimization.html
>
> The Kconfig option CONFIG_LTO_CLANG is implemented as a choice,
> which defaults to LTO being disabled. To use LTO, the architecture
> must select ARCH_SUPPORTS_LTO_CLANG and support:
>
>   - compiling with Clang,
>   - compiling inline assembly with Clang's integrated assembler,
>   - and linking with LLD.
>
> While using full LTO results in the best runtime performance, the
> compilation is not scalable in time or memory. CONFIG_THINLTO
> enables ThinLTO, which allows parallel optimization and faster
> incremental builds. ThinLTO is used by default if the architecture
> also selects ARCH_SUPPORTS_THINLTO:
>
>   https://clang.llvm.org/docs/ThinLTO.html
>
> To enable LTO, LLVM tools must be used to handle bitcode files. The
> easiest way is to pass the LLVM=1 option to make:
>
>   $ make LLVM=1 defconfig
>   $ scripts/config -e LTO_CLANG
>   $ make LLVM=1
>
> Alternatively, at least the following LLVM tools must be used:
>
>   CC=clang LD=ld.lld AR=llvm-ar NM=llvm-nm
>
> To prepare for LTO support with other compilers, common parts are
> gated behind the CONFIG_LTO option, and LTO can be disabled for
> specific files by filtering out CC_FLAGS_LTO.
>
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>

Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
                   ` (16 preceding siblings ...)
  2020-12-03  0:01 ` [PATCH v8 00/16] Add support for Clang LTO Nick Desaulniers
@ 2020-12-03 11:26 ` Will Deacon
  2020-12-03 17:07   ` Sami Tolvanen
  2020-12-08 12:15 ` Arnd Bergmann
  18 siblings, 1 reply; 48+ messages in thread
From: Will Deacon @ 2020-12-03 11:26 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Masahiro Yamada, Steven Rostedt, Josh Poimboeuf, Peter Zijlstra,
	Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	Nick Desaulniers, clang-built-linux, kernel-hardening,
	linux-arch, linux-arm-kernel, linux-kbuild, linux-kernel,
	linux-pci

Hi Sami,

On Tue, Dec 01, 2020 at 01:36:51PM -0800, Sami Tolvanen wrote:
> This patch series adds support for building the kernel with Clang's
> Link Time Optimization (LTO). In addition to performance, the primary
> motivation for LTO is to allow Clang's Control-Flow Integrity (CFI)
> to be used in the kernel. Google has shipped millions of Pixel
> devices running three major kernel versions with LTO+CFI since 2018.
> 
> Most of the patches are build system changes for handling LLVM
> bitcode, which Clang produces with LTO instead of ELF object files,
> postponing ELF processing until a later stage, and ensuring initcall
> ordering.
> 
> Note that arm64 support depends on Will's memory ordering patches
> [1]. I will post x86_64 patches separately after we have fixed the
> remaining objtool warnings [2][3].

I took this series for a spin, with my for-next/lto branch merged in but
I see a failure during the LTO stage with clang 11.0.5 because it doesn't
understand the '.arch_extension rcpc' directive we throw out in READ_ONCE().

We actually check that this extension is available before using it in
the arm64 Kconfig:

	config AS_HAS_LDAPR
		def_bool $(as-instr,.arch_extension rcpc)

so this shouldn't happen. I then realised, I wasn't passing LLVM_IAS=1
on my Make command line; with that, then the detection works correctly
and the LTO step succeeds.

Why is it necessary to pass LLVM_IAS=1 if LTO is enabled? I think it
would be _much_ better if this was implicit (or if LTO depended on it).

Cheers,

Will


* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-03 11:26 ` Will Deacon
@ 2020-12-03 17:07   ` Sami Tolvanen
  2020-12-03 18:21     ` Nathan Chancellor
  2020-12-03 18:22     ` Will Deacon
  0 siblings, 2 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-03 17:07 UTC (permalink / raw)
  To: Will Deacon, Nick Desaulniers, Nathan Chancellor
  Cc: Masahiro Yamada, Steven Rostedt, Josh Poimboeuf, Peter Zijlstra,
	Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	clang-built-linux, Kernel Hardening, linux-arch,
	linux-arm-kernel, linux-kbuild, LKML, PCI

On Thu, Dec 3, 2020 at 3:26 AM Will Deacon <will@kernel.org> wrote:
>
> Hi Sami,
>
> On Tue, Dec 01, 2020 at 01:36:51PM -0800, Sami Tolvanen wrote:
> > This patch series adds support for building the kernel with Clang's
> > Link Time Optimization (LTO). In addition to performance, the primary
> > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI)
> > to be used in the kernel. Google has shipped millions of Pixel
> > devices running three major kernel versions with LTO+CFI since 2018.
> >
> > Most of the patches are build system changes for handling LLVM
> > bitcode, which Clang produces with LTO instead of ELF object files,
> > postponing ELF processing until a later stage, and ensuring initcall
> > ordering.
> >
> > Note that arm64 support depends on Will's memory ordering patches
> > [1]. I will post x86_64 patches separately after we have fixed the
> > remaining objtool warnings [2][3].
>
> I took this series for a spin, with my for-next/lto branch merged in but
> I see a failure during the LTO stage with clang 11.0.5 because it doesn't
> understand the '.arch_extension rcpc' directive we throw out in READ_ONCE().

I just tested this with Clang 11.0.0, which I believe is the latest
11.x version, and the current Clang 12 development branch, and both
work for me. Godbolt confirms that '.arch_extension rcpc' is supported
by the integrated assembler starting with Clang 11 (the example fails
with 10.0.1):

https://godbolt.org/z/1csGcT

What does running clang --version and ld.lld --version tell you?

> We actually check that this extension is available before using it in
> the arm64 Kconfig:
>
>         config AS_HAS_LDAPR
>                 def_bool $(as-instr,.arch_extension rcpc)
>
> so this shouldn't happen. I then realised, I wasn't passing LLVM_IAS=1
> on my Make command line; with that, then the detection works correctly
> and the LTO step succeeds.
>
> Why is it necessary to pass LLVM_IAS=1 if LTO is enabled? I think it
> would be _much_ better if this was implicit (or if LTO depended on it).

Without LLVM_IAS=1, Clang uses two different assemblers when LTO is
enabled: the external GNU assembler for stand-alone assembly, and
LLVM's integrated assembler for inline assembly. as-instr tests the
external assembler and makes an admittedly reasonable assumption that
the test is also valid for inline assembly.

I agree that it would reduce confusion in future if we just always
enabled IAS with LTO. Nick, Nathan, any thoughts about this?

Sami


* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-03 17:07   ` Sami Tolvanen
@ 2020-12-03 18:21     ` Nathan Chancellor
  2020-12-03 18:22     ` Will Deacon
  1 sibling, 0 replies; 48+ messages in thread
From: Nathan Chancellor @ 2020-12-03 18:21 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Will Deacon, Nick Desaulniers, Masahiro Yamada, Steven Rostedt,
	Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, clang-built-linux, Kernel Hardening,
	linux-arch, linux-arm-kernel, linux-kbuild, LKML, PCI

On Thu, Dec 03, 2020 at 09:07:30AM -0800, Sami Tolvanen wrote:
> On Thu, Dec 3, 2020 at 3:26 AM Will Deacon <will@kernel.org> wrote:
> >
> > Hi Sami,
> >
> > On Tue, Dec 01, 2020 at 01:36:51PM -0800, Sami Tolvanen wrote:
> > > This patch series adds support for building the kernel with Clang's
> > > Link Time Optimization (LTO). In addition to performance, the primary
> > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI)
> > > to be used in the kernel. Google has shipped millions of Pixel
> > > devices running three major kernel versions with LTO+CFI since 2018.
> > >
> > > Most of the patches are build system changes for handling LLVM
> > > bitcode, which Clang produces with LTO instead of ELF object files,
> > > postponing ELF processing until a later stage, and ensuring initcall
> > > ordering.
> > >
> > > Note that arm64 support depends on Will's memory ordering patches
> > > [1]. I will post x86_64 patches separately after we have fixed the
> > > remaining objtool warnings [2][3].
> >
> > I took this series for a spin, with my for-next/lto branch merged in but
> > I see a failure during the LTO stage with clang 11.0.5 because it doesn't
> > understand the '.arch_extension rcpc' directive we throw out in READ_ONCE().
> 
> I just tested this with Clang 11.0.0, which I believe is the latest
> 11.x version, and the current Clang 12 development branch, and both
> work for me. Godbolt confirms that '.arch_extension rcpc' is supported
> by the integrated assembler starting with Clang 11 (the example fails
> with 10.0.1):
> 
> https://godbolt.org/z/1csGcT
> 
> What does running clang --version and ld.lld --version tell you?

11.0.5 is AOSP's clang, which is behind the upstream 11.0.0 release so
it is most likely the case that it is missing the patch that added rcpc.
I think that a version based on the development branch (12.0.0) is in
the works but I am not sure.

> > We actually check that this extension is available before using it in
> > the arm64 Kconfig:
> >
> >         config AS_HAS_LDAPR
> >                 def_bool $(as-instr,.arch_extension rcpc)
> >
> > so this shouldn't happen. I then realised, I wasn't passing LLVM_IAS=1
> > on my Make command line; with that, then the detection works correctly
> > and the LTO step succeeds.
> >
> > Why is it necessary to pass LLVM_IAS=1 if LTO is enabled? I think it
> > would be _much_ better if this was implicit (or if LTO depended on it).
> 
> Without LLVM_IAS=1, Clang uses two different assemblers when LTO is
> enabled: the external GNU assembler for stand-alone assembly, and
> LLVM's integrated assembler for inline assembly. as-instr tests the
> external assembler and makes an admittedly reasonable assumption that
> the test is also valid for inline assembly.
> 
> I agree that it would reduce confusion in future if we just always
> enabled IAS with LTO. Nick, Nathan, any thoughts about this?

I am personally fine with that. As far as I am aware, we are in a fairly
good spot on arm64 and x86_64 when it comes to the integrated assembler.
Should we make it so that the user has to pass LLVM_IAS=1 explicitly, or
should we just make adding the no-integrated-assembler flag to CLANG_FLAGS
depend on LTO not being enabled (although that will require extra handling
because Kconfig is not available at that stage, I think)?

Cheers,
Nathan

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-03 17:07   ` Sami Tolvanen
  2020-12-03 18:21     ` Nathan Chancellor
@ 2020-12-03 18:22     ` Will Deacon
  2020-12-03 22:32       ` Nick Desaulniers
  1 sibling, 1 reply; 48+ messages in thread
From: Will Deacon @ 2020-12-03 18:22 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Nick Desaulniers, Nathan Chancellor, Masahiro Yamada,
	Steven Rostedt, Josh Poimboeuf, Peter Zijlstra,
	Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	clang-built-linux, Kernel Hardening, linux-arch,
	linux-arm-kernel, linux-kbuild, LKML, PCI

On Thu, Dec 03, 2020 at 09:07:30AM -0800, Sami Tolvanen wrote:
> On Thu, Dec 3, 2020 at 3:26 AM Will Deacon <will@kernel.org> wrote:
> > On Tue, Dec 01, 2020 at 01:36:51PM -0800, Sami Tolvanen wrote:
> > > This patch series adds support for building the kernel with Clang's
> > > Link Time Optimization (LTO). In addition to performance, the primary
> > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI)
> > > to be used in the kernel. Google has shipped millions of Pixel
> > > devices running three major kernel versions with LTO+CFI since 2018.
> > >
> > > Most of the patches are build system changes for handling LLVM
> > > bitcode, which Clang produces with LTO instead of ELF object files,
> > > postponing ELF processing until a later stage, and ensuring initcall
> > > ordering.
> > >
> > > Note that arm64 support depends on Will's memory ordering patches
> > > [1]. I will post x86_64 patches separately after we have fixed the
> > > remaining objtool warnings [2][3].
> >
> > I took this series for a spin, with my for-next/lto branch merged in but
> > I see a failure during the LTO stage with clang 11.0.5 because it doesn't
> > understand the '.arch_extension rcpc' directive we throw out in READ_ONCE().
> 
> I just tested this with Clang 11.0.0, which I believe is the latest
> 11.x version, and the current Clang 12 development branch, and both
> work for me. Godbolt confirms that '.arch_extension rcpc' is supported
> by the integrated assembler starting with Clang 11 (the example fails
> with 10.0.1):
> 
> https://godbolt.org/z/1csGcT
> 
> What does running clang --version and ld.lld --version tell you?

I'm using some Android prebuilts I had kicking around:

Android (6875598, based on r399163b) clang version 11.0.5 (https://android.googlesource.com/toolchain/llvm-project 87f1315dfbea7c137aa2e6d362dbb457e388158d)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/google/home/willdeacon/work/android/repo/android-kernel/prebuilts-master/clang/host/linux-x86/clang-r399163b/bin

and:

LLD 11.0.5 (/buildbot/tmp/tmpx1DlI_ 87f1315dfbea7c137aa2e6d362dbb457e388158d) (compatible with GNU linkers)

> > We actually check that this extension is available before using it in
> > the arm64 Kconfig:
> >
> >         config AS_HAS_LDAPR
> >                 def_bool $(as-instr,.arch_extension rcpc)
> >
> > so this shouldn't happen. I then realised, I wasn't passing LLVM_IAS=1
> > on my Make command line; with that, then the detection works correctly
> > and the LTO step succeeds.
> >
> > Why is it necessary to pass LLVM_IAS=1 if LTO is enabled? I think it
> > would be _much_ better if this was implicit (or if LTO depended on it).
> 
> Without LLVM_IAS=1, Clang uses two different assemblers when LTO is
> enabled: the external GNU assembler for stand-alone assembly, and
> LLVM's integrated assembler for inline assembly. as-instr tests the
> external assembler and makes an admittedly reasonable assumption that
> the test is also valid for inline assembly.
> 
> I agree that it would reduce confusion in future if we just always
> enabled IAS with LTO. Nick, Nathan, any thoughts about this?

That works for me, although I'm happy with anything which means that the
assembler checks via as-instr apply to the assembler which will ultimately
be used.

Will

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-03 18:22     ` Will Deacon
@ 2020-12-03 22:32       ` Nick Desaulniers
  2020-12-04  9:35         ` Will Deacon
  2020-12-04 22:52         ` Sami Tolvanen
  0 siblings, 2 replies; 48+ messages in thread
From: Nick Desaulniers @ 2020-12-03 22:32 UTC (permalink / raw)
  To: Will Deacon, Sami Tolvanen, Masahiro Yamada
  Cc: Nathan Chancellor, Steven Rostedt, Josh Poimboeuf,
	Peter Zijlstra, Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	clang-built-linux, Kernel Hardening, linux-arch,
	linux-arm-kernel, linux-kbuild, LKML, PCI, Jian Cai,
	Kristof Beyls

On Thu, Dec 3, 2020 at 10:23 AM Will Deacon <will@kernel.org> wrote:
>
> On Thu, Dec 03, 2020 at 09:07:30AM -0800, Sami Tolvanen wrote:
> > On Thu, Dec 3, 2020 at 3:26 AM Will Deacon <will@kernel.org> wrote:
> > > I took this series for a spin, with my for-next/lto branch merged in but
> > > I see a failure during the LTO stage with clang 11.0.5 because it doesn't
> > > understand the '.arch_extension rcpc' directive we throw out in READ_ONCE().
> >
> > I just tested this with Clang 11.0.0, which I believe is the latest
> > 11.x version, and the current Clang 12 development branch, and both
> > work for me. Godbolt confirms that '.arch_extension rcpc' is supported
> > by the integrated assembler starting with Clang 11 (the example fails
> > with 10.0.1):
> >
> > https://godbolt.org/z/1csGcT
> >
> > What does running clang --version and ld.lld --version tell you?
>
> I'm using some Android prebuilts I had kicking around:
>
> Android (6875598, based on r399163b) clang version 11.0.5 (https://android.googlesource.com/toolchain/llvm-project 87f1315dfbea7c137aa2e6d362dbb457e388158d)
> Target: x86_64-unknown-linux-gnu
> Thread model: posix
> InstalledDir: /usr/local/google/home/willdeacon/work/android/repo/android-kernel/prebuilts-master/clang/host/linux-x86/clang-r399163b/bin
>
> and:
>
> LLD 11.0.5 (/buildbot/tmp/tmpx1DlI_ 87f1315dfbea7c137aa2e6d362dbb457e388158d) (compatible with GNU linkers)

On Thu, Dec 3, 2020 at 10:22 AM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> 11.0.5 is AOSP's clang, which is behind the upstream 11.0.0 release so
> it is most likely the case that it is missing the patch that added rcpc.
> I think that a version based on the development branch (12.0.0) is in
> the works but I am not sure.

Yep, I have a lot of thoughts on the AOSP LLVM versioning scheme, but
they're not for LKML.  That's yet another reason to prefer feature
detection as opposed to brittle version checks.  Of course, as Will
points out, if your feature detection is broken, that helps no
one...more thoughts below.

> > > We actually check that this extension is available before using it in
> > > the arm64 Kconfig:
> > >
> > >         config AS_HAS_LDAPR
> > >                 def_bool $(as-instr,.arch_extension rcpc)
> > >
> > > so this shouldn't happen. I then realised, I wasn't passing LLVM_IAS=1
> > > on my Make command line; with that, then the detection works correctly
> > > and the LTO step succeeds.
> > >
> > > Why is it necessary to pass LLVM_IAS=1 if LTO is enabled? I think it
> > > would be _much_ better if this was implicit (or if LTO depended on it).
> >
> > Without LLVM_IAS=1, Clang uses two different assemblers when LTO is
> > enabled: the external GNU assembler for stand-alone assembly, and
> > LLVM's integrated assembler for inline assembly. as-instr tests the
> > external assembler and makes an admittedly reasonable assumption that
> > the test is also valid for inline assembly.
> >
> > I agree that it would reduce confusion in future if we just always
> > enabled IAS with LTO. Nick, Nathan, any thoughts about this?
>
> That works for me, although I'm happy with anything which means that the
> assembler checks via as-instr apply to the assembler which will ultimately
> be used.

I agree with Will.

I think interoperability of tools is important.  We should be able to
mix tools from GNU and LLVM and produce working images. Specifically,
combinations like gcc+llvm-nm+as+llvm-objcopy, or clang+nm+as+objcopy
as two examples.  There's a combinatorial explosion of combinations to
test/validate, which we're not doing today, but if for some reason
someone wants to use some varied combination and it doesn't work, it's
worthwhile to understand the differences and issues and try to fix
them.  That is a win for optionality and loose coupling.

That's not what's going on here though.

While I think it's ok to select a compiler and assembler and linker
etc from one ecosystem or another, I think trying to support a build that
mixes or uses different assemblers (or linkers, compilers, etc) from
both for the same build is something we should draw a line in the sand
and explicitly not support (except for the compat vdso's*...).  ie. if
I say `make LD=ld.bfd` and ld.lld gets invoked somehow (or vice
versa); I consider that a bug in KBUILD.

That is what's happening here, it's why as-instr feature detection is
broken; because two different assemblers were used in the same build.
One for inline asm, a different one for out of line asm.  At the very
least, it violates the Principle of Least Surprise (or is it the Law
of Equivalent Exchange, I forget).

In fact, lots of the work we've been doing to enable LLVM tools to
build the kernel has been identifying places throughout KBUILD where
tools were hardcoded rather than using what make was told to use, and
we've been making progress fixing those.  The ultimate test of Linux
kernel build hermeticity IMO is that I should be able to build a
kernel in an environment that only has one version of either
GCC/binutils or LLVM, and the kernel should build without failure.
That's not the case today for all arch's; cross compiling compat vdsos
again are a major pain point*, but we're making progress.  In that
sense, the mixing of an individual GNU and LLVM utility is what I
would consider a bug in KBUILD.  I want to emphasize that's distinct
from mixing and matching tools when invoking make, which I consider
OK, if under-tested.

Ok (mixes GNU and LLVM tools; gcc is the only compiler invoked, ld.lld
is the only linker invoked):
$ make CC=gcc LD=ld.lld

Not ok (if ld.bfd or both are invoked)
$ make LD=ld.lld

Not ok (if ld.lld or both are invoked)
$ make LD=ld.bfd

Not ok (if clang's integrated assembler and GAS are invoked)
$ ./scripts/config -e LTO_CLANG
$ make LLVM=1 LLVM_IAS=1

The mixing of GAS and clang's integrated assembler for kernel LTO
builds is a relic of a time when this series was first written when
Clang's integrated assembler was in no form ready to assemble the
entire Linux kernel, but could handle the inline asm for aarch64.
Fortunately, ARM's LLVM team has done great work to ensure the latest
extensions like RCpc are supported and compatible, and Jian has done
the hard work ironing out the last mile issues in clang's assembler to
get the ball in the end zone.  Removing mixing GAS and clang's IA here
ups the ante and removes a fallback/pressure relief valve, but I'm
fine with that.  Requiring clang's integrated assembler here aligns
incentives to keep this working and to continue investing here.

Just because it's possible to mix the use of clang's integrated
assembler with GNU assembler for LTO (for some combination of versions
of these tools) doesn't mean we should support it, or encourage it,
for all of the reasons above.  We should make this config depend on
clang's integrated assembler, and not support the mixing of assemblers
in one build.

Thou shalt not support invoking of different tools than what's
specified*.  Do not pass go, do not collect $200. Full stop.

* The compat vdso's are again a special case; when cross compiling
using GNU tools, a separate binary with a different target triple
prefix will typically get invoked than what's used to build the rest
of the kernel image.  This still doesn't cross the GNU/LLVM boundary
though, and most importantly doesn't involve linking together object
files that were built with distinct assemblers (for example).

So I'd recommend to Sami to simply make the Kconfig also depend on
clang's integrated assembler (not just llvm-nm and llvm-ar).  If
someone cares about LTO with Clang as the compiler but GAS as the
assembler, then we can revisit supporting that combination (and the
changes to KCONFIG), but it shouldn't be something we consider Tier 1
supported or a combination that need be supported in a minimum viable
product. And at that point we should make it avoid clang's integrated
assembler entirely (I suspect LTO won't work at all in that case, so
maybe even considering it is a waste of time).

One question I have to Will; if for aarch64 LTO will depend on RCpc,
but RCpc is an ARMv8.3 extension, what are the implications for LTO on
pre-ARMv8.3 aarch64 processors?
-- 
Thanks,
~Nick Desaulniers

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-03 22:32       ` Nick Desaulniers
@ 2020-12-04  9:35         ` Will Deacon
  2020-12-04 22:52         ` Sami Tolvanen
  1 sibling, 0 replies; 48+ messages in thread
From: Will Deacon @ 2020-12-04  9:35 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Sami Tolvanen, Masahiro Yamada, Nathan Chancellor,
	Steven Rostedt, Josh Poimboeuf, Peter Zijlstra,
	Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	clang-built-linux, Kernel Hardening, linux-arch,
	linux-arm-kernel, linux-kbuild, LKML, PCI, Jian Cai,
	Kristof Beyls

On Thu, Dec 03, 2020 at 02:32:13PM -0800, Nick Desaulniers wrote:
> On Thu, Dec 3, 2020 at 10:23 AM Will Deacon <will@kernel.org> wrote:
> > On Thu, Dec 03, 2020 at 09:07:30AM -0800, Sami Tolvanen wrote:
> > > Without LLVM_IAS=1, Clang uses two different assemblers when LTO is
> > > enabled: the external GNU assembler for stand-alone assembly, and
> > > LLVM's integrated assembler for inline assembly. as-instr tests the
> > > external assembler and makes an admittedly reasonable assumption that
> > > the test is also valid for inline assembly.
> > >
> > > I agree that it would reduce confusion in future if we just always
> > > enabled IAS with LTO. Nick, Nathan, any thoughts about this?
> >
> > That works for me, although I'm happy with anything which means that the
> > assembler checks via as-instr apply to the assembler which will ultimately
> > be used.
> 
> I agree with Will.

[...]

> So I'd recommend to Sami to simply make the Kconfig also depend on
> clang's integrated assembler (not just llvm-nm and llvm-ar).  If
> someone cares about LTO with Clang as the compiler but GAS as the
> assembler, then we can revisit supporting that combination (and the
> changes to KCONFIG), but it shouldn't be something we consider Tier 1
> supported or a combination that need be supported in a minimum viable
> product. And at that point we should make it avoid clang's integrated
> assembler entirely (I suspect LTO won't work at all in that case, so
> maybe even considering it is a waste of time).
> 
> One question I have to Will; if for aarch64 LTO will depend on RCpc,
> but RCpc is an ARMv8.3 extension, what are the implications for LTO on
> pre-ARMv8.3 aarch64 processors?

It doesn't depend on RCpc -- we just emit a more expensive instruction
(an RCsc acquire) if the RCpc one is not supported by both the toolchain
and the CPU. So the implication for those processors is that READ_ONCE()
may be more expensive.

Will

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-03 22:32       ` Nick Desaulniers
  2020-12-04  9:35         ` Will Deacon
@ 2020-12-04 22:52         ` Sami Tolvanen
  2020-12-06  6:50           ` Nathan Chancellor
  1 sibling, 1 reply; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-04 22:52 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Will Deacon, Masahiro Yamada, Nathan Chancellor, Steven Rostedt,
	Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, clang-built-linux, Kernel Hardening,
	linux-arch, linux-arm-kernel, linux-kbuild, LKML, PCI, Jian Cai,
	Kristof Beyls

On Thu, Dec 3, 2020 at 2:32 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> So I'd recommend to Sami to simply make the Kconfig also depend on
> clang's integrated assembler (not just llvm-nm and llvm-ar).

Sure, sounds good to me. What's the preferred way to test for this in Kconfig?

It looks like actually trying to test if we have an LLVM assembler
(e.g. using $(as-instr,.section
".linker-options","e",@llvm_linker_options)) doesn't work as Kconfig
doesn't pass -no-integrated-as to clang here. I could do something
simple like $(success,echo $(LLVM) $(LLVM_IAS) | grep -q "1 1").

Thoughts?

Sami

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-04 22:52         ` Sami Tolvanen
@ 2020-12-06  6:50           ` Nathan Chancellor
  2020-12-06 20:09             ` Sami Tolvanen
  0 siblings, 1 reply; 48+ messages in thread
From: Nathan Chancellor @ 2020-12-06  6:50 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Nick Desaulniers, Will Deacon, Masahiro Yamada, Steven Rostedt,
	Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, clang-built-linux, Kernel Hardening,
	linux-arch, linux-arm-kernel, linux-kbuild, LKML, PCI, Jian Cai,
	Kristof Beyls

On Fri, Dec 04, 2020 at 02:52:41PM -0800, Sami Tolvanen wrote:
> On Thu, Dec 3, 2020 at 2:32 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
> >
> > So I'd recommend to Sami to simply make the Kconfig also depend on
> > clang's integrated assembler (not just llvm-nm and llvm-ar).
> 
> Sure, sounds good to me. What's the preferred way to test for this in Kconfig?
> 
> It looks like actually trying to test if we have an LLVM assembler
> (e.g. using $(as-instr,.section
> ".linker-options","e",@llvm_linker_options)) doesn't work as Kconfig
> doesn't pass -no-integrated-as to clang here. I could do something
> simple like $(success,echo $(LLVM) $(LLVM_IAS) | grep -q "1 1").
> 
> Thoughts?
> 
> Sami

I think

    depends on $(success,test $(LLVM_IAS) -eq 1)

should work, at least according to my brief test.
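
A minimal sketch of how that test behaves for different values, assuming
$(success,...) reports y when the command exits with status 0 and n
otherwise. The helper name llvm_ias_enabled is hypothetical and only
mirrors the `test $(LLVM_IAS) -eq 1` expression above.

```shell
# Hypothetical helper mirroring `test $(LLVM_IAS) -eq 1`.
# Prints "y" when the argument is the integer 1, "n" otherwise.
llvm_ias_enabled() {
    # An empty or non-numeric value makes test exit non-zero (with a
    # complaint on stderr, silenced here), so the result is "n".
    if test "$1" -eq 1 2>/dev/null; then
        echo y
    else
        echo n
    fi
}

llvm_ias_enabled 1    # LLVM_IAS=1 passed to make
llvm_ias_enabled 0    # LLVM_IAS=0 passed to make
llvm_ias_enabled ""   # LLVM_IAS not set at all
```

The unset case matters most here: a build that never mentions LLVM_IAS
ends up with the dependency unmet rather than with an error.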

Cheers,
Nathan

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-06  6:50           ` Nathan Chancellor
@ 2020-12-06 20:09             ` Sami Tolvanen
  2020-12-08  0:46               ` Nathan Chancellor
  0 siblings, 1 reply; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-06 20:09 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Nick Desaulniers, Will Deacon, Masahiro Yamada, Steven Rostedt,
	Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, clang-built-linux, Kernel Hardening,
	linux-arch, linux-arm-kernel, linux-kbuild, LKML, PCI, Jian Cai,
	Kristof Beyls

On Sat, Dec 5, 2020 at 10:50 PM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> On Fri, Dec 04, 2020 at 02:52:41PM -0800, Sami Tolvanen wrote:
> > On Thu, Dec 3, 2020 at 2:32 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
> > >
> > > So I'd recommend to Sami to simply make the Kconfig also depend on
> > > clang's integrated assembler (not just llvm-nm and llvm-ar).
> >
> > Sure, sounds good to me. What's the preferred way to test for this in Kconfig?
> >
> > It looks like actually trying to test if we have an LLVM assembler
> > (e.g. using $(as-instr,.section
> > ".linker-options","e",@llvm_linker_options)) doesn't work as Kconfig
> > doesn't pass -no-integrated-as to clang here.

After a closer look, that's actually not correct, this seems to work
with Clang+LLD no matter which assembler is used. I suppose we could
test for .gasversion. to detect GNU as, but that's hardly ideal.

> > I could do something
> > simple like $(success,echo $(LLVM) $(LLVM_IAS) | grep -q "1 1").
> >
> > Thoughts?
> >
> > Sami
>
> I think
>
>     depends on $(success,test $(LLVM_IAS) -eq 1)
>
> should work, at least according to my brief test.

Sure, looks good to me. However, I think we should also test for
LLVM=1 to avoid possible further issues with mismatched toolchains
instead of only checking for llvm-nm and llvm-ar.

Sami

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-06 20:09             ` Sami Tolvanen
@ 2020-12-08  0:46               ` Nathan Chancellor
  0 siblings, 0 replies; 48+ messages in thread
From: Nathan Chancellor @ 2020-12-08  0:46 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Nick Desaulniers, Will Deacon, Masahiro Yamada, Steven Rostedt,
	Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, clang-built-linux, Kernel Hardening,
	linux-arch, linux-arm-kernel, linux-kbuild, LKML, PCI, Jian Cai,
	Kristof Beyls

On Sun, Dec 06, 2020 at 12:09:31PM -0800, Sami Tolvanen wrote:
> Sure, looks good to me. However, I think we should also test for
> LLVM=1 to avoid possible further issues with mismatched toolchains
> instead of only checking for llvm-nm and llvm-ar.

It might still be worth testing for $(AR) and $(NM) because in theory, a
user could say 'make AR=ar LLVM=1'. Highly unlikely I suppose but worth
considering.
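
A sketch of what such a per-tool check could look like in shell, assuming
that LLVM's ar and nm mention "LLVM" in the first line of their --help
output. That assumption and the function name is_llvm_tool are
illustrative and not taken from the series.

```shell
# Hypothetical check: does the given tool identify itself as an LLVM tool?
# Succeeds (exit status 0) if the first line of `<tool> --help` mentions
# "llvm", case-insensitively.
is_llvm_tool() {
    "$@" --help 2>/dev/null | head -n 1 | grep -qi llvm
}

# Check whatever AR and NM are set to, falling back to the default names.
for tool in "${AR:-ar}" "${NM:-nm}"; do
    if is_llvm_tool "$tool"; then
        echo "$tool: looks like an LLVM tool"
    else
        echo "$tool: does not look like an LLVM tool"
    fi
done
```

Checking the tools themselves, rather than only LLVM=1, would also catch
an invocation like 'make AR=ar LLVM=1'.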

Cheers,
Nathan

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
                   ` (17 preceding siblings ...)
  2020-12-03 11:26 ` Will Deacon
@ 2020-12-08 12:15 ` Arnd Bergmann
  2020-12-08 13:54   ` Arnd Bergmann
                     ` (2 more replies)
  18 siblings, 3 replies; 48+ messages in thread
From: Arnd Bergmann @ 2020-12-08 12:15 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Masahiro Yamada, Steven Rostedt, Will Deacon, Josh Poimboeuf,
	Peter Zijlstra, Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	Nick Desaulniers, clang-built-linux, Kernel Hardening,
	linux-arch, Linux ARM, Linux Kbuild mailing list, linux-kernel,
	linux-pci

On Tue, Dec 1, 2020 at 10:37 PM 'Sami Tolvanen' via Clang Built Linux
<clang-built-linux@googlegroups.com> wrote:
>
> This patch series adds support for building the kernel with Clang's
> Link Time Optimization (LTO). In addition to performance, the primary
> motivation for LTO is to allow Clang's Control-Flow Integrity (CFI)
> to be used in the kernel. Google has shipped millions of Pixel
> devices running three major kernel versions with LTO+CFI since 2018.
>
> Most of the patches are build system changes for handling LLVM
> bitcode, which Clang produces with LTO instead of ELF object files,
> postponing ELF processing until a later stage, and ensuring initcall
> ordering.
>
> Note that arm64 support depends on Will's memory ordering patches
> [1]. I will post x86_64 patches separately after we have fixed the
> remaining objtool warnings [2][3].
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto
> [2] https://lore.kernel.org/lkml/20201120040424.a3wctajzft4ufoiw@treble/
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=objtool-vmlinux
>
> You can also pull this series from
>
>   https://github.com/samitolvanen/linux.git lto-v8

I've tried pulling this into my randconfig test tree to give it a spin.
So far I have not managed to get a working build out of it, the main
problem so far being that it is really slow to build because the link
stage only uses one CPU. These are the other issues I've seen so far:

- one build seems to take even longer to link. It's currently at 35GB RAM
  usage and 40 minutes into the final link, but I'm worried it might not
  complete before it runs out of memory.  I only have 128GB installed, and
  google-chrome uses another 30GB of that, and I'm also doing some other
  builds in parallel. Is there a minimum recommended amount of memory for
  doing LTO builds?

- One build failed with
 ld.lld -EL -maarch64elf -mllvm -import-instr-limit=5 -r -o vmlinux.o
-T .tmp_initcalls.lds --whole-archive arch/arm64/kernel/head.o
init/built-in.a usr/built-in.a arch/arm64/built-in.a kernel/built-in.a
certs/built-in.a mm/built-in.a fs/built-in.a ipc/built-in.a
security/built-in.a crypto/built-in.a block/built-in.a
arch/arm64/lib/built-in.a lib/built-in.a drivers/built-in.a
sound/built-in.a net/built-in.a virt/built-in.a --no-whole-archive
--start-group arch/arm64/lib/lib.a lib/lib.a
./drivers/firmware/efi/libstub/lib.a --end-group
  "ld.lld: error: arch/arm64/kernel/head.o: invalid symbol index"
  after about 30 minutes

- CONFIG_CPU_BIG_ENDIAN doesn't seem to work with lld, and LTO
  doesn't work with ld.bfd.
  I've added a CPU_LITTLE_ENDIAN dependency to
  ARCH_SUPPORTS_LTO_CLANG{,THIN}

- one build failed with
  "ld.lld: error: Never resolved function from blockaddress (Producer:
'LLVM12.0.0' Reader: 'LLVM 12.0.0')"
  Not sure how to debug this

- one build seems to have dropped all symbols for the string operations
  from vmlinux, so while the link goes through, modules cannot be loaded:
 ERROR: modpost: "memmove" [drivers/media/rc/rc-core.ko] undefined!
 ERROR: modpost: "memcpy" [net/wireless/cfg80211.ko] undefined!
 ERROR: modpost: "memcpy" [net/8021q/8021q.ko] undefined!
 ERROR: modpost: "memset" [net/8021q/8021q.ko] undefined!
 ERROR: modpost: "memcpy" [net/unix/unix.ko] undefined!
 ERROR: modpost: "memset" [net/sched/cls_u32.ko] undefined!
 ERROR: modpost: "memcpy" [net/sched/cls_u32.ko] undefined!
 ERROR: modpost: "memset" [net/sched/sch_skbprio.ko] undefined!
 ERROR: modpost: "memcpy" [net/802/garp.ko] undefined!
 I first thought this was related to a clang-12 bug I saw the other day, but
 this also happens with clang-11

- many builds complain about thousands of duplicate symbols in the kernel, e.g.
  ld.lld: error: duplicate symbol: qrtr_endpoint_post
 >>> defined in net/qrtr/qrtr.lto.o
 >>> defined in net/qrtr/qrtr.o
 ld.lld: error: duplicate symbol: init_module
 >>> defined in crypto/842.lto.o
 >>> defined in crypto/842.o
 ld.lld: error: duplicate symbol: init_module
 >>> defined in net/netfilter/nfnetlink_log.lto.o
 >>> defined in net/netfilter/nfnetlink_log.o
 ld.lld: error: duplicate symbol: vli_from_be64
 >>> defined in crypto/ecc.lto.o
 >>> defined in crypto/ecc.o
 ld.lld: error: duplicate symbol: __mod_of__plldig_clk_id_device_table
 >>> defined in drivers/clk/clk-plldig.lto.o
 >>> defined in drivers/clk/clk-plldig.o

Not sure if these are all known issues. If there is one you'd like me to
take a closer look at to find which config options break it, I can try.
     Arnd

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-08 12:15 ` Arnd Bergmann
@ 2020-12-08 13:54   ` Arnd Bergmann
  2020-12-08 16:53     ` Sami Tolvanen
  2020-12-08 16:43   ` Sami Tolvanen
  2020-12-09 12:35   ` Arnd Bergmann
  2 siblings, 1 reply; 48+ messages in thread
From: Arnd Bergmann @ 2020-12-08 13:54 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Masahiro Yamada, Steven Rostedt, Will Deacon, Josh Poimboeuf,
	Peter Zijlstra, Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	Nick Desaulniers, clang-built-linux, Kernel Hardening,
	linux-arch, Linux ARM, Linux Kbuild mailing list, linux-kernel,
	linux-pci

On Tue, Dec 8, 2020 at 1:15 PM Arnd Bergmann <arnd@kernel.org> wrote:
> On Tue, Dec 1, 2020 at 10:37 PM 'Sami Tolvanen' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote:
>
> - many builds complain about thousands of duplicate symbols in the kernel, e.g.
>   ld.lld: error: duplicate symbol: qrtr_endpoint_post
>  >>> defined in net/qrtr/qrtr.lto.o
>  >>> defined in net/qrtr/qrtr.o
>  ld.lld: error: duplicate symbol: init_module
>  >>> defined in crypto/842.lto.o
>  >>> defined in crypto/842.o
>  ld.lld: error: duplicate symbol: init_module
>  >>> defined in net/netfilter/nfnetlink_log.lto.o
>  >>> defined in net/netfilter/nfnetlink_log.o
>  ld.lld: error: duplicate symbol: vli_from_be64
>  >>> defined in crypto/ecc.lto.o
>  >>> defined in crypto/ecc.o
>  ld.lld: error: duplicate symbol: __mod_of__plldig_clk_id_device_table
>  >>> defined in drivers/clk/clk-plldig.lto.o
>  >>> defined in drivers/clk/clk-plldig.o

A small update here: I see this behavior with every single module
build, including 'tinyconfig' with one module enabled, and 'defconfig'.

I tuned the randconfig setting using KCONFIG_PROBABILITY=2:2:1
now, which only enables a few symbols. With this I see faster build
times (obviously), around 30 seconds per kernel, and all small builds
with CONFIG_MODULES disabled so far succeed.
It appears that the problems I saw originally only happen for larger
configurations, or possibly a combination of Kconfig options that don't
happen that often on randconfig builds with low
KCONFIG_PROBABILITY.

      Arnd

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-08 12:15 ` Arnd Bergmann
  2020-12-08 13:54   ` Arnd Bergmann
@ 2020-12-08 16:43   ` Sami Tolvanen
       [not found]     ` <CAK8P3a1Xfpt7QLkvxjtXKcgzcWkS8g9bmxD687+rqjTafTzKrg@mail.gmail.com>
  2020-12-09  4:55     ` Fangrui Song
  2020-12-09 12:35   ` Arnd Bergmann
  2 siblings, 2 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-08 16:43 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Masahiro Yamada, Steven Rostedt, Will Deacon, Josh Poimboeuf,
	Peter Zijlstra, Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	Nick Desaulniers, clang-built-linux, Kernel Hardening,
	linux-arch, Linux ARM, Linux Kbuild mailing list, linux-kernel,
	linux-pci

On Tue, Dec 8, 2020 at 4:15 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Tue, Dec 1, 2020 at 10:37 PM 'Sami Tolvanen' via Clang Built Linux
> <clang-built-linux@googlegroups.com> wrote:
> >
> > This patch series adds support for building the kernel with Clang's
> > Link Time Optimization (LTO). In addition to performance, the primary
> > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI)
> > to be used in the kernel. Google has shipped millions of Pixel
> > devices running three major kernel versions with LTO+CFI since 2018.
> >
> > Most of the patches are build system changes for handling LLVM
> > bitcode, which Clang produces with LTO instead of ELF object files,
> > postponing ELF processing until a later stage, and ensuring initcall
> > ordering.
> >
> > Note that arm64 support depends on Will's memory ordering patches
> > [1]. I will post x86_64 patches separately after we have fixed the
> > remaining objtool warnings [2][3].
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto
> > [2] https://lore.kernel.org/lkml/20201120040424.a3wctajzft4ufoiw@treble/
> > [3] https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=objtool-vmlinux
> >
> > You can also pull this series from
> >
> >   https://github.com/samitolvanen/linux.git lto-v8
>
> I've tried pull this into my randconfig test tree to give it a spin.

Great, thank you for testing this!

> So far I have
> not managed to get a working build out of it, the main problem so far being
> that it is really slow to build because the link stage only uses one CPU.
> These are the other issues I've seen so far:

You may want to limit your testing only to ThinLTO at first, because
full LTO is going to be extremely slow with larger configs, especially
when building arm64 kernels.

> - one build seems to take even longer to link. It's currently at 35GB RAM
>   usage and 40 minutes into the final link, but I'm worried it might
> not complete
>   before it runs out of memory.  I only have 128GB installed, and google-chrome
>   uses another 30GB of that, and I'm also doing some other builds in parallel.
>   Is there a minimum recommended amount of memory for doing LTO builds?

When building arm64 defconfig, the maximum memory usage I measured
with ThinLTO was 3.5 GB, and with full LTO 20.3 GB. I haven't measured
larger configurations, but I believe LLD can easily consume 3-4x that
much with full LTO allyesconfig.
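
As a back-of-the-envelope illustration of the numbers above, the sketch below
scales the measured full LTO defconfig peak by the guessed 3-4x factor. The
result is an estimate derived from the figures in this message, not a
measurement, and the names are only illustrative.

```python
# Peak linker memory measured for arm64 defconfig with full LTO (from above).
FULL_LTO_DEFCONFIG_GB = 20.3


def full_lto_allyesconfig_estimate_gb(low_factor=3, high_factor=4):
    """Scale the defconfig peak by the guessed 3-4x factor for allyesconfig."""
    return (round(FULL_LTO_DEFCONFIG_GB * low_factor, 1),
            round(FULL_LTO_DEFCONFIG_GB * high_factor, 1))


if __name__ == "__main__":
    low, high = full_lto_allyesconfig_estimate_gb()
    print(f"full LTO allyesconfig: roughly {low}-{high} GB (estimate only)")
```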

> - One build failed with
>  ld.lld -EL -maarch64elf -mllvm -import-instr-limit=5 -r -o vmlinux.o
> -T .tmp_initcalls.lds --whole-archive arch/arm64/kernel/head.o
> init/built-in.a usr/built-in.a arch/arm64/built-in.a kernel/built-in.a
> certs/built-in.a mm/built-in.a fs/built-in.a ipc/built-in.a
> security/built-in.a crypto/built-in.a block/built-in.a
> arch/arm64/lib/built-in.a lib/built-in.a drivers/built-in.a
> sound/built-in.a net/built-in.a virt/built-in.a --no-whole-archive
> --start-group arch/arm64/lib/lib.a lib/lib.a
> ./drivers/firmware/efi/libstub/lib.a --end-group
>   "ld.lld: error: arch/arm64/kernel/head.o: invalid symbol index"
>   after about 30 minutes

That's interesting. Did you use LLVM_IAS=1?

> - CONFIG_CPU_BIG_ENDIAN doesn't seem to work with lld, and LTO
>   doesn't work with ld.bfd.
>   I've added a CPU_LITTLE_ENDIAN dependency to
>   ARCH_SUPPORTS_LTO_CLANG{,THIN}

Ah, good point. I'll fix this in v9.

[...]
> Not sure if these are all known issues. If there is one you'd like me try
> take a closer look at for finding which config options break it, I can try

No, none of these are known issues. I would be happy to take a closer
look if you can share configs that reproduce these.

Sami

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-08 13:54   ` Arnd Bergmann
@ 2020-12-08 16:53     ` Sami Tolvanen
  2020-12-08 16:56       ` Arnd Bergmann
  0 siblings, 1 reply; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-08 16:53 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Masahiro Yamada, Steven Rostedt, Will Deacon, Josh Poimboeuf,
	Peter Zijlstra, Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	Nick Desaulniers, clang-built-linux, Kernel Hardening,
	linux-arch, Linux ARM, Linux Kbuild mailing list, linux-kernel,
	linux-pci

On Tue, Dec 8, 2020 at 5:55 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Tue, Dec 8, 2020 at 1:15 PM Arnd Bergmann <arnd@kernel.org> wrote:
> > On Tue, Dec 1, 2020 at 10:37 PM 'Sami Tolvanen' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote:
> >
> > - many builds complain about thousands of duplicate symbols in the kernel, e.g.
> >   ld.lld: error: duplicate symbol: qrtr_endpoint_post
> >  >>> defined in net/qrtr/qrtr.lto.o
> >  >>> defined in net/qrtr/qrtr.o
> >  ld.lld: error: duplicate symbol: init_module
> >  >>> defined in crypto/842.lto.o
> >  >>> defined in crypto/842.o
> >  ld.lld: error: duplicate symbol: init_module
> >  >>> defined in net/netfilter/nfnetlink_log.lto.o
> >  >>> defined in net/netfilter/nfnetlink_log.o
> >  ld.lld: error: duplicate symbol: vli_from_be64
> >  >>> defined in crypto/ecc.lto.o
> >  >>> defined in crypto/ecc.o
> >  ld.lld: error: duplicate symbol: __mod_of__plldig_clk_id_device_table
> >  >>> defined in drivers/clk/clk-plldig.lto.o
> >  >>> defined in drivers/clk/clk-plldig.o
>
> A small update here: I see this behavior with every single module
> build, including 'tinyconfig' with one module enabled, and 'defconfig'.

The .o file here is a thin archive of the bitcode files for the
module. We compile .lto.o from that before modpost, because we need an
ELF binary to process, and then reuse the .lto.o file when linking the
final module.

At no point should we link the .o file again, especially not with
.lto.o, because that would clearly cause every symbol to be
duplicated, so I'm not sure what goes wrong here. Here's the relevant
part of scripts/Makefile.modfinal:

ifdef CONFIG_LTO_CLANG
# With CONFIG_LTO_CLANG, reuse the object file we compiled for modpost to
# avoid a second slow LTO link
prelink-ext := .lto
...
$(modules): %.ko: %$(prelink-ext).o %.mod.o scripts/module.lds FORCE
        +$(call if_changed,ld_ko_o)
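
To check whether every duplicate really is a foo.o / foo.lto.o pair, as the
errors quoted above suggest, a throwaway script along these lines might help.
It is only a sketch: it assumes the log has exactly the shape shown in this
thread ("duplicate symbol:" followed by ">>> defined in" lines), and the
helper names are hypothetical.

```python
import re

DUP_RE = re.compile(r"ld\.lld: error: duplicate symbol: (\S+)")
DEF_RE = re.compile(r">>> defined in (\S+)")


def parse_duplicates(log):
    """Return a list of (symbol, [defining files]) from an ld.lld log."""
    results = []
    current = None
    for raw in log.splitlines():
        # Drop leading quote markers and whitespace before matching.
        line = raw.strip()
        m = DUP_RE.search(line)
        if m:
            current = (m.group(1), [])
            results.append(current)
            continue
        m = DEF_RE.search(line)
        if m and current is not None:
            current[1].append(m.group(1))
    return results


def is_prelink_pair(files):
    """True if the two definitions are foo.o and foo.lto.o of one module."""
    lto = [f for f in files if f.endswith(".lto.o")]
    plain = [f for f in files
             if f.endswith(".o") and not f.endswith(".lto.o")]
    return (len(files) == 2 and len(lto) == 1 and len(plain) == 1
            and lto[0][:-len(".lto.o")] == plain[0][:-len(".o")])
```

If is_prelink_pair() holds for every entry, the thin archive and the
prelinked object are both ending up on the same link line somewhere.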

Sami

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-08 16:53     ` Sami Tolvanen
@ 2020-12-08 16:56       ` Arnd Bergmann
  0 siblings, 0 replies; 48+ messages in thread
From: Arnd Bergmann @ 2020-12-08 16:56 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Masahiro Yamada, Steven Rostedt, Will Deacon, Josh Poimboeuf,
	Peter Zijlstra, Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	Nick Desaulniers, clang-built-linux, Kernel Hardening,
	linux-arch, Linux ARM, Linux Kbuild mailing list, linux-kernel,
	linux-pci

On Tue, Dec 8, 2020 at 5:53 PM 'Sami Tolvanen' via Clang Built Linux
<clang-built-linux@googlegroups.com> wrote:
>
> > A small update here: I see this behavior with every single module
> > build, including 'tinyconfig' with one module enabled, and 'defconfig'.
>
> The .o file here is a thin archive of the bitcode files for the
> module. We compile .lto.o from that before modpost, because we need an
> ELF binary to process, and then reuse the .lto.o file when linking the
> final module.
>
> At no point should we link the .o file again, especially not with
> .lto.o, because that would clearly cause every symbol to be
> duplicated, so I'm not sure what goes wrong here. Here's the relevant
> part of scripts/Makefile.modfinal:
>
> ifdef CONFIG_LTO_CLANG
> # With CONFIG_LTO_CLANG, reuse the object file we compiled for modpost to
> # avoid a second slow LTO link
> prelink-ext := .lto
> ...
> $(modules): %.ko: %$(prelink-ext).o %.mod.o scripts/module.lds FORCE
>         +$(call if_changed,ld_ko_o)

Ah, it's probably a local problem now, as I had a merge conflict against
linux-next in this Makefile and I must have resolved the conflict incorrectly.

        Arnd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v8 00/16] Add support for Clang LTO
       [not found]     ` <CAK8P3a1Xfpt7QLkvxjtXKcgzcWkS8g9bmxD687+rqjTafTzKrg@mail.gmail.com>
@ 2020-12-08 21:09       ` Nick Desaulniers
  2020-12-08 22:20         ` Arnd Bergmann
       [not found]       ` <CAK8P3a3O65m6Us=YvCP3QA+0kqAeEqfi-DLOJa+JYmBqs8-JcA@mail.gmail.com>
  1 sibling, 1 reply; 48+ messages in thread
From: Nick Desaulniers @ 2020-12-08 21:09 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Sami Tolvanen, Masahiro Yamada, Steven Rostedt, Will Deacon,
	Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, clang-built-linux, Kernel Hardening,
	linux-arch, Linux ARM, Linux Kbuild mailing list, linux-kernel,
	linux-pci

On Tue, Dec 8, 2020 at 1:00 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Tue, Dec 8, 2020 at 5:43 PM 'Sami Tolvanen' via Clang Built Linux
> <clang-built-linux@googlegroups.com> wrote:
> >
> > On Tue, Dec 8, 2020 at 4:15 AM Arnd Bergmann <arnd@kernel.org> wrote:
> > >
> > > - one build seems to take even longer to link. It's currently at 35GB RAM
> > >   usage and 40 minutes into the final link, but I'm worried it might
> > > not complete
> > >   before it runs out of memory.  I only have 128GB installed, and google-chrome
> > >   uses another 30GB of that, and I'm also doing some other builds in parallel.
> > >   Is there a minimum recommended amount of memory for doing LTO builds?
> >
> > When building arm64 defconfig, the maximum memory usage I measured
> > with ThinLTO was 3.5 GB, and with full LTO 20.3 GB. I haven't measured
> > larger configurations, but I believe LLD can easily consume 3-4x that
> > much with full LTO allyesconfig.
>
> Ok, that's not too bad then. Is there actually a reason to still
> support full-lto
> in your series? As I understand it, full LTO was the initial approach and
> used to work better, but thin LTO is actually what we want to use in the
> long run. Perhaps dropping the full LTO option from your series now
> that thin LTO works well enough and uses less resources would help
> avoid some of the problems.

While all developers agree that ThinLTO is a much more palatable
experience than full LTO, our product teams prefer the excessive build
time and memory high water mark (at build time) costs in exchange for
slightly better performance than ThinLTO in <benchmarks that I've been
told are important>.  Keeping support for full LTO in tree would help
our product teams reduce the amount of out of tree code they have.  As
long as <benchmarks that I've been told are important> help
sell/differentiate phones, I suspect our product teams will continue
to ship full LTO in production.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-08 21:09       ` Nick Desaulniers
@ 2020-12-08 22:20         ` Arnd Bergmann
  2020-12-09 16:11           ` Sami Tolvanen
  0 siblings, 1 reply; 48+ messages in thread
From: Arnd Bergmann @ 2020-12-08 22:20 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Sami Tolvanen, Masahiro Yamada, Steven Rostedt, Will Deacon,
	Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, clang-built-linux, Kernel Hardening,
	linux-arch, Linux ARM, Linux Kbuild mailing list, linux-kernel,
	linux-pci

On Tue, Dec 8, 2020 at 10:10 PM 'Nick Desaulniers' via Clang Built
Linux <clang-built-linux@googlegroups.com> wrote:
>
> On Tue, Dec 8, 2020 at 1:00 PM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > On Tue, Dec 8, 2020 at 5:43 PM 'Sami Tolvanen' via Clang Built Linux
> > <clang-built-linux@googlegroups.com> wrote:
> > >
> > > On Tue, Dec 8, 2020 at 4:15 AM Arnd Bergmann <arnd@kernel.org> wrote:
> > > >
> > > > - one build seems to take even longer to link. It's currently at 35GB RAM
> > > >   usage and 40 minutes into the final link, but I'm worried it might
> > > > not complete
> > > >   before it runs out of memory.  I only have 128GB installed, and google-chrome
> > > >   uses another 30GB of that, and I'm also doing some other builds in parallel.
> > > >   Is there a minimum recommended amount of memory for doing LTO builds?
> > >
> > > When building arm64 defconfig, the maximum memory usage I measured
> > > with ThinLTO was 3.5 GB, and with full LTO 20.3 GB. I haven't measured
> > > larger configurations, but I believe LLD can easily consume 3-4x that
> > > much with full LTO allyesconfig.
> >
> > Ok, that's not too bad then. Is there actually a reason to still
> > support full-lto
> > in your series? As I understand it, full LTO was the initial approach and
> > used to work better, but thin LTO is actually what we want to use in the
> > long run. Perhaps dropping the full LTO option from your series now
> > that thin LTO works well enough and uses less resources would help
> > avoid some of the problems.
>
> While all developers agree that ThinLTO is a much more palatable
> experience than full LTO; our product teams prefer the excessive build
> time and memory high water mark (at build time) costs in exchange for
> slightly better performance than ThinLTO in <benchmarks that I've been
> told are important>.  Keeping support for full LTO in tree would help
> our product teams reduce the amount of out of tree code they have.  As
> long as <benchmarks that I've been told are important> help
> sell/differentiate phones, I suspect our product teams will continue
> to ship full LTO in production.

Ok, fair enough. How about marking FULL_LTO as 'depends on
!COMPILE_TEST' then? I'll do that locally for my randconfig tests,
but it would help the other build bots that also force-enable
COMPILE_TEST.

       Arnd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-08 16:43   ` Sami Tolvanen
       [not found]     ` <CAK8P3a1Xfpt7QLkvxjtXKcgzcWkS8g9bmxD687+rqjTafTzKrg@mail.gmail.com>
@ 2020-12-09  4:55     ` Fangrui Song
  2020-12-09  9:19       ` Arnd Bergmann
  1 sibling, 1 reply; 48+ messages in thread
From: Fangrui Song @ 2020-12-09  4:55 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Arnd Bergmann, Masahiro Yamada, Steven Rostedt, Will Deacon,
	Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	Kernel Hardening, linux-arch, Linux ARM,
	Linux Kbuild mailing list, linux-kernel, linux-pci


On 2020-12-08, 'Sami Tolvanen' via Clang Built Linux wrote:
>On Tue, Dec 8, 2020 at 4:15 AM Arnd Bergmann <arnd@kernel.org> wrote:
>>
>> On Tue, Dec 1, 2020 at 10:37 PM 'Sami Tolvanen' via Clang Built Linux
>> <clang-built-linux@googlegroups.com> wrote:
>> >
>> > This patch series adds support for building the kernel with Clang's
>> > Link Time Optimization (LTO). In addition to performance, the primary
>> > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI)
>> > to be used in the kernel. Google has shipped millions of Pixel
>> > devices running three major kernel versions with LTO+CFI since 2018.
>> >
>> > Most of the patches are build system changes for handling LLVM
>> > bitcode, which Clang produces with LTO instead of ELF object files,
>> > postponing ELF processing until a later stage, and ensuring initcall
>> > ordering.
>> >
>> > Note that arm64 support depends on Will's memory ordering patches
>> > [1]. I will post x86_64 patches separately after we have fixed the
>> > remaining objtool warnings [2][3].
>> >
>> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto
>> > [2] https://lore.kernel.org/lkml/20201120040424.a3wctajzft4ufoiw@treble/
>> > [3] https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=objtool-vmlinux
>> >
>> > You can also pull this series from
>> >
>> >   https://github.com/samitolvanen/linux.git lto-v8
>>
>> I've tried pull this into my randconfig test tree to give it a spin.
>
>Great, thank you for testing this!
>
>> So far I have
>> not managed to get a working build out of it, the main problem so far being
>> that it is really slow to build because the link stage only uses one CPU.
>> These are the other issues I've seen so far:

By default, ld.lld ThinLTO uses as many threads as there are physical cores
enabled by the affinity mask.

>You may want to limit your testing only to ThinLTO at first, because
>full LTO is going to be extremely slow with larger configs, especially
>when building arm64 kernels.
>
>> - one build seems to take even longer to link. It's currently at 35GB RAM
>>   usage and 40 minutes into the final link, but I'm worried it might
>> not complete
>>   before it runs out of memory.  I only have 128GB installed, and google-chrome
>>   uses another 30GB of that, and I'm also doing some other builds in parallel.
>>   Is there a minimum recommended amount of memory for doing LTO builds?
>
>When building arm64 defconfig, the maximum memory usage I measured
>with ThinLTO was 3.5 GB, and with full LTO 20.3 GB. I haven't measured
>larger configurations, but I believe LLD can easily consume 3-4x that
>much with full LTO allyesconfig.
>
>> - One build failed with
>>  ld.lld -EL -maarch64elf -mllvm -import-instr-limit=5 -r -o vmlinux.o
>> -T .tmp_initcalls.lds --whole-archive arch/arm64/kernel/head.o
>> init/built-in.a usr/built-in.a arch/arm64/built-in.a kernel/built-in.a
>> certs/built-in.a mm/built-in.a fs/built-in.a ipc/built-in.a
>> security/built-in.a crypto/built-in.a block/built-in.a
>> arch/arm64/lib/built-in.a lib/built-in.a drivers/built-in.a
>> sound/built-in.a net/built-in.a virt/built-in.a --no-whole-archive
>> --start-group arch/arm64/lib/lib.a lib/lib.a
>> ./drivers/firmware/efi/libstub/lib.a --end-group
>>   "ld.lld: error: arch/arm64/kernel/head.o: invalid symbol index"
>>   after about 30 minutes
>
>That's interesting. Did you use LLVM_IAS=1?

It may be worth checking which relocation, or which SHT_GROUP section's
sh_info, in arch/arm64/kernel/head.o is incorrect.

>> - CONFIG_CPU_BIG_ENDIAN doesn't seem to work with lld, and LTO
>>   doesn't work with ld.bfd.
>>   I've added a CPU_LITTLE_ENDIAN dependency to
>>   ARCH_SUPPORTS_LTO_CLANG{,THIN}
>
>Ah, good point. I'll fix this in v9.

Full/Thin LTO should work with GNU ld and gold with LLVMgold.so built from
llvm-project (https://llvm.org/docs/GoldPlugin.html ). You'll need to make sure
that LLVMgold.so is newer than clang. (Newer clang may introduce bitcode
attributes which are unrecognizable by older LLVMgold.so/ld.lld)

>[...]
>> Not sure if these are all known issues. If there is one you'd like me try
>> take a closer look at for finding which config options break it, I can try
>
>No, none of these are known issues. I would be happy to take a closer
>look if you can share configs that reproduce these.
>
>Sami

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v8 00/16] Add support for Clang LTO
       [not found]       ` <CAK8P3a3O65m6Us=YvCP3QA+0kqAeEqfi-DLOJa+JYmBqs8-JcA@mail.gmail.com>
@ 2020-12-09  5:23         ` Fāng-ruì Sòng
  2020-12-09  9:07           ` Arnd Bergmann
  2020-12-09 16:09         ` Sami Tolvanen
  1 sibling, 1 reply; 48+ messages in thread
From: Fāng-ruì Sòng @ 2020-12-09  5:23 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Sami Tolvanen, Masahiro Yamada, Steven Rostedt, Will Deacon,
	Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	Kernel Hardening, linux-arch, Linux ARM,
	Linux Kbuild mailing list, linux-kernel, linux-pci

On Tue, Dec 8, 2020 at 1:02 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Tue, Dec 8, 2020 at 9:59 PM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > Attaching the config for "ld.lld: error: Never resolved function from
> >   blockaddress (Producer: 'LLVM12.0.0' Reader: 'LLVM 12.0.0')"
>
> And here is a new one: "ld.lld: error: assignment to symbol
> init_pg_end does not converge"
>
>       Arnd
>

This is interesting. I changed the symbol assignment to a separate
loop in https://reviews.llvm.org/D66279
Does raising the limit help? Sometimes the kernel linker script can be
rewritten to be more friendly to the linker...

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-09  5:23         ` Fāng-ruì Sòng
@ 2020-12-09  9:07           ` Arnd Bergmann
  0 siblings, 0 replies; 48+ messages in thread
From: Arnd Bergmann @ 2020-12-09  9:07 UTC (permalink / raw)
  To: Fāng-ruì Sòng
  Cc: Sami Tolvanen, Masahiro Yamada, Steven Rostedt, Will Deacon,
	Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	Kernel Hardening, linux-arch, Linux ARM,
	Linux Kbuild mailing list, linux-kernel, linux-pci

On Wed, Dec 9, 2020 at 6:23 AM 'Fāng-ruì Sòng' via Clang Built Linux
<clang-built-linux@googlegroups.com> wrote:
>
> On Tue, Dec 8, 2020 at 1:02 PM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > On Tue, Dec 8, 2020 at 9:59 PM Arnd Bergmann <arnd@kernel.org> wrote:
> > >
> > > Attaching the config for "ld.lld: error: Never resolved function from
> > >   blockaddress (Producer: 'LLVM12.0.0' Reader: 'LLVM 12.0.0')"
> >
> > And here is a new one: "ld.lld: error: assignment to symbol
> > init_pg_end does not converge"
>
> This is interesting. I changed the symbol assignment to a separate
> loop in https://reviews.llvm.org/D66279
> Does raising the limit help? Sometimes the kernel linker script can be
> rewritten to be more friendly to the linker...

If that requires rebuilding lld, testing it is beyond what I can help with
right now. Hopefully someone can reproduce it with my .config.

       Arnd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-09  4:55     ` Fangrui Song
@ 2020-12-09  9:19       ` Arnd Bergmann
  0 siblings, 0 replies; 48+ messages in thread
From: Arnd Bergmann @ 2020-12-09  9:19 UTC (permalink / raw)
  To: Fangrui Song
  Cc: Sami Tolvanen, Masahiro Yamada, Steven Rostedt, Will Deacon,
	Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, Nick Desaulniers, clang-built-linux,
	Kernel Hardening, linux-arch, Linux ARM,
	Linux Kbuild mailing list, linux-kernel, linux-pci

On Wed, Dec 9, 2020 at 5:56 AM 'Fangrui Song' via Clang Built Linux
<clang-built-linux@googlegroups.com> wrote:
> On 2020-12-08, 'Sami Tolvanen' via Clang Built Linux wrote:
> >On Tue, Dec 8, 2020 at 4:15 AM Arnd Bergmann <arnd@kernel.org> wrote:
> >> So far I have
> >> not managed to get a working build out of it, the main problem so far being
> >> that it is really slow to build because the link stage only uses one CPU.
> >> These are the other issues I've seen so far:
>
> ld.lld ThinLTO uses the number of (physical cores enabled by affinity) by default.

Ah, I see.  Do you know if it's also possible to do something like
-flto=jobserver
to integrate better with the kernel build system?

I tend to run multiple builds under a top-level makefile with 'make -j30'
in order to use 30 of the 32 threads and leave the scheduling to the
jobserver instead of the kernel. If the linker itself is multithreaded but
the jobserver thinks it is a single thread, I could end up with 30
concurrent linkers each trying to use 16 cores.
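
To put a number on that worst case, here is the arithmetic as a small sketch.
The figures are the ones from this message (30 make jobs, 16 linker threads,
32 hardware threads); the function names are only illustrative.

```python
def worst_case_linker_threads(make_jobs, threads_per_linker):
    """Threads requested if every make job slot is running a linker."""
    return make_jobs * threads_per_linker


def oversubscription(make_jobs, threads_per_linker, hw_threads):
    """How many runnable threads compete for each hardware thread."""
    return worst_case_linker_threads(make_jobs, threads_per_linker) / hw_threads


if __name__ == "__main__":
    total = worst_case_linker_threads(30, 16)
    factor = oversubscription(30, 16, 32)
    print(f"{total} linker threads on 32 hardware threads ({factor:.1f}x)")
```

Without jobserver integration, capping the per-link thread count (ld.lld has a
--thinlto-jobs option for this, as far as I know) would be one way to bound
that product.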

> >> - CONFIG_CPU_BIG_ENDIAN doesn't seem to work with lld, and LTO
> >>   doesn't work with ld.bfd.
> >>   I've added a CPU_LITTLE_ENDIAN dependency to
> >>   ARCH_SUPPORTS_LTO_CLANG{,THIN}
> >
> >Ah, good point. I'll fix this in v9.
>
> Full/Thin LTO should work with GNU ld and gold with LLVMgold.so built from
> llvm-project (https://llvm.org/docs/GoldPlugin.html ). You'll need to make sure
> that LLVMgold.so is newer than clang. (Newer clang may introduce bitcode
> attributes which are unrecognizable by older LLVMgold.so/ld.lld)

The current patch series requires LLD:

config HAS_LTO_CLANG
       def_bool y
       depends on CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD

Is this something we should change then, or try to keep it simple with the
current approach, leaving LTO disabled for big-endian builds and hosts without
a working lld?
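
For readers unfamiliar with the CLANG_VERSION encoding, the sketch below
mirrors the dependency above in Python. It assumes the usual
major * 10000 + minor * 100 + patchlevel encoding (so 11.0.0 becomes 110000);
the function names are made up for illustration and are not kernel code.

```python
def clang_version_code(major, minor, patch):
    """Encode a Clang version the way CLANG_VERSION is assumed to be encoded."""
    return major * 10000 + minor * 100 + patch


def has_lto_clang(cc_is_clang, clang_version, ld_is_lld):
    """Mirror: CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD."""
    return (cc_is_clang
            and clang_version_code(*clang_version) >= 110000
            and ld_is_lld)


if __name__ == "__main__":
    # Clang 11 with lld satisfies the condition; Clang 11 with ld.bfd does not.
    print(has_lto_clang(True, (11, 0, 0), True))
    print(has_lto_clang(True, (11, 0, 0), False))
```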

       Arnd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-08 12:15 ` Arnd Bergmann
  2020-12-08 13:54   ` Arnd Bergmann
  2020-12-08 16:43   ` Sami Tolvanen
@ 2020-12-09 12:35   ` Arnd Bergmann
  2020-12-09 16:25     ` Sami Tolvanen
  2 siblings, 1 reply; 48+ messages in thread
From: Arnd Bergmann @ 2020-12-09 12:35 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Masahiro Yamada, Steven Rostedt, Will Deacon, Josh Poimboeuf,
	Peter Zijlstra, Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	Nick Desaulniers, clang-built-linux, Kernel Hardening,
	linux-arch, Linux ARM, Linux Kbuild mailing list, linux-kernel,
	linux-pci

On Tue, Dec 8, 2020 at 1:15 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> - one build seems to have dropped all symbols the string operations
> from vmlinux,
>   so while the link goes through, modules cannot be loaded:
>  ERROR: modpost: "memmove" [drivers/media/rc/rc-core.ko] undefined!
>  ERROR: modpost: "memcpy" [net/wireless/cfg80211.ko] undefined!
>  ERROR: modpost: "memcpy" [net/8021q/8021q.ko] undefined!
>  ERROR: modpost: "memset" [net/8021q/8021q.ko] undefined!
>  ERROR: modpost: "memcpy" [net/unix/unix.ko] undefined!
>  ERROR: modpost: "memset" [net/sched/cls_u32.ko] undefined!
>  ERROR: modpost: "memcpy" [net/sched/cls_u32.ko] undefined!
>  ERROR: modpost: "memset" [net/sched/sch_skbprio.ko] undefined!
>  ERROR: modpost: "memcpy" [net/802/garp.ko] undefined!
>  I first thought this was related to a clang-12 bug I saw the other day, but
>  this also happens with clang-11

It seems to happen because of CONFIG_TRIM_UNUSED_KSYMS,
which is a shame, since I think that is an option we'd always want to
have enabled with LTO, to allow more dead code to be eliminated.

       Arnd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v8 00/16] Add support for Clang LTO
       [not found]       ` <CAK8P3a3O65m6Us=YvCP3QA+0kqAeEqfi-DLOJa+JYmBqs8-JcA@mail.gmail.com>
  2020-12-09  5:23         ` Fāng-ruì Sòng
@ 2020-12-09 16:09         ` Sami Tolvanen
  2020-12-09 19:24           ` Arnd Bergmann
  1 sibling, 1 reply; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-09 16:09 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Masahiro Yamada, Steven Rostedt, Will Deacon, Josh Poimboeuf,
	Peter Zijlstra, Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	Nick Desaulniers, clang-built-linux, Kernel Hardening,
	linux-arch, Linux ARM, Linux Kbuild mailing list, linux-kernel,
	linux-pci

On Tue, Dec 8, 2020 at 1:02 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Tue, Dec 8, 2020 at 9:59 PM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > Attaching the config for "ld.lld: error: Never resolved function from
> >   blockaddress (Producer: 'LLVM12.0.0' Reader: 'LLVM 12.0.0')"
>
> And here is a new one: "ld.lld: error: assignment to symbol
> init_pg_end does not converge"

Thanks for these. I can reproduce the "Never resolved function from
blockaddress" issue with full LTO, but I couldn't reproduce this one
with ToT Clang, and the config doesn't have LTO enabled:

$ grep LTO 0x2824F594_defconfig
CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y

Is this the correct config file?

Sami

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-08 22:20         ` Arnd Bergmann
@ 2020-12-09 16:11           ` Sami Tolvanen
  0 siblings, 0 replies; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-09 16:11 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nick Desaulniers, Masahiro Yamada, Steven Rostedt, Will Deacon,
	Josh Poimboeuf, Peter Zijlstra, Greg Kroah-Hartman,
	Paul E. McKenney, Kees Cook, clang-built-linux, Kernel Hardening,
	linux-arch, Linux ARM, Linux Kbuild mailing list, linux-kernel,
	linux-pci

On Tue, Dec 8, 2020 at 2:20 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Tue, Dec 8, 2020 at 10:10 PM 'Nick Desaulniers' via Clang Built
> Linux <clang-built-linux@googlegroups.com> wrote:
> >
> > On Tue, Dec 8, 2020 at 1:00 PM Arnd Bergmann <arnd@kernel.org> wrote:
> > >
> > > On Tue, Dec 8, 2020 at 5:43 PM 'Sami Tolvanen' via Clang Built Linux
> > > <clang-built-linux@googlegroups.com> wrote:
> > > >
> > > > On Tue, Dec 8, 2020 at 4:15 AM Arnd Bergmann <arnd@kernel.org> wrote:
> > > > >
> > > > > - one build seems to take even longer to link. It's currently at 35GB RAM
> > > > >   usage and 40 minutes into the final link, but I'm worried it might
> > > > > not complete
> > > > >   before it runs out of memory.  I only have 128GB installed, and google-chrome
> > > > >   uses another 30GB of that, and I'm also doing some other builds in parallel.
> > > > >   Is there a minimum recommended amount of memory for doing LTO builds?
> > > >
> > > > When building arm64 defconfig, the maximum memory usage I measured
> > > > with ThinLTO was 3.5 GB, and with full LTO 20.3 GB. I haven't measured
> > > > larger configurations, but I believe LLD can easily consume 3-4x that
> > > > much with full LTO allyesconfig.
> > >
> > > Ok, that's not too bad then. Is there actually a reason to still
> > > support full-lto
> > > in your series? As I understand it, full LTO was the initial approach and
> > > used to work better, but thin LTO is actually what we want to use in the
> > > long run. Perhaps dropping the full LTO option from your series now
> > > that thin LTO works well enough and uses less resources would help
> > > avoid some of the problems.
> >
> > While all developers agree that ThinLTO is a much more palatable
> > experience than full LTO; our product teams prefer the excessive build
> > time and memory high water mark (at build time) costs in exchange for
> > slightly better performance than ThinLTO in <benchmarks that I've been
> > told are important>.  Keeping support for full LTO in tree would help
> > our product teams reduce the amount of out of tree code they have.  As
> > long as <benchmarks that I've been told are important> help
> > sell/differentiate phones, I suspect our product teams will continue
> > to ship full LTO in production.
>
> Ok, fair enough. How about marking FULL_LTO as 'depends on
> !COMPILE_TEST' then? I'll do that locally for my randconfig tests,
> but it would help the other build bots that also force-enable
> COMPILE_TEST.

Sure, that sounds reasonable to me. I'll add it in v9.

Sami

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-09 12:35   ` Arnd Bergmann
@ 2020-12-09 16:25     ` Sami Tolvanen
  2020-12-09 17:51       ` Arnd Bergmann
  0 siblings, 1 reply; 48+ messages in thread
From: Sami Tolvanen @ 2020-12-09 16:25 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Masahiro Yamada, Steven Rostedt, Will Deacon, Josh Poimboeuf,
	Peter Zijlstra, Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	Nick Desaulniers, clang-built-linux, Kernel Hardening,
	linux-arch, Linux ARM, Linux Kbuild mailing list, linux-kernel,
	linux-pci

On Wed, Dec 9, 2020 at 4:36 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Tue, Dec 8, 2020 at 1:15 PM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > - one build seems to have dropped all symbols for the string operations
> >   from vmlinux, so while the link goes through, modules cannot be loaded:
> >  ERROR: modpost: "memmove" [drivers/media/rc/rc-core.ko] undefined!
> >  ERROR: modpost: "memcpy" [net/wireless/cfg80211.ko] undefined!
> >  ERROR: modpost: "memcpy" [net/8021q/8021q.ko] undefined!
> >  ERROR: modpost: "memset" [net/8021q/8021q.ko] undefined!
> >  ERROR: modpost: "memcpy" [net/unix/unix.ko] undefined!
> >  ERROR: modpost: "memset" [net/sched/cls_u32.ko] undefined!
> >  ERROR: modpost: "memcpy" [net/sched/cls_u32.ko] undefined!
> >  ERROR: modpost: "memset" [net/sched/sch_skbprio.ko] undefined!
> >  ERROR: modpost: "memcpy" [net/802/garp.ko] undefined!
> >  I first thought this was related to a clang-12 bug I saw the other day, but
> >  this also happens with clang-11
>
> It seems to happen because of CONFIG_TRIM_UNUSED_KSYMS,
> which is a shame, since I think that is an option we'd always want to
> have enabled with LTO, to allow more dead code to be eliminated.

Ah yes, this is a known issue. We use TRIM_UNUSED_KSYMS with LTO in
Android's Generic Kernel Image and the problem is that bitcode doesn't
yet contain calls to these functions, so autoksyms won't see them. The
solution is to use a symbol whitelist with LTO to prevent these from
being trimmed. I suspect we would need a default whitelist for LTO
builds.

Sami

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-09 16:25     ` Sami Tolvanen
@ 2020-12-09 17:51       ` Arnd Bergmann
  0 siblings, 0 replies; 48+ messages in thread
From: Arnd Bergmann @ 2020-12-09 17:51 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Masahiro Yamada, Steven Rostedt, Will Deacon, Josh Poimboeuf,
	Peter Zijlstra, Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	Nick Desaulniers, clang-built-linux, Kernel Hardening,
	linux-arch, Linux ARM, Linux Kbuild mailing list, linux-kernel,
	linux-pci

On Wed, Dec 9, 2020 at 5:25 PM 'Sami Tolvanen' via Clang Built Linux
<clang-built-linux@googlegroups.com> wrote:
>
> On Wed, Dec 9, 2020 at 4:36 AM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > On Tue, Dec 8, 2020 at 1:15 PM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> >
> > It seems to happen because of CONFIG_TRIM_UNUSED_KSYMS,
> > which is a shame, since I think that is an option we'd always want to
> > have enabled with LTO, to allow more dead code to be eliminated.
>
> Ah yes, this is a known issue. We use TRIM_UNUSED_KSYMS with LTO in
> Android's Generic Kernel Image and the problem is that bitcode doesn't
> yet contain calls to these functions, so autoksyms won't see them. The
> solution is to use a symbol whitelist with LTO to prevent these from
> being trimmed. I suspect we would need a default whitelist for LTO
> builds.

A built-in allowlist sounds good to me. FWIW, in the randconfigs so far, I only
saw five symbols that would need to be on it:

memcpy(), memmove(), memset(), __stack_chk_fail() and __stack_chk_guard

       Arnd
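
As a rough sketch of the allowlist idea discussed above, the five
symbols Arnd lists could be kept in a plain text file, one per line,
and handed to the existing CONFIG_UNUSED_KSYMS_WHITELIST option. The
file names and the temporary directory below are hypothetical; the
thread does not say where a default list would live or how it would be
wired into the build.

```shell
#!/bin/sh
# Hypothetical scratch location; a real tree would keep the list in the source.
workdir="$(mktemp -d)"
allowlist="$workdir/lto-ksyms-allowlist.txt"
fragment="$workdir/lto-ksyms.config"

# One symbol per line: the five symbols reported from the randconfig builds.
cat > "$allowlist" <<'EOF'
memcpy
memmove
memset
__stack_chk_fail
__stack_chk_guard
EOF

# Config fragment that keeps trimming enabled but points it at the allowlist.
printf 'CONFIG_TRIM_UNUSED_KSYMS=y\nCONFIG_UNUSED_KSYMS_WHITELIST="%s"\n' \
	"$allowlist" > "$fragment"

# Show the fragment that would be merged into .config.
cat "$fragment"
```

In a kernel tree, a fragment like this could be merged into an existing
.config with scripts/kconfig/merge_config.sh before building.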

* Re: [PATCH v8 00/16] Add support for Clang LTO
  2020-12-09 16:09         ` Sami Tolvanen
@ 2020-12-09 19:24           ` Arnd Bergmann
  0 siblings, 0 replies; 48+ messages in thread
From: Arnd Bergmann @ 2020-12-09 19:24 UTC (permalink / raw)
  To: Sami Tolvanen
  Cc: Masahiro Yamada, Steven Rostedt, Will Deacon, Josh Poimboeuf,
	Peter Zijlstra, Greg Kroah-Hartman, Paul E. McKenney, Kees Cook,
	Nick Desaulniers, clang-built-linux, Kernel Hardening,
	linux-arch, Linux ARM, Linux Kbuild mailing list, linux-kernel,
	linux-pci

On Wed, Dec 9, 2020 at 5:09 PM 'Sami Tolvanen' via Clang Built Linux
<clang-built-linux@googlegroups.com> wrote:
> On Tue, Dec 8, 2020 at 1:02 PM Arnd Bergmann <arnd@kernel.org> wrote:
> > On Tue, Dec 8, 2020 at 9:59 PM Arnd Bergmann <arnd@kernel.org> wrote:
> > >
> > > Attaching the config for "ld.lld: error: Never resolved function from
> > >   blockaddress (Producer: 'LLVM12.0.0' Reader: 'LLVM 12.0.0')"
> >
> > And here is a new one: "ld.lld: error: assignment to symbol
> > init_pg_end does not converge"
>
> Thanks for these. I can reproduce the "Never resolved function from
> blockaddress" issue with full LTO, but I couldn't reproduce this one
> with ToT Clang, and the config doesn't have LTO enabled:
>
> $ grep LTO 0x2824F594_defconfig
> CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y
>
> Is this the correct config file?

It is the right file, and so far this is the only defconfig on which I
see the "does not converge" error, so I don't have any other one.

I suspect this might be an issue in the version of lld that I have here
and unrelated to LTO, and I can confirm that I see the error
with LTO still disabled.

It seems to be completely random. I do see the bug on next-20201203
but not on a later one. I also tried bisecting through linux-next and
arrived at "lib: stackdepot: add support to configure STACK_HASH_SIZE",
which is almost certainly not related, other than just changing a few
symbols around.

      Arnd

Thread overview: 48+ messages
2020-12-01 21:36 [PATCH v8 00/16] Add support for Clang LTO Sami Tolvanen
2020-12-01 21:36 ` [PATCH v8 01/16] tracing: move function tracer options to Kconfig Sami Tolvanen
2020-12-01 21:47   ` Steven Rostedt
2020-12-01 21:36 ` [PATCH v8 02/16] kbuild: add support for Clang LTO Sami Tolvanen
2020-12-02  2:59   ` Masahiro Yamada
2020-12-03  0:07   ` Nick Desaulniers
2020-12-01 21:36 ` [PATCH v8 03/16] kbuild: lto: fix module versioning Sami Tolvanen
2020-12-01 21:36 ` [PATCH v8 04/16] kbuild: lto: limit inlining Sami Tolvanen
2020-12-01 21:36 ` [PATCH v8 05/16] kbuild: lto: merge module sections Sami Tolvanen
2020-12-01 21:36 ` [PATCH v8 06/16] kbuild: lto: remove duplicate dependencies from .mod files Sami Tolvanen
2020-12-01 21:36 ` [PATCH v8 07/16] init: lto: ensure initcall ordering Sami Tolvanen
2020-12-01 21:36 ` [PATCH v8 08/16] init: lto: fix PREL32 relocations Sami Tolvanen
2020-12-01 21:37 ` [PATCH v8 09/16] PCI: Fix PREL32 relocations for LTO Sami Tolvanen
2020-12-01 21:37 ` [PATCH v8 10/16] modpost: lto: strip .lto from module names Sami Tolvanen
2020-12-01 21:37 ` [PATCH v8 11/16] scripts/mod: disable LTO for empty.c Sami Tolvanen
2020-12-01 21:37 ` [PATCH v8 12/16] efi/libstub: disable LTO Sami Tolvanen
2020-12-01 21:37 ` [PATCH v8 13/16] drivers/misc/lkdtm: disable LTO for rodata.o Sami Tolvanen
2020-12-01 21:37 ` [PATCH v8 14/16] arm64: vdso: disable LTO Sami Tolvanen
2020-12-01 21:37 ` [PATCH v8 15/16] arm64: disable recordmcount with DYNAMIC_FTRACE_WITH_REGS Sami Tolvanen
2020-12-01 21:37 ` [PATCH v8 16/16] arm64: allow LTO to be selected Sami Tolvanen
2020-12-03  0:01 ` [PATCH v8 00/16] Add support for Clang LTO Nick Desaulniers
2020-12-03 11:26 ` Will Deacon
2020-12-03 17:07   ` Sami Tolvanen
2020-12-03 18:21     ` Nathan Chancellor
2020-12-03 18:22     ` Will Deacon
2020-12-03 22:32       ` Nick Desaulniers
2020-12-04  9:35         ` Will Deacon
2020-12-04 22:52         ` Sami Tolvanen
2020-12-06  6:50           ` Nathan Chancellor
2020-12-06 20:09             ` Sami Tolvanen
2020-12-08  0:46               ` Nathan Chancellor
2020-12-08 12:15 ` Arnd Bergmann
2020-12-08 13:54   ` Arnd Bergmann
2020-12-08 16:53     ` Sami Tolvanen
2020-12-08 16:56       ` Arnd Bergmann
2020-12-08 16:43   ` Sami Tolvanen
     [not found]     ` <CAK8P3a1Xfpt7QLkvxjtXKcgzcWkS8g9bmxD687+rqjTafTzKrg@mail.gmail.com>
2020-12-08 21:09       ` Nick Desaulniers
2020-12-08 22:20         ` Arnd Bergmann
2020-12-09 16:11           ` Sami Tolvanen
     [not found]       ` <CAK8P3a3O65m6Us=YvCP3QA+0kqAeEqfi-DLOJa+JYmBqs8-JcA@mail.gmail.com>
2020-12-09  5:23         ` Fāng-ruì Sòng
2020-12-09  9:07           ` Arnd Bergmann
2020-12-09 16:09         ` Sami Tolvanen
2020-12-09 19:24           ` Arnd Bergmann
2020-12-09  4:55     ` Fangrui Song
2020-12-09  9:19       ` Arnd Bergmann
2020-12-09 12:35   ` Arnd Bergmann
2020-12-09 16:25     ` Sami Tolvanen
2020-12-09 17:51       ` Arnd Bergmann