This patch series adds support for building x86_64 and arm64 kernels
with Clang's Link Time Optimization (LTO).

In addition to performance, the primary motivation for LTO is
to allow Clang's Control-Flow Integrity (CFI) to be used in the
kernel. Google has shipped millions of Pixel devices running three
major kernel versions with LTO+CFI since 2018.

Most of the patches are build system changes for handling LLVM
bitcode, which Clang produces with LTO instead of ELF object files,
postponing ELF processing until a later stage, and ensuring initcall
ordering.

Note that this version is based on tip/master to reduce the number
of prerequisite patches, and to make it easier to manage changes to
objtool. Patch 1 is from Masahiro's kbuild tree, and while it's not
directly related to LTO, it makes the module linker script changes
cleaner.

Furthermore, patches 2-6 include Peter's patch for generating
__mcount_loc with objtool, and build system changes to enable it on
x86. With these patches, we no longer need to annotate functions
that have non-call references to __fentry__ with LTO, which greatly
simplifies supporting dynamic ftrace.

You can also pull this series from

  https://github.com/samitolvanen/linux.git lto-v5

---
Changes in v5:

  - Rebased on top of tip/master.

  - Changed the command line for objtool to use --vmlinux --duplicate
    to disable warnings about retpoline thunks and to fix .orc_unwind
    generation for vmlinux.o.

  - Added --noinstr flag to objtool, so we can use --vmlinux without
    also enabling noinstr validation.

  - Disabled objtool's unreachable instruction warnings with LTO to
    disable false positives for the int3 padding in vmlinux.o.

  - Added ANNOTATE_RETPOLINE_SAFE annotations to the indirect jumps
    in x86 assembly code to fix objtool warnings with retpoline.

  - Fixed modpost warnings about missing version information with
    CONFIG_MODVERSIONS.

  - Included Makefile.lib into Makefile.modpost for ld_flags. Thanks
    to Sedat for pointing this out.

  - Updated the help text for ThinLTO to better explain the trade-offs.

  - Updated commit messages with better explanations.

Changes in v4:

  - Fixed a typo in Makefile.lib to correctly pass --no-fp to objtool.

  - Moved ftrace configs related to generating __mcount_loc to Kconfig,
    so they are available also in Makefile.modfinal.

  - Dropped two prerequisite patches that were merged to Linus' tree.

Changes in v3:

  - Added a separate patch to remove the unused DISABLE_LTO treewide,
    as filtering out CC_FLAGS_LTO instead is preferred.

  - Updated the Kconfig help to explain why LTO is behind a choice
    and disabled by default.

  - Dropped CC_FLAGS_LTO_CLANG, compiler-specific LTO flags are now
    appended directly to CC_FLAGS_LTO.

  - Updated $(AR) flags as KBUILD_ARFLAGS was removed earlier.

  - Fixed ThinLTO cache handling for external module builds.

  - Rebased on top of Masahiro's patch for preprocessing modules.lds,
    and moved the contents of module-lto.lds to modules.lds.S.

  - Moved objtool_args to Makefile.lib to avoid duplication of the
    command line parameters in Makefile.modfinal.

  - Clarified in the commit message for the initcall ordering patch
    that the initcall order remains the same as without LTO.

  - Changed link-vmlinux.sh to use jobserver-exec to control the
    number of jobs started by generate_initcall_ordering.pl.

  - Dropped the x86/relocs patch to whitelist L4_PAGE_OFFSET as it's
    no longer needed with ToT kernel.

  - Disabled LTO for arch/x86/power/cpu.c to work around a Clang bug
    with stack protector attributes.

Changes in v2:

  - Fixed -Wmissing-prototypes warnings with W=1.

  - Dropped cc-option from -fsplit-lto-unit and added .thinlto-cache
    scrubbing to make distclean.

  - Added a comment about Clang >=11 being required.

  - Added a patch to disable LTO for the arm64 KVM nVHE code.

  - Disabled objtool's noinstr validation with LTO unless enabled.

  - Included Peter's proposed objtool mcount patch in the series
    and replaced recordmcount with the objtool pass to avoid
    whitelisting relocations that are not calls.

  - Updated several commit messages with better explanations.


Masahiro Yamada (1):
  kbuild: preprocess module linker script

Peter Zijlstra (1):
  objtool: Add a pass for generating __mcount_loc

Sami Tolvanen (27):
  objtool: Don't autodetect vmlinux.o
  tracing: move function tracer options to Kconfig
  tracing: add support for objtool mcount
  x86, build: use objtool mcount
  treewide: remove DISABLE_LTO
  kbuild: add support for Clang LTO
  kbuild: lto: fix module versioning
  objtool: Split noinstr validation from --vmlinux
  kbuild: lto: postpone objtool
  kbuild: lto: limit inlining
  kbuild: lto: merge module sections
  kbuild: lto: remove duplicate dependencies from .mod files
  init: lto: ensure initcall ordering
  init: lto: fix PREL32 relocations
  PCI: Fix PREL32 relocations for LTO
  modpost: lto: strip .lto from module names
  scripts/mod: disable LTO for empty.c
  efi/libstub: disable LTO
  drivers/misc/lkdtm: disable LTO for rodata.o
  arm64: vdso: disable LTO
  KVM: arm64: disable LTO for the nVHE directory
  arm64: disable recordmcount with DYNAMIC_FTRACE_WITH_REGS
  arm64: allow LTO_CLANG and THINLTO to be selected
  x86/asm: annotate indirect jumps
  x86, vdso: disable LTO only for vDSO
  x86, cpu: disable LTO for cpu.c
  x86, build: allow LTO_CLANG and THINLTO to be selected

 .gitignore | 1 +
 Makefile | 68 +++--
 arch/Kconfig | 74 +++++
 arch/arm/Makefile | 4 -
 .../module.lds => include/asm/module.lds.h} | 2 +
 arch/arm64/Kconfig | 4 +
 arch/arm64/Makefile | 4 -
 .../module.lds => include/asm/module.lds.h} | 2 +
 arch/arm64/kernel/vdso/Makefile | 4 +-
 arch/arm64/kvm/hyp/nvhe/Makefile | 4 +-
 arch/ia64/Makefile | 1 -
 .../{module.lds => include/asm/module.lds.h} | 0
 arch/m68k/Makefile | 1 -
 .../module.lds => include/asm/module.lds.h} | 0
 arch/powerpc/Makefile | 1 -
 .../module.lds => include/asm/module.lds.h} | 0
 arch/riscv/Makefile | 3 -
 .../module.lds => include/asm/module.lds.h} | 3 +-
 arch/sparc/vdso/Makefile | 2 -
 arch/um/include/asm/Kbuild | 1 +
 arch/x86/Kconfig | 3 +
 arch/x86/Makefile | 5 +
 arch/x86/entry/vdso/Makefile | 5 +-
 arch/x86/kernel/acpi/wakeup_64.S | 2 +
 arch/x86/platform/pvh/head.S | 2 +
 arch/x86/power/Makefile | 4 +
 arch/x86/power/hibernate_asm_64.S | 3 +
 drivers/firmware/efi/libstub/Makefile | 2 +
 drivers/misc/lkdtm/Makefile | 1 +
 include/asm-generic/Kbuild | 1 +
 include/asm-generic/module.lds.h | 10 +
 include/asm-generic/vmlinux.lds.h | 11 +-
 include/linux/init.h | 79 ++++-
 include/linux/pci.h | 19 +-
 kernel/Makefile | 3 -
 kernel/trace/Kconfig | 29 ++
 scripts/.gitignore | 1 +
 scripts/Makefile | 3 +
 scripts/Makefile.build | 69 +++--
 scripts/Makefile.lib | 17 +-
 scripts/Makefile.modfinal | 29 +-
 scripts/Makefile.modpost | 25 +-
 scripts/generate_initcall_order.pl | 270 ++++++++++++++++++
 scripts/link-vmlinux.sh | 98 ++++++-
 scripts/mod/Makefile | 1 +
 scripts/mod/modpost.c | 16 +-
 scripts/mod/modpost.h | 9 +
 scripts/mod/sumversion.c | 6 +-
 scripts/{module-common.lds => module.lds.S} | 31 ++
 scripts/package/builddeb | 2 +-
 tools/objtool/builtin-check.c | 10 +-
 tools/objtool/check.c | 84 +++++-
 tools/objtool/include/objtool/builtin.h | 2 +-
 tools/objtool/include/objtool/check.h | 1 +
 tools/objtool/include/objtool/objtool.h | 1 +
 tools/objtool/objtool.c | 1 +
 56 files changed, 903 insertions(+), 131 deletions(-)
 rename arch/arm/{kernel/module.lds => include/asm/module.lds.h} (72%)
 rename arch/arm64/{kernel/module.lds => include/asm/module.lds.h} (76%)
 rename arch/ia64/{module.lds => include/asm/module.lds.h} (100%)
 rename arch/m68k/{kernel/module.lds => include/asm/module.lds.h} (100%)
 rename arch/powerpc/{kernel/module.lds => include/asm/module.lds.h} (100%)
 rename arch/riscv/{kernel/module.lds => include/asm/module.lds.h} (84%)
 create mode 100644 include/asm-generic/module.lds.h
 create mode 100755 scripts/generate_initcall_order.pl
 rename scripts/{module-common.lds => module.lds.S} (59%)


base-commit: 80396d76da65fc8b82581c0260c25a6aa0a495a3
-- 
2.28.0.1011.ga647a8990f-goog
From: Masahiro Yamada <masahiroy@kernel.org>

There was a request to preprocess the module linker script like we
do for the vmlinux one. (https://lkml.org/lkml/2020/8/21/512)

The difference between vmlinux.lds and module.lds is that the latter
is needed for external module builds, thus must be cleaned up by
'make mrproper' instead of 'make clean'. Also, it must be created
by 'make modules_prepare'.

You cannot put it in arch/$(SRCARCH)/kernel/, which is cleaned up by
'make clean'. I moved arch/$(SRCARCH)/kernel/module.lds to
arch/$(SRCARCH)/include/asm/module.lds.h, which is included from
scripts/module.lds.S.

scripts/module.lds is fine because 'make clean' keeps all the
build artifacts under scripts/.

You can add arch-specific sections in <asm/module.lds.h>.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
---
 Makefile | 10 ++++++----
 arch/arm/Makefile | 4 ----
 .../{kernel/module.lds => include/asm/module.lds.h} | 2 ++
 arch/arm64/Makefile | 4 ----
 .../{kernel/module.lds => include/asm/module.lds.h} | 2 ++
 arch/ia64/Makefile | 1 -
 arch/ia64/{module.lds => include/asm/module.lds.h} | 0
 arch/m68k/Makefile | 1 -
 .../{kernel/module.lds => include/asm/module.lds.h} | 0
 arch/powerpc/Makefile | 1 -
 .../{kernel/module.lds => include/asm/module.lds.h} | 0
 arch/riscv/Makefile | 3 ---
 .../{kernel/module.lds => include/asm/module.lds.h} | 3 ++-
 arch/um/include/asm/Kbuild | 1 +
 include/asm-generic/Kbuild | 1 +
 include/asm-generic/module.lds.h | 10 ++++++++++
 scripts/.gitignore | 1 +
 scripts/Makefile | 3 +++
 scripts/Makefile.modfinal | 5 ++---
 scripts/{module-common.lds => module.lds.S} | 3 +++
 scripts/package/builddeb | 2 +-
 21 files changed, 34 insertions(+), 23 deletions(-)
 rename arch/arm/{kernel/module.lds => include/asm/module.lds.h} (72%)
 rename arch/arm64/{kernel/module.lds => include/asm/module.lds.h} (76%)
 rename arch/ia64/{module.lds => include/asm/module.lds.h} (100%)
 rename arch/m68k/{kernel/module.lds => include/asm/module.lds.h} (100%)
 rename arch/powerpc/{kernel/module.lds => include/asm/module.lds.h} (100%)
 rename arch/riscv/{kernel/module.lds => include/asm/module.lds.h} (84%)
 create mode 100644 include/asm-generic/module.lds.h
 rename scripts/{module-common.lds => module.lds.S} (93%)

diff --git a/Makefile b/Makefile
index f84d7e4ca0be..a913a6829754 100644
--- a/Makefile
+++ b/Makefile
@@ -505,7 +505,6 @@ KBUILD_CFLAGS_KERNEL :=
 KBUILD_AFLAGS_MODULE := -DMODULE
 KBUILD_CFLAGS_MODULE := -DMODULE
 KBUILD_LDFLAGS_MODULE :=
-export KBUILD_LDS_MODULE := $(srctree)/scripts/module-common.lds
 KBUILD_LDFLAGS :=
 CLANG_FLAGS :=
@@ -1384,7 +1383,7 @@ endif
 # using awk while concatenating to the final file.
 PHONY += modules
-modules: $(if $(KBUILD_BUILTIN),vmlinux) modules_check
+modules: $(if $(KBUILD_BUILTIN),vmlinux) modules_check modules_prepare
 	$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modpost
 PHONY += modules_check
@@ -1401,6 +1400,7 @@ targets += modules.order
 # Target to prepare building external modules
 PHONY += modules_prepare
 modules_prepare: prepare
+	$(Q)$(MAKE) $(build)=scripts scripts/module.lds
 # Target to install modules
 PHONY += modules_install
@@ -1722,7 +1722,9 @@ help:
 	@echo ' clean - remove generated files in module directory only'
 	@echo ''
-PHONY += prepare
+# no-op for external module builds
+PHONY += prepare modules_prepare
+
 endif # KBUILD_EXTMOD
 # Single targets
@@ -1755,7 +1757,7 @@ MODORDER := .modules.tmp
 endif
 PHONY += single_modpost
-single_modpost: $(single-no-ko)
+single_modpost: $(single-no-ko) modules_prepare
 	$(Q){ $(foreach m, $(single-ko), echo $(extmod-prefix)$m;) } > $(MODORDER)
 	$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modpost
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index e589da3c8949..9c4d19ae7d61 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -20,10 +20,6 @@ endif
 # linker. All sections should be explicitly named in the linker script.
 LDFLAGS_vmlinux += $(call ld-option, --orphan-handling=warn)
-ifeq ($(CONFIG_ARM_MODULE_PLTS),y)
-KBUILD_LDS_MODULE += $(srctree)/arch/arm/kernel/module.lds
-endif
-
 GZFLAGS :=-9
 #KBUILD_CFLAGS +=-pipe
diff --git a/arch/arm/kernel/module.lds b/arch/arm/include/asm/module.lds.h
similarity index 72%
rename from arch/arm/kernel/module.lds
rename to arch/arm/include/asm/module.lds.h
index 79cb6af565e5..0e7cb4e314b4 100644
--- a/arch/arm/kernel/module.lds
+++ b/arch/arm/include/asm/module.lds.h
@@ -1,5 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
+#ifdef CONFIG_ARM_MODULE_PLTS
 SECTIONS {
 	.plt : { BYTE(0) }
 	.init.plt : { BYTE(0) }
 }
+#endif
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index aff994473eb2..32d7e07b0acf 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -127,10 +127,6 @@ endif
 CHECKFLAGS += -D__aarch64__
-ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
-KBUILD_LDS_MODULE += $(srctree)/arch/arm64/kernel/module.lds
-endif
-
 ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_REGS),y)
   KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
   CC_FLAGS_FTRACE := -fpatchable-function-entry=2
diff --git a/arch/arm64/kernel/module.lds b/arch/arm64/include/asm/module.lds.h
similarity index 76%
rename from arch/arm64/kernel/module.lds
rename to arch/arm64/include/asm/module.lds.h
index 22e36a21c113..691f15af788e 100644
--- a/arch/arm64/kernel/module.lds
+++ b/arch/arm64/include/asm/module.lds.h
@@ -1,5 +1,7 @@
+#ifdef CONFIG_ARM64_MODULE_PLTS
 SECTIONS {
 	.plt (NOLOAD) : { BYTE(0) }
 	.init.plt (NOLOAD) : { BYTE(0) }
 	.text.ftrace_trampoline (NOLOAD) : { BYTE(0) }
 }
+#endif
diff --git a/arch/ia64/Makefile b/arch/ia64/Makefile
index 2876a7df1b0a..703b1c4f6d12 100644
--- a/arch/ia64/Makefile
+++ b/arch/ia64/Makefile
@@ -20,7 +20,6 @@ CHECKFLAGS += -D__ia64=1 -D__ia64__=1 -D_LP64 -D__LP64__
 OBJCOPYFLAGS := --strip-all
 LDFLAGS_vmlinux := -static
-KBUILD_LDS_MODULE += $(srctree)/arch/ia64/module.lds
 KBUILD_AFLAGS_KERNEL := -mconstant-gp
 EXTRA :=
diff --git a/arch/ia64/module.lds b/arch/ia64/include/asm/module.lds.h
similarity index 100%
rename from arch/ia64/module.lds
rename to arch/ia64/include/asm/module.lds.h
diff --git a/arch/m68k/Makefile b/arch/m68k/Makefile
index 4438ffb4bbe1..ea14f2046fb4 100644
--- a/arch/m68k/Makefile
+++ b/arch/m68k/Makefile
@@ -75,7 +75,6 @@ KBUILD_CPPFLAGS += -D__uClinux__
 endif
 KBUILD_LDFLAGS := -m m68kelf
-KBUILD_LDS_MODULE += $(srctree)/arch/m68k/kernel/module.lds
 ifdef CONFIG_SUN3
 LDFLAGS_vmlinux = -N
diff --git a/arch/m68k/kernel/module.lds b/arch/m68k/include/asm/module.lds.h
similarity index 100%
rename from arch/m68k/kernel/module.lds
rename to arch/m68k/include/asm/module.lds.h
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 3e8da9cf2eb9..8935658fcd06 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -65,7 +65,6 @@ UTS_MACHINE := $(subst $(space),,$(machine-y))
 ifdef CONFIG_PPC32
 KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
 else
-KBUILD_LDS_MODULE += $(srctree)/arch/powerpc/kernel/module.lds
 ifeq ($(call ld-ifversion, -ge, 225000000, y),y)
 # Have the linker provide sfpr if possible.
 # There is a corresponding test in arch/powerpc/lib/Makefile
diff --git a/arch/powerpc/kernel/module.lds b/arch/powerpc/include/asm/module.lds.h
similarity index 100%
rename from arch/powerpc/kernel/module.lds
rename to arch/powerpc/include/asm/module.lds.h
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index fb6e37db836d..8edaa8bd86d6 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -53,9 +53,6 @@ endif
 ifeq ($(CONFIG_CMODEL_MEDANY),y)
 	KBUILD_CFLAGS += -mcmodel=medany
 endif
-ifeq ($(CONFIG_MODULE_SECTIONS),y)
-	KBUILD_LDS_MODULE += $(srctree)/arch/riscv/kernel/module.lds
-endif
 ifeq ($(CONFIG_PERF_EVENTS),y)
 	KBUILD_CFLAGS += -fno-omit-frame-pointer
 endif
diff --git a/arch/riscv/kernel/module.lds b/arch/riscv/include/asm/module.lds.h
similarity index 84%
rename from arch/riscv/kernel/module.lds
rename to arch/riscv/include/asm/module.lds.h
index 295ecfb341a2..4254ff2ff049 100644
--- a/arch/riscv/kernel/module.lds
+++ b/arch/riscv/include/asm/module.lds.h
@@ -1,8 +1,9 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /* Copyright (C) 2017 Andes Technology Corporation */
-
+#ifdef CONFIG_MODULE_SECTIONS
 SECTIONS {
 	.plt (NOLOAD) : { BYTE(0) }
 	.got (NOLOAD) : { BYTE(0) }
 	.got.plt (NOLOAD) : { BYTE(0) }
 }
+#endif
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index 8d435f8a6dec..1c63b260ecc4 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += kdebug.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += mmiowb.h
+generic-y += module.lds.h
 generic-y += param.h
 generic-y += pci.h
 generic-y += percpu.h
diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild
index 74b0612601dd..7cd4e627e00e 100644
--- a/include/asm-generic/Kbuild
+++ b/include/asm-generic/Kbuild
@@ -40,6 +40,7 @@ mandatory-y += mmiowb.h
 mandatory-y += mmu.h
 mandatory-y += mmu_context.h
 mandatory-y += module.h
+mandatory-y += module.lds.h
 mandatory-y += msi.h
 mandatory-y += pci.h
 mandatory-y += percpu.h
diff --git a/include/asm-generic/module.lds.h b/include/asm-generic/module.lds.h
new file mode 100644
index 000000000000..f210d5c1b78b
--- /dev/null
+++ b/include/asm-generic/module.lds.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_MODULE_LDS_H
+#define __ASM_GENERIC_MODULE_LDS_H
+
+/*
+ * <asm/module.lds.h> can specify arch-specific sections for linking modules.
+ * Empty for the asm-generic header.
+ */
+
+#endif /* __ASM_GENERIC_MODULE_LDS_H */
diff --git a/scripts/.gitignore b/scripts/.gitignore
index 0d1c8e217cd7..a6c11316c969 100644
--- a/scripts/.gitignore
+++ b/scripts/.gitignore
@@ -8,3 +8,4 @@ asn1_compiler
 extract-cert
 sign-file
 insert-sys-cert
+/module.lds
diff --git a/scripts/Makefile b/scripts/Makefile
index bc018e4b733e..b5418ec587fb 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -29,6 +29,9 @@ endif
 # The following programs are only built on demand
 hostprogs += unifdef
+# The module linker script is preprocessed on demand
+targets += module.lds
+
 subdir-$(CONFIG_GCC_PLUGINS) += gcc-plugins
 subdir-$(CONFIG_MODVERSIONS) += genksyms
 subdir-$(CONFIG_SECURITY_SELINUX) += selinux
diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index 411c1e600e7d..ae01baf96f4e 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -33,11 +33,10 @@ quiet_cmd_ld_ko_o = LD [M] $@
 cmd_ld_ko_o = \
 	$(LD) -r $(KBUILD_LDFLAGS) \
 		$(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE) \
-		$(addprefix -T , $(KBUILD_LDS_MODULE)) \
-		-o $@ $(filter %.o, $^); \
+		-T scripts/module.lds -o $@ $(filter %.o, $^); \
 	$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
-$(modules): %.ko: %.o %.mod.o $(KBUILD_LDS_MODULE) FORCE
+$(modules): %.ko: %.o %.mod.o scripts/module.lds FORCE
 	+$(call if_changed,ld_ko_o)
 targets += $(modules) $(modules:.ko=.mod.o)
diff --git a/scripts/module-common.lds b/scripts/module.lds.S
similarity index 93%
rename from scripts/module-common.lds
rename to scripts/module.lds.S
index d61b9e8678e8..69b9b71a6a47 100644
--- a/scripts/module-common.lds
+++ b/scripts/module.lds.S
@@ -24,3 +24,6 @@ SECTIONS {
 	__jump_table 0 : ALIGN(8) { KEEP(*(__jump_table)) }
 }
+
+/* bring in arch-specific sections */
+#include <asm/module.lds.h>
diff --git a/scripts/package/builddeb b/scripts/package/builddeb
index 6df3c9f8b2da..44f212e37935 100755
--- a/scripts/package/builddeb
+++ b/scripts/package/builddeb
@@ -55,7 +55,7 @@ deploy_kernel_headers () {
 		cd $srctree
 		find . arch/$SRCARCH -maxdepth 1 -name Makefile\*
 		find include scripts -type f -o -type l
-		find arch/$SRCARCH -name module.lds -o -name Kbuild.platforms -o -name Platform
+		find arch/$SRCARCH -name Kbuild.platforms -o -name Platform
 		find $(find arch/$SRCARCH -name include -o -name scripts -type d) -type f
 	) > debian/hdrsrcfiles
-- 
2.28.0.1011.ga647a8990f-goog
From: Peter Zijlstra <peterz@infradead.org>

Add the --mcount option for generating __mcount_loc sections
needed for dynamic ftrace. Using this pass requires the kernel to
be compiled with -mfentry and CC_USING_NOP_MCOUNT to be defined
in Makefile.

Link: https://lore.kernel.org/lkml/20200625200235.GQ4781@hirez.programming.kicks-ass.net/
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
[Sami: rebased, dropped config changes, fixed to actually use --mcount,
       and wrote a commit message.]
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 tools/objtool/builtin-check.c | 3 +-
 tools/objtool/check.c | 82 +++++++++++++++++++++++++
 tools/objtool/include/objtool/builtin.h | 2 +-
 tools/objtool/include/objtool/check.h | 1 +
 tools/objtool/include/objtool/objtool.h | 1 +
 tools/objtool/objtool.c | 1 +
 6 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/tools/objtool/builtin-check.c b/tools/objtool/builtin-check.c
index f47951e19c9d..6518c1a6ad1e 100644
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -18,7 +18,7 @@
 #include <objtool/builtin.h>
 #include <objtool/objtool.h>
-bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats, validate_dup, vmlinux;
+bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats, validate_dup, vmlinux, mcount;
 static const char * const check_usage[] = {
 	"objtool check [<options>] file.o",
@@ -35,6 +35,7 @@ const struct option check_options[] = {
 	OPT_BOOLEAN('s', "stats", &stats, "print statistics"),
 	OPT_BOOLEAN('d', "duplicate", &validate_dup, "duplicate validation for vmlinux.o"),
 	OPT_BOOLEAN('l', "vmlinux", &vmlinux, "vmlinux.o validation"),
+	OPT_BOOLEAN('M', "mcount", &mcount, "generate __mcount_loc"),
 	OPT_END(),
 };
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 32e6a0db6768..61dcd80feec5 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -524,6 +524,65 @@ static int create_static_call_sections(struct objtool_file *file)
 	return 0;
 }
+static int create_mcount_loc_sections(struct objtool_file *file)
+{
+	struct section *sec, *reloc_sec;
+	struct reloc *reloc;
+	unsigned long *loc;
+	struct instruction *insn;
+	int idx;
+
+	sec = find_section_by_name(file->elf, "__mcount_loc");
+	if (sec) {
+		INIT_LIST_HEAD(&file->mcount_loc_list);
+		WARN("file already has __mcount_loc section, skipping");
+		return 0;
+	}
+
+	if (list_empty(&file->mcount_loc_list))
+		return 0;
+
+	idx = 0;
+	list_for_each_entry(insn, &file->mcount_loc_list, mcount_loc_node)
+		idx++;
+
+	sec = elf_create_section(file->elf, "__mcount_loc", 0, sizeof(unsigned long), idx);
+	if (!sec)
+		return -1;
+
+	reloc_sec = elf_create_reloc_section(file->elf, sec, SHT_RELA);
+	if (!reloc_sec)
+		return -1;
+
+	idx = 0;
+	list_for_each_entry(insn, &file->mcount_loc_list, mcount_loc_node) {
+
+		loc = (unsigned long *)sec->data->d_buf + idx;
+		memset(loc, 0, sizeof(unsigned long));
+
+		reloc = malloc(sizeof(*reloc));
+		if (!reloc) {
+			perror("malloc");
+			return -1;
+		}
+		memset(reloc, 0, sizeof(*reloc));
+
+		reloc->sym = insn->sec->sym;
+		reloc->addend = insn->offset;
+		reloc->type = R_X86_64_64;
+		reloc->offset = idx * sizeof(unsigned long);
+		reloc->sec = reloc_sec;
+		elf_add_reloc(file->elf, reloc);
+
+		idx++;
+	}
+
+	if (elf_rebuild_reloc_section(file->elf, reloc_sec))
+		return -1;
+
+	return 0;
+}
+
 /*
  * Warnings shouldn't be reported for ignored functions.
  */
@@ -950,6 +1009,22 @@ static int add_call_destinations(struct objtool_file *file)
 			insn->type = INSN_NOP;
 		}
+		if (mcount && !strcmp(insn->call_dest->name, "__fentry__")) {
+			if (reloc) {
+				reloc->type = R_NONE;
+				elf_write_reloc(file->elf, reloc);
+			}
+
+			elf_write_insn(file->elf, insn->sec,
+				       insn->offset, insn->len,
+				       arch_nop_insn(insn->len));
+
+			insn->type = INSN_NOP;
+
+			list_add_tail(&insn->mcount_loc_node,
+				      &file->mcount_loc_list);
+		}
+
 		/*
 		 * Whatever stack impact regular CALLs have, should be undone
 		 * by the RETURN of the called function.
@@ -2921,6 +2996,13 @@ int check(struct objtool_file *file)
 		goto out;
 	warnings += ret;
+	if (mcount) {
+		ret = create_mcount_loc_sections(file);
+		if (ret < 0)
+			goto out;
+		warnings += ret;
+	}
+
 out:
 	if (ret < 0) {
 		/*
diff --git a/tools/objtool/include/objtool/builtin.h b/tools/objtool/include/objtool/builtin.h
index 85c979caa367..94565a72b701 100644
--- a/tools/objtool/include/objtool/builtin.h
+++ b/tools/objtool/include/objtool/builtin.h
@@ -8,7 +8,7 @@
 #include <subcmd/parse-options.h>
 extern const struct option check_options[];
-extern bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats, validate_dup, vmlinux;
+extern bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats, validate_dup, vmlinux, mcount;
 extern int cmd_check(int argc, const char **argv);
 extern int cmd_orc(int argc, const char **argv);
diff --git a/tools/objtool/include/objtool/check.h b/tools/objtool/include/objtool/check.h
index bba10968eac0..f04415852c29 100644
--- a/tools/objtool/include/objtool/check.h
+++ b/tools/objtool/include/objtool/check.h
@@ -23,6 +23,7 @@ struct instruction {
 	struct list_head list;
 	struct hlist_node hash;
 	struct list_head static_call_node;
+	struct list_head mcount_loc_node;
 	struct section *sec;
 	unsigned long offset;
 	unsigned int len;
diff --git a/tools/objtool/include/objtool/objtool.h b/tools/objtool/include/objtool/objtool.h
index 32f4cd1da9fa..3c899e0ab861 100644
--- a/tools/objtool/include/objtool/objtool.h
+++ b/tools/objtool/include/objtool/objtool.h
@@ -19,6 +19,7 @@ struct objtool_file {
 	struct list_head insn_list;
 	DECLARE_HASHTABLE(insn_hash, 20);
 	struct list_head static_call_list;
+	struct list_head mcount_loc_list;
 	bool ignore_unreachables, c_file, hints, rodata;
 };
diff --git a/tools/objtool/objtool.c b/tools/objtool/objtool.c
index e848feb0a5fc..7b97ce499405 100644
--- a/tools/objtool/objtool.c
+++ b/tools/objtool/objtool.c
@@ -62,6 +62,7 @@ struct objtool_file *objtool_open_read(const char *_objname)
 	INIT_LIST_HEAD(&file.insn_list);
 	hash_init(file.insn_hash);
 	INIT_LIST_HEAD(&file.static_call_list);
+	INIT_LIST_HEAD(&file.mcount_loc_list);
 	file.c_file = !vmlinux && find_section_by_name(file.elf, ".comment");
 	file.ignore_unreachables = no_unreachable;
 	file.hints = false;
-- 
2.28.0.1011.ga647a8990f-goog
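For readers following the --mcount pass in the patch above, the core idea can be modelled in a few lines of standalone C. This is only a toy sketch, not objtool code: struct toy_insn and toy_collect_mcount_locs() are hypothetical names made up for illustration, and the real pass rewrites ELF instructions and emits relocations rather than filling a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for objtool's struct instruction. */
struct toy_insn {
	unsigned long offset;	/* offset of the instruction in its section */
	const char *call_dest;	/* name of the call target, NULL if not a call */
	int is_nop;		/* set once the call has been turned into a NOP */
};

/*
 * Walk the instructions, turn every call to __fentry__ into a NOP and
 * record its offset in locs[], in instruction order. Returns the number
 * of entries written, which is capped at max_locs.
 */
static size_t toy_collect_mcount_locs(struct toy_insn *insns, size_t n,
				      unsigned long *locs, size_t max_locs)
{
	size_t count = 0;

	assert(insns && locs);

	for (size_t i = 0; i < n; i++) {
		if (!insns[i].call_dest ||
		    strcmp(insns[i].call_dest, "__fentry__"))
			continue;
		if (count == max_locs)
			break;

		insns[i].is_nop = 1;
		locs[count++] = insns[i].offset;
	}

	return count;
}
```

In the real patch the recorded locations become R_X86_64_64 relocations against the section symbol, with the instruction offset as the addend, so the table stays valid after the final link.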
With LTO, we run objtool on vmlinux.o, but don't want noinstr
validation. This change requires --vmlinux to be passed to objtool
explicitly.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 scripts/link-vmlinux.sh | 2 +-
 tools/objtool/builtin-check.c | 6 +-----
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index e6e2d9e5ff48..372c3719f94c 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -64,7 +64,7 @@ objtool_link()
 	local objtoolopt;
 	if [ -n "${CONFIG_VMLINUX_VALIDATION}" ]; then
-		objtoolopt="check"
+		objtoolopt="check --vmlinux"
 		if [ -z "${CONFIG_FRAME_POINTER}" ]; then
 			objtoolopt="${objtoolopt} --no-fp"
 		fi
diff --git a/tools/objtool/builtin-check.c b/tools/objtool/builtin-check.c
index 6518c1a6ad1e..ff4d7f5c0e80 100644
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -41,7 +41,7 @@ const struct option check_options[] = {
 int cmd_check(int argc, const char **argv)
 {
-	const char *objname, *s;
+	const char *objname;
 	struct objtool_file *file;
 	int ret;
@@ -52,10 +52,6 @@ int cmd_check(int argc, const char **argv)
 	objname = argv[0];
-	s = strstr(objname, "vmlinux.o");
-	if (s && !s[9])
-		vmlinux = true;
-
 	file = objtool_open_read(objname);
 	if (!file)
 		return 1;
-- 
2.28.0.1011.ga647a8990f-goog
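To make the removed behaviour concrete, here is a small standalone sketch that mirrors the name check this patch deletes from cmd_check(). The wrapper name toy_autodetect_vmlinux() is hypothetical. Only the strstr() test comes from the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Mirrors the check removed from cmd_check(): true only when the first
 * occurrence of "vmlinux.o" in objname is also the end of the string
 * (the literal is 9 characters long, so s[9] must be the terminator).
 */
static bool toy_autodetect_vmlinux(const char *objname)
{
	const char *s;

	assert(objname);

	s = strstr(objname, "vmlinux.o");
	return s && !s[9];
}
```

With the patch applied, the file name no longer has any effect: vmlinux mode is enabled only when --vmlinux is passed, which is why link-vmlinux.sh now adds the flag explicitly.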
Move function tracer options to Kconfig to make it easier to add
new methods for generating __mcount_loc, and to make the options
available also when building kernel modules.

Note that FTRACE_MCOUNT_USE_* options are updated on rebuild and
therefore work even if the .config was generated in a different
environment.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 Makefile | 20 ++++++++------------
 kernel/trace/Kconfig | 16 ++++++++++++++++
 scripts/Makefile.build | 6 ++----
 3 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/Makefile b/Makefile
index a913a6829754..434da9fcbf3c 100644
--- a/Makefile
+++ b/Makefile
@@ -841,12 +841,8 @@ KBUILD_CFLAGS += $(DEBUG_CFLAGS)
 export DEBUG_CFLAGS
 ifdef CONFIG_FUNCTION_TRACER
-ifdef CONFIG_FTRACE_MCOUNT_RECORD
-  # gcc 5 supports generating the mcount tables directly
-  ifeq ($(call cc-option-yn,-mrecord-mcount),y)
-    CC_FLAGS_FTRACE += -mrecord-mcount
-    export CC_USING_RECORD_MCOUNT := 1
-  endif
+ifdef CONFIG_FTRACE_MCOUNT_USE_CC
+  CC_FLAGS_FTRACE += -mrecord-mcount
   ifdef CONFIG_HAVE_NOP_MCOUNT
     ifeq ($(call cc-option-yn, -mnop-mcount),y)
       CC_FLAGS_FTRACE += -mnop-mcount
@@ -854,6 +850,12 @@ ifdef CONFIG_FTRACE_MCOUNT_RECORD
     endif
   endif
 endif
+ifdef CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT
+  ifdef CONFIG_HAVE_C_RECORDMCOUNT
+    BUILD_C_RECORDMCOUNT := y
+    export BUILD_C_RECORDMCOUNT
+  endif
+endif
 ifdef CONFIG_HAVE_FENTRY
   ifeq ($(call cc-option-yn, -mfentry),y)
     CC_FLAGS_FTRACE += -mfentry
@@ -863,12 +865,6 @@ endif
 export CC_FLAGS_FTRACE
 KBUILD_CFLAGS += $(CC_FLAGS_FTRACE) $(CC_FLAGS_USING)
 KBUILD_AFLAGS += $(CC_FLAGS_USING)
-ifdef CONFIG_DYNAMIC_FTRACE
-  ifdef CONFIG_HAVE_C_RECORDMCOUNT
-    BUILD_C_RECORDMCOUNT := y
-    export BUILD_C_RECORDMCOUNT
-  endif
-endif
 endif
 # We trigger additional mismatches with less inlining
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index a4020c0b4508..927ad004888a 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -595,6 +595,22 @@ config FTRACE_MCOUNT_RECORD
 	depends on DYNAMIC_FTRACE
 	depends on HAVE_FTRACE_MCOUNT_RECORD
+config FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
+	bool
+	depends on FTRACE_MCOUNT_RECORD
+
+config FTRACE_MCOUNT_USE_CC
+	def_bool y
+	depends on $(cc-option,-mrecord-mcount)
+	depends on !FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
+	depends on FTRACE_MCOUNT_RECORD
+
+config FTRACE_MCOUNT_USE_RECORDMCOUNT
+	def_bool y
+	depends on !FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
+	depends on !FTRACE_MCOUNT_USE_CC
+	depends on FTRACE_MCOUNT_RECORD
+
 config TRACING_MAP
 	bool
 	depends on ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index a467b9323442..a4634aae1506 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -178,8 +178,7 @@ cmd_modversions_c = \
 	fi
 endif
-ifdef CONFIG_FTRACE_MCOUNT_RECORD
-ifndef CC_USING_RECORD_MCOUNT
+ifdef CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT
 # compiler will not generate __mcount_loc use recordmcount or recordmcount.pl
 ifdef BUILD_C_RECORDMCOUNT
 ifeq ("$(origin RECORDMCOUNT_WARN)", "command line")
@@ -206,8 +205,7 @@ recordmcount_source := $(srctree)/scripts/recordmcount.pl
 endif # BUILD_C_RECORDMCOUNT
 cmd_record_mcount = $(if $(findstring $(strip $(CC_FLAGS_FTRACE)),$(_c_flags)), \
 	$(sub_cmd_record_mcount))
-endif # CC_USING_RECORD_MCOUNT
-endif # CONFIG_FTRACE_MCOUNT_RECORD
+endif # CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT
 ifdef CONFIG_STACK_VALIDATION
 ifneq ($(SKIP_STACK_VALIDATION),1)
-- 
2.28.0.1011.ga647a8990f-goog
This change adds build support for using objtool to generate __mcount_loc sections. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> --- Makefile | 12 ++++++++++-- kernel/trace/Kconfig | 13 +++++++++++++ 2 files changed, 23 insertions(+), 2 deletions(-) diff --git a/Makefile b/Makefile index 434da9fcbf3c..fb2cf557d3ca 100644 --- a/Makefile +++ b/Makefile @@ -850,6 +850,9 @@ ifdef CONFIG_FTRACE_MCOUNT_USE_CC endif endif endif +ifdef CONFIG_FTRACE_MCOUNT_USE_OBJTOOL + CC_FLAGS_USING += -DCC_USING_NOP_MCOUNT +endif ifdef CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT ifdef CONFIG_HAVE_C_RECORDMCOUNT BUILD_C_RECORDMCOUNT := y @@ -1209,11 +1212,16 @@ uapi-asm-generic: PHONY += prepare-objtool prepare-resolve_btfids prepare-objtool: $(objtool_target) ifeq ($(SKIP_STACK_VALIDATION),1) +objtool-lib-prompt := "please install libelf-dev, libelf-devel or elfutils-libelf-devel" +ifdef CONFIG_FTRACE_MCOUNT_USE_OBJTOOL + @echo "error: Cannot generate __mcount_loc for CONFIG_DYNAMIC_FTRACE=y, $(objtool-lib-prompt)" >&2 + @false +endif ifdef CONFIG_UNWINDER_ORC - @echo "error: Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel" >&2 + @echo "error: Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y, $(objtool-lib-prompt)" >&2 @false else - @echo "warning: Cannot use CONFIG_STACK_VALIDATION=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel" >&2 + @echo "warning: Cannot use CONFIG_STACK_VALIDATION=y, $(objtool-lib-prompt)" >&2 endif endif diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 927ad004888a..89263210ab26 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -51,6 +51,11 @@ config HAVE_NOP_MCOUNT help Arch supports the gcc options -pg with -mrecord-mcount and -nop-mcount +config HAVE_OBJTOOL_MCOUNT + bool + help + Arch supports objtool --mcount + config HAVE_C_RECORDMCOUNT bool help @@ -605,10 +610,18 @@ config FTRACE_MCOUNT_USE_CC depends on 
!FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY depends on FTRACE_MCOUNT_RECORD +config FTRACE_MCOUNT_USE_OBJTOOL + def_bool y + depends on HAVE_OBJTOOL_MCOUNT + depends on !FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY + depends on !FTRACE_MCOUNT_USE_CC + depends on FTRACE_MCOUNT_RECORD + config FTRACE_MCOUNT_USE_RECORDMCOUNT def_bool y depends on !FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY depends on !FTRACE_MCOUNT_USE_CC + depends on !FTRACE_MCOUNT_USE_OBJTOOL depends on FTRACE_MCOUNT_RECORD config TRACING_MAP -- 2.28.0.1011.ga647a8990f-goog
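The Kconfig dependencies above form a precedence chain for producing __mcount_loc: patchable function entries first, then the compiler's -mrecord-mcount, then objtool, and recordmcount as the fallback. A minimal sketch of that chain in shell follows. The function name mcount_method and its argument order are hypothetical and exist only for illustration; the real decision is made by Kconfig.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: mirrors the Kconfig precedence.
# Arguments are "y" or "" for, in order:
#   $1 FTRACE_MCOUNT_RECORD
#   $2 FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
#   $3 compiler accepts -mrecord-mcount
#   $4 HAVE_OBJTOOL_MCOUNT
mcount_method() {
	record=$1
	patchable=$2
	cc_record=$3
	have_objtool=$4

	if [ "$record" != y ]; then
		echo none
	elif [ "$patchable" = y ]; then
		echo patchable-function-entry
	elif [ "$cc_record" = y ]; then
		echo cc
	elif [ "$have_objtool" = y ]; then
		echo objtool
	else
		echo recordmcount
	fi
}

# Compiler without -mrecord-mcount on an arch that selects HAVE_OBJTOOL_MCOUNT.
mcount_method y "" "" y
```

Because each option depends on the negation of the ones before it, at most one FTRACE_MCOUNT_USE_* symbol is enabled for a given configuration.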
Select HAVE_OBJTOOL_MCOUNT if STACK_VALIDATION is selected to use
objtool to generate __mcount_loc sections for dynamic ftrace with
Clang and gcc <5 (later versions of gcc use -mrecord-mcount).

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5e832fd520b5..6d67646153bc 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -163,6 +163,7 @@ config X86
 	select HAVE_CMPXCHG_LOCAL
 	select HAVE_CONTEXT_TRACKING if X86_64
 	select HAVE_C_RECORDMCOUNT
+	select HAVE_OBJTOOL_MCOUNT if STACK_VALIDATION
 	select HAVE_DEBUG_KMEMLEAK
 	select HAVE_DMA_CONTIGUOUS
 	select HAVE_DYNAMIC_FTRACE
-- 
2.28.0.1011.ga647a8990f-goog
This change removes all instances of DISABLE_LTO from Makefiles, as they are currently unused, and the preferred method of disabling LTO is to filter out the flags instead. Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> --- arch/arm64/kernel/vdso/Makefile | 1 - arch/sparc/vdso/Makefile | 2 -- arch/x86/entry/vdso/Makefile | 2 -- kernel/Makefile | 3 --- scripts/Makefile.build | 2 +- 5 files changed, 1 insertion(+), 9 deletions(-) diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile index 45d5cfe46429..e836e300440f 100644 --- a/arch/arm64/kernel/vdso/Makefile +++ b/arch/arm64/kernel/vdso/Makefile @@ -31,7 +31,6 @@ ccflags-y := -fno-common -fno-builtin -fno-stack-protector -ffixed-x18 ccflags-y += -DDISABLE_BRANCH_PROFILING CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) $(GCC_PLUGINS_CFLAGS) -KBUILD_CFLAGS += $(DISABLE_LTO) KASAN_SANITIZE := n UBSAN_SANITIZE := n OBJECT_FILES_NON_STANDARD := y diff --git a/arch/sparc/vdso/Makefile b/arch/sparc/vdso/Makefile index f44355e46f31..476c4b315505 100644 --- a/arch/sparc/vdso/Makefile +++ b/arch/sparc/vdso/Makefile @@ -3,8 +3,6 @@ # Building vDSO images for sparc. # -KBUILD_CFLAGS += $(DISABLE_LTO) - VDSO64-$(CONFIG_SPARC64) := y VDSOCOMPAT-$(CONFIG_COMPAT) := y diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile index 215376d975a2..ecc27018ae13 100644 --- a/arch/x86/entry/vdso/Makefile +++ b/arch/x86/entry/vdso/Makefile @@ -9,8 +9,6 @@ ARCH_REL_TYPE_ABS := R_X86_64_JUMP_SLOT|R_X86_64_GLOB_DAT|R_X86_64_RELATIVE| ARCH_REL_TYPE_ABS += R_386_GLOB_DAT|R_386_JMP_SLOT|R_386_RELATIVE include $(srctree)/lib/vdso/Makefile -KBUILD_CFLAGS += $(DISABLE_LTO) - # Sanitizer runtimes are unavailable and cannot be linked here. 
KASAN_SANITIZE := n UBSAN_SANITIZE := n diff --git a/kernel/Makefile b/kernel/Makefile index 16ec9262ce9d..2561abc91961 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -38,9 +38,6 @@ KASAN_SANITIZE_kcov.o := n KCSAN_SANITIZE_kcov.o := n CFLAGS_kcov.o := $(call cc-option, -fno-conserve-stack) -fno-stack-protector -# cond_syscall is currently not LTO compatible -CFLAGS_sys_ni.o = $(DISABLE_LTO) - obj-y += sched/ obj-y += locking/ obj-y += power/ diff --git a/scripts/Makefile.build b/scripts/Makefile.build index a4634aae1506..2175ddb1ee0c 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -111,7 +111,7 @@ endif # --------------------------------------------------------------------------- quiet_cmd_cc_s_c = CC $(quiet_modtag) $@ - cmd_cc_s_c = $(CC) $(filter-out $(DEBUG_CFLAGS), $(c_flags)) $(DISABLE_LTO) -fverbose-asm -S -o $@ $< + cmd_cc_s_c = $(CC) $(filter-out $(DEBUG_CFLAGS), $(c_flags)) -fverbose-asm -S -o $@ $< $(obj)/%.s: $(src)/%.c FORCE $(call if_changed_dep,cc_s_c) -- 2.28.0.1011.ga647a8990f-goog
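With DISABLE_LTO gone, LTO is turned off for a file or directory by removing the LTO flags from the compiler command line, which in kbuild is done with make's filter-out function. The sketch below reproduces that word-by-word filtering in shell for illustration. The helper name filter_out and the example flag values are assumptions, not taken from a real build.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: mimics $(filter-out ...) for
# exact words.
# $1 is a space-separated list of flags to remove; the remaining arguments
# are the flags to filter.
filter_out() {
	remove=" $1 "
	shift
	kept=""
	for flag in "$@"; do
		case "$remove" in
		*" $flag "*) ;;				# listed for removal: drop it
		*) kept="${kept:+$kept }$flag" ;;	# anything else: keep it
		esac
	done
	printf '%s\n' "$kept"
}

# Example values only.
CC_FLAGS_LTO="-flto=thin -fsplit-lto-unit -fvisibility=default"
KBUILD_CFLAGS="-O2 -Wall $CC_FLAGS_LTO -fno-common"

# Word splitting of $KBUILD_CFLAGS is intended here.
filter_out "$CC_FLAGS_LTO" $KBUILD_CFLAGS
```

The last command prints the flag list with every LTO-related word removed and the other flags left in their original order.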
This change adds build system support for Clang's Link Time
Optimization (LTO). With -flto, instead of ELF object files, Clang
produces LLVM bitcode, which is compiled into native code at link
time, allowing the final binary to be optimized globally. For more
details, see:

  https://llvm.org/docs/LinkTimeOptimization.html

The Kconfig option CONFIG_LTO_CLANG is implemented as a choice,
which defaults to LTO being disabled. To use LTO, the architecture
must select ARCH_SUPPORTS_LTO_CLANG and support:

  - compiling with Clang,
  - compiling inline assembly with Clang's integrated assembler,
  - and linking with LLD.

While using full LTO results in the best runtime performance, the
compilation is not scalable in time or memory. CONFIG_THINLTO
enables ThinLTO, which allows parallel optimization and faster
incremental builds. ThinLTO is used by default if the architecture
also selects ARCH_SUPPORTS_THINLTO:

  https://clang.llvm.org/docs/ThinLTO.html

To enable LTO, LLVM tools must be used to handle bitcode files. The
easiest way is to pass the LLVM=1 option to make:

  $ make LLVM=1 defconfig
  $ scripts/config -e LTO_CLANG
  $ make LLVM=1

Alternatively, at least the following LLVM tools must be used:

  CC=clang LD=ld.lld AR=llvm-ar NM=llvm-nm

To prepare for LTO support with other compilers, common parts are
gated behind the CONFIG_LTO option, and LTO can be disabled for
specific files by filtering out CC_FLAGS_LTO.

Note that support for DYNAMIC_FTRACE and MODVERSIONS is added in
follow-up patches.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> --- Makefile | 20 ++++++++- arch/Kconfig | 75 +++++++++++++++++++++++++++++++ include/asm-generic/vmlinux.lds.h | 11 +++-- scripts/Makefile.build | 9 +++- scripts/Makefile.modfinal | 9 +++- scripts/Makefile.modpost | 21 ++++++++- scripts/link-vmlinux.sh | 32 +++++++++---- 7 files changed, 159 insertions(+), 18 deletions(-) diff --git a/Makefile b/Makefile index fb2cf557d3ca..a3baf8388163 100644 --- a/Makefile +++ b/Makefile @@ -886,6 +886,21 @@ KBUILD_CFLAGS += $(CC_FLAGS_SCS) export CC_FLAGS_SCS endif +ifdef CONFIG_LTO_CLANG +ifdef CONFIG_THINLTO +CC_FLAGS_LTO += -flto=thin -fsplit-lto-unit +KBUILD_LDFLAGS += --thinlto-cache-dir=$(extmod-prefix).thinlto-cache +else +CC_FLAGS_LTO += -flto +endif +CC_FLAGS_LTO += -fvisibility=default +endif + +ifdef CONFIG_LTO +KBUILD_CFLAGS += $(CC_FLAGS_LTO) +export CC_FLAGS_LTO +endif + ifdef CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_32B KBUILD_CFLAGS += -falign-functions=32 endif @@ -1477,7 +1492,7 @@ MRPROPER_FILES += include/config include/generated \ *.spec # Directories & files removed with 'make distclean' -DISTCLEAN_FILES += tags TAGS cscope* GPATH GTAGS GRTAGS GSYMS +DISTCLEAN_FILES += tags TAGS cscope* GPATH GTAGS GRTAGS GSYMS .thinlto-cache # clean - Delete most, but leave enough to build external modules # @@ -1714,7 +1729,8 @@ _emodinst_post: _emodinst_ $(call cmd,depmod) clean-dirs := $(KBUILD_EXTMOD) -clean: rm-files := $(KBUILD_EXTMOD)/Module.symvers $(KBUILD_EXTMOD)/modules.nsdeps +clean: rm-files := $(KBUILD_EXTMOD)/Module.symvers $(KBUILD_EXTMOD)/modules.nsdeps \ + $(KBUILD_EXTMOD)/.thinlto-cache PHONY += help help: diff --git a/arch/Kconfig b/arch/Kconfig index 76ec3395b843..4ac5dda6d873 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -558,6 +558,81 @@ config SHADOW_CALL_STACK reading and writing arbitrary memory may be able to locate them and hijack control flow by modifying the stacks. 
+config LTO + bool + +config ARCH_SUPPORTS_LTO_CLANG + bool + help + An architecture should select this option if it supports: + - compiling with Clang, + - compiling inline assembly with Clang's integrated assembler, + - and linking with LLD. + +config ARCH_SUPPORTS_THINLTO + bool + help + An architecture should select this option if it supports Clang's + ThinLTO. + +config THINLTO + bool "Clang ThinLTO" + depends on LTO_CLANG && ARCH_SUPPORTS_THINLTO + default y + help + This option enables Clang's ThinLTO, which allows for parallel + optimization and faster incremental compiles. More information + can be found from Clang's documentation: + + https://clang.llvm.org/docs/ThinLTO.html + + If you say N here, the compiler will use full LTO, which may + produce faster code, but building the kernel will be significantly + slower as the linker won't efficiently utilize multiple threads. + + If unsure, say Y. + +choice + prompt "Link Time Optimization (LTO)" + default LTO_NONE + help + This option enables Link Time Optimization (LTO), which allows the + compiler to optimize binaries globally. + + If unsure, select LTO_NONE. Note that LTO is very resource-intensive + so it's disabled by default. + +config LTO_NONE + bool "None" + +config LTO_CLANG + bool "Clang's Link Time Optimization (EXPERIMENTAL)" + # Clang >= 11: https://github.com/ClangBuiltLinux/linux/issues/510 + depends on CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD + depends on $(success,$(NM) --help | head -n 1 | grep -qi llvm) + depends on $(success,$(AR) --help | head -n 1 | grep -qi llvm) + depends on ARCH_SUPPORTS_LTO_CLANG + depends on !FTRACE_MCOUNT_RECORD + depends on !KASAN + depends on !GCOV_KERNEL + depends on !MODVERSIONS + select LTO + help + This option enables Clang's Link Time Optimization (LTO), which + allows the compiler to optimize the kernel globally. 
If you enable + this option, the compiler generates LLVM bitcode instead of ELF + object files, and the actual compilation from bitcode happens at + the LTO link step, which may take several minutes depending on the + kernel configuration. More information can be found from LLVM's + documentation: + + https://llvm.org/docs/LinkTimeOptimization.html + + To select this option, you also need to use LLVM tools to handle + the bitcode by passing LLVM=1 to make. + +endchoice + config HAVE_ARCH_WITHIN_STACK_FRAMES bool help diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index e1843976754a..77e5bd069dd4 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -90,15 +90,18 @@ * .data. We don't want to pull in .data..other sections, which Linux * has defined. Same for text and bss. * + * With LTO_CLANG, the linker also splits sections by default, so we need + * these macros to combine the sections during the final link. + * * RODATA_MAIN is not used because existing code already defines .rodata.x * sections to be brought in with rodata. 
*/ -#ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION +#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG) #define TEXT_MAIN .text .text.[0-9a-zA-Z_]* -#define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..LPBX* +#define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..L* .data..compoundliteral* #define SDATA_MAIN .sdata .sdata.[0-9a-zA-Z_]* -#define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]* -#define BSS_MAIN .bss .bss.[0-9a-zA-Z_]* +#define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]* .rodata..L* +#define BSS_MAIN .bss .bss.[0-9a-zA-Z_]* .bss..compoundliteral* #define SBSS_MAIN .sbss .sbss.[0-9a-zA-Z_]* #else #define TEXT_MAIN .text diff --git a/scripts/Makefile.build b/scripts/Makefile.build index 2175ddb1ee0c..ed74b2f986f7 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -111,7 +111,7 @@ endif # --------------------------------------------------------------------------- quiet_cmd_cc_s_c = CC $(quiet_modtag) $@ - cmd_cc_s_c = $(CC) $(filter-out $(DEBUG_CFLAGS), $(c_flags)) -fverbose-asm -S -o $@ $< + cmd_cc_s_c = $(CC) $(filter-out $(DEBUG_CFLAGS) $(CC_FLAGS_LTO), $(c_flags)) -fverbose-asm -S -o $@ $< $(obj)/%.s: $(src)/%.c FORCE $(call if_changed_dep,cc_s_c) @@ -425,8 +425,15 @@ $(obj)/lib.a: $(lib-y) FORCE # Do not replace $(filter %.o,^) with $(real-prereqs). When a single object # module is turned into a multi object module, $^ will contain header file # dependencies recorded in the .*.cmd file. 
+ifdef CONFIG_LTO_CLANG +quiet_cmd_link_multi-m = AR [M] $@ +cmd_link_multi-m = \ + rm -f $@; \ + $(AR) cDPrsT $@ $(filter %.o,$^) +else quiet_cmd_link_multi-m = LD [M] $@ cmd_link_multi-m = $(LD) $(ld_flags) -r -o $@ $(filter %.o,$^) +endif $(multi-used-m): FORCE $(call if_changed,link_multi-m) diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal index ae01baf96f4e..2cb9a1d88434 100644 --- a/scripts/Makefile.modfinal +++ b/scripts/Makefile.modfinal @@ -6,6 +6,7 @@ PHONY := __modfinal __modfinal: +include $(objtree)/include/config/auto.conf include $(srctree)/scripts/Kbuild.include # for c_flags @@ -29,6 +30,12 @@ quiet_cmd_cc_o_c = CC [M] $@ ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink) +ifdef CONFIG_LTO_CLANG +# With CONFIG_LTO_CLANG, reuse the object file we compiled for modpost to +# avoid a second slow LTO link +prelink-ext := .lto +endif + quiet_cmd_ld_ko_o = LD [M] $@ cmd_ld_ko_o = \ $(LD) -r $(KBUILD_LDFLAGS) \ @@ -36,7 +43,7 @@ quiet_cmd_ld_ko_o = LD [M] $@ -T scripts/module.lds -o $@ $(filter %.o, $^); \ $(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true) -$(modules): %.ko: %.o %.mod.o scripts/module.lds FORCE +$(modules): %.ko: %$(prelink-ext).o %.mod.o scripts/module.lds FORCE +$(call if_changed,ld_ko_o) targets += $(modules) $(modules:.ko=.mod.o) diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost index f54b6ac37ac2..9ff8bfdb574d 100644 --- a/scripts/Makefile.modpost +++ b/scripts/Makefile.modpost @@ -43,6 +43,9 @@ __modpost: include include/config/auto.conf include scripts/Kbuild.include +# for ld_flags +include scripts/Makefile.lib + MODPOST = scripts/mod/modpost \ $(if $(CONFIG_MODVERSIONS),-m) \ $(if $(CONFIG_MODULE_SRCVERSION_ALL),-a) \ @@ -102,12 +105,26 @@ $(input-symdump): @echo >&2 'WARNING: Symbol version dump "$@" is missing.' @echo >&2 ' Modules may not have dependencies or modversions.' 
+ifdef CONFIG_LTO_CLANG +# With CONFIG_LTO_CLANG, .o files might be LLVM bitcode, so we need to run +# LTO to compile them into native code before running modpost +prelink-ext := .lto + +quiet_cmd_cc_lto_link_modules = LTO [M] $@ +cmd_cc_lto_link_modules = $(LD) $(ld_flags) -r -o $@ --whole-archive $^ + +%.lto.o: %.o + $(call if_changed,cc_lto_link_modules) +endif + +modules := $(sort $(shell cat $(MODORDER))) + # Read out modules.order to pass in modpost. # Otherwise, allmodconfig would fail with "Argument list too long". quiet_cmd_modpost = MODPOST $@ - cmd_modpost = sed 's/ko$$/o/' $< | $(MODPOST) -T - + cmd_modpost = sed 's/\.ko$$/$(prelink-ext)\.o/' $< | $(MODPOST) -T - -$(output-symdump): $(MODORDER) $(input-symdump) FORCE +$(output-symdump): $(MODORDER) $(input-symdump) $(modules:.ko=$(prelink-ext).o) FORCE $(call if_changed,modpost) targets += $(output-symdump) diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index 372c3719f94c..ebb9f912aab6 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -56,6 +56,14 @@ modpost_link() ${KBUILD_VMLINUX_LIBS} \ --end-group" + if [ -n "${CONFIG_LTO_CLANG}" ]; then + # This might take a while, so indicate that we're doing + # an LTO link + info LTO ${1} + else + info LD ${1} + fi + ${LD} ${KBUILD_LDFLAGS} -r -o ${1} ${objects} } @@ -103,13 +111,22 @@ vmlinux_link() fi if [ "${SRCARCH}" != "um" ]; then - objects="--whole-archive \ - ${KBUILD_VMLINUX_OBJS} \ - --no-whole-archive \ - --start-group \ - ${KBUILD_VMLINUX_LIBS} \ - --end-group \ - ${@}" + if [ -n "${CONFIG_LTO_CLANG}" ]; then + # Use vmlinux.o instead of performing the slow LTO + # link again. 
+ objects="--whole-archive \ + vmlinux.o \ + --no-whole-archive \ + ${@}" + else + objects="--whole-archive \ + ${KBUILD_VMLINUX_OBJS} \ + --no-whole-archive \ + --start-group \ + ${KBUILD_VMLINUX_LIBS} \ + --end-group \ + ${@}" + fi ${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux} \ ${strip_debug#-Wl,} \ @@ -274,7 +291,6 @@ fi; ${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init need-builtin=1 #link vmlinux.o -info LD vmlinux.o modpost_link vmlinux.o objtool_link vmlinux.o -- 2.28.0.1011.ga647a8990f-goog
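The LTO_CLANG Kconfig entry in this patch only becomes selectable when $(NM) and $(AR) identify themselves as LLVM tools, which it tests by looking for "llvm" in the first line of their --help output. The same check can be sketched as a standalone shell function, for example to find out why the option is not offered. The function name is_llvm_tool is hypothetical, and the llvm-nm and llvm-ar defaults are only assumptions about what is installed.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: succeeds if the first line of
# "<tool> --help" mentions LLVM (case-insensitive), like the Kconfig test.
is_llvm_tool() {
	"$@" --help 2>/dev/null | head -n 1 | grep -qi llvm
}

# NM and AR default to the LLVM tool names here; override them to test others.
for tool in "${NM:-llvm-nm}" "${AR:-llvm-ar}"; do
	if is_llvm_tool "$tool"; then
		echo "$tool: looks like an LLVM tool"
	else
		echo "$tool: not an LLVM tool (or not installed)"
	fi
done
```

Running it as `NM=nm AR=ar sh check.sh` (script name chosen freely) shows how the check treats whichever nm and ar are first in PATH.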
With CONFIG_MODVERSIONS, version information is linked into each compilation unit that exports symbols. With LTO, we cannot use this method as all C code is compiled into LLVM bitcode instead. This change collects symbol versions into .symversions files and merges them in link-vmlinux.sh where they are all linked into vmlinux.o at the same time. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> --- .gitignore | 1 + Makefile | 3 ++- arch/Kconfig | 1 - scripts/Makefile.build | 33 +++++++++++++++++++++++++++++++-- scripts/Makefile.modpost | 6 +++++- scripts/link-vmlinux.sh | 23 ++++++++++++++++++++++- 6 files changed, 61 insertions(+), 6 deletions(-) diff --git a/.gitignore b/.gitignore index 162bd2b67bdf..06e76dc39ffe 100644 --- a/.gitignore +++ b/.gitignore @@ -41,6 +41,7 @@ *.so.dbg *.su *.symtypes +*.symversions *.tab.[ch] *.tar *.xz diff --git a/Makefile b/Makefile index a3baf8388163..6d31a78d79ce 100644 --- a/Makefile +++ b/Makefile @@ -1827,7 +1827,8 @@ clean: $(clean-dirs) -o -name '.tmp_*.o.*' \ -o -name '*.c.[012]*.*' \ -o -name '*.ll' \ - -o -name '*.gcno' \) -type f -print | xargs rm -f + -o -name '*.gcno' \ + -o -name '*.*.symversions' \) -type f -print | xargs rm -f # Generate tags for editors # --------------------------------------------------------------------------- diff --git a/arch/Kconfig b/arch/Kconfig index 4ac5dda6d873..caeb6feb517e 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -615,7 +615,6 @@ config LTO_CLANG depends on !FTRACE_MCOUNT_RECORD depends on !KASAN depends on !GCOV_KERNEL - depends on !MODVERSIONS select LTO help This option enables Clang's Link Time Optimization (LTO), which diff --git a/scripts/Makefile.build b/scripts/Makefile.build index ed74b2f986f7..eae2f5386a03 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -166,6 +166,15 @@ ifdef CONFIG_MODVERSIONS # the actual value of the checksum generated by genksyms # o remove .tmp_<file>.o to <file>.o +ifdef 
CONFIG_LTO_CLANG +# Generate .o.symversions files for each .o with exported symbols, and link these +# to the kernel and/or modules at the end. +cmd_modversions_c = \ + if $(NM) $@ 2>/dev/null | grep -q __ksymtab; then \ + $(call cmd_gensymtypes_c,$(KBUILD_SYMTYPES),$(@:.o=.symtypes)) \ + > $@.symversions; \ + fi; +else cmd_modversions_c = \ if $(OBJDUMP) -h $@ | grep -q __ksymtab; then \ $(call cmd_gensymtypes_c,$(KBUILD_SYMTYPES),$(@:.o=.symtypes)) \ @@ -177,6 +186,7 @@ cmd_modversions_c = \ rm -f $(@D)/.tmp_$(@F:.o=.ver); \ fi endif +endif ifdef CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT # compiler will not generate __mcount_loc use recordmcount or recordmcount.pl @@ -390,6 +400,18 @@ $(obj)/%.asn1.c $(obj)/%.asn1.h: $(src)/%.asn1 $(objtree)/scripts/asn1_compiler $(subdir-builtin): $(obj)/%/built-in.a: $(obj)/% ; $(subdir-modorder): $(obj)/%/modules.order: $(obj)/% ; +# combine symversions for later processing +quiet_cmd_update_lto_symversions = SYMVER $@ +ifeq ($(CONFIG_LTO_CLANG) $(CONFIG_MODVERSIONS),y y) + cmd_update_lto_symversions = \ + rm -f $@.symversions \ + $(foreach n, $(filter-out FORCE,$^), \ + $(if $(wildcard $(n).symversions), \ + ; cat $(n).symversions >> $@.symversions)) +else + cmd_update_lto_symversions = echo >/dev/null +endif + # # Rule to compile a set of .o files into one .a file (without symbol table) # @@ -397,8 +419,11 @@ $(subdir-modorder): $(obj)/%/modules.order: $(obj)/% ; quiet_cmd_ar_builtin = AR $@ cmd_ar_builtin = rm -f $@; $(AR) cDPrST $@ $(real-prereqs) +quiet_cmd_ar_and_symver = AR $@ + cmd_ar_and_symver = $(cmd_update_lto_symversions); $(cmd_ar_builtin) + $(obj)/built-in.a: $(real-obj-y) FORCE - $(call if_changed,ar_builtin) + $(call if_changed,ar_and_symver) # # Rule to create modules.order file @@ -418,8 +443,11 @@ $(obj)/modules.order: $(obj-m) FORCE # # Rule to compile a set of .o files into one .a file (with symbol table) # +quiet_cmd_ar_lib = AR $@ + cmd_ar_lib = $(cmd_update_lto_symversions); $(cmd_ar) + $(obj)/lib.a: 
$(lib-y) FORCE - $(call if_changed,ar) + $(call if_changed,ar_lib) # NOTE: # Do not replace $(filter %.o,^) with $(real-prereqs). When a single object @@ -428,6 +456,7 @@ $(obj)/lib.a: $(lib-y) FORCE ifdef CONFIG_LTO_CLANG quiet_cmd_link_multi-m = AR [M] $@ cmd_link_multi-m = \ + $(cmd_update_lto_symversions); \ rm -f $@; \ $(AR) cDPrsT $@ $(filter %.o,$^) else diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost index 9ff8bfdb574d..066beffca09a 100644 --- a/scripts/Makefile.modpost +++ b/scripts/Makefile.modpost @@ -111,7 +111,11 @@ ifdef CONFIG_LTO_CLANG prelink-ext := .lto quiet_cmd_cc_lto_link_modules = LTO [M] $@ -cmd_cc_lto_link_modules = $(LD) $(ld_flags) -r -o $@ --whole-archive $^ +cmd_cc_lto_link_modules = \ + $(LD) $(ld_flags) -r -o $@ \ + $(shell [ -s $(@:.lto.o=.o.symversions) ] && \ + echo -T $(@:.lto.o=.o.symversions)) \ + --whole-archive $^ %.lto.o: %.o $(call if_changed,cc_lto_link_modules) diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index ebb9f912aab6..1a48ef525f46 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -43,11 +43,26 @@ info() fi } +# If CONFIG_LTO_CLANG is selected, collect generated symbol versions into +# .tmp_symversions.lds +gen_symversions() +{ + info GEN .tmp_symversions.lds + rm -f .tmp_symversions.lds + + for o in ${KBUILD_VMLINUX_OBJS} ${KBUILD_VMLINUX_LIBS}; do + if [ -f ${o}.symversions ]; then + cat ${o}.symversions >> .tmp_symversions.lds + fi + done +} + # Link of vmlinux.o used for section mismatch analysis # ${1} output file modpost_link() { local objects + local lds="" objects="--whole-archive \ ${KBUILD_VMLINUX_OBJS} \ @@ -57,6 +72,11 @@ modpost_link() --end-group" if [ -n "${CONFIG_LTO_CLANG}" ]; then + if [ -n "${CONFIG_MODVERSIONS}" ]; then + gen_symversions + lds="${lds} -T .tmp_symversions.lds" + fi + # This might take a while, so indicate that we're doing # an LTO link info LTO ${1} @@ -64,7 +84,7 @@ modpost_link() info LD ${1} fi - ${LD} ${KBUILD_LDFLAGS} -r -o 
${1} ${objects} + ${LD} ${KBUILD_LDFLAGS} -r -o ${1} ${lds} ${objects} } objtool_link() @@ -242,6 +262,7 @@ cleanup() { rm -f .btf.* rm -f .tmp_System.map + rm -f .tmp_symversions.lds rm -f .tmp_vmlinux* rm -f System.map rm -f vmlinux -- 2.28.0.1011.ga647a8990f-goog
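The gen_symversions() step above only concatenates the per-object .symversions files that exist and skips objects without exported symbols. A self-contained sketch of that merge, runnable outside the kernel tree, is shown below. The function name merge_symversions, the file names, and the CRC values are made up, and the file contents assume the plain `__crc_<symbol> = <value>;` assignment form.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch.
# $1 is the merged output file; the remaining arguments are object paths.
# For each object, <object>.symversions is appended if it exists.
merge_symversions() {
	out=$1
	shift
	rm -f "$out"
	for o in "$@"; do
		if [ -f "$o.symversions" ]; then
			cat "$o.symversions" >> "$out"
		fi
	done
}

# Demo with made-up objects: b.o exports nothing and has no .symversions.
tmp=$(mktemp -d)
printf '__crc_foo = 0x12345678;\n' > "$tmp/a.o.symversions"
printf '__crc_bar = 0x9abcdef0;\n' > "$tmp/c.o.symversions"

merge_symversions "$tmp/merged.lds" "$tmp/a.o" "$tmp/b.o" "$tmp/c.o"
cat "$tmp/merged.lds"
rm -rf "$tmp"
```

The merged file is then passed to the linker with -T, as modpost_link() does with .tmp_symversions.lds, so that all symbol versions are linked into vmlinux.o in one step.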
This change adds a --noinstr flag to objtool to allow us to specify that we're processing vmlinux.o without also enabling noinstr validation. This is needed to avoid false positives with LTO when we run objtool on vmlinux.o without CONFIG_DEBUG_ENTRY. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> --- scripts/link-vmlinux.sh | 2 +- tools/objtool/builtin-check.c | 3 ++- tools/objtool/check.c | 2 +- tools/objtool/include/objtool/builtin.h | 2 +- 4 files changed, 5 insertions(+), 4 deletions(-) diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index 1a48ef525f46..5ace1dc43993 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -92,7 +92,7 @@ objtool_link() local objtoolopt; if [ -n "${CONFIG_VMLINUX_VALIDATION}" ]; then - objtoolopt="check --vmlinux" + objtoolopt="check --vmlinux --noinstr" if [ -z "${CONFIG_FRAME_POINTER}" ]; then objtoolopt="${objtoolopt} --no-fp" fi diff --git a/tools/objtool/builtin-check.c b/tools/objtool/builtin-check.c index ff4d7f5c0e80..c3a85d8f6c5c 100644 --- a/tools/objtool/builtin-check.c +++ b/tools/objtool/builtin-check.c @@ -18,7 +18,7 @@ #include <objtool/builtin.h> #include <objtool/objtool.h> -bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats, validate_dup, vmlinux, mcount; +bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats, validate_dup, vmlinux, mcount, noinstr; static const char * const check_usage[] = { "objtool check [<options>] file.o", @@ -34,6 +34,7 @@ const struct option check_options[] = { OPT_BOOLEAN('a', "uaccess", &uaccess, "enable uaccess checking"), OPT_BOOLEAN('s', "stats", &stats, "print statistics"), OPT_BOOLEAN('d', "duplicate", &validate_dup, "duplicate validation for vmlinux.o"), + OPT_BOOLEAN('n', "noinstr", &noinstr, "noinstr validation for vmlinux.o"), OPT_BOOLEAN('l', "vmlinux", &vmlinux, "vmlinux.o validation"), OPT_BOOLEAN('M', "mcount", &mcount, "generate __mcount_loc"), OPT_END(), diff --git a/tools/objtool/check.c 
b/tools/objtool/check.c index 61dcd80feec5..0c05d58608b0 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -245,7 +245,7 @@ static void init_insn_state(struct insn_state *state, struct section *sec) * not correctly determine insn->call_dest->sec (external symbols do * not have a section). */ - if (vmlinux && sec) + if (vmlinux && noinstr && sec) state->noinstr = sec->noinstr; } diff --git a/tools/objtool/include/objtool/builtin.h b/tools/objtool/include/objtool/builtin.h index 94565a72b701..2502bb27de17 100644 --- a/tools/objtool/include/objtool/builtin.h +++ b/tools/objtool/include/objtool/builtin.h @@ -8,7 +8,7 @@ #include <subcmd/parse-options.h> extern const struct option check_options[]; -extern bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats, validate_dup, vmlinux, mcount; +extern bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats, validate_dup, vmlinux, mcount, noinstr; extern int cmd_check(int argc, const char **argv); extern int cmd_orc(int argc, const char **argv); -- 2.28.0.1011.ga647a8990f-goog
With LTO, LLVM bitcode won't be compiled into native code until modpost_link, or modfinal for modules. This change postpones calls to objtool until after these steps, and moves objtool_args to Makefile.lib, so the arguments can be reused in Makefile.modfinal. As we didn't have objects to process earlier, we use --duplicate when processing vmlinux.o. This change also disables unreachable instruction warnings with LTO to avoid warnings about the int3 padding between functions. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> --- arch/Kconfig | 2 +- scripts/Makefile.build | 19 ++----------------- scripts/Makefile.lib | 11 +++++++++++ scripts/Makefile.modfinal | 19 ++++++++++++++++--- scripts/link-vmlinux.sh | 28 +++++++++++++++++++++++++--- 5 files changed, 55 insertions(+), 24 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index caeb6feb517e..74cbd6e3b116 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -612,7 +612,7 @@ config LTO_CLANG depends on $(success,$(NM) --help | head -n 1 | grep -qi llvm) depends on $(success,$(AR) --help | head -n 1 | grep -qi llvm) depends on ARCH_SUPPORTS_LTO_CLANG - depends on !FTRACE_MCOUNT_RECORD + depends on !FTRACE_MCOUNT_USE_RECORDMCOUNT depends on !KASAN depends on !GCOV_KERNEL select LTO diff --git a/scripts/Makefile.build b/scripts/Makefile.build index eae2f5386a03..ab0ddf4884fd 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -218,27 +218,11 @@ cmd_record_mcount = $(if $(findstring $(strip $(CC_FLAGS_FTRACE)),$(_c_flags)), endif # CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT ifdef CONFIG_STACK_VALIDATION +ifndef CONFIG_LTO_CLANG ifneq ($(SKIP_STACK_VALIDATION),1) __objtool_obj := $(objtree)/tools/objtool/objtool -objtool_args = $(if $(CONFIG_UNWINDER_ORC),orc generate,check) - -objtool_args += $(if $(part-of-module), --module,) - -ifndef CONFIG_FRAME_POINTER -objtool_args += --no-fp -endif -ifdef CONFIG_GCOV_KERNEL -objtool_args += --no-unreachable -endif 
-ifdef CONFIG_RETPOLINE - objtool_args += --retpoline -endif -ifdef CONFIG_X86_SMAP - objtool_args += --uaccess -endif - # 'OBJECT_FILES_NON_STANDARD := y': skip objtool checking for a directory # 'OBJECT_FILES_NON_STANDARD_foo.o := 'y': skip objtool checking for a file # 'OBJECT_FILES_NON_STANDARD_foo.o := 'n': override directory skip for a file @@ -250,6 +234,7 @@ objtool_obj = $(if $(patsubst y%,, \ $(__objtool_obj)) endif # SKIP_STACK_VALIDATION +endif # CONFIG_LTO_CLANG endif # CONFIG_STACK_VALIDATION # Rebuild all objects when objtool changes, or is enabled/disabled. diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib index 3d599716940c..ecb97c9f5feb 100644 --- a/scripts/Makefile.lib +++ b/scripts/Makefile.lib @@ -216,6 +216,17 @@ dtc_cpp_flags = -Wp,-MMD,$(depfile).pre.tmp -nostdinc \ $(addprefix -I,$(DTC_INCLUDE)) \ -undef -D__DTS__ +# Objtool arguments are also needed for modfinal with LTO, so we define +# then here to avoid duplication. +objtool_args = \ + $(if $(CONFIG_UNWINDER_ORC),orc generate,check) \ + $(if $(part-of-module), --module,) \ + $(if $(CONFIG_FRAME_POINTER),, --no-fp) \ + $(if $(CONFIG_GCOV_KERNEL), --no-unreachable,) \ + $(if $(CONFIG_RETPOLINE), --retpoline,) \ + $(if $(CONFIG_X86_SMAP), --uaccess,) \ + $(if $(CONFIG_FTRACE_MCOUNT_USE_OBJTOOL), --mcount,) + # Useful for describing the dependency of composite objects # Usage: # $(call multi_depend, multi_used_targets, suffix_to_remove, suffix_to_add) diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal index 2cb9a1d88434..1bd2953b11c4 100644 --- a/scripts/Makefile.modfinal +++ b/scripts/Makefile.modfinal @@ -9,7 +9,7 @@ __modfinal: include $(objtree)/include/config/auto.conf include $(srctree)/scripts/Kbuild.include -# for c_flags +# for c_flags and objtool_args include $(srctree)/scripts/Makefile.lib # find all modules listed in modules.order @@ -34,10 +34,23 @@ ifdef CONFIG_LTO_CLANG # With CONFIG_LTO_CLANG, reuse the object file we compiled for modpost to # avoid a 
second slow LTO link prelink-ext := .lto -endif + +# ELF processing was skipped earlier because we didn't have native code, +# so let's now process the prelinked binary before we link the module. + +ifdef CONFIG_STACK_VALIDATION +ifneq ($(SKIP_STACK_VALIDATION),1) +cmd_ld_ko_o += \ + $(objtree)/tools/objtool/objtool $(objtool_args) \ + $(@:.ko=$(prelink-ext).o); + +endif # SKIP_STACK_VALIDATION +endif # CONFIG_STACK_VALIDATION + +endif # CONFIG_LTO_CLANG quiet_cmd_ld_ko_o = LD [M] $@ - cmd_ld_ko_o = \ + cmd_ld_ko_o += \ $(LD) -r $(KBUILD_LDFLAGS) \ $(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE) \ -T scripts/module.lds -o $@ $(filter %.o, $^); \ diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index 5ace1dc43993..7f4d19271180 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -89,14 +89,36 @@ modpost_link() objtool_link() { + local objtoolcmd; local objtoolopt; + if [ "${CONFIG_LTO_CLANG} ${CONFIG_STACK_VALIDATION}" = "y y" ]; then + # Don't perform vmlinux validation unless explicitly requested, + # but run objtool on vmlinux.o now that we have an object file. 
+ if [ -n "${CONFIG_UNWINDER_ORC}" ]; then + objtoolcmd="orc generate" + fi + + objtoolopt="${objtoolopt} --duplicate" + + if [ -n "${CONFIG_FTRACE_MCOUNT_USE_OBJTOOL}" ]; then + objtoolopt="${objtoolopt} --mcount" + fi + fi + if [ -n "${CONFIG_VMLINUX_VALIDATION}" ]; then - objtoolopt="check --vmlinux --noinstr" + objtoolopt="${objtoolopt} --noinstr" + fi + + if [ -n "${objtoolopt}" ]; then + if [ -z "${objtoolcmd}" ]; then + objtoolcmd="check" + fi + objtoolopt="${objtoolopt} --vmlinux" if [ -z "${CONFIG_FRAME_POINTER}" ]; then objtoolopt="${objtoolopt} --no-fp" fi - if [ -n "${CONFIG_GCOV_KERNEL}" ]; then + if [ -n "${CONFIG_GCOV_KERNEL}" ] || [ -n "${CONFIG_LTO_CLANG}" ]; then objtoolopt="${objtoolopt} --no-unreachable" fi if [ -n "${CONFIG_RETPOLINE}" ]; then @@ -106,7 +128,7 @@ objtool_link() objtoolopt="${objtoolopt} --uaccess" fi info OBJTOOL ${1} - tools/objtool/objtool ${objtoolopt} ${1} + tools/objtool/objtool ${objtoolcmd} ${objtoolopt} ${1} fi } -- 2.28.0.1011.ga647a8990f-goog
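The option handling in the patched objtool_link() is easier to follow when the resulting command line is printed for a given configuration. The sketch below copies the logic of the hunk above into a function that echoes the command instead of running it. The function name objtool_cmdline is hypothetical, and the CONFIG_* values in the demo are example settings, not a recommended configuration.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: prints the objtool command line
# that objtool_link() would build for the current CONFIG_* variables, or
# nothing if objtool would not be run on the given object.
objtool_cmdline() {
	objtoolcmd=""
	objtoolopt=""

	if [ "${CONFIG_LTO_CLANG} ${CONFIG_STACK_VALIDATION}" = "y y" ]; then
		# With LTO, per-object processing was skipped, so do it now.
		if [ -n "${CONFIG_UNWINDER_ORC}" ]; then
			objtoolcmd="orc generate"
		fi
		objtoolopt="${objtoolopt} --duplicate"
		if [ -n "${CONFIG_FTRACE_MCOUNT_USE_OBJTOOL}" ]; then
			objtoolopt="${objtoolopt} --mcount"
		fi
	fi

	if [ -n "${CONFIG_VMLINUX_VALIDATION}" ]; then
		objtoolopt="${objtoolopt} --noinstr"
	fi

	if [ -n "${objtoolopt}" ]; then
		if [ -z "${objtoolcmd}" ]; then
			objtoolcmd="check"
		fi
		objtoolopt="${objtoolopt} --vmlinux"
		if [ -z "${CONFIG_FRAME_POINTER}" ]; then
			objtoolopt="${objtoolopt} --no-fp"
		fi
		if [ -n "${CONFIG_GCOV_KERNEL}" ] || [ -n "${CONFIG_LTO_CLANG}" ]; then
			objtoolopt="${objtoolopt} --no-unreachable"
		fi
		if [ -n "${CONFIG_RETPOLINE}" ]; then
			objtoolopt="${objtoolopt} --retpoline"
		fi
		if [ -n "${CONFIG_X86_SMAP}" ]; then
			objtoolopt="${objtoolopt} --uaccess"
		fi
		echo "tools/objtool/objtool ${objtoolcmd}${objtoolopt} $1"
	fi
}

# Example: LTO with ORC and objtool-generated __mcount_loc, run in a subshell
# so the example settings do not leak into the caller.
(
	CONFIG_LTO_CLANG=y
	CONFIG_STACK_VALIDATION=y
	CONFIG_UNWINDER_ORC=y
	CONFIG_FTRACE_MCOUNT_USE_OBJTOOL=y
	CONFIG_RETPOLINE=y
	objtool_cmdline vmlinux.o
)
```

Note that --noinstr is only added with CONFIG_VMLINUX_VALIDATION, while --vmlinux is passed whenever objtool runs on vmlinux.o, which is the split the --noinstr patch earlier in the series makes possible.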
This change limits function inlining across translation unit
boundaries in order to reduce the binary size with LTO.

The -import-instr-limit flag defines a size limit, as the number
of LLVM IR instructions, for importing functions from other TUs,
defaulting to 100. Based on testing with arm64 defconfig, we found
that a limit of 5 is a reasonable compromise between performance
and binary size, reducing the size of a stripped vmlinux by 11%.

Suggested-by: George Burgess IV <gbiv@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 Makefile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Makefile b/Makefile
index 6d31a78d79ce..3fe2dca17261 100644
--- a/Makefile
+++ b/Makefile
@@ -894,6 +894,9 @@ else
 CC_FLAGS_LTO += -flto
 endif
 CC_FLAGS_LTO += -fvisibility=default
+
+# Limit inlining across translation units to reduce binary size
+KBUILD_LDFLAGS += -mllvm -import-instr-limit=5
 endif
 
 ifdef CONFIG_LTO
-- 
2.28.0.1011.ga647a8990f-goog
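To compare other -import-instr-limit values in the same way, the relative size reduction of two stripped images can be computed from their byte counts. The helper below is a minimal sketch of that arithmetic. The name size_reduction_pct is hypothetical, and the byte counts in the demo are invented round numbers, not measurements from this patch.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch.
# $1 is the size in bytes of the baseline image, $2 the size of the image
# built with a different limit. Prints the reduction as an integer
# percentage, rounded toward zero.
size_reduction_pct() {
	echo $(( ($1 - $2) * 100 / $1 ))
}

# Invented sizes for illustration only.
size_reduction_pct 20000000 17800000
```

With real builds, the two arguments could come from `wc -c < vmlinux.stripped` for each image, where vmlinux.stripped stands for whatever name the stripped copies were given.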
LLD always splits sections with LTO, which increases module sizes. This change adds linker script rules to merge the split sections in the final module. Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> --- scripts/module.lds.S | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/scripts/module.lds.S b/scripts/module.lds.S index 69b9b71a6a47..037120173a22 100644 --- a/scripts/module.lds.S +++ b/scripts/module.lds.S @@ -25,5 +25,33 @@ SECTIONS { __jump_table 0 : ALIGN(8) { KEEP(*(__jump_table)) } } +#ifdef CONFIG_LTO_CLANG +/* + * With CONFIG_LTO_CLANG, LLD always enables -fdata-sections and + * -ffunction-sections, which increases the size of the final module. + * Merge the split sections in the final binary. + */ +SECTIONS { + __patchable_function_entries : { *(__patchable_function_entries) } + + .bss : { + *(.bss .bss.[0-9a-zA-Z_]*) + *(.bss..L*) + } + + .data : { + *(.data .data.[0-9a-zA-Z_]*) + *(.data..L*) + } + + .rodata : { + *(.rodata .rodata.[0-9a-zA-Z_]*) + *(.rodata..L*) + } + + .text : { *(.text .text.[0-9a-zA-Z_]*) } +} +#endif + /* bring in arch-specific sections */ #include <asm/module.lds.h> -- 2.28.0.1011.ga647a8990f-goog
With LTO, llvm-nm prints out symbols for each archive member separately, which results in a lot of duplicate dependencies in the .mod file when CONFIG_TRIM_UNUSED_KSYMS is enabled. When a module consists of several compilation units, the output can exceed the default xargs command size limit and split the dependency list into multiple lines, which results in used symbols getting trimmed. This change removes duplicate dependencies, which reduces the probability of this happening and makes .mod files smaller and easier to read. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> --- scripts/Makefile.build | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/Makefile.build b/scripts/Makefile.build index ab0ddf4884fd..96d6c9e18901 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -266,7 +266,7 @@ endef # List module undefined symbols (or empty line if not enabled) ifdef CONFIG_TRIM_UNUSED_KSYMS -cmd_undef_syms = $(NM) $< | sed -n 's/^ *U //p' | xargs echo +cmd_undef_syms = $(NM) $< | sed -n 's/^ *U //p' | sort -u | xargs echo else cmd_undef_syms = echo endif -- 2.28.0.1011.ga647a8990f-goog
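The effect of adding sort -u to cmd_undef_syms can be illustrated with a small standalone sketch. The nm_output function and its symbol names are made up for illustration; it only imitates the per-member output format described in the commit message and is not taken from a real build.

```sh
#!/bin/sh
# Hypothetical nm-style output for an archive with two members that
# reference overlapping undefined symbols (illustrative data only).
nm_output()
{
	cat <<'EOF'

a.o:
                 U printk
                 U kmalloc
0000000000000000 T a_init

b.o:
                 U printk
                 U kfree
EOF
}

# Pipeline before the change: duplicates are kept.
undef_syms_old()
{
	nm_output | sed -n 's/^ *U //p' | xargs echo
}

# Pipeline after the change: sort -u drops duplicates before xargs.
undef_syms_new()
{
	nm_output | sed -n 's/^ *U //p' | sort -u | xargs echo
}

undef_syms_old
undef_syms_new
```

With this input, the old pipeline lists printk twice, while the new one lists each symbol once. Note that the dependency list ends up sorted as a side effect, which should not matter for the .mod file since it is treated as a set of symbols.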
With LTO, the compiler doesn't necessarily obey the link order for initcalls, and initcall variables need globally unique names to avoid collisions at link time. This change exports __KBUILD_MODNAME and adds the __initcall_id() macro, which uses it together with __COUNTER__ and __LINE__ to help ensure these variables have unique names, and moves each variable to its own section when LTO is enabled, so the correct order can be specified using a linker script. The generate_initcall_order.pl script uses nm to find initcalls from the object files passed to the linker, and generates a linker script that specifies the same order for initcalls that we would have without LTO. With LTO enabled, the script is called in link-vmlinux.sh through jobserver-exec to limit the number of jobs spawned. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> --- include/linux/init.h | 52 +++++- scripts/Makefile.lib | 6 +- scripts/generate_initcall_order.pl | 270 +++++++++++++++++++++++++++++ scripts/link-vmlinux.sh | 15 ++ 4 files changed, 334 insertions(+), 9 deletions(-) create mode 100755 scripts/generate_initcall_order.pl diff --git a/include/linux/init.h b/include/linux/init.h index 212fc9e2f691..af638cd6dd52 100644 --- a/include/linux/init.h +++ b/include/linux/init.h @@ -184,19 +184,57 @@ extern bool initcall_debug; * as KEEP() in the linker script. */ +/* Format: <modname>__<counter>_<line>_<fn> */ +#define __initcall_id(fn) \ + __PASTE(__KBUILD_MODNAME, \ + __PASTE(__, \ + __PASTE(__COUNTER__, \ + __PASTE(_, \ + __PASTE(__LINE__, \ + __PASTE(_, fn)))))) + +/* Format: __<prefix>__<iid><id> */ +#define __initcall_name(prefix, __iid, id) \ + __PASTE(__, \ + __PASTE(prefix, \ + __PASTE(__, \ + __PASTE(__iid, id)))) + +#ifdef CONFIG_LTO_CLANG +/* + * With LTO, the compiler doesn't necessarily obey link order for + * initcalls.
In order to preserve the correct order, we add each + * variable into its own section and generate a linker script (in + * scripts/link-vmlinux.sh) to specify the order of the sections. + */ +#define __initcall_section(__sec, __iid) \ + #__sec ".init.." #__iid +#else +#define __initcall_section(__sec, __iid) \ + #__sec ".init" +#endif + #ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS -#define ___define_initcall(fn, id, __sec) \ +#define ____define_initcall(fn, __name, __sec) \ __ADDRESSABLE(fn) \ - asm(".section \"" #__sec ".init\", \"a\" \n" \ - "__initcall_" #fn #id ": \n" \ + asm(".section \"" __sec "\", \"a\" \n" \ + __stringify(__name) ": \n" \ ".long " #fn " - . \n" \ ".previous \n"); #else -#define ___define_initcall(fn, id, __sec) \ - static initcall_t __initcall_##fn##id __used \ - __attribute__((__section__(#__sec ".init"))) = fn; +#define ____define_initcall(fn, __name, __sec) \ + static initcall_t __name __used \ + __attribute__((__section__(__sec))) = fn; #endif +#define __unique_initcall(fn, id, __sec, __iid) \ + ____define_initcall(fn, \ + __initcall_name(initcall, __iid, id), \ + __initcall_section(__sec, __iid)) + +#define ___define_initcall(fn, id, __sec) \ + __unique_initcall(fn, id, __sec, __initcall_id(fn)) + #define __define_initcall(fn, id) ___define_initcall(fn, id, .initcall##id) /* @@ -236,7 +274,7 @@ extern bool initcall_debug; #define __exitcall(fn) \ static exitcall_t __exitcall_##fn __exit_call = fn -#define console_initcall(fn) ___define_initcall(fn,, .con_initcall) +#define console_initcall(fn) ___define_initcall(fn, con, .con_initcall) struct obs_kernel_param { const char *str; diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib index ecb97c9f5feb..1142de6a4161 100644 --- a/scripts/Makefile.lib +++ b/scripts/Makefile.lib @@ -117,9 +117,11 @@ target-stem = $(basename $(patsubst $(obj)/%,%,$@)) # These flags are needed for modversions and compiling, so we define them here # $(modname_flags) defines KBUILD_MODNAME as the name of the 
module it will # end up in (or would, if it gets compiled in) -name-fix = $(call stringify,$(subst $(comma),_,$(subst -,_,$1))) +name-fix-token = $(subst $(comma),_,$(subst -,_,$1)) +name-fix = $(call stringify,$(call name-fix-token,$1)) basename_flags = -DKBUILD_BASENAME=$(call name-fix,$(basetarget)) -modname_flags = -DKBUILD_MODNAME=$(call name-fix,$(modname)) +modname_flags = -DKBUILD_MODNAME=$(call name-fix,$(modname)) \ + -D__KBUILD_MODNAME=kmod_$(call name-fix-token,$(modname)) modfile_flags = -DKBUILD_MODFILE=$(call stringify,$(modfile)) _c_flags = $(filter-out $(CFLAGS_REMOVE_$(target-stem).o), \ diff --git a/scripts/generate_initcall_order.pl b/scripts/generate_initcall_order.pl new file mode 100755 index 000000000000..1a88d3f1b913 --- /dev/null +++ b/scripts/generate_initcall_order.pl @@ -0,0 +1,270 @@ +#!/usr/bin/env perl +# SPDX-License-Identifier: GPL-2.0 +# +# Generates a linker script that specifies the correct initcall order. +# +# Copyright (C) 2019 Google LLC + +use strict; +use warnings; +use IO::Handle; +use IO::Select; +use POSIX ":sys_wait_h"; + +my $nm = $ENV{'NM'} || die "$0: ERROR: NM not set?"; +my $objtree = $ENV{'objtree'} || '.'; + +## currently active child processes +my $jobs = {}; # child process pid -> file handle +## results from child processes +my $results = {}; # object index -> [ { level, secname }, ... 
] + +## reads _NPROCESSORS_ONLN to determine the maximum number of processes to +## start +sub get_online_processors { + open(my $fh, "getconf _NPROCESSORS_ONLN 2>/dev/null |") + or die "$0: ERROR: failed to execute getconf: $!"; + my $procs = <$fh>; + close($fh); + + if (!($procs =~ /^\d+$/)) { + return 1; + } + + return int($procs); +} + +## writes results to the parent process +## format: <file index> <initcall level> <base initcall section name> +sub write_results { + my ($index, $initcalls) = @_; + + # sort by the counter value to ensure the order of initcalls within + # each object file is correct + foreach my $counter (sort { $a <=> $b } keys(%{$initcalls})) { + my $level = $initcalls->{$counter}->{'level'}; + + # section name for the initcall function + my $secname = $initcalls->{$counter}->{'module'} . '__' . + $counter . '_' . + $initcalls->{$counter}->{'line'} . '_' . + $initcalls->{$counter}->{'function'}; + + print "$index $level $secname\n"; + } +} + +## reads a result line from a child process and adds it to the $results array +sub read_results{ + my ($fh) = @_; + + # each child prints out a full line w/ autoflush and exits after the + # last line, so even if buffered I/O blocks here, it shouldn't block + # very long + my $data = <$fh>; + + if (!defined($data)) { + return 0; + } + + chomp($data); + + my ($index, $level, $secname) = $data =~ + /^(\d+)\ ([^\ ]+)\ (.*)$/; + + if (!defined($index) || + !defined($level) || + !defined($secname)) { + die "$0: ERROR: child process returned invalid data: $data\n"; + } + + $index = int($index); + + if (!exists($results->{$index})) { + $results->{$index} = []; + } + + push (@{$results->{$index}}, { + 'level' => $level, + 'secname' => $secname + }); + + return 1; +} + +## finds initcalls from an object file or all object files in an archive, and +## writes results back to the parent process +sub find_initcalls { + my ($index, $file) = @_; + + die "$0: ERROR: file $file doesn't exist?" if (! 
-f $file); + + open(my $fh, "\"$nm\" --defined-only \"$file\" 2>/dev/null |") + or die "$0: ERROR: failed to execute \"$nm\": $!"; + + my $initcalls = {}; + + while (<$fh>) { + chomp; + + # check for the start of a new object file (if processing an + # archive) + my ($path)= $_ =~ /^(.+)\:$/; + + if (defined($path)) { + write_results($index, $initcalls); + $initcalls = {}; + next; + } + + # look for an initcall + my ($module, $counter, $line, $symbol) = $_ =~ + /[a-z]\s+__initcall__(\S*)__(\d+)_(\d+)_(.*)$/; + + if (!defined($module)) { + $module = '' + } + + if (!defined($counter) || + !defined($line) || + !defined($symbol)) { + next; + } + + # parse initcall level + my ($function, $level) = $symbol =~ + /^(.*)((early|rootfs|con|[0-9])s?)$/; + + die "$0: ERROR: invalid initcall name $symbol in $file($path)" + if (!defined($function) || !defined($level)); + + $initcalls->{$counter} = { + 'module' => $module, + 'line' => $line, + 'function' => $function, + 'level' => $level, + }; + } + + close($fh); + write_results($index, $initcalls); +} + +## waits for any child process to complete, reads the results, and adds them to +## the $results array for later processing +sub wait_for_results { + my ($select) = @_; + + my $pid = 0; + do { + # unblock children that may have a full write buffer + foreach my $fh ($select->can_read(0)) { + read_results($fh); + } + + # check for children that have exited, read the remaining data + # from them, and clean up + $pid = waitpid(-1, WNOHANG); + if ($pid > 0) { + if (!exists($jobs->{$pid})) { + next; + } + + my $fh = $jobs->{$pid}; + $select->remove($fh); + + while (read_results($fh)) { + # until eof + } + + close($fh); + delete($jobs->{$pid}); + } + } while ($pid > 0); +} + +## forks a child to process each file passed in the command line and collects +## the results +sub process_files { + my $index = 0; + my $njobs = $ENV{'PARALLELISM'} || get_online_processors(); + my $select = IO::Select->new(); + + while (my $file = shift(@ARGV)) 
{ + # fork a child process and read it's stdout + my $pid = open(my $fh, '-|'); + + if (!defined($pid)) { + die "$0: ERROR: failed to fork: $!"; + } elsif ($pid) { + # save the child process pid and the file handle + $select->add($fh); + $jobs->{$pid} = $fh; + } else { + # in the child process + STDOUT->autoflush(1); + find_initcalls($index, "$objtree/$file"); + exit; + } + + $index++; + + # limit the number of children to $njobs + if (scalar(keys(%{$jobs})) >= $njobs) { + wait_for_results($select); + } + } + + # wait for the remaining children to complete + while (scalar(keys(%{$jobs})) > 0) { + wait_for_results($select); + } +} + +sub generate_initcall_lds() { + process_files(); + + my $sections = {}; # level -> [ secname, ...] + + # sort results to retain link order and split to sections per + # initcall level + foreach my $index (sort { $a <=> $b } keys(%{$results})) { + foreach my $result (@{$results->{$index}}) { + my $level = $result->{'level'}; + + if (!exists($sections->{$level})) { + $sections->{$level} = []; + } + + push(@{$sections->{$level}}, $result->{'secname'}); + } + } + + die "$0: ERROR: no initcalls?" if (!keys(%{$sections})); + + # print out a linker script that defines the order of initcalls for + # each level + print "SECTIONS {\n"; + + foreach my $level (sort(keys(%{$sections}))) { + my $section; + + if ($level eq 'con') { + $section = '.con_initcall.init'; + } else { + $section = ".initcall${level}.init"; + } + + print "\t${section} : {\n"; + + foreach my $secname (@{$sections->{$level}}) { + print "\t\t*(${section}..${secname}) ;\n"; + } + + print "\t}\n"; + } + + print "}\n"; +} + +generate_initcall_lds(); diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index 7f4d19271180..cd649dc21c04 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -43,6 +43,17 @@ info() fi } +# Generate a linker script to ensure correct ordering of initcalls. 
+gen_initcalls() +{ + info GEN .tmp_initcalls.lds + + ${PYTHON} ${srctree}/scripts/jobserver-exec \ + ${PERL} ${srctree}/scripts/generate_initcall_order.pl \ + ${KBUILD_VMLINUX_OBJS} ${KBUILD_VMLINUX_LIBS} \ + > .tmp_initcalls.lds +} + # If CONFIG_LTO_CLANG is selected, collect generated symbol versions into # .tmp_symversions.lds gen_symversions() @@ -72,6 +83,9 @@ modpost_link() --end-group" if [ -n "${CONFIG_LTO_CLANG}" ]; then + gen_initcalls + lds="-T .tmp_initcalls.lds" + if [ -n "${CONFIG_MODVERSIONS}" ]; then gen_symversions lds="${lds} -T .tmp_symversions.lds" @@ -284,6 +298,7 @@ cleanup() { rm -f .btf.* rm -f .tmp_System.map + rm -f .tmp_initcalls.lds rm -f .tmp_symversions.lds rm -f .tmp_vmlinux* rm -f System.map -- 2.28.0.1011.ga647a8990f-goog
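The last step of the initcall ordering patch, turning "<file index> <initcall level> <section name>" records into a linker script, can be sketched in a few lines of shell. This is not the Perl script from the patch: gen_initcall_lds is a hypothetical helper, the record values in the usage line are invented, and the sketch assumes that records for one object file already arrive in counter order (as write_results emits them) and that sort supports the -s (stable) extension found in GNU and BSD sort.

```sh
#!/bin/sh
# Sketch only: reads "<file index> <level> <secname>" records on stdin
# and prints a SECTIONS block in the same shape as the patch's output.
gen_initcall_lds()
{
	# Stable numeric sort by file index retains the link order while
	# keeping the per-file order of records untouched.
	records=$(sort -s -n -k1,1)

	echo "SECTIONS {"

	# One output section per initcall level, levels in string order.
	for level in $(printf '%s\n' "$records" | awk '{print $2}' | LC_ALL=C sort -u); do
		if [ "$level" = "con" ]; then
			section=".con_initcall.init"
		else
			section=".initcall${level}.init"
		fi

		printf '\t%s : {\n' "$section"
		printf '%s\n' "$records" | awk -v lvl="$level" -v sec="$section" \
			'$2 == lvl { printf "\t\t*(%s..%s) ;\n", sec, $3 }'
		printf '\t}\n'
	done

	echo "}"
}

# Usage with made-up records: file 1 is listed first but must end up
# after file 0 within level 6.
printf '%s\n' '1 6 b' '0 early a_early' '0 6 a_first' '0 6 a_second' |
	gen_initcall_lds
```

The point of the sketch is the ordering rule: within each level, input sections are listed by file index first and by their order within the file second, which is the order the linker would have used without LTO.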
With LTO, the compiler can rename static functions to avoid global naming collisions. As initcall functions are typically static, renaming can break references to them in inline assembly. This change adds a global stub with a stable name for each initcall to fix the issue when PREL32 relocations are used. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> --- include/linux/init.h | 31 +++++++++++++++++++++++++++---- 1 file changed, 27 insertions(+), 4 deletions(-) diff --git a/include/linux/init.h b/include/linux/init.h index af638cd6dd52..cea63f7e7705 100644 --- a/include/linux/init.h +++ b/include/linux/init.h @@ -209,26 +209,49 @@ extern bool initcall_debug; */ #define __initcall_section(__sec, __iid) \ #__sec ".init.." #__iid + +/* + * With LTO, the compiler can rename static functions to avoid + * global naming collisions. We use a global stub function for + * initcalls to create a stable symbol name whose address can be + * taken in inline assembly when PREL32 relocations are used. + */ +#define __initcall_stub(fn, __iid, id) \ + __initcall_name(initstub, __iid, id) + +#define __define_initcall_stub(__stub, fn) \ + int __init __stub(void); \ + int __init __stub(void) \ + { \ + return fn(); \ + } \ + __ADDRESSABLE(__stub) #else #define __initcall_section(__sec, __iid) \ #__sec ".init" + +#define __initcall_stub(fn, __iid, id) fn + +#define __define_initcall_stub(__stub, fn) \ + __ADDRESSABLE(fn) #endif #ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS -#define ____define_initcall(fn, __name, __sec) \ - __ADDRESSABLE(fn) \ +#define ____define_initcall(fn, __stub, __name, __sec) \ + __define_initcall_stub(__stub, fn) \ asm(".section \"" __sec "\", \"a\" \n" \ __stringify(__name) ": \n" \ - ".long " #fn " - . \n" \ + ".long " __stringify(__stub) " - . 
\n" \ ".previous \n"); #else -#define ____define_initcall(fn, __name, __sec) \ +#define ____define_initcall(fn, __unused, __name, __sec) \ static initcall_t __name __used \ __attribute__((__section__(__sec))) = fn; #endif #define __unique_initcall(fn, id, __sec, __iid) \ ____define_initcall(fn, \ + __initcall_stub(fn, __iid, id), \ __initcall_name(initcall, __iid, id), \ __initcall_section(__sec, __iid)) -- 2.28.0.1011.ga647a8990f-goog
With Clang's Link Time Optimization (LTO), the compiler can rename static functions to avoid global naming collisions. As PCI fixup functions are typically static, renaming can break references to them in inline assembly. This change adds a global stub to DECLARE_PCI_FIXUP_SECTION to fix the issue when PREL32 relocations are used. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> --- include/linux/pci.h | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/include/linux/pci.h b/include/linux/pci.h index 835530605c0d..4e64421981c7 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1909,19 +1909,28 @@ enum pci_fixup_pass { }; #ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS -#define __DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \ - class_shift, hook) \ - __ADDRESSABLE(hook) \ +#define ___DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \ + class_shift, hook, stub) \ + void stub(struct pci_dev *dev); \ + void stub(struct pci_dev *dev) \ + { \ + hook(dev); \ + } \ asm(".section " #sec ", \"a\" \n" \ ".balign 16 \n" \ ".short " #vendor ", " #device " \n" \ ".long " #class ", " #class_shift " \n" \ - ".long " #hook " - . \n" \ + ".long " #stub " - . \n" \ ".previous \n"); + +#define __DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \ + class_shift, hook, stub) \ + ___DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \ + class_shift, hook, stub) #define DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \ class_shift, hook) \ __DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \ - class_shift, hook) + class_shift, hook, __UNIQUE_ID(hook)) #else /* Anonymous variables would be nice... */ #define DECLARE_PCI_FIXUP_SECTION(section, name, vendor, device, class, \ -- 2.28.0.1011.ga647a8990f-goog
With LTO, everything is compiled into LLVM bitcode, so we have to link each module into native code before modpost. Kbuild uses the .lto.o suffix for these files, which also ends up in module information. This change strips the unnecessary .lto suffix from the module name. Suggested-by: Bill Wendling <morbo@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> --- scripts/mod/modpost.c | 16 +++++++--------- scripts/mod/modpost.h | 9 +++++++++ scripts/mod/sumversion.c | 6 +++++- 3 files changed, 21 insertions(+), 10 deletions(-) diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c index 69341b36f271..5a329df55cc3 100644 --- a/scripts/mod/modpost.c +++ b/scripts/mod/modpost.c @@ -17,7 +17,6 @@ #include <ctype.h> #include <string.h> #include <limits.h> -#include <stdbool.h> #include <errno.h> #include "modpost.h" #include "../../include/linux/license.h" @@ -80,14 +79,6 @@ modpost_log(enum loglevel loglevel, const char *fmt, ...) exit(1); } -static inline bool strends(const char *str, const char *postfix) -{ - if (strlen(str) < strlen(postfix)) - return false; - - return strcmp(str + strlen(str) - strlen(postfix), postfix) == 0; -} - void *do_nofail(void *ptr, const char *expr) { if (!ptr) @@ -1984,6 +1975,10 @@ static char *remove_dot(char *s) size_t m = strspn(s + n + 1, "0123456789"); if (m && (s[n + m] == '.' 
|| s[n + m] == 0)) s[n] = 0; + + /* strip trailing .lto */ + if (strends(s, ".lto")) + s[strlen(s) - 4] = '\0'; } return s; } @@ -2007,6 +2002,9 @@ static void read_symbols(const char *modname) /* strip trailing .o */ tmp = NOFAIL(strdup(modname)); tmp[strlen(tmp) - 2] = '\0'; + /* strip trailing .lto */ + if (strends(tmp, ".lto")) + tmp[strlen(tmp) - 4] = '\0'; mod = new_module(tmp); free(tmp); } diff --git a/scripts/mod/modpost.h b/scripts/mod/modpost.h index 3aa052722233..fab30d201f9e 100644 --- a/scripts/mod/modpost.h +++ b/scripts/mod/modpost.h @@ -2,6 +2,7 @@ #include <stdio.h> #include <stdlib.h> #include <stdarg.h> +#include <stdbool.h> #include <string.h> #include <sys/types.h> #include <sys/stat.h> @@ -180,6 +181,14 @@ static inline unsigned int get_secindex(const struct elf_info *info, return info->symtab_shndx_start[sym - info->symtab_start]; } +static inline bool strends(const char *str, const char *postfix) +{ + if (strlen(str) < strlen(postfix)) + return false; + + return strcmp(str + strlen(str) - strlen(postfix), postfix) == 0; +} + /* file2alias.c */ extern unsigned int cross_build; void handle_moddevtable(struct module *mod, struct elf_info *info, diff --git a/scripts/mod/sumversion.c b/scripts/mod/sumversion.c index d587f40f1117..760e6baa7eda 100644 --- a/scripts/mod/sumversion.c +++ b/scripts/mod/sumversion.c @@ -391,10 +391,14 @@ void get_src_version(const char *modname, char sum[], unsigned sumlen) struct md4_ctx md; char *fname; char filelist[PATH_MAX + 1]; + int postfix_len = 1; + + if (strends(modname, ".lto.o")) + postfix_len = 5; /* objects for a module are listed in the first line of *.mod file. */ snprintf(filelist, sizeof(filelist), "%.*smod", - (int)strlen(modname) - 1, modname); + (int)strlen(modname) - postfix_len, modname); buf = read_text_file(filelist); -- 2.28.0.1011.ga647a8990f-goog
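The suffix handling in the modpost patch can be summarized with a short shell sketch. Both helpers are hypothetical and only mirror the string arithmetic visible in the diff (read_symbols() and get_src_version()); the paths in the usage lines are invented. One difference to keep in mind: the C code in read_symbols() drops the last two characters unconditionally, while the sketch strips ".o" only when it is present.

```sh
#!/bin/sh
# Mirrors read_symbols(): drop the trailing ".o", then an optional ".lto".
strip_modname()
{
	name=${1%.o}
	name=${name%.lto}
	printf '%s\n' "$name"
}

# Mirrors get_src_version(): foo.o and foo.lto.o both map to foo.mod.
mod_file_for()
{
	case "$1" in
	*.lto.o) printf '%s\n' "${1%.lto.o}.mod" ;;
	*)       printf '%s\n' "${1%.o}.mod" ;;
	esac
}

strip_modname drivers/foo.lto.o
mod_file_for drivers/foo.lto.o
```

In the C version, postfix_len is 5 rather than 6 for ".lto.o" because the format string "%.*smod" relies on the dot before "lto" being kept, so the result is still "foo." followed by "mod".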
With CONFIG_LTO_CLANG, clang generates LLVM IR instead of ELF object files. As empty.o is used for probing target properties, disable LTO for it to produce an object file instead. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> --- scripts/mod/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/scripts/mod/Makefile b/scripts/mod/Makefile index 78071681d924..c9e38ad937fd 100644 --- a/scripts/mod/Makefile +++ b/scripts/mod/Makefile @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 OBJECT_FILES_NON_STANDARD := y +CFLAGS_REMOVE_empty.o += $(CC_FLAGS_LTO) hostprogs-always-y += modpost mk_elfconfig always-y += empty.o -- 2.28.0.1011.ga647a8990f-goog
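When checking whether removing CC_FLAGS_LTO had the intended effect for a given object, it can help to look at the file's magic bytes. The following is a minimal sketch, not part of the series: obj_kind is a hypothetical helper, and it assumes raw LLVM bitcode files (magic 'B' 'C' 0xC0 0xDE) rather than wrapped bitcode, which uses a different header.

```sh
#!/bin/sh
# Classify an object file by its first four bytes.
obj_kind()
{
	# od prints the bytes as hex; tr removes spaces and the newline
	magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')

	case "$magic" in
	7f454c46) echo elf ;;      # 0x7f 'E' 'L' 'F'
	4243c0de) echo bitcode ;;  # 'B' 'C' 0xc0 0xde
	*)        echo unknown ;;
	esac
}

# Usage, with the path passed on the command line if one is given:
if [ -n "$1" ]; then
	obj_kind "$1"
fi
```

With this change applied, running the helper on scripts/mod/empty.o in an LTO build would be expected to report elf, while most other objects in the tree report bitcode.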
With CONFIG_LTO_CLANG, we produce LLVM bitcode instead of ELF object files. Since LTO is not really needed here and the Makefile assumes we produce an object file, disable LTO for libstub. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> --- drivers/firmware/efi/libstub/Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile index 0c911e391d75..e927876f3a05 100644 --- a/drivers/firmware/efi/libstub/Makefile +++ b/drivers/firmware/efi/libstub/Makefile @@ -36,6 +36,8 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \ # remove SCS flags from all objects in this directory KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS)) +# disable LTO +KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS)) GCOV_PROFILE := n # Sanitizer runtimes are unavailable and cannot be linked here. -- 2.28.0.1011.ga647a8990f-goog
Disable LTO for rodata.o to allow objcopy to be used to manipulate sections. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Kees Cook <keescook@chromium.org> --- drivers/misc/lkdtm/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/misc/lkdtm/Makefile b/drivers/misc/lkdtm/Makefile index c70b3822013f..dd4c936d4d73 100644 --- a/drivers/misc/lkdtm/Makefile +++ b/drivers/misc/lkdtm/Makefile @@ -13,6 +13,7 @@ lkdtm-$(CONFIG_LKDTM) += cfi.o KASAN_SANITIZE_stackleak.o := n KCOV_INSTRUMENT_rodata.o := n +CFLAGS_REMOVE_rodata.o += $(CC_FLAGS_LTO) OBJCOPYFLAGS := OBJCOPYFLAGS_rodata_objcopy.o := \ -- 2.28.0.1011.ga647a8990f-goog
Disable LTO for the vDSO by filtering out CC_FLAGS_LTO, as there's no point in using link-time optimization for the small amount of C code. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> --- arch/arm64/kernel/vdso/Makefile | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile index e836e300440f..aa47070a3ccf 100644 --- a/arch/arm64/kernel/vdso/Makefile +++ b/arch/arm64/kernel/vdso/Makefile @@ -30,7 +30,8 @@ ldflags-y := -shared -nostdlib -soname=linux-vdso.so.1 --hash-style=sysv \ ccflags-y := -fno-common -fno-builtin -fno-stack-protector -ffixed-x18 ccflags-y += -DDISABLE_BRANCH_PROFILING -CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) $(GCC_PLUGINS_CFLAGS) +CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) $(GCC_PLUGINS_CFLAGS) \ + $(CC_FLAGS_LTO) KASAN_SANITIZE := n UBSAN_SANITIZE := n OBJECT_FILES_NON_STANDARD := y -- 2.28.0.1011.ga647a8990f-goog
We use objcopy to manipulate ELF binaries for the nVHE code, which fails with LTO as the compiler produces LLVM bitcode instead. Disable LTO for this code to allow objcopy to be used. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> --- arch/arm64/kvm/hyp/nvhe/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile index aef76487edc2..c903c8f31280 100644 --- a/arch/arm64/kvm/hyp/nvhe/Makefile +++ b/arch/arm64/kvm/hyp/nvhe/Makefile @@ -45,9 +45,9 @@ quiet_cmd_hypcopy = HYPCOPY $@ --rename-section=.text=.hyp.text \ $< $@ -# Remove ftrace and Shadow Call Stack CFLAGS. +# Remove ftrace, LTO, and Shadow Call Stack CFLAGS. # This is equivalent to the 'notrace' and '__noscs' annotations. -KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS), $(KBUILD_CFLAGS)) +KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_LTO) $(CC_FLAGS_SCS), $(KBUILD_CFLAGS)) # KVM nVHE code is run at a different exception code with a different map, so # compiler instrumentation that inserts callbacks or checks into the code may -- 2.28.0.1011.ga647a8990f-goog
DYNAMIC_FTRACE_WITH_REGS uses -fpatchable-function-entry, which makes running recordmcount unnecessary as there are no mcount calls in object files, and __mcount_loc doesn't need to be generated. While there's normally no harm in running recordmcount even when it's not strictly needed, this won't work with LTO as we have LLVM bitcode instead of ELF objects. This change selects FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY, which disables recordmcount when patchable function entries are used instead. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> --- arch/arm64/Kconfig | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 6d232837cbee..ad522b021f35 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -155,6 +155,8 @@ config ARM64 select HAVE_DYNAMIC_FTRACE select HAVE_DYNAMIC_FTRACE_WITH_REGS \ if $(cc-option,-fpatchable-function-entry=2) + select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \ + if DYNAMIC_FTRACE_WITH_REGS select HAVE_EFFICIENT_UNALIGNED_ACCESS select HAVE_FAST_GUP select HAVE_FTRACE_MCOUNT_RECORD -- 2.28.0.1011.ga647a8990f-goog
Allow CONFIG_LTO_CLANG and CONFIG_THINLTO to be enabled. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> --- arch/arm64/Kconfig | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index ad522b021f35..7016d193864f 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -72,6 +72,8 @@ config ARM64 select ARCH_USE_SYM_ANNOTATIONS select ARCH_SUPPORTS_MEMORY_FAILURE select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK + select ARCH_SUPPORTS_LTO_CLANG + select ARCH_SUPPORTS_THINLTO select ARCH_SUPPORTS_ATOMIC_RMW select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 && (GCC_VERSION >= 50000 || CC_IS_CLANG) select ARCH_SUPPORTS_NUMA_BALANCING -- 2.28.0.1011.ga647a8990f-goog
Running objtool --vmlinux --duplicate on vmlinux.o produces a few warnings about indirect jumps with retpoline: vmlinux.o: warning: objtool: wakeup_long64()+0x61: indirect jump found in RETPOLINE build ... This change adds ANNOTATE_RETPOLINE_SAFE annotations to the jumps in assembly code to stop the warnings. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> --- arch/x86/kernel/acpi/wakeup_64.S | 2 ++ arch/x86/platform/pvh/head.S | 2 ++ arch/x86/power/hibernate_asm_64.S | 3 +++ 3 files changed, 7 insertions(+) diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S index c8daa92f38dc..041e79c4e195 100644 --- a/arch/x86/kernel/acpi/wakeup_64.S +++ b/arch/x86/kernel/acpi/wakeup_64.S @@ -7,6 +7,7 @@ #include <asm/msr.h> #include <asm/asm-offsets.h> #include <asm/frame.h> +#include <asm/nospec-branch.h> # Copyright 2003 Pavel Machek <pavel@suse.cz @@ -39,6 +40,7 @@ SYM_FUNC_START(wakeup_long64) movq saved_rbp, %rbp movq saved_rip, %rax + ANNOTATE_RETPOLINE_SAFE jmp *%rax SYM_FUNC_END(wakeup_long64) diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S index 43b4d864817e..640b79cc64b8 100644 --- a/arch/x86/platform/pvh/head.S +++ b/arch/x86/platform/pvh/head.S @@ -15,6 +15,7 @@ #include <asm/asm.h> #include <asm/boot.h> #include <asm/processor-flags.h> +#include <asm/nospec-branch.h> #include <asm/msr.h> #include <xen/interface/elfnote.h> @@ -105,6 +106,7 @@ SYM_CODE_START_LOCAL(pvh_start_xen) /* startup_64 expects boot_params in %rsi. 
*/ mov $_pa(pvh_bootparams), %rsi mov $_pa(startup_64), %rax + ANNOTATE_RETPOLINE_SAFE jmp *%rax #else /* CONFIG_X86_64 */ diff --git a/arch/x86/power/hibernate_asm_64.S b/arch/x86/power/hibernate_asm_64.S index 7918b8415f13..715509d94fa3 100644 --- a/arch/x86/power/hibernate_asm_64.S +++ b/arch/x86/power/hibernate_asm_64.S @@ -21,6 +21,7 @@ #include <asm/asm-offsets.h> #include <asm/processor-flags.h> #include <asm/frame.h> +#include <asm/nospec-branch.h> SYM_FUNC_START(swsusp_arch_suspend) movq $saved_context, %rax @@ -66,6 +67,7 @@ SYM_CODE_START(restore_image) /* jump to relocated restore code */ movq relocated_restore_code(%rip), %rcx + ANNOTATE_RETPOLINE_SAFE jmpq *%rcx SYM_CODE_END(restore_image) @@ -97,6 +99,7 @@ SYM_CODE_START(core_restore_code) .Ldone: /* jump to the restore_registers address from the image header */ + ANNOTATE_RETPOLINE_SAFE jmpq *%r8 SYM_CODE_END(core_restore_code) -- 2.28.0.1011.ga647a8990f-goog
Disable LTO for the vDSO. Note that while we could use Clang's LTO for the 64-bit vDSO, it won't add noticeable benefit for the small amount of C code. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> --- arch/x86/entry/vdso/Makefile | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile index ecc27018ae13..9b742f21d2db 100644 --- a/arch/x86/entry/vdso/Makefile +++ b/arch/x86/entry/vdso/Makefile @@ -90,7 +90,7 @@ ifneq ($(RETPOLINE_VDSO_CFLAGS),) endif endif -$(vobjs): KBUILD_CFLAGS := $(filter-out $(GCC_PLUGINS_CFLAGS) $(RETPOLINE_CFLAGS),$(KBUILD_CFLAGS)) $(CFL) +$(vobjs): KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO) $(GCC_PLUGINS_CFLAGS) $(RETPOLINE_CFLAGS),$(KBUILD_CFLAGS)) $(CFL) # # vDSO code runs in userspace and -pg doesn't help with profiling anyway. @@ -148,6 +148,7 @@ KBUILD_CFLAGS_32 := $(filter-out -fno-pic,$(KBUILD_CFLAGS_32)) KBUILD_CFLAGS_32 := $(filter-out -mfentry,$(KBUILD_CFLAGS_32)) KBUILD_CFLAGS_32 := $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS_32)) KBUILD_CFLAGS_32 := $(filter-out $(RETPOLINE_CFLAGS),$(KBUILD_CFLAGS_32)) +KBUILD_CFLAGS_32 := $(filter-out $(CC_FLAGS_LTO),$(KBUILD_CFLAGS_32)) KBUILD_CFLAGS_32 += -m32 -msoft-float -mregparm=0 -fpic KBUILD_CFLAGS_32 += -fno-stack-protector KBUILD_CFLAGS_32 += $(call cc-option, -foptimize-sibling-calls) -- 2.28.0.1011.ga647a8990f-goog
Clang incorrectly inlines functions with differing stack protector
attributes, which breaks __restore_processor_state() that relies on
stack protector being disabled. This change disables LTO for cpu.c
to work around the bug.

Link: https://bugs.llvm.org/show_bug.cgi?id=47479
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 arch/x86/power/Makefile | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/power/Makefile b/arch/x86/power/Makefile
index 6907b523e856..5f711a441623 100644
--- a/arch/x86/power/Makefile
+++ b/arch/x86/power/Makefile
@@ -5,5 +5,9 @@ OBJECT_FILES_NON_STANDARD_hibernate_asm_$(BITS).o := y
 # itself be stack-protected
 CFLAGS_cpu.o := -fno-stack-protector
 
+# Clang may incorrectly inline functions with stack protector enabled into
+# __restore_processor_state(): https://bugs.llvm.org/show_bug.cgi?id=47479
+CFLAGS_REMOVE_cpu.o := $(CC_FLAGS_LTO)
+
 obj-$(CONFIG_PM_SLEEP) += cpu.o
 obj-$(CONFIG_HIBERNATION) += hibernate_$(BITS).o hibernate_asm_$(BITS).o hibernate.o
-- 
2.28.0.1011.ga647a8990f-goog
Pass code model and stack alignment to the linker as these are not
stored in LLVM bitcode, and allow both CONFIG_LTO_CLANG and
CONFIG_THINLTO to be selected.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/Kconfig  | 2 ++
 arch/x86/Makefile | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6d67646153bc..c579d7000b67 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -92,6 +92,8 @@ config X86
 	select ARCH_SUPPORTS_ACPI
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
+	select ARCH_SUPPORTS_LTO_CLANG if X86_64
+	select ARCH_SUPPORTS_THINLTO if X86_64
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 154259f18b8b..774a7debb27c 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -173,6 +173,11 @@ ifeq ($(ACCUMULATE_OUTGOING_ARGS), 1)
 	KBUILD_CFLAGS += $(call cc-option,-maccumulate-outgoing-args,)
 endif
 
+ifdef CONFIG_LTO_CLANG
+KBUILD_LDFLAGS += -plugin-opt=-code-model=kernel \
+		  -plugin-opt=-stack-alignment=$(if $(CONFIG_X86_32),4,8)
+endif
+
 # Workaround for a gcc prelease that unfortunately was shipped in a suse release
 KBUILD_CFLAGS += -Wno-sign-compare
 #
-- 
2.28.0.1011.ga647a8990f-goog
On Fri, Oct 9, 2020 at 6:13 PM 'Sami Tolvanen' via Clang Built Linux
<clang-built-linux@googlegroups.com> wrote:
>
> This patch series adds support for building x86_64 and arm64 kernels
> with Clang's Link Time Optimization (LTO).
>
[...]
> Changes in v5:
>
> - Rebased on top of tip/master.
>

What are the plans to get this into mainline?

Linux v5.10 :-) too early - needs more review/testing?

Will clang-cfi be based on this, too?

[...]
> - Included Makefile.lib into Makefile.modpost for ld_flags. Thanks
>   to Sedat for pointing this out.
>

That was a long way to detect this as I had very big Debian Linux
debug packages generated with CONFIG_DEBUG_INFO_COMPRESSED=y.

Thanks for v5 of clang-lto.

- Sedat -

[1] https://github.com/ClangBuiltLinux/linux/issues/1086#issuecomment-705754002
On Fri, Oct 09, 2020 at 06:30:24PM +0200, Sedat Dilek wrote:
> Will clang-cfi be based on this, too?

At least until the prerequisite patches are merged into mainline. In
the meanwhile, I have a CFI tree based on this series here:

https://github.com/samitolvanen/linux/tree/tip/clang-lto

Sami
On Fri, Oct 09, 2020 at 03:35:12PM -0400, Steven Rostedt wrote:
> On Fri, 9 Oct 2020 09:13:09 -0700
> Sami Tolvanen <samitolvanen@google.com> wrote:
>
> > This patch series adds support for building x86_64 and arm64 kernels
> > with Clang's Link Time Optimization (LTO).
> >
> > In addition to performance, the primary motivation for LTO is
> > to allow Clang's Control-Flow Integrity (CFI) to be used in the
> > kernel. Google has shipped millions of Pixel devices running three
> > major kernel versions with LTO+CFI since 2018.
> >
> > Most of the patches are build system changes for handling LLVM
> > bitcode, which Clang produces with LTO instead of ELF object files,
> > postponing ELF processing until a later stage, and ensuring initcall
> > ordering.
> >
> > Note that this version is based on tip/master to reduce the number
> > of prerequisite patches, and to make it easier to manage changes to
> > objtool. Patch 1 is from Masahiro's kbuild tree, and while it's not
> > directly related to LTO, it makes the module linker script changes
> > cleaner.
> >
>
> I went to test this, but it appears that the latest tip/master fails to
> build for me. This error is on tip/master, before I even applied a single
> patch.
>
> (config attached)

Ah yes, X86_DECODER_SELFTEST seems to be broken in tip/master. If you
prefer, I have these patches on top of mainline here:

https://github.com/samitolvanen/linux/tree/clang-lto

Testing your config with LTO on this tree, it does build and boot for
me, although I saw a couple of new objtool warnings, and with LLVM=1,
one warning from llvm-objdump.

Sami
On Fri, 9 Oct 2020 14:05:48 -0700
Sami Tolvanen <samitolvanen@google.com> wrote:

> Ah yes, X86_DECODER_SELFTEST seems to be broken in tip/master. If you
> prefer, I have these patches on top of mainline here:
>
> https://github.com/samitolvanen/linux/tree/clang-lto
>
> Testing your config with LTO on this tree, it does build and boot for
> me, although I saw a couple of new objtool warnings, and with LLVM=1,
> one warning from llvm-objdump.

Thanks, I disabled X86_DECODER_SELFTEST and it now builds.

I forced the objtool mcount logic with the below patch, which produces:

CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_FTRACE_MCOUNT_USE_OBJTOOL=y

But I don't see the __mcount_loc sections being created.

I applied patches 1 - 6.

-- Steve

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 89263210ab26..3042619e21b7 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -606,7 +606,7 @@ config FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
 
 config FTRACE_MCOUNT_USE_CC
 	def_bool y
-	depends on $(cc-option,-mrecord-mcount)
+	depends on $(cc-option,-mrecord-mcount1)
 	depends on !FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
 	depends on FTRACE_MCOUNT_RECORD
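
The Kconfig change above relies on $(cc-option,...) evaluating to "n" when the compiler rejects a flag: the deliberately misspelled -mrecord-mcount1 turns FTRACE_MCOUNT_USE_CC off, and per the output quoted above FTRACE_MCOUNT_USE_OBJTOOL is then selected. A rough shell sketch of that kind of probe follows. It assumes the test boils down to "does the compiler accept this flag on an empty C input"; the names cc_option and mcount_method are hypothetical, this is not the kernel's actual macro, and the other Kconfig dependencies are ignored.

```shell
#!/bin/sh
# Hypothetical stand-in for a cc-option style probe: succeed only if
# the compiler accepts the given flag when compiling an empty C input.
cc_option() {
	# -Werror makes compilers that only warn about unknown flags fail.
	${CC:-cc} -Werror "$1" -S -x c /dev/null -o /dev/null >/dev/null 2>&1
}

# Report which mcount mechanism the probe result points to in this
# simplified model (all other Kconfig dependencies are ignored).
mcount_method() {
	if cc_option "$1"; then
		echo "compiler"
	else
		echo "objtool"
	fi
}
```

For example, running `mcount_method -mrecord-mcount1` with a compiler that rejects the unknown flag would take the "objtool" branch, which mirrors what the one-character Kconfig edit is meant to force.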
On Fri, Oct 9, 2020 at 4:38 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 9 Oct 2020 14:05:48 -0700
> Sami Tolvanen <samitolvanen@google.com> wrote:
>
> > Ah yes, X86_DECODER_SELFTEST seems to be broken in tip/master. If you
> > prefer, I have these patches on top of mainline here:
> >
> > https://github.com/samitolvanen/linux/tree/clang-lto
> >
> > Testing your config with LTO on this tree, it does build and boot for
> > me, although I saw a couple of new objtool warnings, and with LLVM=1,
> > one warning from llvm-objdump.
>
> Thanks, I disabled X86_DECODER_SELFTEST and it now builds.
>
> I forced the objtool mcount logic with the below patch, which produces:
>
> CONFIG_FTRACE_MCOUNT_RECORD=y
> CONFIG_FTRACE_MCOUNT_USE_OBJTOOL=y
>
> But I don't see the __mcount_loc sections being created.
>
> I applied patches 1 - 6.

Patch 6 is missing the part where we actually pass --mcount to
objtool; that change is in patch 11 ("kbuild: lto: postpone objtool").
I'll fix this in v6. In the meantime, please apply patches 1-11 to test
the objtool change. Do you have any thoughts about the approach otherwise?

Sami
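
One way to confirm whether the objtool pass actually ran is to look for the __mcount_loc section in a built object. The following is a minimal sketch, assuming binutils readelf is installed; the helper names has_section and check_mcount_loc are hypothetical, and which object file to inspect is left to the caller.

```shell
#!/bin/sh
# has_section NAME: read `readelf -S -W` output on stdin and succeed
# if a section called NAME is listed.
has_section() {
	# -F matches NAME literally; -w rejects matches embedded in a
	# longer name such as .rela__mcount_loc.
	grep -qwF -- "$1"
}

# check_mcount_loc OBJECT: hypothetical helper that reports whether
# OBJECT contains a __mcount_loc section.
check_mcount_loc() {
	if readelf -S -W "$1" | has_section __mcount_loc; then
		echo "$1: __mcount_loc present"
	else
		echo "$1: __mcount_loc missing"
	fi
}
```

Calling `check_mcount_loc path/to/file.o` on an object built with patches 1-11 applied should then show whether the section was generated.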
On Fri, Oct 09, 2020 at 09:13:34AM -0700, Sami Tolvanen wrote:
> Allow CONFIG_LTO_CLANG and CONFIG_THINLTO to be enabled.
>
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> ---
>  arch/arm64/Kconfig | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index ad522b021f35..7016d193864f 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -72,6 +72,8 @@ config ARM64
>  	select ARCH_USE_SYM_ANNOTATIONS
>  	select ARCH_SUPPORTS_MEMORY_FAILURE
>  	select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
> +	select ARCH_SUPPORTS_LTO_CLANG
> +	select ARCH_SUPPORTS_THINLTO

Please don't enable this for arm64 until we have the dependency stuff sorted
out. I posted patches [1] for this before, but I think they should be part
of this series as they don't make sense on their own.

Will

[1] https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=rwonce/read-barrier-depends
On Mon, Oct 12, 2020 at 09:31:16AM +0100, Will Deacon wrote:
> On Fri, Oct 09, 2020 at 09:13:34AM -0700, Sami Tolvanen wrote:
> > Allow CONFIG_LTO_CLANG and CONFIG_THINLTO to be enabled.
> >
> > Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> > Reviewed-by: Kees Cook <keescook@chromium.org>
> > ---
> >  arch/arm64/Kconfig | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index ad522b021f35..7016d193864f 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -72,6 +72,8 @@ config ARM64
> >  	select ARCH_USE_SYM_ANNOTATIONS
> >  	select ARCH_SUPPORTS_MEMORY_FAILURE
> >  	select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
> > +	select ARCH_SUPPORTS_LTO_CLANG
> > +	select ARCH_SUPPORTS_THINLTO
>
> Please don't enable this for arm64 until we have the dependency stuff sorted
> out. I posted patches [1] for this before, but I think they should be part
> of this series as they don't make sense on their own.

Oh, hm. We've been trying to trim down this series, since it's already
quite large. Why can't [1] land first? It would make this easier to deal
with, IMO.

> [1] https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=rwonce/read-barrier-depends

-- 
Kees Cook
On Mon, Oct 12, 2020 at 01:44:56PM -0700, Kees Cook wrote:
> On Mon, Oct 12, 2020 at 09:31:16AM +0100, Will Deacon wrote:
> > On Fri, Oct 09, 2020 at 09:13:34AM -0700, Sami Tolvanen wrote:
> > > Allow CONFIG_LTO_CLANG and CONFIG_THINLTO to be enabled.
> > >
> > > Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> > > Reviewed-by: Kees Cook <keescook@chromium.org>
> > > ---
> > >  arch/arm64/Kconfig | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > index ad522b021f35..7016d193864f 100644
> > > --- a/arch/arm64/Kconfig
> > > +++ b/arch/arm64/Kconfig
> > > @@ -72,6 +72,8 @@ config ARM64
> > >  	select ARCH_USE_SYM_ANNOTATIONS
> > >  	select ARCH_SUPPORTS_MEMORY_FAILURE
> > >  	select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
> > > +	select ARCH_SUPPORTS_LTO_CLANG
> > > +	select ARCH_SUPPORTS_THINLTO
> >
> > Please don't enable this for arm64 until we have the dependency stuff sorted
> > out. I posted patches [1] for this before, but I think they should be part
> > of this series as they don't make sense on their own.
>
> Oh, hm. We've been trying to trim down this series, since it's already
> quite large. Why can't [1] land first? It would make this easier to deal
> with, IMO.

I'm happy to handle [1] along with the LTO Kconfig change when the time
comes, if that helps. I just don't want to merge dead code!

Will

> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=rwonce/read-barrier-depends
On Mon, Oct 12, 2020 at 09:51:09PM +0100, Will Deacon wrote:
> On Mon, Oct 12, 2020 at 01:44:56PM -0700, Kees Cook wrote:
> > On Mon, Oct 12, 2020 at 09:31:16AM +0100, Will Deacon wrote:
> > > On Fri, Oct 09, 2020 at 09:13:34AM -0700, Sami Tolvanen wrote:
> > > > Allow CONFIG_LTO_CLANG and CONFIG_THINLTO to be enabled.
> > > >
> > > > Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> > > > Reviewed-by: Kees Cook <keescook@chromium.org>
> > > > ---
> > > >  arch/arm64/Kconfig | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > > index ad522b021f35..7016d193864f 100644
> > > > --- a/arch/arm64/Kconfig
> > > > +++ b/arch/arm64/Kconfig
> > > > @@ -72,6 +72,8 @@ config ARM64
> > > >  	select ARCH_USE_SYM_ANNOTATIONS
> > > >  	select ARCH_SUPPORTS_MEMORY_FAILURE
> > > >  	select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
> > > > +	select ARCH_SUPPORTS_LTO_CLANG
> > > > +	select ARCH_SUPPORTS_THINLTO
> > >
> > > Please don't enable this for arm64 until we have the dependency stuff sorted
> > > out. I posted patches [1] for this before, but I think they should be part
> > > of this series as they don't make sense on their own.
> >
> > Oh, hm. We've been trying to trim down this series, since it's already
> > quite large. Why can't [1] land first? It would make this easier to deal
> > with, IMO.
>
> I'm happy to handle [1] along with the LTO Kconfig change when the time
> comes, if that helps. I just don't want to merge dead code!

Okay, understood. Thanks!

>
> Will
>
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=rwonce/read-barrier-depends

-- 
Kees Cook