From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nick Desaulniers
Date: Wed, 24 Jun 2020 13:53:52 -0700
Subject: Re: [PATCH 02/22] kbuild: add support for Clang LTO
To: Sami Tolvanen
Cc: Masahiro Yamada, Will Deacon, Greg Kroah-Hartman, "Paul E. McKenney", Kees Cook, clang-built-linux, Kernel Hardening, linux-arch, Linux ARM, Linux Kbuild mailing list, LKML, linux-pci@vger.kernel.org, "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)"
References: <20200624203200.78870-1-samitolvanen@google.com> <20200624203200.78870-3-samitolvanen@google.com>
In-Reply-To: <20200624203200.78870-3-samitolvanen@google.com>

On Wed, Jun 24, 2020 at 1:32 PM Sami Tolvanen wrote:
>
> This change adds build system support for Clang's Link Time
> Optimization (LTO).
With -flto, instead of ELF object files, Clang > produces LLVM bitcode, which is compiled into native code at link > time, allowing the final binary to be optimized globally. For more > details, see: > > https://llvm.org/docs/LinkTimeOptimization.html > > The Kconfig option CONFIG_LTO_CLANG is implemented as a choice, > which defaults to LTO being disabled. To use LTO, the architecture > must select ARCH_SUPPORTS_LTO_CLANG and support: > > - compiling with Clang, > - compiling inline assembly with Clang's integrated assembler, > - and linking with LLD. > > While using full LTO results in the best runtime performance, the > compilation is not scalable in time or memory. CONFIG_THINLTO > enables ThinLTO, which allows parallel optimization and faster > incremental builds. ThinLTO is used by default if the architecture > also selects ARCH_SUPPORTS_THINLTO: > > https://clang.llvm.org/docs/ThinLTO.html > > To enable LTO, LLVM tools must be used to handle bitcode files. The > easiest way is to pass the LLVM=1 option to make: > > $ make LLVM=1 defconfig > $ scripts/config -e LTO_CLANG > $ make LLVM=1 > > Alternatively, at least the following LLVM tools must be used: > > CC=clang LD=ld.lld AR=llvm-ar NM=llvm-nm > > To prepare for LTO support with other compilers, common parts are > gated behind the CONFIG_LTO option, and LTO can be disabled for > specific files by filtering out CC_FLAGS_LTO. > > Note that support for DYNAMIC_FTRACE and MODVERSIONS are added in > follow-up patches. 
> > Signed-off-by: Sami Tolvanen > --- > Makefile | 16 ++++++++ > arch/Kconfig | 66 +++++++++++++++++++++++++++++++ > include/asm-generic/vmlinux.lds.h | 11 ++++-- > scripts/Makefile.build | 9 ++++- > scripts/Makefile.modfinal | 9 ++++- > scripts/Makefile.modpost | 24 ++++++++++- > scripts/link-vmlinux.sh | 32 +++++++++++---- > 7 files changed, 151 insertions(+), 16 deletions(-) > > diff --git a/Makefile b/Makefile > index ac2c61c37a73..0c7fe6fb2143 100644 > --- a/Makefile > +++ b/Makefile > @@ -886,6 +886,22 @@ KBUILD_CFLAGS += $(CC_FLAGS_SCS) > export CC_FLAGS_SCS > endif > > +ifdef CONFIG_LTO_CLANG > +ifdef CONFIG_THINLTO > +CC_FLAGS_LTO_CLANG := -flto=thin $(call cc-option, -fsplit-lto-unit) The kconfig change gates this on clang-11; do we still need the cc-option check here, or can we hardcode the use of -fsplit-lto-unit? Playing with the flag in godbolt, it looks like clang-8 had support for this flag. > +KBUILD_LDFLAGS += --thinlto-cache-dir=.thinlto-cache It might be nice to have `make distclean` or even `make clean` scrub the .thinlto-cache? Also, I verified that the `.gitignore` rule for `.*` properly ignores this dir. > +else > +CC_FLAGS_LTO_CLANG := -flto > +endif > +CC_FLAGS_LTO_CLANG += -fvisibility=default > +endif > + > +ifdef CONFIG_LTO > +CC_FLAGS_LTO := $(CC_FLAGS_LTO_CLANG) > +KBUILD_CFLAGS += $(CC_FLAGS_LTO) > +export CC_FLAGS_LTO > +endif > + > # arch Makefile may override CC so keep this after arch Makefile is included > NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) -print-file-name=include) > > diff --git a/arch/Kconfig b/arch/Kconfig > index 8cc35dc556c7..e00b122293f8 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -552,6 +552,72 @@ config SHADOW_CALL_STACK > reading and writing arbitrary memory may be able to locate them > and hijack control flow by modifying the stacks. 
> > +config LTO > + bool > + > +config ARCH_SUPPORTS_LTO_CLANG > + bool > + help > + An architecture should select this option if it supports: > + - compiling with Clang, > + - compiling inline assembly with Clang's integrated assembler, > + - and linking with LLD. > + > +config ARCH_SUPPORTS_THINLTO > + bool > + help > + An architecture should select this option if it supports Clang's > + ThinLTO. > + > +config THINLTO > + bool "Clang ThinLTO" > + depends on LTO_CLANG && ARCH_SUPPORTS_THINLTO > + default y > + help > + This option enables Clang's ThinLTO, which allows for parallel > + optimization and faster incremental compiles. More information > + can be found from Clang's documentation: > + > + https://clang.llvm.org/docs/ThinLTO.html > + > +choice > + prompt "Link Time Optimization (LTO)" > + default LTO_NONE > + help > + This option enables Link Time Optimization (LTO), which allows the > + compiler to optimize binaries globally. > + > + If unsure, select LTO_NONE. > + > +config LTO_NONE > + bool "None" > + > +config LTO_CLANG > + bool "Clang's Link Time Optimization (EXPERIMENTAL)" > + depends on CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD > + depends on $(success,$(NM) --help | head -n 1 | grep -qi llvm) > + depends on $(success,$(AR) --help | head -n 1 | grep -qi llvm) > + depends on ARCH_SUPPORTS_LTO_CLANG > + depends on !FTRACE_MCOUNT_RECORD > + depends on !KASAN > + depends on !MODVERSIONS > + select LTO > + help > + This option enables Clang's Link Time Optimization (LTO), which > + allows the compiler to optimize the kernel globally. If you enable > + this option, the compiler generates LLVM bitcode instead of ELF > + object files, and the actual compilation from bitcode happens at > + the LTO link step, which may take several minutes depending on the > + kernel configuration. 
More information can be found from LLVM's > + documentation: > + > + https://llvm.org/docs/LinkTimeOptimization.html > + > + To select this option, you also need to use LLVM tools to handle > + the bitcode by passing LLVM=1 to make. > + > +endchoice > + > config HAVE_ARCH_WITHIN_STACK_FRAMES > bool > help > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h > index db600ef218d7..78079000c05a 100644 > --- a/include/asm-generic/vmlinux.lds.h > +++ b/include/asm-generic/vmlinux.lds.h > @@ -89,15 +89,18 @@ > * .data. We don't want to pull in .data..other sections, which Linux > * has defined. Same for text and bss. > * > + * With LTO_CLANG, the linker also splits sections by default, so we need > + * these macros to combine the sections during the final link. > + * > * RODATA_MAIN is not used because existing code already defines .rodata.x > * sections to be brought in with rodata. > */ > -#ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION > +#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG) > #define TEXT_MAIN .text .text.[0-9a-zA-Z_]* > -#define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..LPBX* > +#define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..L* .data..compoundliteral* > #define SDATA_MAIN .sdata .sdata.[0-9a-zA-Z_]* > -#define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]* > -#define BSS_MAIN .bss .bss.[0-9a-zA-Z_]* > +#define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]* .rodata..L* > +#define BSS_MAIN .bss .bss.[0-9a-zA-Z_]* .bss..compoundliteral* > #define SBSS_MAIN .sbss .sbss.[0-9a-zA-Z_]* > #else > #define TEXT_MAIN .text > diff --git a/scripts/Makefile.build b/scripts/Makefile.build > index 2e8810b7e5ed..f307e708a1b7 100644 > --- a/scripts/Makefile.build > +++ b/scripts/Makefile.build > @@ -108,7 +108,7 @@ endif > # --------------------------------------------------------------------------- > > quiet_cmd_cc_s_c = CC $(quiet_modtag) $@ > - cmd_cc_s_c = $(CC) $(filter-out $(DEBUG_CFLAGS), $(c_flags)) 
$(DISABLE_LTO) -fverbose-asm -S -o $@ $< > + cmd_cc_s_c = $(CC) $(filter-out $(DEBUG_CFLAGS) $(CC_FLAGS_LTO), $(c_flags)) -fverbose-asm -S -o $@ $< > > $(obj)/%.s: $(src)/%.c FORCE > $(call if_changed_dep,cc_s_c) > @@ -424,8 +424,15 @@ $(obj)/lib.a: $(lib-y) FORCE > # Do not replace $(filter %.o,^) with $(real-prereqs). When a single object > # module is turned into a multi object module, $^ will contain header file > # dependencies recorded in the .*.cmd file. > +ifdef CONFIG_LTO_CLANG > +quiet_cmd_link_multi-m = AR [M] $@ > +cmd_link_multi-m = \ > + rm -f $@; \ > + $(AR) rcsTP$(KBUILD_ARFLAGS) $@ $(filter %.o,$^) > +else > quiet_cmd_link_multi-m = LD [M] $@ > cmd_link_multi-m = $(LD) $(ld_flags) -r -o $@ $(filter %.o,$^) > +endif > > $(multi-used-m): FORCE > $(call if_changed,link_multi-m) > diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal > index 411c1e600e7d..1005b147abd0 100644 > --- a/scripts/Makefile.modfinal > +++ b/scripts/Makefile.modfinal > @@ -6,6 +6,7 @@ > PHONY := __modfinal > __modfinal: > > +include $(objtree)/include/config/auto.conf > include $(srctree)/scripts/Kbuild.include > > # for c_flags > @@ -29,6 +30,12 @@ quiet_cmd_cc_o_c = CC [M] $@ > > ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink) > > +ifdef CONFIG_LTO_CLANG > +# With CONFIG_LTO_CLANG, reuse the object file we compiled for modpost to > +# avoid a second slow LTO link > +prelink-ext := .lto > +endif > + > quiet_cmd_ld_ko_o = LD [M] $@ > cmd_ld_ko_o = \ > $(LD) -r $(KBUILD_LDFLAGS) \ > @@ -37,7 +44,7 @@ quiet_cmd_ld_ko_o = LD [M] $@ > -o $@ $(filter %.o, $^); \ > $(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true) > > -$(modules): %.ko: %.o %.mod.o $(KBUILD_LDS_MODULE) FORCE > +$(modules): %.ko: %$(prelink-ext).o %.mod.o $(KBUILD_LDS_MODULE) FORCE > +$(call if_changed,ld_ko_o) > > targets += $(modules) $(modules:.ko=.mod.o) > diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost > index 3651cbf6ad49..9ced8aecd579 100644 > 
--- a/scripts/Makefile.modpost > +++ b/scripts/Makefile.modpost > @@ -102,12 +102,32 @@ $(input-symdump): > @echo >&2 'WARNING: Symbol version dump "$@" is missing.' > @echo >&2 ' Modules may not have dependencies or modversions.' > > +ifdef CONFIG_LTO_CLANG > +# With CONFIG_LTO_CLANG, .o files might be LLVM bitcode, so we need to run > +# LTO to compile them into native code before running modpost > +prelink-ext = .lto > + > +quiet_cmd_cc_lto_link_modules = LTO [M] $@ > +cmd_cc_lto_link_modules = \ > + $(LD) $(ld_flags) -r -o $@ \ > + --whole-archive $(filter-out FORCE,$^) > + > +%.lto.o: %.o FORCE > + $(call if_changed,cc_lto_link_modules) > + > +PHONY += FORCE > +FORCE: > + > +endif > + > +modules := $(sort $(shell cat $(MODORDER))) > + > # Read out modules.order to pass in modpost. > # Otherwise, allmodconfig would fail with "Argument list too long". > quiet_cmd_modpost = MODPOST $@ > - cmd_modpost = sed 's/ko$$/o/' $< | $(MODPOST) -T - > + cmd_modpost = sed 's/\.ko$$/$(prelink-ext)\.o/' $< | $(MODPOST) -T - > > -$(output-symdump): $(MODORDER) $(input-symdump) FORCE > +$(output-symdump): $(MODORDER) $(input-symdump) $(modules:.ko=$(prelink-ext).o) FORCE > $(call if_changed,modpost) > > targets += $(output-symdump) > diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh > index 92dd745906f4..a681b3b6722e 100755 > --- a/scripts/link-vmlinux.sh > +++ b/scripts/link-vmlinux.sh > @@ -52,6 +52,14 @@ modpost_link() > ${KBUILD_VMLINUX_LIBS} \ > --end-group" > > + if [ -n "${CONFIG_LTO_CLANG}" ]; then > + # This might take a while, so indicate that we're doing > + # an LTO link > + info LTO ${1} > + else > + info LD ${1} > + fi > + > ${LD} ${KBUILD_LDFLAGS} -r -o ${1} ${objects} > } > > @@ -99,13 +107,22 @@ vmlinux_link() > fi > > if [ "${SRCARCH}" != "um" ]; then > - objects="--whole-archive \ > - ${KBUILD_VMLINUX_OBJS} \ > - --no-whole-archive \ > - --start-group \ > - ${KBUILD_VMLINUX_LIBS} \ > - --end-group \ > - ${@}" > + if [ -n "${CONFIG_LTO_CLANG}" ]; 
then > + # Use vmlinux.o instead of performing the slow LTO > + # link again. > + objects="--whole-archive \ > + vmlinux.o \ > + --no-whole-archive \ > + ${@}" > + else > + objects="--whole-archive \ > + ${KBUILD_VMLINUX_OBJS} \ > + --no-whole-archive \ > + --start-group \ > + ${KBUILD_VMLINUX_LIBS} \ > + --end-group \ > + ${@}" > + fi > > ${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux} \ > ${strip_debug#-Wl,} \ > @@ -270,7 +287,6 @@ fi; > ${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init need-builtin=1 > > #link vmlinux.o > -info LD vmlinux.o > modpost_link vmlinux.o > objtool_link vmlinux.o > > -- > 2.27.0.212.ge8ba1cc988-goog > -- Thanks, ~Nick Desaulniers From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.0 required=3.0 tests=DKIMWL_WL_HIGH, DKIM_ADSP_CUSTOM_MED,DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0D81FC433DF for ; Wed, 24 Jun 2020 20:56:11 +0000 (UTC) Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BB0A42070A for ; Wed, 24 Jun 2020 20:56:10 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="xyT1pxIy"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=google.com header.i=@google.com header.b="H5Esc7pp" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BB0A42070A Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com 
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:To:Subject:Message-ID:Date:From:In-Reply-To: References:MIME-Version:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=UcT1Rgx+75OB+t0ZdDWM39H0Vq0vrwRK2+2hpP6FuXY=; b=xyT1pxIyZekS+KpDwR7tUFohK hdAwaFIs4qUjveTZ5ppkDQol6pTwhwxg/qh0JyGspPLZxPXcL/kqkhR5dPoVItoeVO2pFjog7DMVk yuJGtJkfk38cA55cl+ZC5qUexI9hLaIDqZOTcZ+mEqWlXGQGtf+A8TU5UIyJZiQXJmbp/zwDuXfaC CE0Ij5dEfZgEaOqnHMw5as48Pe0ALTWajtQJPAHtQIe1hXPdxpTJ/vhqO6pvivWsQJ40zNWdcVWj2 o2fJrdkAVGHBs3JXfXjakiRs1eeD3pDY3J6ZyzRVwVo2xQZDbXpbnRj4Mkrdo6bc6a6NRaKx5wcBP V6zWm3DBA==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1joCPO-0000ch-Cx; Wed, 24 Jun 2020 20:54:14 +0000 Received: from mail-pg1-x543.google.com ([2607:f8b0:4864:20::543]) by merlin.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1joCPK-0000Zx-77 for linux-arm-kernel@lists.infradead.org; Wed, 24 Jun 2020 20:54:11 +0000 Received: by mail-pg1-x543.google.com with SMTP id r18so2015386pgk.11 for ; Wed, 24 Jun 2020 13:54:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=d0EL9NFdX5f9VfIfl2NFPx5y5hXz1Uyj4y9f4db/Xvs=; b=H5Esc7ppD7ajcasVMCC9fu37qF19lnhoSuCsYVEphUWeKPRLGblJ7gMQbDirrPeLjB H4PxO8V2ZaXlA5meAPPCKB5+e5pzJ7UOzoF6IU/U8d6FsYlEtDIx1I9bxHj1zI+YTSdj PMgMTWllqkCYFyVJ6ZCIirHAbngjxGCg+ul0OHsGciPvla7qeUDRGIY6wvrwgfeumG49 R0F9mMk6qdSGbUFyBpYTnmu6wZjeWFVCsjGn7/8+YqXD99mo1UkupOfj2AzfMjdRRhbM 
RmWXRheMoqG1CNNWluHFzSx0tEnXxjdU0O0MP2rI+350qKgrYwjZBkXCoehod3GJa6xl 4ulg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=d0EL9NFdX5f9VfIfl2NFPx5y5hXz1Uyj4y9f4db/Xvs=; b=MMJCSwxlu7KvMpY8yb+dYkR1gXXgdhXu2+yXP7geG4flDTgb7ZazdSVkwLEZnqzUDz ttxaDpT1AO7Gf1dgWq9BF2NMIYPthzegV71C6YKfsW1Eky3Tz2Xf4ENNF4bPiZArJPsz 9SWL2t1L2kebNU0u4MnvwOS8wC1yxGUTXZ1iDeHIDJ9EwO/JKczgD0vX/t+Rw+YFiJoo Jc8ZSyF8uMZ3P+HBe5nKqGegk/3gBmSj8N3jFdjphZ5iltNsdqXgpYyZG91wm5NCXXbp zfSNyTh9A/0a7pVuTcS7z+3ZAsrpw1vlCGcJu6ApGaC0cNsiV9/Wau+BgD+094oCxxEW LP3A== X-Gm-Message-State: AOAM531r8G7RroTbDcPQwsrj/rNTbjKrmB8ygKMXVc+5Sc5Sh6fyS0+r GUIMiWlK4bjWihVRetKNogkjE0eBVBHpAhK8stN6sA== X-Google-Smtp-Source: ABdhPJxdRg4CyimepZFl3TlhMYwN3sqV9wabjVBLqFbFsA/AUlxHXR9UblJJuAxFnem4caWlJ6BIBXmEJEr28D+Xgl4= X-Received: by 2002:a63:7e55:: with SMTP id o21mr12841493pgn.263.1593032044935; Wed, 24 Jun 2020 13:54:04 -0700 (PDT) MIME-Version: 1.0 References: <20200624203200.78870-1-samitolvanen@google.com> <20200624203200.78870-3-samitolvanen@google.com> In-Reply-To: <20200624203200.78870-3-samitolvanen@google.com> From: Nick Desaulniers Date: Wed, 24 Jun 2020 13:53:52 -0700 Message-ID: Subject: Re: [PATCH 02/22] kbuild: add support for Clang LTO To: Sami Tolvanen X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-arch , "maintainer:X86 ARCHITECTURE \(32-BIT AND 64-BIT\)" , Kees Cook , "Paul E. 
McKenney" , Kernel Hardening , Greg Kroah-Hartman , Masahiro Yamada , Linux Kbuild mailing list , LKML , clang-built-linux , linux-pci@vger.kernel.org, Will Deacon , Linux ARM Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Wed, Jun 24, 2020 at 1:32 PM Sami Tolvanen wrote: > > This change adds build system support for Clang's Link Time > Optimization (LTO). With -flto, instead of ELF object files, Clang > produces LLVM bitcode, which is compiled into native code at link > time, allowing the final binary to be optimized globally. For more > details, see: > > https://llvm.org/docs/LinkTimeOptimization.html > > The Kconfig option CONFIG_LTO_CLANG is implemented as a choice, > which defaults to LTO being disabled. To use LTO, the architecture > must select ARCH_SUPPORTS_LTO_CLANG and support: > > - compiling with Clang, > - compiling inline assembly with Clang's integrated assembler, > - and linking with LLD. > > While using full LTO results in the best runtime performance, the > compilation is not scalable in time or memory. CONFIG_THINLTO > enables ThinLTO, which allows parallel optimization and faster > incremental builds. ThinLTO is used by default if the architecture > also selects ARCH_SUPPORTS_THINLTO: > > https://clang.llvm.org/docs/ThinLTO.html > > To enable LTO, LLVM tools must be used to handle bitcode files. The > easiest way is to pass the LLVM=1 option to make: > > $ make LLVM=1 defconfig > $ scripts/config -e LTO_CLANG > $ make LLVM=1 > > Alternatively, at least the following LLVM tools must be used: > > CC=clang LD=ld.lld AR=llvm-ar NM=llvm-nm > > To prepare for LTO support with other compilers, common parts are > gated behind the CONFIG_LTO option, and LTO can be disabled for > specific files by filtering out CC_FLAGS_LTO. 
> > Note that support for DYNAMIC_FTRACE and MODVERSIONS are added in > follow-up patches. > > Signed-off-by: Sami Tolvanen > --- > Makefile | 16 ++++++++ > arch/Kconfig | 66 +++++++++++++++++++++++++++++++ > include/asm-generic/vmlinux.lds.h | 11 ++++-- > scripts/Makefile.build | 9 ++++- > scripts/Makefile.modfinal | 9 ++++- > scripts/Makefile.modpost | 24 ++++++++++- > scripts/link-vmlinux.sh | 32 +++++++++++---- > 7 files changed, 151 insertions(+), 16 deletions(-) > > diff --git a/Makefile b/Makefile > index ac2c61c37a73..0c7fe6fb2143 100644 > --- a/Makefile > +++ b/Makefile > @@ -886,6 +886,22 @@ KBUILD_CFLAGS += $(CC_FLAGS_SCS) > export CC_FLAGS_SCS > endif > > +ifdef CONFIG_LTO_CLANG > +ifdef CONFIG_THINLTO > +CC_FLAGS_LTO_CLANG := -flto=thin $(call cc-option, -fsplit-lto-unit) The kconfig change gates this on clang-11; do we still need the cc-option check here, or can we hardcode the use of -fsplit-lto-unit? Playing with the flag in godbolt, it looks like clang-8 had support for this flag. > +KBUILD_LDFLAGS += --thinlto-cache-dir=.thinlto-cache It might be nice to have `make distclean` or even `make clean` scrub the .thinlto-cache? Also, I verified that the `.gitignore` rule for `.*` properly ignores this dir. > +else > +CC_FLAGS_LTO_CLANG := -flto > +endif > +CC_FLAGS_LTO_CLANG += -fvisibility=default > +endif > + > +ifdef CONFIG_LTO > +CC_FLAGS_LTO := $(CC_FLAGS_LTO_CLANG) > +KBUILD_CFLAGS += $(CC_FLAGS_LTO) > +export CC_FLAGS_LTO > +endif > + > # arch Makefile may override CC so keep this after arch Makefile is included > NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) -print-file-name=include) > > diff --git a/arch/Kconfig b/arch/Kconfig > index 8cc35dc556c7..e00b122293f8 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -552,6 +552,72 @@ config SHADOW_CALL_STACK > reading and writing arbitrary memory may be able to locate them > and hijack control flow by modifying the stacks. 
> > +config LTO > + bool > + > +config ARCH_SUPPORTS_LTO_CLANG > + bool > + help > + An architecture should select this option if it supports: > + - compiling with Clang, > + - compiling inline assembly with Clang's integrated assembler, > + - and linking with LLD. > + > +config ARCH_SUPPORTS_THINLTO > + bool > + help > + An architecture should select this option if it supports Clang's > + ThinLTO. > + > +config THINLTO > + bool "Clang ThinLTO" > + depends on LTO_CLANG && ARCH_SUPPORTS_THINLTO > + default y > + help > + This option enables Clang's ThinLTO, which allows for parallel > + optimization and faster incremental compiles. More information > + can be found from Clang's documentation: > + > + https://clang.llvm.org/docs/ThinLTO.html > + > +choice > + prompt "Link Time Optimization (LTO)" > + default LTO_NONE > + help > + This option enables Link Time Optimization (LTO), which allows the > + compiler to optimize binaries globally. > + > + If unsure, select LTO_NONE. > + > +config LTO_NONE > + bool "None" > + > +config LTO_CLANG > + bool "Clang's Link Time Optimization (EXPERIMENTAL)" > + depends on CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD > + depends on $(success,$(NM) --help | head -n 1 | grep -qi llvm) > + depends on $(success,$(AR) --help | head -n 1 | grep -qi llvm) > + depends on ARCH_SUPPORTS_LTO_CLANG > + depends on !FTRACE_MCOUNT_RECORD > + depends on !KASAN > + depends on !MODVERSIONS > + select LTO > + help > + This option enables Clang's Link Time Optimization (LTO), which > + allows the compiler to optimize the kernel globally. If you enable > + this option, the compiler generates LLVM bitcode instead of ELF > + object files, and the actual compilation from bitcode happens at > + the LTO link step, which may take several minutes depending on the > + kernel configuration. 
More information can be found from LLVM's > + documentation: > + > + https://llvm.org/docs/LinkTimeOptimization.html > + > + To select this option, you also need to use LLVM tools to handle > + the bitcode by passing LLVM=1 to make. > + > +endchoice > + > config HAVE_ARCH_WITHIN_STACK_FRAMES > bool > help > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h > index db600ef218d7..78079000c05a 100644 > --- a/include/asm-generic/vmlinux.lds.h > +++ b/include/asm-generic/vmlinux.lds.h > @@ -89,15 +89,18 @@ > * .data. We don't want to pull in .data..other sections, which Linux > * has defined. Same for text and bss. > * > + * With LTO_CLANG, the linker also splits sections by default, so we need > + * these macros to combine the sections during the final link. > + * > * RODATA_MAIN is not used because existing code already defines .rodata.x > * sections to be brought in with rodata. > */ > -#ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION > +#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG) > #define TEXT_MAIN .text .text.[0-9a-zA-Z_]* > -#define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..LPBX* > +#define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..L* .data..compoundliteral* > #define SDATA_MAIN .sdata .sdata.[0-9a-zA-Z_]* > -#define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]* > -#define BSS_MAIN .bss .bss.[0-9a-zA-Z_]* > +#define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]* .rodata..L* > +#define BSS_MAIN .bss .bss.[0-9a-zA-Z_]* .bss..compoundliteral* > #define SBSS_MAIN .sbss .sbss.[0-9a-zA-Z_]* > #else > #define TEXT_MAIN .text > diff --git a/scripts/Makefile.build b/scripts/Makefile.build > index 2e8810b7e5ed..f307e708a1b7 100644 > --- a/scripts/Makefile.build > +++ b/scripts/Makefile.build > @@ -108,7 +108,7 @@ endif > # --------------------------------------------------------------------------- > > quiet_cmd_cc_s_c = CC $(quiet_modtag) $@ > - cmd_cc_s_c = $(CC) $(filter-out $(DEBUG_CFLAGS), $(c_flags)) 
$(DISABLE_LTO) -fverbose-asm -S -o $@ $< > + cmd_cc_s_c = $(CC) $(filter-out $(DEBUG_CFLAGS) $(CC_FLAGS_LTO), $(c_flags)) -fverbose-asm -S -o $@ $< > > $(obj)/%.s: $(src)/%.c FORCE > $(call if_changed_dep,cc_s_c) > @@ -424,8 +424,15 @@ $(obj)/lib.a: $(lib-y) FORCE > # Do not replace $(filter %.o,^) with $(real-prereqs). When a single object > # module is turned into a multi object module, $^ will contain header file > # dependencies recorded in the .*.cmd file. > +ifdef CONFIG_LTO_CLANG > +quiet_cmd_link_multi-m = AR [M] $@ > +cmd_link_multi-m = \ > + rm -f $@; \ > + $(AR) rcsTP$(KBUILD_ARFLAGS) $@ $(filter %.o,$^) > +else > quiet_cmd_link_multi-m = LD [M] $@ > cmd_link_multi-m = $(LD) $(ld_flags) -r -o $@ $(filter %.o,$^) > +endif > > $(multi-used-m): FORCE > $(call if_changed,link_multi-m) > diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal > index 411c1e600e7d..1005b147abd0 100644 > --- a/scripts/Makefile.modfinal > +++ b/scripts/Makefile.modfinal > @@ -6,6 +6,7 @@ > PHONY := __modfinal > __modfinal: > > +include $(objtree)/include/config/auto.conf > include $(srctree)/scripts/Kbuild.include > > # for c_flags > @@ -29,6 +30,12 @@ quiet_cmd_cc_o_c = CC [M] $@ > > ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink) > > +ifdef CONFIG_LTO_CLANG > +# With CONFIG_LTO_CLANG, reuse the object file we compiled for modpost to > +# avoid a second slow LTO link > +prelink-ext := .lto > +endif > + > quiet_cmd_ld_ko_o = LD [M] $@ > cmd_ld_ko_o = \ > $(LD) -r $(KBUILD_LDFLAGS) \ > @@ -37,7 +44,7 @@ quiet_cmd_ld_ko_o = LD [M] $@ > -o $@ $(filter %.o, $^); \ > $(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true) > > -$(modules): %.ko: %.o %.mod.o $(KBUILD_LDS_MODULE) FORCE > +$(modules): %.ko: %$(prelink-ext).o %.mod.o $(KBUILD_LDS_MODULE) FORCE > +$(call if_changed,ld_ko_o) > > targets += $(modules) $(modules:.ko=.mod.o) > diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost > index 3651cbf6ad49..9ced8aecd579 100644 > 
--- a/scripts/Makefile.modpost > +++ b/scripts/Makefile.modpost > @@ -102,12 +102,32 @@ $(input-symdump): > @echo >&2 'WARNING: Symbol version dump "$@" is missing.' > @echo >&2 ' Modules may not have dependencies or modversions.' > > +ifdef CONFIG_LTO_CLANG > +# With CONFIG_LTO_CLANG, .o files might be LLVM bitcode, so we need to run > +# LTO to compile them into native code before running modpost > +prelink-ext = .lto > + > +quiet_cmd_cc_lto_link_modules = LTO [M] $@ > +cmd_cc_lto_link_modules = \ > + $(LD) $(ld_flags) -r -o $@ \ > + --whole-archive $(filter-out FORCE,$^) > + > +%.lto.o: %.o FORCE > + $(call if_changed,cc_lto_link_modules) > + > +PHONY += FORCE > +FORCE: > + > +endif > + > +modules := $(sort $(shell cat $(MODORDER))) > + > # Read out modules.order to pass in modpost. > # Otherwise, allmodconfig would fail with "Argument list too long". > quiet_cmd_modpost = MODPOST $@ > - cmd_modpost = sed 's/ko$$/o/' $< | $(MODPOST) -T - > + cmd_modpost = sed 's/\.ko$$/$(prelink-ext)\.o/' $< | $(MODPOST) -T - > > -$(output-symdump): $(MODORDER) $(input-symdump) FORCE > +$(output-symdump): $(MODORDER) $(input-symdump) $(modules:.ko=$(prelink-ext).o) FORCE > $(call if_changed,modpost) > > targets += $(output-symdump) > diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh > index 92dd745906f4..a681b3b6722e 100755 > --- a/scripts/link-vmlinux.sh > +++ b/scripts/link-vmlinux.sh > @@ -52,6 +52,14 @@ modpost_link() > ${KBUILD_VMLINUX_LIBS} \ > --end-group" > > + if [ -n "${CONFIG_LTO_CLANG}" ]; then > + # This might take a while, so indicate that we're doing > + # an LTO link > + info LTO ${1} > + else > + info LD ${1} > + fi > + > ${LD} ${KBUILD_LDFLAGS} -r -o ${1} ${objects} > } > > @@ -99,13 +107,22 @@ vmlinux_link() > fi > > if [ "${SRCARCH}" != "um" ]; then > - objects="--whole-archive \ > - ${KBUILD_VMLINUX_OBJS} \ > - --no-whole-archive \ > - --start-group \ > - ${KBUILD_VMLINUX_LIBS} \ > - --end-group \ > - ${@}" > + if [ -n "${CONFIG_LTO_CLANG}" ]; 
then
> + # Use vmlinux.o instead of performing the slow LTO
> + # link again.
> + objects="--whole-archive \
> + vmlinux.o \
> + --no-whole-archive \
> + ${@}"
> + else
> + objects="--whole-archive \
> + ${KBUILD_VMLINUX_OBJS} \
> + --no-whole-archive \
> + --start-group \
> + ${KBUILD_VMLINUX_LIBS} \
> + --end-group \
> + ${@}"
> + fi
>
> ${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux} \
> ${strip_debug#-Wl,} \
> @@ -270,7 +287,6 @@ fi;
> ${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init need-builtin=1
>
> #link vmlinux.o
> -info LD vmlinux.o
> modpost_link vmlinux.o
> objtool_link vmlinux.o
>
> --
> 2.27.0.212.ge8ba1cc988-goog
>

--
Thanks,
~Nick Desaulniers
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel