* [RFC][PATCH 0/5] kbuild changes, thin archives, --gc-sections @ 2016-08-05 12:11 Nicholas Piggin 2016-08-05 12:11 ` [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r Nicholas Piggin ` (6 more replies) 0 siblings, 7 replies; 43+ messages in thread From: Nicholas Piggin @ 2016-08-05 12:11 UTC (permalink / raw) To: linux-kbuild Cc: Nicholas Piggin, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra Hello, I have 3 different things in this patchset. All arch specific, but all involve kbuild changes, so I'd like to discuss them with kbuild maintainers. The goal has been to improve long standing linking difficulties with the powerpc kernel. * First, building kernel using thin archives rather than incremental linking. This seems quite clean and is per-arch, so I hope it should not be too controversial. * Second, building kernel using -ffunction-sections -fdata-sections, --gc-sections. Yes, I'm spinning the wheel again. It was motivated by tiny codesize regression in the first patch, but the results seem too good to ignore. * Third, allowing architecture to run a tool over module after it has been linked. Powerpc wants to use it in order to relocate "alternate code" instructions that get don't get linked at their runtime address. No idea if this is the right approach wrt kbuild, but it seems to work. I have included the powerpc code for the first two as a reference. The third is much bigger and mostly uninteresting for this cc list, but it can be found here: https://patchwork.ozlabs.org/patch/651006/ Comments appreciated. Thanks, ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r 2016-08-05 12:11 [RFC][PATCH 0/5] kbuild changes, thin archives, --gc-sections Nicholas Piggin @ 2016-08-05 12:11 ` Nicholas Piggin 2016-08-06 3:50 ` kbuild test robot 2016-08-06 20:10 ` Sam Ravnborg 2016-08-05 12:12 ` [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination Nicholas Piggin ` (5 subsequent siblings) 6 siblings, 2 replies; 43+ messages in thread From: Nicholas Piggin @ 2016-08-05 12:11 UTC (permalink / raw) To: linux-kbuild Cc: Nicholas Piggin, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra From: Stephen Rothwell <sfr@canb.auug.org.au> ld -r is an incremental link used to create built-in.o files in build subdirectories. It produces relocatable object files containing all its input files, and these are are then pulled together and relocated in the final link. Aside from the bloat, this constrains the final link relocations, which has bitten large powerpc builds with unresolvable relocations in the final link. Alan Modra has recommended the kernel use thin archives for linking. This is an alternative and means that the linker has more information available to it when it links the kernel. This patch enables a config option architectures can select, which causes all built-in.o files to be built as thin archives. built-in.o files in subdirectories do not get symbol table or index attached, which improves speed and size. The final link pass creates a built-in.o archive in the root output directory which includes the symbol table and index. The linker then uses takes this file to link. The --whole-archive linker option is required, because the linker now has visibility to every individual object file, and it will otherwise just completely avoid including those without external references (consider a file with EXPORT_SYMBOL or initcall or hardware exceptions as its only entry points). The traditional built works "by luck" as built-in.o files are large enough that they're going to get external references. However this optimisation is unpredictable for the kernel (due to above external references), ineffective at culling unused, and costly because the .o files have to be searched for references. Superior alternatives for link-time culling should be used instead. Build characteristics for inclink vs thinarc, on a small powerpc64le pseries VM with a modest .config: inclink thinarc sizes vmlinux 15 618 680 15 625 028 sum of all built-in.o 56 091 808 1 054 334 sum excluding root built-in.o 151 430 find -name built-in.o | xargs rm ; time make vmlinux real 22.772s 21.143s user 13.280s 13.430s sys 4.310s 2.750s - Final kernel pulled in only about 6K more, which shows how ineffective the object file culling is. - Build performance looks improved due to less pagecache activity. On IO constrained systems it could be a bigger win. - Build size saving is significant. Side note, the toochain understands archives, so there's some tricks, $ ar t built-in.o # list all files you linked with $ size built-in.o # and their sizes $ objdump -d built-in.o # disassembly (unrelocated) with filenames Implementation by sfr, minor tweaks by npiggin. Cc: linux-kbuild@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Arnd Bergmann <arnd@arndb.de> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Alan Modra <amodra@gmail.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- arch/Kconfig | 6 +++++ scripts/Makefile.build | 23 +++++++++++++--- scripts/link-vmlinux.sh | 71 +++++++++++++++++++++++++++++++++++++++++-------- 3 files changed, 85 insertions(+), 15 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index d794384..1330bf4 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -424,6 +424,12 @@ config CC_STACKPROTECTOR_STRONG endchoice +config THIN_ARCHIVES + bool + help + Select this if the architecture wants to use thin archives + instead of ld -r to create the built-in.o files. + config HAVE_CONTEXT_TRACKING bool help diff --git a/scripts/Makefile.build b/scripts/Makefile.build index 0d1ca5b..7fab825 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -358,12 +358,22 @@ $(sort $(subdir-obj-y)): $(subdir-ym) ; # Rule to compile a set of .o files into one .o file # ifdef builtin-target -quiet_cmd_link_o_target = LD $@ + +ifdef CONFIG_THIN_ARCHIVES + cmd_make_builtin = rm -f $@; $(AR) rcST$(KBUILD_ARFLAGS) + cmd_make_empty_builtin = rm -f $@; $(AR) rcST$(KBUILD_ARFLAGS) + quiet_cmd_link_o_target = AR $@ +else + cmd_make_builtin = $(LD) $(ld_flags) -r -o + cmd_make_empty_builtin = rm -f $@; $(AR) rcs$(KBUILD_ARFLAGS) + quiet_cmd_link_o_target = LD $@ +endif + # If the list of objects to link is empty, just create an empty built-in.o cmd_link_o_target = $(if $(strip $(obj-y)),\ - $(LD) $(ld_flags) -r -o $@ $(filter $(obj-y), $^) \ + $(cmd_make_builtin) $@ $(filter $(obj-y), $^) \ $(cmd_secanalysis),\ - rm -f $@; $(AR) rcs$(KBUILD_ARFLAGS) $@) + $(cmd_make_empty_builtin) $@) $(builtin-target): $(obj-y) FORCE $(call if_changed,link_o_target) @@ -389,7 +399,12 @@ $(modorder-target): $(subdir-ym) FORCE # ifdef lib-target quiet_cmd_link_l_target = AR $@ -cmd_link_l_target = rm -f $@; $(AR) rcs$(KBUILD_ARFLAGS) $@ $(lib-y) + +ifdef CONFIG_THIN_ARCHIVES + cmd_link_l_target = rm -f $@; $(AR) rcST$(KBUILD_ARFLAGS) $@ $(lib-y) +else + cmd_link_l_target = rm -f $@; $(AR) rcs$(KBUILD_ARFLAGS) $@ $(lib-y) +endif $(lib-target): $(lib-y) FORCE $(call if_changed,link_l_target) diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index f0f6d9d..a234ffe 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -37,12 +37,40 @@ info() fi } +# Thin archive build here makes a final archive with +# symbol table and indexes from vmlinux objects, which can be +# used as input to linker. +# +# Traditional incremental style of link does not require this step +# +# built-in.o output file +# +archive_builtin() +{ + if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then + info AR built-in.o + rm -f built-in.o; + ${AR} rcsT${KBUILD_ARFLAGS} built-in.o \ + ${KBUILD_VMLINUX_INIT} \ + ${KBUILD_VMLINUX_MAIN} + fi +} + # Link of vmlinux.o used for section mismatch analysis # ${1} output file modpost_link() { - ${LD} ${LDFLAGS} -r -o ${1} ${KBUILD_VMLINUX_INIT} \ - --start-group ${KBUILD_VMLINUX_MAIN} --end-group + local objects + + if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then + objects="--whole-archive built-in.o" + else + objects="${KBUILD_VMLINUX_INIT} \ + --start-group \ + ${KBUILD_VMLINUX_MAIN} \ + --end-group" + fi + ${LD} ${LDFLAGS} -r -o ${1} ${objects} } # Link of vmlinux @@ -51,18 +79,36 @@ modpost_link() vmlinux_link() { local lds="${objtree}/${KBUILD_LDS}" + local objects if [ "${SRCARCH}" != "um" ]; then - ${LD} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2} \ - -T ${lds} ${KBUILD_VMLINUX_INIT} \ - --start-group ${KBUILD_VMLINUX_MAIN} --end-group ${1} + if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then + objects="--whole-archive built-in.o ${1}" + else + objects="${KBUILD_VMLINUX_INIT} \ + --start-group \ + ${KBUILD_VMLINUX_MAIN} \ + --end-group \ + ${1}" + fi + + ${LD} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2} \ + -T ${lds} ${objects} else - ${CC} ${CFLAGS_vmlinux} -o ${2} \ - -Wl,-T,${lds} ${KBUILD_VMLINUX_INIT} \ - -Wl,--start-group \ - ${KBUILD_VMLINUX_MAIN} \ - -Wl,--end-group \ - -lutil -lrt -lpthread ${1} + if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then + objects="-Wl,--whole-archive built-in.o ${1}" + else + objects="${KBUILD_VMLINUX_INIT} \ + -Wl,--start-group \ + ${KBUILD_VMLINUX_MAIN} \ + -Wl,--end-group \ + ${1}" + fi + + ${CC} ${CFLAGS_vmlinux} -o ${2} \ + -Wl,-T,${lds} ${KBUILD_VMLINUX_INIT} \ + -lutil -lrt -lpthread \ + ${objects} rm -f linux fi } @@ -119,6 +165,7 @@ cleanup() rm -f .tmp_kallsyms* rm -f .tmp_version rm -f .tmp_vmlinux* + rm -f built-in.o rm -f System.map rm -f vmlinux rm -f vmlinux.o @@ -162,6 +209,8 @@ case "${KCONFIG_CONFIG}" in . "./${KCONFIG_CONFIG}" esac +archive_builtin + #link vmlinux.o info LD vmlinux.o modpost_link vmlinux.o -- 2.8.1 ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r 2016-08-05 12:11 ` [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r Nicholas Piggin 2016-08-06 3:50 ` kbuild test robot @ 2016-08-06 3:50 ` kbuild test robot 1 sibling, 0 replies; 43+ messages in thread From: kbuild test robot @ 2016-08-06 3:50 UTC (permalink / raw) To: Nicholas Piggin Cc: kbuild-all, linux-kbuild, linux-arch, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Alan Modra, linuxppc-dev [-- Attachment #1: Type: text/plain, Size: 5426 bytes --] Hi Stephen, [auto build test ERROR on kbuild/for-next] [also build test ERROR on v4.7] [cannot apply to linus/master linux/master next-20160805] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Nicholas-Piggin/kbuild-changes-thin-archives-gc-sections/20160805-202258 base: https://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild.git for-next config: um-allnoconfig (attached as .config) compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705 reproduce: # save the attached .config to linux build tree make ARCH=um All errors (new ones prefixed by >>): init/built-in.o:(.bss+0x4): multiple definition of `reset_devices' init/built-in.o:(.bss+0x4): first defined here init/built-in.o: In function `prepare_namespace': (.init.text+0x988): multiple definition of `prepare_namespace' init/built-in.o:(.init.text+0x988): first defined here init/built-in.o:(.init.data+0x1070): multiple definition of `late_time_init' init/built-in.o:(.init.data+0x1070): first defined here init/built-in.o: In function `load_default_modules': (.init.text+0x2f8): multiple definition of `load_default_modules' init/built-in.o:(.init.text+0x2f8): first defined here init/built-in.o:(.bss+0x10): multiple definition of `system_state' init/built-in.o:(.bss+0x10): first defined here init/built-in.o: In function `mount_root': (.init.text+0x987): multiple definition of `mount_root' init/built-in.o:(.init.text+0x987): first defined here init/built-in.o: In function `mount_block_root': (.init.text+0x836): multiple definition of `mount_block_root' init/built-in.o:(.init.text+0x836): first defined here init/built-in.o:(.bss+0x14): multiple definition of `early_boot_irqs_disabled' init/built-in.o:(.bss+0x14): first defined here init/built-in.o:(.data+0x0): multiple definition of `loops_per_jiffy' init/built-in.o:(.data+0x0): first defined here init/built-in.o:(.bss+0xc): multiple definition of `saved_command_line' init/built-in.o:(.bss+0xc): first defined here init/built-in.o: In function `calibrate_delay': (.text+0x2e0): multiple definition of `calibrate_delay' init/built-in.o:(.text+0x2e0): first defined here init/built-in.o: In function `do_one_initcall': (.init.text+0x57a): multiple definition of `do_one_initcall' init/built-in.o:(.init.text+0x57a): first defined here init/built-in.o:(.rodata+0x20): multiple definition of `linux_proc_banner' init/built-in.o:(.rodata+0x20): first defined here init/built-in.o:(.init.data+0x20e4): multiple definition of `rd_doload' init/built-in.o:(.init.data+0x20e4): first defined here init/built-in.o:(.data+0x620): multiple definition of `init_task' init/built-in.o:(.data+0x620): first defined here init/built-in.o:(.data+0x460): multiple definition of `init_uts_ns' init/built-in.o:(.data+0x460): first defined here init/built-in.o:(.data+0x20): multiple definition of `envp_init' init/built-in.o:(.data+0x20): first defined here init/built-in.o: In function `name_to_dev_t': (.text+0x70): multiple definition of `name_to_dev_t' init/built-in.o:(.text+0x70): first defined here init/built-in.o:(.data+0x618): multiple definition of `root_mountflags' init/built-in.o:(.data+0x618): first defined here init/built-in.o:(.bss+0x8): multiple definition of `static_key_initialized' init/built-in.o:(.bss+0x8): first defined here init/built-in.o: In function `start_kernel': (.init.text+0x351): multiple definition of `start_kernel' init/built-in.o:(.init.text+0x351): first defined here >> init/built-in.o:(.bss+0x30): multiple definition of `Version_263936' init/built-in.o:(.bss+0x30): first defined here init/built-in.o: In function `parse_early_options': (.init.text+0x2f9): multiple definition of `parse_early_options' init/built-in.o:(.init.text+0x2f9): first defined here init/built-in.o: In function `parse_early_param': (.init.text+0x31a): multiple definition of `parse_early_param' init/built-in.o:(.init.text+0x31a): first defined here init/built-in.o:(.bss+0x40): multiple definition of `preset_lpj' init/built-in.o:(.bss+0x40): first defined here init/built-in.o:(.data..init_task+0x0): multiple definition of `init_thread_union' init/built-in.o:(.data..init_task+0x0): first defined here init/built-in.o: In function `init_rootfs': (.init.text+0x80a): multiple definition of `init_rootfs' init/built-in.o:(.init.text+0x80a): first defined here init/built-in.o:(.rodata+0x80): multiple definition of `linux_banner' init/built-in.o:(.rodata+0x80): first defined here init/built-in.o:(.bss+0x34): multiple definition of `ROOT_DEV' init/built-in.o:(.bss+0x34): first defined here init/built-in.o:(.bss+0x0): multiple definition of `initcall_debug' init/built-in.o:(.bss+0x0): first defined here init/built-in.o:(.init.data+0x1080): multiple definition of `boot_command_line' init/built-in.o:(.init.data+0x1080): first defined here init/built-in.o:(.bss+0x44): multiple definition of `lpj_fine' init/built-in.o:(.bss+0x44): first defined here collect2: error: ld returned 1 exit status --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation [-- Attachment #2: .config.gz --] [-- Type: application/octet-stream, Size: 3843 bytes --] ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r @ 2016-08-06 3:50 ` kbuild test robot 0 siblings, 0 replies; 43+ messages in thread From: kbuild test robot @ 2016-08-06 3:50 UTC (permalink / raw) To: Nicholas Piggin Cc: kbuild-all, linux-kbuild, linux-arch, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Alan Modra, Nicholas Piggin, linuxppc-dev [-- Attachment #1: Type: text/plain, Size: 5426 bytes --] Hi Stephen, [auto build test ERROR on kbuild/for-next] [also build test ERROR on v4.7] [cannot apply to linus/master linux/master next-20160805] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Nicholas-Piggin/kbuild-changes-thin-archives-gc-sections/20160805-202258 base: https://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild.git for-next config: um-allnoconfig (attached as .config) compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705 reproduce: # save the attached .config to linux build tree make ARCH=um All errors (new ones prefixed by >>): init/built-in.o:(.bss+0x4): multiple definition of `reset_devices' init/built-in.o:(.bss+0x4): first defined here init/built-in.o: In function `prepare_namespace': (.init.text+0x988): multiple definition of `prepare_namespace' init/built-in.o:(.init.text+0x988): first defined here init/built-in.o:(.init.data+0x1070): multiple definition of `late_time_init' init/built-in.o:(.init.data+0x1070): first defined here init/built-in.o: In function `load_default_modules': (.init.text+0x2f8): multiple definition of `load_default_modules' init/built-in.o:(.init.text+0x2f8): first defined here init/built-in.o:(.bss+0x10): multiple definition of `system_state' init/built-in.o:(.bss+0x10): first defined here init/built-in.o: In function `mount_root': (.init.text+0x987): multiple definition of `mount_root' init/built-in.o:(.init.text+0x987): first defined here init/built-in.o: In function `mount_block_root': (.init.text+0x836): multiple definition of `mount_block_root' init/built-in.o:(.init.text+0x836): first defined here init/built-in.o:(.bss+0x14): multiple definition of `early_boot_irqs_disabled' init/built-in.o:(.bss+0x14): first defined here init/built-in.o:(.data+0x0): multiple definition of `loops_per_jiffy' init/built-in.o:(.data+0x0): first defined here init/built-in.o:(.bss+0xc): multiple definition of `saved_command_line' init/built-in.o:(.bss+0xc): first defined here init/built-in.o: In function `calibrate_delay': (.text+0x2e0): multiple definition of `calibrate_delay' init/built-in.o:(.text+0x2e0): first defined here init/built-in.o: In function `do_one_initcall': (.init.text+0x57a): multiple definition of `do_one_initcall' init/built-in.o:(.init.text+0x57a): first defined here init/built-in.o:(.rodata+0x20): multiple definition of `linux_proc_banner' init/built-in.o:(.rodata+0x20): first defined here init/built-in.o:(.init.data+0x20e4): multiple definition of `rd_doload' init/built-in.o:(.init.data+0x20e4): first defined here init/built-in.o:(.data+0x620): multiple definition of `init_task' init/built-in.o:(.data+0x620): first defined here init/built-in.o:(.data+0x460): multiple definition of `init_uts_ns' init/built-in.o:(.data+0x460): first defined here init/built-in.o:(.data+0x20): multiple definition of `envp_init' init/built-in.o:(.data+0x20): first defined here init/built-in.o: In function `name_to_dev_t': (.text+0x70): multiple definition of `name_to_dev_t' init/built-in.o:(.text+0x70): first defined here init/built-in.o:(.data+0x618): multiple definition of `root_mountflags' init/built-in.o:(.data+0x618): first defined here init/built-in.o:(.bss+0x8): multiple definition of `static_key_initialized' init/built-in.o:(.bss+0x8): first defined here init/built-in.o: In function `start_kernel': (.init.text+0x351): multiple definition of `start_kernel' init/built-in.o:(.init.text+0x351): first defined here >> init/built-in.o:(.bss+0x30): multiple definition of `Version_263936' init/built-in.o:(.bss+0x30): first defined here init/built-in.o: In function `parse_early_options': (.init.text+0x2f9): multiple definition of `parse_early_options' init/built-in.o:(.init.text+0x2f9): first defined here init/built-in.o: In function `parse_early_param': (.init.text+0x31a): multiple definition of `parse_early_param' init/built-in.o:(.init.text+0x31a): first defined here init/built-in.o:(.bss+0x40): multiple definition of `preset_lpj' init/built-in.o:(.bss+0x40): first defined here init/built-in.o:(.data..init_task+0x0): multiple definition of `init_thread_union' init/built-in.o:(.data..init_task+0x0): first defined here init/built-in.o: In function `init_rootfs': (.init.text+0x80a): multiple definition of `init_rootfs' init/built-in.o:(.init.text+0x80a): first defined here init/built-in.o:(.rodata+0x80): multiple definition of `linux_banner' init/built-in.o:(.rodata+0x80): first defined here init/built-in.o:(.bss+0x34): multiple definition of `ROOT_DEV' init/built-in.o:(.bss+0x34): first defined here init/built-in.o:(.bss+0x0): multiple definition of `initcall_debug' init/built-in.o:(.bss+0x0): first defined here init/built-in.o:(.init.data+0x1080): multiple definition of `boot_command_line' init/built-in.o:(.init.data+0x1080): first defined here init/built-in.o:(.bss+0x44): multiple definition of `lpj_fine' init/built-in.o:(.bss+0x44): first defined here collect2: error: ld returned 1 exit status --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation [-- Attachment #2: .config.gz --] [-- Type: application/octet-stream, Size: 3843 bytes --] ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r @ 2016-08-06 3:50 ` kbuild test robot 0 siblings, 0 replies; 43+ messages in thread From: kbuild test robot @ 2016-08-06 3:50 UTC (permalink / raw) Cc: linux-arch, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, linux-kbuild, Nicholas Piggin, Alan Modra, kbuild-all, linuxppc-dev [-- Attachment #1: Type: text/plain, Size: 5426 bytes --] Hi Stephen, [auto build test ERROR on kbuild/for-next] [also build test ERROR on v4.7] [cannot apply to linus/master linux/master next-20160805] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Nicholas-Piggin/kbuild-changes-thin-archives-gc-sections/20160805-202258 base: https://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild.git for-next config: um-allnoconfig (attached as .config) compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705 reproduce: # save the attached .config to linux build tree make ARCH=um All errors (new ones prefixed by >>): init/built-in.o:(.bss+0x4): multiple definition of `reset_devices' init/built-in.o:(.bss+0x4): first defined here init/built-in.o: In function `prepare_namespace': (.init.text+0x988): multiple definition of `prepare_namespace' init/built-in.o:(.init.text+0x988): first defined here init/built-in.o:(.init.data+0x1070): multiple definition of `late_time_init' init/built-in.o:(.init.data+0x1070): first defined here init/built-in.o: In function `load_default_modules': (.init.text+0x2f8): multiple definition of `load_default_modules' init/built-in.o:(.init.text+0x2f8): first defined here init/built-in.o:(.bss+0x10): multiple definition of `system_state' init/built-in.o:(.bss+0x10): first defined here init/built-in.o: In function `mount_root': (.init.text+0x987): multiple definition of `mount_root' init/built-in.o:(.init.text+0x987): first defined here init/built-in.o: In function `mount_block_root': (.init.text+0x836): multiple definition of `mount_block_root' init/built-in.o:(.init.text+0x836): first defined here init/built-in.o:(.bss+0x14): multiple definition of `early_boot_irqs_disabled' init/built-in.o:(.bss+0x14): first defined here init/built-in.o:(.data+0x0): multiple definition of `loops_per_jiffy' init/built-in.o:(.data+0x0): first defined here init/built-in.o:(.bss+0xc): multiple definition of `saved_command_line' init/built-in.o:(.bss+0xc): first defined here init/built-in.o: In function `calibrate_delay': (.text+0x2e0): multiple definition of `calibrate_delay' init/built-in.o:(.text+0x2e0): first defined here init/built-in.o: In function `do_one_initcall': (.init.text+0x57a): multiple definition of `do_one_initcall' init/built-in.o:(.init.text+0x57a): first defined here init/built-in.o:(.rodata+0x20): multiple definition of `linux_proc_banner' init/built-in.o:(.rodata+0x20): first defined here init/built-in.o:(.init.data+0x20e4): multiple definition of `rd_doload' init/built-in.o:(.init.data+0x20e4): first defined here init/built-in.o:(.data+0x620): multiple definition of `init_task' init/built-in.o:(.data+0x620): first defined here init/built-in.o:(.data+0x460): multiple definition of `init_uts_ns' init/built-in.o:(.data+0x460): first defined here init/built-in.o:(.data+0x20): multiple definition of `envp_init' init/built-in.o:(.data+0x20): first defined here init/built-in.o: In function `name_to_dev_t': (.text+0x70): multiple definition of `name_to_dev_t' init/built-in.o:(.text+0x70): first defined here init/built-in.o:(.data+0x618): multiple definition of `root_mountflags' init/built-in.o:(.data+0x618): first defined here init/built-in.o:(.bss+0x8): multiple definition of `static_key_initialized' init/built-in.o:(.bss+0x8): first defined here init/built-in.o: In function `start_kernel': (.init.text+0x351): multiple definition of `start_kernel' init/built-in.o:(.init.text+0x351): first defined here >> init/built-in.o:(.bss+0x30): multiple definition of `Version_263936' init/built-in.o:(.bss+0x30): first defined here init/built-in.o: In function `parse_early_options': (.init.text+0x2f9): multiple definition of `parse_early_options' init/built-in.o:(.init.text+0x2f9): first defined here init/built-in.o: In function `parse_early_param': (.init.text+0x31a): multiple definition of `parse_early_param' init/built-in.o:(.init.text+0x31a): first defined here init/built-in.o:(.bss+0x40): multiple definition of `preset_lpj' init/built-in.o:(.bss+0x40): first defined here init/built-in.o:(.data..init_task+0x0): multiple definition of `init_thread_union' init/built-in.o:(.data..init_task+0x0): first defined here init/built-in.o: In function `init_rootfs': (.init.text+0x80a): multiple definition of `init_rootfs' init/built-in.o:(.init.text+0x80a): first defined here init/built-in.o:(.rodata+0x80): multiple definition of `linux_banner' init/built-in.o:(.rodata+0x80): first defined here init/built-in.o:(.bss+0x34): multiple definition of `ROOT_DEV' init/built-in.o:(.bss+0x34): first defined here init/built-in.o:(.bss+0x0): multiple definition of `initcall_debug' init/built-in.o:(.bss+0x0): first defined here init/built-in.o:(.init.data+0x1080): multiple definition of `boot_command_line' init/built-in.o:(.init.data+0x1080): first defined here init/built-in.o:(.bss+0x44): multiple definition of `lpj_fine' init/built-in.o:(.bss+0x44): first defined here collect2: error: ld returned 1 exit status --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation [-- Attachment #2: .config.gz --] [-- Type: application/octet-stream, Size: 3843 bytes --] ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r 2016-08-05 12:11 ` [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r Nicholas Piggin 2016-08-06 3:50 ` kbuild test robot @ 2016-08-06 20:10 ` Sam Ravnborg 2016-08-07 1:49 ` Stephen Rothwell 2016-08-08 3:25 ` Nicholas Piggin 1 sibling, 2 replies; 43+ messages in thread From: Sam Ravnborg @ 2016-08-06 20:10 UTC (permalink / raw) To: Nicholas Piggin Cc: linux-kbuild, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra Hi Nicholas. On Fri, Aug 05, 2016 at 10:11:59PM +1000, Nicholas Piggin wrote: > From: Stephen Rothwell <sfr@canb.auug.org.au> > > ld -r is an incremental link used to create built-in.o files in build > subdirectories. It produces relocatable object files containing all > its input files, and these are are then pulled together and relocated > in the final link. Aside from the bloat, this constrains the final > link relocations, which has bitten large powerpc builds with > unresolvable relocations in the final link. > > Alan Modra has recommended the kernel use thin archives for linking. > This is an alternative and means that the linker has more information > available to it when it links the kernel. > > This patch enables a config option architectures can select, If we want to do this, then I suggest to make the logic reverse. Architectures that for some reasons cannot use this should have the possibility to avoid it. But let it be enabled by default. > which > causes all built-in.o files to be built as thin archives. built-in.o > files in subdirectories do not get symbol table or index attached, > which improves speed and size. The final link pass creates a > built-in.o archive in the root output directory which includes the > symbol table and index. The linker then uses takes this file to link. > > The --whole-archive linker option is required, because the linker now > has visibility to every individual object file, and it will otherwise > just completely avoid including those without external references > (consider a file with EXPORT_SYMBOL or initcall or hardware exceptions > as its only entry points). The traditional built works "by luck" as > built-in.o files are large enough that they're going to get external > references. However this optimisation is unpredictable for the kernel > (due to above external references), ineffective at culling unused, and > costly because the .o files have to be searched for references. > Superior alternatives for link-time culling should be used instead. > > Build characteristics for inclink vs thinarc, on a small powerpc64le > pseries VM with a modest .config: > > inclink thinarc > sizes > vmlinux 15 618 680 15 625 028 > sum of all built-in.o 56 091 808 1 054 334 > sum excluding root built-in.o 151 430 > > find -name built-in.o | xargs rm ; time make vmlinux > real 22.772s 21.143s > user 13.280s 13.430s > sys 4.310s 2.750s > > - Final kernel pulled in only about 6K more, which shows how > ineffective the object file culling is. > - Build performance looks improved due to less pagecache activity. > On IO constrained systems it could be a bigger win. > - Build size saving is significant. Good to see this old proposal picked up again! Did you by any chance evalue the use of INPUT in linker files. Stephen back then (again based on proposal from Alan Modra), also made an implementation using INPUT. See below for an updated simple patch on top of mainline. Build statistics for "make defconfig" on my i7 box: find -name built-in.o; xargs rm; time make -j16 vmlinux standard singlelink delta real 0m6.368s 0m7.040s +672ms user 0m15.577s 0m14.960s -617ms sys 0m7.601s 0m6.226s -1375ms vmlinux size: standard singlelink delta text 10.250.675 10.250.675 0 data 4.369.632 4.374.816 +5184 bss 1.110.016 1.110.016 0 I had expected to see improvements in build time - but we serialize the heavy link phase, so it is actually slower. I did not investigate why data section got larger, but I think you already touch the reasons. The patch does not change how we link modules. Please consider if this approach is better / worse than using archieves. Note that this patch remove the possibility to run section mismatch anylysis on a per-directory basis. Sam diff --git a/scripts/Makefile.build b/scripts/Makefile.build index 11602e5..954e7cb 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -360,10 +360,9 @@ $(sort $(subdir-obj-y)): $(subdir-ym) ; ifdef builtin-target quiet_cmd_link_o_target = LD $@ # If the list of objects to link is empty, just create an empty built-in.o -cmd_link_o_target = $(if $(strip $(obj-y)),\ - $(LD) $(ld_flags) -r -o $@ $(filter $(obj-y), $^) \ - $(cmd_secanalysis),\ - rm -f $@; $(AR) rcs$(KBUILD_ARFLAGS) $@) +cmd_link_o_target = $(if $(filter $(obj-y), $^), \ + echo INPUT\($(filter $(obj-y), $^)\) > $@, \ + echo "/* empty */" > $@) $(builtin-target): $(obj-y) FORCE $(call if_changed,link_o_target) @@ -414,10 +413,10 @@ $($(subst $(obj)/,,$(@:.o=-y))) \ $($(subst $(obj)/,,$(@:.o=-m)))), $^) quiet_cmd_link_multi-y = LD $@ -cmd_link_multi-y = $(LD) $(ld_flags) -r -o $@ $(link_multi_deps) $(cmd_secanalysis) +cmd_link_multi-y = echo INPUT\($(link_multi_deps)\) > $@ quiet_cmd_link_multi-m = LD [M] $@ -cmd_link_multi-m = $(cmd_link_multi-y) +cmd_link_multi-m = $(LD) $(ld_flags) -r -o $@ $(link_multi_deps) $(multi-used-y): FORCE $(call if_changed,link_multi-y) ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r 2016-08-06 20:10 ` Sam Ravnborg @ 2016-08-07 1:49 ` Stephen Rothwell 2016-08-07 3:34 ` Alan Modra ` (2 more replies) 2016-08-08 3:25 ` Nicholas Piggin 1 sibling, 3 replies; 43+ messages in thread From: Stephen Rothwell @ 2016-08-07 1:49 UTC (permalink / raw) To: Sam Ravnborg Cc: Nicholas Piggin, linux-kbuild, linux-arch, linuxppc-dev, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra Hi Sam, On Sat, 6 Aug 2016 22:10:45 +0200 Sam Ravnborg <sam@ravnborg.org> wrote: > > Did you by any chance evalue the use of INPUT in linker files. > Stephen back then (again based on proposal from Alan Modra), > also made an implementation using INPUT. The problem with that idea was that (at least for some versions of binutils in use at the time) we hit a static limit to the number of object files and ld just stopped at that point. :-( > See below for an updated simple patch on top of mainline. So, I guess it was fixed, but do we know in what version? -- Cheers, Stephen Rothwell ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r 2016-08-07 1:49 ` Stephen Rothwell @ 2016-08-07 3:34 ` Alan Modra 2016-08-07 4:17 ` Nicolas Pitre 2016-08-07 14:40 ` Sam Ravnborg 2 siblings, 0 replies; 43+ messages in thread From: Alan Modra @ 2016-08-07 3:34 UTC (permalink / raw) To: Stephen Rothwell Cc: Sam Ravnborg, Nicholas Piggin, linux-kbuild, linux-arch, linuxppc-dev, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool On Sun, Aug 07, 2016 at 11:49:46AM +1000, Stephen Rothwell wrote: > Hi Sam, > > On Sat, 6 Aug 2016 22:10:45 +0200 Sam Ravnborg <sam@ravnborg.org> wrote: > > > > Did you by any chance evalue the use of INPUT in linker files. > > Stephen back then (again based on proposal from Alan Modra), > > also made an implementation using INPUT. > > The problem with that idea was that (at least for some versions of > binutils in use at the time) we hit a static limit to the number of > object files and ld just stopped at that point. :-( > > > See below for an updated simple patch on top of mainline. > > So, I guess it was fixed, but do we know in what version? I think the patch was https://sourceware.org/ml/binutils/2012-06/msg00201.html So, you need binutils 2.23 or later. -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r 2016-08-07 1:49 ` Stephen Rothwell 2016-08-07 3:34 ` Alan Modra @ 2016-08-07 4:17 ` Nicolas Pitre 2016-08-07 14:40 ` Sam Ravnborg 2 siblings, 0 replies; 43+ messages in thread From: Nicolas Pitre @ 2016-08-07 4:17 UTC (permalink / raw) To: Stephen Rothwell Cc: Sam Ravnborg, Nicholas Piggin, linux-kbuild, linux-arch, linuxppc-dev, Arnd Bergmann, Segher Boessenkool, Alan Modra On Sun, 7 Aug 2016, Stephen Rothwell wrote: > Hi Sam, > > On Sat, 6 Aug 2016 22:10:45 +0200 Sam Ravnborg <sam@ravnborg.org> wrote: > > > > Did you by any chance evalue the use of INPUT in linker files. > > Stephen back then (again based on proposal from Alan Modra), > > also made an implementation using INPUT. > > The problem with that idea was that (at least for some versions of > binutils in use at the time) we hit a static limit to the number of > object files and ld just stopped at that point. :-( I played with a nearly identical patch to work around the inability to do LTO and 'ld -r' with mainline binutils. However this also requires major changes to modpost otherwise it becomes ineffective. Nicolas ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r 2016-08-07 1:49 ` Stephen Rothwell 2016-08-07 3:34 ` Alan Modra 2016-08-07 4:17 ` Nicolas Pitre @ 2016-08-07 14:40 ` Sam Ravnborg 2016-08-08 3:19 ` Nicholas Piggin 2 siblings, 1 reply; 43+ messages in thread From: Sam Ravnborg @ 2016-08-07 14:40 UTC (permalink / raw) To: Stephen Rothwell Cc: Nicholas Piggin, linux-kbuild, linux-arch, linuxppc-dev, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra On Sun, Aug 07, 2016 at 11:49:46AM +1000, Stephen Rothwell wrote: > Hi Sam, > > On Sat, 6 Aug 2016 22:10:45 +0200 Sam Ravnborg <sam@ravnborg.org> wrote: > > > > Did you by any chance evalue the use of INPUT in linker files. > > Stephen back then (again based on proposal from Alan Modra), > > also made an implementation using INPUT. > > The problem with that idea was that (at least for some versions of > binutils in use at the time) we hit a static limit to the number of > object files and ld just stopped at that point. :-( The ld bug was caused by opening too many linked definitions files. We can workaround this by expanding the files. I gave this a quick spin - see below. Note - I have no idea if using thin archived or this method is better. But it seems just wrong to me that we convert to thin archives when we really do not need to do so. Note - this was a quick spin. It build fine here and thats it. Sam diff --git a/scripts/Makefile.build b/scripts/Makefile.build index 11602e5..37b8bc9 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -360,10 +360,16 @@ $(sort $(subdir-obj-y)): $(subdir-ym) ; ifdef builtin-target quiet_cmd_link_o_target = LD $@ # If the list of objects to link is empty, just create an empty built-in.o -cmd_link_o_target = $(if $(strip $(obj-y)),\ - $(LD) $(ld_flags) -r -o $@ $(filter $(obj-y), $^) \ - $(cmd_secanalysis),\ - rm -f $@; $(AR) rcs$(KBUILD_ARFLAGS) $@) +cmd_link_o_target = \ +$(if $(filter $(obj-y), $^), \ + echo $(foreach file, $(filter $(obj-y), $^), \ + $(if $(filter %/built-in.o, $(file)), \ + $(shell cat $(file)), \ + $(file) \ + ) \ + ) \ + , echo "" \ +) > $@ $(builtin-target): $(obj-y) FORCE $(call if_changed,link_o_target) @@ -414,10 +420,10 @@ $($(subst $(obj)/,,$(@:.o=-y))) \ $($(subst $(obj)/,,$(@:.o=-m)))), $^) quiet_cmd_link_multi-y = LD $@ -cmd_link_multi-y = $(LD) $(ld_flags) -r -o $@ $(link_multi_deps) $(cmd_secanalysis) +cmd_link_multi-y = echo INPUT\($(link_multi_deps)\) > $@ quiet_cmd_link_multi-m = LD [M] $@ -cmd_link_multi-m = $(cmd_link_multi-y) +cmd_link_multi-m = $(LD) $(ld_flags) -r -o $@ $(link_multi_deps) $(multi-used-y): FORCE $(call if_changed,link_multi-y) diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index 4f727eb..323c13a 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -41,8 +41,8 @@ info() # ${1} output file modpost_link() { - ${LD} ${LDFLAGS} -r -o ${1} ${KBUILD_VMLINUX_INIT} \ - --start-group ${KBUILD_VMLINUX_MAIN} --end-group + ${LD} ${LDFLAGS} -r -o ${1} ${vmlinux_init} \ + --start-group ${vmlinux_main} --end-group } # Link of vmlinux @@ -54,13 +54,13 @@ vmlinux_link() if [ "${SRCARCH}" != "um" ]; then ${LD} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2} \ - -T ${lds} ${KBUILD_VMLINUX_INIT} \ - --start-group ${KBUILD_VMLINUX_MAIN} --end-group ${1} + -T ${lds} ${vmlinux_init} \ + --start-group ${vmlinux_main} --end-group ${1} else ${CC} ${CFLAGS_vmlinux} -o ${2} \ - -Wl,-T,${lds} ${KBUILD_VMLINUX_INIT} \ + -Wl,-T,${lds} ${vmlinux_init} \ -Wl,--start-group \ - ${KBUILD_VMLINUX_MAIN} \ + ${vmlinux_main} \ -Wl,--end-group \ -lutil -lrt -lpthread ${1} rm -f linux @@ -162,6 +162,23 @@ case "${KCONFIG_CONFIG}" in . "./${KCONFIG_CONFIG}" esac +# Expand built-in.o file as they just list .o files +for f in ${KBUILD_VMLINUX_INIT}; do + if [ $(basename $f) = built-in.o ]; then + vmlinux_init="${vmlinux_init} $(cat $f)" + else + vmlinux_init="${vmlinux_init} $f" + fi +done + +for f in ${KBUILD_VMLINUX_MAIN}; do + if [ $(basename $f) = built-in.o ]; then + vmlinux_main="${vmlinux_main} $(cat $f)" + else + vmlinux_main="${vmlinux_main} $f" + fi +done + #link vmlinux.o info LD vmlinux.o modpost_link vmlinux.o ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r 2016-08-07 14:40 ` Sam Ravnborg @ 2016-08-08 3:19 ` Nicholas Piggin 2016-08-08 4:46 ` Sam Ravnborg 0 siblings, 1 reply; 43+ messages in thread From: Nicholas Piggin @ 2016-08-08 3:19 UTC (permalink / raw) To: Sam Ravnborg Cc: Stephen Rothwell, linux-kbuild, linux-arch, linuxppc-dev, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra On Sun, 7 Aug 2016 16:40:54 +0200 Sam Ravnborg <sam@ravnborg.org> wrote: > On Sun, Aug 07, 2016 at 11:49:46AM +1000, Stephen Rothwell wrote: > > Hi Sam, > > > > On Sat, 6 Aug 2016 22:10:45 +0200 Sam Ravnborg <sam@ravnborg.org> wrote: > > > > > > Did you by any chance evalue the use of INPUT in linker files. > > > Stephen back then (again based on proposal from Alan Modra), > > > also made an implementation using INPUT. > > > > The problem with that idea was that (at least for some versions of > > binutils in use at the time) we hit a static limit to the number of > > object files and ld just stopped at that point. :-( > > The ld bug was caused by opening too many linked definitions files. > We can workaround this by expanding the files. > I gave this a quick spin - see below. > > Note - I have no idea if using thin archived or this method is better. > But it seems just wrong to me that we convert to thin archives when > we really do not need to do so. > > Note - this was a quick spin. It build fine here and thats it. Is there a reason to prefer using linker scripts rather than thin archives? I thought the former was possibly a bit less robust, and the latter a smaller change for scripts and toolchain in terms of "almost behaving like an object file". I don't have a strong preference although do have a couple of (out of tree) scripts that expect objdump to work on built-in.o Thanks, Nick ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r 2016-08-08 3:19 ` Nicholas Piggin @ 2016-08-08 4:46 ` Sam Ravnborg 0 siblings, 0 replies; 43+ messages in thread From: Sam Ravnborg @ 2016-08-08 4:46 UTC (permalink / raw) To: Nicholas Piggin Cc: Stephen Rothwell, linux-kbuild, linux-arch, linuxppc-dev, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra On Mon, Aug 08, 2016 at 01:19:41PM +1000, Nicholas Piggin wrote: > On Sun, 7 Aug 2016 16:40:54 +0200 > Sam Ravnborg <sam@ravnborg.org> wrote: > > > On Sun, Aug 07, 2016 at 11:49:46AM +1000, Stephen Rothwell wrote: > > > Hi Sam, > > > > > > On Sat, 6 Aug 2016 22:10:45 +0200 Sam Ravnborg <sam@ravnborg.org> wrote: > > > > > > > > Did you by any chance evalue the use of INPUT in linker files. > > > > Stephen back then (again based on proposal from Alan Modra), > > > > also made an implementation using INPUT. > > > > > > The problem with that idea was that (at least for some versions of > > > binutils in use at the time) we hit a static limit to the number of > > > object files and ld just stopped at that point. :-( > > > > The ld bug was caused by opening too many linked definitions files. > > We can workaround this by expanding the files. > > I gave this a quick spin - see below. > > > > Note - I have no idea if using thin archived or this method is better. > > But it seems just wrong to me that we convert to thin archives when > > we really do not need to do so. > > > > Note - this was a quick spin. It build fine here and thats it. > > Is there a reason to prefer using linker scripts rather than thin > archives? I thought the former was possibly a bit less robust, and > the latter a smaller change for scripts and toolchain in terms > of "almost behaving like an object file". The only valid reason I can come up with is that it is simpler to build a list of .o files than it is to generate a number of thin archives. I persuaded "linker scripts" mainly because I had the patch floating and I was triggered by the powerpc discussions to resurface it again. > I don't have a strong preference although do have a couple of > (out of tree) scripts that expect objdump to work on built-in.o You are likely not alone here. Unless someone else comes up with any good reason lets stay with thin archives. Sam ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r 2016-08-06 20:10 ` Sam Ravnborg 2016-08-07 1:49 ` Stephen Rothwell @ 2016-08-08 3:25 ` Nicholas Piggin 2016-08-08 9:18 ` Arnd Bergmann 1 sibling, 1 reply; 43+ messages in thread From: Nicholas Piggin @ 2016-08-08 3:25 UTC (permalink / raw) To: Sam Ravnborg Cc: linux-kbuild, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra On Sat, 6 Aug 2016 22:10:45 +0200 Sam Ravnborg <sam@ravnborg.org> wrote: > Hi Nicholas. > > On Fri, Aug 05, 2016 at 10:11:59PM +1000, Nicholas Piggin wrote: > > From: Stephen Rothwell <sfr@canb.auug.org.au> > > > > ld -r is an incremental link used to create built-in.o files in build > > subdirectories. It produces relocatable object files containing all > > its input files, and these are are then pulled together and relocated > > in the final link. Aside from the bloat, this constrains the final > > link relocations, which has bitten large powerpc builds with > > unresolvable relocations in the final link. > > > > Alan Modra has recommended the kernel use thin archives for linking. > > This is an alternative and means that the linker has more information > > available to it when it links the kernel. > > > > This patch enables a config option architectures can select, > If we want to do this, then I suggest to make the logic reverse. > Architectures that for some reasons cannot use this should > have the possibility to avoid it. But let it be enabled by default. I was thinking the build matrix (architectures x build options x toolchains) is a bit too large to switch it for everybody. I've far from even tested it for a fraction of powerpc builds. I would prefer arch maintainers to switch it themselves, but I do hope we can move everybody and just remove the old method within a few releases. But I'm happy to go with whatever arch and kbuild maintainers prefer, so I appreciate any discussion on it. Thanks, Nick ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r 2016-08-08 3:25 ` Nicholas Piggin @ 2016-08-08 9:18 ` Arnd Bergmann 0 siblings, 0 replies; 43+ messages in thread From: Arnd Bergmann @ 2016-08-08 9:18 UTC (permalink / raw) To: Nicholas Piggin Cc: Sam Ravnborg, linux-kbuild, linux-arch, linuxppc-dev, Stephen Rothwell, Nicolas Pitre, Segher Boessenkool, Alan Modra On Monday, August 8, 2016 1:25:32 PM CEST Nicholas Piggin wrote: > On Sat, 6 Aug 2016 22:10:45 +0200 > Sam Ravnborg <sam@ravnborg.org> wrote: > > > Hi Nicholas. > > > > On Fri, Aug 05, 2016 at 10:11:59PM +1000, Nicholas Piggin wrote: > > > From: Stephen Rothwell <sfr@canb.auug.org.au> > > > > > > ld -r is an incremental link used to create built-in.o files in build > > > subdirectories. It produces relocatable object files containing all > > > its input files, and these are are then pulled together and relocated > > > in the final link. Aside from the bloat, this constrains the final > > > link relocations, which has bitten large powerpc builds with > > > unresolvable relocations in the final link. > > > > > > Alan Modra has recommended the kernel use thin archives for linking. > > > This is an alternative and means that the linker has more information > > > available to it when it links the kernel. > > > > > > This patch enables a config option architectures can select, > > If we want to do this, then I suggest to make the logic reverse. > > Architectures that for some reasons cannot use this should > > have the possibility to avoid it. But let it be enabled by default. > > I was thinking the build matrix (architectures x build options x toolchains) > is a bit too large to switch it for everybody. I've far from even tested it > for a fraction of powerpc builds. I would prefer arch maintainers to switch > it themselves, but I do hope we can move everybody and just remove the old > method within a few releases. > > But I'm happy to go with whatever arch and kbuild maintainers prefer, so I > appreciate any discussion on it. How about making the option visible for everyone with CONFIG_EXPERT? That way you don't get it by default, but you do get it with randconfig and allmodconfig build testing. Arnd ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-05 12:11 [RFC][PATCH 0/5] kbuild changes, thin archives, --gc-sections Nicholas Piggin 2016-08-05 12:11 ` [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r Nicholas Piggin @ 2016-08-05 12:12 ` Nicholas Piggin 2016-08-06 20:14 ` Sam Ravnborg ` (2 more replies) 2016-08-05 12:12 ` [PATCH 3/5] kbuild: add arch specific post-module-link pass Nicholas Piggin ` (4 subsequent siblings) 6 siblings, 3 replies; 43+ messages in thread From: Nicholas Piggin @ 2016-08-05 12:12 UTC (permalink / raw) To: linux-kbuild Cc: Nicholas Piggin, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra Introduce LINKER_DCE option for architectures to select if they want to build with -ffunction-sections, -fdata-sections, and link with --gc-sections. It requires some work (documented) to ensure all unreferenced entrypoints are live, and requires toolchain and build verification, so it is made a per-arch option for now. On a random powerpc64le build, this yelds a significant size saving, it boots and runs fine, but there is a lot I haven't tested as yet, so these savings may be reduced if there are bugs in the link. text data bss dec filename 11169741 1180744 1923176 14273661 vmlinux 10445269 1004127 1919707 13369103 vmlinux.dce ~700K text, ~170K data, 6% removed from kernel image size. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- Makefile | 10 ++++++++ arch/Kconfig | 13 ++++++++++ include/asm-generic/vmlinux.lds.h | 52 ++++++++++++++++++++++----------------- include/linux/compiler.h | 18 ++++++++++++++ include/linux/export.h | 30 +++++++++++----------- include/linux/init.h | 38 ++++++++++------------------ init/Makefile | 2 ++ 7 files changed, 100 insertions(+), 63 deletions(-) diff --git a/Makefile b/Makefile index b409076..d5ef31a 100644 --- a/Makefile +++ b/Makefile @@ -618,6 +618,11 @@ include arch/$(SRCARCH)/Makefile KBUILD_CFLAGS += $(call cc-option,-fno-delete-null-pointer-checks,) +ifdef CONFIG_LINKER_DCE +KBUILD_CFLAGS += $(call cc-option,-ffunction-sections,) +KBUILD_CFLAGS += $(call cc-option,-fdata-sections,) +endif + ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE KBUILD_CFLAGS += -Os $(call cc-disable-warning,maybe-uninitialized,) else @@ -819,6 +824,11 @@ LDFLAGS_BUILD_ID = $(patsubst -Wl$(comma)%,%,\ KBUILD_LDFLAGS_MODULE += $(LDFLAGS_BUILD_ID) LDFLAGS_vmlinux += $(LDFLAGS_BUILD_ID) +ifdef CONFIG_LINKER_DCE +# LDFLAGS_MODULE += $(call ld-option, --gc-sections,) +LDFLAGS_vmlinux += $(call ld-option, --gc-sections,) +endif + ifeq ($(CONFIG_STRIP_ASM_SYMS),y) LDFLAGS_vmlinux += $(call ld-option, -X,) endif diff --git a/arch/Kconfig b/arch/Kconfig index 1330bf4..a49092b 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -430,6 +430,19 @@ config THIN_ARCHIVES Select this if the architecture wants to use thin archives instead of ld -r to create the built-in.o files. +config LINKER_DCE + bool + help + Select this if the architecture wants to do dead code and + data elimination with the linker by compiling with + -ffunction-sections -fdata-sections and linking with + --gc-sections. + + This requires that the arch annotates or otherwise protects + its external entry points from being discarded. Linker scripts + must also merge .text.*, .data.*, and .bss.* correctly into + output sections. + config HAVE_CONTEXT_TRACKING bool help diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 6a67ab9..a66ffe9 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -196,9 +196,14 @@ *(.dtb.init.rodata) \ VMLINUX_SYMBOL(__dtb_end) = .; -/* .data section */ +/* + * .data section + * -fdata-sections generates .data.identifier which needs to be pulled in + * with .data, but don't want to pull in .data..stuff which has its own + * requirements. Same for bss. + */ #define DATA_DATA \ - *(.data) \ + *(.data .data.[0-9a-zA-Z_]*) \ *(.ref.data) \ *(.data..shared_aligned) /* percpu related */ \ MEM_KEEP(init.data) \ @@ -312,76 +317,76 @@ /* Kernel symbol table: Normal symbols */ \ __ksymtab : AT(ADDR(__ksymtab) - LOAD_OFFSET) { \ VMLINUX_SYMBOL(__start___ksymtab) = .; \ - *(SORT(___ksymtab+*)) \ + KEEP(*(SORT(___ksymtab+*))) \ VMLINUX_SYMBOL(__stop___ksymtab) = .; \ } \ \ /* Kernel symbol table: GPL-only symbols */ \ __ksymtab_gpl : AT(ADDR(__ksymtab_gpl) - LOAD_OFFSET) { \ VMLINUX_SYMBOL(__start___ksymtab_gpl) = .; \ - *(SORT(___ksymtab_gpl+*)) \ + KEEP(*(SORT(___ksymtab_gpl+*))) \ VMLINUX_SYMBOL(__stop___ksymtab_gpl) = .; \ } \ \ /* Kernel symbol table: Normal unused symbols */ \ __ksymtab_unused : AT(ADDR(__ksymtab_unused) - LOAD_OFFSET) { \ VMLINUX_SYMBOL(__start___ksymtab_unused) = .; \ - *(SORT(___ksymtab_unused+*)) \ + KEEP(*(SORT(___ksymtab_unused+*))) \ VMLINUX_SYMBOL(__stop___ksymtab_unused) = .; \ } \ \ /* Kernel symbol table: GPL-only unused symbols */ \ __ksymtab_unused_gpl : AT(ADDR(__ksymtab_unused_gpl) - LOAD_OFFSET) { \ VMLINUX_SYMBOL(__start___ksymtab_unused_gpl) = .; \ - *(SORT(___ksymtab_unused_gpl+*)) \ + KEEP(*(SORT(___ksymtab_unused_gpl+*))) \ VMLINUX_SYMBOL(__stop___ksymtab_unused_gpl) = .; \ } \ \ /* Kernel symbol table: GPL-future-only symbols */ \ __ksymtab_gpl_future : AT(ADDR(__ksymtab_gpl_future) - LOAD_OFFSET) { \ VMLINUX_SYMBOL(__start___ksymtab_gpl_future) = .; \ - *(SORT(___ksymtab_gpl_future+*)) \ + KEEP(*(SORT(___ksymtab_gpl_future+*))) \ VMLINUX_SYMBOL(__stop___ksymtab_gpl_future) = .; \ } \ \ /* Kernel symbol table: Normal symbols */ \ __kcrctab : AT(ADDR(__kcrctab) - LOAD_OFFSET) { \ VMLINUX_SYMBOL(__start___kcrctab) = .; \ - *(SORT(___kcrctab+*)) \ + KEEP(*(SORT(___kcrctab+*))) \ VMLINUX_SYMBOL(__stop___kcrctab) = .; \ } \ \ /* Kernel symbol table: GPL-only symbols */ \ __kcrctab_gpl : AT(ADDR(__kcrctab_gpl) - LOAD_OFFSET) { \ VMLINUX_SYMBOL(__start___kcrctab_gpl) = .; \ - *(SORT(___kcrctab_gpl+*)) \ + KEEP(*(SORT(___kcrctab_gpl+*))) \ VMLINUX_SYMBOL(__stop___kcrctab_gpl) = .; \ } \ \ /* Kernel symbol table: Normal unused symbols */ \ __kcrctab_unused : AT(ADDR(__kcrctab_unused) - LOAD_OFFSET) { \ VMLINUX_SYMBOL(__start___kcrctab_unused) = .; \ - *(SORT(___kcrctab_unused+*)) \ + KEEP(*(SORT(___kcrctab_unused+*))) \ VMLINUX_SYMBOL(__stop___kcrctab_unused) = .; \ } \ \ /* Kernel symbol table: GPL-only unused symbols */ \ __kcrctab_unused_gpl : AT(ADDR(__kcrctab_unused_gpl) - LOAD_OFFSET) { \ VMLINUX_SYMBOL(__start___kcrctab_unused_gpl) = .; \ - *(SORT(___kcrctab_unused_gpl+*)) \ + KEEP(*(SORT(___kcrctab_unused_gpl+*))) \ VMLINUX_SYMBOL(__stop___kcrctab_unused_gpl) = .; \ } \ \ /* Kernel symbol table: GPL-future-only symbols */ \ __kcrctab_gpl_future : AT(ADDR(__kcrctab_gpl_future) - LOAD_OFFSET) { \ VMLINUX_SYMBOL(__start___kcrctab_gpl_future) = .; \ - *(SORT(___kcrctab_gpl_future+*)) \ + KEEP(*(SORT(___kcrctab_gpl_future+*))) \ VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .; \ } \ \ /* Kernel symbol table: strings */ \ __ksymtab_strings : AT(ADDR(__ksymtab_strings) - LOAD_OFFSET) { \ - *(__ksymtab_strings) \ + KEEP(*(__ksymtab_strings)) \ } \ \ /* __*init sections */ \ @@ -416,7 +421,7 @@ #define SECURITY_INIT \ .security_initcall.init : AT(ADDR(.security_initcall.init) - LOAD_OFFSET) { \ VMLINUX_SYMBOL(__security_initcall_start) = .; \ - *(.security_initcall.init) \ + KEEP(*(.security_initcall.init)) \ VMLINUX_SYMBOL(__security_initcall_end) = .; \ } @@ -424,7 +429,7 @@ * during second ld run in second ld pass when generating System.map */ #define TEXT_TEXT \ ALIGN_FUNCTION(); \ - *(.text.hot .text .text.fixup .text.unlikely) \ + *(.text.hot .text .text.fixup .text.unlikely .text.*) \ *(.ref.text) \ MEM_KEEP(init.text) \ MEM_KEEP(exit.text) \ @@ -519,6 +524,7 @@ /* init and exit section handling */ #define INIT_DATA \ + KEEP(*(SORT(___kentry+*))) \ *(.init.data) \ MEM_DISCARD(init.data) \ KERNEL_CTORS() \ @@ -581,7 +587,7 @@ BSS_FIRST_SECTIONS \ *(.bss..page_aligned) \ *(.dynbss) \ - *(.bss) \ + *(.bss .bss.[0-9a-zA-Z_]*) \ *(COMMON) \ } @@ -664,12 +670,12 @@ #define INIT_CALLS_LEVEL(level) \ VMLINUX_SYMBOL(__initcall##level##_start) = .; \ - *(.initcall##level##.init) \ - *(.initcall##level##s.init) \ + KEEP(*(.initcall##level##.init)) \ + KEEP(*(.initcall##level##s.init)) \ #define INIT_CALLS \ VMLINUX_SYMBOL(__initcall_start) = .; \ - *(.initcallearly.init) \ + KEEP(*(.initcallearly.init)) \ INIT_CALLS_LEVEL(0) \ INIT_CALLS_LEVEL(1) \ INIT_CALLS_LEVEL(2) \ @@ -683,21 +689,21 @@ #define CON_INITCALL \ VMLINUX_SYMBOL(__con_initcall_start) = .; \ - *(.con_initcall.init) \ + KEEP(*(.con_initcall.init)) \ VMLINUX_SYMBOL(__con_initcall_end) = .; #define SECURITY_INITCALL \ VMLINUX_SYMBOL(__security_initcall_start) = .; \ - *(.security_initcall.init) \ + KEEP(*(.security_initcall.init)) \ VMLINUX_SYMBOL(__security_initcall_end) = .; #ifdef CONFIG_BLK_DEV_INITRD #define INIT_RAM_FS \ . = ALIGN(4); \ VMLINUX_SYMBOL(__initramfs_start) = .; \ - *(.init.ramfs) \ + KEEP(*(.init.ramfs)) \ . = ALIGN(8); \ - *(.init.ramfs.info) + KEEP(*(.init.ramfs.info)) #else #define INIT_RAM_FS #endif diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 793c082..8404d93 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -184,6 +184,24 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect); # define unreachable() do { } while (1) #endif +/* + * KENTRY - kernel entry point + * This can be used to annotate symbols (functions or data) that are used + * without their linker symbol being referenced. For example, interrupt vector + * handlers, or functions in the kernel image that are found programatically. + * This can be avoided if the symbols are marked as KEEP() in the linker script. + * For example the architecture could KEEP() its entire exception vector + * code rather than annotate each one. + */ +#ifndef KENTRY +# define KENTRY(sym) \ + extern typeof(sym) sym; \ + static const unsigned long __kentry_##sym \ + __used \ + __attribute__((section("___kentry" "+" #sym ), used)) \ + = (unsigned long)&sym; +#endif + #ifndef RELOC_HIDE # define RELOC_HIDE(ptr, off) \ ({ unsigned long __ptr; \ diff --git a/include/linux/export.h b/include/linux/export.h index 2f9ccbe..0d1ccdd 100644 --- a/include/linux/export.h +++ b/include/linux/export.h @@ -1,5 +1,6 @@ #ifndef _LINUX_EXPORT_H #define _LINUX_EXPORT_H + /* * Export symbols from the kernel to modules. Forked from module.h * to reduce the amount of pointless cruft we feed to gcc when only @@ -42,27 +43,26 @@ extern struct module __this_module; #ifdef CONFIG_MODVERSIONS /* Mark the CRC weak since genksyms apparently decides not to * generate a checksums for some symbols */ -#define __CRC_SYMBOL(sym, sec) \ - extern __visible void *__crc_##sym __attribute__((weak)); \ - static const unsigned long __kcrctab_##sym \ - __used \ - __attribute__((section("___kcrctab" sec "+" #sym), unused)) \ +#define __CRC_SYMBOL(sym, sec) \ + extern __visible void *__crc_##sym __attribute__((weak)); \ + static const unsigned long __kcrctab_##sym \ + __used \ + __attribute__((section("___kcrctab" sec "+" #sym), used)) \ = (unsigned long) &__crc_##sym; #else #define __CRC_SYMBOL(sym, sec) #endif /* For every exported symbol, place a struct in the __ksymtab section */ -#define ___EXPORT_SYMBOL(sym, sec) \ - extern typeof(sym) sym; \ - __CRC_SYMBOL(sym, sec) \ - static const char __kstrtab_##sym[] \ - __attribute__((section("__ksymtab_strings"), aligned(1))) \ - = VMLINUX_SYMBOL_STR(sym); \ - extern const struct kernel_symbol __ksymtab_##sym; \ - __visible const struct kernel_symbol __ksymtab_##sym \ - __used \ - __attribute__((section("___ksymtab" sec "+" #sym), unused)) \ +#define ___EXPORT_SYMBOL(sym, sec) \ + extern typeof(sym) sym; \ + __CRC_SYMBOL(sym, sec) \ + static const char __kstrtab_##sym[] \ + __attribute__((section("__ksymtab_strings"), aligned(1))) \ + = VMLINUX_SYMBOL_STR(sym); \ + static const struct kernel_symbol __ksymtab_##sym \ + __used \ + __attribute__((section("___ksymtab" sec "+" #sym), used)) \ = { (unsigned long)&sym, __kstrtab_##sym } #if defined(__KSYM_DEPS__) diff --git a/include/linux/init.h b/include/linux/init.h index aedb254..9abd61f 100644 --- a/include/linux/init.h +++ b/include/linux/init.h @@ -156,24 +156,8 @@ extern bool initcall_debug; #ifndef __ASSEMBLY__ -#ifdef CONFIG_LTO -/* Work around a LTO gcc problem: when there is no reference to a variable - * in a module it will be moved to the end of the program. This causes - * reordering of initcalls which the kernel does not like. - * Add a dummy reference function to avoid this. The function is - * deleted by the linker. - */ -#define LTO_REFERENCE_INITCALL(x) \ - ; /* yes this is needed */ \ - static __used __exit void *reference_##x(void) \ - { \ - return &x; \ - } -#else -#define LTO_REFERENCE_INITCALL(x) -#endif - -/* initcalls are now grouped by functionality into separate +/* + * initcalls are now grouped by functionality into separate * subsections. Ordering inside the subsections is determined * by link order. * For backwards compatibility, initcall() puts the call in @@ -181,12 +165,16 @@ extern bool initcall_debug; * * The `id' arg to __define_initcall() is needed so that multiple initcalls * can point at the same handler without causing duplicate-symbol build errors. + * + * Initcalls are run by placing pointers in initcall sections that the + * kernel iterates at runtime. The linker can do dead code / data elimination + * and remove that completely, so the initcall sections have to be marked + * as KEEP() in the linker script. */ #define __define_initcall(fn, id) \ static initcall_t __initcall_##fn##id __used \ - __attribute__((__section__(".initcall" #id ".init"))) = fn; \ - LTO_REFERENCE_INITCALL(__initcall_##fn##id) + __attribute__((__section__(".initcall" #id ".init"))) = fn; /* * Early initcalls run before initializing SMP. @@ -222,15 +210,15 @@ extern bool initcall_debug; #define __initcall(fn) device_initcall(fn) -#define __exitcall(fn) \ +#define __exitcall(fn) \ static exitcall_t __exitcall_##fn __exit_call = fn -#define console_initcall(fn) \ - static initcall_t __initcall_##fn \ +#define console_initcall(fn) \ + static initcall_t __initcall_##fn \ __used __section(.con_initcall.init) = fn -#define security_initcall(fn) \ - static initcall_t __initcall_##fn \ +#define security_initcall(fn) \ + static initcall_t __initcall_##fn \ __used __section(.security_initcall.init) = fn struct obs_kernel_param { diff --git a/init/Makefile b/init/Makefile index 7bc47ee..c4fb455 100644 --- a/init/Makefile +++ b/init/Makefile @@ -2,6 +2,8 @@ # Makefile for the linux kernel. # +ccflags-y := -fno-function-sections -fno-data-sections + obj-y := main.o version.o mounts.o ifneq ($(CONFIG_BLK_DEV_INITRD),y) obj-y += noinitramfs.o -- 2.8.1 ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-05 12:12 ` [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination Nicholas Piggin @ 2016-08-06 20:14 ` Sam Ravnborg 2016-08-08 3:29 ` Nicholas Piggin 2016-08-07 5:33 ` Nicolas Pitre 2016-08-07 9:57 ` Alan Modra 2 siblings, 1 reply; 43+ messages in thread From: Sam Ravnborg @ 2016-08-06 20:14 UTC (permalink / raw) To: Nicholas Piggin Cc: linux-kbuild, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra On Fri, Aug 05, 2016 at 10:12:00PM +1000, Nicholas Piggin wrote: > Introduce LINKER_DCE option for architectures to select if they want > to build with -ffunction-sections, -fdata-sections, and link with > --gc-sections. Can you please try to come up with a less cryptic name. "DCE" may make sense for you today. Bot the naive reader will benefit from the longer and more explcit form. It requires some work (documented) to ensure all > unreferenced entrypoints are live, and requires toolchain and > build verification, so it is made a per-arch option for now. > > On a random powerpc64le build, this yelds a significant size saving, > it boots and runs fine, but there is a lot I haven't tested as yet, > so these savings may be reduced if there are bugs in the link. > > text data bss dec filename > 11169741 1180744 1923176 14273661 vmlinux > 10445269 1004127 1919707 13369103 vmlinux.dce > > ~700K text, ~170K data, 6% removed from kernel image size. > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> > --- > Makefile | 10 ++++++++ > arch/Kconfig | 13 ++++++++++ > include/asm-generic/vmlinux.lds.h | 52 ++++++++++++++++++++++----------------- > include/linux/compiler.h | 18 ++++++++++++++ > include/linux/export.h | 30 +++++++++++----------- > include/linux/init.h | 38 ++++++++++------------------ > init/Makefile | 2 ++ > 7 files changed, 100 insertions(+), 63 deletions(-) > > diff --git a/Makefile b/Makefile > index b409076..d5ef31a 100644 > --- a/Makefile > +++ b/Makefile > @@ -618,6 +618,11 @@ include arch/$(SRCARCH)/Makefile > > KBUILD_CFLAGS += $(call cc-option,-fno-delete-null-pointer-checks,) > > +ifdef CONFIG_LINKER_DCE > +KBUILD_CFLAGS += $(call cc-option,-ffunction-sections,) > +KBUILD_CFLAGS += $(call cc-option,-fdata-sections,) > +endif > + > ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE > KBUILD_CFLAGS += -Os $(call cc-disable-warning,maybe-uninitialized,) > else > @@ -819,6 +824,11 @@ LDFLAGS_BUILD_ID = $(patsubst -Wl$(comma)%,%,\ > KBUILD_LDFLAGS_MODULE += $(LDFLAGS_BUILD_ID) > LDFLAGS_vmlinux += $(LDFLAGS_BUILD_ID) > > +ifdef CONFIG_LINKER_DCE > +# LDFLAGS_MODULE += $(call ld-option, --gc-sections,) > +LDFLAGS_vmlinux += $(call ld-option, --gc-sections,) > +endif Something you missed to clean up Sam ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-06 20:14 ` Sam Ravnborg @ 2016-08-08 3:29 ` Nicholas Piggin 2016-08-08 4:49 ` Sam Ravnborg 0 siblings, 1 reply; 43+ messages in thread From: Nicholas Piggin @ 2016-08-08 3:29 UTC (permalink / raw) To: Sam Ravnborg Cc: linux-kbuild, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra On Sat, 6 Aug 2016 22:14:23 +0200 Sam Ravnborg <sam@ravnborg.org> wrote: > On Fri, Aug 05, 2016 at 10:12:00PM +1000, Nicholas Piggin wrote: > > Introduce LINKER_DCE option for architectures to select if they want > > to build with -ffunction-sections, -fdata-sections, and link with > > --gc-sections. > > Can you please try to come up with a less cryptic name. > "DCE" may make sense for you today. > Bot the naive reader will benefit from the longer and > more explcit form. Yes that's a good idea. The name sucks. We don't seem to consistently have a prefix for build configuration options, but we could start? KBUILD_LD_DEAD_CODE_DATA_ELIMINATION? Thanks, Nick ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-08 3:29 ` Nicholas Piggin @ 2016-08-08 4:49 ` Sam Ravnborg 0 siblings, 0 replies; 43+ messages in thread From: Sam Ravnborg @ 2016-08-08 4:49 UTC (permalink / raw) To: Nicholas Piggin Cc: linux-kbuild, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra On Mon, Aug 08, 2016 at 01:29:03PM +1000, Nicholas Piggin wrote: > On Sat, 6 Aug 2016 22:14:23 +0200 > Sam Ravnborg <sam@ravnborg.org> wrote: > > > On Fri, Aug 05, 2016 at 10:12:00PM +1000, Nicholas Piggin wrote: > > > Introduce LINKER_DCE option for architectures to select if they want > > > to build with -ffunction-sections, -fdata-sections, and link with > > > --gc-sections. > > > > Can you please try to come up with a less cryptic name. > > "DCE" may make sense for you today. > > Bot the naive reader will benefit from the longer and > > more explcit form. > > Yes that's a good idea. The name sucks. > > We don't seem to consistently have a prefix for build configuration > options, but we could start? KBUILD_LD_DEAD_CODE_DATA_ELIMINATION? For cc related options we sort of have: KCONFIG_CC IIRC. So for LD related options we could use the shorter CONFIG_LD_ prefix. Sam ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-05 12:12 ` [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination Nicholas Piggin 2016-08-06 20:14 ` Sam Ravnborg @ 2016-08-07 5:33 ` Nicolas Pitre 2016-08-08 3:42 ` Nicholas Piggin 2016-08-07 9:57 ` Alan Modra 2 siblings, 1 reply; 43+ messages in thread From: Nicolas Pitre @ 2016-08-07 5:33 UTC (permalink / raw) To: Nicholas Piggin Cc: linux-kbuild, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Segher Boessenkool, Alan Modra [-- Attachment #1: Type: text/plain, Size: 2200 bytes --] On Fri, 5 Aug 2016, Nicholas Piggin wrote: > Introduce LINKER_DCE option for architectures to select if they want > to build with -ffunction-sections, -fdata-sections, and link with > --gc-sections. It requires some work (documented) to ensure all > unreferenced entrypoints are live, and requires toolchain and > build verification, so it is made a per-arch option for now. > > On a random powerpc64le build, this yelds a significant size saving, > it boots and runs fine, but there is a lot I haven't tested as yet, > so these savings may be reduced if there are bugs in the link. > > text data bss dec filename > 11169741 1180744 1923176 14273661 vmlinux > 10445269 1004127 1919707 13369103 vmlinux.dce > > ~700K text, ~170K data, 6% removed from kernel image size. > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> I played with that too. However this needs distinct sections for exception tables and the like otherwise the backward references from the final exception table to those functions responsible for those exception entries has the effect of pulling in all those functions even if their entry point is never referenced, making --gc-sections less effective. I managed to fix this only with a change to gas (accepted upstream). But once that is solved, you then have the missing forward reference problem i.e. nothing actually references those individual exception entry sections and ld happily drops them all. Having a KEEP() on each of them is unworkable and defeats the purpose anyway. That requires a dummy reloc to trick ld into pulling in those sections when the parent section is also pulled in. Please see attached a subset of the slides I presented at ELC and Linaro Connect last year to illustrate those issues. Also attached a sample patch partially implementing those changes. In short I'm very glad to see that this might steer interest across multiple architectures. I felt like this was becoming much more intrusive than I expected and that maybe LTO was a better bet after all. But LTO has its evils too and I'm willing to look at gc-sections again if there is interest from others as well. Nicolas [-- Attachment #2: Type: application/gzip, Size: 28809 bytes --] [-- Attachment #3: Type: text/plain, Size: 13156 bytes --] commit 1d7ec46257dc546bc7b87439788514fc4650a2b1 Author: Nicolas Pitre <nicolas.pitre@linaro.org> Date: Mon Oct 26 10:16:14 2015 -0400 ARM: pushlinkedsection introduction Signed-off-by: Nicolas Pitre <nico@linaro.org> diff --git a/Makefile b/Makefile index d5b3739119..75541414cb 100644 --- a/Makefile +++ b/Makefile @@ -775,6 +775,10 @@ ifeq ($(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-goto.sh $(CC)), y) KBUILD_AFLAGS += -DCC_HAVE_ASM_GOTO endif +# Named subsections +KBUILD_AFLAGS += -Wa,--sectname-subst +KBUILD_CFLAGS += -Wa,--sectname-subst + include scripts/Makefile.kasan include scripts/Makefile.extrawarn diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h index b2bc8e1147..70161c9bfa 100644 --- a/arch/arm/include/asm/assembler.h +++ b/arch/arm/include/asm/assembler.h @@ -88,6 +88,17 @@ #endif /* + * Special .pushsection wrapper with explicit dependency to prevent + * garbage collection of the specified section. This is needed when no + * explicit symbol references are made to this section. + */ + .macro .pushlinkedsection name:vararg + .reloc . - 1, R_ARM_NONE, 9909f + .pushsection \name +9909: + .endm + +/* * Enable and disable interrupts */ #if __LINUX_ARM_ARCH__ >= 6 @@ -239,7 +250,7 @@ #define USER(x...) \ 9999: x; \ - .pushsection __ex_table,"a"; \ + .pushlinkedsection __ex_table.%S,"a"; \ .align 3; \ .long 9999b,9001f; \ .popsection @@ -253,7 +264,7 @@ * ALT_SMP( W(instr) ... ) */ #define ALT_UP(instr...) \ - .pushsection ".alt.smp.init", "a" ;\ + .pushlinkedsection ".alt.smp.init.%S", "a" ;\ .long 9998b ;\ 9997: instr ;\ .if . - 9997b == 2 ;\ @@ -265,7 +276,7 @@ .popsection #define ALT_UP_B(label) \ .equ up_b_offset, label - 9998b ;\ - .pushsection ".alt.smp.init", "a" ;\ + .pushlinkedsection ".alt.smp.init.%S", "a" ;\ .long 9998b ;\ W(b) . + up_b_offset ;\ .popsection @@ -375,7 +386,7 @@ THUMB( orr \reg , \reg , #PSR_T_BIT ) .error "Unsupported inc macro argument" .endif - .pushsection __ex_table,"a" + .pushlinkedsection __ex_table.%S,"a" .align 3 .long 9999b, \abort .popsection @@ -416,7 +427,7 @@ THUMB( orr \reg , \reg , #PSR_T_BIT ) .error "Unsupported inc macro argument" .endif - .pushsection __ex_table,"a" + .pushlinkedsection __ex_table.%S,"a" .align 3 .long 9999b, \abort .popsection diff --git a/arch/arm/include/asm/bug.h b/arch/arm/include/asm/bug.h index e7335a9214..0cbb6ef4b5 100644 --- a/arch/arm/include/asm/bug.h +++ b/arch/arm/include/asm/bug.h @@ -3,6 +3,7 @@ #include <linux/linkage.h> #include <linux/types.h> +#include <asm/compiler.h> #include <asm/opcodes.h> #ifdef CONFIG_BUG @@ -39,9 +40,9 @@ do { \ ".pushsection .rodata.str, \"aMS\", %progbits, 1\n" \ "2:\t.asciz " #__file "\n" \ ".popsection\n" \ - ".pushsection __bug_table,\"a\"\n" \ + __pushlinkedsection("__bug_table.%S,\"a\"") "\n"\ ".align 2\n" \ - "3:\t.word 1b, 2b\n" \ + "\t.word 1b, 2b\n" \ "\t.hword " #__line ", 0\n" \ ".popsection"); \ unreachable(); \ diff --git a/arch/arm/include/asm/compiler.h b/arch/arm/include/asm/compiler.h index 29fe85e594..3bfdd749a3 100644 --- a/arch/arm/include/asm/compiler.h +++ b/arch/arm/include/asm/compiler.h @@ -24,5 +24,14 @@ ".endif; " \ ".endif\n\t" +/* + * Special .pushsection wrapper with explicit dependency to prevent + * garbage collection of the specified section. This is needed when no + * explicit symbol references are made to this section. + */ +#define __pushlinkedsection(name) \ + ".reloc . - 1, R_ARM_NONE, 9909f\n" \ + "\t.pushsection " name "\n" \ + "9909: " #endif /* __ASM_ARM_COMPILER_H */ diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h index 6795368ad0..3540e42084 100644 --- a/arch/arm/include/asm/futex.h +++ b/arch/arm/include/asm/futex.h @@ -5,11 +5,12 @@ #include <linux/futex.h> #include <linux/uaccess.h> +#include <asm/compiler.h> #include <asm/errno.h> #define __futex_atomic_ex_table(err_reg) \ "3:\n" \ - " .pushsection __ex_table,\"a\"\n" \ + " " __pushlinkedsection("__ex_table.%S,\"a\"") "\n" \ " .align 3\n" \ " .long 1b, 4f, 2b, 4f\n" \ " .popsection\n" \ diff --git a/arch/arm/include/asm/jump_label.h b/arch/arm/include/asm/jump_label.h index 34f7b6980d..54e2a5ec11 100644 --- a/arch/arm/include/asm/jump_label.h +++ b/arch/arm/include/asm/jump_label.h @@ -4,6 +4,7 @@ #ifndef __ASSEMBLY__ #include <linux/types.h> +#include <asm/compiler.h> #include <asm/unified.h> #define JUMP_LABEL_NOP_SIZE 4 @@ -12,7 +13,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran { asm_volatile_goto("1:\n\t" WASM(nop) "\n\t" - ".pushsection __jump_table, \"aw\"\n\t" + __pushlinkedsection("__jump_table.%S, \"aw\") "\n\t" ".word 1b, %l[l_yes], %c0\n\t" ".popsection\n\t" : : "i" (&((char *)key)[branch]) : : l_yes); @@ -26,7 +27,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool { asm_volatile_goto("1:\n\t" WASM(b) " %l[l_yes]\n\t" - ".pushsection __jump_table, \"aw\"\n\t" + __pushlinkedsection("__jump_table.%S, \"aw\"") "\n\t" ".word 1b, %l[l_yes], %c0\n\t" ".popsection\n\t" : : "i" (&((char *)key)[branch]) : : l_yes); diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h index 98d58bb04a..d5cc34e9a7 100644 --- a/arch/arm/include/asm/memory.h +++ b/arch/arm/include/asm/memory.h @@ -18,6 +18,8 @@ #include <linux/types.h> #include <linux/sizes.h> +#include <asm/compiler.h> + #ifdef CONFIG_NEED_MACH_MEMORY_H #include <mach/memory.h> #endif @@ -172,7 +174,7 @@ extern const void *__pv_table_begin, *__pv_table_end; #define __pv_stub(from,to,instr,type) \ __asm__("@ __pv_stub\n" \ "1: " instr " %0, %1, %2\n" \ - " .pushsection .pv_table,\"a\"\n" \ + " " __pushlinkedsection(".pv_table.%S,\"a\"") "\n" \ " .long 1b\n" \ " .popsection\n" \ : "=r" (to) \ @@ -181,7 +183,7 @@ extern const void *__pv_table_begin, *__pv_table_end; #define __pv_stub_mov_hi(t) \ __asm__ volatile("@ __pv_stub_mov\n" \ "1: mov %R0, %1\n" \ - " .pushsection .pv_table,\"a\"\n" \ + " " __pushlinkedsection(".pv_table.%S,\"a\"") "\n" \ " .long 1b\n" \ " .popsection\n" \ : "=r" (t) \ @@ -191,7 +193,7 @@ extern const void *__pv_table_begin, *__pv_table_end; __asm__ volatile("@ __pv_add_carry_stub\n" \ "1: adds %Q0, %1, %2\n" \ " adc %R0, %R0, #0\n" \ - " .pushsection .pv_table,\"a\"\n" \ + " " __pushlinkedsection(".pv_table.%S,\"a\"") "\n" \ " .long 1b\n" \ " .popsection\n" \ : "+r" (y) \ diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h index 8a1e8e995d..8c535eacea 100644 --- a/arch/arm/include/asm/processor.h +++ b/arch/arm/include/asm/processor.h @@ -19,6 +19,7 @@ #ifdef __KERNEL__ +#include <asm/compiler.h> #include <asm/hw_breakpoint.h> #include <asm/ptrace.h> #include <asm/types.h> @@ -93,7 +94,7 @@ unsigned long get_wchan(struct task_struct *p); #ifdef CONFIG_SMP #define __ALT_SMP_ASM(smp, up) \ "9998: " smp "\n" \ - " .pushsection \".alt.smp.init\", \"a\"\n" \ + " " __pushlinkedsection("\".alt.smp.init.%S\", \"a\"") "\n" \ " .long 9998b\n" \ " " up "\n" \ " .popsection\n" diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h index 8cc85a4ebe..5e7e404894 100644 --- a/arch/arm/include/asm/uaccess.h +++ b/arch/arm/include/asm/uaccess.h @@ -357,7 +357,7 @@ do { \ " mov %1, #0\n" \ " b 2b\n" \ " .popsection\n" \ - " .pushsection __ex_table,\"a\"\n" \ + " " __pushlinkedsection("__ex_table.%S,\"a\"") "\n" \ " .align 3\n" \ " .long 1b, 3b\n" \ " .popsection" \ @@ -429,7 +429,7 @@ do { \ "3: mov %0, %3\n" \ " b 2b\n" \ " .popsection\n" \ - " .pushsection __ex_table,\"a\"\n" \ + " " __pushlinkedsection("__ex_table.%S,\"a\"") "\n" \ " .align 3\n" \ " .long 1b, 3b\n" \ " .popsection" \ @@ -479,7 +479,7 @@ do { \ "4: mov %0, %3\n" \ " b 3b\n" \ " .popsection\n" \ - " .pushsection __ex_table,\"a\"\n" \ + " " __pushlinkedsection("__ex_table.%S,\"a\"") "\n" \ " .align 3\n" \ " .long 1b, 4b\n" \ " .long 2b, 4b\n" \ diff --git a/arch/arm/include/asm/word-at-a-time.h b/arch/arm/include/asm/word-at-a-time.h index 5831dce4b5..348a462d3e 100644 --- a/arch/arm/include/asm/word-at-a-time.h +++ b/arch/arm/include/asm/word-at-a-time.h @@ -8,6 +8,7 @@ * Heavily based on the x86 algorithm. */ #include <linux/kernel.h> +#include <asm/compiler.h> struct word_at_a_time { const unsigned long one_bits, high_bits; @@ -84,7 +85,7 @@ static inline unsigned long load_unaligned_zeropad(const void *addr) #endif " b 2b\n" " .popsection\n" - " .pushsection __ex_table,\"a\"\n" + " " __pushlinkedsection("__ex_table.%S,\"a\"") "\n" " .align 3\n" " .long 1b, 3b\n" " .popsection" diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S index 3e1c26eb32..5047757c34 100644 --- a/arch/arm/kernel/entry-armv.S +++ b/arch/arm/kernel/entry-armv.S @@ -564,7 +564,7 @@ ENDPROC(__und_usr) 4: str r4, [sp, #S_PC] @ retry current instruction ret r9 .popsection - .pushsection __ex_table,"a" + .pushlinkedsection __ex_table.%S,"a" .long 1b, 4b #if CONFIG_ARM_THUMB && __LINUX_ARM_ARCH__ >= 6 && CONFIG_CPU_V7 .long 2b, 4b diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S index 8b60fde5ce..6885382931 100644 --- a/arch/arm/kernel/vmlinux.lds.S +++ b/arch/arm/kernel/vmlinux.lds.S @@ -36,7 +36,7 @@ #define ARM_CPU_KEEP(x) #endif -#if (defined(CONFIG_SMP_ON_UP) && !defined(CONFIG_DEBUG_SPINLOCK)) || \ +#if 0 // (defined(CONFIG_SMP_ON_UP) && !defined(CONFIG_DEBUG_SPINLOCK)) || \ defined(CONFIG_GENERIC_BUG) #define ARM_EXIT_KEEP(x) x #define ARM_EXIT_DISCARD(x) diff --git a/arch/arm/lib/backtrace.S b/arch/arm/lib/backtrace.S index fab5a50503..238c7de114 100644 --- a/arch/arm/lib/backtrace.S +++ b/arch/arm/lib/backtrace.S @@ -104,7 +104,7 @@ for_each_frame: tst frame, mask @ Check for address exceptions no_frame: ldmfd sp!, {r4 - r8, pc} ENDPROC(c_backtrace) - .pushsection __ex_table,"a" + .pushlinkedsection __ex_table.%S,"a" .align 3 .long 1001b, 1006b .long 1002b, 1006b diff --git a/arch/arm/lib/getuser.S b/arch/arm/lib/getuser.S index 8ecfd15c3a..e2c6a5649f 100644 --- a/arch/arm/lib/getuser.S +++ b/arch/arm/lib/getuser.S @@ -132,7 +132,7 @@ __get_user_bad: ENDPROC(__get_user_bad) ENDPROC(__get_user_bad8) -.pushsection __ex_table, "a" +.pushlinkedsection __ex_table.%S, "a" .long 1b, __get_user_bad .long 2b, __get_user_bad .long 3b, __get_user_bad diff --git a/arch/arm/lib/putuser.S b/arch/arm/lib/putuser.S index 38d660d370..b52f4a264e 100644 --- a/arch/arm/lib/putuser.S +++ b/arch/arm/lib/putuser.S @@ -88,7 +88,7 @@ __put_user_bad: ret lr ENDPROC(__put_user_bad) -.pushsection __ex_table, "a" +.pushlinkedsection __ex_table.%S, "a" .long 1b, __put_user_bad .long 2b, __put_user_bad .long 3b, __put_user_bad diff --git a/arch/arm/mm/alignment.c b/arch/arm/mm/alignment.c index 00b7f7de28..a2e6f47edb 100644 --- a/arch/arm/mm/alignment.c +++ b/arch/arm/mm/alignment.c @@ -22,6 +22,7 @@ #include <linux/sched.h> #include <linux/uaccess.h> +#include <asm/compiler.h> #include <asm/cp15.h> #include <asm/system_info.h> #include <asm/unaligned.h> @@ -206,7 +207,7 @@ union offset_union { "3: mov %0, #1\n" \ " b 2b\n" \ " .popsection\n" \ - " .pushsection __ex_table,\"a\"\n" \ + " " __pushlinkedsection("__ex_table.%S,\"a\"") "\n" \ " .align 3\n" \ " .long 1b, 3b\n" \ " .popsection\n" \ @@ -266,7 +267,7 @@ union offset_union { "4: mov %0, #1\n" \ " b 3b\n" \ " .popsection\n" \ - " .pushsection __ex_table,\"a\"\n" \ + " " __pushlinkedsection("__ex_table.%S,\"a\"") "\n" \ " .align 3\n" \ " .long 1b, 4b\n" \ " .long 2b, 4b\n" \ @@ -306,7 +307,7 @@ union offset_union { "6: mov %0, #1\n" \ " b 5b\n" \ " .popsection\n" \ - " .pushsection __ex_table,\"a\"\n" \ + " " __pushlinkedsection("__ex_table.%S,\"a\"") "\n" \ " .align 3\n" \ " .long 1b, 6b\n" \ " .long 2b, 6b\n" \ diff --git a/arch/arm/nwfpe/entry.S b/arch/arm/nwfpe/entry.S index 39c20afad7..8f566c87c2 100644 --- a/arch/arm/nwfpe/entry.S +++ b/arch/arm/nwfpe/entry.S @@ -119,7 +119,7 @@ next: .Lfix: ret r9 @ let the user eat segfaults .popsection - .pushsection __ex_table,"a" + .pushlinkedsection __ex_table.%S,"a" .align 3 .long .Lx1, .Lfix .popsection ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-07 5:33 ` Nicolas Pitre @ 2016-08-08 3:42 ` Nicholas Piggin 2016-08-08 4:12 ` Nicolas Pitre 0 siblings, 1 reply; 43+ messages in thread From: Nicholas Piggin @ 2016-08-08 3:42 UTC (permalink / raw) To: Nicolas Pitre Cc: linux-kbuild, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Segher Boessenkool, Alan Modra On Sun, 7 Aug 2016 01:33:45 -0400 (EDT) Nicolas Pitre <nicolas.pitre@linaro.org> wrote: > On Fri, 5 Aug 2016, Nicholas Piggin wrote: > > > Introduce LINKER_DCE option for architectures to select if they want > > to build with -ffunction-sections, -fdata-sections, and link with > > --gc-sections. It requires some work (documented) to ensure all > > unreferenced entrypoints are live, and requires toolchain and > > build verification, so it is made a per-arch option for now. > > > > On a random powerpc64le build, this yelds a significant size saving, > > it boots and runs fine, but there is a lot I haven't tested as yet, > > so these savings may be reduced if there are bugs in the link. > > > > text data bss dec filename > > 11169741 1180744 1923176 14273661 vmlinux > > 10445269 1004127 1919707 13369103 vmlinux.dce > > > > ~700K text, ~170K data, 6% removed from kernel image size. > > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> > > I played with that too. However this needs distinct sections for > exception tables and the like otherwise the backward references from the > final exception table to those functions responsible for those exception > entries has the effect of pulling in all those functions even if their > entry point is never referenced, making --gc-sections less effective. > I managed to fix this only with a change to gas (accepted upstream). > > But once that is solved, you then have the missing forward reference > problem i.e. nothing actually references those individual exception > entry sections and ld happily drops them all. Having a KEEP() on each of > them is unworkable and defeats the purpose anyway. That requires a > dummy reloc to trick ld into pulling in those sections when the parent > section is also pulled in. Right, although we don't *need* those things just for enabling --gc-sections, do we? It may not be 100% optimal, but it's enough to avoid the regression when switching to --whole-archive build option. > Please see attached a subset of the slides I presented at ELC and Linaro > Connect last year to illustrate those issues. > > Also attached a sample patch partially implementing those changes. > > In short I'm very glad to see that this might steer interest across > multiple architectures. I felt like this was becoming much more > intrusive than I expected and that maybe LTO was a better bet after all. > But LTO has its evils too and I'm willing to look at gc-sections again > if there is interest from others as well. Your results are impressive, and I don't want to stand in the way of either LTO or improving accuracy of --gc-sections. But both are things that can be built on top of this patch, I think. We don't need to do the entire intrusive changes all at once. Thanks, Nick ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-08 3:42 ` Nicholas Piggin @ 2016-08-08 4:12 ` Nicolas Pitre 2016-08-08 4:27 ` Nicholas Piggin 0 siblings, 1 reply; 43+ messages in thread From: Nicolas Pitre @ 2016-08-08 4:12 UTC (permalink / raw) To: Nicholas Piggin Cc: linux-kbuild, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Segher Boessenkool, Alan Modra On Mon, 8 Aug 2016, Nicholas Piggin wrote: > On Sun, 7 Aug 2016 01:33:45 -0400 (EDT) > Nicolas Pitre <nicolas.pitre@linaro.org> wrote: > > > On Fri, 5 Aug 2016, Nicholas Piggin wrote: > > > > > Introduce LINKER_DCE option for architectures to select if they want > > > to build with -ffunction-sections, -fdata-sections, and link with > > > --gc-sections. It requires some work (documented) to ensure all > > > unreferenced entrypoints are live, and requires toolchain and > > > build verification, so it is made a per-arch option for now. > > > > > > On a random powerpc64le build, this yelds a significant size saving, > > > it boots and runs fine, but there is a lot I haven't tested as yet, > > > so these savings may be reduced if there are bugs in the link. > > > > > > text data bss dec filename > > > 11169741 1180744 1923176 14273661 vmlinux > > > 10445269 1004127 1919707 13369103 vmlinux.dce > > > > > > ~700K text, ~170K data, 6% removed from kernel image size. > > > > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> > > > > I played with that too. However this needs distinct sections for > > exception tables and the like otherwise the backward references from the > > final exception table to those functions responsible for those exception > > entries has the effect of pulling in all those functions even if their > > entry point is never referenced, making --gc-sections less effective. > > I managed to fix this only with a change to gas (accepted upstream). > > > > But once that is solved, you then have the missing forward reference > > problem i.e. nothing actually references those individual exception > > entry sections and ld happily drops them all. Having a KEEP() on each of > > them is unworkable and defeats the purpose anyway. That requires a > > dummy reloc to trick ld into pulling in those sections when the parent > > section is also pulled in. > > Right, although we don't *need* those things just for enabling > --gc-sections, do we? It may not be 100% optimal, but it's enough > to avoid the regression when switching to --whole-archive build > option. Oh absolutely. > Your results are impressive, and I don't want to stand in the way of > either LTO or improving accuracy of --gc-sections. But both are things > that can be built on top of this patch, I think. Indeed. Those patches are certainly welcome. They represent half of the job already. I just wanted to provide some insight about the whole picture in case someone else notices those flaws I have identified. Nicolas ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-08 4:12 ` Nicolas Pitre @ 2016-08-08 4:27 ` Nicholas Piggin 0 siblings, 0 replies; 43+ messages in thread From: Nicholas Piggin @ 2016-08-08 4:27 UTC (permalink / raw) To: Nicolas Pitre Cc: linux-kbuild, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Segher Boessenkool, Alan Modra On Mon, 8 Aug 2016 00:12:37 -0400 (EDT) Nicolas Pitre <nicolas.pitre@linaro.org> wrote: > On Mon, 8 Aug 2016, Nicholas Piggin wrote: > > > On Sun, 7 Aug 2016 01:33:45 -0400 (EDT) > > Nicolas Pitre <nicolas.pitre@linaro.org> wrote: > > > > > On Fri, 5 Aug 2016, Nicholas Piggin wrote: > > > > > > > Introduce LINKER_DCE option for architectures to select if they want > > > > to build with -ffunction-sections, -fdata-sections, and link with > > > > --gc-sections. It requires some work (documented) to ensure all > > > > unreferenced entrypoints are live, and requires toolchain and > > > > build verification, so it is made a per-arch option for now. > > > > > > > > On a random powerpc64le build, this yelds a significant size saving, > > > > it boots and runs fine, but there is a lot I haven't tested as yet, > > > > so these savings may be reduced if there are bugs in the link. > > > > > > > > text data bss dec filename > > > > 11169741 1180744 1923176 14273661 vmlinux > > > > 10445269 1004127 1919707 13369103 vmlinux.dce > > > > > > > > ~700K text, ~170K data, 6% removed from kernel image size. > > > > > > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> > > > > > > I played with that too. However this needs distinct sections for > > > exception tables and the like otherwise the backward references from the > > > final exception table to those functions responsible for those exception > > > entries has the effect of pulling in all those functions even if their > > > entry point is never referenced, making --gc-sections less effective. > > > I managed to fix this only with a change to gas (accepted upstream). > > > > > > But once that is solved, you then have the missing forward reference > > > problem i.e. nothing actually references those individual exception > > > entry sections and ld happily drops them all. Having a KEEP() on each of > > > them is unworkable and defeats the purpose anyway. That requires a > > > dummy reloc to trick ld into pulling in those sections when the parent > > > section is also pulled in. > > > > Right, although we don't *need* those things just for enabling > > --gc-sections, do we? It may not be 100% optimal, but it's enough > > to avoid the regression when switching to --whole-archive build > > option. > > Oh absolutely. > > > Your results are impressive, and I don't want to stand in the way of > > either LTO or improving accuracy of --gc-sections. But both are things > > that can be built on top of this patch, I think. > > Indeed. Those patches are certainly welcome. They represent half of the > job already. I just wanted to provide some insight about the whole > picture in case someone else notices those flaws I have identified. Okay thanks, I appreciate you taking a look. I wanted to be sure I wasn't missing some bug here. Smaller kernel is nice for large systems because it means smaller icache/dcache footprint and fewer branch trampolines, so I'm always happy to see that effort. I will certainly help test LTO or some of these other gc-sections improvements on powerpc. Thanks, Nick ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-05 12:12 ` [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination Nicholas Piggin 2016-08-06 20:14 ` Sam Ravnborg 2016-08-07 5:33 ` Nicolas Pitre @ 2016-08-07 9:57 ` Alan Modra 2016-08-07 11:35 ` Andreas Schwab 2016-08-07 20:26 ` Arnd Bergmann 2 siblings, 2 replies; 43+ messages in thread From: Alan Modra @ 2016-08-07 9:57 UTC (permalink / raw) To: Nicholas Piggin Cc: linux-kbuild, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool On Fri, Aug 05, 2016 at 10:12:00PM +1000, Nicholas Piggin wrote: > #define TEXT_TEXT \ > ALIGN_FUNCTION(); \ > - *(.text.hot .text .text.fixup .text.unlikely) \ > + *(.text.hot .text .text.fixup .text.unlikely .text.*) \ > *(.ref.text) \ > MEM_KEEP(init.text) \ > MEM_KEEP(exit.text) \ At the risk of being told you (kernel people) have already considerd this I thought I should mention that the above isn't ideal. (Nor is gcc's choice of .text.hot for hot sections, which clashes with --function-sections for a function called "hot" but that's another story.) You'd really like all the hot sections and cold sections to be together, for better cache locality. So the line ought to have been *(.text.hot) *(.text) *(.text.fixup) *(.text.unlikely) That would put all .text.hot sections together. Similarly for .text.unlikely. The trap of course is that this only works if .text.fixup from one object file can be placed relatively far away from .text in the same object file. If it can, then Nicholas' patch should be: *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text .text.*) If you can't put .text.fixup too far away then you may as well just use *(.text .text.*) -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-07 9:57 ` Alan Modra @ 2016-08-07 11:35 ` Andreas Schwab 2016-08-07 20:26 ` Arnd Bergmann 1 sibling, 0 replies; 43+ messages in thread From: Andreas Schwab @ 2016-08-07 11:35 UTC (permalink / raw) To: Alan Modra Cc: Nicholas Piggin, linux-kbuild, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool On So, Aug 07 2016, Alan Modra <amodra@gmail.com> wrote: > At the risk of being told you (kernel people) have already considerd > this I thought I should mention that the above isn't ideal. (Nor is > gcc's choice of .text.hot for hot sections, which clashes with > --function-sections for a function called "hot" but that's another > story.) Or --function-sections should use a unique prefix, like .text.functions.* Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-07 9:57 ` Alan Modra 2016-08-07 11:35 ` Andreas Schwab @ 2016-08-07 20:26 ` Arnd Bergmann 2016-08-07 23:49 ` Alan Modra 1 sibling, 1 reply; 43+ messages in thread From: Arnd Bergmann @ 2016-08-07 20:26 UTC (permalink / raw) To: linuxppc-dev Cc: Alan Modra, Nicholas Piggin, linux-arch, Stephen Rothwell, Nicolas Pitre, linux-kbuild On Sunday, August 7, 2016 7:27:39 PM CEST Alan Modra wrote: > > If it can, then Nicholas' patch should be: > > *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text .text.*) > > If you can't put .text.fixup too far away then you may as well just use > > *(.text .text.*) I tried this version: diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index b1f8828e9eac..fc210dacac9a 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -438,7 +438,9 @@ * during second ld run in second ld pass when generating System.map */ #define TEXT_TEXT \ ALIGN_FUNCTION(); \ - *(.text.hot .text .text.fixup .text.unlikely .text.*) \ + *(.text.hot .text.hot.*) \ + *(.text.unlikely .text.fixup .text.unlikely.*) \ + *(.text .text.*) \ *(.ref.text) \ MEM_KEEP(init.text) \ MEM_KEEP(exit.text) \ but that failed to link an allyesconfig kernel because of references from .fixup to .text.*. Trying your version now: *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text .text.*) Arnd ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-07 20:26 ` Arnd Bergmann @ 2016-08-07 23:49 ` Alan Modra 2016-08-08 15:14 ` Arnd Bergmann 0 siblings, 1 reply; 43+ messages in thread From: Alan Modra @ 2016-08-07 23:49 UTC (permalink / raw) To: Arnd Bergmann Cc: linuxppc-dev, Nicholas Piggin, linux-arch, Stephen Rothwell, Nicolas Pitre, linux-kbuild On Sun, Aug 07, 2016 at 10:26:19PM +0200, Arnd Bergmann wrote: > On Sunday, August 7, 2016 7:27:39 PM CEST Alan Modra wrote: > > > > If it can, then Nicholas' patch should be: > > > > *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text .text.*) > > > > If you can't put .text.fixup too far away then you may as well just use > > > > *(.text .text.*) > > I tried this version: > > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h > index b1f8828e9eac..fc210dacac9a 100644 > --- a/include/asm-generic/vmlinux.lds.h > +++ b/include/asm-generic/vmlinux.lds.h > @@ -438,7 +438,9 @@ > * during second ld run in second ld pass when generating System.map */ > #define TEXT_TEXT \ > ALIGN_FUNCTION(); \ > - *(.text.hot .text .text.fixup .text.unlikely .text.*) \ > + *(.text.hot .text.hot.*) \ > + *(.text.unlikely .text.fixup .text.unlikely.*) \ > + *(.text .text.*) \ > *(.ref.text) \ > MEM_KEEP(init.text) \ > MEM_KEEP(exit.text) \ > > but that failed to link an allyesconfig kernel because of references > from .fixup to .text.*. Trying your version now: Well then, that proves you can't put .text.fixup too far aways from the associated input section. > *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text .text.*) Which means this is guaranteed to fail when you test it properly using gcc's profiling options, in order to generate .text.hot* and/or .text.unlikely* sections. It seems to me the right thing to do would be to change kernel asm to generate .text.foo.fixup for any .text.foo section. A gas feature available with binutils-2.26 enabled by --sectname-subst might help with implementing that. -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-07 23:49 ` Alan Modra @ 2016-08-08 15:14 ` Arnd Bergmann 2016-08-08 23:50 ` Alan Modra 2016-08-09 3:16 ` Andi Kleen 0 siblings, 2 replies; 43+ messages in thread From: Arnd Bergmann @ 2016-08-08 15:14 UTC (permalink / raw) To: Alan Modra Cc: linuxppc-dev, Nicholas Piggin, linux-arch, Stephen Rothwell, Nicolas Pitre, linux-kbuild, Andi Kleen On Monday, August 8, 2016 9:19:47 AM CEST Alan Modra wrote: > On Sun, Aug 07, 2016 at 10:26:19PM +0200, Arnd Bergmann wrote: > > On Sunday, August 7, 2016 7:27:39 PM CEST Alan Modra wrote: > > > > > > If it can, then Nicholas' patch should be: > > > > > > *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text .text.*) > > > > > > If you can't put .text.fixup too far away then you may as well just use > > > > > > *(.text .text.*) > > > > I tried this version: > > > > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h > > index b1f8828e9eac..fc210dacac9a 100644 > > --- a/include/asm-generic/vmlinux.lds.h > > +++ b/include/asm-generic/vmlinux.lds.h > > @@ -438,7 +438,9 @@ > > * during second ld run in second ld pass when generating System.map */ > > #define TEXT_TEXT \ > > ALIGN_FUNCTION(); \ > > - *(.text.hot .text .text.fixup .text.unlikely .text.*) \ > > + *(.text.hot .text.hot.*) \ > > + *(.text.unlikely .text.fixup .text.unlikely.*) \ > > + *(.text .text.*) \ > > *(.ref.text) \ > > MEM_KEEP(init.text) \ > > MEM_KEEP(exit.text) \ > > > > but that failed to link an allyesconfig kernel because of references > > from .fixup to .text.*. Trying your version now: > > Well then, that proves you can't put .text.fixup too far aways from > the associated input section. > > > *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text .text.*) > > Which means this is guaranteed to fail when you test it properly using > gcc's profiling options, in order to generate .text.hot* and/or > .text.unlikely* sections. I've investigated further and it seems that "*(.text.fixup) *(.text .text.*)" fails just because we list .text.fixup twice. The .text.fixup section was originally[1] introduced to work around the same link error that it is causing now: if we use recursive linking, merging .text and .text.fixup helps avoid the problems of sections that are >32MB before the final link. I have reverted that patch now, so ARM uses ".fixup" again like every other architecture does, and now "*(.fixup) *(.text .text.*)" works correctly, while ""*(.fixup) *(.text .fixup .text.*)" also fails the same way that I saw before: drivers/scsi/sg.o:(.fixup+0x4): relocation truncated to fit: R_ARM_THM_JUMP24 against `.text.sg_ioctl' drivers/scsi/sg.o:(.fixup+0xc): relocation truncated to fit: R_ARM_THM_JUMP24 against `.text.sg_ioctl' drivers/scsi/sg.o:(.fixup+0x14): relocation truncated to fit: R_ARM_THM_JUMP24 against `.text.sg_ioctl' drivers/scsi/sg.o:(.fixup+0x1c): relocation truncated to fit: R_ARM_THM_JUMP24 against `.text.sg_ioctl' drivers/scsi/sg.o:(.fixup+0x24): relocation truncated to fit: R_ARM_THM_JUMP24 against `.text.sg_ioctl' drivers/scsi/sg.o:(.fixup+0x2c): relocation truncated to fit: R_ARM_THM_JUMP24 against `.text.sg_ioctl' drivers/scsi/sg.o:(.fixup+0x34): relocation truncated to fit: R_ARM_THM_JUMP24 against `.text.sg_ioctl' drivers/scsi/sg.o:(.fixup+0x3c): relocation truncated to fit: R_ARM_THM_JUMP24 against `.text.sg_ioctl' drivers/scsi/sg.o:(.fixup+0x44): relocation truncated to fit: R_ARM_THM_JUMP24 against `.text.sg_ioctl' I don't understand what led Andi Kleen to also move .text.hot and .text.unlikely together with .text [2], but this may have been a related issue. > It seems to me the right thing to do would be to change kernel asm to > generate .text.foo.fixup for any .text.foo section. A gas feature > available with binutils-2.26 enabled by --sectname-subst might help > with implementing that. I think this what Nico wanted to use anyway to eliminate more functions from the output. Arnd [1] http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8321 http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8322 [2] https://lkml.org/lkml/2015/7/19/377 ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-08 15:14 ` Arnd Bergmann @ 2016-08-08 23:50 ` Alan Modra 2016-08-09 22:10 ` Arnd Bergmann 2016-08-09 3:16 ` Andi Kleen 1 sibling, 1 reply; 43+ messages in thread From: Alan Modra @ 2016-08-08 23:50 UTC (permalink / raw) To: Arnd Bergmann Cc: linuxppc-dev, Nicholas Piggin, linux-arch, Stephen Rothwell, Nicolas Pitre, linux-kbuild, Andi Kleen On Mon, Aug 08, 2016 at 05:14:27PM +0200, Arnd Bergmann wrote: > I have reverted that patch now, so ARM uses ".fixup" again like every > other architecture does, and now "*(.fixup) *(.text .text.*)" works > correctly, while ""*(.fixup) *(.text .fixup .text.*)" also fails > the same way that I saw before: That is really odd. The linker isn't supposed to treat those script snippets differently. First match for .fixup wins. $ cat > fixup1.s <<\EOF .global _start .text _start: .dc.a .L2 .L1: .section ".fixup","ax",%progbits .L2: .dc.a .L1 EOF $ cat > fixup2.s <<\EOF .section ".text.xyz","ax",%progbits .dc.a .L2 .L1: .section ".fixup","ax",%progbits .L2: .dc.a .L1 EOF $ cat > fixup.lnk <<\EOF SECTIONS { .text : { *(.fixup) *(.text .fixup .text.*) } } EOF $ as -o fixup1.o fixup1.s $ as -o fixup2.o fixup2.s $ ld -o fixup -T fixup.lnk -Map fixup.map fixup1.o fixup2.o $ cat fixup.map Memory Configuration Name Origin Length Attributes *default* 0x0000000000000000 0xffffffffffffffff Linker script and memory map .text 0x0000000000000000 0x10 *(.fixup) .fixup 0x0000000000000000 0x4 fixup1.o .fixup 0x0000000000000004 0x4 fixup2.o *(.text .fixup .text.*) .text 0x0000000000000008 0x4 fixup1.o 0x0000000000000008 _start .text 0x000000000000000c 0x0 fixup2.o .text.xyz 0x000000000000000c 0x4 fixup2.o [snip] -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-08 23:50 ` Alan Modra @ 2016-08-09 22:10 ` Arnd Bergmann 0 siblings, 0 replies; 43+ messages in thread From: Arnd Bergmann @ 2016-08-09 22:10 UTC (permalink / raw) To: Alan Modra Cc: linuxppc-dev, Nicholas Piggin, linux-arch, Stephen Rothwell, Nicolas Pitre, linux-kbuild, Andi Kleen On Tuesday, August 9, 2016 9:20:16 AM CEST Alan Modra wrote: > On Mon, Aug 08, 2016 at 05:14:27PM +0200, Arnd Bergmann wrote: > > I have reverted that patch now, so ARM uses ".fixup" again like every > > other architecture does, and now "*(.fixup) *(.text .text.*)" works > > correctly, while ""*(.fixup) *(.text .fixup .text.*)" also fails > > the same way that I saw before: > > That is really odd. The linker isn't supposed to treat those script > snippets differently. First match for .fixup wins. Sorry for my mistake. I checked again and cannot reproduce what I thought I saw earlier. "*(.fixup) *(.text .text.*)" fails as would be expected. Arnd ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-08 15:14 ` Arnd Bergmann 2016-08-08 23:50 ` Alan Modra @ 2016-08-09 3:16 ` Andi Kleen 2016-08-09 22:29 ` Arnd Bergmann 1 sibling, 1 reply; 43+ messages in thread From: Andi Kleen @ 2016-08-09 3:16 UTC (permalink / raw) To: Arnd Bergmann Cc: Alan Modra, linuxppc-dev, Nicholas Piggin, linux-arch, Stephen Rothwell, Nicolas Pitre, linux-kbuild > I don't understand what led Andi Kleen to also move .text.hot and > .text.unlikely together with .text [2], but this may have > been a related issue. The goal was just to move .hot and .unlikely all together, so that they are clustered and use the minimum amount of cache. On x86 doesn't matter where they are exactly, as long as each is together. If they are not explicitely listed then the linker interleaves them with the normal text, which defeats the purpose. -Andi ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-09 3:16 ` Andi Kleen @ 2016-08-09 22:29 ` Arnd Bergmann 2016-08-09 23:08 ` Andi Kleen 2016-08-10 0:37 ` Andi Kleen 0 siblings, 2 replies; 43+ messages in thread From: Arnd Bergmann @ 2016-08-09 22:29 UTC (permalink / raw) To: Andi Kleen Cc: Alan Modra, linuxppc-dev, Nicholas Piggin, linux-arch, Stephen Rothwell, Nicolas Pitre, linux-kbuild On Monday, August 8, 2016 8:16:05 PM CEST Andi Kleen wrote: > > I don't understand what led Andi Kleen to also move .text.hot and > > .text.unlikely together with .text [2], but this may have > > been a related issue. > > > > [2] https://lkml.org/lkml/2015/7/19/377 > > The goal was just to move .hot and .unlikely all together, so that > they are clustered and use the minimum amount of cache. On x86 doesn't > matter where they are exactly, as long as each is together. > If they are not explicitely listed then the linker interleaves > them with the normal text, which defeats the purpose. I still don't see it, my reading of your patch is that you did the opposite, by changing the description that puts all .text.hot in front of .text, and all .text.unlikely after exit.text into one that mixes them with .text. What am I missing here? Arnd ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-09 22:29 ` Arnd Bergmann @ 2016-08-09 23:08 ` Andi Kleen 2016-08-10 0:37 ` Andi Kleen 1 sibling, 0 replies; 43+ messages in thread From: Andi Kleen @ 2016-08-09 23:08 UTC (permalink / raw) To: Arnd Bergmann Cc: Alan Modra, linuxppc-dev, Nicholas Piggin, linux-arch, Stephen Rothwell, Nicolas Pitre, linux-kbuild On Wed, Aug 10, 2016 at 12:29:29AM +0200, Arnd Bergmann wrote: > On Monday, August 8, 2016 8:16:05 PM CEST Andi Kleen wrote: > > > I don't understand what led Andi Kleen to also move .text.hot and > > > .text.unlikely together with .text [2], but this may have > > > been a related issue. > > > > > > [2] https://lkml.org/lkml/2015/7/19/377 > > > > The goal was just to move .hot and .unlikely all together, so that > > they are clustered and use the minimum amount of cache. On x86 doesn't > > matter where they are exactly, as long as each is together. > > If they are not explicitely listed then the linker interleaves > > them with the normal text, which defeats the purpose. > > I still don't see it, my reading of your patch is that you did > the opposite, by changing the description that puts all .text.hot > in front of .text, and all .text.unlikely after exit.text into > one that mixes them with .text. What am I missing here? .text.hot is actually not used, the critical part is .text.unlikely which was not listed and was interleaved before the patch. -Andi -- ak@linux.intel.com -- Speaking for myself only ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination 2016-08-09 22:29 ` Arnd Bergmann 2016-08-09 23:08 ` Andi Kleen @ 2016-08-10 0:37 ` Andi Kleen 1 sibling, 0 replies; 43+ messages in thread From: Andi Kleen @ 2016-08-10 0:37 UTC (permalink / raw) To: Arnd Bergmann Cc: Alan Modra, linuxppc-dev, Nicholas Piggin, linux-arch, Stephen Rothwell, Nicolas Pitre, linux-kbuild On Wed, Aug 10, 2016 at 12:29:29AM +0200, Arnd Bergmann wrote: > On Monday, August 8, 2016 8:16:05 PM CEST Andi Kleen wrote: > > > I don't understand what led Andi Kleen to also move .text.hot and > > > .text.unlikely together with .text [2], but this may have > > > been a related issue. > > > > > > [2] https://lkml.org/lkml/2015/7/19/377 > > > > The goal was just to move .hot and .unlikely all together, so that > > they are clustered and use the minimum amount of cache. On x86 doesn't > > matter where they are exactly, as long as each is together. > > If they are not explicitely listed then the linker interleaves > > them with the normal text, which defeats the purpose. > > I still don't see it, my reading of your patch is that you did > the opposite, by changing the description that puts all .text.hot > in front of .text, and all .text.unlikely after exit.text into > one that mixes them with .text. What am I missing here? No it doesn't mix .unlikely with .text, .unlikely is all in one place. -Andi -- ak@linux.intel.com -- Speaking for myself only ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 3/5] kbuild: add arch specific post-module-link pass 2016-08-05 12:11 [RFC][PATCH 0/5] kbuild changes, thin archives, --gc-sections Nicholas Piggin 2016-08-05 12:11 ` [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r Nicholas Piggin 2016-08-05 12:12 ` [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination Nicholas Piggin @ 2016-08-05 12:12 ` Nicholas Piggin 2016-08-05 13:56 ` Nicholas Piggin 2016-08-06 20:16 ` Sam Ravnborg 2016-08-05 12:12 ` [PATCH 4/5] powerpc: switch to using thin archives Nicholas Piggin ` (3 subsequent siblings) 6 siblings, 2 replies; 43+ messages in thread From: Nicholas Piggin @ 2016-08-05 12:12 UTC (permalink / raw) To: linux-kbuild Cc: Nicholas Piggin, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra Add an option for architectures to pass over modules after they are linked. powerpc will use this to fix up alternate instruction patch relocations. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- Documentation/kbuild/makefiles.txt | 6 ++++++ Makefile | 1 + scripts/Makefile.modpost | 8 ++++++++ 3 files changed, 15 insertions(+) diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt index 13f888a..f6c065b 100644 --- a/Documentation/kbuild/makefiles.txt +++ b/Documentation/kbuild/makefiles.txt @@ -952,6 +952,12 @@ When kbuild executes, the following steps are followed (roughly): $(KBUILD_ARFLAGS) set by the top level Makefile to "D" (deterministic mode) if this option is supported by $(AR). + KBUILD_MODPOST_TOOL Arch-specific command to run after module link + + $(KBUILD_MODPOST_TOOL) is used to add an arch-specific pass over + modules after their final link. E.g., powerpc uses this to adjust + relative branches of "alternate code patching" sections. + ARCH_CPPFLAGS, ARCH_AFLAGS, ARCH_CFLAGS Overrides the kbuild defaults These variables are appended to the KBUILD_CPPFLAGS, diff --git a/Makefile b/Makefile index d5ef31a..99ab8eb 100644 --- a/Makefile +++ b/Makefile @@ -421,6 +421,7 @@ export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL export KBUILD_ARFLAGS +export KBUILD_MODPOST_TOOL # When compiling out-of-tree modules, put MODVERDIR in the module # tree rather than in the kernel tree. The kernel tree might diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost index 1366a94..19f8481 100644 --- a/scripts/Makefile.modpost +++ b/scripts/Makefile.modpost @@ -121,8 +121,16 @@ quiet_cmd_ld_ko_o = LD [M] $@ $(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE) \ -o $@ $(filter-out FORCE,$^) +ifdef KBUILD_MODPOST_TOOL +quiet_cmd_arch_modpost = ARCH $@ + cmd_arch_modpost = $(KBUILD_MODPOST_TOOL) $@ +endif + $(modules): %.ko :%.o %.mod.o FORCE $(call if_changed,ld_ko_o) +ifdef KBUILD_MODPOST_TOOL + $(call if_changed,arch_modpost) +endif targets += $(modules) -- 2.8.1 ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH 3/5] kbuild: add arch specific post-module-link pass 2016-08-05 12:12 ` [PATCH 3/5] kbuild: add arch specific post-module-link pass Nicholas Piggin @ 2016-08-05 13:56 ` Nicholas Piggin 2016-08-06 20:16 ` Sam Ravnborg 1 sibling, 0 replies; 43+ messages in thread From: Nicholas Piggin @ 2016-08-05 13:56 UTC (permalink / raw) To: linux-kbuild Cc: linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra On Fri, 5 Aug 2016 22:12:01 +1000 Nicholas Piggin <npiggin@gmail.com> wrote: > Add an option for architectures to pass over modules after they are > linked. powerpc will use this to fix up alternate instruction patch > relocations. For that matter, now I think about it, I'd like to have this generic postmod pass for the vmlinux as well. And it would be to call into the arch Makefile rather than just supply a tool. Currently powerpc deals with it by adding dependencies on its zImage target, but it would be really nice to be able to fix that while we're here too. Is that going to work? Thanks, Nick ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 3/5] kbuild: add arch specific post-module-link pass 2016-08-05 12:12 ` [PATCH 3/5] kbuild: add arch specific post-module-link pass Nicholas Piggin 2016-08-05 13:56 ` Nicholas Piggin @ 2016-08-06 20:16 ` Sam Ravnborg 2016-08-08 3:30 ` Nicholas Piggin 1 sibling, 1 reply; 43+ messages in thread From: Sam Ravnborg @ 2016-08-06 20:16 UTC (permalink / raw) To: Nicholas Piggin Cc: linux-kbuild, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra On Fri, Aug 05, 2016 at 10:12:01PM +1000, Nicholas Piggin wrote: > Add an option for architectures to pass over modules after they are > linked. powerpc will use this to fix up alternate instruction patch > relocations. > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> > --- > Documentation/kbuild/makefiles.txt | 6 ++++++ > Makefile | 1 + > scripts/Makefile.modpost | 8 ++++++++ > 3 files changed, 15 insertions(+) > > diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt > index 13f888a..f6c065b 100644 > --- a/Documentation/kbuild/makefiles.txt > +++ b/Documentation/kbuild/makefiles.txt > @@ -952,6 +952,12 @@ When kbuild executes, the following steps are followed (roughly): > $(KBUILD_ARFLAGS) set by the top level Makefile to "D" (deterministic > mode) if this option is supported by $(AR). > > + KBUILD_MODPOST_TOOL Arch-specific command to run after module link > + > + $(KBUILD_MODPOST_TOOL) is used to add an arch-specific pass over > + modules after their final link. E.g., powerpc uses this to adjust > + relative branches of "alternate code patching" sections. > + This needs documentation in kbuild.txt, where there is a nearly full lst of KBUILD_ variables. Sam ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 3/5] kbuild: add arch specific post-module-link pass 2016-08-06 20:16 ` Sam Ravnborg @ 2016-08-08 3:30 ` Nicholas Piggin 0 siblings, 0 replies; 43+ messages in thread From: Nicholas Piggin @ 2016-08-08 3:30 UTC (permalink / raw) To: Sam Ravnborg Cc: linux-kbuild, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra On Sat, 6 Aug 2016 22:16:29 +0200 Sam Ravnborg <sam@ravnborg.org> wrote: > On Fri, Aug 05, 2016 at 10:12:01PM +1000, Nicholas Piggin wrote: > > Add an option for architectures to pass over modules after they are > > linked. powerpc will use this to fix up alternate instruction patch > > relocations. > > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> > > --- > > Documentation/kbuild/makefiles.txt | 6 ++++++ > > Makefile | 1 + > > scripts/Makefile.modpost | 8 ++++++++ > > 3 files changed, 15 insertions(+) > > > > diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt > > index 13f888a..f6c065b 100644 > > --- a/Documentation/kbuild/makefiles.txt > > +++ b/Documentation/kbuild/makefiles.txt > > @@ -952,6 +952,12 @@ When kbuild executes, the following steps are followed (roughly): > > $(KBUILD_ARFLAGS) set by the top level Makefile to "D" (deterministic > > mode) if this option is supported by $(AR). > > > > + KBUILD_MODPOST_TOOL Arch-specific command to run after module link > > + > > + $(KBUILD_MODPOST_TOOL) is used to add an arch-specific pass over > > + modules after their final link. E.g., powerpc uses this to adjust > > + relative branches of "alternate code patching" sections. > > + > > This needs documentation in kbuild.txt, where there is a > nearly full lst of KBUILD_ variables. Thanks, I missed that. Thanks, Nick ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 4/5] powerpc: switch to using thin archives 2016-08-05 12:11 [RFC][PATCH 0/5] kbuild changes, thin archives, --gc-sections Nicholas Piggin ` (2 preceding siblings ...) 2016-08-05 12:12 ` [PATCH 3/5] kbuild: add arch specific post-module-link pass Nicholas Piggin @ 2016-08-05 12:12 ` Nicholas Piggin 2016-08-05 12:12 ` [PATCH 5/5] powerpc/64: use linker dce Nicholas Piggin ` (2 subsequent siblings) 6 siblings, 0 replies; 43+ messages in thread From: Nicholas Piggin @ 2016-08-05 12:12 UTC (permalink / raw) To: linux-kbuild Cc: Nicholas Piggin, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra From: Stephen Rothwell <sfr@canb.auug.org.au> Some change to the way we invoke ar is required so it can be used by scripts/link-vmlinux.sh Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- arch/powerpc/Makefile | 6 ++++-- arch/powerpc/platforms/Kconfig.cputype | 1 + 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 709a22a..160837c 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -23,7 +23,8 @@ CROSS32AR := $(CROSS32_COMPILE)ar ifeq ($(HAS_BIARCH),y) ifeq ($(CROSS32_COMPILE),) CROSS32CC := $(CC) -m32 -CROSS32AR := GNUTARGET=elf32-powerpc $(AR) +CROSS32AR := $(AR) +KBUILD_ARFLAGS += --target elf32-powerpc endif endif @@ -93,7 +94,8 @@ ifeq ($(HAS_BIARCH),y) override AS += -a$(CONFIG_WORD_SIZE) override LD += -m elf$(CONFIG_WORD_SIZE)$(LDEMULATION) override CC += -m$(CONFIG_WORD_SIZE) -override AR := GNUTARGET=elf$(CONFIG_WORD_SIZE)-$(GNUTARGET) $(AR) +override AR := $(AR) +KBUILD_ARFLAGS += --target elf$(CONFIG_WORD_SIZE)-$(GNUTARGET) endif LDFLAGS_vmlinux-y := -Bstatic diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index 77e9b8d..3c77091 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -2,6 +2,7 @@ config PPC64 bool "64-bit kernel" default n select HAVE_VIRT_CPU_ACCOUNTING + select THIN_ARCHIVES select ZLIB_DEFLATE help This option selects whether a 32-bit or a 64-bit kernel -- 2.8.1 ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 5/5] powerpc/64: use linker dce 2016-08-05 12:11 [RFC][PATCH 0/5] kbuild changes, thin archives, --gc-sections Nicholas Piggin ` (3 preceding siblings ...) 2016-08-05 12:12 ` [PATCH 4/5] powerpc: switch to using thin archives Nicholas Piggin @ 2016-08-05 12:12 ` Nicholas Piggin 2016-08-05 13:32 ` Nicholas Piggin 2016-08-07 20:23 ` Arnd Bergmann 6 siblings, 0 replies; 43+ messages in thread From: Nicholas Piggin @ 2016-08-05 12:12 UTC (permalink / raw) To: linux-kbuild Cc: Nicholas Piggin, linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra --- arch/powerpc/kernel/Makefile | 3 +++ arch/powerpc/kernel/vmlinux.lds.S | 2 +- arch/powerpc/platforms/Kconfig.cputype | 1 + 3 files changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 2da380f..b356e59 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -4,7 +4,10 @@ CFLAGS_ptrace.o += -DUTS_MACHINE='"$(UTS_MACHINE)"' +ccflags-y += -fno-function-sections -fno-data-sections + subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror +subdir-ccflags-y += -fno-function-sections -fno-data-sections ifeq ($(CONFIG_PPC64),y) CFLAGS_prom_init.o += $(NO_MINIMAL_TOC) diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 2dd91f7..c157b8d 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -50,7 +50,7 @@ SECTIONS HEAD_TEXT _text = .; /* careful! __ftr_alt_* sections need to be close to .text */ - *(.text .fixup __ftr_alt_* .ref.text) + *(.text .text.* .fixup __ftr_alt_* .ref.text) SCHED_TEXT LOCK_TEXT KPROBES_TEXT diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index 3c77091..6afeb9d 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -3,6 +3,7 @@ config PPC64 default n select HAVE_VIRT_CPU_ACCOUNTING select THIN_ARCHIVES + select LINKER_DCE select ZLIB_DEFLATE help This option selects whether a 32-bit or a 64-bit kernel -- 2.8.1 ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [RFC][PATCH 0/5] kbuild changes, thin archives, --gc-sections 2016-08-05 12:11 [RFC][PATCH 0/5] kbuild changes, thin archives, --gc-sections Nicholas Piggin @ 2016-08-05 13:32 ` Nicholas Piggin 2016-08-05 12:12 ` [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination Nicholas Piggin ` (5 subsequent siblings) 6 siblings, 0 replies; 43+ messages in thread From: Nicholas Piggin @ 2016-08-05 13:32 UTC (permalink / raw) To: linux-kbuild Cc: linux-arch, linuxppc-dev, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Segher Boessenkool, Alan Modra On Fri, 5 Aug 2016 22:11:58 +1000 Nicholas Piggin <npiggin@gmail.com> wrote: > Hello, > > I have 3 different things in this patchset. All arch specific, but all > involve kbuild changes, so I'd like to discuss them with kbuild > maintainers. The goal has been to improve long standing linking > difficulties with the powerpc kernel. Here's a 30 second hack of an x86 patch. It seems to build and boot defconfig in a really quick kvm test. For x86-64 machine building x86 target, defconfig, make -j8 vmlinux time: orig thinarc thinarc+dce real 4m58.865s 4m59.747s 5m20.028s user 15m14.428s 15m13.868s 15m17.012s sys 0m57.296s 0m55.904s 0m58.416s build output directory size: orig thinarc thinarc+dce 317M 257M 285M vmlinux size: text data bss dec filename 10192338 4360136 1105920 15658394 vmlinux 10186739 4356040 1105920 15648699 vmlinux.thinarc 9580486 3759880 1011712 14352078 vmlinux.thinarc+dce Thanks, Nick diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 0a7b885..845069e 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -51,6 +51,8 @@ config X86 select ARCH_WANT_IPC_PARSE_VERSION if X86_32 select ARCH_WANT_OPTIONAL_GPIOLIB select BUILDTIME_EXTABLE_SORT + select THIN_ARCHIVES + select LINKER_DCE select CLKEVT_I8253 select CLKSRC_I8253 if X86_32 select CLOCKSOURCE_VALIDATE_LAST_CYCLE diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S index 9297a00..7395dd8 100644 --- a/arch/x86/kernel/vmlinux.lds.S +++ b/arch/x86/kernel/vmlinux.lds.S @@ -92,7 +92,7 @@ SECTIONS .text : AT(ADDR(.text) - LOAD_OFFSET) { _text = .; /* bootstrapping code */ - HEAD_TEXT + KEEP(HEAD_TEXT) . = ALIGN(8); _stext = .; TEXT_TEXT @@ -321,7 +321,7 @@ SECTIONS .bss : AT(ADDR(.bss) - LOAD_OFFSET) { __bss_start = .; *(.bss..page_aligned) - *(.bss) + *(.bss .bss.*) . = ALIGN(PAGE_SIZE); __bss_stop = .; } ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [RFC][PATCH 0/5] kbuild changes, thin archives, --gc-sections @ 2016-08-05 13:32 ` Nicholas Piggin 0 siblings, 0 replies; 43+ messages in thread From: Nicholas Piggin @ 2016-08-05 13:32 UTC (permalink / raw) To: linux-kbuild Cc: linux-arch, Stephen Rothwell, Arnd Bergmann, Nicolas Pitre, Alan Modra, linuxppc-dev On Fri, 5 Aug 2016 22:11:58 +1000 Nicholas Piggin <npiggin@gmail.com> wrote: > Hello, > > I have 3 different things in this patchset. All arch specific, but all > involve kbuild changes, so I'd like to discuss them with kbuild > maintainers. The goal has been to improve long standing linking > difficulties with the powerpc kernel. Here's a 30 second hack of an x86 patch. It seems to build and boot defconfig in a really quick kvm test. For x86-64 machine building x86 target, defconfig, make -j8 vmlinux time: orig thinarc thinarc+dce real 4m58.865s 4m59.747s 5m20.028s user 15m14.428s 15m13.868s 15m17.012s sys 0m57.296s 0m55.904s 0m58.416s build output directory size: orig thinarc thinarc+dce 317M 257M 285M vmlinux size: text data bss dec filename 10192338 4360136 1105920 15658394 vmlinux 10186739 4356040 1105920 15648699 vmlinux.thinarc 9580486 3759880 1011712 14352078 vmlinux.thinarc+dce Thanks, Nick diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 0a7b885..845069e 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -51,6 +51,8 @@ config X86 select ARCH_WANT_IPC_PARSE_VERSION if X86_32 select ARCH_WANT_OPTIONAL_GPIOLIB select BUILDTIME_EXTABLE_SORT + select THIN_ARCHIVES + select LINKER_DCE select CLKEVT_I8253 select CLKSRC_I8253 if X86_32 select CLOCKSOURCE_VALIDATE_LAST_CYCLE diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S index 9297a00..7395dd8 100644 --- a/arch/x86/kernel/vmlinux.lds.S +++ b/arch/x86/kernel/vmlinux.lds.S @@ -92,7 +92,7 @@ SECTIONS .text : AT(ADDR(.text) - LOAD_OFFSET) { _text = .; /* bootstrapping code */ - HEAD_TEXT + KEEP(HEAD_TEXT) . = ALIGN(8); _stext = .; TEXT_TEXT @@ -321,7 +321,7 @@ SECTIONS .bss : AT(ADDR(.bss) - LOAD_OFFSET) { __bss_start = .; *(.bss..page_aligned) - *(.bss) + *(.bss .bss.*) . = ALIGN(PAGE_SIZE); __bss_stop = .; } ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [RFC][PATCH 0/5] kbuild changes, thin archives, --gc-sections 2016-08-05 12:11 [RFC][PATCH 0/5] kbuild changes, thin archives, --gc-sections Nicholas Piggin ` (5 preceding siblings ...) 2016-08-05 13:32 ` Nicholas Piggin @ 2016-08-07 20:23 ` Arnd Bergmann 2016-08-08 3:53 ` Nicholas Piggin 6 siblings, 1 reply; 43+ messages in thread From: Arnd Bergmann @ 2016-08-07 20:23 UTC (permalink / raw) To: Nicholas Piggin Cc: linux-kbuild, linux-arch, linuxppc-dev, Stephen Rothwell, Nicolas Pitre, Segher Boessenkool, Alan Modra On Friday, August 5, 2016 10:11:58 PM CEST Nicholas Piggin wrote: > Hello, > > I have 3 different things in this patchset. All arch specific, but all > involve kbuild changes, so I'd like to discuss them with kbuild > maintainers. The goal has been to improve long standing linking > difficulties with the powerpc kernel. > > * First, building kernel using thin archives rather than incremental > linking. This seems quite clean and is per-arch, so I hope it should > not be too controversial. > > * Second, building kernel using -ffunction-sections -fdata-sections, > --gc-sections. Yes, I'm spinning the wheel again. It was motivated > by tiny codesize regression in the first patch, but the results seem > too good to ignore. > > * Third, allowing architecture to run a tool over module after it has > been linked. Powerpc wants to use it in order to relocate "alternate > code" instructions that get don't get linked at their runtime > address. No idea if this is the right approach wrt kbuild, but it > seems to work. > > I have included the powerpc code for the first two as a reference. The > third is much bigger and mostly uninteresting for this cc list, but it > can be found here: > > https://patchwork.ozlabs.org/patch/651006/ > > Comments appreciated. > > I've started tested this a bit on ARM now. The first things I noticed are: 1. /home/arnd/cross-gcc/bin/arm-linux-gnueabi-ld: warning: drivers/xen/efi.o uses 2-byte wchar_t yet the output is to use 4-byte wchar_t; use of wchar_t values across objects may fail (actually this one has been present since the first version that stopped using recursive linking, I did 971a69db7dc0 ("Xen: don't warn about 2-byte wchar_t in efi") when I first saw the bug, but that fix no longer works and we have to do this differently 2. big-endian builds on ARM stopped working, I now get 22:53:02 CC init/do_mounts_md.o 22:53:02 LD init/mounts.o /home/arnd/cross-gcc/bin/arm-linux-gnueabi-ld: init/do_mounts.o: compiled for a big endian system and target is little endian /home/arnd/cross-gcc/bin/arm-linux-gnueabi-ld: failed to merge target specific data of file init/do_mounts.o The problem seems to be that we don't pass the correct linker flags any more, it should be using --be8 from arch/arm/Makefile:LDFLAGS_vmlinux += --be8 arch/arm/Makefile:LDFLAGS_MODULE += --be8 but that somehow is lost. 3. drivers/firmware/efi/libstub/lib.a: error adding symbols: Archive has no index; run ranlib to add one haven't investigated at all, turned off EFI for now. Arnd ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [RFC][PATCH 0/5] kbuild changes, thin archives, --gc-sections 2016-08-07 20:23 ` Arnd Bergmann @ 2016-08-08 3:53 ` Nicholas Piggin 0 siblings, 0 replies; 43+ messages in thread From: Nicholas Piggin @ 2016-08-08 3:53 UTC (permalink / raw) To: Arnd Bergmann Cc: linux-kbuild, linux-arch, linuxppc-dev, Stephen Rothwell, Nicolas Pitre, Segher Boessenkool, Alan Modra On Sun, 07 Aug 2016 22:23:02 +0200 Arnd Bergmann <arnd@arndb.de> wrote: > On Friday, August 5, 2016 10:11:58 PM CEST Nicholas Piggin wrote: > > Hello, > > > > I have 3 different things in this patchset. All arch specific, but all > > involve kbuild changes, so I'd like to discuss them with kbuild > > maintainers. The goal has been to improve long standing linking > > difficulties with the powerpc kernel. > > > > * First, building kernel using thin archives rather than incremental > > linking. This seems quite clean and is per-arch, so I hope it should > > not be too controversial. > > > > * Second, building kernel using -ffunction-sections -fdata-sections, > > --gc-sections. Yes, I'm spinning the wheel again. It was motivated > > by tiny codesize regression in the first patch, but the results seem > > too good to ignore. > > > > * Third, allowing architecture to run a tool over module after it has > > been linked. Powerpc wants to use it in order to relocate "alternate > > code" instructions that get don't get linked at their runtime > > address. No idea if this is the right approach wrt kbuild, but it > > seems to work. > > > > I have included the powerpc code for the first two as a reference. The > > third is much bigger and mostly uninteresting for this cc list, but it > > can be found here: > > > > https://patchwork.ozlabs.org/patch/651006/ > > > > Comments appreciated. > > > > > > I've started tested this a bit on ARM now. The first things I noticed > are: > > 1. /home/arnd/cross-gcc/bin/arm-linux-gnueabi-ld: warning: drivers/xen/efi.o uses 2-byte wchar_t yet the output is to use 4-byte wchar_t; use of wchar_t values across objects may fail > > (actually this one has been present since the first version that > stopped using recursive linking, I did 971a69db7dc0 ("Xen: don't > warn about 2-byte wchar_t in efi") when I first saw the bug, but > that fix no longer works and we have to do this differently > > 2. big-endian builds on ARM stopped working, I now get > > 22:53:02 CC init/do_mounts_md.o > 22:53:02 LD init/mounts.o > /home/arnd/cross-gcc/bin/arm-linux-gnueabi-ld: init/do_mounts.o: compiled for a big endian system and target is little endian > /home/arnd/cross-gcc/bin/arm-linux-gnueabi-ld: failed to merge target specific data of file init/do_mounts.o > > The problem seems to be that we don't pass the correct linker > flags any more, it should be using --be8 from > > arch/arm/Makefile:LDFLAGS_vmlinux += --be8 > arch/arm/Makefile:LDFLAGS_MODULE += --be8 > > but that somehow is lost. These both seem like linker flags are being lost for some reason, but I don't really know why. Maybe ARM's Makefile. I don't have an ARM build setup but I'll try to help track it down when I get some time. > 3. drivers/firmware/efi/libstub/lib.a: error adding symbols: Archive has no index; run ranlib to add one > > haven't investigated at all, turned off EFI for now. Okay this looks like a bug in patch 2 (caused by me, not Stephen!) For thin archives, cmd_link_l_target should be: cmd_link_l_target = rm -f $@; $(AR) rcT$(KBUILD_ARFLAGS) $@ $(lib-y) (note no "S" option). Thanks, Nick ^ permalink raw reply [flat|nested] 43+ messages in thread
end of thread, other threads:[~2016-08-10 0:37 UTC | newest] Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-08-05 12:11 [RFC][PATCH 0/5] kbuild changes, thin archives, --gc-sections Nicholas Piggin 2016-08-05 12:11 ` [PATCH 1/5] kbuild: allow architectures to use thin archives instead of ld -r Nicholas Piggin 2016-08-06 3:50 ` kbuild test robot 2016-08-06 3:50 ` kbuild test robot 2016-08-06 3:50 ` kbuild test robot 2016-08-06 20:10 ` Sam Ravnborg 2016-08-07 1:49 ` Stephen Rothwell 2016-08-07 3:34 ` Alan Modra 2016-08-07 4:17 ` Nicolas Pitre 2016-08-07 14:40 ` Sam Ravnborg 2016-08-08 3:19 ` Nicholas Piggin 2016-08-08 4:46 ` Sam Ravnborg 2016-08-08 3:25 ` Nicholas Piggin 2016-08-08 9:18 ` Arnd Bergmann 2016-08-05 12:12 ` [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination Nicholas Piggin 2016-08-06 20:14 ` Sam Ravnborg 2016-08-08 3:29 ` Nicholas Piggin 2016-08-08 4:49 ` Sam Ravnborg 2016-08-07 5:33 ` Nicolas Pitre 2016-08-08 3:42 ` Nicholas Piggin 2016-08-08 4:12 ` Nicolas Pitre 2016-08-08 4:27 ` Nicholas Piggin 2016-08-07 9:57 ` Alan Modra 2016-08-07 11:35 ` Andreas Schwab 2016-08-07 20:26 ` Arnd Bergmann 2016-08-07 23:49 ` Alan Modra 2016-08-08 15:14 ` Arnd Bergmann 2016-08-08 23:50 ` Alan Modra 2016-08-09 22:10 ` Arnd Bergmann 2016-08-09 3:16 ` Andi Kleen 2016-08-09 22:29 ` Arnd Bergmann 2016-08-09 23:08 ` Andi Kleen 2016-08-10 0:37 ` Andi Kleen 2016-08-05 12:12 ` [PATCH 3/5] kbuild: add arch specific post-module-link pass Nicholas Piggin 2016-08-05 13:56 ` Nicholas Piggin 2016-08-06 20:16 ` Sam Ravnborg 2016-08-08 3:30 ` Nicholas Piggin 2016-08-05 12:12 ` [PATCH 4/5] powerpc: switch to using thin archives Nicholas Piggin 2016-08-05 12:12 ` [PATCH 5/5] powerpc/64: use linker dce Nicholas Piggin 2016-08-05 13:32 ` [RFC][PATCH 0/5] kbuild changes, thin archives, --gc-sections Nicholas Piggin 2016-08-05 13:32 ` Nicholas Piggin 2016-08-07 20:23 ` Arnd Bergmann 2016-08-08 3:53 ` Nicholas Piggin
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.