From: Nicholas Piggin <npiggin@gmail.com>
To: Arnd Bergmann <arnd@kernel.org>, Fangrui Song <maskray@google.com>
Cc: Ard Biesheuvel <ardb@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	Andrew Scull <ascull@google.com>, Mark Brown <broonie@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	clang-built-linux <clang-built-linux@googlegroups.com>,
	David Brazdil <dbrazdil@google.com>,
	Geert Uytterhoeven <geert+renesas@glider.be>,
	Ionela Voinescu <ionela.voinescu@arm.com>,
	Kees Cook <keescook@chromium.org>,
	Kristina Martsenko <kristina.martsenko@arm.com>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Marc Zyngier <maz@kernel.org>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Vincenzo Frascino <vincenzo.frascino@arm.com>,
	Will Deacon <will@kernel.org>, Nicolas Pitre <nico@fluxnic.net>
Subject: Re: [PATCH] [RFC] arm64: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION
Date: Mon, 01 Mar 2021 11:11:22 +1000
Message-ID: <1614559739.p25z5x88wl.astroid@bobo.none>
In-Reply-To: <CAK8P3a2bLKe3js4SKeZoGp8B51+rpW6G3KvpbJ5=y83sxHSu6g@mail.gmail.com>

Excerpts from Arnd Bergmann's message of February 27, 2021 7:49 pm:
> On Fri, Feb 26, 2021 at 10:13 PM 'Fangrui Song' via Clang Built Linux
> <clang-built-linux@googlegroups.com> wrote:
>>
>> For folks who are interested in --gc-sections on metadata sections,
>> I want to bring to your attention the implications of __start_/__stop_
>> symbols and C identifier name sections.
>> You can see https://github.com/ClangBuiltLinux/linux/issues/1307 for a summary.
>> (Its linked blog article has some examples.)
>>
>> In the kernel linker scripts, most C identifier name sections begin with double-underscore __.
>> Some are surrounded by `KEEP(...)`, some are not.
>>
>> * A `KEEP` keyword has GC root semantics and makes ld --gc-sections ineffective.
>> * Without `KEEP`, __start_/__stop_ references from a live input section
>>    can unnecessarily retain all the associated C identifier name input
>>    sections. The new ld.lld option `-z start-stop-gc` can defeat this rule.
>>
>> As an example, a __start___jump_table reference from a live section
>> causes all `__jump_table` input sections to be retained, even if you
>> change `KEEP(__jump_table)` to `(__jump_table)`.
>> (If you change the symbol name from `__start_${section}` to something
>> else (e.g. `__start${section}`), the rule will not apply.)
> 
> I suspect the __start_* symbols are cargo-culted by many developers
> copying stuff around between kernel linker scripts, that's certainly how I
> approach making changes to it normally without a deeper understanding
> of how the linker actually works or what the different bits of syntax mean
> there.
> 
> I see the original vmlinux.lds linker script showed up in linux-2.1.23, and
> it contained
> 
> +  . = ALIGN(16);               /* Exception table */
> +  __start___ex_table = .;
> +  __ex_table : { *(__ex_table) }
> +  __stop___ex_table = .;
> +
> +  __start___ksymtab = .;       /* Kernel symbol table */
> +  __ksymtab : { *(__ksymtab) }
> +  __stop___ksymtab = .;
> 
> originally for arch/sparc, and shortly afterwards for i386. The magic
> __ex_table section was first used in linux-2.1.7 without a linker
> script. It's probably a good idea to try cleaning these up by using
> non-magic start/stop symbols for all sections, and relying on KEEP()
> instead where needed.
> 
>> There is a lot of KEEP usage. Perhaps some can be dropped to facilitate
>> ld --gc-sections.
> 
> I see a lot of these were added by Nick Piggin (added to Cc) in this commit:
> 
> commit 266ff2a8f51f02b429a987d87634697eb0d01d6a
> Author: Nicholas Piggin <npiggin@gmail.com>
> Date:   Wed May 9 22:59:58 2018 +1000
> 
>     kbuild: Fix asm-generic/vmlinux.lds.h for LD_DEAD_CODE_DATA_ELIMINATION
> 
>     KEEP more tables, and add the function/data section wildcard to more
>     section selections.
> 
>     This is a little ad-hoc at the moment, but kernel code should be moved
>     to consistently use .text..x (note: double dots) for explicit sections
>     and all references to it in the linker script can be made with
>     TEXT_MAIN, and similarly for other sections.
> 
>     For now, let's see if major architectures move to enabling this option
>     then we can do some refactoring passes. Otherwise if it remains unused
>     or superseded by LTO, this may not be required.
> 
>     Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>     Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
> 
> which apparently was intentionally cautious.
> 
> Unlike what Nick expected in his submission, I now think the annotations
> will be needed for LTO just like they are for --gc-sections.

Yeah, I wasn't sure exactly what LTO looks like or how it would work.
I thought perhaps LTO might be able to find dead code through circular /
back references: we could put references from the code back to these
tables or something, so they would be kept without KEEP. I don't know, I
was handwaving!

I managed to get powerpc (and IIRC x86?) working with --gc-sections using
those KEEP annotations, but the effectiveness is of course far worse than
what Nicolas was able to achieve with all his techniques and tricks.

But yes, unless there is some other mechanism to handle these tables,
KEEP probably has to stay. I suggest this wants a very explicit and
systematic way to handle it (maybe with some toolchain support) rather
than trying to just remove things case by case and see what breaks.
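
For anyone following along, here is a rough userspace sketch of the
kind of table being discussed. It is not kernel code: the section name
demo_table, the DEMO_ENTRY macro and both helper functions are made up
for illustration, and it assumes an ELF target linked with GNU ld or
ld.lld. The point is that nothing refers to the individual entries by
name. They are only reachable through the __start_/__stop_ symbols, so
the linker has no ordinary reference to follow when it decides what
--gc-sections may drop.

```c
#include <stddef.h>

/* Hypothetical table entry; stands in for records such as jump-table
 * or exception-table entries. */
struct demo_entry {
	const char *name;
	long value;
};

/*
 * Emit one entry into the section "demo_table". The name is a valid C
 * identifier, so the linker defines __start_demo_table and
 * __stop_demo_table around the merged output section. "used" only stops
 * the compiler from discarding the otherwise unreferenced object.
 */
#define DEMO_ENTRY(n, v)						\
	static const struct demo_entry demo_entry_##n			\
	__attribute__((used, section("demo_table"))) = { #n, (v) }

DEMO_ENTRY(alpha, 1);
DEMO_ENTRY(beta, 2);
DEMO_ENTRY(gamma, 3);

/* Defined by the linker, not by any object file. */
extern const struct demo_entry __start_demo_table[];
extern const struct demo_entry __stop_demo_table[];

/* Number of entries present in the final link. */
size_t demo_table_count(void)
{
	return (size_t)(__stop_demo_table - __start_demo_table);
}

/* Walk the table the way kernel code walks its metadata sections. */
long demo_table_sum(void)
{
	const struct demo_entry *e;
	long sum = 0;

	for (e = __start_demo_table; e < __stop_demo_table; e++)
		sum += e->value;
	return sum;
}
```

Calling demo_table_count() from a small main() should report the three
entries above. In a linker script that names its own boundary symbols
differently, the entries would need KEEP (or some other explicit
reference) to survive --gc-sections, which is the trade-off described
above.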

I don't know if Nicolas has still been working on his shrinking patches
recently, but he probably knows more than anyone about this stuff.

Thanks,
Nick

