stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections
       [not found] <20200731230820.1742553-1-keescook@chromium.org>
@ 2020-07-31 23:07 ` Kees Cook
  2020-08-01  3:51   ` Arvind Sankar
                     ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Kees Cook @ 2020-07-31 23:07 UTC (permalink / raw)
  To: Thomas Gleixner, Will Deacon
  Cc: Kees Cook, Nick Desaulniers, Jian Cai,
	Fāng-ruì Sòng, Luis Lozano, Manoj Gupta, stable,
	Catalin Marinas, Mark Rutland, Ard Biesheuvel,
	Peter Collingbourne, James Morse, Borislav Petkov, Ingo Molnar,
	Russell King, Masahiro Yamada, Arvind Sankar, Nathan Chancellor,
	Arnd Bergmann, x86, clang-built-linux, linux-arch, linux-efi,
	linux-arm-kernel, linux-kernel

From: Nick Desaulniers <ndesaulniers@google.com>

Basically, consider .text.{hot|unlikely|unknown}.* part of .text, too.

When compiling with profiling information (collected via PGO
instrumentations or AutoFDO sampling), Clang will separate code into
.text.hot, .text.unlikely, or .text.unknown sections based on profiling
information. After D79600 (clang-11), these sections will have a
trailing `.` suffix, ie.  .text.hot., .text.unlikely., .text.unknown..

When using -ffunction-sections together with profiling infomation,
either explicitly (FGKASLR) or implicitly (LTO), code may be placed in
sections following the convention:
.text.hot.<foo>, .text.unlikely.<bar>, .text.unknown.<baz>
where <foo>, <bar>, and <baz> are functions.  (This produces one section
per function; we generally try to merge these all back via linker script
so that we don't have 50k sections).

For the above cases, we need to teach our linker scripts that such
sections might exist and that we'd explicitly like them grouped
together, otherwise we can wind up with code outside of the
_stext/_etext boundaries that might not be mapped properly for some
architectures, resulting in boot failures.

If the linker script is not told about possible input sections, then
where the section is placed as output is a heuristic-laiden mess that's
non-portable between linkers (ie. BFD and LLD), and has resulted in many
hard to debug bugs.  Kees Cook is working on cleaning this up by adding
--orphan-handling=warn linker flag used in ARCH=powerpc to additional
architectures. In the case of linker scripts, borrowing from the Zen of
Python: explicit is better than implicit.

Also, ld.bfd's internal linker script considers .text.hot AND
.text.hot.* to be part of .text, as well as .text.unlikely and
.text.unlikely.*. I didn't see support for .text.unknown.*, and didn't
see Clang producing such code in our kernel builds, but I see code in
LLVM that can produce such section names if profiling information is
missing. That may point to a larger issue with generating or collecting
profiles, but I would much rather be safe and explicit than have to
debug yet another issue related to orphan section placement.

Reported-by: Jian Cai <jiancai@google.com>
Suggested-by: Fāng-ruì Sòng <maskray@google.com>
Tested-by: Luis Lozano <llozano@google.com>
Tested-by: Manoj Gupta <manojgupta@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=add44f8d5c5c05e08b11e033127a744d61c26aee
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=1de778ed23ce7492c523d5850c6c6dbb34152655
Link: https://reviews.llvm.org/D79600
Link: https://bugs.chromium.org/p/chromium/issues/detail?id=1084760
Debugged-by: Luis Lozano <llozano@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/asm-generic/vmlinux.lds.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 2593957f6e8b..af5211ca857c 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -561,7 +561,10 @@
  */
 #define TEXT_TEXT							\
 		ALIGN_FUNCTION();					\
-		*(.text.hot TEXT_MAIN .text.fixup .text.unlikely)	\
+		*(.text.hot .text.hot.*)				\
+		*(TEXT_MAIN .text.fixup)				\
+		*(.text.unlikely .text.unlikely.*)			\
+		*(.text.unknown .text.unknown.*)			\
 		NOINSTR_TEXT						\
 		*(.text..refcount)					\
 		*(.ref.text)						\
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections
  2020-07-31 23:07 ` [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections Kees Cook
@ 2020-08-01  3:51   ` Arvind Sankar
  2020-08-01  6:18     ` Kees Cook
  2020-08-03 19:05     ` Andi Kleen
  2020-08-06  1:24   ` Sasha Levin
  2020-08-19 23:56   ` Sasha Levin
  2 siblings, 2 replies; 14+ messages in thread
From: Arvind Sankar @ 2020-08-01  3:51 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Will Deacon, Nick Desaulniers, Jian Cai,
	Fāng-ruì Sòng, Luis Lozano, Manoj Gupta, stable,
	Catalin Marinas, Mark Rutland, Ard Biesheuvel,
	Peter Collingbourne, James Morse, Borislav Petkov, Ingo Molnar,
	Russell King, Masahiro Yamada, Arvind Sankar, Nathan Chancellor,
	Arnd Bergmann, x86, clang-built-linux, linux-arch, linux-efi,
	linux-arm-kernel, linux-kernel, Andi Kleen, Michal Marek

On Fri, Jul 31, 2020 at 04:07:57PM -0700, Kees Cook wrote:
> From: Nick Desaulniers <ndesaulniers@google.com>
> 
> Basically, consider .text.{hot|unlikely|unknown}.* part of .text, too.
> 
> When compiling with profiling information (collected via PGO
> instrumentations or AutoFDO sampling), Clang will separate code into
> .text.hot, .text.unlikely, or .text.unknown sections based on profiling
> information. After D79600 (clang-11), these sections will have a
> trailing `.` suffix, ie.  .text.hot., .text.unlikely., .text.unknown..
> 
> When using -ffunction-sections together with profiling infomation,
> either explicitly (FGKASLR) or implicitly (LTO), code may be placed in
> sections following the convention:
> .text.hot.<foo>, .text.unlikely.<bar>, .text.unknown.<baz>
> where <foo>, <bar>, and <baz> are functions.  (This produces one section
> per function; we generally try to merge these all back via linker script
> so that we don't have 50k sections).
> 
> For the above cases, we need to teach our linker scripts that such
> sections might exist and that we'd explicitly like them grouped
> together, otherwise we can wind up with code outside of the
> _stext/_etext boundaries that might not be mapped properly for some
> architectures, resulting in boot failures.
> 
> If the linker script is not told about possible input sections, then
> where the section is placed as output is a heuristic-laiden mess that's
> non-portable between linkers (ie. BFD and LLD), and has resulted in many
> hard to debug bugs.  Kees Cook is working on cleaning this up by adding
> --orphan-handling=warn linker flag used in ARCH=powerpc to additional
> architectures. In the case of linker scripts, borrowing from the Zen of
> Python: explicit is better than implicit.
> 
> Also, ld.bfd's internal linker script considers .text.hot AND
> .text.hot.* to be part of .text, as well as .text.unlikely and
> .text.unlikely.*. I didn't see support for .text.unknown.*, and didn't
> see Clang producing such code in our kernel builds, but I see code in
> LLVM that can produce such section names if profiling information is
> missing. That may point to a larger issue with generating or collecting
> profiles, but I would much rather be safe and explicit than have to
> debug yet another issue related to orphan section placement.
> 
> Reported-by: Jian Cai <jiancai@google.com>
> Suggested-by: Fāng-ruì Sòng <maskray@google.com>
> Tested-by: Luis Lozano <llozano@google.com>
> Tested-by: Manoj Gupta <manojgupta@google.com>
> Acked-by: Kees Cook <keescook@chromium.org>
> Cc: stable@vger.kernel.org
> Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=add44f8d5c5c05e08b11e033127a744d61c26aee
> Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=1de778ed23ce7492c523d5850c6c6dbb34152655
> Link: https://reviews.llvm.org/D79600
> Link: https://bugs.chromium.org/p/chromium/issues/detail?id=1084760
> Debugged-by: Luis Lozano <llozano@google.com>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  include/asm-generic/vmlinux.lds.h | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 2593957f6e8b..af5211ca857c 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -561,7 +561,10 @@
>   */
>  #define TEXT_TEXT							\
>  		ALIGN_FUNCTION();					\
> -		*(.text.hot TEXT_MAIN .text.fixup .text.unlikely)	\
> +		*(.text.hot .text.hot.*)				\
> +		*(TEXT_MAIN .text.fixup)				\
> +		*(.text.unlikely .text.unlikely.*)			\
> +		*(.text.unknown .text.unknown.*)			\
>  		NOINSTR_TEXT						\
>  		*(.text..refcount)					\
>  		*(.ref.text)						\
> -- 
> 2.25.1
> 

This also changes the ordering to place all hot resp unlikely sections separate
from other text, while currently it places the hot/unlikely bits of each file
together with the rest of the code in that file. That seems like a reasonable
change and should be mentioned in the commit message.

However, the history of their being together comes from

  9bebe9e5b0f3 ("kbuild: Fix .text.unlikely placement")

which seems to indicate there was some problem with having them separated out,
although I don't quite understand what the issue was from the commit message.

Cc Andi and Michal to see if they remember.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections
  2020-08-01  3:51   ` Arvind Sankar
@ 2020-08-01  6:18     ` Kees Cook
  2020-08-01 17:27       ` Arvind Sankar
  2020-08-03 19:05     ` Andi Kleen
  1 sibling, 1 reply; 14+ messages in thread
From: Kees Cook @ 2020-08-01  6:18 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Thomas Gleixner, Will Deacon, Nick Desaulniers, Jian Cai,
	Fāng-ruì Sòng, Luis Lozano, Manoj Gupta, stable,
	Catalin Marinas, Mark Rutland, Ard Biesheuvel,
	Peter Collingbourne, James Morse, Borislav Petkov, Ingo Molnar,
	Russell King, Masahiro Yamada, Nathan Chancellor, Arnd Bergmann,
	x86, clang-built-linux, linux-arch, linux-efi, linux-arm-kernel,
	linux-kernel, Andi Kleen, Michal Marek, Kristen Carlson Accardi

On Fri, Jul 31, 2020 at 11:51:28PM -0400, Arvind Sankar wrote:
> On Fri, Jul 31, 2020 at 04:07:57PM -0700, Kees Cook wrote:
> > From: Nick Desaulniers <ndesaulniers@google.com>
> > 
> > Basically, consider .text.{hot|unlikely|unknown}.* part of .text, too.
> > 
> > When compiling with profiling information (collected via PGO
> > instrumentations or AutoFDO sampling), Clang will separate code into
> > .text.hot, .text.unlikely, or .text.unknown sections based on profiling
> > information. After D79600 (clang-11), these sections will have a
> > trailing `.` suffix, ie.  .text.hot., .text.unlikely., .text.unknown..
> > 
> > When using -ffunction-sections together with profiling infomation,
> > either explicitly (FGKASLR) or implicitly (LTO), code may be placed in
> > sections following the convention:
> > .text.hot.<foo>, .text.unlikely.<bar>, .text.unknown.<baz>
> > where <foo>, <bar>, and <baz> are functions.  (This produces one section
> > per function; we generally try to merge these all back via linker script
> > so that we don't have 50k sections).
> > 
> > For the above cases, we need to teach our linker scripts that such
> > sections might exist and that we'd explicitly like them grouped
> > together, otherwise we can wind up with code outside of the
> > _stext/_etext boundaries that might not be mapped properly for some
> > architectures, resulting in boot failures.
> > 
> > If the linker script is not told about possible input sections, then
> > where the section is placed as output is a heuristic-laiden mess that's
> > non-portable between linkers (ie. BFD and LLD), and has resulted in many
> > hard to debug bugs.  Kees Cook is working on cleaning this up by adding
> > --orphan-handling=warn linker flag used in ARCH=powerpc to additional
> > architectures. In the case of linker scripts, borrowing from the Zen of
> > Python: explicit is better than implicit.
> > 
> > Also, ld.bfd's internal linker script considers .text.hot AND
> > .text.hot.* to be part of .text, as well as .text.unlikely and
> > .text.unlikely.*. I didn't see support for .text.unknown.*, and didn't
> > see Clang producing such code in our kernel builds, but I see code in
> > LLVM that can produce such section names if profiling information is
> > missing. That may point to a larger issue with generating or collecting
> > profiles, but I would much rather be safe and explicit than have to
> > debug yet another issue related to orphan section placement.
> > 
> > Reported-by: Jian Cai <jiancai@google.com>
> > Suggested-by: Fāng-ruì Sòng <maskray@google.com>
> > Tested-by: Luis Lozano <llozano@google.com>
> > Tested-by: Manoj Gupta <manojgupta@google.com>
> > Acked-by: Kees Cook <keescook@chromium.org>
> > Cc: stable@vger.kernel.org
> > Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=add44f8d5c5c05e08b11e033127a744d61c26aee
> > Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=1de778ed23ce7492c523d5850c6c6dbb34152655
> > Link: https://reviews.llvm.org/D79600
> > Link: https://bugs.chromium.org/p/chromium/issues/detail?id=1084760
> > Debugged-by: Luis Lozano <llozano@google.com>
> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > ---
> >  include/asm-generic/vmlinux.lds.h | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index 2593957f6e8b..af5211ca857c 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -561,7 +561,10 @@
> >   */
> >  #define TEXT_TEXT							\
> >  		ALIGN_FUNCTION();					\
> > -		*(.text.hot TEXT_MAIN .text.fixup .text.unlikely)	\
> > +		*(.text.hot .text.hot.*)				\
> > +		*(TEXT_MAIN .text.fixup)				\
> > +		*(.text.unlikely .text.unlikely.*)			\
> > +		*(.text.unknown .text.unknown.*)			\
> >  		NOINSTR_TEXT						\
> >  		*(.text..refcount)					\
> >  		*(.ref.text)						\
> > -- 
> > 2.25.1
> > 
> 
> This also changes the ordering to place all hot resp unlikely sections separate
> from other text, while currently it places the hot/unlikely bits of each file
> together with the rest of the code in that file. That seems like a reasonable

Oh, hmm, yes, we aren't explicitly using SORT() here. Does that mean the
input sections were entirely be ordered in compilation unit link order,
even in the case of orphan sections? (And I think either way, the answer
isn't the same between bfd and lld.) I actually thought the like-named
input sections were collected together first with lld, but bfd strictly
appended to the output section. I guess it's time for me to stare at -M
output from ld...

Regardless, this patch is attempting to fix the problem where bfd and lld
lay out the orphans differently (as mentioned above, lld seems to sort
them in a way that is not strictly appended, and bfd seems to sort them
strictly appended). In the case of being appended to the .text output
section, this would cause boot failures due to _etext not covering the
resulting sections (which this[1] also encountered and fixed to be more
robust for such appended collection -- that series actually _depends_ on
orphan handling doing the appending, because there is no current way
to map wildcard input sections to their own separate output sections).

> change and should be mentioned in the commit message.
> 
> However, the history of their being together comes from
> 
>   9bebe9e5b0f3 ("kbuild: Fix .text.unlikely placement")
> 
> which seems to indicate there was some problem with having them separated out,
> although I don't quite understand what the issue was from the commit message.

Looking at this again, I actually wonder if we have bigger issues here
with dead code elimination:

#ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
#define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
...

that would catch: .text.hot .text.fixup .text.unlikely and .text.unknown
but not .text.hot.*, etc (i.e. the third dot isn't matched, which is,
I assume, why Clang switched to adding a trailing dot). However, this
patch lists .text.hot .text.hot.* first, so they'd get pulled to the
front correctly, but the trailing ones (with 2 dots) would not, since
they'd match the TEXT_MAIN wildcard first. (This problem actually existed
before this patch too, and is not the fault of 9bebe9e5b0f3, but rather
the addition of TEXT_MAIN, which could potentially match .text.unlikely
and .text.fixup)

Unless I'm totally wrong and the bfd docs don't match the behavior? e.g.
if I have a link order of ".foo.before", ".foo.after", and ".foo.middle",
and this rule:

.foo : { *(.foo.before .foo.* .foo.after) }

do I get this (first match):

	.foo.before
	.foo.after
	.foo.middle

or (most specific match):

	.foo.before
	.foo.middle
	.foo.after

?

As I said, now that I'm able to better articulate these questions, I'll
go get answers from -M output. :)

Perhaps we need to fix TEXT_MAIN not TEXT_TEXT? TEXT_TEXT is for
collecting .text, .text.[^\.]* and *.text, where, effectively,
.text and .text[^\.]* are defined by TEXT_MAIN. i.e. adding 3-dot "text"
input sections needs to likely be included in TEXT_MAIN

Anyway, I'll keep looking at this...

(In the meantime, perhaps we can take Arvind's series, and the earlier
portions of the orphan series where asm-generic/vmlinux.lds.h and other
things are cleaned up...)

-Kees

[1] https://lore.kernel.org/lkml/20200717170008.5949-6-kristen@linux.intel.com/

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections
  2020-08-01  6:18     ` Kees Cook
@ 2020-08-01 17:27       ` Arvind Sankar
  0 siblings, 0 replies; 14+ messages in thread
From: Arvind Sankar @ 2020-08-01 17:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: Arvind Sankar, Thomas Gleixner, Will Deacon, Nick Desaulniers,
	Jian Cai, Fāng-ruì Sòng, Luis Lozano, Manoj Gupta,
	stable, Catalin Marinas, Mark Rutland, Ard Biesheuvel,
	Peter Collingbourne, James Morse, Borislav Petkov, Ingo Molnar,
	Russell King, Masahiro Yamada, Nathan Chancellor, Arnd Bergmann,
	x86, clang-built-linux, linux-arch, linux-efi, linux-arm-kernel,
	linux-kernel, Andi Kleen, Michal Marek, Kristen Carlson Accardi

On Fri, Jul 31, 2020 at 11:18:02PM -0700, Kees Cook wrote:
> On Fri, Jul 31, 2020 at 11:51:28PM -0400, Arvind Sankar wrote:
> > 
> > This also changes the ordering to place all hot resp unlikely sections separate
> > from other text, while currently it places the hot/unlikely bits of each file
> > together with the rest of the code in that file. That seems like a reasonable
> 
> Oh, hmm, yes, we aren't explicitly using SORT() here. Does that mean the
> input sections were entirely be ordered in compilation unit link order,
> even in the case of orphan sections? (And I think either way, the answer
> isn't the same between bfd and lld.) I actually thought the like-named
> input sections were collected together first with lld, but bfd strictly
> appended to the output section. I guess it's time for me to stare at -M
> output from ld...

I don't know what happened to the orphans previously. But .text.hot and
.text.unlikely will now change ordering. It sounds from below like this
wasn't intentional? Though it does seem to be how BFD's default linker
scripts lay it out.

> 
> Regardless, this patch is attempting to fix the problem where bfd and lld
> lay out the orphans differently (as mentioned above, lld seems to sort
> them in a way that is not strictly appended, and bfd seems to sort them
> strictly appended). In the case of being appended to the .text output
> section, this would cause boot failures due to _etext not covering the
> resulting sections (which this[1] also encountered and fixed to be more
> robust for such appended collection -- that series actually _depends_ on
> orphan handling doing the appending, because there is no current way
> to map wildcard input sections to their own separate output sections).
> 
> > change and should be mentioned in the commit message.
> > 
> > However, the history of their being together comes from
> > 
> >   9bebe9e5b0f3 ("kbuild: Fix .text.unlikely placement")
> > 
> > which seems to indicate there was some problem with having them separated out,
> > although I don't quite understand what the issue was from the commit message.
> 
> Looking at this again, I actually wonder if we have bigger issues here
> with dead code elimination:
> 
> #ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
> #define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
> ...
> 
> that would catch: .text.hot .text.fixup .text.unlikely and .text.unknown
> but not .text.hot.*, etc (i.e. the third dot isn't matched, which is,
> I assume, why Clang switched to adding a trailing dot). However, this
> patch lists .text.hot .text.hot.* first, so they'd get pulled to the
> front correctly, but the trailing ones (with 2 dots) would not, since
> they'd match the TEXT_MAIN wildcard first. (This problem actually existed
> before this patch too, and is not the fault of 9bebe9e5b0f3, but rather
> the addition of TEXT_MAIN, which could potentially match .text.unlikely
> and .text.fixup)

The existing comment on TEXT_TEXT mentions that issue. However, note
that the dead code stuff is only available currently on mips and ppc,
and is hidden behind EXPERT for those, so I'm not sure if anyone
actually uses it.

9bebe9e5b0f3 predates LD_DEAD_CODE_DATA_ELIMINATION, and there were no
wildcards I can see in .text at the time, which is why I don't
understand what problem is referred to in the commit message.

Btw, for the FGKASLR stuff, instead of keeping the output sections per
function, couldn't you generate a table of functions with sizes, and use
that when randomizing the order? Then the sections themselves could be
collected into .text explicitly.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections
  2020-08-01  3:51   ` Arvind Sankar
  2020-08-01  6:18     ` Kees Cook
@ 2020-08-03 19:05     ` Andi Kleen
  2020-08-03 20:15       ` Arvind Sankar
  1 sibling, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2020-08-03 19:05 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Kees Cook, Thomas Gleixner, Will Deacon, Nick Desaulniers,
	Jian Cai, Fāng-ruì Sòng, Luis Lozano, Manoj Gupta,
	stable, Catalin Marinas, Mark Rutland, Ard Biesheuvel,
	Peter Collingbourne, James Morse, Borislav Petkov, Ingo Molnar,
	Russell King, Masahiro Yamada, Nathan Chancellor, Arnd Bergmann,
	x86, clang-built-linux, linux-arch, linux-efi, linux-arm-kernel,
	linux-kernel, Michal Marek

> However, the history of their being together comes from
> 
>   9bebe9e5b0f3 ("kbuild: Fix .text.unlikely placement")
> 
> which seems to indicate there was some problem with having them separated out,
> although I don't quite understand what the issue was from the commit message.

Separating it out is less efficient. Gives worse packing for the hot part
if they are not aligned to 64byte boundaries, which they are usually not. 

It also improves packing of the cold part, but that probably doesn't matter.

-Andi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections
  2020-08-03 19:05     ` Andi Kleen
@ 2020-08-03 20:15       ` Arvind Sankar
  2020-08-04  1:19         ` Fāng-ruì Sòng
  2020-08-04  4:45         ` Andi Kleen
  0 siblings, 2 replies; 14+ messages in thread
From: Arvind Sankar @ 2020-08-03 20:15 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arvind Sankar, Kees Cook, Thomas Gleixner, Will Deacon,
	Nick Desaulniers, Jian Cai, Fāng-ruì Sòng,
	Luis Lozano, Manoj Gupta, stable, Catalin Marinas, Mark Rutland,
	Ard Biesheuvel, Peter Collingbourne, James Morse,
	Borislav Petkov, Ingo Molnar, Russell King, Masahiro Yamada,
	Nathan Chancellor, Arnd Bergmann, x86, clang-built-linux,
	linux-arch, linux-efi, linux-arm-kernel, linux-kernel,
	Michal Marek

On Mon, Aug 03, 2020 at 12:05:06PM -0700, Andi Kleen wrote:
> > However, the history of their being together comes from
> > 
> >   9bebe9e5b0f3 ("kbuild: Fix .text.unlikely placement")
> > 
> > which seems to indicate there was some problem with having them separated out,
> > although I don't quite understand what the issue was from the commit message.
> 
> Separating it out is less efficient. Gives worse packing for the hot part
> if they are not aligned to 64byte boundaries, which they are usually not. 
> 
> It also improves packing of the cold part, but that probably doesn't matter.
> 
> -Andi

Why is that? Both .text and .text.hot have alignment of 2^4 (default
function alignment on x86) by default, so it doesn't seem like it should
matter for packing density.  Avoiding interspersing cold text among
regular/hot text seems like it should be a net win.

That old commit doesn't reference efficiency -- it says there was some
problem with matching when they were separated out, but there were no
wildcard section names back then.

commit 9bebe9e5b0f3109a14000df25308c2971f872605
Author: Andi Kleen <ak@linux.intel.com>
Date:   Sun Jul 19 18:01:19 2015 -0700

    kbuild: Fix .text.unlikely placement
    
    When building a kernel with .text.unlikely text the unlikely text for
    each translation unit was put next to the main .text code in the
    final vmlinux.
    
    The problem is that the linker doesn't allow more specific submatches
    of a section name in a different linker script statement after the
    main match.
    
    So we need to move them all into one line. With that change
    .text.unlikely is at the end of everything again.
    
    I also moved .text.hot into the same statement though, even though
    that's not strictly needed.
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Michal Marek <mmarek@suse.com>

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 8bd374d3cf21..1781e54ea6d3 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -412,12 +412,10 @@
  * during second ld run in second ld pass when generating System.map */
 #define TEXT_TEXT							\
 		ALIGN_FUNCTION();					\
-		*(.text.hot)						\
-		*(.text .text.fixup)					\
+		*(.text.hot .text .text.fixup .text.unlikely)		\
 		*(.ref.text)						\
 	MEM_KEEP(init.text)						\
 	MEM_KEEP(exit.text)						\
-		*(.text.unlikely)
 
 
 /* sched.text is aling to function alignment to secure we have same

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections
  2020-08-03 20:15       ` Arvind Sankar
@ 2020-08-04  1:19         ` Fāng-ruì Sòng
  2020-08-04  4:45         ` Andi Kleen
  1 sibling, 0 replies; 14+ messages in thread
From: Fāng-ruì Sòng @ 2020-08-04  1:19 UTC (permalink / raw)
  To: Arvind Sankar, Kees Cook
  Cc: Andi Kleen, Thomas Gleixner, Will Deacon, Nick Desaulniers,
	Jian Cai, Luis Lozano, Manoj Gupta, stable, Catalin Marinas,
	Mark Rutland, Ard Biesheuvel, Peter Collingbourne, James Morse,
	Borislav Petkov, Ingo Molnar, Russell King, Masahiro Yamada,
	Nathan Chancellor, Arnd Bergmann, x86, clang-built-linux,
	linux-arch, linux-efi, linux-arm-kernel, linux-kernel,
	Michal Marek

On 2020-08-03, Arvind Sankar wrote:
>On Mon, Aug 03, 2020 at 12:05:06PM -0700, Andi Kleen wrote:
>> > However, the history of their being together comes from
>> >
>> >   9bebe9e5b0f3 ("kbuild: Fix .text.unlikely placement")
>> >
>> > which seems to indicate there was some problem with having them separated out,
>> > although I don't quite understand what the issue was from the commit message.
>>
>> Separating it out is less efficient. Gives worse packing for the hot part
>> if they are not aligned to 64byte boundaries, which they are usually not.
>>
>> It also improves packing of the cold part, but that probably doesn't matter.
>>
>> -Andi
>
>Why is that? Both .text and .text.hot have alignment of 2^4 (default
>function alignment on x86) by default, so it doesn't seem like it should
>matter for packing density.  Avoiding interspersing cold text among
>regular/hot text seems like it should be a net win.
>
>That old commit doesn't reference efficiency -- it says there was some
>problem with matching when they were separated out, but there were no
>wildcard section names back then.

I just want to share some context. GNU ld's internal linker script does
impose a particular input section order by specifying separate input section descriptions:

   .text           :
   {
     *(.text.unlikely .text.*_unlikely .text.unlikely.*)
     *(.text.exit .text.exit.*)
     *(.text.startup .text.startup.*)
     *(.text.hot .text.hot.*)
     *(SORT(.text.sorted.*))   # binutils 5fa5f8f5fe494ba4fe98c11899a5464cd164ec75, invented for GCC's call graph profiling. LLVM doesn't use it
     *(.text .stub .text.* .gnu.linkonce.t.*)
     ...

This order is a bit arbitrary. gold and LLD have -z keep-text-section-prefix. With the option,
there can be several output sections, with the '.unlikely'/'.exit'/'.startup'/etc suffix.
This has the advantage that the hot/unlikely/exit/etc attribution of a particular function is more obvious:

   [ 2] .text             PROGBITS        000000000040007c 00007c 000003 00  AX  0   0  4
   [ 3] .text.startup     PROGBITS        000000000040007f 00007f 000001 00  AX  0   0  1
   [ 4] .text.exit        PROGBITS        0000000000400080 000080 000002 00  AX  0   0  1
   [ 5] .text.unlikely    PROGBITS        0000000000400082 000082 000001 00  AX  0   0  1 
   ...

In our case we only need one output section.......

If we place all text sections in one input section description:

     *(.text.unlikely .text.*_unlikely .text.exit .text.exit.* .text.startup .text.startup.* .text.hot .text.hot.* ... )

In many cases the input sections are laid out in the input order. In LLD there are two ordering cases:

* If clang PGO (-fprofile-use=) is enabled, .llvm.call-graph-profile will be created automatically.
   LLD can perform reordering **within an input section description**. The ordering is quite complex,
   you can read https://github.com/llvm/llvm-project/blob/master/lld/ELF/CallGraphSort.cpp#L9 if you are curious:)

   I don't know the performance improvement of this heuristic. (I don't think the original paper
   cgo2017-hfsort-final1.pdf took ThinLTO into account, so the result might not
   reflect realistic work loads where both ThinLTO and PGO are used) This, if matters, likely only matters
   for very large executable, not the case for the kernel.
* On some RISC architectures (ARM/AArch64/PowerPC),
   the ordered sections (due to either .llvm.call-graph-profile or
   --symbol-reordering-file=; the two can't be used together) are placed in a
   suitable place in the input section description
   ( http://reviews.llvm.org/D44969 )


In summary, using one (large) input section description may have some
performance improvement with LLD but I don't think it will be significant.
There may be some size improvement for ARM/AArch64/PowerPC if someone wants to test.

>commit 9bebe9e5b0f3109a14000df25308c2971f872605
>Author: Andi Kleen <ak@linux.intel.com>
>Date:   Sun Jul 19 18:01:19 2015 -0700
>
>    kbuild: Fix .text.unlikely placement
>
>    When building a kernel with .text.unlikely text the unlikely text for
>    each translation unit was put next to the main .text code in the
>    final vmlinux.
>
>    The problem is that the linker doesn't allow more specific submatches
>    of a section name in a different linker script statement after the
>    main match.
>
>    So we need to move them all into one line. With that change
>    .text.unlikely is at the end of everything again.
>
>    I also moved .text.hot into the same statement though, even though
>    that's not strictly needed.
>
>    Signed-off-by: Andi Kleen <ak@linux.intel.com>
>    Signed-off-by: Michal Marek <mmarek@suse.com>
>
>diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
>index 8bd374d3cf21..1781e54ea6d3 100644
>--- a/include/asm-generic/vmlinux.lds.h
>+++ b/include/asm-generic/vmlinux.lds.h
>@@ -412,12 +412,10 @@
>  * during second ld run in second ld pass when generating System.map */
> #define TEXT_TEXT							\
> 		ALIGN_FUNCTION();					\
>-		*(.text.hot)						\
>-		*(.text .text.fixup)					\
>+		*(.text.hot .text .text.fixup .text.unlikely)		\
> 		*(.ref.text)						\
> 	MEM_KEEP(init.text)						\
> 	MEM_KEEP(exit.text)						\
>-		*(.text.unlikely)
>
>
> /* sched.text is aling to function alignment to secure we have same

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections
  2020-08-03 20:15       ` Arvind Sankar
  2020-08-04  1:19         ` Fāng-ruì Sòng
@ 2020-08-04  4:45         ` Andi Kleen
  2020-08-04  5:32           ` Fāng-ruì Sòng
  2020-08-04 16:06           ` Arvind Sankar
  1 sibling, 2 replies; 14+ messages in thread
From: Andi Kleen @ 2020-08-04  4:45 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Kees Cook, Thomas Gleixner, Will Deacon, Nick Desaulniers,
	Jian Cai, Fāng-ruì Sòng, Luis Lozano, Manoj Gupta,
	stable, Catalin Marinas, Mark Rutland, Ard Biesheuvel,
	Peter Collingbourne, James Morse, Borislav Petkov, Ingo Molnar,
	Russell King, Masahiro Yamada, Nathan Chancellor, Arnd Bergmann,
	x86, clang-built-linux, linux-arch, linux-efi, linux-arm-kernel,
	linux-kernel, Michal Marek

> Why is that? Both .text and .text.hot have alignment of 2^4 (default
> function alignment on x86) by default, so it doesn't seem like it should
> matter for packing density.  Avoiding interspersing cold text among

You may lose part of a cache line on each unit boundary. Linux has 
a lot of units, some of them small. All these bytes add up.

It's bad for TLB locality too. Sadly with all the fine grained protection
changes the 2MB coverage is eroding anyways, but this makes it even worse.

> regular/hot text seems like it should be a net win.

> 
> That old commit doesn't reference efficiency -- it says there was some
> problem with matching when they were separated out, but there were no
> wildcard section names back then.

It was about efficiency.

-Andi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections
  2020-08-04  4:45         ` Andi Kleen
@ 2020-08-04  5:32           ` Fāng-ruì Sòng
  2020-08-04 16:06           ` Arvind Sankar
  1 sibling, 0 replies; 14+ messages in thread
From: Fāng-ruì Sòng @ 2020-08-04  5:32 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arvind Sankar, Kees Cook, Thomas Gleixner, Will Deacon,
	Nick Desaulniers, Jian Cai, Luis Lozano, Manoj Gupta, stable,
	Catalin Marinas, Mark Rutland, Ard Biesheuvel,
	Peter Collingbourne, James Morse, Borislav Petkov, Ingo Molnar,
	Russell King, Masahiro Yamada, Nathan Chancellor, Arnd Bergmann,
	x86, clang-built-linux, linux-arch, linux-efi, linux-arm-kernel,
	linux-kernel, Michal Marek

On 2020-08-03, Andi Kleen wrote:
>> Why is that? Both .text and .text.hot have alignment of 2^4 (default
>> function alignment on x86) by default, so it doesn't seem like it should
>> matter for packing density.  Avoiding interspersing cold text among
>
>You may lose part of a cache line on each unit boundary. Linux has
>a lot of units, some of them small. All these bytes add up.
>
>It's bad for TLB locality too. Sadly with all the fine grained protection
>changes the 2MB coverage is eroding anyways, but this makes it even worse.

> Gives worse packing for the hot part
> if they are not aligned to 64byte boundaries, which they are usually
> not.

I do not see how the 64-byte argument is related to this patch.
If a function requires 64-byte alignment to be efficient, the compiler
should communicate this fact by setting the alignment of its containing
section to 64 bytes or above.

If a text section has a 16-byte alignment, the linker can reorder it to
an address which is a multiple of 16 but not a multiple of 64.

I agree with your other statement that having a single input section
description might be helpful. With more than one input section
descrition, the linker has to respect the ordering requirement. With
just one input section description, the linker has more freedom ordering
sections if profitable. For example, LLD performs two ordering
heuristics as my previous reply mentions.

It'd be good if someone can measure the benefit. Personally I don't
think this kind of ordering has significant benefit. (For
arm/aarch64/powerpc there might be some size benefit due to fewer range
extension thunks)

>> regular/hot text seems like it should be a net win.
>
>>
>> That old commit doesn't reference efficiency -- it says there was some
>> problem with matching when they were separated out, but there were no
>> wildcard section names back then.
>
>It was about efficiency.
>
>-Andi
>
>-- 
>You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
>To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@googlegroups.com.
>To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/20200804044532.GC1321588%40tassilo.jf.intel.com.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections
  2020-08-04  4:45         ` Andi Kleen
  2020-08-04  5:32           ` Fāng-ruì Sòng
@ 2020-08-04 16:06           ` Arvind Sankar
  2020-08-21 19:18             ` Kees Cook
  1 sibling, 1 reply; 14+ messages in thread
From: Arvind Sankar @ 2020-08-04 16:06 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arvind Sankar, Kees Cook, Thomas Gleixner, Will Deacon,
	Nick Desaulniers, Jian Cai, Fāng-ruì Sòng,
	Luis Lozano, Manoj Gupta, stable, Catalin Marinas, Mark Rutland,
	Ard Biesheuvel, Peter Collingbourne, James Morse,
	Borislav Petkov, Ingo Molnar, Russell King, Masahiro Yamada,
	Nathan Chancellor, Arnd Bergmann, x86, clang-built-linux,
	linux-arch, linux-efi, linux-arm-kernel, linux-kernel,
	Michal Marek

On Mon, Aug 03, 2020 at 09:45:32PM -0700, Andi Kleen wrote:
> > Why is that? Both .text and .text.hot have alignment of 2^4 (default
> > function alignment on x86) by default, so it doesn't seem like it should
> > matter for packing density.  Avoiding interspersing cold text among
> 
> You may lose part of a cache line on each unit boundary. Linux has 
> a lot of units, some of them small. All these bytes add up.

Separating out .text.unlikely, which isn't aligned, slightly _reduces_
this loss, but not by much -- just over 1K on a defconfig. More
importantly, it moves cold code out of line (~320k on a defconfig),
giving better code density for the hot code.

For .text and .text.hot, you lose the alignment padding on every
function boundary, not unit boundary, because of the 16-byte alignment.
Whether .text.hot and .text are arranged by translation unit or not
makes no difference.

With *(.text.hot) *(.text) you get HHTT, with *(.text.hot .text) you get
HTHT, but in both cases the individual chunks are already aligned to 16
bytes. If .text.hot _had_ different alignment requirements to .text, the
HHTT should actually give better packing in general, I think.

> 
> It's bad for TLB locality too. Sadly with all the fine grained protection
> changes the 2MB coverage is eroding anyways, but this makes it even worse.
> 

Yes, that could be true for .text.hot, depending on whether the hot
functions are called from all over the kernel (in which case putting
them together ought to be better) or mostly from regular text within the
unit in which they appeared (in which case it would be better together
with that code).

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections
  2020-07-31 23:07 ` [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections Kees Cook
  2020-08-01  3:51   ` Arvind Sankar
@ 2020-08-06  1:24   ` Sasha Levin
  2020-08-06  1:28     ` Nick Desaulniers
  2020-08-19 23:56   ` Sasha Levin
  2 siblings, 1 reply; 14+ messages in thread
From: Sasha Levin @ 2020-08-06  1:24 UTC (permalink / raw)
  To: Sasha Levin, Kees Cook, Nick Desaulniers, Thomas Gleixner, Will Deacon
  Cc: Kees Cook, stable, stable

Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.7.11, v5.4.54, v4.19.135, v4.14.190, v4.9.231, v4.4.231.

v5.7.11: Failed to apply! Possible dependencies:
    655389666643 ("vmlinux.lds.h: Create section for protection against instrumentation")

v5.4.54: Failed to apply! Possible dependencies:
    441110a547f8 ("vmlinux.lds.h: Provide EMIT_PT_NOTE to indicate export of .notes")
    6434efbd9aef ("s390: Move RO_DATA into "text" PT_LOAD Program Header")
    655389666643 ("vmlinux.lds.h: Create section for protection against instrumentation")
    6fc4000656a1 ("powerpc: Remove PT_NOTE workaround")
    7a42d41d9dc2 ("x86/vmlinux: Restore "text" Program Header with dummy section")
    eaf937075c9a ("vmlinux.lds.h: Move NOTES into RO_DATA")
    fbe6a8e618a2 ("vmlinux.lds.h: Move Program Header restoration into NOTES macro")

v4.19.135: Failed to apply! Possible dependencies:
    15426ca43d88 ("s390: rescue initrd as early as possible")
    369f91c37451 ("s390/decompressor: rework uncompressed image info collection")
    3bad719b4954 ("powerpc/prom_init: Make of_workarounds static")
    441110a547f8 ("vmlinux.lds.h: Provide EMIT_PT_NOTE to indicate export of .notes")
    4e62d4588500 ("s390: clean up stacks setup")
    5f69e38885c3 ("powerpc/prom_init: Move __prombss to it's own section and store it in .bss")
    655389666643 ("vmlinux.lds.h: Create section for protection against instrumentation")
    67361cf80712 ("powerpc/ftrace: Handle large kernel configs")
    8f75582a2fb6 ("s390: remove decompressor's head.S")
    d1b52a4388ff ("s390: introduce .boot.data section")
    e63334e556d9 ("powerpc/prom_init: Replace __initdata with __prombss when applicable")
    eaf937075c9a ("vmlinux.lds.h: Move NOTES into RO_DATA")
    fbe6a8e618a2 ("vmlinux.lds.h: Move Program Header restoration into NOTES macro")

v4.14.190: Failed to apply! Possible dependencies:
    01417c6cc7dc ("powerpc/64: Change soft_enabled from flag to bitmask")
    0b63acf4a0eb ("powerpc/64: Move set_soft_enabled() and rename")
    1696d0fb7fcd ("powerpc/64: Set DSCR default initially from SPR")
    4e26bc4a4ed6 ("powerpc/64: Rename soft_enabled to irq_soft_mask")
    5080332c2c89 ("powerpc/64s: Add workaround for P9 vector CI load issue")
    5633e85b2c31 ("powerpc64: Add .opd based function descriptor dereference")
    655389666643 ("vmlinux.lds.h: Create section for protection against instrumentation")
    67361cf80712 ("powerpc/ftrace: Handle large kernel configs")
    9f83e00f4cc1 ("powerpc/64: Improve inline asm in arch_local_irq_disable")
    ae30cc05bed2 ("powerpc64/ftrace: Implement support for ftrace_regs_caller()")
    b5c1bd62c054 ("powerpc/64: Fix arch_local_irq_disable() prototype")
    c2e480ba8227 ("powerpc/64: Add #defines for paca->soft_enabled flags")
    ea678ac627e0 ("powerpc64/ftrace: Add a field in paca to disable ftrace in unsafe code paths")
    ff967900c9d4 ("powerpc/64: Fix latency tracing for lazy irq replay")

v4.9.231: Failed to apply! Possible dependencies:
    096ff2ddba83 ("powerpc/ftrace/64: Split further based on -mprofile-kernel")
    2f59be5b970b ("powerpc/ftrace: Restore LR from pt_regs")
    454656155110 ("powerpc/asm: Use OFFSET macro in asm-offsets.c")
    5633e85b2c31 ("powerpc64: Add .opd based function descriptor dereference")
    655389666643 ("vmlinux.lds.h: Create section for protection against instrumentation")
    67361cf80712 ("powerpc/ftrace: Handle large kernel configs")
    700e64377c2c ("powerpc/ftrace: Move stack setup and teardown code into ftrace_graph_caller()")
    7853f9c029ac ("powerpc: Split ftrace bits into a separate file")
    99ad503287da ("powerpc: Add a prototype for mcount() so it can be versioned")
    a97a65d53d9f ("KVM: PPC: Book3S: 64-bit CONFIG_RELOCATABLE support for interrupts")
    ae30cc05bed2 ("powerpc64/ftrace: Implement support for ftrace_regs_caller()")
    b3a7864c6feb ("powerpc/ftrace: Add prototype for prepare_ftrace_return()")
    c02e0349d7e9 ("powerpc/ftrace: Fix the comments for ftrace_modify_code")
    c4f3b52ce7b1 ("powerpc/64s: Disallow system reset vs system reset reentrancy")
    d3918e7fd4a2 ("KVM: PPC: Book3S: Change interrupt call to reduce scratch space use on HV")
    ea678ac627e0 ("powerpc64/ftrace: Add a field in paca to disable ftrace in unsafe code paths")

v4.4.231: Failed to apply! Possible dependencies:
    0f4c4af06eec ("kbuild: -ffunction-sections fix for archs with conflicting sections")
    2aedcd098a94 ("kbuild: suppress annoying "... is up to date." message")
    9895c03d4811 ("kbuild: record needed exported symbols for modules")
    a5967db9af51 ("kbuild: allow architectures to use thin archives instead of ld -r")
    b67067f1176d ("kbuild: allow archs to select link dead code/data elimination")
    b9ab5ebb14ec ("objtool: Add CONFIG_STACK_VALIDATION option")
    c1a95fda2a40 ("kbuild: add fine grained build dependencies for exported symbols")
    cb87481ee89d ("kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured")
    cf4f21938e13 ("kbuild: Allow to specify composite modules with modname-m")
    e4aca4595005 ("kbuild: de-duplicate fixdep usage")
    f235541699bc ("export.h: allow for per-symbol configurable EXPORT_SYMBOL()")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

-- 
Thanks
Sasha

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections
  2020-08-06  1:24   ` Sasha Levin
@ 2020-08-06  1:28     ` Nick Desaulniers
  0 siblings, 0 replies; 14+ messages in thread
From: Nick Desaulniers @ 2020-08-06  1:28 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Kees Cook, Thomas Gleixner, Will Deacon, # 3.4.x

Thanks for the report. We can provide manual backports then.  Our
prodkernel and CrOS teams both hit this issue within the past month.

On Wed, Aug 5, 2020 at 6:24 PM Sasha Levin <sashal@kernel.org> wrote:
>
> Hi
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v5.7.11, v5.4.54, v4.19.135, v4.14.190, v4.9.231, v4.4.231.
>
> v5.7.11: Failed to apply! Possible dependencies:
>     655389666643 ("vmlinux.lds.h: Create section for protection against instrumentation")
>
> v5.4.54: Failed to apply! Possible dependencies:
>     441110a547f8 ("vmlinux.lds.h: Provide EMIT_PT_NOTE to indicate export of .notes")
>     6434efbd9aef ("s390: Move RO_DATA into "text" PT_LOAD Program Header")
>     655389666643 ("vmlinux.lds.h: Create section for protection against instrumentation")
>     6fc4000656a1 ("powerpc: Remove PT_NOTE workaround")
>     7a42d41d9dc2 ("x86/vmlinux: Restore "text" Program Header with dummy section")
>     eaf937075c9a ("vmlinux.lds.h: Move NOTES into RO_DATA")
>     fbe6a8e618a2 ("vmlinux.lds.h: Move Program Header restoration into NOTES macro")
>
> v4.19.135: Failed to apply! Possible dependencies:
>     15426ca43d88 ("s390: rescue initrd as early as possible")
>     369f91c37451 ("s390/decompressor: rework uncompressed image info collection")
>     3bad719b4954 ("powerpc/prom_init: Make of_workarounds static")
>     441110a547f8 ("vmlinux.lds.h: Provide EMIT_PT_NOTE to indicate export of .notes")
>     4e62d4588500 ("s390: clean up stacks setup")
>     5f69e38885c3 ("powerpc/prom_init: Move __prombss to it's own section and store it in .bss")
>     655389666643 ("vmlinux.lds.h: Create section for protection against instrumentation")
>     67361cf80712 ("powerpc/ftrace: Handle large kernel configs")
>     8f75582a2fb6 ("s390: remove decompressor's head.S")
>     d1b52a4388ff ("s390: introduce .boot.data section")
>     e63334e556d9 ("powerpc/prom_init: Replace __initdata with __prombss when applicable")
>     eaf937075c9a ("vmlinux.lds.h: Move NOTES into RO_DATA")
>     fbe6a8e618a2 ("vmlinux.lds.h: Move Program Header restoration into NOTES macro")
>
> v4.14.190: Failed to apply! Possible dependencies:
>     01417c6cc7dc ("powerpc/64: Change soft_enabled from flag to bitmask")
>     0b63acf4a0eb ("powerpc/64: Move set_soft_enabled() and rename")
>     1696d0fb7fcd ("powerpc/64: Set DSCR default initially from SPR")
>     4e26bc4a4ed6 ("powerpc/64: Rename soft_enabled to irq_soft_mask")
>     5080332c2c89 ("powerpc/64s: Add workaround for P9 vector CI load issue")
>     5633e85b2c31 ("powerpc64: Add .opd based function descriptor dereference")
>     655389666643 ("vmlinux.lds.h: Create section for protection against instrumentation")
>     67361cf80712 ("powerpc/ftrace: Handle large kernel configs")
>     9f83e00f4cc1 ("powerpc/64: Improve inline asm in arch_local_irq_disable")
>     ae30cc05bed2 ("powerpc64/ftrace: Implement support for ftrace_regs_caller()")
>     b5c1bd62c054 ("powerpc/64: Fix arch_local_irq_disable() prototype")
>     c2e480ba8227 ("powerpc/64: Add #defines for paca->soft_enabled flags")
>     ea678ac627e0 ("powerpc64/ftrace: Add a field in paca to disable ftrace in unsafe code paths")
>     ff967900c9d4 ("powerpc/64: Fix latency tracing for lazy irq replay")
>
> v4.9.231: Failed to apply! Possible dependencies:
>     096ff2ddba83 ("powerpc/ftrace/64: Split further based on -mprofile-kernel")
>     2f59be5b970b ("powerpc/ftrace: Restore LR from pt_regs")
>     454656155110 ("powerpc/asm: Use OFFSET macro in asm-offsets.c")
>     5633e85b2c31 ("powerpc64: Add .opd based function descriptor dereference")
>     655389666643 ("vmlinux.lds.h: Create section for protection against instrumentation")
>     67361cf80712 ("powerpc/ftrace: Handle large kernel configs")
>     700e64377c2c ("powerpc/ftrace: Move stack setup and teardown code into ftrace_graph_caller()")
>     7853f9c029ac ("powerpc: Split ftrace bits into a separate file")
>     99ad503287da ("powerpc: Add a prototype for mcount() so it can be versioned")
>     a97a65d53d9f ("KVM: PPC: Book3S: 64-bit CONFIG_RELOCATABLE support for interrupts")
>     ae30cc05bed2 ("powerpc64/ftrace: Implement support for ftrace_regs_caller()")
>     b3a7864c6feb ("powerpc/ftrace: Add prototype for prepare_ftrace_return()")
>     c02e0349d7e9 ("powerpc/ftrace: Fix the comments for ftrace_modify_code")
>     c4f3b52ce7b1 ("powerpc/64s: Disallow system reset vs system reset reentrancy")
>     d3918e7fd4a2 ("KVM: PPC: Book3S: Change interrupt call to reduce scratch space use on HV")
>     ea678ac627e0 ("powerpc64/ftrace: Add a field in paca to disable ftrace in unsafe code paths")
>
> v4.4.231: Failed to apply! Possible dependencies:
>     0f4c4af06eec ("kbuild: -ffunction-sections fix for archs with conflicting sections")
>     2aedcd098a94 ("kbuild: suppress annoying "... is up to date." message")
>     9895c03d4811 ("kbuild: record needed exported symbols for modules")
>     a5967db9af51 ("kbuild: allow architectures to use thin archives instead of ld -r")
>     b67067f1176d ("kbuild: allow archs to select link dead code/data elimination")
>     b9ab5ebb14ec ("objtool: Add CONFIG_STACK_VALIDATION option")
>     c1a95fda2a40 ("kbuild: add fine grained build dependencies for exported symbols")
>     cb87481ee89d ("kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured")
>     cf4f21938e13 ("kbuild: Allow to specify composite modules with modname-m")
>     e4aca4595005 ("kbuild: de-duplicate fixdep usage")
>     f235541699bc ("export.h: allow for per-symbol configurable EXPORT_SYMBOL()")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
>
> --
> Thanks
> Sasha



-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections
  2020-07-31 23:07 ` [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections Kees Cook
  2020-08-01  3:51   ` Arvind Sankar
  2020-08-06  1:24   ` Sasha Levin
@ 2020-08-19 23:56   ` Sasha Levin
  2 siblings, 0 replies; 14+ messages in thread
From: Sasha Levin @ 2020-08-19 23:56 UTC (permalink / raw)
  To: Sasha Levin, Kees Cook, Nick Desaulniers, Thomas Gleixner, Will Deacon
  Cc: Kees Cook, stable, stable

Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.8.1, v5.7.15, v5.4.58, v4.19.139, v4.14.193, v4.9.232, v4.4.232.

v5.8.1: Build OK!
v5.7.15: Failed to apply! Possible dependencies:
    655389666643 ("vmlinux.lds.h: Create section for protection against instrumentation")

v5.4.58: Failed to apply! Possible dependencies:
    441110a547f8 ("vmlinux.lds.h: Provide EMIT_PT_NOTE to indicate export of .notes")
    6434efbd9aef ("s390: Move RO_DATA into "text" PT_LOAD Program Header")
    655389666643 ("vmlinux.lds.h: Create section for protection against instrumentation")
    6fc4000656a1 ("powerpc: Remove PT_NOTE workaround")
    7a42d41d9dc2 ("x86/vmlinux: Restore "text" Program Header with dummy section")
    eaf937075c9a ("vmlinux.lds.h: Move NOTES into RO_DATA")
    fbe6a8e618a2 ("vmlinux.lds.h: Move Program Header restoration into NOTES macro")

v4.19.139: Failed to apply! Possible dependencies:
    15426ca43d88 ("s390: rescue initrd as early as possible")
    369f91c37451 ("s390/decompressor: rework uncompressed image info collection")
    3bad719b4954 ("powerpc/prom_init: Make of_workarounds static")
    441110a547f8 ("vmlinux.lds.h: Provide EMIT_PT_NOTE to indicate export of .notes")
    4e62d4588500 ("s390: clean up stacks setup")
    5f69e38885c3 ("powerpc/prom_init: Move __prombss to it's own section and store it in .bss")
    655389666643 ("vmlinux.lds.h: Create section for protection against instrumentation")
    67361cf80712 ("powerpc/ftrace: Handle large kernel configs")
    8f75582a2fb6 ("s390: remove decompressor's head.S")
    d1b52a4388ff ("s390: introduce .boot.data section")
    e63334e556d9 ("powerpc/prom_init: Replace __initdata with __prombss when applicable")
    eaf937075c9a ("vmlinux.lds.h: Move NOTES into RO_DATA")
    fbe6a8e618a2 ("vmlinux.lds.h: Move Program Header restoration into NOTES macro")

v4.14.193: Failed to apply! Possible dependencies:
    01417c6cc7dc ("powerpc/64: Change soft_enabled from flag to bitmask")
    0b63acf4a0eb ("powerpc/64: Move set_soft_enabled() and rename")
    1696d0fb7fcd ("powerpc/64: Set DSCR default initially from SPR")
    4e26bc4a4ed6 ("powerpc/64: Rename soft_enabled to irq_soft_mask")
    5080332c2c89 ("powerpc/64s: Add workaround for P9 vector CI load issue")
    5633e85b2c31 ("powerpc64: Add .opd based function descriptor dereference")
    655389666643 ("vmlinux.lds.h: Create section for protection against instrumentation")
    67361cf80712 ("powerpc/ftrace: Handle large kernel configs")
    9f83e00f4cc1 ("powerpc/64: Improve inline asm in arch_local_irq_disable")
    ae30cc05bed2 ("powerpc64/ftrace: Implement support for ftrace_regs_caller()")
    b5c1bd62c054 ("powerpc/64: Fix arch_local_irq_disable() prototype")
    c2e480ba8227 ("powerpc/64: Add #defines for paca->soft_enabled flags")
    ea678ac627e0 ("powerpc64/ftrace: Add a field in paca to disable ftrace in unsafe code paths")
    ff967900c9d4 ("powerpc/64: Fix latency tracing for lazy irq replay")

v4.9.232: Failed to apply! Possible dependencies:
    096ff2ddba83 ("powerpc/ftrace/64: Split further based on -mprofile-kernel")
    2f59be5b970b ("powerpc/ftrace: Restore LR from pt_regs")
    454656155110 ("powerpc/asm: Use OFFSET macro in asm-offsets.c")
    5633e85b2c31 ("powerpc64: Add .opd based function descriptor dereference")
    655389666643 ("vmlinux.lds.h: Create section for protection against instrumentation")
    67361cf80712 ("powerpc/ftrace: Handle large kernel configs")
    700e64377c2c ("powerpc/ftrace: Move stack setup and teardown code into ftrace_graph_caller()")
    7853f9c029ac ("powerpc: Split ftrace bits into a separate file")
    99ad503287da ("powerpc: Add a prototype for mcount() so it can be versioned")
    a97a65d53d9f ("KVM: PPC: Book3S: 64-bit CONFIG_RELOCATABLE support for interrupts")
    ae30cc05bed2 ("powerpc64/ftrace: Implement support for ftrace_regs_caller()")
    b3a7864c6feb ("powerpc/ftrace: Add prototype for prepare_ftrace_return()")
    c02e0349d7e9 ("powerpc/ftrace: Fix the comments for ftrace_modify_code")
    c4f3b52ce7b1 ("powerpc/64s: Disallow system reset vs system reset reentrancy")
    d3918e7fd4a2 ("KVM: PPC: Book3S: Change interrupt call to reduce scratch space use on HV")
    ea678ac627e0 ("powerpc64/ftrace: Add a field in paca to disable ftrace in unsafe code paths")

v4.4.232: Failed to apply! Possible dependencies:
    0f4c4af06eec ("kbuild: -ffunction-sections fix for archs with conflicting sections")
    2aedcd098a94 ("kbuild: suppress annoying "... is up to date." message")
    9895c03d4811 ("kbuild: record needed exported symbols for modules")
    a5967db9af51 ("kbuild: allow architectures to use thin archives instead of ld -r")
    b67067f1176d ("kbuild: allow archs to select link dead code/data elimination")
    b9ab5ebb14ec ("objtool: Add CONFIG_STACK_VALIDATION option")
    c1a95fda2a40 ("kbuild: add fine grained build dependencies for exported symbols")
    cb87481ee89d ("kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured")
    cf4f21938e13 ("kbuild: Allow to specify composite modules with modname-m")
    e4aca4595005 ("kbuild: de-duplicate fixdep usage")
    f235541699bc ("export.h: allow for per-symbol configurable EXPORT_SYMBOL()")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

-- 
Thanks
Sasha

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections
  2020-08-04 16:06           ` Arvind Sankar
@ 2020-08-21 19:18             ` Kees Cook
  0 siblings, 0 replies; 14+ messages in thread
From: Kees Cook @ 2020-08-21 19:18 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Andi Kleen, Thomas Gleixner, Will Deacon, Nick Desaulniers,
	Jian Cai, Fāng-ruì Sòng, Luis Lozano, Manoj Gupta,
	stable, Catalin Marinas, Mark Rutland, Ard Biesheuvel,
	Peter Collingbourne, James Morse, Borislav Petkov, Ingo Molnar,
	Russell King, Masahiro Yamada, Nathan Chancellor, Arnd Bergmann,
	x86, clang-built-linux, linux-arch, linux-efi, linux-arm-kernel,
	linux-kernel, Michal Marek

On Tue, Aug 04, 2020 at 12:06:49PM -0400, Arvind Sankar wrote:
> On Mon, Aug 03, 2020 at 09:45:32PM -0700, Andi Kleen wrote:
> > > Why is that? Both .text and .text.hot have alignment of 2^4 (default
> > > function alignment on x86) by default, so it doesn't seem like it should
> > > matter for packing density.  Avoiding interspersing cold text among
> > 
> > You may lose part of a cache line on each unit boundary. Linux has 
> > a lot of units, some of them small. All these bytes add up.
> 
> Separating out .text.unlikely, which isn't aligned, slightly _reduces_
> this loss, but not by much -- just over 1K on a defconfig. More
> importantly, it moves cold code out of line (~320k on a defconfig),
> giving better code density for the hot code.
> 
> For .text and .text.hot, you lose the alignment padding on every
> function boundary, not unit boundary, because of the 16-byte alignment.
> Whether .text.hot and .text are arranged by translation unit or not
> makes no difference.
> 
> With *(.text.hot) *(.text) you get HHTT, with *(.text.hot .text) you get
> HTHT, but in both cases the individual chunks are already aligned to 16
> bytes. If .text.hot _had_ different alignment requirements to .text, the
> HHTT should actually give better packing in general, I think.

Okay, so at the end of the conversation, I think it looks like this
patch is correct: it collects the hot, unlikely, etc into their own
areas (e.g. HHTTUU is more correct than HTUHTU), so this patch stands
as-is.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-08-21 19:18 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200731230820.1742553-1-keescook@chromium.org>
2020-07-31 23:07 ` [PATCH v5 13/36] vmlinux.lds.h: add PGO and AutoFDO input sections Kees Cook
2020-08-01  3:51   ` Arvind Sankar
2020-08-01  6:18     ` Kees Cook
2020-08-01 17:27       ` Arvind Sankar
2020-08-03 19:05     ` Andi Kleen
2020-08-03 20:15       ` Arvind Sankar
2020-08-04  1:19         ` Fāng-ruì Sòng
2020-08-04  4:45         ` Andi Kleen
2020-08-04  5:32           ` Fāng-ruì Sòng
2020-08-04 16:06           ` Arvind Sankar
2020-08-21 19:18             ` Kees Cook
2020-08-06  1:24   ` Sasha Levin
2020-08-06  1:28     ` Nick Desaulniers
2020-08-19 23:56   ` Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).