linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86: mark some mpspec inline functions as __init
@ 2021-02-25 11:22 Arnd Bergmann
  2021-02-25 11:45 ` Borislav Petkov
  0 siblings, 1 reply; 13+ messages in thread
From: Arnd Bergmann @ 2021-02-25 11:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Nathan Chancellor, Nick Desaulniers
  Cc: Arnd Bergmann, H. Peter Anvin, linux-kernel, clang-built-linux

From: Arnd Bergmann <arnd@arndb.de>

clang-13 sometimes decides to not inline early_get_smp_config(),
which leads to a link-time warning:

WARNING: modpost: vmlinux.o(.text+0x838cc): Section mismatch in reference from the function early_get_smp_config() to the variable .init.data:x86_init
The function early_get_smp_config() references
the variable __initdata x86_init.
This is often because early_get_smp_config lacks a __initdata
annotation or the annotation of x86_init is wrong.

There are two other functions which may run into the same issue,
so mark all three as __init.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/x86/include/asm/mpspec.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index e90ac7e9ae2c..b41066dbf5c2 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -49,17 +49,17 @@ extern int smp_found_config;
 # define smp_found_config 0
 #endif
 
-static inline void get_smp_config(void)
+static inline __init void get_smp_config(void)
 {
 	x86_init.mpparse.get_smp_config(0);
 }
 
-static inline void early_get_smp_config(void)
+static inline __init void early_get_smp_config(void)
 {
 	x86_init.mpparse.get_smp_config(1);
 }
 
-static inline void find_smp_config(void)
+static inline __init void find_smp_config(void)
 {
 	x86_init.mpparse.find_smp_config();
 }
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: mark some mpspec inline functions as __init
  2021-02-25 11:22 [PATCH] x86: mark some mpspec inline functions as __init Arnd Bergmann
@ 2021-02-25 11:45 ` Borislav Petkov
  2021-02-25 12:18   ` Arnd Bergmann
  0 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2021-02-25 11:45 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Thomas Gleixner, Ingo Molnar, x86, Nathan Chancellor,
	Nick Desaulniers, Arnd Bergmann, H. Peter Anvin, linux-kernel,
	clang-built-linux

On Thu, Feb 25, 2021 at 12:22:41PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> clang-13 sometimes decides to not inline early_get_smp_config(),
> which leads to a link-time warning:
> 
> WARNING: modpost: vmlinux.o(.text+0x838cc): Section mismatch in reference from the function early_get_smp_config() to the variable .init.data:x86_init
> The function early_get_smp_config() references
> the variable __initdata x86_init.
> This is often because early_get_smp_config lacks a __initdata
> annotation or the annotation of x86_init is wrong.
> 
> There are two other functions which may run into the same issue,
> so mark all three as __init.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  arch/x86/include/asm/mpspec.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
> index e90ac7e9ae2c..b41066dbf5c2 100644
> --- a/arch/x86/include/asm/mpspec.h
> +++ b/arch/x86/include/asm/mpspec.h
> @@ -49,17 +49,17 @@ extern int smp_found_config;
>  # define smp_found_config 0
>  #endif
>  
> -static inline void get_smp_config(void)
> +static inline __init void get_smp_config(void)

__always_inline then I guess.

Not inlining those is just silly.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: mark some mpspec inline functions as __init
  2021-02-25 11:45 ` Borislav Petkov
@ 2021-02-25 12:18   ` Arnd Bergmann
  2021-02-25 12:42     ` Borislav Petkov
  0 siblings, 1 reply; 13+ messages in thread
From: Arnd Bergmann @ 2021-02-25 12:18 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, the arch/x86 maintainers,
	Nathan Chancellor, Nick Desaulniers, Arnd Bergmann,
	H. Peter Anvin, linux-kernel, clang-built-linux

On Thu, Feb 25, 2021 at 12:45 PM Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Feb 25, 2021 at 12:22:41PM +0100, Arnd Bergmann wrote:

> > -static inline void get_smp_config(void)
> > +static inline __init void get_smp_config(void)
>
> __always_inline then I guess.
>
> Not inlining those is just silly.

Either way works correctly, I don't care much, but picked the __init
annotation as it seemed more intuitive. If the compiler decides to
make it out-of-line for whatever reason, I see no point in telling it
otherwise, even though I agree it is a bit silly.

Should I send the patch with __always_inline?

      Arnd

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: mark some mpspec inline functions as __init
  2021-02-25 12:18   ` Arnd Bergmann
@ 2021-02-25 12:42     ` Borislav Petkov
  2021-02-25 13:20       ` Arnd Bergmann
  0 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2021-02-25 12:42 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Thomas Gleixner, Ingo Molnar, the arch/x86 maintainers,
	Nathan Chancellor, Nick Desaulniers, Arnd Bergmann,
	H. Peter Anvin, linux-kernel, clang-built-linux

On Thu, Feb 25, 2021 at 01:18:21PM +0100, Arnd Bergmann wrote:
> Either way works correctly, I don't care much, but picked the __init
> annotation as it seemed more intuitive. If the compiler decides to
> make it out-of-line for whatever reason,

Well, frankly, I see no good reason for not inlining a function body
which is a single call. And gcc does it just fine. And previous clangs
did too, so why does clang-13 do it differently?

IOW, could it be that you're fixing something that ain't broke?

> I see no point in telling it otherwise, even though I agree it is a
> bit silly.
>
> Should I send the patch with __always_inline?

I guess.

Although from where I'm standing, it looks like clang-13 needs fixing.
But I surely don't know the whole story and "inline" is not forcing
the inlining so I guess a compiler is free to do what it wants here.
Apparently.

And I guess telling it that those should be always inlined makes it
perfectly clear then.

But WTH do I know...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: mark some mpspec inline functions as __init
  2021-02-25 12:42     ` Borislav Petkov
@ 2021-02-25 13:20       ` Arnd Bergmann
  2021-02-25 20:31         ` Nick Desaulniers
  0 siblings, 1 reply; 13+ messages in thread
From: Arnd Bergmann @ 2021-02-25 13:20 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, the arch/x86 maintainers,
	Nathan Chancellor, Nick Desaulniers, Arnd Bergmann,
	H. Peter Anvin, linux-kernel, clang-built-linux

On Thu, Feb 25, 2021 at 1:42 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Feb 25, 2021 at 01:18:21PM +0100, Arnd Bergmann wrote:
> > Either way works correctly, I don't care much, but picked the __init
> > annotation as it seemed more intuitive. If the compiler decides to
> > make it out-of-line for whatever reason,
>
> Well, frankly, I see no good reason for not inlining a function body
> which is a single call. And gcc does it just fine. And previous clangs
> did too, so why does clang-13 do it differently?
>
> IOW, could it be that you're fixing something that ain't broke?

Maybe Nick can give some more background here. He mentioned
elsewhere that inlining in clang-13 was completely rewritten and is generally
better than it was before. I'm not sure whether this particular instance
is a case where clang could in fact show an advantage in not inlining
a function, or if this is one case where it got worse and needs to be
tuned better.

In the end, inlining is a bunch of heuristics that are intended help
most of the time, but both (old) clang and gcc get it wrong in cases that
should have been decided the other way. Here we get a new method
that may go wrong in other ways.

        Arnd

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: mark some mpspec inline functions as __init
  2021-02-25 13:20       ` Arnd Bergmann
@ 2021-02-25 20:31         ` Nick Desaulniers
  2021-02-25 21:33           ` Borislav Petkov
  2021-02-25 22:54           ` Peter Zijlstra
  0 siblings, 2 replies; 13+ messages in thread
From: Nick Desaulniers @ 2021-02-25 20:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, the arch/x86 maintainers,
	Nathan Chancellor, Arnd Bergmann, H. Peter Anvin, linux-kernel,
	clang-built-linux, Arnd Bergmann, Arthur Eubanks,
	Chandler Carruth

On Thu, Feb 25, 2021 at 5:20 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Thu, Feb 25, 2021 at 1:42 PM Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Thu, Feb 25, 2021 at 01:18:21PM +0100, Arnd Bergmann wrote:
> > > Either way works correctly, I don't care much, but picked the __init
> > > annotation as it seemed more intuitive. If the compiler decides to
> > > make it out-of-line for whatever reason,
> >
> > Well, frankly, I see no good reason for not inlining a function body
> > which is a single call. And gcc does it just fine. And previous clangs
> > did too, so why does clang-13 do it differently?
> >
> > IOW, could it be that you're fixing something that ain't broke?

It's a reasonable question to ask: "why didn't this get inlined?"
which is worthwhile to revisit on occasion.

I'll post a more generic answer than for this specific case since
there's a bit of fallout from the recent changes, and I plan to
explain this in one place then share lore links on all of the other
threads.

Q: What changed?
A: Large changes to LLVM's pass management have finally been enabled
by default in ToT LLVM (what will ship as clang-13). Specifically,
LLVM's "new pass manager" has been made the default after probably at
least 7 years worth of effort. (The previous system is known as the
"legacy pass manager.") https://bugs.llvm.org/show_bug.cgi?id=46649.
With this change, many heuristics related to inlining have changed.
Cost models have changed, thresholds have been adjusted.  I suspect
there's changes in what granularity of IR gets optimized, but I need
to do more research here.

Next on my reading list, for anyone who wants to learn more is:
"Passes in LLVM": https://llvm.org/devmtg/2014-04/PDFs/Talks/Passes.pdf
https://www.youtube.com/watch?v=rY02LT08-J8
"The LLVM Pass Manager, Part 2"
https://www.youtube.com/watch?v=6NDQl544yEg
https://llvm.org/devmtg/2014-10/Slides/Carruth-TheLLVMPassManagerPart2.pdf

Q: But it's just a single function call, what gives?
A: Imagine you have the following:

void baz(void);
void bar(void) { baz(); ... }
void foo(void) { bar(); ... }

So foo() calls bar() calls baz() (and tail calls don't apply).  The
question being asked in this specific case (and a few other threads)
is "why isn't the call to bar() being inlined into foo()?"  Again,
totally fair question to ask.  (In this analogy, bar() is
get_smp_config()/early_get_smp_config(), foo() is their caller(s,
plural potentially)).

The answer lies in the direction that inlining occurs.  If you're
doing "top down" inlining, then when looking at the edge between foo()
and bar(), inlining looks totally reasonable.  Inline substitution is
performed.  Then you get to the edge that existed between bar() and
baz() and decide perhaps that baz() is too big, has too many callers,
etc. and don't inline baz() into foo().

But if you're doing "bottom up" inlining, then you start by analyzing
the edge between baz() and bar(), perhaps you decide to inline baz()
into bar(), but now the size of bar() is just over the threshold to
inline into foo(), or there's too many callers of bar() to inline into
every caller without excessive code bloat, or trips the threshold for
any number of concerns that go into the inlining cost model.  These
cost models are insanely complex (and don't fully generalize), because
you need to distill a great deal many inputs into a single yes/no
signal: "should I inline?"

LLVM's inliner is "bottom up":
https://www.youtube.com/watch?v=FnGCDLhaxKU&start=1650
This is different than GCC with is "top down":
https://www.youtube.com/embed/FnGCDLhaxKU?start=2301&end=2335&autoplay=1
(That video is 6 years old at this point; I have never looked at the
internals of GCC, so I don't personally know whether that is still the
case, FWIW)

Q: Why doesn't LLVM just do what GCC does?
A: The above video should help a little, but this is something not
mandated by the standard.  There are tradeoffs to either, and only
local optima, not general solutions.
(https://www.youtube.com/watch?v=FnGCDLhaxKU&start=1563 is still my
favorite reference that hints at some of the tradeoffs).

Q: But I put the `inline` keyword on the callee?
A: Probably deserves its own post, but the `inline` keyword doesn't
mean what any rational initial impression would suppose. Language in
the standard refers to "inline substitution" and grants a lot of
leeway to implementations in terms of whether it's performed at all.
There are cases where even with "always_inline" fn attr is applied,
and the compiler says "that's nice, but I still cannot perform inline
substitution here, I'm sorry."  Playing with the inlining heuristics
is always difficult, because for each improvement, code that has
"ossified" around how inlining was previously done may experience
unexpected changes in optimizations performed (see also "Q: What
changed?" above).
https://www.youtube.com/watch?v=FnGCDLhaxKU&start=1563

Q: Should I put "always_inline" fn attr everywhere?
A: No! always_inline can still fail in edge cases, leading to inlining
occurring before most optimizations so the same code gets repeatedly
optimized the same way just in different functions (this can really
hurt compile times).  I'm not saying to completely avoid
always_inline, it's good and has useful cases, but it's not a silver
bullet.  There are no silver bullets here.  It's semantics have
changed since its introduction, and I have seen rare uses that make my
skin crawl.

Q: Wouldn't it be nice if we could express a desire to inline from
individual call sites, rather than on the callee for all call sites?
A: Yes; this can be expressed in LLVM IR, but has no analog AFAIK in C or C++.

Q: But what about my specific case?
A: Without the configuration and compiler version provided, I can't
tell you for certain (and if I stopped to look at every case, I
probably wouldn't get any work done myself).  But what I would do if I
wanted to learn more is:
1. Isolate the kernel config that tickles this.  This is pretty
critical for anyone to reproduce kernel issues found via randconfig
builds.
2. Isolate the compiler invocation from the specific build system for
the affected TU. For the kernel, that's `make ... V=1`.
3. Rebuild LLVM in debug mode; using Cmake that's
-DCMAKE_BUILD_TYPE=Debug. This gives you very powerful compiler
introspection features without necessarily needing to attach a
debugger. (I don't think this is needed for every pass, seems like I
can use Release to debug the inliner, below).  It's one of the main
things I really appreciate about LLVM.
4. Ask LLVM to print diagnostics about certain passes when they are
run. Using the the compiler invocation from 2, add `-mllvm
-debug-only=<pass name>`.  Yes, looking up <pass name> requires some
grepping of LLVM source code.  (Lots more general tips:
https://www.youtube.com/watch?v=y4b-sgp6VYA)  For the inliner:

$ cat foo.c
void baz(void);
void bar(void) { baz(); }
void foo(void) { bar(); }
$ clang -mllvm -debug-only=inline foo.c -O2 -S
Inlining calls in: foo
Inlining calls in: foo
    Inlining (cost=0, threshold=337), Call:   call void @bar()
Updated inlining SCC: (foo)

So LLVM is telling us bar() was inlined into foo(); (baz() can't be
because it wasn't defined in this TU).  You can use this to "watch"
the compiler make decisions about inlining.

(full thread: https://lore.kernel.org/lkml/20210225112247.2240389-1-arnd@kernel.org/)
I suspect in this specific case, "Interprocedural Sparse Conditional
Constant Propagation" sees the calls to the same fn with different
constants, propagates those down creating two specialized versions of
the callee (so they are distinct functions now), inlines those into
get_smp_config()/early_get_smp_config(), then there's too many callers
of those in a single TU where inlining would cause excessive code
bloat.

Either way, it's a deep topic and I'm always happy to learn more about
it to help answer questions.  Would make a fun blog post or LWN
article. Now if only this toolchain would stop catching fire so that I
had the time to write such a post...

>
> Maybe Nick can give some more background here. He mentioned
> elsewhere that inlining in clang-13 was completely rewritten and is generally
> better than it was before. I'm not sure whether this particular instance
> is a case where clang could in fact show an advantage in not inlining
> a function, or if this is one case where it got worse and needs to be
> tuned better.
>
> In the end, inlining is a bunch of heuristics that are intended help
> most of the time, but both (old) clang and gcc get it wrong in cases that
> should have been decided the other way. Here we get a new method
> that may go wrong in other ways.
>
>         Arnd



-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: mark some mpspec inline functions as __init
  2021-02-25 20:31         ` Nick Desaulniers
@ 2021-02-25 21:33           ` Borislav Petkov
  2021-02-25 21:58             ` Nick Desaulniers
  2021-02-25 22:54           ` Peter Zijlstra
  1 sibling, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2021-02-25 21:33 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Thomas Gleixner, Ingo Molnar, the arch/x86 maintainers,
	Nathan Chancellor, Arnd Bergmann, H. Peter Anvin, linux-kernel,
	clang-built-linux, Arnd Bergmann, Arthur Eubanks,
	Chandler Carruth

Hi Nick,

On Thu, Feb 25, 2021 at 12:31:33PM -0800, Nick Desaulniers wrote:
> So LLVM is telling us bar() was inlined into foo(); (baz() can't be
> because it wasn't defined in this TU).  You can use this to "watch"
> the compiler make decisions about inlining.

thanks for taking the time to write all this - it is very interesting
and reminds me that I simply won't have time in this life of mine to
learn about compiler inlining - that's a whole another universe. :-)

I hope you can use that text in a blog post too - it is an interesting
read.

> (full thread: https://lore.kernel.org/lkml/20210225112247.2240389-1-arnd@kernel.org/)
> I suspect in this specific case, "Interprocedural Sparse Conditional
> Constant Propagation" sees the calls to the same fn with different
> constants, propagates those down creating two specialized versions of
> the callee (so they are distinct functions now), inlines those into
> get_smp_config()/early_get_smp_config(), then there's too many callers
> of those in a single TU where inlining would cause excessive code
> bloat.

Well, there's exactly one caller of get_smp_config - that's setup_arch().
early_get_smp_config() gets called also exactly once in amd_numa_init().

Now, with my simplistic approach, I can replace the lines at those call
sites by hand with the

	x86_init.mpparse.get_smp_config(<arg>);

call. So those become exactly one function call. I still don't see how
that can be done any differently, frankly.

But apparently the cost model has decided that this is not inlineable.
Maybe because that function ptr is assigned at boot time and that
somehow gets the cost model to give it a very high (or low) value. Or
maybe because the wrappers are calling through a variable - the x86_init
thing - which is in a different section and that confuses the inliner.
Or whatever - totally speculating here.

And this brings me to my point - you can't expect people to do all that
crazy dance of compiler introspection and understand cost models and
compiler optimization just to fix stuff like that.

Now, imagine we "fix" this to clang-13's inliner's satisfaction. Now
imagine too that gcc Version Next changes their inliner and that inliner
says that that "fix" is wrong, for whatever reason, bottom up, top down,
whatever. Do you feel the annoyance all around?

And since, as you say, there are no silver bullets here, I think for
cases like that we'll need a "I know what I'm doing Mr. Compiler, TYVM,
even if your cost model says otherwise" facility. And in this case I
still think __always_inline is correct.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: mark some mpspec inline functions as __init
  2021-02-25 21:33           ` Borislav Petkov
@ 2021-02-25 21:58             ` Nick Desaulniers
  2021-02-25 22:16               ` Arnd Bergmann
  2021-02-26  8:13               ` Borislav Petkov
  0 siblings, 2 replies; 13+ messages in thread
From: Nick Desaulniers @ 2021-02-25 21:58 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, the arch/x86 maintainers,
	Nathan Chancellor, Arnd Bergmann, H. Peter Anvin, linux-kernel,
	clang-built-linux, Arnd Bergmann

On Thu, Feb 25, 2021 at 1:33 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Feb 25, 2021 at 12:31:33PM -0800, Nick Desaulniers wrote:
> > (full thread: https://lore.kernel.org/lkml/20210225112247.2240389-1-arnd@kernel.org/)
> > I suspect in this specific case, "Interprocedural Sparse Conditional
> > Constant Propagation" sees the calls to the same fn with different
> > constants, propagates those down creating two specialized versions of
> > the callee (so they are distinct functions now), inlines those into
> > get_smp_config()/early_get_smp_config(), then there's too many callers
> > of those in a single TU where inlining would cause excessive code
> > bloat.
>
> Well, there's exactly one caller of get_smp_config - that's setup_arch().
> early_get_smp_config() gets called also exactly once in amd_numa_init().
>
> Now, with my simplistic approach, I can replace the lines at those call
> sites by hand with the
>
>         x86_init.mpparse.get_smp_config(<arg>);
>
> call. So those become exactly one function call. I still don't see how
> that can be done any differently, frankly.
>
> But apparently the cost model has decided that this is not inlineable.
> Maybe because that function ptr is assigned at boot time and that
> somehow gets the cost model to give it a very high (or low) value. Or
> maybe because the wrappers are calling through a variable - the x86_init
> thing - which is in a different section and that confuses the inliner.
> Or whatever - totally speculating here.

The config that reproduces it wasn't shared here; I wouldn't be
surprised if this was found via randconfig that enabled some config
that led to excessive code bloat somewhere somehow.

>
> And this brings me to my point - you can't expect people to do all that
> crazy dance of compiler introspection and understand cost models and
> compiler optimization just to fix stuff like that.

Oh, I don't expect everyone to; just leaving breadcrumbs showing other
people on thread how to fish. ;)

>
> Now, imagine we "fix" this to clang-13's inliner's satisfaction. Now
> imagine too that gcc Version Next changes their inliner and that inliner
> says that that "fix" is wrong, for whatever reason, bottom up, top down,
> whatever. Do you feel the annoyance all around?

Yes, mutually unsatisfiable cases are painful, but I don't think
that's what's going on here.

>
> And since, as you say, there are no silver bullets here, I think for
> cases like that we'll need a "I know what I'm doing Mr. Compiler, TYVM,
> even if your cost model says otherwise" facility. And in this case I
> still think __always_inline is correct.

Sure, it doesn't really matter to me which way this is fixed.  I
personally prefer placing functions in the correct sections and
letting the compiler be flexible, since if all of this is to satisfy
some randconfig then __always_inline is making a decision for all
configs, but perhaps it doesn't matter.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: mark some mpspec inline functions as __init
  2021-02-25 21:58             ` Nick Desaulniers
@ 2021-02-25 22:16               ` Arnd Bergmann
  2021-02-26  8:13               ` Borislav Petkov
  1 sibling, 0 replies; 13+ messages in thread
From: Arnd Bergmann @ 2021-02-25 22:16 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Borislav Petkov, Thomas Gleixner, Ingo Molnar,
	the arch/x86 maintainers, Nathan Chancellor, Arnd Bergmann,
	H. Peter Anvin, linux-kernel, clang-built-linux

On Thu, Feb 25, 2021 at 10:59 PM 'Nick Desaulniers' via Clang Built
Linux <clang-built-linux@googlegroups.com> wrote:
> On Thu, Feb 25, 2021 at 1:33 PM Borislav Petkov <bp@alien8.de> wrote:
> >
> > But apparently the cost model has decided that this is not inlineable.
> > Maybe because that function ptr is assigned at boot time and that
> > somehow gets the cost model to give it a very high (or low) value. Or
> > maybe because the wrappers are calling through a variable - the x86_init
> > thing - which is in a different section and that confuses the inliner.
> > Or whatever - totally speculating here.
>
> The config that reproduces it wasn't shared here; I wouldn't be
> surprised if this was found via randconfig that enabled some config
> that led to excessive code bloat somewhere somehow.

It was the case in some similar bugfixes I sent, but in this one, the
out-of-line function was fairly trivial -- there is one call to a sanitizer
but that's it:

        .section        .init.text,"ax",@progbits
        .type   early_get_smp_config,@function  # -- Begin function
early_get_smp_config
early_get_smp_config:                   # @early_get_smp_config
# %bb.0:
        callq   __sanitizer_cov_trace_pc@PLT
        movl    $1, %edi
        jmpq    *x86_init+40(%rip)              # TAILCALL
.Lfunc_end4:
        .size   early_get_smp_config, .Lfunc_end4-early_get_smp_config

I've uploaded the .config to https://pastebin.com/CvvEScnQ

This is with today's linux-next-20210225 plus a number of patches
to address other build failures.

          Arnd

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: mark some mpspec inline functions as __init
  2021-02-25 20:31         ` Nick Desaulniers
  2021-02-25 21:33           ` Borislav Petkov
@ 2021-02-25 22:54           ` Peter Zijlstra
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2021-02-25 22:54 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Borislav Petkov, Thomas Gleixner, Ingo Molnar,
	the arch/x86 maintainers, Nathan Chancellor, Arnd Bergmann,
	H. Peter Anvin, linux-kernel, clang-built-linux, Arnd Bergmann,
	Arthur Eubanks, Chandler Carruth

On Thu, Feb 25, 2021 at 12:31:33PM -0800, Nick Desaulniers wrote:

> Q: But I put the `inline` keyword on the callee?
> A: Probably deserves its own post, but the `inline` keyword doesn't
> mean what any rational initial impression would suppose. Language in
> the standard refers to "inline substitution" and grants a lot of
> leeway to implementations in terms of whether it's performed at all.

That's just weasle wording to do the wrong thing :/ GCC (and other
compilers) have been saying inline is a valid substitute for macros. You
then reading the spec and saying we can do this ass backwards and
screw everybody who's been expecting things to top-down is just not
acceptible.

The C spec is notoriously bad, because it fails to actually specify and
then we get shit like this :-( And if you then take that 'freedom' and
implement behaviour that 'doesn't mean what any rational initial
impression would suppose', then you're just being a twat.

Life is hard enough without compilers trying to screw you over on
purpose :/

> There are cases where even with "always_inline" fn attr is applied,
> and the compiler says "that's nice, but I still cannot perform inline
> substitution here, I'm sorry."

That's a hard fail from where I'm sitting. Can we please get a compiler
error when that happens? That is, we're relying on this for correctness,
if the compiler then goes and ignores it, we've got *serious* problems.

This really MUST NOT happen.

And I'm not joking, I can get you a non-booting kernel by removing just
a few __always_inline in the right place.

> There are no silver bullets here.  It's semantics have
> changed since its introduction,

That isn't our doing, if the compiler redefines its keywords, don't tell
us our program is broken. You're the ones that have been changing the
meaning of thigs.

> and I have seen rare uses that make my skin crawl.

Do tell...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: mark some mpspec inline functions as __init
  2021-02-25 21:58             ` Nick Desaulniers
  2021-02-25 22:16               ` Arnd Bergmann
@ 2021-02-26  8:13               ` Borislav Petkov
  2021-02-26 13:24                 ` Arnd Bergmann
  1 sibling, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2021-02-26  8:13 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Thomas Gleixner, Ingo Molnar, the arch/x86 maintainers,
	Nathan Chancellor, Arnd Bergmann, H. Peter Anvin, linux-kernel,
	clang-built-linux, Arnd Bergmann

On Thu, Feb 25, 2021 at 01:58:48PM -0800, Nick Desaulniers wrote:
> The config that reproduces it wasn't shared here; I wouldn't be
> surprised if this was found via randconfig that enabled some config
> that led to excessive code bloat somewhere somehow.

I'm sceptical it is the .config. As I said, those single function calls
which I could replace by hand - the wrappers simply make the code
cleaner. They could just as well be macros FWIW and then the inlining
will be practically forced at preprocess time.

> Oh, I don't expect everyone to; just leaving breadcrumbs showing other
> people on thread how to fish. ;)

Yap, that's appreciated.

> Sure, it doesn't really matter to me which way this is fixed.  I
> personally prefer placing functions in the correct sections and
> letting the compiler be flexible, since if all of this is to satisfy
> some randconfig then __always_inline is making a decision for all
> configs, but perhaps it doesn't matter.

I hope.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: mark some mpspec inline functions as __init
  2021-02-26  8:13               ` Borislav Petkov
@ 2021-02-26 13:24                 ` Arnd Bergmann
  2021-02-27 15:11                   ` Arnd Bergmann
  0 siblings, 1 reply; 13+ messages in thread
From: Arnd Bergmann @ 2021-02-26 13:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Nick Desaulniers, Thomas Gleixner, Ingo Molnar,
	the arch/x86 maintainers, Nathan Chancellor, Arnd Bergmann,
	H. Peter Anvin, linux-kernel, clang-built-linux

On Fri, Feb 26, 2021 at 9:13 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Feb 25, 2021 at 01:58:48PM -0800, Nick Desaulniers wrote:
> > The config that reproduces it wasn't shared here; I wouldn't be
> > surprised if this was found via randconfig that enabled some config
> > that led to excessive code bloat somewhere somehow.
>
> I'm sceptical it is the .config. As I said, those single function calls
> which I could replace by hand - the wrappers simply make the code
> cleaner. They could just as well be macros FWIW and then the inlining
> will be practically forced at preprocess time.

I managed to track down the configurations: This particular function is
not inlined whenever CONFIG_UBSAN_OBJECT_SIZE is enabled
and CONFIG_UBSAN_TRAP is disabled, plus obviously any
configuration option that is needed to build the file.

     Arnd

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86: mark some mpspec inline functions as __init
  2021-02-26 13:24                 ` Arnd Bergmann
@ 2021-02-27 15:11                   ` Arnd Bergmann
  0 siblings, 0 replies; 13+ messages in thread
From: Arnd Bergmann @ 2021-02-27 15:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Nick Desaulniers, Thomas Gleixner, Ingo Molnar,
	the arch/x86 maintainers, Nathan Chancellor, Arnd Bergmann,
	H. Peter Anvin, linux-kernel, clang-built-linux

On Fri, Feb 26, 2021 at 2:24 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Fri, Feb 26, 2021 at 9:13 AM Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Thu, Feb 25, 2021 at 01:58:48PM -0800, Nick Desaulniers wrote:
> > > The config that reproduces it wasn't shared here; I wouldn't be
> > > surprised if this was found via randconfig that enabled some config
> > > that led to excessive code bloat somewhere somehow.
> >
> > I'm sceptical it is the .config. As I said, those single function calls
> > which I could replace by hand - the wrappers simply make the code
> > cleaner. They could just as well be macros FWIW and then the inlining
> > will be practically forced at preprocess time.
>
> I managed to track down the configurations: This particular function is
> not inlined whenever CONFIG_UBSAN_OBJECT_SIZE is enabled
> and CONFIG_UBSAN_TRAP is disabled, plus obviously any
> configuration option that is needed to build the file.

And I now had another look at the output after reducing the test case
with cvise to:

struct b {
  void *c;
};
struct {
  struct b d;
} extern e;
int f;

__attribute__((__cold__)) int a();
static inline void early_get_smp_config() {(void) e.d.c; }

int g()
{
  if (a())
    return 2;
  a();
  if (f)
    return f;
  a();
  early_get_smp_config();
  return 0;
}

See https://godbolt.org/z/8qbY65

Some observations:

- The early_get_smp_config function literally does nothing in the
   reduced test case, but is still not inlined.

- This happens regardless of target architecture

- It happens in a code path of the calling function that is 'cold'
   at this point, which presumably is an indication to clang that
   any functions called from here are also cold, and not worth
   inlining.

- I have found no indication why -fsanitize=object-size should
  make a difference.

         Arnd

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-02-27 15:12 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-25 11:22 [PATCH] x86: mark some mpspec inline functions as __init Arnd Bergmann
2021-02-25 11:45 ` Borislav Petkov
2021-02-25 12:18   ` Arnd Bergmann
2021-02-25 12:42     ` Borislav Petkov
2021-02-25 13:20       ` Arnd Bergmann
2021-02-25 20:31         ` Nick Desaulniers
2021-02-25 21:33           ` Borislav Petkov
2021-02-25 21:58             ` Nick Desaulniers
2021-02-25 22:16               ` Arnd Bergmann
2021-02-26  8:13               ` Borislav Petkov
2021-02-26 13:24                 ` Arnd Bergmann
2021-02-27 15:11                   ` Arnd Bergmann
2021-02-25 22:54           ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).