linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Suboptimal inline heuristics due to non-code sections
@ 2018-05-01  6:50 Nadav Amit
  2018-05-01 13:40 ` Josh Poimboeuf
  0 siblings, 1 reply; 8+ messages in thread
From: Nadav Amit @ 2018-05-01  6:50 UTC (permalink / raw)
  To: Josh Poimboeuf, Peter Zijlstra; +Cc: Ingo Molnar, LKML, Thomas Gleixner

When gcc considers the size of a function for inlining decisions, it
apparently considers *all* sections. Since the kernel extensively uses
sections for things other than code (e.g., exception-table, bug-table), the
optimality of these decisions seem questionable to me.

The objtool’s sections may be the most extreme case, as these sections are
discarded, while their size still appears to be considered by the inlining
heuristics. It may be beneficial not to consider (some) the other sections
as well, as they do not affect code-caching but only increase the kernel
size.

To illustrate the issue, consider the function copy_overflow():

   0xffffffff819315e0 <+0>:	push   %rbp
   0xffffffff819315e1 <+1>:	mov    %rsi,%rdx
   0xffffffff819315e4 <+4>:	mov    %edi,%esi
   0xffffffff819315e6 <+6>:	mov    $0xffffffff820bc4b8,%rdi
   0xffffffff819315ed <+13>:	mov    %rsp,%rbp
   0xffffffff819315f0 <+16>:	callq  0xffffffff81089b70 <__warn_printk>
   0xffffffff819315f5 <+21>:	ud2    
   0xffffffff819315f7 <+23>:	pop    %rbp
   0xffffffff819315f8 <+24>:	retq   

This function seems to me as a great candidate for inlining. Yet, in my 4.16
build (using gcc 7.2), I get 38 non-inlined instances of this function in
vmlinux. Forcing CONFIG_STACK_VALIDATION to be disabled reduces the number
non-inlined instances to 35. Removing, in addition, the data which is saved
in the __bug_table makes all the instances of the function to be inlined.

Obviously this certain function can be set as __always_inline, but the inline
heuristics seems to me as wrongfully biased. 

What do you think?

Is there a way to make gcc to ignore sections for its inlining heuristics?

Thanks,
Nadav

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Suboptimal inline heuristics due to non-code sections
  2018-05-01  6:50 Suboptimal inline heuristics due to non-code sections Nadav Amit
@ 2018-05-01 13:40 ` Josh Poimboeuf
  2018-05-01 15:37   ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Josh Poimboeuf @ 2018-05-01 13:40 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Peter Zijlstra, Ingo Molnar, LKML, Thomas Gleixner, Linus Torvalds

On Tue, May 01, 2018 at 06:50:14AM +0000, Nadav Amit wrote:
> When gcc considers the size of a function for inlining decisions, it
> apparently considers *all* sections. Since the kernel extensively uses
> sections for things other than code (e.g., exception-table, bug-table), the
> optimality of these decisions seem questionable to me.
> 
> The objtool’s sections may be the most extreme case, as these sections are
> discarded, while their size still appears to be considered by the inlining
> heuristics. It may be beneficial not to consider (some) the other sections
> as well, as they do not affect code-caching but only increase the kernel
> size.
> 
> To illustrate the issue, consider the function copy_overflow():
> 
>    0xffffffff819315e0 <+0>:	push   %rbp
>    0xffffffff819315e1 <+1>:	mov    %rsi,%rdx
>    0xffffffff819315e4 <+4>:	mov    %edi,%esi
>    0xffffffff819315e6 <+6>:	mov    $0xffffffff820bc4b8,%rdi
>    0xffffffff819315ed <+13>:	mov    %rsp,%rbp
>    0xffffffff819315f0 <+16>:	callq  0xffffffff81089b70 <__warn_printk>
>    0xffffffff819315f5 <+21>:	ud2    
>    0xffffffff819315f7 <+23>:	pop    %rbp
>    0xffffffff819315f8 <+24>:	retq   
> 
> This function seems to me as a great candidate for inlining. Yet, in my 4.16
> build (using gcc 7.2), I get 38 non-inlined instances of this function in
> vmlinux. Forcing CONFIG_STACK_VALIDATION to be disabled reduces the number
> non-inlined instances to 35. Removing, in addition, the data which is saved
> in the __bug_table makes all the instances of the function to be inlined.
> 
> Obviously this certain function can be set as __always_inline, but the inline
> heuristics seems to me as wrongfully biased. 
> 
> What do you think?
> 
> Is there a way to make gcc to ignore sections for its inlining heuristics?

Good find.

Playing around with one of the affected files (crypto/af_alg.o), if I
make the .discard.reachable section empty by removing the text reference
from the annotate_reachable() macro, then copy_overflow() still isn't
inlined.

But if I remove the section completely by removing the
pushsection/popsection, then copy_overflow() gets inlined.

So GCC's inlining decisions are somehow influenced by the existence of
some random empty section.  This definitely seems like a GCC bug to me.

-- 
Josh

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Suboptimal inline heuristics due to non-code sections
  2018-05-01 13:40 ` Josh Poimboeuf
@ 2018-05-01 15:37   ` Linus Torvalds
  2018-05-01 16:39     ` Nadav Amit
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2018-05-01 15:37 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: namit, Peter Zijlstra, Ingo Molnar, Linux Kernel Mailing List,
	Thomas Gleixner

On Tue, May 1, 2018 at 6:40 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> But if I remove the section completely by removing the
> pushsection/popsection, then copy_overflow() gets inlined.

> So GCC's inlining decisions are somehow influenced by the existence of
> some random empty section.  This definitely seems like a GCC bug to me.

I think gcc uses the size of the string to approximate the size of an
inline asm.

So I don't think it's the "empty section" that makes gcc do this, I think
it's literally "our inline asms _look_ big".

                Linus "does this section directive make me look fat?"
Torvalds

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Suboptimal inline heuristics due to non-code sections
  2018-05-01 15:37   ` Linus Torvalds
@ 2018-05-01 16:39     ` Nadav Amit
  2018-05-01 16:46       ` Nadav Amit
  2018-05-01 16:48       ` Linus Torvalds
  0 siblings, 2 replies; 8+ messages in thread
From: Nadav Amit @ 2018-05-01 16:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, Thomas Gleixner

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, May 1, 2018 at 6:40 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
>> But if I remove the section completely by removing the
>> pushsection/popsection, then copy_overflow() gets inlined.
> 
>> So GCC's inlining decisions are somehow influenced by the existence of
>> some random empty section.  This definitely seems like a GCC bug to me.
> 
> I think gcc uses the size of the string to approximate the size of an
> inline asm.
> 
> So I don't think it's the "empty section" that makes gcc do this, I think
> it's literally "our inline asms _look_ big”.

I didn’t think about that.

Playing with the code a bit more, it seems that it is actually related to
the number of “new-lines” in the inline assembly. Removing 4 new-lines from
_BUG_FLAGS (those that can be removed without breaking assembly) eliminated
most of the non-inlined versions of copy_overflow().

Would it be reasonable to remove new-lines in such cases?

Regards,
Nadav

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Suboptimal inline heuristics due to non-code sections
  2018-05-01 16:39     ` Nadav Amit
@ 2018-05-01 16:46       ` Nadav Amit
  2018-05-01 16:51         ` Linus Torvalds
  2018-05-01 16:48       ` Linus Torvalds
  1 sibling, 1 reply; 8+ messages in thread
From: Nadav Amit @ 2018-05-01 16:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, Thomas Gleixner

Nadav Amit <namit@vmware.com> wrote:

> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
>> On Tue, May 1, 2018 at 6:40 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> 
>>> But if I remove the section completely by removing the
>>> pushsection/popsection, then copy_overflow() gets inlined.
>> 
>>> So GCC's inlining decisions are somehow influenced by the existence of
>>> some random empty section.  This definitely seems like a GCC bug to me.
>> 
>> I think gcc uses the size of the string to approximate the size of an
>> inline asm.
>> 
>> So I don't think it's the "empty section" that makes gcc do this, I think
>> it's literally "our inline asms _look_ big”.
> 
> I didn’t think about that.
> 
> Playing with the code a bit more, it seems that it is actually related to
> the number of “new-lines” in the inline assembly. Removing 4 new-lines from
> _BUG_FLAGS (those that can be removed without breaking assembly) eliminated
> most of the non-inlined versions of copy_overflow().
> 
> Would it be reasonable to remove new-lines in such cases?

My bad. It’s not the new-line. Let me do some more digging.

Nadav

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Suboptimal inline heuristics due to non-code sections
  2018-05-01 16:39     ` Nadav Amit
  2018-05-01 16:46       ` Nadav Amit
@ 2018-05-01 16:48       ` Linus Torvalds
  1 sibling, 0 replies; 8+ messages in thread
From: Linus Torvalds @ 2018-05-01 16:48 UTC (permalink / raw)
  To: namit
  Cc: Josh Poimboeuf, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, Thomas Gleixner

On Tue, May 1, 2018 at 9:39 AM Nadav Amit <namit@vmware.com> wrote:

> Would it be reasonable to remove new-lines in such cases?

Yeah, that may be fine for some of our (already illegible) section crud.

                 Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Suboptimal inline heuristics due to non-code sections
  2018-05-01 16:46       ` Nadav Amit
@ 2018-05-01 16:51         ` Linus Torvalds
  2018-05-01 16:54           ` Nadav Amit
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2018-05-01 16:51 UTC (permalink / raw)
  To: namit
  Cc: Josh Poimboeuf, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, Thomas Gleixner

On Tue, May 1, 2018 at 9:46 AM Nadav Amit <namit@vmware.com> wrote:

> My bad. It’s not the new-line. Let me do some more digging.

 From the gcc docs:

   Some targets require that GCC track the size of each instruction used
   in order to generate correct code.  Because the final length of the
   code produced by an @code{asm} statement is only known by the
   assembler, GCC must make an estimate as to how big it will be.  It
   does this by counting the number of instructions in the pattern of the
   @code{asm} and multiplying that by the length of the longest
   instruction supported by that processor.  (When working out the number
   of instructions, it assumes that any occurrence of a newline or of
   whatever statement separator character is supported by the assembler --
   typically @samp{;} --- indicates the end of an instruction.)

so it probably counts newlines and semicolons to estimate the size.

                 Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Suboptimal inline heuristics due to non-code sections
  2018-05-01 16:51         ` Linus Torvalds
@ 2018-05-01 16:54           ` Nadav Amit
  0 siblings, 0 replies; 8+ messages in thread
From: Nadav Amit @ 2018-05-01 16:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, Thomas Gleixner

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, May 1, 2018 at 9:46 AM Nadav Amit <namit@vmware.com> wrote:
> 
>> My bad. It’s not the new-line. Let me do some more digging.
> 
> From the gcc docs:
> 
>   Some targets require that GCC track the size of each instruction used
>   in order to generate correct code.  Because the final length of the
>   code produced by an @code{asm} statement is only known by the
>   assembler, GCC must make an estimate as to how big it will be.  It
>   does this by counting the number of instructions in the pattern of the
>   @code{asm} and multiplying that by the length of the longest
>   instruction supported by that processor.  (When working out the number
>   of instructions, it assumes that any occurrence of a newline or of
>   whatever statement separator character is supported by the assembler --
>   typically @samp{;} --- indicates the end of an instruction.)
> 
> so it probably counts newlines and semicolons to estimate the size.

Thanks. I probably did not have enough coffee.

I’ll work on it.

Regards,
Nadav

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-05-01 16:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-01  6:50 Suboptimal inline heuristics due to non-code sections Nadav Amit
2018-05-01 13:40 ` Josh Poimboeuf
2018-05-01 15:37   ` Linus Torvalds
2018-05-01 16:39     ` Nadav Amit
2018-05-01 16:46       ` Nadav Amit
2018-05-01 16:51         ` Linus Torvalds
2018-05-01 16:54           ` Nadav Amit
2018-05-01 16:48       ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).