From: Nadav Amit <namit@vmware.com>
To: <linux-kernel@vger.kernel.org>
Cc: <nadav.amit@gmail.com>, Nadav Amit <namit@vmware.com>,
	Alok Kataria <akataria@vmware.com>,
	Christopher Li <sparse@chrisli.org>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	Jan Beulich <JBeulich@suse.com>, Jonathan Corbet <corbet@lwn.net>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Juergen Gross <jgross@suse.com>,
	Kees Cook <keescook@chromium.org>, <linux-sparse@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	<virtualization@lists.linux-foundation.org>, <x86@kernel.org>
Subject: [RFC 0/8] Improving compiler inlining decisions
Date: Tue, 15 May 2018 07:11:16 -0700
Message-ID: <20180515141124.84254-10-namit@vmware.com>
In-Reply-To: <20180515141124.84254-1-namit@vmware.com>

This patch-set deals with an interesting yet stupid problem: code that
does not get inlined despite its simplicity.

I find 5 classes of causes:

1. Inline assembly blocks in which code and data are added to
alternative sections. The compiler is oblivious to the content of the
blocks and assumes their cost in space and time is proportional to the
number of perceived assembly "instructions", which it estimates from the
number of newlines and semicolons. Alternatives, paravirt and other
mechanisms are affected.

2. Inline assembly with redundant newlines and semicolons. As in (1),
this code is considered "heavier" than it actually is.

3. Code with constant value optimizations. Quite a few parts of the
kernel check whether a variable is constant (using
__builtin_constant_p()) and perform heavy computations in that case.
These computations are eventually optimized out so they do not land in
the binary. However, the cost of these computations is also associated
with the calling function, which might prevent inlining of the calling
function. ilog2() is an example of such a case.

4. Code that is marked with the "cold" attribute, including all the
__init functions. Some may consider it the desired behavior.

5. Code that is marked with a different optimization level. This
affects, for example, vmx_vcpu_run(), inducing overheads of up to 10% on
exit.


This patch-set deals with some instances of the first 3 classes.

For (1) we insert an assembly macro and invoke it from the inline
assembly block.  As a result, the compiler sees a single "instruction"
and assigns a more appropriate cost to the code.

For (2) the solution is trivial: just remove the newlines.
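
To make the distortion in (1) and (2) concrete, here is a rough sketch of
the kind of estimate described above: one "instruction" plus one more for
every newline or semicolon in the asm template. It is a simplified model
for illustration only, not the compiler's actual code. The function name
estimate_asm_insns(), the two templates and the ASM_BUG_SKETCH macro name
are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of the cost estimate discussed above (an assumption,
 * not GCC source): start at one "instruction" and add one for every
 * newline or semicolon found in the inline asm template.
 */
static unsigned int estimate_asm_insns(const char *tmpl)
{
	unsigned int count = 1;

	assert(tmpl != NULL);
	for (; *tmpl; tmpl++)
		if (*tmpl == '\n' || *tmpl == ';')
			count++;
	return count;
}

/*
 * Hypothetical template in the style of class (1): a single real
 * instruction followed by bookkeeping placed in another section.
 */
static const char verbose_tmpl[] =
	"1:\tud2\n"
	".pushsection __bug_table,\"aw\"\n"
	"2:\t.long 1b - 2b\n"
	"\t.word 0\n"
	".popsection";

/* The same work hidden behind a (hypothetical) assembler macro call. */
static const char macro_tmpl[] = "ASM_BUG_SKETCH 0";
```

Under this model the verbose template is charged for 5 "instructions"
although it emits only one, while the macro invocation is charged for 1.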

(3) is somewhat tricky. The proposed solution is to use
__builtin_choose_expr() to check whether a variable is actually constant
instead of using an if-condition or the C ternary operator.
__builtin_choose_expr() is evaluated earlier in the compilation, so it
allows the compiler to associate the right cost with the variable case
before the inlining decisions take place.  So far so good.

Still, there is a drawback. Since __builtin_choose_expr() is evaluated
earlier, it can fail to recognize constants, which an if-condition would
recognize correctly.  As a result, this patch-set only applies it to the
simplest cases.
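
As a minimal sketch of the two forms (not the code from the patches), the
macros below select between a constant-folding ladder and a runtime path,
once with the ternary operator and once with __builtin_choose_expr(). The
names runtime_ilog2(), CONST_ILOG2_U8(), ilog2_ternary() and
ilog2_choose() are hypothetical, and the ladder only covers values 1..255
to keep the example short. It assumes a GCC-compatible compiler.

```c
#include <assert.h>

/* Runtime path: index of the highest set bit; n must be nonzero. */
static inline int runtime_ilog2(unsigned int n)
{
	assert(n != 0);
	return 31 - __builtin_clz(n);
}

/*
 * Constant path: a ladder of comparisons that folds to a single value
 * when n is a compile-time constant. Limited to 1..255 for brevity.
 */
#define CONST_ILOG2_U8(n) (			\
	((n) & 0x80) ? 7 : ((n) & 0x40) ? 6 :	\
	((n) & 0x20) ? 5 : ((n) & 0x10) ? 4 :	\
	((n) & 0x08) ? 3 : ((n) & 0x04) ? 2 :	\
	((n) & 0x02) ? 1 : 0)

/* Ternary form: both arms stay in the expression until it is folded. */
#define ilog2_ternary(n)				\
	(__builtin_constant_p(n) ? CONST_ILOG2_U8(n) : runtime_ilog2(n))

/* choose_expr form: only the selected arm remains in the expression. */
#define ilog2_choose(n)					\
	__builtin_choose_expr(__builtin_constant_p(n),	\
			      CONST_ILOG2_U8(n), runtime_ilog2(n))
```

With a literal argument such as ilog2_choose(64) the constant ladder is
selected. With a plain variable the choice is made before any later
optimization could prove the value constant, so the runtime path is
taken, which is the drawback mentioned above.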

Overall this patch-set slightly increases the kernel size (my build was
done using localmodconfig + localyesconfig for the record):

   text    data     bss     dec     hex filename
18126699 10066728 2936832 31130259 1db0293 ./vmlinux before
18149210 10064048 2936832 31150090 1db500a ./vmlinux after (+0.06%)

The patch-set eliminates some of the static text symbols:
Before: 40033
After:  39632   (-1%)

There is a measurable effect on performance in some cases. A loop of
MADV_DONTNEED/page-fault shows a 2% performance improvement with this
patch-set.

Some inline comments or self-explaining C macros might still be needed.

[1] https://lkml.org/lkml/2018/5/5/159

Cc: Alok Kataria <akataria@vmware.com>
Cc: Christopher Li <sparse@chrisli.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-sparse@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: x86@kernel.org

Nadav Amit (8):
  x86: objtool: use asm macro for better compiler decisions
  x86: bug: prevent gcc distortions
  x86: alternative: macrofy locks for better inlining
  x86: prevent inline distortion by paravirt ops
  x86: refcount: prevent gcc distortions
  x86: removing unneeded new-lines
  ilog2: preventing compiler distortion due to big condition
  bitops: prevent compiler inline decision distortion

 arch/x86/include/asm/alternative.h    | 28 ++++++++++----
 arch/x86/include/asm/asm.h            |  4 +-
 arch/x86/include/asm/bitops.h         |  8 ++--
 arch/x86/include/asm/bug.h            | 48 ++++++++++++++---------
 arch/x86/include/asm/cmpxchg.h        | 10 ++---
 arch/x86/include/asm/paravirt_types.h | 53 +++++++++++++++-----------
 arch/x86/include/asm/refcount.h       | 55 ++++++++++++++++-----------
 arch/x86/include/asm/special_insns.h  | 12 +++---
 include/linux/compiler.h              | 29 ++++++++++----
 include/linux/log2.h                  | 11 +++---
 10 files changed, 156 insertions(+), 102 deletions(-)

-- 
2.17.0


Thread overview: 35+ messages
2018-05-15 14:11 [RFC 0/8] Improving compiler inlining decisions Nadav Amit
2018-05-15 14:11 ` Nadav Amit
2018-05-15 14:11 ` [RFC 1/8] x86: objtool: use asm macro for better compiler decisions Nadav Amit
2018-05-15 14:11   ` Nadav Amit
2018-05-15 21:37   ` Josh Triplett
2018-05-15 21:53     ` Nadav Amit
2018-05-15 21:55       ` Josh Triplett
2018-05-15 14:11 ` [RFC 2/8] x86: bug: prevent gcc distortions Nadav Amit
2018-05-15 14:11 ` [RFC 3/8] x86: alternative: macrofy locks for better inlining Nadav Amit
2018-05-15 14:11 ` [RFC 4/8] x86: prevent inline distortion by paravirt ops Nadav Amit
2018-05-15 14:11 ` [RFC 5/8] x86: refcount: prevent gcc distortions Nadav Amit
2018-05-16 13:59   ` Kees Cook
2018-05-16 16:37     ` Nadav Amit
2018-05-15 14:11 ` [RFC 6/8] x86: removing unneeded new-lines Nadav Amit
2018-05-15 14:11 ` [RFC 7/8] ilog2: preventing compiler distortion due to big condition Nadav Amit
2018-05-15 14:11 ` [RFC 8/8] bitops: prevent compiler inline decision distortion Nadav Amit
2018-05-15 14:11 ` Nadav Amit [this message]
2018-05-15 14:11   ` [RFC 0/8] Improving compiler inlining decisions Nadav Amit
2018-05-15 14:11 ` [RFC 1/8] x86: objtool: use asm macro for better compiler decisions Nadav Amit
2018-05-15 14:11   ` Nadav Amit
2018-05-15 14:11 ` [RFC 2/8] x86: bug: prevent gcc distortions Nadav Amit
2018-05-15 14:11 ` [RFC 3/8] x86: alternative: macrofy locks for better inlining Nadav Amit
2018-05-15 14:11 ` [RFC 4/8] x86: prevent inline distortion by paravirt ops Nadav Amit
2018-05-15 14:11 ` [RFC 5/8] x86: refcount: prevent gcc distortions Nadav Amit
2018-05-16  7:14   ` Jan Beulich
2018-05-16 16:44     ` Nadav Amit
2018-05-17  7:18       ` Jan Beulich
2018-05-15 14:11 ` [RFC 6/8] x86: removing unneeded new-lines Nadav Amit
2018-05-15 14:11 ` [RFC 7/8] ilog2: preventing compiler distortion due to big condition Nadav Amit
2018-05-15 14:11 ` [RFC 8/8] bitops: prevent compiler inline decision distortion Nadav Amit
2018-05-16 14:09   ` Kees Cook
2018-05-15 22:14 ` [RFC 0/8] Improving compiler inlining decisions Nadav Amit
2018-05-16  3:48 ` Josh Poimboeuf
2018-05-16  3:48   ` Josh Poimboeuf
2018-05-16  4:30   ` Nadav Amit
