From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Jakub Jelinek <jakub@redhat.com>,
Denys Vlasenko <dvlasenk@redhat.com>,
Borislav Petkov <bp@alien8.de>,
Tim Chen <tim.c.chen@linux.intel.com>,
Andy Lutomirski <luto@amacapital.net>,
Jason Low <jason.low2@hp.com>, Brian Gerst <brgerst@gmail.com>,
Aswin Chandramouleeswaran <aswin@hp.com>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
"H. Peter Anvin" <hpa@zytor.com>,
LKML <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Richard Henderson <rth@twiddle.net>
Subject: Re: [PATCH] x86: Turn off GCC branch probability heuristics
Date: Sun, 12 Apr 2015 09:41:24 +0200
Message-ID: <20150412074123.GB9062@gmail.com>
In-Reply-To: <CA+55aFxNbqEGouKAmQ72=skkm8NWBUu7jh92BshFsCwF3r294g@mail.gmail.com>
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Sat, Apr 11, 2015 at 11:57 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > I think it's just the no-guess one:
> >
> >     text     data       dec  patch               reduction
> >  7563475  1781048  10302987
> >  7192973  1780024   9931461  no-guess                -4.8%
> >  7354819  1781048    958464  align-1                 -2.7%
> >  7192973  1780024   9931461  no-guess + align-1      -4.8%
>
> Yeah, a 5% code expansion is a big deal. Sadly, it looks like
> 'no-guess' also disables our explicit likely/unlikely handling.
>
> Damn. If it actually honored likely/unlikely, then we should just do
> it - and manually fix up any places where we really care.
>
> But the fact that it apparently entirely disables not just the
> guesses, but our *explicit* likely/unlikely, means that we can't fix
> up the mistakes.
>
> And in many of the hot codepaths that likely/unlikely really does
> matter. Some of our hottest paths have known "this basically never
> happens" situations that we do *not* want to break up our L1 I$ over.
> There's a number of functions that have been optimized to really
> generate good code, and "-fno-guess-branch-probability" disables those
> manual optimizations.
>
> So we'd have no way to fix it for the cases that matter.
>
> Sad.
>
> It might be worth bringing this up with some gcc people. I added Jakub
> to the cc. Any other gcc people suggestions?
(Skip to the 'More numbers' section below to see more measurements.)
What would be nice to have is a GCC option that disables all branch
probability heuristics (like -fno-guess-branch-probability does), but
still honors explicit __builtin_expect() hints (which
-fno-guess-branch-probability unfortunately disables as well).

Something like a -fno-branch-probability-heuristics GCC option, or a
"--param branch-probability-heuristics=0" setting.
I found one related option:
--param predictable-branch-outcome=N
predictable-branch-outcome
When branch is predicted to be taken with probability
lower than this threshold (in percent), then it is
considered well predictable. The default is 10.
I tried the values 1 and 0, on a -O2 kernel (i.e. guess-branch-probability
was enabled), in the hope that it maybe turns off all non-__builtin_expect()
branch heuristics, but it only had very minor size impact in the 0.001% range.
More numbers:
-------------
I also measured the effect of our __builtin_expect()
(likely()/unlikely()) branch annotations.
As an experiment I mapped both likely()/unlikely() to the four
possible __builtin_expect() probability settings:
   expect=10:  likely(): __builtin_expect(, 1)    unlikely(): __builtin_expect(, 0)
   expect=01:  likely(): __builtin_expect(, 0)    unlikely(): __builtin_expect(, 1)
   expect=00:  likely(): __builtin_expect(, 0)    unlikely(): __builtin_expect(, 0)
   expect=11:  likely(): __builtin_expect(, 1)    unlikely(): __builtin_expect(, 1)

(The first mapping is the only one that makes sense, and it is the one
that is used by the kernel currently.)
The goal of mixing up the probability mappings was to measure the full
'range' of code size impact that the kernel's explicit hints are
causing, versus the impact that GCC's own branch probability
heuristics are causing:
     text     data      bss       dec  filename
 12566383  1617840  1089536  15273759  vmlinux.expect=10 [==vanilla]
 12460250  1617840  1089536  15167626  vmlinux.expect=01
 12563332  1617840  1089536  15270708  vmlinux.expect=00
 12463035  1617840  1089536  15170411  vmlinux.expect=11
 12533382  1617840  1089536  15240758  vmlinux.no-expect
 11923529  1617840  1089536  14630905  vmlinux.-fno-guess-branch-probability
[ This was done on a v4.0-rc7-ish vanilla -O2 kernel (no alignment
  tweaks), using 'make defconfig' and 'make kvmconfig' on x86-64 to
  turn it into a minimally bootable kernel. I used GCC 4.9.1. ]
The 'vmlinux.no-expect' kernel is a vanilla kernel with all
__builtin_expect() hints removed, i.e. GCC's heuristics decide all
branch probabilities in the kernel.
So the code size 'range' that the kernel's own probability hints are
moving in is around 0.8%:

 - the 'vanilla' mapping is the largest (not unexpectedly)

 - the 'inverse' mapping is the smallest, by 0.8%

 - the 00 mapping stays within 0.03% of vanilla and the 11 mapping
   within 0.03% of the inverse mapping, i.e. almost all of the
   range comes from the unlikely() hints

 - the 'no-expect' kernel is in between, which too is somewhat
   expected: GCC heuristics would pick only part of our hints.
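
As a cross-check, the relative sizes can be recomputed from the 'text'
column of the table above. This is only a minimal sketch: the helper
names are made up for illustration, and taking the vanilla kernel as
the baseline is an assumption (other percentages in this mail appear to
be relative to the smaller image instead).

```python
# Text sizes in bytes, copied from the 'text' column of the table above.
TEXT_SIZE = {
    "expect=10 (vanilla)": 12566383,
    "expect=01 (inverse)": 12460250,
    "expect=00": 12563332,
    "expect=11": 12463035,
    "no-expect": 12533382,
}


def shrink_percent(baseline: int, variant: int) -> float:
    """Size reduction of `variant` relative to `baseline`, in percent."""
    return 100.0 * (baseline - variant) / baseline


def report(sizes: dict, baseline_key: str) -> dict:
    """Map each kernel variant to its shrinkage versus the baseline."""
    base = sizes[baseline_key]
    return {name: shrink_percent(base, size) for name, size in sizes.items()}


if __name__ == "__main__":
    for name, pct in report(TEXT_SIZE, "expect=10 (vanilla)").items():
        print(f"{name:22s} {pct:5.2f}% smaller than vanilla")
```

Swapping the baseline key shows how far each mapping is from any other
kernel in the table.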
But note how the 'no GCC heuristics at all' setting reduces size
brutally, by 5.4%.
So if we were able to do that, while keeping __builtin_expect(), and
used our own heuristics only via __builtin_expect(), then we could
still possibly expect a total code size shrinkage of around 5.0%:

 -5.4% code size reduction that comes from removal of all hints,

 +0.4% code size increase between the 'no-expect' and the '10'
       mappings in the experiment above.
Note that even if I apply all the align=1 patches,
-fno-guess-branch-probability on top of that still gives me about 2.2%
code size savings:
     text     data      bss       dec  filename
 12566383  1617840  1089536  15273759  vmlinux.expect=10 [==vanilla]
 11923529  1617840  1089536  14630905  vmlinux.-fno-guess-branch-probability
 11903663  1617840  1089536  14611039  vmlinux.align=1
 11646102  1617840  1089536  14353478  vmlinux.align=1+fno-guess-branch-probability
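
The savings between any two rows of this table can be recomputed from
the 'text' column. A minimal sketch follows; the constant and function
names are hypothetical, and expressing the saving as a percentage of
the smaller image is an assumption made to match how the larger
figures in this mail seem to be derived.

```python
# Text sizes in bytes, copied from the table above.
VANILLA = 12566383
NO_GUESS = 11923529
ALIGN1 = 11903663
ALIGN1_NO_GUESS = 11646102


def savings_percent(bigger: int, smaller: int) -> float:
    """Bytes saved, expressed as a percentage of the smaller image."""
    return 100.0 * (bigger - smaller) / smaller


if __name__ == "__main__":
    pairs = [
        ("vanilla -> no-guess", VANILLA, NO_GUESS),
        ("vanilla -> align=1", VANILLA, ALIGN1),
        ("align=1 -> align=1 + no-guess", ALIGN1, ALIGN1_NO_GUESS),
        ("vanilla -> align=1 + no-guess", VANILLA, ALIGN1_NO_GUESS),
    ]
    for label, bigger, smaller in pairs:
        saved = bigger - smaller
        print(f"{label:32s} {saved:7d} bytes  {savings_percent(bigger, smaller):.2f}%")
```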
The smallest vmlinux has:
- about 41,000 functions ('tT' symbols in System.map)
- about 300,000 branch/jump instructions (all objdump -d asm mnemonics starting with 'j')
- about 165,000 function calls (all objdump -d asm mnemonics matching 'call')
- about 2,330,000 instructions (all objdump -d asm mnemonics)
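
Counts like the ones in this list can be gathered with a short script.
The following is a sketch only, not the method actually used for the
numbers above: it assumes the usual tab-separated 'objdump -d' line
layout (address, hex bytes, instruction) and the three-column
System.map layout, the function names are made up, and instruction
prefixes such as 'lock' or 'rep' are naively counted as the mnemonic.

```python
def count_functions(system_map_text: str) -> int:
    """Count text symbols: System.map lines whose type column is 't' or 'T'."""
    count = 0
    for line in system_map_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] in ("t", "T"):
            count += 1
    return count


def count_mnemonics(objdump_text: str) -> dict:
    """Classify instructions in 'objdump -d' output by their mnemonic."""
    totals = {"instructions": 0, "jumps": 0, "calls": 0}
    for line in objdump_text.splitlines():
        fields = line.split("\t")
        # Instruction lines have three tab-separated fields; symbol labels and
        # byte-only continuation lines have fewer and are skipped.
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        parts = fields[2].split()
        if not parts:
            continue
        mnemonic = parts[0]
        totals["instructions"] += 1
        if mnemonic.startswith("j"):
            totals["jumps"] += 1
        elif "call" in mnemonic:
            totals["calls"] += 1
    return totals


# Made-up sample input, only to show the expected line shapes.
SAMPLE_OBJDUMP = (
    "ffffffff81000000 <_text>:\n"
    "ffffffff81000000:\t48 8d 25 51 3f 60 01 \tlea    0x1603f51(%rip),%rsp\n"
    "ffffffff81000007:\te8 f4 00 00 00       \tcallq  ffffffff81000100 <foo>\n"
    "ffffffff8100000c:\t74 02                \tje     ffffffff81000010 <_text+0x10>\n"
    "ffffffff8100000e:\teb fe                \tjmp    ffffffff8100000e <_text+0xe>\n"
    "ffffffff81000010:\tc3                   \tretq\n"
)
SAMPLE_SYSTEM_MAP = (
    "ffffffff81000000 T _text\n"
    "ffffffff81000100 t foo\n"
    "ffffffff81a00000 D some_data\n"
)

if __name__ == "__main__":
    print(count_functions(SAMPLE_SYSTEM_MAP))
    print(count_mnemonics(SAMPLE_OBJDUMP))
```

On a real build the two inputs would be the contents of System.map and
the captured output of 'objdump -d vmlinux'.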
Compared to that, the align=1 kernel with GCC's heuristics left enabled
has about 1,200 more branches, 1,350 more function calls and about
77,000 more instructions, altogether +257,561 bytes of code:
   # of x86 instructions

   2549742  vmlinux.expect=10 [==vanilla]
   2391069  vmlinux.-fno-guess-branch-probability
   2411568  vmlinux.align=1
   2334595  vmlinux.align=1+fno-guess-branch-probability
(For completeness, the patch generating the smallest kernel is
attached below.)
The takeaway: the combined code size savings of align=1 plus
-fno-guess-branch-probability are 7.9%. Even if we restored the
__builtin_expect() hints, we'd probably still see a combined 7.5% code
shrinkage.
That would be a rather significant I$ win, with very little cost that
I can see!
Thanks,
Ingo
---
arch/x86/Makefile | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 5ba2d9ce82dc..a6d3feb90b97 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -77,10 +77,25 @@ else
KBUILD_AFLAGS += -m64
KBUILD_CFLAGS += -m64
+ # Pack jump targets tightly, don't align them to the default 16 bytes:
+ KBUILD_CFLAGS += -falign-jumps=1
+
+ # Pack functions tightly as well:
+ KBUILD_CFLAGS += -falign-functions=1
+
+ # Pack loops tightly as well:
+ KBUILD_CFLAGS += -falign-loops=1
+
# Don't autogenerate traditional x87 instructions
KBUILD_CFLAGS += $(call cc-option,-mno-80387)
KBUILD_CFLAGS += $(call cc-option,-mno-fp-ret-in-387)
+ #
+ # Don't guess branch probabilities, follow the code and unlikely()/likely() hints,
+ # which reduces vmlinux size by about 5.4%:
+ #
+ KBUILD_CFLAGS += -fno-guess-branch-probability
+
# Use -mpreferred-stack-boundary=3 if supported.
KBUILD_CFLAGS += $(call cc-option,-mpreferred-stack-boundary=3)