From: Ingo Molnar <mingo@elte.hu>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Martin Bligh <mbligh@mbligh.org>, Matt Mackall <mpm@selenic.com>,
Arjan van de Ven <arjan@infradead.org>,
Chuck Ebbert <76306.1226@compuserve.com>,
Adrian Bunk <bunk@stusta.de>, Andrew Morton <akpm@osdl.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Dave Jones <davej@redhat.com>,
Tim Schmielau <tim@physik3.uni-rostock.de>
Subject: Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers
Date: Fri, 6 Jan 2006 00:30:49 +0100
Message-ID: <20060105233049.GA3441@elte.hu>
In-Reply-To: <Pine.LNX.4.64.0601051213270.3169@g5.osdl.org>
* Linus Torvalds <torvalds@osdl.org> wrote:
> On Thu, 5 Jan 2006, Linus Torvalds wrote:
> >
> > The cache effects are likely the biggest ones, and no, I don't know how
> > much denser it will be in the cache. Especially with a 64-byte one..
> > (although 128 bytes is fairly common too).
>
> Oh, but validating things like "likely()" and "unlikely()" branch
> hints might be a noticeably bigger issue.

I frequently validate branches in performance-critical kernel code like
the scheduler (and the mutex code ;) via instruction-granularity
profiling, driven by a high-frequency (10-100 kHz) NMI interrupt. Bad
branch layout shows up pretty clearly in annotated assembly listings:

c037313c: 1715 <schedule>:
c037313c: 1715 55 push %ebp
c037313d: 264 b8 00 f0 ff ff mov $0xfffff000,%eax
c0373142: 150 89 e5 mov %esp,%ebp
c0373144: 0 57 push %edi
c0373145: 852 56 push %esi
c0373146: 215 53 push %ebx
c0373147: 0 83 ec 30 sub $0x30,%esp
c037314a: 184 21 e0 and %esp,%eax
c037314c: 120 8b 10 mov (%eax),%edx
c037314e: 0 83 ba 84 00 00 00 00 cmpl $0x0,0x84(%edx)
c0373155: 83 75 2b jne c0373182 <schedule+0x46>
c0373157: 104 8b 48 14 mov 0x14(%eax),%ecx
c037315a: 39 f7 c1 ff ff ff ef test $0xefffffff,%ecx
c0373160: 112 74 20 je c0373182 <schedule+0x46>
c0373162: 0 ff b2 9c 00 00 00 pushl 0x9c(%edx)
c0373168: 0 8d 82 a4 01 00 00 lea 0x1a4(%edx),%eax
c037316e: 0 51 push %ecx
c037316f: 0 50 push %eax
c0373170: 0 68 7e 0e 39 c0 push $0xc0390e7e
c0373175: 0 e8 a3 36 da ff call c011681d <printk>
c037317a: 0 e8 48 03 d9 ff call c01034c7 <dump_stack>
c037317f: 0 83 c4 10 add $0x10,%esp
c0373182: 323 8b 55 04 mov 0x4(%ebp),%edx
c0373185: 5 b8 02 00 00 00 mov $0x2,%eax
c037318a: 0 e8 b3 3f da ff call c0117142 <profile_hit>
c037318f: 349 b8 00 f0 ff ff mov $0xfffff000,%eax
c0373194: 880 21 e0 and %esp,%eax
c0373196: 0 8b 00 mov (%eax),%eax
c0373198: 0 89 45 d4 mov %eax,0xffffffd4(%ebp)
c037319b: 440 83 78 14 00 cmpl $0x0,0x14(%eax)
c037319f: 5 78 05 js c03731a6 <schedule+0x6a>

The second column is the number of profiler hits. As you can see, the
branch at c0373160 is always taken, so the 32 bytes from c0373162 to
c0373181 (the printk()/dump_stack() block) are never executed, yet they
sit in the middle of the hot instruction stream. It is relatively easy
to identify likely/unlikely candidates for various workloads this way.
(It would probably be even better to have a visual tool that also
associates the source code with the data.)

I've seen a lot of such profiles on a lot of different workloads, and
my guesstimate is that with 'perfect' likely/unlikely hints and
'perfect' function ordering, we could roughly halve (!) the current
icache footprint of the kernel, on complex workloads too.

Especially with 64- or 128-byte L1 cachelines, our codepaths are really
fragmented, and we can easily end up with 3-4 times the optimal icache
footprint for a given syscall. We very often have cruft in the hotpath,
and functions that belong together are often ripped apart by things
like __sched annotations. I haven't seen many cases of wrongly judged
likely/unlikely hints; what typically happens is that there's no
annotation and the compiler's default guess is wrong.

The dcache footprint of the kernel is much better, mostly because it's
so easy to control in C. The icache footprint is a lot more elusive
(and also a lot more critical to execution speed on nontrivial
workloads).

So I think there are two major focus areas for improving our icache
footprint:
- reduce code size
- reduce fragmentation of the codepath

Fortunately, both are hard and technically challenging projects, both
will improve the icache footprint, and both will bring other benefits
too. [ We usually have many more problems with the easy and boring
stuff ;-) ]

Icache fragmentation reduction is also hard because it has to deal with
fundamentally conflicting constraints: one workload's ideal ordering
differs from another workload's ideal ordering, and such workloads can
be superimposed on the same system.

I think the only sane solution [that would be endorsed by distributions]
is to allow users to reorder function sections at runtime (per boot).
That is a lot faster and more robust (from a production POV) than a full
recompilation of the kernel. Recompilation is always risky: it needs
too much context and has too many tool dependencies, and is thus
currently untestable. And technically we don't really need a
recompilation of the kernel; we need a relinking. Relinking is much
safer from a testability POV: reordering the functions doesn't change
their internal instruction sequences or their interactions.

And we could use mcount() to gather function-cohesion data at runtime.
The mcount() call could be patched out of the image at runtime whenever
no data gathering is happening. Given that the average function size is
~100 bytes and an mcount call costs 5 bytes, the overhead would be +5%
in size (5/100), plus an extra 5-byte NOP executed on every function
entry. That's not good, but it is still at least an order of magnitude
smaller than the possible gain in icache footprint. (Also, people could
run mcount()-less kernels once the data has been gathered and the relink
has been done.)

One problem is modules, though: they could only be reordered within
themselves. On an average system with ~100 modules loaded, each module
boundary wastes about half a cacheline, so the average icache
fragmentation is +100*128/2 == 6.4K [with 128-byte L1 cachelines],
which can be significant (depending on the workload). OTOH, modules do
have strong internal cohesion: they contain functions that belong
together conceptually. So by reordering functions within modules we'll
likely be able to realize most of the possible icache savings. The only
exception would be workloads that use many modules at a high frequency;
such workloads will likely thrash the icache anyway.
Ingo