All of lore.kernel.org
 help / color / mirror / Atom feed
From: Josh Poimboeuf <jpoimboe@redhat.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
	live-patching@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andy Lutomirski <luto@kernel.org>, Jiri Slaby <jslaby@suse.cz>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH v2 4/8] objtool: add undwarf debuginfo generation
Date: Mon, 10 Jul 2017 21:58:07 -0500	[thread overview]
Message-ID: <20170711025807.62fzfgf2dhcgqur6@treble> (raw)
In-Reply-To: <20170707094437.2vgosia5hjg2wsut@gmail.com>

On Fri, Jul 07, 2017 at 11:44:37AM +0200, Ingo Molnar wrote:
> 
> * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
> > On Thu, Jun 29, 2017 at 10:06:52AM -0500, Josh Poimboeuf wrote:
> > > On Thu, Jun 29, 2017 at 04:46:18PM +0200, Ingo Molnar wrote:
> > > > 
> > > > * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > > 
> > > > > > Plus, shouldn't we use __packed for 'struct undwarf' to minimize the 
> > > > > > structure's size (to 6 bytes AFAICS?) - or is optimal packing of the main 
> > > > > > undwarf array already guaranteed on every platform with this layout?
> > > > > 
> > > > > Ah yes, it should definitely be packed (assuming that doesn't affect performance 
> > > > > negatively).
> > > > 
> > > > So if I count that correctly that should shave another ~1MB off a typical ~4MB 
> > > > table size?
> > > 
> > > Here's what my Fedora kernel looks like *before* the packed change:
> > > 
> > >   $ eu-readelf -S vmlinux |grep undwarf
> > >   [15] .undwarf_ip          PROGBITS     ffffffff81f776d0 011776d0 0012d9d0  0 A      0   0  1
> > >   [16] .undwarf             PROGBITS     ffffffff820a50a0 012a50a0 0025b3a0  0 A      0   0  1
> > > 
> > > The total undwarf data size is ~3.5MB.
> > > 
> > > There are 308852 entries of two parallel arrays:
> > > 
> > > * .undwarf    (8 bytes/entry) = 2470816 bytes
> > > * .undwarf_ip (4 bytes/entry) = 1235408 bytes
> > > 
> > > If we pack undwarf, reducing the size of the .undwarf entries by two
> > > bytes, it will save 308852 * 2 = 617704.
> > > 
> > > So the savings will be ~600k, and the typical size will be reduced to ~3MB.
> > 
> > Just for the record, while packing the struct from 8 to 6 bytes did save 600k, 
> > it also made the unwinder ~7% slower.  I think that's probably an ok tradeoff, 
> > so I'll leave it packed in v3.
> 
> So, out of curiosity, I'm wondering where that slowdown comes from: on modern x86 
> CPUs indexing by units of 6 bytes ought to be just as fast as indexing by 8 bytes, 
> unless I'm missing something? Is it maybe the not naturally aligned 32-bit words?
> 
> Or maybe there's some bad case of a 32-bit word crossing a 64-byte cache line 
> boundary that hits some pathological aspect of the CPU? We could probably get 
> around any such problems by padding by 2 bytes on 64-byte boundaries - that's only 
> a ~3% data size increase. The flip side would be a complication of the data 
> structure and its accessors - which might cost more in terms of code generation 
> efficiency than it buys us to begin with ...
> 
> Also, there's another aspect besides RAM footprint: a large data structure that is 
> ~20% smaller means 20% less cache footprint: which for cache cold lookups might 
> matter more than the direct computational cost.

tl;dr: Packed really seems to be more like ~2% slower, time for an adult
beverage.

So I tested again with the latest version of my code, and this time
packed was 5% *faster* than unpacked, rather than 7% slower.  'perf
stat' showed that, in both cases, most of the difference was caused by
branch misses in the binary search code.  But that code doesn't even
touch the packed struct...

After some hair-pulling/hand-wringing I realized that changing the
struct packing caused GCC to change some of the unwinder code a bit,
which shifted the rest of the kernel's function offsets enough that it
changed the behavior of the unwind table binary search in a way that
affected the CPU's branch prediction.  And my crude benchmark was just
unwinding the same stack on repeat, so a small change in the loop
behavior had a big impact on the overall branch predictability.

Anyway, I used some linker magic to temporarily move the unwinder code
to the end of .text, so that unwinder changes don't add unexpected side
effects to the microbenchmark behavior.  Now I'm getting more consistent
results: the packed struct is measuring ~2% slower.  The slight slowdown
might just be explained by the fact that GCC generates some extra
instructions for extracting the fields out of the packed struct.

In the meantime, I found a ~10% speedup by making the "fast lookup
table" block size a power-of-two (256) to get rid of the need for a slow
'div' instruction.

I think I'm done performance tweaking for now.  I'll keep the packed
struct, and add the code for the 'div' removal, and hope to submit v3
soon.

-- 
Josh

  reply	other threads:[~2017-07-11  2:58 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-28 15:11 [PATCH v2 0/8] x86: undwarf unwinder Josh Poimboeuf
2017-06-28 15:11 ` [PATCH v2 1/8] objtool: move checking code to check.c Josh Poimboeuf
2017-06-30 13:12   ` [tip:core/objtool] objtool: Move " tip-bot for Josh Poimboeuf
2017-06-28 15:11 ` [PATCH v2 2/8] objtool, x86: add several functions and files to the objtool whitelist Josh Poimboeuf
2017-06-30 13:12   ` [tip:core/objtool] objtool, x86: Add " tip-bot for Josh Poimboeuf
2017-06-28 15:11 ` [PATCH v2 3/8] objtool: stack validation 2.0 Josh Poimboeuf
2017-06-30  8:32   ` Ingo Molnar
2017-06-30 13:23     ` Josh Poimboeuf
2017-06-30 13:26       ` Josh Poimboeuf
2017-06-30 14:09     ` [PATCH] objtool: silence warnings for functions which use iret Josh Poimboeuf
2017-06-30 17:49       ` [tip:core/objtool] objtool: Silence warnings for functions which use IRET tip-bot for Josh Poimboeuf
2017-06-30 13:13   ` [tip:core/objtool] objtool: Implement stack validation 2.0 tip-bot for Josh Poimboeuf
2017-06-28 15:11 ` [PATCH v2 4/8] objtool: add undwarf debuginfo generation Josh Poimboeuf
2017-06-29  7:14   ` Ingo Molnar
2017-06-29 13:40     ` Josh Poimboeuf
2017-06-29  7:25   ` Ingo Molnar
2017-06-29 14:04     ` Josh Poimboeuf
2017-06-29 14:46       ` Ingo Molnar
2017-06-29 15:06         ` Josh Poimboeuf
2017-07-06 20:36           ` Josh Poimboeuf
2017-07-07  9:44             ` Ingo Molnar
2017-07-11  2:58               ` Josh Poimboeuf [this message]
2017-07-11  8:40                 ` Ingo Molnar
2017-06-28 15:11 ` [PATCH v2 5/8] objtool, x86: add facility for asm code to provide unwind hints Josh Poimboeuf
2017-06-28 15:11 ` [PATCH v2 6/8] x86/entry: add unwind hint annotations Josh Poimboeuf
2017-06-29 17:53   ` Josh Poimboeuf
2017-06-29 18:50     ` Andy Lutomirski
2017-06-29 19:05       ` Josh Poimboeuf
2017-06-29 21:09         ` Andy Lutomirski
2017-06-29 21:41           ` Josh Poimboeuf
2017-06-29 22:59             ` Andy Lutomirski
2017-06-30  2:12               ` Josh Poimboeuf
2017-06-30  5:05                 ` Andy Lutomirski
2017-06-30  5:41                   ` Andy Lutomirski
2017-06-30 13:11                     ` Josh Poimboeuf
2017-06-30 15:44                       ` Andy Lutomirski
2017-06-30 15:55                         ` Josh Poimboeuf
2017-06-30 15:56                           ` Andy Lutomirski
2017-06-30 16:16                             ` Josh Poimboeuf
2017-06-28 15:11 ` [PATCH v2 7/8] x86/asm: add unwind hint annotations to sync_core() Josh Poimboeuf
2017-06-28 15:11 ` [PATCH v2 8/8] x86/unwind: add undwarf unwinder Josh Poimboeuf
2017-06-29  7:55 ` [PATCH v2 0/8] x86: " Ingo Molnar
2017-06-29 14:12   ` Josh Poimboeuf
2017-06-29 19:13     ` Josh Poimboeuf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170711025807.62fzfgf2dhcgqur6@treble \
    --to=jpoimboe@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jslaby@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=live-patching@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.