LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Jiri Slaby <jslaby@suse.cz>
Cc: Ingo Molnar <mingo@kernel.org>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	live-patching@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andy Lutomirski <luto@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [RFC PATCH 00/10] x86: undwarf unwinder
Date: Fri, 2 Jun 2017 11:40:48 +0100
Message-ID: <20170602104048.jkkzssljsompjdwy@suse.de> (raw)
In-Reply-To: <3db1be2a-cc33-89f3-950f-cfe1c21ee7f1@suse.cz>

On Thu, Jun 01, 2017 at 04:08:25PM +0200, Jiri Slaby wrote:
> Ccing Mel who did proper measurements and can hopefully comment on his
> results.
> 
> On 06/01/2017, 03:50 PM, Ingo Molnar wrote:
> > That's not what I meant! The speedup comes from (hopefully) being able to disable 
> > CONFIG_FRAME_POINTER, which:
> > 
> >  - creates simpler/faster function prologues and epilogues - no managing of RBP 
> >    needed
> > 
> >  - gives one more generic purpose register to work from. This matters less on 
> >    64-bit kernels but it's a small effect.
> > 
> > I've seen numbers of 1-2% of instruction count reduction in common kernel 
> > workloads, which would be pretty significant on well cached workloads.
> 

I didn't preserve the data involved but in a variety of workloads including
netperf, page allocator microbenchmark, pgbench and sqlite, enabling
framepointer introduced overhead of around the 5-10% mark. According
to an internal report I gave at the time, hackbench-thread-sockets was
around the 5% mark and a perf run showed "3.49% more cache misses with
framepointer enabled and 6.59% more cycles". Additional notes I made at
the time although again, without the original data is

---8<---
It looks like a small amount of overhead added everywhere and the size of
the vmlinux files supports that

   text    data     bss     dec     hex   filename
8143072 6480614 11153408 25777094 18953c6 vmlinux/decker/vmlinux-4.8.0-disable-fp
8396698 6480614 11153408 26030720 18d3280 vmlinux/decker/vmlinux-4.8.0-enable-fp

I also took a closer look at the pagealloc microbenchmarks because they
rely on so few functions. Profiles were not always captured due to the
short-lived nature of some of the tests so I looked at batches of 16384
allocation/frees of order-0 pages. Overall it showed 4.46% decline with
framepointer enabled and profiling. 3.89% more cycles and 24.94% more
cache misses.

As before, the framepointer cache miss overhead is not that obvious as
the bulk of samples take place elsewhere -- in this case, in checking
whether pages are buddies when merging. It's slightly clearer in
__rmqueue where 17.9% of cache misses are in the function entry point
with framepointer enabled vs 4.04% with framepointer disabled.
---8<---

Granted, the check was done back in 4.8, but I've no reason to believe
that 4.12 is any different and enabling framepointer does have a quite
substantial hit to workloads that spent a lot of time in the kernel.

-- 
Mel Gorman
SUSE Labs

      reply index

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-01  5:44 Josh Poimboeuf
2017-06-01  5:44 ` [RFC PATCH 01/10] objtool: move checking code to check.c Josh Poimboeuf
2017-06-14  7:22   ` Jiri Slaby
2017-06-01  5:44 ` [RFC PATCH 02/10] objtool, x86: add several functions and files to the objtool whitelist Josh Poimboeuf
2017-06-14  7:24   ` Jiri Slaby
2017-06-14 13:03     ` Josh Poimboeuf
2017-06-01  5:44 ` [RFC PATCH 03/10] objtool: stack validation 2.0 Josh Poimboeuf
2017-06-01  5:44 ` [RFC PATCH 04/10] objtool: add undwarf debuginfo generation Josh Poimboeuf
2017-06-14  8:42   ` Jiri Slaby
2017-06-14 13:27     ` Josh Poimboeuf
2017-06-22  7:47       ` Jiri Slaby
2017-06-22 12:49         ` Josh Poimboeuf
2017-06-01  5:44 ` [RFC PATCH 05/10] objtool, x86: add facility for asm code to provide CFI hints Josh Poimboeuf
2017-06-01 13:57   ` Andy Lutomirski
2017-06-01 14:16     ` Josh Poimboeuf
2017-06-01 14:40       ` Andy Lutomirski
2017-06-01 15:02         ` Josh Poimboeuf
2017-06-01  5:44 ` [RFC PATCH 06/10] x86/entry: add CFI hint undwarf annotations Josh Poimboeuf
2017-06-01 14:03   ` Andy Lutomirski
2017-06-01 14:23     ` Josh Poimboeuf
2017-06-01 14:28       ` Josh Poimboeuf
2017-06-01 14:39         ` Andy Lutomirski
2017-06-01 15:01           ` Josh Poimboeuf
2017-06-01  5:44 ` [RFC PATCH 07/10] x86/asm: add CFI hint annotations to sync_core() Josh Poimboeuf
2017-06-01  5:44 ` [RFC PATCH 08/10] extable: rename 'sortextable' script to 'sorttable' Josh Poimboeuf
2017-06-01  5:44 ` [RFC PATCH 09/10] extable: add undwarf table sorting ability to sorttable script Josh Poimboeuf
2017-06-01  5:44 ` [RFC PATCH 10/10] x86/unwind: add undwarf unwinder Josh Poimboeuf
2017-06-01 11:05   ` Peter Zijlstra
2017-06-01 12:26     ` Josh Poimboeuf
2017-06-01 12:47       ` Jiri Slaby
2017-06-01 13:02         ` Josh Poimboeuf
2017-06-01 13:42         ` Peter Zijlstra
2017-06-01 13:10       ` Peter Zijlstra
2017-06-01 12:13   ` Peter Zijlstra
2017-06-01 12:36     ` Josh Poimboeuf
2017-06-01 13:12       ` Peter Zijlstra
2017-06-01 15:03         ` Josh Poimboeuf
2017-06-14 11:45   ` Jiri Slaby
2017-06-14 13:44     ` Josh Poimboeuf
2017-06-01  6:08 ` [RFC PATCH 00/10] x86: " Ingo Molnar
2017-06-01 11:58   ` Josh Poimboeuf
2017-06-01 12:17     ` Peter Zijlstra
2017-06-01 12:33       ` Jiri Slaby
2017-06-01 12:52         ` Josh Poimboeuf
2017-06-01 12:57           ` Jiri Slaby
2017-06-01 12:47       ` Josh Poimboeuf
2017-06-01 13:25         ` Peter Zijlstra
2017-06-06 14:14           ` Sergey Senozhatsky
2017-06-01 13:50         ` Andy Lutomirski
2017-06-01 13:50     ` Ingo Molnar
2017-06-01 13:58       ` Jiri Slaby
2017-06-02  8:30         ` Jiri Slaby
2017-06-01 14:05       ` Josh Poimboeuf
2017-06-01 14:08       ` Jiri Slaby
2017-06-02 10:40         ` Mel Gorman [this message]

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170602104048.jkkzssljsompjdwy@suse.de \
    --to=mgorman@suse.de \
    --cc=hpa@zytor.com \
    --cc=jpoimboe@redhat.com \
    --cc=jslaby@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=live-patching@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git