Date: Fri, 2 Jun 2017 11:40:48 +0100
From: Mel Gorman
To: Jiri Slaby
Cc: Ingo Molnar, Josh Poimboeuf, x86@kernel.org, linux-kernel@vger.kernel.org,
    live-patching@vger.kernel.org, Linus Torvalds, Andy Lutomirski,
    "H. Peter Anvin", Peter Zijlstra
Subject: Re: [RFC PATCH 00/10] x86: undwarf unwinder
Message-ID: <20170602104048.jkkzssljsompjdwy@suse.de>

On Thu, Jun 01, 2017 at 04:08:25PM +0200, Jiri Slaby wrote:
> Ccing Mel who did proper measurements and can hopefully comment on his
> results.
>
> On 06/01/2017, 03:50 PM, Ingo Molnar wrote:
> > That's not what I meant! The speedup comes from (hopefully) being able to disable
> > CONFIG_FRAME_POINTER, which:
> >
> >  - creates simpler/faster function prologues and epilogues - no managing of RBP
> >    needed
> >
> >  - gives one more generic purpose register to work from. This matters less on
> >    64-bit kernels but it's a small effect.
> >
> > I've seen numbers of 1-2% of instruction count reduction in common kernel
> > workloads, which would be pretty significant on well cached workloads.
>

I didn't preserve the data involved, but in a variety of workloads including
netperf, a page allocator microbenchmark, pgbench and sqlite, enabling
framepointer introduced overhead of around the 5-10% mark. According to an
internal report I gave at the time, hackbench-thread-sockets was around the 5%
mark and a perf run showed "3.49% more cache misses with framepointer enabled
and 6.59% more cycles". Additional notes I made at the time, although again
without the original data, are:

---8<---
It looks like a small amount of overhead added everywhere, and the size of
the vmlinux files supports that:

   text    data      bss      dec     hex filename
8143072 6480614 11153408 25777094 18953c6 vmlinux/decker/vmlinux-4.8.0-disable-fp
8396698 6480614 11153408 26030720 18d3280 vmlinux/decker/vmlinux-4.8.0-enable-fp

I also took a closer look at the pagealloc microbenchmarks because they rely
on so few functions. Profiles were not always captured due to the short-lived
nature of some of the tests, so I looked at batches of 16384 allocations/frees
of order-0 pages. Overall it showed a 4.46% decline with framepointer enabled,
and profiling showed 3.89% more cycles and 24.94% more cache misses. As before,
the framepointer cache miss overhead is not that obvious, as the bulk of
samples take place elsewhere -- in this case, in checking whether pages are
buddies when merging. It's slightly clearer in __rmqueue, where 17.9% of cache
misses are in the function entry point with framepointer enabled vs 4.04% with
framepointer disabled.
---8<---

Granted, the check was done back in 4.8, but I've no reason to believe that
4.12 is any different, and enabling framepointer does have quite a substantial
hit on workloads that spend a lot of time in the kernel.

-- 
Mel Gorman
SUSE Labs
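The size(1) output quoted in the notes above can be checked by hand. In size's default (Berkeley) format, "dec" is text + data + bss and "hex" is the same total in hexadecimal. Because data and bss are identical in both builds, the entire difference between the two kernels lies in text. The small Python sketch below recomputes those columns and the relative text growth from the two vmlinux lines. The helper names are illustrative only and do not come from any kernel tool.

```python
# Sketch: recompute the derived size(1) columns for the two 4.8 vmlinux builds
# and the relative growth of .text when frame pointers are enabled.
# Helper names are hypothetical; the numbers are copied from the quoted output.

def size_totals(text: int, data: int, bss: int) -> tuple[int, str]:
    """Return (dec, hex) as size(1) reports them in Berkeley format."""
    total = text + data + bss
    return total, format(total, "x")


def pct_growth(before: int, after: int) -> float:
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (text, data, bss) for each build
DISABLE_FP = (8143072, 6480614, 11153408)
ENABLE_FP = (8396698, 6480614, 11153408)

if __name__ == "__main__":
    for name, sizes in (("disable-fp", DISABLE_FP), ("enable-fp", ENABLE_FP)):
        dec, hx = size_totals(*sizes)
        print(f"{name}: dec={dec} hex={hx}")
    extra = ENABLE_FP[0] - DISABLE_FP[0]
    growth = pct_growth(DISABLE_FP[0], ENABLE_FP[0])
    print(f"text grows by {extra} bytes ({growth:.2f}%)")
```

The sketch shows that the framepointer build's text section is 253626 bytes larger, roughly 3.1%. This figure is only static code size. It is consistent with "a small amount of overhead added everywhere", but it does not by itself measure the runtime cost reported in the benchmarks.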