From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755170AbbEUHwh (ORCPT ); Thu, 21 May 2015 03:52:37 -0400
Received: from mail-wg0-f50.google.com ([74.125.82.50]:36359 "EHLO
	mail-wg0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755084AbbEUHwe (ORCPT );
	Thu, 21 May 2015 03:52:34 -0400
Date: Thu, 21 May 2015 09:52:28 +0200
From: Ingo Molnar
To: Linus Torvalds
Cc: Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	"H. Peter Anvin", Michal Marek, Peter Zijlstra, X86 ML,
	live-patching@vger.kernel.org, "linux-kernel@vger.kernel.org",
	Andy Lutomirski, Denys Vlasenko, Brian Gerst, Peter Zijlstra,
	Borislav Petkov, Andrew Morton
Subject: Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
Message-ID: <20150521075228.GA20782@gmail.com>
References: <20150520103339.GA22205@gmail.com>
	<20150520141331.GA16995@treble.redhat.com>
	<20150520144810.GA10374@gmail.com>
	<20150520162537.GD16995@treble.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

* Linus Torvalds wrote:

> On Wed, May 20, 2015 at 9:25 AM, Josh Poimboeuf wrote:
> > On Wed, May 20, 2015 at 09:03:37AM -0700, Andy Lutomirski wrote:
> >>
> >> I've never quite understood what the '?' means.
> >
> > It basically means "here's a function address we found on the
> > stack, which may or may not have been called."  It's needed
> > because stack walking isn't currently 100% reliable.
>
> It is often quite interesting and helpful, because it shows stale
> data on the stack, giving clues about what happened just before.

Yes, it's basically a zero-cost tracer: often showing a partial trace
of events that happened before.

> Now, I'd like gcc to generally be better about not wasting so much
> stack frame, so in that sense I'd like to see fewer '?" entries just
> from a code quality standpoint, but when debugging those things, the
> downside of "noise" is often cancelled by the upside of "ahh, it
> happens after calling X".
>
> So the "perfect stack frames" is actually not as great a thing as
> some people want to make it seem.

We should definitely also print out the '?' entries, they are very
useful especially when analyzing rare, difficult to reproduce,
sporadic bugs - which are usually the hardest bugs to fix.

The biggest long term plus of 'perfect stack frames' would not be to
skip the '?' entries (we don't want to skip them!), but to be able to
eventually build the kernel without frame pointers.

Especially on modern x86 CPUs with stack engines (latest Intel and AMD
CPUs) that keep ESP updates out of the later stages of execution
pipelines, going from RBP framepointers to direct ESP use is
beneficial to performance and compresses I$ footprint as well:

      text     data      bss       dec      hex  filename
  12150606  2565544  1634304  16350454   f97cf6  linux-CONFIG_FRAME_POINTERS=n/vmlinux
  13282884  2571744  1617920  17472548  10a9c24  linux-CONFIG_FRAME_POINTERS=y/vmlinux

Here's the I$ cachemiss rate for CONFIG_FRAMEPOINTERS=y, measured with
the 'vfs-mix' workload that I used in the -falign-functions
measurements, on Intel Sandy Bridge (best of 9x10 runs):

 #
 # CONFIG_FRAMEPOINTERS=y
 #
 Performance counter stats for 'system wide' (10 runs):

       728,328,347      L1-icache-load-misses      ( +-  0.08% )  (100.00%)
    11,891,931,664      instructions               ( +-  0.00% )
           300,023      context-switches           ( +-  0.00% )

       7.324048170 seconds time elapsed            ( +-  0.09% )

...

and these are the I$ miss perf stats from running the same workload on
a CONFIG_FRAMEPOINTERS=n kernel:

 #
 # CONFIG_FRAMEPOINTERS is not set
 #
 Performance counter stats for 'system wide' (10 runs):

       687,758,078      L1-icache-load-misses      ( +-  0.10% )  (100.00%)
    10,984,908,013      instructions               ( +-  0.01% )
           300,021      context-switches           ( +-  0.00% )

       7.120867260 seconds time elapsed            ( +-  0.29% )

So if we disable frame pointers, then on this workload:

  - the kernel text size is 9.3% smaller
  - the number of instructions executed went down by about 8.2%
  - the cachemiss rate went down by about 5.9%
  - performance went up by about 2.8%.

The speedup is actually even better than 2.8%, if you look at average
execution time:

 linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.324048170 seconds time elapsed    ( +-  0.09% )
 linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.470166715 seconds time elapsed    ( +-  1.01% )
 linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.365047474 seconds time elapsed    ( +-  0.25% )
 linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.828223324 seconds time elapsed    ( +-  2.04% )
 linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.427164489 seconds time elapsed    ( +-  0.70% )
 linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.385565350 seconds time elapsed    ( +-  0.35% )
 linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.560782318 seconds time elapsed    ( +-  1.68% )
 linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.399741309 seconds time elapsed    ( +-  0.74% )
 linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.303746766 seconds time elapsed    ( +-  0.04% )

 avg = 7.451609

 linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.201498813 seconds time elapsed    ( +-  0.86% )
 linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.120867260 seconds time elapsed    ( +-  0.29% )
 linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.141642635 seconds time elapsed    ( +-  0.15% )
 linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.217213506 seconds time elapsed    ( +-  0.85% )
 linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.163046581 seconds time elapsed    ( +-  0.56% )
 linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.128939439 seconds time elapsed    ( +-  0.23% )
 linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.256172853 seconds time elapsed    ( +-  0.82% )
 linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.122946768 seconds time elapsed    ( +-  0.23% )
 linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.126018578 seconds time elapsed    ( +-  0.18% )

 avg = 7.164260

So with framepointers disabled this workload gets faster by 4.0% on
average.

The average result is also pretty stable in the no-framepointers case,
while it fluctuates more in the framepointers case. (and this is why
the 'best runtime' favors the framepointers case - the average is
closer to reality.)

So the performance advantages of not doing framepointers are not
something we can ignore IMHO: but obviously performance isn't
everything - so if stack unwinding is not robust, then we need and
want frame pointers.

Thanks,

	Ingo
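
For anyone who wants to re-derive the percentages quoted above, here is
a minimal sketch. It only re-does the arithmetic on the numbers already
shown in this mail; it is not the script used for the measurements, and
every variable and function name in it is made up for illustration. It
assumes each percentage is taken relative to the frame-pointer-less
kernel, which is the baseline that reproduces the figures above
(relative to the frame-pointer kernel the text size reduction would be
about 8.5% instead of 9.3%).

```python
# Figures copied from the size and perf stat output quoted in this mail.
# All variable and function names below are illustrative only.

TEXT_FP, TEXT_NOFP = 13282884, 12150606          # vmlinux text size, bytes
INSN_FP, INSN_NOFP = 11891931664, 10984908013    # instructions executed
MISS_FP, MISS_NOFP = 728328347, 687758078        # L1-icache-load-misses
BEST_FP, BEST_NOFP = 7.324048170, 7.120867260    # elapsed seconds from the two perf stat blocks

# Elapsed seconds from the nine res.txt lines per configuration.
TIMES_FP = [7.324048170, 7.470166715, 7.365047474, 7.828223324, 7.427164489,
            7.385565350, 7.560782318, 7.399741309, 7.303746766]
TIMES_NOFP = [7.201498813, 7.120867260, 7.141642635, 7.217213506, 7.163046581,
              7.128939439, 7.256172853, 7.122946768, 7.126018578]


def pct_over(with_fp, without_fp):
    """Percent by which the frame-pointer value exceeds the no-frame-pointer one."""
    return (with_fp / without_fp - 1.0) * 100.0


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


avg_fp = mean(TIMES_FP)
avg_nofp = mean(TIMES_NOFP)

text_pct = pct_over(TEXT_FP, TEXT_NOFP)
insn_pct = pct_over(INSN_FP, INSN_NOFP)
miss_pct = pct_over(MISS_FP, MISS_NOFP)
best_pct = pct_over(BEST_FP, BEST_NOFP)
avg_pct = pct_over(avg_fp, avg_nofp)

print(f"text size:        {text_pct:.2f}%")
print(f"instructions:     {insn_pct:.2f}%")
print(f"I$ misses:        {miss_pct:.2f}%")
print(f"quoted best time: {best_pct:.2f}%")
print(f"average time:     {avg_pct:.2f}%  (avg {avg_fp:.6f}s vs {avg_nofp:.6f}s)")
```

Using the no-frame-pointer kernel as the denominator treats it as the
reference and reports how much the frame-pointer build costs on top of
it; picking the other denominator gives slightly smaller percentages
for the same data.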