From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755638AbdGKIlC (ORCPT ); Tue, 11 Jul 2017 04:41:02 -0400
Received: from mail-wr0-f193.google.com ([209.85.128.193]:35561 "EHLO
	mail-wr0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755245AbdGKIlA (ORCPT ); Tue, 11 Jul 2017 04:41:00 -0400
Date: Tue, 11 Jul 2017 10:40:56 +0200
From: Ingo Molnar
To: Josh Poimboeuf
Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
	live-patching@vger.kernel.org, Linus Torvalds, Andy Lutomirski,
	Jiri Slaby, "H. Peter Anvin", Peter Zijlstra
Subject: Re: [PATCH v2 4/8] objtool: add undwarf debuginfo generation
Message-ID: <20170711084055.pfrzl5kql7coxsxn@gmail.com>
References: <20170629072512.pmkfnrgq4dci6od7@gmail.com>
	<20170629140404.qgcvxhcgm7iywrkb@treble>
	<20170629144618.vdzem7o6ib5nqab6@gmail.com>
	<20170629150652.r2dl7f3pzp6cj2i7@treble>
	<20170706203636.lcwfjsphmy2q464v@treble>
	<20170707094437.2vgosia5hjg2wsut@gmail.com>
	<20170711025807.62fzfgf2dhcgqur6@treble>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170711025807.62fzfgf2dhcgqur6@treble>
User-Agent: NeoMutt/20170113 (1.7.2)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

* Josh Poimboeuf wrote:

> Anyway, I used some linker magic to temporarily move the unwinder code to the
> end of .text, so that unwinder changes don't add unexpected side effects to the
> microbenchmark behavior. Now I'm getting more consistent results: the packed
> struct is measuring ~2% slower. The slight slowdown might just be explained by
> the fact that GCC generates some extra instructions for extracting the fields
> out of the packed struct.

Yeah, 16-bit field accesses are more complex than accesses to a zero-extended
32-bit field, even on x86, which has a fair amount of 16-bit legacy.
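
To make the trade-off above concrete, here is a minimal sketch of a wide versus
a packed entry layout. The struct and function names are hypothetical and this
is not the actual undwarf struct from the patch; `__attribute__((packed))` is
the GCC/Clang spelling. It only illustrates why narrow and bitfield members tend
to cost a few extra extend, mask, or shift instructions on each read while
taking less memory per entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wide layout: every field occupies a full 32-bit word. */
struct entry_wide {
	int32_t  cfa_offset;
	int32_t  bp_offset;
	uint32_t cfa_reg;
	uint32_t bp_reg;
};

/* Hypothetical packed layout: 16-bit offsets and two 4-bit register ids. */
struct entry_packed {
	int16_t  cfa_offset;
	int16_t  bp_offset;
	unsigned int cfa_reg : 4;
	unsigned int bp_reg  : 4;
} __attribute__((packed));

/* The whole point of packing: each entry takes less memory. */
static_assert(sizeof(struct entry_packed) < sizeof(struct entry_wide),
	      "packed entry should be smaller than the wide one");

/* Reads a full-width 32-bit field and adds it to the base address. */
static inline unsigned long cfa_from_wide(const struct entry_wide *e,
					  unsigned long base)
{
	return base + (long)e->cfa_offset;
}

/* Reads a 16-bit field, which must be sign-extended before the add. */
static inline unsigned long cfa_from_packed(const struct entry_packed *e,
					    unsigned long base)
{
	return base + (long)e->cfa_offset;
}

/* Bitfield read: the compiler has to isolate four bits of a shared byte. */
static inline unsigned int bp_reg_from_packed(const struct entry_packed *e)
{
	return e->bp_reg;
}
```

Both accessors return the same values; the difference is only in the
instructions the compiler has to emit and in the per-entry size.
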

> In the meantime, I found a ~10% speedup by making the "fast lookup table" block
> size a power-of-two (256) to get rid of the need for a slow 'div' instruction.
>
> I think I'm done performance tweaking for now. I'll keep the packed struct, and
> add the code for the 'div' removal, and hope to submit v3 soon.

Sounds good to me!

~2% slowdown for ~30% RAM savings for a debug data structure that is about as
large as a typical kernel's total .text is a decent trade-off.

Thanks,

	Ingo
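
For reference, the 'div' removal mentioned in the quoted text can be sketched as
follows. This is a hedged illustration only: the macro and function names are
hypothetical, and the real lookup code in the patch may be structured
differently. The idea is that when the block size is a compile-time power of
two, the block index becomes a shift instead of a division by a run-time value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants: 256-byte blocks, expressed as a shift count. */
#define LOOKUP_BLOCK_SHIFT	8
#define LOOKUP_BLOCK_SIZE	(1UL << LOOKUP_BLOCK_SHIFT)

static_assert((LOOKUP_BLOCK_SIZE & (LOOKUP_BLOCK_SIZE - 1)) == 0,
	      "block size must be a power of two");

/*
 * Generic version: block_size is only known at run time, so the compiler
 * generally has to emit a real divide instruction.
 */
static inline size_t block_index_div(unsigned long ip,
				     unsigned long text_start,
				     unsigned long block_size)
{
	return (ip - text_start) / block_size;
}

/* Power-of-two version: the same index computed with a single shift. */
static inline size_t block_index_shift(unsigned long ip,
				       unsigned long text_start)
{
	return (ip - text_start) >> LOOKUP_BLOCK_SHIFT;
}
```

For any address at or above `text_start`, both functions return the same index
when `block_size` equals `LOOKUP_BLOCK_SIZE`.
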