From: Denys Vlasenko
Date: Fri, 15 May 2015 22:52:43 +0200
To: Linus Torvalds, Andy Lutomirski, Davidlohr Bueso, Peter Anvin,
    Linux Kernel Mailing List, Tim Chen, Borislav Petkov, Peter Zijlstra,
    "Chandramouleeswaran, Aswin", Brian Gerst, Paul McKenney,
    Thomas Gleixner, Ingo Molnar, Jason Low
Cc: linux-tip-commits@vger.kernel.org
Subject: Re: [tip:x86/asm] x86: Pack function addresses tightly as well

On 05/15/2015 08:36 PM, Linus Torvalds wrote:
> On Fri, May 15, 2015 at 2:39 AM, tip-bot for Ingo Molnar wrote:
>>
>> We can pack function addresses tightly as well:
>
> So I really want to see performance numbers on a few
> microarchitectures for this one in particular.
>
> The kernel generally doesn't have loops (well, not the kinds of
> high-rep loops that tend to be worth aligning), and I think the
> general branch/loop alignment is likely fine. But the function
> alignment doesn't tend to have the same kind of I$ advantages, it's
> more likely purely a size issue and not as interesting. Function
> targets are also more likely to be not in the cache, I suspect, since
> you don't have a loop priming it or a short forward jump that just got
> the cacheline anyway. And then *not* aligning the function would
> actually tend to make it *less* dense in the I$.

How about taking an intermediate step and using -falign-functions=6?
That means "align to 8 if it requires skipping less than 6 bytes".

Why less than 6? Because with CONFIG_FTRACE=y, every function starts
with a 5-byte instruction (the "call ftrace", replaced by a 5-byte NOP),
and we want at least that first insn to be fetched and decoded in one go.
Without CONFIG_FTRACE it's not as clear-cut, but typical x86 insns are
5 bytes or shorter, so most functions would still start executing
reasonably quickly, at the cost of only about 2.5 bytes of padding on
average.

I'd prefer "align to 16 if it requires skipping less than 6 bytes",
because aligning to an 8-byte boundary that is not also a 16-byte
boundary doesn't buy anything on modern CPUs (it can in fact hurt a
bit), but alas, gcc's option format doesn't allow expressing that.

If you don't like the 8-byte alignment, the smallest value which aligns
to 16 bytes is -falign-functions=9: it means "align to 16 if it requires
skipping less than 9 bytes". That is still significantly better than
blindly padding to 16 even when we are only a few bytes past the start
of a cacheline (0x1231 -> 0x1240).

One last thing: if CONFIG_CC_OPTIMIZE_FOR_SIZE=y, we probably shouldn't
do any alignment at all.
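
For concreteness, a rough sketch of how this could be wired up in the
Makefile -- not a patch, just an illustration; cc-option and the config
symbols are the usual kbuild ones, but the exact placement of the hunk
is hypothetical:

	# Sketch only: pad function starts to an 8-byte boundary when it
	# costs fewer than 6 bytes of padding; for -Os builds add nothing.
	ifndef CONFIG_CC_OPTIMIZE_FOR_SIZE
	KBUILD_CFLAGS += $(call cc-option,-falign-functions=6)
	endif

Substituting -falign-functions=9 in the same hunk gives the 16-byte
variant; the resulting function start addresses are easy to eyeball
with nm or objdump on a built vmlinux.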