Date: Fri, 15 May 2015 11:36:47 -0700
Subject: Re: [tip:x86/asm] x86: Pack function addresses tightly as well
From: Linus Torvalds
To: Andy Lutomirski , Davidlohr Bueso , Peter Anvin , Denys Vlasenko , Linux Kernel Mailing List , Tim Chen , Borislav Petkov , Peter Zijlstra , "Chandramouleeswaran, Aswin" , Linus Torvalds , Peter Zijlstra , Brian Gerst , Paul McKenney , Thomas Gleixner , Ingo Molnar , Jason Low
Cc: "linux-tip-commits@vger.kernel.org"
References: <20150410121808.GA19918@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 15, 2015 at 2:39 AM, tip-bot for Ingo Molnar wrote:
>
> We can pack function addresses tightly as well:

So I really want to see performance numbers on a few microarchitectures for this one in particular.

The kernel generally doesn't have loops (well, not the kinds of high-rep loops that tend to be worth aligning), and I think the general branch/loop alignment is likely fine.

But the function alignment doesn't tend to have the same kind of I$ advantages, it's more likely purely a size issue and not as interesting. Function targets are also more likely to be not in the cache, I suspect, since you don't have a loop priming it or a short forward jump that just got the cacheline anyway. And then *not* aligning the function would actually tend to make it *less* dense in the I$.

Put another way: I suspect this is more likely to hurt, and less likely to help than the others.

Size matters, but size matters mainly from an I$ standpoint, not from some absolute "big is bad" issue. Also, even when size matters, performance matters too. I do want performance numbers. Is this measurable?

Linus
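
A rough sketch of the I$ density point above, under stated assumptions: a 64-byte cache line is assumed by the example numbers, and `lines_touched` and `align_up` are hypothetical helpers, not kernel functions. It only counts how many cache lines a function of a given size covers for a given start address; it says nothing about real performance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of cache lines covered by a function that
 * starts at address `start` and is `size` bytes long, for a cache line of
 * `line_size` bytes.
 */
static size_t lines_touched(uintptr_t start, size_t size, size_t line_size)
{
    assert(line_size != 0 && size != 0);

    uintptr_t first = start / line_size;               /* line holding the first byte */
    uintptr_t last = (start + size - 1) / line_size;   /* line holding the last byte */

    return (size_t)(last - first + 1);
}

/* Round `addr` up to the next multiple of `align` (must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    return (addr + align - 1) & ~(align - 1);
}
```

For example, with 64-byte lines a 40-byte function packed at 0x1030 straddles two lines, while the same function started at `align_up(0x1030, 64)` fits in one. Whether that difference is measurable is exactly the open question.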