From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752357AbbETAry (ORCPT <rfc822;w@1wt.eu>);
	Tue, 19 May 2015 20:47:54 -0400
Received: from mail-ie0-f176.google.com ([209.85.223.176]:33234 "EHLO
	mail-ie0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751589AbbETArw (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Tue, 19 May 2015 20:47:52 -0400
MIME-Version: 1.0
In-Reply-To: <20150519213820.GA31688@gmail.com>
References: <20150410121808.GA19918@gmail.com>
	<tip-4874fe1eeb40b403a8c9d0ddeb4d166cab3f37ba@git.kernel.org>
	<CA+55aFywCXk083w78cQGbRKh-ERLtE8v9PZhqxaHcyHJxSNsFQ@mail.gmail.com>
	<20150517055551.GB17002@gmail.com>
	<20150519213820.GA31688@gmail.com>
Date: Tue, 19 May 2015 17:47:51 -0700
X-Google-Sender-Auth: LQGVzAzmVy7h51OTzxD7Rzr2Vck
Message-ID: <CA+55aFwog9oCat-FQqW52SvVGyZohNZPeuWk92a2cvhjc6H38Q@mail.gmail.com>
Subject: Re: [RFC PATCH] x86/64: Optimize the effective instruction cache
 footprint of kernel functions
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>, Davidlohr Bueso <dave@stgolabs.net>,
        Peter Anvin <hpa@zytor.com>, Denys Vlasenko <dvlasenk@redhat.com>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        Tim Chen <tim.c.chen@linux.intel.com>, Borislav Petkov <bp@alien8.de>,
        Peter Zijlstra <peterz@infradead.org>,
        "Chandramouleeswaran, Aswin" <aswin@hp.com>,
        Peter Zijlstra <a.p.zijlstra@chello.nl>,
        Brian Gerst <brgerst@gmail.com>,
        Paul McKenney <paulmck@linux.vnet.ibm.com>,
        Thomas Gleixner <tglx@linutronix.de>, Jason Low <jason.low2@hp.com>,
        "linux-tip-commits@vger.kernel.org" 
	<linux-tip-commits@vger.kernel.org>,
        Arjan van de Ven <arjan@infradead.org>,
        Andrew Morton <akpm@linux-foundation.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 19, 2015 at 2:38 PM, Ingo Molnar <mingo@kernel.org> wrote:
>
> The optimal I$ miss rate is at 64 bytes - which is 9% better than the
> default kernel's I$ miss rate at 16 bytes alignment.

Ok, these numbers look reasonable (which is, of course, defined as
"meets Linus' expectations"), so I like it.

At the same time, I have to admit that I abhor a 64-byte function
alignment, when we have a fair number of functions that are (much)
smaller than that.

Is there some way to get gcc to take the size of the function into
account? Because aligning a 16-byte or 32-byte function on a 64-byte
alignment is just criminally nasty and wasteful.

Judging from your numbers, the 64-byte alignment definitely makes sense
in general, but I really think it would be much nicer if we could get
something like "align functions to their power-of-two size rounded up,
up to a maximum of 64 bytes".
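
To make that rule concrete, here is a rough sketch in shell. The
helper name suggested_align is made up for illustration, and this is
not something gcc offers today; it only spells out the arithmetic:

```shell
# Hypothetical helper (not a gcc feature): print the alignment the rule
# above would pick for a function of the given size in bytes.
suggested_align() {
    sa_size=$1
    sa_align=1
    # Double until we cover the size, but never go past 64 bytes.
    while [ "$sa_align" -lt "$sa_size" ] && [ "$sa_align" -lt 64 ]; do
        sa_align=$((sa_align * 2))
    done
    echo "$sa_align"
}

# A few example sizes.
for s in 16 24 32 40 100; do
    echo "$s -> $(suggested_align "$s")"
done
```

So a 16-byte function would stay at 16-byte alignment, a 24-byte one
would go to 32, and anything above 32 bytes would get the full 64.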

Maybe I did something wrong, but doing this:

    last=0; lastname=
    nm vmlinux | grep ' [tT] ' | sort | while read i t name
    do
        # gap since the previous symbol ~ size of the previous function
        size=$((0x$i-$last))
        [ -n "$lastname" ] && [ $size -ge 16 ] && echo $size $lastname
        last=0x$i; lastname=$name
    done | sort -n | less -S

seems to say that we have a *lot* of small functions (don't do this
with a debug build that has a lot of odd things, do it with something
you'd actually boot and run).

The above assumes the default 16-byte alignment, and gets rid of the
zero-sized ones (mainly due to system call aliases), and the ones
less than 16 bytes (obviously not aligned as-is). But you still end up
with a *lot* of functions. A lot of the really small ones are silly
setup functions etc, but there's actually a fair number of 16-byte
functions.

I seem to get ~30k functions in my defconfig vmlinux file, and about
half seem to be less than 96 bytes (that's _with_ the 16-byte
alignment). In fact, there seems to be ~5500 functions that are 32
bytes or less, of which 1850 functions are 16 bytes or less.
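
For anyone who wants to redo the tally, something along these lines
should work on a list of "size name" lines. The helper name
count_small and the sample input are made up for illustration; the
thresholds are just the ones mentioned above:

```shell
# Hypothetical helper: read "size name" lines on stdin and count how many
# entries fall at or below a few size thresholds.
count_small() {
    awk '
        { total++ }
        $1 <= 16 { le16++ }
        $1 <= 32 { le32++ }
        $1 <  96 { lt96++ }
        END { printf "total=%d le16=%d le32=%d lt96=%d\n", total, le16, le32, lt96 }
    '
}

# Made-up sample input, not real kernel data.
printf '%s\n' '16 foo' '32 bar' '48 baz' '128 qux' | count_small
```

With the four sample lines this prints "total=4 le16=1 le32=2 lt96=3".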

Aligning a 16-byte function to 64 bytes really does sound wrong, and
there's a fair number of them.  Of course, it depends on what's around
it just how much memory it wastes, but it *definitely* doesn't help I$
to round small functions up to the next cacheline too.
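
As a back-of-the-envelope check, here is the padding you get after a
function, assuming (and this is only an assumption) that the next
function is aligned to the same boundary. The helper name pad_bytes is
made up:

```shell
# Hypothetical helper: bytes of padding after a function of size $1 when
# the next function starts on the next $2-byte boundary.
pad_bytes() {
    echo $(( ($2 - $1 % $2) % $2 ))
}

echo "16-byte function, 16-byte alignment: $(pad_bytes 16 16) bytes wasted"
echo "16-byte function, 64-byte alignment: $(pad_bytes 16 64) bytes wasted"
echo "32-byte function, 64-byte alignment: $(pad_bytes 32 64) bytes wasted"
```

Under that assumption a 16-byte function on a 64-byte boundary drags
48 bytes of padding along with it, so three quarters of that cacheline
holds nothing useful.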

I dunno. I might have screwed up the above shell script badly and my
numbers may be pure garbage. But apart from the tail end that has
insanely big sizes (due to section changes or intermixed data or
something, I suspect) it doesn't look obviously wrong. So I think it
might be a reasonable approximation.

We'd need toolchain help to do saner alignment.

                         Linus