From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756279AbbEUN23 (ORCPT <rfc822;w@1wt.eu>);
	Thu, 21 May 2015 09:28:29 -0400
Received: from mail-wi0-f182.google.com ([209.85.212.182]:33342 "EHLO
	mail-wi0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756256AbbEUN2Z (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Thu, 21 May 2015 09:28:25 -0400
Date: Thu, 21 May 2015 15:28:18 +0200
From: Ingo Molnar <mingo@kernel.org>
To: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
        Andy Lutomirski <luto@amacapital.net>,
        Davidlohr Bueso <dave@stgolabs.net>, Peter Anvin <hpa@zytor.com>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        Tim Chen <tim.c.chen@linux.intel.com>, Borislav Petkov <bp@alien8.de>,
        Peter Zijlstra <peterz@infradead.org>,
        "Chandramouleeswaran, Aswin" <aswin@hp.com>,
        Peter Zijlstra <a.p.zijlstra@chello.nl>,
        Brian Gerst <brgerst@gmail.com>,
        Paul McKenney <paulmck@linux.vnet.ibm.com>,
        Thomas Gleixner <tglx@linutronix.de>, Jason Low <jason.low2@hp.com>,
        "linux-tip-commits@vger.kernel.org" 
	<linux-tip-commits@vger.kernel.org>,
        Arjan van de Ven <arjan@infradead.org>,
        Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH] x86/64: Optimize the effective instruction cache
 footprint of kernel functions
Message-ID: <20150521132818.GA544@gmail.com>
References: <20150410121808.GA19918@gmail.com>
 <tip-4874fe1eeb40b403a8c9d0ddeb4d166cab3f37ba@git.kernel.org>
 <CA+55aFywCXk083w78cQGbRKh-ERLtE8v9PZhqxaHcyHJxSNsFQ@mail.gmail.com>
 <20150517055551.GB17002@gmail.com>
 <20150519213820.GA31688@gmail.com>
 <555C7012.3040806@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <555C7012.3040806@redhat.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


* Denys Vlasenko <dvlasenk@redhat.com> wrote:

> Can you post your .config for the test?
> If you have CONFIG_OPTIMIZE_INLINING=y in your -Os test,
> consider re-testing with it turned off.

Yes, I had CONFIG_OPTIMIZE_INLINING=y.

With that turned off, on GCC 4.9.2, I'm seeing:

 fomalhaut:~/linux/linux-____CC_OPTIMIZE_FOR_SIZE=y> size vmlinux.OPTIMIZE_INLINING\=*
     text           data     bss      dec            hex filename
 12150606        2565544 1634304 16350454         f97cf6 vmlinux.OPTIMIZE_INLINING=y
 12354814        2572520 1634304 16561638         fcb5e6 vmlinux.OPTIMIZE_INLINING=n

I.e. forcing the inlining increases the kernel's text size again, by 
about 1.7%.
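
For anyone wanting to double check the percentage, here is a minimal 
sketch of the arithmetic in Python. The numbers are copied from the 
'size' output above; the helper name pct_increase() is made up for 
this illustration only:

```python
# Values copied from the `size vmlinux.*` output above.
text_y = 12150606   # text, OPTIMIZE_INLINING=y
text_n = 12354814   # text, OPTIMIZE_INLINING=n
dec_y = 16350454    # text+data+bss, OPTIMIZE_INLINING=y
dec_n = 16561638    # text+data+bss, OPTIMIZE_INLINING=n


def pct_increase(old, new):
    """Relative growth of `new` over `old`, in percent (hypothetical helper)."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    print(f"text : +{pct_increase(text_y, text_n):.2f}%")
    print(f"total: +{pct_increase(dec_y, dec_n):.2f}%")
```

The text column grows by roughly 1.7%; measured on the 'dec' total the 
growth is smaller (roughly 1.3%), because data grows far less and bss 
is unchanged.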

I re-ran the tests on the Intel system, and got these I$ miss rates:

linux-falign-functions=_64-bytes:                  647,853,942      L1-icache-load-misses                                         ( +-  0.07% )  (100.00%)
linux-falign-functions=_16-bytes:                  706,080,917      L1-icache-load-misses                                         ( +-  0.05% )  (100.00%)
linux-CC_OPTIMIZE_FOR_SIZE=y+OPTIMIZE_INLINING=y:  921,910,808      L1-icache-load-misses                                         ( +-  0.05% )  (100.00%)
linux-CC_OPTIMIZE_FOR_SIZE=y+OPTIMIZE_INLINING=n:  792,395,265      L1-icache-load-misses                                         ( +-  0.05% )  (100.00%)

So yeah, it got better - but the I$ miss rate is still 22.3% higher 
than that of the 64-byte aligned kernel and 12.2% higher than the 
vanilla kernel.
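
The comparison can be reproduced with a few lines of Python. This is 
only a sketch: the dictionary keys are shorthand labels invented here, 
and treating the 16-byte aligned kernel as the "vanilla" baseline is 
an assumption inferred from the numbers above:

```python
# L1-icache-load-misses copied from the perf output above.
misses = {
    "align64": 647_853_942,        # -falign-functions=64
    "align16": 706_080_917,        # -falign-functions=16 (assumed vanilla)
    "os_inlining_y": 921_910_808,  # -Os, OPTIMIZE_INLINING=y
    "os_inlining_n": 792_395_265,  # -Os, OPTIMIZE_INLINING=n
}


def pct_over(name, baseline):
    """How many percent more misses `name` has than `baseline`."""
    return (misses[name] / misses[baseline] - 1.0) * 100.0


if __name__ == "__main__":
    for base in ("align64", "align16"):
        print(f"os_inlining_n vs {base}: +{pct_over('os_inlining_n', base):.1f}%")
```

With OPTIMIZE_INLINING=n the -Os kernel still takes roughly 22% more 
I$ misses than the 64-byte aligned kernel and roughly 12% more than 
the 16-byte aligned one.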

Elapsed time had this original OPTIMIZE_FOR_SIZE result:

       8.531418784 seconds time elapsed                                          ( +-  0.19% )

this now improved to:

       7.686174880 seconds time elapsed                                          ( +-  0.18% )

but it's still much worse than the 64-byte aligned one:

       7.154816369 seconds time elapsed                                          ( +-  0.03% )

and the 16-byte aligned one:

       7.333597250 seconds time elapsed                                          ( +-  0.48% )

> You may be seeing this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

Yeah, disabling OPTIMIZE_INLINING made a difference - but it didn't 
recover the performance loss, -Os is still 4.8% slower in this 
workload than the vanilla kernel.
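
A rough way to check that these gaps are larger than the run-to-run 
noise is sketched below in Python. The labels are invented for this 
illustration, the 16-byte aligned kernel is assumed to be the vanilla 
baseline, and simply adding the two "+-" percentages is a deliberately 
crude noise bound, not a proper significance test:

```python
# (elapsed seconds, "+-" percentage) copied from the results above.
runs = {
    "os_inlining_y": (8.531418784, 0.19),  # original -Os result
    "os_inlining_n": (7.686174880, 0.18),  # -Os, OPTIMIZE_INLINING=n
    "align64": (7.154816369, 0.03),        # 64-byte aligned functions
    "align16": (7.333597250, 0.48),        # 16-byte aligned (assumed vanilla)
}


def slowdown_pct(name, baseline):
    """Percent by which `name` is slower than `baseline`."""
    return (runs[name][0] / runs[baseline][0] - 1.0) * 100.0


def exceeds_noise(name, baseline):
    """Crude check: is the gap larger than both noise figures combined?"""
    noise = runs[name][1] + runs[baseline][1]
    return abs(slowdown_pct(name, baseline)) > noise


if __name__ == "__main__":
    for base in ("align16", "align64"):
        gap = slowdown_pct("os_inlining_n", base)
        print(f"os_inlining_n vs {base}: {gap:+.1f}% "
              f"(beyond noise: {exceeds_noise('os_inlining_n', base)})")
```

By this measure the -Os kernel's 4.8% slowdown against the 16-byte 
aligned kernel is well outside the combined noise of the two 
measurements.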

Thanks,

	Ingo