Date: Fri, 10 Apr 2015 14:30:18 +0200
From: Ingo Molnar
To: "Paul E. McKenney"
Cc: Linus Torvalds, Jason Low, Peter Zijlstra, Davidlohr Bueso,
    Tim Chen, Aswin Chandramouleeswaran, LKML, Borislav Petkov,
    Andy Lutomirski, Denys Vlasenko, Brian Gerst, "H. Peter Anvin",
    Thomas Gleixner
Subject: [PATCH] x86: Pack loops tightly as well
Message-ID: <20150410123017.GB19918@gmail.com>
In-Reply-To: <20150410121808.GA19918@gmail.com>

* Ingo Molnar wrote:

> > I realize that x86 CPU manufacturers recommend 16-byte jump target
> > alignments (it's in the Intel optimization manual), but the cost
> > of that is very significant:
> >
> >         text     data     bss      dec       filename
> >     12566391  1617840  1089536  15273767  vmlinux.align.16-byte
> >     12224951  1617840  1089536  14932327  vmlinux.align.1-byte
> >
> > By using 1-byte jump target alignment (i.e. no alignment at all)
> > we get an almost 3% reduction in kernel size (!) - and a probably
> > similar reduction in I$ footprint.
>
> Likewise we could pack functions tightly as well via the patch
> below:
>
>         text     data     bss      dec       filename
>     12566391  1617840  1089536  15273767  vmlinux.align.16-byte
>     12224951  1617840  1089536  14932327  vmlinux.align.1-byte
>     11976567  1617840  1089536  14683943  vmlinux.align.1-byte.funcs-1-byte
>
> Which brings another 2% reduction in the kernel's code size.
>
> It would be interesting to see some benchmarks with these two
> patches applied.

Only lightly tested. And the final patch below also packs loops
tightly:

        text     data     bss      dec       filename
    12566391  1617840  1089536  15273767  vmlinux.align.16-byte
    12224951  1617840  1089536  14932327  vmlinux.align.1-byte
    11976567  1617840  1089536  14683943  vmlinux.align.1-byte.funcs-1-byte
    11903735  1617840  1089536  14611111  vmlinux.align.1-byte.funcs-1-byte.loops-1-byte

The total code size reduction is about 5.3%.

Now loop alignment is beneficial if:

 - a loop is cache-hot and its surroundings are not.

Loop alignment is harmful if:

 - a loop is cache-cold
 - a loop's surroundings are cache-hot as well
 - two cache-hot loops are close to each other

and I'd argue that the latter three harmful scenarios are much more
common in the kernel. Similar arguments can be made for function
alignment as well. (Jump target alignment is a bit different, but I
think the same conclusion holds.)

(I might have missed some CPU microarchitectural detail, though,
that would make such packing undesirable.)
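For illustration, here is a minimal user-space sketch of what the
flag changes; the file name and function are made up, and the exact
padding directives vary by GCC version and target:

/*
 * align-demo.c - illustration only, not from the kernel tree.
 *
 * At -O2, GCC pads loop heads out to the CPU-recommended boundary
 * with a .p2align directive; -falign-loops=1 suppresses that
 * padding. Compare the generated assembly:
 *
 *   gcc -O2                 -S -o aligned.s  align-demo.c
 *   gcc -O2 -falign-loops=1 -S -o packed.s   align-demo.c
 *   grep -n '\.p2align' aligned.s packed.s
 *
 * aligned.s should have .p2align padding just before the loop
 * label; in packed.s that padding disappears. (The .p2align at the
 * function entry is controlled separately, by -falign-functions.)
 */
unsigned int sum(const unsigned int *p, unsigned int n)
{
	unsigned int s = 0;
	unsigned int i;

	/* the loop head below is the jump target that gets aligned: */
	for (i = 0; i < n; i++)
		s += p[i];

	return s;
}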
Thanks,

	Ingo

=============================>
From cfc2ca24908cce66b9df1f711225d461f5d59b97 Mon Sep 17 00:00:00 2001
From: Ingo Molnar
Date: Fri, 10 Apr 2015 14:20:30 +0200
Subject: [PATCH] x86: Pack loops tightly as well

Not-Signed-off-by: Ingo Molnar
---
 arch/x86/Makefile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 573d0c459f99..10989a73b986 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -83,6 +83,9 @@ else
         # Pack functions tightly as well:
         KBUILD_CFLAGS += -falign-functions=1
 
+        # Pack loops tightly as well:
+        KBUILD_CFLAGS += -falign-loops=1
+
         # Don't autogenerate traditional x87 instructions
         KBUILD_CFLAGS += $(call cc-option,-mno-80387)
         KBUILD_CFLAGS += $(call cc-option,-mno-fp-ret-in-387)
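( Side note: the text/data/bss/dec tables above are in the output
  format of the binutils 'size' utility, so the effect of each patch
  can be re-checked on the built images with something like:

      size vmlinux.align.16-byte vmlinux.align.1-byte

  where the vmlinux.* names are just the labels used above for the
  differently configured builds. )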