From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S933911AbbDJOKU (ORCPT ); Fri, 10 Apr 2015 10:10:20 -0400
Received: from e35.co.us.ibm.com ([32.97.110.153]:54414 "EHLO
        e35.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S933574AbbDJOKR (ORCPT );
        Fri, 10 Apr 2015 10:10:17 -0400
Date: Fri, 10 Apr 2015 07:10:08 -0700
From: "Paul E. McKenney"
To: Ingo Molnar
Cc: Linus Torvalds, Jason Low, Peter Zijlstra, Davidlohr Bueso,
        Tim Chen, Aswin Chandramouleeswaran, LKML, Borislav Petkov,
        Andy Lutomirski, Denys Vlasenko, Brian Gerst, "H. Peter Anvin",
        Thomas Gleixner, Peter Zijlstra, josh@joshtriplett.org
Subject: Re: [PATCH] x86: Align jump targets to 1 byte boundaries
Message-ID: <20150410141008.GX6464@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20150409175652.GI6464@linux.vnet.ibm.com>
        <20150409183926.GM6464@linux.vnet.ibm.com>
        <20150410090051.GA28549@gmail.com>
        <20150410091252.GA27630@gmail.com>
        <20150410092152.GA21332@gmail.com>
        <20150410111427.GA30477@gmail.com>
        <20150410112748.GB30477@gmail.com>
        <20150410120846.GA17101@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150410120846.GA17101@gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 15041014-0013-0000-0000-000009F261CD
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 10, 2015 at 02:08:46PM +0200, Ingo Molnar wrote:
>
> * Ingo Molnar wrote:
>
> > So restructure the loop a bit, to get much tighter code:
> >
> > 0000000000000030 :
> >   30:   55                      push   %rbp
> >   31:   65 48 8b 14 25 00 00    mov    %gs:0x0,%rdx
> >   38:   00 00
> >   3a:   48 89 e5                mov    %rsp,%rbp
> >   3d:   48 39 37                cmp    %rsi,(%rdi)
> >   40:   75 1e                   jne    60
> >   42:   8b 46 28                mov    0x28(%rsi),%eax
> >   45:   85 c0                   test   %eax,%eax
> >   47:   74 0d                   je     56
> >   49:   f3 90                   pause
> >   4b:   48 8b 82 10 c0 ff ff    mov    -0x3ff0(%rdx),%rax
> >   52:   a8 08                   test   $0x8,%al
> >   54:   74 e7                   je     3d
> >   56:   31 c0                   xor    %eax,%eax
> >   58:   5d                      pop    %rbp
> >   59:   c3                      retq
> >   5a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
> >   60:   b8 01 00 00 00          mov    $0x1,%eax
> >   65:   5d                      pop    %rbp
> >   66:   c3                      retq
>
> Btw., totally off topic, the following NOP caught my attention:
>
> >   5a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
>
> That's a dead NOP that bloats the function a bit, added for the 16 byte
> alignment of one of the jump targets.
>
> I realize that x86 CPU manufacturers recommend 16-byte jump target
> alignments (it's in the Intel optimization manual), but the cost of
> that is very significant:
>
>       text      data      bss       dec   filename
>   12566391   1617840  1089536  15273767   vmlinux.align.16-byte
>   12224951   1617840  1089536  14932327   vmlinux.align.1-byte
>
> By using 1 byte jump target alignment (i.e. no alignment at all) we
> get an almost 3% reduction in kernel size (!) - and a probably similar
> reduction in I$ footprint.
>
> So I'm wondering, is the 16 byte jump target optimization suggestion
> really worth this price? The patch below boots fine and I've not
> measured any noticeable slowdown, but I've not tried hard.

Good point, adding Josh Triplett on CC.  I suspect that he might be
interested.  ;-)

                                                        Thanx, Paul

> Now, the usual justification for jump target alignment is the
> following: with 16 byte instruction-cache cacheline sizes, if a
> forward jump is aligned to cacheline boundary then prefetches will
> start from a new cacheline.
>
> But I think that argument is flawed for typical optimized kernel code
> flows: forward jumps often go to 'cold' (uncommon) pieces of code, and
> aligning cold code to cache lines does not bring a lot of advantages
> (they are uncommon), while it causes collateral damage:
>
>  - their alignment 'spreads out' the cache footprint, it shifts
>    followup hot code further out
>
>  - plus it slows down even 'cold' code that immediately follows 'hot'
>    code (like in the above case), which could have benefited from the
>    partial cacheline that comes off the end of hot code.
>
> What do you guys think about this? I think we should seriously
> consider relaxing our alignment defaults.
>
> Thanks,
>
>       Ingo
>
> ==================================>
> From 5b83a095e1abdfee5c710c34a5785232ce74f939 Mon Sep 17 00:00:00 2001
> From: Ingo Molnar
> Date: Fri, 10 Apr 2015 13:50:05 +0200
> Subject: [PATCH] x86: Align jumps targets to 1 byte boundaries
>
> Not-Yet-Signed-off-by: Ingo Molnar
> ---
>  arch/x86/Makefile | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 5ba2d9ce82dc..0366d6b44a14 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -77,6 +77,9 @@ else
>          KBUILD_AFLAGS += -m64
>          KBUILD_CFLAGS += -m64
>
> +        # Align jump targets to 1 byte, not the default 16 bytes:
> +        KBUILD_CFLAGS += -falign-jumps=1
> +
>          # Don't autogenerate traditional x87 instructions
>          KBUILD_CFLAGS += $(call cc-option,-mno-80387)
>          KBUILD_CFLAGS += $(call cc-option,-mno-fp-ret-in-387)
>
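
For anyone who wants to double-check the "almost 3%" figure, here is a
rough sketch that redoes the arithmetic on the `size` numbers quoted
above. The byte counts are copied from the quoted table; the helper name
`reduction` and the constant names are made up for this illustration and
are not part of the patch or of any kernel tooling.

```python
# Section sizes in bytes, copied from the `size` output quoted in the thread.
TEXT_ALIGN_16 = 12566391  # text of vmlinux.align.16-byte
TEXT_ALIGN_1 = 12224951   # text of vmlinux.align.1-byte
DATA = 1617840            # data, identical in both builds
BSS = 1089536             # bss, identical in both builds


def reduction(before, after):
    """Hypothetical helper: return (bytes saved, percent saved relative to `before`)."""
    saved = before - after
    return saved, 100.0 * saved / before


# Savings measured against the text section alone.
text_saved, text_pct = reduction(TEXT_ALIGN_16, TEXT_ALIGN_1)

# Savings measured against the whole image (the "dec" column).
total_saved, total_pct = reduction(
    TEXT_ALIGN_16 + DATA + BSS,
    TEXT_ALIGN_1 + DATA + BSS,
)

print(f"text:  {text_saved} bytes saved ({text_pct:.2f}%)")
print(f"total: {total_saved} bytes saved ({total_pct:.2f}%)")
```

Only the text column differs between the two builds, so the same number
of bytes is saved either way; the percentage simply depends on whether
it is taken against text alone or against the "dec" total.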