From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754950AbbDJMJO (ORCPT );
	Fri, 10 Apr 2015 08:09:14 -0400
Received: from mail-wi0-f177.google.com ([209.85.212.177]:35518 "EHLO
	mail-wi0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754570AbbDJMJK (ORCPT );
	Fri, 10 Apr 2015 08:09:10 -0400
Date: Fri, 10 Apr 2015 14:08:46 +0200
From: Ingo Molnar
To: "Paul E. McKenney"
Cc: Linus Torvalds, Jason Low, Peter Zijlstra, Davidlohr Bueso, Tim Chen,
	Aswin Chandramouleeswaran, LKML, Borislav Petkov, Andy Lutomirski,
	Denys Vlasenko, Brian Gerst, "H. Peter Anvin", Thomas Gleixner,
	Peter Zijlstra
Subject: [PATCH] x86: Align jump targets to 1 byte boundaries
Message-ID: <20150410120846.GA17101@gmail.com>
References: <20150409175652.GI6464@linux.vnet.ibm.com>
	<20150409183926.GM6464@linux.vnet.ibm.com>
	<20150410090051.GA28549@gmail.com>
	<20150410091252.GA27630@gmail.com>
	<20150410092152.GA21332@gmail.com>
	<20150410111427.GA30477@gmail.com>
	<20150410112748.GB30477@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150410112748.GB30477@gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

* Ingo Molnar wrote:

> So restructure the loop a bit, to get much tighter code:
>
> 0000000000000030 :
>   30:  55                      push   %rbp
>   31:  65 48 8b 14 25 00 00    mov    %gs:0x0,%rdx
>   38:  00 00
>   3a:  48 89 e5                mov    %rsp,%rbp
>   3d:  48 39 37                cmp    %rsi,(%rdi)
>   40:  75 1e                   jne    60
>   42:  8b 46 28                mov    0x28(%rsi),%eax
>   45:  85 c0                   test   %eax,%eax
>   47:  74 0d                   je     56
>   49:  f3 90                   pause
>   4b:  48 8b 82 10 c0 ff ff    mov    -0x3ff0(%rdx),%rax
>   52:  a8 08                   test   $0x8,%al
>   54:  74 e7                   je     3d
>   56:  31 c0                   xor    %eax,%eax
>   58:  5d                      pop    %rbp
>   59:  c3                      retq
>   5a:  66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
>   60:  b8 01 00 00 00          mov    $0x1,%eax
>   65:  5d                      pop    %rbp
>   66:  c3                      retq

Btw., totally off topic, the following NOP caught my attention:

>   5a:  66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

That's a dead NOP that bloats the function a bit, added for the 16 byte
alignment of one of the jump targets.

I realize that x86 CPU manufacturers recommend 16-byte jump target
alignments (it's in the Intel optimization manual), but the cost of that
is very significant:

        text       data        bss        dec   filename
    12566391    1617840    1089536   15273767   vmlinux.align.16-byte
    12224951    1617840    1089536   14932327   vmlinux.align.1-byte

By using 1 byte jump target alignment (i.e. no alignment at all) we get
an almost 3% reduction in kernel size (!) - and a probably similar
reduction in I$ footprint.

So I'm wondering, is the 16 byte jump target optimization suggestion
really worth this price? The patch below boots fine and I've not
measured any noticeable slowdown, but I've not tried hard.
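( For reference, the compiler behaviour is easy to inspect with a
  stand-alone test case as well. The sketch below is illustrative only -
  the file and function names are made up, and it is not the kernel
  function disassembled above - but it produces the same kind of
  jump-only branch target that -falign-jumps pads: )

/*
 * align-demo.c - illustrative sketch, not kernel code.
 *
 * Compile it twice and compare the generated assembly:
 *
 *   gcc -O2 -falign-jumps=16 -S align-demo.c -o align-16.s
 *   gcc -O2 -falign-jumps=1  -S align-demo.c -o align-1.s
 *
 * -falign-jumps=16 asks GCC to pad branch targets that can only be
 * reached by jumping out to a 16-byte boundary with NOPs (the 'nopw'
 * in the disassembly above is such padding); -falign-jumps=1 requests
 * no padding at all.
 */
struct node {
	struct node *next;
	int locked;
};

/* A hot spin loop with a cold exit path, similar in shape to the code above. */
int spin_demo(struct node **lock, struct node *node)
{
	while (*lock == node) {
		if (node->locked)
			return 0;		/* hot exit, falls through */
		asm volatile("pause");		/* x86 PAUSE, the 'f3 90' above */
	}
	return 1;	/* cold exit: a jump-only target eligible for alignment */
}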
Now, the usual justification for jump target alignment is the
following: with 16 byte instruction-cache cacheline sizes, if a forward
jump is aligned to a cacheline boundary then prefetches will start from
a new cacheline.

But I think that argument is flawed for typical optimized kernel code
flows: forward jumps often go to 'cold' (uncommon) pieces of code, and
aligning cold code to cache lines does not bring a lot of advantages
(they are uncommon), while it causes collateral damage:

 - their alignment 'spreads out' the cache footprint, it shifts
   followup hot code further out

 - plus it slows down even 'cold' code that immediately follows 'hot'
   code (like in the above case), which could have benefited from the
   partial cacheline that comes off the end of hot code.

What do you guys think about this? I think we should seriously consider
relaxing our alignment defaults.

Thanks,

	Ingo

==================================>
>From 5b83a095e1abdfee5c710c34a5785232ce74f939 Mon Sep 17 00:00:00 2001
From: Ingo Molnar
Date: Fri, 10 Apr 2015 13:50:05 +0200
Subject: [PATCH] x86: Align jump targets to 1 byte boundaries

Not-Yet-Signed-off-by: Ingo Molnar
---
 arch/x86/Makefile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 5ba2d9ce82dc..0366d6b44a14 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -77,6 +77,9 @@ else
         KBUILD_AFLAGS += -m64
         KBUILD_CFLAGS += -m64
 
+        # Align jump targets to 1 byte, not the default 16 bytes:
+        KBUILD_CFLAGS += -falign-jumps=1
+
         # Don't autogenerate traditional x87 instructions
         KBUILD_CFLAGS += $(call cc-option,-mno-80387)
         KBUILD_CFLAGS += $(call cc-option,-mno-fp-ret-in-387)
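( To make the hot/cold layout argument above concrete, here is a minimal
  sketch - illustrative only, the helper names are made up and this is
  not code from the thread. The __builtin_expect() hint, which is what
  the kernel's unlikely() macro expands to, tells GCC that the error
  branch is cold: the expected case stays straight-line fall-through
  code, while the cold case becomes a forward-jump-only target at the
  end of the function - exactly the kind of target -falign-jumps pads: )

#include <stddef.h>

/* Hypothetical helpers, declared only so the sketch compiles with -S. */
extern void *do_fast_path(void *buf, size_t len);
extern void *handle_error(void);

/* Inspect the layout with: gcc -O2 -S cold-path-demo.c */
void *process(void *buf, size_t len)
{
	/* unlikely(): the error path is cold and laid out out of line. */
	if (__builtin_expect(buf == NULL || len == 0, 0))
		return handle_error();		/* cold: reached by a forward jump */

	return do_fast_path(buf, len);		/* hot: straight fall-through */
}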