From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S932677AbbENMA1 (ORCPT ); Thu, 14 May 2015 08:00:27 -0400
Received: from mx1.redhat.com ([209.132.183.28]:56469 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S932301AbbENMAZ (ORCPT ); Thu, 14 May 2015 08:00:25 -0400
Message-ID: <55548E12.6020207@redhat.com>
Date: Thu, 14 May 2015 13:59:14 +0200
From: Denys Vlasenko
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: Ingo Molnar, "Paul E. McKenney"
CC: Linus Torvalds, Jason Low, Peter Zijlstra, Davidlohr Bueso, Tim Chen,
        Aswin Chandramouleeswaran, LKML, Borislav Petkov, Andy Lutomirski,
        Brian Gerst, "H. Peter Anvin", Thomas Gleixner, Peter Zijlstra
Subject: Re: [PATCH] x86: Align jump targets to 1 byte boundaries
References: <20150409175652.GI6464@linux.vnet.ibm.com>
        <20150409183926.GM6464@linux.vnet.ibm.com>
        <20150410090051.GA28549@gmail.com>
        <20150410091252.GA27630@gmail.com>
        <20150410092152.GA21332@gmail.com>
        <20150410111427.GA30477@gmail.com>
        <20150410112748.GB30477@gmail.com>
        <20150410120846.GA17101@gmail.com>
In-Reply-To: <20150410120846.GA17101@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/10/2015 02:08 PM, Ingo Molnar wrote:
>
> * Ingo Molnar wrote:
>
>> So restructure the loop a bit, to get much tighter code:
>>
>> 0000000000000030 :
>>   30:   55                      push   %rbp
>>   31:   65 48 8b 14 25 00 00    mov    %gs:0x0,%rdx
>>   38:   00 00
>>   3a:   48 89 e5                mov    %rsp,%rbp
>>   3d:   48 39 37                cmp    %rsi,(%rdi)
>>   40:   75 1e                   jne    60
>>   42:   8b 46 28                mov    0x28(%rsi),%eax
>>   45:   85 c0                   test   %eax,%eax
>>   47:   74 0d                   je     56
>>   49:   f3 90                   pause
>>   4b:   48 8b 82 10 c0 ff ff    mov    -0x3ff0(%rdx),%rax
>>   52:   a8 08                   test   $0x8,%al
>>   54:   74 e7                   je     3d
>>   56:   31 c0                   xor    %eax,%eax
>>   58:   5d                      pop    %rbp
>>   59:   c3                      retq
>>   5a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
>>   60:   b8 01 00 00 00          mov    $0x1,%eax
>>   65:   5d                      pop    %rbp
>>   66:   c3                      retq
>
> Btw., totally off topic, the following NOP caught my attention:
>
>>   5a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
>
> That's a dead NOP that bloats the function a bit, added for the 16 byte
> alignment of one of the jump targets.
>
> I realize that x86 CPU manufacturers recommend 16-byte jump target
> alignments (it's in the Intel optimization manual), but the cost of
> that is very significant:
>
>       text      data       bss        dec   filename
>   12566391   1617840   1089536   15273767   vmlinux.align.16-byte
>   12224951   1617840   1089536   14932327   vmlinux.align.1-byte
>
> By using 1 byte jump target alignment (i.e. no alignment at all) we
> get an almost 3% reduction in kernel size (!) - and a probably similar
> reduction in I$ footprint.
>
> So I'm wondering, is the 16 byte jump target optimization suggestion
> really worth this price? The patch below boots fine and I've not
> measured any noticeable slowdown, but I've not tried hard.
>
> Now, the usual justification for jump target alignment is the
> following: with 16 byte instruction-cache cacheline sizes, if a
> forward jump is aligned to cacheline boundary then prefetches will
> start from a new cacheline.
>
> But I think that argument is flawed for typical optimized kernel code
> flows: forward jumps often go to 'cold' (uncommon) pieces of code, and
> aligning cold code to cache lines does not bring a lot of advantages
> (they are uncommon), while it causes collateral damage:
>
>  - their alignment 'spreads out' the cache footprint, it shifts
>    followup hot code further out
>
>  - plus it slows down even 'cold' code that immediately follows 'hot'
>    code (like in the above case), which could have benefited from the
>    partial cacheline that comes off the end of hot code.
>
> What do you guys think about this? I think we should seriously
> consider relaxing our alignment defaults.

Looks like nobody objected. I think it's ok to submit
this patch for real.

> +        # Align jump targets to 1 byte, not the default 16 bytes:
> +        KBUILD_CFLAGS += -falign-jumps=1
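
For reference, the "almost 3%" figure can be checked directly against the
size numbers quoted above. A minimal sketch in Python; the function and
variable names are purely illustrative and not part of the patch:

```python
# Sizes copied from the size output quoted in this thread.
TEXT_ALIGN_16 = 12566391  # text, vmlinux.align.16-byte
TEXT_ALIGN_1 = 12224951   # text, vmlinux.align.1-byte
DEC_ALIGN_16 = 15273767   # text + data + bss, vmlinux.align.16-byte


def text_savings(before, after):
    """Return (bytes saved, percent saved relative to `before`)."""
    saved = before - after
    return saved, 100.0 * saved / before


saved_bytes, saved_pct = text_savings(TEXT_ALIGN_16, TEXT_ALIGN_1)

# The same saving expressed against the whole image (the "dec" column).
pct_of_image = 100.0 * saved_bytes / DEC_ALIGN_16

print(f"text saved: {saved_bytes} bytes ({saved_pct:.2f}% of text, "
      f"{pct_of_image:.2f}% of the whole image)")
```

The saving is entirely in text; data and bss are identical in both builds,
so measured against the whole image it comes to roughly 2.2%.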