From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S932852AbbDJMvQ (ORCPT ); Fri, 10 Apr 2015 08:51:16 -0400
Received: from mx1.redhat.com ([209.132.183.28]:40968 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S932324AbbDJMvO (ORCPT ); Fri, 10 Apr 2015 08:51:14 -0400
Message-ID: <5527C700.3030405@redhat.com>
Date: Fri, 10 Apr 2015 14:50:08 +0200
From: Denys Vlasenko
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: Ingo Molnar, "Paul E. McKenney"
CC: Linus Torvalds, Jason Low, Peter Zijlstra, Davidlohr Bueso, Tim Chen,
        Aswin Chandramouleeswaran, LKML, Borislav Petkov, Andy Lutomirski,
        Brian Gerst, "H. Peter Anvin", Thomas Gleixner, Peter Zijlstra
Subject: Re: [PATCH] x86: Align jump targets to 1 byte boundaries
References: <20150409175652.GI6464@linux.vnet.ibm.com>
        <20150409183926.GM6464@linux.vnet.ibm.com>
        <20150410090051.GA28549@gmail.com>
        <20150410091252.GA27630@gmail.com>
        <20150410092152.GA21332@gmail.com>
        <20150410111427.GA30477@gmail.com>
        <20150410112748.GB30477@gmail.com>
        <20150410120846.GA17101@gmail.com>
In-Reply-To: <20150410120846.GA17101@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/10/2015 02:08 PM, Ingo Molnar wrote:
>
> * Ingo Molnar wrote:
>
>> So restructure the loop a bit, to get much tighter code:
>>
>> 0000000000000030 :
>>   30:   55                      push   %rbp
>>   31:   65 48 8b 14 25 00 00    mov    %gs:0x0,%rdx
>>   38:   00 00
>>   3a:   48 89 e5                mov    %rsp,%rbp
>>   3d:   48 39 37                cmp    %rsi,(%rdi)
>>   40:   75 1e                   jne    60
>>   42:   8b 46 28                mov    0x28(%rsi),%eax
>>   45:   85 c0                   test   %eax,%eax
>>   47:   74 0d                   je     56
>>   49:   f3 90                   pause
>>   4b:   48 8b 82 10 c0 ff ff    mov    -0x3ff0(%rdx),%rax
>>   52:   a8 08                   test   $0x8,%al
>>   54:   74 e7                   je     3d
>>   56:   31 c0                   xor    %eax,%eax
>>   58:   5d                      pop    %rbp
>>   59:   c3                      retq
>>   5a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
>>   60:   b8 01 00 00 00          mov    $0x1,%eax
>>   65:   5d                      pop    %rbp
>>   66:   c3                      retq
>
> Btw., totally off topic, the following NOP caught my attention:
>
>>   5a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
>
> That's a dead NOP that bloats the function a bit, added for the 16 byte
> alignment of one of the jump targets.
>
> I realize that x86 CPU manufacturers recommend 16-byte jump target
> alignments (it's in the Intel optimization manual), but the cost of
> that is very significant:
>
>       text     data      bss       dec  filename
>   12566391  1617840  1089536  15273767  vmlinux.align.16-byte
>   12224951  1617840  1089536  14932327  vmlinux.align.1-byte
>
> By using 1 byte jump target alignment (i.e. no alignment at all) we
> get an almost 3% reduction in kernel size (!) - and a probably similar
> reduction in I$ footprint.
>
> So I'm wondering, is the 16 byte jump target optimization suggestion
> really worth this price? The patch below boots fine and I've not
> measured any noticeable slowdown, but I've not tried hard.

I am absolutely thrilled by the proposal to cut down on sadistic amounts
of alignment.

However, I'm an -Os guy. Expect -O2 people to disagree :)

New-ish versions of gcc allow people to specify optimization
options per function:

https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#Function-Attributes

optimize
    The optimize attribute is used to specify that a function is to be
    compiled with different optimization options than specified on the
    command line. Arguments can either be numbers or strings. Numbers are
    assumed to be an optimization level. Strings that begin with O are
    assumed to be an optimization option, while other options are assumed
    to be used with a -f prefix.

How about not aligning code by default, and using

    #define hot_func __attribute__((optimize("O2","align-functions=16","align-jumps=16")))
    ...

    void hot_func super_often_called_func(...) {...}

in hot code paths?
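
To make the suggestion concrete, here is a minimal sketch of how such a macro might be used. It assumes GCC; the attribute is compiled out for other compilers. The function names (sum_u32, max_u32, entry_offset_16) are hypothetical and exist only for illustration. Whether GCC actually honours the alignment options when they are passed through this attribute is not something the sketch guarantees, so it includes a helper for inspecting where a function's entry point lands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Per-function optimization options, as proposed above.
 * Assumption: only GCC is given the attribute; for any other
 * compiler the macro expands to nothing.
 */
#if defined(__GNUC__) && !defined(__clang__)
#define hot_func \
	__attribute__((optimize("O2", "align-functions=16", "align-jumps=16")))
#else
#define hot_func
#endif

/* Hypothetical hot-path function: requests -O2 plus 16-byte alignment. */
hot_func uint64_t sum_u32(const uint32_t *v, size_t n)
{
	uint64_t total = 0;

	assert(v != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += v[i];
	return total;
}

/* Hypothetical cold-path function: built with the command-line flags only. */
uint32_t max_u32(const uint32_t *v, size_t n)
{
	uint32_t best = 0;

	assert(v != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (v[i] > best)
			best = v[i];
	return best;
}

typedef void (*generic_fn)(void);

/*
 * Distance of a function's entry point from the preceding 16-byte
 * boundary. A result of 0 means the entry is 16-byte aligned.
 */
unsigned entry_offset_16(generic_fn fn)
{
	return (unsigned)((uintptr_t)fn % 16);
}
```

Building with something like -Os -falign-functions=1 -falign-jumps=1 and then calling entry_offset_16((generic_fn)sum_u32) and entry_offset_16((generic_fn)max_u32) is one way to see whether the hot function got its alignment while the rest of the code stayed packed. Comparing objdump -d output for the two functions would show the same thing, including any padding NOPs in front of jump targets.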