From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S934419AbbDJNzI (ORCPT ); Fri, 10 Apr 2015 09:55:08 -0400
Received: from mx1.redhat.com ([209.132.183.28]:44534 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S933181AbbDJNzC (ORCPT ); Fri, 10 Apr 2015 09:55:02 -0400
Message-ID: <5527D631.4090905@redhat.com>
Date: Fri, 10 Apr 2015 15:54:57 +0200
From: Denys Vlasenko
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: Borislav Petkov, Ingo Molnar
CC: "Paul E. McKenney", Linus Torvalds, Jason Low, Peter Zijlstra,
        Davidlohr Bueso, Tim Chen, Aswin Chandramouleeswaran, LKML,
        Andy Lutomirski, Brian Gerst, "H. Peter Anvin", Thomas Gleixner,
        Peter Zijlstra
Subject: Re: [PATCH] x86: Align jump targets to 1 byte boundaries
References: <20150409175652.GI6464@linux.vnet.ibm.com>
        <20150409183926.GM6464@linux.vnet.ibm.com>
        <20150410090051.GA28549@gmail.com>
        <20150410091252.GA27630@gmail.com>
        <20150410092152.GA21332@gmail.com>
        <20150410111427.GA30477@gmail.com>
        <20150410112748.GB30477@gmail.com>
        <20150410120846.GA17101@gmail.com>
        <20150410131929.GE28074@pd.tnic>
In-Reply-To: <20150410131929.GE28074@pd.tnic>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/10/2015 03:19 PM, Borislav Petkov wrote:
> On Fri, Apr 10, 2015 at 02:08:46PM +0200, Ingo Molnar wrote:
>> Now, the usual justification for jump target alignment is the
>> following: with 16 byte instruction-cache cacheline sizes, if a
>
> You mean 64 bytes?
>
> Cacheline size on modern x86 is 64 bytes. The 16 alignment is probably
> some branch predictor stride thing.

IIRC it's a maximum decode bandwidth. Decoders on the most powerful x86 CPUs, both Intel and AMD, attempt to decode in one cycle up to four instructions.
For this they fetch up to 16 bytes. If the cacheline ends before 16 bytes are available, decode will either operate on fewer bytes or wait for the next cacheline to be fetched.
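
To put numbers on that, here is a rough sketch in C of how many bytes the decoders can get from the current cacheline for a given jump target address. It assumes the 64-byte cacheline and 16-byte fetch mentioned above, and it models only the "cacheline ends early" case. The names (CACHELINE_BYTES, FETCH_BYTES, fetch_bytes_available) are made up for illustration, and real fetch units may behave differently (for example, by fetching aligned 16-byte windows).

```c
#include <stdint.h>

/* Assumed sizes, taken from the discussion above; not queried from hardware. */
#define CACHELINE_BYTES 64u	/* cacheline size on modern x86 */
#define FETCH_BYTES     16u	/* max bytes fetched for one decode cycle */

/*
 * Hypothetical helper: number of bytes available to the decoders from the
 * current cacheline when fetch starts at 'addr'. The result is FETCH_BYTES
 * unless the cacheline ends first.
 */
static unsigned int fetch_bytes_available(uintptr_t addr)
{
	unsigned int to_line_end =
		CACHELINE_BYTES - (unsigned int)(addr % CACHELINE_BYTES);

	return to_line_end < FETCH_BYTES ? to_line_end : FETCH_BYTES;
}
```

Under this model, a jump target at offset 48 or lower within its cacheline still gets the full 16 bytes, a target at offset 56 gets 8, and a target at offset 63 gets a single byte, so only targets in the last 15 bytes of a cacheline lose decode bandwidth.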