Date: Fri, 10 Apr 2015 16:01:41 +0200
From: Borislav Petkov
To: Denys Vlasenko
Cc: Ingo Molnar, "Paul E. McKenney", Linus Torvalds, Jason Low,
    Peter Zijlstra, Davidlohr Bueso, Tim Chen, Aswin Chandramouleeswaran,
    LKML, Andy Lutomirski, Brian Gerst, "H. Peter Anvin", Thomas Gleixner,
    Peter Zijlstra
Subject: Re: [PATCH] x86: Align jump targets to 1 byte boundaries
Message-ID: <20150410140141.GI28074@pd.tnic>
References: <20150409183926.GM6464@linux.vnet.ibm.com>
    <20150410090051.GA28549@gmail.com> <20150410091252.GA27630@gmail.com>
    <20150410092152.GA21332@gmail.com> <20150410111427.GA30477@gmail.com>
    <20150410112748.GB30477@gmail.com> <20150410120846.GA17101@gmail.com>
    <20150410131929.GE28074@pd.tnic> <5527D631.4090905@redhat.com>
In-Reply-To: <5527D631.4090905@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 10, 2015 at 03:54:57PM +0200, Denys Vlasenko wrote:
> On 04/10/2015 03:19 PM, Borislav Petkov wrote:
> > On Fri, Apr 10, 2015 at 02:08:46PM +0200, Ingo Molnar wrote:
> >> Now, the usual justification for jump target alignment is the
> >> following: with 16 byte instruction-cache cacheline sizes, if a
> >
> > You mean 64 bytes?
> >
> > Cacheline size on modern x86 is 64 bytes. The 16 alignment is probably
> > some branch predictor stride thing.
>
> IIRC it's a maximum decode bandwidth. Decoders on the most powerful
> x86 CPUs, both Intel and AMD, attempt to decode in one cycle
> up to four instructions. For this they fetch up to 16 bytes.

32 bytes fetch window per cycle for AMD F15h and F16h, see my other
mail. And Intel probably do the same.

> If cacheline ends before 16 bytes are available, then decode
> will operate on fewer bytes, or it will wait for next cacheline
> to be fetched.

Yap.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
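
To put numbers on the fetch-window argument above, here is a minimal
sketch in C. It assumes the simplified model from the quoted text: the
front end fetches up to a fixed number of bytes starting at the jump
target, and the fetch is cut short where the 64-byte cacheline ends.
The helper name first_fetch_bytes() and the constant CACHELINE_BYTES are
made up for illustration. Real front ends are more involved than this,
so read it as back-of-the-envelope arithmetic, not a description of any
particular CPU.

```c
#include <stdint.h>

/* Cacheline size on modern x86, as stated in the thread. */
#define CACHELINE_BYTES 64u

/*
 * Hypothetical helper: bytes visible to the decoders in the first cycle
 * after a jump to `target`, under the simplified model above. The fetch
 * starts at the target and is truncated at the end of the cacheline.
 * `fetch_bytes` is the fetch window size (16 in the quoted text, 32 for
 * the AMD F15h/F16h figure mentioned in the reply).
 */
static unsigned first_fetch_bytes(uintptr_t target, unsigned fetch_bytes)
{
	/* Bytes left between the target and the end of its cacheline. */
	unsigned left_in_line =
		CACHELINE_BYTES - (unsigned)(target % CACHELINE_BYTES);

	return left_in_line < fetch_bytes ? left_in_line : fetch_bytes;
}
```

Under this model, a target at offset 58 within its cacheline leaves only
6 bytes for the first decode cycle with a 16-byte window, while a
16-byte-aligned target always gets the full 16. With a 32-byte window,
a target at offset 48 gets 16 bytes, half the window.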