From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Jason Low <jason.low2@hp.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Aswin Chandramouleeswaran <aswin@hp.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Borislav Petkov <bp@alien8.de>,
	Andy Lutomirski <luto@amacapital.net>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	Brian Gerst <brgerst@gmail.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	josh@joshtriplett.org
Subject: Re: [PATCH] x86: Align jump targets to 1 byte boundaries
Date: Fri, 10 Apr 2015 07:10:08 -0700
Message-ID: <20150410141008.GX6464@linux.vnet.ibm.com>
In-Reply-To: <20150410120846.GA17101@gmail.com>

On Fri, Apr 10, 2015 at 02:08:46PM +0200, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
> > So restructure the loop a bit, to get much tighter code:
> > 
> > 0000000000000030 <mutex_spin_on_owner.isra.5>:
> >   30:	55                   	push   %rbp
> >   31:	65 48 8b 14 25 00 00 	mov    %gs:0x0,%rdx
> >   38:	00 00
> >   3a:	48 89 e5             	mov    %rsp,%rbp
> >   3d:	48 39 37             	cmp    %rsi,(%rdi)
> >   40:	75 1e                	jne    60 <mutex_spin_on_owner.isra.5+0x30>
> >   42:	8b 46 28             	mov    0x28(%rsi),%eax
> >   45:	85 c0                	test   %eax,%eax
> >   47:	74 0d                	je     56 <mutex_spin_on_owner.isra.5+0x26>
> >   49:	f3 90                	pause
> >   4b:	48 8b 82 10 c0 ff ff 	mov    -0x3ff0(%rdx),%rax
> >   52:	a8 08                	test   $0x8,%al
> >   54:	74 e7                	je     3d <mutex_spin_on_owner.isra.5+0xd>
> >   56:	31 c0                	xor    %eax,%eax
> >   58:	5d                   	pop    %rbp
> >   59:	c3                   	retq
> >   5a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
> >   60:	b8 01 00 00 00       	mov    $0x1,%eax
> >   65:	5d                   	pop    %rbp
> >   66:	c3                   	retq
> 
> Btw., totally off topic, the following NOP caught my attention:
> 
> >   5a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
> 
> That's a dead NOP that bloats the function a bit, added for the 16 byte 
> alignment of one of the jump targets.
> 
> I realize that x86 CPU manufacturers recommend 16-byte jump target 
> alignments (it's in the Intel optimization manual), but the cost of 
> that is very significant:
> 
>         text           data       bss         dec      filename
>     12566391        1617840   1089536    15273767      vmlinux.align.16-byte
>     12224951        1617840   1089536    14932327      vmlinux.align.1-byte
> 
> By using 1 byte jump target alignment (i.e. no alignment at all) we 
> get an almost 3% reduction in kernel size (!) - and a probably similar 
> reduction in I$ footprint.
> 
> So I'm wondering, is the 16 byte jump target optimization suggestion 
> really worth this price? The patch below boots fine and I've not 
> measured any noticeable slowdown, but I've not tried hard.

Good point, adding Josh Triplett on CC.  I suspect that he might be
interested.  ;-)

							Thanx, Paul
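
The size figures quoted above can be sanity-checked with a few lines of
arithmetic. The following is only a sketch: the four numbers are copied
from the quoted `size` output, while the helper name reduction() and the
constant names are made up for illustration.

```python
# Figures copied from the `size` output quoted in the message.
TEXT_ALIGN_16 = 12566391  # text bytes, vmlinux.align.16-byte
TEXT_ALIGN_1 = 12224951   # text bytes, vmlinux.align.1-byte
DEC_ALIGN_16 = 15273767   # text + data + bss, vmlinux.align.16-byte
DEC_ALIGN_1 = 14932327    # text + data + bss, vmlinux.align.1-byte


def reduction(before, after):
    """Hypothetical helper: return (bytes saved, percent saved)."""
    saved = before - after
    return saved, 100.0 * saved / before


text_saved, text_pct = reduction(TEXT_ALIGN_16, TEXT_ALIGN_1)
total_saved, total_pct = reduction(DEC_ALIGN_16, DEC_ALIGN_1)

print(f"text:  {text_saved} bytes saved ({text_pct:.2f}%)")
print(f"total: {total_saved} bytes saved ({total_pct:.2f}%)")
```

Since data and bss are identical in both builds, the whole saving of
341440 bytes comes out of text. That is roughly 2.7% of text, which is
the "almost 3%" figure in the quoted message, and a little over 2.2% of
the total image.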

> Now, the usual justification for jump target alignment is the 
> following: with 16 byte instruction-cache cacheline sizes, if a 
> forward jump is aligned to cacheline boundary then prefetches will 
> start from a new cacheline.
> 
> But I think that argument is flawed for typical optimized kernel code 
> flows: forward jumps often go to 'cold' (uncommon) pieces of code, and 
> aligning cold code to cache lines does not bring a lot of advantages 
> (they are uncommon), while it causes collateral damage:
> 
>  - their alignment 'spreads out' the cache footprint, it shifts 
>    followup hot code further out
> 
>  - plus it slows down even 'cold' code that immediately follows 'hot' 
>    code (like in the above case), which could have benefited from the 
>    partial cacheline that comes off the end of hot code.
> 
> What do you guys think about this? I think we should seriously 
> consider relaxing our alignment defaults.
> 
> Thanks,
> 
> 	Ingo
> 
> ==================================>
> From 5b83a095e1abdfee5c710c34a5785232ce74f939 Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <mingo@kernel.org>
> Date: Fri, 10 Apr 2015 13:50:05 +0200
> Subject: [PATCH] x86: Align jumps targets to 1 byte boundaries
> 
> Not-Yet-Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/Makefile | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 5ba2d9ce82dc..0366d6b44a14 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -77,6 +77,9 @@ else
>          KBUILD_AFLAGS += -m64
>          KBUILD_CFLAGS += -m64
> 
> +	# Align jump targets to 1 byte, not the default 16 bytes:
> +        KBUILD_CFLAGS += -falign-jumps=1
> +
>          # Don't autogenerate traditional x87 instructions
>          KBUILD_CFLAGS += $(call cc-option,-mno-80387)
>          KBUILD_CFLAGS += $(call cc-option,-mno-fp-ret-in-387)
> 
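
To make the effect of -falign-jumps concrete on the function quoted
earlier, here is a rough sketch of the padding arithmetic. The
addresses are read off the mutex_spin_on_owner disassembly in the
message. The helper names are invented for illustration, and the model
is a simplifying assumption: it looks only at the one jump target at
0x60 and ignores function and loop alignment, which are controlled by
separate GCC options.

```python
def padding(addr, align):
    """Bytes of NOP padding needed to move addr up to a multiple of align."""
    return -addr % align


# Values taken from the quoted disassembly.
FUNC_START = 0x30  # first byte of mutex_spin_on_owner.isra.5
NOP_START = 0x5a   # where the code before the aligned jump target ends
TARGET_LEN = 7     # mov $0x1,%eax (5) + pop %rbp (1) + retq (1)


def function_size(align):
    """Size of the function if the last jump target is aligned to `align`."""
    target = NOP_START + padding(NOP_START, align)
    return target + TARGET_LEN - FUNC_START


for align in (16, 1):
    pad = padding(NOP_START, align)
    print(f"align={align:2d}: {pad} bytes of padding, "
          f"function is {function_size(align)} bytes")
```

With 16-byte alignment this reproduces the 6-byte nopw at 0x5a and a
55-byte function; with 1-byte alignment the padding disappears and the
same code would fit in 49 bytes. A real build may differ, since the
compiler is free to lay out the blocks differently once the alignment
constraint is gone.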

