From: Ingo Molnar <mingo@kernel.org>
To: Borislav Petkov <bp@alien8.de>
Cc: x86-ml <x86@kernel.org>, lkml <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jason Baron <jbaron@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH] Flipped jump labels
Date: Sun, 10 Aug 2014 08:11:03 +0200	[thread overview]
Message-ID: <20140810061103.GA13968@gmail.com> (raw)
In-Reply-To: <20140809105742.GA5910@pd.tnic>


* Borislav Petkov <bp@alien8.de> wrote:

> Hi dudes,
> 
> with the current impl. of jump labels, people can't really do the
> following:
> 
> ---
> JMP unlikely_code
> likely_code
> 
> unlikely_code:
> unlikely code
> ---
> 
> and, after some initialization-time queries, overwrite the JMP with a
> NOP so that the likely code gets executed at zero cost.
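> 
> In C terms, the usage I'm after is roughly this (sketch only;
> static_branch_flipped() and the helpers around it are made-up names
> for illustration, the real primitive is in the concept patch at the
> end of this mail):
> 
> ---
> /* Starts out as a JMP to the unlikely block; gets patched to a NOP
>  * once initialization decides the fast path is safe to use. */
> static struct static_key fast_ok = STATIC_KEY_INIT_FALSE;
> 
> u64 read_clock(void)
> {
> 	if (static_branch_flipped(&fast_ok))	/* made-up name */
> 		return slow_fallback();		/* placed out of line */
> 
> 	return fast_path();			/* falls through at zero cost */
> }
> ---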
> 
> The issue is that jump labels unconditionally add NOPs by default
> (see arch_static_branch). For example, native_sched_clock() gets the
> following code layout here:
> 
> --
> NOP
> unlikely code (which computes time in ns from jiffies)
> likely code (which does RDTSC)
> --
> 
> Yes, unlikely code comes first.
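> 
> (For reference, this layout follows from the current x86
> arch_static_branch() in arch/x86/include/asm/jump_label.h, which looks
> roughly like this:
> 
> 	static __always_inline bool arch_static_branch(struct static_key *key)
> 	{
> 		asm_volatile_goto("1:"
> 			".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
> 			".pushsection __jump_table,  \"aw\" \n\t"
> 			_ASM_ALIGN "\n\t"
> 			_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
> 			".popsection \n\t"
> 			: :  "i" (key) : : l_yes);
> 		return false;
> 	l_yes:
> 		return true;
> 	}
> 
> i.e. an inline NOP whose out-of-line target is recorded in __jump_table,
> so the compiler treats the fall-through "return false" side as the
> likely one.)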
> 
> When the jump labels get initialized and all checks are done, at runtime
> we have this:
> 
> 	   0xffffffff8100ce40 <sched_clock>:    push   %rbp
> 	   0xffffffff8100ce41 <sched_clock+1>:  mov    %rsp,%rbp
> 	   0xffffffff8100ce44 <sched_clock+4>:  and    $0xfffffffffffffff0,%rsp
> 
> unconditional JMP!!!
> 
> 	   0xffffffff8100ce48 <sched_clock+8>:  jmpq   0xffffffff8100ce70 <sched_clock+48>
> 
> unlikely code using jiffies
> 
> 	   0xffffffff8100ce4d <sched_clock+13>: mov    0x9a71ac(%rip),%r8        # 0xffffffff819b4000 <jiffies_64>
> 	   0xffffffff8100ce54 <sched_clock+20>: movabs $0xffc2f745d964b800,%rax
> 	   0xffffffff8100ce5e <sched_clock+30>: leaveq 
> 	   0xffffffff8100ce5f <sched_clock+31>: imul   $0x3d0900,%r8,%rdx
> 	   0xffffffff8100ce66 <sched_clock+38>: add    %rdx,%rax
> 	   0xffffffff8100ce69 <sched_clock+41>: retq   
> 	   0xffffffff8100ce6a <sched_clock+42>: nopw   0x0(%rax,%rax,1)
> 
> likely code using RDTSC, see JMP target address.
> 
> 	   0xffffffff8100ce70 <sched_clock+48>: rdtsc
> 
> 
> So what we end up with is not really helping, because we always pay for
> that JMP on all modern systems which sport RDTSC, even though we
> shouldn't have to.
> 
> And remember, this is not something cheap: sched_clock uses the TSC even
> if it is unstable, so we're jumping unconditionally, all the time.
> 
> So, long story short, we want this, instead:
> 
> 	   0xffffffff8100cf10 <sched_clock>:    push   %rbp
> 	   0xffffffff8100cf11 <sched_clock+1>:  mov    %rsp,%rbp
> 	   0xffffffff8100cf14 <sched_clock+4>:  and    $0xfffffffffffffff0,%rsp
> 
> unconditional JMP is nopped out
> 
> 	   0xffffffff8100cf18 <sched_clock+8>:  data32 data32 data32 xchg %ax,%ax
> 
> likely code, which now comes first in the function, so all the advantages
> for the front end, branch prediction, yadda yadda, get to be enjoyed :)
> 
> 	   0xffffffff8100cf1d <sched_clock+13>: rdtsc  
> 	   0xffffffff8100cf1f <sched_clock+15>: mov    %eax,%esi
> 	   0xffffffff8100cf21 <sched_clock+17>: mov    %rdx,%rax
> 	   0xffffffff8100cf24 <sched_clock+20>: shl    $0x20,%rax
> 	   0xffffffff8100cf28 <sched_clock+24>: or     %rsi,%rax
> 	   0xffffffff8100cf2b <sched_clock+27>: mov    %rax,%rcx
> 	   0xffffffff8100cf2e <sched_clock+30>: incl   %gs:0xb8e0
> 	   0xffffffff8100cf36 <sched_clock+38>: mov    %gs:0x1d0c30,%rsi
> 	   0xffffffff8100cf3f <sched_clock+47>: mov    %gs:0x1d0c38,%rax
> 	   0xffffffff8100cf48 <sched_clock+56>: cmp    %rax,%rsi
> 	   0xffffffff8100cf4b <sched_clock+59>: jne    0xffffffff8100cf90 <sched_clock+128>
> 	   0xffffffff8100cf4d <sched_clock+61>: mov    (%rsi),%eax
> 	   0xffffffff8100cf4f <sched_clock+63>: mul    %rcx
> 	   0xffffffff8100cf52 <sched_clock+66>: shrd   $0xa,%rdx,%rax
> 	   0xffffffff8100cf57 <sched_clock+71>: add    0x8(%rsi),%rax
> 	   0xffffffff8100cf5b <sched_clock+75>: decl   %gs:0xb8e0
> 	   0xffffffff8100cf63 <sched_clock+83>: je     0xffffffff8100cf88 <sched_clock+120>
> 	   0xffffffff8100cf65 <sched_clock+85>: leaveq 
> 	   0xffffffff8100cf66 <sched_clock+86>: retq   
> 
> Done, we return here.
> 
> 	   0xffffffff8100cf67 <sched_clock+87>: nop
> 
> unlikely code which does the jiffies math.
> 
> 	   0xffffffff8100cf68 <sched_clock+88>: mov    0x9a7091(%rip),%rax        # 0xffffffff819b4000 <jiffies_64>
> 	   0xffffffff8100cf6f <sched_clock+95>: leaveq 
> 	   0xffffffff8100cf70 <sched_clock+96>: imul   $0x3d0900,%rax,%rdx
> 	   ...
> 
> 
> So basically what I'm proposing is a jump label type which is initialized
> by default to jump to the unlikely code; once initialization has happened,
> the JMP gets overwritten.
> 
> The things to pay attention to here are:
> 
> * This label should be used in places where it is very likely for the
> jump to get overwritten. Basically the opposite of tracepoints, for which
> jump labels were created initially and which are mostly off.
> 
> * It must be used in places where the JMP gets overwritten only after
> some initialization has been done first.
> 
> Anyway, below is a concept patch for myself to try the idea first - it
> seems to work here. Constructive ideas and suggestions are welcome, as
> always.
> 
> ---
> diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
> index 6a2cefb4395a..2d963c6489a8 100644
> --- a/arch/x86/include/asm/jump_label.h
> +++ b/arch/x86/include/asm/jump_label.h
> @@ -30,6 +30,22 @@ l_yes:
>  	return true;
>  }
>  
> +static __always_inline bool arch_static_branch_active(struct static_key *key)
> +{
> +	asm_volatile_goto("1:"
> +		"jmp %l[l_yes]\n\t"
> +		".byte " __stringify(GENERIC_NOP3) "\n\t"
> +		".pushsection __jump_table,  \"aw\" \n\t"
> +		_ASM_ALIGN "\n\t"
> +		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
> +		".popsection \n\t"
> +		: :  "i" (key) : : l_yes);
> +	return false;
> +l_yes:
> +	return true;
> +}
> +
> +
>  #endif /* __KERNEL__ */
>  
>  #ifdef CONFIG_X86_64
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 4ca327e900ae..81bc2c4a7eab 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -33,7 +33,7 @@ EXPORT_SYMBOL(tsc_khz);
>   */
>  static int __read_mostly tsc_unstable;
>  
> -static struct static_key __use_tsc = STATIC_KEY_INIT;
> +static struct static_key __use_tsc = STATIC_KEY_INIT_FALSE;
>  
>  int tsc_clocksource_reliable;
>  
> @@ -280,7 +280,7 @@ u64 native_sched_clock(void)
>  	 *   very important for it to be as fast as the platform
>  	 *   can achieve it. )
>  	 */
> -	if (!static_key_false(&__use_tsc)) {
> +	if (arch_static_branch_active(&__use_tsc)) {
>  		/* No locking but a rare wrong value is not a big deal: */
>  		return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);
>  	}

Wouldn't using STATIC_KEY_INIT_TRUE and static_key_true() [instead of 
!static_key_false()] result in the same good code placement effects?
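
I.e. (completely untested, just to illustrate the question; read_tsc_ns()
below merely stands in for the existing RDTSC + cycles-to-ns code):

	static struct static_key __use_tsc = STATIC_KEY_INIT_TRUE;

	u64 native_sched_clock(void)
	{
		if (static_key_true(&__use_tsc)) {
			/* Likely branch, laid out inline after the NOP: */
			return read_tsc_ns();	/* placeholder for the TSC leg */
		}

		/*
		 * Unlikely branch, reached via the JMP that gets patched in
		 * if the key is switched off. No locking, but a rare wrong
		 * value is not a big deal:
		 */
		return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);
	}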

Thanks,

	Ingo

Thread overview: 8+ messages
2014-08-09 10:57 [RFC PATCH] Flipped jump labels Borislav Petkov
2014-08-10  6:11 ` Ingo Molnar [this message]
2014-08-10  6:13   ` Ingo Molnar
2014-08-10 15:35     ` Borislav Petkov
2014-08-10 15:45       ` Ingo Molnar
2014-08-10 16:07         ` Borislav Petkov
2014-08-11  3:32           ` Jason Baron
2014-08-11  8:42             ` Borislav Petkov
