* [RFC PATCH] Flipped jump labels
@ 2014-08-09 10:57 Borislav Petkov
  2014-08-10  6:11 ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Borislav Petkov @ 2014-08-09 10:57 UTC (permalink / raw)
  To: x86-ml, lkml; +Cc: Peter Zijlstra, Steven Rostedt, Thomas Gleixner

Hi dudes,

with the current implementation of jump labels, people can't really do the
following:

---
JMP unlikely_code
likely_code

unlikely_code:
unlikely code
---

and then, after some initialization-time checks, overwrite the JMP with a NOP
so that the likely code gets executed at zero cost.

The issue is that jump labels unconditionally add NOPs by default
(see arch_static_branch). For example, native_sched_clock() gets the
following code layout here:

--
NOP
unlikely code (which computes time in ns from jiffies)
likely code (which does RDTSC)
--

Yes, unlikely code comes first.
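
For reference, the stock arch_static_branch() in
arch/x86/include/asm/jump_label.h looks roughly like this - it emits only the
NOP at the patch site, and the fallthrough (return false) path is what the
compiler lays out right behind it, which is why the jiffies code ends up
first here:

---
static __always_inline bool arch_static_branch(struct static_key *key)
{
	asm_volatile_goto("1:"
		".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
		".pushsection __jump_table,  \"aw\" \n\t"
		_ASM_ALIGN "\n\t"
		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
		".popsection \n\t"
		: :  "i" (key) : : l_yes);
	return false;
l_yes:
	return true;
}
---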

Once the jump labels have been initialized and all checks are done, at runtime
we have this:

	   0xffffffff8100ce40 <sched_clock>:    push   %rbp
	   0xffffffff8100ce41 <sched_clock+1>:  mov    %rsp,%rbp
	   0xffffffff8100ce44 <sched_clock+4>:  and    $0xfffffffffffffff0,%rsp

unconditional JMP!!!

	   0xffffffff8100ce48 <sched_clock+8>:  jmpq   0xffffffff8100ce70 <sched_clock+48>

unlikely code using jiffies

	   0xffffffff8100ce4d <sched_clock+13>: mov    0x9a71ac(%rip),%r8        # 0xffffffff819b4000 <jiffies_64>
	   0xffffffff8100ce54 <sched_clock+20>: movabs $0xffc2f745d964b800,%rax
	   0xffffffff8100ce5e <sched_clock+30>: leaveq 
	   0xffffffff8100ce5f <sched_clock+31>: imul   $0x3d0900,%r8,%rdx
	   0xffffffff8100ce66 <sched_clock+38>: add    %rdx,%rax
	   0xffffffff8100ce69 <sched_clock+41>: retq   
	   0xffffffff8100ce6a <sched_clock+42>: nopw   0x0(%rax,%rax,1)

likely code using RDTSC, see JMP target address.

	   0xffffffff8100ce70 <sched_clock+48>: rdtsc


So what we end up with is not really helping: we always pay for that JMP on
all modern systems which have RDTSC, even though we shouldn't have to.

And remember, this is not something cheap: sched_clock() uses the TSC even
when it is unstable, so we're taking that jump unconditionally, all the time.

So, long story short, we want this, instead:

	   0xffffffff8100cf10 <sched_clock>:    push   %rbp
	   0xffffffff8100cf11 <sched_clock+1>:  mov    %rsp,%rbp
	   0xffffffff8100cf14 <sched_clock+4>:  and    $0xfffffffffffffff0,%rsp

unconditional JMP is nopped out

	   0xffffffff8100cf18 <sched_clock+8>:  data32 data32 data32 xchg %ax,%ax

likely code, which now comes first in the function, so all the advantages for
the front end, branch prediction, yadda yadda, get to be enjoyed :)

	   0xffffffff8100cf1d <sched_clock+13>: rdtsc  
	   0xffffffff8100cf1f <sched_clock+15>: mov    %eax,%esi
	   0xffffffff8100cf21 <sched_clock+17>: mov    %rdx,%rax
	   0xffffffff8100cf24 <sched_clock+20>: shl    $0x20,%rax
	   0xffffffff8100cf28 <sched_clock+24>: or     %rsi,%rax
	   0xffffffff8100cf2b <sched_clock+27>: mov    %rax,%rcx
	   0xffffffff8100cf2e <sched_clock+30>: incl   %gs:0xb8e0
	   0xffffffff8100cf36 <sched_clock+38>: mov    %gs:0x1d0c30,%rsi
	   0xffffffff8100cf3f <sched_clock+47>: mov    %gs:0x1d0c38,%rax
	   0xffffffff8100cf48 <sched_clock+56>: cmp    %rax,%rsi
	   0xffffffff8100cf4b <sched_clock+59>: jne    0xffffffff8100cf90 <sched_clock+128>
	   0xffffffff8100cf4d <sched_clock+61>: mov    (%rsi),%eax
	   0xffffffff8100cf4f <sched_clock+63>: mul    %rcx
	   0xffffffff8100cf52 <sched_clock+66>: shrd   $0xa,%rdx,%rax
	   0xffffffff8100cf57 <sched_clock+71>: add    0x8(%rsi),%rax
	   0xffffffff8100cf5b <sched_clock+75>: decl   %gs:0xb8e0
	   0xffffffff8100cf63 <sched_clock+83>: je     0xffffffff8100cf88 <sched_clock+120>
	   0xffffffff8100cf65 <sched_clock+85>: leaveq 
	   0xffffffff8100cf66 <sched_clock+86>: retq   

Done, we return here.

	   0xffffffff8100cf67 <sched_clock+87>: nop

unlikely code which does the jiffies math.

	   0xffffffff8100cf68 <sched_clock+88>: mov    0x9a7091(%rip),%rax        # 0xffffffff819b4000 <jiffies_64>
	   0xffffffff8100cf6f <sched_clock+95>: leaveq 
	   0xffffffff8100cf70 <sched_clock+96>: imul   $0x3d0900,%rax,%rdx
	   ...


So basically what I'm proposing is a jump label type which, by default, jumps
to the unlikely code, and once initialization has happened, the JMP gets
overwritten with a NOP.
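
The enable side can stay as it is - once the TSC has been validated,
tsc_init() flips the key the usual way; roughly (sketch, exact placement
aside):

---
	/* now allow native_sched_clock() to use rdtsc */
	static_key_slow_inc(&__use_tsc);
---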

The things to pay attention to here are:

* This label should be used in places where it is very likely that the jump
will get overwritten. Basically the opposite of tracepoints, for which jump
labels were created initially and which are mostly off.

* It must be used in places where the JMP gets overwritten only after some
initialization has been done first.

Anyway, below is a concept patch I used to try out the idea first - it
seems to work here. Constructive ideas and suggestions are welcome, as
always.

---
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 6a2cefb4395a..2d963c6489a8 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -30,6 +30,22 @@ l_yes:
 	return true;
 }
 
+static __always_inline bool arch_static_branch_active(struct static_key *key)
+{
+	asm_volatile_goto("1:"
+		"jmp %l[l_yes]\n\t"
+		".byte " __stringify(GENERIC_NOP3) "\n\t"
+		".pushsection __jump_table,  \"aw\" \n\t"
+		_ASM_ALIGN "\n\t"
+		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
+		".popsection \n\t"
+		: :  "i" (key) : : l_yes);
+	return false;
+l_yes:
+	return true;
+}
+
+
 #endif /* __KERNEL__ */
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 4ca327e900ae..81bc2c4a7eab 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -33,7 +33,7 @@ EXPORT_SYMBOL(tsc_khz);
  */
 static int __read_mostly tsc_unstable;
 
-static struct static_key __use_tsc = STATIC_KEY_INIT;
+static struct static_key __use_tsc = STATIC_KEY_INIT_FALSE;
 
 int tsc_clocksource_reliable;
 
@@ -280,7 +280,7 @@ u64 native_sched_clock(void)
 	 *   very important for it to be as fast as the platform
 	 *   can achieve it. )
 	 */
-	if (!static_key_false(&__use_tsc)) {
+	if (arch_static_branch_active(&__use_tsc)) {
 		/* No locking but a rare wrong value is not a big deal: */
 		return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);
 	}

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [RFC PATCH] Flipped jump labels
  2014-08-09 10:57 [RFC PATCH] Flipped jump labels Borislav Petkov
@ 2014-08-10  6:11 ` Ingo Molnar
  2014-08-10  6:13   ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2014-08-10  6:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86-ml, lkml, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Linus Torvalds, Jason Baron, Andrew Morton


* Borislav Petkov <bp@alien8.de> wrote:

> [ ... full RFC text and the jump_label.h hunk snipped; quoted in full in the original message above ... ]
> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 4ca327e900ae..81bc2c4a7eab 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -33,7 +33,7 @@ EXPORT_SYMBOL(tsc_khz);
>   */
>  static int __read_mostly tsc_unstable;
>  
> -static struct static_key __use_tsc = STATIC_KEY_INIT;
> +static struct static_key __use_tsc = STATIC_KEY_INIT_FALSE;
>  
>  int tsc_clocksource_reliable;
>  
> @@ -280,7 +280,7 @@ u64 native_sched_clock(void)
>  	 *   very important for it to be as fast as the platform
>  	 *   can achieve it. )
>  	 */
> -	if (!static_key_false(&__use_tsc)) {
> +	if (arch_static_branch_active(&__use_tsc)) {
>  		/* No locking but a rare wrong value is not a big deal: */
>  		return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);
>  	}

Wouldn't using STATIC_KEY_INIT_TRUE and static_key_true() [instead of 
!static_key_false()] result in the same good code placement effects?

Thanks,

	Ingo


* Re: [RFC PATCH] Flipped jump labels
  2014-08-10  6:11 ` Ingo Molnar
@ 2014-08-10  6:13   ` Ingo Molnar
  2014-08-10 15:35     ` Borislav Petkov
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2014-08-10  6:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86-ml, lkml, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Linus Torvalds, Jason Baron, Andrew Morton


[same mail, with Jason's email address corrected.]

* Ingo Molnar <mingo@kernel.org> wrote:

> * Borislav Petkov <bp@alien8.de> wrote:
> 
> > [ ... full RFC quoted again; snipped, see the original message above ... ]
 
Wouldn't using STATIC_KEY_INIT_TRUE and static_key_true() [instead of 
!static_key_false()] result in the same good code placement effects?
 
Thanks,
 
	Ingo


* Re: [RFC PATCH] Flipped jump labels
  2014-08-10  6:13   ` Ingo Molnar
@ 2014-08-10 15:35     ` Borislav Petkov
  2014-08-10 15:45       ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Borislav Petkov @ 2014-08-10 15:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86-ml, lkml, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Linus Torvalds, Jason Baron, Andrew Morton

On Sun, Aug 10, 2014 at 08:13:03AM +0200, Ingo Molnar wrote:
> Wouldn't using STATIC_KEY_INIT_TRUE and static_key_true() [instead of
> !static_key_false()] result in the same good code placement effects?

Nope, not really. static_key_true() is just !static_key_false(), so logically
we're not changing anything. And the generated asm is non-optimal here in any
case with the "true" version.
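
(With jump labels enabled, the generic helpers in include/linux/jump_label.h
are, roughly:

static __always_inline bool static_key_false(struct static_key *key)
{
	return arch_static_branch(key);
}

static __always_inline bool static_key_true(struct static_key *key)
{
	return !static_key_false(key);
}

i.e. both end up at the very same arch_static_branch() NOP site.)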

Also, the problem in this particular use case, native_sched_clock(), is that
we want to do some initialization first, before we enable the RDTSC path,
i.e., the likely branch.

So we want to have the following static layout after build:

JMP
likely
unlikely

and the initialization code turns the JMP into a NOP at runtime.
Basically something like alternatives. :-)

Current jump labels cannot be generated as jump-to-unlikely sites which get
changed at init time - they emit NOPs only.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [RFC PATCH] Flipped jump labels
  2014-08-10 15:35     ` Borislav Petkov
@ 2014-08-10 15:45       ` Ingo Molnar
  2014-08-10 16:07         ` Borislav Petkov
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2014-08-10 15:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86-ml, lkml, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Linus Torvalds, Jason Baron, Andrew Morton


* Borislav Petkov <bp@alien8.de> wrote:

> On Sun, Aug 10, 2014 at 08:13:03AM +0200, Ingo Molnar wrote:
> > Wouldn't using STATIC_KEY_INIT_TRUE and static_key_true() [instead of
> > !static_key_false()] result in the same good code placement effects?
> 
> Nope, not really. static_key_true() is !static_key_false() and we're not
> changing anything, logically. ASM looks non-optimal here, in any case,
> with the "true" version.

Indeed - but could we use that interface to cleanly expose the 
arch_static_branch_active() code you've written, or do we need new 
variants?

Thanks,

	Ingo


* Re: [RFC PATCH] Flipped jump labels
  2014-08-10 15:45       ` Ingo Molnar
@ 2014-08-10 16:07         ` Borislav Petkov
  2014-08-11  3:32           ` Jason Baron
  0 siblings, 1 reply; 8+ messages in thread
From: Borislav Petkov @ 2014-08-10 16:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86-ml, lkml, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Linus Torvalds, Jason Baron, Andrew Morton

On Sun, Aug 10, 2014 at 05:45:15PM +0200, Ingo Molnar wrote:
> Indeed - but could we use that interface to cleanly expose the
> arch_static_branch_active() code you've written, or do we need new
> variants?

We could probably.

The thing is, if we want to do the _active thing, the whole jump label
infrastructure would need to know about those - let's call them "new types" -
because they'd need different handling when enabling; see
__jump_label_transform() in arch/x86/kernel/jump_label.c with all its
NOP checks.
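
To illustrate - a simplified sketch only, not the actual kernel code - the
init-time sanity check would have to distinguish sites that were built as a
NOP from sites that were built as a JMP:

---
#include <assert.h>
#include <string.h>

/* the 5-byte NOP a stock jump label site contains after the build */
static const unsigned char jl_nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/*
 * Before patching a site, verify it still holds what the build put there.
 * A "flipped" site starts life as a JMP (0xe9 rel32, or a short 0xeb jump
 * plus padding), not as the default NOP, so the current "must contain the
 * default NOP on first enable" assumption breaks for it.
 */
static void verify_site(const unsigned char *site, int flipped)
{
	if (!flipped)
		assert(!memcmp(site, jl_nop5, 5));		/* built as NOP */
	else
		assert(site[0] == 0xe9 || site[0] == 0xeb);	/* built as JMP */
}
---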

Which raises the more important question: does adding those just to save us a
JMP penalty justify the additional code complexity? Frankly, I'm still on the
fence here and I'd rather do some perf measurements of a kernel build with and
without the JMP in native_sched_clock() to see whether it is even noticeable
or whether it disappears in the noise.

Because if it does disappear, the whole trouble is just for nothing.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [RFC PATCH] Flipped jump labels
  2014-08-10 16:07         ` Borislav Petkov
@ 2014-08-11  3:32           ` Jason Baron
  2014-08-11  8:42             ` Borislav Petkov
  0 siblings, 1 reply; 8+ messages in thread
From: Jason Baron @ 2014-08-11  3:32 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar
  Cc: x86-ml, lkml, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Linus Torvalds, Andrew Morton

On 08/10/2014 12:07 PM, Borislav Petkov wrote:
> On Sun, Aug 10, 2014 at 05:45:15PM +0200, Ingo Molnar wrote:
>> Indeed - but could we use that interface to cleanly expose the
>> arch_static_branch_active() code you've written, or do we need new
>> variants?
> We could probably.
>
> The thing is, if we want to do the _active thing, the whole jump labels
> infrastructure would need to know about those, let's call them "new
> types" because they'd need different handling when enabling - see
> __jump_label_transform() in arch/x86/kernel/jump_label.c with all the
> NOP checks.
>
> Which begs the more important question: is adding those just to save us
> a JMP penalty justify the additional code complexity. Frankly, I'm still
> on the fence here and I'd rather do some perf measurements of a kernel
> build with and without the JMP in native_sched_clock() to see whether it
> is even noticeable or it disappears in the noise.
>
> Because if it does disappear, the whole trouble is just for nothing.
>
> Thanks.
>

That is correct. We don't currently support having the default
branch direction or the fast path be different from how the
'static_key' is initialized.

If I understand your use case correctly, you can't have the TSC path be the
default *before* tsc_init() is called?

If not, another thought on the implementation here might be to revisit the
work that Steven Rostedt proposed a while back: use jumps instead of NOPs by
default, and then convert the jumps to NOPs in a post-processing phase in
order to reduce code size (by having 2-byte jumps, for example, on x86).
Potentially, we could then avoid converting some of the jumps if they didn't
match the default branch direction.
See: http://www.serverphorums.com/read.php?12,759207

That way there is no API change; we are just relaxing the restriction that
the default branch direction must match the way the keys are initialized.

Thanks,

-Jason




* Re: [RFC PATCH] Flipped jump labels
  2014-08-11  3:32           ` Jason Baron
@ 2014-08-11  8:42             ` Borislav Petkov
  0 siblings, 0 replies; 8+ messages in thread
From: Borislav Petkov @ 2014-08-11  8:42 UTC (permalink / raw)
  To: Jason Baron
  Cc: Ingo Molnar, x86-ml, lkml, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner, Linus Torvalds, Andrew Morton

On Sun, Aug 10, 2014 at 11:32:31PM -0400, Jason Baron wrote:
> That is correct. We don't currently support having the default
> branch direction or the fast path be different from how the
> 'static_key' is initialized.
> 
> If I understand your use-case correctly,  you can't have the tsc
> path be the default *before* tsc_init() is called?

Yes. And we also want it to come first in the function, so that nopping out
the JMP later turns it into a zero-cost path.

> If not, another thought on the implementation here might be
> to re-visit the work the Steven Rostedt  proposed a while back,
> to use jump instead of no-ops by default, and then convert
> the jumps to no-ops in a post-processing phase in order to
> reduce code size (by having 2-byte jumps for example on
> x86). Potentially, we could then avoid converting some of the
> jumps, if they didn't match the default branch direction.
> See: http://www.serverphorums.com/read.php?12,759207

Haha, so I did a very similar thing too, independently of Steve (patch at the
end):

https://lkml.kernel.org/r/20140809105742.GA5910@pd.tnic

but mine is dumber.

I'll dig out Steve's version and play with it a bit.

> In that way there is no API change, we are just relaxing the
> restriction that the default branch direction must match the
> way that the keys are initialized.

Which would make the change even smaller and invisible to users, good!

Thanks.


Thread overview: 8+ messages
2014-08-09 10:57 [RFC PATCH] Flipped jump labels Borislav Petkov
2014-08-10  6:11 ` Ingo Molnar
2014-08-10  6:13   ` Ingo Molnar
2014-08-10 15:35     ` Borislav Petkov
2014-08-10 15:45       ` Ingo Molnar
2014-08-10 16:07         ` Borislav Petkov
2014-08-11  3:32           ` Jason Baron
2014-08-11  8:42             ` Borislav Petkov
