* [RFC PATCH] Flipped jump labels
@ 2014-08-09 10:57 Borislav Petkov
2014-08-10 6:11 ` Ingo Molnar
0 siblings, 1 reply; 8+ messages in thread
From: Borislav Petkov @ 2014-08-09 10:57 UTC (permalink / raw)
To: x86-ml, lkml; +Cc: Peter Zijlstra, Steven Rostedt, Thomas Gleixner
Hi dudes,
with the current implementation of jump labels, people can't really do
the following:
---
JMP unlikely_code
likely_code
unlikely_code:
unlikely code
---
and then, after some initialization queries, overwrite the JMP with a
NOP so that the likely code gets executed at 0 cost.
The issue is that jump labels unconditionally add NOPs by default
(see arch_static_branch). For example, native_sched_clock() gets the
following code layout here:
--
NOP
unlikely code (which computes time in ns from jiffies)
likely code (which does RDTSC)
--
Yes, unlikely code comes first.
When the jump labels get initialized and all checks are done, at runtime
we have this:
0xffffffff8100ce40 <sched_clock>: push %rbp
0xffffffff8100ce41 <sched_clock+1>: mov %rsp,%rbp
0xffffffff8100ce44 <sched_clock+4>: and $0xfffffffffffffff0,%rsp
unconditional JMP!!!
0xffffffff8100ce48 <sched_clock+8>: jmpq 0xffffffff8100ce70 <sched_clock+48>
unlikely code using jiffies
0xffffffff8100ce4d <sched_clock+13>: mov 0x9a71ac(%rip),%r8 # 0xffffffff819b4000 <jiffies_64>
0xffffffff8100ce54 <sched_clock+20>: movabs $0xffc2f745d964b800,%rax
0xffffffff8100ce5e <sched_clock+30>: leaveq
0xffffffff8100ce5f <sched_clock+31>: imul $0x3d0900,%r8,%rdx
0xffffffff8100ce66 <sched_clock+38>: add %rdx,%rax
0xffffffff8100ce69 <sched_clock+41>: retq
0xffffffff8100ce6a <sched_clock+42>: nopw 0x0(%rax,%rax,1)
likely code using RDTSC, see JMP target address.
0xffffffff8100ce70 <sched_clock+48>: rdtsc
So what we end up with is not really helping, because we always pay for
that JMP on all modern systems which sport RDTSC, even though we
shouldn't.
And remember this is not something cheap: sched_clock uses the TSC
even if it is unstable so we're always jumping like insane and
unconditionally.
So, long story short, we want this, instead:
0xffffffff8100cf10 <sched_clock>: push %rbp
0xffffffff8100cf11 <sched_clock+1>: mov %rsp,%rbp
0xffffffff8100cf14 <sched_clock+4>: and $0xfffffffffffffff0,%rsp
unconditional JMP is nopped out
0xffffffff8100cf18 <sched_clock+8>: data32 data32 data32 xchg %ax,%ax
likely code, which comes first in the function so that all the front-end
and branch-prediction advantages, yadda yadda, get to be enjoyed :)
0xffffffff8100cf1d <sched_clock+13>: rdtsc
0xffffffff8100cf1f <sched_clock+15>: mov %eax,%esi
0xffffffff8100cf21 <sched_clock+17>: mov %rdx,%rax
0xffffffff8100cf24 <sched_clock+20>: shl $0x20,%rax
0xffffffff8100cf28 <sched_clock+24>: or %rsi,%rax
0xffffffff8100cf2b <sched_clock+27>: mov %rax,%rcx
0xffffffff8100cf2e <sched_clock+30>: incl %gs:0xb8e0
0xffffffff8100cf36 <sched_clock+38>: mov %gs:0x1d0c30,%rsi
0xffffffff8100cf3f <sched_clock+47>: mov %gs:0x1d0c38,%rax
0xffffffff8100cf48 <sched_clock+56>: cmp %rax,%rsi
0xffffffff8100cf4b <sched_clock+59>: jne 0xffffffff8100cf90 <sched_clock+128>
0xffffffff8100cf4d <sched_clock+61>: mov (%rsi),%eax
0xffffffff8100cf4f <sched_clock+63>: mul %rcx
0xffffffff8100cf52 <sched_clock+66>: shrd $0xa,%rdx,%rax
0xffffffff8100cf57 <sched_clock+71>: add 0x8(%rsi),%rax
0xffffffff8100cf5b <sched_clock+75>: decl %gs:0xb8e0
0xffffffff8100cf63 <sched_clock+83>: je 0xffffffff8100cf88 <sched_clock+120>
0xffffffff8100cf65 <sched_clock+85>: leaveq
0xffffffff8100cf66 <sched_clock+86>: retq
Done, we return here.
0xffffffff8100cf67 <sched_clock+87>: nop
unlikely code which does the jiffies math.
0xffffffff8100cf68 <sched_clock+88>: mov 0x9a7091(%rip),%rax # 0xffffffff819b4000 <jiffies_64>
0xffffffff8100cf6f <sched_clock+95>: leaveq
0xffffffff8100cf70 <sched_clock+96>: imul $0x3d0900,%rax,%rdx
...
So basically what I'm proposing is a jump label type which is
initialized by default to jump to the unlikely code; once
initialization has happened, the JMP gets overwritten.
The things to pay attention to here are:
* This label should be used in places where it is very likely for the
jump to get overwritten - basically the opposite of tracepoints, for
which the jump labels were created initially, to be mostly off.
* It must be used in places where the JMP gets overwritten only after
some initialization has been done first.
Anyway, below is a concept patch for myself to try the idea first - it
seems to work here. Constructive ideas and suggestions are welcome, as
always.
---
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 6a2cefb4395a..2d963c6489a8 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -30,6 +30,22 @@ l_yes:
return true;
}
+static __always_inline bool arch_static_branch_active(struct static_key *key)
+{
+ asm_volatile_goto("1:"
+ "jmp %l[l_yes]\n\t"
+ ".byte " __stringify(GENERIC_NOP3) "\n\t"
+ ".pushsection __jump_table, \"aw\" \n\t"
+ _ASM_ALIGN "\n\t"
+ _ASM_PTR "1b, %l[l_yes], %c0 \n\t"
+ ".popsection \n\t"
+ : : "i" (key) : : l_yes);
+ return false;
+l_yes:
+ return true;
+}
+
+
#endif /* __KERNEL__ */
#ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 4ca327e900ae..81bc2c4a7eab 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -33,7 +33,7 @@ EXPORT_SYMBOL(tsc_khz);
*/
static int __read_mostly tsc_unstable;
-static struct static_key __use_tsc = STATIC_KEY_INIT;
+static struct static_key __use_tsc = STATIC_KEY_INIT_FALSE;
int tsc_clocksource_reliable;
@@ -280,7 +280,7 @@ u64 native_sched_clock(void)
* very important for it to be as fast as the platform
* can achieve it. )
*/
- if (!static_key_false(&__use_tsc)) {
+ if (arch_static_branch_active(&__use_tsc)) {
/* No locking but a rare wrong value is not a big deal: */
return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);
}
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [RFC PATCH] Flipped jump labels
2014-08-09 10:57 [RFC PATCH] Flipped jump labels Borislav Petkov
@ 2014-08-10 6:11 ` Ingo Molnar
2014-08-10 6:13 ` Ingo Molnar
0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2014-08-10 6:11 UTC (permalink / raw)
To: Borislav Petkov
Cc: x86-ml, lkml, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
Linus Torvalds, Jason Baron, Andrew Morton
* Borislav Petkov <bp@alien8.de> wrote:
> [...]
> -static struct static_key __use_tsc = STATIC_KEY_INIT;
> +static struct static_key __use_tsc = STATIC_KEY_INIT_FALSE;
>
> int tsc_clocksource_reliable;
>
> @@ -280,7 +280,7 @@ u64 native_sched_clock(void)
> * very important for it to be as fast as the platform
> * can achieve it. )
> */
> - if (!static_key_false(&__use_tsc)) {
> + if (arch_static_branch_active(&__use_tsc)) {
> /* No locking but a rare wrong value is not a big deal: */
> return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);
> }
Wouldn't using STATIC_KEY_INIT_TRUE and static_key_true() [instead of
!static_key_false()] result in the same good code placement effects?
Thanks,
Ingo
* Re: [RFC PATCH] Flipped jump labels
2014-08-10 6:11 ` Ingo Molnar
@ 2014-08-10 6:13 ` Ingo Molnar
2014-08-10 15:35 ` Borislav Petkov
0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2014-08-10 6:13 UTC (permalink / raw)
To: Borislav Petkov
Cc: x86-ml, lkml, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
Linus Torvalds, Jason Baron, Andrew Morton
[same mail, with Jason's email address corrected.]
* Ingo Molnar <mingo@kernel.org> wrote:
> * Borislav Petkov <bp@alien8.de> wrote:
>
> > [...]
Wouldn't using STATIC_KEY_INIT_TRUE and static_key_true() [instead of
!static_key_false()] result in the same good code placement effects?
Thanks,
Ingo
* Re: [RFC PATCH] Flipped jump labels
2014-08-10 6:13 ` Ingo Molnar
@ 2014-08-10 15:35 ` Borislav Petkov
2014-08-10 15:45 ` Ingo Molnar
0 siblings, 1 reply; 8+ messages in thread
From: Borislav Petkov @ 2014-08-10 15:35 UTC (permalink / raw)
To: Ingo Molnar
Cc: x86-ml, lkml, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
Linus Torvalds, Jason Baron, Andrew Morton
On Sun, Aug 10, 2014 at 08:13:03AM +0200, Ingo Molnar wrote:
> Wouldn't using STATIC_KEY_INIT_TRUE and static_key_true() [instead of
> !static_key_false()] result in the same good code placement effects?
Nope, not really. static_key_true() is !static_key_false(), so we're not
changing anything logically. The asm looks non-optimal here with the
"true" version, in any case.
Also, the problem in this particular use case in native_sched_clock is
that we want to do some initialization first before we enable the RDTSC,
i.e., the likely branch.
So we want to have the following static layout after build:
JMP
likely
unlikely
and the initialization code turns the JMP into a NOP at runtime.
Basically something like alternatives. :-)
Current jump labels cannot be generated as jump-to-unlikely labels
which get changed at init time - they issue NOPs only.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
* Re: [RFC PATCH] Flipped jump labels
2014-08-10 15:35 ` Borislav Petkov
@ 2014-08-10 15:45 ` Ingo Molnar
2014-08-10 16:07 ` Borislav Petkov
0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2014-08-10 15:45 UTC (permalink / raw)
To: Borislav Petkov
Cc: x86-ml, lkml, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
Linus Torvalds, Jason Baron, Andrew Morton
* Borislav Petkov <bp@alien8.de> wrote:
> On Sun, Aug 10, 2014 at 08:13:03AM +0200, Ingo Molnar wrote:
> > Wouldn't using STATIC_KEY_INIT_TRUE and static_key_true() [instead of
> > !static_key_false()] result in the same good code placement effects?
>
> Nope, not really. static_key_true() is !static_key_false() and we're not
> changing anything, logically. ASM looks non-optimal here, in any case,
> with the "true" version.
Indeed - but could we use that interface to cleanly expose the
arch_static_branch_active() code you've written, or do we need new
variants?
Thanks,
Ingo
* Re: [RFC PATCH] Flipped jump labels
2014-08-10 15:45 ` Ingo Molnar
@ 2014-08-10 16:07 ` Borislav Petkov
2014-08-11 3:32 ` Jason Baron
0 siblings, 1 reply; 8+ messages in thread
From: Borislav Petkov @ 2014-08-10 16:07 UTC (permalink / raw)
To: Ingo Molnar
Cc: x86-ml, lkml, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
Linus Torvalds, Jason Baron, Andrew Morton
On Sun, Aug 10, 2014 at 05:45:15PM +0200, Ingo Molnar wrote:
> Indeed - but could we use that interface to cleanly expose the
> arch_static_branch_active() code you've written, or do we need new
> variants?
We could probably.
The thing is, if we want to do the _active thing, the whole jump labels
infrastructure would need to know about those, let's call them "new
types" because they'd need different handling when enabling - see
__jump_label_transform() in arch/x86/kernel/jump_label.c with all the
NOP checks.
Which raises the more important question: does adding those just to save
us a JMP penalty justify the additional code complexity? Frankly, I'm
still on the fence here and I'd rather do some perf measurements of a
kernel build with and without the JMP in native_sched_clock() to see
whether it is even noticeable or whether it disappears in the noise.
Because if it does disappear, the whole trouble is just for nothing.
Thanks.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
* Re: [RFC PATCH] Flipped jump labels
2014-08-10 16:07 ` Borislav Petkov
@ 2014-08-11 3:32 ` Jason Baron
2014-08-11 8:42 ` Borislav Petkov
0 siblings, 1 reply; 8+ messages in thread
From: Jason Baron @ 2014-08-11 3:32 UTC (permalink / raw)
To: Borislav Petkov, Ingo Molnar
Cc: x86-ml, lkml, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
Linus Torvalds, Andrew Morton
On 08/10/2014 12:07 PM, Borislav Petkov wrote:
> On Sun, Aug 10, 2014 at 05:45:15PM +0200, Ingo Molnar wrote:
>> Indeed - but could we use that interface to cleanly expose the
>> arch_static_branch_active() code you've written, or do we need new
>> variants?
> We could probably.
>
> The thing is, if we want to do the _active thing, the whole jump labels
> infrastructure would need to know about those, let's call them "new
> types" because they'd need different handling when enabling - see
> __jump_label_transform() in arch/x86/kernel/jump_label.c with all the
> NOP checks.
>
> Which begs the more important question: is adding those just to save us
> a JMP penalty justify the additional code complexity. Frankly, I'm still
> on the fence here and I'd rather do some perf measurements of a kernel
> build with and without the JMP in native_sched_clock() to see whether it
> is even noticeable or it disappears in the noise.
>
> Because if it does disappear, the whole trouble is just for nothing.
>
> Thanks.
>
That is correct. We don't currently support having the default
branch direction or the fast path be different from how the
'static_key' is initialized.
If I understand your use-case correctly, you can't have the tsc
path be the default *before* tsc_init() is called?
If not, another thought on the implementation here might be
to re-visit the work that Steven Rostedt proposed a while back:
use jumps instead of no-ops by default, and then convert
the jumps to no-ops in a post-processing phase in order to
reduce code size (by having 2-byte jumps, for example, on
x86). Potentially, we could then avoid converting some of the
jumps if they didn't match the default branch direction.
See: http://www.serverphorums.com/read.php?12,759207
In that way there is no API change, we are just relaxing the
restriction that the default branch direction must match the
way that the keys are initialized.
Thanks,
-Jason
* Re: [RFC PATCH] Flipped jump labels
2014-08-11 3:32 ` Jason Baron
@ 2014-08-11 8:42 ` Borislav Petkov
0 siblings, 0 replies; 8+ messages in thread
From: Borislav Petkov @ 2014-08-11 8:42 UTC (permalink / raw)
To: Jason Baron
Cc: Ingo Molnar, x86-ml, lkml, Peter Zijlstra, Steven Rostedt,
Thomas Gleixner, Linus Torvalds, Andrew Morton
On Sun, Aug 10, 2014 at 11:32:31PM -0400, Jason Baron wrote:
> That is correct. We don't currently support having the default
> branch direction or the fast path be different from how the
> 'static_key' is initialized.
>
> If I understand your use-case correctly, you can't have the tsc
> path be the default *before* tsc_init() is called?
Yes. And we also want it to come first in the function, so that nopping
out the JMP later turns it into a 0-cost path.
> If not, another thought on the implementation here might be
> to re-visit the work the Steven Rostedt proposed a while back,
> to use jump instead of no-ops by default, and then convert
> the jumps to no-ops in a post-processing phase in order to
> reduce code size (by having 2-byte jumps for example on
> x86). Potentially, we could then avoid converting some of the
> jumps, if they didn't match the default branch direction.
> See: http://www.serverphorums.com/read.php?12,759207
Haha, so I did a very similar thing too, independently of Steve (patch
at the end):
https://lkml.kernel.org/r/20140809105742.GA5910@pd.tnic
but mine is dumber.
I'll dig out Steve's version and play with it a bit.
> In that way there is no API change, we are just relaxing the
> restriction that the default branch direction must match the
> way that the keys are initialized.
Which would make the change even smaller and invisible to users, good!
Thanks.