linux-kernel.vger.kernel.org archive mirror
* [PATCH RFC V2 0/5] jump-label: allow early jump_label_enable()
@ 2011-10-01 21:55 Jeremy Fitzhardinge
  2011-10-01 21:55 ` [PATCH RFC V2 1/5] jump_label: use proper atomic_t initializer Jeremy Fitzhardinge
                   ` (4 more replies)
  0 siblings, 5 replies; 53+ messages in thread
From: Jeremy Fitzhardinge @ 2011-10-01 21:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: David S. Miller, David Daney, Michael Ellerman, Jan Glauber,
	Jason Baron, the arch/x86 maintainers, Xen Devel,
	Linux Kernel Mailing List, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

[ This is a new version of this series, which drops the need for any
  special _early() function in favour of making sure stop_machine()
  can operate correctly and efficiently in a pre-SMP environment. ]

While trying to use the jump-label stuff for my PV ticketlock changes,
I had some problems using jump labels early in the kernel's lifetime
(pre-SMP).

The basic problem is that even if I enable a jump_label_key, the
jump_label_init() initializer ends up nopping out all the code sites.

This series enables early use of jump labels by making
jump_label_init() respect already-enabled keys.

To do this, I've dropped arch_jump_label_text_poke_early() and
replaced it with arch_jump_label_transform(), allowing it to either
insert an optimal nop or leave the jump in place.

Part of this series makes sure that stop_machine() is safe to call
in an early pre-SMP environment, by making it just call the function
with interrupts disabled.
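The pre-SMP fallback described above can be sketched as a small user-space
simulation (the irq helpers here are hypothetical stand-ins for the kernel
APIs; the actual change is in kernel/stop_machine.c in patch 2/5):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Set by cpu_stop_init() once the stopper threads exist (sketch). */
static bool stop_machine_initialized = false;

static int irqs_off_calls;

/* Stand-ins for local_irq_save()/hard_irq_disable()/local_irq_restore(). */
static void local_irq_save(unsigned long *flags) { *flags = 1; }
static void hard_irq_disable(void) { irqs_off_calls++; }
static void local_irq_restore(unsigned long flags) { (void)flags; }

static int stop_machine(int (*fn)(void *), void *data)
{
	if (!stop_machine_initialized) {
		/* Early boot, one CPU: just run fn directly, irqs off. */
		unsigned long flags;
		int ret;

		local_irq_save(&flags);
		hard_irq_disable();
		ret = fn(data);
		local_irq_restore(flags);
		return ret;
	}
	/* The real kernel schedules fn on all online CPUs here. */
	return fn(data);
}

static int patch_site(void *data)
{
	(void)data;
	return 42; /* pretend we patched something */
}
```

A call such as stop_machine(patch_site, NULL) made before the flag is set
simply runs the callback with interrupts masked and returns its value.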

git://github.com/jsgf/linux-xen upstream/jump-label-noearly

Jeremy Fitzhardinge (5):
  jump_label: use proper atomic_t initializer
  stop_machine: make stop_machine safe and efficient to call early
  jump_label: if a key has already been initialized, don't nop it out
  x86/jump_label: drop arch_jump_label_text_poke_early()
  sparc/jump_label: drop arch_jump_label_text_poke_early()

 arch/sparc/kernel/jump_label.c |    8 --------
 arch/x86/kernel/jump_label.c   |    6 ------
 include/linux/jump_label.h     |    7 ++++---
 kernel/jump_label.c            |   20 ++++++++------------
 kernel/stop_machine.c          |   21 +++++++++++++++++++++
 5 files changed, 33 insertions(+), 29 deletions(-)

-- 
1.7.6.2


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH RFC V2 1/5] jump_label: use proper atomic_t initializer
  2011-10-01 21:55 [PATCH RFC V2 0/5] jump-label: allow early jump_label_enable() Jeremy Fitzhardinge
@ 2011-10-01 21:55 ` Jeremy Fitzhardinge
  2011-10-01 21:55 ` [PATCH RFC V2 2/5] stop_machine: make stop_machine safe and efficient to call early Jeremy Fitzhardinge
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 53+ messages in thread
From: Jeremy Fitzhardinge @ 2011-10-01 21:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: David S. Miller, David Daney, Michael Ellerman, Jan Glauber,
	Jason Baron, the arch/x86 maintainers, Xen Devel,
	Linux Kernel Mailing List, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

ATOMIC_INIT() is the proper thing to use.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 include/linux/jump_label.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 66f23dc..1213e9d 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -28,9 +28,9 @@ struct module;
 #ifdef HAVE_JUMP_LABEL
 
 #ifdef CONFIG_MODULES
-#define JUMP_LABEL_INIT {{ 0 }, NULL, NULL}
+#define JUMP_LABEL_INIT {ATOMIC_INIT(0), NULL, NULL}
 #else
-#define JUMP_LABEL_INIT {{ 0 }, NULL}
+#define JUMP_LABEL_INIT {ATOMIC_INIT(0), NULL}
 #endif
 
 static __always_inline bool static_branch(struct jump_label_key *key)
-- 
1.7.6.2



* [PATCH RFC V2 2/5] stop_machine: make stop_machine safe and efficient to call early
  2011-10-01 21:55 [PATCH RFC V2 0/5] jump-label: allow early jump_label_enable() Jeremy Fitzhardinge
  2011-10-01 21:55 ` [PATCH RFC V2 1/5] jump_label: use proper atomic_t initializer Jeremy Fitzhardinge
@ 2011-10-01 21:55 ` Jeremy Fitzhardinge
  2011-10-02  0:36   ` Tejun Heo
  2011-10-03 19:24   ` [Xen-devel] " Konrad Rzeszutek Wilk
  2011-10-01 21:55 ` [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out Jeremy Fitzhardinge
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 53+ messages in thread
From: Jeremy Fitzhardinge @ 2011-10-01 21:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: David S. Miller, David Daney, Michael Ellerman, Jan Glauber,
	Jason Baron, the arch/x86 maintainers, Xen Devel,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, Tejun Heo,
	Rusty Russell, Peter Zijlstra, Andrew Morton, H. Peter Anvin,
	Ingo Molnar

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Make stop_machine() safe to call early in boot, before SMP has been
set up, by simply calling the callback function directly if there's
only one CPU online.

[ Fixes from AKPM:
   - add comment
   - local_irq_flags, not save_flags
   - also call hard_irq_disable() for systems which need it

  Tejun suggested using an explicit flag rather than just looking at
  the online cpu count. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/stop_machine.c |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index ba5070c..9c59d9e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -41,6 +41,7 @@ struct cpu_stopper {
 };
 
 static DEFINE_PER_CPU(struct cpu_stopper, cpu_stopper);
+static bool stop_machine_initialized = false;
 
 static void cpu_stop_init_done(struct cpu_stop_done *done, unsigned int nr_todo)
 {
@@ -386,6 +387,8 @@ static int __init cpu_stop_init(void)
 	cpu_stop_cpu_callback(&cpu_stop_cpu_notifier, CPU_ONLINE, bcpu);
 	register_cpu_notifier(&cpu_stop_cpu_notifier);
 
+	stop_machine_initialized = true;
+
 	return 0;
 }
 early_initcall(cpu_stop_init);
@@ -485,6 +488,24 @@ int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
 					    .num_threads = num_online_cpus(),
 					    .active_cpus = cpus };
 
+	if (!stop_machine_initialized) {
+		/*
+		 * Handle the case where stop_machine() is called early in boot
+		 * before SMP startup.
+		 */
+ 		unsigned long flags;
+		int ret;
+
+		WARN_ON_ONCE(smdata.num_threads != 1);
+
+		local_irq_save(flags);
+		hard_irq_disable();
+		ret = (*fn)(data);
+		local_irq_restore(flags);
+
+		return ret;
+	}
+
 	/* Set the initial state and stop all online cpus. */
 	set_state(&smdata, STOPMACHINE_PREPARE);
 	return stop_cpus(cpu_online_mask, stop_machine_cpu_stop, &smdata);
-- 
1.7.6.2



* [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-01 21:55 [PATCH RFC V2 0/5] jump-label: allow early jump_label_enable() Jeremy Fitzhardinge
  2011-10-01 21:55 ` [PATCH RFC V2 1/5] jump_label: use proper atomic_t initializer Jeremy Fitzhardinge
  2011-10-01 21:55 ` [PATCH RFC V2 2/5] stop_machine: make stop_machine safe and efficient to call early Jeremy Fitzhardinge
@ 2011-10-01 21:55 ` Jeremy Fitzhardinge
  2011-10-03 15:02   ` Jason Baron
  2011-10-10 15:36   ` [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out Jason Baron
  2011-10-01 21:55 ` [PATCH RFC V2 4/5] x86/jump_label: drop arch_jump_label_text_poke_early() Jeremy Fitzhardinge
  2011-10-01 21:55 ` [PATCH RFC V2 5/5] sparc/jump_label: " Jeremy Fitzhardinge
  4 siblings, 2 replies; 53+ messages in thread
From: Jeremy Fitzhardinge @ 2011-10-01 21:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: David S. Miller, David Daney, Michael Ellerman, Jan Glauber,
	Jason Baron, the arch/x86 maintainers, Xen Devel,
	Linux Kernel Mailing List, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

If a key has been enabled before jump_label_init() is called, don't
nop it out.

This removes arch_jump_label_text_poke_early() (which can only nop
out a site) and uses arch_jump_label_transform() instead.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 include/linux/jump_label.h |    3 ++-
 kernel/jump_label.c        |   20 ++++++++------------
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 1213e9d..c8fb1b3 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -45,7 +45,8 @@ extern void jump_label_lock(void);
 extern void jump_label_unlock(void);
 extern void arch_jump_label_transform(struct jump_entry *entry,
 				 enum jump_label_type type);
-extern void arch_jump_label_text_poke_early(jump_label_t addr);
+extern void arch_jump_label_transform_early(struct jump_entry *entry,
+				 enum jump_label_type type);
 extern int jump_label_text_reserved(void *start, void *end);
 extern void jump_label_inc(struct jump_label_key *key);
 extern void jump_label_dec(struct jump_label_key *key);
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index a8ce450..059202d5 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -121,13 +121,6 @@ static void __jump_label_update(struct jump_label_key *key,
 	}
 }
 
-/*
- * Not all archs need this.
- */
-void __weak arch_jump_label_text_poke_early(jump_label_t addr)
-{
-}
-
 static __init int jump_label_init(void)
 {
 	struct jump_entry *iter_start = __start___jump_table;
@@ -139,12 +132,15 @@ static __init int jump_label_init(void)
 	jump_label_sort_entries(iter_start, iter_stop);
 
 	for (iter = iter_start; iter < iter_stop; iter++) {
-		arch_jump_label_text_poke_early(iter->code);
-		if (iter->key == (jump_label_t)(unsigned long)key)
+		struct jump_label_key *iterk;
+
+		iterk = (struct jump_label_key *)(unsigned long)iter->key;
+		arch_jump_label_transform(iter, jump_label_enabled(iterk) ?
+					  JUMP_LABEL_ENABLE : JUMP_LABEL_DISABLE);
+		if (iterk == key)
 			continue;
 
-		key = (struct jump_label_key *)(unsigned long)iter->key;
-		atomic_set(&key->enabled, 0);
+		key = iterk;
 		key->entries = iter;
 #ifdef CONFIG_MODULES
 		key->next = NULL;
@@ -212,7 +208,7 @@ void jump_label_apply_nops(struct module *mod)
 		return;
 
 	for (iter = iter_start; iter < iter_stop; iter++)
-		arch_jump_label_text_poke_early(iter->code);
+		arch_jump_label_transform(iter, JUMP_LABEL_DISABLE);
 }
 
 static int jump_label_add_module(struct module *mod)
-- 
1.7.6.2



* [PATCH RFC V2 4/5] x86/jump_label: drop arch_jump_label_text_poke_early()
  2011-10-01 21:55 [PATCH RFC V2 0/5] jump-label: allow early jump_label_enable() Jeremy Fitzhardinge
                   ` (2 preceding siblings ...)
  2011-10-01 21:55 ` [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out Jeremy Fitzhardinge
@ 2011-10-01 21:55 ` Jeremy Fitzhardinge
  2011-10-01 21:55 ` [PATCH RFC V2 5/5] sparc/jump_label: " Jeremy Fitzhardinge
  4 siblings, 0 replies; 53+ messages in thread
From: Jeremy Fitzhardinge @ 2011-10-01 21:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: David S. Miller, David Daney, Michael Ellerman, Jan Glauber,
	Jason Baron, the arch/x86 maintainers, Xen Devel,
	Linux Kernel Mailing List, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

It is no longer used.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jason Baron <jbaron@redhat.com>
---
 arch/x86/kernel/jump_label.c |    6 ------
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index 3fee346..2ad0298 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -42,10 +42,4 @@ void arch_jump_label_transform(struct jump_entry *entry,
 	put_online_cpus();
 }
 
-void arch_jump_label_text_poke_early(jump_label_t addr)
-{
-	text_poke_early((void *)addr, ideal_nops[NOP_ATOMIC5],
-			JUMP_LABEL_NOP_SIZE);
-}
-
 #endif
-- 
1.7.6.2



* [PATCH RFC V2 5/5] sparc/jump_label: drop arch_jump_label_text_poke_early()
  2011-10-01 21:55 [PATCH RFC V2 0/5] jump-label: allow early jump_label_enable() Jeremy Fitzhardinge
                   ` (3 preceding siblings ...)
  2011-10-01 21:55 ` [PATCH RFC V2 4/5] x86/jump_label: drop arch_jump_label_text_poke_early() Jeremy Fitzhardinge
@ 2011-10-01 21:55 ` Jeremy Fitzhardinge
  4 siblings, 0 replies; 53+ messages in thread
From: Jeremy Fitzhardinge @ 2011-10-01 21:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: David S. Miller, David Daney, Michael Ellerman, Jan Glauber,
	Jason Baron, the arch/x86 maintainers, Xen Devel,
	Linux Kernel Mailing List, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

It is no longer used.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: David Miller <davem@davemloft.net>
---
 arch/sparc/kernel/jump_label.c |    8 --------
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/arch/sparc/kernel/jump_label.c b/arch/sparc/kernel/jump_label.c
index ea2dafc..971fd43 100644
--- a/arch/sparc/kernel/jump_label.c
+++ b/arch/sparc/kernel/jump_label.c
@@ -36,12 +36,4 @@ void arch_jump_label_transform(struct jump_entry *entry,
 	put_online_cpus();
 }
 
-void arch_jump_label_text_poke_early(jump_label_t addr)
-{
-	u32 *insn_p = (u32 *) (unsigned long) addr;
-
-	*insn_p = 0x01000000;
-	flushi(insn_p);
-}
-
 #endif
-- 
1.7.6.2



* Re: [PATCH RFC V2 2/5] stop_machine: make stop_machine safe and efficient to call early
  2011-10-01 21:55 ` [PATCH RFC V2 2/5] stop_machine: make stop_machine safe and efficient to call early Jeremy Fitzhardinge
@ 2011-10-02  0:36   ` Tejun Heo
  2011-10-03 19:24   ` [Xen-devel] " Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 53+ messages in thread
From: Tejun Heo @ 2011-10-02  0:36 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Steven Rostedt, David S. Miller, David Daney, Michael Ellerman,
	Jan Glauber, Jason Baron, the arch/x86 maintainers, Xen Devel,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, Rusty Russell,
	Peter Zijlstra, Andrew Morton, H. Peter Anvin, Ingo Molnar

Hello,

On Sat, Oct 01, 2011 at 02:55:34PM -0700, Jeremy Fitzhardinge wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
> Make stop_machine() safe to call early in boot, before SMP has been
> set up, by simply calling the callback function directly if there's
> only one CPU online.

Maybe "before stop_machine is initialized" is better wording now both
here and in the comment?

> [ Fixes from AKPM:
>    - add comment
>    - local_irq_flags, not save_flags
>    - also call hard_irq_disable() for systems which need it
> 
>   Tejun suggested using an explicit flag rather than just looking at
>   the online cpu count. ]
> 
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: H. Peter Anvin <hpa@linux.intel.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Steven Rostedt <rostedt@goodmis.org>

Other than that,

 Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-01 21:55 ` [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out Jeremy Fitzhardinge
@ 2011-10-03 15:02   ` Jason Baron
  2011-10-03 15:47     ` Steven Rostedt
  2011-10-03 16:27     ` Jeremy Fitzhardinge
  2011-10-10 15:36   ` [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out Jason Baron
  1 sibling, 2 replies; 53+ messages in thread
From: Jason Baron @ 2011-10-03 15:02 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Steven Rostedt, David S. Miller, David Daney, Michael Ellerman,
	Jan Glauber, the arch/x86 maintainers, Xen Devel,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, peterz

Hi,

(Sorry for the late reply - I was away for a few days).

The early enable is really nice - it means there are no restrictions on
when jump_label_inc()/dec() can be called, which is nice.

comments below.


On Sat, Oct 01, 2011 at 02:55:35PM -0700, Jeremy Fitzhardinge wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
> If a key has been enabled before jump_label_init() is called, don't
> nop it out.
> 
> This removes arch_jump_label_text_poke_early() (which can only nop
> out a site) and uses arch_jump_label_transform() instead.
> 
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> ---
>  include/linux/jump_label.h |    3 ++-
>  kernel/jump_label.c        |   20 ++++++++------------
>  2 files changed, 10 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
> index 1213e9d..c8fb1b3 100644
> --- a/include/linux/jump_label.h
> +++ b/include/linux/jump_label.h
> @@ -45,7 +45,8 @@ extern void jump_label_lock(void);
>  extern void jump_label_unlock(void);
>  extern void arch_jump_label_transform(struct jump_entry *entry,
>  				 enum jump_label_type type);
> -extern void arch_jump_label_text_poke_early(jump_label_t addr);
> +extern void arch_jump_label_transform_early(struct jump_entry *entry,
> +				 enum jump_label_type type);
>  extern int jump_label_text_reserved(void *start, void *end);
>  extern void jump_label_inc(struct jump_label_key *key);
>  extern void jump_label_dec(struct jump_label_key *key);
> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> index a8ce450..059202d5 100644
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -121,13 +121,6 @@ static void __jump_label_update(struct jump_label_key *key,
>  	}
>  }
>  
> -/*
> - * Not all archs need this.
> - */
> -void __weak arch_jump_label_text_poke_early(jump_label_t addr)
> -{
> -}
> -
>  static __init int jump_label_init(void)
>  {
>  	struct jump_entry *iter_start = __start___jump_table;
> @@ -139,12 +132,15 @@ static __init int jump_label_init(void)
>  	jump_label_sort_entries(iter_start, iter_stop);
>  
>  	for (iter = iter_start; iter < iter_stop; iter++) {
> -		arch_jump_label_text_poke_early(iter->code);
> -		if (iter->key == (jump_label_t)(unsigned long)key)
> +		struct jump_label_key *iterk;
> +
> +		iterk = (struct jump_label_key *)(unsigned long)iter->key;
> +		arch_jump_label_transform(iter, jump_label_enabled(iterk) ?
> +					  JUMP_LABEL_ENABLE : JUMP_LABEL_DISABLE);

The only reason I called this at boot-time was that the 'ideal' x86
no-op isn't known until boot time. Thus, in the enabled case we could
skip the arch_jump_label_transform() call, i.e.:

if (!enabled)
	arch_jump_label_transform(iter, JUMP_LABEL_DISABLE);


> +		if (iterk == key)
>  			continue;
>  
> -		key = (struct jump_label_key *)(unsigned long)iter->key;
> -		atomic_set(&key->enabled, 0);
> +		key = iterk;
>  		key->entries = iter;
>  #ifdef CONFIG_MODULES
>  		key->next = NULL;
> @@ -212,7 +208,7 @@ void jump_label_apply_nops(struct module *mod)
>  		return;
>  
>  	for (iter = iter_start; iter < iter_stop; iter++)
> -		arch_jump_label_text_poke_early(iter->code);
> +		arch_jump_label_transform(iter, JUMP_LABEL_DISABLE);
>  }
>  
>  static int jump_label_add_module(struct module *mod)
> -- 
> 1.7.6.2
> 

hmmm...this is used on module load in SMP - so this would introduce a number of
calls to stop_machine() where we didn't have them before. Yes, module
load is a very slow path to begin with, but I think it's at least worth
pointing out...

thanks,

-Jason


* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-03 15:02   ` Jason Baron
@ 2011-10-03 15:47     ` Steven Rostedt
  2011-10-03 16:27     ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 53+ messages in thread
From: Steven Rostedt @ 2011-10-03 15:47 UTC (permalink / raw)
  To: Jason Baron
  Cc: Jeremy Fitzhardinge, David S. Miller, David Daney,
	Michael Ellerman, Jan Glauber, the arch/x86 maintainers,
	Xen Devel, Linux Kernel Mailing List, Jeremy Fitzhardinge,
	peterz

On Mon, 2011-10-03 at 11:02 -0400, Jason Baron wrote:

> if (!enabled)
> 	arch_jump_label_transform(iter, JUMP_LABEL_DISABLE);
> 
> 
> > +		if (iterk == key)
> >  			continue;
> >  
> > -		key = (struct jump_label_key *)(unsigned long)iter->key;
> > -		atomic_set(&key->enabled, 0);
> > +		key = iterk;
> >  		key->entries = iter;
> >  #ifdef CONFIG_MODULES
> >  		key->next = NULL;
> > @@ -212,7 +208,7 @@ void jump_label_apply_nops(struct module *mod)
> >  		return;
> >  
> >  	for (iter = iter_start; iter < iter_stop; iter++)
> > -		arch_jump_label_text_poke_early(iter->code);
> > +		arch_jump_label_transform(iter, JUMP_LABEL_DISABLE);
> >  }
> >  
> >  static int jump_label_add_module(struct module *mod)
> > -- 
> > 1.7.6.2
> > 
> 
> hmmm...this is used on module load in SMP - so this would introduce a number of
> calls to stop_machine() where we didn't have them before. Yes, module
> load is a very slow path to begin with, but I think it's at least worth
> pointing out...

And it is a good point to point out, as stop_machine becomes noticeable
to users on large-scale CPU boxes. Ideally, we want to avoid stop_machine
when we do not need it.

-- Steve




* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-03 15:02   ` Jason Baron
  2011-10-03 15:47     ` Steven Rostedt
@ 2011-10-03 16:27     ` Jeremy Fitzhardinge
  2011-10-04 14:10       ` Jason Baron
  1 sibling, 1 reply; 53+ messages in thread
From: Jeremy Fitzhardinge @ 2011-10-03 16:27 UTC (permalink / raw)
  To: Jason Baron
  Cc: Steven Rostedt, David S. Miller, David Daney, Michael Ellerman,
	Jan Glauber, the arch/x86 maintainers, Xen Devel,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, peterz

On 10/03/2011 08:02 AM, Jason Baron wrote:
> Hi,
>
> (Sorry for the late reply - I was away for a few days).
>
> The early enable is really nice - it means there are no restrictions on
> when jump_label_inc()/dec() can be called, which is nice.
>
> comments below.
>
>
> On Sat, Oct 01, 2011 at 02:55:35PM -0700, Jeremy Fitzhardinge wrote:
>> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>>
>> If a key has been enabled before jump_label_init() is called, don't
>> nop it out.
>>
>> This removes arch_jump_label_text_poke_early() (which can only nop
>> out a site) and uses arch_jump_label_transform() instead.
>>
>> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>> ---
>>  include/linux/jump_label.h |    3 ++-
>>  kernel/jump_label.c        |   20 ++++++++------------
>>  2 files changed, 10 insertions(+), 13 deletions(-)
>>
>> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
>> index 1213e9d..c8fb1b3 100644
>> --- a/include/linux/jump_label.h
>> +++ b/include/linux/jump_label.h
>> @@ -45,7 +45,8 @@ extern void jump_label_lock(void);
>>  extern void jump_label_unlock(void);
>>  extern void arch_jump_label_transform(struct jump_entry *entry,
>>  				 enum jump_label_type type);
>> -extern void arch_jump_label_text_poke_early(jump_label_t addr);
>> +extern void arch_jump_label_transform_early(struct jump_entry *entry,
>> +				 enum jump_label_type type);
>>  extern int jump_label_text_reserved(void *start, void *end);
>>  extern void jump_label_inc(struct jump_label_key *key);
>>  extern void jump_label_dec(struct jump_label_key *key);
>> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
>> index a8ce450..059202d5 100644
>> --- a/kernel/jump_label.c
>> +++ b/kernel/jump_label.c
>> @@ -121,13 +121,6 @@ static void __jump_label_update(struct jump_label_key *key,
>>  	}
>>  }
>>  
>> -/*
>> - * Not all archs need this.
>> - */
>> -void __weak arch_jump_label_text_poke_early(jump_label_t addr)
>> -{
>> -}
>> -
>>  static __init int jump_label_init(void)
>>  {
>>  	struct jump_entry *iter_start = __start___jump_table;
>> @@ -139,12 +132,15 @@ static __init int jump_label_init(void)
>>  	jump_label_sort_entries(iter_start, iter_stop);
>>  
>>  	for (iter = iter_start; iter < iter_stop; iter++) {
>> -		arch_jump_label_text_poke_early(iter->code);
>> -		if (iter->key == (jump_label_t)(unsigned long)key)
>> +		struct jump_label_key *iterk;
>> +
>> +		iterk = (struct jump_label_key *)(unsigned long)iter->key;
>> +		arch_jump_label_transform(iter, jump_label_enabled(iterk) ?
>> +					  JUMP_LABEL_ENABLE : JUMP_LABEL_DISABLE);
> The only reason I called this at boot-time was that the 'ideal' x86
> no-op isn't known until boot time. Thus, in the enabled case we could
> skip the arch_jump_label_transform() call, i.e.:
>
> if (!enabled)
> 	arch_jump_label_transform(iter, JUMP_LABEL_DISABLE);


Yep, fair enough.

>
>
>> +		if (iterk == key)
>>  			continue;
>>  
>> -		key = (struct jump_label_key *)(unsigned long)iter->key;
>> -		atomic_set(&key->enabled, 0);
>> +		key = iterk;
>>  		key->entries = iter;
>>  #ifdef CONFIG_MODULES
>>  		key->next = NULL;
>> @@ -212,7 +208,7 @@ void jump_label_apply_nops(struct module *mod)
>>  		return;
>>  
>>  	for (iter = iter_start; iter < iter_stop; iter++)
>> -		arch_jump_label_text_poke_early(iter->code);
>> +		arch_jump_label_transform(iter, JUMP_LABEL_DISABLE);
>>  }
>>  
>>  static int jump_label_add_module(struct module *mod)
>> -- 
>> 1.7.6.2
>>
> hmmm...this is used on module load in SMP - so this would introduce a number of
> calls to stop_machine() where we didn't have them before. Yes, module
> load is a very slow path to begin with, but I think it's at least worth
> pointing out...

Ah, that explains it - the module stuff certainly isn't "early" except -
I guess - in the module's lifetime.

Well, I suppose I could introduce either a second variant of the function,
or add a "live" flag (i.e., the code being patched may be executing on some
processor), which requires stop_machine, or a direct update if it doesn't.

But is there any reason why we couldn't just generate a reasonably
efficient 5-byte atomic nop in the first place, and get rid of all that
fooling around?  It looks like x86 is the only arch where it makes any
difference at all, and how much difference does it really make?  Or is
there no single 5-byte atomic nop that works on all x86 variants, aside
from jmp +0?
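The encodings at issue can be written down as byte sequences (values as in
the Intel/AMD manuals and the kernel's arch/x86/include/asm/nops.h;
illustrative only, not part of this patch series):

```c
/* Common 5-byte x86 "nop" choices being discussed. */
static const unsigned char jmp0_nop5[5] =
	{ 0xe9, 0x00, 0x00, 0x00, 0x00 }; /* jmp +0: atomic, works everywhere, but still a jump */
static const unsigned char p6_nop5[5] =
	{ 0x0f, 0x1f, 0x44, 0x00, 0x00 }; /* nopl 0x0(%rax,%rax,1): one atomic instruction on P6+ */
static const unsigned char k8_nop5[5] =
	{ 0x66, 0x66, 0x90, 0x66, 0x90 }; /* two prefixed nops: NOT a single atomic instruction */
```

The last variant illustrates the problem: some "ideal" 5-byte nops are
really several short instructions, so they cannot be patched in or out
atomically under a running CPU.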

    J


* Re: [Xen-devel] [PATCH RFC V2 2/5] stop_machine: make stop_machine safe and efficient to call early
  2011-10-01 21:55 ` [PATCH RFC V2 2/5] stop_machine: make stop_machine safe and efficient to call early Jeremy Fitzhardinge
  2011-10-02  0:36   ` Tejun Heo
@ 2011-10-03 19:24   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 53+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-10-03 19:24 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Steven Rostedt, Rusty Russell, Xen Devel, Peter Zijlstra,
	Jan Glauber, Jason Baron, the arch/x86 maintainers, David Daney,
	Linux Kernel Mailing List, Michael Ellerman, Jeremy Fitzhardinge,
	Ingo Molnar, Tejun Heo, Andrew Morton, H. Peter Anvin,
	David S. Miller

On Sat, Oct 01, 2011 at 02:55:34PM -0700, Jeremy Fitzhardinge wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
> Make stop_machine() safe to call early in boot, before SMP has been
> set up, by simply calling the callback function directly if there's
> only one CPU online.
> 
> [ Fixes from AKPM:
>    - add comment
>    - local_irq_flags, not save_flags
>    - also call hard_irq_disable() for systems which need it
> 
>   Tejun suggested using an explicit flag rather than just looking at
>   the online cpu count. ]
> 
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: H. Peter Anvin <hpa@linux.intel.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> ---
>  kernel/stop_machine.c |   21 +++++++++++++++++++++
>  1 files changed, 21 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index ba5070c..9c59d9e 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -41,6 +41,7 @@ struct cpu_stopper {
>  };
>  
>  static DEFINE_PER_CPU(struct cpu_stopper, cpu_stopper);
> +static bool stop_machine_initialized = false;

__read_mostly?

Though it probably does not really matter that much which section it
is put in.
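The suggestion amounts to annotating the flag so it lands in the
read-mostly data section, away from frequently written data. A sketch,
with the kernel's __read_mostly (from <linux/cache.h>) reproduced here as
a stand-in so the fragment is self-contained:

```c
#include <stdbool.h>

/* Stand-in for the kernel's __read_mostly annotation: place the variable
 * in .data..read_mostly so a rarely-written flag does not share cache
 * lines with hot, frequently-written data. */
#define __read_mostly __attribute__((__section__(".data..read_mostly")))

static bool stop_machine_initialized __read_mostly = false;
```

The flag is written once at boot and read on every stop_machine() call,
which is exactly the access pattern __read_mostly is meant for.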


* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-03 16:27     ` Jeremy Fitzhardinge
@ 2011-10-04 14:10       ` Jason Baron
  2011-10-04 15:18         ` Jeremy Fitzhardinge
  2011-10-04 16:30         ` H. Peter Anvin
  0 siblings, 2 replies; 53+ messages in thread
From: Jason Baron @ 2011-10-04 14:10 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Steven Rostedt, David S. Miller, David Daney, Michael Ellerman,
	Jan Glauber, the arch/x86 maintainers, Xen Devel,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, peterz, hpa

On Mon, Oct 03, 2011 at 09:27:56AM -0700, Jeremy Fitzhardinge wrote:
> On 10/03/2011 08:02 AM, Jason Baron wrote:
> > Hi,
> >
> > (Sorry for the late reply - I was away for a few days).
> >
> > The early enable is really nice - it means there are no restrictions on
> > when jump_label_inc()/dec() can be called, which is nice.
> >
> > comments below.
> >
> >
> > On Sat, Oct 01, 2011 at 02:55:35PM -0700, Jeremy Fitzhardinge wrote:
> >> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> >>
> >> If a key has been enabled before jump_label_init() is called, don't
> >> nop it out.
> >>
> >> This removes arch_jump_label_text_poke_early() (which can only nop
> >> out a site) and uses arch_jump_label_transform() instead.
> >>
> >> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> >> ---
> >>  include/linux/jump_label.h |    3 ++-
> >>  kernel/jump_label.c        |   20 ++++++++------------
> >>  2 files changed, 10 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
> >> index 1213e9d..c8fb1b3 100644
> >> --- a/include/linux/jump_label.h
> >> +++ b/include/linux/jump_label.h
> >> @@ -45,7 +45,8 @@ extern void jump_label_lock(void);
> >>  extern void jump_label_unlock(void);
> >>  extern void arch_jump_label_transform(struct jump_entry *entry,
> >>  				 enum jump_label_type type);
> >> -extern void arch_jump_label_text_poke_early(jump_label_t addr);
> >> +extern void arch_jump_label_transform_early(struct jump_entry *entry,
> >> +				 enum jump_label_type type);
> >>  extern int jump_label_text_reserved(void *start, void *end);
> >>  extern void jump_label_inc(struct jump_label_key *key);
> >>  extern void jump_label_dec(struct jump_label_key *key);
> >> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> >> index a8ce450..059202d5 100644
> >> --- a/kernel/jump_label.c
> >> +++ b/kernel/jump_label.c
> >> @@ -121,13 +121,6 @@ static void __jump_label_update(struct jump_label_key *key,
> >>  	}
> >>  }
> >>  
> >> -/*
> >> - * Not all archs need this.
> >> - */
> >> -void __weak arch_jump_label_text_poke_early(jump_label_t addr)
> >> -{
> >> -}
> >> -
> >>  static __init int jump_label_init(void)
> >>  {
> >>  	struct jump_entry *iter_start = __start___jump_table;
> >> @@ -139,12 +132,15 @@ static __init int jump_label_init(void)
> >>  	jump_label_sort_entries(iter_start, iter_stop);
> >>  
> >>  	for (iter = iter_start; iter < iter_stop; iter++) {
> >> -		arch_jump_label_text_poke_early(iter->code);
> >> -		if (iter->key == (jump_label_t)(unsigned long)key)
> >> +		struct jump_label_key *iterk;
> >> +
> >> +		iterk = (struct jump_label_key *)(unsigned long)iter->key;
> >> +		arch_jump_label_transform(iter, jump_label_enabled(iterk) ?
> >> +					  JUMP_LABEL_ENABLE : JUMP_LABEL_DISABLE);
> > The only reason I called this at boot-time was that the 'ideal' x86
> > no-op isn't known until boot time. Thus, in the enabled case we could
> > skip the arch_jump_label_transform() call. ie:
> >
> > if (!enabled)
> > 	arch_jump_label_transform(iter, JUMP_LABEL_DISABLE);
> 
> 
> Yep, fair enough.
> 
> >
> >
> >> +		if (iterk == key)
> >>  			continue;
> >>  
> >> -		key = (struct jump_label_key *)(unsigned long)iter->key;
> >> -		atomic_set(&key->enabled, 0);
> >> +		key = iterk;
> >>  		key->entries = iter;
> >>  #ifdef CONFIG_MODULES
> >>  		key->next = NULL;
> >> @@ -212,7 +208,7 @@ void jump_label_apply_nops(struct module *mod)
> >>  		return;
> >>  
> >>  	for (iter = iter_start; iter < iter_stop; iter++)
> >> -		arch_jump_label_text_poke_early(iter->code);
> >> +		arch_jump_label_transform(iter, JUMP_LABEL_DISABLE);
> >>  }
> >>  
> >>  static int jump_label_add_module(struct module *mod)
> >> -- 
> >> 1.7.6.2
> >>
> > hmmm...this is used on module load in smp - so this would introduce a number of
> > calls to stop_machine() where we didn't have them before. Yes, module
> > load is a very slow path to begin with, but I think its at least worth
> > pointing out...
> 
> Ah, that explains it - the module stuff certainly isn't "early" except -
> I guess - in the module's lifetime.
> 
> Well, I suppose I could introduce either a second variant of the function,
> or add a "live" flag (ie, may be updating code that a processor is
> executing), which requires a stop_machine, or direct update if it doesn't.
> 
> But is there any reason why we couldn't just generate a reasonably
> efficient 5-byte atomic nop in the first place, and get rid of all that
> fooling around?  It looks like x86 is the only arch where it makes any
> difference at all, and how much difference does it really make?  Or is
> there no one 5-byte atomic nop that works on all x86 variants, aside
> from jmp +0?
> 
>     J

Yes, there are really two reasons for the initial no-op patching pass:

1) The jmp +0 is a 'safe' no-op that I know will boot on all x86
variants. I'm not sure whether there is a single 5-byte nop that works
on all of them - but by starting with jmp +0, we make it much easier to
debug cases where we may be using broken no-ops.

2) This optimization aims to make the off case as close to zero cost
as possible. Various no-op benchmarks have been posted on lkml in the
past, and the choice of no-op does seem to make a difference; see
http://lkml.indiana.edu/hypermail/linux/kernel/0808.1/2416.html, for
example. So, at least to me, if we are not using the lowest-cost no-op,
we are in part defeating the point of this optimization.

I like the "live" flag suggestion mentioned above. Fewer functions are
better, and non-x86 arches can simply ignore the flag.

Thanks,

-Jason






^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-04 14:10       ` Jason Baron
@ 2011-10-04 15:18         ` Jeremy Fitzhardinge
  2011-10-04 16:30         ` H. Peter Anvin
  1 sibling, 0 replies; 53+ messages in thread
From: Jeremy Fitzhardinge @ 2011-10-04 15:18 UTC (permalink / raw)
  To: Jason Baron
  Cc: Steven Rostedt, David S. Miller, David Daney, Michael Ellerman,
	Jan Glauber, the arch/x86 maintainers, Xen Devel,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, peterz, hpa

On 10/04/2011 07:10 AM, Jason Baron wrote:
> On Mon, Oct 03, 2011 at 09:27:56AM -0700, Jeremy Fitzhardinge wrote:
>> On 10/03/2011 08:02 AM, Jason Baron wrote:
>>> Hi,
>>>
>>> (Sorry for the late reply - I was away for a few days).
>>>
>>> The early enable is really nice - it means there are no restrictions on
>>> when jump_label_inc()/dec() can be called, which is nice.
>>>
>>> comments below.
>>>
>>>
>>> On Sat, Oct 01, 2011 at 02:55:35PM -0700, Jeremy Fitzhardinge wrote:
>>>> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>>>>
>>>> If a key has been enabled before jump_label_init() is called, don't
>>>> nop it out.
>>>>
>>>> This removes arch_jump_label_text_poke_early() (which can only nop
>>>> out a site) and uses arch_jump_label_transform() instead.
>>>>
>>>> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>>>> ---
>>>>  include/linux/jump_label.h |    3 ++-
>>>>  kernel/jump_label.c        |   20 ++++++++------------
>>>>  2 files changed, 10 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
>>>> index 1213e9d..c8fb1b3 100644
>>>> --- a/include/linux/jump_label.h
>>>> +++ b/include/linux/jump_label.h
>>>> @@ -45,7 +45,8 @@ extern void jump_label_lock(void);
>>>>  extern void jump_label_unlock(void);
>>>>  extern void arch_jump_label_transform(struct jump_entry *entry,
>>>>  				 enum jump_label_type type);
>>>> -extern void arch_jump_label_text_poke_early(jump_label_t addr);
>>>> +extern void arch_jump_label_transform_early(struct jump_entry *entry,
>>>> +				 enum jump_label_type type);
>>>>  extern int jump_label_text_reserved(void *start, void *end);
>>>>  extern void jump_label_inc(struct jump_label_key *key);
>>>>  extern void jump_label_dec(struct jump_label_key *key);
>>>> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
>>>> index a8ce450..059202d5 100644
>>>> --- a/kernel/jump_label.c
>>>> +++ b/kernel/jump_label.c
>>>> @@ -121,13 +121,6 @@ static void __jump_label_update(struct jump_label_key *key,
>>>>  	}
>>>>  }
>>>>  
>>>> -/*
>>>> - * Not all archs need this.
>>>> - */
>>>> -void __weak arch_jump_label_text_poke_early(jump_label_t addr)
>>>> -{
>>>> -}
>>>> -
>>>>  static __init int jump_label_init(void)
>>>>  {
>>>>  	struct jump_entry *iter_start = __start___jump_table;
>>>> @@ -139,12 +132,15 @@ static __init int jump_label_init(void)
>>>>  	jump_label_sort_entries(iter_start, iter_stop);
>>>>  
>>>>  	for (iter = iter_start; iter < iter_stop; iter++) {
>>>> -		arch_jump_label_text_poke_early(iter->code);
>>>> -		if (iter->key == (jump_label_t)(unsigned long)key)
>>>> +		struct jump_label_key *iterk;
>>>> +
>>>> +		iterk = (struct jump_label_key *)(unsigned long)iter->key;
>>>> +		arch_jump_label_transform(iter, jump_label_enabled(iterk) ?
>>>> +					  JUMP_LABEL_ENABLE : JUMP_LABEL_DISABLE);
>>> The only reason I called this at boot-time was that the 'ideal' x86
>>> no-op isn't known until boot time. Thus, in the enabled case we could
>>> skip the arch_jump_label_transform() call. ie:
>>>
>>> if (!enabled)
>>> 	arch_jump_label_transform(iter, JUMP_LABEL_DISABLE);
>>
>> Yep, fair enough.
>>
>>>
>>>> +		if (iterk == key)
>>>>  			continue;
>>>>  
>>>> -		key = (struct jump_label_key *)(unsigned long)iter->key;
>>>> -		atomic_set(&key->enabled, 0);
>>>> +		key = iterk;
>>>>  		key->entries = iter;
>>>>  #ifdef CONFIG_MODULES
>>>>  		key->next = NULL;
>>>> @@ -212,7 +208,7 @@ void jump_label_apply_nops(struct module *mod)
>>>>  		return;
>>>>  
>>>>  	for (iter = iter_start; iter < iter_stop; iter++)
>>>> -		arch_jump_label_text_poke_early(iter->code);
>>>> +		arch_jump_label_transform(iter, JUMP_LABEL_DISABLE);
>>>>  }
>>>>  
>>>>  static int jump_label_add_module(struct module *mod)
>>>> -- 
>>>> 1.7.6.2
>>>>
>>> hmmm...this is used on module load in smp - so this would introduce a number of
>>> calls to stop_machine() where we didn't have them before. Yes, module
>>> load is a very slow path to begin with, but I think its at least worth
>>> pointing out...
>> Ah, that explains it - the module stuff certainly isn't "early" except -
>> I guess - in the module's lifetime.
>>
>> Well, I suppose I could introduce either a second variant of the function,
>> or add a "live" flag (ie, may be updating code that a processor is
>> executing), which requires a stop_machine, or direct update if it doesn't.
>>
>> But is there any reason why we couldn't just generate a reasonably
>> efficient 5-byte atomic nop in the first place, and get rid of all that
>> fooling around?  It looks like x86 is the only arch where it makes any
>> difference at all, and how much difference does it really make?  Or is
>> there no one 5-byte atomic nop that works on all x86 variants, aside
>> from jmp +0?
>>
>>     J
> Yes, there are really two reasons for the initial no-op patching pass:
>
> 1) The jmp +0, is a 'safe' no-op that I know is going to initially
> boot for all x86. I'm not sure if there is a 5-byte nop that works on
> all x86 variants - but by using jmp +0, we make it much easier to debug
> cases where we may be using broken no-ops.
>
> 2) This optimization is about as close to a 0 cost off case as possible.
> I know there have been various no-op benchmarks posted on lkml in the
> past, so the choice of no-op does seem to make a difference. see:
> http://lkml.indiana.edu/hypermail/linux/kernel/0808.1/2416.html, for
> example. So at least to me, if we are not using the lowest cost no-op,
> we are at least in-part defeating the point of this optimization.
>
> I like the "live" flag suggestion mentioned above. Less functions is
> better, and non-x86 arches can simply ignore the flag.

I went the other way and added a second function,
arch_jump_label_transform_static(), which has a weak default
implementation that calls arch_jump_label_transform().  That way only
the architectures which really care about it need to implement a second
variant. I did x86 and s390 by adapting the patches I had from the other
series; it didn't look like mips/sparc/power were very heavyweight at all.

    J

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-04 14:10       ` Jason Baron
  2011-10-04 15:18         ` Jeremy Fitzhardinge
@ 2011-10-04 16:30         ` H. Peter Anvin
  2011-10-04 17:53           ` Jason Baron
  2011-10-06  0:16           ` Jeremy Fitzhardinge
  1 sibling, 2 replies; 53+ messages in thread
From: H. Peter Anvin @ 2011-10-04 16:30 UTC (permalink / raw)
  To: Jason Baron
  Cc: Jeremy Fitzhardinge, Steven Rostedt, David S. Miller,
	David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz

On 10/04/2011 07:10 AM, Jason Baron wrote:
> 
> 1) The jmp +0, is a 'safe' no-op that I know is going to initially
> boot for all x86. I'm not sure if there is a 5-byte nop that works on
> all x86 variants - but by using jmp +0, we make it much easier to debug
> cases where we may be using broken no-ops.
> 

There are *plenty*.  jmp+0 is about as pessimal as you can get.

The current recommendation when you don't know the CPU you're running on is:

	3E 8D 74 26 00	(GENERIC_NOP5_ATOMIC)

... on 32 bits and ...

	0F 1F 44 00 00	(P6_NOP5_ATOMIC)

... on 64 bits.

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-04 16:30         ` H. Peter Anvin
@ 2011-10-04 17:53           ` Jason Baron
  2011-10-04 18:05             ` Steven Rostedt
  2011-10-06  0:16           ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 53+ messages in thread
From: Jason Baron @ 2011-10-04 17:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeremy Fitzhardinge, Steven Rostedt, David S. Miller,
	David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz

On Tue, Oct 04, 2011 at 09:30:01AM -0700, H. Peter Anvin wrote:
> On 10/04/2011 07:10 AM, Jason Baron wrote:
> > 
> > 1) The jmp +0, is a 'safe' no-op that I know is going to initially
> > boot for all x86. I'm not sure if there is a 5-byte nop that works on
> > all x86 variants - but by using jmp +0, we make it much easier to debug
> > cases where we may be using broken no-ops.
> > 
> 
> There are *plenty*.  jmp+0 is about as pessimal as you can get.
> 
> The current recommendation when you don't know the CPU you're running at is:
> 
> 	3E 8D 74 26 00	(GENERIC_NOP5_ATOMIC)
> 
> ... on 32 bits and ...
> 
> 	0F 1F 44 00 00	(P6_NOP5_ATOMIC)
> 
> ... on 64 bits.
> 
> 	-hpa
> 

We're currently patching the code at run-time (boot and module load
time) with the 'ideal' no-op anyway, so the initial no-op doesn't
really matter much (other than saving a patch when the initial and
ideal no-ops match).

-Jason

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-04 17:53           ` Jason Baron
@ 2011-10-04 18:05             ` Steven Rostedt
  0 siblings, 0 replies; 53+ messages in thread
From: Steven Rostedt @ 2011-10-04 18:05 UTC (permalink / raw)
  To: Jason Baron
  Cc: H. Peter Anvin, Jeremy Fitzhardinge, David S. Miller,
	David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz

On Tue, 2011-10-04 at 13:53 -0400, Jason Baron wrote:
> On Tue, Oct 04, 2011 at 09:30:01AM -0700, H. Peter Anvin wrote:
> > On 10/04/2011 07:10 AM, Jason Baron wrote:
> > > 
> > > 1) The jmp +0, is a 'safe' no-op that I know is going to initially
> > > boot for all x86. I'm not sure if there is a 5-byte nop that works on
> > > all x86 variants - but by using jmp +0, we make it much easier to debug
> > > cases where we may be using broken no-ops.
> > > 
> > 
> > There are *plenty*.  jmp+0 is about as pessimal as you can get.
> > 
> > The current recommendation when you don't know the CPU you're running at is:
> > 
> > 	3E 8D 74 26 00	(GENERIC_NOP5_ATOMIC)
> > 
> > ... on 32 bits and ...
> > 
> > 	0F 1F 44 00 00	(P6_NOP5_ATOMIC)
> > 
> > ... on 64 bits.
> > 
> > 	-hpa
> > 
> 
> We're currently patching the code at run-time (boot and module load
> time), with the 'ideal' no-op anyway, so the initial no-op doesn't
> really matter much (other than to save patching if the initial and ideal
> match).

For correctness, we should still update this to use the proper
"default" nops, as mcount already does.

-- Steve



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-04 16:30         ` H. Peter Anvin
  2011-10-04 17:53           ` Jason Baron
@ 2011-10-06  0:16           ` Jeremy Fitzhardinge
  2011-10-06  0:17             ` H. Peter Anvin
  1 sibling, 1 reply; 53+ messages in thread
From: Jeremy Fitzhardinge @ 2011-10-06  0:16 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jason Baron, Steven Rostedt, David S. Miller, David Daney,
	Michael Ellerman, Jan Glauber, the arch/x86 maintainers,
	Xen Devel, Linux Kernel Mailing List, Jeremy Fitzhardinge,
	peterz

On 10/04/2011 09:30 AM, H. Peter Anvin wrote:
> On 10/04/2011 07:10 AM, Jason Baron wrote:
>> 1) The jmp +0, is a 'safe' no-op that I know is going to initially
>> boot for all x86. I'm not sure if there is a 5-byte nop that works on
>> all x86 variants - but by using jmp +0, we make it much easier to debug
>> cases where we may be using broken no-ops.
>>
> There are *plenty*.  jmp+0 is about as pessimal as you can get.

As an aside, do you know if a 2-byte unconditional jmp is any more
efficient than 5-byte, aside from just being a smaller instruction and
taking less icache?

    J

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06  0:16           ` Jeremy Fitzhardinge
@ 2011-10-06  0:17             ` H. Peter Anvin
  2011-10-06  0:47               ` Jeremy Fitzhardinge
  2011-10-06 17:53               ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 53+ messages in thread
From: H. Peter Anvin @ 2011-10-06  0:17 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Jason Baron, Steven Rostedt, David S. Miller, David Daney,
	Michael Ellerman, Jan Glauber, the arch/x86 maintainers,
	Xen Devel, Linux Kernel Mailing List, Jeremy Fitzhardinge,
	peterz

On 10/05/2011 05:16 PM, Jeremy Fitzhardinge wrote:
> On 10/04/2011 09:30 AM, H. Peter Anvin wrote:
>> On 10/04/2011 07:10 AM, Jason Baron wrote:
>>> 1) The jmp +0, is a 'safe' no-op that I know is going to initially
>>> boot for all x86. I'm not sure if there is a 5-byte nop that works on
>>> all x86 variants - but by using jmp +0, we make it much easier to debug
>>> cases where we may be using broken no-ops.
>>>
>> There are *plenty*.  jmp+0 is about as pessimal as you can get.
> 
> As an aside, do you know if a 2-byte unconditional jmp is any more
> efficient than 5-byte, aside from just being a smaller instruction and
> taking less icache?
> 

I don't know for sure, no.  It probably depends on the CPU.

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06  0:17             ` H. Peter Anvin
@ 2011-10-06  0:47               ` Jeremy Fitzhardinge
  2011-10-06 17:53               ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 53+ messages in thread
From: Jeremy Fitzhardinge @ 2011-10-06  0:47 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jason Baron, Steven Rostedt, David S. Miller, David Daney,
	Michael Ellerman, Jan Glauber, the arch/x86 maintainers,
	Xen Devel, Linux Kernel Mailing List, Jeremy Fitzhardinge,
	peterz

On 10/05/2011 05:17 PM, H. Peter Anvin wrote:
> On 10/05/2011 05:16 PM, Jeremy Fitzhardinge wrote:
>> On 10/04/2011 09:30 AM, H. Peter Anvin wrote:
>>> On 10/04/2011 07:10 AM, Jason Baron wrote:
>>>> 1) The jmp +0, is a 'safe' no-op that I know is going to initially
>>>> boot for all x86. I'm not sure if there is a 5-byte nop that works on
>>>> all x86 variants - but by using jmp +0, we make it much easier to debug
>>>> cases where we may be using broken no-ops.
>>>>
>>> There are *plenty*.  jmp+0 is about as pessimal as you can get.
>> As an aside, do you know if a 2-byte unconditional jmp is any more
>> efficient than 5-byte, aside from just being a smaller instruction and
>> taking less icache?
>>
> I don't know for sure, no.  It probably depends on the CPU.

I was thinking about making the jump-label stuff generate a small jmp if
the offset is small (specifically "jmp; nop3", or perhaps "jmp; ud2a;
nop" to make absolutely sure there's no speculation beyond the jmp), on
the grounds that, while it probably doesn't matter for any modern
Intel/AMD processor, it may help for others.  But I couldn't find any
concrete evidence to support it, and there are already enough questions
about doing "live" code updates without adding more instruction patterns.

    J

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06  0:17             ` H. Peter Anvin
  2011-10-06  0:47               ` Jeremy Fitzhardinge
@ 2011-10-06 17:53               ` Jeremy Fitzhardinge
  2011-10-06 18:10                 ` Jason Baron
  1 sibling, 1 reply; 53+ messages in thread
From: Jeremy Fitzhardinge @ 2011-10-06 17:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jason Baron, Steven Rostedt, David S. Miller, David Daney,
	Michael Ellerman, Jan Glauber, the arch/x86 maintainers,
	Xen Devel, Linux Kernel Mailing List, Jeremy Fitzhardinge,
	peterz

[-- Attachment #1: Type: text/plain, Size: 902 bytes --]

On 10/05/2011 05:17 PM, H. Peter Anvin wrote:
> On 10/05/2011 05:16 PM, Jeremy Fitzhardinge wrote:
>> On 10/04/2011 09:30 AM, H. Peter Anvin wrote:
>>> On 10/04/2011 07:10 AM, Jason Baron wrote:
>>>> 1) The jmp +0, is a 'safe' no-op that I know is going to initially
>>>> boot for all x86. I'm not sure if there is a 5-byte nop that works on
>>>> all x86 variants - but by using jmp +0, we make it much easier to debug
>>>> cases where we may be using broken no-ops.
>>>>
>>> There are *plenty*.  jmp+0 is about as pessimal as you can get.
>> As an aside, do you know if a 2-byte unconditional jmp is any more
>> efficient than 5-byte, aside from just being a smaller instruction and
>> taking less icache?
>>
> I don't know for sure, no.  It probably depends on the CPU.

Looks like jmp2 is about 5% faster than jmp5 on Sandybridge with this
benchmark.

But insignificant difference on Nehalem.

    J

[-- Attachment #2: jmp2.c --]
[-- Type: text/x-csrc, Size: 550 bytes --]

#include <stdio.h>

struct {
	unsigned char flag;
	unsigned char val;
} l;

#define JMP2	asm volatile ("jmp 1f; .byte 0x0f,0x1f,0x00; 1: ");
#define JMPJMP2	JMP2 JMP2
#define JMPJMPJMP2	JMPJMP2 JMPJMP2
#define JMPJMPJMPJMP2	JMPJMPJMP2 JMPJMPJMP2
#define JMPJMPJMPJMPJMP2	JMPJMPJMPJMP2 JMPJMPJMPJMP2
#define JMPJMPJMPJMPJMPJMP2	JMPJMPJMPJMPJMP2 JMPJMPJMPJMPJMP2

int main(int argc, char **argv)
{
	int i;

	for (i = 0; i < 100000000; i++) {
		JMPJMPJMPJMPJMPJMP2;
		asm volatile("" : : : "memory");
	}

	return 0;
}

[-- Attachment #3: jmp5.c --]
[-- Type: text/x-csrc, Size: 536 bytes --]

#include <stdio.h>

struct {
	unsigned char flag;
	unsigned char val;
} l;

#define JMP5	asm volatile (".byte 0xe9; .long 0");
#define JMPJMP5	JMP5 JMP5
#define JMPJMPJMP5	JMPJMP5 JMPJMP5
#define JMPJMPJMPJMP5	JMPJMPJMP5 JMPJMPJMP5
#define JMPJMPJMPJMPJMP5	JMPJMPJMPJMP5 JMPJMPJMPJMP5
#define JMPJMPJMPJMPJMPJMP5	JMPJMPJMPJMPJMP5 JMPJMPJMPJMPJMP5

int main(int argc, char **argv)
{
	int i;

	for (i = 0; i < 100000000; i++) {
		JMPJMPJMPJMPJMPJMP5;
		asm volatile("" : : : "memory");
	}

	return 0;
}

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 17:53               ` Jeremy Fitzhardinge
@ 2011-10-06 18:10                 ` Jason Baron
  2011-10-06 18:13                   ` H. Peter Anvin
                                     ` (2 more replies)
  0 siblings, 3 replies; 53+ messages in thread
From: Jason Baron @ 2011-10-06 18:10 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, Steven Rostedt, David S. Miller, David Daney,
	Michael Ellerman, Jan Glauber, the arch/x86 maintainers,
	Xen Devel, Linux Kernel Mailing List, Jeremy Fitzhardinge,
	peterz, rth

On Thu, Oct 06, 2011 at 10:53:29AM -0700, Jeremy Fitzhardinge wrote:
> On 10/05/2011 05:17 PM, H. Peter Anvin wrote:
> > On 10/05/2011 05:16 PM, Jeremy Fitzhardinge wrote:
> >> On 10/04/2011 09:30 AM, H. Peter Anvin wrote:
> >>> On 10/04/2011 07:10 AM, Jason Baron wrote:
> >>>> 1) The jmp +0, is a 'safe' no-op that I know is going to initially
> >>>> boot for all x86. I'm not sure if there is a 5-byte nop that works on
> >>>> all x86 variants - but by using jmp +0, we make it much easier to debug
> >>>> cases where we may be using broken no-ops.
> >>>>
> >>> There are *plenty*.  jmp+0 is about as pessimal as you can get.
> >> As an aside, do you know if a 2-byte unconditional jmp is any more
> >> efficient than 5-byte, aside from just being a smaller instruction and
> >> taking less icache?
> >>
> > I don't know for sure, no.  I probably depends on the CPU.
> 
> Looks like jmp2 is about 5% faster than jmp5 on Sandybridge with this
> benchmark.
> 
> But insignificant difference on Nehalem.
> 
>     J

It would be cool if we could make the total width 2 bytes, when
possible.  It might be possible by emitting the initial
'JUMP_LABEL_INITIAL_NOP' as a 'jmp' to the 'l_yes' label, and then
patching that with a no-op at boot time or link time - letting the
compiler pick the width. That way we could get the optimal width...

thanks,

-Jason


> #include <stdio.h>
> 
> struct {
> 	unsigned char flag;
> 	unsigned char val;
> } l;
> 
> #define JMP2	asm volatile ("jmp 1f; .byte 0x0f,0x1f,0x00; 1: ");
> #define JMPJMP2	JMP2 JMP2
> #define JMPJMPJMP2	JMPJMP2 JMPJMP2
> #define JMPJMPJMPJMP2	JMPJMPJMP2 JMPJMPJMP2
> #define JMPJMPJMPJMPJMP2	JMPJMPJMPJMP2 JMPJMPJMPJMP2
> #define JMPJMPJMPJMPJMPJMP2	JMPJMPJMPJMPJMP2 JMPJMPJMPJMPJMP2
> 
> int main(int argc, char **argv)
> {
> 	int i;
> 
> 	for (i = 0; i < 100000000; i++) {
> 		JMPJMPJMPJMPJMPJMP2;
> 		asm volatile("" : : : "memory");
> 	}
> 
> 	return 0;
> }

> #include <stdio.h>
> 
> struct {
> 	unsigned char flag;
> 	unsigned char val;
> } l;
> 
> #define JMP5	asm volatile (".byte 0xe9; .long 0");
> #define JMPJMP5	JMP5 JMP5
> #define JMPJMPJMP5	JMPJMP5 JMPJMP5
> #define JMPJMPJMPJMP5	JMPJMPJMP5 JMPJMPJMP5
> #define JMPJMPJMPJMPJMP5	JMPJMPJMPJMP5 JMPJMPJMPJMP5
> #define JMPJMPJMPJMPJMPJMP5	JMPJMPJMPJMPJMP5 JMPJMPJMPJMPJMP5
> 
> int main(int argc, char **argv)
> {
> 	int i;
> 
> 	for (i = 0; i < 100000000; i++) {
> 		JMPJMPJMPJMPJMPJMP5;
> 		asm volatile("" : : : "memory");
> 	}
> 
> 	return 0;
> }


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 18:10                 ` Jason Baron
@ 2011-10-06 18:13                   ` H. Peter Anvin
  2011-10-06 21:39                     ` Jeremy Fitzhardinge
  2011-10-06 18:15                   ` Jeremy Fitzhardinge
  2011-10-06 18:26                   ` Steven Rostedt
  2 siblings, 1 reply; 53+ messages in thread
From: H. Peter Anvin @ 2011-10-06 18:13 UTC (permalink / raw)
  To: Jason Baron
  Cc: Jeremy Fitzhardinge, Steven Rostedt, David S. Miller,
	David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz, rth

On 10/06/2011 11:10 AM, Jason Baron wrote:
> 
> It would be cool if we could make the total width 2-bytes, when
> possible.  It might be possible by making the initial 'JUMP_LABEL_INITIAL_NOP'
> as a 'jmp' to the 'l_yes' label. And then patching that with a no-op at boot
> time or link time - letting the compiler pick the width. In that way we could
> get the optimal width...
> 

Yes, that would be a win just based on icache footprint alone.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 18:10                 ` Jason Baron
  2011-10-06 18:13                   ` H. Peter Anvin
@ 2011-10-06 18:15                   ` Jeremy Fitzhardinge
  2011-10-06 18:33                     ` Jason Baron
  2011-10-06 18:26                   ` Steven Rostedt
  2 siblings, 1 reply; 53+ messages in thread
From: Jeremy Fitzhardinge @ 2011-10-06 18:15 UTC (permalink / raw)
  To: Jason Baron
  Cc: H. Peter Anvin, Steven Rostedt, David S. Miller, David Daney,
	Michael Ellerman, Jan Glauber, the arch/x86 maintainers,
	Xen Devel, Linux Kernel Mailing List, Jeremy Fitzhardinge,
	peterz, rth

On 10/06/2011 11:10 AM, Jason Baron wrote:
> On Thu, Oct 06, 2011 at 10:53:29AM -0700, Jeremy Fitzhardinge wrote:
>> On 10/05/2011 05:17 PM, H. Peter Anvin wrote:
>>> On 10/05/2011 05:16 PM, Jeremy Fitzhardinge wrote:
>>>> On 10/04/2011 09:30 AM, H. Peter Anvin wrote:
>>>>> On 10/04/2011 07:10 AM, Jason Baron wrote:
>>>>>> 1) The jmp +0, is a 'safe' no-op that I know is going to initially
>>>>>> boot for all x86. I'm not sure if there is a 5-byte nop that works on
>>>>>> all x86 variants - but by using jmp +0, we make it much easier to debug
>>>>>> cases where we may be using broken no-ops.
>>>>>>
>>>>> There are *plenty*.  jmp+0 is about as pessimal as you can get.
>>>> As an aside, do you know if a 2-byte unconditional jmp is any more
>>>> efficient than 5-byte, aside from just being a smaller instruction and
>>>> taking less icache?
>>>>
>>> I don't know for sure, no.  I probably depends on the CPU.
>> Looks like jmp2 is about 5% faster than jmp5 on Sandybridge with this
>> benchmark.
>>
>> But insignificant difference on Nehalem.
>>
>>     J
> It would be cool if we could make the total width 2-bytes, when
> possible.  It might be possible by making the initial 'JUMP_LABEL_INITIAL_NOP'
> as a 'jmp' to the 'l_yes' label. And then patching that with a no-op at boot
> time or link time - letting the compiler pick the width. In that way we could
> get the optimal width...

I'll have a look at it later today if I get a moment.  Should be fairly
straightforward.

What about the rest of the series?  Do you think it looks cooked enough
for the next merge window?

    J

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 18:10                 ` Jason Baron
  2011-10-06 18:13                   ` H. Peter Anvin
  2011-10-06 18:15                   ` Jeremy Fitzhardinge
@ 2011-10-06 18:26                   ` Steven Rostedt
  2011-10-06 18:29                     ` H. Peter Anvin
  2011-10-06 18:50                     ` Richard Henderson
  2 siblings, 2 replies; 53+ messages in thread
From: Steven Rostedt @ 2011-10-06 18:26 UTC (permalink / raw)
  To: Jason Baron
  Cc: Jeremy Fitzhardinge, H. Peter Anvin, David S. Miller,
	David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz, rth

On Thu, 2011-10-06 at 14:10 -0400, Jason Baron wrote:

> > Looks like jmp2 is about 5% faster than jmp5 on Sandybridge with this
> > benchmark.
> > 
> > But insignificant difference on Nehalem.
> > 
> >     J
> 
> It would be cool if we could make the total width 2-bytes, when
> possible.  It might be possible by making the initial 'JUMP_LABEL_INITIAL_NOP'
> as a 'jmp' to the 'l_yes' label. And then patching that with a no-op at boot
> time or link time - letting the compiler pick the width. In that way we could
> get the optimal width...

Why not just do it?

jump_label is encapsulated in arch_static_branch() which on x86 looks
like:

static __always_inline bool arch_static_branch(struct jump_label_key *key)
{
	asm goto("1:"
		JUMP_LABEL_INITIAL_NOP
		".pushsection __jump_table,  \"aw\" \n\t"
		_ASM_ALIGN "\n\t"
		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
		".popsection \n\t"
		: :  "i" (key) : : l_yes);
	return false;
l_yes:
	return true;
}


That jmp to l_yes should easily be a two byte jump.

If not, I'm sure it would be easy to catch it before modifying the code,
and then complain real loudly about it.

-- Steve



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 18:26                   ` Steven Rostedt
@ 2011-10-06 18:29                     ` H. Peter Anvin
  2011-10-06 18:38                       ` Jason Baron
  2011-10-06 18:50                     ` Richard Henderson
  1 sibling, 1 reply; 53+ messages in thread
From: H. Peter Anvin @ 2011-10-06 18:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jason Baron, Jeremy Fitzhardinge, David S. Miller, David Daney,
	Michael Ellerman, Jan Glauber, the arch/x86 maintainers,
	Xen Devel, Linux Kernel Mailing List, Jeremy Fitzhardinge,
	peterz, rth

On 10/06/2011 11:26 AM, Steven Rostedt wrote:
> On Thu, 2011-10-06 at 14:10 -0400, Jason Baron wrote:
> 
>>> Looks like jmp2 is about 5% faster than jmp5 on Sandybridge with this
>>> benchmark.
>>>
>>> But insignificant difference on Nehalem.
>>>
>>>     J
>>
>> It would be cool if we could make the total width 2-bytes, when
>> possible.  It might be possible by making the initial 'JUMP_LABEL_INITIAL_NOP'
>> as a 'jmp' to the 'l_yes' label. And then patching that with a no-op at boot
>> time or link time - letting the compiler pick the width. In that way we could
>> get the optimal width...
> 
> Why not just do it?
> 
> jump_label is encapsulated in arch_static_branch() which on x86 looks
> like:
> 
> static __always_inline bool arch_static_branch(struct jump_label_key *key)
> {
> 	asm goto("1:"
> 		JUMP_LABEL_INITIAL_NOP
> 		".pushsection __jump_table,  \"aw\" \n\t"
> 		_ASM_ALIGN "\n\t"
> 		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
> 		".popsection \n\t"
> 		: :  "i" (key) : : l_yes);
> 	return false;
> l_yes:
> 	return true;
> }
> 
> 
> That jmp to l_yes should easily be a two byte jump.
> 
> If not I'm sure it would be easy to catch it before modifying the code.
> And then complain real loudly about it.
> 

The important thing is that it requires the build-time elimination of
jumps.  It's just work.

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 18:15                   ` Jeremy Fitzhardinge
@ 2011-10-06 18:33                     ` Jason Baron
  2011-10-06 18:35                       ` H. Peter Anvin
  0 siblings, 1 reply; 53+ messages in thread
From: Jason Baron @ 2011-10-06 18:33 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, Steven Rostedt, David S. Miller, David Daney,
	Michael Ellerman, Jan Glauber, the arch/x86 maintainers,
	Xen Devel, Linux Kernel Mailing List, Jeremy Fitzhardinge,
	peterz, rth

On Thu, Oct 06, 2011 at 11:15:07AM -0700, Jeremy Fitzhardinge wrote:
> On 10/06/2011 11:10 AM, Jason Baron wrote:
> > On Thu, Oct 06, 2011 at 10:53:29AM -0700, Jeremy Fitzhardinge wrote:
> >> On 10/05/2011 05:17 PM, H. Peter Anvin wrote:
> >>> On 10/05/2011 05:16 PM, Jeremy Fitzhardinge wrote:
> >>>> On 10/04/2011 09:30 AM, H. Peter Anvin wrote:
> >>>>> On 10/04/2011 07:10 AM, Jason Baron wrote:
> >>>>>> 1) The jmp +0, is a 'safe' no-op that I know is going to initially
> >>>>>> boot for all x86. I'm not sure if there is a 5-byte nop that works on
> >>>>>> all x86 variants - but by using jmp +0, we make it much easier to debug
> >>>>>> cases where we may be using broken no-ops.
> >>>>>>
> >>>>> There are *plenty*.  jmp+0 is about as pessimal as you can get.
> >>>> As an aside, do you know if a 2-byte unconditional jmp is any more
> >>>> efficient than 5-byte, aside from just being a smaller instruction and
> >>>> taking less icache?
> >>>>
> >>> I don't know for sure, no.  It probably depends on the CPU.
> >> Looks like jmp2 is about 5% faster than jmp5 on Sandybridge with this
> >> benchmark.
> >>
> >> But insignificant difference on Nehalem.
> >>
> >>     J
> > It would be cool if we could make the total width 2-bytes, when
> > possible.  It might be possible by making the initial 'JUMP_LABEL_INITIAL_NOP'
> > as a 'jmp' to the 'l_yes' label. And then patching that with a no-op at boot
> > time or link time - letting the compiler pick the width. In that way we could
> > get the optimal width...
> 
> I'll have a look at it later today if I get a moment.  Should be fairly
> straightforward.
> 

cool. It does add some complication, I think...detecting the 2-byte vs.
5-byte, and if done at boot time, possibly taking the undesired
branch...

> What about the rest of the series.  Do you think it looks cooked enough
> for next mergewindow?
> 
>     J

Yes, it looks good to me thanks! Feel free to add my ack to the series.

thanks,

-Jason


* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 18:33                     ` Jason Baron
@ 2011-10-06 18:35                       ` H. Peter Anvin
  2011-10-06 18:43                         ` Jason Baron
  0 siblings, 1 reply; 53+ messages in thread
From: H. Peter Anvin @ 2011-10-06 18:35 UTC (permalink / raw)
  To: Jason Baron
  Cc: Jeremy Fitzhardinge, Steven Rostedt, David S. Miller,
	David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz, rth

On 10/06/2011 11:33 AM, Jason Baron wrote:
> 
> cool. It does add some complication, I think...detecting the 2-byte vs.
> 5-byte, and if done at boot time, possibly taking the undesired
> branch...
> 

It has to be done at build time to be sane, IMO.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 18:29                     ` H. Peter Anvin
@ 2011-10-06 18:38                       ` Jason Baron
  2011-10-06 19:34                         ` Steven Rostedt
  0 siblings, 1 reply; 53+ messages in thread
From: Jason Baron @ 2011-10-06 18:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Steven Rostedt, Jeremy Fitzhardinge, David S. Miller,
	David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz, rth

On Thu, Oct 06, 2011 at 11:29:25AM -0700, H. Peter Anvin wrote:
> On 10/06/2011 11:26 AM, Steven Rostedt wrote:
> > On Thu, 2011-10-06 at 14:10 -0400, Jason Baron wrote:
> > 
> >>> Looks like jmp2 is about 5% faster than jmp5 on Sandybridge with this
> >>> benchmark.
> >>>
> >>> But insignificant difference on Nehalem.
> >>>
> >>>     J
> >>
> >> It would be cool if we could make the total width 2-bytes, when
> >> possible.  It might be possible by making the initial 'JUMP_LABEL_INITIAL_NOP'
> >> as a 'jmp' to the 'l_yes' label. And then patching that with a no-op at boot
> >> time or link time - letting the compiler pick the width. In that way we could
> >> get the optimal width...
> > 
> > Why not just do it?
> > 
> > jump_label is encapsulated in arch_static_branch() which on x86 looks
> > like:
> > 
> > static __always_inline bool arch_static_branch(struct jump_label_key *key)
> > {
> > 	asm goto("1:"
> > 		JUMP_LABEL_INITIAL_NOP
> > 		".pushsection __jump_table,  \"aw\" \n\t"
> > 		_ASM_ALIGN "\n\t"
> > 		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
> > 		".popsection \n\t"
> > 		: :  "i" (key) : : l_yes);
> > 	return false;
> > l_yes:
> > 	return true;
> > }
> > 
> > 
> > That jmp to l_yes should easily be a two byte jump.

Remember, the compiler is moving the l_yes out of line, so it's not
necessarily always a two byte jump. Also, I plan to look at a possible
'cold' label for the 'l_yes' branch, so that it can be moved to a separate
'cold' section, but we might only want that for some cases...

> > 
> > If not I'm sure it would be easy to catch it before modifying the code.
> > And then complain real loudly about it.
> > 
> 
> The important thing is that it requires the build-time elimination of
> jumps.  It's just work.
> 
> 	-hpa
> 

Right, it's certainly doable, but I'm not sure it's so simple, since we'll
need a pass to eliminate the jumps - which can be keyed off the
'__jump_table' section.

thanks,

-Jason


* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 18:35                       ` H. Peter Anvin
@ 2011-10-06 18:43                         ` Jason Baron
  0 siblings, 0 replies; 53+ messages in thread
From: Jason Baron @ 2011-10-06 18:43 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jeremy Fitzhardinge, Steven Rostedt, David S. Miller,
	David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz, rth

On Thu, Oct 06, 2011 at 11:35:49AM -0700, H. Peter Anvin wrote:
> On 10/06/2011 11:33 AM, Jason Baron wrote:
> > 
> > cool. It does add some complication, I think...detecting the 2-byte vs.
> > 5-byte, and if done at boot time, possibly taking the undesired
> > branch...
> > 
> 
> It has to be done at build time to be sane, IMO.
> 
> 	-hpa
> 

agreed.

-Jason


* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 18:26                   ` Steven Rostedt
  2011-10-06 18:29                     ` H. Peter Anvin
@ 2011-10-06 18:50                     ` Richard Henderson
  2011-10-06 19:28                       ` Steven Rostedt
  1 sibling, 1 reply; 53+ messages in thread
From: Richard Henderson @ 2011-10-06 18:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jason Baron, Jeremy Fitzhardinge, H. Peter Anvin,
	David S. Miller, David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz

On 10/06/2011 11:26 AM, Steven Rostedt wrote:
> On Thu, 2011-10-06 at 14:10 -0400, Jason Baron wrote:
> 
>>> Looks like jmp2 is about 5% faster than jmp5 on Sandybridge with this
>>> benchmark.
>>>
>>> But insignificant difference on Nehalem.
>>>
>>>     J
>>
>> It would be cool if we could make the total width 2-bytes, when
>> possible.  It might be possible by making the initial 'JUMP_LABEL_INITIAL_NOP'
>> as a 'jmp' to the 'l_yes' label. And then patching that with a no-op at boot
>> time or link time - letting the compiler pick the width. In that way we could
>> get the optimal width...
> 
> Why not just do it?
> 
> jump_label is encapsulated in arch_static_branch() which on x86 looks
> like:
> 
> static __always_inline bool arch_static_branch(struct jump_label_key *key)
> {
> 	asm goto("1:"
> 		JUMP_LABEL_INITIAL_NOP
> 		".pushsection __jump_table,  \"aw\" \n\t"
> 		_ASM_ALIGN "\n\t"
> 		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
> 		".popsection \n\t"
> 		: :  "i" (key) : : l_yes);
> 	return false;
> l_yes:
> 	return true;
> }
> 
> 
> That jmp to l_yes should easily be a two byte jump.

Until the compiler decides to re-order the code.  That's the problem --
in the general case you do not know how far away the destination is really
going to be.

There are a couple of possibilities for improvement:

(1) Do as Jason suggests above and let the assembler figure out the size
of the branch that is needed.  Without adding more data to __jump_table,
you'll want to be extremely careful about checking the two pointers to 
see what size branch has been installed.

(2) Always reserve 5 bytes of space, but if the distance is small enough
patch in a 2-byte jump.  That doesn't help with the icache footprint.

(3) There is no 3.  I was going to say something clever about gas .ifne
conditionals, but a quick test revealed they don't work for forward
declarations.
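[ Editor's note: option (1)'s "extremely careful" pointer check could look
something like this sketch. The helper is hypothetical; it assumes only the
code and target addresses already recorded in __jump_table. ]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum jmp_form { JMP_NONE, JMP_SHORT, JMP_NEAR };

/*
 * Sketch of option (1): without adding data to __jump_table, infer
 * which branch form the assembler emitted by decoding the first byte
 * at the code address and verifying that the displacement really lands
 * on the recorded target.  `insn` is a readable copy of the bytes at
 * address `code`.
 */
static enum jmp_form installed_form(const uint8_t *insn,
				    uint64_t code, uint64_t target)
{
	if (insn[0] == 0xEB && code + 2 + (int8_t)insn[1] == target)
		return JMP_SHORT;		/* 2-byte jmp rel8 */
	if (insn[0] == 0xE9) {
		int32_t rel32;

		memcpy(&rel32, insn + 1, 4);
		if (code + 5 + rel32 == target)
			return JMP_NEAR;	/* 5-byte jmp rel32 */
	}
	return JMP_NONE;			/* presumably already a nop */
}
```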



r~



* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 18:50                     ` Richard Henderson
@ 2011-10-06 19:28                       ` Steven Rostedt
  2011-10-06 21:42                         ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 53+ messages in thread
From: Steven Rostedt @ 2011-10-06 19:28 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Jason Baron, Jeremy Fitzhardinge, H. Peter Anvin,
	David S. Miller, David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz

On Thu, 2011-10-06 at 11:50 -0700, Richard Henderson wrote:
> On 10/06/2011 11:26 AM, Steven Rostedt wrote:
>  
> > 
> > That jmp to l_yes should easily be a two byte jump.
> 
> Until the compiler decides to re-order the code.  That's the problem --
> in the general case you do not know how far away the destination is really
> going to be.

Yeah, I was discussing this with Peter Zijlstra on IRC.

> 
> There are a couple of possibilities for improvement:
> 
> (1) Do as Jason suggests above and let the assembler figure out the size
> of the branch that is needed.  Without adding more data to __jump_table,
> you'll want to be extremely careful about checking the two pointers to 
> see what size branch has been installed.

Yeah, that could be done at patch time.


> 
> (2) Always reserve 5 bytes of space, but if the distance is small enough
> patch in a 2-byte jump.  That doesn't help with the icache footprint.

I don't think this one is worth it.

-- Steve



* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 18:38                       ` Jason Baron
@ 2011-10-06 19:34                         ` Steven Rostedt
  2011-10-06 20:33                           ` Jason Baron
  0 siblings, 1 reply; 53+ messages in thread
From: Steven Rostedt @ 2011-10-06 19:34 UTC (permalink / raw)
  To: Jason Baron
  Cc: H. Peter Anvin, Jeremy Fitzhardinge, David S. Miller,
	David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz, rth

On Thu, 2011-10-06 at 14:38 -0400, Jason Baron wrote:

> Right, it's certainly doable, but I'm not sure it's so simple, since we'll
> need a pass to eliminate the jumps - which can be keyed off the
> '__jump_table' section.

Look at the code of scripts/recordmcount.c and friends.

It does two things.

1) find all the callers of mcount and make a section for it.

2) For those callers of mcount that are in sections that are not
whitelisted, and therefore will not be patched, replace the call to
mcount with a nop.


We can use this code, or a copy of it, to do the same with jump_label.
Have the x86 jump_label be:


static __always_inline bool arch_static_branch(struct jump_label_key
*key)
{
	asm goto("1:"
		"jmp %l[l_yes]\n"
		".pushsection __jump_table,  \"aw\" \n\t"
		_ASM_ALIGN "\n\t"
		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
		".popsection \n\t"
		: :  "i" (key) : : l_yes);
	return false;
l_yes:
	return true;
}

Then have the record_jumplabel.c (or whatever it's called) find all the
jmps at run time, and convert them into the appropriate nop.

Then at runtime patching, the jumplabel code could figure out what size
jump it needs to replace it.

-- Steve




* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 19:34                         ` Steven Rostedt
@ 2011-10-06 20:33                           ` Jason Baron
  2011-10-06 20:45                             ` Steven Rostedt
  0 siblings, 1 reply; 53+ messages in thread
From: Jason Baron @ 2011-10-06 20:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: H. Peter Anvin, Jeremy Fitzhardinge, David S. Miller,
	David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz, rth

On Thu, Oct 06, 2011 at 03:34:13PM -0400, Steven Rostedt wrote:
> On Thu, 2011-10-06 at 14:38 -0400, Jason Baron wrote:
> 
> > Right, it's certainly doable, but I'm not sure it's so simple, since we'll
> > need a pass to eliminate the jumps - which can be keyed off the
> > '__jump_table' section.
> 
> Look at the code of scripts/recordmcount.c and friends.
> 
> It does two things.
> 
> 1) find all the callers of mcount and make a section for it.
> 
> 2) For those callers of mcount that are in sections that are not
> whitelisted, and therefore will not be patched, replace the call to
> mcount with a nop.
> 
> 
> We can use this code, or a copy of it, to do the same with jump_label.
> Have the x86 jump_label be:
> 
> 
> static __always_inline bool arch_static_branch(struct jump_label_key
> *key)
> {
> 	asm goto("1:"
> > 		"jmp %l[l_yes]\n"
> 		".pushsection __jump_table,  \"aw\" \n\t"
> 		_ASM_ALIGN "\n\t"
> 		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
> 		".popsection \n\t"
> 		: :  "i" (key) : : l_yes);
> 	return false;
> l_yes:
> 	return true;
> }
> 
> Then have the record_jumplabel.c (or whatever it's called) find all the
> jmps at run time, and convert them into the appropriate nop.
> 

I'd prefer to do this at build-time as hpa said. We don't want there to
be some race between patching in the no-ops and taking an incorrect branch.

> Then at runtime patching, the jumplabel code could figure out what size
> jump it needs to replace it.
> 
> -- Steve
> 
> 

sounds like a good plan. thanks for the pointers!

-Jason


* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 20:33                           ` Jason Baron
@ 2011-10-06 20:45                             ` Steven Rostedt
  0 siblings, 0 replies; 53+ messages in thread
From: Steven Rostedt @ 2011-10-06 20:45 UTC (permalink / raw)
  To: Jason Baron
  Cc: H. Peter Anvin, Jeremy Fitzhardinge, David S. Miller,
	David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz, rth

On Thu, 2011-10-06 at 16:33 -0400, Jason Baron wrote:

> > Then have the record_jumplabel.c (or whatever it's called) find all the
> > jmps at run time, and convert them into the appropriate nop.
> > 
> 
> I'd prefer to do this at build-time as hpa said. We don't want there to
> be some race between patching in the no-ops and taking an incorrect branch.

Yep, this record_jumplabel.c would execute at build time, just like
recordmcount.c does.

> 
> > Then at runtime patching, the jumplabel code could figure out what size
> > jump it needs to replace it.

The runtime patching is when we can figure out what size the compiler
used. That won't change. It should be pretty trivial to do. I can't see
any races here.

-- Steve






* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 18:13                   ` H. Peter Anvin
@ 2011-10-06 21:39                     ` Jeremy Fitzhardinge
  2011-10-06 22:08                       ` Steven Rostedt
  0 siblings, 1 reply; 53+ messages in thread
From: Jeremy Fitzhardinge @ 2011-10-06 21:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jason Baron, Steven Rostedt, David S. Miller, David Daney,
	Michael Ellerman, Jan Glauber, the arch/x86 maintainers,
	Xen Devel, Linux Kernel Mailing List, Jeremy Fitzhardinge,
	peterz, rth

On 10/06/2011 11:13 AM, H. Peter Anvin wrote:
> On 10/06/2011 11:10 AM, Jason Baron wrote:
>> It would be cool if we could make the total width 2-bytes, when
>> possible.  It might be possible by making the initial 'JUMP_LABEL_INITIAL_NOP'
>> as a 'jmp' to the 'l_yes' label. And then patching that with a no-op at boot
>> time or link time - letting the compiler pick the width. In that way we could
>> get the optimal width...
>>
> Yes, that would be a win just based on icache footprint alone.

I'm not sure it would be a win, necessarily.  My test with back-to-back
jmp2 was definitely slower than with the nop padding it out to 5 bytes;
I suspect that's a result of having too many jmps within one cacheline. 
Of course, there's no reason why the CPU would optimise for jumps to
jumps, so perhaps it's just hitting a "stupid programmer" path.

    J


* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 19:28                       ` Steven Rostedt
@ 2011-10-06 21:42                         ` Jeremy Fitzhardinge
  2011-10-06 22:06                           ` Steven Rostedt
  0 siblings, 1 reply; 53+ messages in thread
From: Jeremy Fitzhardinge @ 2011-10-06 21:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Richard Henderson, Jason Baron, H. Peter Anvin, David S. Miller,
	David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz

On 10/06/2011 12:28 PM, Steven Rostedt wrote:
>> (2) Always reserve 5 bytes of space, but if the distance is small enough
>> patch in a 2-byte jump.  That doesn't help with the icache footprint.
> I don't think this one is worth it.

I disagree.  This is what I benchmarked as having a 5% improvement.  If
squashing out the padding helps, then that's a separate optimisation.

    J


* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 21:42                         ` Jeremy Fitzhardinge
@ 2011-10-06 22:06                           ` Steven Rostedt
  2011-10-06 22:10                             ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 53+ messages in thread
From: Steven Rostedt @ 2011-10-06 22:06 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Richard Henderson, Jason Baron, H. Peter Anvin, David S. Miller,
	David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz

On Thu, 2011-10-06 at 14:42 -0700, Jeremy Fitzhardinge wrote:
> On 10/06/2011 12:28 PM, Steven Rostedt wrote:
> >> (2) Always reserve 5 bytes of space, but if the distance is small enough
> >> patch in a 2-byte jump.  That doesn't help with the icache footprint.
> > I don't think this one is worth it.
> 
> I disagree.  This is what I benchmarked as having a 5% improvement.  If
> squashing out the padding helps, then that's a separate optimisation.

But it only speeds up the tracing case. The non-tracing case is a nop,
and 5 bytes is 5 bytes regardless.

Did you see a 5% speed up while tracing was happening? How did you do
your test? I find a 5-byte jump compared to a 2-byte jump negligible
against the rest of the overhead of tracing, but I could be wrong.

-- Steve
 



* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 21:39                     ` Jeremy Fitzhardinge
@ 2011-10-06 22:08                       ` Steven Rostedt
  0 siblings, 0 replies; 53+ messages in thread
From: Steven Rostedt @ 2011-10-06 22:08 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, Jason Baron, David S. Miller, David Daney,
	Michael Ellerman, Jan Glauber, the arch/x86 maintainers,
	Xen Devel, Linux Kernel Mailing List, Jeremy Fitzhardinge,
	peterz, rth

On Thu, 2011-10-06 at 14:39 -0700, Jeremy Fitzhardinge wrote:

> I'm not sure it would be a win, necessarily.  My test with back-to-back
> jmp2 was definitely slower than with the nop padding it out to 5 bytes;
> I suspect that's a result of having too many jmps within one cacheline. 
> Of course, there's no reason why the CPU would optimise for jumps to
> jumps, so perhaps it's just hitting a "stupid programmer" path.

Just a note. Micro-benchmarks are known to be as good as political polls
and statistics. They show a much different view of the world than what
is really there.

-- Steve




* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 22:06                           ` Steven Rostedt
@ 2011-10-06 22:10                             ` Jeremy Fitzhardinge
  2011-10-06 22:20                               ` Steven Rostedt
  2011-10-07 17:09                               ` [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps Steven Rostedt
  0 siblings, 2 replies; 53+ messages in thread
From: Jeremy Fitzhardinge @ 2011-10-06 22:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Richard Henderson, Jason Baron, H. Peter Anvin, David S. Miller,
	David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz

On 10/06/2011 03:06 PM, Steven Rostedt wrote:
> But it only speeds up the tracing case. The non-tracing case is a nop,
> and 5 bytes is 5 bytes regardless.
>
> Did you see a 5% speed up while tracing was happening? How did you do
> your test? I find a 5-byte jump compared to a 2-byte jump negligible
> against the rest of the overhead of tracing, but I could be wrong.

You're right, this was a completely artificial microbenchmark.  In
practice the improvement would be a much smaller effect.

But bear in mind, I'm not using jump-label for tracing.  While it's
important for the "disabled" state to be quick, performance of the
"enabled" state is also important.

Thanks,
    J


* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-06 22:10                             ` Jeremy Fitzhardinge
@ 2011-10-06 22:20                               ` Steven Rostedt
  2011-10-07 17:09                               ` [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps Steven Rostedt
  1 sibling, 0 replies; 53+ messages in thread
From: Steven Rostedt @ 2011-10-06 22:20 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Richard Henderson, Jason Baron, H. Peter Anvin, David S. Miller,
	David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz

On Thu, 2011-10-06 at 15:10 -0700, Jeremy Fitzhardinge wrote:
> On 10/06/2011 03:06 PM, Steven Rostedt wrote:

> But bear in mind, I'm not using jump-label for tracing.  While it's
> important for the "disabled" state to be quick, performance of the
> "enabled" state is also important.

Sorry, I'm still thinking of jump-label in terms of tracing.

But that said, having the nop match is the best of both worlds.

I think having an update_jump_label.c, just like recordmcount.c, that
modifies the code right after it is compiled is the best thing to do.
That is, have the assembler determine what size jumps to use and update
them to nops right in the object file before linking. This should be
rather trivial to do.

-- Steve




* [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
  2011-10-06 22:10                             ` Jeremy Fitzhardinge
  2011-10-06 22:20                               ` Steven Rostedt
@ 2011-10-07 17:09                               ` Steven Rostedt
  2011-10-07 18:52                                 ` Jason Baron
  2011-10-07 19:40                                 ` Jeremy Fitzhardinge
  1 sibling, 2 replies; 53+ messages in thread
From: Steven Rostedt @ 2011-10-07 17:09 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Richard Henderson, Jason Baron, H. Peter Anvin, David S. Miller,
	David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz

Note, this is just hacked together and needs to be cleaned up. Please do
not comment on formatting or other sloppiness of this patch. I know it's
sloppy and I left debug statements in. I want the comments to be on the
idea of the patch.

I created a new file called scripts/update_jump_label.[ch] based on some
of the work of recordmcount.c. This is executed at build time on all
object files just like recordmcount is. But it does not add any new
sections, it just modifies the code at build time to convert all jump
labels into nops.

The idea, in arch/x86/include/asm/jump_label.h, is to not place a nop, but
instead to insert a jmp to the label. Depending on how gcc optimizes the
code, the jmp will end up being either a 2-byte or a 5-byte jump.
After an object is compiled, update_jump_label is executed on this file:
it reads the ELF relocation table to find the jump label locations and
examines which jump was used. It then converts the jump into the
appropriate 2-byte or 5-byte nop.
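[ Editor's note: a minimal sketch of that jmp-to-nop conversion. The nop
byte patterns here are assumptions - 0x66 0x90 for the 2-byte nop and the
0x0F 0x1F 0x44 0x00 0x00 nopl for the 5-byte case - standing in for the
kernel's ideal_nops[]. ]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Build-time sketch: given a pointer to the first byte of the jmp that
 * gcc/gas emitted, overwrite it in place with a same-sized nop.
 * Returns the number of bytes rewritten, or 0 if the instruction isn't
 * a recognized jmp.  The nop patterns are assumed stand-ins for the
 * kernel's ideal_nops[] choices.
 */
static int jmp_to_nop(uint8_t *insn)
{
	static const uint8_t nop2[] = { 0x66, 0x90 };
	static const uint8_t nop5[] = { 0x0F, 0x1F, 0x44, 0x00, 0x00 };

	if (insn[0] == 0xEB) {			/* 2-byte jmp rel8 */
		memcpy(insn, nop2, sizeof(nop2));
		return 2;
	}
	if (insn[0] == 0xE9) {			/* 5-byte jmp rel32 */
		memcpy(insn, nop5, sizeof(nop5));
		return 5;
	}
	return 0;
}
```

At enable time, the runtime side can run the inverse mapping by switching
on the same first opcode byte, which is what the patch below does.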

At boot time, the jump labels no longer need to be converted (although
we may do so in the future to use better nops depending on the machine
that is used). When jump labels are enabled, the code is examined to see
whether a 2-byte or 5-byte version was used, and the appropriate update
is made.

I just booted this patch and it worked. I was able to enable and disable
trace points using jump labels. Benchmarks are welcomed :)

Comments and thoughts?

-- Steve

Sloppy-signed-off-by: Steven Rostedt <rostedt@goodmis.org>

diff --git a/Makefile b/Makefile
index 31f967c..8368f42 100644
--- a/Makefile
+++ b/Makefile
@@ -245,7 +245,7 @@ CONFIG_SHELL := $(shell if [ -x "$$BASH" ]; then echo $$BASH; \
 
 HOSTCC       = gcc
 HOSTCXX      = g++
-HOSTCFLAGS   = -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer
+HOSTCFLAGS   = -Wall -Wmissing-prototypes -Wstrict-prototypes -g -fomit-frame-pointer
 HOSTCXXFLAGS = -O2
 
 # Decide whether to build built-in, modular, or both.
@@ -611,6 +611,13 @@ ifdef CONFIG_DYNAMIC_FTRACE
 endif
 endif
 
+ifdef CONFIG_JUMP_LABEL
+	ifdef CONFIG_HAVE_BUILD_TIME_JUMP_LABEL
+		BUILD_UPDATE_JUMP_LABEL := y
+		export BUILD_UPDATE_JUMP_LABEL
+	endif
+endif
+
 # We trigger additional mismatches with less inlining
 ifdef CONFIG_DEBUG_SECTION_MISMATCH
 KBUILD_CFLAGS += $(call cc-option, -fno-inline-functions-called-once)
diff --git a/arch/Kconfig b/arch/Kconfig
index 4b0669c..8fa6934 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -169,6 +169,12 @@ config HAVE_PERF_EVENTS_NMI
 	  subsystem.  Also has support for calculating CPU cycle events
 	  to determine how many clock cycles in a given period.
 
+config HAVE_BUILD_TIME_JUMP_LABEL
+       bool
+       help
+	If an arch uses scripts/update_jump_label to patch in jump nops
+	at build time, then it must enable this option.
+
 config HAVE_ARCH_JUMP_LABEL
 	bool
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6a47bb2..6de726a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -61,6 +61,7 @@ config X86
 	select HAVE_ARCH_KMEMCHECK
 	select HAVE_USER_RETURN_NOTIFIER
 	select HAVE_ARCH_JUMP_LABEL
+	select HAVE_BUILD_TIME_JUMP_LABEL
 	select HAVE_TEXT_POKE_SMP
 	select HAVE_GENERIC_HARDIRQS
 	select HAVE_SPARSE_IRQ
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index a32b18c..872b3e1 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -14,7 +14,7 @@
 static __always_inline bool arch_static_branch(struct jump_label_key *key)
 {
 	asm goto("1:"
-		JUMP_LABEL_INITIAL_NOP
+		"jmp %l[l_yes]\n"
 		".pushsection __jump_table,  \"aw\" \n\t"
 		_ASM_ALIGN "\n\t"
 		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index 3fee346..1f7f88f 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -16,34 +16,75 @@
 
 #ifdef HAVE_JUMP_LABEL
 
+static unsigned char nop_short[] = { P6_NOP2 };
+
 union jump_code_union {
 	char code[JUMP_LABEL_NOP_SIZE];
 	struct {
 		char jump;
 		int offset;
 	} __attribute__((packed));
+	struct {
+		char jump_short;
+		char offset_short;
+	} __attribute__((packed));
 };
 
 void arch_jump_label_transform(struct jump_entry *entry,
 			       enum jump_label_type type)
 {
 	union jump_code_union code;
+	unsigned char op;
+	unsigned size;
+	unsigned char nop;
+
+	/* Use probe_kernel_read()? */
+	op = *(unsigned char *)entry->code;
+	nop = ideal_nops[NOP_ATOMIC5][0];
 
 	if (type == JUMP_LABEL_ENABLE) {
-		code.jump = 0xe9;
-		code.offset = entry->target -
-				(entry->code + JUMP_LABEL_NOP_SIZE);
-	} else
-		memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);
+		if (op == 0xe9 || op == 0xeb)
+			/* Already enabled. Warn? */
+			return;
+
+		/* FIXME for all archs */
+		if (op == nop_short[0]) {
+			size = 2;
+			code.jump_short = 0xeb;
+			code.offset_short = entry->target -
+				(entry->code + 2);
+			/* Check for rel8 overflow? */
+		} else if (op == nop) {
+			size = JUMP_LABEL_NOP_SIZE;
+			code.jump = 0xe9;
+			code.offset = entry->target - (entry->code + size);
+		} else
+			return; /* WARN ? */
+
+	} else {
+		if (op == nop_short[0] || op == nop)
+			/* Already disabled, warn? */
+			return;
+
+		if (op == 0xe9) {
+			size = JUMP_LABEL_NOP_SIZE;
+			memcpy(&code, ideal_nops[NOP_ATOMIC5], size);
+		} else if (op == 0xeb) {
+			size = 2;
+			memcpy(&code, nop_short, size);
+		} else
+			return; /* WARN ? */
+	}
 	get_online_cpus();
 	mutex_lock(&text_mutex);
-	text_poke_smp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
+	text_poke_smp((void *)entry->code, &code, size);
 	mutex_unlock(&text_mutex);
 	put_online_cpus();
 }
 
 void arch_jump_label_text_poke_early(jump_label_t addr)
 {
+	return;	/* nops are now inserted at build time; nothing to do early */
 	text_poke_early((void *)addr, ideal_nops[NOP_ATOMIC5],
 			JUMP_LABEL_NOP_SIZE);
 }
diff --git a/scripts/Makefile b/scripts/Makefile
index df7678f..738b65c 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -13,6 +13,7 @@ hostprogs-$(CONFIG_LOGO)         += pnmtologo
 hostprogs-$(CONFIG_VT)           += conmakehash
 hostprogs-$(CONFIG_IKCONFIG)     += bin2c
 hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount
+hostprogs-$(BUILD_UPDATE_JUMP_LABEL) += update_jump_label
 
 always		:= $(hostprogs-y) $(hostprogs-m)
 
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index a0fd502..bc0d89b 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -258,6 +258,15 @@ cmd_modversions =								\
 	fi;
 endif
 
+ifdef BUILD_UPDATE_JUMP_LABEL
+update_jump_label_source := $(srctree)/scripts/update_jump_label.c \
+			$(srctree)/scripts/update_jump_label.h
+cmd_update_jump_label =						\
+	if [ $(@) != "scripts/mod/empty.o" ]; then		\
+		$(objtree)/scripts/update_jump_label "$(@)";	\
+	fi;
+endif
+
 ifdef CONFIG_FTRACE_MCOUNT_RECORD
 ifdef BUILD_C_RECORDMCOUNT
 ifeq ("$(origin RECORDMCOUNT_WARN)", "command line")
@@ -294,6 +303,7 @@ define rule_cc_o_c
 	$(cmd_modversions)						  \
 	$(call echo-cmd,record_mcount)					  \
 	$(cmd_record_mcount)						  \
+	$(cmd_update_jump_label)					  \
 	scripts/basic/fixdep $(depfile) $@ '$(call make-cmd,cc_o_c)' >    \
 	                                              $(dot-target).tmp;  \
 	rm -f $(depfile);						  \
@@ -301,13 +311,14 @@ define rule_cc_o_c
 endef
 
 # Built-in and composite module parts
-$(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
+$(obj)/%.o: $(src)/%.c $(recordmcount_source) $(update_jump_label_source) FORCE
 	$(call cmd,force_checksrc)
 	$(call if_changed_rule,cc_o_c)
 
 # Single-part modules are special since we need to mark them in $(MODVERDIR)
 
-$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
+$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) \
+		  $(update_jump_label_source) FORCE
 	$(call cmd,force_checksrc)
 	$(call if_changed_rule,cc_o_c)
 	@{ echo $(@:.o=.ko); echo $@; } > $(MODVERDIR)/$(@F:.o=.mod)
diff --git a/scripts/update_jump_label.c b/scripts/update_jump_label.c
new file mode 100644
index 0000000..86e17bc
--- /dev/null
+++ b/scripts/update_jump_label.c
@@ -0,0 +1,349 @@
+/*
+ * update_jump_label.c: replace jmps with nops at compile time.
+ * Copyright 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc.
+ *  Parsing of the elf file was influenced by recordmcount.c
+ *  originally written by and copyright to John F. Reiser <jreiser@BitWagon.com>.
+ */
+
+/*
+ * Note, this code is originally designed for x86, but may be used by
+ * other archs to do the nop updates at compile time instead of at boot time.
+ * X86 uses this as an optimization, as jmps can be either 2 bytes or 5 bytes.
+ * Inserting a 2-byte nop where possible helps with both CPU performance and
+ * icache strain.
+ */
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <getopt.h>
+#include <elf.h>
+#include <fcntl.h>
+#include <setjmp.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdarg.h>
+#include <string.h>
+#include <unistd.h>
+
+static int fd_map;	/* File descriptor for file being modified. */
+static struct stat sb;	/* Remember .st_size, etc. */
+static int mmap_failed; /* Boolean flag. */
+
+static void die(const char *err, const char *fmt, ...)
+{
+	va_list ap;
+
+	if (err)
+		perror(err);
+
+	if (fmt) {
+		va_start(ap, fmt);
+		fprintf(stderr, "Fatal error:  ");
+		vfprintf(stderr, fmt, ap);
+		fprintf(stderr, "\n");
+		va_end(ap);
+	}
+
+	exit(1);
+}
+
+static void usage(char **argv)
+{
+	char *arg = argv[0];
+	char *p = arg+strlen(arg);
+
+	while (p >= arg && *p != '/')
+		p--;
+	p++;
+
+	printf("usage: %s file\n"
+	       "\n",p);
+	exit(-1);
+}
+
+/* w8rev, w8nat, ...: Handle endianness. */
+
+static uint64_t w8rev(uint64_t const x)
+{
+	return   ((0xff & (x >> (0 * 8))) << (7 * 8))
+	       | ((0xff & (x >> (1 * 8))) << (6 * 8))
+	       | ((0xff & (x >> (2 * 8))) << (5 * 8))
+	       | ((0xff & (x >> (3 * 8))) << (4 * 8))
+	       | ((0xff & (x >> (4 * 8))) << (3 * 8))
+	       | ((0xff & (x >> (5 * 8))) << (2 * 8))
+	       | ((0xff & (x >> (6 * 8))) << (1 * 8))
+	       | ((0xff & (x >> (7 * 8))) << (0 * 8));
+}
+
+static uint32_t w4rev(uint32_t const x)
+{
+	return   ((0xff & (x >> (0 * 8))) << (3 * 8))
+	       | ((0xff & (x >> (1 * 8))) << (2 * 8))
+	       | ((0xff & (x >> (2 * 8))) << (1 * 8))
+	       | ((0xff & (x >> (3 * 8))) << (0 * 8));
+}
+
+static uint32_t w2rev(uint16_t const x)
+{
+	return   ((0xff & (x >> (0 * 8))) << (1 * 8))
+	       | ((0xff & (x >> (1 * 8))) << (0 * 8));
+}
+
+static uint64_t w8nat(uint64_t const x)
+{
+	return x;
+}
+
+static uint32_t w4nat(uint32_t const x)
+{
+	return x;
+}
+
+static uint32_t w2nat(uint16_t const x)
+{
+	return x;
+}
+
+static uint64_t (*w8)(uint64_t);
+static uint32_t (*w)(uint32_t);
+static uint32_t (*w2)(uint16_t);
+
+/* ulseek, uread, ...:  Check return value for errors. */
+
+static off_t
+ulseek(int const fd, off_t const offset, int const whence)
+{
+	off_t const w = lseek(fd, offset, whence);
+	if (w == (off_t)-1)
+		die("lseek", NULL);
+
+	return w;
+}
+
+static size_t
+uread(int const fd, void *const buf, size_t const count)
+{
+	size_t const n = read(fd, buf, count);
+	if (n != count)
+		die("read", NULL);
+
+	return n;
+}
+
+static size_t
+uwrite(int const fd, void const *const buf, size_t const count)
+{
+	size_t const n = write(fd, buf, count);
+	if (n != count)
+		die("write", NULL);
+
+	return n;
+}
+
+static void *
+umalloc(size_t size)
+{
+	void *const addr = malloc(size);
+	if (addr == 0)
+		die("malloc", "malloc failed: %zu bytes\n", size);
+
+	return addr;
+}
+
+/*
+ * Get the whole file as a programming convenience in order to avoid
+ * malloc+lseek+read+free of many pieces.  If successful, then mmap
+ * avoids copying unused pieces; else just read the whole file.
+ * Open for both read and write; new info will be appended to the file.
+ * Use MAP_PRIVATE so that a few changes to the in-memory ElfXX_Ehdr
+ * do not propagate to the file until an explicit overwrite at the last.
+ * This preserves most aspects of consistency (all except .st_size)
+ * for simultaneous readers of the file while we are appending to it.
+ * However, multiple writers still are bad.  We choose not to use
+ * locking because it is expensive and the use case of kernel build
+ * makes multiple writers unlikely.
+ */
+static void *mmap_file(char const *fname)
+{
+	void *addr;
+
+	fd_map = open(fname, O_RDWR);
+	if (fd_map < 0 || fstat(fd_map, &sb) < 0)
+		die(fname, "failed to open file");
+
+	if (!S_ISREG(sb.st_mode))
+		die(NULL, "not a regular file: %s\n", fname);
+
+	addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE,
+		    fd_map, 0);
+
+	mmap_failed = 0;
+	if (addr == MAP_FAILED) {
+		mmap_failed = 1;
+		addr = umalloc(sb.st_size);
+		uread(fd_map, addr, sb.st_size);
+	}
+	return addr;
+}
+
+static void munmap_file(void *addr)
+{
+	if (!mmap_failed)
+		munmap(addr, sb.st_size);
+	else
+		free(addr);
+	close(fd_map);
+}
+
+static unsigned char ideal_nop5_x86_64[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
+static unsigned char ideal_nop5_x86_32[5] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 };
+static unsigned char ideal_nop2_x86[2] = { 0x66, 0x90 };
+static unsigned char *ideal_nop;
+
+static int (*make_nop)(void *map, size_t const offset);
+
+static int make_nop_x86(void *map, size_t const offset)
+{
+	unsigned char *op;
+	unsigned char *nop;
+	int size;
+
+	/* Determine which type of jmp this is 2 byte or 5. */
+	op = map + offset;
+	switch (*op) {
+	case 0xeb: /* 2 byte */
+		size = 2;
+		nop = ideal_nop2_x86;
+		break;
+	case 0xe9: /* 5 byte */
+		size = 5;
+		nop = ideal_nop;
+		break;
+	default:
+		die(NULL, "Bad jump label section\n");
+	}
+
+	/* convert to nop */
+	ulseek(fd_map, offset, SEEK_SET);
+	uwrite(fd_map, nop, size);
+	return 0;
+}
+
+/* 32 bit and 64 bit are very similar */
+#include "update_jump_label.h"
+#define UPDATE_JUMP_LABEL_64
+#include "update_jump_label.h"
+
+static int do_file(const char *fname)
+{
+	Elf32_Ehdr *const ehdr = mmap_file(fname);
+	unsigned int reltype = 0;
+
+	w = w4nat;
+	w2 = w2nat;
+	w8 = w8nat;
+	switch (ehdr->e_ident[EI_DATA]) {
+		static unsigned int const endian = 1;
+	default:
+		die(NULL, "unrecognized ELF data encoding %d: %s\n",
+			ehdr->e_ident[EI_DATA], fname);
+		break;
+	case ELFDATA2LSB:
+		if (*(unsigned char const *)&endian != 1) {
+			/* main() is big endian, file.o is little endian. */
+			w = w4rev;
+			w2 = w2rev;
+			w8 = w8rev;
+		}
+		break;
+	case ELFDATA2MSB:
+		if (*(unsigned char const *)&endian != 0) {
+			/* main() is little endian, file.o is big endian. */
+			w = w4rev;
+			w2 = w2rev;
+			w8 = w8rev;
+		}
+		break;
+	}  /* end switch */
+
+	if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0 ||
+	    w2(ehdr->e_type) != ET_REL ||
+	    ehdr->e_ident[EI_VERSION] != EV_CURRENT)
+		die(NULL, "unrecognized ET_REL file %s\n", fname);
+
+	switch (w2(ehdr->e_machine)) {
+	default:
+		die(NULL, "unrecognized e_machine %d %s\n",
+		    w2(ehdr->e_machine), fname);
+		break;
+	case EM_386:
+		reltype = R_386_32;
+		make_nop = make_nop_x86;
+		ideal_nop = ideal_nop5_x86_32;
+		break;
+	case EM_ARM:	 reltype = R_ARM_ABS32;
+			 break;
+	case EM_IA_64:	 reltype = R_IA64_IMM64; break;
+	case EM_MIPS:	 /* reltype: e_class    */ break;
+	case EM_PPC:	 reltype = R_PPC_ADDR32;   break;
+	case EM_PPC64:	 reltype = R_PPC64_ADDR64; break;
+	case EM_S390:    /* reltype: e_class    */ break;
+	case EM_SH:	 reltype = R_SH_DIR32;                 break;
+	case EM_SPARCV9: reltype = R_SPARC_64;     break;
+	case EM_X86_64:
+		make_nop = make_nop_x86;
+		ideal_nop = ideal_nop5_x86_64;
+		reltype = R_X86_64_64;
+		break;
+	}  /* end switch */
+
+	switch (ehdr->e_ident[EI_CLASS]) {
+	default:
+		die(NULL, "unrecognized ELF class %d %s\n",
+		    ehdr->e_ident[EI_CLASS], fname);
+		break;
+	case ELFCLASS32:
+		if (w2(ehdr->e_ehsize) != sizeof(Elf32_Ehdr)
+		||  w2(ehdr->e_shentsize) != sizeof(Elf32_Shdr))
+			die(NULL, "unrecognized ET_REL file: %s\n", fname);
+
+		if (w2(ehdr->e_machine) == EM_S390) {
+			reltype = R_390_32;
+		}
+		if (w2(ehdr->e_machine) == EM_MIPS) {
+			reltype = R_MIPS_32;
+		}
+		do_func32(ehdr, fname, reltype);
+		break;
+	case ELFCLASS64: {
+		Elf64_Ehdr *const ghdr = (Elf64_Ehdr *)ehdr;
+		if (w2(ghdr->e_ehsize) != sizeof(Elf64_Ehdr)
+		||  w2(ghdr->e_shentsize) != sizeof(Elf64_Shdr))
+			die(NULL, "unrecognized ET_REL file: %s\n", fname);
+
+		if (w2(ghdr->e_machine) == EM_S390)
+			reltype = R_390_64;
+
+#if 0
+		if (w2(ghdr->e_machine) == EM_MIPS) {
+			reltype = R_MIPS_64;
+			Elf64_r_sym = MIPS64_r_sym;
+		}
+#endif
+		do_func64(ghdr, fname, reltype);
+		break;
+	}
+	}  /* end switch */
+
+	munmap_file(ehdr);
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	if (argc != 2)
+		usage(argv);
+
+	return do_file(argv[1]);
+}
+
diff --git a/scripts/update_jump_label.h b/scripts/update_jump_label.h
new file mode 100644
index 0000000..6ff9846
--- /dev/null
+++ b/scripts/update_jump_label.h
@@ -0,0 +1,322 @@
+/*
+ * update_jump_label.h
+ *
+ * This code was taken out of recordmcount.c, written by and
+ * Copyright 2009 John F. Reiser <jreiser@BitWagon.com>.  All rights reserved.
+ *
+ * The original code had the same algorithms for both 32bit
+ * and 64bit ELF files, but the code was duplicated to support
+ * the differences in the structures that were used. This
+ * file wraps everything that differs between the 64-bit and
+ * 32-bit code in macros, so that both sets of functions can be
+ * created by including this header twice: once with
+ * UPDATE_JUMP_LABEL_64 undefined, and again with
+ * it defined.
+ *
+ * This conversion to macros was done by:
+ * Copyright 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc.
+ *
+ * Licensed under the GNU General Public License, version 2 (GPLv2).
+ */
+
+#undef EBITS
+#undef _w
+#undef _align
+#undef _size
+
+#ifdef UPDATE_JUMP_LABEL_64
+# define EBITS			64
+# define _w			w8
+# define _align			7u
+# define _size			8
+#else
+# define EBITS			32
+# define _w			w
+# define _align			3u
+# define _size			4
+#endif
+
+#define _FBITS(x, e)	x##e
+#define FBITS(x, e)	_FBITS(x,e)
+#define FUNC(x)		FBITS(x,EBITS)
+
+#undef Elf_Addr
+#undef Elf_Ehdr
+#undef Elf_Shdr
+#undef Elf_Rel
+#undef Elf_Rela
+#undef Elf_Sym
+#undef ELF_R_SYM
+#undef ELF_R_TYPE
+
+#define __ATTACH(x,y,z)	x##y##z
+#define ATTACH(x,y,z)	__ATTACH(x,y,z)
+
+#define Elf_Addr	ATTACH(Elf,EBITS,_Addr)
+#define Elf_Ehdr	ATTACH(Elf,EBITS,_Ehdr)
+#define Elf_Shdr	ATTACH(Elf,EBITS,_Shdr)
+#define Elf_Rel		ATTACH(Elf,EBITS,_Rel)
+#define Elf_Rela	ATTACH(Elf,EBITS,_Rela)
+#define Elf_Sym		ATTACH(Elf,EBITS,_Sym)
+#define uint_t		ATTACH(uint,EBITS,_t)
+#define ELF_R_SYM	ATTACH(ELF,EBITS,_R_SYM)
+#define ELF_R_TYPE	ATTACH(ELF,EBITS,_R_TYPE)
+
+#undef get_shdr
+#define get_shdr(ehdr) ((Elf_Shdr *)(_w((ehdr)->e_shoff) + (void *)(ehdr)))
+
+#undef get_section_loc
+#define get_section_loc(ehdr, shdr)(_w((shdr)->sh_offset) + (void *)(ehdr))
+
+/* Functions and pointers that do_file() may override for specific e_machine. */
+
+#if 0
+static uint_t FUNC(fn_ELF_R_SYM)(Elf_Rel const *rp)
+{
+	return ELF_R_SYM(_w(rp->r_info));
+}
+static uint_t (*FUNC(Elf_r_sym))(Elf_Rel const *rp) = FUNC(fn_ELF_R_SYM);
+#endif
+
+static void FUNC(get_sym_str_and_relp)(Elf_Shdr const *const relhdr,
+				 Elf_Ehdr const *const ehdr,
+				 Elf_Sym const **sym0,
+				 char const **str0,
+				 Elf_Rel const **relp)
+{
+	Elf_Shdr *const shdr0 = get_shdr(ehdr);
+	unsigned const symsec_sh_link = w(relhdr->sh_link);
+	Elf_Shdr const *const symsec = &shdr0[symsec_sh_link];
+	Elf_Shdr const *const strsec = &shdr0[w(symsec->sh_link)];
+	Elf_Rel const *const rel0 =
+		(Elf_Rel const *)get_section_loc(ehdr, relhdr);
+
+	*sym0 = (Elf_Sym const *)get_section_loc(ehdr, symsec);
+
+	*str0 = (char const *)get_section_loc(ehdr, strsec);
+
+	*relp = rel0;
+}
+
+/*
+ * Read the relocation table for a section containing jump label
+ * entries. The jmp instructions that the entries reference will be
+ * converted into nops.
+ */
+static void FUNC(nop_jump_label)(Elf_Shdr const *const relhdr,
+		       Elf_Ehdr const *const ehdr,
+		       const char *const txtname)
+{
+	Elf_Shdr *const shdr0 = get_shdr(ehdr);
+	Elf_Sym const *sym0;
+	char const *str0;
+	Elf_Rel const *relp;
+	Elf_Rela const *relap;
+	Elf_Shdr const *const shdr = &shdr0[w(relhdr->sh_info)];
+	unsigned rel_entsize = w(relhdr->sh_entsize);
+	unsigned const nrel = _w(relhdr->sh_size) / rel_entsize;
+	int t;
+
+	FUNC(get_sym_str_and_relp)(relhdr, ehdr, &sym0, &str0, &relp);
+
+	for (t = nrel; t > 0; t -= 3) {
+		int ret = -1;
+
+		relap = (Elf_Rela const *)relp;
+		printf("rel offset=%lx info=%lx sym=%lx type=%lx addend=%lx\n",
+		       (long)relap->r_offset, (long)relap->r_info,
+		       (long)ELF_R_SYM(relap->r_info),
+		       (long)ELF_R_TYPE(relap->r_info),
+		       (long)relap->r_addend);
+
+		if (0 && make_nop)
+			ret = make_nop((void *)ehdr, shdr->sh_offset + relp->r_offset);
+
+		/* jump label entries are grouped in threes */
+		relp = (Elf_Rel const *)(rel_entsize * 3 + (void *)relp);
+	}
+}
+
+/* Evade ISO C restriction: no declaration after statement in has_rel_jump_table. */
+static char const *
+FUNC(__has_rel_jump_table)(Elf_Shdr const *const relhdr,  /* is SHT_REL or SHT_RELA */
+		 Elf_Shdr const *const shdr0,
+		 char const *const shstrtab,
+		 char const *const fname)
+{
+	/* .sh_info depends on .sh_type == SHT_REL[,A] */
+	Elf_Shdr const *const txthdr = &shdr0[w(relhdr->sh_info)];
+	char const *const txtname = &shstrtab[w(txthdr->sh_name)];
+
+	if (strcmp("__jump_table", txtname) == 0) {
+		fprintf(stderr, "warning: __mcount_loc already exists: %s\n",
+			fname);
+//		succeed_file();
+	}
+	if (w(txthdr->sh_type) != SHT_PROGBITS ||
+	    !(w(txthdr->sh_flags) & SHF_EXECINSTR))
+		return NULL;
+	return txtname;
+}
+
+static char const *FUNC(has_rel_jump_table)(Elf_Shdr const *const relhdr,
+				      Elf_Shdr const *const shdr0,
+				      char const *const shstrtab,
+				      char const *const fname)
+{
+	if (w(relhdr->sh_type) != SHT_REL && w(relhdr->sh_type) != SHT_RELA)
+		return NULL;
+	return FUNC(__has_rel_jump_table)(relhdr, shdr0, shstrtab, fname);
+}
+
+/* Find relocation section hdr for a given section */
+static const Elf_Shdr *
+FUNC(find_relhdr)(const Elf_Ehdr *ehdr, const Elf_Shdr *shdr)
+{
+	const Elf_Shdr *shdr0 = get_shdr(ehdr);
+	int nhdr = w2(ehdr->e_shnum);
+	const Elf_Shdr *hdr;
+	int i;
+
+	for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) {
+		if (w(hdr->sh_type) != SHT_REL &&
+		    w(hdr->sh_type) != SHT_RELA)
+			continue;
+
+		/*
+		 * The relocation section's info field holds
+		 * the section index that it represents.
+		 */
+		if (shdr == &shdr0[w(hdr->sh_info)])
+			return hdr;
+	}
+	return NULL;
+}
+
+/* Find a section header by name and type */
+static const Elf_Shdr *
+FUNC(find_shdr)(const Elf_Ehdr *ehdr, const char *name, uint_t type)
+{
+	const Elf_Shdr *shdr0 = get_shdr(ehdr);
+	const Elf_Shdr *shstr = &shdr0[w2(ehdr->e_shstrndx)];
+	const char *shstrtab = (char *)get_section_loc(ehdr, shstr);
+	int nhdr = w2(ehdr->e_shnum);
+	const Elf_Shdr *hdr;
+	const char *hdrname;
+	int i;
+
+	for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) {
+		if (w(hdr->sh_type) != type)
+			continue;
+
+		/* If we are just looking for a section by type (ie. SYMTAB) */
+		if (!name)
+			return hdr;
+
+		hdrname = &shstrtab[w(hdr->sh_name)];
+		if (strcmp(hdrname, name) == 0)
+			return hdr;
+	}
+	return NULL;
+}
+
+static void
+FUNC(section_update)(const Elf_Ehdr *ehdr, const Elf_Shdr *symhdr,
+		     unsigned shtype, const Elf_Rel *rel, void *data)
+{
+	const Elf_Shdr *shdr0 = get_shdr(ehdr);
+	const Elf_Shdr *targethdr;
+	const Elf_Rela *rela;
+	const Elf_Sym *syment;
+	uint_t offset = _w(rel->r_offset);
+	uint_t info = _w(rel->r_info);
+	uint_t sym = ELF_R_SYM(info);
+	uint_t type = ELF_R_TYPE(info);
+	uint_t addend;
+	uint_t targetloc;
+
+	if (shtype == SHT_RELA) {
+		rela = (const Elf_Rela *)rel;
+		addend = _w(rela->r_addend);
+	} else
+		addend = _w(*(unsigned short *)(data + offset));
+
+	syment = (const Elf_Sym *)get_section_loc(ehdr, symhdr);
+	targethdr = &shdr0[w2(syment[sym].st_shndx)];
+	targetloc = _w(targethdr->sh_offset);
+
+	/* TODO, need a separate function for all archs */
+	if (type != R_386_32)
+		die(NULL, "Arch relocation type %d not supported", type);
+
+	targetloc += addend;
+
+#if 1
+	printf("offset=%x target=%x shoffset=%x add=%x\n",
+	       offset, targetloc, _w(targethdr->sh_offset), addend);
+#endif
+	*(uint_t *)(data + offset) = targetloc;
+}
+
+/* Overall supervision for Elf32 ET_REL file. */
+static void
+FUNC(do_func)(Elf_Ehdr *ehdr, char const *const fname, unsigned const reltype)
+{
+	const Elf_Shdr *jlshdr;
+	const Elf_Shdr *jlrhdr;
+	const Elf_Shdr *symhdr;
+	const Elf_Rel *rel;
+	unsigned size;
+	unsigned cnt;
+	unsigned i;
+	uint_t type;
+	void *jdata;
+	void *data;
+
+	jlshdr = FUNC(find_shdr)(ehdr, "__jump_table", SHT_PROGBITS);
+	if (!jlshdr)
+		return;
+
+	jlrhdr = FUNC(find_relhdr)(ehdr, jlshdr);
+	if (!jlrhdr)
+		return;
+
+	/*
+	 * Create and fill in the __jump_table section and use it to
+	 * find the offsets into the text that we want to update.
+	 * We create it so that we do not depend on the order of the
+	 * relocations, and use the table directly, as it is broken
+	 * up into sections.
+	 */
+	size = _w(jlshdr->sh_size);
+	data = umalloc(size);
+
+	jdata = (void *)get_section_loc(ehdr, jlshdr);
+	memcpy(data, jdata, size);
+
+	cnt = _w(jlrhdr->sh_size) / w(jlrhdr->sh_entsize);
+
+	rel = (const Elf_Rel *)get_section_loc(ehdr, jlrhdr);
+
+	/* Is this a Rel or a Rela? */
+	type = w(jlrhdr->sh_type);
+
+	symhdr = FUNC(find_shdr)(ehdr, NULL, SHT_SYMTAB);
+
+	for (i = 0; i < cnt; i++) {
+		FUNC(section_update)(ehdr, symhdr, type, rel, data);
+		rel = (void *)rel + w(jlrhdr->sh_entsize);
+	}
+
+	/*
+	 * This is specific to x86. Each __jump_table entry is stored as
+	 * three long words; the first is the location of the jmp
+	 * instruction we must update.
+	 */
+	cnt = size / sizeof(uint_t);
+
+	for (i = 0; i < cnt; i += 3)
+		if (0) make_nop((void *)ehdr, *(uint_t *)(data + i * sizeof(uint_t)));
+
+	free(data);
+}



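For illustration only, the build-time decision that make_nop_x86() makes above can be modeled as a small standalone function. The nop byte values are the ones hard-coded by the patch; this is a sketch, not the kernel code:

```c
/* Standalone sketch of make_nop_x86()'s size selection: look at the
 * first opcode byte of the jump site and pick a same-size nop. */
#include <stddef.h>
#include <string.h>

static const unsigned char nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 }; /* x86-64 5-byte nop */
static const unsigned char nop2[2] = { 0x66, 0x90 };                   /* 2-byte nop */

/* Copies the matching nop into buf and returns its size (2 or 5),
 * or 0 if the byte is not a jump label jmp. */
static size_t nop_for_jump(unsigned char op, unsigned char *buf)
{
	switch (op) {
	case 0xeb:		/* 2-byte short jmp rel8 */
		memcpy(buf, nop2, sizeof(nop2));
		return sizeof(nop2);
	case 0xe9:		/* 5-byte near jmp rel32 */
		memcpy(buf, nop5, sizeof(nop5));
		return sizeof(nop5);
	default:
		return 0;	/* bad jump label section */
	}
}
```

The real tool additionally rewrites the bytes in place in the object file via ulseek()/uwrite().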
^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
  2011-10-07 17:09                               ` [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps Steven Rostedt
@ 2011-10-07 18:52                                 ` Jason Baron
  2011-10-07 19:21                                   ` Steven Rostedt
  2011-10-07 19:33                                   ` Steven Rostedt
  2011-10-07 19:40                                 ` Jeremy Fitzhardinge
  1 sibling, 2 replies; 53+ messages in thread
From: Jason Baron @ 2011-10-07 18:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jeremy Fitzhardinge, Richard Henderson, H. Peter Anvin,
	David S. Miller, David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz

On Fri, Oct 07, 2011 at 01:09:32PM -0400, Steven Rostedt wrote:
> Note, this is just hacked together and needs to be cleaned up. Please do
> not comment on formatting or other sloppiness of this patch. I know it's
> sloppy and I left debug statements in. I want the comments to be on the
> idea of the patch.
> 
> I created a new file called scripts/update_jump_label.[ch] based on some
> of the work of recordmcount.c. This is executed at build time on all
> object files just like recordmcount is. But it does not add any new
> sections, it just modifies the code at build time to convert all jump
> labels into nops.
> 
> The idea is in arch/x86/include/asm/jump_label.h to not place a nop, but
> instead to insert a jmp to the label. Depending on how gcc optimizes the
> code, the jmp will either end up being a 2-byte or a 5-byte jump.
> 
> After an object is compiled, update_jump_label is executed on this file
> and it reads the ELF relocation table to find the jump label locations
> and examines what jump was used. It then converts the jump into either a
> 2 byte or 5 byte nop that is appropriate.
> 
> At boot time, the jump labels no longer need to be converted (although
> we may do so in the future to use better nops depending on the machine
> that is used). When jump labels are enabled, the code is examined to see
> if a two byte or 5 byte version was used, and the appropriate update is
> made.
> 
> I just booted this patch and it worked. I was able to enable and disable
> trace points using jump labels. Benchmarks are welcomed :)
> 
> Comments and thoughts?
> 

Generally, I really like it - I guess b/c I suggested it :) I'll try to
run some workloads on it. A real simple one I used recently was putting
a single jump label in getppid() and then calling it in a loop - I
wonder if the short nop vs. long nop would show up there, as a baseline
test. (fwiw, the jump label vs. no jump label for this test was anywhere
b/w 1-5% improvement.)

Anyways, some comments below.  

> -- Steve
> 
> Sloppy-signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> 
> diff --git a/Makefile b/Makefile
> index 31f967c..8368f42 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -245,7 +245,7 @@ CONFIG_SHELL := $(shell if [ -x "$$BASH" ]; then echo $$BASH; \
>  
>  HOSTCC       = gcc
>  HOSTCXX      = g++
> -HOSTCFLAGS   = -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer
> +HOSTCFLAGS   = -Wall -Wmissing-prototypes -Wstrict-prototypes -g -fomit-frame-pointer
>  HOSTCXXFLAGS = -O2
>  
>  # Decide whether to build built-in, modular, or both.
> @@ -611,6 +611,13 @@ ifdef CONFIG_DYNAMIC_FTRACE
>  endif
>  endif
>  
> +ifdef CONFIG_JUMP_LABEL
> +	ifdef CONFIG_HAVE_BUILD_TIME_JUMP_LABEL
> +		BUILD_UPDATE_JUMP_LABEL := y
> +		export BUILD_UPDATE_JUMP_LABEL
> +	endif
> +endif
> +
>  # We trigger additional mismatches with less inlining
>  ifdef CONFIG_DEBUG_SECTION_MISMATCH
>  KBUILD_CFLAGS += $(call cc-option, -fno-inline-functions-called-once)
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 4b0669c..8fa6934 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -169,6 +169,12 @@ config HAVE_PERF_EVENTS_NMI
>  	  subsystem.  Also has support for calculating CPU cycle events
>  	  to determine how many clock cycles in a given period.
>  
> +config HAVE_BUILD_TIME_JUMP_LABEL
> +       bool
> +       help
> +	If an arch uses scripts/update_jump_label to patch in jump nops
> +	at build time, then it must enable this option.
> +
>  config HAVE_ARCH_JUMP_LABEL
>  	bool
>  
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 6a47bb2..6de726a 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -61,6 +61,7 @@ config X86
>  	select HAVE_ARCH_KMEMCHECK
>  	select HAVE_USER_RETURN_NOTIFIER
>  	select HAVE_ARCH_JUMP_LABEL
> +	select HAVE_BUILD_TIME_JUMP_LABEL
>  	select HAVE_TEXT_POKE_SMP
>  	select HAVE_GENERIC_HARDIRQS
>  	select HAVE_SPARSE_IRQ
> diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
> index a32b18c..872b3e1 100644
> --- a/arch/x86/include/asm/jump_label.h
> +++ b/arch/x86/include/asm/jump_label.h
> @@ -14,7 +14,7 @@
>  static __always_inline bool arch_static_branch(struct jump_label_key *key)
>  {
>  	asm goto("1:"
> -		JUMP_LABEL_INITIAL_NOP
> +		"jmp %l[l_yes]\n"
>  		".pushsection __jump_table,  \"aw\" \n\t"
>  		_ASM_ALIGN "\n\t"
>  		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
> index 3fee346..1f7f88f 100644
> --- a/arch/x86/kernel/jump_label.c
> +++ b/arch/x86/kernel/jump_label.c
> @@ -16,34 +16,75 @@
>  
>  #ifdef HAVE_JUMP_LABEL
>  
> +static unsigned char nop_short[] = { P6_NOP2 };
> +
>  union jump_code_union {
>  	char code[JUMP_LABEL_NOP_SIZE];
>  	struct {
>  		char jump;
>  		int offset;
>  	} __attribute__((packed));
> +	struct {
> +		char jump_short;
> +		char offset_short;
> +	} __attribute__((packed));
>  };
>  
>  void arch_jump_label_transform(struct jump_entry *entry,
>  			       enum jump_label_type type)
>  {
>  	union jump_code_union code;
> +	unsigned char op;
> +	unsigned size;
> +	unsigned char nop;
> +
> +	/* Use probe_kernel_read()? */
> +	op = *(unsigned char *)entry->code;
> +	nop = ideal_nops[NOP_ATOMIC5][0];
>  
>  	if (type == JUMP_LABEL_ENABLE) {
> -		code.jump = 0xe9;
> -		code.offset = entry->target -
> -				(entry->code + JUMP_LABEL_NOP_SIZE);
> -	} else
> -		memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);
> +		if (op == 0xe9 || op == 0xeb)
> +			/* Already enabled. Warn? */
> +			return;
> +

Using the jump_label_inc/dec interface this shouldn't happen, I would
have it be BUG


> +		/* FIXME for all archs */
> +		if (op == nop_short[0]) {
> +			size = 2;
> +			code.jump_short = 0xeb;
> +			code.offset = entry->target -
> +				(entry->code + 2);
> +			/* Check for overflow ? */
> +		} else if (op == nop) {
> +			size = JUMP_LABEL_NOP_SIZE;
> +			code.jump = 0xe9;
> +			code.offset = entry->target - (entry->code + size);
> +		} else
> +			return; /* WARN ? */

same here, at least WARN, more likely BUG()

> +
> +	} else {
> +		if (op == nop_short[0] || nop)
> +			/* Already disabled, warn? */
> +			return;
> +

same here.

> +		if (op == 0xe9) {
> +			size = JUMP_LABEL_NOP_SIZE;
> +			memcpy(&code, ideal_nops[NOP_ATOMIC5], size);
> +		} else if (op == 0xeb) {
> +			size = 2;
> +			memcpy(&code, nop_short, size);
> +		} else
> +			return; /* WARN ? */

same here

> +	}
>  	get_online_cpus();
>  	mutex_lock(&text_mutex);
> -	text_poke_smp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
> +	text_poke_smp((void *)entry->code, &code, size);
>  	mutex_unlock(&text_mutex);
>  	put_online_cpus();
>  }
>  
>  void arch_jump_label_text_poke_early(jump_label_t addr)
>  {
> +	return;
>  	text_poke_early((void *)addr, ideal_nops[NOP_ATOMIC5],
>  			JUMP_LABEL_NOP_SIZE);
>  }

hmmm... we spent a bunch of time selecting the 'ideal' run-time nops; I
wouldn't want to drop that work.
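For reference, a sketch of the only two 5-byte nops the build-time tool hard-codes (bytes taken from update_jump_label.c above); the run-time ideal_nops[] selection referred to here can pick other encodings per CPU family:

```c
/* The build-time tool knows just one 5-byte nop per word size,
 * chosen at build time rather than per-CPU at boot. */
static const unsigned char nop5_x86_64[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
static const unsigned char nop5_x86_32[5] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 };

static const unsigned char *build_time_nop5(int is_64bit)
{
	return is_64bit ? nop5_x86_64 : nop5_x86_32;
}
```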

> diff --git a/scripts/Makefile b/scripts/Makefile
> index df7678f..738b65c 100644
> --- a/scripts/Makefile
> +++ b/scripts/Makefile
> @@ -13,6 +13,7 @@ hostprogs-$(CONFIG_LOGO)         += pnmtologo
>  hostprogs-$(CONFIG_VT)           += conmakehash
>  hostprogs-$(CONFIG_IKCONFIG)     += bin2c
>  hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount
> +hostprogs-$(BUILD_UPDATE_JUMP_LABEL) += update_jump_label
>  
>  always		:= $(hostprogs-y) $(hostprogs-m)
>  
> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> index a0fd502..bc0d89b 100644
> --- a/scripts/Makefile.build
> +++ b/scripts/Makefile.build
> @@ -258,6 +258,15 @@ cmd_modversions =								\
>  	fi;
>  endif
>  
> +ifdef BUILD_UPDATE_JUMP_LABEL
> +update_jump_label_source := $(srctree)/scripts/update_jump_label.c \
> +			$(srctree)/scripts/update_jump_label.h
> +cmd_update_jump_label =						\
> +	if [ $(@) != "scripts/mod/empty.o" ]; then		\
> +		$(objtree)/scripts/update_jump_label "$(@)";	\
> +	fi;
> +endif
> +
>  ifdef CONFIG_FTRACE_MCOUNT_RECORD
>  ifdef BUILD_C_RECORDMCOUNT
>  ifeq ("$(origin RECORDMCOUNT_WARN)", "command line")
> @@ -294,6 +303,7 @@ define rule_cc_o_c
>  	$(cmd_modversions)						  \
>  	$(call echo-cmd,record_mcount)					  \
>  	$(cmd_record_mcount)						  \
> +	$(cmd_update_jump_label)					  \
>  	scripts/basic/fixdep $(depfile) $@ '$(call make-cmd,cc_o_c)' >    \
>  	                                              $(dot-target).tmp;  \
>  	rm -f $(depfile);						  \
> @@ -301,13 +311,14 @@ define rule_cc_o_c
>  endef
>  
>  # Built-in and composite module parts
> -$(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
> +$(obj)/%.o: $(src)/%.c $(recordmcount_source) $(update_jump_label_source) FORCE
>  	$(call cmd,force_checksrc)
>  	$(call if_changed_rule,cc_o_c)
>  
>  # Single-part modules are special since we need to mark them in $(MODVERDIR)
>  
> -$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
> +$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) \
> +		  $(update_jump_label_source) FORCE
>  	$(call cmd,force_checksrc)
>  	$(call if_changed_rule,cc_o_c)
>  	@{ echo $(@:.o=.ko); echo $@; } > $(MODVERDIR)/$(@F:.o=.mod)
> diff --git a/scripts/update_jump_label.c b/scripts/update_jump_label.c
> new file mode 100644
> index 0000000..86e17bc
> --- /dev/null
> +++ b/scripts/update_jump_label.c
> @@ -0,0 +1,349 @@
> +/*
> + * update_jump_label.c: replace jmps with nops at compile time.
> + * Copyright 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc.
> + *  Parsing of the elf file was influenced by recordmcount.c
> + *  originally written by and copyright to John F. Reiser <jreiser@BitWagon.com>.
> + */
> +
> +/*
> + * Note, this code is originally designed for x86, but may be used by
> + * other archs to do the nop updates at compile time instead of at boot time.
> + * X86 uses this as an optimization, as jmps can be either 2 bytes or 5 bytes.
> + * Inserting a 2-byte nop where possible helps with both CPU performance and
> + * icache strain.
> + */
> +#include <sys/types.h>
> +#include <sys/mman.h>
> +#include <sys/stat.h>
> +#include <getopt.h>
> +#include <elf.h>
> +#include <fcntl.h>
> +#include <setjmp.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <stdarg.h>
> +#include <string.h>
> +#include <unistd.h>
> +
> +static int fd_map;	/* File descriptor for file being modified. */
> +static struct stat sb;	/* Remember .st_size, etc. */
> +static int mmap_failed; /* Boolean flag. */
> +
> +static void die(const char *err, const char *fmt, ...)
> +{
> +	va_list ap;
> +
> +	if (err)
> +		perror(err);
> +
> +	if (fmt) {
> +		va_start(ap, fmt);
> +		fprintf(stderr, "Fatal error:  ");
> +		vfprintf(stderr, fmt, ap);
> +		fprintf(stderr, "\n");
> +		va_end(ap);
> +	}
> +
> +	exit(1);
> +}
> +
> +static void usage(char **argv)
> +{
> +	char *arg = argv[0];
> +	char *p = arg+strlen(arg);
> +
> +	while (p >= arg && *p != '/')
> +		p--;
> +	p++;
> +
> +	printf("usage: %s file\n"
> +	       "\n", p);
> +	exit(-1);
> +}
> +
> +/* w8rev, w8nat, ...: Handle endianness. */
> +
> +static uint64_t w8rev(uint64_t const x)
> +{
> +	return   ((0xff & (x >> (0 * 8))) << (7 * 8))
> +	       | ((0xff & (x >> (1 * 8))) << (6 * 8))
> +	       | ((0xff & (x >> (2 * 8))) << (5 * 8))
> +	       | ((0xff & (x >> (3 * 8))) << (4 * 8))
> +	       | ((0xff & (x >> (4 * 8))) << (3 * 8))
> +	       | ((0xff & (x >> (5 * 8))) << (2 * 8))
> +	       | ((0xff & (x >> (6 * 8))) << (1 * 8))
> +	       | ((0xff & (x >> (7 * 8))) << (0 * 8));
> +}
> +
> +static uint32_t w4rev(uint32_t const x)
> +{
> +	return   ((0xff & (x >> (0 * 8))) << (3 * 8))
> +	       | ((0xff & (x >> (1 * 8))) << (2 * 8))
> +	       | ((0xff & (x >> (2 * 8))) << (1 * 8))
> +	       | ((0xff & (x >> (3 * 8))) << (0 * 8));
> +}
> +
> +static uint32_t w2rev(uint16_t const x)
> +{
> +	return   ((0xff & (x >> (0 * 8))) << (1 * 8))
> +	       | ((0xff & (x >> (1 * 8))) << (0 * 8));
> +}
> +
> +static uint64_t w8nat(uint64_t const x)
> +{
> +	return x;
> +}
> +
> +static uint32_t w4nat(uint32_t const x)
> +{
> +	return x;
> +}
> +
> +static uint32_t w2nat(uint16_t const x)
> +{
> +	return x;
> +}
> +
> +static uint64_t (*w8)(uint64_t);
> +static uint32_t (*w)(uint32_t);
> +static uint32_t (*w2)(uint16_t);
> +
> +/* ulseek, uread, ...:  Check return value for errors. */
> +
> +static off_t
> +ulseek(int const fd, off_t const offset, int const whence)
> +{
> +	off_t const w = lseek(fd, offset, whence);
> +	if (w == (off_t)-1)
> +		die("lseek", NULL);
> +
> +	return w;
> +}
> +
> +static size_t
> +uread(int const fd, void *const buf, size_t const count)
> +{
> +	size_t const n = read(fd, buf, count);
> +	if (n != count)
> +		die("read", NULL);
> +
> +	return n;
> +}
> +
> +static size_t
> +uwrite(int const fd, void const *const buf, size_t const count)
> +{
> +	size_t const n = write(fd, buf, count);
> +	if (n != count)
> +		die("write", NULL);
> +
> +	return n;
> +}
> +
> +static void *
> +umalloc(size_t size)
> +{
> +	void *const addr = malloc(size);
> +	if (addr == 0)
> +		die("malloc", "malloc failed: %zu bytes\n", size);
> +
> +	return addr;
> +}
> +
> +/*
> + * Get the whole file as a programming convenience in order to avoid
> + * malloc+lseek+read+free of many pieces.  If successful, then mmap
> + * avoids copying unused pieces; else just read the whole file.
> + * Open for both read and write; new info will be appended to the file.
> + * Use MAP_PRIVATE so that a few changes to the in-memory ElfXX_Ehdr
> + * do not propagate to the file until an explicit overwrite at the last.
> + * This preserves most aspects of consistency (all except .st_size)
> + * for simultaneous readers of the file while we are appending to it.
> + * However, multiple writers still are bad.  We choose not to use
> + * locking because it is expensive and the use case of kernel build
> + * makes multiple writers unlikely.
> + */
> +static void *mmap_file(char const *fname)
> +{
> +	void *addr;
> +
> +	fd_map = open(fname, O_RDWR);
> +	if (fd_map < 0 || fstat(fd_map, &sb) < 0)
> +		die(fname, "failed to open file");
> +
> +	if (!S_ISREG(sb.st_mode))
> +		die(NULL, "not a regular file: %s\n", fname);
> +
> +	addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE,
> +		    fd_map, 0);
> +
> +	mmap_failed = 0;
> +	if (addr == MAP_FAILED) {
> +		mmap_failed = 1;
> +		addr = umalloc(sb.st_size);
> +		uread(fd_map, addr, sb.st_size);
> +	}
> +	return addr;
> +}
> +
> +static void munmap_file(void *addr)
> +{
> +	if (!mmap_failed)
> +		munmap(addr, sb.st_size);
> +	else
> +		free(addr);
> +	close(fd_map);
> +}
> +
> +static unsigned char ideal_nop5_x86_64[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
> +static unsigned char ideal_nop5_x86_32[5] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 };
> +static unsigned char ideal_nop2_x86[2] = { 0x66, 0x90 };
> +static unsigned char *ideal_nop;
> +
> +static int (*make_nop)(void *map, size_t const offset);
> +
> +static int make_nop_x86(void *map, size_t const offset)
> +{
> +	unsigned char *op;
> +	unsigned char *nop;
> +	int size;
> +
> +	/* Determine which type of jmp this is: 2 byte or 5 byte. */
> +	op = map + offset;
> +	switch (*op) {
> +	case 0xeb: /* 2 byte */
> +		size = 2;
> +		nop = ideal_nop2_x86;
> +		break;
> +	case 0xe9: /* 5 byte */
> +		size = 5;
> +		nop = ideal_nop;
> +		break;
> +	default:
> +		die(NULL, "Bad jump label section\n");
> +	}
> +
> +	/* convert to nop */
> +	ulseek(fd_map, offset, SEEK_SET);
> +	uwrite(fd_map, nop, size);
> +	return 0;
> +}
> +
> +/* 32 bit and 64 bit are very similar */
> +#include "update_jump_label.h"
> +#define UPDATE_JUMP_LABEL_64
> +#include "update_jump_label.h"
> +
> +static int do_file(const char *fname)
> +{
> +	Elf32_Ehdr *const ehdr = mmap_file(fname);
> +	unsigned int reltype = 0;
> +
> +	w = w4nat;
> +	w2 = w2nat;
> +	w8 = w8nat;
> +	switch (ehdr->e_ident[EI_DATA]) {
> +		static unsigned int const endian = 1;
> +	default:
> +		die(NULL, "unrecognized ELF data encoding %d: %s\n",
> +			ehdr->e_ident[EI_DATA], fname);
> +		break;
> +	case ELFDATA2LSB:
> +		if (*(unsigned char const *)&endian != 1) {
> +			/* main() is big endian, file.o is little endian. */
> +			w = w4rev;
> +			w2 = w2rev;
> +			w8 = w8rev;
> +		}
> +		break;
> +	case ELFDATA2MSB:
> +		if (*(unsigned char const *)&endian != 0) {
> +			/* main() is little endian, file.o is big endian. */
> +			w = w4rev;
> +			w2 = w2rev;
> +			w8 = w8rev;
> +		}
> +		break;
> +	}  /* end switch */
> +
> +	if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0 ||
> +	    w2(ehdr->e_type) != ET_REL ||
> +	    ehdr->e_ident[EI_VERSION] != EV_CURRENT)
> +		die(NULL, "unrecognized ET_REL file %s\n", fname);
> +
> +	switch (w2(ehdr->e_machine)) {
> +	default:
> +		die(NULL, "unrecognized e_machine %d %s\n",
> +		    w2(ehdr->e_machine), fname);
> +		break;
> +	case EM_386:
> +		reltype = R_386_32;
> +		make_nop = make_nop_x86;
> +		ideal_nop = ideal_nop5_x86_32;
> +		break;
> +	case EM_ARM:	 reltype = R_ARM_ABS32;
> +			 break;
> +	case EM_IA_64:	 reltype = R_IA64_IMM64; break;
> +	case EM_MIPS:	 /* reltype: e_class    */ break;
> +	case EM_PPC:	 reltype = R_PPC_ADDR32;   break;
> +	case EM_PPC64:	 reltype = R_PPC64_ADDR64; break;
> +	case EM_S390:    /* reltype: e_class    */ break;
> +	case EM_SH:	 reltype = R_SH_DIR32;                 break;
> +	case EM_SPARCV9: reltype = R_SPARC_64;     break;
> +	case EM_X86_64:
> +		make_nop = make_nop_x86;
> +		ideal_nop = ideal_nop5_x86_64;
> +		reltype = R_X86_64_64;
> +		break;
> +	}  /* end switch */
> +
> +	switch (ehdr->e_ident[EI_CLASS]) {
> +	default:
> +		die(NULL, "unrecognized ELF class %d %s\n",
> +		    ehdr->e_ident[EI_CLASS], fname);
> +		break;
> +	case ELFCLASS32:
> +		if (w2(ehdr->e_ehsize) != sizeof(Elf32_Ehdr)
> +		||  w2(ehdr->e_shentsize) != sizeof(Elf32_Shdr))
> +			die(NULL, "unrecognized ET_REL file: %s\n", fname);
> +
> +		if (w2(ehdr->e_machine) == EM_S390) {
> +			reltype = R_390_32;
> +		}
> +		if (w2(ehdr->e_machine) == EM_MIPS) {
> +			reltype = R_MIPS_32;
> +		}
> +		do_func32(ehdr, fname, reltype);
> +		break;
> +	case ELFCLASS64: {
> +		Elf64_Ehdr *const ghdr = (Elf64_Ehdr *)ehdr;
> +		if (w2(ghdr->e_ehsize) != sizeof(Elf64_Ehdr)
> +		||  w2(ghdr->e_shentsize) != sizeof(Elf64_Shdr))
> +			die(NULL, "unrecognized ET_REL file: %s\n", fname);
> +
> +		if (w2(ghdr->e_machine) == EM_S390)
> +			reltype = R_390_64;
> +
> +#if 0
> +		if (w2(ghdr->e_machine) == EM_MIPS) {
> +			reltype = R_MIPS_64;
> +			Elf64_r_sym = MIPS64_r_sym;
> +		}
> +#endif
> +		do_func64(ghdr, fname, reltype);
> +		break;
> +	}
> +	}  /* end switch */
> +
> +	munmap_file(ehdr);
> +	return 0;
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	if (argc != 2)
> +		usage(argv);
> +	
> +	return do_file(argv[1]);
> +}
> +
> diff --git a/scripts/update_jump_label.h b/scripts/update_jump_label.h
> new file mode 100644
> index 0000000..6ff9846
> --- /dev/null
> +++ b/scripts/update_jump_label.h
> @@ -0,0 +1,322 @@
> +/*
> + * update_jump_label.h (derived from recordmcount.h)
> + *
> + * This code was taken out of recordmcount.c written by
> + * Copyright 2009 John F. Reiser <jreiser@BitWagon.com>.  All rights reserved.
> + *
> + * The original code had the same algorithms for both 32bit
> + * and 64bit ELF files, but the code was duplicated to support
> + * the difference in structures that were used. This
> + * file creates a macro of everything that is different between
> + * the 64 and 32 bit code, such that by including this header
> + * twice we can create both sets of functions: once with
> + * UPDATE_JUMP_LABEL_64 undefined, and again with it defined.
> + *
> + * This conversion to macros was done by:
> + * Copyright 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc.
> + *
> + * Licensed under the GNU General Public License, version 2 (GPLv2).
> + */
> +
> +#undef EBITS
> +#undef _w
> +#undef _align
> +#undef _size
> +
> +#ifdef UPDATE_JUMP_LABEL_64
> +# define EBITS			64
> +# define _w			w8
> +# define _align			7u
> +# define _size			8
> +#else
> +# define EBITS			32
> +# define _w			w
> +# define _align			3u
> +# define _size			4
> +#endif
> +
> +#define _FBITS(x, e)	x##e
> +#define FBITS(x, e)	_FBITS(x,e)
> +#define FUNC(x)		FBITS(x,EBITS)
> +
> +#undef Elf_Addr
> +#undef Elf_Ehdr
> +#undef Elf_Shdr
> +#undef Elf_Rel
> +#undef Elf_Rela
> +#undef Elf_Sym
> +#undef ELF_R_SYM
> +#undef ELF_R_TYPE
> +
> +#define __ATTACH(x,y,z)	x##y##z
> +#define ATTACH(x,y,z)	__ATTACH(x,y,z)
> +
> +#define Elf_Addr	ATTACH(Elf,EBITS,_Addr)
> +#define Elf_Ehdr	ATTACH(Elf,EBITS,_Ehdr)
> +#define Elf_Shdr	ATTACH(Elf,EBITS,_Shdr)
> +#define Elf_Rel		ATTACH(Elf,EBITS,_Rel)
> +#define Elf_Rela	ATTACH(Elf,EBITS,_Rela)
> +#define Elf_Sym		ATTACH(Elf,EBITS,_Sym)
> +#define uint_t		ATTACH(uint,EBITS,_t)
> +#define ELF_R_SYM	ATTACH(ELF,EBITS,_R_SYM)
> +#define ELF_R_TYPE	ATTACH(ELF,EBITS,_R_TYPE)
> +
> +#undef get_shdr
> +#define get_shdr(ehdr) ((Elf_Shdr *)(_w((ehdr)->e_shoff) + (void *)(ehdr)))
> +
> +#undef get_section_loc
> +#define get_section_loc(ehdr, shdr) (_w((shdr)->sh_offset) + (void *)(ehdr))
> +
> +/* Functions and pointers that do_file() may override for specific e_machine. */
> +
> +#if 0
> +static uint_t FUNC(fn_ELF_R_SYM)(Elf_Rel const *rp)
> +{
> +	return ELF_R_SYM(_w(rp->r_info));
> +}
> +static uint_t (*FUNC(Elf_r_sym))(Elf_Rel const *rp) = FUNC(fn_ELF_R_SYM);
> +#endif
> +
> +static void FUNC(get_sym_str_and_relp)(Elf_Shdr const *const relhdr,
> +				 Elf_Ehdr const *const ehdr,
> +				 Elf_Sym const **sym0,
> +				 char const **str0,
> +				 Elf_Rel const **relp)
> +{
> +	Elf_Shdr *const shdr0 = get_shdr(ehdr);
> +	unsigned const symsec_sh_link = w(relhdr->sh_link);
> +	Elf_Shdr const *const symsec = &shdr0[symsec_sh_link];
> +	Elf_Shdr const *const strsec = &shdr0[w(symsec->sh_link)];
> +	Elf_Rel const *const rel0 =
> +		(Elf_Rel const *)get_section_loc(ehdr, relhdr);
> +
> +	*sym0 = (Elf_Sym const *)get_section_loc(ehdr, symsec);
> +
> +	*str0 = (char const *)get_section_loc(ehdr, strsec);
> +
> +	*relp = rel0;
> +}
> +
> +/*
> + * Read the relocation table for a section and report each jump label
> + * entry.  The jmp sites found via these relocations are the ones that
> + * will be converted into nops.
> + */
> +static void FUNC(nop_jump_label)(Elf_Shdr const *const relhdr,
> +		       Elf_Ehdr const *const ehdr,
> +		       const char *const txtname)
> +{
> +	Elf_Shdr *const shdr0 = get_shdr(ehdr);
> +	Elf_Sym const *sym0;
> +	char const *str0;
> +	Elf_Rel const *relp;
> +	Elf_Rela const *relap;
> +	Elf_Shdr const *const shdr = &shdr0[w(relhdr->sh_info)];
> +	unsigned rel_entsize = w(relhdr->sh_entsize);
> +	unsigned const nrel = _w(relhdr->sh_size) / rel_entsize;
> +	int t;
> +
> +	FUNC(get_sym_str_and_relp)(relhdr, ehdr, &sym0, &str0, &relp);
> +
> +	for (t = nrel; t > 0; t -= 3) {
> +		int ret = -1;
> +
> +		relap = (Elf_Rela const *)relp;
> +		printf("rel offset=%lx info=%lx sym=%lx type=%lx addend=%lx\n",
> +		       (long)relap->r_offset, (long)relap->r_info,
> +		       (long)ELF_R_SYM(relap->r_info),
> +		       (long)ELF_R_TYPE(relap->r_info),
> +		       (long)relap->r_addend);
> +
> +		if (0 && make_nop)
> +			ret = make_nop((void *)ehdr, shdr->sh_offset + relp->r_offset);
> +
> +		/* jump label relocation entries come in groups of three */
> +		relp = (Elf_Rel const *)(rel_entsize * 3 + (void *)relp);
> +	}
> +}
> +
> +/* Evade ISO C restriction: no declaration after statement in __has_rel_jump_table. */
> +static char const *
> +FUNC(__has_rel_jump_table)(Elf_Shdr const *const relhdr,  /* is SHT_REL or SHT_RELA */
> +		 Elf_Shdr const *const shdr0,
> +		 char const *const shstrtab,
> +		 char const *const fname)
> +{
> +	/* .sh_info depends on .sh_type == SHT_REL[,A] */
> +	Elf_Shdr const *const txthdr = &shdr0[w(relhdr->sh_info)];
> +	char const *const txtname = &shstrtab[w(txthdr->sh_name)];
> +
> +	if (strcmp("__jump_table", txtname) == 0) {
> +		fprintf(stderr, "warning: __jump_table already exists: %s\n",
> +			fname);
> +//		succeed_file();
> +	}
> +	if (w(txthdr->sh_type) != SHT_PROGBITS ||
> +	    !(w(txthdr->sh_flags) & SHF_EXECINSTR))
> +		return NULL;
> +	return txtname;
> +}
> +
> +static char const *FUNC(has_rel_jump_table)(Elf_Shdr const *const relhdr,
> +				      Elf_Shdr const *const shdr0,
> +				      char const *const shstrtab,
> +				      char const *const fname)
> +{
> +	if (w(relhdr->sh_type) != SHT_REL && w(relhdr->sh_type) != SHT_RELA)
> +		return NULL;
> +	return FUNC(__has_rel_jump_table)(relhdr, shdr0, shstrtab, fname);
> +}
> +
> +/* Find relocation section hdr for a given section */
> +static const Elf_Shdr *
> +FUNC(find_relhdr)(const Elf_Ehdr *ehdr, const Elf_Shdr *shdr)
> +{
> +	const Elf_Shdr *shdr0 = get_shdr(ehdr);
> +	int nhdr = w2(ehdr->e_shnum);
> +	const Elf_Shdr *hdr;
> +	int i;
> +
> +	for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) {
> +		if (w(hdr->sh_type) != SHT_REL &&
> +		    w(hdr->sh_type) != SHT_RELA)
> +			continue;
> +
> +		/*
> +		 * The relocation section's info field holds
> +		 * the section index that it represents.
> +		 */
> +		if (shdr == &shdr0[w(hdr->sh_info)])
> +			return hdr;
> +	}
> +	return NULL;
> +}
> +
> +/* Find a section header based on name and type */
> +static const Elf_Shdr *
> +FUNC(find_shdr)(const Elf_Ehdr *ehdr, const char *name, uint_t type)
> +{
> +	const Elf_Shdr *shdr0 = get_shdr(ehdr);
> +	const Elf_Shdr *shstr = &shdr0[w2(ehdr->e_shstrndx)];
> +	const char *shstrtab = (char *)get_section_loc(ehdr, shstr);
> +	int nhdr = w2(ehdr->e_shnum);
> +	const Elf_Shdr *hdr;
> +	const char *hdrname;
> +	int i;
> +
> +	for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) {
> +		if (w(hdr->sh_type) != type)
> +			continue;
> +
> +		/* If we are just looking for a section by type (i.e. SHT_SYMTAB) */
> +		if (!name)
> +			return hdr;
> +
> +		hdrname = &shstrtab[w(hdr->sh_name)];
> +		if (strcmp(hdrname, name) == 0)
> +			return hdr;
> +	}
> +	return NULL;
> +}
> +
> +static void
> +FUNC(section_update)(const Elf_Ehdr *ehdr, const Elf_Shdr *symhdr,
> +		     unsigned shtype, const Elf_Rel *rel, void *data)
> +{
> +	const Elf_Shdr *shdr0 = get_shdr(ehdr);
> +	const Elf_Shdr *targethdr;
> +	const Elf_Rela *rela;
> +	const Elf_Sym *syment;
> +	uint_t offset = _w(rel->r_offset);
> +	uint_t info = _w(rel->r_info);
> +	uint_t sym = ELF_R_SYM(info);
> +	uint_t type = ELF_R_TYPE(info);
> +	uint_t addend;
> +	uint_t targetloc;
> +
> +	if (shtype == SHT_RELA) {
> +		rela = (const Elf_Rela *)rel;
> +		addend = _w(rela->r_addend);
> +	} else
> +		addend = _w(*(unsigned short *)(data + offset));
> +
> +	syment = (const Elf_Sym *)get_section_loc(ehdr, symhdr);
> +	targethdr = &shdr0[w2(syment[sym].st_shndx)];
> +	targetloc = _w(targethdr->sh_offset);
> +
> +	/* TODO, need a separate function for all archs */
> +	if (type != R_386_32)
> +		die(NULL, "Arch relocation type %d not supported", type);
> +
> +	targetloc += addend;
> +
> +#if 1
> +	printf("offset=%x target=%x shoffset=%x add=%x\n",
> +	       offset, targetloc, _w(targethdr->sh_offset), addend);
> +#endif
> +	*(uint_t *)(data + offset) = targetloc;
> +}
> +
> +/* Overall supervision for an ET_REL file (32-bit or 64-bit, per EBITS). */
> +static void
> +FUNC(do_func)(Elf_Ehdr *ehdr, char const *const fname, unsigned const reltype)
> +{
> +	const Elf_Shdr *jlshdr;
> +	const Elf_Shdr *jlrhdr;
> +	const Elf_Shdr *symhdr;
> +	const Elf_Rel *rel;
> +	unsigned size;
> +	unsigned cnt;
> +	unsigned i;
> +	uint_t type;
> +	void *jdata;
> +	void *data;
> +
> +	jlshdr = FUNC(find_shdr)(ehdr, "__jump_table", SHT_PROGBITS);
> +	if (!jlshdr)
> +		return;
> +
> +	jlrhdr = FUNC(find_relhdr)(ehdr, jlshdr);
> +	if (!jlrhdr)
> +		return;
> +
> +	/*
> +	 * Copy the __jump_table section and apply its relocations to
> +	 * the copy, then use the result to find the offsets into the
> +	 * text that we want to update.  Working on a local copy means
> +	 * we do not depend on the order of the relocation entries.
> +	 */
> +	size = _w(jlshdr->sh_size);
> +	data = umalloc(size);
> +
> +	jdata = (void *)get_section_loc(ehdr, jlshdr);
> +	memcpy(data, jdata, size);
> +
> +	cnt = _w(jlrhdr->sh_size) / w(jlrhdr->sh_entsize);
> +
> +	rel = (const Elf_Rel *)get_section_loc(ehdr, jlrhdr);
> +
> +	/* Is this a Rel or a Rela? */
> +	type = w(jlrhdr->sh_type);
> +
> +	symhdr = FUNC(find_shdr)(ehdr, NULL, SHT_SYMTAB);
> +
> +	for (i = 0; i < cnt; i++) {
> +		FUNC(section_update)(ehdr, symhdr, type, rel, data);
> +		rel = (void *)rel + w(jlrhdr->sh_entsize);
> +	}
> +
> +	/*
> +	 * This is specific to x86. The jump_table is stored in three
> +	 * long words. The first is the location of the jmp target we
> +	 * must update.
> +	 */
> +	cnt = size / sizeof(uint_t);
> +
> +	for (i = 0; i < cnt; i += 3)
> +		if (0)make_nop((void *)ehdr, *(uint_t *)(data + i * sizeof(uint_t)));
> +

hmmmm, isn't this the line that actually writes in the no-ops? why isn't
it disabled?

> +	free(data);
> +}
> 
> 

Thanks again for doing this...I was still understanding recordmcount.c ;)

-Jason

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
  2011-10-07 18:52                                 ` Jason Baron
@ 2011-10-07 19:21                                   ` Steven Rostedt
  2011-10-07 21:48                                     ` H. Peter Anvin
  2011-10-07 19:33                                   ` Steven Rostedt
  1 sibling, 1 reply; 53+ messages in thread
From: Steven Rostedt @ 2011-10-07 19:21 UTC (permalink / raw)
  To: Jason Baron
  Cc: Jeremy Fitzhardinge, Richard Henderson, H. Peter Anvin,
	David S. Miller, David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz

On Fri, 2011-10-07 at 14:52 -0400, Jason Baron wrote:
> On Fri, Oct 07, 2011 at 01:09:32PM -0400, Steven Rostedt wrote:
> > Note, this is just hacked together and needs to be cleaned up. Please do
> > not comment on formatting or other sloppiness of this patch. I know it's
> > sloppy and I left debug statements in. I want the comments to be on the
> > idea of the patch.
> > 
> > I created a new file called scripts/update_jump_label.[ch] based on some
> > of the work of recordmcount.c. This is executed at build time on all
> > object files just like recordmcount is. But it does not add any new
> > sections, it just modifies the code at build time to convert all jump
> > labels into nops.
> > 
> > The idea is in arch/x86/include/asm/jump_label.h to not place a nop, but
> > instead to insert a jmp to the label. Depending on how gcc optimizes the
> > code, the jmp will be either end up being a 2 byte or 5 byte jump.
> > 
> > After an object is compiled, update_jump_label is executed on this file
> > and it reads the ELF relocation table to find the jump label locations
> > and examines what jump was used. It then converts the jump into either a
> > 2 byte or 5 byte nop that is appropriate.
> > 
> > At boot time, the jump labels no longer need to be converted (although
> > we may do so in the future to use better nops depending on the machine
> > that is used). When jump labels are enabled, the code is examined to see
> > if a two byte or 5 byte version was used, and the appropriate update is
> > made.
> > 
> > I just booted this patch and it worked. I was able to enable and disable
> > trace points using jump labels. Benchmarks are welcomed :)
> > 
> > Comments and thoughts?
> > 
> 
> Generally, I really like it, I guess b/c I suggested it :)

Yeah, I saw your suggestion about using a jump for the asm. I didn't read
carefully enough to tell whether you suggested the build-time changes or not. I
thought about recordmcount at that moment, and decided to try it ;)

>  I'll try and
> > run some workloads on it - a real simple one I used recently was putting
> a single jump label in 'getppid()' and then calling it in a loop - I
> wonder if the short nop vs long nop would show up there, as a baseline
> test. (fwiw, the jump label vs. no jump label for this test was anywhere
> b/w 1-5% improvement).
> 
> Anyways, some comments below.  
> 
> > -- Steve
> > 
> > Sloppy-signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > 
> > diff --git a/Makefile b/Makefile
> > index 31f967c..8368f42 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -245,7 +245,7 @@ CONFIG_SHELL := $(shell if [ -x "$$BASH" ]; then echo $$BASH; \
> >  
> >  HOSTCC       = gcc
> >  HOSTCXX      = g++
> > -HOSTCFLAGS   = -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer
> > +HOSTCFLAGS   = -Wall -Wmissing-prototypes -Wstrict-prototypes -g -fomit-frame-pointer
> >  HOSTCXXFLAGS = -O2
> >  
> >  # Decide whether to build built-in, modular, or both.
> > @@ -611,6 +611,13 @@ ifdef CONFIG_DYNAMIC_FTRACE
> >  endif
> >  endif
> >  
> > +ifdef CONFIG_JUMP_LABEL
> > +	ifdef CONFIG_HAVE_BUILD_TIME_JUMP_LABEL
> > +		BUILD_UPDATE_JUMP_LABEL := y
> > +		export BUILD_UPDATE_JUMP_LABEL
> > +	endif
> > +endif
> > +
> >  # We trigger additional mismatches with less inlining
> >  ifdef CONFIG_DEBUG_SECTION_MISMATCH
> >  KBUILD_CFLAGS += $(call cc-option, -fno-inline-functions-called-once)
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 4b0669c..8fa6934 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -169,6 +169,12 @@ config HAVE_PERF_EVENTS_NMI
> >  	  subsystem.  Also has support for calculating CPU cycle events
> >  	  to determine how many clock cycles in a given period.
> >  
> > +config HAVE_BUILD_TIME_JUMP_LABEL
> > +       bool
> > +       help
> > +	If an arch uses scripts/update_jump_label to patch in jump nops
> > +	at build time, then it must enable this option.
> > +
> >  config HAVE_ARCH_JUMP_LABEL
> >  	bool
> >  
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 6a47bb2..6de726a 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -61,6 +61,7 @@ config X86
> >  	select HAVE_ARCH_KMEMCHECK
> >  	select HAVE_USER_RETURN_NOTIFIER
> >  	select HAVE_ARCH_JUMP_LABEL
> > +	select HAVE_BUILD_TIME_JUMP_LABEL
> >  	select HAVE_TEXT_POKE_SMP
> >  	select HAVE_GENERIC_HARDIRQS
> >  	select HAVE_SPARSE_IRQ
> > diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
> > index a32b18c..872b3e1 100644
> > --- a/arch/x86/include/asm/jump_label.h
> > +++ b/arch/x86/include/asm/jump_label.h
> > @@ -14,7 +14,7 @@
> >  static __always_inline bool arch_static_branch(struct jump_label_key *key)
> >  {
> >  	asm goto("1:"
> > -		JUMP_LABEL_INITIAL_NOP
> > +		"jmp %l[l_yes]\n"
> >  		".pushsection __jump_table,  \"aw\" \n\t"
> >  		_ASM_ALIGN "\n\t"
> >  		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
> > diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
> > index 3fee346..1f7f88f 100644
> > --- a/arch/x86/kernel/jump_label.c
> > +++ b/arch/x86/kernel/jump_label.c
> > @@ -16,34 +16,75 @@
> >  
> >  #ifdef HAVE_JUMP_LABEL
> >  
> > +static unsigned char nop_short[] = { P6_NOP2 };
> > +
> >  union jump_code_union {
> >  	char code[JUMP_LABEL_NOP_SIZE];
> >  	struct {
> >  		char jump;
> >  		int offset;
> >  	} __attribute__((packed));
> > +	struct {
> > +		char jump_short;
> > +		char offset_short;
> > +	} __attribute__((packed));
> >  };
> >  
> >  void arch_jump_label_transform(struct jump_entry *entry,
> >  			       enum jump_label_type type)
> >  {
> >  	union jump_code_union code;
> > +	unsigned char op;
> > +	unsigned size;
> > +	unsigned char nop;
> > +
> > +	/* Use probe_kernel_read()? */
> > +	op = *(unsigned char *)entry->code;
> > +	nop = ideal_nops[NOP_ATOMIC5][0];
> >  
> >  	if (type == JUMP_LABEL_ENABLE) {
> > -		code.jump = 0xe9;
> > -		code.offset = entry->target -
> > -				(entry->code + JUMP_LABEL_NOP_SIZE);
> > -	} else
> > -		memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);
> > +		if (op == 0xe9 || op == 0xeb)
> > +			/* Already enabled. Warn? */
> > +			return;
> > +
> 
> Using the jump_label_inc/dec interface this shouldn't happen; I would
> have it be a BUG()
> 
> 
> > +		/* FIXME for all archs */
> > +		if (op == nop_short[0]) {
> > +			size = 2;
> > +			code.jump_short = 0xeb;
> > +			code.offset = entry->target -
> > +				(entry->code + 2);
> > +			/* Check for overflow ? */
> > +		} else if (op == nop) {
> > +			size = JUMP_LABEL_NOP_SIZE;
> > +			code.jump = 0xe9;
> > +			code.offset = entry->target - (entry->code + size);
> > +		} else
> > +			return; /* WARN ? */
> 
> same here, at least WARN, more likely BUG()

I just don't like using BUG(). BUG() means that if we continue we will
corrupt the filesystem or make you go blind. WARN and returning here
should not cause any harm, and will even let those with X terminals see
the oops in /var/log/messages.

> 
> > +
> > +	} else {
> > +		if (op == nop_short[0] || op == nop)
> > +			/* Already disabled, warn? */
> > +			return;
> > +
> 
> same here.
> 
> > +		if (op == 0xe9) {
> > +			size = JUMP_LABEL_NOP_SIZE;
> > +			memcpy(&code, ideal_nops[NOP_ATOMIC5], size);
> > +		} else if (op == 0xeb) {
> > +			size = 2;
> > +			memcpy(&code, nop_short, size);
> > +		} else
> > +			return; /* WARN ? */
> 
> same here

WARN is better.

> 
> > +	}
> >  	get_online_cpus();
> >  	mutex_lock(&text_mutex);
> > -	text_poke_smp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
> > +	text_poke_smp((void *)entry->code, &code, size);
> >  	mutex_unlock(&text_mutex);
> >  	put_online_cpus();
> >  }
> >  
> >  void arch_jump_label_text_poke_early(jump_label_t addr)
> >  {
> > +	return;
> >  	text_poke_early((void *)addr, ideal_nops[NOP_ATOMIC5],
> >  			JUMP_LABEL_NOP_SIZE);
> >  }
> 
> hmmm...we spent a bunch of time selecting the 'ideal' run-time noops; I
> wouldn't want to drop that work.

Read my change log again :)


> 
> > diff --git a/scripts/Makefile b/scripts/Makefile
> > index df7678f..738b65c 100644
> > --- a/scripts/Makefile
> > +++ b/scripts/Makefile
> > @@ -13,6 +13,7 @@ hostprogs-$(CONFIG_LOGO)         += pnmtologo
> >  hostprogs-$(CONFIG_VT)           += conmakehash
> >  hostprogs-$(CONFIG_IKCONFIG)     += bin2c
> >  hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount
> > +hostprogs-$(BUILD_UPDATE_JUMP_LABEL) += update_jump_label
> >  
> >  always		:= $(hostprogs-y) $(hostprogs-m)
> >  
> > diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> > index a0fd502..bc0d89b 100644
> > --- a/scripts/Makefile.build
> > +++ b/scripts/Makefile.build
> > @@ -258,6 +258,15 @@ cmd_modversions =								\
> >  	fi;
> >  endif
> >  
> > +ifdef BUILD_UPDATE_JUMP_LABEL
> > +update_jump_label_source := $(srctree)/scripts/update_jump_label.c \
> > +			$(srctree)/scripts/update_jump_label.h
> > +cmd_update_jump_label =						\
> > +	if [ $(@) != "scripts/mod/empty.o" ]; then		\
> > +		$(objtree)/scripts/update_jump_label "$(@)";	\
> > +	fi;
> > +endif
> > +
> >  ifdef CONFIG_FTRACE_MCOUNT_RECORD
> >  ifdef BUILD_C_RECORDMCOUNT
> >  ifeq ("$(origin RECORDMCOUNT_WARN)", "command line")
> > @@ -294,6 +303,7 @@ define rule_cc_o_c
> >  	$(cmd_modversions)						  \
> >  	$(call echo-cmd,record_mcount)					  \
> >  	$(cmd_record_mcount)						  \
> > +	$(cmd_update_jump_label)					  \
> >  	scripts/basic/fixdep $(depfile) $@ '$(call make-cmd,cc_o_c)' >    \
> >  	                                              $(dot-target).tmp;  \
> >  	rm -f $(depfile);						  \
> > @@ -301,13 +311,14 @@ define rule_cc_o_c
> >  endef
> >  
> >  # Built-in and composite module parts
> > -$(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
> > +$(obj)/%.o: $(src)/%.c $(recordmcount_source) $(update_jump_label_source) FORCE
> >  	$(call cmd,force_checksrc)
> >  	$(call if_changed_rule,cc_o_c)
> >  
> >  # Single-part modules are special since we need to mark them in $(MODVERDIR)
> >  
> > -$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
> > +$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) \
> > +		  $(update_jump_label_source) FORCE
> >  	$(call cmd,force_checksrc)
> >  	$(call if_changed_rule,cc_o_c)
> >  	@{ echo $(@:.o=.ko); echo $@; } > $(MODVERDIR)/$(@F:.o=.mod)
> > diff --git a/scripts/update_jump_label.c b/scripts/update_jump_label.c
> > new file mode 100644
> > index 0000000..86e17bc

> > +	/*
> > +	 * This is specific to x86. The jump_table is stored in three
> > +	 * long words. The first is the location of the jmp target we
> > +	 * must update.
> > +	 */
> > +	cnt = size / sizeof(uint_t);
> > +
> > +	for (i = 0; i < cnt; i += 3)
> > +		if (0)make_nop((void *)ehdr, *(uint_t *)(data + i * sizeof(uint_t)));
> > +
> 
> hmmmm, isn't this the line that actually writes in the no-ops? why isn't
> it disabled?


*DOH!*

Oh crap! I added that to dump more debug info. Let me make sure it
actually works ;)


> 
> > +	free(data);
> > +}
> > 
> > 
> 
> Thanks again for doing this...I was still understanding recordmcount.c ;)

And after this, I plan on giving recordmcount.c an overhaul so it is
easier to read. Right now it is written with lots of
micro-optimizations, at the sacrifice of simplicity. Modifying this
code is nasty, and it needs to be readable by mere mortals.

I'll go and test with actual modifications and see if it still works.

-- Steve



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
  2011-10-07 18:52                                 ` Jason Baron
  2011-10-07 19:21                                   ` Steven Rostedt
@ 2011-10-07 19:33                                   ` Steven Rostedt
  1 sibling, 0 replies; 53+ messages in thread
From: Steven Rostedt @ 2011-10-07 19:33 UTC (permalink / raw)
  To: Jason Baron
  Cc: Jeremy Fitzhardinge, Richard Henderson, H. Peter Anvin,
	David S. Miller, David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz

On Fri, 2011-10-07 at 15:21 -0400, Steven Rostedt wrote:

> > > +	/*
> > > +	 * This is specific to x86. The jump_table is stored in three
> > > +	 * long words. The first is the location of the jmp target we
> > > +	 * must update.
> > > +	 */
> > > +	cnt = size / sizeof(uint_t);
> > > +
> > > +	for (i = 0; i < cnt; i += 3)
> > > +		if (0)make_nop((void *)ehdr, *(uint_t *)(data + i * sizeof(uint_t)));

I just compiled and booted the 

-		if (0)make_nop((void *)ehdr, *(uint_t *)(data + i * sizeof(uint_t)));
+		make_nop((void *)ehdr, *(uint_t *)(data + i * sizeof(uint_t)));

version, and it still works.

Phew!

-- Steve




* Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
  2011-10-07 17:09                               ` [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps Steven Rostedt
  2011-10-07 18:52                                 ` Jason Baron
@ 2011-10-07 19:40                                 ` Jeremy Fitzhardinge
  2011-10-07 19:58                                   ` Steven Rostedt
  2011-10-07 20:04                                   ` Peter Zijlstra
  1 sibling, 2 replies; 53+ messages in thread
From: Jeremy Fitzhardinge @ 2011-10-07 19:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Richard Henderson, Jason Baron, H. Peter Anvin, David S. Miller,
	David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz

On 10/07/2011 10:09 AM, Steven Rostedt wrote:
> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
> index 3fee346..1f7f88f 100644
> --- a/arch/x86/kernel/jump_label.c
> +++ b/arch/x86/kernel/jump_label.c
> @@ -16,34 +16,75 @@
>  
>  #ifdef HAVE_JUMP_LABEL
>  
> +static unsigned char nop_short[] = { P6_NOP2 };
> +
>  union jump_code_union {
>  	char code[JUMP_LABEL_NOP_SIZE];
>  	struct {
>  		char jump;
>  		int offset;
>  	} __attribute__((packed));
> +	struct {
> +		char jump_short;
> +		char offset_short;
> +	} __attribute__((packed));
>  };
>  
>  void arch_jump_label_transform(struct jump_entry *entry,
>  			       enum jump_label_type type)
>  {
>  	union jump_code_union code;
> +	unsigned char op;
> +	unsigned size;
> +	unsigned char nop;
> +
> +	/* Use probe_kernel_read()? */
> +	op = *(unsigned char *)entry->code;
> +	nop = ideal_nops[NOP_ATOMIC5][0];
>  
>  	if (type == JUMP_LABEL_ENABLE) {
> -		code.jump = 0xe9;
> -		code.offset = entry->target -
> -				(entry->code + JUMP_LABEL_NOP_SIZE);
> -	} else
> -		memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);
> +		if (op == 0xe9 || op == 0xeb)
> +			/* Already enabled. Warn? */
> +			return;
> +
> +		/* FIXME for all archs */

By "archs", do you mean different x86 variants?

> +		if (op == nop_short[0]) {

My gut feeling is that all this "trying to determine the jump size by
sniffing the instruction" stuff seems pretty fragile.  Couldn't you
store the jump size in the jump_label structure (even as a bit hidden
away somewhere)?

    J

> +			size = 2;
> +			code.jump_short = 0xeb;
> +			code.offset = entry->target -
> +				(entry->code + 2);
> +			/* Check for overflow ? */
> +		} else if (op == nop) {
> +			size = JUMP_LABEL_NOP_SIZE;
> +			code.jump = 0xe9;
> +			code.offset = entry->target - (entry->code + size);
> +		} else
> +			return; /* WARN ? */
> +
> +	} else {
> +		if (op == nop_short[0] || nop)
> +			/* Already disabled, warn? */
> +			return;
> +
> +		if (op == 0xe9) {
> +			size = JUMP_LABEL_NOP_SIZE;
> +			memcpy(&code, ideal_nops[NOP_ATOMIC5], size);
> +		} else if (op == 0xeb) {
> +			size = 2;
> +			memcpy(&code, nop_short, size);
> +		} else
> +			return; /* WARN ? */
> +	}
>  	get_online_cpus();
>  	mutex_lock(&text_mutex);
> -	text_poke_smp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
> +	text_poke_smp((void *)entry->code, &code, size);
>  	mutex_unlock(&text_mutex);
>  	put_online_cpus();
>  }
>  
>  void arch_jump_label_text_poke_early(jump_label_t addr)
>  {
> +	return;
>  	text_poke_early((void *)addr, ideal_nops[NOP_ATOMIC5],
>  			JUMP_LABEL_NOP_SIZE);
>  }
> diff --git a/scripts/Makefile b/scripts/Makefile
> index df7678f..738b65c 100644
> --- a/scripts/Makefile
> +++ b/scripts/Makefile
> @@ -13,6 +13,7 @@ hostprogs-$(CONFIG_LOGO)         += pnmtologo
>  hostprogs-$(CONFIG_VT)           += conmakehash
>  hostprogs-$(CONFIG_IKCONFIG)     += bin2c
>  hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount
> +hostprogs-$(BUILD_UPDATE_JUMP_LABEL) += update_jump_label
>  
>  always		:= $(hostprogs-y) $(hostprogs-m)
>  
> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> index a0fd502..bc0d89b 100644
> --- a/scripts/Makefile.build
> +++ b/scripts/Makefile.build
> @@ -258,6 +258,15 @@ cmd_modversions =								\
>  	fi;
>  endif
>  
> +ifdef BUILD_UPDATE_JUMP_LABEL
> +update_jump_label_source := $(srctree)/scripts/update_jump_label.c \
> +			$(srctree)/scripts/update_jump_label.h
> +cmd_update_jump_label =						\
> +	if [ $(@) != "scripts/mod/empty.o" ]; then		\
> +		$(objtree)/scripts/update_jump_label "$(@)";	\
> +	fi;
> +endif
> +
>  ifdef CONFIG_FTRACE_MCOUNT_RECORD
>  ifdef BUILD_C_RECORDMCOUNT
>  ifeq ("$(origin RECORDMCOUNT_WARN)", "command line")
> @@ -294,6 +303,7 @@ define rule_cc_o_c
>  	$(cmd_modversions)						  \
>  	$(call echo-cmd,record_mcount)					  \
>  	$(cmd_record_mcount)						  \
> +	$(cmd_update_jump_label)					  \
>  	scripts/basic/fixdep $(depfile) $@ '$(call make-cmd,cc_o_c)' >    \
>  	                                              $(dot-target).tmp;  \
>  	rm -f $(depfile);						  \
> @@ -301,13 +311,14 @@ define rule_cc_o_c
>  endef
>  
>  # Built-in and composite module parts
> -$(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
> +$(obj)/%.o: $(src)/%.c $(recordmcount_source) $(update_jump_label_source) FORCE
>  	$(call cmd,force_checksrc)
>  	$(call if_changed_rule,cc_o_c)
>  
>  # Single-part modules are special since we need to mark them in $(MODVERDIR)
>  
> -$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
> +$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) \
> +		  $(update_jump_label_source) FORCE
>  	$(call cmd,force_checksrc)
>  	$(call if_changed_rule,cc_o_c)
>  	@{ echo $(@:.o=.ko); echo $@; } > $(MODVERDIR)/$(@F:.o=.mod)
> diff --git a/scripts/update_jump_label.c b/scripts/update_jump_label.c
> new file mode 100644
> index 0000000..86e17bc
> --- /dev/null
> +++ b/scripts/update_jump_label.c
> @@ -0,0 +1,349 @@
> +/*
> + * update_jump_label.c: replace jmps with nops at compile time.
> + * Copyright 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc.
> + *  Parsing of the elf file was influenced by recordmcount.c
> + *  originally written by and copyright to John F. Reiser <jreiser@BitWagon.com>.
> + */
> +
> +/*
> + * Note, this code is originally designed for x86, but may be used by
> + * other archs to do the nop updates at compile time instead of at boot time.
> + * X86 uses this as an optimization, as jmps can be either 2 bytes or 5 bytes.
> + * Inserting a 2 byte where possible helps with both CPU performance and
> + * icache strain.
> + */
> +#include <sys/types.h>
> +#include <sys/mman.h>
> +#include <sys/stat.h>
> +#include <getopt.h>
> +#include <elf.h>
> +#include <fcntl.h>
> +#include <setjmp.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <stdarg.h>
> +#include <string.h>
> +#include <unistd.h>
> +
> +static int fd_map;	/* File descriptor for file being modified. */
> +static struct stat sb;	/* Remember .st_size, etc. */
> +static int mmap_failed; /* Boolean flag. */
> +
> +static void die(const char *err, const char *fmt, ...)
> +{
> +	va_list ap;
> +
> +	if (err)
> +		perror(err);
> +
> +	if (fmt) {
> +		va_start(ap, fmt);
> +		fprintf(stderr, "Fatal error:  ");
> +		vfprintf(stderr, fmt, ap);
> +		fprintf(stderr, "\n");
> +		va_end(ap);
> +	}
> +
> +	exit(1);
> +}
> +
> +static void usage(char **argv)
> +{
> +	char *arg = argv[0];
> +	char *p = arg+strlen(arg);
> +
> +	while (p >= arg && *p != '/')
> +		p--;
> +	p++;
> +
> +	printf("usage: %s file\n"
> +	       "\n",p);
> +	exit(-1);
> +}
> +
> +/* w8rev, w8nat, ...: Handle endianness. */
> +
> +static uint64_t w8rev(uint64_t const x)
> +{
> +	return   ((0xff & (x >> (0 * 8))) << (7 * 8))
> +	       | ((0xff & (x >> (1 * 8))) << (6 * 8))
> +	       | ((0xff & (x >> (2 * 8))) << (5 * 8))
> +	       | ((0xff & (x >> (3 * 8))) << (4 * 8))
> +	       | ((0xff & (x >> (4 * 8))) << (3 * 8))
> +	       | ((0xff & (x >> (5 * 8))) << (2 * 8))
> +	       | ((0xff & (x >> (6 * 8))) << (1 * 8))
> +	       | ((0xff & (x >> (7 * 8))) << (0 * 8));
> +}
> +
> +static uint32_t w4rev(uint32_t const x)
> +{
> +	return   ((0xff & (x >> (0 * 8))) << (3 * 8))
> +	       | ((0xff & (x >> (1 * 8))) << (2 * 8))
> +	       | ((0xff & (x >> (2 * 8))) << (1 * 8))
> +	       | ((0xff & (x >> (3 * 8))) << (0 * 8));
> +}
> +
> +static uint32_t w2rev(uint16_t const x)
> +{
> +	return   ((0xff & (x >> (0 * 8))) << (1 * 8))
> +	       | ((0xff & (x >> (1 * 8))) << (0 * 8));
> +}
> +
> +static uint64_t w8nat(uint64_t const x)
> +{
> +	return x;
> +}
> +
> +static uint32_t w4nat(uint32_t const x)
> +{
> +	return x;
> +}
> +
> +static uint32_t w2nat(uint16_t const x)
> +{
> +	return x;
> +}
> +
> +static uint64_t (*w8)(uint64_t);
> +static uint32_t (*w)(uint32_t);
> +static uint32_t (*w2)(uint16_t);
> +
> +/* ulseek, uread, ...:  Check return value for errors. */
> +
> +static off_t
> +ulseek(int const fd, off_t const offset, int const whence)
> +{
> +	off_t const w = lseek(fd, offset, whence);
> +	if (w == (off_t)-1)
> +		die("lseek", NULL);
> +
> +	return w;
> +}
> +
> +static size_t
> +uread(int const fd, void *const buf, size_t const count)
> +{
> +	size_t const n = read(fd, buf, count);
> +	if (n != count)
> +		die("read", NULL);
> +
> +	return n;
> +}
> +
> +static size_t
> +uwrite(int const fd, void const *const buf, size_t const count)
> +{
> +	size_t const n = write(fd, buf, count);
> +	if (n != count)
> +		die("write", NULL);
> +
> +	return n;
> +}
> +
> +static void *
> +umalloc(size_t size)
> +{
> +	void *const addr = malloc(size);
> +	if (addr == 0)
> +		die("malloc", "malloc failed: %zu bytes\n", size);
> +
> +	return addr;
> +}
> +
> +/*
> + * Get the whole file as a programming convenience in order to avoid
> + * malloc+lseek+read+free of many pieces.  If successful, then mmap
> + * avoids copying unused pieces; else just read the whole file.
> + * Open for both read and write; new info will be appended to the file.
> + * Use MAP_PRIVATE so that a few changes to the in-memory ElfXX_Ehdr
> + * do not propagate to the file until an explicit overwrite at the last.
> + * This preserves most aspects of consistency (all except .st_size)
> + * for simultaneous readers of the file while we are appending to it.
> + * However, multiple writers still are bad.  We choose not to use
> + * locking because it is expensive and the use case of kernel build
> + * makes multiple writers unlikely.
> + */
> +static void *mmap_file(char const *fname)
> +{
> +	void *addr;
> +
> +	fd_map = open(fname, O_RDWR);
> +	if (fd_map < 0 || fstat(fd_map, &sb) < 0)
> +		die(fname, "failed to open file");
> +
> +	if (!S_ISREG(sb.st_mode))
> +		die(NULL, "not a regular file: %s\n", fname);
> +
> +	addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE,
> +		    fd_map, 0);
> +
> +	mmap_failed = 0;
> +	if (addr == MAP_FAILED) {
> +		mmap_failed = 1;
> +		addr = umalloc(sb.st_size);
> +		uread(fd_map, addr, sb.st_size);
> +	}
> +	return addr;
> +}
> +
> +static void munmap_file(void *addr)
> +{
> +	if (!mmap_failed)
> +		munmap(addr, sb.st_size);
> +	else
> +		free(addr);
> +	close(fd_map);
> +}
> +
> +static unsigned char ideal_nop5_x86_64[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
> +static unsigned char ideal_nop5_x86_32[5] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 };
> +static unsigned char ideal_nop2_x86[2] = { 0x66, 0x99 };
> +static unsigned char *ideal_nop;
> +
> +static int (*make_nop)(void *map, size_t const offset);
> +
> +static int make_nop_x86(void *map, size_t const offset)
> +{
> +	unsigned char *op;
> +	unsigned char *nop;
> +	int size;
> +
> +	/* Determine which type of jmp this is 2 byte or 5. */
> +	op = map + offset;
> +	switch (*op) {
> +	case 0xeb: /* 2 byte */
> +		size = 2;
> +		nop = ideal_nop2_x86;
> +		break;
> +	case 0xe9: /* 5 byte */
> +		size = 5;
> +		nop = ideal_nop;
> +		break;
> +	default:
> +		die(NULL, "Bad jump label section\n");
> +	}
> +
> +	/* convert to nop */
> +	ulseek(fd_map, offset, SEEK_SET);
> +	uwrite(fd_map, nop, size);
> +	return 0;
> +}
> +
> +/* 32 bit and 64 bit are very similar */
> +#include "update_jump_label.h"
> +#define UPDATE_JUMP_LABEL_64
> +#include "update_jump_label.h"
> +
> +static int do_file(const char *fname)
> +{
> +	Elf32_Ehdr *const ehdr = mmap_file(fname);
> +	unsigned int reltype = 0;
> +
> +	w = w4nat;
> +	w2 = w2nat;
> +	w8 = w8nat;
> +	switch (ehdr->e_ident[EI_DATA]) {
> +		static unsigned int const endian = 1;
> +	default:
> +		die(NULL, "unrecognized ELF data encoding %d: %s\n",
> +			ehdr->e_ident[EI_DATA], fname);
> +		break;
> +	case ELFDATA2LSB:
> +		if (*(unsigned char const *)&endian != 1) {
> +			/* main() is big endian, file.o is little endian. */
> +			w = w4rev;
> +			w2 = w2rev;
> +			w8 = w8rev;
> +		}
> +		break;
> +	case ELFDATA2MSB:
> +		if (*(unsigned char const *)&endian != 0) {
> +			/* main() is little endian, file.o is big endian. */
> +			w = w4rev;
> +			w2 = w2rev;
> +			w8 = w8rev;
> +		}
> +		break;
> +	}  /* end switch */
> +
> +	if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0 ||
> +	    w2(ehdr->e_type) != ET_REL ||
> +	    ehdr->e_ident[EI_VERSION] != EV_CURRENT)
> +		die(NULL, "unrecognized ET_REL file %s\n", fname);
> +
> +	switch (w2(ehdr->e_machine)) {
> +	default:
> +		die(NULL, "unrecognized e_machine %d %s\n",
> +		    w2(ehdr->e_machine), fname);
> +		break;
> +	case EM_386:
> +		reltype = R_386_32;
> +		make_nop = make_nop_x86;
> +		ideal_nop = ideal_nop5_x86_32;
> +		break;
> +	case EM_ARM:	 reltype = R_ARM_ABS32;
> +			 break;
> +	case EM_IA_64:	 reltype = R_IA64_IMM64; break;
> +	case EM_MIPS:	 /* reltype: e_class    */ break;
> +	case EM_PPC:	 reltype = R_PPC_ADDR32;   break;
> +	case EM_PPC64:	 reltype = R_PPC64_ADDR64; break;
> +	case EM_S390:    /* reltype: e_class    */ break;
> +	case EM_SH:	 reltype = R_SH_DIR32;                 break;
> +	case EM_SPARCV9: reltype = R_SPARC_64;     break;
> +	case EM_X86_64:
> +		make_nop = make_nop_x86;
> +		ideal_nop = ideal_nop5_x86_64;
> +		reltype = R_X86_64_64;
> +		break;
> +	}  /* end switch */
> +
> +	switch (ehdr->e_ident[EI_CLASS]) {
> +	default:
> +		die(NULL, "unrecognized ELF class %d %s\n",
> +		    ehdr->e_ident[EI_CLASS], fname);
> +		break;
> +	case ELFCLASS32:
> +		if (w2(ehdr->e_ehsize) != sizeof(Elf32_Ehdr)
> +		||  w2(ehdr->e_shentsize) != sizeof(Elf32_Shdr))
> +			die(NULL, "unrecognized ET_REL file: %s\n", fname);
> +
> +		if (w2(ehdr->e_machine) == EM_S390) {
> +			reltype = R_390_32;
> +		}
> +		if (w2(ehdr->e_machine) == EM_MIPS) {
> +			reltype = R_MIPS_32;
> +		}
> +		do_func32(ehdr, fname, reltype);
> +		break;
> +	case ELFCLASS64: {
> +		Elf64_Ehdr *const ghdr = (Elf64_Ehdr *)ehdr;
> +		if (w2(ghdr->e_ehsize) != sizeof(Elf64_Ehdr)
> +		||  w2(ghdr->e_shentsize) != sizeof(Elf64_Shdr))
> +			die(NULL, "unrecognized ET_REL file: %s\n", fname);
> +
> +		if (w2(ghdr->e_machine) == EM_S390)
> +			reltype = R_390_64;
> +
> +#if 0
> +		if (w2(ghdr->e_machine) == EM_MIPS) {
> +			reltype = R_MIPS_64;
> +			Elf64_r_sym = MIPS64_r_sym;
> +		}
> +#endif
> +		do_func64(ghdr, fname, reltype);
> +		break;
> +	}
> +	}  /* end switch */
> +
> +	munmap_file(ehdr);
> +	return 0;
> +}
> +
> +int main (int argc, char **argv)
> +{
> +	if (argc != 2)
> +		usage(argv);
> +	
> +	return do_file(argv[1]);
> +}
> +
> diff --git a/scripts/update_jump_label.h b/scripts/update_jump_label.h
> new file mode 100644
> index 0000000..6ff9846
> --- /dev/null
> +++ b/scripts/update_jump_label.h
> @@ -0,0 +1,322 @@
> +/*
> + * recordmcount.h
> + *
> + * This code was taken out of recordmcount.c written by
> + * Copyright 2009 John F. Reiser <jreiser@BitWagon.com>.  All rights reserved.
> + *
> + * The original code had the same algorithms for both 32bit
> + * and 64bit ELF files, but the code was duplicated to support
> + * the difference in structures that were used. This
> + * file creates a macro of everything that is different between
> + * the 64 and 32 bit code, such that by including this header
> + * twice we can create both sets of functions by including this
> + * header once with RECORD_MCOUNT_64 undefined, and again with
> + * it defined.
> + *
> + * This conversion to macros was done by:
> + * Copyright 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc.
> + *
> + * Licensed under the GNU General Public License, version 2 (GPLv2).
> + */
> +
> +#undef EBITS
> +#undef _w
> +#undef _align
> +#undef _size
> +
> +#ifdef UPDATE_JUMP_LABEL_64
> +# define EBITS			64
> +# define _w			w8
> +# define _align			7u
> +# define _size			8
> +#else
> +# define EBITS			32
> +# define _w			w
> +# define _align			3u
> +# define _size			4
> +#endif
> +
> +#define _FBITS(x, e)	x##e
> +#define FBITS(x, e)	_FBITS(x,e)
> +#define FUNC(x)		FBITS(x,EBITS)
> +
> +#undef Elf_Addr
> +#undef Elf_Ehdr
> +#undef Elf_Shdr
> +#undef Elf_Rel
> +#undef Elf_Rela
> +#undef Elf_Sym
> +#undef ELF_R_SYM
> +#undef ELF_R_TYPE
> +
> +#define __ATTACH(x,y,z)	x##y##z
> +#define ATTACH(x,y,z)	__ATTACH(x,y,z)
> +
> +#define Elf_Addr	ATTACH(Elf,EBITS,_Addr)
> +#define Elf_Ehdr	ATTACH(Elf,EBITS,_Ehdr)
> +#define Elf_Shdr	ATTACH(Elf,EBITS,_Shdr)
> +#define Elf_Rel		ATTACH(Elf,EBITS,_Rel)
> +#define Elf_Rela	ATTACH(Elf,EBITS,_Rela)
> +#define Elf_Sym		ATTACH(Elf,EBITS,_Sym)
> +#define uint_t		ATTACH(uint,EBITS,_t)
> +#define ELF_R_SYM	ATTACH(ELF,EBITS,_R_SYM)
> +#define ELF_R_TYPE	ATTACH(ELF,EBITS,_R_TYPE)
> +
> +#undef get_shdr
> +#define get_shdr(ehdr) ((Elf_Shdr *)(_w((ehdr)->e_shoff) + (void *)(ehdr)))
> +
> +#undef get_section_loc
> +#define get_section_loc(ehdr, shdr)(_w((shdr)->sh_offset) + (void *)(ehdr))
> +
> +/* Functions and pointers that do_file() may override for specific e_machine. */
> +
> +#if 0
> +static uint_t FUNC(fn_ELF_R_SYM)(Elf_Rel const *rp)
> +{
> +	return ELF_R_SYM(_w(rp->r_info));
> +}
> +static uint_t (*FUNC(Elf_r_sym))(Elf_Rel const *rp) = FUNC(fn_ELF_R_SYM);
> +#endif
> +
> +static void FUNC(get_sym_str_and_relp)(Elf_Shdr const *const relhdr,
> +				 Elf_Ehdr const *const ehdr,
> +				 Elf_Sym const **sym0,
> +				 char const **str0,
> +				 Elf_Rel const **relp)
> +{
> +	Elf_Shdr *const shdr0 = get_shdr(ehdr);
> +	unsigned const symsec_sh_link = w(relhdr->sh_link);
> +	Elf_Shdr const *const symsec = &shdr0[symsec_sh_link];
> +	Elf_Shdr const *const strsec = &shdr0[w(symsec->sh_link)];
> +	Elf_Rel const *const rel0 =
> +		(Elf_Rel const *)get_section_loc(ehdr, relhdr);
> +
> +	*sym0 = (Elf_Sym const *)get_section_loc(ehdr, symsec);
> +
> +	*str0 = (char const *)get_section_loc(ehdr, strsec);
> +
> +	*relp = rel0;
> +}
> +
> +/*
> + * Read the relocation table again, but this time it's called on sections
> + * that are not going to be traced. The mcount calls here will be converted
> + * into nops.
> + */
> +static void FUNC(nop_jump_label)(Elf_Shdr const *const relhdr,
> +		       Elf_Ehdr const *const ehdr,
> +		       const char *const txtname)
> +{
> +	Elf_Shdr *const shdr0 = get_shdr(ehdr);
> +	Elf_Sym const *sym0;
> +	char const *str0;
> +	Elf_Rel const *relp;
> +	Elf_Rela const *relap;
> +	Elf_Shdr const *const shdr = &shdr0[w(relhdr->sh_info)];
> +	unsigned rel_entsize = w(relhdr->sh_entsize);
> +	unsigned const nrel = _w(relhdr->sh_size) / rel_entsize;
> +	int t;
> +
> +	FUNC(get_sym_str_and_relp)(relhdr, ehdr, &sym0, &str0, &relp);
> +
> +	for (t = nrel; t > 0; t -= 3) {
> +		int ret = -1;
> +
> +		relap = (Elf_Rela const *)relp;
> +		printf("rel offset=%lx info=%lx sym=%lx type=%lx addend=%lx\n",
> +		       (long)relap->r_offset, (long)relap->r_info,
> +		       (long)ELF_R_SYM(relap->r_info),
> +		       (long)ELF_R_TYPE(relap->r_info),
> +		       (long)relap->r_addend);
> +
> +		if (0 && make_nop)
> +			ret = make_nop((void *)ehdr, shdr->sh_offset + relp->r_offset);
> +
> +		/* jump label sections are paired in threes */
> +		relp = (Elf_Rel const *)(rel_entsize * 3 + (void *)relp);
> +	}
> +}
> +
> +/* Evade ISO C restriction: no declaration after statement in has_rel_mcount. */
> +static char const *
> +FUNC(__has_rel_jump_table)(Elf_Shdr const *const relhdr,  /* is SHT_REL or SHT_RELA */
> +		 Elf_Shdr const *const shdr0,
> +		 char const *const shstrtab,
> +		 char const *const fname)
> +{
> +	/* .sh_info depends on .sh_type == SHT_REL[,A] */
> +	Elf_Shdr const *const txthdr = &shdr0[w(relhdr->sh_info)];
> +	char const *const txtname = &shstrtab[w(txthdr->sh_name)];
> +
> +	if (strcmp("__jump_table", txtname) == 0) {
> +		fprintf(stderr, "warning: __mcount_loc already exists: %s\n",
> +			fname);
> +//		succeed_file();
> +	}
> +	if (w(txthdr->sh_type) != SHT_PROGBITS ||
> +	    !(w(txthdr->sh_flags) & SHF_EXECINSTR))
> +		return NULL;
> +	return txtname;
> +}
> +
> +static char const *FUNC(has_rel_jump_table)(Elf_Shdr const *const relhdr,
> +				      Elf_Shdr const *const shdr0,
> +				      char const *const shstrtab,
> +				      char const *const fname)
> +{
> +	if (w(relhdr->sh_type) != SHT_REL && w(relhdr->sh_type) != SHT_RELA)
> +		return NULL;
> +	return FUNC(__has_rel_jump_table)(relhdr, shdr0, shstrtab, fname);
> +}
> +
> +/* Find relocation section hdr for a given section */
> +static const Elf_Shdr *
> +FUNC(find_relhdr)(const Elf_Ehdr *ehdr, const Elf_Shdr *shdr)
> +{
> +	const Elf_Shdr *shdr0 = get_shdr(ehdr);
> +	int nhdr = w2(ehdr->e_shnum);
> +	const Elf_Shdr *hdr;
> +	int i;
> +
> +	for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) {
> +		if (w(hdr->sh_type) != SHT_REL &&
> +		    w(hdr->sh_type) != SHT_RELA)
> +			continue;
> +
> +		/*
> +		 * The relocation section's info field holds
> +		 * the section index that it represents.
> +		 */
> +		if (shdr == &shdr0[w(hdr->sh_info)])
> +			return hdr;
> +	}
> +	return NULL;
> +}
> +
> +/* Find a section header based on name and type */
> +static const Elf_Shdr *
> +FUNC(find_shdr)(const Elf_Ehdr *ehdr, const char *name, uint_t type)
> +{
> +	const Elf_Shdr *shdr0 = get_shdr(ehdr);
> +	const Elf_Shdr *shstr = &shdr0[w2(ehdr->e_shstrndx)];
> +	const char *shstrtab = (char *)get_section_loc(ehdr, shstr);
> +	int nhdr = w2(ehdr->e_shnum);
> +	const Elf_Shdr *hdr;
> +	const char *hdrname;
> +	int i;
> +
> +	for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) {
> +		if (w(hdr->sh_type) != type)
> +			continue;
> +
> +		/* If we are just looking for a section by type (ie. SYMTAB) */
> +		if (!name)
> +			return hdr;
> +
> +		hdrname = &shstrtab[w(hdr->sh_name)];
> +		if (strcmp(hdrname, name) == 0)
> +			return hdr;
> +	}
> +	return NULL;
> +}
> +
> +static void
> +FUNC(section_update)(const Elf_Ehdr *ehdr, const Elf_Shdr *symhdr,
> +		     unsigned shtype, const Elf_Rel *rel, void *data)
> +{
> +	const Elf_Shdr *shdr0 = get_shdr(ehdr);
> +	const Elf_Shdr *targethdr;
> +	const Elf_Rela *rela;
> +	const Elf_Sym *syment;
> +	uint_t offset = _w(rel->r_offset);
> +	uint_t info = _w(rel->r_info);
> +	uint_t sym = ELF_R_SYM(info);
> +	uint_t type = ELF_R_TYPE(info);
> +	uint_t addend;
> +	uint_t targetloc;
> +
> +	if (shtype == SHT_RELA) {
> +		rela = (const Elf_Rela *)rel;
> +		addend = _w(rela->r_addend);
> +	} else
> +		addend = _w(*(unsigned short *)(data + offset));
> +
> +	syment = (const Elf_Sym *)get_section_loc(ehdr, symhdr);
> +	targethdr = &shdr0[w2(syment[sym].st_shndx)];
> +	targetloc = _w(targethdr->sh_offset);
> +
> +	/* TODO, need a separate function for all archs */
> +	if (type != R_386_32)
> +		die(NULL, "Arch relocation type %d not supported", type);
> +
> +	targetloc += addend;
> +
> +#if 1
> +	printf("offset=%x target=%x shoffset=%x add=%x\n",
> +	       offset, targetloc, _w(targethdr->sh_offset), addend);
> +#endif
> +	*(uint_t *)(data + offset) = targetloc;
> +}
> +
> +/* Overall supervision for Elf32 ET_REL file. */
> +static void
> +FUNC(do_func)(Elf_Ehdr *ehdr, char const *const fname, unsigned const reltype)
> +{
> +	const Elf_Shdr *jlshdr;
> +	const Elf_Shdr *jlrhdr;
> +	const Elf_Shdr *symhdr;
> +	const Elf_Rel *rel;
> +	unsigned size;
> +	unsigned cnt;
> +	unsigned i;
> +	uint_t type;
> +	void *jdata;
> +	void *data;
> +
> +	jlshdr = FUNC(find_shdr)(ehdr, "__jump_table", SHT_PROGBITS);
> +	if (!jlshdr)
> +		return;
> +
> +	jlrhdr = FUNC(find_relhdr)(ehdr, jlshdr);
> +	if (!jlrhdr)
> +		return;
> +
> +	/*
> +	 * Create and fill in the __jump_table section and use it to
> +	 * find the offsets into the text that we want to update.
> +	 * We create it so that we do not depend on the order of the
> +	 * relocations, and use the table directly, as it is broken
> +	 * up into sections.
> +	 */
> +	size = _w(jlshdr->sh_size);
> +	data = umalloc(size);
> +
> +	jdata = (void *)get_section_loc(ehdr, jlshdr);
> +	memcpy(data, jdata, size);
> +
> +	cnt = _w(jlrhdr->sh_size) / w(jlrhdr->sh_entsize);
> +
> +	rel = (const Elf_Rel *)get_section_loc(ehdr, jlrhdr);
> +
> +	/* Is this as Rel or Rela? */
> +	type = w(jlrhdr->sh_type);
> +
> +	symhdr = FUNC(find_shdr)(ehdr, NULL, SHT_SYMTAB);
> +
> +	for (i = 0; i < cnt; i++) {
> +		FUNC(section_update)(ehdr, symhdr, type, rel, data);
> +		rel = (void *)rel + w(jlrhdr->sh_entsize);
> +	}
> +
> +	/*
> +	 * This is specific to x86. The jump_table is stored in three
> +	 * long words. The first is the location of the jmp target we
> +	 * must update.
> +	 */
> +	cnt = size / sizeof(uint_t);
> +
> +	for (i = 0; i < cnt; i += 3)
> +		if (0)make_nop((void *)ehdr, *(uint_t *)(data + i * sizeof(uint_t)));
> +
> +	free(data);
> +}
>
>



* Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
  2011-10-07 19:40                                 ` Jeremy Fitzhardinge
@ 2011-10-07 19:58                                   ` Steven Rostedt
  2011-10-07 20:04                                   ` Peter Zijlstra
  1 sibling, 0 replies; 53+ messages in thread
From: Steven Rostedt @ 2011-10-07 19:58 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Richard Henderson, Jason Baron, H. Peter Anvin, David S. Miller,
	David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz

On Fri, 2011-10-07 at 12:40 -0700, Jeremy Fitzhardinge wrote:
> On 10/07/2011 10:09 AM, Steven Rostedt wrote:
> > diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
> > index 3fee346..1f7f88f 100644
> > --- a/arch/x86/kernel/jump_label.c
> > +++ b/arch/x86/kernel/jump_label.c
> > @@ -16,34 +16,75 @@
> >  
> >  #ifdef HAVE_JUMP_LABEL
> >  
> > +static unsigned char nop_short[] = { P6_NOP2 };
> > +
> >  union jump_code_union {
> >  	char code[JUMP_LABEL_NOP_SIZE];
> >  	struct {
> >  		char jump;
> >  		int offset;
> >  	} __attribute__((packed));
> > +	struct {
> > +		char jump_short;
> > +		char offset_short;
> > +	} __attribute__((packed));
> >  };
> >  
> >  void arch_jump_label_transform(struct jump_entry *entry,
> >  			       enum jump_label_type type)
> >  {
> >  	union jump_code_union code;
> > +	unsigned char op;
> > +	unsigned size;
> > +	unsigned char nop;
> > +
> > +	/* Use probe_kernel_read()? */
> > +	op = *(unsigned char *)entry->code;
> > +	nop = ideal_nops[NOP_ATOMIC5][0];
> >  
> >  	if (type == JUMP_LABEL_ENABLE) {
> > -		code.jump = 0xe9;
> > -		code.offset = entry->target -
> > -				(entry->code + JUMP_LABEL_NOP_SIZE);
> > -	} else
> > -		memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);
> > +		if (op == 0xe9 || op == 0xeb)
> > +			/* Already enabled. Warn? */
> > +			return;
> > +
> > +		/* FIXME for all archs */
> 
> By "archs", do you mean different x86 variants?

Yeah, that was a confusing use of "archs". This was to make sure it works
with all the nops used by the different variants of x86.

> 
> > +		if (op == nop_short[0]) {
> 
> My gut feeling is that all this "trying to determine the jump size by
> sniffing the instruction" stuff seems pretty fragile.  Couldn't you
> store the jump size in the jump_label structure (even as a bit hidden
> away somewhere)?

We could, but it's not as fragile as you think. This is machine code, and
it either is a jump or it is not. I could add more checks, that is, look at
the full nop to make sure it is truly a nop. But for the jump side, an
instruction whose first byte is 0xe9 is definitely a jump.

I could harden this more, like what we do with mcount updates in the
function tracer. There I actually calculate what I expect to be there
before looking at what is there. The entire instruction is checked. If it
does not match, then we fail and give big warnings about it.

Other than that, it should be quite solid. If we don't get a match, we
should warn and disable jump labels.

No BUG()!

-- Steve




* Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
  2011-10-07 19:40                                 ` Jeremy Fitzhardinge
  2011-10-07 19:58                                   ` Steven Rostedt
@ 2011-10-07 20:04                                   ` Peter Zijlstra
  1 sibling, 0 replies; 53+ messages in thread
From: Peter Zijlstra @ 2011-10-07 20:04 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Steven Rostedt, Richard Henderson, Jason Baron, H. Peter Anvin,
	David S. Miller, David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge

On Fri, 2011-10-07 at 12:40 -0700, Jeremy Fitzhardinge wrote:
>     J

can we please, pretty please start trimming email replies?



* Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
  2011-10-07 19:21                                   ` Steven Rostedt
@ 2011-10-07 21:48                                     ` H. Peter Anvin
  2011-10-07 22:00                                       ` Steven Rostedt
  0 siblings, 1 reply; 53+ messages in thread
From: H. Peter Anvin @ 2011-10-07 21:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jason Baron, Jeremy Fitzhardinge, Richard Henderson,
	David S. Miller, David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz

On 10/07/2011 12:21 PM, Steven Rostedt wrote:
>>
>> same here, at least WARN, more likely BUG()
>
> I just don't like using BUG(). BUG() means that if we continue we will
> corrupt the filesystem or make you go blind. WARN and returning here
> should not cause any harm and will even let those with X terminals see
> oops in /var/log/messages.
>

Uh, NO.

If this is wrong something in the kernel code stream is corrupted (heck, 
you might just have caught a rootkit!)

Die.  NOW.

	-hpa


* Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
  2011-10-07 21:48                                     ` H. Peter Anvin
@ 2011-10-07 22:00                                       ` Steven Rostedt
  2011-10-07 22:03                                         ` H. Peter Anvin
  0 siblings, 1 reply; 53+ messages in thread
From: Steven Rostedt @ 2011-10-07 22:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jason Baron, Jeremy Fitzhardinge, Richard Henderson,
	David S. Miller, David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz

On Fri, 2011-10-07 at 14:48 -0700, H. Peter Anvin wrote:
> On 10/07/2011 12:21 PM, Steven Rostedt wrote:
> >>
> >> same here, at least WARN, more likely BUG()
> >
> > I just don't like using BUG(). BUG() means that if we continue we will
> > corrupt the filesystem or make you go blind. WARN and returning here
> > should not cause any harm and will even let those with X terminals see
> > the oops in /var/log/messages.
> >
> 
> Uh, NO.
> 
> If this is wrong something in the kernel code stream is corrupted (heck, 
> you might just have caught a rootkit!)
> 
> Die.  NOW.

Ouch, quite shaken by k.org? I guess I should have substituted "go blind"
with "being hacked".


The thing is, it may be as simple as an out-of-tree module screwing up
the jump table. Or worse, gcc doing things that we did not expect.
If that is the case, jump labels can simply be prevented from modifying code.

But if we just do the BUG() here, it's a big hammer that prevents any
further progress until the issue is addressed. Which may mean telling
people to disable jump labels in their config, or to use a different
compiler.

Currently ftrace takes the approach of WARN()ing and disabling itself when
it finds an anomaly in what it expects to modify. The times this has
triggered have been either a problem with writing to the code, due to
security mechanisms preventing code modification, or the scan of the
relocation tables mistaking data for code. The latter I could foresee
happening with jump labels.

-- Steve




* Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
  2011-10-07 22:00                                       ` Steven Rostedt
@ 2011-10-07 22:03                                         ` H. Peter Anvin
  0 siblings, 0 replies; 53+ messages in thread
From: H. Peter Anvin @ 2011-10-07 22:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jason Baron, Jeremy Fitzhardinge, Richard Henderson,
	David S. Miller, David Daney, Michael Ellerman, Jan Glauber,
	the arch/x86 maintainers, Xen Devel, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, peterz

On 10/07/2011 03:00 PM, Steven Rostedt wrote:
>
> Ouch, quite shaken by k.org? I guess I should have substituted "go blind"
> with "being hacked".
>

Well, yes, but I would have said exactly the same thing before.

> The thing is, it may be as simple as an out-of-tree module screwing up
> the jump table. Or worse, gcc doing things that we did not expect.
> If that is the case, jump labels can simply be prevented from modifying code.
>
> But if we just do the BUG() here, it's a big hammer that prevents any
> further progress until the issue is addressed. Which may mean telling
> people to disable jump labels in their config, or to use a different
> compiler.

That is EXACTLY what should happen.  Something is wrong to the point that 
the kernel is *known* to be executing the wrong code.  That is an 
extremely serious condition and should be treated as such.

If you want, you could have a debug option to demote this to WARN, but I 
really don't want to see it by default.

	-hpa


* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-01 21:55 ` [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out Jeremy Fitzhardinge
  2011-10-03 15:02   ` Jason Baron
@ 2011-10-10 15:36   ` Jason Baron
  2011-10-10 19:58     ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 53+ messages in thread
From: Jason Baron @ 2011-10-10 15:36 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Steven Rostedt, David S. Miller, David Daney, Michael Ellerman,
	Jan Glauber, the arch/x86 maintainers, Xen Devel,
	Linux Kernel Mailing List, Jeremy Fitzhardinge

On Sat, Oct 01, 2011 at 02:55:35PM -0700, Jeremy Fitzhardinge wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
> If a key has been enabled before jump_label_init() is called, don't
> nop it out.
> 
> This removes arch_jump_label_text_poke_early() (which can only nop
> out a site) and uses arch_jump_label_transform() instead.
> 
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> ---
>  include/linux/jump_label.h |    3 ++-
>  kernel/jump_label.c        |   20 ++++++++------------
>  2 files changed, 10 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
> index 1213e9d..c8fb1b3 100644
> --- a/include/linux/jump_label.h
> +++ b/include/linux/jump_label.h
> @@ -45,7 +45,8 @@ extern void jump_label_lock(void);
>  extern void jump_label_unlock(void);
>  extern void arch_jump_label_transform(struct jump_entry *entry,
>  				 enum jump_label_type type);
> -extern void arch_jump_label_text_poke_early(jump_label_t addr);
> +extern void arch_jump_label_transform_early(struct jump_entry *entry,
> +				 enum jump_label_type type);
>  extern int jump_label_text_reserved(void *start, void *end);
>  extern void jump_label_inc(struct jump_label_key *key);
>  extern void jump_label_dec(struct jump_label_key *key);
> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> index a8ce450..059202d5 100644
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -121,13 +121,6 @@ static void __jump_label_update(struct jump_label_key *key,
>  	}
>  }
>  
> -/*
> - * Not all archs need this.
> - */
> -void __weak arch_jump_label_text_poke_early(jump_label_t addr)
> -{
> -}
> -
>  static __init int jump_label_init(void)
>  {
>  	struct jump_entry *iter_start = __start___jump_table;
> @@ -139,12 +132,15 @@ static __init int jump_label_init(void)
>  	jump_label_sort_entries(iter_start, iter_stop);
>  
>  	for (iter = iter_start; iter < iter_stop; iter++) {
> -		arch_jump_label_text_poke_early(iter->code);
> -		if (iter->key == (jump_label_t)(unsigned long)key)
> +		struct jump_label_key *iterk;
> +
> +		iterk = (struct jump_label_key *)(unsigned long)iter->key;
> +		arch_jump_label_transform(iter, jump_label_enabled(iterk) ?
> +					  JUMP_LABEL_ENABLE : JUMP_LABEL_DISABLE);
> +		if (iterk == key)
>  			continue;
>  
> -		key = (struct jump_label_key *)(unsigned long)iter->key;
> -		atomic_set(&key->enabled, 0);
> +		key = iterk;
>  		key->entries = iter;
>  #ifdef CONFIG_MODULES
>  		key->next = NULL;
> @@ -212,7 +208,7 @@ void jump_label_apply_nops(struct module *mod)
>  		return;
>  
>  	for (iter = iter_start; iter < iter_stop; iter++)
> -		arch_jump_label_text_poke_early(iter->code);
> +		arch_jump_label_transform(iter, JUMP_LABEL_DISABLE);
>  }
>  
>  static int jump_label_add_module(struct module *mod)
> -- 
> 1.7.6.2
> 

Hi,

I just realized that the early call to jump_label_inc() isn't being
honored with this patch until later, when we invoke jump_label_init().
That strikes me as being inconsistent. When jump_label_inc() returns we
should expect the branch to be updated.

Thus, I think what we probably want is to add a new 'int jump_label_init'
flag. If it's not set we can call jump_label_init() from
jump_label_inc()/dec(). And jump_label_init() can avoid initialization
if it's already set.

Thanks,

-Jason 


* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-10 15:36   ` [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out Jason Baron
@ 2011-10-10 19:58     ` Jeremy Fitzhardinge
  2011-10-10 20:10       ` Jason Baron
  0 siblings, 1 reply; 53+ messages in thread
From: Jeremy Fitzhardinge @ 2011-10-10 19:58 UTC (permalink / raw)
  To: Jason Baron
  Cc: Steven Rostedt, David S. Miller, David Daney, Michael Ellerman,
	Jan Glauber, the arch/x86 maintainers, Xen Devel,
	Linux Kernel Mailing List, Jeremy Fitzhardinge

On 10/10/2011 08:36 AM, Jason Baron wrote:
> On Sat, Oct 01, 2011 at 02:55:35PM -0700, Jeremy Fitzhardinge wrote:
>> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>>
>> If a key has been enabled before jump_label_init() is called, don't
>> nop it out.
>>
>> This removes arch_jump_label_text_poke_early() (which can only nop
>> out a site) and uses arch_jump_label_transform() instead.
>>
>> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>> ---
>>  include/linux/jump_label.h |    3 ++-
>>  kernel/jump_label.c        |   20 ++++++++------------
>>  2 files changed, 10 insertions(+), 13 deletions(-)
>>
>> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
>> index 1213e9d..c8fb1b3 100644
>> --- a/include/linux/jump_label.h
>> +++ b/include/linux/jump_label.h
>> @@ -45,7 +45,8 @@ extern void jump_label_lock(void);
>>  extern void jump_label_unlock(void);
>>  extern void arch_jump_label_transform(struct jump_entry *entry,
>>  				 enum jump_label_type type);
>> -extern void arch_jump_label_text_poke_early(jump_label_t addr);
>> +extern void arch_jump_label_transform_early(struct jump_entry *entry,
>> +				 enum jump_label_type type);
>>  extern int jump_label_text_reserved(void *start, void *end);
>>  extern void jump_label_inc(struct jump_label_key *key);
>>  extern void jump_label_dec(struct jump_label_key *key);
>> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
>> index a8ce450..059202d5 100644
>> --- a/kernel/jump_label.c
>> +++ b/kernel/jump_label.c
>> @@ -121,13 +121,6 @@ static void __jump_label_update(struct jump_label_key *key,
>>  	}
>>  }
>>  
>> -/*
>> - * Not all archs need this.
>> - */
>> -void __weak arch_jump_label_text_poke_early(jump_label_t addr)
>> -{
>> -}
>> -
>>  static __init int jump_label_init(void)
>>  {
>>  	struct jump_entry *iter_start = __start___jump_table;
>> @@ -139,12 +132,15 @@ static __init int jump_label_init(void)
>>  	jump_label_sort_entries(iter_start, iter_stop);
>>  
>>  	for (iter = iter_start; iter < iter_stop; iter++) {
>> -		arch_jump_label_text_poke_early(iter->code);
>> -		if (iter->key == (jump_label_t)(unsigned long)key)
>> +		struct jump_label_key *iterk;
>> +
>> +		iterk = (struct jump_label_key *)(unsigned long)iter->key;
>> +		arch_jump_label_transform(iter, jump_label_enabled(iterk) ?
>> +					  JUMP_LABEL_ENABLE : JUMP_LABEL_DISABLE);
>> +		if (iterk == key)
>>  			continue;
>>  
>> -		key = (struct jump_label_key *)(unsigned long)iter->key;
>> -		atomic_set(&key->enabled, 0);
>> +		key = iterk;
>>  		key->entries = iter;
>>  #ifdef CONFIG_MODULES
>>  		key->next = NULL;
>> @@ -212,7 +208,7 @@ void jump_label_apply_nops(struct module *mod)
>>  		return;
>>  
>>  	for (iter = iter_start; iter < iter_stop; iter++)
>> -		arch_jump_label_text_poke_early(iter->code);
>> +		arch_jump_label_transform(iter, JUMP_LABEL_DISABLE);
>>  }
>>  
>>  static int jump_label_add_module(struct module *mod)
>> -- 
>> 1.7.6.2
>>
> Hi,
>
> I just realized that the early call to jump_label_inc() isn't being
> honored with this patch until later, when we invoke jump_label_init().
> That strikes me as being inconsistent. When jump_label_inc() returns we
> should expect the branch to be updated.

Why is that?  It looks to me like it will unconditionally update the
instruction, irrespective of whether _init() has been called?

> Thus, I think what we probably want is to add a new 'int jump_label_init'
> flag. If it's not set we can call jump_label_init() from
> jump_label_inc()/dec().

Hm.  I worry that it may end up calling jump_label_init() in an
unexpected context, especially since it may well be config-dependent, or
that a jump_label_inc() added later on starts mysteriously failing.

>  And jump_label_init() can avoid initialization
> if it's already set.

That doesn't seem worthwhile in itself.  I suspect the number of "early"
jump_label_incs will be very small (or we should look at doing the init
earlier).

    J


* Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out
  2011-10-10 19:58     ` Jeremy Fitzhardinge
@ 2011-10-10 20:10       ` Jason Baron
  0 siblings, 0 replies; 53+ messages in thread
From: Jason Baron @ 2011-10-10 20:10 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Steven Rostedt, David S. Miller, David Daney, Michael Ellerman,
	Jan Glauber, the arch/x86 maintainers, Xen Devel,
	Linux Kernel Mailing List, Jeremy Fitzhardinge

On Mon, Oct 10, 2011 at 12:58:19PM -0700, Jeremy Fitzhardinge wrote:
> > Hi,
> >
> > I just realized that the early call to jump_label_inc() isn't being
> > honored with this patch until later, when we invoke jump_label_init().
> > That strikes me as being inconsistent. When jump_label_inc() returns we
> > should expect the branch to be updated.
> 
> Why is that?  It looks to me like it will unconditionally update the
> instruction, irrespective of whether _init() has been called?
> 

No. jump_label_init() sets up key->entries to point into the jump
table... before that, jump_label_update() doesn't know where the table
is located, and will just return without doing the update.


> > Thus, I think what we probably want is to add a new 'int jump_label_init'
> > flag. If it's not set we can call jump_label_init() from
> > jump_label_inc()/dec().
> 
> Hm.  I worry that it may end up calling jump_label_init() in an
> unexpected context, especially since it may well be config-dependent, or
> adding a jump_label_inc() later on starts mysteriously failing.

good point.

> 
> >  And jump_label_init() can avoid initialization
> > if it's already set.
> 
> That doesn't seem worthwhile in itself.  I suspect the number of "early"
> jump_label_incs will be very small (or we should look at doing the init
> earlier).
> 
>     J

I have it as 'early_initcall()', but perhaps it should be moved into
init/main.c. I don't think there's any reason it can't be done super
early. So I think this might be the best answer. It will also simplify
your series.

Thanks,

-Jason


end of thread, other threads:[~2011-10-10 20:10 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-01 21:55 [PATCH RFC V2 0/5] jump-label: allow early jump_label_enable() Jeremy Fitzhardinge
2011-10-01 21:55 ` [PATCH RFC V2 1/5] jump_label: use proper atomic_t initializer Jeremy Fitzhardinge
2011-10-01 21:55 ` [PATCH RFC V2 2/5] stop_machine: make stop_machine safe and efficient to call early Jeremy Fitzhardinge
2011-10-02  0:36   ` Tejun Heo
2011-10-03 19:24   ` [Xen-devel] " Konrad Rzeszutek Wilk
2011-10-01 21:55 ` [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out Jeremy Fitzhardinge
2011-10-03 15:02   ` Jason Baron
2011-10-03 15:47     ` Steven Rostedt
2011-10-03 16:27     ` Jeremy Fitzhardinge
2011-10-04 14:10       ` Jason Baron
2011-10-04 15:18         ` Jeremy Fitzhardinge
2011-10-04 16:30         ` H. Peter Anvin
2011-10-04 17:53           ` Jason Baron
2011-10-04 18:05             ` Steven Rostedt
2011-10-06  0:16           ` Jeremy Fitzhardinge
2011-10-06  0:17             ` H. Peter Anvin
2011-10-06  0:47               ` Jeremy Fitzhardinge
2011-10-06 17:53               ` Jeremy Fitzhardinge
2011-10-06 18:10                 ` Jason Baron
2011-10-06 18:13                   ` H. Peter Anvin
2011-10-06 21:39                     ` Jeremy Fitzhardinge
2011-10-06 22:08                       ` Steven Rostedt
2011-10-06 18:15                   ` Jeremy Fitzhardinge
2011-10-06 18:33                     ` Jason Baron
2011-10-06 18:35                       ` H. Peter Anvin
2011-10-06 18:43                         ` Jason Baron
2011-10-06 18:26                   ` Steven Rostedt
2011-10-06 18:29                     ` H. Peter Anvin
2011-10-06 18:38                       ` Jason Baron
2011-10-06 19:34                         ` Steven Rostedt
2011-10-06 20:33                           ` Jason Baron
2011-10-06 20:45                             ` Steven Rostedt
2011-10-06 18:50                     ` Richard Henderson
2011-10-06 19:28                       ` Steven Rostedt
2011-10-06 21:42                         ` Jeremy Fitzhardinge
2011-10-06 22:06                           ` Steven Rostedt
2011-10-06 22:10                             ` Jeremy Fitzhardinge
2011-10-06 22:20                               ` Steven Rostedt
2011-10-07 17:09                               ` [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps Steven Rostedt
2011-10-07 18:52                                 ` Jason Baron
2011-10-07 19:21                                   ` Steven Rostedt
2011-10-07 21:48                                     ` H. Peter Anvin
2011-10-07 22:00                                       ` Steven Rostedt
2011-10-07 22:03                                         ` H. Peter Anvin
2011-10-07 19:33                                   ` Steven Rostedt
2011-10-07 19:40                                 ` Jeremy Fitzhardinge
2011-10-07 19:58                                   ` Steven Rostedt
2011-10-07 20:04                                   ` Peter Zijlstra
2011-10-10 15:36   ` [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don't nop it out Jason Baron
2011-10-10 19:58     ` Jeremy Fitzhardinge
2011-10-10 20:10       ` Jason Baron
2011-10-01 21:55 ` [PATCH RFC V2 4/5] x86/jump_label: drop arch_jump_label_text_poke_early() Jeremy Fitzhardinge
2011-10-01 21:55 ` [PATCH RFC V2 5/5] sparc/jump_label: " Jeremy Fitzhardinge
