linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* irq-disabled vs vmap vs text_poke
@ 2009-02-13 12:50 Peter Zijlstra
  2009-02-13 12:55 ` Nick Piggin
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Peter Zijlstra @ 2009-02-13 12:50 UTC (permalink / raw)
  To: Nick Piggin, akpm, Mathieu Desnoyers; +Cc: linux-kernel, Ingo Molnar

Hi,

Ingo got the following splat:

[    5.101748] ------------[ cut here ]------------
[    5.104305] WARNING: at kernel/smp.c:329 smp_call_function_many+0x34/0x1ea()
[    5.104305] Hardware name: P4DC6
[    5.104305] Modules linked in:
[    5.104305] Pid: 1, comm: swapper Not tainted 2.6.29-rc4-tip-01766-g1757c19-dirty #2
[    5.104305] Call Trace:
[    5.104305]  [<c012b5d6>] warn_slowpath+0x79/0x8f
[    5.104305]  [<c0104d24>] ? dump_trace+0x7d/0xac
[    5.104305]  [<c014805a>] ? __lock_acquire+0x319/0x382
[    5.104305]  [<c014d6a1>] smp_call_function_many+0x34/0x1ea
[    5.104305]  [<c011b59e>] ? do_flush_tlb_all+0x0/0x48
[    5.104305]  [<c011b59e>] ? do_flush_tlb_all+0x0/0x48
[    5.104305]  [<c014d878>] smp_call_function+0x21/0x28
[    5.104305]  [<c012fa56>] on_each_cpu+0x14/0x23
[    5.104305]  [<c011b56d>] flush_tlb_all+0x19/0x1b
[    5.104305]  [<c018dcbe>] flush_tlb_kernel_range+0xd/0xf
[    5.104305]  [<c018dd00>] vmap_debug_free_range+0x1c/0x20
[    5.104305]  [<c018e386>] remove_vm_area+0x28/0x67
[    5.104305]  [<c018e468>] __vunmap+0x30/0xab
[    5.104305]  [<c018e50a>] vunmap+0x27/0x29
[    5.104305]  [<c06421b6>] text_poke+0xd6/0x104
[    5.104305]  [<c015687c>] ? kprobe_target+0x0/0x15
[    5.104305]  [<c0642a82>] arch_disarm_kprobe+0x13/0x15
[    5.104305]  [<c06434f9>] __unregister_kprobe_top+0x68/0xe8
[    5.104305]  [<c06436d2>] unregister_kretprobes+0x2c/0xb9
[    5.104305]  [<c0643775>] unregister_kretprobe+0x16/0x18
[    5.104305]  [<c0156d0b>] init_test_probes+0x2ed/0x40c
[    5.104305]  [<c09d451c>] init_kprobes+0x127/0x131
[    5.104305]  [<c01476bf>] ? lock_release_holdtime+0x43/0x48
[    5.104305]  [<c0192e48>] ? __slab_alloc+0x5d/0x27a
[    5.104305]  [<c0147a2a>] ? lockdep_init_map+0x80/0xe7
[    5.104305]  [<c0141c0d>] ? clocksource_read+0xd/0xf
[    5.104305]  [<c0142542>] ? getnstimeofday+0x5e/0xe9
[    5.104305]  [<c013e01e>] ? timespec_to_ktime+0xe/0x11
[    5.104305]  [<c09d43f5>] ? init_kprobes+0x0/0x131
[    5.104305]  [<c010115c>] do_one_initcall+0x6a/0x169
[    5.104305]  [<c029ad2f>] ? number+0x10d/0x1cf
[    5.104305]  [<c0147799>] ? register_lock_class+0x17/0x228
[    5.104305]  [<c014805a>] ? __lock_acquire+0x319/0x382
[    5.104305]  [<c029fa91>] ? debug_check_no_obj_freed+0xda/0x126
[    5.104305]  [<c014805a>] ? __lock_acquire+0x319/0x382
[    5.104305]  [<c01476bf>] ? lock_release_holdtime+0x43/0x48
[    5.104305]  [<c0640ec0>] ? _spin_unlock+0x22/0x25
[    5.104305]  [<c01cc0f0>] ? proc_register+0x14b/0x15c
[    5.104305]  [<c01cc20f>] ? create_proc_entry+0x76/0x8c
[    5.104305]  [<c0162900>] ? default_affinity_write+0x3f/0x8a
[    5.104305]  [<c0162a6a>] ? init_irq_proc+0x58/0x65
[    5.104305]  [<c09bf52d>] kernel_init+0x118/0x169
[    5.104305]  [<c09bf415>] ? kernel_init+0x0/0x169
[    5.104305]  [<c0103adf>] kernel_thread_helper+0x7/0x10
[    5.104305] ---[ end trace e93713a9d40cd06c ]---

Which points to vunmap() being called with interrupts disabled.

Which made me look at the vmap/vunmap calls, and they appear to not be
irq-safe, therefore this would be a bug in text_poke().

[ that is, vmap() can end up calling get_vm_area_caller() which in turn
  calls __get_vm_area_node() with GFP_KERNEL, ergo, don't do this from
  an atomic context. ]

Now text_poke() uses local_irq_save/restore(), which conveys that it can
be called with IRQs disabled, which is exactly what happens in the trace
above, however we just established that vmap/vunmap() are not irq-safe.

Anybody got an idea on how to fix this?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-13 12:50 irq-disabled vs vmap vs text_poke Peter Zijlstra
@ 2009-02-13 12:55 ` Nick Piggin
  2009-02-13 13:02   ` Peter Zijlstra
  2009-02-13 14:18 ` Mathieu Desnoyers
  2009-02-13 14:27 ` [PATCH] x86: text_poke might sleep Mathieu Desnoyers
  2 siblings, 1 reply; 32+ messages in thread
From: Nick Piggin @ 2009-02-13 12:55 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: akpm, Mathieu Desnoyers, linux-kernel, Ingo Molnar

On Fri, Feb 13, 2009 at 01:50:07PM +0100, Peter Zijlstra wrote:
> Hi,
> 
> Ingo got the following splat:
> 
> [    5.101748] ------------[ cut here ]------------
> [    5.104305] WARNING: at kernel/smp.c:329 smp_call_function_many+0x34/0x1ea()
> [    5.104305] Hardware name: P4DC6
> [    5.104305] Modules linked in:
> [    5.104305] Pid: 1, comm: swapper Not tainted 2.6.29-rc4-tip-01766-g1757c19-dirty #2
> [    5.104305] Call Trace:
> [    5.104305]  [<c012b5d6>] warn_slowpath+0x79/0x8f
> [    5.104305]  [<c0104d24>] ? dump_trace+0x7d/0xac
> [    5.104305]  [<c014805a>] ? __lock_acquire+0x319/0x382
> [    5.104305]  [<c014d6a1>] smp_call_function_many+0x34/0x1ea
> [    5.104305]  [<c011b59e>] ? do_flush_tlb_all+0x0/0x48
> [    5.104305]  [<c011b59e>] ? do_flush_tlb_all+0x0/0x48
> [    5.104305]  [<c014d878>] smp_call_function+0x21/0x28
> [    5.104305]  [<c012fa56>] on_each_cpu+0x14/0x23
> [    5.104305]  [<c011b56d>] flush_tlb_all+0x19/0x1b
> [    5.104305]  [<c018dcbe>] flush_tlb_kernel_range+0xd/0xf
> [    5.104305]  [<c018dd00>] vmap_debug_free_range+0x1c/0x20
> [    5.104305]  [<c018e386>] remove_vm_area+0x28/0x67
> [    5.104305]  [<c018e468>] __vunmap+0x30/0xab
> [    5.104305]  [<c018e50a>] vunmap+0x27/0x29
> [    5.104305]  [<c06421b6>] text_poke+0xd6/0x104
> [    5.104305]  [<c015687c>] ? kprobe_target+0x0/0x15
> [    5.104305]  [<c0642a82>] arch_disarm_kprobe+0x13/0x15
> [    5.104305]  [<c06434f9>] __unregister_kprobe_top+0x68/0xe8
> [    5.104305]  [<c06436d2>] unregister_kretprobes+0x2c/0xb9
> [    5.104305]  [<c0643775>] unregister_kretprobe+0x16/0x18
> [    5.104305]  [<c0156d0b>] init_test_probes+0x2ed/0x40c
> [    5.104305]  [<c09d451c>] init_kprobes+0x127/0x131
> [    5.104305]  [<c01476bf>] ? lock_release_holdtime+0x43/0x48
> [    5.104305]  [<c0192e48>] ? __slab_alloc+0x5d/0x27a
> [    5.104305]  [<c0147a2a>] ? lockdep_init_map+0x80/0xe7
> [    5.104305]  [<c0141c0d>] ? clocksource_read+0xd/0xf
> [    5.104305]  [<c0142542>] ? getnstimeofday+0x5e/0xe9
> [    5.104305]  [<c013e01e>] ? timespec_to_ktime+0xe/0x11
> [    5.104305]  [<c09d43f5>] ? init_kprobes+0x0/0x131
> [    5.104305]  [<c010115c>] do_one_initcall+0x6a/0x169
> [    5.104305]  [<c029ad2f>] ? number+0x10d/0x1cf
> [    5.104305]  [<c0147799>] ? register_lock_class+0x17/0x228
> [    5.104305]  [<c014805a>] ? __lock_acquire+0x319/0x382
> [    5.104305]  [<c029fa91>] ? debug_check_no_obj_freed+0xda/0x126
> [    5.104305]  [<c014805a>] ? __lock_acquire+0x319/0x382
> [    5.104305]  [<c01476bf>] ? lock_release_holdtime+0x43/0x48
> [    5.104305]  [<c0640ec0>] ? _spin_unlock+0x22/0x25
> [    5.104305]  [<c01cc0f0>] ? proc_register+0x14b/0x15c
> [    5.104305]  [<c01cc20f>] ? create_proc_entry+0x76/0x8c
> [    5.104305]  [<c0162900>] ? default_affinity_write+0x3f/0x8a
> [    5.104305]  [<c0162a6a>] ? init_irq_proc+0x58/0x65
> [    5.104305]  [<c09bf52d>] kernel_init+0x118/0x169
> [    5.104305]  [<c09bf415>] ? kernel_init+0x0/0x169
> [    5.104305]  [<c0103adf>] kernel_thread_helper+0x7/0x10
> [    5.104305] ---[ end trace e93713a9d40cd06c ]---
> 
> Which points to vunmap() being called with interrupts disabled.
> 
> Which made me look at the vmap/vunmap calls, and they appear to not be
> irq-safe, therefore this would be a bug in text_poke().
> 
> [ that is, vmap() can end up calling get_vm_area_caller() which in turn
>   calls __get_vm_area_node() with GFP_KERNEL, ergo, don't do this from
>   an atomic context. ]
> 
> Now text_poke() uses local_irq_save/restore(), which conveys that it can
> be called with IRQs disabled, which is exactly what happens in the trace
> above, however we just established that vmap/vunmap() are not irq-safe.
> 
> Anybody got an idea on how to fix this?

Oh, I thought the consensus was not to use vmap for this?

With a bit of work, we can make vunmap irq-safe with the lazy vunmapping
infrastructure (vmap could also be irq-safe, but would be subject to
spurious failures due to being unable to flush lazy vunmaps.

I think I got a mostly working patch cobbled together sitting here
somewhere. I was waiting for some _really_ good use case before spending
more time on it. I would prefer if at all possible to do vmap operations
in sleepable, process context.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-13 12:55 ` Nick Piggin
@ 2009-02-13 13:02   ` Peter Zijlstra
  2009-02-13 13:04     ` Ingo Molnar
  2009-02-13 13:05     ` Peter Zijlstra
  0 siblings, 2 replies; 32+ messages in thread
From: Peter Zijlstra @ 2009-02-13 13:02 UTC (permalink / raw)
  To: Nick Piggin; +Cc: akpm, Mathieu Desnoyers, linux-kernel, Ingo Molnar

On Fri, 2009-02-13 at 13:55 +0100, Nick Piggin wrote:
> On Fri, Feb 13, 2009 at 01:50:07PM +0100, Peter Zijlstra wrote:
> > Hi,
> > 
> > Ingo got the following splat:

<snip splat>

> > Which points to vunmap() being called with interrupts disabled.
> > 
> > Which made me look at the vmap/vunmap calls, and they appear to not be
> > irq-safe, therefore this would be a bug in text_poke().
> > 
> > [ that is, vmap() can end up calling get_vm_area_caller() which in turn
> >   calls __get_vm_area_node() with GFP_KERNEL, ergo, don't do this from
> >   an atomic context. ]
> > 
> > Now text_poke() uses local_irq_save/restore(), which conveys that it can
> > be called with IRQs disabled, which is exactly what happens in the trace
> > above, however we just established that vmap/vunmap() are not irq-safe.
> > 
> > Anybody got an idea on how to fix this?
> 
> Oh, I thought the consensus was not to use vmap for this?

Seems like a sensible consensus, still that means text_poke() needs some
TLC.

> With a bit of work, we can make vunmap irq-safe with the lazy vunmapping
> infrastructure (vmap could also be irq-safe, but would be subject to
> spurious failures due to being unable to flush lazy vunmaps.

*nod*

> I think I got a mostly working patch cobbled together sitting here
> somewhere. I was waiting for some _really_ good use case before spending
> more time on it. I would prefer if at all possible to do vmap operations
> in sleepable, process context.

Agreed, I think we want to fix text_poke() and make the vmap/vunmap()
ops yell louder at violations of these rules.

I'm just totally clueless wrt text_poke() hence this email ;-)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-13 13:02   ` Peter Zijlstra
@ 2009-02-13 13:04     ` Ingo Molnar
  2009-02-13 14:25       ` Mathieu Desnoyers
  2009-02-13 13:05     ` Peter Zijlstra
  1 sibling, 1 reply; 32+ messages in thread
From: Ingo Molnar @ 2009-02-13 13:04 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Nick Piggin, akpm, Mathieu Desnoyers, linux-kernel


* Peter Zijlstra <peterz@infradead.org> wrote:

> > I think I got a mostly working patch cobbled together sitting here
> > somewhere. I was waiting for some _really_ good use case before spending
> > more time on it. I would prefer if at all possible to do vmap operations
> > in sleepable, process context.
> 
> Agreed, I think we want to fix text_poke() and make the vmap/vunmap()
> ops yell louder at violations of these rules.
> 
> I'm just totally clueless wrt text_poke() hence this email ;-)

also, this started triggering yesterday for the first time - and never
saw it before. Has some commit caused this side-effect?

It triggers during kprobes self-test - has that been improved recently?

Maybe these:

f24659d: kprobes: support probing module __init function
0deddf4: module: add MODULE_STATE_LIVE notify
49ad2fd: kprobes: remove called_from argument
e8386a0: kprobes: support probing module __exit function
017c39b: kprobes: add __kprobes to kprobe internal functions
1294156: kprobes: add kprobe_insn_mutex and cleanup arch_remove_kprobe()
a06f621: module: add within_module_core() and within_module_init()
12da3b8: kprobes: add tests for register_kprobes
8e11440: kprobes: indirectly call kprobe_target
bc2f701: kprobes: bugfix: try_module_get even if calling_mod is NULL

exposed this (long-lived) bug?

	Ingo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-13 13:02   ` Peter Zijlstra
  2009-02-13 13:04     ` Ingo Molnar
@ 2009-02-13 13:05     ` Peter Zijlstra
  2009-02-13 13:09       ` Nick Piggin
  1 sibling, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2009-02-13 13:05 UTC (permalink / raw)
  To: Nick Piggin; +Cc: akpm, Mathieu Desnoyers, linux-kernel, Ingo Molnar

On Fri, 2009-02-13 at 14:02 +0100, Peter Zijlstra wrote:
> Agreed, I think we want to fix text_poke() and make the vmap/vunmap()
> ops yell louder at violations of these rules.

Something like so?

---
 mm/vmalloc.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 75f49d3..8516cea 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1249,6 +1249,7 @@ EXPORT_SYMBOL(vfree);
 void vunmap(const void *addr)
 {
 	BUG_ON(in_interrupt());
+	might_sleep();
 	__vunmap(addr, 0);
 }
 EXPORT_SYMBOL(vunmap);
@@ -1268,6 +1269,8 @@ void *vmap(struct page **pages, unsigned int count,
 {
 	struct vm_struct *area;
 
+	might_sleep();
+
 	if (count > num_physpages)
 		return NULL;
 


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-13 13:05     ` Peter Zijlstra
@ 2009-02-13 13:09       ` Nick Piggin
  0 siblings, 0 replies; 32+ messages in thread
From: Nick Piggin @ 2009-02-13 13:09 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: akpm, Mathieu Desnoyers, linux-kernel, Ingo Molnar

On Fri, Feb 13, 2009 at 02:05:11PM +0100, Peter Zijlstra wrote:
> On Fri, 2009-02-13 at 14:02 +0100, Peter Zijlstra wrote:
> > Agreed, I think we want to fix text_poke() and make the vmap/vunmap()
> > ops yell louder at violations of these rules.
> 
> Something like so?

Couldn't hurt :)


> 
> ---
>  mm/vmalloc.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 75f49d3..8516cea 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1249,6 +1249,7 @@ EXPORT_SYMBOL(vfree);
>  void vunmap(const void *addr)
>  {
>  	BUG_ON(in_interrupt());
> +	might_sleep();
>  	__vunmap(addr, 0);
>  }
>  EXPORT_SYMBOL(vunmap);
> @@ -1268,6 +1269,8 @@ void *vmap(struct page **pages, unsigned int count,
>  {
>  	struct vm_struct *area;
>  
> +	might_sleep();
> +
>  	if (count > num_physpages)
>  		return NULL;
>  

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-13 12:50 irq-disabled vs vmap vs text_poke Peter Zijlstra
  2009-02-13 12:55 ` Nick Piggin
@ 2009-02-13 14:18 ` Mathieu Desnoyers
  2009-02-13 14:28   ` Peter Zijlstra
  2009-02-13 16:32   ` Masami Hiramatsu
  2009-02-13 14:27 ` [PATCH] x86: text_poke might sleep Mathieu Desnoyers
  2 siblings, 2 replies; 32+ messages in thread
From: Mathieu Desnoyers @ 2009-02-13 14:18 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Nick Piggin, akpm, linux-kernel, Ingo Molnar

* Peter Zijlstra (peterz@infradead.org) wrote:
> Hi,
> 
> Ingo got the following splat:
> 
> [    5.101748] ------------[ cut here ]------------
> [    5.104305] WARNING: at kernel/smp.c:329 smp_call_function_many+0x34/0x1ea()
> [    5.104305] Hardware name: P4DC6
> [    5.104305] Modules linked in:
> [    5.104305] Pid: 1, comm: swapper Not tainted 2.6.29-rc4-tip-01766-g1757c19-dirty #2
> [    5.104305] Call Trace:
> [    5.104305]  [<c012b5d6>] warn_slowpath+0x79/0x8f
> [    5.104305]  [<c0104d24>] ? dump_trace+0x7d/0xac
> [    5.104305]  [<c014805a>] ? __lock_acquire+0x319/0x382
> [    5.104305]  [<c014d6a1>] smp_call_function_many+0x34/0x1ea
> [    5.104305]  [<c011b59e>] ? do_flush_tlb_all+0x0/0x48
> [    5.104305]  [<c011b59e>] ? do_flush_tlb_all+0x0/0x48
> [    5.104305]  [<c014d878>] smp_call_function+0x21/0x28
> [    5.104305]  [<c012fa56>] on_each_cpu+0x14/0x23
> [    5.104305]  [<c011b56d>] flush_tlb_all+0x19/0x1b
> [    5.104305]  [<c018dcbe>] flush_tlb_kernel_range+0xd/0xf
> [    5.104305]  [<c018dd00>] vmap_debug_free_range+0x1c/0x20
> [    5.104305]  [<c018e386>] remove_vm_area+0x28/0x67
> [    5.104305]  [<c018e468>] __vunmap+0x30/0xab
> [    5.104305]  [<c018e50a>] vunmap+0x27/0x29
> [    5.104305]  [<c06421b6>] text_poke+0xd6/0x104
> [    5.104305]  [<c015687c>] ? kprobe_target+0x0/0x15
> [    5.104305]  [<c0642a82>] arch_disarm_kprobe+0x13/0x15
> [    5.104305]  [<c06434f9>] __unregister_kprobe_top+0x68/0xe8
> [    5.104305]  [<c06436d2>] unregister_kretprobes+0x2c/0xb9
> [    5.104305]  [<c0643775>] unregister_kretprobe+0x16/0x18
> [    5.104305]  [<c0156d0b>] init_test_probes+0x2ed/0x40c
> [    5.104305]  [<c09d451c>] init_kprobes+0x127/0x131
> [    5.104305]  [<c01476bf>] ? lock_release_holdtime+0x43/0x48
> [    5.104305]  [<c0192e48>] ? __slab_alloc+0x5d/0x27a
> [    5.104305]  [<c0147a2a>] ? lockdep_init_map+0x80/0xe7
> [    5.104305]  [<c0141c0d>] ? clocksource_read+0xd/0xf
> [    5.104305]  [<c0142542>] ? getnstimeofday+0x5e/0xe9
> [    5.104305]  [<c013e01e>] ? timespec_to_ktime+0xe/0x11
> [    5.104305]  [<c09d43f5>] ? init_kprobes+0x0/0x131
> [    5.104305]  [<c010115c>] do_one_initcall+0x6a/0x169
> [    5.104305]  [<c029ad2f>] ? number+0x10d/0x1cf
> [    5.104305]  [<c0147799>] ? register_lock_class+0x17/0x228
> [    5.104305]  [<c014805a>] ? __lock_acquire+0x319/0x382
> [    5.104305]  [<c029fa91>] ? debug_check_no_obj_freed+0xda/0x126
> [    5.104305]  [<c014805a>] ? __lock_acquire+0x319/0x382
> [    5.104305]  [<c01476bf>] ? lock_release_holdtime+0x43/0x48
> [    5.104305]  [<c0640ec0>] ? _spin_unlock+0x22/0x25
> [    5.104305]  [<c01cc0f0>] ? proc_register+0x14b/0x15c
> [    5.104305]  [<c01cc20f>] ? create_proc_entry+0x76/0x8c
> [    5.104305]  [<c0162900>] ? default_affinity_write+0x3f/0x8a
> [    5.104305]  [<c0162a6a>] ? init_irq_proc+0x58/0x65
> [    5.104305]  [<c09bf52d>] kernel_init+0x118/0x169
> [    5.104305]  [<c09bf415>] ? kernel_init+0x0/0x169
> [    5.104305]  [<c0103adf>] kernel_thread_helper+0x7/0x10
> [    5.104305] ---[ end trace e93713a9d40cd06c ]---
> 
> Which points to vunmap() being called with interrupts disabled.
> 
> Which made me look at the vmap/vunmap calls, and they appear to not be
> irq-safe, therefore this would be a bug in text_poke().
> 
> [ that is, vmap() can end up calling get_vm_area_caller() which in turn
>   calls __get_vm_area_node() with GFP_KERNEL, ergo, don't do this from
>   an atomic context. ]
> 
> Now text_poke() uses local_irq_save/restore(), which conveys that it can
> be called with IRQs disabled, which is exactly what happens in the trace
> above, however we just established that vmap/vunmap() are not irq-safe.
> 

text_poke should actually use local_irq_disable/enable rather than
local_irq_save/restore because it _must_ be called with interrupts on.

> Anybody got an idea on how to fix this?

There is probably something wrong with the caller, kprobes, which calls
text_poke with interrupts off.

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-13 13:04     ` Ingo Molnar
@ 2009-02-13 14:25       ` Mathieu Desnoyers
  2009-02-13 14:33         ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2009-02-13 14:25 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, Nick Piggin, akpm, linux-kernel

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > I think I got a mostly working patch cobbled together sitting here
> > > somewhere. I was waiting for some _really_ good use case before spending
> > > more time on it. I would prefer if at all possible to do vmap operations
> > > in sleepable, process context.
> > 
> > Agreed, I think we want to fix text_poke() and make the vmap/vunmap()
> > ops yell louder at violations of these rules.
> > 
> > I'm just totally clueless wrt text_poke() hence this email ;-)
> 
> also, this started triggering yesterday for the first time - and never
> saw it before. Has some commit caused this side-effect?
> 
> It triggers during kprobes self-test - has that been improved recently?
> 

When is this self-test run ? If it's at early boot while still in UP
with interrupts off, kprobes should probably use text_poke_early()
rather than text_poke().

Mathieu


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH] x86: text_poke might sleep
  2009-02-13 12:50 irq-disabled vs vmap vs text_poke Peter Zijlstra
  2009-02-13 12:55 ` Nick Piggin
  2009-02-13 14:18 ` Mathieu Desnoyers
@ 2009-02-13 14:27 ` Mathieu Desnoyers
  2 siblings, 0 replies; 32+ messages in thread
From: Mathieu Desnoyers @ 2009-02-13 14:27 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Nick Piggin, akpm, linux-kernel, Ingo Molnar

* Peter Zijlstra (peterz@infradead.org) wrote:

> Now text_poke() uses local_irq_save/restore(), which conveys that it can
> be called with IRQs disabled, which is exactly what happens in the trace
> above, however we just established that vmap/vunmap() are not irq-safe.
> 
> Anybody got an idea on how to fix this?

Add might_sleep(), comments and use local_irq_disable/enable in text_poke so
people are really aware that it uses vmap, which sleeps.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Nick Piggin <npiggin@suse.de>
CC: akpm <akpm@linux-foundation.org>
---
 arch/x86/kernel/alternative.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6-lttng/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/alternative.c	2009-02-13 09:02:20.000000000 -0500
+++ linux-2.6-lttng/arch/x86/kernel/alternative.c	2009-02-13 09:05:05.000000000 -0500
@@ -494,16 +494,16 @@ void *text_poke_early(void *addr, const 
  * Only atomic text poke/set should be allowed when not doing early patching.
  * It means the size must be writable atomically and the address must be aligned
  * in a way that permits an atomic write. It also makes sure we fit on a single
- * page.
+ * page. Should be called with preemption enabled.
  */
 void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
 {
-	unsigned long flags;
 	char *vaddr;
 	int nr_pages = 2;
 	struct page *pages[2];
 	int i;
 
+	might_sleep();
 	if (!core_kernel_text((unsigned long)addr)) {
 		pages[0] = vmalloc_to_page(addr);
 		pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
@@ -517,9 +517,9 @@ void *__kprobes text_poke(void *addr, co
 		nr_pages = 1;
 	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
 	BUG_ON(!vaddr);
-	local_irq_save(flags);
+	local_irq_disable();
 	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
-	local_irq_restore(flags);
+	local_irq_enable();
 	vunmap(vaddr);
 	sync_core();
 	/* Could also do a CLFLUSH here to speed up CPU recovery; but

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-13 14:18 ` Mathieu Desnoyers
@ 2009-02-13 14:28   ` Peter Zijlstra
  2009-02-13 16:32   ` Masami Hiramatsu
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2009-02-13 14:28 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Nick Piggin, akpm, linux-kernel, Ingo Molnar, Masami Hiramatsu,
	ananth, davem

Ok, lets CC kprobes folk

On Fri, 2009-02-13 at 09:18 -0500, Mathieu Desnoyers wrote:
> 
> text_poke should actually use local_irq_disable/enable rather than
> local_irq_save/restore because it _must_ be called with interrupts on.
> 
> > Anybody got an idea on how to fix this?
> 
> There is probably something wrong with the caller, kprobes, which calls
> text_poke with interrupts off.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-13 14:25       ` Mathieu Desnoyers
@ 2009-02-13 14:33         ` Peter Zijlstra
  2009-02-13 14:43           ` Mathieu Desnoyers
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2009-02-13 14:33 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Ingo Molnar, Nick Piggin, akpm, linux-kernel

On Fri, 2009-02-13 at 09:25 -0500, Mathieu Desnoyers wrote:
> * Ingo Molnar (mingo@elte.hu) wrote:
> > 
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > > I think I got a mostly working patch cobbled together sitting here
> > > > somewhere. I was waiting for some _really_ good use case before spending
> > > > more time on it. I would prefer if at all possible to do vmap operations
> > > > in sleepable, process context.
> > > 
> > > Agreed, I think we want to fix text_poke() and make the vmap/vunmap()
> > > ops yell louder at violations of these rules.
> > > 
> > > I'm just totally clueless wrt text_poke() hence this email ;-)
> > 
> > also, this started triggering yesterday for the first time - and never
> > saw it before. Has some commit caused this side-effect?
> > 
> > It triggers during kprobes self-test - has that been improved recently?
> > 
> 
> When is this self-test run ? If it's at early boot while still in UP
> with interrupts off, kprobes should probably use text_poke_early()
> rather than text_poke().

Looking at the dmesg it looks to be post smp-init, so its late init
calls.

I think its the do_initcalls() from do_basic_setup(). So the machine
should be mostly up and running.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-13 14:33         ` Peter Zijlstra
@ 2009-02-13 14:43           ` Mathieu Desnoyers
  2009-02-13 18:05             ` Ingo Molnar
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2009-02-13 14:43 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Nick Piggin, akpm, linux-kernel

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Fri, 2009-02-13 at 09:25 -0500, Mathieu Desnoyers wrote:
> > * Ingo Molnar (mingo@elte.hu) wrote:
> > > 
> > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > > 
> > > > > I think I got a mostly working patch cobbled together sitting here
> > > > > somewhere. I was waiting for some _really_ good use case before spending
> > > > > more time on it. I would prefer if at all possible to do vmap operations
> > > > > in sleepable, process context.
> > > > 
> > > > Agreed, I think we want to fix text_poke() and make the vmap/vunmap()
> > > > ops yell louder at violations of these rules.
> > > > 
> > > > I'm just totally clueless wrt text_poke() hence this email ;-)
> > > 
> > > also, this started triggering yesterday for the first time - and never
> > > saw it before. Has some commit caused this side-effect?
> > > 
> > > It triggers during kprobes self-test - has that been improved recently?
> > > 
> > 
> > When is this self-test run ? If it's at early boot while still in UP
> > with interrupts off, kprobes should probably use text_poke_early()
> > rather than text_poke().
> 
> Looking at the dmesg it looks to be post smp-init, so its late init
> calls.
> 
> I think its the do_initcalls() from do_basic_setup(). So the machine
> should be mostly up and running.

Here with 2.6.29-rc4 (commit 8e4921515c1a379539607eb443d51c30f4f7f338).
I don't seem to get any warning with a might_sleep() in text_poke.

Kprobe smoke test started
Kprobe smoke test passed successfully

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-13 14:18 ` Mathieu Desnoyers
  2009-02-13 14:28   ` Peter Zijlstra
@ 2009-02-13 16:32   ` Masami Hiramatsu
  2009-02-13 16:38     ` Peter Zijlstra
  2009-02-13 16:55     ` Mathieu Desnoyers
  1 sibling, 2 replies; 32+ messages in thread
From: Masami Hiramatsu @ 2009-02-13 16:32 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Nick Piggin, akpm, linux-kernel, Ingo Molnar

Mathieu Desnoyers wrote:
> * Peter Zijlstra (peterz@infradead.org) wrote:
>> Hi,
>>
>> Ingo got the following splat:
>>
>> [    5.101748] ------------[ cut here ]------------
>> [    5.104305] WARNING: at kernel/smp.c:329 smp_call_function_many+0x34/0x1ea()
>> [    5.104305] Hardware name: P4DC6
>> [    5.104305] Modules linked in:
>> [    5.104305] Pid: 1, comm: swapper Not tainted 2.6.29-rc4-tip-01766-g1757c19-dirty #2
>> [    5.104305] Call Trace:
>> [    5.104305]  [<c012b5d6>] warn_slowpath+0x79/0x8f
>> [    5.104305]  [<c0104d24>] ? dump_trace+0x7d/0xac
>> [    5.104305]  [<c014805a>] ? __lock_acquire+0x319/0x382
>> [    5.104305]  [<c014d6a1>] smp_call_function_many+0x34/0x1ea
>> [    5.104305]  [<c011b59e>] ? do_flush_tlb_all+0x0/0x48
>> [    5.104305]  [<c011b59e>] ? do_flush_tlb_all+0x0/0x48
>> [    5.104305]  [<c014d878>] smp_call_function+0x21/0x28
>> [    5.104305]  [<c012fa56>] on_each_cpu+0x14/0x23
>> [    5.104305]  [<c011b56d>] flush_tlb_all+0x19/0x1b
>> [    5.104305]  [<c018dcbe>] flush_tlb_kernel_range+0xd/0xf
>> [    5.104305]  [<c018dd00>] vmap_debug_free_range+0x1c/0x20
>> [    5.104305]  [<c018e386>] remove_vm_area+0x28/0x67
>> [    5.104305]  [<c018e468>] __vunmap+0x30/0xab
>> [    5.104305]  [<c018e50a>] vunmap+0x27/0x29
>> [    5.104305]  [<c06421b6>] text_poke+0xd6/0x104
>> [    5.104305]  [<c015687c>] ? kprobe_target+0x0/0x15
>> [    5.104305]  [<c0642a82>] arch_disarm_kprobe+0x13/0x15
>> [    5.104305]  [<c06434f9>] __unregister_kprobe_top+0x68/0xe8
>> [    5.104305]  [<c06436d2>] unregister_kretprobes+0x2c/0xb9
>> [    5.104305]  [<c0643775>] unregister_kretprobe+0x16/0x18
>> [    5.104305]  [<c0156d0b>] init_test_probes+0x2ed/0x40c
>> [    5.104305]  [<c09d451c>] init_kprobes+0x127/0x131
>> [    5.104305]  [<c01476bf>] ? lock_release_holdtime+0x43/0x48
>> [    5.104305]  [<c0192e48>] ? __slab_alloc+0x5d/0x27a
>> [    5.104305]  [<c0147a2a>] ? lockdep_init_map+0x80/0xe7
>> [    5.104305]  [<c0141c0d>] ? clocksource_read+0xd/0xf
>> [    5.104305]  [<c0142542>] ? getnstimeofday+0x5e/0xe9
>> [    5.104305]  [<c013e01e>] ? timespec_to_ktime+0xe/0x11
>> [    5.104305]  [<c09d43f5>] ? init_kprobes+0x0/0x131
>> [    5.104305]  [<c010115c>] do_one_initcall+0x6a/0x169
>> [    5.104305]  [<c029ad2f>] ? number+0x10d/0x1cf
>> [    5.104305]  [<c0147799>] ? register_lock_class+0x17/0x228
>> [    5.104305]  [<c014805a>] ? __lock_acquire+0x319/0x382
>> [    5.104305]  [<c029fa91>] ? debug_check_no_obj_freed+0xda/0x126
>> [    5.104305]  [<c014805a>] ? __lock_acquire+0x319/0x382
>> [    5.104305]  [<c01476bf>] ? lock_release_holdtime+0x43/0x48
>> [    5.104305]  [<c0640ec0>] ? _spin_unlock+0x22/0x25
>> [    5.104305]  [<c01cc0f0>] ? proc_register+0x14b/0x15c
>> [    5.104305]  [<c01cc20f>] ? create_proc_entry+0x76/0x8c
>> [    5.104305]  [<c0162900>] ? default_affinity_write+0x3f/0x8a
>> [    5.104305]  [<c0162a6a>] ? init_irq_proc+0x58/0x65
>> [    5.104305]  [<c09bf52d>] kernel_init+0x118/0x169
>> [    5.104305]  [<c09bf415>] ? kernel_init+0x0/0x169
>> [    5.104305]  [<c0103adf>] kernel_thread_helper+0x7/0x10
>> [    5.104305] ---[ end trace e93713a9d40cd06c ]---
>>
>> Which points to vunmap() being called with interrupts disabled.
>>
>> Which made me look at the vmap/vunmap calls, and they appear to not be
>> irq-safe, therefore this would be a bug in text_poke().
>>
>> [ that is, vmap() can end up calling get_vm_area_caller() which in turn
>>   calls __get_vm_area_node() with GFP_KERNEL, ergo, don't do this from
>>   an atomic context. ]
>>
>> Now text_poke() uses local_irq_save/restore(), which conveys that it can
>> be called with IRQs disabled, which is exactly what happens in the trace
>> above, however we just established that vmap/vunmap() are not irq-safe.
>>
> 
> text_poke should actually use local_irq_disable/enable rather than
> local_irq_save/restore because it _must_ be called with interrupts on.

Could you tell me why it must be called with irq on?

> 
>> Anybody got an idea on how to fix this?
> 
> There is probably something wrong with the caller, kprobes, which calls
> text_poke with interrupts off.

Hmm, kprobe's smoke test caused this problem. Since (un)register_kprobe()
may sleep for waiting a mutex, it should not be called with interrupts off.
So, it's not text_poke()'s issue nor vmap().

BTW, what about using map_vm_area() in text_poke() instead of vmap()?
Since text_poke() just maps text pages to alias pages temporarily,
I think we don't need to use delayed vunmap().

Thanks

> 
> Mathieu
> 

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-13 16:32   ` Masami Hiramatsu
@ 2009-02-13 16:38     ` Peter Zijlstra
  2009-02-13 16:55     ` Mathieu Desnoyers
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2009-02-13 16:38 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Mathieu Desnoyers, Nick Piggin, akpm, linux-kernel, Ingo Molnar

On Fri, 2009-02-13 at 11:32 -0500, Masami Hiramatsu wrote:

> >> Anybody got an idea on how to fix this?
> > 
> > There is probably something wrong with the caller, kprobes, which calls
> > text_poke with interrupts off.
> 
> Hmm, kprobe's smoke test caused this problem. Since (un)register_kprobe()
> may sleep for waiting a mutex, it should not be called with interrupts off.
> So, it's not text_poke()'s issue nor vmap().

Yeah, since noticed that, trying to figure out where the irq-disable
came from.. somewhat mysterious.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-13 16:32   ` Masami Hiramatsu
  2009-02-13 16:38     ` Peter Zijlstra
@ 2009-02-13 16:55     ` Mathieu Desnoyers
  2009-02-13 18:14       ` Masami Hiramatsu
  1 sibling, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2009-02-13 16:55 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Peter Zijlstra, Nick Piggin, akpm, linux-kernel, Ingo Molnar

* Masami Hiramatsu (mhiramat@redhat.com) wrote:
> Mathieu Desnoyers wrote:
> > * Peter Zijlstra (peterz@infradead.org) wrote:
> >> Hi,
> >>
> >> Ingo got the following splat:
> >>
> >> [    5.101748] ------------[ cut here ]------------
> >> [    5.104305] WARNING: at kernel/smp.c:329 smp_call_function_many+0x34/0x1ea()
> >> [    5.104305] Hardware name: P4DC6
> >> [    5.104305] Modules linked in:
> >> [    5.104305] Pid: 1, comm: swapper Not tainted 2.6.29-rc4-tip-01766-g1757c19-dirty #2
> >> [    5.104305] Call Trace:
> >> [    5.104305]  [<c012b5d6>] warn_slowpath+0x79/0x8f
> >> [    5.104305]  [<c0104d24>] ? dump_trace+0x7d/0xac
> >> [    5.104305]  [<c014805a>] ? __lock_acquire+0x319/0x382
> >> [    5.104305]  [<c014d6a1>] smp_call_function_many+0x34/0x1ea
> >> [    5.104305]  [<c011b59e>] ? do_flush_tlb_all+0x0/0x48
> >> [    5.104305]  [<c011b59e>] ? do_flush_tlb_all+0x0/0x48
> >> [    5.104305]  [<c014d878>] smp_call_function+0x21/0x28
> >> [    5.104305]  [<c012fa56>] on_each_cpu+0x14/0x23
> >> [    5.104305]  [<c011b56d>] flush_tlb_all+0x19/0x1b
> >> [    5.104305]  [<c018dcbe>] flush_tlb_kernel_range+0xd/0xf
> >> [    5.104305]  [<c018dd00>] vmap_debug_free_range+0x1c/0x20
> >> [    5.104305]  [<c018e386>] remove_vm_area+0x28/0x67
> >> [    5.104305]  [<c018e468>] __vunmap+0x30/0xab
> >> [    5.104305]  [<c018e50a>] vunmap+0x27/0x29
> >> [    5.104305]  [<c06421b6>] text_poke+0xd6/0x104
> >> [    5.104305]  [<c015687c>] ? kprobe_target+0x0/0x15
> >> [    5.104305]  [<c0642a82>] arch_disarm_kprobe+0x13/0x15
> >> [    5.104305]  [<c06434f9>] __unregister_kprobe_top+0x68/0xe8
> >> [    5.104305]  [<c06436d2>] unregister_kretprobes+0x2c/0xb9
> >> [    5.104305]  [<c0643775>] unregister_kretprobe+0x16/0x18
> >> [    5.104305]  [<c0156d0b>] init_test_probes+0x2ed/0x40c
> >> [    5.104305]  [<c09d451c>] init_kprobes+0x127/0x131
> >> [    5.104305]  [<c01476bf>] ? lock_release_holdtime+0x43/0x48
> >> [    5.104305]  [<c0192e48>] ? __slab_alloc+0x5d/0x27a
> >> [    5.104305]  [<c0147a2a>] ? lockdep_init_map+0x80/0xe7
> >> [    5.104305]  [<c0141c0d>] ? clocksource_read+0xd/0xf
> >> [    5.104305]  [<c0142542>] ? getnstimeofday+0x5e/0xe9
> >> [    5.104305]  [<c013e01e>] ? timespec_to_ktime+0xe/0x11
> >> [    5.104305]  [<c09d43f5>] ? init_kprobes+0x0/0x131
> >> [    5.104305]  [<c010115c>] do_one_initcall+0x6a/0x169
> >> [    5.104305]  [<c029ad2f>] ? number+0x10d/0x1cf
> >> [    5.104305]  [<c0147799>] ? register_lock_class+0x17/0x228
> >> [    5.104305]  [<c014805a>] ? __lock_acquire+0x319/0x382
> >> [    5.104305]  [<c029fa91>] ? debug_check_no_obj_freed+0xda/0x126
> >> [    5.104305]  [<c014805a>] ? __lock_acquire+0x319/0x382
> >> [    5.104305]  [<c01476bf>] ? lock_release_holdtime+0x43/0x48
> >> [    5.104305]  [<c0640ec0>] ? _spin_unlock+0x22/0x25
> >> [    5.104305]  [<c01cc0f0>] ? proc_register+0x14b/0x15c
> >> [    5.104305]  [<c01cc20f>] ? create_proc_entry+0x76/0x8c
> >> [    5.104305]  [<c0162900>] ? default_affinity_write+0x3f/0x8a
> >> [    5.104305]  [<c0162a6a>] ? init_irq_proc+0x58/0x65
> >> [    5.104305]  [<c09bf52d>] kernel_init+0x118/0x169
> >> [    5.104305]  [<c09bf415>] ? kernel_init+0x0/0x169
> >> [    5.104305]  [<c0103adf>] kernel_thread_helper+0x7/0x10
> >> [    5.104305] ---[ end trace e93713a9d40cd06c ]---
> >>
> >> Which points to vunmap() being called with interrupts disabled.
> >>
> >> Which made me look at the vmap/vunmap calls, and they appear to not be
> >> irq-safe, therefore this would be a bug in text_poke().
> >>
> >> [ that is, vmap() can end up calling get_vm_area_caller() which in turn
> >>   calls __get_vm_area_node() with GFP_KERNEL, ergo, don't do this from
> >>   an atomic context. ]
> >>
> >> Now text_poke() uses local_irq_save/restore(), which conveys that it can
> >> be called with IRQs disabled, which is exactly what happens in the trace
> >> above, however we just established that vmap/vunmap() are not irq-safe.
> >>
> > 
> > text_poke should actually use local_irq_disable/enable rather than
> > local_irq_save/restore because it _must_ be called with interrupts on.
> 
> Could you tell me why it must be called with irq on?
> 

Because it uses vmap, but see below..

> > 
> >> Anybody got an idea on how to fix this?
> > 
> > There is probably something wrong with the caller, kprobes, which calls
> > text_poke with interrupts off.
> 
> Hmm, kprobe's smoke test caused this problem. Since (un)register_kprobe()
> may sleep for waiting a mutex, it should not be called with interrupts off.
> So, it's not text_poke()'s issue nor vmap().
> 
> BTW, what about using map_vm_area() in text_poke() instead of vmap()?
> Since text_poke() just maps text pages to alias pages temporarily,
> I think we don't need to use delayed vunmap().
> 

As with this patch from you ? Sorry, when it has been initially
submitted, the discussion turned in a different direction. Please see
comments inline.

> From: Masami Hiramatsu <mhiramat@redhat.com>
> 
> Use vm_map_ram() instead of vmap() in text_poke(), because text_poke()
> just want to map pages temporarily.
> 
> ---
> As far as I tested, this patch works fine for me.
> However, there might be another hidden bug in the kernel...
> We need to fix that too.
> 
>  arch/x86/kernel/alternative.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: 2.6-rc/arch/x86/kernel/alternative.c
> ===================================================================
> --- 2.6-rc.orig/arch/x86/kernel/alternative.c
> +++ 2.6-rc/arch/x86/kernel/alternative.c
> @@ -515,12 +515,12 @@ void *__kprobes text_poke(void *addr, co
>  	BUG_ON(!pages[0]);
>  	if (!pages[1])
>  		nr_pages = 1;
> -	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> +	vaddr = vm_map_ram(pages, nr_pages, -1, PAGE_KERNEL);

Hrm, maybe we should do :

  vaddr = vm_map_ram(pages, nr_pages, numa_node_id(), PAGE_KERNEL);

instead ?

Related question : can vm_map_ram sleep ?

Mathieu


>  	BUG_ON(!vaddr);
>  	local_irq_save(flags);
>  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
>  	local_irq_restore(flags);
> -	vunmap(vaddr);
> +	vm_unmap_ram(vaddr, nr_pages);
>  	sync_core();
>  	/* Could also do a CLFLUSH here to speed up CPU recovery; but
>  	   that causes hangs on some VIA CPUs. */

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-13 14:43           ` Mathieu Desnoyers
@ 2009-02-13 18:05             ` Ingo Molnar
  0 siblings, 0 replies; 32+ messages in thread
From: Ingo Molnar @ 2009-02-13 18:05 UTC (permalink / raw)
  To: Mathieu Desnoyers, Steven Rostedt
  Cc: Peter Zijlstra, Nick Piggin, akpm, linux-kernel


* Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> * Peter Zijlstra (peterz@infradead.org) wrote:
> > On Fri, 2009-02-13 at 09:25 -0500, Mathieu Desnoyers wrote:
> > > * Ingo Molnar (mingo@elte.hu) wrote:
> > > > 
> > > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > > > 
> > > > > > I think I got a mostly working patch cobbled together sitting here
> > > > > > somewhere. I was waiting for some _really_ good use case before spending
> > > > > > more time on it. I would prefer if at all possible to do vmap operations
> > > > > > in sleepable, process context.
> > > > > 
> > > > > Agreed, I think we want to fix text_poke() and make the vmap/vunmap()
> > > > > ops yell louder at violations of these rules.
> > > > > 
> > > > > I'm just totally clueless wrt text_poke() hence this email ;-)
> > > > 
> > > > also, this started triggering yesterday for the first time - and never
> > > > saw it before. Has some commit caused this side-effect?
> > > > 
> > > > It triggers during kprobes self-test - has that been improved recently?
> > > > 
> > > 
> > > When is this self-test run ? If it's at early boot while still in UP
> > > with interrupts off, kprobes should probably use text_poke_early()
> > > rather than text_poke().
> > 
> > Looking at the dmesg it looks to be post smp-init, so its late init
> > calls.
> > 
> > I think its the do_initcalls() from do_basic_setup(). So the machine
> > should be mostly up and running.
> 
> Here with 2.6.29-rc4 (commit 8e4921515c1a379539607eb443d51c30f4f7f338).
> I don't seem to get any warning with a might_sleep() in text_poke.
> 
> Kprobe smoke test started
> Kprobe smoke test passed successfully

it doesnt always trigger. It triggers on a box that also generates a
lot of NMIs - maybe there's a connection?

	Ingo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-13 16:55     ` Mathieu Desnoyers
@ 2009-02-13 18:14       ` Masami Hiramatsu
  2009-02-13 18:57         ` Mathieu Desnoyers
  0 siblings, 1 reply; 32+ messages in thread
From: Masami Hiramatsu @ 2009-02-13 18:14 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Nick Piggin, akpm, linux-kernel, Ingo Molnar,
	Ananth N Mavinakayanahalli, Jim Keniston

Mathieu Desnoyers wrote:
>>> text_poke should actually use local_irq_disable/enable rather than
>>> local_irq_save/restore because it _must_ be called with interrupts on.
>> Could you tell me why it must be called with irq on?
>>
> 
> Because it uses vmap, but see below..
> 
>>>> Anybody got an idea on how to fix this?
>>> There is probably something wrong with the caller, kprobes, which calls
>>> text_poke with interrupts off.
>> Hmm, kprobe's smoke test caused this problem. Since (un)register_kprobe()
>> may sleep for waiting a mutex, it should not be called with interrupts off.
>> So, it's not text_poke()'s issue nor vmap().
>>
>> BTW, what about using map_vm_area() in text_poke() instead of vmap()?
>> Since text_poke() just maps text pages to alias pages temporarily,
>> I think we don't need to use delayed vunmap().
>>
> 
> As with this patch from you ? Sorry, when it has been initially
> submitted, the discussion turned in a different direction. Please see
> comments inline.

No, sorry for confusing, vm_unmap_ram() is basically same as vunmap().
(The root cause of that bug which I had been reported was not in text_poke())

Here I said is (maybe) improving text_poke() by using map_vm_area() which
simply maps pages to pre-allocated vm_area. And when unmapping, we just
cleanup pte by unmap_kernel_range().
For pre-allocating vm_area, we can use get_vm_area().

If we can use kmap for this purpose, it will be better solution.
However, kmap can not be used for making alias pages.

Thanks,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-13 18:14       ` Masami Hiramatsu
@ 2009-02-13 18:57         ` Mathieu Desnoyers
  2009-02-13 21:41           ` Masami Hiramatsu
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2009-02-13 18:57 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Peter Zijlstra, Nick Piggin, akpm, linux-kernel, Ingo Molnar,
	Ananth N Mavinakayanahalli, Jim Keniston

* Masami Hiramatsu (mhiramat@redhat.com) wrote:
> Mathieu Desnoyers wrote:
> >>> text_poke should actually use local_irq_disable/enable rather than
> >>> local_irq_save/restore because it _must_ be called with interrupts on.
> >> Could you tell me why it must be called with irq on?
> >>
> > 
> > Because it uses vmap, but see below..
> > 
> >>>> Anybody got an idea on how to fix this?
> >>> There is probably something wrong with the caller, kprobes, which calls
> >>> text_poke with interrupts off.
> >> Hmm, kprobe's smoke test caused this problem. Since (un)register_kprobe()
> >> may sleep for waiting a mutex, it should not be called with interrupts off.
> >> So, it's not text_poke()'s issue nor vmap().
> >>
> >> BTW, what about using map_vm_area() in text_poke() instead of vmap()?
> >> Since text_poke() just maps text pages to alias pages temporarily,
> >> I think we don't need to use delayed vunmap().
> >>
> > 
> > As with this patch from you ? Sorry, when it has been initially
> > submitted, the discussion turned in a different direction. Please see
> > comments inline.
> 
> No, sorry for confusing, vm_unmap_ram() is basically same as vunmap().
> (The root cause of that bug which I had been reported was not in text_poke())
> 
> Here I said is (maybe) improving text_poke() by using map_vm_area() which
> simply maps pages to pre-allocated vm_area. And when unmapping, we just
> cleanup pte by unmap_kernel_range().
> For pre-allocating vm_area, we can use get_vm_area().
> 
> If we can use kmap for this purpose, it will be better solution.
> However, kmap can not be used for making alias pages.
> 
> Thanks,
> 

That sounds clean, and would improve the current way we (mis)use vmap
for this tiny mapping. I always felt like I was using a sledgehammer
when only a hammer was necessary. :)

Do you have a patch that implements this ?

Mathieu

> -- 
> Masami Hiramatsu
> 
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
> 
> e-mail: mhiramat@redhat.com
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-13 18:57         ` Mathieu Desnoyers
@ 2009-02-13 21:41           ` Masami Hiramatsu
  2009-02-16 15:04             ` Masami Hiramatsu
  0 siblings, 1 reply; 32+ messages in thread
From: Masami Hiramatsu @ 2009-02-13 21:41 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Nick Piggin, akpm, linux-kernel, Ingo Molnar,
	Ananth N Mavinakayanahalli, Jim Keniston

Mathieu Desnoyers wrote:
> * Masami Hiramatsu (mhiramat@redhat.com) wrote:
>> Mathieu Desnoyers wrote:
>>>>> text_poke should actually use local_irq_disable/enable rather than
>>>>> local_irq_save/restore because it _must_ be called with interrupts on.
>>>> Could you tell me why it must be called with irq on?
>>>>
>>> Because it uses vmap, but see below..
>>>
>>>>>> Anybody got an idea on how to fix this?
>>>>> There is probably something wrong with the caller, kprobes, which calls
>>>>> text_poke with interrupts off.
>>>> Hmm, kprobe's smoke test caused this problem. Since (un)register_kprobe()
>>>> may sleep for waiting a mutex, it should not be called with interrupts off.
>>>> So, it's not text_poke()'s issue nor vmap().
>>>>
>>>> BTW, what about using map_vm_area() in text_poke() instead of vmap()?
>>>> Since text_poke() just maps text pages to alias pages temporarily,
>>>> I think we don't need to use delayed vunmap().
>>>>
>>> As with this patch from you ? Sorry, when it has been initially
>>> submitted, the discussion turned in a different direction. Please see
>>> comments inline.
>> No, sorry for confusing, vm_unmap_ram() is basically same as vunmap().
>> (The root cause of that bug which I had been reported was not in text_poke())
>>
>> Here I said is (maybe) improving text_poke() by using map_vm_area() which
>> simply maps pages to pre-allocated vm_area. And when unmapping, we just
>> cleanup pte by unmap_kernel_range().
>> For pre-allocating vm_area, we can use get_vm_area().
>>
>> If we can use kmap for this purpose, it will be better solution.
>> However, kmap can not be used for making alias pages.
>>
>> Thanks,
>>
> 
> That sounds clean, and would improve the current way we (mis)use vmap
> for this tiny mapping. I always felt like I was using a sledgehammer
> when only a hammer was necessary. :)
> 
> Do you have a patch that implements this ?

Sure, however, the patch was not well tested (nor designed :-)),
because it has been made as a by-product when I tried to solve
another issue.

Anyway, I'll send it after testing.

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-13 21:41           ` Masami Hiramatsu
@ 2009-02-16 15:04             ` Masami Hiramatsu
  2009-02-16 15:31               ` Nick Piggin
  0 siblings, 1 reply; 32+ messages in thread
From: Masami Hiramatsu @ 2009-02-16 15:04 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Nick Piggin, akpm, linux-kernel, Ingo Molnar,
	Ananth N Mavinakayanahalli, Jim Keniston

Masami Hiramatsu wrote:
> Mathieu Desnoyers wrote:
>> * Masami Hiramatsu (mhiramat@redhat.com) wrote:
>>> Mathieu Desnoyers wrote:
>>>>>> text_poke should actually use local_irq_disable/enable rather than
>>>>>> local_irq_save/restore because it _must_ be called with interrupts on.
>>>>> Could you tell me why it must be called with irq on?
>>>>>
>>>> Because it uses vmap, but see below..
>>>>
>>>>>>> Anybody got an idea on how to fix this?
>>>>>> There is probably something wrong with the caller, kprobes, which calls
>>>>>> text_poke with interrupts off.
>>>>> Hmm, kprobe's smoke test caused this problem. Since (un)register_kprobe()
>>>>> may sleep for waiting a mutex, it should not be called with interrupts off.
>>>>> So, it's not text_poke()'s issue nor vmap().
>>>>>
>>>>> BTW, what about using map_vm_area() in text_poke() instead of vmap()?
>>>>> Since text_poke() just maps text pages to alias pages temporarily,
>>>>> I think we don't need to use delayed vunmap().
>>>>>
>>>> As with this patch from you ? Sorry, when it has been initially
>>>> submitted, the discussion turned in a different direction. Please see
>>>> comments inline.
>>> No, sorry for confusing, vm_unmap_ram() is basically same as vunmap().
>>> (The root cause of that bug which I had been reported was not in text_poke())
>>>
>>> Here I said is (maybe) improving text_poke() by using map_vm_area() which
>>> simply maps pages to pre-allocated vm_area. And when unmapping, we just
>>> cleanup pte by unmap_kernel_range().
>>> For pre-allocating vm_area, we can use get_vm_area().
>>>
>>> If we can use kmap for this purpose, it will be better solution.
>>> However, kmap can not be used for making alias pages.
>>>
>>> Thanks,
>>>
>> That sounds clean, and would improve the current way we (mis)use vmap
>> for this tiny mapping. I always felt like I was using a sledgehammer
>> when only a hammer was necessary. :)
>>
>> Do you have a patch that implements this ?
> 
> Sure, however, the patch was not well tested (nor designed :-)),
> because it has been made as a by-product when I tried to solve
> another issue.

Here is the patch which replace v(un)map with (un)map_vm_area.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
---
  arch/x86/include/asm/alternative.h |    1 +
  arch/x86/kernel/alternative.c      |   25 ++++++++++++++++++++-----
  init/main.c                        |    3 +++
  3 files changed, 24 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/include/asm/alternative.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/alternative.h
+++ linux-2.6/arch/x86/include/asm/alternative.h
@@ -177,6 +177,7 @@ extern void add_nops(void *insns, unsign
   * The _early version expects the memory to already be RW.
   */

+extern void text_poke_init(void);
  extern void *text_poke(void *addr, const void *opcode, size_t len);
  extern void *text_poke_early(void *addr, const void *opcode, size_t len);

Index: linux-2.6/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/alternative.c
+++ linux-2.6/arch/x86/kernel/alternative.c
@@ -485,6 +485,16 @@ void *text_poke_early(void *addr, const
  	return addr;
  }

+static struct vm_struct *text_poke_area[2];
+static DEFINE_SPINLOCK(text_poke_lock);
+
+void __init text_poke_init(void)
+{
+	text_poke_area[0] = get_vm_area(PAGE_SIZE, VM_ALLOC);
+	text_poke_area[1] = get_vm_area(2 * PAGE_SIZE, VM_ALLOC);
+	BUG_ON(!text_poke_area[0] || !text_poke_area[1]);
+}
+
  /**
   * text_poke - Update instructions on a live kernel
   * @addr: address to modify
@@ -501,8 +511,9 @@ void *__kprobes text_poke(void *addr, co
  	unsigned long flags;
  	char *vaddr;
  	int nr_pages = 2;
-	struct page *pages[2];
-	int i;
+	struct page *pages[2], **pgp = pages;
+	int i, ret;
+	struct vm_struct *vma;

  	if (!core_kernel_text((unsigned long)addr)) {
  		pages[0] = vmalloc_to_page(addr);
@@ -515,12 +526,16 @@ void *__kprobes text_poke(void *addr, co
  	BUG_ON(!pages[0]);
  	if (!pages[1])
  		nr_pages = 1;
-	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
-	BUG_ON(!vaddr);
+	spin_lock(&text_poke_lock);
+	vma = text_poke_area[nr_pages-1];
+	ret = map_vm_area(vma, PAGE_KERNEL, &pgp);
+	BUG_ON(ret);
+	vaddr = vma->addr;
  	local_irq_save(flags);
  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
  	local_irq_restore(flags);
-	vunmap(vaddr);
+	unmap_kernel_range((unsigned long)vma->addr, (unsigned long)vma->size);
+	spin_unlock(&text_poke_lock);
  	sync_core();
  	/* Could also do a CLFLUSH here to speed up CPU recovery; but
  	   that causes hangs on some VIA CPUs. */
Index: linux-2.6/init/main.c
===================================================================
--- linux-2.6.orig/init/main.c
+++ linux-2.6/init/main.c
@@ -676,6 +676,9 @@ asmlinkage void __init start_kernel(void
  	taskstats_init_early();
  	delayacct_init();

+#ifdef CONFIG_X86
+	text_poke_init();
+#endif
  	check_bugs();

  	acpi_early_init(); /* before LAPIC and SMP init */

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-16 15:04             ` Masami Hiramatsu
@ 2009-02-16 15:31               ` Nick Piggin
  2009-02-16 17:24                 ` Mathieu Desnoyers
  0 siblings, 1 reply; 32+ messages in thread
From: Nick Piggin @ 2009-02-16 15:31 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Mathieu Desnoyers, Peter Zijlstra, akpm, linux-kernel,
	Ingo Molnar, Ananth N Mavinakayanahalli, Jim Keniston

On Mon, Feb 16, 2009 at 10:04:43AM -0500, Masami Hiramatsu wrote:
> >>>>>BTW, what about using map_vm_area() in text_poke() instead of vmap()?
> >>>>>Since text_poke() just maps text pages to alias pages temporarily,
> >>>>>I think we don't need to use delayed vunmap().

[...] 

> Here is the patch which replace v(un)map with (un)map_vm_area.

I don't quite understand the point of this... delayed vunmap() is
just an implementation detail of vmap subsystem. Callers should not
have to care.

 
> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
> ---
>  arch/x86/include/asm/alternative.h |    1 +
>  arch/x86/kernel/alternative.c      |   25 ++++++++++++++++++++-----
>  init/main.c                        |    3 +++
>  3 files changed, 24 insertions(+), 5 deletions(-)
> 
> Index: linux-2.6/arch/x86/include/asm/alternative.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/alternative.h
> +++ linux-2.6/arch/x86/include/asm/alternative.h
> @@ -177,6 +177,7 @@ extern void add_nops(void *insns, unsign
>   * The _early version expects the memory to already be RW.
>   */
> 
> +extern void text_poke_init(void);
>  extern void *text_poke(void *addr, const void *opcode, size_t len);
>  extern void *text_poke_early(void *addr, const void *opcode, size_t len);
> 
> Index: linux-2.6/arch/x86/kernel/alternative.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/alternative.c
> +++ linux-2.6/arch/x86/kernel/alternative.c
> @@ -485,6 +485,16 @@ void *text_poke_early(void *addr, const
>  	return addr;
>  }
> 
> +static struct vm_struct *text_poke_area[2];
> +static DEFINE_SPINLOCK(text_poke_lock);
> +
> +void __init text_poke_init(void)
> +{
> +	text_poke_area[0] = get_vm_area(PAGE_SIZE, VM_ALLOC);
> +	text_poke_area[1] = get_vm_area(2 * PAGE_SIZE, VM_ALLOC);
> +	BUG_ON(!text_poke_area[0] || !text_poke_area[1]);
> +}
> +
>  /**
>   * text_poke - Update instructions on a live kernel
>   * @addr: address to modify
> @@ -501,8 +511,9 @@ void *__kprobes text_poke(void *addr, co
>  	unsigned long flags;
>  	char *vaddr;
>  	int nr_pages = 2;
> -	struct page *pages[2];
> -	int i;
> +	struct page *pages[2], **pgp = pages;
> +	int i, ret;
> +	struct vm_struct *vma;
> 
>  	if (!core_kernel_text((unsigned long)addr)) {
>  		pages[0] = vmalloc_to_page(addr);
> @@ -515,12 +526,16 @@ void *__kprobes text_poke(void *addr, co
>  	BUG_ON(!pages[0]);
>  	if (!pages[1])
>  		nr_pages = 1;
> -	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> -	BUG_ON(!vaddr);
> +	spin_lock(&text_poke_lock);
> +	vma = text_poke_area[nr_pages-1];
> +	ret = map_vm_area(vma, PAGE_KERNEL, &pgp);
> +	BUG_ON(ret);
> +	vaddr = vma->addr;
>  	local_irq_save(flags);
>  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
>  	local_irq_restore(flags);
> -	vunmap(vaddr);
> +	unmap_kernel_range((unsigned long)vma->addr, (unsigned 
> long)vma->size);
> +	spin_unlock(&text_poke_lock);
>  	sync_core();
>  	/* Could also do a CLFLUSH here to speed up CPU recovery; but
>  	   that causes hangs on some VIA CPUs. */
> Index: linux-2.6/init/main.c
> ===================================================================
> --- linux-2.6.orig/init/main.c
> +++ linux-2.6/init/main.c
> @@ -676,6 +676,9 @@ asmlinkage void __init start_kernel(void
>  	taskstats_init_early();
>  	delayacct_init();
> 
> +#ifdef CONFIG_X86
> +	text_poke_init();
> +#endif
>  	check_bugs();
> 
>  	acpi_early_init(); /* before LAPIC and SMP init */
> 
> -- 
> Masami Hiramatsu
> 
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
> 
> e-mail: mhiramat@redhat.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-16 15:31               ` Nick Piggin
@ 2009-02-16 17:24                 ` Mathieu Desnoyers
  2009-02-17  2:00                   ` Masami Hiramatsu
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2009-02-16 17:24 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Masami Hiramatsu, Peter Zijlstra, akpm, linux-kernel,
	Ingo Molnar, Ananth N Mavinakayanahalli, Jim Keniston

* Nick Piggin (npiggin@suse.de) wrote:
> On Mon, Feb 16, 2009 at 10:04:43AM -0500, Masami Hiramatsu wrote:
> > >>>>>BTW, what about using map_vm_area() in text_poke() instead of vmap()?
> > >>>>>Since text_poke() just maps text pages to alias pages temporarily,
> > >>>>>I think we don't need to use delayed vunmap().
> 
> [...] 
> 
> > Here is the patch which replace v(un)map with (un)map_vm_area.
> 
> I don't quite understand the point of this... delayed vunmap() is
> just an implementation detail of vmap subsystem. Callers should not
> have to care.
> 

AFAIK, map_vm_area/unmap_vm_area is faster than vmap/vunmap.  This is
the point of this patch. Masami, could you provide a quick benchmark of
text_poke()/seconds before and after this optimization is applied to
confirm this ?

Thanks,

Mathieu

>  
> > Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
> > ---
> >  arch/x86/include/asm/alternative.h |    1 +
> >  arch/x86/kernel/alternative.c      |   25 ++++++++++++++++++++-----
> >  init/main.c                        |    3 +++
> >  3 files changed, 24 insertions(+), 5 deletions(-)
> > 
> > Index: linux-2.6/arch/x86/include/asm/alternative.h
> > ===================================================================
> > --- linux-2.6.orig/arch/x86/include/asm/alternative.h
> > +++ linux-2.6/arch/x86/include/asm/alternative.h
> > @@ -177,6 +177,7 @@ extern void add_nops(void *insns, unsign
> >   * The _early version expects the memory to already be RW.
> >   */
> > 
> > +extern void text_poke_init(void);
> >  extern void *text_poke(void *addr, const void *opcode, size_t len);
> >  extern void *text_poke_early(void *addr, const void *opcode, size_t len);
> > 
> > Index: linux-2.6/arch/x86/kernel/alternative.c
> > ===================================================================
> > --- linux-2.6.orig/arch/x86/kernel/alternative.c
> > +++ linux-2.6/arch/x86/kernel/alternative.c
> > @@ -485,6 +485,16 @@ void *text_poke_early(void *addr, const
> >  	return addr;
> >  }
> > 
> > +static struct vm_struct *text_poke_area[2];
> > +static DEFINE_SPINLOCK(text_poke_lock);
> > +
> > +void __init text_poke_init(void)
> > +{
> > +	text_poke_area[0] = get_vm_area(PAGE_SIZE, VM_ALLOC);
> > +	text_poke_area[1] = get_vm_area(2 * PAGE_SIZE, VM_ALLOC);
> > +	BUG_ON(!text_poke_area[0] || !text_poke_area[1]);
> > +}
> > +
> >  /**
> >   * text_poke - Update instructions on a live kernel
> >   * @addr: address to modify
> > @@ -501,8 +511,9 @@ void *__kprobes text_poke(void *addr, co
> >  	unsigned long flags;
> >  	char *vaddr;
> >  	int nr_pages = 2;
> > -	struct page *pages[2];
> > -	int i;
> > +	struct page *pages[2], **pgp = pages;
> > +	int i, ret;
> > +	struct vm_struct *vma;
> > 
> >  	if (!core_kernel_text((unsigned long)addr)) {
> >  		pages[0] = vmalloc_to_page(addr);
> > @@ -515,12 +526,16 @@ void *__kprobes text_poke(void *addr, co
> >  	BUG_ON(!pages[0]);
> >  	if (!pages[1])
> >  		nr_pages = 1;
> > -	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> > -	BUG_ON(!vaddr);
> > +	spin_lock(&text_poke_lock);
> > +	vma = text_poke_area[nr_pages-1];
> > +	ret = map_vm_area(vma, PAGE_KERNEL, &pgp);
> > +	BUG_ON(ret);
> > +	vaddr = vma->addr;
> >  	local_irq_save(flags);
> >  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> >  	local_irq_restore(flags);
> > -	vunmap(vaddr);
> > +	unmap_kernel_range((unsigned long)vma->addr, (unsigned 
> > long)vma->size);
> > +	spin_unlock(&text_poke_lock);
> >  	sync_core();
> >  	/* Could also do a CLFLUSH here to speed up CPU recovery; but
> >  	   that causes hangs on some VIA CPUs. */
> > Index: linux-2.6/init/main.c
> > ===================================================================
> > --- linux-2.6.orig/init/main.c
> > +++ linux-2.6/init/main.c
> > @@ -676,6 +676,9 @@ asmlinkage void __init start_kernel(void
> >  	taskstats_init_early();
> >  	delayacct_init();
> > 
> > +#ifdef CONFIG_X86
> > +	text_poke_init();
> > +#endif
> >  	check_bugs();
> > 
> >  	acpi_early_init(); /* before LAPIC and SMP init */
> > 
> > -- 
> > Masami Hiramatsu
> > 
> > Software Engineer
> > Hitachi Computer Products (America) Inc.
> > Software Solutions Division
> > 
> > e-mail: mhiramat@redhat.com

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-16 17:24                 ` Mathieu Desnoyers
@ 2009-02-17  2:00                   ` Masami Hiramatsu
  2009-02-17  3:03                     ` Nick Piggin
  0 siblings, 1 reply; 32+ messages in thread
From: Masami Hiramatsu @ 2009-02-17  2:00 UTC (permalink / raw)
  To: Mathieu Desnoyers, Nick Piggin
  Cc: Peter Zijlstra, akpm, linux-kernel, Ingo Molnar,
	Ananth N Mavinakayanahalli, Jim Keniston

Mathieu Desnoyers wrote:
> * Nick Piggin (npiggin@suse.de) wrote:
>> On Mon, Feb 16, 2009 at 10:04:43AM -0500, Masami Hiramatsu wrote:
>>>>>>>> BTW, what about using map_vm_area() in text_poke() instead of vmap()?
>>>>>>>> Since text_poke() just maps text pages to alias pages temporarily,
>>>>>>>> I think we don't need to use delayed vunmap().
>> [...] 
>>
>>> Here is the patch which replace v(un)map with (un)map_vm_area.
>> I don't quite understand the point of this... delayed vunmap() is
>> just an implementation detail of vmap subsystem. Callers should not
>> have to care.
>>
> 
> AFAIK, map_vm_area/unmap_vm_area is faster than vmap/vunmap.  This is
> the point of this patch. Masami, could you provide a quick benchmark of
> text_poke()/seconds before and after this optimization is applied to
> confirm this ?

Sure, here is the result of calling text_poke() 2^14 times.

<Without this patch>
Total: 3634133356(cycles), 221809(cycles/text_poke)
Total: 3699532690(cycles), 225801(cycles/text_poke)
Total: 3249855588(cycles), 198355(cycles/text_poke)

<With this patch>
Total: 483467579(cycles), 29508(cycles/text_poke)
Total: 497441301(cycles), 30361(cycles/text_poke)
Total: 497604548(cycles), 30371(cycles/text_poke)

BTW, this is not only for performance, but also simplicity and its need.
Vmap may allocate new vm_area. However, since text_poke() just needs to
map pages temporarily (yeah, very short time), we don't want to call
kmalloc or any other memory allocators.
And since text_poke() makes WRITABLE aliases of READ-ONLY pages, we
want to purge these pages ASAP.
So, I think just reserving a small vm_area for text_poke() and
reusing it is enough.

Thanks,

> 
> Thanks,
> 
> Mathieu
> 
>>  
>>> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
>>> ---
>>>  arch/x86/include/asm/alternative.h |    1 +
>>>  arch/x86/kernel/alternative.c      |   25 ++++++++++++++++++++-----
>>>  init/main.c                        |    3 +++
>>>  3 files changed, 24 insertions(+), 5 deletions(-)
>>>
>>> Index: linux-2.6/arch/x86/include/asm/alternative.h
>>> ===================================================================
>>> --- linux-2.6.orig/arch/x86/include/asm/alternative.h
>>> +++ linux-2.6/arch/x86/include/asm/alternative.h
>>> @@ -177,6 +177,7 @@ extern void add_nops(void *insns, unsign
>>>   * The _early version expects the memory to already be RW.
>>>   */
>>>
>>> +extern void text_poke_init(void);
>>>  extern void *text_poke(void *addr, const void *opcode, size_t len);
>>>  extern void *text_poke_early(void *addr, const void *opcode, size_t len);
>>>
>>> Index: linux-2.6/arch/x86/kernel/alternative.c
>>> ===================================================================
>>> --- linux-2.6.orig/arch/x86/kernel/alternative.c
>>> +++ linux-2.6/arch/x86/kernel/alternative.c
>>> @@ -485,6 +485,16 @@ void *text_poke_early(void *addr, const
>>>  	return addr;
>>>  }
>>>
>>> +static struct vm_struct *text_poke_area[2];
>>> +static DEFINE_SPINLOCK(text_poke_lock);
>>> +
>>> +void __init text_poke_init(void)
>>> +{
>>> +	text_poke_area[0] = get_vm_area(PAGE_SIZE, VM_ALLOC);
>>> +	text_poke_area[1] = get_vm_area(2 * PAGE_SIZE, VM_ALLOC);
>>> +	BUG_ON(!text_poke_area[0] || !text_poke_area[1]);
>>> +}
>>> +
>>>  /**
>>>   * text_poke - Update instructions on a live kernel
>>>   * @addr: address to modify
>>> @@ -501,8 +511,9 @@ void *__kprobes text_poke(void *addr, co
>>>  	unsigned long flags;
>>>  	char *vaddr;
>>>  	int nr_pages = 2;
>>> -	struct page *pages[2];
>>> -	int i;
>>> +	struct page *pages[2], **pgp = pages;
>>> +	int i, ret;
>>> +	struct vm_struct *vma;
>>>
>>>  	if (!core_kernel_text((unsigned long)addr)) {
>>>  		pages[0] = vmalloc_to_page(addr);
>>> @@ -515,12 +526,16 @@ void *__kprobes text_poke(void *addr, co
>>>  	BUG_ON(!pages[0]);
>>>  	if (!pages[1])
>>>  		nr_pages = 1;
>>> -	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
>>> -	BUG_ON(!vaddr);
>>> +	spin_lock(&text_poke_lock);
>>> +	vma = text_poke_area[nr_pages-1];
>>> +	ret = map_vm_area(vma, PAGE_KERNEL, &pgp);
>>> +	BUG_ON(ret);
>>> +	vaddr = vma->addr;
>>>  	local_irq_save(flags);
>>>  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
>>>  	local_irq_restore(flags);
>>> -	vunmap(vaddr);
>>> +	unmap_kernel_range((unsigned long)vma->addr, (unsigned 
>>> long)vma->size);
>>> +	spin_unlock(&text_poke_lock);
>>>  	sync_core();
>>>  	/* Could also do a CLFLUSH here to speed up CPU recovery; but
>>>  	   that causes hangs on some VIA CPUs. */
>>> Index: linux-2.6/init/main.c
>>> ===================================================================
>>> --- linux-2.6.orig/init/main.c
>>> +++ linux-2.6/init/main.c
>>> @@ -676,6 +676,9 @@ asmlinkage void __init start_kernel(void
>>>  	taskstats_init_early();
>>>  	delayacct_init();
>>>
>>> +#ifdef CONFIG_X86
>>> +	text_poke_init();
>>> +#endif
>>>  	check_bugs();
>>>
>>>  	acpi_early_init(); /* before LAPIC and SMP init */
>>>
>>> -- 
>>> Masami Hiramatsu
>>>
>>> Software Engineer
>>> Hitachi Computer Products (America) Inc.
>>> Software Solutions Division
>>>
>>> e-mail: mhiramat@redhat.com
> 

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-17  2:00                   ` Masami Hiramatsu
@ 2009-02-17  3:03                     ` Nick Piggin
  2009-02-17  8:31                       ` Peter Zijlstra
  2009-02-17 16:48                       ` Masami Hiramatsu
  0 siblings, 2 replies; 32+ messages in thread
From: Nick Piggin @ 2009-02-17  3:03 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Mathieu Desnoyers, Peter Zijlstra, akpm, linux-kernel,
	Ingo Molnar, Ananth N Mavinakayanahalli, Jim Keniston

On Mon, Feb 16, 2009 at 09:00:35PM -0500, Masami Hiramatsu wrote:
> Mathieu Desnoyers wrote:
> >* Nick Piggin (npiggin@suse.de) wrote:
> >>On Mon, Feb 16, 2009 at 10:04:43AM -0500, Masami Hiramatsu wrote:
> >>>>>>>>BTW, what about using map_vm_area() in text_poke() instead of 
> >>>>>>>>vmap()?
> >>>>>>>>Since text_poke() just maps text pages to alias pages temporarily,
> >>>>>>>>I think we don't need to use delayed vunmap().
> >>[...] 
> >>
> >>>Here is the patch which replace v(un)map with (un)map_vm_area.
> >>I don't quite understand the point of this... delayed vunmap() is
> >>just an implementation detail of vmap subsystem. Callers should not
> >>have to care.
> >>
> >
> >AFAIK, map_vm_area/unmap_vm_area is faster than vmap/vunmap.  This is
> >the point of this patch. Masami, could you provide a quick benchmark of
> >text_poke()/seconds before and after this optimization is applied to
> >confirm this ?
> 
> Sure, here is the result of calling text_poke() 2^14 times.
> 
> <Without this patch>
> Total: 3634133356(cycles), 221809(cycles/text_poke)
> Total: 3699532690(cycles), 225801(cycles/text_poke)
> Total: 3249855588(cycles), 198355(cycles/text_poke)
> 
> <With this patch>
> Total: 483467579(cycles), 29508(cycles/text_poke)
> Total: 497441301(cycles), 30361(cycles/text_poke)
> Total: 497604548(cycles), 30371(cycles/text_poke)

Hmm, on bigger SMP systems, I think the global TLB flush required
for unmap_kernel_range will reverse these numbers.

 
> BTW, this is not only for performance, but also simplicity and its need.
> Vmap may allocate new vm_area. However, since text_poke() just needs to
> map pages temporarily (yeah, very short time), we don't want to call
> kmalloc or any other memory allocators.
> And since text_poke() makes WRITABLE aliases of READ-ONLY pages, we
> want to purge these pages ASAP.
> So, I think just reserving a small vm_area for text_poke() and
> reusing it is enough.

It is not a bad idea, but I don't think it quite goes far enough.
IMO we should reserve 2 pages of virtual memory for each CPU, and
then do the mapping/unmapping without locking, and with another
variant of unmap_kernel_range that does not do the global TLB
flush.

Unless performance doesn't really matter much, in which case, I
guess your patch is nice because it avoids doing the allocations.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-17  3:03                     ` Nick Piggin
@ 2009-02-17  8:31                       ` Peter Zijlstra
  2009-02-17 17:13                         ` Masami Hiramatsu
  2009-02-17 16:48                       ` Masami Hiramatsu
  1 sibling, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2009-02-17  8:31 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Masami Hiramatsu, Mathieu Desnoyers, akpm, linux-kernel,
	Ingo Molnar, Ananth N Mavinakayanahalli, Jim Keniston

On Tue, 2009-02-17 at 04:03 +0100, Nick Piggin wrote:

> It is not a bad idea, but I don't think it quite goes far enough.
> IMO we should reserve 2 pages of virtual memory for each CPU, and
> then do the mapping/unmapping without locking, and with another
> variant of unmap_kernel_range that does not do the global TLB
> flush.

A bit like kmap_atomic() except for 2 pages and !highmem. Should work.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-17  3:03                     ` Nick Piggin
  2009-02-17  8:31                       ` Peter Zijlstra
@ 2009-02-17 16:48                       ` Masami Hiramatsu
  2009-02-17 17:02                         ` Steven Rostedt
  1 sibling, 1 reply; 32+ messages in thread
From: Masami Hiramatsu @ 2009-02-17 16:48 UTC (permalink / raw)
  To: Nick Piggin, Steven Rostedt
  Cc: Mathieu Desnoyers, Peter Zijlstra, akpm, linux-kernel,
	Ingo Molnar, Ananth N Mavinakayanahalli, Jim Keniston

Nick Piggin wrote:
> On Mon, Feb 16, 2009 at 09:00:35PM -0500, Masami Hiramatsu wrote:
>> Mathieu Desnoyers wrote:
>>> * Nick Piggin (npiggin@suse.de) wrote:
>>>> On Mon, Feb 16, 2009 at 10:04:43AM -0500, Masami Hiramatsu wrote:
>>>>>>>>>> BTW, what about using map_vm_area() in text_poke() instead of 
>>>>>>>>>> vmap()?
>>>>>>>>>> Since text_poke() just maps text pages to alias pages temporarily,
>>>>>>>>>> I think we don't need to use delayed vunmap().
>>>> [...] 
>>>>
>>>>> Here is the patch which replace v(un)map with (un)map_vm_area.
>>>> I don't quite understand the point of this... delayed vunmap() is
>>>> just an implementation detail of vmap subsystem. Callers should not
>>>> have to care.
>>>>
>>> AFAIK, map_vm_area/unmap_vm_area is faster than vmap/vunmap.  This is
>>> the point of this patch. Masami, could you provide a quick benchmark of
>>> text_poke()/seconds before and after this optimization is applied to
>>> confirm this ?
>> Sure, here is the result of calling text_poke() 2^14 times.
>>
>> <Without this patch>
>> Total: 3634133356(cycles), 221809(cycles/text_poke)
>> Total: 3699532690(cycles), 225801(cycles/text_poke)
>> Total: 3249855588(cycles), 198355(cycles/text_poke)
>>
>> <With this patch>
>> Total: 483467579(cycles), 29508(cycles/text_poke)
>> Total: 497441301(cycles), 30361(cycles/text_poke)
>> Total: 497604548(cycles), 30371(cycles/text_poke)
> 
> Hmm, on bigger SMP systems, I think the global TLB flush required
> for unmap_kernel_range will reverse these numbers.

Sure, that's possible. unfortunately, I don't have that bigger machine...
It's just the result on 4-core smp machine.


>> BTW, this is not only for performance, but also simplicity and its need.
>> Vmap may allocate new vm_area. However, since text_poke() just needs to
>> map pages temporarily (yeah, very short time), we don't want to call
>> kmalloc or any other memory allocators.
>> And since text_poke() makes WRITABLE aliases of READ-ONLY pages, we
>> want to purge these pages ASAP.
>> So, I think just reserving a small vm_area for text_poke() and
>> reusing it is enough.
> 
> It is not a bad idea, but I don't think it quite goes far enough.
> IMO we should reserve 2 pages of virtual memory for each CPU, and
> then do the mapping/unmapping without locking, and with another
> variant of unmap_kernel_range that does not do the global TLB
> flush.
> 
> Unless performance doesn't really matter much, in which case, I
> guess your patch is nice because it avoids doing the allocations.

Thanks, I think text_poke() doesn't need high performance currently,
because it's not called so frequently, nor from the normal operation.

However, Would dynamic ftrace need performance?

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-17 16:48                       ` Masami Hiramatsu
@ 2009-02-17 17:02                         ` Steven Rostedt
  2009-02-17 17:18                           ` Mathieu Desnoyers
  0 siblings, 1 reply; 32+ messages in thread
From: Steven Rostedt @ 2009-02-17 17:02 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Nick Piggin, Mathieu Desnoyers, Peter Zijlstra, akpm,
	linux-kernel, Ingo Molnar, Ananth N Mavinakayanahalli,
	Jim Keniston


On Tue, 17 Feb 2009, Masami Hiramatsu wrote:
> 
> Thanks, I think text_poke() doesn't need high performance currently,
> because it's not called so frequently, nor from the normal operation.
> 
> However, Would dynamic ftrace need performance?

Are you talking about replacing what dynamic ftrace does now with 
text_poke? Well, It takes just under a second right now to do all the 
conversions. Looking at the list on my booted box, it converts 19805 
locations. It takes care to run stop machine, and it also has code to 
handle NMIs on the other CPUS while it runs.

Changing text must be done extremely carefully on a running box. If the 
code being changed is also executed on another CPU then you will most 
likely take a GPF on that CPU.

Also, every arch must do things a bit different, due to the way modules 
are handled.

-- Steve


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-17  8:31                       ` Peter Zijlstra
@ 2009-02-17 17:13                         ` Masami Hiramatsu
  0 siblings, 0 replies; 32+ messages in thread
From: Masami Hiramatsu @ 2009-02-17 17:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nick Piggin, Mathieu Desnoyers, akpm, linux-kernel, Ingo Molnar,
	Ananth N Mavinakayanahalli, Jim Keniston

Peter Zijlstra wrote:
> On Tue, 2009-02-17 at 04:03 +0100, Nick Piggin wrote:
> 
>> It is not a bad idea, but I don't think it quite goes far enough.
>> IMO we should reserve 2 pages of virtual memory for each CPU, and
>> then do the mapping/unmapping without locking, and with another
>> variant of unmap_kernel_range that does not do the global TLB
>> flush.
> 
> A bit like kmap_atomic() except for 2 pages and !highmem. Should work.

Hmm, I checked up the kmap_atomic(), but all kernel text
is not on the highmem.

So, what about changing kmap_atomic_prot() implementation as below?

- If the page is !highmem *and the page's prot is same as caller specified prot*,
  just returns page_address(page).
- Add 2 continuous km_types for text_poke().

Then, text_poke() can use it.
---
vaddr = kmap_atomic_prot(page[0], KM_TEXT_POKE0, PAGE_KERNEL);
if(nr_pages > 1)
	vaddr1 = kmap_atomic_prot(page[1], KM_TEXT_POKE1, PAGE_KERNEL);
< change text >
if(nr_pages > 1)
	kunmap_atomic(vaddr1, KM_TEXT_POKE1);
kunmap_atomic(vaddr, KM_TEXT_POKE0);
---

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-17 17:02                         ` Steven Rostedt
@ 2009-02-17 17:18                           ` Mathieu Desnoyers
  2009-02-17 17:24                             ` Steven Rostedt
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2009-02-17 17:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Nick Piggin, Peter Zijlstra, akpm,
	linux-kernel, Ingo Molnar, Ananth N Mavinakayanahalli,
	Jim Keniston

* Steven Rostedt (rostedt@goodmis.org) wrote:
> 
> On Tue, 17 Feb 2009, Masami Hiramatsu wrote:
> > 
> > Thanks, I think text_poke() doesn't need high performance currently,
> > because it's not called so frequently, nor from the normal operation.
> > 
> > However, Would dynamic ftrace need performance?
> 
> Are you talking about replacing what dynamic ftrace does now with 
> text_poke? Well, It takes just under a second right now to do all the 
> conversions. Looking at the list on my booted box, it converts 19805 
> locations. It takes care to run stop machine, and it also has code to 
> handle NMIs on the other CPUS while it runs.
> 
> Changing text must be done extremely carefully on a running box. If the 
> code being changed is also executed on another CPU then you will most 
> likely take a GPF on that CPU.
> 
> Also, every arch must do things a bit different, due to the way modules 
> are handled.
> 
> -- Steve
> 

Anyway dynamic ftrace only need to do code patching at early boot,
right? Or do you need also to patch the call sites a bit later too ?
Because there is a text_poke_early for that purpose : modifying code
when it is still writable.

Mathieu


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-17 17:18                           ` Mathieu Desnoyers
@ 2009-02-17 17:24                             ` Steven Rostedt
  2009-02-17 17:28                               ` Mathieu Desnoyers
  0 siblings, 1 reply; 32+ messages in thread
From: Steven Rostedt @ 2009-02-17 17:24 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Masami Hiramatsu, Nick Piggin, Peter Zijlstra, akpm,
	linux-kernel, Ingo Molnar, Ananth N Mavinakayanahalli,
	Jim Keniston


On Tue, 17 Feb 2009, Mathieu Desnoyers wrote:
> > 
> 
> Anyway dynamic ftrace only need to do code patching at early boot,
> right? Or do you need also to patch the call sites a bit later too ?
> Because there is a text_poke_early for that purpose : modifying code
> when it is still writable.

It modifies the code both at early boot up, and when it is enabled later 
on at run time.

-- Steve


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-17 17:24                             ` Steven Rostedt
@ 2009-02-17 17:28                               ` Mathieu Desnoyers
  2009-02-17 17:48                                 ` Steven Rostedt
  0 siblings, 1 reply; 32+ messages in thread
From: Mathieu Desnoyers @ 2009-02-17 17:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Nick Piggin, Peter Zijlstra, akpm,
	linux-kernel, Ingo Molnar, Ananth N Mavinakayanahalli,
	Jim Keniston

* Steven Rostedt (rostedt@goodmis.org) wrote:
> 
> On Tue, 17 Feb 2009, Mathieu Desnoyers wrote:
> > > 
> > 
> > Anyway dynamic ftrace only need to do code patching at early boot,
> > right? Or do you need also to patch the call sites a bit later too ?
> > Because there is a text_poke_early for that purpose : modifying code
> > when it is still writable.
> 
> It modifies the code both at early boot up, and when it is enabled later 
> on at run time.
> 
> -- Steve
> 

OK, so without text_poke, how does it deal with kernel with text pages
marked read-only after boot ?

Mathieu


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: irq-disabled vs vmap vs text_poke
  2009-02-17 17:28                               ` Mathieu Desnoyers
@ 2009-02-17 17:48                                 ` Steven Rostedt
  0 siblings, 0 replies; 32+ messages in thread
From: Steven Rostedt @ 2009-02-17 17:48 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Masami Hiramatsu, Nick Piggin, Peter Zijlstra, akpm,
	linux-kernel, Ingo Molnar, Ananth N Mavinakayanahalli,
	Jim Keniston


On Tue, 17 Feb 2009, Mathieu Desnoyers wrote:

> * Steven Rostedt (rostedt@goodmis.org) wrote:
> > 
> > On Tue, 17 Feb 2009, Mathieu Desnoyers wrote:
> > > > 
> > > 
> > > Anyway dynamic ftrace only need to do code patching at early boot,
> > > right? Or do you need also to patch the call sites a bit later too ?
> > > Because there is a text_poke_early for that purpose : modifying code
> > > when it is still writable.
> > 
> > It modifies the code both at early boot up, and when it is enabled later 
> > on at run time.
> > 
> > -- Steve
> > 
> 
> OK, so without text_poke, how does it deal with kernel with text pages
> marked read-only after boot ?

Actually, we cheat. If DYNAMIC_FTRACE is configured in, we do not set the 
text sections to read only. But if this needs to be fixed, we can easily 
change it so that before we do the convesion, we set all text back to 
write and after the conversion we set it back to read.  The conversion 
itself is done via stop_machine, so we do not need to worry about other 
things going on.

-- Steve


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2009-02-17 17:49 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-13 12:50 irq-disabled vs vmap vs text_poke Peter Zijlstra
2009-02-13 12:55 ` Nick Piggin
2009-02-13 13:02   ` Peter Zijlstra
2009-02-13 13:04     ` Ingo Molnar
2009-02-13 14:25       ` Mathieu Desnoyers
2009-02-13 14:33         ` Peter Zijlstra
2009-02-13 14:43           ` Mathieu Desnoyers
2009-02-13 18:05             ` Ingo Molnar
2009-02-13 13:05     ` Peter Zijlstra
2009-02-13 13:09       ` Nick Piggin
2009-02-13 14:18 ` Mathieu Desnoyers
2009-02-13 14:28   ` Peter Zijlstra
2009-02-13 16:32   ` Masami Hiramatsu
2009-02-13 16:38     ` Peter Zijlstra
2009-02-13 16:55     ` Mathieu Desnoyers
2009-02-13 18:14       ` Masami Hiramatsu
2009-02-13 18:57         ` Mathieu Desnoyers
2009-02-13 21:41           ` Masami Hiramatsu
2009-02-16 15:04             ` Masami Hiramatsu
2009-02-16 15:31               ` Nick Piggin
2009-02-16 17:24                 ` Mathieu Desnoyers
2009-02-17  2:00                   ` Masami Hiramatsu
2009-02-17  3:03                     ` Nick Piggin
2009-02-17  8:31                       ` Peter Zijlstra
2009-02-17 17:13                         ` Masami Hiramatsu
2009-02-17 16:48                       ` Masami Hiramatsu
2009-02-17 17:02                         ` Steven Rostedt
2009-02-17 17:18                           ` Mathieu Desnoyers
2009-02-17 17:24                             ` Steven Rostedt
2009-02-17 17:28                               ` Mathieu Desnoyers
2009-02-17 17:48                                 ` Steven Rostedt
2009-02-13 14:27 ` [PATCH] x86: text_poke might sleep Mathieu Desnoyers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).