linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH -tip 0/4] Text edit lock and atomic text_poke()
@ 2009-03-06 15:34 Masami Hiramatsu
  2009-03-06 15:35 ` [PATCH -tip 1/4] Text Edit Lock - Architecture Independent Code (v2) Masami Hiramatsu
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2009-03-06 15:34 UTC (permalink / raw)
  To: Mathieu Desnoyers, Ingo Molnar
  Cc: Andrew Morton, Nick Piggin, Steven Rostedt, Andi Kleen,
	Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Linus Torvalds, Arjan van de Ven,
	Rusty Russell, H. Peter Anvin

Hi,

Here is a series of patches which introduce text_mutex for protecting
editing kernel text from each other subsystems, and make text_poke()
atomic by using fixmap.

BTW,

> Paravirt and alternatives are always done when SMP is inactive, so there is no
> need to use locks.

Mathieu, I'm not sure that means. alternatives will be called from module
init code and other place where the system has already been running multi-
-threads(and they use smp_alt mutex). So, is it possible that those functions
will sleep or yield to another process?

Anyway, I added a new patch which locks text_mutex in alternative_smp_*.

Thank you,

 arch/x86/include/asm/fixmap.h |    2 ++
 arch/x86/kernel/alternative.c |   29 ++++++++++++++++++++---------
 include/linux/memory.h        |    6 ++++++
 kernel/kprobes.c              |   15 +++++++++++++--
 mm/memory.c                   |    9 +++++++++
 5 files changed, 50 insertions(+), 11 deletions(-)

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH -tip 1/4] Text Edit Lock - Architecture Independent Code (v2)
  2009-03-06 15:34 [PATCH -tip 0/4] Text edit lock and atomic text_poke() Masami Hiramatsu
@ 2009-03-06 15:35 ` Masami Hiramatsu
  2009-03-08 15:51   ` [tip:tracing/core] tracing, Text Edit Lock - Architecture Independent Code Mathieu Desnoyers
  2009-03-06 15:36 ` [PATCH -tip 2/4]Text Edit Lock - kprobes architecture independent support (v3) Masami Hiramatsu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Masami Hiramatsu @ 2009-03-06 15:35 UTC (permalink / raw)
  To: Mathieu Desnoyers, Ingo Molnar
  Cc: Andrew Morton, Nick Piggin, Steven Rostedt, Andi Kleen,
	Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Linus Torvalds, Arjan van de Ven,
	Rusty Russell, H. Peter Anvin

From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

This is an architecture independant synchronization around kernel text
modifications through use of a global mutex.

A mutex has been chosen so that kprobes, the main user of this, can sleep during
memory allocation between the memory read of the instructions it must replace
and the memory write of the breakpoint.

Other user of this interface: immediate values.

Paravirt and alternatives are always done when SMP is inactive, so there is no
need to use locks.

Changelog :
Export text_mutex directly.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <andi@firstfloor.org>
CC: Ingo Molnar <mingo@elte.hu>
---
 include/linux/memory.h |    6 ++++++
 mm/memory.c            |    9 +++++++++
 2 files changed, 15 insertions(+)

Index: 2.6.29-rc7/include/linux/memory.h
===================================================================
--- 2.6.29-rc7.orig/include/linux/memory.h
+++ 2.6.29-rc7/include/linux/memory.h
@@ -99,4 +99,10 @@ enum mem_add_context { BOOT, HOTPLUG };
 #define hotplug_memory_notifier(fn, pri) do { } while (0)
 #endif

+/*
+ * Kernel text modification mutex, used for code patching. Users of this lock
+ * can sleep.
+ */
+extern struct mutex text_mutex;
+
 #endif /* _LINUX_MEMORY_H_ */
Index: 2.6.29-rc7/mm/memory.c
===================================================================
--- 2.6.29-rc7.orig/mm/memory.c
+++ 2.6.29-rc7/mm/memory.c
@@ -48,6 +48,8 @@
 #include <linux/rmap.h>
 #include <linux/module.h>
 #include <linux/delayacct.h>
+#include <linux/kprobes.h>
+#include <linux/mutex.h>
 #include <linux/init.h>
 #include <linux/writeback.h>
 #include <linux/memcontrol.h>
@@ -99,6 +101,13 @@ int randomize_va_space __read_mostly =
 					2;
 #endif

+/*
+ * mutex protecting text section modification (dynamic code patching).
+ * some users need to sleep (allocating memory...) while they hold this lock.
+ */
+DEFINE_MUTEX(text_mutex);
+EXPORT_SYMBOL_GPL(text_mutex);
+
 static int __init disable_randmaps(char *s)
 {
 	randomize_va_space = 0;
-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH -tip 2/4]Text Edit Lock - kprobes architecture independent support (v3)
  2009-03-06 15:34 [PATCH -tip 0/4] Text edit lock and atomic text_poke() Masami Hiramatsu
  2009-03-06 15:35 ` [PATCH -tip 1/4] Text Edit Lock - Architecture Independent Code (v2) Masami Hiramatsu
@ 2009-03-06 15:36 ` Masami Hiramatsu
  2009-03-08 15:51   ` [tip:tracing/core] tracing, Text Edit Lock - kprobes architecture independent support Mathieu Desnoyers
  2009-03-06 15:37 ` [PATCH -tip 3/4]Text Edit Lock - Smp alternatives support Masami Hiramatsu
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Masami Hiramatsu @ 2009-03-06 15:36 UTC (permalink / raw)
  To: Mathieu Desnoyers, Ingo Molnar
  Cc: Andrew Morton, Nick Piggin, Steven Rostedt, Andi Kleen,
	Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Linus Torvalds, Arjan van de Ven,
	Rusty Russell, H. Peter Anvin

From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

Use the mutual exclusion provided by the text edit lock in the kprobes code. It
allows coherent manipulation of the kernel code by other subsystems.

Changelog:

Move the kernel_text_lock/unlock out of the for loops.
Use text_mutex directly instead of a function.
Remove whitespace modifications.

(note : kprobes_mutex is always taken outside of text_mutex)

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
CC: ananth@in.ibm.com
CC: anil.s.keshavamurthy@intel.com
CC: davem@davemloft.net
CC: Roel Kluin <12o3l@tiscali.nl>
---
 kernel/kprobes.c |   15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

Index: 2.6.29-rc7/kernel/kprobes.c
===================================================================
--- 2.6.29-rc7.orig/kernel/kprobes.c
+++ 2.6.29-rc7/kernel/kprobes.c
@@ -43,6 +43,7 @@
 #include <linux/seq_file.h>
 #include <linux/debugfs.h>
 #include <linux/kdebug.h>
+#include <linux/memory.h>

 #include <asm-generic/sections.h>
 #include <asm/cacheflush.h>
@@ -699,9 +700,10 @@ int __kprobes register_kprobe(struct kpr
 		goto out;
 	}

+	mutex_lock(&text_mutex);
 	ret = arch_prepare_kprobe(p);
 	if (ret)
-		goto out;
+		goto out_unlock_text;

 	INIT_HLIST_NODE(&p->hlist);
 	hlist_add_head_rcu(&p->hlist,
@@ -710,6 +712,8 @@ int __kprobes register_kprobe(struct kpr
 	if (kprobe_enabled)
 		arch_arm_kprobe(p);

+out_unlock_text:
+	mutex_unlock(&text_mutex);
 out:
 	mutex_unlock(&kprobe_mutex);

@@ -746,8 +750,11 @@ valid_p:
 		 * enabled and not gone - otherwise, the breakpoint would
 		 * already have been removed. We save on flushing icache.
 		 */
-		if (kprobe_enabled && !kprobe_gone(old_p))
+		if (kprobe_enabled && !kprobe_gone(old_p)) {
+			mutex_lock(&text_mutex);
 			arch_disarm_kprobe(p);
+			mutex_unlock(&text_mutex);
+		}
 		hlist_del_rcu(&old_p->hlist);
 	} else {
 		if (p->break_handler && !kprobe_gone(p))
@@ -1280,12 +1287,14 @@ static void __kprobes enable_all_kprobes
 	if (kprobe_enabled)
 		goto already_enabled;

+	mutex_lock(&text_mutex);
 	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
 		head = &kprobe_table[i];
 		hlist_for_each_entry_rcu(p, node, head, hlist)
 			if (!kprobe_gone(p))
 				arch_arm_kprobe(p);
 	}
+	mutex_unlock(&text_mutex);

 	kprobe_enabled = true;
 	printk(KERN_INFO "Kprobes globally enabled\n");
@@ -1310,6 +1319,7 @@ static void __kprobes disable_all_kprobe

 	kprobe_enabled = false;
 	printk(KERN_INFO "Kprobes globally disabled\n");
+	mutex_lock(&text_mutex);
 	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
 		head = &kprobe_table[i];
 		hlist_for_each_entry_rcu(p, node, head, hlist) {
@@ -1318,6 +1328,7 @@ static void __kprobes disable_all_kprobe
 		}
 	}

+	mutex_unlock(&text_mutex);
 	mutex_unlock(&kprobe_mutex);
 	/* Allow all currently running kprobes to complete */
 	synchronize_sched();
-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH -tip 3/4]Text Edit Lock - Smp alternatives support
  2009-03-06 15:34 [PATCH -tip 0/4] Text edit lock and atomic text_poke() Masami Hiramatsu
  2009-03-06 15:35 ` [PATCH -tip 1/4] Text Edit Lock - Architecture Independent Code (v2) Masami Hiramatsu
  2009-03-06 15:36 ` [PATCH -tip 2/4]Text Edit Lock - kprobes architecture independent support (v3) Masami Hiramatsu
@ 2009-03-06 15:37 ` Masami Hiramatsu
  2009-03-06 18:11   ` Mathieu Desnoyers
  2009-03-08 15:51   ` [tip:tracing/core] tracing, Text Edit Lock - SMP " Masami Hiramatsu
  2009-03-06 15:37 ` [PATCH -tip 4/4] Atomic text_poke() with fixmap Masami Hiramatsu
  2009-03-06 18:09 ` [PATCH -tip 0/4] Text edit lock and atomic text_poke() Mathieu Desnoyers
  4 siblings, 2 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2009-03-06 15:37 UTC (permalink / raw)
  To: Mathieu Desnoyers, Ingo Molnar
  Cc: Andrew Morton, Nick Piggin, Steven Rostedt, Andi Kleen,
	Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Linus Torvalds, Arjan van de Ven,
	Rusty Russell, H. Peter Anvin

Use the mutual exclusion provided by the text edit lock in alternatives code.
Since alternative_smp_* will be called from module init code, etc,
we'd better protect it from other subsystems.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/alternative.c |    5 +++++
 1 file changed, 5 insertions(+)

Index: 2.6.29-rc7/arch/x86/kernel/alternative.c
===================================================================
--- 2.6.29-rc7.orig/arch/x86/kernel/alternative.c
+++ 2.6.29-rc7/arch/x86/kernel/alternative.c
@@ -5,6 +5,7 @@
 #include <linux/kprobes.h>
 #include <linux/mm.h>
 #include <linux/vmalloc.h>
+#include <linux/memory.h>
 #include <asm/alternative.h>
 #include <asm/sections.h>
 #include <asm/pgtable.h>
@@ -226,6 +227,7 @@ static void alternatives_smp_lock(u8 **s
 {
 	u8 **ptr;

+	mutex_lock(&text_mutex);
 	for (ptr = start; ptr < end; ptr++) {
 		if (*ptr < text)
 			continue;
@@ -234,6 +236,7 @@ static void alternatives_smp_lock(u8 **s
 		/* turn DS segment override prefix into lock prefix */
 		text_poke(*ptr, ((unsigned char []){0xf0}), 1);
 	};
+	mutex_unlock(&text_mutex);
 }

 static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end)
@@ -243,6 +246,7 @@ static void alternatives_smp_unlock(u8 *
 	if (noreplace_smp)
 		return;

+	mutex_lock(&text_mutex);
 	for (ptr = start; ptr < end; ptr++) {
 		if (*ptr < text)
 			continue;
@@ -251,6 +255,7 @@ static void alternatives_smp_unlock(u8 *
 		/* turn lock prefix into DS segment override prefix */
 		text_poke(*ptr, ((unsigned char []){0x3E}), 1);
 	};
+	mutex_unlock(&text_mutex);
 }

 struct smp_alt_module {
-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH -tip 4/4] Atomic text_poke() with fixmap
  2009-03-06 15:34 [PATCH -tip 0/4] Text edit lock and atomic text_poke() Masami Hiramatsu
                   ` (2 preceding siblings ...)
  2009-03-06 15:37 ` [PATCH -tip 3/4]Text Edit Lock - Smp alternatives support Masami Hiramatsu
@ 2009-03-06 15:37 ` Masami Hiramatsu
  2009-03-06 18:13   ` Mathieu Desnoyers
  2009-03-08 15:51   ` [tip:tracing/core] x86: implement atomic text_poke() via fixmap Masami Hiramatsu
  2009-03-06 18:09 ` [PATCH -tip 0/4] Text edit lock and atomic text_poke() Mathieu Desnoyers
  4 siblings, 2 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2009-03-06 15:37 UTC (permalink / raw)
  To: Mathieu Desnoyers, Ingo Molnar
  Cc: Andrew Morton, Nick Piggin, Steven Rostedt, Andi Kleen,
	Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Linus Torvalds, Arjan van de Ven,
	Rusty Russell, H. Peter Anvin

Use fixmaps instead of vmap/vunmap in text_poke() for avoiding page allocation
and delayed unmapping.

At the result of above change, text_poke() becomes atomic and can be called
from stop_machine() etc.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/include/asm/fixmap.h |    2 ++
 arch/x86/kernel/alternative.c |   24 +++++++++++++++---------
 2 files changed, 17 insertions(+), 9 deletions(-)

Index: linux-2.6-tip/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
+++ linux-2.6-tip/arch/x86/kernel/alternative.c
@@ -13,7 +13,9 @@
 #include <asm/nmi.h>
 #include <asm/vsyscall.h>
 #include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
 #include <asm/io.h>
+#include <asm/fixmap.h>

 #define MAX_PATCH_LEN (255-1)

@@ -505,15 +507,16 @@ void *text_poke_early(void *addr, const
  * It means the size must be writable atomically and the address must be aligned
  * in a way that permits an atomic write. It also makes sure we fit on a single
  * page.
+ *
+ * Note: Must be called under text_mutex.
  */
 void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
 {
+	unsigned long flags;
 	char *vaddr;
-	int nr_pages = 2;
 	struct page *pages[2];
 	int i;

-	might_sleep();
 	if (!core_kernel_text((unsigned long)addr)) {
 		pages[0] = vmalloc_to_page(addr);
 		pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
@@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
 		pages[1] = virt_to_page(addr + PAGE_SIZE);
 	}
 	BUG_ON(!pages[0]);
-	if (!pages[1])
-		nr_pages = 1;
-	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
-	BUG_ON(!vaddr);
-	local_irq_disable();
+	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
+	if (pages[1])
+		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
+	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
+	local_irq_save(flags);
 	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
-	local_irq_enable();
-	vunmap(vaddr);
+	local_irq_restore(flags);
+	clear_fixmap(FIX_TEXT_POKE0);
+	if (pages[1])
+		clear_fixmap(FIX_TEXT_POKE1);
+	local_flush_tlb();
 	sync_core();
 	/* Could also do a CLFLUSH here to speed up CPU recovery; but
 	   that causes hangs on some VIA CPUs. */
Index: linux-2.6-tip/arch/x86/include/asm/fixmap.h
===================================================================
--- linux-2.6-tip.orig/arch/x86/include/asm/fixmap.h
+++ linux-2.6-tip/arch/x86/include/asm/fixmap.h
@@ -111,6 +111,8 @@ enum fixed_addresses {
 #ifdef CONFIG_PARAVIRT
 	FIX_PARAVIRT_BOOTMAP,
 #endif
+	FIX_TEXT_POKE0,	/* reserve 2 pages for text_poke() */
+	FIX_TEXT_POKE1,
 	__end_of_permanent_fixed_addresses,
 #ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
 	FIX_OHCI1394_BASE,
-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH -tip 0/4] Text edit lock and atomic text_poke()
  2009-03-06 15:34 [PATCH -tip 0/4] Text edit lock and atomic text_poke() Masami Hiramatsu
                   ` (3 preceding siblings ...)
  2009-03-06 15:37 ` [PATCH -tip 4/4] Atomic text_poke() with fixmap Masami Hiramatsu
@ 2009-03-06 18:09 ` Mathieu Desnoyers
  2009-03-06 18:12   ` Steven Rostedt
  4 siblings, 1 reply; 26+ messages in thread
From: Mathieu Desnoyers @ 2009-03-06 18:09 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Andrew Morton, Nick Piggin, Steven Rostedt,
	Andi Kleen, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Linus Torvalds,
	Arjan van de Ven, Rusty Russell, H. Peter Anvin

* Masami Hiramatsu (mhiramat@redhat.com) wrote:
> Hi,
> 
> Here is a series of patches which introduce text_mutex for protecting
> editing kernel text from each other subsystems, and make text_poke()
> atomic by using fixmap.
> 
> BTW,
> 
> > Paravirt and alternatives are always done when SMP is inactive, so there is no
> > need to use locks.
> 
> Mathieu, I'm not sure that means. alternatives will be called from module
> init code and other place where the system has already been running multi-
> -threads(and they use smp_alt mutex). So, is it possible that those functions
> will sleep or yield to another process?
> 

As I remember, module code uses text_poke_early when modifying the
module text _before_ the module is made available to the rest of the
kernel. Therefore, it behaves as if it was in a UP system, and that's
ok, because only one CPU has to touch the text. Therefore, there is no
need to use text_poke, and therefore no need to take the text mutex.

Or maybe has it changed ? But I doubt so.

Mathieu

> Anyway, I added a new patch which locks text_mutex in alternative_smp_*.
> 
> Thank you,
> 
>  arch/x86/include/asm/fixmap.h |    2 ++
>  arch/x86/kernel/alternative.c |   29 ++++++++++++++++++++---------
>  include/linux/memory.h        |    6 ++++++
>  kernel/kprobes.c              |   15 +++++++++++++--
>  mm/memory.c                   |    9 +++++++++
>  5 files changed, 50 insertions(+), 11 deletions(-)
> 
> -- 
> Masami Hiramatsu
> 
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
> 
> e-mail: mhiramat@redhat.com
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH -tip 3/4]Text Edit Lock - Smp alternatives support
  2009-03-06 15:37 ` [PATCH -tip 3/4]Text Edit Lock - Smp alternatives support Masami Hiramatsu
@ 2009-03-06 18:11   ` Mathieu Desnoyers
  2009-03-06 18:40     ` Masami Hiramatsu
  2009-03-08 15:51   ` [tip:tracing/core] tracing, Text Edit Lock - SMP " Masami Hiramatsu
  1 sibling, 1 reply; 26+ messages in thread
From: Mathieu Desnoyers @ 2009-03-06 18:11 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Andrew Morton, Nick Piggin, Steven Rostedt,
	Andi Kleen, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Linus Torvalds,
	Arjan van de Ven, Rusty Russell, H. Peter Anvin

* Masami Hiramatsu (mhiramat@redhat.com) wrote:
> Use the mutual exclusion provided by the text edit lock in alternatives code.
> Since alternative_smp_* will be called from module init code, etc,
> we'd better protect it from other subsystems.
> 
> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Cc: Ingo Molnar <mingo@elte.hu>
> ---
>  arch/x86/kernel/alternative.c |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> Index: 2.6.29-rc7/arch/x86/kernel/alternative.c
> ===================================================================
> --- 2.6.29-rc7.orig/arch/x86/kernel/alternative.c
> +++ 2.6.29-rc7/arch/x86/kernel/alternative.c
> @@ -5,6 +5,7 @@
>  #include <linux/kprobes.h>
>  #include <linux/mm.h>
>  #include <linux/vmalloc.h>
> +#include <linux/memory.h>
>  #include <asm/alternative.h>
>  #include <asm/sections.h>
>  #include <asm/pgtable.h>
> @@ -226,6 +227,7 @@ static void alternatives_smp_lock(u8 **s
>  {
>  	u8 **ptr;
> 
> +	mutex_lock(&text_mutex);
>  	for (ptr = start; ptr < end; ptr++) {
>  		if (*ptr < text)
>  			continue;
> @@ -234,6 +236,7 @@ static void alternatives_smp_lock(u8 **s
>  		/* turn DS segment override prefix into lock prefix */
>  		text_poke(*ptr, ((unsigned char []){0xf0}), 1);
>  	};
> +	mutex_unlock(&text_mutex);
>  }
> 
>  static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end)
> @@ -243,6 +246,7 @@ static void alternatives_smp_unlock(u8 *
>  	if (noreplace_smp)
>  		return;
> 
> +	mutex_lock(&text_mutex);
>  	for (ptr = start; ptr < end; ptr++) {
>  		if (*ptr < text)
>  			continue;
> @@ -251,6 +255,7 @@ static void alternatives_smp_unlock(u8 *
>  		/* turn lock prefix into DS segment override prefix */
>  		text_poke(*ptr, ((unsigned char []){0x3E}), 1);
>  	};
> +	mutex_unlock(&text_mutex);

And for these cases, the "alternatives_smp_lock" and
"alternatives_smp_unlock", the system is running as uniprocessor.
Therefore it's enough to disable interrupts within text_poke to make it
work correctly.

Mathieu

>  }
> 
>  struct smp_alt_module {
> -- 
> Masami Hiramatsu
> 
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
> 
> e-mail: mhiramat@redhat.com
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH -tip 0/4] Text edit lock and atomic text_poke()
  2009-03-06 18:09 ` [PATCH -tip 0/4] Text edit lock and atomic text_poke() Mathieu Desnoyers
@ 2009-03-06 18:12   ` Steven Rostedt
  0 siblings, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2009-03-06 18:12 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Masami Hiramatsu, Ingo Molnar, Andrew Morton, Nick Piggin,
	Andi Kleen, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Linus Torvalds,
	Arjan van de Ven, Rusty Russell, H. Peter Anvin


On Fri, 6 Mar 2009, Mathieu Desnoyers wrote:

> * Masami Hiramatsu (mhiramat@redhat.com) wrote:
> > Hi,
> > 
> > Here is a series of patches which introduce text_mutex for protecting
> > editing kernel text from each other subsystems, and make text_poke()
> > atomic by using fixmap.
> > 
> > BTW,
> > 
> > > Paravirt and alternatives are always done when SMP is inactive, so there is no
> > > need to use locks.
> > 
> > Mathieu, I'm not sure that means. alternatives will be called from module
> > init code and other place where the system has already been running multi-
> > -threads(and they use smp_alt mutex). So, is it possible that those functions
> > will sleep or yield to another process?
> > 
> 
> As I remember, module code uses text_poke_early when modifying the
> module text _before_ the module is made available to the rest of the
> kernel. Therefore, it behaves as if it was in a UP system, and that's
> ok, because only one CPU has to touch the text. Therefore, there is no
> need to use text_poke, and therefore no need to take the text mutex.
> 
> Or maybe has it changed ? But I doubt so.
> 

That is correct. Currently ftrace simply does a probe_kernel_write on 
module code to convert the calls to mcount into nops. This is done before 
the module is available, and does not require stop_machine nor break 
points. Since modules are not included in the DEBUG_RODATA there's no need 
to play games with the page tables either.

-- Steve


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH -tip 4/4] Atomic text_poke() with fixmap
  2009-03-06 15:37 ` [PATCH -tip 4/4] Atomic text_poke() with fixmap Masami Hiramatsu
@ 2009-03-06 18:13   ` Mathieu Desnoyers
  2009-03-06 18:18     ` Ingo Molnar
  2009-03-06 18:33     ` [PATCH -tip 4/4] Atomic text_poke() with fixmap take2 Masami Hiramatsu
  2009-03-08 15:51   ` [tip:tracing/core] x86: implement atomic text_poke() via fixmap Masami Hiramatsu
  1 sibling, 2 replies; 26+ messages in thread
From: Mathieu Desnoyers @ 2009-03-06 18:13 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Andrew Morton, Nick Piggin, Steven Rostedt,
	Andi Kleen, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Linus Torvalds,
	Arjan van de Ven, Rusty Russell, H. Peter Anvin

* Masami Hiramatsu (mhiramat@redhat.com) wrote:
> Use fixmaps instead of vmap/vunmap in text_poke() for avoiding page allocation
> and delayed unmapping.
> 
> At the result of above change, text_poke() becomes atomic and can be called
> from stop_machine() etc.
> 
> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Cc: Ingo Molnar <mingo@elte.hu>
> ---
>  arch/x86/include/asm/fixmap.h |    2 ++
>  arch/x86/kernel/alternative.c |   24 +++++++++++++++---------
>  2 files changed, 17 insertions(+), 9 deletions(-)
> 
> Index: linux-2.6-tip/arch/x86/kernel/alternative.c
> ===================================================================
> --- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
> +++ linux-2.6-tip/arch/x86/kernel/alternative.c
> @@ -13,7 +13,9 @@
>  #include <asm/nmi.h>
>  #include <asm/vsyscall.h>
>  #include <asm/cacheflush.h>
> +#include <asm/tlbflush.h>
>  #include <asm/io.h>
> +#include <asm/fixmap.h>
> 
>  #define MAX_PATCH_LEN (255-1)
> 
> @@ -505,15 +507,16 @@ void *text_poke_early(void *addr, const
>   * It means the size must be writable atomically and the address must be aligned
>   * in a way that permits an atomic write. It also makes sure we fit on a single
>   * page.
> + *
> + * Note: Must be called under text_mutex.
>   */
>  void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
>  {
> +	unsigned long flags;
>  	char *vaddr;
> -	int nr_pages = 2;
>  	struct page *pages[2];
>  	int i;
> 
> -	might_sleep();
>  	if (!core_kernel_text((unsigned long)addr)) {
>  		pages[0] = vmalloc_to_page(addr);
>  		pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
> @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
>  		pages[1] = virt_to_page(addr + PAGE_SIZE);
>  	}
>  	BUG_ON(!pages[0]);
> -	if (!pages[1])
> -		nr_pages = 1;
> -	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> -	BUG_ON(!vaddr);
> -	local_irq_disable();
> +	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));

Can the set_fixmap/clear_fixmap/local_flush_tlb be called within
local_irq_save ? If yes, that would be better, especially for the SMP
alternatives code, which would rely on interrupt disabling in text_poke
for consistency (the mutex is not needed there).

Mathieu

> +	if (pages[1])
> +		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> +	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> +	local_irq_save(flags);
>  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> -	local_irq_enable();
> -	vunmap(vaddr);
> +	local_irq_restore(flags);
> +	clear_fixmap(FIX_TEXT_POKE0);
> +	if (pages[1])
> +		clear_fixmap(FIX_TEXT_POKE1);
> +	local_flush_tlb();
>  	sync_core();
>  	/* Could also do a CLFLUSH here to speed up CPU recovery; but
>  	   that causes hangs on some VIA CPUs. */
> Index: linux-2.6-tip/arch/x86/include/asm/fixmap.h
> ===================================================================
> --- linux-2.6-tip.orig/arch/x86/include/asm/fixmap.h
> +++ linux-2.6-tip/arch/x86/include/asm/fixmap.h
> @@ -111,6 +111,8 @@ enum fixed_addresses {
>  #ifdef CONFIG_PARAVIRT
>  	FIX_PARAVIRT_BOOTMAP,
>  #endif
> +	FIX_TEXT_POKE0,	/* reserve 2 pages for text_poke() */
> +	FIX_TEXT_POKE1,
>  	__end_of_permanent_fixed_addresses,
>  #ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
>  	FIX_OHCI1394_BASE,
> -- 
> Masami Hiramatsu
> 
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
> 
> e-mail: mhiramat@redhat.com
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH -tip 4/4] Atomic text_poke() with fixmap
  2009-03-06 18:13   ` Mathieu Desnoyers
@ 2009-03-06 18:18     ` Ingo Molnar
  2009-03-06 18:33     ` [PATCH -tip 4/4] Atomic text_poke() with fixmap take2 Masami Hiramatsu
  1 sibling, 0 replies; 26+ messages in thread
From: Ingo Molnar @ 2009-03-06 18:18 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Masami Hiramatsu, Andrew Morton, Nick Piggin, Steven Rostedt,
	Andi Kleen, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Linus Torvalds,
	Arjan van de Ven, Rusty Russell, H. Peter Anvin


* Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> * Masami Hiramatsu (mhiramat@redhat.com) wrote:
> > Use fixmaps instead of vmap/vunmap in text_poke() for avoiding page allocation
> > and delayed unmapping.
> > 
> > At the result of above change, text_poke() becomes atomic and can be called
> > from stop_machine() etc.
> > 
> > Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
> > Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > ---
> >  arch/x86/include/asm/fixmap.h |    2 ++
> >  arch/x86/kernel/alternative.c |   24 +++++++++++++++---------
> >  2 files changed, 17 insertions(+), 9 deletions(-)
> > 
> > Index: linux-2.6-tip/arch/x86/kernel/alternative.c
> > ===================================================================
> > --- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
> > +++ linux-2.6-tip/arch/x86/kernel/alternative.c
> > @@ -13,7 +13,9 @@
> >  #include <asm/nmi.h>
> >  #include <asm/vsyscall.h>
> >  #include <asm/cacheflush.h>
> > +#include <asm/tlbflush.h>
> >  #include <asm/io.h>
> > +#include <asm/fixmap.h>
> > 
> >  #define MAX_PATCH_LEN (255-1)
> > 
> > @@ -505,15 +507,16 @@ void *text_poke_early(void *addr, const
> >   * It means the size must be writable atomically and the address must be aligned
> >   * in a way that permits an atomic write. It also makes sure we fit on a single
> >   * page.
> > + *
> > + * Note: Must be called under text_mutex.
> >   */
> >  void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
> >  {
> > +	unsigned long flags;
> >  	char *vaddr;
> > -	int nr_pages = 2;
> >  	struct page *pages[2];
> >  	int i;
> > 
> > -	might_sleep();
> >  	if (!core_kernel_text((unsigned long)addr)) {
> >  		pages[0] = vmalloc_to_page(addr);
> >  		pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
> > @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
> >  		pages[1] = virt_to_page(addr + PAGE_SIZE);
> >  	}
> >  	BUG_ON(!pages[0]);
> > -	if (!pages[1])
> > -		nr_pages = 1;
> > -	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> > -	BUG_ON(!vaddr);
> > -	local_irq_disable();
> > +	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> 
> Can the set_fixmap/clear_fixmap/local_flush_tlb be called 
> within local_irq_save ? If yes, that would be better, 
> especially for the SMP alternatives code, which would rely on 
> interrupt disabling in text_poke for consistency (the mutex is 
> not needed there).

yes, it is atomic.

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH -tip 4/4] Atomic text_poke() with fixmap take2
  2009-03-06 18:13   ` Mathieu Desnoyers
  2009-03-06 18:18     ` Ingo Molnar
@ 2009-03-06 18:33     ` Masami Hiramatsu
  2009-03-06 18:45       ` Mathieu Desnoyers
  2009-03-06 19:08       ` Ingo Molnar
  1 sibling, 2 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2009-03-06 18:33 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, Andrew Morton, Nick Piggin, Steven Rostedt,
	Andi Kleen, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Linus Torvalds,
	Arjan van de Ven, Rusty Russell, H. Peter Anvin

Mathieu Desnoyers wrote:
>> @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
>>  		pages[1] = virt_to_page(addr + PAGE_SIZE);
>>  	}
>>  	BUG_ON(!pages[0]);
>> -	if (!pages[1])
>> -		nr_pages = 1;
>> -	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
>> -	BUG_ON(!vaddr);
>> -	local_irq_disable();
>> +	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> 
> Can the set_fixmap/clear_fixmap/local_flush_tlb be called within
> local_irq_save ? If yes, that would be better, especially for the SMP
> alternatives code, which would rely on interrupt disabling in text_poke
> for consistency (the mutex is not needed there).

Yes, that just changes pte.

---
Use fixmaps instead of vmap/vunmap in text_poke() for avoiding page allocation
and delayed unmapping.

At the result of above change, text_poke() becomes atomic and can be called
from stop_machine() etc.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/include/asm/fixmap.h |    2 ++
 arch/x86/kernel/alternative.c |   24 +++++++++++++++---------
 2 files changed, 17 insertions(+), 9 deletions(-)

Index: linux-2.6-tip/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
+++ linux-2.6-tip/arch/x86/kernel/alternative.c
@@ -13,7 +13,9 @@
 #include <asm/nmi.h>
 #include <asm/vsyscall.h>
 #include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
 #include <asm/io.h>
+#include <asm/fixmap.h>

 #define MAX_PATCH_LEN (255-1)

@@ -505,15 +507,16 @@ void *text_poke_early(void *addr, const
  * It means the size must be writable atomically and the address must be aligned
  * in a way that permits an atomic write. It also makes sure we fit on a single
  * page.
+ *
+ * Note: Must be called under text_mutex.
  */
 void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
 {
+	unsigned long flags;
 	char *vaddr;
-	int nr_pages = 2;
 	struct page *pages[2];
 	int i;

-	might_sleep();
 	if (!core_kernel_text((unsigned long)addr)) {
 		pages[0] = vmalloc_to_page(addr);
 		pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
@@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
 		pages[1] = virt_to_page(addr + PAGE_SIZE);
 	}
 	BUG_ON(!pages[0]);
-	if (!pages[1])
-		nr_pages = 1;
-	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
-	BUG_ON(!vaddr);
-	local_irq_disable();
+	local_irq_save(flags);
+	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
+	if (pages[1])
+		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
+	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
 	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
-	local_irq_enable();
-	vunmap(vaddr);
+	clear_fixmap(FIX_TEXT_POKE0);
+	if (pages[1])
+		clear_fixmap(FIX_TEXT_POKE1);
+	local_flush_tlb();
+	local_irq_restore(flags);
 	sync_core();
 	/* Could also do a CLFLUSH here to speed up CPU recovery; but
 	   that causes hangs on some VIA CPUs. */
Index: linux-2.6-tip/arch/x86/include/asm/fixmap.h
===================================================================
--- linux-2.6-tip.orig/arch/x86/include/asm/fixmap.h
+++ linux-2.6-tip/arch/x86/include/asm/fixmap.h
@@ -111,6 +111,8 @@ enum fixed_addresses {
 #ifdef CONFIG_PARAVIRT
 	FIX_PARAVIRT_BOOTMAP,
 #endif
+	FIX_TEXT_POKE0,	/* reserve 2 pages for text_poke() */
+	FIX_TEXT_POKE1,
 	__end_of_permanent_fixed_addresses,
 #ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
 	FIX_OHCI1394_BASE,
-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH -tip 3/4]Text Edit Lock - Smp alternatives support
  2009-03-06 18:11   ` Mathieu Desnoyers
@ 2009-03-06 18:40     ` Masami Hiramatsu
  0 siblings, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2009-03-06 18:40 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, Andrew Morton, Nick Piggin, Steven Rostedt,
	Andi Kleen, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Linus Torvalds,
	Arjan van de Ven, Rusty Russell, H. Peter Anvin

Mathieu Desnoyers wrote:
> * Masami Hiramatsu (mhiramat@redhat.com) wrote:
>> Use the mutual exclusion provided by the text edit lock in alternatives code.
>> Since alternative_smp_* will be called from module init code, etc,
>> we'd better protect it from other subsystems.
>>
>> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
>> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
>> Cc: Ingo Molnar <mingo@elte.hu>
>> ---
>>  arch/x86/kernel/alternative.c |    5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> Index: 2.6.29-rc7/arch/x86/kernel/alternative.c
>> ===================================================================
>> --- 2.6.29-rc7.orig/arch/x86/kernel/alternative.c
>> +++ 2.6.29-rc7/arch/x86/kernel/alternative.c
>> @@ -5,6 +5,7 @@
>>  #include <linux/kprobes.h>
>>  #include <linux/mm.h>
>>  #include <linux/vmalloc.h>
>> +#include <linux/memory.h>
>>  #include <asm/alternative.h>
>>  #include <asm/sections.h>
>>  #include <asm/pgtable.h>
>> @@ -226,6 +227,7 @@ static void alternatives_smp_lock(u8 **s
>>  {
>>  	u8 **ptr;
>>
>> +	mutex_lock(&text_mutex);
>>  	for (ptr = start; ptr < end; ptr++) {
>>  		if (*ptr < text)
>>  			continue;
>> @@ -234,6 +236,7 @@ static void alternatives_smp_lock(u8 **s
>>  		/* turn DS segment override prefix into lock prefix */
>>  		text_poke(*ptr, ((unsigned char []){0xf0}), 1);
>>  	};
>> +	mutex_unlock(&text_mutex);
>>  }
>>
>>  static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end)
>> @@ -243,6 +246,7 @@ static void alternatives_smp_unlock(u8 *
>>  	if (noreplace_smp)
>>  		return;
>>
>> +	mutex_lock(&text_mutex);
>>  	for (ptr = start; ptr < end; ptr++) {
>>  		if (*ptr < text)
>>  			continue;
>> @@ -251,6 +255,7 @@ static void alternatives_smp_unlock(u8 *
>>  		/* turn lock prefix into DS segment override prefix */
>>  		text_poke(*ptr, ((unsigned char []){0x3E}), 1);
>>  	};
>> +	mutex_unlock(&text_mutex);
> 
> And for these cases, the "alternatives_smp_lock" and
> "alternatives_smp_unlock", the system is running as uniprocessor.
> Therefore it's enough to disable interrupts within text_poke to make it
> work correctly.

Hmm, you are right. With my atomic-text_poke() take2 patch,
we don't need this patch.

Thank you,

> 
> Mathieu
> 
>>  }
>>
>>  struct smp_alt_module {
>> -- 
>> Masami Hiramatsu
>>
>> Software Engineer
>> Hitachi Computer Products (America) Inc.
>> Software Solutions Division
>>
>> e-mail: mhiramat@redhat.com
>>
> 

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH -tip 4/4] Atomic text_poke() with fixmap take2
  2009-03-06 18:33     ` [PATCH -tip 4/4] Atomic text_poke() with fixmap take2 Masami Hiramatsu
@ 2009-03-06 18:45       ` Mathieu Desnoyers
  2009-03-06 19:08       ` Ingo Molnar
  1 sibling, 0 replies; 26+ messages in thread
From: Mathieu Desnoyers @ 2009-03-06 18:45 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Andrew Morton, Nick Piggin, Steven Rostedt,
	Andi Kleen, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Linus Torvalds,
	Arjan van de Ven, Rusty Russell, H. Peter Anvin

* Masami Hiramatsu (mhiramat@redhat.com) wrote:
> Mathieu Desnoyers wrote:
> >> @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
> >>  		pages[1] = virt_to_page(addr + PAGE_SIZE);
> >>  	}
> >>  	BUG_ON(!pages[0]);
> >> -	if (!pages[1])
> >> -		nr_pages = 1;
> >> -	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> >> -	BUG_ON(!vaddr);
> >> -	local_irq_disable();
> >> +	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> > 
> > Can the set_fixmap/clear_fixmap/local_flush_tlb be called within
> > local_irq_save ? If yes, that would be better, especially for the SMP
> > alternatives code, which would rely on interrupt disabling in text_poke
> > for consistency (the mutex is not needed there).
> 
> Yes, that just changes pte.
> 
> ---
> Use fixmaps instead of vmap/vunmap in text_poke() for avoiding page allocation
> and delayed unmapping.
> 
> At the result of above change, text_poke() becomes atomic and can be called
> from stop_machine() etc.
> 
> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Cc: Ingo Molnar <mingo@elte.hu>
> ---
>  arch/x86/include/asm/fixmap.h |    2 ++
>  arch/x86/kernel/alternative.c |   24 +++++++++++++++---------
>  2 files changed, 17 insertions(+), 9 deletions(-)
> 
> Index: linux-2.6-tip/arch/x86/kernel/alternative.c
> ===================================================================
> --- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
> +++ linux-2.6-tip/arch/x86/kernel/alternative.c
> @@ -13,7 +13,9 @@
>  #include <asm/nmi.h>
>  #include <asm/vsyscall.h>
>  #include <asm/cacheflush.h>
> +#include <asm/tlbflush.h>
>  #include <asm/io.h>
> +#include <asm/fixmap.h>
> 
>  #define MAX_PATCH_LEN (255-1)
> 
> @@ -505,15 +507,16 @@ void *text_poke_early(void *addr, const
>   * It means the size must be writable atomically and the address must be aligned
>   * in a way that permits an atomic write. It also makes sure we fit on a single
>   * page.
> + *
> + * Note: Must be called under text_mutex.

Maybe you could add "or in specific cases ensuring UP behavior".

The rest looks fine.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>

>   */
>  void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
>  {
> +	unsigned long flags;
>  	char *vaddr;
> -	int nr_pages = 2;
>  	struct page *pages[2];
>  	int i;
> 
> -	might_sleep();
>  	if (!core_kernel_text((unsigned long)addr)) {
>  		pages[0] = vmalloc_to_page(addr);
>  		pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
> @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
>  		pages[1] = virt_to_page(addr + PAGE_SIZE);
>  	}
>  	BUG_ON(!pages[0]);
> -	if (!pages[1])
> -		nr_pages = 1;
> -	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> -	BUG_ON(!vaddr);
> -	local_irq_disable();
> +	local_irq_save(flags);
> +	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> +	if (pages[1])
> +		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> +	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
>  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> -	local_irq_enable();
> -	vunmap(vaddr);
> +	clear_fixmap(FIX_TEXT_POKE0);
> +	if (pages[1])
> +		clear_fixmap(FIX_TEXT_POKE1);
> +	local_flush_tlb();
> +	local_irq_restore(flags);
>  	sync_core();
>  	/* Could also do a CLFLUSH here to speed up CPU recovery; but
>  	   that causes hangs on some VIA CPUs. */
> Index: linux-2.6-tip/arch/x86/include/asm/fixmap.h
> ===================================================================
> --- linux-2.6-tip.orig/arch/x86/include/asm/fixmap.h
> +++ linux-2.6-tip/arch/x86/include/asm/fixmap.h
> @@ -111,6 +111,8 @@ enum fixed_addresses {
>  #ifdef CONFIG_PARAVIRT
>  	FIX_PARAVIRT_BOOTMAP,
>  #endif
> +	FIX_TEXT_POKE0,	/* reserve 2 pages for text_poke() */
> +	FIX_TEXT_POKE1,
>  	__end_of_permanent_fixed_addresses,
>  #ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
>  	FIX_OHCI1394_BASE,
> -- 
> Masami Hiramatsu
> 
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
> 
> e-mail: mhiramat@redhat.com
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH -tip 4/4] Atomic text_poke() with fixmap take2
  2009-03-06 18:33     ` [PATCH -tip 4/4] Atomic text_poke() with fixmap take2 Masami Hiramatsu
  2009-03-06 18:45       ` Mathieu Desnoyers
@ 2009-03-06 19:08       ` Ingo Molnar
  2009-03-06 19:15         ` Mathieu Desnoyers
  1 sibling, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2009-03-06 19:08 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Mathieu Desnoyers, Andrew Morton, Nick Piggin, Steven Rostedt,
	Andi Kleen, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Linus Torvalds,
	Arjan van de Ven, Rusty Russell, H. Peter Anvin


* Masami Hiramatsu <mhiramat@redhat.com> wrote:

> @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
>  		pages[1] = virt_to_page(addr + PAGE_SIZE);
>  	}
>  	BUG_ON(!pages[0]);
> -	if (!pages[1])
> -		nr_pages = 1;
> -	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> -	BUG_ON(!vaddr);
> -	local_irq_disable();
> +	local_irq_save(flags);
> +	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> +	if (pages[1])
> +		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> +	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
>  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> -	local_irq_enable();
> -	vunmap(vaddr);
> +	clear_fixmap(FIX_TEXT_POKE0);
> +	if (pages[1])
> +		clear_fixmap(FIX_TEXT_POKE1);
> +	local_flush_tlb();
> +	local_irq_restore(flags);
>  	sync_core();

I'm not sure at all about this widening of the irq-atomic 
section and the idea of allowing non-locked access on single-CPU 
situations - we dont really want to micro-optimize any of this 
on such a level, holding the text lock is a robust rule all code 
should be listening to. (Creating locking assymetry always 
inserts a certain amount of fragility - adding to an already 
fragile concept here.)

And note that there's no reason why text_poke could not be used 
in stop_machine_run() - the stop_machine_run() handler must not 
take the text_lock of course - but outside code calling 
stop_machine_run() can do it and can hence serialize properly.

Note that even if we did this then your v2 patch is not fully 
correct: you need to move the sync_core() at the end of the 
sequence inside the critical section too. (right now this is 
mostly harmless because the INVLPG inside the clear_fixmap() 
happens to be serializing so it has an implicit sync_core() 
property - but nevertheless we better do this straight away to 
not cause problems later down the line.)

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH -tip 4/4] Atomic text_poke() with fixmap take2
  2009-03-06 19:08       ` Ingo Molnar
@ 2009-03-06 19:15         ` Mathieu Desnoyers
  2009-03-06 19:22           ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Mathieu Desnoyers @ 2009-03-06 19:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Masami Hiramatsu, Andrew Morton, Nick Piggin, Steven Rostedt,
	Andi Kleen, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Linus Torvalds,
	Arjan van de Ven, Rusty Russell, H. Peter Anvin

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Masami Hiramatsu <mhiramat@redhat.com> wrote:
> 
> > @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
> >  		pages[1] = virt_to_page(addr + PAGE_SIZE);
> >  	}
> >  	BUG_ON(!pages[0]);
> > -	if (!pages[1])
> > -		nr_pages = 1;
> > -	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> > -	BUG_ON(!vaddr);
> > -	local_irq_disable();
> > +	local_irq_save(flags);
> > +	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> > +	if (pages[1])
> > +		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> > +	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> >  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> > -	local_irq_enable();
> > -	vunmap(vaddr);
> > +	clear_fixmap(FIX_TEXT_POKE0);
> > +	if (pages[1])
> > +		clear_fixmap(FIX_TEXT_POKE1);
> > +	local_flush_tlb();
> > +	local_irq_restore(flags);
> >  	sync_core();
> 
> I'm not sure at all about this widening of the irq-atomic 
> section and the idea of allowing non-locked access on single-CPU 
> situations - we dont really want to micro-optimize any of this 
> on such a level, holding the text lock is a robust rule all code 
> should be listening to. (Creating locking assymetry always 
> inserts a certain amount of fragility - adding to an already 
> fragile concept here.)
> 
> And note that there's no reason why text_poke could not be used 
> in stop_machine_run() - the stop_machine_run() handler must not 
> take the text_lock of course - but outside code calling 
> stop_machine_run() can do it and can hence serialize properly.
> 
> Note that even if we did this then your v2 patch is not fully 
> correct: you need to move the sync_core() at the end of the 
> sequence inside the critical section too. (right now this is 
> mostly harmless because the INVLPG inside the clear_fixmap() 
> happens to be serializing so it has an implicit sync_core() 
> property - but nevertheless we better do this straight away to 
> not cause problems later down the line.)
> 
> 	Ingo

Agreed. The alternatives_smp_lock/alternatives_smp_unlock specific case
does not bring us much if it has no perceivable performance impact. It's
better to keep a standard interface and clear requirements.

Mathieu


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH -tip 4/4] Atomic text_poke() with fixmap take2
  2009-03-06 19:15         ` Mathieu Desnoyers
@ 2009-03-06 19:22           ` Ingo Molnar
  2009-03-06 20:25             ` [PATCH -tip 5/4] Expands irq-off region in text_poke() Masami Hiramatsu
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2009-03-06 19:22 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Masami Hiramatsu, Andrew Morton, Nick Piggin, Steven Rostedt,
	Andi Kleen, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Linus Torvalds,
	Arjan van de Ven, Rusty Russell, H. Peter Anvin


* Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:

> * Ingo Molnar (mingo@elte.hu) wrote:
> > 
> > * Masami Hiramatsu <mhiramat@redhat.com> wrote:
> > 
> > > @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
> > >  		pages[1] = virt_to_page(addr + PAGE_SIZE);
> > >  	}
> > >  	BUG_ON(!pages[0]);
> > > -	if (!pages[1])
> > > -		nr_pages = 1;
> > > -	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> > > -	BUG_ON(!vaddr);
> > > -	local_irq_disable();
> > > +	local_irq_save(flags);
> > > +	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> > > +	if (pages[1])
> > > +		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> > > +	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> > >  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> > > -	local_irq_enable();
> > > -	vunmap(vaddr);
> > > +	clear_fixmap(FIX_TEXT_POKE0);
> > > +	if (pages[1])
> > > +		clear_fixmap(FIX_TEXT_POKE1);
> > > +	local_flush_tlb();
> > > +	local_irq_restore(flags);
> > >  	sync_core();
> > 
> > I'm not sure at all about this widening of the irq-atomic 
> > section and the idea of allowing non-locked access on single-CPU 
> > situations - we dont really want to micro-optimize any of this 
> > on such a level, holding the text lock is a robust rule all code 
> > should be listening to. (Creating locking assymetry always 
> > inserts a certain amount of fragility - adding to an already 
> > fragile concept here.)
> > 
> > And note that there's no reason why text_poke could not be used 
> > in stop_machine_run() - the stop_machine_run() handler must not 
> > take the text_lock of course - but outside code calling 
> > stop_machine_run() can do it and can hence serialize properly.
> > 
> > Note that even if we did this then your v2 patch is not fully 
> > correct: you need to move the sync_core() at the end of the 
> > sequence inside the critical section too. (right now this is 
> > mostly harmless because the INVLPG inside the clear_fixmap() 
> > happens to be serializing so it has an implicit sync_core() 
> > property - but nevertheless we better do this straight away to 
> > not cause problems later down the line.)
> > 
> > 	Ingo
> 
> Agreed. The alternatives_smp_lock/alternatives_smp_unlock 
> specific case does not bring us much if it has no perceivable 
> performance impact. It's better to keep a standard interface 
> and clear requirements.

Note that i dont object to another aspect of this same change: 
the fact that it makes the whole sequence more atomic and more 
defensive [which is never bad of fragile interfaces].

I only got worried about the "lets use this without the text 
lock" ideas.

So if Masami-san sends a delta patch with a different changelog 
and with the sync_core() bit moved inside the critical section, 
i'll apply that too.

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH -tip 5/4] Expands irq-off region in text_poke()
  2009-03-06 19:22           ` Ingo Molnar
@ 2009-03-06 20:25             ` Masami Hiramatsu
  2009-03-06 21:01               ` Mathieu Desnoyers
  0 siblings, 1 reply; 26+ messages in thread
From: Masami Hiramatsu @ 2009-03-06 20:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mathieu Desnoyers, Andrew Morton, Nick Piggin, Steven Rostedt,
	Andi Kleen, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Linus Torvalds,
	Arjan van de Ven, Rusty Russell, H. Peter Anvin

Ingo Molnar wrote:
> * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> 
>> * Ingo Molnar (mingo@elte.hu) wrote:
>>> * Masami Hiramatsu <mhiramat@redhat.com> wrote:
>>>
>>>> @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
>>>>  		pages[1] = virt_to_page(addr + PAGE_SIZE);
>>>>  	}
>>>>  	BUG_ON(!pages[0]);
>>>> -	if (!pages[1])
>>>> -		nr_pages = 1;
>>>> -	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
>>>> -	BUG_ON(!vaddr);
>>>> -	local_irq_disable();
>>>> +	local_irq_save(flags);
>>>> +	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
>>>> +	if (pages[1])
>>>> +		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
>>>> +	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
>>>>  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
>>>> -	local_irq_enable();
>>>> -	vunmap(vaddr);
>>>> +	clear_fixmap(FIX_TEXT_POKE0);
>>>> +	if (pages[1])
>>>> +		clear_fixmap(FIX_TEXT_POKE1);
>>>> +	local_flush_tlb();
>>>> +	local_irq_restore(flags);
>>>>  	sync_core();
>>> I'm not sure at all about this widening of the irq-atomic 
>>> section and the idea of allowing non-locked access on single-CPU 
>>> situations - we dont really want to micro-optimize any of this 
>>> on such a level, holding the text lock is a robust rule all code 
>>> should be listening to. (Creating locking assymetry always 
>>> inserts a certain amount of fragility - adding to an already 
>>> fragile concept here.)
>>>
>>> And note that there's no reason why text_poke could not be used 
>>> in stop_machine_run() - the stop_machine_run() handler must not 
>>> take the text_lock of course - but outside code calling 
>>> stop_machine_run() can do it and can hence serialize properly.
>>>
>>> Note that even if we did this then your v2 patch is not fully 
>>> correct: you need to move the sync_core() at the end of the 
>>> sequence inside the critical section too. (right now this is 
>>> mostly harmless because the INVLPG inside the clear_fixmap() 
>>> happens to be serializing so it has an implicit sync_core() 
>>> property - but nevertheless we better do this straight away to 
>>> not cause problems later down the line.)
>>>
>>> 	Ingo
>> Agreed. The alternatives_smp_lock/alternatives_smp_unlock 
>> specific case does not bring us much if it has no perceivable 
>> performance impact. It's better to keep a standard interface 
>> and clear requirements.
> 
> Note that i dont object to another aspect of this same change: 
> the fact that it makes the whole sequence more atomic and more 
> defensive [which is never bad of fragile interfaces].
> 
> I only got worried about the "lets use this without the text 
> lock" ideas.
> 
> So if Masami-san sends a delta patch with a different changelog 
> and with the sync_core() bit moved inside the critical section, 
> i'll apply that too.

OK, here is the delta patch.

Expand irq-atomic region to cover fixmap using code and sync_core.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/alternative.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6-tip/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
+++ linux-2.6-tip/arch/x86/kernel/alternative.c
@@ -526,13 +526,12 @@ void *__kprobes text_poke(void *addr, co
 		pages[1] = virt_to_page(addr + PAGE_SIZE);
 	}
 	BUG_ON(!pages[0]);
+	local_irq_save(flags);
 	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
 	if (pages[1])
 		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
 	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
-	local_irq_save(flags);
 	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
-	local_irq_restore(flags);
 	clear_fixmap(FIX_TEXT_POKE0);
 	if (pages[1])
 		clear_fixmap(FIX_TEXT_POKE1);
@@ -540,6 +539,7 @@ void *__kprobes text_poke(void *addr, co
 	sync_core();
 	/* Could also do a CLFLUSH here to speed up CPU recovery; but
 	   that causes hangs on some VIA CPUs. */
+	local_irq_restore(flags);
 	for (i = 0; i < len; i++)
 		BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
 	return addr;
-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH -tip 5/4] Expands irq-off region in text_poke()
  2009-03-06 20:25             ` [PATCH -tip 5/4] Expands irq-off region in text_poke() Masami Hiramatsu
@ 2009-03-06 21:01               ` Mathieu Desnoyers
  2009-03-06 21:57                 ` Masami Hiramatsu
  0 siblings, 1 reply; 26+ messages in thread
From: Mathieu Desnoyers @ 2009-03-06 21:01 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Andrew Morton, Nick Piggin, Steven Rostedt,
	Andi Kleen, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Linus Torvalds,
	Arjan van de Ven, Rusty Russell, H. Peter Anvin

* Masami Hiramatsu (mhiramat@redhat.com) wrote:
> Ingo Molnar wrote:
> > * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> > 
> >> * Ingo Molnar (mingo@elte.hu) wrote:
> >>> * Masami Hiramatsu <mhiramat@redhat.com> wrote:
> >>>
> >>>> @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
> >>>>  		pages[1] = virt_to_page(addr + PAGE_SIZE);
> >>>>  	}
> >>>>  	BUG_ON(!pages[0]);
> >>>> -	if (!pages[1])
> >>>> -		nr_pages = 1;
> >>>> -	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> >>>> -	BUG_ON(!vaddr);
> >>>> -	local_irq_disable();
> >>>> +	local_irq_save(flags);
> >>>> +	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> >>>> +	if (pages[1])
> >>>> +		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> >>>> +	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> >>>>  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> >>>> -	local_irq_enable();
> >>>> -	vunmap(vaddr);
> >>>> +	clear_fixmap(FIX_TEXT_POKE0);
> >>>> +	if (pages[1])
> >>>> +		clear_fixmap(FIX_TEXT_POKE1);
> >>>> +	local_flush_tlb();
> >>>> +	local_irq_restore(flags);
> >>>>  	sync_core();
> >>> I'm not sure at all about this widening of the irq-atomic 
> >>> section and the idea of allowing non-locked access on single-CPU 
> >>> situations - we dont really want to micro-optimize any of this 
> >>> on such a level, holding the text lock is a robust rule all code 
> >>> should be listening to. (Creating locking assymetry always 
> >>> inserts a certain amount of fragility - adding to an already 
> >>> fragile concept here.)
> >>>
> >>> And note that there's no reason why text_poke could not be used 
> >>> in stop_machine_run() - the stop_machine_run() handler must not 
> >>> take the text_lock of course - but outside code calling 
> >>> stop_machine_run() can do it and can hence serialize properly.
> >>>
> >>> Note that even if we did this then your v2 patch is not fully 
> >>> correct: you need to move the sync_core() at the end of the 
> >>> sequence inside the critical section too. (right now this is 
> >>> mostly harmless because the INVLPG inside the clear_fixmap() 
> >>> happens to be serializing so it has an implicit sync_core() 
> >>> property - but nevertheless we better do this straight away to 
> >>> not cause problems later down the line.)
> >>>
> >>> 	Ingo
> >> Agreed. The alternatives_smp_lock/alternatives_smp_unlock 
> >> specific case does not bring us much if it has no perceivable 
> >> performance impact. It's better to keep a standard interface 
> >> and clear requirements.
> > 
> > Note that i dont object to another aspect of this same change: 
> > the fact that it makes the whole sequence more atomic and more 
> > defensive [which is never bad of fragile interfaces].
> > 
> > I only got worried about the "lets use this without the text 
> > lock" ideas.
> > 
> > So if Masami-san sends a delta patch with a different changelog 
> > and with the sync_core() bit moved inside the critical section, 
> > i'll apply that too.
> 
> OK, here is the delta patch.
> 
> Expand irq-atomic region to cover fixmap using code and sync_core.
> 
> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Cc: Ingo Molnar <mingo@elte.hu>
> ---
>  arch/x86/kernel/alternative.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6-tip/arch/x86/kernel/alternative.c
> ===================================================================
> --- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
> +++ linux-2.6-tip/arch/x86/kernel/alternative.c
> @@ -526,13 +526,12 @@ void *__kprobes text_poke(void *addr, co
>  		pages[1] = virt_to_page(addr + PAGE_SIZE);
>  	}
>  	BUG_ON(!pages[0]);
> +	local_irq_save(flags);
>  	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
>  	if (pages[1])
>  		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
>  	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> -	local_irq_save(flags);
>  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> -	local_irq_restore(flags);
>  	clear_fixmap(FIX_TEXT_POKE0);
>  	if (pages[1])
>  		clear_fixmap(FIX_TEXT_POKE1);
> @@ -540,6 +539,7 @@ void *__kprobes text_poke(void *addr, co
>  	sync_core();
>  	/* Could also do a CLFLUSH here to speed up CPU recovery; but
>  	   that causes hangs on some VIA CPUs. */
> +	local_irq_restore(flags);
>  	for (i = 0; i < len; i++)
>  		BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);

I think irq off should cover the BUG_ON too. This safety check assumes
we are the only ones modifying "addr".

Mathieu

>  	return addr;
> -- 
> Masami Hiramatsu
> 
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
> 
> e-mail: mhiramat@redhat.com
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH -tip 5/4] Expands irq-off region in text_poke()
  2009-03-06 21:01               ` Mathieu Desnoyers
@ 2009-03-06 21:57                 ` Masami Hiramatsu
  2009-03-07  1:42                   ` Mathieu Desnoyers
  0 siblings, 1 reply; 26+ messages in thread
From: Masami Hiramatsu @ 2009-03-06 21:57 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, Andrew Morton, Nick Piggin, Steven Rostedt,
	Andi Kleen, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Linus Torvalds,
	Arjan van de Ven, Rusty Russell, H. Peter Anvin



Mathieu Desnoyers wrote:
> * Masami Hiramatsu (mhiramat@redhat.com) wrote:
>> Ingo Molnar wrote:
>>> * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
>>>
>>>> * Ingo Molnar (mingo@elte.hu) wrote:
>>>>> * Masami Hiramatsu <mhiramat@redhat.com> wrote:
>>>>>
>>>>>> @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
>>>>>>  		pages[1] = virt_to_page(addr + PAGE_SIZE);
>>>>>>  	}
>>>>>>  	BUG_ON(!pages[0]);
>>>>>> -	if (!pages[1])
>>>>>> -		nr_pages = 1;
>>>>>> -	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
>>>>>> -	BUG_ON(!vaddr);
>>>>>> -	local_irq_disable();
>>>>>> +	local_irq_save(flags);
>>>>>> +	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
>>>>>> +	if (pages[1])
>>>>>> +		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
>>>>>> +	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
>>>>>>  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
>>>>>> -	local_irq_enable();
>>>>>> -	vunmap(vaddr);
>>>>>> +	clear_fixmap(FIX_TEXT_POKE0);
>>>>>> +	if (pages[1])
>>>>>> +		clear_fixmap(FIX_TEXT_POKE1);
>>>>>> +	local_flush_tlb();
>>>>>> +	local_irq_restore(flags);
>>>>>>  	sync_core();
>>>>> I'm not sure at all about this widening of the irq-atomic 
>>>>> section and the idea of allowing non-locked access on single-CPU 
>>>>> situations - we dont really want to micro-optimize any of this 
>>>>> on such a level, holding the text lock is a robust rule all code 
>>>>> should be listening to. (Creating locking assymetry always 
>>>>> inserts a certain amount of fragility - adding to an already 
>>>>> fragile concept here.)
>>>>>
>>>>> And note that there's no reason why text_poke could not be used 
>>>>> in stop_machine_run() - the stop_machine_run() handler must not 
>>>>> take the text_lock of course - but outside code calling 
>>>>> stop_machine_run() can do it and can hence serialize properly.
>>>>>
>>>>> Note that even if we did this then your v2 patch is not fully 
>>>>> correct: you need to move the sync_core() at the end of the 
>>>>> sequence inside the critical section too. (right now this is 
>>>>> mostly harmless because the INVLPG inside the clear_fixmap() 
>>>>> happens to be serializing so it has an implicit sync_core() 
>>>>> property - but nevertheless we better do this straight away to 
>>>>> not cause problems later down the line.)
>>>>>
>>>>> 	Ingo
>>>> Agreed. The alternatives_smp_lock/alternatives_smp_unlock 
>>>> specific case does not bring us much if it has no perceivable 
>>>> performance impact. It's better to keep a standard interface 
>>>> and clear requirements.
>>> Note that i dont object to another aspect of this same change: 
>>> the fact that it makes the whole sequence more atomic and more 
>>> defensive [which is never bad of fragile interfaces].
>>>
>>> I only got worried about the "lets use this without the text 
>>> lock" ideas.
>>>
>>> So if Masami-san sends a delta patch with a different changelog 
>>> and with the sync_core() bit moved inside the critical section, 
>>> i'll apply that too.
>> OK, here is the delta patch.
>>
>> Expand irq-atomic region to cover fixmap using code and sync_core.
>>
>> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
>> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
>> Cc: Ingo Molnar <mingo@elte.hu>
>> ---
>>  arch/x86/kernel/alternative.c |    4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> Index: linux-2.6-tip/arch/x86/kernel/alternative.c
>> ===================================================================
>> --- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
>> +++ linux-2.6-tip/arch/x86/kernel/alternative.c
>> @@ -526,13 +526,12 @@ void *__kprobes text_poke(void *addr, co
>>  		pages[1] = virt_to_page(addr + PAGE_SIZE);
>>  	}
>>  	BUG_ON(!pages[0]);
>> +	local_irq_save(flags);
>>  	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
>>  	if (pages[1])
>>  		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
>>  	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
>> -	local_irq_save(flags);
>>  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
>> -	local_irq_restore(flags);
>>  	clear_fixmap(FIX_TEXT_POKE0);
>>  	if (pages[1])
>>  		clear_fixmap(FIX_TEXT_POKE1);
>> @@ -540,6 +539,7 @@ void *__kprobes text_poke(void *addr, co
>>  	sync_core();
>>  	/* Could also do a CLFLUSH here to speed up CPU recovery; but
>>  	   that causes hangs on some VIA CPUs. */
>> +	local_irq_restore(flags);
>>  	for (i = 0; i < len; i++)
>>  		BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
> 
> I think irq off should cover the BUG_ON too. This safety check assumes
> we are the only ones modifying "addr".

I think others don't change without text_mutex, don't it?

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH -tip 5/4] Expands irq-off region in text_poke()
  2009-03-06 21:57                 ` Masami Hiramatsu
@ 2009-03-07  1:42                   ` Mathieu Desnoyers
  2009-03-09 16:40                     ` Masami Hiramatsu
  0 siblings, 1 reply; 26+ messages in thread
From: Mathieu Desnoyers @ 2009-03-07  1:42 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Andrew Morton, Nick Piggin, Steven Rostedt,
	Andi Kleen, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Linus Torvalds,
	Arjan van de Ven, Rusty Russell, H. Peter Anvin

* Masami Hiramatsu (mhiramat@redhat.com) wrote:
> 
> 
> Mathieu Desnoyers wrote:
> > * Masami Hiramatsu (mhiramat@redhat.com) wrote:
> >> Ingo Molnar wrote:
> >>> * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> >>>
> >>>> * Ingo Molnar (mingo@elte.hu) wrote:
> >>>>> * Masami Hiramatsu <mhiramat@redhat.com> wrote:
> >>>>>
> >>>>>> @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
> >>>>>>  		pages[1] = virt_to_page(addr + PAGE_SIZE);
> >>>>>>  	}
> >>>>>>  	BUG_ON(!pages[0]);
> >>>>>> -	if (!pages[1])
> >>>>>> -		nr_pages = 1;
> >>>>>> -	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> >>>>>> -	BUG_ON(!vaddr);
> >>>>>> -	local_irq_disable();
> >>>>>> +	local_irq_save(flags);
> >>>>>> +	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> >>>>>> +	if (pages[1])
> >>>>>> +		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> >>>>>> +	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> >>>>>>  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> >>>>>> -	local_irq_enable();
> >>>>>> -	vunmap(vaddr);
> >>>>>> +	clear_fixmap(FIX_TEXT_POKE0);
> >>>>>> +	if (pages[1])
> >>>>>> +		clear_fixmap(FIX_TEXT_POKE1);
> >>>>>> +	local_flush_tlb();
> >>>>>> +	local_irq_restore(flags);
> >>>>>>  	sync_core();
> >>>>> I'm not sure at all about this widening of the irq-atomic 
> >>>>> section and the idea of allowing non-locked access on single-CPU 
> >>>>> situations - we dont really want to micro-optimize any of this 
> >>>>> on such a level, holding the text lock is a robust rule all code 
> >>>>> should be listening to. (Creating locking assymetry always 
> >>>>> inserts a certain amount of fragility - adding to an already 
> >>>>> fragile concept here.)
> >>>>>
> >>>>> And note that there's no reason why text_poke could not be used 
> >>>>> in stop_machine_run() - the stop_machine_run() handler must not 
> >>>>> take the text_lock of course - but outside code calling 
> >>>>> stop_machine_run() can do it and can hence serialize properly.
> >>>>>
> >>>>> Note that even if we did this then your v2 patch is not fully 
> >>>>> correct: you need to move the sync_core() at the end of the 
> >>>>> sequence inside the critical section too. (right now this is 
> >>>>> mostly harmless because the INVLPG inside the clear_fixmap() 
> >>>>> happens to be serializing so it has an implicit sync_core() 
> >>>>> property - but nevertheless we better do this straight away to 
> >>>>> not cause problems later down the line.)
> >>>>>
> >>>>> 	Ingo
> >>>> Agreed. The alternatives_smp_lock/alternatives_smp_unlock 
> >>>> specific case does not bring us much if it has no perceivable 
> >>>> performance impact. It's better to keep a standard interface 
> >>>> and clear requirements.
> >>> Note that i dont object to another aspect of this same change: 
> >>> the fact that it makes the whole sequence more atomic and more 
> >>> defensive [which is never bad of fragile interfaces].
> >>>
> >>> I only got worried about the "lets use this without the text 
> >>> lock" ideas.
> >>>
> >>> So if Masami-san sends a delta patch with a different changelog 
> >>> and with the sync_core() bit moved inside the critical section, 
> >>> i'll apply that too.
> >> OK, here is the delta patch.
> >>
> >> Expand irq-atomic region to cover fixmap using code and sync_core.
> >>
> >> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
> >> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> >> Cc: Ingo Molnar <mingo@elte.hu>
> >> ---
> >>  arch/x86/kernel/alternative.c |    4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> Index: linux-2.6-tip/arch/x86/kernel/alternative.c
> >> ===================================================================
> >> --- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
> >> +++ linux-2.6-tip/arch/x86/kernel/alternative.c
> >> @@ -526,13 +526,12 @@ void *__kprobes text_poke(void *addr, co
> >>  		pages[1] = virt_to_page(addr + PAGE_SIZE);
> >>  	}
> >>  	BUG_ON(!pages[0]);
> >> +	local_irq_save(flags);
> >>  	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> >>  	if (pages[1])
> >>  		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> >>  	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> >> -	local_irq_save(flags);
> >>  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> >> -	local_irq_restore(flags);
> >>  	clear_fixmap(FIX_TEXT_POKE0);
> >>  	if (pages[1])
> >>  		clear_fixmap(FIX_TEXT_POKE1);
> >> @@ -540,6 +539,7 @@ void *__kprobes text_poke(void *addr, co
> >>  	sync_core();
> >>  	/* Could also do a CLFLUSH here to speed up CPU recovery; but
> >>  	   that causes hangs on some VIA CPUs. */
> >> +	local_irq_restore(flags);
> >>  	for (i = 0; i < len; i++)
> >>  		BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
> > 
> > I think irq off should cover the BUG_ON too. This safety check assumes
> > we are the only ones modifying "addr".
> 
> I think others don't change without text_mutex, don't it?
> 

They shouldn't, but given we decided to grow the irq off region to
contain all the code that needs to be executed atomically, it should
also contain the BUG_ON, because it is expected to be as atomic as the
rest of the code.

Mathieu

> Thank you,
> 
> -- 
> Masami Hiramatsu
> 
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
> 
> e-mail: mhiramat@redhat.com
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [tip:tracing/core] tracing, Text Edit Lock - Architecture Independent Code
  2009-03-06 15:35 ` [PATCH -tip 1/4] Text Edit Lock - Architecture Independent Code (v2) Masami Hiramatsu
@ 2009-03-08 15:51   ` Mathieu Desnoyers
  0 siblings, 0 replies; 26+ messages in thread
From: Mathieu Desnoyers @ 2009-03-08 15:51 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, mathieu.desnoyers, hpa, mingo, tglx, mingo

Commit-ID:  0e39ac444636ff5be39b26f1cb56d79594654dda
Gitweb:     http://git.kernel.org/tip/0e39ac444636ff5be39b26f1cb56d79594654dda
Author:     "Mathieu Desnoyers" <mathieu.desnoyers@polymtl.ca>
AuthorDate: Fri, 6 Mar 2009 10:35:52 -0500
Commit:     Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 6 Mar 2009 16:48:59 +0100

tracing, Text Edit Lock - Architecture Independent Code

This is an architecture independant synchronization around kernel text
modifications through use of a global mutex.

A mutex has been chosen so that kprobes, the main user of this, can sleep
during memory allocation between the memory read of the instructions it
must replace and the memory write of the breakpoint.

Other user of this interface: immediate values.

Paravirt and alternatives are always done when SMP is inactive, so there
is no need to use locks.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
LKML-Reference: <49B142D8.7020601@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 include/linux/memory.h |    6 ++++++
 mm/memory.c            |   10 ++++++++++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/linux/memory.h b/include/linux/memory.h
index 3fdc108..86a6c0f 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -99,4 +99,10 @@ enum mem_add_context { BOOT, HOTPLUG };
 #define hotplug_memory_notifier(fn, pri) do { } while (0)
 #endif
 
+/*
+ * Kernel text modification mutex, used for code patching. Users of this lock
+ * can sleep.
+ */
+extern struct mutex text_mutex;
+
 #endif /* _LINUX_MEMORY_H_ */
diff --git a/mm/memory.c b/mm/memory.c
index baa999e..05fab3b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -48,6 +48,8 @@
 #include <linux/rmap.h>
 #include <linux/module.h>
 #include <linux/delayacct.h>
+#include <linux/kprobes.h>
+#include <linux/mutex.h>
 #include <linux/init.h>
 #include <linux/writeback.h>
 #include <linux/memcontrol.h>
@@ -99,6 +101,14 @@ int randomize_va_space __read_mostly =
 					2;
 #endif
 
+/*
+ * mutex protecting text section modification (dynamic code patching).
+ * some users need to sleep (allocating memory...) while they hold this lock.
+ *
+ * NOT exported to modules - patching kernel text is a really delicate matter.
+ */
+DEFINE_MUTEX(text_mutex);
+
 static int __init disable_randmaps(char *s)
 {
 	randomize_va_space = 0;

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [tip:tracing/core] tracing, Text Edit Lock - kprobes architecture independent support
  2009-03-06 15:36 ` [PATCH -tip 2/4]Text Edit Lock - kprobes architecture independent support (v3) Masami Hiramatsu
@ 2009-03-08 15:51   ` Mathieu Desnoyers
  0 siblings, 0 replies; 26+ messages in thread
From: Mathieu Desnoyers @ 2009-03-08 15:51 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, ananth, mathieu.desnoyers, hpa, mingo, tglx,
	mhiramat, mingo

Commit-ID:  4460fdad85becd569f11501ad5b91814814335ff
Gitweb:     http://git.kernel.org/tip/4460fdad85becd569f11501ad5b91814814335ff
Author:     "Mathieu Desnoyers" <mathieu.desnoyers@polymtl.ca>
AuthorDate: Fri, 6 Mar 2009 10:36:38 -0500
Commit:     Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 6 Mar 2009 16:49:00 +0100

tracing, Text Edit Lock - kprobes architecture independent support

Use the mutual exclusion provided by the text edit lock in the kprobes code. It
allows coherent manipulation of the kernel code by other subsystems.

Changelog:

Move the kernel_text_lock/unlock out of the for loops.
Use text_mutex directly instead of a function.
Remove whitespace modifications.

(note : kprobes_mutex is always taken outside of text_mutex)

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
LKML-Reference: <49B14306.2080202@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 kernel/kprobes.c |   15 +++++++++++++--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 7ba8cd9..479d4d5 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -43,6 +43,7 @@
 #include <linux/seq_file.h>
 #include <linux/debugfs.h>
 #include <linux/kdebug.h>
+#include <linux/memory.h>
 
 #include <asm-generic/sections.h>
 #include <asm/cacheflush.h>
@@ -699,9 +700,10 @@ int __kprobes register_kprobe(struct kprobe *p)
 		goto out;
 	}
 
+	mutex_lock(&text_mutex);
 	ret = arch_prepare_kprobe(p);
 	if (ret)
-		goto out;
+		goto out_unlock_text;
 
 	INIT_HLIST_NODE(&p->hlist);
 	hlist_add_head_rcu(&p->hlist,
@@ -710,6 +712,8 @@ int __kprobes register_kprobe(struct kprobe *p)
 	if (kprobe_enabled)
 		arch_arm_kprobe(p);
 
+out_unlock_text:
+	mutex_unlock(&text_mutex);
 out:
 	mutex_unlock(&kprobe_mutex);
 
@@ -746,8 +750,11 @@ valid_p:
 		 * enabled and not gone - otherwise, the breakpoint would
 		 * already have been removed. We save on flushing icache.
 		 */
-		if (kprobe_enabled && !kprobe_gone(old_p))
+		if (kprobe_enabled && !kprobe_gone(old_p)) {
+			mutex_lock(&text_mutex);
 			arch_disarm_kprobe(p);
+			mutex_unlock(&text_mutex);
+		}
 		hlist_del_rcu(&old_p->hlist);
 	} else {
 		if (p->break_handler && !kprobe_gone(p))
@@ -1280,12 +1287,14 @@ static void __kprobes enable_all_kprobes(void)
 	if (kprobe_enabled)
 		goto already_enabled;
 
+	mutex_lock(&text_mutex);
 	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
 		head = &kprobe_table[i];
 		hlist_for_each_entry_rcu(p, node, head, hlist)
 			if (!kprobe_gone(p))
 				arch_arm_kprobe(p);
 	}
+	mutex_unlock(&text_mutex);
 
 	kprobe_enabled = true;
 	printk(KERN_INFO "Kprobes globally enabled\n");
@@ -1310,6 +1319,7 @@ static void __kprobes disable_all_kprobes(void)
 
 	kprobe_enabled = false;
 	printk(KERN_INFO "Kprobes globally disabled\n");
+	mutex_lock(&text_mutex);
 	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
 		head = &kprobe_table[i];
 		hlist_for_each_entry_rcu(p, node, head, hlist) {
@@ -1318,6 +1328,7 @@ static void __kprobes disable_all_kprobes(void)
 		}
 	}
 
+	mutex_unlock(&text_mutex);
 	mutex_unlock(&kprobe_mutex);
 	/* Allow all currently running kprobes to complete */
 	synchronize_sched();

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [tip:tracing/core] tracing, Text Edit Lock - SMP alternatives support
  2009-03-06 15:37 ` [PATCH -tip 3/4]Text Edit Lock - Smp alternatives support Masami Hiramatsu
  2009-03-06 18:11   ` Mathieu Desnoyers
@ 2009-03-08 15:51   ` Masami Hiramatsu
  1 sibling, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2009-03-08 15:51 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tglx, mhiramat, mingo

Commit-ID:  3945dab45aa8c89014893bfa8eb1e1661a409cef
Gitweb:     http://git.kernel.org/tip/3945dab45aa8c89014893bfa8eb1e1661a409cef
Author:     "Masami Hiramatsu" <mhiramat@redhat.com>
AuthorDate: Fri, 6 Mar 2009 10:37:22 -0500
Commit:     Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 6 Mar 2009 16:49:00 +0100

tracing, Text Edit Lock - SMP alternatives support

Use the mutual exclusion provided by the text edit lock in alternatives code.
Since alternative_smp_* will be called from module init code, etc,
we'd better protect it from other subsystems.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
LKML-Reference: <49B14332.9030109@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 arch/x86/kernel/alternative.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 4c80f15..092a7b8 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -5,6 +5,7 @@
 #include <linux/kprobes.h>
 #include <linux/mm.h>
 #include <linux/vmalloc.h>
+#include <linux/memory.h>
 #include <asm/alternative.h>
 #include <asm/sections.h>
 #include <asm/pgtable.h>
@@ -226,6 +227,7 @@ static void alternatives_smp_lock(u8 **start, u8 **end, u8 *text, u8 *text_end)
 {
 	u8 **ptr;
 
+	mutex_lock(&text_mutex);
 	for (ptr = start; ptr < end; ptr++) {
 		if (*ptr < text)
 			continue;
@@ -234,6 +236,7 @@ static void alternatives_smp_lock(u8 **start, u8 **end, u8 *text, u8 *text_end)
 		/* turn DS segment override prefix into lock prefix */
 		text_poke(*ptr, ((unsigned char []){0xf0}), 1);
 	};
+	mutex_unlock(&text_mutex);
 }
 
 static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end)
@@ -243,6 +246,7 @@ static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end
 	if (noreplace_smp)
 		return;
 
+	mutex_lock(&text_mutex);
 	for (ptr = start; ptr < end; ptr++) {
 		if (*ptr < text)
 			continue;
@@ -251,6 +255,7 @@ static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end
 		/* turn lock prefix into DS segment override prefix */
 		text_poke(*ptr, ((unsigned char []){0x3E}), 1);
 	};
+	mutex_unlock(&text_mutex);
 }
 
 struct smp_alt_module {

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [tip:tracing/core] x86: implement atomic text_poke() via fixmap
  2009-03-06 15:37 ` [PATCH -tip 4/4] Atomic text_poke() with fixmap Masami Hiramatsu
  2009-03-06 18:13   ` Mathieu Desnoyers
@ 2009-03-08 15:51   ` Masami Hiramatsu
  1 sibling, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2009-03-08 15:51 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, mathieu.desnoyers, hpa, mingo, tglx, mhiramat, mingo

Commit-ID:  78ff7fae04554b49d29226ed12536268c2500d1f
Gitweb:     http://git.kernel.org/tip/78ff7fae04554b49d29226ed12536268c2500d1f
Author:     "Masami Hiramatsu" <mhiramat@redhat.com>
AuthorDate: Fri, 6 Mar 2009 10:37:54 -0500
Commit:     Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 6 Mar 2009 16:49:01 +0100

x86: implement atomic text_poke() via fixmap

Use fixmaps instead of vmap/vunmap in text_poke() for avoiding
page allocation and delayed unmapping.

At the result of above change, text_poke() becomes atomic and can be called
from stop_machine() etc.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
LKML-Reference: <49B14352.2040705@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 arch/x86/include/asm/fixmap.h |    2 ++
 arch/x86/kernel/alternative.c |   24 +++++++++++++++---------
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 63a79c7..81937a5 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -111,6 +111,8 @@ enum fixed_addresses {
 #ifdef CONFIG_PARAVIRT
 	FIX_PARAVIRT_BOOTMAP,
 #endif
+	FIX_TEXT_POKE0,	/* reserve 2 pages for text_poke() */
+	FIX_TEXT_POKE1,
 	__end_of_permanent_fixed_addresses,
 #ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
 	FIX_OHCI1394_BASE,
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 092a7b8..2d903b7 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -13,7 +13,9 @@
 #include <asm/nmi.h>
 #include <asm/vsyscall.h>
 #include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
 #include <asm/io.h>
+#include <asm/fixmap.h>
 
 #define MAX_PATCH_LEN (255-1)
 
@@ -505,15 +507,16 @@ void *text_poke_early(void *addr, const void *opcode, size_t len)
  * It means the size must be writable atomically and the address must be aligned
  * in a way that permits an atomic write. It also makes sure we fit on a single
  * page.
+ *
+ * Note: Must be called under text_mutex.
  */
 void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
 {
+	unsigned long flags;
 	char *vaddr;
-	int nr_pages = 2;
 	struct page *pages[2];
 	int i;
 
-	might_sleep();
 	if (!core_kernel_text((unsigned long)addr)) {
 		pages[0] = vmalloc_to_page(addr);
 		pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
@@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
 		pages[1] = virt_to_page(addr + PAGE_SIZE);
 	}
 	BUG_ON(!pages[0]);
-	if (!pages[1])
-		nr_pages = 1;
-	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
-	BUG_ON(!vaddr);
-	local_irq_disable();
+	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
+	if (pages[1])
+		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
+	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
+	local_irq_save(flags);
 	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
-	local_irq_enable();
-	vunmap(vaddr);
+	local_irq_restore(flags);
+	clear_fixmap(FIX_TEXT_POKE0);
+	if (pages[1])
+		clear_fixmap(FIX_TEXT_POKE1);
+	local_flush_tlb();
 	sync_core();
 	/* Could also do a CLFLUSH here to speed up CPU recovery; but
 	   that causes hangs on some VIA CPUs. */

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH -tip 5/4] Expands irq-off region in text_poke()
  2009-03-07  1:42                   ` Mathieu Desnoyers
@ 2009-03-09 16:40                     ` Masami Hiramatsu
  2009-03-10 21:57                       ` [tip:tracing/core] x86: expand " Masami Hiramatsu
  0 siblings, 1 reply; 26+ messages in thread
From: Masami Hiramatsu @ 2009-03-09 16:40 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, Andrew Morton, Nick Piggin, Steven Rostedt,
	Andi Kleen, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Frederic Weisbecker, Linus Torvalds,
	Arjan van de Ven, Rusty Russell, H. Peter Anvin

Mathieu Desnoyers wrote:
>>> I think irq off should cover the BUG_ON too. This safety check assumes
>>> we are the only ones modifying "addr".
>> I think others don't change without text_mutex, don't it?
>>
> 
> They shouldn't, but given we decided to grow the irq off region to
> contain all the code that needs to be executed atomically, it should
> also contain the BUG_ON, because it is expected to be as atomic as the
> rest of the code.

Hm, sure, I updated the delta patch.

Expand irq-off region to cover fixmap using code and cache synchronizing.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/alternative.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6-tip/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
+++ linux-2.6-tip/arch/x86/kernel/alternative.c
@@ -526,13 +526,12 @@ void *__kprobes text_poke(void *addr, co
 		pages[1] = virt_to_page(addr + PAGE_SIZE);
 	}
 	BUG_ON(!pages[0]);
+	local_irq_save(flags);
 	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
 	if (pages[1])
 		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
 	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
-	local_irq_save(flags);
 	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
-	local_irq_restore(flags);
 	clear_fixmap(FIX_TEXT_POKE0);
 	if (pages[1])
 		clear_fixmap(FIX_TEXT_POKE1);
@@ -542,5 +541,6 @@ void *__kprobes text_poke(void *addr, co
 	   that causes hangs on some VIA CPUs. */
 	for (i = 0; i < len; i++)
 		BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
+	local_irq_restore(flags);
 	return addr;
 }

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [tip:tracing/core] x86: expand irq-off region in text_poke()
  2009-03-09 16:40                     ` Masami Hiramatsu
@ 2009-03-10 21:57                       ` Masami Hiramatsu
  0 siblings, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2009-03-10 21:57 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, tglx, mhiramat, mingo

Commit-ID:  7cf49427042400d40bdc80b5c3399b6b5945afa8
Gitweb:     http://git.kernel.org/tip/7cf49427042400d40bdc80b5c3399b6b5945afa8
Author:     "Masami Hiramatsu" <mhiramat@redhat.com>
AuthorDate: Mon, 9 Mar 2009 12:40:40 -0400
Commit:     Ingo Molnar <mingo@elte.hu>
CommitDate: Tue, 10 Mar 2009 16:24:23 +0100

x86: expand irq-off region in text_poke()

Expand irq-off region to cover fixmap using code and cache synchronizing.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
LKML-Reference: <49B54688.8090403@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 arch/x86/kernel/alternative.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 2d903b7..f576587 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -526,13 +526,12 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
 		pages[1] = virt_to_page(addr + PAGE_SIZE);
 	}
 	BUG_ON(!pages[0]);
+	local_irq_save(flags);
 	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
 	if (pages[1])
 		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
 	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
-	local_irq_save(flags);
 	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
-	local_irq_restore(flags);
 	clear_fixmap(FIX_TEXT_POKE0);
 	if (pages[1])
 		clear_fixmap(FIX_TEXT_POKE1);
@@ -542,5 +541,6 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
 	   that causes hangs on some VIA CPUs. */
 	for (i = 0; i < len; i++)
 		BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
+	local_irq_restore(flags);
 	return addr;
 }

^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2009-03-10 21:58 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-06 15:34 [PATCH -tip 0/4] Text edit lock and atomic text_poke() Masami Hiramatsu
2009-03-06 15:35 ` [PATCH -tip 1/4] Text Edit Lock - Architecture Independent Code (v2) Masami Hiramatsu
2009-03-08 15:51   ` [tip:tracing/core] tracing, Text Edit Lock - Architecture Independent Code Mathieu Desnoyers
2009-03-06 15:36 ` [PATCH -tip 2/4]Text Edit Lock - kprobes architecture independent support (v3) Masami Hiramatsu
2009-03-08 15:51   ` [tip:tracing/core] tracing, Text Edit Lock - kprobes architecture independent support Mathieu Desnoyers
2009-03-06 15:37 ` [PATCH -tip 3/4]Text Edit Lock - Smp alternatives support Masami Hiramatsu
2009-03-06 18:11   ` Mathieu Desnoyers
2009-03-06 18:40     ` Masami Hiramatsu
2009-03-08 15:51   ` [tip:tracing/core] tracing, Text Edit Lock - SMP " Masami Hiramatsu
2009-03-06 15:37 ` [PATCH -tip 4/4] Atomic text_poke() with fixmap Masami Hiramatsu
2009-03-06 18:13   ` Mathieu Desnoyers
2009-03-06 18:18     ` Ingo Molnar
2009-03-06 18:33     ` [PATCH -tip 4/4] Atomic text_poke() with fixmap take2 Masami Hiramatsu
2009-03-06 18:45       ` Mathieu Desnoyers
2009-03-06 19:08       ` Ingo Molnar
2009-03-06 19:15         ` Mathieu Desnoyers
2009-03-06 19:22           ` Ingo Molnar
2009-03-06 20:25             ` [PATCH -tip 5/4] Expands irq-off region in text_poke() Masami Hiramatsu
2009-03-06 21:01               ` Mathieu Desnoyers
2009-03-06 21:57                 ` Masami Hiramatsu
2009-03-07  1:42                   ` Mathieu Desnoyers
2009-03-09 16:40                     ` Masami Hiramatsu
2009-03-10 21:57                       ` [tip:tracing/core] x86: expand " Masami Hiramatsu
2009-03-08 15:51   ` [tip:tracing/core] x86: implement atomic text_poke() via fixmap Masami Hiramatsu
2009-03-06 18:09 ` [PATCH -tip 0/4] Text edit lock and atomic text_poke() Mathieu Desnoyers
2009-03-06 18:12   ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).