linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [git pull] updates for tip
@ 2009-02-20 16:38 Steven Rostedt
  2009-02-20 16:38 ` [PATCH 1/5] ftrace: allow archs to preform pre and post process for code modification Steven Rostedt
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Steven Rostedt @ 2009-02-20 16:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra

Ingo,

This is based off of your tip/x86/urgent branch.

Please pull the latest tip/x86/ftrace tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/x86/ftrace


Steven Rostedt (5):
      ftrace: allow archs to preform pre and post process for code modification
      ftrace, x86: make kernel text writable only for conversions
      ftrace: immediately stop code modification if failure is detected
      ftrace: break out modify loop immediately on detection of error
      ftrace, x86: do not depend on system state for kernel text info

----
 arch/x86/include/asm/ftrace.h |   10 ++++++++++
 arch/x86/kernel/ftrace.c      |   16 ++++++++++++++++
 arch/x86/mm/init_32.c         |   37 ++++++++++++++++++++++++++++++++++---
 arch/x86/mm/init_64.c         |   39 ++++++++++++++++++++++++++++++++++-----
 include/linux/ftrace.h        |    3 +++
 kernel/trace/ftrace.c         |   34 +++++++++++++++++++++++++++++++++-
 6 files changed, 130 insertions(+), 9 deletions(-)
-- 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/5] ftrace: allow archs to preform pre and post process for code modification
  2009-02-20 16:38 [git pull] updates for tip Steven Rostedt
@ 2009-02-20 16:38 ` Steven Rostedt
  2009-02-20 17:25   ` Ingo Molnar
  2009-02-20 16:38 ` [PATCH 2/5] ftrace, x86: make kernel text writable only for conversions Steven Rostedt
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2009-02-20 16:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt

[-- Attachment #1: 0001-ftrace-allow-archs-to-preform-pre-and-post-process.patch --]
[-- Type: text/plain, Size: 2088 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

This patch creates the weak functions: ftrace_arch_modify_prepare
and ftrace_arch_modify_post_process that are called before and
after the stop machine is called to modify the kernel text.

If the arch needs to do pre or post processing, it only needs to define
these functions.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 include/linux/ftrace.h |    3 +++
 kernel/trace/ftrace.c  |   28 ++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 677432b..44bd470 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -99,6 +99,9 @@ stack_trace_sysctl(struct ctl_table *table, int write,
 /* asm/ftrace.h must be defined for archs supporting dynamic ftrace */
 #include <asm/ftrace.h>
 
+int ftrace_arch_modify_prepare(void);
+int ftrace_arch_modify_post_process(void);
+
 enum {
 	FTRACE_FL_FREE		= (1 << 0),
 	FTRACE_FL_FAILED	= (1 << 1),
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index fdf913d..cf54ba4 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -585,6 +585,24 @@ ftrace_code_disable(struct module *mod, struct dyn_ftrace *rec)
 	return 1;
 }
 
+/*
+ * archs can override this function if they must do something
+ * before the modifying code is performed.
+ */
+int __weak ftrace_arch_modify_prepare(void)
+{
+	return 0;
+}
+
+/*
+ * archs can override this function if they must do something
+ * after the modifying code is performed.
+ */
+int __weak ftrace_arch_modify_post_process(void)
+{
+	return 0;
+}
+
 static int __ftrace_modify_code(void *data)
 {
 	int *command = data;
@@ -607,7 +625,17 @@ static int __ftrace_modify_code(void *data)
 
 static void ftrace_run_update_code(int command)
 {
+	int ret;
+
+	ret = ftrace_arch_modify_prepare();
+	FTRACE_WARN_ON(ret);
+	if (ret)
+		return;
+
 	stop_machine(__ftrace_modify_code, &command, NULL);
+
+	ret = ftrace_arch_modify_post_process();
+	FTRACE_WARN_ON(ret);
 }
 
 static ftrace_func_t saved_ftrace_func;
-- 
1.5.6.5

-- 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/5] ftrace, x86: make kernel text writable only for conversions
  2009-02-20 16:38 [git pull] updates for tip Steven Rostedt
  2009-02-20 16:38 ` [PATCH 1/5] ftrace: allow archs to preform pre and post process for code modification Steven Rostedt
@ 2009-02-20 16:38 ` Steven Rostedt
  2009-02-20 17:30   ` Ingo Molnar
  2009-02-20 16:38 ` [PATCH 3/5] ftrace: immediately stop code modification if failure is detected Steven Rostedt
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2009-02-20 16:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt

[-- Attachment #1: 0002-ftrace-x86-make-kernel-text-writable-only-for-conv.patch --]
[-- Type: text/plain, Size: 5051 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Impact: keep kernel text read only

Because dynamic ftrace converts the calls to mcount into and out of
nops at run time, we needed to always keep the kernel text writable.

But this defeats the point of CONFIG_DEBUG_RODATA. This patch converts
the kernel code to writable before ftrace modifies the text, and converts
it back to read only afterward.

The conversion is done via stop_machine and no IPIs may be executed
at that time. The kernel text is set to write just before calling
stop_machine and set to read only again right afterward.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 arch/x86/include/asm/ftrace.h |   10 ++++++++++
 arch/x86/kernel/ftrace.c      |   20 ++++++++++++++++++++
 arch/x86/mm/init_32.c         |   27 ++++++++++++++++++++++++---
 arch/x86/mm/init_64.c         |   29 ++++++++++++++++++++++++-----
 4 files changed, 78 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index b55b4a7..5564cf3 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -80,4 +80,14 @@ extern void return_to_handler(void);
 #endif /* __ASSEMBLY__ */
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
+#ifndef __ASSEMBLY__
+#ifdef CONFIG_DEBUG_RODATA
+void set_kernel_text_rw(void);
+void set_kernel_text_ro(void);
+#else
+static inline void set_kernel_text_rw(void) { }
+static inline void set_kernel_text_ro(void) { }
+#endif
+#endif /* __ASSEMBLY__ */
+
 #endif /* _ASM_X86_FTRACE_H */
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 231bdd3..aa0e559 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -26,6 +26,26 @@
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 
+int ftrace_arch_modify_prepare(void)
+{
+	/* at boot up, we are still writable */
+	if (system_state != SYSTEM_RUNNING)
+		return 0;
+
+	set_kernel_text_rw();
+	return 0;
+}
+
+int ftrace_arch_modify_post_process(void)
+{
+	/* at boot up, we are still writable */
+	if (system_state != SYSTEM_RUNNING)
+		return 0;
+
+	set_kernel_text_ro();
+	return 0;
+}
+
 union ftrace_code_union {
 	char code[MCOUNT_INSN_SIZE];
 	struct {
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 2cef050..bcd7f00 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -1155,13 +1155,35 @@ static noinline int do_test_wp_bit(void)
 const int rodata_test_data = 0xC3;
 EXPORT_SYMBOL_GPL(rodata_test_data);
 
+/* used by ftrace */
+void set_kernel_text_rw(void)
+{
+	unsigned long start = PFN_ALIGN(_text);
+	unsigned long size = PFN_ALIGN(_etext) - start;
+
+	printk(KERN_INFO "Set kernel text: %lx - %lx for read write\n",
+	       start, start+size);
+
+	set_pages_rw(virt_to_page(start), size >> PAGE_SHIFT);
+}
+
+/* used by ftrace */
+void set_kernel_text_ro(void)
+{
+	unsigned long start = PFN_ALIGN(_text);
+	unsigned long size = PFN_ALIGN(_etext) - start;
+
+	printk(KERN_INFO "Set kernel text: %lx - %lx for read only\n",
+	       start, start+size);
+
+	set_pages_ro(virt_to_page(start), size >> PAGE_SHIFT);
+}
+
 void mark_rodata_ro(void)
 {
 	unsigned long start = PFN_ALIGN(_text);
 	unsigned long size = PFN_ALIGN(_etext) - start;
 
-#ifndef CONFIG_DYNAMIC_FTRACE
-	/* Dynamic tracing modifies the kernel text section */
 	set_pages_ro(virt_to_page(start), size >> PAGE_SHIFT);
 	printk(KERN_INFO "Write protecting the kernel text: %luk\n",
 		size >> 10);
@@ -1174,7 +1196,6 @@ void mark_rodata_ro(void)
 	printk(KERN_INFO "Testing CPA: write protecting again\n");
 	set_pages_ro(virt_to_page(start), size>>PAGE_SHIFT);
 #endif
-#endif /* CONFIG_DYNAMIC_FTRACE */
 
 	start += size;
 	size = (unsigned long)__end_rodata - start;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index e6d36b4..8c1b5ee 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -986,17 +986,36 @@ void free_initmem(void)
 const int rodata_test_data = 0xC3;
 EXPORT_SYMBOL_GPL(rodata_test_data);
 
+/* used by ftrace */
+void set_kernel_text_rw(void)
+{
+	unsigned long start = PFN_ALIGN(_stext);
+	unsigned long end = PFN_ALIGN(__start_rodata);
+
+	printk(KERN_INFO "Set kernel text: %lx - %lx for read write\n",
+	       start, end);
+
+	set_memory_rw(start, (end - start) >> PAGE_SHIFT);
+}
+
+/* used by ftrace */
+void set_kernel_text_ro(void)
+{
+	unsigned long start = PFN_ALIGN(_stext);
+	unsigned long end = PFN_ALIGN(__start_rodata);
+
+	printk(KERN_INFO "Set kernel text: %lx - %lx for read only\n",
+	       start, end);
+
+	set_memory_ro(start, (end - start) >> PAGE_SHIFT);
+}
+
 void mark_rodata_ro(void)
 {
 	unsigned long start = PFN_ALIGN(_stext), end = PFN_ALIGN(__end_rodata);
 	unsigned long rodata_start =
 		((unsigned long)__start_rodata + PAGE_SIZE - 1) & PAGE_MASK;
 
-#ifdef CONFIG_DYNAMIC_FTRACE
-	/* Dynamic tracing modifies the kernel text section */
-	start = rodata_start;
-#endif
-
 	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
 	       (end - start) >> 10);
 	set_memory_ro(start, (end - start) >> PAGE_SHIFT);
-- 
1.5.6.5

-- 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/5] ftrace: immediately stop code modification if failure is detected
  2009-02-20 16:38 [git pull] updates for tip Steven Rostedt
  2009-02-20 16:38 ` [PATCH 1/5] ftrace: allow archs to preform pre and post process for code modification Steven Rostedt
  2009-02-20 16:38 ` [PATCH 2/5] ftrace, x86: make kernel text writable only for conversions Steven Rostedt
@ 2009-02-20 16:38 ` Steven Rostedt
  2009-02-20 16:38 ` [PATCH 4/5] ftrace: break out modify loop immediately on detection of error Steven Rostedt
  2009-02-20 16:38 ` [PATCH 5/5] ftrace, x86: do not depend on system state for kernel text info Steven Rostedt
  4 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2009-02-20 16:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt

[-- Attachment #1: 0003-ftrace-immediately-stop-code-modification-if-failur.patch --]
[-- Type: text/plain, Size: 1634 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Impact: fix to prevent NMI lockup

If the page fault handler produces a WARN_ON in the modifying of
text, and the system is setup to have a high frequency of NMIs,
we can lock up the system on a failure to modify code.

The modifying of code with NMIs allows all NMIs to modify the code
if it is about to run. This prevents a modifier on one CPU from
modifying code running in NMI context on another CPU. The modifying
is done through stop_machine, so only NMIs must be considered.

But if the write causes the page fault handler to produce a warning,
the print can slow it down enough that as soon as it is done
it will take another NMI before going back to the process context.
The new NMI will perform the write again causing another print and
this will hang the box.

This patch turns off the writing as soon as a failure is detected
and does not wait for it to be turned off by the process context.
This will keep NMIs from getting stuck in this back and forth
of print outs.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 arch/x86/kernel/ftrace.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index aa0e559..c9ba2f9 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -131,6 +131,10 @@ static void ftrace_mod_code(void)
 	 */
 	mod_code_status = probe_kernel_write(mod_code_ip, mod_code_newcode,
 					     MCOUNT_INSN_SIZE);
+
+	/* if we fail, then kill any new writers */
+	if (mod_code_status)
+		mod_code_write = 0;
 }
 
 void ftrace_nmi_enter(void)
-- 
1.5.6.5

-- 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/5] ftrace: break out modify loop immediately on detection of error
  2009-02-20 16:38 [git pull] updates for tip Steven Rostedt
                   ` (2 preceding siblings ...)
  2009-02-20 16:38 ` [PATCH 3/5] ftrace: immediately stop code modification if failure is detected Steven Rostedt
@ 2009-02-20 16:38 ` Steven Rostedt
  2009-02-20 17:33   ` Ingo Molnar
  2009-02-20 16:38 ` [PATCH 5/5] ftrace, x86: do not depend on system state for kernel text info Steven Rostedt
  4 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2009-02-20 16:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt

[-- Attachment #1: 0004-ftrace-break-out-modify-loop-immediately-on-detecti.patch --]
[-- Type: text/plain, Size: 867 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Impact: added precaution on failure detection

Break out of the modifying loop as soon as a failure is detected.
This is just an added precaution found by code review and was not
found by any bug chasing.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 kernel/trace/ftrace.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index cf54ba4..3b3e0e0 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -561,11 +561,15 @@ static void ftrace_replace_code(int enable)
 				if ((system_state == SYSTEM_BOOTING) ||
 				    !core_kernel_text(rec->ip)) {
 					ftrace_free_rec(rec);
-				} else
+				} else {
 					ftrace_bug(failed, rec->ip);
+					goto out;
+				}
 			}
 		}
 	}
+ out:
+	return;
 }
 
 static int
-- 
1.5.6.5

-- 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/5] ftrace, x86: do not depend on system state for kernel text info
  2009-02-20 16:38 [git pull] updates for tip Steven Rostedt
                   ` (3 preceding siblings ...)
  2009-02-20 16:38 ` [PATCH 4/5] ftrace: break out modify loop immediately on detection of error Steven Rostedt
@ 2009-02-20 16:38 ` Steven Rostedt
  2009-02-20 17:34   ` Ingo Molnar
  4 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2009-02-20 16:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt

[-- Attachment #1: 0005-ftrace-x86-do-not-depend-on-system-state-for-kerne.patch --]
[-- Type: text/plain, Size: 3481 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Andrew Morton pointed out that using SYSTEM_STATE is a bad idea
since there is no guarantee to what its state will actually be.

Instead, I moved the check into the set_kernel_text_* functions
themselves, and use a local variable to determine when it is
OK to change the kernel text RW permissions.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 arch/x86/kernel/ftrace.c |    8 --------
 arch/x86/mm/init_32.c    |   10 ++++++++++
 arch/x86/mm/init_64.c    |   10 ++++++++++
 3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index c9ba2f9..025d783 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -28,20 +28,12 @@
 
 int ftrace_arch_modify_prepare(void)
 {
-	/* at boot up, we are still writable */
-	if (system_state != SYSTEM_RUNNING)
-		return 0;
-
 	set_kernel_text_rw();
 	return 0;
 }
 
 int ftrace_arch_modify_post_process(void)
 {
-	/* at boot up, we are still writable */
-	if (system_state != SYSTEM_RUNNING)
-		return 0;
-
 	set_kernel_text_ro();
 	return 0;
 }
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index bcd7f00..9ca4c57 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -1155,12 +1155,17 @@ static noinline int do_test_wp_bit(void)
 const int rodata_test_data = 0xC3;
 EXPORT_SYMBOL_GPL(rodata_test_data);
 
+static int kernel_set_to_readonly;
+
 /* used by ftrace */
 void set_kernel_text_rw(void)
 {
 	unsigned long start = PFN_ALIGN(_text);
 	unsigned long size = PFN_ALIGN(_etext) - start;
 
+	if (!kernel_set_to_readonly)
+		return;
+
 	printk(KERN_INFO "Set kernel text: %lx - %lx for read write\n",
 	       start, start+size);
 
@@ -1173,6 +1178,9 @@ void set_kernel_text_ro(void)
 	unsigned long start = PFN_ALIGN(_text);
 	unsigned long size = PFN_ALIGN(_etext) - start;
 
+	if (!kernel_set_to_readonly)
+		return;
+
 	printk(KERN_INFO "Set kernel text: %lx - %lx for read only\n",
 	       start, start+size);
 
@@ -1188,6 +1196,8 @@ void mark_rodata_ro(void)
 	printk(KERN_INFO "Write protecting the kernel text: %luk\n",
 		size >> 10);
 
+	kernel_set_to_readonly = 1;
+
 #ifdef CONFIG_CPA_DEBUG
 	printk(KERN_INFO "Testing CPA: Reverting %lx-%lx\n",
 		start, start+size);
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 8c1b5ee..c204433 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -986,12 +986,17 @@ void free_initmem(void)
 const int rodata_test_data = 0xC3;
 EXPORT_SYMBOL_GPL(rodata_test_data);
 
+static int kernel_set_to_readonly;
+
 /* used by ftrace */
 void set_kernel_text_rw(void)
 {
 	unsigned long start = PFN_ALIGN(_stext);
 	unsigned long end = PFN_ALIGN(__start_rodata);
 
+	if (!kernel_set_to_readonly)
+		return;
+
 	printk(KERN_INFO "Set kernel text: %lx - %lx for read write\n",
 	       start, end);
 
@@ -1004,6 +1009,9 @@ void set_kernel_text_ro(void)
 	unsigned long start = PFN_ALIGN(_stext);
 	unsigned long end = PFN_ALIGN(__start_rodata);
 
+	if (!kernel_set_to_readonly)
+		return;
+
 	printk(KERN_INFO "Set kernel text: %lx - %lx for read only\n",
 	       start, end);
 
@@ -1020,6 +1028,8 @@ void mark_rodata_ro(void)
 	       (end - start) >> 10);
 	set_memory_ro(start, (end - start) >> PAGE_SHIFT);
 
+	kernel_set_to_readonly = 1;
+
 	/*
 	 * The rodata section (but not the kernel text!) should also be
 	 * not-executable.
-- 
1.5.6.5

-- 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] ftrace: allow archs to preform pre and post process for code modification
  2009-02-20 16:38 ` [PATCH 1/5] ftrace: allow archs to preform pre and post process for code modification Steven Rostedt
@ 2009-02-20 17:25   ` Ingo Molnar
  2009-02-20 17:32     ` Steven Rostedt
  0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2009-02-20 17:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt


* Steven Rostedt <rostedt@goodmis.org> wrote:

> +int ftrace_arch_modify_prepare(void);
> +int ftrace_arch_modify_post_process(void);

i think it would be clearer to name it:

extern int ftrace_arch_code_modify_prepare(void);
extern int ftrace_arch_code_modify_finish(void);

?

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/5] ftrace, x86: make kernel text writable only for conversions
  2009-02-20 16:38 ` [PATCH 2/5] ftrace, x86: make kernel text writable only for conversions Steven Rostedt
@ 2009-02-20 17:30   ` Ingo Molnar
  2009-02-20 17:52     ` Steven Rostedt
  0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2009-02-20 17:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt


* Steven Rostedt <rostedt@goodmis.org> wrote:

> diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> index b55b4a7..5564cf3 100644
> --- a/arch/x86/include/asm/ftrace.h
> +++ b/arch/x86/include/asm/ftrace.h
> @@ -80,4 +80,14 @@ extern void return_to_handler(void);
>  #endif /* __ASSEMBLY__ */
>  #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
>  
> +#ifndef __ASSEMBLY__
> +#ifdef CONFIG_DEBUG_RODATA
> +void set_kernel_text_rw(void);
> +void set_kernel_text_ro(void);
> +#else
> +static inline void set_kernel_text_rw(void) { }
> +static inline void set_kernel_text_ro(void) { }
> +#endif
> +#endif /* __ASSEMBLY__ */

The proper place for this is where the other pagetable attribute 
functions reside: arch/x86/include/asm/cacheflush.h.

> +/* used by ftrace */
> +void set_kernel_text_rw(void)
> +{
> +	unsigned long start = PFN_ALIGN(_text);
> +	unsigned long size = PFN_ALIGN(_etext) - start;
> +
> +	printk(KERN_INFO "Set kernel text: %lx - %lx for read write\n",
> +	       start, start+size);
> +
> +	set_pages_rw(virt_to_page(start), size >> PAGE_SHIFT);

Wont this be executed by every suspend/resume cycle? I'm not 
sure we want a KERN_INFO printout every time.

> +/* used by ftrace */
> +void set_kernel_text_rw(void)

i'd leave out the 'used by ftrace' bit - more uses might arise. 
How does kprobes get around readonly pages, it uses these APIs 
too, right?

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] ftrace: allow archs to preform pre and post process for code modification
  2009-02-20 17:25   ` Ingo Molnar
@ 2009-02-20 17:32     ` Steven Rostedt
  0 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2009-02-20 17:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt


On Fri, 20 Feb 2009, Ingo Molnar wrote:

> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > +int ftrace_arch_modify_prepare(void);
> > +int ftrace_arch_modify_post_process(void);
> 
> i think it would be clearer to name it:
> 
> extern int ftrace_arch_code_modify_prepare(void);
> extern int ftrace_arch_code_modify_finish(void);
> 
> ?

Sure, I'll write up a patch.

-- Steve


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] ftrace: break out modify loop immediately on detection of error
  2009-02-20 16:38 ` [PATCH 4/5] ftrace: break out modify loop immediately on detection of error Steven Rostedt
@ 2009-02-20 17:33   ` Ingo Molnar
  2009-02-20 17:54     ` Steven Rostedt
  0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2009-02-20 17:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt


* Steven Rostedt <rostedt@goodmis.org> wrote:

> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -561,11 +561,15 @@ static void ftrace_replace_code(int enable)
>  				if ((system_state == SYSTEM_BOOTING) ||
>  				    !core_kernel_text(rec->ip)) {
>  					ftrace_free_rec(rec);
> -				} else
> +				} else {
>  					ftrace_bug(failed, rec->ip);
> +					goto out;
> +				}
>  			}
>  		}
>  	}
> + out:
> +	return;

wouldnt a simple 'break' suffice? Hm, nope, 
do_for_each_ftrace_rec() is a double loop.

Then perhaps a 'return' would perhaps be cleaner in this case. 
(even though it does make the flow a bit assymetric - the out 
label and the empty return looks a bit ugly.)

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/5] ftrace, x86: do not depend on system state for kernel text info
  2009-02-20 16:38 ` [PATCH 5/5] ftrace, x86: do not depend on system state for kernel text info Steven Rostedt
@ 2009-02-20 17:34   ` Ingo Molnar
  2009-02-20 17:54     ` Steven Rostedt
  0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2009-02-20 17:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt


* Steven Rostedt <rostedt@goodmis.org> wrote:

> From: Steven Rostedt <srostedt@redhat.com>
> 
> Andrew Morton pointed out that using SYSTEM_STATE is a bad idea
> since there is no guarantee to what its state will actually be.
> 
> Instead, I moved the check into the set_kernel_text_* functions
> themselves, and use a local variable to determine when it is
> OK to change the kernel text RW permissions.
> 
> Reported-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> ---
>  arch/x86/kernel/ftrace.c |    8 --------
>  arch/x86/mm/init_32.c    |   10 ++++++++++
>  arch/x86/mm/init_64.c    |   10 ++++++++++
>  3 files changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> index c9ba2f9..025d783 100644
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -28,20 +28,12 @@
>  
>  int ftrace_arch_modify_prepare(void)
>  {
> -	/* at boot up, we are still writable */
> -	if (system_state != SYSTEM_RUNNING)
> -		return 0;
> -

Since i suspect you'll rebase anyway, i'd suggest to squash this 
patch into patch #2, with credit to Andrew added.

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/5] ftrace, x86: make kernel text writable only for conversions
  2009-02-20 17:30   ` Ingo Molnar
@ 2009-02-20 17:52     ` Steven Rostedt
  2009-02-20 17:55       ` Ingo Molnar
  0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2009-02-20 17:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt


On Fri, 20 Feb 2009, Ingo Molnar wrote:

> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> > index b55b4a7..5564cf3 100644
> > --- a/arch/x86/include/asm/ftrace.h
> > +++ b/arch/x86/include/asm/ftrace.h
> > @@ -80,4 +80,14 @@ extern void return_to_handler(void);
> >  #endif /* __ASSEMBLY__ */
> >  #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> >  
> > +#ifndef __ASSEMBLY__
> > +#ifdef CONFIG_DEBUG_RODATA
> > +void set_kernel_text_rw(void);
> > +void set_kernel_text_ro(void);
> > +#else
> > +static inline void set_kernel_text_rw(void) { }
> > +static inline void set_kernel_text_ro(void) { }
> > +#endif
> > +#endif /* __ASSEMBLY__ */
> 
> The proper place for this is where the other pagetable attribute 
> functions reside: arch/x86/include/asm/cacheflush.h.

Heh, I put it there because I was not sure where the proper place should 
go. I figured someone would comment and tell me where to put it ;-)

> 
> > +/* used by ftrace */
> > +void set_kernel_text_rw(void)
> > +{
> > +	unsigned long start = PFN_ALIGN(_text);
> > +	unsigned long size = PFN_ALIGN(_etext) - start;
> > +
> > +	printk(KERN_INFO "Set kernel text: %lx - %lx for read write\n",
> > +	       start, start+size);
> > +
> > +	set_pages_rw(virt_to_page(start), size >> PAGE_SHIFT);
> 
> Wont this be executed by every suspend/resume cycle? I'm not 
> sure we want a KERN_INFO printout every time.

Yeah, agreed. My original thought was to allow a sysadmin to know when
something was changing the kernel text to read/write. But it does fill
up the logs quite a bit. I'll switch it to a pr_debug.

> 
> > +/* used by ftrace */
> > +void set_kernel_text_rw(void)
> 
> i'd leave out the 'used by ftrace' bit - more uses might arise. 
> How does kprobes get around readonly pages, it uses these APIs 
> too, right?

kprobes uses text_poke. text_poke uses vmap to create its own page table 
pointers to the text memory, does the modification and then removes the 
pointers. This is quite heavy weight and since ftrace needs to modify 10s 
of thousands of areas, converting all of kernel text page tables is much 
more efficient.

Note, if kprobes did change the original page tables, then it too would 
have hit the split_large_page bug too.

-- Steve


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] ftrace: break out modify loop immediately on detection of error
  2009-02-20 17:33   ` Ingo Molnar
@ 2009-02-20 17:54     ` Steven Rostedt
  0 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2009-02-20 17:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt


On Fri, 20 Feb 2009, Ingo Molnar wrote:

> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -561,11 +561,15 @@ static void ftrace_replace_code(int enable)
> >  				if ((system_state == SYSTEM_BOOTING) ||
> >  				    !core_kernel_text(rec->ip)) {
> >  					ftrace_free_rec(rec);
> > -				} else
> > +				} else {
> >  					ftrace_bug(failed, rec->ip);
> > +					goto out;
> > +				}
> >  			}
> >  		}
> >  	}
> > + out:
> > +	return;
> 
> wouldnt a simple 'break' suffice? Hm, nope, 
> do_for_each_ftrace_rec() is a double loop.
> 
> Then perhaps a 'return' would perhaps be cleaner in this case. 
> (even though it does make the flow a bit assymetric - the out 
> label and the empty return looks a bit ugly.)

I guess a return would work too. I did the out jump to annotate the escape 
a bit more. Both are ugly IMHO. I'll switch it to return in the else 
statement.

-- Steve


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/5] ftrace, x86: do not depend on system state for kernel text info
  2009-02-20 17:34   ` Ingo Molnar
@ 2009-02-20 17:54     ` Steven Rostedt
  0 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2009-02-20 17:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt


On Fri, 20 Feb 2009, Ingo Molnar wrote:
> 
> Since i suspect you'll rebase anyway, i'd suggest to squash this 
> patch into patch #2, with credit to Andrew added.

Yeah, there's enough things to fix to warrant a rebase.

Thanks,

-- Steve


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/5] ftrace, x86: make kernel text writable only for conversions
  2009-02-20 17:52     ` Steven Rostedt
@ 2009-02-20 17:55       ` Ingo Molnar
  2009-02-20 18:00         ` Steven Rostedt
  0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2009-02-20 17:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt


* Steven Rostedt <rostedt@goodmis.org> wrote:

> > > +/* used by ftrace */
> > > +void set_kernel_text_rw(void)
> > 
> > i'd leave out the 'used by ftrace' bit - more uses might 
> > arise. How does kprobes get around readonly pages, it uses 
> > these APIs too, right?
> 
> kprobes uses text_poke. text_poke uses vmap to create its own 
> page table pointers to the text memory, does the modification 
> and then removes the pointers. This is quite heavy weight and 
> since ftrace needs to modify 10s of thousands of areas, 
> converting all of kernel text page tables is much more 
> efficient.
> 
> Note, if kprobes did change the original page tables, then it 
> too would have hit the split_large_page bug too.

i think the real reason is that kprobes was written before we 
had this more flexible and more usable CPA code.

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/5] ftrace, x86: make kernel text writable only for conversions
  2009-02-20 17:55       ` Ingo Molnar
@ 2009-02-20 18:00         ` Steven Rostedt
  0 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2009-02-20 18:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Steven Rostedt


On Fri, 20 Feb 2009, Ingo Molnar wrote:

> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > > > +/* used by ftrace */
> > > > +void set_kernel_text_rw(void)
> > > 
> > > i'd leave out the 'used by ftrace' bit - more uses might 
> > > arise. How does kprobes get around readonly pages, it uses 
> > > these APIs too, right?
> > 
> > kprobes uses text_poke. text_poke uses vmap to create its own 
> > page table pointers to the text memory, does the modification 
> > and then removes the pointers. This is quite heavy weight and 
> > since ftrace needs to modify 10s of thousands of areas, 
> > converting all of kernel text page tables is much more 
> > efficient.
> > 
> > Note, if kprobes did change the original page tables, then it 
> > too would have hit the split_large_page bug too.
> 
> i think the real reason is that kprobes was written before we 
> had this more flexible and more usable CPA code.

I'm not saying that was the reason kprobes did it that way. I was just 
answering your question about how kprobes does it today ;-)

Perhaps it may be more efficient to convert kprobes to use the 
set_memory_rw/ro API now.

-- Steve


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2009-02-20 18:00 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-20 16:38 [git pull] updates for tip Steven Rostedt
2009-02-20 16:38 ` [PATCH 1/5] ftrace: allow archs to preform pre and post process for code modification Steven Rostedt
2009-02-20 17:25   ` Ingo Molnar
2009-02-20 17:32     ` Steven Rostedt
2009-02-20 16:38 ` [PATCH 2/5] ftrace, x86: make kernel text writable only for conversions Steven Rostedt
2009-02-20 17:30   ` Ingo Molnar
2009-02-20 17:52     ` Steven Rostedt
2009-02-20 17:55       ` Ingo Molnar
2009-02-20 18:00         ` Steven Rostedt
2009-02-20 16:38 ` [PATCH 3/5] ftrace: immediately stop code modification if failure is detected Steven Rostedt
2009-02-20 16:38 ` [PATCH 4/5] ftrace: break out modify loop immediately on detection of error Steven Rostedt
2009-02-20 17:33   ` Ingo Molnar
2009-02-20 17:54     ` Steven Rostedt
2009-02-20 16:38 ` [PATCH 5/5] ftrace, x86: do not depend on system state for kernel text info Steven Rostedt
2009-02-20 17:34   ` Ingo Molnar
2009-02-20 17:54     ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).