All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
@ 2013-12-04  1:28 Masami Hiramatsu
  2013-12-04  1:28 ` [PATCH -tip v4 1/6] kprobes: Prohibit probing on .entry.text code Masami Hiramatsu
                   ` (7 more replies)
  0 siblings, 8 replies; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-04  1:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller

Hi,
Here is the version 4 of NOKPORBE_SYMBOL series.

In this version, I removed the cleanup patches and
add bugfixes I've found, since those bugs will be
critical.
Rest of the cleanup and visible blacklists will be
proposed later in another series.

Oh, just one new thing, I added a new RFC patch which
removes the dependency of notify_die() from kprobes
miss-hit/recovery path. Since the notify_die() involves
locking and lockdep code which invokes a lot of heavy
printk functions etc. This helped me to minimize the
blacklist and provides more stability for kprobes.
Actually, most of int3 handlers are already called
from do_int3 directly, I think this change is acceptable
too.

Here is the updates about NOKPROBE_SYMBOL().
 - Now _ASM_NOKPROBE() macro is introduced for assembly
   symbols on x86.
 - Rename kprobe_blackpoint to kprobe_blacklist_entry
   and simplify it. Also NOKPROBE_SYMBOL() macro just
   saves the address of non-probe-able symbols.

---

Masami Hiramatsu (6):
      kprobes: Prohibit probing on .entry.text code
      kprobes: Introduce NOKPROBE_SYMBOL() macro for blacklist
      [BUGFIX] kprobes/x86: Prohibit probing on debug_stack_*
      [BUGFIX] x86: Prohibit probing on native_set_debugreg
      [BUGFIX] x86: Prohibit probing on thunk functions and restore
      [RFC] kprobes/x86: Call exception handlers directly from do_int3/do_debug


 Documentation/kprobes.txt         |   16 +++++
 arch/x86/include/asm/asm.h        |    7 ++
 arch/x86/include/asm/kprobes.h    |    2 +
 arch/x86/kernel/cpu/common.c      |    4 +
 arch/x86/kernel/entry_32.S        |   33 -----------
 arch/x86/kernel/entry_64.S        |   20 -------
 arch/x86/kernel/kprobes/core.c    |   32 ++++------
 arch/x86/kernel/paravirt.c        |    5 ++
 arch/x86/kernel/traps.c           |   10 +++
 arch/x86/lib/thunk_32.S           |    3 +
 arch/x86/lib/thunk_64.S           |    3 +
 include/asm-generic/vmlinux.lds.h |    9 +++
 include/linux/kprobes.h           |   21 ++++++-
 kernel/kprobes.c                  |  113 ++++++++++++++++++++-----------------
 kernel/sched/core.c               |    1 
 15 files changed, 147 insertions(+), 132 deletions(-)

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH -tip v4 1/6] kprobes: Prohibit probing on .entry.text code
  2013-12-04  1:28 [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs Masami Hiramatsu
@ 2013-12-04  1:28 ` Masami Hiramatsu
  2013-12-04  1:28 ` [PATCH -tip v4 2/6] kprobes: Introduce NOKPROBE_SYMBOL() macro for blacklist Masami Hiramatsu
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-04  1:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ananth N Mavinakayanahalli, Sandeepa Prabhu, Frederic Weisbecker,
	x86, lkml, Steven Rostedt (Red Hat),
	Peter Zijlstra, Ingo Molnar, Al Viro, systemtap, H. Peter Anvin,
	Thomas Gleixner, Seiji Aguchi, David S. Miller

.entry.text is a code area which is used for interrupt/syscall
entries, and there are many sensitive codes.
Thus, it is better to prohibit probing on all of such codes
instead of a part of that.
Since some symbols are already registered on kprobe blacklist,
this also removes them from the blacklist.

Changes from previous:
 - Introduce arch_within_kprobe_blacklist() which checks
   the address is within the .kprobes.text (generic,x86) or
   .entry.text (x86), for fixing build issue on !x86.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/x86/kernel/entry_32.S     |   33 ---------------------------------
 arch/x86/kernel/entry_64.S     |   20 --------------------
 arch/x86/kernel/kprobes/core.c |    8 ++++++++
 include/linux/kprobes.h        |    1 +
 kernel/kprobes.c               |   13 ++++++++-----
 5 files changed, 17 insertions(+), 58 deletions(-)

diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 51e2988..02c2fef 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -315,10 +315,6 @@ ENTRY(ret_from_kernel_thread)
 ENDPROC(ret_from_kernel_thread)
 
 /*
- * Interrupt exit functions should be protected against kprobes
- */
-	.pushsection .kprobes.text, "ax"
-/*
  * Return to user mode is not as complex as all this looks,
  * but we want the default path for a system call return to
  * go as quickly as possible which is why some of this is
@@ -372,10 +368,6 @@ need_resched:
 END(resume_kernel)
 #endif
 	CFI_ENDPROC
-/*
- * End of kprobes section
- */
-	.popsection
 
 /* SYSENTER_RETURN points to after the "sysenter" instruction in
    the vsyscall page.  See vsyscall-sysentry.S, which defines the symbol.  */
@@ -495,10 +487,6 @@ sysexit_audit:
 	PTGS_TO_GS_EX
 ENDPROC(ia32_sysenter_target)
 
-/*
- * syscall stub including irq exit should be protected against kprobes
- */
-	.pushsection .kprobes.text, "ax"
 	# system call handler stub
 ENTRY(system_call)
 	RING0_INT_FRAME			# can't unwind into user space anyway
@@ -691,10 +679,6 @@ syscall_badsys:
 	jmp resume_userspace
 END(syscall_badsys)
 	CFI_ENDPROC
-/*
- * End of kprobes section
- */
-	.popsection
 
 .macro FIXUP_ESPFIX_STACK
 /*
@@ -781,10 +765,6 @@ common_interrupt:
 ENDPROC(common_interrupt)
 	CFI_ENDPROC
 
-/*
- *  Irq entries should be protected against kprobes
- */
-	.pushsection .kprobes.text, "ax"
 #define BUILD_INTERRUPT3(name, nr, fn)	\
 ENTRY(name)				\
 	RING0_INT_FRAME;		\
@@ -961,10 +941,6 @@ ENTRY(spurious_interrupt_bug)
 	jmp error_code
 	CFI_ENDPROC
 END(spurious_interrupt_bug)
-/*
- * End of kprobes section
- */
-	.popsection
 
 #ifdef CONFIG_XEN
 /* Xen doesn't set %esp to be precisely what the normal sysenter
@@ -1239,11 +1215,6 @@ return_to_handler:
 	jmp *%ecx
 #endif
 
-/*
- * Some functions should be protected against kprobes
- */
-	.pushsection .kprobes.text, "ax"
-
 #ifdef CONFIG_TRACING
 ENTRY(trace_page_fault)
 	RING0_EC_FRAME
@@ -1453,7 +1424,3 @@ ENTRY(async_page_fault)
 END(async_page_fault)
 #endif
 
-/*
- * End of kprobes section
- */
-	.popsection
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index e21b078..c48f8f9 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -487,8 +487,6 @@ ENDPROC(native_usergs_sysret64)
 	TRACE_IRQS_OFF
 	.endm
 
-/* save complete stack frame */
-	.pushsection .kprobes.text, "ax"
 ENTRY(save_paranoid)
 	XCPT_FRAME 1 RDI+8
 	cld
@@ -517,7 +515,6 @@ ENTRY(save_paranoid)
 1:	ret
 	CFI_ENDPROC
 END(save_paranoid)
-	.popsection
 
 /*
  * A newly forked process directly context switches into this address.
@@ -975,10 +972,6 @@ END(interrupt)
 	call \func
 	.endm
 
-/*
- * Interrupt entry/exit should be protected against kprobes
- */
-	.pushsection .kprobes.text, "ax"
 	/*
 	 * The interrupt stubs push (~vector+0x80) onto the stack and
 	 * then jump to common_interrupt.
@@ -1113,10 +1106,6 @@ ENTRY(retint_kernel)
 
 	CFI_ENDPROC
 END(common_interrupt)
-/*
- * End of kprobes section
- */
-       .popsection
 
 /*
  * APIC interrupts.
@@ -1477,11 +1466,6 @@ apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
 	hyperv_callback_vector hyperv_vector_handler
 #endif /* CONFIG_HYPERV */
 
-/*
- * Some functions should be protected against kprobes
- */
-	.pushsection .kprobes.text, "ax"
-
 paranoidzeroentry_ist debug do_debug DEBUG_STACK
 paranoidzeroentry_ist int3 do_int3 DEBUG_STACK
 paranoiderrorentry stack_segment do_stack_segment
@@ -1898,7 +1882,3 @@ ENTRY(ignore_sysret)
 	CFI_ENDPROC
 END(ignore_sysret)
 
-/*
- * End of kprobes section
- */
-	.popsection
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 79a3f96..c01e70f 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1066,6 +1066,14 @@ int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
 	return 0;
 }
 
+bool arch_within_kprobe_blacklist(unsigned long addr)
+{
+	return  (addr >= (unsigned long)__kprobes_text_start &&
+		 addr < (unsigned long)__kprobes_text_end) ||
+		(addr >= (unsigned long)__entry_text_start &&
+		 addr < (unsigned long)__entry_text_end);
+}
+
 int __init arch_init_kprobes(void)
 {
 	return 0;
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 925eaf2..cdf9251 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -265,6 +265,7 @@ extern void arch_disarm_kprobe(struct kprobe *p);
 extern int arch_init_kprobes(void);
 extern void show_registers(struct pt_regs *regs);
 extern void kprobes_inc_nmissed_count(struct kprobe *p);
+extern bool arch_within_kprobe_blacklist(unsigned long addr);
 
 struct kprobe_insn_cache {
 	struct mutex mutex;
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index ceeadfc..c6772b9 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -96,9 +96,6 @@ static raw_spinlock_t *kretprobe_table_lock_ptr(unsigned long hash)
 static struct kprobe_blackpoint kprobe_blacklist[] = {
 	{"preempt_schedule",},
 	{"native_get_debugreg",},
-	{"irq_entries_start",},
-	{"common_interrupt",},
-	{"mcount",},	/* mcount can be called from everywhere */
 	{NULL}    /* Terminator */
 };
 
@@ -1324,12 +1321,18 @@ out:
 	return ret;
 }
 
+bool __weak arch_within_kprobe_blacklist(unsigned long addr)
+{
+	/* The __kprobes marked functions and entry code must not be probed */
+	return (addr >= (unsigned long)__kprobes_text_start &&
+		addr < (unsigned long)__kprobes_text_end);
+}
+
 static int __kprobes in_kprobes_functions(unsigned long addr)
 {
 	struct kprobe_blackpoint *kb;
 
-	if (addr >= (unsigned long)__kprobes_text_start &&
-	    addr < (unsigned long)__kprobes_text_end)
+	if (arch_within_kprobe_blacklist(addr))
 		return -EINVAL;
 	/*
 	 * If there exists a kprobe_blacklist, verify and



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH -tip v4 2/6] kprobes: Introduce NOKPROBE_SYMBOL() macro for blacklist
  2013-12-04  1:28 [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs Masami Hiramatsu
  2013-12-04  1:28 ` [PATCH -tip v4 1/6] kprobes: Prohibit probing on .entry.text code Masami Hiramatsu
@ 2013-12-04  1:28 ` Masami Hiramatsu
  2013-12-04  1:28 ` [PATCH -tip v4 3/6] [BUGFIX] kprobes/x86: Prohibit probing on debug_stack_* Masami Hiramatsu
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-04  1:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rob Landley, Jeremy Fitzhardinge, Rusty Russell,
	Ananth N Mavinakayanahalli, Arnd Bergmann, Sandeepa Prabhu, x86,
	lkml, Steven Rostedt (Red Hat),
	Chris Wright, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	systemtap, H. Peter Anvin, Alok Kataria, David S. Miller

Introduce NOKPROBE_SYMBOL() macro which builds a kprobe
blacklist in build time. The usage of this macro is similar
to the EXPORT_SYMBOL, put the NOKPROBE_SYMBOL(function); just
after the function definition.

When CONFIG_KPROBES=y, the macro stores the given function
address in the "_kprobe_blacklist" section.

Since the data structures are not fully initialized by the
macro (because there is no "size" information),  those
are re-initialized at boot time by using kallsyms.

Changes from previous version:
 - Rename kprobe_blackpoint to kprobe_blacklist_entry and
   simplify it.
 - Just stores the address of given function instead of
   defining a data structure in NOKPROBE_SYMBOL().
 - Add _ASM_NOKPROBE macro support for x86.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Rob Landley <rob@landley.net>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 Documentation/kprobes.txt         |   16 ++++++
 arch/x86/include/asm/asm.h        |    7 +++
 arch/x86/kernel/paravirt.c        |    4 +
 include/asm-generic/vmlinux.lds.h |    9 +++
 include/linux/kprobes.h           |   20 ++++++-
 kernel/kprobes.c                  |  100 +++++++++++++++++++------------------
 kernel/sched/core.c               |    1 
 7 files changed, 105 insertions(+), 52 deletions(-)

diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
index 0cfb00f..7062631 100644
--- a/Documentation/kprobes.txt
+++ b/Documentation/kprobes.txt
@@ -22,8 +22,9 @@ Appendix B: The kprobes sysctl interface
 
 Kprobes enables you to dynamically break into any kernel routine and
 collect debugging and performance information non-disruptively. You
-can trap at almost any kernel code address, specifying a handler
+can trap at almost any kernel code address(*), specifying a handler
 routine to be invoked when the breakpoint is hit.
+(*: at some part of kernel code can not be trapped, see 1.5 Blacklist)
 
 There are currently three types of probes: kprobes, jprobes, and
 kretprobes (also called return probes).  A kprobe can be inserted
@@ -273,6 +274,19 @@ using one of the following techniques:
  or
 - Execute 'sysctl -w debug.kprobes_optimization=n'
 
+1.5 Blacklist
+
+Kprobes can probe almost of the kernel except itself. This means
+that there are some functions where kprobes cannot probe. Probing
+(trapping) such functions can cause recursive trap (e.g. double
+fault) or at least the nested probe handler never be called.
+Kprobes manages such functions as a blacklist.
+If you want to add a function into the blacklist, you just need
+to (1) include linux/kprobes.h and (2) use NOKPROBE_SYMBOL() macro
+to specify a blacklisted function.
+Kprobes checks given probe address with the blacklist and reject
+registering if the given address is in the blacklist.
+
 2. Architectures Supported
 
 Kprobes, jprobes, and return probes are implemented on the following
diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 4582e8e..7730c1c 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -57,6 +57,12 @@
 	.long (from) - . ;					\
 	.long (to) - . + 0x7ffffff0 ;				\
 	.popsection
+
+# define _ASM_NOKPROBE(entry)					\
+	.pushsection "_kprobe_blacklist","aw" ;			\
+	_ASM_ALIGN ;						\
+	_ASM_PTR (entry);					\
+	.popsection
 #else
 # define _ASM_EXTABLE(from,to)					\
 	" .pushsection \"__ex_table\",\"a\"\n"			\
@@ -71,6 +77,7 @@
 	" .long (" #from ") - .\n"				\
 	" .long (" #to ") - . + 0x7ffffff0\n"			\
 	" .popsection\n"
+/* For C file, we already have NOKPROBE_SYMBOL macro */
 #endif
 
 #endif /* _ASM_X86_ASM_H */
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 1b10af8..4c785fd 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -23,6 +23,7 @@
 #include <linux/efi.h>
 #include <linux/bcd.h>
 #include <linux/highmem.h>
+#include <linux/kprobes.h>
 
 #include <asm/bug.h>
 #include <asm/paravirt.h>
@@ -389,6 +390,9 @@ __visible struct pv_cpu_ops pv_cpu_ops = {
 	.end_context_switch = paravirt_nop,
 };
 
+/* At this point, native_get_debugreg has real function entry */
+NOKPROBE_SYMBOL(native_get_debugreg);
+
 struct pv_apic_ops pv_apic_ops = {
 #ifdef CONFIG_X86_LOCAL_APIC
 	.startup_ipi_hook = paravirt_nop,
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index bc2121f..81d07d5 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -109,6 +109,14 @@
 #define BRANCH_PROFILE()
 #endif
 
+#ifdef CONFIG_KPROBES
+#define KPROBE_BLACKLIST()	VMLINUX_SYMBOL(__start_kprobe_blacklist) = .; \
+				*(_kprobe_blacklist)			      \
+				VMLINUX_SYMBOL(__stop_kprobe_blacklist) = .;
+#else
+#define KPROBE_BLACKLIST()
+#endif
+
 #ifdef CONFIG_EVENT_TRACING
 #define FTRACE_EVENTS()	. = ALIGN(8);					\
 			VMLINUX_SYMBOL(__start_ftrace_events) = .;	\
@@ -488,6 +496,7 @@
 	*(.init.rodata)							\
 	FTRACE_EVENTS()							\
 	TRACE_SYSCALLS()						\
+	KPROBE_BLACKLIST()						\
 	MEM_DISCARD(init.rodata)					\
 	CLK_OF_TABLES()							\
 	CLKSRC_OF_TABLES()						\
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index cdf9251..e059507 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -205,10 +205,10 @@ struct kretprobe_blackpoint {
 	void *addr;
 };
 
-struct kprobe_blackpoint {
-	const char *name;
+struct kprobe_blacklist_entry {
+	struct list_head list;
 	unsigned long start_addr;
-	unsigned long range;
+	unsigned long end_addr;
 };
 
 #ifdef CONFIG_KPROBES
@@ -477,4 +477,18 @@ static inline int enable_jprobe(struct jprobe *jp)
 	return enable_kprobe(&jp->kp);
 }
 
+#ifdef CONFIG_KPROBES
+/*
+ * Blacklist ganerating macro. Specify functions which is not probed
+ * by using this macro.
+ */
+#define __NOKPROBE_SYMBOL(fname)			\
+static unsigned long __used				\
+	__attribute__((section("_kprobe_blacklist")))	\
+	_kbl_addr_##fname = (unsigned long)fname;
+#define NOKPROBE_SYMBOL(fname)	__NOKPROBE_SYMBOL(fname)
+#else
+#define NOKPROBE_SYMBOL(fname)
+#endif
+
 #endif /* _LINUX_KPROBES_H */
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index c6772b9..43ef40b 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -86,18 +86,8 @@ static raw_spinlock_t *kretprobe_table_lock_ptr(unsigned long hash)
 	return &(kretprobe_table_locks[hash].lock);
 }
 
-/*
- * Normally, functions that we'd want to prohibit kprobes in, are marked
- * __kprobes. But, there are cases where such functions already belong to
- * a different section (__sched for preempt_schedule)
- *
- * For such cases, we now have a blacklist
- */
-static struct kprobe_blackpoint kprobe_blacklist[] = {
-	{"preempt_schedule",},
-	{"native_get_debugreg",},
-	{NULL}    /* Terminator */
-};
+/* Blacklist -- list of struct kprobe_blacklist_entry */
+static LIST_HEAD(kprobe_blacklist);
 
 #ifdef __ARCH_WANT_KPROBES_INSN_SLOT
 /*
@@ -1328,24 +1318,22 @@ bool __weak arch_within_kprobe_blacklist(unsigned long addr)
 		addr < (unsigned long)__kprobes_text_end);
 }
 
-static int __kprobes in_kprobes_functions(unsigned long addr)
+static bool __kprobes within_kprobe_blacklist(unsigned long addr)
 {
-	struct kprobe_blackpoint *kb;
+	struct kprobe_blacklist_entry *ent;
 
 	if (arch_within_kprobe_blacklist(addr))
-		return -EINVAL;
+		return true;
 	/*
 	 * If there exists a kprobe_blacklist, verify and
 	 * fail any probe registration in the prohibited area
 	 */
-	for (kb = kprobe_blacklist; kb->name != NULL; kb++) {
-		if (kb->start_addr) {
-			if (addr >= kb->start_addr &&
-			    addr < (kb->start_addr + kb->range))
-				return -EINVAL;
-		}
+	list_for_each_entry(ent, &kprobe_blacklist, list) {
+		if (addr >= ent->start_addr && addr < ent->end_addr)
+			return true;
 	}
-	return 0;
+
+	return false;
 }
 
 /*
@@ -1436,7 +1424,7 @@ static __kprobes int check_kprobe_address_safe(struct kprobe *p,
 
 	/* Ensure it is not in reserved area nor out of text */
 	if (!kernel_text_address((unsigned long) p->addr) ||
-	    in_kprobes_functions((unsigned long) p->addr) ||
+	    within_kprobe_blacklist((unsigned long) p->addr) ||
 	    jump_label_text_reserved(p->addr, p->addr)) {
 		ret = -EINVAL;
 		goto out;
@@ -2022,6 +2010,38 @@ void __kprobes dump_kprobe(struct kprobe *kp)
 	       kp->symbol_name, kp->addr, kp->offset);
 }
 
+/*
+ * Lookup and populate the kprobe_blacklist.
+ *
+ * Unlike the kretprobe blacklist, we'll need to determine
+ * the range of addresses that belong to the said functions,
+ * since a kprobe need not necessarily be at the beginning
+ * of a function.
+ */
+static int __init populate_kprobe_blacklist(unsigned long *start,
+					     unsigned long *end)
+{
+	unsigned long *iter;
+	struct kprobe_blacklist_entry *ent;
+	unsigned long offset = 0, size = 0;
+
+	for (iter = start; iter < end; iter++) {
+		if (!kallsyms_lookup_size_offset(*iter, &size, &offset)) {
+			pr_err("Failed to find blacklist %p\n", (void *)*iter);
+			continue;
+		}
+
+		ent = kmalloc(sizeof(*ent), GFP_KERNEL);
+		if (!ent)
+			return -ENOMEM;
+		ent->start_addr = *iter;
+		ent->end_addr = *iter + size;
+		INIT_LIST_HEAD(&ent->list);
+		list_add_tail(&ent->list, &kprobe_blacklist);
+	}
+	return 0;
+}
+
 /* Module notifier call back, checking kprobes on the module */
 static int __kprobes kprobes_module_callback(struct notifier_block *nb,
 					     unsigned long val, void *data)
@@ -2065,14 +2085,13 @@ static struct notifier_block kprobe_module_nb = {
 	.priority = 0
 };
 
+/* Markers of _kprobe_blacklist section */
+extern unsigned long __start_kprobe_blacklist[];
+extern unsigned long __stop_kprobe_blacklist[];
+
 static int __init init_kprobes(void)
 {
 	int i, err = 0;
-	unsigned long offset = 0, size = 0;
-	char *modname, namebuf[KSYM_NAME_LEN];
-	const char *symbol_name;
-	void *addr;
-	struct kprobe_blackpoint *kb;
 
 	/* FIXME allocate the probe table, currently defined statically */
 	/* initialize all list heads */
@@ -2082,26 +2101,11 @@ static int __init init_kprobes(void)
 		raw_spin_lock_init(&(kretprobe_table_locks[i].lock));
 	}
 
-	/*
-	 * Lookup and populate the kprobe_blacklist.
-	 *
-	 * Unlike the kretprobe blacklist, we'll need to determine
-	 * the range of addresses that belong to the said functions,
-	 * since a kprobe need not necessarily be at the beginning
-	 * of a function.
-	 */
-	for (kb = kprobe_blacklist; kb->name != NULL; kb++) {
-		kprobe_lookup_name(kb->name, addr);
-		if (!addr)
-			continue;
-
-		kb->start_addr = (unsigned long)addr;
-		symbol_name = kallsyms_lookup(kb->start_addr,
-				&size, &offset, &modname, namebuf);
-		if (!symbol_name)
-			kb->range = 0;
-		else
-			kb->range = size;
+	err = populate_kprobe_blacklist(__start_kprobe_blacklist,
+					__stop_kprobe_blacklist);
+	if (err) {
+		pr_err("kprobes: failed to populate blacklist: %d\n", err);
+		pr_err("Please take care of using kprobes.\n");
 	}
 
 	if (kretprobe_blacklist_size) {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 87c3bc4..05e7595 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2662,6 +2662,7 @@ asmlinkage void __sched notrace preempt_schedule(void)
 		barrier();
 	} while (need_resched());
 }
+NOKPROBE_SYMBOL(preempt_schedule);
 EXPORT_SYMBOL(preempt_schedule);
 #endif /* CONFIG_PREEMPT */
 



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH -tip v4 3/6] [BUGFIX] kprobes/x86: Prohibit probing on debug_stack_*
  2013-12-04  1:28 [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs Masami Hiramatsu
  2013-12-04  1:28 ` [PATCH -tip v4 1/6] kprobes: Prohibit probing on .entry.text code Masami Hiramatsu
  2013-12-04  1:28 ` [PATCH -tip v4 2/6] kprobes: Introduce NOKPROBE_SYMBOL() macro for blacklist Masami Hiramatsu
@ 2013-12-04  1:28 ` Masami Hiramatsu
  2013-12-04  1:28 ` [PATCH -tip v4 4/6] [BUGFIX] x86: Prohibit probing on native_set_debugreg Masami Hiramatsu
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-04  1:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Fenghua Yu, Seiji Aguchi, Ananth N Mavinakayanahalli,
	Sandeepa Prabhu, x86, lkml, Steven Rostedt (Red Hat),
	Ingo Molnar, systemtap, H. Peter Anvin, Thomas Gleixner,
	Borislav Petkov, David S. Miller

Prohibit probing on debug_stack_reset and debug_stack_set_zero.
Since the both functions are called from TRACE_IRQS_ON/OFF_DEBUG
macros which run in int3 ist entry, probing it may cause a soft
lockup.

This happens when the kernel built with CONFIG_DYNAMIC_FTRACE=y
and CONFIG_TRACE_IRQFLAGS=y.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
---
 arch/x86/kernel/cpu/common.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 6abc172..220de22 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -8,6 +8,7 @@
 #include <linux/delay.h>
 #include <linux/sched.h>
 #include <linux/init.h>
+#include <linux/kprobes.h>
 #include <linux/kgdb.h>
 #include <linux/smp.h>
 #include <linux/io.h>
@@ -1153,6 +1154,7 @@ int is_debug_stack(unsigned long addr)
 		(addr <= __get_cpu_var(debug_stack_addr) &&
 		 addr > (__get_cpu_var(debug_stack_addr) - DEBUG_STKSZ));
 }
+NOKPROBE_SYMBOL(is_debug_stack);
 
 DEFINE_PER_CPU(u32, debug_idt_ctr);
 
@@ -1161,6 +1163,7 @@ void debug_stack_set_zero(void)
 	this_cpu_inc(debug_idt_ctr);
 	load_current_idt();
 }
+NOKPROBE_SYMBOL(debug_stack_set_zero);
 
 void debug_stack_reset(void)
 {
@@ -1169,6 +1172,7 @@ void debug_stack_reset(void)
 	if (this_cpu_dec_return(debug_idt_ctr) == 0)
 		load_current_idt();
 }
+NOKPROBE_SYMBOL(debug_stack_reset);
 
 #else	/* CONFIG_X86_64 */
 



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH -tip v4 4/6] [BUGFIX] x86: Prohibit probing on native_set_debugreg
  2013-12-04  1:28 [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs Masami Hiramatsu
                   ` (2 preceding siblings ...)
  2013-12-04  1:28 ` [PATCH -tip v4 3/6] [BUGFIX] kprobes/x86: Prohibit probing on debug_stack_* Masami Hiramatsu
@ 2013-12-04  1:28 ` Masami Hiramatsu
  2013-12-04  1:28 ` [PATCH -tip v4 5/6] [BUGFIX] x86: Prohibit probing on thunk functions and restore Masami Hiramatsu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-04  1:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, Rusty Russell, Ananth N Mavinakayanahalli,
	Sandeepa Prabhu, x86, lkml, Steven Rostedt (Red Hat),
	Chris Wright, Ingo Molnar, Thomas Gleixner, systemtap,
	H. Peter Anvin, Alok Kataria, David S. Miller

Since the kprobes uses do_debug for single stepping,
functions called from do_debug before notify_die must
not be probed. This prohibits probing on
native_set_debugreg which is used in do_debug.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/kernel/paravirt.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 4c785fd..108e685 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -390,8 +390,9 @@ __visible struct pv_cpu_ops pv_cpu_ops = {
 	.end_context_switch = paravirt_nop,
 };
 
-/* At this point, native_get_debugreg has real function entry */
+/* At this point, native_get/set_debugreg has real function entry */
 NOKPROBE_SYMBOL(native_get_debugreg);
+NOKPROBE_SYMBOL(native_set_debugreg);
 
 struct pv_apic_ops pv_apic_ops = {
 #ifdef CONFIG_X86_LOCAL_APIC



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH -tip v4 5/6] [BUGFIX] x86: Prohibit probing on thunk functions and restore
  2013-12-04  1:28 [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs Masami Hiramatsu
                   ` (3 preceding siblings ...)
  2013-12-04  1:28 ` [PATCH -tip v4 4/6] [BUGFIX] x86: Prohibit probing on native_set_debugreg Masami Hiramatsu
@ 2013-12-04  1:28 ` Masami Hiramatsu
  2013-12-04  1:28 ` [PATCH -tip v4 6/6] [RFC] kprobes/x86: Call exception handlers directly from do_int3/do_debug Masami Hiramatsu
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-04  1:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86, lkml,
	Steven Rostedt (Red Hat),
	Ingo Molnar, systemtap, H. Peter Anvin, Thomas Gleixner,
	David S. Miller

thunk/restore functions are also used for tracing irqoff etc.
and those are involved in kprobe's exception handling.
Prohibit probing on them to avoid kernel crash.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/lib/thunk_32.S |    3 ++-
 arch/x86/lib/thunk_64.S |    3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/lib/thunk_32.S b/arch/x86/lib/thunk_32.S
index 2930ae0..28f85c91 100644
--- a/arch/x86/lib/thunk_32.S
+++ b/arch/x86/lib/thunk_32.S
@@ -4,8 +4,8 @@
  *  (inspired by Andi Kleen's thunk_64.S)
  * Subject to the GNU public license, v.2. No warranty of any kind.
  */
-
 	#include <linux/linkage.h>
+	#include <asm/asm.h>
 
 #ifdef CONFIG_TRACE_IRQFLAGS
 	/* put return address in eax (arg1) */
@@ -22,6 +22,7 @@
 	popl %ecx
 	popl %eax
 	ret
+	_ASM_NOKPROBE(\name)
 	.endm
 
 	thunk_ra trace_hardirqs_on_thunk,trace_hardirqs_on_caller
diff --git a/arch/x86/lib/thunk_64.S b/arch/x86/lib/thunk_64.S
index a63efd6..92d9fea 100644
--- a/arch/x86/lib/thunk_64.S
+++ b/arch/x86/lib/thunk_64.S
@@ -8,6 +8,7 @@
 #include <linux/linkage.h>
 #include <asm/dwarf2.h>
 #include <asm/calling.h>
+#include <asm/asm.h>
 
 	/* rdi:	arg1 ... normal C conventions. rax is saved/restored. */
 	.macro THUNK name, func, put_ret_addr_in_rdi=0
@@ -25,6 +26,7 @@
 	call \func
 	jmp  restore
 	CFI_ENDPROC
+	_ASM_NOKPROBE(\name)
 	.endm
 
 #ifdef CONFIG_TRACE_IRQFLAGS
@@ -43,3 +45,4 @@ restore:
 	RESTORE_ARGS
 	ret
 	CFI_ENDPROC
+	_ASM_NOKPROBE(restore)



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH -tip v4 6/6] [RFC] kprobes/x86: Call exception handlers directly from do_int3/do_debug
  2013-12-04  1:28 [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs Masami Hiramatsu
                   ` (4 preceding siblings ...)
  2013-12-04  1:28 ` [PATCH -tip v4 5/6] [BUGFIX] x86: Prohibit probing on thunk functions and restore Masami Hiramatsu
@ 2013-12-04  1:28 ` Masami Hiramatsu
  2013-12-04  2:39   ` Steven Rostedt
  2013-12-04  2:54 ` [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs Sandeepa Prabhu
  2013-12-04  8:45 ` Ingo Molnar
  7 siblings, 1 reply; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-04  1:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Ananth N Mavinakayanahalli, Sandeepa Prabhu,
	Frederic Weisbecker, x86, lkml, Steven Rostedt, Andrew Morton,
	Ingo Molnar, systemtap, H. Peter Anvin, Sasha Levin,
	Thomas Gleixner, Seiji Aguchi, David S. Miller

To avoid a kernel crash by probing on lockdep code, call
kprobe_int3_handler and kprobe_debug_handler directly
from do_int3 and do_debug. Since there is a locking code
in notify_die, lockdep code can be invoked. And because
the lockdep involves printk() related things, theoretically,
we need to prohibit probing on much more code...

Anyway, most of the int3 handlers in the kernel are already
called from do_int3 directly, e.g. ftrace_int3_handler,
poke_int3_handler, kgdb_ll_trap. Actually only
kprobe_exceptions_notify is on the notifier_call_chain.

So I think this is not a crazy thing.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
---
 arch/x86/include/asm/kprobes.h |    2 ++
 arch/x86/kernel/kprobes/core.c |   24 +++---------------------
 arch/x86/kernel/traps.c        |   10 ++++++++++
 3 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 9454c16..53cdfb2 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -116,4 +116,6 @@ struct kprobe_ctlblk {
 extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
 extern int kprobe_exceptions_notify(struct notifier_block *self,
 				    unsigned long val, void *data);
+extern int kprobe_int3_handler(struct pt_regs *regs);
+extern int kprobe_debug_handler(struct pt_regs *regs);
 #endif /* _ASM_X86_KPROBES_H */
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index c01e70f..4d7dbe1 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -558,7 +558,7 @@ reenter_kprobe(struct kprobe *p, struct pt_regs *regs, struct kprobe_ctlblk *kcb
  * Interrupts are disabled on entry as trap3 is an interrupt gate and they
  * remain disabled throughout this function.
  */
-static int __kprobes kprobe_handler(struct pt_regs *regs)
+int __kprobes kprobe_int3_handler(struct pt_regs *regs)
 {
 	kprobe_opcode_t *addr;
 	struct kprobe *p;
@@ -856,7 +856,7 @@ no_change:
  * Interrupts are disabled on entry as trap1 is an interrupt gate and they
  * remain disabled throughout this function.
  */
-static int __kprobes post_kprobe_handler(struct pt_regs *regs)
+int __kprobes kprobe_debug_handler(struct pt_regs *regs)
 {
 	struct kprobe *cur = kprobe_running();
 	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
@@ -964,22 +964,7 @@ kprobe_exceptions_notify(struct notifier_block *self, unsigned long val, void *d
 	if (args->regs && user_mode_vm(args->regs))
 		return ret;
 
-	switch (val) {
-	case DIE_INT3:
-		if (kprobe_handler(args->regs))
-			ret = NOTIFY_STOP;
-		break;
-	case DIE_DEBUG:
-		if (post_kprobe_handler(args->regs)) {
-			/*
-			 * Reset the BS bit in dr6 (pointed by args->err) to
-			 * denote completion of processing
-			 */
-			(*(unsigned long *)ERR_PTR(args->err)) &= ~DR_STEP;
-			ret = NOTIFY_STOP;
-		}
-		break;
-	case DIE_GPF:
+	if (val == DIE_GPF) {
 		/*
 		 * To be potentially processing a kprobe fault and to
 		 * trust the result from kprobe_running(), we have
@@ -988,9 +973,6 @@ kprobe_exceptions_notify(struct notifier_block *self, unsigned long val, void *d
 		if (!preemptible() && kprobe_running() &&
 		    kprobe_fault_handler(args->regs, args->trapnr))
 			ret = NOTIFY_STOP;
-		break;
-	default:
-		break;
 	}
 	return ret;
 }
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b857ed8..da2b0c1 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -338,6 +338,11 @@ dotraplinkage void __kprobes notrace do_int3(struct pt_regs *regs, long error_co
 		goto exit;
 #endif /* CONFIG_KGDB_LOW_LEVEL_TRAP */
 
+#ifdef CONFIG_KPROBES
+	if (kprobe_int3_handler(regs))
+		return;
+#endif
+
 	if (notify_die(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
 			SIGTRAP) == NOTIFY_STOP)
 		goto exit;
@@ -444,6 +449,11 @@ dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
 	/* Store the virtualized DR6 value */
 	tsk->thread.debugreg6 = dr6;
 
+#ifdef CONFIG_KPROBES
+	if (kprobe_debug_handler(regs))
+		goto exit;
+#endif
+
 	if (notify_die(DIE_DEBUG, "debug", regs, (long)&dr6, error_code,
 							SIGTRAP) == NOTIFY_STOP)
 		goto exit;



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 6/6] [RFC] kprobes/x86: Call exception handlers directly from do_int3/do_debug
  2013-12-04  1:28 ` [PATCH -tip v4 6/6] [RFC] kprobes/x86: Call exception handlers directly from do_int3/do_debug Masami Hiramatsu
@ 2013-12-04  2:39   ` Steven Rostedt
  2013-12-11 13:31     ` Jiri Kosina
  0 siblings, 1 reply; 39+ messages in thread
From: Steven Rostedt @ 2013-12-04  2:39 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Andi Kleen, Ananth N Mavinakayanahalli,
	Sandeepa Prabhu, Frederic Weisbecker, x86, lkml, Andrew Morton,
	Ingo Molnar, systemtap, H. Peter Anvin, Sasha Levin,
	Thomas Gleixner, Seiji Aguchi, David S. Miller

On Wed, 04 Dec 2013 01:28:56 +0000
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:

> To avoid a kernel crash by probing on lockdep code, call
> kprobe_int3_handler and kprobe_debug_handler directly
> from do_int3 and do_debug. Since there is a locking code
> in notify_die, lockdep code can be invoked. And because
> the lockdep involves printk() related things, theoretically,
> we need to prohibit probing on much more code...
> 
> Anyway, most of the int3 handlers in the kernel are already
> called from do_int3 directly, e.g. ftrace_int3_handler,
> poke_int3_handler, kgdb_ll_trap. Actually only
> kprobe_exceptions_notify is on the notifier_call_chain.
> 
> So I think this is not a crazy thing.

What? Oh, yeah. No, using notifiers in int3 handler is the crazy
thing ;-)

Hmm, if there's no users of the int3 notifier, should we just remove it?

-- Steve

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-04  1:28 [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs Masami Hiramatsu
                   ` (5 preceding siblings ...)
  2013-12-04  1:28 ` [PATCH -tip v4 6/6] [RFC] kprobes/x86: Call exception handlers directly from do_int3/do_debug Masami Hiramatsu
@ 2013-12-04  2:54 ` Sandeepa Prabhu
  2013-12-04  7:39   ` Masami Hiramatsu
  2013-12-04  8:45 ` Ingo Molnar
  7 siblings, 1 reply; 39+ messages in thread
From: Sandeepa Prabhu @ 2013-12-04  2:54 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller

On 4 December 2013 06:58, Masami Hiramatsu
<masami.hiramatsu.pt@hitachi.com> wrote:
> Hi,
> Here is the version 4 of NOKPORBE_SYMBOL series.
>
> In this version, I removed the cleanup patches and
> add bugfixes I've found, since those bugs will be
> critical.
> Rest of the cleanup and visible blacklists will be
> proposed later in another series.
>
> Oh, just one new thing, I added a new RFC patch which
> removes the dependency of notify_die() from kprobes
> miss-hit/recovery path. Since the notify_die() involves
> locking and lockdep code which invokes a lot of heavy
> printk functions etc. This helped me to minimize the
> blacklist and provides more stability for kprobes.
> Actually, most of int3 handlers are already called
> from do_int3 directly, I think this change is acceptable
> too.
>
> Here is the updates about NOKPROBE_SYMBOL().
>  - Now _ASM_NOKPROBE() macro is introduced for assembly
>    symbols on x86.
>  - Rename kprobe_blackpoint to kprobe_blacklist_entry
>    and simplify it. Also NOKPROBE_SYMBOL() macro just
>    saves the address of non-probe-able symbols.
>
> ---
>
> Masami Hiramatsu (6):

>       kprobes: Prohibit probing on .entry.text code
>       kprobes: Introduce NOKPROBE_SYMBOL() macro for blacklist
Hi Masami,
Is it good idea to split  "arch/x86" code from generic kernel changes?
Then we just need to take above two patches for verifying it on arm64
or other platforms.

Thanks,
Sandeepa
>       [BUGFIX] kprobes/x86: Prohibit probing on debug_stack_*
>       [BUGFIX] x86: Prohibit probing on native_set_debugreg
>       [BUGFIX] x86: Prohibit probing on thunk functions and restore
>       [RFC] kprobes/x86: Call exception handlers directly from do_int3/do_debug
>
>
>  Documentation/kprobes.txt         |   16 +++++
>  arch/x86/include/asm/asm.h        |    7 ++
>  arch/x86/include/asm/kprobes.h    |    2 +
>  arch/x86/kernel/cpu/common.c      |    4 +
>  arch/x86/kernel/entry_32.S        |   33 -----------
>  arch/x86/kernel/entry_64.S        |   20 -------
>  arch/x86/kernel/kprobes/core.c    |   32 ++++------
>  arch/x86/kernel/paravirt.c        |    5 ++
>  arch/x86/kernel/traps.c           |   10 +++
>  arch/x86/lib/thunk_32.S           |    3 +
>  arch/x86/lib/thunk_64.S           |    3 +
>  include/asm-generic/vmlinux.lds.h |    9 +++
>  include/linux/kprobes.h           |   21 ++++++-
>  kernel/kprobes.c                  |  113 ++++++++++++++++++++-----------------
>  kernel/sched/core.c               |    1
>  15 files changed, 147 insertions(+), 132 deletions(-)
>
> --
> Masami HIRAMATSU
> IT Management Research Dept. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: masami.hiramatsu.pt@hitachi.com
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-04  2:54 ` [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs Sandeepa Prabhu
@ 2013-12-04  7:39   ` Masami Hiramatsu
  2013-12-04  8:46     ` Sandeepa Prabhu
  0 siblings, 1 reply; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-04  7:39 UTC (permalink / raw)
  To: Sandeepa Prabhu
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller

(2013/12/04 11:54), Sandeepa Prabhu wrote:
> On 4 December 2013 06:58, Masami Hiramatsu
> <masami.hiramatsu.pt@hitachi.com> wrote:
>> Hi,
>> Here is the version 4 of NOKPORBE_SYMBOL series.
>>
>> In this version, I removed the cleanup patches and
>> add bugfixes I've found, since those bugs will be
>> critical.
>> Rest of the cleanup and visible blacklists will be
>> proposed later in another series.
>>
>> Oh, just one new thing, I added a new RFC patch which
>> removes the dependency of notify_die() from kprobes
>> miss-hit/recovery path. Since the notify_die() involves
>> locking and lockdep code which invokes a lot of heavy
>> printk functions etc. This helped me to minimize the
>> blacklist and provides more stability for kprobes.
>> Actually, most of int3 handlers are already called
>> from do_int3 directly, I think this change is acceptable
>> too.
>>
>> Here is the updates about NOKPROBE_SYMBOL().
>>  - Now _ASM_NOKPROBE() macro is introduced for assembly
>>    symbols on x86.
>>  - Rename kprobe_blackpoint to kprobe_blacklist_entry
>>    and simplify it. Also NOKPROBE_SYMBOL() macro just
>>    saves the address of non-probe-able symbols.
>>
>> ---
>>
>> Masami Hiramatsu (6):
> 
>>       kprobes: Prohibit probing on .entry.text code
>>       kprobes: Introduce NOKPROBE_SYMBOL() macro for blacklist
> Hi Masami,
> Is it good idea to split  "arch/x86" code from generic kernel changes?
> Then we just need to take above two patches for verifying it on arm64
> or other platforms.

Yeah, it can be.
However I think you can apply it without any problem on arm64 tree too,
since it "just adds" an asm macro in arch/x86/include/asm/asm.h.
It should not have any effect for other arch. Could you try it? :)

Thank you,


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-04  1:28 [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs Masami Hiramatsu
                   ` (6 preceding siblings ...)
  2013-12-04  2:54 ` [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs Sandeepa Prabhu
@ 2013-12-04  8:45 ` Ingo Molnar
  2013-12-04 23:27   ` Masami Hiramatsu
  7 siblings, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2013-12-04  8:45 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller


* Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:

> Hi,
> Here is the version 4 of NOKPORBE_SYMBOL series.
> 
> In this version, I removed the cleanup patches and
> add bugfixes I've found, since those bugs will be
> critical.
>
> Rest of the cleanup and visible blacklists will be proposed later in 
> another series.

Ok, let me make it clear: we need _both_ the conceptual cleanups and 
the bug fixes.

Right now kprobes are restricted to root, and they are unsafe and 
buggy, and rather fundamentally so, because probing cannot be done 
safely without potentially crashing the kernel. So there's no 
'regression' to be fixed, it's mostly about pre-existing bugs - so 
there's no requirement for them to come before maintainability 
cleanups.

So we need both a maintainable and a sane/safe solution, and I'd like 
to apply the whole thing at once and be at ease that the solution is 
round. We should have done this years ago.

So could you please send a whole series that I can apply to -tip as a 
work in progress tree, and then we can see what is left to be solved?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-04  7:39   ` Masami Hiramatsu
@ 2013-12-04  8:46     ` Sandeepa Prabhu
  2013-12-04 23:32       ` Masami Hiramatsu
  0 siblings, 1 reply; 39+ messages in thread
From: Sandeepa Prabhu @ 2013-12-04  8:46 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller

On 4 December 2013 13:09, Masami Hiramatsu
<masami.hiramatsu.pt@hitachi.com> wrote:
> (2013/12/04 11:54), Sandeepa Prabhu wrote:
>> On 4 December 2013 06:58, Masami Hiramatsu
>> <masami.hiramatsu.pt@hitachi.com> wrote:
>>> Hi,
>>> Here is the version 4 of NOKPORBE_SYMBOL series.
>>>
>>> In this version, I removed the cleanup patches and
>>> add bugfixes I've found, since those bugs will be
>>> critical.
>>> Rest of the cleanup and visible blacklists will be
>>> proposed later in another series.
>>>
>>> Oh, just one new thing, I added a new RFC patch which
>>> removes the dependency of notify_die() from kprobes
>>> miss-hit/recovery path. Since the notify_die() involves
>>> locking and lockdep code which invokes a lot of heavy
>>> printk functions etc. This helped me to minimize the
>>> blacklist and provides more stability for kprobes.
>>> Actually, most of int3 handlers are already called
>>> from do_int3 directly, I think this change is acceptable
>>> too.
>>>
>>> Here is the updates about NOKPROBE_SYMBOL().
>>>  - Now _ASM_NOKPROBE() macro is introduced for assembly
>>>    symbols on x86.
>>>  - Rename kprobe_blackpoint to kprobe_blacklist_entry
>>>    and simplify it. Also NOKPROBE_SYMBOL() macro just
>>>    saves the address of non-probe-able symbols.
>>>
>>> ---
>>>
>>> Masami Hiramatsu (6):
>>
>>>       kprobes: Prohibit probing on .entry.text code
>>>       kprobes: Introduce NOKPROBE_SYMBOL() macro for blacklist
>> Hi Masami,
>> Is it good idea to split  "arch/x86" code from generic kernel changes?
>> Then we just need to take above two patches for verifying it on arm64
>> or other platforms.
>
> Yeah, it can be.
> However I think you can apply it without any problem on arm64 tree too,
> since it "just adds" an asm macro in arch/x86/include/asm/asm.h.
> It should not have any effect for other arch. Could you try it? :)
Hmm, for the second patch, git am failed with: "error: patch failed:
kernel/sched/core.c:2662",
manually patched to resolve it.  aarch64 tree is right now at Linux 3.13-rc2.

Anyways, no conflicts for x86 arch files.

Thanks,
Sandeepa

> Thank you,
>
>
> --
> Masami HIRAMATSU
> IT Management Research Dept. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: masami.hiramatsu.pt@hitachi.com
>
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-04  8:45 ` Ingo Molnar
@ 2013-12-04 23:27   ` Masami Hiramatsu
  2013-12-05 10:21     ` Ingo Molnar
                       ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-04 23:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller

(2013/12/04 17:45), Ingo Molnar wrote:
> 
> * Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:
> 
>> Hi,
>> Here is the version 4 of NOKPORBE_SYMBOL series.
>>
>> In this version, I removed the cleanup patches and
>> add bugfixes I've found, since those bugs will be
>> critical.
>>
>> Rest of the cleanup and visible blacklists will be proposed later in 
>> another series.
> 
> Ok, let me make it clear: we need _both_ the conceptual cleanups and 
> the bug fixes.

I see. Why I split this out is because it includes an RFC patch,
and for easier review. I still have a series of cleanups :)

> Right now kprobes are restricted to root, and they are unsafe and 
> buggy, and rather fundamentally so, because probing cannot be done 
> safely without potentially crashing the kernel. So there's no 
> 'regression' to be fixed, it's mostly about pre-existing bugs - so 
> there's no requirement for them to come before maintainability 
> cleanups.

OK, I think the kprobe is like a strong medicine, not a toy,
since it can intercept most of the kernel functions which
may process a sensitive user private data. Thus even if we
fix all bugs and make it safe, I don't think we can open
it for all users (of course, there should be a knob to open
for any or restricted users.)

> So we need both a maintainable and a sane/safe solution, and I'd like 
> to apply the whole thing at once and be at ease that the solution is 
> round. We should have done this years ago.

For the safeness of kprobes, I have an idea; introduce a whitelist
for dynamic events. AFAICS, the biggest unstable issue of kprobes
comes from putting *many* probes on the functions called from tracers.

It doesn't crash the kernel but slows down so much, because every
probes hit many other nested miss-hit probes. This gives us a big
performance impact. However, on the other side, this kind of feature
can be used *for debugging* static trace events by dynamic one if we
carefully use a small number of probes on such functions. :)

Thus, I think we can restrict users from probing such functions by
using a whitelist which ftrace does already have;
 available_filter_functions :)

Then, I'd like to propose this new whitelist feature in kprobe-tracer
(not raw kprobe itself). And a sysctl knob for disabling the whitelist.
That knob will be /proc/sys/debug/kprobe-event-whitelist and disabling
it will mark kernel tainted so that we can check it from bug reports.

> So could you please send a whole series that I can apply to -tip as a 
> work in progress tree, and then we can see what is left to be solved?

Sure. :) BTW, would I better fold the cleanups for reducing the number of
patches?

Thank you,


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-04  8:46     ` Sandeepa Prabhu
@ 2013-12-04 23:32       ` Masami Hiramatsu
  0 siblings, 0 replies; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-04 23:32 UTC (permalink / raw)
  To: Sandeepa Prabhu
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller

(2013/12/04 17:46), Sandeepa Prabhu wrote:
> On 4 December 2013 13:09, Masami Hiramatsu
> <masami.hiramatsu.pt@hitachi.com> wrote:
>> (2013/12/04 11:54), Sandeepa Prabhu wrote:
>>> On 4 December 2013 06:58, Masami Hiramatsu
>>> <masami.hiramatsu.pt@hitachi.com> wrote:
>>>> Hi,
>>>> Here is the version 4 of NOKPORBE_SYMBOL series.
>>>>
>>>> In this version, I removed the cleanup patches and
>>>> add bugfixes I've found, since those bugs will be
>>>> critical.
>>>> Rest of the cleanup and visible blacklists will be
>>>> proposed later in another series.
>>>>
>>>> Oh, just one new thing, I added a new RFC patch which
>>>> removes the dependency of notify_die() from kprobes
>>>> miss-hit/recovery path. Since the notify_die() involves
>>>> locking and lockdep code which invokes a lot of heavy
>>>> printk functions etc. This helped me to minimize the
>>>> blacklist and provides more stability for kprobes.
>>>> Actually, most of int3 handlers are already called
>>>> from do_int3 directly, I think this change is acceptable
>>>> too.
>>>>
>>>> Here is the updates about NOKPROBE_SYMBOL().
>>>>  - Now _ASM_NOKPROBE() macro is introduced for assembly
>>>>    symbols on x86.
>>>>  - Rename kprobe_blackpoint to kprobe_blacklist_entry
>>>>    and simplify it. Also NOKPROBE_SYMBOL() macro just
>>>>    saves the address of non-probe-able symbols.
>>>>
>>>> ---
>>>>
>>>> Masami Hiramatsu (6):
>>>
>>>>       kprobes: Prohibit probing on .entry.text code
>>>>       kprobes: Introduce NOKPROBE_SYMBOL() macro for blacklist
>>> Hi Masami,
>>> Is it good idea to split  "arch/x86" code from generic kernel changes?
>>> Then we just need to take above two patches for verifying it on arm64
>>> or other platforms.
>>
>> Yeah, it can be.
>> However I think you can apply it without any problem on arm64 tree too,
>> since it "just adds" an asm macro in arch/x86/include/asm/asm.h.
>> It should not have any effect for other arch. Could you try it? :)
> Hmm, for the second patch, git am failed with: "error: patch failed:
> kernel/sched/core.c:2662",
> manually patched to resolve it.  aarch64 tree is right now at Linux 3.13-rc2.

Ah I see, that must be changed because it is the change related to introducing
new blacklist itself. It is not solved by splitting arch/x86 change.

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-04 23:27   ` Masami Hiramatsu
@ 2013-12-05 10:21     ` Ingo Molnar
  2013-12-06  2:34       ` Masami Hiramatsu
  2013-12-05 13:08     ` Sandeepa Prabhu
  2013-12-05 14:49     ` Frank Ch. Eigler
  2 siblings, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2013-12-05 10:21 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller


* Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:

> > So we need both a maintainable and a sane/safe solution, and I'd 
> > like to apply the whole thing at once and be at ease that the 
> > solution is round. We should have done this years ago.
> 
> For the safeness of kprobes, I have an idea; introduce a whitelist 
> for dynamic events. AFAICS, the biggest unstable issue of kprobes 
> comes from putting *many* probes on the functions called from 
> tracers.

If the number of 'noprobe' annotations is expected to explode then 
maybe another approach should be considered.

For example in perf we detect recursion. Could kprobes do that and 
detect hitting a probe while running kprobes code, and ignore it [do 
an early return]?

That way most of the annotations could be removed and kprobes would 
become inherently safe. Is there any complication I'm missing?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-04 23:27   ` Masami Hiramatsu
  2013-12-05 10:21     ` Ingo Molnar
@ 2013-12-05 13:08     ` Sandeepa Prabhu
  2013-12-06  6:23       ` Masami Hiramatsu
  2013-12-05 14:49     ` Frank Ch. Eigler
  2 siblings, 1 reply; 39+ messages in thread
From: Sandeepa Prabhu @ 2013-12-05 13:08 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller

> OK, I think the kprobe is like a strong medicine, not a toy,
> since it can intercept most of the kernel functions which
> may process a sensitive user private data. Thus even if we
> fix all bugs and make it safe, I don't think we can open
> it for all users (of course, there should be a knob to open
> for any or restricted users.)
>
>> So we need both a maintainable and a sane/safe solution, and I'd like
>> to apply the whole thing at once and be at ease that the solution is
>> round. We should have done this years ago.
>
> For the safeness of kprobes, I have an idea; introduce a whitelist
> for dynamic events. AFAICS, the biggest unstable issue of kprobes
> comes from putting *many* probes on the functions called from tracers.
>
> It doesn't crash the kernel but slows down so much, because every
> probes hit many other nested miss-hit probes. This gives us a big
> performance impact. However, on the other side, this kind of feature
> can be used *for debugging* static trace events by dynamic one if we
> carefully use a small number of probes on such functions. :)
>
> Thus, I think we can restrict users from probing such functions by
> using a whitelist which ftrace does already have;
>  available_filter_functions :)
I am not sure if this question is related, uprobes or ftrace code does
not  define __kprobes, so is it safe to place kprobe on uprobes or
ftrace code? Is it expected from arch code to support such cases?

Thanks,
Sandeepa

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-04 23:27   ` Masami Hiramatsu
  2013-12-05 10:21     ` Ingo Molnar
  2013-12-05 13:08     ` Sandeepa Prabhu
@ 2013-12-05 14:49     ` Frank Ch. Eigler
  2013-12-06  6:12       ` Masami Hiramatsu
  2 siblings, 1 reply; 39+ messages in thread
From: Frank Ch. Eigler @ 2013-12-05 14:49 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86,
	lkml, Steven Rostedt (Red Hat),
	systemtap, David S. Miller


Hi, Masami -

masami.hiramatsu.pt wrote:

> [...]
> For the safeness of kprobes, I have an idea; introduce a whitelist
> for dynamic events. AFAICS, the biggest unstable issue of kprobes
> comes from putting *many* probes on the functions called from tracers.

Why do you think so?  We have had problems with single kprobes in the
"wrong" spot.  The main reason I showed spraying them widely is to get
wide coverage with minimal information/effort, not to suggest that the
number of concurrent probes per se is a problem.  (We have had
systemtap scripts probing some areas of the kernel with thousands of
active kprobes, e.g. for statement-by-statement variable-watching
jobs, and these have worked fine.)


> It doesn't crash the kernel but slows down so much, because every
> probes hit many other nested miss-hit probes. 

(kprobes does have code to detect & handle reentrancy.)

> This gives us a big performance impact. [...]

Sure, but I'd expect to see pure slowdowns show their impact with
time-related problems like watchdogs firing or timeouts.


> [...]  Then, I'd like to propose this new whitelist feature in
> kprobe-tracer (not raw kprobe itself). And a sysctl knob for
> disabling the whitelist.  That knob will be
> /proc/sys/debug/kprobe-event-whitelist and disabling it will mark
> kernel tainted so that we can check it from bug reports.

How would one assemble a reliable whitelist, if we haven't fully
characterized the problems that make the blacklist necessary?


- FChE

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-05 10:21     ` Ingo Molnar
@ 2013-12-06  2:34       ` Masami Hiramatsu
  2013-12-10 15:28         ` Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-06  2:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller

(2013/12/05 19:21), Ingo Molnar wrote:
> 
> * Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:
> 
>>> So we need both a maintainable and a sane/safe solution, and I'd 
>>> like to apply the whole thing at once and be at ease that the 
>>> solution is round. We should have done this years ago.
>>
>> For the safeness of kprobes, I have an idea; introduce a whitelist 
>> for dynamic events. AFAICS, the biggest unstable issue of kprobes 
>> comes from putting *many* probes on the functions called from 
>> tracers.
> 
> If the number of 'noprobe' annotations is expected to explode then 
> maybe another approach should be considered.

No, since this is a "quantitative" issue, the annotation helps us.

> For example in perf we detect recursion. Could kprobes do that and 
> detect hitting a probe while running kprobes code, and ignore it [do 
> an early return]?

Yes, the kprobe itself already has recursion detector and it rejects
calling handler.

> 
> That way most of the annotations could be removed and kprobes would 
> become inherently safe. Is there any complication I'm missing?

That is actually what I'm doing with cleanup patches. :)


Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-05 14:49     ` Frank Ch. Eigler
@ 2013-12-06  6:12       ` Masami Hiramatsu
  2013-12-06 19:07         ` Frank Ch. Eigler
  0 siblings, 1 reply; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-06  6:12 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86,
	lkml, Steven Rostedt (Red Hat),
	systemtap, David S. Miller

(2013/12/05 23:49), Frank Ch. Eigler wrote:
> 
> Hi, Masami -
> 
> masami.hiramatsu.pt wrote:
> 
>> [...]
>> For the safeness of kprobes, I have an idea; introduce a whitelist
>> for dynamic events. AFAICS, the biggest unstable issue of kprobes
>> comes from putting *many* probes on the functions called from tracers.
> 
> Why do you think so?

Oh, because I actually hit this problem when enabling kprobe-events
on every *ftrace-related* functions(ring buffer, trace filter etc.)
It doesn't crash the kernel but it slows down the machine very much.
And finally I have to reboot it forcibly. But when I just enables a
few probes on those functions, the system has no problem.

In this case, almost probes are miss-hit because of recursion, but
anyway each miss-hit involves int3/debug interrupts and it increases
the processing time of one event handling by ftrace as below.

1. hit a kprobe outside of ftrace
2. kprobe calls event handler
3. the event handler calls ftrace-related functions to reserve
   buffer, check filter, commit buffer etc.
  3-1. each ftrace/ringbuffer function hits a kprobe
  3-2. the kprobe detect recursion and just do single-step and return
4. do single stepping
5. return from kprobe

Note that all the problem happens inside the event handler.

>  We have had problems with single kprobes in the
> "wrong" spot.  The main reason I showed spraying them widely is to get
> wide coverage with minimal information/effort, not to suggest that the
> number of concurrent probes per se is a problem.  (We have had
> systemtap scripts probing some areas of the kernel with thousands of
> active kprobes, e.g. for statement-by-statement variable-watching
> jobs, and these have worked fine.)

Ah, sorry for confusion. Agreed. I just tried to explain that kprobes
can cause a performance problem under *very specific* operation.
So the whitelist is just for keeping people away from it.

>> It doesn't crash the kernel but slows down so much, because every
>> probes hit many other nested miss-hit probes. 
> 
> (kprobes does have code to detect & handle reentrancy.)

Right. :)

>> This gives us a big performance impact. [...]
> 
> Sure, but I'd expect to see pure slowdowns show their impact with
> time-related problems like watchdogs firing or timeouts.

I doubt it can cause, because each probe processing time is
still small enough to slip through the watchdog.

>> [...]  Then, I'd like to propose this new whitelist feature in
>> kprobe-tracer (not raw kprobe itself). And a sysctl knob for
>> disabling the whitelist.  That knob will be
>> /proc/sys/debug/kprobe-event-whitelist and disabling it will mark
>> kernel tainted so that we can check it from bug reports.
> 
> How would one assemble a reliable whitelist, if we haven't fully
> characterized the problems that make the blacklist necessary?

As I said, we can use function graph tracer's list as the whitelist,
since it doesn't include any functions invoked from ftrace's event
handler. (Note that I don't mention the Systemtap or other user here)

Whitelist is just for keeping the people away from the quantitative
issue, who just want to trace their subsystems except for ftrace.
For example, such people may try to probe every functions
(e.g. perf probe --add '* $vars' : actually this is why I don't
release wildcard support on perf probe yet).
Of course I can implement the whitelist feature in perf probe only,
that will allow me to support wildcard on perf probe. :)

For the long term solution, I think we can introduce some kind of
performance gatekeeper as systemtap does. Counting the miss-hit rate
per second and if it go over a threshold, disable next miss-hit (or
most miss-hit) probe (as OOM killer does).

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-05 13:08     ` Sandeepa Prabhu
@ 2013-12-06  6:23       ` Masami Hiramatsu
  2013-12-06  6:54         ` Sandeepa Prabhu
  0 siblings, 1 reply; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-06  6:23 UTC (permalink / raw)
  To: Sandeepa Prabhu
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller

(2013/12/05 22:08), Sandeepa Prabhu wrote:
>> OK, I think the kprobe is like a strong medicine, not a toy,
>> since it can intercept most of the kernel functions which
>> may process a sensitive user private data. Thus even if we
>> fix all bugs and make it safe, I don't think we can open
>> it for all users (of course, there should be a knob to open
>> for any or restricted users.)
>>
>>> So we need both a maintainable and a sane/safe solution, and I'd like
>>> to apply the whole thing at once and be at ease that the solution is
>>> round. We should have done this years ago.
>>
>> For the safeness of kprobes, I have an idea; introduce a whitelist
>> for dynamic events. AFAICS, the biggest unstable issue of kprobes
>> comes from putting *many* probes on the functions called from tracers.
>>
>> It doesn't crash the kernel but slows down so much, because every
>> probes hit many other nested miss-hit probes. This gives us a big
>> performance impact. However, on the other side, this kind of feature
>> can be used *for debugging* static trace events by dynamic one if we
>> carefully use a small number of probes on such functions. :)
>>
>> Thus, I think we can restrict users from probing such functions by
>> using a whitelist which ftrace does already have;
>>  available_filter_functions :)
> I am not sure if this question is related, uprobes or ftrace code does
> not  define __kprobes, so is it safe to place kprobe on uprobes or
> ftrace code? 

Yes, it is "safe" in qualitative meaning. But for ftrace code, it could
give a performance impact by miss-hitting. Since uprobe is independent
from kprobe, it should work.

> Is it expected from arch code to support such cases?

Yes, the arch dependent implementation is the key. If it shares some
code which can be called from miss-hit path, it should be blacklisted.

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-06  6:23       ` Masami Hiramatsu
@ 2013-12-06  6:54         ` Sandeepa Prabhu
  2013-12-06 23:25           ` Masami Hiramatsu
  0 siblings, 1 reply; 39+ messages in thread
From: Sandeepa Prabhu @ 2013-12-06  6:54 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller

>> I am not sure if this question is related, uprobes or ftrace code does
>> not  define __kprobes, so is it safe to place kprobe on uprobes or
>> ftrace code?
>
> Yes, it is "safe" in qualitative meaning. But for ftrace code, it could
> give a performance impact by miss-hitting. Since uprobe is independent
> from kprobe, it should work.
>
>> Is it expected from arch code to support such cases?
>
> Yes, the arch dependent implementation is the key. If it shares some
> code which can be called from miss-hit path, it should be blacklisted.
well, isn't the blacklist only for those routines that can not be
handled or may crash kernel, like the code sections called from
exception kprobes exception handlers etc?
suppose if the probe on routine can miss-hit (probes re-cursing) but
can be handled, it's only a quantitative issue (i.e. performance
impact) so it should be *user's* problem right? I mean, as you said
earlier about having white-list or a performance gatekeeper
(systemtap), one can avoid such cases by white list or removing
miss-hit probes dynamically.  But a blacklisting a symbol means
placing a probe on that *can not be handled* and can crash the system,
is it correct?

Thanks,
Sandeepa

>
> Thank you,
>
> --
> Masami HIRAMATSU
> IT Management Research Dept. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: masami.hiramatsu.pt@hitachi.com
>
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-06  6:12       ` Masami Hiramatsu
@ 2013-12-06 19:07         ` Frank Ch. Eigler
  2013-12-06 23:19           ` Masami Hiramatsu
  0 siblings, 1 reply; 39+ messages in thread
From: Frank Ch. Eigler @ 2013-12-06 19:07 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86,
	lkml, Steven Rostedt (Red Hat),
	systemtap, David S. Miller

Hi, Masami -

masami.hiramatsu.pt wrote:

> [...]
> >> [...]  Then, I'd like to propose this new whitelist feature in
> >> kprobe-tracer (not raw kprobe itself). And a sysctl knob for
> >> disabling the whitelist.  That knob will be
> >> /proc/sys/debug/kprobe-event-whitelist and disabling it will mark
> >> kernel tainted so that we can check it from bug reports.
> > 
> > How would one assemble a reliable whitelist, if we haven't fully
> > characterized the problems that make the blacklist necessary?
> 
> As I said, we can use function graph tracer's list as the whitelist,
> since it doesn't include any functions invoked from ftrace's event
> handler. (Note that I don't mention the Systemtap or other user here)
>
> Whitelist is just for keeping the people away from the quantitative
> issue, who just want to trace their subsystems except for ftrace.
> [...]

Would you plan to limit kprobes (or just the perf-probe frontend) to
only function-entries also?  If not, and if intra-function
statement-granularity kprobes remain allowed within a
function-granularity whitelist, then you might still have those
"quantitative" problems.

Even worse, kprobes robustness problems can bite even with a small
whitelist, unless you can test the countless subset selections
cartesian-product the aggrevating factors (like other tracing
facilities being in use at the same time, limited memory, high irq
rates, debugging sessions, architectures, whatever).


> [...]  For the long term solution, I think we can introduce some
> kind of performance gatekeeper as systemtap does. Counting the
> miss-hit rate per second and if it go over a threshold, disable next
> miss-hit (or most miss-hit) probe (as OOM killer does).

That would make sense, but again it would not help deal with kprobes
robustness (in the kernel-crashing rather than kernel-slowdown sense).


- FChE

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-06 19:07         ` Frank Ch. Eigler
@ 2013-12-06 23:19           ` Masami Hiramatsu
  2013-12-07  1:32             ` Frank Ch. Eigler
  0 siblings, 1 reply; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-06 23:19 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86,
	lkml, Steven Rostedt (Red Hat),
	systemtap, David S. Miller

(2013/12/07 4:07), Frank Ch. Eigler wrote:
> Hi, Masami -
> 
> masami.hiramatsu.pt wrote:
> 
>> [...]
>>>> [...]  Then, I'd like to propose this new whitelist feature in
>>>> kprobe-tracer (not raw kprobe itself). And a sysctl knob for
>>>> disabling the whitelist.  That knob will be
>>>> /proc/sys/debug/kprobe-event-whitelist and disabling it will mark
>>>> kernel tainted so that we can check it from bug reports.
>>>
>>> How would one assemble a reliable whitelist, if we haven't fully
>>> characterized the problems that make the blacklist necessary?
>>
>> As I said, we can use function graph tracer's list as the whitelist,
>> since it doesn't include any functions invoked from ftrace's event
>> handler. (Note that I don't mention the Systemtap or other user here)
>>
>> Whitelist is just for keeping the people away from the quantitative
>> issue, who just want to trace their subsystems except for ftrace.
>> [...]
> 
> Would you plan to limit kprobes (or just the perf-probe frontend) to
> only function-entries also?

Exactly, yes :). Currently I have a patch for kprobe-tracer
implementation (not only for perf-probe, but doesn't limit
kprobes itself).

>  If not, and if intra-function
> statement-granularity kprobes remain allowed within a
> function-granularity whitelist, then you might still have those
> "quantitative" problems.

Yes, but as far as I've tested, the performance overhead is not
high, especially as far as putting kprobes at the entry of those
functions because of ftrace-based optimization.

> Even worse, kprobes robustness problems can bite even with a small
> whitelist, unless you can test the countless subset selections
> cartesian-product the aggrevating factors (like other tracing
> facilities being in use at the same time, limited memory, high irq
> rates, debugging sessions, architectures, whatever).

And also, what script will run on each probe, right? :)

>> [...]  For the long term solution, I think we can introduce some
>> kind of performance gatekeeper as systemtap does. Counting the
>> miss-hit rate per second and if it go over a threshold, disable next
>> miss-hit (or most miss-hit) probe (as OOM killer does).
> 
> That would make sense, but again it would not help deal with kprobes
> robustness (in the kernel-crashing rather than kernel-slowdown sense).

Why would you think so? Is there any hidden path for calling kprobes
mechanism?? The kernel crash problem just comes from bugs, not the
quantitative issue.

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-06  6:54         ` Sandeepa Prabhu
@ 2013-12-06 23:25           ` Masami Hiramatsu
  0 siblings, 0 replies; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-06 23:25 UTC (permalink / raw)
  To: Sandeepa Prabhu
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller

(2013/12/06 15:54), Sandeepa Prabhu wrote:
>>> I am not sure if this question is related, uprobes or ftrace code does
>>> not  define __kprobes, so is it safe to place kprobe on uprobes or
>>> ftrace code?
>>
>> Yes, it is "safe" in qualitative meaning. But for ftrace code, it could
>> give a performance impact by miss-hitting. Since uprobe is independent
>> from kprobe, it should work.
>>
>>> Is it expected from arch code to support such cases?
>>
>> Yes, the arch dependent implementation is the key. If it shares some
>> code which can be called from miss-hit path, it should be blacklisted.
> well, isn't the blacklist only for those routines that can not be
> handled or may crash kernel, like the code sections called from
> exception kprobes exception handlers etc?

Yes, that's why the blacklist is needed.

> suppose if the probe on routine can miss-hit (probes re-cursing) but
> can be handled, it's only a quantitative issue (i.e. performance
> impact) so it should be *user's* problem right? I mean, as you said
> earlier about having white-list or a performance gatekeeper
> (systemtap), one can avoid such cases by white list or removing
> miss-hit probes dynamically.  But a blacklisting a symbol means
> placing a probe on that *can not be handled* and can crash the system,
> is it correct?

Yes, exactly that is what I meant. :)
The blacklist is only for avoiding such fundamental issue, therefore,
it strongly depends on the architecture code.

Thank you,
-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-06 23:19           ` Masami Hiramatsu
@ 2013-12-07  1:32             ` Frank Ch. Eigler
  2013-12-07  2:34               ` Masami Hiramatsu
  0 siblings, 1 reply; 39+ messages in thread
From: Frank Ch. Eigler @ 2013-12-07  1:32 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86,
	lkml, Steven Rostedt (Red Hat),
	systemtap, David S. Miller

Hi -

On Sat, Dec 07, 2013 at 08:19:13AM +0900, Masami Hiramatsu wrote:

> [...]
> > Would you plan to limit kprobes (or just the perf-probe frontend) to
> > only function-entries also?

> Exactly, yes :). Currently I have a patch for kprobe-tracer
> implementation (not only for perf-probe, but doesn't limit kprobes
> itself).

Interesting option.  It sounds like a restrictive expedient that could
result in kprobes never being made sufficiently robust.


> > If not, and if intra-function statement-granularity kprobes remain
> > allowed within a function-granularity whitelist, then you might
> > still have those "quantitative" problems.

> Yes, but as far as I've tested, the performance overhead is not
> high, especially as far as putting kprobes at the entry of those
> functions because of ftrace-based optimization.

(Would that also make CONFIG_KPROBE_EVENT require KPROBES_ON_FTRACE?)


> > Even worse, kprobes robustness problems can bite even with a small
> > whitelist, unless you can test the countless subset selections
> > cartesian-product the aggrevating factors (like other tracing
> > facilities being in use at the same time, limited memory, high irq
> > rates, debugging sessions, architectures, whatever).
> 
> And also, what script will run on each probe, right? :)

In the perf-probe world, the closest analogue could be varying the
contextual data that's being extracted (stack traces, parameters, ...).


> >> [...]  For the long term solution, I think we can introduce some
> >> kind of performance gatekeeper as systemtap does. Counting the
> >> miss-hit rate per second and if it go over a threshold, disable next
> >> miss-hit (or most miss-hit) probe (as OOM killer does).
> > 
> > That would make sense, but again it would not help deal with kprobes
> > robustness (in the kernel-crashing rather than kernel-slowdown sense).
> 
> Why would you think so? Is there any hidden path for calling kprobes
> mechanism?? The kernel crash problem just comes from bugs, not the
> quantitative issue.

I don't think we're disagreeing.  A performance-gatekeeper in
perf-probe or nearby would be useful (and manage the kprobe-quantity
problem).  It would not be sufficient to prevent the kernel-crashing
bugs.


- FChE

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-07  1:32             ` Frank Ch. Eigler
@ 2013-12-07  2:34               ` Masami Hiramatsu
  0 siblings, 0 replies; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-07  2:34 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Ingo Molnar, Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86,
	lkml, Steven Rostedt (Red Hat),
	systemtap, David S. Miller

(2013/12/07 10:32), Frank Ch. Eigler wrote:
> Hi -
> 
> On Sat, Dec 07, 2013 at 08:19:13AM +0900, Masami Hiramatsu wrote:
> 
>> [...]
>>> Would you plan to limit kprobes (or just the perf-probe frontend) to
>>> only function-entries also?
> 
>> Exactly, yes :). Currently I have a patch for kprobe-tracer
>> implementation (not only for perf-probe, but doesn't limit kprobes
>> itself).
> 
> Interesting option.  It sounds like a restrictive expedient that could
> result in kprobes never being made sufficiently robust.

the raw-kprobes users like systemtap can also implement its own
whitelist. :) ftrace-based whitelist is only useful for ftrace/perf.
Anyway, the list is open via debugfs as available_filter_functions.

>>> If not, and if intra-function statement-granularity kprobes remain
>>> allowed within a function-granularity whitelist, then you might
>>> still have those "quantitative" problems.
> 
>> Yes, but as far as I've tested, the performance overhead is not
>> high, especially as far as putting kprobes at the entry of those
>> functions because of ftrace-based optimization.
> 
> (Would that also make CONFIG_KPROBE_EVENT require KPROBES_ON_FTRACE?)

Ah, no but a good point. at least the whitelist requires
CONFIG_FUNCTION_TRACER.

>>> Even worse, kprobes robustness problems can bite even with a small
>>> whitelist, unless you can test the countless subset selections
>>> cartesian-product the aggrevating factors (like other tracing
>>> facilities being in use at the same time, limited memory, high irq
>>> rates, debugging sessions, architectures, whatever).
>>
>> And also, what script will run on each probe, right? :)
> 
> In the perf-probe world, the closest analogue could be varying the
> contextual data that's being extracted (stack traces, parameters, ...).

Yes, it should be verified before accessing it (and already done).

>>>> [...]  For the long term solution, I think we can introduce some
>>>> kind of performance gatekeeper as systemtap does. Counting the
>>>> miss-hit rate per second and if it go over a threshold, disable next
>>>> miss-hit (or most miss-hit) probe (as OOM killer does).
>>>
>>> That would make sense, but again it would not help deal with kprobes
>>> robustness (in the kernel-crashing rather than kernel-slowdown sense).
>>
>> Why would you think so? Is there any hidden path for calling kprobes
>> mechanism?? The kernel crash problem just comes from bugs, not the
>> quantitative issue.
> 
> I don't think we're disagreeing.  A performance-gatekeeper in
> perf-probe or nearby would be useful (and manage the kprobe-quantity
> problem).  It would not be sufficient to prevent the kernel-crashing
> bugs.

Right. Ah, I just meant that we'd better add those features, not
replacing the blacklist. And the blacklist should be maintained
anyway. :)

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-06  2:34       ` Masami Hiramatsu
@ 2013-12-10 15:28         ` Ingo Molnar
  2013-12-11  2:12           ` Masami Hiramatsu
  0 siblings, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2013-12-10 15:28 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller


* Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:

> (2013/12/05 19:21), Ingo Molnar wrote:
> > 
> > * Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:
> > 
> >>> So we need both a maintainable and a sane/safe solution, and I'd 
> >>> like to apply the whole thing at once and be at ease that the 
> >>> solution is round. We should have done this years ago.
> >>
> >> For the safeness of kprobes, I have an idea; introduce a whitelist 
> >> for dynamic events. AFAICS, the biggest unstable issue of kprobes 
> >> comes from putting *many* probes on the functions called from 
> >> tracers.
> > 
> > If the number of 'noprobe' annotations is expected to explode then 
> > maybe another approach should be considered.
> 
> No, since this is a "quantitative" issue, the annotation helps us.
> 
> > For example in perf we detect recursion. Could kprobes do that and 
> > detect hitting a probe while running kprobes code, and ignore it [do 
> > an early return]?
> 
> Yes, the kprobe itself already has recursion detector and it rejects
> calling handler.

So why are annotations needed at all? What can happen if an annotation 
is missing and a piece of code is probed which is also used by the 
kprobes code internally - do we crash, lock up, misbehave or handle it 
safely?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-10 15:28         ` Ingo Molnar
@ 2013-12-11  2:12           ` Masami Hiramatsu
  2013-12-11 13:34             ` Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-11  2:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller

(2013/12/11 0:28), Ingo Molnar wrote:
> 
> * Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:
> 
>> (2013/12/05 19:21), Ingo Molnar wrote:
>>>
>>> * Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:
>>>
>>>>> So we need both a maintainable and a sane/safe solution, and I'd 
>>>>> like to apply the whole thing at once and be at ease that the 
>>>>> solution is round. We should have done this years ago.
>>>>
>>>> For the safeness of kprobes, I have an idea; introduce a whitelist 
>>>> for dynamic events. AFAICS, the biggest unstable issue of kprobes 
>>>> comes from putting *many* probes on the functions called from 
>>>> tracers.
>>>
>>> If the number of 'noprobe' annotations is expected to explode then 
>>> maybe another approach should be considered.
>>
>> No, since this is a "quantitative" issue, the annotation helps us.

Sorry for confusion, it should be "the annotation helps us to solves
only for qualitative issue". It's my miss...

>>> For example in perf we detect recursion. Could kprobes do that and 
>>> detect hitting a probe while running kprobes code, and ignore it [do 
>>> an early return]?
>>
>> Yes, the kprobe itself already has recursion detector and it rejects
>> calling handler.
> 
> So why are annotations needed at all? What can happen if an annotation 
> is missing and a piece of code is probed which is also used by the 
> kprobes code internally - do we crash, lock up, misbehave or handle it 
> safely?

The kprobe has recursion detector, but it is detected in the kprobe
exception(int3) handler, this means that if we put a probe before
detecting the recursion, we'll do an infinite recursion.
And also, even if we can detect the recursion, we can't stop the
kernel, we need to skip the probe. This means that we need to recover
to the main execution path by doing single step. As you may know,
since the single stepping involves the debug exception, we have to
avoid proving on that path too. Or we'll have an infinite recursion
again.
This is the basic reason why the __kprobes annotation is introduced.

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 6/6] [RFC] kprobes/x86: Call exception handlers directly from do_int3/do_debug
  2013-12-04  2:39   ` Steven Rostedt
@ 2013-12-11 13:31     ` Jiri Kosina
  2013-12-12  4:40       ` Masami Hiramatsu
  0 siblings, 1 reply; 39+ messages in thread
From: Jiri Kosina @ 2013-12-11 13:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Ingo Molnar, Andi Kleen,
	Ananth N Mavinakayanahalli, Sandeepa Prabhu, Frederic Weisbecker,
	x86, lkml, Andrew Morton, Ingo Molnar, systemtap, H. Peter Anvin,
	Sasha Levin, Thomas Gleixner, Seiji Aguchi, David S. Miller

On Tue, 3 Dec 2013, Steven Rostedt wrote:

> > To avoid a kernel crash by probing on lockdep code, call
> > kprobe_int3_handler and kprobe_debug_handler directly
> > from do_int3 and do_debug. Since there is a locking code
> > in notify_die, lockdep code can be invoked. And because
> > the lockdep involves printk() related things, theoretically,
> > we need to prohibit probing on much more code...
> > 
> > Anyway, most of the int3 handlers in the kernel are already
> > called from do_int3 directly, e.g. ftrace_int3_handler,
> > poke_int3_handler, kgdb_ll_trap. Actually only
> > kprobe_exceptions_notify is on the notifier_call_chain.
> > 
> > So I think this is not a crazy thing.
> 
> What? Oh, yeah. No, using notifiers in int3 handler is the crazy
> thing ;-)

Yeah, it's broken. Obviously, if you happen to trigger int3 before the 
notifier has been registered, it'd cause int3 exception to be unhandled. 
See

	commit 17f41571bb2c4a398785452ac2718a6c5d77180e
	Author: Jiri Kosina <jkosina@suse.cz>
	Date:   Tue Jul 23 10:09:28 2013 +0200

	    kprobes/x86: Call out into INT3 handler directly instead of using notifier

for one such issue that happened with jump labels.

> Hmm, if there's no users of the int3 notifier, should we just remove it?

Hmm, there are still uprobes, right?

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-11  2:12           ` Masami Hiramatsu
@ 2013-12-11 13:34             ` Ingo Molnar
  2013-12-12  6:02               ` Masami Hiramatsu
  0 siblings, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2013-12-11 13:34 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller


* Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:

> > So why are annotations needed at all? What can happen if an 
> > annotation is missing and a piece of code is probed which is also 
> > used by the kprobes code internally - do we crash, lock up, 
> > misbehave or handle it safely?
> 
> The kprobe has recursion detector, [...]

It's the 'current_kprobe' percpu variable, checked via 
kprobe_running(), right?

> [...] but it is detected in the kprobe exception(int3) handler, this 
> means that if we put a probe before detecting the recursion, we'll 
> do an infinite recursion.

So only the (presumably rather narrow) code path leading to the 
recursion detection code has to be annotated, correct?

> And also, even if we can detect the recursion, we can't stop the 
> kernel, we need to skip the probe. This means that we need to 
> recover to the main execution path by doing single step. As you may 
> know, since the single stepping involves the debug exception, we 
> have to avoid proving on that path too. Or we'll have an infinite 
> recursion again.

I don't see why this is needed: if a "probing is disabled" recursion 
flag is set the moment the first probe fires, and if it's only cleared 
once all processing is finished, then any intermediate probes should 
simply return early from int3 and not fire.

What am I missing?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 6/6] [RFC] kprobes/x86: Call exception handlers directly from do_int3/do_debug
  2013-12-11 13:31     ` Jiri Kosina
@ 2013-12-12  4:40       ` Masami Hiramatsu
  2013-12-12  9:59         ` Jiri Kosina
  0 siblings, 1 reply; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-12  4:40 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Steven Rostedt, Ingo Molnar, Andi Kleen,
	Ananth N Mavinakayanahalli, Sandeepa Prabhu, Frederic Weisbecker,
	x86, lkml, Andrew Morton, Ingo Molnar, systemtap, H. Peter Anvin,
	Sasha Levin, Thomas Gleixner, Seiji Aguchi, David S. Miller

(2013/12/11 22:31), Jiri Kosina wrote:
> On Tue, 3 Dec 2013, Steven Rostedt wrote:
> 
>>> To avoid a kernel crash by probing on lockdep code, call
>>> kprobe_int3_handler and kprobe_debug_handler directly
>>> from do_int3 and do_debug. Since there is a locking code
>>> in notify_die, lockdep code can be invoked. And because
>>> the lockdep involves printk() related things, theoretically,
>>> we need to prohibit probing on much more code...
>>>
>>> Anyway, most of the int3 handlers in the kernel are already
>>> called from do_int3 directly, e.g. ftrace_int3_handler,
>>> poke_int3_handler, kgdb_ll_trap. Actually only
>>> kprobe_exceptions_notify is on the notifier_call_chain.
>>>
>>> So I think this is not a crazy thing.
>>
>> What? Oh, yeah. No, using notifiers in int3 handler is the crazy
>> thing ;-)
> 
> Yeah, it's broken. Obviously, if you happen to trigger int3 before the 
> notifier has been registered, it'd cause int3 exception to be unhandled. 
> See
> 
> 	commit 17f41571bb2c4a398785452ac2718a6c5d77180e
> 	Author: Jiri Kosina <jkosina@suse.cz>
> 	Date:   Tue Jul 23 10:09:28 2013 +0200
> 
> 	    kprobes/x86: Call out into INT3 handler directly instead of using notifier
> 
> for one such issue that happened with jump labels.
> 
>> Hmm, if there's no users of the int3 notifier, should we just remove it?
> 
> Hmm, there are still uprobes, right?

Right, uprobes still use it, however, since it only handles user-space
breakpoint, there is no problem.

Thank you!


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-11 13:34             ` Ingo Molnar
@ 2013-12-12  6:02               ` Masami Hiramatsu
  2013-12-12 14:03                 ` Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-12  6:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller

(2013/12/11 22:34), Ingo Molnar wrote:
> 
> * Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:
> 
>>> So why are annotations needed at all? What can happen if an 
>>> annotation is missing and a piece of code is probed which is also 
>>> used by the kprobes code internally - do we crash, lock up, 
>>> misbehave or handle it safely?
>>
>> The kprobe has recursion detector, [...]
> 
> It's the 'current_kprobe' percpu variable, checked via 
> kprobe_running(), right?

Right. :)

>> [...] but it is detected in the kprobe exception(int3) handler, this 
>> means that if we put a probe before detecting the recursion, we'll 
>> do an infinite recursion.
> 
> So only the (presumably rather narrow) code path leading to the 
> recursion detection code has to be annotated, correct?

Yes, correct.

>> And also, even if we can detect the recursion, we can't stop the 
>> kernel, we need to skip the probe. This means that we need to 
>> recover to the main execution path by doing single step. As you may 
>> know, since the single stepping involves the debug exception, we 
>> have to avoid proving on that path too. Or we'll have an infinite 
>> recursion again.
> 
> I don't see why this is needed: if a "probing is disabled" recursion 
> flag is set the moment the first probe fires, and if it's only cleared 
> once all processing is finished, then any intermediate probes should 
> simply return early from int3 and not fire.

No, because the int3 already changes the original instruction.
This means that you cannot skip singlestep(or emulate) the
instruction which is copied to execution buffer (ainsn->insn),
even if you have such the flag.
So, kprobe requires the annotations on the singlestep path.

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 6/6] [RFC] kprobes/x86: Call exception handlers directly from do_int3/do_debug
  2013-12-12  4:40       ` Masami Hiramatsu
@ 2013-12-12  9:59         ` Jiri Kosina
  2013-12-12 10:31           ` Masami Hiramatsu
  0 siblings, 1 reply; 39+ messages in thread
From: Jiri Kosina @ 2013-12-12  9:59 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Ingo Molnar, Andi Kleen,
	Ananth N Mavinakayanahalli, Sandeepa Prabhu, Frederic Weisbecker,
	x86, lkml, Andrew Morton, Ingo Molnar, systemtap, H. Peter Anvin,
	Sasha Levin, Thomas Gleixner, Seiji Aguchi, David S. Miller

On Thu, 12 Dec 2013, Masami Hiramatsu wrote:

> > Yeah, it's broken. Obviously, if you happen to trigger int3 before the 
> > notifier has been registered, it'd cause int3 exception to be unhandled. 
> > See
> > 
> > 	commit 17f41571bb2c4a398785452ac2718a6c5d77180e
> > 	Author: Jiri Kosina <jkosina@suse.cz>
> > 	Date:   Tue Jul 23 10:09:28 2013 +0200
> > 
> > 	    kprobes/x86: Call out into INT3 handler directly instead of using notifier
> > 
> > for one such issue that happened with jump labels.
> > 
> >> Hmm, if there's no users of the int3 notifier, should we just remove it?
> > 
> > Hmm, there are still uprobes, right?
> 
> Right, uprobes still use it, however, since it only handles user-space
> breakpoint, there is no problem.

Agreed. But therefore the notifier can't just be removed, unless uprobes 
are converted to direct call as well (but I don't think that'd be 
beneficial, notifier is sufficient in this case).

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 6/6] [RFC] kprobes/x86: Call exception handlers directly from do_int3/do_debug
  2013-12-12  9:59         ` Jiri Kosina
@ 2013-12-12 10:31           ` Masami Hiramatsu
  0 siblings, 0 replies; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-12 10:31 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Steven Rostedt, Ingo Molnar, Andi Kleen,
	Ananth N Mavinakayanahalli, Sandeepa Prabhu, Frederic Weisbecker,
	x86, lkml, Andrew Morton, Ingo Molnar, systemtap, H. Peter Anvin,
	Sasha Levin, Thomas Gleixner, Seiji Aguchi, David S. Miller

(2013/12/12 18:59), Jiri Kosina wrote:
> On Thu, 12 Dec 2013, Masami Hiramatsu wrote:
> 
>>> Yeah, it's broken. Obviously, if you happen to trigger int3 before the 
>>> notifier has been registered, it'd cause int3 exception to be unhandled. 
>>> See
>>>
>>> 	commit 17f41571bb2c4a398785452ac2718a6c5d77180e
>>> 	Author: Jiri Kosina <jkosina@suse.cz>
>>> 	Date:   Tue Jul 23 10:09:28 2013 +0200
>>>
>>> 	    kprobes/x86: Call out into INT3 handler directly instead of using notifier
>>>
>>> for one such issue that happened with jump labels.
>>>
>>>> Hmm, if there's no users of the int3 notifier, should we just remove it?
>>>
>>> Hmm, there are still uprobes, right?
>>
>> Right, uprobes still use it, however, since it only handles user-space
>> breakpoint, there is no problem.
> 
> Agreed. But therefore the notifier can't just be removed, unless uprobes 
> are converted to direct call as well (but I don't think that'd be 
> beneficial, notifier is sufficient in this case).

Ah, I don't intended to remove notify_die from do_int3, since
notify_die notifies many other exceptions too. :)

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-12  6:02               ` Masami Hiramatsu
@ 2013-12-12 14:03                 ` Ingo Molnar
  2013-12-12 20:42                   ` Josh Stone
  2013-12-13  5:34                   ` Masami Hiramatsu
  0 siblings, 2 replies; 39+ messages in thread
From: Ingo Molnar @ 2013-12-12 14:03 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller


* Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:

> (2013/12/11 22:34), Ingo Molnar wrote:
> > 
> > * Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:
> > 
> >>> So why are annotations needed at all? What can happen if an 
> >>> annotation is missing and a piece of code is probed which is also 
> >>> used by the kprobes code internally - do we crash, lock up, 
> >>> misbehave or handle it safely?
> >>
> >> The kprobe has recursion detector, [...]
> > 
> > It's the 'current_kprobe' percpu variable, checked via 
> > kprobe_running(), right?
> 
> Right. :)

So that recursion detection runs a bit late:

/*
 * Interrupts are disabled on entry as trap3 is an interrupt gate and they
 * remain disabled throughout this function.
 */
static int __kprobes kprobe_handler(struct pt_regs *regs)
{
        kprobe_opcode_t *addr;
        struct kprobe *p;
        struct kprobe_ctlblk *kcb;

        addr = (kprobe_opcode_t *)(regs->ip - sizeof(kprobe_opcode_t));
        /*
         * We don't want to be preempted for the entire
         * duration of kprobe processing. We conditionally
         * re-enable preemption at the end of this function,
         * and also in reenter_kprobe() and setup_singlestep().
         */
        preempt_disable();

        kcb = get_kprobe_ctlblk();
        p = get_kprobe(addr);

        if (p) {
                if (kprobe_running()) {

this flag should be checked first - the kprobe handler should already 
run in non-preemptible context if it comes from an exception.

For that reason I don't understand the whole 
preempt_disable()/enable() dance - it looks entirely superfluous to 
me. The comment above the preempt_disable() looks mostly bogus.

( The flow of logic in the function is rather confusing as well - 
  lots of places return from the middle of the function - instead they 
  should have the usual 'goto out' kind of code pattern. )

> >> [...] but it is detected in the kprobe exception(int3) handler, 
> >> this means that if we put a probe before detecting the recursion, 
> >> we'll do an infinite recursion.
> > 
> > So only the (presumably rather narrow) code path leading to the 
> > recursion detection code has to be annotated, correct?
> 
> Yes, correct.

So, another thing I find confusing is the whole kprobes notifier 
block. Why doesn't it call back specific kprobes handlers, directly 
from do_int3() and do_debug()? That's much more readable and it also 
allows the kprobes code to go earlier in the handler, running its 
recursion code earlier!

Another question I have here is: how does the kprobes code protect 
against interrupts arriving in before the recursion check and running 
a probe recursively?

> >> And also, even if we can detect the recursion, we can't stop the 
> >> kernel, we need to skip the probe. This means that we need to 
> >> recover to the main execution path by doing single step. As you 
> >> may know, since the single stepping involves the debug exception, 
> >> we have to avoid proving on that path too. Or we'll have an 
> >> infinite recursion again.
> > 
> > I don't see why this is needed: if a "probing is disabled" 
> > recursion flag is set the moment the first probe fires, and if 
> > it's only cleared once all processing is finished, then any 
> > intermediate probes should simply return early from int3 and not 
> > fire.
> 
> No, because the int3 already changes the original instruction.
> This means that you cannot skip singlestep(or emulate) the
> instruction which is copied to execution buffer (ainsn->insn),
> even if you have such the flag.
> So, kprobe requires the annotations on the singlestep path.

I don't understand this reasoning.

Lets assume we allow a probe to be inserted in the single-step path. 
Such a probe will be an INT3 instruction and if it hits we get a 
recursive INT3 invocation. In that case the INT3 handler should simply 
restore the original instruction and _leave it so_. There's no 
single-stepping needed - the probe is confused and must be discarded.

Once the original instruction is restored we simply return from the 
int3 exception and the single-step handling execution can continue.

This would be _way_ more robust as we wouldn't have to precisely 
annotate anything but the very narrow int3 exception code path and the 
'restore original instruction in case of recursion' code path.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-12 14:03                 ` Ingo Molnar
@ 2013-12-12 20:42                   ` Josh Stone
  2013-12-13  5:34                   ` Masami Hiramatsu
  1 sibling, 0 replies; 39+ messages in thread
From: Josh Stone @ 2013-12-12 20:42 UTC (permalink / raw)
  To: Ingo Molnar, Masami Hiramatsu
  Cc: Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller

On 12/12/2013 06:03 AM, Ingo Molnar wrote:
>> No, because the int3 already changes the original instruction.
>> This means that you cannot skip singlestep(or emulate) the
>> instruction which is copied to execution buffer (ainsn->insn),
>> even if you have such the flag.
>> So, kprobe requires the annotations on the singlestep path.
> I don't understand this reasoning.
> 
> Lets assume we allow a probe to be inserted in the single-step path. 
> Such a probe will be an INT3 instruction and if it hits we get a 
> recursive INT3 invocation. In that case the INT3 handler should simply 
> restore the original instruction and _leave it so_. There's no 
> single-stepping needed - the probe is confused and must be discarded.

So if you restore the original instruction, then you're essentially
creating a dynamic blacklist for the singlestep path, right?  I think
that's fine, as long as you still allow recursive probes elsewhere to
just singlestep and skip that occurrence.

It also helps with the inlining issues, since an inlined function
instance in the singlestep path can get dynamically blocked, while still
allowing inline instances elsewhere to be probed normally.  Then you
don't have to force always/never inline decisions - whatever gcc decides
to do with inlines and static functions can be dealt with.


Josh

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-12 14:03                 ` Ingo Molnar
  2013-12-12 20:42                   ` Josh Stone
@ 2013-12-13  5:34                   ` Masami Hiramatsu
  2013-12-13  6:06                     ` Masami Hiramatsu
  2013-12-16 10:53                     ` Masami Hiramatsu
  1 sibling, 2 replies; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-13  5:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller

(2013/12/12 23:03), Ingo Molnar wrote:
> 
> * Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:
> 
>> (2013/12/11 22:34), Ingo Molnar wrote:
>>>
>>> * Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:
>>>
>>>>> So why are annotations needed at all? What can happen if an 
>>>>> annotation is missing and a piece of code is probed which is also 
>>>>> used by the kprobes code internally - do we crash, lock up, 
>>>>> misbehave or handle it safely?
>>>>
>>>> The kprobe has recursion detector, [...]
>>>
>>> It's the 'current_kprobe' percpu variable, checked via 
>>> kprobe_running(), right?
>>
>> Right. :)
> 
> So that recursion detection runs a bit late:
> 
> /*
>  * Interrupts are disabled on entry as trap3 is an interrupt gate and they
>  * remain disabled throughout this function.
>  */
> static int __kprobes kprobe_handler(struct pt_regs *regs)
> {
>         kprobe_opcode_t *addr;
>         struct kprobe *p;
>         struct kprobe_ctlblk *kcb;
> 
>         addr = (kprobe_opcode_t *)(regs->ip - sizeof(kprobe_opcode_t));
>         /*
>          * We don't want to be preempted for the entire
>          * duration of kprobe processing. We conditionally
>          * re-enable preemption at the end of this function,
>          * and also in reenter_kprobe() and setup_singlestep().
>          */
>         preempt_disable();
> 
>         kcb = get_kprobe_ctlblk();
>         p = get_kprobe(addr);
> 
>         if (p) {
>                 if (kprobe_running()) {
> 
> this flag should be checked first - the kprobe handler should already 
> run in non-preemptible context if it comes from an exception.

No. Even if we check it first, this flag is *only* for recursive
kprobes inside the kprobe handlers, not for the crash case.

If we'd like to check the crash case, we need to do that in
the earliest stage on the int3 interrupt, like in entry_XX.S.

And we need to allow some degree of recursion, since there
are several different int3 users (ftrace, jump label, kgdb),
at least jprobe uses int3 for returning from its handler
(see jprobe_return - kprobe provides break_handler to handle
int3 inside kprobe for this purpose.) Those are orthogonal
features, thus it will not cause infinite recursion.

> For that reason I don't understand the whole 
> preempt_disable()/enable() dance - it looks entirely superfluous to 
> me. The comment above the preempt_disable() looks mostly bogus.

Actually, this can be removed, because not only what you pointed,
but also the local interrupt is disabled while the int3 is
processing. This means that we don't need to disable preemption.

> ( The flow of logic in the function is rather confusing as well - 
>   lots of places return from the middle of the function - instead they 
>   should have the usual 'goto out' kind of code pattern. )
> 
>>>> [...] but it is detected in the kprobe exception(int3) handler, 
>>>> this means that if we put a probe before detecting the recursion, 
>>>> we'll do an infinite recursion.
>>>
>>> So only the (presumably rather narrow) code path leading to the 
>>> recursion detection code has to be annotated, correct?
>>
>> Yes, correct.
> 
> So, another thing I find confusing is the whole kprobes notifier 
> block. Why doesn't it call back specific kprobes handlers, directly 
> from do_int3() and do_debug()? That's much more readable and it also 
> allows the kprobes code to go earlier in the handler, running its 
> recursion code earlier!

Yes, I fixed it on this series! :)
I don't know what had discussed about using notify_die to handle
int3/debug. I guess it looks more generic way at that time.

> Another question I have here is: how does the kprobes code protect 
> against interrupts arriving in before the recursion check and running 
> a probe recursively?

That does just one recursion, it's no problem. The problem happens
if it causes infinite recursion.

>>>> And also, even if we can detect the recursion, we can't stop the 
>>>> kernel, we need to skip the probe. This means that we need to 
>>>> recover to the main execution path by doing single step. As you 
>>>> may know, since the single stepping involves the debug exception, 
>>>> we have to avoid proving on that path too. Or we'll have an 
>>>> infinite recursion again.
>>>
>>> I don't see why this is needed: if a "probing is disabled" 
>>> recursion flag is set the moment the first probe fires, and if 
>>> it's only cleared once all processing is finished, then any 
>>> intermediate probes should simply return early from int3 and not 
>>> fire.
>>
>> No, because the int3 already changes the original instruction.
>> This means that you cannot skip singlestep(or emulate) the
>> instruction which is copied to execution buffer (ainsn->insn),
>> even if you have such the flag.
>> So, kprobe requires the annotations on the singlestep path.
> 
> I don't understand this reasoning.
> 
> Lets assume we allow a probe to be inserted in the single-step path. 
> Such a probe will be an INT3 instruction and if it hits we get a 
> recursive INT3 invocation. In that case the INT3 handler should simply
> restore the original instruction and _leave it so_. There's no 
> single-stepping needed - the probe is confused and must be discarded.

But how can we restore the protected kernel text?
If we use text_poke, we also need to prohibit probing on the text_poke
and functions called in the text_poke too. That just shifts the annotated
area to the text_poke. :(

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-13  5:34                   ` Masami Hiramatsu
@ 2013-12-13  6:06                     ` Masami Hiramatsu
  2013-12-16 10:53                     ` Masami Hiramatsu
  1 sibling, 0 replies; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-13  6:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller

(2013/12/13 14:34), Masami Hiramatsu wrote:
>> Lets assume we allow a probe to be inserted in the single-step path. 
>> Such a probe will be an INT3 instruction and if it hits we get a 
>> recursive INT3 invocation. In that case the INT3 handler should simply
>> restore the original instruction and _leave it so_. There's no 
>> single-stepping needed - the probe is confused and must be discarded.
> 
> But how can we restore the protected kernel text?
> If we use text_poke, we also need to prohibit probing on the text_poke
> and functions called in the text_poke too. That just shifts the annotated
> area to the text_poke. :(

BTW, currently we mark the text_poke as nokprobe_symbol, but it should be
removed. We don't call it from kprobes int3/debug handlers.

The patches which removes __kprobes in this series are only for kprobe
related files (arch/x86/kernel/kprobes/* or kernel/kprobes.c.) I think
we should do it for other parts.
Is it better to do that on this series?

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
  2013-12-13  5:34                   ` Masami Hiramatsu
  2013-12-13  6:06                     ` Masami Hiramatsu
@ 2013-12-16 10:53                     ` Masami Hiramatsu
  1 sibling, 0 replies; 39+ messages in thread
From: Masami Hiramatsu @ 2013-12-16 10:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ananth N Mavinakayanahalli, Sandeepa Prabhu, x86, lkml,
	Steven Rostedt (Red Hat),
	systemtap, David S. Miller

Hi Ingo,

(2013/12/13 14:34), Masami Hiramatsu wrote:
>>>>> And also, even if we can detect the recursion, we can't stop the 
>>>>> kernel, we need to skip the probe. This means that we need to 
>>>>> recover to the main execution path by doing single step. As you 
>>>>> may know, since the single stepping involves the debug exception, 
>>>>> we have to avoid proving on that path too. Or we'll have an 
>>>>> infinite recursion again.
>>>>
>>>> I don't see why this is needed: if a "probing is disabled" 
>>>> recursion flag is set the moment the first probe fires, and if 
>>>> it's only cleared once all processing is finished, then any 
>>>> intermediate probes should simply return early from int3 and not 
>>>> fire.
>>>
>>> No, because the int3 already changes the original instruction.
>>> This means that you cannot skip singlestep(or emulate) the
>>> instruction which is copied to execution buffer (ainsn->insn),
>>> even if you have such the flag.
>>> So, kprobe requires the annotations on the singlestep path.
>>
>> I don't understand this reasoning.
>>
>> Lets assume we allow a probe to be inserted in the single-step path. 
>> Such a probe will be an INT3 instruction and if it hits we get a 
>> recursive INT3 invocation. In that case the INT3 handler should simply
>> restore the original instruction and _leave it so_. There's no 
>> single-stepping needed - the probe is confused and must be discarded.
> 
> But how can we restore the protected kernel text?
> If we use text_poke, we also need to prohibit probing on the text_poke
> and functions called in the text_poke too. That just shifts the annotated
> area to the text_poke. :(


OK, I've checked current text_poke() and thought how we can do that.
The current text_poke() uses special fixmap to make alias pages for
avoiding kernel-text readonly protection. For protecting the fixmap
pages, we are currently using text_mutex and this is why we can't use
it in exception path. There are other minor issues, but it seems to
be fixed easily. :)

Thus, for recovering original instruction in the int3 handler,
I'd like to propose adding another text_poke like function, which
requires another fixmap page and protects it by using raw_spinlock
(to avoid tracing), and just support one-byte poke (this means it
 never across the page boundary).

Perhaps, it can be implemented inside kprobes, because it is not
useful for other subsystems.

Thank you!


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2013-12-16 10:53 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-04  1:28 [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs Masami Hiramatsu
2013-12-04  1:28 ` [PATCH -tip v4 1/6] kprobes: Prohibit probing on .entry.text code Masami Hiramatsu
2013-12-04  1:28 ` [PATCH -tip v4 2/6] kprobes: Introduce NOKPROBE_SYMBOL() macro for blacklist Masami Hiramatsu
2013-12-04  1:28 ` [PATCH -tip v4 3/6] [BUGFIX] kprobes/x86: Prohibit probing on debug_stack_* Masami Hiramatsu
2013-12-04  1:28 ` [PATCH -tip v4 4/6] [BUGFIX] x86: Prohibit probing on native_set_debugreg Masami Hiramatsu
2013-12-04  1:28 ` [PATCH -tip v4 5/6] [BUGFIX] x86: Prohibit probing on thunk functions and restore Masami Hiramatsu
2013-12-04  1:28 ` [PATCH -tip v4 6/6] [RFC] kprobes/x86: Call exception handlers directly from do_int3/do_debug Masami Hiramatsu
2013-12-04  2:39   ` Steven Rostedt
2013-12-11 13:31     ` Jiri Kosina
2013-12-12  4:40       ` Masami Hiramatsu
2013-12-12  9:59         ` Jiri Kosina
2013-12-12 10:31           ` Masami Hiramatsu
2013-12-04  2:54 ` [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs Sandeepa Prabhu
2013-12-04  7:39   ` Masami Hiramatsu
2013-12-04  8:46     ` Sandeepa Prabhu
2013-12-04 23:32       ` Masami Hiramatsu
2013-12-04  8:45 ` Ingo Molnar
2013-12-04 23:27   ` Masami Hiramatsu
2013-12-05 10:21     ` Ingo Molnar
2013-12-06  2:34       ` Masami Hiramatsu
2013-12-10 15:28         ` Ingo Molnar
2013-12-11  2:12           ` Masami Hiramatsu
2013-12-11 13:34             ` Ingo Molnar
2013-12-12  6:02               ` Masami Hiramatsu
2013-12-12 14:03                 ` Ingo Molnar
2013-12-12 20:42                   ` Josh Stone
2013-12-13  5:34                   ` Masami Hiramatsu
2013-12-13  6:06                     ` Masami Hiramatsu
2013-12-16 10:53                     ` Masami Hiramatsu
2013-12-05 13:08     ` Sandeepa Prabhu
2013-12-06  6:23       ` Masami Hiramatsu
2013-12-06  6:54         ` Sandeepa Prabhu
2013-12-06 23:25           ` Masami Hiramatsu
2013-12-05 14:49     ` Frank Ch. Eigler
2013-12-06  6:12       ` Masami Hiramatsu
2013-12-06 19:07         ` Frank Ch. Eigler
2013-12-06 23:19           ` Masami Hiramatsu
2013-12-07  1:32             ` Frank Ch. Eigler
2013-12-07  2:34               ` Masami Hiramatsu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.