* [PATCH -tip v10 0/7] kprobes: NOKPROBE_SYMBOL for modules, and scalability efforts
@ 2014-05-08  9:38 Masami Hiramatsu
  2014-05-08  9:38 ` [PATCH -tip v10 1/7] kprobes: Support blacklist functions in module Masami Hiramatsu
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Masami Hiramatsu @ 2014-05-08  9:38 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar
  Cc: Andi Kleen, Ananth N Mavinakayanahalli, Sandeepa Prabhu,
	Frederic Weisbecker, x86, Steven Rostedt, fche, mingo, systemtap,
	H. Peter Anvin, Thomas Gleixner

Hi,
Here is version 10 of the NOKPROBE_SYMBOL/scalability series.
This update addresses some issues pointed out by Ingo (thank you!).

Blacklist improvements
======================
Since most of the NOKPROBE_SYMBOL series has already been merged, this
just adds kernel module support for NOKPROBE_SYMBOL. If a kprobes user
module has kprobe handlers and local functions which are only called
from those handlers, they should be marked with NOKPROBE_SYMBOL(). Such
symbols are automatically added to the kprobe blacklist.
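
For reference, here is a minimal sketch of how a kprobes user module
would use this (illustrative only: the probed symbol "do_fork" and all
"my_*" names are made up here; the real sample modules are updated in
patch 2/7):
  ----
  #include <linux/module.h>
  #include <linux/kprobes.h>

  /* Kprobe pre-handler: must never be probed itself */
  static int my_pre_handler(struct kprobe *p, struct pt_regs *regs)
  {
          pr_info("hit probe at %p\n", p->addr);
          return 0;
  }
  /* Adds my_pre_handler to the kprobe blacklist when this module is loaded */
  NOKPROBE_SYMBOL(my_pre_handler);

  static struct kprobe my_kp = {
          .symbol_name = "do_fork",       /* illustrative probe target */
          .pre_handler = my_pre_handler,
  };

  static int __init my_init(void)
  {
          return register_kprobe(&my_kp);
  }

  static void __exit my_exit(void)
  {
          unregister_kprobe(&my_kp);
  }

  module_init(my_init);
  module_exit(my_exit);
  MODULE_LICENSE("GPL");
  ----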

Scalability effort
==================
This series fixes not only the "qualitative" bugs which can crash the
kernel, but also the "quantitative" issues which show up with a massive
number of kprobes. Thus we can now do a stress test, putting kprobes on
all (non-blacklisted) kernel functions and enabling all of them.
To set kprobes on all kernel functions, run the below script.
  ----
  #!/bin/sh
  TRACE_DIR=/sys/kernel/debug/tracing/
  echo > $TRACE_DIR/kprobe_events
  grep -iw t /proc/kallsyms | tr -d . | \
    awk 'BEGIN{i=0};{print("p:"$3"_"i, "0x"$1); i++}' | \
    while read l; do echo $l >> $TRACE_DIR/kprobe_events ; done
  ----
Since it doesn't check the blacklist at all, you'll see many write
errors, but that is not a problem :).

Note that a performance issue still remains in the kprobe tracer
if you trace all functions. Since a few ftrace functions are called
inside the kprobe tracer even when tracing is shut off (tracing_on
= 0), enabling kprobe events on those functions causes a heavy
performance impact (it is safe, but you'll see the system slow down
and no events recorded because they are just ignored).
To find those functions, you can use the third column of
(debugfs)/tracing/kprobe_profile as shown below, which gives the number
of missed hits (ignored events) for each event. Events which have a
small number in the 2nd column and a large number in the 3rd column are
the ones that may cause the slowdown.
  ----
  # sort -rnk 3 (debugfs)/tracing/kprobe_profile | head
  ftrace_cmp_recs_4907                               264950231     33648874543
  ring_buffer_lock_reserve_5087                              0      4802719935
  trace_buffer_lock_reserve_5199                             0      4385319303
  trace_event_buffer_lock_reserve_5200                       0      4379968153
  ftrace_location_range_4918                          18944015      2407616669
  bsearch_17098                                       18979815      2407579741
  ftrace_location_4972                                18927061      2406723128
  ftrace_int3_handler_1211                            18926980      2406303531
  poke_int3_handler_199                               18448012      1403516611
  inat_get_opcode_attribute_16941                            0        12715314
  ----

I'd recommend enabling events on such functions only after all other
events have been enabled; that way the performance impact is minimized.

To enable the kprobe events on all kernel functions, run the script below.
  ----
  #!/bin/sh
  TRACE_DIR=/sys/kernel/debug/tracing
  echo "Disable tracing to remove tracing overhead"
  echo 0 > $TRACE_DIR/tracing_on

  BADS="ftrace_cmp_recs ring_buffer_lock_reserve trace_buffer_lock_reserve trace_event_buffer_lock_reserve ftrace_location_range bsearch ftrace_location ftrace_int3_handler poke_int3_handler inat_get_opcode_attribute"
  HIDES=
  for i in $BADS; do HIDES=$HIDES" --hide=$i*"; done

  SDATE=`date +%s`
  echo "Enabling trace events: start at $SDATE"

  cd $TRACE_DIR/events/kprobes/
  for i in `ls $HIDES` ; do echo 1 > $i/enable; done
  for j in $BADS; do for i in `ls -d $j*`;do echo 1 > $i/enable; done; done

  EDATE=`date +%s`
  TIME=`expr $EDATE - $SDATE`
  echo "Elapsed time: $TIME"
  ---- 
Note: systemtap probably does not need to worry about the bad symbols
above, since it has its own logic to avoid probing itself.

Result
======
The bad symbols above were enabled only after all the other events were
enabled. Enabling all 37222 probes took 2254 sec (without any intervals),
and at that point perf top showed the result below:
  ----
  Samples: 10K of event 'cycles', Event count (approx.): 270565996
  +  16.39%  [kernel]                       [k] native_load_idt
  +  11.17%  [kernel]                       [k] int3
  -   7.91%  [kernel]                       [k] 0x00007fffa018e8e0
   - 0xffffffffa018d8e0
        59.09% trace_event_buffer_lock_reserve
           kprobe_trace_func
           kprobe_dispatcher
      + 40.45% trace_event_buffer_lock_reserve
  ----
0x00007fffa018e8e0 may be the trampoline buffer of an optimized
probe on trace_event_buffer_lock_reserve. native_load_idt and int3
are also called from normal kprobes.
This means that, at least in my environment, kprobes now passes the
stress test: even with probes on all available functions, the system
only slows down by about 50%.

Changes from v9:
 - [1/7] Remove unneeded #include <linux/kprobes.h> from module.h
 - [6/7] Add a comment for kpcache_invalidate().
 - [6/7] Remove CONFIG_KPROBE_CACHE according to Ingo's suggestion.


Thank you,

---

Masami Hiramatsu (7):
      kprobes: Support blacklist functions in module
      kprobes: Use NOKPROBE_SYMBOL() in sample modules
      kprobes/x86: Use kprobe_blacklist for .kprobes.text and .entry.text
      kprobes/x86: Remove unneeded preempt_disable/enable in interrupt handlers
      kprobes: Enlarge hash table to 512 entries
      kprobes: Introduce kprobe cache to reduce cache misses
      ftrace: Introduce FTRACE_OPS_FL_SELF_FILTER for ftrace-kprobe


 Documentation/kprobes.txt           |    8 +
 arch/x86/kernel/kprobes/core.c      |   37 +----
 arch/x86/kernel/kprobes/ftrace.c    |    2 
 include/linux/ftrace.h              |    3 
 include/linux/kprobes.h             |    2 
 include/linux/module.h              |    4 +
 kernel/kprobes.c                    |  278 +++++++++++++++++++++++++++++------
 kernel/module.c                     |    6 +
 kernel/trace/ftrace.c               |    3 
 samples/kprobes/jprobe_example.c    |    1 
 samples/kprobes/kprobe_example.c    |    3 
 samples/kprobes/kretprobe_example.c |    2 
 12 files changed, 273 insertions(+), 76 deletions(-)

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com




* [PATCH -tip v10 1/7] kprobes: Support blacklist functions in module
  2014-05-08  9:38 [PATCH -tip v10 0/7] kprobes: NOKPROBE_SYMBOL for modules, and scalability efforts Masami Hiramatsu
@ 2014-05-08  9:38 ` Masami Hiramatsu
  2014-05-08  9:38 ` [PATCH -tip v10 2/7] kprobes: Use NOKPROBE_SYMBOL() in sample modules Masami Hiramatsu
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Masami Hiramatsu @ 2014-05-08  9:38 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar
  Cc: Rusty Russell, Andi Kleen, Ananth N Mavinakayanahalli,
	Sandeepa Prabhu, Frederic Weisbecker, x86, Steven Rostedt, fche,
	mingo, Rob Landley, H. Peter Anvin, Thomas Gleixner,
	David S. Miller, systemtap

To blacklist functions in a module (e.g. a user-defined
kprobe handler and the functions invoked from it), expand
the blacklist support to modules.
With this change, users can use the NOKPROBE_SYMBOL() macro in
their own modules.

Changes from v9:
 - Remove unneeded #include <linux/kprobes.h> from module.h

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Rob Landley <rob@landley.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
---
 Documentation/kprobes.txt |    8 ++++++
 include/linux/module.h    |    4 +++
 kernel/kprobes.c          |   63 ++++++++++++++++++++++++++++++++++++++-------
 kernel/module.c           |    6 ++++
 4 files changed, 71 insertions(+), 10 deletions(-)

diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
index 4bbeca8..2845956 100644
--- a/Documentation/kprobes.txt
+++ b/Documentation/kprobes.txt
@@ -512,6 +512,14 @@ int enable_jprobe(struct jprobe *jp);
 Enables *probe which has been disabled by disable_*probe(). You must specify
 the probe which has been registered.
 
+4.9 NOKPROBE_SYMBOL()
+
+#include <linux/kprobes.h>
+NOKPROBE_SYMBOL(FUNCTION);
+
+Protects given FUNCTION from other kprobes. This is useful for handler
+functions and functions called from the handlers.
+
 5. Kprobes Features and Limitations
 
 Kprobes allows multiple probes at the same address.  Currently,
diff --git a/include/linux/module.h b/include/linux/module.h
index f520a76..2a95e5a 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -357,6 +357,10 @@ struct module {
 	unsigned int num_ftrace_callsites;
 	unsigned long *ftrace_callsites;
 #endif
+#ifdef CONFIG_KPROBES
+	unsigned int num_kprobe_blacklist;
+	unsigned long  *kprobe_blacklist;
+#endif
 
 #ifdef CONFIG_MODULE_UNLOAD
 	/* What modules depend on me? */
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 3214289..8319048 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -88,6 +88,7 @@ static raw_spinlock_t *kretprobe_table_lock_ptr(unsigned long hash)
 
 /* Blacklist -- list of struct kprobe_blacklist_entry */
 static LIST_HEAD(kprobe_blacklist);
+static DEFINE_MUTEX(kprobe_blacklist_mutex);
 
 #ifdef __ARCH_WANT_KPROBES_INSN_SLOT
 /*
@@ -1331,22 +1332,27 @@ bool __weak arch_within_kprobe_blacklist(unsigned long addr)
 	       addr < (unsigned long)__kprobes_text_end;
 }
 
-static bool within_kprobe_blacklist(unsigned long addr)
+static struct kprobe_blacklist_entry *find_blacklist_entry(unsigned long addr)
 {
 	struct kprobe_blacklist_entry *ent;
 
+	list_for_each_entry(ent, &kprobe_blacklist, list) {
+		if (addr >= ent->start_addr && addr < ent->end_addr)
+			return ent;
+	}
+
+	return NULL;
+}
+
+static bool within_kprobe_blacklist(unsigned long addr)
+{
 	if (arch_within_kprobe_blacklist(addr))
 		return true;
 	/*
 	 * If there exists a kprobe_blacklist, verify and
 	 * fail any probe registration in the prohibited area
 	 */
-	list_for_each_entry(ent, &kprobe_blacklist, list) {
-		if (addr >= ent->start_addr && addr < ent->end_addr)
-			return true;
-	}
-
-	return false;
+	return !!find_blacklist_entry(addr);
 }
 
 /*
@@ -1432,6 +1438,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
 #endif
 	}
 
+	mutex_lock(&kprobe_blacklist_mutex);
 	jump_label_lock();
 	preempt_disable();
 
@@ -1469,6 +1476,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
 out:
 	preempt_enable();
 	jump_label_unlock();
+	mutex_unlock(&kprobe_blacklist_mutex);
 
 	return ret;
 }
@@ -2032,13 +2040,13 @@ NOKPROBE_SYMBOL(dump_kprobe);
  * since a kprobe need not necessarily be at the beginning
  * of a function.
  */
-static int __init populate_kprobe_blacklist(unsigned long *start,
-					     unsigned long *end)
+static int populate_kprobe_blacklist(unsigned long *start, unsigned long *end)
 {
 	unsigned long *iter;
 	struct kprobe_blacklist_entry *ent;
 	unsigned long offset = 0, size = 0;
 
+	mutex_lock(&kprobe_blacklist_mutex);
 	for (iter = start; iter < end; iter++) {
 		if (!kallsyms_lookup_size_offset(*iter, &size, &offset)) {
 			pr_err("Failed to find blacklist %p\n", (void *)*iter);
@@ -2053,9 +2061,28 @@ static int __init populate_kprobe_blacklist(unsigned long *start,
 		INIT_LIST_HEAD(&ent->list);
 		list_add_tail(&ent->list, &kprobe_blacklist);
 	}
+	mutex_unlock(&kprobe_blacklist_mutex);
+
 	return 0;
 }
 
+/* Shrink the blacklist */
+static void shrink_kprobe_blacklist(unsigned long *start, unsigned long *end)
+{
+	struct kprobe_blacklist_entry *ent;
+	unsigned long *iter;
+
+	mutex_lock(&kprobe_blacklist_mutex);
+	for (iter = start; iter < end; iter++) {
+		ent = find_blacklist_entry(*iter);
+		if (!ent)
+			continue;
+		list_del(&ent->list);
+		kfree(ent);
+	}
+	mutex_unlock(&kprobe_blacklist_mutex);
+}
+
 /* Module notifier call back, checking kprobes on the module */
 static int kprobes_module_callback(struct notifier_block *nb,
 				   unsigned long val, void *data)
@@ -2066,6 +2093,16 @@ static int kprobes_module_callback(struct notifier_block *nb,
 	unsigned int i;
 	int checkcore = (val == MODULE_STATE_GOING);
 
+	/* Add/remove module blacklist */
+	if (val == MODULE_STATE_COMING)
+		populate_kprobe_blacklist(mod->kprobe_blacklist,
+					  mod->kprobe_blacklist +
+					  mod->num_kprobe_blacklist);
+	else if (val == MODULE_STATE_GOING)
+		shrink_kprobe_blacklist(mod->kprobe_blacklist,
+					mod->kprobe_blacklist +
+					mod->num_kprobe_blacklist);
+
 	if (val != MODULE_STATE_GOING && val != MODULE_STATE_LIVE)
 		return NOTIFY_DONE;
 
@@ -2252,6 +2289,7 @@ static const struct file_operations debugfs_kprobes_operations = {
 /* kprobes/blacklist -- shows which functions can not be probed */
 static void *kprobe_blacklist_seq_start(struct seq_file *m, loff_t *pos)
 {
+	mutex_lock(&kprobe_blacklist_mutex);
 	return seq_list_start(&kprobe_blacklist, *pos);
 }
 
@@ -2260,6 +2298,11 @@ static void *kprobe_blacklist_seq_next(struct seq_file *m, void *v, loff_t *pos)
 	return seq_list_next(v, &kprobe_blacklist, pos);
 }
 
+static void kprobe_blacklist_seq_stop(struct seq_file *m, void *v)
+{
+	mutex_unlock(&kprobe_blacklist_mutex);
+}
+
 static int kprobe_blacklist_seq_show(struct seq_file *m, void *v)
 {
 	struct kprobe_blacklist_entry *ent =
@@ -2273,7 +2316,7 @@ static int kprobe_blacklist_seq_show(struct seq_file *m, void *v)
 static const struct seq_operations kprobe_blacklist_seq_ops = {
 	.start = kprobe_blacklist_seq_start,
 	.next  = kprobe_blacklist_seq_next,
-	.stop  = kprobe_seq_stop,	/* Reuse void function */
+	.stop  = kprobe_blacklist_seq_stop,
 	.show  = kprobe_blacklist_seq_show,
 };
 
diff --git a/kernel/module.c b/kernel/module.c
index 079c461..618ac21 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -58,6 +58,7 @@
 #include <linux/percpu.h>
 #include <linux/kmemleak.h>
 #include <linux/jump_label.h>
+#include <linux/kprobes.h>
 #include <linux/pfn.h>
 #include <linux/bsearch.h>
 #include <linux/fips.h>
@@ -2769,6 +2770,11 @@ static int find_module_sections(struct module *mod, struct load_info *info)
 					     sizeof(*mod->ftrace_callsites),
 					     &mod->num_ftrace_callsites);
 #endif
+#ifdef CONFIG_KPROBES
+	mod->kprobe_blacklist = section_objs(info, "_kprobe_blacklist",
+					     sizeof(*mod->kprobe_blacklist),
+					     &mod->num_kprobe_blacklist);
+#endif
 
 	mod->extable = section_objs(info, "__ex_table",
 				    sizeof(*mod->extable), &mod->num_exentries);




* [PATCH -tip v10 2/7] kprobes: Use NOKPROBE_SYMBOL() in sample modules
  2014-05-08  9:38 [PATCH -tip v10 0/7] kprobes: NOKPROBE_SYMBOL for modules, and scalability efforts Masami Hiramatsu
  2014-05-08  9:38 ` [PATCH -tip v10 1/7] kprobes: Support blacklist functions in module Masami Hiramatsu
@ 2014-05-08  9:38 ` Masami Hiramatsu
  2014-05-08  9:39 ` [PATCH -tip v10 3/7] kprobes/x86: Use kprobe_blacklist for .kprobes.text and .entry.text Masami Hiramatsu
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Masami Hiramatsu @ 2014-05-08  9:38 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar
  Cc: Andi Kleen, Ananth N Mavinakayanahalli, Sandeepa Prabhu,
	Frederic Weisbecker, x86, Steven Rostedt, fche, mingo, systemtap,
	H. Peter Anvin, Thomas Gleixner

Use NOKPROBE_SYMBOL() to protect handlers from kprobes
in sample modules.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Ananth N Mavinakayanahalli <ananth@in.ibm.com>
---
 samples/kprobes/jprobe_example.c    |    1 +
 samples/kprobes/kprobe_example.c    |    3 +++
 samples/kprobes/kretprobe_example.c |    2 ++
 3 files changed, 6 insertions(+)

diff --git a/samples/kprobes/jprobe_example.c b/samples/kprobes/jprobe_example.c
index b754135..40114ac 100644
--- a/samples/kprobes/jprobe_example.c
+++ b/samples/kprobes/jprobe_example.c
@@ -35,6 +35,7 @@ static long jdo_fork(unsigned long clone_flags, unsigned long stack_start,
 	jprobe_return();
 	return 0;
 }
+NOKPROBE_SYMBOL(jdo_fork);
 
 static struct jprobe my_jprobe = {
 	.entry			= jdo_fork,
diff --git a/samples/kprobes/kprobe_example.c b/samples/kprobes/kprobe_example.c
index 366db1a..462d90f 100644
--- a/samples/kprobes/kprobe_example.c
+++ b/samples/kprobes/kprobe_example.c
@@ -46,6 +46,7 @@ static int handler_pre(struct kprobe *p, struct pt_regs *regs)
 	/* A dump_stack() here will give a stack backtrace */
 	return 0;
 }
+NOKPROBE_SYMBOL(handler_pre);
 
 /* kprobe post_handler: called after the probed instruction is executed */
 static void handler_post(struct kprobe *p, struct pt_regs *regs,
@@ -68,6 +69,7 @@ static void handler_post(struct kprobe *p, struct pt_regs *regs,
 		p->addr, regs->ex1);
 #endif
 }
+NOKPROBE_SYMBOL(handler_post);
 
 /*
  * fault_handler: this is called if an exception is generated for any
@@ -81,6 +83,7 @@ static int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
 	/* Return 0 because we don't handle the fault. */
 	return 0;
 }
+NOKPROBE_SYMBOL(handler_fault);
 
 static int __init kprobe_init(void)
 {
diff --git a/samples/kprobes/kretprobe_example.c b/samples/kprobes/kretprobe_example.c
index 1041b67..d932c52 100644
--- a/samples/kprobes/kretprobe_example.c
+++ b/samples/kprobes/kretprobe_example.c
@@ -47,6 +47,7 @@ static int entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
 	data->entry_stamp = ktime_get();
 	return 0;
 }
+NOKPROBE_SYMBOL(entry_handler);
 
 /*
  * Return-probe handler: Log the return value and duration. Duration may turn
@@ -66,6 +67,7 @@ static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
 			func_name, retval, (long long)delta);
 	return 0;
 }
+NOKPROBE_SYMBOL(ret_handler);
 
 static struct kretprobe my_kretprobe = {
 	.handler		= ret_handler,




* [PATCH -tip v10 3/7] kprobes/x86: Use kprobe_blacklist for .kprobes.text and .entry.text
  2014-05-08  9:38 [PATCH -tip v10 0/7] kprobes: NOKPROBE_SYMBOL for modules, and scalability efforts Masami Hiramatsu
  2014-05-08  9:38 ` [PATCH -tip v10 1/7] kprobes: Support blacklist functions in module Masami Hiramatsu
  2014-05-08  9:38 ` [PATCH -tip v10 2/7] kprobes: Use NOKPROBE_SYMBOL() in sample modules Masami Hiramatsu
@ 2014-05-08  9:39 ` Masami Hiramatsu
  2014-05-08  9:39 ` [PATCH -tip v10 4/7] kprobes/x86: Remove unneeded preempt_disable/enable in interrupt handlers Masami Hiramatsu
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Masami Hiramatsu @ 2014-05-08  9:39 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar
  Cc: Andi Kleen, Ananth N Mavinakayanahalli, Sandeepa Prabhu,
	Frederic Weisbecker, x86, Steven Rostedt, David S. Miller, fche,
	mingo, systemtap, H. Peter Anvin, Andrew Morton, Thomas Gleixner

Use kprobe_blacklist for blacklisting .entry.text and .kprobes.text
instead of arch_within_kprobe_blacklist(). This also makes them visible
via (debugfs)/kprobes/blacklist.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 arch/x86/kernel/kprobes/core.c |   11 +------
 include/linux/kprobes.h        |    1 +
 kernel/kprobes.c               |   64 ++++++++++++++++++++++++++++------------
 3 files changed, 48 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 7596df6..c7e29f7 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1068,17 +1068,10 @@ int longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(longjmp_break_handler);
 
-bool arch_within_kprobe_blacklist(unsigned long addr)
-{
-	return  (addr >= (unsigned long)__kprobes_text_start &&
-		 addr < (unsigned long)__kprobes_text_end) ||
-		(addr >= (unsigned long)__entry_text_start &&
-		 addr < (unsigned long)__entry_text_end);
-}
-
 int __init arch_init_kprobes(void)
 {
-	return 0;
+	return kprobe_blacklist_add_range((unsigned long)__entry_text_start,
+					  (unsigned long) __entry_text_end);
 }
 
 int arch_trampoline_kprobe(struct kprobe *p)
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index e059507..e81bced 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -266,6 +266,7 @@ extern int arch_init_kprobes(void);
 extern void show_registers(struct pt_regs *regs);
 extern void kprobes_inc_nmissed_count(struct kprobe *p);
 extern bool arch_within_kprobe_blacklist(unsigned long addr);
+extern int kprobe_blacklist_add_range(unsigned long start, unsigned long end);
 
 struct kprobe_insn_cache {
 	struct mutex mutex;
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 8319048..abdede5 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1325,13 +1325,6 @@ out:
 	return ret;
 }
 
-bool __weak arch_within_kprobe_blacklist(unsigned long addr)
-{
-	/* The __kprobes marked functions and entry code must not be probed */
-	return addr >= (unsigned long)__kprobes_text_start &&
-	       addr < (unsigned long)__kprobes_text_end;
-}
-
 static struct kprobe_blacklist_entry *find_blacklist_entry(unsigned long addr)
 {
 	struct kprobe_blacklist_entry *ent;
@@ -1346,8 +1339,6 @@ static struct kprobe_blacklist_entry *find_blacklist_entry(unsigned long addr)
 
 static bool within_kprobe_blacklist(unsigned long addr)
 {
-	if (arch_within_kprobe_blacklist(addr))
-		return true;
 	/*
 	 * If there exists a kprobe_blacklist, verify and
 	 * fail any probe registration in the prohibited area
@@ -2032,6 +2023,40 @@ void dump_kprobe(struct kprobe *kp)
 }
 NOKPROBE_SYMBOL(dump_kprobe);
 
+static int __kprobe_blacklist_add(unsigned long start, unsigned long end)
+{
+	struct kprobe_blacklist_entry *ent;
+
+	ent = kmalloc(sizeof(*ent), GFP_KERNEL);
+	if (!ent)
+		return -ENOMEM;
+
+	ent->start_addr = start;
+	ent->end_addr = end;
+	INIT_LIST_HEAD(&ent->list);
+	list_add_tail(&ent->list, &kprobe_blacklist);
+	return 0;
+}
+
+int kprobe_blacklist_add_range(unsigned long start, unsigned long end)
+{
+	unsigned long offset = 0, size = 0;
+	int err = 0;
+
+	mutex_lock(&kprobe_blacklist_mutex);
+	while (!err && start < end) {
+		if (!kallsyms_lookup_size_offset(start, &size, &offset) ||
+		    size == 0) {
+			err = -ENOENT;
+			break;
+		}
+		err = __kprobe_blacklist_add(start, start + size);
+		start += size;
+	}
+	mutex_unlock(&kprobe_blacklist_mutex);
+	return err;
+}
+
 /*
  * Lookup and populate the kprobe_blacklist.
  *
@@ -2043,8 +2068,8 @@ NOKPROBE_SYMBOL(dump_kprobe);
 static int populate_kprobe_blacklist(unsigned long *start, unsigned long *end)
 {
 	unsigned long *iter;
-	struct kprobe_blacklist_entry *ent;
 	unsigned long offset = 0, size = 0;
+	int ret;
 
 	mutex_lock(&kprobe_blacklist_mutex);
 	for (iter = start; iter < end; iter++) {
@@ -2052,14 +2077,7 @@ static int populate_kprobe_blacklist(unsigned long *start, unsigned long *end)
 			pr_err("Failed to find blacklist %p\n", (void *)*iter);
 			continue;
 		}
-
-		ent = kmalloc(sizeof(*ent), GFP_KERNEL);
-		if (!ent)
-			return -ENOMEM;
-		ent->start_addr = *iter;
-		ent->end_addr = *iter + size;
-		INIT_LIST_HEAD(&ent->list);
-		list_add_tail(&ent->list, &kprobe_blacklist);
+		ret = __kprobe_blacklist_add(*iter, *iter + size);
 	}
 	mutex_unlock(&kprobe_blacklist_mutex);
 
@@ -2154,7 +2172,15 @@ static int __init init_kprobes(void)
 
 	err = populate_kprobe_blacklist(__start_kprobe_blacklist,
 					__stop_kprobe_blacklist);
-	if (err) {
+
+	if (err >= 0 && __kprobes_text_start != __kprobes_text_end) {
+		/* The __kprobes marked functions must not be probed */
+		err = kprobe_blacklist_add_range(
+					(unsigned long)__kprobes_text_start,
+					(unsigned long)__kprobes_text_end);
+	}
+
+	if (err < 0) {
 		pr_err("kprobes: failed to populate blacklist: %d\n", err);
 		pr_err("Please take care of using kprobes.\n");
 	}




* [PATCH -tip v10 4/7] kprobes/x86: Remove unneeded preempt_disable/enable in interrupt handlers
  2014-05-08  9:38 [PATCH -tip v10 0/7] kprobes: NOKPROBE_SYMBOL for modules, and scalability efforts Masami Hiramatsu
                   ` (2 preceding siblings ...)
  2014-05-08  9:39 ` [PATCH -tip v10 3/7] kprobes/x86: Use kprobe_blacklist for .kprobes.text and .entry.text Masami Hiramatsu
@ 2014-05-08  9:39 ` Masami Hiramatsu
  2014-05-08  9:39 ` [PATCH -tip v10 5/7] kprobes: Enlarge hash table to 512 entries Masami Hiramatsu
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Masami Hiramatsu @ 2014-05-08  9:39 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar
  Cc: Andi Kleen, Ananth N Mavinakayanahalli, Sandeepa Prabhu,
	Frederic Weisbecker, x86, Steven Rostedt, fche, mingo, systemtap,
	H. Peter Anvin, Thomas Gleixner

Since int3 itself disables local interrupts and kprobes keeps
them disabled until single-stepping is done, kernel preemption
can never happen while a kprobe is being processed. This means
that we don't need to disable/enable preemption.
This also changes kprobe_int3_handler to use a goto-out style.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/kernel/kprobes/core.c |   24 +++++++-----------------
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index c7e29f7..cec5879 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -506,7 +506,6 @@ static void setup_singlestep(struct kprobe *p, struct pt_regs *regs,
 		 * stepping.
 		 */
 		regs->ip = (unsigned long)p->ainsn.insn;
-		preempt_enable_no_resched();
 		return;
 	}
 #endif
@@ -575,13 +574,6 @@ int kprobe_int3_handler(struct pt_regs *regs)
 	struct kprobe_ctlblk *kcb;
 
 	addr = (kprobe_opcode_t *)(regs->ip - sizeof(kprobe_opcode_t));
-	/*
-	 * We don't want to be preempted for the entire
-	 * duration of kprobe processing. We conditionally
-	 * re-enable preemption at the end of this function,
-	 * and also in reenter_kprobe() and setup_singlestep().
-	 */
-	preempt_disable();
 
 	kcb = get_kprobe_ctlblk();
 	p = get_kprobe(addr);
@@ -589,7 +581,7 @@ int kprobe_int3_handler(struct pt_regs *regs)
 	if (p) {
 		if (kprobe_running()) {
 			if (reenter_kprobe(p, regs, kcb))
-				return 1;
+				goto handled;
 		} else {
 			set_current_kprobe(p, regs, kcb);
 			kcb->kprobe_status = KPROBE_HIT_ACTIVE;
@@ -604,7 +596,7 @@ int kprobe_int3_handler(struct pt_regs *regs)
 			 */
 			if (!p->pre_handler || !p->pre_handler(p, regs))
 				setup_singlestep(p, regs, kcb, 0);
-			return 1;
+			goto handled;
 		}
 	} else if (*addr != BREAKPOINT_INSTRUCTION) {
 		/*
@@ -617,19 +609,20 @@ int kprobe_int3_handler(struct pt_regs *regs)
 		 * the original instruction.
 		 */
 		regs->ip = (unsigned long)addr;
-		preempt_enable_no_resched();
-		return 1;
+		goto handled;
 	} else if (kprobe_running()) {
 		p = __this_cpu_read(current_kprobe);
 		if (p->break_handler && p->break_handler(p, regs)) {
 			if (!skip_singlestep(p, regs, kcb))
 				setup_singlestep(p, regs, kcb, 0);
-			return 1;
+			goto handled;
 		}
 	} /* else: not a kprobe fault; let the kernel handle it */
 
-	preempt_enable_no_resched();
 	return 0;
+
+handled:
+	return 1;
 }
 NOKPROBE_SYMBOL(kprobe_int3_handler);
 
@@ -894,7 +887,6 @@ int kprobe_debug_handler(struct pt_regs *regs)
 	}
 	reset_current_kprobe();
 out:
-	preempt_enable_no_resched();
 
 	/*
 	 * if somebody else is singlestepping across a probe point, flags
@@ -930,7 +922,6 @@ int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
 			restore_previous_kprobe(kcb);
 		else
 			reset_current_kprobe();
-		preempt_enable_no_resched();
 	} else if (kcb->kprobe_status == KPROBE_HIT_ACTIVE ||
 		   kcb->kprobe_status == KPROBE_HIT_SSDONE) {
 		/*
@@ -1061,7 +1052,6 @@ int longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
 		memcpy((kprobe_opcode_t *)(kcb->jprobe_saved_sp),
 		       kcb->jprobes_stack,
 		       MIN_STACK_SIZE(kcb->jprobe_saved_sp));
-		preempt_enable_no_resched();
 		return 1;
 	}
 	return 0;




* [PATCH -tip v10 5/7] kprobes: Enlarge hash table to 512 entries
  2014-05-08  9:38 [PATCH -tip v10 0/7] kprobes: NOKPROBE_SYMBOL for modules, and scalability efforts Masami Hiramatsu
                   ` (3 preceding siblings ...)
  2014-05-08  9:39 ` [PATCH -tip v10 4/7] kprobes/x86: Remove unneeded preempt_disable/enable in interrupt handlers Masami Hiramatsu
@ 2014-05-08  9:39 ` Masami Hiramatsu
  2014-05-08  9:39 ` [PATCH -tip v10 6/7] kprobes: Introduce kprobe cache to reduce cache misses Masami Hiramatsu
  2014-05-08  9:39 ` [PATCH -tip v10 7/7] ftrace: Introduce FTRACE_OPS_FL_SELF_FILTER for ftrace-kprobe Masami Hiramatsu
  6 siblings, 0 replies; 12+ messages in thread
From: Masami Hiramatsu @ 2014-05-08  9:39 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar
  Cc: Andi Kleen, Ananth N Mavinakayanahalli, Sandeepa Prabhu,
	Frederic Weisbecker, x86, Steven Rostedt, fche, mingo, systemtap,
	H. Peter Anvin, Thomas Gleixner

Currently, since kprobes expects to be used with fewer than
100 probe points, its hash table has just 64 entries. This is
too small to handle several thousand probes.
Enlarge kprobe_table to 512 entries, which consumes just 4KB
(on a 64-bit arch), for better scalability.

Note that this also decouples the hash sizes of kprobe_table and
kretprobe_inst_table/locks, since they hash different sources
(kprobes hash the probed address, whereas kretprobes hash the
task structure.)
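
For reference, the 4KB figure is just the hash table array itself; a
back-of-the-envelope check (assuming a 64-bit arch, where struct
hlist_head is a single 8-byte pointer) looks like this:
  ----
  #define KPROBE_HASH_BITS  9
  #define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)   /* 512 entries */

  /* one struct hlist_head (one pointer, 8 bytes) per entry:
   * 512 * 8 bytes = 4096 bytes = 4KB for the whole table */
  static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
  ----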

Without this patch, enabling 17787 probes takes more than 2 hours!
(9428 sec, with a 1 min interval after each 2000 probes enabled)

  Enabling trace events: start at 1392782584
  0 1392782585 a2mp_chan_alloc_skb_cb_38556
  1 1392782585 a2mp_chan_close_cb_38555
  ....
  17785 1392792008 lookup_vport_34987
  17786 1392792010 loop_add_23485
  17787 1392792012 loop_attr_do_show_autoclear_23464

I profiled it and saw that more than 90% of the cycles are consumed
in get_kprobe.

  Samples: 18K of event 'cycles', Event count (approx.): 37759714934
  +  95.90%  [k] get_kprobe
  +   0.76%  [k] ftrace_lookup_ip
  +   0.54%  [k] kprobe_trace_func

More than 60% of the executed instructions were in get_kprobe
as well.

  Samples: 17K of event 'instructions', Event count (approx.): 1321391290
  +  65.48%  [k] get_kprobe
  +   4.07%  [k] kprobe_trace_func
  +   2.93%  [k] optimized_callback


Annotating get_kprobe also shows that the hlist is too long and that
walking it takes time. So I guess the cost mostly comes from walking
an overly long hlist at each table entry.

       |            struct hlist_head *head;
       |            struct kprobe *p;
       |
       |            head = &kprobe_table[hash_ptr(addr, KPROBE_HASH_BITS)];
       |            hlist_for_each_entry_rcu(p, head, hlist) {
 86.33 |      mov    (%rax),%rax
 11.24 |      test   %rax,%rax
       |      jne    60
       |                    if (p->addr == addr)
       |                            return p;
       |            }

To fix this issue, I tried enlarging the kprobe_table hash bits
from 6 (current) to 12, and profiled the cycles% spent in get_kprobe
with 10k probes enabled / 37k probes registered.

  Size  Cycles%
  2^6   95.58%
  2^7   85.83%
  2^8   68.43%
  2^9   48.61%
  2^10  46.95%
  2^11  48.46%
  2^12  56.95%

Here we can see that hash tables larger than 2^9 (512 entries)
give little additional performance improvement, so I decided to
enlarge the table to 512 entries.

With this fix, enabling 20,000 probes took just
about 45 min (2934 sec, with a 1 min interval after
each 2000 probes enabled)

  Enabling trace events: start at 1393921862
  0 1393921864 a2mp_chan_alloc_skb_cb_38581
  1 1393921864 a2mp_chan_close_cb_38580
  ...
  19997 1393924927 nfs4_match_stateid_11750
  19998 1393924927 nfs4_negotiate_security_12137
  19999 1393924928 nfs4_open_confirm_done_11785

It also reduced the cycles spent in get_kprobe (with 20,000 probes).

  Samples: 691  of event 'cycles', Event count (approx.): 1743375714
  +  67.68%  [k] get_kprobe
  +   5.98%  [k] ftrace_lookup_ip
  +   4.03%  [k] kprobe_trace_func

Changes from v7:
 - Evaluate the size of hash table and reduce it to 512 entries.
 - Decouple the hash size of kprobe_table from others.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---
 kernel/kprobes.c |   23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index abdede5..a29e622 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -54,9 +54,10 @@
 #include <asm/errno.h>
 #include <asm/uaccess.h>
 
-#define KPROBE_HASH_BITS 6
+#define KPROBE_HASH_BITS 9
+#define KRETPROBE_HASH_BITS 6
 #define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)
-
+#define KRETPROBE_TABLE_SIZE (1 << KRETPROBE_HASH_BITS)
 
 /*
  * Some oddball architectures like 64bit powerpc have function descriptors
@@ -69,7 +70,7 @@
 
 static int kprobes_initialized;
 static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
-static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];
+static struct hlist_head kretprobe_inst_table[KRETPROBE_TABLE_SIZE];
 
 /* NOTE: change this value only with kprobe_mutex held */
 static bool kprobes_all_disarmed;
@@ -79,7 +80,7 @@ static DEFINE_MUTEX(kprobe_mutex);
 static DEFINE_PER_CPU(struct kprobe *, kprobe_instance) = NULL;
 static struct {
 	raw_spinlock_t lock ____cacheline_aligned_in_smp;
-} kretprobe_table_locks[KPROBE_TABLE_SIZE];
+} kretprobe_table_locks[KRETPROBE_TABLE_SIZE];
 
 static raw_spinlock_t *kretprobe_table_lock_ptr(unsigned long hash)
 {
@@ -1096,7 +1097,7 @@ void kretprobe_hash_lock(struct task_struct *tsk,
 			 struct hlist_head **head, unsigned long *flags)
 __acquires(hlist_lock)
 {
-	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
+	unsigned long hash = hash_ptr(tsk, KRETPROBE_HASH_BITS);
 	raw_spinlock_t *hlist_lock;
 
 	*head = &kretprobe_inst_table[hash];
@@ -1118,7 +1119,7 @@ void kretprobe_hash_unlock(struct task_struct *tsk,
 			   unsigned long *flags)
 __releases(hlist_lock)
 {
-	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
+	unsigned long hash = hash_ptr(tsk, KRETPROBE_HASH_BITS);
 	raw_spinlock_t *hlist_lock;
 
 	hlist_lock = kretprobe_table_lock_ptr(hash);
@@ -1153,7 +1154,7 @@ void kprobe_flush_task(struct task_struct *tk)
 		return;
 
 	INIT_HLIST_HEAD(&empty_rp);
-	hash = hash_ptr(tk, KPROBE_HASH_BITS);
+	hash = hash_ptr(tk, KRETPROBE_HASH_BITS);
 	head = &kretprobe_inst_table[hash];
 	kretprobe_table_lock(hash, &flags);
 	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
@@ -1187,7 +1188,7 @@ static void cleanup_rp_inst(struct kretprobe *rp)
 	struct hlist_head *head;
 
 	/* No race here */
-	for (hash = 0; hash < KPROBE_TABLE_SIZE; hash++) {
+	for (hash = 0; hash < KRETPROBE_TABLE_SIZE; hash++) {
 		kretprobe_table_lock(hash, &flags);
 		head = &kretprobe_inst_table[hash];
 		hlist_for_each_entry_safe(ri, next, head, hlist) {
@@ -1778,7 +1779,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 	struct kretprobe_instance *ri;
 
 	/*TODO: consider to only swap the RA after the last pre_handler fired */
-	hash = hash_ptr(current, KPROBE_HASH_BITS);
+	hash = hash_ptr(current, KRETPROBE_HASH_BITS);
 	raw_spin_lock_irqsave(&rp->lock, flags);
 	if (!hlist_empty(&rp->free_instances)) {
 		ri = hlist_entry(rp->free_instances.first,
@@ -2164,8 +2165,10 @@ static int __init init_kprobes(void)
 
 	/* FIXME allocate the probe table, currently defined statically */
 	/* initialize all list heads */
-	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
+	for (i = 0; i < KPROBE_TABLE_SIZE; i++)
 		INIT_HLIST_HEAD(&kprobe_table[i]);
+
+	for (i = 0; i < KRETPROBE_TABLE_SIZE; i++) {
 		INIT_HLIST_HEAD(&kretprobe_inst_table[i]);
 		raw_spin_lock_init(&(kretprobe_table_locks[i].lock));
 	}




* [PATCH -tip v10 6/7] kprobes: Introduce kprobe cache to reduce cache misses
  2014-05-08  9:38 [PATCH -tip v10 0/7] kprobes: NOKPROBE_SYMBOL for modules, and scalability efforts Masami Hiramatsu
                   ` (4 preceding siblings ...)
  2014-05-08  9:39 ` [PATCH -tip v10 5/7] kprobes: Enlarge hash table to 512 entries Masami Hiramatsu
@ 2014-05-08  9:39 ` Masami Hiramatsu
  2014-05-08  9:39 ` [PATCH -tip v10 7/7] ftrace: Introduce FTRACE_OPS_FL_SELF_FILTER for ftrace-kprobe Masami Hiramatsu
  6 siblings, 0 replies; 12+ messages in thread
From: Masami Hiramatsu @ 2014-05-08  9:39 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar
  Cc: Andi Kleen, Ananth N Mavinakayanahalli, Sandeepa Prabhu,
	Frederic Weisbecker, x86, Steven Rostedt, fche, mingo, systemtap,
	H. Peter Anvin, Thomas Gleixner

Introduce a kprobe cache to reduce cache misses for
massive numbers of kprobes.
For stress testing kprobes, we need to activate as many
kprobes as possible. This causes a storm of cache misses
on the kprobe hash list. The kprobe hash list has already
been enlarged to 512 entries and this is still small for 40k
kprobes.

For example, when 40k probes are registered on the hlist and
20k probes are enabled, perf still shows that a lot of
cache misses occur in get_kprobe.
  ----
  Samples: 633  of event 'cache-misses', Event count (approx.): 3414776
  +  68.13%  [k] get_kprobe
  +   4.38%  [k] ftrace_lookup_ip
  +   2.54%  [k] kprobe_ftrace_handler
  ----

Also, I found that most of the kprobes are not hit.
In that case, to reduce cache misses, we can reduce the
random memory accesses by introducing a per-cpu cache which
stores the addresses of frequently used kprobe data structures
together with their probe addresses.

With kpcache enabled, get_kprobe_cached goes down to
around 4-5% of cache misses with 20k probes.
  ----
  Samples: 729  of event 'cache-misses', Event count (approx.): 690125
  +  14.49%  [k] ftrace_lookup_ip
  +   5.61%  [k] kprobe_trace_func
  +   5.17%  [k] kprobe_ftrace_handler
  +   4.62%  [k] get_kprobe_cached
  ----

Of course this reduces the enabling time too.

Without this fix (just enlarge hash table):
(2934 sec, 1 min intervals for each 2000 probes enabled)

  ----
  Enabling trace events: start at 1393921862
  0 1393921864 a2mp_chan_alloc_skb_cb_38581
  ...
  19999 1393924928 nfs4_open_confirm_done_11785
  ----

With this fix:
(2025 sec, 1 min intervals for each 2000 probes enabled)
  ----
  Enabling trace events: start at 1393912623
  0 1393912625 a2mp_chan_alloc_skb_cb_38800
  ....
  19999 1393914648 nfs2_xdr_dec_readlinkres_11628
  ----

This patch implements a simple per-cpu 4-way/512-entry cache
for the kprobe hlist. All get_kprobe calls on the hot path use
the cache; on a cache miss, the kprobes hlist is searched and
the found kprobe is inserted into a cache entry.
When a kprobe is removed, its cache entries are cleared by using
an IPI, because the cache is per-cpu.

Note that this consumes some amount of memory (34KB per cpu)
compared with the previous approach (4KB in total), just for kprobes.
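
For reference, the 34KB per-cpu figure can be checked against the data
structures added by this patch (a back-of-the-envelope calculation,
assuming a 64-bit arch where unsigned long and pointers are 8 bytes and
int is 4 bytes):
  ----
  struct kprobe_cache_entry {
          unsigned long addr;     /*  8 bytes */
          struct kprobe *kp;      /*  8 bytes */
  };                              /* 16 bytes per entry */

  struct kprobe_cache {
          /* 512 hash buckets * 4 ways * 16 bytes = 32768 bytes (32KB) */
          struct kprobe_cache_entry table[512][4];
          /* 512 * 4 bytes = 2048 bytes (2KB) */
          int index[512];
  };                              /* => ~34KB per cpu in total */
  ----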

Changes from v9:
 - Add a comment for kpcache_invalidate().
 - Remove CONFIG_KPROBE_CACHE kconfig according to Ingo's suggestion.

Changes from v7:
 - Re-evaluate the performance improvements with 512 entry size.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---
 arch/x86/kernel/kprobes/core.c   |    2 -
 arch/x86/kernel/kprobes/ftrace.c |    2 -
 include/linux/kprobes.h          |    1 
 kernel/kprobes.c                 |  132 +++++++++++++++++++++++++++++++++++---
 4 files changed, 125 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index cec5879..69b42f7 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -576,7 +576,7 @@ int kprobe_int3_handler(struct pt_regs *regs)
 	addr = (kprobe_opcode_t *)(regs->ip - sizeof(kprobe_opcode_t));
 
 	kcb = get_kprobe_ctlblk();
-	p = get_kprobe(addr);
+	p = get_kprobe_cached(addr);
 
 	if (p) {
 		if (kprobe_running()) {
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index 717b02a..8178dd4 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -63,7 +63,7 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	/* Disable irq for emulating a breakpoint and avoiding preempt */
 	local_irq_save(flags);
 
-	p = get_kprobe((kprobe_opcode_t *)ip);
+	p = get_kprobe_cached((kprobe_opcode_t *)ip);
 	if (unlikely(!p) || kprobe_disabled(p))
 		goto end;
 
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index e81bced..70c3314 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -339,6 +339,7 @@ extern int arch_prepare_kprobe_ftrace(struct kprobe *p);
 
 /* Get the kprobe at this addr (if any) - called with preemption disabled */
 struct kprobe *get_kprobe(void *addr);
+struct kprobe *get_kprobe_cached(void *addr);
 void kretprobe_hash_lock(struct task_struct *tsk,
 			 struct hlist_head **head, unsigned long *flags);
 void kretprobe_hash_unlock(struct task_struct *tsk, unsigned long *flags);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index a29e622..0f5f23c 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -91,6 +91,87 @@ static raw_spinlock_t *kretprobe_table_lock_ptr(unsigned long hash)
 static LIST_HEAD(kprobe_blacklist);
 static DEFINE_MUTEX(kprobe_blacklist_mutex);
 
+/* Kprobe cache */
+#define KPCACHE_BITS	2
+#define KPCACHE_SIZE	(1 << KPCACHE_BITS)
+#define KPCACHE_INDEX(i)	((i) & (KPCACHE_SIZE - 1))
+
+struct kprobe_cache_entry {
+	unsigned long addr;
+	struct kprobe *kp;
+};
+
+struct kprobe_cache {
+	struct kprobe_cache_entry table[KPROBE_TABLE_SIZE][KPCACHE_SIZE];
+	int index[KPROBE_TABLE_SIZE];
+};
+
+static DEFINE_PER_CPU(struct kprobe_cache, kpcache);
+
+static inline
+struct kprobe *kpcache_get(unsigned long hash, unsigned long addr)
+{
+	struct kprobe_cache *cache = this_cpu_ptr(&kpcache);
+	struct kprobe_cache_entry *ent = &cache->table[hash][0];
+	struct kprobe *ret;
+	int idx = ACCESS_ONCE(cache->index[hash]);
+	int i;
+
+	for (i = 0; i < KPCACHE_SIZE; i++)
+		if (ent[i].addr == addr) {
+			ret = ent[i].kp;
+			/* Check the cache is updated */
+			if (unlikely(idx != cache->index[hash]))
+				break;
+			return ret;
+		}
+	return NULL;
+}
+
+static inline void kpcache_set(unsigned long hash, unsigned long addr,
+				struct kprobe *kp)
+{
+	struct kprobe_cache *cache = this_cpu_ptr(&kpcache);
+	struct kprobe_cache_entry *ent = &cache->table[hash][0];
+	int i = KPCACHE_INDEX(cache->index[hash]++);
+
+	/*
+	 * Setting must be done in this order for avoiding interruption;
+	 * (1)invalidate entry, (2)set the value, and (3)enable entry.
+	 */
+	ent[i].addr = 0;
+	barrier();
+	ent[i].kp = kp;
+	barrier();
+	ent[i].addr = addr;
+}
+
+static void kpcache_invalidate_this_cpu(void *addr)
+{
+	unsigned long hash = hash_ptr(addr, KPROBE_HASH_BITS);
+	struct kprobe_cache *cache = this_cpu_ptr(&kpcache);
+	struct kprobe_cache_entry *ent = &cache->table[hash][0];
+	int i;
+
+	for (i = 0; i < KPCACHE_SIZE; i++)
+		if (ent[i].addr == (unsigned long)addr)
+			ent[i].addr = 0;
+}
+
+/* This must be called after ensuring the kprobe is removed from hlist */
+static void kpcache_invalidate(unsigned long addr)
+{
+	on_each_cpu(kpcache_invalidate_this_cpu, (void *)addr, 1);
+	/*
+	 * Here is a trick. Normally, we need another synchronize_rcu() here,
+	 * but all kpcache users are called from arch-dependent kprobe handler
+	 * which runs as an interrupt handler, they have left from the cached
+	 * kprobe at this point (note that kpcache_invalidate() waits for
+	 * finishing IPIs.)
+	 * So it is already safe to release them beyond this point.
+	 */
+}
+
 #ifdef __ARCH_WANT_KPROBES_INSN_SLOT
 /*
  * kprobe->ainsn.insn points to the copy of the instruction to be
@@ -297,18 +378,13 @@ static inline void reset_kprobe_instance(void)
 	__this_cpu_write(kprobe_instance, NULL);
 }
 
-/*
- * This routine is called either:
- * 	- under the kprobe_mutex - during kprobe_[un]register()
- * 				OR
- * 	- with preemption disabled - from arch/xxx/kernel/kprobes.c
- */
-struct kprobe *get_kprobe(void *addr)
+static nokprobe_inline
+struct kprobe *__get_kprobe(void *addr, unsigned long hash)
 {
 	struct hlist_head *head;
 	struct kprobe *p;
 
-	head = &kprobe_table[hash_ptr(addr, KPROBE_HASH_BITS)];
+	head = &kprobe_table[hash];
 	hlist_for_each_entry_rcu(p, head, hlist) {
 		if (p->addr == addr)
 			return p;
@@ -316,8 +392,41 @@ struct kprobe *get_kprobe(void *addr)
 
 	return NULL;
 }
+
+/*
+ * This routine is called either:
+ *  - under the kprobe_mutex - during kprobe_[un]register()
+ * OR
+ *  - with preemption disabled - from arch/xxx/kernel/kprobes.c
+ */
+struct kprobe *get_kprobe(void *addr)
+{
+	return __get_kprobe(addr, hash_ptr(addr, KPROBE_HASH_BITS));
+}
 NOKPROBE_SYMBOL(get_kprobe);
 
+/*
+ * This must be called from kprobe internal arch-dependent functions
+ * which interrupts in the normal kernel path. For other usecases,
+ * please use get_kprobe() for safe.
+ */
+struct kprobe *get_kprobe_cached(void *addr)
+{
+	unsigned long hash = hash_ptr(addr, KPROBE_HASH_BITS);
+	struct kprobe *p;
+
+	p = kpcache_get(hash, (unsigned long)addr);
+	if (likely(p))
+		return p;
+
+	p = __get_kprobe(addr, hash);
+	if (likely(p))
+		kpcache_set(hash, (unsigned long)addr, p);
+	return p;
+}
+NOKPROBE_SYMBOL(get_kprobe_cached);
+
+
 static int aggr_pre_handler(struct kprobe *p, struct pt_regs *regs);
 
 /* Return true if the kprobe is an aggregator */
@@ -518,6 +627,7 @@ static void do_free_cleaned_kprobes(void)
 	list_for_each_entry_safe(op, tmp, &freeing_list, list) {
 		BUG_ON(!kprobe_unused(&op->kp));
 		list_del_init(&op->list);
+		kpcache_invalidate((unsigned long)op->kp.addr);
 		free_aggr_kprobe(&op->kp);
 	}
 }
@@ -1639,13 +1749,15 @@ static void __unregister_kprobe_bottom(struct kprobe *p)
 {
 	struct kprobe *ap;
 
-	if (list_empty(&p->list))
+	if (list_empty(&p->list)) {
 		/* This is an independent kprobe */
+		kpcache_invalidate((unsigned long)p->addr);
 		arch_remove_kprobe(p);
-	else if (list_is_singular(&p->list)) {
+	} else if (list_is_singular(&p->list)) {
 		/* This is the last child of an aggrprobe */
 		ap = list_entry(p->list.next, struct kprobe, list);
 		list_del(&p->list);
+		kpcache_invalidate((unsigned long)p->addr);
 		free_aggr_kprobe(ap);
 	}
 	/* Otherwise, do nothing. */




* [PATCH -tip v10 7/7] ftrace: Introduce FTRACE_OPS_FL_SELF_FILTER for ftrace-kprobe
  2014-05-08  9:38 [PATCH -tip v10 0/7] kprobes: NOKPROBE_SYMBOL for modules, and scalability efforts Masami Hiramatsu
                   ` (5 preceding siblings ...)
  2014-05-08  9:39 ` [PATCH -tip v10 6/7] kprobes: Introduce kprobe cache to reduce cache misses Masami Hiramatsu
@ 2014-05-08  9:39 ` Masami Hiramatsu
  2014-05-08 10:59   ` Steven Rostedt
  6 siblings, 1 reply; 12+ messages in thread
From: Masami Hiramatsu @ 2014-05-08  9:39 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar
  Cc: Andi Kleen, Ananth N Mavinakayanahalli, Sandeepa Prabhu,
	Frederic Weisbecker, x86, Steven Rostedt, fche, mingo, systemtap,
	H. Peter Anvin, Thomas Gleixner

Since kprobes itself owns a hash table for looking up the kprobe
data structure corresponding to a given ip address, there
is no need to test the ftrace hash on the ftrace side.
To achieve better performance for ftrace-based kprobes, add an
FTRACE_OPS_FL_SELF_FILTER flag to ftrace_ops, which means
that ftrace skips testing its own hash table.

Without this patch, ftrace_lookup_ip() is the biggest consumer
of cycles when 20,000 kprobes are enabled.
  ----
  Samples: 1K of event 'cycles', Event count (approx.): 340068894
  +  20.77%  [k] ftrace_lookup_ip
  +   8.33%  [k] kprobe_trace_func
  +   4.83%  [k] get_kprobe_cached
  ----

With this patch, ftrace_lookup_ip() vanishes from the list of
cycle consumers (of course, there is no caller on the
hot path anymore :))
  ----
  Samples: 1K of event 'cycles', Event count (approx.): 186861492
  +   9.95%  [k] kprobe_trace_func
  +   6.00%  [k] kprobe_ftrace_handler
  +   5.53%  [k] get_kprobe_cached
  ----

Changes from v7:
 - Re-evaluate the performance improvement.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
---
 include/linux/ftrace.h |    3 +++
 kernel/kprobes.c       |    2 +-
 kernel/trace/ftrace.c  |    3 ++-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index ae9504b..f1fa7d27 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -93,6 +93,8 @@ typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
  * INITIALIZED - The ftrace_ops has already been initialized (first use time
  *            register_ftrace_function() is called, it will initialized the ops)
  * DELETED - The ops are being deleted, do not let them be registered again.
+ * SELF_FILTER - The ftrace_ops function filters ip by itself. Do not need to
+ *            check hash table on each hit.
  */
 enum {
 	FTRACE_OPS_FL_ENABLED			= 1 << 0,
@@ -105,6 +107,7 @@ enum {
 	FTRACE_OPS_FL_STUB			= 1 << 7,
 	FTRACE_OPS_FL_INITIALIZED		= 1 << 8,
 	FTRACE_OPS_FL_DELETED			= 1 << 9,
+	FTRACE_OPS_FL_SELF_FILTER		= 1 << 10,
 };
 
 /*
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 0f5f23c..5c6e410 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1027,7 +1027,7 @@ static struct kprobe *alloc_aggr_kprobe(struct kprobe *p)
 #ifdef CONFIG_KPROBES_ON_FTRACE
 static struct ftrace_ops kprobe_ftrace_ops __read_mostly = {
 	.func = kprobe_ftrace_handler,
-	.flags = FTRACE_OPS_FL_SAVE_REGS,
+	.flags = FTRACE_OPS_FL_SAVE_REGS | FTRACE_OPS_FL_SELF_FILTER,
 };
 static int kprobe_ftrace_enabled;
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 4a54a25..062ca20 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -4501,7 +4501,8 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
 	 */
 	preempt_disable_notrace();
 	do_for_each_ftrace_op(op, ftrace_ops_list) {
-		if (ftrace_ops_test(op, ip, regs))
+		if (op->flags & FTRACE_OPS_FL_SELF_FILTER ||
+		    ftrace_ops_test(op, ip, regs))
 			op->func(ip, parent_ip, op, regs);
 	} while_for_each_ftrace_op(op);
 	preempt_enable_notrace();




* Re: [PATCH -tip v10 7/7] ftrace: Introduce FTRACE_OPS_FL_SELF_FILTER for ftrace-kprobe
  2014-05-08  9:39 ` [PATCH -tip v10 7/7] ftrace: Introduce FTRACE_OPS_FL_SELF_FILTER for ftrace-kprobe Masami Hiramatsu
@ 2014-05-08 10:59   ` Steven Rostedt
  2014-05-09  3:11     ` Masami Hiramatsu
  0 siblings, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2014-05-08 10:59 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andi Kleen,
	Ananth N Mavinakayanahalli, Sandeepa Prabhu, Frederic Weisbecker,
	x86, fche, mingo, systemtap, H. Peter Anvin, Thomas Gleixner

On Thu, 08 May 2014 18:39:30 +0900
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:

> Since the kprobes itself owns a hash table to get a kprobe
> data structure corresponding to the given ip address, there
> is no need to test ftrace hash in ftrace side.
> To achive better performance on ftrace-based kprobe,
> FTRACE_OPS_FL_SELF_FILTER flag to ftrace_ops which means
> that ftrace skips testing its own hash table.
> 
> Without this patch, ftrace_lookup_ip() is biggest cycles
> consumer when 20,000 kprobes are enabled.
>   ----
>   Samples: 1K of event 'cycles', Event count (approx.): 340068894
>   +  20.77%  [k] ftrace_lookup_ip
>   +   8.33%  [k] kprobe_trace_func
>   +   4.83%  [k] get_kprobe_cached
>   ----
> 
> With this patch, ftrace_lookup_ip() vanished from the
> cycles consumer list (of course, there is no caller on
> hotpath anymore :))
>   ----
>   Samples: 1K of event 'cycles', Event count (approx.): 186861492
>   +   9.95%  [k] kprobe_trace_func
>   +   6.00%  [k] kprobe_ftrace_handler
>   +   5.53%  [k] get_kprobe_cached

I should look at your filtering methods, maybe it can make ftrace
filtering better?

>   ----
> 
> Changes from v7:
>  - Re-evaluate the performance improvement.
> 
> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> ---
>  include/linux/ftrace.h |    3 +++
>  kernel/kprobes.c       |    2 +-
>  kernel/trace/ftrace.c  |    3 ++-
>  3 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index ae9504b..f1fa7d27 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -93,6 +93,8 @@ typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
>   * INITIALIZED - The ftrace_ops has already been initialized (first use time
>   *            register_ftrace_function() is called, it will initialized the ops)
>   * DELETED - The ops are being deleted, do not let them be registered again.
> + * SELF_FILTER - The ftrace_ops function filters ip by itself. Do not need to
> + *            check hash table on each hit.

 - The ftrace_ops function has its own ip filter and does not need to
   rely on the ftrace internal ip filtering.


>   */
>  enum {
>  	FTRACE_OPS_FL_ENABLED			= 1 << 0,
> @@ -105,6 +107,7 @@ enum {
>  	FTRACE_OPS_FL_STUB			= 1 << 7,
>  	FTRACE_OPS_FL_INITIALIZED		= 1 << 8,
>  	FTRACE_OPS_FL_DELETED			= 1 << 9,
> +	FTRACE_OPS_FL_SELF_FILTER		= 1 << 10,
>  };
>  
>  /*
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 0f5f23c..5c6e410 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1027,7 +1027,7 @@ static struct kprobe *alloc_aggr_kprobe(struct kprobe *p)
>  #ifdef CONFIG_KPROBES_ON_FTRACE
>  static struct ftrace_ops kprobe_ftrace_ops __read_mostly = {
>  	.func = kprobe_ftrace_handler,
> -	.flags = FTRACE_OPS_FL_SAVE_REGS,
> +	.flags = FTRACE_OPS_FL_SAVE_REGS | FTRACE_OPS_FL_SELF_FILTER,
>  };
>  static int kprobe_ftrace_enabled;
>  
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 4a54a25..062ca20 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -4501,7 +4501,8 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
>  	 */
>  	preempt_disable_notrace();
>  	do_for_each_ftrace_op(op, ftrace_ops_list) {
> -		if (ftrace_ops_test(op, ip, regs))
> +		if (op->flags & FTRACE_OPS_FL_SELF_FILTER ||
> +		    ftrace_ops_test(op, ip, regs))

Hmm, I wonder if I should add the check for:

	!(op->flags & FTRACE_OPS_FL_STUB)

here too? But that's another change that I'll do.

Just update the flag description as I commented and the rest looks good.

-- Steve

>  			op->func(ip, parent_ip, op, regs);
>  	} while_for_each_ftrace_op(op);
>  	preempt_enable_notrace();
> 



* Re: Re: [PATCH -tip v10 7/7] ftrace: Introduce FTRACE_OPS_FL_SELF_FILTER for ftrace-kprobe
  2014-05-08 10:59   ` Steven Rostedt
@ 2014-05-09  3:11     ` Masami Hiramatsu
  2014-05-09  3:43       ` Steven Rostedt
  0 siblings, 1 reply; 12+ messages in thread
From: Masami Hiramatsu @ 2014-05-09  3:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andi Kleen,
	Ananth N Mavinakayanahalli, Sandeepa Prabhu, Frederic Weisbecker,
	x86, fche, mingo, systemtap, H. Peter Anvin, Thomas Gleixner

(2014/05/08 19:59), Steven Rostedt wrote:
> On Thu, 08 May 2014 18:39:30 +0900
> Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:
> 
>> Since kprobes itself owns a hash table for looking up the kprobe
>> data structure corresponding to a given ip address, there is no
>> need to test the ftrace hash on the ftrace side.
>> To achieve better performance with ftrace-based kprobes, this patch
>> adds the FTRACE_OPS_FL_SELF_FILTER flag for ftrace_ops, which tells
>> ftrace to skip testing its own hash table.
>>
>> Without this patch, ftrace_lookup_ip() is the biggest consumer of
>> cycles when 20,000 kprobes are enabled.
>>   ----
>>   Samples: 1K of event 'cycles', Event count (approx.): 340068894
>>   +  20.77%  [k] ftrace_lookup_ip
>>   +   8.33%  [k] kprobe_trace_func
>>   +   4.83%  [k] get_kprobe_cached
>>   ----
>>
>> With this patch, ftrace_lookup_ip() has vanished from the list of
>> top cycle consumers (of course, there is no caller on the hot path
>> anymore :))
>>   ----
>>   Samples: 1K of event 'cycles', Event count (approx.): 186861492
>>   +   9.95%  [k] kprobe_trace_func
>>   +   6.00%  [k] kprobe_ftrace_handler
>>   +   5.53%  [k] get_kprobe_cached
> 
> I should look at your filtering methods; maybe they can make ftrace
> filtering better?

Ah! Yes, it could be better :) At least the hash-table cache is good
for ftrace too. Currently it is only for a fixed-size hash table, but
it would be easy to extend to a resizable one. (However, I guess that
with the cache we would not need to resize it anymore.)
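
As a rough illustration of the idea under discussion (a per-bucket
"last hit" cache consulted before the hash-chain walk), here is a
hypothetical editorial sketch. It is not the code from patch 6/7; the
kpcache array, the kpcache_entry type and the _sketch suffix are
invented for illustration, and locking/RCU concerns are ignored:
  ----
  /* Hypothetical sketch only, NOT the implementation from patch 6/7. */
  struct kpcache_entry {
  	unsigned long	addr;	/* address of the last probe hit in this bucket */
  	struct kprobe	*kp;	/* kprobe that was found at that address */
  };
  static struct kpcache_entry kpcache[KPROBE_TABLE_SIZE];

  static struct kprobe *get_kprobe_cached_sketch(unsigned long addr)
  {
  	struct kpcache_entry *ent = &kpcache[hash_long(addr, KPROBE_HASH_BITS)];
  	struct kprobe *p;

  	if (likely(ent->kp && ent->addr == addr))
  		return ent->kp;				/* fast path: cache hit */

  	p = get_kprobe((kprobe_opcode_t *)addr);	/* slow path: hlist walk */
  	if (p) {
  		ent->kp = p;				/* remember the last hit */
  		ent->addr = addr;
  	}
  	return p;
  }
  ----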

> 
>>   ----
>>
>> Changes from v7:
>>  - Re-evaluate the performance improvement.
>>
>> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
>> Cc: Steven Rostedt <rostedt@goodmis.org>
>> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> ---
>>  include/linux/ftrace.h |    3 +++
>>  kernel/kprobes.c       |    2 +-
>>  kernel/trace/ftrace.c  |    3 ++-
>>  3 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
>> index ae9504b..f1fa7d27 100644
>> --- a/include/linux/ftrace.h
>> +++ b/include/linux/ftrace.h
>> @@ -93,6 +93,8 @@ typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
>>   * INITIALIZED - The ftrace_ops has already been initialized (first use time
>>   *            register_ftrace_function() is called, it will initialized the ops)
>>   * DELETED - The ops are being deleted, do not let them be registered again.
>> + * SELF_FILTER - The ftrace_ops function filters ip by itself. Do not need to
>> + *            check hash table on each hit.
> 
>  - The ftrace_ops function has its own ip filter and does not need to
>    rely on the ftrace internal ip filtering.

OK, I'll update that.

> 
> 
>>   */
>>  enum {
>>  	FTRACE_OPS_FL_ENABLED			= 1 << 0,
>> @@ -105,6 +107,7 @@ enum {
>>  	FTRACE_OPS_FL_STUB			= 1 << 7,
>>  	FTRACE_OPS_FL_INITIALIZED		= 1 << 8,
>>  	FTRACE_OPS_FL_DELETED			= 1 << 9,
>> +	FTRACE_OPS_FL_SELF_FILTER		= 1 << 10,
>>  };
>>  
>>  /*
>> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
>> index 0f5f23c..5c6e410 100644
>> --- a/kernel/kprobes.c
>> +++ b/kernel/kprobes.c
>> @@ -1027,7 +1027,7 @@ static struct kprobe *alloc_aggr_kprobe(struct kprobe *p)
>>  #ifdef CONFIG_KPROBES_ON_FTRACE
>>  static struct ftrace_ops kprobe_ftrace_ops __read_mostly = {
>>  	.func = kprobe_ftrace_handler,
>> -	.flags = FTRACE_OPS_FL_SAVE_REGS,
>> +	.flags = FTRACE_OPS_FL_SAVE_REGS | FTRACE_OPS_FL_SELF_FILTER,
>>  };
>>  static int kprobe_ftrace_enabled;
>>  
>> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
>> index 4a54a25..062ca20 100644
>> --- a/kernel/trace/ftrace.c
>> +++ b/kernel/trace/ftrace.c
>> @@ -4501,7 +4501,8 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
>>  	 */
>>  	preempt_disable_notrace();
>>  	do_for_each_ftrace_op(op, ftrace_ops_list) {
>> -		if (ftrace_ops_test(op, ip, regs))
>> +		if (op->flags & FTRACE_OPS_FL_SELF_FILTER ||
>> +		    ftrace_ops_test(op, ip, regs))
> 
> Hmm, I wonder if I should add the check for:
> 
> 	!(op->flags & FTRACE_OPS_FL_STUB)
> 
> here too? But that's another change that I'll do.

Indeed. BTW, should I change ftrace_ops_control_func() too?

> 
> Just update the flag description as I commented and the rest looks good.

OK, thanks!


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH -tip v10 7/7] ftrace: Introduce FTRACE_OPS_FL_SELF_FILTER for ftrace-kprobe
  2014-05-09  3:11     ` Masami Hiramatsu
@ 2014-05-09  3:43       ` Steven Rostedt
  2014-05-09 10:04         ` [PATCH -tip v10.1] " Masami Hiramatsu
  0 siblings, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2014-05-09  3:43 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andi Kleen,
	Ananth N Mavinakayanahalli, Sandeepa Prabhu, Frederic Weisbecker,
	x86, fche, mingo, systemtap, H. Peter Anvin, Thomas Gleixner

On Fri, 09 May 2014 12:11:29 +0900
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:

> > Hmm, I wonder if I should add the check for:
> > 
> > 	!(op->flags & FTRACE_OPS_FL_STUB)
> > 
> > here too? But that's another change that I'll do.
> 
> Indeed. BTW, should I change ftrace_ops_control_func() too?
> 

Currently there are no users of it, so I guess it could wait. Otherwise
we would be adding a check that will never be true on an extremely hot
path.

-- Steve


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH -tip v10.1] ftrace: Introduce FTRACE_OPS_FL_SELF_FILTER for ftrace-kprobe
  2014-05-09  3:43       ` Steven Rostedt
@ 2014-05-09 10:04         ` Masami Hiramatsu
  0 siblings, 0 replies; 12+ messages in thread
From: Masami Hiramatsu @ 2014-05-09 10:04 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar
  Cc: Andi Kleen, Ananth N Mavinakayanahalli, Sandeepa Prabhu,
	Frederic Weisbecker, x86, Steven Rostedt, fche, mingo, systemtap,
	H. Peter Anvin, Thomas Gleixner

Since kprobes itself owns a hash table for looking up the kprobe
data structure corresponding to a given ip address, there is no
need to test the ftrace hash on the ftrace side.
To achieve better performance with ftrace-based kprobes, this patch
adds the FTRACE_OPS_FL_SELF_FILTER flag for ftrace_ops, which tells
ftrace to skip testing its own hash table.
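
(For context, a simplified editorial sketch of the handler side, not
verbatim kernel code: the kprobe handler already resolves the probe
from the kprobe hash table and bails out when nothing is registered
at the address, so a second lookup in the ftrace hash before calling
it is pure overhead.)
  ----
  void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
  			   struct ftrace_ops *ops, struct pt_regs *regs)
  {
  	struct kprobe *p = get_kprobe((kprobe_opcode_t *)ip);

  	if (unlikely(!p) || kprobe_disabled(p))
  		return;		/* no probe registered here, nothing to do */

  	/* ... emulate the breakpoint and run the kprobe pre/post handlers ... */
  }
  ----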

Without this patch, ftrace_lookup_ip() is the biggest consumer of
cycles when 20,000 kprobes are enabled.
  ----
  Samples: 1K of event 'cycles', Event count (approx.): 340068894
  +  20.77%  [k] ftrace_lookup_ip
  +   8.33%  [k] kprobe_trace_func
  +   4.83%  [k] get_kprobe_cached
  ----

With this patch, ftrace_lookup_ip() has vanished from the list of
top cycle consumers (of course, there is no caller on the hot path
anymore :))
  ----
  Samples: 1K of event 'cycles', Event count (approx.): 186861492
  +   9.95%  [k] kprobe_trace_func
  +   6.00%  [k] kprobe_ftrace_handler
  +   5.53%  [k] get_kprobe_cached
  ----

Changes from v10:
 - Update comment of the flag according to Steven's comment.

Changes from v7:
 - Re-evaluate the performance improvement.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
---
 include/linux/ftrace.h |    3 +++
 kernel/kprobes.c       |    2 +-
 kernel/trace/ftrace.c  |    3 ++-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index ae9504b..5653001 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -93,6 +93,8 @@ typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
  * INITIALIZED - The ftrace_ops has already been initialized (first use time
  *            register_ftrace_function() is called, it will initialized the ops)
  * DELETED - The ops are being deleted, do not let them be registered again.
+ * SELF_FILTER - The ftrace_ops function has its own ip filter and does not
+ *            need to rely on the ftrace internal ip filtering.
  */
 enum {
 	FTRACE_OPS_FL_ENABLED			= 1 << 0,
@@ -105,6 +107,7 @@ enum {
 	FTRACE_OPS_FL_STUB			= 1 << 7,
 	FTRACE_OPS_FL_INITIALIZED		= 1 << 8,
 	FTRACE_OPS_FL_DELETED			= 1 << 9,
+	FTRACE_OPS_FL_SELF_FILTER		= 1 << 10,
 };
 
 /*
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 0f5f23c..5c6e410 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1027,7 +1027,7 @@ static struct kprobe *alloc_aggr_kprobe(struct kprobe *p)
 #ifdef CONFIG_KPROBES_ON_FTRACE
 static struct ftrace_ops kprobe_ftrace_ops __read_mostly = {
 	.func = kprobe_ftrace_handler,
-	.flags = FTRACE_OPS_FL_SAVE_REGS,
+	.flags = FTRACE_OPS_FL_SAVE_REGS | FTRACE_OPS_FL_SELF_FILTER,
 };
 static int kprobe_ftrace_enabled;
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 4a54a25..062ca20 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -4501,7 +4501,8 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
 	 */
 	preempt_disable_notrace();
 	do_for_each_ftrace_op(op, ftrace_ops_list) {
-		if (ftrace_ops_test(op, ip, regs))
+		if (op->flags & FTRACE_OPS_FL_SELF_FILTER ||
+		    ftrace_ops_test(op, ip, regs))
 			op->func(ip, parent_ip, op, regs);
 	} while_for_each_ftrace_op(op);
 	preempt_enable_notrace();



^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-05-09 10:05 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-08  9:38 [PATCH -tip v10 0/7] kprobes: NOKPROBE_SYMBOL for modules, and scalbility efforts Masami Hiramatsu
2014-05-08  9:38 ` [PATCH -tip v10 1/7] kprobes: Support blacklist functions in module Masami Hiramatsu
2014-05-08  9:38 ` [PATCH -tip v10 2/7] kprobes: Use NOKPROBE_SYMBOL() in sample modules Masami Hiramatsu
2014-05-08  9:39 ` [PATCH -tip v10 3/7] kprobes/x86: Use kprobe_blacklist for .kprobes.text and .entry.text Masami Hiramatsu
2014-05-08  9:39 ` [PATCH -tip v10 4/7] kprobes/x86: Remove unneeded preempt_disable/enable in interrupt handlers Masami Hiramatsu
2014-05-08  9:39 ` [PATCH -tip v10 5/7] kprobes: Enlarge hash table to 512 entries Masami Hiramatsu
2014-05-08  9:39 ` [PATCH -tip v10 6/7] kprobes: Introduce kprobe cache to reduce cache misshits Masami Hiramatsu
2014-05-08  9:39 ` [PATCH -tip v10 7/7] ftrace: Introduce FTRACE_OPS_FL_SELF_FILTER for ftrace-kprobe Masami Hiramatsu
2014-05-08 10:59   ` Steven Rostedt
2014-05-09  3:11     ` Masami Hiramatsu
2014-05-09  3:43       ` Steven Rostedt
2014-05-09 10:04         ` [PATCH -tip v10.1] " Masami Hiramatsu
