linux-kernel.vger.kernel.org archive mirror
* [RFC 0/5] ftrace perf: Fixes and speedup
@ 2016-03-09 20:46 Jiri Olsa
  2016-03-09 20:46 ` [PATCH 1/5] ftrace perf: Check sample types only for sampling events Jiri Olsa
                   ` (4 more replies)
  0 siblings, 5 replies; 28+ messages in thread
From: Jiri Olsa @ 2016-03-09 20:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

hi,
sending some small perf-related fixes plus one change
to speed up ftrace:function perf registration. I'm still
testing this, but any feedback would be great.

Also available in here:
  git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  ftrace/fixes

thanks,
jirka
---
Jiri Olsa (5):
      ftrace perf: Check sample types only for sampling events
      ftrace perf: Move exclude_kernel tracepoint check to init event
      ftrace perf: Use ftrace_ops::private to store event pointer
      ftrace: Make ftrace_hash_rec_enable return update bool
      ftrace: Update dynamic ftrace calls only if necessary

 include/linux/perf_event.h      |  3 +++
 kernel/events/core.c            | 27 ++++++++++++++++++++++-----
 kernel/trace/ftrace.c           | 36 +++++++++++++++++++++---------------
 kernel/trace/trace_event_perf.c | 39 ++++++++++++++++++++++++++++++---------
 4 files changed, 76 insertions(+), 29 deletions(-)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/5] ftrace perf: Check sample types only for sampling events
  2016-03-09 20:46 [RFC 0/5] ftrace perf: Fixes and speedup Jiri Olsa
@ 2016-03-09 20:46 ` Jiri Olsa
  2016-03-10  0:36   ` Namhyung Kim
  2016-03-15 20:06   ` Steven Rostedt
  2016-03-09 20:46 ` [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event Jiri Olsa
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 28+ messages in thread
From: Jiri Olsa @ 2016-03-09 20:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

Currently we check the sample type for the ftrace:function
event even if it's not created as a sampling event. That
prevents creating an ftrace:function event in counting mode.

Make sure we check sample types only for sampling events.

Before:
  $ sudo perf stat -e ftrace:function ls
  ...

   Performance counter stats for 'ls':

     <not supported>      ftrace:function

         0.001983662 seconds time elapsed

After:
  $ sudo perf stat -e ftrace:function ls
  ...

   Performance counter stats for 'ls':

              44,498      ftrace:function

         0.037534722 seconds time elapsed
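
For context, is_sampling_event() in the perf core simply tests whether the event requested a sample period; a pure counting event has none. Below is a minimal user-space sketch of the gating this patch introduces — the struct, flag value, and perm_check() are simplified stand-ins for illustration, not the kernel's actual definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's perf_event_attr. */
struct perf_event_attr_sketch {
	uint64_t sample_period;           /* 0 for pure counting events */
	uint64_t sample_type;             /* PERF_SAMPLE_* style bits */
	unsigned int exclude_callchain_user : 1;
};

#define SKETCH_SAMPLE_STACK_USER (1ULL << 13)  /* illustrative bit value */

/* Mirrors the kernel helper's idea: sampling means a non-zero period. */
static bool is_sampling_event(const struct perf_event_attr_sketch *attr)
{
	return attr->sample_period != 0;
}

/*
 * The patched permission check: sample-type restrictions now only
 * apply when the event actually produces samples.
 */
static int perm_check(const struct perf_event_attr_sketch *attr)
{
	if (is_sampling_event(attr) && !attr->exclude_callchain_user)
		return -1; /* -EINVAL in the kernel */
	if (is_sampling_event(attr) &&
	    (attr->sample_type & SKETCH_SAMPLE_STACK_USER))
		return -1;
	return 0; /* counting events pass unconditionally */
}
```

With this shape, a counting event (sample_period == 0) can no longer be rejected by sample-only restrictions, which is exactly what enables the `perf stat` output above.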

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/trace/trace_event_perf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 00df25fd86ef..a7171ec2c1ca 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -52,14 +52,14 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
 		 * event, due to issues with page faults while tracing page
 		 * fault handler and its overall trickiness nature.
 		 */
-		if (!p_event->attr.exclude_callchain_user)
+		if (is_sampling_event(p_event) && !p_event->attr.exclude_callchain_user)
 			return -EINVAL;
 
 		/*
 		 * Same reason to disable user stack dump as for user space
 		 * callchains above.
 		 */
-		if (p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
+		if (is_sampling_event(p_event) && p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
 			return -EINVAL;
 	}
 
-- 
2.4.3


* [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event
  2016-03-09 20:46 [RFC 0/5] ftrace perf: Fixes and speedup Jiri Olsa
  2016-03-09 20:46 ` [PATCH 1/5] ftrace perf: Check sample types only for sampling events Jiri Olsa
@ 2016-03-09 20:46 ` Jiri Olsa
  2016-03-10  0:39   ` Namhyung Kim
  2016-03-09 20:46 ` [PATCH 3/5] ftrace perf: Use ftrace_ops::private to store event pointer Jiri Olsa
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 28+ messages in thread
From: Jiri Olsa @ 2016-03-09 20:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

We suppress events with attr::exclude_kernel set when
the event is generated, so the following capture gives
no warning but won't produce any data:

  $ sudo perf record -e sched:sched_switch:u ls
  $ sudo /perf script | wc -l
  0

Check attr::exclude_(kernel|user) at event init time
instead, and fail right away for native tracepoints as
well as for tracepoints from uprobes/kprobes:

  $ sudo perf record -e sched:sched_switch:u ls
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (sched:sched_switch).
  /bin/dmesg may provide additional information.
  No CONFIG_PERF_EVENTS=y kernel support configured?

  $ sudo perf record -e probe:sys_read:u ls
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe:sys_read).
  /bin/dmesg may provide additional information.
  No CONFIG_PERF_EVENTS=y kernel support configured?

  $ ./perf record -e probe_ex:main:k ./ex
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe_ex:main).
  /bin/dmesg may provide additional information.
  No CONFIG_PERF_EVENTS=y kernel support configured?
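
The new init-time check is plain flag arithmetic on the event source; here is a standalone sketch of the same decision table (flag bit values and names invented for illustration — the real ones live in the kernel's trace event headers):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits, standing in for TRACE_EVENT_FL_*. */
#define FL_TRACEPOINT (1u << 0)
#define FL_KPROBE     (1u << 1)
#define FL_UPROBE     (1u << 2)

struct attr_sketch {
	bool exclude_kernel;
	bool exclude_user;
};

/* Returns 0 if the exclude_* bits are compatible with the event source. */
static int event_attr_check(unsigned int flags, const struct attr_sketch *attr)
{
	/* Tracepoints and kprobes fire only in kernel space. */
	if ((flags & (FL_TRACEPOINT | FL_KPROBE)) && attr->exclude_kernel)
		return -1; /* -EINVAL */

	/* Uprobes fire only on behalf of user space. */
	if ((flags & FL_UPROBE) && attr->exclude_user)
		return -1;

	return 0;
}
```

Failing here, at init, is what turns the silent empty capture into the immediate EINVAL shown above.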

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/core.c            |  5 -----
 kernel/trace/trace_event_perf.c | 25 +++++++++++++++++++++++++
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index c15fd097af93..ca68fdcf47ce 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6898,11 +6898,6 @@ static int perf_tp_event_match(struct perf_event *event,
 {
 	if (event->hw.state & PERF_HES_STOPPED)
 		return 0;
-	/*
-	 * All tracepoints are from kernel-space.
-	 */
-	if (event->attr.exclude_kernel)
-		return 0;
 
 	if (!perf_tp_filter_match(event, data))
 		return 0;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index a7171ec2c1ca..0a3779bd18a1 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -182,11 +182,36 @@ static void perf_trace_event_close(struct perf_event *p_event)
 	tp_event->class->reg(tp_event, TRACE_REG_PERF_CLOSE, p_event);
 }
 
+static int perf_trace_event_attr(struct trace_event_call *tp_event,
+				 struct perf_event *event)
+{
+	/*
+	 * All tracepoints and kprobes are from kernel-space.
+	 */
+	if (((tp_event->flags & TRACE_EVENT_FL_TRACEPOINT) ||
+	     (tp_event->flags & TRACE_EVENT_FL_KPROBE)) &&
+	     event->attr.exclude_kernel)
+		return -EINVAL;
+
+	/*
+	 * All uprobes are from user-space.
+	 */
+	if ((tp_event->flags & TRACE_EVENT_FL_UPROBE) &&
+	    event->attr.exclude_user)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int perf_trace_event_init(struct trace_event_call *tp_event,
 				 struct perf_event *p_event)
 {
 	int ret;
 
+	ret = perf_trace_event_attr(tp_event, p_event);
+	if (ret)
+		return ret;
+
 	ret = perf_trace_event_perm(tp_event, p_event);
 	if (ret)
 		return ret;
-- 
2.4.3


* [PATCH 3/5] ftrace perf: Use ftrace_ops::private to store event pointer
  2016-03-09 20:46 [RFC 0/5] ftrace perf: Fixes and speedup Jiri Olsa
  2016-03-09 20:46 ` [PATCH 1/5] ftrace perf: Check sample types only for sampling events Jiri Olsa
  2016-03-09 20:46 ` [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event Jiri Olsa
@ 2016-03-09 20:46 ` Jiri Olsa
  2016-03-10  1:29   ` Namhyung Kim
  2016-03-09 20:46 ` [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool Jiri Olsa
  2016-03-09 20:46 ` [PATCH 5/5] ftrace: Update dynamic ftrace calls only if necessary Jiri Olsa
  4 siblings, 1 reply; 28+ messages in thread
From: Jiri Olsa @ 2016-03-09 20:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

Running the following commands concurrently:

  # perf record -e ftrace:function -a -o krava.data sleep 10
  # perf record -e ftrace:function --filter 'ip == SyS_read' ls

will end up with the latter one failing to honor its
filter rule and storing all functions (in perf.data) as
instructed by the first perf record, instead of just
SyS_read records.

The reason for this is that the tracepoint code by default
triggers all events registered for the tracepoint.

ftrace:function is special, though: each ftrace_ops carries
its own filter, and only the event that owns the ftrace_ops
is eligible to be triggered.

Fix this by using the ftrace_ops::private value to keep
the perf_event pointer. This way we don't need to search
for the triggered event (as the tracepoint handler does)
and can store the sample directly.
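
The underlying pattern is a classic "context pointer in the callback descriptor": instead of looking up which consumer wants the callback, the ops struct carries its owner directly. A user-space sketch with hypothetical names (the real structures are ftrace_ops and perf_event):

```c
#include <assert.h>
#include <stddef.h>

/* Callback descriptor carrying an opaque owner pointer, like ftrace_ops. */
struct ops_sketch {
	void (*func)(struct ops_sketch *ops, unsigned long ip);
	void *private;                     /* owning event */
};

struct event_sketch {
	int samples;
};

/* Handler: no per-cpu hlist walk — use the owner stored at register time. */
static void handler(struct ops_sketch *ops, unsigned long ip)
{
	struct event_sketch *event = ops->private;

	(void)ip;
	event->samples++;                  /* "store the sample" directly */
}

static void register_ops(struct ops_sketch *ops, struct event_sketch *event)
{
	ops->private = event;              /* the key line the patch adds */
	ops->func = handler;
}
```

Because each ops only ever reaches its own event, one event's filter can no longer leak samples into another event's buffer.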

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 include/linux/perf_event.h      |  3 +++
 kernel/events/core.c            | 22 ++++++++++++++++++++++
 kernel/trace/trace_event_perf.c | 10 +++-------
 3 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index a9d8cab18b00..a330dc06d90d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1008,6 +1008,9 @@ extern void perf_tp_event(u64 addr, u64 count, void *record,
 			  int entry_size, struct pt_regs *regs,
 			  struct hlist_head *head, int rctx,
 			  struct task_struct *task);
+void perf_function_event(struct perf_event *event,
+			 void *record, int entry_size,
+			 struct pt_regs *regs);
 extern void perf_bp_event(struct perf_event *event, void *data);
 
 #ifndef perf_misc_flags
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ca68fdcf47ce..18da90859c17 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7063,6 +7063,28 @@ static void perf_event_free_bpf_prog(struct perf_event *event)
 	}
 }
 
+#ifdef CONFIG_FUNCTION_TRACER
+void perf_function_event(struct perf_event *event,
+			 void *record, int entry_size,
+			 struct pt_regs *regs)
+
+{
+	struct perf_sample_data data;
+	struct perf_raw_record raw = {
+		.size = entry_size,
+		.data = record,
+	};
+
+	if (event->hw.state & PERF_HES_STOPPED)
+		return;
+
+	perf_sample_data_init(&data, 0, 0);
+	data.raw = &raw;
+
+	perf_swevent_event(event, 1, &data, regs);
+}
+#endif /* CONFIG_FUNCTION_TRACER */
+
 #else
 
 static inline void perf_tp_register(void)
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 0a3779bd18a1..087c811db04c 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -328,14 +328,9 @@ perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip,
 			  struct ftrace_ops *ops, struct pt_regs *pt_regs)
 {
 	struct ftrace_entry *entry;
-	struct hlist_head *head;
 	struct pt_regs regs;
 	int rctx;
 
-	head = this_cpu_ptr(event_function.perf_events);
-	if (hlist_empty(head))
-		return;
-
 #define ENTRY_SIZE (ALIGN(sizeof(struct ftrace_entry) + sizeof(u32), \
 		    sizeof(u64)) - sizeof(u32))
 
@@ -349,8 +344,8 @@ perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip,
 
 	entry->ip = ip;
 	entry->parent_ip = parent_ip;
-	perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, 0,
-			      1, &regs, head, NULL);
+	perf_function_event(ops->private, entry, ENTRY_SIZE, &regs);
+	perf_swevent_put_recursion_context(rctx);
 
 #undef ENTRY_SIZE
 }
@@ -359,6 +354,7 @@ static int perf_ftrace_function_register(struct perf_event *event)
 {
 	struct ftrace_ops *ops = &event->ftrace_ops;
 
+	ops->private = event;
 	ops->flags |= FTRACE_OPS_FL_PER_CPU | FTRACE_OPS_FL_RCU;
 	ops->func = perf_ftrace_function_call;
 	return register_ftrace_function(ops);
-- 
2.4.3


* [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool
  2016-03-09 20:46 [RFC 0/5] ftrace perf: Fixes and speedup Jiri Olsa
                   ` (2 preceding siblings ...)
  2016-03-09 20:46 ` [PATCH 3/5] ftrace perf: Use ftrace_ops::private to store event pointer Jiri Olsa
@ 2016-03-09 20:46 ` Jiri Olsa
  2016-03-11 14:28   ` Namhyung Kim
  2016-03-09 20:46 ` [PATCH 5/5] ftrace: Update dynamic ftrace calls only if necessary Jiri Olsa
  4 siblings, 1 reply; 28+ messages in thread
From: Jiri Olsa @ 2016-03-09 20:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

Change __ftrace_hash_rec_update to return true in case
we need to update dynamic ftrace call records, and false
in case no update is needed.
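
The change follows a common accumulate-and-report pattern: walk all records, OR together whether any of them actually needs patching, and hand that verdict back to the caller. Sketched generically below — the enum, test_record(), and hash_rec_update() are simplified stand-ins for ftrace_test_record() and __ftrace_hash_rec_update():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum update_state { UPDATE_IGNORE, UPDATE_MODIFY };

/* Stand-in for ftrace_test_record(): does this record need patching? */
static enum update_state test_record(int rec_flags)
{
	return rec_flags ? UPDATE_MODIFY : UPDATE_IGNORE;
}

/* Mirrors the new return contract of __ftrace_hash_rec_update(). */
static bool hash_rec_update(const int *rec_flags, size_t n)
{
	bool update = false;
	size_t i;

	for (i = 0; i < n; i++)
		update |= test_record(rec_flags[i]) != UPDATE_IGNORE;

	return update;
}
```

A caller can then skip the expensive follow-up work entirely when the walk reports no change, which is what the next patch exploits.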

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/trace/ftrace.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index eca592f977b2..123dddc660e9 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1610,7 +1610,7 @@ static bool test_rec_ops_needs_regs(struct dyn_ftrace *rec)
 	return  keep_regs;
 }
 
-static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
+static bool __ftrace_hash_rec_update(struct ftrace_ops *ops,
 				     int filter_hash,
 				     bool inc)
 {
@@ -1618,12 +1618,13 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
 	struct ftrace_hash *other_hash;
 	struct ftrace_page *pg;
 	struct dyn_ftrace *rec;
+	bool update = false;
 	int count = 0;
 	int all = 0;
 
 	/* Only update if the ops has been registered */
 	if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
-		return;
+		return false;
 
 	/*
 	 * In the filter_hash case:
@@ -1650,7 +1651,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
 		 * then there's nothing to do.
 		 */
 		if (ftrace_hash_empty(hash))
-			return;
+			return false;
 	}
 
 	do_for_each_ftrace_rec(pg, rec) {
@@ -1694,7 +1695,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
 		if (inc) {
 			rec->flags++;
 			if (FTRACE_WARN_ON(ftrace_rec_count(rec) == FTRACE_REF_MAX))
-				return;
+				return false;
 
 			/*
 			 * If there's only a single callback registered to a
@@ -1720,7 +1721,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
 				rec->flags |= FTRACE_FL_REGS;
 		} else {
 			if (FTRACE_WARN_ON(ftrace_rec_count(rec) == 0))
-				return;
+				return false;
 			rec->flags--;
 
 			/*
@@ -1753,22 +1754,27 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
 			 */
 		}
 		count++;
+
+		update |= ftrace_test_record(rec, 1) != FTRACE_UPDATE_IGNORE;
+
 		/* Shortcut, if we handled all records, we are done. */
 		if (!all && count == hash->count)
-			return;
+			return update;
 	} while_for_each_ftrace_rec();
+
+	return update;
 }
 
-static void ftrace_hash_rec_disable(struct ftrace_ops *ops,
+static bool ftrace_hash_rec_disable(struct ftrace_ops *ops,
 				    int filter_hash)
 {
-	__ftrace_hash_rec_update(ops, filter_hash, 0);
+	return __ftrace_hash_rec_update(ops, filter_hash, 0);
 }
 
-static void ftrace_hash_rec_enable(struct ftrace_ops *ops,
+static bool ftrace_hash_rec_enable(struct ftrace_ops *ops,
 				   int filter_hash)
 {
-	__ftrace_hash_rec_update(ops, filter_hash, 1);
+	return __ftrace_hash_rec_update(ops, filter_hash, 1);
 }
 
 static void ftrace_hash_rec_update_modify(struct ftrace_ops *ops,
-- 
2.4.3


* [PATCH 5/5] ftrace: Update dynamic ftrace calls only if necessary
  2016-03-09 20:46 [RFC 0/5] ftrace perf: Fixes and speedup Jiri Olsa
                   ` (3 preceding siblings ...)
  2016-03-09 20:46 ` [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool Jiri Olsa
@ 2016-03-09 20:46 ` Jiri Olsa
  4 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2016-03-09 20:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

Currently dynamic ftrace calls are updated any time
an ftrace_ops is un/registered. If we do this update
only when it's needed, we save a lot of time for perf
system-wide ftrace:function sampling/counting.

The reason is that for system-wide sampling/counting,
perf creates an event for each CPU in the system.

Each event then registers a separate copy of ftrace_ops,
which ends up in FTRACE_UPDATE_CALLS updates. On servers
with many CPUs that means a serious stall (here a 240-CPU
server):

Counting:
  # time ./perf stat -e ftrace:function -a sleep 1

   Performance counter stats for 'system wide':

              370,663      ftrace:function

          1.401427505 seconds time elapsed

  real    3m51.743s
  user    0m0.023s
  sys     3m48.569s

Sampling:
  # time ./perf record -e ftrace:function -a sleep 1
  [ perf record: Woken up 0 times to write data ]
  Warning:
  Processed 141200 events and lost 5 chunks!

  [ perf record: Captured and wrote 10.703 MB perf.data (135950 samples) ]

  real    2m31.429s
  user    0m0.213s
  sys     2m29.494s

There's no reason to do the FTRACE_UPDATE_CALLS update
for each event in the perf case, because all the ftrace_ops
instances always share the same filter, so the updated
calls are always the same.

Only the first ftrace_ops registration needs to do the
FTRACE_UPDATE_CALLS update (sometimes also the second, if
the first one used the trampoline); the rest can simply be
linked cheaply into the ftrace_ops list.
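
In control-flow terms the fix turns an unconditional command flag into a conditional one. A simplified sketch of what ftrace_startup() now does (names and the "first ops" condition are illustrative; the real decision comes from walking the records as in the previous patch):

```c
#include <assert.h>
#include <stdbool.h>

#define CMD_UPDATE_CALLS (1 << 0)  /* stands in for FTRACE_UPDATE_CALLS */

/*
 * Stand-in for ftrace_hash_rec_enable(): in the shared-filter perf
 * case, only the first registered ops actually changes any record.
 */
static bool hash_rec_enable_changed_records(bool first_ops_for_filter)
{
	return first_ops_for_filter;
}

static int startup_command(bool first_ops_for_filter)
{
	int command = 0;

	/* Before the patch: command |= CMD_UPDATE_CALLS unconditionally. */
	if (hash_rec_enable_changed_records(first_ops_for_filter))
		command |= CMD_UPDATE_CALLS;

	return command;
}
```

With 240 per-CPU events sharing one filter, 239 of the registrations now take the cheap path, which accounts for the 3m51s-to-2.8s drop in the counting case above.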

Counting:
  # time ./perf stat -e ftrace:function -a sleep 1

   Performance counter stats for 'system wide':

             398,571      ftrace:function

         1.377503733 seconds time elapsed

  real    0m2.787s
  user    0m0.005s
  sys     0m1.883s

Sampling:
  # time ./perf record -e ftrace:function -a sleep 1
  [ perf record: Woken up 0 times to write data ]
  Warning:
  Processed 261730 events and lost 9 chunks!

  [ perf record: Captured and wrote 19.907 MB perf.data (256293 samples) ]

  real    1m31.948s
  user    0m0.309s
  sys     1m32.051s

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/trace/ftrace.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 123dddc660e9..48b491463549 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2650,7 +2650,6 @@ static int ftrace_startup(struct ftrace_ops *ops, int command)
 		return ret;
 
 	ftrace_start_up++;
-	command |= FTRACE_UPDATE_CALLS;
 
 	/*
 	 * Note that ftrace probes uses this to start up
@@ -2671,7 +2670,8 @@ static int ftrace_startup(struct ftrace_ops *ops, int command)
 		return ret;
 	}
 
-	ftrace_hash_rec_enable(ops, 1);
+	if (ftrace_hash_rec_enable(ops, 1))
+		command |= FTRACE_UPDATE_CALLS;
 
 	ftrace_startup_enable(command);
 
@@ -2701,11 +2701,11 @@ static int ftrace_shutdown(struct ftrace_ops *ops, int command)
 
 	/* Disabling ipmodify never fails */
 	ftrace_hash_ipmodify_disable(ops);
-	ftrace_hash_rec_disable(ops, 1);
 
-	ops->flags &= ~FTRACE_OPS_FL_ENABLED;
+	if (ftrace_hash_rec_disable(ops, 1))
+		command |= FTRACE_UPDATE_CALLS;
 
-	command |= FTRACE_UPDATE_CALLS;
+	ops->flags &= ~FTRACE_OPS_FL_ENABLED;
 
 	if (saved_ftrace_func != ftrace_trace_function) {
 		saved_ftrace_func = ftrace_trace_function;
-- 
2.4.3


* Re: [PATCH 1/5] ftrace perf: Check sample types only for sampling events
  2016-03-09 20:46 ` [PATCH 1/5] ftrace perf: Check sample types only for sampling events Jiri Olsa
@ 2016-03-10  0:36   ` Namhyung Kim
  2016-03-10  7:25     ` Jiri Olsa
  2016-03-15 20:06   ` Steven Rostedt
  1 sibling, 1 reply; 28+ messages in thread
From: Namhyung Kim @ 2016-03-10  0:36 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo

Hi Jiri,

On Wed, Mar 09, 2016 at 09:46:41PM +0100, Jiri Olsa wrote:
> Currently we check sample type for ftrace:function event
> even if it's not created as sampling event. That prevents
> creating ftrace_function event in counting mode.
> 
> Making sure we check sample types only for sampling events.
> 
> Before:
>   $ sudo perf stat -e ftrace:function ls
>   ...
> 
>    Performance counter stats for 'ls':
> 
>      <not supported>      ftrace:function
> 
>          0.001983662 seconds time elapsed
> 
> After:
>   $ sudo perf stat -e ftrace:function ls
>   ...
> 
>    Performance counter stats for 'ls':
> 
>               44,498      ftrace:function
> 
>          0.037534722 seconds time elapsed
> 
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  kernel/trace/trace_event_perf.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> index 00df25fd86ef..a7171ec2c1ca 100644
> --- a/kernel/trace/trace_event_perf.c
> +++ b/kernel/trace/trace_event_perf.c
> @@ -52,14 +52,14 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
>  		 * event, due to issues with page faults while tracing page
>  		 * fault handler and its overall trickiness nature.
>  		 */
> -		if (!p_event->attr.exclude_callchain_user)
> +		if (is_sampling_event(p_event) && !p_event->attr.exclude_callchain_user)
>  			return -EINVAL;
>  
>  		/*
>  		 * Same reason to disable user stack dump as for user space
>  		 * callchains above.
>  		 */
> -		if (p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
> +		if (is_sampling_event(p_event) && p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
>  			return -EINVAL;
>  	}
>  

What about checking is_sampling_event() first and jumping
to the last paranoid_tracepoint_raw check instead?  That
way we can remove the same check in the function trace case.
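
One way to read that suggestion as code — this is a sketch of the proposed restructuring with flattened boolean parameters, not the actual perf_trace_event_perm() (only the label name paranoid_tracepoint_raw is taken from the discussion):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Non-sampling events jump straight to the final paranoia check,
 * so the per-condition is_sampling_event() tests disappear.
 */
static int perm_check_restructured(bool sampling, bool wants_user_callchain,
				   bool wants_user_stack, bool perm_ok)
{
	if (!sampling)
		goto paranoid_check;       /* counting events skip sample checks */

	if (wants_user_callchain)
		return -1;                 /* -EINVAL */
	if (wants_user_stack)
		return -1;

paranoid_check:
	return perm_ok ? 0 : -1;
}
```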

Thanks,
Namhyung


* Re: [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event
  2016-03-09 20:46 ` [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event Jiri Olsa
@ 2016-03-10  0:39   ` Namhyung Kim
  2016-03-11  8:39     ` Jiri Olsa
  0 siblings, 1 reply; 28+ messages in thread
From: Namhyung Kim @ 2016-03-10  0:39 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Wed, Mar 09, 2016 at 09:46:42PM +0100, Jiri Olsa wrote:
> We suppress events with attr::exclude_kernel set when
> the event is generated, so following capture will
> give no warning but won't produce any data:
> 
>   $ sudo perf record -e sched:sched_switch:u ls
>   $ sudo /perf script | wc -l
>   0
> 
> Checking the attr::exclude_(kernel|user) at the event
> init time and failing right away for tracepoints from
> uprobes/kprobes and native ones:
> 
>   $ sudo perf record -e sched:sched_switch:u ls
>   Error:
>   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (sched:sched_switch).
>   /bin/dmesg may provide additional information.
>   No CONFIG_PERF_EVENTS=y kernel support configured?
> 
>   $ sudo perf record -e probe:sys_read:u ls
>   Error:
>   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe:sys_read).
>   /bin/dmesg may provide additional information.
>   No CONFIG_PERF_EVENTS=y kernel support configured?
> 
>   $ ./perf record -e probe_ex:main:k ./ex
>   Error:
>   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe_ex:main).
>   /bin/dmesg may provide additional information.
>   No CONFIG_PERF_EVENTS=y kernel support configured?
> 
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>

Acked-by: Namhyung Kim <namhyung@kernel.org>

Maybe we need to improve the error message later.

Thanks,
Namhyung


> ---
>  kernel/events/core.c            |  5 -----
>  kernel/trace/trace_event_perf.c | 25 +++++++++++++++++++++++++
>  2 files changed, 25 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index c15fd097af93..ca68fdcf47ce 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6898,11 +6898,6 @@ static int perf_tp_event_match(struct perf_event *event,
>  {
>  	if (event->hw.state & PERF_HES_STOPPED)
>  		return 0;
> -	/*
> -	 * All tracepoints are from kernel-space.
> -	 */
> -	if (event->attr.exclude_kernel)
> -		return 0;
>  
>  	if (!perf_tp_filter_match(event, data))
>  		return 0;
> diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> index a7171ec2c1ca..0a3779bd18a1 100644
> --- a/kernel/trace/trace_event_perf.c
> +++ b/kernel/trace/trace_event_perf.c
> @@ -182,11 +182,36 @@ static void perf_trace_event_close(struct perf_event *p_event)
>  	tp_event->class->reg(tp_event, TRACE_REG_PERF_CLOSE, p_event);
>  }
>  
> +static int perf_trace_event_attr(struct trace_event_call *tp_event,
> +				 struct perf_event *event)
> +{
> +	/*
> +	 * All tracepoints and kprobes are from kernel-space.
> +	 */
> +	if (((tp_event->flags & TRACE_EVENT_FL_TRACEPOINT) ||
> +	     (tp_event->flags & TRACE_EVENT_FL_KPROBE)) &&
> +	     event->attr.exclude_kernel)
> +		return -EINVAL;
> +
> +	/*
> +	 * All uprobes are from user-space.
> +	 */
> +	if ((tp_event->flags & TRACE_EVENT_FL_UPROBE) &&
> +	    event->attr.exclude_user)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
>  static int perf_trace_event_init(struct trace_event_call *tp_event,
>  				 struct perf_event *p_event)
>  {
>  	int ret;
>  
> +	ret = perf_trace_event_attr(tp_event, p_event);
> +	if (ret)
> +		return ret;
> +
>  	ret = perf_trace_event_perm(tp_event, p_event);
>  	if (ret)
>  		return ret;
> -- 
> 2.4.3
> 


* Re: [PATCH 3/5] ftrace perf: Use ftrace_ops::private to store event pointer
  2016-03-09 20:46 ` [PATCH 3/5] ftrace perf: Use ftrace_ops::private to store event pointer Jiri Olsa
@ 2016-03-10  1:29   ` Namhyung Kim
  0 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2016-03-10  1:29 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Wed, Mar 09, 2016 at 09:46:43PM +0100, Jiri Olsa wrote:
> Having following commands running concurrently:
> 
>   # perf record -e ftrace:function -a -o krava.data sleep 10
>   # perf record -e ftrace:function --filter 'ip == SyS_read' ls
> 
> will end up in the latter one to fail on the filter rules
> and store all functions (in perf.data) as instructed by the
> first perf record instead of just SyS_read records.
> 
> The reason for this is, that tracepoint code by default
> triggers all events that registered for the tracepoint.
> 
> While ftrace:function is special because ftrace_ops
> itself carries a filter and only the event that owns
> ftrace_ops is eligible to be triggered.
> 
> Fixing this by using ftrace_ops::private value to keep
> the perf_event pointer. This way we don't need to search
> for triggered event (as tracepoint handler does) and
> directly store sample.
> 
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung


> ---
>  include/linux/perf_event.h      |  3 +++
>  kernel/events/core.c            | 22 ++++++++++++++++++++++
>  kernel/trace/trace_event_perf.c | 10 +++-------
>  3 files changed, 28 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index a9d8cab18b00..a330dc06d90d 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1008,6 +1008,9 @@ extern void perf_tp_event(u64 addr, u64 count, void *record,
>  			  int entry_size, struct pt_regs *regs,
>  			  struct hlist_head *head, int rctx,
>  			  struct task_struct *task);
> +void perf_function_event(struct perf_event *event,
> +			 void *record, int entry_size,
> +			 struct pt_regs *regs);
>  extern void perf_bp_event(struct perf_event *event, void *data);
>  
>  #ifndef perf_misc_flags
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index ca68fdcf47ce..18da90859c17 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7063,6 +7063,28 @@ static void perf_event_free_bpf_prog(struct perf_event *event)
>  	}
>  }
>  
> +#ifdef CONFIG_FUNCTION_TRACER
> +void perf_function_event(struct perf_event *event,
> +			 void *record, int entry_size,
> +			 struct pt_regs *regs)
> +
> +{
> +	struct perf_sample_data data;
> +	struct perf_raw_record raw = {
> +		.size = entry_size,
> +		.data = record,
> +	};
> +
> +	if (event->hw.state & PERF_HES_STOPPED)
> +		return;
> +
> +	perf_sample_data_init(&data, 0, 0);
> +	data.raw = &raw;
> +
> +	perf_swevent_event(event, 1, &data, regs);
> +}
> +#endif /* CONFIG_FUNCTION_TRACER */
> +
>  #else
>  
>  static inline void perf_tp_register(void)
> diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> index 0a3779bd18a1..087c811db04c 100644
> --- a/kernel/trace/trace_event_perf.c
> +++ b/kernel/trace/trace_event_perf.c
> @@ -328,14 +328,9 @@ perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip,
>  			  struct ftrace_ops *ops, struct pt_regs *pt_regs)
>  {
>  	struct ftrace_entry *entry;
> -	struct hlist_head *head;
>  	struct pt_regs regs;
>  	int rctx;
>  
> -	head = this_cpu_ptr(event_function.perf_events);
> -	if (hlist_empty(head))
> -		return;
> -
>  #define ENTRY_SIZE (ALIGN(sizeof(struct ftrace_entry) + sizeof(u32), \
>  		    sizeof(u64)) - sizeof(u32))
>  
> @@ -349,8 +344,8 @@ perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip,
>  
>  	entry->ip = ip;
>  	entry->parent_ip = parent_ip;
> -	perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, 0,
> -			      1, &regs, head, NULL);
> +	perf_function_event(ops->private, entry, ENTRY_SIZE, &regs);
> +	perf_swevent_put_recursion_context(rctx);
>  
>  #undef ENTRY_SIZE
>  }
> @@ -359,6 +354,7 @@ static int perf_ftrace_function_register(struct perf_event *event)
>  {
>  	struct ftrace_ops *ops = &event->ftrace_ops;
>  
> +	ops->private = event;
>  	ops->flags |= FTRACE_OPS_FL_PER_CPU | FTRACE_OPS_FL_RCU;
>  	ops->func = perf_ftrace_function_call;
>  	return register_ftrace_function(ops);
> -- 
> 2.4.3
> 


* Re: [PATCH 1/5] ftrace perf: Check sample types only for sampling events
  2016-03-10  0:36   ` Namhyung Kim
@ 2016-03-10  7:25     ` Jiri Olsa
  2016-03-11  8:36       ` Jiri Olsa
  0 siblings, 1 reply; 28+ messages in thread
From: Jiri Olsa @ 2016-03-10  7:25 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Thu, Mar 10, 2016 at 09:36:37AM +0900, Namhyung Kim wrote:
> Hi Jiri,
> 
> On Wed, Mar 09, 2016 at 09:46:41PM +0100, Jiri Olsa wrote:
> > Currently we check sample type for ftrace:function event
> > even if it's not created as sampling event. That prevents
> > creating ftrace_function event in counting mode.
> > 
> > Making sure we check sample types only for sampling events.
> > 
> > Before:
> >   $ sudo perf stat -e ftrace:function ls
> >   ...
> > 
> >    Performance counter stats for 'ls':
> > 
> >      <not supported>      ftrace:function
> > 
> >          0.001983662 seconds time elapsed
> > 
> > After:
> >   $ sudo perf stat -e ftrace:function ls
> >   ...
> > 
> >    Performance counter stats for 'ls':
> > 
> >               44,498      ftrace:function
> > 
> >          0.037534722 seconds time elapsed
> > 
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  kernel/trace/trace_event_perf.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> > index 00df25fd86ef..a7171ec2c1ca 100644
> > --- a/kernel/trace/trace_event_perf.c
> > +++ b/kernel/trace/trace_event_perf.c
> > @@ -52,14 +52,14 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
> >  		 * event, due to issues with page faults while tracing page
> >  		 * fault handler and its overall trickiness nature.
> >  		 */
> > -		if (!p_event->attr.exclude_callchain_user)
> > +		if (is_sampling_event(p_event) && !p_event->attr.exclude_callchain_user)
> >  			return -EINVAL;
> >  
> >  		/*
> >  		 * Same reason to disable user stack dump as for user space
> >  		 * callchains above.
> >  		 */
> > -		if (p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
> > +		if (is_sampling_event(p_event) && p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
> >  			return -EINVAL;
> >  	}
> >  
> 
> What about checking is_sampling_event() first and goto the last
> paranoid_tracepoint_raw check instead?  This way we can remove the
> same check in the function trace case.

right, will check

thanks,
jirka

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/5] ftrace perf: Check sample types only for sampling events
  2016-03-10  7:25     ` Jiri Olsa
@ 2016-03-11  8:36       ` Jiri Olsa
  2016-03-11 13:48         ` Namhyung Kim
  0 siblings, 1 reply; 28+ messages in thread
From: Jiri Olsa @ 2016-03-11  8:36 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Thu, Mar 10, 2016 at 08:25:02AM +0100, Jiri Olsa wrote:
> On Thu, Mar 10, 2016 at 09:36:37AM +0900, Namhyung Kim wrote:
> > Hi Jiri,
> > 
> > On Wed, Mar 09, 2016 at 09:46:41PM +0100, Jiri Olsa wrote:
> > > Currently we check the sample type for the ftrace:function
> > > event even if it's not created as a sampling event. That
> > > prevents creating the ftrace:function event in counting mode.
> > > 
> > > Make sure we check sample types only for sampling events.
> > > 
> > > Before:
> > >   $ sudo perf stat -e ftrace:function ls
> > >   ...
> > > 
> > >    Performance counter stats for 'ls':
> > > 
> > >      <not supported>      ftrace:function
> > > 
> > >          0.001983662 seconds time elapsed
> > > 
> > > After:
> > >   $ sudo perf stat -e ftrace:function ls
> > >   ...
> > > 
> > >    Performance counter stats for 'ls':
> > > 
> > >               44,498      ftrace:function
> > > 
> > >          0.037534722 seconds time elapsed
> > > 
> > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > ---
> > >  kernel/trace/trace_event_perf.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> > > index 00df25fd86ef..a7171ec2c1ca 100644
> > > --- a/kernel/trace/trace_event_perf.c
> > > +++ b/kernel/trace/trace_event_perf.c
> > > @@ -52,14 +52,14 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
> > >  		 * event, due to issues with page faults while tracing page
> > >  		 * fault handler and its overall trickiness nature.
> > >  		 */
> > > -		if (!p_event->attr.exclude_callchain_user)
> > > +		if (is_sampling_event(p_event) && !p_event->attr.exclude_callchain_user)
> > >  			return -EINVAL;
> > >  
> > >  		/*
> > >  		 * Same reason to disable user stack dump as for user space
> > >  		 * callchains above.
> > >  		 */
> > > -		if (p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
> > > +		if (is_sampling_event(p_event) && p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
> > >  			return -EINVAL;
> > >  	}
> > >  
> > 
> > What about checking is_sampling_event() first and goto the last
> > paranoid_tracepoint_raw check instead?  This way we can remove the
> > same check in the function trace case.
> 
> right, will check

hum, did you mean something like this?

I'd rather keep it the original way.. seems more straightforward

jirka


---
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 00df25fd86ef..7c1edb57c823 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -44,23 +44,22 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
 
 	/* The ftrace function trace is allowed only for root. */
 	if (ftrace_event_is_function(tp_event)) {
-		if (perf_paranoid_tracepoint_raw() && !capable(CAP_SYS_ADMIN))
-			return -EPERM;
-
 		/*
 		 * We don't allow user space callchains for  function trace
 		 * event, due to issues with page faults while tracing page
 		 * fault handler and its overall trickiness nature.
 		 */
-		if (!p_event->attr.exclude_callchain_user)
+		if (is_sampling_event(p_event) && !p_event->attr.exclude_callchain_user)
 			return -EINVAL;
 
 		/*
 		 * Same reason to disable user stack dump as for user space
 		 * callchains above.
 		 */
-		if (p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
+		if (is_sampling_event(p_event) && p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
 			return -EINVAL;
+
+		goto root_check;
 	}
 
 	/* No tracing, just counting, so no obvious leak */
@@ -73,6 +72,7 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
 			return 0;
 	}
 
+root_check:
 	/*
 	 * ...otherwise raw tracepoint data can be a severe data leak,
 	 * only allow root to have these.

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event
  2016-03-10  0:39   ` Namhyung Kim
@ 2016-03-11  8:39     ` Jiri Olsa
  0 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2016-03-11  8:39 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Thu, Mar 10, 2016 at 09:39:55AM +0900, Namhyung Kim wrote:
> On Wed, Mar 09, 2016 at 09:46:42PM +0100, Jiri Olsa wrote:
> > We suppress events with attr::exclude_kernel set when
> > the event is generated, so the following capture will
> > give no warning but won't produce any data:
> > 
> >   $ sudo perf record -e sched:sched_switch:u ls
> >   $ sudo /perf script | wc -l
> >   0
> > 
> > Checking the attr::exclude_(kernel|user) at the event
> > init time and failing right away for tracepoints from
> > uprobes/kprobes and native ones:
> > 
> >   $ sudo perf record -e sched:sched_switch:u ls
> >   Error:
> >   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (sched:sched_switch).
> >   /bin/dmesg may provide additional information.
> >   No CONFIG_PERF_EVENTS=y kernel support configured?
> > 
> >   $ sudo perf record -e probe:sys_read:u ls
> >   Error:
> >   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe:sys_read).
> >   /bin/dmesg may provide additional information.
> >   No CONFIG_PERF_EVENTS=y kernel support configured?
> > 
> >   $ ./perf record -e probe_ex:main:k ./ex
> >   Error:
> >   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe_ex:main).
> >   /bin/dmesg may provide additional information.
> >   No CONFIG_PERF_EVENTS=y kernel support configured?
> > 
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> 
> Acked-by: Namhyung Kim <namhyung@kernel.org>
> 
> Maybe we need to improve the error message later.

yep, working on that ;-)

thanks,
jirka

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/5] ftrace perf: Check sample types only for sampling events
  2016-03-11  8:36       ` Jiri Olsa
@ 2016-03-11 13:48         ` Namhyung Kim
  2016-03-11 18:14           ` Jiri Olsa
  0 siblings, 1 reply; 28+ messages in thread
From: Namhyung Kim @ 2016-03-11 13:48 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Fri, Mar 11, 2016 at 09:36:24AM +0100, Jiri Olsa wrote:
> On Thu, Mar 10, 2016 at 08:25:02AM +0100, Jiri Olsa wrote:
> > On Thu, Mar 10, 2016 at 09:36:37AM +0900, Namhyung Kim wrote:
> > > Hi Jiri,
> > > 
> > > On Wed, Mar 09, 2016 at 09:46:41PM +0100, Jiri Olsa wrote:
> > > > Currently we check the sample type for the ftrace:function
> > > > event even if it's not created as a sampling event. That
> > > > prevents creating the ftrace:function event in counting mode.
> > > > 
> > > > Make sure we check sample types only for sampling events.
> > > > 
> > > > Before:
> > > >   $ sudo perf stat -e ftrace:function ls
> > > >   ...
> > > > 
> > > >    Performance counter stats for 'ls':
> > > > 
> > > >      <not supported>      ftrace:function
> > > > 
> > > >          0.001983662 seconds time elapsed
> > > > 
> > > > After:
> > > >   $ sudo perf stat -e ftrace:function ls
> > > >   ...
> > > > 
> > > >    Performance counter stats for 'ls':
> > > > 
> > > >               44,498      ftrace:function
> > > > 
> > > >          0.037534722 seconds time elapsed
> > > > 
> > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > ---
> > > >  kernel/trace/trace_event_perf.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> > > > index 00df25fd86ef..a7171ec2c1ca 100644
> > > > --- a/kernel/trace/trace_event_perf.c
> > > > +++ b/kernel/trace/trace_event_perf.c
> > > > @@ -52,14 +52,14 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
> > > >  		 * event, due to issues with page faults while tracing page
> > > >  		 * fault handler and its overall trickiness nature.
> > > >  		 */
> > > > -		if (!p_event->attr.exclude_callchain_user)
> > > > +		if (is_sampling_event(p_event) && !p_event->attr.exclude_callchain_user)
> > > >  			return -EINVAL;
> > > >  
> > > >  		/*
> > > >  		 * Same reason to disable user stack dump as for user space
> > > >  		 * callchains above.
> > > >  		 */
> > > > -		if (p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
> > > > +		if (is_sampling_event(p_event) && p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
> > > >  			return -EINVAL;
> > > >  	}
> > > >  
> > > 
> > > What about checking is_sampling_event() first and goto the last
> > > paranoid_tracepoint_raw check instead?  This way we can remove the
> > > same check in the function trace case.
> > 
> > right, will check
> 
> hum, did you mean something like this?
> 
> I'd rather keep it the original way.. seems more straightforward

Hmm.. I think I was wrong.  But it seems we can simply return 0 for
the non-sampling case.  How about this?

Thanks,
Namhyung


diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 00df25fd86ef..e11108f1d197 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -47,6 +47,9 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
 		if (perf_paranoid_tracepoint_raw() && !capable(CAP_SYS_ADMIN))
 			return -EPERM;
 
+		if (!is_sampling_event(p_event))
+			return 0;
+
 		/*
 		 * We don't allow user space callchains for  function trace
 		 * event, due to issues with page faults while tracing page

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool
  2016-03-09 20:46 ` [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool Jiri Olsa
@ 2016-03-11 14:28   ` Namhyung Kim
  2016-03-11 18:15     ` Jiri Olsa
  0 siblings, 1 reply; 28+ messages in thread
From: Namhyung Kim @ 2016-03-11 14:28 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Wed, Mar 09, 2016 at 09:46:44PM +0100, Jiri Olsa wrote:
> Change __ftrace_hash_rec_update to return true in case
> we need to update dynamic ftrace call records. It returns
> false in case no update is needed.
> 
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  kernel/trace/ftrace.c | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index eca592f977b2..123dddc660e9 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1610,7 +1610,7 @@ static bool test_rec_ops_needs_regs(struct dyn_ftrace *rec)
>  	return  keep_regs;
>  }
>  
> -static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> +static bool __ftrace_hash_rec_update(struct ftrace_ops *ops,
>  				     int filter_hash,
>  				     bool inc)
>  {
> @@ -1618,12 +1618,13 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
>  	struct ftrace_hash *other_hash;
>  	struct ftrace_page *pg;
>  	struct dyn_ftrace *rec;
> +	bool update = false;
>  	int count = 0;
>  	int all = 0;
>  
>  	/* Only update if the ops has been registered */
>  	if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
> -		return;
> +		return false;
>  
>  	/*
>  	 * In the filter_hash case:
> @@ -1650,7 +1651,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
>  		 * then there's nothing to do.
>  		 */
>  		if (ftrace_hash_empty(hash))
> -			return;
> +			return false;
>  	}
>  
>  	do_for_each_ftrace_rec(pg, rec) {
> @@ -1694,7 +1695,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
>  		if (inc) {
>  			rec->flags++;
>  			if (FTRACE_WARN_ON(ftrace_rec_count(rec) == FTRACE_REF_MAX))
> -				return;
> +				return false;
>  
>  			/*
>  			 * If there's only a single callback registered to a
> @@ -1720,7 +1721,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
>  				rec->flags |= FTRACE_FL_REGS;
>  		} else {
>  			if (FTRACE_WARN_ON(ftrace_rec_count(rec) == 0))
> -				return;
> +				return false;
>  			rec->flags--;
>  
>  			/*
> @@ -1753,22 +1754,27 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
>  			 */
>  		}
>  		count++;
> +
> +		update |= ftrace_test_record(rec, 1) != FTRACE_UPDATE_IGNORE;

Shouldn't it use 'inc' instead of 1 for the second argument of
the ftrace_test_record()?

Thanks,
Namhyung


> +
>  		/* Shortcut, if we handled all records, we are done. */
>  		if (!all && count == hash->count)
> -			return;
> +			return update;
>  	} while_for_each_ftrace_rec();
> +
> +	return update;
>  }
>  
> -static void ftrace_hash_rec_disable(struct ftrace_ops *ops,
> +static bool ftrace_hash_rec_disable(struct ftrace_ops *ops,
>  				    int filter_hash)
>  {
> -	__ftrace_hash_rec_update(ops, filter_hash, 0);
> +	return __ftrace_hash_rec_update(ops, filter_hash, 0);
>  }
>  
> -static void ftrace_hash_rec_enable(struct ftrace_ops *ops,
> +static bool ftrace_hash_rec_enable(struct ftrace_ops *ops,
>  				   int filter_hash)
>  {
> -	__ftrace_hash_rec_update(ops, filter_hash, 1);
> +	return __ftrace_hash_rec_update(ops, filter_hash, 1);
>  }
>  
>  static void ftrace_hash_rec_update_modify(struct ftrace_ops *ops,
> -- 
> 2.4.3
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/5] ftrace perf: Check sample types only for sampling events
  2016-03-11 13:48         ` Namhyung Kim
@ 2016-03-11 18:14           ` Jiri Olsa
  0 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2016-03-11 18:14 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Fri, Mar 11, 2016 at 10:48:14PM +0900, Namhyung Kim wrote:

SNIP

> > > > What about checking is_sampling_event() first and goto the last
> > > > paranoid_tracepoint_raw check instead?  This way we can remove the
> > > > same check in the function trace case.
> > > 
> > > right, will check
> > 
> > hum, did you mean something like this?
> > 
> > I'd rather keep it the original way.. seems more straightforward
> 
> Hmm.. I think I was wrong.  But it seems we can simply return 0 for
> the non-sampling case.  How about this?

yep, that seems better.. will post v2

thanks,
jirka

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool
  2016-03-11 14:28   ` Namhyung Kim
@ 2016-03-11 18:15     ` Jiri Olsa
  2016-03-12  8:35       ` Namhyung Kim
  0 siblings, 1 reply; 28+ messages in thread
From: Jiri Olsa @ 2016-03-11 18:15 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Fri, Mar 11, 2016 at 11:28:00PM +0900, Namhyung Kim wrote:

SNIP

> > @@ -1694,7 +1695,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> >  		if (inc) {
> >  			rec->flags++;
> >  			if (FTRACE_WARN_ON(ftrace_rec_count(rec) == FTRACE_REF_MAX))
> > -				return;
> > +				return false;
> >  
> >  			/*
> >  			 * If there's only a single callback registered to a
> > @@ -1720,7 +1721,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> >  				rec->flags |= FTRACE_FL_REGS;
> >  		} else {
> >  			if (FTRACE_WARN_ON(ftrace_rec_count(rec) == 0))
> > -				return;
> > +				return false;
> >  			rec->flags--;
> >  
> >  			/*
> > @@ -1753,22 +1754,27 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> >  			 */
> >  		}
> >  		count++;
> > +
> > +		update |= ftrace_test_record(rec, 1) != FTRACE_UPDATE_IGNORE;
> 
> Shouldn't it use 'inc' instead of 1 for the second argument of
> the ftrace_test_record()?

I don't think so, 1 is to update calls (FTRACE_UPDATE_CALLS)
check ftrace_modify_all_code:

        if (command & FTRACE_UPDATE_CALLS)
                ftrace_replace_code(1);
        else if (command & FTRACE_DISABLE_CALLS)
                ftrace_replace_code(0);

both ftrace_startup, ftrace_shutdown use FTRACE_UPDATE_CALLS

you'd use 0 only to disable all, check ftrace_check_record comments:

        /*
         * If we are updating calls:
         *
         *   If the record has a ref count, then we need to enable it
         *   because someone is using it.
         *
         *   Otherwise we make sure its disabled.
         *
         * If we are disabling calls, then disable all records that
         * are enabled.
         */
        if (enable && ftrace_rec_count(rec))
                flag = FTRACE_FL_ENABLED;


used by ftrace_shutdown_sysctl

thanks,
jirka

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool
  2016-03-11 18:15     ` Jiri Olsa
@ 2016-03-12  8:35       ` Namhyung Kim
  2016-03-15 19:43         ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Namhyung Kim @ 2016-03-12  8:35 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo

Hi Jiri,

On Fri, Mar 11, 2016 at 07:15:06PM +0100, Jiri Olsa wrote:
> On Fri, Mar 11, 2016 at 11:28:00PM +0900, Namhyung Kim wrote:
> 
> SNIP
> 
> > > @@ -1694,7 +1695,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> > >  		if (inc) {
> > >  			rec->flags++;
> > >  			if (FTRACE_WARN_ON(ftrace_rec_count(rec) == FTRACE_REF_MAX))
> > > -				return;
> > > +				return false;
> > >  
> > >  			/*
> > >  			 * If there's only a single callback registered to a
> > > @@ -1720,7 +1721,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> > >  				rec->flags |= FTRACE_FL_REGS;
> > >  		} else {
> > >  			if (FTRACE_WARN_ON(ftrace_rec_count(rec) == 0))
> > > -				return;
> > > +				return false;
> > >  			rec->flags--;
> > >  
> > >  			/*
> > > @@ -1753,22 +1754,27 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> > >  			 */
> > >  		}
> > >  		count++;
> > > +
> > > +		update |= ftrace_test_record(rec, 1) != FTRACE_UPDATE_IGNORE;
> > 
> > Shouldn't it use 'inc' instead of 1 for the second argument of
> > the ftrace_test_record()?
> 
> I don't think so, 1 is to update calls (FTRACE_UPDATE_CALLS)
> check ftrace_modify_all_code:
> 
>         if (command & FTRACE_UPDATE_CALLS)
>                 ftrace_replace_code(1);
>         else if (command & FTRACE_DISABLE_CALLS)
>                 ftrace_replace_code(0);
> 
> both ftrace_startup, ftrace_shutdown use FTRACE_UPDATE_CALLS

Ah, ok.  So the second argument of the ftrace_test_record() is not
'enable' actually..  :-/

> 
> you'd use 0 only to disable all, check ftrace_check_record comments:
> 
>         /*
>          * If we are updating calls:
>          *
>          *   If the record has a ref count, then we need to enable it
>          *   because someone is using it.
>          *
>          *   Otherwise we make sure its disabled.
>          *
>          * If we are disabling calls, then disable all records that
>          * are enabled.
>          */
>         if (enable && ftrace_rec_count(rec))
>                 flag = FTRACE_FL_ENABLED;
> 
> 
> used by ftrace_shutdown_sysctl

I got it.  Thank you for the explanation!

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool
  2016-03-12  8:35       ` Namhyung Kim
@ 2016-03-15 19:43         ` Steven Rostedt
  0 siblings, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2016-03-15 19:43 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Jiri Olsa, lkml, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Sat, 12 Mar 2016 17:35:02 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> Hi Jiri,
> 
> On Fri, Mar 11, 2016 at 07:15:06PM +0100, Jiri Olsa wrote:
> > On Fri, Mar 11, 2016 at 11:28:00PM +0900, Namhyung Kim wrote:
> > 
> > SNIP
> >   
> > > > @@ -1694,7 +1695,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> > > >  		if (inc) {
> > > >  			rec->flags++;
> > > >  			if (FTRACE_WARN_ON(ftrace_rec_count(rec) == FTRACE_REF_MAX))
> > > > -				return;
> > > > +				return false;
> > > >  
> > > >  			/*
> > > >  			 * If there's only a single callback registered to a
> > > > @@ -1720,7 +1721,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> > > >  				rec->flags |= FTRACE_FL_REGS;
> > > >  		} else {
> > > >  			if (FTRACE_WARN_ON(ftrace_rec_count(rec) == 0))
> > > > -				return;
> > > > +				return false;
> > > >  			rec->flags--;
> > > >  
> > > >  			/*
> > > > @@ -1753,22 +1754,27 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> > > >  			 */
> > > >  		}
> > > >  		count++;
> > > > +
> > > > +		update |= ftrace_test_record(rec, 1) != FTRACE_UPDATE_IGNORE;  

Yeah, this is confusing. Mind adding a comment above this:

	/* Must match FTRACE_UPDATE_CALLS in ftrace_modify_all_code() */

That way others will know why this is a '1'.

-- Steve

> > > 
> > > Shouldn't it use 'inc' instead of 1 for the second argument of
> > > the ftrace_test_record()?  
> > 
> > I don't think so, 1 is to update calls (FTRACE_UPDATE_CALLS)  
> > check ftrace_modify_all_code:
> > 
> >         if (command & FTRACE_UPDATE_CALLS)
> >                 ftrace_replace_code(1);
> >         else if (command & FTRACE_DISABLE_CALLS)
> >                 ftrace_replace_code(0);
> > 
> > both ftrace_startup, ftrace_shutdown use FTRACE_UPDATE_CALLS  
> 
> Ah, ok.  So the second argument of the ftrace_test_record() is not
> 'enable' actually..  :-/
> 
> > 
> > you'd use 0 only to disable all, check ftrace_check_record comments:
> > 
> >         /*
> >          * If we are updating calls:
> >          *
> >          *   If the record has a ref count, then we need to enable it
> >          *   because someone is using it.
> >          *
> >          *   Otherwise we make sure its disabled.
> >          *
> >          * If we are disabling calls, then disable all records that
> >          * are enabled.
> >          */
> >         if (enable && ftrace_rec_count(rec))
> >                 flag = FTRACE_FL_ENABLED;
> > 
> > 
> > used by ftrace_shutdown_sysctl  
> 
> I got it.  Thank you for the explanation!
> 
> Thanks,
> Namhyung

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/5] ftrace perf: Check sample types only for sampling events
  2016-03-09 20:46 ` [PATCH 1/5] ftrace perf: Check sample types only for sampling events Jiri Olsa
  2016-03-10  0:36   ` Namhyung Kim
@ 2016-03-15 20:06   ` Steven Rostedt
  2016-03-15 21:51     ` Jiri Olsa
  1 sibling, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2016-03-15 20:06 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Wed,  9 Mar 2016 21:46:41 +0100
Jiri Olsa <jolsa@kernel.org> wrote:

> Currently we check the sample type for the ftrace:function
> event even if it's not created as a sampling event. That
> prevents creating the ftrace:function event in counting mode.
> 
> Make sure we check sample types only for sampling events.
> 
> Before:
>   $ sudo perf stat -e ftrace:function ls
>   ...
> 
>    Performance counter stats for 'ls':
> 
>      <not supported>      ftrace:function
> 
>          0.001983662 seconds time elapsed
> 
> After:
>   $ sudo perf stat -e ftrace:function ls

I'm assuming you gave yourself admin capabilities, and that no normal
user may sample function tracing, right?

-- Steve

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/5] ftrace perf: Check sample types only for sampling events
  2016-03-15 20:06   ` Steven Rostedt
@ 2016-03-15 21:51     ` Jiri Olsa
  0 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2016-03-15 21:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Olsa, lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Tue, Mar 15, 2016 at 04:06:46PM -0400, Steven Rostedt wrote:
> On Wed,  9 Mar 2016 21:46:41 +0100
> Jiri Olsa <jolsa@kernel.org> wrote:
> 
> > Currently we check the sample type for the ftrace:function
> > event even if it's not created as a sampling event. That
> > prevents creating the ftrace:function event in counting mode.
> > 
> > Make sure we check sample types only for sampling events.
> > 
> > Before:
> >   $ sudo perf stat -e ftrace:function ls
> >   ...
> > 
> >    Performance counter stats for 'ls':
> > 
> >      <not supported>      ftrace:function
> > 
> >          0.001983662 seconds time elapsed
> > 
> > After:
> >   $ sudo perf stat -e ftrace:function ls
> 
> I'm assuming you gave yourself admin capabilities, and not any normal
> user may sample function tracing, right?

right ;-)

jirka

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event
  2016-03-24 13:00           ` Peter Zijlstra
@ 2016-03-24 13:30             ` Jiri Olsa
  0 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2016-03-24 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: gg, Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Namhyung Kim,
	Arnaldo Carvalho de Melo

On Thu, Mar 24, 2016 at 02:00:45PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 24, 2016 at 01:25:44PM +0100, Jiri Olsa wrote:
> > > > > >   $ sudo perf record -e sched:sched_switch:u ls
> > > > > >   $ sudo /perf script | wc -l
> > > > > >   0
> 
> > > And its not like the [uk] flags are hard to implement here.
> > 
> > sched:sched_switch:u ?
> 
> As per the above, its implemented and correct. The answer is 0.
> 
> The only problem is that currently it assumes all tracepoint events are
> from the kernel, and that is wrong for uprobes.
> 
> But returning an error is not right, it's a valid configuration, a daft
> one, sure, but not invalid.
> 
> Similarly, uprobe:foo:k is daft, but it is also currently broken.

ok, will send fix for uprobes then

thanks,
jirka 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event
  2016-03-24 12:25         ` Jiri Olsa
@ 2016-03-24 13:00           ` Peter Zijlstra
  2016-03-24 13:30             ` Jiri Olsa
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2016-03-24 13:00 UTC (permalink / raw)
  To: Jiri Olsa, gg
  Cc: Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Namhyung Kim,
	Arnaldo Carvalho de Melo

On Thu, Mar 24, 2016 at 01:25:44PM +0100, Jiri Olsa wrote:
> > > > >   $ sudo perf record -e sched:sched_switch:u ls
> > > > >   $ sudo /perf script | wc -l
> > > > >   0

> > And it's not like the [uk] flags are hard to implement here.
> 
> sched:sched_switch:u ?

As per the above, its implemented and correct. The answer is 0.

The only problem is that currently it assumes all tracepoint events are
from the kernel, and that is wrong for uprobes.

But returning an error is not right, it's a valid configuration, a daft
one, sure, but not invalid.

Similarly, uprobe:foo:k is daft, but it is also currently broken.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event
  2016-03-24 10:49       ` Peter Zijlstra
@ 2016-03-24 12:25         ` Jiri Olsa
  2016-03-24 13:00           ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Jiri Olsa @ 2016-03-24 12:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Namhyung Kim,
	Arnaldo Carvalho de Melo

On Thu, Mar 24, 2016 at 11:49:34AM +0100, Peter Zijlstra wrote:
> On Thu, Mar 24, 2016 at 10:56:48AM +0100, Jiri Olsa wrote:
> > On Wed, Mar 23, 2016 at 11:41:29AM +0100, Peter Zijlstra wrote:
> > > On Wed, Mar 16, 2016 at 03:34:30PM +0100, Jiri Olsa wrote:
> > > > We suppress events with attr::exclude_kernel set when
> > > > the event is generated, so the following capture will
> > > > give no warning but won't produce any data:
> > > > 
> > > >   $ sudo perf record -e sched:sched_switch:u ls
> > > >   $ sudo /perf script | wc -l
> > > >   0
> > > > 
> > > > Checking the attr::exclude_(kernel|user) at the event
> > > > init time and failing right away for tracepoints from
> > > > uprobes/kprobes and native ones:
> > > > 
> > > >   $ sudo perf record -e sched:sched_switch:u ls
> > > >   Error:
> > > >   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (sched:sched_switch).
> > > >   /bin/dmesg may provide additional information.
> > > >   No CONFIG_PERF_EVENTS=y kernel support configured?
> 
> > > Not sure about this one. The previous behaviour suggests
> > > exclude_{user,kernel} is implemented, while the new behaviour says these
> > > flags are not implemented, which is a functional regression.
> > 
> > well I would not expect 'sched:sched_switch:u' to work (be implemented)
> > 
> > and I thought it's better to trigger an error than silently 'produce' no data
> 
> We're not in the business of protecting people from themselves, are we?
> And if you want to help them, do so in userspace.

yep, I planned to make a user-space patch
to make that error more user-friendly

> 
> And it's not like the [uk] flags are hard to implement here.

sched:sched_switch:u ?

jirka

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event
  2016-03-24  9:56     ` Jiri Olsa
@ 2016-03-24 10:49       ` Peter Zijlstra
  2016-03-24 12:25         ` Jiri Olsa
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2016-03-24 10:49 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Namhyung Kim,
	Arnaldo Carvalho de Melo

On Thu, Mar 24, 2016 at 10:56:48AM +0100, Jiri Olsa wrote:
> On Wed, Mar 23, 2016 at 11:41:29AM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 16, 2016 at 03:34:30PM +0100, Jiri Olsa wrote:
> > > We suppress events with attr::exclude_kernel set when
> > > the event is generated, so the following capture will
> > > give no warning but won't produce any data:
> > > 
> > >   $ sudo perf record -e sched:sched_switch:u ls
> > >   $ sudo /perf script | wc -l
> > >   0
> > > 
> > > Checking the attr::exclude_(kernel|user) at the event
> > > init time and failing right away for tracepoints from
> > > uprobes/kprobes and native ones:
> > > 
> > >   $ sudo perf record -e sched:sched_switch:u ls
> > >   Error:
> > >   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (sched:sched_switch).
> > >   /bin/dmesg may provide additional information.
> > >   No CONFIG_PERF_EVENTS=y kernel support configured?

> > Not sure about this one. The previous behaviour suggests
> > exclude_{user,kernel} is implemented, while the new behaviour says these
> > flags are not implemented, which is a functional regression.
> 
> well I would not expect 'sched:sched_switch:u' to work (be implemented)
> 
> and I thought it's better to trigger an error than silently 'produce' no data

We're not in the business of protecting people from themselves, are we?
And if you want to help them, do so in userspace.

And it's not like the [uk] flags are hard to implement here.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event
  2016-03-23 10:41   ` Peter Zijlstra
@ 2016-03-24  9:56     ` Jiri Olsa
  2016-03-24 10:49       ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Jiri Olsa @ 2016-03-24  9:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Namhyung Kim,
	Arnaldo Carvalho de Melo

On Wed, Mar 23, 2016 at 11:41:29AM +0100, Peter Zijlstra wrote:
> On Wed, Mar 16, 2016 at 03:34:30PM +0100, Jiri Olsa wrote:
> > We suppress events with attr::exclude_kernel set when
> > the event is generated, so the following capture will
> > give no warning but won't produce any data:
> > 
> >   $ sudo perf record -e sched:sched_switch:u ls
> >   $ sudo ./perf script | wc -l
> >   0
> > 
> > Check the attr::exclude_(kernel|user) flags at event
> > init time and fail right away for tracepoints from
> > uprobes/kprobes and native ones:
> > 
> >   $ sudo perf record -e sched:sched_switch:u ls
> >   Error:
> >   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (sched:sched_switch).
> >   /bin/dmesg may provide additional information.
> >   No CONFIG_PERF_EVENTS=y kernel support configured?
> > 
> >   $ sudo perf record -e probe:sys_read:u ls
> >   Error:
> >   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe:sys_read).
> >   /bin/dmesg may provide additional information.
> >   No CONFIG_PERF_EVENTS=y kernel support configured?
> > 
> >   $ ./perf record -e probe_ex:main:k ./ex
> >   Error:
> >   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe_ex:main).
> >   /bin/dmesg may provide additional information.
> >   No CONFIG_PERF_EVENTS=y kernel support configured?
> 
> Not sure about this one. The previous behaviour suggests
> exclude_{user,kernel} is implemented, while the new behaviour says these
> flags are not implemented, which is a functional regression.

well I would not expect 'sched:sched_switch:u' to work (be implemented)

and I thought it's better to trigger an error than silently 'produce' no data

> 
> That is, if all events are from kernel space, and we exclude all kernel
> events, 0 is the right answer, not an error.
> 
> Sure, with uprobes the situation is currently broken, but this isn't a
> fix.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event
  2016-03-16 14:34 ` [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event Jiri Olsa
  2016-03-18 14:28   ` Steven Rostedt
@ 2016-03-23 10:41   ` Peter Zijlstra
  2016-03-24  9:56     ` Jiri Olsa
  1 sibling, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2016-03-23 10:41 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, lkml, Ingo Molnar, Namhyung Kim,
	Arnaldo Carvalho de Melo

On Wed, Mar 16, 2016 at 03:34:30PM +0100, Jiri Olsa wrote:
> We suppress events with attr::exclude_kernel set when
> the event is generated, so the following capture will
> give no warning but won't produce any data:
> 
>   $ sudo perf record -e sched:sched_switch:u ls
>   $ sudo ./perf script | wc -l
>   0
> 
> Check the attr::exclude_(kernel|user) flags at event
> init time and fail right away for tracepoints from
> uprobes/kprobes and native ones:
> 
>   $ sudo perf record -e sched:sched_switch:u ls
>   Error:
>   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (sched:sched_switch).
>   /bin/dmesg may provide additional information.
>   No CONFIG_PERF_EVENTS=y kernel support configured?
> 
>   $ sudo perf record -e probe:sys_read:u ls
>   Error:
>   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe:sys_read).
>   /bin/dmesg may provide additional information.
>   No CONFIG_PERF_EVENTS=y kernel support configured?
> 
>   $ ./perf record -e probe_ex:main:k ./ex
>   Error:
>   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe_ex:main).
>   /bin/dmesg may provide additional information.
>   No CONFIG_PERF_EVENTS=y kernel support configured?

Not sure about this one. The previous behaviour suggests
exclude_{user,kernel} is implemented, while the new behaviour says these
flags are not implemented, which is a functional regression.

That is, if all events are from kernel space, and we exclude all kernel
events, 0 is the right answer, not an error.

Sure, with uprobes the situation is currently broken, but this isn't a
fix.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event
  2016-03-16 14:34 ` [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event Jiri Olsa
@ 2016-03-18 14:28   ` Steven Rostedt
  2016-03-23 10:41   ` Peter Zijlstra
  1 sibling, 0 replies; 28+ messages in thread
From: Steven Rostedt @ 2016-03-18 14:28 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

On Wed, 16 Mar 2016 15:34:30 +0100
Jiri Olsa <jolsa@kernel.org> wrote:

> We suppress events with attr::exclude_kernel set when
> the event is generated, so the following capture will
> give no warning but won't produce any data:
> 
>   $ sudo perf record -e sched:sched_switch:u ls
>   $ sudo ./perf script | wc -l
>   0
> 
> Check the attr::exclude_(kernel|user) flags at event
> init time and fail right away for tracepoints from
> uprobes/kprobes and native ones:
> 
>   $ sudo perf record -e sched:sched_switch:u ls
>   Error:
>   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (sched:sched_switch).
>   /bin/dmesg may provide additional information.
>   No CONFIG_PERF_EVENTS=y kernel support configured?
> 
>   $ sudo perf record -e probe:sys_read:u ls
>   Error:
>   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe:sys_read).
>   /bin/dmesg may provide additional information.
>   No CONFIG_PERF_EVENTS=y kernel support configured?
> 
>   $ ./perf record -e probe_ex:main:k ./ex
>   Error:
>   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe_ex:main).
>   /bin/dmesg may provide additional information.
>   No CONFIG_PERF_EVENTS=y kernel support configured?
> 
> Acked-by: Namhyung Kim <namhyung@kernel.org>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>

Acked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event
  2016-03-16 14:34 [PATCHv2 0/5] ftrace perf: Fixes and speedup Jiri Olsa
@ 2016-03-16 14:34 ` Jiri Olsa
  2016-03-18 14:28   ` Steven Rostedt
  2016-03-23 10:41   ` Peter Zijlstra
  0 siblings, 2 replies; 28+ messages in thread
From: Jiri Olsa @ 2016-03-16 14:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

We suppress events with attr::exclude_kernel set when
the event is generated, so the following capture will
give no warning but won't produce any data:

  $ sudo perf record -e sched:sched_switch:u ls
  $ sudo ./perf script | wc -l
  0

Check the attr::exclude_(kernel|user) flags at event
init time and fail right away for tracepoints from
uprobes/kprobes and native ones:

  $ sudo perf record -e sched:sched_switch:u ls
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (sched:sched_switch).
  /bin/dmesg may provide additional information.
  No CONFIG_PERF_EVENTS=y kernel support configured?

  $ sudo perf record -e probe:sys_read:u ls
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe:sys_read).
  /bin/dmesg may provide additional information.
  No CONFIG_PERF_EVENTS=y kernel support configured?

  $ ./perf record -e probe_ex:main:k ./ex
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe_ex:main).
  /bin/dmesg may provide additional information.
  No CONFIG_PERF_EVENTS=y kernel support configured?

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/core.c            |  5 -----
 kernel/trace/trace_event_perf.c | 25 +++++++++++++++++++++++++
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index c15fd097af93..ca68fdcf47ce 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6898,11 +6898,6 @@ static int perf_tp_event_match(struct perf_event *event,
 {
 	if (event->hw.state & PERF_HES_STOPPED)
 		return 0;
-	/*
-	 * All tracepoints are from kernel-space.
-	 */
-	if (event->attr.exclude_kernel)
-		return 0;
 
 	if (!perf_tp_filter_match(event, data))
 		return 0;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index e11108f1d197..6e6d4052f398 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -185,11 +185,36 @@ static void perf_trace_event_close(struct perf_event *p_event)
 	tp_event->class->reg(tp_event, TRACE_REG_PERF_CLOSE, p_event);
 }
 
+static int perf_trace_event_attr(struct trace_event_call *tp_event,
+				 struct perf_event *event)
+{
+	/*
+	 * All tracepoints and kprobes are from kernel-space.
+	 */
+	if (((tp_event->flags & TRACE_EVENT_FL_TRACEPOINT) ||
+	     (tp_event->flags & TRACE_EVENT_FL_KPROBE)) &&
+	     event->attr.exclude_kernel)
+		return -EINVAL;
+
+	/*
+	 * All uprobes are from user-space.
+	 */
+	if ((tp_event->flags & TRACE_EVENT_FL_UPROBE) &&
+	    event->attr.exclude_user)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int perf_trace_event_init(struct trace_event_call *tp_event,
 				 struct perf_event *p_event)
 {
 	int ret;
 
+	ret = perf_trace_event_attr(tp_event, p_event);
+	if (ret)
+		return ret;
+
 	ret = perf_trace_event_perm(tp_event, p_event);
 	if (ret)
 		return ret;
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2016-03-24 13:31 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-09 20:46 [RFC 0/5] ftrace perf: Fixes and speedup Jiri Olsa
2016-03-09 20:46 ` [PATCH 1/5] ftrace perf: Check sample types only for sampling events Jiri Olsa
2016-03-10  0:36   ` Namhyung Kim
2016-03-10  7:25     ` Jiri Olsa
2016-03-11  8:36       ` Jiri Olsa
2016-03-11 13:48         ` Namhyung Kim
2016-03-11 18:14           ` Jiri Olsa
2016-03-15 20:06   ` Steven Rostedt
2016-03-15 21:51     ` Jiri Olsa
2016-03-09 20:46 ` [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event Jiri Olsa
2016-03-10  0:39   ` Namhyung Kim
2016-03-11  8:39     ` Jiri Olsa
2016-03-09 20:46 ` [PATCH 3/5] ftrace perf: Use ftrace_ops::private to store event pointer Jiri Olsa
2016-03-10  1:29   ` Namhyung Kim
2016-03-09 20:46 ` [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool Jiri Olsa
2016-03-11 14:28   ` Namhyung Kim
2016-03-11 18:15     ` Jiri Olsa
2016-03-12  8:35       ` Namhyung Kim
2016-03-15 19:43         ` Steven Rostedt
2016-03-09 20:46 ` [PATCH 5/5] ftrace: Update dynamic ftrace calls only if necessary Jiri Olsa
2016-03-16 14:34 [PATCHv2 0/5] ftrace perf: Fixes and speedup Jiri Olsa
2016-03-16 14:34 ` [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event Jiri Olsa
2016-03-18 14:28   ` Steven Rostedt
2016-03-23 10:41   ` Peter Zijlstra
2016-03-24  9:56     ` Jiri Olsa
2016-03-24 10:49       ` Peter Zijlstra
2016-03-24 12:25         ` Jiri Olsa
2016-03-24 13:00           ` Peter Zijlstra
2016-03-24 13:30             ` Jiri Olsa
