* [RFC 0/5] ftrace perf: Fixes and speedup
@ 2016-03-09 20:46 Jiri Olsa
2016-03-09 20:46 ` [PATCH 1/5] ftrace perf: Check sample types only for sampling events Jiri Olsa
` (4 more replies)
0 siblings, 5 replies; 22+ messages in thread
From: Jiri Olsa @ 2016-03-09 20:46 UTC (permalink / raw)
To: Steven Rostedt
Cc: lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
Arnaldo Carvalho de Melo
hi,
sending some small perf related fixes plus one change
to speedup ftrace:function perf registration. I'm still
testing this, but any feedback would be great.
Also available in here:
git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
ftrace/fixes
thanks,
jirka
---
Jiri Olsa (5):
ftrace perf: Check sample types only for sampling events
ftrace perf: Move exclude_kernel tracepoint check to init event
ftrace perf: Use ftrace_ops::private to store event pointer
ftrace: Make ftrace_hash_rec_enable return update bool
ftrace: Update dynamic ftrace calls only if necessary
include/linux/perf_event.h | 3 +++
kernel/events/core.c | 27 ++++++++++++++++++++++-----
kernel/trace/ftrace.c | 36 +++++++++++++++++++++---------------
kernel/trace/trace_event_perf.c | 39 ++++++++++++++++++++++++++++++---------
4 files changed, 76 insertions(+), 29 deletions(-)
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/5] ftrace perf: Check sample types only for sampling events
2016-03-09 20:46 [RFC 0/5] ftrace perf: Fixes and speedup Jiri Olsa
@ 2016-03-09 20:46 ` Jiri Olsa
2016-03-10 0:36 ` Namhyung Kim
2016-03-15 20:06 ` Steven Rostedt
2016-03-09 20:46 ` [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event Jiri Olsa
` (3 subsequent siblings)
4 siblings, 2 replies; 22+ messages in thread
From: Jiri Olsa @ 2016-03-09 20:46 UTC (permalink / raw)
To: Steven Rostedt
Cc: lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
Arnaldo Carvalho de Melo
Currently we check the sample type for the ftrace:function event
even if it's not created as a sampling event. That prevents
creating an ftrace:function event in counting mode.
Make sure we check sample types only for sampling events.
Before:
$ sudo perf stat -e ftrace:function ls
...
Performance counter stats for 'ls':
<not supported> ftrace:function
0.001983662 seconds time elapsed
After:
$ sudo perf stat -e ftrace:function ls
...
Performance counter stats for 'ls':
44,498 ftrace:function
0.037534722 seconds time elapsed
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
kernel/trace/trace_event_perf.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 00df25fd86ef..a7171ec2c1ca 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -52,14 +52,14 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
* event, due to issues with page faults while tracing page
* fault handler and its overall trickiness nature.
*/
- if (!p_event->attr.exclude_callchain_user)
+ if (is_sampling_event(p_event) && !p_event->attr.exclude_callchain_user)
return -EINVAL;
/*
* Same reason to disable user stack dump as for user space
* callchains above.
*/
- if (p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
+ if (is_sampling_event(p_event) && p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
return -EINVAL;
}
--
2.4.3
* [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event
2016-03-09 20:46 [RFC 0/5] ftrace perf: Fixes and speedup Jiri Olsa
2016-03-09 20:46 ` [PATCH 1/5] ftrace perf: Check sample types only for sampling events Jiri Olsa
@ 2016-03-09 20:46 ` Jiri Olsa
2016-03-10 0:39 ` Namhyung Kim
2016-03-09 20:46 ` [PATCH 3/5] ftrace perf: Use ftrace_ops::private to store event pointer Jiri Olsa
` (2 subsequent siblings)
4 siblings, 1 reply; 22+ messages in thread
From: Jiri Olsa @ 2016-03-09 20:46 UTC (permalink / raw)
To: Steven Rostedt
Cc: lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
Arnaldo Carvalho de Melo
We suppress events with attr::exclude_kernel set when
the event is generated, so the following capture gives
no warning but won't produce any data:
$ sudo perf record -e sched:sched_switch:u ls
$ sudo /perf script | wc -l
0
Check attr::exclude_(kernel|user) at event init
time instead and fail right away, for tracepoints from
uprobes/kprobes as well as native ones:
$ sudo perf record -e sched:sched_switch:u ls
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (sched:sched_switch).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?
$ sudo perf record -e probe:sys_read:u ls
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe:sys_read).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?
$ ./perf record -e probe_ex:main:k ./ex
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe_ex:main).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
kernel/events/core.c | 5 -----
kernel/trace/trace_event_perf.c | 25 +++++++++++++++++++++++++
2 files changed, 25 insertions(+), 5 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c15fd097af93..ca68fdcf47ce 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6898,11 +6898,6 @@ static int perf_tp_event_match(struct perf_event *event,
{
if (event->hw.state & PERF_HES_STOPPED)
return 0;
- /*
- * All tracepoints are from kernel-space.
- */
- if (event->attr.exclude_kernel)
- return 0;
if (!perf_tp_filter_match(event, data))
return 0;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index a7171ec2c1ca..0a3779bd18a1 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -182,11 +182,36 @@ static void perf_trace_event_close(struct perf_event *p_event)
tp_event->class->reg(tp_event, TRACE_REG_PERF_CLOSE, p_event);
}
+static int perf_trace_event_attr(struct trace_event_call *tp_event,
+ struct perf_event *event)
+{
+ /*
+ * All tracepoints and kprobes are from kernel-space.
+ */
+ if (((tp_event->flags & TRACE_EVENT_FL_TRACEPOINT) ||
+ (tp_event->flags & TRACE_EVENT_FL_KPROBE)) &&
+ event->attr.exclude_kernel)
+ return -EINVAL;
+
+ /*
+ * All uprobes are from user-space.
+ */
+ if ((tp_event->flags & TRACE_EVENT_FL_UPROBE) &&
+ event->attr.exclude_user)
+ return -EINVAL;
+
+ return 0;
+}
+
static int perf_trace_event_init(struct trace_event_call *tp_event,
struct perf_event *p_event)
{
int ret;
+ ret = perf_trace_event_attr(tp_event, p_event);
+ if (ret)
+ return ret;
+
ret = perf_trace_event_perm(tp_event, p_event);
if (ret)
return ret;
--
2.4.3
* [PATCH 3/5] ftrace perf: Use ftrace_ops::private to store event pointer
2016-03-09 20:46 [RFC 0/5] ftrace perf: Fixes and speedup Jiri Olsa
2016-03-09 20:46 ` [PATCH 1/5] ftrace perf: Check sample types only for sampling events Jiri Olsa
2016-03-09 20:46 ` [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event Jiri Olsa
@ 2016-03-09 20:46 ` Jiri Olsa
2016-03-10 1:29 ` Namhyung Kim
2016-03-09 20:46 ` [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool Jiri Olsa
2016-03-09 20:46 ` [PATCH 5/5] ftrace: Update dynamic ftrace calls only if necessary Jiri Olsa
4 siblings, 1 reply; 22+ messages in thread
From: Jiri Olsa @ 2016-03-09 20:46 UTC (permalink / raw)
To: Steven Rostedt
Cc: lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
Arnaldo Carvalho de Melo
Having the following commands running concurrently:
# perf record -e ftrace:function -a -o krava.data sleep 10
# perf record -e ftrace:function --filter 'ip == SyS_read' ls
will make the latter one fail to apply its filter rules
and store all functions (in perf.data), as instructed by the
first perf record, instead of just the SyS_read records.
The reason is that tracepoint code by default
triggers all events registered for the tracepoint,
while ftrace:function is special: the ftrace_ops
itself carries a filter and only the event that owns
the ftrace_ops is eligible to be triggered.
Fix this by using the ftrace_ops::private value to keep
the perf_event pointer. This way we don't need to search
for the triggered event (as the tracepoint handler does) and
can store the sample directly.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
include/linux/perf_event.h | 3 +++
kernel/events/core.c | 22 ++++++++++++++++++++++
kernel/trace/trace_event_perf.c | 10 +++-------
3 files changed, 28 insertions(+), 7 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index a9d8cab18b00..a330dc06d90d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1008,6 +1008,9 @@ extern void perf_tp_event(u64 addr, u64 count, void *record,
int entry_size, struct pt_regs *regs,
struct hlist_head *head, int rctx,
struct task_struct *task);
+void perf_function_event(struct perf_event *event,
+ void *record, int entry_size,
+ struct pt_regs *regs);
extern void perf_bp_event(struct perf_event *event, void *data);
#ifndef perf_misc_flags
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ca68fdcf47ce..18da90859c17 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7063,6 +7063,28 @@ static void perf_event_free_bpf_prog(struct perf_event *event)
}
}
+#ifdef CONFIG_FUNCTION_TRACER
+void perf_function_event(struct perf_event *event,
+ void *record, int entry_size,
+ struct pt_regs *regs)
+
+{
+ struct perf_sample_data data;
+ struct perf_raw_record raw = {
+ .size = entry_size,
+ .data = record,
+ };
+
+ if (event->hw.state & PERF_HES_STOPPED)
+ return;
+
+ perf_sample_data_init(&data, 0, 0);
+ data.raw = &raw;
+
+ perf_swevent_event(event, 1, &data, regs);
+}
+#endif /* CONFIG_FUNCTION_TRACER */
+
#else
static inline void perf_tp_register(void)
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 0a3779bd18a1..087c811db04c 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -328,14 +328,9 @@ perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *ops, struct pt_regs *pt_regs)
{
struct ftrace_entry *entry;
- struct hlist_head *head;
struct pt_regs regs;
int rctx;
- head = this_cpu_ptr(event_function.perf_events);
- if (hlist_empty(head))
- return;
-
#define ENTRY_SIZE (ALIGN(sizeof(struct ftrace_entry) + sizeof(u32), \
sizeof(u64)) - sizeof(u32))
@@ -349,8 +344,8 @@ perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip,
entry->ip = ip;
entry->parent_ip = parent_ip;
- perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, 0,
1, &regs, head, NULL);
+ perf_function_event(ops->private, entry, ENTRY_SIZE, &regs);
+ perf_swevent_put_recursion_context(rctx);
#undef ENTRY_SIZE
}
@@ -359,6 +354,7 @@ static int perf_ftrace_function_register(struct perf_event *event)
{
struct ftrace_ops *ops = &event->ftrace_ops;
+ ops->private = event;
ops->flags |= FTRACE_OPS_FL_PER_CPU | FTRACE_OPS_FL_RCU;
ops->func = perf_ftrace_function_call;
return register_ftrace_function(ops);
--
2.4.3
* [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool
2016-03-09 20:46 [RFC 0/5] ftrace perf: Fixes and speedup Jiri Olsa
` (2 preceding siblings ...)
2016-03-09 20:46 ` [PATCH 3/5] ftrace perf: Use ftrace_ops::private to store event pointer Jiri Olsa
@ 2016-03-09 20:46 ` Jiri Olsa
2016-03-11 14:28 ` Namhyung Kim
2016-03-09 20:46 ` [PATCH 5/5] ftrace: Update dynamic ftrace calls only if necessary Jiri Olsa
4 siblings, 1 reply; 22+ messages in thread
From: Jiri Olsa @ 2016-03-09 20:46 UTC (permalink / raw)
To: Steven Rostedt
Cc: lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
Arnaldo Carvalho de Melo
Change __ftrace_hash_rec_update to return true in case
we need to update the dynamic ftrace call records. It returns
false in case no update is needed.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
kernel/trace/ftrace.c | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index eca592f977b2..123dddc660e9 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1610,7 +1610,7 @@ static bool test_rec_ops_needs_regs(struct dyn_ftrace *rec)
return keep_regs;
}
-static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
+static bool __ftrace_hash_rec_update(struct ftrace_ops *ops,
int filter_hash,
bool inc)
{
@@ -1618,12 +1618,13 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
struct ftrace_hash *other_hash;
struct ftrace_page *pg;
struct dyn_ftrace *rec;
+ bool update = false;
int count = 0;
int all = 0;
/* Only update if the ops has been registered */
if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
- return;
+ return false;
/*
* In the filter_hash case:
@@ -1650,7 +1651,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
* then there's nothing to do.
*/
if (ftrace_hash_empty(hash))
- return;
+ return false;
}
do_for_each_ftrace_rec(pg, rec) {
@@ -1694,7 +1695,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
if (inc) {
rec->flags++;
if (FTRACE_WARN_ON(ftrace_rec_count(rec) == FTRACE_REF_MAX))
- return;
+ return false;
/*
* If there's only a single callback registered to a
@@ -1720,7 +1721,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
rec->flags |= FTRACE_FL_REGS;
} else {
if (FTRACE_WARN_ON(ftrace_rec_count(rec) == 0))
- return;
+ return false;
rec->flags--;
/*
@@ -1753,22 +1754,27 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
*/
}
count++;
+
+ update |= ftrace_test_record(rec, 1) != FTRACE_UPDATE_IGNORE;
+
/* Shortcut, if we handled all records, we are done. */
if (!all && count == hash->count)
- return;
+ return update;
} while_for_each_ftrace_rec();
+
+ return update;
}
-static void ftrace_hash_rec_disable(struct ftrace_ops *ops,
+static bool ftrace_hash_rec_disable(struct ftrace_ops *ops,
int filter_hash)
{
- __ftrace_hash_rec_update(ops, filter_hash, 0);
+ return __ftrace_hash_rec_update(ops, filter_hash, 0);
}
-static void ftrace_hash_rec_enable(struct ftrace_ops *ops,
+static bool ftrace_hash_rec_enable(struct ftrace_ops *ops,
int filter_hash)
{
- __ftrace_hash_rec_update(ops, filter_hash, 1);
+ return __ftrace_hash_rec_update(ops, filter_hash, 1);
}
static void ftrace_hash_rec_update_modify(struct ftrace_ops *ops,
--
2.4.3
* [PATCH 5/5] ftrace: Update dynamic ftrace calls only if necessary
2016-03-09 20:46 [RFC 0/5] ftrace perf: Fixes and speedup Jiri Olsa
` (3 preceding siblings ...)
2016-03-09 20:46 ` [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool Jiri Olsa
@ 2016-03-09 20:46 ` Jiri Olsa
4 siblings, 0 replies; 22+ messages in thread
From: Jiri Olsa @ 2016-03-09 20:46 UTC (permalink / raw)
To: Steven Rostedt
Cc: lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
Arnaldo Carvalho de Melo
Currently the dynamic ftrace calls are updated any time
an ftrace_ops is un/registered. If we do this update
only when it's needed, we save a lot of time for perf
system-wide ftrace:function sampling/counting.
The reason is that for system-wide sampling/counting,
perf creates an event for each cpu in the system.
Each event then registers a separate copy of ftrace_ops,
which results in FTRACE_UPDATE_CALLS updates. On servers
with many cpus that means a serious stall (240-cpu server):
Counting:
# time ./perf stat -e ftrace:function -a sleep 1
Performance counter stats for 'system wide':
370,663 ftrace:function
1.401427505 seconds time elapsed
real 3m51.743s
user 0m0.023s
sys 3m48.569s
Sampling:
# time ./perf record -e ftrace:function -a sleep 1
[ perf record: Woken up 0 times to write data ]
Warning:
Processed 141200 events and lost 5 chunks!
[ perf record: Captured and wrote 10.703 MB perf.data (135950 samples) ]
real 2m31.429s
user 0m0.213s
sys 2m29.494s
There's no reason to do the FTRACE_UPDATE_CALLS update
for each event in the perf case, because all the ftrace_ops
always share the same filter, so the updated calls are
always the same.
Only the first ftrace_ops registration needs to do
the FTRACE_UPDATE_CALLS update (also sometimes the
second, if the first one used the trampoline); the
rest can just be cheaply linked into the ftrace_ops
list.
Counting:
# time ./perf stat -e ftrace:function -a sleep 1
Performance counter stats for 'system wide':
398,571 ftrace:function
1.377503733 seconds time elapsed
real 0m2.787s
user 0m0.005s
sys 0m1.883s
Sampling:
# time ./perf record -e ftrace:function -a sleep 1
[ perf record: Woken up 0 times to write data ]
Warning:
Processed 261730 events and lost 9 chunks!
[ perf record: Captured and wrote 19.907 MB perf.data (256293 samples) ]
real 1m31.948s
user 0m0.309s
sys 1m32.051s
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
kernel/trace/ftrace.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 123dddc660e9..48b491463549 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2650,7 +2650,6 @@ static int ftrace_startup(struct ftrace_ops *ops, int command)
return ret;
ftrace_start_up++;
- command |= FTRACE_UPDATE_CALLS;
/*
* Note that ftrace probes uses this to start up
@@ -2671,7 +2670,8 @@ static int ftrace_startup(struct ftrace_ops *ops, int command)
return ret;
}
- ftrace_hash_rec_enable(ops, 1);
+ if (ftrace_hash_rec_enable(ops, 1))
+ command |= FTRACE_UPDATE_CALLS;
ftrace_startup_enable(command);
@@ -2701,11 +2701,11 @@ static int ftrace_shutdown(struct ftrace_ops *ops, int command)
/* Disabling ipmodify never fails */
ftrace_hash_ipmodify_disable(ops);
- ftrace_hash_rec_disable(ops, 1);
- ops->flags &= ~FTRACE_OPS_FL_ENABLED;
+ if (ftrace_hash_rec_disable(ops, 1))
+ command |= FTRACE_UPDATE_CALLS;
- command |= FTRACE_UPDATE_CALLS;
+ ops->flags &= ~FTRACE_OPS_FL_ENABLED;
if (saved_ftrace_func != ftrace_trace_function) {
saved_ftrace_func = ftrace_trace_function;
--
2.4.3
* Re: [PATCH 1/5] ftrace perf: Check sample types only for sampling events
2016-03-09 20:46 ` [PATCH 1/5] ftrace perf: Check sample types only for sampling events Jiri Olsa
@ 2016-03-10 0:36 ` Namhyung Kim
2016-03-10 7:25 ` Jiri Olsa
2016-03-15 20:06 ` Steven Rostedt
1 sibling, 1 reply; 22+ messages in thread
From: Namhyung Kim @ 2016-03-10 0:36 UTC (permalink / raw)
To: Jiri Olsa
Cc: Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo
Hi Jiri,
On Wed, Mar 09, 2016 at 09:46:41PM +0100, Jiri Olsa wrote:
> Currently we check sample type for ftrace:function event
> even if it's not created as sampling event. That prevents
> creating ftrace_function event in counting mode.
>
> Making sure we check sample types only for sampling events.
>
> Before:
> $ sudo perf stat -e ftrace:function ls
> ...
>
> Performance counter stats for 'ls':
>
> <not supported> ftrace:function
>
> 0.001983662 seconds time elapsed
>
> After:
> $ sudo perf stat -e ftrace:function ls
> ...
>
> Performance counter stats for 'ls':
>
> 44,498 ftrace:function
>
> 0.037534722 seconds time elapsed
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
> kernel/trace/trace_event_perf.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> index 00df25fd86ef..a7171ec2c1ca 100644
> --- a/kernel/trace/trace_event_perf.c
> +++ b/kernel/trace/trace_event_perf.c
> @@ -52,14 +52,14 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
> * event, due to issues with page faults while tracing page
> * fault handler and its overall trickiness nature.
> */
> - if (!p_event->attr.exclude_callchain_user)
> + if (is_sampling_event(p_event) && !p_event->attr.exclude_callchain_user)
> return -EINVAL;
>
> /*
> * Same reason to disable user stack dump as for user space
> * callchains above.
> */
> - if (p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
> + if (is_sampling_event(p_event) && p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
> return -EINVAL;
> }
>
What about checking is_sampling_event() first and goto the last
paranoid_tracepoint_raw check instead? This way we can remove the
same check in the function trace case.
Thanks,
Namhyung
* Re: [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event
2016-03-09 20:46 ` [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event Jiri Olsa
@ 2016-03-10 0:39 ` Namhyung Kim
2016-03-11 8:39 ` Jiri Olsa
0 siblings, 1 reply; 22+ messages in thread
From: Namhyung Kim @ 2016-03-10 0:39 UTC (permalink / raw)
To: Jiri Olsa
Cc: Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo
On Wed, Mar 09, 2016 at 09:46:42PM +0100, Jiri Olsa wrote:
> We suppress events with attr::exclude_kernel set when
> the event is generated, so following capture will
> give no warning but won't produce any data:
>
> $ sudo perf record -e sched:sched_switch:u ls
> $ sudo /perf script | wc -l
> 0
>
> Checking the attr::exclude_(kernel|user) at the event
> init time and failing right away for tracepoints from
> uprobes/kprobes and native ones:
>
> $ sudo perf record -e sched:sched_switch:u ls
> Error:
> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (sched:sched_switch).
> /bin/dmesg may provide additional information.
> No CONFIG_PERF_EVENTS=y kernel support configured?
>
> $ sudo perf record -e probe:sys_read:u ls
> Error:
> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe:sys_read).
> /bin/dmesg may provide additional information.
> No CONFIG_PERF_EVENTS=y kernel support configured?
>
> $ ./perf record -e probe_ex:main:k ./ex
> Error:
> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe_ex:main).
> /bin/dmesg may provide additional information.
> No CONFIG_PERF_EVENTS=y kernel support configured?
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Maybe we need to improve the error message later.
Thanks,
Namhyung
> ---
> kernel/events/core.c | 5 -----
> kernel/trace/trace_event_perf.c | 25 +++++++++++++++++++++++++
> 2 files changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index c15fd097af93..ca68fdcf47ce 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6898,11 +6898,6 @@ static int perf_tp_event_match(struct perf_event *event,
> {
> if (event->hw.state & PERF_HES_STOPPED)
> return 0;
> - /*
> - * All tracepoints are from kernel-space.
> - */
> - if (event->attr.exclude_kernel)
> - return 0;
>
> if (!perf_tp_filter_match(event, data))
> return 0;
> diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> index a7171ec2c1ca..0a3779bd18a1 100644
> --- a/kernel/trace/trace_event_perf.c
> +++ b/kernel/trace/trace_event_perf.c
> @@ -182,11 +182,36 @@ static void perf_trace_event_close(struct perf_event *p_event)
> tp_event->class->reg(tp_event, TRACE_REG_PERF_CLOSE, p_event);
> }
>
> +static int perf_trace_event_attr(struct trace_event_call *tp_event,
> + struct perf_event *event)
> +{
> + /*
> + * All tracepoints and kprobes are from kernel-space.
> + */
> + if (((tp_event->flags & TRACE_EVENT_FL_TRACEPOINT) ||
> + (tp_event->flags & TRACE_EVENT_FL_KPROBE)) &&
> + event->attr.exclude_kernel)
> + return -EINVAL;
> +
> + /*
> + * All uprobes are from user-space.
> + */
> + if ((tp_event->flags & TRACE_EVENT_FL_UPROBE) &&
> + event->attr.exclude_user)
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> static int perf_trace_event_init(struct trace_event_call *tp_event,
> struct perf_event *p_event)
> {
> int ret;
>
> + ret = perf_trace_event_attr(tp_event, p_event);
> + if (ret)
> + return ret;
> +
> ret = perf_trace_event_perm(tp_event, p_event);
> if (ret)
> return ret;
> --
> 2.4.3
>
* Re: [PATCH 3/5] ftrace perf: Use ftrace_ops::private to store event pointer
2016-03-09 20:46 ` [PATCH 3/5] ftrace perf: Use ftrace_ops::private to store event pointer Jiri Olsa
@ 2016-03-10 1:29 ` Namhyung Kim
0 siblings, 0 replies; 22+ messages in thread
From: Namhyung Kim @ 2016-03-10 1:29 UTC (permalink / raw)
To: Jiri Olsa
Cc: Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo
On Wed, Mar 09, 2016 at 09:46:43PM +0100, Jiri Olsa wrote:
> Having following commands running concurrently:
>
> # perf record -e ftrace:function -a -o krava.data sleep 10
> # perf record -e ftrace:function --filter 'ip == SyS_read' ls
>
> will end up in the latter one to fail on the filter rules
> and store all functions (in perf.data) as instructed by the
> first perf record instead of just SyS_read records.
>
> The reason for this is, that tracepoint code by default
> triggers all events that registered for the tracepoint.
>
> While ftrace:function is special because ftrace_ops
> itself carries a filter and only the event that owns
> ftrace_ops is eligible to be triggered.
>
> Fixing this by using ftrace_ops::private value to keep
> the perf_event pointer. This way we don't need to search
> for triggered event (as tracepoint handler does) and
> directly store sample.
>
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Thanks,
Namhyung
> ---
> include/linux/perf_event.h | 3 +++
> kernel/events/core.c | 22 ++++++++++++++++++++++
> kernel/trace/trace_event_perf.c | 10 +++-------
> 3 files changed, 28 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index a9d8cab18b00..a330dc06d90d 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1008,6 +1008,9 @@ extern void perf_tp_event(u64 addr, u64 count, void *record,
> int entry_size, struct pt_regs *regs,
> struct hlist_head *head, int rctx,
> struct task_struct *task);
> +void perf_function_event(struct perf_event *event,
> + void *record, int entry_size,
> + struct pt_regs *regs);
> extern void perf_bp_event(struct perf_event *event, void *data);
>
> #ifndef perf_misc_flags
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index ca68fdcf47ce..18da90859c17 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7063,6 +7063,28 @@ static void perf_event_free_bpf_prog(struct perf_event *event)
> }
> }
>
> +#ifdef CONFIG_FUNCTION_TRACER
> +void perf_function_event(struct perf_event *event,
> + void *record, int entry_size,
> + struct pt_regs *regs)
> +
> +{
> + struct perf_sample_data data;
> + struct perf_raw_record raw = {
> + .size = entry_size,
> + .data = record,
> + };
> +
> + if (event->hw.state & PERF_HES_STOPPED)
> + return;
> +
> + perf_sample_data_init(&data, 0, 0);
> + data.raw = &raw;
> +
> + perf_swevent_event(event, 1, &data, regs);
> +}
> +#endif /* CONFIG_FUNCTION_TRACER */
> +
> #else
>
> static inline void perf_tp_register(void)
> diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> index 0a3779bd18a1..087c811db04c 100644
> --- a/kernel/trace/trace_event_perf.c
> +++ b/kernel/trace/trace_event_perf.c
> @@ -328,14 +328,9 @@ perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip,
> struct ftrace_ops *ops, struct pt_regs *pt_regs)
> {
> struct ftrace_entry *entry;
> - struct hlist_head *head;
> struct pt_regs regs;
> int rctx;
>
> - head = this_cpu_ptr(event_function.perf_events);
> - if (hlist_empty(head))
> - return;
> -
> #define ENTRY_SIZE (ALIGN(sizeof(struct ftrace_entry) + sizeof(u32), \
> sizeof(u64)) - sizeof(u32))
>
> @@ -349,8 +344,8 @@ perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip,
>
> entry->ip = ip;
> entry->parent_ip = parent_ip;
> - perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, 0,
> 1, &regs, head, NULL);
> + perf_function_event(ops->private, entry, ENTRY_SIZE, &regs);
> + perf_swevent_put_recursion_context(rctx);
>
> #undef ENTRY_SIZE
> }
> @@ -359,6 +354,7 @@ static int perf_ftrace_function_register(struct perf_event *event)
> {
> struct ftrace_ops *ops = &event->ftrace_ops;
>
> + ops->private = event;
> ops->flags |= FTRACE_OPS_FL_PER_CPU | FTRACE_OPS_FL_RCU;
> ops->func = perf_ftrace_function_call;
> return register_ftrace_function(ops);
> --
> 2.4.3
>
* Re: [PATCH 1/5] ftrace perf: Check sample types only for sampling events
2016-03-10 0:36 ` Namhyung Kim
@ 2016-03-10 7:25 ` Jiri Olsa
2016-03-11 8:36 ` Jiri Olsa
0 siblings, 1 reply; 22+ messages in thread
From: Jiri Olsa @ 2016-03-10 7:25 UTC (permalink / raw)
To: Namhyung Kim
Cc: Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo
On Thu, Mar 10, 2016 at 09:36:37AM +0900, Namhyung Kim wrote:
> Hi Jiri,
>
> On Wed, Mar 09, 2016 at 09:46:41PM +0100, Jiri Olsa wrote:
> > Currently we check sample type for ftrace:function event
> > even if it's not created as sampling event. That prevents
> > creating ftrace_function event in counting mode.
> >
> > Making sure we check sample types only for sampling events.
> >
> > Before:
> > $ sudo perf stat -e ftrace:function ls
> > ...
> >
> > Performance counter stats for 'ls':
> >
> > <not supported> ftrace:function
> >
> > 0.001983662 seconds time elapsed
> >
> > After:
> > $ sudo perf stat -e ftrace:function ls
> > ...
> >
> > Performance counter stats for 'ls':
> >
> > 44,498 ftrace:function
> >
> > 0.037534722 seconds time elapsed
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> > kernel/trace/trace_event_perf.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> > index 00df25fd86ef..a7171ec2c1ca 100644
> > --- a/kernel/trace/trace_event_perf.c
> > +++ b/kernel/trace/trace_event_perf.c
> > @@ -52,14 +52,14 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
> > * event, due to issues with page faults while tracing page
> > * fault handler and its overall trickiness nature.
> > */
> > - if (!p_event->attr.exclude_callchain_user)
> > + if (is_sampling_event(p_event) && !p_event->attr.exclude_callchain_user)
> > return -EINVAL;
> >
> > /*
> > * Same reason to disable user stack dump as for user space
> > * callchains above.
> > */
> > - if (p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
> > + if (is_sampling_event(p_event) && p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
> > return -EINVAL;
> > }
> >
>
> What about checking is_sampling_event() first and goto the last
> paranoid_tracepoint_raw check instead? This way we can remove the
> same check in the function trace case.
right, will check
thanks,
jirka
* Re: [PATCH 1/5] ftrace perf: Check sample types only for sampling events
2016-03-10 7:25 ` Jiri Olsa
@ 2016-03-11 8:36 ` Jiri Olsa
2016-03-11 13:48 ` Namhyung Kim
0 siblings, 1 reply; 22+ messages in thread
From: Jiri Olsa @ 2016-03-11 8:36 UTC (permalink / raw)
To: Namhyung Kim
Cc: Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo
On Thu, Mar 10, 2016 at 08:25:02AM +0100, Jiri Olsa wrote:
> On Thu, Mar 10, 2016 at 09:36:37AM +0900, Namhyung Kim wrote:
> > Hi Jiri,
> >
> > On Wed, Mar 09, 2016 at 09:46:41PM +0100, Jiri Olsa wrote:
> > > Currently we check sample type for ftrace:function event
> > > even if it's not created as sampling event. That prevents
> > > creating ftrace_function event in counting mode.
> > >
> > > Making sure we check sample types only for sampling events.
> > >
> > > Before:
> > > $ sudo perf stat -e ftrace:function ls
> > > ...
> > >
> > > Performance counter stats for 'ls':
> > >
> > > <not supported> ftrace:function
> > >
> > > 0.001983662 seconds time elapsed
> > >
> > > After:
> > > $ sudo perf stat -e ftrace:function ls
> > > ...
> > >
> > > Performance counter stats for 'ls':
> > >
> > > 44,498 ftrace:function
> > >
> > > 0.037534722 seconds time elapsed
> > >
> > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > ---
> > > kernel/trace/trace_event_perf.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> > > index 00df25fd86ef..a7171ec2c1ca 100644
> > > --- a/kernel/trace/trace_event_perf.c
> > > +++ b/kernel/trace/trace_event_perf.c
> > > @@ -52,14 +52,14 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
> > > * event, due to issues with page faults while tracing page
> > > * fault handler and its overall trickiness nature.
> > > */
> > > - if (!p_event->attr.exclude_callchain_user)
> > > + if (is_sampling_event(p_event) && !p_event->attr.exclude_callchain_user)
> > > return -EINVAL;
> > >
> > > /*
> > > * Same reason to disable user stack dump as for user space
> > > * callchains above.
> > > */
> > > - if (p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
> > > + if (is_sampling_event(p_event) && p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
> > > return -EINVAL;
> > > }
> > >
> >
> > What about checking is_sampling_event() first and goto the last
> > paranoid_tracepoint_raw check instead? This way we can remove the
> > same check in the function trace case.
>
> right, will check
hum, did you mean something like this?
I'd rather keep it the original way.. seems more straightforward
jirka
---
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 00df25fd86ef..7c1edb57c823 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -44,23 +44,22 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
/* The ftrace function trace is allowed only for root. */
if (ftrace_event_is_function(tp_event)) {
- if (perf_paranoid_tracepoint_raw() && !capable(CAP_SYS_ADMIN))
- return -EPERM;
-
/*
* We don't allow user space callchains for function trace
* event, due to issues with page faults while tracing page
* fault handler and its overall trickiness nature.
*/
- if (!p_event->attr.exclude_callchain_user)
+ if (is_sampling_event(p_event) && !p_event->attr.exclude_callchain_user)
return -EINVAL;
/*
* Same reason to disable user stack dump as for user space
* callchains above.
*/
- if (p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
+ if (is_sampling_event(p_event) && p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
return -EINVAL;
+
+ goto root_check;
}
/* No tracing, just counting, so no obvious leak */
@@ -73,6 +72,7 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
return 0;
}
+root_check:
/*
* ...otherwise raw tracepoint data can be a severe data leak,
* only allow root to have these.
* Re: [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event
2016-03-10 0:39 ` Namhyung Kim
@ 2016-03-11 8:39 ` Jiri Olsa
0 siblings, 0 replies; 22+ messages in thread
From: Jiri Olsa @ 2016-03-11 8:39 UTC (permalink / raw)
To: Namhyung Kim
Cc: Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo
On Thu, Mar 10, 2016 at 09:39:55AM +0900, Namhyung Kim wrote:
> On Wed, Mar 09, 2016 at 09:46:42PM +0100, Jiri Olsa wrote:
> > We suppress events with attr::exclude_kernel set when
> > the event is generated, so the following capture will
> > give no warning but won't produce any data:
> >
> > $ sudo perf record -e sched:sched_switch:u ls
> > $ sudo /perf script | wc -l
> > 0
> >
> > Checking the attr::exclude_(kernel|user) at the event
> > init time and failing right away for tracepoints from
> > uprobes/kprobes and native ones:
> >
> > $ sudo perf record -e sched:sched_switch:u ls
> > Error:
> > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (sched:sched_switch).
> > /bin/dmesg may provide additional information.
> > No CONFIG_PERF_EVENTS=y kernel support configured?
> >
> > $ sudo perf record -e probe:sys_read:u ls
> > Error:
> > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe:sys_read).
> > /bin/dmesg may provide additional information.
> > No CONFIG_PERF_EVENTS=y kernel support configured?
> >
> > $ ./perf record -e probe_ex:main:k ./ex
> > Error:
> > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (probe_ex:main).
> > /bin/dmesg may provide additional information.
> > No CONFIG_PERF_EVENTS=y kernel support configured?
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
>
> Acked-by: Namhyung Kim <namhyung@kernel.org>
>
> Maybe we need to improve the error message later.
yep, working on that ;-)
thanks,
jirka
* Re: [PATCH 1/5] ftrace perf: Check sample types only for sampling events
2016-03-11 8:36 ` Jiri Olsa
@ 2016-03-11 13:48 ` Namhyung Kim
2016-03-11 18:14 ` Jiri Olsa
0 siblings, 1 reply; 22+ messages in thread
From: Namhyung Kim @ 2016-03-11 13:48 UTC (permalink / raw)
To: Jiri Olsa
Cc: Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo
On Fri, Mar 11, 2016 at 09:36:24AM +0100, Jiri Olsa wrote:
> On Thu, Mar 10, 2016 at 08:25:02AM +0100, Jiri Olsa wrote:
> > On Thu, Mar 10, 2016 at 09:36:37AM +0900, Namhyung Kim wrote:
> > > Hi Jiri,
> > >
> > > On Wed, Mar 09, 2016 at 09:46:41PM +0100, Jiri Olsa wrote:
> > > > Currently we check sample type for ftrace:function event
> > > > even if it's not created as sampling event. That prevents
> > > > creating ftrace_function event in counting mode.
> > > >
> > > > Making sure we check sample types only for sampling events.
> > > >
> > > > Before:
> > > > $ sudo perf stat -e ftrace:function ls
> > > > ...
> > > >
> > > > Performance counter stats for 'ls':
> > > >
> > > > <not supported> ftrace:function
> > > >
> > > > 0.001983662 seconds time elapsed
> > > >
> > > > After:
> > > > $ sudo perf stat -e ftrace:function ls
> > > > ...
> > > >
> > > > Performance counter stats for 'ls':
> > > >
> > > > 44,498 ftrace:function
> > > >
> > > > 0.037534722 seconds time elapsed
> > > >
> > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > ---
> > > > kernel/trace/trace_event_perf.c | 4 ++--
> > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> > > > index 00df25fd86ef..a7171ec2c1ca 100644
> > > > --- a/kernel/trace/trace_event_perf.c
> > > > +++ b/kernel/trace/trace_event_perf.c
> > > > @@ -52,14 +52,14 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
> > > > * event, due to issues with page faults while tracing page
> > > > * fault handler and its overall trickiness nature.
> > > > */
> > > > - if (!p_event->attr.exclude_callchain_user)
> > > > + if (is_sampling_event(p_event) && !p_event->attr.exclude_callchain_user)
> > > > return -EINVAL;
> > > >
> > > > /*
> > > > * Same reason to disable user stack dump as for user space
> > > > * callchains above.
> > > > */
> > > > - if (p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
> > > > + if (is_sampling_event(p_event) && p_event->attr.sample_type & PERF_SAMPLE_STACK_USER)
> > > > return -EINVAL;
> > > > }
> > > >
> > >
> > > What about checking is_sampling_event() first and goto the last
> > > paranoid_tracepoint_raw check instead? This way we can remove the
> > > same check in the function trace case.
> >
> > right, will check
>
> hum, did you mean something like this?
>
> I'd rather keep it the original way.. seems more straight
Hmm.. I think I was wrong. But it seems we can simply return 0 for
the non-sampling case. How about this?
Thanks,
Namhyung
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 00df25fd86ef..e11108f1d197 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -47,6 +47,9 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
if (perf_paranoid_tracepoint_raw() && !capable(CAP_SYS_ADMIN))
return -EPERM;
+ if (!is_sampling_event(p_event))
+ return 0;
+
/*
* We don't allow user space callchains for function trace
* event, due to issues with page faults while tracing page
* Re: [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool
2016-03-09 20:46 ` [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool Jiri Olsa
@ 2016-03-11 14:28 ` Namhyung Kim
2016-03-11 18:15 ` Jiri Olsa
0 siblings, 1 reply; 22+ messages in thread
From: Namhyung Kim @ 2016-03-11 14:28 UTC (permalink / raw)
To: Jiri Olsa
Cc: Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo
On Wed, Mar 09, 2016 at 09:46:44PM +0100, Jiri Olsa wrote:
> Change __ftrace_hash_rec_update to return true in case
> we need to update dynamic ftrace call records. It returns
> false when no update is needed.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
> kernel/trace/ftrace.c | 26 ++++++++++++++++----------
> 1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index eca592f977b2..123dddc660e9 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1610,7 +1610,7 @@ static bool test_rec_ops_needs_regs(struct dyn_ftrace *rec)
> return keep_regs;
> }
>
> -static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> +static bool __ftrace_hash_rec_update(struct ftrace_ops *ops,
> int filter_hash,
> bool inc)
> {
> @@ -1618,12 +1618,13 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> struct ftrace_hash *other_hash;
> struct ftrace_page *pg;
> struct dyn_ftrace *rec;
> + bool update = false;
> int count = 0;
> int all = 0;
>
> /* Only update if the ops has been registered */
> if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
> - return;
> + return false;
>
> /*
> * In the filter_hash case:
> @@ -1650,7 +1651,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> * then there's nothing to do.
> */
> if (ftrace_hash_empty(hash))
> - return;
> + return false;
> }
>
> do_for_each_ftrace_rec(pg, rec) {
> @@ -1694,7 +1695,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> if (inc) {
> rec->flags++;
> if (FTRACE_WARN_ON(ftrace_rec_count(rec) == FTRACE_REF_MAX))
> - return;
> + return false;
>
> /*
> * If there's only a single callback registered to a
> @@ -1720,7 +1721,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> rec->flags |= FTRACE_FL_REGS;
> } else {
> if (FTRACE_WARN_ON(ftrace_rec_count(rec) == 0))
> - return;
> + return false;
> rec->flags--;
>
> /*
> @@ -1753,22 +1754,27 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> */
> }
> count++;
> +
> + update |= ftrace_test_record(rec, 1) != FTRACE_UPDATE_IGNORE;
Shouldn't it use 'inc' instead of 1 for the second argument of
the ftrace_test_record()?
Thanks,
Namhyung
> +
> /* Shortcut, if we handled all records, we are done. */
> if (!all && count == hash->count)
> - return;
> + return update;
> } while_for_each_ftrace_rec();
> +
> + return update;
> }
>
> -static void ftrace_hash_rec_disable(struct ftrace_ops *ops,
> +static bool ftrace_hash_rec_disable(struct ftrace_ops *ops,
> int filter_hash)
> {
> - __ftrace_hash_rec_update(ops, filter_hash, 0);
> + return __ftrace_hash_rec_update(ops, filter_hash, 0);
> }
>
> -static void ftrace_hash_rec_enable(struct ftrace_ops *ops,
> +static bool ftrace_hash_rec_enable(struct ftrace_ops *ops,
> int filter_hash)
> {
> - __ftrace_hash_rec_update(ops, filter_hash, 1);
> + return __ftrace_hash_rec_update(ops, filter_hash, 1);
> }
>
> static void ftrace_hash_rec_update_modify(struct ftrace_ops *ops,
> --
> 2.4.3
>
* Re: [PATCH 1/5] ftrace perf: Check sample types only for sampling events
2016-03-11 13:48 ` Namhyung Kim
@ 2016-03-11 18:14 ` Jiri Olsa
0 siblings, 0 replies; 22+ messages in thread
From: Jiri Olsa @ 2016-03-11 18:14 UTC (permalink / raw)
To: Namhyung Kim
Cc: Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo
On Fri, Mar 11, 2016 at 10:48:14PM +0900, Namhyung Kim wrote:
SNIP
> > > > What about checking is_sampling_event() first and goto the last
> > > > paranoid_tracepoint_raw check instead? This way we can remove the
> > > > same check in the function trace case.
> > >
> > > right, will check
> >
> > hum, did you mean something like this?
> >
> > I'd rather keep it the original way.. seems more straight
>
> Hmm.. I think I was wrong. But it seems we can simply return 0 for
> non sampling case. How about this?
yep, that seems better.. will post v2
thanks,
jirka
* Re: [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool
2016-03-11 14:28 ` Namhyung Kim
@ 2016-03-11 18:15 ` Jiri Olsa
2016-03-12 8:35 ` Namhyung Kim
0 siblings, 1 reply; 22+ messages in thread
From: Jiri Olsa @ 2016-03-11 18:15 UTC (permalink / raw)
To: Namhyung Kim
Cc: Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo
On Fri, Mar 11, 2016 at 11:28:00PM +0900, Namhyung Kim wrote:
SNIP
> > @@ -1694,7 +1695,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> > if (inc) {
> > rec->flags++;
> > if (FTRACE_WARN_ON(ftrace_rec_count(rec) == FTRACE_REF_MAX))
> > - return;
> > + return false;
> >
> > /*
> > * If there's only a single callback registered to a
> > @@ -1720,7 +1721,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> > rec->flags |= FTRACE_FL_REGS;
> > } else {
> > if (FTRACE_WARN_ON(ftrace_rec_count(rec) == 0))
> > - return;
> > + return false;
> > rec->flags--;
> >
> > /*
> > @@ -1753,22 +1754,27 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> > */
> > }
> > count++;
> > +
> > + update |= ftrace_test_record(rec, 1) != FTRACE_UPDATE_IGNORE;
>
> Shouldn't it use 'inc' instead of 1 for the second argument of
> the ftrace_test_record()?
I don't think so, 1 is to update calls (FTRACE_UPDATE_CALLS)
check ftrace_modify_all_code:
if (command & FTRACE_UPDATE_CALLS)
ftrace_replace_code(1);
else if (command & FTRACE_DISABLE_CALLS)
ftrace_replace_code(0);
both ftrace_startup, ftrace_shutdown use FTRACE_UPDATE_CALLS
you'd use 0 only to disable all, check ftrace_check_record comments:
/*
* If we are updating calls:
*
* If the record has a ref count, then we need to enable it
* because someone is using it.
*
* Otherwise we make sure its disabled.
*
* If we are disabling calls, then disable all records that
* are enabled.
*/
if (enable && ftrace_rec_count(rec))
flag = FTRACE_FL_ENABLED;
used by ftrace_shutdown_sysctl
thanks,
jirka
* Re: [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool
2016-03-11 18:15 ` Jiri Olsa
@ 2016-03-12 8:35 ` Namhyung Kim
2016-03-15 19:43 ` Steven Rostedt
0 siblings, 1 reply; 22+ messages in thread
From: Namhyung Kim @ 2016-03-12 8:35 UTC (permalink / raw)
To: Jiri Olsa
Cc: Jiri Olsa, Steven Rostedt, lkml, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo
Hi Jiri,
On Fri, Mar 11, 2016 at 07:15:06PM +0100, Jiri Olsa wrote:
> On Fri, Mar 11, 2016 at 11:28:00PM +0900, Namhyung Kim wrote:
>
> SNIP
>
> > > @@ -1694,7 +1695,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> > > if (inc) {
> > > rec->flags++;
> > > if (FTRACE_WARN_ON(ftrace_rec_count(rec) == FTRACE_REF_MAX))
> > > - return;
> > > + return false;
> > >
> > > /*
> > > * If there's only a single callback registered to a
> > > @@ -1720,7 +1721,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> > > rec->flags |= FTRACE_FL_REGS;
> > > } else {
> > > if (FTRACE_WARN_ON(ftrace_rec_count(rec) == 0))
> > > - return;
> > > + return false;
> > > rec->flags--;
> > >
> > > /*
> > > @@ -1753,22 +1754,27 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> > > */
> > > }
> > > count++;
> > > +
> > > + update |= ftrace_test_record(rec, 1) != FTRACE_UPDATE_IGNORE;
> >
> > Shouldn't it use 'inc' instead of 1 for the second argument of
> > the ftrace_test_record()?
>
> I dont think so, 1 is to update calls (FTRACE_UPDATE_CALLS)
> check ftrace_modify_all_code:
>
> if (command & FTRACE_UPDATE_CALLS)
> ftrace_replace_code(1);
> else if (command & FTRACE_DISABLE_CALLS)
> ftrace_replace_code(0);
>
> both ftrace_startup, ftrace_shutdown use FTRACE_UPDATE_CALLS
Ah, ok. So the second argument of the ftrace_test_record() is not
'enable' actually.. :-/
>
> you'd use 0 only to disable all, check ftrace_check_record comments:
>
> /*
> * If we are updating calls:
> *
> * If the record has a ref count, then we need to enable it
> * because someone is using it.
> *
> * Otherwise we make sure its disabled.
> *
> * If we are disabling calls, then disable all records that
> * are enabled.
> */
> if (enable && ftrace_rec_count(rec))
> flag = FTRACE_FL_ENABLED;
>
>
> used by ftrace_shutdown_sysctl
I got it. Thank you for the explanation!
Thanks,
Namhyung
* Re: [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool
2016-03-12 8:35 ` Namhyung Kim
@ 2016-03-15 19:43 ` Steven Rostedt
0 siblings, 0 replies; 22+ messages in thread
From: Steven Rostedt @ 2016-03-15 19:43 UTC (permalink / raw)
To: Namhyung Kim
Cc: Jiri Olsa, Jiri Olsa, lkml, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo
On Sat, 12 Mar 2016 17:35:02 +0900
Namhyung Kim <namhyung@kernel.org> wrote:
> Hi Jiri,
>
> On Fri, Mar 11, 2016 at 07:15:06PM +0100, Jiri Olsa wrote:
> > On Fri, Mar 11, 2016 at 11:28:00PM +0900, Namhyung Kim wrote:
> >
> > SNIP
> >
> > > > @@ -1694,7 +1695,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> > > > if (inc) {
> > > > rec->flags++;
> > > > if (FTRACE_WARN_ON(ftrace_rec_count(rec) == FTRACE_REF_MAX))
> > > > - return;
> > > > + return false;
> > > >
> > > > /*
> > > > * If there's only a single callback registered to a
> > > > @@ -1720,7 +1721,7 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> > > > rec->flags |= FTRACE_FL_REGS;
> > > > } else {
> > > > if (FTRACE_WARN_ON(ftrace_rec_count(rec) == 0))
> > > > - return;
> > > > + return false;
> > > > rec->flags--;
> > > >
> > > > /*
> > > > @@ -1753,22 +1754,27 @@ static void __ftrace_hash_rec_update(struct ftrace_ops *ops,
> > > > */
> > > > }
> > > > count++;
> > > > +
> > > > + update |= ftrace_test_record(rec, 1) != FTRACE_UPDATE_IGNORE;
Yeah, this is confusing. Mind adding a comment above this:
/* Must match FTRACE_UPDATE_CALLS in ftrace_modify_all_code() */
That way others will know why this is a '1'.
-- Steve
> > >
> > > Shouldn't it use 'inc' instead of 1 for the second argument of
> > > the ftrace_test_record()?
> >
> > I dont think so, 1 is to update calls (FTRACE_UPDATE_CALLS)
> > check ftrace_modify_all_code:
> >
> > if (command & FTRACE_UPDATE_CALLS)
> > ftrace_replace_code(1);
> > else if (command & FTRACE_DISABLE_CALLS)
> > ftrace_replace_code(0);
> >
> > both ftrace_startup, ftrace_shutdown use FTRACE_UPDATE_CALLS
>
> Ah, ok. So the second argument of the ftrace_test_record() is not
> 'enable' actually.. :-/
>
> >
> > you'd use 0 only to disable all, check ftrace_check_record comments:
> >
> > /*
> > * If we are updating calls:
> > *
> > * If the record has a ref count, then we need to enable it
> > * because someone is using it.
> > *
> > * Otherwise we make sure its disabled.
> > *
> > * If we are disabling calls, then disable all records that
> > * are enabled.
> > */
> > if (enable && ftrace_rec_count(rec))
> > flag = FTRACE_FL_ENABLED;
> >
> >
> > used by ftrace_shutdown_sysctl
>
> I got it. Thank you for the explanation!
>
> Thanks,
> Namhyung
* Re: [PATCH 1/5] ftrace perf: Check sample types only for sampling events
2016-03-09 20:46 ` [PATCH 1/5] ftrace perf: Check sample types only for sampling events Jiri Olsa
2016-03-10 0:36 ` Namhyung Kim
@ 2016-03-15 20:06 ` Steven Rostedt
2016-03-15 21:51 ` Jiri Olsa
1 sibling, 1 reply; 22+ messages in thread
From: Steven Rostedt @ 2016-03-15 20:06 UTC (permalink / raw)
To: Jiri Olsa
Cc: lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
Arnaldo Carvalho de Melo
On Wed, 9 Mar 2016 21:46:41 +0100
Jiri Olsa <jolsa@kernel.org> wrote:
> Currently we check sample type for ftrace:function event
> even if it's not created as sampling event. That prevents
> creating ftrace_function event in counting mode.
>
> Making sure we check sample types only for sampling events.
>
> Before:
> $ sudo perf stat -e ftrace:function ls
> ...
>
> Performance counter stats for 'ls':
>
> <not supported> ftrace:function
>
> 0.001983662 seconds time elapsed
>
> After:
> $ sudo perf stat -e ftrace:function ls
I'm assuming you gave yourself admin capabilities, and not any normal
user may sample function tracing, right?
-- Steve
* Re: [PATCH 1/5] ftrace perf: Check sample types only for sampling events
2016-03-15 20:06 ` Steven Rostedt
@ 2016-03-15 21:51 ` Jiri Olsa
0 siblings, 0 replies; 22+ messages in thread
From: Jiri Olsa @ 2016-03-15 21:51 UTC (permalink / raw)
To: Steven Rostedt
Cc: Jiri Olsa, lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
Arnaldo Carvalho de Melo
On Tue, Mar 15, 2016 at 04:06:46PM -0400, Steven Rostedt wrote:
> On Wed, 9 Mar 2016 21:46:41 +0100
> Jiri Olsa <jolsa@kernel.org> wrote:
>
> > Currently we check sample type for ftrace:function event
> > even if it's not created as sampling event. That prevents
> > creating ftrace_function event in counting mode.
> >
> > Making sure we check sample types only for sampling events.
> >
> > Before:
> > $ sudo perf stat -e ftrace:function ls
> > ...
> >
> > Performance counter stats for 'ls':
> >
> > <not supported> ftrace:function
> >
> > 0.001983662 seconds time elapsed
> >
> > After:
> > $ sudo perf stat -e ftrace:function ls
>
> I'm assuming you gave yourself admin capabilities, and not any normal
> user may sample function tracing, right?
right ;-)
jirka
* Re: [PATCH 1/5] ftrace perf: Check sample types only for sampling events
2016-03-16 14:34 ` [PATCH 1/5] ftrace perf: Check sample types only for sampling events Jiri Olsa
@ 2016-03-18 14:27 ` Steven Rostedt
0 siblings, 0 replies; 22+ messages in thread
From: Steven Rostedt @ 2016-03-18 14:27 UTC (permalink / raw)
To: Jiri Olsa
Cc: lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
Arnaldo Carvalho de Melo
On Wed, 16 Mar 2016 15:34:29 +0100
Jiri Olsa <jolsa@kernel.org> wrote:
> Currently we check sample type for ftrace:function event
> even if it's not created as sampling event. That prevents
> creating ftrace_function event in counting mode.
>
> Making sure we check sample types only for sampling events.
>
> Before:
> $ sudo perf stat -e ftrace:function ls
> ...
>
> Performance counter stats for 'ls':
>
> <not supported> ftrace:function
>
> 0.001983662 seconds time elapsed
>
> After:
> $ sudo perf stat -e ftrace:function ls
> ...
>
> Performance counter stats for 'ls':
>
> 44,498 ftrace:function
>
> 0.037534722 seconds time elapsed
>
> Suggested-by: Namhyung Kim <namhyung@kernel.org>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
-- Steve
* [PATCH 1/5] ftrace perf: Check sample types only for sampling events
2016-03-16 14:34 [PATCHv2 0/5] ftrace perf: Fixes and speedup Jiri Olsa
@ 2016-03-16 14:34 ` Jiri Olsa
2016-03-18 14:27 ` Steven Rostedt
0 siblings, 1 reply; 22+ messages in thread
From: Jiri Olsa @ 2016-03-16 14:34 UTC (permalink / raw)
To: Steven Rostedt
Cc: lkml, Ingo Molnar, Namhyung Kim, Peter Zijlstra,
Arnaldo Carvalho de Melo
Currently we check the sample type for the ftrace:function event
even if it's not created as a sampling event. That prevents
creating an ftrace:function event in counting mode.
Making sure we check sample types only for sampling events.
Before:
$ sudo perf stat -e ftrace:function ls
...
Performance counter stats for 'ls':
<not supported> ftrace:function
0.001983662 seconds time elapsed
After:
$ sudo perf stat -e ftrace:function ls
...
Performance counter stats for 'ls':
44,498 ftrace:function
0.037534722 seconds time elapsed
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
kernel/trace/trace_event_perf.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 00df25fd86ef..e11108f1d197 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -47,6 +47,9 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
if (perf_paranoid_tracepoint_raw() && !capable(CAP_SYS_ADMIN))
return -EPERM;
+ if (!is_sampling_event(p_event))
+ return 0;
+
/*
* We don't allow user space callchains for function trace
* event, due to issues with page faults while tracing page
--
2.4.3
Thread overview: 22+ messages
2016-03-09 20:46 [RFC 0/5] ftrace perf: Fixes and speedup Jiri Olsa
2016-03-09 20:46 ` [PATCH 1/5] ftrace perf: Check sample types only for sampling events Jiri Olsa
2016-03-10 0:36 ` Namhyung Kim
2016-03-10 7:25 ` Jiri Olsa
2016-03-11 8:36 ` Jiri Olsa
2016-03-11 13:48 ` Namhyung Kim
2016-03-11 18:14 ` Jiri Olsa
2016-03-15 20:06 ` Steven Rostedt
2016-03-15 21:51 ` Jiri Olsa
2016-03-09 20:46 ` [PATCH 2/5] ftrace perf: Move exclude_kernel tracepoint check to init event Jiri Olsa
2016-03-10 0:39 ` Namhyung Kim
2016-03-11 8:39 ` Jiri Olsa
2016-03-09 20:46 ` [PATCH 3/5] ftrace perf: Use ftrace_ops::private to store event pointer Jiri Olsa
2016-03-10 1:29 ` Namhyung Kim
2016-03-09 20:46 ` [PATCH 4/5] ftrace: Make ftrace_hash_rec_enable return update bool Jiri Olsa
2016-03-11 14:28 ` Namhyung Kim
2016-03-11 18:15 ` Jiri Olsa
2016-03-12 8:35 ` Namhyung Kim
2016-03-15 19:43 ` Steven Rostedt
2016-03-09 20:46 ` [PATCH 5/5] ftrace: Update dynamic ftrace calls only if necessary Jiri Olsa
2016-03-16 14:34 [PATCHv2 0/5] ftrace perf: Fixes and speedup Jiri Olsa
2016-03-16 14:34 ` [PATCH 1/5] ftrace perf: Check sample types only for sampling events Jiri Olsa
2016-03-18 14:27 ` Steven Rostedt