* [PATCH 0/5] tracing: Hash triggers
@ 2014-03-27 4:54 Tom Zanussi
2014-03-27 4:54 ` [PATCH 1/5] tracing: Make ftrace_event_field checking functions available Tom Zanussi
` (4 more replies)
0 siblings, 5 replies; 11+ messages in thread
From: Tom Zanussi @ 2014-03-27 4:54 UTC (permalink / raw)
To: rostedt; +Cc: linux-kernel, Tom Zanussi
Hi Steve,
This is my current code for the hash triggers mentioned in the other
thread.
I've been using it for a project here, and as such it works fine for
me, but it's nowhere near a mergeable state; I'm only posting it
because I didn't realize until today that you were presenting on
triggers at Collab Summit, and if, as mentioned, you're thinking of
adding a bullet or two for it wrt future/3.16 work, it might be useful
to have the code to play around with too...
Tom
The following changes since commit f217c44ebd41ce7369d2df07622b2839479183b0:
Merge tag 'trace-fixes-v3.14-rc7-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace (2014-03-26 09:09:18 -0700)
are available in the git repository at:
git://git.yoctoproject.org/linux-yocto-contrib.git tzanussi/hashtriggers-v0
http://git.yoctoproject.org/cgit/cgit.cgi/linux-yocto-contrib/log/?h=tzanussi/hashtriggers-v0
Tom Zanussi (5):
tracing: Make ftrace_event_field checking functions available
tracing: Add event record param to trigger_ops.func()
tracing: Add get_syscall_name()
tracing: Add hash trigger to Documentation
tracing: Add 'hash' event trigger command
Documentation/trace/events.txt | 81 ++
include/linux/ftrace_event.h | 8 +-
kernel/trace/trace.h | 27 +-
kernel/trace/trace_events_filter.c | 15 +-
kernel/trace/trace_events_trigger.c | 1439 ++++++++++++++++++++++++++++++++++-
kernel/trace/trace_syscalls.c | 11 +
6 files changed, 1546 insertions(+), 35 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/5] tracing: Make ftrace_event_field checking functions available
From: Tom Zanussi @ 2014-03-27 4:54 UTC (permalink / raw)
To: rostedt; +Cc: linux-kernel, Tom Zanussi
Make is_string_field() and is_function_field() accessible outside of
trace_events_filter.c for other users of ftrace_event_fields.
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
kernel/trace/trace.h | 12 ++++++++++++
kernel/trace/trace_events_filter.c | 12 ------------
2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 02b592f..26c55ff 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1012,6 +1012,18 @@ struct filter_pred {
unsigned short right;
};
+static inline bool is_string_field(struct ftrace_event_field *field)
+{
+ return field->filter_type == FILTER_DYN_STRING ||
+ field->filter_type == FILTER_STATIC_STRING ||
+ field->filter_type == FILTER_PTR_STRING;
+}
+
+static inline bool is_function_field(struct ftrace_event_field *field)
+{
+ return field->filter_type == FILTER_TRACE_FN;
+}
+
extern enum regex_type
filter_parse_regex(char *buff, int len, char **search, int *not);
extern void print_event_filter(struct ftrace_event_file *file,
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 8a86319..60a8e3f 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -947,18 +947,6 @@ int filter_assign_type(const char *type)
return FILTER_OTHER;
}
-static bool is_function_field(struct ftrace_event_field *field)
-{
- return field->filter_type == FILTER_TRACE_FN;
-}
-
-static bool is_string_field(struct ftrace_event_field *field)
-{
- return field->filter_type == FILTER_DYN_STRING ||
- field->filter_type == FILTER_STATIC_STRING ||
- field->filter_type == FILTER_PTR_STRING;
-}
-
static int is_legal_op(struct ftrace_event_field *field, int op)
{
if (is_string_field(field) &&
--
1.8.3.1
* [PATCH 2/5] tracing: Add event record param to trigger_ops.func()
From: Tom Zanussi @ 2014-03-27 4:54 UTC (permalink / raw)
To: rostedt; +Cc: linux-kernel, Tom Zanussi
Some triggers may need access to the trace event record, so pass it
in. Also fix up the existing trigger functions and their callers.
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
include/linux/ftrace_event.h | 7 ++++---
kernel/trace/trace.h | 6 ++++--
kernel/trace/trace_events_trigger.c | 35 ++++++++++++++++++-----------------
3 files changed, 26 insertions(+), 22 deletions(-)
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 4cdb3a1..5961964 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -368,7 +368,8 @@ extern int call_filter_check_discard(struct ftrace_event_call *call, void *rec,
extern enum event_trigger_type event_triggers_call(struct ftrace_event_file *file,
void *rec);
extern void event_triggers_post_call(struct ftrace_event_file *file,
- enum event_trigger_type tt);
+ enum event_trigger_type tt,
+ void *rec);
/**
* ftrace_trigger_soft_disabled - do triggers and test if soft disabled
@@ -451,7 +452,7 @@ event_trigger_unlock_commit(struct ftrace_event_file *file,
trace_buffer_unlock_commit(buffer, event, irq_flags, pc);
if (tt)
- event_triggers_post_call(file, tt);
+ event_triggers_post_call(file, tt, entry);
}
/**
@@ -484,7 +485,7 @@ event_trigger_unlock_commit_regs(struct ftrace_event_file *file,
irq_flags, pc, regs);
if (tt)
- event_triggers_post_call(file, tt);
+ event_triggers_post_call(file, tt, entry);
}
enum {
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 26c55ff..9032cf3 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1087,7 +1087,8 @@ struct event_trigger_data {
* @func: The trigger 'probe' function called when the triggering
* event occurs. The data passed into this callback is the data
* that was supplied to the event_command @reg() function that
- * registered the trigger (see struct event_command).
+ * registered the trigger (see struct event_command) along with
+ * the trace record, rec.
*
* @init: An optional initialization function called for the trigger
* when the trigger is registered (via the event_command reg()
@@ -1112,7 +1113,8 @@ struct event_trigger_data {
* (see trace_event_triggers.c).
*/
struct event_trigger_ops {
- void (*func)(struct event_trigger_data *data);
+ void (*func)(struct event_trigger_data *data,
+ void *rec);
int (*init)(struct event_trigger_ops *ops,
struct event_trigger_data *data);
void (*free)(struct event_trigger_ops *ops,
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 8efbb69..323846e 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -74,7 +74,7 @@ event_triggers_call(struct ftrace_event_file *file, void *rec)
list_for_each_entry_rcu(data, &file->triggers, list) {
if (!rec) {
- data->ops->func(data);
+ data->ops->func(data, rec);
continue;
}
filter = rcu_dereference(data->filter);
@@ -84,7 +84,7 @@ event_triggers_call(struct ftrace_event_file *file, void *rec)
tt |= data->cmd_ops->trigger_type;
continue;
}
- data->ops->func(data);
+ data->ops->func(data, rec);
}
return tt;
}
@@ -104,13 +104,14 @@ EXPORT_SYMBOL_GPL(event_triggers_call);
*/
void
event_triggers_post_call(struct ftrace_event_file *file,
- enum event_trigger_type tt)
+ enum event_trigger_type tt,
+ void *rec)
{
struct event_trigger_data *data;
list_for_each_entry_rcu(data, &file->triggers, list) {
if (data->cmd_ops->trigger_type & tt)
- data->ops->func(data);
+ data->ops->func(data, rec);
}
}
EXPORT_SYMBOL_GPL(event_triggers_post_call);
@@ -751,7 +752,7 @@ static int set_trigger_filter(char *filter_str,
}
static void
-traceon_trigger(struct event_trigger_data *data)
+traceon_trigger(struct event_trigger_data *data, void *rec)
{
if (tracing_is_on())
return;
@@ -760,7 +761,7 @@ traceon_trigger(struct event_trigger_data *data)
}
static void
-traceon_count_trigger(struct event_trigger_data *data)
+traceon_count_trigger(struct event_trigger_data *data, void *rec)
{
if (tracing_is_on())
return;
@@ -775,7 +776,7 @@ traceon_count_trigger(struct event_trigger_data *data)
}
static void
-traceoff_trigger(struct event_trigger_data *data)
+traceoff_trigger(struct event_trigger_data *data, void *rec)
{
if (!tracing_is_on())
return;
@@ -784,7 +785,7 @@ traceoff_trigger(struct event_trigger_data *data)
}
static void
-traceoff_count_trigger(struct event_trigger_data *data)
+traceoff_count_trigger(struct event_trigger_data *data, void *rec)
{
if (!tracing_is_on())
return;
@@ -880,13 +881,13 @@ static struct event_command trigger_traceoff_cmd = {
#ifdef CONFIG_TRACER_SNAPSHOT
static void
-snapshot_trigger(struct event_trigger_data *data)
+snapshot_trigger(struct event_trigger_data *data, void *rec)
{
tracing_snapshot();
}
static void
-snapshot_count_trigger(struct event_trigger_data *data)
+snapshot_count_trigger(struct event_trigger_data *data, void *rec)
{
if (!data->count)
return;
@@ -894,7 +895,7 @@ snapshot_count_trigger(struct event_trigger_data *data)
if (data->count != -1)
(data->count)--;
- snapshot_trigger(data);
+ snapshot_trigger(data, rec);
}
static int
@@ -973,13 +974,13 @@ static __init int register_trigger_snapshot_cmd(void) { return 0; }
#define STACK_SKIP 3
static void
-stacktrace_trigger(struct event_trigger_data *data)
+stacktrace_trigger(struct event_trigger_data *data, void *rec)
{
trace_dump_stack(STACK_SKIP);
}
static void
-stacktrace_count_trigger(struct event_trigger_data *data)
+stacktrace_count_trigger(struct event_trigger_data *data, void *rec)
{
if (!data->count)
return;
@@ -987,7 +988,7 @@ stacktrace_count_trigger(struct event_trigger_data *data)
if (data->count != -1)
(data->count)--;
- stacktrace_trigger(data);
+ stacktrace_trigger(data, rec);
}
static int
@@ -1058,7 +1059,7 @@ struct enable_trigger_data {
};
static void
-event_enable_trigger(struct event_trigger_data *data)
+event_enable_trigger(struct event_trigger_data *data, void *rec)
{
struct enable_trigger_data *enable_data = data->private_data;
@@ -1069,7 +1070,7 @@ event_enable_trigger(struct event_trigger_data *data)
}
static void
-event_enable_count_trigger(struct event_trigger_data *data)
+event_enable_count_trigger(struct event_trigger_data *data, void *rec)
{
struct enable_trigger_data *enable_data = data->private_data;
@@ -1083,7 +1084,7 @@ event_enable_count_trigger(struct event_trigger_data *data)
if (data->count != -1)
(data->count)--;
- event_enable_trigger(data);
+ event_enable_trigger(data, rec);
}
static int
--
1.8.3.1
* [PATCH 3/5] tracing: Add get_syscall_name()
From: Tom Zanussi @ 2014-03-27 4:54 UTC (permalink / raw)
To: rostedt; +Cc: linux-kernel, Tom Zanussi
Add a utility function to grab the syscall name from the syscall
metadata, given a syscall id.
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
kernel/trace/trace.h | 9 +++++++++
kernel/trace/trace_syscalls.c | 11 +++++++++++
2 files changed, 20 insertions(+)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 9032cf3..457fb4f 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1277,4 +1277,13 @@ int perf_ftrace_event_register(struct ftrace_event_call *call,
#define perf_ftrace_event_register NULL
#endif
+#ifdef CONFIG_FTRACE_SYSCALLS
+const char *get_syscall_name(int syscall);
+#else
+static inline const char *get_syscall_name(int syscall)
+{
+ return NULL;
+}
+#endif /* CONFIG_FTRACE_SYSCALLS */
+
#endif /* _LINUX_KERNEL_TRACE_H */
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 759d5e0..1abb3396 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -106,6 +106,17 @@ static struct syscall_metadata *syscall_nr_to_meta(int nr)
return syscalls_metadata[nr];
}
+const char *get_syscall_name(int syscall)
+{
+ struct syscall_metadata *entry;
+
+ entry = syscall_nr_to_meta(syscall);
+ if (!entry)
+ return NULL;
+
+ return entry->name;
+}
+
static enum print_line_t
print_syscall_enter(struct trace_iterator *iter, int flags,
struct trace_event *event)
--
1.8.3.1
* [PATCH 4/5] tracing: Add hash trigger to Documentation
From: Tom Zanussi @ 2014-03-27 4:54 UTC (permalink / raw)
To: rostedt; +Cc: linux-kernel, Tom Zanussi
Add documentation and usage examples for 'hash' triggers.
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
Documentation/trace/events.txt | 81 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 81 insertions(+)
diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
index c94435d..aed77bc 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.txt
@@ -494,3 +494,84 @@ The following commands are supported:
Note that there can be only one traceon or traceoff trigger per
triggering event.
+
+- hash
+
+ This command updates a hash table with a key composed of one or more
+ trace event format fields and a set of values consisting of one or
+ more running totals of either field values or single counts.
+
+ For example, the following trigger hashes all kmalloc events using
+ 'call_site' as the hash key. For each entry, it keeps a running
+ count of event hits ('hitcount', which is optional - counts are
+ always tallied and displayed in the output), and running sums of
+ bytes_alloc, and bytes_req:
+
+ # echo 'hash:call_site:hitcount,bytes_alloc,bytes_req' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ The following uses the stacktrace at the call_site as a hash key
+ instead of just the straight call_site:
+
+ # echo 'hash:stacktrace:bytes_alloc,bytes_req' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ The following uses the combination of call_site and pid as a
+ composite hash key, effectively implementing a per-pid nested hash
+ by call_site:
+
+ # echo 'hash:call_site,common_pid:bytes_alloc,bytes_req' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ To keep a per-pid count of the number of bytes asked for in file
+ reads:
+
+ # echo 'hash:common_pid:count' > \
+ /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
+
+ To keep a per-pid, per-file count of the number of bytes asked for
+ in file reads:
+
+ # echo 'hash:common_pid,fd:count' > \
+ /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
+
+ To keep a per-pid, per-file count of the number of bytes actually
+ gotten in file reads (but only if the return value wasn't negative):
+
+ # echo 'hash:common_pid,fd:ret if ret > 0' > \
+ /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
+
+ The format is:
+
+ hash:<key>,<key>:<val>,<val>,<val>[:sort_keys][ if filter] > event/trigger
+
+ More formally,
+
+ # echo hash:key(s):value(s)[:sort_keys()][ if filter] > event/trigger
+
+ To remove the above commands:
+
+ # echo '!hash:call_site:1,bytes_alloc,bytes_req' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ Note that there can be any number of hash triggers per triggering
+ event.
+
+ A '-' operator is available for taking differences between numeric
+ fields.
+
+ Sorting:
+
+ The default sort key is 'hitcount' which is always available.
+ Appending ':sort=val1,val2' will sort the output using val1 as the
+ primary key and val2 as the secondary.
+
+ Modifiers:
+
+ Various fields can have a .<modifier> appended to them, which will
+ modify how they're displayed:
+
+ .hex - display a numeric value as hex
+ .sym - display an address as a symbol if possible
+ .syscall - map a number representing syscall id to its syscall name
+ .execname - map a number representing a pid to its process name
--
1.8.3.1
* [PATCH 5/5] tracing: Add 'hash' event trigger command
2014-03-28 16:54 ` Andi Kleen
2014-04-03 8:59 ` Masami Hiramatsu
From: Tom Zanussi @ 2014-03-27 4:54 UTC (permalink / raw)
To: rostedt; +Cc: linux-kernel, Tom Zanussi
Hash triggers allow users to continually hash events, which can then be
dumped later by simply reading the trigger file. This is done
strictly via one-liners and without any kind of programming language.
The syntax follows the existing trigger syntax:
# echo hash:key(s):value(s)[:sort_keys()][ if filter] > event/trigger
The fields used as keys and values are just the fields that define the
trace event, as listed in the event's 'format' file. For example, the
kmalloc event:
root@ie:/sys/kernel/debug/tracing/events/kmem/kmalloc# cat format
name: kmalloc
ID: 370
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:unsigned long call_site; offset:8; size:4; signed:0;
field:const void * ptr; offset:12; size:4; signed:0;
field:size_t bytes_req; offset:16; size:4; signed:0;
field:size_t bytes_alloc; offset:20; size:4; signed:0;
field:gfp_t gfp_flags; offset:24; size:4; signed:0;
The key can be made up of one or more of these fields, and any number of
values can be specified - these are automatically tallied in the hash
entry any time the event is hit. Stacktraces can also be used as keys.
For example, the following uses the stacktrace leading up to a kmalloc
as the key for hashing kmalloc events. For each hash entry a tally of
the bytes_alloc field is kept. Dumping out the trigger shows the sum
of bytes allocated for each execution path that led to a kmalloc:
# echo 'hash:stacktrace:bytes_alloc' > /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
key: stacktrace:
kmem_cache_alloc_trace+0xeb/0x140
intel_ring_begin+0xd8/0x1a0 [i915]
gen6_ring_sync+0x3c/0x140 [i915]
i915_gem_object_sync+0xd1/0x130 [i915]
i915_gem_do_execbuffer.isra.21+0x632/0x10d0 [i915]
i915_gem_execbuffer2+0xac/0x280 [i915]
drm_ioctl+0x4e9/0x610 [drm]
do_vfs_ioctl+0x83/0x510
SyS_ioctl+0x91/0xb0
system_call_fastpath+0x16/0x1b
vals: count:1595 bytes_alloc:153120
key: stacktrace:
__kmalloc+0x10b/0x180
i915_gem_do_execbuffer.isra.21+0x67a/0x10d0 [i915]
i915_gem_execbuffer2+0xac/0x280 [i915]
drm_ioctl+0x4e9/0x610 [drm]
do_vfs_ioctl+0x83/0x510
SyS_ioctl+0x91/0xb0
system_call_fastpath+0x16/0x1b
vals: count:2850 bytes_alloc:888736
key: stacktrace:
__kmalloc+0x10b/0x180
i915_gem_execbuffer2+0x60/0x280 [i915]
drm_ioctl+0x4e9/0x610 [drm]
do_vfs_ioctl+0x83/0x510
SyS_ioctl+0x91/0xb0
system_call_fastpath+0x16/0x1b
vals: count:2850 bytes_alloc:2560384
key: stacktrace:
__kmalloc+0x10b/0x180
hid_report_raw_event+0x15b/0x450 [hid]
hid_input_report+0x119/0x1a0 [hid]
hid_irq_in+0x20b/0x250 [usbhid]
__usb_hcd_giveback_urb+0x7c/0x130
usb_giveback_urb_bh+0x96/0xe0
tasklet_hi_action+0xd7/0xe0
__do_softirq+0x125/0x2e0
irq_exit+0xb5/0xc0
do_IRQ+0x67/0x110
ret_from_intr+0x0/0x13
cpuidle_idle_call+0xbb/0x1f0
arch_cpu_idle+0xe/0x30
cpu_startup_entry+0x9f/0x240
rest_init+0x77/0x80
start_kernel+0x3db/0x3e8
vals: count:5968 bytes_alloc:131296
Totals:
Hits: 22648
Entries: 119
Dropped: 0
This turns the hash trigger off:
# echo '!hash:stacktrace:bytes_alloc' > /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
Stack traces, of course, are very useful but a bit of overkill for
many uses. For instance, suppose we just want a line per caller.
Here, we keep a tally of bytes_alloc per caller. Note that you don't
need to explicitly keep a 'count' tally - counts are automatically
tallied and displayed (and are in fact the default sort key).
Also note that the raw call_site printed here isn't very useful (we'll
remedy that later).
# echo 'hash:call_site:bytes_alloc' > /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
hash:unlimited
key: call_site:18446744071579450186 vals: count:1 bytes_alloc:64
key: call_site:18446744071579439780 vals: count:1 bytes_alloc:64
key: call_site:18446744071579400894 vals: count:1 bytes_alloc:1024
key: call_site:18446744072104627352 vals: count:1 bytes_alloc:512
key: call_site:18446744071580027351 vals: count:1 bytes_alloc:512
key: call_site:18446744071580991590 vals: count:1 bytes_alloc:16
key: call_site:18446744071579463899 vals: count:1 bytes_alloc:64
key: call_site:18446744072102260685 vals: count:1 bytes_alloc:512
key: call_site:18446744071579439821 vals: count:1 bytes_alloc:64
key: call_site:18446744071579532598 vals: count:1 bytes_alloc:1024
key: call_site:18446744071584838347 vals: count:1 bytes_alloc:64
key: call_site:18446744071579450148 vals: count:1 bytes_alloc:64
key: call_site:18446744071580886173 vals: count:2 bytes_alloc:256
key: call_site:18446744071580886422 vals: count:2 bytes_alloc:1024
key: call_site:18446744071580987082 vals: count:2 bytes_alloc:8192
key: call_site:18446744071580652885 vals: count:2 bytes_alloc:128
key: call_site:18446744071580565960 vals: count:2 bytes_alloc:512
key: call_site:18446744071580680412 vals: count:2 bytes_alloc:64
key: call_site:18446744071580891052 vals: count:2 bytes_alloc:1024
key: call_site:18446744071580886777 vals: count:2 bytes_alloc:64
key: call_site:18446744071580572594 vals: count:3 bytes_alloc:3072
key: call_site:18446744071580592783 vals: count:3 bytes_alloc:48
key: call_site:18446744071580679805 vals: count:3 bytes_alloc:12288
key: call_site:18446744071582021108 vals: count:3 bytes_alloc:768
key: call_site:18446744071580572564 vals: count:3 bytes_alloc:576
key: call_site:18446744071581165381 vals: count:4 bytes_alloc:256
key: call_site:18446744071580953553 vals: count:4 bytes_alloc:256
key: call_site:18446744072102160648 vals: count:4 bytes_alloc:1024
key: call_site:18446744071580652708 vals: count:4 bytes_alloc:4224
key: call_site:18446744071580680238 vals: count:5 bytes_alloc:640
key: call_site:18446744071581375333 vals: count:6 bytes_alloc:384
key: call_site:18446744072102162313 vals: count:16 bytes_alloc:7616
key: call_site:18446744071581165832 vals: count:24 bytes_alloc:1600
key: call_site:18446744071582016247 vals: count:26 bytes_alloc:832
key: call_site:18446744071580843814 vals: count:35 bytes_alloc:2240
key: call_site:18446744071581367368 vals: count:39 bytes_alloc:3744
key: call_site:18446744072101806931 vals: count:39 bytes_alloc:1248
key: call_site:18446744072103721852 vals: count:89 bytes_alloc:8544
key: call_site:18446744072101850501 vals: count:89 bytes_alloc:8544
key: call_site:18446744072103729728 vals: count:89 bytes_alloc:17088
key: call_site:18446744071583128580 vals: count:154 bytes_alloc:157696
key: call_site:18446744072103573325 vals: count:643 bytes_alloc:10288
key: call_site:18446744071582381017 vals: count:643 bytes_alloc:159008
key: call_site:18446744072103563942 vals: count:645 bytes_alloc:123840
key: call_site:18446744071582043239 vals: count:765 bytes_alloc:6120
key: call_site:18446744072101884462 vals: count:776 bytes_alloc:49664
key: call_site:18446744072103903864 vals: count:1026 bytes_alloc:98496
key: call_site:18446744072103596026 vals: count:1026 bytes_alloc:287040
key: call_site:18446744072103599888 vals: count:1026 bytes_alloc:724736
key: call_site:18446744071580813202 vals: count:2433 bytes_alloc:155712
key: call_site:18446744072099520315 vals: count:2948 bytes_alloc:64856
Totals:
Hits: 12601
Entries: 51
Dropped: 0
A little more useful, but not much, would be to display the call_sites
as hex addresses. To do this we add a '.hex' modifier to the
call_site key:
root@trz-ThinkPad-T420:~# echo 'hash:call_site.hex:bytes_alloc' > /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
root@trz-ThinkPad-T420:~# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
hash:unlimited
key: call_site:ffffffff811e7f26 vals: count:1 bytes_alloc:64
key: call_site:ffffffff811a5bb2 vals: count:1 bytes_alloc:1024
key: call_site:ffffffff811a41c8 vals: count:1 bytes_alloc:256
key: call_site:ffffffff811c002e vals: count:1 bytes_alloc:128
key: call_site:ffffffff811209d7 vals: count:1 bytes_alloc:256
key: call_site:ffffffff811f26f9 vals: count:1 bytes_alloc:32
key: call_site:ffffffff811f2596 vals: count:1 bytes_alloc:512
key: call_site:ffffffff811f249d vals: count:1 bytes_alloc:128
key: call_site:ffffffff811f37ac vals: count:1 bytes_alloc:512
key: call_site:ffffffff811bfe7d vals: count:1 bytes_alloc:4096
key: call_site:ffffffff811a5b94 vals: count:1 bytes_alloc:192
key: call_site:ffffffff813075f4 vals: count:1 bytes_alloc:256
key: call_site:ffffffff811b9555 vals: count:1 bytes_alloc:64
key: call_site:ffffffff811b94a4 vals: count:2 bytes_alloc:2112
key: call_site:ffffffff81236745 vals: count:2 bytes_alloc:128
key: call_site:ffffffff813062f7 vals: count:5 bytes_alloc:160
key: call_site:ffffffff811e0792 vals: count:8 bytes_alloc:512
key: call_site:ffffffff81236908 vals: count:12 bytes_alloc:800
key: call_site:ffffffffa0491a40 vals: count:12 bytes_alloc:2304
key: call_site:ffffffffa02c6d85 vals: count:12 bytes_alloc:1152
key: call_site:ffffffffa048fb7c vals: count:12 bytes_alloc:1152
key: call_site:ffffffffa0470ffa vals: count:144 bytes_alloc:40192
key: call_site:ffffffffa0471f10 vals: count:144 bytes_alloc:96192
key: call_site:ffffffffa04bc278 vals: count:144 bytes_alloc:13824
key: call_site:ffffffffa04692a6 vals: count:218 bytes_alloc:41856
key: call_site:ffffffffa046b74d vals: count:218 bytes_alloc:3488
key: call_site:ffffffff8135f3d9 vals: count:218 bytes_alloc:53344
key: call_site:ffffffffa02cf22e vals: count:230 bytes_alloc:14720
key: call_site:ffffffff8130cc67 vals: count:1229 bytes_alloc:9832
Totals:
Hits: 2623
Entries: 29
Dropped: 0
Even more useful would be to display the call_sites as symbolic names.
To do that we can add a '.sym' modifier to the call_site key:
root@trz-ThinkPad-T420:~# echo 'hash:call_site.sym:bytes_alloc' > /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
root@trz-ThinkPad-T420:~# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
hash:unlimited
key: call_site:[ffffffff8120aeca] stat_open vals: count:1 bytes_alloc:4096
key: call_site:[ffffffff811a5bb2] alloc_pipe_info vals: count:1 bytes_alloc:1024
key: call_site:[ffffffff811f2596] load_elf_binary vals: count:1 bytes_alloc:512
key: call_site:[ffffffff811209d7] event_hash_trigger_print vals: count:1 bytes_alloc:256
key: call_site:[ffffffff811f26f9] load_elf_binary vals: count:1 bytes_alloc:32
key: call_site:[ffffffff811b9555] alloc_fdtable vals: count:1 bytes_alloc:64
key: call_site:[ffffffff811f37ac] load_elf_binary vals: count:1 bytes_alloc:512
key: call_site:[ffffffff811a41c8] do_execve_common.isra.28 vals: count:1 bytes_alloc:256
key: call_site:[ffffffff811c00dc] single_open vals: count:1 bytes_alloc:32
key: call_site:[ffffffff811f249d] load_elf_binary vals: count:1 bytes_alloc:128
key: call_site:[ffffffff811a5b94] alloc_pipe_info vals: count:1 bytes_alloc:192
key: call_site:[ffffffff813075f4] aa_path_name vals: count:1 bytes_alloc:256
key: call_site:[ffffffff811dd155] mounts_open_common vals: count:2 bytes_alloc:384
key: call_site:[ffffffff811b94a4] alloc_fdmem vals: count:2 bytes_alloc:2112
key: call_site:[ffffffff81202bd1] proc_reg_open vals: count:2 bytes_alloc:128
key: call_site:[ffffffff8120c066] proc_self_follow_link vals: count:2 bytes_alloc:32
key: call_site:[ffffffff811c002e] seq_open vals: count:3 bytes_alloc:384
key: call_site:[ffffffff811bfe7d] seq_read vals: count:4 bytes_alloc:16384
key: call_site:[ffffffff811e0792] inotify_handle_event vals: count:4 bytes_alloc:256
key: call_site:[ffffffff813062f7] aa_alloc_task_context vals: count:5 bytes_alloc:160
key: call_site:[ffffffffa0491a40] intel_framebuffer_create vals: count:8 bytes_alloc:1536
key: call_site:[ffffffffa02c6d85] drm_mode_page_flip_ioctl vals: count:8 bytes_alloc:768
key: call_site:[ffffffffa048fb7c] intel_crtc_page_flip vals: count:8 bytes_alloc:768
key: call_site:[ffffffffa04692a6] i915_gem_obj_lookup_or_create_vma vals: count:112 bytes_alloc:21504
key: call_site:[ffffffffa046b74d] i915_gem_object_get_pages_gtt vals: count:112 bytes_alloc:1792
key: call_site:[ffffffff8135f3d9] sg_kmalloc vals: count:112 bytes_alloc:33088
key: call_site:[ffffffffa02cf22e] drm_vma_node_allow vals: count:120 bytes_alloc:7680
key: call_site:[ffffffffa0470ffa] i915_gem_do_execbuffer.isra.21 vals: count:122 bytes_alloc:34432
key: call_site:[ffffffffa0471f10] i915_gem_execbuffer2 vals: count:122 bytes_alloc:80960
key: call_site:[ffffffffa04bc278] intel_ring_begin vals: count:122 bytes_alloc:11712
key: call_site:[ffffffff8130cc67] apparmor_file_alloc_security vals: count:126 bytes_alloc:1008
Totals:
Hits: 1008
Entries: 31
Dropped: 0
Most useful of all would be to not only display the call_sites
symbolically, but also display tallies of the total number of bytes
requested by each caller, the number allocated, and sort by the
difference between the two, which essentially gives you a listing of
the callers that waste the most bytes due to the lack of allocation
granularity.
This is a good demonstration of hashing multiple values, tallying the
difference between values (- is the only 'operator' supported), and
specifying a non-default sort order.
# echo 'hash:call_site.sym:bytes_req,bytes_alloc,bytes_alloc-bytes_req:sort=bytes_alloc-bytes_req' > /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
key: call_site:[ffffffff813062f7] aa_alloc_task_context vals: count:30 bytes_req:960, bytes_alloc:960, bytes_alloc-bytes_req:0
key: call_site:[ffffffff813075f4] aa_path_name vals: count:4 bytes_req:1024, bytes_alloc:1024, bytes_alloc-bytes_req:0
key: call_site:[ffffffff811c002e] seq_open vals: count:18 bytes_req:2304, bytes_alloc:2304, bytes_alloc-bytes_req:0
key: call_site:[ffffffff811bfd3a] seq_read vals: count:3 bytes_req:24576, bytes_alloc:24576, bytes_alloc-bytes_req:0
key: call_site:[ffffffff810912cd] alloc_fair_sched_group vals: count:1 bytes_req:64, bytes_alloc:64, bytes_alloc-bytes_req:0
key: call_site:[ffffffff810970db] sched_autogroup_create_attach vals: count:1 bytes_req:64, bytes_alloc:64, bytes_alloc-bytes_req:0
key: call_site:[ffffffff811aaa8f] vfs_rename vals: count:2 bytes_req:22, bytes_alloc:32, bytes_alloc-bytes_req:10
key: call_site:[ffffffff8120c066] proc_self_follow_link vals: count:3 bytes_req:36, bytes_alloc:48, bytes_alloc-bytes_req:12
key: call_site:[ffffffff811f26f9] load_elf_binary vals: count:4 bytes_req:112, bytes_alloc:128, bytes_alloc-bytes_req:16
key: call_site:[ffffffff811f2596] load_elf_binary vals: count:4 bytes_req:2016, bytes_alloc:2048, bytes_alloc-bytes_req:32
key: call_site:[ffffffff81269b65] ext4_ext_remove_space vals: count:3 bytes_req:144, bytes_alloc:192, bytes_alloc-bytes_req:48
key: call_site:[ffffffff811dd155] mounts_open_common vals: count:2 bytes_req:320, bytes_alloc:384, bytes_alloc-bytes_req:64
key: call_site:[ffffffff811b9555] alloc_fdtable vals: count:4 bytes_req:192, bytes_alloc:256, bytes_alloc-bytes_req:64
key: call_site:[ffffffff81236745] ext4_readdir vals: count:13 bytes_req:624, bytes_alloc:832, bytes_alloc-bytes_req:208
key: call_site:[ffffffff811a5b94] alloc_pipe_info vals: count:5 bytes_req:680, bytes_alloc:960, bytes_alloc-bytes_req:280
key: call_site:[ffffffff81202bd1] proc_reg_open vals: count:14 bytes_req:560, bytes_alloc:896, bytes_alloc-bytes_req:336
key: call_site:[ffffffff81087abe] sched_create_group vals: count:1 bytes_req:664, bytes_alloc:1024, bytes_alloc-bytes_req:360
key: call_site:[ffffffffa0312f89] cfg80211_inform_bss_width_frame vals: count:2 bytes_req:546, bytes_alloc:1024, bytes_alloc-bytes_req:478
key: call_site:[ffffffff811f37ac] load_elf_binary vals: count:4 bytes_req:1568, bytes_alloc:2048, bytes_alloc-bytes_req:480
key: call_site:[ffffffff811209d7] event_hash_trigger_print vals: count:7 bytes_req:2520, bytes_alloc:3328, bytes_alloc-bytes_req:808
key: call_site:[ffffffff811e7f26] eventfd_file_create vals: count:71 bytes_req:3408, bytes_alloc:4544, bytes_alloc-bytes_req:1136
key: call_site:[ffffffff81236908] ext4_htree_store_dirent vals: count:100 bytes_req:6246, bytes_alloc:7456, bytes_alloc-bytes_req:1210
key: call_site:[ffffffff811a5bb2] alloc_pipe_info vals: count:5 bytes_req:3200, bytes_alloc:5120, bytes_alloc-bytes_req:1920
key: call_site:[ffffffffa02c6d85] drm_mode_page_flip_ioctl vals: count:370 bytes_req:32560, bytes_alloc:35520, bytes_alloc-bytes_req:2960
key: call_site:[ffffffff8120aeca] stat_open vals: count:7 bytes_req:24752, bytes_alloc:28672, bytes_alloc-bytes_req:3920
key: call_site:[ffffffff811e0792] inotify_handle_event vals: count:644 bytes_req:37470, bytes_alloc:41792, bytes_alloc-bytes_req:4322
key: call_site:[ffffffffa048fb7c] intel_crtc_page_flip vals: count:370 bytes_req:26640, bytes_alloc:35520, bytes_alloc-bytes_req:8880
key: call_site:[ffffffffa008df3b] hid_report_raw_event vals: count:7048 bytes_req:140960, bytes_alloc:155056, bytes_alloc-bytes_req:14096
key: call_site:[ffffffffa0491a40] intel_framebuffer_create vals: count:370 bytes_req:53280, bytes_alloc:71040, bytes_alloc-bytes_req:17760
key: call_site:[ffffffff8130cc67] apparmor_file_alloc_security vals: count:3058 bytes_req:6116, bytes_alloc:24464, bytes_alloc-bytes_req:18348
key: call_site:[ffffffffa04bc278] intel_ring_begin vals: count:2754 bytes_req:242352, bytes_alloc:264384, bytes_alloc-bytes_req:22032
key: call_site:[ffffffffa04692a6] i915_gem_obj_lookup_or_create_vma vals: count:1835 bytes_req:308280, bytes_alloc:352320, bytes_alloc-bytes_req:44040
key: call_site:[ffffffffa02cf22e] drm_vma_node_allow vals: count:2291 bytes_req:91640, bytes_alloc:146624, bytes_alloc-bytes_req:54984
key: call_site:[ffffffff8135f3d9] sg_kmalloc vals: count:1827 bytes_req:432512, bytes_alloc:491808, bytes_alloc-bytes_req:59296
key: call_site:[ffffffffa0470ffa] i915_gem_do_execbuffer.isra.21 vals: count:2754 bytes_req:534960, bytes_alloc:922624, bytes_alloc-bytes_req:387664
key: call_site:[ffffffffa0471f10] i915_gem_execbuffer2 vals: count:2754 bytes_req:2030840, bytes_alloc:2729792, bytes_alloc-bytes_req:698952
Totals:
Hits: 28354
Entries: 48
Dropped: 0
Here's an example of using a compound key. The below tallies syscall
hits for every unique pid/syscall id combination ('hitcount' is a
placeholder - as mentioned before, counts are always kept; naming
'hitcount' simply references that implicit event field in the hash
trigger specification). Both the syscall id and the pid are
displayed symbolically via the .syscall and .execname modifiers.
# echo 'hash:common_pid.execname,id.syscall:hitcount:sort=common_pid,hitcount' > /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
# cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
key: common_pid:bash[3112], id:sys_write vals: count:69
key: common_pid:bash[3112], id:sys_rt_sigprocmask vals: count:218
key: common_pid:update-notifier[3164], id:sys_poll vals: count:37
key: common_pid:update-notifier[3164], id:sys_recvfrom vals: count:118
key: common_pid:deja-dup-monito[3194], id:sys_sendto vals: count:1
key: common_pid:deja-dup-monito[3194], id:sys_read vals: count:4
key: common_pid:deja-dup-monito[3194], id:sys_poll vals: count:8
key: common_pid:deja-dup-monito[3194], id:sys_recvmsg vals: count:8
key: common_pid:deja-dup-monito[3194], id:sys_geteuid vals: count:8
key: common_pid:deja-dup-monito[3194], id:sys_write vals: count:8
key: common_pid:deja-dup-monito[3194], id:sys_getegid vals: count:8
key: common_pid:emacs[3275], id:sys_fsync vals: count:1
key: common_pid:emacs[3275], id:sys_open vals: count:1
key: common_pid:emacs[3275], id:sys_unlink vals: count:1
key: common_pid:emacs[3275], id:sys_close vals: count:1
key: common_pid:emacs[3275], id:sys_symlink vals: count:2
key: common_pid:emacs[3275], id:sys_readlink vals: count:2
key: common_pid:emacs[3275], id:sys_access vals: count:2
key: common_pid:emacs[3275], id:sys_geteuid vals: count:2
key: common_pid:emacs[3275], id:sys_getgid vals: count:2
key: common_pid:emacs[3275], id:sys_getuid vals: count:2
key: common_pid:emacs[3275], id:sys_getegid vals: count:3
key: common_pid:emacs[3275], id:sys_newlstat vals: count:4
key: common_pid:emacs[3275], id:sys_setitimer vals: count:7
key: common_pid:emacs[3275], id:sys_newstat vals: count:8
key: common_pid:emacs[3275], id:sys_read vals: count:9
key: common_pid:emacs[3275], id:sys_write vals: count:14
key: common_pid:emacs[3275], id:sys_kill vals: count:14
key: common_pid:emacs[3275], id:sys_poll vals: count:23
key: common_pid:emacs[3275], id:sys_select vals: count:23
key: common_pid:emacs[3275], id:unknown_syscall vals: count:34
key: common_pid:emacs[3275], id:sys_ioctl vals: count:60
key: common_pid:emacs[3275], id:sys_rt_sigprocmask vals: count:116
key: common_pid:cat[3323], id:sys_munmap vals: count:1
key: common_pid:cat[3323], id:sys_fadvise64 vals: count:1
Finally, the below uses a string as a hash key, and simply tallies and
displays the default count ('hitcount').
# echo 'hash:child_comm:hitcount' > /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
# cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
hash:unlimited
key: child_comm:pool vals: count:1
key: child_comm:unity-panel-ser vals: count:1
key: child_comm:pool vals: count:1
key: child_comm:hud-service vals: count:1
key: child_comm:Cache I/O vals: count:1
key: child_comm:postgres vals: count:1
key: child_comm:gdbus vals: count:1
key: child_comm:bash vals: count:1
key: child_comm:ubuntu-webapps- vals: count:2
key: child_comm:dbus-daemon vals: count:2
key: child_comm:compiz vals: count:3
key: child_comm:apt-cache vals: count:3
key: child_comm:unity-webapps-s vals: count:4
key: child_comm:java vals: count:6
key: child_comm:firefox vals: count:52
Totals:
Hits: 80
Entries: 15
Dropped: 0
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
---
include/linux/ftrace_event.h | 1 +
kernel/trace/trace_events_filter.c | 3 +-
kernel/trace/trace_events_trigger.c | 1404 +++++++++++++++++++++++++++++++++++
3 files changed, 1407 insertions(+), 1 deletion(-)
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 5961964..8700630 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -353,6 +353,7 @@ enum event_trigger_type {
ETT_SNAPSHOT = (1 << 1),
ETT_STACKTRACE = (1 << 2),
ETT_EVENT_ENABLE = (1 << 3),
+ ETT_EVENT_HASH = (1 << 4),
};
extern void destroy_preds(struct ftrace_event_file *file);
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 60a8e3f..cee9b29 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -941,7 +941,8 @@ int filter_assign_type(const char *type)
if (strstr(type, "__data_loc") && strstr(type, "char"))
return FILTER_DYN_STRING;
- if (strchr(type, '[') && strstr(type, "char"))
+ if (strchr(type, '[') &&
+ (strstr(type, "char") || strstr(type, "u8")))
return FILTER_STATIC_STRING;
return FILTER_OTHER;
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 323846e..210ddd0 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -22,6 +22,9 @@
#include <linux/ctype.h>
#include <linux/mutex.h>
#include <linux/slab.h>
+#include <linux/hash.h>
+#include <linux/stacktrace.h>
+#include <linux/sort.h>
#include "trace.h"
@@ -1427,12 +1430,1413 @@ static __init int register_trigger_traceon_traceoff_cmds(void)
return ret;
}
+struct hash_field;
+
+typedef u64 (*hash_field_fn_t) (struct hash_field *field, void *event);
+
+struct hash_field {
+ struct ftrace_event_field *field;
+ struct ftrace_event_field *aux_field;
+ hash_field_fn_t fn;
+ unsigned long flags;
+};
+
+static u64 hash_field_none(struct hash_field *field, void *event)
+{
+ return 0;
+}
+
+static u64 hash_field_string(struct hash_field *hash_field, void *event)
+{
+ char *addr = (char *)(event + hash_field->field->offset);
+
+ return (u64)addr;
+}
+
+static u64 hash_field_diff(struct hash_field *hash_field, void *event)
+{
+ u64 *m, *s;
+
+ m = (u64 *)(event + hash_field->field->offset);
+ s = (u64 *)(event + hash_field->aux_field->offset);
+
+ return *m - *s;
+}
+
+#define DEFINE_HASH_FIELD_FN(type) \
+static u64 hash_field_##type(struct hash_field *hash_field, void *event)\
+{ \
+ type *addr = (type *)(event + hash_field->field->offset); \
+ \
+ return (u64)*addr; \
+}
+
+DEFINE_HASH_FIELD_FN(s64);
+DEFINE_HASH_FIELD_FN(u64);
+DEFINE_HASH_FIELD_FN(s32);
+DEFINE_HASH_FIELD_FN(u32);
+DEFINE_HASH_FIELD_FN(s16);
+DEFINE_HASH_FIELD_FN(u16);
+DEFINE_HASH_FIELD_FN(s8);
+DEFINE_HASH_FIELD_FN(u8);
+
+#define HASH_TRIGGER_BITS 11
+#define COMPOUND_KEY_MAX 8
+#define HASH_VALS_MAX 16
+#define HASH_SORT_KEYS_MAX 2
+
+/* Largest event field string currently 32 + NUL; round up to 64 */
+#define HASH_KEY_STRING_MAX 64
+
+enum hash_field_flags {
+ HASH_FIELD_SYM = 1,
+ HASH_FIELD_HEX = 2,
+ HASH_FIELD_STACKTRACE = 4,
+ HASH_FIELD_STRING = 8,
+ HASH_FIELD_EXECNAME = 16,
+ HASH_FIELD_SYSCALL = 32,
+};
+
+enum sort_key_flags {
+ SORT_KEY_COUNT = 1,
+};
+
+struct hash_trigger_sort_key {
+ bool descending;
+ bool use_hitcount;
+ bool key_part;
+ unsigned int idx;
+};
+
+struct hash_trigger_data {
+ struct hlist_head *hashtab;
+ unsigned int hashtab_bits;
+ char *keys_str;
+ char *vals_str;
+ char *sort_keys_str;
+ struct hash_field *keys[COMPOUND_KEY_MAX];
+ unsigned int n_keys;
+ struct hash_field *vals[HASH_VALS_MAX];
+ unsigned int n_vals;
+ struct ftrace_event_file *event_file;
+ unsigned long total_hits;
+ unsigned long total_entries;
+ struct hash_trigger_sort_key *sort_keys[HASH_SORT_KEYS_MAX];
+ struct hash_trigger_sort_key *sort_key_cur;
+ spinlock_t lock;
+ unsigned int max_entries;
+ struct hash_trigger_entry *entries;
+ unsigned int n_entries;
+ struct stack_trace *struct_stacktrace_entries;
+ unsigned int n_struct_stacktrace_entries;
+ unsigned long *stacktrace_entries;
+ unsigned int n_stacktrace_entries;
+ char *hash_key_string_entries;
+ unsigned int n_hash_key_string_entries;
+ unsigned long drops;
+};
+
+enum hash_key_type {
+ HASH_KEY_TYPE_U64,
+ HASH_KEY_TYPE_STACKTRACE,
+ HASH_KEY_TYPE_STRING,
+};
+
+struct hash_key_part {
+ enum hash_key_type type;
+ unsigned long flags;
+ union {
+ u64 val_u64;
+ struct stack_trace *val_stacktrace;
+ char *val_string;
+ } var;
+};
+
+struct hash_trigger_entry {
+ struct hlist_node node;
+ struct hash_key_part key_parts[COMPOUND_KEY_MAX];
+ u64 sums[HASH_VALS_MAX];
+ char comm[TASK_COMM_LEN + 1];
+ u64 count;
+ struct hash_trigger_data* hash_data;
+};
+
+#define HASH_STACKTRACE_DEPTH 16
+#define HASH_STACKTRACE_SKIP 4
+
+static hash_field_fn_t select_value_fn(int field_size, int field_is_signed)
+{
+ hash_field_fn_t fn = NULL;
+
+ switch (field_size) {
+ case 8:
+ if (field_is_signed)
+ fn = hash_field_s64;
+ else
+ fn = hash_field_u64;
+ break;
+ case 4:
+ if (field_is_signed)
+ fn = hash_field_s32;
+ else
+ fn = hash_field_u32;
+ break;
+ case 2:
+ if (field_is_signed)
+ fn = hash_field_s16;
+ else
+ fn = hash_field_u16;
+ break;
+ case 1:
+ if (field_is_signed)
+ fn = hash_field_s8;
+ else
+ fn = hash_field_u8;
+ break;
+ }
+
+ return fn;
+}
+
+#define FNV_OFFSET_BASIS (14695981039346656037ULL)
+#define FNV_PRIME (1099511628211ULL)
+
+static u64 hash_fnv_1a(char *key, unsigned int size, unsigned int bits)
+{
+ u64 hash = FNV_OFFSET_BASIS;
+ unsigned int i;
+
+ for (i = 0; i < size; i++) {
+ hash ^= key[i];
+ hash *= FNV_PRIME;
+ }
+
+ return hash >> (64 - bits);
+}
+
+static u64 hash_stacktrace(struct stack_trace *stacktrace, unsigned int bits)
+{
+ unsigned int size;
+
+ size = stacktrace->nr_entries * sizeof(*stacktrace->entries);
+
+ return hash_fnv_1a((char *)stacktrace->entries, size, bits);
+}
+
+static u64 hash_string(struct hash_field *hash_field,
+ unsigned int bits, void *rec)
+{
+ unsigned int size;
+ char *string;
+
+ size = hash_field->field->size;
+ string = (char *)hash_field->fn(hash_field, rec);
+
+ return hash_fnv_1a(string, size, bits);
+}
+
+static u64 hash_compound_key(struct hash_trigger_data *hash_data,
+ unsigned int bits, void *rec)
+{
+ struct hash_field *hash_field;
+ u64 key[COMPOUND_KEY_MAX];
+ unsigned int i;
+
+ for (i = 0; i < hash_data->n_keys; i++) {
+ hash_field = hash_data->keys[i];
+ key[i] = hash_field->fn(hash_field, rec);
+ }
+
+ return hash_fnv_1a((char *)key, hash_data->n_keys * sizeof(key[0]), bits);
+}
+
+static u64 hash_key(struct hash_trigger_data *hash_data, void *rec,
+ struct stack_trace *stacktrace)
+{
+ /* currently can't have compound key with string or stacktrace */
+ struct hash_field *hash_field = hash_data->keys[0];
+ unsigned int bits = hash_data->hashtab_bits;
+ u64 hash_idx = 0;
+
+ if (hash_field->flags & HASH_FIELD_STACKTRACE)
+ hash_idx = hash_stacktrace(stacktrace, bits);
+ else if (hash_field->flags & HASH_FIELD_STRING)
+ hash_idx = hash_string(hash_field, bits, rec);
+ else if (hash_data->n_keys > 1)
+ hash_idx = hash_compound_key(hash_data, bits, rec);
+ else {
+ u64 hash_val = hash_field->fn(hash_field, rec);
+
+ switch (hash_field->field->size) {
+ case 8:
+ hash_idx = hash_64(hash_val, bits);
+ break;
+ case 4:
+ hash_idx = hash_32(hash_val, bits);
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ break;
+ }
+ }
+
+ return hash_idx;
+}
+
+static inline void save_comm(char *comm, struct task_struct *task)
+{
+ if (!task->pid) {
+ strcpy(comm, "<idle>");
+ return;
+ }
+
+ if (WARN_ON_ONCE(task->pid < 0)) {
+ strcpy(comm, "<XXX>");
+ return;
+ }
+
+ if (task->pid > PID_MAX_DEFAULT) {
+ strcpy(comm, "<...>");
+ return;
+ }
+
+ memcpy(comm, task->comm, TASK_COMM_LEN);
+}
+
+static void stacktrace_entry_fill(struct hash_trigger_entry *entry,
+ unsigned int key,
+ struct hash_field *hash_field,
+ struct stack_trace *stacktrace)
+{
+ struct hash_trigger_data *hash_data = entry->hash_data;
+ struct stack_trace *stacktrace_copy;
+ unsigned int size, offset, idx;
+
+ idx = hash_data->n_struct_stacktrace_entries++;
+ stacktrace_copy = &hash_data->struct_stacktrace_entries[idx];
+ *stacktrace_copy = *stacktrace;
+
+ idx = hash_data->n_stacktrace_entries++;
+ size = sizeof(unsigned long) * HASH_STACKTRACE_DEPTH;
+ offset = HASH_STACKTRACE_DEPTH * idx;
+ stacktrace_copy->entries = &hash_data->stacktrace_entries[offset];
+ memcpy(stacktrace_copy->entries, stacktrace->entries, size);
+
+ entry->key_parts[key].type = HASH_KEY_TYPE_STACKTRACE;
+ entry->key_parts[key].flags = hash_field->flags;
+ entry->key_parts[key].var.val_stacktrace = stacktrace_copy;
+}
+
+static void string_entry_fill(struct hash_trigger_entry *entry,
+ unsigned int key,
+ struct hash_field *hash_field,
+ void *rec)
+{
+ struct hash_trigger_data *hash_data = entry->hash_data;
+ unsigned int size = hash_field->field->size + 1;
+ unsigned int offset;
+ char *string_copy;
+
+ offset = HASH_KEY_STRING_MAX * hash_data->n_hash_key_string_entries++;
+ string_copy = &hash_data->hash_key_string_entries[offset];
+
+ memcpy(string_copy, (char *)hash_field->fn(hash_field, rec), size);
+
+ entry->key_parts[key].type = HASH_KEY_TYPE_STRING;
+ entry->key_parts[key].flags = hash_field->flags;
+ entry->key_parts[key].var.val_string = string_copy;
+}
+
+static struct hash_trigger_entry *
+hash_trigger_entry_create(struct hash_trigger_data *hash_data, void *rec,
+ struct stack_trace *stacktrace)
+{
+ struct hash_trigger_entry *entry = NULL;
+ struct hash_field *hash_field;
+ bool save_execname = false;
+ unsigned int i;
+
+	if (hash_data->n_entries >= hash_data->max_entries)
+		return NULL;
+
+	entry = &hash_data->entries[hash_data->n_entries++];
+
+ entry->hash_data = hash_data;
+
+ for (i = 0; i < hash_data->n_keys; i++) {
+ hash_field = hash_data->keys[i];
+
+ if (hash_field->flags & HASH_FIELD_STACKTRACE)
+ stacktrace_entry_fill(entry, i, hash_field, stacktrace);
+ else if (hash_field->flags & HASH_FIELD_STRING)
+ string_entry_fill(entry, i, hash_field, rec);
+ else {
+ u64 hash_val = hash_field->fn(hash_field, rec);
+
+ entry->key_parts[i].type = HASH_KEY_TYPE_U64;
+ entry->key_parts[i].flags = hash_field->flags;
+ entry->key_parts[i].var.val_u64 = hash_val;
+			/*
+			 * EXECNAME only applies to common_pid as a
+			 * key, with the assumption that the saved
+			 * comm is the one for common_pid, i.e. the
+			 * current pid when the event was logged.
+			 * comm is saved only when the hash entry is
+			 * created; subsequent hits for that entry
+			 * map to the same pid and comm.
+			 */
+ if (hash_field->flags & HASH_FIELD_EXECNAME)
+ save_execname = true;
+ }
+ }
+
+ if (save_execname)
+ save_comm(entry->comm, current);
+
+ return entry;
+}
+
+static void destroy_hashtab(struct hash_trigger_data *hash_data)
+{
+ struct hlist_head *hashtab = hash_data->hashtab;
+
+ if (!hashtab)
+ return;
+
+ kfree(hashtab);
+
+ hash_data->hashtab = NULL;
+}
+
+static void destroy_hash_field(struct hash_field *hash_field)
+{
+ kfree(hash_field);
+}
+
+static struct hash_field *
+create_hash_field(struct ftrace_event_field *field,
+ struct ftrace_event_field *aux_field,
+ unsigned long flags)
+{
+ hash_field_fn_t fn = hash_field_none;
+ struct hash_field *hash_field;
+
+ hash_field = kzalloc(sizeof(struct hash_field), GFP_KERNEL);
+ if (!hash_field)
+ return NULL;
+
+ if (flags & HASH_FIELD_STACKTRACE) {
+ hash_field->flags = flags;
+ goto out;
+ }
+
+ if (is_string_field(field)) {
+ flags |= HASH_FIELD_STRING;
+ fn = hash_field_string;
+ } else if (is_function_field(field))
+ goto free;
+ else {
+ if (aux_field) {
+ hash_field->aux_field = aux_field;
+ fn = hash_field_diff;
+ } else {
+ fn = select_value_fn(field->size, field->is_signed);
+ if (!fn)
+ goto free;
+ }
+ }
+
+ hash_field->field = field;
+ hash_field->fn = fn;
+ hash_field->flags = flags;
+ out:
+ return hash_field;
+ free:
+ kfree(hash_field);
+ hash_field = NULL;
+ goto out;
+}
+
+static void destroy_hash_fields(struct hash_trigger_data *hash_data)
+{
+ unsigned int i;
+
+ for (i = 0; i < hash_data->n_keys; i++) {
+ destroy_hash_field(hash_data->keys[i]);
+ hash_data->keys[i] = NULL;
+ }
+
+ for (i = 0; i < hash_data->n_vals; i++) {
+ destroy_hash_field(hash_data->vals[i]);
+ hash_data->vals[i] = NULL;
+ }
+}
+
+static inline struct hash_trigger_sort_key *create_default_sort_key(void)
+{
+ struct hash_trigger_sort_key *sort_key;
+
+ sort_key = kzalloc(sizeof(*sort_key), GFP_KERNEL);
+ if (!sort_key)
+ return NULL;
+
+ sort_key->use_hitcount = true;
+
+ return sort_key;
+}
+
+static inline struct hash_trigger_sort_key *
+create_sort_key(char *field_name, struct hash_trigger_data *hash_data)
+{
+ struct hash_trigger_sort_key *sort_key;
+ bool key_part = false;
+ unsigned int j;
+
+ if (!strcmp(field_name, "hitcount"))
+ return create_default_sort_key();
+
+ if (strchr(field_name, '-')) {
+ char *aux_field_name = field_name;
+
+ field_name = strsep(&aux_field_name, "-");
+ if (!aux_field_name)
+ return NULL;
+
+ for (j = 0; j < hash_data->n_vals; j++)
+ if (!strcmp(field_name,
+ hash_data->vals[j]->field->name) &&
+ (hash_data->vals[j]->aux_field &&
+ !strcmp(aux_field_name,
+ hash_data->vals[j]->aux_field->name)))
+ goto out;
+ }
+
+ for (j = 0; j < hash_data->n_vals; j++)
+ if (!strcmp(field_name, hash_data->vals[j]->field->name))
+ goto out;
+
+ for (j = 0; j < hash_data->n_keys; j++) {
+ if (hash_data->keys[j]->flags & HASH_FIELD_STACKTRACE)
+ continue;
+ if (hash_data->keys[j]->flags & HASH_FIELD_STRING)
+ continue;
+ if (!strcmp(field_name, hash_data->keys[j]->field->name)) {
+ key_part = true;
+ goto out;
+ }
+ }
+
+ return NULL;
+ out:
+ sort_key = kzalloc(sizeof(*sort_key), GFP_KERNEL);
+ if (!sort_key)
+ return NULL;
+
+ sort_key->idx = j;
+ sort_key->key_part = key_part;
+
+ return sort_key;
+}
+
+static int create_sort_keys(struct hash_trigger_data *hash_data)
+{
+ char *fields_str = hash_data->sort_keys_str;
+ struct hash_trigger_sort_key *sort_key;
+ char *field_str, *field_name;
+ unsigned int i;
+ int ret = 0;
+
+ if (!fields_str) {
+ sort_key = create_default_sort_key();
+ if (!sort_key) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ hash_data->sort_keys[0] = sort_key;
+ goto out;
+ }
+
+ strsep(&fields_str, "=");
+ if (!fields_str) {
+ ret = -EINVAL;
+ goto free;
+ }
+
+ for (i = 0; i < HASH_SORT_KEYS_MAX; i++) {
+ field_str = strsep(&fields_str, ",");
+ if (!field_str) {
+ if (i == 0) {
+ ret = -EINVAL;
+ goto free;
+ } else
+ break;
+ }
+
+ field_name = strsep(&field_str, ".");
+ sort_key = create_sort_key(field_name, hash_data);
+ if (!sort_key) {
+ ret = -EINVAL; /* or -ENOMEM */
+ goto free;
+ }
+ if (field_str) {
+ if (!strcmp(field_str, "descending"))
+ sort_key->descending = true;
+ else if (strcmp(field_str, "ascending")) {
+ ret = -EINVAL; /* not either, err */
+ goto free;
+ }
+ }
+ hash_data->sort_keys[i] = sort_key;
+ }
+out:
+ return ret;
+free:
+ for (i = 0; i < HASH_SORT_KEYS_MAX; i++) {
+ if (!hash_data->sort_keys[i])
+ break;
+ kfree(hash_data->sort_keys[i]);
+ hash_data->sort_keys[i] = NULL;
+ }
+ goto out;
+}
+
+static int create_key_field(struct hash_trigger_data *hash_data,
+ unsigned int key,
+ struct ftrace_event_file *file,
+ char *field_str)
+{
+ struct ftrace_event_field *field = NULL;
+ unsigned long flags = 0;
+ char *field_name;
+ int ret = 0;
+
+ if (!strcmp(field_str, "stacktrace")) {
+ flags |= HASH_FIELD_STACKTRACE;
+ } else {
+ field_name = strsep(&field_str, ".");
+ if (field_str) {
+ if (!strcmp(field_str, "sym"))
+ flags |= HASH_FIELD_SYM;
+ else if (!strcmp(field_str, "hex"))
+ flags |= HASH_FIELD_HEX;
+ else if (!strcmp(field_str, "execname") &&
+ !strcmp(field_name, "common_pid"))
+ flags |= HASH_FIELD_EXECNAME;
+ else if (!strcmp(field_str, "syscall"))
+ flags |= HASH_FIELD_SYSCALL;
+ }
+
+ field = trace_find_event_field(file->event_call, field_name);
+ if (!field) {
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+ hash_data->keys[key] = create_hash_field(field, NULL, flags);
+ if (!hash_data->keys[key]) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ hash_data->n_keys++;
+ out:
+ return ret;
+}
+
+static int create_val_field(struct hash_trigger_data *hash_data,
+ unsigned int val,
+ struct ftrace_event_file *file,
+ char *field_str)
+{
+ struct ftrace_event_field *field = NULL;
+ unsigned long flags = 0;
+ char *field_name;
+ int ret = 0;
+
+ if (!strcmp(field_str, "hitcount"))
+ return ret; /* There's always a hitcount */
+
+ field_name = strsep(&field_str, "-");
+ if (field_str) {
+ struct ftrace_event_field *m_field, *s_field;
+
+ m_field = trace_find_event_field(file->event_call, field_name);
+ if (!m_field || is_string_field(m_field) ||
+ is_function_field(m_field)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ s_field = trace_find_event_field(file->event_call, field_str);
+		if (!s_field || is_string_field(s_field) ||
+		    is_function_field(s_field)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ hash_data->vals[val] = create_hash_field(m_field, s_field, flags);
+ if (!hash_data->vals[val]) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ } else {
+ field_str = field_name;
+ field_name = strsep(&field_str, ".");
+
+ if (field_str) {
+ if (!strcmp(field_str, "sym"))
+ flags |= HASH_FIELD_SYM;
+ else if (!strcmp(field_str, "hex"))
+ flags |= HASH_FIELD_HEX;
+ }
+
+ field = trace_find_event_field(file->event_call, field_name);
+ if (!field) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ hash_data->vals[val] = create_hash_field(field, NULL, flags);
+ if (!hash_data->vals[val]) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ }
+ hash_data->n_vals++;
+ out:
+ return ret;
+}
+
+static int create_hash_fields(struct hash_trigger_data *hash_data,
+ struct ftrace_event_file *file)
+{
+ char *fields_str, *field_str;
+ unsigned int i;
+ int ret = 0;
+
+ fields_str = hash_data->keys_str;
+
+ for (i = 0; i < COMPOUND_KEY_MAX; i++) {
+ field_str = strsep(&fields_str, ",");
+ if (!field_str) {
+ if (i == 0) {
+ ret = -EINVAL;
+ goto out;
+ } else
+ break;
+ }
+
+ ret = create_key_field(hash_data, i, file, field_str);
+ if (ret)
+ goto out;
+ }
+
+ fields_str = hash_data->vals_str;
+
+ for (i = 0; i < HASH_VALS_MAX; i++) {
+ field_str = strsep(&fields_str, ",");
+ if (!field_str) {
+ if (i == 0) {
+ ret = -EINVAL;
+ goto out;
+ } else
+ break;
+ }
+
+ ret = create_val_field(hash_data, i, file, field_str);
+ if (ret)
+ goto out;
+ }
+
+ ret = create_sort_keys(hash_data);
+ out:
+ return ret;
+}
+
+static void destroy_hashdata(struct hash_trigger_data *hash_data)
+{
+ synchronize_sched();
+
+ kfree(hash_data->keys_str);
+ kfree(hash_data->vals_str);
+ kfree(hash_data->sort_keys_str);
+ hash_data->keys_str = NULL;
+ hash_data->vals_str = NULL;
+ hash_data->sort_keys_str = NULL;
+
+ kfree(hash_data->entries);
+ hash_data->entries = NULL;
+
+ kfree(hash_data->struct_stacktrace_entries);
+ hash_data->struct_stacktrace_entries = NULL;
+
+ kfree(hash_data->stacktrace_entries);
+ hash_data->stacktrace_entries = NULL;
+
+ kfree(hash_data->hash_key_string_entries);
+ hash_data->hash_key_string_entries = NULL;
+
+ destroy_hash_fields(hash_data);
+ destroy_hashtab(hash_data);
+
+ kfree(hash_data);
+}
+
+static struct hash_trigger_data *create_hash_data(unsigned int hashtab_bits,
+ const char *keys,
+ const char *vals,
+ const char *sort_keys,
+ struct ftrace_event_file *file,
+ int *ret)
+{
+ unsigned int hashtab_size = (1 << hashtab_bits);
+ struct hash_trigger_data *hash_data;
+ unsigned int i, size;
+
+ hash_data = kzalloc(sizeof(*hash_data), GFP_KERNEL);
+ if (!hash_data)
+ return NULL;
+
+ /* Let's just say we size for a perfect hash but are not
+ * perfect. So let's have enough for 2 * the hashtab_size. */
+
+ /* Also, we'll run out of entries before or at the same time
+ * we run out of other items like strings or stacks, so we
+ * only need to pay attention to one counter, for entries. */
+
+ /* Also, use vmalloc or something for these large blocks. */
+ hash_data->max_entries = hashtab_size * 2;
+ size = sizeof(struct hash_trigger_entry) * hash_data->max_entries;
+ hash_data->entries = kzalloc(size, GFP_KERNEL);
+ if (!hash_data->entries)
+ goto free;
+
+ size = sizeof(struct stack_trace) * hash_data->max_entries;
+ hash_data->struct_stacktrace_entries = kzalloc(size, GFP_KERNEL);
+ if (!hash_data->struct_stacktrace_entries)
+ goto free;
+
+ size = sizeof(unsigned long) * HASH_STACKTRACE_DEPTH * hash_data->max_entries;
+ hash_data->stacktrace_entries = kzalloc(size, GFP_KERNEL);
+ if (!hash_data->stacktrace_entries)
+ goto free;
+
+ size = sizeof(char) * HASH_KEY_STRING_MAX * hash_data->max_entries;
+ hash_data->hash_key_string_entries = kzalloc(size, GFP_KERNEL);
+ if (!hash_data->hash_key_string_entries)
+ goto free;
+
+ hash_data->keys_str = kstrdup(keys, GFP_KERNEL);
+ hash_data->vals_str = kstrdup(vals, GFP_KERNEL);
+ if (sort_keys)
+ hash_data->sort_keys_str = kstrdup(sort_keys, GFP_KERNEL);
+
+ *ret = create_hash_fields(hash_data, file);
+ if (*ret < 0)
+ goto free;
+
+ hash_data->hashtab = kzalloc(hashtab_size * sizeof(struct hlist_head),
+ GFP_KERNEL);
+ if (!hash_data->hashtab) {
+ *ret = -ENOMEM;
+ goto free;
+ }
+
+ for (i = 0; i < hashtab_size; i++)
+ INIT_HLIST_HEAD(&hash_data->hashtab[i]);
+ spin_lock_init(&hash_data->lock);
+
+ hash_data->hashtab_bits = hashtab_bits;
+ hash_data->event_file = file;
+ out:
+ return hash_data;
+ free:
+ destroy_hashdata(hash_data);
+ hash_data = NULL;
+ goto out;
+}
+
+static inline bool match_stacktraces(struct stack_trace *entry_stacktrace,
+ struct stack_trace *stacktrace)
+{
+ unsigned int size;
+
+	if (entry_stacktrace->nr_entries != stacktrace->nr_entries)
+ return false;
+
+ size = sizeof(*stacktrace->entries) * stacktrace->nr_entries;
+ if (memcmp(entry_stacktrace->entries, stacktrace->entries, size) == 0)
+ return true;
+
+ return false;
+}
+
+static struct hash_trigger_entry *
+hash_trigger_entry_match(struct hash_trigger_entry *entry,
+ struct hash_key_part *key_parts,
+ unsigned int n_key_parts)
+{
+ unsigned int i;
+
+ for (i = 0; i < n_key_parts; i++) {
+ if (entry->key_parts[i].type != key_parts[i].type)
+ return NULL;
+
+ switch (entry->key_parts[i].type) {
+ case HASH_KEY_TYPE_U64:
+ if (entry->key_parts[i].var.val_u64 !=
+ key_parts[i].var.val_u64)
+ return NULL;
+ break;
+ case HASH_KEY_TYPE_STACKTRACE:
+ if (!match_stacktraces(entry->key_parts[i].var.val_stacktrace,
+ key_parts[i].var.val_stacktrace))
+ return NULL;
+ break;
+ case HASH_KEY_TYPE_STRING:
+ if (strcmp(entry->key_parts[i].var.val_string,
+ key_parts[i].var.val_string))
+ return NULL;
+ break;
+ default:
+ return NULL;
+ }
+ }
+
+ return entry;
+}
+
+static struct hash_trigger_entry *
+hash_trigger_entry_find(struct hash_trigger_data *hash_data, void *rec,
+ struct stack_trace *stacktrace)
+{
+ struct hash_key_part key_parts[COMPOUND_KEY_MAX];
+ unsigned int i, n_keys = hash_data->n_keys;
+ struct hash_trigger_entry *entry;
+ struct hash_field *hash_field;
+ u64 hash_idx;
+
+ hash_idx = hash_key(hash_data, rec, stacktrace);
+
+ for (i = 0; i < n_keys; i++) {
+ hash_field = hash_data->keys[i];
+ if (hash_field->flags & HASH_FIELD_STACKTRACE) {
+ key_parts[i].type = HASH_KEY_TYPE_STACKTRACE;
+ key_parts[i].var.val_stacktrace = stacktrace;
+ } else if (hash_field->flags & HASH_FIELD_STRING) {
+ u64 hash_val = hash_field->fn(hash_field, rec);
+
+ key_parts[i].type = HASH_KEY_TYPE_STRING;
+ key_parts[i].var.val_string = (char *)hash_val;
+ } else {
+ u64 hash_val = hash_field->fn(hash_field, rec);
+
+ key_parts[i].type = HASH_KEY_TYPE_U64;
+ key_parts[i].var.val_u64 = hash_val;
+ }
+ }
+
+ hlist_for_each_entry_rcu(entry, &hash_data->hashtab[hash_idx], node) {
+ if (hash_trigger_entry_match(entry, key_parts, n_keys))
+ return entry;
+ }
+
+ return NULL;
+}
+
+static void hash_trigger_entry_insert(struct hash_trigger_data *hash_data,
+ struct hash_trigger_entry *entry,
+ void *rec,
+ struct stack_trace *stacktrace)
+{
+ u64 hash_idx = hash_key(hash_data, rec, stacktrace);
+
+ hash_data->total_entries++;
+
+ hlist_add_head_rcu(&entry->node, &hash_data->hashtab[hash_idx]);
+}
+
+static void
+hash_trigger_entry_update(struct hash_trigger_data *hash_data,
+ struct hash_trigger_entry *entry, void *rec)
+{
+ struct hash_field *hash_field;
+ unsigned int i;
+ u64 hash_val;
+
+ for (i = 0; i < hash_data->n_vals; i++) {
+ hash_field = hash_data->vals[i];
+ hash_val = hash_field->fn(hash_field, rec);
+ entry->sums[i] += hash_val;
+ }
+
+ entry->count++;
+}
+
+static void
+event_hash_trigger(struct event_trigger_data *data, void *rec)
+{
+ struct hash_trigger_data *hash_data = data->private_data;
+ struct hash_trigger_entry *entry;
+ struct hash_field *hash_field;
+
+ struct stack_trace stacktrace;
+ unsigned long entries[HASH_STACKTRACE_DEPTH];
+
+ unsigned long flags;
+
+ if (hash_data->drops) {
+ hash_data->drops++;
+ return;
+ }
+
+ hash_field = hash_data->keys[0];
+
+ if (hash_field->flags & HASH_FIELD_STACKTRACE) {
+ stacktrace.max_entries = HASH_STACKTRACE_DEPTH;
+ stacktrace.entries = entries;
+ stacktrace.nr_entries = 0;
+ stacktrace.skip = HASH_STACKTRACE_SKIP;
+
+ save_stack_trace(&stacktrace);
+ }
+
+ spin_lock_irqsave(&hash_data->lock, flags);
+ entry = hash_trigger_entry_find(hash_data, rec, &stacktrace);
+
+ if (!entry) {
+ entry = hash_trigger_entry_create(hash_data, rec, &stacktrace);
+ WARN_ON_ONCE(!entry);
+ if (!entry) {
+ spin_unlock_irqrestore(&hash_data->lock, flags);
+ return;
+ }
+ hash_trigger_entry_insert(hash_data, entry, rec, &stacktrace);
+ }
+
+ hash_trigger_entry_update(hash_data, entry, rec);
+ hash_data->total_hits++;
+ spin_unlock_irqrestore(&hash_data->lock, flags);
+}
+
+static void
+hash_trigger_stacktrace_print(struct seq_file *m,
+ struct stack_trace *stacktrace)
+{
+ char str[KSYM_SYMBOL_LEN];
+ unsigned int spaces = 8;
+ unsigned int i;
+
+ for (i = 0; i < stacktrace->nr_entries; i++) {
+ if (stacktrace->entries[i] == ULONG_MAX)
+ return;
+ seq_printf(m, "%*c", 1 + spaces, ' ');
+ sprint_symbol(str, stacktrace->entries[i]);
+ seq_printf(m, "%s\n", str);
+ }
+}
+
+static void
+hash_trigger_entry_print(struct seq_file *m,
+ struct hash_trigger_data *hash_data,
+ struct hash_trigger_entry *entry)
+{
+ char str[KSYM_SYMBOL_LEN];
+ unsigned int i;
+
+ seq_printf(m, "key: ");
+ for (i = 0; i < hash_data->n_keys; i++) {
+ if (i > 0)
+ seq_printf(m, ", ");
+ if (entry->key_parts[i].flags & HASH_FIELD_SYM) {
+ kallsyms_lookup(entry->key_parts[i].var.val_u64,
+ NULL, NULL, NULL, str);
+ seq_printf(m, "%s:[%llx] %s",
+ hash_data->keys[i]->field->name,
+ entry->key_parts[i].var.val_u64,
+ str);
+ } else if (entry->key_parts[i].flags & HASH_FIELD_HEX) {
+ seq_printf(m, "%s:%llx",
+ hash_data->keys[i]->field->name,
+ entry->key_parts[i].var.val_u64);
+ } else if (entry->key_parts[i].flags & HASH_FIELD_STACKTRACE) {
+ seq_printf(m, "stacktrace:\n");
+ hash_trigger_stacktrace_print(m,
+ entry->key_parts[i].var.val_stacktrace);
+ } else if (entry->key_parts[i].flags & HASH_FIELD_STRING) {
+ seq_printf(m, "%s:%s",
+ hash_data->keys[i]->field->name,
+ entry->key_parts[i].var.val_string);
+ } else if (entry->key_parts[i].flags & HASH_FIELD_EXECNAME) {
+ seq_printf(m, "%s:%s[%llu]",
+ hash_data->keys[i]->field->name,
+ entry->comm,
+ entry->key_parts[i].var.val_u64);
+ } else if (entry->key_parts[i].flags & HASH_FIELD_SYSCALL) {
+ int syscall = entry->key_parts[i].var.val_u64;
+ const char *syscall_name = get_syscall_name(syscall);
+
+ if (!syscall_name)
+ syscall_name = "unknown_syscall";
+ seq_printf(m, "%s:%s",
+ hash_data->keys[i]->field->name,
+ syscall_name);
+ } else {
+ seq_printf(m, "%s:%llu",
+ hash_data->keys[i]->field->name,
+ entry->key_parts[i].var.val_u64);
+ }
+ }
+
+ seq_printf(m, "\tvals: count:%llu", entry->count);
+
+ for (i = 0; i < hash_data->n_vals; i++) {
+ if (i > 0)
+ seq_printf(m, ", ");
+ if (hash_data->vals[i]->aux_field) {
+ seq_printf(m, " %s-%s:%llu",
+ hash_data->vals[i]->field->name,
+ hash_data->vals[i]->aux_field->name,
+ entry->sums[i]);
+ continue;
+ }
+ seq_printf(m, " %s:%llu",
+ hash_data->vals[i]->field->name,
+ entry->sums[i]);
+ }
+ seq_printf(m, "\n");
+}
+
+static int sort_entries(const struct hash_trigger_entry **a,
+ const struct hash_trigger_entry **b)
+{
+ const struct hash_trigger_entry *entry_a, *entry_b;
+ struct hash_trigger_sort_key *sort_key;
+ struct hash_trigger_data *hash_data;
+ u64 val_a, val_b;
+ int ret = 0;
+
+ entry_a = *a;
+ entry_b = *b;
+
+ hash_data = entry_a->hash_data;
+ sort_key = hash_data->sort_key_cur;
+
+ if (sort_key->use_hitcount) {
+ val_a = entry_a->count;
+ val_b = entry_b->count;
+ } else if (sort_key->key_part) {
+ /* TODO: make sure we never use a stacktrace here */
+ val_a = entry_a->key_parts[sort_key->idx].var.val_u64;
+ val_b = entry_b->key_parts[sort_key->idx].var.val_u64;
+ } else {
+ val_a = entry_a->sums[sort_key->idx];
+ val_b = entry_b->sums[sort_key->idx];
+ }
+
+ if (val_a > val_b)
+ ret = 1;
+ else if (val_a < val_b)
+ ret = -1;
+
+ if (sort_key->descending)
+ ret = -ret;
+
+ return ret;
+}
+
+static void sort_secondary(struct hash_trigger_data *hash_data,
+ struct hash_trigger_entry **entries,
+ unsigned int n_entries)
+{
+ struct hash_trigger_sort_key *primary_sort_key;
+ unsigned int start = 0, n_subelts = 1;
+ struct hash_trigger_entry *entry;
+ bool do_sort = false;
+ unsigned int i, idx;
+ u64 cur_val;
+
+ primary_sort_key = hash_data->sort_keys[0];
+
+ entry = entries[0];
+ if (primary_sort_key->use_hitcount)
+ cur_val = entry->count;
+ else if (primary_sort_key->key_part)
+ cur_val = entry->key_parts[primary_sort_key->idx].var.val_u64;
+ else
+ cur_val = entry->sums[primary_sort_key->idx];
+
+ hash_data->sort_key_cur = hash_data->sort_keys[1];
+
+ for (i = 1; i < n_entries; i++) {
+ entry = entries[i];
+ if (primary_sort_key->use_hitcount) {
+ if (entry->count != cur_val) {
+ cur_val = entry->count;
+ do_sort = true;
+ }
+ } else if (primary_sort_key->key_part) {
+ idx = primary_sort_key->idx;
+ if (entry->key_parts[idx].var.val_u64 != cur_val) {
+ cur_val = entry->key_parts[idx].var.val_u64;
+ do_sort = true;
+ }
+ } else {
+ idx = primary_sort_key->idx;
+ if (entry->sums[idx] != cur_val) {
+ cur_val = entry->sums[idx];
+ do_sort = true;
+ }
+ }
+
+ if (do_sort) {
+ if (n_subelts > 1) {
+ sort(entries + start, n_subelts, sizeof(entry),
+ (int (*)(const void *, const void *))sort_entries, NULL);
+ }
+ start = i;
+ n_subelts = 1;
+ do_sort = false;
+ } else
+ n_subelts++;
+ }
+
+ /* Sort the final run, which the loop above does not flush. */
+ if (n_subelts > 1)
+ sort(entries + start, n_subelts, sizeof(entry),
+ (int (*)(const void *, const void *))sort_entries, NULL);
+}
+
+static bool
+print_entries_sorted(struct seq_file *m, struct hash_trigger_data *hash_data)
+{
+ unsigned int hashtab_size = (1 << hash_data->hashtab_bits);
+ struct hash_trigger_entry **entries;
+ struct hash_trigger_entry *entry;
+ unsigned int entries_size;
+ unsigned int i = 0, j = 0;
+
+ entries_size = sizeof(entry) * hash_data->total_entries;
+ entries = kmalloc(entries_size, GFP_KERNEL);
+ if (!entries)
+ return false;
+
+ for (i = 0; i < hashtab_size; i++) {
+ hlist_for_each_entry_rcu(entry, &hash_data->hashtab[i], node)
+ entries[j++] = entry;
+ }
+
+ hash_data->sort_key_cur = hash_data->sort_keys[0];
+ sort(entries, j, sizeof(struct hash_trigger_entry *),
+ (int (*)(const void *, const void *))sort_entries, NULL);
+
+ if (hash_data->sort_keys[1])
+ sort_secondary(hash_data, entries, j);
+
+ for (i = 0; i < j; i++)
+ hash_trigger_entry_print(m, hash_data, entries[i]);
+
+ kfree(entries);
+
+ return true;
+}
+
+static bool
+print_entries_unsorted(struct seq_file *m, struct hash_trigger_data *hash_data)
+{
+ unsigned int hashtab_size = (1 << hash_data->hashtab_bits);
+ struct hash_trigger_entry *entry;
+ unsigned int i = 0;
+
+ for (i = 0; i < hashtab_size; i++) {
+ hlist_for_each_entry_rcu(entry, &hash_data->hashtab[i], node)
+ hash_trigger_entry_print(m, hash_data, entry);
+ }
+
+ return true;
+}
+
+static int
+event_hash_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
+ struct event_trigger_data *data)
+{
+ struct hash_trigger_data *hash_data = data->private_data;
+ bool sorted;
+ int ret;
+
+ ret = event_trigger_print("hash", m, (void *)data->count,
+ data->filter_str);
+
+ sorted = print_entries_sorted(m, hash_data);
+ if (!sorted)
+ print_entries_unsorted(m, hash_data);
+
+ seq_printf(m, "Totals:\n Hits: %lu\n Entries: %lu\n Dropped: %lu\n",
+ hash_data->total_hits, hash_data->total_entries, hash_data->drops);
+
+ if (!sorted)
+ seq_printf(m, "Unsorted (couldn't alloc memory for sorting)\n");
+
+ return ret;
+}
+
+static void
+event_hash_trigger_free(struct event_trigger_ops *ops,
+ struct event_trigger_data *data)
+{
+ struct hash_trigger_data *hash_data = data->private_data;
+
+ if (WARN_ON_ONCE(data->ref <= 0))
+ return;
+
+ data->ref--;
+ if (!data->ref) {
+ destroy_hashdata(hash_data);
+ trigger_data_free(data);
+ }
+}
+
+static struct event_trigger_ops event_hash_trigger_ops = {
+ .func = event_hash_trigger,
+ .print = event_hash_trigger_print,
+ .init = event_trigger_init,
+ .free = event_hash_trigger_free,
+};
+
+static struct event_trigger_ops *
+event_hash_get_trigger_ops(char *cmd, char *param)
+{
+ /* counts don't make sense for hash triggers */
+ return &event_hash_trigger_ops;
+}
+
+static int
+event_hash_trigger_func(struct event_command *cmd_ops,
+ struct ftrace_event_file *file,
+ char *glob, char *cmd, char *param)
+{
+ struct event_trigger_data *trigger_data;
+ struct event_trigger_ops *trigger_ops;
+ struct hash_trigger_data *hash_data;
+ char *sort_keys = NULL;
+ char *trigger;
+ char *number;
+ int ret = 0;
+ char *keys;
+ char *vals;
+
+ if (!param)
+ return -EINVAL;
+
+ /* separate the trigger from the filter (s:e:n [if filter]) */
+ trigger = strsep(&param, " \t");
+ if (!trigger)
+ return -EINVAL;
+
+ keys = strsep(&trigger, ":");
+ if (!trigger)
+ return -EINVAL;
+
+ vals = strsep(&trigger, ":");
+ if (trigger)
+ sort_keys = strsep(&trigger, ":");
+
+ hash_data = create_hash_data(HASH_TRIGGER_BITS, keys, vals, sort_keys,
+ file, &ret);
+ if (ret)
+ return ret;
+
+ trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
+
+ ret = -ENOMEM;
+ trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
+ if (!trigger_data)
+ goto out;
+
+ trigger_data->count = -1;
+ trigger_data->ops = trigger_ops;
+ trigger_data->cmd_ops = cmd_ops;
+ INIT_LIST_HEAD(&trigger_data->list);
+ RCU_INIT_POINTER(trigger_data->filter, NULL);
+
+ trigger_data->private_data = hash_data;
+
+ if (glob[0] == '!') {
+ cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
+ ret = 0;
+ goto out_free;
+ }
+
+ if (trigger) {
+ number = strsep(&trigger, ":");
+
+ ret = -EINVAL;
+ if (strlen(number)) /* hash triggers don't support counts */
+ goto out_free;
+ }
+
+ if (!param) /* if param is non-empty, it's supposed to be a filter */
+ goto out_reg;
+
+ if (!cmd_ops->set_filter)
+ goto out_reg;
+
+ ret = cmd_ops->set_filter(param, trigger_data, file);
+ if (ret < 0)
+ goto out_free;
+
+ out_reg:
+ ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
+ /*
+ * The above returns on success the # of functions enabled,
+ * but if it didn't find any functions it returns zero.
+ * Consider no functions a failure too.
+ */
+ if (!ret) {
+ ret = -ENOENT;
+ goto out_free;
+ } else if (ret < 0)
+ goto out_free;
+ /* Just return zero, not the number of enabled functions */
+ ret = 0;
+ out:
+ return ret;
+
+ out_free:
+ if (cmd_ops->set_filter)
+ cmd_ops->set_filter(NULL, trigger_data, NULL);
+ kfree(trigger_data);
+ destroy_hashdata(hash_data);
+ goto out;
+}
+
+static struct event_command trigger_hash_cmd = {
+ .name = "hash",
+ .trigger_type = ETT_EVENT_HASH,
+ .post_trigger = true, /* need non-NULL rec */
+ .func = event_hash_trigger_func,
+ .reg = register_trigger,
+ .unreg = unregister_trigger,
+ .get_trigger_ops = event_hash_get_trigger_ops,
+ .set_filter = set_trigger_filter,
+};
+
+static __init int register_trigger_hash_cmd(void)
+{
+ int ret;
+
+ ret = register_event_command(&trigger_hash_cmd);
+ WARN_ON(ret < 0);
+
+ return ret;
+}
+
__init int register_trigger_cmds(void)
{
register_trigger_traceon_traceoff_cmds();
register_trigger_snapshot_cmd();
register_trigger_stacktrace_cmd();
register_trigger_enable_disable_cmds();
+ register_trigger_hash_cmd();
return 0;
}
--
1.8.3.1
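[Editorial note: for readers following the sort logic in the patch above, the run-based two-level sort that sort_secondary() implements can be sketched outside the kernel as follows. This is an illustrative Python sketch with hypothetical names, not code from the patch.]

```python
# Two-level sort as in sort_secondary(): sort by the primary key first,
# then re-sort each run of entries that share the same primary value
# by the secondary key.
def sort_two_level(entries, primary, secondary):
    entries.sort(key=primary)
    start = 0
    for i in range(1, len(entries) + 1):
        # A run ends when the primary value changes or input is exhausted.
        if i == len(entries) or primary(entries[i]) != primary(entries[start]):
            if i - start > 1:
                entries[start:i] = sorted(entries[start:i], key=secondary)
            start = i
    return entries

# Example: sort by hit count, break ties by the summed value.
data = [{'hits': 2, 'sum': 64}, {'hits': 1, 'sum': 8},
        {'hits': 2, 'sum': 32}, {'hits': 1, 'sum': 16}]
print(sort_two_level(data, lambda e: e['hits'], lambda e: e['sum']))
```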
* Re: [PATCH 5/5] tracing: Add 'hash' event trigger command
2014-03-27 4:54 ` [PATCH 5/5] tracing: Add 'hash' event trigger command Tom Zanussi
@ 2014-03-28 16:54 ` Andi Kleen
2014-03-28 19:13 ` Tom Zanussi
2014-04-03 8:59 ` Masami Hiramatsu
1 sibling, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2014-03-28 16:54 UTC (permalink / raw)
To: Tom Zanussi; +Cc: rostedt, linux-kernel
Tom Zanussi <tom.zanussi@linux.intel.com> writes:
> Hash triggers allow users to continually hash events which can then be
> dumped later by simply reading the trigger file. This is done
> strictly via one-liners and without any kind of programming language.
I read through the whole thing. I think I got it somewhere near the end,
but it was quite difficult. What really confuses me is your
use of the "hash" term. I believe the established term for this
kind of data operation is "histogram". How about calling it that?
Overall it seems useful, but it's not fully clear to me why it needs
to be done in the kernel rather than in an analysis tool?
-Andi
--
ak@linux.intel.com -- Speaking for myself only
* Re: [PATCH 5/5] tracing: Add 'hash' event trigger command
2014-03-28 16:54 ` Andi Kleen
@ 2014-03-28 19:13 ` Tom Zanussi
0 siblings, 0 replies; 11+ messages in thread
From: Tom Zanussi @ 2014-03-28 19:13 UTC (permalink / raw)
To: Andi Kleen; +Cc: rostedt, linux-kernel
On Fri, 2014-03-28 at 09:54 -0700, Andi Kleen wrote:
> Tom Zanussi <tom.zanussi@linux.intel.com> writes:
>
> > Hash triggers allow users to continually hash events which can then be
> > dumped later by simply reading the trigger file. This is done
> > strictly via one-liners and without any kind of programming language.
>
> I read through the whole thing. I think I got it somewhere near the end,
> but it was quite difficult. What really confuses me is your
> use of the "hash" term. I believe the established term for these
> kind of data operations is "histogram". How about calling it that.
>
Yeah, there are a lot of equivalent terms for the same thing - Python
calls them dictionaries, Perl calls them hashes or associative arrays,
dtrace and systemtap call them aggregations - I just happened to use the
term that seemed the simplest and most direct to me.
> Overall it seems useful, but it's not fully clear to me why it needs
> to be done in the kernel and not an analysis tool?
>
It doesn't necessarily need to be done in the kernel - you could instead
dump the entire trace stream to userspace and analyze it there. That's
basically the idea behind the perl and python scripting interfaces in
perf, which makes a lot of sense if you have a relatively low event rate
and/or the operations you need to perform are non-trivial.
It seems to me though that if you have a relatively simple operation
like hashing an event, which you're going to be doing in your analysis
tool anyway, it makes more sense and may be cheaper to just do it in the
kernel instead of sending it to userspace.
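[Editorial note: the userspace-side aggregation described above can be made concrete with a short sketch. This is a toy illustration: the event tuples and call_site values below are made up, and a real tool would parse them out of trace_pipe or a perf stream.]

```python
# Aggregate kmalloc-style events in userspace: for each call_site,
# keep a hit count and a running sum of bytes_alloc, i.e. the same
# tally the in-kernel hash trigger keeps per hash entry.
events = [
    ('c04fbec2', 32), ('c05ad8ee', 256), ('c04fbec2', 32),
    ('c05ad8ee', 512), ('c04fbec2', 64),
]

totals = {}
for call_site, bytes_alloc in events:
    hits, total = totals.get(call_site, (0, 0))
    totals[call_site] = (hits + 1, total + bytes_alloc)

for call_site, (hits, total) in sorted(totals.items()):
    print(f'{call_site}: hitcount={hits} bytes_alloc={total}')
```

The tradeoff described in the mail is just where this loop runs: streaming every event to userspace pays a per-event transport cost, while the in-kernel trigger pays only the per-entry tally.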
Of course, tools like systemtap also do their associative
arrays/aggregations in the kernel - I guess you could think of this as
something like the equivalent of their aggregation 'runtime'.
And there's also a middle ground e.g. think of a long-running trace that
it wouldn't make sense to continuously stream to userspace, but that
would quickly fill up a hash table in the kernel if left untended - in
cases like that it would make sense to periodically dump an
'aggregation' of it to userspace. [1]
For the embedded systems I've been working on, it's just really so much
more convenient to be able to directly cat a file and get essentially
that same information without having to go through some unnecessary
language runtime and additional userspace tooling.
It just seems to me that you get so much mileage out of implementing
this single simple concept, built on top of the trigger and trace event
formats already in the kernel, that it's worthwhile to expose it as a
tool that stands on its own as well as something that could probably be
reused as a component of higher-level tools.
Tom
[1] http://cygwin.com/ml/systemtap/2005-q3/msg00550.html
> -Andi
>
* Re: [PATCH 5/5] tracing: Add 'hash' event trigger command
2014-03-27 4:54 ` [PATCH 5/5] tracing: Add 'hash' event trigger command Tom Zanussi
2014-03-28 16:54 ` Andi Kleen
@ 2014-04-03 8:59 ` Masami Hiramatsu
2014-04-03 22:43 ` Tom Zanussi
1 sibling, 1 reply; 11+ messages in thread
From: Masami Hiramatsu @ 2014-04-03 8:59 UTC (permalink / raw)
To: Tom Zanussi; +Cc: rostedt, linux-kernel
Hi Tom,
(2014/03/27 13:54), Tom Zanussi wrote:
> Hash triggers allow users to continually hash events which can then be
> dumped later by simply reading the trigger file. This is done
> strictly via one-liners and without any kind of programming language.
>
> The syntax follows the existing trigger syntax:
>
> # echo hash:key(s):value(s)[:sort_keys()][ if filter] > event/trigger
>
> The values used as keys and values are just the fields that define the
> trace event and are available in the event's 'format' file. For example,
> the kmalloc event:
>
> root@ie:/sys/kernel/debug/tracing/events/kmem/kmalloc# cat format
> name: kmalloc
> ID: 370
> format:
> field:unsigned short common_type; offset:0; size:2; signed:0;
> field:unsigned char common_flags; offset:2; size:1; signed:0;
> field:unsigned char common_preempt_count; offset:3; size:1;signed:0;
> field:int common_pid; offset:4; size:4; signed:1;
>
> field:unsigned long call_site; offset:8; size:4; signed:0;
> field:const void * ptr; offset:12; size:4; signed:0;
> field:size_t bytes_req; offset:16; size:4; signed:0;
> field:size_t bytes_alloc; offset:20; size:4; signed:0;
> field:gfp_t gfp_flags; offset:24; size:4; signed:0;
>
> The key can be made up of one or more of these fields and any number of
> values can be specified - these are automatically tallied in the hash entry
> any time the event is hit. Stacktraces can also be used as keys.
>
> For example, the following uses the stacktrace leading up to a kmalloc
> as the key for hashing kmalloc events. For each hash entry a tally of
> the bytes_alloc field is kept. Dumping out the trigger shows the sum
> of bytes allocated for each execution path that led to a kmalloc:
>
> # echo 'hash:call_site:bytes_alloc' > /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
> # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
I like the basic idea :) but I'm confused by the interface you've introduced.
I understand the "trigger" file to be for controlling triggers on the event, so that
users can check which trigger rules are set on the event and remove them.
But in this patch, it is also used as a data path.
I'd like to suggest adding a new "hash" file under events/GROUP/EVENT/ which is
only for dumping the hash data, and keeping the "trigger" file as a control path.
This would make it easier for users to build their own tools on top of the ftrace facility.
Thank you,
--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com
* Re: [PATCH 5/5] tracing: Add 'hash' event trigger command
2014-04-03 8:59 ` Masami Hiramatsu
@ 2014-04-03 22:43 ` Tom Zanussi
2014-04-04 1:44 ` Masami Hiramatsu
0 siblings, 1 reply; 11+ messages in thread
From: Tom Zanussi @ 2014-04-03 22:43 UTC (permalink / raw)
To: Masami Hiramatsu; +Cc: rostedt, linux-kernel
Hi Masami,
On Thu, 2014-04-03 at 17:59 +0900, Masami Hiramatsu wrote:
> Hi Tom,
>
> (2014/03/27 13:54), Tom Zanussi wrote:
> > Hash triggers allow users to continually hash events which can then be
> > dumped later by simply reading the trigger file. This is done
> > strictly via one-liners and without any kind of programming language.
> >
> > The syntax follows the existing trigger syntax:
> >
> > # echo hash:key(s):value(s)[:sort_keys()][ if filter] > event/trigger
> >
> > The values used as keys and values are just the fields that define the
> > trace event and are available in the event's 'format' file. For example,
> > the kmalloc event:
> >
> > root@ie:/sys/kernel/debug/tracing/events/kmem/kmalloc# cat format
> > name: kmalloc
> > ID: 370
> > format:
> > field:unsigned short common_type; offset:0; size:2; signed:0;
> > field:unsigned char common_flags; offset:2; size:1; signed:0;
> > field:unsigned char common_preempt_count; offset:3; size:1;signed:0;
> > field:int common_pid; offset:4; size:4; signed:1;
> >
> > field:unsigned long call_site; offset:8; size:4; signed:0;
> > field:const void * ptr; offset:12; size:4; signed:0;
> > field:size_t bytes_req; offset:16; size:4; signed:0;
> > field:size_t bytes_alloc; offset:20; size:4; signed:0;
> > field:gfp_t gfp_flags; offset:24; size:4; signed:0;
> >
> > The key can be made up of one or more of these fields and any number of
> > values can be specified - these are automatically tallied in the hash entry
> > any time the event is hit. Stacktraces can also be used as keys.
> >
> > For example, the following uses the stacktrace leading up to a kmalloc
> > as the key for hashing kmalloc events. For each hash entry a tally of
> > the bytes_alloc field is kept. Dumping out the trigger shows the sum
> > of bytes allocated for each execution path that led to a kmalloc:
> >
> > # echo 'hash:call_site:bytes_alloc' > /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
> > # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
>
> I like the basic idea :) but I'm confused by the interface you've introduced.
> I understand the "trigger" file to be for controlling triggers on the event, so that
> users can check which trigger rules are set on the event and remove them.
> But in this patch, it is also used as a data path.
>
> I'd like to suggest adding a new "hash" file under events/GROUP/EVENT/ which is
> only for dumping the hash data, and keeping the "trigger" file as a control path.
> This would make it easier for users to build their own tools on top of the ftrace facility.
>
I was really trying to avoid adding a new file - my thinking was that
the trigger file is just sitting there doing nothing besides either
listing available triggers when inactive or listing active triggers when
active, which it would still do even if also providing a conduit for the
output.
I agree that it would be cleaner to have a separate file, but I don't
know if it's worth a dedicated file. Another possibility would be to
have it exist only when a hash trigger is active...
Tom
> Thank you,
>
>
* Re: Re: [PATCH 5/5] tracing: Add 'hash' event trigger command
2014-04-03 22:43 ` Tom Zanussi
@ 2014-04-04 1:44 ` Masami Hiramatsu
0 siblings, 0 replies; 11+ messages in thread
From: Masami Hiramatsu @ 2014-04-04 1:44 UTC (permalink / raw)
To: Tom Zanussi; +Cc: rostedt, linux-kernel
(2014/04/04 7:43), Tom Zanussi wrote:
> Hi Masami,
>
> On Thu, 2014-04-03 at 17:59 +0900, Masami Hiramatsu wrote:
>> Hi Tom,
>>
>> (2014/03/27 13:54), Tom Zanussi wrote:
>>> Hash triggers allow users to continually hash events which can then be
>>> dumped later by simply reading the trigger file. This is done
>>> strictly via one-liners and without any kind of programming language.
>>>
>>> The syntax follows the existing trigger syntax:
>>>
>>> # echo hash:key(s):value(s)[:sort_keys()][ if filter] > event/trigger
>>>
>>> The values used as keys and values are just the fields that define the
>>> trace event and are available in the event's 'format' file. For example,
>>> the kmalloc event:
>>>
>>> root@ie:/sys/kernel/debug/tracing/events/kmem/kmalloc# cat format
>>> name: kmalloc
>>> ID: 370
>>> format:
>>> field:unsigned short common_type; offset:0; size:2; signed:0;
>>> field:unsigned char common_flags; offset:2; size:1; signed:0;
>>> field:unsigned char common_preempt_count; offset:3; size:1;signed:0;
>>> field:int common_pid; offset:4; size:4; signed:1;
>>>
>>> field:unsigned long call_site; offset:8; size:4; signed:0;
>>> field:const void * ptr; offset:12; size:4; signed:0;
>>> field:size_t bytes_req; offset:16; size:4; signed:0;
>>> field:size_t bytes_alloc; offset:20; size:4; signed:0;
>>> field:gfp_t gfp_flags; offset:24; size:4; signed:0;
>>>
>>> The key can be made up of one or more of these fields and any number of
>>> values can be specified - these are automatically tallied in the hash entry
>>> any time the event is hit. Stacktraces can also be used as keys.
>>>
>>> For example, the following uses the stacktrace leading up to a kmalloc
>>> as the key for hashing kmalloc events. For each hash entry a tally of
>>> the bytes_alloc field is kept. Dumping out the trigger shows the sum
>>> of bytes allocated for each execution path that led to a kmalloc:
>>>
>>> # echo 'hash:call_site:bytes_alloc' > /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
>>> # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
>>
>> I like the basic idea :) but I'm confused by the interface you've introduced.
>> I understand the "trigger" file to be for controlling triggers on the event, so that
>> users can check which trigger rules are set on the event and remove them.
>> But in this patch, it is also used as a data path.
>>
>> I'd like to suggest adding a new "hash" file under events/GROUP/EVENT/ which is
>> only for dumping the hash data, and keeping the "trigger" file as a control path.
>> This would make it easier for users to build their own tools on top of the ftrace facility.
>>
>
> I was really trying to avoid adding a new file - my thinking was that
> the trigger file is just sitting there doing nothing besides either
> listing available triggers when inactive or listing active triggers when
> active, which it would still do even if also providing a conduit for the
> output.
You don't need to avoid it unless it is really meaningless :)
Since the available triggers are limited and do not depend on the event
type, I think it is enough to provide tracing/available_triggers.
> I agree that it would be cleaner to have a separate file, but I don't
> know if it's worth a dedicated file. Another possibility would be to
> have it exist only when a hash trigger is active..
Agreed. That's a good idea :)
Thank you,
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com
end of thread
Thread overview: 11+ messages
2014-03-27 4:54 [PATCH 0/5] tracing: Hash triggers Tom Zanussi
2014-03-27 4:54 ` [PATCH 1/5] tracing: Make ftrace_event_field checking functions available Tom Zanussi
2014-03-27 4:54 ` [PATCH 2/5] tracing: Add event record param to trigger_ops.func() Tom Zanussi
2014-03-27 4:54 ` [PATCH 3/5] tracing: Add get_syscall_name() Tom Zanussi
2014-03-27 4:54 ` [PATCH 4/5] tracing: Add hash trigger to Documentation Tom Zanussi
2014-03-27 4:54 ` [PATCH 5/5] tracing: Add 'hash' event trigger command Tom Zanussi
2014-03-28 16:54 ` Andi Kleen
2014-03-28 19:13 ` Tom Zanussi
2014-04-03 8:59 ` Masami Hiramatsu
2014-04-03 22:43 ` Tom Zanussi
2014-04-04 1:44 ` Masami Hiramatsu