* [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
@ 2018-02-02 23:04 Steven Rostedt
  2018-02-02 23:04 ` [PATCH 01/18] tracing: Add " Steven Rostedt
                   ` (21 more replies)
  0 siblings, 22 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov


At Kernel Summit back in October, we tried to bring up trace markers, which
would be nops within the kernel proper, that would allow modules to hook
arbitrary trace events to them. The reaction to this proposal was less than
favorable. We were told that we were trying to work around a problem
rather than solve it. The problem in our minds is the notion of a
"stable trace event".

There are maintainers that do not want trace events, or more trace events, in
their subsystems. This is because trace events expose an interface to user
space, and this interface could become required by some tool. The trace event
may then have to be kept stable so that it does not break the tool, which
prevents the code from changing.

Or, the trace event may just have to add padding for fields that tools
may require. The "success" field of the sched_wakeup trace event is one such
instance. There is no longer a "success" variable, but tools may fail if the
field were to go away, so a "1" is simply written to the trace event, wasting
ring buffer real estate.

I talked with Linus about this, and he told me that we already have these
markers in the kernel. They are from the mcount/__fentry__ used by function
tracing. Have the trace events be created by these, and see if this will
satisfy most areas that want trace events.

I decided to implement this idea, and here's the patch set.

Introducing "function based events". These are created dynamically by a
tracefs file called "function_events". By writing a pseudo prototype into
this file, you create an event.

 # mount -t tracefs nodev /sys/kernel/tracing
 # cd /sys/kernel/tracing
 # echo 'do_IRQ(symbol ip[16] | x64[6] irq_stack[16])' > function_events
 # cat events/functions/do_IRQ/format
name: do_IRQ
ID: 1399
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:unsigned long __parent_ip;	offset:8;	size:8;	signed:0;
	field:unsigned long __ip;	offset:16;	size:8;	signed:0;
	field:symbol ip;	offset:24;	size:8;	signed:0;
	field:x64 irq_stack[6];	offset:32;	size:48;	signed:0;

print fmt: "%pS->%pS(ip=%pS, irq_stack=%llx:%llx:%llx:%llx:%llx:%llx)", REC->__ip, REC->__parent_ip,
REC->ip, REC->irq_stack[0], REC->irq_stack[1], REC->irq_stack[2], REC->irq_stack[3], REC->irq_stack[4],
REC->irq_stack[5]

 # echo 1 > events/functions/do_IRQ/enable
 # cat trace
          <idle>-0     [003] d..3  3647.049344: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
          <idle>-0     [003] d..3  3647.049433: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
          <idle>-0     [003] d..3  3647.049672: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
          <idle>-0     [003] d..3  3647.325709: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
          <idle>-0     [003] d..3  3647.325929: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
          <idle>-0     [003] d..3  3647.325993: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
          <idle>-0     [003] d..3  3647.387571: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
          <idle>-0     [003] d..3  3647.387791: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
          <idle>-0     [003] d..3  3647.387874: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
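
For anyone wanting to decode these records by hand, here is a rough
userspace sketch of the layout that the format file above describes. The
struct and function names are made up for illustration and are not kernel
types; the offsets and sizes are copied from the format output, assuming a
64-bit build where unsigned long is 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace mirror of one do_IRQ record, following the
 * offsets printed in events/functions/do_IRQ/format. Not a kernel type.
 */
struct do_irq_record {
	uint16_t common_type;          /* offset 0,  size 2 */
	uint8_t  common_flags;         /* offset 2,  size 1 */
	uint8_t  common_preempt_count; /* offset 3,  size 1 */
	int32_t  common_pid;           /* offset 4,  size 4 */
	uint64_t parent_ip;            /* "__parent_ip": offset 8,  size 8 */
	uint64_t ip;                   /* "__ip":        offset 16, size 8 */
	uint64_t sym_ip;               /* "symbol ip":   offset 24, size 8 */
	uint64_t irq_stack[6];         /* "x64 irq_stack[6]": offset 32, size 48 */
};

/* Compile-time checks that the mirror matches the format file. */
static_assert(offsetof(struct do_irq_record, common_flags) == 2, "common_flags");
static_assert(offsetof(struct do_irq_record, common_preempt_count) == 3, "preempt");
static_assert(offsetof(struct do_irq_record, common_pid) == 4, "common_pid");
static_assert(offsetof(struct do_irq_record, parent_ip) == 8, "__parent_ip");
static_assert(offsetof(struct do_irq_record, ip) == 16, "__ip");
static_assert(offsetof(struct do_irq_record, sym_ip) == 24, "ip");
static_assert(offsetof(struct do_irq_record, irq_stack) == 32, "irq_stack");

/* Size of one record under the assumptions above. */
static size_t do_irq_record_size(void)
{
	return sizeof(struct do_irq_record);
}
```

The last field ends at offset 32 + 48, so do_irq_record_size() gives the
number of bytes one such event takes in this mirror.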

And this is much more powerful than the example above shows. We can show
strings, and index off of structures into other structures.

  # echo '__vfs_read(symbol read+40[0]+16)' > function_events

  # echo 1 > events/functions/__vfs_read/enable
  # cat trace
         sshd-1343  [005] ...2   199.734752: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
         bash-1344  [003] ...2   199.734822: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
         sshd-1343  [005] ...2   199.734835: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
 avahi-daemon-910   [003] ...2   200.136740: vfs_read->__vfs_read(read=          (null))
 avahi-daemon-910   [003] ...2   200.136750: vfs_read->__vfs_read(read=          (null))
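
To illustrate what "index off of structures" means for the __vfs_read
example, here is a userspace sketch of the pointer walk as I read the
'read+40[0]+16' expression: take the first argument, load the pointer
stored 40 bytes into it, then load the value stored 16 bytes into that.
That reading, and the idea that 40 and 16 are the offsets of f_op and of
the read callback, are assumptions on my part; the helper names are
hypothetical and this is not the kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Load one pointer-sized value from base + offset. memcpy is used so the
 * sketch does not depend on alignment or aliasing rules.
 */
static uintptr_t load_word(uintptr_t base, size_t offset)
{
	uintptr_t val;

	memcpy(&val, (const void *)(base + offset), sizeof(val));
	return val;
}

/*
 * Hypothetical walk for "arg+40[0]+16":
 *   1. read the pointer stored at arg + 40
 *   2. read the value stored at that pointer + 16
 * A NULL intermediate pointer yields 0 here; this check is only a
 * defensive choice for the sketch.
 */
static uintptr_t chase_read_op(uintptr_t arg)
{
	uintptr_t ops = load_word(arg, 40);

	if (!ops)
		return 0;
	return load_word(ops, 16);
}
```

With the "symbol" type, the value that comes back would then be printed
as a symbol name plus offset, which is what the tty_read+0x0/0xf0 output
above looks like.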

And even read user space:

  # echo 'SyS_openat(int dfd, string path, x32 flags, x16 mode)' > function_events
  # echo 1 > events/functions/enable
  # grep task_fork /proc/kallsyms 
ffffffff810d5a60 t task_fork_fair
ffffffff810dfc30 t task_fork_dl
  # cat trace
            grep-1820  [000] ...2  3926.107603: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, path=/proc/kallsyms, flags=100, mode=0)

These are fully functional events. That is, they work with ftrace,
trace-cmd, perf, histograms, triggers, and eBPF.

What's next? I need to rewrite the function graph tracer, and be able to add
dynamic events on function return.

This currently works on x86 only, using a simple function that returns at
most 6 function parameters on x86_64 and 3 on x86_32. But this could easily
be extended.
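
As a rough picture of what such a helper does on x86_64, the sketch below
copies up to six integer arguments out of a mock register set in calling
convention order (rdi, rsi, rdx, rcx, r8, r9). The struct, the function
name and its signature are hypothetical and are not the ones used in the
series; returning the supported count for a NULL regs pointer follows the
behavior described in the changelog of patch 03.

```c
#include <stddef.h>

#define MOCK_MAX_ARGS 6

/* Hypothetical stand-in for the pt_regs fields that carry arguments. */
struct mock_regs {
	unsigned long di, si, dx, cx, r8, r9;
};

/*
 * Copy up to 'nr' arguments into 'args' and return how many were copied.
 * With regs == NULL, return how many arguments can be supported at all.
 */
static int mock_get_func_args(const struct mock_regs *regs,
			      unsigned long *args, int nr)
{
	int i;

	if (!regs)
		return MOCK_MAX_ARGS;
	if (nr < 0)
		nr = 0;
	if (nr > MOCK_MAX_ARGS)
		nr = MOCK_MAX_ARGS;

	for (i = 0; i < nr; i++) {
		switch (i) {
		case 0: args[i] = regs->di; break;
		case 1: args[i] = regs->si; break;
		case 2: args[i] = regs->dx; break;
		case 3: args[i] = regs->cx; break;
		case 4: args[i] = regs->r8; break;
		default: args[i] = regs->r9; break;
		}
	}
	return nr;
}
```

Arguments beyond the sixth are passed on the stack in this calling
convention, which is why the sketch clamps the count instead of guessing.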

Cheers!

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
ftrace/dynamic-ftrace-events

Head SHA1: 30fbdffd5d38bd27b04fb911f7158f10a99be8c4


Steven Rostedt (VMware) (18):
      tracing: Add function based events
      tracing: Add documentation for function based events
      tracing: Add simple arguments to function based events
      tracing/x86: Add arch_get_func_args() function
      tracing: Add hex print for dynamic ftrace based events
      tracing: Add indirect offset to args of ftrace based events
      tracing: Add dereferencing multiple fields per arg
      tracing: Add "unsigned" to function based events
      tracing: Add indexing of arguments for function based events
      tracing: Make func_type enums for easier comparing of arg types
      tracing: Add symbol type to function based events
      tracing: Add accessing direct address from function based events
      tracing: Add array type to function based events
      tracing: Have char arrays be strings for function based events
      tracing: Add string type for dynamic strings in function based events
      tracing: Add NULL to skip args for function based events
      tracing: Add indirect to indirect access for function based events
      tracing/perf: Allow perf to use function based events

----
 Documentation/trace/function-based-events.rst |  426 ++++++++
 arch/x86/kernel/ftrace.c                      |   28 +
 include/linux/trace_events.h                  |    2 +
 kernel/trace/Kconfig                          |   12 +
 kernel/trace/Makefile                         |    1 +
 kernel/trace/trace.h                          |   11 +
 kernel/trace/trace_event_ftrace.c             | 1440 +++++++++++++++++++++++++
 kernel/trace/trace_probe.h                    |   11 -
 8 files changed, 1920 insertions(+), 11 deletions(-)
 create mode 100644 Documentation/trace/function-based-events.rst
 create mode 100644 kernel/trace/trace_event_ftrace.c


* [PATCH 01/18] tracing: Add function based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
@ 2018-02-02 23:04 ` Steven Rostedt
  2018-02-05  8:24   ` Jiri Olsa
  2018-02-02 23:05 ` [PATCH 02/18] tracing: Add documentation for " Steven Rostedt
                   ` (20 subsequent siblings)
  21 siblings, 1 reply; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

[-- Attachment #1: 0001-tracing-Add-function-based-events.patch --]
[-- Type: text/plain, Size: 15019 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Add an interface that hooks into ftrace function tracing to create events
based on functions. A new file called function_events is created in the
tracefs file system. Writing a function name followed by an empty set of
parentheses creates an event that uses ftrace function callbacks to record
the data. Currently the event takes no arguments, but that will soon change.
The next step is to accept arguments within those parentheses and create an
event that shows them.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
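
Not part of the patch: a userspace sketch of the parsing described in the
changelog, for readers who want to try it outside the kernel. next_token()
is carried over from the patch below; step() and accepts() are hypothetical
names, and the kstrdup() of the function name is left out. It only handles
a single word, whereas the kernel code walks an argv[] that has already
been split on whitespace.

```c
#include <ctype.h>
#include <string.h>

enum func_states {
	FUNC_STATE_INIT,
	FUNC_STATE_FUNC,
	FUNC_STATE_PARAM,
	FUNC_STATE_END,
	FUNC_STATE_ERROR,
};

/*
 * Split the string in place on '(' and ')'. The character overwritten by
 * the terminating NUL is saved in *last and restored on the next call.
 */
static char *next_token(char **ptr, char *last)
{
	char *arg;
	char *str;

	if (!*ptr)
		return NULL;
	arg = *ptr;
	if (*last)
		*arg = *last;
	if (!*arg)
		return NULL;

	for (str = arg; *str; str++) {
		if (*str == '(' || *str == ')')
			break;
	}
	if (*str) {
		if (str == arg)
			str++;		/* the parenthesis is its own token */
		*last = *str;
		*str = 0;
		*ptr = str;
		return arg;
	}
	*last = 0;
	*ptr = NULL;
	return arg;
}

/* One step of the INIT -> FUNC -> PARAM -> END state machine. */
static enum func_states step(const char *token, enum func_states state)
{
	switch (state) {
	case FUNC_STATE_INIT:
		if (!isalpha((unsigned char)token[0]))
			return FUNC_STATE_ERROR;
		/* Do not allow wild cards */
		if (strchr(token, '*') || strchr(token, '?'))
			return FUNC_STATE_ERROR;
		return FUNC_STATE_FUNC;
	case FUNC_STATE_FUNC:
		return token[0] == '(' ? FUNC_STATE_PARAM : FUNC_STATE_ERROR;
	case FUNC_STATE_PARAM:
		return token[0] == ')' ? FUNC_STATE_END : FUNC_STATE_ERROR;
	default:
		return FUNC_STATE_ERROR;
	}
}

/* Return 1 if 'input' has the form "name()", 0 otherwise. */
static int accepts(const char *input)
{
	enum func_states state = FUNC_STATE_INIT;
	char buf[128];
	char *ptr = buf;
	char last = 0;
	char *token;

	if (strlen(input) >= sizeof(buf))
		return 0;
	strcpy(buf, input);	/* the tokenizer modifies its input */

	for (token = next_token(&ptr, &last); token;
	     token = next_token(&ptr, &last)) {
		state = step(token, state);
		if (state == FUNC_STATE_ERROR)
			return 0;
	}
	return state == FUNC_STATE_END;
}
```

For "do_IRQ()" the tokens come out as "do_IRQ", "(" and ")", which walks
the state machine to FUNC_STATE_END. Anything between the parentheses is
rejected at this stage of the series.
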
 include/linux/trace_events.h      |   2 +
 kernel/trace/Kconfig              |  12 +
 kernel/trace/Makefile             |   1 +
 kernel/trace/trace.h              |  11 +
 kernel/trace/trace_event_ftrace.c | 471 ++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_probe.h        |  11 -
 6 files changed, 497 insertions(+), 11 deletions(-)
 create mode 100644 kernel/trace/trace_event_ftrace.c

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index af44e7c2d577..6a3600009c48 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -226,6 +226,7 @@ enum {
 	TRACE_EVENT_FL_TRACEPOINT_BIT,
 	TRACE_EVENT_FL_KPROBE_BIT,
 	TRACE_EVENT_FL_UPROBE_BIT,
+	TRACE_EVENT_FL_FUNC_BIT,
 };
 
 /*
@@ -246,6 +247,7 @@ enum {
 	TRACE_EVENT_FL_TRACEPOINT	= (1 << TRACE_EVENT_FL_TRACEPOINT_BIT),
 	TRACE_EVENT_FL_KPROBE		= (1 << TRACE_EVENT_FL_KPROBE_BIT),
 	TRACE_EVENT_FL_UPROBE		= (1 << TRACE_EVENT_FL_UPROBE_BIT),
+	TRACE_EVENT_FL_FUNC		= (1 << TRACE_EVENT_FL_FUNC_BIT),
 };
 
 #define TRACE_EVENT_FL_UKPROBE (TRACE_EVENT_FL_KPROBE | TRACE_EVENT_FL_UPROBE)
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index f54dc62b599c..2118838946cd 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -442,6 +442,18 @@ config BLK_DEV_IO_TRACE
 
 	  If unsure, say N.
 
+config FUNCTION_EVENTS
+	depends on DYNAMIC_FTRACE_WITH_REGS
+	bool "Enable function based events"
+	select TRACING
+	help
+	 This creates a file function_events in the tracefs file system.
+	 Writing function names into this file will create an event
+	 that can be enabled like any other event, but will be at the
+	 location of the specified function.
+
+	 See Documentation/trace/function-based-events.rst for more details.
+
 config KPROBE_EVENTS
 	depends on KPROBES
 	depends on HAVE_REGS_AND_STACK_ACCESS_API
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index e2538c7638d4..00f6d69652c0 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -55,6 +55,7 @@ obj-$(CONFIG_FTRACE_SYSCALLS) += trace_syscalls.o
 ifeq ($(CONFIG_PERF_EVENTS),y)
 obj-$(CONFIG_EVENT_TRACING) += trace_event_perf.o
 endif
+obj-$(CONFIG_FUNCTION_EVENTS) += trace_event_ftrace.o
 obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o
 obj-$(CONFIG_EVENT_TRACING) += trace_events_trigger.o
 obj-$(CONFIG_HIST_TRIGGERS) += trace_events_hist.o
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 2a6d0325a761..67928b53dc06 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1814,4 +1814,15 @@ static inline void trace_event_eval_update(struct trace_eval_map **map, int len)
 
 extern struct trace_iterator *tracepoint_print_iter;
 
+#undef DEFINE_FIELD
+#define DEFINE_FIELD(type, item, name, is_signed)			\
+	do {								\
+		ret = trace_define_field(event_call, #type, name,	\
+					 offsetof(typeof(field), item),	\
+					 sizeof(field.item), is_signed, \
+					 FILTER_OTHER);			\
+		if (ret)						\
+			return ret;					\
+	} while (0)
+
 #endif /* _LINUX_KERNEL_TRACE_H */
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
new file mode 100644
index 000000000000..aaf62a0b1770
--- /dev/null
+++ b/kernel/trace/trace_event_ftrace.c
@@ -0,0 +1,471 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * trace events created on top of ftrace (functions).
+ *
+ * Copyright (C) 2018 VMware Inc, Steven Rostedt.
+ */
+
+#include <linux/ctype.h>
+#include <linux/slab.h>
+
+#include "trace.h"
+
+#define FUNC_EVENT_SYSTEM "functions"
+#define WRITE_BUFSIZE  4096
+
+struct func_event {
+	struct list_head		list;
+	char				*func;
+	struct trace_event_class	class;
+	struct trace_event_call		call;
+	struct ftrace_ops		ops;
+	struct list_head		files;
+};
+
+struct func_file {
+	struct list_head		list;
+	struct trace_event_file		*file;
+};
+
+struct func_event_hdr {
+	struct trace_entry	ent;
+	unsigned long		ip;
+	unsigned long		parent_ip;
+};
+
+static DEFINE_MUTEX(func_event_mutex);
+static LIST_HEAD(func_events);
+
+enum func_states {
+	FUNC_STATE_INIT,
+	FUNC_STATE_FUNC,
+	FUNC_STATE_PARAM,
+	FUNC_STATE_END,
+	FUNC_STATE_ERROR,
+};
+
+static void free_func_event(struct func_event *func_event)
+{
+	if (!func_event)
+		return;
+
+	ftrace_free_filter(&func_event->ops);
+	kfree(func_event->call.print_fmt);
+	kfree(func_event->func);
+	kfree(func_event);
+}
+
+static char *next_token(char **ptr, char *last)
+{
+	char *arg;
+	char *str;
+
+	if (!*ptr)
+		return NULL;
+
+	arg = *ptr;
+
+	if (*last)
+		*arg = *last;
+
+	if (!*arg)
+		return NULL;
+
+	for (str = arg; *str; str++) {
+		if (*str == '(' ||
+		    *str == ')')
+			break;
+	}
+	if (*str) {
+		if (str == arg)
+			str++;
+		*last = *str;
+		*str = 0;
+		*ptr = str;
+		return arg;
+	}
+
+	*last = 0;
+	*ptr = NULL;
+	return arg;
+}
+
+static enum func_states
+process_event(struct func_event *fevent, const char *token, enum func_states state)
+{
+	switch (state) {
+	case FUNC_STATE_INIT:
+		if (!isalpha(token[0]))
+			return FUNC_STATE_ERROR;
+		/* Do not allow wild cards */
+		if (strstr(token, "*") || strstr(token, "?"))
+			return FUNC_STATE_ERROR;
+		fevent->func = kstrdup(token, GFP_KERNEL);
+		if (!fevent->func)
+			return FUNC_STATE_ERROR;
+		return FUNC_STATE_FUNC;
+
+	case FUNC_STATE_FUNC:
+		if (token[0] != '(')
+			return FUNC_STATE_ERROR;
+		return FUNC_STATE_PARAM;
+
+	case FUNC_STATE_PARAM:
+		if (token[0] != ')')
+			return FUNC_STATE_ERROR;
+		return FUNC_STATE_END;
+
+	default:
+		return FUNC_STATE_ERROR;
+	}
+}
+
+static void func_event_trace(struct trace_event_file *trace_file,
+			     struct func_event *func_event,
+			     unsigned long ip, unsigned long parent_ip,
+			     struct pt_regs *pt_regs)
+{
+	struct func_event_hdr *entry;
+	struct trace_event_call *call = &func_event->call;
+	struct ring_buffer_event *event;
+	struct ring_buffer *buffer;
+	unsigned long irq_flags;
+	int size;
+	int pc;
+
+	if (trace_trigger_soft_disabled(trace_file))
+		return;
+
+	local_save_flags(irq_flags);
+	pc = preempt_count();
+
+	size = sizeof(*entry);
+
+	event = trace_event_buffer_lock_reserve(&buffer, trace_file,
+						call->event.type,
+						size, irq_flags, pc);
+	if (!event)
+		return;
+
+	entry = ring_buffer_event_data(event);
+	entry->ip = ip;
+	entry->parent_ip = parent_ip;
+
+	event_trigger_unlock_commit_regs(trace_file, buffer, event,
+					 entry, irq_flags, pc, pt_regs);
+}
+
+static void
+func_event_call(unsigned long ip, unsigned long parent_ip,
+		    struct ftrace_ops *op, struct pt_regs *pt_regs)
+{
+	struct func_event *func_event;
+	struct func_file *ff;
+
+	func_event = container_of(op, struct func_event, ops);
+
+	rcu_read_lock_sched();
+	list_for_each_entry_rcu(ff, &func_event->files, list) {
+		func_event_trace(ff->file, func_event, ip, parent_ip, pt_regs);
+	}
+	rcu_read_unlock_sched();
+}
+
+
+
+static enum print_line_t
+func_event_print(struct trace_iterator *iter, int flags,
+		 struct trace_event *event)
+{
+	struct func_event_hdr *entry;
+	struct trace_seq *s = &iter->seq;
+
+	entry = (struct func_event_hdr *)iter->ent;
+
+	trace_seq_printf(s, "%ps->%ps()",
+			 (void *)entry->parent_ip, (void *)entry->ip);
+	trace_seq_putc(s, '\n');
+	return trace_handle_return(s);
+}
+
+static struct trace_event_functions func_event_funcs = {
+	.trace		= func_event_print,
+};
+
+static int func_event_define_fields(struct trace_event_call *event_call)
+{
+	struct func_event_hdr field;
+	int ret;
+
+	DEFINE_FIELD(unsigned long, ip, "__parent_ip", 0);
+	DEFINE_FIELD(unsigned long, parent_ip, "__ip", 0);
+
+	return 0;
+}
+
+static int enable_func_event(struct func_event *func_event,
+			     struct trace_event_file *file)
+{
+	struct func_file *ff;
+	int ret;
+
+	ff = kmalloc(sizeof(*ff), GFP_KERNEL);
+	if (!ff)
+		return -ENOMEM;
+
+	if (list_empty(&func_event->files)) {
+		ret = register_ftrace_function(&func_event->ops);
+		if (ret < 0)
+			return ret;
+	}
+
+	ff->file = file;
+	/* Make sure file is visible before adding to the list */
+	smp_wmb();
+	list_add_rcu(&ff->list, &func_event->files);
+	return 0;
+}
+
+static int disable_func_event(struct func_event *func_event,
+			      struct trace_event_file *file)
+{
+	struct list_head *p, *n;
+	struct func_file *ff;
+
+
+	list_for_each_safe(p, n, &func_event->files) {
+		ff = container_of(p, struct func_file, list);
+		if (ff->file == file) {
+			list_del_rcu(&ff->list);
+			break;
+		}
+		ff = NULL;
+	}
+
+	if (!ff)
+		return -ENODEV;
+
+	if (list_empty(&func_event->files))
+		unregister_ftrace_function(&func_event->ops);
+
+	synchronize_sched();
+	kfree(ff);
+
+	return 0;
+}
+
+static int func_event_register(struct trace_event_call *event,
+			       enum trace_reg type, void *data)
+{
+	struct func_event *func_event = event->data;
+	struct trace_event_file *file = data;
+
+	switch (type) {
+	case TRACE_REG_REGISTER:
+		return enable_func_event(func_event, file);
+	case TRACE_REG_UNREGISTER:
+		return disable_func_event(func_event, file);
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+static int set_print_fmt(struct func_event *func_event)
+{
+	const char *fmt = "\"%pS->%pS()\", REC->__ip, REC->__parent_ip";
+
+	func_event->call.print_fmt = kstrdup(fmt, GFP_KERNEL);
+	if (!func_event->call.print_fmt)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int func_event_create(struct func_event *func_event)
+{
+	struct trace_event_call *call = &func_event->call;
+	int ret;
+
+	func_event->class.system = FUNC_EVENT_SYSTEM;
+	call->class = &func_event->class;
+	INIT_LIST_HEAD(&call->class->fields);
+	call->event.funcs = &func_event_funcs;
+	call->name = func_event->func;
+	call->class->define_fields = func_event_define_fields;
+	ret = set_print_fmt(func_event);
+	if (ret < 0)
+		return ret;
+	ret = register_trace_event(&call->event);
+	if (ret < 0)
+		return ret;
+	call->flags = TRACE_EVENT_FL_FUNC;
+	call->class->reg = func_event_register;
+	call->data = func_event;
+	ret = trace_add_event_call(call);
+	if (ret) {
+		pr_info("Failed to register func event: %s\n", func_event->func);
+		unregister_trace_event(&call->event);
+	}
+	return ret;
+}
+
+static int create_function_event(int argc, char **argv)
+{
+	struct func_event *func_event, *fe;
+	enum func_states state = FUNC_STATE_INIT;
+	char *token;
+	char *ptr;
+	char last;
+	int ret = -EINVAL;
+	int i;
+
+	func_event = kzalloc(sizeof(*func_event), GFP_KERNEL);
+	if (!func_event)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&func_event->files);
+	func_event->ops.func = func_event_call;
+	func_event->ops.flags = FTRACE_OPS_FL_SAVE_REGS;
+
+	for (i = 0; i < argc; i++) {
+		ptr = argv[i];
+		last = 0;
+		for (token = next_token(&ptr, &last); token;
+		     token = next_token(&ptr, &last)) {
+			state = process_event(func_event, token, state);
+			if (state == FUNC_STATE_ERROR)
+				goto fail;
+		}
+	}
+	if (state != FUNC_STATE_END)
+		goto fail;
+
+	ret = -EALREADY;
+	list_for_each_entry(fe, &func_events, list) {
+		if (strcmp(fe->func, func_event->func) == 0)
+			goto fail;
+	}
+
+	ret = ftrace_set_filter(&func_event->ops, func_event->func,
+				strlen(func_event->func), 0);
+	if (ret < 0)
+		goto fail;
+
+	ret = func_event_create(func_event);
+	if (ret < 0)
+		goto fail;
+
+	list_add_tail(&func_event->list, &func_events);
+	return 0;
+ fail:
+	free_func_event(func_event);
+	return ret;
+}
+
+static void *func_event_seq_start(struct seq_file *m, loff_t *pos)
+{
+	mutex_lock(&func_event_mutex);
+	return seq_list_start(&func_events, *pos);
+}
+
+static void *func_event_seq_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	return seq_list_next(v, &func_events, pos);
+}
+
+static void func_event_seq_stop(struct seq_file *m, void *v)
+{
+	mutex_unlock(&func_event_mutex);
+}
+
+static int func_event_seq_show(struct seq_file *m, void *v)
+{
+	struct func_event *func_event = v;
+
+	seq_printf(m, "%s()\n", func_event->func);
+
+	return 0;
+}
+
+static const struct seq_operations func_event_seq_op = {
+	.start  = func_event_seq_start,
+	.next   = func_event_seq_next,
+	.stop   = func_event_seq_stop,
+	.show   = func_event_seq_show
+};
+
+static int release_all_func_events(void)
+{
+	struct func_event *func_event, *n;
+	int ret;
+
+	list_for_each_entry_safe(func_event, n, &func_events, list) {
+		ret = trace_remove_event_call(&func_event->call);
+		if (ret < 0)
+			return ret;
+		list_del(&func_event->list);
+		free_func_event(func_event);
+	}
+	return 0;
+}
+
+static int func_event_open(struct inode *inode, struct file *file)
+{
+	int ret;
+
+	if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) {
+		ret = release_all_func_events();
+		if (ret < 0)
+			return ret;
+	}
+
+	return seq_open(file, &func_event_seq_op);
+}
+
+static ssize_t
+func_event_write(struct file *filp, const char __user *ubuf,
+		 size_t cnt, loff_t *ppos)
+{
+	return trace_parse_run_command(filp, ubuf, cnt, ppos,
+				       create_function_event);
+}
+
+static const struct file_operations func_event_fops = {
+	.open		= func_event_open,
+	.read           = seq_read,
+	.llseek         = seq_lseek,
+	.release        = seq_release,
+	.write		= func_event_write,
+};
+
+void create_function_event_file(struct dentry *d_tracer)
+{
+	struct dentry *d;
+
+	d = trace_create_file("function_events", 0644, d_tracer, NULL,
+			      &func_event_fops);
+	WARN(!d, "Failed to create function_events file");
+}
+
+/* Make a tracefs interface for controlling probe points */
+static __init int init_func_events(void)
+{
+	struct dentry *d_tracer;
+	struct dentry *entry;
+
+	d_tracer = tracing_init_dentry();
+	if (IS_ERR(d_tracer))
+		return 0;
+
+	entry = trace_create_file("function_events", 0644, d_tracer, NULL,
+				  &func_event_fops);
+
+	/* Event list interface */
+	if (!entry)
+		pr_warn("Could not create tracefs 'function_events' entry\n");
+
+	return 0;
+}
+fs_initcall(init_func_events);
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index fb66e3eaa192..a51caafd993b 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -49,17 +49,6 @@
 #define FIELD_STRING_RETIP	"__probe_ret_ip"
 #define FIELD_STRING_FUNC	"__probe_func"
 
-#undef DEFINE_FIELD
-#define DEFINE_FIELD(type, item, name, is_signed)			\
-	do {								\
-		ret = trace_define_field(event_call, #type, name,	\
-					 offsetof(typeof(field), item),	\
-					 sizeof(field.item), is_signed, \
-					 FILTER_OTHER);			\
-		if (ret)						\
-			return ret;					\
-	} while (0)
-
 
 /* Flags for trace_probe */
 #define TP_FLAG_TRACE		1
-- 
2.15.1


* [PATCH 02/18] tracing: Add documentation for function based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
  2018-02-02 23:04 ` [PATCH 01/18] tracing: Add " Steven Rostedt
@ 2018-02-02 23:05 ` Steven Rostedt
  2018-02-02 23:05 ` [PATCH 03/18] tracing: Add simple arguments to " Steven Rostedt
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

[-- Attachment #1: 0002-tracing-Add-documentation-for-function-based-events.patch --]
[-- Type: text/plain, Size: 3380 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Start documenting the usage of function based events. This only gives an
introduction to function based events. As new features are added, they
will be documented in the same file.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 Documentation/trace/function-based-events.rst | 69 +++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)
 create mode 100644 Documentation/trace/function-based-events.rst

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
new file mode 100644
index 000000000000..843a1bf76459
--- /dev/null
+++ b/Documentation/trace/function-based-events.rst
@@ -0,0 +1,69 @@
+=====================
+Function based events
+=====================
+
+.. Copyright 2018 VMware Inc.
+..   Author:   Steven Rostedt <srostedt@goodmis.org>
+..  License:   The GNU Free Documentation License, Version 1.2
+..               (dual licensed under the GPL v2)
+
+
+Introduction
+============
+
+Static events are extremely useful for analyzing what happens
+inside the Linux kernel. But there are times where events are not
+available, either due to not being in control of the kernel, or simply
+because a maintainer refuses to have them in their subsystem.
+
+The function tracer is a way to trace within a subsystem without trace events.
+But it only provides information about when a function was hit and who
+called it. Combining trace events with the function tracer allows
+for dynamically creating trace events where they do not exist at
+function entry. They provide more information than the function
+tracer can provide, as they can read the parameters of a function
+or simply read an address. This makes it possible to create a
+trace point at any function that the function tracer can trace, and
+read the parameters of the function.
+
+
+Usage
+=====
+
+Simply writing an ASCII string into a file called "function_events"
+in the tracefs file system will create the function based events.
+Note, this file is only writable by root.
+
+ # mount -t tracefs nodev /sys/kernel/tracing
+ # cd /sys/kernel/tracing
+ # echo 'do_IRQ()' > function_events
+
+The above will create a trace event on the do_IRQ function call.
+As no parameters were specified, it will not trace anything other
+than the function and the parent. This is the minimum function
+based event.
+
+ # ls events/functions/do_IRQ
+enable  filter  format  hist  id  trigger
+
+Even though the above function based event does not record much more
+than the function tracer does, it does become a full-fledged event.
+It can be used by the histogram infrastructure and by triggers.
+
+ # cat events/functions/do_IRQ/format
+name: do_IRQ
+ID: 1304
+format:
+	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
+	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
+	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
+	field:int common_pid;	offset:4;	size:4;	signed:1;
+
+	field:unsigned long __parent_ip;	offset:8;	size:8;	signed:0;
+	field:unsigned long __ip;	offset:16;	size:8;	signed:0;
+
+print fmt: "%pS->%pS()", REC->__ip, REC->__parent_ip
+
+The above shows that the format is very close to the function trace
+except that it displays the parent function followed by the called
+function.
-- 
2.15.1


* [PATCH 03/18] tracing: Add simple arguments to function based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
  2018-02-02 23:04 ` [PATCH 01/18] tracing: Add " Steven Rostedt
  2018-02-02 23:05 ` [PATCH 02/18] tracing: Add documentation for " Steven Rostedt
@ 2018-02-02 23:05 ` Steven Rostedt
  2018-02-08 10:18   ` Namhyung Kim
  2018-02-02 23:05 ` [PATCH 04/18] tracing/x86: Add arch_get_func_args() function Steven Rostedt
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

[-- Attachment #1: 0003-tracing-Add-simple-arguments-to-function-based-event.patch --]
[-- Type: text/plain, Size: 15252 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

The function based events can now have arguments passed in. A weak function
arch_get_func_args() is created so that archs can fill in the arguments
based on pt_regs. Currently no arch implements this function, so no
arguments are returned. Passing NULL for pt_regs into this function returns
the number of supported args that can be processed, and the format will
only allow the user to add valid args, which currently are all arguments
until an arch implements the arch_get_func_args() function.

[ missing 'static' found by Fengguang Wu's kbuild test robot ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
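
Not part of the patch: a small userspace sketch of how the ATOM type names
from the grammar documented below could map to a size and a signedness.
The table, the struct and find_atom() are hypothetical and are not the
lookup used in trace_event_ftrace.c; in particular, treating 'char' as
signed is an assumption made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one ATOM type from the documented grammar. */
struct atom_type {
	const char *name;
	size_t size;
	int is_signed;
};

static const struct atom_type atom_types[] = {
	{ "u8",     sizeof(uint8_t),  0 },
	{ "u16",    sizeof(uint16_t), 0 },
	{ "u32",    sizeof(uint32_t), 0 },
	{ "u64",    sizeof(uint64_t), 0 },
	{ "s8",     sizeof(int8_t),   1 },
	{ "s16",    sizeof(int16_t),  1 },
	{ "s32",    sizeof(int32_t),  1 },
	{ "s64",    sizeof(int64_t),  1 },
	{ "char",   sizeof(char),     1 },	/* signedness assumed */
	{ "short",  sizeof(short),    1 },
	{ "int",    sizeof(int),      1 },
	{ "long",   sizeof(long),     1 },
	{ "size_t", sizeof(size_t),   0 },
};

/* Return the table entry for 'name', or NULL if it is not a known ATOM. */
static const struct atom_type *find_atom(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(atom_types) / sizeof(atom_types[0]); i++) {
		if (strcmp(atom_types[i].name, name) == 0)
			return &atom_types[i];
	}
	return NULL;
}
```

A parser for "TYPE FIELD" could use the returned size to lay out the next
field in the record, and reject the write when find_atom() returns NULL.
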
 Documentation/trace/function-based-events.rst |  57 +++++
 kernel/trace/trace_event_ftrace.c             | 319 ++++++++++++++++++++++++--
 2 files changed, 362 insertions(+), 14 deletions(-)

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index 843a1bf76459..94c2c975295a 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -67,3 +67,60 @@ print fmt: "%pS->%pS()", REC->__ip, REC->__parent_ip
 The above shows that the format is very close to the function trace
 except that it displays the parent function followed by the called
 function.
+
+
+Number of arguments
+===================
+
+The number of arguments that can be specified is dependent on the
+architecture. An architecture may not allow any arguments, or it
+may limit to just three or six. If more arguments are used than
+supported, it will fail with -EINVAL.
+
+Parameters
+==========
+
+Adding parameters creates fields within the events. The format is
+as follows:
+
+ # echo EVENT > function_events
+
+ EVENT := <function> '(' ARGS ')'
+
+ Where <function> is any function that the function tracer can trace.
+
+ ARGS := ARG | ARG ',' ARGS | ''
+
+ ARG := TYPE FIELD
+
+ TYPE := ATOM
+
+ ATOM := 'u8' | 'u16' | 'u32' | 'u64' |
+         's8' | 's16' | 's32' | 's64' |
+         'char' | 'short' | 'int' | 'long' | 'size_t'
+
+ FIELD := <name>
+
+ Where <name> is a unique string starting with an alphabetic character
+ and consisting only of letters, numbers, and underscores.
+
+
+Simple arguments
+================
+
+Looking at kernel code, we can see something like:
+
+ v4.15: net/ipv4/ip_input.c:
+
+int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
+
+If we are only interested in the first argument (skb):
+
+ # echo 'ip_rcv(u64 skb, u64 dev)' > function_events
+
+ # echo 1 > events/functions/ip_rcv/enable
+ # cat trace
+     <idle>-0     [003] ..s3  2119.041935: __netif_receive_skb_core->ip_rcv(skb=18446612136982403072, dev=18446612136968273920)
+     <idle>-0     [003] ..s3  2119.041944: __netif_receive_skb_core->ip_rcv(skb=18446612136982403072, dev=18446612136968273920)
+     <idle>-0     [003] ..s3  2119.288337: __netif_receive_skb_core->ip_rcv(skb=18446612136982403072, dev=18446612136968273920)
+     <idle>-0     [003] ..s3  2119.288960: __netif_receive_skb_core->ip_rcv(skb=18446612136982403072, dev=18446612136968273920)
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index aaf62a0b1770..66465be1e6d5 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -13,6 +13,16 @@
 #define FUNC_EVENT_SYSTEM "functions"
 #define WRITE_BUFSIZE  4096
 
+struct func_arg {
+	struct list_head		list;
+	char				*type;
+	char				*name;
+	short				offset;
+	short				size;
+	char				arg;
+	char				sign;
+};
+
 struct func_event {
 	struct list_head		list;
 	char				*func;
@@ -20,6 +30,10 @@ struct func_event {
 	struct trace_event_call		call;
 	struct ftrace_ops		ops;
 	struct list_head		files;
+	struct list_head		args;
+	struct func_arg			*last_arg;
+	int				arg_cnt;
+	int				arg_offset;
 };
 
 struct func_file {
@@ -31,6 +45,7 @@ struct func_event_hdr {
 	struct trace_entry	ent;
 	unsigned long		ip;
 	unsigned long		parent_ip;
+	char			data[0];
 };
 
 static DEFINE_MUTEX(func_event_mutex);
@@ -40,15 +55,89 @@ enum func_states {
 	FUNC_STATE_INIT,
 	FUNC_STATE_FUNC,
 	FUNC_STATE_PARAM,
+	FUNC_STATE_TYPE,
+	FUNC_STATE_VAR,
+	FUNC_STATE_COMMA,
 	FUNC_STATE_END,
 	FUNC_STATE_ERROR,
 };
 
+#define TYPE_TUPLE(type)			\
+	{ #type, sizeof(type), is_signed_type(type) }
+
+static struct func_type {
+	char		*name;
+	int		size;
+	int		sign;
+} func_types[] = {
+	TYPE_TUPLE(long),
+	TYPE_TUPLE(int),
+	TYPE_TUPLE(short),
+	TYPE_TUPLE(char),
+	TYPE_TUPLE(size_t),
+	TYPE_TUPLE(u64),
+	TYPE_TUPLE(s64),
+	TYPE_TUPLE(u32),
+	TYPE_TUPLE(s32),
+	TYPE_TUPLE(u16),
+	TYPE_TUPLE(s16),
+	TYPE_TUPLE(u8),
+	TYPE_TUPLE(s8),
+	{ NULL,		0,	0 }
+};
+
+/**
+ * arch_get_func_args - retrieve function arguments via pt_regs
+ * @regs: The registers at the moment the function is called
+ * @start: The first argument to retrieve (usually zero)
+ * @end: One past the last argument to retrieve (end - start arguments to get)
+ * @args: The array to store the arguments in
+ *
+ * This is to be implemented by architecture code.
+ *
+ * If @regs is NULL, return the number of supported arguments that
+ * can be retrieved (this default function supports no arguments,
+ * and returns zero). The other parameters are ignored when @regs
+ * is NULL.
+ *
+ * If the function can support 6 arguments, then it should return
+ * 6 if @regs is NULL. If @regs is not NULL, it should start
+ * loading the arguments into @args. If @start is 2 and @end is 4,
+ * @args[0] would get the third argument (0 is the first argument)
+ * and @args[1] would get the fourth argument. The function would
+ * return 2 (@end - @start).
+ *
+ * If @start is 5 and @end is 7, as @end is greater than the number
+ * of supported arguments, @args[0] would get the sixth argument,
+ * and 1 would be returned. The function does not error if more
+ * than the supported arguments are asked for. It only loads what it
+ * can into @args, and returns the number of arguments copied.
+ *
+ * Returns:
+ *  If @regs is NULL, the number of supported arguments it can handle.
+ *
+ *  Otherwise, it returns the number of arguments copied to @args.
+ */
+int __weak arch_get_func_args(struct pt_regs *regs,
+			      int start, int end,
+			      long *args)
+{
+	return 0;
+}
+
 static void free_func_event(struct func_event *func_event)
 {
+	struct func_arg *arg, *n;
+
 	if (!func_event)
 		return;
 
+	list_for_each_entry_safe(arg, n, &func_event->args, list) {
+		list_del(&arg->list);
+		kfree(arg->name);
+		kfree(arg->type);
+		kfree(arg);
+	}
 	ftrace_free_filter(&func_event->ops);
 	kfree(func_event->call.print_fmt);
 	kfree(func_event->func);
@@ -73,6 +162,7 @@ static char *next_token(char **ptr, char *last)
 
 	for (str = arg; *str; str++) {
 		if (*str == '(' ||
+		    *str == ',' ||
 		    *str == ')')
 			break;
 	}
@@ -90,34 +180,99 @@ static char *next_token(char **ptr, char *last)
 	return arg;
 }
 
+static int add_arg(struct func_event *fevent, int ftype)
+{
+	struct func_type *func_type = &func_types[ftype];
+	struct func_arg *arg;
+
+	/* Make sure the arch can support this many args */
+	if (fevent->arg_cnt >= arch_get_func_args(NULL, 0, 0, NULL))
+		return -EINVAL;
+
+	arg = kzalloc(sizeof(*arg), GFP_KERNEL);
+	if (!arg)
+		return -ENOMEM;
+
+	arg->type = kstrdup(func_type->name, GFP_KERNEL);
+	if (!arg->type) {
+		kfree(arg);
+		return -ENOMEM;
+	}
+	arg->size = func_type->size;
+	arg->sign = func_type->sign;
+	arg->offset = ALIGN(fevent->arg_offset, arg->size);
+	arg->arg = fevent->arg_cnt;
+	fevent->arg_offset = arg->offset + arg->size;
+
+	list_add_tail(&arg->list, &fevent->args);
+	fevent->last_arg = arg;
+	fevent->arg_cnt++;
+
+	return 0;
+}
+
 static enum func_states
 process_event(struct func_event *fevent, const char *token, enum func_states state)
 {
+	int ret;
+	int i;
+
 	switch (state) {
 	case FUNC_STATE_INIT:
 		if (!isalpha(token[0]))
-			return FUNC_STATE_ERROR;
+			break;
 		/* Do not allow wild cards */
 		if (strstr(token, "*") || strstr(token, "?"))
-			return FUNC_STATE_ERROR;
+			break;
 		fevent->func = kstrdup(token, GFP_KERNEL);
 		if (!fevent->func)
-			return FUNC_STATE_ERROR;
+			break;
 		return FUNC_STATE_FUNC;
 
 	case FUNC_STATE_FUNC:
 		if (token[0] != '(')
-			return FUNC_STATE_ERROR;
+			break;
 		return FUNC_STATE_PARAM;
 
 	case FUNC_STATE_PARAM:
-		if (token[0] != ')')
-			return FUNC_STATE_ERROR;
-		return FUNC_STATE_END;
+		if (token[0] == ')')
+			return FUNC_STATE_END;
+		/* Fall through */
+	case FUNC_STATE_COMMA:
+		for (i = 0; func_types[i].size; i++) {
+			if (strcmp(token, func_types[i].name) == 0)
+				break;
+		}
+		if (!func_types[i].size)
+			break;
+		ret = add_arg(fevent, i);
+		if (ret < 0)
+			break;
+		return FUNC_STATE_TYPE;
+
+	case FUNC_STATE_TYPE:
+		if (!isalpha(token[0]))
+			break;
+		if (WARN_ON(!fevent->last_arg))
+			break;
+		fevent->last_arg->name = kstrdup(token, GFP_KERNEL);
+		if (!fevent->last_arg->name)
+			break;
+		return FUNC_STATE_VAR;
+
+	case FUNC_STATE_VAR:
+		switch (token[0]) {
+		case ')':
+			return FUNC_STATE_END;
+		case ',':
+			return FUNC_STATE_COMMA;
+		}
+		break;
 
 	default:
-		return FUNC_STATE_ERROR;
+		break;
 	}
+	return FUNC_STATE_ERROR;
 }
 
 static void func_event_trace(struct trace_event_file *trace_file,
@@ -129,9 +284,14 @@ static void func_event_trace(struct trace_event_file *trace_file,
 	struct trace_event_call *call = &func_event->call;
 	struct ring_buffer_event *event;
 	struct ring_buffer *buffer;
+	struct func_arg *arg;
+	long args[func_event->arg_cnt];
+	long long val = 1;
 	unsigned long irq_flags;
+	int nr_args;
 	int size;
 	int pc;
+	int i = 0;
 
 	if (trace_trigger_soft_disabled(trace_file))
 		return;
@@ -139,7 +299,7 @@ static void func_event_trace(struct trace_event_file *trace_file,
 	local_save_flags(irq_flags);
 	pc = preempt_count();
 
-	size = sizeof(*entry);
+	size = func_event->arg_offset + sizeof(*entry);
 
 	event = trace_event_buffer_lock_reserve(&buffer, trace_file,
 						call->event.type,
@@ -150,6 +310,15 @@ static void func_event_trace(struct trace_event_file *trace_file,
 	entry = ring_buffer_event_data(event);
 	entry->ip = ip;
 	entry->parent_ip = parent_ip;
+	nr_args = arch_get_func_args(pt_regs, 0, func_event->arg_cnt, args);
+
+	list_for_each_entry(arg, &func_event->args, list) {
+		if (i < nr_args)
+			val = args[i];
+		else
+			val = 0;
+		memcpy(&entry->data[arg->offset], &val, arg->size);
+	}
 
 	event_trigger_unlock_commit_regs(trace_file, buffer, event,
 					 entry, irq_flags, pc, pt_regs);
@@ -171,7 +340,26 @@ func_event_call(unsigned long ip, unsigned long parent_ip,
 	rcu_read_unlock_sched();
 }
 
+#define FMT_SIZE	8
+
+static void make_fmt(struct func_arg *arg, char *fmt)
+{
+	int c = 0;
 
+	fmt[c++] = '%';
+
+	if (arg->size == 8) {
+		fmt[c++] = 'l';
+		fmt[c++] = 'l';
+	}
+
+	if (arg->sign)
+		fmt[c++] = 'd';
+	else
+		fmt[c++] = 'u';
+
+	fmt[c++] = '\0';
+}
 
 static enum print_line_t
 func_event_print(struct trace_iterator *iter, int flags,
@@ -179,12 +367,43 @@ func_event_print(struct trace_iterator *iter, int flags,
 {
 	struct func_event_hdr *entry;
 	struct trace_seq *s = &iter->seq;
+	struct func_event *func_event;
+	struct func_arg *arg;
+	char fmt[FMT_SIZE];
+	void *data;
+	bool comma = false;
 
 	entry = (struct func_event_hdr *)iter->ent;
 
-	trace_seq_printf(s, "%ps->%ps()",
+	func_event = container_of(event, struct func_event, call.event);
+
+	trace_seq_printf(s, "%ps->%ps(",
 			 (void *)entry->parent_ip, (void *)entry->ip);
-	trace_seq_putc(s, '\n');
+	list_for_each_entry(arg, &func_event->args, list) {
+		if (comma)
+			trace_seq_puts(s, ", ");
+		comma = true;
+		trace_seq_printf(s, "%s=", arg->name);
+		data = &entry->data[arg->offset];
+
+		make_fmt(arg, fmt);
+
+		switch (arg->size) {
+		case 8:
+			trace_seq_printf(s, fmt, *(unsigned long long *)data);
+			break;
+		case 4:
+			trace_seq_printf(s, fmt, *(unsigned *)data);
+			break;
+		case 2:
+			trace_seq_printf(s, fmt, *(unsigned short *)data);
+			break;
+		case 1:
+			trace_seq_printf(s, fmt, *(unsigned char *)data);
+			break;
+		}
+	}
+	trace_seq_puts(s, ")\n");
 	return trace_handle_return(s);
 }
 
@@ -194,12 +413,25 @@ static struct trace_event_functions func_event_funcs = {
 
 static int func_event_define_fields(struct trace_event_call *event_call)
 {
+	struct func_event *fevent;
 	struct func_event_hdr field;
+	struct func_arg *arg;
 	int ret;
 
+	fevent = (struct func_event *)event_call->data;
+
 	DEFINE_FIELD(unsigned long, ip, "__parent_ip", 0);
 	DEFINE_FIELD(unsigned long, parent_ip, "__ip", 0);
 
+	list_for_each_entry(arg, &fevent->args, list) {
+		ret = trace_define_field(event_call, arg->type,
+					 arg->name,
+					 sizeof(field) + arg->offset,
+					 arg->size, arg->sign,
+					 FILTER_OTHER);
+		if (ret < 0)
+			return ret;
+	}
 	return 0;
 }
 
@@ -272,13 +504,61 @@ static int func_event_register(struct trace_event_call *event,
 	return 0;
 }
 
+static int update_len(int len, int i)
+{
+	len -= i;
+	if (len < 0)
+		return 0;
+	return len;
+}
+
+static int __set_print_fmt(struct func_event *func_event,
+			   char *buf, int len)
+{
+	struct func_arg *arg;
+	const char *fmt_start = "\"%pS->%pS(";
+	const char *fmt_end = ")\", REC->__ip, REC->__parent_ip";
+	char fmt[FMT_SIZE];
+	int r, i;
+	bool comma = false;
+
+	r = snprintf(buf, len, "%s", fmt_start);
+	len = update_len(len, r);
+	list_for_each_entry(arg, &func_event->args, list) {
+		if (comma) {
+			i = snprintf(buf + r, len, ", ");
+			r += i;
+			len = update_len(len, i);
+		}
+		comma = true;
+		make_fmt(arg, fmt);
+		i = snprintf(buf + r, len, "%s=%s", arg->name, fmt);
+		r += i;
+		len = update_len(len, i);
+	}
+	i = snprintf(buf + r, len, "%s", fmt_end);
+	r += i;
+	len = update_len(len, i);
+
+	list_for_each_entry(arg, &func_event->args, list) {
+		i = snprintf(buf + r, len, ", REC->%s", arg->name);
+		r += i;
+		len = update_len(len, i);
+	}
+
+	return r;
+}
+
 static int set_print_fmt(struct func_event *func_event)
 {
-	const char *fmt = "\"%pS->%pS()\", REC->__ip, REC->__parent_ip";
+	int len;
 
-	func_event->call.print_fmt = kstrdup(fmt, GFP_KERNEL);
+	/* Get required length */
+	len = __set_print_fmt(func_event, NULL, 0) + 1;
+	func_event->call.print_fmt = kmalloc(len, GFP_KERNEL);
 	if (!func_event->call.print_fmt)
 		return -ENOMEM;
+	__set_print_fmt(func_event, func_event->call.print_fmt, len);
 
 	return 0;
 }
@@ -326,6 +606,7 @@ static int create_function_event(int argc, char **argv)
 		return -ENOMEM;
 
 	INIT_LIST_HEAD(&func_event->files);
+	INIT_LIST_HEAD(&func_event->args);
 	func_event->ops.func = func_event_call;
 	func_event->ops.flags = FTRACE_OPS_FL_SAVE_REGS;
 
@@ -383,8 +664,18 @@ static void func_event_seq_stop(struct seq_file *m, void *v)
 static int func_event_seq_show(struct seq_file *m, void *v)
 {
 	struct func_event *func_event = v;
+	struct func_arg *arg;
+	bool comma = false;
 
-	seq_printf(m, "%s()\n", func_event->func);
+	seq_printf(m, "%s(", func_event->func);
+
+	list_for_each_entry(arg, &func_event->args, list) {
+		if (comma)
+			seq_puts(m, ", ");
+		comma = true;
+		seq_printf(m, "%s %s", arg->type, arg->name);
+	}
+	seq_puts(m, ")\n");
 
 	return 0;
 }
-- 
2.15.1


* [PATCH 04/18] tracing/x86: Add arch_get_func_args() function
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (2 preceding siblings ...)
  2018-02-02 23:05 ` [PATCH 03/18] tracing: Add simple arguments to " Steven Rostedt
@ 2018-02-02 23:05 ` Steven Rostedt
  2018-02-05 16:33   ` Masami Hiramatsu
  2018-02-08  5:28   ` Namhyung Kim
  2018-02-02 23:05 ` [PATCH 05/18] tracing: Add hex print for dynamic ftrace based events Steven Rostedt
                   ` (17 subsequent siblings)
  21 siblings, 2 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

[-- Attachment #1: 0004-tracing-x86-Add-arch_get_func_args-function.patch --]
[-- Type: text/plain, Size: 1130 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Add function to get the function arguments from pt_regs.
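
For readers unfamiliar with the register order, here is a small userspace
sketch of the x86_64 case. It assumes the System V AMD64 convention
(rdi, rsi, rdx, rcx, r8, r9) and treats @end as exclusive, as the kernel-doc
of arch_get_func_args() describes. struct demo_pt_regs and the demo_* names
are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of x86_64 pt_regs: only the argument registers. */
struct demo_pt_regs {
	long di, si, dx, cx, r8, r9;
};

#define DEMO_MAX_ARGS 6

/*
 * Copy arguments [start, end) into args[], clamped to what the
 * registers can hold. With regs == NULL, report the supported count.
 */
static int demo_get_func_args(const struct demo_pt_regs *regs,
			      int start, int end, long *args)
{
	int i;

	if (!regs)
		return DEMO_MAX_ARGS;

	{
		const long pt_args[DEMO_MAX_ARGS] = {
			regs->di, regs->si, regs->dx,
			regs->cx, regs->r8, regs->r9
		};

		for (i = start; i < end && i < DEMO_MAX_ARGS; i++)
			args[i - start] = pt_args[i];
	}

	/* Number of arguments actually copied. */
	return i - start;
}
```

Asking for arguments 5 and 6 (start 5, end 7) copies only the sixth
argument and returns 1, matching the clamping behaviour in the kernel-doc.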

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 arch/x86/kernel/ftrace.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 01ebcb6f263e..5e845c8cf89d 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -46,6 +46,34 @@ int ftrace_arch_code_modify_post_process(void)
 	return 0;
 }
 
+int arch_get_func_args(struct pt_regs *regs,
+		       int start, int end, long *args)
+{
+#ifdef CONFIG_X86_64
+# define MAX_ARGS 6
+# define INIT_REGS				\
+	{	regs->di, regs->si, regs->dx,	\
+		regs->cx, regs->r8, regs->r9	\
+	}
+#else
+# define MAX_ARGS 3
+# define INIT_REGS				\
+	{	regs->ax, regs->dx, regs->cx	}
+#endif
+	if (!regs)
+		return MAX_ARGS;
+
+	{
+		long pt_args[] = INIT_REGS;
+		int i;
+
+		for (i = start; i < end && i < MAX_ARGS; i++)
+			args[i - start] = pt_args[i];
+
+		return i - start;
+	}
+}
+
 union ftrace_code_union {
 	char code[MCOUNT_INSN_SIZE];
 	struct {
-- 
2.15.1


* [PATCH 05/18] tracing: Add hex print for dynamic ftrace based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (3 preceding siblings ...)
  2018-02-02 23:05 ` [PATCH 04/18] tracing/x86: Add arch_get_func_args() function Steven Rostedt
@ 2018-02-02 23:05 ` Steven Rostedt
  2018-02-02 23:05 ` [PATCH 06/18] tracing: Add indirect offset to args of " Steven Rostedt
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

[-- Attachment #1: 0005-tracing-Add-hex-print-for-dynamic-ftrace-based-event.patch --]
[-- Type: text/plain, Size: 3370 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Add x64, x32, x16 and x8 to represent numbers of the same size in hex.
Similar to u64, u32, u16, and u8 but uses %x instead of %u.
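
To make the effect on the print format concrete, below is a userspace
sketch of the format selection that make_fmt() performs after this change.
struct demo_arg and demo_make_fmt() are hypothetical stand-ins that carry
only the fields the selection depends on.

```c
#include <assert.h>
#include <string.h>

#define DEMO_FMT_SIZE 8

/* Hypothetical reduced version of struct func_arg. */
struct demo_arg {
	const char *type;	/* e.g. "x64", "u32", "s16" */
	int size;		/* size of the type in bytes */
	int sign;		/* non-zero if the type is signed */
};

/* Build a printf conversion for the argument, e.g. "%llx" or "%d". */
static void demo_make_fmt(const struct demo_arg *arg, char *fmt)
{
	int c = 0;

	fmt[c++] = '%';

	/* 8-byte values need the "ll" length modifier. */
	if (arg->size == 8) {
		fmt[c++] = 'l';
		fmt[c++] = 'l';
	}

	/* Types whose name starts with 'x' print as hex. */
	if (arg->type[0] == 'x')
		fmt[c++] = 'x';
	else if (arg->sign)
		fmt[c++] = 'd';
	else
		fmt[c++] = 'u';

	fmt[c] = '\0';
}
```

The longest result is "%llx" or "%lld", five bytes including the
terminator, so it fits in the eight-byte buffer.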

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 Documentation/trace/function-based-events.rst | 14 +++++++++-----
 kernel/trace/trace_event_ftrace.c             | 13 ++++++++++++-
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index 94c2c975295a..f27a0c4e829c 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -97,6 +97,7 @@ as follows:
 
  ATOM := 'u8' | 'u16' | 'u32' | 'u64' |
          's8' | 's16' | 's32' | 's64' |
+         'x8' | 'x16' | 'x32' | 'x64' |
          'char' | 'short' | 'int' | 'long' | 'size_t'
 
  FIELD := <name>
@@ -116,11 +117,14 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 
 If we are only interested in the first argument (skb):
 
- # echo 'ip_rcv(u64 skb, u64 dev)' > function_events
+ # echo 'ip_rcv(x64 skb, x64 dev)' > function_events
 
  # echo 1 > events/functions/ip_rcv/enable
  # cat trace
-     <idle>-0     [003] ..s3  2119.041935: __netif_receive_skb_core->ip_rcv(skb=18446612136982403072, dev=18446612136968273920)
-     <idle>-0     [003] ..s3  2119.041944: __netif_receive_skb_core->ip_rcv(skb=18446612136982403072, dev=18446612136968273920)
-     <idle>-0     [003] ..s3  2119.288337: __netif_receive_skb_core->ip_rcv(skb=18446612136982403072, dev=18446612136968273920)
-     <idle>-0     [003] ..s3  2119.288960: __netif_receive_skb_core->ip_rcv(skb=18446612136982403072, dev=18446612136968273920)
+     <idle>-0     [003] ..s3  5543.133460: __netif_receive_skb_core->ip_rcv(skb=ffff88007f960700, dev=ffff880114250000)
+     <idle>-0     [003] ..s3  5543.133475: __netif_receive_skb_core->ip_rcv(skb=ffff88007f960700, dev=ffff880114250000)
+     <idle>-0     [003] ..s3  5543.312592: __netif_receive_skb_core->ip_rcv(skb=ffff88007f960700, dev=ffff880114250000)
+     <idle>-0     [003] ..s3  5543.313150: __netif_receive_skb_core->ip_rcv(skb=ffff88007f960700, dev=ffff880114250000)
+
+We use "x64" in order to make sure that the data is displayed in hex.
+This is on a x86_64 machine, and we know the pointer sizes are 8 bytes.
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index 66465be1e6d5..aa19c8af9d34 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -62,6 +62,11 @@ enum func_states {
 	FUNC_STATE_ERROR,
 };
 
+typedef u64 x64;
+typedef u32 x32;
+typedef u16 x16;
+typedef u8 x8;
+
 #define TYPE_TUPLE(type)			\
 	{ #type, sizeof(type), is_signed_type(type) }
 
@@ -77,12 +82,16 @@ static struct func_type {
 	TYPE_TUPLE(size_t),
 	TYPE_TUPLE(u64),
 	TYPE_TUPLE(s64),
+	TYPE_TUPLE(x64),
 	TYPE_TUPLE(u32),
 	TYPE_TUPLE(s32),
+	TYPE_TUPLE(x32),
 	TYPE_TUPLE(u16),
 	TYPE_TUPLE(s16),
+	TYPE_TUPLE(x16),
 	TYPE_TUPLE(u8),
 	TYPE_TUPLE(s8),
+	TYPE_TUPLE(x8),
 	{ NULL,		0,	0 }
 };
 
@@ -353,7 +362,9 @@ static void make_fmt(struct func_arg *arg, char *fmt)
 		fmt[c++] = 'l';
 	}
 
-	if (arg->sign)
+	if (arg->type[0] == 'x')
+		fmt[c++] = 'x';
+	else if (arg->sign)
 		fmt[c++] = 'd';
 	else
 		fmt[c++] = 'u';
-- 
2.15.1


* [PATCH 06/18] tracing: Add indirect offset to args of ftrace based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (4 preceding siblings ...)
  2018-02-02 23:05 ` [PATCH 05/18] tracing: Add hex print for dynamic ftrace based events Steven Rostedt
@ 2018-02-02 23:05 ` Steven Rostedt
  2018-02-02 23:05 ` [PATCH 07/18] tracing: Add dereferencing multiple fields per arg Steven Rostedt
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

[-- Attachment #1: 0006-tracing-Add-indirect-offset-to-args-of-ftrace-based-.patch --]
[-- Type: text/plain, Size: 5984 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Add '[' ']' syntax to allow values to be read indirectly from the arguments.
For example:

 echo 'replenish_dl_entity(s64 dl_se[4])' > function_events

This reads the 64-bit word at index 4 (counting from zero) from the memory
that the first parameter points to, treating that pointer as an array.
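
The index handling can be sketched in userspace as follows. The index is
scaled by the element size into a byte offset and XORed with a flag, so
that index 0 presumably stays distinguishable from "no indirection"
(a zero indirect field). The demo_* names are hypothetical, and memcpy()
stands in for probe_kernel_read(), so nothing here is fault-safe.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_INDIRECT_FLAG 0x10000000L

/* Turn an element index into a flagged byte offset. */
static long demo_encode_indirect(long index, int size)
{
	return (index * size) ^ DEMO_INDIRECT_FLAG;
}

/* Recover the plain byte offset from the flagged value. */
static long demo_decode_offset(long indirect)
{
	return indirect ^ DEMO_INDIRECT_FLAG;
}

/* Read an unsigned value of 'size' bytes at base + decoded offset. */
static unsigned long long demo_read_indirect(const void *base,
					     long indirect, int size)
{
	const unsigned char *p =
		(const unsigned char *)base + demo_decode_offset(indirect);
	uint64_t v64;
	uint32_t v32;
	uint16_t v16;
	uint8_t v8;

	switch (size) {
	case 8:
		memcpy(&v64, p, sizeof(v64));
		return v64;
	case 4:
		memcpy(&v32, p, sizeof(v32));
		return v32;
	case 2:
		memcpy(&v16, p, sizeof(v16));
		return v16;
	case 1:
		memcpy(&v8, p, sizeof(v8));
		return v8;
	}
	return 0;	/* unsupported size */
}
```

For instance, an 'int' field at index 32 decodes to byte offset 128, which
is how the documentation below reaches skb->len.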

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 Documentation/trace/function-based-events.rst | 32 +++++++++++-
 kernel/trace/trace_event_ftrace.c             | 73 +++++++++++++++++++++++++--
 2 files changed, 101 insertions(+), 4 deletions(-)

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index f27a0c4e829c..7d67229e8e88 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -100,11 +100,15 @@ as follows:
          'x8' | 'x16' | 'x32' | 'x64' |
          'char' | 'short' | 'int' | 'long' | 'size_t'
 
- FIELD := <name>
+ FIELD := <name> | <name> INDEX
+
+ INDEX := '[' <number> ']'
 
  Where <name> is a unique string starting with an alphabetic character
  and consists only of letters and numbers and underscores.
 
+ Where <number> is a number that can be read by kstrtol() (hex, decimal, etc).
+
 
 Simple arguments
 ================
@@ -128,3 +132,29 @@ If we are only interested in the first argument (skb):
 
 We use "x64" in order to make sure that the data is displayed in hex.
 This is on a x86_64 machine, and we know the pointer sizes are 8 bytes.
+
+
+Indexing
+========
+
+The pointers of the skb and the dev aren't that interesting. But if we want the
+"len" (length) field of skb, we can index it with the index operator '[' and ']'.
+
+Using gdb, we can find the offset of 'len' from the sk_buff type:
+
+ $ gdb vmlinux
+ (gdb) printf "%d\n", &((struct sk_buff *)0)->len
+128
+
+As 128 / 4 (length of int) is 32, we can see the length of the skb with:
+
+ # echo 'ip_rcv(int skb[32], x64 dev)' > function_events
+
+ # echo 1 > events/functions/ip_rcv/enable
+ # cat trace
+    <idle>-0     [003] ..s3   280.167137: __netif_receive_skb_core->ip_rcv(skb=52, dev=ffff8801092f9400)
+    <idle>-0     [003] ..s3   280.167152: __netif_receive_skb_core->ip_rcv(skb=52, dev=ffff8801092f9400)
+    <idle>-0     [003] ..s3   280.806629: __netif_receive_skb_core->ip_rcv(skb=88, dev=ffff8801092f9400)
+    <idle>-0     [003] ..s3   280.807023: __netif_receive_skb_core->ip_rcv(skb=52, dev=ffff8801092f9400)
+
+Now we see the length of the sk_buff per event.
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index aa19c8af9d34..5d37498d1c6b 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -10,13 +10,15 @@
 
 #include "trace.h"
 
-#define FUNC_EVENT_SYSTEM "functions"
-#define WRITE_BUFSIZE  4096
+#define FUNC_EVENT_SYSTEM	"functions"
+#define WRITE_BUFSIZE		4096
+#define INDIRECT_FLAG		0x10000000
 
 struct func_arg {
 	struct list_head		list;
 	char				*type;
 	char				*name;
+	long				indirect;
 	short				offset;
 	short				size;
 	char				arg;
@@ -55,6 +57,9 @@ enum func_states {
 	FUNC_STATE_INIT,
 	FUNC_STATE_FUNC,
 	FUNC_STATE_PARAM,
+	FUNC_STATE_BRACKET,
+	FUNC_STATE_BRACKET_END,
+	FUNC_STATE_INDIRECT,
 	FUNC_STATE_TYPE,
 	FUNC_STATE_VAR,
 	FUNC_STATE_COMMA,
@@ -171,6 +176,8 @@ static char *next_token(char **ptr, char *last)
 
 	for (str = arg; *str; str++) {
 		if (*str == '(' ||
+		    *str == '[' ||
+		    *str == ']' ||
 		    *str == ',' ||
 		    *str == ')')
 			break;
@@ -223,6 +230,7 @@ static int add_arg(struct func_event *fevent, int ftype)
 static enum func_states
 process_event(struct func_event *fevent, const char *token, enum func_states state)
 {
+	long val;
 	int ret;
 	int i;
 
@@ -269,12 +277,37 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 			break;
 		return FUNC_STATE_VAR;
 
+	case FUNC_STATE_BRACKET:
+		WARN_ON(!fevent->last_arg);
+		ret = kstrtol(token, 0, &val);
+		if (ret)
+			break;
+		val *= fevent->last_arg->size;
+		fevent->last_arg->indirect = val ^ INDIRECT_FLAG;
+		return FUNC_STATE_INDIRECT;
+
+	case FUNC_STATE_INDIRECT:
+		if (token[0] != ']')
+			break;
+		return FUNC_STATE_BRACKET_END;
+
+	case FUNC_STATE_BRACKET_END:
+		switch (token[0]) {
+		case ')':
+			return FUNC_STATE_END;
+		case ',':
+			return FUNC_STATE_COMMA;
+		}
+		break;
+
 	case FUNC_STATE_VAR:
 		switch (token[0]) {
 		case ')':
 			return FUNC_STATE_END;
 		case ',':
 			return FUNC_STATE_COMMA;
+		case '[':
+			return FUNC_STATE_BRACKET;
 		}
 		break;
 
@@ -284,6 +317,37 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 	return FUNC_STATE_ERROR;
 }
 
+static long long get_arg(struct func_arg *arg, unsigned long val)
+{
+	char buf[8];
+	int ret;
+
+	if (!arg->indirect)
+		return val;
+
+	val = val + (arg->indirect ^ INDIRECT_FLAG);
+
+	ret = probe_kernel_read(buf, (void *)val, arg->size);
+	if (ret)
+		return 0;
+
+	switch (arg->size) {
+	case 8:
+		val = *(unsigned long long *)buf;
+		break;
+	case 4:
+		val = *(unsigned int *)buf;
+		break;
+	case 2:
+		val = *(unsigned short *)buf;
+		break;
+	case 1:
+		val = *(unsigned char *)buf;
+		break;
+	}
+	return val;
+}
+
 static void func_event_trace(struct trace_event_file *trace_file,
 			     struct func_event *func_event,
 			     unsigned long ip, unsigned long parent_ip,
@@ -323,7 +387,7 @@ static void func_event_trace(struct trace_event_file *trace_file,
 
 	list_for_each_entry(arg, &func_event->args, list) {
 		if (i < nr_args)
-			val = args[i];
+			val = get_arg(arg, args[i]);
 		else
 			val = 0;
 		memcpy(&entry->data[arg->offset], &val, arg->size);
@@ -685,6 +749,9 @@ static int func_event_seq_show(struct seq_file *m, void *v)
 			seq_puts(m, ", ");
 		comma = true;
 		seq_printf(m, "%s %s", arg->type, arg->name);
+		if (arg->indirect && arg->size)
+			seq_printf(m, "[%ld]",
+				   (arg->indirect ^ INDIRECT_FLAG) / arg->size);
 	}
 	seq_puts(m, ")\n");
 
-- 
2.15.1


* [PATCH 07/18] tracing: Add dereferencing multiple fields per arg
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (5 preceding siblings ...)
  2018-02-02 23:05 ` [PATCH 06/18] tracing: Add indirect offset to args of " Steven Rostedt
@ 2018-02-02 23:05 ` Steven Rostedt
  2018-02-02 23:05 ` [PATCH 08/18] tracing: Add "unsigned" to function based events Steven Rostedt
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

[-- Attachment #1: 0007-tracing-Add-dereferencing-multiple-fields-per-arg.patch --]
[-- Type: text/plain, Size: 5005 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

As an argument may be a structure or an array, we may want to dereference
more than one field per argument. Add a pipe '|' token to the parser
that allows multiple dereferenced fields per function argument.

Change func_arg fields from char to s8 or u8 to allow them to be
subscripts to arrays.
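
The bookkeeping behind '|' can be sketched in userspace like this: a field
introduced by '|' reuses the argument index of the previous field, while a
field introduced by ',' moves on to the next argument. The demo_* names and
the fixed-size array are hypothetical simplifications of the list that the
parser builds.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_FIELDS 8

struct demo_field {
	const char *type;
	const char *name;
	int arg;		/* which function argument this field reads */
};

struct demo_event {
	struct demo_field fields[DEMO_MAX_FIELDS];
	int nr_fields;
	int arg_cnt;		/* index the next new argument will get */
};

/*
 * sep is 0 for the first field, ',' to start the next argument, or
 * '|' to add another field to the current argument.
 */
static int demo_add_field(struct demo_event *ev, char sep,
			  const char *type, const char *name)
{
	struct demo_field *f;

	if (ev->nr_fields >= DEMO_MAX_FIELDS)
		return -1;

	if (sep == '|') {
		if (ev->arg_cnt == 0)
			return -1;	/* no argument to attach to */
		ev->arg_cnt--;		/* stay on the same argument */
	}

	f = &ev->fields[ev->nr_fields++];
	f->type = type;
	f->name = name;
	f->arg = ev->arg_cnt++;

	return 0;
}
```

Because each field records its own argument index, the trace handler can
look up the value per field rather than by position in the list.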

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 Documentation/trace/function-based-events.rst | 20 +++++++++++++++++-
 kernel/trace/trace_event_ftrace.c             | 29 ++++++++++++++++++++-------
 2 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index 7d67229e8e88..2a002c8a500b 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -91,7 +91,7 @@ as follows:
 
  ARGS := ARG | ARG ',' ARGS | ''
 
- ARG := TYPE FIELD
+ ARG := TYPE FIELD | ARG '|' ARG
 
  TYPE := ATOM
 
@@ -158,3 +158,21 @@ As 128 / 4 (length of int) is 32, we can see the length of the skb with:
     <idle>-0     [003] ..s3   280.807023: __netif_receive_skb_core->ip_rcv(skb=52, dev=ffff8801092f9400)
 
 Now we see the length of the sk_buff per event.
+
+
+Multiple fields per argument
+============================
+
+
+If we still want to see the skb pointer value along with the length of the
+skb, then using the '|' operator allows us to add more than one field to
+an argument:
+
+ # echo 'ip_rcv(x64 skb | int skb[32], x64 dev)' > function_events
+
+ # echo 1 > events/functions/ip_rcv/enable
+ # cat trace
+    <idle>-0     [003] ..s3   904.075838: __netif_receive_skb_core->ip_rcv(skb=ffff88011396e800, skb=52, dev=ffff880115204000)
+    <idle>-0     [003] ..s3   904.075848: __netif_receive_skb_core->ip_rcv(skb=ffff88011396e800, skb=52, dev=ffff880115204000)
+    <idle>-0     [003] ..s3   904.725486: __netif_receive_skb_core->ip_rcv(skb=ffff88011396e800, skb=194, dev=ffff880115204000)
+    <idle>-0     [003] ..s3   905.152537: __netif_receive_skb_core->ip_rcv(skb=ffff88011396f200, skb=88, dev=ffff880115204000)
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index 5d37498d1c6b..8c9d4a92deab 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -21,8 +21,8 @@ struct func_arg {
 	long				indirect;
 	short				offset;
 	short				size;
-	char				arg;
-	char				sign;
+	s8				arg;
+	u8				sign;
 };
 
 struct func_event {
@@ -60,6 +60,7 @@ enum func_states {
 	FUNC_STATE_BRACKET,
 	FUNC_STATE_BRACKET_END,
 	FUNC_STATE_INDIRECT,
+	FUNC_STATE_PIPE,
 	FUNC_STATE_TYPE,
 	FUNC_STATE_VAR,
 	FUNC_STATE_COMMA,
@@ -179,6 +180,7 @@ static char *next_token(char **ptr, char *last)
 		    *str == '[' ||
 		    *str == ']' ||
 		    *str == ',' ||
+		    *str == '|' ||
 		    *str == ')')
 			break;
 	}
@@ -251,11 +253,15 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 			break;
 		return FUNC_STATE_PARAM;
 
+	case FUNC_STATE_PIPE:
+		fevent->arg_cnt--;
+		goto comma;
 	case FUNC_STATE_PARAM:
 		if (token[0] == ')')
 			return FUNC_STATE_END;
 		/* Fall through */
 	case FUNC_STATE_COMMA:
+ comma:
 		for (i = 0; func_types[i].size; i++) {
 			if (strcmp(token, func_types[i].name) == 0)
 				break;
@@ -297,6 +303,8 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 			return FUNC_STATE_END;
 		case ',':
 			return FUNC_STATE_COMMA;
+		case '|':
+			return FUNC_STATE_PIPE;
 		}
 		break;
 
@@ -306,6 +314,8 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 			return FUNC_STATE_END;
 		case ',':
 			return FUNC_STATE_COMMA;
+		case '|':
+			return FUNC_STATE_PIPE;
 		case '[':
 			return FUNC_STATE_BRACKET;
 		}
@@ -364,7 +374,6 @@ static void func_event_trace(struct trace_event_file *trace_file,
 	int nr_args;
 	int size;
 	int pc;
-	int i = 0;
 
 	if (trace_trigger_soft_disabled(trace_file))
 		return;
@@ -386,8 +395,8 @@ static void func_event_trace(struct trace_event_file *trace_file,
 	nr_args = arch_get_func_args(pt_regs, 0, func_event->arg_cnt, args);
 
 	list_for_each_entry(arg, &func_event->args, list) {
-		if (i < nr_args)
-			val = get_arg(arg, args[i]);
+		if (arg->arg < nr_args)
+			val = get_arg(arg, args[arg->arg]);
 		else
 			val = 0;
 		memcpy(&entry->data[arg->offset], &val, arg->size);
@@ -741,12 +750,18 @@ static int func_event_seq_show(struct seq_file *m, void *v)
 	struct func_event *func_event = v;
 	struct func_arg *arg;
 	bool comma = false;
+	int last_arg = 0;
 
 	seq_printf(m, "%s(", func_event->func);
 
 	list_for_each_entry(arg, &func_event->args, list) {
-		if (comma)
-			seq_puts(m, ", ");
+		if (comma) {
+			if (last_arg == arg->arg)
+				seq_puts(m, " | ");
+			else
+				seq_puts(m, ", ");
+		}
+		last_arg = arg->arg;
 		comma = true;
 		seq_printf(m, "%s %s", arg->type, arg->name);
 		if (arg->indirect && arg->size)
-- 
2.15.1

* [PATCH 08/18] tracing: Add "unsigned" to function based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (6 preceding siblings ...)
  2018-02-02 23:05 ` [PATCH 07/18] tracing: Add dereferencing multiple fields per arg Steven Rostedt
@ 2018-02-02 23:05 ` Steven Rostedt
  2018-02-02 23:05 ` [PATCH 09/18] tracing: Add indexing of arguments for " Steven Rostedt
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

[-- Attachment #1: 0008-tracing-Add-unsigned-to-function-based-events.patch --]
[-- Type: text/plain, Size: 5151 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Add "unsigned" to the format processing for creating dynamic function based
events. For example, "unsigned long" now works.
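
As a rough userspace illustration of why the signedness matters (this is
not the kernel's make_fmt(); format_field() is a hypothetical helper made
up for this sketch), the same 32 bits print differently depending on
whether the field is treated as "int" or "unsigned int":

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of this patch: render the same 32 bits
 * either as a signed "int" (%d) or as an "unsigned int" (%u).
 */
static void format_field(char *buf, size_t len, unsigned int raw, int is_signed)
{
	int sval;

	if (is_signed) {
		/* Reinterpret the bits as signed, the way "%d" sees them */
		memcpy(&sval, &raw, sizeof(sval));
		snprintf(buf, len, "skb=%d", sval);
	} else {
		snprintf(buf, len, "skb=%u", raw);
	}
}
```

With raw set to 0xffffffff, the signed form gives "skb=-1" and the
unsigned form gives "skb=4294967295" on systems with a 32-bit int.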

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 Documentation/trace/function-based-events.rst | 47 ++++++++++++++++++++++++++-
 kernel/trace/trace_event_ftrace.c             | 23 ++++++++++---
 2 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index 2a002c8a500b..72e3e7730d63 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -93,7 +93,7 @@ as follows:
 
  ARG := TYPE FIELD | ARG '|' ARG
 
- TYPE := ATOM
+ TYPE := ATOM | 'unsigned' ATOM
 
  ATOM := 'u8' | 'u16' | 'u32' | 'u64' |
          's8' | 's16' | 's32' | 's64' |
@@ -176,3 +176,48 @@ an argument:
     <idle>-0     [003] ..s3   904.075848: __netif_receive_skb_core->ip_rcv(skb=ffff88011396e800, skb=52, dev=ffff880115204000)
     <idle>-0     [003] ..s3   904.725486: __netif_receive_skb_core->ip_rcv(skb=ffff88011396e800, skb=194, dev=ffff880115204000)
     <idle>-0     [003] ..s3   905.152537: __netif_receive_skb_core->ip_rcv(skb=ffff88011396f200, skb=88, dev=ffff880115204000)
+
+
+Unsigned usage
+==============
+
+One can also use "unsigned" to make some types unsigned. It works with
+"long", "int", "short" and "char". Other types are accepted without error,
+but the result may not make sense.
+
+ # echo 'ip_rcv(int skb[32])' > function_events
+ # cat events/functions/ip_rcv/format
+name: ip_rcv
+ID: 1397
+format:
+	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
+	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
+	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
+	field:int common_pid;	offset:4;	size:4;	signed:1;
+
+	field:unsigned long __parent_ip;	offset:8;	size:8;	signed:0;
+	field:unsigned long __ip;	offset:16;	size:8;	signed:0;
+	field:int skb;	offset:24;	size:4;	signed:1;
+
+print fmt: "%pS->%pS(skb=%d)", REC->__ip, REC->__parent_ip, REC->skb
+
+
+Notice that REC->skb is printed with "%d". By adding "unsigned"
+
+ # echo 'ip_rcv(unsigned int skb[32])' > function_events
+ # cat events/functions/ip_rcv/format
+name: ip_rcv
+ID: 1398
+format:
+	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
+	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
+	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
+	field:int common_pid;	offset:4;	size:4;	signed:1;
+
+	field:unsigned long __parent_ip;	offset:8;	size:8;	signed:0;
+	field:unsigned long __ip;	offset:16;	size:8;	signed:0;
+	field:unsigned int skb;	offset:24;	size:4;	signed:0;
+
+print fmt: "%pS->%pS(skb=%u)", REC->__ip, REC->__parent_ip, REC->skb
+
+It is now printed with a "%u".
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index 8c9d4a92deab..9548b93eb8cd 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -60,6 +60,7 @@ enum func_states {
 	FUNC_STATE_BRACKET,
 	FUNC_STATE_BRACKET_END,
 	FUNC_STATE_INDIRECT,
+	FUNC_STATE_UNSIGNED,
 	FUNC_STATE_PIPE,
 	FUNC_STATE_TYPE,
 	FUNC_STATE_VAR,
@@ -198,7 +199,7 @@ static char *next_token(char **ptr, char *last)
 	return arg;
 }
 
-static int add_arg(struct func_event *fevent, int ftype)
+static int add_arg(struct func_event *fevent, int ftype, int unsign)
 {
 	struct func_type *func_type = &func_types[ftype];
 	struct func_arg *arg;
@@ -211,13 +212,18 @@ static int add_arg(struct func_event *fevent, int ftype)
 	if (!arg)
 		return -ENOMEM;
 
-	arg->type = kstrdup(func_type->name, GFP_KERNEL);
+	if (unsign)
+		arg->type = kasprintf(GFP_KERNEL, "unsigned %s",
+				      func_type->name);
+	else
+		arg->type = kstrdup(func_type->name, GFP_KERNEL);
 	if (!arg->type) {
 		kfree(arg);
 		return -ENOMEM;
 	}
 	arg->size = func_type->size;
-	arg->sign = func_type->sign;
+	if (!unsign)
+		arg->sign = func_type->sign;
 	arg->offset = ALIGN(fevent->arg_offset, arg->size);
 	arg->arg = fevent->arg_cnt;
 	fevent->arg_offset = arg->offset + arg->size;
@@ -232,12 +238,14 @@ static int add_arg(struct func_event *fevent, int ftype)
 static enum func_states
 process_event(struct func_event *fevent, const char *token, enum func_states state)
 {
+	static int unsign;
 	long val;
 	int ret;
 	int i;
 
 	switch (state) {
 	case FUNC_STATE_INIT:
+		unsign = 0;
 		if (!isalpha(token[0]))
 			break;
 		/* Do not allow wild cards */
@@ -262,13 +270,20 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 		/* Fall through */
 	case FUNC_STATE_COMMA:
  comma:
+		if (strcmp(token, "unsigned") == 0) {
+			unsign = 2;
+			return FUNC_STATE_UNSIGNED;
+		}
+		/* Fall through */
+	case FUNC_STATE_UNSIGNED:
 		for (i = 0; func_types[i].size; i++) {
 			if (strcmp(token, func_types[i].name) == 0)
 				break;
 		}
 		if (!func_types[i].size)
 			break;
-		ret = add_arg(fevent, i);
+		ret = add_arg(fevent, i, unsign);
+		unsign = 0;
 		if (ret < 0)
 			break;
 		return FUNC_STATE_TYPE;
-- 
2.15.1

* [PATCH 09/18] tracing: Add indexing of arguments for function based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (7 preceding siblings ...)
  2018-02-02 23:05 ` [PATCH 08/18] tracing: Add "unsigned" to function based events Steven Rostedt
@ 2018-02-02 23:05 ` Steven Rostedt
  2018-02-08 10:59   ` Namhyung Kim
  2018-02-02 23:05 ` [PATCH 10/18] tracing: Make func_type enums for easier comparing of arg types Steven Rostedt
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

[-- Attachment #1: 0009-tracing-Add-indexing-of-arguments-for-function-based.patch --]
[-- Type: text/plain, Size: 4009 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Currently, an 8 byte word can only be read at an offset from the argument
that is a multiple of 8 bytes. But there may be cases where the word is only
4 byte aligned. To make the capturing of arguments more flexible, add a plus
'+' operator that adds an arbitrary byte offset to the argument before it is
indexed.

 u64 arg+4[3]

Will get an 8 byte word at byte offset 28 (3 * 8 + 4).
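
The arithmetic can be sketched in userspace C as follows. This is only an
illustration under assumptions: field_byte_offset() and read_u64_at() are
hypothetical names, and memcpy() stands in for whatever safe read the
kernel side performs.

```c
#include <stdint.h>
#include <string.h>

/* Byte offset selected by "TYPE name+OFFSET[INDEX]" for a given type size */
static long field_byte_offset(long offset, long index, long size)
{
	return offset + index * size;
}

/*
 * Read a 64-bit word at base + offset + index * 8. memcpy() is used so
 * the read also works when the location is not 8 byte aligned.
 */
static uint64_t read_u64_at(const void *base, long offset, long index)
{
	uint64_t val;
	long off = field_byte_offset(offset, index, (long)sizeof(val));

	memcpy(&val, (const char *)base + off, sizeof(val));
	return val;
}
```

Here field_byte_offset(4, 3, 8) is 28, matching the example above, and
read_u64_at(base, 4, 3) returns the 8 bytes stored there.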

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 Documentation/trace/function-based-events.rst | 24 +++++++++++++++++++++++-
 kernel/trace/trace_event_ftrace.c             | 18 ++++++++++++++++++
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index 72e3e7730d63..bdb28f433bfb 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -100,10 +100,12 @@ as follows:
          'x8' | 'x16' | 'x32' | 'x64' |
          'char' | 'short' | 'int' | 'long' | 'size_t'
 
- FIELD := <name> | <name> INDEX
+ FIELD := <name> | <name> INDEX | <name> OFFSET | <name> OFFSET INDEX
 
  INDEX := '[' <number> ']'
 
+ OFFSET := '+' <number>
+
  Where <name> is a unique string starting with an alphabetic character
  and consists only of letters and numbers and underscores.
 
@@ -221,3 +223,23 @@ format:
 print fmt: "%pS->%pS(skb=%u)", REC->__ip, REC->__parent_ip, REC->skb
 
 It is now printed with a "%u".
+
+
+Offsets
+=======
+
+After the name of the variable, brackets '[' number ']' will index the value of
+the argument by the number given times the size of the field.
+
+ int field[5] will dereference the value of the argument 20 bytes away (4 * 5)
+  as sizeof(int) is 4.
+
+If there's a case where the type is 8 bytes in size but is not 8 byte
+aligned in the structure, an offset may be required.
+
+  For example: x64 param+4[2]
+
+The above will take the parameter value, add it by 4, then index it by two
+8 byte words. It's the same in C as: (u64 *)((void *)param + 4)[2]
+
+ Note: "int skb[32]" is the same as "int skb+4[31]".
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index 9548b93eb8cd..4c23fa18453d 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -19,6 +19,7 @@ struct func_arg {
 	char				*type;
 	char				*name;
 	long				indirect;
+	long				index;
 	short				offset;
 	short				size;
 	s8				arg;
@@ -62,6 +63,7 @@ enum func_states {
 	FUNC_STATE_INDIRECT,
 	FUNC_STATE_UNSIGNED,
 	FUNC_STATE_PIPE,
+	FUNC_STATE_PLUS,
 	FUNC_STATE_TYPE,
 	FUNC_STATE_VAR,
 	FUNC_STATE_COMMA,
@@ -182,6 +184,7 @@ static char *next_token(char **ptr, char *last)
 		    *str == ']' ||
 		    *str == ',' ||
 		    *str == '|' ||
+		    *str == '+' ||
 		    *str == ')')
 			break;
 	}
@@ -323,6 +326,15 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 		}
 		break;
 
+	case FUNC_STATE_PLUS:
+		if (WARN_ON(!fevent->last_arg))
+			break;
+		ret = kstrtol(token, 0, &val);
+		if (ret)
+			break;
+		fevent->last_arg->index += val;
+		return FUNC_STATE_VAR;
+
 	case FUNC_STATE_VAR:
 		switch (token[0]) {
 		case ')':
@@ -331,6 +343,8 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 			return FUNC_STATE_COMMA;
 		case '|':
 			return FUNC_STATE_PIPE;
+		case '+':
+			return FUNC_STATE_PLUS;
 		case '[':
 			return FUNC_STATE_BRACKET;
 		}
@@ -347,6 +361,8 @@ static long long get_arg(struct func_arg *arg, unsigned long val)
 	char buf[8];
 	int ret;
 
+	val += arg->index;
+
 	if (!arg->indirect)
 		return val;
 
@@ -779,6 +795,8 @@ static int func_event_seq_show(struct seq_file *m, void *v)
 		last_arg = arg->arg;
 		comma = true;
 		seq_printf(m, "%s %s", arg->type, arg->name);
+		if (arg->index)
+			seq_printf(m, "+%ld", arg->index);
 		if (arg->indirect && arg->size)
 			seq_printf(m, "[%ld]",
 				   (arg->indirect ^ INDIRECT_FLAG) / arg->size);
-- 
2.15.1

* [PATCH 10/18] tracing: Make func_type enums for easier comparing of arg types
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (8 preceding siblings ...)
  2018-02-02 23:05 ` [PATCH 09/18] tracing: Add indexing of arguments for " Steven Rostedt
@ 2018-02-02 23:05 ` Steven Rostedt
  2018-02-02 23:05 ` [PATCH 11/18] tracing: Add symbol type to function based events Steven Rostedt
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

[-- Attachment #1: 0010-tracing-Make-func_type-enums-for-easier-comparing-of.patch --]
[-- Type: text/plain, Size: 2422 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

For the function based event args, knowing quickly what type they are is
advantageous, as decisions can be made quickly based on them. Having an
enum for the types is useful for this purpose.

Use macros to create both the func_type array as well as enums that
match the type to the index into that array.
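
For readers not familiar with the trick, here is a small standalone
sketch of the same pattern. The DEMO_* names and the reduced type list
are made up for illustration, and the sign test is a local stand-in for
is_signed_type(). One list is expanded twice, first into table rows and
then into enum names, so the two can never get out of sync:

```c
#include <stddef.h>
#include <string.h>

typedef unsigned char u8;
typedef unsigned int u32;

/* A single list of types, expanded twice below */
#define DEMO_TYPES		\
	TYPE_TUPLE(long),	\
	TYPE_TUPLE(int),	\
	TYPE_TUPLE(u32),	\
	TYPE_TUPLE(u8)

/* First expansion: one table row per type (name, size, signedness) */
#define TYPE_TUPLE(type)	\
	{ #type, sizeof(type), ((type)-1 < (type)1) }

static const struct demo_type {
	const char	*name;
	size_t		size;
	int		sign;
} demo_types[] = {
	DEMO_TYPES,
	{ NULL, 0, 0 }
};

/* Second expansion: one enum value per type, in the same order */
#undef TYPE_TUPLE
#define TYPE_TUPLE(type)	DEMO_TYPE_##type

enum {
	DEMO_TYPES,
	DEMO_TYPE_MAX
};

/* Look a type up by name; the returned index equals its enum value */
static int demo_type_index(const char *name)
{
	int i;

	for (i = 0; demo_types[i].size; i++) {
		if (strcmp(name, demo_types[i].name) == 0)
			return i;
	}
	return -1;
}
```

After the lookup, code can compare against DEMO_TYPE_u32 and friends
instead of comparing strings again.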

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/trace/trace_event_ftrace.c | 47 +++++++++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index 4c23fa18453d..0f2650e97e49 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -24,6 +24,7 @@ struct func_arg {
 	short				size;
 	s8				arg;
 	u8				sign;
+	u8				func_type;
 };
 
 struct func_event {
@@ -79,31 +80,42 @@ typedef u8 x8;
 #define TYPE_TUPLE(type)			\
 	{ #type, sizeof(type), is_signed_type(type) }
 
+#define FUNC_TYPES				\
+	TYPE_TUPLE(long),			\
+	TYPE_TUPLE(int),			\
+	TYPE_TUPLE(short),			\
+	TYPE_TUPLE(char),			\
+	TYPE_TUPLE(size_t),			\
+	TYPE_TUPLE(u64),			\
+	TYPE_TUPLE(s64),			\
+	TYPE_TUPLE(x64),			\
+	TYPE_TUPLE(u32),			\
+	TYPE_TUPLE(s32),			\
+	TYPE_TUPLE(x32),			\
+	TYPE_TUPLE(u16),			\
+	TYPE_TUPLE(s16),			\
+	TYPE_TUPLE(x16),			\
+	TYPE_TUPLE(u8),				\
+	TYPE_TUPLE(s8),				\
+	TYPE_TUPLE(x8)
+
 static struct func_type {
 	char		*name;
 	int		size;
 	int		sign;
 } func_types[] = {
-	TYPE_TUPLE(long),
-	TYPE_TUPLE(int),
-	TYPE_TUPLE(short),
-	TYPE_TUPLE(char),
-	TYPE_TUPLE(size_t),
-	TYPE_TUPLE(u64),
-	TYPE_TUPLE(s64),
-	TYPE_TUPLE(x64),
-	TYPE_TUPLE(u32),
-	TYPE_TUPLE(s32),
-	TYPE_TUPLE(x32),
-	TYPE_TUPLE(u16),
-	TYPE_TUPLE(s16),
-	TYPE_TUPLE(x16),
-	TYPE_TUPLE(u8),
-	TYPE_TUPLE(s8),
-	TYPE_TUPLE(x8),
+	FUNC_TYPES,
 	{ NULL,		0,	0 }
 };
 
+#undef TYPE_TUPLE
+#define TYPE_TUPLE(type)	FUNC_TYPE_##type
+
+enum {
+	FUNC_TYPES,
+	FUNC_TYPE_MAX
+};
+
 /**
  * arch_get_func_args - retrieve function arguments via pt_regs
  * @regs: The registers at the moment the function is called
@@ -228,6 +240,7 @@ static int add_arg(struct func_event *fevent, int ftype, int unsign)
 	if (!unsign)
 		arg->sign = func_type->sign;
 	arg->offset = ALIGN(fevent->arg_offset, arg->size);
+	arg->func_type = ftype;
 	arg->arg = fevent->arg_cnt;
 	fevent->arg_offset = arg->offset + arg->size;
 
-- 
2.15.1

* [PATCH 11/18] tracing: Add symbol type to function based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (9 preceding siblings ...)
  2018-02-02 23:05 ` [PATCH 10/18] tracing: Make func_type enums for easier comparing of arg types Steven Rostedt
@ 2018-02-02 23:05 ` Steven Rostedt
  2018-02-08 11:03   ` Namhyung Kim
  2018-02-02 23:05 ` [PATCH 12/18] tracing: Add accessing direct address from " Steven Rostedt
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

[-- Attachment #1: 0011-tracing-Add-symbol-type-to-function-based-events.patch --]
[-- Type: text/plain, Size: 3834 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Add a special type "symbol" that will use %pS to display the field of a
function based event.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 Documentation/trace/function-based-events.rst | 26 +++++++++++++++++++++++++-
 kernel/trace/trace_event_ftrace.c             | 13 ++++++++++---
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index bdb28f433bfb..f18c8f3ef330 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -98,7 +98,8 @@ as follows:
  ATOM := 'u8' | 'u16' | 'u32' | 'u64' |
          's8' | 's16' | 's32' | 's64' |
          'x8' | 'x16' | 'x32' | 'x64' |
-         'char' | 'short' | 'int' | 'long' | 'size_t'
+         'char' | 'short' | 'int' | 'long' | 'size_t' |
+	 'symbol'
 
  FIELD := <name> | <name> INDEX | <name> OFFSET | <name> OFFSET INDEX
 
@@ -243,3 +244,26 @@ The above will take the parameter value, add it by 4, then index it by two
 8 byte words. It's the same in C as: (u64 *)((void *)param + 4)[2]
 
  Note: "int skb[32]" is the same as "int skb+4[31]".
+
+
+Symbols (function names)
+========================
+
+To display kallsyms "%pS" type of output, use the special type "symbol".
+
+Again, using gdb to find the offset of the "func" field of struct work_struct
+
+(gdb) printf "%d\n", &((struct work_struct *)0)->func
+24
+
+ Both "symbol func[3]" and "symbol func+24[0]" will work.
+
+ # echo '__queue_work(int cpu, x64 wq, symbol func[3])' > function_events
+
+ # echo 1 > events/functions/__queue_work/enable
+ # cat trace
+       bash-1641  [007] d..2  6241.171332: queue_work_on->__queue_work(cpu=128, wq=ffff88011a010e00, func=flush_to_ldisc+0x0/0xa0)
+       bash-1641  [007] d..2  6241.171460: queue_work_on->__queue_work(cpu=128, wq=ffff88011a010e00, func=flush_to_ldisc+0x0/0xa0)
+     <idle>-0     [000] dNs3  6241.172004: delayed_work_timer_fn->__queue_work(cpu=128, wq=ffff88011a010800, func=vmstat_shepherd+0x0/0xb0)
+ worker/0:2-1689  [000] d..2  6241.172026: __queue_delayed_work->__queue_work(cpu=7, wq=ffff88011a11da00, func=vmstat_update+0x0/0x70)
+     <idle>-0     [005] d.s3  6241.347996: queue_work_on->__queue_work(cpu=128, wq=ffff88011a011200, func=fb_flashcursor+0x0/0x110 [fb])
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index 0f2650e97e49..ba10177b9bd6 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -76,6 +76,7 @@ typedef u64 x64;
 typedef u32 x32;
 typedef u16 x16;
 typedef u8 x8;
+typedef void * symbol;
 
 #define TYPE_TUPLE(type)			\
 	{ #type, sizeof(type), is_signed_type(type) }
@@ -97,7 +98,8 @@ typedef u8 x8;
 	TYPE_TUPLE(x16),			\
 	TYPE_TUPLE(u8),				\
 	TYPE_TUPLE(s8),				\
-	TYPE_TUPLE(x8)
+	TYPE_TUPLE(x8),				\
+	TYPE_TUPLE(symbol)
 
 static struct func_type {
 	char		*name;
@@ -262,7 +264,7 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 	switch (state) {
 	case FUNC_STATE_INIT:
 		unsign = 0;
-		if (!isalpha(token[0]))
+		if (!isalpha(token[0]) && token[0] != '_')
 			break;
 		/* Do not allow wild cards */
 		if (strstr(token, "*") || strstr(token, "?"))
@@ -305,7 +307,7 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 		return FUNC_STATE_TYPE;
 
 	case FUNC_STATE_TYPE:
-		if (!isalpha(token[0]))
+		if (!isalpha(token[0]) || token[0] == '_')
 			break;
 		if (WARN_ON(!fevent->last_arg))
 			break;
@@ -472,6 +474,11 @@ static void make_fmt(struct func_arg *arg, char *fmt)
 {
 	int c = 0;
 
+	if (arg->func_type == FUNC_TYPE_symbol) {
+		strcpy(fmt, "%pS");
+		return;
+	}
+
 	fmt[c++] = '%';
 
 	if (arg->size == 8) {
-- 
2.15.1

* [PATCH 12/18] tracing: Add accessing direct address from function based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (10 preceding siblings ...)
  2018-02-02 23:05 ` [PATCH 11/18] tracing: Add symbol type to function based events Steven Rostedt
@ 2018-02-02 23:05 ` Steven Rostedt
  2018-02-09  0:34   ` Namhyung Kim
  2018-02-02 23:05 ` [PATCH 13/18] tracing: Add array type to " Steven Rostedt
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

[-- Attachment #1: 0012-tracing-Add-accessing-direct-address-from-function-b.patch --]
[-- Type: text/plain, Size: 10304 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Allow referencing any address during the function based event. The syntax
is <type> <name>=<addr>. For example:

 # echo 'do_IRQ(long total_forks=0xffffffffa2a4b4c0)' > function_events
 # echo 1 > events/functions/enable
 # cat trace
            sshd-832   [000] d... 221639.210845: ret_from_intr->do_IRQ(total_forks=855)
            sshd-832   [000] d... 221639.211114: ret_from_intr->do_IRQ(total_forks=855)
          <idle>-0     [000] d... 221639.211198: ret_from_intr->do_IRQ(total_forks=855)
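
Conceptually, the field is filled by reading the given type straight from
the numeric address, with no function argument involved. A userspace
sketch of that idea, where demo_counter is a made-up stand-in for a
kernel global such as total_forks and read_int_at() is a hypothetical
helper (the kernel side has to use a safe read instead of a plain
memcpy()):

```c
#include <stdint.h>
#include <string.h>

/* Made-up stand-in for a kernel global such as total_forks */
static int demo_counter = 855;

/* Read an int located at a raw numeric address */
static int read_int_at(uintptr_t addr)
{
	int val;

	memcpy(&val, (const void *)addr, sizeof(val));
	return val;
}
```

Passing (uintptr_t)&demo_counter returns the current value of the
variable each time it is called, which is what makes the event useful
for watching a global change over time.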

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 Documentation/trace/function-based-events.rst |  40 +++++++-
 kernel/trace/trace_event_ftrace.c             | 129 +++++++++++++++++++++-----
 2 files changed, 143 insertions(+), 26 deletions(-)

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index f18c8f3ef330..b0e6725f3032 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -91,7 +91,7 @@ as follows:
 
  ARGS := ARG | ARG ',' ARGS | ''
 
- ARG := TYPE FIELD | ARG '|' ARG
+ ARG := TYPE FIELD | TYPE <name> '=' ADDR | TYPE ADDR | ARG '|' ARG
 
  TYPE := ATOM | 'unsigned' ATOM
 
@@ -107,6 +107,8 @@ as follows:
 
  OFFSET := '+' <number>
 
+ ADDR := A hexadecimal address starting with '0x'
+
  Where <name> is a unique string starting with an alphabetic character
  and consists only of letters and numbers and underscores.
 
@@ -267,3 +269,39 @@ Again, using gdb to find the offset of the "func" field of struct work_struct
      <idle>-0     [000] dNs3  6241.172004: delayed_work_timer_fn->__queue_work(cpu=128, wq=ffff88011a010800, func=vmstat_shepherd+0x0/0xb0)
  worker/0:2-1689  [000] d..2  6241.172026: __queue_delayed_work->__queue_work(cpu=7, wq=ffff88011a11da00, func=vmstat_update+0x0/0x70)
      <idle>-0     [005] d.s3  6241.347996: queue_work_on->__queue_work(cpu=128, wq=ffff88011a011200, func=fb_flashcursor+0x0/0x110 [fb])
+
+
+Direct memory access
+====================
+
+Function arguments are not the only thing that can be recorded from a function
+based event. Memory addresses can also be examined. If there's a global variable
+that you want to monitor via an interrupt, you can put in the address directly.
+
+  # grep total_forks /proc/kallsyms
+ffffffff82354c18 B total_forks
+
+  # echo 'do_IRQ(int total_forks=0xffffffff82354c18)' > function_events
+
+  # echo 1 > events/functions/do_IRQ/enable
+  # cat trace
+    <idle>-0     [003] d..3   337.076709: ret_from_intr->do_IRQ(total_forks=1419)
+    <idle>-0     [003] d..3   337.077046: ret_from_intr->do_IRQ(total_forks=1419)
+    <idle>-0     [003] d..3   337.077076: ret_from_intr->do_IRQ(total_forks=1420)
+
+Note, address notations do not affect the argument count. For instance, with
+
+__visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
+
+  # echo 'do_IRQ(int total_forks=0xffffffff82354c18, symbol regs[16])' > function_events
+
+Is the same as
+
+  # echo 'do_IRQ(int total_forks=0xffffffff82354c18 | symbol regs[16])' > function_events
+
+  # cat trace
+    <idle>-0     [003] d..3   653.839546: ret_from_intr->do_IRQ(total_forks=1504, regs=cpuidle_enter_state+0xb1/0x330)
+    <idle>-0     [003] d..3   653.906011: ret_from_intr->do_IRQ(total_forks=1504, regs=cpuidle_enter_state+0xb1/0x330)
+    <idle>-0     [003] d..3   655.823498: ret_from_intr->do_IRQ(total_forks=1504, regs=tick_nohz_idle_enter+0x4c/0x50)
+    <idle>-0     [003] d..3   655.954096: ret_from_intr->do_IRQ(total_forks=1504, regs=cpuidle_enter_state+0xb1/0x330)
+
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index ba10177b9bd6..206114f192be 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -63,6 +63,8 @@ enum func_states {
 	FUNC_STATE_BRACKET_END,
 	FUNC_STATE_INDIRECT,
 	FUNC_STATE_UNSIGNED,
+	FUNC_STATE_ADDR,
+	FUNC_STATE_EQUAL,
 	FUNC_STATE_PIPE,
 	FUNC_STATE_PLUS,
 	FUNC_STATE_TYPE,
@@ -199,6 +201,7 @@ static char *next_token(char **ptr, char *last)
 		    *str == ',' ||
 		    *str == '|' ||
 		    *str == '+' ||
+		    *str == '=' ||
 		    *str == ')')
 			break;
 	}
@@ -243,12 +246,39 @@ static int add_arg(struct func_event *fevent, int ftype, int unsign)
 		arg->sign = func_type->sign;
 	arg->offset = ALIGN(fevent->arg_offset, arg->size);
 	arg->func_type = ftype;
-	arg->arg = fevent->arg_cnt;
 	fevent->arg_offset = arg->offset + arg->size;
 
 	list_add_tail(&arg->list, &fevent->args);
 	fevent->last_arg = arg;
-	fevent->arg_cnt++;
+
+	return 0;
+}
+
+static int update_arg_name(struct func_event *fevent, const char *name)
+{
+	struct func_arg *arg = fevent->last_arg;
+
+	if (WARN_ON(!arg))
+		return -EINVAL;
+
+	arg->name = kstrdup(name, GFP_KERNEL);
+	if (!arg->name)
+		return -ENOMEM;
+	return 0;
+}
+
+static int update_arg_arg(struct func_event *fevent)
+{
+	struct func_arg *arg = fevent->last_arg;
+
+	if (WARN_ON(!arg))
+		return -EINVAL;
+
+	/* Make sure the arch can support this many args */
+	if (fevent->arg_cnt >= arch_get_func_args(NULL, 0, 0, NULL))
+		return -EINVAL;
+
+	arg->arg = fevent->arg_cnt;
 
 	return 0;
 }
@@ -256,14 +286,16 @@ static int add_arg(struct func_event *fevent, int ftype, int unsign)
 static enum func_states
 process_event(struct func_event *fevent, const char *token, enum func_states state)
 {
+	static bool update_arg;
 	static int unsign;
-	long val;
+	unsigned long val;
 	int ret;
 	int i;
 
 	switch (state) {
 	case FUNC_STATE_INIT:
 		unsign = 0;
+		update_arg = false;
 		if (!isalpha(token[0]) && token[0] != '_')
 			break;
 		/* Do not allow wild cards */
@@ -279,15 +311,15 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 			break;
 		return FUNC_STATE_PARAM;
 
-	case FUNC_STATE_PIPE:
-		fevent->arg_cnt--;
-		goto comma;
 	case FUNC_STATE_PARAM:
 		if (token[0] == ')')
 			return FUNC_STATE_END;
 		/* Fall through */
 	case FUNC_STATE_COMMA:
- comma:
+		if (update_arg)
+			fevent->arg_cnt++;
+		update_arg = false;
+	case FUNC_STATE_PIPE:
 		if (strcmp(token, "unsigned") == 0) {
 			unsign = 2;
 			return FUNC_STATE_UNSIGNED;
@@ -307,18 +339,20 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 		return FUNC_STATE_TYPE;
 
 	case FUNC_STATE_TYPE:
-		if (!isalpha(token[0]) || token[0] == '_')
-			break;
 		if (WARN_ON(!fevent->last_arg))
 			break;
-		fevent->last_arg->name = kstrdup(token, GFP_KERNEL);
-		if (!fevent->last_arg->name)
+		if (update_arg_name(fevent, token) < 0)
+			break;
+		if (strncmp(token, "0x", 2) == 0)
+			goto equal;
+		if (!isalpha(token[0]) && token[0] != '_')
 			break;
+		update_arg = true;
 		return FUNC_STATE_VAR;
 
 	case FUNC_STATE_BRACKET:
 		WARN_ON(!fevent->last_arg);
-		ret = kstrtol(token, 0, &val);
+		ret = kstrtoul(token, 0, &val);
 		if (ret)
 			break;
 		val *= fevent->last_arg->size;
@@ -333,7 +367,7 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 	case FUNC_STATE_BRACKET_END:
 		switch (token[0]) {
 		case ')':
-			return FUNC_STATE_END;
+			goto end;
 		case ',':
 			return FUNC_STATE_COMMA;
 		case '|':
@@ -344,16 +378,33 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 	case FUNC_STATE_PLUS:
 		if (WARN_ON(!fevent->last_arg))
 			break;
-		ret = kstrtol(token, 0, &val);
+		ret = kstrtoul(token, 0, &val);
 		if (ret)
 			break;
 		fevent->last_arg->index += val;
 		return FUNC_STATE_VAR;
 
+	case FUNC_STATE_ADDR:
+		switch (token[0]) {
+		case ')':
+			goto end;
+		case ',':
+			return FUNC_STATE_COMMA;
+		case '|':
+			return FUNC_STATE_PIPE;
+		}
+		break;
+
 	case FUNC_STATE_VAR:
+		if (token[0] == '=')
+			return FUNC_STATE_EQUAL;
+		if (WARN_ON(!fevent->last_arg))
+			break;
+		update_arg_arg(fevent);
+		update_arg = true;
 		switch (token[0]) {
 		case ')':
-			return FUNC_STATE_END;
+			goto end;
 		case ',':
 			return FUNC_STATE_COMMA;
 		case '|':
@@ -365,10 +416,29 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 		}
 		break;
 
+	case FUNC_STATE_EQUAL:
+		if (strncmp(token, "0x", 2) != 0)
+			break;
+ equal:
+		if (WARN_ON(!fevent->last_arg))
+			break;
+		ret = kstrtoul(token, 0, &val);
+		if (ret < 0)
+			break;
+		update_arg = false;
+		fevent->last_arg->index = val;
+		fevent->last_arg->arg = -1;
+		fevent->last_arg->indirect = INDIRECT_FLAG;
+		return FUNC_STATE_ADDR;
+
 	default:
 		break;
 	}
 	return FUNC_STATE_ERROR;
+ end:
+	if (update_arg)
+		fevent->arg_cnt++;
+	return FUNC_STATE_END;
 }
 
 static long long get_arg(struct func_arg *arg, unsigned long val)
@@ -417,7 +487,7 @@ static void func_event_trace(struct trace_event_file *trace_file,
 	long args[func_event->arg_cnt];
 	long long val = 1;
 	unsigned long irq_flags;
-	int nr_args;
+	int nr_args = 0;
 	int size;
 	int pc;
 
@@ -438,12 +508,17 @@ static void func_event_trace(struct trace_event_file *trace_file,
 	entry = ring_buffer_event_data(event);
 	entry->ip = ip;
 	entry->parent_ip = parent_ip;
-	nr_args = arch_get_func_args(pt_regs, 0, func_event->arg_cnt, args);
+	if (func_event->arg_cnt)
+		nr_args = arch_get_func_args(pt_regs, 0, func_event->arg_cnt, args);
 
 	list_for_each_entry(arg, &func_event->args, list) {
-		if (arg->arg < nr_args)
-			val = get_arg(arg, args[arg->arg]);
-		else
+		if (arg->arg < nr_args) {
+			/* Is arg an address and not a parameter? */
+			if (arg->arg < 0)
+				val = get_arg(arg, 0);
+			else
+				val = get_arg(arg, args[arg->arg]);
+		} else
 			val = 0;
 		memcpy(&entry->data[arg->offset], &val, arg->size);
 	}
@@ -815,11 +890,15 @@ static int func_event_seq_show(struct seq_file *m, void *v)
 		last_arg = arg->arg;
 		comma = true;
 		seq_printf(m, "%s %s", arg->type, arg->name);
-		if (arg->index)
-			seq_printf(m, "+%ld", arg->index);
-		if (arg->indirect && arg->size)
-			seq_printf(m, "[%ld]",
-				   (arg->indirect ^ INDIRECT_FLAG) / arg->size);
+		if (arg->arg < 0) {
+			seq_printf(m, "=0x%lx", arg->index);
+		} else {
+			if (arg->index)
+				seq_printf(m, "+%ld", arg->index);
+			if (arg->indirect && arg->size)
+				seq_printf(m, "[%ld]",
+					   (arg->indirect ^ INDIRECT_FLAG) / arg->size);
+		}
 	}
 	seq_puts(m, ")\n");
 
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 13/18] tracing: Add array type to function based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (11 preceding siblings ...)
  2018-02-02 23:05 ` [PATCH 12/18] tracing: Add accessing direct address from " Steven Rostedt
@ 2018-02-02 23:05 ` Steven Rostedt
  2018-02-03 13:56   ` Masami Hiramatsu
  2018-02-09  1:17   ` Namhyung Kim
  2018-02-02 23:05 ` [PATCH 14/18] tracing: Have char arrays be strings for " Steven Rostedt
                   ` (8 subsequent siblings)
  21 siblings, 2 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

[-- Attachment #1: 0013-tracing-Add-array-type-to-function-based-events.patch --]
[-- Type: text/plain, Size: 9928 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Add syntax to allow the user to create an array type. Brackets after the
type field will denote that this is an array type. For example:

 # echo 'SyS_open(x8[32] buf, x32 flags, x32 mode)' > function_events

Will make the first argument of the sys_open function call an array of
32 bytes.

The array type can also be used in conjunction with the indirect offset
brackets. For example, to get the interrupt stack of regs in do_IRQ()
on x86_64:

 # echo 'do_IRQ(x64[5] regs[16])' > function_events

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 Documentation/trace/function-based-events.rst |  22 +++-
 kernel/trace/trace_event_ftrace.c             | 157 +++++++++++++++++++++-----
 2 files changed, 151 insertions(+), 28 deletions(-)
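
As a rough user-space illustration (not part of this patch), the sketch
below mirrors how an array field such as 'x8[6] perm_addr' is captured one
element at a time and then printed as comma-separated values. The names
copy_array(), format_array(), read_fn and plain_read() are hypothetical;
read_fn stands in for probe_kernel_read(), and format_array() only handles
1-byte elements.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for probe_kernel_read(): returns 0 on success. */
typedef int (*read_fn)(void *dst, const void *src, size_t size);

/* Reader that always succeeds, for use outside the kernel. */
static int plain_read(void *dst, const void *src, size_t size)
{
	memcpy(dst, src, size);
	return 0;
}

/*
 * Copy "count" elements of "size" bytes each, one element at a time.
 * An element that cannot be read is recorded as zeros, as get_array() does.
 */
static void copy_array(void *dst, const void *src, size_t size, size_t count,
		       read_fn rd)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t i;

	for (i = 0; i < count; i++) {
		if (rd(d, s, size))
			memset(d, 0, size);
		d += size;
		s += size;
	}
}

/* Print byte elements as comma-separated hex, e.g. "b4,b5,2f". */
static int format_array(char *out, size_t len, const unsigned char *data,
			size_t count)
{
	size_t i;
	int r = 0;

	if (len)
		out[0] = '\0';
	for (i = 0; i < count && (size_t)r < len; i++)
		r += snprintf(out + r, len - r, "%s%x", i ? "," : "", data[i]);
	return r;
}
```

Calling copy_array(rec, mac, 1, 6, plain_read) on a 6-byte source and then
format_array() on rec gives the same shape as the perm_addr field in the
trace output. Reading per element means one unreadable element zeros only
itself rather than the whole array.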

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index b0e6725f3032..4a8a6fb16a0a 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -93,7 +93,7 @@ as follows:
 
  ARG := TYPE FIELD | TYPE <name> '=' ADDR | TYPE ADDR | ARG '|' ARG
 
- TYPE := ATOM | 'unsigned' ATOM
+ TYPE := ATOM | ATOM '[' <number> ']' | 'unsigned' TYPE
 
  ATOM := 'u8' | 'u16' | 'u32' | 'u64' |
          's8' | 's16' | 's32' | 's64' |
@@ -305,3 +305,23 @@ Is the same as
     <idle>-0     [003] d..3   655.823498: ret_from_intr->do_IRQ(total_forks=1504, regs=tick_nohz_idle_enter+0x4c/0x50)
     <idle>-0     [003] d..3   655.954096: ret_from_intr->do_IRQ(total_forks=1504, regs=cpuidle_enter_state+0xb1/0x330)
 
+
+Array types
+===========
+
+If there's a case where you want to see an array of a type, then you can
+declare a type as an array by adding '[' number ']' after the type.
+
+For example, to get the net_device perm_addr from the dev parameter:
+
+ (gdb) printf "%d\n", &((struct net_device *)0)->perm_addr
+558
+
+ # echo 'ip_rcv(x64 skb, x8[6] perm_addr+558)' > function_events
+
+ # echo 1 > events/functions/ip_rcv/enable
+ # cat trace
+    <idle>-0     [003] ..s3   219.813582: __netif_receive_skb_core->ip_rcv(skb=ffff880118195e00, perm_addr=b4,b5,2f,ce,18,65)
+    <idle>-0     [003] ..s3   219.813595: __netif_receive_skb_core->ip_rcv(skb=ffff880118195e00, perm_addr=b4,b5,2f,ce,18,65)
+    <idle>-0     [003] ..s3   220.115053: __netif_receive_skb_core->ip_rcv(skb=ffff880118195c00, perm_addr=b4,b5,2f,ce,18,65)
+    <idle>-0     [003] ..s3   220.115293: __netif_receive_skb_core->ip_rcv(skb=ffff880118195c00, perm_addr=b4,b5,2f,ce,18,65)
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index 206114f192be..64e2d7dcfd18 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -20,6 +20,7 @@ struct func_arg {
 	char				*name;
 	long				indirect;
 	long				index;
+	short				array;
 	short				offset;
 	short				size;
 	s8				arg;
@@ -68,6 +69,9 @@ enum func_states {
 	FUNC_STATE_PIPE,
 	FUNC_STATE_PLUS,
 	FUNC_STATE_TYPE,
+	FUNC_STATE_ARRAY,
+	FUNC_STATE_ARRAY_SIZE,
+	FUNC_STATE_ARRAY_END,
 	FUNC_STATE_VAR,
 	FUNC_STATE_COMMA,
 	FUNC_STATE_END,
@@ -289,6 +293,7 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 	static bool update_arg;
 	static int unsign;
 	unsigned long val;
+	char *type;
 	int ret;
 	int i;
 
@@ -339,6 +344,10 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 		return FUNC_STATE_TYPE;
 
 	case FUNC_STATE_TYPE:
+		if (token[0] == '[')
+			return FUNC_STATE_ARRAY;
+		/* Fall through */
+	case FUNC_STATE_ARRAY_END:
 		if (WARN_ON(!fevent->last_arg))
 			break;
 		if (update_arg_name(fevent, token) < 0)
@@ -350,14 +359,37 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 		update_arg = true;
 		return FUNC_STATE_VAR;
 
+	case FUNC_STATE_ARRAY:
 	case FUNC_STATE_BRACKET:
-		WARN_ON(!fevent->last_arg);
+		if (WARN_ON(!fevent->last_arg))
+			break;
 		ret = kstrtoul(token, 0, &val);
 		if (ret)
 			break;
-		val *= fevent->last_arg->size;
-		fevent->last_arg->indirect = val ^ INDIRECT_FLAG;
-		return FUNC_STATE_INDIRECT;
+		if (state == FUNC_STATE_BRACKET) {
+			val *= fevent->last_arg->size;
+			fevent->last_arg->indirect = val ^ INDIRECT_FLAG;
+			return FUNC_STATE_INDIRECT;
+		}
+		if (val <= 0)
+			break;
+		fevent->last_arg->array = val;
+		type = kasprintf(GFP_KERNEL, "%s[%d]", fevent->last_arg->type, (unsigned)val);
+		if (!type)
+			break;
+		kfree(fevent->last_arg->type);
+		fevent->last_arg->type = type;
+		/*
+		 * arg_offset has already been updated once by size.
+		 * This update needs to account for that (hence the "- 1").
+		 */
+		fevent->arg_offset += fevent->last_arg->size * (fevent->last_arg->array - 1);
+		return FUNC_STATE_ARRAY_SIZE;
+
+	case FUNC_STATE_ARRAY_SIZE:
+		if (token[0] != ']')
+			break;
+		return FUNC_STATE_ARRAY_END;
 
 	case FUNC_STATE_INDIRECT:
 		if (token[0] != ']')
@@ -453,6 +485,10 @@ static long long get_arg(struct func_arg *arg, unsigned long val)
 
 	val = val + (arg->indirect ^ INDIRECT_FLAG);
 
+	/* Arrays do their own indirect reads */
+	if (arg->array)
+		return val;
+
 	ret = probe_kernel_read(buf, (void *)val, arg->size);
 	if (ret)
 		return 0;
@@ -474,6 +510,21 @@ static long long get_arg(struct func_arg *arg, unsigned long val)
 	return val;
 }
 
+static void get_array(void *dst, struct func_arg *arg, unsigned long val)
+{
+	void *ptr = (void *)val;
+	int ret;
+	int i;
+
+	for (i = 0; i < arg->array; i++) {
+		ret = probe_kernel_read(dst, ptr, arg->size);
+		if (ret)
+			memset(dst, 0, arg->size);
+		ptr += arg->size;
+		dst += arg->size;
+	}
+}
+
 static void func_event_trace(struct trace_event_file *trace_file,
 			     struct func_event *func_event,
 			     unsigned long ip, unsigned long parent_ip,
@@ -520,7 +571,10 @@ static void func_event_trace(struct trace_event_file *trace_file,
 				val = get_arg(arg, args[arg->arg]);
 		} else
 			val = 0;
-		memcpy(&entry->data[arg->offset], &val, arg->size);
+		if (arg->array)
+			get_array(&entry->data[arg->offset], arg, val);
+		else
+			memcpy(&entry->data[arg->offset], &val, arg->size);
 	}
 
 	event_trigger_unlock_commit_regs(trace_file, buffer, event,
@@ -571,6 +625,25 @@ static void make_fmt(struct func_arg *arg, char *fmt)
 	fmt[c++] = '\0';
 }
 
+static void write_data(struct trace_seq *s, const struct func_arg *arg, const char *fmt,
+		       const void *data)
+{
+	switch (arg->size) {
+	case 8:
+		trace_seq_printf(s, fmt, *(unsigned long long *)data);
+		break;
+	case 4:
+		trace_seq_printf(s, fmt, *(unsigned *)data);
+		break;
+	case 2:
+		trace_seq_printf(s, fmt, *(unsigned short *)data);
+		break;
+	case 1:
+		trace_seq_printf(s, fmt, *(unsigned char *)data);
+		break;
+	}
+}
+
 static enum print_line_t
 func_event_print(struct trace_iterator *iter, int flags,
 		 struct trace_event *event)
@@ -582,6 +655,7 @@ func_event_print(struct trace_iterator *iter, int flags,
 	char fmt[FMT_SIZE];
 	void *data;
 	bool comma = false;
+	int a;
 
 	entry = (struct func_event_hdr *)iter->ent;
 
@@ -598,20 +672,16 @@ func_event_print(struct trace_iterator *iter, int flags,
 
 		make_fmt(arg, fmt);
 
-		switch (arg->size) {
-		case 8:
-			trace_seq_printf(s, fmt, *(unsigned long long *)data);
-			break;
-		case 4:
-			trace_seq_printf(s, fmt, *(unsigned *)data);
-			break;
-		case 2:
-			trace_seq_printf(s, fmt, *(unsigned short *)data);
-			break;
-		case 1:
-			trace_seq_printf(s, fmt, *(unsigned char *)data);
-			break;
-		}
+		if (arg->array) {
+			comma = false;
+			for (a = 0; a < arg->array; a++, data += arg->size) {
+				if (comma)
+					trace_seq_putc(s, ',');
+				comma = true;
+				write_data(s, arg, fmt, data);
+			}
+		} else
+			write_data(s, arg, fmt, data);
 	}
 	trace_seq_puts(s, ")\n");
 	return trace_handle_return(s);
@@ -634,11 +704,14 @@ static int func_event_define_fields(struct trace_event_call *event_call)
 	DEFINE_FIELD(unsigned long, parent_ip, "__ip", 0);
 
 	list_for_each_entry(arg, &fevent->args, list) {
+		int size = arg->size;
+
+		if (arg->array)
+			size *= arg->array;
 		ret = trace_define_field(event_call, arg->type,
 					 arg->name,
 					 sizeof(field) + arg->offset,
-					 arg->size, arg->sign,
-					 FILTER_OTHER);
+					 size, arg->sign, FILTER_OTHER);
 		if (ret < 0)
 			return ret;
 	}
@@ -729,7 +802,7 @@ static int __set_print_fmt(struct func_event *func_event,
 	const char *fmt_start = "\"%pS->%pS(";
 	const char *fmt_end = ")\", REC->__ip, REC->__parent_ip";
 	char fmt[FMT_SIZE];
-	int r, i;
+	int r, i, a;
 	bool comma = false;
 
 	r = snprintf(buf, len, "%s", fmt_start);
@@ -741,19 +814,49 @@ static int __set_print_fmt(struct func_event *func_event,
 			len = update_len(len, i);
 		}
 		comma = true;
-		make_fmt(arg, fmt);
-		i = snprintf(buf + r, len, "%s=%s", arg->name, fmt);
+
+		i = snprintf(buf + r, len, "%s=", arg->name);
 		r += i;
 		len = update_len(len, i);
+
+		make_fmt(arg, fmt);
+
+		if (arg->array) {
+			bool colon = false;
+
+			for (a = 0; a < arg->array; a++) {
+				if (colon) {
+					i = snprintf(buf + r, len, ":");
+					r += i;
+					len = update_len(len, i);
+				}
+				colon = true;
+				i = snprintf(buf + r, len, "%s", fmt);
+				r += i;
+				len = update_len(len, i);
+			}
+		} else {
+			i = snprintf(buf + r, len, "%s", fmt);
+			r += i;
+			len = update_len(len, i);
+		}
 	}
 	i = snprintf(buf + r, len, "%s", fmt_end);
 	r += i;
 	len = update_len(len, i);
 
 	list_for_each_entry(arg, &func_event->args, list) {
-		i = snprintf(buf + r, len, ", REC->%s", arg->name);
-		r += i;
-		len = update_len(len, i);
+		if (arg->array) {
+			for (a = 0; a < arg->array; a++) {
+				i = snprintf(buf + r, len, ", REC->%s[%d]", arg->name, a);
+				r += i;
+				len = update_len(len, i);
+			}
+		} else {
+			i = snprintf(buf + r, len, ", REC->%s", arg->name);
+			r += i;
+			len = update_len(len, i);
+		}
 	}
 
 	return r;
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 14/18] tracing: Have char arrays be strings for function based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (12 preceding siblings ...)
  2018-02-02 23:05 ` [PATCH 13/18] tracing: Add array type to " Steven Rostedt
@ 2018-02-02 23:05 ` Steven Rostedt
  2018-02-02 23:05 ` [PATCH 15/18] tracing: Add string type for dynamic strings in " Steven Rostedt
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

[-- Attachment #1: 0014-tracing-Have-char-arrays-be-strings-for-function-bas.patch --]
[-- Type: text/plain, Size: 5070 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

If a field in a function based event is defined with type "char[##]" then it
will be considered a static string. If a user wants an actual byte array
they should use one of u8, s8, or x8.

Now we can get strings from events:

 # echo 'SyS_openat(int dfd, char[64] buf, x32 flags, x32 mode)' > function_events
 # grep xxx /etc/*
 # cat trace
  grep-1745  [001] .... 346135.431364: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, buf=/etc/adjtime, flags=100, mode=0)
  grep-1745  [001] .... 346135.431734: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, buf=/etc/aliases, flags=100, mode=0)
  grep-1745  [001] .... 346135.618765: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, buf=/etc/alternatives, flags=100, mode=0)
  grep-1745  [001] .... 346135.619063: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, buf=/etc/anacrontab, flags=100, mode=0)
  grep-1745  [001] .... 346135.619134: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, buf=/etc/asciidoc, flags=100, mode=0)
  grep-1745  [001] .... 346135.619390: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, buf=/etc/asound.conf, flags=100, mode=0)
  grep-1745  [001] .... 346135.624350: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, buf=/etc/audisp, flags=100, mode=0)
  grep-1745  [001] .... 346135.624565: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, buf=/etc/audit, flags=100, mode=0)

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 Documentation/trace/function-based-events.rst | 17 +++++++++++++++++
 kernel/trace/trace_event_ftrace.c             | 21 +++++++++++++++++----
 2 files changed, 34 insertions(+), 4 deletions(-)
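
As a rough user-space illustration (not part of this patch), the sketch
below shows the format choice for 'char' fields: a char array is printed as
a string, a single char as one character. The name format_char_field() is
hypothetical. One deliberate difference, stated as an assumption of this
sketch: it bounds the string with a "%.*s" precision, whereas the patch
emits a plain "%s".

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a 'char' field. "array" is the declared element count (0 when the
 * field is a single char). The precision in "%.*s" keeps the read inside
 * the array when no nul is present; that bound is an addition in this
 * sketch, the patch itself uses a plain "%s".
 */
static int format_char_field(char *out, size_t len, const char *data,
			     int array)
{
	if (array)
		return snprintf(out, len, "%.*s", array, data);
	return snprintf(out, len, "%c", data[0]);
}
```

With a field declared as char[64], output stops at the first nul, so a
short path such as /etc/adjtime prints without trailing padding.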

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index 4a8a6fb16a0a..99ae77cd59e6 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -325,3 +325,20 @@ To get the net_device perm_addr, from the dev parameter.
     <idle>-0     [003] ..s3   219.813595: __netif_receive_skb_core->ip_rcv(skb=ffff880118195e00, perm_addr=b4,b5,2f,ce,18,65)
     <idle>-0     [003] ..s3   220.115053: __netif_receive_skb_core->ip_rcv(skb=ffff880118195c00, perm_addr=b4,b5,2f,ce,18,65)
     <idle>-0     [003] ..s3   220.115293: __netif_receive_skb_core->ip_rcv(skb=ffff880118195c00, perm_addr=b4,b5,2f,ce,18,65)
+
+
+Static strings
+==============
+
+An array of type 'char' or 'unsigned char' will be processed as a string using
+the format "%s". If a nul is found, the output will stop. Use another type
+(x8, u8, s8) if this is not desired.
+
+  # echo 'link_path_walk(char[64] name)' > function_events
+
+  # echo 1 > events/functions/link_path_walk/enable
+  # cat trace
+      bash-1470  [003] ...2   980.678664: path_openat->link_path_walk(name=/usr/bin/cat)
+      bash-1470  [003] ...2   980.678715: path_openat->link_path_walk(name=/lib64/ld-linux-x86-64.so.2)
+      bash-1470  [003] ...2   980.678721: path_openat->link_path_walk(name=ld-2.24.so)
+      bash-1470  [003] ...2   980.678978: path_lookupat->link_path_walk(name=/etc/ld.so.preload)
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index 64e2d7dcfd18..dd24b840329d 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -610,6 +610,14 @@ static void make_fmt(struct func_arg *arg, char *fmt)
 
 	fmt[c++] = '%';
 
+	if (arg->func_type == FUNC_TYPE_char) {
+		if (arg->array)
+			fmt[c++] = 's';
+		else
+			fmt[c++] = 'c';
+		goto out;
+	}
+
 	if (arg->size == 8) {
 		fmt[c++] = 'l';
 		fmt[c++] = 'l';
@@ -622,6 +630,7 @@ static void make_fmt(struct func_arg *arg, char *fmt)
 	else
 		fmt[c++] = 'u';
 
+ out:
 	fmt[c++] = '\0';
 }
 
@@ -639,7 +648,10 @@ static void write_data(struct trace_seq *s, const struct func_arg *arg, const ch
 		trace_seq_printf(s, fmt, *(unsigned short *)data);
 		break;
 	case 1:
-		trace_seq_printf(s, fmt, *(unsigned char *)data);
+		if (arg->array && arg->func_type == FUNC_TYPE_char)
+			trace_seq_printf(s, fmt, (char *)data);
+		else
+			trace_seq_printf(s, fmt, *(unsigned char *)data);
 		break;
 	}
 }
@@ -672,7 +684,7 @@ func_event_print(struct trace_iterator *iter, int flags,
 
 		make_fmt(arg, fmt);
 
-		if (arg->array) {
+		if (arg->array && arg->func_type != FUNC_TYPE_char) {
 			comma = false;
 			for (a = 0; a < arg->array; a++, data += arg->size) {
 				if (comma)
@@ -821,7 +833,7 @@ static int __set_print_fmt(struct func_event *func_event,
 
 		make_fmt(arg, fmt);
 
-		if (arg->array) {
+		if (arg->array && arg->func_type != FUNC_TYPE_char) {
 			bool colon = false;
 
 			for (a = 0; a < arg->array; a++) {
@@ -846,7 +858,8 @@ static int __set_print_fmt(struct func_event *func_event,
 	len = update_len(len, i);
 
 	list_for_each_entry(arg, &func_event->args, list) {
-		if (arg->array) {
+		/* Don't iterate for strings */
+		if (arg->array && arg->func_type != FUNC_TYPE_char) {
 			for (a = 0; a < arg->array; a++) {
 				i = snprintf(buf + r, len, ", REC->%s[%d]", arg->name, a);
 				r += i;
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 15/18] tracing: Add string type for dynamic strings in function based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (13 preceding siblings ...)
  2018-02-02 23:05 ` [PATCH 14/18] tracing: Have char arrays be strings for " Steven Rostedt
@ 2018-02-02 23:05 ` Steven Rostedt
  2018-02-09  3:15   ` Namhyung Kim
  2018-02-02 23:05 ` [PATCH 16/18] tracing: Add NULL to skip args for " Steven Rostedt
                   ` (6 subsequent siblings)
  21 siblings, 1 reply; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

[-- Attachment #1: 0015-tracing-Add-string-type-for-dynamic-strings-in-funct.patch --]
[-- Type: text/plain, Size: 10749 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Add a "string" type that will create a dynamic length string for the
event. This is the same as the __string() field in normal TRACE_EVENTS.

[ missing 'static' found by Fengguang Wu's kbuild test robot ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 Documentation/trace/function-based-events.rst |  19 ++-
 kernel/trace/trace_event_ftrace.c             | 183 +++++++++++++++++++++++---
 2 files changed, 181 insertions(+), 21 deletions(-)
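
As a rough user-space illustration (not part of this patch), the sketch
below shows the 32-bit descriptor stored in the fixed part of the record for
a "string" field: the length goes in the high 16 bits and the offset of the
string data in the low 16 bits. The helper names pack_str_info(),
str_info_offset() and str_info_len() are hypothetical.

```c
#include <stdint.h>

/* Pack a string's length (high 16 bits) and record offset (low 16 bits). */
static uint32_t pack_str_info(uint16_t len, uint16_t offset)
{
	return ((uint32_t)len << 16) | offset;
}

/* Offset of the string data, counted from the start of the record. */
static uint16_t str_info_offset(uint32_t info)
{
	return info & 0xffff;
}

/* Number of bytes recorded for the string. */
static uint16_t str_info_len(uint32_t info)
{
	return info >> 16;
}
```

In the patch the offset is relative to the start of the record, header
included, which is why the print path subtracts sizeof(struct
func_event_hdr) before indexing into entry->data.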

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index 99ae77cd59e6..6c643ea749e7 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -99,7 +99,7 @@ as follows:
          's8' | 's16' | 's32' | 's64' |
          'x8' | 'x16' | 'x32' | 'x64' |
          'char' | 'short' | 'int' | 'long' | 'size_t' |
-	 'symbol'
+	 'symbol' | 'string'
 
  FIELD := <name> | <name> INDEX | <name> OFFSET | <name> OFFSET INDEX
 
@@ -342,3 +342,20 @@ the format "%s". If a nul is found, the output will stop. Use another type
       bash-1470  [003] ...2   980.678715: path_openat->link_path_walk(name=/lib64/ld-linux-x86-64.so.2)
       bash-1470  [003] ...2   980.678721: path_openat->link_path_walk(name=ld-2.24.so)
       bash-1470  [003] ...2   980.678978: path_lookupat->link_path_walk(name=/etc/ld.so.preload)
+
+
+Dynamic strings
+===============
+
+Static strings are fine, but they can waste a lot of memory in the ring buffer.
+The above allocated 64 bytes for a character array, but most of the output was
+less than 20 characters. To avoid both truncating strings and wasting space in
+the ring buffer, use a dynamic string.
+
+Use the "string" type for strings that have a large range in size. The max
+size that will be recorded is 512 bytes. If a string is larger than that, then
+it will be truncated.
+
+ # echo 'link_path_walk(string name)' > function_events
+
+Gives the same result as above, but does not waste buffer space.
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index dd24b840329d..273c5838a8e2 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -39,6 +39,7 @@ struct func_event {
 	struct func_arg			*last_arg;
 	int				arg_cnt;
 	int				arg_offset;
+	int				has_strings;
 };
 
 struct func_file {
@@ -83,6 +84,8 @@ typedef u32 x32;
 typedef u16 x16;
 typedef u8 x8;
 typedef void * symbol;
+/* 2 byte offset, 2 byte length */
+typedef u32 string;
 
 #define TYPE_TUPLE(type)			\
 	{ #type, sizeof(type), is_signed_type(type) }
@@ -105,7 +108,8 @@ typedef void * symbol;
 	TYPE_TUPLE(u8),				\
 	TYPE_TUPLE(s8),				\
 	TYPE_TUPLE(x8),				\
-	TYPE_TUPLE(symbol)
+	TYPE_TUPLE(symbol),			\
+	TYPE_TUPLE(string)
 
 static struct func_type {
 	char		*name;
@@ -124,6 +128,16 @@ enum {
 	FUNC_TYPE_MAX
 };
 
+#define MAX_STR		512
+
+/* Two contexts, normal and NMI, hence the " * 2" */
+struct func_string {
+	char		buf[MAX_STR * 2];
+};
+
+static struct func_string __percpu *str_buffer;
+static int nr_strings;
+
 /**
  * arch_get_func_args - retrieve function arguments via pt_regs
  * @regs: The registers at the moment the function is called
@@ -163,6 +177,23 @@ int __weak arch_get_func_args(struct pt_regs *regs,
 	return 0;
 }
 
+static void free_arg(struct func_arg *arg)
+{
+	list_del(&arg->list);
+	if (arg->func_type == FUNC_TYPE_string) {
+		nr_strings--;
+		if (WARN_ON(nr_strings < 0))
+			nr_strings = 0;
+		if (!nr_strings) {
+			free_percpu(str_buffer);
+			str_buffer = NULL;
+		}
+	}
+	kfree(arg->name);
+	kfree(arg->type);
+	kfree(arg);
+}
+
 static void free_func_event(struct func_event *func_event)
 {
 	struct func_arg *arg, *n;
@@ -171,10 +202,7 @@ static void free_func_event(struct func_event *func_event)
 		return;
 
 	list_for_each_entry_safe(arg, n, &func_event->args, list) {
-		list_del(&arg->list);
-		kfree(arg->name);
-		kfree(arg->type);
-		kfree(arg);
+		free_arg(arg);
 	}
 	ftrace_free_filter(&func_event->ops);
 	kfree(func_event->call.print_fmt);
@@ -255,6 +283,17 @@ static int add_arg(struct func_event *fevent, int ftype, int unsign)
 	list_add_tail(&arg->list, &fevent->args);
 	fevent->last_arg = arg;
 
+	if (ftype == FUNC_TYPE_string) {
+		fevent->has_strings++;
+		nr_strings++;
+		if (nr_strings == 1) {
+			str_buffer = alloc_percpu(struct func_string);
+			if (!str_buffer) {
+				free_arg(arg);
+				return -ENOMEM;
+			}
+		}
+	}
 	return 0;
 }
 
@@ -344,8 +383,19 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 		return FUNC_STATE_TYPE;
 
 	case FUNC_STATE_TYPE:
-		if (token[0] == '[')
+		if (token[0] == '[') {
+			/* Strings are already arrays */
+			if (fevent->last_arg->func_type == FUNC_TYPE_string)
+				break;
 			return FUNC_STATE_ARRAY;
+		}
+		if (fevent->last_arg->func_type == FUNC_TYPE_string) {
+			type = kstrdup("__data_loc char[]", GFP_KERNEL);
+			if (!type)
+				break;
+			kfree(fevent->last_arg->type);
+			fevent->last_arg->type = type;
+		}
 		/* Fall through */
 	case FUNC_STATE_ARRAY_END:
 		if (WARN_ON(!fevent->last_arg))
@@ -473,7 +523,7 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 	return FUNC_STATE_END;
 }
 
-static long long get_arg(struct func_arg *arg, unsigned long val)
+static long long __get_arg(struct func_arg *arg, unsigned long val)
 {
 	char buf[8];
 	int ret;
@@ -485,8 +535,8 @@ static long long get_arg(struct func_arg *arg, unsigned long val)
 
 	val = val + (arg->indirect ^ INDIRECT_FLAG);
 
-	/* Arrays do their own indirect reads */
-	if (arg->array)
+	/* Arrays and strings do their own indirect reads */
+	if (arg->array || arg->func_type == FUNC_TYPE_string)
 		return val;
 
 	ret = probe_kernel_read(buf, (void *)val, arg->size);
@@ -510,6 +560,15 @@ static long long get_arg(struct func_arg *arg, unsigned long val)
 	return val;
 }
 
+static long long get_arg(struct func_arg *arg, long *args)
+{
+	/* Is arg an address and not a parameter? */
+	if (arg->arg < 0)
+		return __get_arg(arg, 0);
+	else
+		return __get_arg(arg, args[arg->arg]);
+}
+
 static void get_array(void *dst, struct func_arg *arg, unsigned long val)
 {
 	void *ptr = (void *)val;
@@ -525,6 +584,67 @@ static void get_array(void *dst, struct func_arg *arg, unsigned long val)
 	}
 }
 
+static int read_string(char *str, unsigned long addr)
+{
+	unsigned long flags;
+	struct func_string *strbuf;
+	char *ptr = (void *)addr;
+	char *buf;
+	int ret;
+
+	if (!str_buffer)
+		return 0;
+
+	strbuf = this_cpu_ptr(str_buffer);
+	buf = &strbuf->buf[0];
+
+	if (in_nmi())
+		buf += MAX_STR;
+
+	local_irq_save(flags);
+	ret = strncpy_from_unsafe(buf, ptr, MAX_STR);
+	if (ret < 0)
+		ret = 0;
+	if (ret > 0 && str)
+		memcpy(str, buf, ret);
+	local_irq_restore(flags);
+
+	return ret;
+}
+
+static int calculate_strings(struct func_event *func_event, int nr_args, long *args)
+{
+	struct func_arg *arg;
+	unsigned long val;
+	int str_count = 0;
+	int size = 0;
+
+	list_for_each_entry(arg, &func_event->args, list) {
+		if (arg->func_type != FUNC_TYPE_string)
+			continue;
+		if (arg->arg < nr_args)
+			val = get_arg(arg, args);
+		else
+			goto skip;
+		size += read_string(NULL, val);
+ skip:
+		if (++str_count >= func_event->has_strings)
+			return size;
+	}
+	return size;
+}
+
+static int get_string(unsigned long addr, unsigned int idx,
+		      unsigned int *info, char *data)
+{
+	int len;
+
+	len = read_string(data, addr);
+	*info = len << 16 | idx;
+
+	return len;
+}
+
 static void func_event_trace(struct trace_event_file *trace_file,
 			     struct func_event *func_event,
 			     unsigned long ip, unsigned long parent_ip,
@@ -538,6 +658,8 @@ static void func_event_trace(struct trace_event_file *trace_file,
 	long args[func_event->arg_cnt];
 	long long val = 1;
 	unsigned long irq_flags;
+	int str_offset;
+	int str_idx = 0;
 	int nr_args = 0;
 	int size;
 	int pc;
@@ -550,6 +672,12 @@ static void func_event_trace(struct trace_event_file *trace_file,
 
 	size = func_event->arg_offset + sizeof(*entry);
 
+	if (func_event->arg_cnt)
+		nr_args = arch_get_func_args(pt_regs, 0, func_event->arg_cnt, args);
+
+	if (func_event->has_strings)
+		size += calculate_strings(func_event, nr_args, args);
+
 	event = trace_event_buffer_lock_reserve(&buffer, trace_file,
 						call->event.type,
 						size, irq_flags, pc);
@@ -559,21 +687,22 @@ static void func_event_trace(struct trace_event_file *trace_file,
 	entry = ring_buffer_event_data(event);
 	entry->ip = ip;
 	entry->parent_ip = parent_ip;
-	if (func_event->arg_cnt)
-		nr_args = arch_get_func_args(pt_regs, 0, func_event->arg_cnt, args);
 
 	list_for_each_entry(arg, &func_event->args, list) {
-		if (arg->arg < nr_args) {
-			/* Is arg an address and not a parameter? */
-			if (arg->arg < 0)
-				val = get_arg(arg, 0);
-			else
-				val = get_arg(arg, args[arg->arg]);
-		} else
+		if (arg->arg < nr_args)
+			val = get_arg(arg, args);
+		else
 			val = 0;
 		if (arg->array)
 			get_array(&entry->data[arg->offset], arg, val);
-		else
+		else if (arg->func_type == FUNC_TYPE_string) {
+			str_offset = sizeof(struct func_event_hdr) +
+				func_event->arg_offset;
+
+			str_idx += get_string(val, str_offset + str_idx,
+					      (unsigned int *)&entry->data[arg->offset],
+					      &entry->data[func_event->arg_offset + str_idx]);
+		} else
 			memcpy(&entry->data[arg->offset], &val, arg->size);
 	}
 
@@ -610,6 +739,11 @@ static void make_fmt(struct func_arg *arg, char *fmt)
 
 	fmt[c++] = '%';
 
+	if (arg->func_type == FUNC_TYPE_string) {
+		fmt[c++] = 's';
+		goto out;
+	}
+
 	if (arg->func_type == FUNC_TYPE_char) {
 		if (arg->array)
 			fmt[c++] = 's';
@@ -667,6 +801,7 @@ func_event_print(struct trace_iterator *iter, int flags,
 	char fmt[FMT_SIZE];
 	void *data;
 	bool comma = false;
+	int info;
 	int a;
 
 	entry = (struct func_event_hdr *)iter->ent;
@@ -692,6 +827,11 @@ func_event_print(struct trace_iterator *iter, int flags,
 				comma = true;
 				write_data(s, arg, fmt, data);
 			}
+		} else if (arg->func_type == FUNC_TYPE_string) {
+			info = *(unsigned int *)data;
+			info = (info & 0xffff) - sizeof(struct func_event_hdr);
+			data = &entry->data[info];
+			trace_seq_printf(s, fmt, data);
 		} else
 			write_data(s, arg, fmt, data);
 	}
@@ -866,7 +1006,10 @@ static int __set_print_fmt(struct func_event *func_event,
 				len = update_len(len, i);
 			}
 		} else {
-			i = snprintf(buf + r, len, ", REC->%s", arg->name);
+			if (arg->func_type == FUNC_TYPE_string)
+				i = snprintf(buf + r, len, ", __get_str(%s)", arg->name);
+			else
+				i = snprintf(buf + r, len, ", REC->%s", arg->name);
 			r += i;
 			len = update_len(len, i);
 		}
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 16/18] tracing: Add NULL to skip args for function based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (14 preceding siblings ...)
  2018-02-02 23:05 ` [PATCH 15/18] tracing: Add string type for dynamic strings in " Steven Rostedt
@ 2018-02-02 23:05 ` Steven Rostedt
  2018-02-02 23:05 ` [PATCH 17/18] tracing: Add indirect to indirect access " Steven Rostedt
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

[-- Attachment #1: 0016-tracing-Add-NULL-to-skip-args-for-function-based-eve.patch --]
[-- Type: text/plain, Size: 5889 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

If args are to be skipped (only care about second, third or later arguments)
then add a NULL to ignore them. For example, if one only wants to record the
third argument of a function, they can perform:

 # echo 'foo(NULL, NULL, u32 arg3)' > function_events

Then only the third argument is saved in the function based event.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 Documentation/trace/function-based-events.rst | 28 +++++++++++++++++++++-
 kernel/trace/trace_event_ftrace.c             | 34 ++++++++++++++++++++++++++-
 2 files changed, 60 insertions(+), 2 deletions(-)
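
As a rough user-space illustration (not part of this patch), the sketch
below shows the intended effect of NULL: it consumes a parameter position
without recording a field. The name map_recorded_args() and the
pre-split "specs" array are hypothetical; the patch itself handles NULL
inside the token state machine rather than on a list of strings.

```c
#include <string.h>

/*
 * A "NULL" entry in "specs" consumes a parameter position without
 * recording a field. Stores the parameter position of every recorded
 * field in "pos" and returns how many fields are recorded.
 */
static int map_recorded_args(const char *const specs[], int nr, int pos[])
{
	int i, recorded = 0;

	for (i = 0; i < nr; i++) {
		if (strcmp(specs[i], "NULL") == 0)
			continue;
		pos[recorded++] = i;
	}
	return recorded;
}
```

For 'foo(NULL, NULL, u32 arg3)' this records one field that reads parameter
position 2. The position still has to be one the architecture can supply.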

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index 6c643ea749e7..b90b52b7061d 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -91,7 +91,7 @@ as follows:
 
  ARGS := ARG | ARG ',' ARGS | ''
 
- ARG := TYPE FIELD | TYPE <name> '=' ADDR | TYPE ADDR | ARG '|' ARG
+ ARG := TYPE FIELD | TYPE <name> '=' ADDR | TYPE ADDR | ARG '|' ARG | 'NULL'
 
  TYPE := ATOM | ATOM '[' <number> ']' | 'unsigned' TYPE
 
@@ -359,3 +359,29 @@ it will be truncated.
  # echo 'link_path_walk(string name)' > function_events
 
 Gives the same result as above, but does not waste buffer space.
+
+
+NULL arguments
+==============
+
+If you are only interested in the second, or later parameter of a function,
+you do not have to record the previous parameters. Just set them as NULL and
+they will not be recorded.
+
+If we only wanted the perm_addr of the net_device of ip_rcv() and not the
+sk_buff, we put a NULL into the first parameter when creating the function
+based event.
+
+  # echo 'ip_rcv(NULL, x8[6] perm_addr+558)' > function_events
+
+  # echo 1 > events/functions/ip_rcv/enable
+  # cat trace
+    <idle>-0     [003] ..s3   165.617114: __netif_receive_skb_core->ip_rcv(perm_addr=b4,b5,2f,ce,18,65)
+    <idle>-0     [003] ..s3   165.617133: __netif_receive_skb_core->ip_rcv(perm_addr=b4,b5,2f,ce,18,65)
+    <idle>-0     [003] ..s3   166.412277: __netif_receive_skb_core->ip_rcv(perm_addr=b4,b5,2f,ce,18,65)
+    <idle>-0     [003] ..s3   166.412797: __netif_receive_skb_core->ip_rcv(perm_addr=b4,b5,2f,ce,18,65)
+
+
+NULL can appear in any argument, to have them ignored. Note, skipping arguments
+does not give you access to later arguments if they are not supported by the
+architecture. The architecture only supplies the first set of arguments.
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index 273c5838a8e2..22bcb67ad184 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -75,6 +75,7 @@ enum func_states {
 	FUNC_STATE_ARRAY_END,
 	FUNC_STATE_VAR,
 	FUNC_STATE_COMMA,
+	FUNC_STATE_NULL,
 	FUNC_STATE_END,
 	FUNC_STATE_ERROR,
 };
@@ -117,6 +118,7 @@ static struct func_type {
 	int		sign;
 } func_types[] = {
 	FUNC_TYPES,
+	{ "NULL",	0,	0 },
 	{ NULL,		0,	0 }
 };
 
@@ -125,6 +127,7 @@ static struct func_type {
 
 enum {
 	FUNC_TYPES,
+	FUNC_TYPE_NULL,
 	FUNC_TYPE_MAX
 };
 
@@ -364,6 +367,8 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 			fevent->arg_cnt++;
 		update_arg = false;
 	case FUNC_STATE_PIPE:
+		if (strcmp(token, "NULL") == 0)
+			return FUNC_STATE_NULL;
 		if (strcmp(token, "unsigned") == 0) {
 			unsign = 2;
 			return FUNC_STATE_UNSIGNED;
@@ -513,6 +518,19 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 		fevent->last_arg->indirect = INDIRECT_FLAG;
 		return FUNC_STATE_ADDR;
 
+	case FUNC_STATE_NULL:
+		ret = add_arg(fevent, FUNC_TYPE_NULL, 0);
+		if (ret < 0)
+			break;
+		switch (token[0]) {
+		case ')':
+			goto end;
+		case ',':
+			update_arg = true;
+			return FUNC_STATE_COMMA;
+		}
+		break;
+
 	default:
 		break;
 	}
@@ -689,6 +707,8 @@ static void func_event_trace(struct trace_event_file *trace_file,
 	entry->parent_ip = parent_ip;
 
 	list_for_each_entry(arg, &func_event->args, list) {
+		if (arg->func_type == FUNC_TYPE_NULL)
+			continue;
 		if (arg->arg < nr_args)
 			val = get_arg(arg, args);
 		else
@@ -811,6 +831,8 @@ func_event_print(struct trace_iterator *iter, int flags,
 	trace_seq_printf(s, "%ps->%ps(",
 			 (void *)entry->parent_ip, (void *)entry->ip);
 	list_for_each_entry(arg, &func_event->args, list) {
+		if (arg->func_type == FUNC_TYPE_NULL)
+			continue;
 		if (comma)
 			trace_seq_puts(s, ", ");
 		comma = true;
@@ -858,6 +880,9 @@ static int func_event_define_fields(struct trace_event_call *event_call)
 	list_for_each_entry(arg, &fevent->args, list) {
 		int size = arg->size;
 
+		if (arg->func_type == FUNC_TYPE_NULL)
+			continue;
+
 		if (arg->array)
 			size *= arg->array;
 		ret = trace_define_field(event_call, arg->type,
@@ -960,6 +985,8 @@ static int __set_print_fmt(struct func_event *func_event,
 	r = snprintf(buf, len, "%s", fmt_start);
 	len = update_len(len, r);
 	list_for_each_entry(arg, &func_event->args, list) {
+		if (arg->func_type == FUNC_TYPE_NULL)
+			continue;
 		if (comma) {
 			i = snprintf(buf + r, len, ", ");
 			r += i;
@@ -998,6 +1025,8 @@ static int __set_print_fmt(struct func_event *func_event,
 	len = update_len(len, i);
 
 	list_for_each_entry(arg, &func_event->args, list) {
+		if (arg->func_type == FUNC_TYPE_NULL)
+			continue;
 		/* Don't iterate for strings */
 		if (arg->array && arg->func_type != FUNC_TYPE_char) {
 			for (a = 0; a < arg->array; a++) {
@@ -1148,7 +1177,10 @@ static int func_event_seq_show(struct seq_file *m, void *v)
 		}
 		last_arg = arg->arg;
 		comma = true;
-		seq_printf(m, "%s %s", arg->type, arg->name);
+		if (arg->func_type == FUNC_TYPE_NULL)
+			seq_puts(m, "NULL");
+		else
+			seq_printf(m, "%s %s", arg->type, arg->name);
 		if (arg->arg < 0) {
 			seq_printf(m, "=0x%lx", arg->index);
 		} else {
-- 
2.15.1


* [PATCH 17/18] tracing: Add indirect to indirect access for function based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (15 preceding siblings ...)
  2018-02-02 23:05 ` [PATCH 16/18] tracing: Add NULL to skip args for " Steven Rostedt
@ 2018-02-02 23:05 ` Steven Rostedt
  2018-02-09  5:13   ` Namhyung Kim
  2018-02-02 23:05 ` [PATCH 18/18] tracing/perf: Allow perf to use " Steven Rostedt
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

[-- Attachment #1: 0017-tracing-Add-indirect-to-indirect-access-for-function.patch --]
[-- Type: text/plain, Size: 8772 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Allow function based events to retrieve not only data at an offset from a
parameter, but also data reached through a pointer within the structure a
parameter points to. Something like:

 # echo 'ip_rcv(string skdev+16[0][0] | x8[6] skperm+16[0]+558)' > function_events

 # echo 1 > events/functions/ip_rcv/enable
 # cat trace
    <idle>-0     [003] ..s3   310.626391: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
    <idle>-0     [003] ..s3   310.626400: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
    <idle>-0     [003] ..s3   312.183775: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
    <idle>-0     [003] ..s3   312.184329: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
    <idle>-0     [003] ..s3   312.303895: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
    <idle>-0     [003] ..s3   312.304610: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
    <idle>-0     [003] ..s3   312.471980: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
    <idle>-0     [003] ..s3   312.472908: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
    <idle>-0     [003] ..s3   313.135804: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)

That is, we followed the sk_buff to its net_device and displayed the
device's name and perm_addr, equivalent to:

  sk->dev->name, sk->dev->perm_addr

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
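The chain can be modeled in userspace as "add an offset, optionally
dereference, repeat". This is a minimal sketch assuming ordinary readable
memory (the patch uses probe_kernel_read() so that a bad pointer does not
fault). The names demo_step, demo_walk, demo_dev and demo_skb are
hypothetical, and offsetof() stands in for offsets looked up by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One hop of the chain: add 'offset', then dereference if 'deref' is set. */
struct demo_step {
	long offset;
	int deref;
};

/*
 * Walk 'nr' steps starting from 'base'. Returns the final value, or 0 if
 * the chain would have to pass through a NULL pointer.
 */
static uintptr_t demo_walk(uintptr_t base, const struct demo_step *steps,
			   size_t nr)
{
	uintptr_t val = base;
	size_t i;

	assert(steps != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (!val)
			return 0;
		val += (uintptr_t)steps[i].offset;
		if (steps[i].deref) {
			uintptr_t next;

			/* memcpy avoids alignment and aliasing assumptions. */
			memcpy(&next, (const void *)val, sizeof(next));
			val = next;
		}
	}
	return val;
}

/* Toy structures standing in for sk_buff and net_device. */
struct demo_dev {
	long flags;
	const char *name;
};

struct demo_skb {
	char pad[16];
	struct demo_dev *dev;
};
```

With steps { offsetof(struct demo_skb, dev), deref } followed by
{ offsetof(struct demo_dev, name), deref }, demo_walk() on a demo_skb yields
the same pointer as skb->dev->name.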
 Documentation/trace/function-based-events.rst |  40 +++++++++-
 kernel/trace/trace_event_ftrace.c             | 102 ++++++++++++++++++++++++--
 2 files changed, 136 insertions(+), 6 deletions(-)

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index b90b52b7061d..3b341992b93d 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -101,12 +101,15 @@ as follows:
          'char' | 'short' | 'int' | 'long' | 'size_t' |
 	 'symbol' | 'string'
 
- FIELD := <name> | <name> INDEX | <name> OFFSET | <name> OFFSET INDEX
+ FIELD := <name> | <name> INDEX | <name> OFFSET | <name> OFFSET INDEX |
+	 FIELD INDIRECT
 
  INDEX := '[' <number> ']'
 
  OFFSET := '+' <number>
 
+ INDIRECT := INDEX | OFFSET | INDIRECT INDIRECT | ''
+
  ADDR := A hexidecimal address starting with '0x'
 
  Where <name> is a unique string starting with an alphabetic character
@@ -385,3 +388,38 @@ based event.
 NULL can appear in any argument, to have them ignored. Note, skipping arguments
 does not give you access to later arguments if they are not supported by the
 architecture. The architecture only supplies the first set of arguments.
+
+
+The chain of indirects
+======================
+
+When a parameter points to a structure, and that structure points to another
+structure, the data of the second structure can still be retrieved.
+
+ssize_t __vfs_read(struct file *file, char __user *buf, size_t count,
+		   loff_t *pos)
+
+has the following code.
+
+	if (file->f_op->read)
+		return file->f_op->read(file, buf, count, pos);
+
+To trace which function f_op->read points to on each call, that information
+can be obtained from the file pointer.
+
+Using gdb again:
+
+   (gdb) printf "%d\n", &((struct file *)0)->f_op
+40
+   (gdb) printf "%d\n", &((struct file_operations *)0)->read
+16
+
+  # echo '__vfs_read(symbol read+40[0]+16)' > function_events
+
+  # echo 1 > events/functions/__vfs_read/enable
+  # cat trace
+         sshd-1343  [005] ...2   199.734752: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
+         bash-1344  [003] ...2   199.734822: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
+         sshd-1343  [005] ...2   199.734835: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
+ avahi-daemon-910   [003] ...2   200.136740: vfs_read->__vfs_read(read=          (null))
+ avahi-daemon-910   [003] ...2   200.136750: vfs_read->__vfs_read(read=          (null))
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index 22bcb67ad184..b5b719680686 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -14,8 +14,15 @@
 #define WRITE_BUFSIZE		4096
 #define INDIRECT_FLAG		0x10000000
 
+struct func_arg_redirect {
+	struct list_head		list;
+	long				index;
+	long				indirect;
+};
+
 struct func_arg {
 	struct list_head		list;
+	struct list_head		redirects;
 	char				*type;
 	char				*name;
 	long				indirect;
@@ -73,6 +80,8 @@ enum func_states {
 	FUNC_STATE_ARRAY,
 	FUNC_STATE_ARRAY_SIZE,
 	FUNC_STATE_ARRAY_END,
+	FUNC_STATE_REDIRECT_PLUS,
+	FUNC_STATE_REDIRECT_BRACKET,
 	FUNC_STATE_VAR,
 	FUNC_STATE_COMMA,
 	FUNC_STATE_NULL,
@@ -267,6 +276,8 @@ static int add_arg(struct func_event *fevent, int ftype, int unsign)
 	if (!arg)
 		return -ENOMEM;
 
+	INIT_LIST_HEAD(&arg->redirects);
+
 	if (unsign)
 		arg->type = kasprintf(GFP_KERNEL, "unsigned %s",
 				      func_type->name);
@@ -329,6 +340,22 @@ static int update_arg_arg(struct func_event *fevent)
 	return 0;
 }
 
+static int add_arg_redirect(struct func_arg *arg, long index, long indirect)
+{
+	struct func_arg_redirect *redirect;
+
+	redirect = kzalloc(sizeof(*redirect), GFP_KERNEL);
+	if (!redirect)
+		return -ENOMEM;
+
+	redirect->index = index;
+	redirect->indirect = indirect;
+
+	list_add_tail(&redirect->list, &arg->redirects);
+
+	return 0;
+}
+
 static enum func_states
 process_event(struct func_event *fevent, const char *token, enum func_states state)
 {
@@ -459,6 +486,10 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 			return FUNC_STATE_COMMA;
 		case '|':
 			return FUNC_STATE_PIPE;
+		case '+':
+			return FUNC_STATE_REDIRECT_PLUS;
+		case '[':
+			return FUNC_STATE_REDIRECT_BRACKET;
 		}
 		break;
 
@@ -482,6 +513,30 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 		}
 		break;
 
+	case FUNC_STATE_REDIRECT_PLUS:
+		if (WARN_ON(!fevent->last_arg))
+			break;
+		ret = kstrtoul(token, 0, &val);
+		if (ret)
+			break;
+		ret = add_arg_redirect(fevent->last_arg, val, 0);
+		if (ret)
+			break;
+		return FUNC_STATE_VAR;
+
+	case FUNC_STATE_REDIRECT_BRACKET:
+		if (WARN_ON(!fevent->last_arg))
+			break;
+		ret = kstrtoul(token, 0, &val);
+		if (ret)
+			break;
+		val *= fevent->last_arg->size;
+		val ^= INDIRECT_FLAG;
+		ret = add_arg_redirect(fevent->last_arg, 0, val);
+		if (ret)
+			break;
+		return FUNC_STATE_INDIRECT;
+
 	case FUNC_STATE_VAR:
 		if (token[0] == '=')
 			return FUNC_STATE_EQUAL;
@@ -541,20 +596,49 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 	return FUNC_STATE_END;
 }
 
-static long long __get_arg(struct func_arg *arg, unsigned long val)
+static unsigned long process_redirects(struct func_arg *arg, unsigned long val,
+				       char *buf)
+{
+	struct func_arg_redirect *redirect;
+	int ret;
+
+	if (arg->indirect) {
+		ret = probe_kernel_read(buf, (void *)val, sizeof(long));
+		if (ret)
+			return 0;
+		val = *(unsigned long *)buf;
+	}
+
+	list_for_each_entry(redirect, &arg->redirects, list) {
+		val += redirect->index;
+		if (redirect->indirect) {
+			val += (redirect->indirect ^ INDIRECT_FLAG);
+			ret = probe_kernel_read(buf, (void *)val, sizeof(long));
+			if (ret)
+				return 0;
+		}
+	}
+	return val;
+}
+
+static long long __get_arg(struct func_arg *arg, unsigned long long val)
 {
 	char buf[8];
 	int ret;
 
 	val += arg->index;
 
-	if (!arg->indirect)
-		return val;
+	if (arg->indirect)
+		val += (arg->indirect ^ INDIRECT_FLAG);
 
-	val = val + (arg->indirect ^ INDIRECT_FLAG);
+	if (!list_empty(&arg->redirects))
+		val = process_redirects(arg, val, buf);
+
+	if (!val)
+		return 0;
 
 	/* Arrays and strings do their own indirect reads */
-	if (arg->array || arg->func_type == FUNC_TYPE_string)
+	if (!arg->indirect || arg->array || arg->func_type == FUNC_TYPE_string)
 		return val;
 
 	ret = probe_kernel_read(buf, (void *)val, arg->size);
@@ -1162,6 +1246,7 @@ static void func_event_seq_stop(struct seq_file *m, void *v)
 static int func_event_seq_show(struct seq_file *m, void *v)
 {
 	struct func_event *func_event = v;
+	struct func_arg_redirect *redirect;
 	struct func_arg *arg;
 	bool comma = false;
 	int last_arg = 0;
@@ -1190,6 +1275,13 @@ static int func_event_seq_show(struct seq_file *m, void *v)
 				seq_printf(m, "[%ld]",
 					   (arg->indirect ^ INDIRECT_FLAG) / arg->size);
 		}
+		list_for_each_entry(redirect, &arg->redirects, list) {
+			if (redirect->index)
+				seq_printf(m, "+%ld", redirect->index);
+			if (redirect->indirect)
+				seq_printf(m, "[%ld]",
+					   (redirect->indirect ^ INDIRECT_FLAG) / arg->size);
+		}
 	}
 	seq_puts(m, ")\n");
 
-- 
2.15.1


* [PATCH 18/18] tracing/perf: Allow perf to use function based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (16 preceding siblings ...)
  2018-02-02 23:05 ` [PATCH 17/18] tracing: Add indirect to indirect access " Steven Rostedt
@ 2018-02-02 23:05 ` Steven Rostedt
  2018-02-03 13:38 ` [PATCH 00/18] [ANNOUNCE] Dynamically created " Masami Hiramatsu
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-02 23:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

[-- Attachment #1: 0018-tracing-perf-Allow-perf-to-use-function-based-events.patch --]
[-- Type: text/plain, Size: 6682 bytes --]

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Have perf use function based events.

 # echo 'SyS_openat(int dfd, string buf, x32 flags, x32 mode)' > /sys/kernel/tracing/function_events
 # perf record -e functions:SyS_openat grep task_forks /proc/kallsyms
 # perf script
    grep   913 [002]  5713.413239: functions:SyS_openat: entry_SYSCALL_64_fastpath->sys_openat(dfd=-100, buf=/proc/kallsyms, flags=100, mode=0)

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
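The refactoring in this patch can be pictured as: compute the record size
once, let each backend reserve that much space in its own buffer, then fill
it with one shared routine. A minimal userspace sketch of that split follows.
demo_hdr, demo_event_size and demo_record_entry are hypothetical names, and
the fixed-size argument slots are a simplification (the real code also
accounts for strings and arrays).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed header that starts every record. */
struct demo_hdr {
	unsigned long ip;
	unsigned long parent_ip;
};

/* Size of one record: the header plus nr_vals argument slots. */
static size_t demo_event_size(size_t nr_vals)
{
	return sizeof(struct demo_hdr) + nr_vals * sizeof(long);
}

/*
 * Fill a caller-provided buffer of at least demo_event_size(nr_vals)
 * bytes. The buffer may come from any backend; this routine does not care.
 */
static void demo_record_entry(void *buf, unsigned long ip,
			      unsigned long parent_ip,
			      const long *vals, size_t nr_vals)
{
	struct demo_hdr hdr = { ip, parent_ip };

	assert(buf != NULL);

	memcpy(buf, &hdr, sizeof(hdr));
	if (nr_vals)
		memcpy((char *)buf + sizeof(hdr), vals,
		       nr_vals * sizeof(long));
}
```

Two backends that call demo_record_entry() with the same inputs end up with
byte-identical records, which is the point of sharing the fill routine.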
 Documentation/trace/function-based-events.rst |   3 +-
 kernel/trace/trace_event_ftrace.c             | 134 ++++++++++++++++++++------
 2 files changed, 104 insertions(+), 33 deletions(-)

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index 3b341992b93d..6effde96d3d6 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -48,7 +48,8 @@ enable  filter  format  hist  id  trigger
 
 Even though the above function based event does not record much more
 than the function tracer does, it does become a full fledge event.
-This can be used by the histogram infrastructure, and triggers.
+This can be used by the histogram infrastructure, triggers, and perf,
+to which one can attach eBPF programs.
 
  # cat events/functions/do_IRQ/format
 name: do_IRQ
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index b5b719680686..b145639eac45 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -747,46 +747,33 @@ static int get_string(unsigned long addr, unsigned int idx,
 	return len;
 }
 
-static void func_event_trace(struct trace_event_file *trace_file,
-			     struct func_event *func_event,
-			     unsigned long ip, unsigned long parent_ip,
-			     struct pt_regs *pt_regs)
+static int get_event_size(struct func_event *func_event, struct pt_regs *pt_regs,
+			  long *args, int *nr_args)
 {
-	struct func_event_hdr *entry;
-	struct trace_event_call *call = &func_event->call;
-	struct ring_buffer_event *event;
-	struct ring_buffer *buffer;
-	struct func_arg *arg;
-	long args[func_event->arg_cnt];
-	long long val = 1;
-	unsigned long irq_flags;
-	int str_offset;
-	int str_idx = 0;
-	int nr_args = 0;
 	int size;
-	int pc;
-
-	if (trace_trigger_soft_disabled(trace_file))
-		return;
-
-	local_save_flags(irq_flags);
-	pc = preempt_count();
 
-	size = func_event->arg_offset + sizeof(*entry);
+	size = func_event->arg_offset + sizeof(struct func_event_hdr);
 
 	if (func_event->arg_cnt)
-		nr_args = arch_get_func_args(pt_regs, 0, func_event->arg_cnt, args);
+		*nr_args = arch_get_func_args(pt_regs, 0, func_event->arg_cnt, args);
+	else
+		*nr_args = 0;
 
 	if (func_event->has_strings)
-		size += calculate_strings(func_event, nr_args, args);
+		size += calculate_strings(func_event, *nr_args, args);
 
-	event = trace_event_buffer_lock_reserve(&buffer, trace_file,
-						call->event.type,
-						size, irq_flags, pc);
-	if (!event)
-		return;
+	return size;
+}
+
+static void
+record_entry(struct func_event_hdr *entry, struct func_event *func_event,
+	     unsigned long ip, unsigned long parent_ip, int nr_args, long *args)
+{
+	struct func_arg *arg;
+	long long val;
+	int str_offset;
+	int str_idx = 0;
 
-	entry = ring_buffer_event_data(event);
 	entry->ip = ip;
 	entry->parent_ip = parent_ip;
 
@@ -809,11 +796,80 @@ static void func_event_trace(struct trace_event_file *trace_file,
 		} else
 			memcpy(&entry->data[arg->offset], &val, arg->size);
 	}
+}
+
+static void func_event_trace(struct trace_event_file *trace_file,
+			     struct func_event *func_event,
+			     unsigned long ip, unsigned long parent_ip,
+			     struct pt_regs *pt_regs)
+{
+	struct func_event_hdr *entry;
+	struct trace_event_call *call = &func_event->call;
+	struct ring_buffer_event *event;
+	struct ring_buffer *buffer;
+	long args[func_event->arg_cnt];
+	unsigned long irq_flags;
+	int nr_args;
+	int size;
+	int pc;
+
+	if (trace_trigger_soft_disabled(trace_file))
+		return;
+
+	local_save_flags(irq_flags);
+	pc = preempt_count();
+
+	size = get_event_size(func_event, pt_regs, args, &nr_args);
+
+	event = trace_event_buffer_lock_reserve(&buffer, trace_file,
+						call->event.type,
+						size, irq_flags, pc);
+	if (!event)
+		return;
 
+	entry = ring_buffer_event_data(event);
+	record_entry(entry, func_event, ip, parent_ip, nr_args, args);
 	event_trigger_unlock_commit_regs(trace_file, buffer, event,
 					 entry, irq_flags, pc, pt_regs);
 }
 
+#ifdef CONFIG_PERF_EVENTS
+/* Function based event perf handler */
+static void func_event_perf(struct func_event *func_event,
+			    unsigned long ip, unsigned long parent_ip,
+			    struct pt_regs *pt_regs)
+{
+	struct trace_event_call *call = &func_event->call;
+	struct func_event_hdr *entry;
+	struct hlist_head *head;
+	long args[func_event->arg_cnt];
+	int nr_args = 0;
+	int rctx;
+	int size;
+
+	if (bpf_prog_array_valid(call) && !trace_call_bpf(call, pt_regs))
+		return;
+
+	head = this_cpu_ptr(call->perf_events);
+	if (hlist_empty(head))
+		return;
+
+	size = get_event_size(func_event, pt_regs, args, &nr_args);
+
+	entry = perf_trace_buf_alloc(size, NULL, &rctx);
+	if (!entry)
+		return;
+
+	record_entry(entry, func_event, ip, parent_ip, nr_args, args);
+	perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, pt_regs,
+			      head, NULL);
+}
+#else
+static inline void func_event_perf(struct func_event *func_event,
+				   unsigned long ip, unsigned long parent_ip,
+				   struct pt_regs *pt_regs) { }
+#endif
+
 static void
 func_event_call(unsigned long ip, unsigned long parent_ip,
 		    struct ftrace_ops *op, struct pt_regs *pt_regs)
@@ -825,7 +881,10 @@ func_event_call(unsigned long ip, unsigned long parent_ip,
 
 	rcu_read_lock_sched();
 	list_for_each_entry_rcu(ff, &func_event->files, list) {
-		func_event_trace(ff->file, func_event, ip, parent_ip, pt_regs);
+		if (ff->file)
+			func_event_trace(ff->file, func_event, ip, parent_ip, pt_regs);
+		else
+			func_event_perf(func_event, ip, parent_ip, pt_regs);
 	}
 	rcu_read_unlock_sched();
 }
@@ -1041,6 +1100,17 @@ static int func_event_register(struct trace_event_call *event,
 		return enable_func_event(func_event, file);
 	case TRACE_REG_UNREGISTER:
 		return disable_func_event(func_event, file);
+#ifdef CONFIG_PERF_EVENTS
+	case TRACE_REG_PERF_REGISTER:
+		return enable_func_event(func_event, NULL);
+	case TRACE_REG_PERF_UNREGISTER:
+		return disable_func_event(func_event, NULL);
+	case TRACE_REG_PERF_OPEN:
+	case TRACE_REG_PERF_CLOSE:
+	case TRACE_REG_PERF_ADD:
+	case TRACE_REG_PERF_DEL:
+		return 0;
+#endif
 	default:
 		break;
 	}
-- 
2.15.1


* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (17 preceding siblings ...)
  2018-02-02 23:05 ` [PATCH 18/18] tracing/perf: Allow perf to use " Steven Rostedt
@ 2018-02-03 13:38 ` Masami Hiramatsu
  2018-02-03 15:27   ` Steven Rostedt
  2018-02-03 17:04 ` Mathieu Desnoyers
                   ` (2 subsequent siblings)
  21 siblings, 1 reply; 87+ messages in thread
From: Masami Hiramatsu @ 2018-02-03 13:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

On Fri, 02 Feb 2018 18:04:58 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> 
> At Kernel Summit back in October, we tried to bring up trace markers, which
> would be nops within the kernel proper, that would allow modules to hook
> arbitrary trace events to them. The reaction to this proposal was less than
> favorable. We were told that we were trying to make a work around for a
> problem, and not solving it. The problem in our minds is the notion of a
> "stable trace event".
> 
> There are maintainers that do not want trace events, or more trace events in
> their subsystems. This is due to the fact that trace events post an
> interface to user space, and this interface could become required by some
> tool. This may cause the trace event to become stable where it must not
> break the tool, and thus prevent the code from changing.
> 
> Or, the trace event may just have to add padding for fields that tools
> may require. The "success" field of the sched_wakeup trace event is one such
> instance. There is no more "success" variable, but tools may fail if it were
> to go away, so a "1" is simply added to the trace event wasting ring buffer
> real estate.
> 
> I talked with Linus about this, and he told me that we already have these
> markers in the kernel. They are from the mcount/__fentry__ used by function
> tracing. Have the trace events be created by these, and see if this will
> satisfy most areas that want trace events.
> 
> I decided to implement this idea, and here's the patch set.
> 
> Introducing "function based events". These are created dynamically by a
> tracefs file called "function_events". By writing a pseudo prototype into
> this file, you create an event.

This seems very similar to what kprobe-based events do.

An earlier version of kprobe-based events supported the Nth argument, but I
decided to drop it because we cannot determine the "function signature" from
the kernel binary. That work was handed over to "perf probe", so that we can
define events at line-level granularity.

Of course, it is easy to support "argN" again, as this series does, if the
arch's calling convention is clearly stated. (And kprobes are now based on
ftrace when they are placed on function entry.)
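
As a rough illustration of the calling-convention point, here is a sketch of
fetching "argN" on x86_64, where the System V ABI passes the first six
integer or pointer arguments in rdi, rsi, rdx, rcx, r8 and r9. The names
demo_regs and demo_get_arg are hypothetical. Arguments passed on the stack,
in floating point registers, or as structures by value are outside what this
sketch handles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the registers saved at function entry. */
struct demo_regs {
	unsigned long di, si, dx, cx, r8, r9;
};

/*
 * Fetch integer/pointer argument 'n' (0-based) from the saved registers.
 * Returns 0 on success, or -1 if the argument is not passed in a register.
 */
static int demo_get_arg(const struct demo_regs *regs, unsigned int n,
			unsigned long *val)
{
	assert(regs && val);

	switch (n) {
	case 0: *val = regs->di; return 0;
	case 1: *val = regs->si; return 0;
	case 2: *val = regs->dx; return 0;
	case 3: *val = regs->cx; return 0;
	case 4: *val = regs->r8; return 0;
	case 5: *val = regs->r9; return 0;
	default:
		return -1;	/* would live on the stack */
	}
}
```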

So my question is, what is the difference between the two from the user's
perspective? Is the event syntax the only issue?
I'm very confused...

Thank you,

> 
>  # mount -t tracefs nodev /sys/kernel/tracing
>  # cd /sys/kernel/tracing
>  # echo 'do_IRQ(symbol ip[16] | x64[6] irq_stack[16])' > function_events
>  # cat events/functions/do_IRQ/format
> name: do_IRQ
> ID: 1399
> format:
> 	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
> 	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
> 	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
> 	field:int common_pid;	offset:4;	size:4;	signed:1;
> 
> 	field:unsigned long __parent_ip;	offset:8;	size:8;	signed:0;
> 	field:unsigned long __ip;	offset:16;	size:8;	signed:0;
> 	field:symbol ip;	offset:24;	size:8;	signed:0;
> 	field:x64 irq_stack[6];	offset:32;	size:48;	signed:0;
> 
> print fmt: "%pS->%pS(ip=%pS, irq_stack=%llx:%llx:%llx:%llx:%llx:%llx)", REC->__ip, REC->__parent_ip,
> REC->ip, REC->irq_stack[0], REC->irq_stack[1], REC->irq_stack[2], REC->irq_stack[3], REC->irq_stack[4],
> REC->irq_stack[5]
> 
>  # echo 1 > events/functions/do_IRQ/enable
>  # cat trace
>           <idle>-0     [003] d..3  3647.049344: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.049433: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.049672: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.325709: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.325929: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.325993: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.387571: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.387791: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.387874: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
> 
> And this is much more powerful than just this. We can show strings, and
> index off of structures into other structures.
> 
>   # echo '__vfs_read(symbol read+40[0]+16)' > function_events
> 
>   # echo 1 > events/functions/__vfs_read/enable
>   # cat trace
>          sshd-1343  [005] ...2   199.734752: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
>          bash-1344  [003] ...2   199.734822: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
>          sshd-1343  [005] ...2   199.734835: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
>  avahi-daemon-910   [003] ...2   200.136740: vfs_read->__vfs_read(read=          (null))
>  avahi-daemon-910   [003] ...2   200.136750: vfs_read->__vfs_read(read=          (null))
> 
> And even read user space:
> 
>   # echo 'SyS_openat(int dfd, string path, x32 flags, x16 mode)' > function_events
>   # echo 1 > events/functions/enable
>   # grep task_fork /proc/kallsyms 
> ffffffff810d5a60 t task_fork_fair
> ffffffff810dfc30 t task_fork_dl
>   # cat trace
>             grep-1820  [000] ...2  3926.107603: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, path=/proc/kallsyms, flags=100, mode=0)
> 
> These are fully functional events. That is, they work with ftrace,
> trace-cmd, perf, histograms, triggers, and eBPF.
> 
> What's next? I need to rewrite the function graph tracer, and be able to add
> dynamic events on function return.
> 
> I made this work with x86 with a simple function that only returns
> 6 function parameters for x86_64 and 3 for x86_32. But this could easily
> be extended.
> 
> Cheers!
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
> ftrace/dynamic-ftrace-events
> 
> Head SHA1: 30fbdffd5d38bd27b04fb911f7158f10a99be8c4
> 
> 
> Steven Rostedt (VMware) (18):
>       tracing: Add function based events
>       tracing: Add documentation for function based events
>       tracing: Add simple arguments to function based events
>       tracing/x86: Add arch_get_func_args() function
>       tracing: Add hex print for dynamic ftrace based events
>       tracing: Add indirect offset to args of ftrace based events
>       tracing: Add dereferencing multiple fields per arg
>       tracing: Add "unsigned" to function based events
>       tracing: Add indexing of arguments for function based events
>       tracing: Make func_type enums for easier comparing of arg types
>       tracing: Add symbol type to function based events
>       tracing: Add accessing direct address from function based events
>       tracing: Add array type to function based events
>       tracing: Have char arrays be strings for function based events
>       tracing: Add string type for dynamic strings in function based events
>       tracing: Add NULL to skip args for function based events
>       tracing: Add indirect to indirect access for function based events
>       tracing/perf: Allow perf to use function based events
> 
> ----
>  Documentation/trace/function-based-events.rst |  426 ++++++++
>  arch/x86/kernel/ftrace.c                      |   28 +
>  include/linux/trace_events.h                  |    2 +
>  kernel/trace/Kconfig                          |   12 +
>  kernel/trace/Makefile                         |    1 +
>  kernel/trace/trace.h                          |   11 +
>  kernel/trace/trace_event_ftrace.c             | 1440 +++++++++++++++++++++++++
>  kernel/trace/trace_probe.h                    |   11 -
>  8 files changed, 1920 insertions(+), 11 deletions(-)
>  create mode 100644 Documentation/trace/function-based-events.rst
>  create mode 100644 kernel/trace/trace_event_ftrace.c


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 13/18] tracing: Add array type to function based events
  2018-02-02 23:05 ` [PATCH 13/18] tracing: Add array type to " Steven Rostedt
@ 2018-02-03 13:56   ` Masami Hiramatsu
  2018-02-03 15:29     ` Steven Rostedt
  2018-02-09  1:17   ` Namhyung Kim
  1 sibling, 1 reply; 87+ messages in thread
From: Masami Hiramatsu @ 2018-02-03 13:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

On Fri, 02 Feb 2018 18:05:11 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> Add syntax to allow the user to create an array type. Brackets after the
> type field will denote that this is an array type. For example:
> 
>  # echo 'SyS_open(x8[32] buf, x32 flags, x32 mode)' > function_events
> 
> Will make the first argument of the sys_open function call an array of
> 32 bytes.
> 
> The array type can also be used in conjunction with the indirect offset
> brackets as well. For example to get the interrupt stack of regs in do_IRQ()
> for x86_64.
> 
>  # echo 'do_IRQ(x64[5] regs[16])' > function_events

This idea looks good for kprobe events too.
I'll try to implement the same one :)

Thank you,

> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  Documentation/trace/function-based-events.rst |  22 +++-
>  kernel/trace/trace_event_ftrace.c             | 157 +++++++++++++++++++++-----
>  2 files changed, 151 insertions(+), 28 deletions(-)
> 
> diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
> index b0e6725f3032..4a8a6fb16a0a 100644
> --- a/Documentation/trace/function-based-events.rst
> +++ b/Documentation/trace/function-based-events.rst
> @@ -93,7 +93,7 @@ as follows:
>  
>   ARG := TYPE FIELD | TYPE <name> '=' ADDR | TYPE ADDR | ARG '|' ARG
>  
> - TYPE := ATOM | 'unsigned' ATOM
> + TYPE := ATOM | ATOM '[' <number> ']' | 'unsigned' TYPE
>  
>   ATOM := 'u8' | 'u16' | 'u32' | 'u64' |
>           's8' | 's16' | 's32' | 's64' |
> @@ -305,3 +305,23 @@ Is the same as
>      <idle>-0     [003] d..3   655.823498: ret_from_intr->do_IRQ(total_forks=1504, regs=tick_nohz_idle_enter+0x4c/0x50)
>      <idle>-0     [003] d..3   655.954096: ret_from_intr->do_IRQ(total_forks=1504, regs=cpuidle_enter_state+0xb1/0x330)
>  
> +
> +Array types
> +===========
> +
> +If there's a case where you want to see an array of a type, then you can
> +declare a type as an array by adding '[' number ']' after the type.
> +
> +To get the net_device perm_addr, from the dev parameter.
> +
> + (gdb) printf "%d\n", &((struct net_device *)0)->perm_addr
> +558
> +
> + # echo 'ip_rcv(x64 skb, x8[6] perm_addr+558)' > function_events
> +
> + # echo 1 > events/functions/ip_rcv/enable
> + # cat trace
> +    <idle>-0     [003] ..s3   219.813582: __netif_receive_skb_core->ip_rcv(skb=ffff880118195e00, perm_addr=b4,b5,2f,ce,18,65)
> +    <idle>-0     [003] ..s3   219.813595: __netif_receive_skb_core->ip_rcv(skb=ffff880118195e00, perm_addr=b4,b5,2f,ce,18,65)
> +    <idle>-0     [003] ..s3   220.115053: __netif_receive_skb_core->ip_rcv(skb=ffff880118195c00, perm_addr=b4,b5,2f,ce,18,65)
> +    <idle>-0     [003] ..s3   220.115293: __netif_receive_skb_core->ip_rcv(skb=ffff880118195c00, perm_addr=b4,b5,2f,ce,18,65)
> diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
> index 206114f192be..64e2d7dcfd18 100644
> --- a/kernel/trace/trace_event_ftrace.c
> +++ b/kernel/trace/trace_event_ftrace.c
> @@ -20,6 +20,7 @@ struct func_arg {
>  	char				*name;
>  	long				indirect;
>  	long				index;
> +	short				array;
>  	short				offset;
>  	short				size;
>  	s8				arg;
> @@ -68,6 +69,9 @@ enum func_states {
>  	FUNC_STATE_PIPE,
>  	FUNC_STATE_PLUS,
>  	FUNC_STATE_TYPE,
> +	FUNC_STATE_ARRAY,
> +	FUNC_STATE_ARRAY_SIZE,
> +	FUNC_STATE_ARRAY_END,
>  	FUNC_STATE_VAR,
>  	FUNC_STATE_COMMA,
>  	FUNC_STATE_END,
> @@ -289,6 +293,7 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
>  	static bool update_arg;
>  	static int unsign;
>  	unsigned long val;
> +	char *type;
>  	int ret;
>  	int i;
>  
> @@ -339,6 +344,10 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
>  		return FUNC_STATE_TYPE;
>  
>  	case FUNC_STATE_TYPE:
> +		if (token[0] == '[')
> +			return FUNC_STATE_ARRAY;
> +		/* Fall through */
> +	case FUNC_STATE_ARRAY_END:
>  		if (WARN_ON(!fevent->last_arg))
>  			break;
>  		if (update_arg_name(fevent, token) < 0)
> @@ -350,14 +359,37 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
>  		update_arg = true;
>  		return FUNC_STATE_VAR;
>  
> +	case FUNC_STATE_ARRAY:
>  	case FUNC_STATE_BRACKET:
> -		WARN_ON(!fevent->last_arg);
> +		if (WARN_ON(!fevent->last_arg))
> +			break;
>  		ret = kstrtoul(token, 0, &val);
>  		if (ret)
>  			break;
> -		val *= fevent->last_arg->size;
> -		fevent->last_arg->indirect = val ^ INDIRECT_FLAG;
> -		return FUNC_STATE_INDIRECT;
> +		if (state == FUNC_STATE_BRACKET) {
> +			val *= fevent->last_arg->size;
> +			fevent->last_arg->indirect = val ^ INDIRECT_FLAG;
> +			return FUNC_STATE_INDIRECT;
> +		}
> +		if (val <= 0)
> +			break;
> +		fevent->last_arg->array = val;
> +		type = kasprintf(GFP_KERNEL, "%s[%d]", fevent->last_arg->type, (unsigned)val);
> +		if (!type)
> +			break;
> +		kfree(fevent->last_arg->type);
> +		fevent->last_arg->type = type;
> +		/*
> +		 * arg_offset has already been updated once by size.
> +		 * This update needs to account for that (hence the "- 1").
> +		 */
> +		fevent->arg_offset += fevent->last_arg->size * (fevent->last_arg->array - 1);
> +		return FUNC_STATE_ARRAY_SIZE;
> +
> +	case FUNC_STATE_ARRAY_SIZE:
> +		if (token[0] != ']')
> +			break;
> +		return FUNC_STATE_ARRAY_END;
>  
>  	case FUNC_STATE_INDIRECT:
>  		if (token[0] != ']')
> @@ -453,6 +485,10 @@ static long long get_arg(struct func_arg *arg, unsigned long val)
>  
>  	val = val + (arg->indirect ^ INDIRECT_FLAG);
>  
> +	/* Arrays do their own indirect reads */
> +	if (arg->array)
> +		return val;
> +
>  	ret = probe_kernel_read(buf, (void *)val, arg->size);
>  	if (ret)
>  		return 0;
> @@ -474,6 +510,21 @@ static long long get_arg(struct func_arg *arg, unsigned long val)
>  	return val;
>  }
>  
> +static void get_array(void *dst, struct func_arg *arg, unsigned long val)
> +{
> +	void *ptr = (void *)val;
> +	int ret;
> +	int i;
> +
> +	for (i = 0; i < arg->array; i++) {
> +		ret = probe_kernel_read(dst, ptr, arg->size);
> +		if (ret)
> +			memset(dst, 0, arg->size);
> +		ptr += arg->size;
> +		dst += arg->size;
> +	}
> +}
> +
>  static void func_event_trace(struct trace_event_file *trace_file,
>  			     struct func_event *func_event,
>  			     unsigned long ip, unsigned long parent_ip,
> @@ -520,7 +571,10 @@ static void func_event_trace(struct trace_event_file *trace_file,
>  				val = get_arg(arg, args[arg->arg]);
>  		} else
>  			val = 0;
> -		memcpy(&entry->data[arg->offset], &val, arg->size);
> +		if (arg->array)
> +			get_array(&entry->data[arg->offset], arg, val);
> +		else
> +			memcpy(&entry->data[arg->offset], &val, arg->size);
>  	}
>  
>  	event_trigger_unlock_commit_regs(trace_file, buffer, event,
> @@ -571,6 +625,25 @@ static void make_fmt(struct func_arg *arg, char *fmt)
>  	fmt[c++] = '\0';
>  }
>  
> +static void write_data(struct trace_seq *s, const struct func_arg *arg, const char *fmt,
> +		       const void *data)
> +{
> +	switch (arg->size) {
> +	case 8:
> +		trace_seq_printf(s, fmt, *(unsigned long long *)data);
> +		break;
> +	case 4:
> +		trace_seq_printf(s, fmt, *(unsigned *)data);
> +		break;
> +	case 2:
> +		trace_seq_printf(s, fmt, *(unsigned short *)data);
> +		break;
> +	case 1:
> +		trace_seq_printf(s, fmt, *(unsigned char *)data);
> +		break;
> +	}
> +}
> +
>  static enum print_line_t
>  func_event_print(struct trace_iterator *iter, int flags,
>  		 struct trace_event *event)
> @@ -582,6 +655,7 @@ func_event_print(struct trace_iterator *iter, int flags,
>  	char fmt[FMT_SIZE];
>  	void *data;
>  	bool comma = false;
> +	int a;
>  
>  	entry = (struct func_event_hdr *)iter->ent;
>  
> @@ -598,20 +672,16 @@ func_event_print(struct trace_iterator *iter, int flags,
>  
>  		make_fmt(arg, fmt);
>  
> -		switch (arg->size) {
> -		case 8:
> -			trace_seq_printf(s, fmt, *(unsigned long long *)data);
> -			break;
> -		case 4:
> -			trace_seq_printf(s, fmt, *(unsigned *)data);
> -			break;
> -		case 2:
> -			trace_seq_printf(s, fmt, *(unsigned short *)data);
> -			break;
> -		case 1:
> -			trace_seq_printf(s, fmt, *(unsigned char *)data);
> -			break;
> -		}
> +		if (arg->array) {
> +			comma = false;
> +			for (a = 0; a < arg->array; a++, data += arg->size) {
> +				if (comma)
> +					trace_seq_putc(s, ',');
> +				comma = true;
> +				write_data(s, arg, fmt, data);
> +			}
> +		} else
> +			write_data(s, arg, fmt, data);
>  	}
>  	trace_seq_puts(s, ")\n");
>  	return trace_handle_return(s);
> @@ -634,11 +704,14 @@ static int func_event_define_fields(struct trace_event_call *event_call)
>  	DEFINE_FIELD(unsigned long, parent_ip, "__ip", 0);
>  
>  	list_for_each_entry(arg, &fevent->args, list) {
> +		int size = arg->size;
> +
> +		if (arg->array)
> +			size *= arg->array;
>  		ret = trace_define_field(event_call, arg->type,
>  					 arg->name,
>  					 sizeof(field) + arg->offset,
> -					 arg->size, arg->sign,
> -					 FILTER_OTHER);
> +					 size, arg->sign, FILTER_OTHER);
>  		if (ret < 0)
>  			return ret;
>  	}
> @@ -729,7 +802,7 @@ static int __set_print_fmt(struct func_event *func_event,
>  	const char *fmt_start = "\"%pS->%pS(";
>  	const char *fmt_end = ")\", REC->__ip, REC->__parent_ip";
>  	char fmt[FMT_SIZE];
> -	int r, i;
> +	int r, i, a;
>  	bool comma = false;
>  
>  	r = snprintf(buf, len, "%s", fmt_start);
> @@ -741,19 +814,49 @@ static int __set_print_fmt(struct func_event *func_event,
>  			len = update_len(len, i);
>  		}
>  		comma = true;
> -		make_fmt(arg, fmt);
> -		i = snprintf(buf + r, len, "%s=%s", arg->name, fmt);
> +
> +		i = snprintf(buf + r, len, "%s=", arg->name);
>  		r += i;
>  		len = update_len(len, i);
> +
> +		make_fmt(arg, fmt);
> +
> +		if (arg->array) {
> +			bool colon = false;
> +
> +			for (a = 0; a < arg->array; a++) {
> +				if (colon) {
> +					i = snprintf(buf + r, len, ":");
> +					r += i;
> +					len = update_len(len, i);
> +				}
> +				colon = true;
> +				i = snprintf(buf + r, len, "%s", fmt);
> +				r += i;
> +				len = update_len(len, i);
> +			}
> +		} else {
> +			i = snprintf(buf + r, len, "%s", fmt);
> +			r += i;
> +			len = update_len(len, i);
> +		}
>  	}
>  	i = snprintf(buf + r, len, "%s", fmt_end);
>  	r += i;
>  	len = update_len(len, i);
>  
>  	list_for_each_entry(arg, &func_event->args, list) {
> -		i = snprintf(buf + r, len, ", REC->%s", arg->name);
> -		r += i;
> -		len = update_len(len, i);
> +		if (arg->array) {
> +			for (a = 0; a < arg->array; a++) {
> +				i = snprintf(buf + r, len, ", REC->%s[%d]", arg->name, a);
> +				r += i;
> +				len = update_len(len, i);
> +			}
> +		} else {
> +			i = snprintf(buf + r, len, ", REC->%s", arg->name);
> +			r += i;
> +			len = update_len(len, i);
> +		}
>  	}
>  
>  	return r;
> -- 
> 2.15.1
> 
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-03 13:38 ` [PATCH 00/18] [ANNOUNCE] Dynamically created " Masami Hiramatsu
@ 2018-02-03 15:27   ` Steven Rostedt
  2018-02-04  3:57     ` Masami Hiramatsu
  0 siblings, 1 reply; 87+ messages in thread
From: Steven Rostedt @ 2018-02-03 15:27 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

On Sat, 3 Feb 2018 22:38:17 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> This seems very similar feature of what kprobe-based event does.

It is similar, but not the same as kprobes. It only focuses on
functions and their arguments, and should not require any disassembling
(no knowledge of regs required). Any need to see anything within the
function will still require kprobe based help.

> 
> Earlier version of kprobe-based event supported Nth argument, but I decided
> to drop it because we can not ensure the "function signature" from kernel
> binary. It has been passed to "perf probe", so that we can define line-level
> granularity. 

Sure, and if we get a wrong function, which can happen, the code is set
up not to break anything. You only get garbage output.

> 
> Of course if it is easy to support "argN" again as it does if the arch's
> calling convention is clearly stated. (and now kprobe is based on ftrace
> if it is on the entry of functions)
> 
> So my question is, what is the difference of those from the user perspective?
> Only event syntax is a problem?
> I'm very confusing...

Again, this is not a kprobe replacement. It is somewhat of a syntax
issue. I want this to be very simple and not need a disassembler. For
indexing of structures one may use gdb, but that's about it. You could
get the same info from counting what's in a structure itself.

I based some of the code on kprobes too. But I wanted this to be
simpler, and as such, not as powerful as kprobes. More of a "poor man's"
kprobe ;-) Where you are limited to functions and their arguments. If
you need more power, switch to kprobes. In other words, it's just an
added stepping stone.

Also, this should work without kprobe support, only ftrace, and function
args from the arch.

-- Steve

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 13/18] tracing: Add array type to function based events
  2018-02-03 13:56   ` Masami Hiramatsu
@ 2018-02-03 15:29     ` Steven Rostedt
  2018-02-04  3:50       ` Masami Hiramatsu
  0 siblings, 1 reply; 87+ messages in thread
From: Steven Rostedt @ 2018-02-03 15:29 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

On Sat, 3 Feb 2018 22:56:15 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> This idea looks good for kprobe events too.
> I'll try to implement the same one :)

We should have the two re-use the same code.

In other words, I would like as much code sharing as possible.

-- Steve

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (18 preceding siblings ...)
  2018-02-03 13:38 ` [PATCH 00/18] [ANNOUNCE] Dynamically created " Masami Hiramatsu
@ 2018-02-03 17:04 ` Mathieu Desnoyers
  2018-02-03 19:02   ` Steven Rostedt
  2018-02-03 21:43   ` Linus Torvalds
  2018-02-03 18:52 ` Steven Rostedt
  2018-02-05 10:23 ` Juri Lelli
  21 siblings, 2 replies; 87+ messages in thread
From: Mathieu Desnoyers @ 2018-02-03 17:04 UTC (permalink / raw)
  To: rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, acme, Clark Williams,
	Jiri Olsa, bristot, Juri Lelli, Jonathan Corbet, Namhyung Kim,
	Alexei Starovoitov

----- On Feb 2, 2018, at 6:04 PM, rostedt rostedt@goodmis.org wrote:

> At Kernel Summit back in October, we tried to bring up trace markers, which
> would be nops within the kernel proper, that would allow modules to hook
> arbitrary trace events to them. The reaction to this proposal was less than
> favorable. We were told that we were trying to make a work around for a
> problem, and not solving it. The problem in our minds is the notion of a
> "stable trace event".
> 
> There are maintainers that do not want trace events, or more trace events in
> their subsystems. This is due to the fact that trace events post an
> interface to user space, and this interface could become required by some
> tool. This may cause the trace event to become stable where it must not
> break the tool, and thus prevent the code from changing.
> 
> Or, the trace event may just have to add padding for fields that tools
> may require. The "success" field of the sched_wakeup trace event is one such
> instance. There is no more "success" variable, but tools may fail if it were
> to go away, so a "1" is simply added to the trace event wasting ring buffer
> real estate.
> 
> I talked with Linus about this, and he told me that we already have these
> markers in the kernel. They are from the mcount/__fentry__ used by function
> tracing. Have the trace events be created by these, and see if this will
> satisfy most areas that want trace events.

The approach proposed here will introduce an expectation that internal
function signatures never change in the kernel, else it would break user-space
tools hooking on those functions.

The instrumentation infrastructure provided by this patchset might be useful
for "one off" scripts, but it does not address the "stable instrumentation"
expectations issue.

The problem today is caused by widely used trace analysis tools that cannot
cope with changes to the kernel instrumentation, do not report the
instrumentation mismatch compared to their expectations, and we generally
don't expect users to ever update those tools to deal with newer kernels. Having
those tools hook on function names/arguments will not make this magically go
away. As soon as kernel code changes, widely used trace analysis tools will
start breaking left and right, and we will be back to square one. Only this time,
it's the internal function signature which will have become an ABI.

A possible solution to this problem appears if we start considering trace
analysis tools as just that: "tooling", with the following properties:

1) Tools need to validate that the instrumentation provided matches their
   expectations. This can be done by checking event/field names and/or version.
   Tools that fail to do that should be fixed.

2) Tools need to report to the user when the instrumentation does not match
   their expectations, and hint users to upgrade in order to deal with change.

3) Tools need to be backward compatible with respect to instrumentation: a
   user switching between older and newer kernels should not have to keep
   various copies of their tooling stack (graphical UI, analysis scripts,
   and so on).

If our goal is really to address this "stable instrumentation" issue, I don't
think hooking on functions helps in any way. I hope we can work on defining
instrumentation interface rules in order to deal with the fundamental problem
of requiring tooling to adapt to kernel changes.
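
As a rough illustration of point 1 above, here is a minimal user-space
sketch of a tool checking a field before trusting it. It assumes the tool
has already read an event's "format" file from tracefs into a string; the
names check_field() and enum field_check are hypothetical and not part of
any existing tool or kernel API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum field_check { FIELD_OK, FIELD_MISSING, FIELD_SIZE_MISMATCH };

/*
 * Look for a line of the form
 *   field:<decl>;  offset:N;  size:M;  signed:S;
 * in the text of an event "format" file, and compare M to expected_size.
 * decl is the declaration exactly as printed, e.g. "x64 irq_stack[6]".
 */
static enum field_check check_field(const char *format_text,
				    const char *decl, long expected_size)
{
	char needle[128];
	const char *line, *eol, *sz;
	int n;

	assert(format_text && decl);

	n = snprintf(needle, sizeof(needle), "field:%s;", decl);
	if (n < 0 || (size_t)n >= sizeof(needle))
		return FIELD_MISSING;

	line = strstr(format_text, needle);
	if (!line)
		return FIELD_MISSING;

	/* The size must appear on the same line as the field. */
	eol = strchr(line, '\n');
	sz = strstr(line, "size:");
	if (!sz || (eol && sz > eol))
		return FIELD_SIZE_MISMATCH;

	if (strtol(sz + strlen("size:"), NULL, 10) != expected_size)
		return FIELD_SIZE_MISMATCH;

	return FIELD_OK;
}
```

A tool could call this once per field it depends on and print a hint to
upgrade when it gets anything other than FIELD_OK, instead of silently
producing wrong results.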

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (19 preceding siblings ...)
  2018-02-03 17:04 ` Mathieu Desnoyers
@ 2018-02-03 18:52 ` Steven Rostedt
  2018-02-05 10:23 ` Juri Lelli
  21 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-03 18:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov


I need to add a "Quick guide" and "Tips and tricks" section to the
document. To explain the arguments better...

Simple args are:

 "int val", "unsigned int val", "char x", "unsigned long addr",

Also the types:

 "s32 val", "u32 val", "s8 x", "u64 addr"

The above are all printed in decimal "%d" or "%u", if you want hex...

 "x32 val", "x8 x", "x64 addr"

If you want to have it use "%pS" to print (symbols)

 "symbol addr" is like: "%pS", (void *)addr

Arrays can be expressed after the type:

 "x8[6] mac" is like: "%x,%x,%x,%x,%x,%x", mac[0], mac[1], mac[2],
                                           mac[3], mac[4], mac[5]

Where mac would be: unsigned char mac[6] type.
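
To make the array printing concrete, here is a small user-space sketch
that formats an array of 1, 2, 4 or 8 byte elements as comma separated
hex, in the same spirit as the per-size switch in the patch. The function
name format_hex_array() is made up for this example; it is not the kernel
code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Write count elements of elem_size bytes each as "%x,%x,..." into out.
 * Returns the number of characters written (without the NUL), or -1 if
 * elem_size is unsupported or out is too small.
 */
static int format_hex_array(char *out, size_t out_len, const void *data,
			    size_t elem_size, size_t count)
{
	const unsigned char *p = data;
	size_t used = 0;
	size_t i;

	assert(out && data);
	if (out_len == 0)
		return -1;
	out[0] = '\0';

	for (i = 0; i < count; i++, p += elem_size) {
		unsigned long long v;
		uint8_t v8; uint16_t v16; uint32_t v32; uint64_t v64;
		int n;

		/* memcpy avoids unaligned reads from the raw buffer. */
		switch (elem_size) {
		case 1: memcpy(&v8, p, 1); v = v8; break;
		case 2: memcpy(&v16, p, 2); v = v16; break;
		case 4: memcpy(&v32, p, 4); v = v32; break;
		case 8: memcpy(&v64, p, 8); v = v64; break;
		default: return -1;
		}

		n = snprintf(out + used, out_len - used, "%s%llx",
			     i ? "," : "", v);
		if (n < 0 || (size_t)n >= out_len - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

Feeding it the six bytes b4 b5 2f ce 18 65 with elem_size 1 gives the
string "b4,b5,2f,ce,18,65", which is the shape of the perm_addr output
shown in the array patch.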

Note, arrays of type "char" and "unsigned char" turn into a static
string.

 "char[10] str" is like: "%s", str

Where str is defined as char str[10];

If the argument is a pointer to a structure, you can index into the
structure:

 "x64 ip[16]" is like: "%llx", ((u64 *)ip)[16]

Finally, if an argument is a pointer to a structure, and you want to
get to another structure that it points to, for example

 struct sk_buff *skb;

and you want to get to:

  skb->dev->perm_addr

when the parameter is a pointer to skb.

  (gdb) printf "%d\n", &((struct sk_buff *)0)->dev
16
  (gdb) printf "%d\n", &((struct net_device *)0)->perm_addr
558

The net_device *dev pointer is 16 bytes into sk_buff, and the char array
perm_addr, which holds the MAC address, is 558 bytes into the net_device
structure.

 "x8[6] perm_addr+16[0]+558"

Which would turn into:

 char *dev = ((char **)(((void *)skb)+16))[0];
 char *perm_addr = (char *)(dev+558);

 "%x,%x,%x,%x,%x,%x", perm_addr[0], perm_addr[1], perm_addr[2],
                      perm_addr[3], perm_addr[4], perm_addr[5]

OK, the above is a bit complex ;-) But works nicely.
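
For anyone who wants to play with the "+16[0]+558" chain outside the
kernel, here is a user-space sketch of the same walk. Everything in it is
an assumption for illustration: struct deref_step and walk_chain() are
made-up names, the index is scaled by pointer size to match the C
equivalent above, and there is no fault handling (the kernel code goes
through probe_kernel_read() instead of touching memory directly).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One "+offset" step, optionally followed by a "[index]" dereference. */
struct deref_step {
	size_t offset;	/* bytes to add to the current address */
	int deref;	/* non-zero: load a pointer and follow it */
	size_t index;	/* which pointer slot to load when deref is set */
};

/*
 * Start at arg, apply each step in order, and return the final address.
 * For "+16[0]+558" the steps are { 16, 1, 0 } and { 558, 0, 0 }.
 */
static const unsigned char *walk_chain(const void *arg,
				       const struct deref_step *steps,
				       size_t nsteps)
{
	const unsigned char *p = arg;
	size_t i;

	assert(arg && steps);

	for (i = 0; i < nsteps; i++) {
		p += steps[i].offset;
		if (steps[i].deref) {
			const unsigned char *next;

			/* memcpy so an unaligned slot is still read safely */
			memcpy(&next, p + steps[i].index * sizeof(next),
			       sizeof(next));
			p = next;
		}
	}
	return p;
}
```

The returned address is where the x8[6] array read would start, so the
six bytes at that address are what get printed as perm_addr.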

-- Steve

On Fri, 02 Feb 2018 18:04:58 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> At Kernel Summit back in October, we tried to bring up trace markers, which
> would be nops within the kernel proper, that would allow modules to hook
> arbitrary trace events to them. The reaction to this proposal was less than
> favorable. We were told that we were trying to make a work around for a
> problem, and not solving it. The problem in our minds is the notion of a
> "stable trace event".
> 
> There are maintainers that do not want trace events, or more trace events in
> their subsystems. This is due to the fact that trace events post an
> interface to user space, and this interface could become required by some
> tool. This may cause the trace event to become stable where it must not
> break the tool, and thus prevent the code from changing.
> 
> Or, the trace event may just have to add padding for fields that tools
> may require. The "success" field of the sched_wakeup trace event is one such
> instance. There is no more "success" variable, but tools may fail if it were
> to go away, so a "1" is simply added to the trace event wasting ring buffer
> real estate.
> 
> I talked with Linus about this, and he told me that we already have these
> markers in the kernel. They are from the mcount/__fentry__ used by function
> tracing. Have the trace events be created by these, and see if this will
> satisfy most areas that want trace events.
> 
> I decided to implement this idea, and here's the patch set.
> 
> Introducing "function based events". These are created dynamically by a
> tracefs file called "function_events". By writing a pseudo prototype into
> this file, you create an event.
> 
>  # mount -t tracefs nodev /sys/kernel/tracing
>  # cd /sys/kernel/tracing
>  # echo 'do_IRQ(symbol ip[16] | x64[6] irq_stack[16])' > function_events
>  # cat events/functions/do_IRQ/format
> name: do_IRQ
> ID: 1399
> format:
> 	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
> 	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
> 	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
> 	field:int common_pid;	offset:4;	size:4;	signed:1;
> 
> 	field:unsigned long __parent_ip;	offset:8;	size:8;	signed:0;
> 	field:unsigned long __ip;	offset:16;	size:8;	signed:0;
> 	field:symbol ip;	offset:24;	size:8;	signed:0;
> 	field:x64 irq_stack[6];	offset:32;	size:48;	signed:0;
> 
> print fmt: "%pS->%pS(ip=%pS, irq_stack=%llx:%llx:%llx:%llx:%llx:%llx)", REC->__ip, REC->__parent_ip,
> REC->ip, REC->irq_stack[0], REC->irq_stack[1], REC->irq_stack[2], REC->irq_stack[3], REC->irq_stack[4],
> REC->irq_stack[5]
> 
>  # echo 1 > events/functions/do_IRQ/enable
>  # cat trace
>           <idle>-0     [003] d..3  3647.049344: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.049433: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.049672: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.325709: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.325929: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.325993: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.387571: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.387791: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
>           <idle>-0     [003] d..3  3647.387874: ret_from_intr->do_IRQ(ip=cpuidle_enter_state+0xb1/0x330, irq_stack=ffffffff81665db1,10,246,ffffc900006c3e80,18,ffff88011eae9b40)
> 
> And this is much more powerful than just this. We can show strings, and
> index off of structures into other structures.
> 
>   # echo '__vfs_read(symbol read+40[0]+16)' > function_events
> 
>   # echo 1 > events/functions/__vfs_read/enable
>   # cat trace
>          sshd-1343  [005] ...2   199.734752: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
>          bash-1344  [003] ...2   199.734822: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
>          sshd-1343  [005] ...2   199.734835: vfs_read->__vfs_read(read=tty_read+0x0/0xf0)
>  avahi-daemon-910   [003] ...2   200.136740: vfs_read->__vfs_read(read=          (null))
>  avahi-daemon-910   [003] ...2   200.136750: vfs_read->__vfs_read(read=          (null))
> 
> And even read user space:
> 
>   # echo 'SyS_openat(int dfd, string path, x32 flags, x16 mode)' > function_events
>   # echo 1 > events/functions/enable
>   # grep task_fork /proc/kallsyms 
> ffffffff810d5a60 t task_fork_fair
> ffffffff810dfc30 t task_fork_dl
>   # cat trace
>             grep-1820  [000] ...2  3926.107603: entry_SYSCALL_64_fastpath->SyS_openat(dfd=-100, path=/proc/kallsyms, flags=100, mode=0)
> 
> These are fully functional events. That is, they work with ftrace,
> trace-cmd, perf, histograms, triggers, and eBPF.
> 
> What's next? I need to rewrite the function graph tracer, and be able to add
> dynamic events on function return.
> 
> I made this work with x86 with a simple function that only returns
> 6 function parameters for x86_64 and 3 for x86_32. But this could easily
> be extended.
> 
> Cheers!
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
> ftrace/dynamic-ftrace-events
> 
> Head SHA1: 30fbdffd5d38bd27b04fb911f7158f10a99be8c4
> 
> 
> Steven Rostedt (VMware) (18):
>       tracing: Add function based events
>       tracing: Add documentation for function based events
>       tracing: Add simple arguments to function based events
>       tracing/x86: Add arch_get_func_args() function
>       tracing: Add hex print for dynamic ftrace based events
>       tracing: Add indirect offset to args of ftrace based events
>       tracing: Add dereferencing multiple fields per arg
>       tracing: Add "unsigned" to function based events
>       tracing: Add indexing of arguments for function based events
>       tracing: Make func_type enums for easier comparing of arg types
>       tracing: Add symbol type to function based events
>       tracing: Add accessing direct address from function based events
>       tracing: Add array type to function based events
>       tracing: Have char arrays be strings for function based events
>       tracing: Add string type for dynamic strings in function based events
>       tracing: Add NULL to skip args for function based events
>       tracing: Add indirect to indirect access for function based events
>       tracing/perf: Allow perf to use function based events
> 
> ----
>  Documentation/trace/function-based-events.rst |  426 ++++++++
>  arch/x86/kernel/ftrace.c                      |   28 +
>  include/linux/trace_events.h                  |    2 +
>  kernel/trace/Kconfig                          |   12 +
>  kernel/trace/Makefile                         |    1 +
>  kernel/trace/trace.h                          |   11 +
>  kernel/trace/trace_event_ftrace.c             | 1440 +++++++++++++++++++++++++
>  kernel/trace/trace_probe.h                    |   11 -
>  8 files changed, 1920 insertions(+), 11 deletions(-)
>  create mode 100644 Documentation/trace/function-based-events.rst
>  create mode 100644 kernel/trace/trace_event_ftrace.c

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-03 17:04 ` Mathieu Desnoyers
@ 2018-02-03 19:02   ` Steven Rostedt
  2018-02-03 20:52     ` Alexei Starovoitov
  2018-02-03 21:43   ` Linus Torvalds
  1 sibling, 1 reply; 87+ messages in thread
From: Steven Rostedt @ 2018-02-03 19:02 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, acme, Clark Williams,
	Jiri Olsa, bristot, Juri Lelli, Jonathan Corbet, Namhyung Kim,
	Alexei Starovoitov

On Sat, 3 Feb 2018 17:04:14 +0000 (UTC)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:


> The approach proposed here will introduce an expectation that internal
> function signatures never change in the kernel, else it would break user-space
> tools hooking on those functions.

I had this exact discussion with Linus. Linus, please correct me if I'm
wrong.

This is a case where he said if someone expected a function to be
there, then too bad. Functions can come and go depending on if gcc
inlines it or not. We already have this interface today. It's the
function tracer. One could argue a tool requires a function to exist
because it depends on a function being accessible to the function
tracer.

> 
> The instrumentation infrastructure provided by this patchset might be useful
> for "one off" scripts, but it does not address the "stable instrumentation"
> expectations issue.

Actually, it could work for adding a tracepoint.

> 
> The problem today is caused by widely used trace analysis tools that cannot
> cope with changes to the kernel instrumentation, do not report the
> instrumentation mismatch compared to their expectations, and we generally
> don't expect users to ever update those tools to deal with newer kernels. Having
> those tools hook on function names/arguments will not make this magically go
> away. As soon as kernel code changes, widely used trace analysis tools will
> start breaking left and right, and we will be back to square one. Only this time,
> it's the internal function signature which will have become an ABI.

From those that were asking about having "trace markers" (ie.
Facebook), they told us they can cope with kernel changes.

If a user can't cope with the changes, then they need to have their own
custom kernels.

> 
> A possible solution to this problem appears if we start considering trace
> analysis tools as just that: "tooling", with the following properties:
> 
> 1) Tools need to validate that the instrumentation provided matches their
>    expectations. This can be done by checking event/field names and/or version.
>    Tools that fail to do that should be fixed.
> 
> 2) Tools need to report to the user when the instrumentation does not match
>    their expectations, and hint users to upgrade in order to deal with change.
> 
> 3) Tools need to be backward compatible with respect to instrumentation: a
>    user switching between older and newer kernels should not have to keep
>    various copies of their tooling stack (graphical UI, analysis scripts,
>    and so on).
> 
> If our goal is really to address this "stable instrumentation" issue, I don't
> think hooking on functions helps in any way. I hope we can work on defining
> instrumentation interface rules in order to deal with the fundamental problem
> of requiring tooling to adapt to kernel changes.

I think you may have mistaken my goal. It was not to establish stable
instrumentation. In fact, it was the exact opposite. It was a way to
avoid stable instrumentation but still be able to add trace events.

The issue is that people are afraid to add tracepoints into their
subsystem because they are afraid that they will become stable and
limit their own development. The problem is that it hurts those that
want to trace these subsystems who are perfectly fine with the
tracepoints going away, and then they would need to change their tools.
This change set was to help those that can customize their tools with
new kernels. It was not to help those that just want their tools to
work with all kernels.

With that said, this actually can help those who want stable
infrastructure as well. If there happens to be a function that is
constantly used to create a dynamic function based event, that usage can
be shown to the subsystem maintainer as evidence when asking for a static
tracepoint there, since it demonstrates that one would be very useful to have.

One problem we are having today is that too many trace events are being
created, where there are a lot of them that have been used once and
never used again. And people don't care about them. I want to slow down
the addition of trace events if these function events can be used
instead. And when they are not good enough, or we see that one is
constantly being added, then we will know that we can add a trace event
that would be useful in the future.

-- Steve

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-03 19:02   ` Steven Rostedt
@ 2018-02-03 20:52     ` Alexei Starovoitov
  2018-02-03 21:08       ` Steven Rostedt
  2018-02-03 21:17       ` Steven Rostedt
  0 siblings, 2 replies; 87+ messages in thread
From: Alexei Starovoitov @ 2018-02-03 20:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu,
	Tom Zanussi, linux-rt-users, linux-trace-users, acme,
	Clark Williams, Jiri Olsa, bristot, Juri Lelli, Jonathan Corbet,
	Namhyung Kim

On Sat, Feb 03, 2018 at 02:02:17PM -0500, Steven Rostedt wrote:
> 
> From those that were asking about having "trace markers" (ie.
> Facebook), they told us they can cope with kernel changes.

There is some misunderstanding here.
We never asked for this interface.
We're perfectly fine with existing kprobe/tracepoint+bpf.
There are plenty of things to improve there, but this 'function based events'
is not something we're interested in.
I don't see how they are any better than kprobes and suffer from the same issues.
We really dislike text based interfaces since they are good only
for human access and very awkward to use from tools.
We also dislike when kernel takes on challenge to do text language parsing.
It's a user space job.

> The issue is that people are afraid to add tracepoints into their
> subsystem because they are afraid that they will become stable and
> limit their own development.

This is not true. Tracepoints are being added and they're being changed.
We have a bunch of tools that use both kprobe and tracepoint hooks
together with bpf programs to extract information from the kernel.
They do break from time to time when we upgrade kernels (and we upgrade often),
but keeping 'if kernel X do this, if kernel Y do that' inside the tool
is perfectly fine.
More often the tools have 'if kernel X ...' due to bpf verifier differences.
It's constantly evolving and older kernels cannot load complex bpf
programs written for the latest. So tools have to do some ugly workarounds.
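
The 'if kernel X do this, if kernel Y do that' pattern can be sketched
roughly as below. This is only an illustration, not code from any of the
tools mentioned: the table, the version thresholds and the function names
(example_fn_old, example_fn_new) are hypothetical placeholders, not real
kernel symbols.

```python
import re

# Hypothetical table of (minimum kernel version, function to probe),
# ordered from newest to oldest. The names are placeholders.
PROBE_TARGETS = [
    ((4, 15), "example_fn_new"),
    ((0, 0), "example_fn_old"),
]


def parse_release(release):
    """Turn a release string such as '4.15.0-rc8+' into (4, 15)."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(m.group(1)), int(m.group(2))


def pick_probe_target(release):
    """Return the first target whose minimum version is satisfied."""
    version = parse_release(release)
    for minimum, target in PROBE_TARGETS:
        if version >= minimum:
            return target
    raise LookupError("no probe target for %r" % release)
```

A tool would typically call pick_probe_target(platform.release()) once at
startup and attach to whatever name comes back.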

> One problem we are having today is that too many trace events are being
> created, where there are a lot of them that have been used once and
> never used again. And people don't care about them.

I don't think such issue exists. Please point an example of
a tracepoint that was added and no one cares about it.

As far as Mathieu's point about detecting the difference between kernels,
yes, it's a real problem to solve. The tracepoint changes are
easy to detect, but changes to function arguments are much harder.
Hence we're using kprobes on functions that are unlikely to change
and that works well.

Also please note that kernel tracepoints are not different from tracing tool
point of view than USDT tracepoints in user space apps.
The tools attach to both of them and expect both to be more or less
stable. Yet kernel tracepoints and USDT in apps _do_ change
and tools have to deal with changes. It's actually harder with USDT.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-03 20:52     ` Alexei Starovoitov
@ 2018-02-03 21:08       ` Steven Rostedt
  2018-02-03 21:30         ` Alexei Starovoitov
  2018-02-04 15:50         ` Mathieu Desnoyers
  2018-02-03 21:17       ` Steven Rostedt
  1 sibling, 2 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-03 21:08 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Mathieu Desnoyers, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu,
	Tom Zanussi, linux-rt-users, linux-trace-users, acme,
	Clark Williams, Jiri Olsa, bristot, Juri Lelli, Jonathan Corbet,
	Namhyung Kim

On Sat, 3 Feb 2018 12:52:08 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Sat, Feb 03, 2018 at 02:02:17PM -0500, Steven Rostedt wrote:
> > 
> > From those that were asking about having "trace markers" (ie.
> > Facebook), they told us they can cope with kernel changes.  
> 
> There is some misunderstanding here.
> We never asked for this interface.

But you wanted trace markers? Just to confirm.

> We're perfectly fine with existing kprobe/tracepoint+bpf.

OK, so no new development in this was wanted? So the entire talk about
getting tracepoints into vfs and scheduling wasn't needed?

> There are plenty of things to improve there, but this 'function based events'
> is not something we're interested in.

OK, but when I was showing this interface at DevConf.cz, there appeared
to be a lot of interest in it.

> I don't see how they are any better than kprobes and suffer from the same issues.

One only needs to look at the source code to add these. You don't need to
know the specifics of registers and such. That's a big +. Sure, we
could add this to kprobes as well. But this also doesn't need the
kprobe infrastructure.

> We really dislike text based interfaces since they are good only

Who exactly is "we"?

Peter Zijlstra told me it's basically the only interface he uses. He
doesn't care for tools on top.

> for human access and very awkward to use from tools.

trace-cmd builds its entire connection without issue via these
interfaces. What is awkward about it?

> We also dislike when kernel takes on challenge to do text language parsing.
> It's a user space job.

Not if you are working in the embedded space and only have busybox as
your interface.

> 
> > The issue is that people are afraid to add tracepoints into their
> > subsystem because they are afraid that they will become stable and
> > limit their own development.  
> 
> This is not true. Tracepoints are being added and they're being changed.

vfs doesn't have any tracepoints. And Peter is reluctant to add any
more to the scheduler.

> We have a bunch of tools that use both kprobe and tracepoint hooks
> together with bpf programs to extract information from the kernel.
> They do break from time to time when we upgrade kernels (and we upgrade often),
> but keeping 'if kernel X do this, if kernel Y do that' inside the tool
> is perfectly fine.
> More often the tools have 'if kernel X ...' due to bpf verifier differences.
> It's constantly evolving and older kernels cannot load complex bpf
> programs written for the latest. So tools have to do some ugly workarounds.
> 
> > One problem we are having today is that too many trace events are being
> > created, where there are a lot of them that have been used once and
> > never used again. And people don't care about them.  
> 
> I don't think such issue exists. Please point an example of
> a tracepoint that was added and no one cares about it.

I've already cleaned up several tracepoints that have no path to them.
I'd say those are ones people do not care about. I've also removed
several trace events that are not even connected anywhere. These take
up around 5k each of memory. And these are just the trace events that
don't have paths to them. If we have tracepoints that no longer have
paths to them (which I can detect), how many more have paths but people
don't care about?

-- Steve

> 
> As far as Mathieu's point about detecting the difference between kernels,
> yes, it's a real problem to solve. The tracepoint changes are
> easy to detect, but changes to function arguments are much harder.
> Hence we're using kprobes on functions that are unlikely to change
> and that works well.
> 
> Also please note that kernel tracepoints are not different from tracing tool
> point of view than USDT tracepoints in user space apps.
> The tools attach to both of them and expect both to be more or less
> stable. Yet kernel tracepoints and USDT in apps _do_ change
> and tools have to deal with changes. It's actually harder with USDT.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-03 20:52     ` Alexei Starovoitov
  2018-02-03 21:08       ` Steven Rostedt
@ 2018-02-03 21:17       ` Steven Rostedt
  2018-02-03 21:38         ` Alexei Starovoitov
                           ` (2 more replies)
  1 sibling, 3 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-03 21:17 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Mathieu Desnoyers, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu,
	Tom Zanussi, linux-rt-users, linux-trace-users, acme,
	Clark Williams, Jiri Olsa, bristot, Juri Lelli, Jonathan Corbet,
	Namhyung Kim

On Sat, 3 Feb 2018 12:52:08 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> It's a user space job.

BTW, I asked around at DevConf.cz, and nobody I talked with (besides
Arnaldo), have used eBPF. The "path to hello world" is quite high. This
interface is extremely simple to use, and one doesn't need to install
LLVM or other tools to interface with it.

I used the analogy, that eBPF is like C, and this is like Bash. One is
much easier to get "Hello World!" out than the other.

So personally, this is something I know I would use (note, I have
never used eBPF either). But if I'm the only one to use this
interface then I'll stop here (and not bother with the function graph
return interface). If others think this would be helpful, I would ask
them to speak up now.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-03 21:08       ` Steven Rostedt
@ 2018-02-03 21:30         ` Alexei Starovoitov
  2018-02-04  2:37           ` Namhyung Kim
  2018-02-04 15:50         ` Mathieu Desnoyers
  1 sibling, 1 reply; 87+ messages in thread
From: Alexei Starovoitov @ 2018-02-03 21:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu,
	Tom Zanussi, linux-rt-users, linux-trace-users, acme,
	Clark Williams, Jiri Olsa, bristot, Juri Lelli, Jonathan Corbet,
	Namhyung Kim

On Sat, Feb 03, 2018 at 04:08:24PM -0500, Steven Rostedt wrote:
> On Sat, 3 Feb 2018 12:52:08 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > On Sat, Feb 03, 2018 at 02:02:17PM -0500, Steven Rostedt wrote:
> > > 
> > > From those that were asking about having "trace markers" (ie.
> > > Facebook), they told us they can cope with kernel changes.  
> > 
> > There is some misunderstanding here.
> > We never asked for this interface.
> 
> But you wanted trace markers? Just to confirm.

What is the definition of a 'trace marker', and how is it better than a tracepoint?

> > We're perfectly fine with existing kprobe/tracepoint+bpf.
> 
> OK, so no new development in this was wanted? So the entire talk about
> getting tracepoints into vfs and scheduling wasn't needed?

I don't know who wants tracepoints in vfs.
Improving scheduler tracepoints? yes. definitely,
but it's not a technical problem and cannot be solved by technical means.

> > I don't see how they are any better than kprobes and suffer from the same issues.
> 
> One only needs to look at the source code to add these. You don't need to
> know the specifics of registers and such. That's a big +. Sure, we
> could add this to kprobes as well. But this also doesn't need the
> kprobe infrastructure.

same goes for kprobes.
with bcc it's even easier. we write tools like:
int kprobe__inet_listen(struct pt_regs *ctx, struct socket *sock, int backlog)
{ 
  struct sock *sk = sock->sk;
  ...
and bcc knows that it needs to add kprobe to inet_listen()
and this function has two arguments of the given types.
Then sock->sk access is automatically replaced with bpf_probe_read, etc, etc.

> Not if you are working in the embedded space and only have busybox as
> your interface.

did you notice bpfd project that does remote kprobe+bpf into an android phone?
or phone is not an embedded space?

> I've already cleaned up several tracepoints that have no path to them.

Great. So you're making the same point as I do that tracepoints can and do change.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-03 21:17       ` Steven Rostedt
@ 2018-02-03 21:38         ` Alexei Starovoitov
  2018-02-04  2:25         ` Namhyung Kim
  2018-02-05 13:53           ` Juri Lelli
  2 siblings, 0 replies; 87+ messages in thread
From: Alexei Starovoitov @ 2018-02-03 21:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu,
	Tom Zanussi, linux-rt-users, linux-trace-users, acme,
	Clark Williams, Jiri Olsa, bristot, Juri Lelli, Jonathan Corbet,
	Namhyung Kim

On Sat, Feb 03, 2018 at 04:17:32PM -0500, Steven Rostedt wrote:
> On Sat, 3 Feb 2018 12:52:08 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > It's a user space job.
> 
> BTW, I asked around at DevConf.cz, and nobody I talked with (besides
> Arnaldo), have used eBPF. The "path to hello world" is quite high. This
> interface is extremely simple to use, and one doesn't need to install
> LLVM or other tools to interface with it.
> 
> I used the analogy, that eBPF is like C, and this is like Bash. One is
> much easier to get "Hello World!" out than the other.
> 
> So personally, this is something I know I would use (note, I have
> never used eBPF either). But if I'm the only one to use this
> interface then I'll stop here (and not bother with the function graph
> return interface). If others think this would be helpful, I would ask
> them to speak up now.

I'm not arguing against the patches. I know that there are folks
out there who like to use cat/echo interfaces.
I'm only happy that the whole thing has its own kconfig
that we can keep off in fb kernels to reduce maintenance/support burden.
Just like we do for all sorts of other kernel features.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-03 17:04 ` Mathieu Desnoyers
  2018-02-03 19:02   ` Steven Rostedt
@ 2018-02-03 21:43   ` Linus Torvalds
  2018-02-04 15:30     ` Mathieu Desnoyers
  1 sibling, 1 reply; 87+ messages in thread
From: Linus Torvalds @ 2018-02-03 21:43 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: rostedt, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, acme, Clark Williams,
	Jiri Olsa, bristot, Juri Lelli, Jonathan Corbet, Namhyung Kim,
	Alexei Starovoitov

On Sat, Feb 3, 2018 at 9:04 AM, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> The approach proposed here will introduce an expectation that internal
> function signatures never change in the kernel, else it would break user-space
> tools hooking on those functions.

No, I really don't think so.

There's two reasons for that:

The first is purely about kernel development. I, and every sane kernel
engineer, will simply laugh in the face of somebody who comes to us
and says "hey, I had this script that did low-level function tracing
on your kernel, and then you changed something, and now the random
function I was tracing has a new name and different arguments".

We'll just go "yeah, tough, change your script". Or more likely, not
even bother to reply at all.

But the bigger issue is actually simply just psychology. Exactly
*because* this is all implicit, and there are no explicit trace
points, it's _obvious_ to any user that there isn't something
long-term dependable that they hang their hat on.

Everybody *understands* that this is like a debugger: if you have a
gdb script that shows some information, and then you go around and
change the source code, then *obviously* you'll have to change your
debugger script too. You don't keep the source code static just to
make your gdb script happy. That would be silly.

In contrast, the explicit tracepoints really made people believe that
they have some long-term meaning.

So yes, we'll make it obvious that hell no, random kernel functions
are not a long-term ABI. But honestly, I don't think we even need to
have a lot of "education" on this, simply because it's so obvious that
anybody who thinks it's some ABI is not going to be somebody we'll
have to worry about.

Because the kind of person thinking "Ooh, this is a stable ABI" won't
be doing interesting work anyway. That kind of person will be sitting
in a corner eating paste, not doing interesting kernel tracing.

                 Linus

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-03 21:17       ` Steven Rostedt
  2018-02-03 21:38         ` Alexei Starovoitov
@ 2018-02-04  2:25         ` Namhyung Kim
  2018-02-05 15:02           ` Steven Rostedt
  2018-02-05 13:53           ` Juri Lelli
  2 siblings, 1 reply; 87+ messages in thread
From: Namhyung Kim @ 2018-02-04  2:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexei Starovoitov, Mathieu Desnoyers, linux-kernel,
	Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, acme, Clark Williams, Jiri Olsa, bristot,
	Juri Lelli, Jonathan Corbet

Hi Steve and Alexei,

On Sun, Feb 4, 2018 at 6:17 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Sat, 3 Feb 2018 12:52:08 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
>> It's a user space job.
>
> BTW, I asked around at DevConf.cz, and nobody I talked with (besides
> Arnaldo), have used eBPF. The "path to hello world" is quite high. This
> interface is extremely simple to use, and one doesn't need to install
> LLVM or other tools to interface with it.
>
> I used the analogy, that eBPF is like C, and this is like Bash. One is
> much easier to get "Hello World!" out than the other.
>
> So personally, this is something I know I would use (note, I have
> never used eBPF either). But if I'm the only one to use this
> interface then I'll stop here (and not bother with the function graph
> return interface). If others think this would be helpful, I would ask
> them to speak up now.

I'm interested in this.  From my understanding, it's basically
function tracing + filter + custom argument info, right?

Supporting arguments with complex type could be error-prone.
We need to prevent malfunctions by invalid inputs.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-03 21:30         ` Alexei Starovoitov
@ 2018-02-04  2:37           ` Namhyung Kim
  0 siblings, 0 replies; 87+ messages in thread
From: Namhyung Kim @ 2018-02-04  2:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Steven Rostedt, Mathieu Desnoyers, linux-kernel, Linus Torvalds,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Tom Zanussi, linux-rt-users, linux-trace-users,
	acme, Clark Williams, Jiri Olsa, bristot, Juri Lelli,
	Jonathan Corbet

On Sun, Feb 4, 2018 at 6:30 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Sat, Feb 03, 2018 at 04:08:24PM -0500, Steven Rostedt wrote:
>> OK, so no new development in this was wanted? So the entire talk about
>> getting tracepoints into vfs and scheduling wasn't needed?
>
> I don't know who wants tracepoints in vfs.

AFAIK some people wanted to get some info (e.g. filename) from vfs.


>> Not if you are working in the embedded space and only have busybox as
>> your interface.
>
> did you notice bpfd project that does remote kprobe+bpf into an android phone?
> or phone is not an embedded space?

I haven't tried bpfd yet, but it looks promising.
It'd be nice to have such a setup working on a typical yocto environment.
Anyway, I'd say that an android phone is different than typical embedded
systems. :)

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 13/18] tracing: Add array type to function based events
  2018-02-03 15:29     ` Steven Rostedt
@ 2018-02-04  3:50       ` Masami Hiramatsu
  0 siblings, 0 replies; 87+ messages in thread
From: Masami Hiramatsu @ 2018-02-04  3:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

On Sat, 3 Feb 2018 10:29:03 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Sat, 3 Feb 2018 22:56:15 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > This idea looks good for kprobe events too.
> > I'll try to implement same one :)
> 
> We should have the two re-use the same code.
> 
> In other words, I would like as much code sharing as possible.

Eventually it will be, but currently the code bases are too far apart.
I would like to implement it on the current code first, to avoid
breaking anything, and refactor afterwards.

I think the current fetch function implementation may be too heavy
with retpoline. I would like to reimplement it with eBPF :)

Thank you,

> 
> -- Steve


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-03 15:27   ` Steven Rostedt
@ 2018-02-04  3:57     ` Masami Hiramatsu
  2018-02-04 17:21       ` Alexei Starovoitov
  0 siblings, 1 reply; 87+ messages in thread
From: Masami Hiramatsu @ 2018-02-04  3:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

On Sat, 3 Feb 2018 10:27:59 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Sat, 3 Feb 2018 22:38:17 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > This seems very similar feature of what kprobe-based event does.
> 
> It is similar, but not the same as kprobes. It only focuses on
> functions and their arguments, and should not require any disassembling
> (no knowledge of regs required). Any need to see anything within the
> function will still require kprobe based help.

Yes, I see that point.

> > Earlier version of kprobe-based event supported Nth argument, but I decided
> > to drop it because we can not ensure the "function signature" from kernel
> > binary. It has been passed to "perf probe", so that we can define line-level
> > granularity. 
> 
> Sure, and if we get a wrong function, which can happen, the code is set
> up not to break anything. You only get garbage output.
> 
> > 
> > Of course if it is easy to support "argN" again as it does if the arch's
> > calling convention is clearly stated. (and now kprobe is based on ftrace
> > if it is on the entry of functions)
> > 
> > So my question is, what is the difference of those from the user perspective?
> > Only event syntax is a problem?
> > I'm very confusing...
> 
> Again, this is not a kprobe replacement. It is somewhat of a syntax
> issue. I want this to be very simple and not need a disassembler. For
> indexing of structures one may use gdb, but that's about it. You could
> get the same info from counting what's in a structure itself.

OK, so it is a simpler version of function event...

> I based some of the code from kprobes too. But I wanted this to be
> simpler, and as such, not as powerful as kprobes. More of a "poor mans"
> kprobe ;-) Where you are limited to functions and their arguments. If
> you need more power, switch to kprobes. In other words, its just an
> added stepping stone.
> 
> Also, this should work without kprobe support, only ftrace, and function
> args from the arch.

Hmm, but the implementation seems very far from the current probe events,
so we need to consider how to unify them. Anyway, this is a very good time
to do it, because I found that the current probe-event fetch method does not
work well with retpoline/IBRS: it is full of indirect calls.

I would like to convert it to eBPF if possible. It will be good for the
performance with JIT, and we can collaborate on the same code with BPF
people.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-03 21:43   ` Linus Torvalds
@ 2018-02-04 15:30     ` Mathieu Desnoyers
  2018-02-04 15:47       ` Steven Rostedt
  2018-02-04 19:39       ` Linus Torvalds
  0 siblings, 2 replies; 87+ messages in thread
From: Mathieu Desnoyers @ 2018-02-04 15:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: rostedt, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, acme, Clark Williams,
	Jiri Olsa, bristot, Juri Lelli, Jonathan Corbet, Namhyung Kim,
	Alexei Starovoitov

----- On Feb 3, 2018, at 4:43 PM, Linus Torvalds torvalds@linux-foundation.org wrote:

> On Sat, Feb 3, 2018 at 9:04 AM, Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>>
>> The approach proposed here will introduce an expectation that internal
>> function signatures never change in the kernel, else it would break user-space
>> tools hooking on those functions.
> 
> No, I really don't think so.
> 
> There's two reasons for that:
> 
> The first is purely about kernel development. I, and every sane kernel
> engineer, will simply laugh in the face of somebody who comes to us
> and says "hey, I had this script that did low-level function tracing
> on your kernel, and then you changed something, and now the random
> function I was tracing has a new name and different arguments".
> 
> We'll just go "yeah, tough, change your script". Or more likely, not
> even bother to reply at all.
> 
> But the bigger issue is actually simply just psychology. Exactly
> *because* this is all implicit, and there are no explicit trace
> points, it's _obvious_ to any user that there isn't something
> long-term dependable that they hang their hat on.
> 
> Everybody *understands* that this is like a debugger: if you have a
> gdb script that shows some information, and then you go around and
> change the source code, then *obviously* you'll have to change your
> debugger script too. You don't keep the source code static just to
> make your gdb script happy. That would be silly.
> 
> In contrast, the explicit tracepoints really made people believe that
> they have some long-term meaning.
> 
> So yes, we'll make it obvious that hell no, random kernel functions
> are not a long-term ABI. But honestly, I don't think we even need to
> have a lot of "education" on this, simply because it's so obvious that
> anybody who thinks it's some ABI is not going to be somebody we'll
> have to worry about.
> 
> Because the kind of person thinking "Ooh, this is a stable ABI" won't
> be doing interesting work anyway. That kind of person will be sitting
> in a corner eating paste, not doing interesting kernel tracing.

I agree with your arguments. A consequence of those arguments is that
function-based tracing should be expected to be used by kernel engineers
and experts who can adapt their scripts to follow code changes, and tune
the script based on their specific kernel version and configuration.

On the other hand, system administrators and application developers who
wish to tune and understand their workload will more likely fetch their
analysis scripts/tools from a third-party. If those scripts hook on
functions/arguments, it becomes really hard to ensure those will work
with a myriad of kernel versions, configurations, and toolchain versions
out there.

The good news is that Steven's approach should allow us to deal with
use-cases that are specific to kernel developers by simply using
function tracing. That should take care of most "one off" and very
specialized use-cases.

My genuine concern here is that tools targeting sysadmins and application
developers start hooking on kernel functions, and all goes well for a
few months or years until someone changes that part of the code. Then
all those wonderfully useful scripts that users depend on will end up
being broken: in the best case those tools will detect the change
and need to be updated, but in the worst case the tools will simply
print bogus results that people will take at face value.

This should therefore leave a door open to adding new tracepoints: cases
where the data gathered is shown to be useful enough for tools targeting
an audience wider than just kernel developers. To improve over the current
situation, we should think about documenting some rules about how those
tools should cope with tracepoints changing over time (event version,
tools backward compatibility, and so on), and make sure the ABI exposes
the information required to help tools cope with change.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-04 15:30     ` Mathieu Desnoyers
@ 2018-02-04 15:47       ` Steven Rostedt
  2018-02-04 19:39       ` Linus Torvalds
  1 sibling, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-04 15:47 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Linus Torvalds, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, acme, Clark Williams,
	Jiri Olsa, bristot, Juri Lelli, Jonathan Corbet, Namhyung Kim,
	Alexei Starovoitov

On Sun, 4 Feb 2018 15:30:04 +0000 (UTC)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> This should therefore leave a door open to adding new tracepoints: cases
> where the data gathered is shown to be useful enough for tools targeting
> an audience wider than just kernel developers. To improve over the current
> situation, we should think about documenting some rules about how those
> tools should cope with tracepoints changing over time (event version,
> tools backward compatibility, and so on), and make sure the ABI exposes
> the information required to help tools cope with change.

As I mentioned earlier, if a function based event proves to be useful
enough to pull out information that sysadmins et al. find beneficial,
then that could be used as an argument to create a normal static
tracepoint for that information.

-- Steve

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-03 21:08       ` Steven Rostedt
  2018-02-03 21:30         ` Alexei Starovoitov
@ 2018-02-04 15:50         ` Mathieu Desnoyers
  1 sibling, 0 replies; 87+ messages in thread
From: Mathieu Desnoyers @ 2018-02-04 15:50 UTC (permalink / raw)
  To: rostedt
  Cc: Alexei Starovoitov, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu,
	Tom Zanussi, linux-rt-users, linux-trace-users, acme,
	Clark Williams, Jiri Olsa, bristot, Juri Lelli, Jonathan Corbet,
	Namhyung Kim

----- On Feb 3, 2018, at 4:08 PM, rostedt rostedt@goodmis.org wrote:

> On Sat, 3 Feb 2018 12:52:08 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
>> On Sat, Feb 03, 2018 at 02:02:17PM -0500, Steven Rostedt wrote:
>> > 
>> > From those that were asking about having "trace markers" (ie.
>> > Facebook), they told us they can cope with kernel changes.
>> 
>> There is some misunderstanding here.
>> We never asked for this interface.
> 
> But you wanted trace markers? Just to confirm.
> 
>> We're perfectly fine with existing kprobe/tracepoint+bpf.
> 
> OK, so no new development in this was wanted? So the entire talk about
> getting tracepoints into vfs and scheduling wasn't needed?

I did presentations in the past months about the need to add some
tracepoints into scheduling and IPI code on x86.

Instrumentation of IPI is needed not only by kernel developers, but also
by tools targeting sysadmins/app developers to help them figure out where
the time is spent when they are hitting unexpectedly long latencies in their
system. We can indeed start by using function instrumentation to show the
usefulness of this instrumentation, but I expect that we'll end up adding
a tracepoint there eventually.

Tracepoints in scheduling also fall in the category of letting sysadmins
and app developers understand where time is spent on their system. When
they hit an unexpectedly long latency, they want to understand what is
wrong in their task priority and scheduling policy that led to this delay.
The data extracted from the scheduler today is not sufficient to achieve
this. So this is another case where we might see kernel developers using
function instrumentation initially, but we'll probably end up adding and/or
changing tracepoints to help users out there who need tools analyzing this
scheduling data.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-04  3:57     ` Masami Hiramatsu
@ 2018-02-04 17:21       ` Alexei Starovoitov
  2018-02-05 14:39         ` Masami Hiramatsu
  0 siblings, 1 reply; 87+ messages in thread
From: Alexei Starovoitov @ 2018-02-04 17:21 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: rostedt, linux-kernel, torvalds, mingo, akpm, tglx, peterz, acme,
	corbet, mathieu.desnoyers, namhyung, daniel, davem

On Sun, Feb 04, 2018 at 12:57:47PM +0900, Masami Hiramatsu wrote:
> 
> > I based some of the code from kprobes too. But I wanted this to be
> > simpler, and as such, not as powerful as kprobes. More of a "poor man's"
> > kprobe ;-) Where you are limited to functions and their arguments. If
> > you need more power, switch to kprobes. In other words, it's just an
> > added stepping stone.
> > 
> > Also, this should work without kprobe support, only ftrace, and function
> > args from the arch.
> 
> Hmm, but the implementation seems very far from the current probe events; we
> need to consider how to unify them. Anyway, it is a very good time to do so,
> because I found the current probe-event fetch method does not work well with
> retpoline/IBRS: it is full of indirect calls.
> 
> I would like to convert it to eBPF if possible. It will be good for the
> performance with JIT, and we can collaborate on the same code with BPF
> people.

The current probe fetch method is indeed going to slow down due to
retpoline, but this issue is going to affect not only this piece
of code, but the rest of the kernel where indirect call performance
matters a lot. Like the networking stack, where we have at least 4 indirect
calls per packet.
So I'd suggest focusing on finding a general method instead of coming
up with a specific solution for this kprobe fetching problem.
The devirtualization approach works well and is applicable in many cases.
For the networking stack, deliver_skb() and __netif_receive_skb_core()
can check if (pt_prev->func == ip_rcv || pt_prev->func == ipv6_rcv)
and call them directly.
The other approach I was thinking to explore is static_key-like
for indirect calls. In many cases the target is rarely changed,
so we can do arch specific rewrite of destination offset inside
normal direct call instruction. That should be faster than retpoline.
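
To make the devirtualization idea above concrete, here is a minimal
user-space sketch. It is not the networking code: struct pkt, handle_v4(),
handle_v6(), handle_other() and deliver() are hypothetical stand-ins for
the packet handlers and for deliver_skb(). It only shows the shape of the
check, assuming the hot targets are known at compile time.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's ip_rcv()/ipv6_rcv(). */
struct pkt { int proto; };

typedef int (*handler_fn)(struct pkt *);

static int handle_v4(struct pkt *p) { (void)p; return 4; }
static int handle_v6(struct pkt *p) { (void)p; return 6; }
static int handle_other(struct pkt *p) { (void)p; return -1; }

/*
 * Devirtualized dispatch: compare the pointer against the common targets
 * and call those by name, so the hot cases become direct calls. Only the
 * uncommon case falls through to the indirect call, which is the one a
 * retpoline would slow down.
 */
static int deliver(handler_fn func, struct pkt *p)
{
	if (func == handle_v4)
		return handle_v4(p);
	if (func == handle_v6)
		return handle_v6(p);
	return func(p);	/* fallback: indirect call */
}
```

Each target needs its own comparison against the pointer. The cost is one
or two predictable branches on the hot path, which is presumably cheaper
than a retpoline thunk, but that would have to be measured.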

As for emitting raw bpf insns instead of kprobe fetch methods,
there is a big problem with such an approach. The interpreter and all
JITs take a 'struct bpf_prog' that passed the verifier and not just a
random set of bpf instructions. BPF is not a generic assembler.
BPF is an instruction set _with_ C calling convention.
The registers and instructions must be used in a certain way or
things will horribly break.
See Documentation/bpf/bpf_design_QA.txt for details.
Long ago I wrote a patch that converted pred tree walk into
raw bpf insns. If that patch had made it into mainline back then
it would have been a huge headache for us now.
So if you plan on generating bpf programs they _must_ pass the verifier.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-04 15:30     ` Mathieu Desnoyers
  2018-02-04 15:47       ` Steven Rostedt
@ 2018-02-04 19:39       ` Linus Torvalds
  2018-02-05 10:09         ` Peter Zijlstra
  2018-02-05 15:14         ` Masami Hiramatsu
  1 sibling, 2 replies; 87+ messages in thread
From: Linus Torvalds @ 2018-02-04 19:39 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: rostedt, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, acme, Clark Williams,
	Jiri Olsa, bristot, Juri Lelli, Jonathan Corbet, Namhyung Kim,
	Alexei Starovoitov

On Sun, Feb 4, 2018 at 7:30 AM, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> I agree with your arguments. A consequence of those arguments is that
> function-based tracing should be expected to be used by kernel engineers
> and experts who can adapt their scripts to follow code changes, and tune
> the script based on their specific kernel version and configuration.

Honestly, I think that's largely the case already.

The main source of tracing is done by experts at big cloud companies,
I bet. People who do it for performance reasons, or to find some
anomaly. They're pretty intimate with the kernel.

There _are_ "generic MIS" uses for tracing, and I think those are
places where we may want architectural trace points. Things like
gathering IO statistics etc.

I personally think that one of the pain points with tracing has been
exactly the fact that there are two *completely* different uses, and
they have *completely* different requirements. There's the expert
user, who basically wants tracepoints almost everywhere, and who is
doing some really deep analysis of some random area.

Then there's the "I just want an overview" MIS people, who care about
things like "I want a histogram of packets sent according to criteria
XYZ", who want some highlevel block IO performance, or who just want
random system-wide statistics.

One group really needs to tie in to _anything_, and by definition is
going to delve deep into some very specific corner of the kernel,
because they might be chasing a subtle bug and want to have traces to
just _find_ it.

The other group is looking for a much higher-level thing, and isn't
necessarily a kernel hacker, and just wants to know IO latencies or
something for statistics.

I think the function-based events are for that first group. We do not
want to have actual explicit trace events for that group, because that
group might want them _everywhere_.  That first group might want to
know the latency of a packet or block command through one particular
chain.

The second group might want explicit trace points exactly because that
group doesn't even care *how* a packet is sent or received, or what
the path through the block layer is. It just wants to know "packet
sent" or "latency between IO request and completion" or things like
that.

The first group cares about a particular kernel implementation and has
the expertise to line things up for the particular kernel that is
being deployed on a hundred thousand machines.

The second group doesn't want to care about a particular kernel, just
wants tools that work across them.

This is why I pushed Steven towards this function-based events thing.
Because I'm *hoping* that this can actually resolve that conflict
between the two groups. Function-based events are for the first group,
while actual explicit trace points are for the second.

(Obviously it's not entirely black-and-white, but I do think there is
a pretty big difference between the two groups. And the first group
will obviously use the explicit trace points _too_, generally to
narrow down where they want to go with the function-based one).

We'll see. Maybe I'm entirely wrong. But I'm hoping that the
function-based one will end up being helpful.

               Linus

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 01/18] tracing: Add function based events
  2018-02-02 23:04 ` [PATCH 01/18] tracing: Add " Steven Rostedt
@ 2018-02-05  8:24   ` Jiri Olsa
  2018-02-05 15:00     ` Steven Rostedt
  0 siblings, 1 reply; 87+ messages in thread
From: Jiri Olsa @ 2018-02-05  8:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

On Fri, Feb 02, 2018 at 06:04:59PM -0500, Steven Rostedt wrote:

SNIP

> +static int create_function_event(int argc, char **argv)
> +{
> +	struct func_event *func_event, *fe;
> +	enum func_states state = FUNC_STATE_INIT;
> +	char *token;
> +	char *ptr;
> +	char last;
> +	int ret = -EINVAL;
> +	int i;
> +
> +	func_event = kzalloc(sizeof(*func_event), GFP_KERNEL);
> +	if (!func_event)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(&func_event->files);
> +	func_event->ops.func = func_event_call;
> +	func_event->ops.flags = FTRACE_OPS_FL_SAVE_REGS;
> +
> +	for (i = 0; i < argc; i++) {
> +		ptr = argv[i];
> +		last = 0;
> +		for (token = next_token(&ptr, &last); token;
> +		     token = next_token(&ptr, &last)) {
> +			state = process_event(func_event, token, state);
> +			if (state == FUNC_STATE_ERROR)
> +				goto fail;
> +		}
> +	}
> +	if (state != FUNC_STATE_END)
> +		goto fail;
> +
> +	ret = -EALREADY;
> +	list_for_each_entry(fe, &func_events, list) {
> +		if (strcmp(fe->func, func_event->func) == 0)
> +			goto fail;
> +	}
> +
> +	ret = ftrace_set_filter(&func_event->ops, func_event->func,
> +				strlen(func_event->func), 0);
> +	if (ret < 0)
> +		goto fail;
> +
> +	ret = func_event_create(func_event);
> +	if (ret < 0)
> +		goto fail;
> +
> +	list_add_tail(&func_event->list, &func_events);
> +	return 0;

should this be done under 'func_event_mutex' ?

I tried and crashed the system by running 2 scripts with:

  echo 'ip_rcv(u64 skb, u64 dev)' > /sys/kernel/debug/tracing/function_events
  echo 'SyS_openat(int dfd, string buf, x32 flags, x32 mode)' >> /sys/kernel/debug/tracing/function_events
  echo 'SyS_open(x8[32] buf, x32 flags, x32 mode)' >> /sys/kernel/debug/tracing/function_events
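
A user-space sketch of what "under the mutex" would mean for the
duplicate check and the list insert quoted above. This is not the patch's
code: struct event, add_event() and events_lock are hypothetical names,
and a pthread mutex stands in for a kernel mutex. Whether this covers the
race behind the oops below is not established here, since the trace goes
through the open/remove path rather than the add path.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical event record; the real func_event holds much more. */
struct event {
	struct event *next;
	char name[64];
};

static struct event *events;
static pthread_mutex_t events_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Returns 0 on success, -EALREADY if the name is already registered,
 * -ENOMEM if allocation fails.
 */
static int add_event(const char *name)
{
	struct event *ev, *it;
	int ret = -EALREADY;

	/* Allocate outside the lock; calloc() leaves name NUL-terminated. */
	ev = calloc(1, sizeof(*ev));
	if (!ev)
		return -ENOMEM;
	strncpy(ev->name, name, sizeof(ev->name) - 1);

	/*
	 * The duplicate scan and the insert sit in one critical section,
	 * so two writers cannot both pass the check and both insert.
	 */
	pthread_mutex_lock(&events_lock);
	for (it = events; it; it = it->next) {
		if (strcmp(it->name, ev->name) == 0)
			goto out_unlock;
	}
	ev->next = events;
	events = ev;
	ev = NULL;	/* ownership moved to the list */
	ret = 0;
out_unlock:
	pthread_mutex_unlock(&events_lock);
	free(ev);	/* no-op on success, frees the duplicate on failure */
	return ret;
}
```

Any path that walks or empties the list would need to take the same lock
for this to help.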

jirka


[  376.727159] general protection fault: 0000 [#1] SMP PTI
[  376.732992] Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul igb ghash_clmulni_intel intel_cstate ptp intel_uncore pps_core iTCO_wdt iTCO_vendor_support ipmi_ssif ipmi_si ipmi_devintf ipmi_msghandler shpchp ioatdma cdc_ether usbnet mii intel_rapl_perf i2c_i801 tpm_tis tpm_tis_core dca tpm lpc_ich wmi xfs libcrc32c mgag200 i2c_algo_bit drm_kms_helper ttm crc32c_intel drm megaraid_sas
[  376.779583] CPU: 9 PID: 1285 Comm: t.sh Not tainted 4.15.0-rc9idle+ #32
[  376.786956] Hardware name: IBM System x3650 M4 : -[7915E2G]-/00Y7683, BIOS -[VVE124AUS-1.30]- 11/21/2012
[  376.797545] RIP: 0010:__unregister_trace_event+0xe/0x70
[  376.803376] RSP: 0018:ffffa643043cbc50 EFLAGS: 00010286
[  376.809206] RAX: dead000000000100 RBX: ffff91b6340c2480 RCX: ffffffffbf2ebf50
[  376.817170] RDX: dead000000000200 RSI: ffffffffbf2ed540 RDI: ffff91b6340c2480
[  376.825135] RBP: ffff91b6340c2460 R08: 0000000000000001 R09: 0000000000000000
[  376.833099] R10: ffffa643043cbc78 R11: 0000000000000000 R12: ffff91b6340c2400
[  376.841062] R13: ffff91b6326f8600 R14: dead000000000200 R15: dead000000000100
[  376.849028] FS:  00007ff3644bcb40(0000) GS:ffff91b63fac0000(0000) knlGS:0000000000000000
[  376.858058] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  376.864469] CR2: 000055c83b531008 CR3: 00000004736d0003 CR4: 00000000000606e0
[  376.872433] Call Trace:
[  376.875168]  event_remove+0x72/0x120
[  376.879159]  trace_remove_event_call+0x79/0xd0
[  376.884117]  func_event_open+0xb1/0xd0
[  376.888302]  ? free_func_event+0x70/0x70
[  376.892673]  do_dentry_open+0x1b1/0x2d0
[  376.896954]  path_openat+0x602/0x14e0
[  376.901041]  do_filp_open+0x9b/0x110
[  376.905032]  ? __vfs_write+0x33/0x170
[  376.909119]  ? __check_object_size+0xaf/0x1b0
[  376.913981]  ? do_sys_open+0x1bd/0x250
[  376.918164]  do_sys_open+0x1bd/0x250
[  376.922156]  entry_SYSCALL_64_fastpath+0x20/0x83
[  376.927299] RIP: 0033:0x7ff363ba01c0
[  376.931287] RSP: 002b:00007ffdf21e4978 EFLAGS: 00000246
[  376.931290] Code: e0 d9 2e bf e9 94 aa 6e 00 0f 1f 40 00 48 c7 c7 e0 d9 2e bf e9 54 5c f7 ff 0f 1f 40 00 53 48 8b 07 48 89 fb 48 8b 57 08 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 b8 00 01 00 00 00 00 ad de 48 8d 
[  376.958209] RIP: __unregister_trace_event+0xe/0x70 RSP: ffffa643043cbc50
[  376.965711] ---[ end trace b3dd6064ee6bc2f4 ]---

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-04 19:39       ` Linus Torvalds
@ 2018-02-05 10:09         ` Peter Zijlstra
  2018-02-05 15:10           ` Steven Rostedt
  2018-02-05 15:14         ` Masami Hiramatsu
  1 sibling, 1 reply; 87+ messages in thread
From: Peter Zijlstra @ 2018-02-05 10:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mathieu Desnoyers, rostedt, linux-kernel, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, acme, Clark Williams,
	Jiri Olsa, bristot, Juri Lelli, Jonathan Corbet, Namhyung Kim,
	Alexei Starovoitov

On Sun, Feb 04, 2018 at 11:39:39AM -0800, Linus Torvalds wrote:
> Then there's the "I just want an overview" MIS people, who care about
> things like "I want a histogram of packets sent according to criteria
> XYZ", who want some highlevel block IO performance, or who just want
> random system-wide statistics.

> The second group might want explicit trace points exactly because that
> group doesn't even care *how* a packet is sent or received, or what
> the path through the block layer is. It just wants to know "packet
> sent" or "latency between IO request and completion" or things like
> that.

So a large sticking point here has been the scheduler tracepoints; which
I'm rather unhappy with.

As a result of adding SCHED_DEADLINE the existing tracepoints no longer
are sufficient (they don't provide any deadline specific information).

So the MIS people that are interested in deadline tasks are unhappy.

My own preference is to just add the deadline information to the
existing tracepoints, but then people complain these become too big
(which is slow etc..), saying sched_switch is a high rate tracepoint
(true of course) (not to mention that changing the tracepoint will
probably break something, but they'll just have to cope).

So they've proposed all kinds of horrible alternatives that are all
variations of multiple tracepoints in the same location that fragment
the information, each of which I hate.

I'm ok with having the _one_ tracepoint, but I don't want 3+
sched_switch tracepoints, each having a different set of information
depending on what people want, that way lies madness.

As a run-around, Steve then suggested to decouple the trace-hook from
the actual trace-event. Let the scheduler only provide the hook, nothing
else. And then allow users to create their own events with the specific
data they need for their specific use-case.
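
A rough user-space model of that decoupling, to show the shape of the
idea only. Nothing here is scheduler code: struct task,
register_switch_probe(), switch_hook() and record_deadline() are
hypothetical names. The hook hands raw pointers to whoever registered and
defines no event format of its own; each consumer extracts just the
fields it wants.

```c
#include <stddef.h>

/* Hypothetical task descriptor; field names are illustrative only. */
struct task {
	int pid;
	int prio;
	unsigned long long deadline;
};

typedef void (*switch_probe_fn)(const struct task *prev,
				const struct task *next, void *data);

#define MAX_PROBES 4

static struct {
	switch_probe_fn fn;
	void *data;
} probes[MAX_PROBES];
static int nr_probes;

/* Attach a consumer to the hook; returns -1 when the table is full. */
static int register_switch_probe(switch_probe_fn fn, void *data)
{
	if (nr_probes >= MAX_PROBES)
		return -1;
	probes[nr_probes].fn = fn;
	probes[nr_probes].data = data;
	nr_probes++;
	return 0;
}

/* The hook itself: passes raw pointers along, records nothing. */
static void switch_hook(const struct task *prev, const struct task *next)
{
	for (int i = 0; i < nr_probes; i++)
		probes[i].fn(prev, next, probes[i].data);
}

/* One consumer: only interested in the incoming task's deadline. */
static void record_deadline(const struct task *prev,
			    const struct task *next, void *data)
{
	(void)prev;
	*(unsigned long long *)data = next->deadline;
}
```

With this split, adding deadline data costs nothing for consumers that do
not register a deadline probe, and there is still only one hook in the
switch path.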

Various options have been floated, ebpf, modules, whatever.

At this point I'm well tired of all this and would just as soon rip out
all tracepoints entirely; but of course, the scheduler has some very
useful information for MIS people, so we can't really do that either.

/sadface

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
                   ` (20 preceding siblings ...)
  2018-02-03 18:52 ` Steven Rostedt
@ 2018-02-05 10:23 ` Juri Lelli
  2018-02-05 10:49   ` Daniel Bristot de Oliveira
  21 siblings, 1 reply; 87+ messages in thread
From: Juri Lelli @ 2018-02-05 10:23 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

Hi Steve,

On 02/02/18 18:04, Steven Rostedt wrote:
> 
> At Kernel Summit back in October, we tried to bring up trace markers, which
> would be nops within the kernel proper, that would allow modules to hook
> arbitrary trace events to them. The reaction to this proposal was less than
> favorable. We were told that we were trying to make a work around for a
> problem, and not solving it. The problem in our minds is the notion of a
> "stable trace event".
> 
> There are maintainers that do not want trace events, or more trace events in
> their subsystems. This is due to the fact that trace events post an
> interface to user space, and this interface could become required by some
> tool. This may cause the trace event to become stable where it must not
> break the tool, and thus prevent the code from changing.
> 
> Or, the trace event may just have to add padding for fields that tools
> may require. The "success" field of the sched_wakeup trace event is one such
> instance. There is no more "success" variable, but tools may fail if it were
> to go away, so a "1" is simply added to the trace event wasting ring buffer
> real estate.
> 
> I talked with Linus about this, and he told me that we already have these
> markers in the kernel. They are from the mcount/__fentry__ used by function
> tracing. Have the trace events be created by these, and see if this will
> satisfy most areas that want trace events.
> 
> I decided to implement this idea, and here's the patch set.
> 
> Introducing "function based events". These are created dynamically by a
> tracefs file called "function_events". By writing a pseudo prototype into
> this file, you create an event.
> 
>  # mount -t tracefs nodev /sys/kernel/tracing
>  # cd /sys/kernel/tracing
>  # echo 'do_IRQ(symbol ip[16] | x64[6] irq_stack[16])' > function_events
>  # cat events/functions/do_IRQ/format
> name: do_IRQ
> ID: 1399
> format:
> 	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
> 	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
> 	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
> 	field:int common_pid;	offset:4;	size:4;	signed:1;
> 
> 	field:unsigned long __parent_ip;	offset:8;	size:8;	signed:0;
> 	field:unsigned long __ip;	offset:16;	size:8;	signed:0;
> 	field:symbol ip;	offset:24;	size:8;	signed:0;
> 	field:x64 irq_stack[6];	offset:32;	size:48;	signed:0;
> 
> print fmt: "%pS->%pS(ip=%pS, irq_stack=%llx:%llx:%llx:%llx:%llx:%llx)", REC->__ip, REC->__parent_ip,
> REC->ip, REC->irq_stack[0], REC->irq_stack[1], REC->irq_stack[2], REC->irq_stack[3], REC->irq_stack[4],
> REC->irq_stack[5]
> 
>  # echo 1 > events/functions/do_IRQ/enable

Got the following:

[  110.433602] =============================
[  110.435671] WARNING: suspicious RCU usage
[  110.437173] 4.15.0-rc9+ #42 Not tainted
[  110.438698] -----------------------------
[  110.440343] /home/juri/Work/kernel/linux/include/linux/rcupdate.h:749 rcu_read_lock_sched() used illegally while idle!
[  110.444480]
[  110.444480] other info that might help us debug this:
[  110.444480]
[  110.447616]
[  110.447616] RCU used illegally from idle CPU!
[  110.447616] rcu_scheduler_active = 2, debug_locks = 1
[  110.452047] RCU used illegally from extended quiescent state!
[  110.454072] 1 lock held by swapper/0/0:
[  110.455447]  #0:  (rcu_read_lock_sched){....}, at: [<00000000de240ad4>] func_event_call+0x0/0x3c0
[  110.458532]
[  110.458532] stack backtrace:
[  110.460066] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc9+ #42
[  110.462300] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[  110.464477] Call Trace:
[  110.465095]  <IRQ>
[  110.465600]  dump_stack+0x85/0xc5
[  110.466417]  func_event_call+0x378/0x3c0
[  110.467373]  ? find_held_lock+0x34/0xa0
[  110.468216]  ? common_interrupt+0xa2/0xa2
[  110.468978]  ? irq_work_interrupt+0xb0/0xb0
[  110.470021]  ? hrtimer_start_range_ns+0x1bf/0x3e0
[  110.471031]  ftrace_ops_assist_func+0x64/0xf0
[  110.471941]  ? _raw_spin_unlock_irqrestore+0x55/0x60
[  110.472926]  0xffffffffc02e30bf
[  110.473491]  ? do_IRQ+0x5/0x100
[  110.473977]  do_IRQ+0x5/0x100
[  110.474430]  common_interrupt+0xa2/0xa2
[  110.475014]  </IRQ>
[  110.475341] RIP: 0010:native_safe_halt+0x2/0x10
[  110.476020] RSP: 0018:ffffffff96a03ec8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffffdd
[  110.477137] RAX: ffffffff96a2a500 RBX: 0000000000000000 RCX: 0000000000000000
[  110.478110] RDX: ffffffff96a2a500 RSI: 0000000000000001 RDI: ffffffff96a2a500
[  110.478997] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[  110.479880] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  110.480764] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  110.481661]  default_idle+0x1f/0x1a0
[  110.482118]  do_idle+0x166/0x1e0
[  110.482530]  cpu_startup_entry+0x19/0x20
[  110.482985]  start_kernel+0x40a/0x412
[  110.483385]  secondary_startup_64+0xa5/0xb0

continuing to test this. :)

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-05 10:23 ` Juri Lelli
@ 2018-02-05 10:49   ` Daniel Bristot de Oliveira
  2018-02-05 15:11     ` Steven Rostedt
  0 siblings, 1 reply; 87+ messages in thread
From: Daniel Bristot de Oliveira @ 2018-02-05 10:49 UTC (permalink / raw)
  To: Juri Lelli, Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Jonathan Corbet, Mathieu Desnoyers,
	Namhyung Kim, Alexei Starovoitov

On 02/05/2018 11:23 AM, Juri Lelli wrote:
> Hi Steve,
> 
> On 02/02/18 18:04, Steven Rostedt wrote:
>>
>> At Kernel Summit back in October, we tried to bring up trace markers, which
>> would be nops within the kernel proper, that would allow modules to hook
>> arbitrary trace events to them. The reaction to this proposal was less than
>> favorable. We were told that we were trying to make a work around for a
>> problem, and not solving it. The problem in our minds is the notion of a
>> "stable trace event".
>>
>> There are maintainers that do not want trace events, or more trace events in
>> their subsystems. This is due to the fact that trace events post an
>> interface to user space, and this interface could become required by some
>> tool. This may cause the trace event to become stable where it must not
>> break the tool, and thus prevent the code from changing.
>>
>> Or, the trace event may just have to add padding for fields that tools
>> may require. The "success" field of the sched_wakeup trace event is one such
>> instance. There is no more "success" variable, but tools may fail if it were
>> to go away, so a "1" is simply added to the trace event wasting ring buffer
>> real estate.
>>
>> I talked with Linus about this, and he told me that we already have these
>> markers in the kernel. They are from the mcount/__fentry__ used by function
>> tracing. Have the trace events be created by these, and see if this will
>> satisfy most areas that want trace events.
>>
>> I decided to implement this idea, and here's the patch set.
>>
>> Introducing "function based events". These are created dynamically by a
>> tracefs file called "function_events". By writing a pseudo prototype into
>> this file, you create an event.
>>
>>  # mount -t tracefs nodev /sys/kernel/tracing
>>  # cd /sys/kernel/tracing
>>  # echo 'do_IRQ(symbol ip[16] | x64[6] irq_stack[16])' > function_events
>>  # cat events/functions/do_IRQ/format
>> name: do_IRQ
>> ID: 1399
>> format:
>> 	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
>> 	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
>> 	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
>> 	field:int common_pid;	offset:4;	size:4;	signed:1;
>>
>> 	field:unsigned long __parent_ip;	offset:8;	size:8;	signed:0;
>> 	field:unsigned long __ip;	offset:16;	size:8;	signed:0;
>> 	field:symbol ip;	offset:24;	size:8;	signed:0;
>> 	field:x64 irq_stack[6];	offset:32;	size:48;	signed:0;
>>
>> print fmt: "%pS->%pS(ip=%pS, irq_stack=%llx:%llx:%llx:%llx:%llx:%llx)", REC->__ip, REC->__parent_ip,
>> REC->ip, REC->irq_stack[0], REC->irq_stack[1], REC->irq_stack[2], REC->irq_stack[3], REC->irq_stack[4],
>> REC->irq_stack[5]
>>
>>  # echo 1 > events/functions/do_IRQ/enable
> 
> Got the following:
> 
> [  110.433602] =============================
> [  110.435671] WARNING: suspicious RCU usage
> [  110.437173] 4.15.0-rc9+ #42 Not tainted
> [  110.438698] -----------------------------
> [  110.440343] /home/juri/Work/kernel/linux/include/linux/rcupdate.h:749 rcu_read_lock_sched() used illegally while idle!
> [  110.444480]
> [  110.444480] other info that might help us debug this:
> [  110.444480]
> [  110.447616]
> [  110.447616] RCU used illegally from idle CPU!
> [  110.447616] rcu_scheduler_active = 2, debug_locks = 1
> [  110.452047] RCU used illegally from extended quiescent state!
> [  110.454072] 1 lock held by swapper/0/0:
> [  110.455447]  #0:  (rcu_read_lock_sched){....}, at: [<00000000de240ad4>] func_event_call+0x0/0x3c0
> [  110.458532]
> [  110.458532] stack backtrace:
> [  110.460066] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc9+ #42
> [  110.462300] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
> [  110.464477] Call Trace:
> [  110.465095]  <IRQ>
> [  110.465600]  dump_stack+0x85/0xc5
> [  110.466417]  func_event_call+0x378/0x3c0
> [  110.467373]  ? find_held_lock+0x34/0xa0
> [  110.468216]  ? common_interrupt+0xa2/0xa2
> [  110.468978]  ? irq_work_interrupt+0xb0/0xb0
> [  110.470021]  ? hrtimer_start_range_ns+0x1bf/0x3e0
> [  110.471031]  ftrace_ops_assist_func+0x64/0xf0
> [  110.471941]  ? _raw_spin_unlock_irqrestore+0x55/0x60
> [  110.472926]  0xffffffffc02e30bf
> [  110.473491]  ? do_IRQ+0x5/0x100
> [  110.473977]  do_IRQ+0x5/0x100
> [  110.474430]  common_interrupt+0xa2/0xa2
> [  110.475014]  </IRQ>
> [  110.475341] RIP: 0010:native_safe_halt+0x2/0x10
> [  110.476020] RSP: 0018:ffffffff96a03ec8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffffdd
> [  110.477137] RAX: ffffffff96a2a500 RBX: 0000000000000000 RCX: 0000000000000000
> [  110.478110] RDX: ffffffff96a2a500 RSI: 0000000000000001 RDI: ffffffff96a2a500
> [  110.478997] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
> [  110.479880] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [  110.480764] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [  110.481661]  default_idle+0x1f/0x1a0
> [  110.482118]  do_idle+0x166/0x1e0
> [  110.482530]  cpu_startup_entry+0x19/0x20
> [  110.482985]  start_kernel+0x40a/0x412
> [  110.483385]  secondary_startup_64+0xa5/0xb0

Isn't this the case of calling tracing functions before calling:

rcu_irq_enter();

?

-- Daniel
> 
> continuing to test this. :)
> 
> Thanks,
> 
> - Juri
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-03 21:17       ` Steven Rostedt
@ 2018-02-05 13:53           ` Juri Lelli
  2018-02-04  2:25         ` Namhyung Kim
  2018-02-05 13:53           ` Juri Lelli
  2 siblings, 0 replies; 87+ messages in thread
From: Juri Lelli @ 2018-02-05 13:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexei Starovoitov, Mathieu Desnoyers, linux-kernel,
	Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, acme, Clark Williams, Jiri Olsa, bristot,
	Jonathan Corbet, Namhyung Kim, Dietmar Eggemann, Patrick Bellasi,
	Morten Rasmussen

Hi Steve,

On 03/02/18 16:17, Steven Rostedt wrote:
> On Sat, 3 Feb 2018 12:52:08 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > It's a user space job.
> 
> BTW, I asked around at DevConf.cz, and nobody I talked with (besides
> Arnaldo), have used eBPF. The "path to hello world" is quite high. This
> interface is extremely simple to use, and one doesn't need to install
> LLVM or other tools to interface with it.

Yep. Managed to get this working in less than an hour. :)

With something like

# echo 'replenish_dl_entity(u64 dl_runtime[3] | u64 dl_deadline[4] | u64 dl_period[5] | s64 runtime[8] | u64 deadline[9])' > function_events
# echo 'sched:*' > set_event
# echo replenish_dl_entity >> set_event

you can get something like

--->8---
     [...]
     cpuhog-3556  [002] d..3   727.101815: sched_switch: prev_comm=cpuhog prev_pid=3556 prev_prio=-1 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120
     <idle>-0     [002] d.s4   727.128139: sched_waking: comm=kworker/2:1 pid=53 prio=120 target_cpu=002
     <idle>-0     [002] dNs5   727.128150: sched_wakeup: comm=kworker/2:1 pid=53 prio=120 target_cpu=002
     <idle>-0     [002] d..3   727.128184: sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=kworker/2:1 next_pid=53 next_prio=120
kworker/2:1-53    [002] d..3   727.128280: sched_stat_runtime: comm=kworker/2:1 pid=53 runtime=123827 [ns] vruntime=12389788162 [ns]
kworker/2:1-53    [002] d..3   727.128288: sched_switch: prev_comm=kworker/2:1 prev_pid=53 prev_prio=120 prev_state=R+ ==> next_comm=swapper/2 next_pid=0 next_prio=120
     <idle>-0     [002] d.h5   727.191609: enqueue_task_dl->replenish_dl_entity(dl_runtime=10000000, dl_deadline=100000000, dl_period=100000000, runtime=-218339, deadline=726823680456)
     <idle>-0     [002] d..3   727.191676: sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=cpuhog next_pid=3556 next_prio=-1
     [...]
--->8---

Which is quite nice already IMHO.

> I used the analogy, that eBPF is like C, and this is like Bash. One is
> much easier to get "Hello World!" out than the other.
> 
> So personally, this is something I know I would use (note, I have
> never used eBPF either). But if I'm the only one to use this
> interface then I'll stop here (and not bother with the function graph
> return interface). If others think this would be helpful, I would ask
> them to speak up now.

First impression is that this is definitely going to be useful if

 - it's possible to hook at function end (e.g., replenish_dl_entity above
   carries more useful information _after_ it did its job)
 - inside functions? not really sure it's actually going to be needed, but I
   was wondering if it's possible at all :); with tracepoints it's for example
   easy to collect detailed information about which branches have been taken etc.

I'm going to play with this more. Just wanted to give some quick positive
feedback.

I'm also adding Arm folks to the discussion, as they (and I :) have been
asking to add tracepoints to scheduler code in the past [1].

Best,

- Juri

[1] https://marc.info/?l=linux-kernel&m=149068303518607

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-04 17:21       ` Alexei Starovoitov
@ 2018-02-05 14:39         ` Masami Hiramatsu
  0 siblings, 0 replies; 87+ messages in thread
From: Masami Hiramatsu @ 2018-02-05 14:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: rostedt, linux-kernel, torvalds, mingo, akpm, tglx, peterz, acme,
	corbet, mathieu.desnoyers, namhyung, daniel, davem

On Sun, 4 Feb 2018 09:21:30 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Sun, Feb 04, 2018 at 12:57:47PM +0900, Masami Hiramatsu wrote:
> > 
> > > I based some of the code from kprobes too. But I wanted this to be
> > > simpler, and as such, not as powerful as kprobes. More of a "poor mans"
> > > kprobe ;-) Where you are limited to functions and their arguments. If
> > > you need more power, switch to kprobes. In other words, its just an
> > > added stepping stone.
> > > 
> > > Also, this should work without kprobe support, only ftrace, and function
> > > args from the arch.
> > 
> > Hmm, but implementation seems very far from current probe events, we need
> > to consider how to unify it. Anyway, it is a very good time to do, because
> > I found current probe-event fetch method is not good with retpoline/IBRS,
> > it is full of indirect call.
> > 
> > I would like to convert it to eBPF if possible. It will be good for the
> > performance with JIT, and we can collaborate on the same code with BPF
> > people.
> 
> The current probe fetch method is indeed going to slow down due to
> retpoline, but this issue is going to affect not only this piece
> of code, but the rest of the kernel where indirect call performance
> matters a lot. Like networking stack where we have at least 4 indirect
> calls per packet.
> So I'd suggest to focus on finding a general method instead of coming
> with a specific solution for this kprobe fetching problem.

OK.

> Devirtualization approach works well and applicable in many cases.
> For networking stack deliver_skb() and __netif_receive_skb_core()
> can check if (pt_prev->func == ip_rcv || ipv6_rcv)
> and call them directly.

Yeah, if the options are limited, that works. (like replacing with
switch-case)
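
The devirtualization idea discussed above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the `demo_*` handlers and `demo_deliver()` are hypothetical stand-ins, not the networking stack's ip_rcv()/ipv6_rcv() or deliver_skb().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler type standing in for a packet_type ->func callback. */
typedef int (*demo_rcv_fn)(int pkt);

/* Hypothetical stand-ins for the common targets (think ip_rcv/ipv6_rcv). */
static int demo_ip_rcv(int pkt)    { return pkt + 4; }
static int demo_ipv6_rcv(int pkt)  { return pkt + 6; }
/* A rarely used handler that still goes through the pointer. */
static int demo_other_rcv(int pkt) { return pkt; }

/*
 * Compare the pointer against the known hot targets and call those
 * directly; only unknown targets take the indirect call, which is the
 * call that a retpoline build would make expensive.
 */
static int demo_deliver(demo_rcv_fn func, int pkt)
{
	assert(func != NULL);

	if (func == demo_ip_rcv)
		return demo_ip_rcv(pkt);
	if (func == demo_ipv6_rcv)
		return demo_ipv6_rcv(pkt);
	return func(pkt);	/* fallback: indirect call */
}
```

As noted above, this only pays off when the set of likely targets is small and known at build time.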

> The other approach I was thinking to explore is static_key-like
> for indirect calls. In many cases the target is rarely changed,
> so we can do arch specific rewrite of destination offset inside
> normal direct call instruction. That should be faster than retpoline.

I doubt it. Most of the indirect call uses are "ops->method" and
it depends on "ops".

> As far as emitting raw bpf insns instead of kprobe fetch methods
> there is a big problem with such approach. Interpreter and all
> JITs take 'struct bpf_prog' that passed the verifier and not just
> random set of bpf instructions. BPF is not a generic assembler.

If you mean kernel/bpf/verifier.c, I'm happy with passing raw
bpf insns generated by kprobe-fetch-method to it :)

> BPF is an instruction set _with_ C calling convention.
> The registers and instructions must be used in certain way or
> things will horribly break.
> See Documentation/bpf/bpf_design_QA.txt for details.
> Long ago I wrote a patch that converted pred tree walk into
> raw bpf insns. If that patch made it into mainline back then
> it would have been a huge headache for us now.
> So if you plan on generating bpf programs they _must_ pass the verifier.

Yes, of course.
Anyway, it is just an idea for retpoline/Spectre V2. (Yes, it is
actually a big issue: it makes the previously faster pointer-call method
slower. Now we see that switch-case may be faster than that in some cases.)

I'm also considering simplifying it (or doing it with a branch and a static
function call) as Steve did in this series.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 01/18] tracing: Add function based events
  2018-02-05  8:24   ` Jiri Olsa
@ 2018-02-05 15:00     ` Steven Rostedt
  2018-02-07  3:09       ` Steven Rostedt
  0 siblings, 1 reply; 87+ messages in thread
From: Steven Rostedt @ 2018-02-05 15:00 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

On Mon, 5 Feb 2018 09:24:23 +0100
Jiri Olsa <jolsa@redhat.com> wrote:


> should this be done under 'func_event_mutex' ?

Probably.

> 
> I tried and crashed the system by running 2 scripts with:
> 
>   echo 'ip_rcv(u64 skb, u64 dev)' > /sys/kernel/debug/tracing/function_events
>   echo 'SyS_openat(int dfd, string buf, x32 flags, x32 mode)' >> /sys/kernel/debug/tracing/function_events
>   echo 'SyS_open(x8[32] buf, x32 flags, x32 mode)' >> /sys/kernel/debug/tracing/function_events
>

Thanks for testing. I'll fix.

-- Steve

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-04  2:25         ` Namhyung Kim
@ 2018-02-05 15:02           ` Steven Rostedt
  0 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-05 15:02 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Alexei Starovoitov, Mathieu Desnoyers, linux-kernel,
	Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, acme, Clark Williams, Jiri Olsa, bristot,
	Juri Lelli, Jonathan Corbet

On Sun, 4 Feb 2018 11:25:20 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> I'm interested in this.  From my understanding, it's basically
> function tracing + filter + custom argument info, right?
> 
> Supporting arguments with complex type could be error-prone.
> We need to prevent malfunctions by invalid inputs.

All reads are done with probe_kernel_read(). If it faults (at any
stage), it simply returns "0".
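
A minimal userspace sketch of that "fault reads as 0" behaviour, assuming a mock memory arena: `demo_probe_read()` is a hypothetical stand-in that only mimics the return convention of probe_kernel_read() (0 on success, -EFAULT on a bad address), and `demo_read_u64()` is likewise invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Mock "valid memory": any address outside this arena counts as a fault. */
static unsigned char demo_arena[64];

/* Hypothetical stand-in for probe_kernel_read(): returns 0 or -EFAULT. */
static long demo_probe_read(void *dst, const void *src, size_t size)
{
	uintptr_t s = (uintptr_t)src;
	uintptr_t lo = (uintptr_t)demo_arena;
	uintptr_t hi = lo + sizeof(demo_arena);

	assert(dst != NULL);

	/* Reject reads that start or end outside the arena. */
	if (s < lo || s > hi || size > hi - s)
		return -EFAULT;
	memcpy(dst, src, size);
	return 0;
}

/* Fetch a u64 argument; a faulting read yields 0 instead of crashing. */
static uint64_t demo_read_u64(const void *addr)
{
	uint64_t val;

	if (demo_probe_read(&val, addr, sizeof(val)) < 0)
		return 0;
	return val;
}
```

One consequence of this convention is that a recorded 0 cannot be told apart from a faulted read.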

-- Steve

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-05 13:53           ` Juri Lelli
@ 2018-02-05 15:07             ` Steven Rostedt
  -1 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-05 15:07 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Alexei Starovoitov, Mathieu Desnoyers, linux-kernel,
	Linus Torvalds, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Tom Zanussi, linux-rt-users,
	linux-trace-users, acme, Clark Williams, Jiri Olsa, bristot,
	Jonathan Corbet, Namhyung Kim, Dietmar Eggemann, Patrick Bellasi,
	Morten Rasmussen

On Mon, 5 Feb 2018 14:53:55 +0100
Juri Lelli <juri.lelli@redhat.com> wrote:

> First impression is that this is going to be definitely useful if
> 
>  - it's possible to hook at function end (e.g., replenish_dl_entity above
>    carries more useful information _after_ it did its job)

The one issue is that you will only have access to one argument at the
end. And that will be the return value. How useful would that be?

Hmm, actually, if we incorporate Tom Zanussi's histogram patches (which
I'll start reviewing this week for inclusion), we could add the pseudo
events to carry necessary data.

>  - inside functions? not really sure it's actually going to be needed, but I
>    was wondering if it's possible at all :); with tracepoints it's for example
>    easy to collect detailed information about which branches have been taken etc.

This is not meant to handle anything other than function calls. You
have three options for dealing with the inside of a function.

 - add another function that can be traced with this, inside the
   function

 - use kprobes

 - add a tracepoint

> 
> I'm going to play with this more. Just wanted to give back a quick positive
> feedback.

Thanks!

> 
> I'm also adding Arm folks to the discussion, as they (and I :) have been
> asking to add tracepoints to scheduler code in the past [1].

You will need to implement the arch_get_func_args() for ARM too.

-- Steve

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-05 10:09         ` Peter Zijlstra
@ 2018-02-05 15:10           ` Steven Rostedt
  0 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-05 15:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Mathieu Desnoyers, linux-kernel, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, acme, Clark Williams,
	Jiri Olsa, bristot, Juri Lelli, Jonathan Corbet, Namhyung Kim,
	Alexei Starovoitov

On Mon, 5 Feb 2018 11:09:31 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> As a run-around, Steve then suggested to decouple the trace-hook from
> the actual trace-event. Let the scheduler only provide the hook, nothing
> else. And then allow users to create their own events with the specific
> data they need for their specific use-case.

Actually, we can make the raw tracepoint callback no longer 'notrace',
and then we can extend all tracepoints as well with this.

One issue I hit was the structure randomization for security. If the
structures are randomized, it becomes much more difficult to find
specific aspects of task_struct.

-- Steve

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-05 10:49   ` Daniel Bristot de Oliveira
@ 2018-02-05 15:11     ` Steven Rostedt
  0 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-05 15:11 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: Juri Lelli, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu,
	Tom Zanussi, linux-rt-users, linux-trace-users,
	Arnaldo Carvalho de Melo, Clark Williams, Jiri Olsa,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

On Mon, 5 Feb 2018 11:49:31 +0100
Daniel Bristot de Oliveira <bristot@redhat.com> wrote:

> Isn't this the case of calling tracing functions before calling:
> 
> rcu_irq_enter();

Yes, I'm actually aware of this. I just forgot to deal with it during
development.

Thanks for reporting.

-- Steve

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events
  2018-02-04 19:39       ` Linus Torvalds
  2018-02-05 10:09         ` Peter Zijlstra
@ 2018-02-05 15:14         ` Masami Hiramatsu
  1 sibling, 0 replies; 87+ messages in thread
From: Masami Hiramatsu @ 2018-02-05 15:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mathieu Desnoyers, rostedt, linux-kernel, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu,
	Tom Zanussi, linux-rt-users, linux-trace-users, acme,
	Clark Williams, Jiri Olsa, bristot, Juri Lelli, Jonathan Corbet,
	Namhyung Kim, Alexei Starovoitov

(Note that I also agree with Linus's opinion that this is
 like a debugger, since I already did it in perf-probe :))

On Sun, 4 Feb 2018 11:39:39 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> (Obviously it's not entirely black-and-white, but I do think there is
> a pretty big difference between the two groups. And the first group
> will obviously use the explicit trace points _too_, generally to
> narrow down where they want to go with the function-based one).
> 
> We'll see. Maybe I'm entirely wrong. But I'm hoping that the
> function-based one will end up being helpful.

BTW, if function-based tracing is helpful for both of them,
they could have started using it back in 2010, because the kprobe-based
tracer already supported it.

It was not well announced; I must admit that I was lazy at that point.
Also, since I moved the usability effort to perf-probe, the kprobe-based
event syntax is not so fancy.
"SyS_openat(int dfd, string path, x32 flags, x16 mode)"

is equal to

"p SyS_openat dfd=%di:x64 path=%si:string flags=%dx:x32 mode=%cx:x16"

in kprobe probe definition syntax, but with perf-probe and CONFIG_DEBUG_INFO,

perf probe -a 'sys_openat $params'

will set up the event correctly.

So, we need to clarify what will attract more "2nd group" people to
function based events. E.g. events for EXPORT_SYMBOL_GPL() symbols that
are already defined and can easily be turned on/off.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 04/18] tracing/x86: Add arch_get_func_args() function
  2018-02-02 23:05 ` [PATCH 04/18] tracing/x86: Add arch_get_func_args() function Steven Rostedt
@ 2018-02-05 16:33   ` Masami Hiramatsu
  2018-02-05 17:06     ` Steven Rostedt
  2018-02-08  5:28   ` Namhyung Kim
  1 sibling, 1 reply; 87+ messages in thread
From: Masami Hiramatsu @ 2018-02-05 16:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

On Fri, 02 Feb 2018 18:05:02 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> Add function to get the function arguments from pt_regs.
> 

Can we make it an independent feature in asm/ptrace.h so that
other components (like kprobe-event) share it?
E.g.

static inline unsigned long regs_get_func_arg(struct pt_regs *regs,
                                              unsigned int n)
{
	unsigned int offs[] = {
#ifdef CONFIG_X86_64
	offsetof(struct pt_regs, di),
	offsetof(struct pt_regs, si),
	offsetof(struct pt_regs, dx),
	offsetof(struct pt_regs, cx),
	offsetof(struct pt_regs, r8),
	offsetof(struct pt_regs, r9),
#else
	offsetof(struct pt_regs, ax),
	offsetof(struct pt_regs, dx),
	offsetof(struct pt_regs, cx),
#endif
	};

	if (unlikely(n >= ARRAY_SIZE(offs)))
		return 0;
	return *(unsigned long *)((unsigned long)regs + offs[n]);
}

And HAVE_REGS_GET_FUNC_ARG indicates this is defined on that arch.
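
The offset-table lookup suggested above can be tried out in userspace with a mock register structure. This is a sketch under stated assumptions: `struct demo_pt_regs` holds only the six x86-64 argument registers (the real struct pt_regs has more fields and a different layout), and `demo_regs_get_func_arg()` and `DEMO_ARRAY_SIZE()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Mock holding only the x86-64 argument registers, in calling-convention order. */
struct demo_pt_regs {
	unsigned long di, si, dx, cx, r8, r9;
};

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Return the n-th (0-based) register argument, or 0 if n is out of range. */
static unsigned long demo_regs_get_func_arg(const struct demo_pt_regs *regs,
					    unsigned int n)
{
	/* offsetof() takes the struct type itself, not a pointer type. */
	static const size_t offs[] = {
		offsetof(struct demo_pt_regs, di),
		offsetof(struct demo_pt_regs, si),
		offsetof(struct demo_pt_regs, dx),
		offsetof(struct demo_pt_regs, cx),
		offsetof(struct demo_pt_regs, r8),
		offsetof(struct demo_pt_regs, r9),
	};

	assert(regs != NULL);

	if (n >= DEMO_ARRAY_SIZE(offs))
		return 0;
	return *(const unsigned long *)((const char *)regs + offs[n]);
}
```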

Thank you,

> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  arch/x86/kernel/ftrace.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> index 01ebcb6f263e..5e845c8cf89d 100644
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -46,6 +46,34 @@ int ftrace_arch_code_modify_post_process(void)
>  	return 0;
>  }
>  
> +int arch_get_func_args(struct pt_regs *regs,
> +		       int start, int end, long *args)
> +{
> +#ifdef CONFIG_X86_64
> +# define MAX_ARGS 6
> +# define INIT_REGS				\
> +	{	regs->di, regs->si, regs->dx,	\
> +		regs->cx, regs->r8, regs->r9	\
> +	}
> +#else
> +# define MAX_ARGS 3
> +# define INIT_REGS				\
> +	{	regs->ax, regs->dx, regs->cx	}
> +#endif
> +	if (!regs)
> +		return MAX_ARGS;
> +
> +	{
> +		long pt_args[] = INIT_REGS;
> +		int i;
> +
> +		for (i = start; i <= end && i < MAX_ARGS; i++)
> +			args[i - start] = pt_args[i];
> +
> +		return i - start;
> +	}
> +}
> +
>  union ftrace_code_union {
>  	char code[MCOUNT_INSN_SIZE];
>  	struct {
> -- 
> 2.15.1
> 
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 04/18] tracing/x86: Add arch_get_func_args() function
  2018-02-05 16:33   ` Masami Hiramatsu
@ 2018-02-05 17:06     ` Steven Rostedt
  0 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-05 17:06 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Tom Zanussi, linux-rt-users,
	linux-trace-users, Arnaldo Carvalho de Melo, Clark Williams,
	Jiri Olsa, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

On Tue, 6 Feb 2018 01:33:22 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> On Fri, 02 Feb 2018 18:05:02 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > 
> > Add function to get the function arguments from pt_regs.
> >   
> 
> Can we make it an independent feature in asm/ptrace.h so that
> other components (like kprobe-event) share it?

Most definitely!

> E.g.
> 
> static inline unsigned long regs_get_func_arg(struct pt_regs *regs,
>                                               unsigned int n)
> {
> 	unsigned int offs[] = {
> #ifdef CONFIG_X86_64
> 	offsetof(struct pt_regs, di),
> 	offsetof(struct pt_regs, si),
> 	offsetof(struct pt_regs, dx),
> 	offsetof(struct pt_regs, cx),
> 	offsetof(struct pt_regs, r8),
> 	offsetof(struct pt_regs, r9),
> #else
> 	offsetof(struct pt_regs, ax),
> 	offsetof(struct pt_regs, dx),
> 	offsetof(struct pt_regs, cx),
> #endif
> 	};
> 
> 	if (unlikely(n >= ARRAY_SIZE(offs)))
> 		return 0;
> 	return *(unsigned long *)((unsigned long)regs + offs[n]);

I like having the function I used, because it could also handle
arguments in the stack (although I didn't implement that yet). But this
would still be a nice helper function.
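
For comparison, the range-based interface of arch_get_func_args() from the patch can be sketched in userspace as follows. It assumes a mock `struct demo_regs` with only the six x86-64 argument registers; `demo_get_func_args()` is a hypothetical name, and stack-passed arguments are left out here just as they are in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the six x86-64 argument registers. */
struct demo_regs {
	long di, si, dx, cx, r8, r9;
};

#define DEMO_MAX_ARGS 6

/*
 * Copy arguments start..end (inclusive, 0-based) into args[] and return
 * how many were copied. A NULL regs is a query for the number of
 * arguments that can be fetched from registers.
 */
static int demo_get_func_args(const struct demo_regs *regs,
			      int start, int end, long *args)
{
	int i;

	if (!regs)
		return DEMO_MAX_ARGS;

	assert(args != NULL);

	const long pt_args[DEMO_MAX_ARGS] = {
		regs->di, regs->si, regs->dx,
		regs->cx, regs->r8, regs->r9,
	};

	/* Stop at the end of the register arguments even if end is larger. */
	for (i = start; i <= end && i < DEMO_MAX_ARGS; i++)
		args[i - start] = pt_args[i];

	return i - start;
}
```

Returning the number of copied values lets a caller notice when fewer arguments were available than requested.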

-- Steve


> }
> 
> And HAVE_REGS_GET_FUNC_ARG indicates this is defined on that arch.
> 
> Thank you,
> 
> > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > ---
> >  arch/x86/kernel/ftrace.c | 28 ++++++++++++++++++++++++++++
> >  1 file changed, 28 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> > index 01ebcb6f263e..5e845c8cf89d 100644
> > --- a/arch/x86/kernel/ftrace.c
> > +++ b/arch/x86/kernel/ftrace.c
> > @@ -46,6 +46,34 @@ int ftrace_arch_code_modify_post_process(void)
> >  	return 0;
> >  }
> >  
> > +int arch_get_func_args(struct pt_regs *regs,
> > +		       int start, int end, long *args)
> > +{
> > +#ifdef CONFIG_X86_64
> > +# define MAX_ARGS 6
> > +# define INIT_REGS				\
> > +	{	regs->di, regs->si, regs->dx,	\
> > +		regs->cx, regs->r8, regs->r9	\
> > +	}
> > +#else
> > +# define MAX_ARGS 3
> > +# define INIT_REGS				\
> > +	{	regs->ax, regs->dx, regs->cx	}
> > +#endif
> > +	if (!regs)
> > +		return MAX_ARGS;
> > +
> > +	{
> > +		long pt_args[] = INIT_REGS;
> > +		int i;
> > +
> > +		for (i = start; i <= end && i < MAX_ARGS; i++)
> > +			args[i - start] = pt_args[i];
> > +
> > +		return i - start;
> > +	}
> > +}
> > +
> >  union ftrace_code_union {
> >  	char code[MCOUNT_INSN_SIZE];
> >  	struct {
> > -- 
> > 2.15.1
> > 
> >   
> 
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 01/18] tracing: Add function based events
  2018-02-05 15:00     ` Steven Rostedt
@ 2018-02-07  3:09       ` Steven Rostedt
  2018-02-07 12:06         ` Jiri Olsa
  0 siblings, 1 reply; 87+ messages in thread
From: Steven Rostedt @ 2018-02-07  3:09 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

On Mon, 5 Feb 2018 10:00:50 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 5 Feb 2018 09:24:23 +0100
> Jiri Olsa <jolsa@redhat.com> wrote:
> 
> 
> > should this be done under 'func_event_mutex' ?  
> 
> Probably.

I think we only need to protect the list addition.

> 
> > 
> > I tried and crashed the system by running 2 scripts with:
> > 
> >   echo 'ip_rcv(u64 skb, u64 dev)' > /sys/kernel/debug/tracing/function_events
> >   echo 'SyS_openat(int dfd, string buf, x32 flags, x32 mode)' >> /sys/kernel/debug/tracing/function_events
> >   echo 'SyS_open(x8[32] buf, x32 flags, x32 mode)' >> /sys/kernel/debug/tracing/function_events
> >  
> 

There's no reason that we can't have more than one function event
attached to the same function. I'm adding this:

diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index b145639eac45..928168fc2025 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -1275,12 +1275,6 @@ static int create_function_event(int argc, char **argv)
 	if (state != FUNC_STATE_END)
 		goto fail;
 
-	ret = -EALREADY;
-	list_for_each_entry(fe, &func_events, list) {
-		if (strcmp(fe->func, func_event->func) == 0)
-			goto fail;
-	}
-
 	ret = ftrace_set_filter(&func_event->ops, func_event->func,
 				strlen(func_event->func), 0);
 	if (ret < 0)
@@ -1290,7 +1284,9 @@ static int create_function_event(int argc, char **argv)
 	if (ret < 0)
 		goto fail;
 
+	mutex_lock(&func_event_mutex);
 	list_add_tail(&func_event->list, &func_events);
+	mutex_unlock(&func_event_mutex);
 	return 0;
  fail:
 	free_func_event(func_event);

-- Steve
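
The effect of the change above (append to the list under a mutex, and no longer reject a second event on the same function) can be sketched in userspace with pthreads. All names here (`demo_func_event`, `demo_add_event()`, `demo_count_events()`) are hypothetical, and a hand-rolled singly linked list stands in for the kernel's list_head.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the patch's function event. */
struct demo_func_event {
	const char *func;
	struct demo_func_event *next;
};

static pthread_mutex_t demo_event_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct demo_func_event *demo_head;
static struct demo_func_event **demo_tail = &demo_head;

/* Append an event; only the list update itself is done under the mutex. */
static void demo_add_event(struct demo_func_event *fe)
{
	assert(fe != NULL);

	fe->next = NULL;
	pthread_mutex_lock(&demo_event_mutex);
	*demo_tail = fe;
	demo_tail = &fe->next;
	pthread_mutex_unlock(&demo_event_mutex);
}

/* Count the events attached to one function; duplicates are allowed. */
static size_t demo_count_events(const char *func)
{
	const struct demo_func_event *fe;
	size_t n = 0;

	pthread_mutex_lock(&demo_event_mutex);
	for (fe = demo_head; fe; fe = fe->next)
		if (strcmp(fe->func, func) == 0)
			n++;
	pthread_mutex_unlock(&demo_event_mutex);
	return n;
}
```

Two writers racing on the tail pointer without the lock could lose an entry or corrupt the list, which is the kind of failure the two concurrent scripts could trigger.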

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH 01/18] tracing: Add function based events
  2018-02-07  3:09       ` Steven Rostedt
@ 2018-02-07 12:06         ` Jiri Olsa
  0 siblings, 0 replies; 87+ messages in thread
From: Jiri Olsa @ 2018-02-07 12:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Daniel Bristot de Oliveira, Juri Lelli,
	Jonathan Corbet, Mathieu Desnoyers, Namhyung Kim,
	Alexei Starovoitov

On Tue, Feb 06, 2018 at 10:09:17PM -0500, Steven Rostedt wrote:
> On Mon, 5 Feb 2018 10:00:50 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Mon, 5 Feb 2018 09:24:23 +0100
> > Jiri Olsa <jolsa@redhat.com> wrote:
> > 
> > 
> > > should this be done under 'func_event_mutex' ?  
> > 
> > Probably.
> 
> I think we only need to protect the addition to the list.

yep, seems enough

jirka

> 
> > 
> > > 
> > > I tried and crashed the system by running 2 scripts with:
> > > 
> > >   echo 'ip_rcv(u64 skb, u64 dev)' > /sys/kernel/debug/tracing/function_events
> > >   echo 'SyS_openat(int dfd, string buf, x32 flags, x32 mode)' >> /sys/kernel/debug/tracing/function_events
> > >   echo 'SyS_open(x8[32] buf, x32 flags, x32 mode)' >> /sys/kernel/debug/tracing/function_events
> > >  
> > 
> 
> There's no reason that we can't have more than one function event
> attached to the same function. I'm adding this:
> 
> diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
> index b145639eac45..928168fc2025 100644
> --- a/kernel/trace/trace_event_ftrace.c
> +++ b/kernel/trace/trace_event_ftrace.c
> @@ -1275,12 +1275,6 @@ static int create_function_event(int argc, char **argv)
>  	if (state != FUNC_STATE_END)
>  		goto fail;
>  
> -	ret = -EALREADY;
> -	list_for_each_entry(fe, &func_events, list) {
> -		if (strcmp(fe->func, func_event->func) == 0)
> -			goto fail;
> -	}
> -
>  	ret = ftrace_set_filter(&func_event->ops, func_event->func,
>  				strlen(func_event->func), 0);
>  	if (ret < 0)
> @@ -1290,7 +1284,9 @@ static int create_function_event(int argc, char **argv)
>  	if (ret < 0)
>  		goto fail;
>  
> +	mutex_lock(&func_event_mutex);
>  	list_add_tail(&func_event->list, &func_events);
> +	mutex_unlock(&func_event_mutex);
>  	return 0;
>   fail:
>  	free_func_event(func_event);
> 
> -- Steve

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 04/18] tracing/x86: Add arch_get_func_args() function
  2018-02-02 23:05 ` [PATCH 04/18] tracing/x86: Add arch_get_func_args() function Steven Rostedt
  2018-02-05 16:33   ` Masami Hiramatsu
@ 2018-02-08  5:28   ` Namhyung Kim
  2018-02-08 15:29     ` Steven Rostedt
  1 sibling, 1 reply; 87+ messages in thread
From: Namhyung Kim @ 2018-02-08  5:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

Hi Steve,

On Fri, Feb 02, 2018 at 06:05:02PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> Add function to get the function arguments from pt_regs.
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  arch/x86/kernel/ftrace.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> index 01ebcb6f263e..5e845c8cf89d 100644
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -46,6 +46,34 @@ int ftrace_arch_code_modify_post_process(void)
>  	return 0;
>  }
>  
> +int arch_get_func_args(struct pt_regs *regs,
> +		       int start, int end, long *args)
> +{
> +#ifdef CONFIG_X86_64
> +# define MAX_ARGS 6
> +# define INIT_REGS				\
> +	{	regs->di, regs->si, regs->dx,	\
> +		regs->cx, regs->r8, regs->r9	\
> +	}
> +#else
> +# define MAX_ARGS 3
> +# define INIT_REGS				\
> +	{	regs->ax, regs->dx, regs->cx	}
> +#endif
> +	if (!regs)
> +		return MAX_ARGS;
> +
> +	{
> +		long pt_args[] = INIT_REGS;
> +		int i;
> +
> +		for (i = start; i <= end && i < MAX_ARGS; i++)
                                ^^^^^^^^

I expected it to be 'i < end' based on your description.
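
To spell out the concern: patch 03 calls
arch_get_func_args(pt_regs, 0, func_event->arg_cnt, args) with args declared
as long args[func_event->arg_cnt], so 'end' looks like an exclusive bound.
A user-space sketch of the loop with 'i < end', assuming a plain array in
place of pt_regs (get_func_args_sketch() and SKETCH_MAX_ARGS are names
invented here):

```c
#include <stddef.h>

#define SKETCH_MAX_ARGS 6

/*
 * Copy argument registers [start, end) from @regs into @args and return
 * how many were copied. @regs stands in for the pt_regs register list.
 * With @regs == NULL, return the number of supported arguments.
 */
static int get_func_args_sketch(const long *regs, int start, int end, long *args)
{
	int i;

	if (!regs)
		return SKETCH_MAX_ARGS;

	/* 'end' is exclusive, so at most (end - start) values are stored */
	for (i = start; i < end && i < SKETCH_MAX_ARGS; i++)
		args[i - start] = regs[i];

	return i - start;
}
```

With 'i <= end', a call with end == arg_cnt would store arg_cnt + 1 values
into an array of arg_cnt entries whenever arg_cnt is below the limit.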

Thanks,
Namhyung


> +			args[i - start] = pt_args[i];
> +
> +		return i - start;
> +	}
> +}
> +
>  union ftrace_code_union {
>  	char code[MCOUNT_INSN_SIZE];
>  	struct {
> -- 
> 2.15.1
> 
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 03/18] tracing: Add simple arguments to function based events
  2018-02-02 23:05 ` [PATCH 03/18] tracing: Add simple arguments to " Steven Rostedt
@ 2018-02-08 10:18   ` Namhyung Kim
  2018-02-08 15:37     ` Steven Rostedt
  0 siblings, 1 reply; 87+ messages in thread
From: Namhyung Kim @ 2018-02-08 10:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Fri, Feb 02, 2018 at 06:05:01PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> The function based events can now have arguments passed in. A weak function
> arch_get_func_args() is created so that archs can fill in the arguments
> based on pt_regs. Currently no arch implements this function, so no
> arguments are returned. Passing NULL for pt_regs into this function returns
> the number of supported args that can be processed, and the format will
> only allow the user to add valid args. Which currently are all arguments
> until an arch implements the arch_get_func_args() function.
> 
> [ missing 'static' found by Fengguang Wu's kbuild test robot ]
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---

[SNIP]
> @@ -129,9 +284,14 @@ static void func_event_trace(struct trace_event_file *trace_file,
>  	struct trace_event_call *call = &func_event->call;
>  	struct ring_buffer_event *event;
>  	struct ring_buffer *buffer;
> +	struct func_arg *arg;
> +	long args[func_event->arg_cnt];
> +	long long val = 1;
>  	unsigned long irq_flags;
> +	int nr_args;
>  	int size;
>  	int pc;
> +	int i = 0;
>  
>  	if (trace_trigger_soft_disabled(trace_file))
>  		return;
> @@ -139,7 +299,7 @@ static void func_event_trace(struct trace_event_file *trace_file,
>  	local_save_flags(irq_flags);
>  	pc = preempt_count();
>  
> -	size = sizeof(*entry);
> +	size = func_event->arg_offset + sizeof(*entry);
>  
>  	event = trace_event_buffer_lock_reserve(&buffer, trace_file,
>  						call->event.type,
> @@ -150,6 +310,15 @@ static void func_event_trace(struct trace_event_file *trace_file,
>  	entry = ring_buffer_event_data(event);
>  	entry->ip = ip;
>  	entry->parent_ip = parent_ip;
> +	nr_args = arch_get_func_args(pt_regs, 0, func_event->arg_cnt, args);
> +
> +	list_for_each_entry(arg, &func_event->args, list) {
> +		if (i < nr_args)
> +			val = args[i];
> +		else
> +			val = 0;
> +		memcpy(&entry->data[arg->offset], &val, arg->size);
> +	}

Where is the 'i' increased?

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 09/18] tracing: Add indexing of arguments for function based events
  2018-02-02 23:05 ` [PATCH 09/18] tracing: Add indexing of arguments for " Steven Rostedt
@ 2018-02-08 10:59   ` Namhyung Kim
  2018-02-08 15:43     ` Steven Rostedt
  0 siblings, 1 reply; 87+ messages in thread
From: Namhyung Kim @ 2018-02-08 10:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Fri, Feb 02, 2018 at 06:05:07PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> Currently reading of 8 byte words can only happen 8 bytes aligned from the
> argument. But there may be cases that they are 4 bytes aligned. To make the
> capturing of arguments more flexible, add a plus '+' operator that can index
> the variable at arbitrary indexes to get any location.
> 
>  u64 arg+4[3]
> 
> Will get an 8 byte word at index 28 (3 * 8 + 4)
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  Documentation/trace/function-based-events.rst | 24 +++++++++++++++++++++++-
>  kernel/trace/trace_event_ftrace.c             | 18 ++++++++++++++++++
>  2 files changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
> index 72e3e7730d63..bdb28f433bfb 100644
> --- a/Documentation/trace/function-based-events.rst
> +++ b/Documentation/trace/function-based-events.rst
> @@ -100,10 +100,12 @@ as follows:
>           'x8' | 'x16' | 'x32' | 'x64' |
>           'char' | 'short' | 'int' | 'long' | 'size_t'
>  
> - FIELD := <name> | <name> INDEX
> + FIELD := <name> | <name> INDEX | <name> OFFSET | <name> OFFSET INDEX
>  
>   INDEX := '[' <number> ']'
>  
> + OFFSET := '+' <number>
> +
>   Where <name> is a unique string starting with an alphabetic character
>   and consists only of letters and numbers and underscores.
>  
> @@ -221,3 +223,23 @@ format:
>  print fmt: "%pS->%pS(skb=%u)", REC->__ip, REC->__parent_ip, REC->skb
>  
>  It is now printed with a "%u".
> +
> +
> +Offsets
> +=======
> +
> +After the name of the variable, brackets '[' number ']' will index the value of
> +the argument by the number given times the size of the field.
> +
> + int field[5] will dereference the value of the argument 20 bytes away (4 * 5)
> +  as sizeof(int) is 4.
> +
> +If there's a case where the type is of 8 bytes in size but is not 8 bytes
> +aligned in the structure, an offset may be required.
> +
> +  For example: x64 param+4[2]
> +
> +The above will take the parameter value, add it by 4, then index it by two
> +8 byte words. It's the same in C as: (u64 *)((void *)param + 4)[2]
> +
> + Note: "int skb[32]" is the same as "int skb+4[31]".
> diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
> index 9548b93eb8cd..4c23fa18453d 100644
> --- a/kernel/trace/trace_event_ftrace.c
> +++ b/kernel/trace/trace_event_ftrace.c
> @@ -19,6 +19,7 @@ struct func_arg {
>  	char				*type;
>  	char				*name;
>  	long				indirect;
> +	long				index;
>  	short				offset;
>  	short				size;
>  	s8				arg;
> @@ -62,6 +63,7 @@ enum func_states {
>  	FUNC_STATE_INDIRECT,
>  	FUNC_STATE_UNSIGNED,
>  	FUNC_STATE_PIPE,
> +	FUNC_STATE_PLUS,
>  	FUNC_STATE_TYPE,
>  	FUNC_STATE_VAR,
>  	FUNC_STATE_COMMA,
> @@ -182,6 +184,7 @@ static char *next_token(char **ptr, char *last)
>  		    *str == ']' ||
>  		    *str == ',' ||
>  		    *str == '|' ||
> +		    *str == '+' ||
>  		    *str == ')')
>  			break;
>  	}
> @@ -323,6 +326,15 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
>  		}
>  		break;
>  
> +	case FUNC_STATE_PLUS:
> +		if (WARN_ON(!fevent->last_arg))
> +			break;
> +		ret = kstrtol(token, 0, &val);
> +		if (ret)
> +			break;
> +		fevent->last_arg->index += val;
> +		return FUNC_STATE_VAR;
> +
>  	case FUNC_STATE_VAR:
>  		switch (token[0]) {
>  		case ')':
> @@ -331,6 +343,8 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
>  			return FUNC_STATE_COMMA;
>  		case '|':
>  			return FUNC_STATE_PIPE;
> +		case '+':
> +			return FUNC_STATE_PLUS;
>  		case '[':
>  			return FUNC_STATE_BRACKET;
>  		}
> @@ -347,6 +361,8 @@ static long long get_arg(struct func_arg *arg, unsigned long val)
>  	char buf[8];
>  	int ret;
>  
> +	val += arg->index;
> +
>  	if (!arg->indirect)
>  		return val;

So this also works without the indirect, and just adds the immediate to
the value?

Thanks,
Namhyung


>  
> @@ -779,6 +795,8 @@ static int func_event_seq_show(struct seq_file *m, void *v)
>  		last_arg = arg->arg;
>  		comma = true;
>  		seq_printf(m, "%s %s", arg->type, arg->name);
> +		if (arg->index)
> +			seq_printf(m, "+%ld", arg->index);
>  		if (arg->indirect && arg->size)
>  			seq_printf(m, "[%ld]",
>  				   (arg->indirect ^ INDIRECT_FLAG) / arg->size);
> -- 
> 2.15.1
> 
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 11/18] tracing: Add symbol type to function based events
  2018-02-02 23:05 ` [PATCH 11/18] tracing: Add symbol type to function based events Steven Rostedt
@ 2018-02-08 11:03   ` Namhyung Kim
  2018-02-08 15:48     ` Steven Rostedt
  0 siblings, 1 reply; 87+ messages in thread
From: Namhyung Kim @ 2018-02-08 11:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Fri, Feb 02, 2018 at 06:05:09PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> Add a special type "symbol" that will use %pS to display the field of a
> function based event.
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  Documentation/trace/function-based-events.rst | 26 +++++++++++++++++++++++++-
>  kernel/trace/trace_event_ftrace.c             | 13 ++++++++++---
>  2 files changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
> index bdb28f433bfb..f18c8f3ef330 100644
> --- a/Documentation/trace/function-based-events.rst
> +++ b/Documentation/trace/function-based-events.rst
> @@ -98,7 +98,8 @@ as follows:
>   ATOM := 'u8' | 'u16' | 'u32' | 'u64' |
>           's8' | 's16' | 's32' | 's64' |
>           'x8' | 'x16' | 'x32' | 'x64' |
> -         'char' | 'short' | 'int' | 'long' | 'size_t'
> +         'char' | 'short' | 'int' | 'long' | 'size_t' |
> +	 'symbol'
>  
>   FIELD := <name> | <name> INDEX | <name> OFFSET | <name> OFFSET INDEX
>  
> @@ -243,3 +244,26 @@ The above will take the parameter value, add it by 4, then index it by two
>  8 byte words. It's the same in C as: (u64 *)((void *)param + 4)[2]
>  
>   Note: "int skb[32]" is the same as "int skb+4[31]".
> +
> +
> +Symbols (function names)
> +========================
> +
> +To display kallsyms "%pS" type of output, use the special type "symbol".
> +
> +Again, using gdb to find the offset of the "func" field of struct work_struct
> +
> +(gdb) printf "%d\n", &((struct work_struct *)0)->func
> +24
> +
> + Both "symbol func[3]" and "symbol func+24[0]" will work.
> +
> + # echo '__queue_work(int cpu, x64 wq, symbol func[3])' > function_events
> +
> + # echo 1 > events/functions/__queue_work/enable
> + # cat trace
> +       bash-1641  [007] d..2  6241.171332: queue_work_on->__queue_work(cpu=128, wq=ffff88011a010e00, func=flush_to_ldisc+0x0/0xa0)
> +       bash-1641  [007] d..2  6241.171460: queue_work_on->__queue_work(cpu=128, wq=ffff88011a010e00, func=flush_to_ldisc+0x0/0xa0)
> +     <idle>-0     [000] dNs3  6241.172004: delayed_work_timer_fn->__queue_work(cpu=128, wq=ffff88011a010800, func=vmstat_shepherd+0x0/0xb0)
> + worker/0:2-1689  [000] d..2  6241.172026: __queue_delayed_work->__queue_work(cpu=7, wq=ffff88011a11da00, func=vmstat_update+0x0/0x70)
> +     <idle>-0     [005] d.s3  6241.347996: queue_work_on->__queue_work(cpu=128, wq=ffff88011a011200, func=fb_flashcursor+0x0/0x110 [fb])
> diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
> index 0f2650e97e49..ba10177b9bd6 100644
> --- a/kernel/trace/trace_event_ftrace.c
> +++ b/kernel/trace/trace_event_ftrace.c
> @@ -76,6 +76,7 @@ typedef u64 x64;
>  typedef u32 x32;
>  typedef u16 x16;
>  typedef u8 x8;
> +typedef void * symbol;
>  
>  #define TYPE_TUPLE(type)			\
>  	{ #type, sizeof(type), is_signed_type(type) }
> @@ -97,7 +98,8 @@ typedef u8 x8;
>  	TYPE_TUPLE(x16),			\
>  	TYPE_TUPLE(u8),				\
>  	TYPE_TUPLE(s8),				\
> -	TYPE_TUPLE(x8)
> +	TYPE_TUPLE(x8),				\
> +	TYPE_TUPLE(symbol)
>  
>  static struct func_type {
>  	char		*name;
> @@ -262,7 +264,7 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
>  	switch (state) {
>  	case FUNC_STATE_INIT:
>  		unsign = 0;
> -		if (!isalpha(token[0]))
> +		if (!isalpha(token[0]) && token[0] != '_')
>  			break;

Hmm, it seems this needs to be moved to patch 1?


>  		/* Do not allow wild cards */
>  		if (strstr(token, "*") || strstr(token, "?"))
> @@ -305,7 +307,7 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
>  		return FUNC_STATE_TYPE;
>  
>  	case FUNC_STATE_TYPE:
> -		if (!isalpha(token[0]))
> +		if (!isalpha(token[0]) || token[0] == '_')
>  			break;

Why is this different from the above?  Anyway, '_' is not an alphabetic character.

Thanks,
Namhyung


>  		if (WARN_ON(!fevent->last_arg))
>  			break;
> @@ -472,6 +474,11 @@ static void make_fmt(struct func_arg *arg, char *fmt)
>  {
>  	int c = 0;
>  
> +	if (arg->func_type == FUNC_TYPE_symbol) {
> +		strcpy(fmt, "%pS");
> +		return;
> +	}
> +
>  	fmt[c++] = '%';
>  
>  	if (arg->size == 8) {
> -- 
> 2.15.1
> 
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 04/18] tracing/x86: Add arch_get_func_args() function
  2018-02-08  5:28   ` Namhyung Kim
@ 2018-02-08 15:29     ` Steven Rostedt
  0 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-08 15:29 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Thu, 8 Feb 2018 14:28:13 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> Hi Steve,
> 
> On Fri, Feb 02, 2018 at 06:05:02PM -0500, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > 
> > Add function to get the function arguments from pt_regs.
> > 
> > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > ---
> >  arch/x86/kernel/ftrace.c | 28 ++++++++++++++++++++++++++++
> >  1 file changed, 28 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> > index 01ebcb6f263e..5e845c8cf89d 100644
> > --- a/arch/x86/kernel/ftrace.c
> > +++ b/arch/x86/kernel/ftrace.c
> > @@ -46,6 +46,34 @@ int ftrace_arch_code_modify_post_process(void)
> >  	return 0;
> >  }
> >  
> > +int arch_get_func_args(struct pt_regs *regs,
> > +		       int start, int end, long *args)
> > +{
> > +#ifdef CONFIG_X86_64
> > +# define MAX_ARGS 6
> > +# define INIT_REGS				\
> > +	{	regs->di, regs->si, regs->dx,	\
> > +		regs->cx, regs->r8, regs->r9	\
> > +	}
> > +#else
> > +# define MAX_ARGS 3
> > +# define INIT_REGS				\
> > +	{	regs->ax, regs->dx, regs->cx	}
> > +#endif
> > +	if (!regs)
> > +		return MAX_ARGS;
> > +
> > +	{
> > +		long pt_args[] = INIT_REGS;
> > +		int i;
> > +
> > +		for (i = start; i <= end && i < MAX_ARGS; i++)  
>                                 ^^^^^^^^
> 
> I expected it to be 'i < end' based on your description.
> 

Ug. Thanks for pointing that out.

-- Steve

> 
> > +			args[i - start] = pt_args[i];
> > +
> > +		return i - start;
> > +	}
> > +}
> > +
> >  union ftrace_code_union {
> >  	char code[MCOUNT_INSN_SIZE];
> >  	struct {
> > -- 
> > 2.15.1
> > 
> >   

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 03/18] tracing: Add simple arguments to function based events
  2018-02-08 10:18   ` Namhyung Kim
@ 2018-02-08 15:37     ` Steven Rostedt
  0 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-08 15:37 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Thu, 8 Feb 2018 19:18:18 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> >  						call->event.type,
> > @@ -150,6 +310,15 @@ static void func_event_trace(struct trace_event_file *trace_file,
> >  	entry = ring_buffer_event_data(event);
> >  	entry->ip = ip;
> >  	entry->parent_ip = parent_ip;
> > +	nr_args = arch_get_func_args(pt_regs, 0, func_event->arg_cnt, args);
> > +
> > +	list_for_each_entry(arg, &func_event->args, list) {
> > +		if (i < nr_args)
> > +			val = args[i];
> > +		else
> > +			val = 0;
> > +		memcpy(&entry->data[arg->offset], &val, arg->size);
> > +	}  
> 
> Where is the 'i' increased?

Good question. I think the increment got nuked via one of my rebases,
and then most of my testing happened at the end of the patch series where
"i" is no longer used. But that's no excuse for keeping this bug
around. I'll fix it and test again at each patch.
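
For reference, a user-space sketch of the quoted loop with an increment
restored. It assumes an array of fields in place of the func_event->args
list, and the names field_sketch and fill_fields_sketch() are invented for
the example. Where exactly the increment belongs in the real patch is not
shown in this thread:

```c
#include <string.h>

/* Hypothetical stand-in for struct func_arg: where and how much to store. */
struct field_sketch {
	size_t offset;	/* byte offset into the record's data area */
	size_t size;	/* bytes to store, at most sizeof(long long) */
};

/*
 * Store one value per field into @data. Fields beyond the @nr_args values
 * that were actually fetched are stored as 0.
 */
static void fill_fields_sketch(unsigned char *data,
			       const struct field_sketch *fields, int nr_fields,
			       const long *args, int nr_args)
{
	const struct field_sketch *f;
	long long val;
	int i = 0;

	for (f = fields; f < fields + nr_fields; f++) {
		if (i < nr_args)
			val = args[i];
		else
			val = 0;
		memcpy(&data[f->offset], &val, f->size);
		i++;	/* without this, every field would get args[0] */
	}
}
```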

Thanks for reporting.

-- Steve

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 09/18] tracing: Add indexing of arguments for function based events
  2018-02-08 10:59   ` Namhyung Kim
@ 2018-02-08 15:43     ` Steven Rostedt
  2018-02-08 23:56       ` Namhyung Kim
  0 siblings, 1 reply; 87+ messages in thread
From: Steven Rostedt @ 2018-02-08 15:43 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Thu, 8 Feb 2018 19:59:24 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> > @@ -347,6 +361,8 @@ static long long get_arg(struct func_arg *arg, unsigned long val)
> >  	char buf[8];
> >  	int ret;
> >  
> > +	val += arg->index;
> > +
> >  	if (!arg->indirect)
> >  		return val;  
> 
> So this also works without the indirect, and just adds the immediate to
> the value?

Not sure what you are asking here. The immediate adds to the current
value, whereas the indirect will then look at what's at that location.

If the arg (val) is 0xffffffffabcd0000

	u64 val+8

Will return: 0xffffffffabcd0008

	u64 val[1]

will return what's at location 0xffffffffabcd0008

"u64 val+8[0]" is the same as "u64 val[1]"

Note: "u64 val[0]+8" will return what's at location 0xffffffffabcd0000
plus 8.
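
The first three cases can be sketched in user space as follows. This is an
illustration only: the struct and function names are invented, the offset
and index are kept as separate fields (the patch stores the immediate in
arg->index and encodes the index in arg->indirect), and memory is read with
a plain memcpy() rather than whatever safe accessor the kernel side uses:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one parsed argument. */
struct arg_spec_sketch {
	long offset;	/* the '+<number>' immediate, in bytes */
	long index;	/* the '[<number>]' index, in units of size */
	size_t size;	/* size of the type in bytes */
	bool indirect;	/* true when '[...]' was given */
};

/*
 * Without an indirect, return the value plus the immediate.
 * With an indirect, read @size bytes at val + offset + index * size.
 * (For sizes below 8 the result is only meaningful on little endian.)
 */
static uint64_t resolve_arg_sketch(uintptr_t val,
				   const struct arg_spec_sketch *spec)
{
	uint64_t out = 0;

	val += spec->offset;
	if (!spec->indirect)
		return val;

	memcpy(&out, (const void *)(val + spec->index * spec->size),
	       spec->size);
	return out;
}
```

In this sketch "u64 val+8[0]" and "u64 val[1]" resolve to the same address,
and "x64 param+4[2]" reads the 8 bytes starting 20 bytes past param.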

-- Steve

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 11/18] tracing: Add symbol type to function based events
  2018-02-08 11:03   ` Namhyung Kim
@ 2018-02-08 15:48     ` Steven Rostedt
  0 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-08 15:48 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Thu, 8 Feb 2018 20:03:41 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> > @@ -76,6 +76,7 @@ typedef u64 x64;
> >  typedef u32 x32;
> >  typedef u16 x16;
> >  typedef u8 x8;
> > +typedef void * symbol;
> >  
> >  #define TYPE_TUPLE(type)			\
> >  	{ #type, sizeof(type), is_signed_type(type) }
> > @@ -97,7 +98,8 @@ typedef u8 x8;
> >  	TYPE_TUPLE(x16),			\
> >  	TYPE_TUPLE(u8),				\
> >  	TYPE_TUPLE(s8),				\
> > -	TYPE_TUPLE(x8)
> > +	TYPE_TUPLE(x8),				\
> > +	TYPE_TUPLE(symbol)
> >  
> >  static struct func_type {
> >  	char		*name;
> > @@ -262,7 +264,7 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
> >  	switch (state) {
> >  	case FUNC_STATE_INIT:
> >  		unsign = 0;
> > -		if (!isalpha(token[0]))
> > +		if (!isalpha(token[0]) && token[0] != '_')
> >  			break;  
> 
> Hmm, it seems this needs to be moved to patch 1?

Agreed.

> 
> 
> >  		/* Do not allow wild cards */
> >  		if (strstr(token, "*") || strstr(token, "?"))
> > @@ -305,7 +307,7 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
> >  		return FUNC_STATE_TYPE;
> >  
> >  	case FUNC_STATE_TYPE:
> > -		if (!isalpha(token[0]))
> > +		if (!isalpha(token[0]) || token[0] == '_')
> >  			break;  
> 
> Why is this different from the above?  Anyway, '_' is not an alphabetic character.

That's a bug. I thought I fixed that. Ah, in v2 I fixed it in a later
patch. I'll restructure to fix it in the proper place.
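
A small user-space comparison of the two conditions, assuming the intent is
for both states to accept a token that starts with a letter or an
underscore. The thread only confirms that the second hunk is a bug, not what
the corrected condition looks like, and both function names below are
invented:

```c
#include <ctype.h>
#include <stdbool.h>

/* Accept a token whose first character is a letter or an underscore. */
static bool valid_name_start(char c)
{
	return isalpha((unsigned char)c) || c == '_';
}

/*
 * The FUNC_STATE_TYPE condition as quoted: true means the token is
 * rejected. Since '_' is not alphabetic, the extra test changes nothing
 * and a leading underscore is still rejected.
 */
static bool quoted_type_check_rejects(char c)
{
	return !isalpha((unsigned char)c) || c == '_';
}
```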

Thanks!

-- Steve

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 09/18] tracing: Add indexing of arguments for function based events
  2018-02-08 15:43     ` Steven Rostedt
@ 2018-02-08 23:56       ` Namhyung Kim
  2018-02-09  0:19         ` Steven Rostedt
  0 siblings, 1 reply; 87+ messages in thread
From: Namhyung Kim @ 2018-02-08 23:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Thu, Feb 08, 2018 at 10:43:43AM -0500, Steven Rostedt wrote:
> On Thu, 8 Feb 2018 19:59:24 +0900
> Namhyung Kim <namhyung@kernel.org> wrote:
> 
> > > @@ -347,6 +361,8 @@ static long long get_arg(struct func_arg *arg, unsigned long val)
> > >  	char buf[8];
> > >  	int ret;
> > >  
> > > +	val += arg->index;
> > > +
> > >  	if (!arg->indirect)
> > >  		return val;  
> > 
> > So this also works without the indirect, and just adds the immediate to
> > the value?
> 
> Not sure what you are asking here. The immediate adds to the current
> value, whereas the indirect will then look at what's at that location.

I expected the immediate offset to be meaningful only with the
indirect (dereference), since the doc describes only that case.  So I
was asking whether it was intentional or not.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 09/18] tracing: Add indexing of arguments for function based events
  2018-02-08 23:56       ` Namhyung Kim
@ 2018-02-09  0:19         ` Steven Rostedt
  0 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-09  0:19 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Fri, 9 Feb 2018 08:56:15 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> On Thu, Feb 08, 2018 at 10:43:43AM -0500, Steven Rostedt wrote:
> > On Thu, 8 Feb 2018 19:59:24 +0900
> > Namhyung Kim <namhyung@kernel.org> wrote:
> >   
> > > > @@ -347,6 +361,8 @@ static long long get_arg(struct func_arg *arg, unsigned long val)
> > > >  	char buf[8];
> > > >  	int ret;
> > > >  
> > > > +	val += arg->index;
> > > > +
> > > >  	if (!arg->indirect)
> > > >  		return val;    
> > > 
> > > So this also works without the indirect, and just adds the immediate to
> > > the value?
> > 
> > Not sure what you are asking here. The immediate adds to the current
> > value, whereas the indirect will then look at what's at that location.
> 
> I expected the immediate offset to be meaningful only with the
> indirect (dereference), since the doc describes only that case.  So I
> was asking whether it was intentional or not.
>

Yes it is intentional, but rather useless without an indirect. I mean,
you could just add to the value if you want :-)

The reason it doesn't need the indirect is that there are some types
(arrays and strings) that don't need the indirect. For example, with
the perm_addr field 558 bytes into the net_device:

 echo 'ip_rcv(NULL, x8[6] perm_addr+558)' > function_events

produces:

          <idle>-0     [003] ..s3  1809.074329: __netif_receive_skb_core->ip_rcv(perm_addr=b4,b5,2f,ce,18,65)
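
What the array case amounts to can be sketched in user space: take the
argument as a base address, add the immediate, and print that many bytes.
The helper name format_x8_array() is invented, and the two-digit "%02x"
formatting is an assumption, since the output above does not show how
single-digit bytes are printed:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format @count bytes starting at @base + @offset as comma-separated hex
 * into @buf. Returns the string length, or -1 if @buf is too small.
 */
static int format_x8_array(const unsigned char *base, size_t offset,
			   size_t count, char *buf, size_t len)
{
	size_t i, used = 0;
	int n;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	for (i = 0; i < count; i++) {
		n = snprintf(buf + used, len - used, "%s%02x",
			     i ? "," : "", base[offset + i]);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```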

-- Steve

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 12/18] tracing: Add accessing direct address from function based events
  2018-02-02 23:05 ` [PATCH 12/18] tracing: Add accessing direct address from " Steven Rostedt
@ 2018-02-09  0:34   ` Namhyung Kim
  2018-02-09  1:10     ` Steven Rostedt
  2018-02-09 22:07     ` Steven Rostedt
  0 siblings, 2 replies; 87+ messages in thread
From: Namhyung Kim @ 2018-02-09  0:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Fri, Feb 02, 2018 at 06:05:10PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> Allow referencing any address during the function based event. The syntax is
> to use <type> <name>=<addr> For example:
> 
>  # echo 'do_IRQ(long total_forks=0xffffffffa2a4b4c0)' > function_events
>  # echo 1 > events/function/enable
>  # cat trace
>             sshd-832   [000] d... 221639.210845: ret_from_intr->do_IRQ(total_forks=855)
>             sshd-832   [000] d... 221639.211114: ret_from_intr->do_IRQ(total_forks=855)
>           <idle>-0     [000] d... 221639.211198: ret_from_intr->do_IRQ(total_forks=855)
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  Documentation/trace/function-based-events.rst |  40 +++++++-
>  kernel/trace/trace_event_ftrace.c             | 129 +++++++++++++++++++++-----
>  2 files changed, 143 insertions(+), 26 deletions(-)
> 
> diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
> index f18c8f3ef330..b0e6725f3032 100644
> --- a/Documentation/trace/function-based-events.rst
> +++ b/Documentation/trace/function-based-events.rst
> @@ -91,7 +91,7 @@ as follows:
>  
>   ARGS := ARG | ARG ',' ARGS | ''
>  
> - ARG := TYPE FIELD | ARG '|' ARG
> + ARG := TYPE FIELD | TYPE <name> '=' ADDR | TYPE ADDR | ARG '|' ARG
>  
>   TYPE := ATOM | 'unsigned' ATOM
>  
> @@ -107,6 +107,8 @@ as follows:
>  
>   OFFSET := '+' <number>
>  
> + ADDR := A hexadecimal address starting with '0x'
> +
>   Where <name> is a unique string starting with an alphabetic character
>   and consists only of letters and numbers and underscores.
>  
> @@ -267,3 +269,39 @@ Again, using gdb to find the offset of the "func" field of struct work_struct
>       <idle>-0     [000] dNs3  6241.172004: delayed_work_timer_fn->__queue_work(cpu=128, wq=ffff88011a010800, func=vmstat_shepherd+0x0/0xb0)
>   worker/0:2-1689  [000] d..2  6241.172026: __queue_delayed_work->__queue_work(cpu=7, wq=ffff88011a11da00, func=vmstat_update+0x0/0x70)
>       <idle>-0     [005] d.s3  6241.347996: queue_work_on->__queue_work(cpu=128, wq=ffff88011a011200, func=fb_flashcursor+0x0/0x110 [fb])
> +
> +
> +Direct memory access
> +====================
> +
> +Function arguments are not the only thing that can be recorded from a function
> +based event. Memory addresses can also be examined. If there's a global variable
> +that you want to monitor via an interrupt, you can put in the address directly.
> +
> +  # grep total_forks /proc/kallsyms
> +ffffffff82354c18 B total_forks
> +
> +  # echo 'do_IRQ(int total_forks=0xffffffff82354c18)' > function_events

Couldn't we use the symbol name directly?  Maybe it needs a syntax to
indicate global variable.  Like this?

  # echo 'do_IRQ(int $total_forks)' > function_events


> +
> +  # echo 1 > events/functions/do_IRQ/enable
> +  # cat trace
> +    <idle>-0     [003] d..3   337.076709: ret_from_intr->do_IRQ(total_forks=1419)
> +    <idle>-0     [003] d..3   337.077046: ret_from_intr->do_IRQ(total_forks=1419)
> +    <idle>-0     [003] d..3   337.077076: ret_from_intr->do_IRQ(total_forks=1420)
> +
> +Note, address notations do not affect the argument count. For instance, with
> +
> +__visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
> +
> +  # echo 'do_IRQ(int total_forks=0xffffffff82354c18, symbol regs[16])' > function_events
> +
> +Is the same as
> +
> +  # echo 'do_IRQ(int total_forks=0xffffffff82354c18 | symbol regs[16])' > function_events
> +
> +  # cat trace
> +    <idle>-0     [003] d..3   653.839546: ret_from_intr->do_IRQ(total_forks=1504, regs=cpuidle_enter_state+0xb1/0x330)
> +    <idle>-0     [003] d..3   653.906011: ret_from_intr->do_IRQ(total_forks=1504, regs=cpuidle_enter_state+0xb1/0x330)
> +    <idle>-0     [003] d..3   655.823498: ret_from_intr->do_IRQ(total_forks=1504, regs=tick_nohz_idle_enter+0x4c/0x50)
> +    <idle>-0     [003] d..3   655.954096: ret_from_intr->do_IRQ(total_forks=1504, regs=cpuidle_enter_state+0xb1/0x330)
> +
> diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
> index ba10177b9bd6..206114f192be 100644
> --- a/kernel/trace/trace_event_ftrace.c
> +++ b/kernel/trace/trace_event_ftrace.c
> @@ -256,14 +286,16 @@ static int add_arg(struct func_event *fevent, int ftype, int unsign)
>  static enum func_states
>  process_event(struct func_event *fevent, const char *token, enum func_states state)
>  {
> +	static bool update_arg;
>  	static int unsign;
> -	long val;
> +	unsigned long val;
>  	int ret;
>  	int i;
>  
>  	switch (state) {
>  	case FUNC_STATE_INIT:
>  		unsign = 0;
> +		update_arg = false;
>  		if (!isalpha(token[0]) && token[0] != '_')
>  			break;
>  		/* Do not allow wild cards */
> @@ -279,15 +311,15 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
>  			break;
>  		return FUNC_STATE_PARAM;
>  
> -	case FUNC_STATE_PIPE:
> -		fevent->arg_cnt--;
> -		goto comma;
>  	case FUNC_STATE_PARAM:
>  		if (token[0] == ')')
>  			return FUNC_STATE_END;
>  		/* Fall through */
>  	case FUNC_STATE_COMMA:
> - comma:
> +		if (update_arg)
> +			fevent->arg_cnt++;
> +		update_arg = false;
                /* Fall through */

> +	case FUNC_STATE_PIPE:
>  		if (strcmp(token, "unsigned") == 0) {
>  			unsign = 2;
>  			return FUNC_STATE_UNSIGNED;
> @@ -307,18 +339,20 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
>  		return FUNC_STATE_TYPE;
>  
>  	case FUNC_STATE_TYPE:
> -		if (!isalpha(token[0]) || token[0] == '_')
> -			break;
>  		if (WARN_ON(!fevent->last_arg))
>  			break;
> -		fevent->last_arg->name = kstrdup(token, GFP_KERNEL);
> -		if (!fevent->last_arg->name)
> +		if (update_arg_name(fevent, token) < 0)
> +			break;
> +		if (strncmp(token, "0x", 2) == 0)
> +			goto equal;

Not sure it's needed here.  IIUC it should see '=' first and you used
the same token with arg->name.  Hmm.. do you want to support accessing
an unnamed address directly, like below?

  # echo 'do_IRQ(int 0xffffffff82354c18)' > function_events 

> +		if (!isalpha(token[0]) && token[0] != '_')
>  			break;

Maybe you want to check it before the update_arg_name().

Thanks,
Namhyung


> +		update_arg = true;
>  		return FUNC_STATE_VAR;
>  
>  	case FUNC_STATE_BRACKET:
>  		WARN_ON(!fevent->last_arg);
> -		ret = kstrtol(token, 0, &val);
> +		ret = kstrtoul(token, 0, &val);
>  		if (ret)
>  			break;
>  		val *= fevent->last_arg->size;
> @@ -333,7 +367,7 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
>  	case FUNC_STATE_BRACKET_END:
>  		switch (token[0]) {
>  		case ')':
> -			return FUNC_STATE_END;
> +			goto end;
>  		case ',':
>  			return FUNC_STATE_COMMA;
>  		case '|':
> @@ -344,16 +378,33 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
>  	case FUNC_STATE_PLUS:
>  		if (WARN_ON(!fevent->last_arg))
>  			break;
> -		ret = kstrtol(token, 0, &val);
> +		ret = kstrtoul(token, 0, &val);
>  		if (ret)
>  			break;
>  		fevent->last_arg->index += val;
>  		return FUNC_STATE_VAR;
>  
> +	case FUNC_STATE_ADDR:
> +		switch (token[0]) {
> +		case ')':
> +			goto end;
> +		case ',':
> +			return FUNC_STATE_COMMA;
> +		case '|':
> +			return FUNC_STATE_PIPE;
> +		}
> +		break;
> +
>  	case FUNC_STATE_VAR:
> +		if (token[0] == '=')
> +			return FUNC_STATE_EQUAL;
> +		if (WARN_ON(!fevent->last_arg))
> +			break;
> +		update_arg_arg(fevent);
> +		update_arg = true;
>  		switch (token[0]) {
>  		case ')':
> -			return FUNC_STATE_END;
> +			goto end;
>  		case ',':
>  			return FUNC_STATE_COMMA;
>  		case '|':
> @@ -365,10 +416,29 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
>  		}
>  		break;
>  
> +	case FUNC_STATE_EQUAL:
> +		if (strncmp(token, "0x", 2) != 0)
> +			break;
> + equal:
> +		if (WARN_ON(!fevent->last_arg))
> +			break;
> +		ret = kstrtoul(token, 0, &val);
> +		if (ret < 0)
> +			break;
> +		update_arg = false;
> +		fevent->last_arg->index = val;
> +		fevent->last_arg->arg = -1;
> +		fevent->last_arg->indirect = INDIRECT_FLAG;
> +		return FUNC_STATE_ADDR;
> +
>  	default:
>  		break;
>  	}
>  	return FUNC_STATE_ERROR;
> + end:
> +	if (update_arg)
> +		fevent->arg_cnt++;
> +	return FUNC_STATE_END;
>  }


* Re: [PATCH 12/18] tracing: Add accessing direct address from function based events
  2018-02-09  0:34   ` Namhyung Kim
@ 2018-02-09  1:10     ` Steven Rostedt
  2018-02-09 22:07     ` Steven Rostedt
  1 sibling, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-09  1:10 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Fri, 9 Feb 2018 09:34:36 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> > +Direct memory access
> > +====================
> > +
> > +Function arguments are not the only thing that can be recorded from a function
> > +based event. Memory addresses can also be examined. If there's a global variable
> > +that you want to monitor via an interrupt, you can put in the address directly.
> > +
> > +  # grep total_forks /proc/kallsyms
> > +ffffffff82354c18 B total_forks
> > +
> > +  # echo 'do_IRQ(int total_forks=0xffffffff82354c18)' > function_events  
> 
> Couldn't we use the symbol name directly?  Maybe it needs a syntax to
> indicate global variable.  Like this?
> 
>   # echo 'do_IRQ(int $total_forks)' > function_events

Or perhaps use "@"?

But that's a good idea and not hard to implement.
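
As a rough userspace illustration of the lookup a '$total_forks' or
'@total_forks' form would need, the sketch below matches one line in the
/proc/kallsyms format shown above against a symbol name. The helper name
example_match_kallsyms() is hypothetical and not part of the patch; inside
the kernel a symbol lookup helper would presumably be used rather than
text parsing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): parse one kallsyms-style
 * line, "<hex address> <type char> <symbol>", and report the address
 * if the symbol matches @name.
 *
 * Returns 0 and fills *addr on a match, -1 otherwise.
 */
static int example_match_kallsyms(const char *line, const char *name,
				  unsigned long long *addr)
{
	unsigned long long val;
	char type;
	char sym[128];

	assert(line != NULL && name != NULL && addr != NULL);

	/* All three fields must be present for the line to be usable */
	if (sscanf(line, "%llx %c %127s", &val, &type, sym) != 3)
		return -1;
	if (strcmp(sym, name) != 0)
		return -1;

	*addr = val;
	return 0;
}
```

Fed the line "ffffffff82354c18 B total_forks" and the name "total_forks",
this yields the same address that was typed by hand in the example above.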


> >  	case FUNC_STATE_TYPE:
> > -		if (!isalpha(token[0]) || token[0] == '_')
> > -			break;
> >  		if (WARN_ON(!fevent->last_arg))
> >  			break;
> > -		fevent->last_arg->name = kstrdup(token, GFP_KERNEL);
> > -		if (!fevent->last_arg->name)
> > +		if (update_arg_name(fevent, token) < 0)
> > +			break;
> > +		if (strncmp(token, "0x", 2) == 0)
> > +			goto equal;  
> 
> Not sure it's needed here.  IIUC it should see '=' first and you used
> the same token with arg->name.  Hmm.. do you want to support accessing
> an unnamed address directly, like below?
> 
>   # echo 'do_IRQ(int 0xffffffff82354c18)' > function_events 

Yes this works, and was the original way. Someone at DevConf.cz
(Arnaldo maybe, can't remember) recommended giving a name and then we
came up with the "=" sign to use.
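
To make the two forms concrete ("TYPE <name> '=' ADDR" and "TYPE ADDR"),
here is a minimal userspace sketch of how a token following the type could
be classified as either a field name or a bare address. The helper names
are hypothetical, and strtoul() stands in for the kernel's kstrtoul(); this
is not the parser from the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: a name starts with a letter or '_' and contains
 * only letters, digits and underscores. Returns 1 if @tok is a name.
 */
static int example_token_is_name(const char *tok)
{
	assert(tok != NULL);

	if (!isalpha((unsigned char)tok[0]) && tok[0] != '_')
		return 0;
	for (tok++; *tok; tok++) {
		if (!isalnum((unsigned char)*tok) && *tok != '_')
			return 0;
	}
	return 1;
}

/*
 * Hypothetical helper: an address must start with "0x" and be fully
 * consumed as hexadecimal. Returns 0 and fills *addr, or -1 on error.
 */
static int example_token_to_addr(const char *tok, unsigned long *addr)
{
	unsigned long val;
	char *end;

	assert(tok != NULL && addr != NULL);

	if (strncmp(tok, "0x", 2) != 0)
		return -1;

	errno = 0;
	val = strtoul(tok, &end, 16);
	/* Reject overflow, an empty digit string, and trailing junk */
	if (errno || end == tok + 2 || *end != '\0')
		return -1;

	*addr = val;
	return 0;
}
```

With these, "total_forks" classifies as a name (so an '=' and an address
may follow), while "0xffffffff82354c18" classifies as an address directly.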

> 
> > +		if (!isalpha(token[0]) && token[0] != '_')
> >  			break;  
> 
> Maybe you want to check it before the update_arg_name().

Hmm, perhaps, I guess I should see what the error message shows.

Thanks!

-- Steve


* Re: [PATCH 13/18] tracing: Add array type to function based events
  2018-02-02 23:05 ` [PATCH 13/18] tracing: Add array type to " Steven Rostedt
  2018-02-03 13:56   ` Masami Hiramatsu
@ 2018-02-09  1:17   ` Namhyung Kim
  2018-02-09  1:54     ` Steven Rostedt
  1 sibling, 1 reply; 87+ messages in thread
From: Namhyung Kim @ 2018-02-09  1:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Fri, Feb 02, 2018 at 06:05:11PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> Add syntax to allow the user to create an array type. Brackets after the
> type field will denote that this is an array type. For example:
> 
>  # echo 'SyS_open(x8[32] buf, x32 flags, x32 mode)' > function_events
> 
> Will make the first argument of the sys_open function call an array of
> 32 bytes.
> 
> The array type can also be used in conjunction with the indirect offset
> brackets as well. For example to get the interrupt stack of regs in do_IRQ()
> for x86_64.
> 
>  # echo 'do_IRQ(x64[5] regs[16])' > function_events
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  Documentation/trace/function-based-events.rst |  22 +++-
>  kernel/trace/trace_event_ftrace.c             | 157 +++++++++++++++++++++-----
>  2 files changed, 151 insertions(+), 28 deletions(-)
> 
> diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
> index b0e6725f3032..4a8a6fb16a0a 100644
> --- a/Documentation/trace/function-based-events.rst
> +++ b/Documentation/trace/function-based-events.rst
> @@ -93,7 +93,7 @@ as follows:
>  
>   ARG := TYPE FIELD | TYPE <name> '=' ADDR | TYPE ADDR | ARG '|' ARG
>  
> - TYPE := ATOM | 'unsigned' ATOM
> + TYPE := ATOM | ATOM '[' <number> ']' | 'unsigned' TYPE
>  
>   ATOM := 'u8' | 'u16' | 'u32' | 'u64' |
>           's8' | 's16' | 's32' | 's64' |
> @@ -305,3 +305,23 @@ Is the same as
>      <idle>-0     [003] d..3   655.823498: ret_from_intr->do_IRQ(total_forks=1504, regs=tick_nohz_idle_enter+0x4c/0x50)
>      <idle>-0     [003] d..3   655.954096: ret_from_intr->do_IRQ(total_forks=1504, regs=cpuidle_enter_state+0xb1/0x330)
>  
> +
> +Array types
> +===========
> +
> +If there's a case where you want to see an array of a type, then you can
> +declare a type as an array by adding '[' number ']' after the type.
> +
> +To get the net_device perm_addr, from the dev parameter.
> +
> + (gdb) printf "%d\n", &((struct net_device *)0)->perm_addr
> +558
> +
> + # echo 'ip_rcv(x64 skb, x8[6] perm_addr+558)' > function_events
> +
> + # echo 1 > events/functions/ip_rcv/enable
> + # cat trace
> +    <idle>-0     [003] ..s3   219.813582: __netif_receive_skb_core->ip_rcv(skb=ffff880118195e00, perm_addr=b4,b5,2f,ce,18,65)
> +    <idle>-0     [003] ..s3   219.813595: __netif_receive_skb_core->ip_rcv(skb=ffff880118195e00, perm_addr=b4,b5,2f,ce,18,65)
> +    <idle>-0     [003] ..s3   220.115053: __netif_receive_skb_core->ip_rcv(skb=ffff880118195c00, perm_addr=b4,b5,2f,ce,18,65)
> +    <idle>-0     [003] ..s3   220.115293: __netif_receive_skb_core->ip_rcv(skb=ffff880118195c00, perm_addr=b4,b5,2f,ce,18,65)

What about adding braces to indicate array type like below?

... ip_rcv(skb=ffff880118195c00, perm_addr={b4,b5,2f,ce,18,65})


> diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
> index 206114f192be..64e2d7dcfd18 100644
> --- a/kernel/trace/trace_event_ftrace.c
> +++ b/kernel/trace/trace_event_ftrace.c
> @@ -20,6 +20,7 @@ struct func_arg {
>  	char				*name;
>  	long				indirect;
>  	long				index;
> +	short				array;
>  	short				offset;
>  	short				size;
>  	s8				arg;
> @@ -68,6 +69,9 @@ enum func_states {
>  	FUNC_STATE_PIPE,
>  	FUNC_STATE_PLUS,
>  	FUNC_STATE_TYPE,
> +	FUNC_STATE_ARRAY,
> +	FUNC_STATE_ARRAY_SIZE,
> +	FUNC_STATE_ARRAY_END,
>  	FUNC_STATE_VAR,
>  	FUNC_STATE_COMMA,
>  	FUNC_STATE_END,
> @@ -289,6 +293,7 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
>  	static bool update_arg;
>  	static int unsign;
>  	unsigned long val;
> +	char *type;
>  	int ret;
>  	int i;
>  
> @@ -339,6 +344,10 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
>  		return FUNC_STATE_TYPE;
>  
>  	case FUNC_STATE_TYPE:
> +		if (token[0] == '[')
> +			return FUNC_STATE_ARRAY;
> +		/* Fall through */
> +	case FUNC_STATE_ARRAY_END:
>  		if (WARN_ON(!fevent->last_arg))
>  			break;
>  		if (update_arg_name(fevent, token) < 0)
> @@ -350,14 +359,37 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
>  		update_arg = true;
>  		return FUNC_STATE_VAR;
>  
> +	case FUNC_STATE_ARRAY:
>  	case FUNC_STATE_BRACKET:
> -		WARN_ON(!fevent->last_arg);
> +		if (WARN_ON(!fevent->last_arg))
> +			break;
>  		ret = kstrtoul(token, 0, &val);
>  		if (ret)
>  			break;
> -		val *= fevent->last_arg->size;
> -		fevent->last_arg->indirect = val ^ INDIRECT_FLAG;
> -		return FUNC_STATE_INDIRECT;
> +		if (state == FUNC_STATE_BRACKET) {
> +			val *= fevent->last_arg->size;
> +			fevent->last_arg->indirect = val ^ INDIRECT_FLAG;
> +			return FUNC_STATE_INDIRECT;
> +		}
> +		if (val <= 0)
> +			break;

val is an unsigned long, so it can never be less than zero.


> +		fevent->last_arg->array = val;
> +		type = kasprintf(GFP_KERNEL, "%s[%d]", fevent->last_arg->type, (unsigned)val);

s/%d/%lu/  and no need to cast it.


> +		if (!type)
> +			break;
> +		kfree(fevent->last_arg->type);
> +		fevent->last_arg->type = type;
> +		/*
> +		 * arg_offset has already been updated once by size.
> +		 * This update needs to account for that (hence the "- 1").
> +		 */
> +		fevent->arg_offset += fevent->last_arg->size * (fevent->last_arg->array - 1);
> +		return FUNC_STATE_ARRAY_SIZE;
> +
> +	case FUNC_STATE_ARRAY_SIZE:
> +		if (token[0] != ']')
> +			break;
> +		return FUNC_STATE_ARRAY_END;
>  
>  	case FUNC_STATE_INDIRECT:
>  		if (token[0] != ']')
> @@ -453,6 +485,10 @@ static long long get_arg(struct func_arg *arg, unsigned long val)
>  
>  	val = val + (arg->indirect ^ INDIRECT_FLAG);
>  
> +	/* Arrays do their own indirect reads */
> +	if (arg->array)
> +		return val;
> +

Not sure about this.  After this change it would make 'x64[1] foo' and
'x64[1] foo[0]' equivalent, right?

Thanks,
Namhyung


>  	ret = probe_kernel_read(buf, (void *)val, arg->size);
>  	if (ret)
>  		return 0;
> @@ -474,6 +510,21 @@ static long long get_arg(struct func_arg *arg, unsigned long val)
>  	return val;
>  }


* Re: [PATCH 13/18] tracing: Add array type to function based events
  2018-02-09  1:17   ` Namhyung Kim
@ 2018-02-09  1:54     ` Steven Rostedt
  0 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-09  1:54 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Fri, 9 Feb 2018 10:17:45 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> > + # echo 1 > events/functions/ip_rcv/enable
> > + # cat trace
> > +    <idle>-0     [003] ..s3   219.813582: __netif_receive_skb_core->ip_rcv(skb=ffff880118195e00, perm_addr=b4,b5,2f,ce,18,65)
> > +    <idle>-0     [003] ..s3   219.813595: __netif_receive_skb_core->ip_rcv(skb=ffff880118195e00, perm_addr=b4,b5,2f,ce,18,65)
> > +    <idle>-0     [003] ..s3   220.115053: __netif_receive_skb_core->ip_rcv(skb=ffff880118195c00, perm_addr=b4,b5,2f,ce,18,65)
> > +    <idle>-0     [003] ..s3   220.115293: __netif_receive_skb_core->ip_rcv(skb=ffff880118195c00, perm_addr=b4,b5,2f,ce,18,65)  
> 
> What about adding braces to indicate array type like below?
> 
> ... ip_rcv(skb=ffff880118195c00, perm_addr={b4,b5,2f,ce,18,65})
> 

That's a nice idea, I'll add it.

> > +	case FUNC_STATE_ARRAY:
> >  	case FUNC_STATE_BRACKET:
> > -		WARN_ON(!fevent->last_arg);
> > +		if (WARN_ON(!fevent->last_arg))
> > +			break;
> >  		ret = kstrtoul(token, 0, &val);
> >  		if (ret)
> >  			break;
> > -		val *= fevent->last_arg->size;
> > -		fevent->last_arg->indirect = val ^ INDIRECT_FLAG;
> > -		return FUNC_STATE_INDIRECT;
> > +		if (state == FUNC_STATE_BRACKET) {
> > +			val *= fevent->last_arg->size;
> > +			fevent->last_arg->indirect = val ^ INDIRECT_FLAG;
> > +			return FUNC_STATE_INDIRECT;
> > +		}
> > +		if (val <= 0)
> > +			break;  
> 
> val is an unsigned long, so it can never be less than zero.

I probably should put a cap on the array size, as arrays that are
too big will simply fail to allocate on the ring buffer.

But it should only check for zero.

> 
> 
> > +		fevent->last_arg->array = val;
> > +		type = kasprintf(GFP_KERNEL, "%s[%d]", fevent->last_arg->type, (unsigned)val);  
> 
> s/%d/%lu/  and no need to cast it.

Sure.

> 
> 
> > +		if (!type)
> > +			break;
> > +		kfree(fevent->last_arg->type);
> > +		fevent->last_arg->type = type;
> > +		/*
> > +		 * arg_offset has already been updated once by size.
> > +		 * This update needs to account for that (hence the "- 1").
> > +		 */
> > +		fevent->arg_offset += fevent->last_arg->size * (fevent->last_arg->array - 1);
> > +		return FUNC_STATE_ARRAY_SIZE;
> > +
> > +	case FUNC_STATE_ARRAY_SIZE:
> > +		if (token[0] != ']')
> > +			break;
> > +		return FUNC_STATE_ARRAY_END;
> >  
> >  	case FUNC_STATE_INDIRECT:
> >  		if (token[0] != ']')
> > @@ -453,6 +485,10 @@ static long long get_arg(struct func_arg *arg, unsigned long val)
> >  
> >  	val = val + (arg->indirect ^ INDIRECT_FLAG);
> >  
> > +	/* Arrays do their own indirect reads */
> > +	if (arg->array)
> > +		return val;
> > +  
> 
> Not sure about this.  After this change it would make 'x64[1] foo' and
> 'x64[1] foo[0]' equivalent, right?

Yeah, I may need to re-think this. I originally had the "array"
use the "indirect" code, but I'm thinking that isn't necessary.

Thanks for the input.

-- Steve


* Re: [PATCH 15/18] tracing: Add string type for dynamic strings in function based events
  2018-02-02 23:05 ` [PATCH 15/18] tracing: Add string type for dynamic strings in " Steven Rostedt
@ 2018-02-09  3:15   ` Namhyung Kim
  2018-02-09  3:31     ` Steven Rostedt
  0 siblings, 1 reply; 87+ messages in thread
From: Namhyung Kim @ 2018-02-09  3:15 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Fri, Feb 02, 2018 at 06:05:13PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> Add a "string" type that will create a dynamic length string for the
> event, this is the same as the __string() field in normal TRACE_EVENTS.
> 
> [ missing 'static' found by Fengguang Wu's kbuild test robot ]
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  Documentation/trace/function-based-events.rst |  19 ++-
>  kernel/trace/trace_event_ftrace.c             | 183 +++++++++++++++++++++++---
>  2 files changed, 181 insertions(+), 21 deletions(-)
> 
> diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
> index 99ae77cd59e6..6c643ea749e7 100644
> --- a/Documentation/trace/function-based-events.rst
> +++ b/Documentation/trace/function-based-events.rst
> @@ -99,7 +99,7 @@ as follows:
>           's8' | 's16' | 's32' | 's64' |
>           'x8' | 'x16' | 'x32' | 'x64' |
>           'char' | 'short' | 'int' | 'long' | 'size_t' |
> -	 'symbol'
> +	 'symbol' | 'string'
>  
>   FIELD := <name> | <name> INDEX | <name> OFFSET | <name> OFFSET INDEX
>  
> @@ -342,3 +342,20 @@ the format "%s". If a nul is found, the output will stop. Use another type
>        bash-1470  [003] ...2   980.678715: path_openat->link_path_walk(name=/lib64/ld-linux-x86-64.so.2)
>        bash-1470  [003] ...2   980.678721: path_openat->link_path_walk(name=ld-2.24.so)
>        bash-1470  [003] ...2   980.678978: path_lookupat->link_path_walk(name=/etc/ld.so.preload)
> +
> +
> +Dynamic strings
> +===============
> +
> +Static strings are fine, but they can waste a lot of memory in the ring buffer.
> +The above allocated 64 bytes for a character array, but most of the output was
> +less than 20 characters. Not wanting to truncate strings or waste space on
> +the ring buffer, the dynamic string can help.
> +
> +Use the "string" type for strings that have a large range in size. The max
> +size that will be recorded is 512 bytes. If a string is larger than that, then
> +it will be truncated.
> +
> + # echo 'link_path_walk(string name)' > function_events
> +
> +Gives the same result as above, but does not waste buffer space.
> diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
> index dd24b840329d..273c5838a8e2 100644
> --- a/kernel/trace/trace_event_ftrace.c
> +++ b/kernel/trace/trace_event_ftrace.c
> @@ -39,6 +39,7 @@ struct func_event {
>  	struct func_arg			*last_arg;
>  	int				arg_cnt;
>  	int				arg_offset;
> +	int				has_strings;
>  };
>  
>  struct func_file {
> @@ -83,6 +84,8 @@ typedef u32 x32;
>  typedef u16 x16;
>  typedef u8 x8;
>  typedef void * symbol;
> +/* 2 byte offset, 2 byte length */
> +typedef u32 string;
>  
>  #define TYPE_TUPLE(type)			\
>  	{ #type, sizeof(type), is_signed_type(type) }
> @@ -105,7 +108,8 @@ typedef void * symbol;
>  	TYPE_TUPLE(u8),				\
>  	TYPE_TUPLE(s8),				\
>  	TYPE_TUPLE(x8),				\
> -	TYPE_TUPLE(symbol)
> +	TYPE_TUPLE(symbol),			\
> +	TYPE_TUPLE(string)
>  
>  static struct func_type {
>  	char		*name;
> @@ -124,6 +128,16 @@ enum {
>  	FUNC_TYPE_MAX
>  };
>  
> +#define MAX_STR		512
> +
> +/* Two contexts, normal and NMI, hence the " * 2" */
> +struct func_string {
> +	char		buf[MAX_STR * 2];
> +};
> +
> +static struct func_string __percpu *str_buffer;
> +static int nr_strings;

What protects it?

Thanks,
Namhyung


> +
>  /**
>   * arch_get_func_args - retrieve function arguments via pt_regs
>   * @regs: The registers at the moment the function is called
> @@ -163,6 +177,23 @@ int __weak arch_get_func_args(struct pt_regs *regs,
>  	return 0;
>  }
>  
> +static void free_arg(struct func_arg *arg)
> +{
> +	list_del(&arg->list);
> +	if (arg->func_type == FUNC_TYPE_string) {
> +		nr_strings--;
> +		if (WARN_ON(nr_strings < 0))
> +			nr_strings = 0;
> +		if (!nr_strings) {
> +			free_percpu(str_buffer);
> +			str_buffer = NULL;
> +		}
> +	}
> +	kfree(arg->name);
> +	kfree(arg->type);
> +	kfree(arg);
> +}
> +
>  static void free_func_event(struct func_event *func_event)
>  {
>  	struct func_arg *arg, *n;
> @@ -171,10 +202,7 @@ static void free_func_event(struct func_event *func_event)
>  		return;
>  
>  	list_for_each_entry_safe(arg, n, &func_event->args, list) {
> -		list_del(&arg->list);
> -		kfree(arg->name);
> -		kfree(arg->type);
> -		kfree(arg);
> +		free_arg(arg);
>  	}
>  	ftrace_free_filter(&func_event->ops);
>  	kfree(func_event->call.print_fmt);
> @@ -255,6 +283,17 @@ static int add_arg(struct func_event *fevent, int ftype, int unsign)
>  	list_add_tail(&arg->list, &fevent->args);
>  	fevent->last_arg = arg;
>  
> +	if (ftype == FUNC_TYPE_string) {
> +		fevent->has_strings++;
> +		nr_strings++;
> +		if (nr_strings == 1) {
> +			str_buffer = alloc_percpu(struct func_string);
> +			if (!str_buffer) {
> +				free_arg(arg);
> +				return -ENOMEM;
> +			}
> +		}
> +	}
>  	return 0;
>  }


* Re: [PATCH 15/18] tracing: Add string type for dynamic strings in function based events
  2018-02-09  3:15   ` Namhyung Kim
@ 2018-02-09  3:31     ` Steven Rostedt
  0 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-09  3:31 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Fri, 9 Feb 2018 12:15:47 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> > @@ -124,6 +128,16 @@ enum {
> >  	FUNC_TYPE_MAX
> >  };
> >  
> > +#define MAX_STR		512
> > +
> > +/* Two contexts, normal and NMI, hence the " * 2" */
> > +struct func_string {
> > +	char		buf[MAX_STR * 2];
> > +};
> > +
> > +static struct func_string __percpu *str_buffer;
> > +static int nr_strings;  
> 
> What protects it?

Grumble, I was thinking that the entire create_function_event was under
the func_event_mutex, which it is not. So nr_strings is not fully
protected. I'll fix that, thanks.

As for str_buffer, I should comment this as it is rather subtle.


+static int read_string(char *str, unsigned long addr)
+{
+	unsigned long flags;
+	struct func_string *strbuf;
+	char *ptr = (void *)addr;
+	char *buf;
+	int ret;
+
+	if (!str_buffer)
+		return 0;
+
+	strbuf = this_cpu_ptr(str_buffer);
+	buf = &strbuf->buf[0];
+
+	if (in_nmi())
+		buf += MAX_STR;
+
+	local_irq_save(flags);

Like I said, this is really subtle, and desperately needs a comment.

The str_buffer is per cpu and can only be accessed with irqs disabled.
If we are in NMI, then we move the starting position forward by MAX_STR.

I'll add comments and protect create_function_event with the mutex.

Thanks for pointing this out!

-- Steve


+	ret = strncpy_from_unsafe(buf, ptr, MAX_STR);
+	if (ret < 0)
+		ret = 0;
+	if (ret > 0 && str)
+		memcpy(str, buf, ret);
+	local_irq_restore(flags);
+
+	return ret;
+}
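
The buffer split described above can be illustrated in isolation with the
userspace sketch below. MAX_STR and struct func_string are taken from the
quoted patch; the helper example_pick_half() and its boolean parameter are
hypothetical stand-ins for the in_nmi() test. As I read the explanation,
normal and interrupt context share the first half (so interrupts must be
off while it is in use), and NMI context, which cannot be masked, gets the
second half to itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_STR		512

/* Two contexts, normal and NMI, hence the " * 2" */
struct func_string {
	char		buf[MAX_STR * 2];
};

/*
 * Hypothetical helper: pick the half of the (per-CPU) buffer that the
 * current context may use. @nmi stands in for the kernel's in_nmi().
 */
static char *example_pick_half(struct func_string *strbuf, bool nmi)
{
	char *buf;

	assert(strbuf != NULL);

	buf = &strbuf->buf[0];
	/* NMI context skips past the half used by everything else */
	if (nmi)
		buf += MAX_STR;
	return buf;
}
```

Each half is exactly MAX_STR bytes, which matches the limit passed to the
string copy in read_string().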


* Re: [PATCH 17/18] tracing: Add indirect to indirect access for function based events
  2018-02-02 23:05 ` [PATCH 17/18] tracing: Add indirect to indirect access " Steven Rostedt
@ 2018-02-09  5:13   ` Namhyung Kim
  2018-02-09 15:47     ` Steven Rostedt
  0 siblings, 1 reply; 87+ messages in thread
From: Namhyung Kim @ 2018-02-09  5:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Fri, Feb 02, 2018 at 06:05:15PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> Allow the function based events to retrieve not only the parameter offsets,
> but also get data from a pointer within a parameter structure. Something
> like:
> 
>  # echo 'ip_rcv(string skdev+16[0][0] | x8[6] skperm+16[0]+558)' > function_events
> 
>  # echo 1 > events/functions/ip_rcv/enable
>  # cat trace
>     <idle>-0     [003] ..s3   310.626391: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
>     <idle>-0     [003] ..s3   310.626400: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
>     <idle>-0     [003] ..s3   312.183775: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
>     <idle>-0     [003] ..s3   312.184329: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
>     <idle>-0     [003] ..s3   312.303895: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
>     <idle>-0     [003] ..s3   312.304610: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
>     <idle>-0     [003] ..s3   312.471980: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
>     <idle>-0     [003] ..s3   312.472908: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
>     <idle>-0     [003] ..s3   313.135804: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> 
> That is, we retrieved the net_device of the sk_buff and displayed its name
> and perm_addr info.
> 
>   sk->dev->name, sk->dev->perm_addr
> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---

[SNIP]
> +static unsigned long process_redirects(struct func_arg *arg, unsigned long val,
> +				       char *buf)
> +{
> +	struct func_arg_redirect *redirect;
> +	int ret;
> +
> +	if (arg->indirect) {
> +		ret = probe_kernel_read(buf, (void *)val, sizeof(long));
> +		if (ret)
> +			return 0;
> +		val = *(unsigned long *)buf;
> +	}
> +
> +	list_for_each_entry(redirect, &arg->redirects, list) {
> +		val += redirect->index;
> +		if (redirect->indirect) {
> +			val += (redirect->indirect ^ INDIRECT_FLAG);
> +			ret = probe_kernel_read(buf, (void *)val, sizeof(long));
> +			if (ret)
> +				return 0;
> +		}
> +	}
> +	return val;
> +}
> +
> +static long long __get_arg(struct func_arg *arg, unsigned long long val)
>  {
>  	char buf[8];
>  	int ret;
>  
>  	val += arg->index;
>  
> -	if (!arg->indirect)
> -		return val;
> +	if (arg->indirect)
> +		val += (arg->indirect ^ INDIRECT_FLAG);
>  
> -	val = val + (arg->indirect ^ INDIRECT_FLAG);
> +	if (!list_empty(&arg->redirects))
> +		val = process_redirects(arg, val, buf);
> +
> +	if (!val)
> +		return 0;
>  
>  	/* Arrays and strings do their own indirect reads */
> -	if (arg->array || arg->func_type == FUNC_TYPE_string)
> +	if (!arg->indirect || arg->array || arg->func_type == FUNC_TYPE_string)
>  		return val;

It seems the indirect is processed twice with redirects.  Consider
"x64 foo[0]+4", the process_redirects() will call probe_kernel_read()
and then here again.

Thanks,
Namhyung


>  
>  	ret = probe_kernel_read(buf, (void *)val, arg->size);
> @@ -1162,6 +1246,7 @@ static void func_event_seq_stop(struct seq_file *m, void *v)
>  static int func_event_seq_show(struct seq_file *m, void *v)
>  {
>  	struct func_event *func_event = v;
> +	struct func_arg_redirect *redirect;
>  	struct func_arg *arg;
>  	bool comma = false;
>  	int last_arg = 0;
> @@ -1190,6 +1275,13 @@ static int func_event_seq_show(struct seq_file *m, void *v)
>  				seq_printf(m, "[%ld]",
>  					   (arg->indirect ^ INDIRECT_FLAG) / arg->size);
>  		}
> +		list_for_each_entry(redirect, &arg->redirects, list) {
> +			if (redirect->index)
> +				seq_printf(m, "+%ld", redirect->index);
> +			if (redirect->indirect)
> +				seq_printf(m, "[%d]",
> +					   (redirect->indirect ^ INDIRECT_FLAG) / arg->size);
> +		}
>  	}
>  	seq_puts(m, ")\n");
>  
> -- 
> 2.15.1
> 
> 


* Re: [PATCH 17/18] tracing: Add indirect to indirect access for function based events
  2018-02-09  5:13   ` Namhyung Kim
@ 2018-02-09 15:47     ` Steven Rostedt
  2018-02-09 17:18       ` Steven Rostedt
  2018-02-12  2:15       ` Namhyung Kim
  0 siblings, 2 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-09 15:47 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Fri, 9 Feb 2018 14:13:01 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> On Fri, Feb 02, 2018 at 06:05:15PM -0500, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > 
> > Allow the function based events to retrieve not only the parameters offsets,
> > but also get data from a pointer within a parameter structure. Something
> > like:
> > 
> >  # echo 'ip_rcv(string skdev+16[0][0] | x8[6] skperm+16[0]+558)' > function_events
> > 
> >  # echo 1 > events/functions/ip_rcv/enable
> >  # cat trace
> >     <idle>-0     [003] ..s3   310.626391: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> >     <idle>-0     [003] ..s3   310.626400: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> >     <idle>-0     [003] ..s3   312.183775: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> >     <idle>-0     [003] ..s3   312.184329: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> >     <idle>-0     [003] ..s3   312.303895: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> >     <idle>-0     [003] ..s3   312.304610: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> >     <idle>-0     [003] ..s3   312.471980: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> >     <idle>-0     [003] ..s3   312.472908: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> >     <idle>-0     [003] ..s3   313.135804: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> > 
> > That is, we retrieved the net_device of the sk_buff and displayed its name
> > and perm_addr info.
> > 
> >   sk->dev->name, sk->dev->perm_addr
> > 
> > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > ---  
> 
> [SNIP]
> > +static unsigned long process_redirects(struct func_arg *arg, unsigned long val,
> > +				       char *buf)
> > +{
> > +	struct func_arg_redirect *redirect;
> > +	int ret;
> > +
> > +	if (arg->indirect) {
> > +		ret = probe_kernel_read(buf, (void *)val, sizeof(long));
> > +		if (ret)
> > +			return 0;
> > +		val = *(unsigned long *)buf;
> > +	}
> > +
> > +	list_for_each_entry(redirect, &arg->redirects, list) {
> > +		val += redirect->index;
> > +		if (redirect->indirect) {
> > +			val += (redirect->indirect ^ INDIRECT_FLAG);
> > +			ret = probe_kernel_read(buf, (void *)val, sizeof(long));
> > +			if (ret)
> > +				return 0;
> > +		}
> > +	}
> > +	return val;
> > +}
> > +
> > +static long long __get_arg(struct func_arg *arg, unsigned long long val)
> >  {
> >  	char buf[8];
> >  	int ret;
> >  
> >  	val += arg->index;
> >  
> > -	if (!arg->indirect)
> > -		return val;
> > +	if (arg->indirect)
> > +		val += (arg->indirect ^ INDIRECT_FLAG);
> >  
> > -	val = val + (arg->indirect ^ INDIRECT_FLAG);
> > +	if (!list_empty(&arg->redirects))
> > +		val = process_redirects(arg, val, buf);
> > +
> > +	if (!val)
> > +		return 0;
> >  
> >  	/* Arrays and strings do their own indirect reads */
> > -	if (arg->array || arg->func_type == FUNC_TYPE_string)
> > +	if (!arg->indirect || arg->array || arg->func_type == FUNC_TYPE_string)
> >  		return val;  
> 
> It seems the indirect is processed twice with redirects.  Consider
> "x64 foo[0]+4", the process_redirects() will call probe_kernel_read()
> and then here again.
> 


Good catch!

It should have been:

		return process_redirects(arg, val, buf);

Thanks!

-- Steve

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 17/18] tracing: Add indirect to indirect access for function based events
  2018-02-09 15:47     ` Steven Rostedt
@ 2018-02-09 17:18       ` Steven Rostedt
  2018-02-12  2:15       ` Namhyung Kim
  1 sibling, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-09 17:18 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Fri, 9 Feb 2018 10:47:58 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> Good catch!
> 
> It should have been:
> 
> 		return process_redirects(arg, val, buf);

Although I need to add this :-p

diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index 673336e352be..2690d4e46322 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -562,7 +564,7 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
                ret = add_arg_redirect(fevent->last_arg, val, 0);
                if (ret)
                        break;
-               return FUNC_STATE_VAR;
+               return FUNC_STATE_BRACKET_END;
 
        case FUNC_STATE_REDIRECT_BRACKET:
                if (WARN_ON(!fevent->last_arg))
@@ -656,6 +658,7 @@ static unsigned long process_redirects(struct func_arg *arg, unsigned long val,
                        ret = probe_kernel_read(buf, (void *)val, sizeof(long));
                        if (ret)
                                return 0;
+                       val = *(unsigned long *)buf;
                }
        }
        return val;

Because the redirect wasn't being parsed properly, and even once it was
parsed, the value read for it was never picked up.

-- Steve
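
For readers following the fix, here is a minimal userspace sketch of the
chain walk, not the kernel code. It assumes each step is "add an offset,
then optionally follow the pointer stored there", folds the bracket offset
into the index, and uses memcpy() where the kernel uses probe_kernel_read().
The names redirect_step, demo_skb, demo_dev, walk_redirects and
demo_dev_name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one "+index" / "[n]" step; not the kernel struct. */
struct redirect_step {
	long index;	/* byte offset added first ("+16") */
	int deref;	/* non-zero: follow the pointer stored there ("[0]") */
};

/* Example layout used only by this sketch. */
struct demo_dev { char pad[8]; char name[8]; };
struct demo_skb { char pad[16]; struct demo_dev *dev; };

/* Walk the chain and return the final address (or loaded value). */
uintptr_t walk_redirects(uintptr_t val, const struct redirect_step *steps,
			 size_t n)
{
	unsigned char buf[sizeof(uintptr_t)];
	size_t i;

	assert(steps != NULL || n == 0);

	for (i = 0; i < n; i++) {
		val += (uintptr_t)steps[i].index;
		if (steps[i].deref) {
			/* Userspace stand-in for probe_kernel_read(). */
			memcpy(buf, (const void *)val, sizeof(buf));
			/*
			 * The step the diff above adds: continue from the
			 * pointer that was just loaded, not the old address.
			 */
			memcpy(&val, buf, sizeof(val));
		}
	}
	return val;
}

/* Resolves skb->dev->name as "+offset[0]+offset" in this sketch's layout. */
const char *demo_dev_name(const struct demo_skb *skb)
{
	const struct redirect_step steps[] = {
		{ (long)offsetof(struct demo_skb, dev), 1 },
		{ (long)offsetof(struct demo_dev, name), 0 },
	};

	return (const char *)walk_redirects((uintptr_t)skb, steps, 2);
}
```

Without the second memcpy(), the loop would keep adding offsets to the
address of the pointer rather than to the pointer's target, which matches
the "not getting the redirect" symptom described above.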

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH 12/18] tracing: Add accessing direct address from function based events
  2018-02-09  0:34   ` Namhyung Kim
  2018-02-09  1:10     ` Steven Rostedt
@ 2018-02-09 22:07     ` Steven Rostedt
  2018-02-12  2:06       ` Namhyung Kim
  1 sibling, 1 reply; 87+ messages in thread
From: Steven Rostedt @ 2018-02-09 22:07 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Fri, 9 Feb 2018 09:34:36 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> Couldn't we use the symbol name directly?  Maybe it needs a syntax to
> indicate global variable.  Like this?
> 
>   # echo 'do_IRQ(int $total_forks)' > function_events


I decided to stick with "$".

-- Steve

From ed303534dac5b2d9af7b4db9f042d7941c997288 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Date: Fri, 9 Feb 2018 17:03:06 -0500
Subject: [PATCH] tracing: Add direct kallsym access to function based events

Instead of searching for the address via kallsyms to print the variable in a
function based event, have "$<symbol>" be a way to tell the function based
event to look up the symbol for you.

Instead of:

   # grep total_forks /proc/kallsyms
ffffffff82354c18 B total_forks

  # echo 'do_IRQ(int forks=0xffffffff82354c18)' > function_events

One can do either:

  # echo 'do_IRQ(int forks=$total_forks)' > function_events

or simply

  # echo 'do_IRQ(int $total_forks)' > function_events

The latter will say "total_forks=" in the output where the former says
"forks=".

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 Documentation/trace/function-based-events.rst | 25 ++++++++++++++++++-
 kernel/trace/trace_event_ftrace.c             | 35 ++++++++++++++++++++++++---
 2 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/Documentation/trace/function-based-events.rst b/Documentation/trace/function-based-events.rst
index 606981b876a0..9a30aee338f4 100644
--- a/Documentation/trace/function-based-events.rst
+++ b/Documentation/trace/function-based-events.rst
@@ -112,13 +112,19 @@ as follows:
 
  INDIRECT := INDEX | OFFSET | INDIRECT INDIRECT | ''
 
- ADDR := A hexidecimal address starting with '0x'
+ ADDR := <symbol> | <hexnumber>
 
  Where <name> is a unique string starting with an alphabetic character
  and consists only of letters and numbers and underscores.
 
  Where <number> is a number that can be read by kstrtol() (hex, decimal, etc).
 
+ Where <hexnumber> is an address starting with '0x'
+
+ Where <symbol> is a valid symbol name from kallsyms starting with "$".
+ For example: $total_forks
+
+
 
 Simple arguments
 ================
@@ -317,6 +323,23 @@ Is the same as
     <idle>-0     [003] d..3   655.823498: ret_from_intr->do_IRQ(total_forks=1504, regs=tick_nohz_idle_enter+0x4c/0x50)
     <idle>-0     [003] d..3   655.954096: ret_from_intr->do_IRQ(total_forks=1504, regs=cpuidle_enter_state+0xb1/0x330)
 
+You can also accomplish the same thing above using the kallsym name following
+a "$" symbol. That is:
+
+  # echo 'do_IRQ(int $total_forks)' > function_events
+
+is the same as the above command using the "0xffffffff82354c18" address.
+
+You can rename the variable by using "=":
+
+  # echo 'do_IRQ(int forks=$total_forks)' > function_events
+
+  # cat trace
+    <idle>-0     [003] d..3   698.226763: ret_from_intr->do_IRQ(forks=1475)
+    <idle>-0     [003] d..3   698.226810: ret_from_intr->do_IRQ(forks=1475)
+    <idle>-0     [003] d..3   698.227046: ret_from_intr->do_IRQ(forks=1475)
+    <idle>-0     [003] d..3   698.502222: ret_from_intr->do_IRQ(forks=1475)
+
 
 Array types
 ===========
diff --git a/kernel/trace/trace_event_ftrace.c b/kernel/trace/trace_event_ftrace.c
index 376c9324d65c..39abda19d5d2 100644
--- a/kernel/trace/trace_event_ftrace.c
+++ b/kernel/trace/trace_event_ftrace.c
@@ -92,6 +92,7 @@ static LIST_HEAD(func_events);
 	C(ARRAY_END),				\
 	C(REDIRECT_PLUS),			\
 	C(REDIRECT_BRACKET),			\
+	C(SYMBOL),				\
 	C(VAR),					\
 	C(COMMA),				\
 	C(NULL),				\
@@ -281,6 +282,7 @@ static char *next_token(char **ptr, char *last)
 		    *str == '|' ||
 		    *str == '+' ||
 		    *str == '=' ||
+		    *str == '$' ||
 		    *str == ')')
 			break;
 	}
@@ -393,6 +395,14 @@ static int add_arg_redirect(struct func_arg *arg, long index, long indirect)
 	return 0;
 }
 
+static int get_symbol(const char *symbol, unsigned long *val)
+{
+	*val = kallsyms_lookup_name(symbol);
+	if (!*val)
+		return -1;
+	return 0;
+}
+
 static enum func_states
 process_event(struct func_event *fevent, const char *token, enum func_states state)
 {
@@ -469,6 +479,8 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 	case FUNC_STATE_ARRAY_END:
 		if (WARN_ON(!fevent->last_arg))
 			break;
+		if (token[0] == '$')
+			return FUNC_STATE_SYMBOL;
 		if (update_arg_name(fevent, token) < 0)
 			break;
 		if (strncmp(token, "0x", 2) == 0)
@@ -542,6 +554,11 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 		fevent->last_arg->index += val;
 		return FUNC_STATE_VAR;
 
+	case FUNC_STATE_SYMBOL:
+		if (!isalpha(token[0]) && token[0] != '_')
+			break;
+		goto equal;
+
 	case FUNC_STATE_ADDR:
 		switch (token[0]) {
 		case ')':
@@ -599,14 +616,26 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
 		break;
 
 	case FUNC_STATE_EQUAL:
+		if (token[0] == '$')
+			return FUNC_STATE_SYMBOL;
 		if (strncmp(token, "0x", 2) != 0)
 			break;
  equal:
 		if (WARN_ON(!fevent->last_arg))
 			break;
-		ret = kstrtoul(token, 0, &val);
-		if (ret < 0)
-			break;
+		if (isalpha(token[0]) || token[0] != '_') {
+			ret = get_symbol(token, &val);
+			if (ret < 0)
+				break;
+			if (!fevent->last_arg->name) {
+				if (update_arg_name(fevent, token) < 0)
+					break;
+			}
+		} else {
+			ret = kstrtoul(token, 0, &val);
+			if (ret < 0)
+				break;
+		}
 		update_arg = false;
 		fevent->last_arg->index = val;
 		fevent->last_arg->arg = -1;
-- 
2.13.6
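
The naming rule from the commit message ("$sym" uses the symbol name,
"name=$sym" uses the explicit name) can be sketched in userspace as
follows. This is an illustration under assumptions, not the patch's parser:
demo_syms is a hypothetical one-entry table standing in for kallsyms, using
the address from the example above, and demo_get_symbol and resolve_arg are
made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel symbol table. */
struct demo_sym {
	const char *name;
	unsigned long long addr;
};

static const struct demo_sym demo_syms[] = {
	{ "total_forks", 0xffffffff82354c18ULL },
};

/* Same contract as get_symbol() in the patch: 0 on success, -1 if unknown. */
int demo_get_symbol(const char *symbol, unsigned long long *val)
{
	size_t i;

	for (i = 0; i < sizeof(demo_syms) / sizeof(demo_syms[0]); i++) {
		if (strcmp(demo_syms[i].name, symbol) == 0) {
			*val = demo_syms[i].addr;
			return 0;
		}
	}
	return -1;
}

/*
 * Resolve one argument spec of the form "$sym" or "name=$sym".
 * The reported name is the explicit one if given, else the symbol name.
 */
int resolve_arg(const char *spec, char *name, size_t name_len,
		unsigned long long *addr)
{
	const char *eq = strchr(spec, '=');
	const char *sym = eq ? eq + 1 : spec;
	size_t len;

	assert(name_len > 0);

	if (sym[0] != '$')
		return -1;
	sym++;				/* skip the "$" marker */
	if (demo_get_symbol(sym, addr) < 0)
		return -1;

	if (eq) {
		len = (size_t)(eq - spec);
		if (len == 0 || len >= name_len)
			return -1;
		memcpy(name, spec, len);
		name[len] = '\0';
	} else {
		len = strlen(sym);
		if (len >= name_len)
			return -1;
		memcpy(name, sym, len + 1);
	}
	return 0;
}
```

With this table, "$total_forks" resolves with the name "total_forks",
"forks=$total_forks" resolves to the same address with the name "forks",
and an unknown symbol is rejected.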

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH 12/18] tracing: Add accessing direct address from function based events
  2018-02-09 22:07     ` Steven Rostedt
@ 2018-02-12  2:06       ` Namhyung Kim
  2018-02-12 15:47           ` Masami Hiramatsu
  0 siblings, 1 reply; 87+ messages in thread
From: Namhyung Kim @ 2018-02-12  2:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

Hi Steve,

On Fri, Feb 09, 2018 at 05:07:37PM -0500, Steven Rostedt wrote:
> On Fri, 9 Feb 2018 09:34:36 +0900
> Namhyung Kim <namhyung@kernel.org> wrote:
> 
> > Couldn't we use the symbol name directly?  Maybe it needs a syntax to
> > indicate global variable.  Like this?
> > 
> >   # echo 'do_IRQ(int $total_forks)' > function_events
> 
> 
> I decided to stick with "$".

Good.


[SNIP]
>  static enum func_states
>  process_event(struct func_event *fevent, const char *token, enum func_states state)
>  {
> @@ -469,6 +479,8 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
>  	case FUNC_STATE_ARRAY_END:
>  		if (WARN_ON(!fevent->last_arg))
>  			break;
> +		if (token[0] == '$')
> +			return FUNC_STATE_SYMBOL;
>  		if (update_arg_name(fevent, token) < 0)
>  			break;
>  		if (strncmp(token, "0x", 2) == 0)
> @@ -542,6 +554,11 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
>  		fevent->last_arg->index += val;
>  		return FUNC_STATE_VAR;
>  
> +	case FUNC_STATE_SYMBOL:
> +		if (!isalpha(token[0]) && token[0] != '_')
> +			break;
> +		goto equal;
> +
>  	case FUNC_STATE_ADDR:
>  		switch (token[0]) {
>  		case ')':
> @@ -599,14 +616,26 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
>  		break;
>  
>  	case FUNC_STATE_EQUAL:
> +		if (token[0] == '$')
> +			return FUNC_STATE_SYMBOL;
>  		if (strncmp(token, "0x", 2) != 0)
>  			break;
>   equal:
>  		if (WARN_ON(!fevent->last_arg))
>  			break;
> -		ret = kstrtoul(token, 0, &val);
> -		if (ret < 0)
> -			break;
> +		if (isalpha(token[0]) || token[0] != '_') {

I guess you wanted the token[0] being '_'.  Maybe it'd be better adding

  #define isident0(x)  (isalpha(x) || (x) == '_')

?

Thanks,
Namhyung


> +			ret = get_symbol(token, &val);
> +			if (ret < 0)
> +				break;
> +			if (!fevent->last_arg->name) {
> +				if (update_arg_name(fevent, token) < 0)
> +					break;
> +			}
> +		} else {
> +			ret = kstrtoul(token, 0, &val);
> +			if (ret < 0)
> +				break;
> +		}
>  		update_arg = false;
>  		fevent->last_arg->index = val;
>  		fevent->last_arg->arg = -1;
> -- 
> 2.13.6
> 
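
To make the point about the condition concrete, here is a small userspace
comparison of the check as posted and the check the reply suggests. The
function names are hypothetical, and the casts to unsigned char are only
needed for the userspace <ctype.h>.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Condition as written in the hunk: true for everything except '_'. */
bool posted_check(char c)
{
	return isalpha((unsigned char)c) || c != '_';
}

/* Condition from the reply (the isident0() idea), written as a function. */
bool ident_start(char c)
{
	return isalpha((unsigned char)c) || c == '_';
}

/* True if the token could be a symbol name rather than a number. */
bool starts_symbol(const char *token)
{
	assert(token != NULL);
	return ident_start(token[0]);
}
```

With the posted condition, a token such as "0x1234" takes the symbol branch
and a token starting with '_' takes the kstrtoul() branch, which is the
reverse of what the surrounding code suggests was intended.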

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 17/18] tracing: Add indirect to indirect access for function based events
  2018-02-09 15:47     ` Steven Rostedt
  2018-02-09 17:18       ` Steven Rostedt
@ 2018-02-12  2:15       ` Namhyung Kim
  2018-02-12 17:23         ` Steven Rostedt
  1 sibling, 1 reply; 87+ messages in thread
From: Namhyung Kim @ 2018-02-12  2:15 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Fri, Feb 09, 2018 at 10:47:58AM -0500, Steven Rostedt wrote:
> On Fri, 9 Feb 2018 14:13:01 +0900
> Namhyung Kim <namhyung@kernel.org> wrote:
> 
> > On Fri, Feb 02, 2018 at 06:05:15PM -0500, Steven Rostedt wrote:
> > > From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> > > 
> > > Allow the function based events to retrieve not only the parameters offsets,
> > > but also get data from a pointer within a parameter structure. Something
> > > like:
> > > 
> > >  # echo 'ip_rcv(string skdev+16[0][0] | x8[6] skperm+16[0]+558)' > function_events
> > > 
> > >  # echo 1 > events/functions/ip_rcv/enable
> > >  # cat trace
> > >     <idle>-0     [003] ..s3   310.626391: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> > >     <idle>-0     [003] ..s3   310.626400: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> > >     <idle>-0     [003] ..s3   312.183775: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> > >     <idle>-0     [003] ..s3   312.184329: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> > >     <idle>-0     [003] ..s3   312.303895: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> > >     <idle>-0     [003] ..s3   312.304610: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> > >     <idle>-0     [003] ..s3   312.471980: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> > >     <idle>-0     [003] ..s3   312.472908: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> > >     <idle>-0     [003] ..s3   313.135804: __netif_receive_skb_core->ip_rcv(skdev=em1, skperm=b4,b5,2f,ce,18,65)
> > > 
> > > That is, we retrieved the net_device of the sk_buff and displayed its name
> > > and perm_addr info.
> > > 
> > >   sk->dev->name, sk->dev->perm_addr
> > > 
> > > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > > ---  
> > 
> > [SNIP]
> > > +static unsigned long process_redirects(struct func_arg *arg, unsigned long val,
> > > +				       char *buf)
> > > +{
> > > +	struct func_arg_redirect *redirect;
> > > +	int ret;
> > > +
> > > +	if (arg->indirect) {
> > > +		ret = probe_kernel_read(buf, (void *)val, sizeof(long));
> > > +		if (ret)
> > > +			return 0;
> > > +		val = *(unsigned long *)buf;
> > > +	}
> > > +
> > > +	list_for_each_entry(redirect, &arg->redirects, list) {
> > > +		val += redirect->index;
> > > +		if (redirect->indirect) {
> > > +			val += (redirect->indirect ^ INDIRECT_FLAG);
> > > +			ret = probe_kernel_read(buf, (void *)val, sizeof(long));
> > > +			if (ret)
> > > +				return 0;
> > > +		}
> > > +	}
> > > +	return val;
> > > +}
> > > +
> > > +static long long __get_arg(struct func_arg *arg, unsigned long long val)
> > >  {
> > >  	char buf[8];
> > >  	int ret;
> > >  
> > >  	val += arg->index;
> > >  
> > > -	if (!arg->indirect)
> > > -		return val;
> > > +	if (arg->indirect)
> > > +		val += (arg->indirect ^ INDIRECT_FLAG);
> > >  
> > > -	val = val + (arg->indirect ^ INDIRECT_FLAG);
> > > +	if (!list_empty(&arg->redirects))
> > > +		val = process_redirects(arg, val, buf);
> > > +
> > > +	if (!val)
> > > +		return 0;
> > >  
> > >  	/* Arrays and strings do their own indirect reads */
> > > -	if (arg->array || arg->func_type == FUNC_TYPE_string)
> > > +	if (!arg->indirect || arg->array || arg->func_type == FUNC_TYPE_string)
> > >  		return val;  
> > 
> > It seems the indirect is processed twice with redirects.  Consider
> > "x64 foo[0]+4", the process_redirects() will call probe_kernel_read()
> > and then here again.
> > 
> 
> 
> Good catch!
> 
> It should have been:
> 
> 		return process_redirects(arg, val, buf);

But I think you need to consider data type of the arg when
dereferencing the last redirect.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 12/18] tracing: Add accessing direct address from function based events
  2018-02-12  2:06       ` Namhyung Kim
@ 2018-02-12 15:47           ` Masami Hiramatsu
  0 siblings, 0 replies; 87+ messages in thread
From: Masami Hiramatsu @ 2018-02-12 15:47 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Steven Rostedt, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu,
	Tom Zanussi, linux-rt-users, linux-trace-users,
	Arnaldo Carvalho de Melo, Clark Williams, Jiri Olsa,
	Daniel Bristot de Oliveira, Juri Lelli, Jonathan Corbet,
	Mathieu Desnoyers, Alexei Starovoitov, kernel-team

On Mon, 12 Feb 2018 11:06:44 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> Hi Steve,
> 
> On Fri, Feb 09, 2018 at 05:07:37PM -0500, Steven Rostedt wrote:
> > On Fri, 9 Feb 2018 09:34:36 +0900
> > Namhyung Kim <namhyung@kernel.org> wrote:
> > 
> > > Couldn't we use the symbol name directly?  Maybe it needs a syntax to
> > > indicate global variable.  Like this?
> > > 
> > >   # echo 'do_IRQ(int $total_forks)' > function_events
> > 
> > 
> > I decided to stick with "$".
> 
> Good.
> 
> 
> [SNIP]
> >  static enum func_states
> >  process_event(struct func_event *fevent, const char *token, enum func_states state)
> >  {
> > @@ -469,6 +479,8 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
> >  	case FUNC_STATE_ARRAY_END:
> >  		if (WARN_ON(!fevent->last_arg))
> >  			break;
> > +		if (token[0] == '$')
> > +			return FUNC_STATE_SYMBOL;
> >  		if (update_arg_name(fevent, token) < 0)
> >  			break;
> >  		if (strncmp(token, "0x", 2) == 0)
> > @@ -542,6 +554,11 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
> >  		fevent->last_arg->index += val;
> >  		return FUNC_STATE_VAR;
> >  
> > +	case FUNC_STATE_SYMBOL:
> > +		if (!isalpha(token[0]) && token[0] != '_')
> > +			break;
> > +		goto equal;
> > +
> >  	case FUNC_STATE_ADDR:
> >  		switch (token[0]) {
> >  		case ')':
> > @@ -599,14 +616,26 @@ process_event(struct func_event *fevent, const char *token, enum func_states sta
> >  		break;
> >  
> >  	case FUNC_STATE_EQUAL:
> > +		if (token[0] == '$')
> > +			return FUNC_STATE_SYMBOL;
> >  		if (strncmp(token, "0x", 2) != 0)
> >  			break;
> >   equal:
> >  		if (WARN_ON(!fevent->last_arg))
> >  			break;
> > -		ret = kstrtoul(token, 0, &val);
> > -		if (ret < 0)
> > -			break;
> > +		if (isalpha(token[0]) || token[0] != '_') {
> 
> I guess you wanted the token[0] being '_'.  Maybe it'd be better adding
> 
>   #define isident0(x)  (isalpha(x) || (x) == '_')

If this '$' is only for a symbol or a direct address (with 0x prefix),
you just need to check !isdigit(token[0]), don't you?
(And if the name is bogus, get_symbol just fails.)

Thanks,

> 
> ?
> 
> Thanks,
> Namhyung
> 
> 
> > +			ret = get_symbol(token, &val);
> > +			if (ret < 0)
> > +				break;
> > +			if (!fevent->last_arg->name) {
> > +				if (update_arg_name(fevent, token) < 0)
> > +					break;
> > +			}
> > +		} else {
> > +			ret = kstrtoul(token, 0, &val);
> > +			if (ret < 0)
> > +				break;
> > +		}
> >  		update_arg = false;
> >  		fevent->last_arg->index = val;
> >  		fevent->last_arg->arg = -1;
> > -- 
> > 2.13.6
> > 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 12/18] tracing: Add accessing direct address from function based events
  2018-02-12 15:47           ` Masami Hiramatsu
  (?)
@ 2018-02-12 16:47           ` Steven Rostedt
  -1 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-12 16:47 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Namhyung Kim, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Peter Zijlstra, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Tue, 13 Feb 2018 00:47:50 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> > >  		if (WARN_ON(!fevent->last_arg))
> > >  			break;
> > > -		ret = kstrtoul(token, 0, &val);
> > > -		if (ret < 0)
> > > -			break;
> > > +		if (isalpha(token[0]) || token[0] != '_') {  
> > 
> > I guess you wanted the token[0] being '_'.  Maybe it'd be better adding
> > 
> >   #define isident0(x)  (isalpha(x) || (x) == '_')  
> 
> If this '$' is only for a symbol or a direct address (with 0x prefix),
> you just need to check !isdigit(token[0]), don't you?
> (And if the name is bogus, get_symbol just fails.)

I modified a lot of this code for the next version (which I'm still
tweaking).

I have this for next_token() (which I may add for the
trace_events_filter.c code as Al Viro has recently pointed out issues
with its parsing):

static char *next_token(char **ptr, char *last)
{
	char *arg;
	char *str;

	if (!*ptr)
		return NULL;

	arg = *ptr;

	if (*last)
		*arg = *last;

	if (!*arg)
		return NULL;

	for (str = arg; *str; str++) {
		if (!isalnum(*str) && *str != '_')
			break;
	}
	if (*str) {
		if (str == arg)
			str++;
		*last = *str;
		*str = 0;
		*ptr = str;
		return arg;
	}

	*last = 0;
	*ptr = NULL;
	return arg;
}


And this:

static bool valid_name(const char *token)
{
	return isalpha(token[0]) || token[0] == '_';
}

As all tokens will now be either entirely alphanumeric with '_' or a
single character.

-- Steve
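
A userspace harness for the two functions above, as a sketch for anyone who
wants to see what tokens come out. next_token() and valid_name() are copied
from this mail with casts to unsigned char added for the userspace
<ctype.h>. collect_tokens(), count_names() and MAX_TOKEN_LEN are
hypothetical helpers, and dropping whitespace tokens in the caller is an
assumption about how the real caller behaves.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_TOKEN_LEN 32

static char *next_token(char **ptr, char *last)
{
	char *arg;
	char *str;

	if (!*ptr)
		return NULL;

	arg = *ptr;

	/* Put back the character the previous call overwrote with NUL. */
	if (*last)
		*arg = *last;

	if (!*arg)
		return NULL;

	for (str = arg; *str; str++) {
		if (!isalnum((unsigned char)*str) && *str != '_')
			break;
	}
	if (*str) {
		/* A delimiter at the start is its own one-character token. */
		if (str == arg)
			str++;
		*last = *str;
		*str = 0;
		*ptr = str;
		return arg;
	}

	*last = 0;
	*ptr = NULL;
	return arg;
}

static bool valid_name(const char *token)
{
	return isalpha((unsigned char)token[0]) || token[0] == '_';
}

/*
 * Copy each token out of a scratch buffer. A token is only terminated
 * until the next call restores the saved character, so it must be copied.
 */
size_t collect_tokens(const char *input, char out[][MAX_TOKEN_LEN], size_t max)
{
	char buf[128];
	char last = 0;
	char *ptr = buf;
	char *tok;
	size_t n = 0;

	assert(strlen(input) < sizeof(buf));
	strcpy(buf, input);

	while (n < max && (tok = next_token(&ptr, &last)) != NULL) {
		if (isspace((unsigned char)tok[0]))
			continue;	/* assumption: caller skips blanks */
		snprintf(out[n++], MAX_TOKEN_LEN, "%s", tok);
	}
	return n;
}

/* Count the tokens that could start an identifier. */
size_t count_names(char toks[][MAX_TOKEN_LEN], size_t n)
{
	size_t i, names = 0;

	for (i = 0; i < n; i++)
		if (valid_name(toks[i]))
			names++;
	return names;
}
```

Feeding it 'do_IRQ(int $total_forks)' gives the tokens "do_IRQ", "(",
"int", "$", "total_forks" and ")", with the blank between "int" and "$"
coming out of next_token() as a one-character token that the helper drops.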

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 17/18] tracing: Add indirect to indirect access for function based events
  2018-02-12  2:15       ` Namhyung Kim
@ 2018-02-12 17:23         ` Steven Rostedt
  2018-02-13  9:27           ` Namhyung Kim
  0 siblings, 1 reply; 87+ messages in thread
From: Steven Rostedt @ 2018-02-12 17:23 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Mon, 12 Feb 2018 11:15:34 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> > It should have been:
> > 
> > 		return process_redirects(arg, val, buf);  
> 
> But I think you need to consider data type of the arg when
> dereferencing the last redirect.

What for?

Also, this code has changed. I haven't posted new patches, but my
latest is in my git tree in the branch ftrace/dynamic-ftrace-events.

-- Steve

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 17/18] tracing: Add indirect to indirect access for function based events
  2018-02-12 17:23         ` Steven Rostedt
@ 2018-02-13  9:27           ` Namhyung Kim
  2018-02-13 15:28             ` Steven Rostedt
  0 siblings, 1 reply; 87+ messages in thread
From: Namhyung Kim @ 2018-02-13  9:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

Hi Steve,

On Mon, Feb 12, 2018 at 12:23:54PM -0500, Steven Rostedt wrote:
> On Mon, 12 Feb 2018 11:15:34 +0900
> Namhyung Kim <namhyung@kernel.org> wrote:
> 
> > > It should have been:
> > > 
> > > 		return process_redirects(arg, val, buf);  
> > 
> > But I think you need to consider data type of the arg when
> > dereferencing the last redirect.
> 
> What for?

Otherwise it'd return a value of unsigned long type regardless of the
type of the arg.  I thought it should have same logic as indirect
args.

But it seems not to matter, since record_entry() would copy only
arg->size bytes of it.  Then the type check in __get_arg() might be
unnecessary too.

> 
> Also, this code has changed. I haven't posted new patches, but my
> latest is in my git tree in the branch ftrace/dynamic-ftrace-events.

I've checked it.  And it seems to have another problem for indirect
(but not redirect) arrays and strings.  Like 'x32[1] foo[2]'?

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 17/18] tracing: Add indirect to indirect access for function based events
  2018-02-13  9:27           ` Namhyung Kim
@ 2018-02-13 15:28             ` Steven Rostedt
  0 siblings, 0 replies; 87+ messages in thread
From: Steven Rostedt @ 2018-02-13 15:28 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Masami Hiramatsu, Tom Zanussi,
	linux-rt-users, linux-trace-users, Arnaldo Carvalho de Melo,
	Clark Williams, Jiri Olsa, Daniel Bristot de Oliveira,
	Juri Lelli, Jonathan Corbet, Mathieu Desnoyers,
	Alexei Starovoitov, kernel-team

On Tue, 13 Feb 2018 18:27:37 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> Hi Steve,
> 
> On Mon, Feb 12, 2018 at 12:23:54PM -0500, Steven Rostedt wrote:
> > On Mon, 12 Feb 2018 11:15:34 +0900
> > Namhyung Kim <namhyung@kernel.org> wrote:
> >   
> > > > It should have been:
> > > > 
> > > > 		return process_redirects(arg, val, buf);    
> > > 
> > > But I think you need to consider data type of the arg when
> > > dereferencing the last redirect.  
> > 
> > What for?  
> 
> Otherwise it'd return a value of unsigned long type regardless of the
> type of the arg.  I thought it should have the same logic as indirect
> args.

OK, I see what you are saying. Yeah, I think it should have the check,
or ...

> 
> But it seems not to matter, since record_entry() would copy only
> arg->size bytes of it.  Then the type check in __get_arg() might be
> unnecessary too.

No, on big endian boxes it can break. I think the answer is to have the
final assignment do the conversion. Or I should have a helper function
to do it.
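
A minimal sketch of the big endian concern, not code from the patch
series: copy_raw() and copy_converted() are hypothetical names, and the
real record_entry() and __get_arg() may look quite different. It only
illustrates why copying the first arg->size bytes of an unsigned long
is not the same as converting the value to the argument's type first.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical illustration only; these helpers are not part of the
 * posted patches.
 *
 * Naive approach: copy the first 'size' bytes of the unsigned long as
 * it sits in memory.  On little endian those are the low-order bytes,
 * so small values survive.  On big endian they are the high-order
 * bytes, so a small value such as 0x1234 is copied as zeros.
 */
static void copy_raw(void *dst, unsigned long val, size_t size)
{
	memcpy(dst, &val, size < sizeof(val) ? size : sizeof(val));
}

/*
 * Helper approach: convert the value to the destination width first,
 * then copy the converted object.  The conversion works on the value,
 * not on its byte layout, so the result is the same on either
 * endianness.
 */
static void copy_converted(void *dst, unsigned long val, size_t size)
{
	switch (size) {
	case 1: {
		uint8_t v = (uint8_t)val;
		memcpy(dst, &v, sizeof(v));
		break;
	}
	case 2: {
		uint16_t v = (uint16_t)val;
		memcpy(dst, &v, sizeof(v));
		break;
	}
	case 4: {
		uint32_t v = (uint32_t)val;
		memcpy(dst, &v, sizeof(v));
		break;
	}
	case 8: {
		uint64_t v = (uint64_t)val;
		memcpy(dst, &v, sizeof(v));
		break;
	}
	default:
		/* Unknown width: fall back to the raw copy. */
		copy_raw(dst, val, size);
		break;
	}
}
```

With a 2-byte destination and val = 0x1234, copy_converted() stores
0x1234 on any machine, while copy_raw() stores 0x1234 only on little
endian.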

> 
> > 
> > Also, this code has changed. I haven't posted new patches but my
> > latest is in my git tree in the branch ftrace/dynamic-ftrace-events.  
> 
> I've checked it.  And it seems to have another problem for indirect
> (but not redirect) arrays and strings.  Like 'x32[1] foo[2]'?

Yep, I know the issue there, I haven't gotten around to fixing it.

Thanks!

-- Steve

Thread overview: 87+ messages
2018-02-02 23:04 [PATCH 00/18] [ANNOUNCE] Dynamically created function based events Steven Rostedt
2018-02-02 23:04 ` [PATCH 01/18] tracing: Add " Steven Rostedt
2018-02-05  8:24   ` Jiri Olsa
2018-02-05 15:00     ` Steven Rostedt
2018-02-07  3:09       ` Steven Rostedt
2018-02-07 12:06         ` Jiri Olsa
2018-02-02 23:05 ` [PATCH 02/18] tracing: Add documentation for " Steven Rostedt
2018-02-02 23:05 ` [PATCH 03/18] tracing: Add simple arguments to " Steven Rostedt
2018-02-08 10:18   ` Namhyung Kim
2018-02-08 15:37     ` Steven Rostedt
2018-02-02 23:05 ` [PATCH 04/18] tracing/x86: Add arch_get_func_args() function Steven Rostedt
2018-02-05 16:33   ` Masami Hiramatsu
2018-02-05 17:06     ` Steven Rostedt
2018-02-08  5:28   ` Namhyung Kim
2018-02-08 15:29     ` Steven Rostedt
2018-02-02 23:05 ` [PATCH 05/18] tracing: Add hex print for dynamic ftrace based events Steven Rostedt
2018-02-02 23:05 ` [PATCH 06/18] tracing: Add indirect offset to args of " Steven Rostedt
2018-02-02 23:05 ` [PATCH 07/18] tracing: Add dereferencing multiple fields per arg Steven Rostedt
2018-02-02 23:05 ` [PATCH 08/18] tracing: Add "unsigned" to function based events Steven Rostedt
2018-02-02 23:05 ` [PATCH 09/18] tracing: Add indexing of arguments for " Steven Rostedt
2018-02-08 10:59   ` Namhyung Kim
2018-02-08 15:43     ` Steven Rostedt
2018-02-08 23:56       ` Namhyung Kim
2018-02-09  0:19         ` Steven Rostedt
2018-02-02 23:05 ` [PATCH 10/18] tracing: Make func_type enums for easier comparing of arg types Steven Rostedt
2018-02-02 23:05 ` [PATCH 11/18] tracing: Add symbol type to function based events Steven Rostedt
2018-02-08 11:03   ` Namhyung Kim
2018-02-08 15:48     ` Steven Rostedt
2018-02-02 23:05 ` [PATCH 12/18] tracing: Add accessing direct address from " Steven Rostedt
2018-02-09  0:34   ` Namhyung Kim
2018-02-09  1:10     ` Steven Rostedt
2018-02-09 22:07     ` Steven Rostedt
2018-02-12  2:06       ` Namhyung Kim
2018-02-12 15:47         ` Masami Hiramatsu
2018-02-12 15:47           ` Masami Hiramatsu
2018-02-12 16:47           ` Steven Rostedt
2018-02-02 23:05 ` [PATCH 13/18] tracing: Add array type to " Steven Rostedt
2018-02-03 13:56   ` Masami Hiramatsu
2018-02-03 15:29     ` Steven Rostedt
2018-02-04  3:50       ` Masami Hiramatsu
2018-02-09  1:17   ` Namhyung Kim
2018-02-09  1:54     ` Steven Rostedt
2018-02-02 23:05 ` [PATCH 14/18] tracing: Have char arrays be strings for " Steven Rostedt
2018-02-02 23:05 ` [PATCH 15/18] tracing: Add string type for dynamic strings in " Steven Rostedt
2018-02-09  3:15   ` Namhyung Kim
2018-02-09  3:31     ` Steven Rostedt
2018-02-02 23:05 ` [PATCH 16/18] tracing: Add NULL to skip args for " Steven Rostedt
2018-02-02 23:05 ` [PATCH 17/18] tracing: Add indirect to indirect access " Steven Rostedt
2018-02-09  5:13   ` Namhyung Kim
2018-02-09 15:47     ` Steven Rostedt
2018-02-09 17:18       ` Steven Rostedt
2018-02-12  2:15       ` Namhyung Kim
2018-02-12 17:23         ` Steven Rostedt
2018-02-13  9:27           ` Namhyung Kim
2018-02-13 15:28             ` Steven Rostedt
2018-02-02 23:05 ` [PATCH 18/18] tracing/perf: Allow perf to use " Steven Rostedt
2018-02-03 13:38 ` [PATCH 00/18] [ANNOUNCE] Dynamically created " Masami Hiramatsu
2018-02-03 15:27   ` Steven Rostedt
2018-02-04  3:57     ` Masami Hiramatsu
2018-02-04 17:21       ` Alexei Starovoitov
2018-02-05 14:39         ` Masami Hiramatsu
2018-02-03 17:04 ` Mathieu Desnoyers
2018-02-03 19:02   ` Steven Rostedt
2018-02-03 20:52     ` Alexei Starovoitov
2018-02-03 21:08       ` Steven Rostedt
2018-02-03 21:30         ` Alexei Starovoitov
2018-02-04  2:37           ` Namhyung Kim
2018-02-04 15:50         ` Mathieu Desnoyers
2018-02-03 21:17       ` Steven Rostedt
2018-02-03 21:38         ` Alexei Starovoitov
2018-02-04  2:25         ` Namhyung Kim
2018-02-05 15:02           ` Steven Rostedt
2018-02-05 13:53         ` Juri Lelli
2018-02-05 13:53           ` Juri Lelli
2018-02-05 15:07           ` Steven Rostedt
2018-02-05 15:07             ` Steven Rostedt
2018-02-03 21:43   ` Linus Torvalds
2018-02-04 15:30     ` Mathieu Desnoyers
2018-02-04 15:47       ` Steven Rostedt
2018-02-04 19:39       ` Linus Torvalds
2018-02-05 10:09         ` Peter Zijlstra
2018-02-05 15:10           ` Steven Rostedt
2018-02-05 15:14         ` Masami Hiramatsu
2018-02-03 18:52 ` Steven Rostedt
2018-02-05 10:23 ` Juri Lelli
2018-02-05 10:49   ` Daniel Bristot de Oliveira
2018-02-05 15:11     ` Steven Rostedt