* [for-next][PATCH 00/12] tracing: Updates for 6.3
From: Steven Rostedt @ 2023-01-25 17:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Masami Hiramatsu, Andrew Morton


  git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
trace/for-next

Head SHA1: dc513fd5321d4c15ac1f820c224b2e5220b6e14f


Jia-Ju Bai (1):
      tracing: Add NULL checks for buffer in ring_buffer_free_read_page()

Mark Rutland (1):
      ftrace: Add sample with custom ops

Steven Rostedt (Google) (10):
      tracing: Add a way to filter function addresses to function names
      tracing/selftests: Add test for event filtering on function name
      tracing: Simplify calculating entry size using struct_size()
      tracing: Allow stacktraces to be saved as histogram variables
      tracing: Allow synthetic events to pass around stacktraces
      tracing/histogram: Add stacktrace type
      tracing/histogram: Document variable stacktrace
      tracing/histogram: Add simple tests for stacktrace usage of synthetic events
      perf/tracing: Use stage6 of tracing to not duplicate macros
      bpf/tracing: Use stage6 of tracing to not duplicate macros

----
 Documentation/trace/events.rst                     |  12 +
 Documentation/trace/histogram.rst                  | 129 +++++++++++
 include/trace/bpf_probe.h                          |  45 +---
 include/trace/perf.h                               |  46 +---
 include/trace/stages/stage6_event_callback.h       |   3 +
 kernel/trace/ring_buffer.c                         |   7 +-
 kernel/trace/trace.c                               |   2 +-
 kernel/trace/trace.h                               |   4 +
 kernel/trace/trace_events.c                        |   2 +-
 kernel/trace/trace_events_filter.c                 |  93 +++++++-
 kernel/trace/trace_events_hist.c                   |  63 ++++--
 kernel/trace/trace_events_synth.c                  |  83 ++++++-
 kernel/trace/trace_synth.h                         |   1 +
 samples/Kconfig                                    |   7 +
 samples/Makefile                                   |   1 +
 samples/ftrace/Makefile                            |   1 +
 samples/ftrace/ftrace-ops.c                        | 252 +++++++++++++++++++++
 .../ftrace/test.d/filter/event-filter-function.tc  |  58 +++++
 .../inter-event/trigger-synthetic-event-stack.tc   |  24 ++
 .../inter-event/trigger-synthetic-event-syntax.tc  |   6 +
 20 files changed, 729 insertions(+), 110 deletions(-)
 create mode 100644 samples/ftrace/ftrace-ops.c
 create mode 100644 tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-stack.tc


* [for-next][PATCH 01/12] tracing: Add a way to filter function addresses to function names
From: Steven Rostedt @ 2023-01-25 17:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Andrew Morton, Tom Zanussi, Zheng Yejian, Ross Zwisler

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

There have been several times where an event records a function address in
its field and I needed to filter on that address for a specific function
name. It required looking up the function in kallsyms, finding its size, and
doing a compare of "field >= function_start && field < function_end".

But those addresses change from boot to boot, which makes them unreliable in
scripts. It is also useful to have this at boot up, where the addresses will
not be known. For example, on the boot command line:

  trace_trigger="initcall_finish.traceoff if func.function == acpi_init"

To implement this, add a ".function" suffix that checks that the field is of
size long; the only operations allowed (so far) are "==" and "!=".

Link: https://lkml.kernel.org/r/20221219183213.916833763@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 Documentation/trace/events.rst     | 12 ++++
 kernel/trace/trace_events.c        |  2 +-
 kernel/trace/trace_events_filter.c | 93 +++++++++++++++++++++++++++++-
 3 files changed, 105 insertions(+), 2 deletions(-)

diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
index c47f381d0c00..d0fd5c7220b7 100644
--- a/Documentation/trace/events.rst
+++ b/Documentation/trace/events.rst
@@ -207,6 +207,18 @@ field name::
 As the kernel will have to know how to retrieve the memory that the pointer
 is at from user space.
 
+You can convert any long type to a function address and search by function name::
+
+  call_site.function == security_prepare_creds
+
+The above will match when the field "call_site" holds an address that falls
+within the function "security_prepare_creds". That is, the filter compares
+the value of "call_site" and returns true if it is greater than or equal to
+the start of "security_prepare_creds" and less than the end of that function.
+
+The ".function" postfix can only be attached to values of size long, and can only
+be compared with "==" or "!=".
+
 5.2 Setting filters
 -------------------
 
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 33e0b4f8ebe6..b07931ad97de 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2822,7 +2822,7 @@ static __init int setup_trace_triggers(char *str)
 		if (!trigger)
 			break;
 		bootup_triggers[i].event = strsep(&trigger, ".");
-		bootup_triggers[i].trigger = strsep(&trigger, ".");
+		bootup_triggers[i].trigger = trigger;
 		if (!bootup_triggers[i].trigger)
 			break;
 	}
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index e095c3b3a50d..1dad64267878 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -64,6 +64,7 @@ enum filter_pred_fn {
 	FILTER_PRED_FN_PCHAR_USER,
 	FILTER_PRED_FN_PCHAR,
 	FILTER_PRED_FN_CPU,
+	FILTER_PRED_FN_FUNCTION,
 	FILTER_PRED_FN_,
 	FILTER_PRED_TEST_VISITED,
 };
@@ -71,6 +72,7 @@ enum filter_pred_fn {
 struct filter_pred {
 	enum filter_pred_fn 	fn_num;
 	u64 			val;
+	u64 			val2;
 	struct regex		regex;
 	unsigned short		*ops;
 	struct ftrace_event_field *field;
@@ -103,6 +105,7 @@ struct filter_pred {
 	C(INVALID_FILTER,	"Meaningless filter expression"),	\
 	C(IP_FIELD_ONLY,	"Only 'ip' field is supported for function trace"), \
 	C(INVALID_VALUE,	"Invalid value (did you forget quotes)?"), \
+	C(NO_FUNCTION,		"Function not found"),			\
 	C(ERRNO,		"Error"),				\
 	C(NO_FILTER,		"No filter found")
 
@@ -876,6 +879,17 @@ static int filter_pred_comm(struct filter_pred *pred, void *event)
 	return cmp ^ pred->not;
 }
 
+/* Filter predicate for functions. */
+static int filter_pred_function(struct filter_pred *pred, void *event)
+{
+	unsigned long *addr = (unsigned long *)(event + pred->offset);
+	unsigned long start = (unsigned long)pred->val;
+	unsigned long end = (unsigned long)pred->val2;
+	int ret = *addr >= start && *addr < end;
+
+	return pred->op == OP_EQ ? ret : !ret;
+}
+
 /*
  * regex_match_foo - Basic regex callbacks
  *
@@ -1335,6 +1349,8 @@ static int filter_pred_fn_call(struct filter_pred *pred, void *event)
 		return filter_pred_pchar(pred, event);
 	case FILTER_PRED_FN_CPU:
 		return filter_pred_cpu(pred, event);
+	case FILTER_PRED_FN_FUNCTION:
+		return filter_pred_function(pred, event);
 	case FILTER_PRED_TEST_VISITED:
 		return test_pred_visited_fn(pred, event);
 	default:
@@ -1350,8 +1366,13 @@ static int parse_pred(const char *str, void *data,
 	struct trace_event_call *call = data;
 	struct ftrace_event_field *field;
 	struct filter_pred *pred = NULL;
+	unsigned long offset;
+	unsigned long size;
+	unsigned long ip;
 	char num_buf[24];	/* Big enough to hold an address */
 	char *field_name;
+	char *name;
+	bool function = false;
 	bool ustring = false;
 	char q;
 	u64 val;
@@ -1393,6 +1414,12 @@ static int parse_pred(const char *str, void *data,
 		i += len;
 	}
 
+	/* See if the field is a kernel function name */
+	if ((len = str_has_prefix(str + i, ".function"))) {
+		function = true;
+		i += len;
+	}
+
 	while (isspace(str[i]))
 		i++;
 
@@ -1423,7 +1450,71 @@ static int parse_pred(const char *str, void *data,
 	pred->offset = field->offset;
 	pred->op = op;
 
-	if (ftrace_event_is_function(call)) {
+	if (function) {
+		/* The field must be the same size as long */
+		if (field->size != sizeof(long)) {
+			parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP, pos + i);
+			goto err_free;
+		}
+
+		/* Function only works with '==' or '!=' and an unquoted string */
+		switch (op) {
+		case OP_NE:
+		case OP_EQ:
+			break;
+		default:
+			parse_error(pe, FILT_ERR_INVALID_OP, pos + i);
+			goto err_free;
+		}
+
+		if (isdigit(str[i])) {
+			/* We allow 0xDEADBEEF */
+			while (isalnum(str[i]))
+				i++;
+
+			len = i - s;
+			/* 0xfeedfacedeadbeef is 18 chars max */
+			if (len >= sizeof(num_buf)) {
+				parse_error(pe, FILT_ERR_OPERAND_TOO_LONG, pos + i);
+				goto err_free;
+			}
+
+			strncpy(num_buf, str + s, len);
+			num_buf[len] = 0;
+
+			ret = kstrtoul(num_buf, 0, &ip);
+			if (ret) {
+				parse_error(pe, FILT_ERR_INVALID_VALUE, pos + i);
+				goto err_free;
+			}
+		} else {
+			s = i;
+			for (; str[i] && !isspace(str[i]); i++)
+				;
+
+			len = i - s;
+			name = kmemdup_nul(str + s, len, GFP_KERNEL);
+			if (!name)
+				goto err_mem;
+			ip = kallsyms_lookup_name(name);
+			kfree(name);
+			if (!ip) {
+				parse_error(pe, FILT_ERR_NO_FUNCTION, pos + i);
+				goto err_free;
+			}
+		}
+
+		/* Now find the function start and end address */
+		if (!kallsyms_lookup_size_offset(ip, &size, &offset)) {
+			parse_error(pe, FILT_ERR_NO_FUNCTION, pos + i);
+			goto err_free;
+		}
+
+		pred->fn_num = FILTER_PRED_FN_FUNCTION;
+		pred->val = ip - offset;
+		pred->val2 = pred->val + size;
+
+	} else if (ftrace_event_is_function(call)) {
 		/*
 		 * Perf does things different with function events.
 		 * It only allows an "ip" field, and expects a string.
-- 
2.39.0


* [for-next][PATCH 02/12] tracing/selftests: Add test for event filtering on function name
From: Steven Rostedt @ 2023-01-25 17:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Andrew Morton, Tom Zanussi, Zheng Yejian,
	linux-kselftest, Ross Zwisler, Shuah Khan

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

With the new filter logic of passing in the name of a function to match an
instruction pointer (or the address of the function), add a test to make
sure that it is functional.

This is also the first test to exercise plain event filtering. The filtering
has been tested via the trigger logic, which uses the same code, but there
was nothing that tested just the event filter, so this test is the first to
add such a case.

Link: https://lkml.kernel.org/r/20221219183214.075559302@goodmis.org

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Zheng Yejian <zhengyejian1@huawei.com>
Cc: linux-kselftest@vger.kernel.org
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Ross Zwisler <zwisler@google.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 .../test.d/filter/event-filter-function.tc    | 58 +++++++++++++++++++
 1 file changed, 58 insertions(+)
 create mode 100644 tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc

diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
new file mode 100644
index 000000000000..e2ff3bf4df80
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
@@ -0,0 +1,58 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: event filter function - test event filtering on functions
+# requires: set_event events/kmem/kmem_cache_free/filter
+# flags: instance
+
+fail() { #msg
+    echo $1
+    exit_fail
+}
+
+echo "Test event filter function name"
+echo 0 > tracing_on
+echo 0 > events/enable
+echo > trace
+echo 'call_site.function == exit_mmap' > events/kmem/kmem_cache_free/filter
+echo 1 > events/kmem/kmem_cache_free/enable
+echo 1 > tracing_on
+ls > /dev/null
+echo 0 > events/kmem/kmem_cache_free/enable
+
+hitcnt=`grep kmem_cache_free trace| grep exit_mmap | wc -l`
+misscnt=`grep kmem_cache_free trace| grep -v exit_mmap | wc -l`
+
+if [ $hitcnt -eq 0 ]; then
+	exit_fail
+fi
+
+if [ $misscnt -gt 0 ]; then
+	exit_fail
+fi
+
+address=`grep ' exit_mmap$' /proc/kallsyms | cut -d' ' -f1`
+
+echo "Test event filter function address"
+echo 0 > tracing_on
+echo 0 > events/enable
+echo > trace
+echo "call_site.function == 0x$address" > events/kmem/kmem_cache_free/filter
+echo 1 > events/kmem/kmem_cache_free/enable
+echo 1 > tracing_on
+sleep 1
+echo 0 > events/kmem/kmem_cache_free/enable
+
+hitcnt=`grep kmem_cache_free trace| grep exit_mmap | wc -l`
+misscnt=`grep kmem_cache_free trace| grep -v exit_mmap | wc -l`
+
+if [ $hitcnt -eq 0 ]; then
+	exit_fail
+fi
+
+if [ $misscnt -gt 0 ]; then
+	exit_fail
+fi
+
+reset_events_filter
+
+exit 0
-- 
2.39.0


* [for-next][PATCH 03/12] ftrace: Add sample with custom ops
From: Steven Rostedt @ 2023-01-25 17:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Andrew Morton, Florent Revest, Mark Rutland

From: Mark Rutland <mark.rutland@arm.com>

When reworking core ftrace code or architectural ftrace code, it's often
necessary to test/analyse/benchmark a number of ftrace_ops
configurations. This patch adds a module which can be used to explore
some of those configurations.

I'm using this to benchmark various options for changing the way
trampolines and handling of ftrace_ops work on arm64, and ensuring other
architectures aren't adversely affected.

For example, in a QEMU+KVM VM running on a 2GHz Xeon E5-2660
workstation, loading the module in various configurations produces:

| # insmod ftrace-ops.ko
| ftrace_ops: registering:
|   relevant ops: 1
|     tracee: tracee_relevant [ftrace_ops]
|     tracer: ops_func_nop [ftrace_ops]
|   irrelevant ops: 0
|     tracee: tracee_irrelevant [ftrace_ops]
|     tracer: ops_func_nop [ftrace_ops]
|   saving registers: NO
|   assist recursion: NO
|   assist RCU: NO
| ftrace_ops: Attempted 100000 calls to tracee_relevant [ftrace_ops] in 1681558ns (16ns / call)

| # insmod ftrace-ops.ko nr_ops_irrelevant=5
| ftrace_ops: registering:
|   relevant ops: 1
|     tracee: tracee_relevant [ftrace_ops]
|     tracer: ops_func_nop [ftrace_ops]
|   irrelevant ops: 5
|     tracee: tracee_irrelevant [ftrace_ops]
|     tracer: ops_func_nop [ftrace_ops]
|   saving registers: NO
|   assist recursion: NO
|   assist RCU: NO
| ftrace_ops: Attempted 100000 calls to tracee_relevant [ftrace_ops] in 1693042ns (16ns / call)

| # insmod ftrace-ops.ko nr_ops_relevant=2
| ftrace_ops: registering:
|   relevant ops: 2
|     tracee: tracee_relevant [ftrace_ops]
|     tracer: ops_func_nop [ftrace_ops]
|   irrelevant ops: 0
|     tracee: tracee_irrelevant [ftrace_ops]
|     tracer: ops_func_nop [ftrace_ops]
|   saving registers: NO
|   assist recursion: NO
|   assist RCU: NO
| ftrace_ops: Attempted 100000 calls to tracee_relevant [ftrace_ops] in 11965582ns (119ns / call)

| # insmod ftrace-ops.ko save_regs=true
| ftrace_ops: registering:
|   relevant ops: 1
|     tracee: tracee_relevant [ftrace_ops]
|     tracer: ops_func_nop [ftrace_ops]
|   irrelevant ops: 0
|     tracee: tracee_irrelevant [ftrace_ops]
|     tracer: ops_func_nop [ftrace_ops]
|   saving registers: YES
|   assist recursion: NO
|   assist RCU: NO
| ftrace_ops: Attempted 100000 calls to tracee_relevant [ftrace_ops] in 4459624ns (44ns / call)

Link: https://lkml.kernel.org/r/20230103124912.2948963-4-mark.rutland@arm.com

Cc: Florent Revest <revest@chromium.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 samples/Kconfig             |   7 +
 samples/Makefile            |   1 +
 samples/ftrace/Makefile     |   1 +
 samples/ftrace/ftrace-ops.c | 252 ++++++++++++++++++++++++++++++++++++
 4 files changed, 261 insertions(+)
 create mode 100644 samples/ftrace/ftrace-ops.c

diff --git a/samples/Kconfig b/samples/Kconfig
index 0d81c00289ee..daf14c35f071 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -46,6 +46,13 @@ config SAMPLE_FTRACE_DIRECT_MULTI
 	  that hooks to wake_up_process and schedule, and prints
 	  the function addresses.
 
+config SAMPLE_FTRACE_OPS
+	tristate "Build custom ftrace ops example"
+	depends on FUNCTION_TRACER
+	help
+	  This builds an ftrace ops example that hooks two functions and
+	  measures the time taken to invoke one function a number of times.
+
 config SAMPLE_TRACE_ARRAY
         tristate "Build sample module for kernel access to Ftrace instancess"
 	depends on EVENT_TRACING && m
diff --git a/samples/Makefile b/samples/Makefile
index 9832ef3f8fcb..7cb632ef88ee 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -24,6 +24,7 @@ obj-$(CONFIG_SAMPLE_TRACE_CUSTOM_EVENTS) += trace_events/
 obj-$(CONFIG_SAMPLE_TRACE_PRINTK)	+= trace_printk/
 obj-$(CONFIG_SAMPLE_FTRACE_DIRECT)	+= ftrace/
 obj-$(CONFIG_SAMPLE_FTRACE_DIRECT_MULTI) += ftrace/
+obj-$(CONFIG_SAMPLE_FTRACE_OPS)		+= ftrace/
 obj-$(CONFIG_SAMPLE_TRACE_ARRAY)	+= ftrace/
 subdir-$(CONFIG_SAMPLE_UHID)		+= uhid
 obj-$(CONFIG_VIDEO_PCI_SKELETON)	+= v4l/
diff --git a/samples/ftrace/Makefile b/samples/ftrace/Makefile
index faf8cdb79c5f..589baf2ec4e3 100644
--- a/samples/ftrace/Makefile
+++ b/samples/ftrace/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_SAMPLE_FTRACE_DIRECT) += ftrace-direct-too.o
 obj-$(CONFIG_SAMPLE_FTRACE_DIRECT) += ftrace-direct-modify.o
 obj-$(CONFIG_SAMPLE_FTRACE_DIRECT_MULTI) += ftrace-direct-multi.o
 obj-$(CONFIG_SAMPLE_FTRACE_DIRECT_MULTI) += ftrace-direct-multi-modify.o
+obj-$(CONFIG_SAMPLE_FTRACE_OPS) += ftrace-ops.o
 
 CFLAGS_sample-trace-array.o := -I$(src)
 obj-$(CONFIG_SAMPLE_TRACE_ARRAY) += sample-trace-array.o
diff --git a/samples/ftrace/ftrace-ops.c b/samples/ftrace/ftrace-ops.c
new file mode 100644
index 000000000000..24deb51c7261
--- /dev/null
+++ b/samples/ftrace/ftrace-ops.c
@@ -0,0 +1,252 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt)       KBUILD_MODNAME ": " fmt
+
+#include <linux/ftrace.h>
+#include <linux/ktime.h>
+#include <linux/module.h>
+
+#include <asm/barrier.h>
+
+/*
+ * Arbitrary value chosen to be sufficiently large to minimize noise but
+ * sufficiently small to complete quickly.
+ */
+unsigned int nr_function_calls = 100000;
+module_param(nr_function_calls, uint, 0);
+MODULE_PARM_DESC(nr_function_calls, "How many times to call the relevant tracee");
+
+/*
+ * The number of ops associated with a call site affects whether a tracer can
+ * be called directly or whether it's necessary to go via the list func, which
+ * can be significantly more expensive.
+ */
+unsigned int nr_ops_relevant = 1;
+module_param(nr_ops_relevant, uint, 0);
+MODULE_PARM_DESC(nr_ops_relevant, "How many ftrace_ops to associate with the relevant tracee");
+
+/*
+ * On architectures where all call sites share the same trampoline, having
+ * tracers enabled for distinct functions can force the use of the list func
+ * and incur overhead for all call sites.
+ */
+unsigned int nr_ops_irrelevant = 0;
+module_param(nr_ops_irrelevant, uint, 0);
+MODULE_PARM_DESC(nr_ops_irrelevant, "How many ftrace_ops to associate with the irrelevant tracee");
+
+/*
+ * On architectures with DYNAMIC_FTRACE_WITH_REGS, saving the full pt_regs can
+ * be more expensive than only saving the minimal necessary regs.
+ */
+bool save_regs = false;
+module_param(save_regs, bool, 0);
+MODULE_PARM_DESC(save_regs, "Register ops with FTRACE_OPS_FL_SAVE_REGS (save all registers in the trampoline)");
+
+bool assist_recursion = false;
+module_param(assist_recursion, bool, 0);
+MODULE_PARM_DESC(assist_recursion, "Register ops with FTRACE_OPS_FL_RECURSION");
+
+bool assist_rcu = false;
+module_param(assist_rcu, bool, 0);
+MODULE_PARM_DESC(assist_rcu, "Register ops with FTRACE_OPS_FL_RCU");
+
+/*
+ * By default, a trivial tracer is used which immediately returns to minimize
+ * overhead. Sometimes a consistency check using a more expensive tracer is
+ * desirable.
+ */
+bool check_count = false;
+module_param(check_count, bool, 0);
+MODULE_PARM_DESC(check_count, "Check that tracers are called the expected number of times\n");
+
+/*
+ * Usually it's not interesting to leave the ops registered after the test
+ * runs, but sometimes it can be useful to leave them registered so that they
+ * can be inspected through the tracefs 'enabled_functions' file.
+ */
+bool persist = false;
+module_param(persist, bool, 0);
+MODULE_PARM_DESC(persist, "Successfully load module and leave ftrace ops registered after test completes\n");
+
+/*
+ * Marked as noinline to ensure that an out-of-line traceable copy is
+ * generated by the compiler.
+ *
+ * The barrier() ensures the compiler won't elide calls by determining there
+ * are no side-effects.
+ */
+static noinline void tracee_relevant(void)
+{
+	barrier();
+}
+
+/*
+ * Marked as noinline to ensure that an out-of-line traceable copy is
+ * generated by the compiler.
+ *
+ * The barrier() ensures the compiler won't elide calls by determining there
+ * are no side-effects.
+ */
+static noinline void tracee_irrelevant(void)
+{
+	barrier();
+}
+
+struct sample_ops {
+	struct ftrace_ops ops;
+	unsigned int count;
+};
+
+static void ops_func_nop(unsigned long ip, unsigned long parent_ip,
+			 struct ftrace_ops *op,
+			 struct ftrace_regs *fregs)
+{
+	/* do nothing */
+}
+
+static void ops_func_count(unsigned long ip, unsigned long parent_ip,
+			   struct ftrace_ops *op,
+			   struct ftrace_regs *fregs)
+{
+	struct sample_ops *self;
+
+	self = container_of(op, struct sample_ops, ops);
+	self->count++;
+}
+
+struct sample_ops *ops_relevant;
+struct sample_ops *ops_irrelevant;
+
+static struct sample_ops *ops_alloc_init(void *tracee, ftrace_func_t func,
+					 unsigned long flags, int nr)
+{
+	struct sample_ops *ops;
+
+	ops = kcalloc(nr, sizeof(*ops), GFP_KERNEL);
+	if (WARN_ON_ONCE(!ops))
+		return NULL;
+
+	for (unsigned int i = 0; i < nr; i++) {
+		ops[i].ops.func = func;
+		ops[i].ops.flags = flags;
+		WARN_ON_ONCE(ftrace_set_filter_ip(&ops[i].ops, (unsigned long)tracee, 0, 0));
+		WARN_ON_ONCE(register_ftrace_function(&ops[i].ops));
+	}
+
+	return ops;
+}
+
+static void ops_destroy(struct sample_ops *ops, int nr)
+{
+	if (!ops)
+		return;
+
+	for (unsigned int i = 0; i < nr; i++) {
+		WARN_ON_ONCE(unregister_ftrace_function(&ops[i].ops));
+		ftrace_free_filter(&ops[i].ops);
+	}
+
+	kfree(ops);
+}
+
+static void ops_check(struct sample_ops *ops, int nr,
+		      unsigned int expected_count)
+{
+	if (!ops || !check_count)
+		return;
+
+	for (unsigned int i = 0; i < nr; i++) {
+		if (ops[i].count == expected_count)
+			continue;
+		pr_warn("Counter called %u times (expected %u)\n",
+			ops[i].count, expected_count);
+	}
+}
+
+ftrace_func_t tracer_relevant = ops_func_nop;
+ftrace_func_t tracer_irrelevant = ops_func_nop;
+
+static int __init ftrace_ops_sample_init(void)
+{
+	unsigned long flags = 0;
+	ktime_t start, end;
+	u64 period;
+
+	if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_REGS) && save_regs) {
+		pr_info("this kernel does not support saving registers\n");
+		save_regs = false;
+	} else if (save_regs) {
+		flags |= FTRACE_OPS_FL_SAVE_REGS;
+	}
+
+	if (assist_recursion)
+		flags |= FTRACE_OPS_FL_RECURSION;
+
+	if (assist_rcu)
+		flags |= FTRACE_OPS_FL_RCU;
+
+	if (check_count) {
+		tracer_relevant = ops_func_count;
+		tracer_irrelevant = ops_func_count;
+	}
+
+	pr_info("registering:\n"
+		"  relevant ops: %u\n"
+		"    tracee: %ps\n"
+		"    tracer: %ps\n"
+		"  irrelevant ops: %u\n"
+		"    tracee: %ps\n"
+		"    tracer: %ps\n"
+		"  saving registers: %s\n"
+		"  assist recursion: %s\n"
+		"  assist RCU: %s\n",
+		nr_ops_relevant, tracee_relevant, tracer_relevant,
+		nr_ops_irrelevant, tracee_irrelevant, tracer_irrelevant,
+		save_regs ? "YES" : "NO",
+		assist_recursion ? "YES" : "NO",
+		assist_rcu ? "YES" : "NO");
+
+	ops_relevant = ops_alloc_init(tracee_relevant, tracer_relevant,
+				      flags, nr_ops_relevant);
+	ops_irrelevant = ops_alloc_init(tracee_irrelevant, tracer_irrelevant,
+					flags, nr_ops_irrelevant);
+
+	start = ktime_get();
+	for (unsigned int i = 0; i < nr_function_calls; i++)
+		tracee_relevant();
+	end = ktime_get();
+
+	ops_check(ops_relevant, nr_ops_relevant, nr_function_calls);
+	ops_check(ops_irrelevant, nr_ops_irrelevant, 0);
+
+	period = ktime_to_ns(ktime_sub(end, start));
+
+	pr_info("Attempted %u calls to %ps in %lluns (%lluns / call)\n",
+		nr_function_calls, tracee_relevant,
+		period, period / nr_function_calls);
+
+	if (persist)
+		return 0;
+
+	ops_destroy(ops_relevant, nr_ops_relevant);
+	ops_destroy(ops_irrelevant, nr_ops_irrelevant);
+
+	/*
+	 * The benchmark completed successfully, but there's no reason to keep
+	 * the module around. Return an error so the user doesn't have to
+	 * manually unload the module.
+	 */
+	return -EINVAL;
+}
+module_init(ftrace_ops_sample_init);
+
+static void __exit ftrace_ops_sample_exit(void)
+{
+	ops_destroy(ops_relevant, nr_ops_relevant);
+	ops_destroy(ops_irrelevant, nr_ops_irrelevant);
+}
+module_exit(ftrace_ops_sample_exit);
+
+MODULE_AUTHOR("Mark Rutland");
+MODULE_DESCRIPTION("Example of using custom ftrace_ops");
+MODULE_LICENSE("GPL");
-- 
2.39.0


* [for-next][PATCH 04/12] tracing: Add NULL checks for buffer in ring_buffer_free_read_page()
From: Steven Rostedt @ 2023-01-25 17:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Masami Hiramatsu, Andrew Morton, TOTE Robot, Jia-Ju Bai

From: Jia-Ju Bai <baijiaju1990@gmail.com>

A previous commit, 7433632c9ff6, found that buffer, buffer->buffers and
buffer->buffers[cpu] in ring_buffer_wake_waiters() can be NULL, and added
the related checks.

However, in the same call stack, these variables are also used in
ring_buffer_free_read_page():

tracing_buffers_release()
  ring_buffer_wake_waiters(iter->array_buffer->buffer)
    cpu_buffer = buffer->buffers[cpu] -> Add checks by previous commit
  ring_buffer_free_read_page(iter->array_buffer->buffer)
    cpu_buffer = buffer->buffers[cpu] -> No check

Thus, to avoid possible NULL-pointer dereferences, the related checks should
be added.

These results were reported by a static analysis tool that I designed.

Link: https://lkml.kernel.org/r/20230113125501.760324-1-baijiaju1990@gmail.com

Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index c366a0a9ddba..45d4a23d6044 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -5626,11 +5626,16 @@ EXPORT_SYMBOL_GPL(ring_buffer_alloc_read_page);
  */
 void ring_buffer_free_read_page(struct trace_buffer *buffer, int cpu, void *data)
 {
-	struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
+	struct ring_buffer_per_cpu *cpu_buffer;
 	struct buffer_data_page *bpage = data;
 	struct page *page = virt_to_page(bpage);
 	unsigned long flags;
 
+	if (!buffer || !buffer->buffers || !buffer->buffers[cpu])
+		return;
+
+	cpu_buffer = buffer->buffers[cpu];
+
 	/* If the page is still in use someplace else, we can't reuse it */
 	if (page_ref_count(page) > 1)
 		goto out;
-- 
2.39.0


* [for-next][PATCH 05/12] tracing: Simplify calculating entry size using struct_size()
From: Steven Rostedt @ 2023-01-25 17:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Andrew Morton, Tom Zanussi, Ross Zwisler, Ching-lin Yu

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

When tracing a dynamic string field for a synthetic event, the offset
calculation for where to write the string data can use struct_size() to find
the current size of the structure.

This simplifies the code and makes it less error prone.
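
For reference, a minimal sketch of the equivalence (struct_size() is defined
in include/linux/overflow.h, and 'entry' here ends in a flexible array of
u64 'fields'):

  /* Before: compute the offset of the dynamic data area by hand */
  data_offset = offsetof(typeof(*entry), fields);
  data_offset += event->n_u64 * sizeof(u64);

  /* After: the same result, with overflow checking folded in */
  data_offset = struct_size(entry, fields, event->n_u64);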

Link: https://lkml.kernel.org/r/20230117152235.698632147@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Ching-lin Yu <chinglinyu@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace_events_synth.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 67592eed0be8..9f79cd689b79 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -416,8 +416,7 @@ static unsigned int trace_string(struct synth_trace_event *entry,
 	if (is_dynamic) {
 		u32 data_offset;
 
-		data_offset = offsetof(typeof(*entry), fields);
-		data_offset += event->n_u64 * sizeof(u64);
+		data_offset = struct_size(entry, fields, event->n_u64);
 		data_offset += data_size;
 
 		len = kern_fetch_store_strlen((unsigned long)str_val);
-- 
2.39.0


* [for-next][PATCH 06/12] tracing: Allow stacktraces to be saved as histogram variables
From: Steven Rostedt @ 2023-01-25 17:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Andrew Morton, Tom Zanussi, Ross Zwisler, Ching-lin Yu

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Allow stacktraces to be saved in a histogram variable. This will be used by
synthetic events to allow a stacktrace from one event to be passed to and
displayed by another event.

The special keyword "stacktrace" is to be used to trigger a stack
trace for the event that the histogram trigger is attached to.

  echo 'hist:keys=pid:st=stacktrace' > events/sched/sched_waking/trigger

Currently nothing can get access to the "$st" variable above that contains
the stack trace, but that will soon change.
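
For reference, a minimal sketch of how the variable's storage is filled
(this mirrors the diff below: the per-element string storage is reused, and
HIST_STACKTRACE_DEPTH is 16 entries):

  unsigned long *stack = (unsigned long *)str;  /* reuse string storage */
  int nr;

  nr = stack_trace_save(stack, HIST_STACKTRACE_DEPTH, HIST_STACKTRACE_SKIP);
  /* Zero terminate the trace if it did not fill the buffer */
  if (nr < HIST_STACKTRACE_DEPTH - 1)
          stack[nr] = 0;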

Link: https://lkml.kernel.org/r/20230117152235.856323729@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Ching-lin Yu <chinglinyu@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace_events_hist.c | 52 ++++++++++++++++++++++++++------
 1 file changed, 42 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 5edbf6b1da3f..f8f67e17898a 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1360,6 +1360,8 @@ static const char *hist_field_name(struct hist_field *field,
 			field_name = field->name;
 	} else if (field->flags & HIST_FIELD_FL_TIMESTAMP)
 		field_name = "common_timestamp";
+	else if (field->flags & HIST_FIELD_FL_STACKTRACE)
+		field_name = "stacktrace";
 	else if (field->flags & HIST_FIELD_FL_HITCOUNT)
 		field_name = "hitcount";
 
@@ -1980,6 +1982,10 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
 
 	if (flags & HIST_FIELD_FL_STACKTRACE) {
 		hist_field->fn_num = HIST_FIELD_FN_NOP;
+		hist_field->size = HIST_STACKTRACE_SIZE;
+		hist_field->type = kstrdup_const("unsigned long[]", GFP_KERNEL);
+		if (!hist_field->type)
+			goto free;
 		goto out;
 	}
 
@@ -2351,6 +2357,8 @@ parse_field(struct hist_trigger_data *hist_data, struct trace_event_file *file,
 		hist_data->enable_timestamps = true;
 		if (*flags & HIST_FIELD_FL_TIMESTAMP_USECS)
 			hist_data->attrs->ts_in_usecs = true;
+	} else if (strcmp(field_name, "stacktrace") == 0) {
+		*flags |= HIST_FIELD_FL_STACKTRACE;
 	} else if (strcmp(field_name, "common_cpu") == 0)
 		*flags |= HIST_FIELD_FL_CPU;
 	else if (strcmp(field_name, "hitcount") == 0)
@@ -3119,13 +3127,24 @@ static inline void __update_field_vars(struct tracing_map_elt *elt,
 		var_val = hist_fn_call(val, elt, buffer, rbe, rec);
 		var_idx = var->var.idx;
 
-		if (val->flags & HIST_FIELD_FL_STRING) {
+		if (val->flags & (HIST_FIELD_FL_STRING |
+				  HIST_FIELD_FL_STACKTRACE)) {
 			char *str = elt_data->field_var_str[j++];
 			char *val_str = (char *)(uintptr_t)var_val;
 			unsigned int size;
 
-			size = min(val->size, STR_VAR_LEN_MAX);
-			strscpy(str, val_str, size);
+			if (val->flags & HIST_FIELD_FL_STRING) {
+				size = min(val->size, STR_VAR_LEN_MAX);
+				strscpy(str, val_str, size);
+			} else {
+				int e;
+
+				e = stack_trace_save((void *)str,
+						     HIST_STACKTRACE_DEPTH,
+						     HIST_STACKTRACE_SKIP);
+				if (e < HIST_STACKTRACE_DEPTH - 1)
+					((unsigned long *)str)[e] = 0;
+			}
 			var_val = (u64)(uintptr_t)str;
 		}
 		tracing_map_set_var(elt, var_idx, var_val);
@@ -3824,7 +3843,8 @@ static void save_field_var(struct hist_trigger_data *hist_data,
 {
 	hist_data->field_vars[hist_data->n_field_vars++] = field_var;
 
-	if (field_var->val->flags & HIST_FIELD_FL_STRING)
+	/* Stack traces are saved in the string storage too */
+	if (field_var->val->flags & (HIST_FIELD_FL_STRING | HIST_FIELD_FL_STACKTRACE))
 		hist_data->n_field_var_str++;
 }
 
@@ -4103,7 +4123,8 @@ static int action_create(struct hist_trigger_data *hist_data,
 			}
 
 			hist_data->save_vars[hist_data->n_save_vars++] = field_var;
-			if (field_var->val->flags & HIST_FIELD_FL_STRING)
+			if (field_var->val->flags &
+			    (HIST_FIELD_FL_STRING | HIST_FIELD_FL_STACKTRACE))
 				hist_data->n_save_var_str++;
 			kfree(param);
 		}
@@ -4351,7 +4372,8 @@ static int create_var_field(struct hist_trigger_data *hist_data,
 	if (!ret && hist_data->fields[val_idx]->flags & HIST_FIELD_FL_EXECNAME)
 		update_var_execname(hist_data->fields[val_idx]);
 
-	if (!ret && hist_data->fields[val_idx]->flags & HIST_FIELD_FL_STRING)
+	if (!ret && hist_data->fields[val_idx]->flags &
+	    (HIST_FIELD_FL_STRING | HIST_FIELD_FL_STACKTRACE))
 		hist_data->fields[val_idx]->var_str_idx = hist_data->n_var_str++;
 
 	return ret;
@@ -5092,7 +5114,8 @@ static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
 		if (hist_field->flags & HIST_FIELD_FL_VAR) {
 			var_idx = hist_field->var.idx;
 
-			if (hist_field->flags & HIST_FIELD_FL_STRING) {
+			if (hist_field->flags &
+			    (HIST_FIELD_FL_STRING | HIST_FIELD_FL_STACKTRACE)) {
 				unsigned int str_start, var_str_idx, idx;
 				char *str, *val_str;
 				unsigned int size;
@@ -5105,9 +5128,18 @@ static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
 				str = elt_data->field_var_str[idx];
 				val_str = (char *)(uintptr_t)hist_val;
 
-				size = min(hist_field->size, STR_VAR_LEN_MAX);
-				strscpy(str, val_str, size);
-
+				if (hist_field->flags & HIST_FIELD_FL_STRING) {
+					size = min(hist_field->size, STR_VAR_LEN_MAX);
+					strscpy(str, val_str, size);
+				} else {
+					int e;
+
+					e = stack_trace_save((void *)str,
+							     HIST_STACKTRACE_DEPTH,
+							     HIST_STACKTRACE_SKIP);
+					if (e < HIST_STACKTRACE_DEPTH - 1)
+						((unsigned long *)str)[e] = 0;
+				}
 				hist_val = (u64)(uintptr_t)str;
 			}
 			tracing_map_set_var(elt, var_idx, hist_val);
-- 
2.39.0


* [for-next][PATCH 07/12] tracing: Allow synthetic events to pass around stacktraces
From: Steven Rostedt @ 2023-01-25 17:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Andrew Morton, Tom Zanussi, Ross Zwisler, Ching-lin Yu

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Allow a stacktrace from one event to be displayed by the end event of a
synthetic event. This is very useful when looking for the longest latency
of a sleep or something blocked on I/O.

 # cd /sys/kernel/tracing/
 # echo 's:block_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events
 # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace  if prev_state == 1||prev_state == 2' > events/sched/sched_switch/trigger
 # echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(block_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger

The above creates a "block_lat" synthetic event that takes the stacktrace of
when a task schedules out in either the interruptible or uninterruptible
states, and on a new per-process max $delta (the time it was scheduled out),
prints the process id and the stacktrace.

  # echo 1 > events/synthetic/block_lat/enable
  # cat trace
 #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
 #              | |         |   |||||     |         |
    kworker/u16:0-767     [006] d..4.   560.645045: block_lat: pid=767 delta=66 stack=STACK:
 => __schedule
 => schedule
 => pipe_read
 => vfs_read
 => ksys_read
 => do_syscall_64
 => 0x966000aa

           <idle>-0       [003] d..4.   561.132117: block_lat: pid=0 delta=413787 stack=STACK:
 => __schedule
 => schedule
 => schedule_hrtimeout_range_clock
 => do_sys_poll
 => __x64_sys_poll
 => do_syscall_64
 => 0x966000aa

            <...>-153     [006] d..4.   562.068407: block_lat: pid=153 delta=54 stack=STACK:
 => __schedule
 => schedule
 => io_schedule
 => rq_qos_wait
 => wbt_wait
 => __rq_qos_throttle
 => blk_mq_submit_bio
 => submit_bio_noacct_nocheck
 => ext4_bio_write_page
 => mpage_submit_page
 => mpage_process_page_bufs
 => mpage_prepare_extent_to_map
 => ext4_do_writepages
 => ext4_writepages
 => do_writepages
 => __writeback_single_inode
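
For reference, the dynamic "stack" field is recorded with the same
offset/length packing that dynamic string fields use; a sketch of the
encoding written by trace_stack() and decoded by print_synth_event() in the
diff below:

  /*
   * A u32 stored in the field's u64 slot:
   *   low 16 bits  = offset of the stack data from the start of the entry
   *   high 16 bits = length of the stack data in bytes
   */
  data_offset |= len << 16;
  *(u32 *)&entry->fields[*n_u64] = data_offset;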

Link: https://lkml.kernel.org/r/20230117152236.010941267@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Ching-lin Yu <chinglinyu@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace.h              |  4 ++
 kernel/trace/trace_events_hist.c  |  7 ++-
 kernel/trace/trace_events_synth.c | 80 ++++++++++++++++++++++++++++++-
 kernel/trace/trace_synth.h        |  1 +
 4 files changed, 87 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 4eb6d6b97a9f..d16929dd0f08 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -113,6 +113,10 @@ enum trace_type {
 #define MEM_FAIL(condition, fmt, ...)					\
 	DO_ONCE_LITE_IF(condition, pr_err, "ERROR: " fmt, ##__VA_ARGS__)
 
+#define HIST_STACKTRACE_DEPTH	16
+#define HIST_STACKTRACE_SIZE	(HIST_STACKTRACE_DEPTH * sizeof(unsigned long))
+#define HIST_STACKTRACE_SKIP	5
+
 /*
  * syscalls are special, and need special handling, this is why
  * they are not included in trace_entries.h
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index f8f67e17898a..2c0837ff2299 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -480,10 +480,6 @@ DEFINE_HIST_FIELD_FN(u8);
 #define for_each_hist_key_field(i, hist_data)	\
 	for ((i) = (hist_data)->n_vals; (i) < (hist_data)->n_fields; (i)++)
 
-#define HIST_STACKTRACE_DEPTH	16
-#define HIST_STACKTRACE_SIZE	(HIST_STACKTRACE_DEPTH * sizeof(unsigned long))
-#define HIST_STACKTRACE_SKIP	5
-
 #define HITCOUNT_IDX		0
 #define HIST_KEY_SIZE_MAX	(MAX_FILTER_STR_VAL + HIST_STACKTRACE_SIZE)
 
@@ -3869,6 +3865,9 @@ static int check_synth_field(struct synth_event *event,
 	    && field->is_dynamic)
 		return 0;
 
+	if (strstr(hist_field->type, "long[") && field->is_stack)
+		return 0;
+
 	if (strcmp(field->type, hist_field->type) != 0) {
 		if (field->size != hist_field->size ||
 		    (!field->is_string && field->is_signed != hist_field->is_signed))
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 9f79cd689b79..adb630633f31 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -173,6 +173,14 @@ static int synth_field_is_string(char *type)
 	return false;
 }
 
+static int synth_field_is_stack(char *type)
+{
+	if (strstr(type, "long[") != NULL)
+		return true;
+
+	return false;
+}
+
 static int synth_field_string_size(char *type)
 {
 	char buf[4], *end, *start;
@@ -248,6 +256,8 @@ static int synth_field_size(char *type)
 		size = sizeof(gfp_t);
 	else if (synth_field_is_string(type))
 		size = synth_field_string_size(type);
+	else if (synth_field_is_stack(type))
+		size = 0;
 
 	return size;
 }
@@ -292,6 +302,8 @@ static const char *synth_field_fmt(char *type)
 		fmt = "%x";
 	else if (synth_field_is_string(type))
 		fmt = "%.*s";
+	else if (synth_field_is_stack(type))
+		fmt = "%s";
 
 	return fmt;
 }
@@ -371,6 +383,23 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
 						 i == se->n_fields - 1 ? "" : " ");
 				n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
 			}
+		} else if (se->fields[i]->is_stack) {
+			u32 offset, data_offset, len;
+			unsigned long *p, *end;
+
+			offset = (u32)entry->fields[n_u64];
+			data_offset = offset & 0xffff;
+			len = offset >> 16;
+
+			p = (void *)entry + data_offset;
+			end = (void *)p + len - (sizeof(long) - 1);
+
+			trace_seq_printf(s, "%s=STACK:\n", se->fields[i]->name);
+
+			for (; *p && p < end; p++)
+				trace_seq_printf(s, "=> %pS\n", (void *)*p);
+			n_u64++;
+
 		} else {
 			struct trace_print_flags __flags[] = {
 			    __def_gfpflag_names, {-1, NULL} };
@@ -446,6 +475,43 @@ static unsigned int trace_string(struct synth_trace_event *entry,
 	return len;
 }
 
+static unsigned int trace_stack(struct synth_trace_event *entry,
+				 struct synth_event *event,
+				 long *stack,
+				 unsigned int data_size,
+				 unsigned int *n_u64)
+{
+	unsigned int len;
+	u32 data_offset;
+	void *data_loc;
+
+	data_offset = struct_size(entry, fields, event->n_u64);
+	data_offset += data_size;
+
+	for (len = 0; len < HIST_STACKTRACE_DEPTH; len++) {
+		if (!stack[len])
+			break;
+	}
+
+	/* Include the zero'd element if it fits */
+	if (len < HIST_STACKTRACE_DEPTH)
+		len++;
+
+	len *= sizeof(long);
+
+	/* Find the dynamic section to copy the stack into. */
+	data_loc = (void *)entry + data_offset;
+	memcpy(data_loc, stack, len);
+
+	/* Fill in the field that holds the offset/len combo */
+	data_offset |= len << 16;
+	*(u32 *)&entry->fields[*n_u64] = data_offset;
+
+	(*n_u64)++;
+
+	return len;
+}
+
 static notrace void trace_event_raw_event_synth(void *__data,
 						u64 *var_ref_vals,
 						unsigned int *var_ref_idx)
@@ -498,6 +564,12 @@ static notrace void trace_event_raw_event_synth(void *__data,
 					   event->fields[i]->is_dynamic,
 					   data_size, &n_u64);
 			data_size += len; /* only dynamic string increments */
+		} else if (event->fields[i]->is_stack) {
+			long *stack = (long *)(long)var_ref_vals[val_idx];
+
+			len = trace_stack(entry, event, stack,
+					   data_size, &n_u64);
+			data_size += len;
 		} else {
 			struct synth_field *field = event->fields[i];
 			u64 val = var_ref_vals[val_idx];
@@ -560,6 +632,9 @@ static int __set_synth_event_print_fmt(struct synth_event *event,
 		    event->fields[i]->is_dynamic)
 			pos += snprintf(buf + pos, LEN_OR_ZERO,
 				", __get_str(%s)", event->fields[i]->name);
+		else if (event->fields[i]->is_stack)
+			pos += snprintf(buf + pos, LEN_OR_ZERO,
+				", __get_stacktrace(%s)", event->fields[i]->name);
 		else
 			pos += snprintf(buf + pos, LEN_OR_ZERO,
 					", REC->%s", event->fields[i]->name);
@@ -696,7 +771,8 @@ static struct synth_field *parse_synth_field(int argc, char **argv,
 		ret = -EINVAL;
 		goto free;
 	} else if (size == 0) {
-		if (synth_field_is_string(field->type)) {
+		if (synth_field_is_string(field->type) ||
+		    synth_field_is_stack(field->type)) {
 			char *type;
 
 			len = sizeof("__data_loc ") + strlen(field->type) + 1;
@@ -727,6 +803,8 @@ static struct synth_field *parse_synth_field(int argc, char **argv,
 
 	if (synth_field_is_string(field->type))
 		field->is_string = true;
+	else if (synth_field_is_stack(field->type))
+		field->is_stack = true;
 
 	field->is_signed = synth_field_signed(field->type);
  out:
diff --git a/kernel/trace/trace_synth.h b/kernel/trace/trace_synth.h
index b29595fe3ac5..43f6fb6078db 100644
--- a/kernel/trace/trace_synth.h
+++ b/kernel/trace/trace_synth.h
@@ -18,6 +18,7 @@ struct synth_field {
 	bool is_signed;
 	bool is_string;
 	bool is_dynamic;
+	bool is_stack;
 };
 
 struct synth_event {
-- 
2.39.0


* [for-next][PATCH 08/12] tracing/histogram: Add stacktrace type
From: Steven Rostedt @ 2023-01-25 17:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Andrew Morton, Tom Zanussi, Ross Zwisler, Ching-lin Yu

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Now that stacktraces can be part of synthetic events, allow a key to be
typed as a stacktrace.

  # cd /sys/kernel/tracing
  # echo 's:block_lat u64 delta; unsigned long stack[];' >> dynamic_events
  # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace if prev_state == 2' >> events/sched/sched_switch/trigger
  # echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,st2=$st:onmatch(sched.sched_switch).trace(block_lat,$delta,$st2)' >> events/sched/sched_switch/trigger
  # echo 'hist:keys=delta.buckets=100,stack.stacktrace:sort=delta' > events/synthetic/block_lat/trigger

  # cat events/synthetic/block_lat/hist

  # event histogram
  #
  # trigger info: hist:keys=delta.buckets=100,stacktrace:vals=hitcount:sort=delta.buckets=100:size=2048 [active]
  #

  { delta: ~ 0-99, stacktrace:
           event_hist_trigger+0x464/0x480
           event_triggers_call+0x52/0xe0
           trace_event_buffer_commit+0x193/0x250
           trace_event_raw_event_sched_switch+0xfc/0x150
           __traceiter_sched_switch+0x41/0x60
           __schedule+0x448/0x7b0
           schedule_idle+0x26/0x40
           cpu_startup_entry+0x19/0x20
           start_secondary+0xed/0xf0
           secondary_startup_64_no_verify+0xe0/0xeb
  } hitcount:          6
  { delta: ~ 0-99, stacktrace:
           event_hist_trigger+0x464/0x480
           event_triggers_call+0x52/0xe0
           trace_event_buffer_commit+0x193/0x250
           trace_event_raw_event_sched_switch+0xfc/0x150
           __traceiter_sched_switch+0x41/0x60
           __schedule+0x448/0x7b0
           schedule_idle+0x26/0x40
           cpu_startup_entry+0x19/0x20
           __pfx_kernel_init+0x0/0x10
           arch_call_rest_init+0xa/0x24
           start_kernel+0x964/0x98d
           secondary_startup_64_no_verify+0xe0/0xeb
  } hitcount:          3
  { delta: ~ 0-99, stacktrace:
           event_hist_trigger+0x464/0x480
           event_triggers_call+0x52/0xe0
           trace_event_buffer_commit+0x193/0x250
           trace_event_raw_event_sched_switch+0xfc/0x150
           __traceiter_sched_switch+0x41/0x60
           __schedule+0x448/0x7b0
           schedule+0x5a/0xb0
           worker_thread+0xaf/0x380
           kthread+0xe9/0x110
           ret_from_fork+0x2c/0x50
  } hitcount:          1
  { delta: ~ 100-199, stacktrace:
           event_hist_trigger+0x464/0x480
           event_triggers_call+0x52/0xe0
           trace_event_buffer_commit+0x193/0x250
           trace_event_raw_event_sched_switch+0xfc/0x150
           __traceiter_sched_switch+0x41/0x60
           __schedule+0x448/0x7b0
           schedule_idle+0x26/0x40
           cpu_startup_entry+0x19/0x20
           start_secondary+0xed/0xf0
           secondary_startup_64_no_verify+0xe0/0xeb
  } hitcount:         15
  [..]
  { delta: ~ 8500-8599, stacktrace:
           event_hist_trigger+0x464/0x480
           event_triggers_call+0x52/0xe0
           trace_event_buffer_commit+0x193/0x250
           trace_event_raw_event_sched_switch+0xfc/0x150
           __traceiter_sched_switch+0x41/0x60
           __schedule+0x448/0x7b0
           schedule_idle+0x26/0x40
           cpu_startup_entry+0x19/0x20
           start_secondary+0xed/0xf0
           secondary_startup_64_no_verify+0xe0/0xeb
  } hitcount:          1

  Totals:
      Hits: 89
      Entries: 11
      Dropped: 0

Link: https://lkml.kernel.org/r/20230117152236.167046397@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Ching-lin Yu <chinglinyu@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace_events_hist.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 2c0837ff2299..7f3e6ca6f0fa 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1716,6 +1716,8 @@ static const char *get_hist_field_flags(struct hist_field *hist_field)
 		flags_str = "percent";
 	else if (hist_field->flags & HIST_FIELD_FL_GRAPH)
 		flags_str = "graph";
+	else if (hist_field->flags & HIST_FIELD_FL_STACKTRACE)
+		flags_str = "stacktrace";
 
 	return flags_str;
 }
@@ -2314,6 +2316,8 @@ parse_field(struct hist_trigger_data *hist_data, struct trace_event_file *file,
 			*flags |= HIST_FIELD_FL_EXECNAME;
 		else if (strcmp(modifier, "syscall") == 0)
 			*flags |= HIST_FIELD_FL_SYSCALL;
+		else if (strcmp(modifier, "stacktrace") == 0)
+			*flags |= HIST_FIELD_FL_STACKTRACE;
 		else if (strcmp(modifier, "log2") == 0)
 			*flags |= HIST_FIELD_FL_LOG2;
 		else if (strcmp(modifier, "usecs") == 0)
-- 
2.39.0


* [for-next][PATCH 09/12] tracing/histogram: Document variable stacktrace
From: Steven Rostedt @ 2023-01-25 17:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Andrew Morton, Tom Zanussi, Ross Zwisler, Ching-lin Yu

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Add a little documentation (and a useful example) of how a stacktrace can
be used within a histogram variable and synthetic event.

Link: https://lkml.kernel.org/r/20230117152236.320181354@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Ching-lin Yu <chinglinyu@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 Documentation/trace/histogram.rst | 129 ++++++++++++++++++++++++++++++
 1 file changed, 129 insertions(+)

diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst
index f95459aa984f..5c391328b9bb 100644
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -81,6 +81,7 @@ Documentation written by Tom Zanussi
 	.usecs         display a common_timestamp in microseconds
         .percent       display a number of percentage value
         .graph         display a bar-graph of a value
+	.stacktrace    display as a stacktrace (must be a long[] type)
 	=============  =================================================
 
   Note that in general the semantics of a given field aren't
@@ -1786,6 +1787,8 @@ or assigned to a variable and referenced in a subsequent expression::
   # echo 'hist:keys=next_pid:us_per_sec=1000000 ...' >> event/trigger
   # echo 'hist:keys=next_pid:timestamp_secs=common_timestamp/$us_per_sec ...' >> event/trigger
 
+Variables can even hold stacktraces, which are useful with synthetic events.
+
 2.2.2 Synthetic Events
 ----------------------
 
@@ -1940,6 +1943,132 @@ the ".buckets" modifier and specify a size (in this case groups of 10).
       Entries: 16
       Dropped: 0
 
+To save stacktraces, create a synthetic event with a field of type "unsigned long[]"
+or even just "long[]". For example, to see how long a task is blocked in an
+uninterruptible state:
+
+  # cd /sys/kernel/tracing
+  # echo 's:block_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events
+  # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace  if prev_state == 2' >> events/sched/sched_switch/trigger
+  # echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(block_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger
+  # echo 1 > events/synthetic/block_lat/enable
+  # cat trace
+
+  # tracer: nop
+  #
+  # entries-in-buffer/entries-written: 2/2   #P:8
+  #
+  #                                _-----=> irqs-off/BH-disabled
+  #                               / _----=> need-resched
+  #                              | / _---=> hardirq/softirq
+  #                              || / _--=> preempt-depth
+  #                              ||| / _-=> migrate-disable
+  #                              |||| /     delay
+  #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
+  #              | |         |   |||||     |         |
+            <idle>-0       [005] d..4.   521.164922: block_lat: pid=0 delta=8322 stack=STACK:
+  => __schedule+0x448/0x7b0
+  => schedule+0x5a/0xb0
+  => io_schedule+0x42/0x70
+  => bit_wait_io+0xd/0x60
+  => __wait_on_bit+0x4b/0x140
+  => out_of_line_wait_on_bit+0x91/0xb0
+  => jbd2_journal_commit_transaction+0x1679/0x1a70
+  => kjournald2+0xa9/0x280
+  => kthread+0xe9/0x110
+  => ret_from_fork+0x2c/0x50
+
+             <...>-2       [004] d..4.   525.184257: block_lat: pid=2 delta=76 stack=STACK:
+  => __schedule+0x448/0x7b0
+  => schedule+0x5a/0xb0
+  => schedule_timeout+0x11a/0x150
+  => wait_for_completion_killable+0x144/0x1f0
+  => __kthread_create_on_node+0xe7/0x1e0
+  => kthread_create_on_node+0x51/0x70
+  => create_worker+0xcc/0x1a0
+  => worker_thread+0x2ad/0x380
+  => kthread+0xe9/0x110
+  => ret_from_fork+0x2c/0x50
+
+A synthetic event that has a stacktrace field may use it as a key in a histogram:
+
+  # echo 'hist:delta.buckets=100,stack.stacktrace:sort=delta' > events/synthetic/block_lat/trigger
+  # cat events/synthetic/block_lat/hist
+
+  # event histogram
+  #
+  # trigger info: hist:keys=delta.buckets=100,stacktrace:vals=hitcount:sort=delta.buckets=100:size=2048 [active]
+  #
+
+  { delta: ~ 0-99, stacktrace:
+           event_hist_trigger+0x464/0x480
+           event_triggers_call+0x52/0xe0
+           trace_event_buffer_commit+0x193/0x250
+           trace_event_raw_event_sched_switch+0xfc/0x150
+           __traceiter_sched_switch+0x41/0x60
+           __schedule+0x448/0x7b0
+           schedule_idle+0x26/0x40
+           cpu_startup_entry+0x19/0x20
+           start_secondary+0xed/0xf0
+           secondary_startup_64_no_verify+0xe0/0xeb
+  } hitcount:          6
+  { delta: ~ 0-99, stacktrace:
+           event_hist_trigger+0x464/0x480
+           event_triggers_call+0x52/0xe0
+           trace_event_buffer_commit+0x193/0x250
+           trace_event_raw_event_sched_switch+0xfc/0x150
+           __traceiter_sched_switch+0x41/0x60
+           __schedule+0x448/0x7b0
+           schedule_idle+0x26/0x40
+           cpu_startup_entry+0x19/0x20
+           __pfx_kernel_init+0x0/0x10
+           arch_call_rest_init+0xa/0x24
+           start_kernel+0x964/0x98d
+           secondary_startup_64_no_verify+0xe0/0xeb
+  } hitcount:          3
+  { delta: ~ 0-99, stacktrace:
+           event_hist_trigger+0x464/0x480
+           event_triggers_call+0x52/0xe0
+           trace_event_buffer_commit+0x193/0x250
+           trace_event_raw_event_sched_switch+0xfc/0x150
+           __traceiter_sched_switch+0x41/0x60
+           __schedule+0x448/0x7b0
+           schedule+0x5a/0xb0
+           worker_thread+0xaf/0x380
+           kthread+0xe9/0x110
+           ret_from_fork+0x2c/0x50
+  } hitcount:          1
+  { delta: ~ 100-199, stacktrace:
+           event_hist_trigger+0x464/0x480
+           event_triggers_call+0x52/0xe0
+           trace_event_buffer_commit+0x193/0x250
+           trace_event_raw_event_sched_switch+0xfc/0x150
+           __traceiter_sched_switch+0x41/0x60
+           __schedule+0x448/0x7b0
+           schedule_idle+0x26/0x40
+           cpu_startup_entry+0x19/0x20
+           start_secondary+0xed/0xf0
+           secondary_startup_64_no_verify+0xe0/0xeb
+  } hitcount:         15
+  [..]
+  { delta: ~ 8500-8599, stacktrace:
+           event_hist_trigger+0x464/0x480
+           event_triggers_call+0x52/0xe0
+           trace_event_buffer_commit+0x193/0x250
+           trace_event_raw_event_sched_switch+0xfc/0x150
+           __traceiter_sched_switch+0x41/0x60
+           __schedule+0x448/0x7b0
+           schedule_idle+0x26/0x40
+           cpu_startup_entry+0x19/0x20
+           start_secondary+0xed/0xf0
+           secondary_startup_64_no_verify+0xe0/0xeb
+  } hitcount:          1
+
+  Totals:
+      Hits: 89
+      Entries: 11
+      Dropped: 0
+
 2.2.3 Hist trigger 'handlers' and 'actions'
 -------------------------------------------
 
-- 
2.39.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [for-next][PATCH 10/12] tracing/histogram: Add simple tests for stacktrace usage of synthetic events
  2023-01-25 17:12 [for-next][PATCH 00/12] tracing: Updates for 6.3 Steven Rostedt
                   ` (8 preceding siblings ...)
  2023-01-25 17:13 ` [for-next][PATCH 09/12] tracing/histogram: Document variable stacktrace Steven Rostedt
@ 2023-01-25 17:13 ` Steven Rostedt
  2023-01-25 17:13 ` [for-next][PATCH 11/12] perf/tracing: Use stage6 of tracing to not duplicate macros Steven Rostedt
  2023-01-25 17:13 ` [for-next][PATCH 12/12] bpf/tracing: " Steven Rostedt
  11 siblings, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2023-01-25 17:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Andrew Morton, Tom Zanussi, Ross Zwisler, Ching-lin Yu

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Update the selftests to include a test of passing a stacktrace between the
events of a synthetic event.
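
The new test can be run on its own through the ftrace selftest runner;
a sketch of the invocation (run as root from the kernel source tree):

  # cd tools/testing/selftests/ftrace
  # ./ftracetest test.d/trigger/inter-event/trigger-synthetic-event-stack.tc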

Link: https://lkml.kernel.org/r/20230117152236.475439286@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Ching-lin Yu <chinglinyu@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace.c                          |  2 +-
 .../trigger-synthetic-event-stack.tc          | 24 +++++++++++++++++++
 .../trigger-synthetic-event-syntax.tc         |  6 +++++
 3 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-stack.tc

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 78ed5f1baa8c..b90eecd27dfc 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5757,7 +5757,7 @@ static const char readme_msg[] =
 #ifdef CONFIG_SYNTH_EVENTS
 	"  events/synthetic_events\t- Create/append/remove/show synthetic events\n"
 	"\t  Write into this file to define/undefine new synthetic events.\n"
-	"\t     example: echo 'myevent u64 lat; char name[]' >> synthetic_events\n"
+	"\t     example: echo 'myevent u64 lat; char name[]; long[] stack' >> synthetic_events\n"
 #endif
 #endif
 ;
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-stack.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-stack.tc
new file mode 100644
index 000000000000..755dbe94ccf4
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-stack.tc
@@ -0,0 +1,24 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: event trigger - test inter-event histogram trigger trace action with stacktrace
+# requires: set_event synthetic_events events/sched/sched_process_exec/hist "long[]' >> synthetic_events":README
+
+fail() { #msg
+    echo $1
+    exit_fail
+}
+
+echo "Test create synthetic event with stack"
+
+
+echo 's:wake_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events
+echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace  if prev_state == 1||prev_state == 2' >> events/sched/sched_switch/trigger
+echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(wake_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger
+echo 1 > events/synthetic/wake_lat/enable
+sleep 1
+
+if ! grep -q "=>.*sched" trace; then
+    fail "Failed to create synthetic event with stack"
+fi
+
+exit 0
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-syntax.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-syntax.tc
index 2968cdc7df30..366f1f3ad906 100644
--- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-syntax.tc
+++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-synthetic-event-syntax.tc
@@ -70,6 +70,12 @@ grep "myevent[[:space:]]unsigned long var" synthetic_events
 echo "myevent char var[10]" > synthetic_events
 grep "myevent[[:space:]]char\[10\] var" synthetic_events
 
+if grep -q 'long\[\]' README; then
+  # test stacktrace type
+  echo "myevent unsigned long[] var" > synthetic_events
+  grep "myevent[[:space:]]unsigned long\[\] var" synthetic_events
+fi
+
 do_reset
 
 exit 0
-- 
2.39.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [for-next][PATCH 11/12] perf/tracing: Use stage6 of tracing to not duplicate macros
  2023-01-25 17:12 [for-next][PATCH 00/12] tracing: Updates for 6.3 Steven Rostedt
                   ` (9 preceding siblings ...)
  2023-01-25 17:13 ` [for-next][PATCH 10/12] tracing/histogram: Add simple tests for stacktrace usage of synthetic events Steven Rostedt
@ 2023-01-25 17:13 ` Steven Rostedt
  2023-01-25 17:13 ` [for-next][PATCH 12/12] bpf/tracing: " Steven Rostedt
  11 siblings, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2023-01-25 17:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Andrew Morton, Frederic Weisbecker,
	Arnaldo Carvalho de Melo, Peter Zijlstra (Intel),
	Linyu Yuan

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

The perf events are created by the same macro magic as tracefs trace
events are. But to hook into perf, it has its own code, which duplicates
many of the same macros as the tracefs macros. This is an issue because
the duplicate copy misses bug fixes as well as any new enhancements that
come with the other trace macros.

As the trace macros have been put into their own staging files, have perf
take advantage of this and use the tracefs stage 6 macros that the "fast
assign" portion of the trace event macro uses.

Link: https://lkml.kernel.org/r/20230124202515.716458410@goodmis.org
Link: https://lore.kernel.org/lkml/1671181385-5719-1-git-send-email-quic_linyyuan@quicinc.com/

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reported-by: Linyu Yuan <quic_linyyuan@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 include/trace/perf.h                         | 46 +-------------------
 include/trace/stages/stage6_event_callback.h |  3 ++
 2 files changed, 4 insertions(+), 45 deletions(-)

diff --git a/include/trace/perf.h b/include/trace/perf.h
index 8f3bf1e17707..2c11181c82e0 100644
--- a/include/trace/perf.h
+++ b/include/trace/perf.h
@@ -4,51 +4,7 @@
 
 #ifdef CONFIG_PERF_EVENTS
 
-#undef __entry
-#define __entry entry
-
-#undef __get_dynamic_array
-#define __get_dynamic_array(field)	\
-		((void *)__entry + (__entry->__data_loc_##field & 0xffff))
-
-#undef __get_dynamic_array_len
-#define __get_dynamic_array_len(field)	\
-		((__entry->__data_loc_##field >> 16) & 0xffff)
-
-#undef __get_str
-#define __get_str(field) ((char *)__get_dynamic_array(field))
-
-#undef __get_bitmask
-#define __get_bitmask(field) (char *)__get_dynamic_array(field)
-
-#undef __get_cpumask
-#define __get_cpumask(field) (char *)__get_dynamic_array(field)
-
-#undef __get_sockaddr
-#define __get_sockaddr(field) ((struct sockaddr *)__get_dynamic_array(field))
-
-#undef __get_rel_dynamic_array
-#define __get_rel_dynamic_array(field)	\
-		((void *)__entry +					\
-		 offsetof(typeof(*__entry), __rel_loc_##field) +	\
-		 sizeof(__entry->__rel_loc_##field) +			\
-		 (__entry->__rel_loc_##field & 0xffff))
-
-#undef __get_rel_dynamic_array_len
-#define __get_rel_dynamic_array_len(field)	\
-		((__entry->__rel_loc_##field >> 16) & 0xffff)
-
-#undef __get_rel_str
-#define __get_rel_str(field) ((char *)__get_rel_dynamic_array(field))
-
-#undef __get_rel_bitmask
-#define __get_rel_bitmask(field) (char *)__get_rel_dynamic_array(field)
-
-#undef __get_rel_cpumask
-#define __get_rel_cpumask(field) (char *)__get_rel_dynamic_array(field)
-
-#undef __get_rel_sockaddr
-#define __get_rel_sockaddr(field) ((struct sockaddr *)__get_rel_dynamic_array(field))
+#include "stages/stage6_event_callback.h"
 
 #undef __perf_count
 #define __perf_count(c)	(__count = (c))
diff --git a/include/trace/stages/stage6_event_callback.h b/include/trace/stages/stage6_event_callback.h
index 49c32394b53f..919b1a4da980 100644
--- a/include/trace/stages/stage6_event_callback.h
+++ b/include/trace/stages/stage6_event_callback.h
@@ -2,6 +2,9 @@
 
 /* Stage 6 definitions for creating trace events */
 
+/* Reuse some of the stage 3 macros */
+#include "stage3_trace_output.h"
+
 #undef __entry
 #define __entry entry
 
-- 
2.39.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [for-next][PATCH 12/12] bpf/tracing: Use stage6 of tracing to not duplicate macros
  2023-01-25 17:12 [for-next][PATCH 00/12] tracing: Updates for 6.3 Steven Rostedt
                   ` (10 preceding siblings ...)
  2023-01-25 17:13 ` [for-next][PATCH 11/12] perf/tracing: Use stage6 of tracing to not duplicate macros Steven Rostedt
@ 2023-01-25 17:13 ` Steven Rostedt
  11 siblings, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2023-01-25 17:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Masami Hiramatsu, Andrew Morton, bpf, Peter Zijlstra,
	Alexei Starovoitov, Daniel Borkmann, Linyu Yuan

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

The bpf events are created by the same macro magic as tracefs trace
events are. But to hook into bpf, it has its own code, which duplicates
many of the same macros as the tracefs macros. This is an issue because
the duplicate copy misses bug fixes as well as any new enhancements that
come with the other trace macros.

As the trace macros have been put into their own staging files, have bpf
take advantage of this and use the tracefs stage 6 macros that the "fast
ssign" portion of the trace event macro uses.

Link: https://lkml.kernel.org/r/20230124202515.873075730@goodmis.org
Link: https://lore.kernel.org/lkml/1671181385-5719-1-git-send-email-quic_linyyuan@quicinc.com/

Cc: bpf@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Reported-by: Linyu Yuan <quic_linyyuan@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 include/trace/bpf_probe.h | 45 +--------------------------------------
 1 file changed, 1 insertion(+), 44 deletions(-)

diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index 155c495b89ea..1f7fc1fc590c 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -4,50 +4,7 @@
 
 #ifdef CONFIG_BPF_EVENTS
 
-#undef __entry
-#define __entry entry
-
-#undef __get_dynamic_array
-#define __get_dynamic_array(field)	\
-		((void *)__entry + (__entry->__data_loc_##field & 0xffff))
-
-#undef __get_dynamic_array_len
-#define __get_dynamic_array_len(field)	\
-		((__entry->__data_loc_##field >> 16) & 0xffff)
-
-#undef __get_str
-#define __get_str(field) ((char *)__get_dynamic_array(field))
-
-#undef __get_bitmask
-#define __get_bitmask(field) (char *)__get_dynamic_array(field)
-
-#undef __get_cpumask
-#define __get_cpumask(field) (char *)__get_dynamic_array(field)
-
-#undef __get_sockaddr
-#define __get_sockaddr(field) ((struct sockaddr *)__get_dynamic_array(field))
-
-#undef __get_rel_dynamic_array
-#define __get_rel_dynamic_array(field)	\
-		((void *)(&__entry->__rel_loc_##field) +	\
-		 sizeof(__entry->__rel_loc_##field) +		\
-		 (__entry->__rel_loc_##field & 0xffff))
-
-#undef __get_rel_dynamic_array_len
-#define __get_rel_dynamic_array_len(field)	\
-		((__entry->__rel_loc_##field >> 16) & 0xffff)
-
-#undef __get_rel_str
-#define __get_rel_str(field) ((char *)__get_rel_dynamic_array(field))
-
-#undef __get_rel_bitmask
-#define __get_rel_bitmask(field) (char *)__get_rel_dynamic_array(field)
-
-#undef __get_rel_cpumask
-#define __get_rel_cpumask(field) (char *)__get_rel_dynamic_array(field)
-
-#undef __get_rel_sockaddr
-#define __get_rel_sockaddr(field) ((struct sockaddr *)__get_rel_dynamic_array(field))
+#include "stages/stage6_event_callback.h"
 
 #undef __perf_count
 #define __perf_count(c)	(c)
-- 
2.39.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-01-25 17:14 UTC | newest]

Thread overview: 13+ messages
2023-01-25 17:12 [for-next][PATCH 00/12] tracing: Updates for 6.3 Steven Rostedt
2023-01-25 17:12 ` [for-next][PATCH 01/12] tracing: Add a way to filter function addresses to function names Steven Rostedt
2023-01-25 17:12 ` [for-next][PATCH 02/12] tracing/selftests: Add test for event filtering on function name Steven Rostedt
2023-01-25 17:12 ` [for-next][PATCH 03/12] ftrace: Add sample with custom ops Steven Rostedt
2023-01-25 17:12 ` [for-next][PATCH 04/12] tracing: Add NULL checks for buffer in ring_buffer_free_read_page() Steven Rostedt
2023-01-25 17:12 ` [for-next][PATCH 05/12] tracing: Simplify calculating entry size using struct_size() Steven Rostedt
2023-01-25 17:12 ` [for-next][PATCH 06/12] tracing: Allow stacktraces to be saved as histogram variables Steven Rostedt
2023-01-25 17:12 ` [for-next][PATCH 07/12] tracing: Allow synthetic events to pass around stacktraces Steven Rostedt
2023-01-25 17:13 ` [for-next][PATCH 08/12] tracing/histogram: Add stacktrace type Steven Rostedt
2023-01-25 17:13 ` [for-next][PATCH 09/12] tracing/histogram: Document variable stacktrace Steven Rostedt
2023-01-25 17:13 ` [for-next][PATCH 10/12] tracing/histogram: Add simple tests for stacktrace usage of synthetic events Steven Rostedt
2023-01-25 17:13 ` [for-next][PATCH 11/12] perf/tracing: Use stage6 of tracing to not duplicate macros Steven Rostedt
2023-01-25 17:13 ` [for-next][PATCH 12/12] bpf/tracing: " Steven Rostedt
