linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] trace: introduce trace event injection
@ 2019-05-25 16:57 Cong Wang
  2019-05-25 16:57 ` [PATCH 1/4] trace: fold type initialization into tracing_generic_entry_update() Cong Wang
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Cong Wang @ 2019-05-25 16:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: Cong Wang, Steven Rostedt, Ingo Molnar

This patchset introduces trace event injection, the first 3 patches
are some trivial prerequisites, the last one implements the trace
event injection. Please check each patch for details.

I have tested them with both valid and invalid input on different
tracepoints, it works as expected.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

---

Cong Wang (4):
  trace: fold type initialization into tracing_generic_entry_update()
  trace: let filter_assign_type() detect FILTER_PTR_STRING
  trace: make trace_get_fields() global
  trace: introduce trace event injection

 include/linux/trace_events.h       |   9 +
 kernel/trace/Makefile              |   1 +
 kernel/trace/trace.c               |   8 +-
 kernel/trace/trace.h               |   1 +
 kernel/trace/trace_event_perf.c    |   3 +-
 kernel/trace/trace_events.c        |  12 +-
 kernel/trace/trace_events_filter.c |   3 +
 kernel/trace/trace_events_inject.c | 330 +++++++++++++++++++++++++++++
 8 files changed, 353 insertions(+), 14 deletions(-)
 create mode 100644 kernel/trace/trace_events_inject.c

-- 
2.21.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/4] trace: fold type initialization into tracing_generic_entry_update()
  2019-05-25 16:57 [PATCH 0/4] trace: introduce trace event injection Cong Wang
@ 2019-05-25 16:57 ` Cong Wang
  2019-05-25 16:58 ` [PATCH 2/4] trace: let filter_assign_type() detect FILTER_PTR_STRING Cong Wang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Cong Wang @ 2019-05-25 16:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: Cong Wang, Steven Rostedt, Ingo Molnar

All callers of tracing_generic_entry_update() have to initialize
entry->type, so let's just simply move it inside.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/linux/trace_events.h    | 1 +
 kernel/trace/trace.c            | 8 ++++----
 kernel/trace/trace_event_perf.c | 3 +--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8a62731673f7..5c6f2a6c8cd2 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -142,6 +142,7 @@ enum print_line_t {
 enum print_line_t trace_handle_return(struct trace_seq *s);
 
 void tracing_generic_entry_update(struct trace_entry *entry,
+				  unsigned short type,
 				  unsigned long flags,
 				  int pc);
 struct trace_event_file;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2c92b3d9ea30..0e0e0e726170 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -743,8 +743,7 @@ trace_event_setup(struct ring_buffer_event *event,
 {
 	struct trace_entry *ent = ring_buffer_event_data(event);
 
-	tracing_generic_entry_update(ent, flags, pc);
-	ent->type = type;
+	tracing_generic_entry_update(ent, type, flags, pc);
 }
 
 static __always_inline struct ring_buffer_event *
@@ -2312,13 +2311,14 @@ enum print_line_t trace_handle_return(struct trace_seq *s)
 EXPORT_SYMBOL_GPL(trace_handle_return);
 
 void
-tracing_generic_entry_update(struct trace_entry *entry, unsigned long flags,
-			     int pc)
+tracing_generic_entry_update(struct trace_entry *entry, unsigned short type,
+			     unsigned long flags, int pc)
 {
 	struct task_struct *tsk = current;
 
 	entry->preempt_count		= pc & 0xff;
 	entry->pid			= (tsk) ? tsk->pid : 0;
+	entry->type			= type;
 	entry->flags =
 #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
 		(irqs_disabled_flags(flags) ? TRACE_FLAG_IRQS_OFF : 0) |
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 4629a6104474..0892e38ed6fb 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -416,8 +416,7 @@ void perf_trace_buf_update(void *record, u16 type)
 	unsigned long flags;
 
 	local_save_flags(flags);
-	tracing_generic_entry_update(entry, flags, pc);
-	entry->type = type;
+	tracing_generic_entry_update(entry, type, flags, pc);
 }
 NOKPROBE_SYMBOL(perf_trace_buf_update);
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/4] trace: let filter_assign_type() detect FILTER_PTR_STRING
  2019-05-25 16:57 [PATCH 0/4] trace: introduce trace event injection Cong Wang
  2019-05-25 16:57 ` [PATCH 1/4] trace: fold type initialization into tracing_generic_entry_update() Cong Wang
@ 2019-05-25 16:58 ` Cong Wang
  2019-05-25 16:58 ` [PATCH 3/4] trace: make trace_get_fields() global Cong Wang
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Cong Wang @ 2019-05-25 16:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: Cong Wang, Steven Rostedt, Ingo Molnar

filter_assign_type() could detect dynamic string and static
string, but not string pointers. Teach filter_assign_type()
to detect string pointers, and this will be needed by trace
event injection code.

BTW, trace event hist uses FILTER_PTR_STRING too.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 kernel/trace/trace_events_filter.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index d3e59312ef40..550e8a0d048a 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -1080,6 +1080,9 @@ int filter_assign_type(const char *type)
 	if (strchr(type, '[') && strstr(type, "char"))
 		return FILTER_STATIC_STRING;
 
+	if (strcmp(type, "char *") == 0 || strcmp(type, "const char *") == 0)
+		return FILTER_PTR_STRING;
+
 	return FILTER_OTHER;
 }
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/4] trace: make trace_get_fields() global
  2019-05-25 16:57 [PATCH 0/4] trace: introduce trace event injection Cong Wang
  2019-05-25 16:57 ` [PATCH 1/4] trace: fold type initialization into tracing_generic_entry_update() Cong Wang
  2019-05-25 16:58 ` [PATCH 2/4] trace: let filter_assign_type() detect FILTER_PTR_STRING Cong Wang
@ 2019-05-25 16:58 ` Cong Wang
  2019-05-25 16:58 ` [PATCH 4/4] trace: introduce trace event injection Cong Wang
  2019-05-25 22:37 ` [PATCH 0/4] " Steven Rostedt
  4 siblings, 0 replies; 12+ messages in thread
From: Cong Wang @ 2019-05-25 16:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: Cong Wang, Steven Rostedt, Ingo Molnar

trace_get_fields() is the only way to read tracepoint fields at
run time, as their fields are defined at compile-time with macros.
Make this function visible to all users and it will be used by
trace event injection code to calculate the size of a tracepoint
entry.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/linux/trace_events.h | 8 ++++++++
 kernel/trace/trace_events.c  | 8 --------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 5c6f2a6c8cd2..5150436783e8 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -318,6 +318,14 @@ trace_event_name(struct trace_event_call *call)
 		return call->name;
 }
 
+static inline struct list_head *
+trace_get_fields(struct trace_event_call *event_call)
+{
+	if (!event_call->class->get_fields)
+		return &event_call->class->fields;
+	return event_call->class->get_fields(event_call);
+}
+
 struct trace_array;
 struct trace_subsystem_dir;
 
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 0ce3db67f556..98df044d34ce 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -70,14 +70,6 @@ static int system_refcount_dec(struct event_subsystem *system)
 #define while_for_each_event_file()		\
 	}
 
-static struct list_head *
-trace_get_fields(struct trace_event_call *event_call)
-{
-	if (!event_call->class->get_fields)
-		return &event_call->class->fields;
-	return event_call->class->get_fields(event_call);
-}
-
 static struct ftrace_event_field *
 __find_event_field(struct list_head *head, char *name)
 {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/4] trace: introduce trace event injection
  2019-05-25 16:57 [PATCH 0/4] trace: introduce trace event injection Cong Wang
                   ` (2 preceding siblings ...)
  2019-05-25 16:58 ` [PATCH 3/4] trace: make trace_get_fields() global Cong Wang
@ 2019-05-25 16:58 ` Cong Wang
  2019-07-09 19:51   ` Steven Rostedt
  2019-05-25 22:37 ` [PATCH 0/4] " Steven Rostedt
  4 siblings, 1 reply; 12+ messages in thread
From: Cong Wang @ 2019-05-25 16:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: Cong Wang, Steven Rostedt, Ingo Molnar

We have been trying to use rasdaemon to monitor hardware errors like
correctable memory errors. rasdaemon uses trace events to monitor
various hardware errors. In order to test it, we have to inject some
hardware errors, unfortunately not all of them provide error
injections. MCE does provide a way to inject MCE errors, but errors
like PCI error and devlink error don't, it is not easy to add error
injection to each of them. Instead, it is relatively easier to just
allow users to inject trace events in a generic way so that all trace
events can be injected.

This patch introduces trace event injection, where a new 'inject' is
added to each tracepoint directory. Users could write into this file
with key=value pairs to specify the value of each fields of the trace
event, all unspecified fields are set to zero values by default.

For example, for the net/net_dev_queue tracepoint, we can inject:

  INJECT=/sys/kernel/debug/tracing/events/net/net_dev_queue/inject
  echo "" > $INJECT
  echo "name='test'" > $INJECT
  echo "name='test' len=1024" > $INJECT
  cat /sys/kernel/debug/tracing/trace
  ...
   <...>-614   [000] ....    36.571483: net_dev_queue: dev= skbaddr=00000000fbf338c2 len=0
   <...>-614   [001] ....   136.588252: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=0
   <...>-614   [001] .N..   208.431878: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=1024

Triggers could be triggered as usual too:

  echo "stacktrace if len == 1025" > /sys/kernel/debug/tracing/events/net/net_dev_queue/trigger
  echo "len=1025" > $INJECT
  cat /sys/kernel/debug/tracing/trace
  ...
      bash-614   [000] ....    36.571483: net_dev_queue: dev= skbaddr=00000000fbf338c2 len=0
      bash-614   [001] ....   136.588252: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=0
      bash-614   [001] .N..   208.431878: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=1024
      bash-614   [001] .N.1   284.236349: <stack trace>
 => event_inject_write
 => vfs_write
 => ksys_write
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe

The only thing that can't be injected is string pointers as they
require constant string pointers, this can't be done at run time.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 kernel/trace/Makefile              |   1 +
 kernel/trace/trace.h               |   1 +
 kernel/trace/trace_events.c        |   4 +
 kernel/trace/trace_events_inject.c | 330 +++++++++++++++++++++++++++++
 4 files changed, 336 insertions(+)
 create mode 100644 kernel/trace/trace_events_inject.c

diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index c2b2148bb1d2..3c7bbacf4c18 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -69,6 +69,7 @@ obj-$(CONFIG_EVENT_TRACING) += trace_event_perf.o
 endif
 obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o
 obj-$(CONFIG_EVENT_TRACING) += trace_events_trigger.o
+obj-$(CONFIG_EVENT_TRACING) += trace_events_inject.o
 obj-$(CONFIG_HIST_TRIGGERS) += trace_events_hist.o
 obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o
 obj-$(CONFIG_KPROBE_EVENTS) += trace_kprobe.o
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 1974ce818ddb..69b5ce0ad597 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1583,6 +1583,7 @@ extern struct list_head ftrace_events;
 
 extern const struct file_operations event_trigger_fops;
 extern const struct file_operations event_hist_fops;
+extern const struct file_operations event_inject_fops;
 
 #ifdef CONFIG_HIST_TRIGGERS
 extern int register_trigger_hist_cmd(void);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 98df044d34ce..d23d6d6685e7 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2018,6 +2018,10 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
 	trace_create_file("format", 0444, file->dir, call,
 			  &ftrace_event_format_fops);
 
+	if (call->event.type && call->class->reg)
+		trace_create_file("inject", 0200, file->dir, file,
+				  &event_inject_fops);
+
 	return 0;
 }
 
diff --git a/kernel/trace/trace_events_inject.c b/kernel/trace/trace_events_inject.c
new file mode 100644
index 000000000000..cdd7db7f2724
--- /dev/null
+++ b/kernel/trace/trace_events_inject.c
@@ -0,0 +1,330 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * trace_events_inject - trace event injection
+ *
+ * Copyright (C) 2019 Cong Wang <cwang@twitter.com>
+ */
+
+#include <linux/module.h>
+#include <linux/ctype.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/rculist.h>
+
+#include "trace.h"
+
+static int
+trace_inject_entry(struct trace_event_file *file, void *rec, int len)
+{
+	struct trace_event_buffer fbuffer;
+	struct ring_buffer *buffer;
+	int written = 0;
+	void *entry;
+
+	buffer = file->tr->trace_buffer.buffer;
+	ring_buffer_nest_start(buffer);
+
+	entry = trace_event_buffer_reserve(&fbuffer, file, len);
+	if (entry) {
+		memcpy(entry, rec, len);
+		written = len;
+		trace_event_buffer_commit(&fbuffer);
+	}
+
+	ring_buffer_nest_end(buffer);
+	return written;
+}
+
+static int
+parse_field(char *str, struct trace_event_call *call,
+	    struct ftrace_event_field **pf, u64 *pv)
+{
+	struct ftrace_event_field *field;
+	char *field_name;
+	int s, i = 0;
+	char *p;
+	int len;
+	u64 val;
+
+	if (!str[i])
+		return 0;
+	/* First find the field to associate to */
+	while (isspace(str[i]))
+		i++;
+	s = i;
+	while (isalnum(str[i]) || str[i] == '_')
+		i++;
+	len = i - s;
+	if (!len)
+		return -EINVAL;
+
+	field_name = kmemdup_nul(str + s, len, GFP_KERNEL);
+	if (!field_name)
+		return -ENOMEM;
+	field = trace_find_event_field(call, field_name);
+	kfree(field_name);
+	if (!field)
+		return -ENOENT;
+
+	*pf = field;
+	p = strchr(str + i, '=');
+	if (!p)
+		return -EINVAL;
+	i = p + 1 - str;
+	while (isspace(str[i]))
+		i++;
+	s = i;
+	if (isdigit(str[i]) || str[i] == '-') {
+		char num_buf[24];	/* Big enough to hold an address */
+		int ret;
+
+		/* Make sure the field is not a string */
+		if (is_string_field(field))
+			return -EINVAL;
+
+		if (str[i] == '-')
+			i++;
+
+		/* We allow 0xDEADBEEF */
+		while (isalnum(str[i]))
+			i++;
+
+		len = i - s;
+		/* 0xfeedfacedeadbeef is 18 chars max */
+		if (len >= sizeof(num_buf))
+			return -EINVAL;
+
+		strncpy(num_buf, str + s, len);
+		num_buf[len] = 0;
+
+		/* Make sure it is a value */
+		if (field->is_signed)
+			ret = kstrtoll(num_buf, 0, &val);
+		else
+			ret = kstrtoull(num_buf, 0, &val);
+		if (ret)
+			return ret;
+
+		*pv = val;
+		return i;
+	} else if (str[i] == '\'' || str[i] == '"') {
+		char q = str[i];
+
+		/* Make sure the field is OK for strings */
+		if (!is_string_field(field))
+			return -EINVAL;
+
+		for (i++; str[i]; i++) {
+			if (str[i] == q)
+				break;
+		}
+		if (!str[i])
+			return -EINVAL;
+
+		/* Skip quotes */
+		s++;
+		len = i - s;
+		if (len >= MAX_FILTER_STR_VAL)
+			return -EINVAL;
+
+		*pv = (unsigned long)(str + s);
+		str[i] = 0;
+		/* go past the last quote */
+		i++;
+		return i;
+	}
+
+	return -EINVAL;
+}
+
+static int trace_get_entry_size(struct trace_event_call *call)
+{
+	struct ftrace_event_field *field;
+	struct list_head *head;
+	int size = 0;
+
+	head = trace_get_fields(call);
+	list_for_each_entry(field, head, link)
+		if (field->size + field->offset > size)
+			size = field->size + field->offset;
+
+	return size;
+}
+
+static void *trace_alloc_entry(struct trace_event_call *call, int *size)
+{
+	int entry_size = trace_get_entry_size(call);
+	struct ftrace_event_field *field;
+	struct list_head *head;
+	void *entry = NULL;
+
+	/* We need an extra '\0' at the end. */
+	entry = kzalloc(entry_size + 1, GFP_KERNEL);
+	if (!entry)
+		return NULL;
+
+	head = trace_get_fields(call);
+	list_for_each_entry(field, head, link) {
+		if (!is_string_field(field))
+			continue;
+		if (field->filter_type == FILTER_STATIC_STRING)
+			continue;
+		if (field->filter_type == FILTER_DYN_STRING) {
+			u32 *str_item;
+			int str_len = 0;
+			int str_loc = entry_size & 0xffff;
+
+			str_item = (u32 *)(entry + field->offset);
+			*str_item = (str_len << 16) | str_loc;
+		} else {
+			char **paddr;
+
+			paddr = (char **)(entry + field->offset);
+			*paddr = "";
+		}
+	}
+
+	*size = entry_size + 1;
+	return entry;
+}
+
+#define INJECT_STRING "STATIC STRING CAN NOT BE INJECTED"
+
+static int parse_entry(char *str, struct trace_event_call *call, void **pentry)
+{
+	struct ftrace_event_field *field;
+	unsigned long irq_flags;
+	void *entry = NULL;
+	int entry_size;
+	u64 val;
+	int len;
+
+	entry = trace_alloc_entry(call, &entry_size);
+	*pentry = entry;
+	if (!entry)
+		return -ENOMEM;
+
+	local_save_flags(irq_flags);
+	tracing_generic_entry_update(entry, call->event.type, irq_flags,
+				     preempt_count());
+
+	while ((len = parse_field(str, call, &field, &val)) > 0) {
+		if (is_function_field(field))
+			return -EINVAL;
+
+		if (is_string_field(field)) {
+			char *addr = (char *)(unsigned long) val;
+
+			if (field->filter_type == FILTER_STATIC_STRING) {
+				strlcpy(entry + field->offset, addr, field->size);
+			} else if (field->filter_type == FILTER_DYN_STRING) {
+				u32 *str_item;
+				int str_len = strlen(addr) + 1;
+				int str_loc = entry_size & 0xffff;
+
+				entry_size += str_len;
+				*pentry = krealloc(entry, entry_size, GFP_KERNEL);
+				entry = *pentry;
+				if (!entry)
+					return -ENOMEM;
+
+				strlcpy(entry + (entry_size - str_len), addr, str_len);
+				str_item = (u32 *)(entry + field->offset);
+				*str_item = (str_len << 16) | str_loc;
+			} else {
+				char **paddr;
+
+				/* TODO: can we find the constant string? */
+				paddr = (char **)(entry + field->offset);
+				*paddr = INJECT_STRING;
+			}
+		} else {
+			switch (field->size) {
+			case 1: {
+				u8 tmp = (u8) val;
+
+				memcpy(entry + field->offset, &tmp, 1);
+				break;
+			}
+			case 2: {
+				u16 tmp = (u16) val;
+
+				memcpy(entry + field->offset, &tmp, 2);
+				break;
+			}
+			case 4: {
+				u32 tmp = (u32) val;
+
+				memcpy(entry + field->offset, &tmp, 4);
+				break;
+			}
+			case 8:
+				memcpy(entry + field->offset, &val, 8);
+				break;
+			default:
+				return -EINVAL;
+			}
+		}
+
+		str += len;
+	}
+
+	if (len < 0)
+		return len;
+
+	return entry_size;
+}
+
+static ssize_t
+event_inject_write(struct file *filp, const char __user *ubuf, size_t cnt,
+		   loff_t *ppos)
+{
+	struct trace_event_call *call;
+	struct trace_event_file *file;
+	int err = -ENODEV, size;
+	void *entry = NULL;
+	char *buf;
+
+	if (cnt >= PAGE_SIZE)
+		return -EINVAL;
+
+	buf = memdup_user_nul(ubuf, cnt);
+	if (IS_ERR(buf))
+		return PTR_ERR(buf);
+	strim(buf);
+
+	mutex_lock(&event_mutex);
+	file = event_file_data(filp);
+	if (file) {
+		call = file->event_call;
+		size = parse_entry(buf, call, &entry);
+		if (size < 0)
+			err = size;
+		else
+			err = trace_inject_entry(file, entry, size);
+	}
+	mutex_unlock(&event_mutex);
+
+	kfree(entry);
+	kfree(buf);
+
+	if (err < 0)
+		return err;
+
+	*ppos += err;
+	return cnt;
+}
+
+static ssize_t
+event_inject_read(struct file *file, char __user *buf, size_t size,
+		  loff_t *ppos)
+{
+	return -EPERM;
+}
+
+const struct file_operations event_inject_fops = {
+	.open = tracing_open_generic,
+	.read = event_inject_read,
+	.write = event_inject_write,
+};
+
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] trace: introduce trace event injection
  2019-05-25 16:57 [PATCH 0/4] trace: introduce trace event injection Cong Wang
                   ` (3 preceding siblings ...)
  2019-05-25 16:58 ` [PATCH 4/4] trace: introduce trace event injection Cong Wang
@ 2019-05-25 22:37 ` Steven Rostedt
  2019-05-26 20:12   ` Cong Wang
  2019-06-10 21:11   ` Cong Wang
  4 siblings, 2 replies; 12+ messages in thread
From: Steven Rostedt @ 2019-05-25 22:37 UTC (permalink / raw)
  To: Cong Wang; +Cc: linux-kernel, Ingo Molnar

On Sat, 25 May 2019 09:57:58 -0700
Cong Wang <xiyou.wangcong@gmail.com> wrote:

> This patchset introduces trace event injection, the first 3 patches
> are some trivial prerequisites, the last one implements the trace
> event injection. Please check each patch for details.
> 
> I have tested them with both valid and invalid input on different
> tracepoints, it works as expected.

Hi Cong,

Thanks for sending these patches, but I just want to let you know that
it's currently a US holiday, and then afterward I'll be doing quite a
bit of traveling for the next two weeks. If you don't hear from me in
after two weeks, please send me a reminder.

Thanks!

-- Steve


> 
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> 
> ---
> 
> Cong Wang (4):
>   trace: fold type initialization into tracing_generic_entry_update()
>   trace: let filter_assign_type() detect FILTER_PTR_STRING
>   trace: make trace_get_fields() global
>   trace: introduce trace event injection
> 
>  include/linux/trace_events.h       |   9 +
>  kernel/trace/Makefile              |   1 +
>  kernel/trace/trace.c               |   8 +-
>  kernel/trace/trace.h               |   1 +
>  kernel/trace/trace_event_perf.c    |   3 +-
>  kernel/trace/trace_events.c        |  12 +-
>  kernel/trace/trace_events_filter.c |   3 +
>  kernel/trace/trace_events_inject.c | 330 +++++++++++++++++++++++++++++
>  8 files changed, 353 insertions(+), 14 deletions(-)
>  create mode 100644 kernel/trace/trace_events_inject.c
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] trace: introduce trace event injection
  2019-05-25 22:37 ` [PATCH 0/4] " Steven Rostedt
@ 2019-05-26 20:12   ` Cong Wang
  2019-06-10 21:11   ` Cong Wang
  1 sibling, 0 replies; 12+ messages in thread
From: Cong Wang @ 2019-05-26 20:12 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Ingo Molnar

On Sat, May 25, 2019 at 3:37 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Sat, 25 May 2019 09:57:58 -0700
> Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> > This patchset introduces trace event injection, the first 3 patches
> > are some trivial prerequisites, the last one implements the trace
> > event injection. Please check each patch for details.
> >
> > I have tested them with both valid and invalid input on different
> > tracepoints, it works as expected.
>
> Hi Cong,
>
> Thanks for sending these patches, but I just want to let you know that
> it's currently a US holiday, and then afterward I'll be doing quite a
> bit of traveling for the next two weeks. If you don't hear from me in
> after two weeks, please send me a reminder.

I will do it. Thanks for letting me know!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] trace: introduce trace event injection
  2019-05-25 22:37 ` [PATCH 0/4] " Steven Rostedt
  2019-05-26 20:12   ` Cong Wang
@ 2019-06-10 21:11   ` Cong Wang
  2019-06-11  1:12     ` Steven Rostedt
  1 sibling, 1 reply; 12+ messages in thread
From: Cong Wang @ 2019-06-10 21:11 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Ingo Molnar

On Sat, May 25, 2019 at 3:37 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> Hi Cong,
>
> Thanks for sending these patches, but I just want to let you know that
> it's currently a US holiday, and then afterward I'll be doing quite a
> bit of traveling for the next two weeks. If you don't hear from me in
> after two weeks, please send me a reminder.

This is a reminder after two weeks. :) Please review my patches
when you have a chance.

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] trace: introduce trace event injection
  2019-06-10 21:11   ` Cong Wang
@ 2019-06-11  1:12     ` Steven Rostedt
  2019-06-18 16:57       ` Steven Rostedt
  0 siblings, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2019-06-11  1:12 UTC (permalink / raw)
  To: Cong Wang; +Cc: LKML, Ingo Molnar

On Mon, 10 Jun 2019 14:11:57 -0700
Cong Wang <xiyou.wangcong@gmail.com> wrote:

> On Sat, May 25, 2019 at 3:37 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> > Hi Cong,
> >
> > Thanks for sending these patches, but I just want to let you know that
> > it's currently a US holiday, and then afterward I'll be doing quite a
> > bit of traveling for the next two weeks. If you don't hear from me in
> > after two weeks, please send me a reminder.  
> 
> This is a reminder after two weeks. :) Please review my patches
> when you have a chance.
>

Thanks for the reminder. I'll try to get to it this week.

-- Steve

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] trace: introduce trace event injection
  2019-06-11  1:12     ` Steven Rostedt
@ 2019-06-18 16:57       ` Steven Rostedt
  0 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2019-06-18 16:57 UTC (permalink / raw)
  To: Cong Wang; +Cc: LKML, Ingo Molnar

On Mon, 10 Jun 2019 21:12:04 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 10 Jun 2019 14:11:57 -0700
> Cong Wang <xiyou.wangcong@gmail.com> wrote:
> 
> > On Sat, May 25, 2019 at 3:37 PM Steven Rostedt <rostedt@goodmis.org> wrote:  
> > > Hi Cong,
> > >
> > > Thanks for sending these patches, but I just want to let you know that
> > > it's currently a US holiday, and then afterward I'll be doing quite a
> > > bit of traveling for the next two weeks. If you don't hear from me in
> > > after two weeks, please send me a reminder.    
> > 
> > This is a reminder after two weeks. :) Please review my patches
> > when you have a chance.
> >  
> 
> Thanks for the reminder. I'll try to get to it this week.
>

Bah, sorry I haven't gotten to this yet. It's still in my queue to do,
and I'm going to be traveling again. I have not forgotten about it.
It's now marked in my INBOX patchwork, so it will not be forgotten.

-- Steve

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] trace: introduce trace event injection
  2019-05-25 16:58 ` [PATCH 4/4] trace: introduce trace event injection Cong Wang
@ 2019-07-09 19:51   ` Steven Rostedt
  2019-07-11 17:09     ` Cong Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2019-07-09 19:51 UTC (permalink / raw)
  To: Cong Wang; +Cc: linux-kernel, Ingo Molnar


Hi Cong,

I finally got around to looking at these patches. Sorry for such a long
time, I've been terribly busy :-(

BTW, it's best to look at how other commits in a subsystem do their
subject lines. The subsystem is "tracing" not "trace", and Linus
prefers to have the first letter after that capitalized. For example, I
changed your subject to:

 "tracing: Introduce trace event injection"


On Sat, 25 May 2019 09:58:02 -0700
Cong Wang <xiyou.wangcong@gmail.com> wrote:

> For example, for the net/net_dev_queue tracepoint, we can inject:
> 
>   INJECT=/sys/kernel/debug/tracing/events/net/net_dev_queue/inject
>   echo "" > $INJECT
>   echo "name='test'" > $INJECT
>   echo "name='test' len=1024" > $INJECT

I think you meant "dev='test'" not "name='test'"

>   cat /sys/kernel/debug/tracing/trace
>   ...
>    <...>-614   [000] ....    36.571483: net_dev_queue: dev= skbaddr=00000000fbf338c2 len=0
>    <...>-614   [001] ....   136.588252: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=0
>    <...>-614   [001] .N..   208.431878: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=1024
> 
> Triggers could be triggered as usual too:
> 
>   echo "stacktrace if len == 1025" > /sys/kernel/debug/tracing/events/net/net_dev_queue/trigger
>   echo "len=1025" > $INJECT
>   cat /sys/kernel/debug/tracing/trace
>   ...
>       bash-614   [000] ....    36.571483: net_dev_queue: dev= skbaddr=00000000fbf338c2 len=0
>       bash-614   [001] ....   136.588252: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=0
>       bash-614   [001] .N..   208.431878: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=1024
>       bash-614   [001] .N.1   284.236349: <stack trace>
>  => event_inject_write
>  => vfs_write
>  => ksys_write
>  => do_syscall_64
>  => entry_SYSCALL_64_after_hwframe  
> 
> The only thing that can't be injected is string pointers as they
> require constant string pointers, this can't be done at run time.
> 
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  kernel/trace/Makefile              |   1 +
>  kernel/trace/trace.h               |   1 +
>  kernel/trace/trace_events.c        |   4 +
>  kernel/trace/trace_events_inject.c | 330 +++++++++++++++++++++++++++++
>  4 files changed, 336 insertions(+)
>  create mode 100644 kernel/trace/trace_events_inject.c
> 
> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index c2b2148bb1d2..3c7bbacf4c18 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -69,6 +69,7 @@ obj-$(CONFIG_EVENT_TRACING) += trace_event_perf.o
>  endif
>  obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o
>  obj-$(CONFIG_EVENT_TRACING) += trace_events_trigger.o
> +obj-$(CONFIG_EVENT_TRACING) += trace_events_inject.o
>  obj-$(CONFIG_HIST_TRIGGERS) += trace_events_hist.o
>  obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o
>  obj-$(CONFIG_KPROBE_EVENTS) += trace_kprobe.o
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 1974ce818ddb..69b5ce0ad597 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -1583,6 +1583,7 @@ extern struct list_head ftrace_events;
>  
>  extern const struct file_operations event_trigger_fops;
>  extern const struct file_operations event_hist_fops;
> +extern const struct file_operations event_inject_fops;
>  
>  #ifdef CONFIG_HIST_TRIGGERS
>  extern int register_trigger_hist_cmd(void);
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 98df044d34ce..d23d6d6685e7 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -2018,6 +2018,10 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
>  	trace_create_file("format", 0444, file->dir, call,
>  			  &ftrace_event_format_fops);
>  
> +	if (call->event.type && call->class->reg)
> +		trace_create_file("inject", 0200, file->dir, file,
> +				  &event_inject_fops);
> +
>  	return 0;
>  }
>  
> diff --git a/kernel/trace/trace_events_inject.c b/kernel/trace/trace_events_inject.c
> new file mode 100644
> index 000000000000..cdd7db7f2724
> --- /dev/null
> +++ b/kernel/trace/trace_events_inject.c
> @@ -0,0 +1,330 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * trace_events_inject - trace event injection
> + *
> + * Copyright (C) 2019 Cong Wang <cwang@twitter.com>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/ctype.h>
> +#include <linux/mutex.h>
> +#include <linux/slab.h>
> +#include <linux/rculist.h>
> +
> +#include "trace.h"
> +
> +static int
> +trace_inject_entry(struct trace_event_file *file, void *rec, int len)
> +{
> +	struct trace_event_buffer fbuffer;
> +	struct ring_buffer *buffer;
> +	int written = 0;
> +	void *entry;
> +
> +	buffer = file->tr->trace_buffer.buffer;
> +	ring_buffer_nest_start(buffer);

I'm curious about why you added the ring_buffer_nest_start().

> +
> +	entry = trace_event_buffer_reserve(&fbuffer, file, len);
> +	if (entry) {
> +		memcpy(entry, rec, len);
> +		written = len;
> +		trace_event_buffer_commit(&fbuffer);
> +	}
> +
> +	ring_buffer_nest_end(buffer);
> +	return written;
> +}
> +
> +static int
> +parse_field(char *str, struct trace_event_call *call,
> +	    struct ftrace_event_field **pf, u64 *pv)
> +{
> +	struct ftrace_event_field *field;
> +	char *field_name;
> +	int s, i = 0;
> +	char *p;
> +	int len;
> +	u64 val;
> +
> +	if (!str[i])
> +		return 0;
> +	/* First find the field to associate to */
> +	while (isspace(str[i]))
> +		i++;
> +	s = i;
> +	while (isalnum(str[i]) || str[i] == '_')
> +		i++;
> +	len = i - s;
> +	if (!len)
> +		return -EINVAL;
> +
> +	field_name = kmemdup_nul(str + s, len, GFP_KERNEL);
> +	if (!field_name)
> +		return -ENOMEM;
> +	field = trace_find_event_field(call, field_name);
> +	kfree(field_name);
> +	if (!field)
> +		return -ENOENT;
> +
> +	*pf = field;
> +	p = strchr(str + i, '=');
> +	if (!p)
> +		return -EINVAL;
> +	i = p + 1 - str;
> +	while (isspace(str[i]))
> +		i++;
> +	s = i;
> +	if (isdigit(str[i]) || str[i] == '-') {
> +		char num_buf[24];	/* Big enough to hold an address */

Do you really need this buffer? See below.

> +		int ret;
> +
> +		/* Make sure the field is not a string */
> +		if (is_string_field(field))
> +			return -EINVAL;
> +
> +		if (str[i] == '-')
> +			i++;
> +
> +		/* We allow 0xDEADBEEF */
> +		while (isalnum(str[i]))
> +			i++;
> +
> +		len = i - s;
> +		/* 0xfeedfacedeadbeef is 18 chars max */
> +		if (len >= sizeof(num_buf))
> +			return -EINVAL;
> +
> +		strncpy(num_buf, str + s, len);
> +		num_buf[len] = 0;

Instead of using num_buf[], as str is already modified in the else
statement, we could do:

		char *num;
		char c;

		num = str + s;
		c = str[i];
		str[i] = '\0';
		[..]
			ret = kstrtoll(num, 0, &val);
		[..]
		str[i] = c;


> +
> +		/* Make sure it is a value */
> +		if (field->is_signed)
> +			ret = kstrtoll(num_buf, 0, &val);
> +		else
> +			ret = kstrtoull(num_buf, 0, &val);
> +		if (ret)
> +			return ret;
> +
> +		*pv = val;
> +		return i;
> +	} else if (str[i] == '\'' || str[i] == '"') {
> +		char q = str[i];
> +
> +		/* Make sure the field is OK for strings */
> +		if (!is_string_field(field))
> +			return -EINVAL;
> +
> +		for (i++; str[i]; i++) {

Should we allow for \' and \"?

			if (str[i] == '\\' && str[i + 1]) {
				i++;
				continue;
			}


> +			if (str[i] == q)
> +				break;
> +		}
> +		if (!str[i])
> +			return -EINVAL;
> +
> +		/* Skip quotes */
> +		s++;
> +		len = i - s;
> +		if (len >= MAX_FILTER_STR_VAL)
> +			return -EINVAL;
> +
> +		*pv = (unsigned long)(str + s);
> +		str[i] = 0;
> +		/* go past the last quote */
> +		i++;
> +		return i;
> +	}
> +
> +	return -EINVAL;
> +}
> +
> +static int trace_get_entry_size(struct trace_event_call *call)
> +{
> +	struct ftrace_event_field *field;
> +	struct list_head *head;
> +	int size = 0;
> +
> +	head = trace_get_fields(call);
> +	list_for_each_entry(field, head, link)

Add a '{' for the loop. We only allow for removing braces for non
complex statements.


> +		if (field->size + field->offset > size)
> +			size = field->size + field->offset;
> +
> +	return size;
> +}
> +
> +static void *trace_alloc_entry(struct trace_event_call *call, int *size)
> +{
> +	int entry_size = trace_get_entry_size(call);
> +	struct ftrace_event_field *field;
> +	struct list_head *head;
> +	void *entry = NULL;
> +
> +	/* We need an extra '\0' at the end. */
> +	entry = kzalloc(entry_size + 1, GFP_KERNEL);
> +	if (!entry)
> +		return NULL;
> +
> +	head = trace_get_fields(call);
> +	list_for_each_entry(field, head, link) {
> +		if (!is_string_field(field))
> +			continue;
> +		if (field->filter_type == FILTER_STATIC_STRING)
> +			continue;
> +		if (field->filter_type == FILTER_DYN_STRING) {
> +			u32 *str_item;
> +			int str_len = 0;
> +			int str_loc = entry_size & 0xffff;
> +
> +			str_item = (u32 *)(entry + field->offset);
> +			*str_item = (str_len << 16) | str_loc;
> +		} else {
> +			char **paddr;
> +
> +			paddr = (char **)(entry + field->offset);
> +			*paddr = "";
> +		}
> +	}
> +
> +	*size = entry_size + 1;
> +	return entry;
> +}
> +
> +#define INJECT_STRING "STATIC STRING CAN NOT BE INJECTED"
> +
> +static int parse_entry(char *str, struct trace_event_call *call, void **pentry)
> +{
> +	struct ftrace_event_field *field;
> +	unsigned long irq_flags;
> +	void *entry = NULL;
> +	int entry_size;
> +	u64 val;
> +	int len;
> +
> +	entry = trace_alloc_entry(call, &entry_size);
> +	*pentry = entry;
> +	if (!entry)
> +		return -ENOMEM;
> +
> +	local_save_flags(irq_flags);
> +	tracing_generic_entry_update(entry, call->event.type, irq_flags,
> +				     preempt_count());
> +
> +	while ((len = parse_field(str, call, &field, &val)) > 0) {
> +		if (is_function_field(field))
> +			return -EINVAL;
> +
> +		if (is_string_field(field)) {
> +			char *addr = (char *)(unsigned long) val;
> +
> +			if (field->filter_type == FILTER_STATIC_STRING) {
> +				strlcpy(entry + field->offset, addr, field->size);
> +			} else if (field->filter_type == FILTER_DYN_STRING) {
> +				u32 *str_item;
> +				int str_len = strlen(addr) + 1;
> +				int str_loc = entry_size & 0xffff;
> +
> +				entry_size += str_len;
> +				*pentry = krealloc(entry, entry_size, GFP_KERNEL);
> +				entry = *pentry;
> +				if (!entry)
> +					return -ENOMEM;

Add a comment here that states that *pentry gets freed by the caller
even on error, otherwise, to a reviewer this looks like a possible
memory leak. Or worse, we add a new caller that doesn't free it.
Perhaps we should add a comment above the function.

> +
> +				strlcpy(entry + (entry_size - str_len), addr, str_len);
> +				str_item = (u32 *)(entry + field->offset);
> +				*str_item = (str_len << 16) | str_loc;
> +			} else {
> +				char **paddr;
> +
> +				/* TODO: can we find the constant string? */

Honestly, char * is dangerous in tracing. Because the time you read the
pointer, what you pointed at during the recording could be long gone. I
really should have them removed in the code. In other words, don't
worry about pointer strings and such.

You can keep this as is.

> +				paddr = (char **)(entry + field->offset);
> +				*paddr = INJECT_STRING;
> +			}
> +		} else {
> +			switch (field->size) {
> +			case 1: {
> +				u8 tmp = (u8) val;
> +
> +				memcpy(entry + field->offset, &tmp, 1);
> +				break;
> +			}
> +			case 2: {
> +				u16 tmp = (u16) val;
> +
> +				memcpy(entry + field->offset, &tmp, 2);
> +				break;
> +			}
> +			case 4: {
> +				u32 tmp = (u32) val;
> +
> +				memcpy(entry + field->offset, &tmp, 4);
> +				break;
> +			}
> +			case 8:
> +				memcpy(entry + field->offset, &val, 8);
> +				break;
> +			default:
> +				return -EINVAL;
> +			}
> +		}
> +
> +		str += len;
> +	}
> +
> +	if (len < 0)
> +		return len;
> +
> +	return entry_size;
> +}
> +
> +static ssize_t
> +event_inject_write(struct file *filp, const char __user *ubuf, size_t cnt,
> +		   loff_t *ppos)
> +{
> +	struct trace_event_call *call;
> +	struct trace_event_file *file;
> +	int err = -ENODEV, size;
> +	void *entry = NULL;
> +	char *buf;
> +
> +	if (cnt >= PAGE_SIZE)
> +		return -EINVAL;
> +
> +	buf = memdup_user_nul(ubuf, cnt);
> +	if (IS_ERR(buf))
> +		return PTR_ERR(buf);
> +	strim(buf);
> +
> +	mutex_lock(&event_mutex);
> +	file = event_file_data(filp);
> +	if (file) {
> +		call = file->event_call;
> +		size = parse_entry(buf, call, &entry);
> +		if (size < 0)
> +			err = size;
> +		else
> +			err = trace_inject_entry(file, entry, size);
> +	}
> +	mutex_unlock(&event_mutex);
> +
> +	kfree(entry);
> +	kfree(buf);
> +
> +	if (err < 0)
> +		return err;
> +
> +	*ppos += err;
> +	return cnt;
> +}
> +
> +static ssize_t
> +event_inject_read(struct file *file, char __user *buf, size_t size,
> +		  loff_t *ppos)
> +{
> +	return -EPERM;
> +}
> +
> +const struct file_operations event_inject_fops = {
> +	.open = tracing_open_generic,
> +	.read = event_inject_read,
> +	.write = event_inject_write,
> +};
> +

The rest looks fine. Note, I'm sorry for the late response, but as the
merge window opened, I'll probably wait till after the merge window
closes before adding this.

That said, I'll apply the first three patches and get them in this
merge window, as they are mostly clean ups.

Thanks!

-- Steve

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] trace: introduce trace event injection
  2019-07-09 19:51   ` Steven Rostedt
@ 2019-07-11 17:09     ` Cong Wang
  0 siblings, 0 replies; 12+ messages in thread
From: Cong Wang @ 2019-07-11 17:09 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Ingo Molnar

On Tue, Jul 9, 2019 at 12:51 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
>
> Hi Cong,
>
> I finally got around to looking at these patches. Sorry for such a long
> time, I've been terribly busy :-(
>
> BTW, it's best to look at how other commits in a subsystem do their
> subject lines. The subsystem is "tracing" not "trace", and Linus
> prefers to have the first letter after that capitalized. For example, I
> changed your subject to:
>
>  "tracing: Introduce trace event injection"

Sure, will update it.


>
>
> On Sat, 25 May 2019 09:58:02 -0700
> Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> > For example, for the net/net_dev_queue tracepoint, we can inject:
> >
> >   INJECT=/sys/kernel/debug/tracing/events/net/net_dev_queue/inject
> >   echo "" > $INJECT
> >   echo "name='test'" > $INJECT
> >   echo "name='test' len=1024" > $INJECT
>
> I think you meant "dev='test'" not "name='test'"

No, the filed name is "name", "dev" is just what shows in the output:

        TP_printk("dev=%s skbaddr=%p len=%u",
                __get_str(name), __entry->skbaddr, __entry->len)

I was confused too when I saw it at first. :)

>
> >   cat /sys/kernel/debug/tracing/trace
> >   ...
> >    <...>-614   [000] ....    36.571483: net_dev_queue: dev= skbaddr=00000000fbf338c2 len=0
> >    <...>-614   [001] ....   136.588252: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=0
> >    <...>-614   [001] .N..   208.431878: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=1024
> >
> > Triggers could be triggered as usual too:
> >
> >   echo "stacktrace if len == 1025" > /sys/kernel/debug/tracing/events/net/net_dev_queue/trigger
> >   echo "len=1025" > $INJECT
> >   cat /sys/kernel/debug/tracing/trace
> >   ...
> >       bash-614   [000] ....    36.571483: net_dev_queue: dev= skbaddr=00000000fbf338c2 len=0
> >       bash-614   [001] ....   136.588252: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=0
> >       bash-614   [001] .N..   208.431878: net_dev_queue: dev=test skbaddr=00000000fbf338c2 len=1024
> >       bash-614   [001] .N.1   284.236349: <stack trace>
> >  => event_inject_write
> >  => vfs_write
> >  => ksys_write
> >  => do_syscall_64
> >  => entry_SYSCALL_64_after_hwframe
> >
> > The only thing that can't be injected is string pointers as they
> > require constant string pointers, this can't be done at run time.
> >
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> > ---
> >  kernel/trace/Makefile              |   1 +
> >  kernel/trace/trace.h               |   1 +
> >  kernel/trace/trace_events.c        |   4 +
> >  kernel/trace/trace_events_inject.c | 330 +++++++++++++++++++++++++++++
> >  4 files changed, 336 insertions(+)
> >  create mode 100644 kernel/trace/trace_events_inject.c
> >
> > diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> > index c2b2148bb1d2..3c7bbacf4c18 100644
> > --- a/kernel/trace/Makefile
> > +++ b/kernel/trace/Makefile
> > @@ -69,6 +69,7 @@ obj-$(CONFIG_EVENT_TRACING) += trace_event_perf.o
> >  endif
> >  obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o
> >  obj-$(CONFIG_EVENT_TRACING) += trace_events_trigger.o
> > +obj-$(CONFIG_EVENT_TRACING) += trace_events_inject.o
> >  obj-$(CONFIG_HIST_TRIGGERS) += trace_events_hist.o
> >  obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o
> >  obj-$(CONFIG_KPROBE_EVENTS) += trace_kprobe.o
> > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > index 1974ce818ddb..69b5ce0ad597 100644
> > --- a/kernel/trace/trace.h
> > +++ b/kernel/trace/trace.h
> > @@ -1583,6 +1583,7 @@ extern struct list_head ftrace_events;
> >
> >  extern const struct file_operations event_trigger_fops;
> >  extern const struct file_operations event_hist_fops;
> > +extern const struct file_operations event_inject_fops;
> >
> >  #ifdef CONFIG_HIST_TRIGGERS
> >  extern int register_trigger_hist_cmd(void);
> > diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> > index 98df044d34ce..d23d6d6685e7 100644
> > --- a/kernel/trace/trace_events.c
> > +++ b/kernel/trace/trace_events.c
> > @@ -2018,6 +2018,10 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
> >       trace_create_file("format", 0444, file->dir, call,
> >                         &ftrace_event_format_fops);
> >
> > +     if (call->event.type && call->class->reg)
> > +             trace_create_file("inject", 0200, file->dir, file,
> > +                               &event_inject_fops);
> > +
> >       return 0;
> >  }
> >
> > diff --git a/kernel/trace/trace_events_inject.c b/kernel/trace/trace_events_inject.c
> > new file mode 100644
> > index 000000000000..cdd7db7f2724
> > --- /dev/null
> > +++ b/kernel/trace/trace_events_inject.c
> > @@ -0,0 +1,330 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * trace_events_inject - trace event injection
> > + *
> > + * Copyright (C) 2019 Cong Wang <cwang@twitter.com>
> > + */
> > +
> > +#include <linux/module.h>
> > +#include <linux/ctype.h>
> > +#include <linux/mutex.h>
> > +#include <linux/slab.h>
> > +#include <linux/rculist.h>
> > +
> > +#include "trace.h"
> > +
> > +static int
> > +trace_inject_entry(struct trace_event_file *file, void *rec, int len)
> > +{
> > +     struct trace_event_buffer fbuffer;
> > +     struct ring_buffer *buffer;
> > +     int written = 0;
> > +     void *entry;
> > +
> > +     buffer = file->tr->trace_buffer.buffer;
> > +     ring_buffer_nest_start(buffer);
>
> I'm curious about why you added the ring_buffer_nest_start().


This is a copy-n-paste, I can remove it if it is not needed in this context.


>
> > +
> > +     entry = trace_event_buffer_reserve(&fbuffer, file, len);
> > +     if (entry) {
> > +             memcpy(entry, rec, len);
> > +             written = len;
> > +             trace_event_buffer_commit(&fbuffer);
> > +     }
> > +
> > +     ring_buffer_nest_end(buffer);
> > +     return written;
> > +}
> > +
> > +static int
> > +parse_field(char *str, struct trace_event_call *call,
> > +         struct ftrace_event_field **pf, u64 *pv)
> > +{
> > +     struct ftrace_event_field *field;
> > +     char *field_name;
> > +     int s, i = 0;
> > +     char *p;
> > +     int len;
> > +     u64 val;
> > +
> > +     if (!str[i])
> > +             return 0;
> > +     /* First find the field to associate to */
> > +     while (isspace(str[i]))
> > +             i++;
> > +     s = i;
> > +     while (isalnum(str[i]) || str[i] == '_')
> > +             i++;
> > +     len = i - s;
> > +     if (!len)
> > +             return -EINVAL;
> > +
> > +     field_name = kmemdup_nul(str + s, len, GFP_KERNEL);
> > +     if (!field_name)
> > +             return -ENOMEM;
> > +     field = trace_find_event_field(call, field_name);
> > +     kfree(field_name);
> > +     if (!field)
> > +             return -ENOENT;
> > +
> > +     *pf = field;
> > +     p = strchr(str + i, '=');
> > +     if (!p)
> > +             return -EINVAL;
> > +     i = p + 1 - str;
> > +     while (isspace(str[i]))
> > +             i++;
> > +     s = i;
> > +     if (isdigit(str[i]) || str[i] == '-') {
> > +             char num_buf[24];       /* Big enough to hold an address */
>
> Do you really need this buffer? See below.
>
> > +             int ret;
> > +
> > +             /* Make sure the field is not a string */
> > +             if (is_string_field(field))
> > +                     return -EINVAL;
> > +
> > +             if (str[i] == '-')
> > +                     i++;
> > +
> > +             /* We allow 0xDEADBEEF */
> > +             while (isalnum(str[i]))
> > +                     i++;
> > +
> > +             len = i - s;
> > +             /* 0xfeedfacedeadbeef is 18 chars max */
> > +             if (len >= sizeof(num_buf))
> > +                     return -EINVAL;
> > +
> > +             strncpy(num_buf, str + s, len);
> > +             num_buf[len] = 0;
>
> Instead of using num_buf[], as str is already modified in the else
> statement, we could do:
>
>                 char *num;
>                 char c;
>
>                 num = str + s;
>                 c = str[i];
>                 str[i] = '\0';
>                 [..]
>                         ret = kstrtoll(num, 0, &val);
>                 [..]
>                 str[i] = c;

Good point, I think it probably can be removed as you said, let
me do a double check.

>
>
> > +
> > +             /* Make sure it is a value */
> > +             if (field->is_signed)
> > +                     ret = kstrtoll(num_buf, 0, &val);
> > +             else
> > +                     ret = kstrtoull(num_buf, 0, &val);
> > +             if (ret)
> > +                     return ret;
> > +
> > +             *pv = val;
> > +             return i;
> > +     } else if (str[i] == '\'' || str[i] == '"') {
> > +             char q = str[i];
> > +
> > +             /* Make sure the field is OK for strings */
> > +             if (!is_string_field(field))
> > +                     return -EINVAL;
> > +
> > +             for (i++; str[i]; i++) {
>
> Should we allow for \' and \"?
>
>                         if (str[i] == '\\' && str[i + 1]) {
>                                 i++;
>                                 continue;
>                         }

Good suggestion. I think it is good to have it.


>
>
> > +                     if (str[i] == q)
> > +                             break;
> > +             }
> > +             if (!str[i])
> > +                     return -EINVAL;
> > +
> > +             /* Skip quotes */
> > +             s++;
> > +             len = i - s;
> > +             if (len >= MAX_FILTER_STR_VAL)
> > +                     return -EINVAL;
> > +
> > +             *pv = (unsigned long)(str + s);
> > +             str[i] = 0;
> > +             /* go past the last quote */
> > +             i++;
> > +             return i;
> > +     }
> > +
> > +     return -EINVAL;
> > +}
> > +
> > +static int trace_get_entry_size(struct trace_event_call *call)
> > +{
> > +     struct ftrace_event_field *field;
> > +     struct list_head *head;
> > +     int size = 0;
> > +
> > +     head = trace_get_fields(call);
> > +     list_for_each_entry(field, head, link)
>
> Add a '{' for the loop. We only allow for removing braces for non
> complex statements.

OK.

>
>
> > +             if (field->size + field->offset > size)
> > +                     size = field->size + field->offset;
> > +
> > +     return size;
> > +}
> > +
> > +static void *trace_alloc_entry(struct trace_event_call *call, int *size)
> > +{
> > +     int entry_size = trace_get_entry_size(call);
> > +     struct ftrace_event_field *field;
> > +     struct list_head *head;
> > +     void *entry = NULL;
> > +
> > +     /* We need an extra '\0' at the end. */
> > +     entry = kzalloc(entry_size + 1, GFP_KERNEL);
> > +     if (!entry)
> > +             return NULL;
> > +
> > +     head = trace_get_fields(call);
> > +     list_for_each_entry(field, head, link) {
> > +             if (!is_string_field(field))
> > +                     continue;
> > +             if (field->filter_type == FILTER_STATIC_STRING)
> > +                     continue;
> > +             if (field->filter_type == FILTER_DYN_STRING) {
> > +                     u32 *str_item;
> > +                     int str_len = 0;
> > +                     int str_loc = entry_size & 0xffff;
> > +
> > +                     str_item = (u32 *)(entry + field->offset);
> > +                     *str_item = (str_len << 16) | str_loc;
> > +             } else {
> > +                     char **paddr;
> > +
> > +                     paddr = (char **)(entry + field->offset);
> > +                     *paddr = "";
> > +             }
> > +     }
> > +
> > +     *size = entry_size + 1;
> > +     return entry;
> > +}
> > +
> > +#define INJECT_STRING "STATIC STRING CAN NOT BE INJECTED"
> > +
> > +static int parse_entry(char *str, struct trace_event_call *call, void **pentry)
> > +{
> > +     struct ftrace_event_field *field;
> > +     unsigned long irq_flags;
> > +     void *entry = NULL;
> > +     int entry_size;
> > +     u64 val;
> > +     int len;
> > +
> > +     entry = trace_alloc_entry(call, &entry_size);
> > +     *pentry = entry;
> > +     if (!entry)
> > +             return -ENOMEM;
> > +
> > +     local_save_flags(irq_flags);
> > +     tracing_generic_entry_update(entry, call->event.type, irq_flags,
> > +                                  preempt_count());
> > +
> > +     while ((len = parse_field(str, call, &field, &val)) > 0) {
> > +             if (is_function_field(field))
> > +                     return -EINVAL;
> > +
> > +             if (is_string_field(field)) {
> > +                     char *addr = (char *)(unsigned long) val;
> > +
> > +                     if (field->filter_type == FILTER_STATIC_STRING) {
> > +                             strlcpy(entry + field->offset, addr, field->size);
> > +                     } else if (field->filter_type == FILTER_DYN_STRING) {
> > +                             u32 *str_item;
> > +                             int str_len = strlen(addr) + 1;
> > +                             int str_loc = entry_size & 0xffff;
> > +
> > +                             entry_size += str_len;
> > +                             *pentry = krealloc(entry, entry_size, GFP_KERNEL);
> > +                             entry = *pentry;
> > +                             if (!entry)
> > +                                     return -ENOMEM;
>
> Add a comment here that states that *pentry gets freed by the caller
> even on error, otherwise, to a reviewer this looks like a possible
> memory leak. Or worse, we add a new caller that doesn't free it.
> Perhaps we should add a comment above the function.

Agreed, I will add a comment above this function.


>
> > +
> > +                             strlcpy(entry + (entry_size - str_len), addr, str_len);
> > +                             str_item = (u32 *)(entry + field->offset);
> > +                             *str_item = (str_len << 16) | str_loc;
> > +                     } else {
> > +                             char **paddr;
> > +
> > +                             /* TODO: can we find the constant string? */
>
> Honestly, char * is dangerous in tracing. Because the time you read the
> pointer, what you pointed at during the recording could be long gone. I
> really should have them removed in the code. In other words, don't
> worry about pointer strings and such.
>
> You can keep this as is.

Ok, good to know.


>
> > +                             paddr = (char **)(entry + field->offset);
> > +                             *paddr = INJECT_STRING;
> > +                     }
> > +             } else {
> > +                     switch (field->size) {
> > +                     case 1: {
> > +                             u8 tmp = (u8) val;
> > +
> > +                             memcpy(entry + field->offset, &tmp, 1);
> > +                             break;
> > +                     }
> > +                     case 2: {
> > +                             u16 tmp = (u16) val;
> > +
> > +                             memcpy(entry + field->offset, &tmp, 2);
> > +                             break;
> > +                     }
> > +                     case 4: {
> > +                             u32 tmp = (u32) val;
> > +
> > +                             memcpy(entry + field->offset, &tmp, 4);
> > +                             break;
> > +                     }
> > +                     case 8:
> > +                             memcpy(entry + field->offset, &val, 8);
> > +                             break;
> > +                     default:
> > +                             return -EINVAL;
> > +                     }
> > +             }
> > +
> > +             str += len;
> > +     }
> > +
> > +     if (len < 0)
> > +             return len;
> > +
> > +     return entry_size;
> > +}
> > +
> > +static ssize_t
> > +event_inject_write(struct file *filp, const char __user *ubuf, size_t cnt,
> > +                loff_t *ppos)
> > +{
> > +     struct trace_event_call *call;
> > +     struct trace_event_file *file;
> > +     int err = -ENODEV, size;
> > +     void *entry = NULL;
> > +     char *buf;
> > +
> > +     if (cnt >= PAGE_SIZE)
> > +             return -EINVAL;
> > +
> > +     buf = memdup_user_nul(ubuf, cnt);
> > +     if (IS_ERR(buf))
> > +             return PTR_ERR(buf);
> > +     strim(buf);
> > +
> > +     mutex_lock(&event_mutex);
> > +     file = event_file_data(filp);
> > +     if (file) {
> > +             call = file->event_call;
> > +             size = parse_entry(buf, call, &entry);
> > +             if (size < 0)
> > +                     err = size;
> > +             else
> > +                     err = trace_inject_entry(file, entry, size);
> > +     }
> > +     mutex_unlock(&event_mutex);
> > +
> > +     kfree(entry);
> > +     kfree(buf);
> > +
> > +     if (err < 0)
> > +             return err;
> > +
> > +     *ppos += err;
> > +     return cnt;
> > +}
> > +
> > +static ssize_t
> > +event_inject_read(struct file *file, char __user *buf, size_t size,
> > +               loff_t *ppos)
> > +{
> > +     return -EPERM;
> > +}
> > +
> > +const struct file_operations event_inject_fops = {
> > +     .open = tracing_open_generic,
> > +     .read = event_inject_read,
> > +     .write = event_inject_write,
> > +};
> > +
>
> The rest looks fine. Note, I'm sorry for the late response, but as the
> merge window opened, I'll probably wait till after the merge window
> closes before adding this.

No problem.

>
> That said, I'll apply the first three patches and get them in this
> merge window, as they are mostly clean ups.

Sounds good, I read this as I will only have to update and resend
the last patch in this series.

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-07-11 17:09 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-25 16:57 [PATCH 0/4] trace: introduce trace event injection Cong Wang
2019-05-25 16:57 ` [PATCH 1/4] trace: fold type initialization into tracing_generic_entry_update() Cong Wang
2019-05-25 16:58 ` [PATCH 2/4] trace: let filter_assign_type() detect FILTER_PTR_STRING Cong Wang
2019-05-25 16:58 ` [PATCH 3/4] trace: make trace_get_fields() global Cong Wang
2019-05-25 16:58 ` [PATCH 4/4] trace: introduce trace event injection Cong Wang
2019-07-09 19:51   ` Steven Rostedt
2019-07-11 17:09     ` Cong Wang
2019-05-25 22:37 ` [PATCH 0/4] " Steven Rostedt
2019-05-26 20:12   ` Cong Wang
2019-06-10 21:11   ` Cong Wang
2019-06-11  1:12     ` Steven Rostedt
2019-06-18 16:57       ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).