All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object
@ 2022-08-19  3:27 Jeff Xie
  2022-08-19  3:27 ` [PATCH v15 1/4] trace: Add trace any " Jeff Xie
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Jeff Xie @ 2022-08-19  3:27 UTC (permalink / raw)
  To: rostedt; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000, Jeff Xie

Introduce a method based on function tracer to trace any object and get
the value of the object dynamically. the object can be obtained from the
dynamic event (kprobe_event/uprobe_event) or the static event(tracepoint).

Usage:
When using the kprobe event, only need to set the objtrace(a new trigger),
we can get the value of the object. The object is from the setting of the 
kprobe event.

For example:
For the function bio_add_page():

int bio_add_page(struct bio *bio, struct page *page,
	unsigned int len, unsigned int offset)

Firstly, we can set the base of the object, thus the first string "arg1"
stands for the value of the first parameter of this function bio_add_gage(),

# echo 'p bio_add_page arg1=$arg1' > ./kprobe_events

Secondly, we can get the value dynamically based on above object. 

find the offset of the bi_size in struct bio:
$ gdb vmlinux
(gdb) p &(((struct bio *)0)->bi_iter.bi_size)
$1 = (unsigned int *) 0x28

# echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/ \
	p_bio_add_page_0/trigger

# cd /sys/kernel/debug/tracing/
# echo 'p bio_add_page arg1=$arg1' > ./kprobe_events
# echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger

# du -sh /test.txt
12.0K   /test.txt

# cat  /test.txt > /dev/null
# cat ./trace
# tracer: nop
#
# entries-in-buffer/entries-written: 128/128   #P:4
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
             cat-117     [002] ...1.     1.602243: __bio_try_merge_page <-bio_add_page object:0xffff88811bee4000 value:0x0
             cat-117     [002] ...1.     1.602244: __bio_add_page <-bio_add_page object:0xffff88811bee4000 value:0x0
             cat-117     [002] ...2.     1.602244: bio_add_page <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x1000
             cat-117     [002] ...1.     1.602245: __bio_try_merge_page <-bio_add_page object:0xffff88811bee4000 value:0x1000
             cat-117     [002] ...1.     1.602245: __bio_add_page <-bio_add_page object:0xffff88811bee4000 value:0x1000
             cat-117     [002] ...2.     1.602245: bio_add_page <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x2000
             cat-117     [002] ...1.     1.602245: __bio_try_merge_page <-bio_add_page object:0xffff88811bee4000 value:0x2000
             cat-117     [002] ...1.     1.602245: __bio_add_page <-bio_add_page object:0xffff88811bee4000 value:0x2000
             cat-117     [002] ...1.     1.602245: submit_bio <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x3000
             cat-117     [002] ...1.     1.602245: submit_bio_noacct <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x3000
             cat-117     [002] ...1.     1.602246: __submit_bio <-submit_bio_noacct object:0xffff88811bee4000 value:0x3000
             cat-117     [002] ...1.     1.602246: submit_bio_checks <-__submit_bio object:0xffff88811bee4000 value:0x3000
             cat-117     [002] ...1.     1.602246: __cond_resched <-submit_bio_checks object:0xffff88811bee4000 value:0x3000
             cat-117     [002] ...1.     1.602246: should_fail_bio <-submit_bio_checks object:0xffff88811bee4000 value:0x3000
             cat-117     [002] ...1.     1.602246: blk_mq_submit_bio <-submit_bio_noacct object:0xffff88811bee4000 value:0x3000
             cat-117     [002] ...1.     1.602246: blk_attempt_plug_merge <-blk_mq_submit_bio object:0xffff88811bee4000 value:0x3000
             cat-117     [002] ...1.     1.602246: blk_mq_sched_bio_merge <-blk_mq_submit_bio object:0xffff88811bee4000 value:0x3000
             cat-117     [002] ...1.     1.602247: __rcu_read_lock <-blk_mq_submit_bio object:0xffff88811bee4000 value:0x3000
             cat-117     [002] ...1.     1.602247: __rcu_read_unlock <-blk_mq_submit_bio object:0xffff88811bee4000 value:0x3000
             cat-117     [002] ...1.     1.602247: __blk_mq_alloc_requests <-blk_mq_submit_bio object:0xffff88811bee4000 value:0x3000
          <idle>-0       [002] d..3.     1.602298: bio_endio <-blk_update_request object:0xffff88811bee4000 value:0x0
          <idle>-0       [002] d..3.     1.602298: mpage_end_io <-blk_update_request object:0xffff88811bee4000 value:0x0
          <idle>-0       [002] d..3.     1.602298: __read_end_io <-blk_update_request object:0xffff88811bee4000 value:0x0
          <idle>-0       [002] d..3.     1.602300: bio_put <-blk_update_request object:0xffff88811bee4000 value:0x0
          <idle>-0       [002] d..3.     1.602300: bio_free <-blk_update_request object:0xffff88811bee4000 value:0x0
          <idle>-0       [002] d..3.     1.602300: mempool_free <-blk_update_request object:0xffff88811bee4000 value:0x0
          <idle>-0       [002] d..3.     1.602300: mempool_free_slab <-blk_update_request object:0xffff88811bee4000 value:0x0
          <idle>-0       [002] d..3.     1.602300: kmem_cache_free <-blk_update_request object:0xffff88811bee4000 value:0x0
	  ...

Almost all changelogs were suggested by Masami(mhiramat@kernel.org)
and steve(rostedt@goodmis.org), thank you all so much.

v15:
- Add Shuah Khan's acked-by for [3/4]
  https://lore.kernel.org/lkml/db539a80-8cad-b5bf-3bea-6a883f48585e@linuxfoundation.org/
- Add Bagas Sanjaya's reviewed-by for [4/4]
  https://lore.kernel.org/lkml/YrwbMu9f7cGosYCT@debian.me/
- Add syntax suggested by Joel for usage of objtrace in [4/4] 

v14:
- make documentation more readable and fix literal code block by Bagas Sanjaya

v13:
- remove the 's' type, since the objtrace event doesn't show the value with sign
- change the [3/4] tag with 'selftests/ftrace' instead of 'trace/objtrace'
- add 'Documentation:' tag for [4/4]

v12:
- use the %zu to print the sizeof
  Reported-by: kernel test robot <lkp@intel.com>
  Suggested-by: Nathan Chancellor <nathan@kernel.org>

v11:
- remove useless atomic counting methods for num_traced_obj
- make array objtrace_fetch_types null terminated
- add raw_spin_lock_init for obj_data_lock

v10:
- support ftrace instances
- use trace_buffer_lock_reserve instead of trace_event_buffer_lock_reserve
- a lot of cleanup work has been done mainly for event_object_trigger_parse

v9:
- fix objtrace trigger output was incomplete
- fix the objtrace trigger was removed when using the existed parameter on
  event.
- add testcase for the second fix above.

v8:
- revert to use per-cpu recursion for the function trace_object_events_call
- recover the filter when getting the value of the object
- simplify the implementation for the function get_object_value
- fix the build error

v7:
- use fixed-size array for object pool instead of list structure
- use ftrace_test_recursion_trylock for function trace hook function
- fix trace_object_ref reference count in the init_trace_object
- invoke exit_trace_object no matter whether data->ops->free is null 
  in the unregister_object_trigger
- release private_data of event_trigger_data in the trace_object_trigger_free
- remove [RFC] tag

v6:
- change the objtrace trigger syntax.
- add patchset description
- add <tracefs>/README

v5:
- add testcasts
- add check the field->size
- add lockless to search object
- describe the object trace more clearly in Kconfig

v4:
- please ignore the v4 which is the same as v3

v3:
- change the objfilter to objtrace
- add a command to the objfilter syntax
- change to get the value of the object
- use trace_find_event_field to find the field instead of using argN
- get data from @rec in the event trigger callback funciton

v2:
- adding a "objfilter" trigger to update object

Jeff Xie (4):
  trace: Add trace any kernel object
  trace/objtrace: Get the value of the object
  selftests/ftrace: Add testcases for objtrace
  Documentation: trace/objtrace: Add documentation for objtrace

 Documentation/trace/events.rst                |  90 +++
 include/linux/trace_events.h                  |   1 +
 kernel/trace/Kconfig                          |  10 +
 kernel/trace/Makefile                         |   1 +
 kernel/trace/trace.c                          |  11 +
 kernel/trace/trace.h                          |  21 +
 kernel/trace/trace_entries.h                  |  18 +
 kernel/trace/trace_events_trigger.c           |   5 +-
 kernel/trace/trace_object.c                   | 611 ++++++++++++++++++
 kernel/trace/trace_output.c                   |  40 ++
 .../ftrace/test.d/trigger/trigger-objtrace.tc |  41 ++
 11 files changed, 847 insertions(+), 2 deletions(-)
 create mode 100644 kernel/trace/trace_object.c
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc


base-commit: 09794a5a6c348f629b35fc1687071a1622ef4265
-- 
2.25.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v15 1/4] trace: Add trace any kernel object
  2022-08-19  3:27 [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
@ 2022-08-19  3:27 ` Jeff Xie
  2022-10-17 19:58   ` Steven Rostedt
  2022-08-19  3:27 ` [PATCH v15 2/4] trace/objtrace: Get the value of the object Jeff Xie
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Jeff Xie @ 2022-08-19  3:27 UTC (permalink / raw)
  To: rostedt; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000, Jeff Xie

Introduce objtrace trigger to get the call flow by tracing any kernel
object in the function parameter.

The objtrace trigger makes a list of the target object address from
the given event parameter, and records all kernel function calls
which has the object address in its parameter.

Syntax:
	objtrace:add:obj[:count][if <filter>]

Usage:
	# echo 'p bio_add_page arg1=$arg1' > kprobe_events
	# cd events/kprobes/p_bio_add_page_0
	# echo 'objtrace:add:arg1:1 if comm == "cat"' > ./trigger
	# cat /test.txt

Signed-off-by: Jeff Xie <xiehuan09@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 include/linux/trace_events.h        |   1 +
 kernel/trace/Kconfig                |  10 +
 kernel/trace/Makefile               |   1 +
 kernel/trace/trace.c                |  11 +
 kernel/trace/trace.h                |  21 ++
 kernel/trace/trace_entries.h        |  17 ++
 kernel/trace/trace_events_trigger.c |   5 +-
 kernel/trace/trace_object.c         | 450 ++++++++++++++++++++++++++++
 kernel/trace/trace_output.c         |  40 +++
 9 files changed, 554 insertions(+), 2 deletions(-)
 create mode 100644 kernel/trace/trace_object.c

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index b18759a673c6..4c9a899fc041 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -693,6 +693,7 @@ enum event_trigger_type {
 	ETT_EVENT_HIST		= (1 << 4),
 	ETT_HIST_ENABLE		= (1 << 5),
 	ETT_EVENT_EPROBE	= (1 << 6),
+	ETT_TRACE_OBJECT	= (1 << 7),
 };
 
 extern int filter_match_preds(struct event_filter *filter, void *rec);
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 1052126bdca2..4a78e72a07df 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -278,6 +278,16 @@ config FUNCTION_PROFILER
 
 	  If in doubt, say N.
 
+config TRACE_OBJECT
+	bool "Trace kernel object in function parameter"
+	depends on FUNCTION_TRACER
+	depends on HAVE_FUNCTION_ARG_ACCESS_API
+	select TRACING
+	default y
+	help
+	 You can trace the kernel object in the kernel function parameter.
+	 The kernel object is dynamically specified via event trigger.
+
 config STACK_TRACER
 	bool "Trace max stack"
 	depends on HAVE_FUNCTION_TRACER
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index c6651e16b557..9bbd1ac22518 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -71,6 +71,7 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
 obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
 obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += fgraph.o
+obj-$(CONFIG_TRACE_OBJECT) += trace_object.o
 ifeq ($(CONFIG_BLOCK),y)
 obj-$(CONFIG_EVENT_TRACING) += blktrace.o
 endif
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 27febd4ee33e..2c2477dea0f2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5621,6 +5621,9 @@ static const char readme_msg[] =
 	"\t            enable_hist:<system>:<event>\n"
 	"\t            disable_hist:<system>:<event>\n"
 #endif
+#ifdef CONFIG_TRACE_OBJECT
+	"\t            objtrace:add:obj[:count][if <filter>]\n"
+#endif
 #ifdef CONFIG_STACKTRACE
 	"\t\t    stacktrace\n"
 #endif
@@ -9261,6 +9264,9 @@ static struct trace_array *trace_array_create(const char *name)
 	if (ftrace_allocate_ftrace_ops(tr) < 0)
 		goto out_free_tr;
 
+	if (allocate_objtrace_data(tr) < 0)
+		goto out_free_tr;
+
 	ftrace_init_trace_array(tr);
 
 	init_trace_flags_index(tr);
@@ -9280,6 +9286,7 @@ static struct trace_array *trace_array_create(const char *name)
 
  out_free_tr:
 	ftrace_free_ftrace_ops(tr);
+	free_objtrace_data(tr);
 	free_trace_buffers(tr);
 	free_cpumask_var(tr->tracing_cpumask);
 	kfree(tr->name);
@@ -9373,6 +9380,7 @@ static int __remove_instance(struct trace_array *tr)
 	event_trace_del_tracer(tr);
 	ftrace_clear_pids(tr);
 	ftrace_destroy_function_files(tr);
+	free_objtrace_data(tr);
 	tracefs_remove(tr->dir);
 	free_percpu(tr->last_func_repeats);
 	free_trace_buffers(tr);
@@ -10110,6 +10118,9 @@ __init static int tracer_alloc_buffers(void)
 		goto out_free_savedcmd;
 	}
 
+	if (allocate_objtrace_data(&global_trace))
+		goto out_free_savedcmd;
+
 	if (global_trace.buffer_disabled)
 		tracing_off();
 
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 900e75d96c84..8da1ef78a09a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -54,6 +54,7 @@ enum trace_type {
 	TRACE_TIMERLAT,
 	TRACE_RAW_DATA,
 	TRACE_FUNC_REPEATS,
+	TRACE_OBJECT,
 
 	__TRACE_LAST_TYPE,
 };
@@ -363,6 +364,9 @@ struct trace_array {
 	cpumask_var_t		tracing_cpumask; /* only trace on set CPUs */
 	int			ref;
 	int			trace_ref;
+#ifdef CONFIG_TRACE_OBJECT
+	struct objtrace_data	*obj_data;
+#endif
 #ifdef CONFIG_FUNCTION_TRACER
 	struct ftrace_ops	*ops;
 	struct trace_pid_list	__rcu *function_pids;
@@ -471,6 +475,7 @@ extern void __ftrace_bad_type(void);
 			  TRACE_GRAPH_RET);		\
 		IF_ASSIGN(var, ent, struct func_repeats_entry,		\
 			  TRACE_FUNC_REPEATS);				\
+		IF_ASSIGN(var, ent, struct trace_object_entry, TRACE_OBJECT);\
 		__ftrace_bad_type();					\
 	} while (0)
 
@@ -1536,6 +1541,16 @@ static inline int register_trigger_hist_cmd(void) { return 0; }
 static inline int register_trigger_hist_enable_disable_cmds(void) { return 0; }
 #endif
 
+#ifdef CONFIG_TRACE_OBJECT
+extern int register_trigger_object_cmd(void);
+extern int allocate_objtrace_data(struct trace_array *tr);
+extern void free_objtrace_data(struct trace_array *tr);
+#else
+static inline int register_trigger_object_cmd(void) { return 0; }
+static inline int allocate_objtrace_data(struct trace_array *tr) { return 0; }
+static inline void free_objtrace_data(struct trace_array *tr) { };
+#endif
+
 extern int register_trigger_cmds(void);
 extern void clear_event_triggers(struct trace_array *tr);
 
@@ -1611,6 +1626,12 @@ extern bool event_trigger_check_remove(const char *glob);
 extern bool event_trigger_empty_param(const char *param);
 extern int event_trigger_separate_filter(char *param_and_filter, char **param,
 					 char **filter, bool param_required);
+extern int register_trigger(char *glob,
+			struct event_trigger_data *data,
+			struct trace_event_file *file);
+extern void unregister_trigger(char *glob,
+			struct event_trigger_data *test,
+			struct trace_event_file *file);
 extern struct event_trigger_data *
 event_trigger_alloc(struct event_command *cmd_ops,
 		    char *cmd,
diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index cd41e863b51c..bb120d9498a9 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -401,3 +401,20 @@ FTRACE_ENTRY(timerlat, timerlat_entry,
 		 __entry->context,
 		 __entry->timer_latency)
 );
+
+/*
+ * trace object entry:
+ */
+FTRACE_ENTRY(object, trace_object_entry,
+
+	TRACE_OBJECT,
+
+	F_STRUCT(
+		__field(	unsigned long,		ip		)
+		__field(	unsigned long,		parent_ip	)
+		__field(	unsigned long,		object		)
+	),
+
+	F_printk(" %ps <-- %ps object:%lx\n",
+		 (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
+);
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index cb866c3141af..8bad4eb3b997 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -560,7 +560,7 @@ void update_cond_flag(struct trace_event_file *file)
  *
  * Return: 0 on success, errno otherwise
  */
-static int register_trigger(char *glob,
+int register_trigger(char *glob,
 			    struct event_trigger_data *data,
 			    struct trace_event_file *file)
 {
@@ -605,7 +605,7 @@ static int register_trigger(char *glob,
  * Usually used directly as the @unreg method in event command
  * implementations.
  */
-static void unregister_trigger(char *glob,
+void unregister_trigger(char *glob,
 			       struct event_trigger_data *test,
 			       struct trace_event_file *file)
 {
@@ -1976,6 +1976,7 @@ __init int register_trigger_cmds(void)
 	register_trigger_enable_disable_cmds();
 	register_trigger_hist_enable_disable_cmds();
 	register_trigger_hist_cmd();
+	register_trigger_object_cmd();
 
 	return 0;
 }
diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
new file mode 100644
index 000000000000..19ec4b1c0186
--- /dev/null
+++ b/kernel/trace/trace_object.c
@@ -0,0 +1,450 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * trace the kernel object in the kernel function parameter
+ * Copyright (C) 2021 Jeff Xie <xiehuan09@gmail.com>
+ */
+
+#define pr_fmt(fmt) "trace_object: " fmt
+
+#include "trace_output.h"
+
+#define MAX_TRACED_OBJECT 5
+#define OBJTRACE_CMD_LEN  16
+#define OBJTRACE_CMD_ADD "add"
+static LIST_HEAD(obj_data_head);
+static const int max_args_num = 6;
+static void exit_trace_object(struct trace_array *tr);
+static int init_trace_object(struct trace_array *tr);
+
+struct object_instance {
+	void *obj;
+	struct trace_array *tr;
+};
+
+/* objtrace private data */
+struct objtrace_trigger_data {
+	struct ftrace_event_field *field;
+	char objtrace_cmd[OBJTRACE_CMD_LEN];
+	struct trace_array *tr;
+};
+
+/* objtrace data with fops and objtrace_instances */
+struct objtrace_data {
+	struct list_head head;
+	struct trace_array *tr;
+	struct ftrace_ops fops;
+	int num_traced_obj;
+	struct object_instance traced_obj[MAX_TRACED_OBJECT];
+	raw_spinlock_t obj_data_lock;
+};
+
+static struct objtrace_data *get_obj_data(struct trace_array *tr)
+{
+	struct objtrace_data *obj_data = NULL;
+
+	list_for_each_entry(obj_data, &obj_data_head, head) {
+		if (obj_data->tr == tr)
+			break;
+	}
+	return obj_data;
+}
+
+static bool object_exist(void *obj, struct trace_array *tr)
+{
+	int i, max;
+	struct objtrace_data *obj_data;
+
+	obj_data = get_obj_data(tr);
+	if (!obj_data)
+		return false;
+
+	max = READ_ONCE(obj_data->num_traced_obj);
+	smp_rmb();
+	for (i = 0; i < max; i++) {
+		if (obj_data->traced_obj[i].obj == obj)
+			return true;
+	}
+	return false;
+}
+
+static bool object_empty(struct trace_array *tr)
+{
+	struct objtrace_data *obj_data;
+
+	obj_data = get_obj_data(tr);
+	if (!obj_data)
+		return false;
+
+	return !READ_ONCE(obj_data->num_traced_obj);
+}
+
+static void set_trace_object(void *obj, struct trace_array *tr)
+{
+	unsigned long flags;
+	struct object_instance *obj_ins;
+	struct objtrace_data *obj_data;
+
+	if (in_nmi())
+		return;
+
+	if (!obj && object_exist(obj, tr))
+		return;
+
+	obj_data = get_obj_data(tr);
+	if (!obj_data)
+		return;
+
+	/* only this place has write operations */
+	raw_spin_lock_irqsave(&obj_data->obj_data_lock, flags);
+	if (READ_ONCE(obj_data->num_traced_obj) == MAX_TRACED_OBJECT) {
+		trace_array_printk_buf(tr->array_buffer.buffer, _THIS_IP_,
+				"object_pool is full, can't trace object:0x%px\n", obj);
+		goto out;
+	}
+	obj_ins = &obj_data->traced_obj[READ_ONCE(obj_data->num_traced_obj)];
+	obj_ins->obj = obj;
+	obj_ins->tr = tr;
+	/* make sure the num_traced_obj update always appears after traced_obj update */
+	smp_wmb();
+	obj_data->num_traced_obj++;
+out:
+	raw_spin_unlock_irqrestore(&obj_data->obj_data_lock, flags);
+}
+
+static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
+				 unsigned long object, struct trace_array *tr)
+{
+
+	struct trace_buffer *buffer = tr->array_buffer.buffer;
+	struct ring_buffer_event *event;
+	struct trace_object_entry *entry;
+	unsigned int trace_ctx = 0;
+
+	trace_ctx = tracing_gen_ctx();
+	event = trace_buffer_lock_reserve(buffer, TRACE_OBJECT,
+			sizeof(*entry), trace_ctx);
+	if (!event)
+		return;
+	entry   = ring_buffer_event_data(event);
+	entry->ip                       = ip;
+	entry->parent_ip                = parent_ip;
+	entry->object			= object;
+
+	trace_buffer_unlock_commit(tr, buffer, event, trace_ctx);
+}
+
+static void
+trace_object_events_call(unsigned long ip, unsigned long parent_ip,
+		struct ftrace_ops *op, struct ftrace_regs *fregs)
+{
+	struct pt_regs *pt_regs = ftrace_get_regs(fregs);
+	struct trace_array *tr = op->private;
+	struct trace_array_cpu *data;
+	unsigned long obj;
+	long disabled;
+	int cpu, n;
+
+	preempt_disable_notrace();
+
+	cpu = raw_smp_processor_id();
+	data = per_cpu_ptr(tr->array_buffer.data, cpu);
+	disabled = atomic_inc_return(&data->disabled);
+	if (likely(disabled == 1)) {
+		if (object_empty(tr))
+			goto out;
+		for (n = 0; n < max_args_num; n++) {
+			obj = regs_get_kernel_argument(pt_regs, n);
+			if (object_exist((void *)obj, tr))
+				submit_trace_object(ip, parent_ip, obj, tr);
+		/* The parameters of a function may match multiple objects */
+		}
+	}
+out:
+	atomic_dec(&data->disabled);
+	preempt_enable_notrace();
+}
+
+static void
+trace_object_trigger(struct event_trigger_data *data,
+		   struct trace_buffer *buffer,  void *rec,
+		   struct ring_buffer_event *event)
+{
+	struct objtrace_trigger_data *obj_data = data->private_data;
+	struct trace_array *tr = obj_data->tr;
+	struct ftrace_event_field *field;
+	void *obj = NULL;
+
+	field = obj_data->field;
+	memcpy(&obj, rec + field->offset, sizeof(obj));
+	set_trace_object(obj, tr);
+}
+
+static void
+trace_object_trigger_free(struct event_trigger_data *data)
+{
+	struct objtrace_trigger_data *obj_data = data->private_data;
+
+	if (WARN_ON_ONCE(data->ref <= 0))
+		return;
+
+	data->ref--;
+	if (!data->ref) {
+		exit_trace_object(obj_data->tr);
+		kfree(data->private_data);
+		trigger_data_free(data);
+	}
+}
+
+static void
+trace_object_count_trigger(struct event_trigger_data *data,
+			 struct trace_buffer *buffer, void *rec,
+			 struct ring_buffer_event *event)
+{
+	if (!data->count)
+		return;
+
+	if (data->count != -1)
+		(data->count)--;
+
+	trace_object_trigger(data, buffer, rec, event);
+}
+
+static int
+event_trigger_print(const char *name, struct seq_file *m,
+		void *data, char *filter_str, void *objtrace_data)
+{
+	long count = (long)data;
+	struct objtrace_trigger_data *obj_data = objtrace_data;
+
+	seq_puts(m, name);
+
+	seq_printf(m, ":%s", obj_data->objtrace_cmd);
+	seq_printf(m, ":%s", obj_data->field->name);
+
+	if (count == -1)
+		seq_puts(m, ":unlimited");
+	else
+		seq_printf(m, ":count=%ld", count);
+
+	if (filter_str)
+		seq_printf(m, " if %s\n", filter_str);
+	else
+		seq_putc(m, '\n');
+
+	return 0;
+}
+
+static int event_object_trigger_init(struct event_trigger_data *data)
+{
+	struct objtrace_trigger_data *obj_data = data->private_data;
+	int ret;
+
+	if (!data->ref) {
+		ret = init_trace_object(obj_data->tr);
+		if (ret)
+			return ret;
+	}
+	data->ref++;
+	return 0;
+}
+
+static int
+trace_object_trigger_print(struct seq_file *m, struct event_trigger_data *data)
+{
+	return event_trigger_print("objtrace", m, (void *)data->count,
+				   data->filter_str, data->private_data);
+}
+
+static struct event_trigger_ops objecttrace_trigger_ops = {
+	.trigger		= trace_object_trigger,
+	.print			= trace_object_trigger_print,
+	.init			= event_object_trigger_init,
+	.free			= trace_object_trigger_free,
+};
+
+static struct event_trigger_ops objecttrace_count_trigger_ops = {
+	.trigger		= trace_object_count_trigger,
+	.print			= trace_object_trigger_print,
+	.init			= event_object_trigger_init,
+	.free			= trace_object_trigger_free,
+};
+
+static struct event_trigger_ops *
+objecttrace_get_trigger_ops(char *cmd, char *param)
+{
+	return param ? &objecttrace_count_trigger_ops : &objecttrace_trigger_ops;
+}
+
+static bool field_exist(struct trace_event_file *file,
+			struct event_command *cmd_ops,
+			const char *field_name)
+{
+	struct event_trigger_data *data;
+	struct objtrace_trigger_data *obj_data;
+
+	lockdep_assert_held(&event_mutex);
+
+	list_for_each_entry(data, &file->triggers, list) {
+		if (data->cmd_ops->trigger_type == cmd_ops->trigger_type) {
+			obj_data = data->private_data;
+			if (!strcmp(obj_data->field->name, field_name))
+				return true;
+		}
+	}
+
+	return false;
+}
+
+static int
+event_object_trigger_parse(struct event_command *cmd_ops,
+		       struct trace_event_file *file,
+		       char *glob, char *cmd, char *param_and_filter)
+{
+	struct event_trigger_data *trigger_data;
+	struct objtrace_trigger_data *obj_data;
+	struct ftrace_event_field *field;
+	char *objtrace_cmd, *arg;
+	char *param, *filter;
+	int ret;
+	bool remove;
+
+	remove = event_trigger_check_remove(glob);
+
+	/*
+	 * separate the param and the filter:
+	 * objtrace:add:OBJ[:COUNT] [if filter]
+	 */
+	ret = event_trigger_separate_filter(param_and_filter, &param, &filter, true);
+	if (ret)
+		return ret;
+
+	objtrace_cmd = strsep(&param, ":");
+	if (!objtrace_cmd || strcmp(objtrace_cmd, OBJTRACE_CMD_ADD)) {
+		pr_err("error objtrace command\n");
+		return -EINVAL;
+	}
+
+	arg = strsep(&param, ":");
+	if (!arg)
+		return -EINVAL;
+
+	field = trace_find_event_field(file->event_call, arg);
+	if (!field)
+		return -EINVAL;
+
+	if (field->size != sizeof(void *)) {
+		pr_err("the size of the %s should be:%zu\n", field->name, sizeof(void *));
+		return -EINVAL;
+	}
+
+	if (remove && !field_exist(file, cmd_ops, field->name))
+		return -ENOENT;
+
+	obj_data = kzalloc(sizeof(*obj_data), GFP_KERNEL);
+	if (!obj_data)
+		return -ENOMEM;
+
+	obj_data->field = field;
+	obj_data->tr = file->tr;
+	snprintf(obj_data->objtrace_cmd, OBJTRACE_CMD_LEN, objtrace_cmd);
+
+	trigger_data = event_trigger_alloc(cmd_ops, cmd, param, obj_data);
+	if (!trigger_data) {
+		kfree(obj_data);
+		return -ENOMEM;
+	}
+	if (remove) {
+		event_trigger_unregister(cmd_ops, file, glob+1, trigger_data);
+		kfree(obj_data);
+		kfree(trigger_data);
+		return 0;
+	}
+
+	ret = event_trigger_parse_num(param, trigger_data);
+	if (ret)
+		goto out_free;
+
+	ret = event_trigger_set_filter(cmd_ops, file, filter, trigger_data);
+	if (ret < 0)
+		goto out_free;
+
+	ret = event_trigger_register(cmd_ops, file, glob, trigger_data);
+	if (ret)
+		goto out_free;
+
+	return ret;
+
+ out_free:
+	event_trigger_reset_filter(cmd_ops, trigger_data);
+	kfree(obj_data);
+	kfree(trigger_data);
+	return ret;
+}
+
+static struct event_command trigger_object_cmd = {
+	.name			= "objtrace",
+	.trigger_type		= ETT_TRACE_OBJECT,
+	.flags			= EVENT_CMD_FL_NEEDS_REC,
+	.parse			= event_object_trigger_parse,
+	.reg			= register_trigger,
+	.unreg			= unregister_trigger,
+	.get_trigger_ops	= objecttrace_get_trigger_ops,
+	.set_filter		= set_trigger_filter,
+};
+
+__init int register_trigger_object_cmd(void)
+{
+	int ret;
+
+	ret = register_event_command(&trigger_object_cmd);
+	WARN_ON(ret < 0);
+
+	return ret;
+}
+
+int allocate_objtrace_data(struct trace_array *tr)
+{
+	struct objtrace_data *obj_data;
+	struct ftrace_ops *fops;
+
+	obj_data = kzalloc(sizeof(*obj_data), GFP_KERNEL);
+	if (!obj_data)
+		return -ENOMEM;
+
+	raw_spin_lock_init(&obj_data->obj_data_lock);
+	obj_data->tr = tr;
+	fops = &obj_data->fops;
+	fops->func = trace_object_events_call;
+	fops->flags = FTRACE_OPS_FL_SAVE_REGS;
+	fops->private = tr;
+	list_add(&obj_data->head, &obj_data_head);
+
+	tr->obj_data = obj_data;
+
+	return 0;
+}
+
+static int init_trace_object(struct trace_array *tr)
+{
+	int ret;
+
+	ret = register_ftrace_function(&tr->obj_data->fops);
+	WARN_ON(ret < 0);
+
+	return ret;
+}
+
+void free_objtrace_data(struct trace_array *tr)
+{
+	kfree(tr->obj_data);
+	tr->obj_data = NULL;
+}
+
+static void exit_trace_object(struct trace_array *tr)
+{
+	struct objtrace_data *obj_data = tr->obj_data;
+
+	obj_data->num_traced_obj = 0;
+	WARN_ONCE(unregister_ftrace_function(&obj_data->fops),
+			"can't unregister ftrace for trace object\n");
+}
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 67f47ea27921..34ff7b4dc521 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -1551,6 +1551,45 @@ static struct trace_event trace_func_repeats_event = {
 	.funcs		= &trace_func_repeats_funcs,
 };
 
+/* TRACE_OBJECT */
+static enum print_line_t trace_object_print(struct trace_iterator *iter, int flags,
+					struct trace_event *event)
+{
+	struct trace_object_entry *field;
+	struct trace_seq *s = &iter->seq;
+
+	trace_assign_type(field, iter->ent);
+	print_fn_trace(s, field->ip, field->parent_ip, flags);
+	trace_seq_printf(s, " object:0x%lx", field->object);
+	trace_seq_putc(s, '\n');
+
+	return trace_handle_return(s);
+}
+
+static enum print_line_t trace_object_raw(struct trace_iterator *iter, int flags,
+				      struct trace_event *event)
+{
+	struct trace_object_entry *field;
+
+	trace_assign_type(field, iter->ent);
+
+	trace_seq_printf(&iter->seq, "%lx %lx\n",
+			 field->ip,
+			 field->parent_ip);
+
+	return trace_handle_return(&iter->seq);
+}
+
+static struct trace_event_functions trace_object_funcs = {
+	.trace		= trace_object_print,
+	.raw		= trace_object_raw,
+};
+
+static struct trace_event trace_object_event = {
+	.type		= TRACE_OBJECT,
+	.funcs		= &trace_object_funcs,
+};
+
 static struct trace_event *events[] __initdata = {
 	&trace_fn_event,
 	&trace_ctx_event,
@@ -1565,6 +1604,7 @@ static struct trace_event *events[] __initdata = {
 	&trace_timerlat_event,
 	&trace_raw_data_event,
 	&trace_func_repeats_event,
+	&trace_object_event,
 	NULL
 };
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v15 2/4] trace/objtrace: Get the value of the object
  2022-08-19  3:27 [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
  2022-08-19  3:27 ` [PATCH v15 1/4] trace: Add trace any " Jeff Xie
@ 2022-08-19  3:27 ` Jeff Xie
  2022-09-24 20:01   ` Steven Rostedt
  2022-08-19  3:27 ` [PATCH v15 3/4] selftests/ftrace: Add testcases for objtrace Jeff Xie
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Jeff Xie @ 2022-08-19  3:27 UTC (permalink / raw)
  To: rostedt; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000, Jeff Xie

Using objtrace trigger to get the value of the object which from the kernel
function parameter.

Syntax:
	objtrace:add:obj[,offset][:type][:count][if <filter>]

Usage:
	# echo 'p bio_add_page arg1=$arg1' > ./kprobe_events
	# gdb vmlinux
	(gdb) p &(((struct bio *)0)->bi_iter.bi_size)
	$1 = (unsigned int *) 0x28
	# echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/ \
		 p_bio_add_page_0/trigger
	# cat /test.txt

Signed-off-by: Jeff Xie <xiehuan09@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 kernel/trace/trace.c         |   2 +-
 kernel/trace/trace_entries.h |   5 +-
 kernel/trace/trace_object.c  | 187 ++++++++++++++++++++++++++++++++---
 kernel/trace/trace_output.c  |   6 +-
 4 files changed, 181 insertions(+), 19 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2c2477dea0f2..c2ffc2235b7b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5622,7 +5622,7 @@ static const char readme_msg[] =
 	"\t            disable_hist:<system>:<event>\n"
 #endif
 #ifdef CONFIG_TRACE_OBJECT
-	"\t            objtrace:add:obj[:count][if <filter>]\n"
+	"\t            objtrace:add:obj[,offset][:type][:count][if <filter>]\n"
 #endif
 #ifdef CONFIG_STACKTRACE
 	"\t\t    stacktrace\n"
diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index bb120d9498a9..2407c45a568c 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -413,8 +413,9 @@ FTRACE_ENTRY(object, trace_object_entry,
 		__field(	unsigned long,		ip		)
 		__field(	unsigned long,		parent_ip	)
 		__field(	unsigned long,		object		)
+		__field(	unsigned long,		value		)
 	),
 
-	F_printk(" %ps <-- %ps object:%lx\n",
-		 (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
+	F_printk(" %ps <-- %ps object:%lx value:%lx\n", (void *)__entry->ip,
+	       (void *)__entry->parent_ip, __entry->object, __entry->value)
 );
diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
index 19ec4b1c0186..611341abeb7c 100644
--- a/kernel/trace/trace_object.c
+++ b/kernel/trace/trace_object.c
@@ -16,8 +16,14 @@ static const int max_args_num = 6;
 static void exit_trace_object(struct trace_array *tr);
 static int init_trace_object(struct trace_array *tr);
 
+/*
+ * get the offset from the special object and
+ * the type size of the value
+ */
 struct object_instance {
 	void *obj;
+	int obj_offset;
+	int obj_value_type_size;
 	struct trace_array *tr;
 };
 
@@ -25,9 +31,23 @@ struct object_instance {
 struct objtrace_trigger_data {
 	struct ftrace_event_field *field;
 	char objtrace_cmd[OBJTRACE_CMD_LEN];
+	int obj_offset;
+	int obj_value_type_size;
 	struct trace_array *tr;
 };
 
+/* get the type size for the special object */
+struct objtrace_fetch_type {
+	char *name;
+	int type_size;
+};
+
+enum objattr {
+	OBJ_OFFSET,
+	OBJ_VAL_TYPE_SIZE,
+	MAX_OBJ_ATTR
+};
+
 /* objtrace data with fops and objtrace_instances */
 struct objtrace_data {
 	struct list_head head;
@@ -67,6 +87,35 @@ static bool object_exist(void *obj, struct trace_array *tr)
 	return false;
 }
 
+static int get_object_attr(void *obj, int objattr,
+		struct trace_array *tr, int *result)
+{
+	int i, max;
+	struct objtrace_data *obj_data;
+
+	obj_data = get_obj_data(tr);
+	if (!obj_data)
+		return -EINVAL;
+
+	max = READ_ONCE(obj_data->num_traced_obj);
+	smp_rmb();
+	for (i = 0; i < max; i++) {
+		if (obj_data->traced_obj[i].obj == obj) {
+			switch (objattr) {
+			case OBJ_OFFSET:
+				*result = obj_data->traced_obj[i].obj_offset;
+				return 0;
+			case OBJ_VAL_TYPE_SIZE:
+				*result = obj_data->traced_obj[i].obj_value_type_size;
+				return 0;
+			default:
+				return -EINVAL;
+			}
+		}
+	}
+	return -EINVAL;
+}
+
 static bool object_empty(struct trace_array *tr)
 {
 	struct objtrace_data *obj_data;
@@ -78,7 +127,8 @@ static bool object_empty(struct trace_array *tr)
 	return !READ_ONCE(obj_data->num_traced_obj);
 }
 
-static void set_trace_object(void *obj, struct trace_array *tr)
+static void set_trace_object(void *obj, int obj_offset,
+			int obj_value_type_size, struct trace_array *tr)
 {
 	unsigned long flags;
 	struct object_instance *obj_ins;
@@ -103,6 +153,8 @@ static void set_trace_object(void *obj, struct trace_array *tr)
 	}
 	obj_ins = &obj_data->traced_obj[READ_ONCE(obj_data->num_traced_obj)];
 	obj_ins->obj = obj;
+	obj_ins->obj_value_type_size = obj_value_type_size;
+	obj_ins->obj_offset = obj_offset;
 	obj_ins->tr = tr;
 	/* make sure the num_traced_obj update always appears after traced_obj update */
 	smp_wmb();
@@ -112,7 +164,7 @@ static void set_trace_object(void *obj, struct trace_array *tr)
 }
 
 static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
-				 unsigned long object, struct trace_array *tr)
+		unsigned long object, unsigned long value, struct trace_array *tr)
 {
 
 	struct trace_buffer *buffer = tr->array_buffer.buffer;
@@ -129,10 +181,43 @@ static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
 	entry->ip                       = ip;
 	entry->parent_ip                = parent_ip;
 	entry->object			= object;
+	entry->value			= value;
 
 	trace_buffer_unlock_commit(tr, buffer, event, trace_ctx);
 }
 
+static inline long get_object_value(unsigned long *val, void *obj, int type_size)
+{
+	char tmp[sizeof(u64)];
+	long ret = 0;
+
+	ret = copy_from_kernel_nofault(tmp, obj, sizeof(tmp));
+	if (ret)
+		return ret;
+	switch (type_size) {
+	case 1: {
+		*val = (unsigned long)*(u8 *)tmp;
+		break;
+	}
+	case 2: {
+		*val = (unsigned long)*(u16 *)tmp;
+		break;
+	}
+	case 4: {
+		*val = (unsigned long)*(u32 *)tmp;
+		break;
+	}
+	case 8: {
+		*val = (unsigned long)*(u64 *)tmp;
+		break;
+	}
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static void
 trace_object_events_call(unsigned long ip, unsigned long parent_ip,
 		struct ftrace_ops *op, struct ftrace_regs *fregs)
@@ -140,7 +225,8 @@ trace_object_events_call(unsigned long ip, unsigned long parent_ip,
 	struct pt_regs *pt_regs = ftrace_get_regs(fregs);
 	struct trace_array *tr = op->private;
 	struct trace_array_cpu *data;
-	unsigned long obj;
+	int ret, val_type_size, obj_offset;
+	unsigned long obj, val;
 	long disabled;
 	int cpu, n;
 
@@ -154,8 +240,21 @@ trace_object_events_call(unsigned long ip, unsigned long parent_ip,
 			goto out;
 		for (n = 0; n < max_args_num; n++) {
 			obj = regs_get_kernel_argument(pt_regs, n);
-			if (object_exist((void *)obj, tr))
-				submit_trace_object(ip, parent_ip, obj, tr);
+			if (object_exist((void *)obj, tr)) {
+				ret = get_object_attr((void *)obj, OBJ_OFFSET,
+						tr, &obj_offset);
+				if (unlikely(ret) < 0)
+					goto out;
+				ret = get_object_attr((void *)obj, OBJ_VAL_TYPE_SIZE,
+						tr, &val_type_size);
+				if (unlikely(ret) < 0)
+					goto out;
+				if (get_object_value(&val, (void *)(obj + obj_offset),
+							val_type_size))
+					continue;
+
+				submit_trace_object(ip, parent_ip, obj, val, tr);
+			}
 		/* The parameters of a function may match multiple objects */
 		}
 	}
@@ -176,9 +275,23 @@ trace_object_trigger(struct event_trigger_data *data,
 
 	field = obj_data->field;
 	memcpy(&obj, rec + field->offset, sizeof(obj));
-	set_trace_object(obj, tr);
+	/* set the offset from the special object and the type size of the value*/
+	set_trace_object(obj, obj_data->obj_offset,
+			obj_data->obj_value_type_size, tr);
 }
 
+static const struct objtrace_fetch_type objtrace_fetch_types[] = {
+	{"u8", 1},
+	{"x8", 1},
+	{"u16", 2},
+	{"x16", 2},
+	{"u32", 4},
+	{"x32", 4},
+	{"u64", 8},
+	{"x64", 8},
+	{NULL, 0},
+};
+
 static void
 trace_object_trigger_free(struct event_trigger_data *data)
 {
@@ -213,14 +326,25 @@ static int
 event_trigger_print(const char *name, struct seq_file *m,
 		void *data, char *filter_str, void *objtrace_data)
 {
+	int i;
 	long count = (long)data;
 	struct objtrace_trigger_data *obj_data = objtrace_data;
+	const char *value_type_name;
 
 	seq_puts(m, name);
 
 	seq_printf(m, ":%s", obj_data->objtrace_cmd);
 	seq_printf(m, ":%s", obj_data->field->name);
+	if (obj_data->obj_offset)
+		seq_printf(m, ",0x%x", obj_data->obj_offset);
 
+	for (i = 0; objtrace_fetch_types[i].name; i++) {
+		if (objtrace_fetch_types[i].type_size == obj_data->obj_value_type_size) {
+			value_type_name = objtrace_fetch_types[i].name;
+			break;
+		}
+	}
+	seq_printf(m, ":%s", value_type_name);
 	if (count == -1)
 		seq_puts(m, ":unlimited");
 	else
@@ -303,16 +427,18 @@ event_object_trigger_parse(struct event_command *cmd_ops,
 	struct event_trigger_data *trigger_data;
 	struct objtrace_trigger_data *obj_data;
 	struct ftrace_event_field *field;
-	char *objtrace_cmd, *arg;
-	char *param, *filter;
-	int ret;
+	char *objtrace_cmd, *obj;
+	char *param, *filter, *str, *type;
+	int ret, i, def_type_size, obj_value_type_size = 0;
+	char *tmp_saved_param;
+	long offset = 0;
 	bool remove;
 
 	remove = event_trigger_check_remove(glob);
 
 	/*
 	 * separate the param and the filter:
-	 * objtrace:add:OBJ[:COUNT] [if filter]
+	 * objtrace:add:OBJ[,OFFS][:TYPE][:COUNT] [if filter]
 	 */
 	ret = event_trigger_separate_filter(param_and_filter, &param, &filter, true);
 	if (ret)
@@ -324,11 +450,44 @@ event_object_trigger_parse(struct event_command *cmd_ops,
 		return -EINVAL;
 	}
 
-	arg = strsep(&param, ":");
-	if (!arg)
+	obj = strsep(&param, ":");
+	if (!obj)
 		return -EINVAL;
 
-	field = trace_find_event_field(file->event_call, arg);
+	str = strchr(obj, ',');
+	if (!str)
+		offset = 0;
+	else {
+		*str++ = '\0';
+		ret = kstrtol(str, 0, &offset);
+		if (ret)
+			return -EINVAL;
+	}
+	def_type_size = sizeof(void *);
+	if (!param) {
+		obj_value_type_size = def_type_size;
+		goto skip_get_type;
+	}
+	tmp_saved_param = param;
+	type = strsep(&param, ":");
+	if (!type)
+		obj_value_type_size = def_type_size;
+	/* if this is the trigger count */
+	else if (isdigit(type[0])) {
+		obj_value_type_size = def_type_size;
+		param = tmp_saved_param;
+	} else {
+		for (i = 0; objtrace_fetch_types[i].name; i++) {
+			if (strcmp(objtrace_fetch_types[i].name, type) == 0) {
+				obj_value_type_size = objtrace_fetch_types[i].type_size;
+				break;
+			}
+		}
+	}
+	if (!obj_value_type_size)
+		return -EINVAL;
+skip_get_type:
+	field = trace_find_event_field(file->event_call, obj);
 	if (!field)
 		return -EINVAL;
 
@@ -345,6 +504,8 @@ event_object_trigger_parse(struct event_command *cmd_ops,
 		return -ENOMEM;
 
 	obj_data->field = field;
+	obj_data->obj_offset = offset;
+	obj_data->obj_value_type_size = obj_value_type_size;
 	obj_data->tr = file->tr;
 	snprintf(obj_data->objtrace_cmd, OBJTRACE_CMD_LEN, objtrace_cmd);
 
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 34ff7b4dc521..a45871b52dcc 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -1561,6 +1561,7 @@ static enum print_line_t trace_object_print(struct trace_iterator *iter, int fla
 	trace_assign_type(field, iter->ent);
 	print_fn_trace(s, field->ip, field->parent_ip, flags);
 	trace_seq_printf(s, " object:0x%lx", field->object);
+	trace_seq_printf(s, " value:0x%lx", field->value);
 	trace_seq_putc(s, '\n');
 
 	return trace_handle_return(s);
@@ -1573,9 +1574,8 @@ static enum print_line_t trace_object_raw(struct trace_iterator *iter, int flags
 
 	trace_assign_type(field, iter->ent);
 
-	trace_seq_printf(&iter->seq, "%lx %lx\n",
-			 field->ip,
-			 field->parent_ip);
+	trace_seq_printf(&iter->seq, "%lx %lx %lx %lx\n", field->ip,
+			field->parent_ip, field->object, field->value);
 
 	return trace_handle_return(&iter->seq);
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v15 3/4] selftests/ftrace: Add testcases for objtrace
  2022-08-19  3:27 [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
  2022-08-19  3:27 ` [PATCH v15 1/4] trace: Add trace any " Jeff Xie
  2022-08-19  3:27 ` [PATCH v15 2/4] trace/objtrace: Get the value of the object Jeff Xie
@ 2022-08-19  3:27 ` Jeff Xie
  2022-08-19  3:27 ` [PATCH v15 4/4] Documentation: trace/objtrace: Add documentation " Jeff Xie
  2022-10-17 19:40 ` [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object Steven Rostedt
  4 siblings, 0 replies; 21+ messages in thread
From: Jeff Xie @ 2022-08-19  3:27 UTC (permalink / raw)
  To: rostedt
  Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000, Jeff Xie,
	Shuah Khan, linux-kselftest, Shuah Khan

Add a series of testcases to illustrate correct and incorrect usage of
objtrace trigger.

Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Jeff Xie <xiehuan09@gmail.com>
Tested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
---
 .../ftrace/test.d/trigger/trigger-objtrace.tc | 41 +++++++++++++++++++
 1 file changed, 41 insertions(+)
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc

diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
new file mode 100644
index 000000000000..d894442b6a30
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
@@ -0,0 +1,41 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: event trigger - test objtrace-trigger
+# requires: kprobe_events "objtrace":README
+
+fail() { #msg
+    echo $1
+    exit_fail
+}
+
+echo 'p bio_add_page arg1=$arg1 arg2=$arg2' > kprobe_events
+
+FEATURE=`grep objtrace events/kprobes/p_bio_add_page_0/trigger`
+if [ -z "$FEATURE" ]; then
+    echo "objtrace trigger is not supported"
+    exit_unsupported
+fi
+
+echo "Test objtrace trigger"
+echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > \
+	events/kprobes/p_bio_add_page_0/trigger
+if [ -z $? ]; then
+	fail "objtrace trigger syntax error"
+fi
+
+echo "Test objtrace semantic errors"
+
+# Being lack of objtrace command
+! echo 'objtrace:arg1,0x28:u32:1' > events/kprobes/p_bio_add_page_0/trigger
+# Bad parameter name
+! echo 'objtrace:add:argx:u32:1' > events/kprobes/p_bio_add_page_0/trigger
+# The parameter existed on event
+! echo 'objtrace:add:arg2:u32:1' > events/kprobes/p_bio_add_page_0/trigger
+
+echo "reset objtrace trigger"
+
+echo '!objtrace:add:arg1,0x28:u32' > \
+	events/kprobes/p_bio_add_page_0/trigger
+echo '-:p_bio_add_page_0' >> ./kprobe_events
+
+exit 0
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v15 4/4] Documentation: trace/objtrace: Add documentation for objtrace
  2022-08-19  3:27 [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
                   ` (2 preceding siblings ...)
  2022-08-19  3:27 ` [PATCH v15 3/4] selftests/ftrace: Add testcases for objtrace Jeff Xie
@ 2022-08-19  3:27 ` Jeff Xie
  2022-10-17 19:40 ` [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object Steven Rostedt
  4 siblings, 0 replies; 21+ messages in thread
From: Jeff Xie @ 2022-08-19  3:27 UTC (permalink / raw)
  To: rostedt
  Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000, Jeff Xie,
	Joel Fernandes, Jonathan Corbet, Bagas Sanjaya, linux-doc

Add documentation explaining how to use objtrace trigger to get the value
of the object.

Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Jeff Xie <xiehuan09@gmail.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
---
Changelog:
V15:
- Add syntax suggested by Joel for usage of objtrace

 Documentation/trace/events.rst | 90 ++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)

diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
index c47f381d0c00..1209f3cd8f1d 100644
--- a/Documentation/trace/events.rst
+++ b/Documentation/trace/events.rst
@@ -546,6 +546,96 @@ The following commands are supported:
 
   See Documentation/trace/histogram.rst for details and examples.
 
+- objtrace
+
+  This command provides a way to get the value of any object, The object
+  can be obtained from the dynamic event (kprobe_event/uprobe_event) or the
+  static event (tracepoint).
+
+  Syntax:
+        objtrace:add:obj[,offset][:type][:count][if <filter>]
+
+  Usage:
+  When using the kprobe event, by only need to set the objtrace (a new
+  trigger), we can get the value of object that is set by kprobe event.
+
+  For example, for the function bio_add_page():
+
+  .. code-block:: c
+
+     int bio_add_page(struct bio *bio, struct page *page,
+	              unsigned int len, unsigned int offset)
+
+  Firstly, we can set the base of the object as first parameter (arg1) to
+  to the function:
+
+  .. code-block::
+
+     # echo 'p bio_add_page arg1=$arg1' > ./kprobe_events
+
+  Secondly, we can get the value dynamically based on the object:
+
+  .. code-block::
+
+     find the offset of the bi_size in struct bio:
+     $ gdb vmlinux
+     (gdb) p &(((struct bio *)0)->bi_iter.bi_size)
+     $1 = (unsigned int *) 0x28
+
+     # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/ \
+       p_bio_add_page_0/trigger
+
+     # cd /sys/kernel/debug/tracing/
+     # echo 'p bio_add_page arg1=$arg1' > ./kprobe_events
+     # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger
+
+     # du -sh /test.txt
+     12.0K   /test.txt
+
+     # cat  /test.txt > /dev/null
+     # cat ./trace
+     # tracer: nop
+     #
+     # entries-in-buffer/entries-written: 128/128   #P:4
+     #
+     #                                _-----=> irqs-off/BH-disabled
+     #                               / _----=> need-resched
+     #                              | / _---=> hardirq/softirq
+     #                              || / _--=> preempt-depth
+     #                              ||| / _-=> migrate-disable
+     #                              |||| /     delay
+     #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
+     #              | |         |   |||||     |         |
+                  cat-117     [002] ...1.     1.602243: __bio_try_merge_page <-bio_add_page object:0xffff88811bee4000 value:0x0
+                  cat-117     [002] ...1.     1.602244: __bio_add_page <-bio_add_page object:0xffff88811bee4000 value:0x0
+                  cat-117     [002] ...2.     1.602244: bio_add_page <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x1000
+                  cat-117     [002] ...1.     1.602245: __bio_try_merge_page <-bio_add_page object:0xffff88811bee4000 value:0x1000
+                  cat-117     [002] ...1.     1.602245: __bio_add_page <-bio_add_page object:0xffff88811bee4000 value:0x1000
+                  cat-117     [002] ...2.     1.602245: bio_add_page <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x2000
+                  cat-117     [002] ...1.     1.602245: __bio_try_merge_page <-bio_add_page object:0xffff88811bee4000 value:0x2000
+                  cat-117     [002] ...1.     1.602245: __bio_add_page <-bio_add_page object:0xffff88811bee4000 value:0x2000
+                  cat-117     [002] ...1.     1.602245: submit_bio <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x3000
+                  cat-117     [002] ...1.     1.602245: submit_bio_noacct <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x3000
+                  cat-117     [002] ...1.     1.602246: __submit_bio <-submit_bio_noacct object:0xffff88811bee4000 value:0x3000
+                  cat-117     [002] ...1.     1.602246: submit_bio_checks <-__submit_bio object:0xffff88811bee4000 value:0x3000
+                  cat-117     [002] ...1.     1.602246: __cond_resched <-submit_bio_checks object:0xffff88811bee4000 value:0x3000
+                  cat-117     [002] ...1.     1.602246: should_fail_bio <-submit_bio_checks object:0xffff88811bee4000 value:0x3000
+                  cat-117     [002] ...1.     1.602246: blk_mq_submit_bio <-submit_bio_noacct object:0xffff88811bee4000 value:0x3000
+                  cat-117     [002] ...1.     1.602246: blk_attempt_plug_merge <-blk_mq_submit_bio object:0xffff88811bee4000 value:0x3000
+                  cat-117     [002] ...1.     1.602246: blk_mq_sched_bio_merge <-blk_mq_submit_bio object:0xffff88811bee4000 value:0x3000
+                  cat-117     [002] ...1.     1.602247: __rcu_read_lock <-blk_mq_submit_bio object:0xffff88811bee4000 value:0x3000
+                  cat-117     [002] ...1.     1.602247: __rcu_read_unlock <-blk_mq_submit_bio object:0xffff88811bee4000 value:0x3000
+                  cat-117     [002] ...1.     1.602247: __blk_mq_alloc_requests <-blk_mq_submit_bio object:0xffff88811bee4000 value:0x3000
+               <idle>-0       [002] d..3.     1.602298: bio_endio <-blk_update_request object:0xffff88811bee4000 value:0x0
+               <idle>-0       [002] d..3.     1.602298: mpage_end_io <-blk_update_request object:0xffff88811bee4000 value:0x0
+               <idle>-0       [002] d..3.     1.602298: __read_end_io <-blk_update_request object:0xffff88811bee4000 value:0x0
+               <idle>-0       [002] d..3.     1.602300: bio_put <-blk_update_request object:0xffff88811bee4000 value:0x0
+               <idle>-0       [002] d..3.     1.602300: bio_free <-blk_update_request object:0xffff88811bee4000 value:0x0
+               <idle>-0       [002] d..3.     1.602300: mempool_free <-blk_update_request object:0xffff88811bee4000 value:0x0
+               <idle>-0       [002] d..3.     1.602300: mempool_free_slab <-blk_update_request object:0xffff88811bee4000 value:0x0
+               <idle>-0       [002] d..3.     1.602300: kmem_cache_free <-blk_update_request object:0xffff88811bee4000 value:0x0
+                ...
+
 7. In-kernel trace event API
 ============================
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v15 2/4] trace/objtrace: Get the value of the object
  2022-08-19  3:27 ` [PATCH v15 2/4] trace/objtrace: Get the value of the object Jeff Xie
@ 2022-09-24 20:01   ` Steven Rostedt
  2022-09-25  5:27     ` Jeff Xie
  0 siblings, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2022-09-24 20:01 UTC (permalink / raw)
  To: Jeff Xie; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000

On Fri, 19 Aug 2022 11:27:04 +0800
Jeff Xie <xiehuan09@gmail.com> wrote:

Hi Jeff,

I finally (sorry about the long wait) got a chance to take a look at
this code and I really like it a lot! and I want to get this into the
next merge window.

I have one issue with it though, and that's with the syntax.

> Using objtrace trigger to get the value of the object which from the kernel
> function parameter.
> 
> Syntax:
> 	objtrace:add:obj[,offset][:type][:count][if <filter>]

I'm thinking instead of using the above syntax that is new, instead use
the syntax that is used by kprobes, eprobes and uprobes. That is:

  objtrace:+offset(obj):type

That is, instead of:

  objtrace:add:arg1,0x28:u32:1

have:

  objtrace:+0x28(arg1):u32

Perhaps we can add for count for greater than 1:

  obtrace:+0x28(arg1):u32[2]

for two items.

Then we could do even more complex analysis where we can dereference a
pointer within a structure to another pointer:

  obtrace:+0x16(+0x28(arg1)):u32[2]

Which will look at arg1, add 0x28 to it. dereference that location,
then add 0x16 to the value, and then dereference that location as well.

This code is available in the kprobe code that eprobes also uses:

See process_fetch_insn() in kernel/trace/trace_eprobe.c

and the parsing of the string is in kernel/trace/trace_probe.c:

  parse_probe_arg()


I think doing this will make it much more extensive, not to mention it
will match the syntax of other code in the tracing infrastructure.

What do you think?

-- Steve

> 
> Usage:
> 	# echo 'p bio_add_page arg1=$arg1' > ./kprobe_events
> 	# gdb vmlinux
> 	(gdb) p &(((struct bio *)0)->bi_iter.bi_size)
> 	$1 = (unsigned int *) 0x28
> 	# echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/ \
> 		 p_bio_add_page_0/trigger
> 	# cat /test.txt
> 
> Signed-off-by: Jeff Xie <xiehuan09@gmail.com>
> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> ---
>  kernel/trace/trace.c         |   2 +-
>  kernel/trace/trace_entries.h |   5 +-
>  kernel/trace/trace_object.c  | 187 ++++++++++++++++++++++++++++++++---
>  kernel/trace/trace_output.c  |   6 +-
>  4 files changed, 181 insertions(+), 19 deletions(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 2c2477dea0f2..c2ffc2235b7b 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -5622,7 +5622,7 @@ static const char readme_msg[] =
>  	"\t            disable_hist:<system>:<event>\n"
>  #endif
>  #ifdef CONFIG_TRACE_OBJECT
> -	"\t            objtrace:add:obj[:count][if <filter>]\n"
> +	"\t            objtrace:add:obj[,offset][:type][:count][if <filter>]\n"
>  #endif
>  #ifdef CONFIG_STACKTRACE
>  	"\t\t    stacktrace\n"
> diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> index bb120d9498a9..2407c45a568c 100644
> --- a/kernel/trace/trace_entries.h
> +++ b/kernel/trace/trace_entries.h
> @@ -413,8 +413,9 @@ FTRACE_ENTRY(object, trace_object_entry,
>  		__field(	unsigned long,		ip		)
>  		__field(	unsigned long,		parent_ip	)
>  		__field(	unsigned long,		object		)
> +		__field(	unsigned long,		value		)
>  	),
>  
> -	F_printk(" %ps <-- %ps object:%lx\n",
> -		 (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
> +	F_printk(" %ps <-- %ps object:%lx value:%lx\n", (void *)__entry->ip,
> +	       (void *)__entry->parent_ip, __entry->object, __entry->value)
>  );
> diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
> index 19ec4b1c0186..611341abeb7c 100644
> --- a/kernel/trace/trace_object.c
> +++ b/kernel/trace/trace_object.c
> @@ -16,8 +16,14 @@ static const int max_args_num = 6;
>  static void exit_trace_object(struct trace_array *tr);
>  static int init_trace_object(struct trace_array *tr);
>  
> +/*
> + * get the offset from the special object and
> + * the type size of the value
> + */
>  struct object_instance {
>  	void *obj;
> +	int obj_offset;
> +	int obj_value_type_size;
>  	struct trace_array *tr;
>  };
>  
> @@ -25,9 +31,23 @@ struct object_instance {
>  struct objtrace_trigger_data {
>  	struct ftrace_event_field *field;
>  	char objtrace_cmd[OBJTRACE_CMD_LEN];
> +	int obj_offset;
> +	int obj_value_type_size;
>  	struct trace_array *tr;
>  };
>  
> +/* get the type size for the special object */
> +struct objtrace_fetch_type {
> +	char *name;
> +	int type_size;
> +};
> +
> +enum objattr {
> +	OBJ_OFFSET,
> +	OBJ_VAL_TYPE_SIZE,
> +	MAX_OBJ_ATTR
> +};
> +
>  /* objtrace data with fops and objtrace_instances */
>  struct objtrace_data {
>  	struct list_head head;
> @@ -67,6 +87,35 @@ static bool object_exist(void *obj, struct trace_array *tr)
>  	return false;
>  }
>  
> +static int get_object_attr(void *obj, int objattr,
> +		struct trace_array *tr, int *result)
> +{
> +	int i, max;
> +	struct objtrace_data *obj_data;
> +
> +	obj_data = get_obj_data(tr);
> +	if (!obj_data)
> +		return -EINVAL;
> +
> +	max = READ_ONCE(obj_data->num_traced_obj);
> +	smp_rmb();
> +	for (i = 0; i < max; i++) {
> +		if (obj_data->traced_obj[i].obj == obj) {
> +			switch (objattr) {
> +			case OBJ_OFFSET:
> +				*result = obj_data->traced_obj[i].obj_offset;
> +				return 0;
> +			case OBJ_VAL_TYPE_SIZE:
> +				*result = obj_data->traced_obj[i].obj_value_type_size;
> +				return 0;
> +			default:
> +				return -EINVAL;
> +			}
> +		}
> +	}
> +	return -EINVAL;
> +}
> +
>  static bool object_empty(struct trace_array *tr)
>  {
>  	struct objtrace_data *obj_data;
> @@ -78,7 +127,8 @@ static bool object_empty(struct trace_array *tr)
>  	return !READ_ONCE(obj_data->num_traced_obj);
>  }
>  
> -static void set_trace_object(void *obj, struct trace_array *tr)
> +static void set_trace_object(void *obj, int obj_offset,
> +			int obj_value_type_size, struct trace_array *tr)
>  {
>  	unsigned long flags;
>  	struct object_instance *obj_ins;
> @@ -103,6 +153,8 @@ static void set_trace_object(void *obj, struct trace_array *tr)
>  	}
>  	obj_ins = &obj_data->traced_obj[READ_ONCE(obj_data->num_traced_obj)];
>  	obj_ins->obj = obj;
> +	obj_ins->obj_value_type_size = obj_value_type_size;
> +	obj_ins->obj_offset = obj_offset;
>  	obj_ins->tr = tr;
>  	/* make sure the num_traced_obj update always appears after traced_obj update */
>  	smp_wmb();
> @@ -112,7 +164,7 @@ static void set_trace_object(void *obj, struct trace_array *tr)
>  }
>  
>  static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> -				 unsigned long object, struct trace_array *tr)
> +		unsigned long object, unsigned long value, struct trace_array *tr)
>  {
>  
>  	struct trace_buffer *buffer = tr->array_buffer.buffer;
> @@ -129,10 +181,43 @@ static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
>  	entry->ip                       = ip;
>  	entry->parent_ip                = parent_ip;
>  	entry->object			= object;
> +	entry->value			= value;
>  
>  	trace_buffer_unlock_commit(tr, buffer, event, trace_ctx);
>  }
>  
> +static inline long get_object_value(unsigned long *val, void *obj, int type_size)
> +{
> +	char tmp[sizeof(u64)];
> +	long ret = 0;
> +
> +	ret = copy_from_kernel_nofault(tmp, obj, sizeof(tmp));
> +	if (ret)
> +		return ret;
> +	switch (type_size) {
> +	case 1: {
> +		*val = (unsigned long)*(u8 *)tmp;
> +		break;
> +	}
> +	case 2: {
> +		*val = (unsigned long)*(u16 *)tmp;
> +		break;
> +	}
> +	case 4: {
> +		*val = (unsigned long)*(u32 *)tmp;
> +		break;
> +	}
> +	case 8: {
> +		*val = (unsigned long)*(u64 *)tmp;
> +		break;
> +	}
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  static void
>  trace_object_events_call(unsigned long ip, unsigned long parent_ip,
>  		struct ftrace_ops *op, struct ftrace_regs *fregs)
> @@ -140,7 +225,8 @@ trace_object_events_call(unsigned long ip, unsigned long parent_ip,
>  	struct pt_regs *pt_regs = ftrace_get_regs(fregs);
>  	struct trace_array *tr = op->private;
>  	struct trace_array_cpu *data;
> -	unsigned long obj;
> +	int ret, val_type_size, obj_offset;
> +	unsigned long obj, val;
>  	long disabled;
>  	int cpu, n;
>  
> @@ -154,8 +240,21 @@ trace_object_events_call(unsigned long ip, unsigned long parent_ip,
>  			goto out;
>  		for (n = 0; n < max_args_num; n++) {
>  			obj = regs_get_kernel_argument(pt_regs, n);
> -			if (object_exist((void *)obj, tr))
> -				submit_trace_object(ip, parent_ip, obj, tr);
> +			if (object_exist((void *)obj, tr)) {
> +				ret = get_object_attr((void *)obj, OBJ_OFFSET,
> +						tr, &obj_offset);
> +				if (unlikely(ret) < 0)
> +					goto out;
> +				ret = get_object_attr((void *)obj, OBJ_VAL_TYPE_SIZE,
> +						tr, &val_type_size);
> +				if (unlikely(ret) < 0)
> +					goto out;
> +				if (get_object_value(&val, (void *)(obj + obj_offset),
> +							val_type_size))
> +					continue;
> +
> +				submit_trace_object(ip, parent_ip, obj, val, tr);
> +			}
>  		/* The parameters of a function may match multiple objects */
>  		}
>  	}
> @@ -176,9 +275,23 @@ trace_object_trigger(struct event_trigger_data *data,
>  
>  	field = obj_data->field;
>  	memcpy(&obj, rec + field->offset, sizeof(obj));
> -	set_trace_object(obj, tr);
> +	/* set the offset from the special object and the type size of the value*/
> +	set_trace_object(obj, obj_data->obj_offset,
> +			obj_data->obj_value_type_size, tr);
>  }
>  
> +static const struct objtrace_fetch_type objtrace_fetch_types[] = {
> +	{"u8", 1},
> +	{"x8", 1},
> +	{"u16", 2},
> +	{"x16", 2},
> +	{"u32", 4},
> +	{"x32", 4},
> +	{"u64", 8},
> +	{"x64", 8},
> +	{NULL, 0},
> +};
> +
>  static void
>  trace_object_trigger_free(struct event_trigger_data *data)
>  {
> @@ -213,14 +326,25 @@ static int
>  event_trigger_print(const char *name, struct seq_file *m,
>  		void *data, char *filter_str, void *objtrace_data)
>  {
> +	int i;
>  	long count = (long)data;
>  	struct objtrace_trigger_data *obj_data = objtrace_data;
> +	const char *value_type_name;
>  
>  	seq_puts(m, name);
>  
>  	seq_printf(m, ":%s", obj_data->objtrace_cmd);
>  	seq_printf(m, ":%s", obj_data->field->name);
> +	if (obj_data->obj_offset)
> +		seq_printf(m, ",0x%x", obj_data->obj_offset);
>  
> +	for (i = 0; objtrace_fetch_types[i].name; i++) {
> +		if (objtrace_fetch_types[i].type_size == obj_data->obj_value_type_size) {
> +			value_type_name = objtrace_fetch_types[i].name;
> +			break;
> +		}
> +	}
> +	seq_printf(m, ":%s", value_type_name);
>  	if (count == -1)
>  		seq_puts(m, ":unlimited");
>  	else
> @@ -303,16 +427,18 @@ event_object_trigger_parse(struct event_command *cmd_ops,
>  	struct event_trigger_data *trigger_data;
>  	struct objtrace_trigger_data *obj_data;
>  	struct ftrace_event_field *field;
> -	char *objtrace_cmd, *arg;
> -	char *param, *filter;
> -	int ret;
> +	char *objtrace_cmd, *obj;
> +	char *param, *filter, *str, *type;
> +	int ret, i, def_type_size, obj_value_type_size = 0;
> +	char *tmp_saved_param;
> +	long offset = 0;
>  	bool remove;
>  
>  	remove = event_trigger_check_remove(glob);
>  
>  	/*
>  	 * separate the param and the filter:
> -	 * objtrace:add:OBJ[:COUNT] [if filter]
> +	 * objtrace:add:OBJ[,OFFS][:TYPE][:COUNT] [if filter]
>  	 */
>  	ret = event_trigger_separate_filter(param_and_filter, &param, &filter, true);
>  	if (ret)
> @@ -324,11 +450,44 @@ event_object_trigger_parse(struct event_command *cmd_ops,
>  		return -EINVAL;
>  	}
>  
> -	arg = strsep(&param, ":");
> -	if (!arg)
> +	obj = strsep(&param, ":");
> +	if (!obj)
>  		return -EINVAL;
>  
> -	field = trace_find_event_field(file->event_call, arg);
> +	str = strchr(obj, ',');
> +	if (!str)
> +		offset = 0;
> +	else {
> +		*str++ = '\0';
> +		ret = kstrtol(str, 0, &offset);
> +		if (ret)
> +			return -EINVAL;
> +	}
> +	def_type_size = sizeof(void *);
> +	if (!param) {
> +		obj_value_type_size = def_type_size;
> +		goto skip_get_type;
> +	}
> +	tmp_saved_param = param;
> +	type = strsep(&param, ":");
> +	if (!type)
> +		obj_value_type_size = def_type_size;
> +	/* if this is the trigger count */
> +	else if (isdigit(type[0])) {
> +		obj_value_type_size = def_type_size;
> +		param = tmp_saved_param;
> +	} else {
> +		for (i = 0; objtrace_fetch_types[i].name; i++) {
> +			if (strcmp(objtrace_fetch_types[i].name, type) == 0) {
> +				obj_value_type_size = objtrace_fetch_types[i].type_size;
> +				break;
> +			}
> +		}
> +	}
> +	if (!obj_value_type_size)
> +		return -EINVAL;
> +skip_get_type:
> +	field = trace_find_event_field(file->event_call, obj);
>  	if (!field)
>  		return -EINVAL;
>  
> @@ -345,6 +504,8 @@ event_object_trigger_parse(struct event_command *cmd_ops,
>  		return -ENOMEM;
>  
>  	obj_data->field = field;
> +	obj_data->obj_offset = offset;
> +	obj_data->obj_value_type_size = obj_value_type_size;
>  	obj_data->tr = file->tr;
>  	snprintf(obj_data->objtrace_cmd, OBJTRACE_CMD_LEN, objtrace_cmd);
>  
> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index 34ff7b4dc521..a45871b52dcc 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -1561,6 +1561,7 @@ static enum print_line_t trace_object_print(struct trace_iterator *iter, int fla
>  	trace_assign_type(field, iter->ent);
>  	print_fn_trace(s, field->ip, field->parent_ip, flags);
>  	trace_seq_printf(s, " object:0x%lx", field->object);
> +	trace_seq_printf(s, " value:0x%lx", field->value);
>  	trace_seq_putc(s, '\n');
>  
>  	return trace_handle_return(s);
> @@ -1573,9 +1574,8 @@ static enum print_line_t trace_object_raw(struct trace_iterator *iter, int flags
>  
>  	trace_assign_type(field, iter->ent);
>  
> -	trace_seq_printf(&iter->seq, "%lx %lx\n",
> -			 field->ip,
> -			 field->parent_ip);
> +	trace_seq_printf(&iter->seq, "%lx %lx %lx %lx\n", field->ip,
> +			field->parent_ip, field->object, field->value);
>  
>  	return trace_handle_return(&iter->seq);
>  }


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v15 2/4] trace/objtrace: Get the value of the object
  2022-09-24 20:01   ` Steven Rostedt
@ 2022-09-25  5:27     ` Jeff Xie
  2022-09-25 15:53       ` Steven Rostedt
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff Xie @ 2022-09-25  5:27 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000

Hi Steve,

On Sun, Sep 25, 2022 at 4:01 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 19 Aug 2022 11:27:04 +0800
> Jeff Xie <xiehuan09@gmail.com> wrote:
>
> Hi Jeff,
>
> I finally (sorry about the long wait) got a chance to take a look at
> this code and I really like it a lot! and I want to get this into the
> next merge window.
>

It's so exciting for me to hear this ;-) , and I really appreciate you
looking at this series patch.

> I have one issue with it though, and that's with the syntax.
>
> > Using objtrace trigger to get the value of the object which from the kernel
> > function parameter.
> >
> > Syntax:
> >       objtrace:add:obj[,offset][:type][:count][if <filter>]
>
> I'm thinking instead of using the above syntax that is new, instead use
> the syntax that is used by kprobes, eprobes and uprobes. That is:
>
>   objtrace:+offset(obj):type
>
> That is, instead of:
>
>   objtrace:add:arg1,0x28:u32:1
>
> have:
>
>   objtrace:+0x28(arg1):u32
>
> Perhaps we can add for count for greater than 1:
>
>   obtrace:+0x28(arg1):u32[2]
>
> for two items.
>
> Then we could do even more complex analysis where we can dereference a
> pointer within a structure to another pointer:
>
>   obtrace:+0x16(+0x28(arg1)):u32[2]
>
> Which will look at arg1, add 0x28 to it. dereference that location,
> then add 0x16 to the value, and then dereference that location as well.

Agree with your idea, It does seem like it's more powerful to
dereference pointers.

It also looks like the "add" command field has been removed,
Masami previously suggested I to add a reserved command field for the
future (see [1] and [2]):

[1]:
https://lore.kernel.org/lkml/20211026225234.549ec8e9eb59f1fd1671edbc@kernel.org/

[2]:
https://lore.kernel.org/lkml/20211105004704.3f4aa456c2c537a56f64e65a@kernel.org/


> This code is available in the kprobe code that eprobes also uses:
>
> See process_fetch_insn() in kernel/trace/trace_eprobe.c
>
> and the parsing of the string is in kernel/trace/trace_probe.c:
>
>   parse_probe_arg()
>

Thanks for your guidance, but the next merge window is coming soon,
and since I'm not doing full-time kernel development,
I can't guarantee that the development and testing of this change will
be completed before the next window merge.

 Anyone familiar with this code providing a patch is welcome ;-)

>
> I think doing this will make it much more extensive, not to mention it
> will match the syntax of other code in the tracing infrastructure.

I super agree.

> What do you think?
>
> -- Steve
>
> >
> > Usage:
> >       # echo 'p bio_add_page arg1=$arg1' > ./kprobe_events
> >       # gdb vmlinux
> >       (gdb) p &(((struct bio *)0)->bi_iter.bi_size)
> >       $1 = (unsigned int *) 0x28
> >       # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/ \
> >                p_bio_add_page_0/trigger
> >       # cat /test.txt
> >
> > Signed-off-by: Jeff Xie <xiehuan09@gmail.com>
> > Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > ---
> >  kernel/trace/trace.c         |   2 +-
> >  kernel/trace/trace_entries.h |   5 +-
> >  kernel/trace/trace_object.c  | 187 ++++++++++++++++++++++++++++++++---
> >  kernel/trace/trace_output.c  |   6 +-
> >  4 files changed, 181 insertions(+), 19 deletions(-)
> >
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index 2c2477dea0f2..c2ffc2235b7b 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -5622,7 +5622,7 @@ static const char readme_msg[] =
> >       "\t            disable_hist:<system>:<event>\n"
> >  #endif
> >  #ifdef CONFIG_TRACE_OBJECT
> > -     "\t            objtrace:add:obj[:count][if <filter>]\n"
> > +     "\t            objtrace:add:obj[,offset][:type][:count][if <filter>]\n"
> >  #endif
> >  #ifdef CONFIG_STACKTRACE
> >       "\t\t    stacktrace\n"
> > diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> > index bb120d9498a9..2407c45a568c 100644
> > --- a/kernel/trace/trace_entries.h
> > +++ b/kernel/trace/trace_entries.h
> > @@ -413,8 +413,9 @@ FTRACE_ENTRY(object, trace_object_entry,
> >               __field(        unsigned long,          ip              )
> >               __field(        unsigned long,          parent_ip       )
> >               __field(        unsigned long,          object          )
> > +             __field(        unsigned long,          value           )
> >       ),
> >
> > -     F_printk(" %ps <-- %ps object:%lx\n",
> > -              (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
> > +     F_printk(" %ps <-- %ps object:%lx value:%lx\n", (void *)__entry->ip,
> > +            (void *)__entry->parent_ip, __entry->object, __entry->value)
> >  );
> > diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
> > index 19ec4b1c0186..611341abeb7c 100644
> > --- a/kernel/trace/trace_object.c
> > +++ b/kernel/trace/trace_object.c
> > @@ -16,8 +16,14 @@ static const int max_args_num = 6;
> >  static void exit_trace_object(struct trace_array *tr);
> >  static int init_trace_object(struct trace_array *tr);
> >
> > +/*
> > + * get the offset from the special object and
> > + * the type size of the value
> > + */
> >  struct object_instance {
> >       void *obj;
> > +     int obj_offset;
> > +     int obj_value_type_size;
> >       struct trace_array *tr;
> >  };
> >
> > @@ -25,9 +31,23 @@ struct object_instance {
> >  struct objtrace_trigger_data {
> >       struct ftrace_event_field *field;
> >       char objtrace_cmd[OBJTRACE_CMD_LEN];
> > +     int obj_offset;
> > +     int obj_value_type_size;
> >       struct trace_array *tr;
> >  };
> >
> > +/* get the type size for the special object */
> > +struct objtrace_fetch_type {
> > +     char *name;
> > +     int type_size;
> > +};
> > +
> > +enum objattr {
> > +     OBJ_OFFSET,
> > +     OBJ_VAL_TYPE_SIZE,
> > +     MAX_OBJ_ATTR
> > +};
> > +
> >  /* objtrace data with fops and objtrace_instances */
> >  struct objtrace_data {
> >       struct list_head head;
> > @@ -67,6 +87,35 @@ static bool object_exist(void *obj, struct trace_array *tr)
> >       return false;
> >  }
> >
> > +static int get_object_attr(void *obj, int objattr,
> > +             struct trace_array *tr, int *result)
> > +{
> > +     int i, max;
> > +     struct objtrace_data *obj_data;
> > +
> > +     obj_data = get_obj_data(tr);
> > +     if (!obj_data)
> > +             return -EINVAL;
> > +
> > +     max = READ_ONCE(obj_data->num_traced_obj);
> > +     smp_rmb();
> > +     for (i = 0; i < max; i++) {
> > +             if (obj_data->traced_obj[i].obj == obj) {
> > +                     switch (objattr) {
> > +                     case OBJ_OFFSET:
> > +                             *result = obj_data->traced_obj[i].obj_offset;
> > +                             return 0;
> > +                     case OBJ_VAL_TYPE_SIZE:
> > +                             *result = obj_data->traced_obj[i].obj_value_type_size;
> > +                             return 0;
> > +                     default:
> > +                             return -EINVAL;
> > +                     }
> > +             }
> > +     }
> > +     return -EINVAL;
> > +}
> > +
> >  static bool object_empty(struct trace_array *tr)
> >  {
> >       struct objtrace_data *obj_data;
> > @@ -78,7 +127,8 @@ static bool object_empty(struct trace_array *tr)
> >       return !READ_ONCE(obj_data->num_traced_obj);
> >  }
> >
> > -static void set_trace_object(void *obj, struct trace_array *tr)
> > +static void set_trace_object(void *obj, int obj_offset,
> > +                     int obj_value_type_size, struct trace_array *tr)
> >  {
> >       unsigned long flags;
> >       struct object_instance *obj_ins;
> > @@ -103,6 +153,8 @@ static void set_trace_object(void *obj, struct trace_array *tr)
> >       }
> >       obj_ins = &obj_data->traced_obj[READ_ONCE(obj_data->num_traced_obj)];
> >       obj_ins->obj = obj;
> > +     obj_ins->obj_value_type_size = obj_value_type_size;
> > +     obj_ins->obj_offset = obj_offset;
> >       obj_ins->tr = tr;
> >       /* make sure the num_traced_obj update always appears after traced_obj update */
> >       smp_wmb();
> > @@ -112,7 +164,7 @@ static void set_trace_object(void *obj, struct trace_array *tr)
> >  }
> >
> >  static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> > -                              unsigned long object, struct trace_array *tr)
> > +             unsigned long object, unsigned long value, struct trace_array *tr)
> >  {
> >
> >       struct trace_buffer *buffer = tr->array_buffer.buffer;
> > @@ -129,10 +181,43 @@ static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> >       entry->ip                       = ip;
> >       entry->parent_ip                = parent_ip;
> >       entry->object                   = object;
> > +     entry->value                    = value;
> >
> >       trace_buffer_unlock_commit(tr, buffer, event, trace_ctx);
> >  }
> >
> > +static inline long get_object_value(unsigned long *val, void *obj, int type_size)
> > +{
> > +     char tmp[sizeof(u64)];
> > +     long ret = 0;
> > +
> > +     ret = copy_from_kernel_nofault(tmp, obj, sizeof(tmp));
> > +     if (ret)
> > +             return ret;
> > +     switch (type_size) {
> > +     case 1: {
> > +             *val = (unsigned long)*(u8 *)tmp;
> > +             break;
> > +     }
> > +     case 2: {
> > +             *val = (unsigned long)*(u16 *)tmp;
> > +             break;
> > +     }
> > +     case 4: {
> > +             *val = (unsigned long)*(u32 *)tmp;
> > +             break;
> > +     }
> > +     case 8: {
> > +             *val = (unsigned long)*(u64 *)tmp;
> > +             break;
> > +     }
> > +     default:
> > +             return -EINVAL;
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> >  static void
> >  trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> >               struct ftrace_ops *op, struct ftrace_regs *fregs)
> > @@ -140,7 +225,8 @@ trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> >       struct pt_regs *pt_regs = ftrace_get_regs(fregs);
> >       struct trace_array *tr = op->private;
> >       struct trace_array_cpu *data;
> > -     unsigned long obj;
> > +     int ret, val_type_size, obj_offset;
> > +     unsigned long obj, val;
> >       long disabled;
> >       int cpu, n;
> >
> > @@ -154,8 +240,21 @@ trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> >                       goto out;
> >               for (n = 0; n < max_args_num; n++) {
> >                       obj = regs_get_kernel_argument(pt_regs, n);
> > -                     if (object_exist((void *)obj, tr))
> > -                             submit_trace_object(ip, parent_ip, obj, tr);
> > +                     if (object_exist((void *)obj, tr)) {
> > +                             ret = get_object_attr((void *)obj, OBJ_OFFSET,
> > +                                             tr, &obj_offset);
> > +                             if (unlikely(ret) < 0)
> > +                                     goto out;
> > +                             ret = get_object_attr((void *)obj, OBJ_VAL_TYPE_SIZE,
> > +                                             tr, &val_type_size);
> > +                             if (unlikely(ret) < 0)
> > +                                     goto out;
> > +                             if (get_object_value(&val, (void *)(obj + obj_offset),
> > +                                                     val_type_size))
> > +                                     continue;
> > +
> > +                             submit_trace_object(ip, parent_ip, obj, val, tr);
> > +                     }
> >               /* The parameters of a function may match multiple objects */
> >               }
> >       }
> > @@ -176,9 +275,23 @@ trace_object_trigger(struct event_trigger_data *data,
> >
> >       field = obj_data->field;
> >       memcpy(&obj, rec + field->offset, sizeof(obj));
> > -     set_trace_object(obj, tr);
> > +     /* set the offset from the special object and the type size of the value*/
> > +     set_trace_object(obj, obj_data->obj_offset,
> > +                     obj_data->obj_value_type_size, tr);
> >  }
> >
> > +static const struct objtrace_fetch_type objtrace_fetch_types[] = {
> > +     {"u8", 1},
> > +     {"x8", 1},
> > +     {"u16", 2},
> > +     {"x16", 2},
> > +     {"u32", 4},
> > +     {"x32", 4},
> > +     {"u64", 8},
> > +     {"x64", 8},
> > +     {NULL, 0},
> > +};
> > +
> >  static void
> >  trace_object_trigger_free(struct event_trigger_data *data)
> >  {
> > @@ -213,14 +326,25 @@ static int
> >  event_trigger_print(const char *name, struct seq_file *m,
> >               void *data, char *filter_str, void *objtrace_data)
> >  {
> > +     int i;
> >       long count = (long)data;
> >       struct objtrace_trigger_data *obj_data = objtrace_data;
> > +     const char *value_type_name;
> >
> >       seq_puts(m, name);
> >
> >       seq_printf(m, ":%s", obj_data->objtrace_cmd);
> >       seq_printf(m, ":%s", obj_data->field->name);
> > +     if (obj_data->obj_offset)
> > +             seq_printf(m, ",0x%x", obj_data->obj_offset);
> >
> > +     for (i = 0; objtrace_fetch_types[i].name; i++) {
> > +             if (objtrace_fetch_types[i].type_size == obj_data->obj_value_type_size) {
> > +                     value_type_name = objtrace_fetch_types[i].name;
> > +                     break;
> > +             }
> > +     }
> > +     seq_printf(m, ":%s", value_type_name);
> >       if (count == -1)
> >               seq_puts(m, ":unlimited");
> >       else
> > @@ -303,16 +427,18 @@ event_object_trigger_parse(struct event_command *cmd_ops,
> >       struct event_trigger_data *trigger_data;
> >       struct objtrace_trigger_data *obj_data;
> >       struct ftrace_event_field *field;
> > -     char *objtrace_cmd, *arg;
> > -     char *param, *filter;
> > -     int ret;
> > +     char *objtrace_cmd, *obj;
> > +     char *param, *filter, *str, *type;
> > +     int ret, i, def_type_size, obj_value_type_size = 0;
> > +     char *tmp_saved_param;
> > +     long offset = 0;
> >       bool remove;
> >
> >       remove = event_trigger_check_remove(glob);
> >
> >       /*
> >        * separate the param and the filter:
> > -      * objtrace:add:OBJ[:COUNT] [if filter]
> > +      * objtrace:add:OBJ[,OFFS][:TYPE][:COUNT] [if filter]
> >        */
> >       ret = event_trigger_separate_filter(param_and_filter, &param, &filter, true);
> >       if (ret)
> > @@ -324,11 +450,44 @@ event_object_trigger_parse(struct event_command *cmd_ops,
> >               return -EINVAL;
> >       }
> >
> > -     arg = strsep(&param, ":");
> > -     if (!arg)
> > +     obj = strsep(&param, ":");
> > +     if (!obj)
> >               return -EINVAL;
> >
> > -     field = trace_find_event_field(file->event_call, arg);
> > +     str = strchr(obj, ',');
> > +     if (!str)
> > +             offset = 0;
> > +     else {
> > +             *str++ = '\0';
> > +             ret = kstrtol(str, 0, &offset);
> > +             if (ret)
> > +                     return -EINVAL;
> > +     }
> > +     def_type_size = sizeof(void *);
> > +     if (!param) {
> > +             obj_value_type_size = def_type_size;
> > +             goto skip_get_type;
> > +     }
> > +     tmp_saved_param = param;
> > +     type = strsep(&param, ":");
> > +     if (!type)
> > +             obj_value_type_size = def_type_size;
> > +     /* if this is the trigger count */
> > +     else if (isdigit(type[0])) {
> > +             obj_value_type_size = def_type_size;
> > +             param = tmp_saved_param;
> > +     } else {
> > +             for (i = 0; objtrace_fetch_types[i].name; i++) {
> > +                     if (strcmp(objtrace_fetch_types[i].name, type) == 0) {
> > +                             obj_value_type_size = objtrace_fetch_types[i].type_size;
> > +                             break;
> > +                     }
> > +             }
> > +     }
> > +     if (!obj_value_type_size)
> > +             return -EINVAL;
> > +skip_get_type:
> > +     field = trace_find_event_field(file->event_call, obj);
> >       if (!field)
> >               return -EINVAL;
> >
> > @@ -345,6 +504,8 @@ event_object_trigger_parse(struct event_command *cmd_ops,
> >               return -ENOMEM;
> >
> >       obj_data->field = field;
> > +     obj_data->obj_offset = offset;
> > +     obj_data->obj_value_type_size = obj_value_type_size;
> >       obj_data->tr = file->tr;
> >       snprintf(obj_data->objtrace_cmd, OBJTRACE_CMD_LEN, objtrace_cmd);
> >
> > diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> > index 34ff7b4dc521..a45871b52dcc 100644
> > --- a/kernel/trace/trace_output.c
> > +++ b/kernel/trace/trace_output.c
> > @@ -1561,6 +1561,7 @@ static enum print_line_t trace_object_print(struct trace_iterator *iter, int fla
> >       trace_assign_type(field, iter->ent);
> >       print_fn_trace(s, field->ip, field->parent_ip, flags);
> >       trace_seq_printf(s, " object:0x%lx", field->object);
> > +     trace_seq_printf(s, " value:0x%lx", field->value);
> >       trace_seq_putc(s, '\n');
> >
> >       return trace_handle_return(s);
> > @@ -1573,9 +1574,8 @@ static enum print_line_t trace_object_raw(struct trace_iterator *iter, int flags
> >
> >       trace_assign_type(field, iter->ent);
> >
> > -     trace_seq_printf(&iter->seq, "%lx %lx\n",
> > -                      field->ip,
> > -                      field->parent_ip);
> > +     trace_seq_printf(&iter->seq, "%lx %lx %lx %lx\n", field->ip,
> > +                     field->parent_ip, field->object, field->value);
> >
> >       return trace_handle_return(&iter->seq);
> >  }
>

-- 
Thanks,
JeffXie

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v15 2/4] trace/objtrace: Get the value of the object
  2022-09-25  5:27     ` Jeff Xie
@ 2022-09-25 15:53       ` Steven Rostedt
  2022-09-25 16:49         ` Jeff Xie
  0 siblings, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2022-09-25 15:53 UTC (permalink / raw)
  To: Jeff Xie; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000

On Sun, 25 Sep 2022 13:27:10 +0800
Jeff Xie <xiehuan09@gmail.com> wrote:

> 
> It also looks like the "add" command field has been removed,
> Masami previously suggested I to add a reserved command field for the
> future (see [1] and [2]):
> 
> [1]:
> https://lore.kernel.org/lkml/20211026225234.549ec8e9eb59f1fd1671edbc@kernel.org/
> 
> [2]:
> https://lore.kernel.org/lkml/20211105004704.3f4aa456c2c537a56f64e65a@kernel.org/

Ah, I see. add means enable the tracing and del means disable.

Masami, perhaps we should call it "enable/disable" instead? Because, to
me "del" means to remove it (permanently), where as "enable/disable"
matches other triggers like the "enable/disable event" that starts and
stops another event.

> 
> 
> > This code is available in the kprobe code that eprobes also uses:
> >
> > See process_fetch_insn() in kernel/trace/trace_eprobe.c
> >
> > and the parsing of the string is in kernel/trace/trace_probe.c:
> >
> >   parse_probe_arg()
> >  
> 
> Thanks for your guidance, but the next merge window is coming soon,
> and since I'm not doing full-time kernel development,
> I can't guarantee that the development and testing of this change will
> be completed before the next window merge.
> 
>  Anyone familiar with this code providing a patch is welcome ;-)

If I get a chance, I may see if I can quickly implement it.

> 
> >
> > I think doing this will make it much more extensive, not to mention it
> > will match the syntax of other code in the tracing infrastructure.  
> 
> I super agree.

Thanks for you patience, and I'm sorry it took so long for me to review
it.

-- Steve

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v15 2/4] trace/objtrace: Get the value of the object
  2022-09-25 15:53       ` Steven Rostedt
@ 2022-09-25 16:49         ` Jeff Xie
  2022-09-28 14:15           ` Jeff Xie
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff Xie @ 2022-09-25 16:49 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000

On Sun, Sep 25, 2022 at 11:53 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Sun, 25 Sep 2022 13:27:10 +0800
> Jeff Xie <xiehuan09@gmail.com> wrote:
>
> >
> > It also looks like the "add" command field has been removed,
> > Masami previously suggested I to add a reserved command field for the
> > future (see [1] and [2]):
> >
> > [1]:
> > https://lore.kernel.org/lkml/20211026225234.549ec8e9eb59f1fd1671edbc@kernel.org/
> >
> > [2]:
> > https://lore.kernel.org/lkml/20211105004704.3f4aa456c2c537a56f64e65a@kernel.org/
>
> Ah, I see. add means enable the tracing and del means disable.
>
> Masami, perhaps we should call it "enable/disable" instead? Because, to
> me "del" means to remove it (permanently), where as "enable/disable"
> matches other triggers like the "enable/disable event" that starts and
> stops another event.
>
> >
> >
> > > This code is available in the kprobe code that eprobes also uses:
> > >
> > > See process_fetch_insn() in kernel/trace/trace_eprobe.c
> > >
> > > and the parsing of the string is in kernel/trace/trace_probe.c:
> > >
> > >   parse_probe_arg()
> > >
> >
> > Thanks for your guidance, but the next merge window is coming soon,
> > and since I'm not doing full-time kernel development,
> > I can't guarantee that the development and testing of this change will
> > be completed before the next window merge.
> >
> >  Anyone familiar with this code providing a patch is welcome ;-)
>
> If I get a chance, I may see if I can quickly implement it.

Thanks a lot ;-)

>
> >
> > >
> > > I think doing this will make it much more extensive, not to mention it
> > > will match the syntax of other code in the tracing infrastructure.
> >
> > I super agree.
>
> Thanks for you patience, and I'm sorry it took so long for me to review
> it.

Nevermind, I've always been a fan of ftrace.

>
> -- Steve



-- 
Thanks,
JeffXie

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v15 2/4] trace/objtrace: Get the value of the object
  2022-09-25 16:49         ` Jeff Xie
@ 2022-09-28 14:15           ` Jeff Xie
  2022-09-28 14:40             ` Steven Rostedt
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff Xie @ 2022-09-28 14:15 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000

On Mon, Sep 26, 2022 at 12:49 AM Jeff Xie <xiehuan09@gmail.com> wrote:
>
> On Sun, Sep 25, 2022 at 11:53 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Sun, 25 Sep 2022 13:27:10 +0800
> > Jeff Xie <xiehuan09@gmail.com> wrote:
> >
> > >
> > > It also looks like the "add" command field has been removed,
> > > Masami previously suggested I to add a reserved command field for the
> > > future (see [1] and [2]):
> > >
> > > [1]:
> > > https://lore.kernel.org/lkml/20211026225234.549ec8e9eb59f1fd1671edbc@kernel.org/
> > >
> > > [2]:
> > > https://lore.kernel.org/lkml/20211105004704.3f4aa456c2c537a56f64e65a@kernel.org/
> >
> > Ah, I see. add means enable the tracing and del means disable.
> >
> > Masami, perhaps we should call it "enable/disable" instead? Because, to
> > me "del" means to remove it (permanently), where as "enable/disable"
> > matches other triggers like the "enable/disable event" that starts and
> > stops another event.
> >
> > >
> > >
> > > > This code is available in the kprobe code that eprobes also uses:
> > > >
> > > > See process_fetch_insn() in kernel/trace/trace_eprobe.c
> > > >
> > > > and the parsing of the string is in kernel/trace/trace_probe.c:
> > > >
> > > >   parse_probe_arg()

I guess I won't be able to finish it until the next merge window, I've
been trying it for a few days
In my spare time recently, with no progress, it looks like it's really
more complicated to implement ;-)


> > >
> > > Thanks for your guidance, but the next merge window is coming soon,
> > > and since I'm not doing full-time kernel development,
> > > I can't guarantee that the development and testing of this change will
> > > be completed before the next window merge.
> > >
> > >  Anyone familiar with this code providing a patch is welcome ;-)
> >
> > If I get a chance, I may see if I can quickly implement it.
>
> Thanks a lot ;-)
>
> >
> > >
> > > >
> > > > I think doing this will make it much more extensive, not to mention it
> > > > will match the syntax of other code in the tracing infrastructure.
> > >
> > > I super agree.
> >
> > Thanks for you patience, and I'm sorry it took so long for me to review
> > it.
>
> Nevermind, I've always been a fan of ftrace.
>
> >
> > -- Steve
>
>
>
> --
> Thanks,
> JeffXie


-- 
Thanks,
JeffXie

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v15 2/4] trace/objtrace: Get the value of the object
  2022-09-28 14:15           ` Jeff Xie
@ 2022-09-28 14:40             ` Steven Rostedt
  0 siblings, 0 replies; 21+ messages in thread
From: Steven Rostedt @ 2022-09-28 14:40 UTC (permalink / raw)
  To: Jeff Xie; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000

On Wed, 28 Sep 2022 22:15:24 +0800
Jeff Xie <xiehuan09@gmail.com> wrote:

> > > > > This code is available in the kprobe code that eprobes also uses:
> > > > >
> > > > > See process_fetch_insn() in kernel/trace/trace_eprobe.c
> > > > >
> > > > > and the parsing of the string is in kernel/trace/trace_probe.c:
> > > > >
> > > > >   parse_probe_arg()  
> 
> I guess I won't be able to finish it until the next merge window, I've
> been trying it for a few days

Yeah, the merge window is expected to open next week, so it likely will not
be able to make that. But hopefully the next window it will!

> In my spare time recently, with no progress, it looks like it's really
> more complicated to implement ;-)

Yeah, I was playing with it too. It is complicated, but I do not see any
show stoppers from getting it working.

-- Steve

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object
  2022-08-19  3:27 [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
                   ` (3 preceding siblings ...)
  2022-08-19  3:27 ` [PATCH v15 4/4] Documentation: trace/objtrace: Add documentation " Jeff Xie
@ 2022-10-17 19:40 ` Steven Rostedt
  2022-10-18  7:01   ` Jeff Xie
  2022-11-29 16:09   ` Jeff Xie
  4 siblings, 2 replies; 21+ messages in thread
From: Steven Rostedt @ 2022-10-17 19:40 UTC (permalink / raw)
  To: Jeff Xie; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000

On Fri, 19 Aug 2022 11:27:02 +0800
Jeff Xie <xiehuan09@gmail.com> wrote:

So on my way home from London, I tried to update it to use the fetch
instruction logic of kprobes (which will work), but I also ended up pretty
much rewriting most of it, which then I realized, I'm going much further
than I wanted to.

But know I have a much better idea of how this should work, and I'll try to
go through what I learned here and throughout the patches.


> Introduce a method based on function tracer to trace any object and get
> the value of the object dynamically. the object can be obtained from the
> dynamic event (kprobe_event/uprobe_event) or the static event(tracepoint).

Note, I can see the value of tracing only the value of the object, but that
to me would be "objfollow" not "objtrace". That is, to find out what
functions are calling the object. I guess that may have been your original
objective, but it was articulated more about knowing how the value changed
and not where the value is going (and the code was written more like that).

Having a objfollow may be nice, but reading the arguments of a function is
really a "best attempt" at most, and you can't really trust the arguments
are what you are tracing. I would hold off on that until we have a good BTF
tracing infrastructure in the function tracer.

So I'm going to continue with "objtrace" which is about tracing the
object's value throughout time.

> 
> Usage:
> When using the kprobe event, only need to set the objtrace(a new trigger),
> we can get the value of the object. The object is from the setting of the 
> kprobe event.
> 
> For example:
> For the function bio_add_page():
> 
> int bio_add_page(struct bio *bio, struct page *page,
> 	unsigned int len, unsigned int offset)
> 
> Firstly, we can set the base of the object, thus the first string "arg1"
> stands for the value of the first parameter of this function bio_add_gage(),
> 
> # echo 'p bio_add_page arg1=$arg1' > ./kprobe_events
> 
> Secondly, we can get the value dynamically based on above object. 
> 
> find the offset of the bi_size in struct bio:
> $ gdb vmlinux
> (gdb) p &(((struct bio *)0)->bi_iter.bi_size)
> $1 = (unsigned int *) 0x28
> 
> # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/ \
> 	p_bio_add_page_0/trigger
> 
> # cd /sys/kernel/debug/tracing/
> # echo 'p bio_add_page arg1=$arg1' > ./kprobe_events
> # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger

So I was working to change the above to:

  echo 'objtrace:add:+0x28($arg1):u32:1 if comm == "cat"'



> 
> # du -sh /test.txt
> 12.0K   /test.txt
> 
> # cat  /test.txt > /dev/null
> # cat ./trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 128/128   #P:4
> #
> #                                _-----=> irqs-off/BH-disabled
> #                               / _----=> need-resched
> #                              | / _---=> hardirq/softirq
> #                              || / _--=> preempt-depth
> #                              ||| / _-=> migrate-disable
> #                              |||| /     delay
> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> #              | |         |   |||||     |         |
>              cat-117     [002] ...1.     1.602243: __bio_try_merge_page <-bio_add_page object:0xffff88811bee4000 value:0x0
>              cat-117     [002] ...1.     1.602244: __bio_add_page <-bio_add_page object:0xffff88811bee4000 value:0x0
>              cat-117     [002] ...2.     1.602244: bio_add_page <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x1000
>              cat-117     [002] ...1.     1.602245: __bio_try_merge_page <-bio_add_page object:0xffff88811bee4000 value:0x1000
>              cat-117     [002] ...1.     1.602245: __bio_add_page <-bio_add_page object:0xffff88811bee4000 value:0x1000

Now instead of searching the function arguments for "obj", we should just
read the obj value at every function and report when it changed, as well as
the last function it was still the same. The report will be like the
"func_repeats" entry. And instead we can print when the object changes,
something like this:

              cat-117     [002] ...2.     1.602245: bio_add_page <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x2000 (last value:0x1000 at [002] __bio_add_page <-bio_add_page ts: 1.602245)

This will tell us when a value changed.

It would need to store the CPU, ip and pip and timestamp of the last time
it was the same. And when it changes, it would then report it.

-- Steve

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v15 1/4] trace: Add trace any kernel object
  2022-08-19  3:27 ` [PATCH v15 1/4] trace: Add trace any " Jeff Xie
@ 2022-10-17 19:58   ` Steven Rostedt
  2022-10-18  8:35     ` Jeff Xie
  0 siblings, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2022-10-17 19:58 UTC (permalink / raw)
  To: Jeff Xie; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000

On Fri, 19 Aug 2022 11:27:03 +0800
Jeff Xie <xiehuan09@gmail.com> wrote:


> index 27febd4ee33e..2c2477dea0f2 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -5621,6 +5621,9 @@ static const char readme_msg[] =
>  	"\t            enable_hist:<system>:<event>\n"
>  	"\t            disable_hist:<system>:<event>\n"
>  #endif
> +#ifdef CONFIG_TRACE_OBJECT
> +	"\t            objtrace:add:obj[:count][if <filter>]\n"
> +#endif
>  #ifdef CONFIG_STACKTRACE
>  	"\t\t    stacktrace\n"
>  #endif
> @@ -9261,6 +9264,9 @@ static struct trace_array *trace_array_create(const char *name)
>  	if (ftrace_allocate_ftrace_ops(tr) < 0)
>  		goto out_free_tr;
>  
> +	if (allocate_objtrace_data(tr) < 0)
> +		goto out_free_tr;

When I was rewriting this, I found there's no reason to associate a objdata
with a tr. It should all be self contained (per event file). The objdata
can have a tr, but the tr doesn't care about the objdata.

> +
>  	ftrace_init_trace_array(tr);
>  
>  	init_trace_flags_index(tr);
> @@ -9280,6 +9286,7 @@ static struct trace_array *trace_array_create(const char *name)
>  
>   out_free_tr:
>  	ftrace_free_ftrace_ops(tr);
> +	free_objtrace_data(tr);
>  	free_trace_buffers(tr);
>  	free_cpumask_var(tr->tracing_cpumask);
>  	kfree(tr->name);
> @@ -9373,6 +9380,7 @@ static int __remove_instance(struct trace_array *tr)
>  	event_trace_del_tracer(tr);
>  	ftrace_clear_pids(tr);
>  	ftrace_destroy_function_files(tr);
> +	free_objtrace_data(tr);
>  	tracefs_remove(tr->dir);
>  	free_percpu(tr->last_func_repeats);
>  	free_trace_buffers(tr);
> @@ -10110,6 +10118,9 @@ __init static int tracer_alloc_buffers(void)
>  		goto out_free_savedcmd;
>  	}
>  
> +	if (allocate_objtrace_data(&global_trace))
> +		goto out_free_savedcmd;
> +
>  	if (global_trace.buffer_disabled)
>  		tracing_off();
>  
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 900e75d96c84..8da1ef78a09a 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -54,6 +54,7 @@ enum trace_type {
>  	TRACE_TIMERLAT,
>  	TRACE_RAW_DATA,
>  	TRACE_FUNC_REPEATS,
> +	TRACE_OBJECT,
>  
>  	__TRACE_LAST_TYPE,
>  };
> @@ -363,6 +364,9 @@ struct trace_array {
>  	cpumask_var_t		tracing_cpumask; /* only trace on set CPUs */
>  	int			ref;
>  	int			trace_ref;
> +#ifdef CONFIG_TRACE_OBJECT
> +	struct objtrace_data	*obj_data;
> +#endif

That means the above is not needed.

>  #ifdef CONFIG_FUNCTION_TRACER
>  	struct ftrace_ops	*ops;
>  	struct trace_pid_list	__rcu *function_pids;
> @@ -471,6 +475,7 @@ extern void __ftrace_bad_type(void);
>  			  TRACE_GRAPH_RET);		\
>  		IF_ASSIGN(var, ent, struct func_repeats_entry,		\
>  			  TRACE_FUNC_REPEATS);				\
> +		IF_ASSIGN(var, ent, struct trace_object_entry, TRACE_OBJECT);\
>  		__ftrace_bad_type();					\
>  	} while (0)
>  
> @@ -1536,6 +1541,16 @@ static inline int register_trigger_hist_cmd(void) { return 0; }
>  static inline int register_trigger_hist_enable_disable_cmds(void) { return 0; }
>  #endif
>  
> +#ifdef CONFIG_TRACE_OBJECT
> +extern int register_trigger_object_cmd(void);
> +extern int allocate_objtrace_data(struct trace_array *tr);
> +extern void free_objtrace_data(struct trace_array *tr);
> +#else
> +static inline int register_trigger_object_cmd(void) { return 0; }
> +static inline int allocate_objtrace_data(struct trace_array *tr) { return 0; }
> +static inline void free_objtrace_data(struct trace_array *tr) { };
> +#endif
> +
>  extern int register_trigger_cmds(void);
>  extern void clear_event_triggers(struct trace_array *tr);
>  
> @@ -1611,6 +1626,12 @@ extern bool event_trigger_check_remove(const char *glob);
>  extern bool event_trigger_empty_param(const char *param);
>  extern int event_trigger_separate_filter(char *param_and_filter, char **param,
>  					 char **filter, bool param_required);
> +extern int register_trigger(char *glob,
> +			struct event_trigger_data *data,
> +			struct trace_event_file *file);
> +extern void unregister_trigger(char *glob,
> +			struct event_trigger_data *test,
> +			struct trace_event_file *file);
>  extern struct event_trigger_data *
>  event_trigger_alloc(struct event_command *cmd_ops,
>  		    char *cmd,
> diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> index cd41e863b51c..bb120d9498a9 100644
> --- a/kernel/trace/trace_entries.h
> +++ b/kernel/trace/trace_entries.h
> @@ -401,3 +401,20 @@ FTRACE_ENTRY(timerlat, timerlat_entry,
>  		 __entry->context,
>  		 __entry->timer_latency)
>  );
> +
> +/*
> + * trace object entry:
> + */
> +FTRACE_ENTRY(object, trace_object_entry,
> +
> +	TRACE_OBJECT,
> +
> +	F_STRUCT(
> +		__field(	unsigned long,		ip		)
> +		__field(	unsigned long,		parent_ip	)
> +		__field(	unsigned long,		object		)
> +	),
> +
> +	F_printk(" %ps <-- %ps object:%lx\n",
> +		 (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
> +);
> diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> index cb866c3141af..8bad4eb3b997 100644
> --- a/kernel/trace/trace_events_trigger.c
> +++ b/kernel/trace/trace_events_trigger.c
> @@ -560,7 +560,7 @@ void update_cond_flag(struct trace_event_file *file)
>   *
>   * Return: 0 on success, errno otherwise
>   */
> -static int register_trigger(char *glob,
> +int register_trigger(char *glob,
>  			    struct event_trigger_data *data,
>  			    struct trace_event_file *file)
>  {
> @@ -605,7 +605,7 @@ static int register_trigger(char *glob,
>   * Usually used directly as the @unreg method in event command
>   * implementations.
>   */
> -static void unregister_trigger(char *glob,
> +void unregister_trigger(char *glob,
>  			       struct event_trigger_data *test,
>  			       struct trace_event_file *file)
>  {
> @@ -1976,6 +1976,7 @@ __init int register_trigger_cmds(void)
>  	register_trigger_enable_disable_cmds();
>  	register_trigger_hist_enable_disable_cmds();
>  	register_trigger_hist_cmd();
> +	register_trigger_object_cmd();
>  
>  	return 0;
>  }
> diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
> new file mode 100644
> index 000000000000..19ec4b1c0186
> --- /dev/null
> +++ b/kernel/trace/trace_object.c
> @@ -0,0 +1,450 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * trace the kernel object in the kernel function parameter
> + * Copyright (C) 2021 Jeff Xie <xiehuan09@gmail.com>
> + */
> +
> +#define pr_fmt(fmt) "trace_object: " fmt
> +
> +#include "trace_output.h"
> +
> +#define MAX_TRACED_OBJECT 5
> +#define OBJTRACE_CMD_LEN  16
> +#define OBJTRACE_CMD_ADD "add"
> +static LIST_HEAD(obj_data_head);
> +static const int max_args_num = 6;
> +static void exit_trace_object(struct trace_array *tr);
> +static int init_trace_object(struct trace_array *tr);
> +
> +struct object_instance {
> +	void *obj;
> +	struct trace_array *tr;

The instance will need to hold the object, and last value.

> +};
> +
> +/* objtrace private data */
> +struct objtrace_trigger_data {
> +	struct ftrace_event_field *field;
> +	char objtrace_cmd[OBJTRACE_CMD_LEN];
> +	struct trace_array *tr;
> +};
> +
> +/* objtrace data with fops and objtrace_instances */
> +struct objtrace_data {
> +	struct list_head head;
> +	struct trace_array *tr;
> +	struct ftrace_ops fops;
> +	int num_traced_obj;
> +	struct object_instance traced_obj[MAX_TRACED_OBJECT];
> +	raw_spinlock_t obj_data_lock;
> +};

The objtrace_data and the objtrace_trigger_data could be merged into a
single instance that get's allocated when the trigger is written. The
object_instance array size can be the "count" passed in.

It will require initializing the entire instance at the time the trigger is
written into the trigger file.

> +
> +static struct objtrace_data *get_obj_data(struct trace_array *tr)
> +{
> +	struct objtrace_data *obj_data = NULL;
> +
> +	list_for_each_entry(obj_data, &obj_data_head, head) {
> +		if (obj_data->tr == tr)
> +			break;
> +	}
> +	return obj_data;
> +}

I deleted the above function.

> +
> +static bool object_exist(void *obj, struct trace_array *tr)
> +{
> +	int i, max;
> +	struct objtrace_data *obj_data;
> +
> +	obj_data = get_obj_data(tr);
> +	if (!obj_data)
> +		return false;
> +
> +	max = READ_ONCE(obj_data->num_traced_obj);
> +	smp_rmb();
> +	for (i = 0; i < max; i++) {
> +		if (obj_data->traced_obj[i].obj == obj)
> +			return true;
> +	}
> +	return false;
> +}
> +
> +static bool object_empty(struct trace_array *tr)
> +{
> +	struct objtrace_data *obj_data;
> +
> +	obj_data = get_obj_data(tr);
> +	if (!obj_data)
> +		return false;
> +
> +	return !READ_ONCE(obj_data->num_traced_obj);
> +}
> +
> +static void set_trace_object(void *obj, struct trace_array *tr)
> +{
> +	unsigned long flags;
> +	struct object_instance *obj_ins;
> +	struct objtrace_data *obj_data;

BTW, please use "upside-down x-mas tree" formatting for variables. That is:

	struct object_instance *obj_ins;
	struct objtrace_data *obj_data;
	unsigned long flags;

And I have more to comment on about variable declarations.

> +
> +	if (in_nmi())
> +		return;
> +
> +	if (!obj && object_exist(obj, tr))

If we are checking values, we do not test for object_exist here. Just !obj.

> +		return;
> +
> +	obj_data = get_obj_data(tr);
> +	if (!obj_data)
> +		return;
> +
> +	/* only this place has write operations */
> +	raw_spin_lock_irqsave(&obj_data->obj_data_lock, flags);
> +	if (READ_ONCE(obj_data->num_traced_obj) == MAX_TRACED_OBJECT) {

No need for the READ_ONCE() in the spin locked area.

> +		trace_array_printk_buf(tr->array_buffer.buffer, _THIS_IP_,
> +				"object_pool is full, can't trace object:0x%px\n", obj);

Also, I'm not sure if we have a counter, that we are going to want to print
anything when it is full. Maybe for debugging?
> +		goto out;
> +	}
> +	obj_ins = &obj_data->traced_obj[READ_ONCE(obj_data->num_traced_obj)];
> +	obj_ins->obj = obj;
> +	obj_ins->tr = tr;
> +	/* make sure the num_traced_obj update always appears after traced_obj update */
> +	smp_wmb();
> +	obj_data->num_traced_obj++;
> +out:
> +	raw_spin_unlock_irqrestore(&obj_data->obj_data_lock, flags);
> +}
> +
> +static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> +				 unsigned long object, struct trace_array *tr)
> +{
> +
> +	struct trace_buffer *buffer = tr->array_buffer.buffer;
> +	struct ring_buffer_event *event;
> +	struct trace_object_entry *entry;
> +	unsigned int trace_ctx = 0;
> +
> +	trace_ctx = tracing_gen_ctx();
> +	event = trace_buffer_lock_reserve(buffer, TRACE_OBJECT,
> +			sizeof(*entry), trace_ctx);
> +	if (!event)
> +		return;
> +	entry   = ring_buffer_event_data(event);
> +	entry->ip                       = ip;
> +	entry->parent_ip                = parent_ip;
> +	entry->object			= object;
> +
> +	trace_buffer_unlock_commit(tr, buffer, event, trace_ctx);
> +}
> +
> +static void
> +trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> +		struct ftrace_ops *op, struct ftrace_regs *fregs)
> +{
> +	struct pt_regs *pt_regs = ftrace_get_regs(fregs);
> +	struct trace_array *tr = op->private;

We do not need the op->private for the tr. instead:

	struct obj_data *obj_data;

> +	struct trace_array_cpu *data;
> +	unsigned long obj;
> +	long disabled;
> +	int cpu, n;

	obj_data = container_of(op, struct obj_data, fops);

	tr = obj_data->tr;

> +
> +	preempt_disable_notrace();
> +
> +	cpu = raw_smp_processor_id();
> +	data = per_cpu_ptr(tr->array_buffer.data, cpu);
> +	disabled = atomic_inc_return(&data->disabled);
> +	if (likely(disabled == 1)) {
> +		if (object_empty(tr))
> +			goto out;
> +		for (n = 0; n < max_args_num; n++) {
> +			obj = regs_get_kernel_argument(pt_regs, 

As I mentioned, the above is best effort at best. Instead, we should get
the value and check if it changed. If it did, then we print, otherwise skip
it. Will need to be under the lock to do this check.


> +			if (object_exist((void *)obj, tr))
> +				submit_trace_object(ip, parent_ip, obj, tr);
> +		/* The parameters of a function may match multiple objects */
> +		}
> +	}
> +out:
> +	atomic_dec(&data->disabled);
> +	preempt_enable_notrace();
> +}
> +
> +static void
> +trace_object_trigger(struct event_trigger_data *data,
> +		   struct trace_buffer *buffer,  void *rec,
> +		   struct ring_buffer_event *event)
> +{
> +	struct objtrace_trigger_data *obj_data = data->private_data;
> +	struct trace_array *tr = obj_data->tr;
> +	struct ftrace_event_field *field;
> +	void *obj = NULL;
> +
> +	field = obj_data->field;
> +	memcpy(&obj, rec + field->offset, sizeof(obj));
> +	set_trace_object(obj, tr);
> +}
> +
> +static void
> +trace_object_trigger_free(struct event_trigger_data *data)
> +{
> +	struct objtrace_trigger_data *obj_data = data->private_data;
> +
> +	if (WARN_ON_ONCE(data->ref <= 0))
> +		return;
> +
> +	data->ref--;
> +	if (!data->ref) {
> +		exit_trace_object(obj_data->tr);
> +		kfree(data->private_data);
> +		trigger_data_free(data);
> +	}
> +}
> +
> +static void
> +trace_object_count_trigger(struct event_trigger_data *data,
> +			 struct trace_buffer *buffer, void *rec,
> +			 struct ring_buffer_event *event)
> +{
> +	if (!data->count)
> +		return;
> +
> +	if (data->count != -1)
> +		(data->count)--;
> +
> +	trace_object_trigger(data, buffer, rec, event);
> +}
> +
> +static int
> +event_trigger_print(const char *name, struct seq_file *m,
> +		void *data, char *filter_str, void *objtrace_data)
> +{
> +	long count = (long)data;
> +	struct objtrace_trigger_data *obj_data = objtrace_data;
> +
> +	seq_puts(m, name);
> +
> +	seq_printf(m, ":%s", obj_data->objtrace_cmd);
> +	seq_printf(m, ":%s", obj_data->field->name);
> +
> +	if (count == -1)
> +		seq_puts(m, ":unlimited");
> +	else
> +		seq_printf(m, ":count=%ld", count);

We are not going to have an "unlimited" here. I think the count should be
the number of objects we can trace.

> +
> +	if (filter_str)
> +		seq_printf(m, " if %s\n", filter_str);
> +	else
> +		seq_putc(m, '\n');
> +
> +	return 0;
> +}
> +
> +static int event_object_trigger_init(struct event_trigger_data *data)
> +{
> +	struct objtrace_trigger_data *obj_data = data->private_data;
> +	int ret;
> +
> +	if (!data->ref) {
> +		ret = init_trace_object(obj_data->tr);
> +		if (ret)
> +			return ret;
> +	}
> +	data->ref++;
> +	return 0;
> +}
> +
> +static int
> +trace_object_trigger_print(struct seq_file *m, struct event_trigger_data *data)
> +{
> +	return event_trigger_print("objtrace", m, (void *)data->count,
> +				   data->filter_str, data->private_data);
> +}
> +
> +static struct event_trigger_ops objecttrace_trigger_ops = {
> +	.trigger		= trace_object_trigger,
> +	.print			= trace_object_trigger_print,
> +	.init			= event_object_trigger_init,
> +	.free			= trace_object_trigger_free,
> +};
> +
> +static struct event_trigger_ops objecttrace_count_trigger_ops = {
> +	.trigger		= trace_object_count_trigger,
> +	.print			= trace_object_trigger_print,
> +	.init			= event_object_trigger_init,
> +	.free			= trace_object_trigger_free,
> +};
> +
> +static struct event_trigger_ops *
> +objecttrace_get_trigger_ops(char *cmd, char *param)
> +{
> +	return param ? &objecttrace_count_trigger_ops : &objecttrace_trigger_ops;
> +}
> +
> +static bool field_exist(struct trace_event_file *file,
> +			struct event_command *cmd_ops,
> +			const char *field_name)
> +{
> +	struct event_trigger_data *data;
> +	struct objtrace_trigger_data *obj_data;
> +
> +	lockdep_assert_held(&event_mutex);
> +
> +	list_for_each_entry(data, &file->triggers, list) {
> +		if (data->cmd_ops->trigger_type == cmd_ops->trigger_type) {
> +			obj_data = data->private_data;
> +			if (!strcmp(obj_data->field->name, field_name))
> +				return true;
> +		}
> +	}
> +
> +	return false;
> +}
> +
> +static int
> +event_object_trigger_parse(struct event_command *cmd_ops,
> +		       struct trace_event_file *file,
> +		       char *glob, char *cmd, char *param_and_filter)
> +{
> +	struct event_trigger_data *trigger_data;
> +	struct objtrace_trigger_data *obj_data;
> +	struct ftrace_event_field *field;
> +	char *objtrace_cmd, *arg;
> +	char *param, *filter;
> +	int ret;
> +	bool remove;

BTW, do not declare more than one variable on the same line. It makes it
hard to know what is declared, makes diffs harder to know what changed, and
makes it annoying when the compiler says a variable is unused to delete it.

The above should be:

	struct event_trigger_data *trigger_data;
	struct objtrace_trigger_data *obj_data;
	struct ftrace_event_field *field;
	char *objtrace_cmd;
	char *filter;
	char *param;
	char *arg;
	bool remove;
	int ret;

Although, when I changed this to use the fetch instruction, most of it was
removed anyway (which is why I found the above to be annoying due to all the
"unused variable" warnings I had).

-- Steve

> +
> +	remove = event_trigger_check_remove(glob);
> +
> +	/*
> +	 * separate the param and the filter:
> +	 * objtrace:add:OBJ[:COUNT] [if filter]
> +	 */
> +	ret = event_trigger_separate_filter(param_and_filter, &param, &filter, true);
> +	if (ret)
> +		return ret;
> +
> +	objtrace_cmd = strsep(&param, ":");
> +	if (!objtrace_cmd || strcmp(objtrace_cmd, OBJTRACE_CMD_ADD)) {
> +		pr_err("error objtrace command\n");
> +		return -EINVAL;
> +	}
> +
> +	arg = strsep(&param, ":");
> +	if (!arg)
> +		return -EINVAL;
> +
> +	field = trace_find_event_field(file->event_call, arg);
> +	if (!field)
> +		return -EINVAL;
> +
> +	if (field->size != sizeof(void *)) {
> +		pr_err("the size of the %s should be:%zu\n", field->name, sizeof(void *));
> +		return -EINVAL;
> +	}
> +
> +	if (remove && !field_exist(file, cmd_ops, field->name))
> +		return -ENOENT;
> +
> +	obj_data = kzalloc(sizeof(*obj_data), GFP_KERNEL);
> +	if (!obj_data)
> +		return -ENOMEM;
> +
> +	obj_data->field = field;
> +	obj_data->tr = file->tr;
> +	snprintf(obj_data->objtrace_cmd, OBJTRACE_CMD_LEN, objtrace_cmd);
> +
> +	trigger_data = event_trigger_alloc(cmd_ops, cmd, param, obj_data);
> +	if (!trigger_data) {
> +		kfree(obj_data);
> +		return -ENOMEM;
> +	}
> +	if (remove) {
> +		event_trigger_unregister(cmd_ops, file, glob+1, trigger_data);
> +		kfree(obj_data);
> +		kfree(trigger_data);
> +		return 0;
> +	}
> +
> +	ret = event_trigger_parse_num(param, trigger_data);
> +	if (ret)
> +		goto out_free;
> +
> +	ret = event_trigger_set_filter(cmd_ops, file, filter, trigger_data);
> +	if (ret < 0)
> +		goto out_free;
> +
> +	ret = event_trigger_register(cmd_ops, file, glob, trigger_data);
> +	if (ret)
> +		goto out_free;
> +
> +	return ret;
> +
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object
  2022-10-17 19:40 ` [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object Steven Rostedt
@ 2022-10-18  7:01   ` Jeff Xie
  2022-11-29 16:09   ` Jeff Xie
  1 sibling, 0 replies; 21+ messages in thread
From: Jeff Xie @ 2022-10-18  7:01 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000

On Tue, Oct 18, 2022 at 3:40 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 19 Aug 2022 11:27:02 +0800
> Jeff Xie <xiehuan09@gmail.com> wrote:
>
> So on my way home from London, I tried to update it to use the fetch
> instruction logic of kprobes (which will work), but I also ended up pretty
> much rewriting most of it, which then I realized, I'm going much further
> than I wanted to.

Very excited  to hear it will work. Thank you for your hard work.

> But know I have a much better idea of how this should work, and I'll try to
> go through what I learned here and throughout the patches.
>
>
> > Introduce a method based on function tracer to trace any object and get
> > the value of the object dynamically. the object can be obtained from the
> > dynamic event (kprobe_event/uprobe_event) or the static event(tracepoint).
>
> Note, I can see the value of tracing only the value of the object, but that
> to me would be "objfollow" not "objtrace". That is, to find out what
> functions are calling the object. I guess that may have been your original
> objective, but it was articulated more about knowing how the value changed

Yes, It actually was my original objective -;)  and I also really love
your idea of getting its value.

> and not where the value is going (and the code was written more like that).
> Having a objfollow may be nice, but reading the arguments of a function is
> really a "best attempt" at most, and you can't really trust the arguments
> are what you are tracing.

It's true, you spoke to my heart ;-)

>I would hold off on that until we have a good BTF
> tracing infrastructure in the function tracer.
>
> So I'm going to continue with "objtrace" which is about tracing the
> object's value throughout time.

Thanks.

>
> >
> > Usage:
> > When using the kprobe event, only need to set the objtrace(a new trigger),
> > we can get the value of the object. The object is from the setting of the
> > kprobe event.
> >
> > For example:
> > For the function bio_add_page():
> >
> > int bio_add_page(struct bio *bio, struct page *page,
> >       unsigned int len, unsigned int offset)
> >
> > Firstly, we can set the base of the object, thus the first string "arg1"
> > stands for the value of the first parameter of this function bio_add_gage(),
> >
> > # echo 'p bio_add_page arg1=$arg1' > ./kprobe_events
> >
> > Secondly, we can get the value dynamically based on above object.
> >
> > find the offset of the bi_size in struct bio:
> > $ gdb vmlinux
> > (gdb) p &(((struct bio *)0)->bi_iter.bi_size)
> > $1 = (unsigned int *) 0x28
> >
> > # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/ \
> >       p_bio_add_page_0/trigger
> >
> > # cd /sys/kernel/debug/tracing/
> > # echo 'p bio_add_page arg1=$arg1' > ./kprobe_events
> > # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger
>
> So I was working to change the above to:
>
>   echo 'objtrace:add:+0x28($arg1):u32:1 if comm == "cat"'

It seems easier to use, I like it.

>
>
> >
> > # du -sh /test.txt
> > 12.0K   /test.txt
> >
> > # cat  /test.txt > /dev/null
> > # cat ./trace
> > # tracer: nop
> > #
> > # entries-in-buffer/entries-written: 128/128   #P:4
> > #
> > #                                _-----=> irqs-off/BH-disabled
> > #                               / _----=> need-resched
> > #                              | / _---=> hardirq/softirq
> > #                              || / _--=> preempt-depth
> > #                              ||| / _-=> migrate-disable
> > #                              |||| /     delay
> > #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> > #              | |         |   |||||     |         |
> >              cat-117     [002] ...1.     1.602243: __bio_try_merge_page <-bio_add_page object:0xffff88811bee4000 value:0x0
> >              cat-117     [002] ...1.     1.602244: __bio_add_page <-bio_add_page object:0xffff88811bee4000 value:0x0
> >              cat-117     [002] ...2.     1.602244: bio_add_page <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x1000
> >              cat-117     [002] ...1.     1.602245: __bio_try_merge_page <-bio_add_page object:0xffff88811bee4000 value:0x1000
> >              cat-117     [002] ...1.     1.602245: __bio_add_page <-bio_add_page object:0xffff88811bee4000 value:0x1000
>
> Now instead of searching the function arguments for "obj", we should just
> read the obj value at every function and report when it changed, as well as
> the last function it was still the same. The report will be like the
> "func_repeats" entry. And instead we can print when the object changes,
> something like this:
>
>               cat-117     [002] ...2.     1.602245: bio_add_page <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x2000 (last value:0x1000 at [002] __bio_add_page <-bio_add_page ts: 1.602245)

This output is better than I expected.

>
> This will tell us when a value changed.
>
> It would need to store the CPU, ip and pip and timestamp of the last time
> it was the same. And when it changes, it would then report it.
>
> -- Steve

-- 
Thanks,
JeffXie

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v15 1/4] trace: Add trace any kernel object
  2022-10-17 19:58   ` Steven Rostedt
@ 2022-10-18  8:35     ` Jeff Xie
  0 siblings, 0 replies; 21+ messages in thread
From: Jeff Xie @ 2022-10-18  8:35 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000

Hi steve,

Thank you very much for your patient reply, very happy and very welcome to
rewrite the patch series with fetch instructions.

On Tue, Oct 18, 2022 at 3:58 AM Steven Rostedt <rostedt@goodmis.org> wrote:>
> On Fri, 19 Aug 2022 11:27:03 +0800
> Jeff Xie <xiehuan09@gmail.com> wrote:
>
>
> > index 27febd4ee33e..2c2477dea0f2 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -5621,6 +5621,9 @@ static const char readme_msg[] =
> >       "\t            enable_hist:<system>:<event>\n"
> >       "\t            disable_hist:<system>:<event>\n"
> >  #endif
> > +#ifdef CONFIG_TRACE_OBJECT
> > +     "\t            objtrace:add:obj[:count][if <filter>]\n"
> > +#endif
> >  #ifdef CONFIG_STACKTRACE
> >       "\t\t    stacktrace\n"
> >  #endif
> > @@ -9261,6 +9264,9 @@ static struct trace_array *trace_array_create(const char *name)
> >       if (ftrace_allocate_ftrace_ops(tr) < 0)
> >               goto out_free_tr;
> >
> > +     if (allocate_objtrace_data(tr) < 0)
> > +             goto out_free_tr;
>
> When I was rewriting this, I found there's no reason to associate a objdata
> with a tr. It should all be self contained (per event file). The objdata
> can have a tr, but the tr doesn't care about the objdata.

I'm so glad you can rewrite it. okay, you should have removed the
object_exist(),
so the tr doesn't care about the objdata.

>
> > +
> >       ftrace_init_trace_array(tr);
> >
> >       init_trace_flags_index(tr);
> > @@ -9280,6 +9286,7 @@ static struct trace_array *trace_array_create(const char *name)
> >
> >   out_free_tr:
> >       ftrace_free_ftrace_ops(tr);
> > +     free_objtrace_data(tr);
> >       free_trace_buffers(tr);
> >       free_cpumask_var(tr->tracing_cpumask);
> >       kfree(tr->name);
> > @@ -9373,6 +9380,7 @@ static int __remove_instance(struct trace_array *tr)
> >       event_trace_del_tracer(tr);
> >       ftrace_clear_pids(tr);
> >       ftrace_destroy_function_files(tr);
> > +     free_objtrace_data(tr);
> >       tracefs_remove(tr->dir);
> >       free_percpu(tr->last_func_repeats);
> >       free_trace_buffers(tr);
> > @@ -10110,6 +10118,9 @@ __init static int tracer_alloc_buffers(void)
> >               goto out_free_savedcmd;
> >       }
> >
> > +     if (allocate_objtrace_data(&global_trace))
> > +             goto out_free_savedcmd;
> > +
> >       if (global_trace.buffer_disabled)
> >               tracing_off();
> >
> > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > index 900e75d96c84..8da1ef78a09a 100644
> > --- a/kernel/trace/trace.h
> > +++ b/kernel/trace/trace.h
> > @@ -54,6 +54,7 @@ enum trace_type {
> >       TRACE_TIMERLAT,
> >       TRACE_RAW_DATA,
> >       TRACE_FUNC_REPEATS,
> > +     TRACE_OBJECT,
> >
> >       __TRACE_LAST_TYPE,
> >  };
> > @@ -363,6 +364,9 @@ struct trace_array {
> >       cpumask_var_t           tracing_cpumask; /* only trace on set CPUs */
> >       int                     ref;
> >       int                     trace_ref;
> > +#ifdef CONFIG_TRACE_OBJECT
> > +     struct objtrace_data    *obj_data;
> > +#endif
>
> That means the above is not needed.

Yes, thanks.

>
> >  #ifdef CONFIG_FUNCTION_TRACER
> >       struct ftrace_ops       *ops;
> >       struct trace_pid_list   __rcu *function_pids;
> > @@ -471,6 +475,7 @@ extern void __ftrace_bad_type(void);
> >                         TRACE_GRAPH_RET);             \
> >               IF_ASSIGN(var, ent, struct func_repeats_entry,          \
> >                         TRACE_FUNC_REPEATS);                          \
> > +             IF_ASSIGN(var, ent, struct trace_object_entry, TRACE_OBJECT);\
> >               __ftrace_bad_type();                                    \
> >       } while (0)
> >
> > @@ -1536,6 +1541,16 @@ static inline int register_trigger_hist_cmd(void) { return 0; }
> >  static inline int register_trigger_hist_enable_disable_cmds(void) { return 0; }
> >  #endif
> >
> > +#ifdef CONFIG_TRACE_OBJECT
> > +extern int register_trigger_object_cmd(void);
> > +extern int allocate_objtrace_data(struct trace_array *tr);
> > +extern void free_objtrace_data(struct trace_array *tr);
> > +#else
> > +static inline int register_trigger_object_cmd(void) { return 0; }
> > +static inline int allocate_objtrace_data(struct trace_array *tr) { return 0; }
> > +static inline void free_objtrace_data(struct trace_array *tr) { };
> > +#endif
> > +
> >  extern int register_trigger_cmds(void);
> >  extern void clear_event_triggers(struct trace_array *tr);
> >
> > @@ -1611,6 +1626,12 @@ extern bool event_trigger_check_remove(const char *glob);
> >  extern bool event_trigger_empty_param(const char *param);
> >  extern int event_trigger_separate_filter(char *param_and_filter, char **param,
> >                                        char **filter, bool param_required);
> > +extern int register_trigger(char *glob,
> > +                     struct event_trigger_data *data,
> > +                     struct trace_event_file *file);
> > +extern void unregister_trigger(char *glob,
> > +                     struct event_trigger_data *test,
> > +                     struct trace_event_file *file);
> >  extern struct event_trigger_data *
> >  event_trigger_alloc(struct event_command *cmd_ops,
> >                   char *cmd,
> > diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> > index cd41e863b51c..bb120d9498a9 100644
> > --- a/kernel/trace/trace_entries.h
> > +++ b/kernel/trace/trace_entries.h
> > @@ -401,3 +401,20 @@ FTRACE_ENTRY(timerlat, timerlat_entry,
> >                __entry->context,
> >                __entry->timer_latency)
> >  );
> > +
> > +/*
> > + * trace object entry:
> > + */
> > +FTRACE_ENTRY(object, trace_object_entry,
> > +
> > +     TRACE_OBJECT,
> > +
> > +     F_STRUCT(
> > +             __field(        unsigned long,          ip              )
> > +             __field(        unsigned long,          parent_ip       )
> > +             __field(        unsigned long,          object          )
> > +     ),
> > +
> > +     F_printk(" %ps <-- %ps object:%lx\n",
> > +              (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
> > +);
> > diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> > index cb866c3141af..8bad4eb3b997 100644
> > --- a/kernel/trace/trace_events_trigger.c
> > +++ b/kernel/trace/trace_events_trigger.c
> > @@ -560,7 +560,7 @@ void update_cond_flag(struct trace_event_file *file)
> >   *
> >   * Return: 0 on success, errno otherwise
> >   */
> > -static int register_trigger(char *glob,
> > +int register_trigger(char *glob,
> >                           struct event_trigger_data *data,
> >                           struct trace_event_file *file)
> >  {
> > @@ -605,7 +605,7 @@ static int register_trigger(char *glob,
> >   * Usually used directly as the @unreg method in event command
> >   * implementations.
> >   */
> > -static void unregister_trigger(char *glob,
> > +void unregister_trigger(char *glob,
> >                              struct event_trigger_data *test,
> >                              struct trace_event_file *file)
> >  {
> > @@ -1976,6 +1976,7 @@ __init int register_trigger_cmds(void)
> >       register_trigger_enable_disable_cmds();
> >       register_trigger_hist_enable_disable_cmds();
> >       register_trigger_hist_cmd();
> > +     register_trigger_object_cmd();
> >
> >       return 0;
> >  }
> > diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
> > new file mode 100644
> > index 000000000000..19ec4b1c0186
> > --- /dev/null
> > +++ b/kernel/trace/trace_object.c
> > @@ -0,0 +1,450 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * trace the kernel object in the kernel function parameter
> > + * Copyright (C) 2021 Jeff Xie <xiehuan09@gmail.com>
> > + */
> > +
> > +#define pr_fmt(fmt) "trace_object: " fmt
> > +
> > +#include "trace_output.h"
> > +
> > +#define MAX_TRACED_OBJECT 5
> > +#define OBJTRACE_CMD_LEN  16
> > +#define OBJTRACE_CMD_ADD "add"
> > +static LIST_HEAD(obj_data_head);
> > +static const int max_args_num = 6;
> > +static void exit_trace_object(struct trace_array *tr);
> > +static int init_trace_object(struct trace_array *tr);
> > +
> > +struct object_instance {
> > +     void *obj;
> > +     struct trace_array *tr;
>
> The instance will need to hold the object, and last value.

Thanks, I got it.

>
> > +};
> > +
> > +/* objtrace private data */
> > +struct objtrace_trigger_data {
> > +     struct ftrace_event_field *field;
> > +     char objtrace_cmd[OBJTRACE_CMD_LEN];
> > +     struct trace_array *tr;
> > +};
> > +
> > +/* objtrace data with fops and objtrace_instances */
> > +struct objtrace_data {
> > +     struct list_head head;
> > +     struct trace_array *tr;
> > +     struct ftrace_ops fops;
> > +     int num_traced_obj;
> > +     struct object_instance traced_obj[MAX_TRACED_OBJECT];
> > +     raw_spinlock_t obj_data_lock;
> > +};
>
> The objtrace_data and the objtrace_trigger_data could be merged into a
> single instance that get's allocated when the trigger is written. The
> object_instance array size can be the "count" passed in.
>
> It will require initializing the entire instance at the time the trigger is
> written into the trigger file.
>

It does look better;-)

> > +
> > +static struct objtrace_data *get_obj_data(struct trace_array *tr)
> > +{
> > +     struct objtrace_data *obj_data = NULL;
> > +
> > +     list_for_each_entry(obj_data, &obj_data_head, head) {
> > +             if (obj_data->tr == tr)
> > +                     break;
> > +     }
> > +     return obj_data;
> > +}
>
> I deleted the above function.

Thanks.

>
> > +
> > +static bool object_exist(void *obj, struct trace_array *tr)
> > +{
> > +     int i, max;
> > +     struct objtrace_data *obj_data;
> > +
> > +     obj_data = get_obj_data(tr);
> > +     if (!obj_data)
> > +             return false;
> > +
> > +     max = READ_ONCE(obj_data->num_traced_obj);
> > +     smp_rmb();
> > +     for (i = 0; i < max; i++) {
> > +             if (obj_data->traced_obj[i].obj == obj)
> > +                     return true;
> > +     }
> > +     return false;
> > +}
> > +
> > +static bool object_empty(struct trace_array *tr)
> > +{
> > +     struct objtrace_data *obj_data;
> > +
> > +     obj_data = get_obj_data(tr);
> > +     if (!obj_data)
> > +             return false;
> > +
> > +     return !READ_ONCE(obj_data->num_traced_obj);
> > +}
> > +
> > +static void set_trace_object(void *obj, struct trace_array *tr)
> > +{
> > +     unsigned long flags;
> > +     struct object_instance *obj_ins;
> > +     struct objtrace_data *obj_data;
>
> BTW, please use "upside-down x-mas tree" formatting for variables. That is:

Thanks for your reminder.

>
>         struct object_instance *obj_ins;
>         struct objtrace_data *obj_data;
>         unsigned long flags;
>
> And I have more to comment on about variable declarations.
>
> > +
> > +     if (in_nmi())
> > +             return;
> > +
> > +     if (!obj && object_exist(obj, tr))
>
> If we are checking values, we do not test for object_exist here. Just !obj.
>
> > +             return;
> > +
> > +     obj_data = get_obj_data(tr);
> > +     if (!obj_data)
> > +             return;
> > +
> > +     /* only this place has write operations */
> > +     raw_spin_lock_irqsave(&obj_data->obj_data_lock, flags);
> > +     if (READ_ONCE(obj_data->num_traced_obj) == MAX_TRACED_OBJECT) {
>
> No need for the READ_ONCE() in the spin locked area.

Thanks.

>
> > +             trace_array_printk_buf(tr->array_buffer.buffer, _THIS_IP_,
> > +                             "object_pool is full, can't trace object:0x%px\n", obj);
>
> Also, I'm not sure if we have a counter, that we are going to want to print
> anything when it is full. Maybe for debugging?

I was worried that if there is no counter, there needs to be a refill logic,
and the implementation will become more complicated ;-)


> > +             goto out;
> > +     }
> > +     obj_ins = &obj_data->traced_obj[READ_ONCE(obj_data->num_traced_obj)];
> > +     obj_ins->obj = obj;
> > +     obj_ins->tr = tr;
> > +     /* make sure the num_traced_obj update always appears after traced_obj update */
> > +     smp_wmb();
> > +     obj_data->num_traced_obj++;
> > +out:
> > +     raw_spin_unlock_irqrestore(&obj_data->obj_data_lock, flags);
> > +}
> > +
> > +static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> > +                              unsigned long object, struct trace_array *tr)
> > +{
> > +
> > +     struct trace_buffer *buffer = tr->array_buffer.buffer;
> > +     struct ring_buffer_event *event;
> > +     struct trace_object_entry *entry;
> > +     unsigned int trace_ctx = 0;
> > +
> > +     trace_ctx = tracing_gen_ctx();
> > +     event = trace_buffer_lock_reserve(buffer, TRACE_OBJECT,
> > +                     sizeof(*entry), trace_ctx);
> > +     if (!event)
> > +             return;
> > +     entry   = ring_buffer_event_data(event);
> > +     entry->ip                       = ip;
> > +     entry->parent_ip                = parent_ip;
> > +     entry->object                   = object;
> > +
> > +     trace_buffer_unlock_commit(tr, buffer, event, trace_ctx);
> > +}
> > +
> > +static void
> > +trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> > +             struct ftrace_ops *op, struct ftrace_regs *fregs)
> > +{
> > +     struct pt_regs *pt_regs = ftrace_get_regs(fregs);
> > +     struct trace_array *tr = op->private;
>
> We do not need the op->private for the tr. instead:
>
>         struct obj_data *obj_data;
>
> > +     struct trace_array_cpu *data;
> > +     unsigned long obj;
> > +     long disabled;
> > +     int cpu, n;
>
>         obj_data = container_of(op, struct obj_data, fops);
>
>         tr = obj_data->tr;

learned, thanks.

>
> > +
> > +     preempt_disable_notrace();
> > +
> > +     cpu = raw_smp_processor_id();
> > +     data = per_cpu_ptr(tr->array_buffer.data, cpu);
> > +     disabled = atomic_inc_return(&data->disabled);
> > +     if (likely(disabled == 1)) {
> > +             if (object_empty(tr))
> > +                     goto out;
> > +             for (n = 0; n < max_args_num; n++) {
> > +                     obj = regs_get_kernel_argument(pt_regs,
>
> As I mentioned, the above is best effort at best. Instead, we should get
> the value and check if it changed. If it did, then we print, otherwise skip
> it. Will need to be under the lock to do this check.

Although this is not my original intention, I am also very happy to
accept your idea.

>
>
> > +                     if (object_exist((void *)obj, tr))
> > +                             submit_trace_object(ip, parent_ip, obj, tr);
> > +             /* The parameters of a function may match multiple objects */
> > +             }
> > +     }
> > +out:
> > +     atomic_dec(&data->disabled);
> > +     preempt_enable_notrace();
> > +}
> > +
> > +static void
> > +trace_object_trigger(struct event_trigger_data *data,
> > +                struct trace_buffer *buffer,  void *rec,
> > +                struct ring_buffer_event *event)
> > +{
> > +     struct objtrace_trigger_data *obj_data = data->private_data;
> > +     struct trace_array *tr = obj_data->tr;
> > +     struct ftrace_event_field *field;
> > +     void *obj = NULL;
> > +
> > +     field = obj_data->field;
> > +     memcpy(&obj, rec + field->offset, sizeof(obj));
> > +     set_trace_object(obj, tr);
> > +}
> > +
> > +static void
> > +trace_object_trigger_free(struct event_trigger_data *data)
> > +{
> > +     struct objtrace_trigger_data *obj_data = data->private_data;
> > +
> > +     if (WARN_ON_ONCE(data->ref <= 0))
> > +             return;
> > +
> > +     data->ref--;
> > +     if (!data->ref) {
> > +             exit_trace_object(obj_data->tr);
> > +             kfree(data->private_data);
> > +             trigger_data_free(data);
> > +     }
> > +}
> > +
> > +static void
> > +trace_object_count_trigger(struct event_trigger_data *data,
> > +                      struct trace_buffer *buffer, void *rec,
> > +                      struct ring_buffer_event *event)
> > +{
> > +     if (!data->count)
> > +             return;
> > +
> > +     if (data->count != -1)
> > +             (data->count)--;
> > +
> > +     trace_object_trigger(data, buffer, rec, event);
> > +}
> > +
> > +static int
> > +event_trigger_print(const char *name, struct seq_file *m,
> > +             void *data, char *filter_str, void *objtrace_data)
> > +{
> > +     long count = (long)data;
> > +     struct objtrace_trigger_data *obj_data = objtrace_data;
> > +
> > +     seq_puts(m, name);
> > +
> > +     seq_printf(m, ":%s", obj_data->objtrace_cmd);
> > +     seq_printf(m, ":%s", obj_data->field->name);
> > +
> > +     if (count == -1)
> > +             seq_puts(m, ":unlimited");
> > +     else
> > +             seq_printf(m, ":count=%ld", count);
>
> We are not going to have an "unlimited" here. I think the count should be
> the number of objects we can trace.

Thanks, I agree.

>
> > +
> > +     if (filter_str)
> > +             seq_printf(m, " if %s\n", filter_str);
> > +     else
> > +             seq_putc(m, '\n');
> > +
> > +     return 0;
> > +}
> > +
> > +static int event_object_trigger_init(struct event_trigger_data *data)
> > +{
> > +     struct objtrace_trigger_data *obj_data = data->private_data;
> > +     int ret;
> > +
> > +     if (!data->ref) {
> > +             ret = init_trace_object(obj_data->tr);
> > +             if (ret)
> > +                     return ret;
> > +     }
> > +     data->ref++;
> > +     return 0;
> > +}
> > +
> > +static int
> > +trace_object_trigger_print(struct seq_file *m, struct event_trigger_data *data)
> > +{
> > +     return event_trigger_print("objtrace", m, (void *)data->count,
> > +                                data->filter_str, data->private_data);
> > +}
> > +
> > +static struct event_trigger_ops objecttrace_trigger_ops = {
> > +     .trigger                = trace_object_trigger,
> > +     .print                  = trace_object_trigger_print,
> > +     .init                   = event_object_trigger_init,
> > +     .free                   = trace_object_trigger_free,
> > +};
> > +
> > +static struct event_trigger_ops objecttrace_count_trigger_ops = {
> > +     .trigger                = trace_object_count_trigger,
> > +     .print                  = trace_object_trigger_print,
> > +     .init                   = event_object_trigger_init,
> > +     .free                   = trace_object_trigger_free,
> > +};
> > +
> > +static struct event_trigger_ops *
> > +objecttrace_get_trigger_ops(char *cmd, char *param)
> > +{
> > +     return param ? &objecttrace_count_trigger_ops : &objecttrace_trigger_ops;
> > +}
> > +
> > +static bool field_exist(struct trace_event_file *file,
> > +                     struct event_command *cmd_ops,
> > +                     const char *field_name)
> > +{
> > +     struct event_trigger_data *data;
> > +     struct objtrace_trigger_data *obj_data;
> > +
> > +     lockdep_assert_held(&event_mutex);
> > +
> > +     list_for_each_entry(data, &file->triggers, list) {
> > +             if (data->cmd_ops->trigger_type == cmd_ops->trigger_type) {
> > +                     obj_data = data->private_data;
> > +                     if (!strcmp(obj_data->field->name, field_name))
> > +                             return true;
> > +             }
> > +     }
> > +
> > +     return false;
> > +}
> > +
> > +static int
> > +event_object_trigger_parse(struct event_command *cmd_ops,
> > +                    struct trace_event_file *file,
> > +                    char *glob, char *cmd, char *param_and_filter)
> > +{
> > +     struct event_trigger_data *trigger_data;
> > +     struct objtrace_trigger_data *obj_data;
> > +     struct ftrace_event_field *field;
> > +     char *objtrace_cmd, *arg;
> > +     char *param, *filter;
> > +     int ret;
> > +     bool remove;
>
> BTW, do not declare more than one variable on the same line. It makes it
> hard to know what is declared, makes diffs harder to know what changed, and
> makes it annoying when the compiler says a variable is unused to delete it.
>
> The above should be:
>
>         struct event_trigger_data *trigger_data;
>         struct objtrace_trigger_data *obj_data;
>         struct ftrace_event_field *field;
>         char *objtrace_cmd;
>         char *filter;
>         char *param;
>         char *arg;
>         bool remove;
>         int ret;

Thanks for your reminder .

> Although, when I changed this to use the fetch instruction, most of it was
> removed anyway (which is why I found the above to be annoying due to all the
> "unused variable" warnings I had).
>
> -- Steve
>
> > +
> > +     remove = event_trigger_check_remove(glob);
> > +
> > +     /*
> > +      * separate the param and the filter:
> > +      * objtrace:add:OBJ[:COUNT] [if filter]
> > +      */
> > +     ret = event_trigger_separate_filter(param_and_filter, &param, &filter, true);
> > +     if (ret)
> > +             return ret;
> > +
> > +     objtrace_cmd = strsep(&param, ":");
> > +     if (!objtrace_cmd || strcmp(objtrace_cmd, OBJTRACE_CMD_ADD)) {
> > +             pr_err("error objtrace command\n");
> > +             return -EINVAL;
> > +     }
> > +
> > +     arg = strsep(&param, ":");
> > +     if (!arg)
> > +             return -EINVAL;
> > +
> > +     field = trace_find_event_field(file->event_call, arg);
> > +     if (!field)
> > +             return -EINVAL;
> > +
> > +     if (field->size != sizeof(void *)) {
> > +             pr_err("the size of the %s should be:%zu\n", field->name, sizeof(void *));
> > +             return -EINVAL;
> > +     }
> > +
> > +     if (remove && !field_exist(file, cmd_ops, field->name))
> > +             return -ENOENT;
> > +
> > +     obj_data = kzalloc(sizeof(*obj_data), GFP_KERNEL);
> > +     if (!obj_data)
> > +             return -ENOMEM;
> > +
> > +     obj_data->field = field;
> > +     obj_data->tr = file->tr;
> > +     snprintf(obj_data->objtrace_cmd, OBJTRACE_CMD_LEN, objtrace_cmd);
> > +
> > +     trigger_data = event_trigger_alloc(cmd_ops, cmd, param, obj_data);
> > +     if (!trigger_data) {
> > +             kfree(obj_data);
> > +             return -ENOMEM;
> > +     }
> > +     if (remove) {
> > +             event_trigger_unregister(cmd_ops, file, glob+1, trigger_data);
> > +             kfree(obj_data);
> > +             kfree(trigger_data);
> > +             return 0;
> > +     }
> > +
> > +     ret = event_trigger_parse_num(param, trigger_data);
> > +     if (ret)
> > +             goto out_free;
> > +
> > +     ret = event_trigger_set_filter(cmd_ops, file, filter, trigger_data);
> > +     if (ret < 0)
> > +             goto out_free;
> > +
> > +     ret = event_trigger_register(cmd_ops, file, glob, trigger_data);
> > +     if (ret)
> > +             goto out_free;
> > +
> > +     return ret;
> > +
> >


-- 
Thanks,
JeffXie

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object
  2022-10-17 19:40 ` [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object Steven Rostedt
  2022-10-18  7:01   ` Jeff Xie
@ 2022-11-29 16:09   ` Jeff Xie
  2022-11-29 16:30     ` Steven Rostedt
  1 sibling, 1 reply; 21+ messages in thread
From: Jeff Xie @ 2022-11-29 16:09 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000

Hi Steven,

On Tue, Oct 18, 2022 at 3:40 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 19 Aug 2022 11:27:02 +0800
> Jeff Xie <xiehuan09@gmail.com> wrote:
>
> So on my way home from London, I tried to update it to use the fetch
> instruction logic of kprobes (which will work), but I also ended up pretty
> much rewriting most of it, which then I realized, I'm going much further
> than I wanted to.
>
> But know I have a much better idea of how this should work, and I'll try to
> go through what I learned here and throughout the patches.
>
>
> > Introduce a method based on function tracer to trace any object and get
> > the value of the object dynamically. the object can be obtained from the
> > dynamic event (kprobe_event/uprobe_event) or the static event(tracepoint).
>
> Note, I can see the value of tracing only the value of the object, but that
> to me would be "objfollow" not "objtrace". That is, to find out what
> functions are calling the object. I guess that may have been your original
> objective, but it was articulated more about knowing how the value changed
> and not where the value is going (and the code was written more like that).
>
> Having a objfollow may be nice, but reading the arguments of a function is
> really a "best attempt" at most, and you can't really trust the arguments
> are what you are tracing. I would hold off on that until we have a good BTF
> tracing infrastructure in the function tracer.
>
> So I'm going to continue with "objtrace" which is about tracing the
> object's value throughout time.
>
> >
> > Usage:
> > When using the kprobe event, only need to set the objtrace(a new trigger),
> > we can get the value of the object. The object is from the setting of the
> > kprobe event.
> >
> > For example:
> > For the function bio_add_page():
> >
> > int bio_add_page(struct bio *bio, struct page *page,
> >       unsigned int len, unsigned int offset)
> >
> > Firstly, we can set the base of the object, thus the first string "arg1"
> > stands for the value of the first parameter of this function bio_add_gage(),
> >
> > # echo 'p bio_add_page arg1=$arg1' > ./kprobe_events
> >
> > Secondly, we can get the value dynamically based on above object.
> >
> > find the offset of the bi_size in struct bio:
> > $ gdb vmlinux
> > (gdb) p &(((struct bio *)0)->bi_iter.bi_size)
> > $1 = (unsigned int *) 0x28
> >
> > # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/ \
> >       p_bio_add_page_0/trigger
> >
> > # cd /sys/kernel/debug/tracing/
> > # echo 'p bio_add_page arg1=$arg1' > ./kprobe_events
> > # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger
>
> So I was working to change the above to:
>
>   echo 'objtrace:add:+0x28($arg1):u32:1 if comm == "cat"'
>
>
>
> >
> > # du -sh /test.txt
> > 12.0K   /test.txt
> >
> > # cat  /test.txt > /dev/null
> > # cat ./trace
> > # tracer: nop
> > #
> > # entries-in-buffer/entries-written: 128/128   #P:4
> > #
> > #                                _-----=> irqs-off/BH-disabled
> > #                               / _----=> need-resched
> > #                              | / _---=> hardirq/softirq
> > #                              || / _--=> preempt-depth
> > #                              ||| / _-=> migrate-disable
> > #                              |||| /     delay
> > #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> > #              | |         |   |||||     |         |
> >              cat-117     [002] ...1.     1.602243: __bio_try_merge_page <-bio_add_page object:0xffff88811bee4000 value:0x0
> >              cat-117     [002] ...1.     1.602244: __bio_add_page <-bio_add_page object:0xffff88811bee4000 value:0x0
> >              cat-117     [002] ...2.     1.602244: bio_add_page <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x1000
> >              cat-117     [002] ...1.     1.602245: __bio_try_merge_page <-bio_add_page object:0xffff88811bee4000 value:0x1000
> >              cat-117     [002] ...1.     1.602245: __bio_add_page <-bio_add_page object:0xffff88811bee4000 value:0x1000
>
> Now instead of searching the function arguments for "obj", we should just
> read the obj value at every function and report when it changed, as well as
> the last function it was still the same. The report will be like the
> "func_repeats" entry. And instead we can print when the object changes,
> something like this:
>
>               cat-117     [002] ...2.     1.602245: bio_add_page <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x2000 (last value:0x1000 at [002] __bio_add_page <-bio_add_page ts: 1.602245)


I'm just curious if we'll see this rewritten version in the next merge
window  ;-)


> This will tell us when a value changed.
>
> It would need to store the CPU, ip and pip and timestamp of the last time
> it was the same. And when it changes, it would then report it.
>
> -- Steve



-- 
Thanks,
JeffXie

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object
  2022-11-29 16:09   ` Jeff Xie
@ 2022-11-29 16:30     ` Steven Rostedt
  2022-12-17  8:27       ` Jeff Xie
  0 siblings, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2022-11-29 16:30 UTC (permalink / raw)
  To: Jeff Xie; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000

On Wed, 30 Nov 2022 00:09:51 +0800
Jeff Xie <xiehuan09@gmail.com> wrote:

> > Now instead of searching the function arguments for "obj", we should just
> > read the obj value at every function and report when it changed, as well as
> > the last function it was still the same. The report will be like the
> > "func_repeats" entry. And instead we can print when the object changes,
> > something like this:
> >
> >               cat-117     [002] ...2.     1.602245: bio_add_page <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x2000 (last value:0x1000 at [002] __bio_add_page <-bio_add_page ts: 1.602245)  
> 
> 
> I'm just curious if we'll see this rewritten version in the next merge
> window  ;-)

Unfortunately, this ended up getting dropped in priority, as my workload
increased internally. :-/

-- Steve

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object
  2022-11-29 16:30     ` Steven Rostedt
@ 2022-12-17  8:27       ` Jeff Xie
  2023-01-31  9:55         ` Jeff Xie
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff Xie @ 2022-12-17  8:27 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000

On Wed, Nov 30, 2022 at 12:30 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 30 Nov 2022 00:09:51 +0800
> Jeff Xie <xiehuan09@gmail.com> wrote:
>
> > > Now instead of searching the function arguments for "obj", we should just
> > > read the obj value at every function and report when it changed, as well as
> > > the last function it was still the same. The report will be like the
> > > "func_repeats" entry. And instead we can print when the object changes,
> > > something like this:
> > >
> > >               cat-117     [002] ...2.     1.602245: bio_add_page <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x2000 (last value:0x1000 at [002] __bio_add_page <-bio_add_page ts: 1.602245)
> >
> >
> > I'm just curious if we'll see this rewritten version in the next merge
> > window  ;-)
>
> Unfortunately, this ended up getting dropped in priority, as my workload
> increased internally. :-/

Hope you have multiple trips to London and more time on the plane to
get this done  ;-)

>
> -- Steve


-- 
Thanks,
JeffXie

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object
  2022-12-17  8:27       ` Jeff Xie
@ 2023-01-31  9:55         ` Jeff Xie
  2023-01-31 15:36           ` Steven Rostedt
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff Xie @ 2023-01-31  9:55 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000

 Hi Steve,

On Sat, Dec 17, 2022 at 4:27 PM Jeff Xie <xiehuan09@gmail.com> wrote:
>
> On Wed, Nov 30, 2022 at 12:30 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Wed, 30 Nov 2022 00:09:51 +0800
> > Jeff Xie <xiehuan09@gmail.com> wrote:
> >
> > > > Now instead of searching the function arguments for "obj", we should just
> > > > read the obj value at every function and report when it changed, as well as
> > > > the last function it was still the same. The report will be like the
> > > > "func_repeats" entry. And instead we can print when the object changes,
> > > > something like this:
> > > >
> > > >               cat-117     [002] ...2.     1.602245: bio_add_page <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x2000 (last value:0x1000 at [002] __bio_add_page <-bio_add_page ts: 1.602245)
> > >
> > >
> > > I'm just curious if we'll see this rewritten version in the next merge
> > > window  ;-)
> >
> > Unfortunately, this ended up getting dropped in priority, as my workload
> > increased internally. :-/
>

I would like to ask if there is any hope that this patchset can be
merged into the linux kernel in the end.

We spent a lot of time on this patchset, I don't want to give up so easily ;-)

-- 
Thanks,
JeffXie

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object
  2023-01-31  9:55         ` Jeff Xie
@ 2023-01-31 15:36           ` Steven Rostedt
  2023-01-31 15:48             ` Jeff Xie
  0 siblings, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2023-01-31 15:36 UTC (permalink / raw)
  To: Jeff Xie; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000

On Tue, 31 Jan 2023 17:55:16 +0800
Jeff Xie <xiehuan09@gmail.com> wrote:

>  Hi Steve,
> 
> On Sat, Dec 17, 2022 at 4:27 PM Jeff Xie <xiehuan09@gmail.com> wrote:
> >
> > On Wed, Nov 30, 2022 at 12:30 AM Steven Rostedt <rostedt@goodmis.org> wrote:  
> > >
> > > On Wed, 30 Nov 2022 00:09:51 +0800
> > > Jeff Xie <xiehuan09@gmail.com> wrote:
> > >  
> > > > > Now instead of searching the function arguments for "obj", we should just
> > > > > read the obj value at every function and report when it changed, as well as
> > > > > the last function it was still the same. The report will be like the
> > > > > "func_repeats" entry. And instead we can print when the object changes,
> > > > > something like this:
> > > > >
> > > > >               cat-117     [002] ...2.     1.602245: bio_add_page <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x2000 (last value:0x1000 at [002] __bio_add_page <-bio_add_page ts: 1.602245)  
> > > >
> > > >
> > > > I'm just curious if we'll see this rewritten version in the next merge
> > > > window  ;-)  
> > >
> > > Unfortunately, this ended up getting dropped in priority, as my workload
> > > increased internally. :-/  
> >  
> 
> I would like to ask if there is any hope that this patchset can be
> merged into the linux kernel in the end.
> 
> We spent a lot of time on this patchset, I don't want to give up so easily ;-)
> 

I think we were near the end, but it wasn't quite done, as I want it to be
integrated with the kprobes indirection interface:

  https://lore.kernel.org/all/20220924160136.5029e942@rorschach.local.home/

Which I started to work on but didn't have time to finish, nor even get
very far with it.

I think once we can get that implemented, then this can get in. I don't
want to give up on it either.

-- Steve

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object
  2023-01-31 15:36           ` Steven Rostedt
@ 2023-01-31 15:48             ` Jeff Xie
  0 siblings, 0 replies; 21+ messages in thread
From: Jeff Xie @ 2023-01-31 15:48 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: mingo, mhiramat, zanussi, linux-kernel, chensong_2000

On Tue, Jan 31, 2023 at 11:36 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 31 Jan 2023 17:55:16 +0800
> Jeff Xie <xiehuan09@gmail.com> wrote:
>
> >  Hi Steve,
> >
> > On Sat, Dec 17, 2022 at 4:27 PM Jeff Xie <xiehuan09@gmail.com> wrote:
> > >
> > > On Wed, Nov 30, 2022 at 12:30 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> > > >
> > > > On Wed, 30 Nov 2022 00:09:51 +0800
> > > > Jeff Xie <xiehuan09@gmail.com> wrote:
> > > >
> > > > > > Now instead of searching the function arguments for "obj", we should just
> > > > > > read the obj value at every function and report when it changed, as well as
> > > > > > the last function it was still the same. The report will be like the
> > > > > > "func_repeats" entry. And instead we can print when the object changes,
> > > > > > something like this:
> > > > > >
> > > > > >               cat-117     [002] ...2.     1.602245: bio_add_page <-ext4_mpage_readpages object:0xffff88811bee4000 value:0x2000 (last value:0x1000 at [002] __bio_add_page <-bio_add_page ts: 1.602245)
> > > > >
> > > > >
> > > > > I'm just curious if we'll see this rewritten version in the next merge
> > > > > window  ;-)
> > > >
> > > > Unfortunately, this ended up getting dropped in priority, as my workload
> > > > increased internally. :-/
> > >
> >
> > I would like to ask if there is any hope that this patchset can be
> > merged into the linux kernel in the end.
> >
> > We spent a lot of time on this patchset, I don't want to give up so easily ;-)
> >
>
> I think we were near the end, but it wasn't quite done, as I want it to be
> integrated with the kprobes indirection interface:
>
>   https://lore.kernel.org/all/20220924160136.5029e942@rorschach.local.home/
>
> Which I started to work on but didn't have time to finish, nor even get
> very far with it.
>
> I think once we can get that implemented, then this can get in. I don't
> want to give up on it either.

Thank you for your positive and fast feedback.

>
> -- Steve


-- 
Thanks,
JeffXie

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-01-31 15:48 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-19  3:27 [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
2022-08-19  3:27 ` [PATCH v15 1/4] trace: Add trace any " Jeff Xie
2022-10-17 19:58   ` Steven Rostedt
2022-10-18  8:35     ` Jeff Xie
2022-08-19  3:27 ` [PATCH v15 2/4] trace/objtrace: Get the value of the object Jeff Xie
2022-09-24 20:01   ` Steven Rostedt
2022-09-25  5:27     ` Jeff Xie
2022-09-25 15:53       ` Steven Rostedt
2022-09-25 16:49         ` Jeff Xie
2022-09-28 14:15           ` Jeff Xie
2022-09-28 14:40             ` Steven Rostedt
2022-08-19  3:27 ` [PATCH v15 3/4] selftests/ftrace: Add testcases for objtrace Jeff Xie
2022-08-19  3:27 ` [PATCH v15 4/4] Documentation: trace/objtrace: Add documentation " Jeff Xie
2022-10-17 19:40 ` [PATCH v15 0/4] trace: Introduce objtrace trigger to trace the kernel object Steven Rostedt
2022-10-18  7:01   ` Jeff Xie
2022-11-29 16:09   ` Jeff Xie
2022-11-29 16:30     ` Steven Rostedt
2022-12-17  8:27       ` Jeff Xie
2023-01-31  9:55         ` Jeff Xie
2023-01-31 15:36           ` Steven Rostedt
2023-01-31 15:48             ` Jeff Xie

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.