* [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object
@ 2021-11-29 16:49 Jeff Xie
  2021-11-29 16:49 ` [RFC][PATCH v6 1/5] trace: Add trace any " Jeff Xie
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: Jeff Xie @ 2021-11-29 16:49 UTC (permalink / raw)
  To: rostedt; +Cc: mhiramat, mingo, zanussi, linux-kernel, Jeff Xie

Introduce a method based on the function tracer to trace any object and get
its value dynamically. The object can be obtained from a dynamic event
(kprobe_event/uprobe_event) or a static event (tracepoint).

Usage:
When using a kprobe event, we only need to set the objtrace trigger (a new
trigger type) to get the value of the object. The object itself comes from
the definition of the kprobe event.
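
The general form of the trigger (see the README text added in patch 3/5) is:

  objtrace:add:obj[,offset][:type][:count] [if <filter>]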

For example:
For the function bio_add_page():

int bio_add_page(struct bio *bio, struct page *page,
	unsigned int len, unsigned int offset)

First, we set the base of the object; the string "arg1" stands for the
value of the first parameter of the function bio_add_page():

# echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events

Second, we can fetch the value dynamically based on the object above.

Find the offset of bi_size in struct bio:
$ gdb vmlinux
(gdb) p &(((struct bio *)0)->bi_iter.bi_size)
$1 = (unsigned int *) 0x28
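
If pahole is available, the same layout can also be read without gdb
(assuming a vmlinux built with debug info); for the nested member
bi_iter.bi_size, add the bi_size offset from 'pahole -C bvec_iter' to the
bi_iter offset shown here:

$ pahole -C bio vmlinux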

# echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/ \
	p_bio_add_page_0/trigger
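
Here "arg1,0x28" names the object as the address in arg1 plus the 0x28
offset of bi_iter.bi_size found above, "u32" is the fetch type, and the
trailing "1" is a count limiting how many times the trigger fires.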

The best way to use this is to bracket the trace with an entry event and
an exit event. The following example sets read_pages as the entry event
and __blk_account_io_start as the exit event.

# cd /sys/kernel/debug/tracing/
# echo 0 > ./tracing_on
# echo 'p read_pages' >> ./kprobe_events
# echo 'p __blk_account_io_start' >> ./kprobe_events
# echo 'traceon if comm == "cat"' > ./events/kprobes/p_read_pages_0/trigger
# echo 'traceoff if comm == "cat"' > ./events/kprobes/p___blk_account_io_start_0/trigger
# echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
# echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger

# du -sh /test.txt
12.0K   /test.txt

# cat  /test.txt > /dev/null
# cat ./trace

# tracer: nop
#
# entries-in-buffer/entries-written: 50/50   #P:1
#
#                                _-----=> irqs-off
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
             cat-95      [000] .....     1.412065: _raw_spin_unlock_irqrestore <-event_triggers_call object:0xffff888108af6328 value:0x0
             cat-95      [000] .....     1.412066: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x0
             cat-95      [000] .....     1.412066: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x0
             cat-95      [000] .....     1.412066: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x1000
             cat-95      [000] .....     1.412066: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x1000
             cat-95      [000] .....     1.412066: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x1000
             cat-95      [000] .....     1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x1000
             cat-95      [000] .....     1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x1000
             cat-95      [000] .....     1.412067: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x1000
             cat-95      [000] .....     1.412067: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x2000
             cat-95      [000] .....     1.412067: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x2000
             cat-95      [000] .....     1.412067: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x2000
             cat-95      [000] .....     1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x2000
             cat-95      [000] .....     1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x2000
             cat-95      [000] .....     1.412068: submit_bio <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412068: submit_bio_noacct <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412068: __submit_bio <-submit_bio_noacct object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412068: blk_try_enter_queue <-__submit_bio object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412068: submit_bio_checks <-__submit_bio object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412068: __cond_resched <-submit_bio_checks object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412068: rcu_all_qs <-__cond_resched object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412068: should_fail_bio <-submit_bio_checks object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412069: create_task_io_context <-submit_bio_checks object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412069: kmem_cache_alloc_node <-create_task_io_context object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412069: should_failslab <-kmem_cache_alloc_node object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412069: _raw_spin_lock <-create_task_io_context object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412069: blk_mq_submit_bio <-__submit_bio object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412069: __blk_queue_split <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412069: bvec_split_segs <-__blk_queue_split object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412069: blk_attempt_plug_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412070: __blk_mq_sched_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412070: dd_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412070: _raw_spin_lock <-dd_bio_merge object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412070: blk_mq_sched_try_merge <-dd_bio_merge object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412070: elv_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412070: elv_rqhash_find <-elv_merge object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412070: dd_request_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412070: elv_rb_find <-dd_request_merge object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412070: __blk_mq_alloc_request <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412071: dd_limit_depth <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412071: blk_mq_get_tag <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412071: __blk_mq_get_tag <-blk_mq_get_tag object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412071: blk_mq_rq_ctx_init.isra.0 <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412071: ktime_get <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412071: dd_prepare_request <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412071: __blk_account_io_start <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412071: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x3000
             cat-95      [000] .....     1.412071: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x3000
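
The value traced here is bio->bi_iter.bi_size, which grows by one page
(0x1000) as pages are added to the bio. To remove the objtrace trigger and
its kprobe event afterwards (the reset sequence used by the selftest in
patch 4/5):

# echo '!objtrace:add:arg1,0x28:u32' > ./events/kprobes/p_bio_add_page_0/trigger
# echo '-:p_bio_add_page_0' >> ./kprobe_events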

Jeff Xie (5):
  trace: Add trace any kernel object
  trace/objtrace: get the value of the object
  trace/README: Document objtrace trigger syntax
  trace/objtrace: Add testcases for objtrace
  trace/objtrace: Add documentation for objtrace

 Documentation/trace/events.rst                | 114 +++
 include/linux/trace_events.h                  |   1 +
 kernel/trace/Kconfig                          |  10 +
 kernel/trace/Makefile                         |   1 +
 kernel/trace/trace.c                          |   3 +
 kernel/trace/trace.h                          |   8 +
 kernel/trace/trace_entries.h                  |  18 +
 kernel/trace/trace_events_trigger.c           |   1 +
 kernel/trace/trace_object.c                   | 660 ++++++++++++++++++
 kernel/trace/trace_output.c                   |  40 ++
 .../ftrace/test.d/trigger/trigger-objtrace.tc |  39 ++
 11 files changed, 895 insertions(+)
 create mode 100644 kernel/trace/trace_object.c
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc

-- 
2.25.1



* [RFC][PATCH v6 1/5] trace: Add trace any kernel object
  2021-11-29 16:49 [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
@ 2021-11-29 16:49 ` Jeff Xie
  2021-11-29 23:05     ` kernel test robot
  2021-12-17  4:51   ` Masami Hiramatsu
  2021-11-29 16:49 ` [RFC][PATCH v6 2/5] trace/objtrace: get the value of the object Jeff Xie
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: Jeff Xie @ 2021-11-29 16:49 UTC (permalink / raw)
  To: rostedt; +Cc: mhiramat, mingo, zanussi, linux-kernel, Jeff Xie

Introduce the objtrace trigger to get the execution flow by tracing any
kernel object passed in a kernel function parameter.
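
A minimal invocation with the syntax added by this patch (the offset and
type fields arrive in patch 2/5) could look like:

# echo 'objtrace:add:arg1:1 if comm == "cat"' > events/kprobes/p_bio_add_page_0/trigger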

Signed-off-by: Jeff Xie <xiehuan09@gmail.com>
---
 include/linux/trace_events.h        |   1 +
 kernel/trace/Kconfig                |  10 +
 kernel/trace/Makefile               |   1 +
 kernel/trace/trace.h                |   8 +
 kernel/trace/trace_entries.h        |  17 +
 kernel/trace/trace_events_trigger.c |   1 +
 kernel/trace/trace_object.c         | 534 ++++++++++++++++++++++++++++
 kernel/trace/trace_output.c         |  40 +++
 8 files changed, 612 insertions(+)
 create mode 100644 kernel/trace/trace_object.c

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 2d167ac3452c..54baec78ce62 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -684,6 +684,7 @@ enum event_trigger_type {
 	ETT_EVENT_HIST		= (1 << 4),
 	ETT_HIST_ENABLE		= (1 << 5),
 	ETT_EVENT_EPROBE	= (1 << 6),
+	ETT_TRACE_OBJECT	= (1 << 7),
 };
 
 extern int filter_match_preds(struct event_filter *filter, void *rec);
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 420ff4bc67fd..f69761cbfaaa 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -237,6 +237,16 @@ config FUNCTION_PROFILER
 
 	  If in doubt, say N.
 
+config TRACE_OBJECT
+	bool "Trace kernel object in function parameter"
+	depends on FUNCTION_TRACER
+	depends on HAVE_FUNCTION_ARG_ACCESS_API
+	select TRACING
+	default y
+	help
+	 You can trace the kernel object in the kernel function parameter.
+	 The kernel object is dynamically specified via event trigger.
+
 config STACK_TRACER
 	bool "Trace max stack"
 	depends on HAVE_FUNCTION_TRACER
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index bedc5caceec7..b924b8e55922 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -67,6 +67,7 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
 obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
 obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += fgraph.o
+obj-$(CONFIG_TRACE_OBJECT) += trace_object.o
 ifeq ($(CONFIG_BLOCK),y)
 obj-$(CONFIG_EVENT_TRACING) += blktrace.o
 endif
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 38715aa6cfdf..46a1301ac47a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -54,6 +54,7 @@ enum trace_type {
 	TRACE_TIMERLAT,
 	TRACE_RAW_DATA,
 	TRACE_FUNC_REPEATS,
+	TRACE_OBJECT,
 
 	__TRACE_LAST_TYPE,
 };
@@ -469,6 +470,7 @@ extern void __ftrace_bad_type(void);
 			  TRACE_GRAPH_RET);		\
 		IF_ASSIGN(var, ent, struct func_repeats_entry,		\
 			  TRACE_FUNC_REPEATS);				\
+		IF_ASSIGN(var, ent, struct trace_object_entry, TRACE_OBJECT);\
 		__ftrace_bad_type();					\
 	} while (0)
 
@@ -1531,6 +1533,12 @@ static inline int register_trigger_hist_cmd(void) { return 0; }
 static inline int register_trigger_hist_enable_disable_cmds(void) { return 0; }
 #endif
 
+#ifdef CONFIG_TRACE_OBJECT
+extern int register_trigger_object_cmd(void);
+#else
+static inline int register_trigger_object_cmd(void) { return 0; }
+#endif
+
 extern int register_trigger_cmds(void);
 extern void clear_event_triggers(struct trace_array *tr);
 
diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index cd41e863b51c..bb120d9498a9 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -401,3 +401,20 @@ FTRACE_ENTRY(timerlat, timerlat_entry,
 		 __entry->context,
 		 __entry->timer_latency)
 );
+
+/*
+ * trace object entry:
+ */
+FTRACE_ENTRY(object, trace_object_entry,
+
+	TRACE_OBJECT,
+
+	F_STRUCT(
+		__field(	unsigned long,		ip		)
+		__field(	unsigned long,		parent_ip	)
+		__field(	unsigned long,		object		)
+	),
+
+	F_printk(" %ps <-- %ps object:%lx\n",
+		 (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
+);
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 3d5c07239a2a..da2d871fa809 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -1687,6 +1687,7 @@ __init int register_trigger_cmds(void)
 	register_trigger_enable_disable_cmds();
 	register_trigger_hist_enable_disable_cmds();
 	register_trigger_hist_cmd();
+	register_trigger_object_cmd();
 
 	return 0;
 }
diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
new file mode 100644
index 000000000000..83fccd587b63
--- /dev/null
+++ b/kernel/trace/trace_object.c
@@ -0,0 +1,534 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * trace the kernel object in the kernel function parameter
+ * Copyright (C) 2021 Jeff Xie <xiehuan09@gmail.com>
+ */
+
+#define pr_fmt(fmt) "trace_object: " fmt
+
+#include "trace_output.h"
+#include <linux/freelist.h>
+
+static DEFINE_PER_CPU(atomic_t, trace_object_event_disable);
+static DEFINE_RAW_SPINLOCK(object_spin_lock);
+static struct trace_event_file event_trace_file;
+static const int max_args_num = 6;
+static const int max_obj_pool = 10;
+static atomic_t trace_object_ref;
+static int exit_trace_object(void);
+static int init_trace_object(void);
+
+struct object_instance {
+	void *object;
+	struct freelist_node free_list;
+	struct list_head active_list;
+};
+
+struct obj_pool {
+	struct freelist_head free_list;
+	struct list_head active_list;
+};
+static struct obj_pool *obj_pool;
+
+static bool object_exist(void *obj)
+{
+	struct object_instance *inst;
+	bool ret = false;
+
+	list_for_each_entry_rcu(inst, &obj_pool->active_list, active_list) {
+		if (inst->object == obj) {
+			ret = true;
+			goto out;
+		}
+	}
+out:
+	return ret;
+}
+
+static bool object_empty(void)
+{
+	return list_empty(&obj_pool->active_list);
+}
+
+static void set_trace_object(void *obj)
+{
+	struct freelist_node *fn;
+	struct object_instance *ins;
+	unsigned long flags;
+
+	if (in_nmi())
+		return;
+
+	if (!obj)
+		return;
+
+	if (object_exist(obj))
+		return;
+
+	fn = freelist_try_get(&obj_pool->free_list);
+	if (!fn) {
+		trace_printk("object_pool is full, can't trace object:0x%px\n", obj);
+		return;
+	}
+
+	ins = container_of(fn, struct object_instance, free_list);
+	ins->object = obj;
+
+	raw_spin_lock_irqsave(&object_spin_lock, flags);
+	list_add_rcu(&ins->active_list, &obj_pool->active_list);
+	raw_spin_unlock_irqrestore(&object_spin_lock, flags);
+}
+
+static inline void free_free_list_objects(struct freelist_head *head)
+{
+
+	struct object_instance *inst;
+	struct freelist_node *node = head->head;
+
+	while (node) {
+		inst = container_of(node, struct object_instance, free_list);
+		node = node->next;
+		kfree(inst);
+	}
+}
+
+static inline void free_active_list_objects(struct list_head *head)
+{
+	struct object_instance *inst;
+
+	list_for_each_entry_rcu(inst, head, active_list)
+		kfree(inst);
+}
+
+static inline void free_object_pool(void)
+{
+	free_free_list_objects(&obj_pool->free_list);
+	free_active_list_objects(&obj_pool->active_list);
+	kfree(obj_pool);
+}
+
+
+static int init_object_pool(void)
+{
+	struct object_instance *inst;
+	int i, ret = 0;
+
+	obj_pool = kzalloc(sizeof(*obj_pool), GFP_KERNEL);
+	if (!obj_pool) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	INIT_LIST_HEAD(&obj_pool->active_list);
+
+	for (i = 0; i < max_obj_pool; i++) {
+		inst = kzalloc(sizeof(*inst), GFP_KERNEL);
+		if (!inst) {
+			free_object_pool();
+			ret = -ENOMEM;
+			goto out;
+		}
+		freelist_add(&inst->free_list, &obj_pool->free_list);
+	}
+out:
+	return ret;
+}
+
+static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
+				 unsigned long object)
+{
+
+	struct trace_buffer *buffer;
+	struct ring_buffer_event *event;
+	struct trace_object_entry *entry;
+	int pc;
+
+	pc = preempt_count();
+	event = trace_event_buffer_lock_reserve(&buffer, &event_trace_file,
+			TRACE_OBJECT, sizeof(*entry), pc);
+	if (!event)
+		return;
+	entry   = ring_buffer_event_data(event);
+	entry->ip                       = ip;
+	entry->parent_ip                = parent_ip;
+	entry->object			= object;
+
+	event_trigger_unlock_commit(&event_trace_file, buffer, event,
+		entry, pc);
+}
+
+static void
+trace_object_events_call(unsigned long ip, unsigned long parent_ip,
+		struct ftrace_ops *op, struct ftrace_regs *fregs)
+{
+	struct pt_regs *pt_regs = ftrace_get_regs(fregs);
+	unsigned long obj;
+	long disabled;
+	int cpu, n;
+
+	preempt_disable_notrace();
+
+	cpu = raw_smp_processor_id();
+	disabled = atomic_inc_return(&per_cpu(trace_object_event_disable, cpu));
+
+	if (disabled != 1)
+		goto out;
+
+	if (object_empty())
+		goto out;
+
+	for (n = 0; n < max_args_num; n++) {
+		obj = regs_get_kernel_argument(pt_regs, n);
+		if (object_exist((void *)obj))
+			submit_trace_object(ip, parent_ip, obj);
+	}
+
+out:
+	atomic_dec(&per_cpu(trace_object_event_disable, cpu));
+	preempt_enable_notrace();
+}
+
+static struct ftrace_ops trace_ops = {
+	.func  = trace_object_events_call,
+	.flags = FTRACE_OPS_FL_SAVE_REGS,
+};
+
+static void
+trace_object_trigger(struct event_trigger_data *data,
+		   struct trace_buffer *buffer,  void *rec,
+		   struct ring_buffer_event *event)
+{
+
+	struct ftrace_event_field *field = data->private_data;
+	void *obj = NULL;
+
+	memcpy(&obj, rec + field->offset, sizeof(obj));
+	set_trace_object(obj);
+}
+
+static void
+trace_object_trigger_free(struct event_trigger_ops *ops,
+		   struct event_trigger_data *data)
+{
+	if (WARN_ON_ONCE(data->ref <= 0))
+		return;
+
+	data->ref--;
+	if (!data->ref)
+		trigger_data_free(data);
+}
+
+static void
+trace_object_count_trigger(struct event_trigger_data *data,
+			 struct trace_buffer *buffer, void *rec,
+			 struct ring_buffer_event *event)
+{
+	if (!data->count)
+		return;
+
+	if (data->count != -1)
+		(data->count)--;
+
+	trace_object_trigger(data, buffer, rec, event);
+}
+
+static int event_object_trigger_init(struct event_trigger_ops *ops,
+		       struct event_trigger_data *data)
+{
+	data->ref++;
+	return 0;
+}
+
+static int
+event_trigger_print(const char *name, struct seq_file *m,
+		    void *data, char *filter_str)
+{
+	long count = (long)data;
+
+	seq_puts(m, name);
+
+	if (count == -1)
+		seq_puts(m, ":unlimited");
+	else
+		seq_printf(m, ":count=%ld", count);
+
+	if (filter_str)
+		seq_printf(m, " if %s\n", filter_str);
+	else
+		seq_putc(m, '\n');
+
+	return 0;
+}
+
+static int
+trace_object_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
+			 struct event_trigger_data *data)
+{
+	return event_trigger_print("objtrace", m, (void *)data->count,
+				   data->filter_str);
+}
+
+static struct event_trigger_ops objecttrace_trigger_ops = {
+	.func			= trace_object_trigger,
+	.print			= trace_object_trigger_print,
+	.init			= event_object_trigger_init,
+	.free			= trace_object_trigger_free,
+};
+
+static struct event_trigger_ops objecttrace_count_trigger_ops = {
+	.func			= trace_object_count_trigger,
+	.print			= trace_object_trigger_print,
+	.init			= event_object_trigger_init,
+	.free			= trace_object_trigger_free,
+};
+
+static struct event_trigger_ops *
+objecttrace_get_trigger_ops(char *cmd, char *param)
+{
+	return param ? &objecttrace_count_trigger_ops : &objecttrace_trigger_ops;
+}
+
+static int register_object_trigger(char *glob, struct event_trigger_ops *ops,
+			    struct event_trigger_data *data,
+			    struct trace_event_file *file)
+{
+	struct event_trigger_data *test;
+	int ret = 0;
+
+	lockdep_assert_held(&event_mutex);
+
+	list_for_each_entry(test, &file->triggers, list) {
+		if (test->cmd_ops->trigger_type == data->cmd_ops->trigger_type) {
+			ret = -EEXIST;
+			goto out;
+		}
+	}
+
+	if (data->ops->init) {
+		ret = data->ops->init(data->ops, data);
+		if (ret < 0)
+			goto out;
+	}
+
+	list_add_rcu(&data->list, &file->triggers);
+	ret++;
+
+	update_cond_flag(file);
+	if (trace_event_trigger_enable_disable(file, 1) < 0) {
+		list_del_rcu(&data->list);
+		update_cond_flag(file);
+		ret--;
+	}
+	init_trace_object();
+out:
+	return ret;
+}
+
+static void unregister_object_trigger(char *glob, struct event_trigger_ops *ops,
+			       struct event_trigger_data *test,
+			       struct trace_event_file *file)
+{
+	struct event_trigger_data *data;
+	bool unregistered = false;
+
+	lockdep_assert_held(&event_mutex);
+
+	list_for_each_entry(data, &file->triggers, list) {
+		if (data->cmd_ops->trigger_type == test->cmd_ops->trigger_type) {
+			unregistered = true;
+			list_del_rcu(&data->list);
+			trace_event_trigger_enable_disable(file, 0);
+			update_cond_flag(file);
+			break;
+		}
+	}
+
+	if (unregistered && data->ops->free) {
+		data->ops->free(data->ops, data);
+		exit_trace_object();
+	}
+}
+
+static int
+event_object_trigger_callback(struct event_command *cmd_ops,
+		       struct trace_event_file *file,
+		       char *glob, char *cmd, char *param)
+{
+	struct event_trigger_data *trigger_data;
+	struct event_trigger_ops *trigger_ops;
+	struct trace_event_call *call;
+	struct ftrace_event_field *field;
+	char *objtrace_cmd;
+	char *trigger = NULL;
+	char *arg;
+	char *number;
+	int ret;
+
+	ret = -EINVAL;
+	if (!param)
+		goto out;
+
+	/* separate the trigger from the filter (c:a:n [if filter]) */
+	trigger = strsep(&param, " \t");
+	if (!trigger)
+		goto out;
+	if (param) {
+		param = skip_spaces(param);
+		if (!*param)
+			param = NULL;
+	}
+
+	objtrace_cmd = strsep(&trigger, ":");
+	if (!objtrace_cmd || strcmp(objtrace_cmd, "add"))
+		goto out;
+
+	arg = strsep(&trigger, ":");
+	if (!arg)
+		goto out;
+	call = file->event_call;
+	field = trace_find_event_field(call, arg);
+	if (!field)
+		goto out;
+
+	if (field->size != sizeof(void *))
+		goto out;
+
+	trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
+
+	ret = -ENOMEM;
+	trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
+	if (!trigger_data)
+		goto out;
+
+	trigger_data->count = -1;
+	trigger_data->ops = trigger_ops;
+	trigger_data->cmd_ops = cmd_ops;
+	trigger_data->private_data = field;
+	INIT_LIST_HEAD(&trigger_data->list);
+	INIT_LIST_HEAD(&trigger_data->named_list);
+
+	if (glob[0] == '!') {
+		cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
+		kfree(trigger_data);
+		ret = 0;
+		goto out;
+	}
+
+	if (trigger) {
+		number = strsep(&trigger, ":");
+
+		ret = -EINVAL;
+		if (!strlen(number))
+			goto out_free;
+
+		/*
+		 * We use the callback data field (which is a pointer)
+		 * as our counter.
+		 */
+		ret = kstrtoul(number, 0, &trigger_data->count);
+		if (ret)
+			goto out_free;
+	}
+
+	if (!param) /* if param is non-empty, it's supposed to be a filter */
+		goto out_reg;
+
+	if (!cmd_ops->set_filter)
+		goto out_reg;
+
+	ret = cmd_ops->set_filter(param, trigger_data, file);
+	if (ret < 0)
+		goto out_free;
+
+ out_reg:
+	/* Up the trigger_data count to make sure reg doesn't free it on failure */
+	event_object_trigger_init(trigger_ops, trigger_data);
+	ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
+	/*
+	 * The above returns on success the # of functions enabled,
+	 * but if it didn't find any functions it returns zero.
+	 * Consider no functions a failure too.
+	 */
+	if (!ret) {
+		cmd_ops->unreg(glob, trigger_ops, trigger_data, file);
+		ret = -ENOENT;
+	} else if (ret > 0)
+		ret = 0;
+
+	/* Down the counter of trigger_data or free it if not used anymore */
+	trace_object_trigger_free(trigger_ops, trigger_data);
+ out:
+	return ret;
+
+ out_free:
+	if (cmd_ops->set_filter)
+		cmd_ops->set_filter(NULL, trigger_data, NULL);
+	kfree(trigger_data);
+	goto out;
+}
+
+static struct event_command trigger_object_cmd = {
+	.name			= "objtrace",
+	.trigger_type		= ETT_TRACE_OBJECT,
+	.flags			= EVENT_CMD_FL_NEEDS_REC,
+	.func			= event_object_trigger_callback,
+	.reg			= register_object_trigger,
+	.unreg			= unregister_object_trigger,
+	.get_trigger_ops	= objecttrace_get_trigger_ops,
+	.set_filter		= set_trigger_filter,
+};
+
+__init int register_trigger_object_cmd(void)
+{
+	int ret;
+
+	ret = register_event_command(&trigger_object_cmd);
+	WARN_ON(ret < 0);
+
+	return ret;
+}
+
+static int init_trace_object(void)
+{
+	int ret;
+
+	if (atomic_inc_return(&trace_object_ref) != 1) {
+		ret = 0;
+		goto out;
+	}
+
+	ret = init_object_pool();
+	if (ret)
+		goto out;
+
+	event_trace_file.tr = top_trace_array();
+	if (WARN_ON(!event_trace_file.tr)) {
+		ret = -1;
+		goto out;
+	}
+	ret = register_ftrace_function(&trace_ops);
+out:
+	return ret;
+}
+
+static int exit_trace_object(void)
+{
+	int ret;
+
+	if (WARN_ON_ONCE(atomic_read(&trace_object_ref) <= 0))
+		goto out;
+
+	if (atomic_dec_return(&trace_object_ref) != 0) {
+		ret = 0;
+		goto out;
+	}
+
+	ret = unregister_ftrace_function(&trace_ops);
+	if (ret) {
+		pr_err("can't unregister ftrace for trace object\n");
+		goto out;
+	}
+	free_object_pool();
+out:
+	return ret;
+}
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 3547e7176ff7..d747aed27104 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -1543,6 +1543,45 @@ static struct trace_event trace_func_repeats_event = {
 	.funcs		= &trace_func_repeats_funcs,
 };
 
+/* TRACE_OBJECT */
+static enum print_line_t trace_object_print(struct trace_iterator *iter, int flags,
+					struct trace_event *event)
+{
+	struct trace_object_entry *field;
+	struct trace_seq *s = &iter->seq;
+
+	trace_assign_type(field, iter->ent);
+	print_fn_trace(s, field->ip, field->parent_ip, flags);
+	trace_seq_printf(s, " object:0x%lx", field->object);
+	trace_seq_putc(s, '\n');
+
+	return trace_handle_return(s);
+}
+
+static enum print_line_t trace_object_raw(struct trace_iterator *iter, int flags,
+				      struct trace_event *event)
+{
+	struct trace_object_entry *field;
+
+	trace_assign_type(field, iter->ent);
+
+	trace_seq_printf(&iter->seq, "%lx %lx\n",
+			 field->ip,
+			 field->parent_ip);
+
+	return trace_handle_return(&iter->seq);
+}
+
+static struct trace_event_functions trace_object_funcs = {
+	.trace		= trace_object_print,
+	.raw		= trace_object_raw,
+};
+
+static struct trace_event trace_object_event = {
+	.type		= TRACE_OBJECT,
+	.funcs		= &trace_object_funcs,
+};
+
 static struct trace_event *events[] __initdata = {
 	&trace_fn_event,
 	&trace_ctx_event,
@@ -1557,6 +1596,7 @@ static struct trace_event *events[] __initdata = {
 	&trace_timerlat_event,
 	&trace_raw_data_event,
 	&trace_func_repeats_event,
+	&trace_object_event,
 	NULL
 };
 
-- 
2.25.1



* [RFC][PATCH v6 2/5] trace/objtrace: get the value of the object
  2021-11-29 16:49 [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
  2021-11-29 16:49 ` [RFC][PATCH v6 1/5] trace: Add trace any " Jeff Xie
@ 2021-11-29 16:49 ` Jeff Xie
  2021-11-29 16:49 ` [RFC][PATCH v6 3/5] trace/README: Document objtrace trigger syntax Jeff Xie
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Jeff Xie @ 2021-11-29 16:49 UTC (permalink / raw)
  To: rostedt; +Cc: mhiramat, mingo, zanussi, linux-kernel, Jeff Xie

Use the objtrace trigger to get the value of the object that comes from a
kernel function parameter.
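
With this patch the trigger accepts an optional offset and fetch type, for
example (reusing the bio_add_page kprobe from the cover letter):

# echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > events/kprobes/p_bio_add_page_0/trigger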

Signed-off-by: Jeff Xie <xiehuan09@gmail.com>
---
 kernel/trace/trace_entries.h |   5 +-
 kernel/trace/trace_object.c  | 184 +++++++++++++++++++++++++++++------
 kernel/trace/trace_output.c  |   6 +-
 3 files changed, 161 insertions(+), 34 deletions(-)

diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index bb120d9498a9..2407c45a568c 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -413,8 +413,9 @@ FTRACE_ENTRY(object, trace_object_entry,
 		__field(	unsigned long,		ip		)
 		__field(	unsigned long,		parent_ip	)
 		__field(	unsigned long,		object		)
+		__field(	unsigned long,		value		)
 	),
 
-	F_printk(" %ps <-- %ps object:%lx\n",
-		 (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
+	F_printk(" %ps <-- %ps object:%lx value:%lx\n", (void *)__entry->ip,
+	       (void *)__entry->parent_ip, __entry->object, __entry->value)
 );
diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
index 83fccd587b63..c66e53da6aff 100644
--- a/kernel/trace/trace_object.c
+++ b/kernel/trace/trace_object.c
@@ -12,14 +12,25 @@
 static DEFINE_PER_CPU(atomic_t, trace_object_event_disable);
 static DEFINE_RAW_SPINLOCK(object_spin_lock);
 static struct trace_event_file event_trace_file;
-static const int max_args_num = 6;
 static const int max_obj_pool = 10;
 static atomic_t trace_object_ref;
 static int exit_trace_object(void);
 static int init_trace_object(void);
 
+struct objtrace_trigger_data {
+	struct ftrace_event_field *field;
+	long offset;
+	int type_size;
+};
+
+struct objtrace_fetch_type {
+	char *name;
+	int type_size;
+};
+
 struct object_instance {
 	void *object;
+	int obj_type_size;
 	struct freelist_node free_list;
 	struct list_head active_list;
 };
@@ -50,7 +61,7 @@ static bool object_empty(void)
 	return list_empty(&obj_pool->active_list);
 }
 
-static void set_trace_object(void *obj)
+static void set_trace_object(void *obj, int type_size)
 {
 	struct freelist_node *fn;
 	struct object_instance *ins;
@@ -73,6 +84,7 @@ static void set_trace_object(void *obj)
 
 	ins = container_of(fn, struct object_instance, free_list);
 	ins->object = obj;
+	ins->obj_type_size = type_size;
 
 	raw_spin_lock_irqsave(&object_spin_lock, flags);
 	list_add_rcu(&ins->active_list, &obj_pool->active_list);
@@ -135,7 +147,7 @@ static int init_object_pool(void)
 }
 
 static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
-				 unsigned long object)
+				 unsigned long object, unsigned long value)
 {
 
 	struct trace_buffer *buffer;
@@ -152,19 +164,68 @@ static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
 	entry->ip                       = ip;
 	entry->parent_ip                = parent_ip;
 	entry->object			= object;
+	entry->value			= value;
 
 	event_trigger_unlock_commit(&event_trace_file, buffer, event,
 		entry, pc);
 }
 
+static inline long get_object_value(unsigned long *val, void *obj, int type_size)
+{
+	long ret = 0;
+
+	switch (type_size) {
+	case 1: {
+		u8 tmp;
+
+		ret = copy_from_kernel_nofault(&tmp, obj, sizeof(tmp));
+		if (ret)
+			goto out;
+		*val = tmp;
+		break;
+	}
+	case 2: {
+		u16 tmp;
+
+		ret = copy_from_kernel_nofault(&tmp, obj, sizeof(tmp));
+		if (ret)
+			goto out;
+		*val = tmp;
+		break;
+	}
+	case 4: {
+		u32 tmp;
+
+		ret = copy_from_kernel_nofault(&tmp, obj, sizeof(tmp));
+		if (ret)
+			goto out;
+		*val = tmp;
+		break;
+	}
+	case 8: {
+		u64 tmp;
+
+		ret = copy_from_kernel_nofault(&tmp, obj, sizeof(tmp));
+		if (ret)
+			goto out;
+		*val = tmp;
+		break;
+	}
+	default:
+		return -EINVAL;
+	}
+out:
+	return ret;
+}
+
 static void
 trace_object_events_call(unsigned long ip, unsigned long parent_ip,
 		struct ftrace_ops *op, struct ftrace_regs *fregs)
 {
-	struct pt_regs *pt_regs = ftrace_get_regs(fregs);
-	unsigned long obj;
+	struct object_instance *inst;
+	unsigned long val = 0;
 	long disabled;
-	int cpu, n;
+	int cpu;
 
 	preempt_disable_notrace();
 
@@ -177,12 +238,11 @@ trace_object_events_call(unsigned long ip, unsigned long parent_ip,
 	if (object_empty())
 		goto out;
 
-	for (n = 0; n < max_args_num; n++) {
-		obj = regs_get_kernel_argument(pt_regs, n);
-		if (object_exist((void *)obj))
-			submit_trace_object(ip, parent_ip, obj);
+	list_for_each_entry_rcu(inst, &obj_pool->active_list, active_list) {
+		if (get_object_value(&val, inst->object, inst->obj_type_size))
+			goto out;
+		submit_trace_object(ip, parent_ip, (unsigned long)inst->object, val);
 	}
-
 out:
 	atomic_dec(&per_cpu(trace_object_event_disable, cpu));
 	preempt_enable_notrace();
@@ -198,12 +258,14 @@ trace_object_trigger(struct event_trigger_data *data,
 		   struct trace_buffer *buffer,  void *rec,
 		   struct ring_buffer_event *event)
 {
+	struct objtrace_trigger_data *obj_data = data->private_data;
+	struct ftrace_event_field *field;
+	void *obj, *val = NULL;
 
-	struct ftrace_event_field *field = data->private_data;
-	void *obj = NULL;
-
-	memcpy(&obj, rec + field->offset, sizeof(obj));
-	set_trace_object(obj);
+	field = obj_data->field;
+	memcpy(&val, rec + field->offset, sizeof(val));
+	obj = val + obj_data->offset;
+	set_trace_object(obj, obj_data->type_size);
 }
 
 static void
@@ -349,6 +411,22 @@ static void unregister_object_trigger(char *glob, struct event_trigger_ops *ops,
 	}
 }
 
+static const struct objtrace_fetch_type objtrace_fetch_types[] = {
+	{"u8", 1},
+	{"s8", 1},
+	{"x8", 1},
+	{"u16", 2},
+	{"s16", 2},
+	{"x16", 2},
+	{"u32", 4},
+	{"s32", 4},
+	{"x32", 4},
+	{"u64", 8},
+	{"s64", 8},
+	{"x64", 8},
+	{}
+};
+
 static int
 event_object_trigger_callback(struct event_command *cmd_ops,
 		       struct trace_event_file *file,
@@ -356,19 +434,21 @@ event_object_trigger_callback(struct event_command *cmd_ops,
 {
 	struct event_trigger_data *trigger_data;
 	struct event_trigger_ops *trigger_ops;
+	struct objtrace_trigger_data *obj_data;
 	struct trace_event_call *call;
 	struct ftrace_event_field *field;
-	char *objtrace_cmd;
-	char *trigger = NULL;
-	char *arg;
-	char *number;
-	int ret;
+	char *type, *tr, *obj, *tmp, *trigger = NULL;
+	char *number, *objtrace_cmd;
+	int ret, i, def_type_size, type_size = 0;
+	long offset = 0;
 
 	ret = -EINVAL;
 	if (!param)
 		goto out;
-
-	/* separate the trigger from the filter (c:a:n [if filter]) */
+	/*
+	 * separate the trigger from the filter:
+	 * objtrace:add:OBJ[,OFFS][:TYPE][:COUNT] [if filter]
+	 */
 	trigger = strsep(&param, " \t");
 	if (!trigger)
 		goto out;
@@ -382,33 +462,78 @@ event_object_trigger_callback(struct event_command *cmd_ops,
 	if (!objtrace_cmd || strcmp(objtrace_cmd, "add"))
 		goto out;
 
-	arg = strsep(&trigger, ":");
-	if (!arg)
+	obj = strsep(&trigger, ":");
+	if (!obj)
 		goto out;
+
+	tr = strchr(obj, ',');
+	if (!tr)
+		offset = 0;
+	else {
+		*tr++ = '\0';
+		ret = kstrtol(tr, 0, &offset);
+		if (ret)
+			goto out;
+	}
+
+	ret = -EINVAL;
 	call = file->event_call;
-	field = trace_find_event_field(call, arg);
+	field = trace_find_event_field(call, obj);
 	if (!field)
 		goto out;
-
 	if (field->size != sizeof(void *))
 		goto out;
+	def_type_size = sizeof(void *);
+	if (!trigger) {
+		type_size = def_type_size;
+		goto skip_get_type;
+	}
 
+	tmp = trigger;
+	type = strsep(&trigger, ":");
+	if (!type)
+		type_size = def_type_size;
+	else if (isdigit(type[0])) {
+		type_size = def_type_size;
+		trigger = tmp;
+	} else {
+		for (i = 0; objtrace_fetch_types[i].name; i++) {
+			if (strcmp(objtrace_fetch_types[i].name, type) == 0) {
+				type_size = objtrace_fetch_types[i].type_size;
+				break;
+			}
+		}
+	}
+	if (!type_size)
+		goto out;
+skip_get_type:
 	trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
 
 	ret = -ENOMEM;
+	obj_data = kzalloc(sizeof(*obj_data), GFP_KERNEL);
+	if (!obj_data)
+		goto out;
+
+	obj_data->field = field;
+	obj_data->offset = offset;
+	obj_data->type_size = type_size;
+
 	trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
-	if (!trigger_data)
+	if (!trigger_data) {
+		kfree(obj_data);
 		goto out;
+	}
 
 	trigger_data->count = -1;
 	trigger_data->ops = trigger_ops;
 	trigger_data->cmd_ops = cmd_ops;
-	trigger_data->private_data = field;
+	trigger_data->private_data = obj_data;
 	INIT_LIST_HEAD(&trigger_data->list);
 	INIT_LIST_HEAD(&trigger_data->named_list);
 
 	if (glob[0] == '!') {
 		cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
+		kfree(obj_data);
 		kfree(trigger_data);
 		ret = 0;
 		goto out;
@@ -463,6 +588,7 @@ event_object_trigger_callback(struct event_command *cmd_ops,
  out_free:
 	if (cmd_ops->set_filter)
 		cmd_ops->set_filter(NULL, trigger_data, NULL);
+	kfree(obj_data);
 	kfree(trigger_data);
 	goto out;
 }
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index d747aed27104..12a971927d8c 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -1553,6 +1553,7 @@ static enum print_line_t trace_object_print(struct trace_iterator *iter, int fla
 	trace_assign_type(field, iter->ent);
 	print_fn_trace(s, field->ip, field->parent_ip, flags);
 	trace_seq_printf(s, " object:0x%lx", field->object);
+	trace_seq_printf(s, " value:0x%lx", field->value);
 	trace_seq_putc(s, '\n');
 
 	return trace_handle_return(s);
@@ -1565,9 +1566,8 @@ static enum print_line_t trace_object_raw(struct trace_iterator *iter, int flags
 
 	trace_assign_type(field, iter->ent);
 
-	trace_seq_printf(&iter->seq, "%lx %lx\n",
-			 field->ip,
-			 field->parent_ip);
+	trace_seq_printf(&iter->seq, "%lx %lx %lx %lx\n", field->ip,
+			field->parent_ip, field->object, field->value);
 
 	return trace_handle_return(&iter->seq);
 }
-- 
2.25.1



* [RFC][PATCH v6 3/5] trace/README: Document objtrace trigger syntax
  2021-11-29 16:49 [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
  2021-11-29 16:49 ` [RFC][PATCH v6 1/5] trace: Add trace any " Jeff Xie
  2021-11-29 16:49 ` [RFC][PATCH v6 2/5] trace/objtrace: get the value of the object Jeff Xie
@ 2021-11-29 16:49 ` Jeff Xie
  2021-12-16 15:02   ` Masami Hiramatsu
  2021-11-29 16:49 ` [RFC][PATCH v6 4/5] trace/objtrace: Add testcases for objtrace Jeff Xie
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Jeff Xie @ 2021-11-29 16:49 UTC (permalink / raw)
  To: rostedt; +Cc: mhiramat, mingo, zanussi, linux-kernel, Jeff Xie

Add a short description of the use of the objtrace trigger.

Signed-off-by: Jeff Xie <xiehuan09@gmail.com>
---
 kernel/trace/trace.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 88de94da596b..a84b6813769e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5590,6 +5590,9 @@ static const char readme_msg[] =
 	"\t            enable_hist:<system>:<event>\n"
 	"\t            disable_hist:<system>:<event>\n"
 #endif
+#ifdef CONFIG_TRACE_OBJECT
+	"\t            objtrace:add:obj[,offset][:type][:count][if <filter>]\n"
+#endif
 #ifdef CONFIG_STACKTRACE
 	"\t\t    stacktrace\n"
 #endif
-- 
2.25.1



* [RFC][PATCH v6 4/5] trace/objtrace: Add testcases for objtrace
  2021-11-29 16:49 [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
                   ` (2 preceding siblings ...)
  2021-11-29 16:49 ` [RFC][PATCH v6 3/5] trace/README: Document objtrace trigger syntax Jeff Xie
@ 2021-11-29 16:49 ` Jeff Xie
  2021-11-29 16:49 ` [RFC][PATCH v6 5/5] trace/objtrace: Add documentation " Jeff Xie
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Jeff Xie @ 2021-11-29 16:49 UTC (permalink / raw)
  To: rostedt; +Cc: mhiramat, mingo, zanussi, linux-kernel, Jeff Xie

Add a series of testcases to illustrate correct and incorrect usage of the
objtrace trigger.

Signed-off-by: Jeff Xie <xiehuan09@gmail.com>
---
 .../ftrace/test.d/trigger/trigger-objtrace.tc | 39 +++++++++++++++++++
 1 file changed, 39 insertions(+)
 create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc

diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
new file mode 100644
index 000000000000..d71eb157c51f
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
@@ -0,0 +1,39 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: event trigger - test objtrace-trigger
+# requires: kprobe_events "objtrace":README
+
+fail() { #msg
+    echo $1
+    exit_fail
+}
+
+echo 'p bio_add_page arg1=$arg1' > kprobe_events
+
+FEATURE=`grep objtrace events/kprobes/p_bio_add_page_0/trigger`
+if [ -z "$FEATURE" ]; then
+    echo "objtrace trigger is not supported"
+    exit_unsupported
+fi
+
+echo "Test objtrace trigger"
+echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > \
+	events/kprobes/p_bio_add_page_0/trigger
+if [ $? -ne 0 ]; then
+	fail "objtrace trigger syntax error"
+fi
+
+echo "Test objtrace semantic errors"
+
+# Missing objtrace command
+! echo 'objtrace:arg1,0x28:u32:1' > events/kprobes/p_bio_add_page_0/trigger
+# Bad parameter name
+! echo 'objtrace:add:argx:u32:1' > events/kprobes/p_bio_add_page_0/trigger
+
+echo "reset objtrace trigger"
+
+echo '!objtrace:add:arg1,0x28:u32' > \
+	events/kprobes/p_bio_add_page_0/trigger
+echo '-:p_bio_add_page_0' >> ./kprobe_events
+
+exit 0
-- 
2.25.1



* [RFC][PATCH v6 5/5] trace/objtrace: Add documentation for objtrace
  2021-11-29 16:49 [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
                   ` (3 preceding siblings ...)
  2021-11-29 16:49 ` [RFC][PATCH v6 4/5] trace/objtrace: Add testcases for objtrace Jeff Xie
@ 2021-11-29 16:49 ` Jeff Xie
  2021-12-10 16:55 ` [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
  2021-12-16 14:50 ` Masami Hiramatsu
  6 siblings, 0 replies; 26+ messages in thread
From: Jeff Xie @ 2021-11-29 16:49 UTC (permalink / raw)
  To: rostedt; +Cc: mhiramat, mingo, zanussi, linux-kernel, Jeff Xie

Add documentation explaining how to use the objtrace trigger to get the
value of an object.

Signed-off-by: Jeff Xie <xiehuan09@gmail.com>
---
 Documentation/trace/events.rst | 114 +++++++++++++++++++++++++++++++++
 1 file changed, 114 insertions(+)

diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
index 8ddb9b09451c..a6d8313ba24e 100644
--- a/Documentation/trace/events.rst
+++ b/Documentation/trace/events.rst
@@ -527,6 +527,120 @@ The following commands are supported:
 
   See Documentation/trace/histogram.rst for details and examples.
 
+- objtrace
+
+  This command provides a way to get the value of any object. The object
+  can be obtained from a dynamic event (kprobe_event/uprobe_event) or a
+  static event (tracepoint).
+
+  Usage:
+  When using a kprobe event, we only need to set the objtrace trigger (a new
+  trigger type) to get the value of the object. The object itself comes from
+  the definition of the kprobe event.
+
+  For example:
+  For the function bio_add_page():
+
+  int bio_add_page(struct bio *bio, struct page *page,
+	unsigned int len, unsigned int offset)
+
+  First, we set the base of the object; the string "arg1" stands for the
+  value of the first parameter of the function bio_add_page():
+
+  # echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
+
+  Second, we can fetch the value dynamically based on the object above.
+
+  Find the offset of bi_size in struct bio:
+  $ gdb vmlinux
+  (gdb) p &(((struct bio *)0)->bi_iter.bi_size)
+  $1 = (unsigned int *) 0x28
+
+  # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/ \
+	p_bio_add_page_0/trigger
+
+  The best way to use this is to bracket the trace with an entry event and
+  an exit event. The following example sets read_pages as the entry event
+  and __blk_account_io_start as the exit event.
+
+  # cd /sys/kernel/debug/tracing/
+  # echo 0 > ./tracing_on
+  # echo 'p read_pages' >> ./kprobe_events
+  # echo 'p __blk_account_io_start' >> ./kprobe_events
+  # echo 'traceon if comm == "cat"' > ./events/kprobes/p_read_pages_0/trigger
+  # echo 'traceoff if comm == "cat"' > ./events/kprobes/p___blk_account_io_start_0/trigger
+  # echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
+  # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger
+
+  # du -sh /test.txt
+  12.0K   /test.txt
+
+  # cat  /test.txt > /dev/null
+  # cat ./trace
+
+  # tracer: nop
+  #
+  # entries-in-buffer/entries-written: 50/50   #P:1
+  #
+  #                                _-----=> irqs-off
+  #                               / _----=> need-resched
+  #                              | / _---=> hardirq/softirq
+  #                              || / _--=> preempt-depth
+  #                              ||| / _-=> migrate-disable
+  #                              |||| /     delay
+  #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
+  #              | |         |   |||||     |         |
+             cat-95      [000] .....     1.412065: _raw_spin_unlock_irqrestore <-event_triggers_call object:0xffff888108af6328 value:0x0
+             cat-95      [000] .....     1.412066: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x0
+             cat-95      [000] .....     1.412066: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x0
+             cat-95      [000] .....     1.412066: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x1000
+             cat-95      [000] .....     1.412066: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x1000
+             cat-95      [000] .....     1.412066: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x1000
+             cat-95      [000] .....     1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x1000
+             cat-95      [000] .....     1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x1000
+             cat-95      [000] .....     1.412067: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x1000
+             cat-95      [000] .....     1.412067: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x2000
+             cat-95      [000] .....     1.412067: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x2000
+             cat-95      [000] .....     1.412067: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x2000
+             cat-95      [000] .....     1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x2000
+             cat-95      [000] .....     1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x2000
+             cat-95      [000] .....     1.412068: submit_bio <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412068: submit_bio_noacct <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412068: __submit_bio <-submit_bio_noacct object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412068: blk_try_enter_queue <-__submit_bio object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412068: submit_bio_checks <-__submit_bio object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412068: __cond_resched <-submit_bio_checks object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412068: rcu_all_qs <-__cond_resched object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412068: should_fail_bio <-submit_bio_checks object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412069: create_task_io_context <-submit_bio_checks object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412069: kmem_cache_alloc_node <-create_task_io_context object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412069: should_failslab <-kmem_cache_alloc_node object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412069: _raw_spin_lock <-create_task_io_context object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412069: blk_mq_submit_bio <-__submit_bio object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412069: __blk_queue_split <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412069: bvec_split_segs <-__blk_queue_split object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412069: blk_attempt_plug_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412070: __blk_mq_sched_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412070: dd_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412070: _raw_spin_lock <-dd_bio_merge object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412070: blk_mq_sched_try_merge <-dd_bio_merge object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412070: elv_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412070: elv_rqhash_find <-elv_merge object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412070: dd_request_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412070: elv_rb_find <-dd_request_merge object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412070: __blk_mq_alloc_request <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412071: dd_limit_depth <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412071: blk_mq_get_tag <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412071: __blk_mq_get_tag <-blk_mq_get_tag object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412071: blk_mq_rq_ctx_init.isra.0 <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412071: ktime_get <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412071: dd_prepare_request <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412071: __blk_account_io_start <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412071: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x3000
+             cat-95      [000] .....     1.412071: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x3000
+
 7. In-kernel trace event API
 ============================
 
-- 
2.25.1



* Re: [RFC][PATCH v6 1/5] trace: Add trace any kernel object
  2021-11-29 16:49 ` [RFC][PATCH v6 1/5] trace: Add trace any " Jeff Xie
@ 2021-11-29 23:05     ` kernel test robot
  2021-12-17  4:51   ` Masami Hiramatsu
  1 sibling, 0 replies; 26+ messages in thread
From: kernel test robot @ 2021-11-29 23:05 UTC (permalink / raw)
  To: Jeff Xie; +Cc: llvm, kbuild-all

Hi Jeff,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on rostedt-trace/for-next]
[also build test WARNING on linux/master linus/master v5.16-rc3 next-20211129]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jeff-Xie/trace-Introduce-objtrace-trigger-to-trace-the-kernel-object/20211130-041958
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git for-next
config: i386-buildonly-randconfig-r003-20211128 (https://download.01.org/0day-ci/archive/20211130/202111300652.yLaNuuKr-lkp@intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project df08b2fe8b35cb63dfb3b49738a3494b9b4e6f8e)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/88ea43d691b0980bf49b4e579c39303cf3c9cd72
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jeff-Xie/trace-Introduce-objtrace-trigger-to-trace-the-kernel-object/20211130-041958
        git checkout 88ea43d691b0980bf49b4e579c39303cf3c9cd72
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash kernel/trace/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> kernel/trace/trace_object.c:518:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
           if (WARN_ON_ONCE(atomic_read(&trace_object_ref) <= 0))
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/bug.h:179:33: note: expanded from macro 'WARN_ON_ONCE'
   #define WARN_ON_ONCE(condition) WARN_ON(condition)
                                   ^~~~~~~~~~~~~~~~~~
   include/asm-generic/bug.h:165:28: note: expanded from macro 'WARN_ON'
   #define WARN_ON(condition) ({                                           \
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/trace/trace_object.c:533:9: note: uninitialized use occurs here
           return ret;
                  ^~~
   kernel/trace/trace_object.c:518:2: note: remove the 'if' if its condition is always false
           if (WARN_ON_ONCE(atomic_read(&trace_object_ref) <= 0))
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/trace/trace_object.c:516:9: note: initialize the variable 'ret' to silence this warning
           int ret;
                  ^
                   = 0
   1 warning generated.


vim +518 kernel/trace/trace_object.c

   513	
   514	static int exit_trace_object(void)
   515	{
   516		int ret;
   517	
 > 518		if (WARN_ON_ONCE(atomic_read(&trace_object_ref) <= 0))
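
The minimal fix is the initialization the note suggests, e.g. (a sketch,
not the posted patch):

	int ret = 0;	/* give the WARN_ON_ONCE early-exit path a defined value */

Dan Carpenter's reply later in the thread instead suggests restructuring
the function with direct returns, which avoids needing a default value at all.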

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH v6 1/5] trace: Add trace any kernel object
  2021-11-29 16:49 ` [RFC][PATCH v6 1/5] trace: Add trace any " Jeff Xie
@ 2021-12-06 13:21 ` Dan Carpenter
  2021-12-17  4:51   ` Masami Hiramatsu
  1 sibling, 0 replies; 26+ messages in thread
From: kernel test robot @ 2021-12-03 19:54 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 2730 bytes --]

CC: kbuild-all@lists.01.org
In-Reply-To: <20211129164951.220511-2-xiehuan09@gmail.com>
References: <20211129164951.220511-2-xiehuan09@gmail.com>
TO: Jeff Xie <xiehuan09@gmail.com>

Hi Jeff,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on rostedt-trace/for-next]
[also build test WARNING on linux/master linus/master v5.16-rc3 next-20211203]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jeff-Xie/trace-Introduce-objtrace-trigger-to-trace-the-kernel-object/20211130-041958
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git for-next
:::::: branch date: 4 days ago
:::::: commit date: 4 days ago
config: i386-randconfig-m021-20211128 (https://download.01.org/0day-ci/archive/20211204/202112040300.0gTw1WdI-lkp@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
kernel/trace/trace_object.c:533 exit_trace_object() error: uninitialized symbol 'ret'.

vim +/ret +533 kernel/trace/trace_object.c

88ea43d691b098 Jeff Xie 2021-11-30  513  
88ea43d691b098 Jeff Xie 2021-11-30  514  static int exit_trace_object(void)
88ea43d691b098 Jeff Xie 2021-11-30  515  {
88ea43d691b098 Jeff Xie 2021-11-30  516  	int ret;
88ea43d691b098 Jeff Xie 2021-11-30  517  
88ea43d691b098 Jeff Xie 2021-11-30  518  	if (WARN_ON_ONCE(atomic_read(&trace_object_ref) <= 0))
88ea43d691b098 Jeff Xie 2021-11-30  519  		goto out;
88ea43d691b098 Jeff Xie 2021-11-30  520  
88ea43d691b098 Jeff Xie 2021-11-30  521  	if (atomic_dec_return(&trace_object_ref) != 0) {
88ea43d691b098 Jeff Xie 2021-11-30  522  		ret = 0;
88ea43d691b098 Jeff Xie 2021-11-30  523  		goto out;
88ea43d691b098 Jeff Xie 2021-11-30  524  	}
88ea43d691b098 Jeff Xie 2021-11-30  525  
88ea43d691b098 Jeff Xie 2021-11-30  526  	ret = unregister_ftrace_function(&trace_ops);
88ea43d691b098 Jeff Xie 2021-11-30  527  	if (ret) {
88ea43d691b098 Jeff Xie 2021-11-30  528  		pr_err("can't unregister ftrace for trace object\n");
88ea43d691b098 Jeff Xie 2021-11-30  529  		goto out;
88ea43d691b098 Jeff Xie 2021-11-30  530  	}
88ea43d691b098 Jeff Xie 2021-11-30  531  	free_object_pool();
88ea43d691b098 Jeff Xie 2021-11-30  532  out:
88ea43d691b098 Jeff Xie 2021-11-30 @533  	return ret;

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH v6 1/5] trace: Add trace any kernel object
@ 2021-12-06 13:21 ` Dan Carpenter
  0 siblings, 0 replies; 26+ messages in thread
From: Dan Carpenter @ 2021-12-06 13:21 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2208 bytes --]

Hi Jeff,

url:    https://github.com/0day-ci/linux/commits/Jeff-Xie/trace-Introduce-objtrace-trigger-to-trace-the-kernel-object/20211130-041958
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git for-next
config: i386-randconfig-m021-20211128 (https://download.01.org/0day-ci/archive/20211204/202112040300.0gTw1WdI-lkp@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
kernel/trace/trace_object.c:533 exit_trace_object() error: uninitialized symbol 'ret'.

vim +/ret +533 kernel/trace/trace_object.c

88ea43d691b098 Jeff Xie 2021-11-30  514  static int exit_trace_object(void)
88ea43d691b098 Jeff Xie 2021-11-30  515  {
88ea43d691b098 Jeff Xie 2021-11-30  516  	int ret;
88ea43d691b098 Jeff Xie 2021-11-30  517  
88ea43d691b098 Jeff Xie 2021-11-30  518  	if (WARN_ON_ONCE(atomic_read(&trace_object_ref) <= 0))
88ea43d691b098 Jeff Xie 2021-11-30  519  		goto out;

"ret" not set.  This is the cannonical bug that happens when "goto out;"
is a no-op.  It's less bug prone and more readable to return directly.

88ea43d691b098 Jeff Xie 2021-11-30  520  
88ea43d691b098 Jeff Xie 2021-11-30  521  	if (atomic_dec_return(&trace_object_ref) != 0) {
88ea43d691b098 Jeff Xie 2021-11-30  522  		ret = 0;
88ea43d691b098 Jeff Xie 2021-11-30  523  		goto out;
88ea43d691b098 Jeff Xie 2021-11-30  524  	}
88ea43d691b098 Jeff Xie 2021-11-30  525  
88ea43d691b098 Jeff Xie 2021-11-30  526  	ret = unregister_ftrace_function(&trace_ops);
88ea43d691b098 Jeff Xie 2021-11-30  527  	if (ret) {
88ea43d691b098 Jeff Xie 2021-11-30  528  		pr_err("can't unregister ftrace for trace object\n");
88ea43d691b098 Jeff Xie 2021-11-30  529  		goto out;
88ea43d691b098 Jeff Xie 2021-11-30  530  	}
88ea43d691b098 Jeff Xie 2021-11-30  531  	free_object_pool();
88ea43d691b098 Jeff Xie 2021-11-30  532  out:
88ea43d691b098 Jeff Xie 2021-11-30 @533  	return ret;
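
For illustration, a direct-return version of exit_trace_object() might look
like this (a sketch only, not the actual patch; returning 0 on the WARN
path is an assumption, since the original leaves 'ret' uninitialized there):

	static int exit_trace_object(void)
	{
		int ret;

		if (WARN_ON_ONCE(atomic_read(&trace_object_ref) <= 0))
			return 0;	/* assumed value; original was undefined */

		if (atomic_dec_return(&trace_object_ref) != 0)
			return 0;

		ret = unregister_ftrace_function(&trace_ops);
		if (ret) {
			pr_err("can't unregister ftrace for trace object\n");
			return ret;
		}
		free_object_pool();
		return 0;
	}

Every exit path now returns an explicit value, so the class of bug the two
robots reported cannot recur.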

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object
  2021-11-29 16:49 [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
                   ` (4 preceding siblings ...)
  2021-11-29 16:49 ` [RFC][PATCH v6 5/5] trace/objtrace: Add documentation " Jeff Xie
@ 2021-12-10 16:55 ` Jeff Xie
  2021-12-10 17:31   ` Steven Rostedt
  2021-12-13 10:16   ` Masami Hiramatsu
  2021-12-16 14:50 ` Masami Hiramatsu
  6 siblings, 2 replies; 26+ messages in thread
From: Jeff Xie @ 2021-12-10 16:55 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Masami Hiramatsu, mingo, Tom Zanussi, linux-kernel

Hi Steve and Masami,

I really don't know what else I can do to enhance this patchset. If
this idea is really valuable and can play its role, it may need the
help of you or other developers ;-)

On Tue, Nov 30, 2021 at 12:50 AM Jeff Xie <xiehuan09@gmail.com> wrote:
>
> Introduce a method based on the function tracer to trace any object and get
> the value of the object dynamically. The object can be obtained from a
> dynamic event (kprobe_event/uprobe_event) or a static event (tracepoint).
>
> Usage:
> When using a kprobe event, we only need to set the objtrace trigger (a new
> trigger type) to get the value of the object. The object comes from the
> setting of the kprobe event.
>
> For example:
> For the function bio_add_page():
>
> int bio_add_page(struct bio *bio, struct page *page,
>         unsigned int len, unsigned int offset)
>
> Firstly, we can set the base of the object; here the first string "arg1"
> stands for the value of the first parameter of the function bio_add_page():
>
> # echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
>
> Secondly, we can get the value dynamically based on the above object.
>
> Find the offset of bi_size in struct bio:
> $ gdb vmlinux
> (gdb) p &(((struct bio *)0)->bi_iter.bi_size)
> $1 = (unsigned int *) 0x28
>
> # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/ \
>         p_bio_add_page_0/trigger
>
> The best way to use this is to set an entrance event and an exit event;
> for example, the following sets read_pages as the entrance event and
> __blk_account_io_start as the exit event.
>
> # cd /sys/kernel/debug/tracing/
> # echo 0 > ./tracing_on
> # echo 'p read_pages' >> ./kprobe_events
> # echo 'p __blk_account_io_start' >> ./kprobe_events
> # echo 'traceon if comm == "cat"' > ./events/kprobes/p_read_pages_0/trigger
> # echo 'traceoff if comm == "cat"' > ./events/kprobes/p___blk_account_io_start_0/trigger
> # echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
> # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger
>
> # du -sh /test.txt
> 12.0K   /test.txt
>
> # cat  /test.txt > /dev/null
> # cat ./trace
>
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 50/50   #P:1
> #
> #                                _-----=> irqs-off
> #                               / _----=> need-resched
> #                              | / _---=> hardirq/softirq
> #                              || / _--=> preempt-depth
> #                              ||| / _-=> migrate-disable
> #                              |||| /     delay
> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> #              | |         |   |||||     |         |
>              cat-95      [000] .....     1.412065: _raw_spin_unlock_irqrestore <-event_triggers_call object:0xffff888108af6328 value:0x0
>              cat-95      [000] .....     1.412066: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x0
>              cat-95      [000] .....     1.412066: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x0
>              cat-95      [000] .....     1.412066: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x1000
>              cat-95      [000] .....     1.412066: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x1000
>              cat-95      [000] .....     1.412066: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x1000
>              cat-95      [000] .....     1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x1000
>              cat-95      [000] .....     1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x1000
>              cat-95      [000] .....     1.412067: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x1000
>              cat-95      [000] .....     1.412067: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x2000
>              cat-95      [000] .....     1.412067: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x2000
>              cat-95      [000] .....     1.412067: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x2000
>              cat-95      [000] .....     1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x2000
>              cat-95      [000] .....     1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x2000
>              cat-95      [000] .....     1.412068: submit_bio <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412068: submit_bio_noacct <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412068: __submit_bio <-submit_bio_noacct object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412068: blk_try_enter_queue <-__submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412068: submit_bio_checks <-__submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412068: __cond_resched <-submit_bio_checks object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412068: rcu_all_qs <-__cond_resched object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412068: should_fail_bio <-submit_bio_checks object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412069: create_task_io_context <-submit_bio_checks object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412069: kmem_cache_alloc_node <-create_task_io_context object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412069: should_failslab <-kmem_cache_alloc_node object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412069: _raw_spin_lock <-create_task_io_context object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412069: blk_mq_submit_bio <-__submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412069: __blk_queue_split <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412069: bvec_split_segs <-__blk_queue_split object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412069: blk_attempt_plug_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412070: __blk_mq_sched_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412070: dd_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412070: _raw_spin_lock <-dd_bio_merge object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412070: blk_mq_sched_try_merge <-dd_bio_merge object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412070: elv_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412070: elv_rqhash_find <-elv_merge object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412070: dd_request_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412070: elv_rb_find <-dd_request_merge object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412070: __blk_mq_alloc_request <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412071: dd_limit_depth <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412071: blk_mq_get_tag <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412071: __blk_mq_get_tag <-blk_mq_get_tag object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412071: blk_mq_rq_ctx_init.isra.0 <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412071: ktime_get <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412071: dd_prepare_request <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412071: __blk_account_io_start <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412071: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412071: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x3000
>
> Jeff Xie (5):
>   trace: Add trace any kernel object
>   trace/objtrace: get the value of the object
>   trace/README: Document objtrace trigger syntax
>   trace/objtrace: Add testcases for objtrace
>   trace/objtrace: Add documentation for objtrace
>
>  Documentation/trace/events.rst                | 114 +++
>  include/linux/trace_events.h                  |   1 +
>  kernel/trace/Kconfig                          |  10 +
>  kernel/trace/Makefile                         |   1 +
>  kernel/trace/trace.c                          |   3 +
>  kernel/trace/trace.h                          |   8 +
>  kernel/trace/trace_entries.h                  |  18 +
>  kernel/trace/trace_events_trigger.c           |   1 +
>  kernel/trace/trace_object.c                   | 660 ++++++++++++++++++
>  kernel/trace/trace_output.c                   |  40 ++
>  .../ftrace/test.d/trigger/trigger-objtrace.tc |  39 ++
>  11 files changed, 895 insertions(+)
>  create mode 100644 kernel/trace/trace_object.c
>  create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
>
> --
> 2.25.1
>

Thanks,
---
JeffXie

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object
  2021-12-10 16:55 ` [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
@ 2021-12-10 17:31   ` Steven Rostedt
  2021-12-13 10:16   ` Masami Hiramatsu
  1 sibling, 0 replies; 26+ messages in thread
From: Steven Rostedt @ 2021-12-10 17:31 UTC (permalink / raw)
  To: Jeff Xie; +Cc: Masami Hiramatsu, mingo, Tom Zanussi, linux-kernel

On Sat, 11 Dec 2021 00:55:31 +0800
Jeff Xie <xiehuan09@gmail.com> wrote:

> Hi Steve and Masami,
> 
> I really don't know what else I can do to enhance this patchset. If
> this idea is really valuable and can play its role, it may need the
> help of you or other developers ;-)

Hi Jeff,

I've been very busy with other things, but I may have time soon to look at
your patch set. After I get a chance to play with it, I'm certain I will be
coming back with some more comments ;-)

Sorry about the delay, but my day job got in the way :-p

-- Steve

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object
  2021-12-10 16:55 ` [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
  2021-12-10 17:31   ` Steven Rostedt
@ 2021-12-13 10:16   ` Masami Hiramatsu
  1 sibling, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2021-12-13 10:16 UTC (permalink / raw)
  To: Jeff Xie
  Cc: Steven Rostedt, Masami Hiramatsu, mingo, Tom Zanussi, linux-kernel

Hi Jeff,

On Sat, 11 Dec 2021 00:55:31 +0800
Jeff Xie <xiehuan09@gmail.com> wrote:

> Hi Steve and Masami,
> 
> I really don't know what else I can do to enhance this patchset. If
> this idea is really valuable and can play its role, it may need the
> help of you or other developers ;-)

Thanks for pinging me. Let me review it this week.

Thank you!

> 
> On Tue, Nov 30, 2021 at 12:50 AM Jeff Xie <xiehuan09@gmail.com> wrote:
> >
> > Introduce a method based on the function tracer to trace any object and get
> > the value of the object dynamically. The object can be obtained from a
> > dynamic event (kprobe_event/uprobe_event) or a static event (tracepoint).
> >
> > Usage:
> > When using a kprobe event, we only need to set the objtrace trigger (a new
> > trigger type) to get the value of the object. The object comes from the
> > setting of the kprobe event.
> >
> > For example:
> > For the function bio_add_page():
> >
> > int bio_add_page(struct bio *bio, struct page *page,
> >         unsigned int len, unsigned int offset)
> >
> > Firstly, we can set the base of the object; here the first string "arg1"
> > stands for the value of the first parameter of the function bio_add_page():
> >
> > # echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
> >
> > Secondly, we can get the value dynamically based on the above object.
> >
> > Find the offset of bi_size in struct bio:
> > $ gdb vmlinux
> > (gdb) p &(((struct bio *)0)->bi_iter.bi_size)
> > $1 = (unsigned int *) 0x28
> >
> > # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/ \
> >         p_bio_add_page_0/trigger
> >
> > The best way to use this is to set an entrance event and an exit event;
> > for example, the following sets read_pages as the entrance event and
> > __blk_account_io_start as the exit event.
> >
> > # cd /sys/kernel/debug/tracing/
> > # echo 0 > ./tracing_on
> > # echo 'p read_pages' >> ./kprobe_events
> > # echo 'p __blk_account_io_start' >> ./kprobe_events
> > # echo 'traceon if comm == "cat"' > ./events/kprobes/p_read_pages_0/trigger
> > # echo 'traceoff if comm == "cat"' > ./events/kprobes/p___blk_account_io_start_0/trigger
> > # echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
> > # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger
> >
> > # du -sh /test.txt
> > 12.0K   /test.txt
> >
> > # cat  /test.txt > /dev/null
> > # cat ./trace
> >
> > # tracer: nop
> > #
> > # entries-in-buffer/entries-written: 50/50   #P:1
> > #
> > #                                _-----=> irqs-off
> > #                               / _----=> need-resched
> > #                              | / _---=> hardirq/softirq
> > #                              || / _--=> preempt-depth
> > #                              ||| / _-=> migrate-disable
> > #                              |||| /     delay
> > #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> > #              | |         |   |||||     |         |
> >              cat-95      [000] .....     1.412065: _raw_spin_unlock_irqrestore <-event_triggers_call object:0xffff888108af6328 value:0x0
> >              cat-95      [000] .....     1.412066: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x0
> >              cat-95      [000] .....     1.412066: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x0
> >              cat-95      [000] .....     1.412066: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x1000
> >              cat-95      [000] .....     1.412066: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x1000
> >              cat-95      [000] .....     1.412066: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x1000
> >              cat-95      [000] .....     1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x1000
> >              cat-95      [000] .....     1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x1000
> >              cat-95      [000] .....     1.412067: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x1000
> >              cat-95      [000] .....     1.412067: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x2000
> >              cat-95      [000] .....     1.412067: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x2000
> >              cat-95      [000] .....     1.412067: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x2000
> >              cat-95      [000] .....     1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x2000
> >              cat-95      [000] .....     1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x2000
> >              cat-95      [000] .....     1.412068: submit_bio <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412068: submit_bio_noacct <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412068: __submit_bio <-submit_bio_noacct object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412068: blk_try_enter_queue <-__submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412068: submit_bio_checks <-__submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412068: __cond_resched <-submit_bio_checks object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412068: rcu_all_qs <-__cond_resched object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412068: should_fail_bio <-submit_bio_checks object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412069: create_task_io_context <-submit_bio_checks object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412069: kmem_cache_alloc_node <-create_task_io_context object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412069: should_failslab <-kmem_cache_alloc_node object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412069: _raw_spin_lock <-create_task_io_context object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412069: blk_mq_submit_bio <-__submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412069: __blk_queue_split <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412069: bvec_split_segs <-__blk_queue_split object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412069: blk_attempt_plug_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412070: __blk_mq_sched_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412070: dd_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412070: _raw_spin_lock <-dd_bio_merge object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412070: blk_mq_sched_try_merge <-dd_bio_merge object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412070: elv_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412070: elv_rqhash_find <-elv_merge object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412070: dd_request_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412070: elv_rb_find <-dd_request_merge object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412070: __blk_mq_alloc_request <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412071: dd_limit_depth <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412071: blk_mq_get_tag <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412071: __blk_mq_get_tag <-blk_mq_get_tag object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412071: blk_mq_rq_ctx_init.isra.0 <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412071: ktime_get <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412071: dd_prepare_request <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412071: __blk_account_io_start <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412071: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412071: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x3000
> >
> > Jeff Xie (5):
> >   trace: Add trace any kernel object
> >   trace/objtrace: get the value of the object
> >   trace/README: Document objtrace trigger syntax
> >   trace/objtrace: Add testcases for objtrace
> >   trace/objtrace: Add documentation for objtrace
> >
> >  Documentation/trace/events.rst                | 114 +++
> >  include/linux/trace_events.h                  |   1 +
> >  kernel/trace/Kconfig                          |  10 +
> >  kernel/trace/Makefile                         |   1 +
> >  kernel/trace/trace.c                          |   3 +
> >  kernel/trace/trace.h                          |   8 +
> >  kernel/trace/trace_entries.h                  |  18 +
> >  kernel/trace/trace_events_trigger.c           |   1 +
> >  kernel/trace/trace_object.c                   | 660 ++++++++++++++++++
> >  kernel/trace/trace_output.c                   |  40 ++
> >  .../ftrace/test.d/trigger/trigger-objtrace.tc |  39 ++
> >  11 files changed, 895 insertions(+)
> >  create mode 100644 kernel/trace/trace_object.c
> >  create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
> >
> > --
> > 2.25.1
> >
> 
> Thanks,
> ---
> JeffXie


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object
  2021-11-29 16:49 [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
                   ` (5 preceding siblings ...)
  2021-12-10 16:55 ` [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
@ 2021-12-16 14:50 ` Masami Hiramatsu
  2021-12-17  1:43   ` Jeff Xie
  6 siblings, 1 reply; 26+ messages in thread
From: Masami Hiramatsu @ 2021-12-16 14:50 UTC (permalink / raw)
  To: Jeff Xie; +Cc: rostedt, mhiramat, mingo, zanussi, linux-kernel

Hi Jeff,

On Tue, 30 Nov 2021 00:49:46 +0800
Jeff Xie <xiehuan09@gmail.com> wrote:

> Introduce a method based on the function tracer to trace any object and get
> the value of the object dynamically. The object can be obtained from a
> dynamic event (kprobe_event/uprobe_event) or a static event (tracepoint).

Now you can remove the [RFC] tag from this series, I think.
And can you also add a changelog (a short description of the changes) from
the previous version, so that reviewers can find the differences
easily? Otherwise, I need to make a diff from your previous version. :(
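
(The usual form is a short list in the cover letter, e.g. with hypothetical
entries drawn from this thread:

	Changes in v7:
	 - Merged the README trigger-syntax patch into patches 1 and 2
	 - Fixed the uninitialized 'ret' in exit_trace_object() reported by
	   the kernel test robot and Dan Carpenter

so reviewers can see at a glance what changed between versions.)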

Thank you,

> 
> Usage:
> When using a kprobe event, we only need to set the objtrace trigger (a new
> trigger type) to get the value of the object. The object comes from the
> setting of the kprobe event.
> 
> For example:
> For the function bio_add_page():
> 
> int bio_add_page(struct bio *bio, struct page *page,
> 	unsigned int len, unsigned int offset)
> 
> Firstly, we can set the base of the object; here the first string "arg1"
> stands for the value of the first parameter of the function bio_add_page():
> 
> # echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
> 
> Secondly, we can get the value dynamically based on the above object.
> 
> Find the offset of bi_size in struct bio:
> $ gdb vmlinux
> (gdb) p &(((struct bio *)0)->bi_iter.bi_size)
> $1 = (unsigned int *) 0x28
> 
> # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/ \
> 	p_bio_add_page_0/trigger
> 
> The best way to use this is to set an entrance event and an exit event;
> for example, the following sets read_pages as the entrance event and
> __blk_account_io_start as the exit event.
> 
> # cd /sys/kernel/debug/tracing/
> # echo 0 > ./tracing_on
> # echo 'p read_pages' >> ./kprobe_events
> # echo 'p __blk_account_io_start' >> ./kprobe_events
> # echo 'traceon if comm == "cat"' > ./events/kprobes/p_read_pages_0/trigger
> # echo 'traceoff if comm == "cat"' > ./events/kprobes/p___blk_account_io_start_0/trigger
> # echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
> # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger
> 
> # du -sh /test.txt
> 12.0K   /test.txt
> 
> # cat  /test.txt > /dev/null
> # cat ./trace
> 
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 50/50   #P:1
> #
> #                                _-----=> irqs-off
> #                               / _----=> need-resched
> #                              | / _---=> hardirq/softirq
> #                              || / _--=> preempt-depth
> #                              ||| / _-=> migrate-disable
> #                              |||| /     delay
> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> #              | |         |   |||||     |         |
>              cat-95      [000] .....     1.412065: _raw_spin_unlock_irqrestore <-event_triggers_call object:0xffff888108af6328 value:0x0
>              cat-95      [000] .....     1.412066: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x0
>              cat-95      [000] .....     1.412066: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x0
>              cat-95      [000] .....     1.412066: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x1000
>              cat-95      [000] .....     1.412066: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x1000
>              cat-95      [000] .....     1.412066: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x1000
>              cat-95      [000] .....     1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x1000
>              cat-95      [000] .....     1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x1000
>              cat-95      [000] .....     1.412067: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x1000
>              cat-95      [000] .....     1.412067: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x2000
>              cat-95      [000] .....     1.412067: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x2000
>              cat-95      [000] .....     1.412067: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x2000
>              cat-95      [000] .....     1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x2000
>              cat-95      [000] .....     1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x2000
>              cat-95      [000] .....     1.412068: submit_bio <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412068: submit_bio_noacct <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412068: __submit_bio <-submit_bio_noacct object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412068: blk_try_enter_queue <-__submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412068: submit_bio_checks <-__submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412068: __cond_resched <-submit_bio_checks object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412068: rcu_all_qs <-__cond_resched object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412068: should_fail_bio <-submit_bio_checks object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412069: create_task_io_context <-submit_bio_checks object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412069: kmem_cache_alloc_node <-create_task_io_context object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412069: should_failslab <-kmem_cache_alloc_node object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412069: _raw_spin_lock <-create_task_io_context object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412069: blk_mq_submit_bio <-__submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412069: __blk_queue_split <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412069: bvec_split_segs <-__blk_queue_split object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412069: blk_attempt_plug_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412070: __blk_mq_sched_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412070: dd_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412070: _raw_spin_lock <-dd_bio_merge object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412070: blk_mq_sched_try_merge <-dd_bio_merge object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412070: elv_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412070: elv_rqhash_find <-elv_merge object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412070: dd_request_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412070: elv_rb_find <-dd_request_merge object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412070: __blk_mq_alloc_request <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412071: dd_limit_depth <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412071: blk_mq_get_tag <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412071: __blk_mq_get_tag <-blk_mq_get_tag object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412071: blk_mq_rq_ctx_init.isra.0 <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412071: ktime_get <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412071: dd_prepare_request <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412071: __blk_account_io_start <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412071: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x3000
>              cat-95      [000] .....     1.412071: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x3000
> 
> Jeff Xie (5):
>   trace: Add trace any kernel object
>   trace/objtrace: get the value of the object
>   trace/README: Document objtrace trigger syntax
>   trace/objtrace: Add testcases for objtrace
>   trace/objtrace: Add documentation for objtrace
> 
>  Documentation/trace/events.rst                | 114 +++
>  include/linux/trace_events.h                  |   1 +
>  kernel/trace/Kconfig                          |  10 +
>  kernel/trace/Makefile                         |   1 +
>  kernel/trace/trace.c                          |   3 +
>  kernel/trace/trace.h                          |   8 +
>  kernel/trace/trace_entries.h                  |  18 +
>  kernel/trace/trace_events_trigger.c           |   1 +
>  kernel/trace/trace_object.c                   | 660 ++++++++++++++++++
>  kernel/trace/trace_output.c                   |  40 ++
>  .../ftrace/test.d/trigger/trigger-objtrace.tc |  39 ++
>  11 files changed, 895 insertions(+)
>  create mode 100644 kernel/trace/trace_object.c
>  create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
> 
> -- 
> 2.25.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH v6 3/5] trace/README: Document objtrace trigger syntax
  2021-11-29 16:49 ` [RFC][PATCH v6 3/5] trace/README: Document objtrace trigger syntax Jeff Xie
@ 2021-12-16 15:02   ` Masami Hiramatsu
  2021-12-17  1:45     ` Jeff Xie
  0 siblings, 1 reply; 26+ messages in thread
From: Masami Hiramatsu @ 2021-12-16 15:02 UTC (permalink / raw)
  To: Jeff Xie; +Cc: rostedt, mhiramat, mingo, zanussi, linux-kernel

Hi Jeff,

On Tue, 30 Nov 2021 00:49:49 +0800
Jeff Xie <xiehuan09@gmail.com> wrote:

> Add a short description of the use of objtrace trigger.

Please merge this into [1/5] and [2/5], because the change that adds
a new feature and the change that documents the feature's existence
must be atomic.

My recommendation is

- add the following line in the first patch:

	"\t            objtrace:add:obj[:count][if <filter>]\n"

- and update it as follows in the next patch, according to its changes:

	"\t            objtrace:add:obj[,offset][:type][:count][if <filter>]\n"


Thank you!


> 
> Signed-off-by: Jeff Xie <xiehuan09@gmail.com>
> ---
>  kernel/trace/trace.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 88de94da596b..a84b6813769e 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -5590,6 +5590,9 @@ static const char readme_msg[] =
>  	"\t            enable_hist:<system>:<event>\n"
>  	"\t            disable_hist:<system>:<event>\n"
>  #endif
> +#ifdef CONFIG_TRACE_OBJECT
> +	"\t            objtrace:add:obj[,offset][:type][:count][if <filter>]\n"
> +#endif
>  #ifdef CONFIG_STACKTRACE
>  	"\t\t    stacktrace\n"
>  #endif
> -- 
> 2.25.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object
  2021-12-16 14:50 ` Masami Hiramatsu
@ 2021-12-17  1:43   ` Jeff Xie
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff Xie @ 2021-12-17  1:43 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Steven Rostedt, mingo, Tom Zanussi, linux-kernel

Hi Masami,

Thanks for your review.

On Thu, Dec 16, 2021 at 10:50 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> Hi Jeff,
>
> On Tue, 30 Nov 2021 00:49:46 +0800
> Jeff Xie <xiehuan09@gmail.com> wrote:
>
> > Introduce a method based on the function tracer to trace any object and get
> > the value of the object dynamically. The object can be obtained from a
> > dynamic event (kprobe_event/uprobe_event) or a static event (tracepoint).
>
> Now you can remove the [RFC] tag from this series, I think.
> And can you also add a changelog (a short description of the changes) from
> the previous version, so that reviewers can find the differences
> easily? Otherwise, I need to make a diff from your previous version. :(

OK, I will remove the [RFC] tag and add a changelog.

> Thank you,
>
> >
> > Usage:
> > When using a kprobe event, we only need to set the objtrace trigger (a new
> > trigger type) to get the value of the object. The object comes from the
> > setting of the kprobe event.
> >
> > For example:
> > For the function bio_add_page():
> >
> > int bio_add_page(struct bio *bio, struct page *page,
> >       unsigned int len, unsigned int offset)
> >
> > Firstly, we can set the base of the object; here the first string "arg1"
> > stands for the value of the first parameter of the function bio_add_page():
> >
> > # echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
> >
> > Secondly, we can get the value dynamically based on the above object.
> >
> > Find the offset of bi_size in struct bio:
> > $ gdb vmlinux
> > (gdb) p &(((struct bio *)0)->bi_iter.bi_size)
> > $1 = (unsigned int *) 0x28
> >
> > # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/ \
> >       p_bio_add_page_0/trigger
> >
> > The best way to use this is to set an entrance event and an exit event;
> > for example, the following sets read_pages as the entrance event and
> > __blk_account_io_start as the exit event.
> >
> > # cd /sys/kernel/debug/tracing/
> > # echo 0 > ./tracing_on
> > # echo 'p read_pages' >> ./kprobe_events
> > # echo 'p __blk_account_io_start' >> ./kprobe_events
> > # echo 'traceon if comm == "cat"' > ./events/kprobes/p_read_pages_0/trigger
> > # echo 'traceoff if comm == "cat"' > ./events/kprobes/p___blk_account_io_start_0/trigger
> > # echo 'p bio_add_page arg1=$arg1' >> ./kprobe_events
> > # echo 'objtrace:add:arg1,0x28:u32:1 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger
> >
> > # du -sh /test.txt
> > 12.0K   /test.txt
> >
> > # cat  /test.txt > /dev/null
> > # cat ./trace
> >
> > # tracer: nop
> > #
> > # entries-in-buffer/entries-written: 50/50   #P:1
> > #
> > #                                _-----=> irqs-off
> > #                               / _----=> need-resched
> > #                              | / _---=> hardirq/softirq
> > #                              || / _--=> preempt-depth
> > #                              ||| / _-=> migrate-disable
> > #                              |||| /     delay
> > #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> > #              | |         |   |||||     |         |
> >              cat-95      [000] .....     1.412065: _raw_spin_unlock_irqrestore <-event_triggers_call object:0xffff888108af6328 value:0x0
> >              cat-95      [000] .....     1.412066: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x0
> >              cat-95      [000] .....     1.412066: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x0
> >              cat-95      [000] .....     1.412066: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x1000
> >              cat-95      [000] .....     1.412066: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x1000
> >              cat-95      [000] .....     1.412066: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x1000
> >              cat-95      [000] .....     1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x1000
> >              cat-95      [000] .....     1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x1000
> >              cat-95      [000] .....     1.412067: __bio_add_page <-bio_add_page object:0xffff888108af6328 value:0x1000
> >              cat-95      [000] .....     1.412067: rcu_read_unlock_strict <-xa_load object:0xffff888108af6328 value:0x2000
> >              cat-95      [000] .....     1.412067: bio_add_page <-ext4_mpage_readpages object:0xffff888108af6328 value:0x2000
> >              cat-95      [000] .....     1.412067: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x2000
> >              cat-95      [000] .....     1.412067: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x2000
> >              cat-95      [000] .....     1.412067: __bio_try_merge_page <-bio_add_page object:0xffff888108af6328 value:0x2000
> >              cat-95      [000] .....     1.412068: submit_bio <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412068: submit_bio_noacct <-ext4_mpage_readpages object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412068: __submit_bio <-submit_bio_noacct object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412068: blk_try_enter_queue <-__submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412068: rcu_read_unlock_strict <-blk_try_enter_queue object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412068: submit_bio_checks <-__submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412068: __cond_resched <-submit_bio_checks object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412068: rcu_all_qs <-__cond_resched object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412068: should_fail_bio <-submit_bio_checks object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412069: create_task_io_context <-submit_bio_checks object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412069: kmem_cache_alloc_node <-create_task_io_context object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412069: should_failslab <-kmem_cache_alloc_node object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412069: _raw_spin_lock <-create_task_io_context object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412069: blk_mq_submit_bio <-__submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412069: __blk_queue_split <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412069: bvec_split_segs <-__blk_queue_split object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412069: blk_attempt_plug_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412070: __blk_mq_sched_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412070: dd_bio_merge <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412070: _raw_spin_lock <-dd_bio_merge object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412070: blk_mq_sched_try_merge <-dd_bio_merge object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412070: elv_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412070: elv_rqhash_find <-elv_merge object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412070: dd_request_merge <-blk_mq_sched_try_merge object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412070: elv_rb_find <-dd_request_merge object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412070: __blk_mq_alloc_request <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412071: dd_limit_depth <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412071: blk_mq_get_tag <-__blk_mq_alloc_request object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412071: __blk_mq_get_tag <-blk_mq_get_tag object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412071: blk_mq_rq_ctx_init.isra.0 <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412071: ktime_get <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412071: dd_prepare_request <-blk_mq_rq_ctx_init.isra.0 object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412071: __blk_account_io_start <-blk_mq_submit_bio object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412071: kprobe_ftrace_handler <-ftrace_ops_list_func object:0xffff888108af6328 value:0x3000
> >              cat-95      [000] .....     1.412071: get_kprobe <-kprobe_ftrace_handler object:0xffff888108af6328 value:0x3000
> >
> > Jeff Xie (5):
> >   trace: Add trace any kernel object
> >   trace/objtrace: get the value of the object
> >   trace/README: Document objtrace trigger syntax
> >   trace/objtrace: Add testcases for objtrace
> >   trace/objtrace: Add documentation for objtrace
> >
> >  Documentation/trace/events.rst                | 114 +++
> >  include/linux/trace_events.h                  |   1 +
> >  kernel/trace/Kconfig                          |  10 +
> >  kernel/trace/Makefile                         |   1 +
> >  kernel/trace/trace.c                          |   3 +
> >  kernel/trace/trace.h                          |   8 +
> >  kernel/trace/trace_entries.h                  |  18 +
> >  kernel/trace/trace_events_trigger.c           |   1 +
> >  kernel/trace/trace_object.c                   | 660 ++++++++++++++++++
> >  kernel/trace/trace_output.c                   |  40 ++
> >  .../ftrace/test.d/trigger/trigger-objtrace.tc |  39 ++
> >  11 files changed, 895 insertions(+)
> >  create mode 100644 kernel/trace/trace_object.c
> >  create mode 100644 tools/testing/selftests/ftrace/test.d/trigger/trigger-objtrace.tc
> >
> > --
> > 2.25.1
> >
>
>
> --
> Masami Hiramatsu <mhiramat@kernel.org>

---
JeffXie

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH v6 3/5] trace/README: Document objtrace trigger syntax
  2021-12-16 15:02   ` Masami Hiramatsu
@ 2021-12-17  1:45     ` Jeff Xie
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff Xie @ 2021-12-17  1:45 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Steven Rostedt, mingo, Tom Zanussi, linux-kernel

Hi Masami,

On Thu, Dec 16, 2021 at 11:02 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> Hi Jeff,
>
> On Tue, 30 Nov 2021 00:49:49 +0800
> Jeff Xie <xiehuan09@gmail.com> wrote:
>
> > Add a short description of the use of the objtrace trigger.
>
> Please merge this into [1/5] and [2/5], because the change that adds
> a new feature and the change that makes the kernel indicate that
> feature's existence must be atomic.
>
> My recommendation is
>
> - add the following line in the first patch
>
>         "\t            objtrace:add:obj[:count][if <filter>]\n"
>
> - and update it as follows in the next patch, according to the changes
>
>         "\t            objtrace:add:obj[,offset][:type][:count][if <filter>]\n"

I will merge them, Thanks.

>
> Thank you!
>
>
> >
> > Signed-off-by: Jeff Xie <xiehuan09@gmail.com>
> > ---
> >  kernel/trace/trace.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index 88de94da596b..a84b6813769e 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -5590,6 +5590,9 @@ static const char readme_msg[] =
> >       "\t            enable_hist:<system>:<event>\n"
> >       "\t            disable_hist:<system>:<event>\n"
> >  #endif
> > +#ifdef CONFIG_TRACE_OBJECT
> > +     "\t            objtrace:add:obj[,offset][:type][:count][if <filter>]\n"
> > +#endif
> >  #ifdef CONFIG_STACKTRACE
> >       "\t\t    stacktrace\n"
> >  #endif
> > --
> > 2.25.1
> >
>
>
> --
> Masami Hiramatsu <mhiramat@kernel.org>
---
JeffXie

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH v6 1/5] trace: Add trace any kernel object
  2021-11-29 16:49 ` [RFC][PATCH v6 1/5] trace: Add trace any " Jeff Xie
  2021-11-29 23:05     ` kernel test robot
@ 2021-12-17  4:51   ` Masami Hiramatsu
  2021-12-17 16:32     ` Jeff Xie
  1 sibling, 1 reply; 26+ messages in thread
From: Masami Hiramatsu @ 2021-12-17  4:51 UTC (permalink / raw)
  To: Jeff Xie; +Cc: rostedt, mhiramat, mingo, zanussi, linux-kernel

On Tue, 30 Nov 2021 00:49:47 +0800
Jeff Xie <xiehuan09@gmail.com> wrote:

> Introduce objtrace trigger to get the execution flow by tracing
> any kernel object in the kernel parameter.

Here, please add a simple usage example (no need to include the output),
since only this comment will be visible in the git log after merging.
If you run `git log -- kernel/trace/` and search for "tracing: Add ",
you can see more examples of adding a new feature. ;)

What do you think of this?
-----
Introduce objtrace trigger to get the call flow by tracing
any kernel object in the function parameters.

The objtrace trigger makes a list of target object addresses from
the given event parameter, and records all kernel function calls
which have one of those object addresses in their parameters.

Users can easily identify the kernel call flow related to the object.

Syntax:
...

Usage:
...
-----

> 
> Signed-off-by: Jeff Xie <xiehuan09@gmail.com>
> ---
>  include/linux/trace_events.h        |   1 +
>  kernel/trace/Kconfig                |  10 +
>  kernel/trace/Makefile               |   1 +
>  kernel/trace/trace.h                |   8 +
>  kernel/trace/trace_entries.h        |  17 +
>  kernel/trace/trace_events_trigger.c |   1 +
>  kernel/trace/trace_object.c         | 534 ++++++++++++++++++++++++++++
>  kernel/trace/trace_output.c         |  40 +++
>  8 files changed, 612 insertions(+)
>  create mode 100644 kernel/trace/trace_object.c
> 
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 2d167ac3452c..54baec78ce62 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -684,6 +684,7 @@ enum event_trigger_type {
>  	ETT_EVENT_HIST		= (1 << 4),
>  	ETT_HIST_ENABLE		= (1 << 5),
>  	ETT_EVENT_EPROBE	= (1 << 6),
> +	ETT_TRACE_OBJECT	= (1 << 7),
>  };
>  
>  extern int filter_match_preds(struct event_filter *filter, void *rec);
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index 420ff4bc67fd..f69761cbfaaa 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -237,6 +237,16 @@ config FUNCTION_PROFILER
>  
>  	  If in doubt, say N.
>  
> +config TRACE_OBJECT
> +	bool "Trace kernel object in function parameter"
> +	depends on FUNCTION_TRACER
> +	depends on HAVE_FUNCTION_ARG_ACCESS_API
> +	select TRACING
> +	default y
> +	help
> +	 You can trace the kernel object in the kernel function parameter.
> +	 The kernel object is dynamically specified via event trigger.
> +
>  config STACK_TRACER
>  	bool "Trace max stack"
>  	depends on HAVE_FUNCTION_TRACER
> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index bedc5caceec7..b924b8e55922 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -67,6 +67,7 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
>  obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
>  obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
>  obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += fgraph.o
> +obj-$(CONFIG_TRACE_OBJECT) += trace_object.o
>  ifeq ($(CONFIG_BLOCK),y)
>  obj-$(CONFIG_EVENT_TRACING) += blktrace.o
>  endif
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 38715aa6cfdf..46a1301ac47a 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -54,6 +54,7 @@ enum trace_type {
>  	TRACE_TIMERLAT,
>  	TRACE_RAW_DATA,
>  	TRACE_FUNC_REPEATS,
> +	TRACE_OBJECT,
>  
>  	__TRACE_LAST_TYPE,
>  };
> @@ -469,6 +470,7 @@ extern void __ftrace_bad_type(void);
>  			  TRACE_GRAPH_RET);		\
>  		IF_ASSIGN(var, ent, struct func_repeats_entry,		\
>  			  TRACE_FUNC_REPEATS);				\
> +		IF_ASSIGN(var, ent, struct trace_object_entry, TRACE_OBJECT);\
>  		__ftrace_bad_type();					\
>  	} while (0)
>  
> @@ -1531,6 +1533,12 @@ static inline int register_trigger_hist_cmd(void) { return 0; }
>  static inline int register_trigger_hist_enable_disable_cmds(void) { return 0; }
>  #endif
>  
> +#ifdef CONFIG_TRACE_OBJECT
> +extern int register_trigger_object_cmd(void);
> +#else
> +static inline int register_trigger_object_cmd(void) { return 0; }
> +#endif
> +
>  extern int register_trigger_cmds(void);
>  extern void clear_event_triggers(struct trace_array *tr);
>  
> diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> index cd41e863b51c..bb120d9498a9 100644
> --- a/kernel/trace/trace_entries.h
> +++ b/kernel/trace/trace_entries.h
> @@ -401,3 +401,20 @@ FTRACE_ENTRY(timerlat, timerlat_entry,
>  		 __entry->context,
>  		 __entry->timer_latency)
>  );
> +
> +/*
> + * trace object entry:
> + */
> +FTRACE_ENTRY(object, trace_object_entry,
> +
> +	TRACE_OBJECT,
> +
> +	F_STRUCT(
> +		__field(	unsigned long,		ip		)
> +		__field(	unsigned long,		parent_ip	)
> +		__field(	unsigned long,		object		)
> +	),
> +
> +	F_printk(" %ps <-- %ps object:%lx\n",
> +		 (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
> +);
> diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> index 3d5c07239a2a..da2d871fa809 100644
> --- a/kernel/trace/trace_events_trigger.c
> +++ b/kernel/trace/trace_events_trigger.c
> @@ -1687,6 +1687,7 @@ __init int register_trigger_cmds(void)
>  	register_trigger_enable_disable_cmds();
>  	register_trigger_hist_enable_disable_cmds();
>  	register_trigger_hist_cmd();
> +	register_trigger_object_cmd();
>  
>  	return 0;
>  }
> diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
> new file mode 100644
> index 000000000000..83fccd587b63
> --- /dev/null
> +++ b/kernel/trace/trace_object.c
> @@ -0,0 +1,534 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * trace the kernel object in the kernel function parameter
> + * Copyright (C) 2021 Jeff Xie <xiehuan09@gmail.com>
> + */
> +
> +#define pr_fmt(fmt) "trace_object: " fmt
> +
> +#include "trace_output.h"
> +#include <linux/freelist.h>
> +
> +static DEFINE_PER_CPU(atomic_t, trace_object_event_disable);

atomic_t is for atomic operations on data shared among CPUs. On the
other hand, a per-cpu variable is core-local storage or flags that
other CPUs never touch. Thus a per-cpu atomic_t is very strange.
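
For example (just a sketch of what I mean; it assumes preemption stays
disabled around the callback, as in this patch), a plain per-cpu int is
enough for such a recursion guard, and it needs no atomic op:

static DEFINE_PER_CPU(int, trace_object_event_disable);	/* plain int */

	preempt_disable_notrace();
	/* only this CPU touches the counter, no atomic op is needed */
	if (this_cpu_inc_return(trace_object_event_disable) != 1)
		goto out;
	...
out:
	this_cpu_dec(trace_object_event_disable);
	preempt_enable_notrace();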

> +static DEFINE_RAW_SPINLOCK(object_spin_lock);
> +static struct trace_event_file event_trace_file;
> +static const int max_args_num = 6;
> +static const int max_obj_pool = 10;
> +static atomic_t trace_object_ref;
> +static int exit_trace_object(void);
> +static int init_trace_object(void);
> +

Please add more comments to the code itself. Explain why this is needed
and how it works in each case. That will lead to deeper understanding.

> +struct object_instance {
> +	void *object;
> +	struct freelist_node free_list;
> +	struct list_head active_list;
> +};
> +
> +struct obj_pool {
> +	struct freelist_head free_list;
> +	struct list_head active_list;
> +};
> +static struct obj_pool *obj_pool;
> +
> +static bool object_exist(void *obj)
> +{
> +	struct object_instance *inst;
> +	bool ret = false;
> +
> +	list_for_each_entry_rcu(inst, &obj_pool->active_list, active_list) {
> +		if (inst->object == obj) {
> +			ret = true;
> +			goto out;
> +		}
> +	}
> +out:
> +	return ret;
> +}
> +
> +static bool object_empty(void)
> +{
> +	return list_empty(&obj_pool->active_list);
> +}
> +
> +static void set_trace_object(void *obj)
> +{
> +	struct freelist_node *fn;
> +	struct object_instance *ins;
> +	unsigned long flags;
> +
> +	if (in_nmi())
> +		return;
> +
> +	if (!obj)
> +		return;
> +
> +	if (object_exist(obj))
> +		return;
> +
> +	fn = freelist_try_get(&obj_pool->free_list);
> +	if (!fn) {
> +		trace_printk("object_pool is full, can't trace object:0x%px\n", obj);
> +		return;
> +	}
> +
> +	ins = container_of(fn, struct object_instance, free_list);
> +	ins->object = obj;
> +
> +	raw_spin_lock_irqsave(&object_spin_lock, flags);
> +	list_add_rcu(&ins->active_list, &obj_pool->active_list);
> +	raw_spin_unlock_irqrestore(&object_spin_lock, flags);

Please add a comment explaining why this spinlock is needed here and
why the other operations don't need it.

> +}
> +
> +static inline void free_free_list_objects(struct freelist_head *head)
> +{
> +
> +	struct object_instance *inst;
> +	struct freelist_node *node = head->head;
> +
> +	while (node) {
> +		inst = container_of(node, struct object_instance, free_list);
> +		node = node->next;
> +		kfree(inst);

Note that this is safe only if this free_list is no longer accessed.

> +	}
> +}
> +
> +static inline void free_active_list_objects(struct list_head *head)
> +{
> +	struct object_instance *inst;
> +
> +	list_for_each_entry_rcu(inst, head, active_list)
> +		kfree(inst);

No, you cannot do this, since the 'inst' is still on the active_list.
Before freeing the 'inst', you need to do list_del_rcu().
Moreover, the 'inst' might still be accessed from the callbacks,
so you cannot free it directly.

RCU ensures that looping over a list can be done safely before and after
removing/adding a node, but it does not ensure the node is immediately
freeable, because someone may still be looking at it.

I guess you may expect that all callbacks are stopped after calling

unregister_ftrace_function(&trace_ops);

but that is wrong. It does not synchronize with the callbacks (i.e. wait
for all running callbacks to finish).
So you should call synchronize_rcu() after it.
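
Just to illustrate the ordering I mean (a sketch; in recent kernels a
synchronize_rcu() grace period also waits for preempt-disabled regions,
which is what protects the list walkers here):

	unregister_ftrace_function(&trace_ops);
	/* wait until every callback that may walk the lists has returned */
	synchronize_rcu();
	/* now nothing can touch the lists, freeing them is safe */
	free_object_pool();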

> +}
> +
> +static inline void free_object_pool(void)
> +{
> +	free_free_list_objects(&obj_pool->free_list);
> +	free_active_list_objects(&obj_pool->active_list);

BTW, did you also ensure that all the trigger callbacks have finished?
set_trace_object() has a small time window in which the object_instance
is linked to neither the free_list nor the active_list.
Thus you have to make sure that that function is never running while
this one runs. (How? Please write a comment about it.)

> +	kfree(obj_pool);
> +}
> +
> +
> +static int init_object_pool(void)
> +{
> +	struct object_instance *inst;
> +	int i, ret = 0;
> +
> +	obj_pool = kzalloc(sizeof(*obj_pool), GFP_KERNEL);
> +	if (!obj_pool) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	INIT_LIST_HEAD(&obj_pool->active_list);
> +
> +	for (i = 0; i < max_obj_pool; i++) {
> +		inst = kzalloc(sizeof(*inst), GFP_KERNEL);
> +		if (!inst) {
> +			free_object_pool();
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +		freelist_add(&inst->free_list, &obj_pool->free_list);
> +	}
> +out:
> +	return ret;
> +}
> +
> +static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> +				 unsigned long object)
> +{
> +
> +	struct trace_buffer *buffer;
> +	struct ring_buffer_event *event;
> +	struct trace_object_entry *entry;
> +	int pc;
> +
> +	pc = preempt_count();
> +	event = trace_event_buffer_lock_reserve(&buffer, &event_trace_file,
> +			TRACE_OBJECT, sizeof(*entry), pc);
> +	if (!event)
> +		return;
> +	entry   = ring_buffer_event_data(event);
> +	entry->ip                       = ip;
> +	entry->parent_ip                = parent_ip;
> +	entry->object			= object;
> +
> +	event_trigger_unlock_commit(&event_trace_file, buffer, event,
> +		entry, pc);
> +}
> +
> +static void
> +trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> +		struct ftrace_ops *op, struct ftrace_regs *fregs)
> +{
> +	struct pt_regs *pt_regs = ftrace_get_regs(fregs);
> +	unsigned long obj;
> +	long disabled;
> +	int cpu, n;
> +
> +	preempt_disable_notrace();
> +
> +	cpu = raw_smp_processor_id();
> +	disabled = atomic_inc_return(&per_cpu(trace_object_event_disable, cpu));

So what is the purpose of this check? (Is there any issue if the same
CPU reenters here?)

> +
> +	if (disabled != 1)
> +		goto out;
> +
> +	if (object_empty())
> +		goto out;
> +
> +	for (n = 0; n < max_args_num; n++) {
> +		obj = regs_get_kernel_argument(pt_regs, n);
> +		if (object_exist((void *)obj))
> +			submit_trace_object(ip, parent_ip, obj);

Please add a comment explaining why you don't break out of the loop
after finding a traced object here.

> +	}
> +
> +out:
> +	atomic_dec(&per_cpu(trace_object_event_disable, cpu));
> +	preempt_enable_notrace();
> +}
> +
> +static struct ftrace_ops trace_ops = {
> +	.func  = trace_object_events_call,
> +	.flags = FTRACE_OPS_FL_SAVE_REGS,
> +};
> +
> +static void
> +trace_object_trigger(struct event_trigger_data *data,
> +		   struct trace_buffer *buffer,  void *rec,
> +		   struct ring_buffer_event *event)
> +{
> +
> +	struct ftrace_event_field *field = data->private_data;
> +	void *obj = NULL;
> +
> +	memcpy(&obj, rec + field->offset, sizeof(obj));
> +	set_trace_object(obj);
> +}
> +

/* This is clone of event_trigger_free(). */

> +static void
> +trace_object_trigger_free(struct event_trigger_ops *ops,
> +		   struct event_trigger_data *data)
> +{
> +	if (WARN_ON_ONCE(data->ref <= 0))
> +		return;
> +
> +	data->ref--;
> +	if (!data->ref)
> +		trigger_data_free(data);
> +}
> +
> +static void
> +trace_object_count_trigger(struct event_trigger_data *data,
> +			 struct trace_buffer *buffer, void *rec,
> +			 struct ring_buffer_event *event)
> +{
> +	if (!data->count)
> +		return;
> +
> +	if (data->count != -1)
> +		(data->count)--;
> +
> +	trace_object_trigger(data, buffer, rec, event);
> +}
> +

/* This is the clone of event_trigger_init() */

> +static int event_object_trigger_init(struct event_trigger_ops *ops,
> +		       struct event_trigger_data *data)
> +{
> +	data->ref++;
> +	return 0;
> +}


At this moment, I think cloning those functions is OK. But once this is
merged and well tested, the trigger code should expose these functions
and this code should use them.


BTW, it seems you are sharing the traced object active_list
(obj_pool::active_list) among all objtrace triggers. This means that if
a user sets an objtrace trigger on two events and then removes it from
one of them, the active_list still holds the objects recorded via the
removed event. If that is correct, I think this should be noted in the
document. Something like:

Note that currently this is not designed to trace objects of different
types at the same time. You must set an objtrace trigger on only one
event, or carefully choose objects of the same type when you set
objtrace on different events. Also, the traced object list is not
cleared unless all triggers are removed.


> +
> +static int
> +event_trigger_print(const char *name, struct seq_file *m,
> +		    void *data, char *filter_str)
> +{
> +	long count = (long)data;
> +
> +	seq_puts(m, name);
> +
> +	if (count == -1)
> +		seq_puts(m, ":unlimited");
> +	else
> +		seq_printf(m, ":count=%ld", count);
> +
> +	if (filter_str)
> +		seq_printf(m, " if %s\n", filter_str);
> +	else
> +		seq_putc(m, '\n');
> +
> +	return 0;
> +}
> +
> +static int
> +trace_object_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
> +			 struct event_trigger_data *data)
> +{
> +	return event_trigger_print("objtrace", m, (void *)data->count,
> +				   data->filter_str);
> +}
> +
> +static struct event_trigger_ops objecttrace_trigger_ops = {
> +	.func			= trace_object_trigger,
> +	.print			= trace_object_trigger_print,
> +	.init			= event_object_trigger_init,
> +	.free			= trace_object_trigger_free,
> +};
> +
> +static struct event_trigger_ops objecttrace_count_trigger_ops = {
> +	.func			= trace_object_count_trigger,
> +	.print			= trace_object_trigger_print,
> +	.init			= event_object_trigger_init,
> +	.free			= trace_object_trigger_free,
> +};
> +
> +static struct event_trigger_ops *
> +objecttrace_get_trigger_ops(char *cmd, char *param)
> +{
> +	return param ? &objecttrace_count_trigger_ops : &objecttrace_trigger_ops;
> +}
> +
> +static int register_object_trigger(char *glob, struct event_trigger_ops *ops,
> +			    struct event_trigger_data *data,
> +			    struct trace_event_file *file)
> +{
> +	struct event_trigger_data *test;
> +	int ret = 0;
> +
> +	lockdep_assert_held(&event_mutex);
> +
> +	list_for_each_entry(test, &file->triggers, list) {
> +		if (test->cmd_ops->trigger_type == data->cmd_ops->trigger_type) {
> +			ret = -EEXIST;
> +			goto out;
> +		}
> +	}
> +
> +	if (data->ops->init) {
> +		ret = data->ops->init(data->ops, data);
> +		if (ret < 0)
> +			goto out;
> +	}
> +
> +	list_add_rcu(&data->list, &file->triggers);
> +	ret++;
> +
> +	update_cond_flag(file);
> +	if (trace_event_trigger_enable_disable(file, 1) < 0) {
> +		list_del_rcu(&data->list);
> +		update_cond_flag(file);
> +		ret--;
> +	}
> +	init_trace_object();
> +out:
> +	return ret;
> +}
> +
> +static void unregister_object_trigger(char *glob, struct event_trigger_ops *ops,
> +			       struct event_trigger_data *test,
> +			       struct trace_event_file *file)
> +{
> +	struct event_trigger_data *data;
> +	bool unregistered = false;
> +
> +	lockdep_assert_held(&event_mutex);
> +
> +	list_for_each_entry(data, &file->triggers, list) {
> +		if (data->cmd_ops->trigger_type == test->cmd_ops->trigger_type) {
> +			unregistered = true;
> +			list_del_rcu(&data->list);
> +			trace_event_trigger_enable_disable(file, 0);
> +			update_cond_flag(file);
> +			break;
> +		}
> +	}
> +
> +	if (unregistered && data->ops->free) {
> +		data->ops->free(data->ops, data);
> +		exit_trace_object();
> +	}

Shouldn't this be the following? Otherwise exit_trace_object() is
skipped when data->ops->free == NULL:

	if (unregistered) {
		if (data->ops->free)
			data->ops->free(data->ops, data);
		exit_trace_object();
	}

Thank you,


> +}
> +
> +static int
> +event_object_trigger_callback(struct event_command *cmd_ops,
> +		       struct trace_event_file *file,
> +		       char *glob, char *cmd, char *param)
> +{
> +	struct event_trigger_data *trigger_data;
> +	struct event_trigger_ops *trigger_ops;
> +	struct trace_event_call *call;
> +	struct ftrace_event_field *field;
> +	char *objtrace_cmd;
> +	char *trigger = NULL;
> +	char *arg;
> +	char *number;
> +	int ret;
> +
> +	ret = -EINVAL;
> +	if (!param)
> +		goto out;
> +
> +	/* separate the trigger from the filter (c:a:n [if filter]) */
> +	trigger = strsep(&param, " \t");
> +	if (!trigger)
> +		goto out;
> +	if (param) {
> +		param = skip_spaces(param);
> +		if (!*param)
> +			param = NULL;
> +	}
> +
> +	objtrace_cmd = strsep(&trigger, ":");
> +	if (!objtrace_cmd || strcmp(objtrace_cmd, "add"))
> +		goto out;
> +
> +	arg = strsep(&trigger, ":");
> +	if (!arg)
> +		goto out;
> +	call = file->event_call;
> +	field = trace_find_event_field(call, arg);
> +	if (!field)
> +		goto out;
> +
> +	if (field->size != sizeof(void *))
> +		goto out;
> +
> +	trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
> +
> +	ret = -ENOMEM;
> +	trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
> +	if (!trigger_data)
> +		goto out;
> +
> +	trigger_data->count = -1;
> +	trigger_data->ops = trigger_ops;
> +	trigger_data->cmd_ops = cmd_ops;
> +	trigger_data->private_data = field;
> +	INIT_LIST_HEAD(&trigger_data->list);
> +	INIT_LIST_HEAD(&trigger_data->named_list);
> +
> +	if (glob[0] == '!') {
> +		cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
> +		kfree(trigger_data);
> +		ret = 0;
> +		goto out;
> +	}
> +
> +	if (trigger) {
> +		number = strsep(&trigger, ":");
> +
> +		ret = -EINVAL;
> +		if (!strlen(number))
> +			goto out_free;
> +
> +		/*
> +		 * We use the callback data field (which is a pointer)
> +		 * as our counter.
> +		 */
> +		ret = kstrtoul(number, 0, &trigger_data->count);
> +		if (ret)
> +			goto out_free;
> +	}
> +
> +	if (!param) /* if param is non-empty, it's supposed to be a filter */
> +		goto out_reg;
> +
> +	if (!cmd_ops->set_filter)
> +		goto out_reg;
> +
> +	ret = cmd_ops->set_filter(param, trigger_data, file);
> +	if (ret < 0)
> +		goto out_free;
> +
> + out_reg:
> +	/* Up the trigger_data count to make sure reg doesn't free it on failure */
> +	event_object_trigger_init(trigger_ops, trigger_data);
> +	ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
> +	/*
> +	 * The above returns on success the # of functions enabled,
> +	 * but if it didn't find any functions it returns zero.
> +	 * Consider no functions a failure too.
> +	 */
> +	if (!ret) {
> +		cmd_ops->unreg(glob, trigger_ops, trigger_data, file);
> +		ret = -ENOENT;
> +	} else if (ret > 0)
> +		ret = 0;
> +
> +	/* Down the counter of trigger_data or free it if not used anymore */
> +	trace_object_trigger_free(trigger_ops, trigger_data);
> + out:
> +	return ret;
> +
> + out_free:
> +	if (cmd_ops->set_filter)
> +		cmd_ops->set_filter(NULL, trigger_data, NULL);
> +	kfree(trigger_data);
> +	goto out;
> +}
> +
> +static struct event_command trigger_object_cmd = {
> +	.name			= "objtrace",
> +	.trigger_type		= ETT_TRACE_OBJECT,
> +	.flags			= EVENT_CMD_FL_NEEDS_REC,
> +	.func			= event_object_trigger_callback,
> +	.reg			= register_object_trigger,
> +	.unreg			= unregister_object_trigger,
> +	.get_trigger_ops	= objecttrace_get_trigger_ops,
> +	.set_filter		= set_trigger_filter,
> +};
> +
> +__init int register_trigger_object_cmd(void)
> +{
> +	int ret;
> +
> +	ret = register_event_command(&trigger_object_cmd);
> +	WARN_ON(ret < 0);
> +
> +	return ret;
> +}
> +
> +static int init_trace_object(void)
> +{
> +	int ret;
> +
> +	if (atomic_inc_return(&trace_object_ref) != 1) {
> +		ret = 0;
> +		goto out;
> +	}
> +
> +	ret = init_object_pool();
> +	if (ret)
> +		goto out;
> +
> +	event_trace_file.tr = top_trace_array();
> +	if (WARN_ON(!event_trace_file.tr)) {
> +		ret = -1;
> +		goto out;
> +	}
> +	ret = register_ftrace_function(&trace_ops);
> +out:
> +	return ret;
> +}
> +
> +static int exit_trace_object(void)
> +{
> +	int ret;
> +
> +	if (WARN_ON_ONCE(atomic_read(&trace_object_ref) <= 0))
> +		goto out;
> +
> +	if (atomic_dec_return(&trace_object_ref) != 0) {
> +		ret = 0;
> +		goto out;
> +	}
> +
> +	ret = unregister_ftrace_function(&trace_ops);
> +	if (ret) {
> +		pr_err("can't unregister ftrace for trace object\n");
> +		goto out;
> +	}
> +	free_object_pool();
> +out:
> +	return ret;
> +}
> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index 3547e7176ff7..d747aed27104 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -1543,6 +1543,45 @@ static struct trace_event trace_func_repeats_event = {
>  	.funcs		= &trace_func_repeats_funcs,
>  };
>  
> +/* TRACE_OBJECT */
> +static enum print_line_t trace_object_print(struct trace_iterator *iter, int flags,
> +					struct trace_event *event)
> +{
> +	struct trace_object_entry *field;
> +	struct trace_seq *s = &iter->seq;
> +
> +	trace_assign_type(field, iter->ent);
> +	print_fn_trace(s, field->ip, field->parent_ip, flags);
> +	trace_seq_printf(s, " object:0x%lx", field->object);
> +	trace_seq_putc(s, '\n');
> +
> +	return trace_handle_return(s);
> +}
> +
> +static enum print_line_t trace_object_raw(struct trace_iterator *iter, int flags,
> +				      struct trace_event *event)
> +{
> +	struct trace_object_entry *field;
> +
> +	trace_assign_type(field, iter->ent);
> +
> +	trace_seq_printf(&iter->seq, "%lx %lx\n",
> +			 field->ip,
> +			 field->parent_ip);
> +
> +	return trace_handle_return(&iter->seq);
> +}
> +
> +static struct trace_event_functions trace_object_funcs = {
> +	.trace		= trace_object_print,
> +	.raw		= trace_object_raw,
> +};
> +
> +static struct trace_event trace_object_event = {
> +	.type		= TRACE_OBJECT,
> +	.funcs		= &trace_object_funcs,
> +};
> +
>  static struct trace_event *events[] __initdata = {
>  	&trace_fn_event,
>  	&trace_ctx_event,
> @@ -1557,6 +1596,7 @@ static struct trace_event *events[] __initdata = {
>  	&trace_timerlat_event,
>  	&trace_raw_data_event,
>  	&trace_func_repeats_event,
> +	&trace_object_event,
>  	NULL
>  };
>  
> -- 
> 2.25.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH v6 1/5] trace: Add trace any kernel object
  2021-12-17  4:51   ` Masami Hiramatsu
@ 2021-12-17 16:32     ` Jeff Xie
  2021-12-19  3:07       ` Masami Hiramatsu
  2021-12-21  7:36       ` Masami Hiramatsu
  0 siblings, 2 replies; 26+ messages in thread
From: Jeff Xie @ 2021-12-17 16:32 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Steven Rostedt, mingo, Tom Zanussi, linux-kernel

Hi Masami,

Thanks for your review.

On Fri, Dec 17, 2021 at 12:51 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Tue, 30 Nov 2021 00:49:47 +0800
> Jeff Xie <xiehuan09@gmail.com> wrote:
>
> > Introduce objtrace trigger to get the execution flow by tracing
> > any kernel object in the kernel parameter.
>
> Here, please add a simple usage example (no need to include the output),
> since only this comment will be visible in the git log after merging.
> If you run `git log -- kernel/trace/` and search for "tracing: Add ",
> you can see more examples of adding a new feature. ;)
>
> What do you think of this?
> -----
> Introduce objtrace trigger to get the call flow by tracing
> any kernel object in the function parameters.
>
> The objtrace trigger makes a list of target object addresses from
> the given event parameter, and records all kernel function calls
> which have one of those object addresses in their parameters.
>
> Users can easily identify the kernel call flow related to the object.
>
> Syntax:
> ...
>
> Usage:
> ...

I agree, I will add it.

> -----
>
> >
> > Signed-off-by: Jeff Xie <xiehuan09@gmail.com>
> > ---
> >  include/linux/trace_events.h        |   1 +
> >  kernel/trace/Kconfig                |  10 +
> >  kernel/trace/Makefile               |   1 +
> >  kernel/trace/trace.h                |   8 +
> >  kernel/trace/trace_entries.h        |  17 +
> >  kernel/trace/trace_events_trigger.c |   1 +
> >  kernel/trace/trace_object.c         | 534 ++++++++++++++++++++++++++++
> >  kernel/trace/trace_output.c         |  40 +++
> >  8 files changed, 612 insertions(+)
> >  create mode 100644 kernel/trace/trace_object.c
> >
> > diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> > index 2d167ac3452c..54baec78ce62 100644
> > --- a/include/linux/trace_events.h
> > +++ b/include/linux/trace_events.h
> > @@ -684,6 +684,7 @@ enum event_trigger_type {
> >       ETT_EVENT_HIST          = (1 << 4),
> >       ETT_HIST_ENABLE         = (1 << 5),
> >       ETT_EVENT_EPROBE        = (1 << 6),
> > +     ETT_TRACE_OBJECT        = (1 << 7),
> >  };
> >
> >  extern int filter_match_preds(struct event_filter *filter, void *rec);
> > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > index 420ff4bc67fd..f69761cbfaaa 100644
> > --- a/kernel/trace/Kconfig
> > +++ b/kernel/trace/Kconfig
> > @@ -237,6 +237,16 @@ config FUNCTION_PROFILER
> >
> >         If in doubt, say N.
> >
> > +config TRACE_OBJECT
> > +     bool "Trace kernel object in function parameter"
> > +     depends on FUNCTION_TRACER
> > +     depends on HAVE_FUNCTION_ARG_ACCESS_API
> > +     select TRACING
> > +     default y
> > +     help
> > +      You can trace the kernel object in the kernel function parameter.
> > +      The kernel object is dynamically specified via event trigger.
> > +
> >  config STACK_TRACER
> >       bool "Trace max stack"
> >       depends on HAVE_FUNCTION_TRACER
> > diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> > index bedc5caceec7..b924b8e55922 100644
> > --- a/kernel/trace/Makefile
> > +++ b/kernel/trace/Makefile
> > @@ -67,6 +67,7 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
> >  obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
> >  obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
> >  obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += fgraph.o
> > +obj-$(CONFIG_TRACE_OBJECT) += trace_object.o
> >  ifeq ($(CONFIG_BLOCK),y)
> >  obj-$(CONFIG_EVENT_TRACING) += blktrace.o
> >  endif
> > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > index 38715aa6cfdf..46a1301ac47a 100644
> > --- a/kernel/trace/trace.h
> > +++ b/kernel/trace/trace.h
> > @@ -54,6 +54,7 @@ enum trace_type {
> >       TRACE_TIMERLAT,
> >       TRACE_RAW_DATA,
> >       TRACE_FUNC_REPEATS,
> > +     TRACE_OBJECT,
> >
> >       __TRACE_LAST_TYPE,
> >  };
> > @@ -469,6 +470,7 @@ extern void __ftrace_bad_type(void);
> >                         TRACE_GRAPH_RET);             \
> >               IF_ASSIGN(var, ent, struct func_repeats_entry,          \
> >                         TRACE_FUNC_REPEATS);                          \
> > +             IF_ASSIGN(var, ent, struct trace_object_entry, TRACE_OBJECT);\
> >               __ftrace_bad_type();                                    \
> >       } while (0)
> >
> > @@ -1531,6 +1533,12 @@ static inline int register_trigger_hist_cmd(void) { return 0; }
> >  static inline int register_trigger_hist_enable_disable_cmds(void) { return 0; }
> >  #endif
> >
> > +#ifdef CONFIG_TRACE_OBJECT
> > +extern int register_trigger_object_cmd(void);
> > +#else
> > +static inline int register_trigger_object_cmd(void) { return 0; }
> > +#endif
> > +
> >  extern int register_trigger_cmds(void);
> >  extern void clear_event_triggers(struct trace_array *tr);
> >
> > diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> > index cd41e863b51c..bb120d9498a9 100644
> > --- a/kernel/trace/trace_entries.h
> > +++ b/kernel/trace/trace_entries.h
> > @@ -401,3 +401,20 @@ FTRACE_ENTRY(timerlat, timerlat_entry,
> >                __entry->context,
> >                __entry->timer_latency)
> >  );
> > +
> > +/*
> > + * trace object entry:
> > + */
> > +FTRACE_ENTRY(object, trace_object_entry,
> > +
> > +     TRACE_OBJECT,
> > +
> > +     F_STRUCT(
> > +             __field(        unsigned long,          ip              )
> > +             __field(        unsigned long,          parent_ip       )
> > +             __field(        unsigned long,          object          )
> > +     ),
> > +
> > +     F_printk(" %ps <-- %ps object:%lx\n",
> > +              (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
> > +);
> > diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> > index 3d5c07239a2a..da2d871fa809 100644
> > --- a/kernel/trace/trace_events_trigger.c
> > +++ b/kernel/trace/trace_events_trigger.c
> > @@ -1687,6 +1687,7 @@ __init int register_trigger_cmds(void)
> >       register_trigger_enable_disable_cmds();
> >       register_trigger_hist_enable_disable_cmds();
> >       register_trigger_hist_cmd();
> > +     register_trigger_object_cmd();
> >
> >       return 0;
> >  }
> > diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
> > new file mode 100644
> > index 000000000000..83fccd587b63
> > --- /dev/null
> > +++ b/kernel/trace/trace_object.c
> > @@ -0,0 +1,534 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * trace the kernel object in the kernel function parameter
> > + * Copyright (C) 2021 Jeff Xie <xiehuan09@gmail.com>
> > + */
> > +
> > +#define pr_fmt(fmt) "trace_object: " fmt
> > +
> > +#include "trace_output.h"
> > +#include <linux/freelist.h>
> > +
> > +static DEFINE_PER_CPU(atomic_t, trace_object_event_disable);
>
> atomic_t is for atomic operations on data shared among CPUs. On the
> other hand, a per-cpu variable is core-local storage or flags that
> other CPUs never touch. Thus a per-cpu atomic_t is very strange.
>

Since patch v1, I have cloned it from function_test_events_call()
in kernel/trace/trace_events.c:

commit: 9ea21c1ecdb35ecdcac5fd9d95f62a1f6a7ffec0
tracing/events: perform function tracing in event selftests
Author:     Steven Rostedt <srostedt@redhat.com>

It should be there to prevent the callback from being re-entered from
interrupt context while it is already running on the same CPU.

> > +static DEFINE_RAW_SPINLOCK(object_spin_lock);
> > +static struct trace_event_file event_trace_file;
> > +static const int max_args_num = 6;
> > +static const int max_obj_pool = 10;
> > +static atomic_t trace_object_ref;
> > +static int exit_trace_object(void);
> > +static int init_trace_object(void);
> > +
>
> Please add more comments to the code itself. Explain why this is needed
> and how it works in each case. That will lead to deeper understanding.
>

I agree, I will add more comments in the next version.

> > +struct object_instance {
> > +     void *object;
> > +     struct freelist_node free_list;
> > +     struct list_head active_list;
> > +};
> > +
> > +struct obj_pool {
> > +     struct freelist_head free_list;
> > +     struct list_head active_list;
> > +};
> > +static struct obj_pool *obj_pool;
> > +
> > +static bool object_exist(void *obj)
> > +{
> > +     struct object_instance *inst;
> > +     bool ret = false;
> > +
> > +     list_for_each_entry_rcu(inst, &obj_pool->active_list, active_list) {
> > +             if (inst->object == obj) {
> > +                     ret = true;
> > +                     goto out;
> > +             }
> > +     }
> > +out:
> > +     return ret;
> > +}
> > +
> > +static bool object_empty(void)
> > +{
> > +     return list_empty(&obj_pool->active_list);
> > +}
> > +
> > +static void set_trace_object(void *obj)
> > +{
> > +     struct freelist_node *fn;
> > +     struct object_instance *ins;
> > +     unsigned long flags;
> > +
> > +     if (in_nmi())
> > +             return;
> > +
> > +     if (!obj)
> > +             return;
> > +
> > +     if (object_exist(obj))
> > +             return;
> > +
> > +     fn = freelist_try_get(&obj_pool->free_list);
> > +     if (!fn) {
> > +             trace_printk("object_pool is full, can't trace object:0x%px\n", obj);
> > +             return;
> > +     }
> > +
> > +     ins = container_of(fn, struct object_instance, free_list);
> > +     ins->object = obj;
> > +
> > +     raw_spin_lock_irqsave(&object_spin_lock, flags);
> > +     list_add_rcu(&ins->active_list, &obj_pool->active_list);
> > +     raw_spin_unlock_irqrestore(&object_spin_lock, flags);
>
> Please add a comment explaining why this spinlock is needed here and
> why the other operations don't need it.

(Only this place modifies the list, and concurrent writers must be
excluded.)
I agree, I will add it.
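
Something like this, perhaps (just a draft of the comment):

	/*
	 * Readers walk active_list from the ftrace callback with
	 * preemption disabled and list_for_each_entry_rcu(), so they
	 * take no lock. This is the only place that modifies
	 * active_list, but triggers may fire concurrently on several
	 * CPUs and from interrupt context, so writers are serialized
	 * by an irq-safe spinlock.
	 */
	raw_spin_lock_irqsave(&object_spin_lock, flags);
	list_add_rcu(&ins->active_list, &obj_pool->active_list);
	raw_spin_unlock_irqrestore(&object_spin_lock, flags);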

> > +}
> > +
> > +static inline void free_free_list_objects(struct freelist_head *head)
> > +{
> > +
> > +     struct object_instance *inst;
> > +     struct freelist_node *node = head->head;
> > +
> > +     while (node) {
> > +             inst = container_of(node, struct object_instance, free_list);
> > +             node = node->next;
> > +             kfree(inst);
>
> Note that this is safe only if this free_list is no longer accessed.
>

Maybe it needs to use RCU to handle this; I will think about how to fix
it in the next version.

> > +     }
> > +}
> > +
> > +static inline void free_active_list_objects(struct list_head *head)
> > +{
> > +     struct object_instance *inst;
> > +
> > +     list_for_each_entry_rcu(inst, head, active_list)
> > +             kfree(inst);
>
> No, you cannot do this, since the 'inst' is still on the active_list.
> Before freeing the 'inst', you need to do list_del_rcu().
> Moreover, the 'inst' might still be accessed from the callbacks,
> so you cannot free it directly.
>
> RCU ensures that looping over a list can be done safely before and after
> removing/adding a node, but it does not ensure the node is immediately
> freeable, because someone may still be looking at it.
>
> I guess you may expect that all callbacks are stopped after calling

Yes you guessed it right ;-)

> unregister_ftrace_function(&trace_ops);
>
> but that is wrong. It does not synchronize with the callbacks (i.e. wait
> for all running callbacks to finish).
> So you should call synchronize_rcu() after it.

Thanks, I will fix it.
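
A rough sketch of what I have in mind (it assumes the ftrace callback
has already been unregistered and synchronize_rcu() has completed, so
nothing can walk the list anymore):

	struct object_instance *inst, *tmp;

	list_for_each_entry_safe(inst, tmp, &obj_pool->active_list, active_list) {
		list_del_rcu(&inst->active_list);
		kfree(inst);
	}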

> > +}
> > +
> > +static inline void free_object_pool(void)
> > +{
> > +     free_free_list_objects(&obj_pool->free_list);
> > +     free_active_list_objects(&obj_pool->active_list);
>
> BTW, did you also ensure that all the trigger callbacks have finished?
> set_trace_object() has a small time window in which the object_instance
> is linked to neither the free_list nor the active_list.
> Thus you have to make sure that that function is never running while
> this one runs. (How? Please write a comment about it.)

I overlooked this small time window; maybe RCU can handle it. I will
think about how to fix it.

> > +     kfree(obj_pool);
> > +}
> > +
> > +
> > +static int init_object_pool(void)
> > +{
> > +     struct object_instance *inst;
> > +     int i, ret = 0;
> > +
> > +     obj_pool = kzalloc(sizeof(*obj_pool), GFP_KERNEL);
> > +     if (!obj_pool) {
> > +             ret = -ENOMEM;
> > +             goto out;
> > +     }
> > +
> > +     INIT_LIST_HEAD(&obj_pool->active_list);
> > +
> > +     for (i = 0; i < max_obj_pool; i++) {
> > +             inst = kzalloc(sizeof(*inst), GFP_KERNEL);
> > +             if (!inst) {
> > +                     free_object_pool();
> > +                     ret = -ENOMEM;
> > +                     goto out;
> > +             }
> > +             freelist_add(&inst->free_list, &obj_pool->free_list);
> > +     }
> > +out:
> > +     return ret;
> > +}
> > +
> > +static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> > +                              unsigned long object)
> > +{
> > +
> > +     struct trace_buffer *buffer;
> > +     struct ring_buffer_event *event;
> > +     struct trace_object_entry *entry;
> > +     int pc;
> > +
> > +     pc = preempt_count();
> > +     event = trace_event_buffer_lock_reserve(&buffer, &event_trace_file,
> > +                     TRACE_OBJECT, sizeof(*entry), pc);
> > +     if (!event)
> > +             return;
> > +     entry   = ring_buffer_event_data(event);
> > +     entry->ip                       = ip;
> > +     entry->parent_ip                = parent_ip;
> > +     entry->object                   = object;
> > +
> > +     event_trigger_unlock_commit(&event_trace_file, buffer, event,
> > +             entry, pc);
> > +}
> > +
> > +static void
> > +trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> > +             struct ftrace_ops *op, struct ftrace_regs *fregs)
> > +{
> > +     struct pt_regs *pt_regs = ftrace_get_regs(fregs);
> > +     unsigned long obj;
> > +     long disabled;
> > +     int cpu, n;
> > +
> > +     preempt_disable_notrace();
> > +
> > +     cpu = raw_smp_processor_id();
> > +     disabled = atomic_inc_return(&per_cpu(trace_object_event_disable, cpu));
>
> So what is the purpose of this check? (Is there any issue if the same
> CPU reenters here?)
>

There may be an interrupt context that preempts it. I am not yet sure
whether re-entry on the same CPU causes a problem; I will debug and test
it. (Referenced from function_test_events_call().)

> > +
> > +     if (disabled != 1)
> > +             goto out;
> > +
> > +     if (object_empty())
> > +             goto out;
> > +
> > +     for (n = 0; n < max_args_num; n++) {
> > +             obj = regs_get_kernel_argument(pt_regs, n);
> > +             if (object_exist((void *)obj))
> > +                     submit_trace_object(ip, parent_ip, obj);
>
> Please add a comment explaining why you don't break out of the loop
> after finding a traced object here.
>
(Because a function's parameters may match more than one traced object.)
I will add the comment in the next version.
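
Maybe something like this in the next version (a draft):

	for (n = 0; n < max_args_num; n++) {
		obj = regs_get_kernel_argument(pt_regs, n);
		/*
		 * A function may take more than one traced object in
		 * its parameters, so check every argument instead of
		 * breaking out at the first match.
		 */
		if (object_exist((void *)obj))
			submit_trace_object(ip, parent_ip, obj);
	}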

> > +     }
> > +
> > +out:
> > +     atomic_dec(&per_cpu(trace_object_event_disable, cpu));
> > +     preempt_enable_notrace();
> > +}
> > +
> > +static struct ftrace_ops trace_ops = {
> > +     .func  = trace_object_events_call,
> > +     .flags = FTRACE_OPS_FL_SAVE_REGS,
> > +};
> > +
> > +static void
> > +trace_object_trigger(struct event_trigger_data *data,
> > +                struct trace_buffer *buffer,  void *rec,
> > +                struct ring_buffer_event *event)
> > +{
> > +
> > +     struct ftrace_event_field *field = data->private_data;
> > +     void *obj = NULL;
> > +
> > +     memcpy(&obj, rec + field->offset, sizeof(obj));
> > +     set_trace_object(obj);
> > +}
> > +
>
> /* This is clone of event_trigger_free(). */
>
> > +static void
> > +trace_object_trigger_free(struct event_trigger_ops *ops,
> > +                struct event_trigger_data *data)
> > +{
> > +     if (WARN_ON_ONCE(data->ref <= 0))
> > +             return;
> > +
> > +     data->ref--;
> > +     if (!data->ref)
> > +             trigger_data_free(data);
> > +}
> > +
> > +static void
> > +trace_object_count_trigger(struct event_trigger_data *data,
> > +                      struct trace_buffer *buffer, void *rec,
> > +                      struct ring_buffer_event *event)
> > +{
> > +     if (!data->count)
> > +             return;
> > +
> > +     if (data->count != -1)
> > +             (data->count)--;
> > +
> > +     trace_object_trigger(data, buffer, rec, event);
> > +}
> > +
>
> /* This is the clone of event_trigger_init() */
>
> > +static int event_object_trigger_init(struct event_trigger_ops *ops,
> > +                    struct event_trigger_data *data)
> > +{
> > +     data->ref++;
> > +     return 0;
> > +}
>
>
> At this moment, I think cloning those functions is OK. But once this is
> merged and well tested, the trigger code should expose these functions
> and this code should use them.

I agree.

>
> BTW, it seems you are sharing the traced object active_list
> (obj_pool::active_list) among all objtrace triggers. This means that if
> a user sets an objtrace trigger on two events and then removes it from
> one of them, the active_list still holds the objects recorded via the
> removed event. If that is correct, I think this should be noted in the
> document. Something like:
>
> Note that currently this is not designed to trace objects of different
> types at the same time. You must set an objtrace trigger on only one
> event, or carefully choose objects of the same type when you set
> objtrace on different events. Also, the traced object list is not
> cleared unless all triggers are removed.
>
Thanks, I agree, I will add it.
>
> > +
> > +static int
> > +event_trigger_print(const char *name, struct seq_file *m,
> > +                 void *data, char *filter_str)
> > +{
> > +     long count = (long)data;
> > +
> > +     seq_puts(m, name);
> > +
> > +     if (count == -1)
> > +             seq_puts(m, ":unlimited");
> > +     else
> > +             seq_printf(m, ":count=%ld", count);
> > +
> > +     if (filter_str)
> > +             seq_printf(m, " if %s\n", filter_str);
> > +     else
> > +             seq_putc(m, '\n');
> > +
> > +     return 0;
> > +}
> > +
> > +static int
> > +trace_object_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
> > +                      struct event_trigger_data *data)
> > +{
> > +     return event_trigger_print("objtrace", m, (void *)data->count,
> > +                                data->filter_str);
> > +}
> > +
> > +static struct event_trigger_ops objecttrace_trigger_ops = {
> > +     .func                   = trace_object_trigger,
> > +     .print                  = trace_object_trigger_print,
> > +     .init                   = event_object_trigger_init,
> > +     .free                   = trace_object_trigger_free,
> > +};
> > +
> > +static struct event_trigger_ops objecttrace_count_trigger_ops = {
> > +     .func                   = trace_object_count_trigger,
> > +     .print                  = trace_object_trigger_print,
> > +     .init                   = event_object_trigger_init,
> > +     .free                   = trace_object_trigger_free,
> > +};
> > +
> > +static struct event_trigger_ops *
> > +objecttrace_get_trigger_ops(char *cmd, char *param)
> > +{
> > +     return param ? &objecttrace_count_trigger_ops : &objecttrace_trigger_ops;
> > +}
> > +
> > +static int register_object_trigger(char *glob, struct event_trigger_ops *ops,
> > +                         struct event_trigger_data *data,
> > +                         struct trace_event_file *file)
> > +{
> > +     struct event_trigger_data *test;
> > +     int ret = 0;
> > +
> > +     lockdep_assert_held(&event_mutex);
> > +
> > +     list_for_each_entry(test, &file->triggers, list) {
> > +             if (test->cmd_ops->trigger_type == data->cmd_ops->trigger_type) {
> > +                     ret = -EEXIST;
> > +                     goto out;
> > +             }
> > +     }
> > +
> > +     if (data->ops->init) {
> > +             ret = data->ops->init(data->ops, data);
> > +             if (ret < 0)
> > +                     goto out;
> > +     }
> > +
> > +     list_add_rcu(&data->list, &file->triggers);
> > +     ret++;
> > +
> > +     update_cond_flag(file);
> > +     if (trace_event_trigger_enable_disable(file, 1) < 0) {
> > +             list_del_rcu(&data->list);
> > +             update_cond_flag(file);
> > +             ret--;
> > +     }
> > +     init_trace_object();
> > +out:
> > +     return ret;
> > +}
> > +
> > +static void unregister_object_trigger(char *glob, struct event_trigger_ops *ops,
> > +                            struct event_trigger_data *test,
> > +                            struct trace_event_file *file)
> > +{
> > +     struct event_trigger_data *data;
> > +     bool unregistered = false;
> > +
> > +     lockdep_assert_held(&event_mutex);
> > +
> > +     list_for_each_entry(data, &file->triggers, list) {
> > +             if (data->cmd_ops->trigger_type == test->cmd_ops->trigger_type) {
> > +                     unregistered = true;
> > +                     list_del_rcu(&data->list);
> > +                     trace_event_trigger_enable_disable(file, 0);
> > +                     update_cond_flag(file);
> > +                     break;
> > +             }
> > +     }
> > +
> > +     if (unregistered && data->ops->free) {
> > +             data->ops->free(data->ops, data);
> > +             exit_trace_object();
> > +     }
>
> Shouldn't this be the following? Otherwise exit_trace_object() is
> skipped when data->ops->free == NULL:
>
>         if (unregistered) {
>                 if (data->ops->free)
>                         data->ops->free(data->ops, data);
>                 exit_trace_object();
>         }

Thanks, this one is better, although data->ops->free is never NULL at
the moment.

>
> Thank you,
>
>
> > +}
> > +
> > +static int
> > +event_object_trigger_callback(struct event_command *cmd_ops,
> > +                    struct trace_event_file *file,
> > +                    char *glob, char *cmd, char *param)
> > +{
> > +     struct event_trigger_data *trigger_data;
> > +     struct event_trigger_ops *trigger_ops;
> > +     struct trace_event_call *call;
> > +     struct ftrace_event_field *field;
> > +     char *objtrace_cmd;
> > +     char *trigger = NULL;
> > +     char *arg;
> > +     char *number;
> > +     int ret;
> > +
> > +     ret = -EINVAL;
> > +     if (!param)
> > +             goto out;
> > +
> > +     /* separate the trigger from the filter (c:a:n [if filter]) */
> > +     trigger = strsep(&param, " \t");
> > +     if (!trigger)
> > +             goto out;
> > +     if (param) {
> > +             param = skip_spaces(param);
> > +             if (!*param)
> > +                     param = NULL;
> > +     }
> > +
> > +     objtrace_cmd = strsep(&trigger, ":");
> > +     if (!objtrace_cmd || strcmp(objtrace_cmd, "add"))
> > +             goto out;
> > +
> > +     arg = strsep(&trigger, ":");
> > +     if (!arg)
> > +             goto out;
> > +     call = file->event_call;
> > +     field = trace_find_event_field(call, arg);
> > +     if (!field)
> > +             goto out;
> > +
> > +     if (field->size != sizeof(void *))
> > +             goto out;
> > +
> > +     trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
> > +
> > +     ret = -ENOMEM;
> > +     trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
> > +     if (!trigger_data)
> > +             goto out;
> > +
> > +     trigger_data->count = -1;
> > +     trigger_data->ops = trigger_ops;
> > +     trigger_data->cmd_ops = cmd_ops;
> > +     trigger_data->private_data = field;
> > +     INIT_LIST_HEAD(&trigger_data->list);
> > +     INIT_LIST_HEAD(&trigger_data->named_list);
> > +
> > +     if (glob[0] == '!') {
> > +             cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
> > +             kfree(trigger_data);
> > +             ret = 0;
> > +             goto out;
> > +     }
> > +
> > +     if (trigger) {
> > +             number = strsep(&trigger, ":");
> > +
> > +             ret = -EINVAL;
> > +             if (!strlen(number))
> > +                     goto out_free;
> > +
> > +             /*
> > +              * We use the callback data field (which is a pointer)
> > +              * as our counter.
> > +              */
> > +             ret = kstrtoul(number, 0, &trigger_data->count);
> > +             if (ret)
> > +                     goto out_free;
> > +     }
> > +
> > +     if (!param) /* if param is non-empty, it's supposed to be a filter */
> > +             goto out_reg;
> > +
> > +     if (!cmd_ops->set_filter)
> > +             goto out_reg;
> > +
> > +     ret = cmd_ops->set_filter(param, trigger_data, file);
> > +     if (ret < 0)
> > +             goto out_free;
> > +
> > + out_reg:
> > +     /* Up the trigger_data count to make sure reg doesn't free it on failure */
> > +     event_object_trigger_init(trigger_ops, trigger_data);
> > +     ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
> > +     /*
> > +      * The above returns on success the # of functions enabled,
> > +      * but if it didn't find any functions it returns zero.
> > +      * Consider no functions a failure too.
> > +      */
> > +     if (!ret) {
> > +             cmd_ops->unreg(glob, trigger_ops, trigger_data, file);
> > +             ret = -ENOENT;
> > +     } else if (ret > 0)
> > +             ret = 0;
> > +
> > +     /* Down the counter of trigger_data or free it if not used anymore */
> > +     trace_object_trigger_free(trigger_ops, trigger_data);
> > + out:
> > +     return ret;
> > +
> > + out_free:
> > +     if (cmd_ops->set_filter)
> > +             cmd_ops->set_filter(NULL, trigger_data, NULL);
> > +     kfree(trigger_data);
> > +     goto out;
> > +}
> > +
> > +static struct event_command trigger_object_cmd = {
> > +     .name                   = "objtrace",
> > +     .trigger_type           = ETT_TRACE_OBJECT,
> > +     .flags                  = EVENT_CMD_FL_NEEDS_REC,
> > +     .func                   = event_object_trigger_callback,
> > +     .reg                    = register_object_trigger,
> > +     .unreg                  = unregister_object_trigger,
> > +     .get_trigger_ops        = objecttrace_get_trigger_ops,
> > +     .set_filter             = set_trigger_filter,
> > +};
> > +
> > +__init int register_trigger_object_cmd(void)
> > +{
> > +     int ret;
> > +
> > +     ret = register_event_command(&trigger_object_cmd);
> > +     WARN_ON(ret < 0);
> > +
> > +     return ret;
> > +}
> > +
> > +static int init_trace_object(void)
> > +{
> > +     int ret;
> > +
> > +     if (atomic_inc_return(&trace_object_ref) != 1) {
> > +             ret = 0;
> > +             goto out;
> > +     }
> > +
> > +     ret = init_object_pool();
> > +     if (ret)
> > +             goto out;
> > +
> > +     event_trace_file.tr = top_trace_array();
> > +     if (WARN_ON(!event_trace_file.tr)) {
> > +             ret = -1;
> > +             goto out;
> > +     }
> > +     ret = register_ftrace_function(&trace_ops);
> > +out:
> > +     return ret;
> > +}
> > +
> > +static int exit_trace_object(void)
> > +{
> > +     int ret;
> > +
> > +     if (WARN_ON_ONCE(atomic_read(&trace_object_ref) <= 0))
> > +             goto out;
> > +
> > +     if (atomic_dec_return(&trace_object_ref) != 0) {
> > +             ret = 0;
> > +             goto out;
> > +     }
> > +
> > +     ret = unregister_ftrace_function(&trace_ops);
> > +     if (ret) {
> > +             pr_err("can't unregister ftrace for trace object\n");
> > +             goto out;
> > +     }
> > +     free_object_pool();
> > +out:
> > +     return ret;
> > +}
> > diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> > index 3547e7176ff7..d747aed27104 100644
> > --- a/kernel/trace/trace_output.c
> > +++ b/kernel/trace/trace_output.c
> > @@ -1543,6 +1543,45 @@ static struct trace_event trace_func_repeats_event = {
> >       .funcs          = &trace_func_repeats_funcs,
> >  };
> >
> > +/* TRACE_OBJECT */
> > +static enum print_line_t trace_object_print(struct trace_iterator *iter, int flags,
> > +                                     struct trace_event *event)
> > +{
> > +     struct trace_object_entry *field;
> > +     struct trace_seq *s = &iter->seq;
> > +
> > +     trace_assign_type(field, iter->ent);
> > +     print_fn_trace(s, field->ip, field->parent_ip, flags);
> > +     trace_seq_printf(s, " object:0x%lx", field->object);
> > +     trace_seq_putc(s, '\n');
> > +
> > +     return trace_handle_return(s);
> > +}
> > +
> > +static enum print_line_t trace_object_raw(struct trace_iterator *iter, int flags,
> > +                                   struct trace_event *event)
> > +{
> > +     struct trace_object_entry *field;
> > +
> > +     trace_assign_type(field, iter->ent);
> > +
> > +     trace_seq_printf(&iter->seq, "%lx %lx\n",
> > +                      field->ip,
> > +                      field->parent_ip);
> > +
> > +     return trace_handle_return(&iter->seq);
> > +}
> > +
> > +static struct trace_event_functions trace_object_funcs = {
> > +     .trace          = trace_object_print,
> > +     .raw            = trace_object_raw,
> > +};
> > +
> > +static struct trace_event trace_object_event = {
> > +     .type           = TRACE_OBJECT,
> > +     .funcs          = &trace_object_funcs,
> > +};
> > +
> >  static struct trace_event *events[] __initdata = {
> >       &trace_fn_event,
> >       &trace_ctx_event,
> > @@ -1557,6 +1596,7 @@ static struct trace_event *events[] __initdata = {
> >       &trace_timerlat_event,
> >       &trace_raw_data_event,
> >       &trace_func_repeats_event,
> > +     &trace_object_event,
> >       NULL
> >  };
> >
> > --
> > 2.25.1
> >
>
>
> --
> Masami Hiramatsu <mhiramat@kernel.org>

---
JeffXie

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH v6 1/5] trace: Add trace any kernel object
  2021-12-17 16:32     ` Jeff Xie
@ 2021-12-19  3:07       ` Masami Hiramatsu
  2022-01-08  0:21         ` Steven Rostedt
  2021-12-21  7:36       ` Masami Hiramatsu
  1 sibling, 1 reply; 26+ messages in thread
From: Masami Hiramatsu @ 2021-12-19  3:07 UTC (permalink / raw)
  To: Jeff Xie; +Cc: Steven Rostedt, mingo, Tom Zanussi, linux-kernel

On Sat, 18 Dec 2021 00:32:57 +0800
Jeff Xie <xiehuan09@gmail.com> wrote:

> Hi Masami,
> 
> Thanks for your review.
> 
> On Fri, Dec 17, 2021 at 12:51 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > On Tue, 30 Nov 2021 00:49:47 +0800
> > Jeff Xie <xiehuan09@gmail.com> wrote:
> >
> > > Introduce objtrace trigger to get the execution flow by tracing
> > > any kernel object in the kernel parameter.
> >
> > Here, please add a simple usage example (no need to add the output), since
> > we can see only this comment in the git log after merging.
> > If you run `git log -- kernel/trace/` and search for "tracing: Add ",
> > then you can see more examples of adding a new feature. ;)
> >
> > What do you think of this?
> > -----
> > Introduce objtrace trigger to get the call flow by tracing
> > any kernel object in the function parameter.
> >
> > The objtrace trigger makes a list of the target object address from
> > the given event parameter, and records all kernel function calls
> > which have the object address in their parameters.
> >
> > Users can easily identify the kernel call flow related to the object.
> >
> > Syntax:
> > ...
> >
> > Usage:
> > ...
> 
> I agree, I will add it.
> 
> > -----
> >
> > >
> > > Signed-off-by: Jeff Xie <xiehuan09@gmail.com>
> > > ---
> > >  include/linux/trace_events.h        |   1 +
> > >  kernel/trace/Kconfig                |  10 +
> > >  kernel/trace/Makefile               |   1 +
> > >  kernel/trace/trace.h                |   8 +
> > >  kernel/trace/trace_entries.h        |  17 +
> > >  kernel/trace/trace_events_trigger.c |   1 +
> > >  kernel/trace/trace_object.c         | 534 ++++++++++++++++++++++++++++
> > >  kernel/trace/trace_output.c         |  40 +++
> > >  8 files changed, 612 insertions(+)
> > >  create mode 100644 kernel/trace/trace_object.c
> > >
> > > diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> > > index 2d167ac3452c..54baec78ce62 100644
> > > --- a/include/linux/trace_events.h
> > > +++ b/include/linux/trace_events.h
> > > @@ -684,6 +684,7 @@ enum event_trigger_type {
> > >       ETT_EVENT_HIST          = (1 << 4),
> > >       ETT_HIST_ENABLE         = (1 << 5),
> > >       ETT_EVENT_EPROBE        = (1 << 6),
> > > +     ETT_TRACE_OBJECT        = (1 << 7),
> > >  };
> > >
> > >  extern int filter_match_preds(struct event_filter *filter, void *rec);
> > > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > > index 420ff4bc67fd..f69761cbfaaa 100644
> > > --- a/kernel/trace/Kconfig
> > > +++ b/kernel/trace/Kconfig
> > > @@ -237,6 +237,16 @@ config FUNCTION_PROFILER
> > >
> > >         If in doubt, say N.
> > >
> > > +config TRACE_OBJECT
> > > +     bool "Trace kernel object in function parameter"
> > > +     depends on FUNCTION_TRACER
> > > +     depends on HAVE_FUNCTION_ARG_ACCESS_API
> > > +     select TRACING
> > > +     default y
> > > +     help
> > > +      You can trace the kernel object in the kernel function parameter.
> > > +      The kernel object is dynamically specified via event trigger.
> > > +
> > >  config STACK_TRACER
> > >       bool "Trace max stack"
> > >       depends on HAVE_FUNCTION_TRACER
> > > diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> > > index bedc5caceec7..b924b8e55922 100644
> > > --- a/kernel/trace/Makefile
> > > +++ b/kernel/trace/Makefile
> > > @@ -67,6 +67,7 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
> > >  obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
> > >  obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
> > >  obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += fgraph.o
> > > +obj-$(CONFIG_TRACE_OBJECT) += trace_object.o
> > >  ifeq ($(CONFIG_BLOCK),y)
> > >  obj-$(CONFIG_EVENT_TRACING) += blktrace.o
> > >  endif
> > > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > > index 38715aa6cfdf..46a1301ac47a 100644
> > > --- a/kernel/trace/trace.h
> > > +++ b/kernel/trace/trace.h
> > > @@ -54,6 +54,7 @@ enum trace_type {
> > >       TRACE_TIMERLAT,
> > >       TRACE_RAW_DATA,
> > >       TRACE_FUNC_REPEATS,
> > > +     TRACE_OBJECT,
> > >
> > >       __TRACE_LAST_TYPE,
> > >  };
> > > @@ -469,6 +470,7 @@ extern void __ftrace_bad_type(void);
> > >                         TRACE_GRAPH_RET);             \
> > >               IF_ASSIGN(var, ent, struct func_repeats_entry,          \
> > >                         TRACE_FUNC_REPEATS);                          \
> > > +             IF_ASSIGN(var, ent, struct trace_object_entry, TRACE_OBJECT);\
> > >               __ftrace_bad_type();                                    \
> > >       } while (0)
> > >
> > > @@ -1531,6 +1533,12 @@ static inline int register_trigger_hist_cmd(void) { return 0; }
> > >  static inline int register_trigger_hist_enable_disable_cmds(void) { return 0; }
> > >  #endif
> > >
> > > +#ifdef CONFIG_TRACE_OBJECT
> > > +extern int register_trigger_object_cmd(void);
> > > +#else
> > > +static inline int register_trigger_object_cmd(void) { return 0; }
> > > +#endif
> > > +
> > >  extern int register_trigger_cmds(void);
> > >  extern void clear_event_triggers(struct trace_array *tr);
> > >
> > > diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> > > index cd41e863b51c..bb120d9498a9 100644
> > > --- a/kernel/trace/trace_entries.h
> > > +++ b/kernel/trace/trace_entries.h
> > > @@ -401,3 +401,20 @@ FTRACE_ENTRY(timerlat, timerlat_entry,
> > >                __entry->context,
> > >                __entry->timer_latency)
> > >  );
> > > +
> > > +/*
> > > + * trace object entry:
> > > + */
> > > +FTRACE_ENTRY(object, trace_object_entry,
> > > +
> > > +     TRACE_OBJECT,
> > > +
> > > +     F_STRUCT(
> > > +             __field(        unsigned long,          ip              )
> > > +             __field(        unsigned long,          parent_ip       )
> > > +             __field(        unsigned long,          object          )
> > > +     ),
> > > +
> > > +     F_printk(" %ps <-- %ps object:%lx\n",
> > > +              (void *)__entry->ip, (void *)__entry->parent_ip, __entry->object)
> > > +);
> > > diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> > > index 3d5c07239a2a..da2d871fa809 100644
> > > --- a/kernel/trace/trace_events_trigger.c
> > > +++ b/kernel/trace/trace_events_trigger.c
> > > @@ -1687,6 +1687,7 @@ __init int register_trigger_cmds(void)
> > >       register_trigger_enable_disable_cmds();
> > >       register_trigger_hist_enable_disable_cmds();
> > >       register_trigger_hist_cmd();
> > > +     register_trigger_object_cmd();
> > >
> > >       return 0;
> > >  }
> > > diff --git a/kernel/trace/trace_object.c b/kernel/trace/trace_object.c
> > > new file mode 100644
> > > index 000000000000..83fccd587b63
> > > --- /dev/null
> > > +++ b/kernel/trace/trace_object.c
> > > @@ -0,0 +1,534 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * trace the kernel object in the kernel function parameter
> > > + * Copyright (C) 2021 Jeff Xie <xiehuan09@gmail.com>
> > > + */
> > > +
> > > +#define pr_fmt(fmt) "trace_object: " fmt
> > > +
> > > +#include "trace_output.h"
> > > +#include <linux/freelist.h>
> > > +
> > > +static DEFINE_PER_CPU(atomic_t, trace_object_event_disable);
> >
> > atomic_t is for atomic operation which must be shared among cpus. On the
> > other hand, per-cpu variable is used for the core-local storage or flags,
> > other cpus never touch it. Thus the per-cpu atomic_t is very strange.
> >
> 
> Since patch v1, I have cloned it from function_test_events_call()
> in kernel/trace/trace_events.c
> 
> commit: 9ea21c1ecdb35ecdcac5fd9d95f62a1f6a7ffec0
> tracing/events: perform function tracing in event selftests
> Author:     Steven Rostedt <srostedt@redhat.com>

Hmm, OK.

> 
> It should be there to prevent being preempted by the interrupt context in
> the middle of incrementing the counter

Yeah, I think so.

The commit message says "some bugs", but it is not clear what actually
needed to be handled.

    tracing/events: perform function tracing in event selftests
    
    We can find some bugs in the trace events if we stress the writes as well.
    The function tracer is a good way to stress the events.

Steve, can you tell me what was the problem?

I think we don't need a per-cpu atomic_t because the counter is only
incremented and decremented. Thus, when exiting the nested ftrace handler
on the same CPU, the counter comes back to the same value. We don't need
to care about atomic increments.

I mean, if we use a normal per-cpu "unsigned int" as the counter, the
operation "counter++" becomes:

load counter to reg1
[1]
reg1 = reg1 + 1
store reg1 to counter

And if an interrupt occurs at [1], the following happens.

load counter to reg1 # counter = 0

  (interrupt - save reg1)
  load counter to reg1  # counter = 0
  reg1 = reg1 + 1
  store reg1 to counter  # counter = 1
  ...
  load counter to reg1  # counter = 1
  reg1 = reg1 - 1
  store reg1 to counter  # counter = 0
  (iret - restore reg1)

reg1 = reg1 + 1
store reg1 to counter

So, even if the operation is not atomic, there seems to be no problem.
What other scenario do we have to worry about?
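
For illustration, here is a minimal sketch of the plain per-cpu counter
guard described above (the variable name is hypothetical; this is not the
patch's actual code):

static DEFINE_PER_CPU(unsigned int, obj_trace_nesting); /* hypothetical */

static void
trace_object_events_call(unsigned long ip, unsigned long parent_ip,
		struct ftrace_ops *op, struct ftrace_regs *fregs)
{
	unsigned int *count;

	preempt_disable_notrace();
	count = this_cpu_ptr(&obj_trace_nesting);
	/* non-zero old value means we were re-entered, e.g. from an interrupt */
	if ((*count)++ != 0)
		goto out;
	barrier();	/* keep the compiler from reordering around the update */

	/* ... look up the function arguments and record traced objects ... */
out:
	(*count)--;
	preempt_enable_notrace();
}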

(BTW, what is ftrace_test_recursion_trylock()? Is that also
for detecting the nesting case?)

> > > +static DEFINE_RAW_SPINLOCK(object_spin_lock);
> > > +static struct trace_event_file event_trace_file;
> > > +static const int max_args_num = 6;
> > > +static const int max_obj_pool = 10;
> > > +static atomic_t trace_object_ref;
> > > +static int exit_trace_object(void);
> > > +static int init_trace_object(void);
> > > +
> >
> > Please add more comments to the code itself. Explain why this is needed
> > and how it works for which case. That will lead to a deeper understanding.
> >
> 
> I agree, I will add more comments in the next version.
> 
> > > +struct object_instance {
> > > +     void *object;
> > > +     struct freelist_node free_list;
> > > +     struct list_head active_list;
> > > +};
> > > +
> > > +struct obj_pool {
> > > +     struct freelist_head free_list;
> > > +     struct list_head active_list;
> > > +};
> > > +static struct obj_pool *obj_pool;
> > > +
> > > +static bool object_exist(void *obj)
> > > +{
> > > +     struct object_instance *inst;
> > > +     bool ret = false;
> > > +
> > > +     list_for_each_entry_rcu(inst, &obj_pool->active_list, active_list) {
> > > +             if (inst->object == obj) {
> > > +                     ret = true;
> > > +                     goto out;
> > > +             }
> > > +     }
> > > +out:
> > > +     return ret;
> > > +}
> > > +
> > > +static bool object_empty(void)
> > > +{
> > > +     return list_empty(&obj_pool->active_list);
> > > +}
> > > +
> > > +static void set_trace_object(void *obj)
> > > +{
> > > +     struct freelist_node *fn;
> > > +     struct object_instance *ins;
> > > +     unsigned long flags;
> > > +
> > > +     if (in_nmi())
> > > +             return;
> > > +
> > > +     if (!obj)
> > > +             return;
> > > +
> > > +     if (object_exist(obj))
> > > +             return;
> > > +
> > > +     fn = freelist_try_get(&obj_pool->free_list);
> > > +     if (!fn) {
> > > +             trace_printk("object_pool is full, can't trace object:0x%px\n", obj);
> > > +             return;
> > > +     }
> > > +
> > > +     ins = container_of(fn, struct object_instance, free_list);
> > > +     ins->object = obj;
> > > +
> > > +     raw_spin_lock_irqsave(&object_spin_lock, flags);
> > > +     list_add_rcu(&ins->active_list, &obj_pool->active_list);
> > > +     raw_spin_unlock_irqrestore(&object_spin_lock, flags);
> >
> > Please add a comment explaining why this spinlock is needed here and why
> > the other operations don't need it.
> 
> (Only this place has write operations, and they must not run concurrently)

Yes, and such a comment will also help us when objtrace supports a "del" command. :)
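
For example, the comment might read something like this (just a sketch):

	/*
	 * Only set_trace_object() adds entries to active_list, and it can
	 * run concurrently on several CPUs and from interrupt context, so
	 * writers are serialized with object_spin_lock.  Lookups walk the
	 * list under RCU and take no lock.
	 */
	raw_spin_lock_irqsave(&object_spin_lock, flags);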

> I agree, I will add it.
> 
> > > +}
> > > +
> > > +static inline void free_free_list_objects(struct freelist_head *head)
> > > +{
> > > +
> > > +     struct object_instance *inst;
> > > +     struct freelist_node *node = head->head;
> > > +
> > > +     while (node) {
> > > +             inst = container_of(node, struct object_instance, free_list);
> > > +             node = node->next;
> > > +             kfree(inst);
> >
> > Note that this is safe only if this free_list is no more accessed.
> >
> 
> Maybe it needs to use RCU to handle it; I will think about how to fix
> it in the next version.

Yes, what you need is a synchronize_rcu().

> 
> > > +     }
> > > +}
> > > +
> > > +static inline void free_active_list_objects(struct list_head *head)
> > > +{
> > > +     struct object_instance *inst;
> > > +
> > > +     list_for_each_entry_rcu(inst, head, active_list)
> > > +             kfree(inst);
> >
> > No, you can not do this, since the 'inst' is still on the active_list.
> > Before freeing the 'inst', you need to do list_del_rcu().
> > Moreover, the 'inst' might be still accessed from the callbacks,
> > you can not directly free it.
> >
> > RCU ensures that looping over a list can be done safely before and after
> > removing/adding a node, but it does not ensure the node is freeable soon,
> > because someone may still be looking at it.
> >
> > I guess you may expect that all callbacks are stopped after calling
> 
> Yes, you guessed right ;-)
> 
> > unregister_ftrace_function(&trace_ops);
> >
> > but that is wrong. That doesn't synchronize the callbacks (= wait
> > for all running callbacks to finish).
> > So, you should use synchronize_rcu() after that.
> 
> Thanks,  I will fix it.

Thanks, it may be hard to notice. Since synchronize_rcu() is a
time-consuming operation, the trace_event subsystem carefully chooses the
calling point (e.g. after all user-specified operations are done).
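
Concretely, the teardown ordering being suggested might look like this
(a sketch reusing the patch's names, not the final code):

	static int exit_trace_object(void)
	{
		int ret;

		/* ... refcount handling as in the patch ... */
		ret = unregister_ftrace_function(&trace_ops);
		if (ret)
			return ret;
		/* wait for all in-flight handlers and RCU list walkers */
		synchronize_rcu();
		free_object_pool();
		return 0;
	}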

> > > +}
> > > +
> > > +static inline void free_object_pool(void)
> > > +{
> > > +     free_free_list_objects(&obj_pool->free_list);
> > > +     free_active_list_objects(&obj_pool->active_list);
> >
> > BTW, did you also ensure all the trigger callbacks have been done?
> > set_trace_object() has a small time window when the object_instance
> > is linked to neither the free_list nor the active_list.
> > Thus you have to make sure that set_trace_object() is never called while
> > this function runs. (How to do it? Please write a comment about it)
> 
> I ignored this small time window; maybe RCU can handle it. I
> will think about how to fix it.
> 
> > > +     kfree(obj_pool);
> > > +}
> > > +
> > > +
> > > +static int init_object_pool(void)
> > > +{
> > > +     struct object_instance *inst;
> > > +     int i, ret = 0;
> > > +
> > > +     obj_pool = kzalloc(sizeof(*obj_pool), GFP_KERNEL);
> > > +     if (!obj_pool) {
> > > +             ret = -ENOMEM;
> > > +             goto out;
> > > +     }
> > > +
> > > +     INIT_LIST_HEAD(&obj_pool->active_list);
> > > +
> > > +     for (i = 0; i < max_obj_pool; i++) {
> > > +             inst = kzalloc(sizeof(*inst), GFP_KERNEL);
> > > +             if (!inst) {
> > > +                     free_object_pool();
> > > +                     ret = -ENOMEM;
> > > +                     goto out;
> > > +             }
> > > +             freelist_add(&inst->free_list, &obj_pool->free_list);
> > > +     }
> > > +out:
> > > +     return ret;
> > > +}
> > > +
> > > +static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> > > +                              unsigned long object)
> > > +{
> > > +
> > > +     struct trace_buffer *buffer;
> > > +     struct ring_buffer_event *event;
> > > +     struct trace_object_entry *entry;
> > > +     int pc;
> > > +
> > > +     pc = preempt_count();
> > > +     event = trace_event_buffer_lock_reserve(&buffer, &event_trace_file,
> > > +                     TRACE_OBJECT, sizeof(*entry), pc);
> > > +     if (!event)
> > > +             return;
> > > +     entry   = ring_buffer_event_data(event);
> > > +     entry->ip                       = ip;
> > > +     entry->parent_ip                = parent_ip;
> > > +     entry->object                   = object;
> > > +
> > > +     event_trigger_unlock_commit(&event_trace_file, buffer, event,
> > > +             entry, pc);
> > > +}
> > > +
> > > +static void
> > > +trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> > > +             struct ftrace_ops *op, struct ftrace_regs *fregs)
> > > +{
> > > +     struct pt_regs *pt_regs = ftrace_get_regs(fregs);
> > > +     unsigned long obj;
> > > +     long disabled;
> > > +     int cpu, n;
> > > +
> > > +     preempt_disable_notrace();
> > > +
> > > +     cpu = raw_smp_processor_id();
> > > +     disabled = atomic_inc_return(&per_cpu(trace_object_event_disable, cpu));
> >
> > So what is the purpose of this check? (are there any issues if the same
> > CPU reenters here?)
> >
> 
> There may be an interrupt context that can preempt it. I am not yet
> sure whether CPU reentry will affect it.
> I will debug and test it. (Referred from function_test_events_call())

I think what you need is ftrace_test_recursion_trylock(). Please
check the "Protect your callback" section in Documentation/trace/ftrace-uses.rst.

> 
> > > +
> > > +     if (disabled != 1)
> > > +             goto out;
> > > +
> > > +     if (object_empty())
> > > +             goto out;
> > > +
> > > +     for (n = 0; n < max_args_num; n++) {
> > > +             obj = regs_get_kernel_argument(pt_regs, n);
> > > +             if (object_exist((void *)obj))
> > > +                     submit_trace_object(ip, parent_ip, obj);
> >
> > Please add a comment explaining why you don't break out of the loop
> > after finding a traced object here.
> >
> (because the parameters of a function may match multiple objects)
> I will add the comment in the next version.

good :)

> 
> > > +     }
> > > +
> > > +out:
> > > +     atomic_dec(&per_cpu(trace_object_event_disable, cpu));
> > > +     preempt_enable_notrace();
> > > +}
> > > +
> > > +static struct ftrace_ops trace_ops = {
> > > +     .func  = trace_object_events_call,
> > > +     .flags = FTRACE_OPS_FL_SAVE_REGS,
> > > +};
> > > +
> > > +static void
> > > +trace_object_trigger(struct event_trigger_data *data,
> > > +                struct trace_buffer *buffer,  void *rec,
> > > +                struct ring_buffer_event *event)
> > > +{
> > > +
> > > +     struct ftrace_event_field *field = data->private_data;
> > > +     void *obj = NULL;
> > > +
> > > +     memcpy(&obj, rec + field->offset, sizeof(obj));
> > > +     set_trace_object(obj);
> > > +}
> > > +
> >
> > /* This is clone of event_trigger_free(). */
> >
> > > +static void
> > > +trace_object_trigger_free(struct event_trigger_ops *ops,
> > > +                struct event_trigger_data *data)
> > > +{
> > > +     if (WARN_ON_ONCE(data->ref <= 0))
> > > +             return;
> > > +
> > > +     data->ref--;
> > > +     if (!data->ref)
> > > +             trigger_data_free(data);
> > > +}
> > > +
> > > +static void
> > > +trace_object_count_trigger(struct event_trigger_data *data,
> > > +                      struct trace_buffer *buffer, void *rec,
> > > +                      struct ring_buffer_event *event)
> > > +{
> > > +     if (!data->count)
> > > +             return;
> > > +
> > > +     if (data->count != -1)
> > > +             (data->count)--;
> > > +
> > > +     trace_object_trigger(data, buffer, rec, event);
> > > +}
> > > +
> >
> > /* This is the clone of event_trigger_init() */
> >
> > > +static int event_object_trigger_init(struct event_trigger_ops *ops,
> > > +                    struct event_trigger_data *data)
> > > +{
> > > +     data->ref++;
> > > +     return 0;
> > > +}
> >
> >
> > At this moment, I think cloning those functions are OK. But after merged
> > and enough tested, trigger should expose these functions and this
> > should use it.
> 
> I agree.
> 
> >
> > BTW, it seems you are sharing the trace object active_list (obj_pool::active_list)
> > among all object trace triggers. This means if user sets one objtrace trigger
> > on 2 events and remove from one event, the active_list still have the object
> > which sets by the event we removed. If it is correct, I think this should be
> > noted in the document.
> > Something like:
> >
> > Note that currently this is not designed to trace objects of different types
> > at the same time. You must set an objtrace trigger to only one event, or
> > you must carefully choose the same type of object when you set the objtrace on
> > different events. Also, the traced object list is not cleared unless all
> > triggers are removed.
> >
> Thanks, I agree, I will add it.
> >
> > > +
> > > +static int
> > > +event_trigger_print(const char *name, struct seq_file *m,
> > > +                 void *data, char *filter_str)
> > > +{
> > > +     long count = (long)data;
> > > +
> > > +     seq_puts(m, name);
> > > +
> > > +     if (count == -1)
> > > +             seq_puts(m, ":unlimited");
> > > +     else
> > > +             seq_printf(m, ":count=%ld", count);
> > > +
> > > +     if (filter_str)
> > > +             seq_printf(m, " if %s\n", filter_str);
> > > +     else
> > > +             seq_putc(m, '\n');
> > > +
> > > +     return 0;
> > > +}
> > > +
> > > +static int
> > > +trace_object_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
> > > +                      struct event_trigger_data *data)
> > > +{
> > > +     return event_trigger_print("objtrace", m, (void *)data->count,
> > > +                                data->filter_str);
> > > +}
> > > +
> > > +static struct event_trigger_ops objecttrace_trigger_ops = {
> > > +     .func                   = trace_object_trigger,
> > > +     .print                  = trace_object_trigger_print,
> > > +     .init                   = event_object_trigger_init,
> > > +     .free                   = trace_object_trigger_free,
> > > +};
> > > +
> > > +static struct event_trigger_ops objecttrace_count_trigger_ops = {
> > > +     .func                   = trace_object_count_trigger,
> > > +     .print                  = trace_object_trigger_print,
> > > +     .init                   = event_object_trigger_init,
> > > +     .free                   = trace_object_trigger_free,
> > > +};
> > > +
> > > +static struct event_trigger_ops *
> > > +objecttrace_get_trigger_ops(char *cmd, char *param)
> > > +{
> > > +     return param ? &objecttrace_count_trigger_ops : &objecttrace_trigger_ops;
> > > +}
> > > +
> > > +static int register_object_trigger(char *glob, struct event_trigger_ops *ops,
> > > +                         struct event_trigger_data *data,
> > > +                         struct trace_event_file *file)
> > > +{
> > > +     struct event_trigger_data *test;
> > > +     int ret = 0;
> > > +
> > > +     lockdep_assert_held(&event_mutex);
> > > +
> > > +     list_for_each_entry(test, &file->triggers, list) {
> > > +             if (test->cmd_ops->trigger_type == data->cmd_ops->trigger_type) {
> > > +                     ret = -EEXIST;
> > > +                     goto out;
> > > +             }
> > > +     }
> > > +
> > > +     if (data->ops->init) {
> > > +             ret = data->ops->init(data->ops, data);
> > > +             if (ret < 0)
> > > +                     goto out;
> > > +     }
> > > +
> > > +     list_add_rcu(&data->list, &file->triggers);
> > > +     ret++;
> > > +
> > > +     update_cond_flag(file);
> > > +     if (trace_event_trigger_enable_disable(file, 1) < 0) {
> > > +             list_del_rcu(&data->list);
> > > +             update_cond_flag(file);
> > > +             ret--;
> > > +     }
> > > +     init_trace_object();
> > > +out:
> > > +     return ret;
> > > +}
> > > +
> > > +static void unregister_object_trigger(char *glob, struct event_trigger_ops *ops,
> > > +                            struct event_trigger_data *test,
> > > +                            struct trace_event_file *file)
> > > +{
> > > +     struct event_trigger_data *data;
> > > +     bool unregistered = false;
> > > +
> > > +     lockdep_assert_held(&event_mutex);
> > > +
> > > +     list_for_each_entry(data, &file->triggers, list) {
> > > +             if (data->cmd_ops->trigger_type == test->cmd_ops->trigger_type) {
> > > +                     unregistered = true;
> > > +                     list_del_rcu(&data->list);
> > > +                     trace_event_trigger_enable_disable(file, 0);
> > > +                     update_cond_flag(file);
> > > +                     break;
> > > +             }
> > > +     }
> > > +
> > > +     if (unregistered && data->ops->free) {
> > > +             data->ops->free(data->ops, data);
> > > +             exit_trace_object();
> > > +     }
> >
> > isn't this the following? Can you skip exit_trace_object()
> > if data->ops->free == NULL?
> >
> >         if (unregistered) {
> >                 if (data->ops->free)
> >                         data->ops->free(data->ops, data);
> >                 exit_trace_object();
> >         }
> 
> Thanks, this one would be better, although data->ops->free is never
> NULL at the moment.
> 
>

Thank you,


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH v6 1/5] trace: Add trace any kernel object
  2021-12-17 16:32     ` Jeff Xie
  2021-12-19  3:07       ` Masami Hiramatsu
@ 2021-12-21  7:36       ` Masami Hiramatsu
  2021-12-21 10:29         ` Jeff Xie
  1 sibling, 1 reply; 26+ messages in thread
From: Masami Hiramatsu @ 2021-12-21  7:36 UTC (permalink / raw)
  To: Jeff Xie; +Cc: Steven Rostedt, mingo, Tom Zanussi, linux-kernel

Hi Jeff,

On Sat, 18 Dec 2021 00:32:57 +0800
Jeff Xie <xiehuan09@gmail.com> wrote:

> > > +struct object_instance {
> > > +     void *object;
> > > +     struct freelist_node free_list;
> > > +     struct list_head active_list;
> > > +};
> > > +
> > > +struct obj_pool {
> > > +     struct freelist_head free_list;
> > > +     struct list_head active_list;
> > > +};
> > > +static struct obj_pool *obj_pool;
> > > +
> > > +static bool object_exist(void *obj)
> > > +{
> > > +     struct object_instance *inst;
> > > +     bool ret = false;
> > > +
> > > +     list_for_each_entry_rcu(inst, &obj_pool->active_list, active_list) {
> > > +             if (inst->object == obj) {
> > > +                     ret = true;
> > > +                     goto out;
> > > +             }
> > > +     }
> > > +out:
> > > +     return ret;
> > > +}
> > > +
> > > +static bool object_empty(void)
> > > +{
> > > +     return list_empty(&obj_pool->active_list);
> > > +}
> > > +
> > > +static void set_trace_object(void *obj)
> > > +{
> > > +     struct freelist_node *fn;
> > > +     struct object_instance *ins;
> > > +     unsigned long flags;
> > > +
> > > +     if (in_nmi())
> > > +             return;
> > > +
> > > +     if (!obj)
> > > +             return;
> > > +
> > > +     if (object_exist(obj))
> > > +             return;
> > > +
> > > +     fn = freelist_try_get(&obj_pool->free_list);
> > > +     if (!fn) {
> > > +             trace_printk("object_pool is full, can't trace object:0x%px\n", obj);
> > > +             return;
> > > +     }
> > > +
> > > +     ins = container_of(fn, struct object_instance, free_list);
> > > +     ins->object = obj;
> > > +
> > > +     raw_spin_lock_irqsave(&object_spin_lock, flags);
> > > +     list_add_rcu(&ins->active_list, &obj_pool->active_list);
> > > +     raw_spin_unlock_irqrestore(&object_spin_lock, flags);
> >
> > Please add a comment explaining why this spinlock is needed here and why
> > the other operations don't need it.
> 
> (Only this place has write operations, and they must not run concurrently)
> I agree, I will add it.

BTW, I have a better solution for the object pool. If the
object pool size is fixed (of course, to avoid performance overhead,
it should be small enough) and it cannot avoid using a spinlock,
it is better to use a fixed-size array. That makes the implementation
much simpler.

static struct object_instance {
	void *obj;	/* trace object */
	// add offset in the next patch	
} traced_obj[MAX_TRACED_OBJECT];

static atomic_t num_traced_obj;
static DEFINE_RAW_SPINLOCK(trace_obj_lock);

static void set_trace_object(void *obj)
{
	...

	raw_spin_lock_irqsave(&trace_obj_lock, flags);
	if (num_traced_obj == MAX_TRACED_OBJECT)
		goto out;

	traced_obj[num_traced_obj].obj = obj;
	smp_wmb();	// make sure the num_traced_obj update always appears after the traced_obj update.
	num_traced_obj++;
out:
	raw_spin_unlock_irqrestore(&trace_obj_lock, flags);
}

static bool object_exist(void *obj)
{
	...
	max = num_traced_obj;
	smp_rmb();	// then the num_traced_obj will cap the max.
	for (i = 0; i < max; i++) {
		if (traced_obj[i].obj == obj)
			return true;
	}
	return false;
}

static inline void free_object_pool(void)
{
	num_traced_obj = 0;
	memset(traced_obj, 0, sizeof(traced_obj));
}

Sorry if I confused you, but I think you shouldn't spend time on that
unneeded complexity. :)


> > > +static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> > > +                              unsigned long object)
> > > +{
> > > +
> > > +     struct trace_buffer *buffer;
> > > +     struct ring_buffer_event *event;
> > > +     struct trace_object_entry *entry;
> > > +     int pc;
> > > +
> > > +     pc = preempt_count();
> > > +     event = trace_event_buffer_lock_reserve(&buffer, &event_trace_file,
> > > +                     TRACE_OBJECT, sizeof(*entry), pc);
> > > +     if (!event)
> > > +             return;
> > > +     entry   = ring_buffer_event_data(event);
> > > +     entry->ip                       = ip;
> > > +     entry->parent_ip                = parent_ip;
> > > +     entry->object                   = object;
> > > +
> > > +     event_trigger_unlock_commit(&event_trace_file, buffer, event,
> > > +             entry, pc);
> > > +}
> > > +
> > > +static void
> > > +trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> > > +             struct ftrace_ops *op, struct ftrace_regs *fregs)
> > > +{
> > > +     struct pt_regs *pt_regs = ftrace_get_regs(fregs);
> > > +     unsigned long obj;
> > > +     long disabled;
> > > +     int cpu, n;
> > > +
> > > +     preempt_disable_notrace();
> > > +
> > > +     cpu = raw_smp_processor_id();
> > > +     disabled = atomic_inc_return(&per_cpu(trace_object_event_disable, cpu));
> >
> > So what is the purpose of this check? (are there any issues if the same
> > CPU reenters here?)
> >
> 
> There may be an interrupt context that can preempt it. I am not yet
> sure whether CPU reentry will affect it.
> I will debug and test it. (Referred from function_test_events_call())

Maybe you can use ftrace_test_recursion_trylock(), as kprobe_ftrace_handler()
does.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH v6 1/5] trace: Add trace any kernel object
  2021-12-21  7:36       ` Masami Hiramatsu
@ 2021-12-21 10:29         ` Jeff Xie
  2021-12-23 14:12           ` Jeff Xie
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff Xie @ 2021-12-21 10:29 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Steven Rostedt, mingo, Tom Zanussi, linux-kernel

Hi Masami,

On Tue, Dec 21, 2021 at 3:36 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> Hi Jeff,
>
> On Sat, 18 Dec 2021 00:32:57 +0800
> Jeff Xie <xiehuan09@gmail.com> wrote:
>
> > > > +struct object_instance {
> > > > +     void *object;
> > > > +     struct freelist_node free_list;
> > > > +     struct list_head active_list;
> > > > +};
> > > > +
> > > > +struct obj_pool {
> > > > +     struct freelist_head free_list;
> > > > +     struct list_head active_list;
> > > > +};
> > > > +static struct obj_pool *obj_pool;
> > > > +
> > > > +static bool object_exist(void *obj)
> > > > +{
> > > > +     struct object_instance *inst;
> > > > +     bool ret = false;
> > > > +
> > > > +     list_for_each_entry_rcu(inst, &obj_pool->active_list, active_list) {
> > > > +             if (inst->object == obj) {
> > > > +                     ret = true;
> > > > +                     goto out;
> > > > +             }
> > > > +     }
> > > > +out:
> > > > +     return ret;
> > > > +}
> > > > +
> > > > +static bool object_empty(void)
> > > > +{
> > > > +     return list_empty(&obj_pool->active_list);
> > > > +}
> > > > +
> > > > +static void set_trace_object(void *obj)
> > > > +{
> > > > +     struct freelist_node *fn;
> > > > +     struct object_instance *ins;
> > > > +     unsigned long flags;
> > > > +
> > > > +     if (in_nmi())
> > > > +             return;
> > > > +
> > > > +     if (!obj)
> > > > +             return;
> > > > +
> > > > +     if (object_exist(obj))
> > > > +             return;
> > > > +
> > > > +     fn = freelist_try_get(&obj_pool->free_list);
> > > > +     if (!fn) {
> > > > +             trace_printk("object_pool is full, can't trace object:0x%px\n", obj);
> > > > +             return;
> > > > +     }
> > > > +
> > > > +     ins = container_of(fn, struct object_instance, free_list);
> > > > +     ins->object = obj;
> > > > +
> > > > +     raw_spin_lock_irqsave(&object_spin_lock, flags);
> > > > +     list_add_rcu(&ins->active_list, &obj_pool->active_list);
> > > > +     raw_spin_unlock_irqrestore(&object_spin_lock, flags);
> > >
> > > Please add a comment explaining why this spinlock is needed here and why
> > > the other operations don't need it.
> >
> > (Only this place has write operations, and they must not run concurrently)
> > I agree, I will add it.
>
> BTW, I have a better solution for the object pool. If the
> object pool size is fixed (of course, to avoid performance overhead,
> it should be small enough) and it cannot avoid using a spinlock,
> it is better to use a fixed-size array. That makes the implementation
> much simpler.

This looks much simpler; I will adopt it in the next version ;-)

> static struct object_instance {
>         void *obj;      /* trace object */
>         // add offset in the next patch
> } traced_obj[MAX_TRACED_OBJECT];
>
> static atomic_t num_traced_obj;
> static DEFINE_RAW_SPINLOCK(trace_obj_lock);
>
> static void set_trace_object(void *obj)
> {
>         ...
>
>         raw_spin_lock_irqsave(&trace_obj_lock, flags);
>         if (num_traced_obj == MAX_TRACED_OBJECT)
>                 goto out;
>
>         traced_obj[num_traced_obj].obj = obj;
>         smp_wmb();      // make sure the num_traced_obj update always appears after the traced_obj update.
>         num_traced_obj++;

I would like to ask whether this place needs another smp_wmb()
to match the smp_rmb() in object_exist().
I learned that RCU has a publish-subscribe mechanism ;-)

> out:
>         raw_spin_unlock_irqrestore(&trace_obj_lock, flags);
> }
>
> static bool object_exist(void *obj)
> {
>         ...
>         max = num_traced_obj;
>         smp_rmb();      // then the num_traced_obj will cap the max.
>         for (i = 0; i < max; i++) {
>                 if (traced_obj[i].obj == obj)
>                         return true;
>         }
>         return false;
> }
>
> static inline void free_object_pool(void)
> {
>         num_traced_obj = 0;
>         memset(traced_obj, 0, sizeof(traced_obj));
> }
>
> Sorry if I confused you, but I think you shouldn't spend time on that
> unneeded complexity. :)

Thanks, at least I learned a different way to implement it ;-)

>
>
> > > > +static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> > > > +                              unsigned long object)
> > > > +{
> > > > +
> > > > +     struct trace_buffer *buffer;
> > > > +     struct ring_buffer_event *event;
> > > > +     struct trace_object_entry *entry;
> > > > +     int pc;
> > > > +
> > > > +     pc = preempt_count();
> > > > +     event = trace_event_buffer_lock_reserve(&buffer, &event_trace_file,
> > > > +                     TRACE_OBJECT, sizeof(*entry), pc);
> > > > +     if (!event)
> > > > +             return;
> > > > +     entry   = ring_buffer_event_data(event);
> > > > +     entry->ip                       = ip;
> > > > +     entry->parent_ip                = parent_ip;
> > > > +     entry->object                   = object;
> > > > +
> > > > +     event_trigger_unlock_commit(&event_trace_file, buffer, event,
> > > > +             entry, pc);
> > > > +}
> > > > +
> > > > +static void
> > > > +trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> > > > +             struct ftrace_ops *op, struct ftrace_regs *fregs)
> > > > +{
> > > > +     struct pt_regs *pt_regs = ftrace_get_regs(fregs);
> > > > +     unsigned long obj;
> > > > +     long disabled;
> > > > +     int cpu, n;
> > > > +
> > > > +     preempt_disable_notrace();
> > > > +
> > > > +     cpu = raw_smp_processor_id();
> > > > +     disabled = atomic_inc_return(&per_cpu(trace_object_event_disable, cpu));
> > >
> > > So what is the purpose of this check? (are there any issues if the same
> > > CPU reenters here?)
> > >
> >
> > There may be an interrupt context that can preempt it. I am not yet
> > sure whether CPU reentry will affect it.
> > I will debug and test it. (Referred from function_test_events_call())
>
> Maybe you can use ftrace_test_recursion_trylock(), as kprobe_ftrace_handler()
> does.

I will use it, thanks.

>
> Thank you,
>
> --
> Masami Hiramatsu <mhiramat@kernel.org>
---
JeffXie

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH v6 1/5] trace: Add trace any kernel object
  2021-12-21 10:29         ` Jeff Xie
@ 2021-12-23 14:12           ` Jeff Xie
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff Xie @ 2021-12-23 14:12 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Steven Rostedt, mingo, Tom Zanussi, linux-kernel

Hi Masami,

On Tue, Dec 21, 2021 at 6:29 PM Jeff Xie <xiehuan09@gmail.com> wrote:
>
> Hi Masami,
>
> On Tue, Dec 21, 2021 at 3:36 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > Hi Jeff,
> >
> > On Sat, 18 Dec 2021 00:32:57 +0800
> > Jeff Xie <xiehuan09@gmail.com> wrote:
> >
> > > > > +struct object_instance {
> > > > > +     void *object;
> > > > > +     struct freelist_node free_list;
> > > > > +     struct list_head active_list;
> > > > > +};
> > > > > +
> > > > > +struct obj_pool {
> > > > > +     struct freelist_head free_list;
> > > > > +     struct list_head active_list;
> > > > > +};
> > > > > +static struct obj_pool *obj_pool;
> > > > > +
> > > > > +static bool object_exist(void *obj)
> > > > > +{
> > > > > +     struct object_instance *inst;
> > > > > +     bool ret = false;
> > > > > +
> > > > > +     list_for_each_entry_rcu(inst, &obj_pool->active_list, active_list) {
> > > > > +             if (inst->object == obj) {
> > > > > +                     ret = true;
> > > > > +                     goto out;
> > > > > +             }
> > > > > +     }
> > > > > +out:
> > > > > +     return ret;
> > > > > +}
> > > > > +
> > > > > +static bool object_empty(void)
> > > > > +{
> > > > > +     return list_empty(&obj_pool->active_list);
> > > > > +}
> > > > > +
> > > > > +static void set_trace_object(void *obj)
> > > > > +{
> > > > > +     struct freelist_node *fn;
> > > > > +     struct object_instance *ins;
> > > > > +     unsigned long flags;
> > > > > +
> > > > > +     if (in_nmi())
> > > > > +             return;
> > > > > +
> > > > > +     if (!obj)
> > > > > +             return;
> > > > > +
> > > > > +     if (object_exist(obj))
> > > > > +             return;
> > > > > +
> > > > > +     fn = freelist_try_get(&obj_pool->free_list);
> > > > > +     if (!fn) {
> > > > > +             trace_printk("object_pool is full, can't trace object:0x%px\n", obj);
> > > > > +             return;
> > > > > +     }
> > > > > +
> > > > > +     ins = container_of(fn, struct object_instance, free_list);
> > > > > +     ins->object = obj;
> > > > > +
> > > > > +     raw_spin_lock_irqsave(&object_spin_lock, flags);
> > > > > +     list_add_rcu(&ins->active_list, &obj_pool->active_list);
> > > > > +     raw_spin_unlock_irqrestore(&object_spin_lock, flags);
> > > >
> > > > Please add a comment explaining why this spinlock is needed here and why
> > > > the other operations don't need it.
> > >
> > > (Only this place has write operations, and they must not run concurrently)
> > > I agree, I will add it.
> >
> > BTW, I have a better solution for the object pool. If the
> > object pool size is fixed (of course, to avoid performance overhead,
> > it should be small enough) and it cannot avoid using a spinlock,
> > it is better to use a fixed-size array. That makes the implementation
> > much simpler.
>
> This looks much simpler; I will adopt it in the next version ;-)
>
> > static struct object_instance {
> >         void *obj;      /* trace object */
> >         // add offset in the next patch
> > } traced_obj[MAX_TRACED_OBJECT];
> >
> > static atomic_t num_traced_obj;
> > static DEFINE_RAW_SPINLOCK(trace_obj_lock);
> >
> > static void set_trace_object(void *obj)
> > {
> >         ...
> >
> >         raw_spin_lock_irqsave(&trace_obj_lock, flags);
> >         if (num_traced_obj == MAX_TRACED_OBJECT)
> >                 goto out;
> >
> >         traced_obj[num_traced_obj].obj = obj;
> >         smp_wmb();      // make sure the num_traced_obj update always appears after the traced_obj update.
> >         num_traced_obj++;
>
> I would like to ask whether this place needs another smp_wmb()
> to match the smp_rmb() in object_exist().
> I learned that RCU has a publish-subscribe mechanism ;-)

There is no need to add another smp_wmb(); the existing one already pairs
with the smp_rmb() in object_exist(). I understand now.
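
For the record, here is how the barriers in Masami's sketch pair up
(an annotated copy of the code quoted above):

	/* writer side (set_trace_object) */
	traced_obj[num_traced_obj].obj = obj;
	smp_wmb();	/* publish the slot before the new count ... */
	num_traced_obj++;

	/* reader side (object_exist) */
	max = num_traced_obj;
	smp_rmb();	/* ... pairs with the smp_wmb(): every slot below
			 * 'max' is guaranteed to hold a valid object */
	for (i = 0; i < max; i++)
		...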

Have a nice holiday ;-) Thanks.

>
> > out:
> >         raw_spin_unlock_irqrestore(&trace_obj_lock, flags);
> > }
> >
> > static bool object_exist(void *obj)
> > {
> >         ...
> >         max = num_traced_obj;
> >         smp_rmb();      // then the num_traced_obj will cap the max.
> >         for (i = 0; i < max; i++) {
> >                 if (traced_obj[i].obj == obj)
> >                         return true;
> >         }
> >         return false;
> > }
> >
> > static inline void free_object_pool(void)
> > {
> >         num_traced_obj = 0;
> >         memset(traced_obj, 0, sizeof(traced_obj));
> > }
> >
> > Sorry if I confused you, but I think you shouldn't spend time on that
> > unneeded complexity. :)
>
> Thanks, at least I learned a different way to implement it ;-)
>
> >
> >
> > > > > +static void submit_trace_object(unsigned long ip, unsigned long parent_ip,
> > > > > +                              unsigned long object)
> > > > > +{
> > > > > +
> > > > > +     struct trace_buffer *buffer;
> > > > > +     struct ring_buffer_event *event;
> > > > > +     struct trace_object_entry *entry;
> > > > > +     int pc;
> > > > > +
> > > > > +     pc = preempt_count();
> > > > > +     event = trace_event_buffer_lock_reserve(&buffer, &event_trace_file,
> > > > > +                     TRACE_OBJECT, sizeof(*entry), pc);
> > > > > +     if (!event)
> > > > > +             return;
> > > > > +     entry   = ring_buffer_event_data(event);
> > > > > +     entry->ip                       = ip;
> > > > > +     entry->parent_ip                = parent_ip;
> > > > > +     entry->object                   = object;
> > > > > +
> > > > > +     event_trigger_unlock_commit(&event_trace_file, buffer, event,
> > > > > +             entry, pc);
> > > > > +}
> > > > > +
> > > > > +static void
> > > > > +trace_object_events_call(unsigned long ip, unsigned long parent_ip,
> > > > > +             struct ftrace_ops *op, struct ftrace_regs *fregs)
> > > > > +{
> > > > > +     struct pt_regs *pt_regs = ftrace_get_regs(fregs);
> > > > > +     unsigned long obj;
> > > > > +     long disabled;
> > > > > +     int cpu, n;
> > > > > +
> > > > > +     preempt_disable_notrace();
> > > > > +
> > > > > +     cpu = raw_smp_processor_id();
> > > > > +     disabled = atomic_inc_return(&per_cpu(trace_object_event_disable, cpu));
> > > >
> > > > So what is the purpose of this check? (are there any issues if the same
> > > > CPU reenters here?)
> > > >
> > >
> > > There may be an interrupt context that can preempt it. I am not yet
> > > sure whether CPU reentry will affect it.
> > > I will debug and test it. (Referred from function_test_events_call())
> >
> > Maybe you can use ftrace_test_recursion_trylock(), as kprobe_ftrace_handler()
> > does.
>
> I will use it, thanks.
>
> >
> > Thank you,
> >
> > --
> > Masami Hiramatsu <mhiramat@kernel.org>
> ---
> JeffXie
---
JeffXie

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH v6 1/5] trace: Add trace any kernel object
  2021-12-19  3:07       ` Masami Hiramatsu
@ 2022-01-08  0:21         ` Steven Rostedt
  2022-01-10  2:00           ` Jeff Xie
  0 siblings, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2022-01-08  0:21 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Jeff Xie, mingo, Tom Zanussi, linux-kernel

Sorry for the late reply, I'm currently unemployed (for another week) and
was spending all my time renovating my office. I finished my office and I'm
now trying to get back up to speed.

On Sun, 19 Dec 2021 12:07:23 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:


> > > > +#include "trace_output.h"
> > > > +#include <linux/freelist.h>
> > > > +
> > > > +static DEFINE_PER_CPU(atomic_t, trace_object_event_disable);  
> > >
> > > atomic_t is for atomic operation which must be shared among cpus. On the
> > > other hand, per-cpu variable is used for the core-local storage or flags,
> > > other cpus never touch it. Thus the per-cpu atomic_t is very strange.
> > >  
> > 
> > Since patch v1, I have cloned it from function_test_events_call()
> > in kernel/trace/trace_events.c
> > 
> > commit: 9ea21c1ecdb35ecdcac5fd9d95f62a1f6a7ffec0
> > tracing/events: perform function tracing in event selftests
> > Author:     Steven Rostedt <srostedt@redhat.com>  
> 
> Hmm, OK.

Ug, showing me my skeletons in my closet! That commit is from 2009, where I
didn't know any better ;-)

> 
> > 
> > It should be there to prevent being preempted by the interrupt context in
> > the middle of incrementing the counter
> 
> Yeah, I think so.
> 
> The commit message says "some bugs", but it is not clear what actually
> needed to be handled.
> 
>     tracing/events: perform function tracing in event selftests
>     
>     We can find some bugs in the trace events if we stress the writes as well.
>     The function tracer is a good way to stress the events.
> 
> Steve, can you tell me what was the problem?
> 
> I think we don't need a per-cpu atomic_t because the counter is only
> incremented and decremented. Thus, when exiting the nested ftrace handler
> on the same CPU, the counter comes back to the same value. We don't need
> to care about atomic increments.
>
> I mean, if we use a normal per-cpu "unsigned int" as the counter, the
> operation "counter++" becomes:

Yes, that was from the days of being extra paranoid. A simple counter would
work, with a barrier() in place such that gcc doesn't cause any issues.

I may have to go back and revisit all that code and clean it up a bit.

> 
> load counter to reg1
> [1]
> reg1 = reg1 + 1
> store reg1 to counter
> 
> And if an interrupt occurs at [1], the following happens.
> 
> load counter to reg1 # counter = 0
> 
>   (interrupt - save reg1)
>   load counter to reg1  # counter = 0
>   reg1 = reg1 + 1
>   store reg1 to counter  # counter = 1
>   ...
>   load counter to reg1  # counter = 1
>   reg1 = reg1 - 1
>   store reg1 to counter  # counter = 0
>   (iret - restore reg1)
> 
> reg1 = reg1 + 1
> store reg1 to counter
> 
> So, even if the operation is not atomic, there seems to be no problem.
> What other scenario do we have to worry about?
> 
> (BTW, what is ftrace_test_recursion_trylock()? Is that also
> for detecting the nesting case?)

Yes, the ftrace_test_recursion_trylock() is for finding recursions.

The above code is from the early days of ftrace, and was only used in
testing at boot up. It's not something to copy from ;-)

> 
> > > > +static DEFINE_RAW_SPINLOCK(object_spin_lock);
> > > > +static struct trace_event_file event_trace_file;
> > > > +static const int max_args_num = 6;
> > > > +static const int max_obj_pool = 10;
> > > > +static atomic_t trace_object_ref;
> > > > +static int exit_trace_object(void);
> > > > +static int init_trace_object(void);
> > > > +  
> > >
> > > Please add more comments to the code itself. Explain why this is needed
> > > and how it works for which case. That will lead to a deeper understanding.
> > >  
> > 
> > I agree, I will add more comments in the next version.
> >   
> > > > +struct object_instance {
> > > > +     void *object;
> > > > +     struct freelist_node free_list;
> > > > +     struct list_head active_list;
> > > > +};
> > > > +
> > > > +struct obj_pool {
> > > > +     struct freelist_head free_list;
> > > > +     struct list_head active_list;
> > > > +};
> > > > +static struct obj_pool *obj_pool;
> > > > +
> > > > +static bool object_exist(void *obj)
> > > > +{
> > > > +     struct object_instance *inst;
> > > > +     bool ret = false;
> > > > +
> > > > +     list_for_each_entry_rcu(inst, &obj_pool->active_list, active_list) {
> > > > +             if (inst->object == obj) {
> > > > +                     ret = true;
> > > > +                     goto out;
> > > > +             }
> > > > +     }
> > > > +out:
> > > > +     return ret;

BTW, the above really should be:

static bool object_exist(void *obj)
{
	struct object_instance *inst;

	list_for_each_entry_rcu(inst, &obj_pool->active_list, active_list) {
		if (inst->object == obj)
			return true;
	}
	return false;
}

-- Steve


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH v6 1/5] trace: Add trace any kernel object
  2022-01-08  0:21         ` Steven Rostedt
@ 2022-01-10  2:00           ` Jeff Xie
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff Xie @ 2022-01-10  2:00 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Masami Hiramatsu, mingo, Tom Zanussi, linux-kernel

Hi Steven,

Welcome back; I look forward to working on this patch set with you again
in 2022 ;-)

Thank you and Masami for your guidance on this patch set in 2021. I
learned a lot.


On Sat, Jan 8, 2022 at 8:21 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Sorry for the late reply, I'm currently unemployed (for another week) and
> was spending all my time renovating my office. I finished my office and I'm
> now trying to get back up to speed.
>
> On Sun, 19 Dec 2021 12:07:23 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
>
> > > > > +#include "trace_output.h"
> > > > > +#include <linux/freelist.h>
> > > > > +
> > > > > +static DEFINE_PER_CPU(atomic_t, trace_object_event_disable);
> > > >
> > > > atomic_t is for atomic operation which must be shared among cpus. On the
> > > > other hand, per-cpu variable is used for the core-local storage or flags,
> > > > other cpus never touch it. Thus the per-cpu atomic_t is very strange.
> > > >
> > >
> > > Since patch v1, I have cloned it from function_test_events_call()
> > > in kernel/trace/trace_events.c
> > >
> > > commit: 9ea21c1ecdb35ecdcac5fd9d95f62a1f6a7ffec0
> > > tracing/events: perform function tracing in event selftests
> > > Author:     Steven Rostedt <srostedt@redhat.com>
> >
> > Hmm, OK.
>
> Ug, showing me my skeletons in my closet! That commit is from 2009, where I
> didn't know any better ;-)
>
> >
> > >
> > > It should be there to prevent being preempted by the interrupt context in
> > > the middle of incrementing the counter
> >
> > Yeah, I think so.
> >
> > The commit message says "some bugs", but it is not clear what actually
> > needed to be handled.
> >
> >     tracing/events: perform function tracing in event selftests
> >
> >     We can find some bugs in the trace events if we stress the writes as well.
> >     The function tracer is a good way to stress the events.
> >
> > Steve, can you tell me what was the problem?
> >
> > I think we don't need a per-cpu atomic_t because the counter is only
> > incremented and decremented. Thus, when exiting the nested ftrace handler
> > on the same CPU, the counter comes back to the same value. We don't need
> > to care about atomic increments.
> >
> > I mean, if we use a normal per-cpu "unsigned int" as the counter, the
> > operation "counter++" becomes:
>
> Yes, that was from the days of being extra paranoid. A simple counter would
> work, with a barrier() in place such that gcc doesn't cause any issues.
>
> I may have to go back and revisit all that code and clean it up a bit.
>
> >
> > load counter to reg1
> > [1]
> > reg1 = reg1 + 1
> > store reg1 to counter
> >
> > And if an interrupt occurs at [1], the following happens.
> >
> > load counter to reg1 # counter = 0
> >
> >   (interrupt - save reg1)
> >   load counter to reg1  # counter = 0
> >   reg1 = reg1 + 1
> >   store reg1 to counter  # counter = 1
> >   ...
> >   load counter to reg1  # counter = 1
> >   reg1 = reg1 - 1
> >   store reg1 to counter  # counter = 0
> >   (iret - restore reg1)
> >
> > reg1 = reg1 + 1
> > store reg1 to counter
> >
> > So, even if the operation is not atomic, there seems to be no problem.
> > What other scenario do we have to worry about?
> >
> > (BTW, what is ftrace_test_recursion_trylock()? Is that also
> > for detecting the nesting case?)
>
> Yes, the ftrace_test_recursion_trylock() is for finding recursions.
>
> The above code is from the early days of ftrace, and was only used in
> testing at boot up. It's not something to copy from ;-)
>
> >
> > > > > +static DEFINE_RAW_SPINLOCK(object_spin_lock);
> > > > > +static struct trace_event_file event_trace_file;
> > > > > +static const int max_args_num = 6;
> > > > > +static const int max_obj_pool = 10;
> > > > > +static atomic_t trace_object_ref;
> > > > > +static int exit_trace_object(void);
> > > > > +static int init_trace_object(void);
> > > > > +
> > > >
> > > > Please add more comments to the code itself. Explain why this is needed
> > > > and how it works for which case. That will lead to a deeper understanding.
> > > >
> > >
> > > I agree, I will add more comments in the next version.
> > >
> > > > > +struct object_instance {
> > > > > +     void *object;
> > > > > +     struct freelist_node free_list;
> > > > > +     struct list_head active_list;
> > > > > +};
> > > > > +
> > > > > +struct obj_pool {
> > > > > +     struct freelist_head free_list;
> > > > > +     struct list_head active_list;
> > > > > +};
> > > > > +static struct obj_pool *obj_pool;
> > > > > +
> > > > > +static bool object_exist(void *obj)
> > > > > +{
> > > > > +     struct object_instance *inst;
> > > > > +     bool ret = false;
> > > > > +
> > > > > +     list_for_each_entry_rcu(inst, &obj_pool->active_list, active_list) {
> > > > > +             if (inst->object == obj) {
> > > > > +                     ret = true;
> > > > > +                     goto out;
> > > > > +             }
> > > > > +     }
> > > > > +out:
> > > > > +     return ret;
>
> BTW, the above really should be:
>
> static bool object_exist(void *obj)
> {
>         struct object_instance *inst;
>
>         list_for_each_entry_rcu(inst, &obj_pool->active_list, active_list) {
>                 if (inst->object == obj)
>                         return true;
>         }
>         return false;
> }

Thanks. Masami suggested that it is better to use a fixed-size array;
I will prepare and send the next version.

> -- Steve
>
---
JeffXie

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC][PATCH v6 1/5] trace: Add trace any kernel object
@ 2021-11-30 19:50 kernel test robot
  0 siblings, 0 replies; 26+ messages in thread
From: kernel test robot @ 2021-11-30 19:50 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 25803 bytes --]

CC: llvm(a)lists.linux.dev
CC: kbuild-all(a)lists.01.org
In-Reply-To: <20211129164951.220511-2-xiehuan09@gmail.com>
References: <20211129164951.220511-2-xiehuan09@gmail.com>
TO: Jeff Xie <xiehuan09@gmail.com>

Hi Jeff,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on rostedt-trace/for-next]
[also build test WARNING on linux/master linus/master v5.16-rc3 next-20211130]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jeff-Xie/trace-Introduce-objtrace-trigger-to-trace-the-kernel-object/20211130-041958
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git for-next
:::::: branch date: 23 hours ago
:::::: commit date: 23 hours ago
config: x86_64-randconfig-c007-20211128 (https://download.01.org/0day-ci/archive/20211201/202112010332.H8xSohUr-lkp(a)intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 25eb7fa01d7ebbe67648ea03841cda55b4239ab2)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/88ea43d691b0980bf49b4e579c39303cf3c9cd72
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jeff-Xie/trace-Introduce-objtrace-trigger-to-trace-the-kernel-object/20211130-041958
        git checkout 88ea43d691b0980bf49b4e579c39303cf3c9cd72
        # save the config file to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 clang-analyzer 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


clang-analyzer warnings: (new ones prefixed by >>)
           if (function != ACPI_READ && function != ACPI_WRITE)
               ^~~~~~~~~~~~~~~~~~~~~
   drivers/acpi/ec.c:1283:6: note: Left side of '&&' is true
   drivers/acpi/ec.c:1283:31: note: Assuming 'function' is equal to ACPI_WRITE
           if (function != ACPI_READ && function != ACPI_WRITE)
                                        ^~~~~~~~~~~~~~~~~~~~~~
   drivers/acpi/ec.c:1283:2: note: Taking false branch
           if (function != ACPI_READ && function != ACPI_WRITE)
           ^
   drivers/acpi/ec.c:1286:6: note: Assuming field 'busy_polling' is true
           if (ec->busy_polling || bits > 8)
               ^~~~~~~~~~~~~~~~
   drivers/acpi/ec.c:1286:23: note: Left side of '||' is true
           if (ec->busy_polling || bits > 8)
                                ^
   drivers/acpi/ec.c:1289:14: note: Assuming 'i' is < 'bytes'
           for (i = 0; i < bytes; ++i, ++address, ++value)
                       ^~~~~~~~~
   drivers/acpi/ec.c:1289:2: note: Loop condition is true.  Entering loop body
           for (i = 0; i < bytes; ++i, ++address, ++value)
           ^
   drivers/acpi/ec.c:1290:13: note: 'function' is not equal to ACPI_READ
                   result = (function == ACPI_READ) ?
                             ^~~~~~~~
   drivers/acpi/ec.c:1290:12: note: '?' condition is false
                   result = (function == ACPI_READ) ?
                            ^
   drivers/acpi/ec.c:1292:4: note: Calling 'acpi_ec_write'
                           acpi_ec_write(ec, address, *value);
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/acpi/ec.c:872:9: note: Calling 'acpi_ec_transaction'
           return acpi_ec_transaction(ec, &t);
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/acpi/ec.c:807:2: note: 'glk' declared without an initial value
           u32 glk;
           ^~~~~~~
   drivers/acpi/ec.c:809:7: note: 'ec' is non-null
           if (!ec || (!t) || (t->wlen && !t->wdata) || (t->rlen && !t->rdata))
                ^~
   drivers/acpi/ec.c:809:6: note: Left side of '||' is false
           if (!ec || (!t) || (t->wlen && !t->wdata) || (t->rlen && !t->rdata))
               ^
   drivers/acpi/ec.c:809:15: note: 't' is non-null
           if (!ec || (!t) || (t->wlen && !t->wdata) || (t->rlen && !t->rdata))
                        ^
   drivers/acpi/ec.c:809:6: note: Left side of '||' is false
           if (!ec || (!t) || (t->wlen && !t->wdata) || (t->rlen && !t->rdata))
               ^
   drivers/acpi/ec.c:809:25: note: Field 'wlen' is 2
           if (!ec || (!t) || (t->wlen && !t->wdata) || (t->rlen && !t->rdata))
                                  ^
   drivers/acpi/ec.c:809:22: note: Left side of '&&' is true
           if (!ec || (!t) || (t->wlen && !t->wdata) || (t->rlen && !t->rdata))
                               ^
   drivers/acpi/ec.c:809:37: note: Field 'wdata' is non-null
           if (!ec || (!t) || (t->wlen && !t->wdata) || (t->rlen && !t->rdata))
                                              ^
   drivers/acpi/ec.c:809:6: note: Left side of '||' is false
           if (!ec || (!t) || (t->wlen && !t->wdata) || (t->rlen && !t->rdata))
               ^
   drivers/acpi/ec.c:809:51: note: Field 'rlen' is 0
           if (!ec || (!t) || (t->wlen && !t->wdata) || (t->rlen && !t->rdata))
                                                            ^
   drivers/acpi/ec.c:809:56: note: Left side of '&&' is false
           if (!ec || (!t) || (t->wlen && !t->wdata) || (t->rlen && !t->rdata))
                                                                 ^
   drivers/acpi/ec.c:811:9: note: Field 'rdata' is null
           if (t->rdata)
                  ^
   drivers/acpi/ec.c:811:2: note: Taking false branch
           if (t->rdata)
           ^
   drivers/acpi/ec.c:815:6: note: Assuming field 'global_lock' is false
           if (ec->global_lock) {
               ^~~~~~~~~~~~~~~
   drivers/acpi/ec.c:815:2: note: Taking false branch
           if (ec->global_lock) {
           ^
   drivers/acpi/ec.c:825:6: note: Assuming field 'global_lock' is true
           if (ec->global_lock)
               ^~~~~~~~~~~~~~~
   drivers/acpi/ec.c:825:2: note: Taking true branch
           if (ec->global_lock)
           ^
   drivers/acpi/ec.c:826:3: note: 1st function call argument is an uninitialized value
                   acpi_release_global_lock(glk);
                   ^                        ~~~
   drivers/acpi/ec.c:1576:2: warning: Call to function 'strcpy' is insecure as it does not provide bounding of the memory buffer. Replace unbounded copy functions with analogous functions that support length arguments such as 'strlcpy'. CWE-119 [clang-analyzer-security.insecureAPI.strcpy]
           strcpy(acpi_device_name(device), ACPI_EC_DEVICE_NAME);
           ^~~~~~
   drivers/acpi/ec.c:1576:2: note: Call to function 'strcpy' is insecure as it does not provide bounding of the memory buffer. Replace unbounded copy functions with analogous functions that support length arguments such as 'strlcpy'. CWE-119
           strcpy(acpi_device_name(device), ACPI_EC_DEVICE_NAME);
           ^~~~~~
   drivers/acpi/ec.c:1577:2: warning: Call to function 'strcpy' is insecure as it does not provide bounding of the memory buffer. Replace unbounded copy functions with analogous functions that support length arguments such as 'strlcpy'. CWE-119 [clang-analyzer-security.insecureAPI.strcpy]
           strcpy(acpi_device_class(device), ACPI_EC_CLASS);
           ^~~~~~
   drivers/acpi/ec.c:1577:2: note: Call to function 'strcpy' is insecure as it does not provide bounding of the memory buffer. Replace unbounded copy functions with analogous functions that support length arguments such as 'strlcpy'. CWE-119
           strcpy(acpi_device_class(device), ACPI_EC_CLASS);
           ^~~~~~
   3 warnings generated.
>> kernel/trace/trace_object.c:99:2: warning: Use of memory after it is freed [clang-analyzer-unix.Malloc]
           list_for_each_entry_rcu(inst, head, active_list)
           ^
   include/linux/rculist.h:393:9: note: expanded from macro 'list_for_each_entry_rcu'
                   pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
                         ^
   include/linux/rculist.h:307:15: note: expanded from macro 'list_entry_rcu'
           container_of(READ_ONCE(ptr), type, member)
                        ^
   include/asm-generic/rwonce.h:50:2: note: expanded from macro 'READ_ONCE'
           __READ_ONCE(x);                                                 \
           ^
   include/asm-generic/rwonce.h:44:24: note: expanded from macro '__READ_ONCE'
   #define __READ_ONCE(x)  (*(const volatile __unqual_scalar_typeof(x) *)&(x))
                           ^
   kernel/trace/trace_object.c:298:2: note: Assuming 'debug_locks' is 0
           lockdep_assert_held(&event_mutex);
           ^
   include/linux/lockdep.h:316:2: note: expanded from macro 'lockdep_assert_held'
           lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/lockdep.h:310:15: note: expanded from macro 'lockdep_assert'
           do { WARN_ON(debug_locks && !(cond)); } while (0)
                        ^~~~~~~~~~~
   include/asm-generic/bug.h:121:25: note: expanded from macro 'WARN_ON'
           int __ret_warn_on = !!(condition);                              \
                                  ^~~~~~~~~
   kernel/trace/trace_object.c:298:2: note: Left side of '&&' is false
           lockdep_assert_held(&event_mutex);
           ^
   include/linux/lockdep.h:316:2: note: expanded from macro 'lockdep_assert_held'
           lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
           ^
   include/linux/lockdep.h:310:27: note: expanded from macro 'lockdep_assert'
           do { WARN_ON(debug_locks && !(cond)); } while (0)
                                    ^
   kernel/trace/trace_object.c:298:2: note: Taking false branch
           lockdep_assert_held(&event_mutex);
           ^
   include/linux/lockdep.h:316:2: note: expanded from macro 'lockdep_assert_held'
           lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
           ^
   include/linux/lockdep.h:310:7: note: expanded from macro 'lockdep_assert'
           do { WARN_ON(debug_locks && !(cond)); } while (0)
                ^
   include/asm-generic/bug.h:122:2: note: expanded from macro 'WARN_ON'
           if (unlikely(__ret_warn_on))                                    \
           ^
   kernel/trace/trace_object.c:298:2: note: Loop condition is false.  Exiting loop
           lockdep_assert_held(&event_mutex);
           ^
   include/linux/lockdep.h:316:2: note: expanded from macro 'lockdep_assert_held'
           lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
           ^
   include/linux/lockdep.h:310:2: note: expanded from macro 'lockdep_assert'
           do { WARN_ON(debug_locks && !(cond)); } while (0)
           ^
   kernel/trace/trace_object.c:300:2: note: Loop condition is false. Execution continues on line 307
           list_for_each_entry(test, &file->triggers, list) {
           ^
   include/linux/list.h:630:2: note: expanded from macro 'list_for_each_entry'
           for (pos = list_first_entry(head, typeof(*pos), member);        \
           ^
   kernel/trace/trace_object.c:307:6: note: Assuming field 'init' is null
           if (data->ops->init) {
               ^~~~~~~~~~~~~~~
   kernel/trace/trace_object.c:307:2: note: Taking false branch
           if (data->ops->init) {
           ^
   kernel/trace/trace_object.c:317:6: note: Assuming the condition is false
           if (trace_event_trigger_enable_disable(file, 1) < 0) {
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/trace/trace_object.c:317:2: note: Taking false branch
           if (trace_event_trigger_enable_disable(file, 1) < 0) {
           ^
   kernel/trace/trace_object.c:322:2: note: Calling 'init_trace_object'
           init_trace_object();
           ^~~~~~~~~~~~~~~~~~~
   kernel/trace/trace_object.c:495:6: note: Assuming the condition is false
           if (atomic_inc_return(&trace_object_ref) != 1) {
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/trace/trace_object.c:495:2: note: Taking false branch
           if (atomic_inc_return(&trace_object_ref) != 1) {
           ^
   kernel/trace/trace_object.c:500:8: note: Calling 'init_object_pool'
           ret = init_object_pool();
                 ^~~~~~~~~~~~~~~~~~
   kernel/trace/trace_object.c:117:6: note: Assuming 'obj_pool' is non-null
           if (!obj_pool) {
               ^~~~~~~~~
   kernel/trace/trace_object.c:117:2: note: Taking false branch
           if (!obj_pool) {
           ^
   kernel/trace/trace_object.c:124:2: note: Loop condition is true.  Entering loop body
           for (i = 0; i < max_obj_pool; i++) {
           ^
   kernel/trace/trace_object.c:126:7: note: Assuming 'inst' is null
                   if (!inst) {
                       ^~~~~
   kernel/trace/trace_object.c:126:3: note: Taking true branch
                   if (!inst) {
--
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
                              ^
   include/linux/compiler_types.h:302:3: note: expanded from macro '__native_word'
           (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
            ^
   kernel/trace/trace_object.c:99:2: note: Left side of '||' is false
           list_for_each_entry_rcu(inst, head, active_list)
           ^
   include/linux/rculist.h:393:9: note: expanded from macro 'list_for_each_entry_rcu'
                   pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
                         ^
   include/linux/rculist.h:307:15: note: expanded from macro 'list_entry_rcu'
           container_of(READ_ONCE(ptr), type, member)
                        ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
                              ^
   include/linux/compiler_types.h:302:3: note: expanded from macro '__native_word'
           (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
            ^
   kernel/trace/trace_object.c:99:2: note: Left side of '||' is true
           list_for_each_entry_rcu(inst, head, active_list)
           ^
   include/linux/rculist.h:393:9: note: expanded from macro 'list_for_each_entry_rcu'
                   pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
                         ^
   include/linux/rculist.h:307:15: note: expanded from macro 'list_entry_rcu'
           container_of(READ_ONCE(ptr), type, member)
                        ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   include/asm-generic/rwonce.h:36:38: note: expanded from macro 'compiletime_assert_rwonce_type'
           compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
                                               ^
   kernel/trace/trace_object.c:99:2: note: Taking false branch
           list_for_each_entry_rcu(inst, head, active_list)
           ^
   include/linux/rculist.h:393:9: note: expanded from macro 'list_for_each_entry_rcu'
                   pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
                         ^
   include/linux/rculist.h:307:15: note: expanded from macro 'list_entry_rcu'
           container_of(READ_ONCE(ptr), type, member)
                        ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   note: (skipping 1 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
   include/linux/compiler_types.h:335:2: note: expanded from macro 'compiletime_assert'
           _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
           ^
   include/linux/compiler_types.h:323:2: note: expanded from macro '_compiletime_assert'
           __compiletime_assert(condition, msg, prefix, suffix)
           ^
   include/linux/compiler_types.h:315:3: note: expanded from macro '__compiletime_assert'
                   if (!(condition))                                       \
                   ^
   kernel/trace/trace_object.c:99:2: note: Loop condition is false.  Exiting loop
           list_for_each_entry_rcu(inst, head, active_list)
           ^
   include/linux/rculist.h:393:9: note: expanded from macro 'list_for_each_entry_rcu'
                   pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
                         ^
   include/linux/rculist.h:307:15: note: expanded from macro 'list_entry_rcu'
           container_of(READ_ONCE(ptr), type, member)
                        ^
   include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
           compiletime_assert_rwonce_type(x);                              \
           ^
   note: (skipping 1 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
   include/linux/compiler_types.h:335:2: note: expanded from macro 'compiletime_assert'
           _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
           ^
   include/linux/compiler_types.h:323:2: note: expanded from macro '_compiletime_assert'
           __compiletime_assert(condition, msg, prefix, suffix)
           ^
   include/linux/compiler_types.h:307:2: note: expanded from macro '__compiletime_assert'
           do {                                                            \
           ^
   kernel/trace/trace_object.c:99:2: note: Use of memory after it is freed
           list_for_each_entry_rcu(inst, head, active_list)
           ^
   include/linux/rculist.h:393:9: note: expanded from macro 'list_for_each_entry_rcu'
                   pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/rculist.h:307:15: note: expanded from macro 'list_entry_rcu'
           container_of(READ_ONCE(ptr), type, member)
           ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/rwonce.h:50:2: note: expanded from macro 'READ_ONCE'
           __READ_ONCE(x);                                                 \
           ^
   include/asm-generic/rwonce.h:44:24: note: expanded from macro '__READ_ONCE'
   #define __READ_ONCE(x)  (*(const volatile __unqual_scalar_typeof(x) *)&(x))
                           ^
   include/linux/container_of.h:18:26: note: expanded from macro 'container_of'
           void *__mptr = (void *)(ptr);                                   \
                                   ^~~
>> kernel/trace/trace_object.c:533:2: warning: Undefined or garbage value returned to caller [clang-analyzer-core.uninitialized.UndefReturn]
           return ret;
           ^
   kernel/trace/trace_object.c:334:2: note: Assuming 'debug_locks' is 0
           lockdep_assert_held(&event_mutex);
           ^
   include/linux/lockdep.h:316:2: note: expanded from macro 'lockdep_assert_held'
           lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/lockdep.h:310:15: note: expanded from macro 'lockdep_assert'
           do { WARN_ON(debug_locks && !(cond)); } while (0)
                        ^~~~~~~~~~~
   include/asm-generic/bug.h:121:25: note: expanded from macro 'WARN_ON'
           int __ret_warn_on = !!(condition);                              \
                                  ^~~~~~~~~
   kernel/trace/trace_object.c:334:2: note: Left side of '&&' is false
           lockdep_assert_held(&event_mutex);
           ^
   include/linux/lockdep.h:316:2: note: expanded from macro 'lockdep_assert_held'
           lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
           ^
   include/linux/lockdep.h:310:27: note: expanded from macro 'lockdep_assert'
           do { WARN_ON(debug_locks && !(cond)); } while (0)
                                    ^
   kernel/trace/trace_object.c:334:2: note: Taking false branch
           lockdep_assert_held(&event_mutex);
           ^
   include/linux/lockdep.h:316:2: note: expanded from macro 'lockdep_assert_held'
           lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
           ^
   include/linux/lockdep.h:310:7: note: expanded from macro 'lockdep_assert'
           do { WARN_ON(debug_locks && !(cond)); } while (0)
                ^
   include/asm-generic/bug.h:122:2: note: expanded from macro 'WARN_ON'
           if (unlikely(__ret_warn_on))                                    \
           ^
   kernel/trace/trace_object.c:334:2: note: Loop condition is false.  Exiting loop
           lockdep_assert_held(&event_mutex);
           ^
   include/linux/lockdep.h:316:2: note: expanded from macro 'lockdep_assert_held'
           lockdep_assert(lockdep_is_held(l) != LOCK_STATE_NOT_HELD)
           ^
   include/linux/lockdep.h:310:2: note: expanded from macro 'lockdep_assert'
           do { WARN_ON(debug_locks && !(cond)); } while (0)
           ^
   kernel/trace/trace_object.c:336:2: note: Loop condition is true.  Entering loop body
           list_for_each_entry(data, &file->triggers, list) {
           ^
   include/linux/list.h:630:2: note: expanded from macro 'list_for_each_entry'
           for (pos = list_first_entry(head, typeof(*pos), member);        \
           ^
   kernel/trace/trace_object.c:337:7: note: Assuming 'data->cmd_ops->trigger_type' is equal to 'test->cmd_ops->trigger_type'
                   if (data->cmd_ops->trigger_type == test->cmd_ops->trigger_type) {
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/trace/trace_object.c:337:3: note: Taking true branch
                   if (data->cmd_ops->trigger_type == test->cmd_ops->trigger_type) {
                   ^
   kernel/trace/trace_object.c:342:4: note:  Execution continues on line 346
                           break;
                           ^
   kernel/trace/trace_object.c:346:6: note: 'unregistered' is true
           if (unregistered && data->ops->free) {
               ^~~~~~~~~~~~
   kernel/trace/trace_object.c:346:6: note: Left side of '&&' is true
   kernel/trace/trace_object.c:346:22: note: Assuming field 'free' is non-null
           if (unregistered && data->ops->free) {
                               ^~~~~~~~~~~~~~~
   kernel/trace/trace_object.c:346:2: note: Taking true branch
           if (unregistered && data->ops->free) {
           ^
   kernel/trace/trace_object.c:348:3: note: Calling 'exit_trace_object'
                   exit_trace_object();
                   ^~~~~~~~~~~~~~~~~~~
   kernel/trace/trace_object.c:516:2: note: 'ret' declared without an initial value
           int ret;
           ^~~~~~~
   kernel/trace/trace_object.c:518:19: note: Assuming the condition is true
           if (WARN_ON_ONCE(atomic_read(&trace_object_ref) <= 0))
                            ^
   include/asm-generic/bug.h:104:25: note: expanded from macro 'WARN_ON_ONCE'
           int __ret_warn_on = !!(condition);                      \
                                  ^~~~~~~~~
   kernel/trace/trace_object.c:518:6: note: Taking true branch
           if (WARN_ON_ONCE(atomic_read(&trace_object_ref) <= 0))
               ^
   include/asm-generic/bug.h:105:2: note: expanded from macro 'WARN_ON_ONCE'
           if (unlikely(__ret_warn_on))                            \
           ^
   kernel/trace/trace_object.c:518:6: note: Loop condition is false.  Exiting loop
           if (WARN_ON_ONCE(atomic_read(&trace_object_ref) <= 0))
               ^
   include/asm-generic/bug.h:106:3: note: expanded from macro 'WARN_ON_ONCE'
                   __WARN_FLAGS(BUGFLAG_ONCE |                     \
                   ^
   arch/x86/include/asm/bug.h:79:2: note: expanded from macro '__WARN_FLAGS'
           _BUG_FLAGS(ASM_UD2, BUGFLAG_WARNING|(flags));           \
           ^
   arch/x86/include/asm/bug.h:25:37: note: expanded from macro '_BUG_FLAGS'
   #define _BUG_FLAGS(ins, flags)                                          \
                                                                           ^
   kernel/trace/trace_object.c:518:6: note: Loop condition is false.  Exiting loop
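
The path above reduces to: 'ret' is declared at trace_object.c:516 and
returned at :533 without having been assigned when the WARN_ON_ONCE
guard at :518 trips. A minimal sketch of the usual fix (hypothetical --
the full function body is not quoted in this report, and the signature
is only inferred from the trace):

static int exit_trace_object(void)
{
	int ret = 0;	/* initialized so every return path is defined */

	if (WARN_ON_ONCE(atomic_read(&trace_object_ref) <= 0))
		return -EINVAL;	/* explicit error instead of garbage */

	/* remaining teardown omitted in this sketch */
	return ret;
}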

vim +99 kernel/trace/trace_object.c

88ea43d691b0980 Jeff Xie 2021-11-30   94  
88ea43d691b0980 Jeff Xie 2021-11-30   95  static inline void free_active_list_objects(struct list_head *head)
88ea43d691b0980 Jeff Xie 2021-11-30   96  {
88ea43d691b0980 Jeff Xie 2021-11-30   97  	struct object_instance *inst;
88ea43d691b0980 Jeff Xie 2021-11-30   98  
88ea43d691b0980 Jeff Xie 2021-11-30  @99  	list_for_each_entry_rcu(inst, head, active_list)
88ea43d691b0980 Jeff Xie 2021-11-30  100  		kfree(inst);
88ea43d691b0980 Jeff Xie 2021-11-30  101  }
88ea43d691b0980 Jeff Xie 2021-11-30  102  
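
For context, the warning is that kfree(inst) releases the node whose
->next pointer list_for_each_entry_rcu() then reads to advance the
loop. A minimal sketch of the conventional fix (illustrative, and
assuming no readers can still traverse the list at this teardown
point) uses the _safe iterator, which caches the next entry before
the loop body runs:

static inline void free_active_list_objects(struct list_head *head)
{
	struct object_instance *inst, *tmp;

	/* 'tmp' holds the next entry before 'inst' is freed */
	list_for_each_entry_safe(inst, tmp, head, active_list)
		kfree(inst);
}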

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


Thread overview: 26+ messages
2021-11-29 16:49 [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
2021-11-29 16:49 ` [RFC][PATCH v6 1/5] trace: Add trace any " Jeff Xie
2021-11-29 23:05   ` kernel test robot
2021-11-29 23:05     ` kernel test robot
2021-12-17  4:51   ` Masami Hiramatsu
2021-12-17 16:32     ` Jeff Xie
2021-12-19  3:07       ` Masami Hiramatsu
2022-01-08  0:21         ` Steven Rostedt
2022-01-10  2:00           ` Jeff Xie
2021-12-21  7:36       ` Masami Hiramatsu
2021-12-21 10:29         ` Jeff Xie
2021-12-23 14:12           ` Jeff Xie
2021-11-29 16:49 ` [RFC][PATCH v6 2/5] trace/objtrace: get the value of the object Jeff Xie
2021-11-29 16:49 ` [RFC][PATCH v6 3/5] trace/README: Document objtrace trigger syntax Jeff Xie
2021-12-16 15:02   ` Masami Hiramatsu
2021-12-17  1:45     ` Jeff Xie
2021-11-29 16:49 ` [RFC][PATCH v6 4/5] trace/objtrace: Add testcases for objtrace Jeff Xie
2021-11-29 16:49 ` [RFC][PATCH v6 5/5] trace/objtrace: Add documentation " Jeff Xie
2021-12-10 16:55 ` [RFC][PATCH v6 0/5] trace: Introduce objtrace trigger to trace the kernel object Jeff Xie
2021-12-10 17:31   ` Steven Rostedt
2021-12-13 10:16   ` Masami Hiramatsu
2021-12-16 14:50 ` Masami Hiramatsu
2021-12-17  1:43   ` Jeff Xie
2021-11-30 19:50 [RFC][PATCH v6 1/5] trace: Add trace any " kernel test robot
2021-12-03 19:54 kernel test robot
2021-12-06 13:21 ` Dan Carpenter
