* [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
@ 2021-04-13 12:15 Jiri Olsa
  2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 1/7] bpf: Move bpf_prog_start/end functions to generic place Jiri Olsa
                   ` (7 more replies)
  0 siblings, 8 replies; 39+ messages in thread
From: Jiri Olsa @ 2021-04-13 12:15 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: netdev, bpf, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Daniel Xu, Steven Rostedt,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

hi,
sending another attempt at speeding up the load of multiple probes
for bpftrace and possibly other tools (first post in [1]).

This patchset adds support to attach a bpf program directly to
an ftrace probe, as suggested by Steven, and it speeds up loading
for bpftrace commands like:

   # bpftrace -e 'kfunc:_raw_spin* { @[probe] = count(); }'
   # bpftrace -e 'kfunc:ksys_* { @[probe] = count(); }'

Using ftrace with a single bpf program attached to multiple
functions is much faster than the current approach, where we need
to load and attach a program for each probed function.
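
For illustration, the resulting user space flow looks roughly like
this in terms of the kernel and libbpf additions from this series
(a sketch, error handling omitted; btf and prog_fd are assumed to
be obtained via the usual libbpf calls):

   __s32 *ids = NULL;
   int i, cnt, funcs_fd, link_fd;
   DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);

   /* collect BTF ids of all functions matching the pattern */
   cnt = btf__find_by_pattern_kind(btf, "ksys_*", BTF_KIND_FUNC, &ids);

   /* gather them in one bpf_functions object, fd == -1 creates it */
   funcs_fd = bpf_functions_add(-1, ids[0]);
   for (i = 1; i < cnt; i++)
           bpf_functions_add(funcs_fd, ids[i]);

   /* single attach of the program to all gathered functions */
   opts.funcs_fd = funcs_fd;
   link_fd = bpf_link_create(prog_fd, 0, BPF_TRACE_FTRACE_ENTRY, &opts);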

Also available in
  git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  bpf/ftrace

thanks,
jirka


[1] https://lore.kernel.org/bpf/20201022082138.2322434-1-jolsa@kernel.org/
---
Jiri Olsa (7):
      bpf: Move bpf_prog_start/end functions to generic place
      bpf: Add bpf_functions object
      bpf: Add support to attach program to ftrace probe
      libbpf: Add btf__find_by_pattern_kind function
      libbpf: Add support to load and attach ftrace probe
      selftests/bpf: Add ftrace probe to fentry test
      selftests/bpf: Add ftrace probe test

 include/uapi/linux/bpf.h                             |   8 ++++
 kernel/bpf/syscall.c                                 | 381 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/trampoline.c                              |  97 ---------------------------------------
 kernel/bpf/verifier.c                                |  27 +++++++++++
 net/bpf/test_run.c                                   |   1 +
 tools/include/uapi/linux/bpf.h                       |   8 ++++
 tools/lib/bpf/bpf.c                                  |  12 +++++
 tools/lib/bpf/bpf.h                                  |   5 +-
 tools/lib/bpf/btf.c                                  |  67 +++++++++++++++++++++++++++
 tools/lib/bpf/btf.h                                  |   3 ++
 tools/lib/bpf/libbpf.c                               |  74 ++++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.map                             |   1 +
 tools/testing/selftests/bpf/prog_tests/fentry_test.c |   5 +-
 tools/testing/selftests/bpf/prog_tests/ftrace_test.c |  48 +++++++++++++++++++
 tools/testing/selftests/bpf/progs/fentry_test.c      |  16 +++++++
 tools/testing/selftests/bpf/progs/ftrace_test.c      |  17 +++++++
 16 files changed, 671 insertions(+), 99 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/ftrace_test.c
 create mode 100644 tools/testing/selftests/bpf/progs/ftrace_test.c



* [PATCHv2 RFC bpf-next 1/7] bpf: Move bpf_prog_start/end functions to generic place
  2021-04-13 12:15 [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe Jiri Olsa
@ 2021-04-13 12:15 ` Jiri Olsa
  2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 2/7] bpf: Add bpf_functions object Jiri Olsa
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Jiri Olsa @ 2021-04-13 12:15 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: kernel test robot, netdev, bpf, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Daniel Xu,
	Steven Rostedt, Jesper Brouer, Toke Høiland-Jørgensen,
	Viktor Malik

Moving the bpf_prog_start/end functions, plus the related static
functions, to a generic place, so they can also be used when
trampolines are disabled.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/bpf/syscall.c    | 97 +++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/trampoline.c | 97 -----------------------------------------
 2 files changed, 97 insertions(+), 97 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6428634da57e..90cd58520bd4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4494,3 +4494,100 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 
 	return err;
 }
+
+#define NO_START_TIME 1
+static u64 notrace bpf_prog_start_time(void)
+{
+	u64 start = NO_START_TIME;
+
+	if (static_branch_unlikely(&bpf_stats_enabled_key)) {
+		start = sched_clock();
+		if (unlikely(!start))
+			start = NO_START_TIME;
+	}
+	return start;
+}
+
+static void notrace inc_misses_counter(struct bpf_prog *prog)
+{
+	struct bpf_prog_stats *stats;
+
+	stats = this_cpu_ptr(prog->stats);
+	u64_stats_update_begin(&stats->syncp);
+	stats->misses++;
+	u64_stats_update_end(&stats->syncp);
+}
+
+/* The logic is similar to BPF_PROG_RUN, but with an explicit
+ * rcu_read_lock() and migrate_disable() which are required
+ * for the trampoline. The macro is split into
+ * call __bpf_prog_enter
+ * call prog->bpf_func
+ * call __bpf_prog_exit
+ *
+ * __bpf_prog_enter returns:
+ * 0 - skip execution of the bpf prog
+ * 1 - execute bpf prog
+ * [2..MAX_U64] - excute bpf prog and record execution time.
+ *     This is start time.
+ */
+u64 notrace __bpf_prog_enter(struct bpf_prog *prog)
+	__acquires(RCU)
+{
+	rcu_read_lock();
+	migrate_disable();
+	if (unlikely(__this_cpu_inc_return(*(prog->active)) != 1)) {
+		inc_misses_counter(prog);
+		return 0;
+	}
+	return bpf_prog_start_time();
+}
+
+static void notrace update_prog_stats(struct bpf_prog *prog,
+				      u64 start)
+{
+	struct bpf_prog_stats *stats;
+
+	if (static_branch_unlikely(&bpf_stats_enabled_key) &&
+	    /* static_key could be enabled in __bpf_prog_enter*
+	     * and disabled in __bpf_prog_exit*.
+	     * And vice versa.
+	     * Hence check that 'start' is valid.
+	     */
+	    start > NO_START_TIME) {
+		stats = this_cpu_ptr(prog->stats);
+		u64_stats_update_begin(&stats->syncp);
+		stats->cnt++;
+		stats->nsecs += sched_clock() - start;
+		u64_stats_update_end(&stats->syncp);
+	}
+}
+
+void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start)
+	__releases(RCU)
+{
+	update_prog_stats(prog, start);
+	__this_cpu_dec(*(prog->active));
+	migrate_enable();
+	rcu_read_unlock();
+}
+
+u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog)
+{
+	rcu_read_lock_trace();
+	migrate_disable();
+	might_fault();
+	if (unlikely(__this_cpu_inc_return(*(prog->active)) != 1)) {
+		inc_misses_counter(prog);
+		return 0;
+	}
+	return bpf_prog_start_time();
+}
+
+void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start)
+{
+	update_prog_stats(prog, start);
+	__this_cpu_dec(*(prog->active));
+	migrate_enable();
+	rcu_read_unlock_trace();
+}
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 1f3a4be4b175..951cad26c5a9 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -489,103 +489,6 @@ void bpf_trampoline_put(struct bpf_trampoline *tr)
 	mutex_unlock(&trampoline_mutex);
 }
 
-#define NO_START_TIME 1
-static u64 notrace bpf_prog_start_time(void)
-{
-	u64 start = NO_START_TIME;
-
-	if (static_branch_unlikely(&bpf_stats_enabled_key)) {
-		start = sched_clock();
-		if (unlikely(!start))
-			start = NO_START_TIME;
-	}
-	return start;
-}
-
-static void notrace inc_misses_counter(struct bpf_prog *prog)
-{
-	struct bpf_prog_stats *stats;
-
-	stats = this_cpu_ptr(prog->stats);
-	u64_stats_update_begin(&stats->syncp);
-	stats->misses++;
-	u64_stats_update_end(&stats->syncp);
-}
-
-/* The logic is similar to BPF_PROG_RUN, but with an explicit
- * rcu_read_lock() and migrate_disable() which are required
- * for the trampoline. The macro is split into
- * call __bpf_prog_enter
- * call prog->bpf_func
- * call __bpf_prog_exit
- *
- * __bpf_prog_enter returns:
- * 0 - skip execution of the bpf prog
- * 1 - execute bpf prog
- * [2..MAX_U64] - excute bpf prog and record execution time.
- *     This is start time.
- */
-u64 notrace __bpf_prog_enter(struct bpf_prog *prog)
-	__acquires(RCU)
-{
-	rcu_read_lock();
-	migrate_disable();
-	if (unlikely(__this_cpu_inc_return(*(prog->active)) != 1)) {
-		inc_misses_counter(prog);
-		return 0;
-	}
-	return bpf_prog_start_time();
-}
-
-static void notrace update_prog_stats(struct bpf_prog *prog,
-				      u64 start)
-{
-	struct bpf_prog_stats *stats;
-
-	if (static_branch_unlikely(&bpf_stats_enabled_key) &&
-	    /* static_key could be enabled in __bpf_prog_enter*
-	     * and disabled in __bpf_prog_exit*.
-	     * And vice versa.
-	     * Hence check that 'start' is valid.
-	     */
-	    start > NO_START_TIME) {
-		stats = this_cpu_ptr(prog->stats);
-		u64_stats_update_begin(&stats->syncp);
-		stats->cnt++;
-		stats->nsecs += sched_clock() - start;
-		u64_stats_update_end(&stats->syncp);
-	}
-}
-
-void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start)
-	__releases(RCU)
-{
-	update_prog_stats(prog, start);
-	__this_cpu_dec(*(prog->active));
-	migrate_enable();
-	rcu_read_unlock();
-}
-
-u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog)
-{
-	rcu_read_lock_trace();
-	migrate_disable();
-	might_fault();
-	if (unlikely(__this_cpu_inc_return(*(prog->active)) != 1)) {
-		inc_misses_counter(prog);
-		return 0;
-	}
-	return bpf_prog_start_time();
-}
-
-void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start)
-{
-	update_prog_stats(prog, start);
-	__this_cpu_dec(*(prog->active));
-	migrate_enable();
-	rcu_read_unlock_trace();
-}
-
 void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr)
 {
 	percpu_ref_get(&tr->pcref);
-- 
2.30.2



* [PATCHv2 RFC bpf-next 2/7] bpf: Add bpf_functions object
  2021-04-13 12:15 [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe Jiri Olsa
  2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 1/7] bpf: Move bpf_prog_start/end functions to generic place Jiri Olsa
@ 2021-04-13 12:15 ` Jiri Olsa
  2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 3/7] bpf: Add support to attach program to ftrace probe Jiri Olsa
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Jiri Olsa @ 2021-04-13 12:15 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: netdev, bpf, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Daniel Xu, Steven Rostedt,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

Adding a bpf_functions object to gather and carry functions
based on their BTF id. It will be used in the following patch
to attach multiple functions to a bpf ftrace probe program.

With the bpf_functions object we can do such an attachment in
a single step, which speeds up tools that need it.

A new struct is added to union bpf_attr, used by the new
command BPF_FUNCTIONS_ADD:

  struct { /* BPF_FUNCTIONS_ADD */
          __u32           fd;
          __u32           btf_id;
  } functions_add;

When fd == -1, a new bpf_functions object is created with one
function (specified by btf_id) and its fd is returned.
For fd >= 0, the function (specified by btf_id) is added
to the existing object behind the given fd.
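
For illustration, raw usage of the new command looks roughly like
this (a sketch; the btf_id values are assumed to be resolved from
vmlinux BTF beforehand):

   union bpf_attr attr;
   int fd, err;

   memset(&attr, 0, sizeof(attr));
   attr.functions_add.fd = -1;           /* create a new object */
   attr.functions_add.btf_id = btf_id1;
   fd = syscall(__NR_bpf, BPF_FUNCTIONS_ADD, &attr, sizeof(attr));

   attr.functions_add.fd = fd;           /* add to the existing object */
   attr.functions_add.btf_id = btf_id2;
   err = syscall(__NR_bpf, BPF_FUNCTIONS_ADD, &attr, sizeof(attr));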

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 include/uapi/linux/bpf.h       |   5 ++
 kernel/bpf/syscall.c           | 137 +++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |   5 ++
 3 files changed, 147 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e1ee1be7e49b..5d616735fe1b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -862,6 +862,7 @@ enum bpf_cmd {
 	BPF_ITER_CREATE,
 	BPF_LINK_DETACH,
 	BPF_PROG_BIND_MAP,
+	BPF_FUNCTIONS_ADD,
 };
 
 enum bpf_map_type {
@@ -1458,6 +1459,10 @@ union bpf_attr {
 		__u32		flags;		/* extra flags */
 	} prog_bind_map;
 
+	struct { /* BPF_FUNCTIONS_ADD */
+		__u32		fd;
+		__u32		btf_id;
+	} functions_add;
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 90cd58520bd4..b240a500cae5 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2265,6 +2265,140 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 	return err;
 }
 
+#define BPF_FUNCTIONS_ALLOC 100
+#define BPF_FUNCTIONS_MAX   (BPF_FUNCTIONS_ALLOC*10)
+
+struct bpf_functions {
+	struct mutex mutex;
+	unsigned long *addrs;
+	int cnt;
+	int alloc;
+};
+
+static int bpf_functions_release(struct inode *inode, struct file *file)
+{
+	struct bpf_functions *funcs = file->private_data;
+
+	kfree(funcs->addrs);
+	kfree(funcs);
+	return 0;
+}
+
+static const struct file_operations bpf_functions_fops = {
+	.release = bpf_functions_release,
+};
+
+static struct bpf_functions *bpf_functions_get_from_fd(u32 ufd, struct fd *p)
+{
+	struct fd f = fdget(ufd);
+
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+	if (f.file->f_op != &bpf_functions_fops) {
+		fdput(f);
+		return ERR_PTR(-EINVAL);
+	}
+	*p = f;
+	return f.file->private_data;
+}
+
+static unsigned long bpf_get_kernel_func_addr(u32 btf_id, struct btf *btf)
+{
+	const struct btf_type *t;
+	const char *tname;
+
+	t = btf_type_by_id(btf, btf_id);
+	if (!t)
+		return 0;
+	tname = btf_name_by_offset(btf, t->name_off);
+	if (!tname)
+		return 0;
+	if (!btf_type_is_func(t))
+		return 0;
+	t = btf_type_by_id(btf, t->type);
+	if (!btf_type_is_func_proto(t))
+		return 0;
+
+	return kallsyms_lookup_name(tname);
+}
+
+#define BPF_FUNCTIONS_ADD_LAST_FIELD functions_add.btf_id
+
+static int bpf_functions_add(union bpf_attr *attr)
+{
+	struct bpf_functions *funcs;
+	unsigned long addr, *p;
+	struct fd orig = { };
+	int ret = 0, fd;
+	struct btf *btf;
+
+	if (CHECK_ATTR(BPF_FUNCTIONS_ADD))
+		return -EINVAL;
+
+	if (!attr->functions_add.btf_id)
+		return -EINVAL;
+
+	/* fd >=  0  use existing bpf_functions object
+	 * fd == -1  create new bpf_functions object
+	 */
+	fd = attr->functions_add.fd;
+	if (fd < -1)
+		return -EINVAL;
+
+	btf = bpf_get_btf_vmlinux();
+	if (!btf)
+		return -EINVAL;
+
+	addr = bpf_get_kernel_func_addr(attr->functions_add.btf_id, btf);
+	if (!addr)
+		return -EINVAL;
+
+	if (!ftrace_location(addr))
+		return -EINVAL;
+
+	if (fd >= 0) {
+		funcs = bpf_functions_get_from_fd(fd, &orig);
+		if (IS_ERR(funcs))
+			return PTR_ERR(funcs);
+	} else {
+		funcs = kzalloc(sizeof(*funcs), GFP_USER);
+		if (!funcs)
+			return -ENOMEM;
+
+		mutex_init(&funcs->mutex);
+		fd = anon_inode_getfd("bpf-functions", &bpf_functions_fops,
+				      funcs, O_CLOEXEC);
+		if (fd < 0) {
+			kfree(funcs);
+			return fd;
+		}
+		ret = fd;
+	}
+
+	mutex_lock(&funcs->mutex);
+
+	if (funcs->cnt == BPF_FUNCTIONS_MAX) {
+		ret = -EINVAL;
+		goto out_put;
+	}
+	if (funcs->cnt == funcs->alloc) {
+		funcs->alloc += BPF_FUNCTIONS_ALLOC;
+		p = krealloc(funcs->addrs, funcs->alloc * sizeof(p[0]), GFP_USER);
+		if (!p) {
+			ret = -ENOMEM;
+			goto out_put;
+		}
+		funcs->addrs = p;
+	}
+
+	funcs->addrs[funcs->cnt++] = addr;
+
+out_put:
+	mutex_unlock(&funcs->mutex);
+	fdput(orig);
+	return ret;
+}
+
 #define BPF_OBJ_LAST_FIELD file_flags
 
 static int bpf_obj_pin(const union bpf_attr *attr)
@@ -4487,6 +4621,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_PROG_BIND_MAP:
 		err = bpf_prog_bind_map(&attr);
 		break;
+	case BPF_FUNCTIONS_ADD:
+		err = bpf_functions_add(&attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index e1ee1be7e49b..5d616735fe1b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -862,6 +862,7 @@ enum bpf_cmd {
 	BPF_ITER_CREATE,
 	BPF_LINK_DETACH,
 	BPF_PROG_BIND_MAP,
+	BPF_FUNCTIONS_ADD,
 };
 
 enum bpf_map_type {
@@ -1458,6 +1459,10 @@ union bpf_attr {
 		__u32		flags;		/* extra flags */
 	} prog_bind_map;
 
+	struct { /* BPF_FUNCTIONS_ADD */
+		__u32		fd;
+		__u32		btf_id;
+	} functions_add;
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
-- 
2.30.2



* [PATCHv2 RFC bpf-next 3/7] bpf: Add support to attach program to ftrace probe
  2021-04-13 12:15 [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe Jiri Olsa
  2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 1/7] bpf: Move bpf_prog_start/end functions to generic place Jiri Olsa
  2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 2/7] bpf: Add bpf_functions object Jiri Olsa
@ 2021-04-13 12:15 ` Jiri Olsa
  2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 4/7] libbpf: Add btf__find_by_pattern_kind function Jiri Olsa
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Jiri Olsa @ 2021-04-13 12:15 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: netdev, bpf, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Daniel Xu, Steven Rostedt,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

Adding support to attach a bpf program to ftrace probes.

The program needs to be loaded with BPF_TRACE_FTRACE_ENTRY
as its expected_attach_type. With such a program we can
create a link through the new 'funcs_fd' field, which holds
the fd of the bpf_functions object.

The attach creates an ftrace_ops object and sets its filter
to the functions gathered in the bpf_functions object.

The ftrace bpf program gets the following arguments on entry:
  ip, parent_ip

It's possible to add registers in the future, but I have
no use for them at the moment. Currently bpftrace uses
'ip' to identify the probe.

Adding 'entry' support for now; 'exit' support can be added
later, once it's supported in ftrace.

Userspace is forced to use the bpf_ftrace_probe BTF ID as the
probed function, which the verifier uses to check the probe's
data accesses. The verifier currently checks that bpf_ftrace_probe
itself is used as the probe, but we could change it to accept
any function with the same prototype if needed.
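
For illustration, the attach at the syscall level then looks
roughly like this (a sketch; funcs_fd is the bpf_functions object
from the previous patch, prog_fd the loaded program):

   union bpf_attr attr;
   int link_fd;

   memset(&attr, 0, sizeof(attr));
   attr.link_create.prog_fd = prog_fd;
   attr.link_create.attach_type = BPF_TRACE_FTRACE_ENTRY;
   attr.link_create.funcs_fd = funcs_fd;

   link_fd = syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));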

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 include/uapi/linux/bpf.h       |   3 +
 kernel/bpf/syscall.c           | 147 +++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c          |  30 +++++++
 net/bpf/test_run.c             |   1 +
 tools/include/uapi/linux/bpf.h |   3 +
 5 files changed, 184 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 5d616735fe1b..dbedbcdc8122 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -980,6 +980,7 @@ enum bpf_attach_type {
 	BPF_SK_LOOKUP,
 	BPF_XDP,
 	BPF_SK_SKB_VERDICT,
+	BPF_TRACE_FTRACE_ENTRY,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -993,6 +994,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_ITER = 4,
 	BPF_LINK_TYPE_NETNS = 5,
 	BPF_LINK_TYPE_XDP = 6,
+	BPF_LINK_TYPE_FTRACE = 7,
 
 	MAX_BPF_LINK_TYPE,
 };
@@ -1427,6 +1429,7 @@ union bpf_attr {
 				__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
 				__u32		iter_info_len;	/* iter_info length */
 			};
+			__u32		funcs_fd;
 		};
 	} link_create;
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b240a500cae5..c83515d41020 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1965,6 +1965,11 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
 	    prog_type != BPF_PROG_TYPE_EXT)
 		return -EINVAL;
 
+	if (prog_type == BPF_PROG_TYPE_TRACING &&
+	    expected_attach_type == BPF_TRACE_FTRACE_ENTRY &&
+	    !IS_ENABLED(CONFIG_FUNCTION_TRACER))
+		return -EINVAL;
+
 	switch (prog_type) {
 	case BPF_PROG_TYPE_CGROUP_SOCK:
 		switch (expected_attach_type) {
@@ -2861,6 +2866,144 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 	return err;
 }
 
+#ifdef CONFIG_FUNCTION_TRACER
+struct bpf_tracing_ftrace_link {
+	struct bpf_link link;
+	enum bpf_attach_type attach_type;
+	struct ftrace_ops ops;
+};
+
+static void bpf_tracing_ftrace_link_release(struct bpf_link *link)
+{
+	struct bpf_tracing_ftrace_link *tr_link =
+		container_of(link, struct bpf_tracing_ftrace_link, link);
+
+	WARN_ON(unregister_ftrace_function(&tr_link->ops));
+}
+
+static void bpf_tracing_ftrace_link_dealloc(struct bpf_link *link)
+{
+	struct bpf_tracing_ftrace_link *tr_link =
+		container_of(link, struct bpf_tracing_ftrace_link, link);
+
+	kfree(tr_link);
+}
+
+static void bpf_tracing_ftrace_link_show_fdinfo(const struct bpf_link *link,
+					       struct seq_file *seq)
+{
+	struct bpf_tracing_ftrace_link *tr_link =
+		container_of(link, struct bpf_tracing_ftrace_link, link);
+
+	seq_printf(seq,
+		   "attach_type:\t%d\n",
+		   tr_link->attach_type);
+}
+
+static int bpf_tracing_ftrace_link_fill_link_info(const struct bpf_link *link,
+						 struct bpf_link_info *info)
+{
+	struct bpf_tracing_ftrace_link *tr_link =
+		container_of(link, struct bpf_tracing_ftrace_link, link);
+
+	info->tracing.attach_type = tr_link->attach_type;
+	return 0;
+}
+
+static const struct bpf_link_ops bpf_tracing_ftrace_lops = {
+	.release = bpf_tracing_ftrace_link_release,
+	.dealloc = bpf_tracing_ftrace_link_dealloc,
+	.show_fdinfo = bpf_tracing_ftrace_link_show_fdinfo,
+	.fill_link_info = bpf_tracing_ftrace_link_fill_link_info,
+};
+
+static void
+bpf_ftrace_function_call(unsigned long ip, unsigned long parent_ip,
+			 struct ftrace_ops *ops,  struct ftrace_regs *fregs)
+{
+	struct bpf_tracing_ftrace_link *tr_link;
+	struct bpf_prog *prog;
+	u64 start;
+
+	tr_link = container_of(ops, struct bpf_tracing_ftrace_link, ops);
+	prog = tr_link->link.prog;
+
+	if (prog->aux->sleepable)
+		start = __bpf_prog_enter_sleepable(prog);
+	else
+		start = __bpf_prog_enter(prog);
+
+	if (start)
+		bpf_trace_run2(tr_link->link.prog, ip, parent_ip);
+
+	if (prog->aux->sleepable)
+		__bpf_prog_exit_sleepable(prog, start);
+	else
+		__bpf_prog_exit(prog, start);
+}
+
+static int bpf_tracing_ftrace_attach(struct bpf_prog *prog, int funcs_fd)
+{
+	struct bpf_tracing_ftrace_link *link;
+	struct bpf_link_primer link_primer;
+	struct bpf_functions *funcs;
+	struct ftrace_ops *ops;
+	int err = -ENOMEM;
+	struct fd orig;
+	int i;
+
+	if (prog->type != BPF_PROG_TYPE_TRACING)
+		return -EINVAL;
+
+	if (prog->expected_attach_type != BPF_TRACE_FTRACE_ENTRY)
+		return -EINVAL;
+
+	funcs = bpf_functions_get_from_fd(funcs_fd, &orig);
+	if (IS_ERR(funcs))
+		return PTR_ERR(funcs);
+
+	link = kzalloc(sizeof(*link), GFP_USER);
+	if (!link)
+		goto out_free;
+
+	ops = &link->ops;
+	ops->func = bpf_ftrace_function_call;
+	ops->flags = FTRACE_OPS_FL_DYNAMIC;
+
+	bpf_link_init(&link->link, BPF_LINK_TYPE_FTRACE,
+		      &bpf_tracing_ftrace_lops, prog);
+	link->attach_type = prog->expected_attach_type;
+
+	err = bpf_link_prime(&link->link, &link_primer);
+	if (err)
+		goto out_free;
+
+	for (i = 0; i < funcs->cnt; i++) {
+		err = ftrace_set_filter_ip(ops, funcs->addrs[i], 0, 0);
+		if (err)
+			goto out_free;
+	}
+
+	err = register_ftrace_function(ops);
+	if (err)
+		goto out_free;
+
+	fdput(orig);
+	return bpf_link_settle(&link_primer);
+
+out_free:
+	kfree(link);
+	fdput(orig);
+	return err;
+}
+#else
+static int bpf_tracing_ftrace_attach(struct bpf_prog *prog __maybe_unused,
+				     int funcs_fd __maybe_unused)
+{
+	return -ENODEV;
+}
+#endif /* CONFIG_FUNCTION_TRACER */
+
 struct bpf_raw_tp_link {
 	struct bpf_link link;
 	struct bpf_raw_event_map *btp;
@@ -3093,6 +3236,7 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
 	case BPF_CGROUP_GETSOCKOPT:
 	case BPF_CGROUP_SETSOCKOPT:
 		return BPF_PROG_TYPE_CGROUP_SOCKOPT;
+	case BPF_TRACE_FTRACE_ENTRY:
 	case BPF_TRACE_ITER:
 		return BPF_PROG_TYPE_TRACING;
 	case BPF_SK_LOOKUP:
@@ -4149,6 +4293,9 @@ static int tracing_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *
 
 	if (prog->expected_attach_type == BPF_TRACE_ITER)
 		return bpf_iter_link_attach(attr, prog);
+	else if (prog->expected_attach_type == BPF_TRACE_FTRACE_ENTRY)
+		return bpf_tracing_ftrace_attach(prog,
+						 attr->link_create.funcs_fd);
 	else if (prog->type == BPF_PROG_TYPE_EXT)
 		return bpf_tracing_prog_attach(prog,
 					       attr->link_create.target_fd,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 852541a435ef..ea001aec66f6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8946,6 +8946,7 @@ static int check_return_code(struct bpf_verifier_env *env)
 			break;
 		case BPF_TRACE_RAW_TP:
 		case BPF_MODIFY_RETURN:
+		case BPF_TRACE_FTRACE_ENTRY:
 			return 0;
 		case BPF_TRACE_ITER:
 			break;
@@ -12794,6 +12795,14 @@ static int check_non_sleepable_error_inject(u32 btf_id)
 	return btf_id_set_contains(&btf_non_sleepable_error_inject, btf_id);
 }
 
+__maybe_unused
+void bpf_ftrace_probe(unsigned long ip __maybe_unused,
+		      unsigned long parent_ip __maybe_unused)
+{
+}
+
+BTF_ID_LIST_SINGLE(btf_ftrace_probe_id, func, bpf_ftrace_probe);
+
 int bpf_check_attach_target(struct bpf_verifier_log *log,
 			    const struct bpf_prog *prog,
 			    const struct bpf_prog *tgt_prog,
@@ -13021,6 +13030,25 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 		}
 
 		break;
+	case BPF_TRACE_FTRACE_ENTRY:
+		if (tgt_prog) {
+			bpf_log(log,
+				"Only FENTRY/FEXIT progs are attachable to another BPF prog\n");
+			return -EINVAL;
+		}
+		if (btf_id != btf_ftrace_probe_id[0]) {
+			bpf_log(log,
+				"Only btf id '%d' allowed for ftrace probe\n",
+				btf_ftrace_probe_id[0]);
+			return -EINVAL;
+		}
+		t = btf_type_by_id(btf, t->type);
+		if (!btf_type_is_func_proto(t))
+			return -EINVAL;
+		ret = btf_distill_func_proto(log, btf, t, tname, &tgt_info->fmodel);
+		if (ret < 0)
+			return ret;
+		break;
 	}
 	tgt_info->tgt_addr = addr;
 	tgt_info->tgt_name = tname;
@@ -13081,6 +13109,8 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 		if (!bpf_iter_prog_supported(prog))
 			return -EINVAL;
 		return 0;
+	} else if (prog->expected_attach_type == BPF_TRACE_FTRACE_ENTRY) {
+		return 0;
 	}
 
 	if (prog->type == BPF_PROG_TYPE_LSM) {
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index a5d72c48fb66..0a891c27bad0 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -285,6 +285,7 @@ int bpf_prog_test_run_tracing(struct bpf_prog *prog,
 	switch (prog->expected_attach_type) {
 	case BPF_TRACE_FENTRY:
 	case BPF_TRACE_FEXIT:
+	case BPF_TRACE_FTRACE_ENTRY:
 		if (bpf_fentry_test1(1) != 2 ||
 		    bpf_fentry_test2(2, 3) != 5 ||
 		    bpf_fentry_test3(4, 5, 6) != 15 ||
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 5d616735fe1b..dbedbcdc8122 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -980,6 +980,7 @@ enum bpf_attach_type {
 	BPF_SK_LOOKUP,
 	BPF_XDP,
 	BPF_SK_SKB_VERDICT,
+	BPF_TRACE_FTRACE_ENTRY,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -993,6 +994,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_ITER = 4,
 	BPF_LINK_TYPE_NETNS = 5,
 	BPF_LINK_TYPE_XDP = 6,
+	BPF_LINK_TYPE_FTRACE = 7,
 
 	MAX_BPF_LINK_TYPE,
 };
@@ -1427,6 +1429,7 @@ union bpf_attr {
 				__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
 				__u32		iter_info_len;	/* iter_info length */
 			};
+			__u32		funcs_fd;
 		};
 	} link_create;
 
-- 
2.30.2



* [PATCHv2 RFC bpf-next 4/7] libbpf: Add btf__find_by_pattern_kind function
  2021-04-13 12:15 [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe Jiri Olsa
                   ` (2 preceding siblings ...)
  2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 3/7] bpf: Add support to attach program to ftrace probe Jiri Olsa
@ 2021-04-13 12:15 ` Jiri Olsa
  2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 5/7] libbpf: Add support to load and attach ftrace probe Jiri Olsa
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Jiri Olsa @ 2021-04-13 12:15 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: netdev, bpf, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Daniel Xu, Steven Rostedt,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

Adding a btf__find_by_pattern_kind function that returns an
array of BTF ids for a given function name pattern.

Using libc's regex.h support for that.
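
For illustration, the intended usage looks roughly like this
(a sketch; the ids array is allocated by the function and must
be freed by the caller):

   __s32 *ids = NULL;
   int i, cnt;

   cnt = btf__find_by_pattern_kind(btf, "ksys_*", BTF_KIND_FUNC, &ids);
   if (cnt < 0)
           return cnt;
   for (i = 0; i < cnt; i++)
           use_btf_id(ids[i]);   /* use_btf_id is a placeholder */
   free(ids);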

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/lib/bpf/btf.c | 68 +++++++++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/btf.h |  3 ++
 2 files changed, 71 insertions(+)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index d30e67e7e1e5..e193713b9d56 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
 /* Copyright (c) 2018 Facebook */
 
+#define _GNU_SOURCE
 #include <byteswap.h>
 #include <endian.h>
 #include <stdio.h>
@@ -16,6 +17,7 @@
 #include <linux/err.h>
 #include <linux/btf.h>
 #include <gelf.h>
+#include <regex.h>
 #include "btf.h"
 #include "bpf.h"
 #include "libbpf.h"
@@ -712,6 +714,72 @@ __s32 btf__find_by_name_kind(const struct btf *btf, const char *type_name,
 	return -ENOENT;
 }
 
+static bool is_wildcard(char c)
+{
+	static const char *wildchars = "*?[|";
+
+	return strchr(wildchars, c);
+}
+
+int btf__find_by_pattern_kind(const struct btf *btf,
+			      const char *type_pattern, __u32 kind,
+			      __s32 **__ids)
+{
+	__u32 i, nr_types = btf__get_nr_types(btf);
+	__s32 *ids = NULL;
+	int cnt = 0, alloc = 0, ret;
+	regex_t regex;
+	char *pattern;
+
+	if (kind == BTF_KIND_UNKN || !strcmp(type_pattern, "void"))
+		return 0;
+
+	/* When the pattern does not start with wildcard, treat it as
+	 * if we'd want to match it from the beginning of the string.
+	 */
+	asprintf(&pattern, "%s%s",
+		 is_wildcard(type_pattern[0]) ? "^" : "",
+		 type_pattern);
+
+	ret = regcomp(&regex, pattern, REG_EXTENDED);
+	if (ret) {
+		pr_warn("failed to compile regex\n");
+		free(pattern);
+		return -EINVAL;
+	}
+
+	free(pattern);
+
+	for (i = 1; i <= nr_types; i++) {
+		const struct btf_type *t = btf__type_by_id(btf, i);
+		const char *name;
+		__s32 *p;
+
+		if (btf_kind(t) != kind)
+			continue;
+		name = btf__name_by_offset(btf, t->name_off);
+		if (name && regexec(&regex, name, 0, NULL, 0))
+			continue;
+		if (cnt == alloc) {
+			alloc = max(100, alloc * 3 / 2);
+			p = realloc(ids, alloc * sizeof(__u32));
+			if (!p) {
+				free(ids);
+				regfree(&regex);
+				return -ENOMEM;
+			}
+			ids = p;
+		}
+
+		ids[cnt] = i;
+		cnt++;
+	}
+
+	regfree(&regex);
+	*__ids = ids;
+	return cnt ?: -ENOENT;
+}
+
 static bool btf_is_modifiable(const struct btf *btf)
 {
 	return (void *)btf->hdr != btf->raw_data;
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index b54f1c3ebd57..036857aded94 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -371,6 +371,9 @@ btf_var_secinfos(const struct btf_type *t)
 	return (struct btf_var_secinfo *)(t + 1);
 }
 
+int btf__find_by_pattern_kind(const struct btf *btf,
+			      const char *type_pattern, __u32 kind,
+			      __s32 **__ids);
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
-- 
2.30.2



* [PATCHv2 RFC bpf-next 5/7] libbpf: Add support to load and attach ftrace probe
  2021-04-13 12:15 [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe Jiri Olsa
                   ` (3 preceding siblings ...)
  2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 4/7] libbpf: Add btf__find_by_pattern_kind function Jiri Olsa
@ 2021-04-13 12:15 ` Jiri Olsa
  2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 6/7] selftests/bpf: Add ftrace probe to fentry test Jiri Olsa
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Jiri Olsa @ 2021-04-13 12:15 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: netdev, bpf, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Daniel Xu, Steven Rostedt,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

Adding support to load and attach an ftrace probe.

Adding a new section type 'fentry.ftrace' that identifies an
ftrace probe and assigns BPF_TRACE_FTRACE_ENTRY as the prog's
expected_attach_type.

The attach function creates a bpf_functions object and
makes an ftrace link with the program.
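
For illustration, a program using the new section type (the form
used by the selftests later in this series; the part after the '/'
is the function name pattern):

   SEC("fentry.ftrace/ksys_*")
   int BPF_PROG(probe, __u64 ip, __u64 parent_ip)
   {
           return 0;
   }

The attach path then resolves the pattern against vmlinux BTF,
builds the bpf_functions object, and creates the ftrace link
automatically.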

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/lib/bpf/bpf.c      | 12 +++++++
 tools/lib/bpf/bpf.h      |  5 ++-
 tools/lib/bpf/libbpf.c   | 74 ++++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.map |  1 +
 4 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index bba48ff4c5c0..b3195ac3e32e 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -643,6 +643,7 @@ int bpf_link_create(int prog_fd, int target_fd,
 	attr.link_create.target_fd = target_fd;
 	attr.link_create.attach_type = attach_type;
 	attr.link_create.flags = OPTS_GET(opts, flags, 0);
+	attr.link_create.funcs_fd = OPTS_GET(opts, funcs_fd, 0);
 
 	if (iter_info_len) {
 		attr.link_create.iter_info =
@@ -971,3 +972,14 @@ int bpf_prog_bind_map(int prog_fd, int map_fd,
 
 	return sys_bpf(BPF_PROG_BIND_MAP, &attr, sizeof(attr));
 }
+
+int bpf_functions_add(int fd, int btf_id)
+{
+	union bpf_attr attr;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.functions_add.fd = fd;
+	attr.functions_add.btf_id = btf_id;
+
+	return sys_bpf(BPF_FUNCTIONS_ADD, &attr, sizeof(attr));
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 875dde20d56e..f677fe06262b 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -175,8 +175,9 @@ struct bpf_link_create_opts {
 	union bpf_iter_link_info *iter_info;
 	__u32 iter_info_len;
 	__u32 target_btf_id;
+	__u32 funcs_fd;
 };
-#define bpf_link_create_opts__last_field target_btf_id
+#define bpf_link_create_opts__last_field funcs_fd
 
 LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,
 			       enum bpf_attach_type attach_type,
@@ -278,6 +279,8 @@ struct bpf_test_run_opts {
 LIBBPF_API int bpf_prog_test_run_opts(int prog_fd,
 				      struct bpf_test_run_opts *opts);
 
+LIBBPF_API int bpf_functions_add(int fd, int btf_id);
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index ed5586cce227..b3cb43990524 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8838,6 +8838,10 @@ static const struct bpf_sec_def section_defs[] = {
 		.expected_attach_type = BPF_TRACE_ITER,
 		.is_attach_btf = true,
 		.attach_fn = attach_iter),
+	SEC_DEF("fentry.ftrace/", TRACING,
+		.expected_attach_type = BPF_TRACE_FTRACE_ENTRY,
+		.is_attach_btf = true,
+		.attach_fn = attach_trace),
 	BPF_EAPROG_SEC("xdp_devmap/",		BPF_PROG_TYPE_XDP,
 						BPF_XDP_DEVMAP),
 	BPF_EAPROG_SEC("xdp_cpumap/",		BPF_PROG_TYPE_XDP,
@@ -9125,6 +9129,7 @@ static int bpf_object__collect_st_ops_relos(struct bpf_object *obj,
 #define BTF_TRACE_PREFIX "btf_trace_"
 #define BTF_LSM_PREFIX "bpf_lsm_"
 #define BTF_ITER_PREFIX "bpf_iter_"
+#define BTF_FTRACE_PROBE "bpf_ftrace_probe"
 #define BTF_MAX_NAME_SIZE 128
 
 static int find_btf_by_prefix_kind(const struct btf *btf, const char *prefix,
@@ -9158,6 +9163,9 @@ static inline int find_attach_btf_id(struct btf *btf, const char *name,
 	else if (attach_type == BPF_TRACE_ITER)
 		err = find_btf_by_prefix_kind(btf, BTF_ITER_PREFIX, name,
 					      BTF_KIND_FUNC);
+	else if (attach_type == BPF_TRACE_FTRACE_ENTRY)
+		err = btf__find_by_name_kind(btf, BTF_FTRACE_PROBE,
+					     BTF_KIND_FUNC);
 	else
 		err = btf__find_by_name_kind(btf, name, BTF_KIND_FUNC);
 
@@ -10191,8 +10199,74 @@ static struct bpf_link *bpf_program__attach_btf_id(struct bpf_program *prog)
 	return (struct bpf_link *)link;
 }
 
+static struct bpf_link *bpf_program__attach_ftrace(struct bpf_program *prog)
+{
+	char *pattern = prog->sec_name + prog->sec_def->len;
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
+	int prog_fd, link_fd, cnt, err, i;
+	enum bpf_attach_type attach_type;
+	struct bpf_link *link = NULL;
+	__s32 *ids = NULL;
+	int funcs_fd = -1;
+
+	prog_fd = bpf_program__fd(prog);
+	if (prog_fd < 0) {
+		pr_warn("prog '%s': can't attach before loaded\n", prog->name);
+		return ERR_PTR(-EINVAL);
+	}
+
+	err = bpf_object__load_vmlinux_btf(prog->obj, true);
+	if (err)
+		return ERR_PTR(err);
+
+	cnt = btf__find_by_pattern_kind(prog->obj->btf_vmlinux, pattern,
+					BTF_KIND_FUNC, &ids);
+	if (cnt <= 0)
+		return ERR_PTR(-EINVAL);
+
+	for (i = 0; i < cnt; i++) {
+		err = bpf_functions_add(funcs_fd, ids[i]);
+		if (err < 0) {
+			pr_warn("prog '%s': can't attach function BTF ID %d\n",
+				prog->name, ids[i]);
+			goto out_err;
+		}
+		if (funcs_fd == -1)
+			funcs_fd = err;
+	}
+
+	link = calloc(1, sizeof(*link));
+	if (!link) {
+		err = -ENOMEM;
+		goto out_err;
+	}
+	link->detach = &bpf_link__detach_fd;
+
+	opts.funcs_fd = funcs_fd;
+
+	attach_type = bpf_program__get_expected_attach_type(prog);
+	link_fd = bpf_link_create(prog_fd, 0, attach_type, &opts);
+	if (link_fd < 0) {
+		err = -errno;
+		goto out_err;
+	}
+	link->fd = link_fd;
+	free(ids);
+	return link;
+
+out_err:
+	if (funcs_fd != -1)
+		close(funcs_fd);
+	free(link);
+	free(ids);
+	return ERR_PTR(err);
+}
+
 struct bpf_link *bpf_program__attach_trace(struct bpf_program *prog)
 {
+	if (prog->expected_attach_type == BPF_TRACE_FTRACE_ENTRY)
+		return bpf_program__attach_ftrace(prog);
+
 	return bpf_program__attach_btf_id(prog);
 }
 
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index b9b29baf1df8..69cbe54125e3 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -355,6 +355,7 @@ LIBBPF_0.4.0 {
 	global:
 		btf__add_float;
 		btf__add_type;
+		bpf_functions_add;
 		bpf_linker__add_file;
 		bpf_linker__finalize;
 		bpf_linker__free;
-- 
2.30.2



* [PATCHv2 RFC bpf-next 6/7] selftests/bpf: Add ftrace probe to fentry test
  2021-04-13 12:15 [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe Jiri Olsa
                   ` (4 preceding siblings ...)
  2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 5/7] libbpf: Add support to load and attach ftrace probe Jiri Olsa
@ 2021-04-13 12:15 ` Jiri Olsa
  2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 7/7] selftests/bpf: Add ftrace probe test Jiri Olsa
  2021-04-14  1:04 ` [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe Andrii Nakryiko
  7 siblings, 0 replies; 39+ messages in thread
From: Jiri Olsa @ 2021-04-13 12:15 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: netdev, bpf, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Daniel Xu, Steven Rostedt,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

Adding 2 more tests to the fentry probe test,
to show and exercise the ftrace probe.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 .../selftests/bpf/prog_tests/fentry_test.c       |  5 ++++-
 tools/testing/selftests/bpf/progs/fentry_test.c  | 16 ++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/fentry_test.c b/tools/testing/selftests/bpf/prog_tests/fentry_test.c
index 04ebbf1cb390..70f414cb3bfd 100644
--- a/tools/testing/selftests/bpf/prog_tests/fentry_test.c
+++ b/tools/testing/selftests/bpf/prog_tests/fentry_test.c
@@ -26,12 +26,15 @@ void test_fentry_test(void)
 	      err, errno, retval, duration);
 
 	result = (__u64 *)fentry_skel->bss;
-	for (i = 0; i < 6; i++) {
+	for (i = 0; i < 8; i++) {
 		if (CHECK(result[i] != 1, "result",
 			  "fentry_test%d failed err %lld\n", i + 1, result[i]))
 			goto cleanup;
 	}
 
+	ASSERT_EQ(result[8], 8, "result");
+	ASSERT_EQ(result[9], 2, "result");
+
 cleanup:
 	fentry_test__destroy(fentry_skel);
 }
diff --git a/tools/testing/selftests/bpf/progs/fentry_test.c b/tools/testing/selftests/bpf/progs/fentry_test.c
index 52a550d281d9..b32b589923a4 100644
--- a/tools/testing/selftests/bpf/progs/fentry_test.c
+++ b/tools/testing/selftests/bpf/progs/fentry_test.c
@@ -77,3 +77,19 @@ int BPF_PROG(test8, struct bpf_fentry_test_t *arg)
 		test8_result = 1;
 	return 0;
 }
+
+__u64 test9_result = 0;
+SEC("fentry.ftrace/bpf_fentry_test*")
+int BPF_PROG(test9)
+{
+	test9_result++;
+	return 0;
+}
+
+__u64 test10_result = 0;
+SEC("fentry.ftrace/bpf_fentry_test1|bpf_fentry_test2")
+int BPF_PROG(test10)
+{
+	test10_result++;
+	return 0;
+}
-- 
2.30.2



* [PATCHv2 RFC bpf-next 7/7] selftests/bpf: Add ftrace probe test
  2021-04-13 12:15 [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe Jiri Olsa
                   ` (5 preceding siblings ...)
  2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 6/7] selftests/bpf: Add ftrace probe to fentry test Jiri Olsa
@ 2021-04-13 12:15 ` Jiri Olsa
  2021-04-14  1:04 ` [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe Andrii Nakryiko
  7 siblings, 0 replies; 39+ messages in thread
From: Jiri Olsa @ 2021-04-13 12:15 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: netdev, bpf, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Daniel Xu, Steven Rostedt,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

Adding a simple ftrace probe test that configures an ftrace
probe and verifies that the 'ip' argument matches the probed
functions' addresses.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 .../selftests/bpf/prog_tests/ftrace_test.c    | 48 +++++++++++++++++++
 .../testing/selftests/bpf/progs/ftrace_test.c | 17 +++++++
 2 files changed, 65 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/ftrace_test.c
 create mode 100644 tools/testing/selftests/bpf/progs/ftrace_test.c

diff --git a/tools/testing/selftests/bpf/prog_tests/ftrace_test.c b/tools/testing/selftests/bpf/prog_tests/ftrace_test.c
new file mode 100644
index 000000000000..34d6dbe88251
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/ftrace_test.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include "ftrace_test.skel.h"
+
+void test_ftrace_test(void)
+{
+	struct ftrace_test *ftrace_skel = NULL;
+	__u32 duration = 0, retval;
+	int err, prog_fd;
+	__u64 *ips;
+	int idx, i;
+
+	ftrace_skel = ftrace_test__open_and_load();
+	if (!ASSERT_OK_PTR(ftrace_skel, "ftrace_skel_load"))
+		goto cleanup;
+
+	err = ftrace_test__attach(ftrace_skel);
+	if (!ASSERT_OK(err, "ftrace_attach"))
+		goto cleanup;
+
+	prog_fd = bpf_program__fd(ftrace_skel->progs.test);
+	err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
+				NULL, NULL, &retval, &duration);
+	ASSERT_OK(err || retval, "test_run");
+
+	ips = ftrace_skel->bss->ips;
+	idx = ftrace_skel->bss->idx;
+
+	if (!ASSERT_EQ(idx, 8, "idx"))
+		goto cleanup;
+
+	for (i = 0; i < 8; i++) {
+		unsigned long long addr;
+		char func[50];
+
+		snprintf(func, sizeof(func), "bpf_fentry_test%d", i + 1);
+
+		err = kallsyms_find(func, &addr);
+		if (!ASSERT_OK(err, "kallsyms_find"))
+			goto cleanup;
+
+		if (!ASSERT_EQ(ips[i],  addr, "ips_addr"))
+			goto cleanup;
+	}
+
+cleanup:
+	ftrace_test__destroy(ftrace_skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/ftrace_test.c b/tools/testing/selftests/bpf/progs/ftrace_test.c
new file mode 100644
index 000000000000..b2a55aa10318
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/ftrace_test.c
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+__u64 ips[9] = { };
+unsigned int idx = 0;
+
+SEC("fentry.ftrace/bpf_fentry_test*")
+int BPF_PROG(test, __u64 ip, __u64 parent_ip)
+{
+	if (idx >= 0 && idx < 8)
+		ips[idx++] = ip;
+	return 0;
+}
-- 
2.30.2



* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-13 12:15 [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe Jiri Olsa
                   ` (6 preceding siblings ...)
  2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 7/7] selftests/bpf: Add ftrace probe test Jiri Olsa
@ 2021-04-14  1:04 ` Andrii Nakryiko
  2021-04-14 12:19   ` Jiri Olsa
  7 siblings, 1 reply; 39+ messages in thread
From: Andrii Nakryiko @ 2021-04-14  1:04 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Networking,
	bpf, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Daniel Xu, Steven Rostedt, Jesper Brouer,
	Toke Høiland-Jørgensen, Viktor Malik

On Tue, Apr 13, 2021 at 7:57 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> hi,
> sending another attempt at speeding up the load of multiple probes
> for bpftrace and possibly other tools (first post in [1]).
>
> This patchset adds support to attach a bpf program directly to
> an ftrace probe, as suggested by Steven, and it speeds up loading
> for bpftrace commands like:
>
>    # bpftrace -e 'kfunc:_raw_spin* { @[probe] = count(); }'
>    # bpftrace -e 'kfunc:ksys_* { @[probe] = count(); }'
>
> Using ftrace with a single bpf program attached to multiple
> functions is much faster than the current approach, where we need
> to load and attach a program for each probed function.
>

Ok, so first of all, I think it's super important to allow fast
attachment of a single BPF program to multiple kernel functions (I
call it mass-attachment). I've been recently prototyping a tool
(retsnoop, [0]) that allows attaching fentry/fexit to multiple
functions, and not having this feature turned into lots of extra code
and slow startup/teardown speeds. So we should definitely fix that.

But I think the approach you've taken is not the best one, even though
it's a good starting point for discussion.

First, you are saying function return attachment support is missing
but not needed so far. I actually think that without func return
the whole feature is extremely limiting. Not being able to measure
function latency by tracking enter/exit events is crippling for tons
of useful applications. So I think this should go with both at the
same time.

But guess what, we already have a good BPF infra (BPF trampoline and
fexit programs) that supports func exit tracing. Additionally, it
supports the ability to read input arguments *on function exit*, which
is something that kretprobe doesn't support and which is often a very
limiting restriction, necessitating complicated logic to trace
function entry just to store input arguments. It's a killer feature
and one that makes fexit so much more useful than kretprobe.

The only problem is that currently we have a 1:1:1 relationship
between BPF trampoline, BPF program, and kernel function. I think we
should allow to have a single BPF program, using a single BPF
trampoline, but being able to attach to multiple kernel functions
(1:1:N). This will allow to validate BPF program once, allocate only
one dedicated BPF trampoline, and then (with appropriate attach API)
attach them in a batch mode.

We'll probably have to abandon direct memory read for input arguments,
but for these mass-attachment scenarios that's rarely needed at all.
Just allowing to read input args as u64 and providing traced function
IP would be enough to do a lot. BPF trampoline can just
unconditionally save the first 6 arguments, similar to how we do it
today for a specific BTF function, just always 6.

As for attachment, dedicating an entire new FD for storing functions
seems like an overkill. I think BPF_LINK_CREATE is the right place to
do this, providing an array of BTF IDs to identify all functions to be
attached to. It's both simple and efficient.
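
Hypothetically, something along these lines (field names made up,
just to illustrate the shape of the API, none of this exists today):

   struct {                      /* batch BPF_LINK_CREATE variant */
           __u32           prog_fd;
           __u32           attach_type;
           __aligned_u64   btf_ids;      /* array of __u32 BTF ids */
           __u32           btf_ids_cnt;
   } link_create_batch;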

We'll get to libbpf APIs and those pseudo-regexp usage a bit later, I
don't think we need to discuss that at this stage yet :)

So, WDYT about BPF trampoline-based generic fentry/fexit with mass-attach API?

  [0] https://github.com/anakryiko/retsnoop

> Also available in
>   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>   bpf/ftrace
>
> thanks,
> jirka
>
>
> [1] https://lore.kernel.org/bpf/20201022082138.2322434-1-jolsa@kernel.org/
> ---
> Jiri Olsa (7):
>       bpf: Move bpf_prog_start/end functions to generic place
>       bpf: Add bpf_functions object
>       bpf: Add support to attach program to ftrace probe
>       libbpf: Add btf__find_by_pattern_kind function
>       libbpf: Add support to load and attach ftrace probe
>       selftests/bpf: Add ftrace probe to fentry test
>       selftests/bpf: Add ftrace probe test
>
>  include/uapi/linux/bpf.h                             |   8 ++++
>  kernel/bpf/syscall.c                                 | 381 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/bpf/trampoline.c                              |  97 ---------------------------------------
>  kernel/bpf/verifier.c                                |  27 +++++++++++
>  net/bpf/test_run.c                                   |   1 +
>  tools/include/uapi/linux/bpf.h                       |   8 ++++
>  tools/lib/bpf/bpf.c                                  |  12 +++++
>  tools/lib/bpf/bpf.h                                  |   5 +-
>  tools/lib/bpf/btf.c                                  |  67 +++++++++++++++++++++++++++
>  tools/lib/bpf/btf.h                                  |   3 ++
>  tools/lib/bpf/libbpf.c                               |  74 ++++++++++++++++++++++++++++++
>  tools/lib/bpf/libbpf.map                             |   1 +
>  tools/testing/selftests/bpf/prog_tests/fentry_test.c |   5 +-
>  tools/testing/selftests/bpf/prog_tests/ftrace_test.c |  48 +++++++++++++++++++
>  tools/testing/selftests/bpf/progs/fentry_test.c      |  16 +++++++
>  tools/testing/selftests/bpf/progs/ftrace_test.c      |  17 +++++++
>  16 files changed, 671 insertions(+), 99 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/ftrace_test.c
>  create mode 100644 tools/testing/selftests/bpf/progs/ftrace_test.c
>


* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-14  1:04 ` [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe Andrii Nakryiko
@ 2021-04-14 12:19   ` Jiri Olsa
  2021-04-14 22:46     ` Andrii Nakryiko
  0 siblings, 1 reply; 39+ messages in thread
From: Jiri Olsa @ 2021-04-14 12:19 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Networking, bpf, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Daniel Xu, Steven Rostedt,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

On Tue, Apr 13, 2021 at 06:04:05PM -0700, Andrii Nakryiko wrote:
> On Tue, Apr 13, 2021 at 7:57 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > hi,
> > sending another attempt at speeding up the load of multiple probes
> > for bpftrace and possibly other tools (first post in [1]).
> >
> > This patchset adds support to attach a bpf program directly to
> > an ftrace probe, as suggested by Steven, and it speeds up loading
> > for bpftrace commands like:
> >
> >    # bpftrace -e 'kfunc:_raw_spin* { @[probe] = count(); }'
> >    # bpftrace -e 'kfunc:ksys_* { @[probe] = count(); }'
> >
> > Using ftrace with a single bpf program attached to multiple
> > functions is much faster than the current approach, where we need
> > to load and attach a program for each probed function.
> >
> 
> Ok, so first of all, I think it's super important to allow fast
> attachment of a single BPF program to multiple kernel functions (I
> call it mass-attachment). I've been recently prototyping a tool
> (retsnoop, [0]) that allows attaching fentry/fexit to multiple
> functions, and not having this feature turned into lots of extra code
> and slow startup/teardown speeds. So we should definitely fix that.
> 
> But I think the approach you've taken is not the best one, even though
> it's a good starting point for discussion.
> 
> First, you are saying function return attachment support is missing
> but not needed so far. I actually think that without func return
> the whole feature is extremely limiting. Not being able to measure
> function latency by tracking enter/exit events is crippling for tons
> of useful applications. So I think this should go with both at the
> same time.
> 
> But guess what, we already have a good BPF infra (BPF trampoline and
> fexit programs) that supports func exit tracing. Additionally, it
> supports the ability to read input arguments *on function exit*, which
> is something that kretprobe doesn't support and which is often a very
> limiting restriction, necessitating complicated logic to trace
> function entry just to store input arguments. It's a killer feature
> and one that makes fexit so much more useful than kretprobe.
> 
> The only problem is that currently we have a 1:1:1 relationship
> between BPF trampoline, BPF program, and kernel function. I think we
> should allow to have a single BPF program, using a single BPF
> trampoline, but being able to attach to multiple kernel functions
> (1:1:N). This will allow to validate BPF program once, allocate only
> one dedicated BPF trampoline, and then (with appropriate attach API)
> attach them in a batch mode.

heya,
I had some initial prototypes going this way, but I always ended up
with complicated code; that's why I turned to ftrace_ops.

let's see if it'll make any sense to you ;-)

1) so let's say we have an extra trampoline for the program (which
also seems a bit of a waste, since there will be just a single record
in it, but sure) - this single trampoline can easily be attached
to multiple functions, but what about other trampolines/tools
that want to trace the same function? we'd need some way for a
function to share/call multiple trampolines - I did not see an easy
solution here, so I moved on to another way..


2) we keep the trampoline:function relationship at 1:1 and allow
the 'mass-attachment' program to register in multiple trampolines.
(it needs a special hlist node for each attachment, but that's ok)

the problem was that to make this fast, you don't want to attach/detach
the program to trampolines one by one, you need to do it in batch,
so you can call the ftrace API just once (the ftrace API is another
problem, see below).. and doing this in batch mode means that you
need to lock all the related trampolines and not allow any change
to them by other tools, and that's where I couldn't find any easy
solution.. you can't take a lock for 100 trampolines.. and having
some 'master' lock is tricky

another problem is the ftrace API.. to make it fast we either
need to use ftrace_ops or create a fast API for ftrace's direct
functions.. and that was rejected last time [1]


3) bpf has support for a batch interface already, but only if ftrace
is not in the way.. compiling without ftrace is not an option for us,
so I was also thinking about some way to bypass ftrace and allow
any trace engine to own a function.. whoever takes it first
(ftrace or bpf) can use it, the other one will see -EBUSY, and once
the tool is done, the function is free to take again


[1] https://lore.kernel.org/bpf/20201022104205.728dd135@gandalf.local.home/#t

> 
> We'll probably have to abandon direct memory read for input arguments,
> but for these mass-attachment scenarios that's rarely needed at all.
> Just allowing to read input args as u64 and providing traced function
> IP would be enough to do a lot. BPF trampoline can just
> unconditionally save the first 6 arguments, similar to how we do it
> today for a specific BTF function, just always 6.

yes, we don't need the arguments, just the function ip/id to tell
which function we just traced

> 
> As for attachment, dedicating an entire new FD for storing functions
> seems like an overkill. I think BPF_LINK_CREATE is the right place to
> do this, providing an array of BTF IDs to identify all functions to be
> attached to. It's both simple and efficient.

that fd can be closed right after the link is created ;-)

I used it because it seemed simpler than the array approach,
and also for the ftrace_location check when adding the function
to the object - so you know it's under ftrace - when that passed,
the attach will most likely pass as well - so the tool just adds
IDs and the object keeps only the good ones ;-)

but that ftrace_location info can be found in user space as well,
so it's definitely possible to prepare 'good' BTF IDs in userspace
and use an array

> 
> We'll get to libbpf APIs and those pseudo-regexp usage a bit later, I
> don't think we need to discuss that at this stage yet :)
> 
> So, WDYT about BPF trampoline-based generic fentry/fexit with mass-attach API?

I'll double check the ftrace graph support for ftrace_ops ;-)

let's see if we can find some solutions for the problems I
described above.. or, even better, another way to do this -
that'd be great

> 
>   [0] https://github.com/anakryiko/retsnoop

nice tool, could it also be part of bcc?

thanks,
jirka

> 
> > [...]
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-14 12:19   ` Jiri Olsa
@ 2021-04-14 22:46     ` Andrii Nakryiko
  2021-04-15 14:00       ` Jiri Olsa
  2021-04-15 15:10       ` Steven Rostedt
  0 siblings, 2 replies; 39+ messages in thread
From: Andrii Nakryiko @ 2021-04-14 22:46 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Networking, bpf, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Daniel Xu, Steven Rostedt,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

On Wed, Apr 14, 2021 at 5:19 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Apr 13, 2021 at 06:04:05PM -0700, Andrii Nakryiko wrote:
> > On Tue, Apr 13, 2021 at 7:57 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > >
> > > hi,
> > > sending another attempt on speeding up load of multiple probes
> > > for bpftrace and possibly other tools (first post in [1]).
> > >
> > > This patchset adds support to attach bpf program directly to
> > > ftrace probe as suggested by Steven and it speeds up loading
> > > for bpftrace commands like:
> > >
> > >    # bpftrace -e 'kfunc:_raw_spin* { @[probe] = count(); }'
> > >    # bpftrace -e 'kfunc:ksys_* { @[probe] = count(); }'
> > >
> > > Using ftrace with single bpf program for attachment to multiple
> > > functions is much faster than current approach, where we need to
> > > load and attach program for each probe function.
> > >
> >
> > Ok, so first of all, I think it's super important to allow fast
> > attachment of a single BPF program to multiple kernel functions (I
> > call it mass-attachment). I've been recently prototyping a tool
> > (retsnoop, [0]) that allows attaching fentry/fexit to multiple
> > functions, and not having this feature turned into lots of extra code
> > and slow startup/teardown speeds. So we should definitely fix that.
> >
> > But I think the approach you've taken is not the best one, even though
> > it's a good starting point for discussion.
> >
> > First, you are saying function return attachment support is missing,
> > but is not needed so far. I actually think that without func return
> > the whole feature is extremely limiting. Not being able to measure
> > function latency  by tracking enter/exit events is crippling for tons
> > of useful applications. So I think this should go with both at the
> > same time.
> >
> > But guess what, we already have a good BPF infra (BPF trampoline and
> > fexit programs) that supports func exit tracing. Additionally, it
> > supports the ability to read input arguments *on function exit*, which
> > is something that kretprobe doesn't support and which is often a very
> > limiting restriction, necessitating complicated logic to trace
> > function entry just to store input arguments. It's a killer feature
> > and one that makes fexit so much more useful than kretprobe.
> >
> > The only problem is that currently we have a 1:1:1 relationship
> > between BPF trampoline, BPF program, and kernel function. I think we
> > should allow to have a single BPF program, using a single BPF
> > trampoline, but being able to attach to multiple kernel functions
> > (1:1:N). This will allow to validate BPF program once, allocate only
> > one dedicated BPF trampoline, and then (with appropriate attach API)
> > attach them in a batch mode.
>
> heya,
> I had some initial prototypes trying this way, but always ended up
> in complicated code, that's why I turned to ftrace_ops.
>
> let's see if it'll make any sense to you ;-)
>
> 1) so let's say we have extra trampoline for the program (which
> also seems a bit of waste since there will be just single record

BPF trampoline does more than just call the BPF program. At the very
least it saves the input arguments so the fexit program is able to access
them. But given it's one BPF trampoline attached to thousands of
functions, I don't see any problem there.

> in it, but sure) - this single trampoline can be easily attached
> to multiple functions, but what about other trampolines/tools,
> that want to trace the same function? we'd need some way for a
> function to share/call multiple trampolines - I did not see easy
> solution in here so I moved to another way..

The easiest would be to make the existing BPF trampoline co-exist
with this new multi-attach one. As to how, I don't know the code well
enough yet to answer specifically.

>
>
> 2) we keep the trampoline:function relationship to 1:1 and allow
> 'mass-attachment' program to register in multiple trampolines.
> (it needs special hlist node for each attachment, but that's ok)
>
> the problem was that to make this fast, you don't want to attach/detach
> program to trampolines one by one, you need to do it in batch,
> so you can call ftrace API just once (ftrace API is another problem below)
> and doing this in batch mode means, that you need to lock all the
> related trampolines and not allow any change in them by another tools,
> and that's where I couldn't find any easy solution.. you can't take
> a lock for 100 trampolines.. and having some 'master' lock is tricky

So this generic fentry would have its own BPF trampoline. Now you need
to attach it to 1000s of places with a single batch API call. We won't
have to modify 100s of other BPF trampolines if we can find a good
way to let them co-exist.


>
> another problem is the ftrace API.. to make it fast we either
> need to use ftrace_ops or create fast API to ftrace's direct
> functions.. and that was rejected last time [1]

I don't read it as a rejection, just that the ftrace infra needs to be
improved to support it. In any case, I haven't spent enough time thinking
and digging through the code, but I know that without fexit support this
feature is useless in a lot of cases. And input argument reading in
fexit is too good to give up at this point either.

>
>
> 3) bpf has support for batch interface already, but only if ftrace

It does? What is it? Last time I looked I didn't find anything like that.

> is not in the way..  compile without ftrace is not an option for us,
> so I was also thinking about some way to bypass ftrace and allow
> any trace engine to own some function.. so whoever takes it first
> (ftrace or bpf) can use it, the other one will see -EBUSY and once
> the tool is done, the function is free to take
>
>
> [1] https://lore.kernel.org/bpf/20201022104205.728dd135@gandalf.local.home/#t
>
> >
> > We'll probably have to abandon direct memory read for input arguments,
> > but for these mass-attachment scenarios that's rarely needed at all.
> > Just allowing to read input args as u64 and providing traced function
> > IP would be enough to do a lot. BPF trampoline can just
> > unconditionally save the first 6 arguments, similarly how we do it
> > today for a specific BTF function, just always 6.
>
> yes, we don't need arguments just function ip/id to tell what
> function we just trace
>
> >
> > As for attachment, dedicating an entire new FD for storing functions
> > seems like an overkill. I think BPF_LINK_CREATE is the right place to
> > do this, providing an array of BTF IDs to identify all functions to be
> > attached to. It's both simple and efficient.
>
> that fd can be closed right after link is created ;-)
>
> I used it because it seemed simpler than the array approach,
> and also for the ftrace_location check when adding the function
> to the object - so you know it's under ftrace - when it passed,
> the attach will most likely pass as well - so tools is just adding
> IDs and the objects keeps only the good ones ;-)
>
> but that ftrace_location info can be found in user space as well,
> so it's definitely possible to prepare 'good' BTF IDs in userspace
> and use array

right; a new FD and a new concept just for this seems like overkill

>
> >
> > We'll get to libbpf APIs and those pseudo-regexp usage a bit later, I
> > don't think we need to discuss that at this stage yet :)
> >
> > So, WDYT about BPF trampoline-based generic fentry/fexit with mass-attach API?
>
> I'll double check the ftrace graph support for ftrace_ops ;-)
>
> let's see if we could find some solutions for the problems I
> described above.. or most likely another better way to do this,
> that'd be great
>
> >
> >   [0] https://github.com/anakryiko/retsnoop
>
> nice tool, could it be also part of bcc?

As it is right now it takes minutes to attach and then minutes to
detach, so it's not a very convenient tool without batched attach. Brendan
Gregg also hinted that he doesn't intend to let the BCC tools collection
grow unbounded. So for now it is in its own repo; feel free to try
it. We'll see how it goes.

>
> thanks,
> jirka
>
> >

[...]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-14 22:46     ` Andrii Nakryiko
@ 2021-04-15 14:00       ` Jiri Olsa
  2021-04-15 15:10       ` Steven Rostedt
  1 sibling, 0 replies; 39+ messages in thread
From: Jiri Olsa @ 2021-04-15 14:00 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Networking, bpf, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Daniel Xu, Steven Rostedt,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

On Wed, Apr 14, 2021 at 03:46:49PM -0700, Andrii Nakryiko wrote:
> On Wed, Apr 14, 2021 at 5:19 AM Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Tue, Apr 13, 2021 at 06:04:05PM -0700, Andrii Nakryiko wrote:
> > > On Tue, Apr 13, 2021 at 7:57 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > > >
> > > > hi,
> > > > sending another attempt on speeding up load of multiple probes
> > > > for bpftrace and possibly other tools (first post in [1]).
> > > >
> > > > This patchset adds support to attach bpf program directly to
> > > > ftrace probe as suggested by Steven and it speeds up loading
> > > > for bpftrace commands like:
> > > >
> > > >    # bpftrace -e 'kfunc:_raw_spin* { @[probe] = count(); }'
> > > >    # bpftrace -e 'kfunc:ksys_* { @[probe] = count(); }'
> > > >
> > > > Using ftrace with single bpf program for attachment to multiple
> > > > functions is much faster than current approach, where we need to
> > > > load and attach program for each probe function.
> > > >
> > >
> > > Ok, so first of all, I think it's super important to allow fast
> > > attachment of a single BPF program to multiple kernel functions (I
> > > call it mass-attachment). I've been recently prototyping a tool
> > > (retsnoop, [0]) that allows attaching fentry/fexit to multiple
> > > functions, and not having this feature turned into lots of extra code
> > > and slow startup/teardown speeds. So we should definitely fix that.
> > >
> > > But I think the approach you've taken is not the best one, even though
> > > it's a good starting point for discussion.
> > >
> > > First, you are saying function return attachment support is missing,
> > > but is not needed so far. I actually think that without func return
> > > the whole feature is extremely limiting. Not being able to measure
> > > function latency  by tracking enter/exit events is crippling for tons
> > > of useful applications. So I think this should go with both at the
> > > same time.
> > >
> > > But guess what, we already have a good BPF infra (BPF trampoline and
> > > fexit programs) that supports func exit tracing. Additionally, it
> > > supports the ability to read input arguments *on function exit*, which
> > > is something that kretprobe doesn't support and which is often a very
> > > limiting restriction, necessitating complicated logic to trace
> > > function entry just to store input arguments. It's a killer feature
> > > and one that makes fexit so much more useful than kretprobe.
> > >
> > > The only problem is that currently we have a 1:1:1 relationship
> > > between BPF trampoline, BPF program, and kernel function. I think we
> > > should allow to have a single BPF program, using a single BPF
> > > trampoline, but being able to attach to multiple kernel functions
> > > (1:1:N). This will allow to validate BPF program once, allocate only
> > > one dedicated BPF trampoline, and then (with appropriate attach API)
> > > attach them in a batch mode.
> >
> > heya,
> > I had some initial prototypes trying this way, but always ended up
> > in complicated code, that's why I turned to ftrace_ops.
> >
> > let's see if it'll make any sense to you ;-)
> >
> > 1) so let's say we have extra trampoline for the program (which
> > also seems a bit of waste since there will be just single record
> 
> BPF trampoline does more than just calls BPF program. At the very
> least it saves input arguments for fexit program to be able to access
> it. But given it's one BPF trampoline attached to thousands of
> functions, I don't see any problem there.
> 
> > in it, but sure) - this single trampoline can be easily attached
> > to multiple functions, but what about other trampolines/tools,
> > that want to trace the same function? we'd need some way for a
> > function to share/call multiple trampolines - I did not see easy
> > solution in here so I moved to another way..
> 
> The easiest would be to make the existing BPF trampoline to co-exist
> with this new multi-attach one. As to how, I don't know the code well
> enough yet to answer specifically.

I did not explore this possibility because it seemed too
complicated ;-) I'll see if I can come up with something
that we could start a discussion from, so:

  - a new trampoline type that would attach a single program
    to multiple functions
  - it needs to 'co-exist' with the current trampolines so
    both types can be attached to the same function

> 
> >
> >
> > 2) we keep the trampoline:function relationship to 1:1 and allow
> > 'mass-attachment' program to register in multiple trampolines.
> > (it needs special hlist node for each attachment, but that's ok)
> >
> > the problem was that to make this fast, you don't want to attach/detach
> > program to trampolines one by one, you need to do it in batch,
> > so you can call ftrace API just once (ftrace API is another problem below)
> > and doing this in batch mode means, that you need to lock all the
> > related trampolines and not allow any change in them by another tools,
> > and that's where I couldn't find any easy solution.. you can't take
> > a lock for 100 trampolines.. and having some 'master' lock is tricky
> 
> So this generic fentry would have its own BPF trampoline. Now you need
> to attach it to 1000s of places with a single batch API call. We won't
> have to modify 100s of other BPF trampolines, if we can find a good
> way to let them co-exist.
> 
> 
> >
> > another problem is the ftrace API.. to make it fast we either
> > need to use ftrace_ops or create fast API to ftrace's direct
> > functions.. and that was rejected last time [1]
> 
> I don't read it as a rejection, just that ftrace infra needs to be
> improved to support. In any case, I haven't spent enough time thinking
> and digging through code, but I know that without fexit support this
> feature is useless in a lot of cases. And input argument reading in
> fexit is too good to give up at this point either.
> 
> >
> >
> > 3) bpf has support for batch interface already, but only if ftrace
> 
> It does? What is it? Last time I looked I didn't find anything like that.

trampolines use the text_poke_bp function (when ftrace is not compiled in
or the function is not ftrace-managed)

text_poke_bp is a wrapper around text_poke_bp_batch that changes 1 location;
text_poke_bp_batch allows changing more than one place with:

   text_poke_queue
   text_poke_queue
   ...
   text_poke_finish -> text_poke_flush -> text_poke_bp_batch
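
so a batched caller would look roughly like this (just a sketch - the
site array is made up, and text_mutex must be held):

   /* queue all the updates first... */
   for (i = 0; i < nr_sites; i++)
           text_poke_queue(site[i].addr, site[i].opcode, site[i].len, NULL);

   /* ...then patch every queued site with one int3-based batch */
   text_poke_finish();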

thanks,
jirka


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-14 22:46     ` Andrii Nakryiko
  2021-04-15 14:00       ` Jiri Olsa
@ 2021-04-15 15:10       ` Steven Rostedt
  2021-04-15 17:39         ` Jiri Olsa
  2021-04-15 20:45         ` Andrii Nakryiko
  1 sibling, 2 replies; 39+ messages in thread
From: Steven Rostedt @ 2021-04-15 15:10 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiri Olsa, Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Networking, bpf, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Daniel Xu,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

On Wed, 14 Apr 2021 15:46:49 -0700
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> On Wed, Apr 14, 2021 at 5:19 AM Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Tue, Apr 13, 2021 at 06:04:05PM -0700, Andrii Nakryiko wrote:  
> > > On Tue, Apr 13, 2021 at 7:57 AM Jiri Olsa <jolsa@kernel.org> wrote:  
> > > >
> > > > hi,
> > > > sending another attempt on speeding up load of multiple probes
> > > > for bpftrace and possibly other tools (first post in [1]).
> > > >
> > > > This patchset adds support to attach bpf program directly to
> > > > ftrace probe as suggested by Steven and it speeds up loading
> > > > for bpftrace commands like:
> > > >
> > > >    # bpftrace -e 'kfunc:_raw_spin* { @[probe] = count(); }'
> > > >    # bpftrace -e 'kfunc:ksys_* { @[probe] = count(); }'
> > > >
> > > > Using ftrace with single bpf program for attachment to multiple
> > > > functions is much faster than current approach, where we need to
> > > > load and attach program for each probe function.
> > > >  
> > >
> > > Ok, so first of all, I think it's super important to allow fast
> > > attachment of a single BPF program to multiple kernel functions (I
> > > call it mass-attachment). I've been recently prototyping a tool
> > > (retsnoop, [0]) that allows attaching fentry/fexit to multiple
> > > functions, and not having this feature turned into lots of extra code
> > > and slow startup/teardown speeds. So we should definitely fix that.
> > >
> > > But I think the approach you've taken is not the best one, even though
> > > it's a good starting point for discussion.
> > >
> > > First, you are saying function return attachment support is missing,
> > > but is not needed so far. I actually think that without func return
> > > the whole feature is extremely limiting. Not being able to measure
> > > function latency  by tracking enter/exit events is crippling for tons
> > > of useful applications. So I think this should go with both at the
> > > same time.
> > >
> > > But guess what, we already have a good BPF infra (BPF trampoline and
> > > fexit programs) that supports func exit tracing. Additionally, it
> > > supports the ability to read input arguments *on function exit*, which
> > > is something that kretprobe doesn't support and which is often a very
> > > limiting restriction, necessitating complicated logic to trace
> > > function entry just to store input arguments. It's a killer feature
> > > and one that makes fexit so much more useful than kretprobe.
> > >
> > > The only problem is that currently we have a 1:1:1 relationship
> > > between BPF trampoline, BPF program, and kernel function. I think we
> > > should allow to have a single BPF program, using a single BPF
> > > trampoline, but being able to attach to multiple kernel functions
> > > (1:1:N). This will allow to validate BPF program once, allocate only
> > > one dedicated BPF trampoline, and then (with appropriate attach API)
> > > attach them in a batch mode.  
> >
> > heya,
> > I had some initial prototypes trying this way, but always ended up
> > in complicated code, that's why I turned to ftrace_ops.
> >
> > let's see if it'll make any sense to you ;-)
> >
> > 1) so let's say we have extra trampoline for the program (which
> > also seems a bit of waste since there will be just single record  
> 
> BPF trampoline does more than just calls BPF program. At the very
> least it saves input arguments for fexit program to be able to access
> it. But given it's one BPF trampoline attached to thousands of
> functions, I don't see any problem there.

Note, there's a whole infrastructure that does similar things in ftrace.
I wrote the direct call to jump to individual trampolines, because ftrace
was too generic. The only way at the time to get to the arguments was via
the ftrace_regs_caller, which did a full save of regs, because this was
what kprobes needed, and was too expensive for BPF.

I now regret writing the direct callers, and instead should have just done
what I did afterward, which was to make ftrace default to a lightweight
trampoline that only saves enough for getting access to the arguments of
the function. And have BPF use that. But I was under the impression that
BPF needed fast access to a single function, and it would not become a
generic trampoline for multiple functions, because that was the argument
used to not enhance ftrace.

Today, ftrace by default (on x86) implements a generic way to get the
arguments, and just the arguments, which is exactly what BPF would need for
multiple functions. And yes, you even have access to the return code if you
want to "hijack" it. And since it was originally for individual functions
(and not a batch), I created the direct caller for BPF. But the direct
caller will not be enhanced for multiple functions, as that's not its
purpose. If you want a trampoline to be called back to multiple functions,
then use the infrastructure that was designed for that. Which is what Jiri
had proposed here.
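
For reference, using that infrastructure is one ftrace_ops plus a filter
for any number of functions - a minimal sketch (the callback body and the
glob are just examples):

#include <linux/ftrace.h>

static void my_callback(unsigned long ip, unsigned long parent_ip,
			struct ftrace_ops *ops, struct ftrace_regs *fregs)
{
	/* with CONFIG_DYNAMIC_FTRACE_WITH_ARGS (the x86 default), fregs
	 * carries just the traced function's argument registers */
}

static struct ftrace_ops my_ops = {
	.func = my_callback,
};

static int __init my_init(void)
{
	char pat[] = "ksys_*";

	/* one ops attached to every function matching the glob */
	ftrace_set_filter(&my_ops, pat, strlen(pat), 0);
	return register_ftrace_function(&my_ops);
}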

And because the direct caller can mess with the return code, it breaks
function graph tracing. As a temporary work around, we just made function
graph ignore any function that has a direct caller attached to it.

If you want batch processing of BPF programs, you need to first fix the
function graph tracing issue, and allow both BPF attached callers and
function graph to work on the same functions.

I don't know how the BPF code does it, but if you are tracing the exit
of a function, I'm assuming that you hijack the return pointer and replace
it with a call to a trampoline that has access to the arguments. To do
this you need a shadow stack to save the real return as well as the
parameters of the function. This is something that I have patches that do
similar things with function graph.

If you want this feature, lets work together and make this work for both
BPF and ftrace.

-- Steve

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-15 15:10       ` Steven Rostedt
@ 2021-04-15 17:39         ` Jiri Olsa
  2021-04-15 18:18           ` Steven Rostedt
  2021-04-15 20:45         ` Andrii Nakryiko
  1 sibling, 1 reply; 39+ messages in thread
From: Jiri Olsa @ 2021-04-15 17:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrii Nakryiko, Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Networking, bpf, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Daniel Xu,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

On Thu, Apr 15, 2021 at 11:10:02AM -0400, Steven Rostedt wrote:

SNIP

> > > heya,
> > > I had some initial prototypes trying this way, but always ended up
> > > in complicated code, that's why I turned to ftrace_ops.
> > >
> > > let's see if it'll make any sense to you ;-)
> > >
> > > 1) so let's say we have extra trampoline for the program (which
> > > also seems a bit of waste since there will be just single record  
> > 
> > BPF trampoline does more than just calls BPF program. At the very
> > least it saves input arguments for fexit program to be able to access
> > it. But given it's one BPF trampoline attached to thousands of
> > functions, I don't see any problem there.
> 
> Note, there's a whole infrastructure that does similar things in ftrace.
> I wrote the direct call to jump to individual trampolines, because ftrace
> was too generic. The only way at the time to get to the arguments was via
> the ftrace_regs_caller, which did a full save of regs, because this was
> what kprobes needed, and was too expensive for BPF.
> 
> I now regret writing the direct callers, and instead should have just done
> what I did afterward, which was to make ftrace default to a lightweight
> trampoline that only saves enough for getting access to the arguments of
> the function. And have BPF use that. But I was under the impression that
> BPF needed fast access to a single function, and it would not become a
> generic trampoline for multiple functions, because that was the argument
> used to not enhance ftrace.
> 
> Today, ftrace by default (on x86) implements a generic way to get the
> arguments, and just the arguments, which is exactly what BPF would need for
> multiple functions. And yes, you even have access to the return code if you
> want to "hijack" it. And since it was originally for individual functions
> (and not a batch), I created the direct caller for BPF. But the direct
> caller will not be enhanced for multiple functions, as that's not its
> purpose. If you want a trampoline to be called back to multiple functions,
> then use the infrastructure that was designed for that. Which is what Jiri
> had proposed here.
> 
> And because the direct caller can mess with the return code, it breaks
> function graph tracing. As a temporary work around, we just made function
> graph ignore any function that has a direct caller attached to it.
> 
> If you want batch processing of BPF programs, you need to first fix the
> function graph tracing issue, and allow both BPF attached callers and
> function graph to work on the same functions.
> 
> I don't know how the BPF code does it, but if you are tracing the exit
> of a function, I'm assuming that you hijack the return pointer and replace
> it with a call to a trampoline that has access to the arguments. To do

hi,
it's a bit different, the trampoline makes use of the fact that the
call to the trampoline is at the very beginning of the function, so
it can call the original function with a 'call function + 5' instr.

so in a nutshell the trampoline does:

  call entry_progs
  call original_func+5
  call exit_progs

you can check this in arch/x86/net/bpf_jit_comp.c in more detail:

 * The assembly code when eth_type_trans is called from trampoline:
 *
 * push rbp
 * mov rbp, rsp
 * sub rsp, 24                     // space for skb, dev, return value
 * push rbx                        // temp regs to pass start time
 * mov qword ptr [rbp - 24], rdi   // save skb pointer to stack
 * mov qword ptr [rbp - 16], rsi   // save dev pointer to stack
 * call __bpf_prog_enter           // rcu_read_lock and preempt_disable
 * mov rbx, rax                    // remember start time if bpf stats are enabled
 * lea rdi, [rbp - 24]             // R1==ctx of bpf prog
 * call addr_of_jited_FENTRY_prog  // bpf prog can access skb and dev

entry program called ^^^

 * movabsq rdi, 64bit_addr_of_struct_bpf_prog  // unused if bpf stats are off
 * mov rsi, rbx                    // prog start time
 * call __bpf_prog_exit            // rcu_read_unlock, preempt_enable and stats math
 * mov rdi, qword ptr [rbp - 24]   // restore skb pointer from stack
 * mov rsi, qword ptr [rbp - 16]   // restore dev pointer from stack
 * call eth_type_trans+5           // execute body of eth_type_trans

original function called ^^^

 * mov qword ptr [rbp - 8], rax    // save return value
 * call __bpf_prog_enter           // rcu_read_lock and preempt_disable
 * mov rbx, rax                    // remember start time if bpf stats are enabled
 * lea rdi, [rbp - 24]             // R1==ctx of bpf prog
 * call addr_of_jited_FEXIT_prog   // bpf prog can access skb, dev, return value

exit program called ^^^

 * movabsq rdi, 64bit_addr_of_struct_bpf_prog  // unused if bpf stats are off
 * mov rsi, rbx                    // prog start time
 * call __bpf_prog_exit            // rcu_read_unlock, preempt_enable and stats math
 * mov rax, qword ptr [rbp - 8]    // restore eth_type_trans's return value
 * pop rbx
 * leave
 * add rsp, 8                      // skip eth_type_trans's frame
 * ret                             // return to its caller

> this you need a shadow stack to save the real return as well as the
> parameters of the function. This is something that I have patches that do
> similar things with function graph.
> 
> If you want this feature, lets work together and make this work for both
> BPF and ftrace.

it's been some time since I last looked at the graph tracer; is there
a way to make it access the input arguments and make them available
through the ftrace_ops interface?

thanks,
jirka


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-15 17:39         ` Jiri Olsa
@ 2021-04-15 18:18           ` Steven Rostedt
  2021-04-15 18:21             ` Steven Rostedt
  2021-04-15 18:31             ` Yonghong Song
  0 siblings, 2 replies; 39+ messages in thread
From: Steven Rostedt @ 2021-04-15 18:18 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Networking, bpf, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Daniel Xu,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

On Thu, 15 Apr 2021 19:39:45 +0200
Jiri Olsa <jolsa@redhat.com> wrote:

> > I don't know how the BPF code does it, but if you are tracing the exit
> > of a function, I'm assuming that you hijack the return pointer and replace
> > it with a call to a trampoline that has access to the arguments. To do  
> 
> hi,
> it's a bit different, the trampoline makes use of the fact that the
> call to the trampoline is at the very beginning of the function, so
> it can call the original function with a 'call function + 5' instr.
> 
> so in a nutshell the trampoline does:
> 
>   call entry_progs
>   call original_func+5

How does the above handle functions that have parameters on the stack?

>   call exit_progs
> 
> you can check this in arch/x86/net/bpf_jit_comp.c in more detail:
> 
>  * The assembly code when eth_type_trans is called from trampoline:
>  *
>  * push rbp
>  * mov rbp, rsp
>  * sub rsp, 24                     // space for skb, dev, return value
>  * push rbx                        // temp regs to pass start time
>  * mov qword ptr [rbp - 24], rdi   // save skb pointer to stack
>  * mov qword ptr [rbp - 16], rsi   // save dev pointer to stack
>  * call __bpf_prog_enter           // rcu_read_lock and preempt_disable
>  * mov rbx, rax                    // remember start time if bpf stats are enabled
>  * lea rdi, [rbp - 24]             // R1==ctx of bpf prog
>  * call addr_of_jited_FENTRY_prog  // bpf prog can access skb and dev
> 
> entry program called ^^^
> 
>  * movabsq rdi, 64bit_addr_of_struct_bpf_prog  // unused if bpf stats are off
>  * mov rsi, rbx                    // prog start time
>  * call __bpf_prog_exit            // rcu_read_unlock, preempt_enable and stats math
>  * mov rdi, qword ptr [rbp - 24]   // restore skb pointer from stack
>  * mov rsi, qword ptr [rbp - 16]   // restore dev pointer from stack
>  * call eth_type_trans+5           // execute body of eth_type_trans
> 
> original function called ^^^

This would need to be limited to only functions that do not have any
parameters on the stack.

> 
>  * mov qword ptr [rbp - 8], rax    // save return value
>  * call __bpf_prog_enter           // rcu_read_lock and preempt_disable
>  * mov rbx, rax                    // remember start time if bpf stats are enabled
>  * lea rdi, [rbp - 24]             // R1==ctx of bpf prog
>  * call addr_of_jited_FEXIT_prog   // bpf prog can access skb, dev, return value
> 
> exit program called ^^^
> 
>  * movabsq rdi, 64bit_addr_of_struct_bpf_prog  // unused if bpf stats are off
>  * mov rsi, rbx                    // prog start time
>  * call __bpf_prog_exit            // rcu_read_unlock, preempt_enable and stats math
>  * mov rax, qword ptr [rbp - 8]    // restore eth_type_trans's return value
>  * pop rbx
>  * leave
>  * add rsp, 8                      // skip eth_type_trans's frame
>  * ret                             // return to its caller
> 
> > this you need a shadow stack to save the real return as well as the
> > parameters of the function. This is something that I have patches that do
> > similar things with function graph.
> > 
> > If you want this feature, lets work together and make this work for both
> > BPF and ftrace.  
> 
> it's been some time I saw a graph tracer, is there a way to make it
> access input arguments and make it available through ftrace_ops
> interface?

I have patches that could easily make it do so, and I should probably get
them out again. The function graph tracer has a shadow stack, and my
patches allow you to store data on it for use when the traced function
exits.

My last release of that code is here:

  https://lore.kernel.org/lkml/20190525031633.811342628@goodmis.org/

It allows you to "reserve data" to pass from the caller to the return, and
that could hold the arguments. See patch 15 of that series.


-- Steve


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-15 18:18           ` Steven Rostedt
@ 2021-04-15 18:21             ` Steven Rostedt
  2021-04-15 21:49               ` Jiri Olsa
  2021-04-15 18:31             ` Yonghong Song
  1 sibling, 1 reply; 39+ messages in thread
From: Steven Rostedt @ 2021-04-15 18:21 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Networking, bpf, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Daniel Xu,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

On Thu, 15 Apr 2021 14:18:31 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> My last release of that code is here:
> 
>   https://lore.kernel.org/lkml/20190525031633.811342628@goodmis.org/
> 
> It allows you to "reserve data" to pass from the caller to the return, and
> that could hold the arguments. See patch 15 of that series.

Note that the implementation only lets you save up to 4 words on the stack, but
that can be changed. Or you could have a separate shadow stack for saving
arguments, and only pass the pointer to the location on the other stack
where those arguments are.
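
To sketch the idea (the names follow that series and are illustrative,
not a settled API; arg0/arg1 stand in for real argument values):

/* entry handler: stash words on the shadow stack for the return side */
static int my_entry(struct ftrace_graph_ent *trace)
{
	unsigned long *data;

	data = fgraph_reserve_data(2 * sizeof(*data));
	if (data) {
		data[0] = arg0;		/* placeholder argument values */
		data[1] = arg1;
	}
	return 1;
}

/* return handler: the words saved at entry are available again */
static void my_return(struct ftrace_graph_ret *trace)
{
	unsigned long *data = fgraph_retrieve_data();

	if (data)
		pr_info("%ps(%lx, %lx)\n", (void *)trace->func,
			data[0], data[1]);
}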

-- Steve

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-15 18:18           ` Steven Rostedt
  2021-04-15 18:21             ` Steven Rostedt
@ 2021-04-15 18:31             ` Yonghong Song
  1 sibling, 0 replies; 39+ messages in thread
From: Yonghong Song @ 2021-04-15 18:31 UTC (permalink / raw)
  To: Steven Rostedt, Jiri Olsa
  Cc: Andrii Nakryiko, Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Networking, bpf, Martin KaFai Lau, Song Liu,
	John Fastabend, KP Singh, Daniel Xu, Jesper Brouer,
	Toke Høiland-Jørgensen, Viktor Malik



On 4/15/21 11:18 AM, Steven Rostedt wrote:
> On Thu, 15 Apr 2021 19:39:45 +0200
> Jiri Olsa <jolsa@redhat.com> wrote:
> 
>>> I don't know how the BPF code does it, but if you are tracing the exit
>>> of a function, I'm assuming that you hijack the return pointer and replace
>>> it with a call to a trampoline that has access to the arguments. To do
>>
>> hi,
>> it's a bit different, the trampoline makes use of the fact that the
>> call to the trampoline is at the very beginning of the function, so
>> it can call the original function with a 'call function + 5' instr.
>>
>> so in a nutshell the trampoline does:
>>
>>    call entry_progs
>>    call original_func+5
> 
> How does the above handle functions that have parameters on the stack?

In arch/x86/net/bpf_jit_comp.c, the following code

int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image,
                                void *image_end,
                                const struct btf_func_model *m, u32 flags,
                                struct bpf_tramp_progs *tprogs,
                                void *orig_call)
{
        int ret, i, cnt = 0, nr_args = m->nr_args;
        int stack_size = nr_args * 8;
        struct bpf_tramp_progs *fentry = &tprogs[BPF_TRAMP_FENTRY];
        struct bpf_tramp_progs *fexit = &tprogs[BPF_TRAMP_FEXIT];
        struct bpf_tramp_progs *fmod_ret = &tprogs[BPF_TRAMP_MODIFY_RETURN];
        u8 **branches = NULL;
        u8 *prog;

        /* x86-64 supports up to 6 arguments. 7+ can be added in the future */
        if (nr_args > 6)
                return -ENOTSUPP;
...

Here, nr_args will be at most 5 original arguments + the "ret" value
argument for the fexit bpf program.

yes, it is a known limitation. I guess more could be added and pushed to
the stack. Looks like there has been no strong request, so this has not
been implemented yet.

> 
>>    call exit_progs
>>
>> you can check this in arch/x86/net/bpf_jit_comp.c in more detail:
>>
>>   * The assembly code when eth_type_trans is called from trampoline:
>>   *
>>   * push rbp
>>   * mov rbp, rsp
>>   * sub rsp, 24                     // space for skb, dev, return value
>>   * push rbx                        // temp regs to pass start time
>>   * mov qword ptr [rbp - 24], rdi   // save skb pointer to stack
>>   * mov qword ptr [rbp - 16], rsi   // save dev pointer to stack
>>   * call __bpf_prog_enter           // rcu_read_lock and preempt_disable
>>   * mov rbx, rax                    // remember start time if bpf stats are enabled
>>   * lea rdi, [rbp - 24]             // R1==ctx of bpf prog
>>   * call addr_of_jited_FENTRY_prog  // bpf prog can access skb and dev
>>
>> entry program called ^^^
>>
>>   * movabsq rdi, 64bit_addr_of_struct_bpf_prog  // unused if bpf stats are off
>>   * mov rsi, rbx                    // prog start time
>>   * call __bpf_prog_exit            // rcu_read_unlock, preempt_enable and stats math
>>   * mov rdi, qword ptr [rbp - 24]   // restore skb pointer from stack
>>   * mov rsi, qword ptr [rbp - 16]   // restore dev pointer from stack
>>   * call eth_type_trans+5           // execute body of eth_type_trans
>>
>> original function called ^^^
> 
> This would need to be limited to only functions that do not have any
> parameters on the stack.
> 
>>
>>   * mov qword ptr [rbp - 8], rax    // save return value
>>   * call __bpf_prog_enter           // rcu_read_lock and preempt_disable
>>   * mov rbx, rax                    // remember start time if bpf stats are enabled
>>   * lea rdi, [rbp - 24]             // R1==ctx of bpf prog
>>   * call addr_of_jited_FEXIT_prog   // bpf prog can access skb, dev, return value
>>
>> exit program called ^^^
>>
>>   * movabsq rdi, 64bit_addr_of_struct_bpf_prog  // unused if bpf stats are off
>>   * mov rsi, rbx                    // prog start time
>>   * call __bpf_prog_exit            // rcu_read_unlock, preempt_enable and stats math
>>   * mov rax, qword ptr [rbp - 8]    // restore eth_type_trans's return value
>>   * pop rbx
>>   * leave
>>   * add rsp, 8                      // skip eth_type_trans's frame
>>   * ret                             // return to its caller
>>
>>> this you need a shadow stack to save the real return as well as the
>>> parameters of the function. This is something that I have patches that do
>>> similar things with function graph.
>>>
>>> If you want this feature, lets work together and make this work for both
>>> BPF and ftrace.
>>
>> it's been some time I saw a graph tracer, is there a way to make it
>> access input arguments and make it available through ftrace_ops
>> interface?
> 
> I have patches that could easily make it do so. And should probably get
> them out again. The function graph tracer has a shadow stack, and my
> patches allow you to store data on it for use with the exiting of the
> program.
> 
> My last release of that code is here:
> 
>    https://lore.kernel.org/lkml/20190525031633.811342628@goodmis.org/
> 
> It allows you to "reserve data" to pass from the caller to the return, and
> that could hold the arguments. See patch 15 of that series.
> 
> 
> -- Steve
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-15 15:10       ` Steven Rostedt
  2021-04-15 17:39         ` Jiri Olsa
@ 2021-04-15 20:45         ` Andrii Nakryiko
  2021-04-15 21:00           ` Steven Rostedt
  1 sibling, 1 reply; 39+ messages in thread
From: Andrii Nakryiko @ 2021-04-15 20:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Olsa, Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Networking, bpf, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Daniel Xu,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

On Thu, Apr 15, 2021 at 8:10 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 14 Apr 2021 15:46:49 -0700
> Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> > On Wed, Apr 14, 2021 at 5:19 AM Jiri Olsa <jolsa@redhat.com> wrote:
> > >
> > > On Tue, Apr 13, 2021 at 06:04:05PM -0700, Andrii Nakryiko wrote:
> > > > On Tue, Apr 13, 2021 at 7:57 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > > > >
> > > > > hi,
> > > > > sending another attempt on speeding up load of multiple probes
> > > > > for bpftrace and possibly other tools (first post in [1]).
> > > > >
> > > > > This patchset adds support to attach bpf program directly to
> > > > > ftrace probe as suggested by Steven and it speeds up loading
> > > > > for bpftrace commands like:
> > > > >
> > > > >    # bpftrace -e 'kfunc:_raw_spin* { @[probe] = count(); }'
> > > > >    # bpftrace -e 'kfunc:ksys_* { @[probe] = count(); }'
> > > > >
> > > > > Using ftrace with single bpf program for attachment to multiple
> > > > > functions is much faster than current approach, where we need to
> > > > > load and attach program for each probe function.
> > > > >
> > > >
> > > > Ok, so first of all, I think it's super important to allow fast
> > > > attachment of a single BPF program to multiple kernel functions (I
> > > > call it mass-attachment). I've been recently prototyping a tool
> > > > (retsnoop, [0]) that allows attaching fentry/fexit to multiple
> > > > functions, and not having this feature turned into lots of extra code
> > > > and slow startup/teardown speeds. So we should definitely fix that.
> > > >
> > > > But I think the approach you've taken is not the best one, even though
> > > > it's a good starting point for discussion.
> > > >
> > > > First, you are saying function return attachment support is missing,
> > > > but is not needed so far. I actually think that without func return
> > > > the whole feature is extremely limiting. Not being able to measure
> > > > function latency  by tracking enter/exit events is crippling for tons
> > > > of useful applications. So I think this should go with both at the
> > > > same time.
> > > >
> > > > But guess what, we already have a good BPF infra (BPF trampoline and
> > > > fexit programs) that supports func exit tracing. Additionally, it
> > > > supports the ability to read input arguments *on function exit*, which
> > > > is something that kretprobe doesn't support and which is often a very
> > > > limiting restriction, necessitating complicated logic to trace
> > > > function entry just to store input arguments. It's a killer feature
> > > > and one that makes fexit so much more useful than kretprobe.
> > > >
> > > > The only problem is that currently we have a 1:1:1 relationship
> > > > between BPF trampoline, BPF program, and kernel function. I think we
> > > > should allow to have a single BPF program, using a single BPF
> > > > trampoline, but being able to attach to multiple kernel functions
> > > > (1:1:N). This will allow to validate BPF program once, allocate only
> > > > one dedicated BPF trampoline, and then (with appropriate attach API)
> > > > attach them in a batch mode.
> > >
> > > heya,
> > > I had some initial prototypes trying this way, but always ended up
> > > in complicated code, that's why I turned to ftrace_ops.
> > >
> > > let's see if it'll make any sense to you ;-)
> > >
> > > 1) so let's say we have extra trampoline for the program (which
> > > also seems a bit of waste since there will be just single record
> >
> > BPF trampoline does more than just calls BPF program. At the very
> > least it saves input arguments for fexit program to be able to access
> > it. But given it's one BPF trampoline attached to thousands of
> > functions, I don't see any problem there.
>
> Note, there's a whole infrastructure that does similar things in ftrace.
> I wrote the direct call to jump to individual trampolines, because ftrace
> was too generic. The only way at the time to get to the arguments was via
> the ftrace_regs_caller, which did a full save of regs, because this was
> what kprobes needed, and was too expensive for BPF.
>
> I now regret writing the direct callers, and instead should have just done
> what I did afterward, which was to make ftrace default to a light weight
> trampoline that only saves enough for getting access to the arguments of
> the function. And have BPF use that. But I was under the impression that
> BPF needed fast access to a single function, and it would not become a
> generic trampoline for multiple functions, because that was the argument
> used to not enhance ftrace.
>
> Today, ftrace by default (on x86) implements a generic way to get the
> arguments, and just the arguments, which is exactly what BPF would need for
> multiple functions. And yes, you even have access to the return code if you
> want to "hijack" it. And since it was originally for individual functions
> (and not a batch), I created the direct caller for BPF. But the direct
> caller will not be enhanced for multiple functions, as that's not its
> purpose. If you want a trampoline to be called back to multiple functions,
> then use the infrastructure that was designed for that. Which is what Jiri
> had proposed here.
>
> And because the direct caller can mess with the return code, it breaks
> function graph tracing. As a temporary work around, we just made function
> graph ignore any function that has a direct caller attached to it.
>
> If you want batch processing of BPF programs, you need to first fix the
> function graph tracing issue, and allow both BPF attached callers and
> function graph to work on the same functions.
>
> I don't know how the BPF code does it, but if you are tracing the exit
> of a function, I'm assuming that you hijack the return pointer and replace
> it with a call to a trampoline that has access to the arguments. To do

As Jiri replied, BPF trampoline doesn't do it the same way as
kretprobe does. Which gives the fexit BPF program another critical
advantage over kretprobe -- we know the traced function's entry IP in
both the entry and exit cases, which allows us to generically correlate them.

I've tried to figure out how to get that entry IP from kretprobe and
couldn't find any way. Do you know if it's possible at all or it's a
fundamental limitation of the way kretprobe is implemented (through
hijacking return address)?

> this you need a shadow stack to save the real return as well as the
> parameters of the function. This is something that I have patches that do
> similar things with function graph.
>
> If you want this feature, lets work together and make this work for both
> BPF and ftrace.

Absolutely, ultimately for users it doesn't matter what specific
mechanism is used under the covers. It just seemed like BPF trampoline
has all the useful tracing features (entry IP and input arguments in
fexit) already and is just mostly missing a quick batch attach API. If
we can get the same from ftrace, all the better.

>
> -- Steve

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-15 20:45         ` Andrii Nakryiko
@ 2021-04-15 21:00           ` Steven Rostedt
  2021-04-16 15:03             ` Masami Hiramatsu
  0 siblings, 1 reply; 39+ messages in thread
From: Steven Rostedt @ 2021-04-15 21:00 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiri Olsa, Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Networking, bpf, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Daniel Xu,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik,
	Masami Hiramatsu


[
  Added Masami, as I didn't realize he wasn't on Cc. He's the maintainer of
  kretprobes.

  Masami, you may want to use lore.kernel.org to read the history of this
  thread.
]

On Thu, 15 Apr 2021 13:45:06 -0700
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> > I don't know how the BPF code does it, but if you are tracing the exit
> > of a function, I'm assuming that you hijack the return pointer and replace
> > it with a call to a trampoline that has access to the arguments. To do  
> 
> As Jiri replied, BPF trampoline doesn't do it the same way as
> kretprobe does it. Which gives the fexit BPF program another critical
> advantage over kretprobe -- we know traced function's entry IP in both
> entry and exit cases, which allows us to generically correlate them.
> 
> I've tried to figure out how to get that entry IP from kretprobe and
> couldn't find any way. Do you know if it's possible at all or it's a
> fundamental limitation of the way kretprobe is implemented (through
> hijacking return address)?

The function graph tracer has the entry IP on exit, today. That's how we
can trace and show this:

 # cd /sys/kernel/tracing
 # echo 1 > options/funcgraph-tail
 # echo function_graph > current_tracer
 # cat trace
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 7)   1.358 us    |  rcu_idle_exit();
 7)   0.169 us    |  sched_idle_set_state();
 7)               |  cpuidle_reflect() {
 7)               |    menu_reflect() {
 7)   0.170 us    |      tick_nohz_idle_got_tick();
 7)   0.585 us    |    } /* menu_reflect */
 7)   1.115 us    |  } /* cpuidle_reflect */

That's how we can show the tail function that's called. I'm sure kretprobes
could do the same thing.

The patch series I shared with Jiri was work to allow kretprobes to be
built on top of the function graph tracer.

https://lore.kernel.org/lkml/20190525031633.811342628@goodmis.org/

The feature missing from that series, and why I didn't push it (as I had
run out of time to work on it), was that kretprobes wants the full regs
stack as well. And since kretprobes was the main client of this work,
I decided to work on it at another time. But like everything else, I got
distracted by other work, and didn't realize it had been almost 2 years
since I last looked at it :-p

Anyway, IIRC, Masami wasn't sure that the full regs were ever needed for the
return (who cares about the registers on return, except for the return
value?)

But this code could easily save the parameters as well.

> 
> > this you need a shadow stack to save the real return as well as the
> > parameters of the function. This is something that I have patches that do
> > similar things with function graph.
> >
> > If you want this feature, lets work together and make this work for both
> > BPF and ftrace.  
> 
> Absolutely, ultimately for users it doesn't matter what specific
> mechanism is used under the cover. It just seemed like BPF trampoline
> has all the useful tracing features (entry IP and input arguments in
> fexit) already and is just mostly missing a quick batch attach API. If
> we can get the same from ftrace, all the better.

Let me pull these patches out again and see what we can do. Since then,
I've added the code that lets the function tracer save parameters and the
stack, and function graph can use that as well.


-- Steve

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-15 18:21             ` Steven Rostedt
@ 2021-04-15 21:49               ` Jiri Olsa
  2021-04-15 23:30                 ` Steven Rostedt
  0 siblings, 1 reply; 39+ messages in thread
From: Jiri Olsa @ 2021-04-15 21:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrii Nakryiko, Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Networking, bpf, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Daniel Xu,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

On Thu, Apr 15, 2021 at 02:21:20PM -0400, Steven Rostedt wrote:
> On Thu, 15 Apr 2021 14:18:31 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > My last release of that code is here:
> > 
> >   https://lore.kernel.org/lkml/20190525031633.811342628@goodmis.org/
> > 
> > It allows you to "reserve data" to pass from the caller to the return, and
> > that could hold the arguments. See patch 15 of that series.
> 
> Note that implementation only lets you save up to 4 words on the stack, but
> that can be changed. Or you could have a separate shadow stack for saving
> arguments, and only pass the pointer to the location on the other stack
> where those arguments are.

right, I quickly checked on that and it looks exactly like
the thing we need

I'll try to rebase that on the current code and try to use
it with the bpf ftrace probe to see how it fits

any chance you could plan on reposting it? ;-)

thanks,
jirka


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-15 21:49               ` Jiri Olsa
@ 2021-04-15 23:30                 ` Steven Rostedt
  2021-04-19 20:51                   ` Jiri Olsa
  0 siblings, 1 reply; 39+ messages in thread
From: Steven Rostedt @ 2021-04-15 23:30 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Networking, bpf, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Daniel Xu,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

On Thu, 15 Apr 2021 23:49:43 +0200
Jiri Olsa <jolsa@redhat.com> wrote:


> right, I quickly checked on that and it looks exactly like
> the thing we need
> 
> I'll try to rebase that on the current code and try to use
> it with the bpf ftrace probe to see how it fits
> 
> any chance you could plan on reposting it? ;-)

I'm currently working on cleaning up code for the next merge window,
but I did go ahead and rebase it on top of my for-next branch. I didn't
even try to compile it, but at least it's rebased ;-)

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git

Branch: ftrace/fgraph-multi

-- Steve

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-15 21:00           ` Steven Rostedt
@ 2021-04-16 15:03             ` Masami Hiramatsu
  2021-04-16 16:48               ` Steven Rostedt
  2021-04-20  4:51               ` Andrii Nakryiko
  0 siblings, 2 replies; 39+ messages in thread
From: Masami Hiramatsu @ 2021-04-16 15:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrii Nakryiko, Jiri Olsa, Jiri Olsa, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Networking, bpf,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Daniel Xu, Jesper Brouer,
	Toke Høiland-Jørgensen, Viktor Malik, Masami Hiramatsu

Hi,

On Thu, 15 Apr 2021 17:00:07 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> 
> [
>   Added Masami, as I didn't realize he wasn't on Cc. He's the maintainer of
>   kretprobes.
> 
>   Masami, you may want to use lore.kernel.org to read the history of this
>   thread.
> ]
> 
> On Thu, 15 Apr 2021 13:45:06 -0700
> Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> > > I don't know how the BPF code does it, but if you are tracing the exit
> > > of a function, I'm assuming that you hijack the return pointer and replace
> > > it with a call to a trampoline that has access to the arguments. To do  
> > 
> > As Jiri replied, BPF trampoline doesn't do it the same way as
> > kretprobe does it. Which gives the fexit BPF program another critical
> > advantage over kretprobe -- we know traced function's entry IP in both
> > entry and exit cases, which allows us to generically correlate them.
> > 
> > I've tried to figure out how to get that entry IP from kretprobe and
> > couldn't find any way. Do you know if it's possible at all or it's a
> > fundamental limitation of the way kretprobe is implemented (through
> > hijacking return address)?

Inside the kretprobe handler, you can get the entry IP from the kretprobe as below:

static int my_kretprobe_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	/* get_kretprobe() maps an instance back to its kretprobe */
	struct kretprobe *rp = get_kretprobe(ri);
	/* entry IP of the probed function */
	unsigned long entry = (unsigned long)rp->kp.addr;
	/* original return address of this invocation */
	unsigned long retaddr = (unsigned long)ri->ret_addr;
	...
}

It is ensured that rp != NULL in the handler.

> 
> The function graph tracer has the entry IP on exit, today. That's how we
> can trace and show this:
> 
>  # cd /sys/kernel/tracing
>  # echo 1 > options/funcgraph-tail
>  # echo function_graph > current_tracer
>  # cat trace
> # tracer: function_graph
> #
> # CPU  DURATION                  FUNCTION CALLS
> # |     |   |                     |   |   |   |
>  7)   1.358 us    |  rcu_idle_exit();
>  7)   0.169 us    |  sched_idle_set_state();
>  7)               |  cpuidle_reflect() {
>  7)               |    menu_reflect() {
>  7)   0.170 us    |      tick_nohz_idle_got_tick();
>  7)   0.585 us    |    } /* menu_reflect */
>  7)   1.115 us    |  } /* cpuidle_reflect */
> 
> That's how we can show the tail function that's called. I'm sure kretprobes
> could do the same thing.

Yes, I have to update the documentation to explain how to do that (and maybe
introduce 2 functions to wrap the entry/retaddr code)

> 
> The patch series I shared with Jiri, was work to allow kretprobes to be
> built on top of the function graph tracer.
> 
> https://lore.kernel.org/lkml/20190525031633.811342628@goodmis.org/
> 
> The feature missing from that series, and why I didn't push it (as I had
> ran out of time to work on it), was that kretprobes wants the full regs
> stack as well. And since kretprobes was the main client of this work, that
> I decided to work on this at another time. But like everything else, I got
> distracted by other work, and didn't realize it has been almost 2 years
> since looking at it :-p
> 
> Anyway, IIRC, Masami wasn't sure that the full regs was ever needed for the
> return (who cares about the registers on return, except for the return
> value?)

I think kretprobe and ftrace are for somewhat different usages. kretprobe can be
used for something like a debugger. In that case, accessing the full regs stack
will be preferable. (BTW, what does "not full regs" mean? Does that
save only partial registers?)


Thank you,

> But this code could easily save the parameters as well.
> 
> > 
> > > this you need a shadow stack to save the real return as well as the
> > > parameters of the function. This is something that I have patches that do
> > > similar things with function graph.
> > >
> > > If you want this feature, let's work together and make this work for both
> > > BPF and ftrace.  
> > 
> > Absolutely, ultimately for users it doesn't matter what specific
> > mechanism is used under the cover. It just seemed like BPF trampoline
> > has all the useful tracing features (entry IP and input arguments in
> > fexit) already and is just mostly missing a quick batch attach API. If
> > we can get the same from ftrace, all the better.
> 
> Let me pull these patches out again, and see what we can do. Since then,
> I've added the code that lets function tracer save parameters and the
> stack, and function graph can use that as well.
> 
> 
> -- Steve


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-16 15:03             ` Masami Hiramatsu
@ 2021-04-16 16:48               ` Steven Rostedt
  2021-04-19 14:29                 ` Masami Hiramatsu
  2021-04-20 12:51                 ` Jiri Olsa
  2021-04-20  4:51               ` Andrii Nakryiko
  1 sibling, 2 replies; 39+ messages in thread
From: Steven Rostedt @ 2021-04-16 16:48 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Andrii Nakryiko, Jiri Olsa, Jiri Olsa, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Networking, bpf,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Daniel Xu, Jesper Brouer,
	Toke Høiland-Jørgensen, Viktor Malik

On Sat, 17 Apr 2021 00:03:04 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> > Anyway, IIRC, Masami wasn't sure that the full regs was ever needed for the
> > return (who cares about the registers on return, except for the return
> > value?)  
> 
> I think kretprobe and ftrace are for somewhat different usages. kretprobe can be
> used for something like a debugger. In that case, accessing the full regs stack
> will be preferable. (BTW, what does "not full regs" mean? Does that
> save only partial registers?)

When the REGS flag is not set in the ftrace_ops (where kprobes uses the
REGS flag), the regs parameter is not a full set of regs, but holds just
enough to get access to the parameters. This just happened to be what was
saved in the mcount/fentry trampoline anyway, because when tracing the start of
a function, you had to save the arguments before calling the trace code,
otherwise you would corrupt the parameters of the function being traced.

I just tweaked it so that by default, the ftrace callbacks now have access
to the saved regs (called ftrace_regs, so as not to let a callback get confused
and think it has full regs when it does not).
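
As a rough sketch of the difference (against the 5.11+ ftrace_regs API; the
callback body here is only illustrative):

static void my_func_trace(unsigned long ip, unsigned long parent_ip,
			  struct ftrace_ops *op, struct ftrace_regs *fregs)
{
	/*
	 * ftrace_get_regs() hands back full pt_regs only when the ops
	 * was registered with FTRACE_OPS_FL_SAVE_REGS; otherwise it
	 * returns NULL and only the argument registers in fregs are
	 * valid.
	 */
	struct pt_regs *regs = ftrace_get_regs(fregs);

	if (regs)
		pr_info("traced %pS with full regs\n", (void *)ip);
}

static struct ftrace_ops my_ops = {
	.func = my_func_trace,
	/* no FTRACE_OPS_FL_SAVE_REGS: the callback only gets ftrace_regs */
};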

Now for the exit of a function, what does having the full pt_regs give you?
Besides the information to get the return value, the rest of the regs are
pretty much meaningless. Is there any example where someone wants access to
the regs at the end of a function besides getting the return value?

-- Steve

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-16 16:48               ` Steven Rostedt
@ 2021-04-19 14:29                 ` Masami Hiramatsu
  2021-04-20 12:51                 ` Jiri Olsa
  1 sibling, 0 replies; 39+ messages in thread
From: Masami Hiramatsu @ 2021-04-19 14:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrii Nakryiko, Jiri Olsa, Jiri Olsa, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Networking, bpf,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Daniel Xu, Jesper Brouer,
	Toke Høiland-Jørgensen, Viktor Malik

On Fri, 16 Apr 2021 12:48:34 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Sat, 17 Apr 2021 00:03:04 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > > Anyway, IIRC, Masami wasn't sure that the full regs was ever needed for the
> > > return (who cares about the registers on return, except for the return
> > > value?)  
> > 
> > I think kretprobe and ftrace are for somewhat different usages. kretprobe can be
> > used for something like a debugger. In that case, accessing the full regs stack
> > will be preferable. (BTW, what does "not full regs" mean? Does that
> > save only partial registers?)
> 
> When the REGS flag is not set in the ftrace_ops (where kprobes uses the
> REGS flag), the regs parameter is not a full set of regs, but holds just
> enough to get access to the parameters. This just happened to be what was
> saved in the mcount/fentry trampoline anyway, because when tracing the start of
> a function, you had to save the arguments before calling the trace code,
> otherwise you would corrupt the parameters of the function being traced.

Yes, if we trace the function as a black box, that is correct. It should trace
the parameters at the entry and the result at the exit.

> I just tweaked it so that by default, the ftrace callbacks now have access
> to the saved regs (called ftrace_regs, so as not to let a callback get confused
> and think it has full regs when it does not).

Ah, I got it. kretprobe allows the user to set a custom region in its instance
so that the user handler can store the parameters at the entry point. Sometimes
such "saved regs" are not enough, because if a parameter is passed via a
pointer, the actual data can be changed.
Anyway, for the kprobe event, that can be integrated seamlessly. But for the
low-level kretprobe, I think if we integrate it, we had better update the
kretprobe handler interface.

> Now for the exit of a function, what does having the full pt_regs give you?

It may allow a user to debug a kernel function if the user suspects that some
behavior does/doesn't come from a compiler issue. (I would recommend using
kprobe for that purpose, but kretprobe is already there ...)

> Besides the information to get the return value, the rest of the regs are
> pretty much meaningless. Is there any example where someone wants access to
> the regs at the end of a function besides getting the return value?

Yes, as far as we can be confident that the code is not corrupted. But, for example,
if a user would like to make sure a callee-saved register (or some flags)
is correctly saved and restored, raw register access will be helpful.

(Yeah, but I know it is a very rare case.)

Thank you,


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-15 23:30                 ` Steven Rostedt
@ 2021-04-19 20:51                   ` Jiri Olsa
  2021-04-19 22:00                     ` Steven Rostedt
  0 siblings, 1 reply; 39+ messages in thread
From: Jiri Olsa @ 2021-04-19 20:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrii Nakryiko, Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Networking, bpf, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Daniel Xu,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

On Thu, Apr 15, 2021 at 07:30:32PM -0400, Steven Rostedt wrote:
> On Thu, 15 Apr 2021 23:49:43 +0200
> Jiri Olsa <jolsa@redhat.com> wrote:
> 
> 
> > right, I quickly checked on that and it looks exactly like
> > the thing we need
> > 
> > I'll try to rebase that on the current code and try to use
> > it with the bpf ftrace probe to see how it fits
> > 
> > any chance you could plan on reposting it? ;-)
> 
> I'm currently working on cleaning up code for the next merge window,
> but I did go ahead and rebase it on top of my for-next branch. I didn't
> even try to compile it, but at least it's rebased ;-)
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
> 
> Branch: ftrace/fgraph-multi

works nicely (with small compilation fixes)

I added support to call a bpf program on function exit using fgraph_ops
and it seems to work

now, it looks like the fgraph_ops entry callback does not have access
to registers.. once we have that, we could store arguments for the exit
callback and have it all in place.. could this be added? ;-)

thanks,
jirka


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-19 20:51                   ` Jiri Olsa
@ 2021-04-19 22:00                     ` Steven Rostedt
  0 siblings, 0 replies; 39+ messages in thread
From: Steven Rostedt @ 2021-04-19 22:00 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Networking, bpf, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Daniel Xu,
	Jesper Brouer, Toke Høiland-Jørgensen, Viktor Malik

On Mon, 19 Apr 2021 22:51:46 +0200
Jiri Olsa <jolsa@redhat.com> wrote:

> now, it looks like the fgraph_ops entry callback does not have access
> to registers.. once we have that, we could store arguments for the exit
> callback and have it all in place.. could this be added? ;-)

Sure. The only problem is that we need to do this carefully to not break
all the architectures that support function graph tracing.

For function tracing, I usually add "CONFIG_HAVE_..." configs that state if
the architecture supports some ftrace feature, and if it does it can use a
different callback prototype. But it does get messy.

Ideally, I would love to go and update all architectures to support all
features, but that requires understanding the assembly of all those
architectures :-p

To test that I don't break other archs, I usually just support x86_64 and
leave x86_32 behind. I mean, who cares about x86_32 anymore ;-)

-- Steve

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-16 15:03             ` Masami Hiramatsu
  2021-04-16 16:48               ` Steven Rostedt
@ 2021-04-20  4:51               ` Andrii Nakryiko
  1 sibling, 0 replies; 39+ messages in thread
From: Andrii Nakryiko @ 2021-04-20  4:51 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Jiri Olsa, Jiri Olsa, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Networking, bpf,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Daniel Xu, Jesper Brouer,
	Toke Høiland-Jørgensen, Viktor Malik

On Fri, Apr 16, 2021 at 8:03 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> Hi,
>
> On Thu, 15 Apr 2021 17:00:07 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> >
> > [
> >   Added Masami, as I didn't realize he wasn't on Cc. He's the maintainer of
> >   kretprobes.
> >
> >   Masami, you may want to use lore.kernel.org to read the history of this
> >   thread.
> > ]
> >
> > On Thu, 15 Apr 2021 13:45:06 -0700
> > Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > > > I don't know how the BPF code does it, but if you are tracing the exit
> > > > of a function, I'm assuming that you hijack the return pointer and replace
> > > > it with a call to a trampoline that has access to the arguments. To do
> > >
> > > As Jiri replied, BPF trampoline doesn't do it the same way as
> > > kretprobe does it. Which gives the fexit BPF program another critical
> > > advantage over kretprobe -- we know traced function's entry IP in both
> > > entry and exit cases, which allows us to generically correlate them.
> > >
> > > I've tried to figure out how to get that entry IP from kretprobe and
> > > couldn't find any way. Do you know if it's possible at all or it's a
> > > fundamental limitation of the way kretprobe is implemented (through
> > > hijacking return address)?
>
> Inside the kretprobe handler, you can get the entry IP from kretprobe as below;
>
> static int my_kretprobe_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
> {
>         struct kretprobe *rp = get_kretprobe(ri);
>         unsigned long entry = (unsigned long)rp->kp.addr;
>         unsigned long retaddr = (unsigned long)ri->ret_addr;
>         ...
> }

Great. In kprobe_perf_func(), which seems to be the callback that
triggers kretprobe BPF programs, we can get that struct kretprobe
through tk->rp. So we'll just need to figure out how to pass that into
the BPF program in a sane way. Thanks!
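
Roughly like this inside the exit path (a sketch based on my reading of the
current trace_kprobe.c; treat the details as approximate):

static void kretprobe_perf_func(struct trace_kprobe *tk,
				struct kretprobe_instance *ri,
				struct pt_regs *regs)
{
	/* entry IP of the traced function, still known at exit time */
	unsigned long entry_ip = (unsigned long)tk->rp.kp.addr;

	/* ... pass entry_ip into the BPF run context here (not implemented) */
}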

>
> It is ensured that rp != NULL in the handler.
>
> >
> > The function graph tracer has the entry IP on exit, today. That's how we
> > can trace and show this:
> >
> >  # cd /sys/kernel/tracing
> >  # echo 1 > options/funcgraph-tail
> >  # echo function_graph > current_tracer
> >  # cat trace
> > # tracer: function_graph
> > #
> > # CPU  DURATION                  FUNCTION CALLS
> > # |     |   |                     |   |   |   |
> >  7)   1.358 us    |  rcu_idle_exit();
> >  7)   0.169 us    |  sched_idle_set_state();
> >  7)               |  cpuidle_reflect() {
> >  7)               |    menu_reflect() {
> >  7)   0.170 us    |      tick_nohz_idle_got_tick();
> >  7)   0.585 us    |    } /* menu_reflect */
> >  7)   1.115 us    |  } /* cpuidle_reflect */
> >
> > That's how we can show the tail function that's called. I'm sure kretprobes
> > could do the same thing.
>
> Yes, I have to update the documentation to explain how to do that (and maybe
> introduce 2 functions to wrap the entry/retaddr code)
>
> >
> > The patch series I shared with Jiri, was work to allow kretprobes to be
> > built on top of the function graph tracer.
> >
> > https://lore.kernel.org/lkml/20190525031633.811342628@goodmis.org/
> >
> > The feature missing from that series, and why I didn't push it (as I had
> > > ran out of time to work on it), was that kretprobes wants the full regs
> > stack as well. And since kretprobes was the main client of this work, that
> > I decided to work on this at another time. But like everything else, I got
> > distracted by other work, and didn't realize it has been almost 2 years
> > since looking at it :-p
> >
> > Anyway, IIRC, Masami wasn't sure that the full regs was ever needed for the
> > return (who cares about the registers on return, except for the return
> > value?)
>
> I think kretprobe and ftrace are for somewhat different usages. kretprobe can be
> used for something like a debugger. In that case, accessing the full regs stack
> will be preferable. (BTW, what does "not full regs" mean? Does that
> save only partial registers?)
>
>
> Thank you,
>
> > But this code could easily save the parameters as well.
> >
> > >
> > > > this you need a shadow stack to save the real return as well as the
> > > > parameters of the function. This is something that I have patches that do
> > > > similar things with function graph.
> > > >
> > > > If you want this feature, let's work together and make this work for both
> > > > BPF and ftrace.
> > >
> > > Absolutely, ultimately for users it doesn't matter what specific
> > > mechanism is used under the cover. It just seemed like BPF trampoline
> > > has all the useful tracing features (entry IP and input arguments in
> > > fexit) already and is just mostly missing a quick batch attach API. If
> > > we can get the same from ftrace, all the better.
> >
> > Let me pull these patches out again, and see what we can do. Since then,
> > I've added the code that lets function tracer save parameters and the
> > stack, and function graph can use that as well.
> >
> >
> > -- Steve
>
>
> --
> Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-16 16:48               ` Steven Rostedt
  2021-04-19 14:29                 ` Masami Hiramatsu
@ 2021-04-20 12:51                 ` Jiri Olsa
  2021-04-20 15:33                   ` Alexei Starovoitov
  1 sibling, 1 reply; 39+ messages in thread
From: Jiri Olsa @ 2021-04-20 12:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Andrii Nakryiko, Jiri Olsa, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Networking, bpf,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Daniel Xu, Jesper Brouer,
	Toke Høiland-Jørgensen, Viktor Malik

On Fri, Apr 16, 2021 at 12:48:34PM -0400, Steven Rostedt wrote:
> On Sat, 17 Apr 2021 00:03:04 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > > Anyway, IIRC, Masami wasn't sure that the full regs was ever needed for the
> > > return (who cares about the registers on return, except for the return
> > > value?)  
> > 
> > I think kretprobe and ftrace are for somewhat different usages. kretprobe can be
> > used for something like a debugger. In that case, accessing the full regs stack
> > will be preferable. (BTW, what does "not full regs" mean? Does that
> > save only partial registers?)
> 
> When the REGS flag is not set in the ftrace_ops (where kprobes uses the
> REGS flag), the regs parameter is not a full set of regs, but holds just
> enough to get access to the parameters. This just happened to be what was
> saved in the mcount/fentry trampoline anyway, because when tracing the start of
> a function, you had to save the arguments before calling the trace code,
> otherwise you would corrupt the parameters of the function being traced.
> 
> I just tweaked it so that by default, the ftrace callbacks now have access
> to the saved regs (called ftrace_regs, so as not to let a callback get confused
> and think it has full regs when it does not).
> 
> Now for the exit of a function, what does having the full pt_regs give you?
> Besides the information to get the return value, the rest of the regs are
> pretty much meaningless. Is there any example where someone wants access to
> the regs at the end of a function besides getting the return value?

for an ebpf program attached to the function exit we need the function's
arguments.. so the original registers from the time when the function was
entered; we don't need the register state at the time the function is returning

as we discussed in another email, we could save the input registers in the
fgraph_ops entry handler and load them in the exit handler before calling the
ebpf program
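
a very rough sketch of what I mean, on top of fgraph_ops from your series
(the shadow-stack reserve/retrieve helpers are hypothetical here, exact
names and signatures will differ):

static int bpf_fgraph_entry(struct ftrace_graph_ent *trace)
{
	/* hypothetical: reserve per-call space on the shadow stack */
	struct ftrace_regs *save = fgraph_reserve_data(sizeof(*save));

	if (save)
		save_entry_regs(save);	/* hypothetical: stash the arg regs */
	return 1;			/* trace this function */
}

static void bpf_fgraph_return(struct ftrace_graph_ret *trace)
{
	/* hypothetical: retrieve what the entry handler stored */
	struct ftrace_regs *save = fgraph_retrieve_data();

	if (save)
		run_exit_bpf_prog(trace->func, save);	/* hypothetical */
}

static struct fgraph_ops bpf_gops = {
	.entryfunc	= bpf_fgraph_entry,
	.retfunc	= bpf_fgraph_return,
};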

jirka


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-20 12:51                 ` Jiri Olsa
@ 2021-04-20 15:33                   ` Alexei Starovoitov
  2021-04-20 16:33                     ` Steven Rostedt
  2021-04-20 16:52                     ` Jiri Olsa
  0 siblings, 2 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2021-04-20 15:33 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, Masami Hiramatsu, Andrii Nakryiko, Jiri Olsa,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Networking,
	bpf, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Daniel Xu, Jesper Brouer,
	Toke Høiland-Jørgensen, Viktor Malik

On Tue, Apr 20, 2021 at 5:51 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Fri, Apr 16, 2021 at 12:48:34PM -0400, Steven Rostedt wrote:
> > On Sat, 17 Apr 2021 00:03:04 +0900
> > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > > > Anyway, IIRC, Masami wasn't sure that the full regs was ever needed for the
> > > > return (who cares about the registers on return, except for the return
> > > > value?)
> > >
> > > I think kretprobe and ftrace are for somewhat different usages. kretprobe can be
> > > used for something like a debugger. In that case, accessing the full regs stack
> > > will be preferable. (BTW, what does "not full regs" mean? Does that
> > > save only partial registers?)
> >
> > When the REGS flag is not set in the ftrace_ops (where kprobes uses the
> > REGS flag), the regs parameter is not a full set of regs, but holds just
> > enough to get access to the parameters. This just happened to be what was
> > saved in the mcount/fentry trampoline anyway, because when tracing the start of
> > a function, you had to save the arguments before calling the trace code,
> > otherwise you would corrupt the parameters of the function being traced.
> >
> > I just tweaked it so that by default, the ftrace callbacks now have access
> > to the saved regs (called ftrace_regs, so as not to let a callback get confused
> > and think it has full regs when it does not).
> >
> > Now for the exit of a function, what does having the full pt_regs give you?
> > Besides the information to get the return value, the rest of the regs are
> > pretty much meaningless. Is there any example where someone wants access to
> > the regs at the end of a function besides getting the return value?
>
> for an ebpf program attached to the function exit we need the function's
> arguments.. so the original registers from the time when the function was
> entered; we don't need the register state at the time the function is returning
>
> as we discussed in another email, we could save the input registers in the
> fgraph_ops entry handler and load them in the exit handler before calling the
> ebpf program

I don't see how you can do it without BTF.
The mass-attach feature should prepare generic 6 or so arguments
from all functions it attached to.
On x86-64 it's trivial because 6 regs are the same.
On arm64 it is more challenging since the return value reg overlaps with
the first argument, so bpf trampoline (when it's ready for arm64) will look
a bit different than bpf trampoline on x86-64 to preserve arg0, arg1,
..arg6, ret
64-bit values that bpf prog expects to see.
On x86-32 it's even trickier, since the same 6 args need to be copied
from a combination of regs and stack.
This is not some hypothetical case. We already use BTF in x86-32 JIT
and btf_func_model was introduced specifically to handle such cases.
So I really don't see how ftrace can do that just yet. It has to understand BTF
of all of the funcs it attaches to, otherwise it's just saving all regs.
That approach was a pain to deal with.
Just look at bpf code samples with ugly per architecture macros to access regs.
BPF trampoline solved it and I don't think going back to per-arch macros
is an option at this point.
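
For reference, btf_func_model is tiny; in include/linux/bpf.h it is roughly:

struct btf_func_model {
	u8 ret_size;			/* size of the return value */
	u8 nr_args;			/* number of arguments */
	u8 arg_size[MAX_BPF_FUNC_ARGS];	/* size of each argument */
};

btf_distill_func_proto() fills it from the BTF func proto, and the JITs use
it to decide how the args are actually passed on the given architecture.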

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-20 15:33                   ` Alexei Starovoitov
@ 2021-04-20 16:33                     ` Steven Rostedt
  2021-04-20 16:52                     ` Jiri Olsa
  1 sibling, 0 replies; 39+ messages in thread
From: Steven Rostedt @ 2021-04-20 16:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiri Olsa, Masami Hiramatsu, Andrii Nakryiko, Jiri Olsa,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Networking,
	bpf, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Daniel Xu, Jesper Brouer,
	Toke Høiland-Jørgensen, Viktor Malik

On Tue, 20 Apr 2021 08:33:43 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:


> I don't see how you can do it without BTF.

I agree. 

> The mass-attach feature should prepare generic 6 or so arguments
> from all functions it attached to.
> On x86-64 it's trivial because 6 regs are the same.
> On arm64 it is more challenging since the return value reg overlaps with
> the first argument, so bpf trampoline (when it's ready for arm64) will look
> a bit different than bpf trampoline on x86-64 to preserve arg0, arg1,
> ..arg6, ret
> 64-bit values that bpf prog expects to see.
> On x86-32 it's even trickier, since the same 6 args need to be copied
> from a combination of regs and stack.
> This is not some hypothetical case. We already use BTF in x86-32 JIT
> and btf_func_model was introduced specifically to handle such cases.
> So I really don't see how ftrace can do that just yet. It has to understand BTF

ftrace doesn't need to understand BTF, but the callback does. Ftrace will
give the callback all the information it needs to get the arguments from
the regs that hold the arguments and a pointer to the stack.

It's a limited set of regs that produce the arguments. ftrace only needs to
supply it. How those regs are converted to arguments requires more
understanding of those arguments, which BTF can give you.


> of all of the funcs it attaches to, otherwise it's just saving all regs.
> That approach was a pain to deal with.
> Just look at bpf code samples with ugly per architecture macros to access regs.
> BPF trampoline solved it and I don't think going back to per-arch macros
> is an option at this point.

How does it solve it besides a one-to-one mapping of trampoline to function?
Once you attach more than one function to the trampoline, it needs BTF to
translate it, and you need to save enough regs to have access to the args
that are needed.

For a direct call, which attaches to only one function, you could take
shortcuts. If the function you hook to only has one argument, you may be able
to safely call the callback code and only save that one argument. But as
soon as you want to attach more than one function to the same
trampoline, you will need to save the arg regs of the function with the most
arguments.

What ftrace can give you is what I have called "ftrace_regs", which is
really just pt_regs with only the arguments saved. This is much less than
the ftrace_regs_caller that is used by kprobes, because kprobes expects to
be called by an int3. ftrace_regs_caller() is much faster than an int3, but
still needs to save all regs that an interrupt would save (including
flags!), which makes it slower than a normal ftrace_caller() (and is why I
created the two). But ftrace_caller() saves only the regs needed to
create arguments (as ftrace must handle all functions and all their args,
so that the callback does not corrupt them).

For x86_64, this is rdi, rsi, rdx, rcx, r8, r9. That's all that is saved.
And this is passed to the callback as well as the stack pointer. With this
information, you could get any arguments of any function on x86_64. And it
is trivial to do the same for other architectures.

For every function attached to ftrace, you would have a BPF program created
from the BTF of the function prototype to parse the arguments. The
ftrace_regs holds only the information needed to retrieve those arguments.
Have a hash that maps the traced function to its bpf program and then
simply execute that, feeding it the ftrace_regs. Then the bpf program could
parse out the arguments.

That is, you could have BPF hook to the ftrace callback (and it lets you
hook to a subset of functions), have a hash table that maps the function to
the BTF prototype of that function. Then you can have a generic BPF program
look up the BTF prototype of the function it is passed (via the hash), and
then retrieve the arguments.
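
Something along these lines (sketch only; the hash table and the
prog-running helpers are hypothetical names, not existing kernel API):

static void bpf_many_callback(unsigned long ip, unsigned long parent_ip,
			      struct ftrace_ops *op, struct ftrace_regs *fregs)
{
	/* hypothetical: look up the BTF prototype + prog for this function */
	struct traced_func *tf = lookup_traced_func(ip);

	if (tf)
		/* the prog uses tf->model to pull args out of fregs + stack */
		run_bpf_prog(tf->prog, tf->model, fregs);
}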

The ftrace_regs could also be stored on the shadow stack, to be parsed on
the function exit too (as Jiri wants to do).

Yes, the BPF trampoline is fine for a 1 to 1 mapping of a function to its
bpf program, but once you want to attach more than 1 function to a
trampoline, you'll need a way to parse its arguments for each function that
the trampoline is attached to. The difference is, you'll need a lookup table
to find the BTF of the function that is called.

I think this is doable.

-- Steve

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-20 15:33                   ` Alexei Starovoitov
  2021-04-20 16:33                     ` Steven Rostedt
@ 2021-04-20 16:52                     ` Jiri Olsa
  2021-04-20 23:38                       ` Alexei Starovoitov
  1 sibling, 1 reply; 39+ messages in thread
From: Jiri Olsa @ 2021-04-20 16:52 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Steven Rostedt, Masami Hiramatsu, Andrii Nakryiko, Jiri Olsa,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Networking,
	bpf, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Daniel Xu, Jesper Brouer,
	Toke Høiland-Jørgensen, Viktor Malik

On Tue, Apr 20, 2021 at 08:33:43AM -0700, Alexei Starovoitov wrote:
> On Tue, Apr 20, 2021 at 5:51 AM Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Fri, Apr 16, 2021 at 12:48:34PM -0400, Steven Rostedt wrote:
> > > On Sat, 17 Apr 2021 00:03:04 +0900
> > > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > >
> > > > > Anyway, IIRC, Masami wasn't sure that the full regs was ever needed for the
> > > > > return (who cares about the registers on return, except for the return
> > > > > value?)
> > > >
> > > > I think kretprobe and ftrace are for somewhat different usages. kretprobe can be
> > > > used for something like a debugger. In that case, accessing the full regs stack
> > > > will be preferable. (BTW, what does "not full regs" mean? Does that
> > > > save only partial registers?)
> > >
> > > When the REGS flag is not set in the ftrace_ops (where kprobes uses the
> > > REGS flag), the regs parameter is not a full set of regs, but holds just
> > > enough to get access to the parameters. This just happened to be what was
> > > saved in the mcount/fentry trampoline anyway, because when tracing the start of
> > > a function, you had to save the arguments before calling the trace code,
> > > otherwise you would corrupt the parameters of the function being traced.
> > >
> > > I just tweaked it so that by default, the ftrace callbacks now have access
> > > to the saved regs (called ftrace_regs, so as not to let a callback get confused
> > > and think it has full regs when it does not).
> > >
> > > Now for the exit of a function, what does having the full pt_regs give you?
> > > Besides the information to get the return value, the rest of the regs are
> > > pretty much meaningless. Is there any example where someone wants access to
> > > the regs at the end of a function besides getting the return value?
> >
> > for an ebpf program attached to the function exit we need the function's
> > arguments.. so the original registers from the time when the function was
> > entered; we don't need the register state at the time the function is returning
> >
> > as we discussed in another email, we could save the input registers in the
> > fgraph_ops entry handler and load them in the exit handler before calling the
> > ebpf program
> 
> I don't see how you can do it without BTF.
> The mass-attach feature should prepare generic 6 or so arguments
> from all functions it attached to.
> On x86-64 it's trivial because 6 regs are the same.
> On arm64 it is more challenging since the return value reg overlaps with
> the first argument, so bpf trampoline (when it's ready for arm64) will look
> a bit different than bpf trampoline on x86-64 to preserve arg0, arg1,
> ..arg6, ret
> 64-bit values that bpf prog expects to see.
> On x86-32 it's even trickier, since the same 6 args need to be copied
> from a combination of regs and stack.
> This is not some hypothetical case. We already use BTF in x86-32 JIT
> and btf_func_model was introduced specifically to handle such cases.
> So I really don't see how ftrace can do that just yet. It has to understand BTF
> of all of the funcs it attaches to, otherwise it's just saving all regs.
> That approach was a pain to deal with.

ok, my idea was to get regs from the ftrace and have arch specific code
to prepare 6 (or less) args for ebpf program.. that part would be
already in bpf code

so you'd like to see this functionality directly in ftrace, so we don't
save unneeded regs, is that right?

jirka

> Just look at bpf code samples with ugly per architecture macros to access regs.
> BPF trampoline solved it and I don't think going back to per-arch macros
> is an option at this point.
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-20 16:52                     ` Jiri Olsa
@ 2021-04-20 23:38                       ` Alexei Starovoitov
  2021-04-21 13:40                         ` Jiri Olsa
  0 siblings, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2021-04-20 23:38 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Steven Rostedt, Masami Hiramatsu, Andrii Nakryiko, Jiri Olsa,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Networking,
	bpf, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Daniel Xu, Jesper Brouer,
	Toke Høiland-Jørgensen, Viktor Malik

On Tue, Apr 20, 2021 at 9:52 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Apr 20, 2021 at 08:33:43AM -0700, Alexei Starovoitov wrote:
> > On Tue, Apr 20, 2021 at 5:51 AM Jiri Olsa <jolsa@redhat.com> wrote:
> > >
> > > On Fri, Apr 16, 2021 at 12:48:34PM -0400, Steven Rostedt wrote:
> > > > On Sat, 17 Apr 2021 00:03:04 +0900
> > > > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > > >
> > > > > > Anyway, IIRC, Masami wasn't sure that the full regs was ever needed for the
> > > > > > return (who cares about the registers on return, except for the return
> > > > > > value?)
> > > > >
> > > > > I think kretprobe and ftrace are for somewhat different usages. kretprobe can be
> > > > > used for something like a debugger. In that case, accessing the full regs stack
> > > > > will be preferable. (BTW, what does "not full regs" mean? Does that
> > > > > save only partial registers?)
> > > >
> > > > When the REGS flag is not set in the ftrace_ops (where kprobes uses the
> > > > REGS flag), the regs parameter is not a full set of regs, but holds just
> > > > enough to get access to the parameters. This just happened to be what was
> > > > saved in the mcount/fentry trampoline anyway, because when tracing the start of
> > > > a function, you had to save the arguments before calling the trace code,
> > > > otherwise you would corrupt the parameters of the function being traced.
> > > >
> > > > I just tweaked it so that by default, the ftrace callbacks now have access
> > > > to the saved regs (called ftrace_regs, so as not to let a callback get confused
> > > > and think it has full regs when it does not).
> > > >
> > > > Now for the exit of a function, what does having the full pt_regs give you?
> > > > Besides the information to get the return value, the rest of the regs are
> > > > pretty much meaningless. Is there any example where someone wants access to
> > > > the regs at the end of a function besides getting the return value?
> > >
> > > for an ebpf program attached to the function exit we need the function's
> > > arguments.. so the original registers from the time when the function was
> > > entered; we don't need the register state at the time the function is returning
> > >
> > > as we discussed in another email, we could save the input registers in the
> > > fgraph_ops entry handler and load them in the exit handler before calling the
> > > ebpf program
> >
> > I don't see how you can do it without BTF.
> > The mass-attach feature should prepare generic 6 or so arguments
> > from all functions it attached to.
> > On x86-64 it's trivial because 6 regs are the same.
> > On arm64 it is more challenging since the return value reg overlaps with
> > the first argument, so bpf trampoline (when it's ready for arm64) will look
> > a bit different than bpf trampoline on x86-64 to preserve arg0, arg1,
> > ..arg6, ret
> > 64-bit values that bpf prog expects to see.
> > On x86-32 it's even trickier, since the same 6 args need to be copied
> > from a combination of regs and stack.
> > This is not some hypothetical case. We already use BTF in x86-32 JIT
> > and btf_func_model was introduced specifically to handle such cases.
> > So I really don't see how ftrace can do that just yet. It has to understand BTF
> > of all of the funcs it attaches to, otherwise it's just saving all regs.
> > That approach was a pain to deal with.
>
> ok, my idea was to get regs from the ftrace and have arch specific code
> to prepare 6 (or less) args for ebpf program.. that part would be
> already in bpf code
>
> so you'd like to see this functionality directly in ftrace, so we don't
> save unneeded regs, is that right?

What do you mean by "already in bpf code"?

The main question is an api across layers.
If ftrace doesn't use BTF it has to prepare all regs that could be used.
Meaning on x86-64 that has to be 6 regs for args, 1 reg for return and
stack pointer.
That would be enough to discover input args and return value in fexit.
On arm64 that has to be similar, but while x86-64 can do with single pt_regs
where %rax is updated on fexit, arm64 cannot do so, since the same register
is used as arg1 and as a return value.
The most generic api between ftrace and bpf layers would be two sets of
pt_regs. One on entry and one on exit, but that's going to be very expensive.
On x86-32 it would have to be 3 regs plus stack pointer and another 2 regs
to cover all input args and return value.
So there will be plenty of per-arch differences.

Jiri, if you're thinking of a bpf helper like:
u64 bpf_read_argN(pt_regs, ip, arg_num)
that will do a lookup of btf_id from ip, then it will parse the btf_id and
function proto,
then it will translate that to btf_func_model and finally will extract the right
argument value from a combination of stack and regs?
That's doable, but it's a lot of run-time overhead.
It would be usable by bpf progs that don't care much about run-time perf
and don't care that they're not usable 24/7 on production systems.
Such tools exist and they're useful,
but I'd like this mass-attach facility to be usable everywhere
including the production and 24/7 tracing.
Hence I think it's better to do this per-arch translation during bpf
prog attach.
That's exactly what bpf trampoline is doing.
Currently it's doing for single btf_id, single trampoline, and single bpf prog.
To make the same logic work across N attach points the trampoline logic
would need to iterate all btf_func_model-s of all btf_id-s and generate
M trampolines (where M < N) for a combination of possible argument passing.
On x86-64 the M will be equal to 1. On arm64 it will be equal to 1 as well.
But on x86-32 it will depend on a set of btf_ids. It could be 1,2,..10.
Since bpf doesn't allow attaching to struct-by-value, it's only 32-bit and 64-bit
integers to deal with, and the number of combinations of possible calling conventions
is actually very small. I suspect it won't be more than 10.
This way there will be no additional run-time overhead and bpf programs
can be portable. They will work as-is on x86-64, x86-32, arm64.
Just like fentry/fexit work today. Or rather they will be portable
when bpf trampoline is supported on these archs.
This portability is the key feature of bpf trampoline design. The bpf trampoline
was implemented for x86-64 only so far. Arm64 patches are still wip.
btf_func_model is used by both x86-64 and x86-32 JITs.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-20 23:38                       ` Alexei Starovoitov
@ 2021-04-21 13:40                         ` Jiri Olsa
  2021-04-21 14:05                           ` Steven Rostedt
  0 siblings, 1 reply; 39+ messages in thread
From: Jiri Olsa @ 2021-04-21 13:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Steven Rostedt, Masami Hiramatsu, Andrii Nakryiko, Jiri Olsa,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Networking,
	bpf, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Daniel Xu, Jesper Brouer,
	Toke Høiland-Jørgensen, Viktor Malik

On Tue, Apr 20, 2021 at 04:38:45PM -0700, Alexei Starovoitov wrote:

SNIP

> > >
> > > I don't see how you can do it without BTF.
> > > The mass-attach feature should prepare generic 6 or so arguments
> > > from all functions it attached to.
> > > On x86-64 it's trivial because 6 regs are the same.
> > > On arm64 it is more challenging since the return value reg overlaps with
> > > the first argument, so bpf trampoline (when it's ready for arm64) will look
> > > a bit different than bpf trampoline on x86-64 to preserve arg0, arg1,
> > > ..arg6, ret
> > > 64-bit values that bpf prog expects to see.
> > > On x86-32 it's even trickier, since the same 6 args need to be copied
> > > from a combination of regs and stack.
> > > This is not some hypothetical case. We already use BTF in x86-32 JIT
> > > and btf_func_model was introduced specifically to handle such cases.
> > > So I really don't see how ftrace can do that just yet. It has to understand BTF
> > > of all of the funcs it attaches to, otherwise it's just saving all regs.
> > > That approach was a pain to deal with.
> >
> > ok, my idea was to get regs from the ftrace and have arch specific code
> > to prepare 6 (or less) args for ebpf program.. that part would be
> > already in bpf code
> >
> > so you'd like to see this functionality directly in ftrace, so we don't
> > save unneeded regs, is that right?
> 
> What do you mean by "already in bpf code"?

that it would not be part of ftrace code

> 
> The main question is an api across layers.
> If ftrace doesn't use BTF it has to prepare all regs that could be used.
> Meaning on x86-64 that has to be 6 regs for args, 1 reg for return and
> stack pointer.
> That would be enough to discover input args and return value in fexit.
> On arm64 that has to be similar, but while x86-64 can do with single pt_regs
> where %rax is updated on fexit, arm64 cannot do so, since the same register
> is used as arg1 and as a return value.
> The most generic api between ftrace and bpf layers would be two sets of
> pt_regs. One on entry and one on exit, but that's going to be very expensive.

that's what I was going for and I think it's the only way if
we use ftrace graph_ops for mass attaching

> On x86-32 it would have to be 3 regs plus stack pointer and another 2 regs
> to cover all input args and return value.
> So there will be plenty of per-arch differences.
> 
> Jiri, if you're thinking of a bpf helper like:
> u64 bpf_read_argN(pt_regs, ip, arg_num)
> that will do a lookup of btf_id from ip, then it will parse the btf_id and
> function proto,
> then it will translate that to btf_func_model and finally will extract the right
> argument value from a combination of stack and regs?
> That's doable, but it's a lot of run-time overhead.
> It would be usable by bpf progs that don't care much about run-time perf
> and don't care that they're not usable 24/7 on production systems.
> Such tools exist and they're useful,
> but I'd like this mass-attach facility to be usable everywhere
> including the production and 24/7 tracing.

I did not think of this option, but yep, seems also expensive

> Hence I think it's better to do this per-arch translation during bpf
> prog attach.
> That's exactly what bpf trampoline is doing.
> Currently it's doing for single btf_id, single trampoline, and single bpf prog.
> To make the same logic work across N attach points the trampoline logic
> would need to iterate all btf_func_model-s of all btf_id-s and generate
> M trampolines (where M < N) for a combination of possible argument passing.
> On x86-64 the M will be equal to 1. On arm64 it will be equal to 1 as well.
> But on x86-32 it will depend on a set of btf_ids. It could be 1,2,..10.
> Since bpf doesn't allow attaching to struct-by-value, it's only 32-bit and 64-bit
> integers to deal with, and the number of combinations of possible calling conventions
> is actually very small. I suspect it won't be more than 10.
> This way there will be no additional run-time overhead and bpf programs
> can be portable. They will work as-is on x86-64, x86-32, arm64.
> Just like fentry/fexit work today. Or rather they will be portable
> when bpf trampoline is supported on these archs.
> This portability is the key feature of bpf trampoline design. The bpf trampoline
> was implemented for x86-64 only so far. Arm64 patches are still wip.
> btf_func_model is used by both x86-64 and x86-32 JITs.

ok, I understand why this would be the best solution for calling
the program from multiple probes

I think it's the 'attach' layer which is the source of problems

currently there is ftrace's fgraph_ops support that allows fast mass
attach and calls callbacks on function entry and exit:
  https://lore.kernel.org/lkml/20190525031633.811342628@goodmis.org/

these callbacks get ip/parent_ip and can get pt_regs (that's not
implemented at the moment)

but that gets us to the situation of having full pt_regs on both
entry/exit callbacks that you described above and want to avoid,
but I think it's the price for having this on top of a generic
tracing layer

the way ftrace's fgraph_ops is implemented, I'm not sure it can
be as fast as the current bpf entry/exit trampoline

but to better understand the pain points I think I'll try to implement
the 'mass trampolines' call to the bpf program you described above and
attach it for now to fgraph_ops callbacks

perhaps this is a good topic to discuss in one of the Thursday BPF meetings?

thanks,
jirka


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-21 13:40                         ` Jiri Olsa
@ 2021-04-21 14:05                           ` Steven Rostedt
  2021-04-21 18:52                             ` Andrii Nakryiko
  2021-04-21 21:37                             ` Jiri Olsa
  0 siblings, 2 replies; 39+ messages in thread
From: Steven Rostedt @ 2021-04-21 14:05 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Masami Hiramatsu, Andrii Nakryiko, Jiri Olsa,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Networking,
	bpf, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Daniel Xu, Jesper Brouer,
	Toke Høiland-Jørgensen, Viktor Malik

On Wed, 21 Apr 2021 15:40:37 +0200
Jiri Olsa <jolsa@redhat.com> wrote:

> ok, I understand why this would be the best solution for calling
> the program from multiple probes
> 
> I think it's the 'attach' layer which is the source of problems
> 
> currently there is ftrace's fgraph_ops support that allows fast mass
> attach and calls callbacks on function entry and exit:
>   https://lore.kernel.org/lkml/20190525031633.811342628@goodmis.org/
> 
> these callbacks get ip/parent_ip and can get pt_regs (that's not
> implemented at the moment)
> 
> but that gets us to the situation of having full pt_regs on both
> entry/exit callbacks that you described above and want to avoid,
> but I think it's the price for having this on top of a generic
> tracing layer
> 
> the way ftrace's fgraph_ops is implemented, I'm not sure it can
> be as fast as the current bpf entry/exit trampoline

Note, the above mentioned code was an attempt to consolidate the code that
does the "hijacking" of the return pointer in order to record the
return of a function. At the time there were only kretprobes and function
graph tracing. Now bpf has another version. That means there are three
utilities that record the exit of the function.

What we need is a single method that works for all three utilities. And I'm
perfectly fine with a rewrite of the function graph tracer to do that. The one
problem is that function graph and kretprobes work for pretty much all the
architectures now, and whatever we decide to do, we can't break those
architectures.

One way is to have an abstract layer that allows function graph and
kretprobes to work with the old implementation as well as, depending on a
config set, a new implementation that also supports bpf trampolines.

> 
> but to better understand the pain points I think I'll try to implement
> the 'mass trampolines' call to the bpf program you described above and
> attach it for now to fgraph_ops callbacks

One thing that ftrace gives you is a way to have each function call its own
trampoline, then depending on what is attached, each one can have multiple
implementations.

One thing that needs to be fixed is the interaction of the direct trampoline
with function graph and kretprobes, as the direct trampoline will break both
of them when the bpf implementation traces after it.

I would be interested in what a mass generic trampoline would look like, if
it had to deal with handling functions with 1 parameter and ones with 12
parameters. From this thread, I was told it can currently only handle 6
parameters on x86_64. Not sure how it works on x86_32.

> 
> perhaps this is a good topic to discuss in one of the Thursday BPF meetings?

I'm unaware of these meetings.


-- Steve

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-21 14:05                           ` Steven Rostedt
@ 2021-04-21 18:52                             ` Andrii Nakryiko
  2021-04-21 19:18                               ` Jiri Olsa
  2021-04-21 21:37                             ` Jiri Olsa
  1 sibling, 1 reply; 39+ messages in thread
From: Andrii Nakryiko @ 2021-04-21 18:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Olsa, Alexei Starovoitov, Masami Hiramatsu, Jiri Olsa,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Networking,
	bpf, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Daniel Xu, Jesper Brouer,
	Toke Høiland-Jørgensen, Viktor Malik

On Wed, Apr 21, 2021 at 7:05 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 21 Apr 2021 15:40:37 +0200
> Jiri Olsa <jolsa@redhat.com> wrote:
>

[...]

>
> >
> > perhaps this is a good topic to discuss in one of the Thursday BPF meetings?
>
> I'm unaware of these meetings.

We have weekly BPF office hours meetings every Thursday at 9am PDT.
There is a spreadsheet ([0]) in which anyone can propose a topic for
deeper discussion over Zoom. I've already added the topic for the
discussion in this thread. It would be great if you and Jiri could
join tomorrow. See the first tab in the spreadsheet for Zoom link.
Thanks!

  [0] https://docs.google.com/spreadsheets/d/1LfrDXZ9-fdhvPEp_LHkxAMYyxxpwBXjywWa0AejEveU/edit#gid=883029154

>
>
> -- Steve

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-21 18:52                             ` Andrii Nakryiko
@ 2021-04-21 19:18                               ` Jiri Olsa
  2021-04-22 14:24                                 ` Steven Rostedt
  0 siblings, 1 reply; 39+ messages in thread
From: Jiri Olsa @ 2021-04-21 19:18 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Steven Rostedt, Alexei Starovoitov, Masami Hiramatsu, Jiri Olsa,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Networking,
	bpf, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Daniel Xu, Jesper Brouer,
	Toke Høiland-Jørgensen, Viktor Malik

On Wed, Apr 21, 2021 at 11:52:11AM -0700, Andrii Nakryiko wrote:
> On Wed, Apr 21, 2021 at 7:05 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Wed, 21 Apr 2021 15:40:37 +0200
> > Jiri Olsa <jolsa@redhat.com> wrote:
> >
> 
> [...]
> 
> >
> > >
> > > perhaps this is a good topic to discuss in one of the Thursday BPF meetings?
> >
> > I'm unaware of these meetings.
> 
> We have BPF office hours weekly meetings every Thursday at 9am PDT.
> There is a spreadsheet ([0]) in which anyone can propose a topic for
> deeper discussion over Zoom. I've already added the topic for the
> discussion in this thread. It would be great if you and Jiri could
> join tomorrow. See the first tab in the spreadsheet for Zoom link.
> Thanks!
> 
>   [0] https://docs.google.com/spreadsheets/d/1LfrDXZ9-fdhvPEp_LHkxAMYyxxpwBXjywWa0AejEveU/edit#gid=883029154

great, I can come

thanks,
jirka

> 
> >
> >
> > -- Steve
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-21 14:05                           ` Steven Rostedt
  2021-04-21 18:52                             ` Andrii Nakryiko
@ 2021-04-21 21:37                             ` Jiri Olsa
  2021-04-22  1:17                               ` Steven Rostedt
  1 sibling, 1 reply; 39+ messages in thread
From: Jiri Olsa @ 2021-04-21 21:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexei Starovoitov, Masami Hiramatsu, Andrii Nakryiko, Jiri Olsa,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Networking,
	bpf, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Daniel Xu, Jesper Brouer,
	Toke Høiland-Jørgensen, Viktor Malik

On Wed, Apr 21, 2021 at 10:05:41AM -0400, Steven Rostedt wrote:
> On Wed, 21 Apr 2021 15:40:37 +0200
> Jiri Olsa <jolsa@redhat.com> wrote:
> 
> > ok, I understand why this would be the best solution for calling
> > the program from multiple probes
> > 
> > I think it's the 'attach' layer which is the source of problems
> > 
> > currently there is ftrace's fgraph_ops support that allows fast mass
> > attach and calls callbacks for functions entry and exit:
> >   https://lore.kernel.org/lkml/20190525031633.811342628@goodmis.org/
> > 
> > these callbacks get ip/parent_ip and can get pt_regs (that's not
> > implemented at the moment)
> > 
> > but that gets us to the situation of having full pt_regs on both
> > entry/exit callbacks that you described above and want to avoid,
> > but I think it's the price for having this on top of the generic
> > tracing layer
> > 
> > the way ftrace's fgraph_ops is implemented, I'm not sure it can
> > be as fast as the current bpf entry/exit trampoline
> 
> Note, the above-mentioned code was an attempt to consolidate the code that
> does the "hijacking" of the return pointer in order to record the
> return of a function. At the time there were only kretprobes and function
> graph tracing. Now bpf has another version. That means there are three
> utilities that record the exit of the function.
> 
> What we need is a single method that works for all three utilities. And I'm
> perfectly fine with a rewrite of the function graph tracer to do that. The one
> problem is that function graph and kretprobes work for pretty much all the
> architectures now, and whatever we decide to do, we can't break those
> architectures.
> 
> One way is to have an abstract layer that allows function graph and
> kretprobes to work with the old implementation as well as, depending on a
> config being set, a new implementation that also supports bpf trampolines.
> 
> > 
> > but to better understand the pain points I think I'll try to implement
> > the 'mass trampolines' call to the bpf program you described above and
> > attach it for now to fgraph_ops callbacks
> 
> One thing that ftrace gives you is a way to have each function call its own
> trampoline; then, depending on what is attached, each one can have multiple
> implementations.

but that would cut off other tracers for the function, right?

AFAICT it's used only when there's a single ftrace_ops registered
for the probe; if there are more ftrace_ops, ops->trampoline
is replaced by the generic one and ftrace will call ops->func
instead, right?

if we did not care about sharing a function among multiple tracers,
we could make a special 'exclusive' trampoline that would require
that no other tracer is (or will be) registered for the function
while this tracer is registered

then we could run BPF trampolines directly without 'direct' API
and use ftrace for mass attach

that is, if we don't care about other tracers for the function,
which I guess was the concern when the 'direct' API was introduced
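
for reference, attaching a bpf trampoline to one function via the
'direct' API today looks roughly like the sketch below; this is from
memory, attach_bpf_tramp() is a made-up helper and error handling
is trimmed:

    #include <linux/ftrace.h>
    #include <linux/kallsyms.h>

    /* made-up helper, not actual kernel code */
    static int attach_bpf_tramp(const char *func, unsigned long tramp)
    {
            unsigned long ip = kallsyms_lookup_name(func);

            if (!ip)
                    return -ENOENT;

            /* point the function's fentry call straight at tramp */
            return register_ftrace_direct(ip, tramp);
    }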

jirka

> 
> One thing that needs to be fixed is the interaction between the direct
> trampoline and function graph and kretprobes, as the direct trampoline
> will break both of them when a bpf implementation traces after it.
> 
> I would be interested in what a mass generic trampoline would look like, if
> it had to handle both a function with 1 parameter and one with 12
> parameters. From this thread, I was told it can currently only handle 6
> parameters on x86_64. Not sure how it works on x86_32.
> 
> > 
> > perhaps this is a good topic to discuss in one of the Thursday BPF meetings?
> 
> I'm unaware of these meetings.
> 
> 
> -- Steve
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-21 21:37                             ` Jiri Olsa
@ 2021-04-22  1:17                               ` Steven Rostedt
  0 siblings, 0 replies; 39+ messages in thread
From: Steven Rostedt @ 2021-04-22  1:17 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Masami Hiramatsu, Andrii Nakryiko, Jiri Olsa,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Networking,
	bpf, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Daniel Xu, Jesper Brouer,
	Toke Høiland-Jørgensen, Viktor Malik

On Wed, 21 Apr 2021 23:37:47 +0200
Jiri Olsa <jolsa@redhat.com> wrote:
> > 
> > One thing that ftrace gives you is a way to have each function call its own
> > trampoline; then, depending on what is attached, each one can have multiple
> > implementations.  
> 
> but that would cut off other tracers for the function, right?

No, actually just the opposite. It can be updated based on what is
attached to it, to provide the best trampoline for the use case.

If all the callbacks are attached to a single function with a limited
set of required args, it could jump to a trampoline that only saves the
required args.

If any of the callbacks is attached to multiple functions and needs
all the args, then it can jump to a trampoline that saves all of them.

This is similar to how it works now: one trampoline only cares about
saving the arguments, another wants to save *all* the regs. And if a
function has callbacks that want both, it will use the trampoline that
can handle all the callbacks attached to that function.
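
To make that concrete, here's a rough sketch of the two flavors from
the callback's side (written from memory, so treat it as illustrative
rather than authoritative; with FTRACE_OPS_FL_SAVE_REGS set you get
the full pt_regs, otherwise you may not):

    #include <linux/ftrace.h>

    static void light_cb(unsigned long ip, unsigned long parent_ip,
                         struct ftrace_ops *op, struct ftrace_regs *fregs)
    {
            /* args-only trampoline: ftrace_get_regs(fregs) may be NULL */
    }

    static void full_cb(unsigned long ip, unsigned long parent_ip,
                        struct ftrace_ops *op, struct ftrace_regs *fregs)
    {
            struct pt_regs *regs = ftrace_get_regs(fregs);

            if (regs)       /* SAVE_REGS trampoline: full regs here */
                    pr_debug("tracing %lx with full regs\n", ip);
    }

    static struct ftrace_ops light_ops = { .func = light_cb };
    static struct ftrace_ops full_ops = {
            .func   = full_cb,
            .flags  = FTRACE_OPS_FL_SAVE_REGS,
    };
    /* then ftrace_set_filter() + register_ftrace_function() on each */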

> 
> AFAICT it's used only when there's single ftrace_ops registered
> for the probe and if there are more ftrace_ops, ops->trampoline
> is replaced by the generic one and ftrace will call ops->func
> instead, right?

Somewhat, but let me explain it in more detail.

There's currently two variables that determine what trampoline to call.

Variable 1: number of callbacks attached. It distinguishes 0, 1, and more than one.

Variable 2: Does any callback require a full set of regs?

Variable 1 determines if the trampoline that is called will call a loop
function that iterates through all the attached callbacks.

Variable 2 determines if it should save all the regs or not.

If none of the callbacks attached to a function require saving all the
regs, then only the regs required for making function calls are saved
(which is all the arguments, but not all the regs). Then the loop
function is called with that limited set of regs saved.

If one of the callbacks attached requires all the regs, then all the
regs are saved and passed to the loop function that calls the
callbacks. All the callbacks will get the full set of regs, even though
only one might care about them. But the work was already done, and the
regs are just a pointer passed to the callbacks.
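
In pseudo-code, the per-function choice boils down to something like
this (a simplified sketch, not the actual kernel code; patch_callsite(),
rec_wants_full_regs() and the trampoline names are made up):

    static void update_callsite(struct dyn_ftrace *rec, int nr_callbacks)
    {
            if (nr_callbacks == 0)
                    patch_callsite(rec, NULL);                 /* back to a nop */
            else if (nr_callbacks == 1 && !rec_wants_full_regs(rec))
                    patch_callsite(rec, ops_trampoline(rec));  /* custom, args only */
            else if (rec_wants_full_regs(rec))
                    patch_callsite(rec, regs_loop_trampoline); /* all regs + loop */
            else
                    patch_callsite(rec, args_loop_trampoline); /* args only + loop */
    }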


> 
> if we would not care about sharing function by multiple tracers,
> we could make special 'exclusive' trampoline that would require
> that no other tracer is (or will be) registered for the function
> while the tracer is registered

We don't want to restrict other tracers. But you can switch trampolines
without disruption. We do that now.

If a single callback that doesn't care about all regs is attached to a
function, then a custom trampoline is created, and when the function is
called, it calls that trampoline, which saves the required regs, and
then does a direct call to the callback.

If we add another callback to that same function, and that callback
also doesn't care about all regs, then we switch to a new trampoline
that, instead of calling the original callback directly, calls the loop
function that iterates over all the callbacks. The first callback sees
no difference; we just switch the function's "fentry" to point to the
new trampoline.

Then if we add a callback to that same function, but this callback
wants all the regs, then we switch to a new trampoline that will save
all the regs and call the loop function. The first two callbacks notice
no change.

And this can be backed out as well when callbacks are removed from a
function. All this accounting is taken care of by
__ftrace_hash_rec_update() in kernel/trace/ftrace.c. And yes, it takes a
bit of care to get this right.

The point is, you can modify the trampolines dynamically, per function,
depending on the requirements of the callbacks attached to it.

The state of each function is determined by the dyn_ftrace structure
flags field. (see include/linux/ftrace.h)

The 23 least significant bits of flags are a counter. 2^23 is much more
than needed, because there should never be that many callbacks attached
to a single function. Variable 1 above is determined by the count: 0,
1, or more than one.

And you'll see flags for REGS / REGS_EN, TRAMP / TRAMP_EN and even
DIRECT / DIRECT_EN: whether or not the function wants regs, whether it
calls a trampoline directly or uses the loop, and whether it has a
direct caller, respectively.

The "_EN" part of those flags show the state that the actual function
is in. The __ftrace_hash_rec_update() will look at the ftrace_ops that
is being registered or unregistered, and will update each of the functions
it traces dyn_ftrace non "_EN" flags. Then the code that actually does
the modifications will look at each record (that represent all
traceable functions), and if the non "_EN" flag does not match the
"_EN" part it knows to update that function appropriately, and then it
updates the "_EN" flag.


> 
> then we could run BPF trampolines directly without 'direct' API
> and use ftrace for mass attach
> 
> that is if we don't care about other tracers for the function,
> which I guess was concern when the 'direct' API was introduced

I still very much care about other tracers for the function. Remember,
kprobes and perf use this too, and kprobes is the one that requires
all the regs to be saved.

-- Steve


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe
  2021-04-21 19:18                               ` Jiri Olsa
@ 2021-04-22 14:24                                 ` Steven Rostedt
  0 siblings, 0 replies; 39+ messages in thread
From: Steven Rostedt @ 2021-04-22 14:24 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, Alexei Starovoitov, Masami Hiramatsu, Jiri Olsa,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Networking,
	bpf, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Daniel Xu, Jesper Brouer,
	Toke Høiland-Jørgensen, Viktor Malik

On Wed, 21 Apr 2021 21:18:42 +0200
Jiri Olsa <jolsa@redhat.com> wrote:

> > 
> >   [0] https://docs.google.com/spreadsheets/d/1LfrDXZ9-fdhvPEp_LHkxAMYyxxpwBXjywWa0AejEveU/edit#gid=883029154  
> 
> great, I can come

I'll attend as well, and so may some of my team.

-- Steve

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2021-04-22 14:24 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-13 12:15 [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe Jiri Olsa
2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 1/7] bpf: Move bpf_prog_start/end functions to generic place Jiri Olsa
2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 2/7] bpf: Add bpf_functions object Jiri Olsa
2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 3/7] bpf: Add support to attach program to ftrace probe Jiri Olsa
2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 4/7] libbpf: Add btf__find_by_pattern_kind function Jiri Olsa
2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 5/7] libbpf: Add support to load and attach ftrace probe Jiri Olsa
2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 6/7] selftests/bpf: Add ftrace probe to fentry test Jiri Olsa
2021-04-13 12:15 ` [PATCHv2 RFC bpf-next 7/7] selftests/bpf: Add ftrace probe test Jiri Olsa
2021-04-14  1:04 ` [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe Andrii Nakryiko
2021-04-14 12:19   ` Jiri Olsa
2021-04-14 22:46     ` Andrii Nakryiko
2021-04-15 14:00       ` Jiri Olsa
2021-04-15 15:10       ` Steven Rostedt
2021-04-15 17:39         ` Jiri Olsa
2021-04-15 18:18           ` Steven Rostedt
2021-04-15 18:21             ` Steven Rostedt
2021-04-15 21:49               ` Jiri Olsa
2021-04-15 23:30                 ` Steven Rostedt
2021-04-19 20:51                   ` Jiri Olsa
2021-04-19 22:00                     ` Steven Rostedt
2021-04-15 18:31             ` Yonghong Song
2021-04-15 20:45         ` Andrii Nakryiko
2021-04-15 21:00           ` Steven Rostedt
2021-04-16 15:03             ` Masami Hiramatsu
2021-04-16 16:48               ` Steven Rostedt
2021-04-19 14:29                 ` Masami Hiramatsu
2021-04-20 12:51                 ` Jiri Olsa
2021-04-20 15:33                   ` Alexei Starovoitov
2021-04-20 16:33                     ` Steven Rostedt
2021-04-20 16:52                     ` Jiri Olsa
2021-04-20 23:38                       ` Alexei Starovoitov
2021-04-21 13:40                         ` Jiri Olsa
2021-04-21 14:05                           ` Steven Rostedt
2021-04-21 18:52                             ` Andrii Nakryiko
2021-04-21 19:18                               ` Jiri Olsa
2021-04-22 14:24                                 ` Steven Rostedt
2021-04-21 21:37                             ` Jiri Olsa
2021-04-22  1:17                               ` Steven Rostedt
2021-04-20  4:51               ` Andrii Nakryiko
