bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next 0/5] introduce BPF_PROG_TYPE_USER
@ 2020-08-01  8:47 Song Liu
  2020-08-01  8:47 ` [PATCH bpf-next 1/5] bpf: " Song Liu
                   ` (4 more replies)
  0 siblings, 5 replies; 43+ messages in thread
From: Song Liu @ 2020-08-01  8:47 UTC (permalink / raw)
  To: linux-kernel, bpf, netdev
  Cc: ast, daniel, kernel-team, john.fastabend, kpsingh, brouer, dlxu,
	Song Liu

This set introduces a new program type BPF_PROG_TYPE_USER, or "user
program". User program is triggered from user space via
sys_bpf(BPF_PROG_TEST_RUN). User program will replace some uprobe programs.
Compared against uprobe, user program has the following advantages:

1. User programs are faster. The new selftest added in 5/5, shows that a
   simple uprobe program takes 1400 nanoseconds, while user program only
   takes 300 nanoseconds.
2. User can specify on which cpu the user program would run.
3. User can pass arguments to the user program.

Song Liu (5):
  bpf: introduce BPF_PROG_TYPE_USER
  libbpf: support BPF_PROG_TYPE_USER programs
  selftests/bpf: add selftest for BPF_PROG_TYPE_USER
  selftests/bpf: move two functions to test_progs.c
  selftests/bpf: add benchmark for uprobe vs. user_prog

 include/linux/bpf_types.h                     |   2 +
 include/uapi/linux/bpf.h                      |  19 +++
 kernel/bpf/syscall.c                          |   3 +-
 kernel/trace/bpf_trace.c                      | 121 ++++++++++++++++++
 tools/include/uapi/linux/bpf.h                |  19 +++
 tools/lib/bpf/bpf.c                           |   1 +
 tools/lib/bpf/bpf.h                           |   3 +
 tools/lib/bpf/libbpf.c                        |   1 +
 tools/lib/bpf/libbpf_probes.c                 |   1 +
 .../selftests/bpf/prog_tests/attach_probe.c   |  21 ---
 .../selftests/bpf/prog_tests/test_overhead.c  |   8 --
 .../bpf/prog_tests/uprobe_vs_user_prog.c      | 101 +++++++++++++++
 .../selftests/bpf/prog_tests/user_prog.c      |  52 ++++++++
 .../selftests/bpf/progs/uprobe_vs_user_prog.c |  21 +++
 tools/testing/selftests/bpf/progs/user_prog.c |  56 ++++++++
 tools/testing/selftests/bpf/test_progs.c      |  30 +++++
 tools/testing/selftests/bpf/test_progs.h      |   2 +
 17 files changed, 431 insertions(+), 30 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe_vs_user_prog.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/user_prog.c
 create mode 100644 tools/testing/selftests/bpf/progs/uprobe_vs_user_prog.c
 create mode 100644 tools/testing/selftests/bpf/progs/user_prog.c

--
2.24.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH bpf-next 1/5] bpf: introduce BPF_PROG_TYPE_USER
  2020-08-01  8:47 [PATCH bpf-next 0/5] introduce BPF_PROG_TYPE_USER Song Liu
@ 2020-08-01  8:47 ` Song Liu
  2020-08-01 13:58   ` kernel test robot
                     ` (3 more replies)
  2020-08-01  8:47 ` [PATCH bpf-next 2/5] libbpf: support BPF_PROG_TYPE_USER programs Song Liu
                   ` (3 subsequent siblings)
  4 siblings, 4 replies; 43+ messages in thread
From: Song Liu @ 2020-08-01  8:47 UTC (permalink / raw)
  To: linux-kernel, bpf, netdev
  Cc: ast, daniel, kernel-team, john.fastabend, kpsingh, brouer, dlxu,
	Song Liu

As of today, to trigger BPF program from user space, the common practise
is to create a uprobe on a special function and calls that function. For
example, bpftrace uses BEGIN_trigger and END_trigger for the BEGIN and END
programs.

However, uprobe is not ideal for this use case. First, uprobe uses trap,
which adds non-trivial overhead. Second, uprobe requires calculating
function offset at runtime, which is not very reliable. bpftrace has
seen issues with this:
  https://github.com/iovisor/bpftrace/pull/1438
  https://github.com/iovisor/bpftrace/issues/1440

This patch introduces a new BPF program type BPF_PROG_TYPE_USER, or "user
program". User program is triggered via sys_bpf(BPF_PROG_TEST_RUN), which
is significant faster than a trap.

To make user program more flexible, we enabled the following features:
  1. The user can specify on which cpu the program should run. If the
     target cpu is not current cpu, the program is triggered via IPI.
  2. User can pass optional argument to user program. Currently, the
     argument can only be 5x u64 numbers.

User program has access to helper functions in bpf_tracing_func_proto()
and bpf_get_stack|stackid().

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/bpf_types.h      |   2 +
 include/uapi/linux/bpf.h       |  19 ++++++
 kernel/bpf/syscall.c           |   3 +-
 kernel/trace/bpf_trace.c       | 121 +++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  19 ++++++
 5 files changed, 163 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index a52a5688418e5..3c52f3207aced 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -76,6 +76,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_EXT, bpf_extension,
 BPF_PROG_TYPE(BPF_PROG_TYPE_LSM, lsm,
 	       void *, void *)
 #endif /* CONFIG_BPF_LSM */
+BPF_PROG_TYPE(BPF_PROG_TYPE_USER, user,
+	       void *, void *)
 #endif
 
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index eb5e0c38eb2cf..f6b9d4e7eeb4e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -190,6 +190,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_EXT,
 	BPF_PROG_TYPE_LSM,
 	BPF_PROG_TYPE_SK_LOOKUP,
+	BPF_PROG_TYPE_USER,
 };
 
 enum bpf_attach_type {
@@ -556,6 +557,12 @@ union bpf_attr {
 						 */
 		__aligned_u64	ctx_in;
 		__aligned_u64	ctx_out;
+		__u32		cpu_plus;	/* run this program on cpu
+						 * (cpu_plus - 1).
+						 * If cpu_plus == 0, run on
+						 * current cpu. Only valid
+						 * for BPF_PROG_TYPE_USER.
+						 */
 	} test;
 
 	struct { /* anonymous struct used by BPF_*_GET_*_ID */
@@ -4441,4 +4448,16 @@ struct bpf_sk_lookup {
 	__u32 local_port;	/* Host byte order */
 };
 
+struct pt_regs;
+
+#define BPF_USER_PROG_MAX_ARGS 5
+struct bpf_user_prog_args {
+	__u64 args[BPF_USER_PROG_MAX_ARGS];
+};
+
+struct bpf_user_prog_ctx {
+	struct pt_regs *regs;
+	__u64 args[BPF_USER_PROG_MAX_ARGS];
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index cd3d599e9e90e..f5a28fd8a9bc2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2078,6 +2078,7 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
 	case BPF_PROG_TYPE_LSM:
 	case BPF_PROG_TYPE_STRUCT_OPS: /* has access to struct sock */
 	case BPF_PROG_TYPE_EXT: /* extends any prog */
+	case BPF_PROG_TYPE_USER:
 		return true;
 	default:
 		return false;
@@ -2969,7 +2970,7 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	}
 }
 
-#define BPF_PROG_TEST_RUN_LAST_FIELD test.ctx_out
+#define BPF_PROG_TEST_RUN_LAST_FIELD test.cpu_plus
 
 static int bpf_prog_test_run(const union bpf_attr *attr,
 			     union bpf_attr __user *uattr)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index cb91ef902cc43..cbe789bc1b986 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -16,6 +16,7 @@
 #include <linux/error-injection.h>
 #include <linux/btf_ids.h>
 
+#include <asm/irq_regs.h>
 #include <asm/tlb.h>
 
 #include "trace_probe.h"
@@ -1740,6 +1741,126 @@ const struct bpf_verifier_ops perf_event_verifier_ops = {
 const struct bpf_prog_ops perf_event_prog_ops = {
 };
 
+struct bpf_user_prog_test_run_info {
+	struct bpf_prog *prog;
+	struct bpf_user_prog_ctx ctx;
+	u32 retval;
+};
+
+static void
+__bpf_prog_test_run_user(struct bpf_user_prog_test_run_info *info)
+{
+	rcu_read_lock();
+	migrate_disable();
+	info->retval = BPF_PROG_RUN(info->prog, &info->ctx);
+	migrate_enable();
+	rcu_read_unlock();
+}
+
+static void _bpf_prog_test_run_user(void *data)
+{
+	struct bpf_user_prog_test_run_info *info = data;
+
+	info->ctx.regs = get_irq_regs();
+	__bpf_prog_test_run_user(info);
+}
+
+static int bpf_prog_test_run_user(struct bpf_prog *prog,
+				  const union bpf_attr *kattr,
+				  union bpf_attr __user *uattr)
+{
+	void __user *data_in = u64_to_user_ptr(kattr->test.data_in);
+	__u32 data_size = kattr->test.data_size_in;
+	struct bpf_user_prog_test_run_info info;
+	int cpu = kattr->test.cpu_plus - 1;
+	int err;
+
+	if (kattr->test.ctx_in || kattr->test.ctx_out ||
+	    kattr->test.duration || kattr->test.repeat ||
+	    kattr->test.data_out)
+		return -EINVAL;
+
+	if ((data_in && !data_size) || (!data_in && data_size))
+		return -EINVAL;
+
+	/* if provided, data_in should be struct bpf_user_prog_args */
+	if (data_size > 0 && data_size != sizeof(struct bpf_user_prog_args))
+		return -EINVAL;
+
+	if (kattr->test.data_size_in) {
+		if (copy_from_user(&info.ctx.args, data_in,
+				   sizeof(struct bpf_user_prog_args)))
+			return -EFAULT;
+	} else {
+		memset(&info.ctx.args, 0, sizeof(struct bpf_user_prog_args));
+	}
+
+	info.prog = prog;
+
+	if (!kattr->test.cpu_plus || cpu == smp_processor_id()) {
+		/* non-IPI, use regs from perf_fetch_caller_regs */
+		info.ctx.regs = get_bpf_raw_tp_regs();
+		if (IS_ERR(info.ctx.regs))
+			return PTR_ERR(info.ctx.regs);
+		perf_fetch_caller_regs(info.ctx.regs);
+		__bpf_prog_test_run_user(&info);
+		put_bpf_raw_tp_regs();
+	} else {
+		err = smp_call_function_single(cpu, _bpf_prog_test_run_user,
+					       &info, 1);
+		if (err)
+			return err;
+	}
+
+	if (copy_to_user(&uattr->test.retval, &info.retval, sizeof(u32)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static bool user_prog_is_valid_access(int off, int size, enum bpf_access_type type,
+				      const struct bpf_prog *prog,
+				      struct bpf_insn_access_aux *info)
+{
+	const int size_u64 = sizeof(u64);
+
+	if (off < 0 || off >= sizeof(struct bpf_user_prog_ctx))
+		return false;
+
+	switch (off) {
+	case bpf_ctx_range(struct bpf_user_prog_ctx, regs):
+		bpf_ctx_record_field_size(info, size_u64);
+		if (!bpf_ctx_narrow_access_ok(off, size, size_u64))
+			return false;
+		break;
+	default:
+		break;
+	}
+	return true;
+}
+
+static const struct bpf_func_proto *
+user_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	switch (func_id) {
+	case BPF_FUNC_get_stackid:
+		return &bpf_get_stackid_proto;
+	case BPF_FUNC_get_stack:
+		return &bpf_get_stack_proto;
+	default:
+		return bpf_tracing_func_proto(func_id, prog);
+	}
+}
+
+const struct bpf_verifier_ops user_verifier_ops = {
+	.get_func_proto		= user_prog_func_proto,
+	.is_valid_access	= user_prog_is_valid_access,
+};
+
+const struct bpf_prog_ops user_prog_ops = {
+	.test_run	= bpf_prog_test_run_user,
+};
+
 static DEFINE_MUTEX(bpf_event_mutex);
 
 #define BPF_TRACE_MAX_PROGS 64
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index eb5e0c38eb2cf..f6b9d4e7eeb4e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -190,6 +190,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_EXT,
 	BPF_PROG_TYPE_LSM,
 	BPF_PROG_TYPE_SK_LOOKUP,
+	BPF_PROG_TYPE_USER,
 };
 
 enum bpf_attach_type {
@@ -556,6 +557,12 @@ union bpf_attr {
 						 */
 		__aligned_u64	ctx_in;
 		__aligned_u64	ctx_out;
+		__u32		cpu_plus;	/* run this program on cpu
+						 * (cpu_plus - 1).
+						 * If cpu_plus == 0, run on
+						 * current cpu. Only valid
+						 * for BPF_PROG_TYPE_USER.
+						 */
 	} test;
 
 	struct { /* anonymous struct used by BPF_*_GET_*_ID */
@@ -4441,4 +4448,16 @@ struct bpf_sk_lookup {
 	__u32 local_port;	/* Host byte order */
 };
 
+struct pt_regs;
+
+#define BPF_USER_PROG_MAX_ARGS 5
+struct bpf_user_prog_args {
+	__u64 args[BPF_USER_PROG_MAX_ARGS];
+};
+
+struct bpf_user_prog_ctx {
+	struct pt_regs *regs;
+	__u64 args[BPF_USER_PROG_MAX_ARGS];
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH bpf-next 2/5] libbpf: support BPF_PROG_TYPE_USER programs
  2020-08-01  8:47 [PATCH bpf-next 0/5] introduce BPF_PROG_TYPE_USER Song Liu
  2020-08-01  8:47 ` [PATCH bpf-next 1/5] bpf: " Song Liu
@ 2020-08-01  8:47 ` Song Liu
  2020-08-03  1:40   ` Andrii Nakryiko
  2020-08-01  8:47 ` [PATCH bpf-next 3/5] selftests/bpf: add selftest for BPF_PROG_TYPE_USER Song Liu
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 43+ messages in thread
From: Song Liu @ 2020-08-01  8:47 UTC (permalink / raw)
  To: linux-kernel, bpf, netdev
  Cc: ast, daniel, kernel-team, john.fastabend, kpsingh, brouer, dlxu,
	Song Liu

Add cpu_plus to bpf_prog_test_run_attr. Add BPF_PROG_SEC "user" for
BPF_PROG_TYPE_USER programs.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 tools/lib/bpf/bpf.c           | 1 +
 tools/lib/bpf/bpf.h           | 3 +++
 tools/lib/bpf/libbpf.c        | 1 +
 tools/lib/bpf/libbpf_probes.c | 1 +
 4 files changed, 6 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index e1bdf214f75fe..b28c3daa9c270 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -693,6 +693,7 @@ int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr)
 	attr.test.ctx_size_in = test_attr->ctx_size_in;
 	attr.test.ctx_size_out = test_attr->ctx_size_out;
 	attr.test.repeat = test_attr->repeat;
+	attr.test.cpu_plus = test_attr->cpu_plus;
 
 	ret = sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr));
 	test_attr->data_size_out = attr.test.data_size_out;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 6d367e01d05e9..0c799740df566 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -205,6 +205,9 @@ struct bpf_prog_test_run_attr {
 	void *ctx_out;      /* optional */
 	__u32 ctx_size_out; /* in: max length of ctx_out
 			     * out: length of cxt_out */
+	__u32 cpu_plus;	    /* specify which cpu to run the test with
+			     * cpu_plus = cpu_id + 1.
+			     * If cpu_plus = 0, run on current cpu */
 };
 
 LIBBPF_API int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr);
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index b9f11f854985b..9ce175a486214 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -6922,6 +6922,7 @@ static const struct bpf_sec_def section_defs[] = {
 	BPF_PROG_SEC("lwt_out",			BPF_PROG_TYPE_LWT_OUT),
 	BPF_PROG_SEC("lwt_xmit",		BPF_PROG_TYPE_LWT_XMIT),
 	BPF_PROG_SEC("lwt_seg6local",		BPF_PROG_TYPE_LWT_SEG6LOCAL),
+	BPF_PROG_SEC("user",			BPF_PROG_TYPE_USER),
 	BPF_APROG_SEC("cgroup_skb/ingress",	BPF_PROG_TYPE_CGROUP_SKB,
 						BPF_CGROUP_INET_INGRESS),
 	BPF_APROG_SEC("cgroup_skb/egress",	BPF_PROG_TYPE_CGROUP_SKB,
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index 5a3d3f0784081..163013084000e 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -112,6 +112,7 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns,
 	case BPF_PROG_TYPE_STRUCT_OPS:
 	case BPF_PROG_TYPE_EXT:
 	case BPF_PROG_TYPE_LSM:
+	case BPF_PROG_TYPE_USER:
 	default:
 		break;
 	}
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH bpf-next 3/5] selftests/bpf: add selftest for BPF_PROG_TYPE_USER
  2020-08-01  8:47 [PATCH bpf-next 0/5] introduce BPF_PROG_TYPE_USER Song Liu
  2020-08-01  8:47 ` [PATCH bpf-next 1/5] bpf: " Song Liu
  2020-08-01  8:47 ` [PATCH bpf-next 2/5] libbpf: support BPF_PROG_TYPE_USER programs Song Liu
@ 2020-08-01  8:47 ` Song Liu
  2020-08-03  1:43   ` Andrii Nakryiko
  2020-08-01  8:47 ` [PATCH bpf-next 4/5] selftests/bpf: move two functions to test_progs.c Song Liu
  2020-08-01  8:47 ` [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog Song Liu
  4 siblings, 1 reply; 43+ messages in thread
From: Song Liu @ 2020-08-01  8:47 UTC (permalink / raw)
  To: linux-kernel, bpf, netdev
  Cc: ast, daniel, kernel-team, john.fastabend, kpsingh, brouer, dlxu,
	Song Liu

This test checks the correctness of BPF_PROG_TYPE_USER program, including:
running on the right cpu, passing in correct args, returning retval, and
being able to call bpf_get_stack|stackid.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 .../selftests/bpf/prog_tests/user_prog.c      | 52 +++++++++++++++++
 tools/testing/selftests/bpf/progs/user_prog.c | 56 +++++++++++++++++++
 2 files changed, 108 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/user_prog.c
 create mode 100644 tools/testing/selftests/bpf/progs/user_prog.c

diff --git a/tools/testing/selftests/bpf/prog_tests/user_prog.c b/tools/testing/selftests/bpf/prog_tests/user_prog.c
new file mode 100644
index 0000000000000..416707b3bff01
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/user_prog.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#include <test_progs.h>
+#include "user_prog.skel.h"
+
+static int duration;
+
+void test_user_prog(void)
+{
+	struct bpf_user_prog_args args = {{0, 1, 2, 3, 4}};
+	struct bpf_prog_test_run_attr attr = {};
+	struct user_prog *skel;
+	int i, numcpu, ret;
+
+	skel = user_prog__open_and_load();
+
+	if (CHECK(!skel, "user_prog__open_and_load",
+		  "skeleton open_and_laod failed\n"))
+		return;
+
+	numcpu = libbpf_num_possible_cpus();
+
+	attr.prog_fd = bpf_program__fd(skel->progs.user_func);
+	attr.data_size_in = sizeof(args);
+	attr.data_in = &args;
+
+	/* start from -1, so we test cpu_plus == 0 */
+	for (i = -1; i < numcpu; i++) {
+		args.args[0] = i + 1;
+		attr.cpu_plus = i + 1;
+		ret = bpf_prog_test_run_xattr(&attr);
+		CHECK(ret, "bpf_prog_test_run_xattr", "returns error\n");
+
+		/* skip two tests for i == -1 */
+		if (i == -1)
+			continue;
+		CHECK(attr.retval != i + 2, "bpf_prog_test_run_xattr",
+		      "doesn't get expected retval\n");
+		CHECK(skel->data->sum != 11 + i, "user_prog_args_test",
+		      "sum of args doesn't match\n");
+	}
+
+	CHECK(skel->data->cpu_match == 0, "cpu_match_test", "failed\n");
+	CHECK(skel->bss->get_stack_success != numcpu + 1, "test_bpf_get_stack",
+	      "failed on %d cores\n", numcpu - skel->bss->get_stack_success);
+	CHECK(skel->bss->get_stackid_success != numcpu + 1,
+	      "test_bpf_get_stackid",
+	      "failed on %d cores\n",
+	      numcpu + 1 - skel->bss->get_stackid_success);
+
+	user_prog__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/user_prog.c b/tools/testing/selftests/bpf/progs/user_prog.c
new file mode 100644
index 0000000000000..cf320e97f107a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/user_prog.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+#ifndef PERF_MAX_STACK_DEPTH
+#define PERF_MAX_STACK_DEPTH         127
+#endif
+
+typedef __u64 stack_trace_t[PERF_MAX_STACK_DEPTH];
+
+struct {
+	__uint(type, BPF_MAP_TYPE_STACK_TRACE);
+	__uint(max_entries, 16384);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(stack_trace_t));
+} stackmap SEC(".maps");
+
+volatile int cpu_match = 1;
+volatile __u64 sum = 1;
+volatile int get_stack_success = 0;
+volatile int get_stackid_success = 0;
+volatile __u64 stacktrace[PERF_MAX_STACK_DEPTH];
+
+SEC("user")
+int user_func(struct bpf_user_prog_ctx *ctx)
+{
+	int cpu = bpf_get_smp_processor_id();
+	__u32 key = cpu;
+	long stackid, err;
+
+	/* check the program runs on the right cpu */
+	if (ctx->args[0] && ctx->args[0] != cpu + 1)
+		cpu_match = 0;
+
+	/* check the sum of arguments are correct */
+	sum = ctx->args[0] + ctx->args[1] + ctx->args[2] +
+		ctx->args[3] + ctx->args[4];
+
+	/* check bpf_get_stackid works */
+	stackid = bpf_get_stackid(ctx, &stackmap, 0);
+	if (stackid >= 0)
+		get_stackid_success++;
+
+	/* check bpf_get_stack works */
+	err = bpf_get_stack(ctx, (void *)stacktrace,
+			    PERF_MAX_STACK_DEPTH * sizeof(__u64),
+			    BPF_F_USER_STACK);
+	if (err >= 0)
+		get_stack_success++;
+
+	return cpu + 2;
+}
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH bpf-next 4/5] selftests/bpf: move two functions to test_progs.c
  2020-08-01  8:47 [PATCH bpf-next 0/5] introduce BPF_PROG_TYPE_USER Song Liu
                   ` (2 preceding siblings ...)
  2020-08-01  8:47 ` [PATCH bpf-next 3/5] selftests/bpf: add selftest for BPF_PROG_TYPE_USER Song Liu
@ 2020-08-01  8:47 ` Song Liu
  2020-08-03  1:46   ` Andrii Nakryiko
  2020-08-01  8:47 ` [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog Song Liu
  4 siblings, 1 reply; 43+ messages in thread
From: Song Liu @ 2020-08-01  8:47 UTC (permalink / raw)
  To: linux-kernel, bpf, netdev
  Cc: ast, daniel, kernel-team, john.fastabend, kpsingh, brouer, dlxu,
	Song Liu

Move time_get_ns() and get_base_addr() to test_progs.c, so they can be
used in other tests.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 .../selftests/bpf/prog_tests/attach_probe.c   | 21 -------------
 .../selftests/bpf/prog_tests/test_overhead.c  |  8 -----
 tools/testing/selftests/bpf/test_progs.c      | 30 +++++++++++++++++++
 tools/testing/selftests/bpf/test_progs.h      |  2 ++
 4 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/attach_probe.c b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
index a0ee87c8e1ea7..3bda8acbbafb5 100644
--- a/tools/testing/selftests/bpf/prog_tests/attach_probe.c
+++ b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
@@ -2,27 +2,6 @@
 #include <test_progs.h>
 #include "test_attach_probe.skel.h"
 
-ssize_t get_base_addr() {
-	size_t start, offset;
-	char buf[256];
-	FILE *f;
-
-	f = fopen("/proc/self/maps", "r");
-	if (!f)
-		return -errno;
-
-	while (fscanf(f, "%zx-%*x %s %zx %*[^\n]\n",
-		      &start, buf, &offset) == 3) {
-		if (strcmp(buf, "r-xp") == 0) {
-			fclose(f);
-			return start - offset;
-		}
-	}
-
-	fclose(f);
-	return -EINVAL;
-}
-
 void test_attach_probe(void)
 {
 	int duration = 0;
diff --git a/tools/testing/selftests/bpf/prog_tests/test_overhead.c b/tools/testing/selftests/bpf/prog_tests/test_overhead.c
index 2702df2b23433..3fe32e9357c4b 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_overhead.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_overhead.c
@@ -7,14 +7,6 @@
 
 #define MAX_CNT 100000
 
-static __u64 time_get_ns(void)
-{
-	struct timespec ts;
-
-	clock_gettime(CLOCK_MONOTONIC, &ts);
-	return ts.tv_sec * 1000000000ull + ts.tv_nsec;
-}
-
 static int test_task_rename(const char *prog)
 {
 	int i, fd, duration = 0, err;
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index b1e4dadacd9b4..c9e6a5ad5b9a4 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -622,6 +622,36 @@ int cd_flavor_subdir(const char *exec_name)
 	return chdir(flavor);
 }
 
+__u64 time_get_ns(void)
+{
+	struct timespec ts;
+
+	clock_gettime(CLOCK_MONOTONIC, &ts);
+	return ts.tv_sec * 1000000000ull + ts.tv_nsec;
+}
+
+ssize_t get_base_addr(void)
+{
+	size_t start, offset;
+	char buf[256];
+	FILE *f;
+
+	f = fopen("/proc/self/maps", "r");
+	if (!f)
+		return -errno;
+
+	while (fscanf(f, "%zx-%*x %s %zx %*[^\n]\n",
+		      &start, buf, &offset) == 3) {
+		if (strcmp(buf, "r-xp") == 0) {
+			fclose(f);
+			return start - offset;
+		}
+	}
+
+	fclose(f);
+	return -EINVAL;
+}
+
 #define MAX_BACKTRACE_SZ 128
 void crash_handler(int signum)
 {
diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
index 6e09bf738473e..2d617201024bf 100644
--- a/tools/testing/selftests/bpf/test_progs.h
+++ b/tools/testing/selftests/bpf/test_progs.h
@@ -91,6 +91,8 @@ extern bool test__start_subtest(const char *name);
 extern void test__skip(void);
 extern void test__fail(void);
 extern int test__join_cgroup(const char *path);
+extern __u64 time_get_ns(void);
+extern ssize_t get_base_addr();
 
 #define PRINT_FAIL(format...)                                                  \
 	({                                                                     \
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
  2020-08-01  8:47 [PATCH bpf-next 0/5] introduce BPF_PROG_TYPE_USER Song Liu
                   ` (3 preceding siblings ...)
  2020-08-01  8:47 ` [PATCH bpf-next 4/5] selftests/bpf: move two functions to test_progs.c Song Liu
@ 2020-08-01  8:47 ` Song Liu
  2020-08-03  1:51   ` Andrii Nakryiko
  4 siblings, 1 reply; 43+ messages in thread
From: Song Liu @ 2020-08-01  8:47 UTC (permalink / raw)
  To: linux-kernel, bpf, netdev
  Cc: ast, daniel, kernel-team, john.fastabend, kpsingh, brouer, dlxu,
	Song Liu

Add a benchmark to compare performance of
  1) uprobe;
  2) user program w/o args;
  3) user program w/ args;
  4) user program w/ args on random cpu.

Sample output:

./test_progs -t uprobe_vs_user_prog -v
test_uprobe_vs_user_prog:PASS:uprobe_vs_user_prog__open_and_load 0 nsec
test_uprobe_vs_user_prog:PASS:get_base_addr 0 nsec
test_uprobe_vs_user_prog:PASS:attach_uprobe 0 nsec
run_perf_test:PASS:uprobe 0 nsec
Each uprobe uses 1419 nanoseconds
run_perf_test:PASS:user_prog_no_args 0 nsec
Each user_prog_no_args uses 313 nanoseconds
run_perf_test:PASS:user_prog_with_args 0 nsec
Each user_prog_with_args uses 335 nanoseconds
run_perf_test:PASS:user_prog_with_args_on_cpu 0 nsec
Each user_prog_with_args_on_cpu uses 2821 nanoseconds
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 .../bpf/prog_tests/uprobe_vs_user_prog.c      | 101 ++++++++++++++++++
 .../selftests/bpf/progs/uprobe_vs_user_prog.c |  21 ++++
 2 files changed, 122 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe_vs_user_prog.c
 create mode 100644 tools/testing/selftests/bpf/progs/uprobe_vs_user_prog.c

diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_vs_user_prog.c b/tools/testing/selftests/bpf/prog_tests/uprobe_vs_user_prog.c
new file mode 100644
index 0000000000000..dadd7b56e69ec
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_vs_user_prog.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include "uprobe_vs_user_prog.skel.h"
+
+#define REPEAT_CNT 10000ULL
+
+static int duration;
+
+static noinline void uprobe_target(void)
+{
+	asm ("");
+}
+
+struct bpf_prog_test_run_attr attr;
+
+static void call_user_prog(void)
+{
+	bpf_prog_test_run_xattr(&attr);
+}
+
+static int numcpu;
+
+static void call_user_prog_on_cpu(void)
+{
+	static int cpu = 0;
+
+	attr.cpu_plus = cpu + 1;
+	bpf_prog_test_run_xattr(&attr);
+	cpu = (cpu + 1) % numcpu;
+}
+
+typedef void (__run_func)(void);
+
+static void run_perf_test(struct uprobe_vs_user_prog *skel,
+			  __run_func func, const char *name)
+{
+	__u64 start_time, total_time;
+	int i;
+
+	skel->bss->sum = 0;
+
+	start_time = time_get_ns();
+	for (i = 0; i < REPEAT_CNT; i++)
+		func();
+	total_time = time_get_ns() - start_time;
+
+	CHECK(skel->bss->sum != REPEAT_CNT, name,
+	      "missed %llu times\n", REPEAT_CNT - skel->bss->sum);
+	printf("Each %s uses %llu nanoseconds\n", name, total_time / REPEAT_CNT);
+}
+
+void test_uprobe_vs_user_prog(void)
+{
+	struct bpf_user_prog_args args = {};
+	struct uprobe_vs_user_prog *skel;
+	struct bpf_link *uprobe_link;
+	size_t uprobe_offset;
+	ssize_t base_addr;
+
+	skel = uprobe_vs_user_prog__open_and_load();
+
+	if (CHECK(!skel, "uprobe_vs_user_prog__open_and_load",
+		  "skeleton open_and_laod failed\n"))
+		return;
+
+	base_addr = get_base_addr();
+	if (CHECK(base_addr < 0, "get_base_addr",
+		  "failed to find base addr: %zd", base_addr))
+		return;
+	uprobe_offset = (size_t)&uprobe_target - base_addr;
+	uprobe_link = bpf_program__attach_uprobe(skel->progs.handle_uprobe,
+						 false /* retprobe */,
+						 0 /* self pid */,
+						 "/proc/self/exe",
+						 uprobe_offset);
+
+	if (CHECK(IS_ERR(uprobe_link), "attach_uprobe",
+		  "err %ld\n", PTR_ERR(uprobe_link)))
+		goto cleanup;
+	skel->links.handle_uprobe = uprobe_link;
+
+	run_perf_test(skel, uprobe_target, "uprobe");
+
+	attr.prog_fd = bpf_program__fd(skel->progs.user_prog);
+	run_perf_test(skel, call_user_prog, "user_prog_no_args");
+
+	attr.data_size_in = sizeof(args);
+	attr.data_in = &args;
+	run_perf_test(skel, call_user_prog, "user_prog_with_args");
+
+	numcpu = libbpf_num_possible_cpus();
+
+	if (numcpu <= 0)
+		goto cleanup;
+
+	run_perf_test(skel, call_user_prog_on_cpu,
+		      "user_prog_with_args_on_cpu");
+
+cleanup:
+	uprobe_vs_user_prog__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/uprobe_vs_user_prog.c b/tools/testing/selftests/bpf/progs/uprobe_vs_user_prog.c
new file mode 100644
index 0000000000000..8b327b7cee30d
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/uprobe_vs_user_prog.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2017 Facebook
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+volatile __u64 sum = 0;
+
+SEC("uprobe/func")
+int handle_uprobe(struct pt_regs *ctx)
+{
+	sum++;
+	return 0;
+}
+
+SEC("user")
+int user_prog(struct pt_regs *ctx)
+{
+	sum++;
+	return 0;
+}
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 1/5] bpf: introduce BPF_PROG_TYPE_USER
  2020-08-01  8:47 ` [PATCH bpf-next 1/5] bpf: " Song Liu
@ 2020-08-01 13:58   ` kernel test robot
  2020-08-01 15:21   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 43+ messages in thread
From: kernel test robot @ 2020-08-01 13:58 UTC (permalink / raw)
  To: Song Liu, linux-kernel, bpf, netdev
  Cc: kbuild-all, ast, daniel, kernel-team, john.fastabend, kpsingh,
	brouer, dlxu

[-- Attachment #1: Type: text/plain, Size: 1228 bytes --]

Hi Song,

I love your patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Song-Liu/introduce-BPF_PROG_TYPE_USER/20200801-165208
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: riscv-allyesconfig (attached as .config)
compiler: riscv64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=riscv 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   riscv64-linux-ld: kernel/bpf/syscall.o: in function `.LANCHOR1':
>> syscall.c:(.rodata+0xa50): undefined reference to `user_prog_ops'
   riscv64-linux-ld: kernel/bpf/verifier.o: in function `.LANCHOR1':
>> verifier.c:(.rodata+0x6b8): undefined reference to `user_verifier_ops'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 65167 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 1/5] bpf: introduce BPF_PROG_TYPE_USER
  2020-08-01  8:47 ` [PATCH bpf-next 1/5] bpf: " Song Liu
  2020-08-01 13:58   ` kernel test robot
@ 2020-08-01 15:21   ` kernel test robot
  2020-08-06 18:18   ` kernel test robot
  2020-08-06 18:18   ` [RFC PATCH] bpf: user_verifier_ops can be static kernel test robot
  3 siblings, 0 replies; 43+ messages in thread
From: kernel test robot @ 2020-08-01 15:21 UTC (permalink / raw)
  To: Song Liu, linux-kernel, bpf, netdev
  Cc: kbuild-all, ast, daniel, kernel-team, john.fastabend, kpsingh,
	brouer, dlxu

[-- Attachment #1: Type: text/plain, Size: 938 bytes --]

Hi Song,

I love your patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Song-Liu/introduce-BPF_PROG_TYPE_USER/20200801-165208
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: i386-randconfig-a001-20200731 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-14) 9.3.0
reproduce (this is a W=1 build):
        # save the attached .config to linux build tree
        make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> ld: kernel/bpf/syscall.o:(.rodata+0xcbc): undefined reference to `user_prog_ops'
>> ld: kernel/bpf/verifier.o:(.rodata+0x115c): undefined reference to `user_verifier_ops'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 35086 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 2/5] libbpf: support BPF_PROG_TYPE_USER programs
  2020-08-01  8:47 ` [PATCH bpf-next 2/5] libbpf: support BPF_PROG_TYPE_USER programs Song Liu
@ 2020-08-03  1:40   ` Andrii Nakryiko
  2020-08-03  4:21     ` Song Liu
  2020-08-04  1:18     ` Song Liu
  0 siblings, 2 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2020-08-03  1:40 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu

On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
>
> Add cpu_plus to bpf_prog_test_run_attr. Add BPF_PROG_SEC "user" for
> BPF_PROG_TYPE_USER programs.
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  tools/lib/bpf/bpf.c           | 1 +
>  tools/lib/bpf/bpf.h           | 3 +++
>  tools/lib/bpf/libbpf.c        | 1 +
>  tools/lib/bpf/libbpf_probes.c | 1 +
>  4 files changed, 6 insertions(+)
>
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index e1bdf214f75fe..b28c3daa9c270 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -693,6 +693,7 @@ int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr)
>         attr.test.ctx_size_in = test_attr->ctx_size_in;
>         attr.test.ctx_size_out = test_attr->ctx_size_out;
>         attr.test.repeat = test_attr->repeat;
> +       attr.test.cpu_plus = test_attr->cpu_plus;
>
>         ret = sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr));
>         test_attr->data_size_out = attr.test.data_size_out;
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index 6d367e01d05e9..0c799740df566 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -205,6 +205,9 @@ struct bpf_prog_test_run_attr {
>         void *ctx_out;      /* optional */
>         __u32 ctx_size_out; /* in: max length of ctx_out
>                              * out: length of cxt_out */
> +       __u32 cpu_plus;     /* specify which cpu to run the test with
> +                            * cpu_plus = cpu_id + 1.
> +                            * If cpu_plus = 0, run on current cpu */

We can't do this due to ABI guarantees. We'll have to add a new API
using OPTS arguments.

>  };
>
>  LIBBPF_API int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr);
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index b9f11f854985b..9ce175a486214 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -6922,6 +6922,7 @@ static const struct bpf_sec_def section_defs[] = {
>         BPF_PROG_SEC("lwt_out",                 BPF_PROG_TYPE_LWT_OUT),
>         BPF_PROG_SEC("lwt_xmit",                BPF_PROG_TYPE_LWT_XMIT),
>         BPF_PROG_SEC("lwt_seg6local",           BPF_PROG_TYPE_LWT_SEG6LOCAL),
> +       BPF_PROG_SEC("user",                    BPF_PROG_TYPE_USER),

let's do "user/" for consistency with most other prog types (and nice
separation between prog type and custom user name)


>         BPF_APROG_SEC("cgroup_skb/ingress",     BPF_PROG_TYPE_CGROUP_SKB,
>                                                 BPF_CGROUP_INET_INGRESS),
>         BPF_APROG_SEC("cgroup_skb/egress",      BPF_PROG_TYPE_CGROUP_SKB,
> diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
> index 5a3d3f0784081..163013084000e 100644
> --- a/tools/lib/bpf/libbpf_probes.c
> +++ b/tools/lib/bpf/libbpf_probes.c
> @@ -112,6 +112,7 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns,
>         case BPF_PROG_TYPE_STRUCT_OPS:
>         case BPF_PROG_TYPE_EXT:
>         case BPF_PROG_TYPE_LSM:
> +       case BPF_PROG_TYPE_USER:
>         default:
>                 break;
>         }
> --
> 2.24.1
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 3/5] selftests/bpf: add selftest for BPF_PROG_TYPE_USER
  2020-08-01  8:47 ` [PATCH bpf-next 3/5] selftests/bpf: add selftest for BPF_PROG_TYPE_USER Song Liu
@ 2020-08-03  1:43   ` Andrii Nakryiko
  2020-08-03  4:33     ` Song Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2020-08-03  1:43 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu

On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
>
> This test checks the correctness of BPF_PROG_TYPE_USER program, including:
> running on the right cpu, passing in correct args, returning retval, and
> being able to call bpf_get_stack|stackid.
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  .../selftests/bpf/prog_tests/user_prog.c      | 52 +++++++++++++++++
>  tools/testing/selftests/bpf/progs/user_prog.c | 56 +++++++++++++++++++
>  2 files changed, 108 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/user_prog.c
>  create mode 100644 tools/testing/selftests/bpf/progs/user_prog.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/user_prog.c b/tools/testing/selftests/bpf/prog_tests/user_prog.c
> new file mode 100644
> index 0000000000000..416707b3bff01
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/user_prog.c
> @@ -0,0 +1,52 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2020 Facebook */
> +#include <test_progs.h>
> +#include "user_prog.skel.h"
> +
> +static int duration;
> +
> +void test_user_prog(void)
> +{
> +       struct bpf_user_prog_args args = {{0, 1, 2, 3, 4}};
> +       struct bpf_prog_test_run_attr attr = {};
> +       struct user_prog *skel;
> +       int i, numcpu, ret;
> +
> +       skel = user_prog__open_and_load();
> +
> +       if (CHECK(!skel, "user_prog__open_and_load",
> +                 "skeleton open_and_laod failed\n"))
> +               return;
> +
> +       numcpu = libbpf_num_possible_cpus();

nit: possible doesn't mean online right now, so it will fail on
offline or non-present CPUs

> +
> +       attr.prog_fd = bpf_program__fd(skel->progs.user_func);
> +       attr.data_size_in = sizeof(args);
> +       attr.data_in = &args;
> +
> +       /* start from -1, so we test cpu_plus == 0 */
> +       for (i = -1; i < numcpu; i++) {
> +               args.args[0] = i + 1;
> +               attr.cpu_plus = i + 1;
> +               ret = bpf_prog_test_run_xattr(&attr);
> +               CHECK(ret, "bpf_prog_test_run_xattr", "returns error\n");
> +
> +               /* skip two tests for i == -1 */
> +               if (i == -1)
> +                       continue;
> +               CHECK(attr.retval != i + 2, "bpf_prog_test_run_xattr",
> +                     "doesn't get expected retval\n");
> +               CHECK(skel->data->sum != 11 + i, "user_prog_args_test",
> +                     "sum of args doesn't match\n");
> +       }
> +
> +       CHECK(skel->data->cpu_match == 0, "cpu_match_test", "failed\n");
> +       CHECK(skel->bss->get_stack_success != numcpu + 1, "test_bpf_get_stack",
> +             "failed on %d cores\n", numcpu - skel->bss->get_stack_success);
> +       CHECK(skel->bss->get_stackid_success != numcpu + 1,
> +             "test_bpf_get_stackid",
> +             "failed on %d cores\n",
> +             numcpu + 1 - skel->bss->get_stackid_success);
> +
> +       user_prog__destroy(skel);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/user_prog.c b/tools/testing/selftests/bpf/progs/user_prog.c
> new file mode 100644
> index 0000000000000..cf320e97f107a
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/user_prog.c
> @@ -0,0 +1,56 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2020 Facebook */
> +#include "vmlinux.h"
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +#ifndef PERF_MAX_STACK_DEPTH
> +#define PERF_MAX_STACK_DEPTH         127
> +#endif
> +
> +typedef __u64 stack_trace_t[PERF_MAX_STACK_DEPTH];
> +
> +struct {
> +       __uint(type, BPF_MAP_TYPE_STACK_TRACE);
> +       __uint(max_entries, 16384);
> +       __uint(key_size, sizeof(__u32));
> +       __uint(value_size, sizeof(stack_trace_t));
> +} stackmap SEC(".maps");
> +
> +volatile int cpu_match = 1;
> +volatile __u64 sum = 1;
> +volatile int get_stack_success = 0;
> +volatile int get_stackid_success = 0;
> +volatile __u64 stacktrace[PERF_MAX_STACK_DEPTH];

nit: no need for volatile for non-static variables

> +
> +SEC("user")
> +int user_func(struct bpf_user_prog_ctx *ctx)

If you put args in bpf_user_prog_ctx as a first field, you should be
able to re-use the BPF_PROG macro to access those arguments in a more
user-friendly way.

> +{
> +       int cpu = bpf_get_smp_processor_id();
> +       __u32 key = cpu;
> +       long stackid, err;
> +
> +       /* check the program runs on the right cpu */
> +       if (ctx->args[0] && ctx->args[0] != cpu + 1)
> +               cpu_match = 0;
> +
> +       /* check the sum of arguments are correct */
> +       sum = ctx->args[0] + ctx->args[1] + ctx->args[2] +
> +               ctx->args[3] + ctx->args[4];
> +
> +       /* check bpf_get_stackid works */
> +       stackid = bpf_get_stackid(ctx, &stackmap, 0);
> +       if (stackid >= 0)
> +               get_stackid_success++;
> +
> +       /* check bpf_get_stack works */
> +       err = bpf_get_stack(ctx, (void *)stacktrace,
> +                           PERF_MAX_STACK_DEPTH * sizeof(__u64),
> +                           BPF_F_USER_STACK);
> +       if (err >= 0)
> +               get_stack_success++;
> +
> +       return cpu + 2;
> +}
> --
> 2.24.1
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 4/5] selftests/bpf: move two functions to test_progs.c
  2020-08-01  8:47 ` [PATCH bpf-next 4/5] selftests/bpf: move two functions to test_progs.c Song Liu
@ 2020-08-03  1:46   ` Andrii Nakryiko
  2020-08-03  4:34     ` Song Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2020-08-03  1:46 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu

On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
>
> Move time_get_ns() and get_base_addr() to test_progs.c, so they can be
> used in other tests.
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  .../selftests/bpf/prog_tests/attach_probe.c   | 21 -------------
>  .../selftests/bpf/prog_tests/test_overhead.c  |  8 -----
>  tools/testing/selftests/bpf/test_progs.c      | 30 +++++++++++++++++++
>  tools/testing/selftests/bpf/test_progs.h      |  2 ++
>  4 files changed, 32 insertions(+), 29 deletions(-)
>

[...]

>  static int test_task_rename(const char *prog)
>  {
>         int i, fd, duration = 0, err;
> diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
> index b1e4dadacd9b4..c9e6a5ad5b9a4 100644
> --- a/tools/testing/selftests/bpf/test_progs.c
> +++ b/tools/testing/selftests/bpf/test_progs.c
> @@ -622,6 +622,36 @@ int cd_flavor_subdir(const char *exec_name)
>         return chdir(flavor);
>  }
>
> +__u64 time_get_ns(void)
> +{

I'd try to avoid adding stuff to test_progs.c. There is generic
testing_helpers.c, maybe let's put this there?

> +       struct timespec ts;
> +
> +       clock_gettime(CLOCK_MONOTONIC, &ts);
> +       return ts.tv_sec * 1000000000ull + ts.tv_nsec;
> +}
> +
> +ssize_t get_base_addr(void)
> +{

This would definitely be better in trace_helpers.c, though.

> +       size_t start, offset;
> +       char buf[256];
> +       FILE *f;
> +

[...]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
  2020-08-01  8:47 ` [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog Song Liu
@ 2020-08-03  1:51   ` Andrii Nakryiko
  2020-08-03  4:47     ` Song Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2020-08-03  1:51 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu

On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
>
> Add a benchmark to compare performance of
>   1) uprobe;
>   2) user program w/o args;
>   3) user program w/ args;
>   4) user program w/ args on random cpu.
>

Can you please add it to the existing benchmark runner instead, e.g.,
along the other bench_trigger benchmarks? No need to re-implement
benchmark setup. And also that would also allow to compare existing
ways of cheaply triggering a program vs this new _USER program?

If the performance is not significantly better than other ways, do you
think it still makes sense to add a new BPF program type? I think
triggering KPROBE/TRACEPOINT from bpf_prog_test_run() would be very
nice, maybe it's possible to add that instead of a new program type?
Either way, let's see comparison with other program triggering
mechanisms first.


> Sample output:
>
> ./test_progs -t uprobe_vs_user_prog -v
> test_uprobe_vs_user_prog:PASS:uprobe_vs_user_prog__open_and_load 0 nsec
> test_uprobe_vs_user_prog:PASS:get_base_addr 0 nsec
> test_uprobe_vs_user_prog:PASS:attach_uprobe 0 nsec
> run_perf_test:PASS:uprobe 0 nsec
> Each uprobe uses 1419 nanoseconds
> run_perf_test:PASS:user_prog_no_args 0 nsec
> Each user_prog_no_args uses 313 nanoseconds
> run_perf_test:PASS:user_prog_with_args 0 nsec
> Each user_prog_with_args uses 335 nanoseconds
> run_perf_test:PASS:user_prog_with_args_on_cpu 0 nsec
> Each user_prog_with_args_on_cpu uses 2821 nanoseconds
> Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  .../bpf/prog_tests/uprobe_vs_user_prog.c      | 101 ++++++++++++++++++
>  .../selftests/bpf/progs/uprobe_vs_user_prog.c |  21 ++++
>  2 files changed, 122 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe_vs_user_prog.c
>  create mode 100644 tools/testing/selftests/bpf/progs/uprobe_vs_user_prog.c
>

[...]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 2/5] libbpf: support BPF_PROG_TYPE_USER programs
  2020-08-03  1:40   ` Andrii Nakryiko
@ 2020-08-03  4:21     ` Song Liu
  2020-08-03  5:05       ` Andrii Nakryiko
  2020-08-04  1:18     ` Song Liu
  1 sibling, 1 reply; 43+ messages in thread
From: Song Liu @ 2020-08-03  4:21 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu



> On Aug 2, 2020, at 6:40 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
>> 
>> Add cpu_plus to bpf_prog_test_run_attr. Add BPF_PROG_SEC "user" for
>> BPF_PROG_TYPE_USER programs.
>> 
>> Signed-off-by: Song Liu <songliubraving@fb.com>
>> ---
>> tools/lib/bpf/bpf.c           | 1 +
>> tools/lib/bpf/bpf.h           | 3 +++
>> tools/lib/bpf/libbpf.c        | 1 +
>> tools/lib/bpf/libbpf_probes.c | 1 +
>> 4 files changed, 6 insertions(+)
>> 
>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>> index e1bdf214f75fe..b28c3daa9c270 100644
>> --- a/tools/lib/bpf/bpf.c
>> +++ b/tools/lib/bpf/bpf.c
>> @@ -693,6 +693,7 @@ int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr)
>>        attr.test.ctx_size_in = test_attr->ctx_size_in;
>>        attr.test.ctx_size_out = test_attr->ctx_size_out;
>>        attr.test.repeat = test_attr->repeat;
>> +       attr.test.cpu_plus = test_attr->cpu_plus;
>> 
>>        ret = sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr));
>>        test_attr->data_size_out = attr.test.data_size_out;
>> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
>> index 6d367e01d05e9..0c799740df566 100644
>> --- a/tools/lib/bpf/bpf.h
>> +++ b/tools/lib/bpf/bpf.h
>> @@ -205,6 +205,9 @@ struct bpf_prog_test_run_attr {
>>        void *ctx_out;      /* optional */
>>        __u32 ctx_size_out; /* in: max length of ctx_out
>>                             * out: length of cxt_out */
>> +       __u32 cpu_plus;     /* specify which cpu to run the test with
>> +                            * cpu_plus = cpu_id + 1.
>> +                            * If cpu_plus = 0, run on current cpu */
> 
> We can't do this due to ABI guarantees. We'll have to add a new API
> using OPTS arguments.

To make sure I understand this correctly, the concern is when we compile
the binary with one version of libbpf and run it with libbpf.so of a 
different version, right? 

Thanks,
Song

> 
>> };
>> 
>> LIBBPF_API int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr);
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index b9f11f854985b..9ce175a486214 100644
>> --- a/tools/lib/bpf/libbpf.c
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -6922,6 +6922,7 @@ static const struct bpf_sec_def section_defs[] = {
>>        BPF_PROG_SEC("lwt_out",                 BPF_PROG_TYPE_LWT_OUT),
>>        BPF_PROG_SEC("lwt_xmit",                BPF_PROG_TYPE_LWT_XMIT),
>>        BPF_PROG_SEC("lwt_seg6local",           BPF_PROG_TYPE_LWT_SEG6LOCAL),
>> +       BPF_PROG_SEC("user",                    BPF_PROG_TYPE_USER),
> 
> let's do "user/" for consistency with most other prog types (and nice
> separation between prog type and custom user name)

Will update. 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 3/5] selftests/bpf: add selftest for BPF_PROG_TYPE_USER
  2020-08-03  1:43   ` Andrii Nakryiko
@ 2020-08-03  4:33     ` Song Liu
  2020-08-03  5:07       ` Andrii Nakryiko
  0 siblings, 1 reply; 43+ messages in thread
From: Song Liu @ 2020-08-03  4:33 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu



> On Aug 2, 2020, at 6:43 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
>> 
>> This test checks the correctness of BPF_PROG_TYPE_USER program, including:
>> running on the right cpu, passing in correct args, returning retval, and
>> being able to call bpf_get_stack|stackid.
>> 
>> Signed-off-by: Song Liu <songliubraving@fb.com>
>> ---
>> .../selftests/bpf/prog_tests/user_prog.c      | 52 +++++++++++++++++
>> tools/testing/selftests/bpf/progs/user_prog.c | 56 +++++++++++++++++++
>> 2 files changed, 108 insertions(+)
>> create mode 100644 tools/testing/selftests/bpf/prog_tests/user_prog.c
>> create mode 100644 tools/testing/selftests/bpf/progs/user_prog.c
>> 
>> diff --git a/tools/testing/selftests/bpf/prog_tests/user_prog.c b/tools/testing/selftests/bpf/prog_tests/user_prog.c
>> new file mode 100644
>> index 0000000000000..416707b3bff01
>> --- /dev/null
>> +++ b/tools/testing/selftests/bpf/prog_tests/user_prog.c
>> @@ -0,0 +1,52 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/* Copyright (c) 2020 Facebook */
>> +#include <test_progs.h>
>> +#include "user_prog.skel.h"
>> +
>> +static int duration;
>> +
>> +void test_user_prog(void)
>> +{
>> +       struct bpf_user_prog_args args = {{0, 1, 2, 3, 4}};
>> +       struct bpf_prog_test_run_attr attr = {};
>> +       struct user_prog *skel;
>> +       int i, numcpu, ret;
>> +
>> +       skel = user_prog__open_and_load();
>> +
>> +       if (CHECK(!skel, "user_prog__open_and_load",
>> +                 "skeleton open_and_laod failed\n"))
>> +               return;
>> +
>> +       numcpu = libbpf_num_possible_cpus();
> 
> nit: possible doesn't mean online right now, so it will fail on
> offline or non-present CPUs

Just found parse_cpu_mask_file(), will use it to fix this. 

[...]

>> +
>> +volatile int cpu_match = 1;
>> +volatile __u64 sum = 1;
>> +volatile int get_stack_success = 0;
>> +volatile int get_stackid_success = 0;
>> +volatile __u64 stacktrace[PERF_MAX_STACK_DEPTH];
> 
> nit: no need for volatile for non-static variables
> 
>> +
>> +SEC("user")
>> +int user_func(struct bpf_user_prog_ctx *ctx)
> 
> If you put args in bpf_user_prog_ctx as a first field, you should be
> able to re-use the BPF_PROG macro to access those arguments in a more
> user-friendly way.

I am not sure I am following here. Do you mean something like:

struct bpf_user_prog_ctx {
        __u64 args[BPF_USER_PROG_MAX_ARGS];
        struct pt_regs *regs;
};

(swap args and regs)? 

Thanks,
Song



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 4/5] selftests/bpf: move two functions to test_progs.c
  2020-08-03  1:46   ` Andrii Nakryiko
@ 2020-08-03  4:34     ` Song Liu
  0 siblings, 0 replies; 43+ messages in thread
From: Song Liu @ 2020-08-03  4:34 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu



> On Aug 2, 2020, at 6:46 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
>> 
>> Move time_get_ns() and get_base_addr() to test_progs.c, so they can be
>> used in other tests.
>> 
>> Signed-off-by: Song Liu <songliubraving@fb.com>
>> ---
>> .../selftests/bpf/prog_tests/attach_probe.c   | 21 -------------
>> .../selftests/bpf/prog_tests/test_overhead.c  |  8 -----
>> tools/testing/selftests/bpf/test_progs.c      | 30 +++++++++++++++++++
>> tools/testing/selftests/bpf/test_progs.h      |  2 ++
>> 4 files changed, 32 insertions(+), 29 deletions(-)
>> 
> 
> [...]
> 
>> static int test_task_rename(const char *prog)
>> {
>>        int i, fd, duration = 0, err;
>> diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
>> index b1e4dadacd9b4..c9e6a5ad5b9a4 100644
>> --- a/tools/testing/selftests/bpf/test_progs.c
>> +++ b/tools/testing/selftests/bpf/test_progs.c
>> @@ -622,6 +622,36 @@ int cd_flavor_subdir(const char *exec_name)
>>        return chdir(flavor);
>> }
>> 
>> +__u64 time_get_ns(void)
>> +{
> 
> I'd try to avoid adding stuff to test_progs.c. There is generic
> testing_helpers.c, maybe let's put this there?
> 
>> +       struct timespec ts;
>> +
>> +       clock_gettime(CLOCK_MONOTONIC, &ts);
>> +       return ts.tv_sec * 1000000000ull + ts.tv_nsec;
>> +}
>> +
>> +ssize_t get_base_addr(void)
>> +{
> 
> This would definitely be better in trace_helpers.c, though.

Will update. 

Thanks,
Song

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
  2020-08-03  1:51   ` Andrii Nakryiko
@ 2020-08-03  4:47     ` Song Liu
  2020-08-03  5:10       ` Andrii Nakryiko
  0 siblings, 1 reply; 43+ messages in thread
From: Song Liu @ 2020-08-03  4:47 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu


> On Aug 2, 2020, at 6:51 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
>> 
>> Add a benchmark to compare performance of
>>  1) uprobe;
>>  2) user program w/o args;
>>  3) user program w/ args;
>>  4) user program w/ args on random cpu.
>> 
> 
> Can you please add it to the existing benchmark runner instead, e.g.,
> along the other bench_trigger benchmarks? No need to re-implement
> benchmark setup. And also that would also allow to compare existing
> ways of cheaply triggering a program vs this new _USER program?

Will try. 

> 
> If the performance is not significantly better than other ways, do you
> think it still makes sense to add a new BPF program type? I think
> triggering KPROBE/TRACEPOINT from bpf_prog_test_run() would be very
> nice, maybe it's possible to add that instead of a new program type?
> Either way, let's see comparison with other program triggering
> mechanisms first.

Triggering KPROBE and TRACEPOINT from bpf_prog_test_run() will be useful. 
But I don't think they can be used instead of user program, for a couple
reasons. First, KPROBE/TRACEPOINT may be triggered by other programs 
running in the system, so user will have to filter those noise out in
each program. Second, it is not easy to specify CPU for KPROBE/TRACEPOINT,
while this feature could be useful in many cases, e.g. get stack trace 
on a given CPU. 

Thanks,
Song

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 2/5] libbpf: support BPF_PROG_TYPE_USER programs
  2020-08-03  4:21     ` Song Liu
@ 2020-08-03  5:05       ` Andrii Nakryiko
  0 siblings, 0 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2020-08-03  5:05 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu

On Sun, Aug 2, 2020 at 9:21 PM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Aug 2, 2020, at 6:40 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
> >>
> >> Add cpu_plus to bpf_prog_test_run_attr. Add BPF_PROG_SEC "user" for
> >> BPF_PROG_TYPE_USER programs.
> >>
> >> Signed-off-by: Song Liu <songliubraving@fb.com>
> >> ---
> >> tools/lib/bpf/bpf.c           | 1 +
> >> tools/lib/bpf/bpf.h           | 3 +++
> >> tools/lib/bpf/libbpf.c        | 1 +
> >> tools/lib/bpf/libbpf_probes.c | 1 +
> >> 4 files changed, 6 insertions(+)
> >>
> >> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> >> index e1bdf214f75fe..b28c3daa9c270 100644
> >> --- a/tools/lib/bpf/bpf.c
> >> +++ b/tools/lib/bpf/bpf.c
> >> @@ -693,6 +693,7 @@ int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr)
> >>        attr.test.ctx_size_in = test_attr->ctx_size_in;
> >>        attr.test.ctx_size_out = test_attr->ctx_size_out;
> >>        attr.test.repeat = test_attr->repeat;
> >> +       attr.test.cpu_plus = test_attr->cpu_plus;
> >>
> >>        ret = sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr));
> >>        test_attr->data_size_out = attr.test.data_size_out;
> >> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> >> index 6d367e01d05e9..0c799740df566 100644
> >> --- a/tools/lib/bpf/bpf.h
> >> +++ b/tools/lib/bpf/bpf.h
> >> @@ -205,6 +205,9 @@ struct bpf_prog_test_run_attr {
> >>        void *ctx_out;      /* optional */
> >>        __u32 ctx_size_out; /* in: max length of ctx_out
> >>                             * out: length of cxt_out */
> >> +       __u32 cpu_plus;     /* specify which cpu to run the test with
> >> +                            * cpu_plus = cpu_id + 1.
> >> +                            * If cpu_plus = 0, run on current cpu */
> >
> > We can't do this due to ABI guarantees. We'll have to add a new API
> > using OPTS arguments.
>
> To make sure I understand this correctly, the concern is when we compile
> the binary with one version of libbpf and run it with libbpf.so of a
> different version, right?
>

yep, exactly

> Thanks,
> Song
>
> >
> >> };
> >>
> >> LIBBPF_API int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr);
> >> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> >> index b9f11f854985b..9ce175a486214 100644
> >> --- a/tools/lib/bpf/libbpf.c
> >> +++ b/tools/lib/bpf/libbpf.c
> >> @@ -6922,6 +6922,7 @@ static const struct bpf_sec_def section_defs[] = {
> >>        BPF_PROG_SEC("lwt_out",                 BPF_PROG_TYPE_LWT_OUT),
> >>        BPF_PROG_SEC("lwt_xmit",                BPF_PROG_TYPE_LWT_XMIT),
> >>        BPF_PROG_SEC("lwt_seg6local",           BPF_PROG_TYPE_LWT_SEG6LOCAL),
> >> +       BPF_PROG_SEC("user",                    BPF_PROG_TYPE_USER),
> >
> > let's do "user/" for consistency with most other prog types (and nice
> > separation between prog type and custom user name)
>
> Will update.
>

thanks!

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 3/5] selftests/bpf: add selftest for BPF_PROG_TYPE_USER
  2020-08-03  4:33     ` Song Liu
@ 2020-08-03  5:07       ` Andrii Nakryiko
  0 siblings, 0 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2020-08-03  5:07 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu

On Sun, Aug 2, 2020 at 9:33 PM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Aug 2, 2020, at 6:43 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
> >>
> >> This test checks the correctness of BPF_PROG_TYPE_USER program, including:
> >> running on the right cpu, passing in correct args, returning retval, and
> >> being able to call bpf_get_stack|stackid.
> >>
> >> Signed-off-by: Song Liu <songliubraving@fb.com>
> >> ---
> >> .../selftests/bpf/prog_tests/user_prog.c      | 52 +++++++++++++++++
> >> tools/testing/selftests/bpf/progs/user_prog.c | 56 +++++++++++++++++++
> >> 2 files changed, 108 insertions(+)
> >> create mode 100644 tools/testing/selftests/bpf/prog_tests/user_prog.c
> >> create mode 100644 tools/testing/selftests/bpf/progs/user_prog.c
> >>
> >> diff --git a/tools/testing/selftests/bpf/prog_tests/user_prog.c b/tools/testing/selftests/bpf/prog_tests/user_prog.c
> >> new file mode 100644
> >> index 0000000000000..416707b3bff01
> >> --- /dev/null
> >> +++ b/tools/testing/selftests/bpf/prog_tests/user_prog.c
> >> @@ -0,0 +1,52 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +/* Copyright (c) 2020 Facebook */
> >> +#include <test_progs.h>
> >> +#include "user_prog.skel.h"
> >> +
> >> +static int duration;
> >> +
> >> +void test_user_prog(void)
> >> +{
> >> +       struct bpf_user_prog_args args = {{0, 1, 2, 3, 4}};
> >> +       struct bpf_prog_test_run_attr attr = {};
> >> +       struct user_prog *skel;
> >> +       int i, numcpu, ret;
> >> +
> >> +       skel = user_prog__open_and_load();
> >> +
> >> +       if (CHECK(!skel, "user_prog__open_and_load",
> >> +                 "skeleton open_and_laod failed\n"))
> >> +               return;
> >> +
> >> +       numcpu = libbpf_num_possible_cpus();
> >
> > nit: possible doesn't mean online right now, so it will fail on
> > offline or non-present CPUs
>
> Just found parse_cpu_mask_file(), will use it to fix this.
>
> [...]
>
> >> +
> >> +volatile int cpu_match = 1;
> >> +volatile __u64 sum = 1;
> >> +volatile int get_stack_success = 0;
> >> +volatile int get_stackid_success = 0;
> >> +volatile __u64 stacktrace[PERF_MAX_STACK_DEPTH];
> >
> > nit: no need for volatile for non-static variables
> >
> >> +
> >> +SEC("user")
> >> +int user_func(struct bpf_user_prog_ctx *ctx)
> >
> > If you put args in bpf_user_prog_ctx as a first field, you should be
> > able to re-use the BPF_PROG macro to access those arguments in a more
> > user-friendly way.
>
> I am not sure I am following here. Do you mean something like:
>
> struct bpf_user_prog_ctx {
>         __u64 args[BPF_USER_PROG_MAX_ARGS];
>         struct pt_regs *regs;
> };
>
> (swap args and regs)?
>

Yes, BPF_PROG assumes that context is a plain u64[5] array, so if you
put args at the beginning, it will work nicely with BPF_PROG.

> Thanks,
> Song
>
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
  2020-08-03  4:47     ` Song Liu
@ 2020-08-03  5:10       ` Andrii Nakryiko
  2020-08-04 20:54         ` Song Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2020-08-03  5:10 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu

On Sun, Aug 2, 2020 at 9:47 PM Song Liu <songliubraving@fb.com> wrote:
>
>
> > On Aug 2, 2020, at 6:51 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
> >>
> >> Add a benchmark to compare performance of
> >>  1) uprobe;
> >>  2) user program w/o args;
> >>  3) user program w/ args;
> >>  4) user program w/ args on random cpu.
> >>
> >
> > Can you please add it to the existing benchmark runner instead, e.g.,
> > along the other bench_trigger benchmarks? No need to re-implement
> > benchmark setup. And also that would also allow to compare existing
> > ways of cheaply triggering a program vs this new _USER program?
>
> Will try.
>
> >
> > If the performance is not significantly better than other ways, do you
> > think it still makes sense to add a new BPF program type? I think
> > triggering KPROBE/TRACEPOINT from bpf_prog_test_run() would be very
> > nice, maybe it's possible to add that instead of a new program type?
> > Either way, let's see comparison with other program triggering
> > mechanisms first.
>
> Triggering KPROBE and TRACEPOINT from bpf_prog_test_run() will be useful.
> But I don't think they can be used instead of user program, for a couple
> reasons. First, KPROBE/TRACEPOINT may be triggered by other programs
> running in the system, so user will have to filter those noise out in
> each program. Second, it is not easy to specify CPU for KPROBE/TRACEPOINT,
> while this feature could be useful in many cases, e.g. get stack trace
> on a given CPU.
>

Right, it's not as convenient with KPROBE/TRACEPOINT as with the USER
program you've added specifically with that feature in mind. But if
you pin user-space thread on the needed CPU and trigger kprobe/tp,
then you'll get what you want. As for the "noise", see how
bench_trigger() deals with that: it records thread ID and filters
everything not matching. You can do the same with CPU ID. It's not as
automatic as with a special BPF program type, but still pretty simple,
which is why I'm still deciding (for myself) whether USER program type
is necessary :)


> Thanks,
> Song

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 2/5] libbpf: support BPF_PROG_TYPE_USER programs
  2020-08-03  1:40   ` Andrii Nakryiko
  2020-08-03  4:21     ` Song Liu
@ 2020-08-04  1:18     ` Song Liu
  2020-08-05  1:38       ` Andrii Nakryiko
  1 sibling, 1 reply; 43+ messages in thread
From: Song Liu @ 2020-08-04  1:18 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu



> On Aug 2, 2020, at 6:40 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
>> 

[...]

> 
>> };
>> 
>> LIBBPF_API int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr);
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index b9f11f854985b..9ce175a486214 100644
>> --- a/tools/lib/bpf/libbpf.c
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -6922,6 +6922,7 @@ static const struct bpf_sec_def section_defs[] = {
>>        BPF_PROG_SEC("lwt_out",                 BPF_PROG_TYPE_LWT_OUT),
>>        BPF_PROG_SEC("lwt_xmit",                BPF_PROG_TYPE_LWT_XMIT),
>>        BPF_PROG_SEC("lwt_seg6local",           BPF_PROG_TYPE_LWT_SEG6LOCAL),
>> +       BPF_PROG_SEC("user",                    BPF_PROG_TYPE_USER),
> 
> let's do "user/" for consistency with most other prog types (and nice
> separation between prog type and custom user name)

About "user" vs. "user/", I still think "user" is better. 

Unlike kprobe and tracepoint, user prog doesn't use the part after "/". 
This is similar to "perf_event" for BPF_PROG_TYPE_PERF_EVENT, "xdl" for 
BPF_PROG_TYPE_XDP, etc. If we specify "user" here, "user/" and "user/xxx" 
would also work. However, if we specify "user/" here, programs that used 
"user" by accident will fail to load, with a message like:

	libbpf: failed to load program 'user'

which is confusing. 

Thanks,
Song

[...]


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
  2020-08-03  5:10       ` Andrii Nakryiko
@ 2020-08-04 20:54         ` Song Liu
  2020-08-05  1:52           ` Andrii Nakryiko
  0 siblings, 1 reply; 43+ messages in thread
From: Song Liu @ 2020-08-04 20:54 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu



> On Aug 2, 2020, at 10:10 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Sun, Aug 2, 2020 at 9:47 PM Song Liu <songliubraving@fb.com> wrote:
>> 
>> 
>>> On Aug 2, 2020, at 6:51 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>> 
>>> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
>>>> 
>>>> Add a benchmark to compare performance of
>>>> 1) uprobe;
>>>> 2) user program w/o args;
>>>> 3) user program w/ args;
>>>> 4) user program w/ args on random cpu.
>>>> 
>>> 
>>> Can you please add it to the existing benchmark runner instead, e.g.,
>>> along the other bench_trigger benchmarks? No need to re-implement
>>> benchmark setup. And also that would also allow to compare existing
>>> ways of cheaply triggering a program vs this new _USER program?
>> 
>> Will try.
>> 
>>> 
>>> If the performance is not significantly better than other ways, do you
>>> think it still makes sense to add a new BPF program type? I think
>>> triggering KPROBE/TRACEPOINT from bpf_prog_test_run() would be very
>>> nice, maybe it's possible to add that instead of a new program type?
>>> Either way, let's see comparison with other program triggering
>>> mechanisms first.
>> 
>> Triggering KPROBE and TRACEPOINT from bpf_prog_test_run() will be useful.
>> But I don't think they can be used instead of user program, for a couple
>> reasons. First, KPROBE/TRACEPOINT may be triggered by other programs
>> running in the system, so user will have to filter those noise out in
>> each program. Second, it is not easy to specify CPU for KPROBE/TRACEPOINT,
>> while this feature could be useful in many cases, e.g. get stack trace
>> on a given CPU.
>> 
> 
> Right, it's not as convenient with KPROBE/TRACEPOINT as with the USER
> program you've added specifically with that feature in mind. But if
> you pin user-space thread on the needed CPU and trigger kprobe/tp,
> then you'll get what you want. As for the "noise", see how
> bench_trigger() deals with that: it records thread ID and filters
> everything not matching. You can do the same with CPU ID. It's not as
> automatic as with a special BPF program type, but still pretty simple,
> which is why I'm still deciding (for myself) whether USER program type
> is necessary :)

Here are some bench_trigger numbers:

base      :    1.698 ± 0.001M/s
tp        :    1.477 ± 0.001M/s
rawtp     :    1.567 ± 0.001M/s
kprobe    :    1.431 ± 0.000M/s
fentry    :    1.691 ± 0.000M/s
fmodret   :    1.654 ± 0.000M/s
user      :    1.253 ± 0.000M/s
fentry-on-cpu:    0.022 ± 0.011M/s
user-on-cpu:    0.315 ± 0.001M/s

The two "on-cpu" tests run the program on a different CPU (see the patch
at the end). 

"user" is about 25% slower than "fentry". I think this is mostly because 
getpgid() is a faster syscall than bpf(BPF_TEST_RUN). 

"user-on-cpu" is more than 10x faster than "fentry-on-cpu", because IPI
is way faster than moving the process (via sched_setaffinity). 

For use cases that we would like to call BPF program on specific CPU, 
triggering it via IPI is a lot faster. 

Thanks,
Song


========================== 8< ==========================

diff --git c/tools/testing/selftests/bpf/bench.c w/tools/testing/selftests/bpf/bench.c
index 944ad4721c83c..5394a1d2dfd21 100644
--- c/tools/testing/selftests/bpf/bench.c
+++ w/tools/testing/selftests/bpf/bench.c
@@ -317,7 +317,10 @@ extern const struct bench bench_trig_tp;
 extern const struct bench bench_trig_rawtp;
 extern const struct bench bench_trig_kprobe;
 extern const struct bench bench_trig_fentry;
+extern const struct bench bench_trig_fentry_on_cpu;
 extern const struct bench bench_trig_fmodret;
+extern const struct bench bench_trig_user;
+extern const struct bench bench_trig_user_on_cpu;
 extern const struct bench bench_rb_libbpf;
 extern const struct bench bench_rb_custom;
 extern const struct bench bench_pb_libbpf;
@@ -338,7 +341,10 @@ static const struct bench *benchs[] = {
        &bench_trig_rawtp,
        &bench_trig_kprobe,
        &bench_trig_fentry,
+       &bench_trig_fentry_on_cpu,
        &bench_trig_fmodret,
+       &bench_trig_user,
+       &bench_trig_user_on_cpu,
        &bench_rb_libbpf,
        &bench_rb_custom,
        &bench_pb_libbpf,
@@ -462,4 +468,3 @@ int main(int argc, char **argv)

        return 0;
 }
-
diff --git c/tools/testing/selftests/bpf/benchs/bench_trigger.c w/tools/testing/selftests/bpf/benchs/bench_trigger.c
index 49c22832f2169..a1ebaebf6070c 100644
--- c/tools/testing/selftests/bpf/benchs/bench_trigger.c
+++ w/tools/testing/selftests/bpf/benchs/bench_trigger.c
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2020 Facebook */
+#define _GNU_SOURCE
+#include <sched.h>
 #include "bench.h"
 #include "trigger_bench.skel.h"

@@ -39,6 +41,22 @@ static void *trigger_producer(void *input)
        return NULL;
 }

+static void *trigger_on_cpu_producer(void *input)
+{
+       cpu_set_t set;
+       int i = 0, nr_cpu;
+
+       nr_cpu = libbpf_num_possible_cpus();
+       while (true) {
+               CPU_ZERO(&set);
+               CPU_SET(i, &set);
+               sched_setaffinity(0, sizeof(set), &set);
+               (void)syscall(__NR_getpgid);
+               i = (i + 1) % nr_cpu;
+       }
+       return NULL;
+}
+
 static void trigger_measure(struct bench_res *res)
 {
        res->hits = atomic_swap(&ctx.skel->bss->hits, 0);
@@ -96,6 +114,39 @@ static void trigger_fmodret_setup()
        attach_bpf(ctx.skel->progs.bench_trigger_fmodret);
 }

+static void trigger_user_setup()
+{
+       setup_ctx();
+}
+
+static void *trigger_producer_user(void *input)
+{
+       struct bpf_prog_test_run_attr attr = {};
+
+       attr.prog_fd = bpf_program__fd(ctx.skel->progs.bench_trigger_user);
+
+       while (true)
+               (void)bpf_prog_test_run_xattr(&attr);
+       return NULL;
+}
+
+static void *trigger_producer_user_on_cpu(void *input)
+{
+       struct bpf_prog_test_run_attr attr = {};
+       int i = 0, nr_cpu;
+
+       nr_cpu = libbpf_num_possible_cpus();
+
+       attr.prog_fd = bpf_program__fd(ctx.skel->progs.bench_trigger_user);
+
+       while (true) {
+               attr.cpu_plus = i + 1;
+               (void)bpf_prog_test_run_xattr(&attr);
+               i = (i + 1) % nr_cpu;
+       }
+       return NULL;
+}
+
 static void *trigger_consumer(void *input)
 {
        return NULL;
@@ -155,6 +206,17 @@ const struct bench bench_trig_fentry = {
        .report_final = hits_drops_report_final,
 };

+const struct bench bench_trig_fentry_on_cpu = {
+       .name = "trig-fentry-on-cpu",
+       .validate = trigger_validate,
+       .setup = trigger_fentry_setup,
+       .producer_thread = trigger_on_cpu_producer,
+       .consumer_thread = trigger_consumer,
+       .measure = trigger_measure,
+       .report_progress = hits_drops_report_progress,
+       .report_final = hits_drops_report_final,
+};
+
 const struct bench bench_trig_fmodret = {
        .name = "trig-fmodret",
        .validate = trigger_validate,
@@ -165,3 +227,25 @@ const struct bench bench_trig_fmodret = {
        .report_progress = hits_drops_report_progress,
        .report_final = hits_drops_report_final,
 };
+
+const struct bench bench_trig_user = {
+       .name = "trig-user",
+       .validate = trigger_validate,
+       .setup = trigger_user_setup,
+       .producer_thread = trigger_producer_user,
+       .consumer_thread = trigger_consumer,
+       .measure = trigger_measure,
+       .report_progress = hits_drops_report_progress,
+       .report_final = hits_drops_report_final,
+};
+
+const struct bench bench_trig_user_on_cpu = {
+       .name = "trig-user-on-cpu",
+       .validate = trigger_validate,
+       .setup = trigger_user_setup,
+       .producer_thread = trigger_producer_user_on_cpu,
+       .consumer_thread = trigger_consumer,
+       .measure = trigger_measure,
+       .report_progress = hits_drops_report_progress,
+       .report_final = hits_drops_report_final,
+};
diff --git c/tools/testing/selftests/bpf/benchs/run_bench_trigger.sh w/tools/testing/selftests/bpf/benchs/run_bench_trigger.sh
index 78e83f2432946..f10b7aea76aa3 100755
--- c/tools/testing/selftests/bpf/benchs/run_bench_trigger.sh
+++ w/tools/testing/selftests/bpf/benchs/run_bench_trigger.sh
@@ -2,7 +2,7 @@

 set -eufo pipefail

-for i in base tp rawtp kprobe fentry fmodret
+for i in base tp rawtp kprobe fentry fmodret user fentry-on-cpu user-on-cpu
 do
        summary=$(sudo ./bench -w2 -d5 -a trig-$i | tail -n1 | cut -d'(' -f1 | cut -d' ' -f3-)
        printf "%-10s: %s\n" $i "$summary"
diff --git c/tools/testing/selftests/bpf/progs/trigger_bench.c w/tools/testing/selftests/bpf/progs/trigger_bench.c
index 8b36b6640e7e9..a6ac11e68d287 100644
--- c/tools/testing/selftests/bpf/progs/trigger_bench.c
+++ w/tools/testing/selftests/bpf/progs/trigger_bench.c
@@ -45,3 +45,10 @@ int bench_trigger_fmodret(void *ctx)
        __sync_add_and_fetch(&hits, 1);
        return -22;
 }
+
+SEC("user")
+int BPF_PROG(bench_trigger_user)
+{
+       __sync_add_and_fetch(&hits, 1);
+       return 0;
+}
~





^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 2/5] libbpf: support BPF_PROG_TYPE_USER programs
  2020-08-04  1:18     ` Song Liu
@ 2020-08-05  1:38       ` Andrii Nakryiko
  2020-08-05  3:59         ` Song Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2020-08-05  1:38 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu

On Mon, Aug 3, 2020 at 6:18 PM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Aug 2, 2020, at 6:40 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
> >>
>
> [...]
>
> >
> >> };
> >>
> >> LIBBPF_API int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr);
> >> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> >> index b9f11f854985b..9ce175a486214 100644
> >> --- a/tools/lib/bpf/libbpf.c
> >> +++ b/tools/lib/bpf/libbpf.c
> >> @@ -6922,6 +6922,7 @@ static const struct bpf_sec_def section_defs[] = {
> >>        BPF_PROG_SEC("lwt_out",                 BPF_PROG_TYPE_LWT_OUT),
> >>        BPF_PROG_SEC("lwt_xmit",                BPF_PROG_TYPE_LWT_XMIT),
> >>        BPF_PROG_SEC("lwt_seg6local",           BPF_PROG_TYPE_LWT_SEG6LOCAL),
> >> +       BPF_PROG_SEC("user",                    BPF_PROG_TYPE_USER),
> >
> > let's do "user/" for consistency with most other prog types (and nice
> > separation between prog type and custom user name)
>
> About "user" vs. "user/", I still think "user" is better.
>
> Unlike kprobe and tracepoint, user prog doesn't use the part after "/".
> This is similar to "perf_event" for BPF_PROG_TYPE_PERF_EVENT, "xdl" for
> BPF_PROG_TYPE_XDP, etc. If we specify "user" here, "user/" and "user/xxx"
> would also work. However, if we specify "user/" here, programs that used
> "user" by accident will fail to load, with a message like:
>
>         libbpf: failed to load program 'user'
>
> which is confusing.

xdp, perf_event and a bunch of others don't enforce it, that's true,
they are a bit of a legacy, unfortunately. But all the recent ones do,
and we explicitly did that for xdp_dev/xdp_cpu, for instance.
Specifying just "user" in the spec would allow something nonsensical
like "userargh", for instance, due to this being treated as a prefix.
There is no harm to require users to do "user/my_prog", though.

Alternatively, we could introduce a new convention in the spec,
something like "user?", which would accept either "user" or
"user/something", but not "user/" nor "userblah". We can try that as
well.

>
> Thanks,
> Song
>
> [...]
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
  2020-08-04 20:54         ` Song Liu
@ 2020-08-05  1:52           ` Andrii Nakryiko
  2020-08-05  4:47             ` Song Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2020-08-05  1:52 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu

On Tue, Aug 4, 2020 at 2:01 PM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Aug 2, 2020, at 10:10 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Sun, Aug 2, 2020 at 9:47 PM Song Liu <songliubraving@fb.com> wrote:
> >>
> >>
> >>> On Aug 2, 2020, at 6:51 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >>>
> >>> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
> >>>>
> >>>> Add a benchmark to compare performance of
> >>>> 1) uprobe;
> >>>> 2) user program w/o args;
> >>>> 3) user program w/ args;
> >>>> 4) user program w/ args on random cpu.
> >>>>
> >>>
> >>> Can you please add it to the existing benchmark runner instead, e.g.,
> >>> along the other bench_trigger benchmarks? No need to re-implement
> >>> benchmark setup. And also that would also allow to compare existing
> >>> ways of cheaply triggering a program vs this new _USER program?
> >>
> >> Will try.
> >>
> >>>
> >>> If the performance is not significantly better than other ways, do you
> >>> think it still makes sense to add a new BPF program type? I think
> >>> triggering KPROBE/TRACEPOINT from bpf_prog_test_run() would be very
> >>> nice, maybe it's possible to add that instead of a new program type?
> >>> Either way, let's see comparison with other program triggering
> >>> mechanisms first.
> >>
> >> Triggering KPROBE and TRACEPOINT from bpf_prog_test_run() will be useful.
> >> But I don't think they can be used instead of user program, for a couple
> >> reasons. First, KPROBE/TRACEPOINT may be triggered by other programs
> >> running in the system, so user will have to filter those noise out in
> >> each program. Second, it is not easy to specify CPU for KPROBE/TRACEPOINT,
> >> while this feature could be useful in many cases, e.g. get stack trace
> >> on a given CPU.
> >>
> >
> > Right, it's not as convenient with KPROBE/TRACEPOINT as with the USER
> > program you've added specifically with that feature in mind. But if
> > you pin user-space thread on the needed CPU and trigger kprobe/tp,
> > then you'll get what you want. As for the "noise", see how
> > bench_trigger() deals with that: it records thread ID and filters
> > everything not matching. You can do the same with CPU ID. It's not as
> > automatic as with a special BPF program type, but still pretty simple,
> > which is why I'm still deciding (for myself) whether USER program type
> > is necessary :)
>
> Here are some bench_trigger numbers:
>
> base      :    1.698 ± 0.001M/s
> tp        :    1.477 ± 0.001M/s
> rawtp     :    1.567 ± 0.001M/s
> kprobe    :    1.431 ± 0.000M/s
> fentry    :    1.691 ± 0.000M/s
> fmodret   :    1.654 ± 0.000M/s
> user      :    1.253 ± 0.000M/s
> fentry-on-cpu:    0.022 ± 0.011M/s
> user-on-cpu:    0.315 ± 0.001M/s
>

Ok, so basically all of raw_tp,tp,kprobe,fentry/fexit are
significantly faster than USER programs. Sure, when compared to
uprobe, they are faster, but not when doing on-specific-CPU run, it
seems (judging from this patch's description, if I'm reading it
right). Anyways, speed argument shouldn't be a reason for doing this,
IMO.

> The two "on-cpu" tests run the program on a different CPU (see the patch
> at the end).
>
> "user" is about 25% slower than "fentry". I think this is mostly because
> getpgid() is a faster syscall than bpf(BPF_TEST_RUN).

Yes, probably.

>
> "user-on-cpu" is more than 10x faster than "fentry-on-cpu", because IPI
> is way faster than moving the process (via sched_setaffinity).

I don't think that's a good comparison, because you are actually
testing sched_setaffinity performance on each iteration vs IPI in the
kernel, not a BPF overhead.

I think the fair comparison for this would be to create a thread and
pin it on necessary CPU, and only then BPF program calls in a loop.
But I bet any of existing program types would beat USER program.

>
> For use cases that we would like to call BPF program on specific CPU,
> triggering it via IPI is a lot faster.

So these use cases would be nice to expand on in the motivational part
of the patch set. It's not really emphasized and it's not at all clear
what you are trying to achieve. It also seems, depending on latency
requirements, it's totally possible to achieve comparable results by
pre-creating a thread for each CPU, pinning each one to its designated
CPU and then using any suitable user-space signaling mechanism (a
queue, condvar, etc) to ask a thread to trigger BPF program (fentry on
getpgid(), for instance). I bet in this case the  performance would be
really nice for a lot of practical use cases. But then again, I don't
know details of the intended use case, so please provide some more
details.

>
> Thanks,
> Song
>
>
> ========================== 8< ==========================
>
> diff --git c/tools/testing/selftests/bpf/bench.c w/tools/testing/selftests/bpf/bench.c
> index 944ad4721c83c..5394a1d2dfd21 100644
> --- c/tools/testing/selftests/bpf/bench.c
> +++ w/tools/testing/selftests/bpf/bench.c
> @@ -317,7 +317,10 @@ extern const struct bench bench_trig_tp;
>  extern const struct bench bench_trig_rawtp;
>  extern const struct bench bench_trig_kprobe;
>  extern const struct bench bench_trig_fentry;
> +extern const struct bench bench_trig_fentry_on_cpu;
>  extern const struct bench bench_trig_fmodret;
> +extern const struct bench bench_trig_user;
> +extern const struct bench bench_trig_user_on_cpu;
>  extern const struct bench bench_rb_libbpf;
>  extern const struct bench bench_rb_custom;
>  extern const struct bench bench_pb_libbpf;
> @@ -338,7 +341,10 @@ static const struct bench *benchs[] = {
>         &bench_trig_rawtp,
>         &bench_trig_kprobe,
>         &bench_trig_fentry,
> +       &bench_trig_fentry_on_cpu,
>         &bench_trig_fmodret,
> +       &bench_trig_user,
> +       &bench_trig_user_on_cpu,
>         &bench_rb_libbpf,
>         &bench_rb_custom,
>         &bench_pb_libbpf,
> @@ -462,4 +468,3 @@ int main(int argc, char **argv)
>
>         return 0;
>  }
> -
> diff --git c/tools/testing/selftests/bpf/benchs/bench_trigger.c w/tools/testing/selftests/bpf/benchs/bench_trigger.c
> index 49c22832f2169..a1ebaebf6070c 100644
> --- c/tools/testing/selftests/bpf/benchs/bench_trigger.c
> +++ w/tools/testing/selftests/bpf/benchs/bench_trigger.c
> @@ -1,5 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /* Copyright (c) 2020 Facebook */
> +#define _GNU_SOURCE
> +#include <sched.h>
>  #include "bench.h"
>  #include "trigger_bench.skel.h"
>
> @@ -39,6 +41,22 @@ static void *trigger_producer(void *input)
>         return NULL;
>  }
>
> +static void *trigger_on_cpu_producer(void *input)
> +{
> +       cpu_set_t set;
> +       int i = 0, nr_cpu;
> +
> +       nr_cpu = libbpf_num_possible_cpus();
> +       while (true) {
> +               CPU_ZERO(&set);
> +               CPU_SET(i, &set);
> +               sched_setaffinity(0, sizeof(set), &set);
> +               (void)syscall(__NR_getpgid);
> +               i = (i + 1) % nr_cpu;
> +       }
> +       return NULL;
> +}
> +
>  static void trigger_measure(struct bench_res *res)
>  {
>         res->hits = atomic_swap(&ctx.skel->bss->hits, 0);
> @@ -96,6 +114,39 @@ static void trigger_fmodret_setup()
>         attach_bpf(ctx.skel->progs.bench_trigger_fmodret);
>  }
>
> +static void trigger_user_setup()
> +{
> +       setup_ctx();
> +}
> +
> +static void *trigger_producer_user(void *input)
> +{
> +       struct bpf_prog_test_run_attr attr = {};
> +
> +       attr.prog_fd = bpf_program__fd(ctx.skel->progs.bench_trigger_user);
> +
> +       while (true)
> +               (void)bpf_prog_test_run_xattr(&attr);
> +       return NULL;
> +}
> +
> +static void *trigger_producer_user_on_cpu(void *input)
> +{
> +       struct bpf_prog_test_run_attr attr = {};
> +       int i = 0, nr_cpu;
> +
> +       nr_cpu = libbpf_num_possible_cpus();
> +
> +       attr.prog_fd = bpf_program__fd(ctx.skel->progs.bench_trigger_user);
> +
> +       while (true) {
> +               attr.cpu_plus = i + 1;
> +               (void)bpf_prog_test_run_xattr(&attr);
> +               i = (i + 1) % nr_cpu;
> +       }
> +       return NULL;
> +}
> +
>  static void *trigger_consumer(void *input)
>  {
>         return NULL;
> @@ -155,6 +206,17 @@ const struct bench bench_trig_fentry = {
>         .report_final = hits_drops_report_final,
>  };
>
> +const struct bench bench_trig_fentry_on_cpu = {
> +       .name = "trig-fentry-on-cpu",
> +       .validate = trigger_validate,
> +       .setup = trigger_fentry_setup,
> +       .producer_thread = trigger_on_cpu_producer,
> +       .consumer_thread = trigger_consumer,
> +       .measure = trigger_measure,
> +       .report_progress = hits_drops_report_progress,
> +       .report_final = hits_drops_report_final,
> +};
> +
>  const struct bench bench_trig_fmodret = {
>         .name = "trig-fmodret",
>         .validate = trigger_validate,
> @@ -165,3 +227,25 @@ const struct bench bench_trig_fmodret = {
>         .report_progress = hits_drops_report_progress,
>         .report_final = hits_drops_report_final,
>  };
> +
> +const struct bench bench_trig_user = {
> +       .name = "trig-user",
> +       .validate = trigger_validate,
> +       .setup = trigger_user_setup,
> +       .producer_thread = trigger_producer_user,
> +       .consumer_thread = trigger_consumer,
> +       .measure = trigger_measure,
> +       .report_progress = hits_drops_report_progress,
> +       .report_final = hits_drops_report_final,
> +};
> +
> +const struct bench bench_trig_user_on_cpu = {
> +       .name = "trig-user-on-cpu",
> +       .validate = trigger_validate,
> +       .setup = trigger_user_setup,
> +       .producer_thread = trigger_producer_user_on_cpu,
> +       .consumer_thread = trigger_consumer,
> +       .measure = trigger_measure,
> +       .report_progress = hits_drops_report_progress,
> +       .report_final = hits_drops_report_final,
> +};
> diff --git c/tools/testing/selftests/bpf/benchs/run_bench_trigger.sh w/tools/testing/selftests/bpf/benchs/run_bench_trigger.sh
> index 78e83f2432946..f10b7aea76aa3 100755
> --- c/tools/testing/selftests/bpf/benchs/run_bench_trigger.sh
> +++ w/tools/testing/selftests/bpf/benchs/run_bench_trigger.sh
> @@ -2,7 +2,7 @@
>
>  set -eufo pipefail
>
> -for i in base tp rawtp kprobe fentry fmodret
> +for i in base tp rawtp kprobe fentry fmodret user fentry-on-cpu user-on-cpu
>  do
>         summary=$(sudo ./bench -w2 -d5 -a trig-$i | tail -n1 | cut -d'(' -f1 | cut -d' ' -f3-)
>         printf "%-10s: %s\n" $i "$summary"
> diff --git c/tools/testing/selftests/bpf/progs/trigger_bench.c w/tools/testing/selftests/bpf/progs/trigger_bench.c
> index 8b36b6640e7e9..a6ac11e68d287 100644
> --- c/tools/testing/selftests/bpf/progs/trigger_bench.c
> +++ w/tools/testing/selftests/bpf/progs/trigger_bench.c
> @@ -45,3 +45,10 @@ int bench_trigger_fmodret(void *ctx)
>         __sync_add_and_fetch(&hits, 1);
>         return -22;
>  }
> +
> +SEC("user")
> +int BPF_PROG(bench_trigger_user)
> +{
> +       __sync_add_and_fetch(&hits, 1);
> +       return 0;
> +}
> ~
>
>
>
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 2/5] libbpf: support BPF_PROG_TYPE_USER programs
  2020-08-05  1:38       ` Andrii Nakryiko
@ 2020-08-05  3:59         ` Song Liu
  2020-08-05  5:32           ` Andrii Nakryiko
  0 siblings, 1 reply; 43+ messages in thread
From: Song Liu @ 2020-08-05  3:59 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu



> On Aug 4, 2020, at 6:38 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Mon, Aug 3, 2020 at 6:18 PM Song Liu <songliubraving@fb.com> wrote:
>> 
>> 
>> 
>>> On Aug 2, 2020, at 6:40 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>> 
>>> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
>>>> 
>> 
>> [...]
>> 
>>> 
>>>> };
>>>> 
>>>> LIBBPF_API int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr);
>>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>>>> index b9f11f854985b..9ce175a486214 100644
>>>> --- a/tools/lib/bpf/libbpf.c
>>>> +++ b/tools/lib/bpf/libbpf.c
>>>> @@ -6922,6 +6922,7 @@ static const struct bpf_sec_def section_defs[] = {
>>>>       BPF_PROG_SEC("lwt_out",                 BPF_PROG_TYPE_LWT_OUT),
>>>>       BPF_PROG_SEC("lwt_xmit",                BPF_PROG_TYPE_LWT_XMIT),
>>>>       BPF_PROG_SEC("lwt_seg6local",           BPF_PROG_TYPE_LWT_SEG6LOCAL),
>>>> +       BPF_PROG_SEC("user",                    BPF_PROG_TYPE_USER),
>>> 
>>> let's do "user/" for consistency with most other prog types (and nice
>>> separation between prog type and custom user name)
>> 
>> About "user" vs. "user/", I still think "user" is better.
>> 
>> Unlike kprobe and tracepoint, user prog doesn't use the part after "/".
>> This is similar to "perf_event" for BPF_PROG_TYPE_PERF_EVENT, "xdl" for
>> BPF_PROG_TYPE_XDP, etc. If we specify "user" here, "user/" and "user/xxx"
>> would also work. However, if we specify "user/" here, programs that used
>> "user" by accident will fail to load, with a message like:
>> 
>>        libbpf: failed to load program 'user'
>> 
>> which is confusing.
> 
> xdp, perf_event and a bunch of others don't enforce it, that's true,
> they are a bit of a legacy,

I don't see w/o "/" is a legacy thing. BPF_PROG_TYPE_STRUCT_OPS just uses
"struct_ops". 

> unfortunately. But all the recent ones do,
> and we explicitly did that for xdp_dev/xdp_cpu, for instance.
> Specifying just "user" in the spec would allow something nonsensical
> like "userargh", for instance, due to this being treated as a prefix.
> There is no harm to require users to do "user/my_prog", though.

I don't see why allowing "userargh" is a problem. Failing "user" is 
more confusing. We can probably improve that by a hint like:

    libbpf: failed to load program 'user', do you mean "user/"?

But it is pretty silly. "user/something_never_used" also looks weird.

> Alternatively, we could introduce a new convention in the spec,
> something like "user?", which would accept either "user" or
> "user/something", but not "user/" nor "userblah". We can try that as
> well.

Again, I don't really understand why allowing "userblah" is a problem. 
We already have "xdp", "xdp_devmap/", and "xdp_cpumap/", they all work 
fine so far. 

Thanks,
Song

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
  2020-08-05  1:52           ` Andrii Nakryiko
@ 2020-08-05  4:47             ` Song Liu
  2020-08-05  5:47               ` Andrii Nakryiko
  2020-08-05 17:16               ` Alexei Starovoitov
  0 siblings, 2 replies; 43+ messages in thread
From: Song Liu @ 2020-08-05  4:47 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu



> On Aug 4, 2020, at 6:52 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Tue, Aug 4, 2020 at 2:01 PM Song Liu <songliubraving@fb.com> wrote:
>> 
>> 
>> 
>>> On Aug 2, 2020, at 10:10 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>> 
>>> On Sun, Aug 2, 2020 at 9:47 PM Song Liu <songliubraving@fb.com> wrote:
>>>> 
>>>> 
>>>>> On Aug 2, 2020, at 6:51 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>>>> 
>>>>> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
>>>>>> 
>>>>>> Add a benchmark to compare performance of
>>>>>> 1) uprobe;
>>>>>> 2) user program w/o args;
>>>>>> 3) user program w/ args;
>>>>>> 4) user program w/ args on random cpu.
>>>>>> 
>>>>> 
>>>>> Can you please add it to the existing benchmark runner instead, e.g.,
>>>>> along the other bench_trigger benchmarks? No need to re-implement
>>>>> benchmark setup. And also that would also allow to compare existing
>>>>> ways of cheaply triggering a program vs this new _USER program?
>>>> 
>>>> Will try.
>>>> 
>>>>> 
>>>>> If the performance is not significantly better than other ways, do you
>>>>> think it still makes sense to add a new BPF program type? I think
>>>>> triggering KPROBE/TRACEPOINT from bpf_prog_test_run() would be very
>>>>> nice, maybe it's possible to add that instead of a new program type?
>>>>> Either way, let's see comparison with other program triggering
>>>>> mechanisms first.
>>>> 
>>>> Triggering KPROBE and TRACEPOINT from bpf_prog_test_run() will be useful.
>>>> But I don't think they can be used instead of user program, for a couple
>>>> reasons. First, KPROBE/TRACEPOINT may be triggered by other programs
>>>> running in the system, so user will have to filter those noise out in
>>>> each program. Second, it is not easy to specify CPU for KPROBE/TRACEPOINT,
>>>> while this feature could be useful in many cases, e.g. get stack trace
>>>> on a given CPU.
>>>> 
>>> 
>>> Right, it's not as convenient with KPROBE/TRACEPOINT as with the USER
>>> program you've added specifically with that feature in mind. But if
>>> you pin user-space thread on the needed CPU and trigger kprobe/tp,
>>> then you'll get what you want. As for the "noise", see how
>>> bench_trigger() deals with that: it records thread ID and filters
>>> everything not matching. You can do the same with CPU ID. It's not as
>>> automatic as with a special BPF program type, but still pretty simple,
>>> which is why I'm still deciding (for myself) whether USER program type
>>> is necessary :)
>> 
>> Here are some bench_trigger numbers:
>> 
>> base      :    1.698 ± 0.001M/s
>> tp        :    1.477 ± 0.001M/s
>> rawtp     :    1.567 ± 0.001M/s
>> kprobe    :    1.431 ± 0.000M/s
>> fentry    :    1.691 ± 0.000M/s
>> fmodret   :    1.654 ± 0.000M/s
>> user      :    1.253 ± 0.000M/s
>> fentry-on-cpu:    0.022 ± 0.011M/s
>> user-on-cpu:    0.315 ± 0.001M/s
>> 
> 
> Ok, so basically all of raw_tp,tp,kprobe,fentry/fexit are
> significantly faster than USER programs. Sure, when compared to
> uprobe, they are faster, but not when doing on-specific-CPU run, it
> seems (judging from this patch's description, if I'm reading it
> right). Anyways, speed argument shouldn't be a reason for doing this,
> IMO.
> 
>> The two "on-cpu" tests run the program on a different CPU (see the patch
>> at the end).
>> 
>> "user" is about 25% slower than "fentry". I think this is mostly because
>> getpgid() is a faster syscall than bpf(BPF_TEST_RUN).
> 
> Yes, probably.
> 
>> 
>> "user-on-cpu" is more than 10x faster than "fentry-on-cpu", because IPI
>> is way faster than moving the process (via sched_setaffinity).
> 
> I don't think that's a good comparison, because you are actually
> testing sched_setaffinity performance on each iteration vs IPI in the
> kernel, not a BPF overhead.
> 
> I think the fair comparison for this would be to create a thread and
> pin it on necessary CPU, and only then BPF program calls in a loop.
> But I bet any of existing program types would beat USER program.
> 
>> 
>> For use cases that we would like to call BPF program on specific CPU,
>> triggering it via IPI is a lot faster.
> 
> So these use cases would be nice to expand on in the motivational part
> of the patch set. It's not really emphasized and it's not at all clear
> what you are trying to achieve. It also seems, depending on latency
> requirements, it's totally possible to achieve comparable results by
> pre-creating a thread for each CPU, pinning each one to its designated
> CPU and then using any suitable user-space signaling mechanism (a
> queue, condvar, etc) to ask a thread to trigger BPF program (fentry on
> getpgid(), for instance).

I don't see why user space signal plus fentry would be faster than IPI.
If the target cpu is running something, this gonna add two context 
switches. 

> I bet in this case the  performance would be
> really nice for a lot of practical use cases. But then again, I don't
> know details of the intended use case, so please provide some more
> details.

Being able to trigger BPF program on a different CPU could enable many
use cases and optimizations. The use case I am looking at is to access
perf_event and percpu maps on the target CPU. For example:
	0. trigger the program
	1. read perf_event on cpu x;
	2. (optional) check which process is running on cpu x;
	3. add perf_event value to percpu map(s) on cpu x. 

If we do these steps in a BPF program on cpu x, the cost is:
	A.0) trigger BPF via IPI;
	A.1) read perf_event locally;
	A.2) local access current;
	A.3) local access of percpu map(s). 

If we can only do these on a different CPU, the cost will be:
	B.0) trigger BPF locally;
	B.1) read perf_event via IPI;
	B.2) remote access current on cpu x;
	B.3) remote access percpu map(s), or use non-percpu map(2). 

Cost of (A.0 + A.1) is about same as (B.0 + B.1), maybe a little higher
(sys_bpf(), vs. sys_getpgid()). But A.2 and A.3 will be significantly 
cheaper than B.2 and B.3. 

Does this make sense? 


OTOH, I do agree we can trigger bpftrace BEGIN/END with sys_getpgid() 
or something similar. 

Thanks,
Song

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 2/5] libbpf: support BPF_PROG_TYPE_USER programs
  2020-08-05  3:59         ` Song Liu
@ 2020-08-05  5:32           ` Andrii Nakryiko
  2020-08-05  6:26             ` Song Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2020-08-05  5:32 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu

On Tue, Aug 4, 2020 at 8:59 PM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Aug 4, 2020, at 6:38 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Aug 3, 2020 at 6:18 PM Song Liu <songliubraving@fb.com> wrote:
> >>
> >>
> >>
> >>> On Aug 2, 2020, at 6:40 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >>>
> >>> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
> >>>>
> >>
> >> [...]
> >>
> >>>
> >>>> };
> >>>>
> >>>> LIBBPF_API int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr);
> >>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> >>>> index b9f11f854985b..9ce175a486214 100644
> >>>> --- a/tools/lib/bpf/libbpf.c
> >>>> +++ b/tools/lib/bpf/libbpf.c
> >>>> @@ -6922,6 +6922,7 @@ static const struct bpf_sec_def section_defs[] = {
> >>>>       BPF_PROG_SEC("lwt_out",                 BPF_PROG_TYPE_LWT_OUT),
> >>>>       BPF_PROG_SEC("lwt_xmit",                BPF_PROG_TYPE_LWT_XMIT),
> >>>>       BPF_PROG_SEC("lwt_seg6local",           BPF_PROG_TYPE_LWT_SEG6LOCAL),
> >>>> +       BPF_PROG_SEC("user",                    BPF_PROG_TYPE_USER),
> >>>
> >>> let's do "user/" for consistency with most other prog types (and nice
> >>> separation between prog type and custom user name)
> >>
> >> About "user" vs. "user/", I still think "user" is better.
> >>
> >> Unlike kprobe and tracepoint, user prog doesn't use the part after "/".
> >> This is similar to "perf_event" for BPF_PROG_TYPE_PERF_EVENT, "xdl" for
> >> BPF_PROG_TYPE_XDP, etc. If we specify "user" here, "user/" and "user/xxx"
> >> would also work. However, if we specify "user/" here, programs that used
> >> "user" by accident will fail to load, with a message like:
> >>
> >>        libbpf: failed to load program 'user'
> >>
> >> which is confusing.
> >
> > xdp, perf_event and a bunch of others don't enforce it, that's true,
> > they are a bit of a legacy,
>
> I don't see w/o "/" is a legacy thing. BPF_PROG_TYPE_STRUCT_OPS just uses
> "struct_ops".
>
> > unfortunately. But all the recent ones do,
> > and we explicitly did that for xdp_dev/xdp_cpu, for instance.
> > Specifying just "user" in the spec would allow something nonsensical
> > like "userargh", for instance, due to this being treated as a prefix.
> > There is no harm to require users to do "user/my_prog", though.
>
> I don't see why allowing "userargh" is a problem. Failing "user" is
> more confusing. We can probably improve that by a hint like:
>
>     libbpf: failed to load program 'user', do you mean "user/"?
>
> But it is pretty silly. "user/something_never_used" also looks weird.

"userargh" is terrible, IMO. It's a different identifier that just
happens to have the first 4 letters matching "user" program type.
There must be either a standardized separator (which happens to be
'/') or none. See the suggestion below.
>
> > Alternatively, we could introduce a new convention in the spec,
> > something like "user?", which would accept either "user" or
> > "user/something", but not "user/" nor "userblah". We can try that as
> > well.
>
> Again, I don't really understand why allowing "userblah" is a problem.
> We already have "xdp", "xdp_devmap/", and "xdp_cpumap/", they all work
> fine so far.

Right, we have "xdp_devmap/" and "xdp_cpumap/", as you say. I haven't
seen so much pushback against trailing forward slash with those ;)

But anyways, as part of deprecating APIs and preparing libbpf for 1.0
release over this half, I think I'm going to emit warnings for names
like "prog_type_whatever" or "prog_typeevenworse", etc. And asking
users to normalize section names to either "prog_type" or
"prog_type/something/here", whichever makes sense for a specific
program type. Right now libbpf doesn't allow two separate BPF programs
with the same section name, so enforcing strict "user" is limiting to
users. We are going to lift that restriction pretty soon, though. But
for now, please stick with what we've been doing lately and mark it as
"user/", later we'll allow just "user" as well.

>
> Thanks,
> Song

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
  2020-08-05  4:47             ` Song Liu
@ 2020-08-05  5:47               ` Andrii Nakryiko
  2020-08-05  7:01                 ` Song Liu
  2020-08-05 17:16               ` Alexei Starovoitov
  1 sibling, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2020-08-05  5:47 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu

On Tue, Aug 4, 2020 at 9:47 PM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Aug 4, 2020, at 6:52 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Tue, Aug 4, 2020 at 2:01 PM Song Liu <songliubraving@fb.com> wrote:
> >>
> >>
> >>
> >>> On Aug 2, 2020, at 10:10 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >>>
> >>> On Sun, Aug 2, 2020 at 9:47 PM Song Liu <songliubraving@fb.com> wrote:
> >>>>
> >>>>
> >>>>> On Aug 2, 2020, at 6:51 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >>>>>
> >>>>> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
> >>>>>>
> >>>>>> Add a benchmark to compare performance of
> >>>>>> 1) uprobe;
> >>>>>> 2) user program w/o args;
> >>>>>> 3) user program w/ args;
> >>>>>> 4) user program w/ args on random cpu.
> >>>>>>
> >>>>>
> >>>>> Can you please add it to the existing benchmark runner instead, e.g.,
> >>>>> along the other bench_trigger benchmarks? No need to re-implement
> >>>>> benchmark setup. And also that would also allow to compare existing
> >>>>> ways of cheaply triggering a program vs this new _USER program?
> >>>>
> >>>> Will try.
> >>>>
> >>>>>
> >>>>> If the performance is not significantly better than other ways, do you
> >>>>> think it still makes sense to add a new BPF program type? I think
> >>>>> triggering KPROBE/TRACEPOINT from bpf_prog_test_run() would be very
> >>>>> nice, maybe it's possible to add that instead of a new program type?
> >>>>> Either way, let's see comparison with other program triggering
> >>>>> mechanisms first.
> >>>>
> >>>> Triggering KPROBE and TRACEPOINT from bpf_prog_test_run() will be useful.
> >>>> But I don't think they can be used instead of user program, for a couple
> >>>> reasons. First, KPROBE/TRACEPOINT may be triggered by other programs
> >>>> running in the system, so user will have to filter those noise out in
> >>>> each program. Second, it is not easy to specify CPU for KPROBE/TRACEPOINT,
> >>>> while this feature could be useful in many cases, e.g. get stack trace
> >>>> on a given CPU.
> >>>>
> >>>
> >>> Right, it's not as convenient with KPROBE/TRACEPOINT as with the USER
> >>> program you've added specifically with that feature in mind. But if
> >>> you pin user-space thread on the needed CPU and trigger kprobe/tp,
> >>> then you'll get what you want. As for the "noise", see how
> >>> bench_trigger() deals with that: it records thread ID and filters
> >>> everything not matching. You can do the same with CPU ID. It's not as
> >>> automatic as with a special BPF program type, but still pretty simple,
> >>> which is why I'm still deciding (for myself) whether USER program type
> >>> is necessary :)
> >>
> >> Here are some bench_trigger numbers:
> >>
> >> base      :    1.698 ± 0.001M/s
> >> tp        :    1.477 ± 0.001M/s
> >> rawtp     :    1.567 ± 0.001M/s
> >> kprobe    :    1.431 ± 0.000M/s
> >> fentry    :    1.691 ± 0.000M/s
> >> fmodret   :    1.654 ± 0.000M/s
> >> user      :    1.253 ± 0.000M/s
> >> fentry-on-cpu:    0.022 ± 0.011M/s
> >> user-on-cpu:    0.315 ± 0.001M/s
> >>
> >
> > Ok, so basically all of raw_tp,tp,kprobe,fentry/fexit are
> > significantly faster than USER programs. Sure, when compared to
> > uprobe, they are faster, but not when doing on-specific-CPU run, it
> > seems (judging from this patch's description, if I'm reading it
> > right). Anyways, speed argument shouldn't be a reason for doing this,
> > IMO.
> >
> >> The two "on-cpu" tests run the program on a different CPU (see the patch
> >> at the end).
> >>
> >> "user" is about 25% slower than "fentry". I think this is mostly because
> >> getpgid() is a faster syscall than bpf(BPF_TEST_RUN).
> >
> > Yes, probably.
> >
> >>
> >> "user-on-cpu" is more than 10x faster than "fentry-on-cpu", because IPI
> >> is way faster than moving the process (via sched_setaffinity).
> >
> > I don't think that's a good comparison, because you are actually
> > testing sched_setaffinity performance on each iteration vs IPI in the
> > kernel, not a BPF overhead.
> >
> > I think the fair comparison for this would be to create a thread and
> > pin it on necessary CPU, and only then BPF program calls in a loop.
> > But I bet any of existing program types would beat USER program.
> >
> >>
> >> For use cases that we would like to call BPF program on specific CPU,
> >> triggering it via IPI is a lot faster.
> >
> > So these use cases would be nice to expand on in the motivational part
> > of the patch set. It's not really emphasized and it's not at all clear
> > what you are trying to achieve. It also seems, depending on latency
> > requirements, it's totally possible to achieve comparable results by
> > pre-creating a thread for each CPU, pinning each one to its designated
> > CPU and then using any suitable user-space signaling mechanism (a
> > queue, condvar, etc) to ask a thread to trigger BPF program (fentry on
> > getpgid(), for instance).
>
> I don't see why user space signal plus fentry would be faster than IPI.
> If the target cpu is running something, this gonna add two context
> switches.
>

I didn't say faster, did I? I said it would be comparable and wouldn't
require a new program type. But then again, without knowing all the
details, it's a bit hard to discuss this. E.g., if you need to trigger
that BPF program periodically, you can sleep in those per-CPU threads,
or epoll, or whatever. Or maybe you can set up a per-CPU perf event
that would trigger your program on the desired CPU, etc. My point is
that I and others shouldn't be guessing this, I'd expect someone who's
proposing an entire new BPF program type to motivate why this new
program type is necessary and what problem it's solving that can't be
solved with existing means.

BTW, how frequently do you need to trigger the BPF program? Seems very
frequently, if 2 vs 1 context switches might be a problem?

> > I bet in this case the  performance would be
> > really nice for a lot of practical use cases. But then again, I don't
> > know details of the intended use case, so please provide some more
> > details.
>
> Being able to trigger BPF program on a different CPU could enable many
> use cases and optimizations. The use case I am looking at is to access
> perf_event and percpu maps on the target CPU. For example:
>         0. trigger the program
>         1. read perf_event on cpu x;
>         2. (optional) check which process is running on cpu x;
>         3. add perf_event value to percpu map(s) on cpu x.
>
> If we do these steps in a BPF program on cpu x, the cost is:
>         A.0) trigger BPF via IPI;
>         A.1) read perf_event locally;
>         A.2) local access current;
>         A.3) local access of percpu map(s).
>
> If we can only do these on a different CPU, the cost will be:
>         B.0) trigger BPF locally;
>         B.1) read perf_event via IPI;
>         B.2) remote access current on cpu x;
>         B.3) remote access percpu map(s), or use non-percpu map(2).
>
> Cost of (A.0 + A.1) is about same as (B.0 + B.1), maybe a little higher
> (sys_bpf(), vs. sys_getpgid()). But A.2 and A.3 will be significantly
> cheaper than B.2 and B.3.
>
> Does this make sense?

It does, thanks. But what I was describing is still A, no? BPF program
will be triggered on your desired cpu X, wouldn't it?

>
>
> OTOH, I do agree we can trigger bpftrace BEGIN/END with sys_getpgid()
> or something similar.

Right.

>
> Thanks,
> Song

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 2/5] libbpf: support BPF_PROG_TYPE_USER programs
  2020-08-05  5:32           ` Andrii Nakryiko
@ 2020-08-05  6:26             ` Song Liu
  2020-08-05  6:54               ` Andrii Nakryiko
  0 siblings, 1 reply; 43+ messages in thread
From: Song Liu @ 2020-08-05  6:26 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu



> On Aug 4, 2020, at 10:32 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Tue, Aug 4, 2020 at 8:59 PM Song Liu <songliubraving@fb.com> wrote:
>> 
>> 
>> 
>>> On Aug 4, 2020, at 6:38 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>> 
>>> On Mon, Aug 3, 2020 at 6:18 PM Song Liu <songliubraving@fb.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Aug 2, 2020, at 6:40 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>>>> 
>>>>> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
>>>>>> 
>>>> 
>>>> [...]
>>>> 
>>>>> 
>>>>>> };
>>>>>> 
>>>>>> LIBBPF_API int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr);
>>>>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>>>>>> index b9f11f854985b..9ce175a486214 100644
>>>>>> --- a/tools/lib/bpf/libbpf.c
>>>>>> +++ b/tools/lib/bpf/libbpf.c
>>>>>> @@ -6922,6 +6922,7 @@ static const struct bpf_sec_def section_defs[] = {
>>>>>>      BPF_PROG_SEC("lwt_out",                 BPF_PROG_TYPE_LWT_OUT),
>>>>>>      BPF_PROG_SEC("lwt_xmit",                BPF_PROG_TYPE_LWT_XMIT),
>>>>>>      BPF_PROG_SEC("lwt_seg6local",           BPF_PROG_TYPE_LWT_SEG6LOCAL),
>>>>>> +       BPF_PROG_SEC("user",                    BPF_PROG_TYPE_USER),
>>>>> 
>>>>> let's do "user/" for consistency with most other prog types (and nice
>>>>> separation between prog type and custom user name)
>>>> 
>>>> About "user" vs. "user/", I still think "user" is better.
>>>> 
>>>> Unlike kprobe and tracepoint, user prog doesn't use the part after "/".
>>>> This is similar to "perf_event" for BPF_PROG_TYPE_PERF_EVENT, "xdl" for
>>>> BPF_PROG_TYPE_XDP, etc. If we specify "user" here, "user/" and "user/xxx"
>>>> would also work. However, if we specify "user/" here, programs that used
>>>> "user" by accident will fail to load, with a message like:
>>>> 
>>>>       libbpf: failed to load program 'user'
>>>> 
>>>> which is confusing.
>>> 
>>> xdp, perf_event and a bunch of others don't enforce it, that's true,
>>> they are a bit of a legacy,
>> 
>> I don't see w/o "/" is a legacy thing. BPF_PROG_TYPE_STRUCT_OPS just uses
>> "struct_ops".
>> 
>>> unfortunately. But all the recent ones do,
>>> and we explicitly did that for xdp_dev/xdp_cpu, for instance.
>>> Specifying just "user" in the spec would allow something nonsensical
>>> like "userargh", for instance, due to this being treated as a prefix.
>>> There is no harm to require users to do "user/my_prog", though.
>> 
>> I don't see why allowing "userargh" is a problem. Failing "user" is
>> more confusing. We can probably improve that by a hint like:
>> 
>>    libbpf: failed to load program 'user', do you mean "user/"?
>> 
>> But it is pretty silly. "user/something_never_used" also looks weird.
> 
> "userargh" is terrible, IMO. It's a different identifier that just
> happens to have the first 4 letters matching "user" program type.
> There must be either a standardized separator (which happens to be
> '/') or none. See the suggestion below.

We have no problem deal with "a different identifier that just happens
to have the first letters matching", like xdp vs. xdp_devmap and 
xdp_cpumap, right?

>> 
>>> Alternatively, we could introduce a new convention in the spec,
>>> something like "user?", which would accept either "user" or
>>> "user/something", but not "user/" nor "userblah". We can try that as
>>> well.
>> 
>> Again, I don't really understand why allowing "userblah" is a problem.
>> We already have "xdp", "xdp_devmap/", and "xdp_cpumap/", they all work
>> fine so far.
> 
> Right, we have "xdp_devmap/" and "xdp_cpumap/", as you say. I haven't
> seen so much pushback against trailing forward slash with those ;)

I haven't seen any issue with old "perf_event", "xdp" and new "struct_ops" 
either. 

> 
> But anyways, as part of deprecating APIs and preparing libbpf for 1.0
> release over this half, I think I'm going to emit warnings for names
> like "prog_type_whatever" or "prog_typeevenworse", etc. And asking
> users to normalize section names to either "prog_type" or
> "prog_type/something/here", whichever makes sense for a specific
> program type.

Exactly, "user" makes sense here; while "kprobe/__set_task_comm" makes
sense for kprobe. 

> Right now libbpf doesn't allow two separate BPF programs
> with the same section name, so enforcing strict "user" is limiting to
> users. We are going to lift that restriction pretty soon, though. But
> for now, please stick with what we've been doing lately and mark it as
> "user/", later we'll allow just "user" as well.

Since we would allow "user" later, why we have to reject it for now? 
Imagine the user just compiled and booted into a new kernel with user 
program support; and then got the following message:

	libbpf: failed to load program 'user'

If I were the user, I would definitely question whether the kernel was 
correct...

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 2/5] libbpf: support BPF_PROG_TYPE_USER programs
  2020-08-05  6:26             ` Song Liu
@ 2020-08-05  6:54               ` Andrii Nakryiko
  2020-08-05  7:23                 ` Song Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2020-08-05  6:54 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu

On Tue, Aug 4, 2020 at 11:26 PM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Aug 4, 2020, at 10:32 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Tue, Aug 4, 2020 at 8:59 PM Song Liu <songliubraving@fb.com> wrote:
> >>
> >>
> >>
> >>> On Aug 4, 2020, at 6:38 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >>>
> >>> On Mon, Aug 3, 2020 at 6:18 PM Song Liu <songliubraving@fb.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>>> On Aug 2, 2020, at 6:40 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >>>>>
> >>>>> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
> >>>>>>
> >>>>
> >>>> [...]
> >>>>
> >>>>>
> >>>>>> };
> >>>>>>
> >>>>>> LIBBPF_API int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr);
> >>>>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> >>>>>> index b9f11f854985b..9ce175a486214 100644
> >>>>>> --- a/tools/lib/bpf/libbpf.c
> >>>>>> +++ b/tools/lib/bpf/libbpf.c
> >>>>>> @@ -6922,6 +6922,7 @@ static const struct bpf_sec_def section_defs[] = {
> >>>>>>      BPF_PROG_SEC("lwt_out",                 BPF_PROG_TYPE_LWT_OUT),
> >>>>>>      BPF_PROG_SEC("lwt_xmit",                BPF_PROG_TYPE_LWT_XMIT),
> >>>>>>      BPF_PROG_SEC("lwt_seg6local",           BPF_PROG_TYPE_LWT_SEG6LOCAL),
> >>>>>> +       BPF_PROG_SEC("user",                    BPF_PROG_TYPE_USER),
> >>>>>
> >>>>> let's do "user/" for consistency with most other prog types (and nice
> >>>>> separation between prog type and custom user name)
> >>>>
> >>>> About "user" vs. "user/", I still think "user" is better.
> >>>>
> >>>> Unlike kprobe and tracepoint, user prog doesn't use the part after "/".
> >>>> This is similar to "perf_event" for BPF_PROG_TYPE_PERF_EVENT, "xdl" for
> >>>> BPF_PROG_TYPE_XDP, etc. If we specify "user" here, "user/" and "user/xxx"
> >>>> would also work. However, if we specify "user/" here, programs that used
> >>>> "user" by accident will fail to load, with a message like:
> >>>>
> >>>>       libbpf: failed to load program 'user'
> >>>>
> >>>> which is confusing.
> >>>
> >>> xdp, perf_event and a bunch of others don't enforce it, that's true,
> >>> they are a bit of a legacy,
> >>
> >> I don't see w/o "/" is a legacy thing. BPF_PROG_TYPE_STRUCT_OPS just uses
> >> "struct_ops".
> >>
> >>> unfortunately. But all the recent ones do,
> >>> and we explicitly did that for xdp_dev/xdp_cpu, for instance.
> >>> Specifying just "user" in the spec would allow something nonsensical
> >>> like "userargh", for instance, due to this being treated as a prefix.
> >>> There is no harm to require users to do "user/my_prog", though.
> >>
> >> I don't see why allowing "userargh" is a problem. Failing "user" is
> >> more confusing. We can probably improve that by a hint like:
> >>
> >>    libbpf: failed to load program 'user', do you mean "user/"?
> >>
> >> But it is pretty silly. "user/something_never_used" also looks weird.
> >
> > "userargh" is terrible, IMO. It's a different identifier that just
> > happens to have the first 4 letters matching "user" program type.
> > There must be either a standardized separator (which happens to be
> > '/') or none. See the suggestion below.
>
> We have no problem deal with "a different identifier that just happens
> to have the first letters matching", like xdp vs. xdp_devmap and
> xdp_cpumap, right?
>

xdp vs xdp_devmap is an entirely different thing. We deal with it by
checking xdp_devmap first. What I'm saying is that user can do
"xdpomg" and libbpf would be happy (today). And I don't think that's
good. But further, if someone does something like "xdp_devmap_omg",
guess which program type will be inferred? Hint: not xdp_devmap and
libbpf won't report an error either. All because "xdp" is so lax
today.

> >>
> >>> Alternatively, we could introduce a new convention in the spec,
> >>> something like "user?", which would accept either "user" or
> >>> "user/something", but not "user/" nor "userblah". We can try that as
> >>> well.
> >>
> >> Again, I don't really understand why allowing "userblah" is a problem.
> >> We already have "xdp", "xdp_devmap/", and "xdp_cpumap/", they all work
> >> fine so far.
> >
> > Right, we have "xdp_devmap/" and "xdp_cpumap/", as you say. I haven't
> > seen so much pushback against trailing forward slash with those ;)
>
> I haven't seen any issue with old "perf_event", "xdp" and new "struct_ops"
> either.
>
> >
> > But anyways, as part of deprecating APIs and preparing libbpf for 1.0
> > release over this half, I think I'm going to emit warnings for names
> > like "prog_type_whatever" or "prog_typeevenworse", etc. And asking
> > users to normalize section names to either "prog_type" or
> > "prog_type/something/here", whichever makes sense for a specific
> > program type.
>
> Exactly, "user" makes sense here; while "kprobe/__set_task_comm" makes
> sense for kprobe.

Right, but "userblah" doesn't. It would be great if you could help
make what I described above become true. But at least don't make it
worse by allowing unrestricted "user" prefix. I'm OK with strict
"user" or "user/blah", I'm not OK with "userblah", I'm sorry.

>
> > Right now libbpf doesn't allow two separate BPF programs
> > with the same section name, so enforcing strict "user" is limiting to
> > users. We are going to lift that restriction pretty soon, though. But
> > for now, please stick with what we've been doing lately and mark it as
> > "user/", later we'll allow just "user" as well.
>
> Since we would allow "user" later, why we have to reject it for now?

Because libbpf is dumb in that regard today? And instead of migrating
users later, I want to prevent users making bad choices right now.
Then relax it, if necessary. Alternatively, we can fix up libbpf logic
before the USER program type lands.

> Imagine the user just compiled and booted into a new kernel with user
> program support; and then got the following message:
>
>         libbpf: failed to load program 'user'
>
> If I were the user, I would definitely question whether the kernel was
> correct...

That's also bad, and again, we can make libbpf better. I think moving
forward any non-recognized BPF program type should be reported by
libbpf as an error. But we can't do it right now, we have to have a
period in which users will get a chance to update their BPF programs.
This will have to happen over few libbpf releases at least. So please
join in on the fun of fixing stuff like this.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
  2020-08-05  5:47               ` Andrii Nakryiko
@ 2020-08-05  7:01                 ` Song Liu
  2020-08-05 17:39                   ` Andrii Nakryiko
  0 siblings, 1 reply; 43+ messages in thread
From: Song Liu @ 2020-08-05  7:01 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu



> On Aug 4, 2020, at 10:47 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Tue, Aug 4, 2020 at 9:47 PM Song Liu <songliubraving@fb.com> wrote:
>> 
>> 
>> 
>>> On Aug 4, 2020, at 6:52 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>> 
>>> On Tue, Aug 4, 2020 at 2:01 PM Song Liu <songliubraving@fb.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Aug 2, 2020, at 10:10 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>>>> 
>>>>> On Sun, Aug 2, 2020 at 9:47 PM Song Liu <songliubraving@fb.com> wrote:
>>>>>> 
>>>>>> 
>>>>>>> On Aug 2, 2020, at 6:51 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>>>>>> 
>>>>>>> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
>>>>>>>> 
>>>>>>>> Add a benchmark to compare performance of
>>>>>>>> 1) uprobe;
>>>>>>>> 2) user program w/o args;
>>>>>>>> 3) user program w/ args;
>>>>>>>> 4) user program w/ args on random cpu.
>>>>>>>> 
>>>>>>> 
>>>>>>> Can you please add it to the existing benchmark runner instead, e.g.,
>>>>>>> along the other bench_trigger benchmarks? No need to re-implement
>>>>>>> benchmark setup. And also that would also allow to compare existing
>>>>>>> ways of cheaply triggering a program vs this new _USER program?
>>>>>> 
>>>>>> Will try.
>>>>>> 
>>>>>>> 
>>>>>>> If the performance is not significantly better than other ways, do you
>>>>>>> think it still makes sense to add a new BPF program type? I think
>>>>>>> triggering KPROBE/TRACEPOINT from bpf_prog_test_run() would be very
>>>>>>> nice, maybe it's possible to add that instead of a new program type?
>>>>>>> Either way, let's see comparison with other program triggering
>>>>>>> mechanisms first.
>>>>>> 
>>>>>> Triggering KPROBE and TRACEPOINT from bpf_prog_test_run() will be useful.
>>>>>> But I don't think they can be used instead of user program, for a couple
>>>>>> reasons. First, KPROBE/TRACEPOINT may be triggered by other programs
>>>>>> running in the system, so user will have to filter those noise out in
>>>>>> each program. Second, it is not easy to specify CPU for KPROBE/TRACEPOINT,
>>>>>> while this feature could be useful in many cases, e.g. get stack trace
>>>>>> on a given CPU.
>>>>>> 
>>>>> 
>>>>> Right, it's not as convenient with KPROBE/TRACEPOINT as with the USER
>>>>> program you've added specifically with that feature in mind. But if
>>>>> you pin user-space thread on the needed CPU and trigger kprobe/tp,
>>>>> then you'll get what you want. As for the "noise", see how
>>>>> bench_trigger() deals with that: it records thread ID and filters
>>>>> everything not matching. You can do the same with CPU ID. It's not as
>>>>> automatic as with a special BPF program type, but still pretty simple,
>>>>> which is why I'm still deciding (for myself) whether USER program type
>>>>> is necessary :)
>>>> 
>>>> Here are some bench_trigger numbers:
>>>> 
>>>> base      :    1.698 ± 0.001M/s
>>>> tp        :    1.477 ± 0.001M/s
>>>> rawtp     :    1.567 ± 0.001M/s
>>>> kprobe    :    1.431 ± 0.000M/s
>>>> fentry    :    1.691 ± 0.000M/s
>>>> fmodret   :    1.654 ± 0.000M/s
>>>> user      :    1.253 ± 0.000M/s
>>>> fentry-on-cpu:    0.022 ± 0.011M/s
>>>> user-on-cpu:    0.315 ± 0.001M/s
>>>> 
>>> 
>>> Ok, so basically all of raw_tp,tp,kprobe,fentry/fexit are
>>> significantly faster than USER programs. Sure, when compared to
>>> uprobe, they are faster, but not when doing on-specific-CPU run, it
>>> seems (judging from this patch's description, if I'm reading it
>>> right). Anyways, speed argument shouldn't be a reason for doing this,
>>> IMO.
>>> 
>>>> The two "on-cpu" tests run the program on a different CPU (see the patch
>>>> at the end).
>>>> 
>>>> "user" is about 25% slower than "fentry". I think this is mostly because
>>>> getpgid() is a faster syscall than bpf(BPF_TEST_RUN).
>>> 
>>> Yes, probably.
>>> 
>>>> 
>>>> "user-on-cpu" is more than 10x faster than "fentry-on-cpu", because IPI
>>>> is way faster than moving the process (via sched_setaffinity).
>>> 
>>> I don't think that's a good comparison, because you are actually
>>> testing sched_setaffinity performance on each iteration vs IPI in the
>>> kernel, not a BPF overhead.
>>> 
>>> I think the fair comparison for this would be to create a thread and
>>> pin it on necessary CPU, and only then BPF program calls in a loop.
>>> But I bet any of existing program types would beat USER program.
>>> 
>>>> 
>>>> For use cases that we would like to call BPF program on specific CPU,
>>>> triggering it via IPI is a lot faster.
>>> 
>>> So these use cases would be nice to expand on in the motivational part
>>> of the patch set. It's not really emphasized and it's not at all clear
>>> what you are trying to achieve. It also seems, depending on latency
>>> requirements, it's totally possible to achieve comparable results by
>>> pre-creating a thread for each CPU, pinning each one to its designated
>>> CPU and then using any suitable user-space signaling mechanism (a
>>> queue, condvar, etc) to ask a thread to trigger BPF program (fentry on
>>> getpgid(), for instance).
>> 
>> I don't see why user space signal plus fentry would be faster than IPI.
>> If the target cpu is running something, this gonna add two context
>> switches.
>> 
> 
> I didn't say faster, did I? I said it would be comparable and wouldn't
> require a new program type.

Well, I don't think adding program type is that big a deal. If that is
really a problem, we can use a new attach type instead. The goal is to 
trigger it with sys_bpf() on a different cpu. So we can call it kprobe
attach to nothing and hack that way. I add the new type because it makes
sense. The user just want to trigger a BPF program from user space. 

> But then again, without knowing all the
> details, it's a bit hard to discuss this. E.g., if you need to trigger
> that BPF program periodically, you can sleep in those per-CPU threads,
> or epoll, or whatever. Or maybe you can set up a per-CPU perf event
> that would trigger your program on the desired CPU, etc.My point is
> that I and others shouldn't be guessing this, I'd expect someone who's
> proposing an entire new BPF program type to motivate why this new
> program type is necessary and what problem it's solving that can't be
> solved with existing means.

Yes, there are other options. But they all come with non-trivial cost. 
Per-CPU-per-process threads and/or per-CPU perf event are cost we have 
to pay in production. IMO, these cost are much higher than a new program
type (or attach type). 

> 
> BTW, how frequently do you need to trigger the BPF program? Seems very
> frequently, if 2 vs 1 context switches might be a problem?

The whole solution requires two BPF programs. One on each context switch, 
the other is the user program. The user program will not trigger very
often. 

> 
>>> I bet in this case the  performance would be
>>> really nice for a lot of practical use cases. But then again, I don't
>>> know details of the intended use case, so please provide some more
>>> details.
>> 
>> Being able to trigger BPF program on a different CPU could enable many
>> use cases and optimizations. The use case I am looking at is to access
>> perf_event and percpu maps on the target CPU. For example:
>>        0. trigger the program
>>        1. read perf_event on cpu x;
>>        2. (optional) check which process is running on cpu x;
>>        3. add perf_event value to percpu map(s) on cpu x.
>> 
>> If we do these steps in a BPF program on cpu x, the cost is:
>>        A.0) trigger BPF via IPI;
>>        A.1) read perf_event locally;
>>        A.2) local access current;
>>        A.3) local access of percpu map(s).
>> 
>> If we can only do these on a different CPU, the cost will be:
>>        B.0) trigger BPF locally;
>>        B.1) read perf_event via IPI;
>>        B.2) remote access current on cpu x;
>>        B.3) remote access percpu map(s), or use non-percpu map(2).
>> 
>> Cost of (A.0 + A.1) is about same as (B.0 + B.1), maybe a little higher
>> (sys_bpf(), vs. sys_getpgid()). But A.2 and A.3 will be significantly
>> cheaper than B.2 and B.3.
>> 
>> Does this make sense?
> 
> It does, thanks. But what I was describing is still A, no? BPF program
> will be triggered on your desired cpu X, wouldn't it?

Well, that would be option C, but C could not do step 2, because we context 
switch to the dedicated thread. 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 2/5] libbpf: support BPF_PROG_TYPE_USER programs
  2020-08-05  6:54               ` Andrii Nakryiko
@ 2020-08-05  7:23                 ` Song Liu
  2020-08-05 17:44                   ` Andrii Nakryiko
  0 siblings, 1 reply; 43+ messages in thread
From: Song Liu @ 2020-08-05  7:23 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu



> On Aug 4, 2020, at 11:54 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Tue, Aug 4, 2020 at 11:26 PM Song Liu <songliubraving@fb.com> wrote:
>> 
>> 
>> 
>>> On Aug 4, 2020, at 10:32 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>> 
>>> On Tue, Aug 4, 2020 at 8:59 PM Song Liu <songliubraving@fb.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Aug 4, 2020, at 6:38 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>>>> 
>>>>> On Mon, Aug 3, 2020 at 6:18 PM Song Liu <songliubraving@fb.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Aug 2, 2020, at 6:40 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>>>>>> 
>>>>>>> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
>>>>>>>> 
>>>>>> 
>>>>>> [...]
>>>>>> 
>>>>>>> 
>>>>>>>> };
>>>>>>>> 
>>>>>>>> LIBBPF_API int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr);
>>>>>>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>>>>>>>> index b9f11f854985b..9ce175a486214 100644
>>>>>>>> --- a/tools/lib/bpf/libbpf.c
>>>>>>>> +++ b/tools/lib/bpf/libbpf.c
>>>>>>>> @@ -6922,6 +6922,7 @@ static const struct bpf_sec_def section_defs[] = {
>>>>>>>>     BPF_PROG_SEC("lwt_out",                 BPF_PROG_TYPE_LWT_OUT),
>>>>>>>>     BPF_PROG_SEC("lwt_xmit",                BPF_PROG_TYPE_LWT_XMIT),
>>>>>>>>     BPF_PROG_SEC("lwt_seg6local",           BPF_PROG_TYPE_LWT_SEG6LOCAL),
>>>>>>>> +       BPF_PROG_SEC("user",                    BPF_PROG_TYPE_USER),
>>>>>>> 
>>>>>>> let's do "user/" for consistency with most other prog types (and nice
>>>>>>> separation between prog type and custom user name)
>>>>>> 
>>>>>> About "user" vs. "user/", I still think "user" is better.
>>>>>> 
>>>>>> Unlike kprobe and tracepoint, user prog doesn't use the part after "/".
>>>>>> This is similar to "perf_event" for BPF_PROG_TYPE_PERF_EVENT, "xdl" for
>>>>>> BPF_PROG_TYPE_XDP, etc. If we specify "user" here, "user/" and "user/xxx"
>>>>>> would also work. However, if we specify "user/" here, programs that used
>>>>>> "user" by accident will fail to load, with a message like:
>>>>>> 
>>>>>>      libbpf: failed to load program 'user'
>>>>>> 
>>>>>> which is confusing.
>>>>> 
>>>>> xdp, perf_event and a bunch of others don't enforce it, that's true,
>>>>> they are a bit of a legacy,
>>>> 
>>>> I don't see w/o "/" is a legacy thing. BPF_PROG_TYPE_STRUCT_OPS just uses
>>>> "struct_ops".
>>>> 
>>>>> unfortunately. But all the recent ones do,
>>>>> and we explicitly did that for xdp_dev/xdp_cpu, for instance.
>>>>> Specifying just "user" in the spec would allow something nonsensical
>>>>> like "userargh", for instance, due to this being treated as a prefix.
>>>>> There is no harm to require users to do "user/my_prog", though.
>>>> 
>>>> I don't see why allowing "userargh" is a problem. Failing "user" is
>>>> more confusing. We can probably improve that by a hint like:
>>>> 
>>>>   libbpf: failed to load program 'user', do you mean "user/"?
>>>> 
>>>> But it is pretty silly. "user/something_never_used" also looks weird.
>>> 
>>> "userargh" is terrible, IMO. It's a different identifier that just
>>> happens to have the first 4 letters matching "user" program type.
>>> There must be either a standardized separator (which happens to be
>>> '/') or none. See the suggestion below.
>> 
>> We have no problem deal with "a different identifier that just happens
>> to have the first letters matching", like xdp vs. xdp_devmap and
>> xdp_cpumap, right?
>> 
> 
> xdp vs xdp_devmap is an entirely different thing. We deal with it by
> checking xdp_devmap first. What I'm saying is that user can do
> "xdpomg" and libbpf would be happy (today). And I don't think that's
> good. But further, if someone does something like "xdp_devmap_omg",
> guess which program type will be inferred? Hint: not xdp_devmap and
> libbpf won't report an error either. All because "xdp" is so lax
> today.
> 
>>>> 
>>>>> Alternatively, we could introduce a new convention in the spec,
>>>>> something like "user?", which would accept either "user" or
>>>>> "user/something", but not "user/" nor "userblah". We can try that as
>>>>> well.
>>>> 
>>>> Again, I don't really understand why allowing "userblah" is a problem.
>>>> We already have "xdp", "xdp_devmap/", and "xdp_cpumap/", they all work
>>>> fine so far.
>>> 
>>> Right, we have "xdp_devmap/" and "xdp_cpumap/", as you say. I haven't
>>> seen so much pushback against trailing forward slash with those ;)
>> 
>> I haven't seen any issue with old "perf_event", "xdp" and new "struct_ops"
>> either.
>> 
>>> 
>>> But anyways, as part of deprecating APIs and preparing libbpf for 1.0
>>> release over this half, I think I'm going to emit warnings for names
>>> like "prog_type_whatever" or "prog_typeevenworse", etc. And asking
>>> users to normalize section names to either "prog_type" or
>>> "prog_type/something/here", whichever makes sense for a specific
>>> program type.
>> 
>> Exactly, "user" makes sense here; while "kprobe/__set_task_comm" makes
>> sense for kprobe.
> 
> Right, but "userblah" doesn't. It would be great if you could help
> make what I described above become true. But at least don't make it
> worse by allowing unrestricted "user" prefix. I'm OK with strict
> "user" or "user/blah", I'm not OK with "userblah", I'm sorry.

If the concern with "userblah" is real and unbearable, so is "xdpblah"
and "perf_eventblah", and so on, and so on. 

> 
>> 
>>> Right now libbpf doesn't allow two separate BPF programs
>>> with the same section name, so enforcing strict "user" is limiting to
>>> users. We are going to lift that restriction pretty soon, though. But
>>> for now, please stick with what we've been doing lately and mark it as
>>> "user/", later we'll allow just "user" as well.
>> 
>> Since we would allow "user" later, why we have to reject it for now?
> 
> Because libbpf is dumb in that regard today? And instead of migrating
> users later, I want to prevent users making bad choices right now.

The good choice here is to use "user", IMO. And you are preventing people 
to use it. If user has to use "user/" for now. They will have to update 
the programs later, right? If the conclusion is "user/xxx" is the ultimate
goal, I would agree with "user/" for now. 

> Then relax it, if necessary. Alternatively, we can fix up libbpf logic
> before the USER program type lands.

I don't see why the USER program type need to wait for libbpf fix, as
"xdp", "perf_event", etc. all work well today. 

> 
>> Imagine the user just compiled and booted into a new kernel with user
>> program support; and then got the following message:
>> 
>>        libbpf: failed to load program 'user'
>> 
>> If I were the user, I would definitely question whether the kernel was
>> correct...
> 
> That's also bad, and again, we can make libbpf better. I think moving
> forward any non-recognized BPF program type should be reported by
> libbpf as an error. But we can't do it right now, we have to have a
> period in which users will get a chance to update their BPF programs.
> This will have to happen over few libbpf releases at least. So please
> join in on the fun of fixing stuff like this.

I'd love to join the fun. Maybe after user program lands ;)



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
  2020-08-05  4:47             ` Song Liu
  2020-08-05  5:47               ` Andrii Nakryiko
@ 2020-08-05 17:16               ` Alexei Starovoitov
  2020-08-05 17:27                 ` Andrii Nakryiko
  2020-08-05 18:56                 ` Song Liu
  1 sibling, 2 replies; 43+ messages in thread
From: Alexei Starovoitov @ 2020-08-05 17:16 UTC (permalink / raw)
  To: Song Liu
  Cc: Andrii Nakryiko, open list, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, john fastabend, KP Singh,
	Jesper Dangaard Brouer, Daniel Xu

On Wed, Aug 05, 2020 at 04:47:30AM +0000, Song Liu wrote:
> 
> Being able to trigger BPF program on a different CPU could enable many
> use cases and optimizations. The use case I am looking at is to access
> perf_event and percpu maps on the target CPU. For example:
> 	0. trigger the program
> 	1. read perf_event on cpu x;
> 	2. (optional) check which process is running on cpu x;
> 	3. add perf_event value to percpu map(s) on cpu x. 

If the whole thing is about doing the above then I don't understand why new
prog type is needed. Can prog_test_run support existing BPF_PROG_TYPE_KPROBE?
"enable many use cases" sounds vague. I don't think folks reading
the patches can guess those "use cases".
"Testing existing kprobe bpf progs" would sound more convincing to me.
If the test_run framework can be extended to trigger kprobe with correct pt_regs.
As part of it test_run would trigger on a given cpu with $ip pointing
to some test fuction in test_run.c. For local test_run the stack trace
would include bpf syscall chain. For IPI the stack trace would include
the corresponding kernel pieces where top is our special test function.
Sort of like pseudo kprobe where there is no actual kprobe logic,
since kprobe prog doesn't care about mechanism. It needs correct
pt_regs only as input context.
The kprobe prog output (return value) has special meaning though,
so may be kprobe prog type is not a good fit.
Something like fentry/fexit may be better, since verifier check_return_code()
enforces 'return 0'. So their return value is effectively "void".
Then prog_test_run would need to gain an ability to trigger
fentry/fexit prog on a given cpu.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
  2020-08-05 17:16               ` Alexei Starovoitov
@ 2020-08-05 17:27                 ` Andrii Nakryiko
  2020-08-05 17:45                   ` Alexei Starovoitov
  2020-08-05 18:56                 ` Song Liu
  1 sibling, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2020-08-05 17:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Song Liu, open list, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, john fastabend, KP Singh,
	Jesper Dangaard Brouer, Daniel Xu

On Wed, Aug 5, 2020 at 10:16 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Aug 05, 2020 at 04:47:30AM +0000, Song Liu wrote:
> >
> > Being able to trigger BPF program on a different CPU could enable many
> > use cases and optimizations. The use case I am looking at is to access
> > perf_event and percpu maps on the target CPU. For example:
> >       0. trigger the program
> >       1. read perf_event on cpu x;
> >       2. (optional) check which process is running on cpu x;
> >       3. add perf_event value to percpu map(s) on cpu x.
>
> If the whole thing is about doing the above then I don't understand why new
> prog type is needed. Can prog_test_run support existing BPF_PROG_TYPE_KPROBE?
> "enable many use cases" sounds vague. I don't think folks reading
> the patches can guess those "use cases".
> "Testing existing kprobe bpf progs" would sound more convincing to me.

Was just about to propose the same :) I wonder if generic test_run()
capability to trigger test programs of whatever supported type on a
specified CPU through IPI can be added. That way you can even use the
XDP program to do what Song seems to need.

TRACEPOINTs might also be a good fit here, given it seems simpler to
let users specify custom tracepoint data for test_run(). Having the
ability to unit-test KPROBE and TRACEPOINT, however rudimentary, is
already a big win.

> If the test_run framework can be extended to trigger kprobe with correct pt_regs.
> As part of it test_run would trigger on a given cpu with $ip pointing
> to some test fuction in test_run.c. For local test_run the stack trace
> would include bpf syscall chain. For IPI the stack trace would include
> the corresponding kernel pieces where top is our special test function.
> Sort of like pseudo kprobe where there is no actual kprobe logic,
> since kprobe prog doesn't care about mechanism. It needs correct
> pt_regs only as input context.
> The kprobe prog output (return value) has special meaning though,
> so may be kprobe prog type is not a good fit.

It does? I don't remember returning 1 from KPROBE changing anything. I
thought it's only the special bpf_override_return() that can influence
the kernel function return result.

> Something like fentry/fexit may be better, since verifier check_return_code()
> enforces 'return 0'. So their return value is effectively "void".
> Then prog_test_run would need to gain an ability to trigger
> fentry/fexit prog on a given cpu.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
  2020-08-05  7:01                 ` Song Liu
@ 2020-08-05 17:39                   ` Andrii Nakryiko
  2020-08-05 18:41                     ` Song Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Andrii Nakryiko @ 2020-08-05 17:39 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu

On Wed, Aug 5, 2020 at 12:01 AM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Aug 4, 2020, at 10:47 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Tue, Aug 4, 2020 at 9:47 PM Song Liu <songliubraving@fb.com> wrote:
> >>
> >>
> >>
> >>> On Aug 4, 2020, at 6:52 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >>>
> >>> On Tue, Aug 4, 2020 at 2:01 PM Song Liu <songliubraving@fb.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>>> On Aug 2, 2020, at 10:10 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >>>>>
> >>>>> On Sun, Aug 2, 2020 at 9:47 PM Song Liu <songliubraving@fb.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>>> On Aug 2, 2020, at 6:51 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >>>>>>>
> >>>>>>> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
> >>>>>>>>
> >>>>>>>> Add a benchmark to compare performance of
> >>>>>>>> 1) uprobe;
> >>>>>>>> 2) user program w/o args;
> >>>>>>>> 3) user program w/ args;
> >>>>>>>> 4) user program w/ args on random cpu.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Can you please add it to the existing benchmark runner instead, e.g.,
> >>>>>>> along the other bench_trigger benchmarks? No need to re-implement
> >>>>>>> benchmark setup. And also that would also allow to compare existing
> >>>>>>> ways of cheaply triggering a program vs this new _USER program?
> >>>>>>
> >>>>>> Will try.
> >>>>>>
> >>>>>>>
> >>>>>>> If the performance is not significantly better than other ways, do you
> >>>>>>> think it still makes sense to add a new BPF program type? I think
> >>>>>>> triggering KPROBE/TRACEPOINT from bpf_prog_test_run() would be very
> >>>>>>> nice, maybe it's possible to add that instead of a new program type?
> >>>>>>> Either way, let's see comparison with other program triggering
> >>>>>>> mechanisms first.
> >>>>>>
> >>>>>> Triggering KPROBE and TRACEPOINT from bpf_prog_test_run() will be useful.
> >>>>>> But I don't think they can be used instead of user program, for a couple
> >>>>>> reasons. First, KPROBE/TRACEPOINT may be triggered by other programs
> >>>>>> running in the system, so user will have to filter those noise out in
> >>>>>> each program. Second, it is not easy to specify CPU for KPROBE/TRACEPOINT,
> >>>>>> while this feature could be useful in many cases, e.g. get stack trace
> >>>>>> on a given CPU.
> >>>>>>
> >>>>>
> >>>>> Right, it's not as convenient with KPROBE/TRACEPOINT as with the USER
> >>>>> program you've added specifically with that feature in mind. But if
> >>>>> you pin user-space thread on the needed CPU and trigger kprobe/tp,
> >>>>> then you'll get what you want. As for the "noise", see how
> >>>>> bench_trigger() deals with that: it records thread ID and filters
> >>>>> everything not matching. You can do the same with CPU ID. It's not as
> >>>>> automatic as with a special BPF program type, but still pretty simple,
> >>>>> which is why I'm still deciding (for myself) whether USER program type
> >>>>> is necessary :)
> >>>>
> >>>> Here are some bench_trigger numbers:
> >>>>
> >>>> base      :    1.698 ± 0.001M/s
> >>>> tp        :    1.477 ± 0.001M/s
> >>>> rawtp     :    1.567 ± 0.001M/s
> >>>> kprobe    :    1.431 ± 0.000M/s
> >>>> fentry    :    1.691 ± 0.000M/s
> >>>> fmodret   :    1.654 ± 0.000M/s
> >>>> user      :    1.253 ± 0.000M/s
> >>>> fentry-on-cpu:    0.022 ± 0.011M/s
> >>>> user-on-cpu:    0.315 ± 0.001M/s
> >>>>
> >>>
> >>> Ok, so basically all of raw_tp,tp,kprobe,fentry/fexit are
> >>> significantly faster than USER programs. Sure, when compared to
> >>> uprobe, they are faster, but not when doing on-specific-CPU run, it
> >>> seems (judging from this patch's description, if I'm reading it
> >>> right). Anyways, speed argument shouldn't be a reason for doing this,
> >>> IMO.
> >>>
> >>>> The two "on-cpu" tests run the program on a different CPU (see the patch
> >>>> at the end).
> >>>>
> >>>> "user" is about 25% slower than "fentry". I think this is mostly because
> >>>> getpgid() is a faster syscall than bpf(BPF_TEST_RUN).
> >>>
> >>> Yes, probably.
> >>>
> >>>>
> >>>> "user-on-cpu" is more than 10x faster than "fentry-on-cpu", because IPI
> >>>> is way faster than moving the process (via sched_setaffinity).
> >>>
> >>> I don't think that's a good comparison, because you are actually
> >>> testing sched_setaffinity performance on each iteration vs IPI in the
> >>> kernel, not a BPF overhead.
> >>>
> >>> I think the fair comparison for this would be to create a thread and
> >>> pin it on necessary CPU, and only then BPF program calls in a loop.
> >>> But I bet any of existing program types would beat USER program.
> >>>
> >>>>
> >>>> For use cases that we would like to call BPF program on specific CPU,
> >>>> triggering it via IPI is a lot faster.
> >>>
> >>> So these use cases would be nice to expand on in the motivational part
> >>> of the patch set. It's not really emphasized and it's not at all clear
> >>> what you are trying to achieve. It also seems, depending on latency
> >>> requirements, it's totally possible to achieve comparable results by
> >>> pre-creating a thread for each CPU, pinning each one to its designated
> >>> CPU and then using any suitable user-space signaling mechanism (a
> >>> queue, condvar, etc) to ask a thread to trigger BPF program (fentry on
> >>> getpgid(), for instance).
> >>
> >> I don't see why user space signal plus fentry would be faster than IPI.
> >> If the target cpu is running something, this gonna add two context
> >> switches.
> >>
> >
> > I didn't say faster, did I? I said it would be comparable and wouldn't
> > require a new program type.
>
> Well, I don't think adding program type is that big a deal. If that is
> really a problem, we can use a new attach type instead. The goal is to
> trigger it with sys_bpf() on a different cpu. So we can call it kprobe
> attach to nothing and hack that way. I add the new type because it makes
> sense. The user just want to trigger a BPF program from user space.

I thought we already concluded that it's not really "trigger a BPF
program from user space", because for that you have many existing and
even faster options. After a few rounds of emails, it seems it's more
about triggering the BPF program on another CPU without preempting
whatever is running on that CPU. It would be helpful to be clear and
upfront about the requirements.

>
> > But then again, without knowing all the
> > details, it's a bit hard to discuss this. E.g., if you need to trigger
> > that BPF program periodically, you can sleep in those per-CPU threads,
> > or epoll, or whatever. Or maybe you can set up a per-CPU perf event
> > that would trigger your program on the desired CPU, etc.My point is
> > that I and others shouldn't be guessing this, I'd expect someone who's
> > proposing an entire new BPF program type to motivate why this new
> > program type is necessary and what problem it's solving that can't be
> > solved with existing means.
>
> Yes, there are other options. But they all come with non-trivial cost.
> Per-CPU-per-process threads and/or per-CPU perf event are cost we have
> to pay in production. IMO, these cost are much higher than a new program
> type (or attach type).
>

So for threads I know the costs (a bit of memory for thread stack,
plus some internal book keeping stuff in kernel), which is arguable
how big of a deal is that if those threads do pretty much nothing most
of the time. But what's the exact cost of perf events and why it's
unacceptably high?

The reason I'm asking is that it seems to me that one alternative,
which is more generic (and thus potentially more useful) would be to
have a manually-triggerable perf event. Some sort of software event,
that's triggered from ioctl or some other syscall, that's appropriate
for perf subsystem. You'd pre-create a perf_event for each CPU,
remember their FDs, then would trigger the one you need (corresponding
to desired CPU). From the BPF side, you'd just use a normal perf_event
program to handle perf_event activation. But it could be used even
outside of the BPF ecosystem, which is a good sign for me, because it
allows more flexible composition of building blocks.

> >
> > BTW, how frequently do you need to trigger the BPF program? Seems very
> > frequently, if 2 vs 1 context switches might be a problem?
>
> The whole solution requires two BPF programs. One on each context switch,
> the other is the user program. The user program will not trigger very
> often.

Ok, so performance was never an objective, I wonder why it is put as
the main reason for this new type of BPF program?

>
> >
> >>> I bet in this case the  performance would be
> >>> really nice for a lot of practical use cases. But then again, I don't
> >>> know details of the intended use case, so please provide some more
> >>> details.
> >>
> >> Being able to trigger BPF program on a different CPU could enable many
> >> use cases and optimizations. The use case I am looking at is to access
> >> perf_event and percpu maps on the target CPU. For example:
> >>        0. trigger the program
> >>        1. read perf_event on cpu x;
> >>        2. (optional) check which process is running on cpu x;
> >>        3. add perf_event value to percpu map(s) on cpu x.
> >>
> >> If we do these steps in a BPF program on cpu x, the cost is:
> >>        A.0) trigger BPF via IPI;
> >>        A.1) read perf_event locally;
> >>        A.2) local access current;
> >>        A.3) local access of percpu map(s).
> >>
> >> If we can only do these on a different CPU, the cost will be:
> >>        B.0) trigger BPF locally;
> >>        B.1) read perf_event via IPI;
> >>        B.2) remote access current on cpu x;
> >>        B.3) remote access percpu map(s), or use non-percpu map(2).
> >>
> >> Cost of (A.0 + A.1) is about same as (B.0 + B.1), maybe a little higher
> >> (sys_bpf(), vs. sys_getpgid()). But A.2 and A.3 will be significantly
> >> cheaper than B.2 and B.3.
> >>
> >> Does this make sense?
> >
> > It does, thanks. But what I was describing is still A, no? BPF program
> > will be triggered on your desired cpu X, wouldn't it?
>
> Well, that would be option C, but C could not do step 2, because we context
> switch to the dedicated thread.
>

So I think *this* is a real requirement. No preemption of the running
process on a different CPU. That does sound like what perf event does,
doesn't it? See above.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 2/5] libbpf: support BPF_PROG_TYPE_USER programs
  2020-08-05  7:23                 ` Song Liu
@ 2020-08-05 17:44                   ` Andrii Nakryiko
  0 siblings, 0 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2020-08-05 17:44 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu

On Wed, Aug 5, 2020 at 12:23 AM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Aug 4, 2020, at 11:54 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Tue, Aug 4, 2020 at 11:26 PM Song Liu <songliubraving@fb.com> wrote:
> >>
> >>
> >>
> >>> On Aug 4, 2020, at 10:32 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >>>
> >>> On Tue, Aug 4, 2020 at 8:59 PM Song Liu <songliubraving@fb.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>>> On Aug 4, 2020, at 6:38 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >>>>>
> >>>>> On Mon, Aug 3, 2020 at 6:18 PM Song Liu <songliubraving@fb.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On Aug 2, 2020, at 6:40 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >>>>>>>
> >>>>>>> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
> >>>>>>>>
> >>>>>>
> >>>>>> [...]
> >>>>>>
> >>>>>>>
> >>>>>>>> };
> >>>>>>>>
> >>>>>>>> LIBBPF_API int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr);
> >>>>>>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> >>>>>>>> index b9f11f854985b..9ce175a486214 100644
> >>>>>>>> --- a/tools/lib/bpf/libbpf.c
> >>>>>>>> +++ b/tools/lib/bpf/libbpf.c
> >>>>>>>> @@ -6922,6 +6922,7 @@ static const struct bpf_sec_def section_defs[] = {
> >>>>>>>>     BPF_PROG_SEC("lwt_out",                 BPF_PROG_TYPE_LWT_OUT),
> >>>>>>>>     BPF_PROG_SEC("lwt_xmit",                BPF_PROG_TYPE_LWT_XMIT),
> >>>>>>>>     BPF_PROG_SEC("lwt_seg6local",           BPF_PROG_TYPE_LWT_SEG6LOCAL),
> >>>>>>>> +       BPF_PROG_SEC("user",                    BPF_PROG_TYPE_USER),
> >>>>>>>
> >>>>>>> let's do "user/" for consistency with most other prog types (and nice
> >>>>>>> separation between prog type and custom user name)
> >>>>>>
> >>>>>> About "user" vs. "user/", I still think "user" is better.
> >>>>>>
> >>>>>> Unlike kprobe and tracepoint, user prog doesn't use the part after "/".
> >>>>>> This is similar to "perf_event" for BPF_PROG_TYPE_PERF_EVENT, "xdl" for
> >>>>>> BPF_PROG_TYPE_XDP, etc. If we specify "user" here, "user/" and "user/xxx"
> >>>>>> would also work. However, if we specify "user/" here, programs that used
> >>>>>> "user" by accident will fail to load, with a message like:
> >>>>>>
> >>>>>>      libbpf: failed to load program 'user'
> >>>>>>
> >>>>>> which is confusing.
> >>>>>
> >>>>> xdp, perf_event and a bunch of others don't enforce it, that's true,
> >>>>> they are a bit of a legacy,
> >>>>
> >>>> I don't see w/o "/" is a legacy thing. BPF_PROG_TYPE_STRUCT_OPS just uses
> >>>> "struct_ops".
> >>>>
> >>>>> unfortunately. But all the recent ones do,
> >>>>> and we explicitly did that for xdp_dev/xdp_cpu, for instance.
> >>>>> Specifying just "user" in the spec would allow something nonsensical
> >>>>> like "userargh", for instance, due to this being treated as a prefix.
> >>>>> There is no harm to require users to do "user/my_prog", though.
> >>>>
> >>>> I don't see why allowing "userargh" is a problem. Failing "user" is
> >>>> more confusing. We can probably improve that by a hint like:
> >>>>
> >>>>   libbpf: failed to load program 'user', do you mean "user/"?
> >>>>
> >>>> But it is pretty silly. "user/something_never_used" also looks weird.
> >>>
> >>> "userargh" is terrible, IMO. It's a different identifier that just
> >>> happens to have the first 4 letters matching "user" program type.
> >>> There must be either a standardized separator (which happens to be
> >>> '/') or none. See the suggestion below.
> >>
> >> We have no problem deal with "a different identifier that just happens
> >> to have the first letters matching", like xdp vs. xdp_devmap and
> >> xdp_cpumap, right?
> >>
> >
> > xdp vs xdp_devmap is an entirely different thing. We deal with it by
> > checking xdp_devmap first. What I'm saying is that user can do
> > "xdpomg" and libbpf would be happy (today). And I don't think that's
> > good. But further, if someone does something like "xdp_devmap_omg",
> > guess which program type will be inferred? Hint: not xdp_devmap and
> > libbpf won't report an error either. All because "xdp" is so lax
> > today.
> >
> >>>>
> >>>>> Alternatively, we could introduce a new convention in the spec,
> >>>>> something like "user?", which would accept either "user" or
> >>>>> "user/something", but not "user/" nor "userblah". We can try that as
> >>>>> well.
> >>>>
> >>>> Again, I don't really understand why allowing "userblah" is a problem.
> >>>> We already have "xdp", "xdp_devmap/", and "xdp_cpumap/", they all work
> >>>> fine so far.
> >>>
> >>> Right, we have "xdp_devmap/" and "xdp_cpumap/", as you say. I haven't
> >>> seen so much pushback against trailing forward slash with those ;)
> >>
> >> I haven't seen any issue with old "perf_event", "xdp" and new "struct_ops"
> >> either.
> >>
> >>>
> >>> But anyways, as part of deprecating APIs and preparing libbpf for 1.0
> >>> release over this half, I think I'm going to emit warnings for names
> >>> like "prog_type_whatever" or "prog_typeevenworse", etc. And asking
> >>> users to normalize section names to either "prog_type" or
> >>> "prog_type/something/here", whichever makes sense for a specific
> >>> program type.
> >>
> >> Exactly, "user" makes sense here; while "kprobe/__set_task_comm" makes
> >> sense for kprobe.
> >
> > Right, but "userblah" doesn't. It would be great if you could help
> > make what I described above become true. But at least don't make it
> > worse by allowing unrestricted "user" prefix. I'm OK with strict
> > "user" or "user/blah", I'm not OK with "userblah", I'm sorry.
>
> If the concern with "userblah" is real and unbearable, so is "xdpblah"
> and "perf_eventblah", and so on, and so on.
>

Oh yeah, "xdpblah" makes me cringe. "Some other kid is doing wrong
thing, let me do it as well" style of argument never worked for me
with my parents, I don't see why it should work here :)

But anyways, let's table this discussion, it's not worth spending so
much time on it. As I said, I'm going to start enforcing standardized
separator or a single program type word soon enough. Might as well do
that for "userblah".

> >
> >>
> >>> Right now libbpf doesn't allow two separate BPF programs
> >>> with the same section name, so enforcing strict "user" is limiting to
> >>> users. We are going to lift that restriction pretty soon, though. But
> >>> for now, please stick with what we've been doing lately and mark it as
> >>> "user/", later we'll allow just "user" as well.
> >>
> >> Since we would allow "user" later, why we have to reject it for now?
> >
> > Because libbpf is dumb in that regard today? And instead of migrating
> > users later, I want to prevent users making bad choices right now.
>
> The good choice here is to use "user", IMO. And you are preventing people
> to use it. If user has to use "user/" for now. They will have to update
> the programs later, right? If the conclusion is "user/xxx" is the ultimate
> goal, I would agree with "user/" for now.

They won't have to update, "user/something" would still work.

>
> > Then relax it, if necessary. Alternatively, we can fix up libbpf logic
> > before the USER program type lands.
>
> I don't see why the USER program type need to wait for libbpf fix, as
> "xdp", "perf_event", etc. all work well today.

It's about being a good citizen and helping move libbpf forward.

>
> >
> >> Imagine the user just compiled and booted into a new kernel with user
> >> program support; and then got the following message:
> >>
> >>        libbpf: failed to load program 'user'
> >>
> >> If I were the user, I would definitely question whether the kernel was
> >> correct...
> >
> > That's also bad, and again, we can make libbpf better. I think moving
> > forward any non-recognized BPF program type should be reported by
> > libbpf as an error. But we can't do it right now, we have to have a
> > period in which users will get a chance to update their BPF programs.
> > This will have to happen over few libbpf releases at least. So please
> > join in on the fun of fixing stuff like this.
>
> I'd love to join the fun. Maybe after user program lands ;)
>
>

Of course :) see above.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
  2020-08-05 17:27                 ` Andrii Nakryiko
@ 2020-08-05 17:45                   ` Alexei Starovoitov
  2020-08-05 17:56                     ` Andrii Nakryiko
  0 siblings, 1 reply; 43+ messages in thread
From: Alexei Starovoitov @ 2020-08-05 17:45 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Song Liu, open list, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, john fastabend, KP Singh,
	Jesper Dangaard Brouer, Daniel Xu

On Wed, Aug 05, 2020 at 10:27:28AM -0700, Andrii Nakryiko wrote:
> On Wed, Aug 5, 2020 at 10:16 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Aug 05, 2020 at 04:47:30AM +0000, Song Liu wrote:
> > >
> > > Being able to trigger BPF program on a different CPU could enable many
> > > use cases and optimizations. The use case I am looking at is to access
> > > perf_event and percpu maps on the target CPU. For example:
> > >       0. trigger the program
> > >       1. read perf_event on cpu x;
> > >       2. (optional) check which process is running on cpu x;
> > >       3. add perf_event value to percpu map(s) on cpu x.
> >
> > If the whole thing is about doing the above then I don't understand why new
> > prog type is needed. Can prog_test_run support existing BPF_PROG_TYPE_KPROBE?
> > "enable many use cases" sounds vague. I don't think folks reading
> > the patches can guess those "use cases".
> > "Testing existing kprobe bpf progs" would sound more convincing to me.
> 
> Was just about to propose the same :) I wonder if generic test_run()
> capability to trigger test programs of whatever supported type on a
> specified CPU through IPI can be added. That way you can even use the
> XDP program to do what Song seems to need.
> 
> TRACEPOINTs might also be a good fit here, given it seems simpler to
> let users specify custom tracepoint data for test_run(). Having the
> ability to unit-test KPROBE and TRACEPOINT, however rudimentary, is
> already a big win.
> 
> > If the test_run framework can be extended to trigger kprobe with correct pt_regs.
> > As part of it test_run would trigger on a given cpu with $ip pointing
> > to some test fuction in test_run.c. For local test_run the stack trace
> > would include bpf syscall chain. For IPI the stack trace would include
> > the corresponding kernel pieces where top is our special test function.
> > Sort of like pseudo kprobe where there is no actual kprobe logic,
> > since kprobe prog doesn't care about mechanism. It needs correct
> > pt_regs only as input context.
> > The kprobe prog output (return value) has special meaning though,
> > so may be kprobe prog type is not a good fit.
> 
> It does? I don't remember returning 1 from KPROBE changing anything. I
> thought it's only the special bpf_override_return() that can influence
> the kernel function return result.

See comment in trace_call_bpf().
And logic to handle it in kprobe_perf_func() for kprobes.
and in perf_trace_run_bpf_submit() for tracepoints.
It's historical and Song actually discovered an issue with such behavior.
I don't remember whether we've concluded on the solution.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
  2020-08-05 17:45                   ` Alexei Starovoitov
@ 2020-08-05 17:56                     ` Andrii Nakryiko
  0 siblings, 0 replies; 43+ messages in thread
From: Andrii Nakryiko @ 2020-08-05 17:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Song Liu, open list, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, john fastabend, KP Singh,
	Jesper Dangaard Brouer, Daniel Xu

On Wed, Aug 5, 2020 at 10:45 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Aug 05, 2020 at 10:27:28AM -0700, Andrii Nakryiko wrote:
> > On Wed, Aug 5, 2020 at 10:16 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Wed, Aug 05, 2020 at 04:47:30AM +0000, Song Liu wrote:
> > > >
> > > > Being able to trigger BPF program on a different CPU could enable many
> > > > use cases and optimizations. The use case I am looking at is to access
> > > > perf_event and percpu maps on the target CPU. For example:
> > > >       0. trigger the program
> > > >       1. read perf_event on cpu x;
> > > >       2. (optional) check which process is running on cpu x;
> > > >       3. add perf_event value to percpu map(s) on cpu x.
> > >
> > > If the whole thing is about doing the above then I don't understand why new
> > > prog type is needed. Can prog_test_run support existing BPF_PROG_TYPE_KPROBE?
> > > "enable many use cases" sounds vague. I don't think folks reading
> > > the patches can guess those "use cases".
> > > "Testing existing kprobe bpf progs" would sound more convincing to me.
> >
> > Was just about to propose the same :) I wonder if generic test_run()
> > capability to trigger test programs of whatever supported type on a
> > specified CPU through IPI can be added. That way you can even use the
> > XDP program to do what Song seems to need.
> >
> > TRACEPOINTs might also be a good fit here, given it seems simpler to
> > let users specify custom tracepoint data for test_run(). Having the
> > ability to unit-test KPROBE and TRACEPOINT, however rudimentary, is
> > already a big win.
> >
> > > If the test_run framework can be extended to trigger kprobe with correct pt_regs.
> > > As part of it test_run would trigger on a given cpu with $ip pointing
> > > to some test fuction in test_run.c. For local test_run the stack trace
> > > would include bpf syscall chain. For IPI the stack trace would include
> > > the corresponding kernel pieces where top is our special test function.
> > > Sort of like pseudo kprobe where there is no actual kprobe logic,
> > > since kprobe prog doesn't care about mechanism. It needs correct
> > > pt_regs only as input context.
> > > The kprobe prog output (return value) has special meaning though,
> > > so may be kprobe prog type is not a good fit.
> >
> > It does? I don't remember returning 1 from KPROBE changing anything. I
> > thought it's only the special bpf_override_return() that can influence
> > the kernel function return result.
>
> See comment in trace_call_bpf().
> And logic to handle it in kprobe_perf_func() for kprobes.
> and in perf_trace_run_bpf_submit() for tracepoints.
> It's historical and Song actually discovered an issue with such behavior.
> I don't remember whether we've concluded on the solution.

Oh, thanks for pointers. Never realized there is more going on with
those. I guess return 1; is not advised then, as it causes extra
overhead.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
  2020-08-05 17:39                   ` Andrii Nakryiko
@ 2020-08-05 18:41                     ` Song Liu
  0 siblings, 0 replies; 43+ messages in thread
From: Song Liu @ 2020-08-05 18:41 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Daniel Xu



> On Aug 5, 2020, at 10:39 AM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Wed, Aug 5, 2020 at 12:01 AM Song Liu <songliubraving@fb.com> wrote:
>> 
>> 
>> 
>>> On Aug 4, 2020, at 10:47 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>> 
>>> On Tue, Aug 4, 2020 at 9:47 PM Song Liu <songliubraving@fb.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Aug 4, 2020, at 6:52 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>>>> 
>>>>> On Tue, Aug 4, 2020 at 2:01 PM Song Liu <songliubraving@fb.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Aug 2, 2020, at 10:10 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>>>>>> 
>>>>>>> On Sun, Aug 2, 2020 at 9:47 PM Song Liu <songliubraving@fb.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Aug 2, 2020, at 6:51 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Add a benchmark to compare performance of
>>>>>>>>>> 1) uprobe;
>>>>>>>>>> 2) user program w/o args;
>>>>>>>>>> 3) user program w/ args;
>>>>>>>>>> 4) user program w/ args on random cpu.
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Can you please add it to the existing benchmark runner instead, e.g.,
>>>>>>>>> along the other bench_trigger benchmarks? No need to re-implement
>>>>>>>>> benchmark setup. And also that would also allow to compare existing
>>>>>>>>> ways of cheaply triggering a program vs this new _USER program?
>>>>>>>> 
>>>>>>>> Will try.
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> If the performance is not significantly better than other ways, do you
>>>>>>>>> think it still makes sense to add a new BPF program type? I think
>>>>>>>>> triggering KPROBE/TRACEPOINT from bpf_prog_test_run() would be very
>>>>>>>>> nice, maybe it's possible to add that instead of a new program type?
>>>>>>>>> Either way, let's see comparison with other program triggering
>>>>>>>>> mechanisms first.
>>>>>>>> 
>>>>>>>> Triggering KPROBE and TRACEPOINT from bpf_prog_test_run() will be useful.
>>>>>>>> But I don't think they can be used instead of user program, for a couple
>>>>>>>> reasons. First, KPROBE/TRACEPOINT may be triggered by other programs
>>>>>>>> running in the system, so user will have to filter those noise out in
>>>>>>>> each program. Second, it is not easy to specify CPU for KPROBE/TRACEPOINT,
>>>>>>>> while this feature could be useful in many cases, e.g. get stack trace
>>>>>>>> on a given CPU.
>>>>>>>> 
>>>>>>> 
>>>>>>> Right, it's not as convenient with KPROBE/TRACEPOINT as with the USER
>>>>>>> program you've added specifically with that feature in mind. But if
>>>>>>> you pin user-space thread on the needed CPU and trigger kprobe/tp,
>>>>>>> then you'll get what you want. As for the "noise", see how
>>>>>>> bench_trigger() deals with that: it records thread ID and filters
>>>>>>> everything not matching. You can do the same with CPU ID. It's not as
>>>>>>> automatic as with a special BPF program type, but still pretty simple,
>>>>>>> which is why I'm still deciding (for myself) whether USER program type
>>>>>>> is necessary :)
>>>>>> 
>>>>>> Here are some bench_trigger numbers:
>>>>>> 
>>>>>> base      :    1.698 ± 0.001M/s
>>>>>> tp        :    1.477 ± 0.001M/s
>>>>>> rawtp     :    1.567 ± 0.001M/s
>>>>>> kprobe    :    1.431 ± 0.000M/s
>>>>>> fentry    :    1.691 ± 0.000M/s
>>>>>> fmodret   :    1.654 ± 0.000M/s
>>>>>> user      :    1.253 ± 0.000M/s
>>>>>> fentry-on-cpu:    0.022 ± 0.011M/s
>>>>>> user-on-cpu:    0.315 ± 0.001M/s
>>>>>> 
>>>>> 
>>>>> Ok, so basically all of raw_tp,tp,kprobe,fentry/fexit are
>>>>> significantly faster than USER programs. Sure, when compared to
>>>>> uprobe, they are faster, but not when doing on-specific-CPU run, it
>>>>> seems (judging from this patch's description, if I'm reading it
>>>>> right). Anyways, speed argument shouldn't be a reason for doing this,
>>>>> IMO.
>>>>> 
>>>>>> The two "on-cpu" tests run the program on a different CPU (see the patch
>>>>>> at the end).
>>>>>> 
>>>>>> "user" is about 25% slower than "fentry". I think this is mostly because
>>>>>> getpgid() is a faster syscall than bpf(BPF_TEST_RUN).
>>>>> 
>>>>> Yes, probably.
>>>>> 
>>>>>> 
>>>>>> "user-on-cpu" is more than 10x faster than "fentry-on-cpu", because IPI
>>>>>> is way faster than moving the process (via sched_setaffinity).
>>>>> 
>>>>> I don't think that's a good comparison, because you are actually
>>>>> testing sched_setaffinity performance on each iteration vs IPI in the
>>>>> kernel, not a BPF overhead.
>>>>> 
>>>>> I think the fair comparison for this would be to create a thread and
>>>>> pin it on necessary CPU, and only then BPF program calls in a loop.
>>>>> But I bet any of existing program types would beat USER program.
>>>>> 
>>>>>> 
>>>>>> For use cases that we would like to call BPF program on specific CPU,
>>>>>> triggering it via IPI is a lot faster.
>>>>> 
>>>>> So these use cases would be nice to expand on in the motivational part
>>>>> of the patch set. It's not really emphasized and it's not at all clear
>>>>> what you are trying to achieve. It also seems, depending on latency
>>>>> requirements, it's totally possible to achieve comparable results by
>>>>> pre-creating a thread for each CPU, pinning each one to its designated
>>>>> CPU and then using any suitable user-space signaling mechanism (a
>>>>> queue, condvar, etc) to ask a thread to trigger BPF program (fentry on
>>>>> getpgid(), for instance).
>>>> 
>>>> I don't see why user space signal plus fentry would be faster than IPI.
>>>> If the target cpu is running something, this gonna add two context
>>>> switches.
>>>> 
>>> 
>>> I didn't say faster, did I? I said it would be comparable and wouldn't
>>> require a new program type.
>> 
>> Well, I don't think adding program type is that big a deal. If that is
>> really a problem, we can use a new attach type instead. The goal is to
>> trigger it with sys_bpf() on a different cpu. So we can call it kprobe
>> attach to nothing and hack that way. I add the new type because it makes
>> sense. The user just want to trigger a BPF program from user space.
> 
> I thought we already concluded that it's not really "trigger a BPF
> program from user space", because for that you have many existing and
> even faster options. After a few rounds of emails, it seems it's more
> about triggering the BPF program on another CPU without preempting
> whatever is running on that CPU. It would be helpful to be clear and
> upfront about the requirements.
> 
>> 
>>> But then again, without knowing all the
>>> details, it's a bit hard to discuss this. E.g., if you need to trigger
>>> that BPF program periodically, you can sleep in those per-CPU threads,
>>> or epoll, or whatever. Or maybe you can set up a per-CPU perf event
>>> that would trigger your program on the desired CPU, etc.My point is
>>> that I and others shouldn't be guessing this, I'd expect someone who's
>>> proposing an entire new BPF program type to motivate why this new
>>> program type is necessary and what problem it's solving that can't be
>>> solved with existing means.
>> 
>> Yes, there are other options. But they all come with non-trivial cost.
>> Per-CPU-per-process threads and/or per-CPU perf event are cost we have
>> to pay in production. IMO, these cost are much higher than a new program
>> type (or attach type).
>> 
> 
> So for threads I know the costs (a bit of memory for thread stack,
> plus some internal book keeping stuff in kernel), which is arguable
> how big of a deal is that if those threads do pretty much nothing most
> of the time. But what's the exact cost of perf events and why it's
> unacceptably high?

perf_events use limited hardware counters. We had real production issues
in the past when we run out of hardware counters. 

> 
> The reason I'm asking is that it seems to me that one alternative,
> which is more generic (and thus potentially more useful) would be to
> have a manually-triggerable perf event. Some sort of software event,
> that's triggered from ioctl or some other syscall, that's appropriate
> for perf subsystem. You'd pre-create a perf_event for each CPU,
> remember their FDs, then would trigger the one you need (corresponding
> to desired CPU). From the BPF side, you'd just use a normal perf_event
> program to handle perf_event activation. But it could be used even
> outside of the BPF ecosystem, which is a good sign for me, because it
> allows more flexible composition of building blocks.

This is an interesting idea. But I don't really see use cases outside of
BPF ecosystem. It is not easy to pass extra arguments into the perf_event. 
Well we can probably pass the arguments via maps, but that is not ideal. 
Also, each perf_event is 1kB+ memory. For a system with 100 cores, the 
perf_events may use 25x more memory (100kB) than the bpf program (4kB). 

> 
>>> 
>>> BTW, how frequently do you need to trigger the BPF program? Seems very
>>> frequently, if 2 vs 1 context switches might be a problem?
>> 
>> The whole solution requires two BPF programs. One on each context switch,
>> the other is the user program. The user program will not trigger very
>> often.
> 
> Ok, so performance was never an objective, I wonder why it is put as
> the main reason for this new type of BPF program?

User program (or user program triggered BPF program) could also bring 
free performance saving to USDT with semaphore. Basically, we can replace

	if (semaphore) {
		dummy_func(arg1, arg2, ...);
	}

with 

	if (semaphore) {
		bpf_test_run(arg1, arg2, ...);
	}

The first one uses uprobe, which is hundreds nanoseconds slower that the 
latter. 

> 
>>> 
>>>>> I bet in this case the  performance would be
>>>>> really nice for a lot of practical use cases. But then again, I don't
>>>>> know details of the intended use case, so please provide some more
>>>>> details.
>>>> 
>>>> Being able to trigger BPF program on a different CPU could enable many
>>>> use cases and optimizations. The use case I am looking at is to access
>>>> perf_event and percpu maps on the target CPU. For example:
>>>>       0. trigger the program
>>>>       1. read perf_event on cpu x;
>>>>       2. (optional) check which process is running on cpu x;
>>>>       3. add perf_event value to percpu map(s) on cpu x.
>>>> 
>>>> If we do these steps in a BPF program on cpu x, the cost is:
>>>>       A.0) trigger BPF via IPI;
>>>>       A.1) read perf_event locally;
>>>>       A.2) local access current;
>>>>       A.3) local access of percpu map(s).
>>>> 
>>>> If we can only do these on a different CPU, the cost will be:
>>>>       B.0) trigger BPF locally;
>>>>       B.1) read perf_event via IPI;
>>>>       B.2) remote access current on cpu x;
>>>>       B.3) remote access percpu map(s), or use non-percpu map(2).
>>>> 
>>>> Cost of (A.0 + A.1) is about same as (B.0 + B.1), maybe a little higher
>>>> (sys_bpf(), vs. sys_getpgid()). But A.2 and A.3 will be significantly
>>>> cheaper than B.2 and B.3.
>>>> 
>>>> Does this make sense?
>>> 
>>> It does, thanks. But what I was describing is still A, no? BPF program
>>> will be triggered on your desired cpu X, wouldn't it?
>> 
>> Well, that would be option C, but C could not do step 2, because we context
>> switch to the dedicated thread.
>> 
> 
> So I think *this* is a real requirement. No preemption of the running
> process on a different CPU. That does sound like what perf event does,
> doesn't it? See above.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
  2020-08-05 17:16               ` Alexei Starovoitov
  2020-08-05 17:27                 ` Andrii Nakryiko
@ 2020-08-05 18:56                 ` Song Liu
  2020-08-05 22:50                   ` Alexei Starovoitov
  1 sibling, 1 reply; 43+ messages in thread
From: Song Liu @ 2020-08-05 18:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, open list, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, john fastabend, KP Singh,
	Jesper Dangaard Brouer, Daniel Xu



> On Aug 5, 2020, at 10:16 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> On Wed, Aug 05, 2020 at 04:47:30AM +0000, Song Liu wrote:
>> 
>> Being able to trigger BPF program on a different CPU could enable many
>> use cases and optimizations. The use case I am looking at is to access
>> perf_event and percpu maps on the target CPU. For example:
>> 	0. trigger the program
>> 	1. read perf_event on cpu x;
>> 	2. (optional) check which process is running on cpu x;
>> 	3. add perf_event value to percpu map(s) on cpu x. 
> 
> If the whole thing is about doing the above then I don't understand why new
> prog type is needed.

I was under the (probably wrong) impression that adding prog type is not
that big a deal. 

> Can prog_test_run support existing BPF_PROG_TYPE_KPROBE?

I haven't looked into all the details, but I bet this is possible.

> "enable many use cases" sounds vague. I don't think folks reading
> the patches can guess those "use cases".
> "Testing existing kprobe bpf progs" would sound more convincing to me.
> If the test_run framework can be extended to trigger kprobe with correct pt_regs.
> As part of it test_run would trigger on a given cpu with $ip pointing
> to some test fuction in test_run.c. For local test_run the stack trace
> would include bpf syscall chain. For IPI the stack trace would include
> the corresponding kernel pieces where top is our special test function.
> Sort of like pseudo kprobe where there is no actual kprobe logic,
> since kprobe prog doesn't care about mechanism. It needs correct
> pt_regs only as input context.
> The kprobe prog output (return value) has special meaning though,
> so may be kprobe prog type is not a good fit.
> Something like fentry/fexit may be better, since verifier check_return_code()
> enforces 'return 0'. So their return value is effectively "void".
> Then prog_test_run would need to gain an ability to trigger
> fentry/fexit prog on a given cpu.

Maybe we add a new attach type for BPF_PROG_TYPE_TRACING, which is in 
parallel with BPF_TRACE_FENTRY and BPF_TRACE_EXIT? Say BPF_TRACE_USER? 
(Just realized I like this name :-D, it matches USDT...). Then we can 
enable test_run for most (if not all) tracing programs, including
fentry/fexit. 

Does this sound like a good plan?

Thanks,
Song




^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
  2020-08-05 18:56                 ` Song Liu
@ 2020-08-05 22:50                   ` Alexei Starovoitov
  2020-08-05 23:50                     ` Song Liu
  0 siblings, 1 reply; 43+ messages in thread
From: Alexei Starovoitov @ 2020-08-05 22:50 UTC (permalink / raw)
  To: Song Liu
  Cc: Andrii Nakryiko, open list, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, john fastabend, KP Singh,
	Jesper Dangaard Brouer, Daniel Xu

On Wed, Aug 05, 2020 at 06:56:26PM +0000, Song Liu wrote:
> 
> 
> > On Aug 5, 2020, at 10:16 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > 
> > On Wed, Aug 05, 2020 at 04:47:30AM +0000, Song Liu wrote:
> >> 
> >> Being able to trigger BPF program on a different CPU could enable many
> >> use cases and optimizations. The use case I am looking at is to access
> >> perf_event and percpu maps on the target CPU. For example:
> >> 	0. trigger the program
> >> 	1. read perf_event on cpu x;
> >> 	2. (optional) check which process is running on cpu x;
> >> 	3. add perf_event value to percpu map(s) on cpu x. 
> > 
> > If the whole thing is about doing the above then I don't understand why new
> > prog type is needed.
> 
> I was under the (probably wrong) impression that adding prog type is not
> that big a deal. 

Not a big deal when it's necessary.

> > Can prog_test_run support existing BPF_PROG_TYPE_KPROBE?
> 
> I haven't looked into all the details, but I bet this is possible.
> 
> > "enable many use cases" sounds vague. I don't think folks reading
> > the patches can guess those "use cases".
> > "Testing existing kprobe bpf progs" would sound more convincing to me.
> > If the test_run framework can be extended to trigger kprobe with correct pt_regs.
> > As part of it test_run would trigger on a given cpu with $ip pointing
> > to some test fuction in test_run.c. For local test_run the stack trace
> > would include bpf syscall chain. For IPI the stack trace would include
> > the corresponding kernel pieces where top is our special test function.
> > Sort of like pseudo kprobe where there is no actual kprobe logic,
> > since kprobe prog doesn't care about mechanism. It needs correct
> > pt_regs only as input context.
> > The kprobe prog output (return value) has special meaning though,
> > so may be kprobe prog type is not a good fit.
> > Something like fentry/fexit may be better, since verifier check_return_code()
> > enforces 'return 0'. So their return value is effectively "void".
> > Then prog_test_run would need to gain an ability to trigger
> > fentry/fexit prog on a given cpu.
> 
> Maybe we add a new attach type for BPF_PROG_TYPE_TRACING, which is in 
> parallel with BPF_TRACE_FENTRY and BPF_TRACE_EXIT? Say BPF_TRACE_USER? 
> (Just realized I like this name :-D, it matches USDT...). Then we can 
> enable test_run for most (if not all) tracing programs, including
> fentry/fexit. 

Why new hook? Why prog_test_run cmd cannot be made to work
BPF_PROG_TYPE_TRACING when it's loaded as BPF_TRACE_FENTRY and attach_btf_id
points to special test function?
The test_run cmd will trigger execution of that special function.
Devil is in details of course. How attach, trampoline, etc going to work
that all needs to be figured out. Parallel test_run cmd ideally shouldn't
affect each other, etc.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
  2020-08-05 22:50                   ` Alexei Starovoitov
@ 2020-08-05 23:50                     ` Song Liu
  0 siblings, 0 replies; 43+ messages in thread
From: Song Liu @ 2020-08-05 23:50 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, open list, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, john fastabend, KP Singh,
	Jesper Dangaard Brouer, Daniel Xu



> On Aug 5, 2020, at 3:50 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> On Wed, Aug 05, 2020 at 06:56:26PM +0000, Song Liu wrote:
>> 
>> 
>>> On Aug 5, 2020, at 10:16 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>> 
>>> On Wed, Aug 05, 2020 at 04:47:30AM +0000, Song Liu wrote:
>>>> 
>>>> Being able to trigger BPF program on a different CPU could enable many
>>>> use cases and optimizations. The use case I am looking at is to access
>>>> perf_event and percpu maps on the target CPU. For example:
>>>> 	0. trigger the program
>>>> 	1. read perf_event on cpu x;
>>>> 	2. (optional) check which process is running on cpu x;
>>>> 	3. add perf_event value to percpu map(s) on cpu x. 
>>> 
>>> If the whole thing is about doing the above then I don't understand why new
>>> prog type is needed.
>> 
>> I was under the (probably wrong) impression that adding prog type is not
>> that big a deal. 
> 
> Not a big deal when it's necessary.
> 
>>> Can prog_test_run support existing BPF_PROG_TYPE_KPROBE?
>> 
>> I haven't looked into all the details, but I bet this is possible.
>> 
>>> "enable many use cases" sounds vague. I don't think folks reading
>>> the patches can guess those "use cases".
>>> "Testing existing kprobe bpf progs" would sound more convincing to me.
>>> If the test_run framework can be extended to trigger kprobe with correct pt_regs.
>>> As part of it test_run would trigger on a given cpu with $ip pointing
>>> to some test fuction in test_run.c. For local test_run the stack trace
>>> would include bpf syscall chain. For IPI the stack trace would include
>>> the corresponding kernel pieces where top is our special test function.
>>> Sort of like pseudo kprobe where there is no actual kprobe logic,
>>> since kprobe prog doesn't care about mechanism. It needs correct
>>> pt_regs only as input context.
>>> The kprobe prog output (return value) has special meaning though,
>>> so may be kprobe prog type is not a good fit.
>>> Something like fentry/fexit may be better, since verifier check_return_code()
>>> enforces 'return 0'. So their return value is effectively "void".
>>> Then prog_test_run would need to gain an ability to trigger
>>> fentry/fexit prog on a given cpu.
>> 
>> Maybe we add a new attach type for BPF_PROG_TYPE_TRACING, which is in 
>> parallel with BPF_TRACE_FENTRY and BPF_TRACE_EXIT? Say BPF_TRACE_USER? 
>> (Just realized I like this name :-D, it matches USDT...). Then we can 
>> enable test_run for most (if not all) tracing programs, including
>> fentry/fexit. 
> 
> Why new hook? Why prog_test_run cmd cannot be made to work
> BPF_PROG_TYPE_TRACING when it's loaded as BPF_TRACE_FENTRY and attach_btf_id
> points to special test function?
> The test_run cmd will trigger execution of that special function.

I am not sure I am following 100%. IIUC, the special test function is a 
kernel function, and we attach fentry program to it. When multiple fentry
programs attach to the function, these programs will need proper filter
logic. 

Alternatively, if test_run just prepare the ctx and call BPF_PROG_RUN(), 
like in bpf_test_run(), we don't need the special test function. 

So I do think the new attach type requires new hook. It is just like
BPF_TRACE_FENTRY without valid attach_btf_id. Of course, we can reserve
a test function and use it for attach_btf_id. If test_run just calls
BPF_PROG_RUN(), we will probably never touch the test function. 

IMO, we are choosing from two options. 
1. FENTRY on special function. User will specify attach_btf_id on the
   special function. 
2. new attach type (BPF_TRACE_USER), that do not require attach_btf_id;
   and there is no need for a special function. 

I personally think #2 is cleaner API. But I have no objection if #1 is 
better in other means. 

Thanks,
Song



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH bpf-next 1/5] bpf: introduce BPF_PROG_TYPE_USER
  2020-08-01  8:47 ` [PATCH bpf-next 1/5] bpf: " Song Liu
  2020-08-01 13:58   ` kernel test robot
  2020-08-01 15:21   ` kernel test robot
@ 2020-08-06 18:18   ` kernel test robot
  2020-08-06 18:18   ` [RFC PATCH] bpf: user_verifier_ops can be static kernel test robot
  3 siblings, 0 replies; 43+ messages in thread
From: kernel test robot @ 2020-08-06 18:18 UTC (permalink / raw)
  To: Song Liu, linux-kernel, bpf, netdev
  Cc: kbuild-all, ast, daniel, kernel-team, john.fastabend, kpsingh,
	brouer, dlxu

[-- Attachment #1: Type: text/plain, Size: 1179 bytes --]

Hi Song,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Song-Liu/introduce-BPF_PROG_TYPE_USER/20200801-165208
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: x86_64-randconfig-s031-20200806 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.2-117-g8c7aee71-dirty
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

>> kernel/trace/bpf_trace.c:1855:31: sparse: sparse: symbol 'user_verifier_ops' was not declared. Should it be static?
>> kernel/trace/bpf_trace.c:1860:27: sparse: sparse: symbol 'user_prog_ops' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 39786 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC PATCH] bpf: user_verifier_ops can be static
  2020-08-01  8:47 ` [PATCH bpf-next 1/5] bpf: " Song Liu
                     ` (2 preceding siblings ...)
  2020-08-06 18:18   ` kernel test robot
@ 2020-08-06 18:18   ` kernel test robot
  3 siblings, 0 replies; 43+ messages in thread
From: kernel test robot @ 2020-08-06 18:18 UTC (permalink / raw)
  To: Song Liu, linux-kernel, bpf, netdev
  Cc: kbuild-all, ast, daniel, kernel-team, john.fastabend, kpsingh,
	brouer, dlxu


Signed-off-by: kernel test robot <lkp@intel.com>
---
 bpf_trace.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index cbe789bc1b986..4b8f380694a10 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1852,12 +1852,12 @@ user_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	}
 }
 
-const struct bpf_verifier_ops user_verifier_ops = {
+static const struct bpf_verifier_ops user_verifier_ops = {
 	.get_func_proto		= user_prog_func_proto,
 	.is_valid_access	= user_prog_is_valid_access,
 };
 
-const struct bpf_prog_ops user_prog_ops = {
+static const struct bpf_prog_ops user_prog_ops = {
 	.test_run	= bpf_prog_test_run_user,
 };
 

^ permalink raw reply related	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2020-08-06 18:42 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-01  8:47 [PATCH bpf-next 0/5] introduce BPF_PROG_TYPE_USER Song Liu
2020-08-01  8:47 ` [PATCH bpf-next 1/5] bpf: " Song Liu
2020-08-01 13:58   ` kernel test robot
2020-08-01 15:21   ` kernel test robot
2020-08-06 18:18   ` kernel test robot
2020-08-06 18:18   ` [RFC PATCH] bpf: user_verifier_ops can be static kernel test robot
2020-08-01  8:47 ` [PATCH bpf-next 2/5] libbpf: support BPF_PROG_TYPE_USER programs Song Liu
2020-08-03  1:40   ` Andrii Nakryiko
2020-08-03  4:21     ` Song Liu
2020-08-03  5:05       ` Andrii Nakryiko
2020-08-04  1:18     ` Song Liu
2020-08-05  1:38       ` Andrii Nakryiko
2020-08-05  3:59         ` Song Liu
2020-08-05  5:32           ` Andrii Nakryiko
2020-08-05  6:26             ` Song Liu
2020-08-05  6:54               ` Andrii Nakryiko
2020-08-05  7:23                 ` Song Liu
2020-08-05 17:44                   ` Andrii Nakryiko
2020-08-01  8:47 ` [PATCH bpf-next 3/5] selftests/bpf: add selftest for BPF_PROG_TYPE_USER Song Liu
2020-08-03  1:43   ` Andrii Nakryiko
2020-08-03  4:33     ` Song Liu
2020-08-03  5:07       ` Andrii Nakryiko
2020-08-01  8:47 ` [PATCH bpf-next 4/5] selftests/bpf: move two functions to test_progs.c Song Liu
2020-08-03  1:46   ` Andrii Nakryiko
2020-08-03  4:34     ` Song Liu
2020-08-01  8:47 ` [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog Song Liu
2020-08-03  1:51   ` Andrii Nakryiko
2020-08-03  4:47     ` Song Liu
2020-08-03  5:10       ` Andrii Nakryiko
2020-08-04 20:54         ` Song Liu
2020-08-05  1:52           ` Andrii Nakryiko
2020-08-05  4:47             ` Song Liu
2020-08-05  5:47               ` Andrii Nakryiko
2020-08-05  7:01                 ` Song Liu
2020-08-05 17:39                   ` Andrii Nakryiko
2020-08-05 18:41                     ` Song Liu
2020-08-05 17:16               ` Alexei Starovoitov
2020-08-05 17:27                 ` Andrii Nakryiko
2020-08-05 17:45                   ` Alexei Starovoitov
2020-08-05 17:56                     ` Andrii Nakryiko
2020-08-05 18:56                 ` Song Liu
2020-08-05 22:50                   ` Alexei Starovoitov
2020-08-05 23:50                     ` Song Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).