* [PATCH bpf-next 00/15] Support calling kernel function
@ 2021-03-16  1:13 Martin KaFai Lau
  2021-03-16  1:13 ` [PATCH bpf-next 01/15] bpf: Simplify freeing logic in linfo and jited_linfo Martin KaFai Lau
                   ` (14 more replies)
  0 siblings, 15 replies; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-16  1:13 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, netdev

This series adds support for bpf programs to call kernel functions.

The use case included in this set is to allow bpf-tcp-cc to directly
call some tcp-cc helper functions (e.g. "tcp_cong_avoid_ai()").  Those
functions have already been used by some kernel tcp-cc implementations.

This set will also allow the bpf-tcp-cc program to directly call the
kernel tcp-cc implementation.  For example, a bpf_dctcp may only want to
implement its own dctcp_cwnd_event() and reuse other dctcp_*() directly
from the kernel tcp_dctcp.c instead of reimplementing (or
copy-and-pasting) them.

The tcp-cc kernel functions mentioned above will be white listed
for the struct_ops bpf-tcp-cc programs to use in a later patch.
The white listed functions are not bound to a fixed ABI contract.
Those functions have already been used by the existing kernel tcp-cc.
If any of them changes, both in-tree and out-of-tree kernel tcp-cc
implementations have to be changed.  The same goes for the struct_ops
bpf-tcp-cc programs, which have to be adjusted accordingly.

Please see the individual patches for details.

Martin KaFai Lau (15):
  bpf: Simplify freeing logic in linfo and jited_linfo
  bpf: btf: Support parsing extern func
  bpf: Refactor btf_check_func_arg_match
  bpf: Support bpf program calling kernel function
  bpf: Support kernel function call in x86-32
  tcp: Rename bictcp function prefix to cubictcp
  bpf: tcp: White list some tcp cong functions to be called by
    bpf-tcp-cc
  libbpf: Refactor bpf_object__resolve_ksyms_btf_id
  libbpf: Refactor codes for finding btf id of a kernel symbol
  libbpf: Rename RELO_EXTERN to RELO_EXTERN_VAR
  libbpf: Record extern sym relocation first
  libbpf: Support extern kernel function
  bpf: selftests: Rename bictcp to bpf_cubic
  bpf: selftest: bpf_cubic and bpf_dctcp calling kernel functions
  bpf: selftest: Add kfunc_call test

 arch/x86/net/bpf_jit_comp.c                   |   5 +
 arch/x86/net/bpf_jit_comp32.c                 | 198 +++++++++
 include/linux/bpf.h                           |  24 ++
 include/linux/btf.h                           |   6 +
 include/linux/filter.h                        |   4 +-
 include/uapi/linux/bpf.h                      |   4 +
 kernel/bpf/btf.c                              | 270 ++++++++-----
 kernel/bpf/core.c                             |  47 +--
 kernel/bpf/disasm.c                           |  32 +-
 kernel/bpf/disasm.h                           |   3 +-
 kernel/bpf/syscall.c                          |   4 +-
 kernel/bpf/verifier.c                         | 380 ++++++++++++++++--
 net/bpf/test_run.c                            |  11 +
 net/core/filter.c                             |  11 +
 net/ipv4/bpf_tcp_ca.c                         |  41 ++
 net/ipv4/tcp_cubic.c                          |  24 +-
 tools/bpf/bpftool/xlated_dumper.c             |   3 +-
 tools/include/uapi/linux/bpf.h                |   4 +
 tools/lib/bpf/btf.c                           |  32 +-
 tools/lib/bpf/btf.h                           |   5 +
 tools/lib/bpf/libbpf.c                        | 316 ++++++++++-----
 tools/testing/selftests/bpf/bpf_tcp_helpers.h |  29 +-
 tools/testing/selftests/bpf/prog_tests/btf.c  | 154 ++++++-
 .../selftests/bpf/prog_tests/kfunc_call.c     |  61 +++
 tools/testing/selftests/bpf/progs/bpf_cubic.c |  36 +-
 tools/testing/selftests/bpf/progs/bpf_dctcp.c |  22 +-
 .../selftests/bpf/progs/kfunc_call_test.c     |  48 +++
 .../bpf/progs/kfunc_call_test_subprog.c       |  31 ++
 28 files changed, 1454 insertions(+), 351 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/kfunc_call.c
 create mode 100644 tools/testing/selftests/bpf/progs/kfunc_call_test.c
 create mode 100644 tools/testing/selftests/bpf/progs/kfunc_call_test_subprog.c

-- 
2.30.2



* [PATCH bpf-next 01/15] bpf: Simplify freeing logic in linfo and jited_linfo
  2021-03-16  1:13 [PATCH bpf-next 00/15] Support calling kernel function Martin KaFai Lau
@ 2021-03-16  1:13 ` Martin KaFai Lau
  2021-03-16  1:13 ` [PATCH bpf-next 02/15] bpf: btf: Support parsing extern func Martin KaFai Lau
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-16  1:13 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, netdev

This patch simplifies the linfo freeing logic by combining
"bpf_prog_free_jited_linfo()" and "bpf_prog_free_unused_jited_linfo()"
into the new "bpf_prog_jit_attempt_done()".
It is prep work for the kernel function call support.  In a later
patch, freeing the kernel function call descriptors will also
be done in "bpf_prog_jit_attempt_done()".

"bpf_prog_free_linfo()" is removed since it is only called by
"__bpf_prog_put_noref()".  kvfree() is called directly there
instead.

It also takes this chance to s/kcalloc/kvcalloc/ for the jited_linfo
allocation.
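
The merged condition can be modelled in user space.  The following is
a minimal sketch, not the kernel code: "struct mock_prog",
"mock_prog_new()" and "mock_jit_attempt_done()" are hypothetical
stand-ins, and free()/calloc() stand in for kvfree()/kvcalloc().  It
only illustrates that jited_linfo is dropped when the JIT attempt
failed or when the JIT never filled the first entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for struct bpf_prog and its aux. */
struct mock_prog {
	bool jited;		/* did the JIT attempt succeed? */
	void **jited_linfo;	/* one slot per line-info entry, or NULL */
};

/* Same condition as the new helper: free jited_linfo when the JIT
 * attempt failed, or when it succeeded but slot 0 was never filled.
 */
static void mock_jit_attempt_done(struct mock_prog *prog)
{
	if (prog->jited_linfo &&
	    (!prog->jited || !prog->jited_linfo[0])) {
		free(prog->jited_linfo);
		prog->jited_linfo = NULL;
	}
}

/* Test helper: allocate nr zeroed slots and optionally fill slot 0. */
static struct mock_prog mock_prog_new(size_t nr, bool jited, bool filled)
{
	static int marker;
	struct mock_prog p = { .jited = jited, .jited_linfo = NULL };

	p.jited_linfo = calloc(nr, sizeof(*p.jited_linfo));
	if (p.jited_linfo && filled)
		p.jited_linfo[0] = &marker;
	return p;
}
```

With this model, a failed JIT attempt and a successful attempt with an
empty first slot both end with jited_linfo == NULL, while a successful
attempt with a filled first slot keeps the array.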

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/linux/filter.h |  3 +--
 kernel/bpf/core.c      | 35 ++++++++++++-----------------------
 kernel/bpf/syscall.c   |  3 ++-
 kernel/bpf/verifier.c  |  4 ++--
 4 files changed, 17 insertions(+), 28 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index b2b85b2cad8e..0d9c710eb050 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -877,8 +877,7 @@ void bpf_prog_free_linfo(struct bpf_prog *prog);
 void bpf_prog_fill_jited_linfo(struct bpf_prog *prog,
 			       const u32 *insn_to_jit_off);
 int bpf_prog_alloc_jited_linfo(struct bpf_prog *prog);
-void bpf_prog_free_jited_linfo(struct bpf_prog *prog);
-void bpf_prog_free_unused_jited_linfo(struct bpf_prog *prog);
+void bpf_prog_jit_attempt_done(struct bpf_prog *prog);
 
 struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags);
 struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flags);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 3a283bf97f2f..4a6dd327446b 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -143,25 +143,22 @@ int bpf_prog_alloc_jited_linfo(struct bpf_prog *prog)
 	if (!prog->aux->nr_linfo || !prog->jit_requested)
 		return 0;
 
-	prog->aux->jited_linfo = kcalloc(prog->aux->nr_linfo,
-					 sizeof(*prog->aux->jited_linfo),
-					 GFP_KERNEL_ACCOUNT | __GFP_NOWARN);
+	prog->aux->jited_linfo = kvcalloc(prog->aux->nr_linfo,
+					  sizeof(*prog->aux->jited_linfo),
+					  GFP_KERNEL_ACCOUNT | __GFP_NOWARN);
 	if (!prog->aux->jited_linfo)
 		return -ENOMEM;
 
 	return 0;
 }
 
-void bpf_prog_free_jited_linfo(struct bpf_prog *prog)
+void bpf_prog_jit_attempt_done(struct bpf_prog *prog)
 {
-	kfree(prog->aux->jited_linfo);
-	prog->aux->jited_linfo = NULL;
-}
-
-void bpf_prog_free_unused_jited_linfo(struct bpf_prog *prog)
-{
-	if (prog->aux->jited_linfo && !prog->aux->jited_linfo[0])
-		bpf_prog_free_jited_linfo(prog);
+	if (prog->aux->jited_linfo &&
+	    (!prog->jited || !prog->aux->jited_linfo[0])) {
+		kvfree(prog->aux->jited_linfo);
+		prog->aux->jited_linfo = NULL;
+	}
 }
 
 /* The jit engine is responsible to provide an array
@@ -217,12 +214,6 @@ void bpf_prog_fill_jited_linfo(struct bpf_prog *prog,
 			insn_to_jit_off[linfo[i].insn_off - insn_start - 1];
 }
 
-void bpf_prog_free_linfo(struct bpf_prog *prog)
-{
-	bpf_prog_free_jited_linfo(prog);
-	kvfree(prog->aux->linfo);
-}
-
 struct bpf_prog *bpf_prog_realloc(struct bpf_prog *fp_old, unsigned int size,
 				  gfp_t gfp_extra_flags)
 {
@@ -1866,15 +1857,13 @@ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err)
 			return fp;
 
 		fp = bpf_int_jit_compile(fp);
-		if (!fp->jited) {
-			bpf_prog_free_jited_linfo(fp);
+		bpf_prog_jit_attempt_done(fp);
 #ifdef CONFIG_BPF_JIT_ALWAYS_ON
+		if (!fp->jited) {
 			*err = -ENOTSUPP;
 			return fp;
-#endif
-		} else {
-			bpf_prog_free_unused_jited_linfo(fp);
 		}
+#endif
 	} else {
 		*err = bpf_prog_offload_compile(fp);
 		if (*err)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c859bc46d06c..78a653e25df0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1689,7 +1689,8 @@ static void __bpf_prog_put_noref(struct bpf_prog *prog, bool deferred)
 {
 	bpf_prog_kallsyms_del_all(prog);
 	btf_put(prog->aux->btf);
-	bpf_prog_free_linfo(prog);
+	kvfree(prog->aux->jited_linfo);
+	kvfree(prog->aux->linfo);
 	if (prog->aux->attach_btf)
 		btf_put(prog->aux->attach_btf);
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f9096b049cd6..0647454a0c8e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11742,7 +11742,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 	prog->bpf_func = func[0]->bpf_func;
 	prog->aux->func = func;
 	prog->aux->func_cnt = env->subprog_cnt;
-	bpf_prog_free_unused_jited_linfo(prog);
+	bpf_prog_jit_attempt_done(prog);
 	return 0;
 out_free:
 	for (i = 0; i < env->subprog_cnt; i++) {
@@ -11765,7 +11765,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 		insn->off = 0;
 		insn->imm = env->insn_aux_data[i].call_imm;
 	}
-	bpf_prog_free_jited_linfo(prog);
+	bpf_prog_jit_attempt_done(prog);
 	return err;
 }
 
-- 
2.30.2



* [PATCH bpf-next 02/15] bpf: btf: Support parsing extern func
  2021-03-16  1:13 [PATCH bpf-next 00/15] Support calling kernel function Martin KaFai Lau
  2021-03-16  1:13 ` [PATCH bpf-next 01/15] bpf: Simplify freeing logic in linfo and jited_linfo Martin KaFai Lau
@ 2021-03-16  1:13 ` Martin KaFai Lau
  2021-03-18 22:53   ` Andrii Nakryiko
  2021-03-16  1:13 ` [PATCH bpf-next 03/15] bpf: Refactor btf_check_func_arg_match Martin KaFai Lau
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-16  1:13 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, netdev

This patch makes the BTF verifier accept extern funcs.  This is used
in a later patch to allow a bpf program to call a limited set of
kernel functions.

When writing a bpf prog, the extern kernel function needs to be
declared under an ELF section (".ksyms"), the same section used by
the current extern kernel variables.  That keeps the usage consistent
and avoids having to remember another section name.

For example, in a bpf_prog.c:

extern int foo(struct sock *) __attribute__((section(".ksyms")));

[24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
	'(anon)' type_id=18
[25] FUNC 'foo' type_id=24 linkage=extern
[ ... ]
[33] DATASEC '.ksyms' size=0 vlen=1
	type_id=25 offset=0 size=0

LLVM will put the "func" type into the BTF datasec ".ksyms".
The current "btf_datasec_check_meta()" assumes everything under
it is a "var" and ensures it has non-zero size ("!vsi->size" test).
The non-zero size check does not hold for "func".  This patch postpones
the "!vsi->size" test from "btf_datasec_check_meta()" to
"btf_datasec_resolve()", which has all types collected and can decide
whether a vsi is a "var" or a "func" and then enforce "vsi->size"
accordingly.

If the datasec only has "func", its "t->size" could be zero.
Thus, the current "!t->size" test is no longer valid.  An
invalid "t->size" will still be caught by the later
"last_vsi_end_off > t->size" check.  This patch also takes this
chance to consolidate the other "t->size" tests ("vsi->offset >= t->size",
"vsi->size > t->size", and "t->size < sum") into the existing
"last_vsi_end_off > t->size" test.
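
The resulting two-stage check can be sketched in user space.  This is
a simplified model under stated assumptions, not the kernel code:
"struct mock_vsi", "enum mock_kind" and "mock_datasec_ok()" are
hypothetical names, and the further per-var type checks done by
"btf_datasec_resolve()" are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kind of the type a vsi points to. */
enum mock_kind { MOCK_KIND_VAR, MOCK_KIND_FUNC, MOCK_KIND_OTHER };

/* Hypothetical stand-in for struct btf_var_secinfo plus resolved kind. */
struct mock_vsi {
	enum mock_kind kind;
	uint32_t offset;
	uint32_t size;
};

/* Returns true when every entry passes both stages. */
static bool mock_datasec_ok(const struct mock_vsi *vsi, uint32_t n,
			    uint32_t sec_size)
{
	uint64_t last_end = 0;
	uint32_t i;

	/* Stage 1 (check_meta): ordering and the single bounds test. */
	for (i = 0; i < n; i++) {
		if (vsi[i].offset < last_end)
			return false;
		last_end = (uint64_t)vsi[i].offset + vsi[i].size;
		if (last_end > sec_size)
			return false;
	}

	/* Stage 2 (resolve): size rules that depend on the kind. */
	for (i = 0; i < n; i++) {
		switch (vsi[i].kind) {
		case MOCK_KIND_FUNC:
			/* a func must have zero size and zero offset */
			if (vsi[i].size || vsi[i].offset)
				return false;
			break;
		case MOCK_KIND_VAR:
			/* a var must have a non-zero size */
			if (!vsi[i].size)
				return false;
			break;
		default:
			return false;
		}
	}
	return true;
}
```

A datasec holding only funcs passes with a section size of 0, which is
the case the removed "!t->size" test used to reject.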

LLVM will also emit each extern kernel function as a func with extern
linkage in the BTF:

[24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
	'(anon)' type_id=18
[25] FUNC 'foo' type_id=24 linkage=extern

This patch allows BTF_FUNC_EXTERN in btf_func_check_meta().
An extern kernel function declaration does not necessarily have
arg names, so btf_func_check() is also changed to allow an extern
function to have no arg names.

The btf selftest is adjusted accordingly.  New tests are also added.

The required LLVM patch: https://reviews.llvm.org/D93563

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 kernel/bpf/btf.c                             |  52 ++++---
 tools/testing/selftests/bpf/prog_tests/btf.c | 154 ++++++++++++++++++-
 2 files changed, 178 insertions(+), 28 deletions(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 369faeddf1df..96cd24020a38 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3439,7 +3439,7 @@ static s32 btf_func_check_meta(struct btf_verifier_env *env,
 		return -EINVAL;
 	}
 
-	if (btf_type_vlen(t) > BTF_FUNC_GLOBAL) {
+	if (btf_type_vlen(t) > BTF_FUNC_EXTERN) {
 		btf_verifier_log_type(env, t, "Invalid func linkage");
 		return -EINVAL;
 	}
@@ -3532,7 +3532,7 @@ static s32 btf_datasec_check_meta(struct btf_verifier_env *env,
 				  u32 meta_left)
 {
 	const struct btf_var_secinfo *vsi;
-	u64 last_vsi_end_off = 0, sum = 0;
+	u64 last_vsi_end_off = 0;
 	u32 i, meta_needed;
 
 	meta_needed = btf_type_vlen(t) * sizeof(*vsi);
@@ -3543,11 +3543,6 @@ static s32 btf_datasec_check_meta(struct btf_verifier_env *env,
 		return -EINVAL;
 	}
 
-	if (!t->size) {
-		btf_verifier_log_type(env, t, "size == 0");
-		return -EINVAL;
-	}
-
 	if (btf_type_kflag(t)) {
 		btf_verifier_log_type(env, t, "Invalid btf_info kind_flag");
 		return -EINVAL;
@@ -3569,19 +3564,13 @@ static s32 btf_datasec_check_meta(struct btf_verifier_env *env,
 			return -EINVAL;
 		}
 
-		if (vsi->offset < last_vsi_end_off || vsi->offset >= t->size) {
+		if (vsi->offset < last_vsi_end_off) {
 			btf_verifier_log_vsi(env, t, vsi,
 					     "Invalid offset");
 			return -EINVAL;
 		}
 
-		if (!vsi->size || vsi->size > t->size) {
-			btf_verifier_log_vsi(env, t, vsi,
-					     "Invalid size");
-			return -EINVAL;
-		}
-
-		last_vsi_end_off = vsi->offset + vsi->size;
+		last_vsi_end_off = (u64)vsi->offset + vsi->size;
 		if (last_vsi_end_off > t->size) {
 			btf_verifier_log_vsi(env, t, vsi,
 					     "Invalid offset+size");
@@ -3589,12 +3578,6 @@ static s32 btf_datasec_check_meta(struct btf_verifier_env *env,
 		}
 
 		btf_verifier_log_vsi(env, t, vsi, NULL);
-		sum += vsi->size;
-	}
-
-	if (t->size < sum) {
-		btf_verifier_log_type(env, t, "Invalid btf_info size");
-		return -EINVAL;
 	}
 
 	return meta_needed;
@@ -3611,9 +3594,28 @@ static int btf_datasec_resolve(struct btf_verifier_env *env,
 		u32 var_type_id = vsi->type, type_id, type_size = 0;
 		const struct btf_type *var_type = btf_type_by_id(env->btf,
 								 var_type_id);
-		if (!var_type || !btf_type_is_var(var_type)) {
+		if (!var_type) {
+			btf_verifier_log_vsi(env, v->t, vsi,
+					     "type not found");
+			return -EINVAL;
+		}
+
+		if (btf_type_is_func(var_type)) {
+			if (vsi->size || vsi->offset) {
+				btf_verifier_log_vsi(env, v->t, vsi,
+						     "Invalid size/offset");
+				return -EINVAL;
+			}
+			continue;
+		} else if (btf_type_is_var(var_type)) {
+			if (!vsi->size) {
+				btf_verifier_log_vsi(env, v->t, vsi,
+						     "Invalid size");
+				return -EINVAL;
+			}
+		} else {
 			btf_verifier_log_vsi(env, v->t, vsi,
-					     "Not a VAR kind member");
+					     "Neither a VAR nor a FUNC");
 			return -EINVAL;
 		}
 
@@ -3849,9 +3851,11 @@ static int btf_func_check(struct btf_verifier_env *env,
 	const struct btf_param *args;
 	const struct btf *btf;
 	u16 nr_args, i;
+	bool is_extern;
 
 	btf = env->btf;
 	proto_type = btf_type_by_id(btf, t->type);
+	is_extern = btf_type_vlen(t) == BTF_FUNC_EXTERN;
 
 	if (!proto_type || !btf_type_is_func_proto(proto_type)) {
 		btf_verifier_log_type(env, t, "Invalid type_id");
@@ -3861,7 +3865,7 @@ static int btf_func_check(struct btf_verifier_env *env,
 	args = (const struct btf_param *)(proto_type + 1);
 	nr_args = btf_type_vlen(proto_type);
 	for (i = 0; i < nr_args; i++) {
-		if (!args[i].name_off && args[i].type) {
+		if (!is_extern && !args[i].name_off && args[i].type) {
 			btf_verifier_log_type(env, t, "Invalid arg#%u", i + 1);
 			return -EINVAL;
 		}
diff --git a/tools/testing/selftests/bpf/prog_tests/btf.c b/tools/testing/selftests/bpf/prog_tests/btf.c
index 0457ae32b270..e469482833b2 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf.c
@@ -498,7 +498,7 @@ static struct btf_raw_test raw_tests[] = {
 	.value_type_id = 7,
 	.max_entries = 1,
 	.btf_load_err = true,
-	.err_str = "Invalid size",
+	.err_str = "Invalid offset+size",
 },
 {
 	.descr = "global data test #10, invalid var size",
@@ -696,7 +696,7 @@ static struct btf_raw_test raw_tests[] = {
 	.err_str = "Invalid offset",
 },
 {
-	.descr = "global data test #15, not var kind",
+	.descr = "global data test #15, not var/func kind",
 	.raw_types = {
 		/* int */
 		BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),	/* [1] */
@@ -716,7 +716,7 @@ static struct btf_raw_test raw_tests[] = {
 	.value_type_id = 3,
 	.max_entries = 1,
 	.btf_load_err = true,
-	.err_str = "Not a VAR kind member",
+	.err_str = "Neither a VAR nor a FUNC",
 },
 {
 	.descr = "global data test #16, invalid var referencing sec",
@@ -2803,7 +2803,7 @@ static struct btf_raw_test raw_tests[] = {
 			BTF_FUNC_PROTO_ARG_ENC(NAME_TBD, 1),
 			BTF_FUNC_PROTO_ARG_ENC(NAME_TBD, 2),
 		/* void func(int a, unsigned int b) */
-		BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 2), 3), 	/* [4] */
+		BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 3), 3), 	/* [4] */
 		BTF_END_RAW,
 	},
 	.str_sec = "\0a\0b\0func",
@@ -3531,6 +3531,152 @@ static struct btf_raw_test raw_tests[] = {
 	.max_entries = 1,
 },
 
+{
+	.descr = "datasec: func only",
+	.raw_types = {
+		/* int */
+		BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),	/* [1] */
+		/* void (*)(void) */
+		BTF_FUNC_PROTO_ENC(0, 0),		/* [2] */
+		BTF_FUNC_ENC(NAME_NTH(1), 2),		/* [3] */
+		BTF_FUNC_ENC(NAME_NTH(2), 2),		/* [4] */
+		/* .ksym section */
+		BTF_TYPE_ENC(NAME_NTH(3), BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 2), 0), /* [5] */
+		BTF_VAR_SECINFO_ENC(3, 0, 0),
+		BTF_VAR_SECINFO_ENC(4, 0, 0),
+		BTF_END_RAW,
+	},
+	BTF_STR_SEC("\0foo1\0foo2\0.ksym\0"),
+	.map_type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(int),
+	.value_size = sizeof(int),
+	.key_type_id = 1,
+	.value_type_id = 1,
+	.max_entries = 1,
+},
+
+{
+	.descr = "datasec: func and var",
+	.raw_types = {
+		/* int */
+		BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),	/* [1] */
+		/* void (*)(void) */
+		BTF_FUNC_PROTO_ENC(0, 0),		/* [2] */
+		BTF_FUNC_ENC(NAME_NTH(1), 2),		/* [3] */
+		BTF_FUNC_ENC(NAME_NTH(2), 2),		/* [4] */
+		/* int */
+		BTF_VAR_ENC(NAME_NTH(4), 1, 0),		/* [5] */
+		BTF_VAR_ENC(NAME_NTH(5), 1, 0),		/* [6] */
+		/* .ksym section */
+		BTF_TYPE_ENC(NAME_NTH(3), BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 4), 8), /* [7] */
+		BTF_VAR_SECINFO_ENC(3, 0, 0),
+		BTF_VAR_SECINFO_ENC(4, 0, 0),
+		BTF_VAR_SECINFO_ENC(5, 0, 4),
+		BTF_VAR_SECINFO_ENC(6, 4, 4),
+		BTF_END_RAW,
+	},
+	BTF_STR_SEC("\0foo1\0foo2\0.ksym\0a\0b\0"),
+	.map_type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(int),
+	.value_size = sizeof(int),
+	.key_type_id = 1,
+	.value_type_id = 1,
+	.max_entries = 1,
+},
+
+{
+	.descr = "datasec: func and var, invalid size/offset for func",
+	.raw_types = {
+		/* int */
+		BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),	/* [1] */
+		/* void (*)(void) */
+		BTF_FUNC_PROTO_ENC(0, 0),		/* [2] */
+		BTF_FUNC_ENC(NAME_NTH(1), 2),		/* [3] */
+		BTF_FUNC_ENC(NAME_NTH(2), 2),		/* [4] */
+		/* int */
+		BTF_VAR_ENC(NAME_NTH(4), 1, 0),		/* [5] */
+		BTF_VAR_ENC(NAME_NTH(5), 1, 0),		/* [6] */
+		/* .ksym section */
+		BTF_TYPE_ENC(NAME_NTH(3), BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 4), 8), /* [7] */
+		BTF_VAR_SECINFO_ENC(3, 0, 0),
+		BTF_VAR_SECINFO_ENC(5, 0, 4),
+		BTF_VAR_SECINFO_ENC(4, 4, 0),	/* func has non zero vsi->offset */
+		BTF_VAR_SECINFO_ENC(6, 4, 4),
+		BTF_END_RAW,
+	},
+	BTF_STR_SEC("\0foo1\0foo2\0.ksym\0a\0b\0"),
+	.map_type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(int),
+	.value_size = sizeof(int),
+	.key_type_id = 1,
+	.value_type_id = 1,
+	.max_entries = 1,
+	.btf_load_err = true,
+	.err_str = "Invalid size/offset",
+},
+
+{
+	.descr = "datasec: func and var, datasec size 0",
+	.raw_types = {
+		/* int */
+		BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),	/* [1] */
+		/* void (*)(void) */
+		BTF_FUNC_PROTO_ENC(0, 0),		/* [2] */
+		BTF_FUNC_ENC(NAME_NTH(1), 2),		/* [3] */
+		BTF_FUNC_ENC(NAME_NTH(2), 2),		/* [4] */
+		/* int */
+		BTF_VAR_ENC(NAME_NTH(4), 1, 0),		/* [5] */
+		BTF_VAR_ENC(NAME_NTH(5), 1, 0),		/* [6] */
+		/* .ksym section */
+		BTF_TYPE_ENC(NAME_NTH(3), BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 4), 0), /* [7] */
+		BTF_VAR_SECINFO_ENC(3, 0, 0),
+		BTF_VAR_SECINFO_ENC(4, 0, 0),
+		BTF_VAR_SECINFO_ENC(5, 0, 4),
+		BTF_VAR_SECINFO_ENC(6, 4, 4),
+		BTF_END_RAW,
+	},
+	BTF_STR_SEC("\0foo1\0foo2\0.ksym\0a\0b\0"),
+	.map_type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(int),
+	.value_size = sizeof(int),
+	.key_type_id = 1,
+	.value_type_id = 1,
+	.max_entries = 1,
+	.btf_load_err = true,
+	.err_str = "Invalid offset+size",
+},
+
+{
+	.descr = "datasec: func and var, zero vsi->size for var",
+	.raw_types = {
+		/* int */
+		BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4),	/* [1] */
+		/* void (*)(void) */
+		BTF_FUNC_PROTO_ENC(0, 0),		/* [2] */
+		BTF_FUNC_ENC(NAME_NTH(1), 2),		/* [3] */
+		BTF_FUNC_ENC(NAME_NTH(2), 2),		/* [4] */
+		/* int */
+		BTF_VAR_ENC(NAME_NTH(4), 1, 0),		/* [5] */
+		BTF_VAR_ENC(NAME_NTH(5), 1, 0),		/* [6] */
+		/* .ksym section */
+		BTF_TYPE_ENC(NAME_NTH(3), BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 4), 8), /* [7] */
+		BTF_VAR_SECINFO_ENC(3, 0, 0),
+		BTF_VAR_SECINFO_ENC(4, 0, 0),
+		BTF_VAR_SECINFO_ENC(5, 0, 0),	/* var has zero vsi->size */
+		BTF_VAR_SECINFO_ENC(6, 0, 4),
+		BTF_END_RAW,
+	},
+	BTF_STR_SEC("\0foo1\0foo2\0.ksym\0a\0b\0"),
+	.map_type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(int),
+	.value_size = sizeof(int),
+	.key_type_id = 1,
+	.value_type_id = 1,
+	.max_entries = 1,
+	.btf_load_err = true,
+	.err_str = "Invalid size",
+},
+
 {
 	.descr = "float test #1, well-formed",
 	.raw_types = {
-- 
2.30.2



* [PATCH bpf-next 03/15] bpf: Refactor btf_check_func_arg_match
  2021-03-16  1:13 [PATCH bpf-next 00/15] Support calling kernel function Martin KaFai Lau
  2021-03-16  1:13 ` [PATCH bpf-next 01/15] bpf: Simplify freeing logic in linfo and jited_linfo Martin KaFai Lau
  2021-03-16  1:13 ` [PATCH bpf-next 02/15] bpf: btf: Support parsing extern func Martin KaFai Lau
@ 2021-03-16  1:13 ` Martin KaFai Lau
  2021-03-18 23:32   ` Andrii Nakryiko
  2021-03-16  1:14 ` [PATCH bpf-next 04/15] bpf: Support bpf program calling kernel function Martin KaFai Lau
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-16  1:13 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, netdev

This patch refactors the core logic of "btf_check_func_arg_match()"
into a new function "do_btf_check_func_arg_match()".
"do_btf_check_func_arg_match()" will be reused later to check
the kernel function call.

The "if (!btf_type_is_ptr(t))" is checked first to improve the indentation
which will be useful for a later patch.

Some of the "btf_kind_str[]" usages are replaced with the shortcut
"btf_type_str(t)".
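
The shape of the refactor (a core check that only returns an error,
and a wrapper that owns the "unreliable" bookkeeping) can be sketched
in user space.  This is a simplified model, not the kernel code: all
"mock_"/"ARG_"/"REG_" names are hypothetical, and the real
"check_ctx_reg()"/"check_mem_reg()" calls are reduced to a plain
register type comparison.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical classification of one BTF argument type. */
enum mock_arg_kind {
	ARG_SCALAR,		/* int or enum */
	ARG_PTR_TO_CTX,		/* pointer to the program's ctx type */
	ARG_PTR_OTHER,		/* any other pointer */
	ARG_NOT_PTR,		/* neither scalar nor pointer */
};

/* Hypothetical subset of verifier register types. */
enum mock_reg_type { REG_SCALAR_VALUE, REG_PTR_TO_CTX, REG_PTR_TO_MEM };

/* Core check: no side effects, errors are returned directly. */
static int mock_do_check_arg(enum mock_arg_kind arg, enum mock_reg_type reg,
			     bool ptr_to_mem_ok)
{
	if (arg == ARG_SCALAR)
		return reg == REG_SCALAR_VALUE ? 0 : -EINVAL;
	if (arg == ARG_NOT_PTR)		/* checked early to keep nesting flat */
		return -EINVAL;
	if (arg == ARG_PTR_TO_CTX)
		return reg == REG_PTR_TO_CTX ? 0 : -EINVAL;
	if (ptr_to_mem_ok)		/* simplified check_mem_reg() */
		return reg == REG_PTR_TO_MEM ? 0 : -EINVAL;
	return -EINVAL;
}

/* Hypothetical stand-in for the per-subprog func_info_aux state. */
struct mock_subprog {
	bool is_global;
	bool unreliable;
};

/* Wrapper: passes is_global as ptr_to_mem_ok and marks the subprog
 * unreliable only when the core check reports -EINVAL.
 */
static int mock_check_arg(struct mock_subprog *sp, enum mock_arg_kind arg,
			  enum mock_reg_type reg)
{
	int err;

	if (sp->unreliable)
		return -EINVAL;

	err = mock_do_check_arg(arg, reg, sp->is_global);
	if (err == -EINVAL)
		sp->unreliable = true;
	return err;
}
```

Keeping the bookkeeping in the wrapper is what lets a later caller
reuse the core check without touching any subprog state.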

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/linux/btf.h |   5 ++
 kernel/bpf/btf.c    | 159 ++++++++++++++++++++++++--------------------
 2 files changed, 91 insertions(+), 73 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index 7fabf1428093..93bf2e5225f5 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -140,6 +140,11 @@ static inline bool btf_type_is_enum(const struct btf_type *t)
 	return BTF_INFO_KIND(t->info) == BTF_KIND_ENUM;
 }
 
+static inline bool btf_type_is_scalar(const struct btf_type *t)
+{
+	return btf_type_is_int(t) || btf_type_is_enum(t);
+}
+
 static inline bool btf_type_is_typedef(const struct btf_type *t)
 {
 	return BTF_INFO_KIND(t->info) == BTF_KIND_TYPEDEF;
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 96cd24020a38..529b94b601c6 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -4381,7 +4381,7 @@ static u8 bpf_ctx_convert_map[] = {
 #undef BPF_LINK_TYPE
 
 static const struct btf_member *
-btf_get_prog_ctx_type(struct bpf_verifier_log *log, struct btf *btf,
+btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
 		      const struct btf_type *t, enum bpf_prog_type prog_type,
 		      int arg)
 {
@@ -5366,122 +5366,135 @@ int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *pr
 	return btf_check_func_type_match(log, btf1, t1, btf2, t2);
 }
 
-/* Compare BTF of a function with given bpf_reg_state.
- * Returns:
- * EFAULT - there is a verifier bug. Abort verification.
- * EINVAL - there is a type mismatch or BTF is not available.
- * 0 - BTF matches with what bpf_reg_state expects.
- * Only PTR_TO_CTX and SCALAR_VALUE states are recognized.
- */
-int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog,
-			     struct bpf_reg_state *regs)
+static int do_btf_check_func_arg_match(struct bpf_verifier_env *env,
+				       const struct btf *btf, u32 func_id,
+				       struct bpf_reg_state *regs,
+				       bool ptr_to_mem_ok)
 {
 	struct bpf_verifier_log *log = &env->log;
-	struct bpf_prog *prog = env->prog;
-	struct btf *btf = prog->aux->btf;
-	const struct btf_param *args;
+	const char *func_name, *ref_tname;
 	const struct btf_type *t, *ref_t;
-	u32 i, nargs, btf_id, type_size;
-	const char *tname;
-	bool is_global;
-
-	if (!prog->aux->func_info)
-		return -EINVAL;
-
-	btf_id = prog->aux->func_info[subprog].type_id;
-	if (!btf_id)
-		return -EFAULT;
-
-	if (prog->aux->func_info_aux[subprog].unreliable)
-		return -EINVAL;
+	const struct btf_param *args;
+	u32 i, nargs;
 
-	t = btf_type_by_id(btf, btf_id);
+	t = btf_type_by_id(btf, func_id);
 	if (!t || !btf_type_is_func(t)) {
 		/* These checks were already done by the verifier while loading
 		 * struct bpf_func_info
 		 */
-		bpf_log(log, "BTF of func#%d doesn't point to KIND_FUNC\n",
-			subprog);
+		bpf_log(log, "BTF of func_id %u doesn't point to KIND_FUNC\n",
+			func_id);
 		return -EFAULT;
 	}
-	tname = btf_name_by_offset(btf, t->name_off);
+	func_name = btf_name_by_offset(btf, t->name_off);
 
 	t = btf_type_by_id(btf, t->type);
 	if (!t || !btf_type_is_func_proto(t)) {
-		bpf_log(log, "Invalid BTF of func %s\n", tname);
+		bpf_log(log, "Invalid BTF of func %s\n", func_name);
 		return -EFAULT;
 	}
 	args = (const struct btf_param *)(t + 1);
 	nargs = btf_type_vlen(t);
 	if (nargs > MAX_BPF_FUNC_REG_ARGS) {
-		bpf_log(log, "Function %s has %d > %d args\n", tname, nargs,
+		bpf_log(log, "Function %s has %d > %d args\n", func_name, nargs,
 			MAX_BPF_FUNC_REG_ARGS);
-		goto out;
+		return -EINVAL;
 	}
 
-	is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
 	/* check that BTF function arguments match actual types that the
 	 * verifier sees.
 	 */
 	for (i = 0; i < nargs; i++) {
-		struct bpf_reg_state *reg = &regs[i + 1];
+		u32 regno = i + 1;
+		struct bpf_reg_state *reg = &regs[regno];
 
-		t = btf_type_by_id(btf, args[i].type);
-		while (btf_type_is_modifier(t))
-			t = btf_type_by_id(btf, t->type);
-		if (btf_type_is_int(t) || btf_type_is_enum(t)) {
+		t = btf_type_skip_modifiers(btf, args[i].type, NULL);
+		if (btf_type_is_scalar(t)) {
 			if (reg->type == SCALAR_VALUE)
 				continue;
-			bpf_log(log, "R%d is not a scalar\n", i + 1);
-			goto out;
+			bpf_log(log, "R%d is not a scalar\n", regno);
+			return -EINVAL;
 		}
-		if (btf_type_is_ptr(t)) {
+
+		if (!btf_type_is_ptr(t)) {
+			bpf_log(log, "Unrecognized arg#%d type %s\n",
+				i, btf_type_str(t));
+			return -EINVAL;
+		}
+
+		ref_t = btf_type_skip_modifiers(btf, t->type, NULL);
+		ref_tname = btf_name_by_offset(btf, ref_t->name_off);
+		if (btf_get_prog_ctx_type(log, btf, t, env->prog->type, i)) {
 			/* If function expects ctx type in BTF check that caller
 			 * is passing PTR_TO_CTX.
 			 */
-			if (btf_get_prog_ctx_type(log, btf, t, prog->type, i)) {
-				if (reg->type != PTR_TO_CTX) {
-					bpf_log(log,
-						"arg#%d expected pointer to ctx, but got %s\n",
-						i, btf_kind_str[BTF_INFO_KIND(t->info)]);
-					goto out;
-				}
-				if (check_ctx_reg(env, reg, i + 1))
-					goto out;
-				continue;
+			if (reg->type != PTR_TO_CTX) {
+				bpf_log(log,
+					"arg#%d expected pointer to ctx, but got %s\n",
+					i, btf_type_str(t));
+				return -EINVAL;
 			}
+			if (check_ctx_reg(env, reg, regno))
+				return -EINVAL;
+		} else if (ptr_to_mem_ok) {
+			const struct btf_type *resolve_ret;
+			u32 type_size;
 
-			if (!is_global)
-				goto out;
-
-			t = btf_type_skip_modifiers(btf, t->type, NULL);
-
-			ref_t = btf_resolve_size(btf, t, &type_size);
-			if (IS_ERR(ref_t)) {
+			resolve_ret = btf_resolve_size(btf, ref_t, &type_size);
+			if (IS_ERR(resolve_ret)) {
 				bpf_log(log,
-				    "arg#%d reference type('%s %s') size cannot be determined: %ld\n",
-				    i, btf_type_str(t), btf_name_by_offset(btf, t->name_off),
-					PTR_ERR(ref_t));
-				goto out;
+					"arg#%d reference type('%s %s') size cannot be determined: %ld\n",
+					i, btf_type_str(ref_t), ref_tname,
+					PTR_ERR(resolve_ret));
+				return -EINVAL;
 			}
 
-			if (check_mem_reg(env, reg, i + 1, type_size))
-				goto out;
-
-			continue;
+			if (check_mem_reg(env, reg, regno, type_size))
+				return -EINVAL;
+		} else {
+			return -EINVAL;
 		}
-		bpf_log(log, "Unrecognized arg#%d type %s\n",
-			i, btf_kind_str[BTF_INFO_KIND(t->info)]);
-		goto out;
 	}
+
 	return 0;
-out:
+}
+
+/* Compare BTF of a function with given bpf_reg_state.
+ * Returns:
+ * EFAULT - there is a verifier bug. Abort verification.
+ * EINVAL - there is a type mismatch or BTF is not available.
+ * 0 - BTF matches with what bpf_reg_state expects.
+ * Only PTR_TO_CTX and SCALAR_VALUE states are recognized.
+ */
+int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog,
+			     struct bpf_reg_state *regs)
+{
+	struct bpf_prog *prog = env->prog;
+	struct btf *btf = prog->aux->btf;
+	bool is_global;
+	u32 btf_id;
+	int err;
+
+	if (!prog->aux->func_info)
+		return -EINVAL;
+
+	btf_id = prog->aux->func_info[subprog].type_id;
+	if (!btf_id)
+		return -EFAULT;
+
+	if (prog->aux->func_info_aux[subprog].unreliable)
+		return -EINVAL;
+
+	is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
+	err = do_btf_check_func_arg_match(env, btf, btf_id, regs, is_global);
+
 	/* Compiler optimizations can remove arguments from static functions
 	 * or mismatched type can be passed into a global function.
 	 * In such cases mark the function as unreliable from BTF point of view.
 	 */
-	prog->aux->func_info_aux[subprog].unreliable = true;
-	return -EINVAL;
+	if (err == -EINVAL)
+		prog->aux->func_info_aux[subprog].unreliable = true;
+	return err;
 }
 
 /* Convert BTF of a function into bpf_reg_state if possible
-- 
2.30.2



* [PATCH bpf-next 04/15] bpf: Support bpf program calling kernel function
  2021-03-16  1:13 [PATCH bpf-next 00/15] Support calling kernel function Martin KaFai Lau
                   ` (2 preceding siblings ...)
  2021-03-16  1:13 ` [PATCH bpf-next 03/15] bpf: Refactor btf_check_func_arg_match Martin KaFai Lau
@ 2021-03-16  1:14 ` Martin KaFai Lau
  2021-03-19  1:03   ` Andrii Nakryiko
  2021-03-16  1:14 ` [PATCH bpf-next 05/15] bpf: Support kernel function call in x86-32 Martin KaFai Lau
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-16  1:14 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, netdev

This patch adds support to the BPF verifier to allow a bpf program to
call a kernel function directly.

The use case included in this set is to allow bpf-tcp-cc to directly
call some tcp-cc helper functions (e.g. "tcp_cong_avoid_ai()").  Those
functions have already been used by some kernel tcp-cc implementations.

This set will also allow the bpf-tcp-cc program to directly call the
kernel tcp-cc implementation.  For example, a bpf_dctcp may only want to
implement its own dctcp_cwnd_event() and reuse other dctcp_*() directly
from the kernel tcp_dctcp.c instead of reimplementing (or
copy-and-pasting) them.

The tcp-cc kernel functions mentioned above will be white listed
for the struct_ops bpf-tcp-cc programs to use in a later patch.
The white listed functions are not bound to a fixed ABI contract.
Those functions have already been used by the existing kernel tcp-cc.
If any of them has changed, both in-tree and out-of-tree kernel tcp-cc
implementations have to be changed.  The same goes for the struct_ops
bpf-tcp-cc programs which have to be adjusted accordingly.

This patch is to make the required changes in the bpf verifier.

The first change is in btf.c: it adds a case to "do_btf_check_func_arg_match()".
When the passed-in btf is a kernel BTF (i.e. "btf_is_kernel(btf)" is true),
the verifier regs' states are matched against a kernel function.  This case
handles the PTR_TO_BTF_ID reg.  It also maps PTR_TO_SOCK_COMMON, PTR_TO_SOCKET,
and PTR_TO_TCP_SOCK to their kernel btf_ids.
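
As a rough illustration of the register-to-btf_id resolution described
above, here is a minimal userspace sketch.  It is not the kernel code:
the enum, the struct, the helper name and the btf_id values (11, 12, 13)
are all made up for the example.  Only the shape of the logic follows
the patch: use the reg's own btf_id for PTR_TO_BTF_ID, fall back to a
per-reg-type table, and reject everything else.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, reduced copy of the verifier's reg types. */
enum sketch_reg_type {
	SKETCH_SCALAR_VALUE,
	SKETCH_PTR_TO_BTF_ID,
	SKETCH_PTR_TO_SOCKET,
	SKETCH_PTR_TO_SOCK_COMMON,
	SKETCH_PTR_TO_TCP_SOCK,
	SKETCH_REG_TYPE_MAX,
};

struct sketch_reg {
	enum sketch_reg_type type;
	uint32_t btf_id;	/* only meaningful for SKETCH_PTR_TO_BTF_ID */
};

/* Made-up btf_ids standing in for sock, sock_common and tcp_sock. */
static uint32_t sketch_sock_ids[3] = { 11, 12, 13 };

/* Reg types without an entry stay NULL and are rejected below. */
static uint32_t *sketch_reg2btf_ids[SKETCH_REG_TYPE_MAX] = {
	[SKETCH_PTR_TO_SOCKET]      = &sketch_sock_ids[0],
	[SKETCH_PTR_TO_SOCK_COMMON] = &sketch_sock_ids[1],
	[SKETCH_PTR_TO_TCP_SOCK]    = &sketch_sock_ids[2],
};

/* Resolve the btf_id a register should be compared with.
 * Returns 0 and fills *id on success, -EINVAL otherwise.
 */
static int sketch_reg_btf_id(const struct sketch_reg *reg, uint32_t *id)
{
	assert(reg->type < SKETCH_REG_TYPE_MAX);

	if (reg->type == SKETCH_PTR_TO_BTF_ID) {
		*id = reg->btf_id;
		return 0;
	}
	if (sketch_reg2btf_ids[reg->type]) {
		*id = *sketch_reg2btf_ids[reg->type];
		return 0;
	}
	return -EINVAL;
}
```

The table stores pointers rather than values so that ids filled in after
the table is built are still picked up at lookup time; whether that
matters outside the kernel is an assumption of this sketch.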

In the later libbpf patch, the insn calling a kernel function will
look like:

insn->code == (BPF_JMP | BPF_CALL)
insn->src_reg == BPF_PSEUDO_KFUNC_CALL /* <- new in this patch */
insn->imm == func_btf_id /* btf_id of the running kernel */

[ For the future calling function-in-kernel-module support, an array
  of module btf_fds can be passed at the load time and insn->off
  can be used to index into this array. ]
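
For readers who want to see the encoding above in code, the following
is a small userspace sketch.  The struct mirrors the layout of
"struct bpf_insn" but is redefined here (with a sketch_ prefix) so the
example builds without kernel headers; the two helper functions are
hypothetical and are not part of this patch or of libbpf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BPF_JMP			0x05
#define SKETCH_BPF_CALL			0x80
#define SKETCH_BPF_PSEUDO_KFUNC_CALL	2	/* value added by this patch */

/* Local stand-in with the same field layout as struct bpf_insn. */
struct sketch_insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

/* Build a call insn whose imm carries the btf_id of a kernel function. */
static struct sketch_insn sketch_make_kfunc_call(int32_t func_btf_id)
{
	struct sketch_insn insn = {
		.code = SKETCH_BPF_JMP | SKETCH_BPF_CALL,
		.dst_reg = 0,
		.src_reg = SKETCH_BPF_PSEUDO_KFUNC_CALL,
		.off = 0,	/* reserved; see the module note above */
		.imm = func_btf_id,
	};

	return insn;
}

/* Same test as bpf_pseudo_kfunc_call() in the verifier changes. */
static bool sketch_is_kfunc_call(const struct sketch_insn *insn)
{
	assert(insn != NULL);
	return insn->code == (SKETCH_BPF_JMP | SKETCH_BPF_CALL) &&
	       insn->src_reg == SKETCH_BPF_PSEUDO_KFUNC_CALL;
}
```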

At an early stage, the verifier collects all kernel
function calls into "struct bpf_kern_func_descriptor".  Those
descriptors are stored in "prog->aux->kfunc_tab" and will
be available to the JIT.  Since this "add" operation is similar
to the current "add_subprog()" and looks for the same insn->code,
both are done together in the new "add_subprog_and_kern_func()".
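
The descriptor collection can be pictured with the userspace sketch
below.  It uses qsort()/bsearch() from libc in place of the kernel's
sort()/bsearch(), and all names (sketch_desc, sketch_tab, sketch_add,
sketch_find) are invented for the example.  One deliberate difference:
the comparator here avoids subtraction, so it does not depend on the
ids being small; the patch itself subtracts and relies on func_id not
exceeding BTF_MAX_TYPE.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_DESCS 256	/* same bound as MAX_KERN_FUNC_DESCS */

struct sketch_desc {
	uint32_t func_id;	/* btf_id of the kernel function */
	int32_t imm;		/* value later written into insn->imm */
};

struct sketch_tab {
	struct sketch_desc descs[SKETCH_MAX_DESCS];
	uint32_t nr_descs;
};

static int sketch_cmp_by_id(const void *a, const void *b)
{
	const struct sketch_desc *d0 = a, *d1 = b;

	return (d0->func_id > d1->func_id) - (d0->func_id < d1->func_id);
}

/* Binary search; the table must be sorted by func_id. */
static struct sketch_desc *sketch_find(struct sketch_tab *tab, uint32_t func_id)
{
	struct sketch_desc key = { .func_id = func_id };

	return bsearch(&key, tab->descs, tab->nr_descs,
		       sizeof(tab->descs[0]), sketch_cmp_by_id);
}

/* Add one descriptor per distinct func_id and keep the table sorted.
 * Returns 0 on success (or if already present), -1 if the table is full.
 */
static int sketch_add(struct sketch_tab *tab, uint32_t func_id, int32_t imm)
{
	struct sketch_desc *desc;

	assert(tab->nr_descs <= SKETCH_MAX_DESCS);
	if (sketch_find(tab, func_id))
		return 0;
	if (tab->nr_descs == SKETCH_MAX_DESCS)
		return -1;

	desc = &tab->descs[tab->nr_descs++];
	desc->func_id = func_id;
	desc->imm = imm;
	qsort(tab->descs, tab->nr_descs, sizeof(tab->descs[0]),
	      sketch_cmp_by_id);
	return 0;
}
```

Re-sorting on every insert is quadratic in the worst case, but with at
most 256 entries it keeps the lookup path simple.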

In the "do_check()" stage, the new "check_kern_func_call()" is added
to verify the kernel function call instruction:
1. Ensure the kernel function can be used by a particular BPF_PROG_TYPE.
   A new bpf_verifier_ops "check_kern_func_call" is added to do that.
   The bpf-tcp-cc struct_ops program will implement this function in
   a later patch.
2. Call "btf_check_kern_func_arg_match()" to ensure the regs can be
   used as the args of a kernel function.
3. Mark the regs' type, subreg_def, and zext_dst.
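
Step 1 above can be sketched in userspace as follows.  The ops struct
only copies the one callback added by this patch; the allowed btf_ids
(101, 205, 377) and the function names are hypothetical, and the real
bpf-tcp-cc callback arrives in a later patch and may be implemented
differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for struct bpf_verifier_ops. */
struct sketch_verifier_ops {
	bool (*check_kern_func_call)(uint32_t kfunc_btf_id);
};

/* Made-up btf_ids of the functions one prog type may call. */
static const uint32_t sketch_allowed_ids[] = { 101, 205, 377 };

/* Hypothetical per-prog-type callback: a plain allow-list scan. */
static bool sketch_tcp_cc_check(uint32_t kfunc_btf_id)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_allowed_ids) /
			sizeof(sketch_allowed_ids[0]); i++)
		if (sketch_allowed_ids[i] == kfunc_btf_id)
			return true;
	return false;
}

/* A prog type without the callback cannot call any kernel function. */
static int sketch_check_allowed(const struct sketch_verifier_ops *ops,
				uint32_t kfunc_btf_id)
{
	assert(ops != NULL);
	if (!ops->check_kern_func_call ||
	    !ops->check_kern_func_call(kfunc_btf_id))
		return -EACCES;
	return 0;
}
```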

At the later do_misc_fixups() stage, the new fixup_kern_func_call()
will replace the insn->imm with the function address (relative
to __bpf_call_base).  If needed, the jit can find the btf_func_model
by calling the new bpf_jit_find_kern_func_model(prog, insn).
With the imm set to the function address, "bpftool prog dump xlated"
will be able to display the kernel function calls the same way as
it displays other bpf helper calls.
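
The "address relative to __bpf_call_base" encoding can be illustrated
with the userspace sketch below.  Addresses are plain 64-bit integers
and both helper names are invented.  The explicit range check is an
addition of this sketch: it only makes visible that the offset has to
fit in the signed 32-bit imm field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode func_addr as a signed 32-bit offset from call_base.
 * Returns 0 on success, -1 if the offset does not fit in 32 bits.
 */
static int sketch_addr_to_imm(uint64_t func_addr, uint64_t call_base,
			      int32_t *imm)
{
	int64_t delta = (int64_t)(func_addr - call_base);

	assert(imm != NULL);
	if (delta < INT32_MIN || delta > INT32_MAX)
		return -1;
	*imm = (int32_t)delta;
	return 0;
}

/* Recover the absolute address, as a JIT or dumper would. */
static uint64_t sketch_imm_to_addr(int32_t imm, uint64_t call_base)
{
	return call_base + (uint64_t)(int64_t)imm;
}
```

Because helper calls use the same base-relative form, a tool that
already adds a call base to insn->imm can resolve both kinds of call
with the same arithmetic.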

A gpl_compatible program is required to call a kernel function.

This feature currently requires JIT.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 arch/x86/net/bpf_jit_comp.c       |   5 +
 include/linux/bpf.h               |  24 ++
 include/linux/btf.h               |   1 +
 include/linux/filter.h            |   1 +
 include/uapi/linux/bpf.h          |   4 +
 kernel/bpf/btf.c                  |  65 +++++-
 kernel/bpf/core.c                 |  18 +-
 kernel/bpf/disasm.c               |  32 +--
 kernel/bpf/disasm.h               |   3 +-
 kernel/bpf/syscall.c              |   1 +
 kernel/bpf/verifier.c             | 376 ++++++++++++++++++++++++++++--
 tools/bpf/bpftool/xlated_dumper.c |   3 +-
 tools/include/uapi/linux/bpf.h    |   4 +
 13 files changed, 488 insertions(+), 49 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 6926d0ca6c71..bcb957234410 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -2327,3 +2327,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 					   tmp : orig_prog);
 	return prog;
 }
+
+bool bpf_jit_supports_kfunc_call(void)
+{
+	return true;
+}
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a25730eaa148..75ab8dc02df5 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -426,6 +426,7 @@ enum bpf_reg_type {
 	PTR_TO_PERCPU_BTF_ID,	 /* reg points to a percpu kernel variable */
 	PTR_TO_FUNC,		 /* reg points to a bpf program function */
 	PTR_TO_MAP_KEY,		 /* reg points to a map element key */
+	__BPF_REG_TYPE_MAX,
 };
 
 /* The information passed from prog-specific *_is_valid_access
@@ -479,6 +480,7 @@ struct bpf_verifier_ops {
 				 const struct btf_type *t, int off, int size,
 				 enum bpf_access_type atype,
 				 u32 *next_btf_id);
+	bool (*check_kern_func_call)(u32 kfunc_btf_id);
 };
 
 struct bpf_prog_offload_ops {
@@ -779,6 +781,8 @@ struct btf_mod_pair {
 	struct module *module;
 };
 
+struct bpf_kern_func_desc_tab;
+
 struct bpf_prog_aux {
 	atomic64_t refcnt;
 	u32 used_map_cnt;
@@ -816,6 +820,7 @@ struct bpf_prog_aux {
 	struct bpf_prog **func;
 	void *jit_data; /* JIT specific data. arch dependent */
 	struct bpf_jit_poke_descriptor *poke_tab;
+	struct bpf_kern_func_desc_tab *kfunc_tab;
 	u32 size_poke_tab;
 	struct bpf_ksym ksym;
 	const struct bpf_prog_ops *ops;
@@ -1514,6 +1519,9 @@ int btf_distill_func_proto(struct bpf_verifier_log *log,
 			   struct btf_func_model *m);
 
 struct bpf_reg_state;
+int btf_check_kern_func_arg_match(struct bpf_verifier_env *env,
+				  const struct btf *btf, u32 func_id,
+				  struct bpf_reg_state *regs);
 int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog,
 			     struct bpf_reg_state *regs);
 int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
@@ -1526,6 +1534,10 @@ struct bpf_link *bpf_link_by_id(u32 id);
 
 const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
 void bpf_task_storage_free(struct task_struct *task);
+bool bpf_prog_has_kern_func_call(const struct bpf_prog *prog);
+const struct btf_func_model *
+bpf_jit_find_kern_func_model(const struct bpf_prog *prog,
+			      const struct bpf_insn *insn);
 #else /* !CONFIG_BPF_SYSCALL */
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 {
@@ -1706,6 +1718,18 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 static inline void bpf_task_storage_free(struct task_struct *task)
 {
 }
+
+static inline bool bpf_prog_has_kern_func_call(const struct bpf_prog *prog)
+{
+	return false;
+}
+
+static inline const struct btf_func_model *
+bpf_jit_find_kern_func_model(const struct bpf_prog *prog,
+			     const struct bpf_insn *insn)
+{
+	return NULL;
+}
 #endif /* CONFIG_BPF_SYSCALL */
 
 void __bpf_free_used_btfs(struct bpf_prog_aux *aux,
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 93bf2e5225f5..8f6ea0d4d8a1 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -109,6 +109,7 @@ const struct btf_type *btf_type_resolve_func_ptr(const struct btf *btf,
 const struct btf_type *
 btf_resolve_size(const struct btf *btf, const struct btf_type *type,
 		 u32 *type_size);
+const char *btf_type_str(const struct btf_type *t);
 
 #define for_each_member(i, struct_type, member)			\
 	for (i = 0, member = btf_type_member(struct_type);	\
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 0d9c710eb050..eecfd82db648 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -918,6 +918,7 @@ u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
 void bpf_jit_compile(struct bpf_prog *prog);
 bool bpf_jit_needs_zext(void);
+bool bpf_jit_supports_kfunc_call(void);
 bool bpf_helper_changes_pkt_data(void *func);
 
 static inline bool bpf_dump_raw_ok(const struct cred *cred)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2d3036e292a9..ab9f2233607c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1117,6 +1117,10 @@ enum bpf_link_type {
  * offset to another bpf function
  */
 #define BPF_PSEUDO_CALL		1
+/* when bpf_call->src_reg == BPF_PSEUDO_KFUNC_CALL,
+ * bpf_call->imm == btf_id of a BTF_KIND_FUNC in the running kernel
+ */
+#define BPF_PSEUDO_KFUNC_CALL	2
 
 /* flags for BPF_MAP_UPDATE_ELEM command */
 enum {
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 529b94b601c6..ba77fdbe8cda 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -283,7 +283,7 @@ static const char * const btf_kind_str[NR_BTF_KINDS] = {
 	[BTF_KIND_FLOAT]	= "FLOAT",
 };
 
-static const char *btf_type_str(const struct btf_type *t)
+const char *btf_type_str(const struct btf_type *t)
 {
 	return btf_kind_str[BTF_INFO_KIND(t->info)];
 }
@@ -5366,6 +5366,14 @@ int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *pr
 	return btf_check_func_type_match(log, btf1, t1, btf2, t2);
 }
 
+static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = {
+#ifdef CONFIG_NET
+	[PTR_TO_SOCKET] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK],
+	[PTR_TO_SOCK_COMMON] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],
+	[PTR_TO_TCP_SOCK] = &btf_sock_ids[BTF_SOCK_TYPE_TCP],
+#endif
+};
+
 static int do_btf_check_func_arg_match(struct bpf_verifier_env *env,
 				       const struct btf *btf, u32 func_id,
 				       struct bpf_reg_state *regs,
@@ -5375,12 +5383,12 @@ static int do_btf_check_func_arg_match(struct bpf_verifier_env *env,
 	const char *func_name, *ref_tname;
 	const struct btf_type *t, *ref_t;
 	const struct btf_param *args;
-	u32 i, nargs;
+	u32 i, nargs, ref_id;
 
 	t = btf_type_by_id(btf, func_id);
 	if (!t || !btf_type_is_func(t)) {
 		/* These checks were already done by the verifier while loading
-		 * struct bpf_func_info
+		 * struct bpf_func_info or in add_kern_func_call().
 		 */
 		bpf_log(log, "BTF of func_id %u doesn't point to KIND_FUNC\n",
 			func_id);
@@ -5422,9 +5430,49 @@ static int do_btf_check_func_arg_match(struct bpf_verifier_env *env,
 			return -EINVAL;
 		}
 
-		ref_t = btf_type_skip_modifiers(btf, t->type, NULL);
+		ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
 		ref_tname = btf_name_by_offset(btf, ref_t->name_off);
-		if (btf_get_prog_ctx_type(log, btf, t, env->prog->type, i)) {
+		if (btf_is_kernel(btf)) {
+			const struct btf_type *reg_ref_t;
+			const struct btf *reg_btf;
+			const char *reg_ref_tname;
+			u32 reg_ref_id;
+
+			if (!btf_type_is_struct(ref_t)) {
+				bpf_log(log, "kernel function %s args#%d pointer type %s %s is not supported\n",
+					func_name, i, btf_type_str(ref_t),
+					ref_tname);
+				return -EINVAL;
+			}
+
+			if (reg->type == PTR_TO_BTF_ID) {
+				reg_btf = reg->btf;
+				reg_ref_id = reg->btf_id;
+			} else if (reg2btf_ids[reg->type]) {
+				reg_btf = btf_vmlinux;
+				reg_ref_id = *reg2btf_ids[reg->type];
+			} else {
+				bpf_log(log, "kernel function %s args#%d expected pointer to %s %s but R%d is not a pointer to btf_id\n",
+					func_name, i,
+					btf_type_str(ref_t), ref_tname, regno);
+				return -EINVAL;
+			}
+
+			reg_ref_t = btf_type_skip_modifiers(reg_btf, reg_ref_id,
+							    &reg_ref_id);
+			reg_ref_tname = btf_name_by_offset(reg_btf,
+							   reg_ref_t->name_off);
+			if (!btf_struct_ids_match(log, reg_btf, reg_ref_id,
+						  reg->off, btf, ref_id)) {
+				bpf_log(log, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n",
+					func_name, i,
+					btf_type_str(ref_t), ref_tname,
+					regno, btf_type_str(reg_ref_t),
+					reg_ref_tname);
+				return -EINVAL;
+			}
+		} else if (btf_get_prog_ctx_type(log, btf, t,
+						 env->prog->type, i)) {
 			/* If function expects ctx type in BTF check that caller
 			 * is passing PTR_TO_CTX.
 			 */
@@ -5497,6 +5545,13 @@ int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog,
 	return err;
 }
 
+int btf_check_kern_func_arg_match(struct bpf_verifier_env *env,
+				  const struct btf *btf, u32 func_id,
+				  struct bpf_reg_state *regs)
+{
+	return do_btf_check_func_arg_match(env, btf, func_id, regs, false);
+}
+
 /* Convert BTF of a function into bpf_reg_state if possible
  * Returns:
  * EFAULT - there is a verifier bug. Abort verification.
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 4a6dd327446b..bd20683cb810 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -159,6 +159,9 @@ void bpf_prog_jit_attempt_done(struct bpf_prog *prog)
 		kvfree(prog->aux->jited_linfo);
 		prog->aux->jited_linfo = NULL;
 	}
+
+	kfree(prog->aux->kfunc_tab);
+	prog->aux->kfunc_tab = NULL;
 }
 
 /* The jit engine is responsible to provide an array
@@ -1840,9 +1843,15 @@ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err)
 	/* In case of BPF to BPF calls, verifier did all the prep
 	 * work with regards to JITing, etc.
 	 */
+	bool jit_needed = false;
+
 	if (fp->bpf_func)
 		goto finalize;
 
+	if (IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) ||
+	    bpf_prog_has_kern_func_call(fp))
+		jit_needed = true;
+
 	bpf_prog_select_func(fp);
 
 	/* eBPF JITs can rewrite the program in case constant
@@ -1858,12 +1867,10 @@ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err)
 
 		fp = bpf_int_jit_compile(fp);
 		bpf_prog_jit_attempt_done(fp);
-#ifdef CONFIG_BPF_JIT_ALWAYS_ON
-		if (!fp->jited) {
+		if (!fp->jited && jit_needed) {
 			*err = -ENOTSUPP;
 			return fp;
 		}
-#endif
 	} else {
 		*err = bpf_prog_offload_compile(fp);
 		if (*err)
@@ -2343,6 +2350,11 @@ bool __weak bpf_jit_needs_zext(void)
 	return false;
 }
 
+bool __weak bpf_jit_supports_kfunc_call(void)
+{
+	return false;
+}
+
 /* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call
  * skb_copy_bits(), so provide a weak definition of it for NET-less config.
  */
diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
index 3acc7e0b6916..9b476b9ead03 100644
--- a/kernel/bpf/disasm.c
+++ b/kernel/bpf/disasm.c
@@ -19,16 +19,25 @@ static const char *__func_get_name(const struct bpf_insn_cbs *cbs,
 {
 	BUILD_BUG_ON(ARRAY_SIZE(func_id_str) != __BPF_FUNC_MAX_ID);
 
-	if (insn->src_reg != BPF_PSEUDO_CALL &&
+	if (!insn->src_reg &&
 	    insn->imm >= 0 && insn->imm < __BPF_FUNC_MAX_ID &&
 	    func_id_str[insn->imm])
 		return func_id_str[insn->imm];
 
-	if (cbs && cbs->cb_call)
-		return cbs->cb_call(cbs->private_data, insn);
+	if (cbs && cbs->cb_call) {
+		const char *res;
+
+		res = cbs->cb_call(cbs->private_data, insn, buff, len);
+		if (res)
+			return res;
+	}
 
 	if (insn->src_reg == BPF_PSEUDO_CALL)
-		snprintf(buff, len, "%+d", insn->imm);
+		snprintf(buff, len, "pc%+d", insn->imm);
+	else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL)
+		snprintf(buff, len, "kfunc#%d", insn->imm);
+	else
+		snprintf(buff, len, "unknown#%d", insn->imm);
 
 	return buff;
 }
@@ -255,18 +264,9 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
 		if (opcode == BPF_CALL) {
 			char tmp[64];
 
-			if (insn->src_reg == BPF_PSEUDO_CALL) {
-				verbose(cbs->private_data, "(%02x) call pc%s\n",
-					insn->code,
-					__func_get_name(cbs, insn,
-							tmp, sizeof(tmp)));
-			} else {
-				strcpy(tmp, "unknown");
-				verbose(cbs->private_data, "(%02x) call %s#%d\n", insn->code,
-					__func_get_name(cbs, insn,
-							tmp, sizeof(tmp)),
-					insn->imm);
-			}
+			verbose(cbs->private_data,
+				"(%02x) call %s\n", insn->code,
+				__func_get_name(cbs, insn, tmp, sizeof(tmp)));
 		} else if (insn->code == (BPF_JMP | BPF_JA)) {
 			verbose(cbs->private_data, "(%02x) goto pc%+d\n",
 				insn->code, insn->off);
diff --git a/kernel/bpf/disasm.h b/kernel/bpf/disasm.h
index e546b18d27da..60f9d87f9316 100644
--- a/kernel/bpf/disasm.h
+++ b/kernel/bpf/disasm.h
@@ -22,7 +22,8 @@ const char *func_id_name(int id);
 typedef __printf(2, 3) void (*bpf_insn_print_t)(void *private_data,
 						const char *, ...);
 typedef const char *(*bpf_insn_revmap_call_t)(void *private_data,
-					      const struct bpf_insn *insn);
+					      const struct bpf_insn *insn,
+					      char *buf, size_t len);
 typedef const char *(*bpf_insn_print_imm_t)(void *private_data,
 					    const struct bpf_insn *insn,
 					    __u64 full_imm);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 78a653e25df0..43ce565f017d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1691,6 +1691,7 @@ static void __bpf_prog_put_noref(struct bpf_prog *prog, bool deferred)
 	btf_put(prog->aux->btf);
 	kvfree(prog->aux->jited_linfo);
 	kvfree(prog->aux->linfo);
+	kfree(prog->aux->kfunc_tab);
 	if (prog->aux->attach_btf)
 		btf_put(prog->aux->attach_btf);
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0647454a0c8e..70e5d54a4115 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -234,6 +234,12 @@ static bool bpf_pseudo_call(const struct bpf_insn *insn)
 	       insn->src_reg == BPF_PSEUDO_CALL;
 }
 
+static bool bpf_pseudo_kfunc_call(const struct bpf_insn *insn)
+{
+	return insn->code == (BPF_JMP | BPF_CALL) &&
+	       insn->src_reg == BPF_PSEUDO_KFUNC_CALL;
+}
+
 static bool bpf_pseudo_func(const struct bpf_insn *insn)
 {
 	return insn->code == (BPF_LD | BPF_IMM | BPF_DW) &&
@@ -1554,47 +1560,203 @@ static int add_subprog(struct bpf_verifier_env *env, int off)
 		verbose(env, "too many subprograms\n");
 		return -E2BIG;
 	}
+	/* determine subprog starts. The end is one before the next starts */
 	env->subprog_info[env->subprog_cnt++].start = off;
 	sort(env->subprog_info, env->subprog_cnt,
 	     sizeof(env->subprog_info[0]), cmp_subprogs, NULL);
 	return env->subprog_cnt - 1;
 }
 
-static int check_subprogs(struct bpf_verifier_env *env)
+struct bpf_kern_func_descriptor {
+	struct btf_func_model func_model;
+	u32 func_id;
+	s32 imm;
+};
+
+#define MAX_KERN_FUNC_DESCS 256
+struct bpf_kern_func_desc_tab {
+	struct bpf_kern_func_descriptor descs[MAX_KERN_FUNC_DESCS];
+	u32 nr_descs;
+};
+
+static int kern_func_desc_cmp_by_id(const void *a, const void *b)
+{
+	const struct bpf_kern_func_descriptor *d0 = a;
+	const struct bpf_kern_func_descriptor *d1 = b;
+
+	/* func_id is not greater than BTF_MAX_TYPE */
+	return d0->func_id - d1->func_id;
+}
+
+static const struct bpf_kern_func_descriptor *
+find_kern_func_desc(const struct bpf_prog *prog, u32 func_id)
+{
+	struct bpf_kern_func_descriptor desc = {
+		.func_id = func_id,
+	};
+	struct bpf_kern_func_desc_tab *tab;
+
+	tab = prog->aux->kfunc_tab;
+	return bsearch(&desc, tab->descs, tab->nr_descs,
+		       sizeof(tab->descs[0]), kern_func_desc_cmp_by_id);
+}
+
+static int add_kern_func_call(struct bpf_verifier_env *env, u32 func_id)
+{
+	const struct btf_type *func, *func_proto;
+	struct bpf_kern_func_descriptor *desc;
+	struct bpf_kern_func_desc_tab *tab;
+	struct bpf_prog_aux *prog_aux;
+	const char *func_name;
+	unsigned long addr;
+
+	prog_aux = env->prog->aux;
+	tab = prog_aux->kfunc_tab;
+	if (!tab) {
+		if (!btf_vmlinux) {
+			verbose(env, "calling kernel function is not supported without CONFIG_DEBUG_INFO_BTF\n");
+			return -ENOTSUPP;
+		}
+
+		if (!env->prog->jit_requested) {
+			verbose(env, "JIT is required for calling kernel function\n");
+			return -ENOTSUPP;
+		}
+
+		if (!bpf_jit_supports_kfunc_call()) {
+			verbose(env, "JIT does not support calling kernel function\n");
+			return -ENOTSUPP;
+		}
+
+		if (!env->prog->gpl_compatible) {
+			verbose(env, "cannot call kernel function from non-GPL compatible program\n");
+			return -EINVAL;
+		}
+
+		tab = kzalloc(sizeof(*tab), GFP_KERNEL);
+		if (!tab)
+			return -ENOMEM;
+		prog_aux->kfunc_tab = tab;
+	}
+
+	if (find_kern_func_desc(env->prog, func_id))
+		return 0;
+
+	if (tab->nr_descs == MAX_KERN_FUNC_DESCS) {
+		verbose(env, "too many different kernel function calls\n");
+		return -E2BIG;
+	}
+
+	func = btf_type_by_id(btf_vmlinux, func_id);
+	if (!func || !btf_type_is_func(func)) {
+		verbose(env, "kernel btf_id %u is not a function\n",
+			func_id);
+		return -EINVAL;
+	}
+	func_proto = btf_type_by_id(btf_vmlinux, func->type);
+	if (!func_proto || !btf_type_is_func_proto(func_proto)) {
+		verbose(env, "kernel function btf_id %u does not have a valid func_proto\n",
+			func_id);
+		return -EINVAL;
+	}
+
+	func_name = btf_name_by_offset(btf_vmlinux, func->name_off);
+	addr = kallsyms_lookup_name(func_name);
+	if (!addr) {
+		verbose(env, "cannot find address for kernel function %s\n",
+			func_name);
+		return -EINVAL;
+	}
+
+	desc = &tab->descs[tab->nr_descs++];
+	desc->func_id = func_id;
+	desc->imm = BPF_CAST_CALL(addr) - __bpf_call_base;
+	sort(tab->descs, tab->nr_descs, sizeof(tab->descs[0]),
+	     kern_func_desc_cmp_by_id, NULL);
+
+	return btf_distill_func_proto(&env->log, btf_vmlinux,
+				      func_proto, func_name,
+				      &desc->func_model);
+}
+
+static int kern_func_desc_cmp_by_imm(const void *a, const void *b)
+{
+	const struct bpf_kern_func_descriptor *d0 = a;
+	const struct bpf_kern_func_descriptor *d1 = b;
+
+	return d0->imm - d1->imm;
+}
+
+static void sort_kern_func_descs_by_imm(struct bpf_prog *prog)
+{
+	struct bpf_kern_func_desc_tab *tab;
+
+	tab = prog->aux->kfunc_tab;
+	if (!tab)
+		return;
+
+	sort(tab->descs, tab->nr_descs, sizeof(tab->descs[0]),
+	     kern_func_desc_cmp_by_imm, NULL);
+}
+
+bool bpf_prog_has_kern_func_call(const struct bpf_prog *prog)
+{
+	return !!prog->aux->kfunc_tab;
+}
+
+const struct btf_func_model *
+bpf_jit_find_kern_func_model(const struct bpf_prog *prog,
+			     const struct bpf_insn *insn)
+{
+	const struct bpf_kern_func_descriptor desc = {
+		.imm = insn->imm,
+	};
+	const struct bpf_kern_func_descriptor *res;
+	struct bpf_kern_func_desc_tab *tab;
+
+	tab = prog->aux->kfunc_tab;
+	res = bsearch(&desc, tab->descs, tab->nr_descs,
+		      sizeof(tab->descs[0]), kern_func_desc_cmp_by_imm);
+
+	return res ? &res->func_model : NULL;
+}
+
+static int add_subprog_and_kern_func(struct bpf_verifier_env *env)
 {
-	int i, ret, subprog_start, subprog_end, off, cur_subprog = 0;
 	struct bpf_subprog_info *subprog = env->subprog_info;
 	struct bpf_insn *insn = env->prog->insnsi;
-	int insn_cnt = env->prog->len;
+	int i, ret, insn_cnt = env->prog->len;
 
 	/* Add entry function. */
 	ret = add_subprog(env, 0);
-	if (ret < 0)
+	if (ret)
 		return ret;
 
-	/* determine subprog starts. The end is one before the next starts */
-	for (i = 0; i < insn_cnt; i++) {
-		if (bpf_pseudo_func(insn + i)) {
-			if (!env->bpf_capable) {
-				verbose(env,
-					"function pointers are allowed for CAP_BPF and CAP_SYS_ADMIN\n");
-				return -EPERM;
-			}
-			ret = add_subprog(env, i + insn[i].imm + 1);
-			if (ret < 0)
-				return ret;
-			/* remember subprog */
-			insn[i + 1].imm = ret;
-			continue;
-		}
-		if (!bpf_pseudo_call(insn + i))
+	for (i = 0; i < insn_cnt; i++, insn++) {
+		if (!bpf_pseudo_func(insn) && !bpf_pseudo_call(insn) &&
+		    !bpf_pseudo_kfunc_call(insn))
 			continue;
+
 		if (!env->bpf_capable) {
 			verbose(env,
-				"function calls to other bpf functions are allowed for CAP_BPF and CAP_SYS_ADMIN\n");
+				"%s %s function pointer is only allowed for CAP_BPF and CAP_SYS_ADMIN\n",
+				bpf_pseudo_func(insn) ? "loading" : "calling",
+				bpf_pseudo_kfunc_call(insn) ? "kernel" : "other bpf");
+
 			return -EPERM;
 		}
-		ret = add_subprog(env, i + insn[i].imm + 1);
+
+		if (bpf_pseudo_func(insn)) {
+			ret = add_subprog(env, i + insn->imm + 1);
+			if (ret >= 0)
+				/* remember subprog */
+				insn[1].imm = ret;
+		} else if (bpf_pseudo_call(insn)) {
+			ret = add_subprog(env, i + insn->imm + 1);
+		} else {
+			ret = add_kern_func_call(env, insn->imm);
+		}
+
 		if (ret < 0)
 			return ret;
 	}
@@ -1608,6 +1770,16 @@ static int check_subprogs(struct bpf_verifier_env *env)
 		for (i = 0; i < env->subprog_cnt; i++)
 			verbose(env, "func#%d @%d\n", i, subprog[i].start);
 
+	return 0;
+}
+
+static int check_subprogs(struct bpf_verifier_env *env)
+{
+	int i, subprog_start, subprog_end, off, cur_subprog = 0;
+	struct bpf_subprog_info *subprog = env->subprog_info;
+	struct bpf_insn *insn = env->prog->insnsi;
+	int insn_cnt = env->prog->len;
+
 	/* now check that all jumps are within the same subprog */
 	subprog_start = subprog[cur_subprog].start;
 	subprog_end = subprog[cur_subprog + 1].start;
@@ -1916,6 +2088,27 @@ static int get_prev_insn_idx(struct bpf_verifier_state *st, int i,
 	return i;
 }
 
+static const char *disasm_kern_func_name(void *data,
+					 const struct bpf_insn *insn,
+					 char *buf, size_t len)
+{
+	const struct btf_type *func;
+	const char *func_name;
+
+	if (insn->src_reg != BPF_PSEUDO_KFUNC_CALL)
+		return NULL;
+
+	func = btf_type_by_id(btf_vmlinux, insn->imm);
+	if (!func)
+		func_name = "unknown-kern-func";
+	else
+		func_name = btf_name_by_offset(btf_vmlinux, func->name_off);
+
+	snprintf(buf, len, "%s", func_name);
+
+	return buf;
+}
+
 /* For given verifier state backtrack_insn() is called from the last insn to
  * the first insn. Its purpose is to compute a bitmask of registers and
  * stack slots that needs precision in the parent verifier state.
@@ -1924,6 +2117,7 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx,
 			  u32 *reg_mask, u64 *stack_mask)
 {
 	const struct bpf_insn_cbs cbs = {
+		.cb_call	= disasm_kern_func_name,
 		.cb_print	= verbose,
 		.private_data	= env,
 	};
@@ -5960,6 +6154,99 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 	return 0;
 }
 
+/* mark_btf_func_reg_size() is used when the reg size is determined by
+ * the BTF func_proto's return value size and argument.
+ */
+static void mark_btf_func_reg_size(struct bpf_verifier_env *env, u32 regno,
+				   size_t reg_size)
+{
+	struct bpf_reg_state *reg = &cur_regs(env)[regno];
+
+	if (regno == BPF_REG_0) {
+		/* Function return value */
+		reg->live |= REG_LIVE_WRITTEN;
+		reg->subreg_def = reg_size == sizeof(u64) ?
+			DEF_NOT_SUBREG : env->insn_idx + 1;
+	} else {
+		/* Function argument */
+		if (reg_size == sizeof(u64)) {
+			mark_insn_zext(env, reg);
+			mark_reg_read(env, reg, reg->parent, REG_LIVE_READ64);
+		} else {
+			mark_reg_read(env, reg, reg->parent, REG_LIVE_READ32);
+		}
+	}
+}
+
+static int check_kern_func_call(struct bpf_verifier_env *env,
+				struct bpf_insn *insn)
+{
+	const struct btf_type *t, *func, *func_proto, *ptr_type;
+	struct bpf_reg_state *regs = cur_regs(env);
+	const char *func_name, *ptr_type_name;
+	u32 i, nargs, ret_id, func_id;
+	const struct btf_param *args;
+	int err;
+
+	func_id = insn->imm;
+	func = btf_type_by_id(btf_vmlinux, func_id);
+	func_name = btf_name_by_offset(btf_vmlinux, func->name_off);
+	func_proto = btf_type_by_id(btf_vmlinux, func->type);
+
+	if (!env->ops->check_kern_func_call ||
+	    !env->ops->check_kern_func_call(func_id)) {
+		verbose(env, "calling kernel function %s is not allowed\n",
+			func_name);
+		return -EACCES;
+	}
+
+	/* Check return type */
+	t = btf_type_skip_modifiers(btf_vmlinux, func_proto->type, &ret_id);
+	if (btf_type_is_void(t)) {
+		mark_reg_not_init(env, regs, BPF_REG_0);
+	} else if (btf_type_is_scalar(t)) {
+		mark_reg_unknown(env, regs, BPF_REG_0);
+		mark_btf_func_reg_size(env, BPF_REG_0, t->size);
+	} else if (btf_type_is_ptr(t)) {
+		ptr_type = btf_type_skip_modifiers(btf_vmlinux, t->type, NULL);
+		if (!btf_type_is_struct(ptr_type)) {
+			ptr_type_name = btf_name_by_offset(btf_vmlinux,
+							   ptr_type->name_off);
+			verbose(env, "kernel function %s returns pointer type %s %s is not supported\n",
+				func_name, btf_type_str(t),
+				ptr_type_name);
+			return -EINVAL;
+		}
+		regs[BPF_REG_0].btf = btf_vmlinux;
+		regs[BPF_REG_0].type = PTR_TO_BTF_ID;
+		regs[BPF_REG_0].btf_id = ret_id;
+		mark_btf_func_reg_size(env, BPF_REG_0, sizeof(void *));
+	} /* else { add_kern_func_call() has already rejected this case } */
+
+	/* Check the arguments */
+	err = btf_check_kern_func_arg_match(env, btf_vmlinux, func_id, regs);
+	if (err)
+		return err;
+
+	nargs = btf_type_vlen(func_proto);
+	args = (const struct btf_param *)(func_proto + 1);
+	for (i = 0; i < nargs; i++) {
+		u32 regno = i + 1;
+
+		t = btf_type_skip_modifiers(btf_vmlinux, args[i].type, NULL);
+		if (btf_type_is_ptr(t))
+			mark_btf_func_reg_size(env, regno, sizeof(void *));
+		else
+			/* scalar. ensured by btf_check_kern_func_arg_match() */
+			mark_btf_func_reg_size(env, regno, t->size);
+	}
+
+	for (i = 1; i < CALLER_SAVED_REGS; i++)
+		mark_reg_not_init(env, regs, caller_saved[i]);
+
+	return 0;
+}
+
 static bool signed_add_overflows(s64 a, s64 b)
 {
 	/* Do the add in u64, where overflow is well-defined */
@@ -10163,6 +10450,7 @@ static int do_check(struct bpf_verifier_env *env)
 
 		if (env->log.level & BPF_LOG_LEVEL) {
 			const struct bpf_insn_cbs cbs = {
+				.cb_call	= disasm_kern_func_name,
 				.cb_print	= verbose,
 				.private_data	= env,
 			};
@@ -10310,7 +10598,8 @@ static int do_check(struct bpf_verifier_env *env)
 				if (BPF_SRC(insn->code) != BPF_K ||
 				    insn->off != 0 ||
 				    (insn->src_reg != BPF_REG_0 &&
-				     insn->src_reg != BPF_PSEUDO_CALL) ||
+				     insn->src_reg != BPF_PSEUDO_CALL &&
+				     insn->src_reg != BPF_PSEUDO_KFUNC_CALL) ||
 				    insn->dst_reg != BPF_REG_0 ||
 				    class == BPF_JMP32) {
 					verbose(env, "BPF_CALL uses reserved fields\n");
@@ -10325,6 +10614,8 @@ static int do_check(struct bpf_verifier_env *env)
 				}
 				if (insn->src_reg == BPF_PSEUDO_CALL)
 					err = check_func_call(env, insn, &env->insn_idx);
+				else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL)
+					err = check_kern_func_call(env, insn);
 				else
 					err = check_helper_call(env, insn, &env->insn_idx);
 				if (err)
@@ -11635,6 +11926,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 		func[i]->aux->name[0] = 'F';
 		func[i]->aux->stack_depth = env->subprog_info[i].stack_depth;
 		func[i]->jit_requested = 1;
+		func[i]->aux->kfunc_tab = prog->aux->kfunc_tab;
 		func[i]->aux->linfo = prog->aux->linfo;
 		func[i]->aux->nr_linfo = prog->aux->nr_linfo;
 		func[i]->aux->jited_linfo = prog->aux->jited_linfo;
@@ -11774,6 +12066,7 @@ static int fixup_call_args(struct bpf_verifier_env *env)
 #ifndef CONFIG_BPF_JIT_ALWAYS_ON
 	struct bpf_prog *prog = env->prog;
 	struct bpf_insn *insn = prog->insnsi;
+	bool has_kfunc_call = bpf_prog_has_kern_func_call(prog);
 	int i, depth;
 #endif
 	int err = 0;
@@ -11787,6 +12080,10 @@ static int fixup_call_args(struct bpf_verifier_env *env)
 			return err;
 	}
 #ifndef CONFIG_BPF_JIT_ALWAYS_ON
+	if (has_kfunc_call) {
+		verbose(env, "calling kernel functions is not allowed in non-JITed programs\n");
+		return -EINVAL;
+	}
 	if (env->subprog_cnt > 1 && env->prog->aux->tail_call_reachable) {
 		/* When JIT fails the progs with bpf2bpf calls and tail_calls
 		 * have to be rejected, since interpreter doesn't support them yet.
@@ -11815,6 +12112,27 @@ static int fixup_call_args(struct bpf_verifier_env *env)
 	return err;
 }
 
+static int fixup_kern_func_call(struct bpf_verifier_env *env,
+				struct bpf_insn *insn)
+{
+	const struct bpf_kern_func_descriptor *desc;
+	const struct bpf_prog *prog = env->prog;
+
+	/* insn->imm has the btf func_id. Replace it with
+	 * an address (relative to __bpf_call_base).
+	 */
+	desc = find_kern_func_desc(prog, insn->imm);
+	if (!desc) {
+		verbose(env, "verifier internal error: kernel function descriptor not found for func_id %u\n",
+			insn->imm);
+		return -EFAULT;
+	}
+
+	insn->imm = desc->imm;
+
+	return 0;
+}
+
 /* Do various post-verification rewrites in a single program pass.
  * These rewrites simplify JIT and interpreter implementations.
  */
@@ -11951,6 +12269,12 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			continue;
 		if (insn->src_reg == BPF_PSEUDO_CALL)
 			continue;
+		if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
+			ret = fixup_kern_func_call(env, insn);
+			if (ret)
+				return ret;
+			continue;
+		}
 
 		if (insn->imm == BPF_FUNC_get_route_realm)
 			prog->dst_needed = 1;
@@ -12180,6 +12504,8 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 		}
 	}
 
+	sort_kern_func_descs_by_imm(env->prog);
+
 	return 0;
 }
 
@@ -12885,6 +13211,10 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 	if (!env->explored_states)
 		goto skip_full_check;
 
+	ret = add_subprog_and_kern_func(env);
+	if (ret < 0)
+		goto skip_full_check;
+
 	ret = check_subprogs(env);
 	if (ret < 0)
 		goto skip_full_check;
diff --git a/tools/bpf/bpftool/xlated_dumper.c b/tools/bpf/bpftool/xlated_dumper.c
index 6fc3e6f7f40c..9ea9b8c76525 100644
--- a/tools/bpf/bpftool/xlated_dumper.c
+++ b/tools/bpf/bpftool/xlated_dumper.c
@@ -167,7 +167,8 @@ static const char *print_call_helper(struct dump_data *dd,
 }
 
 static const char *print_call(void *private_data,
-			      const struct bpf_insn *insn)
+			      const struct bpf_insn *insn,
+			      char *buf, size_t len)
 {
 	struct dump_data *dd = private_data;
 	unsigned long address = dd->address_call_base + insn->imm;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 2d3036e292a9..ab9f2233607c 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1117,6 +1117,10 @@ enum bpf_link_type {
  * offset to another bpf function
  */
 #define BPF_PSEUDO_CALL		1
+/* when bpf_call->src_reg == BPF_PSEUDO_KFUNC_CALL,
+ * bpf_call->imm == btf_id of a BTF_KIND_FUNC in the running kernel
+ */
+#define BPF_PSEUDO_KFUNC_CALL	2
 
 /* flags for BPF_MAP_UPDATE_ELEM command */
 enum {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH bpf-next 05/15] bpf: Support kernel function call in x86-32
  2021-03-16  1:13 [PATCH bpf-next 00/15] Support calling kernel function Martin KaFai Lau
                   ` (3 preceding siblings ...)
  2021-03-16  1:14 ` [PATCH bpf-next 04/15] bpf: Support bpf program calling kernel function Martin KaFai Lau
@ 2021-03-16  1:14 ` Martin KaFai Lau
  2021-03-16  1:14 ` [PATCH bpf-next 06/15] tcp: Rename bictcp function prefix to cubictcp Martin KaFai Lau
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-16  1:14 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, netdev

This patch adds kernel function call support to the x86-32 bpf jit.
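
As a rough illustration of the two calculations the jit has to make, here
is a minimal user-space sketch.  It is not the jit code itself:
count_reg_args() and call_rel32() are hypothetical names, the arg sizes are
passed in as a plain array instead of a btf_func_model, and the call
address is given directly instead of being derived from "end_addr".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_ARG_REGS 3 /* EAX, EDX, ECX under -mregparm=3 */

/*
 * Return how many leading arguments fit into the three argument
 * registers.  An argument wider than 4 bytes needs two registers.
 * The first argument that does not fit, and every one after it,
 * goes to the stack (same early-break rule as the loop in the patch).
 */
static int count_reg_args(const uint8_t *arg_size, int nr_args)
{
	int free_regs = NR_ARG_REGS;
	int i;

	assert(nr_args >= 0);
	for (i = 0; i < nr_args; i++) {
		int regs_needed = arg_size[i] > sizeof(uint32_t) ? 2 : 1;

		if (regs_needed > free_regs)
			break;
		free_regs -= regs_needed;
	}
	return i;
}

/*
 * rel32 operand of a near call (opcode 0xE8): the target address minus
 * the address of the instruction that follows the 5-byte call.
 */
static int64_t call_rel32(uint32_t target, uint32_t call_addr)
{
	return (int64_t)target - ((int64_t)call_addr + 5);
}
```

For a function taking (u32, u64, u32), count_reg_args() reports that two
args go in registers, so only the last one is pushed, which matches the
table in the code comment of this patch.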

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 arch/x86/net/bpf_jit_comp32.c | 198 ++++++++++++++++++++++++++++++++++
 1 file changed, 198 insertions(+)

diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
index d17b67c69f89..f2ac36cf08ac 100644
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -1390,6 +1390,19 @@ static inline void emit_push_r64(const u8 src[], u8 **pprog)
 	*pprog = prog;
 }
 
+static void emit_push_r32(const u8 src[], u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+
+	/* mov ecx,dword ptr [ebp+off] */
+	EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX), STACK_VAR(src_lo));
+	/* push ecx */
+	EMIT1(0x51);
+
+	*pprog = prog;
+}
+
 static u8 get_cond_jmp_opcode(const u8 op, bool is_cmp_lo)
 {
 	u8 jmp_cond;
@@ -1459,6 +1472,174 @@ static u8 get_cond_jmp_opcode(const u8 op, bool is_cmp_lo)
 	return jmp_cond;
 }
 
+/* i386 kernel compiles with "-mregparm=3".  From gcc document:
+ *
+ * ==== snippet ====
+ * regparm (number)
+ *	On x86-32 targets, the regparm attribute causes the compiler
+ *	to pass arguments number one to (number) if they are of integral
+ *	type in registers EAX, EDX, and ECX instead of on the stack.
+ *	Functions that take a variable number of arguments continue
+ *	to be passed all of their arguments on the stack.
+ * ==== snippet ====
+ *
+ * The first three args of a function will be considered for
+ * putting into the 32bit register EAX, EDX, and ECX.
+ *
+ * Two 32bit registers are used to pass a 64bit arg.
+ *
+ * For example,
+ * void foo(u32 a, u32 b, u32 c, u32 d):
+ *	u32 a: EAX
+ *	u32 b: EDX
+ *	u32 c: ECX
+ *	u32 d: stack
+ *
+ * void foo(u64 a, u32 b, u32 c):
+ * 	u64 a: EAX (lo32) EDX (hi32)
+ *	u32 b: ECX
+ *	u32 c: stack
+ *
+ * void foo(u32 a, u64 b, u32 c):
+ *	u32 a: EAX
+ *	u64 b: EDX (lo32) ECX (hi32)
+ *	u32 c: stack
+ *
+ * void foo(u32 a, u32 b, u64 c):
+ *	u32 a: EAX
+ *	u32 b: EDX
+ *	u64 c: stack
+ *
+ * The return value will be stored in the EAX (and EDX for 64bit value).
+ *
+ * For example,
+ * u32 foo(u32 a, u32 b, u32 c):
+ *	return value: EAX
+ *
+ * u64 foo(u32 a, u32 b, u32 c):
+ *	return value: EAX (lo32) EDX (hi32)
+ *
+ * Notes:
+ *	The verifier only accepts functions with integer and pointer
+ *	args and return value, so there is no struct-by-value
+ *	to handle.
+ *
+ * emit_kfunc_call() finds out the btf_func_model by calling
+ * bpf_jit_find_kern_func_model().  A btf_func_model
+ * has the details about the number of args, size of each arg,
+ * and the size of the return value.
+ *
+ * It first decides how many args can be passed by EAX, EDX, and ECX.
+ * That will decide what args should be pushed to the stack:
+ * [first_stack_regno, last_stack_regno] are the bpf regnos
+ * that should be pushed to the stack.
+ *
+ * It will first push the stack args because the push
+ * will need to use ECX as a scratch register.  Then, it moves
+ * [BPF_REG_1, first_stack_regno) to EAX, EDX, and ECX.
+ *
+ * When emitting a call (0xE8), it needs to figure out
+ * the jmp_offset relative to the jit-insn address immediately
+ * following the call (0xE8) instruction.  At this point, it knows
+ * the end of the jit-insn address after completely translating the
+ * current (BPF_JMP | BPF_CALL) bpf-insn.  It is passed as "end_addr"
+ * to the emit_kfunc_call().  Thus, it can learn the "immediate-follow-call"
+ * address by figuring out how many jit-insns are generated between
+ * the call (0xE8) and the end_addr:
+ *	- 0-1 jit-insn (3 bytes) to restore the esp pointer if there
+ *	  is an arg pushed to the stack.
+ *	- 0-2 jit-insns (3 bytes each) to handle the return value.
+ */
+static int emit_kfunc_call(const struct bpf_prog *bpf_prog, u8 *end_addr,
+			   const struct bpf_insn *insn, u8 **pprog)
+{
+	const u8 arg_regs[] = { IA32_EAX, IA32_EDX, IA32_ECX };
+	int i, cnt = 0, first_stack_regno, last_stack_regno;
+	int free_arg_regs = ARRAY_SIZE(arg_regs);
+	const struct btf_func_model *fm;
+	int bytes_in_stack = 0;
+	const u8 *cur_arg_reg;
+	u8 *prog = *pprog;
+	s64 jmp_offset;
+
+	fm = bpf_jit_find_kern_func_model(bpf_prog, insn);
+	if (!fm)
+		return -EINVAL;
+
+	first_stack_regno = BPF_REG_1;
+	for (i = 0; i < fm->nr_args; i++) {
+		int regs_needed = fm->arg_size[i] > sizeof(u32) ? 2 : 1;
+
+		if (regs_needed > free_arg_regs)
+			break;
+
+		free_arg_regs -= regs_needed;
+		first_stack_regno++;
+	}
+
+	/* Push the args to the stack */
+	last_stack_regno = BPF_REG_0 + fm->nr_args;
+	for (i = last_stack_regno; i >= first_stack_regno; i--) {
+		if (fm->arg_size[i - 1] > sizeof(u32)) {
+			emit_push_r64(bpf2ia32[i], &prog);
+			bytes_in_stack += 8;
+		} else {
+			emit_push_r32(bpf2ia32[i], &prog);
+			bytes_in_stack += 4;
+		}
+	}
+
+	cur_arg_reg = &arg_regs[0];
+	for (i = BPF_REG_1; i < first_stack_regno; i++) {
+		/* mov e[adc]x,dword ptr [ebp+off] */
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, *cur_arg_reg++),
+		      STACK_VAR(bpf2ia32[i][0]));
+		if (fm->arg_size[i - 1] > sizeof(u32))
+			/* mov e[adc]x,dword ptr [ebp+off] */
+			EMIT3(0x8B, add_2reg(0x40, IA32_EBP, *cur_arg_reg++),
+			      STACK_VAR(bpf2ia32[i][1]));
+	}
+
+	if (bytes_in_stack)
+		/* add esp,"bytes_in_stack" */
+		end_addr -= 3;
+
+	/* mov dword ptr [ebp+off],edx */
+	if (fm->ret_size > sizeof(u32))
+		end_addr -= 3;
+
+	/* mov dword ptr [ebp+off],eax */
+	if (fm->ret_size)
+		end_addr -= 3;
+
+	jmp_offset = (u8 *)__bpf_call_base + insn->imm - end_addr;
+	if (!is_simm32(jmp_offset)) {
+		pr_err("unsupported BPF kernel function jmp_offset:%lld\n",
+		       jmp_offset);
+		return -EINVAL;
+	}
+
+	EMIT1_off32(0xE8, jmp_offset);
+
+	if (fm->ret_size)
+		/* mov dword ptr [ebp+off],eax */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EAX),
+		      STACK_VAR(bpf2ia32[BPF_REG_0][0]));
+
+	if (fm->ret_size > sizeof(u32))
+		/* mov dword ptr [ebp+off],edx */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EDX),
+		      STACK_VAR(bpf2ia32[BPF_REG_0][1]));
+
+	if (bytes_in_stack)
+		/* add esp,"bytes_in_stack" */
+		EMIT3(0x83, add_1reg(0xC0, IA32_ESP), bytes_in_stack);
+
+	*pprog = prog;
+
+	return 0;
+}
+
 static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 		  int oldproglen, struct jit_context *ctx)
 {
@@ -1888,6 +2069,18 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 			if (insn->src_reg == BPF_PSEUDO_CALL)
 				goto notyet;
 
+			if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
+				int err;
+
+				err = emit_kfunc_call(bpf_prog,
+						      image + addrs[i],
+						      insn, &prog);
+
+				if (err)
+					return err;
+				break;
+			}
+
 			func = (u8 *) __bpf_call_base + imm32;
 			jmp_offset = func - (image + addrs[i]);
 
@@ -2393,3 +2586,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 					   tmp : orig_prog);
 	return prog;
 }
+
+bool bpf_jit_supports_kfunc_call(void)
+{
+	return true;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH bpf-next 06/15] tcp: Rename bictcp function prefix to cubictcp
  2021-03-16  1:13 [PATCH bpf-next 00/15] Support calling kernel function Martin KaFai Lau
                   ` (4 preceding siblings ...)
  2021-03-16  1:14 ` [PATCH bpf-next 05/15] bpf: Support kernel function call in x86-32 Martin KaFai Lau
@ 2021-03-16  1:14 ` Martin KaFai Lau
  2021-03-16  1:14 ` [PATCH bpf-next 07/15] bpf: tcp: White list some tcp cong functions to be called by bpf-tcp-cc Martin KaFai Lau
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-16  1:14 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, netdev

The cubic functions in tcp_cubic.c use the bictcp prefix, the same
prefix as in tcp_bic.c.  This patch gives the tcp_congestion_ops
callbacks the proper cubictcp prefix because a later patch will allow
the bpf prog to directly call the cubictcp implementation.  Renaming
them avoids a name collision when looking up the intended function
to call at bpf prog load time.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 net/ipv4/tcp_cubic.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index ffcbe46dacdb..4a30deaa9a37 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -124,7 +124,7 @@ static inline void bictcp_hystart_reset(struct sock *sk)
 	ca->sample_cnt = 0;
 }
 
-static void bictcp_init(struct sock *sk)
+static void cubictcp_init(struct sock *sk)
 {
 	struct bictcp *ca = inet_csk_ca(sk);
 
@@ -137,7 +137,7 @@ static void bictcp_init(struct sock *sk)
 		tcp_sk(sk)->snd_ssthresh = initial_ssthresh;
 }
 
-static void bictcp_cwnd_event(struct sock *sk, enum tcp_ca_event event)
+static void cubictcp_cwnd_event(struct sock *sk, enum tcp_ca_event event)
 {
 	if (event == CA_EVENT_TX_START) {
 		struct bictcp *ca = inet_csk_ca(sk);
@@ -319,7 +319,7 @@ static inline void bictcp_update(struct bictcp *ca, u32 cwnd, u32 acked)
 	ca->cnt = max(ca->cnt, 2U);
 }
 
-static void bictcp_cong_avoid(struct sock *sk, u32 ack, u32 acked)
+static void cubictcp_cong_avoid(struct sock *sk, u32 ack, u32 acked)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct bictcp *ca = inet_csk_ca(sk);
@@ -338,7 +338,7 @@ static void bictcp_cong_avoid(struct sock *sk, u32 ack, u32 acked)
 	tcp_cong_avoid_ai(tp, ca->cnt, acked);
 }
 
-static u32 bictcp_recalc_ssthresh(struct sock *sk)
+static u32 cubictcp_recalc_ssthresh(struct sock *sk)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 	struct bictcp *ca = inet_csk_ca(sk);
@@ -355,7 +355,7 @@ static u32 bictcp_recalc_ssthresh(struct sock *sk)
 	return max((tp->snd_cwnd * beta) / BICTCP_BETA_SCALE, 2U);
 }
 
-static void bictcp_state(struct sock *sk, u8 new_state)
+static void cubictcp_state(struct sock *sk, u8 new_state)
 {
 	if (new_state == TCP_CA_Loss) {
 		bictcp_reset(inet_csk_ca(sk));
@@ -442,7 +442,7 @@ static void hystart_update(struct sock *sk, u32 delay)
 	}
 }
 
-static void bictcp_acked(struct sock *sk, const struct ack_sample *sample)
+static void cubictcp_acked(struct sock *sk, const struct ack_sample *sample)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 	struct bictcp *ca = inet_csk_ca(sk);
@@ -471,13 +471,13 @@ static void bictcp_acked(struct sock *sk, const struct ack_sample *sample)
 }
 
 static struct tcp_congestion_ops cubictcp __read_mostly = {
-	.init		= bictcp_init,
-	.ssthresh	= bictcp_recalc_ssthresh,
-	.cong_avoid	= bictcp_cong_avoid,
-	.set_state	= bictcp_state,
+	.init		= cubictcp_init,
+	.ssthresh	= cubictcp_recalc_ssthresh,
+	.cong_avoid	= cubictcp_cong_avoid,
+	.set_state	= cubictcp_state,
 	.undo_cwnd	= tcp_reno_undo_cwnd,
-	.cwnd_event	= bictcp_cwnd_event,
-	.pkts_acked     = bictcp_acked,
+	.cwnd_event	= cubictcp_cwnd_event,
+	.pkts_acked     = cubictcp_acked,
 	.owner		= THIS_MODULE,
 	.name		= "cubic",
 };
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH bpf-next 07/15] bpf: tcp: White list some tcp cong functions to be called by bpf-tcp-cc
  2021-03-16  1:13 [PATCH bpf-next 00/15] Support calling kernel function Martin KaFai Lau
                   ` (5 preceding siblings ...)
  2021-03-16  1:14 ` [PATCH bpf-next 06/15] tcp: Rename bictcp function prefix to cubictcp Martin KaFai Lau
@ 2021-03-16  1:14 ` Martin KaFai Lau
  2021-03-19  1:19   ` Andrii Nakryiko
  2021-03-16  1:14 ` [PATCH bpf-next 08/15] libbpf: Refactor bpf_object__resolve_ksyms_btf_id Martin KaFai Lau
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-16  1:14 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, netdev

This patch white lists some tcp cong helper functions, tcp_slow_start()
and tcp_cong_avoid_ai(), so that they can be directly called by
the bpf-tcp-cc program.

A few tcp cc implementation functions are also white listed.
A potential use case is a bpf-tcp-cc implementation that only
wants to override a subset of a tcp_congestion_ops.  For the others,
the bpf-tcp-cc can directly call the kernel counterparts instead of
re-implementing (or copy-and-pasting) them in the bpf program.

They will only be available to the bpf-tcp-cc typed program.
The white listed functions are not bound to a fixed ABI contract.
When any of them changes, the bpf-tcp-cc program has to be changed
as well, just like any in-tree/out-of-tree kernel tcp-cc implementation.
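
To illustrate the kind of check the white list boils down to, here is a
minimal user-space sketch of a membership test over a sorted id array.
Everything in it is a stand-in: struct id_set, id_set_contains() and the
ids themselves are made up for illustration, and the kernel's own
BTF id set helpers are not used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a set of allowed BTF function ids. */
struct id_set {
	uint32_t cnt;
	const uint32_t *ids; /* must be sorted in ascending order */
};

static int id_cmp(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/* Binary search the sorted set; true if the id is white listed. */
static bool id_set_contains(const struct id_set *set, uint32_t id)
{
	assert(set != NULL);
	return bsearch(&id, set->ids, set->cnt, sizeof(id), id_cmp) != NULL;
}

/* Made-up ids for illustration only; real ids come from the kernel BTF. */
static const uint32_t demo_ids[] = { 1201, 1377, 2044, 2910 };
static const struct id_set demo_set = { 4, demo_ids };
```

With this shape, a call to a function whose id is not in the set is simply
rejected at load time, e.g. id_set_contains(&demo_set, 1378) is false.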

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 net/ipv4/bpf_tcp_ca.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index d520e61649c8..ed6e6b5b762b 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -5,6 +5,7 @@
 #include <linux/bpf_verifier.h>
 #include <linux/bpf.h>
 #include <linux/btf.h>
+#include <linux/btf_ids.h>
 #include <linux/filter.h>
 #include <net/tcp.h>
 #include <net/bpf_sk_storage.h>
@@ -178,10 +179,50 @@ bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id,
 	}
 }
 
+BTF_SET_START(bpf_tcp_ca_kfunc_ids)
+BTF_ID(func, tcp_reno_ssthresh)
+BTF_ID(func, tcp_reno_cong_avoid)
+BTF_ID(func, tcp_reno_undo_cwnd)
+BTF_ID(func, tcp_slow_start)
+BTF_ID(func, tcp_cong_avoid_ai)
+#if IS_BUILTIN(CONFIG_TCP_CONG_CUBIC)
+BTF_ID(func, cubictcp_init)
+BTF_ID(func, cubictcp_recalc_ssthresh)
+BTF_ID(func, cubictcp_cong_avoid)
+BTF_ID(func, cubictcp_state)
+BTF_ID(func, cubictcp_cwnd_event)
+BTF_ID(func, cubictcp_acked)
+#endif
+#if IS_BUILTIN(CONFIG_TCP_CONG_DCTCP)
+BTF_ID(func, dctcp_init)
+BTF_ID(func, dctcp_update_alpha)
+BTF_ID(func, dctcp_cwnd_event)
+BTF_ID(func, dctcp_ssthresh)
+BTF_ID(func, dctcp_cwnd_undo)
+BTF_ID(func, dctcp_state)
+#endif
+#if IS_BUILTIN(CONFIG_TCP_CONG_BBR)
+BTF_ID(func, bbr_init)
+BTF_ID(func, bbr_main)
+BTF_ID(func, bbr_sndbuf_expand)
+BTF_ID(func, bbr_undo_cwnd)
+BTF_ID(func, bbr_cwnd_event)
+BTF_ID(func, bbr_ssthresh)
+BTF_ID(func, bbr_min_tso_segs)
+BTF_ID(func, bbr_set_state)
+#endif
+BTF_SET_END(bpf_tcp_ca_kfunc_ids)
+
+static bool bpf_tcp_ca_check_kern_func_call(u32 kfunc_btf_id)
+{
+	return btf_id_set_contains(&bpf_tcp_ca_kfunc_ids, kfunc_btf_id);
+}
+
 static const struct bpf_verifier_ops bpf_tcp_ca_verifier_ops = {
 	.get_func_proto		= bpf_tcp_ca_get_func_proto,
 	.is_valid_access	= bpf_tcp_ca_is_valid_access,
 	.btf_struct_access	= bpf_tcp_ca_btf_struct_access,
+	.check_kern_func_call	= bpf_tcp_ca_check_kern_func_call,
 };
 
 static int bpf_tcp_ca_init_member(const struct btf_type *t,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH bpf-next 08/15] libbpf: Refactor bpf_object__resolve_ksyms_btf_id
  2021-03-16  1:13 [PATCH bpf-next 00/15] Support calling kernel function Martin KaFai Lau
                   ` (6 preceding siblings ...)
  2021-03-16  1:14 ` [PATCH bpf-next 07/15] bpf: tcp: White list some tcp cong functions to be called by bpf-tcp-cc Martin KaFai Lau
@ 2021-03-16  1:14 ` Martin KaFai Lau
  2021-03-19  2:53   ` Andrii Nakryiko
  2021-03-16  1:14 ` [PATCH bpf-next 09/15] libbpf: Refactor codes for finding btf id of a kernel symbol Martin KaFai Lau
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-16  1:14 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, netdev

This patch refactors most of the logic from
bpf_object__resolve_ksyms_btf_id() into a new function
bpf_object__resolve_ksym_var_btf_id().
It is to get ready for a later patch adding
bpf_object__resolve_ksym_func_btf_id() which resolves
a kernel function to its btf_id in the running kernel.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 tools/lib/bpf/libbpf.c | 125 ++++++++++++++++++++++-------------------
 1 file changed, 68 insertions(+), 57 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2f351d3ad3e7..7d5f9b7877bc 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -7403,75 +7403,86 @@ static int bpf_object__read_kallsyms_file(struct bpf_object *obj)
 	return err;
 }
 
-static int bpf_object__resolve_ksyms_btf_id(struct bpf_object *obj)
+static int bpf_object__resolve_ksym_var_btf_id(struct bpf_object *obj,
+					       struct extern_desc *ext)
 {
-	struct extern_desc *ext;
+	const struct btf_type *targ_var, *targ_type;
+	__u32 targ_type_id, local_type_id;
+	const char *targ_var_name;
+	int i, id, btf_fd, err;
 	struct btf *btf;
-	int i, j, id, btf_fd, err;
 
-	for (i = 0; i < obj->nr_extern; i++) {
-		const struct btf_type *targ_var, *targ_type;
-		__u32 targ_type_id, local_type_id;
-		const char *targ_var_name;
-		int ret;
+	btf = obj->btf_vmlinux;
+	btf_fd = 0;
+	id = btf__find_by_name_kind(btf, ext->name, BTF_KIND_VAR);
+	if (id == -ENOENT) {
+		err = load_module_btfs(obj);
+		if (err)
+			return err;
 
-		ext = &obj->externs[i];
-		if (ext->type != EXT_KSYM || !ext->ksym.type_id)
-			continue;
+		for (i = 0; i < obj->btf_module_cnt; i++) {
+			btf = obj->btf_modules[i].btf;
+			/* we assume module BTF FD is always >0 */
+			btf_fd = obj->btf_modules[i].fd;
+			id = btf__find_by_name_kind(btf, ext->name, BTF_KIND_VAR);
+			if (id != -ENOENT)
+				break;
+		}
+	}
+	if (id <= 0) {
+		pr_warn("extern (var ksym) '%s': failed to find BTF ID in kernel BTF(s).\n",
+			ext->name);
+		return -ESRCH;
+	}
 
-		btf = obj->btf_vmlinux;
-		btf_fd = 0;
-		id = btf__find_by_name_kind(btf, ext->name, BTF_KIND_VAR);
-		if (id == -ENOENT) {
-			err = load_module_btfs(obj);
-			if (err)
-				return err;
+	/* find local type_id */
+	local_type_id = ext->ksym.type_id;
 
-			for (j = 0; j < obj->btf_module_cnt; j++) {
-				btf = obj->btf_modules[j].btf;
-				/* we assume module BTF FD is always >0 */
-				btf_fd = obj->btf_modules[j].fd;
-				id = btf__find_by_name_kind(btf, ext->name, BTF_KIND_VAR);
-				if (id != -ENOENT)
-					break;
-			}
-		}
-		if (id <= 0) {
-			pr_warn("extern (ksym) '%s': failed to find BTF ID in kernel BTF(s).\n",
-				ext->name);
-			return -ESRCH;
-		}
+	/* find target type_id */
+	targ_var = btf__type_by_id(btf, id);
+	targ_var_name = btf__name_by_offset(btf, targ_var->name_off);
+	targ_type = skip_mods_and_typedefs(btf, targ_var->type, &targ_type_id);
 
-		/* find local type_id */
-		local_type_id = ext->ksym.type_id;
+	err = bpf_core_types_are_compat(obj->btf, local_type_id,
+					btf, targ_type_id);
+	if (err <= 0) {
+		const struct btf_type *local_type;
+		const char *targ_name, *local_name;
 
-		/* find target type_id */
-		targ_var = btf__type_by_id(btf, id);
-		targ_var_name = btf__name_by_offset(btf, targ_var->name_off);
-		targ_type = skip_mods_and_typedefs(btf, targ_var->type, &targ_type_id);
+		local_type = btf__type_by_id(obj->btf, local_type_id);
+		local_name = btf__name_by_offset(obj->btf, local_type->name_off);
+		targ_name = btf__name_by_offset(btf, targ_type->name_off);
 
-		ret = bpf_core_types_are_compat(obj->btf, local_type_id,
-						btf, targ_type_id);
-		if (ret <= 0) {
-			const struct btf_type *local_type;
-			const char *targ_name, *local_name;
+		pr_warn("extern (var ksym) '%s': incompatible types, expected [%d] %s %s, but kernel has [%d] %s %s\n",
+			ext->name, local_type_id,
+			btf_kind_str(local_type), local_name, targ_type_id,
+			btf_kind_str(targ_type), targ_name);
+		return -EINVAL;
+	}
 
-			local_type = btf__type_by_id(obj->btf, local_type_id);
-			local_name = btf__name_by_offset(obj->btf, local_type->name_off);
-			targ_name = btf__name_by_offset(btf, targ_type->name_off);
+	ext->is_set = true;
+	ext->ksym.kernel_btf_obj_fd = btf_fd;
+	ext->ksym.kernel_btf_id = id;
+	pr_debug("extern (var ksym) '%s': resolved to [%d] %s %s\n",
+		 ext->name, id, btf_kind_str(targ_var), targ_var_name);
 
-			pr_warn("extern (ksym) '%s': incompatible types, expected [%d] %s %s, but kernel has [%d] %s %s\n",
-				ext->name, local_type_id,
-				btf_kind_str(local_type), local_name, targ_type_id,
-				btf_kind_str(targ_type), targ_name);
-			return -EINVAL;
-		}
+	return 0;
+}
+
+static int bpf_object__resolve_ksyms_btf_id(struct bpf_object *obj)
+{
+	struct extern_desc *ext;
+	int i, err;
+
+	for (i = 0; i < obj->nr_extern; i++) {
+		ext = &obj->externs[i];
+		if (ext->type != EXT_KSYM || !ext->ksym.type_id)
+			continue;
+
+		err = bpf_object__resolve_ksym_var_btf_id(obj, ext);
 
-		ext->is_set = true;
-		ext->ksym.kernel_btf_obj_fd = btf_fd;
-		ext->ksym.kernel_btf_id = id;
-		pr_debug("extern (ksym) '%s': resolved to [%d] %s %s\n",
-			 ext->name, id, btf_kind_str(targ_var), targ_var_name);
+		if (err)
+			return err;
 	}
 	return 0;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH bpf-next 09/15] libbpf: Refactor codes for finding btf id of a kernel symbol
  2021-03-16  1:13 [PATCH bpf-next 00/15] Support calling kernel function Martin KaFai Lau
                   ` (7 preceding siblings ...)
  2021-03-16  1:14 ` [PATCH bpf-next 08/15] libbpf: Refactor bpf_object__resolve_ksyms_btf_id Martin KaFai Lau
@ 2021-03-16  1:14 ` Martin KaFai Lau
  2021-03-19  3:14   ` Andrii Nakryiko
  2021-03-16  1:14 ` [PATCH bpf-next 10/15] libbpf: Rename RELO_EXTERN to RELO_EXTERN_VAR Martin KaFai Lau
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-16  1:14 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, netdev

This patch moves the code that finds a kernel btf_id by kind
and symbol name into a new function, find_ksym_btf_id().

It also adds a new helper __btf_kind_str() that returns
the name string for a numeric kind value.
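
The lookup order (base BTF first, then each module BTF) can be sketched in
user space as below.  This is only an illustration under simplifying
assumptions: struct sym_table, table_find() and find_sym_id() are
hypothetical, names are matched by a linear scan, and no libbpf API
is involved.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one BTF object (vmlinux or module). */
struct sym_table {
	int fd;             /* 0 for the base table, >0 for modules */
	const char **names; /* names[i] has id i + 1 */
	int cnt;
};

static int table_find(const struct sym_table *t, const char *name)
{
	for (int i = 0; i < t->cnt; i++)
		if (strcmp(t->names[i], name) == 0)
			return i + 1;
	return -ENOENT;
}

/*
 * Search the base table first, then each module table in order.
 * On success return the id and report the matching table's fd;
 * on failure return -ESRCH.
 */
static int find_sym_id(const struct sym_table *base,
		       const struct sym_table *mods, int nr_mods,
		       const char *name, int *res_fd)
{
	const struct sym_table *t = base;
	int id = table_find(t, name);

	assert(res_fd != NULL);
	for (int i = 0; id == -ENOENT && i < nr_mods; i++) {
		t = &mods[i];
		id = table_find(t, name);
	}
	if (id <= 0)
		return -ESRCH;

	*res_fd = t->fd;
	return id;
}

/* Made-up tables for illustration only. */
static const char *base_names[] = { "tcp_slow_start", "tcp_cong_avoid_ai" };
static const char *mod_names[] = { "bbr_init" };
static const struct sym_table demo_base = { 0, base_names, 2 };
static const struct sym_table demo_mods[] = { { 5, mod_names, 1 } };
```

The caller gets back both the id and the fd of the table it was found in,
which is the reason the refactored function has two output parameters.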

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 tools/lib/bpf/libbpf.c | 44 +++++++++++++++++++++++++++++++-----------
 1 file changed, 33 insertions(+), 11 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 7d5f9b7877bc..8355b786b3db 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1929,9 +1929,9 @@ resolve_func_ptr(const struct btf *btf, __u32 id, __u32 *res_id)
 	return btf_is_func_proto(t) ? t : NULL;
 }
 
-static const char *btf_kind_str(const struct btf_type *t)
+static const char *__btf_kind_str(__u16 kind)
 {
-	switch (btf_kind(t)) {
+	switch (kind) {
 	case BTF_KIND_UNKN: return "void";
 	case BTF_KIND_INT: return "int";
 	case BTF_KIND_PTR: return "ptr";
@@ -1953,6 +1953,11 @@ static const char *btf_kind_str(const struct btf_type *t)
 	}
 }
 
+static const char *btf_kind_str(const struct btf_type *t)
+{
+	return __btf_kind_str(btf_kind(t));
+}
+
 /*
  * Fetch integer attribute of BTF map definition. Such attributes are
  * represented using a pointer to an array, in which dimensionality of array
@@ -7403,18 +7408,17 @@ static int bpf_object__read_kallsyms_file(struct bpf_object *obj)
 	return err;
 }
 
-static int bpf_object__resolve_ksym_var_btf_id(struct bpf_object *obj,
-					       struct extern_desc *ext)
+static int find_ksym_btf_id(struct bpf_object *obj, const char *ksym_name,
+			    __u16 kind, struct btf **res_btf,
+			    int *res_btf_fd)
 {
-	const struct btf_type *targ_var, *targ_type;
-	__u32 targ_type_id, local_type_id;
-	const char *targ_var_name;
 	int i, id, btf_fd, err;
 	struct btf *btf;
 
 	btf = obj->btf_vmlinux;
 	btf_fd = 0;
-	id = btf__find_by_name_kind(btf, ext->name, BTF_KIND_VAR);
+	id = btf__find_by_name_kind(btf, ksym_name, kind);
+
 	if (id == -ENOENT) {
 		err = load_module_btfs(obj);
 		if (err)
@@ -7424,17 +7428,35 @@ static int bpf_object__resolve_ksym_var_btf_id(struct bpf_object *obj,
 			btf = obj->btf_modules[i].btf;
 			/* we assume module BTF FD is always >0 */
 			btf_fd = obj->btf_modules[i].fd;
-			id = btf__find_by_name_kind(btf, ext->name, BTF_KIND_VAR);
+			id = btf__find_by_name_kind(btf, ksym_name, kind);
 			if (id != -ENOENT)
 				break;
 		}
 	}
 	if (id <= 0) {
-		pr_warn("extern (var ksym) '%s': failed to find BTF ID in kernel BTF(s).\n",
-			ext->name);
+		pr_warn("extern (%s ksym) '%s': failed to find BTF ID in kernel BTF(s).\n",
+			__btf_kind_str(kind), ksym_name);
 		return -ESRCH;
 	}
 
+	*res_btf = btf;
+	*res_btf_fd = btf_fd;
+	return id;
+}
+
+static int bpf_object__resolve_ksym_var_btf_id(struct bpf_object *obj,
+					       struct extern_desc *ext)
+{
+	const struct btf_type *targ_var, *targ_type;
+	__u32 targ_type_id, local_type_id;
+	const char *targ_var_name;
+	int id, btf_fd = 0, err;
+	struct btf *btf = NULL;
+
+	id = find_ksym_btf_id(obj, ext->name, BTF_KIND_VAR, &btf, &btf_fd);
+	if (id < 0)
+		return id;
+
 	/* find local type_id */
 	local_type_id = ext->ksym.type_id;
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH bpf-next 10/15] libbpf: Rename RELO_EXTERN to RELO_EXTERN_VAR
  2021-03-16  1:13 [PATCH bpf-next 00/15] Support calling kernel function Martin KaFai Lau
                   ` (8 preceding siblings ...)
  2021-03-16  1:14 ` [PATCH bpf-next 09/15] libbpf: Refactor codes for finding btf id of a kernel symbol Martin KaFai Lau
@ 2021-03-16  1:14 ` Martin KaFai Lau
  2021-03-19  3:15   ` Andrii Nakryiko
  2021-03-16  1:14 ` [PATCH bpf-next 11/15] libbpf: Record extern sym relocation first Martin KaFai Lau
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-16  1:14 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, netdev

This patch renames RELO_EXTERN to RELO_EXTERN_VAR.
This avoids confusion with RELO_EXTERN_FUNC, which
a later patch adds.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 tools/lib/bpf/libbpf.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 8355b786b3db..8f924aece736 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -189,7 +189,7 @@ enum reloc_type {
 	RELO_LD64,
 	RELO_CALL,
 	RELO_DATA,
-	RELO_EXTERN,
+	RELO_EXTERN_VAR,
 	RELO_SUBPROG_ADDR,
 };
 
@@ -3463,7 +3463,7 @@ static int bpf_program__record_reloc(struct bpf_program *prog,
 		}
 		pr_debug("prog '%s': found extern #%d '%s' (sym %d) for insn #%u\n",
 			 prog->name, i, ext->name, ext->sym_idx, insn_idx);
-		reloc_desc->type = RELO_EXTERN;
+		reloc_desc->type = RELO_EXTERN_VAR;
 		reloc_desc->insn_idx = insn_idx;
 		reloc_desc->sym_off = i; /* sym_off stores extern index */
 		return 0;
@@ -6226,7 +6226,7 @@ bpf_object__relocate_data(struct bpf_object *obj, struct bpf_program *prog)
 			insn[0].imm = obj->maps[relo->map_idx].fd;
 			relo->processed = true;
 			break;
-		case RELO_EXTERN:
+		case RELO_EXTERN_VAR:
 			ext = &obj->externs[relo->sym_off];
 			if (ext->type == EXT_KCFG) {
 				insn[0].src_reg = BPF_PSEUDO_MAP_VALUE;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH bpf-next 11/15] libbpf: Record extern sym relocation first
  2021-03-16  1:13 [PATCH bpf-next 00/15] Support calling kernel function Martin KaFai Lau
                   ` (9 preceding siblings ...)
  2021-03-16  1:14 ` [PATCH bpf-next 10/15] libbpf: Rename RELO_EXTERN to RELO_EXTERN_VAR Martin KaFai Lau
@ 2021-03-16  1:14 ` Martin KaFai Lau
  2021-03-19  3:16   ` Andrii Nakryiko
  2021-03-16  1:14 ` [PATCH bpf-next 12/15] libbpf: Support extern kernel function Martin KaFai Lau
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-16  1:14 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, netdev

This patch records the extern sym relocs first, before recording
subprog relocs.  A later patch will add relocs for extern
kernel function calls, which also use BPF_JMP | BPF_CALL.
Handling the extern symbols first makes that later patch
easier.
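
The resulting order of checks can be sketched as follows.  This is a
simplified model, not the libbpf code: struct relo_info and
classify_relo() are hypothetical, and the real function looks at the
insn and the ELF symbol rather than at three booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened view of what one relocation refers to. */
struct relo_info {
	bool insn_is_call;    /* insn code is BPF_JMP | BPF_CALL */
	bool insn_is_ldimm64; /* insn is a 64-bit immediate load */
	bool sym_is_extern;   /* symbol is an extern symbol */
};

enum relo_class {
	RELO_CLASS_INVALID,
	RELO_CLASS_EXTERN,
	RELO_CLASS_SUBPROG_CALL,
	RELO_CLASS_DATA,
};

static enum relo_class classify_relo(const struct relo_info *r)
{
	assert(r != NULL);

	/* Only call and ldimm64 insns can carry a relocation. */
	if (!r->insn_is_call && !r->insn_is_ldimm64)
		return RELO_CLASS_INVALID;
	/* Extern symbols are checked first ... */
	if (r->sym_is_extern)
		return RELO_CLASS_EXTERN;
	/* ... so a call insn reaching this point is a subprog call. */
	if (r->insn_is_call)
		return RELO_CLASS_SUBPROG_CALL;
	return RELO_CLASS_DATA;
}
```

With the extern check ahead of the subprog check, a call insn against an
extern symbol is never mistaken for a bpf-to-bpf call.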

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 tools/lib/bpf/libbpf.c | 50 +++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 8f924aece736..0a60fcb2fba2 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -3416,31 +3416,7 @@ static int bpf_program__record_reloc(struct bpf_program *prog,
 
 	reloc_desc->processed = false;
 
-	/* sub-program call relocation */
-	if (insn->code == (BPF_JMP | BPF_CALL)) {
-		if (insn->src_reg != BPF_PSEUDO_CALL) {
-			pr_warn("prog '%s': incorrect bpf_call opcode\n", prog->name);
-			return -LIBBPF_ERRNO__RELOC;
-		}
-		/* text_shndx can be 0, if no default "main" program exists */
-		if (!shdr_idx || shdr_idx != obj->efile.text_shndx) {
-			sym_sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, shdr_idx));
-			pr_warn("prog '%s': bad call relo against '%s' in section '%s'\n",
-				prog->name, sym_name, sym_sec_name);
-			return -LIBBPF_ERRNO__RELOC;
-		}
-		if (sym->st_value % BPF_INSN_SZ) {
-			pr_warn("prog '%s': bad call relo against '%s' at offset %zu\n",
-				prog->name, sym_name, (size_t)sym->st_value);
-			return -LIBBPF_ERRNO__RELOC;
-		}
-		reloc_desc->type = RELO_CALL;
-		reloc_desc->insn_idx = insn_idx;
-		reloc_desc->sym_off = sym->st_value;
-		return 0;
-	}
-
-	if (!is_ldimm64(insn)) {
+	if (insn->code != (BPF_JMP | BPF_CALL) && !is_ldimm64(insn)) {
 		pr_warn("prog '%s': invalid relo against '%s' for insns[%d].code 0x%x\n",
 			prog->name, sym_name, insn_idx, insn->code);
 		return -LIBBPF_ERRNO__RELOC;
@@ -3469,6 +3445,30 @@ static int bpf_program__record_reloc(struct bpf_program *prog,
 		return 0;
 	}
 
+	/* sub-program call relocation */
+	if (insn->code == (BPF_JMP | BPF_CALL)) {
+		if (insn->src_reg != BPF_PSEUDO_CALL) {
+			pr_warn("prog '%s': incorrect bpf_call opcode\n", prog->name);
+			return -LIBBPF_ERRNO__RELOC;
+		}
+		/* text_shndx can be 0, if no default "main" program exists */
+		if (!shdr_idx || shdr_idx != obj->efile.text_shndx) {
+			sym_sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, shdr_idx));
+			pr_warn("prog '%s': bad call relo against '%s' in section '%s'\n",
+				prog->name, sym_name, sym_sec_name);
+			return -LIBBPF_ERRNO__RELOC;
+		}
+		if (sym->st_value % BPF_INSN_SZ) {
+			pr_warn("prog '%s': bad call relo against '%s' at offset %zu\n",
+				prog->name, sym_name, (size_t)sym->st_value);
+			return -LIBBPF_ERRNO__RELOC;
+		}
+		reloc_desc->type = RELO_CALL;
+		reloc_desc->insn_idx = insn_idx;
+		reloc_desc->sym_off = sym->st_value;
+		return 0;
+	}
+
 	if (!shdr_idx || shdr_idx >= SHN_LORESERVE) {
 		pr_warn("prog '%s': invalid relo against '%s' in special section 0x%x; forgot to initialize global var?..\n",
 			prog->name, sym_name, shdr_idx);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH bpf-next 12/15] libbpf: Support extern kernel function
  2021-03-16  1:13 [PATCH bpf-next 00/15] Support calling kernel function Martin KaFai Lau
                   ` (10 preceding siblings ...)
  2021-03-16  1:14 ` [PATCH bpf-next 11/15] libbpf: Record extern sym relocation first Martin KaFai Lau
@ 2021-03-16  1:14 ` Martin KaFai Lau
  2021-03-19  4:11   ` Andrii Nakryiko
  2021-03-16  1:14 ` [PATCH bpf-next 13/15] bpf: selftests: Rename bictcp to bpf_cubic Martin KaFai Lau
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-16  1:14 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, netdev

This patch makes libbpf handle the following extern kernel
function declaration and do the needed relocations before
loading the bpf program into the kernel.

extern int foo(struct sock *) __attribute__((section(".ksyms")));

In the collect extern phase, bpf_object__collect_externs() and
find_extern_btf_id() are changed to also collect functions.

In the collect relo phase, the kernel function call is recorded
as RELO_EXTERN_FUNC.
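
As a rough illustration of the ordering in the collect relo phase,
the sketch below models how a relocation is classified once extern
symbols are checked before sub-program calls.  It is not libbpf code:
the model_* names and the three boolean inputs are hypothetical
simplifications of what bpf_program__record_reloc() derives from the
instruction and the ELF symbol.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for libbpf's reloc_type values. */
enum model_reloc {
	MODEL_RELO_INVALID,
	MODEL_RELO_EXTERN_VAR,
	MODEL_RELO_EXTERN_FUNC,
	MODEL_RELO_CALL,
	MODEL_RELO_OTHER,	/* map/data relocations, not modelled here */
};

/*
 * Reject unsupported opcodes, then handle extern symbols, and only
 * then fall back to the sub-program call case.
 */
static enum model_reloc model_classify(bool is_call, bool is_ldimm64,
				       bool sym_is_extern)
{
	if (!is_call && !is_ldimm64)
		return MODEL_RELO_INVALID;

	if (sym_is_extern)
		return is_call ? MODEL_RELO_EXTERN_FUNC : MODEL_RELO_EXTERN_VAR;

	if (is_call)
		return MODEL_RELO_CALL;

	return MODEL_RELO_OTHER;
}
```

With this ordering, a BPF_CALL against an extern symbol never reaches
the sub-program checks, which would otherwise reject it.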

bpf_object__resolve_ksym_func_btf_id() is added to look up the
func's btf_id in the BTF of the running kernel.

During actual relocation, it will patch the BPF_CALL instruction with
src_reg = BPF_PSEUDO_KFUNC_CALL and insn->imm set to the running
kernel func's btf_id.
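
The patching step can be sketched in userspace as below.  This is
only a model under stated assumptions: struct model_insn is a local
stand-in laid out like struct bpf_insn, and MODEL_PSEUDO_KFUNC_CALL
is assumed to equal the BPF_PSEUDO_KFUNC_CALL value (2) that the
RELO_EXTERN_FUNC case in the diff below writes into src_reg.

```c
#include <stdint.h>

/* Local stand-in for struct bpf_insn (assumed same field layout). */
struct model_insn {
	uint8_t  code;
	uint8_t  dst_reg:4;
	uint8_t  src_reg:4;
	int16_t  off;
	int32_t  imm;
};

/* Assumed to match BPF_PSEUDO_KFUNC_CALL in this series. */
#define MODEL_PSEUDO_KFUNC_CALL 2

/* Rewrite a BPF_CALL so that imm carries the kernel func's btf_id. */
static void model_patch_kfunc_call(struct model_insn *insn,
				   int32_t kernel_btf_id)
{
	insn->src_reg = MODEL_PSEUDO_KFUNC_CALL;
	insn->imm = kernel_btf_id;
}
```

Only src_reg and imm change; the opcode stays BPF_JMP | BPF_CALL.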

btf_fixup_datasec() is also changed because a datasec may contain
only funcs, in which case its size is 0.  The "!size" test
is postponed until it is confirmed that the datasec has vars.
This patch also removes the
"if (... || (t->size && t->size != size)) { return -ENOENT; }" test
because t->size is always zero at that point.
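
A minimal model of the postponed size check, assuming a flat array
of member kinds in place of real BTF types (the model_* names are
hypothetical and not part of libbpf):

```c
#include <errno.h>

/* Hypothetical member kinds; real code inspects BTF types instead. */
enum model_kind { MODEL_VAR, MODEL_FUNC, MODEL_OTHER };

/*
 * Walk the members first, and only insist on a non-zero ELF section
 * size once at least one var has been seen.
 */
static int model_fixup_datasec_size(const enum model_kind *members, int n,
				    unsigned int sec_size,
				    unsigned int *t_size)
{
	int i, nr_vars = 0;

	if (*t_size)		/* size already set during extern collection */
		return 0;

	for (i = 0; i < n; i++) {
		if (members[i] == MODEL_FUNC)
			continue;
		if (members[i] != MODEL_VAR)
			return -EINVAL;
		nr_vars++;
	}

	if (!nr_vars)		/* func-only datasec: size 0 is fine */
		return 0;

	if (!sec_size)
		return -ENOENT;

	*t_size = sec_size;
	return 0;
}
```

A func-only datasec therefore returns 0 with its size left at 0,
while a datasec with vars still fails on a zero-sized section.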

The required LLVM patch: https://reviews.llvm.org/D93563

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 tools/lib/bpf/btf.c    |  32 ++++++++----
 tools/lib/bpf/btf.h    |   5 ++
 tools/lib/bpf/libbpf.c | 113 +++++++++++++++++++++++++++++++++++++----
 3 files changed, 129 insertions(+), 21 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 3aa58f2ac183..bb09b577c154 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -1108,7 +1108,7 @@ static int btf_fixup_datasec(struct bpf_object *obj, struct btf *btf,
 	const struct btf_type *t_var;
 	struct btf_var_secinfo *vsi;
 	const struct btf_var *var;
-	int ret;
+	int ret, nr_vars = 0;
 
 	if (!name) {
 		pr_debug("No name found in string section for DATASEC kind.\n");
@@ -1117,27 +1117,27 @@ static int btf_fixup_datasec(struct bpf_object *obj, struct btf *btf,
 
 	/* .extern datasec size and var offsets were set correctly during
 	 * extern collection step, so just skip straight to sorting variables
+	 * One exception is the datasec may only have extern funcs,
+	 * t->size is 0 in this case.  This will be handled
+	 * with !nr_vars later.
 	 */
 	if (t->size)
 		goto sort_vars;
 
-	ret = bpf_object__section_size(obj, name, &size);
-	if (ret || !size || (t->size && t->size != size)) {
-		pr_debug("Invalid size for section %s: %u bytes\n", name, size);
-		return -ENOENT;
-	}
-
-	t->size = size;
+	bpf_object__section_size(obj, name, &size);
 
 	for (i = 0, vsi = btf_var_secinfos(t); i < vars; i++, vsi++) {
 		t_var = btf__type_by_id(btf, vsi->type);
-		var = btf_var(t_var);
 
-		if (!btf_is_var(t_var)) {
-			pr_debug("Non-VAR type seen in section %s\n", name);
+		if (btf_is_func(t_var)) {
+			continue;
+		} else if (!btf_is_var(t_var)) {
+			pr_debug("Non-VAR and Non-FUNC type seen in section %s\n", name);
 			return -EINVAL;
 		}
 
+		nr_vars++;
+		var = btf_var(t_var);
 		if (var->linkage == BTF_VAR_STATIC)
 			continue;
 
@@ -1157,6 +1157,16 @@ static int btf_fixup_datasec(struct bpf_object *obj, struct btf *btf,
 		vsi->offset = off;
 	}
 
+	if (!nr_vars)
+		return 0;
+
+	if (!size) {
+		pr_debug("Invalid size for section %s: %u bytes\n", name, size);
+		return -ENOENT;
+	}
+
+	t->size = size;
+
 sort_vars:
 	qsort(btf_var_secinfos(t), vars, sizeof(*vsi), compare_vsi_off);
 	return 0;
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 029a9cfc8c2d..07d508b70497 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -368,6 +368,11 @@ btf_var_secinfos(const struct btf_type *t)
 	return (struct btf_var_secinfo *)(t + 1);
 }
 
+static inline enum btf_func_linkage btf_func_linkage(const struct btf_type *t)
+{
+	return (enum btf_func_linkage)BTF_INFO_VLEN(t->info);
+}
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 0a60fcb2fba2..49bda179bd93 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -190,6 +190,7 @@ enum reloc_type {
 	RELO_CALL,
 	RELO_DATA,
 	RELO_EXTERN_VAR,
+	RELO_EXTERN_FUNC,
 	RELO_SUBPROG_ADDR,
 };
 
@@ -384,6 +385,7 @@ struct extern_desc {
 	int btf_id;
 	int sec_btf_id;
 	const char *name;
+	const struct btf_type *btf_type;
 	bool is_set;
 	bool is_weak;
 	union {
@@ -3022,7 +3024,7 @@ static bool sym_is_subprog(const GElf_Sym *sym, int text_shndx)
 static int find_extern_btf_id(const struct btf *btf, const char *ext_name)
 {
 	const struct btf_type *t;
-	const char *var_name;
+	const char *tname;
 	int i, n;
 
 	if (!btf)
@@ -3032,14 +3034,18 @@ static int find_extern_btf_id(const struct btf *btf, const char *ext_name)
 	for (i = 1; i <= n; i++) {
 		t = btf__type_by_id(btf, i);
 
-		if (!btf_is_var(t))
+		if (!btf_is_var(t) && !btf_is_func(t))
 			continue;
 
-		var_name = btf__name_by_offset(btf, t->name_off);
-		if (strcmp(var_name, ext_name))
+		tname = btf__name_by_offset(btf, t->name_off);
+		if (strcmp(tname, ext_name))
 			continue;
 
-		if (btf_var(t)->linkage != BTF_VAR_GLOBAL_EXTERN)
+		if (btf_is_var(t) &&
+		    btf_var(t)->linkage != BTF_VAR_GLOBAL_EXTERN)
+			return -EINVAL;
+
+		if (btf_is_func(t) && btf_func_linkage(t) != BTF_FUNC_EXTERN)
 			return -EINVAL;
 
 		return i;
@@ -3199,10 +3205,10 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
 			return ext->btf_id;
 		}
 		t = btf__type_by_id(obj->btf, ext->btf_id);
+		ext->btf_type = t;
 		ext->name = btf__name_by_offset(obj->btf, t->name_off);
 		ext->sym_idx = i;
 		ext->is_weak = GELF_ST_BIND(sym.st_info) == STB_WEAK;
-
 		ext->sec_btf_id = find_extern_sec_btf_id(obj->btf, ext->btf_id);
 		if (ext->sec_btf_id <= 0) {
 			pr_warn("failed to find BTF for extern '%s' [%d] section: %d\n",
@@ -3212,6 +3218,34 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
 		sec = (void *)btf__type_by_id(obj->btf, ext->sec_btf_id);
 		sec_name = btf__name_by_offset(obj->btf, sec->name_off);
 
+		if (btf_is_func(t)) {
+			const struct btf_type *func_proto;
+
+			func_proto = btf__type_by_id(obj->btf, t->type);
+			if (!func_proto || !btf_is_func_proto(func_proto)) {
+				pr_warn("extern function %s does not have a valid func_proto\n",
+					ext->name);
+				return -EINVAL;
+			}
+
+			if (ext->is_weak) {
+				pr_warn("extern weak function %s is unsupported\n",
+					ext->name);
+				return -ENOTSUP;
+			}
+
+			if (strcmp(sec_name, KSYMS_SEC)) {
+				pr_warn("extern function %s is only supported under %s section\n",
+					ext->name, KSYMS_SEC);
+				return -ENOTSUP;
+			}
+
+			ksym_sec = sec;
+			ext->type = EXT_KSYM;
+			ext->ksym.type_id = ext->btf_id;
+			continue;
+		}
+
 		if (strcmp(sec_name, KCONFIG_SEC) == 0) {
 			kcfg_sec = sec;
 			ext->type = EXT_KCFG;
@@ -3271,11 +3305,13 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
 
 		sec = ksym_sec;
 		n = btf_vlen(sec);
-		for (i = 0, off = 0; i < n; i++, off += sizeof(int)) {
+		for (i = 0, off = 0; i < n; i++) {
 			struct btf_var_secinfo *vs = btf_var_secinfos(sec) + i;
 			struct btf_type *vt;
 
 			vt = (void *)btf__type_by_id(obj->btf, vs->type);
+			if (!btf_is_var(vt))
+				continue;
 			ext_name = btf__name_by_offset(obj->btf, vt->name_off);
 			ext = find_extern_by_name(obj, ext_name);
 			if (!ext) {
@@ -3287,6 +3323,7 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
 			vt->type = int_btf_id;
 			vs->offset = off;
 			vs->size = sizeof(int);
+			off += sizeof(int);
 		}
 		sec->size = off;
 	}
@@ -3439,7 +3476,10 @@ static int bpf_program__record_reloc(struct bpf_program *prog,
 		}
 		pr_debug("prog '%s': found extern #%d '%s' (sym %d) for insn #%u\n",
 			 prog->name, i, ext->name, ext->sym_idx, insn_idx);
-		reloc_desc->type = RELO_EXTERN_VAR;
+		if (insn->code == (BPF_JMP | BPF_CALL))
+			reloc_desc->type = RELO_EXTERN_FUNC;
+		else
+			reloc_desc->type = RELO_EXTERN_VAR;
 		reloc_desc->insn_idx = insn_idx;
 		reloc_desc->sym_off = i; /* sym_off stores extern index */
 		return 0;
@@ -6244,6 +6284,12 @@ bpf_object__relocate_data(struct bpf_object *obj, struct bpf_program *prog)
 			}
 			relo->processed = true;
 			break;
+		case RELO_EXTERN_FUNC:
+			ext = &obj->externs[relo->sym_off];
+			insn[0].src_reg = BPF_PSEUDO_KFUNC_CALL;
+			insn[0].imm = ext->ksym.kernel_btf_id;
+			relo->processed = true;
+			break;
 		case RELO_SUBPROG_ADDR:
 			insn[0].src_reg = BPF_PSEUDO_FUNC;
 			/* will be handled as a follow up pass */
@@ -7387,7 +7433,7 @@ static int bpf_object__read_kallsyms_file(struct bpf_object *obj)
 		}
 
 		ext = find_extern_by_name(obj, sym_name);
-		if (!ext || ext->type != EXT_KSYM)
+		if (!ext || ext->type != EXT_KSYM || !btf_is_var(ext->btf_type))
 			continue;
 
 		if (ext->is_set && ext->ksym.addr != sym_addr) {
@@ -7491,6 +7537,50 @@ static int bpf_object__resolve_ksym_var_btf_id(struct bpf_object *obj,
 	return 0;
 }
 
+static int bpf_object__resolve_ksym_func_btf_id(struct bpf_object *obj,
+						struct extern_desc *ext)
+{
+	int local_func_proto_id, kern_func_proto_id, kern_func_id;
+	const struct btf_type *kern_func;
+	struct btf *kern_btf = NULL;
+	int ret, kern_btf_fd = 0;
+
+	local_func_proto_id = ext->btf_type->type;
+
+	kern_func_id = find_ksym_btf_id(obj, ext->name, BTF_KIND_FUNC,
+					&kern_btf, &kern_btf_fd);
+	if (kern_func_id < 0) {
+		pr_warn("extern (func ksym) '%s': not found in kernel BTF\n",
+			ext->name);
+		return kern_func_id;
+	}
+
+	if (kern_btf != obj->btf_vmlinux) {
+		pr_warn("extern (func ksym) '%s': function in kernel module is not supported\n",
+			ext->name);
+		return -ENOTSUP;
+	}
+
+	kern_func = btf__type_by_id(kern_btf, kern_func_id);
+	kern_func_proto_id = kern_func->type;
+
+	ret = bpf_core_types_are_compat(obj->btf, local_func_proto_id,
+					kern_btf, kern_func_proto_id);
+	if (ret <= 0) {
+		pr_warn("extern (func ksym) '%s': func_proto [%d] incompatible with kernel [%d]\n",
+			ext->name, local_func_proto_id, kern_func_proto_id);
+		return -EINVAL;
+	}
+
+	ext->is_set = true;
+	ext->ksym.kernel_btf_obj_fd = kern_btf_fd;
+	ext->ksym.kernel_btf_id = kern_func_id;
+	pr_debug("extern (func ksym) '%s': resolved to kernel [%d]\n",
+		 ext->name, kern_func_id);
+
+	return 0;
+}
+
 static int bpf_object__resolve_ksyms_btf_id(struct bpf_object *obj)
 {
 	struct extern_desc *ext;
@@ -7501,7 +7591,10 @@ static int bpf_object__resolve_ksyms_btf_id(struct bpf_object *obj)
 		if (ext->type != EXT_KSYM || !ext->ksym.type_id)
 			continue;
 
-		err = bpf_object__resolve_ksym_var_btf_id(obj, ext);
+		if (btf_is_var(ext->btf_type))
+			err = bpf_object__resolve_ksym_var_btf_id(obj, ext);
+		else
+			err = bpf_object__resolve_ksym_func_btf_id(obj, ext);
 
 		if (err)
 			return err;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH bpf-next 13/15] bpf: selftests: Rename bictcp to bpf_cubic
  2021-03-16  1:13 [PATCH bpf-next 00/15] Support calling kernel function Martin KaFai Lau
                   ` (11 preceding siblings ...)
  2021-03-16  1:14 ` [PATCH bpf-next 12/15] libbpf: Support extern kernel function Martin KaFai Lau
@ 2021-03-16  1:14 ` Martin KaFai Lau
  2021-03-19  4:14   ` Andrii Nakryiko
  2021-03-16  1:15 ` [PATCH bpf-next 14/15] bpf: selftest: bpf_cubic and bpf_dctcp calling kernel functions Martin KaFai Lau
  2021-03-16  1:15 ` [PATCH bpf-next 15/15] bpf: selftest: Add kfunc_call test Martin KaFai Lau
  14 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-16  1:14 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, netdev

As with a similar change in the kernel, this patch gives the
bpf cubic functions a proper name.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 tools/testing/selftests/bpf/progs/bpf_cubic.c | 30 +++++++++----------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/bpf_cubic.c b/tools/testing/selftests/bpf/progs/bpf_cubic.c
index 6939bfd8690f..33c4d2bded64 100644
--- a/tools/testing/selftests/bpf/progs/bpf_cubic.c
+++ b/tools/testing/selftests/bpf/progs/bpf_cubic.c
@@ -174,8 +174,8 @@ static __always_inline void bictcp_hystart_reset(struct sock *sk)
  * as long as it is used in one of the func ptr
  * under SEC(".struct_ops").
  */
-SEC("struct_ops/bictcp_init")
-void BPF_PROG(bictcp_init, struct sock *sk)
+SEC("struct_ops/bpf_cubic_init")
+void BPF_PROG(bpf_cubic_init, struct sock *sk)
 {
 	struct bictcp *ca = inet_csk_ca(sk);
 
@@ -192,7 +192,7 @@ void BPF_PROG(bictcp_init, struct sock *sk)
  * The remaining tcp-cubic functions have an easier way.
  */
 SEC("no-sec-prefix-bictcp_cwnd_event")
-void BPF_PROG(bictcp_cwnd_event, struct sock *sk, enum tcp_ca_event event)
+void BPF_PROG(bpf_cubic_cwnd_event, struct sock *sk, enum tcp_ca_event event)
 {
 	if (event == CA_EVENT_TX_START) {
 		struct bictcp *ca = inet_csk_ca(sk);
@@ -384,7 +384,7 @@ static __always_inline void bictcp_update(struct bictcp *ca, __u32 cwnd,
 }
 
 /* Or simply use the BPF_STRUCT_OPS to avoid the SEC boiler plate. */
-void BPF_STRUCT_OPS(bictcp_cong_avoid, struct sock *sk, __u32 ack, __u32 acked)
+void BPF_STRUCT_OPS(bpf_cubic_cong_avoid, struct sock *sk, __u32 ack, __u32 acked)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct bictcp *ca = inet_csk_ca(sk);
@@ -403,7 +403,7 @@ void BPF_STRUCT_OPS(bictcp_cong_avoid, struct sock *sk, __u32 ack, __u32 acked)
 	tcp_cong_avoid_ai(tp, ca->cnt, acked);
 }
 
-__u32 BPF_STRUCT_OPS(bictcp_recalc_ssthresh, struct sock *sk)
+__u32 BPF_STRUCT_OPS(bpf_cubic_recalc_ssthresh, struct sock *sk)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 	struct bictcp *ca = inet_csk_ca(sk);
@@ -420,7 +420,7 @@ __u32 BPF_STRUCT_OPS(bictcp_recalc_ssthresh, struct sock *sk)
 	return max((tp->snd_cwnd * beta) / BICTCP_BETA_SCALE, 2U);
 }
 
-void BPF_STRUCT_OPS(bictcp_state, struct sock *sk, __u8 new_state)
+void BPF_STRUCT_OPS(bpf_cubic_state, struct sock *sk, __u8 new_state)
 {
 	if (new_state == TCP_CA_Loss) {
 		bictcp_reset(inet_csk_ca(sk));
@@ -496,7 +496,7 @@ static __always_inline void hystart_update(struct sock *sk, __u32 delay)
 	}
 }
 
-void BPF_STRUCT_OPS(bictcp_acked, struct sock *sk,
+void BPF_STRUCT_OPS(bpf_cubic_acked, struct sock *sk,
 		    const struct ack_sample *sample)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
@@ -525,7 +525,7 @@ void BPF_STRUCT_OPS(bictcp_acked, struct sock *sk,
 		hystart_update(sk, delay);
 }
 
-__u32 BPF_STRUCT_OPS(tcp_reno_undo_cwnd, struct sock *sk)
+__u32 BPF_STRUCT_OPS(bpf_cubic_undo_cwnd, struct sock *sk)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 
@@ -534,12 +534,12 @@ __u32 BPF_STRUCT_OPS(tcp_reno_undo_cwnd, struct sock *sk)
 
 SEC(".struct_ops")
 struct tcp_congestion_ops cubic = {
-	.init		= (void *)bictcp_init,
-	.ssthresh	= (void *)bictcp_recalc_ssthresh,
-	.cong_avoid	= (void *)bictcp_cong_avoid,
-	.set_state	= (void *)bictcp_state,
-	.undo_cwnd	= (void *)tcp_reno_undo_cwnd,
-	.cwnd_event	= (void *)bictcp_cwnd_event,
-	.pkts_acked     = (void *)bictcp_acked,
+	.init		= (void *)bpf_cubic_init,
+	.ssthresh	= (void *)bpf_cubic_recalc_ssthresh,
+	.cong_avoid	= (void *)bpf_cubic_cong_avoid,
+	.set_state	= (void *)bpf_cubic_state,
+	.undo_cwnd	= (void *)bpf_cubic_undo_cwnd,
+	.cwnd_event	= (void *)bpf_cubic_cwnd_event,
+	.pkts_acked     = (void *)bpf_cubic_acked,
 	.name		= "bpf_cubic",
 };
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH bpf-next 14/15] bpf: selftest: bpf_cubic and bpf_dctcp calling kernel functions
  2021-03-16  1:13 [PATCH bpf-next 00/15] Support calling kernel function Martin KaFai Lau
                   ` (12 preceding siblings ...)
  2021-03-16  1:14 ` [PATCH bpf-next 13/15] bpf: selftests: Rename bictcp to bpf_cubic Martin KaFai Lau
@ 2021-03-16  1:15 ` Martin KaFai Lau
  2021-03-19  4:15   ` Andrii Nakryiko
  2021-03-16  1:15 ` [PATCH bpf-next 15/15] bpf: selftest: Add kfunc_call test Martin KaFai Lau
  14 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-16  1:15 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, netdev

This patch removes the bpf implementation of tcp_slow_start()
and tcp_cong_avoid_ai().  Instead, it directly uses the kernel
implementation.

It also replaces the bpf_cubic_undo_cwnd implementation with a direct
call to tcp_reno_undo_cwnd().  Likewise, bpf_dctcp now directly calls
tcp_reno_cong_avoid() instead of reimplementing it.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 tools/testing/selftests/bpf/bpf_tcp_helpers.h | 29 ++-----------------
 tools/testing/selftests/bpf/progs/bpf_cubic.c |  6 ++--
 tools/testing/selftests/bpf/progs/bpf_dctcp.c | 22 ++++----------
 3 files changed, 11 insertions(+), 46 deletions(-)

diff --git a/tools/testing/selftests/bpf/bpf_tcp_helpers.h b/tools/testing/selftests/bpf/bpf_tcp_helpers.h
index 91f0fac632f4..029589c008c9 100644
--- a/tools/testing/selftests/bpf/bpf_tcp_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_tcp_helpers.h
@@ -187,16 +187,6 @@ struct tcp_congestion_ops {
 	typeof(y) __y = (y);			\
 	__x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); })
 
-static __always_inline __u32 tcp_slow_start(struct tcp_sock *tp, __u32 acked)
-{
-	__u32 cwnd = min(tp->snd_cwnd + acked, tp->snd_ssthresh);
-
-	acked -= cwnd - tp->snd_cwnd;
-	tp->snd_cwnd = min(cwnd, tp->snd_cwnd_clamp);
-
-	return acked;
-}
-
 static __always_inline bool tcp_in_slow_start(const struct tcp_sock *tp)
 {
 	return tp->snd_cwnd < tp->snd_ssthresh;
@@ -213,22 +203,7 @@ static __always_inline bool tcp_is_cwnd_limited(const struct sock *sk)
 	return !!BPF_CORE_READ_BITFIELD(tp, is_cwnd_limited);
 }
 
-static __always_inline void tcp_cong_avoid_ai(struct tcp_sock *tp, __u32 w, __u32 acked)
-{
-	/* If credits accumulated at a higher w, apply them gently now. */
-	if (tp->snd_cwnd_cnt >= w) {
-		tp->snd_cwnd_cnt = 0;
-		tp->snd_cwnd++;
-	}
-
-	tp->snd_cwnd_cnt += acked;
-	if (tp->snd_cwnd_cnt >= w) {
-		__u32 delta = tp->snd_cwnd_cnt / w;
-
-		tp->snd_cwnd_cnt -= delta * w;
-		tp->snd_cwnd += delta;
-	}
-	tp->snd_cwnd = min(tp->snd_cwnd, tp->snd_cwnd_clamp);
-}
+extern __u32 tcp_slow_start(struct tcp_sock *tp, __u32 acked) __ksym;
+extern void tcp_cong_avoid_ai(struct tcp_sock *tp, __u32 w, __u32 acked) __ksym;
 
 #endif
diff --git a/tools/testing/selftests/bpf/progs/bpf_cubic.c b/tools/testing/selftests/bpf/progs/bpf_cubic.c
index 33c4d2bded64..f62df4d023f9 100644
--- a/tools/testing/selftests/bpf/progs/bpf_cubic.c
+++ b/tools/testing/selftests/bpf/progs/bpf_cubic.c
@@ -525,11 +525,11 @@ void BPF_STRUCT_OPS(bpf_cubic_acked, struct sock *sk,
 		hystart_update(sk, delay);
 }
 
+extern __u32 tcp_reno_undo_cwnd(struct sock *sk) __ksym;
+
 __u32 BPF_STRUCT_OPS(bpf_cubic_undo_cwnd, struct sock *sk)
 {
-	const struct tcp_sock *tp = tcp_sk(sk);
-
-	return max(tp->snd_cwnd, tp->prior_cwnd);
+	return tcp_reno_undo_cwnd(sk);
 }
 
 SEC(".struct_ops")
diff --git a/tools/testing/selftests/bpf/progs/bpf_dctcp.c b/tools/testing/selftests/bpf/progs/bpf_dctcp.c
index 4dc1a967776a..fd42247da8b4 100644
--- a/tools/testing/selftests/bpf/progs/bpf_dctcp.c
+++ b/tools/testing/selftests/bpf/progs/bpf_dctcp.c
@@ -194,22 +194,12 @@ __u32 BPF_PROG(dctcp_cwnd_undo, struct sock *sk)
 	return max(tcp_sk(sk)->snd_cwnd, ca->loss_cwnd);
 }
 
-SEC("struct_ops/tcp_reno_cong_avoid")
-void BPF_PROG(tcp_reno_cong_avoid, struct sock *sk, __u32 ack, __u32 acked)
-{
-	struct tcp_sock *tp = tcp_sk(sk);
-
-	if (!tcp_is_cwnd_limited(sk))
-		return;
+extern void tcp_reno_cong_avoid(struct sock *sk, __u32 ack, __u32 acked) __ksym;
 
-	/* In "safe" area, increase. */
-	if (tcp_in_slow_start(tp)) {
-		acked = tcp_slow_start(tp, acked);
-		if (!acked)
-			return;
-	}
-	/* In dangerous area, increase slowly. */
-	tcp_cong_avoid_ai(tp, tp->snd_cwnd, acked);
+SEC("struct_ops/dctcp_reno_cong_avoid")
+void BPF_PROG(dctcp_cong_avoid, struct sock *sk, __u32 ack, __u32 acked)
+{
+	tcp_reno_cong_avoid(sk, ack, acked);
 }
 
 SEC(".struct_ops")
@@ -226,7 +216,7 @@ struct tcp_congestion_ops dctcp = {
 	.in_ack_event   = (void *)dctcp_update_alpha,
 	.cwnd_event	= (void *)dctcp_cwnd_event,
 	.ssthresh	= (void *)dctcp_ssthresh,
-	.cong_avoid	= (void *)tcp_reno_cong_avoid,
+	.cong_avoid	= (void *)dctcp_cong_avoid,
 	.undo_cwnd	= (void *)dctcp_cwnd_undo,
 	.set_state	= (void *)dctcp_state,
 	.flags		= TCP_CONG_NEEDS_ECN,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH bpf-next 15/15] bpf: selftest: Add kfunc_call test
  2021-03-16  1:13 [PATCH bpf-next 00/15] Support calling kernel function Martin KaFai Lau
                   ` (13 preceding siblings ...)
  2021-03-16  1:15 ` [PATCH bpf-next 14/15] bpf: selftest: bpf_cubic and bpf_dctcp calling kernel functions Martin KaFai Lau
@ 2021-03-16  1:15 ` Martin KaFai Lau
  2021-03-16  3:39   ` kernel test robot
  2021-03-19  4:21   ` Andrii Nakryiko
  14 siblings, 2 replies; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-16  1:15 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, netdev

This patch adds two kernel functions, bpf_kfunc_call_test[12](), for
the selftest's test_run purpose.  They are allowed for tc_cls progs.

This patch also adds the selftest that calls the kernel functions
bpf_kfunc_call_test[12]().
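
The retval values that the selftest expects (12 for the main test,
10 for the subprog test) can be checked with a small userspace model
of the arithmetic.  The model_* functions are hypothetical copies for
illustration only; the sk argument is dropped because the kernel
functions do not use it.

```c
#include <stdint.h>

/* Userspace copy of the arithmetic in bpf_kfunc_call_test1(). */
static uint64_t model_kfunc_call_test1(uint32_t a, uint64_t b,
				       uint32_t c, uint64_t d)
{
	return a + b + c + d;
}

/* Mirrors the computation in progs/kfunc_call_test.c. */
static uint32_t model_test1_retval(void)
{
	uint64_t a = 1ULL << 32;
	uint32_t ret;

	a = model_kfunc_call_test1(1, a | 2, 3, a | 4);	/* 2 * 2^32 + 10 */
	ret = a >> 32;		/* upper 32 bits: 2 */
	ret += (uint32_t)a;	/* lower 32 bits: 10, total 12 */
	return ret;
}
```

Passing values with bit 32 set checks that the 64-bit arguments and
the 64-bit return value are not truncated across the call.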

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 net/bpf/test_run.c                            | 11 ++++
 net/core/filter.c                             | 11 ++++
 .../selftests/bpf/prog_tests/kfunc_call.c     | 61 +++++++++++++++++++
 .../selftests/bpf/progs/kfunc_call_test.c     | 48 +++++++++++++++
 .../bpf/progs/kfunc_call_test_subprog.c       | 31 ++++++++++
 5 files changed, 162 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/kfunc_call.c
 create mode 100644 tools/testing/selftests/bpf/progs/kfunc_call_test.c
 create mode 100644 tools/testing/selftests/bpf/progs/kfunc_call_test_subprog.c

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 0abdd67f44b1..c1baab0c7d96 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -209,6 +209,17 @@ int noinline bpf_modify_return_test(int a, int *b)
 	*b += 1;
 	return a + *b;
 }
+
+u64 noinline bpf_kfunc_call_test1(struct sock *sk, u32 a, u64 b, u32 c, u64 d)
+{
+	return a + b + c + d;
+}
+
+int noinline bpf_kfunc_call_test2(struct sock *sk, u32 a, u32 b)
+{
+	return a + b;
+}
+
 __diag_pop();
 
 ALLOW_ERROR_INJECTION(bpf_modify_return_test, ERRNO);
diff --git a/net/core/filter.c b/net/core/filter.c
index 10dac9dd5086..605fbbdd694b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -9799,12 +9799,23 @@ const struct bpf_prog_ops sk_filter_prog_ops = {
 	.test_run		= bpf_prog_test_run_skb,
 };
 
+BTF_SET_START(bpf_tc_cls_kfunc_ids)
+BTF_ID(func, bpf_kfunc_call_test1)
+BTF_ID(func, bpf_kfunc_call_test2)
+BTF_SET_END(bpf_tc_cls_kfunc_ids)
+
+static bool tc_cls_check_kern_func_call(u32 kfunc_id)
+{
+	return btf_id_set_contains(&bpf_tc_cls_kfunc_ids, kfunc_id);
+}
+
 const struct bpf_verifier_ops tc_cls_act_verifier_ops = {
 	.get_func_proto		= tc_cls_act_func_proto,
 	.is_valid_access	= tc_cls_act_is_valid_access,
 	.convert_ctx_access	= tc_cls_act_convert_ctx_access,
 	.gen_prologue		= tc_cls_act_prologue,
 	.gen_ld_abs		= bpf_gen_ld_abs,
+	.check_kern_func_call	= tc_cls_check_kern_func_call,
 };
 
 const struct bpf_prog_ops tc_cls_act_prog_ops = {
diff --git a/tools/testing/selftests/bpf/prog_tests/kfunc_call.c b/tools/testing/selftests/bpf/prog_tests/kfunc_call.c
new file mode 100644
index 000000000000..3850e6cc0a7d
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/kfunc_call.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#include <test_progs.h>
+#include <network_helpers.h>
+#include "kfunc_call_test.skel.h"
+#include "kfunc_call_test_subprog.skel.h"
+
+static __u32 duration;
+
+static void test_main(void)
+{
+	struct kfunc_call_test *skel;
+	int prog_fd, retval, err;
+
+	skel = kfunc_call_test__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel"))
+		return;
+
+	prog_fd = bpf_program__fd(skel->progs.kfunc_call_test1);
+	err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
+				NULL, NULL, (__u32 *)&retval, &duration);
+
+	if (ASSERT_OK(err, "bpf_prog_test_run(test1)"))
+		ASSERT_EQ(retval, 12, "test1-retval");
+
+	prog_fd = bpf_program__fd(skel->progs.kfunc_call_test2);
+	err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
+				NULL, NULL, (__u32 *)&retval, &duration);
+	if (ASSERT_OK(err, "bpf_prog_test_run(test2)"))
+		ASSERT_EQ(retval, 3, "test2-retval");
+
+	kfunc_call_test__destroy(skel);
+}
+
+static void test_subprog(void)
+{
+	struct kfunc_call_test_subprog *skel;
+	int prog_fd, retval, err;
+
+	skel = kfunc_call_test_subprog__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel"))
+		return;
+
+	prog_fd = bpf_program__fd(skel->progs.kfunc_call_test1);
+	err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
+				NULL, NULL, (__u32 *)&retval, &duration);
+
+	if (ASSERT_OK(err, "bpf_prog_test_run(test1)"))
+		ASSERT_EQ(retval, 10, "test1-retval");
+
+	kfunc_call_test_subprog__destroy(skel);
+}
+
+void test_kfunc_call(void)
+{
+	if (test__start_subtest("main"))
+		test_main();
+
+	if (test__start_subtest("subprog"))
+		test_subprog();
+}
diff --git a/tools/testing/selftests/bpf/progs/kfunc_call_test.c b/tools/testing/selftests/bpf/progs/kfunc_call_test.c
new file mode 100644
index 000000000000..ea8c5266efd8
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/kfunc_call_test.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_tcp_helpers.h"
+
+extern __u64 bpf_kfunc_call_test1(struct sock *sk, __u32 a, __u64 b,
+				  __u32 c, __u64 d) __ksym;
+extern int bpf_kfunc_call_test2(struct sock *sk, __u32 a, __u32 b) __ksym;
+
+SEC("classifier/test2")
+int kfunc_call_test2(struct __sk_buff *skb)
+{
+	struct bpf_sock *sk = skb->sk;
+
+	if (!sk)
+		return -1;
+
+	sk = bpf_sk_fullsock(sk);
+	if (!sk)
+		return -1;
+
+	return bpf_kfunc_call_test2((struct sock *)sk, 1, 2);
+}
+
+SEC("classifier/test1")
+int kfunc_call_test1(struct __sk_buff *skb)
+{
+	struct bpf_sock *sk = skb->sk;
+	__u64 a = 1ULL << 32;
+	__u32 ret;
+
+	if (!sk)
+		return -1;
+
+	sk = bpf_sk_fullsock(sk);
+	if (!sk)
+		return -1;
+
+	a = bpf_kfunc_call_test1((struct sock *)sk, 1, a | 2, 3, a | 4);
+
+	ret = a >> 32;   /* ret should be 2 */
+	ret += (__u32)a; /* ret should be 12 */
+
+	return ret;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/kfunc_call_test_subprog.c b/tools/testing/selftests/bpf/progs/kfunc_call_test_subprog.c
new file mode 100644
index 000000000000..9bf66f8c826e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/kfunc_call_test_subprog.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_tcp_helpers.h"
+
+extern __u64 bpf_kfunc_call_test1(struct sock *sk, __u32 a, __u64 b,
+				  __u32 c, __u64 d) __ksym;
+
+__attribute__ ((noinline))
+int f1(struct __sk_buff *skb)
+{
+	struct bpf_sock *sk = skb->sk;
+
+	if (!sk)
+		return -1;
+
+	sk = bpf_sk_fullsock(sk);
+	if (!sk)
+		return -1;
+
+	return (__u32)bpf_kfunc_call_test1((struct sock *)sk, 1, 2, 3, 4);
+}
+
+SEC("classifier/test1_subprog")
+int kfunc_call_test1(struct __sk_buff *skb)
+{
+	return f1(skb);
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH bpf-next 15/15] bpf: selftest: Add kfunc_call test
  2021-03-16  1:15 ` [PATCH bpf-next 15/15] bpf: selftest: Add kfunc_call test Martin KaFai Lau
@ 2021-03-16  3:39   ` kernel test robot
  2021-03-19  4:21   ` Andrii Nakryiko
  1 sibling, 0 replies; 49+ messages in thread
From: kernel test robot @ 2021-03-16  3:39 UTC (permalink / raw)
  To: Martin KaFai Lau, bpf
  Cc: kbuild-all, Alexei Starovoitov, Daniel Borkmann, kernel-team, netdev

[-- Attachment #1: Type: text/plain, Size: 1429 bytes --]

Hi Martin,

I love your patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Martin-KaFai-Lau/Support-calling-kernel-function/20210316-091639
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: parisc-defconfig (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/5839e49467b4593db0b16c1343cf9d9d60cc6dcd
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Martin-KaFai-Lau/Support-calling-kernel-function/20210316-091639
        git checkout 5839e49467b4593db0b16c1343cf9d9d60cc6dcd
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   hppa-linux-ld: net/core/filter.o: in function `tc_cls_check_kern_func_call':
>> (.text+0x79fc): undefined reference to `btf_id_set_contains'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 18432 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH bpf-next 02/15] bpf: btf: Support parsing extern func
  2021-03-16  1:13 ` [PATCH bpf-next 02/15] bpf: btf: Support parsing extern func Martin KaFai Lau
@ 2021-03-18 22:53   ` Andrii Nakryiko
  2021-03-18 23:39     ` Martin KaFai Lau
  0 siblings, 1 reply; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-18 22:53 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Tue, Mar 16, 2021 at 12:01 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> This patch makes the BTF verifier accept extern funcs. It is used
> to allow a bpf program to call a limited set of kernel functions
> in a later patch.
>
> When writing a bpf prog, the extern kernel function needs
> to be declared under an ELF section (".ksyms"), which is
> the same as for the current extern kernel variables.  That should
> keep its usage consistent without requiring users to remember
> another section name.
>
> For example, in a bpf_prog.c:
>
> extern int foo(struct sock *) __attribute__((section(".ksyms")));
>
> [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
>         '(anon)' type_id=18
> [25] FUNC 'foo' type_id=24 linkage=extern
> [ ... ]
> [33] DATASEC '.ksyms' size=0 vlen=1
>         type_id=25 offset=0 size=0
>
> LLVM will put the "func" type into the BTF datasec ".ksyms".
> The current "btf_datasec_check_meta()" assumes everything under
> it is a "var" and ensures it has non-zero size ("!vsi->size" test).
> The non-zero size check is not true for "func".  This patch postpones the
> "!vsi->size" test from "btf_datasec_check_meta()" to
> "btf_datasec_resolve()" which has all types collected to decide
> if a vsi is a "var" or a "func" and then enforce the "vsi->size"
> differently.
>
> If the datasec only has "func", its "t->size" could be zero.
> Thus, the current "!t->size" test is no longer valid.  The
> invalid "t->size" will still be caught by the later
> "last_vsi_end_off > t->size" check.   This patch also takes this
> chance to consolidate other "t->size" tests ("vsi->offset >= t->size"
> "vsi->size > t->size", and "t->size < sum") into the existing
> "last_vsi_end_off > t->size" test.
>
> The LLVM will also put those extern kernel function as an extern
> linkage func in the BTF:
>
> [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
>         '(anon)' type_id=18
> [25] FUNC 'foo' type_id=24 linkage=extern
>
> This patch allows BTF_FUNC_EXTERN in btf_func_check_meta().
> Also, an extern kernel function declaration does not
> necessarily have arg names. Another change in btf_func_check() is
> to allow an extern function to have no arg names.
>
> The btf selftest is adjusted accordingly.  New tests are also added.
>
> The required LLVM patch: https://reviews.llvm.org/D93563
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---

High-level question about EXTERN functions in DATASEC. Does the kernel
need to see them under DATASEC? What if libbpf just removed all EXTERN
funcs from under DATASEC and left them as "free-floating" EXTERN
FUNCs in BTF?

We need to tag EXTERNs with DATASECs mainly for libbpf to know whether
it's a .kconfig, .ksyms, or other type of extern. Does the kernel need to
care?

>  kernel/bpf/btf.c                             |  52 ++++---
>  tools/testing/selftests/bpf/prog_tests/btf.c | 154 ++++++++++++++++++-
>  2 files changed, 178 insertions(+), 28 deletions(-)
>

[...]

> @@ -3611,9 +3594,28 @@ static int btf_datasec_resolve(struct btf_verifier_env *env,
>                 u32 var_type_id = vsi->type, type_id, type_size = 0;
>                 const struct btf_type *var_type = btf_type_by_id(env->btf,
>                                                                  var_type_id);
> -               if (!var_type || !btf_type_is_var(var_type)) {
> +               if (!var_type) {
> +                       btf_verifier_log_vsi(env, v->t, vsi,
> +                                            "type not found");
> +                       return -EINVAL;
> +               }
> +
> +               if (btf_type_is_func(var_type)) {
> +                       if (vsi->size || vsi->offset) {
> +                               btf_verifier_log_vsi(env, v->t, vsi,
> +                                                    "Invalid size/offset");
> +                               return -EINVAL;
> +                       }
> +                       continue;
> +               } else if (btf_type_is_var(var_type)) {
> +                       if (!vsi->size) {
> +                               btf_verifier_log_vsi(env, v->t, vsi,
> +                                                    "Invalid size");
> +                               return -EINVAL;
> +                       }
> +               } else {
>                         btf_verifier_log_vsi(env, v->t, vsi,
> -                                            "Not a VAR kind member");
> +                                            "Neither a VAR nor a FUNC");
>                         return -EINVAL;

Can you please structure it as follows (I think the logic is a bit
easier to follow that way):

if (btf_type_is_func()) {
   ...
   continue; /* no extra checks */
}

if (!btf_type_is_var()) {
   /* bad, complain, exit */
   return -EINVAL;
}

/* now we deal with extra checks for variables */

That way variable checks are kept all in one place.

Also a question: is it OK to allow non-extern functions under
DATASEC? Maybe, but that wasn't explicitly mentioned.

>                 }
>
> @@ -3849,9 +3851,11 @@ static int btf_func_check(struct btf_verifier_env *env,
>         const struct btf_param *args;
>         const struct btf *btf;
>         u16 nr_args, i;
> +       bool is_extern;
>
>         btf = env->btf;
>         proto_type = btf_type_by_id(btf, t->type);
> +       is_extern = btf_type_vlen(t) == BTF_FUNC_EXTERN;

Using btf_type_vlen(t) to get the func linkage is becoming more and
more confusing. Would it be terrible to have a btf_func_linkage(t)
helper instead?
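
For illustration only, a minimal userspace sketch of what such a helper
could look like. The struct and macro below are simplified stand-ins for
the definitions in include/uapi/linux/btf.h, and btf_func_is_extern() is
a hypothetical name, not something taken from this series:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct btf_type; the real UAPI struct has a
 * size/type union, which is not needed for this illustration.
 */
struct btf_type {
	uint32_t name_off;
	uint32_t info;
	uint32_t type;
};

/* Low 16 bits of "info" hold vlen. */
#define BTF_INFO_VLEN(info)	((info) & 0xffff)

enum btf_func_linkage {
	BTF_FUNC_STATIC = 0,
	BTF_FUNC_GLOBAL = 1,
	BTF_FUNC_EXTERN = 2,
};

/* For a FUNC type the vlen bits carry the linkage, not a member count,
 * so a dedicated accessor makes the intent explicit at the call site.
 */
static inline uint16_t btf_func_linkage(const struct btf_type *t)
{
	return BTF_INFO_VLEN(t->info);
}

/* Hypothetical convenience wrapper for the check done in btf_func_check(). */
static inline bool btf_func_is_extern(const struct btf_type *t)
{
	return btf_func_linkage(t) == BTF_FUNC_EXTERN;
}
```

With something like this, "is_extern = btf_type_vlen(t) == BTF_FUNC_EXTERN"
would read as "is_extern = btf_func_linkage(t) == BTF_FUNC_EXTERN".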

>
>         if (!proto_type || !btf_type_is_func_proto(proto_type)) {
>                 btf_verifier_log_type(env, t, "Invalid type_id");
> @@ -3861,7 +3865,7 @@ static int btf_func_check(struct btf_verifier_env *env,
>         args = (const struct btf_param *)(proto_type + 1);
>         nr_args = btf_type_vlen(proto_type);
>         for (i = 0; i < nr_args; i++) {
> -               if (!args[i].name_off && args[i].type) {
> +               if (!is_extern && !args[i].name_off && args[i].type) {
>                         btf_verifier_log_type(env, t, "Invalid arg#%u", i + 1);
>                         return -EINVAL;
>                 }

[...]

* Re: [PATCH bpf-next 03/15] bpf: Refactor btf_check_func_arg_match
  2021-03-16  1:13 ` [PATCH bpf-next 03/15] bpf: Refactor btf_check_func_arg_match Martin KaFai Lau
@ 2021-03-18 23:32   ` Andrii Nakryiko
  2021-03-19 19:32     ` Martin KaFai Lau
  0 siblings, 1 reply; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-18 23:32 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Tue, Mar 16, 2021 at 12:01 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> This patch refactors the core logic of "btf_check_func_arg_match()"
> into a new function "do_btf_check_func_arg_match()".
> "do_btf_check_func_arg_match()" will be reused later to check
> the kernel function call.
>
> The "if (!btf_type_is_ptr(t))" is checked first to improve the indentation
> which will be useful for a later patch.
>
> Some of the "btf_kind_str[]" usages are replaced with the shortcut
> "btf_type_str(t)".
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---
>  include/linux/btf.h |   5 ++
>  kernel/bpf/btf.c    | 159 ++++++++++++++++++++++++--------------------
>  2 files changed, 91 insertions(+), 73 deletions(-)
>
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 7fabf1428093..93bf2e5225f5 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -140,6 +140,11 @@ static inline bool btf_type_is_enum(const struct btf_type *t)
>         return BTF_INFO_KIND(t->info) == BTF_KIND_ENUM;
>  }
>
> +static inline bool btf_type_is_scalar(const struct btf_type *t)
> +{
> +       return btf_type_is_int(t) || btf_type_is_enum(t);
> +}
> +
>  static inline bool btf_type_is_typedef(const struct btf_type *t)
>  {
>         return BTF_INFO_KIND(t->info) == BTF_KIND_TYPEDEF;
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 96cd24020a38..529b94b601c6 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -4381,7 +4381,7 @@ static u8 bpf_ctx_convert_map[] = {
>  #undef BPF_LINK_TYPE
>
>  static const struct btf_member *
> -btf_get_prog_ctx_type(struct bpf_verifier_log *log, struct btf *btf,
> +btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
>                       const struct btf_type *t, enum bpf_prog_type prog_type,
>                       int arg)
>  {
> @@ -5366,122 +5366,135 @@ int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *pr
>         return btf_check_func_type_match(log, btf1, t1, btf2, t2);
>  }
>
> -/* Compare BTF of a function with given bpf_reg_state.
> - * Returns:
> - * EFAULT - there is a verifier bug. Abort verification.
> - * EINVAL - there is a type mismatch or BTF is not available.
> - * 0 - BTF matches with what bpf_reg_state expects.
> - * Only PTR_TO_CTX and SCALAR_VALUE states are recognized.
> - */
> -int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog,
> -                            struct bpf_reg_state *regs)
> +static int do_btf_check_func_arg_match(struct bpf_verifier_env *env,

The do_btf_check_func_arg_match vs btf_check_func_arg_match distinction is
not clear at all. How about something like

btf_check_func_arg_match vs btf_check_subprog_arg_match (or btf_func
vs bpf_subprog)? I think that highlights the main distinction better,
no?

> +                                      const struct btf *btf, u32 func_id,
> +                                      struct bpf_reg_state *regs,
> +                                      bool ptr_to_mem_ok)
>  {
>         struct bpf_verifier_log *log = &env->log;
> -       struct bpf_prog *prog = env->prog;
> -       struct btf *btf = prog->aux->btf;
> -       const struct btf_param *args;
> +       const char *func_name, *ref_tname;
>         const struct btf_type *t, *ref_t;
> -       u32 i, nargs, btf_id, type_size;
> -       const char *tname;
> -       bool is_global;
> -
> -       if (!prog->aux->func_info)
> -               return -EINVAL;
> -
> -       btf_id = prog->aux->func_info[subprog].type_id;
> -       if (!btf_id)
> -               return -EFAULT;
> -
> -       if (prog->aux->func_info_aux[subprog].unreliable)
> -               return -EINVAL;
> +       const struct btf_param *args;
> +       u32 i, nargs;
>
> -       t = btf_type_by_id(btf, btf_id);
> +       t = btf_type_by_id(btf, func_id);
>         if (!t || !btf_type_is_func(t)) {
>                 /* These checks were already done by the verifier while loading
>                  * struct bpf_func_info
>                  */
> -               bpf_log(log, "BTF of func#%d doesn't point to KIND_FUNC\n",
> -                       subprog);
> +               bpf_log(log, "BTF of func_id %u doesn't point to KIND_FUNC\n",
> +                       func_id);
>                 return -EFAULT;
>         }
> -       tname = btf_name_by_offset(btf, t->name_off);
> +       func_name = btf_name_by_offset(btf, t->name_off);
>
>         t = btf_type_by_id(btf, t->type);
>         if (!t || !btf_type_is_func_proto(t)) {
> -               bpf_log(log, "Invalid BTF of func %s\n", tname);
> +               bpf_log(log, "Invalid BTF of func %s\n", func_name);
>                 return -EFAULT;
>         }
>         args = (const struct btf_param *)(t + 1);
>         nargs = btf_type_vlen(t);
>         if (nargs > MAX_BPF_FUNC_REG_ARGS) {
> -               bpf_log(log, "Function %s has %d > %d args\n", tname, nargs,
> +               bpf_log(log, "Function %s has %d > %d args\n", func_name, nargs,
>                         MAX_BPF_FUNC_REG_ARGS);
> -               goto out;
> +               return -EINVAL;
>         }
>
> -       is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
>         /* check that BTF function arguments match actual types that the
>          * verifier sees.
>          */
>         for (i = 0; i < nargs; i++) {
> -               struct bpf_reg_state *reg = &regs[i + 1];
> +               u32 regno = i + 1;
> +               struct bpf_reg_state *reg = &regs[regno];
>
> -               t = btf_type_by_id(btf, args[i].type);
> -               while (btf_type_is_modifier(t))
> -                       t = btf_type_by_id(btf, t->type);
> -               if (btf_type_is_int(t) || btf_type_is_enum(t)) {
> +               t = btf_type_skip_modifiers(btf, args[i].type, NULL);
> +               if (btf_type_is_scalar(t)) {
>                         if (reg->type == SCALAR_VALUE)
>                                 continue;
> -                       bpf_log(log, "R%d is not a scalar\n", i + 1);
> -                       goto out;
> +                       bpf_log(log, "R%d is not a scalar\n", regno);
> +                       return -EINVAL;
>                 }
> -               if (btf_type_is_ptr(t)) {
> +
> +               if (!btf_type_is_ptr(t)) {
> +                       bpf_log(log, "Unrecognized arg#%d type %s\n",
> +                               i, btf_type_str(t));
> +                       return -EINVAL;
> +               }
> +
> +               ref_t = btf_type_skip_modifiers(btf, t->type, NULL);
> +               ref_tname = btf_name_by_offset(btf, ref_t->name_off);

These two seem to be used only inside the `else if (ptr_to_mem_ok)`
branch; let's move the code and variables inside that branch?

> +               if (btf_get_prog_ctx_type(log, btf, t, env->prog->type, i)) {
>                         /* If function expects ctx type in BTF check that caller
>                          * is passing PTR_TO_CTX.
>                          */
> -                       if (btf_get_prog_ctx_type(log, btf, t, prog->type, i)) {
> -                               if (reg->type != PTR_TO_CTX) {
> -                                       bpf_log(log,
> -                                               "arg#%d expected pointer to ctx, but got %s\n",
> -                                               i, btf_kind_str[BTF_INFO_KIND(t->info)]);
> -                                       goto out;
> -                               }
> -                               if (check_ctx_reg(env, reg, i + 1))
> -                                       goto out;
> -                               continue;
> +                       if (reg->type != PTR_TO_CTX) {
> +                               bpf_log(log,
> +                                       "arg#%d expected pointer to ctx, but got %s\n",
> +                                       i, btf_type_str(t));
> +                               return -EINVAL;
>                         }
> +                       if (check_ctx_reg(env, reg, regno))
> +                               return -EINVAL;

The original code had `continue` here, which lets the reader stop
tracking the if/else logic. Any specific reason you removed it? It keeps
the logic simpler to follow, imo.

> +               } else if (ptr_to_mem_ok) {

Similarly to how you reduced nesting with btf_type_is_ptr, I'd do

if (!ptr_to_mem_ok)
    return -EINVAL;

and let the brain forget about tracking another if/else branch.

> +                       const struct btf_type *resolve_ret;
> +                       u32 type_size;
>
> -                       if (!is_global)
> -                               goto out;
> -
> -                       t = btf_type_skip_modifiers(btf, t->type, NULL);
> -
> -                       ref_t = btf_resolve_size(btf, t, &type_size);
> -                       if (IS_ERR(ref_t)) {
> +                       resolve_ret = btf_resolve_size(btf, ref_t, &type_size);
> +                       if (IS_ERR(resolve_ret)) {
>                                 bpf_log(log,
> -                                   "arg#%d reference type('%s %s') size cannot be determined: %ld\n",
> -                                   i, btf_type_str(t), btf_name_by_offset(btf, t->name_off),
> -                                       PTR_ERR(ref_t));
> -                               goto out;
> +                                       "arg#%d reference type('%s %s') size cannot be determined: %ld\n",
> +                                       i, btf_type_str(ref_t), ref_tname,
> +                                       PTR_ERR(resolve_ret));
> +                               return -EINVAL;
>                         }
>
> -                       if (check_mem_reg(env, reg, i + 1, type_size))
> -                               goto out;
> -
> -                       continue;
> +                       if (check_mem_reg(env, reg, regno, type_size))
> +                               return -EINVAL;
> +               } else {
> +                       return -EINVAL;
>                 }
> -               bpf_log(log, "Unrecognized arg#%d type %s\n",
> -                       i, btf_kind_str[BTF_INFO_KIND(t->info)]);
> -               goto out;
>         }
> +
>         return 0;
> -out:
> +}
> +
> +/* Compare BTF of a function with given bpf_reg_state.
> + * Returns:
> + * EFAULT - there is a verifier bug. Abort verification.
> + * EINVAL - there is a type mismatch or BTF is not available.
> + * 0 - BTF matches with what bpf_reg_state expects.
> + * Only PTR_TO_CTX and SCALAR_VALUE states are recognized.
> + */
> +int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog,
> +                            struct bpf_reg_state *regs)
> +{
> +       struct bpf_prog *prog = env->prog;
> +       struct btf *btf = prog->aux->btf;
> +       bool is_global;
> +       u32 btf_id;
> +       int err;
> +
> +       if (!prog->aux->func_info)
> +               return -EINVAL;
> +
> +       btf_id = prog->aux->func_info[subprog].type_id;
> +       if (!btf_id)
> +               return -EFAULT;
> +
> +       if (prog->aux->func_info_aux[subprog].unreliable)
> +               return -EINVAL;
> +
> +       is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
> +       err = do_btf_check_func_arg_match(env, btf, btf_id, regs, is_global);
> +
>         /* Compiler optimizations can remove arguments from static functions
>          * or mismatched type can be passed into a global function.
>          * In such cases mark the function as unreliable from BTF point of view.
>          */
> -       prog->aux->func_info_aux[subprog].unreliable = true;
> -       return -EINVAL;
> +       if (err == -EINVAL)
> +               prog->aux->func_info_aux[subprog].unreliable = true;

Is there any harm in marking it unreliable for any error? This makes it
look like -EINVAL is super-special. If it's EFAULT, it won't matter,
right?

> +       return err;
>  }
>
>  /* Convert BTF of a function into bpf_reg_state if possible
> --
> 2.30.2
>

* Re: [PATCH bpf-next 02/15] bpf: btf: Support parsing extern func
  2021-03-18 22:53   ` Andrii Nakryiko
@ 2021-03-18 23:39     ` Martin KaFai Lau
  2021-03-19  4:13       ` Andrii Nakryiko
  0 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-18 23:39 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Thu, Mar 18, 2021 at 03:53:38PM -0700, Andrii Nakryiko wrote:
> On Tue, Mar 16, 2021 at 12:01 AM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > This patch makes BTF verifier to accept extern func. It is used for
> > allowing bpf program to call a limited set of kernel functions
> > in a later patch.
> >
> > When writing bpf prog, the extern kernel function needs
> > to be declared under a ELF section (".ksyms") which is
> > the same as the current extern kernel variables and that should
> > keep its usage consistent without requiring to remember another
> > section name.
> >
> > For example, in a bpf_prog.c:
> >
> > extern int foo(struct sock *) __attribute__((section(".ksyms")));
> >
> > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> >         '(anon)' type_id=18
> > [25] FUNC 'foo' type_id=24 linkage=extern
> > [ ... ]
> > [33] DATASEC '.ksyms' size=0 vlen=1
> >         type_id=25 offset=0 size=0
> >
> > LLVM will put the "func" type into the BTF datasec ".ksyms".
> > The current "btf_datasec_check_meta()" assumes everything under
> > it is a "var" and ensures it has non-zero size ("!vsi->size" test).
> > The non-zero size check is not true for "func".  This patch postpones the
> > "!vsi->size" test from "btf_datasec_check_meta()" to
> > "btf_datasec_resolve()" which has all types collected to decide
> > if a vsi is a "var" or a "func" and then enforce the "vsi->size"
> > differently.
> >
> > If the datasec only has "func", its "t->size" could be zero.
> > Thus, the current "!t->size" test is no longer valid.  The
> > invalid "t->size" will still be caught by the later
> > "last_vsi_end_off > t->size" check.   This patch also takes this
> > chance to consolidate other "t->size" tests ("vsi->offset >= t->size"
> > "vsi->size > t->size", and "t->size < sum") into the existing
> > "last_vsi_end_off > t->size" test.
> >
> > The LLVM will also put those extern kernel function as an extern
> > linkage func in the BTF:
> >
> > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> >         '(anon)' type_id=18
> > [25] FUNC 'foo' type_id=24 linkage=extern
> >
> > This patch allows BTF_FUNC_EXTERN in btf_func_check_meta().
> > Also, an extern kernel function declaration does not
> > necessarily have arg names. Another change in btf_func_check() is
> > to allow an extern function to have no arg names.
> >
> > The btf selftest is adjusted accordingly.  New tests are also added.
> >
> > The required LLVM patch: https://reviews.llvm.org/D93563 
> >
> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > ---
> 
> High-level question about EXTERN functions in DATASEC. Does kernel
> need to see them under DATASEC? What if libbpf just removed all EXTERN
> funcs from under DATASEC and leave them as "free-floating" EXTERN
> FUNCs in BTF.
> 
> We need to tag EXTERNs with DATASECs mainly for libbpf to know whether
> it's .kconfig or .ksym or other type of externs. Does kernel need to
> care?
Although the kernel does not need to know, since a legit llvm generates it,
I went with proper support in the kernel (e.g. bpftool btf dump can better
reflect what was there).

> 
> >  kernel/bpf/btf.c                             |  52 ++++---
> >  tools/testing/selftests/bpf/prog_tests/btf.c | 154 ++++++++++++++++++-
> >  2 files changed, 178 insertions(+), 28 deletions(-)
> >
> 
> [...]
> 
> > @@ -3611,9 +3594,28 @@ static int btf_datasec_resolve(struct btf_verifier_env *env,
> >                 u32 var_type_id = vsi->type, type_id, type_size = 0;
> >                 const struct btf_type *var_type = btf_type_by_id(env->btf,
> >                                                                  var_type_id);
> > -               if (!var_type || !btf_type_is_var(var_type)) {
> > +               if (!var_type) {
> > +                       btf_verifier_log_vsi(env, v->t, vsi,
> > +                                            "type not found");
> > +                       return -EINVAL;
> > +               }
> > +
> > +               if (btf_type_is_func(var_type)) {
> > +                       if (vsi->size || vsi->offset) {
> > +                               btf_verifier_log_vsi(env, v->t, vsi,
> > +                                                    "Invalid size/offset");
> > +                               return -EINVAL;
> > +                       }
> > +                       continue;
> > +               } else if (btf_type_is_var(var_type)) {
> > +                       if (!vsi->size) {
> > +                               btf_verifier_log_vsi(env, v->t, vsi,
> > +                                                    "Invalid size");
> > +                               return -EINVAL;
> > +                       }
> > +               } else {
> >                         btf_verifier_log_vsi(env, v->t, vsi,
> > -                                            "Not a VAR kind member");
> > +                                            "Neither a VAR nor a FUNC");
> >                         return -EINVAL;
> 
> can you please structure it as follow (I think it is bit easier to
> follow the logic then):
> 
> if (btf_type_is_func()) {
>    ...
>    continue; /* no extra checks */
> }
> 
> if (!btf_type_is_var()) {
>    /* bad, complain, exit */
>    return -EINVAL;
> }
> 
> /* now we deal with extra checks for variables */
> 
> That way variable checks are kept all in one place.
> 
> Also a question: is that ok to enable non-extern functions under
> DATASEC? Maybe, but that wasn't explicitly mentioned.
The patch does not check.  We could reject that for now.

> 
> >                 }
> >
> > @@ -3849,9 +3851,11 @@ static int btf_func_check(struct btf_verifier_env *env,
> >         const struct btf_param *args;
> >         const struct btf *btf;
> >         u16 nr_args, i;
> > +       bool is_extern;
> >
> >         btf = env->btf;
> >         proto_type = btf_type_by_id(btf, t->type);
> > +       is_extern = btf_type_vlen(t) == BTF_FUNC_EXTERN;
> 
> using btf_type_vlen(t) for getting func linkage is becoming more and
> more confusing. Would it be terrible to have btf_func_linkage(t)
> helper instead?
I have it in my local v2, which also just returns when it is extern.

> 
> >
> >         if (!proto_type || !btf_type_is_func_proto(proto_type)) {
> >                 btf_verifier_log_type(env, t, "Invalid type_id");
> > @@ -3861,7 +3865,7 @@ static int btf_func_check(struct btf_verifier_env *env,
> >         args = (const struct btf_param *)(proto_type + 1);
> >         nr_args = btf_type_vlen(proto_type);
> >         for (i = 0; i < nr_args; i++) {
> > -               if (!args[i].name_off && args[i].type) {
> > +               if (!is_extern && !args[i].name_off && args[i].type) {
> >                         btf_verifier_log_type(env, t, "Invalid arg#%u", i + 1);
> >                         return -EINVAL;
> >                 }
> 
> [...]

* Re: [PATCH bpf-next 04/15] bpf: Support bpf program calling kernel function
  2021-03-16  1:14 ` [PATCH bpf-next 04/15] bpf: Support bpf program calling kernel function Martin KaFai Lau
@ 2021-03-19  1:03   ` Andrii Nakryiko
  2021-03-19  1:51     ` Alexei Starovoitov
  2021-03-19 19:47     ` Martin KaFai Lau
  0 siblings, 2 replies; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-19  1:03 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Tue, Mar 16, 2021 at 12:01 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> This patch adds support to BPF verifier to allow bpf program calling
> kernel function directly.
>
> The use case included in this set is to allow bpf-tcp-cc to directly
> call some tcp-cc helper functions (e.g. "tcp_cong_avoid_ai()").  Those
> functions have already been used by some kernel tcp-cc implementations.
>
> This set will also allow the bpf-tcp-cc program to directly call the
> kernel tcp-cc implementation.  For example, a bpf_dctcp may only want to
> implement its own dctcp_cwnd_event() and reuse other dctcp_*() directly
> from the kernel tcp_dctcp.c instead of reimplementing (or
> copy-and-pasting) them.
>
> The tcp-cc kernel functions mentioned above will be white listed
> for the struct_ops bpf-tcp-cc programs to use in a later patch.
> The white listed functions are not bounded to a fixed ABI contract.
> Those functions have already been used by the existing kernel tcp-cc.
> If any of them has changed, both in-tree and out-of-tree kernel tcp-cc
> implementations have to be changed.  The same goes for the struct_ops
> bpf-tcp-cc programs which have to be adjusted accordingly.
>
> This patch is to make the required changes in the bpf verifier.
>
> First change is in btf.c, it adds a case in "do_btf_check_func_arg_match()".
> When the passed in "btf->kernel_btf == true", it means matching the
> verifier regs' states with a kernel function.  This will handle the
> PTR_TO_BTF_ID reg.  It also maps PTR_TO_SOCK_COMMON, PTR_TO_SOCKET,
> and PTR_TO_TCP_SOCK to its kernel's btf_id.
>
> In the later libbpf patch, the insn calling a kernel function will
> look like:
>
> insn->code == (BPF_JMP | BPF_CALL)
> insn->src_reg == BPF_PSEUDO_KFUNC_CALL /* <- new in this patch */
> insn->imm == func_btf_id /* btf_id of the running kernel */
>
> [ For the future calling function-in-kernel-module support, an array
>   of module btf_fds can be passed at the load time and insn->off
>   can be used to index into this array. ]
>
> At the early stage of verifier, the verifier will collect all kernel
> function calls into "struct bpf_kern_func_descriptor".  Those
> descriptors are stored in "prog->aux->kfunc_tab" and will
> be available to the JIT.  Since this "add" operation is similar
> to the current "add_subprog()" and looking for the same insn->code,
> they are done together in the new "add_subprog_and_kern_func()".
>
> In the "do_check()" stage, the new "check_kern_func_call()" is added
> to verify the kernel function call instruction:
> 1. Ensure the kernel function can be used by a particular BPF_PROG_TYPE.
>    A new bpf_verifier_ops "check_kern_func_call" is added to do that.
>    The bpf-tcp-cc struct_ops program will implement this function in
>    a later patch.
> 2. Call "btf_check_kern_func_args_match()" to ensure the regs can be
>    used as the args of a kernel function.
> 3. Mark the regs' type, subreg_def, and zext_dst.
>
> At the later do_misc_fixups() stage, the new fixup_kern_func_call()
> will replace the insn->imm with the function address (relative
> to __bpf_call_base).  If needed, the jit can find the btf_func_model
> by calling the new bpf_jit_find_kern_func_model(prog, insn->imm).
> With the imm set to the function address, "bpftool prog dump xlated"
> will be able to display the kernel function calls the same way as
> it displays other bpf helper calls.
>
> gpl_compatible program is required to call kernel function.
>
> This feature currently requires JIT.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---

After the initial pass it all makes sense so far. I am a bit concerned
about s32 and the kernel function offset, though. See below.

Also, "kern_func" and "descriptor" are quite a mouthful; it seems to me
that using kfunc consistently wouldn't hurt readability at all. You
also already use desc in place of "descriptor" for variables, so I'd
do that in type names as well.

>  arch/x86/net/bpf_jit_comp.c       |   5 +
>  include/linux/bpf.h               |  24 ++
>  include/linux/btf.h               |   1 +
>  include/linux/filter.h            |   1 +
>  include/uapi/linux/bpf.h          |   4 +
>  kernel/bpf/btf.c                  |  65 +++++-
>  kernel/bpf/core.c                 |  18 +-
>  kernel/bpf/disasm.c               |  32 +--
>  kernel/bpf/disasm.h               |   3 +-
>  kernel/bpf/syscall.c              |   1 +
>  kernel/bpf/verifier.c             | 376 ++++++++++++++++++++++++++++--
>  tools/bpf/bpftool/xlated_dumper.c |   3 +-
>  tools/include/uapi/linux/bpf.h    |   4 +
>  13 files changed, 488 insertions(+), 49 deletions(-)
>

[...]

> +
> +       func_name = btf_name_by_offset(btf_vmlinux, func->name_off);
> +       addr = kallsyms_lookup_name(func_name);
> +       if (!addr) {
> +               verbose(env, "cannot find address for kernel function %s\n",
> +                       func_name);
> +               return -EINVAL;
> +       }
> +
> +       desc = &tab->descs[tab->nr_descs++];
> +       desc->func_id = func_id;
> +       desc->imm = BPF_CAST_CALL(addr) - __bpf_call_base;

Is this difference guaranteed to always fit within s32?
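
To make the concern concrete, here is a small userspace sketch of an
explicit range check, assuming 64-bit addresses and a two's-complement
target. kfunc_call_imm() is a hypothetical name used only for this
illustration; it is not part of the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compute the call offset relative to call_base and
 * report whether it can be stored in a signed 32-bit immediate without
 * truncation. On success the offset is written to *imm.
 */
static bool kfunc_call_imm(uint64_t addr, uint64_t call_base, int32_t *imm)
{
	/* Unsigned subtraction wraps; reinterpret the result as signed. */
	int64_t delta = (int64_t)(addr - call_base);

	if (delta < INT32_MIN || delta > INT32_MAX)
		return false;

	*imm = (int32_t)delta;
	return true;
}
```

If the function sits more than 2GB away from the base, the helper
returns false and the caller can reject the program rather than
silently truncating the offset.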

> +       sort(tab->descs, tab->nr_descs, sizeof(tab->descs[0]),
> +            kern_func_desc_cmp_by_id, NULL);
> +
> +       return btf_distill_func_proto(&env->log, btf_vmlinux,
> +                                     func_proto, func_name,
> +                                     &desc->func_model);
> +}
> +
> +static int kern_func_desc_cmp_by_imm(const void *a, const void *b)
> +{
> +       const struct bpf_kern_func_descriptor *d0 = a;
> +       const struct bpf_kern_func_descriptor *d1 = b;
> +
> +       return d0->imm - d1->imm;

This is not safe for arbitrary s32 values, no? The subtraction can overflow.
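
To make the concern concrete: d0->imm - d1->imm overflows int when the two
values are far apart (e.g. INT32_MAX and INT32_MIN), which is undefined
behavior in C and can invert the ordering in practice. A minimal user-space
sketch of an overflow-free comparator follows. It assumes a hypothetical
stand-in struct (kfunc_desc_sketch) instead of the real
bpf_kern_func_descriptor, and libc qsort() in place of the kernel's sort():

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the descriptor in the patch; only imm matters here. */
struct kfunc_desc_sketch {
	uint32_t func_id;
	int32_t imm;
};

/* Compare instead of subtracting, so no intermediate value can overflow. */
static int kfunc_desc_cmp_by_imm(const void *a, const void *b)
{
	const struct kfunc_desc_sketch *d0 = a;
	const struct kfunc_desc_sketch *d1 = b;

	/* Yields -1, 0 or 1 for less-than, equal, greater-than. */
	return (d0->imm > d1->imm) - (d0->imm < d1->imm);
}

/* Sort an array of descriptors by ascending imm. */
static void sort_descs_by_imm(struct kfunc_desc_sketch *descs, size_t nr)
{
	qsort(descs, nr, sizeof(descs[0]), kfunc_desc_cmp_by_imm);
}
```

The (a > b) - (a < b) form is a common idiom for three-way comparison; an
explicit if/else chain would work just as well.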

> +}
> +
> +static void sort_kern_func_descs_by_imm(struct bpf_prog *prog)
> +{
> +       struct bpf_kern_func_desc_tab *tab;
> +
> +       tab = prog->aux->kfunc_tab;
> +       if (!tab)
> +               return;
> +
> +       sort(tab->descs, tab->nr_descs, sizeof(tab->descs[0]),
> +            kern_func_desc_cmp_by_imm, NULL);
> +}
> +

[...]


* Re: [PATCH bpf-next 07/15] bpf: tcp: White list some tcp cong functions to be called by bpf-tcp-cc
  2021-03-16  1:14 ` [PATCH bpf-next 07/15] bpf: tcp: White list some tcp cong functions to be called by bpf-tcp-cc Martin KaFai Lau
@ 2021-03-19  1:19   ` Andrii Nakryiko
  0 siblings, 0 replies; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-19  1:19 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Tue, Mar 16, 2021 at 12:02 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> This patch white list some tcp cong helper functions, tcp_slow_start()
> and tcp_cong_avoid_ai().  They are allowed to be directly called by
> the bpf-tcp-cc program.
>
> A few tcp cc implementation functions are also white listed.
> A potential use case is the bpf-tcp-cc implementation may only
> want to override a subset of a tcp_congestion_ops.  For others,
> the bpf-tcp-cc can directly call the kernel counterparts instead of
> re-implementing (or copy-and-pasting) them to the bpf program.
>
> They will only be available to the bpf-tcp-cc typed program.
> The white listed functions are not bound to a fixed ABI contract.
> When any of them has changed, the bpf-tcp-cc program has to be changed
> like any in-tree/out-of-tree kernel tcp-cc implementations do also.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---

Just nits, of course :)

"Whitelist" is a single word, but see also 49decddd39e5 ("Merge tag
'inclusive-terminology' of
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux");
allowlist/denylist is recommended for new code.

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  net/ipv4/bpf_tcp_ca.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
>
> diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
> index d520e61649c8..ed6e6b5b762b 100644
> --- a/net/ipv4/bpf_tcp_ca.c
> +++ b/net/ipv4/bpf_tcp_ca.c
> @@ -5,6 +5,7 @@
>  #include <linux/bpf_verifier.h>
>  #include <linux/bpf.h>
>  #include <linux/btf.h>
> +#include <linux/btf_ids.h>
>  #include <linux/filter.h>
>  #include <net/tcp.h>
>  #include <net/bpf_sk_storage.h>
> @@ -178,10 +179,50 @@ bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id,
>         }
>  }
>
> +BTF_SET_START(bpf_tcp_ca_kfunc_ids)
> +BTF_ID(func, tcp_reno_ssthresh)
> +BTF_ID(func, tcp_reno_cong_avoid)
> +BTF_ID(func, tcp_reno_undo_cwnd)
> +BTF_ID(func, tcp_slow_start)
> +BTF_ID(func, tcp_cong_avoid_ai)
> +#if IS_BUILTIN(CONFIG_TCP_CONG_CUBIC)
> +BTF_ID(func, cubictcp_init)
> +BTF_ID(func, cubictcp_recalc_ssthresh)
> +BTF_ID(func, cubictcp_cong_avoid)
> +BTF_ID(func, cubictcp_state)
> +BTF_ID(func, cubictcp_cwnd_event)
> +BTF_ID(func, cubictcp_acked)
> +#endif
> +#if IS_BUILTIN(CONFIG_TCP_CONG_DCTCP)
> +BTF_ID(func, dctcp_init)
> +BTF_ID(func, dctcp_update_alpha)
> +BTF_ID(func, dctcp_cwnd_event)
> +BTF_ID(func, dctcp_ssthresh)
> +BTF_ID(func, dctcp_cwnd_undo)
> +BTF_ID(func, dctcp_state)
> +#endif
> +#if IS_BUILTIN(CONFIG_TCP_CONG_BBR)
> +BTF_ID(func, bbr_init)
> +BTF_ID(func, bbr_main)
> +BTF_ID(func, bbr_sndbuf_expand)
> +BTF_ID(func, bbr_undo_cwnd)
> +BTF_ID(func, bbr_cwnd_event)
> +BTF_ID(func, bbr_ssthresh)
> +BTF_ID(func, bbr_min_tso_segs)
> +BTF_ID(func, bbr_set_state)
> +#endif
> +BTF_SET_END(bpf_tcp_ca_kfunc_ids)

see, kfunc here...

> +
> +static bool bpf_tcp_ca_check_kern_func_call(u32 kfunc_btf_id)

...but more verbose kern_func here. I like kfunc everywhere ;)

> +{
> +       return btf_id_set_contains(&bpf_tcp_ca_kfunc_ids, kfunc_btf_id);
> +}
> +
>  static const struct bpf_verifier_ops bpf_tcp_ca_verifier_ops = {
>         .get_func_proto         = bpf_tcp_ca_get_func_proto,
>         .is_valid_access        = bpf_tcp_ca_is_valid_access,
>         .btf_struct_access      = bpf_tcp_ca_btf_struct_access,
> +       .check_kern_func_call   = bpf_tcp_ca_check_kern_func_call,
>  };
>
>  static int bpf_tcp_ca_init_member(const struct btf_type *t,
> --
> 2.30.2
>


* Re: [PATCH bpf-next 04/15] bpf: Support bpf program calling kernel function
  2021-03-19  1:03   ` Andrii Nakryiko
@ 2021-03-19  1:51     ` Alexei Starovoitov
  2021-03-19 19:47     ` Martin KaFai Lau
  1 sibling, 0 replies; 49+ messages in thread
From: Alexei Starovoitov @ 2021-03-19  1:51 UTC (permalink / raw)
  To: Andrii Nakryiko, Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On 3/18/21 6:03 PM, Andrii Nakryiko wrote:
>> +       desc->imm = BPF_CAST_CALL(addr) - __bpf_call_base;
> Is this difference guaranteed to always fit within s32?
> 

We have this restriction in many places: JIT, dispatcher, trampoline,
and bpf interpreter.
Modules and kernel .text are in the same 4G for the same reason:
performance.
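
For illustration only, here is a user-space sketch of what "fits within s32"
means for a call offset computed relative to a base address. The helper name
(call_off_fits_s32) is hypothetical; this is not the verifier's code, and
nothing in this thread says such a check is being added.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute (target - base) and report whether the
 * signed distance is representable as a 32-bit immediate.
 */
static bool call_off_fits_s32(uint64_t target, uint64_t base, int32_t *imm)
{
	/*
	 * Unsigned subtraction wraps modulo 2^64; converting the result to
	 * int64_t gives the signed distance on two's-complement targets.
	 */
	int64_t delta = (int64_t)(target - base);

	if (delta < INT32_MIN || delta > INT32_MAX)
		return false;

	*imm = (int32_t)delta;
	return true;
}
```

If both addresses are known to live in one sufficiently small region, the
check can never fail, which is the layout argument made above.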


* Re: [PATCH bpf-next 08/15] libbpf: Refactor bpf_object__resolve_ksyms_btf_id
  2021-03-16  1:14 ` [PATCH bpf-next 08/15] libbpf: Refactor bpf_object__resolve_ksyms_btf_id Martin KaFai Lau
@ 2021-03-19  2:53   ` Andrii Nakryiko
  0 siblings, 0 replies; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-19  2:53 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Tue, Mar 16, 2021 at 12:02 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> This patch refactors most of the logic from
> bpf_object__resolve_ksyms_btf_id() into a new function
> bpf_object__resolve_ksym_var_btf_id().
> It is to get ready for a later patch adding
> bpf_object__resolve_ksym_func_btf_id() which resolves
> a kernel function to the running kernel btf_id.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  tools/lib/bpf/libbpf.c | 125 ++++++++++++++++++++++-------------------
>  1 file changed, 68 insertions(+), 57 deletions(-)
>

[...]

> +static int bpf_object__resolve_ksyms_btf_id(struct bpf_object *obj)
> +{
> +       struct extern_desc *ext;
> +       int i, err;
> +
> +       for (i = 0; i < obj->nr_extern; i++) {
> +               ext = &obj->externs[i];
> +               if (ext->type != EXT_KSYM || !ext->ksym.type_id)
> +                       continue;
> +
> +               err = bpf_object__resolve_ksym_var_btf_id(obj, ext);
>

We usually put error checking right on the next line, with no empty
line in between. Please remove this distracting empty line.


> -               ext->is_set = true;
> -               ext->ksym.kernel_btf_obj_fd = btf_fd;
> -               ext->ksym.kernel_btf_id = id;
> -               pr_debug("extern (ksym) '%s': resolved to [%d] %s %s\n",
> -                        ext->name, id, btf_kind_str(targ_var), targ_var_name);
> +               if (err)
> +                       return err;
>         }
>         return 0;
>  }
> --
> 2.30.2
>


* Re: [PATCH bpf-next 09/15] libbpf: Refactor codes for finding btf id of a kernel symbol
  2021-03-16  1:14 ` [PATCH bpf-next 09/15] libbpf: Refactor codes for finding btf id of a kernel symbol Martin KaFai Lau
@ 2021-03-19  3:14   ` Andrii Nakryiko
  0 siblings, 0 replies; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-19  3:14 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Tue, Mar 16, 2021 at 12:02 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> This patch refactors code, that finds kernel btf_id by kind
> and symbol name, to a new function find_ksym_btf_id().
>
> It also adds a new helper __btf_kind_str() to return
> a string by the numeric kind value.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---
>  tools/lib/bpf/libbpf.c | 44 +++++++++++++++++++++++++++++++-----------
>  1 file changed, 33 insertions(+), 11 deletions(-)
>

LGTM.

Acked-by: Andrii Nakryiko <andrii@kernel.org>


* Re: [PATCH bpf-next 10/15] libbpf: Rename RELO_EXTERN to RELO_EXTERN_VAR
  2021-03-16  1:14 ` [PATCH bpf-next 10/15] libbpf: Rename RELO_EXTERN to RELO_EXTERN_VAR Martin KaFai Lau
@ 2021-03-19  3:15   ` Andrii Nakryiko
  0 siblings, 0 replies; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-19  3:15 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Tue, Mar 16, 2021 at 12:02 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> This patch renames RELO_EXTERN to RELO_EXTERN_VAR.
> It is to avoid the confusion with a later patch adding
> RELO_EXTERN_FUNC.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  tools/lib/bpf/libbpf.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 8355b786b3db..8f924aece736 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -189,7 +189,7 @@ enum reloc_type {
>         RELO_LD64,
>         RELO_CALL,
>         RELO_DATA,
> -       RELO_EXTERN,
> +       RELO_EXTERN_VAR,
>         RELO_SUBPROG_ADDR,
>  };
>
> @@ -3463,7 +3463,7 @@ static int bpf_program__record_reloc(struct bpf_program *prog,
>                 }
>                 pr_debug("prog '%s': found extern #%d '%s' (sym %d) for insn #%u\n",
>                          prog->name, i, ext->name, ext->sym_idx, insn_idx);
> -               reloc_desc->type = RELO_EXTERN;
> +               reloc_desc->type = RELO_EXTERN_VAR;
>                 reloc_desc->insn_idx = insn_idx;
>                 reloc_desc->sym_off = i; /* sym_off stores extern index */
>                 return 0;
> @@ -6226,7 +6226,7 @@ bpf_object__relocate_data(struct bpf_object *obj, struct bpf_program *prog)
>                         insn[0].imm = obj->maps[relo->map_idx].fd;
>                         relo->processed = true;
>                         break;
> -               case RELO_EXTERN:
> +               case RELO_EXTERN_VAR:
>                         ext = &obj->externs[relo->sym_off];
>                         if (ext->type == EXT_KCFG) {
>                                 insn[0].src_reg = BPF_PSEUDO_MAP_VALUE;
> --
> 2.30.2
>


* Re: [PATCH bpf-next 11/15] libbpf: Record extern sym relocation first
  2021-03-16  1:14 ` [PATCH bpf-next 11/15] libbpf: Record extern sym relocation first Martin KaFai Lau
@ 2021-03-19  3:16   ` Andrii Nakryiko
  0 siblings, 0 replies; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-19  3:16 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Tue, Mar 16, 2021 at 12:02 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> This patch records the extern sym relocs first before recording
> subprog relocs.  The later patch will have relocs for extern
> kernel function call which is also using BPF_JMP | BPF_CALL.
> It will be easier to handle the extern symbols first in
> the later patch.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---

Looks good; let's just add that tiny helper for cleanliness and to
match what we do for ldimm64.

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  tools/lib/bpf/libbpf.c | 50 +++++++++++++++++++++---------------------
>  1 file changed, 25 insertions(+), 25 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 8f924aece736..0a60fcb2fba2 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -3416,31 +3416,7 @@ static int bpf_program__record_reloc(struct bpf_program *prog,
>
>         reloc_desc->processed = false;
>
> -       /* sub-program call relocation */
> -       if (insn->code == (BPF_JMP | BPF_CALL)) {
> -               if (insn->src_reg != BPF_PSEUDO_CALL) {
> -                       pr_warn("prog '%s': incorrect bpf_call opcode\n", prog->name);
> -                       return -LIBBPF_ERRNO__RELOC;
> -               }
> -               /* text_shndx can be 0, if no default "main" program exists */
> -               if (!shdr_idx || shdr_idx != obj->efile.text_shndx) {
> -                       sym_sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, shdr_idx));
> -                       pr_warn("prog '%s': bad call relo against '%s' in section '%s'\n",
> -                               prog->name, sym_name, sym_sec_name);
> -                       return -LIBBPF_ERRNO__RELOC;
> -               }
> -               if (sym->st_value % BPF_INSN_SZ) {
> -                       pr_warn("prog '%s': bad call relo against '%s' at offset %zu\n",
> -                               prog->name, sym_name, (size_t)sym->st_value);
> -                       return -LIBBPF_ERRNO__RELOC;
> -               }
> -               reloc_desc->type = RELO_CALL;
> -               reloc_desc->insn_idx = insn_idx;
> -               reloc_desc->sym_off = sym->st_value;
> -               return 0;
> -       }
> -
> -       if (!is_ldimm64(insn)) {
> +       if (insn->code != (BPF_JMP | BPF_CALL) && !is_ldimm64(insn)) {
>                 pr_warn("prog '%s': invalid relo against '%s' for insns[%d].code 0x%x\n",
>                         prog->name, sym_name, insn_idx, insn->code);
>                 return -LIBBPF_ERRNO__RELOC;
> @@ -3469,6 +3445,30 @@ static int bpf_program__record_reloc(struct bpf_program *prog,
>                 return 0;
>         }
>
> +       /* sub-program call relocation */
> +       if (insn->code == (BPF_JMP | BPF_CALL)) {

Can you please add an is_call_insn() helper that checks this, similar to
how we now have is_ldimm64() (which should probably be is_ldimm64_insn()
for consistency)?
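
A rough sketch of what such helpers could look like. The opcode macros and
struct bpf_insn below are local stand-ins mirroring <linux/bpf.h> so that the
snippet builds on its own; is_ldimm64_insn() is only the name proposed above,
not an existing function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the UAPI opcode encodings in <linux/bpf.h>. */
#define BPF_LD   0x00
#define BPF_JMP  0x05
#define BPF_IMM  0x00
#define BPF_DW   0x18
#define BPF_CALL 0x80

/* Stand-in with the same fields as the UAPI struct bpf_insn. */
struct bpf_insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

/* True for any BPF_JMP | BPF_CALL instruction, whatever its src_reg. */
static bool is_call_insn(const struct bpf_insn *insn)
{
	return insn->code == (BPF_JMP | BPF_CALL);
}

/* True for the first half of a 64-bit immediate load. */
static bool is_ldimm64_insn(const struct bpf_insn *insn)
{
	return insn->code == (BPF_LD | BPF_IMM | BPF_DW);
}
```

With these, the opcode test in bpf_program__record_reloc() could read
"if (!is_call_insn(insn) && !is_ldimm64_insn(insn))".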

> +               if (insn->src_reg != BPF_PSEUDO_CALL) {
> +                       pr_warn("prog '%s': incorrect bpf_call opcode\n", prog->name);
> +                       return -LIBBPF_ERRNO__RELOC;
> +               }
> +               /* text_shndx can be 0, if no default "main" program exists */
> +               if (!shdr_idx || shdr_idx != obj->efile.text_shndx) {
> +                       sym_sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, shdr_idx));
> +                       pr_warn("prog '%s': bad call relo against '%s' in section '%s'\n",
> +                               prog->name, sym_name, sym_sec_name);
> +                       return -LIBBPF_ERRNO__RELOC;
> +               }
> +               if (sym->st_value % BPF_INSN_SZ) {
> +                       pr_warn("prog '%s': bad call relo against '%s' at offset %zu\n",
> +                               prog->name, sym_name, (size_t)sym->st_value);
> +                       return -LIBBPF_ERRNO__RELOC;
> +               }
> +               reloc_desc->type = RELO_CALL;
> +               reloc_desc->insn_idx = insn_idx;
> +               reloc_desc->sym_off = sym->st_value;
> +               return 0;
> +       }
> +
>         if (!shdr_idx || shdr_idx >= SHN_LORESERVE) {
>                 pr_warn("prog '%s': invalid relo against '%s' in special section 0x%x; forgot to initialize global var?..\n",
>                         prog->name, sym_name, shdr_idx);
> --
> 2.30.2
>


* Re: [PATCH bpf-next 12/15] libbpf: Support extern kernel function
  2021-03-16  1:14 ` [PATCH bpf-next 12/15] libbpf: Support extern kernel function Martin KaFai Lau
@ 2021-03-19  4:11   ` Andrii Nakryiko
  2021-03-19  5:06     ` Martin KaFai Lau
  0 siblings, 1 reply; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-19  4:11 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Tue, Mar 16, 2021 at 12:02 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> This patch is to make libbpf able to handle the following extern
> kernel function declaration and do the needed relocations before
> loading the bpf program to the kernel.
>
> extern int foo(struct sock *) __attribute__((section(".ksyms")));
>
> In the collect extern phase, the needed changes are made to
> bpf_object__collect_externs() and find_extern_btf_id() to collect
> functions.
>
> In the collect relo phase, it will record the kernel function
> call as RELO_EXTERN_FUNC.
>
> bpf_object__resolve_ksym_func_btf_id() is added to find the func
> btf_id of the running kernel.
>
> During actual relocation, it will patch the BPF_CALL instruction with
> src_reg = BPF_PSEUDO_FUNC_CALL and insn->imm set to the running
> kernel func's btf_id.
>
> btf_fixup_datasec() is changed also because a datasec may
> only have func and its size will be 0.  The "!size" test
> is postponed till it is confirmed there are vars.
> It also takes this chance to remove the
> "if (... || (t->size && t->size != size)) { return -ENOENT; }" test
> because t->size is zero at the point.
>
> The required LLVM patch: https://reviews.llvm.org/D93563
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---
>  tools/lib/bpf/btf.c    |  32 ++++++++----
>  tools/lib/bpf/btf.h    |   5 ++
>  tools/lib/bpf/libbpf.c | 113 +++++++++++++++++++++++++++++++++++++----
>  3 files changed, 129 insertions(+), 21 deletions(-)
>
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index 3aa58f2ac183..bb09b577c154 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -1108,7 +1108,7 @@ static int btf_fixup_datasec(struct bpf_object *obj, struct btf *btf,
>         const struct btf_type *t_var;
>         struct btf_var_secinfo *vsi;
>         const struct btf_var *var;
> -       int ret;
> +       int ret, nr_vars = 0;
>
>         if (!name) {
>                 pr_debug("No name found in string section for DATASEC kind.\n");
> @@ -1117,27 +1117,27 @@ static int btf_fixup_datasec(struct bpf_object *obj, struct btf *btf,
>
>         /* .extern datasec size and var offsets were set correctly during
>          * extern collection step, so just skip straight to sorting variables
> +        * One exception is the datasec may only have extern funcs,
> +        * t->size is 0 in this case.  This will be handled
> +        * with !nr_vars later.
>          */
>         if (t->size)
>                 goto sort_vars;
>
> -       ret = bpf_object__section_size(obj, name, &size);
> -       if (ret || !size || (t->size && t->size != size)) {
> -               pr_debug("Invalid size for section %s: %u bytes\n", name, size);
> -               return -ENOENT;
> -       }
> -
> -       t->size = size;
> +       bpf_object__section_size(obj, name, &size);

It's not great that we just ignore any errors here. ".ksyms" is a
special section, so it should be fine to just skip it by name and
leave the rest of the error handling intact.

>
>         for (i = 0, vsi = btf_var_secinfos(t); i < vars; i++, vsi++) {
>                 t_var = btf__type_by_id(btf, vsi->type);
> -               var = btf_var(t_var);
>
> -               if (!btf_is_var(t_var)) {
> -                       pr_debug("Non-VAR type seen in section %s\n", name);
> +               if (btf_is_func(t_var)) {
> +                       continue;

just

if (btf_is_func(t_var))
    continue;

no need for "else if" below

> +               } else if (!btf_is_var(t_var)) {
> +                       pr_debug("Non-VAR and Non-FUNC type seen in section %s\n", name);

nit: Non-FUNC -> non-FUNC

>                         return -EINVAL;
>                 }
>
> +               nr_vars++;
> +               var = btf_var(t_var);
>                 if (var->linkage == BTF_VAR_STATIC)
>                         continue;
>
> @@ -1157,6 +1157,16 @@ static int btf_fixup_datasec(struct bpf_object *obj, struct btf *btf,
>                 vsi->offset = off;
>         }
>
> +       if (!nr_vars)
> +               return 0;
> +
> +       if (!size) {
> +               pr_debug("Invalid size for section %s: %u bytes\n", name, size);
> +               return -ENOENT;
> +       }
> +
> +       t->size = size;
> +
>  sort_vars:
>         qsort(btf_var_secinfos(t), vars, sizeof(*vsi), compare_vsi_off);
>         return 0;
> diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
> index 029a9cfc8c2d..07d508b70497 100644
> --- a/tools/lib/bpf/btf.h
> +++ b/tools/lib/bpf/btf.h
> @@ -368,6 +368,11 @@ btf_var_secinfos(const struct btf_type *t)
>         return (struct btf_var_secinfo *)(t + 1);
>  }
>
> +static inline enum btf_func_linkage btf_func_linkage(const struct btf_type *t)
> +{
> +       return (enum btf_func_linkage)BTF_INFO_VLEN(t->info);
> +}

Exposing `enum btf_func_linkage` in libbpf API headers will cause
compilation errors for users on older systems. We went through a bunch
of pain with `enum bpf_stats_type` (and it is still causing pain for
C++), so I'd rather avoid more of it here. Can you please move it into
libbpf.c for now? It doesn't seem like a very popular function that
needs to be exposed to users.
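
As a sketch of the suggestion, the helper could be a static function private
to one .c file, so that no public header has to mention the enum. The macro,
enum and struct below are local stand-ins for what <linux/btf.h> provides,
included only so the snippet builds on its own:

```c
#include <stdint.h>

/* Stand-ins for definitions that normally come from <linux/btf.h>. */
#define BTF_INFO_VLEN(info) ((info) & 0xffff)

enum btf_func_linkage {
	BTF_FUNC_STATIC = 0,
	BTF_FUNC_GLOBAL = 1,
	BTF_FUNC_EXTERN = 2,
};

struct btf_type {
	uint32_t name_off;
	uint32_t info;
	union {
		uint32_t size;
		uint32_t type;
	};
};

/*
 * File-local helper: for a FUNC type, the vlen bits of info carry the
 * linkage. Being static, it is invisible to users of the public headers.
 */
static enum btf_func_linkage btf_func_linkage(const struct btf_type *t)
{
	return (enum btf_func_linkage)BTF_INFO_VLEN(t->info);
}
```

The body is the same as in the patch; only its placement and visibility
would change.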

> +
>  #ifdef __cplusplus
>  } /* extern "C" */
>  #endif
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 0a60fcb2fba2..49bda179bd93 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -190,6 +190,7 @@ enum reloc_type {
>         RELO_CALL,
>         RELO_DATA,
>         RELO_EXTERN_VAR,
> +       RELO_EXTERN_FUNC,
>         RELO_SUBPROG_ADDR,
>  };
>
> @@ -384,6 +385,7 @@ struct extern_desc {
>         int btf_id;
>         int sec_btf_id;
>         const char *name;
> +       const struct btf_type *btf_type;
>         bool is_set;
>         bool is_weak;
>         union {
> @@ -3022,7 +3024,7 @@ static bool sym_is_subprog(const GElf_Sym *sym, int text_shndx)
>  static int find_extern_btf_id(const struct btf *btf, const char *ext_name)
>  {
>         const struct btf_type *t;
> -       const char *var_name;
> +       const char *tname;
>         int i, n;
>
>         if (!btf)
> @@ -3032,14 +3034,18 @@ static int find_extern_btf_id(const struct btf *btf, const char *ext_name)
>         for (i = 1; i <= n; i++) {
>                 t = btf__type_by_id(btf, i);
>
> -               if (!btf_is_var(t))
> +               if (!btf_is_var(t) && !btf_is_func(t))
>                         continue;
>
> -               var_name = btf__name_by_offset(btf, t->name_off);
> -               if (strcmp(var_name, ext_name))
> +               tname = btf__name_by_offset(btf, t->name_off);
> +               if (strcmp(tname, ext_name))
>                         continue;
>
> -               if (btf_var(t)->linkage != BTF_VAR_GLOBAL_EXTERN)
> +               if (btf_is_var(t) &&
> +                   btf_var(t)->linkage != BTF_VAR_GLOBAL_EXTERN)
> +                       return -EINVAL;
> +
> +               if (btf_is_func(t) && btf_func_linkage(t) != BTF_FUNC_EXTERN)
>                         return -EINVAL;
>
>                 return i;
> @@ -3199,10 +3205,10 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
>                         return ext->btf_id;
>                 }
>                 t = btf__type_by_id(obj->btf, ext->btf_id);
> +               ext->btf_type = t;

ext->btf_type can always be derived from ext->btf_id and obj->btf, so
there is no need for it.

>                 ext->name = btf__name_by_offset(obj->btf, t->name_off);
>                 ext->sym_idx = i;
>                 ext->is_weak = GELF_ST_BIND(sym.st_info) == STB_WEAK;
> -
>                 ext->sec_btf_id = find_extern_sec_btf_id(obj->btf, ext->btf_id);
>                 if (ext->sec_btf_id <= 0) {
>                         pr_warn("failed to find BTF for extern '%s' [%d] section: %d\n",
> @@ -3212,6 +3218,34 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
>                 sec = (void *)btf__type_by_id(obj->btf, ext->sec_btf_id);
>                 sec_name = btf__name_by_offset(obj->btf, sec->name_off);
>
> +               if (btf_is_func(t)) {

There is KSYMS_SEC handling logic below; let's keep both func and
variable handling together there?

> +                       const struct btf_type *func_proto;
> +
> +                       func_proto = btf__type_by_id(obj->btf, t->type);
> +                       if (!func_proto || !btf_is_func_proto(func_proto)) {

This is implied by the BTF format itself, so the check seems a bit redundant.

> +                               pr_warn("extern function %s does not have a valid func_proto\n",
> +                                       ext->name);
> +                               return -EINVAL;
> +                       }
> +
> +                       if (ext->is_weak) {
> +                               pr_warn("extern weak function %s is unsupported\n",
> +                                       ext->name);
> +                               return -ENOTSUP;
> +                       }
> +
> +                       if (strcmp(sec_name, KSYMS_SEC)) {
> +                               pr_warn("extern function %s is only supported under %s section\n",
> +                                       ext->name, KSYMS_SEC);
> +                               return -ENOTSUP;
> +                       }
> +
> +                       ksym_sec = sec;
> +                       ext->type = EXT_KSYM;
> +                       ext->ksym.type_id = ext->btf_id;

there is skip_mods_and_typedefs in KSYMS_SEC section below, but it
won't have any effect on FUNC_PROTO, so existing logic can be used
as-is

> +                       continue;
> +               }
> +
>                 if (strcmp(sec_name, KCONFIG_SEC) == 0) {
>                         kcfg_sec = sec;
>                         ext->type = EXT_KCFG;

[...]

> +static int bpf_object__resolve_ksym_func_btf_id(struct bpf_object *obj,
> +                                               struct extern_desc *ext)
> +{
> +       int local_func_proto_id, kern_func_proto_id, kern_func_id;
> +       const struct btf_type *kern_func;
> +       struct btf *kern_btf = NULL;
> +       int ret, kern_btf_fd = 0;
> +
> +       local_func_proto_id = ext->btf_type->type;

Yeah, so this ext->btf_type can be retrieved with
btf__type_by_id(obj->btf, ext->btf_id) here; no need to pollute
extern_desc with an extra field.

> +
> +       kern_func_id = find_ksym_btf_id(obj, ext->name, BTF_KIND_FUNC,
> +                                       &kern_btf, &kern_btf_fd);
> +       if (kern_func_id < 0) {
> +               pr_warn("extern (func ksym) '%s': not found in kernel BTF\n",
> +                       ext->name);
> +               return kern_func_id;
> +       }
> +
> +       if (kern_btf != obj->btf_vmlinux) {
> +               pr_warn("extern (func ksym) '%s': function in kernel module is not supported\n",
> +                       ext->name);
> +               return -ENOTSUP;
> +       }
> +

[...]


* Re: [PATCH bpf-next 02/15] bpf: btf: Support parsing extern func
  2021-03-18 23:39     ` Martin KaFai Lau
@ 2021-03-19  4:13       ` Andrii Nakryiko
  2021-03-19  5:29         ` Martin KaFai Lau
  0 siblings, 1 reply; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-19  4:13 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Thu, Mar 18, 2021 at 4:39 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Thu, Mar 18, 2021 at 03:53:38PM -0700, Andrii Nakryiko wrote:
> > On Tue, Mar 16, 2021 at 12:01 AM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > This patch makes BTF verifier to accept extern func. It is used for
> > > allowing bpf program to call a limited set of kernel functions
> > > in a later patch.
> > >
> > > When writing bpf prog, the extern kernel function needs
> > > to be declared under a ELF section (".ksyms") which is
> > > the same as the current extern kernel variables and that should
> > > keep its usage consistent without requiring to remember another
> > > section name.
> > >
> > > For example, in a bpf_prog.c:
> > >
> > > > extern int foo(struct sock *) __attribute__((section(".ksyms")));
> > >
> > > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> > >         '(anon)' type_id=18
> > > [25] FUNC 'foo' type_id=24 linkage=extern
> > > [ ... ]
> > > [33] DATASEC '.ksyms' size=0 vlen=1
> > >         type_id=25 offset=0 size=0
> > >
> > > LLVM will put the "func" type into the BTF datasec ".ksyms".
> > > The current "btf_datasec_check_meta()" assumes everything under
> > > it is a "var" and ensures it has non-zero size ("!vsi->size" test).
> > > The non-zero size check is not true for "func".  This patch postpones the
> > > > "!vsi->size" test from "btf_datasec_check_meta()" to
> > > "btf_datasec_resolve()" which has all types collected to decide
> > > if a vsi is a "var" or a "func" and then enforce the "vsi->size"
> > > differently.
> > >
> > > If the datasec only has "func", its "t->size" could be zero.
> > > Thus, the current "!t->size" test is no longer valid.  The
> > > invalid "t->size" will still be caught by the later
> > > "last_vsi_end_off > t->size" check.   This patch also takes this
> > > > chance to consolidate other "t->size" tests ("vsi->offset >= t->size",
> > > "vsi->size > t->size", and "t->size < sum") into the existing
> > > "last_vsi_end_off > t->size" test.
> > >
> > > The LLVM will also put those extern kernel function as an extern
> > > linkage func in the BTF:
> > >
> > > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> > >         '(anon)' type_id=18
> > > [25] FUNC 'foo' type_id=24 linkage=extern
> > >
> > > This patch allows BTF_FUNC_EXTERN in btf_func_check_meta().
> > > > Also extern kernel function declaration does not
> > > > necessarily have arg name. Another change in btf_func_check() is
> > > > to allow extern function having no arg name.
> > >
> > > The btf selftest is adjusted accordingly.  New tests are also added.
> > >
> > > The required LLVM patch: https://reviews.llvm.org/D93563
> > >
> > > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > > ---
> >
> > High-level question about EXTERN functions in DATASEC. Does kernel
> > need to see them under DATASEC? What if libbpf just removed all EXTERN
> > funcs from under DATASEC and leave them as "free-floating" EXTERN
> > FUNCs in BTF.
> >
> > We need to tag EXTERNs with DATASECs mainly for libbpf to know whether
> > it's .kconfig or .ksym or other type of externs. Does kernel need to
> > care?
> Although the kernel does not need to know, since a legit llvm generates it,
> I go with proper support in the kernel (e.g. bpftool btf dump can better
> reflect what was there).

LLVM also generates extern VARs with BTF_VAR_GLOBAL_EXTERN linkage, yet
libbpf replaces them with fake INTs. We could do just that here as well.
If anyone wants to know all the kernel functions that some BPF
program is using, they can get that from the instruction dump, with
proper addresses and kernel function names nicely displayed there.
That's way more useful, IMO.

>
> >
> > >  kernel/bpf/btf.c                             |  52 ++++---
> > >  tools/testing/selftests/bpf/prog_tests/btf.c | 154 ++++++++++++++++++-
> > >  2 files changed, 178 insertions(+), 28 deletions(-)
> > >
> >
> > [...]
> >
> > > @@ -3611,9 +3594,28 @@ static int btf_datasec_resolve(struct btf_verifier_env *env,
> > >                 u32 var_type_id = vsi->type, type_id, type_size = 0;
> > >                 const struct btf_type *var_type = btf_type_by_id(env->btf,
> > >                                                                  var_type_id);
> > > -               if (!var_type || !btf_type_is_var(var_type)) {
> > > +               if (!var_type) {
> > > +                       btf_verifier_log_vsi(env, v->t, vsi,
> > > +                                            "type not found");
> > > +                       return -EINVAL;
> > > +               }
> > > +
> > > +               if (btf_type_is_func(var_type)) {
> > > +                       if (vsi->size || vsi->offset) {
> > > +                               btf_verifier_log_vsi(env, v->t, vsi,
> > > +                                                    "Invalid size/offset");
> > > +                               return -EINVAL;
> > > +                       }
> > > +                       continue;
> > > +               } else if (btf_type_is_var(var_type)) {
> > > +                       if (!vsi->size) {
> > > +                               btf_verifier_log_vsi(env, v->t, vsi,
> > > +                                                    "Invalid size");
> > > +                               return -EINVAL;
> > > +                       }
> > > +               } else {
> > >                         btf_verifier_log_vsi(env, v->t, vsi,
> > > -                                            "Not a VAR kind member");
> > > +                                            "Neither a VAR nor a FUNC");
> > >                         return -EINVAL;
> >
> > can you please structure it as follows (I think it is a bit easier to
> > follow the logic then):
> >
> > if (btf_type_is_func()) {
> >    ...
> >    continue; /* no extra checks */
> > }
> >
> > if (!btf_type_is_var()) {
> >    /* bad, complain, exit */
> >    return -EINVAL;
> > }
> >
> > /* now we deal with extra checks for variables */
> >
> > That way variable checks are kept all in one place.
> >
> > Also a question: is it ok to allow non-extern functions under
> > DATASEC? Maybe, but that wasn't explicitly mentioned.
> The patch does not check.  We could reject that for now.
>
> >
> > >                 }
> > >
> > > @@ -3849,9 +3851,11 @@ static int btf_func_check(struct btf_verifier_env *env,
> > >         const struct btf_param *args;
> > >         const struct btf *btf;
> > >         u16 nr_args, i;
> > > +       bool is_extern;
> > >
> > >         btf = env->btf;
> > >         proto_type = btf_type_by_id(btf, t->type);
> > > +       is_extern = btf_type_vlen(t) == BTF_FUNC_EXTERN;
> >
> > using btf_type_vlen(t) for getting func linkage is becoming more and
> > more confusing. Would it be terrible to have btf_func_linkage(t)
> > helper instead?
> I have it in my local v2, which also just returns when it is extern.
>
> >
> > >
> > >         if (!proto_type || !btf_type_is_func_proto(proto_type)) {
> > >                 btf_verifier_log_type(env, t, "Invalid type_id");
> > > @@ -3861,7 +3865,7 @@ static int btf_func_check(struct btf_verifier_env *env,
> > >         args = (const struct btf_param *)(proto_type + 1);
> > >         nr_args = btf_type_vlen(proto_type);
> > >         for (i = 0; i < nr_args; i++) {
> > > -               if (!args[i].name_off && args[i].type) {
> > > +               if (!is_extern && !args[i].name_off && args[i].type) {
> > >                         btf_verifier_log_type(env, t, "Invalid arg#%u", i + 1);
> > >                         return -EINVAL;
> > >                 }
> >
> > [...]
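For reference, a minimal userspace sketch of the check ordering suggested
above: FUNC members are handled first and skipped, anything that is not a
VAR is rejected, and the variable-only checks come last. The types and
names (struct member, check_datasec_member, func_linkage) are hypothetical
stand-ins, not the kernel's. Rejecting non-extern FUNCs is only the
possibility raised in the reply, not something the posted patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel's BTF types. */
enum member_kind { KIND_VAR, KIND_FUNC, KIND_OTHER };
enum func_linkage { FUNC_STATIC = 0, FUNC_GLOBAL = 1, FUNC_EXTERN = 2 };

struct member {
	enum member_kind kind; /* kind of the type the secinfo points at */
	uint32_t info;         /* for FUNC: linkage is kept in the low 16 bits */
	uint32_t offset;
	uint32_t size;
};

/* Named accessor, so call sites do not read "vlen" to mean linkage. */
static enum func_linkage func_linkage(uint32_t info)
{
	return (enum func_linkage)(info & 0xffff);
}

/* Returns 0 if the member is acceptable, -EINVAL otherwise. */
static int check_datasec_member(const struct member *m)
{
	assert(m);

	if (m->kind == KIND_FUNC) {
		/* Assumption: only extern funcs are allowed under DATASEC. */
		if (func_linkage(m->info) != FUNC_EXTERN)
			return -EINVAL;
		/* A func occupies no space in the section. */
		if (m->size || m->offset)
			return -EINVAL;
		return 0; /* no extra checks */
	}

	if (m->kind != KIND_VAR)
		return -EINVAL; /* neither a VAR nor a FUNC */

	/* From here on, only checks that apply to variables. */
	if (!m->size)
		return -EINVAL;

	return 0;
}
```

Keeping the FUNC branch self-contained means later variable checks can be
added at the bottom without touching the function path.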

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH bpf-next 13/15] bpf: selftests: Rename bictcp to bpf_cubic
  2021-03-16  1:14 ` [PATCH bpf-next 13/15] bpf: selftests: Rename bictcp to bpf_cubic Martin KaFai Lau
@ 2021-03-19  4:14   ` Andrii Nakryiko
  0 siblings, 0 replies; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-19  4:14 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Tue, Mar 16, 2021 at 12:02 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> As a similar change in the kernel, this patch gives the proper
> name to the bpf cubic.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  tools/testing/selftests/bpf/progs/bpf_cubic.c | 30 +++++++++----------
>  1 file changed, 15 insertions(+), 15 deletions(-)
>

[...]


* Re: [PATCH bpf-next 14/15] bpf: selftest: bpf_cubic and bpf_dctcp calling kernel functions
  2021-03-16  1:15 ` [PATCH bpf-next 14/15] bpf: selftest: bpf_cubic and bpf_dctcp calling kernel functions Martin KaFai Lau
@ 2021-03-19  4:15   ` Andrii Nakryiko
  0 siblings, 0 replies; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-19  4:15 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Tue, Mar 16, 2021 at 12:02 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> This patch removes the bpf implementation of tcp_slow_start()
> and tcp_cong_avoid_ai().  Instead, it directly uses the kernel
> implementation.
>
> It also replaces the bpf_cubic_undo_cwnd implementation by directly
> calling tcp_reno_undo_cwnd().  bpf_dctcp also directly calls
> tcp_reno_cong_avoid() instead.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---

This is awesome.

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  tools/testing/selftests/bpf/bpf_tcp_helpers.h | 29 ++-----------------
>  tools/testing/selftests/bpf/progs/bpf_cubic.c |  6 ++--
>  tools/testing/selftests/bpf/progs/bpf_dctcp.c | 22 ++++----------
>  3 files changed, 11 insertions(+), 46 deletions(-)
>

[...]


* Re: [PATCH bpf-next 15/15] bpf: selftest: Add kfunc_call test
  2021-03-16  1:15 ` [PATCH bpf-next 15/15] bpf: selftest: Add kfunc_call test Martin KaFai Lau
  2021-03-16  3:39   ` kernel test robot
@ 2021-03-19  4:21   ` Andrii Nakryiko
  2021-03-19  5:40     ` Martin KaFai Lau
  1 sibling, 1 reply; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-19  4:21 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Tue, Mar 16, 2021 at 12:02 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> This patch adds two kernel functions bpf_kfunc_call_test[12]() for the
> selftest's test_run purpose.  They will be allowed for tc_cls prog.
>
> The selftest calling the kernel function bpf_kfunc_call_test[12]()
> is also added in this patch.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---
>  net/bpf/test_run.c                            | 11 ++++
>  net/core/filter.c                             | 11 ++++
>  .../selftests/bpf/prog_tests/kfunc_call.c     | 61 +++++++++++++++++++
>  .../selftests/bpf/progs/kfunc_call_test.c     | 48 +++++++++++++++
>  .../bpf/progs/kfunc_call_test_subprog.c       | 31 ++++++++++
>  5 files changed, 162 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/kfunc_call.c
>  create mode 100644 tools/testing/selftests/bpf/progs/kfunc_call_test.c
>  create mode 100644 tools/testing/selftests/bpf/progs/kfunc_call_test_subprog.c
>
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 0abdd67f44b1..c1baab0c7d96 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -209,6 +209,17 @@ int noinline bpf_modify_return_test(int a, int *b)
>         *b += 1;
>         return a + *b;
>  }
> +
> +u64 noinline bpf_kfunc_call_test1(struct sock *sk, u32 a, u64 b, u32 c, u64 d)
> +{
> +       return a + b + c + d;
> +}
> +
> +int noinline bpf_kfunc_call_test2(struct sock *sk, u32 a, u32 b)
> +{
> +       return a + b;
> +}
> +
>  __diag_pop();
>
>  ALLOW_ERROR_INJECTION(bpf_modify_return_test, ERRNO);
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 10dac9dd5086..605fbbdd694b 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -9799,12 +9799,23 @@ const struct bpf_prog_ops sk_filter_prog_ops = {
>         .test_run               = bpf_prog_test_run_skb,
>  };
>
> +BTF_SET_START(bpf_tc_cls_kfunc_ids)
> +BTF_ID(func, bpf_kfunc_call_test1)
> +BTF_ID(func, bpf_kfunc_call_test2)
> +BTF_SET_END(bpf_tc_cls_kfunc_ids)
> +
> +static bool tc_cls_check_kern_func_call(u32 kfunc_id)
> +{
> +       return btf_id_set_contains(&bpf_tc_cls_kfunc_ids, kfunc_id);
> +}
> +
>  const struct bpf_verifier_ops tc_cls_act_verifier_ops = {
>         .get_func_proto         = tc_cls_act_func_proto,
>         .is_valid_access        = tc_cls_act_is_valid_access,
>         .convert_ctx_access     = tc_cls_act_convert_ctx_access,
>         .gen_prologue           = tc_cls_act_prologue,
>         .gen_ld_abs             = bpf_gen_ld_abs,
> +       .check_kern_func_call   = tc_cls_check_kern_func_call,
>  };
>
>  const struct bpf_prog_ops tc_cls_act_prog_ops = {
> diff --git a/tools/testing/selftests/bpf/prog_tests/kfunc_call.c b/tools/testing/selftests/bpf/prog_tests/kfunc_call.c
> new file mode 100644
> index 000000000000..3850e6cc0a7d
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/kfunc_call.c
> @@ -0,0 +1,61 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2021 Facebook */
> +#include <test_progs.h>
> +#include <network_helpers.h>
> +#include "kfunc_call_test.skel.h"
> +#include "kfunc_call_test_subprog.skel.h"
> +
> +static __u32 duration;
> +

you shouldn't need it, you don't use CHECK()s

> +static void test_main(void)
> +{
> +       struct kfunc_call_test *skel;
> +       int prog_fd, retval, err;
> +
> +       skel = kfunc_call_test__open_and_load();
> +       if (!ASSERT_OK_PTR(skel, "skel"))
> +               return;
> +
> +       prog_fd = bpf_program__fd(skel->progs.kfunc_call_test1);
> +       err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
> +                               NULL, NULL, (__u32 *)&retval, &duration);
> +
> +       if (ASSERT_OK(err, "bpf_prog_test_run(test1)"))
> +               ASSERT_EQ(retval, 12, "test1-retval");

there is no harm in doing retval check unconditionally. If something
goes wrong, you'll both know that err != 0 and what retval you got (if
you ever care, but if not, it doesn't hurt either). Same below.

> +
> +       prog_fd = bpf_program__fd(skel->progs.kfunc_call_test2);
> +       err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
> +                               NULL, NULL, (__u32 *)&retval, &duration);
> +       if (ASSERT_OK(err, "bpf_prog_test_run(test2)"))
> +               ASSERT_EQ(retval, 3, "test2-retval");
> +
> +       kfunc_call_test__destroy(skel);
> +}
> +
> +static void test_subprog(void)
> +{
> +       struct kfunc_call_test_subprog *skel;
> +       int prog_fd, retval, err;
> +
> +       skel = kfunc_call_test_subprog__open_and_load();
> +       if (!ASSERT_OK_PTR(skel, "skel"))
> +               return;
> +
> +       prog_fd = bpf_program__fd(skel->progs.kfunc_call_test1);
> +       err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
> +                               NULL, NULL, (__u32 *)&retval, &duration);
> +
> +       if (ASSERT_OK(err, "bpf_prog_test_run(test1)"))
> +               ASSERT_EQ(retval, 10, "test1-retval");
> +
> +       kfunc_call_test_subprog__destroy(skel);
> +}
> +
> +void test_kfunc_call(void)
> +{
> +       if (test__start_subtest("main"))
> +               test_main();
> +
> +       if (test__start_subtest("subprog"))
> +               test_subprog();
> +}
> diff --git a/tools/testing/selftests/bpf/progs/kfunc_call_test.c b/tools/testing/selftests/bpf/progs/kfunc_call_test.c
> new file mode 100644
> index 000000000000..ea8c5266efd8
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/kfunc_call_test.c
> @@ -0,0 +1,48 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2021 Facebook */
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +#include "bpf_tcp_helpers.h"
> +
> +extern __u64 bpf_kfunc_call_test1(struct sock *sk, __u32 a, __u64 b,
> +                                 __u32 c, __u64 d) __ksym;
> +extern int bpf_kfunc_call_test2(struct sock *sk, __u32 a, __u32 b) __ksym;
> +
> +SEC("classifier/test2")
> +int kfunc_call_test2(struct __sk_buff *skb)
> +{
> +       struct bpf_sock *sk = skb->sk;
> +
> +       if (!sk)
> +               return -1;
> +
> +       sk = bpf_sk_fullsock(sk);
> +       if (!sk)
> +               return -1;
> +
> +       return bpf_kfunc_call_test2((struct sock *)sk, 1, 2);
> +}
> +
> +SEC("classifier/test1")

please use just SEC("classifier") here and above, libbpf will handle
that properly

> +int kfunc_call_test1(struct __sk_buff *skb)
> +{
> +       struct bpf_sock *sk = skb->sk;
> +       __u64 a = 1ULL << 32;
> +       __u32 ret;
> +
> +       if (!sk)
> +               return -1;
> +
> +       sk = bpf_sk_fullsock(sk);
> +       if (!sk)
> +               return -1;
> +
> +       a = bpf_kfunc_call_test1((struct sock *)sk, 1, a | 2, 3, a | 4);
> +
> +       ret = a >> 32;   /* ret should be 2 */
> +       ret += (__u32)a; /* ret should be 12 */
> +
> +       return ret;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> diff --git a/tools/testing/selftests/bpf/progs/kfunc_call_test_subprog.c b/tools/testing/selftests/bpf/progs/kfunc_call_test_subprog.c
> new file mode 100644
> index 000000000000..9bf66f8c826e
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/kfunc_call_test_subprog.c
> @@ -0,0 +1,31 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2021 Facebook */
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +#include "bpf_tcp_helpers.h"
> +
> +extern __u64 bpf_kfunc_call_test1(struct sock *sk, __u32 a, __u64 b,
> +                                 __u32 c, __u64 d) __ksym;
> +
> +__attribute__ ((noinline))

__noinline

> +int f1(struct __sk_buff *skb)
> +{
> +       struct bpf_sock *sk = skb->sk;
> +
> +       if (!sk)
> +               return -1;
> +
> +       sk = bpf_sk_fullsock(sk);
> +       if (!sk)
> +               return -1;
> +
> +       return (__u32)bpf_kfunc_call_test1((struct sock *)sk, 1, 2, 3, 4);
> +}
> +
> +SEC("classifier/test1_subprog")

same, just "classifier"

> +int kfunc_call_test1(struct __sk_buff *skb)
> +{
> +       return f1(skb);
> +}
> +
> +char _license[] SEC("license") = "GPL";
> --
> 2.30.2
>
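The expected retvals in the selftest (12, 3 and 10) can be reproduced in
plain userspace C. This is only a sketch of the arithmetic: the struct sock
argument is dropped because the test kfuncs never use it, and the names
below (kfunc_test1, prog_test1, ...) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copies of the two test kfuncs, minus the unused sock pointer. */
static uint64_t kfunc_test1(uint32_t a, uint64_t b, uint32_t c, uint64_t d)
{
	return a + b + c + d;
}

static int kfunc_test2(uint32_t a, uint32_t b)
{
	return a + b;
}

/* Mirrors the body of kfunc_call_test1() in kfunc_call_test.c. */
static uint32_t prog_test1(void)
{
	uint64_t a = 1ULL << 32;
	uint32_t ret;

	/* 1 + (2^32 + 2) + 3 + (2^32 + 4) = 2 * 2^32 + 10 */
	a = kfunc_test1(1, a | 2, 3, a | 4);
	assert((a >> 32) == 2); /* both 64-bit args kept their high bits */

	ret = a >> 32;      /* high half: 2 */
	ret += (uint32_t)a; /* low half: 10, so 12 in total */

	return ret;
}

/* Mirrors f1() in kfunc_call_test_subprog.c: 1 + 2 + 3 + 4. */
static uint32_t subprog_test1(void)
{
	return (uint32_t)kfunc_test1(1, 2, 3, 4);
}
```

Setting bit 32 in the two u64 arguments is what makes test1 useful: the
result is 12 only if both 64-bit arguments and the 64-bit return value
pass through the call without being truncated to 32 bits.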


* Re: [PATCH bpf-next 12/15] libbpf: Support extern kernel function
  2021-03-19  4:11   ` Andrii Nakryiko
@ 2021-03-19  5:06     ` Martin KaFai Lau
  2021-03-19 21:38       ` Andrii Nakryiko
  0 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-19  5:06 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Thu, Mar 18, 2021 at 09:11:39PM -0700, Andrii Nakryiko wrote:
> On Tue, Mar 16, 2021 at 12:02 AM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > This patch is to make libbpf able to handle the following extern
> > kernel function declaration and do the needed relocations before
> > loading the bpf program to the kernel.
> >
> > extern int foo(struct sock *) __attribute__((section(".ksyms")))
> >
> > In the collect extern phase, the needed changes are made to
> > bpf_object__collect_externs() and find_extern_btf_id() to collect
> > functions.
> >
> > In the collect relo phase, it will record the kernel function
> > call as RELO_EXTERN_FUNC.
> >
> > bpf_object__resolve_ksym_func_btf_id() is added to find the func
> > btf_id of the running kernel.
> >
> > During actual relocation, it will patch the BPF_CALL instruction with
> > src_reg = BPF_PSEUDO_FUNC_CALL and insn->imm set to the running
> > kernel func's btf_id.
> >
> > btf_fixup_datasec() is also changed because a datasec may
> > only have funcs, in which case its size will be 0.  The "!size" test
> > is postponed until it is confirmed there are vars.
> > It also takes this chance to remove the
> > "if (... || (t->size && t->size != size)) { return -ENOENT; }" test
> > because t->size is zero at that point.
> >
> > The required LLVM patch: https://reviews.llvm.org/D93563 
> >
> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > ---
> >  tools/lib/bpf/btf.c    |  32 ++++++++----
> >  tools/lib/bpf/btf.h    |   5 ++
> >  tools/lib/bpf/libbpf.c | 113 +++++++++++++++++++++++++++++++++++++----
> >  3 files changed, 129 insertions(+), 21 deletions(-)
> >
> > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > index 3aa58f2ac183..bb09b577c154 100644
> > --- a/tools/lib/bpf/btf.c
> > +++ b/tools/lib/bpf/btf.c
> > @@ -1108,7 +1108,7 @@ static int btf_fixup_datasec(struct bpf_object *obj, struct btf *btf,
> >         const struct btf_type *t_var;
> >         struct btf_var_secinfo *vsi;
> >         const struct btf_var *var;
> > -       int ret;
> > +       int ret, nr_vars = 0;
> >
> >         if (!name) {
> >                 pr_debug("No name found in string section for DATASEC kind.\n");
> > @@ -1117,27 +1117,27 @@ static int btf_fixup_datasec(struct bpf_object *obj, struct btf *btf,
> >
> >         /* .extern datasec size and var offsets were set correctly during
> >          * extern collection step, so just skip straight to sorting variables
> > +        * One exception is the datasec may only have extern funcs,
> > +        * t->size is 0 in this case.  This will be handled
> > +        * with !nr_vars later.
> >          */
> >         if (t->size)
> >                 goto sort_vars;
> >
> > -       ret = bpf_object__section_size(obj, name, &size);
> > -       if (ret || !size || (t->size && t->size != size)) {
> > -               pr_debug("Invalid size for section %s: %u bytes\n", name, size);
> > -               return -ENOENT;
> > -       }
> > -
> > -       t->size = size;
> > +       bpf_object__section_size(obj, name, &size);
> 
> So it's not great that we just ignore any errors here. ".ksyms" is a
> special section, so it should be fine to just ignore it by name and
> leave the rest of error handling intact.
The ret < 0 case? In that case, size is 0.

Or are there cases where a section has no vars but the size should not be 0?

> 
> >
> >         for (i = 0, vsi = btf_var_secinfos(t); i < vars; i++, vsi++) {
> >                 t_var = btf__type_by_id(btf, vsi->type);
> > -               var = btf_var(t_var);
> >
> > -               if (!btf_is_var(t_var)) {
> > -                       pr_debug("Non-VAR type seen in section %s\n", name);
> > +               if (btf_is_func(t_var)) {
> > +                       continue;
> 
> just
> 
> if (btf_is_func(t_var))
>     continue;
> 
> no need for "else if" below
> 
> > +               } else if (!btf_is_var(t_var)) {
> > +                       pr_debug("Non-VAR and Non-FUNC type seen in section %s\n", name);
> 
> nit: Non-FUNC -> non-FUNC
> 
> >                         return -EINVAL;
> >                 }
> >
> > +               nr_vars++;
> > +               var = btf_var(t_var);
> >                 if (var->linkage == BTF_VAR_STATIC)
> >                         continue;
> >
> > @@ -1157,6 +1157,16 @@ static int btf_fixup_datasec(struct bpf_object *obj, struct btf *btf,
> >                 vsi->offset = off;
> >         }
> >
> > +       if (!nr_vars)
> > +               return 0;
> > +
> > +       if (!size) {
> > +               pr_debug("Invalid size for section %s: %u bytes\n", name, size);
> > +               return -ENOENT;
> > +       }
> > +
> > +       t->size = size;
> > +
> >  sort_vars:
> >         qsort(btf_var_secinfos(t), vars, sizeof(*vsi), compare_vsi_off);
> >         return 0;
> > diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
> > index 029a9cfc8c2d..07d508b70497 100644
> > --- a/tools/lib/bpf/btf.h
> > +++ b/tools/lib/bpf/btf.h
> > @@ -368,6 +368,11 @@ btf_var_secinfos(const struct btf_type *t)
> >         return (struct btf_var_secinfo *)(t + 1);
> >  }
> >
> > +static inline enum btf_func_linkage btf_func_linkage(const struct btf_type *t)
> > +{
> > +       return (enum btf_func_linkage)BTF_INFO_VLEN(t->info);
> > +}
> 
> exposing `enum btf_func_linkage` in libbpf API headers will cause
> compilation errors for users on older systems. We went through a bunch
> of pain with `enum bpf_stats_type` (and it is still causing pain for
> C++), I'd rather avoid some more here. Can you please move it into
> libbpf.c for now. It doesn't seem like a very popular function that
> needs to be exposed to users.
will do.

> 
> > +
> >  #ifdef __cplusplus
> >  } /* extern "C" */
> >  #endif
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index 0a60fcb2fba2..49bda179bd93 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -190,6 +190,7 @@ enum reloc_type {
> >         RELO_CALL,
> >         RELO_DATA,
> >         RELO_EXTERN_VAR,
> > +       RELO_EXTERN_FUNC,
> >         RELO_SUBPROG_ADDR,
> >  };
> >
> > @@ -384,6 +385,7 @@ struct extern_desc {
> >         int btf_id;
> >         int sec_btf_id;
> >         const char *name;
> > +       const struct btf_type *btf_type;
> >         bool is_set;
> >         bool is_weak;
> >         union {
> > @@ -3022,7 +3024,7 @@ static bool sym_is_subprog(const GElf_Sym *sym, int text_shndx)
> >  static int find_extern_btf_id(const struct btf *btf, const char *ext_name)
> >  {
> >         const struct btf_type *t;
> > -       const char *var_name;
> > +       const char *tname;
> >         int i, n;
> >
> >         if (!btf)
> > @@ -3032,14 +3034,18 @@ static int find_extern_btf_id(const struct btf *btf, const char *ext_name)
> >         for (i = 1; i <= n; i++) {
> >                 t = btf__type_by_id(btf, i);
> >
> > -               if (!btf_is_var(t))
> > +               if (!btf_is_var(t) && !btf_is_func(t))
> >                         continue;
> >
> > -               var_name = btf__name_by_offset(btf, t->name_off);
> > -               if (strcmp(var_name, ext_name))
> > +               tname = btf__name_by_offset(btf, t->name_off);
> > +               if (strcmp(tname, ext_name))
> >                         continue;
> >
> > -               if (btf_var(t)->linkage != BTF_VAR_GLOBAL_EXTERN)
> > +               if (btf_is_var(t) &&
> > +                   btf_var(t)->linkage != BTF_VAR_GLOBAL_EXTERN)
> > +                       return -EINVAL;
> > +
> > +               if (btf_is_func(t) && btf_func_linkage(t) != BTF_FUNC_EXTERN)
> >                         return -EINVAL;
> >
> >                 return i;
> > @@ -3199,10 +3205,10 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> >                         return ext->btf_id;
> >                 }
> >                 t = btf__type_by_id(obj->btf, ext->btf_id);
> > +               ext->btf_type = t;
> 
> ext->btf_type is derived from ext->btf_id and obj->btf (always), so
> there is no need for it
It is for easier btf_is_var() check later instead of going through
another btf__type_by_id().

yeah, I will make a few btf__type_by_id() calls in v2.

> 
> >                 ext->name = btf__name_by_offset(obj->btf, t->name_off);
> >                 ext->sym_idx = i;
> >                 ext->is_weak = GELF_ST_BIND(sym.st_info) == STB_WEAK;
> > -
> >                 ext->sec_btf_id = find_extern_sec_btf_id(obj->btf, ext->btf_id);
> >                 if (ext->sec_btf_id <= 0) {
> >                         pr_warn("failed to find BTF for extern '%s' [%d] section: %d\n",
> > @@ -3212,6 +3218,34 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> >                 sec = (void *)btf__type_by_id(obj->btf, ext->sec_btf_id);
> >                 sec_name = btf__name_by_offset(obj->btf, sec->name_off);
> >
> > +               if (btf_is_func(t)) {
> 
> there is a KSYMS_SEC handling logic below, let's keep both func and
> variables handling together there?
It is to keep the indentation manageable,
and also most of what is done here is not
shareable with variables.

Sure. I can move it there.

> 
> > +                       const struct btf_type *func_proto;
> > +
> > +                       func_proto = btf__type_by_id(obj->btf, t->type);
> > +                       if (!func_proto || !btf_is_func_proto(func_proto)) {
> 
> this is implied by BTF format itself, seems a bit redundant
It has already been checked?

> 
> > +                               pr_warn("extern function %s does not have a valid func_proto\n",
> > +                                       ext->name);
> > +                               return -EINVAL;
> > +                       }
> > +
> > +                       if (ext->is_weak) {
> > +                               pr_warn("extern weak function %s is unsupported\n",
> > +                                       ext->name);
> > +                               return -ENOTSUP;
> > +                       }
> > +
> > +                       if (strcmp(sec_name, KSYMS_SEC)) {
> > +                               pr_warn("extern function %s is only supported under %s section\n",
> > +                                       ext->name, KSYMS_SEC);
> > +                               return -ENOTSUP;
> > +                       }
> > +
> > +                       ksym_sec = sec;
> > +                       ext->type = EXT_KSYM;
> > +                       ext->ksym.type_id = ext->btf_id;
> 
> there is skip_mods_and_typedefs in KSYMS_SEC section below, but it
> won't have any effect on FUNC_PROTO, so existing logic can be used
> as-is
func id is used here to keep what ksyms.type_id means:
/* local btf_id of the ksym extern's type. */

The kernel extern type here should be func instead of func_proto.
func_proto cannot be extern.

> 
> > +                       continue;
> > +               }
> > +
> >                 if (strcmp(sec_name, KCONFIG_SEC) == 0) {
> >                         kcfg_sec = sec;
> >                         ext->type = EXT_KCFG;
> 
> [...]
> 
> > +static int bpf_object__resolve_ksym_func_btf_id(struct bpf_object *obj,
> > +                                               struct extern_desc *ext)
> > +{
> > +       int local_func_proto_id, kern_func_proto_id, kern_func_id;
> > +       const struct btf_type *kern_func;
> > +       struct btf *kern_btf = NULL;
> > +       int ret, kern_btf_fd = 0;
> > +
> > +       local_func_proto_id = ext->btf_type->type;
> 
> yeah, so this ext->btf_type can be retrieved with
> btf__type_by_id(obj->btf, ext->btf_id) here, no need to pollute
> extern_desc with extra field
> 
> > +
> > +       kern_func_id = find_ksym_btf_id(obj, ext->name, BTF_KIND_FUNC,
> > +                                       &kern_btf, &kern_btf_fd);
> > +       if (kern_func_id < 0) {
> > +               pr_warn("extern (func ksym) '%s': not found in kernel BTF\n",
> > +                       ext->name);
> > +               return kern_func_id;
> > +       }
> > +
> > +       if (kern_btf != obj->btf_vmlinux) {
> > +               pr_warn("extern (func ksym) '%s': function in kernel module is not supported\n",
> > +                       ext->name);
> > +               return -ENOTSUP;
> > +       }
> > +
> 
> [...]
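To make the btf_fixup_datasec() discussion above concrete, here is a small
userspace sketch of the size decision as described in the patch, using the
"if (func) continue;" form requested in review. It is a simplified model
and not libbpf code: fixup_datasec_size() and enum sec_member_kind are
hypothetical, and elf_size stands for whatever the section size lookup
produced (0 when the lookup failed).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kinds of the types referenced by a DATASEC's members. */
enum sec_member_kind { SEC_VAR, SEC_FUNC, SEC_OTHER };

/*
 * Decide the DATASEC size from its members and the ELF section size.
 * On success, *out_size is the size to store in the DATASEC.
 */
static int fixup_datasec_size(const enum sec_member_kind *kinds, size_t n,
			      uint32_t elf_size, uint32_t *out_size)
{
	size_t i, nr_vars = 0;

	assert(out_size);

	for (i = 0; i < n; i++) {
		if (kinds[i] == SEC_FUNC)
			continue; /* funcs take no space in the section */

		if (kinds[i] != SEC_VAR)
			return -EINVAL; /* non-VAR and non-FUNC member */

		nr_vars++;
	}

	/* A datasec holding only extern funcs legitimately has size 0. */
	if (!nr_vars) {
		*out_size = 0;
		return 0;
	}

	/* With at least one var, a zero section size is an error. */
	if (!elf_size)
		return -ENOENT;

	*out_size = elf_size;
	return 0;
}
```

The point of counting vars first is that the zero-size test can only be
judged once it is known whether the section contains any variables at all.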


* Re: [PATCH bpf-next 02/15] bpf: btf: Support parsing extern func
  2021-03-19  4:13       ` Andrii Nakryiko
@ 2021-03-19  5:29         ` Martin KaFai Lau
  2021-03-19 21:27           ` Andrii Nakryiko
  0 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-19  5:29 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Thu, Mar 18, 2021 at 09:13:56PM -0700, Andrii Nakryiko wrote:
> On Thu, Mar 18, 2021 at 4:39 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Thu, Mar 18, 2021 at 03:53:38PM -0700, Andrii Nakryiko wrote:
> > > On Tue, Mar 16, 2021 at 12:01 AM Martin KaFai Lau <kafai@fb.com> wrote:
> > > >
> > > > This patch makes the BTF verifier accept extern funcs. It is used for
> > > > allowing bpf program to call a limited set of kernel functions
> > > > in a later patch.
> > > >
> > > > When writing a bpf prog, the extern kernel function needs
> > > > to be declared under an ELF section (".ksyms") which is
> > > > the same as the current extern kernel variables and that should
> > > > keep its usage consistent without requiring to remember another
> > > > section name.
> > > >
> > > > For example, in a bpf_prog.c:
> > > >
> > > > extern int foo(struct sock *) __attribute__((section(".ksyms")))
> > > >
> > > > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> > > >         '(anon)' type_id=18
> > > > [25] FUNC 'foo' type_id=24 linkage=extern
> > > > [ ... ]
> > > > [33] DATASEC '.ksyms' size=0 vlen=1
> > > >         type_id=25 offset=0 size=0
> > > >
> > > > LLVM will put the "func" type into the BTF datasec ".ksyms".
> > > > The current "btf_datasec_check_meta()" assumes everything under
> > > > it is a "var" and ensures it has non-zero size ("!vsi->size" test).
> > > > The non-zero size check is not true for "func".  This patch postpones the
> > > > "!vsi->size" test from "btf_datasec_check_meta()" to
> > > > "btf_datasec_resolve()" which has all types collected to decide
> > > > if a vsi is a "var" or a "func" and then enforces the "vsi->size"
> > > > check differently.
> > > >
> > > > If the datasec only has "func", its "t->size" could be zero.
> > > > Thus, the current "!t->size" test is no longer valid.  The
> > > > invalid "t->size" will still be caught by the later
> > > > "last_vsi_end_off > t->size" check.   This patch also takes this
> > > > chance to consolidate other "t->size" tests ("vsi->offset >= t->size"
> > > > "vsi->size > t->size", and "t->size < sum") into the existing
> > > > "last_vsi_end_off > t->size" test.
> > > >
> > > > LLVM will also emit those extern kernel functions as extern
> > > > linkage funcs in the BTF:
> > > >
> > > > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> > > >         '(anon)' type_id=18
> > > > [25] FUNC 'foo' type_id=24 linkage=extern
> > > >
> > > > This patch allows BTF_FUNC_EXTERN in btf_func_check_meta().
> > > > Also, an extern kernel function declaration does not
> > > > necessarily have arg names. Another change in btf_func_check() is
> > > > to allow an extern function to have no arg names.
> > > >
> > > > The btf selftest is adjusted accordingly.  New tests are also added.
> > > >
> > > > The required LLVM patch: https://reviews.llvm.org/D93563 
> > > >
> > > > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > > > ---
> > >
> > > High-level question about EXTERN functions in DATASEC. Does kernel
> > > need to see them under DATASEC? What if libbpf just removed all EXTERN
> > > funcs from under DATASEC and left them as "free-floating" EXTERN
> > > FUNCs in BTF?
> > >
> > > We need to tag EXTERNs with DATASECs mainly for libbpf to know whether
> > > it's .kconfig or .ksym or other type of externs. Does kernel need to
> > > care?
> > Although the kernel does not need to know, since a legit llvm generates it,
> > I went with proper support in the kernel (e.g. bpftool btf dump can better
> > reflect what was there).
> 
> LLVM also generates extern VAR with BTF_VAR_EXTERN, yet libbpf is
> replacing it with fake INTs.
Yep. I noticed the loop in collect_extern() in libbpf.
It replaces the var->type with INT.

> We could do just that here as well.
What to replace in the FUNC case?

Regardless, supporting it properly in the kernel is a better way to go
than asking userspace to work around it.  It is also not very
complicated to support in the kernel.

What is the concern with having the kernel support it?

> If anyone would want to know all the kernel functions that some BPF
> program is using, they could do it from the instruction dump, with
> proper addresses and kernel function names nicely displayed there.
> That's way more useful, IMO.


* Re: [PATCH bpf-next 15/15] bpf: selftest: Add kfunc_call test
  2021-03-19  4:21   ` Andrii Nakryiko
@ 2021-03-19  5:40     ` Martin KaFai Lau
  0 siblings, 0 replies; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-19  5:40 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Thu, Mar 18, 2021 at 09:21:08PM -0700, Andrii Nakryiko wrote:
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/kfunc_call.c
> > @@ -0,0 +1,61 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2021 Facebook */
> > +#include <test_progs.h>
> > +#include <network_helpers.h>
> > +#include "kfunc_call_test.skel.h"
> > +#include "kfunc_call_test_subprog.skel.h"
> > +
> > +static __u32 duration;
> > +
> 
> you shouldn't need it, you don't use CHECK()s
It was for bpf_prog_test_run().
Just noticed it can take NULL.  will remove in v2.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH bpf-next 03/15] bpf: Refactor btf_check_func_arg_match
  2021-03-18 23:32   ` Andrii Nakryiko
@ 2021-03-19 19:32     ` Martin KaFai Lau
  2021-03-19 21:51       ` Andrii Nakryiko
  0 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-19 19:32 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Thu, Mar 18, 2021 at 04:32:47PM -0700, Andrii Nakryiko wrote:
> On Tue, Mar 16, 2021 at 12:01 AM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > This patch refactors the core logic of "btf_check_func_arg_match()"
> > into a new function "do_btf_check_func_arg_match()".
> > "do_btf_check_func_arg_match()" will be reused later to check
> > the kernel function call.
> >
> > The "if (!btf_type_is_ptr(t))" is checked first to improve the indentation
> > which will be useful for a later patch.
> >
> > Some of the "btf_kind_str[]" usages are replaced with the shortcut
> > "btf_type_str(t)".
> >
> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > ---
> >  include/linux/btf.h |   5 ++
> >  kernel/bpf/btf.c    | 159 ++++++++++++++++++++++++--------------------
> >  2 files changed, 91 insertions(+), 73 deletions(-)
> >
> > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > index 7fabf1428093..93bf2e5225f5 100644
> > --- a/include/linux/btf.h
> > +++ b/include/linux/btf.h
> > @@ -140,6 +140,11 @@ static inline bool btf_type_is_enum(const struct btf_type *t)
> >         return BTF_INFO_KIND(t->info) == BTF_KIND_ENUM;
> >  }
> >
> > +static inline bool btf_type_is_scalar(const struct btf_type *t)
> > +{
> > +       return btf_type_is_int(t) || btf_type_is_enum(t);
> > +}
> > +
> >  static inline bool btf_type_is_typedef(const struct btf_type *t)
> >  {
> >         return BTF_INFO_KIND(t->info) == BTF_KIND_TYPEDEF;
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 96cd24020a38..529b94b601c6 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -4381,7 +4381,7 @@ static u8 bpf_ctx_convert_map[] = {
> >  #undef BPF_LINK_TYPE
> >
> >  static const struct btf_member *
> > -btf_get_prog_ctx_type(struct bpf_verifier_log *log, struct btf *btf,
> > +btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
> >                       const struct btf_type *t, enum bpf_prog_type prog_type,
> >                       int arg)
> >  {
> > @@ -5366,122 +5366,135 @@ int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *pr
> >         return btf_check_func_type_match(log, btf1, t1, btf2, t2);
> >  }
> >
> > -/* Compare BTF of a function with given bpf_reg_state.
> > - * Returns:
> > - * EFAULT - there is a verifier bug. Abort verification.
> > - * EINVAL - there is a type mismatch or BTF is not available.
> > - * 0 - BTF matches with what bpf_reg_state expects.
> > - * Only PTR_TO_CTX and SCALAR_VALUE states are recognized.
> > - */
> > -int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog,
> > -                            struct bpf_reg_state *regs)
> > +static int do_btf_check_func_arg_match(struct bpf_verifier_env *env,
> 
> do_btf_check_func_arg_match vs btf_check_func_arg_match distinction is
> not clear at all. How about something like
> 
> btf_check_func_arg_match vs btf_check_subprog_arg_match (or btf_func
> vs bpf_subprog). I think that highlights the main distinction better,
> no?
will rename.

> 
> > +                                      const struct btf *btf, u32 func_id,
> > +                                      struct bpf_reg_state *regs,
> > +                                      bool ptr_to_mem_ok)
> >  {
> >         struct bpf_verifier_log *log = &env->log;
> > -       struct bpf_prog *prog = env->prog;
> > -       struct btf *btf = prog->aux->btf;
> > -       const struct btf_param *args;
> > +       const char *func_name, *ref_tname;
> >         const struct btf_type *t, *ref_t;
> > -       u32 i, nargs, btf_id, type_size;
> > -       const char *tname;
> > -       bool is_global;
> > -
> > -       if (!prog->aux->func_info)
> > -               return -EINVAL;
> > -
> > -       btf_id = prog->aux->func_info[subprog].type_id;
> > -       if (!btf_id)
> > -               return -EFAULT;
> > -
> > -       if (prog->aux->func_info_aux[subprog].unreliable)
> > -               return -EINVAL;
> > +       const struct btf_param *args;
> > +       u32 i, nargs;
> >
> > -       t = btf_type_by_id(btf, btf_id);
> > +       t = btf_type_by_id(btf, func_id);
> >         if (!t || !btf_type_is_func(t)) {
> >                 /* These checks were already done by the verifier while loading
> >                  * struct bpf_func_info
> >                  */
> > -               bpf_log(log, "BTF of func#%d doesn't point to KIND_FUNC\n",
> > -                       subprog);
> > +               bpf_log(log, "BTF of func_id %u doesn't point to KIND_FUNC\n",
> > +                       func_id);
> >                 return -EFAULT;
> >         }
> > -       tname = btf_name_by_offset(btf, t->name_off);
> > +       func_name = btf_name_by_offset(btf, t->name_off);
> >
> >         t = btf_type_by_id(btf, t->type);
> >         if (!t || !btf_type_is_func_proto(t)) {
> > -               bpf_log(log, "Invalid BTF of func %s\n", tname);
> > +               bpf_log(log, "Invalid BTF of func %s\n", func_name);
> >                 return -EFAULT;
> >         }
> >         args = (const struct btf_param *)(t + 1);
> >         nargs = btf_type_vlen(t);
> >         if (nargs > MAX_BPF_FUNC_REG_ARGS) {
> > -               bpf_log(log, "Function %s has %d > %d args\n", tname, nargs,
> > +               bpf_log(log, "Function %s has %d > %d args\n", func_name, nargs,
> >                         MAX_BPF_FUNC_REG_ARGS);
> > -               goto out;
> > +               return -EINVAL;
> >         }
> >
> > -       is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
> >         /* check that BTF function arguments match actual types that the
> >          * verifier sees.
> >          */
> >         for (i = 0; i < nargs; i++) {
> > -               struct bpf_reg_state *reg = &regs[i + 1];
> > +               u32 regno = i + 1;
> > +               struct bpf_reg_state *reg = &regs[regno];
> >
> > -               t = btf_type_by_id(btf, args[i].type);
> > -               while (btf_type_is_modifier(t))
> > -                       t = btf_type_by_id(btf, t->type);
> > -               if (btf_type_is_int(t) || btf_type_is_enum(t)) {
> > +               t = btf_type_skip_modifiers(btf, args[i].type, NULL);
> > +               if (btf_type_is_scalar(t)) {
> >                         if (reg->type == SCALAR_VALUE)
> >                                 continue;
> > -                       bpf_log(log, "R%d is not a scalar\n", i + 1);
> > -                       goto out;
> > +                       bpf_log(log, "R%d is not a scalar\n", regno);
> > +                       return -EINVAL;
> >                 }
> > -               if (btf_type_is_ptr(t)) {
> > +
> > +               if (!btf_type_is_ptr(t)) {
> > +                       bpf_log(log, "Unrecognized arg#%d type %s\n",
> > +                               i, btf_type_str(t));
> > +                       return -EINVAL;
> > +               }
> > +
> > +               ref_t = btf_type_skip_modifiers(btf, t->type, NULL);
> > +               ref_tname = btf_name_by_offset(btf, ref_t->name_off);
> 
> these two seem to be used only inside else `if (ptr_to_mem_ok)`, let's
> move the code and variables inside that branch?
It is kept here because the next patch uses it in
another case also.

> 
> > +               if (btf_get_prog_ctx_type(log, btf, t, env->prog->type, i)) {
> >                         /* If function expects ctx type in BTF check that caller
> >                          * is passing PTR_TO_CTX.
> >                          */
> > -                       if (btf_get_prog_ctx_type(log, btf, t, prog->type, i)) {
> > -                               if (reg->type != PTR_TO_CTX) {
> > -                                       bpf_log(log,
> > -                                               "arg#%d expected pointer to ctx, but got %s\n",
> > -                                               i, btf_kind_str[BTF_INFO_KIND(t->info)]);
> > -                                       goto out;
> > -                               }
> > -                               if (check_ctx_reg(env, reg, i + 1))
> > -                                       goto out;
> > -                               continue;
> > +                       if (reg->type != PTR_TO_CTX) {
> > +                               bpf_log(log,
> > +                                       "arg#%d expected pointer to ctx, but got %s\n",
> > +                                       i, btf_type_str(t));
> > +                               return -EINVAL;
> >                         }
> > +                       if (check_ctx_reg(env, reg, regno))
> > +                               return -EINVAL;
> 
> original code had `continue` here allowing to stop tracking if/else
> logic. Any specific reason you removed it? It keeps logic simpler to
> follow, imo.
There is no other case after this.
"continue" becomes redundant, so removed.

> 
> > +               } else if (ptr_to_mem_ok) {
> 
> similarly to how you did reduction of nestedness with btf_type_is_ptr, I'd do
> 
> if (!ptr_to_mem_ok)
>     return -EINVAL;
> 
> and let brain forget about another if/else branch tracking
I don't see a significant difference.  Either way looks the same with
a few more test cases, IMO.

I prefer to keep it like this since there is
another test case added in the next patch.

The verifier also has much longer if-else-if statements inside a
loop, without an explicit "continue" in the middle and without handling
the last case differently, and they are very readable.
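
For illustration, a minimal sketch of the two shapes being compared, reduced
to a single pointer argument.  The helper names (check_ctx, check_mem,
check_arg_chain, check_arg_early) and the boolean inputs are hypothetical
stand-ins, not the kernel code; the point is only that both forms accept and
reject the same inputs.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for check_ctx_reg() / check_mem_reg(): the
 * register state is reduced to a single "reg_ok" flag.
 */
static int check_ctx(bool reg_ok) { return reg_ok ? 0 : -EINVAL; }
static int check_mem(bool reg_ok) { return reg_ok ? 0 : -EINVAL; }

/* Shape kept in the patch: one if / else-if chain where the final
 * else rejects the argument.
 */
static int check_arg_chain(bool is_ctx, bool ptr_to_mem_ok, bool reg_ok)
{
	if (is_ctx) {
		if (check_ctx(reg_ok))
			return -EINVAL;
	} else if (ptr_to_mem_ok) {
		if (check_mem(reg_ok))
			return -EINVAL;
	} else {
		return -EINVAL;
	}
	return 0;
}

/* Shape suggested in review: reject early, so no else branches remain. */
static int check_arg_early(bool is_ctx, bool ptr_to_mem_ok, bool reg_ok)
{
	if (is_ctx)
		return check_ctx(reg_ok);
	if (!ptr_to_mem_ok)
		return -EINVAL;
	return check_mem(reg_ok);
}
```

The choice is about readability only; adding a further case (as the next
patch does) means one more "else if" in the first shape or one more early
return in the second.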

> 
> > +                       const struct btf_type *resolve_ret;
> > +                       u32 type_size;
> >
> > -                       if (!is_global)
> > -                               goto out;
> > -
> > -                       t = btf_type_skip_modifiers(btf, t->type, NULL);
> > -
> > -                       ref_t = btf_resolve_size(btf, t, &type_size);
> > -                       if (IS_ERR(ref_t)) {
> > +                       resolve_ret = btf_resolve_size(btf, ref_t, &type_size);
> > +                       if (IS_ERR(resolve_ret)) {
> >                                 bpf_log(log,
> > -                                   "arg#%d reference type('%s %s') size cannot be determined: %ld\n",
> > -                                   i, btf_type_str(t), btf_name_by_offset(btf, t->name_off),
> > -                                       PTR_ERR(ref_t));
> > -                               goto out;
> > +                                       "arg#%d reference type('%s %s') size cannot be determined: %ld\n",
> > +                                       i, btf_type_str(ref_t), ref_tname,
> > +                                       PTR_ERR(resolve_ret));
> > +                               return -EINVAL;
> >                         }
> >
> > -                       if (check_mem_reg(env, reg, i + 1, type_size))
> > -                               goto out;
> > -
> > -                       continue;
> > +                       if (check_mem_reg(env, reg, regno, type_size))
> > +                               return -EINVAL;
> > +               } else {
> > +                       return -EINVAL;
> >                 }
> > -               bpf_log(log, "Unrecognized arg#%d type %s\n",
> > -                       i, btf_kind_str[BTF_INFO_KIND(t->info)]);
> > -               goto out;
> >         }
> > +
> >         return 0;
> > -out:
> > +}
> > +
> > +/* Compare BTF of a function with given bpf_reg_state.
> > + * Returns:
> > + * EFAULT - there is a verifier bug. Abort verification.
> > + * EINVAL - there is a type mismatch or BTF is not available.
> > + * 0 - BTF matches with what bpf_reg_state expects.
> > + * Only PTR_TO_CTX and SCALAR_VALUE states are recognized.
> > + */
> > +int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog,
> > +                            struct bpf_reg_state *regs)
> > +{
> > +       struct bpf_prog *prog = env->prog;
> > +       struct btf *btf = prog->aux->btf;
> > +       bool is_global;
> > +       u32 btf_id;
> > +       int err;
> > +
> > +       if (!prog->aux->func_info)
> > +               return -EINVAL;
> > +
> > +       btf_id = prog->aux->func_info[subprog].type_id;
> > +       if (!btf_id)
> > +               return -EFAULT;
> > +
> > +       if (prog->aux->func_info_aux[subprog].unreliable)
> > +               return -EINVAL;
> > +
> > +       is_global = prog->aux->func_info_aux[subprog].linkage == BTF_FUNC_GLOBAL;
> > +       err = do_btf_check_func_arg_match(env, btf, btf_id, regs, is_global);
> > +
> >         /* Compiler optimizations can remove arguments from static functions
> >          * or mismatched type can be passed into a global function.
> >          * In such cases mark the function as unreliable from BTF point of view.
> >          */
> > -       prog->aux->func_info_aux[subprog].unreliable = true;
> > -       return -EINVAL;
> > +       if (err == -EINVAL)
> > +               prog->aux->func_info_aux[subprog].unreliable = true;
> 
> is there any harm marking it unreliable for any error? this makes it
> look like -EINVAL is super-special. If it's EFAULT, it won't matter,
> right?
will always assign true on any err.
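
A minimal model of that change, assuming cut-down stand-in types: the wrapper
marks the subprog's BTF as unreliable on any error from the inner check, so
-EINVAL is no longer a special case.  "struct func_info_aux", check_args()
and check_subprog_arg_match() are hypothetical names for this sketch, not
the kernel's.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for the per-subprog aux info. */
struct func_info_aux {
	bool unreliable;
};

/* Stand-in for the inner argument check: returns 0 or a negative errno.
 * The outcome is passed in so that the wrapper can be exercised.
 */
static int check_args(int simulated_result)
{
	return simulated_result;
}

/* Wrapper: any error from the inner check marks the BTF unreliable. */
static int check_subprog_arg_match(struct func_info_aux *aux,
				   int simulated_result)
{
	int err;

	/* Once marked unreliable, later checks fail without re-running. */
	if (aux->unreliable)
		return -EINVAL;

	err = check_args(simulated_result);
	if (err)
		aux->unreliable = true;
	return err;
}
```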

> 
> > +       return err;
> >  }
> >
> >  /* Convert BTF of a function into bpf_reg_state if possible
> > --
> > 2.30.2
> >

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH bpf-next 04/15] bpf: Support bpf program calling kernel function
  2021-03-19  1:03   ` Andrii Nakryiko
  2021-03-19  1:51     ` Alexei Starovoitov
@ 2021-03-19 19:47     ` Martin KaFai Lau
  1 sibling, 0 replies; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-19 19:47 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Thu, Mar 18, 2021 at 06:03:49PM -0700, Andrii Nakryiko wrote:
> On Tue, Mar 16, 2021 at 12:01 AM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > This patch adds support to BPF verifier to allow bpf program calling
> > kernel function directly.
> >
> > The use case included in this set is to allow bpf-tcp-cc to directly
> > call some tcp-cc helper functions (e.g. "tcp_cong_avoid_ai()").  Those
> > functions have already been used by some kernel tcp-cc implementations.
> >
> > This set will also allow the bpf-tcp-cc program to directly call the
> > kernel tcp-cc implementation.  For example, a bpf_dctcp may only want to
> > implement its own dctcp_cwnd_event() and reuse other dctcp_*() directly
> > from the kernel tcp_dctcp.c instead of reimplementing (or
> > copy-and-pasting) them.
> >
> > The tcp-cc kernel functions mentioned above will be white listed
> > for the struct_ops bpf-tcp-cc programs to use in a later patch.
> > The white listed functions are not bound to a fixed ABI contract.
> > Those functions have already been used by the existing kernel tcp-cc.
> > If any of them has changed, both in-tree and out-of-tree kernel tcp-cc
> > implementations have to be changed.  The same goes for the struct_ops
> > bpf-tcp-cc programs which have to be adjusted accordingly.
> >
> > This patch is to make the required changes in the bpf verifier.
> >
> > First change is in btf.c, it adds a case in "do_btf_check_func_arg_match()".
> > When the passed in "btf->kernel_btf == true", it means matching the
> > verifier regs' states with a kernel function.  This will handle the
> > PTR_TO_BTF_ID reg.  It also maps PTR_TO_SOCK_COMMON, PTR_TO_SOCKET,
> > and PTR_TO_TCP_SOCK to its kernel's btf_id.
> >
> > In the later libbpf patch, the insn calling a kernel function will
> > look like:
> >
> > insn->code == (BPF_JMP | BPF_CALL)
> > insn->src_reg == BPF_PSEUDO_KFUNC_CALL /* <- new in this patch */
> > insn->imm == func_btf_id /* btf_id of the running kernel */
> >
> > [ For the future calling function-in-kernel-module support, an array
> >   of module btf_fds can be passed at the load time and insn->off
> >   can be used to index into this array. ]
> >
> > At the early stage of verifier, the verifier will collect all kernel
> > function calls into "struct bpf_kern_func_descriptor".  Those
> > descriptors are stored in "prog->aux->kfunc_tab" and will
> > be available to the JIT.  Since this "add" operation is similar
> > to the current "add_subprog()" and looking for the same insn->code,
> > they are done together in the new "add_subprog_and_kern_func()".
> >
> > In the "do_check()" stage, the new "check_kern_func_call()" is added
> > to verify the kernel function call instruction:
> > 1. Ensure the kernel function can be used by a particular BPF_PROG_TYPE.
> >    A new bpf_verifier_ops "check_kern_func_call" is added to do that.
> >    The bpf-tcp-cc struct_ops program will implement this function in
> >    a later patch.
> > 2. Call "btf_check_kern_func_args_match()" to ensure the regs can be
> >    used as the args of a kernel function.
> > 3. Mark the regs' type, subreg_def, and zext_dst.
> >
> > At the later do_misc_fixups() stage, the new fixup_kern_func_call()
> > will replace the insn->imm with the function address (relative
> > to __bpf_call_base).  If needed, the jit can find the btf_func_model
> > by calling the new bpf_jit_find_kern_func_model(prog, insn->imm).
> > With the imm set to the function address, "bpftool prog dump xlated"
> > will be able to display the kernel function calls the same way as
> > it displays other bpf helper calls.
> >
> > A gpl_compatible program is required to call a kernel function.
> >
> > This feature currently requires JIT.
> >
> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > ---
> 
> After the initial pass it all makes sense so far. I am a bit concerned
> about s32 and kernel function offset, though. See below.
> 
> Also "kern_func" and "descriptor" are quite a mouthful, it seems to me
> that using kfunc consistently wouldn't hurt readability at all. You
> also already use desc in place of "descriptor" for variables, so I'd
> do that in type names as well.
The descriptor/desc naming follows the existing poke descriptor
and some of its helper naming.

Sure, both can be renamed in v2.
s/descriptor/desc/
s/kern_func/kfunc/

> > +static int kern_func_desc_cmp_by_imm(const void *a, const void *b)
> > +{
> > +       const struct bpf_kern_func_descriptor *d0 = a;
> > +       const struct bpf_kern_func_descriptor *d1 = b;
> > +
> > +       return d0->imm - d1->imm;
> 
> this is not safe, assuming any possible s32 values, no?
Good catch. will fix.
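
To make the concern concrete: subtracting two s32 values can leave the int
range (for example INT32_MIN - 1), which is undefined behaviour for signed
ints in C and, where it wraps, yields the wrong sign.  Below is a sketch of
a comparison-based comparator that avoids this.  "struct kfunc_desc" and the
function names are illustrative stand-ins, and this is not necessarily the
form the fix takes in v2.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the descriptor in the patch; only the
 * field that matters for sorting is modeled.
 */
struct kfunc_desc {
	int32_t imm;
};

/* Overflow-safe comparator: compares instead of subtracting, so it is
 * correct for any pair of s32 values.
 */
static int kfunc_desc_cmp_by_imm(const void *a, const void *b)
{
	const struct kfunc_desc *d0 = a;
	const struct kfunc_desc *d1 = b;

	if (d0->imm > d1->imm)
		return 1;
	if (d0->imm < d1->imm)
		return -1;
	return 0;
}

/* Sort a descriptor table by imm so it can be searched with bsearch(). */
static void kfunc_tab_sort(struct kfunc_desc *tab, size_t nr)
{
	qsort(tab, nr, sizeof(*tab), kfunc_desc_cmp_by_imm);
}
```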

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH bpf-next 02/15] bpf: btf: Support parsing extern func
  2021-03-19  5:29         ` Martin KaFai Lau
@ 2021-03-19 21:27           ` Andrii Nakryiko
  2021-03-19 22:19             ` Martin KaFai Lau
  0 siblings, 1 reply; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-19 21:27 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Thu, Mar 18, 2021 at 10:29 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Thu, Mar 18, 2021 at 09:13:56PM -0700, Andrii Nakryiko wrote:
> > On Thu, Mar 18, 2021 at 4:39 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > On Thu, Mar 18, 2021 at 03:53:38PM -0700, Andrii Nakryiko wrote:
> > > > On Tue, Mar 16, 2021 at 12:01 AM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > >
> > > > > > This patch makes the BTF verifier accept extern funcs. It is used to
> > > > > > allow a bpf program to call a limited set of kernel functions
> > > > > > in a later patch.
> > > > > >
> > > > > > When writing a bpf prog, the extern kernel function needs
> > > > > > to be declared under an ELF section (".ksyms") which is
> > > > > the same as the current extern kernel variables and that should
> > > > > keep its usage consistent without requiring to remember another
> > > > > section name.
> > > > >
> > > > > For example, in a bpf_prog.c:
> > > > >
> > > > > > extern int foo(struct sock *) __attribute__((section(".ksyms")));
> > > > >
> > > > > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> > > > >         '(anon)' type_id=18
> > > > > [25] FUNC 'foo' type_id=24 linkage=extern
> > > > > [ ... ]
> > > > > [33] DATASEC '.ksyms' size=0 vlen=1
> > > > >         type_id=25 offset=0 size=0
> > > > >
> > > > > LLVM will put the "func" type into the BTF datasec ".ksyms".
> > > > > The current "btf_datasec_check_meta()" assumes everything under
> > > > > it is a "var" and ensures it has non-zero size ("!vsi->size" test).
> > > > > The non-zero size check is not true for "func".  This patch postpones the
> > > > > > "!vsi->size" test from "btf_datasec_check_meta()" to
> > > > > "btf_datasec_resolve()" which has all types collected to decide
> > > > > if a vsi is a "var" or a "func" and then enforce the "vsi->size"
> > > > > differently.
> > > > >
> > > > > If the datasec only has "func", its "t->size" could be zero.
> > > > > Thus, the current "!t->size" test is no longer valid.  The
> > > > > invalid "t->size" will still be caught by the later
> > > > > "last_vsi_end_off > t->size" check.   This patch also takes this
> > > > > chance to consolidate other "t->size" tests ("vsi->offset >= t->size"
> > > > > "vsi->size > t->size", and "t->size < sum") into the existing
> > > > > "last_vsi_end_off > t->size" test.
> > > > >
> > > > > The LLVM will also put those extern kernel function as an extern
> > > > > linkage func in the BTF:
> > > > >
> > > > > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> > > > >         '(anon)' type_id=18
> > > > > [25] FUNC 'foo' type_id=24 linkage=extern
> > > > >
> > > > > This patch allows BTF_FUNC_EXTERN in btf_func_check_meta().
> > > > > > Also, an extern kernel function declaration does not
> > > > > > necessarily have arg names. Another change in btf_func_check() is
> > > > > > to allow an extern function to have no arg name.
> > > > >
> > > > > The btf selftest is adjusted accordingly.  New tests are also added.
> > > > >
> > > > > The required LLVM patch: https://reviews.llvm.org/D93563
> > > > >
> > > > > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > > > > ---
> > > >
> > > > High-level question about EXTERN functions in DATASEC. Does kernel
> > > > need to see them under DATASEC? What if libbpf just removed all EXTERN
> > > > funcs from under DATASEC and leave them as "free-floating" EXTERN
> > > > FUNCs in BTF.
> > > >
> > > > We need to tag EXTERNs with DATASECs mainly for libbpf to know whether
> > > > it's .kconfig or .ksym or other type of externs. Does kernel need to
> > > > care?
> > > > Although the kernel does not need to know, since a legit llvm generates it,
> > > I go with a proper support in the kernel (e.g. bpftool btf dump can better
> > > reflect what was there).
> >
> > LLVM also generates extern VAR with BTF_VAR_EXTERN, yet libbpf is
> > replacing it with fake INTs.
> Yep. I noticed the loop in collect_extern() in libbpf.
> It replaces the var->type with INT.
>
> > We could do just that here as well.
> What to replace in the FUNC case?

If we do that, I'd just replace them with the same INTs. Or we can just
remove the entire DATASEC. Now it is easier to do with BTF write APIs.
Back then it was a major pain. I'd probably get rid of DATASEC
altogether instead of that INT replacement, if I had BTF write APIs.

>
> Regardless, supporting it properly in the kernel is a better way to go
> than asking userspace to work around it.  It is also not very
> complicated to support in the kernel.
>
> What is the concern with having the kernel support it?

Just more complicated BTF validation logic, which means that there are
higher chances of permitting invalid BTF. And then the question is
what can the kernel do with those EXTERNs in BTF? Probably nothing.
And that .ksyms section is special, and purely libbpf convention.
Ideally kernel should not allow EXTERN funcs in any other DATASEC. Are
you willing to hard-code ".ksyms" name in kernel for libbpf's sake?
Probably not. The general rule, so far, was that kernel shouldn't see
any unresolved EXTERN at all. Now it's neither here nor there. EXTERN
funcs are ok, EXTERN vars are not.

>
> > If anyone would want to know all the kernel functions that some BPF
> > program is using, they could do it from the instruction dump, with
> > proper addresses and kernel function names nicely displayed there.
> > That's way more useful, IMO.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH bpf-next 12/15] libbpf: Support extern kernel function
  2021-03-19  5:06     ` Martin KaFai Lau
@ 2021-03-19 21:38       ` Andrii Nakryiko
  0 siblings, 0 replies; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-19 21:38 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Thu, Mar 18, 2021 at 10:06 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Thu, Mar 18, 2021 at 09:11:39PM -0700, Andrii Nakryiko wrote:
> > On Tue, Mar 16, 2021 at 12:02 AM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > This patch is to make libbpf able to handle the following extern
> > > kernel function declaration and do the needed relocations before
> > > loading the bpf program to the kernel.
> > >
> > > > extern int foo(struct sock *) __attribute__((section(".ksyms")));
> > >
> > > > In the collect extern phase, the needed changes are made to
> > > > bpf_object__collect_externs() and find_extern_btf_id() to collect
> > > > functions.
> > >
> > > In the collect relo phase, it will record the kernel function
> > > call as RELO_EXTERN_FUNC.
> > >
> > > bpf_object__resolve_ksym_func_btf_id() is added to find the func
> > > btf_id of the running kernel.
> > >
> > > During actual relocation, it will patch the BPF_CALL instruction with
> > > > src_reg = BPF_PSEUDO_KFUNC_CALL and insn->imm set to the running
> > > kernel func's btf_id.
> > >
> > > btf_fixup_datasec() is changed also because a datasec may
> > > only have func and its size will be 0.  The "!size" test
> > > is postponed till it is confirmed there are vars.
> > > It also takes this chance to remove the
> > > "if (... || (t->size && t->size != size)) { return -ENOENT; }" test
> > > > because t->size is zero at that point.
> > >
> > > The required LLVM patch: https://reviews.llvm.org/D93563
> > >
> > > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > > ---
> > >  tools/lib/bpf/btf.c    |  32 ++++++++----
> > >  tools/lib/bpf/btf.h    |   5 ++
> > >  tools/lib/bpf/libbpf.c | 113 +++++++++++++++++++++++++++++++++++++----
> > >  3 files changed, 129 insertions(+), 21 deletions(-)
> > >
> > > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > > index 3aa58f2ac183..bb09b577c154 100644
> > > --- a/tools/lib/bpf/btf.c
> > > +++ b/tools/lib/bpf/btf.c
> > > @@ -1108,7 +1108,7 @@ static int btf_fixup_datasec(struct bpf_object *obj, struct btf *btf,
> > >         const struct btf_type *t_var;
> > >         struct btf_var_secinfo *vsi;
> > >         const struct btf_var *var;
> > > -       int ret;
> > > +       int ret, nr_vars = 0;
> > >
> > >         if (!name) {
> > >                 pr_debug("No name found in string section for DATASEC kind.\n");
> > > @@ -1117,27 +1117,27 @@ static int btf_fixup_datasec(struct bpf_object *obj, struct btf *btf,
> > >
> > >         /* .extern datasec size and var offsets were set correctly during
> > >          * extern collection step, so just skip straight to sorting variables
> > > +        * One exception is the datasec may only have extern funcs,
> > > +        * t->size is 0 in this case.  This will be handled
> > > +        * with !nr_vars later.
> > >          */
> > >         if (t->size)
> > >                 goto sort_vars;
> > >
> > > -       ret = bpf_object__section_size(obj, name, &size);
> > > -       if (ret || !size || (t->size && t->size != size)) {
> > > -               pr_debug("Invalid size for section %s: %u bytes\n", name, size);
> > > -               return -ENOENT;
> > > -       }
> > > -
> > > -       t->size = size;
> > > +       bpf_object__section_size(obj, name, &size);
> >
> > So it's not great that we just ignore any errors here. ".ksyms" is a
> > special section, so it should be fine to just ignore it by name and
> > leave the rest of error handling intact.
> The ret < 0 case? In that case, size is 0.
>
> Or are there cases where a section has no vars but the size should not be 0?

ret < 0 is an error, which will no longer be propagated. Silently
consuming an error is what I'm worried about.
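
One way to address this, sketched under assumptions: skip the size lookup
only for the section known to be special, and keep propagating lookup errors
for every other section.  section_size() below is a hypothetical stand-in
for the libbpf-internal lookup, and its section table is made up for the
example; this is not the code from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define KSYMS_SEC ".ksyms"

/* Hypothetical stand-in for the section-size lookup: returns 0 and
 * fills *size on success, or -ENOENT for an unknown section.
 */
static int section_size(const char *name, unsigned int *size)
{
	static const struct {
		const char *name;
		unsigned int size;
	} secs[] = {
		{ ".data", 16 },
		{ ".bss", 0 },
	};
	size_t i;

	for (i = 0; i < sizeof(secs) / sizeof(secs[0]); i++) {
		if (strcmp(secs[i].name, name) == 0) {
			*size = secs[i].size;
			return 0;
		}
	}
	return -ENOENT;
}

/* Resolve the size to record for a DATASEC.  ".ksyms" is skipped by
 * name because it may hold only extern funcs; for every other section
 * a lookup failure or a zero size is still reported to the caller.
 */
static int datasec_resolve_size(const char *name, unsigned int *size)
{
	int ret;

	*size = 0;
	if (strcmp(name, KSYMS_SEC) == 0)
		return 0;

	ret = section_size(name, size);
	if (ret)
		return ret;
	if (!*size)
		return -ENOENT;
	return 0;
}
```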

>
> >
> > >
> > >         for (i = 0, vsi = btf_var_secinfos(t); i < vars; i++, vsi++) {
> > >                 t_var = btf__type_by_id(btf, vsi->type);
> > > -               var = btf_var(t_var);
> > >
> > > -               if (!btf_is_var(t_var)) {
> > > -                       pr_debug("Non-VAR type seen in section %s\n", name);
> > > +               if (btf_is_func(t_var)) {
> > > +                       continue;
> >
> > just
> >
> > if (btf_is_func(t_var))
> >     continue;
> >
> > no need for "else if" below
> >
> > > +               } else if (!btf_is_var(t_var)) {
> > > +                       pr_debug("Non-VAR and Non-FUNC type seen in section %s\n", name);
> >
> > nit: Non-FUNC -> non-FUNC
> >
> > >                         return -EINVAL;
> > >                 }
> > >
> > > +               nr_vars++;
> > > +               var = btf_var(t_var);
> > >                 if (var->linkage == BTF_VAR_STATIC)
> > >                         continue;
> > >
> > > @@ -1157,6 +1157,16 @@ static int btf_fixup_datasec(struct bpf_object *obj, struct btf *btf,
> > >                 vsi->offset = off;
> > >         }
> > >
> > > +       if (!nr_vars)
> > > +               return 0;
> > > +
> > > +       if (!size) {
> > > +               pr_debug("Invalid size for section %s: %u bytes\n", name, size);
> > > +               return -ENOENT;
> > > +       }
> > > +
> > > +       t->size = size;
> > > +
> > >  sort_vars:
> > >         qsort(btf_var_secinfos(t), vars, sizeof(*vsi), compare_vsi_off);
> > >         return 0;
> > > diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
> > > index 029a9cfc8c2d..07d508b70497 100644
> > > --- a/tools/lib/bpf/btf.h
> > > +++ b/tools/lib/bpf/btf.h
> > > @@ -368,6 +368,11 @@ btf_var_secinfos(const struct btf_type *t)
> > >         return (struct btf_var_secinfo *)(t + 1);
> > >  }
> > >
> > > +static inline enum btf_func_linkage btf_func_linkage(const struct btf_type *t)
> > > +{
> > > +       return (enum btf_func_linkage)BTF_INFO_VLEN(t->info);
> > > +}
> >
> > exposing `enum btf_func_linkage` in libbpf API headers will cause
> > compilation errors for users on older systems. We went through a bunch
> > of pain with `enum bpf_stats_type` (and it is still causing pain for
> > C++), I'd rather avoid some more here. Can you please move it into
> > libbpf.c for now. It doesn't seem like a very popular function that
> > needs to be exposed to users.
> will do.
>
> >
> > > +
> > >  #ifdef __cplusplus
> > >  } /* extern "C" */
> > >  #endif
> > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > index 0a60fcb2fba2..49bda179bd93 100644
> > > --- a/tools/lib/bpf/libbpf.c
> > > +++ b/tools/lib/bpf/libbpf.c
> > > @@ -190,6 +190,7 @@ enum reloc_type {
> > >         RELO_CALL,
> > >         RELO_DATA,
> > >         RELO_EXTERN_VAR,
> > > +       RELO_EXTERN_FUNC,
> > >         RELO_SUBPROG_ADDR,
> > >  };
> > >
> > > @@ -384,6 +385,7 @@ struct extern_desc {
> > >         int btf_id;
> > >         int sec_btf_id;
> > >         const char *name;
> > > +       const struct btf_type *btf_type;
> > >         bool is_set;
> > >         bool is_weak;
> > >         union {
> > > @@ -3022,7 +3024,7 @@ static bool sym_is_subprog(const GElf_Sym *sym, int text_shndx)
> > >  static int find_extern_btf_id(const struct btf *btf, const char *ext_name)
> > >  {
> > >         const struct btf_type *t;
> > > -       const char *var_name;
> > > +       const char *tname;
> > >         int i, n;
> > >
> > >         if (!btf)
> > > @@ -3032,14 +3034,18 @@ static int find_extern_btf_id(const struct btf *btf, const char *ext_name)
> > >         for (i = 1; i <= n; i++) {
> > >                 t = btf__type_by_id(btf, i);
> > >
> > > -               if (!btf_is_var(t))
> > > +               if (!btf_is_var(t) && !btf_is_func(t))
> > >                         continue;
> > >
> > > -               var_name = btf__name_by_offset(btf, t->name_off);
> > > -               if (strcmp(var_name, ext_name))
> > > +               tname = btf__name_by_offset(btf, t->name_off);
> > > +               if (strcmp(tname, ext_name))
> > >                         continue;
> > >
> > > -               if (btf_var(t)->linkage != BTF_VAR_GLOBAL_EXTERN)
> > > +               if (btf_is_var(t) &&
> > > +                   btf_var(t)->linkage != BTF_VAR_GLOBAL_EXTERN)
> > > +                       return -EINVAL;
> > > +
> > > +               if (btf_is_func(t) && btf_func_linkage(t) != BTF_FUNC_EXTERN)
> > >                         return -EINVAL;
> > >
> > >                 return i;
> > > @@ -3199,10 +3205,10 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> > >                         return ext->btf_id;
> > >                 }
> > >                 t = btf__type_by_id(obj->btf, ext->btf_id);
> > > +               ext->btf_type = t;
> >
> > ext->btf_type is derived from ext->btf_id and obj->btf (always), so
> > there is no need for it
> It is for easier btf_is_var() check later instead of going through
> another btf__type_by_id().
>
> yeah, I will make a few btf__type_by_id() calls in v2.
>
> >
> > >                 ext->name = btf__name_by_offset(obj->btf, t->name_off);
> > >                 ext->sym_idx = i;
> > >                 ext->is_weak = GELF_ST_BIND(sym.st_info) == STB_WEAK;
> > > -
> > >                 ext->sec_btf_id = find_extern_sec_btf_id(obj->btf, ext->btf_id);
> > >                 if (ext->sec_btf_id <= 0) {
> > >                         pr_warn("failed to find BTF for extern '%s' [%d] section: %d\n",
> > > @@ -3212,6 +3218,34 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> > >                 sec = (void *)btf__type_by_id(obj->btf, ext->sec_btf_id);
> > >                 sec_name = btf__name_by_offset(obj->btf, sec->name_off);
> > >
> > > +               if (btf_is_func(t)) {
> >
> > there is a KSYMS_SEC handling logic below, let's keep both func and
> > variables handling together there?
> It is to keep the indentation manageable,
> and also most of what is done here is not
> shareable with variables.
>
> Sure. I can move it there.

Yes, please. KCONFIG_SEC has a similar level of indentation, it's been
manageable so far.

>
> >
> > > +                       const struct btf_type *func_proto;
> > > +
> > > +                       func_proto = btf__type_by_id(obj->btf, t->type);
> > > +                       if (!func_proto || !btf_is_func_proto(func_proto)) {
> >
> > this is implied by BTF format itself, seems a bit redundant
> It has already been checked?

libbpf doesn't validate BTF for complete correctness, but if it ever
does, it's better to do it in one place instead of multiple partial
checks spread out everywhere. The good thing is that the kernel will
always strictly validate everything in the end.

>
> >
> > > +                               pr_warn("extern function %s does not have a valid func_proto\n",
> > > +                                       ext->name);
> > > +                               return -EINVAL;
> > > +                       }
> > > +
> > > +                       if (ext->is_weak) {
> > > +                               pr_warn("extern weak function %s is unsupported\n",
> > > +                                       ext->name);
> > > +                               return -ENOTSUP;
> > > +                       }
> > > +
> > > +                       if (strcmp(sec_name, KSYMS_SEC)) {
> > > +                               pr_warn("extern function %s is only supported under %s section\n",
> > > +                                       ext->name, KSYMS_SEC);
> > > +                               return -ENOTSUP;
> > > +                       }
> > > +
> > > +                       ksym_sec = sec;
> > > +                       ext->type = EXT_KSYM;
> > > +                       ext->ksym.type_id = ext->btf_id;
> >
> > there is skip_mods_and_typedefs in KSYMS_SEC section below, but it
> > won't have any effect on FUNC_PROTO, so existing logic can be used
> > as-is
> func id is used here to keep what ksyms.type_id means:
> /* local btf_id of the ksym extern's type. */
>
> The kernel extern type here should be func instead of func_proto.
> func_proto cannot be extern.

Ah, I see. Ok, then you'll need to skip the skip_mods_and_typedefs()
call for funcs (you'll have a special weak check just for funcs anyway).
But I still prefer to keep all the logic for KSYMS_SEC in one place. Thanks.

>
> >
> > > +                       continue;
> > > +               }
> > > +
> > >                 if (strcmp(sec_name, KCONFIG_SEC) == 0) {
> > >                         kcfg_sec = sec;
> > >                         ext->type = EXT_KCFG;
> >
> > [...]
> >
> > > +static int bpf_object__resolve_ksym_func_btf_id(struct bpf_object *obj,
> > > +                                               struct extern_desc *ext)
> > > +{
> > > +       int local_func_proto_id, kern_func_proto_id, kern_func_id;
> > > +       const struct btf_type *kern_func;
> > > +       struct btf *kern_btf = NULL;
> > > +       int ret, kern_btf_fd = 0;
> > > +
> > > +       local_func_proto_id = ext->btf_type->type;
> >
> > yeah, so this ext->btf_type can be retrieved with
> > btf__type_by_id(obj->btf, ext->btf_id) here, no need to pollute
> > extern_desc with extra field
> >
> > > +
> > > +       kern_func_id = find_ksym_btf_id(obj, ext->name, BTF_KIND_FUNC,
> > > +                                       &kern_btf, &kern_btf_fd);
> > > +       if (kern_func_id < 0) {
> > > +               pr_warn("extern (func ksym) '%s': not found in kernel BTF\n",
> > > +                       ext->name);
> > > +               return kern_func_id;
> > > +       }
> > > +
> > > +       if (kern_btf != obj->btf_vmlinux) {
> > > +               pr_warn("extern (func ksym) '%s': function in kernel module is not supported\n",
> > > +                       ext->name);
> > > +               return -ENOTSUP;
> > > +       }
> > > +
> >
> > [...]


* Re: [PATCH bpf-next 03/15] bpf: Refactor btf_check_func_arg_match
  2021-03-19 19:32     ` Martin KaFai Lau
@ 2021-03-19 21:51       ` Andrii Nakryiko
  2021-03-20  0:10         ` Alexei Starovoitov
  0 siblings, 1 reply; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-19 21:51 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Fri, Mar 19, 2021 at 12:32 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Thu, Mar 18, 2021 at 04:32:47PM -0700, Andrii Nakryiko wrote:
> > On Tue, Mar 16, 2021 at 12:01 AM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > This patch refactors the core logic of "btf_check_func_arg_match()"
> > > into a new function "do_btf_check_func_arg_match()".
> > > "do_btf_check_func_arg_match()" will be reused later to check
> > > the kernel function call.
> > >
> > > The "if (!btf_type_is_ptr(t))" is checked first to improve the indentation
> > > which will be useful for a later patch.
> > >
> > > > Some of the "btf_kind_str[]" usages are replaced with the shortcut
> > > "btf_type_str(t)".
> > >
> > > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > > ---
> > >  include/linux/btf.h |   5 ++
> > >  kernel/bpf/btf.c    | 159 ++++++++++++++++++++++++--------------------
> > >  2 files changed, 91 insertions(+), 73 deletions(-)
> > >

[...]

> > > +               if (!btf_type_is_ptr(t)) {
> > > +                       bpf_log(log, "Unrecognized arg#%d type %s\n",
> > > +                               i, btf_type_str(t));
> > > +                       return -EINVAL;
> > > +               }
> > > +
> > > +               ref_t = btf_type_skip_modifiers(btf, t->type, NULL);
> > > +               ref_tname = btf_name_by_offset(btf, ref_t->name_off);
> >
> > these two seem to be used only inside else `if (ptr_to_mem_ok)`, let's
> > move the code and variables inside that branch?
> It is kept here because the next patch uses it in
> another case also.

yeah, I saw that once I got to that patch, never mind

>
> >
> > > +               if (btf_get_prog_ctx_type(log, btf, t, env->prog->type, i)) {
> > >                         /* If function expects ctx type in BTF check that caller
> > >                          * is passing PTR_TO_CTX.
> > >                          */
> > > -                       if (btf_get_prog_ctx_type(log, btf, t, prog->type, i)) {
> > > -                               if (reg->type != PTR_TO_CTX) {
> > > -                                       bpf_log(log,
> > > -                                               "arg#%d expected pointer to ctx, but got %s\n",
> > > -                                               i, btf_kind_str[BTF_INFO_KIND(t->info)]);
> > > -                                       goto out;
> > > -                               }
> > > -                               if (check_ctx_reg(env, reg, i + 1))
> > > -                                       goto out;
> > > -                               continue;
> > > +                       if (reg->type != PTR_TO_CTX) {
> > > +                               bpf_log(log,
> > > +                                       "arg#%d expected pointer to ctx, but got %s\n",
> > > +                                       i, btf_type_str(t));
> > > +                               return -EINVAL;
> > >                         }
> > > +                       if (check_ctx_reg(env, reg, regno))
> > > +                               return -EINVAL;
> >
> > original code had `continue` here allowing to stop tracking if/else
> > logic. Any specific reason you removed it? It keeps logic simpler to
> > follow, imo.
> There is no other case after this.
> "continue" becomes redundant, so removed.

well, there is the entire "else if (ptr_to_mem_ok)" branch, which the
reader now has to skip past to check whether anything else is supposed
to happen after the if/else chain. `continue;`, on the other hand, makes
it very clear that nothing more is going to happen

>
> >
> > > +               } else if (ptr_to_mem_ok) {
> >
> > similarly to how you did reduction of nestedness with btf_type_is_ptr, I'd do
> >
> > if (!ptr_to_mem_ok)
> >     return -EINVAL;
> >
> > and let brain forget about another if/else branch tracking
> I don't see a significant difference.  Either way looks the same with
> a few more test cases, IMO.
>
> I prefer to keep it like this since there is
> another test case added in the next patch.
>
> There are usages with much longer if-else-if statement inside a
> loop in the verifier also without explicit "continue" in the middle
> or handle the last case differently and they are very readable.

It's a matter of taste, I suppose. I'd probably disagree with you on
the readability of those verifier parts ;) So it's up to you, of
course, but for me this code pattern:

for (...) {
    if (A) {
        handleA;
    } else if (B) {
        handleB;
    } else {
        return -EINVAL;
    }
}

is much harder to follow than the more linear (imo)

for (...) {
    if (A) {
        handleA;
        continue;
    }

    if (!B)
        return -EINVAL;

    handleB;
}

especially if handleA and handleB are quite long and complicated.
With the first pattern I have to jump back and forth to validate that
C is not allowed/handled later, and that there is no common subsequent
logic for both A and B (or even C). In the latter code pattern the
"only A" and "only B" logic is clearly separated and it's quite obvious
that no C is allowed/handled.
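
To make the comparison concrete, here is a minimal compilable sketch of the second (linear) pattern. The enum, struct, and function names are made up for illustration and are not the verifier's actual code; the "ok" flag just stands in for whatever per-case check handleA/handleB would do.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical argument kinds, standing in for cases A, B and C. */
enum arg_kind { ARG_CTX, ARG_MEM, ARG_OTHER };

struct arg {
	enum arg_kind kind;
	int ok;		/* non-zero if the per-kind check passes */
};

/* Linear form: each case ends with continue or return, so nothing
 * further down the loop body can apply to it.
 */
static int check_args(const struct arg *args, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (args[i].kind == ARG_CTX) {
			if (!args[i].ok)
				return -EINVAL;
			continue;	/* "only A" ends here */
		}

		if (args[i].kind != ARG_MEM)
			return -EINVAL;	/* no C is allowed */

		if (!args[i].ok)
			return -EINVAL;	/* "only B" */
	}

	return 0;
}
```

Adding another case later means adding one more self-contained block before the final check, without touching the existing ones.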

>
> >
> > > +                       const struct btf_type *resolve_ret;
> > > +                       u32 type_size;
> > >
> > > -                       if (!is_global)
> > > -                               goto out;
> > > -

[...]


* Re: [PATCH bpf-next 02/15] bpf: btf: Support parsing extern func
  2021-03-19 21:27           ` Andrii Nakryiko
@ 2021-03-19 22:19             ` Martin KaFai Lau
  2021-03-19 22:29               ` Andrii Nakryiko
  0 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-19 22:19 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Fri, Mar 19, 2021 at 02:27:13PM -0700, Andrii Nakryiko wrote:
> On Thu, Mar 18, 2021 at 10:29 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Thu, Mar 18, 2021 at 09:13:56PM -0700, Andrii Nakryiko wrote:
> > > On Thu, Mar 18, 2021 at 4:39 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > >
> > > > On Thu, Mar 18, 2021 at 03:53:38PM -0700, Andrii Nakryiko wrote:
> > > > > On Tue, Mar 16, 2021 at 12:01 AM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > >
> > > > > > This patch makes BTF verifier to accept extern func. It is used for
> > > > > > allowing bpf program to call a limited set of kernel functions
> > > > > > in a later patch.
> > > > > >
> > > > > > When writing bpf prog, the extern kernel function needs
> > > > > > to be declared under a ELF section (".ksyms") which is
> > > > > > the same as the current extern kernel variables and that should
> > > > > > keep its usage consistent without requiring to remember another
> > > > > > section name.
> > > > > >
> > > > > > For example, in a bpf_prog.c:
> > > > > >
> > > > > > > extern int foo(struct sock *) __attribute__((section(".ksyms")));
> > > > > >
> > > > > > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> > > > > >         '(anon)' type_id=18
> > > > > > [25] FUNC 'foo' type_id=24 linkage=extern
> > > > > > [ ... ]
> > > > > > [33] DATASEC '.ksyms' size=0 vlen=1
> > > > > >         type_id=25 offset=0 size=0
> > > > > >
> > > > > > LLVM will put the "func" type into the BTF datasec ".ksyms".
> > > > > > The current "btf_datasec_check_meta()" assumes everything under
> > > > > > it is a "var" and ensures it has non-zero size ("!vsi->size" test).
> > > > > > The non-zero size check is not true for "func".  This patch postpones the
> > > > > > "!vsi-size" test from "btf_datasec_check_meta()" to
> > > > > > "btf_datasec_resolve()" which has all types collected to decide
> > > > > > if a vsi is a "var" or a "func" and then enforce the "vsi->size"
> > > > > > differently.
> > > > > >
> > > > > > If the datasec only has "func", its "t->size" could be zero.
> > > > > > Thus, the current "!t->size" test is no longer valid.  The
> > > > > > invalid "t->size" will still be caught by the later
> > > > > > "last_vsi_end_off > t->size" check.   This patch also takes this
> > > > > > chance to consolidate other "t->size" tests ("vsi->offset >= t->size"
> > > > > > "vsi->size > t->size", and "t->size < sum") into the existing
> > > > > > "last_vsi_end_off > t->size" test.
> > > > > >
> > > > > > The LLVM will also put those extern kernel function as an extern
> > > > > > linkage func in the BTF:
> > > > > >
> > > > > > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> > > > > >         '(anon)' type_id=18
> > > > > > [25] FUNC 'foo' type_id=24 linkage=extern
> > > > > >
> > > > > > This patch allows BTF_FUNC_EXTERN in btf_func_check_meta().
> > > > > > Also extern kernel function declaration does not
> > > > > > > necessarily have arg name. Another change in btf_func_check() is
> > > > > > to allow extern function having no arg name.
> > > > > >
> > > > > > The btf selftest is adjusted accordingly.  New tests are also added.
> > > > > >
> > > > > > The required LLVM patch: https://reviews.llvm.org/D93563 
> > > > > >
> > > > > > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > > > > > ---
> > > > >
> > > > > High-level question about EXTERN functions in DATASEC. Does kernel
> > > > > need to see them under DATASEC? What if libbpf just removed all EXTERN
> > > > > funcs from under DATASEC and leave them as "free-floating" EXTERN
> > > > > FUNCs in BTF.
> > > > >
> > > > > We need to tag EXTERNs with DATASECs mainly for libbpf to know whether
> > > > > it's .kconfig or .ksym or other type of externs. Does kernel need to
> > > > > care?
> > > > > Although the kernel does not need to know, since a legit llvm generates it,
> > > > I go with a proper support in the kernel (e.g. bpftool btf dump can better
> > > > reflect what was there).
> > >
> > > LLVM also generates extern VAR with BTF_VAR_EXTERN, yet libbpf is
> > > replacing it with fake INTs.
> > Yep. I noticed the loop in collect_extern() in libbpf.
> > It replaces the var->type with INT.
> >
> > > We could do just that here as well.
> > What to replace in the FUNC case?
> 
> if we do that, I'd just replace them with same INTs. Or we can just
> remove the entire DATASEC. Now it is easier to do with BTF write APIs.
> Back then it was a major pain. I'd probably get rid of DATASEC
> altogether instead of that INT replacement, if I had BTF write APIs.
Do you mean vsi->type = INT?

> 
> >
> > Regardless, supporting it properly in the kernel is a better way to go
> > instead of asking the userspace to move around it.  It is not very
> > complicated to support it in the kernel also.
> >
> > What is the concern of having the kernel to support it?
> 
> Just more complicated BTF validation logic, which means that there are
> higher chances of permitting invalid BTF. And then the question is
> what can the kernel do with those EXTERNs in BTF? Probably nothing.
> And that .ksyms section is special, and purely libbpf convention.
> Ideally kernel should not allow EXTERN funcs in any other DATASEC. Are
> you willing to hard-code ".ksyms" name in kernel for libbpf's sake?
> Probably not. The general rule, so far, was that kernel shouldn't see
> any unresolved EXTERN at all. Now it's neither here nor there. EXTERN
> funcs are ok, EXTERN vars are not.
Exactly, it is a libbpf convention.  The kernel does not need to enforce it.
The kernel only needs to be able to support the debug info generated by
llvm and to display/dump it later.

There are many other things in the BTF that the kernel does not need to
know.  They are there for debugging, which is what BTF is used for.  Yes,
the func call can be discovered from the instruction dump.  It is also nice to
see everything in one ksyms datasec during btf dump.

If there is a need to strip everything that the kernel does not need
from the BTF, it can all be stripped in another "--strip-debug" like
option.

To support EXTERN var, the kernel part should be fine.  I am just not
sure why libbpf has to change the vs->size and vs->offset.


> 
> >
> > > If anyone would want to know all the kernel functions that some BPF
> > > program is using, they could do it from the instruction dump, with
> > > proper addresses and kernel function names nicely displayed there.
> > > That's way more useful, IMO.


* Re: [PATCH bpf-next 02/15] bpf: btf: Support parsing extern func
  2021-03-19 22:19             ` Martin KaFai Lau
@ 2021-03-19 22:29               ` Andrii Nakryiko
  2021-03-19 22:45                 ` Martin KaFai Lau
  0 siblings, 1 reply; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-19 22:29 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Fri, Mar 19, 2021 at 3:19 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Fri, Mar 19, 2021 at 02:27:13PM -0700, Andrii Nakryiko wrote:
> > On Thu, Mar 18, 2021 at 10:29 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > On Thu, Mar 18, 2021 at 09:13:56PM -0700, Andrii Nakryiko wrote:
> > > > On Thu, Mar 18, 2021 at 4:39 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > >
> > > > > On Thu, Mar 18, 2021 at 03:53:38PM -0700, Andrii Nakryiko wrote:
> > > > > > On Tue, Mar 16, 2021 at 12:01 AM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > > >
> > > > > > > This patch makes BTF verifier to accept extern func. It is used for
> > > > > > > allowing bpf program to call a limited set of kernel functions
> > > > > > > in a later patch.
> > > > > > >
> > > > > > > When writing bpf prog, the extern kernel function needs
> > > > > > > to be declared under a ELF section (".ksyms") which is
> > > > > > > the same as the current extern kernel variables and that should
> > > > > > > keep its usage consistent without requiring to remember another
> > > > > > > section name.
> > > > > > >
> > > > > > > For example, in a bpf_prog.c:
> > > > > > >
> > > > > > > > extern int foo(struct sock *) __attribute__((section(".ksyms")));
> > > > > > >
> > > > > > > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> > > > > > >         '(anon)' type_id=18
> > > > > > > [25] FUNC 'foo' type_id=24 linkage=extern
> > > > > > > [ ... ]
> > > > > > > [33] DATASEC '.ksyms' size=0 vlen=1
> > > > > > >         type_id=25 offset=0 size=0
> > > > > > >
> > > > > > > LLVM will put the "func" type into the BTF datasec ".ksyms".
> > > > > > > The current "btf_datasec_check_meta()" assumes everything under
> > > > > > > it is a "var" and ensures it has non-zero size ("!vsi->size" test).
> > > > > > > The non-zero size check is not true for "func".  This patch postpones the
> > > > > > > "!vsi-size" test from "btf_datasec_check_meta()" to
> > > > > > > "btf_datasec_resolve()" which has all types collected to decide
> > > > > > > if a vsi is a "var" or a "func" and then enforce the "vsi->size"
> > > > > > > differently.
> > > > > > >
> > > > > > > If the datasec only has "func", its "t->size" could be zero.
> > > > > > > Thus, the current "!t->size" test is no longer valid.  The
> > > > > > > invalid "t->size" will still be caught by the later
> > > > > > > "last_vsi_end_off > t->size" check.   This patch also takes this
> > > > > > > chance to consolidate other "t->size" tests ("vsi->offset >= t->size"
> > > > > > > "vsi->size > t->size", and "t->size < sum") into the existing
> > > > > > > "last_vsi_end_off > t->size" test.
> > > > > > >
> > > > > > > The LLVM will also put those extern kernel function as an extern
> > > > > > > linkage func in the BTF:
> > > > > > >
> > > > > > > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> > > > > > >         '(anon)' type_id=18
> > > > > > > [25] FUNC 'foo' type_id=24 linkage=extern
> > > > > > >
> > > > > > > This patch allows BTF_FUNC_EXTERN in btf_func_check_meta().
> > > > > > > Also extern kernel function declaration does not
> > > > > > > > necessarily have arg name. Another change in btf_func_check() is
> > > > > > > to allow extern function having no arg name.
> > > > > > >
> > > > > > > The btf selftest is adjusted accordingly.  New tests are also added.
> > > > > > >
> > > > > > > The required LLVM patch: https://reviews.llvm.org/D93563
> > > > > > >
> > > > > > > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > > > > > > ---
> > > > > >
> > > > > > High-level question about EXTERN functions in DATASEC. Does kernel
> > > > > > need to see them under DATASEC? What if libbpf just removed all EXTERN
> > > > > > funcs from under DATASEC and leave them as "free-floating" EXTERN
> > > > > > FUNCs in BTF.
> > > > > >
> > > > > > We need to tag EXTERNs with DATASECs mainly for libbpf to know whether
> > > > > > it's .kconfig or .ksym or other type of externs. Does kernel need to
> > > > > > care?
> > > > > > Although the kernel does not need to know, since a legit llvm generates it,
> > > > > I go with a proper support in the kernel (e.g. bpftool btf dump can better
> > > > > reflect what was there).
> > > >
> > > > LLVM also generates extern VAR with BTF_VAR_EXTERN, yet libbpf is
> > > > replacing it with fake INTs.
> > > Yep. I noticed the loop in collect_extern() in libbpf.
> > > It replaces the var->type with INT.
> > >
> > > > We could do just that here as well.
> > > What to replace in the FUNC case?
> >
> > if we do that, I'd just replace them with same INTs. Or we can just
> > remove the entire DATASEC. Now it is easier to do with BTF write APIs.
> > Back then it was a major pain. I'd probably get rid of DATASEC
> > altogether instead of that INT replacement, if I had BTF write APIs.
> Do you mean vsi->type = INT?

yes, that's what existing logic does for EXTERN var

>
> >
> > >
> > > Regardless, supporting it properly in the kernel is a better way to go
> > > instead of asking the userspace to move around it.  It is not very
> > > complicated to support it in the kernel also.
> > >
> > > What is the concern of having the kernel to support it?
> >
> > Just more complicated BTF validation logic, which means that there are
> > higher chances of permitting invalid BTF. And then the question is
> > what can the kernel do with those EXTERNs in BTF? Probably nothing.
> > And that .ksyms section is special, and purely libbpf convention.
> > Ideally kernel should not allow EXTERN funcs in any other DATASEC. Are
> > you willing to hard-code ".ksyms" name in kernel for libbpf's sake?
> > Probably not. The general rule, so far, was that kernel shouldn't see
> > any unresolved EXTERN at all. Now it's neither here nor there. EXTERN
> > funcs are ok, EXTERN vars are not.
> Exactly, it is a libbpf convention.  The kernel does not need to enforce it.
> The kernel only needs to be able to support the debug info generated by
> llvm and to display/dump it later.
>
> There are many other things in the BTF that the kernel does not need to

Curious, what are those many other things?

> know.  They are there for debugging, which is what BTF is used for.  Yes,
> the func call can be discovered from the instruction dump.  It is also nice to
> see everything in one ksyms datasec during btf dump.
>
> If there is a need to strip everything that the kernel does not need
> from the BTF, it can all be stripped in another "--strip-debug" like
> option.

Where does this "--strip-debug" option go? Clang, pahole, or bpftool?
Or am I misunderstanding what you are proposing?

>
> To support EXTERN var, the kernel part should be fine.  I am just not
> sure why libbpf has to change the vs->size and vs->offset.

vs->size and vs->offset are adjusted to match int type. Otherwise
kernel BTF validation will complain about DATASEC size mismatch.
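
As a rough illustration of that adjustment (a simplified sketch, not the actual libbpf code): once every extern in the DATASEC is treated as a fake int, the entries are laid out back to back and the section size follows from that, so each entry's offset plus size stays within the section. The struct below only mirrors the three fields of struct btf_var_secinfo, and both helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Same three fields as struct btf_var_secinfo, redefined here so the
 * sketch builds standalone.
 */
struct var_secinfo {
	uint32_t type;
	uint32_t offset;
	uint32_t size;
};

/* Hypothetical helper: lay out n entries as consecutive ints and
 * return the resulting section size.
 */
static uint32_t layout_as_ints(struct var_secinfo *vsi, size_t n)
{
	uint32_t off = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		vsi[i].offset = off;
		vsi[i].size = sizeof(int);
		off += sizeof(int);
	}

	return off;
}

/* Hypothetical consistency check in the spirit of the DATASEC
 * validation: no entry may end past the section size.
 */
static int datasec_fits(const struct var_secinfo *vsi, size_t n,
			uint32_t sec_size)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t end = (uint64_t)vsi[i].offset + vsi[i].size;

		if (end > sec_size)
			return 0;
	}

	return 1;
}
```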

>
>
> >
> > >
> > > > If anyone would want to know all the kernel functions that some BPF
> > > > program is using, they could do it from the instruction dump, with
> > > > proper addresses and kernel function names nicely displayed there.
> > > > That's way more useful, IMO.


* Re: [PATCH bpf-next 02/15] bpf: btf: Support parsing extern func
  2021-03-19 22:29               ` Andrii Nakryiko
@ 2021-03-19 22:45                 ` Martin KaFai Lau
  2021-03-19 23:02                   ` Andrii Nakryiko
  0 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-19 22:45 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Fri, Mar 19, 2021 at 03:29:57PM -0700, Andrii Nakryiko wrote:
> On Fri, Mar 19, 2021 at 3:19 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Fri, Mar 19, 2021 at 02:27:13PM -0700, Andrii Nakryiko wrote:
> > > On Thu, Mar 18, 2021 at 10:29 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > >
> > > > On Thu, Mar 18, 2021 at 09:13:56PM -0700, Andrii Nakryiko wrote:
> > > > > On Thu, Mar 18, 2021 at 4:39 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > >
> > > > > > On Thu, Mar 18, 2021 at 03:53:38PM -0700, Andrii Nakryiko wrote:
> > > > > > > On Tue, Mar 16, 2021 at 12:01 AM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > > > >
> > > > > > > > This patch makes BTF verifier to accept extern func. It is used for
> > > > > > > > allowing bpf program to call a limited set of kernel functions
> > > > > > > > in a later patch.
> > > > > > > >
> > > > > > > > When writing bpf prog, the extern kernel function needs
> > > > > > > > to be declared under a ELF section (".ksyms") which is
> > > > > > > > the same as the current extern kernel variables and that should
> > > > > > > > keep its usage consistent without requiring to remember another
> > > > > > > > section name.
> > > > > > > >
> > > > > > > > For example, in a bpf_prog.c:
> > > > > > > >
> > > > > > > > > extern int foo(struct sock *) __attribute__((section(".ksyms")));
> > > > > > > >
> > > > > > > > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> > > > > > > >         '(anon)' type_id=18
> > > > > > > > [25] FUNC 'foo' type_id=24 linkage=extern
> > > > > > > > [ ... ]
> > > > > > > > [33] DATASEC '.ksyms' size=0 vlen=1
> > > > > > > >         type_id=25 offset=0 size=0
> > > > > > > >
> > > > > > > > LLVM will put the "func" type into the BTF datasec ".ksyms".
> > > > > > > > The current "btf_datasec_check_meta()" assumes everything under
> > > > > > > > it is a "var" and ensures it has non-zero size ("!vsi->size" test).
> > > > > > > > The non-zero size check is not true for "func".  This patch postpones the
> > > > > > > > "!vsi-size" test from "btf_datasec_check_meta()" to
> > > > > > > > "btf_datasec_resolve()" which has all types collected to decide
> > > > > > > > if a vsi is a "var" or a "func" and then enforce the "vsi->size"
> > > > > > > > differently.
> > > > > > > >
> > > > > > > > If the datasec only has "func", its "t->size" could be zero.
> > > > > > > > Thus, the current "!t->size" test is no longer valid.  The
> > > > > > > > invalid "t->size" will still be caught by the later
> > > > > > > > "last_vsi_end_off > t->size" check.   This patch also takes this
> > > > > > > > chance to consolidate other "t->size" tests ("vsi->offset >= t->size"
> > > > > > > > "vsi->size > t->size", and "t->size < sum") into the existing
> > > > > > > > "last_vsi_end_off > t->size" test.
> > > > > > > >
> > > > > > > > The LLVM will also put those extern kernel function as an extern
> > > > > > > > linkage func in the BTF:
> > > > > > > >
> > > > > > > > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> > > > > > > >         '(anon)' type_id=18
> > > > > > > > [25] FUNC 'foo' type_id=24 linkage=extern
> > > > > > > >
> > > > > > > > This patch allows BTF_FUNC_EXTERN in btf_func_check_meta().
> > > > > > > > Also extern kernel function declaration does not
> > > > > > > > > necessarily have arg name. Another change in btf_func_check() is
> > > > > > > > to allow extern function having no arg name.
> > > > > > > >
> > > > > > > > The btf selftest is adjusted accordingly.  New tests are also added.
> > > > > > > >
> > > > > > > > The required LLVM patch: https://reviews.llvm.org/D93563 
> > > > > > > >
> > > > > > > > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > > > > > > > ---
> > > > > > >
> > > > > > > High-level question about EXTERN functions in DATASEC. Does kernel
> > > > > > > need to see them under DATASEC? What if libbpf just removed all EXTERN
> > > > > > > funcs from under DATASEC and leave them as "free-floating" EXTERN
> > > > > > > FUNCs in BTF.
> > > > > > >
> > > > > > > We need to tag EXTERNs with DATASECs mainly for libbpf to know whether
> > > > > > > it's .kconfig or .ksym or other type of externs. Does kernel need to
> > > > > > > care?
> > > > > > > Although the kernel does not need to know, since a legit llvm generates it,
> > > > > > I go with a proper support in the kernel (e.g. bpftool btf dump can better
> > > > > > reflect what was there).
> > > > >
> > > > > LLVM also generates extern VAR with BTF_VAR_EXTERN, yet libbpf is
> > > > > replacing it with fake INTs.
> > > > Yep. I noticed the loop in collect_extern() in libbpf.
> > > > It replaces the var->type with INT.
> > > >
> > > > > We could do just that here as well.
> > > > What to replace in the FUNC case?
> > >
> > > if we do that, I'd just replace them with same INTs. Or we can just
> > > remove the entire DATASEC. Now it is easier to do with BTF write APIs.
> > > Back then it was a major pain. I'd probably get rid of DATASEC
> > > altogether instead of that INT replacement, if I had BTF write APIs.
> > Do you mean vsi->type = INT?
> 
> yes, that's what existing logic does for EXTERN var
There may be no var.

> 
> >
> > >
> > > >
> > > > Regardless, supporting it properly in the kernel is a better way to go
> > > > instead of asking the userspace to move around it.  It is not very
> > > > complicated to support it in the kernel also.
> > > >
> > > > What is the concern of having the kernel to support it?
> > >
> > > Just more complicated BTF validation logic, which means that there are
> > > higher chances of permitting invalid BTF. And then the question is
> > > what can the kernel do with those EXTERNs in BTF? Probably nothing.
> > > And that .ksyms section is special, and purely libbpf convention.
> > > Ideally kernel should not allow EXTERN funcs in any other DATASEC. Are
> > > you willing to hard-code ".ksyms" name in kernel for libbpf's sake?
> > > Probably not. The general rule, so far, was that kernel shouldn't see
> > > any unresolved EXTERN at all. Now it's neither here nor there. EXTERN
> > > funcs are ok, EXTERN vars are not.
> > Exactly, it is libbpf convention.  The kernel does not need to enforce it.
> > The kernel only needs to be able to support the debug info generated by
> > llvm and being able to display/dump it later.
> >
> > There are many other things in the BTF that the kernel does not need to
> 
> Curious, what are those many other things?
VAR '_license'.
Deeper things could be STRUCT 'tcp_congestion_ops' and the types under it.

> 
> > know.  It is there for debug purpose which the BTF is used for.  Yes,
> > the func call can be discovered by instruction dump.  It is also nice to
> > see everything in one ksyms datasec also during btf dump.
> >
> > If there is a need to strip everything that the kernel does not need
> > from the BTF, it can all be stripped in another "--strip-debug" like
> > option.
> 
> Where does this "--strip-debug" option go? Clang, pahole, or bpftool?
> Or am I misunderstanding what you are proposing?
Could be a libbpf option during load? Or it can be done during gen skel?

> 
> >
> > To support EXTERN var, the kernel part should be fine.  I am only not
> > sure why it has to change the vs->size and vs->offset in libbpf?
> 
> vs->size and vs->offset are adjusted to match int type. Otherwise
> kernel BTF validation will complain about DATASEC size mismatch.
Makes sense. So if there is no need to replace it with INT,
they can be left as is?

> 
> >
> >
> > >
> > > >
> > > > > If anyone would want to know all the kernel functions that some BPF
> > > > > program is using, they could do it from the instruction dump, with
> > > > > proper addresses and kernel function names nicely displayed there.
> > > > > That's way more useful, IMO.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH bpf-next 02/15] bpf: btf: Support parsing extern func
  2021-03-19 22:45                 ` Martin KaFai Lau
@ 2021-03-19 23:02                   ` Andrii Nakryiko
  2021-03-20  0:13                     ` Martin KaFai Lau
  0 siblings, 1 reply; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-19 23:02 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Fri, Mar 19, 2021 at 3:45 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Fri, Mar 19, 2021 at 03:29:57PM -0700, Andrii Nakryiko wrote:
> > On Fri, Mar 19, 2021 at 3:19 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > On Fri, Mar 19, 2021 at 02:27:13PM -0700, Andrii Nakryiko wrote:
> > > > On Thu, Mar 18, 2021 at 10:29 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > >
> > > > > On Thu, Mar 18, 2021 at 09:13:56PM -0700, Andrii Nakryiko wrote:
> > > > > > On Thu, Mar 18, 2021 at 4:39 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > > >
> > > > > > > On Thu, Mar 18, 2021 at 03:53:38PM -0700, Andrii Nakryiko wrote:
> > > > > > > > On Tue, Mar 16, 2021 at 12:01 AM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > > > > >
> > > > > > > > > This patch makes BTF verifier to accept extern func. It is used for
> > > > > > > > > allowing bpf program to call a limited set of kernel functions
> > > > > > > > > in a later patch.
> > > > > > > > >
> > > > > > > > > When writing bpf prog, the extern kernel function needs
> > > > > > > > > to be declared under a ELF section (".ksyms") which is
> > > > > > > > > the same as the current extern kernel variables and that should
> > > > > > > > > keep its usage consistent without requiring to remember another
> > > > > > > > > section name.
> > > > > > > > >
> > > > > > > > > For example, in a bpf_prog.c:
> > > > > > > > >
> > > > > > > > > extern int foo(struct sock *) __attribute__((section(".ksyms")))
> > > > > > > > >
> > > > > > > > > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> > > > > > > > >         '(anon)' type_id=18
> > > > > > > > > [25] FUNC 'foo' type_id=24 linkage=extern
> > > > > > > > > [ ... ]
> > > > > > > > > [33] DATASEC '.ksyms' size=0 vlen=1
> > > > > > > > >         type_id=25 offset=0 size=0
> > > > > > > > >
> > > > > > > > > LLVM will put the "func" type into the BTF datasec ".ksyms".
> > > > > > > > > The current "btf_datasec_check_meta()" assumes everything under
> > > > > > > > > it is a "var" and ensures it has non-zero size ("!vsi->size" test).
> > > > > > > > > The non-zero size check is not true for "func".  This patch postpones the
> > > > > > > > > "!vsi-size" test from "btf_datasec_check_meta()" to
> > > > > > > > > "btf_datasec_resolve()" which has all types collected to decide
> > > > > > > > > if a vsi is a "var" or a "func" and then enforce the "vsi->size"
> > > > > > > > > differently.
> > > > > > > > >
> > > > > > > > > If the datasec only has "func", its "t->size" could be zero.
> > > > > > > > > Thus, the current "!t->size" test is no longer valid.  The
> > > > > > > > > invalid "t->size" will still be caught by the later
> > > > > > > > > "last_vsi_end_off > t->size" check.   This patch also takes this
> > > > > > > > > chance to consolidate other "t->size" tests ("vsi->offset >= t->size"
> > > > > > > > > "vsi->size > t->size", and "t->size < sum") into the existing
> > > > > > > > > "last_vsi_end_off > t->size" test.
> > > > > > > > >
> > > > > > > > > The LLVM will also put those extern kernel function as an extern
> > > > > > > > > linkage func in the BTF:
> > > > > > > > >
> > > > > > > > > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> > > > > > > > >         '(anon)' type_id=18
> > > > > > > > > [25] FUNC 'foo' type_id=24 linkage=extern
> > > > > > > > >
> > > > > > > > > This patch allows BTF_FUNC_EXTERN in btf_func_check_meta().
> > > > > > > > > Also extern kernel function declaration does not
> > > > > > > > > necessary have arg name. Another change in btf_func_check() is
> > > > > > > > > to allow extern function having no arg name.
> > > > > > > > >
> > > > > > > > > The btf selftest is adjusted accordingly.  New tests are also added.
> > > > > > > > >
> > > > > > > > > The required LLVM patch: https://reviews.llvm.org/D93563
> > > > > > > > >
> > > > > > > > > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > > > > > > > > ---
> > > > > > > >
> > > > > > > > High-level question about EXTERN functions in DATASEC. Does kernel
> > > > > > > > need to see them under DATASEC? What if libbpf just removed all EXTERN
> > > > > > > > funcs from under DATASEC and leave them as "free-floating" EXTERN
> > > > > > > > FUNCs in BTF.
> > > > > > > >
> > > > > > > > We need to tag EXTERNs with DATASECs mainly for libbpf to know whether
> > > > > > > > it's .kconfig or .ksym or other type of externs. Does kernel need to
> > > > > > > > care?
> > > > > > > Although the kernel does not need to know, since the a legit llvm generates it,
> > > > > > > I go with a proper support in the kernel (e.g. bpftool btf dump can better
> > > > > > > reflect what was there).
> > > > > >
> > > > > > LLVM also generates extern VAR with BTF_VAR_EXTERN, yet libbpf is
> > > > > > replacing it with fake INTs.
> > > > > Yep. I noticed the loop in collect_extern() in libbpf.
> > > > > It replaces the var->type with INT.
> > > > >
> > > > > > We could do just that here as well.
> > > > > What to replace in the FUNC case?
> > > >
> > > > if we do that, I'd just replace them with same INTs. Or we can just
> > > > remove the entire DATASEC. Now it is easier to do with BTF write APIs.
> > > > Back then it was a major pain. I'd probably get rid of DATASEC
> > > > altogether instead of that INT replacement, if I had BTF write APIs.
> > > Do you mean vsi->type = INT?
> >
> > yes, that's what existing logic does for EXTERN var
> There may be no var.
>

sure, but we have btf__add_var(), if we really want VAR ;)

> >
> > >
> > > >
> > > > >
> > > > > Regardless, supporting it properly in the kernel is a better way to go
> > > > > instead of asking the userspace to move around it.  It is not very
> > > > > complicated to support it in the kernel also.
> > > > >
> > > > > What is the concern of having the kernel to support it?
> > > >
> > > > Just more complicated BTF validation logic, which means that there are
> > > > higher chances of permitting invalid BTF. And then the question is
> > > > what can the kernel do with those EXTERNs in BTF? Probably nothing.
> > > > And that .ksyms section is special, and purely libbpf convention.
> > > > Ideally kernel should not allow EXTERN funcs in any other DATASEC. Are
> > > > you willing to hard-code ".ksyms" name in kernel for libbpf's sake?
> > > > Probably not. The general rule, so far, was that kernel shouldn't see
> > > > any unresolved EXTERN at all. Now it's neither here nor there. EXTERN
> > > > funcs are ok, EXTERN vars are not.
> > > Exactly, it is libbpf convention.  The kernel does not need to enforce it.
> > > The kernel only needs to be able to support the debug info generated by
> > > llvm and being able to display/dump it later.
> > >
> > > There are many other things in the BTF that the kernel does not need to
> >
> > Curious, what are those many other things?
> VAR '_license'.
> deeper things could be STRUCT 'tcp_congestion_ops' and the types under it.
>

The kernel is aware of DATASEC in general: it validates variable sizes and
offsets, and the datasec size itself. DATASEC can be assigned as
value_type_id for maps. So I guess technically you are correct that it
doesn't care about VAR _license specifically, but it has to care about
DATASEC/VARs in general. The same applies to STRUCT 'tcp_congestion_ops'.
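
To make the size/offset validation concrete, here is a minimal sketch of
that kind of check, modeled only on the tests named in the quoted patch
description ("last_vsi_end_off > t->size", and the zero size of a "func"
entry). It is not the kernel's code: struct fake_vsi, fake_datasec_check()
and the rule that a func entry must have size 0 are assumptions made for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DATASEC entry (not the kernel's type). */
struct fake_vsi {
	uint32_t offset;
	uint32_t size;
	bool is_func;	/* assumed flag: entry refers to a FUNC, not a VAR */
};

/*
 * Walk the entries in order and reject any that overlap the previous one,
 * have the wrong size for their kind, or end past the section size.
 */
int fake_datasec_check(const struct fake_vsi *vsi, size_t vlen,
		       uint32_t sec_size)
{
	uint64_t last_end = 0;
	size_t i;

	assert(vsi || vlen == 0);

	for (i = 0; i < vlen; i++) {
		/* entries must be sorted and must not overlap */
		if (vsi[i].offset < last_end)
			return -EINVAL;

		/* assumption: a var needs a non-zero size, a func has none */
		if (vsi[i].is_func ? vsi[i].size != 0 : vsi[i].size == 0)
			return -EINVAL;

		/* the single "end offset vs. section size" test */
		last_end = (uint64_t)vsi[i].offset + vsi[i].size;
		if (last_end > sec_size)
			return -EINVAL;
	}

	return 0;
}
```

With this shape, a section holding only func entries passes with
sec_size == 0, which is the case the old "!t->size" test would reject.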

I'm fine with extending the kernel with EXTERN funcs, btw. I just
don't think it's necessary. But then also let's support EXTERN vars
for consistency.

> >
> > > know.  It is there for debug purpose which the BTF is used for.  Yes,
> > > the func call can be discovered by instruction dump.  It is also nice to
> > > see everything in one ksyms datasec also during btf dump.
> > >
> > > If there is a need to strip everything that the kernel does not need
> > > from the BTF, it can all be stripped in another "--strip-debug" like
> > > option.
> >
> > Where does this "--strip-debug" option go? Clang, pahole, or bpftool?
> > Or am I misunderstanding what you are proposing?
> Could be a libbpf option during load? or it can be done during gen skel?

libbpf already sanitizes BTF, removing and adjusting BTF information
(e.g., DATASEC sizes). So that's happening automatically. I wasn't
sure what that new stripping option would do, which is why I asked.

>
> >
> > >
> > > To support EXTERN var, the kernel part should be fine.  I am only not
> > > sure why it has to change the vs->size and vs->offset in libbpf?
> >
> > vs->size and vs->offset are adjusted to match int type. Otherwise
> > kernel BTF validation will complain about DATASEC size mismatch.
> make sense. so if there is no need to replace it with INT,
> they can be left as is?

If the kernel starts supporting EXTERN vars, yes, we won't need to touch
it. But of course, to support older kernels, libbpf will still have to
do this. EXTERN vars won't reduce the amount of libbpf logic.
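
For reference, the size/offset adjustment mentioned above can be sketched
roughly as follows. This is an illustration of the idea only, not libbpf's
code: struct fake_secinfo, struct fake_datasec, FAKE_INT_SIZE and
fixup_extern_datasec() are hypothetical names, and a 4-byte int is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_INT_SIZE 4u	/* assumed size of the replacement int type */

/* Hypothetical stand-ins for a DATASEC entry and the DATASEC itself. */
struct fake_secinfo {
	uint32_t offset;
	uint32_t size;
};

struct fake_datasec {
	uint32_t size;
	uint32_t vlen;
	struct fake_secinfo *vsi;
};

/*
 * Lay the entries out back to back as if each one were an int, then set
 * the section size to the end of the last entry so the total matches.
 */
void fixup_extern_datasec(struct fake_datasec *sec)
{
	uint32_t off = 0;
	uint32_t i;

	assert(sec && (sec->vsi || sec->vlen == 0));

	for (i = 0; i < sec->vlen; i++) {
		sec->vsi[i].offset = off;
		sec->vsi[i].size = FAKE_INT_SIZE;
		off += FAKE_INT_SIZE;
	}

	sec->size = off;
}
```

Without the final "sec->size = off" step, the entries would end past the
declared section size, which is the DATASEC size mismatch mentioned above.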

>
> >
> > >
> > >
> > > >
> > > > >
> > > > > > If anyone would want to know all the kernel functions that some BPF
> > > > > > program is using, they could do it from the instruction dump, with
> > > > > > proper addresses and kernel function names nicely displayed there.
> > > > > > That's way more useful, IMO.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH bpf-next 03/15] bpf: Refactor btf_check_func_arg_match
  2021-03-19 21:51       ` Andrii Nakryiko
@ 2021-03-20  0:10         ` Alexei Starovoitov
  2021-03-20 17:13           ` Andrii Nakryiko
  0 siblings, 1 reply; 49+ messages in thread
From: Alexei Starovoitov @ 2021-03-20  0:10 UTC (permalink / raw)
  To: Andrii Nakryiko, Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On 3/19/21 2:51 PM, Andrii Nakryiko wrote:
> 
> It's a matter of taste, I suppose. I'd probably disagree with you on
> the readability of those verifier parts ;) So it's up to you, of
> course, but for me this code pattern:
> 
> for (...) {
>      if (A) {
>          handleA;
>      } else if (B) {
>          handleB;
>      } else {
>          return -EINVAL;
>      }
> }
> 
> is much harder to follow than more linear (imo)
> 
> for (...) {
>      if (A) {
>          handleA;
>          continue;
>      }
> 
>      if (!B)
>          return -EINVAL;
> 
>      handleB;
> }
> 
> especially if handleA and handleB are quite long and complicated.
> Because I have to jump back and forth to validate that C is not
> allowed/handled later, and that there is no common subsequent logic
> for both A and B (or even C). In the latter code pattern there are
> clear "only A" and "only B" logic and it's quite obvious that no C is
> allowed/handled.

My $.02: I like the former (Martin's case) much better than the latter.
We had a few patterns like the latter in the past and had to turn them
into the former because "case C" appeared.
In other words:
if (A)
else if (B)
else
   return

is much easier to extend for C and later convert to 'switch' with 'D':
less code churn, easier to refactor.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH bpf-next 02/15] bpf: btf: Support parsing extern func
  2021-03-19 23:02                   ` Andrii Nakryiko
@ 2021-03-20  0:13                     ` Martin KaFai Lau
  2021-03-20 17:18                       ` Andrii Nakryiko
  0 siblings, 1 reply; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-20  0:13 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Fri, Mar 19, 2021 at 04:02:27PM -0700, Andrii Nakryiko wrote:
> On Fri, Mar 19, 2021 at 3:45 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Fri, Mar 19, 2021 at 03:29:57PM -0700, Andrii Nakryiko wrote:
> > > On Fri, Mar 19, 2021 at 3:19 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > >
> > > > On Fri, Mar 19, 2021 at 02:27:13PM -0700, Andrii Nakryiko wrote:
> > > > > On Thu, Mar 18, 2021 at 10:29 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > >
> > > > > > On Thu, Mar 18, 2021 at 09:13:56PM -0700, Andrii Nakryiko wrote:
> > > > > > > On Thu, Mar 18, 2021 at 4:39 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Mar 18, 2021 at 03:53:38PM -0700, Andrii Nakryiko wrote:
> > > > > > > > > On Tue, Mar 16, 2021 at 12:01 AM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > > > > > >
> > > > > > > > > > This patch makes BTF verifier to accept extern func. It is used for
> > > > > > > > > > allowing bpf program to call a limited set of kernel functions
> > > > > > > > > > in a later patch.
> > > > > > > > > >
> > > > > > > > > > When writing bpf prog, the extern kernel function needs
> > > > > > > > > > to be declared under a ELF section (".ksyms") which is
> > > > > > > > > > the same as the current extern kernel variables and that should
> > > > > > > > > > keep its usage consistent without requiring to remember another
> > > > > > > > > > section name.
> > > > > > > > > >
> > > > > > > > > > For example, in a bpf_prog.c:
> > > > > > > > > >
> > > > > > > > > > extern int foo(struct sock *) __attribute__((section(".ksyms")))
> > > > > > > > > >
> > > > > > > > > > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> > > > > > > > > >         '(anon)' type_id=18
> > > > > > > > > > [25] FUNC 'foo' type_id=24 linkage=extern
> > > > > > > > > > [ ... ]
> > > > > > > > > > [33] DATASEC '.ksyms' size=0 vlen=1
> > > > > > > > > >         type_id=25 offset=0 size=0
> > > > > > > > > >
> > > > > > > > > > LLVM will put the "func" type into the BTF datasec ".ksyms".
> > > > > > > > > > The current "btf_datasec_check_meta()" assumes everything under
> > > > > > > > > > it is a "var" and ensures it has non-zero size ("!vsi->size" test).
> > > > > > > > > > The non-zero size check is not true for "func".  This patch postpones the
> > > > > > > > > > "!vsi-size" test from "btf_datasec_check_meta()" to
> > > > > > > > > > "btf_datasec_resolve()" which has all types collected to decide
> > > > > > > > > > if a vsi is a "var" or a "func" and then enforce the "vsi->size"
> > > > > > > > > > differently.
> > > > > > > > > >
> > > > > > > > > > If the datasec only has "func", its "t->size" could be zero.
> > > > > > > > > > Thus, the current "!t->size" test is no longer valid.  The
> > > > > > > > > > invalid "t->size" will still be caught by the later
> > > > > > > > > > "last_vsi_end_off > t->size" check.   This patch also takes this
> > > > > > > > > > chance to consolidate other "t->size" tests ("vsi->offset >= t->size"
> > > > > > > > > > "vsi->size > t->size", and "t->size < sum") into the existing
> > > > > > > > > > "last_vsi_end_off > t->size" test.
> > > > > > > > > >
> > > > > > > > > > The LLVM will also put those extern kernel function as an extern
> > > > > > > > > > linkage func in the BTF:
> > > > > > > > > >
> > > > > > > > > > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> > > > > > > > > >         '(anon)' type_id=18
> > > > > > > > > > [25] FUNC 'foo' type_id=24 linkage=extern
> > > > > > > > > >
> > > > > > > > > > This patch allows BTF_FUNC_EXTERN in btf_func_check_meta().
> > > > > > > > > > Also extern kernel function declaration does not
> > > > > > > > > > necessary have arg name. Another change in btf_func_check() is
> > > > > > > > > > to allow extern function having no arg name.
> > > > > > > > > >
> > > > > > > > > > The btf selftest is adjusted accordingly.  New tests are also added.
> > > > > > > > > >
> > > > > > > > > > The required LLVM patch: https://reviews.llvm.org/D93563 
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > > > > > > > > > ---
> > > > > > > > >
> > > > > > > > > High-level question about EXTERN functions in DATASEC. Does kernel
> > > > > > > > > need to see them under DATASEC? What if libbpf just removed all EXTERN
> > > > > > > > > funcs from under DATASEC and leave them as "free-floating" EXTERN
> > > > > > > > > FUNCs in BTF.
> > > > > > > > >
> > > > > > > > > We need to tag EXTERNs with DATASECs mainly for libbpf to know whether
> > > > > > > > > it's .kconfig or .ksym or other type of externs. Does kernel need to
> > > > > > > > > care?
> > > > > > > > Although the kernel does not need to know, since the a legit llvm generates it,
> > > > > > > > I go with a proper support in the kernel (e.g. bpftool btf dump can better
> > > > > > > > reflect what was there).
> > > > > > >
> > > > > > > LLVM also generates extern VAR with BTF_VAR_EXTERN, yet libbpf is
> > > > > > > replacing it with fake INTs.
> > > > > > Yep. I noticed the loop in collect_extern() in libbpf.
> > > > > > It replaces the var->type with INT.
> > > > > >
> > > > > > > We could do just that here as well.
> > > > > > What to replace in the FUNC case?
> > > > >
> > > > > if we do that, I'd just replace them with same INTs. Or we can just
> > > > > remove the entire DATASEC. Now it is easier to do with BTF write APIs.
> > > > > Back then it was a major pain. I'd probably get rid of DATASEC
> > > > > altogether instead of that INT replacement, if I had BTF write APIs.
> > > > Do you mean vsi->type = INT?
> > >
> > > yes, that's what existing logic does for EXTERN var
> > There may be no var.
> >
> 
> sure, but we have btf__add_var(), if we really want VAR ;)
> 
> > >
> > > >
> > > > >
> > > > > >
> > > > > > Regardless, supporting it properly in the kernel is a better way to go
> > > > > > instead of asking the userspace to move around it.  It is not very
> > > > > > complicated to support it in the kernel also.
> > > > > >
> > > > > > What is the concern of having the kernel to support it?
> > > > >
> > > > > Just more complicated BTF validation logic, which means that there are
> > > > > higher chances of permitting invalid BTF. And then the question is
> > > > > what can the kernel do with those EXTERNs in BTF? Probably nothing.
> > > > > And that .ksyms section is special, and purely libbpf convention.
> > > > > Ideally kernel should not allow EXTERN funcs in any other DATASEC. Are
> > > > > you willing to hard-code ".ksyms" name in kernel for libbpf's sake?
> > > > > Probably not. The general rule, so far, was that kernel shouldn't see
> > > > > any unresolved EXTERN at all. Now it's neither here nor there. EXTERN
> > > > > funcs are ok, EXTERN vars are not.
> > > > Exactly, it is libbpf convention.  The kernel does not need to enforce it.
> > > > The kernel only needs to be able to support the debug info generated by
> > > > llvm and being able to display/dump it later.
> > > >
> > > > There are many other things in the BTF that the kernel does not need to
> > >
> > > Curious, what are those many other things?
> > VAR '_license'.
> > deeper things could be STRUCT 'tcp_congestion_ops' and the types under it.
> >
> 
> kernel is aware of DATASEC in general, it validates variable sizes and
> offsets, and datasec size itself. 
Yeah, the kernel still thinks it is data only now.
With func in datasec, I think the name "data"sec may be a bit outdated.

> DATASEC can be assigned as
> value_type_id for maps. So I guess technically you are correct that it
> doesn't care about VAR _license specifically, but it has to care about
> DATASEC/VARs in general. Same applies to STRUCT 'tcp_congestion_ops'.
> 
> I'm fine with extending the kernel with EXTERN funcs, btw. I just
> don't think it's necessary. But then also let's support EXTERN vars
> for consistency.
Cool. Let's explore EXTERN var support.

> > > >
> > > > To support EXTERN var, the kernel part should be fine.  I am only not
> > > > sure why it has to change the vs->size and vs->offset in libbpf?
> > >
> > > vs->size and vs->offset are adjusted to match int type. Otherwise
> > > kernel BTF validation will complain about DATASEC size mismatch.
> > make sense. so if there is no need to replace it with INT,
> > they can be left as is?
> 
> If kernel start supporting EXTERN vars, yes, we won't need to touch
> it.
From test_ksyms.c:
[22] DATASEC '.ksyms' size=0 vlen=5
     type_id=12 offset=0 size=1
     type_id=13 offset=0 size=1

For extern, does it make sense for libbpf to assign 0 to
both the var offset and size, since they do not matter?
The kernel can then ensure that a datasec has either all externs or no externs.
array_map_check_btf() will ensure the datasec has no extern.
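
A minimal sketch of the "all extern or no extern" rule, assuming a
simplified view in which each datasec member is reduced to its linkage.
enum fake_linkage, datasec_extern_consistency() and
datasec_usable_as_map_value() are hypothetical names for illustration and
are not kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified linkage of one datasec member. */
enum fake_linkage { FAKE_STATIC, FAKE_GLOBAL, FAKE_EXTERN };

/*
 * Accept a datasec whose members are either all extern or all non-extern.
 * On success, *all_extern says which of the two cases it is.
 */
int datasec_extern_consistency(const enum fake_linkage *members, size_t vlen,
			       bool *all_extern)
{
	size_t nr_extern = 0;
	size_t i;

	assert(all_extern && (members || vlen == 0));

	for (i = 0; i < vlen; i++)
		if (members[i] == FAKE_EXTERN)
			nr_extern++;

	/* a mix of extern and non-extern members is rejected */
	if (nr_extern != 0 && nr_extern != vlen)
		return -EINVAL;

	*all_extern = vlen > 0 && nr_extern == vlen;
	return 0;
}

/* A map value check would then only accept a datasec with no externs. */
bool datasec_usable_as_map_value(const enum fake_linkage *members, size_t vlen)
{
	bool all_extern;

	if (datasec_extern_consistency(members, vlen, &all_extern))
		return false;

	return !all_extern;
}
```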

> But of course to support older kernels libbpf will still have to
> do this. EXTERN vars won't reduce the amount of libbpf logic.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH bpf-next 03/15] bpf: Refactor btf_check_func_arg_match
  2021-03-20  0:10         ` Alexei Starovoitov
@ 2021-03-20 17:13           ` Andrii Nakryiko
  0 siblings, 0 replies; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-20 17:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Networking

On Fri, Mar 19, 2021 at 5:10 PM Alexei Starovoitov <ast@fb.com> wrote:
>
> On 3/19/21 2:51 PM, Andrii Nakryiko wrote:
> >
> > It's a matter of taste, I suppose. I'd probably disagree with you on
> > the readability of those verifier parts ;) So it's up to you, of
> > course, but for me this code pattern:
> >
> > for (...) {
> >      if (A) {
> >          handleA;
> >      } else if (B) {
> >          handleB;
> >      } else {
> >          return -EINVAL;
> >      }
> > }
> >
> > is much harder to follow than more linear (imo)
> >
> > for (...) {
> >      if (A) {
> >          handleA;
> >          continue;
> >      }
> >
> >      if (!B)
> >          return -EINVAL;
> >
> >      handleB;
> > }
> >
> > especially if handleA and handleB are quite long and complicated.
> > Because I have to jump back and forth to validate that C is not
> > allowed/handled later, and that there is no common subsequent logic
> > for both A and B (or even C). In the latter code pattern there are
> > clear "only A" and "only B" logic and it's quite obvious that no C is
> > allowed/handled.
>
> my .02. I like the former (Martin's case) much better than the later.
> We had few patterns like the later in the past and had to turn them
> into the former because "case C" appeared.
> In other words:
> if (A)
> else if (B)
> else
>    return
>
> is much easier to extend for C and later convert to 'switch' with 'D':
> less code churn, easier to refactor.

I think code structure should reflect current logic, not be in
preparation for further potential extension, which might not even
happen. If there are only A and B possible, then code should make it
as clear as possible. But if we anticipate another case C, then

if (A) {
    handleA;
    continue;
}
if (B) {
    handleB;
    continue;
}
return -EINVAL;

is still easier to follow and is easy to extend.

My original point was that `if () {} else if () {}` code structure
implies that there is or might be some common handling logic after
if/else, so at least my brain constantly worries about that and jumps
around in the code to validate that there isn't actually anything
else. And that gets progressively harder with longer or more
complicated logic inside handleA and handleB.
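
For a compilable side-by-side of the two shapes, here is a small sketch.
The argument kinds and function names (enum arg_kind, check_args_chained(),
check_args_linear()) are made up for illustration and are unrelated to the
actual btf_check_func_arg_match() code; "handleA" and "handleB" are reduced
to counters.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical argument kinds standing in for cases "A", "B" and "C". */
enum arg_kind { ARG_SCALAR, ARG_PTR, ARG_OTHER };

/* if / else-if / else shape: all cases live in one chain. */
int check_args_chained(const enum arg_kind *args, size_t n,
		       int *scalars, int *ptrs)
{
	size_t i;

	assert(scalars && ptrs && (args || n == 0));
	*scalars = *ptrs = 0;

	for (i = 0; i < n; i++) {
		if (args[i] == ARG_SCALAR) {
			(*scalars)++;		/* handleA */
		} else if (args[i] == ARG_PTR) {
			(*ptrs)++;		/* handleB */
		} else {
			return -EINVAL;
		}
	}

	return 0;
}

/* early-continue shape: each case ends its own iteration. */
int check_args_linear(const enum arg_kind *args, size_t n,
		      int *scalars, int *ptrs)
{
	size_t i;

	assert(scalars && ptrs && (args || n == 0));
	*scalars = *ptrs = 0;

	for (i = 0; i < n; i++) {
		if (args[i] == ARG_SCALAR) {
			(*scalars)++;		/* handleA */
			continue;
		}
		if (args[i] == ARG_PTR) {
			(*ptrs)++;		/* handleB */
			continue;
		}
		return -EINVAL;
	}

	return 0;
}
```

Both functions accept and reject the same inputs; the difference is only
in how far a reader has to scan to see that nothing runs after a handled
case.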

Anyways, I'm not trying to enforce my personal style, I tried to show
that it's objectively superior from my brain's point of view. That
`continue` is "a pruning point", if you will. But I'm not trying to
convert anyone. Please proceed with whatever code structure you feel
is better.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH bpf-next 02/15] bpf: btf: Support parsing extern func
  2021-03-20  0:13                     ` Martin KaFai Lau
@ 2021-03-20 17:18                       ` Andrii Nakryiko
  2021-03-23  4:55                         ` Martin KaFai Lau
  0 siblings, 1 reply; 49+ messages in thread
From: Andrii Nakryiko @ 2021-03-20 17:18 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Fri, Mar 19, 2021 at 5:13 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Fri, Mar 19, 2021 at 04:02:27PM -0700, Andrii Nakryiko wrote:
> > On Fri, Mar 19, 2021 at 3:45 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > On Fri, Mar 19, 2021 at 03:29:57PM -0700, Andrii Nakryiko wrote:
> > > > On Fri, Mar 19, 2021 at 3:19 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > >
> > > > > On Fri, Mar 19, 2021 at 02:27:13PM -0700, Andrii Nakryiko wrote:
> > > > > > On Thu, Mar 18, 2021 at 10:29 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > > >
> > > > > > > On Thu, Mar 18, 2021 at 09:13:56PM -0700, Andrii Nakryiko wrote:
> > > > > > > > On Thu, Mar 18, 2021 at 4:39 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Mar 18, 2021 at 03:53:38PM -0700, Andrii Nakryiko wrote:
> > > > > > > > > > On Tue, Mar 16, 2021 at 12:01 AM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > This patch makes BTF verifier to accept extern func. It is used for
> > > > > > > > > > > allowing bpf program to call a limited set of kernel functions
> > > > > > > > > > > in a later patch.
> > > > > > > > > > >
> > > > > > > > > > > When writing bpf prog, the extern kernel function needs
> > > > > > > > > > > to be declared under a ELF section (".ksyms") which is
> > > > > > > > > > > the same as the current extern kernel variables and that should
> > > > > > > > > > > keep its usage consistent without requiring to remember another
> > > > > > > > > > > section name.
> > > > > > > > > > >
> > > > > > > > > > > For example, in a bpf_prog.c:
> > > > > > > > > > >
> > > > > > > > > > > extern int foo(struct sock *) __attribute__((section(".ksyms")))
> > > > > > > > > > >
> > > > > > > > > > > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> > > > > > > > > > >         '(anon)' type_id=18
> > > > > > > > > > > [25] FUNC 'foo' type_id=24 linkage=extern
> > > > > > > > > > > [ ... ]
> > > > > > > > > > > [33] DATASEC '.ksyms' size=0 vlen=1
> > > > > > > > > > >         type_id=25 offset=0 size=0
> > > > > > > > > > >
> > > > > > > > > > > LLVM will put the "func" type into the BTF datasec ".ksyms".
> > > > > > > > > > > The current "btf_datasec_check_meta()" assumes everything under
> > > > > > > > > > > it is a "var" and ensures it has non-zero size ("!vsi->size" test).
> > > > > > > > > > > The non-zero size check is not true for "func".  This patch postpones the
> > > > > > > > > > > "!vsi-size" test from "btf_datasec_check_meta()" to
> > > > > > > > > > > "btf_datasec_resolve()" which has all types collected to decide
> > > > > > > > > > > if a vsi is a "var" or a "func" and then enforce the "vsi->size"
> > > > > > > > > > > differently.
> > > > > > > > > > >
> > > > > > > > > > > If the datasec only has "func", its "t->size" could be zero.
> > > > > > > > > > > Thus, the current "!t->size" test is no longer valid.  The
> > > > > > > > > > > invalid "t->size" will still be caught by the later
> > > > > > > > > > > "last_vsi_end_off > t->size" check.   This patch also takes this
> > > > > > > > > > > chance to consolidate other "t->size" tests ("vsi->offset >= t->size"
> > > > > > > > > > > "vsi->size > t->size", and "t->size < sum") into the existing
> > > > > > > > > > > "last_vsi_end_off > t->size" test.
> > > > > > > > > > >
> > > > > > > > > > > The LLVM will also put those extern kernel function as an extern
> > > > > > > > > > > linkage func in the BTF:
> > > > > > > > > > >
> > > > > > > > > > > [24] FUNC_PROTO '(anon)' ret_type_id=15 vlen=1
> > > > > > > > > > >         '(anon)' type_id=18
> > > > > > > > > > > [25] FUNC 'foo' type_id=24 linkage=extern
> > > > > > > > > > >
> > > > > > > > > > > This patch allows BTF_FUNC_EXTERN in btf_func_check_meta().
> > > > > > > > > > > Also extern kernel function declaration does not
> > > > > > > > > > > necessary have arg name. Another change in btf_func_check() is
> > > > > > > > > > > to allow extern function having no arg name.
> > > > > > > > > > >
> > > > > > > > > > > The btf selftest is adjusted accordingly.  New tests are also added.
> > > > > > > > > > >
> > > > > > > > > > > The required LLVM patch: https://reviews.llvm.org/D93563
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > > > > > > > > > > ---
> > > > > > > > > >
> > > > > > > > > > High-level question about EXTERN functions in DATASEC. Does kernel
> > > > > > > > > > need to see them under DATASEC? What if libbpf just removed all EXTERN
> > > > > > > > > > funcs from under DATASEC and leave them as "free-floating" EXTERN
> > > > > > > > > > FUNCs in BTF.
> > > > > > > > > >
> > > > > > > > > > We need to tag EXTERNs with DATASECs mainly for libbpf to know whether
> > > > > > > > > > it's .kconfig or .ksym or other type of externs. Does kernel need to
> > > > > > > > > > care?
> > > > > > > > > > Although the kernel does not need to know, since a legit llvm generates it,
> > > > > > > > > > I go with proper support in the kernel (e.g. bpftool btf dump can better
> > > > > > > > > reflect what was there).
> > > > > > > >
> > > > > > > > LLVM also generates extern VAR with BTF_VAR_EXTERN, yet libbpf is
> > > > > > > > replacing it with fake INTs.
> > > > > > > Yep. I noticed the loop in collect_extern() in libbpf.
> > > > > > > It replaces the var->type with INT.
> > > > > > >
> > > > > > > > We could do just that here as well.
> > > > > > > What to replace in the FUNC case?
> > > > > >
> > > > > > if we do that, I'd just replace them with same INTs. Or we can just
> > > > > > remove the entire DATASEC. Now it is easier to do with BTF write APIs.
> > > > > > Back then it was a major pain. I'd probably get rid of DATASEC
> > > > > > altogether instead of that INT replacement, if I had BTF write APIs.
> > > > > Do you mean vsi->type = INT?
> > > >
> > > > yes, that's what existing logic does for EXTERN var
> > > There may be no var.
> > >
> >
> > sure, but we have btf__add_var(), if we really want VAR ;)
> >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Regardless, supporting it properly in the kernel is a better way to go
> > > > > > > instead of asking the userspace to move around it.  It is not very
> > > > > > > complicated to support it in the kernel also.
> > > > > > >
> > > > > > > > What is the concern with having the kernel support it?
> > > > > >
> > > > > > Just more complicated BTF validation logic, which means that there are
> > > > > > higher chances of permitting invalid BTF. And then the question is
> > > > > > what can the kernel do with those EXTERNs in BTF? Probably nothing.
> > > > > > And that .ksyms section is special, and purely libbpf convention.
> > > > > > Ideally kernel should not allow EXTERN funcs in any other DATASEC. Are
> > > > > > you willing to hard-code ".ksyms" name in kernel for libbpf's sake?
> > > > > > Probably not. The general rule, so far, was that kernel shouldn't see
> > > > > > any unresolved EXTERN at all. Now it's neither here nor there. EXTERN
> > > > > > funcs are ok, EXTERN vars are not.
> > > > > Exactly, it is libbpf convention.  The kernel does not need to enforce it.
> > > > > The kernel only needs to be able to support the debug info generated by
> > > > > llvm and being able to display/dump it later.
> > > > >
> > > > > There are many other things in the BTF that the kernel does not need to
> > > >
> > > > Curious, what are those many other things?
> > > VAR '_license'.
> > > deeper things could be STRUCT 'tcp_congestion_ops' and the types under it.
> > >
> >
> > kernel is aware of DATASEC in general, it validates variable sizes and
> > offsets, and datasec size itself.
> Yeah, the kernel still thinks it is data only now.
> With func in datasec, I think the name "data"sec may be a bit outdated.

yep, should have been called SECTION, probably

>
> > DATASEC can be assigned as
> > value_type_id for maps. So I guess technically you are correct that it
> > doesn't care about VAR _license specifically, but it has to care about
> > DATASEC/VARs in general. Same applies to STRUCT 'tcp_congestion_ops'.
> >
> > I'm fine with extending the kernel with EXTERN funcs, btw. I just
> > don't think it's necessary. But then also let's support EXTERN vars
> > for consistency.
> cool. let's explore EXTERN vars support.
>
> > > > >
> > > > > To support EXTERN var, the kernel part should be fine.  I am only not
> > > > > sure why it has to change the vs->size and vs->offset in libbpf?
> > > >
> > > > vs->size and vs->offset are adjusted to match int type. Otherwise
> > > > kernel BTF validation will complain about DATASEC size mismatch.
> > > makes sense. so if there is no need to replace it with INT,
> > > they can be left as is?
> >
> > If the kernel starts supporting EXTERN vars, yes, we won't need to touch
> > it.
> From test_ksyms.c:
> [22] DATASEC '.ksyms' size=0 vlen=5
>      type_id=12 offset=0 size=1
>      type_id=13 offset=0 size=1
>
> For extern, does it make sense for the libbpf to assign 0 to
> both var offset and size since it does not matter?

That's how it is generated and yes, I think that's how it should be
kept once kernel supports EXTERN VAR. libbpf is adjusting offsets and
sizes in addition to marking VAR itself as GLOBAL_ALLOCATED. If kernel
supports EXTERN VAR natively, none of that needs to happen (on newer
kernels only, of course).
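
A minimal sketch of the fixup described above, for readers following along. It uses toy stand-in structs rather than the real libbpf or kernel BTF types, so every `toy_*` name is hypothetical, and `toy_datasec_ok()` is only a simplified approximation of the kind of size/offset validation the kernel performs on a DATASEC. It assumes every entry in the section is an extern, as in `.ksyms`.

```c
#include <stdint.h>

/* Toy stand-ins for the BTF records involved; NOT the real structs. */
enum toy_linkage {
	TOY_VAR_STATIC,
	TOY_VAR_GLOBAL_ALLOCATED,
	TOY_VAR_GLOBAL_EXTERN,
};

struct toy_var {
	uint32_t type_id;          /* type the VAR refers to */
	enum toy_linkage linkage;
};

struct toy_secinfo {
	struct toy_var *var;       /* VAR described by this section entry */
	uint32_t offset;
	uint32_t size;
};

struct toy_datasec {
	uint32_t size;
	uint32_t vlen;
	struct toy_secinfo *vsi;
};

/*
 * Make every extern VAR look like an allocated int: change the linkage,
 * point the VAR at the int type, and lay the entries out back to back.
 */
void toy_fixup_extern_vars(struct toy_datasec *sec, uint32_t int_type_id)
{
	uint32_t off = 0;

	for (uint32_t i = 0; i < sec->vlen; i++) {
		struct toy_secinfo *vs = &sec->vsi[i];

		vs->var->linkage = TOY_VAR_GLOBAL_ALLOCATED;
		vs->var->type_id = int_type_id;
		vs->offset = off;
		vs->size = sizeof(int);
		off += sizeof(int);
	}
	sec->size = off;
}

/*
 * Simplified consistency check: entries must have non-zero size, must not
 * overlap, and the last one must end within the section size.
 */
int toy_datasec_ok(const struct toy_datasec *sec)
{
	uint32_t last_end = 0;

	for (uint32_t i = 0; i < sec->vlen; i++) {
		const struct toy_secinfo *vs = &sec->vsi[i];

		if (vs->size == 0 || vs->offset < last_end)
			return 0;
		last_end = vs->offset + vs->size;
	}
	return last_end <= sec->size;
}
```

With a section shaped like the `.ksyms` dump quoted above (size=0, entries at offset=0 size=1), the toy check fails before the fixup and passes after it, which is the reason the adjustment is needed on kernels without native EXTERN VAR support.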

> In the kernel, it can ensure a datasec only has all extern or no extern.
> array_map_check_btf() will ensure the datasec has no extern.

It certainly makes it less surprising from handling BTF, but it feels
like an arbitrary policy, rather than technical restriction. You can
mark allocated variables and externs with the same section name and
Clang will probably happily generate a single DATASEC with a mix of
externs and non-externs. Is that inherently a bad thing? I'm not sure.
Basically, I don't know if the kernel should care and enforce or not.

>
> > But of course to support older kernels libbpf will still have to
> > do this. EXTERN vars won't reduce the amount of libbpf logic.


* Re: [PATCH bpf-next 02/15] bpf: btf: Support parsing extern func
  2021-03-20 17:18                       ` Andrii Nakryiko
@ 2021-03-23  4:55                         ` Martin KaFai Lau
  0 siblings, 0 replies; 49+ messages in thread
From: Martin KaFai Lau @ 2021-03-23  4:55 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Networking

On Sat, Mar 20, 2021 at 10:18:36AM -0700, Andrii Nakryiko wrote:
> > From test_ksyms.c:
> > [22] DATASEC '.ksyms' size=0 vlen=5
> >      type_id=12 offset=0 size=1
> >      type_id=13 offset=0 size=1
> >
> > For extern, does it make sense for the libbpf to assign 0 to
> > both var offset and size since it does not matter?
> 
> That's how it is generated and yes, I think that's how it should be
> kept once kernel supports EXTERN VAR. libbpf is adjusting offsets and
> sizes in addition to marking VAR itself as GLOBAL_ALLOCATED. If kernel
> supports EXTERN VAR natively, none of that needs to happen (on newer
> kernels only, of course).
> 
> > In the kernel, it can ensure a datasec only has all extern or no extern.
> > array_map_check_btf() will ensure the datasec has no extern.
> 
> It certainly makes it less surprising from handling BTF, but it feels
> like an arbitrary policy, rather than technical restriction. You can
> mark allocated variables and externs with the same section name and
> Clang will probably happily generate a single DATASEC with a mix of
> externs and non-externs. Is that inherently a bad thing? I'm not sure.
> Basically, I don't know if the kernel should care and enforce or not.
I have thought a bit more about this.  The offset=0 of an extern var
can be used in the verification, but I think it will still leave some
open-ended questions, e.g. for arraymap.

I will use your suggestion in libbpf and do something similar to
the extern VAR handling: replace the FUNC in datasec with INT
(btf__add_var() if needed).
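
A rough sketch of that plan, under stated assumptions: the type table below is a toy in-memory model, not the real libbpf BTF API, and all `toy_*` names are hypothetical. Each datasec entry that refers to a FUNC is redirected to a freshly appended VAR of type INT, in the spirit of what btf__add_var() would provide. Offsets are left untouched here and would need a separate layout pass.

```c
#include <stdint.h>

enum toy_kind { TOY_KIND_VOID, TOY_KIND_INT, TOY_KIND_VAR, TOY_KIND_FUNC };

struct toy_type {
	enum toy_kind kind;
	uint32_t ref_id;           /* VAR: variable's type; FUNC: its prototype */
};

struct toy_vsi {
	uint32_t type_id;
	uint32_t offset;
	uint32_t size;
};

#define TOY_MAX_TYPES 16

struct toy_btf {
	struct toy_type types[TOY_MAX_TYPES];
	uint32_t nr_types;         /* type ids are indices into types[] */
};

/* Append a VAR referring to ref_id; returns the new id, or -1 when full. */
int toy_add_var(struct toy_btf *btf, uint32_t ref_id)
{
	if (btf->nr_types >= TOY_MAX_TYPES)
		return -1;
	btf->types[btf->nr_types].kind = TOY_KIND_VAR;
	btf->types[btf->nr_types].ref_id = ref_id;
	return (int)btf->nr_types++;
}

/*
 * Redirect every datasec entry that refers to a FUNC to a new int VAR.
 * Returns the number of entries rewritten, or -1 on error.
 */
int toy_hide_extern_funcs(struct toy_btf *btf, struct toy_vsi *vsi,
			  uint32_t vlen, uint32_t int_id)
{
	int replaced = 0;

	for (uint32_t i = 0; i < vlen; i++) {
		int var_id;

		if (vsi[i].type_id >= btf->nr_types)
			return -1;
		if (btf->types[vsi[i].type_id].kind != TOY_KIND_FUNC)
			continue;

		var_id = toy_add_var(btf, int_id);
		if (var_id < 0)
			return -1;
		vsi[i].type_id = (uint32_t)var_id;
		vsi[i].size = sizeof(int);
		replaced++;
	}
	return replaced;
}
```

The FUNC itself stays in the type table in this sketch; only the datasec entry stops referring to it, so the kernel would see a datasec that contains VARs only.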

> 
> >
> > > But of course to support older kernels libbpf will still have to
> > > do this. EXTERN vars won't reduce the amount of libbpf logic.


Thread overview: 49+ messages
2021-03-16  1:13 [PATCH bpf-next 00/15] Support calling kernel function Martin KaFai Lau
2021-03-16  1:13 ` [PATCH bpf-next 01/15] bpf: Simplify freeing logic in linfo and jited_linfo Martin KaFai Lau
2021-03-16  1:13 ` [PATCH bpf-next 02/15] bpf: btf: Support parsing extern func Martin KaFai Lau
2021-03-18 22:53   ` Andrii Nakryiko
2021-03-18 23:39     ` Martin KaFai Lau
2021-03-19  4:13       ` Andrii Nakryiko
2021-03-19  5:29         ` Martin KaFai Lau
2021-03-19 21:27           ` Andrii Nakryiko
2021-03-19 22:19             ` Martin KaFai Lau
2021-03-19 22:29               ` Andrii Nakryiko
2021-03-19 22:45                 ` Martin KaFai Lau
2021-03-19 23:02                   ` Andrii Nakryiko
2021-03-20  0:13                     ` Martin KaFai Lau
2021-03-20 17:18                       ` Andrii Nakryiko
2021-03-23  4:55                         ` Martin KaFai Lau
2021-03-16  1:13 ` [PATCH bpf-next 03/15] bpf: Refactor btf_check_func_arg_match Martin KaFai Lau
2021-03-18 23:32   ` Andrii Nakryiko
2021-03-19 19:32     ` Martin KaFai Lau
2021-03-19 21:51       ` Andrii Nakryiko
2021-03-20  0:10         ` Alexei Starovoitov
2021-03-20 17:13           ` Andrii Nakryiko
2021-03-16  1:14 ` [PATCH bpf-next 04/15] bpf: Support bpf program calling kernel function Martin KaFai Lau
2021-03-19  1:03   ` Andrii Nakryiko
2021-03-19  1:51     ` Alexei Starovoitov
2021-03-19 19:47     ` Martin KaFai Lau
2021-03-16  1:14 ` [PATCH bpf-next 05/15] bpf: Support kernel function call in x86-32 Martin KaFai Lau
2021-03-16  1:14 ` [PATCH bpf-next 06/15] tcp: Rename bictcp function prefix to cubictcp Martin KaFai Lau
2021-03-16  1:14 ` [PATCH bpf-next 07/15] bpf: tcp: White list some tcp cong functions to be called by bpf-tcp-cc Martin KaFai Lau
2021-03-19  1:19   ` Andrii Nakryiko
2021-03-16  1:14 ` [PATCH bpf-next 08/15] libbpf: Refactor bpf_object__resolve_ksyms_btf_id Martin KaFai Lau
2021-03-19  2:53   ` Andrii Nakryiko
2021-03-16  1:14 ` [PATCH bpf-next 09/15] libbpf: Refactor codes for finding btf id of a kernel symbol Martin KaFai Lau
2021-03-19  3:14   ` Andrii Nakryiko
2021-03-16  1:14 ` [PATCH bpf-next 10/15] libbpf: Rename RELO_EXTERN to RELO_EXTERN_VAR Martin KaFai Lau
2021-03-19  3:15   ` Andrii Nakryiko
2021-03-16  1:14 ` [PATCH bpf-next 11/15] libbpf: Record extern sym relocation first Martin KaFai Lau
2021-03-19  3:16   ` Andrii Nakryiko
2021-03-16  1:14 ` [PATCH bpf-next 12/15] libbpf: Support extern kernel function Martin KaFai Lau
2021-03-19  4:11   ` Andrii Nakryiko
2021-03-19  5:06     ` Martin KaFai Lau
2021-03-19 21:38       ` Andrii Nakryiko
2021-03-16  1:14 ` [PATCH bpf-next 13/15] bpf: selftests: Rename bictcp to bpf_cubic Martin KaFai Lau
2021-03-19  4:14   ` Andrii Nakryiko
2021-03-16  1:15 ` [PATCH bpf-next 14/15] bpf: selftest: bpf_cubic and bpf_dctcp calling kernel functions Martin KaFai Lau
2021-03-19  4:15   ` Andrii Nakryiko
2021-03-16  1:15 ` [PATCH bpf-next 15/15] bpf: selftest: Add kfunc_call test Martin KaFai Lau
2021-03-16  3:39   ` kernel test robot
2021-03-19  4:21   ` Andrii Nakryiko
2021-03-19  5:40     ` Martin KaFai Lau
