* [PATCH v3 bpf-next 00/13] bpf: CO-RE support in the kernel
@ 2021-11-20  3:32 Alexei Starovoitov
  2021-11-20  3:32 ` [PATCH v3 bpf-next 01/13] libbpf: Replace btf__type_by_id() with btf_type_by_id() Alexei Starovoitov
                   ` (12 more replies)
  0 siblings, 13 replies; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-20  3:32 UTC (permalink / raw)
  To: davem; +Cc: daniel, andrii, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

v2->v3:
. Addressed Andrii's feedback in every patch.
  The new field in union bpf_attr was renamed from "core_relo" to "core_relos".
. Added one more test and ran checkpatch.pl on the set.

v1->v2:
. Refactor the uapi to pass 'struct bpf_core_relo' from LLVM into libbpf and
further into the kernel, instead of a bpf_core_apply_relo() bpf helper. With
this change the CO-RE algorithm can log error and debug events through the
standard bpf verifier log mechanism, which was not possible with the helper
approach.
. The #define RELO_CORE macro was removed and replaced with the
btf_member_bit_offset() patch.

This set introduces CO-RE support in the kernel.
There are several reasons to add such support:
1. It's a step toward signed BPF programs.
2. It allows languages like Go, which struggle to adopt libbpf,
   to take advantage of CO-RE.
3. Currently the field accessed by an 'ldx [R1 + 10]' insn is recognized
   by the verifier purely based on the +10 offset. If R1 points to a union,
   the verifier picks one of the fields at this offset.
   With CO-RE the kernel can disambiguate the field access (see the sketch
   below).
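
A minimal sketch of the ambiguity from reason 3 (hypothetical types, not part
of this set):

 union u {
     struct { int a; int b; } x;   /* x.b sits at offset 4 */
     struct { int c; int d; } y;   /* y.d also sits at offset 4 */
 };

 /* For 'ldx [R1 + 4]' the verifier only sees the +4 offset, so it cannot
  * tell whether the program meant x.b or y.d. A CO-RE relocation carries
  * the access string that names the intended field.
  */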

Alexei Starovoitov (13):
  libbpf: Replace btf__type_by_id() with btf_type_by_id().
  bpf: Rename btf_member accessors.
  bpf: Prepare relo_core.c for kernel duty.
  bpf: Define enum bpf_core_relo_kind as uapi.
  bpf: Pass a set of bpf_core_relo-s to prog_load command.
  bpf: Add bpf_core_add_cands() and wire it into
    bpf_core_apply_relo_insn().
  libbpf: Use CO-RE in the kernel in light skeleton.
  libbpf: Support init of inner maps in light skeleton.
  selftests/bpf: Add lskel version of kfunc test.
  selftests/bpf: Improve inner_map test coverage.
  selftests/bpf: Convert map_ptr_kern test to use light skeleton.
  selftests/bpf: Additional test for CO-RE in the kernel.
  selftest/bpf: Revert CO-RE removal in test_ksyms_weak.

 include/linux/bpf.h                           |  11 +
 include/linux/btf.h                           |  89 ++++++++-
 include/uapi/linux/bpf.h                      |  78 +++++++-
 kernel/bpf/Makefile                           |   4 +
 kernel/bpf/bpf_struct_ops.c                   |   6 +-
 kernel/bpf/btf.c                              | 188 +++++++++++++++++-
 kernel/bpf/syscall.c                          |   2 +-
 kernel/bpf/verifier.c                         |  77 +++++++
 net/ipv4/bpf_tcp_ca.c                         |   6 +-
 tools/include/uapi/linux/bpf.h                |  78 +++++++-
 tools/lib/bpf/bpf_gen_internal.h              |   4 +
 tools/lib/bpf/btf.c                           |   2 +-
 tools/lib/bpf/gen_loader.c                    |  68 ++++++-
 tools/lib/bpf/libbpf.c                        | 116 +++++++----
 tools/lib/bpf/libbpf_internal.h               |   2 +-
 tools/lib/bpf/relo_core.c                     | 179 +++++++++++------
 tools/lib/bpf/relo_core.h                     |  71 +------
 tools/testing/selftests/bpf/Makefile          |   5 +-
 .../selftests/bpf/prog_tests/core_kern.c      |  14 ++
 .../selftests/bpf/prog_tests/kfunc_call.c     |  24 +++
 .../selftests/bpf/prog_tests/ksyms_btf.c      |   4 +-
 .../selftests/bpf/prog_tests/map_ptr.c        |  16 +-
 tools/testing/selftests/bpf/progs/core_kern.c |  60 ++++++
 .../selftests/bpf/progs/map_ptr_kern.c        |  16 +-
 .../selftests/bpf/progs/test_ksyms_weak.c     |   2 +-
 25 files changed, 911 insertions(+), 211 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/core_kern.c
 create mode 100644 tools/testing/selftests/bpf/progs/core_kern.c

-- 
2.30.2



* [PATCH v3 bpf-next 01/13] libbpf: Replace btf__type_by_id() with btf_type_by_id().
  2021-11-20  3:32 [PATCH v3 bpf-next 00/13] bpf: CO-RE support in the kernel Alexei Starovoitov
@ 2021-11-20  3:32 ` Alexei Starovoitov
  2021-11-22 21:42   ` Andrii Nakryiko
  2021-11-20  3:32 ` [PATCH v3 bpf-next 02/13] bpf: Rename btf_member accessors Alexei Starovoitov
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-20  3:32 UTC (permalink / raw)
  To: davem; +Cc: daniel, andrii, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

To prepare relo_core.c to be compiled both in the kernel and in user space,
replace btf__type_by_id with btf_type_by_id.

In libbpf, btf__type_by_id and btf_type_by_id have different behavior.

bpf_core_apply_relo_insn() needs the behavior of the uapi btf__type_by_id
rather than the internal btf_type_by_id, but the type_id range check is
already done in bpf_core_apply_relo(), so it's safe to use the internal
variant everywhere. The kernel's btf_type_by_id() does the check anyway,
so it doesn't hurt.
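
For reference, the difference boils down to these two variants (prototypes as
in libbpf; the comments are a paraphrase of their behavior, not exact code):

 /* public libbpf API: range-checks type_id and returns NULL on failure */
 const struct btf_type *btf__type_by_id(const struct btf *btf, __u32 type_id);

 /* internal helper: assumes type_id was already validated by the caller */
 struct btf_type *btf_type_by_id(const struct btf *btf, __u32 type_id);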

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/lib/bpf/btf.c             |  2 +-
 tools/lib/bpf/libbpf_internal.h |  2 +-
 tools/lib/bpf/relo_core.c       | 19 ++++++++-----------
 3 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index e97217a77196..4a1115eb39b4 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -454,7 +454,7 @@ const struct btf *btf__base_btf(const struct btf *btf)
 }
 
 /* internal helper returning non-const pointer to a type */
-struct btf_type *btf_type_by_id(struct btf *btf, __u32 type_id)
+struct btf_type *btf_type_by_id(const struct btf *btf, __u32 type_id)
 {
 	if (type_id == 0)
 		return &btf_void;
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index f7ac349650a1..1e1250e1dfa3 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -172,7 +172,7 @@ static inline void *libbpf_reallocarray(void *ptr, size_t nmemb, size_t size)
 struct btf;
 struct btf_type;
 
-struct btf_type *btf_type_by_id(struct btf *btf, __u32 type_id);
+struct btf_type *btf_type_by_id(const struct btf *btf, __u32 type_id);
 const char *btf_kind_str(const struct btf_type *t);
 const struct btf_type *skip_mods_and_typedefs(const struct btf *btf, __u32 id, __u32 *res_id);
 
diff --git a/tools/lib/bpf/relo_core.c b/tools/lib/bpf/relo_core.c
index b5b8956a1be8..c0904f4cb514 100644
--- a/tools/lib/bpf/relo_core.c
+++ b/tools/lib/bpf/relo_core.c
@@ -51,7 +51,7 @@ static bool is_flex_arr(const struct btf *btf,
 		return false;
 
 	/* has to be the last member of enclosing struct */
-	t = btf__type_by_id(btf, acc->type_id);
+	t = btf_type_by_id(btf, acc->type_id);
 	return acc->idx == btf_vlen(t) - 1;
 }
 
@@ -388,7 +388,7 @@ static int bpf_core_match_member(const struct btf *local_btf,
 		return 0;
 
 	local_id = local_acc->type_id;
-	local_type = btf__type_by_id(local_btf, local_id);
+	local_type = btf_type_by_id(local_btf, local_id);
 	local_member = btf_members(local_type) + local_acc->idx;
 	local_name = btf__name_by_offset(local_btf, local_member->name_off);
 
@@ -580,7 +580,7 @@ static int bpf_core_calc_field_relo(const char *prog_name,
 		return -EUCLEAN; /* request instruction poisoning */
 
 	acc = &spec->spec[spec->len - 1];
-	t = btf__type_by_id(spec->btf, acc->type_id);
+	t = btf_type_by_id(spec->btf, acc->type_id);
 
 	/* a[n] accessor needs special handling */
 	if (!acc->name) {
@@ -729,7 +729,7 @@ static int bpf_core_calc_enumval_relo(const struct bpf_core_relo *relo,
 	case BPF_ENUMVAL_VALUE:
 		if (!spec)
 			return -EUCLEAN; /* request instruction poisoning */
-		t = btf__type_by_id(spec->btf, spec->spec[0].type_id);
+		t = btf_type_by_id(spec->btf, spec->spec[0].type_id);
 		e = btf_enum(t) + spec->spec[0].idx;
 		*val = e->val;
 		break;
@@ -805,8 +805,8 @@ static int bpf_core_calc_relo(const char *prog_name,
 		if (res->orig_sz != res->new_sz) {
 			const struct btf_type *orig_t, *new_t;
 
-			orig_t = btf__type_by_id(local_spec->btf, res->orig_type_id);
-			new_t = btf__type_by_id(targ_spec->btf, res->new_type_id);
+			orig_t = btf_type_by_id(local_spec->btf, res->orig_type_id);
+			new_t = btf_type_by_id(targ_spec->btf, res->new_type_id);
 
 			/* There are two use cases in which it's safe to
 			 * adjust load/store's mem size:
@@ -1054,7 +1054,7 @@ static void bpf_core_dump_spec(int level, const struct bpf_core_spec *spec)
 	int i;
 
 	type_id = spec->root_type_id;
-	t = btf__type_by_id(spec->btf, type_id);
+	t = btf_type_by_id(spec->btf, type_id);
 	s = btf__name_by_offset(spec->btf, t->name_off);
 
 	libbpf_print(level, "[%u] %s %s", type_id, btf_kind_str(t), str_is_empty(s) ? "<anon>" : s);
@@ -1158,10 +1158,7 @@ int bpf_core_apply_relo_insn(const char *prog_name, struct bpf_insn *insn,
 	int i, j, err;
 
 	local_id = relo->type_id;
-	local_type = btf__type_by_id(local_btf, local_id);
-	if (!local_type)
-		return -EINVAL;
-
+	local_type = btf_type_by_id(local_btf, local_id);
 	local_name = btf__name_by_offset(local_btf, local_type->name_off);
 	if (!local_name)
 		return -EINVAL;
-- 
2.30.2



* [PATCH v3 bpf-next 02/13] bpf: Rename btf_member accessors.
  2021-11-20  3:32 [PATCH v3 bpf-next 00/13] bpf: CO-RE support in the kernel Alexei Starovoitov
  2021-11-20  3:32 ` [PATCH v3 bpf-next 01/13] libbpf: Replace btf__type_by_id() with btf_type_by_id() Alexei Starovoitov
@ 2021-11-20  3:32 ` Alexei Starovoitov
  2021-11-20  3:32 ` [PATCH v3 bpf-next 03/13] bpf: Prepare relo_core.c for kernel duty Alexei Starovoitov
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-20  3:32 UTC (permalink / raw)
  To: davem; +Cc: daniel, andrii, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Rename the kernel's btf_member_bit_offset() and btf_member_bitfield_size() to
avoid conflicts with the similarly named helpers in libbpf's btf.h. The kernel
helpers are the ones renamed because the libbpf helpers are part of its uapi.
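
For context, the clash is between two same-named helpers with different
signatures (sketch; the libbpf prototype is paraphrased from its btf.h):

 /* libbpf's btf.h helper takes a member index ... */
 static inline __u32 btf_member_bit_offset(const struct btf_type *t, __u32 member_idx);

 /* ... while the kernel helper, renamed here to __btf_member_bit_offset(),
  * takes a pointer to the member itself.
  */
 static inline u32 __btf_member_bit_offset(const struct btf_type *struct_type,
					   const struct btf_member *member);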

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/btf.h         |  8 ++++----
 kernel/bpf/bpf_struct_ops.c |  6 +++---
 kernel/bpf/btf.c            | 18 +++++++++---------
 net/ipv4/bpf_tcp_ca.c       |  6 +++---
 4 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index 203eef993d76..956f70388f69 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -194,15 +194,15 @@ static inline bool btf_type_kflag(const struct btf_type *t)
 	return BTF_INFO_KFLAG(t->info);
 }
 
-static inline u32 btf_member_bit_offset(const struct btf_type *struct_type,
-					const struct btf_member *member)
+static inline u32 __btf_member_bit_offset(const struct btf_type *struct_type,
+					  const struct btf_member *member)
 {
 	return btf_type_kflag(struct_type) ? BTF_MEMBER_BIT_OFFSET(member->offset)
 					   : member->offset;
 }
 
-static inline u32 btf_member_bitfield_size(const struct btf_type *struct_type,
-					   const struct btf_member *member)
+static inline u32 __btf_member_bitfield_size(const struct btf_type *struct_type,
+					     const struct btf_member *member)
 {
 	return btf_type_kflag(struct_type) ? BTF_MEMBER_BITFIELD_SIZE(member->offset)
 					   : 0;
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 8ecfe4752769..21069dbe9138 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -165,7 +165,7 @@ void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log)
 				break;
 			}
 
-			if (btf_member_bitfield_size(t, member)) {
+			if (__btf_member_bitfield_size(t, member)) {
 				pr_warn("bit field member %s in struct %s is not supported\n",
 					mname, st_ops->name);
 				break;
@@ -296,7 +296,7 @@ static int check_zero_holes(const struct btf_type *t, void *data)
 	const struct btf_type *mtype;
 
 	for_each_member(i, t, member) {
-		moff = btf_member_bit_offset(t, member) / 8;
+		moff = __btf_member_bit_offset(t, member) / 8;
 		if (moff > prev_mend &&
 		    memchr_inv(data + prev_mend, 0, moff - prev_mend))
 			return -EINVAL;
@@ -387,7 +387,7 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
 		struct bpf_prog *prog;
 		u32 moff;
 
-		moff = btf_member_bit_offset(t, member) / 8;
+		moff = __btf_member_bit_offset(t, member) / 8;
 		ptype = btf_type_resolve_ptr(btf_vmlinux, member->type, NULL);
 		if (ptype == module_type) {
 			if (*(void **)(udata + moff))
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 6b9d23be1e99..f4119a99da7b 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -2969,7 +2969,7 @@ static s32 btf_struct_check_meta(struct btf_verifier_env *env,
 			return -EINVAL;
 		}
 
-		offset = btf_member_bit_offset(t, member);
+		offset = __btf_member_bit_offset(t, member);
 		if (is_union && offset) {
 			btf_verifier_log_member(env, t, member,
 						"Invalid member bits_offset");
@@ -3094,7 +3094,7 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t
 		if (off != -ENOENT)
 			/* only one such field is allowed */
 			return -E2BIG;
-		off = btf_member_bit_offset(t, member);
+		off = __btf_member_bit_offset(t, member);
 		if (off % 8)
 			/* valid C code cannot generate such BTF */
 			return -EINVAL;
@@ -3184,8 +3184,8 @@ static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
 
 		btf_show_start_member(show, member);
 
-		member_offset = btf_member_bit_offset(t, member);
-		bitfield_size = btf_member_bitfield_size(t, member);
+		member_offset = __btf_member_bit_offset(t, member);
+		bitfield_size = __btf_member_bitfield_size(t, member);
 		bytes_offset = BITS_ROUNDDOWN_BYTES(member_offset);
 		bits8_offset = BITS_PER_BYTE_MASKED(member_offset);
 		if (bitfield_size) {
@@ -5060,7 +5060,7 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
 		if (array_elem->nelems != 0)
 			goto error;
 
-		moff = btf_member_bit_offset(t, member) / 8;
+		moff = __btf_member_bit_offset(t, member) / 8;
 		if (off < moff)
 			goto error;
 
@@ -5083,14 +5083,14 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
 
 	for_each_member(i, t, member) {
 		/* offset of the field in bytes */
-		moff = btf_member_bit_offset(t, member) / 8;
+		moff = __btf_member_bit_offset(t, member) / 8;
 		if (off + size <= moff)
 			/* won't find anything, field is already too far */
 			break;
 
-		if (btf_member_bitfield_size(t, member)) {
-			u32 end_bit = btf_member_bit_offset(t, member) +
-				btf_member_bitfield_size(t, member);
+		if (__btf_member_bitfield_size(t, member)) {
+			u32 end_bit = __btf_member_bit_offset(t, member) +
+				__btf_member_bitfield_size(t, member);
 
 			/* off <= moff instead of off == moff because clang
 			 * does not generate a BTF member for anonymous
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 2cf02b4d77fb..67466dbff152 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -169,7 +169,7 @@ static u32 prog_ops_moff(const struct bpf_prog *prog)
 	t = bpf_tcp_congestion_ops.type;
 	m = &btf_type_member(t)[midx];
 
-	return btf_member_bit_offset(t, m) / 8;
+	return __btf_member_bit_offset(t, m) / 8;
 }
 
 static const struct bpf_func_proto *
@@ -244,7 +244,7 @@ static int bpf_tcp_ca_init_member(const struct btf_type *t,
 	utcp_ca = (const struct tcp_congestion_ops *)udata;
 	tcp_ca = (struct tcp_congestion_ops *)kdata;
 
-	moff = btf_member_bit_offset(t, member) / 8;
+	moff = __btf_member_bit_offset(t, member) / 8;
 	switch (moff) {
 	case offsetof(struct tcp_congestion_ops, flags):
 		if (utcp_ca->flags & ~TCP_CONG_MASK)
@@ -274,7 +274,7 @@ static int bpf_tcp_ca_init_member(const struct btf_type *t,
 static int bpf_tcp_ca_check_member(const struct btf_type *t,
 				   const struct btf_member *member)
 {
-	if (is_unsupported(btf_member_bit_offset(t, member) / 8))
+	if (is_unsupported(__btf_member_bit_offset(t, member) / 8))
 		return -ENOTSUPP;
 	return 0;
 }
-- 
2.30.2



* [PATCH v3 bpf-next 03/13] bpf: Prepare relo_core.c for kernel duty.
  2021-11-20  3:32 [PATCH v3 bpf-next 00/13] bpf: CO-RE support in the kernel Alexei Starovoitov
  2021-11-20  3:32 ` [PATCH v3 bpf-next 01/13] libbpf: Replace btf__type_by_id() with btf_type_by_id() Alexei Starovoitov
  2021-11-20  3:32 ` [PATCH v3 bpf-next 02/13] bpf: Rename btf_member accessors Alexei Starovoitov
@ 2021-11-20  3:32 ` Alexei Starovoitov
  2021-11-20 15:27   ` Matteo Croce
  2021-11-22 23:15   ` Andrii Nakryiko
  2021-11-20  3:32 ` [PATCH v3 bpf-next 04/13] bpf: Define enum bpf_core_relo_kind as uapi Alexei Starovoitov
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-20  3:32 UTC (permalink / raw)
  To: davem; +Cc: daniel, andrii, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Make relo_core.c compile both for the kernel and for user space libbpf.

Note that this patch reduces BPF_CORE_SPEC_MAX_LEN from 64 to 32.
This is the maximum number of nested structs and arrays in a single access spec.
For example:
 struct sample {
     int a;
     struct {
         int b[10];
     };
 };

 struct sample *s = ...;
 int y = &s->b[5];
This field access is encoded as "0:1:0:5" (the leading 0 is the s[0] pointer
dereference, the anon struct is member #1, b is member #0 inside it, and 5 is
the array element index), so the spec length is 4.

A follow-up patch might bump it back to 64.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/btf.h       | 81 +++++++++++++++++++++++++++++++++++++++
 kernel/bpf/Makefile       |  4 ++
 kernel/bpf/btf.c          | 26 +++++++++++++
 tools/lib/bpf/relo_core.c | 76 ++++++++++++++++++++++++++++++------
 4 files changed, 176 insertions(+), 11 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index 956f70388f69..acef6ef28768 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -144,6 +144,53 @@ static inline bool btf_type_is_enum(const struct btf_type *t)
 	return BTF_INFO_KIND(t->info) == BTF_KIND_ENUM;
 }
 
+static inline bool str_is_empty(const char *s)
+{
+	return !s || !s[0];
+}
+
+static inline u16 btf_kind(const struct btf_type *t)
+{
+	return BTF_INFO_KIND(t->info);
+}
+
+static inline bool btf_is_enum(const struct btf_type *t)
+{
+	return btf_kind(t) == BTF_KIND_ENUM;
+}
+
+static inline bool btf_is_composite(const struct btf_type *t)
+{
+	u16 kind = btf_kind(t);
+
+	return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
+}
+
+static inline bool btf_is_array(const struct btf_type *t)
+{
+	return btf_kind(t) == BTF_KIND_ARRAY;
+}
+
+static inline bool btf_is_int(const struct btf_type *t)
+{
+	return btf_kind(t) == BTF_KIND_INT;
+}
+
+static inline bool btf_is_ptr(const struct btf_type *t)
+{
+	return btf_kind(t) == BTF_KIND_PTR;
+}
+
+static inline u8 btf_int_offset(const struct btf_type *t)
+{
+	return BTF_INT_OFFSET(*(u32 *)(t + 1));
+}
+
+static inline u8 btf_int_encoding(const struct btf_type *t)
+{
+	return BTF_INT_ENCODING(*(u32 *)(t + 1));
+}
+
 static inline bool btf_type_is_scalar(const struct btf_type *t)
 {
 	return btf_type_is_int(t) || btf_type_is_enum(t);
@@ -184,6 +231,11 @@ static inline u16 btf_type_vlen(const struct btf_type *t)
 	return BTF_INFO_VLEN(t->info);
 }
 
+static inline u16 btf_vlen(const struct btf_type *t)
+{
+	return btf_type_vlen(t);
+}
+
 static inline u16 btf_func_linkage(const struct btf_type *t)
 {
 	return BTF_INFO_VLEN(t->info);
@@ -208,11 +260,40 @@ static inline u32 __btf_member_bitfield_size(const struct btf_type *struct_type,
 					   : 0;
 }
 
+static inline struct btf_member *btf_members(const struct btf_type *t)
+{
+	return (struct btf_member *)(t + 1);
+}
+
+static inline u32 btf_member_bit_offset(const struct btf_type *t, u32 member_idx)
+{
+	const struct btf_member *m = btf_members(t) + member_idx;
+
+	return __btf_member_bit_offset(t, m);
+}
+
+static inline u32 btf_member_bitfield_size(const struct btf_type *t, u32 member_idx)
+{
+	const struct btf_member *m = btf_members(t) + member_idx;
+
+	return __btf_member_bitfield_size(t, m);
+}
+
 static inline const struct btf_member *btf_type_member(const struct btf_type *t)
 {
 	return (const struct btf_member *)(t + 1);
 }
 
+static inline struct btf_array *btf_array(const struct btf_type *t)
+{
+	return (struct btf_array *)(t + 1);
+}
+
+static inline struct btf_enum *btf_enum(const struct btf_type *t)
+{
+	return (struct btf_enum *)(t + 1);
+}
+
 static inline const struct btf_var_secinfo *btf_type_var_secinfo(
 		const struct btf_type *t)
 {
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index cf6ca339f3cd..c1a9be6a4b9f 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -36,3 +36,7 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_struct_ops.o
 obj-${CONFIG_BPF_LSM} += bpf_lsm.o
 endif
 obj-$(CONFIG_BPF_PRELOAD) += preload/
+
+obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
+$(obj)/relo_core.o: $(srctree)/tools/lib/bpf/relo_core.c FORCE
+	$(call if_changed_rule,cc_o_c)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index f4119a99da7b..c79595aad55b 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6413,3 +6413,29 @@ bool bpf_check_mod_kfunc_call(struct kfunc_btf_id_list *klist, u32 kfunc_id,
 
 DEFINE_KFUNC_BTF_ID_LIST(bpf_tcp_ca_kfunc_list);
 DEFINE_KFUNC_BTF_ID_LIST(prog_test_kfunc_list);
+
+int bpf_core_types_are_compat(const struct btf *local_btf, __u32 local_id,
+			      const struct btf *targ_btf, __u32 targ_id)
+{
+	return -EOPNOTSUPP;
+}
+
+static bool bpf_core_is_flavor_sep(const char *s)
+{
+	/* check X___Y name pattern, where X and Y are not underscores */
+	return s[0] != '_' &&				      /* X */
+	       s[1] == '_' && s[2] == '_' && s[3] == '_' &&   /* ___ */
+	       s[4] != '_';				      /* Y */
+}
+
+size_t bpf_core_essential_name_len(const char *name)
+{
+	size_t n = strlen(name);
+	int i;
+
+	for (i = n - 5; i >= 0; i--) {
+		if (bpf_core_is_flavor_sep(name + i))
+			return i + 1;
+	}
+	return n;
+}
diff --git a/tools/lib/bpf/relo_core.c b/tools/lib/bpf/relo_core.c
index c0904f4cb514..56dbe6d16664 100644
--- a/tools/lib/bpf/relo_core.c
+++ b/tools/lib/bpf/relo_core.c
@@ -1,6 +1,60 @@
 // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
 /* Copyright (c) 2019 Facebook */
 
+#ifdef __KERNEL__
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <linux/string.h>
+#include <linux/bpf_verifier.h>
+#include "relo_core.h"
+
+static const char *btf_kind_str(const struct btf_type *t)
+{
+	return btf_type_str(t);
+}
+
+static bool is_ldimm64_insn(struct bpf_insn *insn)
+{
+	return insn->code == (BPF_LD | BPF_IMM | BPF_DW);
+}
+
+static const struct btf_type *
+skip_mods_and_typedefs(const struct btf *btf, u32 id, u32 *res_id)
+{
+	return btf_type_skip_modifiers(btf, id, res_id);
+}
+
+static const char *btf__name_by_offset(const struct btf *btf, u32 offset)
+{
+	return btf_name_by_offset(btf, offset);
+}
+
+static s64 btf__resolve_size(const struct btf *btf, u32 type_id)
+{
+	const struct btf_type *t;
+	int size;
+
+	t = btf_type_by_id(btf, type_id);
+	t = btf_resolve_size(btf, t, &size);
+	if (IS_ERR(t))
+		return PTR_ERR(t);
+	return size;
+}
+
+enum libbpf_print_level {
+	LIBBPF_WARN,
+	LIBBPF_INFO,
+	LIBBPF_DEBUG,
+};
+
+#undef pr_warn
+#undef pr_info
+#undef pr_debug
+#define pr_warn(fmt, log, ...)	bpf_log((void *)log, fmt, "", ##__VA_ARGS__)
+#define pr_info(fmt, log, ...)	bpf_log((void *)log, fmt, "", ##__VA_ARGS__)
+#define pr_debug(fmt, log, ...)	bpf_log((void *)log, fmt, "", ##__VA_ARGS__)
+#define libbpf_print(level, fmt, ...)	bpf_log((void *)prog_name, fmt, ##__VA_ARGS__)
+#else
 #include <stdio.h>
 #include <string.h>
 #include <errno.h>
@@ -12,8 +66,9 @@
 #include "btf.h"
 #include "str_error.h"
 #include "libbpf_internal.h"
+#endif
 
-#define BPF_CORE_SPEC_MAX_LEN 64
+#define BPF_CORE_SPEC_MAX_LEN 32
 
 /* represents BPF CO-RE field or array element accessor */
 struct bpf_core_accessor {
@@ -150,7 +205,7 @@ static bool core_relo_is_enumval_based(enum bpf_core_relo_kind kind)
  * Enum value-based relocations (ENUMVAL_EXISTS/ENUMVAL_VALUE) use access
  * string to specify enumerator's value index that need to be relocated.
  */
-static int bpf_core_parse_spec(const struct btf *btf,
+static int bpf_core_parse_spec(const char *prog_name, const struct btf *btf,
 			       __u32 type_id,
 			       const char *spec_str,
 			       enum bpf_core_relo_kind relo_kind,
@@ -272,8 +327,8 @@ static int bpf_core_parse_spec(const struct btf *btf,
 				return sz;
 			spec->bit_offset += access_idx * sz * 8;
 		} else {
-			pr_warn("relo for [%u] %s (at idx %d) captures type [%d] of unexpected kind %s\n",
-				type_id, spec_str, i, id, btf_kind_str(t));
+			pr_warn("prog '%s': relo for [%u] %s (at idx %d) captures type [%d] of unexpected kind %s\n",
+				prog_name, type_id, spec_str, i, id, btf_kind_str(t));
 			return -EINVAL;
 		}
 	}
@@ -346,8 +401,6 @@ static int bpf_core_fields_are_compat(const struct btf *local_btf,
 		targ_id = btf_array(targ_type)->type;
 		goto recur;
 	default:
-		pr_warn("unexpected kind %d relocated, local [%d], target [%d]\n",
-			btf_kind(local_type), local_id, targ_id);
 		return 0;
 	}
 }
@@ -1045,7 +1098,7 @@ static int bpf_core_patch_insn(const char *prog_name, struct bpf_insn *insn,
  * [<type-id>] (<type-name>) + <raw-spec> => <offset>@<spec>,
  * where <spec> is a C-syntax view of recorded field access, e.g.: x.a[3].b
  */
-static void bpf_core_dump_spec(int level, const struct bpf_core_spec *spec)
+static void bpf_core_dump_spec(const char *prog_name, int level, const struct bpf_core_spec *spec)
 {
 	const struct btf_type *t;
 	const struct btf_enum *e;
@@ -1167,7 +1220,8 @@ int bpf_core_apply_relo_insn(const char *prog_name, struct bpf_insn *insn,
 	if (str_is_empty(spec_str))
 		return -EINVAL;
 
-	err = bpf_core_parse_spec(local_btf, local_id, spec_str, relo->kind, &local_spec);
+	err = bpf_core_parse_spec(prog_name, local_btf, local_id, spec_str,
+				  relo->kind, &local_spec);
 	if (err) {
 		pr_warn("prog '%s': relo #%d: parsing [%d] %s %s + %s failed: %d\n",
 			prog_name, relo_idx, local_id, btf_kind_str(local_type),
@@ -1178,7 +1232,7 @@ int bpf_core_apply_relo_insn(const char *prog_name, struct bpf_insn *insn,
 
 	pr_debug("prog '%s': relo #%d: kind <%s> (%d), spec is ", prog_name,
 		 relo_idx, core_relo_kind_str(relo->kind), relo->kind);
-	bpf_core_dump_spec(LIBBPF_DEBUG, &local_spec);
+	bpf_core_dump_spec(prog_name, LIBBPF_DEBUG, &local_spec);
 	libbpf_print(LIBBPF_DEBUG, "\n");
 
 	/* TYPE_ID_LOCAL relo is special and doesn't need candidate search */
@@ -1204,14 +1258,14 @@ int bpf_core_apply_relo_insn(const char *prog_name, struct bpf_insn *insn,
 		if (err < 0) {
 			pr_warn("prog '%s': relo #%d: error matching candidate #%d ",
 				prog_name, relo_idx, i);
-			bpf_core_dump_spec(LIBBPF_WARN, &cand_spec);
+			bpf_core_dump_spec(prog_name, LIBBPF_WARN, &cand_spec);
 			libbpf_print(LIBBPF_WARN, ": %d\n", err);
 			return err;
 		}
 
 		pr_debug("prog '%s': relo #%d: %s candidate #%d ", prog_name,
 			 relo_idx, err == 0 ? "non-matching" : "matching", i);
-		bpf_core_dump_spec(LIBBPF_DEBUG, &cand_spec);
+		bpf_core_dump_spec(prog_name, LIBBPF_DEBUG, &cand_spec);
 		libbpf_print(LIBBPF_DEBUG, "\n");
 
 		if (err == 0)
-- 
2.30.2



* [PATCH v3 bpf-next 04/13] bpf: Define enum bpf_core_relo_kind as uapi.
  2021-11-20  3:32 [PATCH v3 bpf-next 00/13] bpf: CO-RE support in the kernel Alexei Starovoitov
                   ` (2 preceding siblings ...)
  2021-11-20  3:32 ` [PATCH v3 bpf-next 03/13] bpf: Prepare relo_core.c for kernel duty Alexei Starovoitov
@ 2021-11-20  3:32 ` Alexei Starovoitov
  2021-11-20  3:32 ` [PATCH v3 bpf-next 05/13] bpf: Pass a set of bpf_core_relo-s to prog_load command Alexei Starovoitov
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-20  3:32 UTC (permalink / raw)
  To: davem; +Cc: daniel, andrii, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

enum bpf_core_relo_kind is generated by llvm and processed by libbpf.
It's a de-facto uapi.
With CO-RE in the kernel the bpf_core_relo_kind values become uapi de-jure.
Also rename them with a BPF_CORE_ prefix to distinguish them from the
conflicting names in bpf_core_read.h, where the enums bpf_field_info_kind,
bpf_type_id_kind, bpf_type_info_kind and bpf_enum_value_kind pass different
values from the bpf program into llvm.
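
For orientation, a minimal sketch of the BPF program side that ends up
emitting these relocation kinds (hypothetical program, for illustration only;
bpf_core_field_exists() and __builtin_preserve_field_info() come from libbpf's
bpf_core_read.h and clang respectively):

 #include "vmlinux.h"
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_core_read.h>

 SEC("tp_btf/sched_switch")
 int probe(u64 *ctx)
 {
 	struct task_struct *task = (void *)bpf_get_current_task();

 	/* becomes a BPF_CORE_FIELD_EXISTS relocation record */
 	if (bpf_core_field_exists(task->mm))
 		/* becomes a BPF_CORE_FIELD_BYTE_OFFSET relocation record */
 		bpf_printk("mm offset %d",
 			   __builtin_preserve_field_info(task->mm, BPF_FIELD_BYTE_OFFSET));
 	return 0;
 }

 char LICENSE[] SEC("license") = "GPL";

Here BPF_FIELD_BYTE_OFFSET is the bpf_core_read.h enum value passed into the
clang builtin; those program-facing names stay as they are, while the
libbpf/kernel-facing kinds gain the BPF_CORE_ prefix.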

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/uapi/linux/bpf.h       | 19 ++++++++
 tools/include/uapi/linux/bpf.h | 19 ++++++++
 tools/lib/bpf/libbpf.c         |  2 +-
 tools/lib/bpf/relo_core.c      | 84 +++++++++++++++++-----------------
 tools/lib/bpf/relo_core.h      | 18 +-------
 5 files changed, 82 insertions(+), 60 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a69e4b04ffeb..190e9e3c0693 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6349,4 +6349,23 @@ enum {
 	BTF_F_ZERO	=	(1ULL << 3),
 };
 
+/* bpf_core_relo_kind encodes which aspect of captured field/type/enum value
+ * has to be adjusted by relocations. It is emitted by llvm and passed to
+ * libbpf and later to the kernel.
+ */
+enum bpf_core_relo_kind {
+	BPF_CORE_FIELD_BYTE_OFFSET = 0,      /* field byte offset */
+	BPF_CORE_FIELD_BYTE_SIZE = 1,        /* field size in bytes */
+	BPF_CORE_FIELD_EXISTS = 2,           /* field existence in target kernel */
+	BPF_CORE_FIELD_SIGNED = 3,           /* field signedness (0 - unsigned, 1 - signed) */
+	BPF_CORE_FIELD_LSHIFT_U64 = 4,       /* bitfield-specific left bitshift */
+	BPF_CORE_FIELD_RSHIFT_U64 = 5,       /* bitfield-specific right bitshift */
+	BPF_CORE_TYPE_ID_LOCAL = 6,          /* type ID in local BPF object */
+	BPF_CORE_TYPE_ID_TARGET = 7,         /* type ID in target kernel */
+	BPF_CORE_TYPE_EXISTS = 8,            /* type existence in target kernel */
+	BPF_CORE_TYPE_SIZE = 9,              /* type size in bytes */
+	BPF_CORE_ENUMVAL_EXISTS = 10,        /* enum value existence in target kernel */
+	BPF_CORE_ENUMVAL_VALUE = 11,         /* enum value integer value */
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index a69e4b04ffeb..190e9e3c0693 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6349,4 +6349,23 @@ enum {
 	BTF_F_ZERO	=	(1ULL << 3),
 };
 
+/* bpf_core_relo_kind encodes which aspect of captured field/type/enum value
+ * has to be adjusted by relocations. It is emitted by llvm and passed to
+ * libbpf and later to the kernel.
+ */
+enum bpf_core_relo_kind {
+	BPF_CORE_FIELD_BYTE_OFFSET = 0,      /* field byte offset */
+	BPF_CORE_FIELD_BYTE_SIZE = 1,        /* field size in bytes */
+	BPF_CORE_FIELD_EXISTS = 2,           /* field existence in target kernel */
+	BPF_CORE_FIELD_SIGNED = 3,           /* field signedness (0 - unsigned, 1 - signed) */
+	BPF_CORE_FIELD_LSHIFT_U64 = 4,       /* bitfield-specific left bitshift */
+	BPF_CORE_FIELD_RSHIFT_U64 = 5,       /* bitfield-specific right bitshift */
+	BPF_CORE_TYPE_ID_LOCAL = 6,          /* type ID in local BPF object */
+	BPF_CORE_TYPE_ID_TARGET = 7,         /* type ID in target kernel */
+	BPF_CORE_TYPE_EXISTS = 8,            /* type existence in target kernel */
+	BPF_CORE_TYPE_SIZE = 9,              /* type size in bytes */
+	BPF_CORE_ENUMVAL_EXISTS = 10,        /* enum value existence in target kernel */
+	BPF_CORE_ENUMVAL_VALUE = 11,         /* enum value integer value */
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index af405c38aadc..fc9a179ea412 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -5459,7 +5459,7 @@ static int bpf_core_apply_relo(struct bpf_program *prog,
 		return -ENOTSUP;
 	}
 
-	if (relo->kind != BPF_TYPE_ID_LOCAL &&
+	if (relo->kind != BPF_CORE_TYPE_ID_LOCAL &&
 	    !hashmap__find(cand_cache, type_key, (void **)&cands)) {
 		cands = bpf_core_find_cands(prog->obj, local_btf, local_id);
 		if (IS_ERR(cands)) {
diff --git a/tools/lib/bpf/relo_core.c b/tools/lib/bpf/relo_core.c
index 56dbe6d16664..d194fb9306ed 100644
--- a/tools/lib/bpf/relo_core.c
+++ b/tools/lib/bpf/relo_core.c
@@ -113,18 +113,18 @@ static bool is_flex_arr(const struct btf *btf,
 static const char *core_relo_kind_str(enum bpf_core_relo_kind kind)
 {
 	switch (kind) {
-	case BPF_FIELD_BYTE_OFFSET: return "byte_off";
-	case BPF_FIELD_BYTE_SIZE: return "byte_sz";
-	case BPF_FIELD_EXISTS: return "field_exists";
-	case BPF_FIELD_SIGNED: return "signed";
-	case BPF_FIELD_LSHIFT_U64: return "lshift_u64";
-	case BPF_FIELD_RSHIFT_U64: return "rshift_u64";
-	case BPF_TYPE_ID_LOCAL: return "local_type_id";
-	case BPF_TYPE_ID_TARGET: return "target_type_id";
-	case BPF_TYPE_EXISTS: return "type_exists";
-	case BPF_TYPE_SIZE: return "type_size";
-	case BPF_ENUMVAL_EXISTS: return "enumval_exists";
-	case BPF_ENUMVAL_VALUE: return "enumval_value";
+	case BPF_CORE_FIELD_BYTE_OFFSET: return "byte_off";
+	case BPF_CORE_FIELD_BYTE_SIZE: return "byte_sz";
+	case BPF_CORE_FIELD_EXISTS: return "field_exists";
+	case BPF_CORE_FIELD_SIGNED: return "signed";
+	case BPF_CORE_FIELD_LSHIFT_U64: return "lshift_u64";
+	case BPF_CORE_FIELD_RSHIFT_U64: return "rshift_u64";
+	case BPF_CORE_TYPE_ID_LOCAL: return "local_type_id";
+	case BPF_CORE_TYPE_ID_TARGET: return "target_type_id";
+	case BPF_CORE_TYPE_EXISTS: return "type_exists";
+	case BPF_CORE_TYPE_SIZE: return "type_size";
+	case BPF_CORE_ENUMVAL_EXISTS: return "enumval_exists";
+	case BPF_CORE_ENUMVAL_VALUE: return "enumval_value";
 	default: return "unknown";
 	}
 }
@@ -132,12 +132,12 @@ static const char *core_relo_kind_str(enum bpf_core_relo_kind kind)
 static bool core_relo_is_field_based(enum bpf_core_relo_kind kind)
 {
 	switch (kind) {
-	case BPF_FIELD_BYTE_OFFSET:
-	case BPF_FIELD_BYTE_SIZE:
-	case BPF_FIELD_EXISTS:
-	case BPF_FIELD_SIGNED:
-	case BPF_FIELD_LSHIFT_U64:
-	case BPF_FIELD_RSHIFT_U64:
+	case BPF_CORE_FIELD_BYTE_OFFSET:
+	case BPF_CORE_FIELD_BYTE_SIZE:
+	case BPF_CORE_FIELD_EXISTS:
+	case BPF_CORE_FIELD_SIGNED:
+	case BPF_CORE_FIELD_LSHIFT_U64:
+	case BPF_CORE_FIELD_RSHIFT_U64:
 		return true;
 	default:
 		return false;
@@ -147,10 +147,10 @@ static bool core_relo_is_field_based(enum bpf_core_relo_kind kind)
 static bool core_relo_is_type_based(enum bpf_core_relo_kind kind)
 {
 	switch (kind) {
-	case BPF_TYPE_ID_LOCAL:
-	case BPF_TYPE_ID_TARGET:
-	case BPF_TYPE_EXISTS:
-	case BPF_TYPE_SIZE:
+	case BPF_CORE_TYPE_ID_LOCAL:
+	case BPF_CORE_TYPE_ID_TARGET:
+	case BPF_CORE_TYPE_EXISTS:
+	case BPF_CORE_TYPE_SIZE:
 		return true;
 	default:
 		return false;
@@ -160,8 +160,8 @@ static bool core_relo_is_type_based(enum bpf_core_relo_kind kind)
 static bool core_relo_is_enumval_based(enum bpf_core_relo_kind kind)
 {
 	switch (kind) {
-	case BPF_ENUMVAL_EXISTS:
-	case BPF_ENUMVAL_VALUE:
+	case BPF_CORE_ENUMVAL_EXISTS:
+	case BPF_CORE_ENUMVAL_VALUE:
 		return true;
 	default:
 		return false;
@@ -624,7 +624,7 @@ static int bpf_core_calc_field_relo(const char *prog_name,
 
 	*field_sz = 0;
 
-	if (relo->kind == BPF_FIELD_EXISTS) {
+	if (relo->kind == BPF_CORE_FIELD_EXISTS) {
 		*val = spec ? 1 : 0;
 		return 0;
 	}
@@ -637,7 +637,7 @@ static int bpf_core_calc_field_relo(const char *prog_name,
 
 	/* a[n] accessor needs special handling */
 	if (!acc->name) {
-		if (relo->kind == BPF_FIELD_BYTE_OFFSET) {
+		if (relo->kind == BPF_CORE_FIELD_BYTE_OFFSET) {
 			*val = spec->bit_offset / 8;
 			/* remember field size for load/store mem size */
 			sz = btf__resolve_size(spec->btf, acc->type_id);
@@ -645,7 +645,7 @@ static int bpf_core_calc_field_relo(const char *prog_name,
 				return -EINVAL;
 			*field_sz = sz;
 			*type_id = acc->type_id;
-		} else if (relo->kind == BPF_FIELD_BYTE_SIZE) {
+		} else if (relo->kind == BPF_CORE_FIELD_BYTE_SIZE) {
 			sz = btf__resolve_size(spec->btf, acc->type_id);
 			if (sz < 0)
 				return -EINVAL;
@@ -697,36 +697,36 @@ static int bpf_core_calc_field_relo(const char *prog_name,
 		*validate = !bitfield;
 
 	switch (relo->kind) {
-	case BPF_FIELD_BYTE_OFFSET:
+	case BPF_CORE_FIELD_BYTE_OFFSET:
 		*val = byte_off;
 		if (!bitfield) {
 			*field_sz = byte_sz;
 			*type_id = field_type_id;
 		}
 		break;
-	case BPF_FIELD_BYTE_SIZE:
+	case BPF_CORE_FIELD_BYTE_SIZE:
 		*val = byte_sz;
 		break;
-	case BPF_FIELD_SIGNED:
+	case BPF_CORE_FIELD_SIGNED:
 		/* enums will be assumed unsigned */
 		*val = btf_is_enum(mt) ||
 		       (btf_int_encoding(mt) & BTF_INT_SIGNED);
 		if (validate)
 			*validate = true; /* signedness is never ambiguous */
 		break;
-	case BPF_FIELD_LSHIFT_U64:
+	case BPF_CORE_FIELD_LSHIFT_U64:
 #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
 		*val = 64 - (bit_off + bit_sz - byte_off  * 8);
 #else
 		*val = (8 - byte_sz) * 8 + (bit_off - byte_off * 8);
 #endif
 		break;
-	case BPF_FIELD_RSHIFT_U64:
+	case BPF_CORE_FIELD_RSHIFT_U64:
 		*val = 64 - bit_sz;
 		if (validate)
 			*validate = true; /* right shift is never ambiguous */
 		break;
-	case BPF_FIELD_EXISTS:
+	case BPF_CORE_FIELD_EXISTS:
 	default:
 		return -EOPNOTSUPP;
 	}
@@ -747,20 +747,20 @@ static int bpf_core_calc_type_relo(const struct bpf_core_relo *relo,
 	}
 
 	switch (relo->kind) {
-	case BPF_TYPE_ID_TARGET:
+	case BPF_CORE_TYPE_ID_TARGET:
 		*val = spec->root_type_id;
 		break;
-	case BPF_TYPE_EXISTS:
+	case BPF_CORE_TYPE_EXISTS:
 		*val = 1;
 		break;
-	case BPF_TYPE_SIZE:
+	case BPF_CORE_TYPE_SIZE:
 		sz = btf__resolve_size(spec->btf, spec->root_type_id);
 		if (sz < 0)
 			return -EINVAL;
 		*val = sz;
 		break;
-	case BPF_TYPE_ID_LOCAL:
-	/* BPF_TYPE_ID_LOCAL is handled specially and shouldn't get here */
+	case BPF_CORE_TYPE_ID_LOCAL:
+	/* BPF_CORE_TYPE_ID_LOCAL is handled specially and shouldn't get here */
 	default:
 		return -EOPNOTSUPP;
 	}
@@ -776,10 +776,10 @@ static int bpf_core_calc_enumval_relo(const struct bpf_core_relo *relo,
 	const struct btf_enum *e;
 
 	switch (relo->kind) {
-	case BPF_ENUMVAL_EXISTS:
+	case BPF_CORE_ENUMVAL_EXISTS:
 		*val = spec ? 1 : 0;
 		break;
-	case BPF_ENUMVAL_VALUE:
+	case BPF_CORE_ENUMVAL_VALUE:
 		if (!spec)
 			return -EUCLEAN; /* request instruction poisoning */
 		t = btf_type_by_id(spec->btf, spec->spec[0].type_id);
@@ -1236,7 +1236,7 @@ int bpf_core_apply_relo_insn(const char *prog_name, struct bpf_insn *insn,
 	libbpf_print(LIBBPF_DEBUG, "\n");
 
 	/* TYPE_ID_LOCAL relo is special and doesn't need candidate search */
-	if (relo->kind == BPF_TYPE_ID_LOCAL) {
+	if (relo->kind == BPF_CORE_TYPE_ID_LOCAL) {
 		targ_res.validate = true;
 		targ_res.poison = false;
 		targ_res.orig_val = local_spec.root_type_id;
@@ -1302,7 +1302,7 @@ int bpf_core_apply_relo_insn(const char *prog_name, struct bpf_insn *insn,
 	}
 
 	/*
-	 * For BPF_FIELD_EXISTS relo or when used BPF program has field
+	 * For BPF_CORE_FIELD_EXISTS relo or when used BPF program has field
 	 * existence checks or kernel version/config checks, it's expected
 	 * that we might not find any candidates. In this case, if field
 	 * wasn't found in any candidate, the list of candidates shouldn't
diff --git a/tools/lib/bpf/relo_core.h b/tools/lib/bpf/relo_core.h
index 3b9f8f18346c..3d0b86e7f439 100644
--- a/tools/lib/bpf/relo_core.h
+++ b/tools/lib/bpf/relo_core.h
@@ -4,23 +4,7 @@
 #ifndef __RELO_CORE_H
 #define __RELO_CORE_H
 
-/* bpf_core_relo_kind encodes which aspect of captured field/type/enum value
- * has to be adjusted by relocations.
- */
-enum bpf_core_relo_kind {
-	BPF_FIELD_BYTE_OFFSET = 0,	/* field byte offset */
-	BPF_FIELD_BYTE_SIZE = 1,	/* field size in bytes */
-	BPF_FIELD_EXISTS = 2,		/* field existence in target kernel */
-	BPF_FIELD_SIGNED = 3,		/* field signedness (0 - unsigned, 1 - signed) */
-	BPF_FIELD_LSHIFT_U64 = 4,	/* bitfield-specific left bitshift */
-	BPF_FIELD_RSHIFT_U64 = 5,	/* bitfield-specific right bitshift */
-	BPF_TYPE_ID_LOCAL = 6,		/* type ID in local BPF object */
-	BPF_TYPE_ID_TARGET = 7,		/* type ID in target kernel */
-	BPF_TYPE_EXISTS = 8,		/* type existence in target kernel */
-	BPF_TYPE_SIZE = 9,		/* type size in bytes */
-	BPF_ENUMVAL_EXISTS = 10,	/* enum value existence in target kernel */
-	BPF_ENUMVAL_VALUE = 11,		/* enum value integer value */
-};
+#include <linux/bpf.h>
 
 /* The minimum bpf_core_relo checked by the loader
  *
-- 
2.30.2



* [PATCH v3 bpf-next 05/13] bpf: Pass a set of bpf_core_relo-s to prog_load command.
  2021-11-20  3:32 [PATCH v3 bpf-next 00/13] bpf: CO-RE support in the kernel Alexei Starovoitov
                   ` (3 preceding siblings ...)
  2021-11-20  3:32 ` [PATCH v3 bpf-next 04/13] bpf: Define enum bpf_core_relo_kind as uapi Alexei Starovoitov
@ 2021-11-20  3:32 ` Alexei Starovoitov
  2021-11-22 23:37   ` Andrii Nakryiko
  2021-11-20  3:32 ` [PATCH v3 bpf-next 06/13] bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn() Alexei Starovoitov
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-20  3:32 UTC (permalink / raw)
  To: davem; +Cc: daniel, andrii, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

struct bpf_core_relo is generated by llvm and processed by libbpf.
It's a de-facto uapi.
With CO-RE in the kernel the struct bpf_core_relo becomes uapi de-jure.
Add the ability to pass a set of 'struct bpf_core_relo' to the prog_load
command and let the kernel perform CO-RE relocations.

Note that struct bpf_line_info and struct bpf_func_info have the same layout
when passed from LLVM to libbpf and from libbpf to the kernel, except that
their "insn_off" field means "byte offset" when LLVM generates it; libbpf then
converts it to an "insn index" before passing it to the kernel.
The "insn_off" field of struct bpf_core_relo is always a "byte offset".
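
For example (illustrative numbers): a relocation attached to the program's
6th instruction arrives from libbpf with insn_off == 40, since every BPF
instruction is 8 bytes wide. The kernel (see check_core_relo() below) maps
it back and sanity-checks it roughly as:

 if (core_relo.insn_off % 8 || core_relo.insn_off / 8 >= prog->len)
 	/* reject the record */;
 insn = &prog->insnsi[core_relo.insn_off / 8];	/* 40 / 8 == 5, 0-based */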

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf.h            | 11 +++++
 include/uapi/linux/bpf.h       | 59 +++++++++++++++++++++++++-
 kernel/bpf/btf.c               | 10 +++++
 kernel/bpf/syscall.c           |  2 +-
 kernel/bpf/verifier.c          | 77 ++++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h | 59 +++++++++++++++++++++++++-
 tools/lib/bpf/relo_core.h      | 53 -----------------------
 7 files changed, 215 insertions(+), 56 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cc7a0c36e7df..3cedec249035 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1732,6 +1732,17 @@ bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
 const struct btf_func_model *
 bpf_jit_find_kfunc_model(const struct bpf_prog *prog,
 			 const struct bpf_insn *insn);
+struct bpf_core_cand_list;
+struct bpf_core_ctx {
+	struct bpf_verifier_log *log;
+	const struct btf *btf;
+	struct bpf_core_cand_list *cands;
+};
+
+int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
+		   int relo_idx, void *insn);
+void bpf_core_finish(struct bpf_core_ctx *ctx);
+
 #else /* !CONFIG_BPF_SYSCALL */
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 190e9e3c0693..ea15208793f9 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1342,8 +1342,10 @@ union bpf_attr {
 			/* or valid module BTF object fd or 0 to attach to vmlinux */
 			__u32		attach_btf_obj_fd;
 		};
-		__u32		:32;		/* pad */
+		__u32		core_relo_cnt;	/* number of bpf_core_relo */
 		__aligned_u64	fd_array;	/* array of FDs */
+		__aligned_u64	core_relos;
+		__u32		core_relo_rec_size; /* sizeof(struct bpf_core_relo) */
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
@@ -6368,4 +6370,59 @@ enum bpf_core_relo_kind {
 	BPF_CORE_ENUMVAL_VALUE = 11,         /* enum value integer value */
 };
 
+/*
+ * "struct bpf_core_relo" is used to pass relocation data form LLVM to libbpf
+ * and from libbpf to the kernel.
+ *
+ * CO-RE relocation captures the following data:
+ * - insn_off - instruction offset (in bytes) within a BPF program that needs
+ *   its insn->imm field to be relocated with actual field info;
+ * - type_id - BTF type ID of the "root" (containing) entity of a relocatable
+ *   type or field;
+ * - access_str_off - offset into corresponding .BTF string section. String
+ *   interpretation depends on specific relocation kind:
+ *     - for field-based relocations, string encodes an accessed field using
+ *       a sequence of field and array indices, separated by colon (:). It's
+ *       conceptually very close to LLVM's getelementptr ([0]) instruction's
+ *       arguments for identifying offset to a field.
+ *     - for type-based relocations, strings is expected to be just "0";
+ *     - for enum value-based relocations, string contains an index of enum
+ *       value within its enum type;
+ * - kind - one of enum bpf_core_relo_kind;
+ *
+ * Example:
+ *   struct sample {
+ *       int a;
+ *       struct {
+ *           int b[10];
+ *       };
+ *   };
+ *
+ *   struct sample *s = ...;
+ *   int x = &s->a;     // encoded as "0:0" (a is field #0)
+ *   int y = &s->b[5];  // encoded as "0:1:0:5" (anon struct is field #1,
+ *                      // b is field #0 inside anon struct, accessing elem #5)
+ *   int z = &s[10]->b; // encoded as "10:1" (ptr is used as an array)
+ *
+ * type_id for all relocs in this example will capture BTF type id of
+ * `struct sample`.
+ *
+ * Such relocation is emitted when using __builtin_preserve_access_index()
+ * Clang built-in, passing expression that captures field address, e.g.:
+ *
+ * bpf_probe_read(&dst, sizeof(dst),
+ *		  __builtin_preserve_access_index(&src->a.b.c));
+ *
+ * In this case Clang will emit field relocation recording necessary data to
+ * be able to find offset of embedded `a.b.c` field within `src` struct.
+ *
+ * [0] https://llvm.org/docs/LangRef.html#getelementptr-instruction
+ */
+struct bpf_core_relo {
+	__u32 insn_off;
+	__u32 type_id;
+	__u32 access_str_off;
+	enum bpf_core_relo_kind kind;
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index c79595aad55b..5fd690ea04ea 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6439,3 +6439,13 @@ size_t bpf_core_essential_name_len(const char *name)
 	}
 	return n;
 }
+
+void bpf_core_finish(struct bpf_core_ctx *ctx)
+{
+}
+
+int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
+		   int relo_idx, void *insn)
+{
+	return -EOPNOTSUPP;
+}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 50f96ea4452a..45c9bb932132 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2184,7 +2184,7 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
 }
 
 /* last field in 'union bpf_attr' used by this command */
-#define	BPF_PROG_LOAD_LAST_FIELD fd_array
+#define	BPF_PROG_LOAD_LAST_FIELD core_relo_rec_size
 
 static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
 {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0763cca139a7..bbc36132945f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10251,6 +10251,79 @@ static int check_btf_line(struct bpf_verifier_env *env,
 	return err;
 }
 
+#define MIN_CORE_RELO_SIZE	sizeof(struct bpf_core_relo)
+#define MAX_CORE_RELO_SIZE	MAX_FUNCINFO_REC_SIZE
+
+static int check_core_relo(struct bpf_verifier_env *env,
+			   const union bpf_attr *attr,
+			   bpfptr_t uattr)
+{
+	u32 i, nr_core_relo, ncopy, expected_size, rec_size;
+	struct bpf_core_relo core_relo = {};
+	struct bpf_prog *prog = env->prog;
+	const struct btf *btf = prog->aux->btf;
+	struct bpf_core_ctx ctx = {
+		.log = &env->log,
+		.btf = btf,
+	};
+	bpfptr_t u_core_relo;
+	int err;
+
+	nr_core_relo = attr->core_relo_cnt;
+	if (!nr_core_relo)
+		return 0;
+	if (nr_core_relo > INT_MAX / sizeof(struct bpf_core_relo))
+		return -EINVAL;
+
+	rec_size = attr->core_relo_rec_size;
+	if (rec_size < MIN_CORE_RELO_SIZE ||
+	    rec_size > MAX_CORE_RELO_SIZE ||
+	    rec_size % sizeof(u32))
+		return -EINVAL;
+
+	u_core_relo = make_bpfptr(attr->core_relos, uattr.is_kernel);
+	expected_size = sizeof(struct bpf_core_relo);
+	ncopy = min_t(u32, expected_size, rec_size);
+
+	/* Unlike func_info and line_info, copy and apply each CO-RE
+	 * relocation record one at a time.
+	 */
+	for (i = 0; i < nr_core_relo; i++) {
+		/* future proofing when sizeof(bpf_core_relo) changes */
+		err = bpf_check_uarg_tail_zero(u_core_relo, expected_size, rec_size);
+		if (err) {
+			if (err == -E2BIG) {
+				verbose(env, "nonzero tailing record in core_relo");
+				if (copy_to_bpfptr_offset(uattr,
+							  offsetof(union bpf_attr, core_relo_rec_size),
+							  &expected_size, sizeof(expected_size)))
+					err = -EFAULT;
+			}
+			break;
+		}
+
+		if (copy_from_bpfptr(&core_relo, u_core_relo, ncopy)) {
+			err = -EFAULT;
+			break;
+		}
+
+		if (core_relo.insn_off % 8 || core_relo.insn_off / 8 >= prog->len) {
+			verbose(env, "Invalid core_relo[%u].insn_off:%u prog->len:%u\n",
+				i, core_relo.insn_off, prog->len);
+			err = -EINVAL;
+			break;
+		}
+
+		err = bpf_core_apply(&ctx, &core_relo, i,
+				     &prog->insnsi[core_relo.insn_off / 8]);
+		if (err)
+			break;
+		bpfptr_add(&u_core_relo, rec_size);
+	}
+	bpf_core_finish(&ctx);
+	return err;
+}
+
 static int check_btf_info(struct bpf_verifier_env *env,
 			  const union bpf_attr *attr,
 			  bpfptr_t uattr)
@@ -10281,6 +10354,10 @@ static int check_btf_info(struct bpf_verifier_env *env,
 	if (err)
 		return err;
 
+	err = check_core_relo(env, attr, uattr);
+	if (err)
+		return err;
+
 	return 0;
 }
 
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 190e9e3c0693..ea15208793f9 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1342,8 +1342,10 @@ union bpf_attr {
 			/* or valid module BTF object fd or 0 to attach to vmlinux */
 			__u32		attach_btf_obj_fd;
 		};
-		__u32		:32;		/* pad */
+		__u32		core_relo_cnt;	/* number of bpf_core_relo */
 		__aligned_u64	fd_array;	/* array of FDs */
+		__aligned_u64	core_relos;
+		__u32		core_relo_rec_size; /* sizeof(struct bpf_core_relo) */
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
@@ -6368,4 +6370,59 @@ enum bpf_core_relo_kind {
 	BPF_CORE_ENUMVAL_VALUE = 11,         /* enum value integer value */
 };
 
+/*
+ * "struct bpf_core_relo" is used to pass relocation data form LLVM to libbpf
+ * and from libbpf to the kernel.
+ *
+ * CO-RE relocation captures the following data:
+ * - insn_off - instruction offset (in bytes) within a BPF program that needs
+ *   its insn->imm field to be relocated with actual field info;
+ * - type_id - BTF type ID of the "root" (containing) entity of a relocatable
+ *   type or field;
+ * - access_str_off - offset into corresponding .BTF string section. String
+ *   interpretation depends on specific relocation kind:
+ *     - for field-based relocations, string encodes an accessed field using
+ *       a sequence of field and array indices, separated by colon (:). It's
+ *       conceptually very close to LLVM's getelementptr ([0]) instruction's
+ *       arguments for identifying offset to a field.
+ *     - for type-based relocations, strings is expected to be just "0";
+ *     - for enum value-based relocations, string contains an index of enum
+ *       value within its enum type;
+ * - kind - one of enum bpf_core_relo_kind;
+ *
+ * Example:
+ *   struct sample {
+ *       int a;
+ *       struct {
+ *           int b[10];
+ *       };
+ *   };
+ *
+ *   struct sample *s = ...;
+ *   int x = &s->a;     // encoded as "0:0" (a is field #0)
+ *   int y = &s->b[5];  // encoded as "0:1:0:5" (anon struct is field #1,
+ *                      // b is field #0 inside anon struct, accessing elem #5)
+ *   int z = &s[10]->b; // encoded as "10:1" (ptr is used as an array)
+ *
+ * type_id for all relocs in this example will capture BTF type id of
+ * `struct sample`.
+ *
+ * Such relocation is emitted when using __builtin_preserve_access_index()
+ * Clang built-in, passing expression that captures field address, e.g.:
+ *
+ * bpf_probe_read(&dst, sizeof(dst),
+ *		  __builtin_preserve_access_index(&src->a.b.c));
+ *
+ * In this case Clang will emit field relocation recording necessary data to
+ * be able to find offset of embedded `a.b.c` field within `src` struct.
+ *
+ * [0] https://llvm.org/docs/LangRef.html#getelementptr-instruction
+ */
+struct bpf_core_relo {
+	__u32 insn_off;
+	__u32 type_id;
+	__u32 access_str_off;
+	enum bpf_core_relo_kind kind;
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/lib/bpf/relo_core.h b/tools/lib/bpf/relo_core.h
index 3d0b86e7f439..f410691cc4e5 100644
--- a/tools/lib/bpf/relo_core.h
+++ b/tools/lib/bpf/relo_core.h
@@ -6,59 +6,6 @@
 
 #include <linux/bpf.h>
 
-/* The minimum bpf_core_relo checked by the loader
- *
- * CO-RE relocation captures the following data:
- * - insn_off - instruction offset (in bytes) within a BPF program that needs
- *   its insn->imm field to be relocated with actual field info;
- * - type_id - BTF type ID of the "root" (containing) entity of a relocatable
- *   type or field;
- * - access_str_off - offset into corresponding .BTF string section. String
- *   interpretation depends on specific relocation kind:
- *     - for field-based relocations, string encodes an accessed field using
- *     a sequence of field and array indices, separated by colon (:). It's
- *     conceptually very close to LLVM's getelementptr ([0]) instruction's
- *     arguments for identifying offset to a field.
- *     - for type-based relocations, strings is expected to be just "0";
- *     - for enum value-based relocations, string contains an index of enum
- *     value within its enum type;
- *
- * Example to provide a better feel.
- *
- *   struct sample {
- *       int a;
- *       struct {
- *           int b[10];
- *       };
- *   };
- *
- *   struct sample *s = ...;
- *   int x = &s->a;     // encoded as "0:0" (a is field #0)
- *   int y = &s->b[5];  // encoded as "0:1:0:5" (anon struct is field #1,
- *                      // b is field #0 inside anon struct, accessing elem #5)
- *   int z = &s[10]->b; // encoded as "10:1" (ptr is used as an array)
- *
- * type_id for all relocs in this example  will capture BTF type id of
- * `struct sample`.
- *
- * Such relocation is emitted when using __builtin_preserve_access_index()
- * Clang built-in, passing expression that captures field address, e.g.:
- *
- * bpf_probe_read(&dst, sizeof(dst),
- *		  __builtin_preserve_access_index(&src->a.b.c));
- *
- * In this case Clang will emit field relocation recording necessary data to
- * be able to find offset of embedded `a.b.c` field within `src` struct.
- *
- *   [0] https://llvm.org/docs/LangRef.html#getelementptr-instruction
- */
-struct bpf_core_relo {
-	__u32   insn_off;
-	__u32   type_id;
-	__u32   access_str_off;
-	enum bpf_core_relo_kind kind;
-};
-
 struct bpf_core_cand {
 	const struct btf *btf;
 	const struct btf_type *t;
-- 
2.30.2



* [PATCH v3 bpf-next 06/13] bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn().
  2021-11-20  3:32 [PATCH v3 bpf-next 00/13] bpf: CO-RE support in the kernel Alexei Starovoitov
                   ` (4 preceding siblings ...)
  2021-11-20  3:32 ` [PATCH v3 bpf-next 05/13] bpf: Pass a set of bpf_core_relo-s to prog_load command Alexei Starovoitov
@ 2021-11-20  3:32 ` Alexei Starovoitov
  2021-11-22 23:47   ` Andrii Nakryiko
  2021-11-20  3:32 ` [PATCH v3 bpf-next 07/13] libbpf: Use CO-RE in the kernel in light skeleton Alexei Starovoitov
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-20  3:32 UTC (permalink / raw)
  To: davem; +Cc: daniel, andrii, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Given a BPF program's BTF, perform a linear search through the kernel BTFs
(vmlinux and module BTFs) for possible candidate types, then wire the result
into bpf_core_apply_relo_insn().
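
The candidate match is done on the "essential" part of the type name, i.e. the
X___Y "flavor" suffix is ignored (sketch with hypothetical type names):

 /* bpf_core_essential_name_len("task_struct___my_flavor") == 11 ("task_struct"),
  * so a local 'struct task_struct___my_flavor' still candidate-matches the
  * kernel's 'struct task_struct', per the flavor convention recognized by
  * bpf_core_is_flavor_sep().
  */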

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/btf.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 135 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 5fd690ea04ea..8a87de8b33c2 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -25,6 +25,7 @@
 #include <linux/kobject.h>
 #include <linux/sysfs.h>
 #include <net/sock.h>
+#include "../tools/lib/bpf/relo_core.h"
 
 /* BTF (BPF Type Format) is the meta data format which describes
  * the data types of BPF program/map.  Hence, it basically focus
@@ -6440,12 +6441,145 @@ size_t bpf_core_essential_name_len(const char *name)
 	return n;
 }
 
+static void bpf_core_free_cands(struct bpf_core_cand_list *cands)
+{
+	if (!cands)
+		return;
+	kfree(cands->cands);
+	kfree(cands);
+}
+
 void bpf_core_finish(struct bpf_core_ctx *ctx)
 {
+	bpf_core_free_cands(ctx->cands);
+}
+
+static int bpf_core_add_cands(struct bpf_verifier_log *log,
+			      struct bpf_core_cand *local_cand,
+			      size_t local_essent_len, const struct btf *targ_btf,
+			      int targ_start_id, struct bpf_core_cand_list *cands)
+{
+	struct bpf_core_cand *new_cands, *cand;
+	const struct btf_type *t;
+	const char *targ_name;
+	size_t targ_essent_len;
+	int n, i;
+
+	n = btf_nr_types(targ_btf);
+	for (i = targ_start_id; i < n; i++) {
+		t = btf_type_by_id(targ_btf, i);
+		if (btf_kind(t) != btf_kind(local_cand->t))
+			continue;
+
+		targ_name = btf_name_by_offset(targ_btf, t->name_off);
+		if (str_is_empty(targ_name))
+			continue;
+
+		targ_essent_len = bpf_core_essential_name_len(targ_name);
+		if (targ_essent_len != local_essent_len)
+			continue;
+
+		if (strncmp(local_cand->name, targ_name, local_essent_len) != 0)
+			continue;
+
+		bpf_log(log,
+			"CO-RE relocating [%d] %s %s: found target candidate [%d] %s %s\n",
+			local_cand->id, btf_type_str(local_cand->t),
+			local_cand->name, i, btf_type_str(t), targ_name);
+		new_cands = krealloc(cands->cands,
+				     (cands->len + 1) * sizeof(*cands->cands), GFP_KERNEL);
+		if (!new_cands)
+			return -ENOMEM;
+
+		cand = &new_cands[cands->len];
+		cand->btf = targ_btf;
+		cand->t = t;
+		cand->name = targ_name;
+		cand->id = i;
+
+		cands->cands = new_cands;
+		cands->len++;
+	}
+	return 0;
+}
+
+static struct bpf_core_cand_list *
+bpf_core_find_cands(struct bpf_core_ctx *ctx, u32 local_type_id)
+{
+	const struct btf *local_btf = ctx->btf;
+	struct bpf_core_cand local_cand = {};
+	struct bpf_core_cand_list *cands;
+	const struct btf *main_btf;
+	size_t local_essent_len;
+	struct btf *mod_btf;
+	int err;
+	int id;
+
+	local_cand.btf = local_btf;
+	local_cand.t = btf_type_by_id(local_btf, local_type_id);
+	if (!local_cand.t)
+		return ERR_PTR(-EINVAL);
+
+	local_cand.name = btf_name_by_offset(local_btf, local_cand.t->name_off);
+	if (str_is_empty(local_cand.name))
+		return ERR_PTR(-EINVAL);
+	local_essent_len = bpf_core_essential_name_len(local_cand.name);
+
+	cands = kcalloc(1, sizeof(*cands), GFP_KERNEL);
+	if (!cands)
+		return ERR_PTR(-ENOMEM);
+
+	/* Attempt to find target candidates in vmlinux BTF first */
+	main_btf = bpf_get_btf_vmlinux();
+	err = bpf_core_add_cands(ctx->log, &local_cand, local_essent_len, main_btf, 1, cands);
+	if (err)
+		goto err_out;
+
+	/* if vmlinux BTF has any candidate, don't go for module BTFs */
+	if (cands->len)
+		return cands;
+
+	/* If candidate is not found in vmlinux's BTF then search in module's BTFs */
+	spin_lock_bh(&btf_idr_lock);
+	idr_for_each_entry(&btf_idr, mod_btf, id) {
+		if (!btf_is_module(mod_btf))
+			continue;
+		/* linear search could be slow hence unlock/lock
+		 * the IDR to avoid holding it for too long
+		 */
+		btf_get(mod_btf);
+		spin_unlock_bh(&btf_idr_lock);
+		err = bpf_core_add_cands(ctx->log, &local_cand, local_essent_len,
+					 mod_btf, btf_nr_types(main_btf), cands);
+		if (err) {
+			btf_put(mod_btf);
+			goto err_out;
+		}
+		spin_lock_bh(&btf_idr_lock);
+		btf_put(mod_btf);
+	}
+	spin_unlock_bh(&btf_idr_lock);
+
+	return cands;
+err_out:
+	bpf_core_free_cands(cands);
+	return ERR_PTR(err);
 }
 
 int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
 		   int relo_idx, void *insn)
 {
-	return -EOPNOTSUPP;
+	if (relo->kind != BPF_CORE_TYPE_ID_LOCAL) {
+		struct bpf_core_cand_list *cands;
+
+		cands = bpf_core_find_cands(ctx, relo->type_id);
+		if (IS_ERR(cands)) {
+			bpf_log(ctx->log, "target candidate search failed for %d\n",
+				relo->type_id);
+			return PTR_ERR(cands);
+		}
+		ctx->cands = cands;
+	}
+	return bpf_core_apply_relo_insn((void *)ctx->log, insn, relo->insn_off / 8,
+					relo, relo_idx, ctx->btf, ctx->cands);
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 bpf-next 07/13] libbpf: Use CO-RE in the kernel in light skeleton.
  2021-11-20  3:32 [PATCH v3 bpf-next 00/13] bpf: CO-RE support in the kernel Alexei Starovoitov
                   ` (5 preceding siblings ...)
  2021-11-20  3:32 ` [PATCH v3 bpf-next 06/13] bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn() Alexei Starovoitov
@ 2021-11-20  3:32 ` Alexei Starovoitov
  2021-11-23  0:03   ` Andrii Nakryiko
  2021-11-20  3:32 ` [PATCH v3 bpf-next 08/13] libbpf: Support init of inner maps " Alexei Starovoitov
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-20  3:32 UTC (permalink / raw)
  To: davem; +Cc: daniel, andrii, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Without lskel the CO-RE relocations are processed by libbpf before any other
work is done. Instead, when lskel is needed, remember each relocation as a
RELO_CORE kind. Then, when the loader prog is generated for a given bpf
program, pass the CO-RE relos of that program to the gen loader via
bpf_gen__record_relo_core(). The gen loader will remember them as-is and later
pass them as-is into the kernel.

The normal libbpf flow is to process CO-RE early, before call relos happen. In
case of gen_loader the core relos have to be added to the other relos so that
they are copied together when a bpf static function is appended in different
places to other main bpf progs. During the copy append_subprog_relos() adjusts
insn_idx for normal relos and for the RELO_CORE kind too. When that is done
each struct reloc_desc has correct relos for its specific main prog.
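
As a rough sketch of the insn_idx adjustment mentioned above (the
append_subprog_relos() step is pre-existing libbpf code, not part of this diff,
and the variable/field names here are approximate):

/* sketch: when a static subprog is appended to a main prog, every copied
 * relocation (normal and RELO_CORE alike) is shifted so that its insn_idx
 * points into the main prog's instruction array
 */
memcpy(relos + main_prog->nr_reloc, subprog->reloc_desc,
       sizeof(*relos) * subprog->nr_reloc);
for (i = main_prog->nr_reloc; i < new_cnt; i++)
	relos[i].insn_idx += subprog->sub_insn_off;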

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/lib/bpf/bpf_gen_internal.h |   3 +
 tools/lib/bpf/gen_loader.c       |  41 +++++++++++-
 tools/lib/bpf/libbpf.c           | 108 ++++++++++++++++++++++---------
 3 files changed, 119 insertions(+), 33 deletions(-)

diff --git a/tools/lib/bpf/bpf_gen_internal.h b/tools/lib/bpf/bpf_gen_internal.h
index 75ca9fb857b2..5cbc3e5d3e69 100644
--- a/tools/lib/bpf/bpf_gen_internal.h
+++ b/tools/lib/bpf/bpf_gen_internal.h
@@ -39,6 +39,8 @@ struct bpf_gen {
 	int error;
 	struct ksym_relo_desc *relos;
 	int relo_cnt;
+	struct bpf_core_relo *core_relos;
+	int core_relo_cnt;
 	char attach_target[128];
 	int attach_kind;
 	struct ksym_desc *ksyms;
@@ -61,5 +63,6 @@ void bpf_gen__map_freeze(struct bpf_gen *gen, int map_idx);
 void bpf_gen__record_attach_target(struct bpf_gen *gen, const char *name, enum bpf_attach_type type);
 void bpf_gen__record_extern(struct bpf_gen *gen, const char *name, bool is_weak,
 			    bool is_typeless, int kind, int insn_idx);
+void bpf_gen__record_relo_core(struct bpf_gen *gen, const struct bpf_core_relo *core_relo);
 
 #endif
diff --git a/tools/lib/bpf/gen_loader.c b/tools/lib/bpf/gen_loader.c
index 7b73f97b1fa1..4e32fe1f68db 100644
--- a/tools/lib/bpf/gen_loader.c
+++ b/tools/lib/bpf/gen_loader.c
@@ -839,6 +839,22 @@ static void emit_relo_ksym_btf(struct bpf_gen *gen, struct ksym_relo_desc *relo,
 	emit_ksym_relo_log(gen, relo, kdesc->ref);
 }
 
+void bpf_gen__record_relo_core(struct bpf_gen *gen,
+			       const struct bpf_core_relo *core_relo)
+{
+	struct bpf_core_relo *relos;
+
+	relos = libbpf_reallocarray(gen->core_relos, gen->core_relo_cnt + 1, sizeof(*relos));
+	if (!relos) {
+		gen->error = -ENOMEM;
+		return;
+	}
+	gen->core_relos = relos;
+	relos += gen->core_relo_cnt;
+	memcpy(relos, core_relo, sizeof(*relos));
+	gen->core_relo_cnt++;
+}
+
 static void emit_relo(struct bpf_gen *gen, struct ksym_relo_desc *relo, int insns)
 {
 	int insn;
@@ -871,6 +887,15 @@ static void emit_relos(struct bpf_gen *gen, int insns)
 		emit_relo(gen, gen->relos + i, insns);
 }
 
+static void cleanup_core_relo(struct bpf_gen *gen)
+{
+	if (!gen->core_relo_cnt)
+		return;
+	free(gen->core_relos);
+	gen->core_relo_cnt = 0;
+	gen->core_relos = NULL;
+}
+
 static void cleanup_relos(struct bpf_gen *gen, int insns)
 {
 	int i, insn;
@@ -898,6 +923,7 @@ static void cleanup_relos(struct bpf_gen *gen, int insns)
 		gen->relo_cnt = 0;
 		gen->relos = NULL;
 	}
+	cleanup_core_relo(gen);
 }
 
 void bpf_gen__prog_load(struct bpf_gen *gen,
@@ -905,12 +931,13 @@ void bpf_gen__prog_load(struct bpf_gen *gen,
 			const char *license, struct bpf_insn *insns, size_t insn_cnt,
 			struct bpf_prog_load_opts *load_attr, int prog_idx)
 {
-	int attr_size = offsetofend(union bpf_attr, fd_array);
-	int prog_load_attr, license_off, insns_off, func_info, line_info;
+	int prog_load_attr, license_off, insns_off, func_info, line_info, core_relos;
+	int attr_size = offsetofend(union bpf_attr, core_relo_rec_size);
 	union bpf_attr attr;
 
 	memset(&attr, 0, attr_size);
-	pr_debug("gen: prog_load: type %d insns_cnt %zd\n", prog_type, insn_cnt);
+	pr_debug("gen: prog_load: type %d insns_cnt %zd prog_idx %d\n",
+		 prog_type, insn_cnt, prog_idx);
 	/* add license string to blob of bytes */
 	license_off = add_data(gen, license, strlen(license) + 1);
 	/* add insns to blob of bytes */
@@ -934,6 +961,11 @@ void bpf_gen__prog_load(struct bpf_gen *gen,
 	line_info = add_data(gen, load_attr->line_info,
 			     attr.line_info_cnt * attr.line_info_rec_size);
 
+	attr.core_relo_rec_size = sizeof(struct bpf_core_relo);
+	attr.core_relo_cnt = gen->core_relo_cnt;
+	core_relos = add_data(gen, gen->core_relos,
+			     attr.core_relo_cnt * attr.core_relo_rec_size);
+
 	memcpy(attr.prog_name, prog_name,
 	       min((unsigned)strlen(prog_name), BPF_OBJ_NAME_LEN - 1));
 	prog_load_attr = add_data(gen, &attr, attr_size);
@@ -950,6 +982,9 @@ void bpf_gen__prog_load(struct bpf_gen *gen,
 	/* populate union bpf_attr with a pointer to line_info */
 	emit_rel_store(gen, attr_field(prog_load_attr, line_info), line_info);
 
+	/* populate union bpf_attr with a pointer to core_relos */
+	emit_rel_store(gen, attr_field(prog_load_attr, core_relos), core_relos);
+
 	/* populate union bpf_attr fd_array with a pointer to data where map_fds are saved */
 	emit_rel_store(gen, attr_field(prog_load_attr, fd_array), gen->fd_array);
 
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index fc9a179ea412..dbb393768339 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -230,13 +230,19 @@ enum reloc_type {
 	RELO_EXTERN_VAR,
 	RELO_EXTERN_FUNC,
 	RELO_SUBPROG_ADDR,
+	RELO_CORE,
 };
 
 struct reloc_desc {
 	enum reloc_type type;
 	int insn_idx;
-	int map_idx;
-	int sym_off;
+	union {
+		const struct bpf_core_relo *core_relo; /* used when type == RELO_CORE */
+		struct {
+			int map_idx;
+			int sym_off;
+		};
+	};
 };
 
 struct bpf_sec_def;
@@ -5417,6 +5423,24 @@ static void *u32_as_hash_key(__u32 x)
 	return (void *)(uintptr_t)x;
 }
 
+static int record_relo_core(struct bpf_program *prog,
+			    const struct bpf_core_relo *core_relo, int insn_idx)
+{
+	struct reloc_desc *relos, *relo;
+
+	relos = libbpf_reallocarray(prog->reloc_desc,
+				    prog->nr_reloc + 1, sizeof(*relos));
+	if (!relos)
+		return -ENOMEM;
+	relo = &relos[prog->nr_reloc];
+	relo->type = RELO_CORE;
+	relo->insn_idx = insn_idx;
+	relo->core_relo = core_relo;
+	prog->reloc_desc = relos;
+	prog->nr_reloc++;
+	return 0;
+}
+
 static int bpf_core_apply_relo(struct bpf_program *prog,
 			       const struct bpf_core_relo *relo,
 			       int relo_idx,
@@ -5453,10 +5477,12 @@ static int bpf_core_apply_relo(struct bpf_program *prog,
 		return -EINVAL;
 
 	if (prog->obj->gen_loader) {
-		pr_warn("// TODO core_relo: prog %td insn[%d] %s kind %d\n",
+		const char *spec_str = btf__name_by_offset(local_btf, relo->access_str_off);
+
+		pr_debug("record_relo_core: prog %td insn[%d] %s %s %s final insn_idx %d\n",
 			prog - prog->obj->programs, relo->insn_off / 8,
-			local_name, relo->kind);
-		return -ENOTSUP;
+			btf_kind_str(local_type), local_name, spec_str, insn_idx);
+		return record_relo_core(prog, relo, insn_idx);
 	}
 
 	if (relo->kind != BPF_CORE_TYPE_ID_LOCAL &&
@@ -5653,6 +5679,9 @@ bpf_object__relocate_data(struct bpf_object *obj, struct bpf_program *prog)
 		case RELO_CALL:
 			/* handled already */
 			break;
+		case RELO_CORE:
+			/* will be handled by bpf_program_record_relos() */
+			break;
 		default:
 			pr_warn("prog '%s': relo #%d: bad relo type %d\n",
 				prog->name, i, relo->type);
@@ -6090,6 +6119,35 @@ bpf_object__free_relocs(struct bpf_object *obj)
 	}
 }
 
+static int cmp_relocs(const void *_a, const void *_b)
+{
+	const struct reloc_desc *a = _a;
+	const struct reloc_desc *b = _b;
+
+	if (a->insn_idx != b->insn_idx)
+		return a->insn_idx < b->insn_idx ? -1 : 1;
+
+	/* no two relocations should have the same insn_idx, but ... */
+	if (a->type != b->type)
+		return a->type < b->type ? -1 : 1;
+
+	return 0;
+}
+
+static void bpf_object__sort_relos(struct bpf_object *obj)
+{
+	int i;
+
+	for (i = 0; i < obj->nr_programs; i++) {
+		struct bpf_program *p = &obj->programs[i];
+
+		if (!p->nr_reloc)
+			continue;
+
+		qsort(p->reloc_desc, p->nr_reloc, sizeof(*p->reloc_desc), cmp_relocs);
+	}
+}
+
 static int
 bpf_object__relocate(struct bpf_object *obj, const char *targ_btf_path)
 {
@@ -6104,6 +6162,8 @@ bpf_object__relocate(struct bpf_object *obj, const char *targ_btf_path)
 				err);
 			return err;
 		}
+		if (obj->gen_loader)
+			bpf_object__sort_relos(obj);
 	}
 
 	/* Before relocating calls pre-process relocations and mark
@@ -6281,21 +6341,6 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj,
 	return 0;
 }
 
-static int cmp_relocs(const void *_a, const void *_b)
-{
-	const struct reloc_desc *a = _a;
-	const struct reloc_desc *b = _b;
-
-	if (a->insn_idx != b->insn_idx)
-		return a->insn_idx < b->insn_idx ? -1 : 1;
-
-	/* no two relocations should have the same insn_idx, but ... */
-	if (a->type != b->type)
-		return a->type < b->type ? -1 : 1;
-
-	return 0;
-}
-
 static int bpf_object__collect_relos(struct bpf_object *obj)
 {
 	int i, err;
@@ -6328,14 +6373,7 @@ static int bpf_object__collect_relos(struct bpf_object *obj)
 			return err;
 	}
 
-	for (i = 0; i < obj->nr_programs; i++) {
-		struct bpf_program *p = &obj->programs[i];
-
-		if (!p->nr_reloc)
-			continue;
-
-		qsort(p->reloc_desc, p->nr_reloc, sizeof(*p->reloc_desc), cmp_relocs);
-	}
+	bpf_object__sort_relos(obj);
 	return 0;
 }
 
@@ -6578,7 +6616,7 @@ static int bpf_object_load_prog_instance(struct bpf_object *obj, struct bpf_prog
 	return ret;
 }
 
-static int bpf_program__record_externs(struct bpf_program *prog)
+static int bpf_program_record_relos(struct bpf_program *prog)
 {
 	struct bpf_object *obj = prog->obj;
 	int i;
@@ -6600,6 +6638,16 @@ static int bpf_program__record_externs(struct bpf_program *prog)
 					       ext->is_weak, false, BTF_KIND_FUNC,
 					       relo->insn_idx);
 			break;
+		case RELO_CORE: {
+			struct bpf_core_relo cr = {
+				.insn_off = relo->insn_idx * 8,
+				.type_id = relo->core_relo->type_id,
+				.access_str_off = relo->core_relo->access_str_off,
+				.kind = relo->core_relo->kind,
+			};
+			bpf_gen__record_relo_core(obj->gen_loader, &cr);
+			break;
+		}
 		default:
 			continue;
 		}
@@ -6639,7 +6687,7 @@ static int bpf_object_load_prog(struct bpf_object *obj, struct bpf_program *prog
 				prog->name, prog->instances.nr);
 		}
 		if (obj->gen_loader)
-			bpf_program__record_externs(prog);
+			bpf_program_record_relos(prog);
 		err = bpf_object_load_prog_instance(obj, prog,
 						    prog->insns, prog->insns_cnt,
 						    license, kern_ver, &fd);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 bpf-next 08/13] libbpf: Support init of inner maps in light skeleton.
  2021-11-20  3:32 [PATCH v3 bpf-next 00/13] bpf: CO-RE support in the kernel Alexei Starovoitov
                   ` (6 preceding siblings ...)
  2021-11-20  3:32 ` [PATCH v3 bpf-next 07/13] libbpf: Use CO-RE in the kernel in light skeleton Alexei Starovoitov
@ 2021-11-20  3:32 ` Alexei Starovoitov
  2021-11-20  3:32 ` [PATCH v3 bpf-next 09/13] selftests/bpf: Add lskel version of kfunc test Alexei Starovoitov
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-20  3:32 UTC (permalink / raw)
  To: davem; +Cc: daniel, andrii, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Add ability to initialize inner maps in light skeleton.
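
For context, a sketch of the kind of map-in-map declaration whose init slots
the generated loader program can now populate (names are illustrative and not
taken from a particular selftest; the usual bpf_helpers.h macros are assumed):

struct inner_map {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, int);
} inner_map1 SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
	__uint(max_entries, 4);
	__uint(key_size, sizeof(int));
	__uint(value_size, sizeof(int));
	__array(values, struct inner_map);
} outer_map SEC(".maps") = {
	/* slot 0 becomes a BPF_MAP_UPDATE_ELEM emitted by the loader prog */
	.values = { [0] = &inner_map1 },
};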

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/lib/bpf/bpf_gen_internal.h |  1 +
 tools/lib/bpf/gen_loader.c       | 27 +++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.c           |  6 +++---
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/bpf_gen_internal.h b/tools/lib/bpf/bpf_gen_internal.h
index 5cbc3e5d3e69..6d8447756509 100644
--- a/tools/lib/bpf/bpf_gen_internal.h
+++ b/tools/lib/bpf/bpf_gen_internal.h
@@ -64,5 +64,6 @@ void bpf_gen__record_attach_target(struct bpf_gen *gen, const char *name, enum b
 void bpf_gen__record_extern(struct bpf_gen *gen, const char *name, bool is_weak,
 			    bool is_typeless, int kind, int insn_idx);
 void bpf_gen__record_relo_core(struct bpf_gen *gen, const struct bpf_core_relo *core_relo);
+void bpf_gen__populate_outer_map(struct bpf_gen *gen, int outer_map_idx, int key, int inner_map_idx);
 
 #endif
diff --git a/tools/lib/bpf/gen_loader.c b/tools/lib/bpf/gen_loader.c
index 4e32fe1f68db..9066d1ae3461 100644
--- a/tools/lib/bpf/gen_loader.c
+++ b/tools/lib/bpf/gen_loader.c
@@ -1063,6 +1063,33 @@ void bpf_gen__map_update_elem(struct bpf_gen *gen, int map_idx, void *pvalue,
 	emit_check_err(gen);
 }
 
+void bpf_gen__populate_outer_map(struct bpf_gen *gen, int outer_map_idx, int slot,
+				 int inner_map_idx)
+{
+	int attr_size = offsetofend(union bpf_attr, flags);
+	int map_update_attr, key;
+	union bpf_attr attr;
+
+	memset(&attr, 0, attr_size);
+	pr_debug("gen: populate_outer_map: outer %d key %d inner %d\n",
+		 outer_map_idx, slot, inner_map_idx);
+
+	key = add_data(gen, &slot, sizeof(slot));
+
+	map_update_attr = add_data(gen, &attr, attr_size);
+	move_blob2blob(gen, attr_field(map_update_attr, map_fd), 4,
+		       blob_fd_array_off(gen, outer_map_idx));
+	emit_rel_store(gen, attr_field(map_update_attr, key), key);
+	emit_rel_store(gen, attr_field(map_update_attr, value),
+		       blob_fd_array_off(gen, inner_map_idx));
+
+	/* emit MAP_UPDATE_ELEM command */
+	emit_sys_bpf(gen, BPF_MAP_UPDATE_ELEM, map_update_attr, attr_size);
+	debug_ret(gen, "populate_outer_map outer %d key %d inner %d",
+		  outer_map_idx, slot, inner_map_idx);
+	emit_check_err(gen);
+}
+
 void bpf_gen__map_freeze(struct bpf_gen *gen, int map_idx)
 {
 	int attr_size = offsetofend(union bpf_attr, map_fd);
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index dbb393768339..af413273c0fb 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -4977,9 +4977,9 @@ static int init_map_slots(struct bpf_object *obj, struct bpf_map *map)
 		targ_map = map->init_slots[i];
 		fd = bpf_map__fd(targ_map);
 		if (obj->gen_loader) {
-			pr_warn("// TODO map_update_elem: idx %td key %d value==map_idx %td\n",
-				map - obj->maps, i, targ_map - obj->maps);
-			return -ENOTSUP;
+			bpf_gen__populate_outer_map(obj->gen_loader,
+						    map - obj->maps, i,
+						    targ_map - obj->maps);
 		} else {
 			err = bpf_map_update_elem(map->fd, &i, &fd, 0);
 		}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 bpf-next 09/13] selftests/bpf: Add lskel version of kfunc test.
  2021-11-20  3:32 [PATCH v3 bpf-next 00/13] bpf: CO-RE support in the kernel Alexei Starovoitov
                   ` (7 preceding siblings ...)
  2021-11-20  3:32 ` [PATCH v3 bpf-next 08/13] libbpf: Support init of inner maps " Alexei Starovoitov
@ 2021-11-20  3:32 ` Alexei Starovoitov
  2021-11-23  0:05   ` Andrii Nakryiko
  2021-11-20  3:32 ` [PATCH v3 bpf-next 10/13] selftests/bpf: Improve inner_map test coverage Alexei Starovoitov
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-20  3:32 UTC (permalink / raw)
  To: davem; +Cc: daniel, andrii, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Add light skeleton version of kfunc_call_test_subprog test.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/testing/selftests/bpf/Makefile          |  2 +-
 .../selftests/bpf/prog_tests/kfunc_call.c     | 24 +++++++++++++++++++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 35684d61aaeb..0c9f1837ece2 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -327,7 +327,7 @@ LINKED_SKELS := test_static_linked.skel.h linked_funcs.skel.h		\
 LSKELS := kfunc_call_test.c fentry_test.c fexit_test.c fexit_sleep.c \
 	test_ringbuf.c atomics.c trace_printk.c trace_vprintk.c
 # Generate both light skeleton and libbpf skeleton for these
-LSKELS_EXTRA := test_ksyms_module.c test_ksyms_weak.c
+LSKELS_EXTRA := test_ksyms_module.c test_ksyms_weak.c kfunc_call_test_subprog.c
 SKEL_BLACKLIST += $$(LSKELS)
 
 test_static_linked.skel.h-deps := test_static_linked1.o test_static_linked2.o
diff --git a/tools/testing/selftests/bpf/prog_tests/kfunc_call.c b/tools/testing/selftests/bpf/prog_tests/kfunc_call.c
index 5c9c0176991b..7d7445ccc141 100644
--- a/tools/testing/selftests/bpf/prog_tests/kfunc_call.c
+++ b/tools/testing/selftests/bpf/prog_tests/kfunc_call.c
@@ -4,6 +4,7 @@
 #include <network_helpers.h>
 #include "kfunc_call_test.lskel.h"
 #include "kfunc_call_test_subprog.skel.h"
+#include "kfunc_call_test_subprog.lskel.h"
 
 static void test_main(void)
 {
@@ -49,6 +50,26 @@ static void test_subprog(void)
 	kfunc_call_test_subprog__destroy(skel);
 }
 
+static void test_subprog_lskel(void)
+{
+	struct kfunc_call_test_subprog_lskel *skel;
+	int prog_fd, retval, err;
+
+	skel = kfunc_call_test_subprog_lskel__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel"))
+		return;
+
+	prog_fd = skel->progs.kfunc_call_test1.prog_fd;
+	err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
+				NULL, NULL, (__u32 *)&retval, NULL);
+	ASSERT_OK(err, "bpf_prog_test_run(test1)");
+	ASSERT_EQ(retval, 10, "test1-retval");
+	ASSERT_NEQ(skel->data->active_res, -1, "active_res");
+	ASSERT_EQ(skel->data->sk_state_res, BPF_TCP_CLOSE, "sk_state_res");
+
+	kfunc_call_test_subprog_lskel__destroy(skel);
+}
+
 void test_kfunc_call(void)
 {
 	if (test__start_subtest("main"))
@@ -56,4 +77,7 @@ void test_kfunc_call(void)
 
 	if (test__start_subtest("subprog"))
 		test_subprog();
+
+	if (test__start_subtest("subprog_lskel"))
+		test_subprog_lskel();
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 bpf-next 10/13] selftests/bpf: Improve inner_map test coverage.
  2021-11-20  3:32 [PATCH v3 bpf-next 00/13] bpf: CO-RE support in the kernel Alexei Starovoitov
                   ` (8 preceding siblings ...)
  2021-11-20  3:32 ` [PATCH v3 bpf-next 09/13] selftests/bpf: Add lskel version of kfunc test Alexei Starovoitov
@ 2021-11-20  3:32 ` Alexei Starovoitov
  2021-11-20  3:32 ` [PATCH v3 bpf-next 11/13] selftests/bpf: Convert map_ptr_kern test to use light skeleton Alexei Starovoitov
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-20  3:32 UTC (permalink / raw)
  To: davem; +Cc: daniel, andrii, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Check that hash and array inner maps are properly initialized.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/testing/selftests/bpf/progs/map_ptr_kern.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/map_ptr_kern.c b/tools/testing/selftests/bpf/progs/map_ptr_kern.c
index b1b711d9b214..b64df94ec476 100644
--- a/tools/testing/selftests/bpf/progs/map_ptr_kern.c
+++ b/tools/testing/selftests/bpf/progs/map_ptr_kern.c
@@ -334,9 +334,11 @@ static inline int check_lpm_trie(void)
 	return 1;
 }
 
+#define INNER_MAX_ENTRIES 1234
+
 struct inner_map {
 	__uint(type, BPF_MAP_TYPE_ARRAY);
-	__uint(max_entries, 1);
+	__uint(max_entries, INNER_MAX_ENTRIES);
 	__type(key, __u32);
 	__type(value, __u32);
 } inner_map SEC(".maps");
@@ -348,7 +350,7 @@ struct {
 	__type(value, __u32);
 	__array(values, struct {
 		__uint(type, BPF_MAP_TYPE_ARRAY);
-		__uint(max_entries, 1);
+		__uint(max_entries, INNER_MAX_ENTRIES);
 		__type(key, __u32);
 		__type(value, __u32);
 	});
@@ -360,8 +362,13 @@ static inline int check_array_of_maps(void)
 {
 	struct bpf_array *array_of_maps = (struct bpf_array *)&m_array_of_maps;
 	struct bpf_map *map = (struct bpf_map *)&m_array_of_maps;
+	struct bpf_array *inner_map;
+	int key = 0;
 
 	VERIFY(check_default(&array_of_maps->map, map));
+	inner_map = bpf_map_lookup_elem(array_of_maps, &key);
+	VERIFY(inner_map != 0);
+	VERIFY(inner_map->map.max_entries == INNER_MAX_ENTRIES);
 
 	return 1;
 }
@@ -382,8 +389,13 @@ static inline int check_hash_of_maps(void)
 {
 	struct bpf_htab *hash_of_maps = (struct bpf_htab *)&m_hash_of_maps;
 	struct bpf_map *map = (struct bpf_map *)&m_hash_of_maps;
+	struct bpf_htab *inner_map;
+	int key = 2;
 
 	VERIFY(check_default(&hash_of_maps->map, map));
+	inner_map = bpf_map_lookup_elem(hash_of_maps, &key);
+	VERIFY(inner_map != 0);
+	VERIFY(inner_map->map.max_entries == INNER_MAX_ENTRIES);
 
 	return 1;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 bpf-next 11/13] selftests/bpf: Convert map_ptr_kern test to use light skeleton.
  2021-11-20  3:32 [PATCH v3 bpf-next 00/13] bpf: CO-RE support in the kernel Alexei Starovoitov
                   ` (9 preceding siblings ...)
  2021-11-20  3:32 ` [PATCH v3 bpf-next 10/13] selftests/bpf: Improve inner_map test coverage Alexei Starovoitov
@ 2021-11-20  3:32 ` Alexei Starovoitov
  2021-11-20  3:32 ` [PATCH v3 bpf-next 12/13] selftests/bpf: Additional test for CO-RE in the kernel Alexei Starovoitov
  2021-11-20  3:32 ` [PATCH v3 bpf-next 13/13] selftest/bpf: Revert CO-RE removal in test_ksyms_weak Alexei Starovoitov
  12 siblings, 0 replies; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-20  3:32 UTC (permalink / raw)
  To: davem; +Cc: daniel, andrii, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

To further exercise CO-RE in the kernel, convert the map_ptr_kern
test to use the light skeleton.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/testing/selftests/bpf/Makefile             |  3 ++-
 tools/testing/selftests/bpf/prog_tests/map_ptr.c | 16 +++++++---------
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 0c9f1837ece2..4fd040f5944b 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -325,7 +325,8 @@ LINKED_SKELS := test_static_linked.skel.h linked_funcs.skel.h		\
 		linked_vars.skel.h linked_maps.skel.h
 
 LSKELS := kfunc_call_test.c fentry_test.c fexit_test.c fexit_sleep.c \
-	test_ringbuf.c atomics.c trace_printk.c trace_vprintk.c
+	test_ringbuf.c atomics.c trace_printk.c trace_vprintk.c \
+	map_ptr_kern.c
 # Generate both light skeleton and libbpf skeleton for these
 LSKELS_EXTRA := test_ksyms_module.c test_ksyms_weak.c kfunc_call_test_subprog.c
 SKEL_BLACKLIST += $$(LSKELS)
diff --git a/tools/testing/selftests/bpf/prog_tests/map_ptr.c b/tools/testing/selftests/bpf/prog_tests/map_ptr.c
index 4972f92205c7..273725504f11 100644
--- a/tools/testing/selftests/bpf/prog_tests/map_ptr.c
+++ b/tools/testing/selftests/bpf/prog_tests/map_ptr.c
@@ -4,31 +4,29 @@
 #include <test_progs.h>
 #include <network_helpers.h>
 
-#include "map_ptr_kern.skel.h"
+#include "map_ptr_kern.lskel.h"
 
 void test_map_ptr(void)
 {
-	struct map_ptr_kern *skel;
+	struct map_ptr_kern_lskel *skel;
 	__u32 duration = 0, retval;
 	char buf[128];
 	int err;
 	int page_size = getpagesize();
 
-	skel = map_ptr_kern__open();
+	skel = map_ptr_kern_lskel__open();
 	if (!ASSERT_OK_PTR(skel, "skel_open"))
 		return;
 
-	err = bpf_map__set_max_entries(skel->maps.m_ringbuf, page_size);
-	if (!ASSERT_OK(err, "bpf_map__set_max_entries"))
-		goto cleanup;
+	skel->maps.m_ringbuf.max_entries = page_size;
 
-	err = map_ptr_kern__load(skel);
+	err = map_ptr_kern_lskel__load(skel);
 	if (!ASSERT_OK(err, "skel_load"))
 		goto cleanup;
 
 	skel->bss->page_size = page_size;
 
-	err = bpf_prog_test_run(bpf_program__fd(skel->progs.cg_skb), 1, &pkt_v4,
+	err = bpf_prog_test_run(skel->progs.cg_skb.prog_fd, 1, &pkt_v4,
 				sizeof(pkt_v4), buf, NULL, &retval, NULL);
 
 	if (CHECK(err, "test_run", "err=%d errno=%d\n", err, errno))
@@ -39,5 +37,5 @@ void test_map_ptr(void)
 		goto cleanup;
 
 cleanup:
-	map_ptr_kern__destroy(skel);
+	map_ptr_kern_lskel__destroy(skel);
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 bpf-next 12/13] selftests/bpf: Additional test for CO-RE in the kernel.
  2021-11-20  3:32 [PATCH v3 bpf-next 00/13] bpf: CO-RE support in the kernel Alexei Starovoitov
                   ` (10 preceding siblings ...)
  2021-11-20  3:32 ` [PATCH v3 bpf-next 11/13] selftests/bpf: Convert map_ptr_kern test to use light skeleton Alexei Starovoitov
@ 2021-11-20  3:32 ` Alexei Starovoitov
  2021-11-23  0:09   ` Andrii Nakryiko
  2021-11-20  3:32 ` [PATCH v3 bpf-next 13/13] selftest/bpf: Revert CO-RE removal in test_ksyms_weak Alexei Starovoitov
  12 siblings, 1 reply; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-20  3:32 UTC (permalink / raw)
  To: davem; +Cc: daniel, andrii, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Add a test where the randmap() function is appended to three different bpf
programs. That action checks the struct bpf_core_relo replication logic and
offset adjustment.
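
In other words, the single relocation inside randmap() has to show up once per
main program, each time with its own byte offset. A purely illustrative sketch
of the recorded records (the ids, string offsets and insn indexes below are
made up; insn_off is insn_idx * 8, since it is a byte offset):

const struct bpf_core_relo recorded[] = {
	/* copy for tp_xdp_devmap_xmit_multi */
	{ .insn_off = 120 * 8, .type_id = 42, .access_str_off = 7,
	  .kind = BPF_CORE_FIELD_BYTE_OFFSET },
	/* copy for fentry_eth_type_trans */
	{ .insn_off =  40 * 8, .type_id = 42, .access_str_off = 7,
	  .kind = BPF_CORE_FIELD_BYTE_OFFSET },
	/* copy for fexit_eth_type_trans */
	{ .insn_off =  64 * 8, .type_id = 42, .access_str_off = 7,
	  .kind = BPF_CORE_FIELD_BYTE_OFFSET },
};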

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/testing/selftests/bpf/Makefile          |  2 +-
 .../selftests/bpf/prog_tests/core_kern.c      | 14 +++++
 tools/testing/selftests/bpf/progs/core_kern.c | 60 +++++++++++++++++++
 3 files changed, 75 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/core_kern.c
 create mode 100644 tools/testing/selftests/bpf/progs/core_kern.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 4fd040f5944b..139d7e5e0a5f 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -326,7 +326,7 @@ LINKED_SKELS := test_static_linked.skel.h linked_funcs.skel.h		\
 
 LSKELS := kfunc_call_test.c fentry_test.c fexit_test.c fexit_sleep.c \
 	test_ringbuf.c atomics.c trace_printk.c trace_vprintk.c \
-	map_ptr_kern.c
+	map_ptr_kern.c core_kern.c
 # Generate both light skeleton and libbpf skeleton for these
 LSKELS_EXTRA := test_ksyms_module.c test_ksyms_weak.c kfunc_call_test_subprog.c
 SKEL_BLACKLIST += $$(LSKELS)
diff --git a/tools/testing/selftests/bpf/prog_tests/core_kern.c b/tools/testing/selftests/bpf/prog_tests/core_kern.c
new file mode 100644
index 000000000000..561c5185d886
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/core_kern.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include "test_progs.h"
+#include "core_kern.lskel.h"
+
+void test_core_kern_lskel(void)
+{
+	struct core_kern_lskel *skel;
+
+	skel = core_kern_lskel__open_and_load();
+	ASSERT_OK_PTR(skel, "open_and_load");
+	core_kern_lskel__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/core_kern.c b/tools/testing/selftests/bpf/progs/core_kern.c
new file mode 100644
index 000000000000..3b4571d68369
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/core_kern.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, u32);
+	__type(value, u32);
+	__uint(max_entries, 256);
+} array1 SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, u32);
+	__type(value, u32);
+	__uint(max_entries, 256);
+} array2 SEC(".maps");
+
+int randmap(int v)
+{
+	struct bpf_map *map = (struct bpf_map *)&array1;
+	int key = bpf_get_prandom_u32() & 0xff;
+	int *val;
+
+	if (bpf_get_prandom_u32() & 1)
+		map = (struct bpf_map *)&array2;
+
+	val = bpf_map_lookup_elem(map, &key);
+	if (val)
+		*val = bpf_get_prandom_u32() + v;
+
+	return 0;
+}
+
+SEC("tp_btf/xdp_devmap_xmit")
+int BPF_PROG(tp_xdp_devmap_xmit_multi, const struct net_device
+	     *from_dev, const struct net_device *to_dev, int sent, int drops,
+	     int err)
+{
+	return randmap(from_dev->ifindex);
+}
+
+SEC("fentry/eth_type_trans")
+int BPF_PROG(fentry_eth_type_trans, struct sk_buff *skb,
+	     struct net_device *dev, unsigned short protocol)
+{
+	return randmap(dev->ifindex + skb->len);
+}
+
+SEC("fexit/eth_type_trans")
+int BPF_PROG(fexit_eth_type_trans, struct sk_buff *skb,
+	     struct net_device *dev, unsigned short protocol)
+{
+	return randmap(dev->ifindex + skb->len);
+}
+
+char LICENSE[] SEC("license") = "GPL";
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 bpf-next 13/13] selftest/bpf: Revert CO-RE removal in test_ksyms_weak.
  2021-11-20  3:32 [PATCH v3 bpf-next 00/13] bpf: CO-RE support in the kernel Alexei Starovoitov
                   ` (11 preceding siblings ...)
  2021-11-20  3:32 ` [PATCH v3 bpf-next 12/13] selftests/bpf: Additional test for CO-RE in the kernel Alexei Starovoitov
@ 2021-11-20  3:32 ` Alexei Starovoitov
  2021-11-23  0:10   ` Andrii Nakryiko
  2021-11-23  0:23   ` Kumar Kartikeya Dwivedi
  12 siblings, 2 replies; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-20  3:32 UTC (permalink / raw)
  To: davem; +Cc: daniel, andrii, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Commit 087cba799ced ("selftests/bpf: Add weak/typeless ksym test for light skeleton")
added test_ksyms_weak to light skeleton testing, but removed CO-RE access.
Revert that part of the commit, since the light skeleton can now use CO-RE in the kernel.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/testing/selftests/bpf/prog_tests/ksyms_btf.c  | 4 ++--
 tools/testing/selftests/bpf/progs/test_ksyms_weak.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
index 79f6bd1e50d6..988f5db3e342 100644
--- a/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
+++ b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
@@ -101,7 +101,7 @@ static void test_weak_syms(void)
 	usleep(1);
 
 	data = skel->data;
-	ASSERT_EQ(data->out__existing_typed, 0, "existing typed ksym");
+	ASSERT_GE(data->out__existing_typed, 0, "existing typed ksym");
 	ASSERT_NEQ(data->out__existing_typeless, -1, "existing typeless ksym");
 	ASSERT_EQ(data->out__non_existent_typeless, 0, "nonexistent typeless ksym");
 	ASSERT_EQ(data->out__non_existent_typed, 0, "nonexistent typed ksym");
@@ -128,7 +128,7 @@ static void test_weak_syms_lskel(void)
 	usleep(1);
 
 	data = skel->data;
-	ASSERT_EQ(data->out__existing_typed, 0, "existing typed ksym");
+	ASSERT_GE(data->out__existing_typed, 0, "existing typed ksym");
 	ASSERT_NEQ(data->out__existing_typeless, -1, "existing typeless ksym");
 	ASSERT_EQ(data->out__non_existent_typeless, 0, "nonexistent typeless ksym");
 	ASSERT_EQ(data->out__non_existent_typed, 0, "nonexistent typed ksym");
diff --git a/tools/testing/selftests/bpf/progs/test_ksyms_weak.c b/tools/testing/selftests/bpf/progs/test_ksyms_weak.c
index 8eadbd4caf7a..5f8379aadb29 100644
--- a/tools/testing/selftests/bpf/progs/test_ksyms_weak.c
+++ b/tools/testing/selftests/bpf/progs/test_ksyms_weak.c
@@ -38,7 +38,7 @@ int pass_handler(const void *ctx)
 	/* tests existing symbols. */
 	rq = (struct rq *)bpf_per_cpu_ptr(&runqueues, 0);
 	if (rq)
-		out__existing_typed = 0;
+		out__existing_typed = rq->cpu;
 	out__existing_typeless = (__u64)&bpf_prog_active;
 
 	/* tests non-existent symbols. */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 03/13] bpf: Prepare relo_core.c for kernel duty.
  2021-11-20  3:32 ` [PATCH v3 bpf-next 03/13] bpf: Prepare relo_core.c for kernel duty Alexei Starovoitov
@ 2021-11-20 15:27   ` Matteo Croce
  2021-11-20 16:23     ` Alexei Starovoitov
  2021-11-22 23:15   ` Andrii Nakryiko
  1 sibling, 1 reply; 37+ messages in thread
From: Matteo Croce @ 2021-11-20 15:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Sat, Nov 20, 2021 at 4:33 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> Make relo_core.c to be compiled for the kernel and for user space libbpf.
>
> Note the patch is reducing BPF_CORE_SPEC_MAX_LEN from 64 to 32.
> This is the maximum number of nested structs and arrays.
> For example:
>  struct sample {
>      int a;
>      struct {
>          int b[10];
>      };
>  };
>
>  struct sample *s = ...;
>  int y = &s->b[5];

I don't understand this. Is this intentional, or should it be one of:

int y = s->b[5];
int *y = &s->b[5];

Regards,
-- 
per aspera ad upstream

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 03/13] bpf: Prepare relo_core.c for kernel duty.
  2021-11-20 15:27   ` Matteo Croce
@ 2021-11-20 16:23     ` Alexei Starovoitov
  2021-11-22 21:36       ` Andrii Nakryiko
  0 siblings, 1 reply; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-20 16:23 UTC (permalink / raw)
  To: Matteo Croce
  Cc: David Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Sat, Nov 20, 2021 at 7:27 AM Matteo Croce <mcroce@linux.microsoft.com> wrote:
>
> On Sat, Nov 20, 2021 at 4:33 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > Make relo_core.c to be compiled for the kernel and for user space libbpf.
> >
> > Note the patch is reducing BPF_CORE_SPEC_MAX_LEN from 64 to 32.
> > This is the maximum number of nested structs and arrays.
> > For example:
> >  struct sample {
> >      int a;
> >      struct {
> >          int b[10];
> >      };
> >  };
> >
> >  struct sample *s = ...;
> >  int y = &s->b[5];
>
> I don't understand this. Is this intentional, or it should be one of:
>
> int y = s->b[5];
> int *y = &s->b[5];

Eagle eye. I copy-pasted this typo from libbpf.
Will fix it in all places at once, either in a respin or in a separate patch.
For the purpose of the example it could be either.
int *y = &s->b[5]; is a relocatable ADD.
int y = s->b[5]; is a relocatable LDX.
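
To spell it out with a hedged sketch (assuming the struct sample layout quoted
above and a type that carries preserve_access_index, e.g. via vmlinux.h;
get_sample() is a hypothetical source of the pointer):

struct sample *s = get_sample();

/* relocatable LDX: loads the value, the field offset in the insn is
 * patched by CO-RE
 */
int y = s->b[5];

/* relocatable ADD: only computes the address; the pointer can then be
 * read explicitly, e.g. bpf_probe_read_kernel(&y, sizeof(y), p)
 */
int *p = &s->b[5];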

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 03/13] bpf: Prepare relo_core.c for kernel duty.
  2021-11-20 16:23     ` Alexei Starovoitov
@ 2021-11-22 21:36       ` Andrii Nakryiko
  0 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2021-11-22 21:36 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Matteo Croce, David Miller, Daniel Borkmann, Andrii Nakryiko,
	bpf, Kernel Team

On Sat, Nov 20, 2021 at 8:24 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sat, Nov 20, 2021 at 7:27 AM Matteo Croce <mcroce@linux.microsoft.com> wrote:
> >
> > On Sat, Nov 20, 2021 at 4:33 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > From: Alexei Starovoitov <ast@kernel.org>
> > >
> > > Make relo_core.c to be compiled for the kernel and for user space libbpf.
> > >
> > > Note the patch is reducing BPF_CORE_SPEC_MAX_LEN from 64 to 32.
> > > This is the maximum number of nested structs and arrays.
> > > For example:
> > >  struct sample {
> > >      int a;
> > >      struct {
> > >          int b[10];
> > >      };
> > >  };
> > >
> > >  struct sample *s = ...;
> > >  int y = &s->b[5];
> >
> > I don't understand this. Is this intentional, or it should be one of:
> >
> > int y = s->b[5];
> > int *y = &s->b[5];
>
> Eagle eye. I copy pasted this typo from libbpf.
> Will fix in all places at once either in a respin or in a separate patch.
> For the purpose of the example it could be either.
> int *y = &s->b[5]; is a relocatable ADD.

I think this was the intention (getting the address in a CO-RE-relocatable
way) at the time; we didn't yet have the direct memory access that
fentry provides. So that `int *` was intended to then be passed to
bpf_probe_read_kernel().

Now both options would work, but the first one only works in
fentry/fexit and similar program types.

> int y = s->b[5]; is a relocatable LDX.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 01/13] libbpf: Replace btf__type_by_id() with btf_type_by_id().
  2021-11-20  3:32 ` [PATCH v3 bpf-next 01/13] libbpf: Replace btf__type_by_id() with btf_type_by_id() Alexei Starovoitov
@ 2021-11-22 21:42   ` Andrii Nakryiko
  0 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2021-11-22 21:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> To prepare relo_core.c to be compiled in the kernel and the user space
> replace btf__type_by_id with btf_type_by_id.
>
> In libbpf btf__type_by_id and btf_type_by_id have different behavior.
>
> bpf_core_apply_relo_insn() needs behavior of uapi btf__type_by_id
> vs internal btf_type_by_id, but type_id range check is already done
> in bpf_core_apply_relo(), so it's safe to replace it everywhere.
> The kernel btf_type_by_id() does the check anyway. It doesn't hurt.
>
> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

LGTM.

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  tools/lib/bpf/btf.c             |  2 +-
>  tools/lib/bpf/libbpf_internal.h |  2 +-
>  tools/lib/bpf/relo_core.c       | 19 ++++++++-----------
>  3 files changed, 10 insertions(+), 13 deletions(-)
>

[...]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 03/13] bpf: Prepare relo_core.c for kernel duty.
  2021-11-20  3:32 ` [PATCH v3 bpf-next 03/13] bpf: Prepare relo_core.c for kernel duty Alexei Starovoitov
  2021-11-20 15:27   ` Matteo Croce
@ 2021-11-22 23:15   ` Andrii Nakryiko
  1 sibling, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2021-11-22 23:15 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> Make relo_core.c to be compiled for the kernel and for user space libbpf.
>
> Note the patch is reducing BPF_CORE_SPEC_MAX_LEN from 64 to 32.
> This is the maximum number of nested structs and arrays.
> For example:
>  struct sample {
>      int a;
>      struct {
>          int b[10];
>      };
>  };
>
>  struct sample *s = ...;
>  int y = &s->b[5];
> This field access is encoded as "0:1:0:5" and spec len is 4.
>
> The follow up patch might bump it back to 64.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  include/linux/btf.h       | 81 +++++++++++++++++++++++++++++++++++++++
>  kernel/bpf/Makefile       |  4 ++
>  kernel/bpf/btf.c          | 26 +++++++++++++
>  tools/lib/bpf/relo_core.c | 76 ++++++++++++++++++++++++++++++------
>  4 files changed, 176 insertions(+), 11 deletions(-)
>

[...]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 05/13] bpf: Pass a set of bpf_core_relo-s to prog_load command.
  2021-11-20  3:32 ` [PATCH v3 bpf-next 05/13] bpf: Pass a set of bpf_core_relo-s to prog_load command Alexei Starovoitov
@ 2021-11-22 23:37   ` Andrii Nakryiko
  0 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2021-11-22 23:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> struct bpf_core_relo is generated by llvm and processed by libbpf.
> It's a de-facto uapi.
> With CO-RE in the kernel the struct bpf_core_relo becomes uapi de-jure.
> Add an ability to pass a set of 'struct bpf_core_relo' to prog_load command
> and let the kernel perform CO-RE relocations.
>
> Note the struct bpf_line_info and struct bpf_func_info have the same
> layout when passed from LLVM to libbpf and from libbpf to the kernel
> except "insn_off" fields means "byte offset" when LLVM generates it.
> Then libbpf converts it to "insn index" to pass to the kernel.
> The struct bpf_core_relo's "insn_off" field is always "byte offset".
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  include/linux/bpf.h            | 11 +++++
>  include/uapi/linux/bpf.h       | 59 +++++++++++++++++++++++++-
>  kernel/bpf/btf.c               | 10 +++++
>  kernel/bpf/syscall.c           |  2 +-
>  kernel/bpf/verifier.c          | 77 ++++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h | 59 +++++++++++++++++++++++++-
>  tools/lib/bpf/relo_core.h      | 53 -----------------------
>  7 files changed, 215 insertions(+), 56 deletions(-)
>

[...]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 06/13] bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn().
  2021-11-20  3:32 ` [PATCH v3 bpf-next 06/13] bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn() Alexei Starovoitov
@ 2021-11-22 23:47   ` Andrii Nakryiko
  2021-11-23  0:42     ` Alexei Starovoitov
  0 siblings, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2021-11-22 23:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> Given BPF program's BTF perform a linear search through kernel BTFs for
> a possible candidate.
> Then wire the result into bpf_core_apply_relo_insn().
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  kernel/bpf/btf.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 135 insertions(+), 1 deletion(-)
>

[...]

>  int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
>                    int relo_idx, void *insn)
>  {
> -       return -EOPNOTSUPP;
> +       if (relo->kind != BPF_CORE_TYPE_ID_LOCAL) {
> +               struct bpf_core_cand_list *cands;
> +
> +               cands = bpf_core_find_cands(ctx, relo->type_id);

this is wrong for many reasons:

1. you will overwrite the previous ctx->cands if it was already set,
which leaks memory
2. this list of candidates should be keyed by relo->type_id ("root
type"). Different root types get their own independent lists, so it
has to be some sort of lookup table from type_id to a list of
candidates.

2) means that if you had a bunch of relos against struct task_struct,
you'd create a list of candidates when processing the first relo that
starts at task_struct. All the subsequent relos that have task_struct
as the root type will re-use that list and potentially trim it down. If
there are some other relos against, say, struct mm_struct, they will
have their own independent list of candidates.
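
Purely as an illustration of 2), not a patch: a lookup table keyed by the root
type id could look roughly like this (the cache/cache_cnt fields are
hypothetical):

/* sketch: one candidate list per root type id, so repeated relos against
 * the same root type reuse (and progressively trim) a single list
 */
struct core_cand_cache_entry {
	u32 local_type_id;			/* key: relo->type_id */
	struct bpf_core_cand_list *cands;	/* candidates for that root type */
};

static struct bpf_core_cand_list *
cached_cands(struct bpf_core_ctx *ctx, u32 local_type_id)
{
	int i;

	for (i = 0; i < ctx->cache_cnt; i++)
		if (ctx->cache[i].local_type_id == local_type_id)
			return ctx->cache[i].cands;
	/* miss: run bpf_core_find_cands() and remember the result */
	return NULL;
}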


> +               if (IS_ERR(cands)) {
> +                       bpf_log(ctx->log, "target candidate search failed for %d\n",
> +                               relo->type_id);
> +                       return PTR_ERR(cands);
> +               }
> +               ctx->cands = cands;
> +       }
> +       return bpf_core_apply_relo_insn((void *)ctx->log, insn, relo->insn_off / 8,
> +                                       relo, relo_idx, ctx->btf, ctx->cands);
>  }
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 07/13] libbpf: Use CO-RE in the kernel in light skeleton.
  2021-11-20  3:32 ` [PATCH v3 bpf-next 07/13] libbpf: Use CO-RE in the kernel in light skeleton Alexei Starovoitov
@ 2021-11-23  0:03   ` Andrii Nakryiko
  2021-11-23  0:08     ` Alexei Starovoitov
  0 siblings, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2021-11-23  0:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> Without lskel the CO-RE relocations are processed by libbpf before any other
> work is done. Instead, when lskel is needed, remember relocation as RELO_CORE
> kind. Then when loader prog is generated for a given bpf program pass CO-RE
> relos of that program to gen loader via bpf_gen__record_relo_core(). The gen
> loader will remember them as-is and pass it later as-is into the kernel.
>
> The normal libbpf flow is to process CO-RE early before call relos happen. In
> case of gen_loader the core relos have to be added to other relos to be copied
> together when bpf static function is appended in different places to other main
> bpf progs. During the copy the append_subprog_relos() will adjust insn_idx for
> normal relos and for RELO_CORE kind too. When that is done each struct
> reloc_desc has good relos for specific main prog.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  tools/lib/bpf/bpf_gen_internal.h |   3 +
>  tools/lib/bpf/gen_loader.c       |  41 +++++++++++-
>  tools/lib/bpf/libbpf.c           | 108 ++++++++++++++++++++++---------
>  3 files changed, 119 insertions(+), 33 deletions(-)
>

[...]

>         if (relo->kind != BPF_CORE_TYPE_ID_LOCAL &&
> @@ -5653,6 +5679,9 @@ bpf_object__relocate_data(struct bpf_object *obj, struct bpf_program *prog)
>                 case RELO_CALL:
>                         /* handled already */
>                         break;
> +               case RELO_CORE:
> +                       /* will be handled by bpf_program_record_relos() */
> +                       break;
>                 default:
>                         pr_warn("prog '%s': relo #%d: bad relo type %d\n",
>                                 prog->name, i, relo->type);
> @@ -6090,6 +6119,35 @@ bpf_object__free_relocs(struct bpf_object *obj)
>         }
>  }
>
> +static int cmp_relocs(const void *_a, const void *_b)
> +{
> +       const struct reloc_desc *a = _a;
> +       const struct reloc_desc *b = _b;
> +
> +       if (a->insn_idx != b->insn_idx)
> +               return a->insn_idx < b->insn_idx ? -1 : 1;
> +
> +       /* no two relocations should have the same insn_idx, but ... */
> +       if (a->type != b->type)
> +               return a->type < b->type ? -1 : 1;
> +
> +       return 0;
> +}
> +
> +static void bpf_object__sort_relos(struct bpf_object *obj)
> +{
> +       int i;
> +
> +       for (i = 0; i < obj->nr_programs; i++) {
> +               struct bpf_program *p = &obj->programs[i];
> +
> +               if (!p->nr_reloc)
> +                       continue;
> +
> +               qsort(p->reloc_desc, p->nr_reloc, sizeof(*p->reloc_desc), cmp_relocs);
> +       }
> +}
> +
>  static int
>  bpf_object__relocate(struct bpf_object *obj, const char *targ_btf_path)
>  {
> @@ -6104,6 +6162,8 @@ bpf_object__relocate(struct bpf_object *obj, const char *targ_btf_path)
>                                 err);
>                         return err;
>                 }
> +               if (obj->gen_loader)
> +                       bpf_object__sort_relos(obj);

libbpf sorts relos because it does a binary search on them (see
find_prog_insn_relo). The kernel doesn't seem to enforce the order of
CO-RE relocations, so there doesn't seem to be a need to sort them. Is
there any reason you wanted to sort them? Does the light skeleton's
generated code rely on some ordering? If yes, let's add a small
comment to document this.
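
For context, find_prog_insn_relo() boils down to a binary search over
reloc_desc sorted by insn_idx; a simplified sketch (not the actual libbpf
code):

static struct reloc_desc *find_relo(const struct bpf_program *prog, int insn_idx)
{
	int lo = 0, hi = prog->nr_reloc - 1;

	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;

		if (prog->reloc_desc[mid].insn_idx == insn_idx)
			return &prog->reloc_desc[mid];
		if (prog->reloc_desc[mid].insn_idx < insn_idx)
			lo = mid + 1;
		else
			hi = mid - 1;
	}
	/* relies on the array staying sorted by insn_idx */
	return NULL;
}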

>         }
>
>         /* Before relocating calls pre-process relocations and mark

[...]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 09/13] selftests/bpf: Add lskel version of kfunc test.
  2021-11-20  3:32 ` [PATCH v3 bpf-next 09/13] selftests/bpf: Add lskel version of kfunc test Alexei Starovoitov
@ 2021-11-23  0:05   ` Andrii Nakryiko
  0 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2021-11-23  0:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> Add light skeleton version of kfunc_call_test_subprog test.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  tools/testing/selftests/bpf/Makefile          |  2 +-
>  .../selftests/bpf/prog_tests/kfunc_call.c     | 24 +++++++++++++++++++
>  2 files changed, 25 insertions(+), 1 deletion(-)
>

[...]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 07/13] libbpf: Use CO-RE in the kernel in light skeleton.
  2021-11-23  0:03   ` Andrii Nakryiko
@ 2021-11-23  0:08     ` Alexei Starovoitov
  0 siblings, 0 replies; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-23  0:08 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Mon, Nov 22, 2021 at 4:04 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > Without lskel the CO-RE relocations are processed by libbpf before any other
> > work is done. Instead, when lskel is needed, remember relocation as RELO_CORE
> > kind. Then when loader prog is generated for a given bpf program pass CO-RE
> > relos of that program to gen loader via bpf_gen__record_relo_core(). The gen
> > loader will remember them as-is and pass it later as-is into the kernel.
> >
> > The normal libbpf flow is to process CO-RE early before call relos happen. In
> > case of gen_loader the core relos have to be added to other relos to be copied
> > together when bpf static function is appended in different places to other main
> > bpf progs. During the copy the append_subprog_relos() will adjust insn_idx for
> > normal relos and for RELO_CORE kind too. When that is done each struct
> > reloc_desc has good relos for specific main prog.
> >
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> >  tools/lib/bpf/bpf_gen_internal.h |   3 +
> >  tools/lib/bpf/gen_loader.c       |  41 +++++++++++-
> >  tools/lib/bpf/libbpf.c           | 108 ++++++++++++++++++++++---------
> >  3 files changed, 119 insertions(+), 33 deletions(-)
> >
>
> [...]
>
> >         if (relo->kind != BPF_CORE_TYPE_ID_LOCAL &&
> > @@ -5653,6 +5679,9 @@ bpf_object__relocate_data(struct bpf_object *obj, struct bpf_program *prog)
> >                 case RELO_CALL:
> >                         /* handled already */
> >                         break;
> > +               case RELO_CORE:
> > +                       /* will be handled by bpf_program_record_relos() */
> > +                       break;
> >                 default:
> >                         pr_warn("prog '%s': relo #%d: bad relo type %d\n",
> >                                 prog->name, i, relo->type);
> > @@ -6090,6 +6119,35 @@ bpf_object__free_relocs(struct bpf_object *obj)
> >         }
> >  }
> >
> > +static int cmp_relocs(const void *_a, const void *_b)
> > +{
> > +       const struct reloc_desc *a = _a;
> > +       const struct reloc_desc *b = _b;
> > +
> > +       if (a->insn_idx != b->insn_idx)
> > +               return a->insn_idx < b->insn_idx ? -1 : 1;
> > +
> > +       /* no two relocations should have the same insn_idx, but ... */
> > +       if (a->type != b->type)
> > +               return a->type < b->type ? -1 : 1;
> > +
> > +       return 0;
> > +}
> > +
> > +static void bpf_object__sort_relos(struct bpf_object *obj)
> > +{
> > +       int i;
> > +
> > +       for (i = 0; i < obj->nr_programs; i++) {
> > +               struct bpf_program *p = &obj->programs[i];
> > +
> > +               if (!p->nr_reloc)
> > +                       continue;
> > +
> > +               qsort(p->reloc_desc, p->nr_reloc, sizeof(*p->reloc_desc), cmp_relocs);
> > +       }
> > +}
> > +
> >  static int
> >  bpf_object__relocate(struct bpf_object *obj, const char *targ_btf_path)
> >  {
> > @@ -6104,6 +6162,8 @@ bpf_object__relocate(struct bpf_object *obj, const char *targ_btf_path)
> >                                 err);
> >                         return err;
> >                 }
> > +               if (obj->gen_loader)
> > +                       bpf_object__sort_relos(obj);
>
> libbpf sorts relos because it does binary search on them (see
> find_prog_insn_relo).

Exactly.
After the CO-RE relos are added, the array has to be sorted again,
since find_prog_insn_relo() will be called on it after this step.
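
For reference, a rough sketch of what that lookup amounts to (illustrative
only, not the actual find_prog_insn_relo() code; assumes <stdlib.h> and
libbpf's internal struct definitions): once reloc_desc[] is sorted by
insn_idx via cmp_relocs, the per-instruction lookup is a plain bsearch():

static int cmp_relo_to_insn(const void *key, const void *elem)
{
	const struct reloc_desc *relo = elem;
	int insn_idx = *(const int *)key;

	if (insn_idx != relo->insn_idx)
		return insn_idx < relo->insn_idx ? -1 : 1;
	return 0;
}

static struct reloc_desc *find_relo_sketch(struct bpf_program *prog, int insn_idx)
{
	/* only valid because reloc_desc[] was qsort()-ed by insn_idx above */
	return bsearch(&insn_idx, prog->reloc_desc, prog->nr_reloc,
		       sizeof(*prog->reloc_desc), cmp_relo_to_insn);
}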

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 12/13] selftests/bpf: Additional test for CO-RE in the kernel.
  2021-11-20  3:32 ` [PATCH v3 bpf-next 12/13] selftests/bpf: Additional test for CO-RE in the kernel Alexei Starovoitov
@ 2021-11-23  0:09   ` Andrii Nakryiko
  0 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2021-11-23  0:09 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> Additional test where randmap() function is appended to three different bpf
> programs. That action checks struct bpf_core_relo replication logic and offset

typo: replication -> relocation?

> adjustment.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  tools/testing/selftests/bpf/Makefile          |  2 +-
>  .../selftests/bpf/prog_tests/core_kern.c      | 14 +++++
>  tools/testing/selftests/bpf/progs/core_kern.c | 60 +++++++++++++++++++
>  3 files changed, 75 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/core_kern.c
>  create mode 100644 tools/testing/selftests/bpf/progs/core_kern.c
>
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 4fd040f5944b..139d7e5e0a5f 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -326,7 +326,7 @@ LINKED_SKELS := test_static_linked.skel.h linked_funcs.skel.h               \
>
>  LSKELS := kfunc_call_test.c fentry_test.c fexit_test.c fexit_sleep.c \
>         test_ringbuf.c atomics.c trace_printk.c trace_vprintk.c \
> -       map_ptr_kern.c
> +       map_ptr_kern.c core_kern.c
>  # Generate both light skeleton and libbpf skeleton for these
>  LSKELS_EXTRA := test_ksyms_module.c test_ksyms_weak.c kfunc_call_test_subprog.c
>  SKEL_BLACKLIST += $$(LSKELS)
> diff --git a/tools/testing/selftests/bpf/prog_tests/core_kern.c b/tools/testing/selftests/bpf/prog_tests/core_kern.c
> new file mode 100644
> index 000000000000..561c5185d886
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/core_kern.c
> @@ -0,0 +1,14 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2021 Facebook */
> +
> +#include "test_progs.h"
> +#include "core_kern.lskel.h"
> +
> +void test_core_kern_lskel(void)
> +{
> +       struct core_kern_lskel *skel;
> +
> +       skel = core_kern_lskel__open_and_load();
> +       ASSERT_OK_PTR(skel, "open_and_load");
> +       core_kern_lskel__destroy(skel);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/core_kern.c b/tools/testing/selftests/bpf/progs/core_kern.c
> new file mode 100644
> index 000000000000..3b4571d68369
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/core_kern.c
> @@ -0,0 +1,60 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2021 Facebook */
> +#include "vmlinux.h"
> +
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +
> +struct {
> +       __uint(type, BPF_MAP_TYPE_ARRAY);
> +       __type(key, u32);
> +       __type(value, u32);
> +       __uint(max_entries, 256);
> +} array1 SEC(".maps");
> +
> +struct {
> +       __uint(type, BPF_MAP_TYPE_ARRAY);
> +       __type(key, u32);
> +       __type(value, u32);
> +       __uint(max_entries, 256);
> +} array2 SEC(".maps");
> +
> +int randmap(int v)
> +{
> +       struct bpf_map *map = (struct bpf_map *)&array1;
> +       int key = bpf_get_prandom_u32() & 0xff;
> +       int *val;
> +
> +       if (bpf_get_prandom_u32() & 1)
> +               map = (struct bpf_map *)&array2;
> +
> +       val = bpf_map_lookup_elem(map, &key);
> +       if (val)
> +               *val = bpf_get_prandom_u32() + v;
> +
> +       return 0;

If I understand the intent correctly, this function should have had
some CO-RE relocations, no? So that after its code is appended to the
three entry-level progs below, CO-RE relocations are performed
correctly, right? But as far as I can see, this function doesn't do
any CO-RE relocations, or am I missing something? If it was accessing
map->type or something along those lines (probably through the
BPF_CORE_READ() macro), then it would have CO-RE relocs.
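
Just to illustrate what I mean (hypothetical and untested, not something
this patch does), an access like the following inside randmap() would be
enough to generate a CO-RE field relocation, assuming struct bpf_map comes
from vmlinux.h and <bpf/bpf_core_read.h> is included:

	/* hypothetical: CO-RE field access, so that randmap() itself
	 * carries a relocation whose insn_idx must be adjusted when the
	 * function is appended to each main prog
	 */
	__u32 max_entries = BPF_CORE_READ(map, max_entries);

	if (key >= max_entries)
		key = 0;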

> +}
> +
> +SEC("tp_btf/xdp_devmap_xmit")
> +int BPF_PROG(tp_xdp_devmap_xmit_multi, const struct net_device
> +            *from_dev, const struct net_device *to_dev, int sent, int drops,
> +            int err)
> +{
> +       return randmap(from_dev->ifindex);
> +}
> +
> +SEC("fentry/eth_type_trans")
> +int BPF_PROG(fentry_eth_type_trans, struct sk_buff *skb,
> +            struct net_device *dev, unsigned short protocol)
> +{
> +       return randmap(dev->ifindex + skb->len);
> +}
> +
> +SEC("fexit/eth_type_trans")
> +int BPF_PROG(fexit_eth_type_trans, struct sk_buff *skb,
> +            struct net_device *dev, unsigned short protocol)
> +{
> +       return randmap(dev->ifindex + skb->len);
> +}
> +
> +char LICENSE[] SEC("license") = "GPL";
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 13/13] selftest/bpf: Revert CO-RE removal in test_ksyms_weak.
  2021-11-20  3:32 ` [PATCH v3 bpf-next 13/13] selftest/bpf: Revert CO-RE removal in test_ksyms_weak Alexei Starovoitov
@ 2021-11-23  0:10   ` Andrii Nakryiko
  2021-11-23  0:23   ` Kumar Kartikeya Dwivedi
  1 sibling, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2021-11-23  0:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> The commit 087cba799ced ("selftests/bpf: Add weak/typeless ksym test for light skeleton")
> added test_ksyms_weak to light skeleton testing, but remove CO-RE access.
> Revert that part of commit, since light skeleton can use CO-RE in the kernel.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

Great!

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  tools/testing/selftests/bpf/prog_tests/ksyms_btf.c  | 4 ++--
>  tools/testing/selftests/bpf/progs/test_ksyms_weak.c | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
>

[...]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 13/13] selftest/bpf: Revert CO-RE removal in test_ksyms_weak.
  2021-11-20  3:32 ` [PATCH v3 bpf-next 13/13] selftest/bpf: Revert CO-RE removal in test_ksyms_weak Alexei Starovoitov
  2021-11-23  0:10   ` Andrii Nakryiko
@ 2021-11-23  0:23   ` Kumar Kartikeya Dwivedi
  2021-11-23  0:32     ` Alexei Starovoitov
  1 sibling, 1 reply; 37+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-11-23  0:23 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: davem, daniel, andrii, bpf, kernel-team

On Sat, Nov 20, 2021 at 09:02:55AM IST, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
>
> The commit 087cba799ced ("selftests/bpf: Add weak/typeless ksym test for light skeleton")
> added test_ksyms_weak to light skeleton testing, but remove CO-RE access.
> Revert that part of commit, since light skeleton can use CO-RE in the kernel.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  tools/testing/selftests/bpf/prog_tests/ksyms_btf.c  | 4 ++--
>  tools/testing/selftests/bpf/progs/test_ksyms_weak.c | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> index 79f6bd1e50d6..988f5db3e342 100644
> --- a/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> +++ b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> @@ -101,7 +101,7 @@ static void test_weak_syms(void)
>  	usleep(1);
>
>  	data = skel->data;
> -	ASSERT_EQ(data->out__existing_typed, 0, "existing typed ksym");
> +	ASSERT_GE(data->out__existing_typed, 0, "existing typed ksym");

I think the original test (2211c825e7b6b) was doing ASSERT_EQ on purpose, since the
per-cpu ptr for runqueues is taken from CPU 0, so rq->cpu is 0.

>  	ASSERT_NEQ(data->out__existing_typeless, -1, "existing typeless ksym");
>  	ASSERT_EQ(data->out__non_existent_typeless, 0, "nonexistent typeless ksym");
>  	ASSERT_EQ(data->out__non_existent_typed, 0, "nonexistent typed ksym");
> @@ -128,7 +128,7 @@ static void test_weak_syms_lskel(void)
>  	usleep(1);
>
>  	data = skel->data;
> -	ASSERT_EQ(data->out__existing_typed, 0, "existing typed ksym");
> +	ASSERT_GE(data->out__existing_typed, 0, "existing typed ksym");
>  	ASSERT_NEQ(data->out__existing_typeless, -1, "existing typeless ksym");
>  	ASSERT_EQ(data->out__non_existent_typeless, 0, "nonexistent typeless ksym");
>  	ASSERT_EQ(data->out__non_existent_typed, 0, "nonexistent typed ksym");
> diff --git a/tools/testing/selftests/bpf/progs/test_ksyms_weak.c b/tools/testing/selftests/bpf/progs/test_ksyms_weak.c
> index 8eadbd4caf7a..5f8379aadb29 100644
> --- a/tools/testing/selftests/bpf/progs/test_ksyms_weak.c
> +++ b/tools/testing/selftests/bpf/progs/test_ksyms_weak.c
> @@ -38,7 +38,7 @@ int pass_handler(const void *ctx)
>  	/* tests existing symbols. */
>  	rq = (struct rq *)bpf_per_cpu_ptr(&runqueues, 0);
>  	if (rq)
> -		out__existing_typed = 0;
> +		out__existing_typed = rq->cpu;
>  	out__existing_typeless = (__u64)&bpf_prog_active;
>
>  	/* tests non-existent symbols. */
> --
> 2.30.2
>

--
Kartikeya

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 13/13] selftest/bpf: Revert CO-RE removal in test_ksyms_weak.
  2021-11-23  0:23   ` Kumar Kartikeya Dwivedi
@ 2021-11-23  0:32     ` Alexei Starovoitov
  0 siblings, 0 replies; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-23  0:32 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Mon, Nov 22, 2021 at 4:23 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Sat, Nov 20, 2021 at 09:02:55AM IST, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > The commit 087cba799ced ("selftests/bpf: Add weak/typeless ksym test for light skeleton")
> > added test_ksyms_weak to light skeleton testing, but remove CO-RE access.
> > Revert that part of commit, since light skeleton can use CO-RE in the kernel.
> >
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> >  tools/testing/selftests/bpf/prog_tests/ksyms_btf.c  | 4 ++--
> >  tools/testing/selftests/bpf/progs/test_ksyms_weak.c | 2 +-
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> > index 79f6bd1e50d6..988f5db3e342 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> > @@ -101,7 +101,7 @@ static void test_weak_syms(void)
> >       usleep(1);
> >
> >       data = skel->data;
> > -     ASSERT_EQ(data->out__existing_typed, 0, "existing typed ksym");
> > +     ASSERT_GE(data->out__existing_typed, 0, "existing typed ksym");
>
> I think original test (2211c825e7b6b) was doing ASSERT_EQ, since per cpu ptr for
> runqueue is from CPU 0.

Thanks for the explanation.
I saw that the value is zero, but didn't dig far enough to see that
it's fixed at zero by the runqueue design.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 06/13] bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn().
  2021-11-22 23:47   ` Andrii Nakryiko
@ 2021-11-23  0:42     ` Alexei Starovoitov
  2021-11-23  1:44       ` Andrii Nakryiko
  0 siblings, 1 reply; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-23  0:42 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Mon, Nov 22, 2021 at 3:47 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > Given BPF program's BTF perform a linear search through kernel BTFs for
> > a possible candidate.
> > Then wire the result into bpf_core_apply_relo_insn().
> >
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> >  kernel/bpf/btf.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 135 insertions(+), 1 deletion(-)
> >
>
> [...]
>
> >  int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
> >                    int relo_idx, void *insn)
> >  {
> > -       return -EOPNOTSUPP;
> > +       if (relo->kind != BPF_CORE_TYPE_ID_LOCAL) {
> > +               struct bpf_core_cand_list *cands;
> > +
> > +               cands = bpf_core_find_cands(ctx, relo->type_id);
>
> this is wrong for many reasons:
>
> 1. you will overwrite previous ctx->cands, if it was already set,
> which leaks memory
> 2. this list of candidates should be keyed by relo->type_id ("root
> type"). Different root types get their own independent lists; so it
> has to be some sort of look up table from type_id to a list of
> candidates.
>
> 2) means that if you had a bunch of relos against struct task_struct,
> you'll crate a list of candidates when processing first relo that
> starts at task_struct. All the subsequent relos that have task_struct
> as root type will re-used that list and potentially trim it down. If
> there are some other relos against, say, struct mm_struct, they will
> have their independent list of candidates.

Right.
Your prior comment confused me. I didn't do this reuse of cands
in order to avoid introducing a hashtable here like libbpf does,
since it does too little to actually help.
I think I will go back to the prior version: a linear search for every relo.
If we actually need to optimize this part of loading,
we'd better do a persistent cache of
name -> kernel btf_type-s
and reuse it across different programs.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 06/13] bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn().
  2021-11-23  0:42     ` Alexei Starovoitov
@ 2021-11-23  1:44       ` Andrii Nakryiko
  2021-11-23  2:59         ` Andrii Nakryiko
  2021-11-23  3:15         ` Alexei Starovoitov
  0 siblings, 2 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2021-11-23  1:44 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Mon, Nov 22, 2021 at 4:43 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Nov 22, 2021 at 3:47 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > From: Alexei Starovoitov <ast@kernel.org>
> > >
> > > Given BPF program's BTF perform a linear search through kernel BTFs for
> > > a possible candidate.
> > > Then wire the result into bpf_core_apply_relo_insn().
> > >
> > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > ---
> > >  kernel/bpf/btf.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 135 insertions(+), 1 deletion(-)
> > >
> >
> > [...]
> >
> > >  int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
> > >                    int relo_idx, void *insn)
> > >  {
> > > -       return -EOPNOTSUPP;
> > > +       if (relo->kind != BPF_CORE_TYPE_ID_LOCAL) {
> > > +               struct bpf_core_cand_list *cands;
> > > +
> > > +               cands = bpf_core_find_cands(ctx, relo->type_id);
> >
> > this is wrong for many reasons:
> >
> > 1. you will overwrite previous ctx->cands, if it was already set,
> > which leaks memory
> > 2. this list of candidates should be keyed by relo->type_id ("root
> > type"). Different root types get their own independent lists; so it
> > has to be some sort of look up table from type_id to a list of
> > candidates.
> >
> > 2) means that if you had a bunch of relos against struct task_struct,
> > you'll crate a list of candidates when processing first relo that
> > starts at task_struct. All the subsequent relos that have task_struct
> > as root type will re-used that list and potentially trim it down. If
> > there are some other relos against, say, struct mm_struct, they will
> > have their independent list of candidates.
>
> right.
> Your prior comment confused me. I didn't do this reuse of cands
> to avoid introducing hashtable here like libbpf does,
> since it does too little to actually help.

Since when is avoiding millions of iterations for each relocation "too
little help"?

BTW, I've tried to measure how noticeable this will be and added
test_verif_scale2 to core_kern, with the only change being to use vmlinux.h (so
that __sk_buff accesses are CO-RE relocated). I didn't manage to get
it loaded, so something else is going on there. So please take a look; I
don't really know how to debug lskel stuff. Here are the changes:

diff --git a/tools/testing/selftests/bpf/progs/core_kern.c b/tools/testing/selftests/bpf/progs/core_kern.c
index 3b4571d68369..9916cf059883 100644
--- a/tools/testing/selftests/bpf/progs/core_kern.c
+++ b/tools/testing/selftests/bpf/progs/core_kern.c
@@ -4,6 +4,10 @@

 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+
+#define ATTR __always_inline
+#include "test_jhash.h"

 struct {
 	__uint(type, BPF_MAP_TYPE_ARRAY);
@@ -57,4 +61,27 @@ int BPF_PROG(fexit_eth_type_trans, struct sk_buff *skb,
 	return randmap(dev->ifindex + skb->len);
 }

+SEC("tc")
+int balancer_ingress(struct __sk_buff *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data = (void *)(long)ctx->data;
+	void *ptr;
+	int ret = 0, nh_off, i = 0;
+
+	nh_off = 14;
+
+	/* pragma unroll doesn't work on large loops */
+
+#define C do { \
+	ptr = data + i; \
+	if (ptr + nh_off > data_end) \
+		break; \
+	ctx->tc_index = jhash(ptr, nh_off, ctx->cb[0] + i++); \
+	} while (0);
+#define C30 C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;
+	C30;C30;C30; /* 90 calls */
+	return 0;
+}
+
 char LICENSE[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_verif_scale2.c b/tools/testing/selftests/bpf/progs/test_verif_scale2.c
index f024154c7be7..9e2c2a6954cb 100644
--- a/tools/testing/selftests/bpf/progs/test_verif_scale2.c
+++ b/tools/testing/selftests/bpf/progs/test_verif_scale2.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2019 Facebook
-#include <linux/bpf.h>
+#include "vmlinux.h"
 #include <bpf/bpf_helpers.h>
 #define ATTR __always_inline
 #include "test_jhash.h"


The version with libbpf does load successfully, while the lskel one gives:

#35 core_kern_lskel:FAIL
test_core_kern_lskel:FAIL:open_and_load unexpected error: -22

The same program built as a normal libbpf skeleton loads successfully.

If you manage to get it working, you'll have a BPF program with 181 CO-RE
relocations; let's see how noticeable it will be to do 181 iterations
over ~2mln vmlinux BTF types.

> I think I will go back to the prior version: linear search for every relo.

I don't understand the resistance: the kernel has both rbtrees and
hashmaps, so what's the problem with just using one of them?

> If we actually need to optimize this part of loading
> we better do persistent cache of
> name -> kernel btf_type-s
> and reuse it across different programs.

You can't reuse it, because the same type name in a BPF object's BTF can
be resolved to different kernel types (e.g., if we still had the
ring_buffer name collision), so this is all per-BPF-object (or at
least per-BPF-program) state. We still have duplicated types in some
fuller kernel configurations, and it can always happen again down the road.
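
To make it concrete, the kind of thing I have in mind is tiny (rough sketch
with made-up names, not proposed kernel code): a table that lives only for
the duration of one prog load, keyed by the local root type_id, so
bpf_core_find_cands() runs once per root type and the list is only trimmed
by subsequent relos:

#include <linux/types.h>
#include <linux/hashtable.h>

struct cand_cache_entry {
	struct hlist_node node;
	u32 local_type_id;                /* relo->type_id ("root type") */
	struct bpf_core_cand_list *cands; /* owned by this entry */
};

struct cand_cache {
	/* few distinct root types expected; hash_init() at alloc time */
	DECLARE_HASHTABLE(table, 4);
};

static struct bpf_core_cand_list *
cand_cache_lookup(struct cand_cache *cc, u32 local_type_id)
{
	struct cand_cache_entry *e;

	hash_for_each_possible(cc->table, e, node, local_type_id)
		if (e->local_type_id == local_type_id)
			return e->cands;
	return NULL;
}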

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 06/13] bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn().
  2021-11-23  1:44       ` Andrii Nakryiko
@ 2021-11-23  2:59         ` Andrii Nakryiko
  2021-11-23  3:15         ` Alexei Starovoitov
  1 sibling, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2021-11-23  2:59 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Mon, Nov 22, 2021 at 5:44 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Nov 22, 2021 at 4:43 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Nov 22, 2021 at 3:47 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > From: Alexei Starovoitov <ast@kernel.org>
> > > >
> > > > Given BPF program's BTF perform a linear search through kernel BTFs for
> > > > a possible candidate.
> > > > Then wire the result into bpf_core_apply_relo_insn().
> > > >
> > > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > > ---
> > > >  kernel/bpf/btf.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++-
> > > >  1 file changed, 135 insertions(+), 1 deletion(-)
> > > >
> > >
> > > [...]
> > >
> > > >  int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
> > > >                    int relo_idx, void *insn)
> > > >  {
> > > > -       return -EOPNOTSUPP;
> > > > +       if (relo->kind != BPF_CORE_TYPE_ID_LOCAL) {
> > > > +               struct bpf_core_cand_list *cands;
> > > > +
> > > > +               cands = bpf_core_find_cands(ctx, relo->type_id);
> > >
> > > this is wrong for many reasons:
> > >
> > > 1. you will overwrite previous ctx->cands, if it was already set,
> > > which leaks memory
> > > 2. this list of candidates should be keyed by relo->type_id ("root
> > > type"). Different root types get their own independent lists; so it
> > > has to be some sort of look up table from type_id to a list of
> > > candidates.
> > >
> > > 2) means that if you had a bunch of relos against struct task_struct,
> > > you'll crate a list of candidates when processing first relo that
> > > starts at task_struct. All the subsequent relos that have task_struct
> > > as root type will re-used that list and potentially trim it down. If
> > > there are some other relos against, say, struct mm_struct, they will
> > > have their independent list of candidates.
> >
> > right.
> > Your prior comment confused me. I didn't do this reuse of cands
> > to avoid introducing hashtable here like libbpf does,
> > since it does too little to actually help.
>
> Since when avoiding millions of iterations for each relocation is "too
> little help"?
>
> BTW, I've tried to measure how noticeable this will be and added
> test_verif_scale2 to core_kern with only change to use vmlinux.h (so
> that __sk_buff accesses are CO-RE relocated). I didn't manage to get
> it loaded, so something else is going there. So please take a look, I
> don't really know how to debug lskel stuff. Here are the changes:
>
> diff --git a/tools/testing/selftests/bpf/progs/core_kern.c
> b/tools/testing/selftests/bpf/progs/core_kern.c
> index 3b4571d68369..9916cf059883 100644
> --- a/tools/testing/selftests/bpf/progs/core_kern.c
> +++ b/tools/testing/selftests/bpf/progs/core_kern.c
> @@ -4,6 +4,10 @@
>
>  #include <bpf/bpf_helpers.h>
>  #include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_core_read.h>
> +
> +#define ATTR __always_inline
> +#include "test_jhash.h"
>
>  struct {
>   __uint(type, BPF_MAP_TYPE_ARRAY);
> @@ -57,4 +61,27 @@ int BPF_PROG(fexit_eth_type_trans, struct sk_buff *skb,
>   return randmap(dev->ifindex + skb->len);
>  }
>
> +SEC("tc")
> +int balancer_ingress(struct __sk_buff *ctx)
> +{
> + void *data_end = (void *)(long)ctx->data_end;
> + void *data = (void *)(long)ctx->data;
> + void *ptr;
> + int ret = 0, nh_off, i = 0;
> +
> + nh_off = 14;
> +
> + /* pragma unroll doesn't work on large loops */
> +
> +#define C do { \
> + ptr = data + i; \
> + if (ptr + nh_off > data_end) \
> + break; \
> + ctx->tc_index = jhash(ptr, nh_off, ctx->cb[0] + i++); \
> + } while (0);
> +#define C30 C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;
> + C30;C30;C30; /* 90 calls */
> + return 0;
> +}
> +
>  char LICENSE[] SEC("license") = "GPL";
> diff --git a/tools/testing/selftests/bpf/progs/test_verif_scale2.c
> b/tools/testing/selftests/bpf/progs/test_verif_scale2.c
> index f024154c7be7..9e2c2a6954cb 100644
> --- a/tools/testing/selftests/bpf/progs/test_verif_scale2.c
> +++ b/tools/testing/selftests/bpf/progs/test_verif_scale2.c
> @@ -1,6 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0
>  // Copyright (c) 2019 Facebook
> -#include <linux/bpf.h>
> +#include "vmlinux.h"
>  #include <bpf/bpf_helpers.h>
>  #define ATTR __always_inline
>  #include "test_jhash.h"
>
>
> The version with libbpf does load successfully, while lskel one gives:
>
> #35 core_kern_lskel:FAIL
> test_core_kern_lskel:FAIL:open_and_load unexpected error: -22
>
> Same stuff with libbpf skeleton successfully loaded.
>
> If you manage to get it working, you'll have a BPF program 181 CO-RE
> relocations, let's see how noticeable it will be to do 181 iterations
> over ~2mln vmlinux BTF types.

Just realized that I was off by an order of magnitude: it's about
100K types, not 2 million. But the point stands, that is still quite a lot.

>
> > I think I will go back to the prior version: linear search for every relo.
>
> I don't understand the resistance, the kernel has both rbtree and
> hashmaps, what's the problem just using that?
>
> > If we actually need to optimize this part of loading
> > we better do persistent cache of
> > name -> kernel btf_type-s
> > and reuse it across different programs.
>
> You can't reuse it because the same type name in a BPF object BTF can
> be resolved to different kernel types (e.g., if we still had
> ring_buffer name collision), so this is all per-BPF object (or at
> least per BPF program). We still have duplicated types in some more
> fuller kernel configurations, and it can always happen down the road.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 06/13] bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn().
  2021-11-23  1:44       ` Andrii Nakryiko
  2021-11-23  2:59         ` Andrii Nakryiko
@ 2021-11-23  3:15         ` Alexei Starovoitov
  2021-11-23  3:44           ` Andrii Nakryiko
  1 sibling, 1 reply; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-23  3:15 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Mon, Nov 22, 2021 at 5:44 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Nov 22, 2021 at 4:43 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Nov 22, 2021 at 3:47 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > From: Alexei Starovoitov <ast@kernel.org>
> > > >
> > > > Given BPF program's BTF perform a linear search through kernel BTFs for
> > > > a possible candidate.
> > > > Then wire the result into bpf_core_apply_relo_insn().
> > > >
> > > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > > ---
> > > >  kernel/bpf/btf.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++-
> > > >  1 file changed, 135 insertions(+), 1 deletion(-)
> > > >
> > >
> > > [...]
> > >
> > > >  int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
> > > >                    int relo_idx, void *insn)
> > > >  {
> > > > -       return -EOPNOTSUPP;
> > > > +       if (relo->kind != BPF_CORE_TYPE_ID_LOCAL) {
> > > > +               struct bpf_core_cand_list *cands;
> > > > +
> > > > +               cands = bpf_core_find_cands(ctx, relo->type_id);
> > >
> > > this is wrong for many reasons:
> > >
> > > 1. you will overwrite previous ctx->cands, if it was already set,
> > > which leaks memory
> > > 2. this list of candidates should be keyed by relo->type_id ("root
> > > type"). Different root types get their own independent lists; so it
> > > has to be some sort of look up table from type_id to a list of
> > > candidates.
> > >
> > > 2) means that if you had a bunch of relos against struct task_struct,
> > > you'll crate a list of candidates when processing first relo that
> > > starts at task_struct. All the subsequent relos that have task_struct
> > > as root type will re-used that list and potentially trim it down. If
> > > there are some other relos against, say, struct mm_struct, they will
> > > have their independent list of candidates.
> >
> > right.
> > Your prior comment confused me. I didn't do this reuse of cands
> > to avoid introducing hashtable here like libbpf does,
> > since it does too little to actually help.
>
> Since when avoiding millions of iterations for each relocation is "too
> little help"?

Because it is a premature optimization for a case that
may or may not be relevant.
If 180 sk_buff relocations somehow make the loading too slow,
then 180 relocations of 180 different types would produce exactly
the same slowdown, and a hashtable cache won't help.

> BTW, I've tried to measure how noticeable this will be and added
> test_verif_scale2 to core_kern with only change to use vmlinux.h (so
> that __sk_buff accesses are CO-RE relocated). I didn't manage to get
> it loaded, so something else is going there. So please take a look, I
> don't really know how to debug lskel stuff. Here are the changes:

Looking. Thanks for the test.

> > If we actually need to optimize this part of loading
> > we better do persistent cache of
> > name -> kernel btf_type-s
> > and reuse it across different programs.
>
> You can't reuse it because the same type name in a BPF object BTF can
> be resolved to different kernel types (e.g., if we still had
> ring_buffer name collision),

Well, then the candidate list will have two kernel types with the same name,
both kept in the cache,
so cache_lookup("ring_buffer") will return two kernel types.
That would be the case for all progs being loaded.
What am I missing?
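
To spell out the shape of that cache (data structure sketch only, names
made up, nothing more): one entry per type name, listing every kernel BTF
type with that name, shared across program loads, with
bpf_core_apply_relo_insn() starting from a copy of the list for each prog:

#include <linux/types.h>

struct btf_name_cache_entry {
	const char *name;	/* e.g. "ring_buffer" */
	u32 cnt;		/* >1 when several kernel types share the name */
	struct {
		const struct btf *btf;	/* vmlinux or module BTF */
		u32 id;			/* type id within that BTF */
	} types[];
};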

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 06/13] bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn().
  2021-11-23  3:15         ` Alexei Starovoitov
@ 2021-11-23  3:44           ` Andrii Nakryiko
  2021-11-23  5:04             ` Alexei Starovoitov
  0 siblings, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2021-11-23  3:44 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Mon, Nov 22, 2021 at 7:15 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Nov 22, 2021 at 5:44 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Nov 22, 2021 at 4:43 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Mon, Nov 22, 2021 at 3:47 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > From: Alexei Starovoitov <ast@kernel.org>
> > > > >
> > > > > Given BPF program's BTF perform a linear search through kernel BTFs for
> > > > > a possible candidate.
> > > > > Then wire the result into bpf_core_apply_relo_insn().
> > > > >
> > > > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > > > ---
> > > > >  kernel/bpf/btf.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++-
> > > > >  1 file changed, 135 insertions(+), 1 deletion(-)
> > > > >
> > > >
> > > > [...]
> > > >
> > > > >  int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
> > > > >                    int relo_idx, void *insn)
> > > > >  {
> > > > > -       return -EOPNOTSUPP;
> > > > > +       if (relo->kind != BPF_CORE_TYPE_ID_LOCAL) {
> > > > > +               struct bpf_core_cand_list *cands;
> > > > > +
> > > > > +               cands = bpf_core_find_cands(ctx, relo->type_id);
> > > >
> > > > this is wrong for many reasons:
> > > >
> > > > 1. you will overwrite previous ctx->cands, if it was already set,
> > > > which leaks memory
> > > > 2. this list of candidates should be keyed by relo->type_id ("root
> > > > type"). Different root types get their own independent lists; so it
> > > > has to be some sort of look up table from type_id to a list of
> > > > candidates.
> > > >
> > > > 2) means that if you had a bunch of relos against struct task_struct,
> > > > you'll crate a list of candidates when processing first relo that
> > > > starts at task_struct. All the subsequent relos that have task_struct
> > > > as root type will re-used that list and potentially trim it down. If
> > > > there are some other relos against, say, struct mm_struct, they will
> > > > have their independent list of candidates.
> > >
> > > right.
> > > Your prior comment confused me. I didn't do this reuse of cands
> > > to avoid introducing hashtable here like libbpf does,
> > > since it does too little to actually help.
> >
> > Since when avoiding millions of iterations for each relocation is "too
> > little help"?
>
> because it is a premature optimization for a case that
> may or may not be relevant.
> If 180 sk_buff relocations somehow makes the loading too slow
> 180 relocations of 180 different types would produce exactly
> the same slow down and hashtable cache won't help.

The likelihood of using 180 different root types in a real application is
very small. Using 180 relocations (either with explicit BPF_CORE_READ,
field accesses in fentry, or just through always_inline helpers doing
either and being inlined in multiple places) is very real in
real-world non-trivial applications. And the cost of optimizing that
in the kernel later is very high: you'll be waiting for a new kernel
release to reach everywhere before you can rely on this optimization. The
cost of further optimizing this in libbpf is much smaller (and libbpf still
did the optimization from the get-go, and I stand by that decision).

If you think I'm making this up, we have one security-related BPF app
with 1076 CO-RE relocations across 11 BPF programs. It's using 22
different root types.

>
> > BTW, I've tried to measure how noticeable this will be and added
> > test_verif_scale2 to core_kern with only change to use vmlinux.h (so
> > that __sk_buff accesses are CO-RE relocated). I didn't manage to get
> > it loaded, so something else is going there. So please take a look, I
> > don't really know how to debug lskel stuff. Here are the changes:
>
> Looking. Thanks for the test.
>
> > > If we actually need to optimize this part of loading
> > > we better do persistent cache of
> > > name -> kernel btf_type-s
> > > and reuse it across different programs.
> >
> > You can't reuse it because the same type name in a BPF object BTF can
> > be resolved to different kernel types (e.g., if we still had
> > ring_buffer name collision),
>
> well and the candidate list will have two kernel types with the same name.
> Both kept in a cache.
> so cache_lookup("ring_buffer") will return two kernel types.
> That would be the case for all progs being loaded.
> What am I missing?

If there are two matching types with the same matching field but the field
offsets are different (and thus there is ambiguity), that's an error.
So a correct (by the current definition, at least) program has to resolve
to one, and only one, of such two incompatible ring_buffer types. If
there are multiple duplicates, though (like task_struct and
task_struct___2), they will have identical sets of fields at the same
offsets, so both will remain possible candidates and that's not an
error. But for two actually different types, there can be only one
winner.
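
Roughly, in sketch form (not libbpf's actual code, error value used
loosely), the rule is: every candidate that still matches must produce the
same relocated value, exact duplicates naturally agree and all stay, and
genuinely different types that both "match" are an error:

#include <errno.h>
#include <stdbool.h>

static int record_cand_result(bool *have_result, unsigned int *result,
			      bool cand_matches, unsigned int cand_value)
{
	if (!cand_matches)
		return 0;		/* candidate is dropped from the list */
	if (!*have_result) {
		*have_result = true;
		*result = cand_value;	/* first matching candidate wins */
		return 0;
	}
	if (*result != cand_value)
		return -EINVAL;		/* conflicting candidates -> ambiguity error */
	return 0;			/* identical duplicate, keep it */
}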

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 06/13] bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn().
  2021-11-23  3:44           ` Andrii Nakryiko
@ 2021-11-23  5:04             ` Alexei Starovoitov
  2021-11-23 18:19               ` Andrii Nakryiko
  0 siblings, 1 reply; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-23  5:04 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Mon, Nov 22, 2021 at 7:44 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Nov 22, 2021 at 7:15 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Nov 22, 2021 at 5:44 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Mon, Nov 22, 2021 at 4:43 PM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Mon, Nov 22, 2021 at 3:47 PM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > >
> > > > > On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
> > > > > <alexei.starovoitov@gmail.com> wrote:
> > > > > >
> > > > > > From: Alexei Starovoitov <ast@kernel.org>
> > > > > >
> > > > > > Given BPF program's BTF perform a linear search through kernel BTFs for
> > > > > > a possible candidate.
> > > > > > Then wire the result into bpf_core_apply_relo_insn().
> > > > > >
> > > > > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > > > > ---
> > > > > >  kernel/bpf/btf.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > >  1 file changed, 135 insertions(+), 1 deletion(-)
> > > > > >
> > > > >
> > > > > [...]
> > > > >
> > > > > >  int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
> > > > > >                    int relo_idx, void *insn)
> > > > > >  {
> > > > > > -       return -EOPNOTSUPP;
> > > > > > +       if (relo->kind != BPF_CORE_TYPE_ID_LOCAL) {
> > > > > > +               struct bpf_core_cand_list *cands;
> > > > > > +
> > > > > > +               cands = bpf_core_find_cands(ctx, relo->type_id);
> > > > >
> > > > > this is wrong for many reasons:
> > > > >
> > > > > 1. you will overwrite previous ctx->cands, if it was already set,
> > > > > which leaks memory
> > > > > 2. this list of candidates should be keyed by relo->type_id ("root
> > > > > type"). Different root types get their own independent lists; so it
> > > > > has to be some sort of look up table from type_id to a list of
> > > > > candidates.
> > > > >
> > > > > 2) means that if you had a bunch of relos against struct task_struct,
> > > > > you'll crate a list of candidates when processing first relo that
> > > > > starts at task_struct. All the subsequent relos that have task_struct
> > > > > as root type will re-used that list and potentially trim it down. If
> > > > > there are some other relos against, say, struct mm_struct, they will
> > > > > have their independent list of candidates.
> > > >
> > > > right.
> > > > Your prior comment confused me. I didn't do this reuse of cands
> > > > to avoid introducing hashtable here like libbpf does,
> > > > since it does too little to actually help.
> > >
> > > Since when avoiding millions of iterations for each relocation is "too
> > > little help"?
> >
> > because it is a premature optimization for a case that
> > may or may not be relevant.
> > If 180 sk_buff relocations somehow makes the loading too slow
> > 180 relocations of 180 different types would produce exactly
> > the same slow down and hashtable cache won't help.
>
> Likelihood of using 180 different root types in real application is
> very small. Using 180 relocations (either with explicit BPF_CORE_READ,
> field accesses in fentry, or just through always_inline helpers doing
> either and being inlined in multiple places) is very real in
> real-world non-trivial applications. And the cost of optimizing that
> in the kernel later is very high, you'll be waiting for a new kernel
> release to get everywhere to rely on this optimization. The cost of
> further optimizing this in libbpf is much smaller (and libbpf still
> did the optimization from the get go and I stand by that decision).
>
> If you think I'm making this up, we have one security-related BPF app
> with 1076 CO-RE relocations across 11 BPF programs. It's using 22
> different root types.

I suspect you're talking about selftests/bpf/profiler* tests.
bpftool prog load -L profile1.o
[  873.449749] nr_core_relo 106
[  873.614186] nr_core_relo 297
[  874.107470] nr_core_relo 19
[  874.144146] nr_core_relo 102
[  874.306462] nr_core_relo 258
[  874.692219] nr_core_relo 410
[  875.329652] nr_core_relo 238
[  875.689900] nr_core_relo 9

8 different progs with a bunch of core relos.
On a debug kernel with lockdep and kasan it takes 2.3 seconds to load,
while the kernel's bpf_core_add_cands() is doing that loop
more than a thousand times.
libbpf takes 1.7 seconds.
So it's an extra 0.5 seconds due to the loop.

I haven't found the bug in lskel with core_kern.c + balancer_ingress yet.
But just doing balancer_ingress (test_verif_scale2) as an lskel I get:
0m0.827s

while verif_scale2 is 6 seconds!

That turned out to be due to attr.prog_flags = BPF_F_TEST_RND_HI32;
without it, it's 0m0.574s.

So ~0.25 sec is spent in the add_cands loop.

> >
> > > BTW, I've tried to measure how noticeable this will be and added
> > > test_verif_scale2 to core_kern with only change to use vmlinux.h (so
> > > that __sk_buff accesses are CO-RE relocated). I didn't manage to get
> > > it loaded, so something else is going there. So please take a look, I
> > > don't really know how to debug lskel stuff. Here are the changes:
> >
> > Looking. Thanks for the test.
> >
> > > > If we actually need to optimize this part of loading
> > > > we better do persistent cache of
> > > > name -> kernel btf_type-s
> > > > and reuse it across different programs.
> > >
> > > You can't reuse it because the same type name in a BPF object BTF can
> > > be resolved to different kernel types (e.g., if we still had
> > > ring_buffer name collision),
> >
> > well and the candidate list will have two kernel types with the same name.
> > Both kept in a cache.
> > so cache_lookup("ring_buffer") will return two kernel types.
> > That would be the case for all progs being loaded.
> > What am I missing?
>
> if there are two matching types with the same matching field but field
> offsets are different (and thus there is ambiguity), that's an error.
> So the correct (by current definition, at least) program has to result
> in one of such two incompatible ring_buffer types and only one. If
> there are multiple duplicates, though, (like task_struct and
> task_struct___2) they will have identical sets of fields at the same
> offsets, so both will remain possible candidates and that's not an
> error. But for actually two different types, there can be only one
> winner.

I'm not proposing to cache the refined bpf_core_cand_list
after bpf_core_apply_relo_insn() has done its job.
I'm saying the kernel can cache the result of the add_cands loop across
vmlinux BTFs for all progs.
bpf_core_apply_relo_insn() will then be called with a copy of the
bpf_core_cand_list from the cache.
I'm thinking of keeping this cache in 'struct btf'.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 06/13] bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn().
  2021-11-23  5:04             ` Alexei Starovoitov
@ 2021-11-23 18:19               ` Andrii Nakryiko
  2021-11-23 19:33                 ` Alexei Starovoitov
  0 siblings, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2021-11-23 18:19 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Mon, Nov 22, 2021 at 9:04 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Nov 22, 2021 at 7:44 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Nov 22, 2021 at 7:15 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Mon, Nov 22, 2021 at 5:44 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Mon, Nov 22, 2021 at 4:43 PM Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > On Mon, Nov 22, 2021 at 3:47 PM Andrii Nakryiko
> > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
> > > > > > <alexei.starovoitov@gmail.com> wrote:
> > > > > > >
> > > > > > > From: Alexei Starovoitov <ast@kernel.org>
> > > > > > >
> > > > > > > Given BPF program's BTF perform a linear search through kernel BTFs for
> > > > > > > a possible candidate.
> > > > > > > Then wire the result into bpf_core_apply_relo_insn().
> > > > > > >
> > > > > > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > > > > > ---
> > > > > > >  kernel/bpf/btf.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > > >  1 file changed, 135 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > > >  int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
> > > > > > >                    int relo_idx, void *insn)
> > > > > > >  {
> > > > > > > -       return -EOPNOTSUPP;
> > > > > > > +       if (relo->kind != BPF_CORE_TYPE_ID_LOCAL) {
> > > > > > > +               struct bpf_core_cand_list *cands;
> > > > > > > +
> > > > > > > +               cands = bpf_core_find_cands(ctx, relo->type_id);
> > > > > >
> > > > > > this is wrong for many reasons:
> > > > > >
> > > > > > 1. you will overwrite previous ctx->cands, if it was already set,
> > > > > > which leaks memory
> > > > > > 2. this list of candidates should be keyed by relo->type_id ("root
> > > > > > type"). Different root types get their own independent lists; so it
> > > > > > has to be some sort of look up table from type_id to a list of
> > > > > > candidates.
> > > > > >
> > > > > > 2) means that if you had a bunch of relos against struct task_struct,
> > > > > > you'll crate a list of candidates when processing first relo that
> > > > > > starts at task_struct. All the subsequent relos that have task_struct
> > > > > > as root type will re-used that list and potentially trim it down. If
> > > > > > there are some other relos against, say, struct mm_struct, they will
> > > > > > have their independent list of candidates.
> > > > >
> > > > > right.
> > > > > Your prior comment confused me. I didn't do this reuse of cands
> > > > > to avoid introducing hashtable here like libbpf does,
> > > > > since it does too little to actually help.
> > > >
> > > > Since when avoiding millions of iterations for each relocation is "too
> > > > little help"?
> > >
> > > because it is a premature optimization for a case that
> > > may or may not be relevant.
> > > If 180 sk_buff relocations somehow makes the loading too slow
> > > 180 relocations of 180 different types would produce exactly
> > > the same slow down and hashtable cache won't help.
> >
> > Likelihood of using 180 different root types in real application is
> > very small. Using 180 relocations (either with explicit BPF_CORE_READ,
> > field accesses in fentry, or just through always_inline helpers doing
> > either and being inlined in multiple places) is very real in
> > real-world non-trivial applications. And the cost of optimizing that
> > in the kernel later is very high, you'll be waiting for a new kernel
> > release to get everywhere to rely on this optimization. The cost of
> > further optimizing this in libbpf is much smaller (and libbpf still
> > did the optimization from the get go and I stand by that decision).
> >
> > If you think I'm making this up, we have one security-related BPF app
> > with 1076 CO-RE relocations across 11 BPF programs. It's using 22
> > different root types.
>
> I suspect you're talking about selftests/bpf/profiler* tests.
> bpftool prog load -L profile1.o
> [  873.449749] nr_core_relo 106
> [  873.614186] nr_core_relo 297
> [  874.107470] nr_core_relo 19
> [  874.144146] nr_core_relo 102
> [  874.306462] nr_core_relo 258
> [  874.692219] nr_core_relo 410
> [  875.329652] nr_core_relo 238
> [  875.689900] nr_core_relo 9
>
> 8 different progs with a bunch of core relos.

Nope, I was talking about real production app here at Meta:

Section            Reloc cnt
------------------ ----------
.text              80
kprobe/...         217
kprobe/...         2
kprobe/...         4
kprobe/...         83
kprobe/...         261
kprobe/...         163
kretprobe/...      1
kretprobe/...      174
raw_tracepoint/... 82
raw_tracepoint/... 9


> On a debug kernel with lockdep and kasan it takes 2.3 seconds to load
> while kernel bpf_core_add_cands() is doing that loop
> more than a thousand times.
> libbpf takes 1.7 seconds.
> So it's an extra 0.5 second due to the loop.
>
> I haven't found the bug in lksel with core_kern.c + balancer_ingress yet.
> But just doing balancer_ingress (test_verif_scale2) as lskel I get:
> 0m0.827s
>
> while verif_scale2 is 6 seconds!
>
> Turned out due to attr.prog_flags = BPF_F_TEST_RND_HI32
> without it it's 0m0.574s.
>
> So 0.25 sec is spent in the add_cands loop.
>

Not sure whether you agree it's unnecessarily slow or not :) But we have
teams worrying about 300ms of total verification time, so there's that.

> > >
> > > > BTW, I've tried to measure how noticeable this will be and added
> > > > test_verif_scale2 to core_kern with only change to use vmlinux.h (so
> > > > that __sk_buff accesses are CO-RE relocated). I didn't manage to get
> > > > it loaded, so something else is going there. So please take a look, I
> > > > don't really know how to debug lskel stuff. Here are the changes:
> > >
> > > Looking. Thanks for the test.
> > >
> > > > > If we actually need to optimize this part of loading
> > > > > we better do persistent cache of
> > > > > name -> kernel btf_type-s
> > > > > and reuse it across different programs.
> > > >
> > > > You can't reuse it because the same type name in a BPF object BTF can
> > > > be resolved to different kernel types (e.g., if we still had
> > > > ring_buffer name collision),
> > >
> > > well and the candidate list will have two kernel types with the same name.
> > > Both kept in a cache.
> > > so cache_lookup("ring_buffer") will return two kernel types.
> > > That would be the case for all progs being loaded.
> > > What am I missing?
> >
> > if there are two matching types with the same matching field but field
> > offsets are different (and thus there is ambiguity), that's an error.
> > So the correct (by current definition, at least) program has to result
> > in one of such two incompatible ring_buffer types and only one. If
> > there are multiple duplicates, though, (like task_struct and
> > task_struct___2) they will have identical sets of fields at the same
> > offsets, so both will remain possible candidates and that's not an
> > error. But for actually two different types, there can be only one
> > winner.
>
> I'm not proposing to cache the result of refined bpf_core_cand_list
> after bpf_core_apply_relo_insn() did its job.
> I'm saying the kernel can cache the result of the add_cands loop across
> vmlinux BTFs for all progs.
> The bpf_core_apply_relo_insn() will be called with a copy of
> bpf_core_cand_list from a cache.
> I'm thinking to keep this cache in 'struct btf'

I wouldn't start with that for sure, because the next question is
garbage collection. What if someone just did some experiments and used
some obscure types that are not typically used, potentially
lots of them? Now the kernel will keep those extra cache entries
forever. You are basically talking about a lazily-built by-name index;
it might be useful in some other cases as well, but it definitely
comes with an extra cost, and I'm not convinced yet that we need to pay
that price right now.

But then again, I don't see why it's so hard to build a local hashmap
or rbtree for the duration of the program load and discard it after the load.
You haven't provided a clear argument for why not.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 06/13] bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn().
  2021-11-23 18:19               ` Andrii Nakryiko
@ 2021-11-23 19:33                 ` Alexei Starovoitov
  2021-11-23 19:48                   ` Andrii Nakryiko
  0 siblings, 1 reply; 37+ messages in thread
From: Alexei Starovoitov @ 2021-11-23 19:33 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Tue, Nov 23, 2021 at 10:19 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Nov 22, 2021 at 9:04 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Nov 22, 2021 at 7:44 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Mon, Nov 22, 2021 at 7:15 PM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Mon, Nov 22, 2021 at 5:44 PM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > >
> > > > > On Mon, Nov 22, 2021 at 4:43 PM Alexei Starovoitov
> > > > > <alexei.starovoitov@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Nov 22, 2021 at 3:47 PM Andrii Nakryiko
> > > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > > >
> > > > > > > On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
> > > > > > > <alexei.starovoitov@gmail.com> wrote:
> > > > > > > >
> > > > > > > > From: Alexei Starovoitov <ast@kernel.org>
> > > > > > > >
> > > > > > > > Given BPF program's BTF perform a linear search through kernel BTFs for
> > > > > > > > a possible candidate.
> > > > > > > > Then wire the result into bpf_core_apply_relo_insn().
> > > > > > > >
> > > > > > > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > > > > > > ---
> > > > > > > >  kernel/bpf/btf.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > > > >  1 file changed, 135 insertions(+), 1 deletion(-)
> > > > > > > >
> > > > > > >
> > > > > > > [...]
> > > > > > >
> > > > > > > >  int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
> > > > > > > >                    int relo_idx, void *insn)
> > > > > > > >  {
> > > > > > > > -       return -EOPNOTSUPP;
> > > > > > > > +       if (relo->kind != BPF_CORE_TYPE_ID_LOCAL) {
> > > > > > > > +               struct bpf_core_cand_list *cands;
> > > > > > > > +
> > > > > > > > +               cands = bpf_core_find_cands(ctx, relo->type_id);
> > > > > > >
> > > > > > > this is wrong for many reasons:
> > > > > > >
> > > > > > > 1. you will overwrite previous ctx->cands, if it was already set,
> > > > > > > which leaks memory
> > > > > > > 2. this list of candidates should be keyed by relo->type_id ("root
> > > > > > > type"). Different root types get their own independent lists; so it
> > > > > > > has to be some sort of look up table from type_id to a list of
> > > > > > > candidates.
> > > > > > >
> > > > > > > 2) means that if you had a bunch of relos against struct task_struct,
> > > > > > > you'll crate a list of candidates when processing first relo that
> > > > > > > starts at task_struct. All the subsequent relos that have task_struct
> > > > > > > as root type will re-used that list and potentially trim it down. If
> > > > > > > there are some other relos against, say, struct mm_struct, they will
> > > > > > > have their independent list of candidates.
> > > > > >
> > > > > > right.
> > > > > > Your prior comment confused me. I didn't do this reuse of cands
> > > > > > to avoid introducing hashtable here like libbpf does,
> > > > > > since it does too little to actually help.
> > > > >
> > > > > Since when avoiding millions of iterations for each relocation is "too
> > > > > little help"?
> > > >
> > > > because it is a premature optimization for a case that
> > > > may or may not be relevant.
> > > > If 180 sk_buff relocations somehow makes the loading too slow
> > > > 180 relocations of 180 different types would produce exactly
> > > > the same slow down and hashtable cache won't help.
> > >
> > > Likelihood of using 180 different root types in real application is
> > > very small. Using 180 relocations (either with explicit BPF_CORE_READ,
> > > field accesses in fentry, or just through always_inline helpers doing
> > > either and being inlined in multiple places) is very real in
> > > real-world non-trivial applications. And the cost of optimizing that
> > > in the kernel later is very high, you'll be waiting for a new kernel
> > > release to get everywhere to rely on this optimization. The cost of
> > > further optimizing this in libbpf is much smaller (and libbpf still
> > > did the optimization from the get go and I stand by that decision).
> > >
> > > If you think I'm making this up, we have one security-related BPF app
> > > with 1076 CO-RE relocations across 11 BPF programs. It's using 22
> > > different root types.
> >
> > I suspect you're talking about selftests/bpf/profiler* tests.
> > bpftool prog load -L profile1.o
> > [  873.449749] nr_core_relo 106
> > [  873.614186] nr_core_relo 297
> > [  874.107470] nr_core_relo 19
> > [  874.144146] nr_core_relo 102
> > [  874.306462] nr_core_relo 258
> > [  874.692219] nr_core_relo 410
> > [  875.329652] nr_core_relo 238
> > [  875.689900] nr_core_relo 9
> >
> > 8 different progs with a bunch of core relos.
>
> Nope, I was talking about real production app here at Meta:
>
> Section            Reloc cnt
> ------------------ ----------
> .text              80
> kprobe/...         217
> kprobe/...         2
> kprobe/...         4
> kprobe/...         83
> kprobe/...         261
> kprobe/...         163
> kretprobe/...      1
> kretprobe/...      174
> raw_tracepoint/... 82
> raw_tracepoint/... 9

I think the profiler test is a trimmed version of it.
A few short kprobes/kretprobes were removed from it,
since they don't improve the test coverage.

>
> > On a debug kernel with lockdep and kasan it takes 2.3 seconds to load
> > while kernel bpf_core_add_cands() is doing that loop
> > more than a thousand times.
> > libbpf takes 1.7 seconds.
> > So it's an extra 0.5 second due to the loop.
> >
> > I haven't found the bug in lksel with core_kern.c + balancer_ingress yet.
> > But just doing balancer_ingress (test_verif_scale2) as lskel I get:
> > 0m0.827s
> >
> > while verif_scale2 is 6 seconds!
> >
> > Turned out due to attr.prog_flags = BPF_F_TEST_RND_HI32
> > without it it's 0m0.574s.
> >
> > So 0.25 sec is spent in the add_cands loop.
> >
>
> Not sure whether you agree it's unnecessary slow or not :) But we have
> teams worrying about 300ms total verification time, so there's that.
>
> > > >
> > > > > BTW, I've tried to measure how noticeable this will be and added
> > > > > test_verif_scale2 to core_kern with only change to use vmlinux.h (so
> > > > > that __sk_buff accesses are CO-RE relocated). I didn't manage to get
> > > > > it loaded, so something else is going there. So please take a look, I
> > > > > don't really know how to debug lskel stuff. Here are the changes:
> > > >
> > > > Looking. Thanks for the test.
> > > >
> > > > > > If we actually need to optimize this part of loading
> > > > > > we better do persistent cache of
> > > > > > name -> kernel btf_type-s
> > > > > > and reuse it across different programs.
> > > > >
> > > > > You can't reuse it because the same type name in a BPF object BTF can
> > > > > be resolved to different kernel types (e.g., if we still had
> > > > > ring_buffer name collision),
> > > >
> > > > well and the candidate list will have two kernel types with the same name.
> > > > Both kept in a cache.
> > > > so cache_lookup("ring_buffer") will return two kernel types.
> > > > That would be the case for all progs being loaded.
> > > > What am I missing?
> > >
> > > if there are two matching types with the same matching field but field
> > > offsets are different (and thus there is ambiguity), that's an error.
> > > So the correct (by current definition, at least) program has to result
> > > in one of such two incompatible ring_buffer types and only one. If
> > > there are multiple duplicates, though, (like task_struct and
> > > task_struct___2) they will have identical sets of fields at the same
> > > offsets, so both will remain possible candidates and that's not an
> > > error. But for actually two different types, there can be only one
> > > winner.
> >
> > I'm not proposing to cache the result of refined bpf_core_cand_list
> > after bpf_core_apply_relo_insn() did its job.
> > I'm saying the kernel can cache the result of the add_cands loop across
> > vmlinux BTFs for all progs.
> > The bpf_core_apply_relo_insn() will be called with a copy of
> > bpf_core_cand_list from a cache.
> > I'm thinking to keep this cache in 'struct btf'
>
> I wouldn't start with that for sure, because the next question is
> garbage collection. What if someone just did some experiments, used
> some obscure types that are not really used (typically), potentially
> lots of them. Now the kernel will keep these extra cache entries
> forever. You are basically talking about a lazily-built by-name index,
> it might be useful in some other cases as well, but it definitely
> comes with extra cost and I'm not convinced yet that we need to pay
> this price right now.
>
> But then again, I don't see why it's so hard to build a local hashmap
> or rbtree for the duration of program load and discard it after load.
> You haven't provided a clear argument why not.

because libbpf's approach of using a hashtab for a narrow use case is inferior.
We could remove it from libbpf and no one would notice.
A generic cache that works across progs would speed up loading
a tiny bit for all progs, instead of only the single case of profiler*.c,
which has a bunch of relos only because it's using an outdated
always_unroll approach to loops.

btw the following diff removes the bigger part of the loop overhead:
+
+               if (strncmp(local_cand->name, targ_name, local_essent_len) != 0)
                        continue;

                targ_essent_len = bpf_core_essential_name_len(targ_name);
                if (targ_essent_len != local_essent_len)
                        continue;

-               if (strncmp(local_cand->name, targ_name, local_essent_len) != 0)
-                       continue;
-

From 0.25 sec it goes to 0.1 sec.
Not saying it's ok to waste this 0.1 sec.
On a non-debug kernel it's probably even less noticeable.
The point is that the loop can be trivially made faster for
all use cases.
Whereas complicating kernel code with a custom hashtable for
a narrow use case is just odd.
You've said it yourself: it's 22 different root types.
With a libbpf-style hashtable that is still 22 x sizeof(vmlinux BTF)
iterations on every load. And if this is indeed a typical use case,
then more complex progs in the future will be doing these loops.
Either the loop should be fast enough (I think it is with the
strncmp move above) or the kernel needs to be optimized for all use cases.
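
To make the effect concrete, below is a small standalone userspace
sketch (not the kernel code) of why doing the cheap strncmp prefix
check before computing the essential name length pays off.
essential_name_len() here is a simplified stand-in for
bpf_core_essential_name_len(), and the names/targets are illustrative:

#include <stdio.h>
#include <string.h>

/* stand-in: length of the type name with a "___flavor" suffix stripped */
static size_t essential_name_len(const char *name)
{
	const char *flavor = strstr(name, "___");

	return flavor ? (size_t)(flavor - name) : strlen(name);
}

static int name_matches(const char *local_name, const char *targ_name,
			size_t local_essent_len)
{
	/* cheap reject first: most target names don't even share the prefix */
	if (strncmp(local_name, targ_name, local_essent_len) != 0)
		return 0;

	/* only then pay for scanning the whole target name */
	if (essential_name_len(targ_name) != local_essent_len)
		return 0;

	return 1;
}

int main(void)
{
	const char *targets[] = { "task_struct", "task_struct___2",
				  "task_structure", "sk_buff" };
	const char *local = "task_struct";
	size_t local_essent_len = essential_name_len(local);
	size_t i;

	for (i = 0; i < sizeof(targets) / sizeof(targets[0]); i++)
		printf("%-16s -> %s\n", targets[i],
		       name_matches(local, targets[i], local_essent_len) ?
		       "candidate" : "skip");
	return 0;
}

The win comes from skipping the full scan of each target name (strlen
plus flavor-suffix search) for the vast majority of vmlinux types that
already fail the cheap prefix check.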

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 bpf-next 06/13] bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn().
  2021-11-23 19:33                 ` Alexei Starovoitov
@ 2021-11-23 19:48                   ` Andrii Nakryiko
  0 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2021-11-23 19:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Andrii Nakryiko, bpf, Kernel Team

On Tue, Nov 23, 2021 at 11:33 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Nov 23, 2021 at 10:19 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Nov 22, 2021 at 9:04 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Mon, Nov 22, 2021 at 7:44 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Mon, Nov 22, 2021 at 7:15 PM Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > On Mon, Nov 22, 2021 at 5:44 PM Andrii Nakryiko
> > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Nov 22, 2021 at 4:43 PM Alexei Starovoitov
> > > > > > <alexei.starovoitov@gmail.com> wrote:
> > > > > > >
> > > > > > > On Mon, Nov 22, 2021 at 3:47 PM Andrii Nakryiko
> > > > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
> > > > > > > > <alexei.starovoitov@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > From: Alexei Starovoitov <ast@kernel.org>
> > > > > > > > >
> > > > > > > > > Given BPF program's BTF perform a linear search through kernel BTFs for
> > > > > > > > > a possible candidate.
> > > > > > > > > Then wire the result into bpf_core_apply_relo_insn().
> > > > > > > > >
> > > > > > > > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > > > > > > > ---
> > > > > > > > >  kernel/bpf/btf.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > > > > >  1 file changed, 135 insertions(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > >
> > > > > > > > [...]
> > > > > > > >
> > > > > > > > >  int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
> > > > > > > > >                    int relo_idx, void *insn)
> > > > > > > > >  {
> > > > > > > > > -       return -EOPNOTSUPP;
> > > > > > > > > +       if (relo->kind != BPF_CORE_TYPE_ID_LOCAL) {
> > > > > > > > > +               struct bpf_core_cand_list *cands;
> > > > > > > > > +
> > > > > > > > > +               cands = bpf_core_find_cands(ctx, relo->type_id);
> > > > > > > >
> > > > > > > > this is wrong for many reasons:
> > > > > > > >
> > > > > > > > 1. you will overwrite previous ctx->cands, if it was already set,
> > > > > > > > which leaks memory
> > > > > > > > 2. this list of candidates should be keyed by relo->type_id ("root
> > > > > > > > type"). Different root types get their own independent lists; so it
> > > > > > > > has to be some sort of look up table from type_id to a list of
> > > > > > > > candidates.
> > > > > > > >
> > > > > > > > 2) means that if you had a bunch of relos against struct task_struct,
> > > > > > > > you'll create a list of candidates when processing the first relo that
> > > > > > > > starts at task_struct. All the subsequent relos that have task_struct
> > > > > > > > as root type will re-use that list and potentially trim it down. If
> > > > > > > > there are some other relos against, say, struct mm_struct, they will
> > > > > > > > have their independent list of candidates.
> > > > > > >
> > > > > > > right.
> > > > > > > Your prior comment confused me. I didn't do this reuse of cands
> > > > > > > to avoid introducing hashtable here like libbpf does,
> > > > > > > since it does too little to actually help.
> > > > > >
> > > > > > Since when avoiding millions of iterations for each relocation is "too
> > > > > > little help"?
> > > > >
> > > > > because it is a premature optimization for a case that
> > > > > may or may not be relevant.
> > > > > If 180 sk_buff relocations somehow makes the loading too slow
> > > > > 180 relocations of 180 different types would produce exactly
> > > > > the same slow down and hashtable cache won't help.
> > > >
> > > > Likelihood of using 180 different root types in real application is
> > > > very small. Using 180 relocations (either with explicit BPF_CORE_READ,
> > > > field accesses in fentry, or just through always_inline helpers doing
> > > > either and being inlined in multiple places) is very real in
> > > > real-world non-trivial applications. And the cost of optimizing that
> > > > in the kernel later is very high, you'll be waiting for a new kernel
> > > > release to get everywhere to rely on this optimization. The cost of
> > > > further optimizing this in libbpf is much smaller (and libbpf still
> > > > did the optimization from the get go and I stand by that decision).
> > > >
> > > > If you think I'm making this up, we have one security-related BPF app
> > > > with 1076 CO-RE relocations across 11 BPF programs. It's using 22
> > > > different root types.
> > >
> > > I suspect you're talking about selftests/bpf/profiler* tests.
> > > bpftool prog load -L profile1.o
> > > [  873.449749] nr_core_relo 106
> > > [  873.614186] nr_core_relo 297
> > > [  874.107470] nr_core_relo 19
> > > [  874.144146] nr_core_relo 102
> > > [  874.306462] nr_core_relo 258
> > > [  874.692219] nr_core_relo 410
> > > [  875.329652] nr_core_relo 238
> > > [  875.689900] nr_core_relo 9
> > >
> > > 8 different progs with a bunch of core relos.
> >
> > Nope, I was talking about real production app here at Meta:
> >
> > Section            Reloc cnt
> > ------------------ ----------
> > .text              80
> > kprobe/...         217
> > kprobe/...         2
> > kprobe/...         4
> > kprobe/...         83
> > kprobe/...         261
> > kprobe/...         163
> > kretprobe/...      1
> > kretprobe/...      174
> > raw_tracepoint/... 82
> > raw_tracepoint/... 9
>
> I think the profiler test is a trimmed version of it.
> A few short kprobes/kretprobes were removed from it,
> since they don't improve the test coverage.

yeah, could be, I forgot they are related

>
> >
> > > On a debug kernel with lockdep and kasan it takes 2.3 seconds to load
> > > while kernel bpf_core_add_cands() is doing that loop
> > > more than a thousand times.
> > > libbpf takes 1.7 seconds.
> > > So it's an extra 0.5 second due to the loop.
> > >
> > > I haven't found the bug in lksel with core_kern.c + balancer_ingress yet.
> > > But just doing balancer_ingress (test_verif_scale2) as lskel I get:
> > > 0m0.827s
> > >
> > > while verif_scale2 is 6 seconds!
> > >
> > > Turned out due to attr.prog_flags = BPF_F_TEST_RND_HI32
> > > without it it's 0m0.574s.
> > >
> > > So 0.25 sec is spent in the add_cands loop.
> > >
> >
> > Not sure whether you agree it's unnecessary slow or not :) But we have
> > teams worrying about 300ms total verification time, so there's that.
> >
> > > > >
> > > > > > BTW, I've tried to measure how noticeable this will be and added
> > > > > > test_verif_scale2 to core_kern with only change to use vmlinux.h (so
> > > > > > that __sk_buff accesses are CO-RE relocated). I didn't manage to get
> > > > > > it loaded, so something else is going there. So please take a look, I
> > > > > > don't really know how to debug lskel stuff. Here are the changes:
> > > > >
> > > > > Looking. Thanks for the test.
> > > > >
> > > > > > > If we actually need to optimize this part of loading
> > > > > > > we better do persistent cache of
> > > > > > > name -> kernel btf_type-s
> > > > > > > and reuse it across different programs.
> > > > > >
> > > > > > You can't reuse it because the same type name in a BPF object BTF can
> > > > > > be resolved to different kernel types (e.g., if we still had
> > > > > > ring_buffer name collision),
> > > > >
> > > > > well and the candidate list will have two kernel types with the same name.
> > > > > Both kept in a cache.
> > > > > so cache_lookup("ring_buffer") will return two kernel types.
> > > > > That would be the case for all progs being loaded.
> > > > > What am I missing?
> > > >
> > > > if there are two matching types with the same matching field but field
> > > > offsets are different (and thus there is ambiguity), that's an error.
> > > > So the correct (by current definition, at least) program has to result
> > > > in one of such two incompatible ring_buffer types and only one. If
> > > > there are multiple duplicates, though, (like task_struct and
> > > > task_struct___2) they will have identical sets of fields at the same
> > > > offsets, so both will remain possible candidates and that's not an
> > > > error. But for actually two different types, there can be only one
> > > > winner.
> > >
> > > I'm not proposing to cache the result of refined bpf_core_cand_list
> > > after bpf_core_apply_relo_insn() did its job.
> > > I'm saying the kernel can cache the result of the add_cands loop across
> > > vmlinux BTFs for all progs.
> > > The bpf_core_apply_relo_insn() will be called with a copy of
> > > bpf_core_cand_list from a cache.
> > > I'm thinking to keep this cache in 'struct btf'
> >
> > I wouldn't start with that for sure, because the next question is
> > garbage collection. What if someone just did some experiments, used
> > some obscure types that are not really used (typically), potentially
> > lots of them. Now the kernel will keep these extra cache entries
> > forever. You are basically talking about a lazily-built by-name index,
> > it might be useful in some other cases as well, but it definitely
> > comes with extra cost and I'm not convinced yet that we need to pay
> > this price right now.
> >
> > But then again, I don't see why it's so hard to build a local hashmap
> > or rbtree for the duration of program load and discard it after load.
> > You haven't provided a clear argument why not.
>
> because libbpf's approach of using a hashtab for a narrow use case is inferior.
> We could remove it from libbpf and no one would notice.
> A generic cache that works across progs would speed up loading
> a tiny bit for all progs, instead of only the single case of profiler*.c,
> which has a bunch of relos only because it's using an outdated
> always_unroll approach to loops.
>
> btw the following diff removes the bigger part of the loop overhead:
> +
> +               if (strncmp(local_cand->name, targ_name, local_essent_len) != 0)
>                         continue;
>
>                 targ_essent_len = bpf_core_essential_name_len(targ_name);
>                 if (targ_essent_len != local_essent_len)
>                         continue;
>
> -               if (strncmp(local_cand->name, targ_name, local_essent_len) != 0)
> -                       continue;
> -
>

this is a good optimization, we should do it in libbpf as well

> From 0.25 sec it goes to 0.1 sec.
> Not saying it's ok to waste this 0.1 sec.
> On a non-debug kernel it's probably even less noticeable.
> The point is that the loop can be trivially made faster for
> all use cases.
> Whereas complicating kernel code with a custom hashtable for
> a narrow use case is just odd.
> You've said it yourself: it's 22 different root types.
> With a libbpf-style hashtable that is still 22 x sizeof(vmlinux BTF)
> iterations on every load. And if this is indeed a typical use case,
> then more complex progs in the future will be doing these loops.
> Either the loop should be fast enough (I think it is with the
> strncmp move above) or the kernel needs to be optimized for all use cases.

Up to you, I've made my arguments, don't care beyond that. The big
difference is that we can optimize libbpf further without upgrading
the world. Inferior or not, it's better than no caching, IMO, and I
also figured this way of caching is a good tradeoff between memory use
(and cost of building the index) and performance in real-world cases.
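
As a rough illustration of the per-load caching idea discussed above
(a standalone sketch only; struct cand_list, find_cands_slow() and the
bucket count are made-up stand-ins, not libbpf or kernel code): key
the cache by the relocation's root type_id, build it while one program
loads, and throw it away right after, so repeated relos against the
same root type reuse one candidate list.

#include <stdio.h>
#include <stdlib.h>

struct cand_list { int placeholder; };	/* stand-in for the real candidate list */

struct cand_entry {
	unsigned int type_id;		/* root type of the relocation */
	struct cand_list *cands;
	struct cand_entry *next;
};

#define CAND_BUCKETS 64

struct cand_cache {
	struct cand_entry *buckets[CAND_BUCKETS];
};

/* stand-in for the expensive linear scan over vmlinux BTF */
static struct cand_list *find_cands_slow(unsigned int type_id)
{
	printf("expensive scan for root type_id %u\n", type_id);
	return calloc(1, sizeof(struct cand_list));
}

static struct cand_list *cand_cache_get(struct cand_cache *c, unsigned int type_id)
{
	unsigned int b = type_id % CAND_BUCKETS;
	struct cand_entry *e;

	for (e = c->buckets[b]; e; e = e->next)
		if (e->type_id == type_id)
			return e->cands;	/* reused by later relos */

	e = calloc(1, sizeof(*e));		/* error handling elided */
	e->type_id = type_id;
	e->cands = find_cands_slow(type_id);
	e->next = c->buckets[b];
	c->buckets[b] = e;
	return e->cands;
}

static void cand_cache_free(struct cand_cache *c)
{
	unsigned int b;

	for (b = 0; b < CAND_BUCKETS; b++) {
		struct cand_entry *e = c->buckets[b], *next;

		while (e) {
			next = e->next;
			free(e->cands);
			free(e);
			e = next;
		}
	}
}

int main(void)
{
	static struct cand_cache cache;	/* zero-initialized */
	/* e.g. many relos spread over only two root types */
	unsigned int relo_root_ids[] = { 42, 42, 42, 7, 42, 7 };
	size_t i;

	for (i = 0; i < sizeof(relo_root_ids) / sizeof(relo_root_ids[0]); i++)
		cand_cache_get(&cache, relo_root_ids[i]);

	cand_cache_free(&cache);	/* discarded once the load is done */
	return 0;
}

With the numbers from this thread (1076 relos over 22 root types), the
expensive scan would run 22 times instead of ~1076.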

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2021-11-23 19:49 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-20  3:32 [PATCH v3 bpf-next 00/13] bpf: CO-RE support in the kernel Alexei Starovoitov
2021-11-20  3:32 ` [PATCH v3 bpf-next 01/13] libbpf: Replace btf__type_by_id() with btf_type_by_id() Alexei Starovoitov
2021-11-22 21:42   ` Andrii Nakryiko
2021-11-20  3:32 ` [PATCH v3 bpf-next 02/13] bpf: Rename btf_member accessors Alexei Starovoitov
2021-11-20  3:32 ` [PATCH v3 bpf-next 03/13] bpf: Prepare relo_core.c for kernel duty Alexei Starovoitov
2021-11-20 15:27   ` Matteo Croce
2021-11-20 16:23     ` Alexei Starovoitov
2021-11-22 21:36       ` Andrii Nakryiko
2021-11-22 23:15   ` Andrii Nakryiko
2021-11-20  3:32 ` [PATCH v3 bpf-next 04/13] bpf: Define enum bpf_core_relo_kind as uapi Alexei Starovoitov
2021-11-20  3:32 ` [PATCH v3 bpf-next 05/13] bpf: Pass a set of bpf_core_relo-s to prog_load command Alexei Starovoitov
2021-11-22 23:37   ` Andrii Nakryiko
2021-11-20  3:32 ` [PATCH v3 bpf-next 06/13] bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn() Alexei Starovoitov
2021-11-22 23:47   ` Andrii Nakryiko
2021-11-23  0:42     ` Alexei Starovoitov
2021-11-23  1:44       ` Andrii Nakryiko
2021-11-23  2:59         ` Andrii Nakryiko
2021-11-23  3:15         ` Alexei Starovoitov
2021-11-23  3:44           ` Andrii Nakryiko
2021-11-23  5:04             ` Alexei Starovoitov
2021-11-23 18:19               ` Andrii Nakryiko
2021-11-23 19:33                 ` Alexei Starovoitov
2021-11-23 19:48                   ` Andrii Nakryiko
2021-11-20  3:32 ` [PATCH v3 bpf-next 07/13] libbpf: Use CO-RE in the kernel in light skeleton Alexei Starovoitov
2021-11-23  0:03   ` Andrii Nakryiko
2021-11-23  0:08     ` Alexei Starovoitov
2021-11-20  3:32 ` [PATCH v3 bpf-next 08/13] libbpf: Support init of inner maps " Alexei Starovoitov
2021-11-20  3:32 ` [PATCH v3 bpf-next 09/13] selftests/bpf: Add lskel version of kfunc test Alexei Starovoitov
2021-11-23  0:05   ` Andrii Nakryiko
2021-11-20  3:32 ` [PATCH v3 bpf-next 10/13] selftests/bpf: Improve inner_map test coverage Alexei Starovoitov
2021-11-20  3:32 ` [PATCH v3 bpf-next 11/13] selftests/bpf: Convert map_ptr_kern test to use light skeleton Alexei Starovoitov
2021-11-20  3:32 ` [PATCH v3 bpf-next 12/13] selftests/bpf: Additional test for CO-RE in the kernel Alexei Starovoitov
2021-11-23  0:09   ` Andrii Nakryiko
2021-11-20  3:32 ` [PATCH v3 bpf-next 13/13] selftest/bpf: Revert CO-RE removal in test_ksyms_weak Alexei Starovoitov
2021-11-23  0:10   ` Andrii Nakryiko
2021-11-23  0:23   ` Kumar Kartikeya Dwivedi
2021-11-23  0:32     ` Alexei Starovoitov
