linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v1 0/8] bpf: BTF support for ksyms
@ 2020-08-19 22:40 Hao Luo
  2020-08-19 22:40 ` [PATCH bpf-next v1 1/8] bpf: Introduce pseudo_btf_id Hao Luo
                   ` (7 more replies)
  0 siblings, 8 replies; 42+ messages in thread
From: Hao Luo @ 2020-08-19 22:40 UTC (permalink / raw)
  To: netdev, bpf, linux-kernel, linux-kselftest
  Cc: Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Hao Luo, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

This patch series extends the previously added __ksym externs with
btf support.

Right now the __ksym externs are treated as pure 64-bit scalar value.
Libbpf replaces ld_imm64 insn of __ksym by its kernel address at load
time. This patch series extend those externs with their btf info. Note
that btf support for __ksym must come with the kernel btf that has
VARs encoded to work properly. The corresponding chagnes in pahole
is available at [1].

The first 5 patches in this series add support for general kernel
global variables, which includes verifier checking (01/08), libbpf
type checking (03/08) and btf_id resolving (04/08).

The last 3 patches extends that capability further by introducing a
helper bpf_per_cpu_ptr(), which allows accessing kernel percpu vars
correctly (06/08).

The tests of this feature were performed against the extended pahole.
For kernel btf that does not have VARs encoded, the selftests will be
skipped.

[1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=f3d9054ba8ff1df0fc44e507e3a01c0964cabd42

rfc -> v1:
 - Encode VAR's btf_id for PSEUDO_BTF_ID.
 - More checks in verifier. Checking the btf_id passed as
   PSEUDO_BTF_ID is valid VAR, its name and type.
 - Checks in libbpf on type compatibility of ksyms.
 - Add bpf_per_cpu_ptr() to access kernel percpu vars. Introduced
   new ARG and RET types for this helper.

Hao Luo (8):
  bpf: Introduce pseudo_btf_id
  bpf: Propagate BPF_PSEUDO_BTF_ID to uapi headers in /tools
  bpf: Introduce help function to validate ksym's type.
  bpf/libbpf: BTF support for typed ksyms
  bpf/selftests: ksyms_btf to test typed ksyms
  bpf: Introduce bpf_per_cpu_ptr()
  bpf: Propagate bpf_per_cpu_ptr() to /tools
  bpf/selftests: Test for bpf_per_cpu_ptr()

 include/linux/bpf.h                           |   3 +
 include/linux/btf.h                           |  26 +++
 include/uapi/linux/bpf.h                      |  52 +++++-
 kernel/bpf/btf.c                              |  25 ---
 kernel/bpf/verifier.c                         | 128 ++++++++++++-
 kernel/trace/bpf_trace.c                      |  18 ++
 tools/include/uapi/linux/bpf.h                |  53 +++++-
 tools/lib/bpf/btf.c                           | 171 ++++++++++++++++++
 tools/lib/bpf/btf.h                           |   2 +
 tools/lib/bpf/libbpf.c                        | 130 +++++++++++--
 .../selftests/bpf/prog_tests/ksyms_btf.c      |  81 +++++++++
 .../selftests/bpf/progs/test_ksyms_btf.c      |  36 ++++
 12 files changed, 665 insertions(+), 60 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_ksyms_btf.c

-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH bpf-next v1 1/8] bpf: Introduce pseudo_btf_id
  2020-08-19 22:40 [PATCH bpf-next v1 0/8] bpf: BTF support for ksyms Hao Luo
@ 2020-08-19 22:40 ` Hao Luo
  2020-08-20 15:22   ` Yonghong Song
                     ` (2 more replies)
  2020-08-19 22:40 ` [PATCH bpf-next v1 2/8] bpf: Propagate BPF_PSEUDO_BTF_ID to uapi headers in /tools Hao Luo
                   ` (6 subsequent siblings)
  7 siblings, 3 replies; 42+ messages in thread
From: Hao Luo @ 2020-08-19 22:40 UTC (permalink / raw)
  To: netdev, bpf, linux-kernel, linux-kselftest
  Cc: Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Hao Luo, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

Pseudo_btf_id is a type of ld_imm insn that associates a btf_id to a
ksym so that further dereferences on the ksym can use the BTF info
to validate accesses. Internally, when seeing a pseudo_btf_id ld insn,
the verifier reads the btf_id stored in the insn[0]'s imm field and
marks the dst_reg as PTR_TO_BTF_ID. The btf_id points to a VAR_KIND,
which is encoded in btf_vminux by pahole. If the VAR is not of a struct
type, the dst reg will be marked as PTR_TO_MEM instead of PTR_TO_BTF_ID
and the mem_size is resolved to the size of the VAR's type.

From the VAR btf_id, the verifier can also read the address of the
ksym's corresponding kernel var from kallsyms and use that to fill
dst_reg.

Therefore, the proper functionality of pseudo_btf_id depends on (1)
kallsyms and (2) the encoding of kernel global VARs in pahole, which
should be available since pahole v1.18.

Signed-off-by: Hao Luo <haoluo@google.com>
---
 include/linux/btf.h      | 15 +++++++++
 include/uapi/linux/bpf.h | 38 ++++++++++++++++------
 kernel/bpf/btf.c         | 15 ---------
 kernel/bpf/verifier.c    | 68 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 112 insertions(+), 24 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index 8b81fbb4497c..cee4089e83c0 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -107,6 +107,21 @@ static inline bool btf_type_is_func_proto(const struct btf_type *t)
 	return BTF_INFO_KIND(t->info) == BTF_KIND_FUNC_PROTO;
 }
 
+static inline bool btf_type_is_var(const struct btf_type *t)
+{
+	return BTF_INFO_KIND(t->info) == BTF_KIND_VAR;
+}
+
+/* union is only a special case of struct:
+ * all its offsetof(member) == 0
+ */
+static inline bool btf_type_is_struct(const struct btf_type *t)
+{
+	u8 kind = BTF_INFO_KIND(t->info);
+
+	return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
+}
+
 static inline u16 btf_type_vlen(const struct btf_type *t)
 {
 	return BTF_INFO_VLEN(t->info);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0480f893facd..468376f2910b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -346,18 +346,38 @@ enum bpf_link_type {
 #define BPF_F_TEST_STATE_FREQ	(1U << 3)
 
 /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
- * two extensions:
- *
- * insn[0].src_reg:  BPF_PSEUDO_MAP_FD   BPF_PSEUDO_MAP_VALUE
- * insn[0].imm:      map fd              map fd
- * insn[1].imm:      0                   offset into value
- * insn[0].off:      0                   0
- * insn[1].off:      0                   0
- * ldimm64 rewrite:  address of map      address of map[0]+offset
- * verifier type:    CONST_PTR_TO_MAP    PTR_TO_MAP_VALUE
+ * the following extensions:
+ *
+ * insn[0].src_reg:  BPF_PSEUDO_MAP_FD
+ * insn[0].imm:      map fd
+ * insn[1].imm:      0
+ * insn[0].off:      0
+ * insn[1].off:      0
+ * ldimm64 rewrite:  address of map
+ * verifier type:    CONST_PTR_TO_MAP
  */
 #define BPF_PSEUDO_MAP_FD	1
+/*
+ * insn[0].src_reg:  BPF_PSEUDO_MAP_VALUE
+ * insn[0].imm:      map fd
+ * insn[1].imm:      offset into value
+ * insn[0].off:      0
+ * insn[1].off:      0
+ * ldimm64 rewrite:  address of map[0]+offset
+ * verifier type:    PTR_TO_MAP_VALUE
+ */
 #define BPF_PSEUDO_MAP_VALUE	2
+/*
+ * insn[0].src_reg:  BPF_PSEUDO_BTF_ID
+ * insn[0].imm:      kernel btd id of VAR
+ * insn[1].imm:      0
+ * insn[0].off:      0
+ * insn[1].off:      0
+ * ldimm64 rewrite:  address of the kernel variable
+ * verifier type:    PTR_TO_BTF_ID or PTR_TO_MEM, depending on whether the var
+ *                   is struct/union.
+ */
+#define BPF_PSEUDO_BTF_ID	3
 
 /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
  * offset to another bpf function
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 91afdd4c82e3..b6d8f653afe2 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -353,16 +353,6 @@ static bool btf_type_nosize_or_null(const struct btf_type *t)
 	return !t || btf_type_nosize(t);
 }
 
-/* union is only a special case of struct:
- * all its offsetof(member) == 0
- */
-static bool btf_type_is_struct(const struct btf_type *t)
-{
-	u8 kind = BTF_INFO_KIND(t->info);
-
-	return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
-}
-
 static bool __btf_type_is_struct(const struct btf_type *t)
 {
 	return BTF_INFO_KIND(t->info) == BTF_KIND_STRUCT;
@@ -373,11 +363,6 @@ static bool btf_type_is_array(const struct btf_type *t)
 	return BTF_INFO_KIND(t->info) == BTF_KIND_ARRAY;
 }
 
-static bool btf_type_is_var(const struct btf_type *t)
-{
-	return BTF_INFO_KIND(t->info) == BTF_KIND_VAR;
-}
-
 static bool btf_type_is_datasec(const struct btf_type *t)
 {
 	return BTF_INFO_KIND(t->info) == BTF_KIND_DATASEC;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ef938f17b944..47badde71f83 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7205,6 +7205,68 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 	return 0;
 }
 
+/* verify ld_imm64 insn of type PSEUDO_BTF_ID is valid */
+static inline int check_pseudo_btf_id(struct bpf_verifier_env *env,
+				      struct bpf_insn *insn)
+{
+	struct bpf_reg_state *regs = cur_regs(env);
+	u32 type, id = insn->imm;
+	u64 addr;
+	const char *sym_name;
+	const struct btf_type *t = btf_type_by_id(btf_vmlinux, id);
+
+	if (!t) {
+		verbose(env, "%s: invalid btf_id %d\n", __func__, id);
+		return -ENOENT;
+	}
+
+	if (insn[1].imm != 0) {
+		verbose(env, "%s: BPF_PSEUDO_BTF_ID uses reserved fields\n",
+			__func__);
+		return -EINVAL;
+	}
+
+	if (!btf_type_is_var(t)) {
+		verbose(env, "%s: btf_id %d isn't KIND_VAR\n", __func__, id);
+		return -EINVAL;
+	}
+
+	sym_name = btf_name_by_offset(btf_vmlinux, t->name_off);
+	addr = kallsyms_lookup_name(sym_name);
+	if (!addr) {
+		verbose(env, "%s: failed to find the address of symbol '%s'.\n",
+			__func__, sym_name);
+		return -ENOENT;
+	}
+
+	insn[0].imm = (u32)addr;
+	insn[1].imm = addr >> 32;
+	mark_reg_known_zero(env, regs, insn->dst_reg);
+
+	type = t->type;
+	t = btf_type_skip_modifiers(btf_vmlinux, type, NULL);
+	if (!btf_type_is_struct(t)) {
+		u32 tsize;
+		const struct btf_type *ret;
+		const char *tname;
+
+		/* resolve the type size of ksym. */
+		ret = btf_resolve_size(btf_vmlinux, t, &tsize, NULL, NULL);
+		if (IS_ERR(ret)) {
+			tname = btf_name_by_offset(btf_vmlinux, t->name_off);
+			verbose(env, "unable to resolve the size of type '%s': %ld\n",
+				tname, PTR_ERR(ret));
+			return -EINVAL;
+		}
+		regs[insn->dst_reg].type = PTR_TO_MEM;
+		regs[insn->dst_reg].mem_size = tsize;
+	} else {
+		regs[insn->dst_reg].type = PTR_TO_BTF_ID;
+		regs[insn->dst_reg].btf_id = type;
+	}
+	return 0;
+}
+
 /* verify BPF_LD_IMM64 instruction */
 static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 {
@@ -7234,6 +7296,9 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 		return 0;
 	}
 
+	if (insn->src_reg == BPF_PSEUDO_BTF_ID)
+		return check_pseudo_btf_id(env, insn);
+
 	map = env->used_maps[aux->map_index];
 	mark_reg_known_zero(env, regs, insn->dst_reg);
 	regs[insn->dst_reg].map_ptr = map;
@@ -9255,6 +9320,9 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
 				/* valid generic load 64-bit imm */
 				goto next_insn;
 
+			if (insn[0].src_reg == BPF_PSEUDO_BTF_ID)
+				goto next_insn;
+
 			/* In final convert_pseudo_ld_imm64() step, this is
 			 * converted into regular 64-bit imm load insn.
 			 */
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH bpf-next v1 2/8] bpf: Propagate BPF_PSEUDO_BTF_ID to uapi headers in /tools
  2020-08-19 22:40 [PATCH bpf-next v1 0/8] bpf: BTF support for ksyms Hao Luo
  2020-08-19 22:40 ` [PATCH bpf-next v1 1/8] bpf: Introduce pseudo_btf_id Hao Luo
@ 2020-08-19 22:40 ` Hao Luo
  2020-08-20 16:44   ` Yonghong Song
  2020-08-19 22:40 ` [PATCH bpf-next v1 3/8] bpf: Introduce help function to validate ksym's type Hao Luo
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Hao Luo @ 2020-08-19 22:40 UTC (permalink / raw)
  To: netdev, bpf, linux-kernel, linux-kselftest
  Cc: Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Hao Luo, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

Propagate BPF_PSEUDO_BTF_ID from include/linux/uapi/bpf.h to
tools/include/linux/uapi/bpf.h.

Signed-off-by: Hao Luo <haoluo@google.com>
---
 tools/include/uapi/linux/bpf.h | 38 ++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 0480f893facd..468376f2910b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -346,18 +346,38 @@ enum bpf_link_type {
 #define BPF_F_TEST_STATE_FREQ	(1U << 3)
 
 /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
- * two extensions:
- *
- * insn[0].src_reg:  BPF_PSEUDO_MAP_FD   BPF_PSEUDO_MAP_VALUE
- * insn[0].imm:      map fd              map fd
- * insn[1].imm:      0                   offset into value
- * insn[0].off:      0                   0
- * insn[1].off:      0                   0
- * ldimm64 rewrite:  address of map      address of map[0]+offset
- * verifier type:    CONST_PTR_TO_MAP    PTR_TO_MAP_VALUE
+ * the following extensions:
+ *
+ * insn[0].src_reg:  BPF_PSEUDO_MAP_FD
+ * insn[0].imm:      map fd
+ * insn[1].imm:      0
+ * insn[0].off:      0
+ * insn[1].off:      0
+ * ldimm64 rewrite:  address of map
+ * verifier type:    CONST_PTR_TO_MAP
  */
 #define BPF_PSEUDO_MAP_FD	1
+/*
+ * insn[0].src_reg:  BPF_PSEUDO_MAP_VALUE
+ * insn[0].imm:      map fd
+ * insn[1].imm:      offset into value
+ * insn[0].off:      0
+ * insn[1].off:      0
+ * ldimm64 rewrite:  address of map[0]+offset
+ * verifier type:    PTR_TO_MAP_VALUE
+ */
 #define BPF_PSEUDO_MAP_VALUE	2
+/*
+ * insn[0].src_reg:  BPF_PSEUDO_BTF_ID
+ * insn[0].imm:      kernel btd id of VAR
+ * insn[1].imm:      0
+ * insn[0].off:      0
+ * insn[1].off:      0
+ * ldimm64 rewrite:  address of the kernel variable
+ * verifier type:    PTR_TO_BTF_ID or PTR_TO_MEM, depending on whether the var
+ *                   is struct/union.
+ */
+#define BPF_PSEUDO_BTF_ID	3
 
 /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
  * offset to another bpf function
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH bpf-next v1 3/8] bpf: Introduce help function to validate ksym's type.
  2020-08-19 22:40 [PATCH bpf-next v1 0/8] bpf: BTF support for ksyms Hao Luo
  2020-08-19 22:40 ` [PATCH bpf-next v1 1/8] bpf: Introduce pseudo_btf_id Hao Luo
  2020-08-19 22:40 ` [PATCH bpf-next v1 2/8] bpf: Propagate BPF_PSEUDO_BTF_ID to uapi headers in /tools Hao Luo
@ 2020-08-19 22:40 ` Hao Luo
  2020-08-20 17:20   ` Yonghong Song
  2020-08-19 22:40 ` [PATCH bpf-next v1 4/8] bpf/libbpf: BTF support for typed ksyms Hao Luo
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Hao Luo @ 2020-08-19 22:40 UTC (permalink / raw)
  To: netdev, bpf, linux-kernel, linux-kselftest
  Cc: Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Hao Luo, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

For a ksym to be safely dereferenced and accessed, its type defined in
bpf program should basically match its type defined in kernel. Implement
a help function for a quick matching, which is used by libbpf when
resolving the kernel btf_id of a ksym.

Signed-off-by: Hao Luo <haoluo@google.com>
---
 tools/lib/bpf/btf.c | 171 ++++++++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/btf.h |   2 +
 2 files changed, 173 insertions(+)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index a3d259e614b0..2ff31f244d7a 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -1005,6 +1005,177 @@ int btf__get_map_kv_tids(const struct btf *btf, const char *map_name,
 	return 0;
 }
 
+/*
+ * Basic type check for ksym support. Only checks type kind and resolved size.
+ */
+static inline
+bool btf_ksym_equal_type(const struct btf *ba, __u32 type_a,
+			 const struct btf *bb, __u32 type_b)
+{
+	const struct btf_type *ta, *tb;
+
+	ta = btf__type_by_id(ba, type_a);
+	tb = btf__type_by_id(bb, type_b);
+
+	/* compare type kind */
+	if (btf_kind(ta) != btf_kind(tb))
+		return false;
+
+	/* compare resolved type size */
+	return btf__resolve_size(ba, type_a) == btf__resolve_size(bb, type_b);
+}
+
+/*
+ * Match a ksym's type defined in bpf programs against its type encoded in
+ * kernel btf.
+ */
+bool btf_ksym_type_match(const struct btf *ba, __u32 id_a,
+			 const struct btf *bb, __u32 id_b)
+{
+	const struct btf_type *ta = btf__type_by_id(ba, id_a);
+	const struct btf_type *tb = btf__type_by_id(bb, id_b);
+	int i;
+
+	/* compare type kind */
+	if (btf_kind(ta) != btf_kind(tb)) {
+		pr_warn("%s:mismatched type kind (%d v.s. %d).\n",
+			__func__, btf_kind(ta), btf_kind(tb));
+		return false;
+	}
+
+	switch (btf_kind(ta)) {
+	case BTF_KIND_INT: { /* compare size and encoding */
+		__u32 ea, eb;
+
+		if (ta->size != tb->size) {
+			pr_warn("%s:INT size mismatch, (%u v.s. %u)\n",
+				__func__, ta->size, tb->size);
+			return false;
+		}
+		ea = *(__u32 *)(ta + 1);
+		eb = *(__u32 *)(tb + 1);
+		if (ea != eb) {
+			pr_warn("%s:INT encoding mismatch (%u v.s. %u)\n",
+				__func__, ea, eb);
+			return false;
+		}
+		break;
+	}
+	case BTF_KIND_ARRAY: { /* compare type and number of elements */
+		const struct btf_array *ea, *eb;
+
+		ea = btf_array(ta);
+		eb = btf_array(tb);
+		if (!btf_ksym_equal_type(ba, ea->type, bb, eb->type)) {
+			pr_warn("%s:ARRAY elem type mismatch.\n", __func__);
+			return false;
+		}
+		if (ea->nelems != eb->nelems) {
+			pr_warn("%s:ARRAY nelems mismatch (%d v.s. %d)\n",
+				__func__, ea->nelems, eb->nelems);
+			return false;
+		}
+		break;
+	}
+	case BTF_KIND_STRUCT:
+	case BTF_KIND_UNION: { /* compare size, vlen and member offset, name */
+		const struct btf_member *ma, *mb;
+
+		if (ta->size != tb->size) {
+			pr_warn("%s:STRUCT size mismatch, (%u v.s. %u)\n",
+				__func__, ta->size, tb->size);
+			return false;
+		}
+		if (btf_vlen(ta) != btf_vlen(tb)) {
+			pr_warn("%s:STRUCT vlen mismatch, (%u v.s. %u)\n",
+				__func__, btf_vlen(ta), btf_vlen(tb));
+			return false;
+		}
+
+		ma = btf_members(ta);
+		mb = btf_members(tb);
+		for (i = 0; i < btf_vlen(ta); i++, ma++, mb++) {
+			const char *na, *nb;
+
+			if (ma->offset != mb->offset) {
+				pr_warn("%s:STRUCT field offset mismatch, (%u v.s. %u)\n",
+					__func__, ma->offset, mb->offset);
+				return false;
+			}
+			na = btf__name_by_offset(ba, ma->name_off);
+			nb = btf__name_by_offset(bb, mb->name_off);
+			if (strcmp(na, nb)) {
+				pr_warn("%s:STRUCT field name mismatch, (%s v.s. %s)\n",
+					__func__, na, nb);
+				return false;
+			}
+		}
+		break;
+	}
+	case BTF_KIND_ENUM: { /* compare vlen and member value, name */
+		const struct btf_enum *ma, *mb;
+
+		if (btf_vlen(ta) != btf_vlen(tb)) {
+			pr_warn("%s:ENUM vlen mismatch, (%u v.s. %u)\n",
+				__func__, btf_vlen(ta), btf_vlen(tb));
+			return false;
+		}
+
+		ma = btf_enum(ta);
+		mb = btf_enum(tb);
+		for (i = 0; i < btf_vlen(ta); i++, ma++, mb++) {
+			if (ma->val != mb->val) {
+				pr_warn("%s:ENUM val mismatch, (%u v.s. %u)\n",
+					__func__, ma->val, mb->val);
+				return false;
+			}
+		}
+		break;
+	}
+	case BTF_KIND_PTR: { /* naive compare of ref type for PTR */
+		if (!btf_ksym_equal_type(ba, ta->type, bb, tb->type)) {
+			pr_warn("%s:PTR ref type mismatch.\n", __func__);
+			return false;
+		}
+		break;
+	}
+	case BTF_KIND_FUNC_PROTO: { /* naive compare of vlen and param types */
+		const struct btf_param *pa, *pb;
+
+		if (btf_vlen(ta) != btf_vlen(tb)) {
+			pr_warn("%s:FUNC_PROTO vlen mismatch, (%u v.s. %u)\n",
+				__func__, btf_vlen(ta), btf_vlen(tb));
+			return false;
+		}
+
+		pa = btf_params(ta);
+		pb = btf_params(tb);
+		for (i = 0; i < btf_vlen(ta); i++, pa++, pb++) {
+			if (!btf_ksym_equal_type(ba, pa->type, bb, pb->type)) {
+				pr_warn("%s:FUNC_PROTO params type mismatch.\n",
+					__func__);
+				return false;
+			}
+		}
+		break;
+	}
+	case BTF_KIND_FUNC:
+	case BTF_KIND_CONST:
+	case BTF_KIND_VOLATILE:
+	case BTF_KIND_RESTRICT:
+	case BTF_KIND_TYPEDEF:
+	case BTF_KIND_VAR:
+	case BTF_KIND_DATASEC:
+		pr_warn("unexpected type for matching ksym types.\n");
+		return false;
+	default:
+		pr_warn("unsupported btf types.\n");
+		return false;
+	}
+
+	return true;
+}
+
 struct btf_ext_sec_setup_param {
 	__u32 off;
 	__u32 len;
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 91f0ad0e0325..5ef220e52485 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -52,6 +52,8 @@ LIBBPF_API int btf__get_map_kv_tids(const struct btf *btf, const char *map_name,
 				    __u32 expected_key_size,
 				    __u32 expected_value_size,
 				    __u32 *key_type_id, __u32 *value_type_id);
+LIBBPF_API bool btf_ksym_type_match(const struct btf *ba, __u32 id_a,
+				    const struct btf *bb, __u32 id_b);
 
 LIBBPF_API struct btf_ext *btf_ext__new(__u8 *data, __u32 size);
 LIBBPF_API void btf_ext__free(struct btf_ext *btf_ext);
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH bpf-next v1 4/8] bpf/libbpf: BTF support for typed ksyms
  2020-08-19 22:40 [PATCH bpf-next v1 0/8] bpf: BTF support for ksyms Hao Luo
                   ` (2 preceding siblings ...)
  2020-08-19 22:40 ` [PATCH bpf-next v1 3/8] bpf: Introduce help function to validate ksym's type Hao Luo
@ 2020-08-19 22:40 ` Hao Luo
  2020-08-21 22:37   ` Andrii Nakryiko
  2020-08-19 22:40 ` [PATCH bpf-next v1 5/8] bpf/selftests: ksyms_btf to test " Hao Luo
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Hao Luo @ 2020-08-19 22:40 UTC (permalink / raw)
  To: netdev, bpf, linux-kernel, linux-kselftest
  Cc: Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Hao Luo, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

If a ksym is defined with a type, libbpf will try to find the ksym's btf
information from kernel btf. If a valid btf entry for the ksym is found,
libbpf can pass in the found btf id to the verifier, which validates the
ksym's type and value.

Typeless ksyms (i.e. those defined as 'void') will not have such btf_id,
but it has the symbol's address (read from kallsyms) and its value is
treated as a raw pointer.

Signed-off-by: Hao Luo <haoluo@google.com>
---
 tools/lib/bpf/libbpf.c | 130 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 114 insertions(+), 16 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 4a81c6b2d21b..94eff612c7c2 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -357,7 +357,16 @@ struct extern_desc {
 			bool is_signed;
 		} kcfg;
 		struct {
-			unsigned long long addr;
+			/*
+			 *  1. If ksym is typeless, the field 'addr' is valid.
+			 *  2. If ksym is typed, the field 'vmlinux_btf_id' is
+			 *     valid.
+			 */
+			bool is_typeless;
+			union {
+				unsigned long long addr;
+				int vmlinux_btf_id;
+			};
 		} ksym;
 	};
 };
@@ -382,6 +391,7 @@ struct bpf_object {
 
 	bool loaded;
 	bool has_pseudo_calls;
+	bool has_typed_ksyms;
 
 	/*
 	 * Information when doing elf related work. Only valid if fd
@@ -2521,6 +2531,10 @@ static int bpf_object__load_vmlinux_btf(struct bpf_object *obj)
 	if (obj->btf_ext && obj->btf_ext->core_relo_info.len)
 		need_vmlinux_btf = true;
 
+	/* Support for typed ksyms needs kernel BTF */
+	if (obj->has_typed_ksyms)
+		need_vmlinux_btf = true;
+
 	bpf_object__for_each_program(prog, obj) {
 		if (!prog->load)
 			continue;
@@ -2975,10 +2989,10 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
 			ext->type = EXT_KSYM;
 
 			vt = skip_mods_and_typedefs(obj->btf, t->type, NULL);
-			if (!btf_is_void(vt)) {
-				pr_warn("extern (ksym) '%s' is not typeless (void)\n", ext_name);
-				return -ENOTSUP;
-			}
+			ext->ksym.is_typeless = btf_is_void(vt);
+
+			if (!obj->has_typed_ksyms && !ext->ksym.is_typeless)
+				obj->has_typed_ksyms = true;
 		} else {
 			pr_warn("unrecognized extern section '%s'\n", sec_name);
 			return -ENOTSUP;
@@ -2992,9 +3006,9 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
 	/* sort externs by type, for kcfg ones also by (align, size, name) */
 	qsort(obj->externs, obj->nr_extern, sizeof(*ext), cmp_externs);
 
-	/* for .ksyms section, we need to turn all externs into allocated
-	 * variables in BTF to pass kernel verification; we do this by
-	 * pretending that each extern is a 8-byte variable
+	/* for .ksyms section, we need to turn all typeless externs into
+	 * allocated variables in BTF to pass kernel verification; we do
+	 * this by pretending that each typeless extern is a 8-byte variable
 	 */
 	if (ksym_sec) {
 		/* find existing 4-byte integer type in BTF to use for fake
@@ -3012,7 +3026,7 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
 
 		sec = ksym_sec;
 		n = btf_vlen(sec);
-		for (i = 0, off = 0; i < n; i++, off += sizeof(int)) {
+		for (i = 0, off = 0; i < n; i++) {
 			struct btf_var_secinfo *vs = btf_var_secinfos(sec) + i;
 			struct btf_type *vt;
 
@@ -3025,9 +3039,14 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
 				return -ESRCH;
 			}
 			btf_var(vt)->linkage = BTF_VAR_GLOBAL_ALLOCATED;
-			vt->type = int_btf_id;
+			if (ext->ksym.is_typeless) {
+				vt->type = int_btf_id;
+				vs->size = sizeof(int);
+			}
 			vs->offset = off;
-			vs->size = sizeof(int);
+			off += vs->size;
+			pr_debug("ksym var_secinfo: var '%s', type #%d, size %d, offset %d\n",
+				 ext->name, vt->type, vs->size, vs->offset);
 		}
 		sec->size = off;
 	}
@@ -5300,8 +5319,13 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
 				insn[0].imm = obj->maps[obj->kconfig_map_idx].fd;
 				insn[1].imm = ext->kcfg.data_off;
 			} else /* EXT_KSYM */ {
-				insn[0].imm = (__u32)ext->ksym.addr;
-				insn[1].imm = ext->ksym.addr >> 32;
+				if (ext->ksym.is_typeless) { /* typelss ksyms */
+					insn[0].imm = (__u32)ext->ksym.addr;
+					insn[1].imm = ext->ksym.addr >> 32;
+				} else { /* typed ksyms */
+					insn[0].src_reg = BPF_PSEUDO_BTF_ID;
+					insn[0].imm = ext->ksym.vmlinux_btf_id;
+				}
 			}
 			break;
 		case RELO_CALL:
@@ -6019,6 +6043,10 @@ static int bpf_object__read_kallsyms_file(struct bpf_object *obj)
 		if (!ext || ext->type != EXT_KSYM)
 			continue;
 
+		/* Typed ksyms have the verifier to fill their addresses. */
+		if (!ext->ksym.is_typeless)
+			continue;
+
 		if (ext->is_set && ext->ksym.addr != sym_addr) {
 			pr_warn("extern (ksym) '%s' resolution is ambiguous: 0x%llx or 0x%llx\n",
 				sym_name, ext->ksym.addr, sym_addr);
@@ -6037,10 +6065,72 @@ static int bpf_object__read_kallsyms_file(struct bpf_object *obj)
 	return err;
 }
 
+static int bpf_object__resolve_ksyms_btf_id(struct bpf_object *obj)
+{
+	struct extern_desc *ext;
+	int i, id;
+
+	if (!obj->btf_vmlinux) {
+		pr_warn("support of typed ksyms needs kernel btf.\n");
+		return -ENOENT;
+	}
+
+	for (i = 0; i < obj->nr_extern; i++) {
+		const struct btf_type *v, *vx; /* VARs in object and vmlinux btf */
+		const struct btf_type *t, *tx; /* TYPEs in btf */
+		__u32 vt, vtx; /* btf_ids of TYPEs */
+
+		ext = &obj->externs[i];
+		if (ext->type != EXT_KSYM)
+			continue;
+
+		if (ext->ksym.is_typeless)
+			continue;
+
+		if (ext->is_set) {
+			pr_warn("typed ksym '%s' resolved as typeless ksyms.\n",
+				ext->name);
+			return -EFAULT;
+		}
+
+		id = btf__find_by_name_kind(obj->btf_vmlinux, ext->name,
+					    BTF_KIND_VAR);
+		if (id <= 0) {
+			pr_warn("no btf entry for ksym '%s' in vmlinux.\n",
+				ext->name);
+			return -ESRCH;
+		}
+
+		vx = btf__type_by_id(obj->btf_vmlinux, id);
+		tx = skip_mods_and_typedefs(obj->btf_vmlinux, vx->type, &vtx);
+
+		v = btf__type_by_id(obj->btf, ext->btf_id);
+		t = skip_mods_and_typedefs(obj->btf, v->type, &vt);
+
+		if (!btf_ksym_type_match(obj->btf_vmlinux, vtx, obj->btf, vt)) {
+			const char *tname, *txname; /* names of TYPEs */
+
+			txname = btf__name_by_offset(obj->btf_vmlinux, tx->name_off);
+			tname = btf__name_by_offset(obj->btf, t->name_off);
+
+			pr_warn("ksym '%s' expects type '%s' (vmlinux_btf_id: #%d), "
+				"but got '%s' (btf_id: #%d)\n", ext->name,
+				txname, vtx, tname, vt);
+			return -EINVAL;
+		}
+
+		ext->is_set = true;
+		ext->ksym.vmlinux_btf_id = id;
+		pr_debug("extern (ksym) %s=vmlinux_btf_id(#%d)\n", ext->name, id);
+	}
+	return 0;
+}
+
 static int bpf_object__resolve_externs(struct bpf_object *obj,
 				       const char *extra_kconfig)
 {
-	bool need_config = false, need_kallsyms = false;
+	bool need_config = false;
+	bool need_kallsyms = false, need_vmlinux_btf = false;
 	struct extern_desc *ext;
 	void *kcfg_data = NULL;
 	int err, i;
@@ -6071,7 +6161,10 @@ static int bpf_object__resolve_externs(struct bpf_object *obj,
 			   strncmp(ext->name, "CONFIG_", 7) == 0) {
 			need_config = true;
 		} else if (ext->type == EXT_KSYM) {
-			need_kallsyms = true;
+			if (ext->ksym.is_typeless)
+				need_kallsyms = true;
+			else
+				need_vmlinux_btf = true;
 		} else {
 			pr_warn("unrecognized extern '%s'\n", ext->name);
 			return -EINVAL;
@@ -6100,6 +6193,11 @@ static int bpf_object__resolve_externs(struct bpf_object *obj,
 		if (err)
 			return -EINVAL;
 	}
+	if (need_vmlinux_btf) {
+		err = bpf_object__resolve_ksyms_btf_id(obj);
+		if (err)
+			return -EINVAL;
+	}
 	for (i = 0; i < obj->nr_extern; i++) {
 		ext = &obj->externs[i];
 
@@ -6132,10 +6230,10 @@ int bpf_object__load_xattr(struct bpf_object_load_attr *attr)
 	}
 
 	err = bpf_object__probe_loading(obj);
+	err = err ? : bpf_object__load_vmlinux_btf(obj);
 	err = err ? : bpf_object__resolve_externs(obj, obj->kconfig);
 	err = err ? : bpf_object__sanitize_and_load_btf(obj);
 	err = err ? : bpf_object__sanitize_maps(obj);
-	err = err ? : bpf_object__load_vmlinux_btf(obj);
 	err = err ? : bpf_object__init_kern_struct_ops_maps(obj);
 	err = err ? : bpf_object__create_maps(obj);
 	err = err ? : bpf_object__relocate(obj, attr->target_btf_path);
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH bpf-next v1 5/8] bpf/selftests: ksyms_btf to test typed ksyms
  2020-08-19 22:40 [PATCH bpf-next v1 0/8] bpf: BTF support for ksyms Hao Luo
                   ` (3 preceding siblings ...)
  2020-08-19 22:40 ` [PATCH bpf-next v1 4/8] bpf/libbpf: BTF support for typed ksyms Hao Luo
@ 2020-08-19 22:40 ` Hao Luo
  2020-08-20 17:28   ` Yonghong Song
  2020-08-19 22:40 ` [PATCH bpf-next v1 6/8] bpf: Introduce bpf_per_cpu_ptr() Hao Luo
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Hao Luo @ 2020-08-19 22:40 UTC (permalink / raw)
  To: netdev, bpf, linux-kernel, linux-kselftest
  Cc: Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Hao Luo, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

Selftests for typed ksyms. Tests two types of ksyms: one is a struct,
the other is a plain int. This tests two paths in the kernel. Struct
ksyms will be converted into PTR_TO_BTF_ID by the verifier while int
typed ksyms will be converted into PTR_TO_MEM.

Signed-off-by: Hao Luo <haoluo@google.com>
---
 .../selftests/bpf/prog_tests/ksyms_btf.c      | 77 +++++++++++++++++++
 .../selftests/bpf/progs/test_ksyms_btf.c      | 23 ++++++
 2 files changed, 100 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_ksyms_btf.c

diff --git a/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
new file mode 100644
index 000000000000..1dad61ba7e99
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Google */
+
+#include <test_progs.h>
+#include <bpf/libbpf.h>
+#include <bpf/btf.h>
+#include "test_ksyms_btf.skel.h"
+
+static int duration;
+
+static __u64 kallsyms_find(const char *sym)
+{
+	char type, name[500];
+	__u64 addr, res = 0;
+	FILE *f;
+
+	f = fopen("/proc/kallsyms", "r");
+	if (CHECK(!f, "kallsyms_fopen", "failed to open: %d\n", errno))
+		return 0;
+
+	while (fscanf(f, "%llx %c %499s%*[^\n]\n", &addr, &type, name) > 0) {
+		if (strcmp(name, sym) == 0) {
+			res = addr;
+			goto out;
+		}
+	}
+
+	CHECK(false, "not_found", "symbol %s not found\n", sym);
+out:
+	fclose(f);
+	return res;
+}
+
+void test_ksyms_btf(void)
+{
+	__u64 runqueues_addr = kallsyms_find("runqueues");
+	__u64 bpf_prog_active_addr = kallsyms_find("bpf_prog_active");
+	struct test_ksyms_btf *skel;
+	struct test_ksyms_btf__data *data;
+	struct btf *btf;
+	int percpu_datasec;
+	int err;
+
+	btf = libbpf_find_kernel_btf();
+	if (CHECK(IS_ERR(btf), "btf_exists", "failed to load kernel BTF: %ld\n",
+		  PTR_ERR(btf)))
+		return;
+
+	percpu_datasec = btf__find_by_name_kind(btf, ".data..percpu",
+						BTF_KIND_DATASEC);
+	if (percpu_datasec < 0) {
+		printf("%s:SKIP:no PERCPU DATASEC in kernel btf\n",
+		       __func__);
+		test__skip();
+		return;
+	}
+
+	skel = test_ksyms_btf__open_and_load();
+	if (CHECK(!skel, "skel_open", "failed to open and load skeleton\n"))
+		return;
+
+	err = test_ksyms_btf__attach(skel);
+	if (CHECK(err, "skel_attach", "skeleton attach failed: %d\n", err))
+		goto cleanup;
+
+	/* trigger tracepoint */
+	usleep(1);
+
+	data = skel->data;
+	CHECK(data->out__runqueues != runqueues_addr, "runqueues",
+	      "got %llu, exp %llu\n", data->out__runqueues, runqueues_addr);
+	CHECK(data->out__bpf_prog_active != bpf_prog_active_addr, "bpf_prog_active",
+	      "got %llu, exp %llu\n", data->out__bpf_prog_active, bpf_prog_active_addr);
+
+cleanup:
+	test_ksyms_btf__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_ksyms_btf.c b/tools/testing/selftests/bpf/progs/test_ksyms_btf.c
new file mode 100644
index 000000000000..e04e31117f84
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_ksyms_btf.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Google */
+
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+
+__u64 out__runqueues = -1;
+__u64 out__bpf_prog_active = -1;
+
+extern const struct rq runqueues __ksym; /* struct type global var. */
+extern const int bpf_prog_active __ksym; /* int type global var. */
+
+SEC("raw_tp/sys_enter")
+int handler(const void *ctx)
+{
+	out__runqueues = (__u64)&runqueues;
+	out__bpf_prog_active = (__u64)&bpf_prog_active;
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH bpf-next v1 6/8] bpf: Introduce bpf_per_cpu_ptr()
  2020-08-19 22:40 [PATCH bpf-next v1 0/8] bpf: BTF support for ksyms Hao Luo
                   ` (4 preceding siblings ...)
  2020-08-19 22:40 ` [PATCH bpf-next v1 5/8] bpf/selftests: ksyms_btf to test " Hao Luo
@ 2020-08-19 22:40 ` Hao Luo
  2020-08-22  3:26   ` Andrii Nakryiko
  2020-08-19 22:40 ` [PATCH bpf-next v1 7/8] bpf: Propagate bpf_per_cpu_ptr() to /tools Hao Luo
  2020-08-19 22:40 ` [PATCH bpf-next v1 8/8] bpf/selftests: Test for bpf_per_cpu_ptr() Hao Luo
  7 siblings, 1 reply; 42+ messages in thread
From: Hao Luo @ 2020-08-19 22:40 UTC (permalink / raw)
  To: netdev, bpf, linux-kernel, linux-kselftest
  Cc: Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Hao Luo, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

Add bpf_per_cpu_ptr() to help bpf programs access percpu vars.
bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the kernel
except that it may return NULL. This happens when the cpu parameter is
out of range. So the caller must check the returned value.

Signed-off-by: Hao Luo <haoluo@google.com>
---
 include/linux/bpf.h      |  3 ++
 include/linux/btf.h      | 11 +++++++
 include/uapi/linux/bpf.h | 14 +++++++++
 kernel/bpf/btf.c         | 10 -------
 kernel/bpf/verifier.c    | 64 ++++++++++++++++++++++++++++++++++++++--
 kernel/trace/bpf_trace.c | 18 +++++++++++
 6 files changed, 107 insertions(+), 13 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 55f694b63164..613404beab33 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -268,6 +268,7 @@ enum bpf_arg_type {
 	ARG_PTR_TO_ALLOC_MEM,	/* pointer to dynamically allocated memory */
 	ARG_PTR_TO_ALLOC_MEM_OR_NULL,	/* pointer to dynamically allocated memory or NULL */
 	ARG_CONST_ALLOC_SIZE_OR_ZERO,	/* number of allocated bytes requested */
+	ARG_PTR_TO_PERCPU_BTF_ID,	/* pointer to in-kernel percpu type */
 };
 
 /* type of values returned from helper functions */
@@ -281,6 +282,7 @@ enum bpf_return_type {
 	RET_PTR_TO_SOCK_COMMON_OR_NULL,	/* returns a pointer to a sock_common or NULL */
 	RET_PTR_TO_ALLOC_MEM_OR_NULL,	/* returns a pointer to dynamically allocated memory or NULL */
 	RET_PTR_TO_BTF_ID_OR_NULL,	/* returns a pointer to a btf_id or NULL */
+	RET_PTR_TO_MEM_OR_BTF_OR_NULL,	/* returns a pointer to a valid memory or a btf_id or NULL */
 };
 
 /* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
@@ -360,6 +362,7 @@ enum bpf_reg_type {
 	PTR_TO_RDONLY_BUF_OR_NULL, /* reg points to a readonly buffer or NULL */
 	PTR_TO_RDWR_BUF,	 /* reg points to a read/write buffer */
 	PTR_TO_RDWR_BUF_OR_NULL, /* reg points to a read/write buffer or NULL */
+	PTR_TO_PERCPU_BTF_ID,	 /* reg points to percpu kernel type */
 };
 
 /* The information passed from prog-specific *_is_valid_access
diff --git a/include/linux/btf.h b/include/linux/btf.h
index cee4089e83c0..dc3509246913 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -72,6 +72,11 @@ btf_resolve_size(const struct btf *btf, const struct btf_type *type,
 	     i < btf_type_vlen(struct_type);			\
 	     i++, member++)
 
+#define for_each_vsi(i, struct_type, member)			\
+	for (i = 0, member = btf_type_var_secinfo(struct_type);	\
+	     i < btf_type_vlen(struct_type);			\
+	     i++, member++)
+
 static inline bool btf_type_is_ptr(const struct btf_type *t)
 {
 	return BTF_INFO_KIND(t->info) == BTF_KIND_PTR;
@@ -156,6 +161,12 @@ static inline const struct btf_member *btf_type_member(const struct btf_type *t)
 	return (const struct btf_member *)(t + 1);
 }
 
+static inline const struct btf_var_secinfo *btf_type_var_secinfo(
+		const struct btf_type *t)
+{
+	return (const struct btf_var_secinfo *)(t + 1);
+}
+
 #ifdef CONFIG_BPF_SYSCALL
 const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
 const char *btf_name_by_offset(const struct btf *btf, u32 offset);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 468376f2910b..c7e49a102ed2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3415,6 +3415,19 @@ union bpf_attr {
  *		A non-negative value equal to or less than *size* on success,
  *		or a negative error in case of failure.
  *
+ * void *bpf_per_cpu_ptr(const void *ptr, u32 cpu)
+ *	Description
+ *		Take the address of a percpu ksym and return a pointer pointing
+ *		to the variable on *cpu*. A ksym is an extern variable decorated
+ *		with '__ksym'. A ksym is percpu if there is a global percpu var
+ *		(either static or global) defined of the same name in the kernel.
+ *
+ *		bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the
+ *		kernel, except that bpf_per_cpu_ptr() may return NULL. This
+ *		happens if *cpu* is larger than nr_cpu_ids. The caller of
+ *		bpf_per_cpu_ptr() must check the returned value.
+ *	Return
+ *		A generic pointer pointing to the variable on *cpu*.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3559,6 +3572,7 @@ union bpf_attr {
 	FN(skc_to_tcp_request_sock),	\
 	FN(skc_to_udp6_sock),		\
 	FN(get_task_stack),		\
+	FN(bpf_per_cpu_ptr),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index b6d8f653afe2..e735804f5f34 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -186,11 +186,6 @@
 	     i < btf_type_vlen(struct_type);				\
 	     i++, member++)
 
-#define for_each_vsi(i, struct_type, member)			\
-	for (i = 0, member = btf_type_var_secinfo(struct_type);	\
-	     i < btf_type_vlen(struct_type);			\
-	     i++, member++)
-
 #define for_each_vsi_from(i, from, struct_type, member)				\
 	for (i = from, member = btf_type_var_secinfo(struct_type) + from;	\
 	     i < btf_type_vlen(struct_type);					\
@@ -511,11 +506,6 @@ static const struct btf_var *btf_type_var(const struct btf_type *t)
 	return (const struct btf_var *)(t + 1);
 }
 
-static const struct btf_var_secinfo *btf_type_var_secinfo(const struct btf_type *t)
-{
-	return (const struct btf_var_secinfo *)(t + 1);
-}
-
 static const struct btf_kind_operations *btf_type_ops(const struct btf_type *t)
 {
 	return kind_ops[BTF_INFO_KIND(t->info)];
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 47badde71f83..c2db6308d6fa 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -238,6 +238,7 @@ struct bpf_call_arg_meta {
 	int ref_obj_id;
 	int func_id;
 	u32 btf_id;
+	u32 ret_btf_id;
 };
 
 struct btf *btf_vmlinux;
@@ -503,6 +504,7 @@ static const char * const reg_type_str[] = {
 	[PTR_TO_XDP_SOCK]	= "xdp_sock",
 	[PTR_TO_BTF_ID]		= "ptr_",
 	[PTR_TO_BTF_ID_OR_NULL]	= "ptr_or_null_",
+	[PTR_TO_PERCPU_BTF_ID]	= "percpu_ptr_",
 	[PTR_TO_MEM]		= "mem",
 	[PTR_TO_MEM_OR_NULL]	= "mem_or_null",
 	[PTR_TO_RDONLY_BUF]	= "rdonly_buf",
@@ -569,7 +571,9 @@ static void print_verifier_state(struct bpf_verifier_env *env,
 			/* reg->off should be 0 for SCALAR_VALUE */
 			verbose(env, "%lld", reg->var_off.value + reg->off);
 		} else {
-			if (t == PTR_TO_BTF_ID || t == PTR_TO_BTF_ID_OR_NULL)
+			if (t == PTR_TO_BTF_ID ||
+			    t == PTR_TO_BTF_ID_OR_NULL ||
+			    t == PTR_TO_PERCPU_BTF_ID)
 				verbose(env, "%s", kernel_type_name(reg->btf_id));
 			verbose(env, "(id=%d", reg->id);
 			if (reg_type_may_be_refcounted_or_null(t))
@@ -2183,6 +2187,7 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
 	case PTR_TO_RDONLY_BUF_OR_NULL:
 	case PTR_TO_RDWR_BUF:
 	case PTR_TO_RDWR_BUF_OR_NULL:
+	case PTR_TO_PERCPU_BTF_ID:
 		return true;
 	default:
 		return false;
@@ -3959,6 +3964,15 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			if (type != expected_type)
 				goto err_type;
 		}
+	} else if (arg_type == ARG_PTR_TO_PERCPU_BTF_ID) {
+		expected_type = PTR_TO_PERCPU_BTF_ID;
+		if (type != expected_type)
+			goto err_type;
+		if (!reg->btf_id) {
+			verbose(env, "Helper has zero btf_id in R%d\n", regno);
+			return -EACCES;
+		}
+		meta->ret_btf_id = reg->btf_id;
 	} else if (arg_type == ARG_PTR_TO_BTF_ID) {
 		expected_type = PTR_TO_BTF_ID;
 		if (type != expected_type)
@@ -4904,6 +4918,30 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 		regs[BPF_REG_0].type = PTR_TO_MEM_OR_NULL;
 		regs[BPF_REG_0].id = ++env->id_gen;
 		regs[BPF_REG_0].mem_size = meta.mem_size;
+	} else if (fn->ret_type == RET_PTR_TO_MEM_OR_BTF_OR_NULL) {
+		const struct btf_type *t;
+
+		mark_reg_known_zero(env, regs, BPF_REG_0);
+		t = btf_type_skip_modifiers(btf_vmlinux, meta.ret_btf_id, NULL);
+		if (!btf_type_is_struct(t)) {
+			u32 tsize;
+			const struct btf_type *ret;
+			const char *tname;
+
+			/* resolve the type size of ksym. */
+			ret = btf_resolve_size(btf_vmlinux, t, &tsize, NULL, NULL);
+			if (IS_ERR(ret)) {
+				tname = btf_name_by_offset(btf_vmlinux, t->name_off);
+				verbose(env, "unable to resolve the size of type '%s': %ld\n",
+					tname, PTR_ERR(ret));
+				return -EINVAL;
+			}
+			regs[BPF_REG_0].type = PTR_TO_MEM_OR_NULL;
+			regs[BPF_REG_0].mem_size = tsize;
+		} else {
+			regs[BPF_REG_0].type = PTR_TO_BTF_ID_OR_NULL;
+			regs[BPF_REG_0].btf_id = meta.ret_btf_id;
+		}
 	} else if (fn->ret_type == RET_PTR_TO_BTF_ID_OR_NULL) {
 		int ret_btf_id;
 
@@ -7210,10 +7248,15 @@ static inline int check_pseudo_btf_id(struct bpf_verifier_env *env,
 				      struct bpf_insn *insn)
 {
 	struct bpf_reg_state *regs = cur_regs(env);
-	u32 type, id = insn->imm;
+	u32 datasec_id, type, id = insn->imm;
 	u64 addr;
 	const char *sym_name;
 	const struct btf_type *t = btf_type_by_id(btf_vmlinux, id);
+	const struct btf_type *datasec;
+	const struct btf_var_secinfo *vsi;
+	int i;
+
+	bool percpu = false;
 
 	if (!t) {
 		verbose(env, "%s: invalid btf_id %d\n", __func__, id);
@@ -7243,9 +7286,24 @@ static inline int check_pseudo_btf_id(struct bpf_verifier_env *env,
 	insn[1].imm = addr >> 32;
 	mark_reg_known_zero(env, regs, insn->dst_reg);
 
+	datasec_id = btf_find_by_name_kind(btf_vmlinux, ".data..percpu",
+					   BTF_KIND_DATASEC);
+	if (datasec_id > 0) {
+		datasec = btf_type_by_id(btf_vmlinux, datasec_id);
+		for_each_vsi(i, datasec, vsi) {
+			if (vsi->type == id) {
+				percpu = true;
+				break;
+			}
+		}
+	}
+
 	type = t->type;
 	t = btf_type_skip_modifiers(btf_vmlinux, type, NULL);
-	if (!btf_type_is_struct(t)) {
+	if (percpu) {
+		regs[insn->dst_reg].type = PTR_TO_PERCPU_BTF_ID;
+		regs[insn->dst_reg].btf_id = type;
+	} else if (!btf_type_is_struct(t)) {
 		u32 tsize;
 		const struct btf_type *ret;
 		const char *tname;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index a8d4f253ed77..7f0033960d82 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1098,6 +1098,22 @@ static const struct bpf_func_proto bpf_send_signal_thread_proto = {
 	.arg1_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_2(bpf_per_cpu_ptr, const void *, ptr, u32, cpu)
+{
+	if (cpu >= nr_cpu_ids)
+		return 0;
+
+	return (u64)per_cpu_ptr(ptr, cpu);
+}
+
+static const struct bpf_func_proto bpf_per_cpu_ptr_proto = {
+	.func		= bpf_per_cpu_ptr,
+	.gpl_only	= false,
+	.ret_type	= RET_PTR_TO_MEM_OR_BTF_OR_NULL,
+	.arg1_type	= ARG_PTR_TO_PERCPU_BTF_ID,
+	.arg2_type	= ARG_ANYTHING,
+};
+
 const struct bpf_func_proto *
 bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1182,6 +1198,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_jiffies64_proto;
 	case BPF_FUNC_get_task_stack:
 		return &bpf_get_task_stack_proto;
+	case BPF_FUNC_bpf_per_cpu_ptr:
+		return &bpf_per_cpu_ptr_proto;
 	default:
 		return NULL;
 	}
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH bpf-next v1 7/8] bpf: Propagate bpf_per_cpu_ptr() to /tools
  2020-08-19 22:40 [PATCH bpf-next v1 0/8] bpf: BTF support for ksyms Hao Luo
                   ` (5 preceding siblings ...)
  2020-08-19 22:40 ` [PATCH bpf-next v1 6/8] bpf: Introduce bpf_per_cpu_ptr() Hao Luo
@ 2020-08-19 22:40 ` Hao Luo
  2020-08-20 17:30   ` Yonghong Song
  2020-08-19 22:40 ` [PATCH bpf-next v1 8/8] bpf/selftests: Test for bpf_per_cpu_ptr() Hao Luo
  7 siblings, 1 reply; 42+ messages in thread
From: Hao Luo @ 2020-08-19 22:40 UTC (permalink / raw)
  To: netdev, bpf, linux-kernel, linux-kselftest
  Cc: Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Hao Luo, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

Sync tools/include/linux/uapi/bpf.h with include/linux/uapi/bpf.h

Signed-off-by: Hao Luo <haoluo@google.com>
---
 tools/include/uapi/linux/bpf.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 468376f2910b..7e3dfb2bbb86 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3415,6 +3415,20 @@ union bpf_attr {
  *		A non-negative value equal to or less than *size* on success,
  *		or a negative error in case of failure.
  *
+ * void *bpf_per_cpu_ptr(const void *ptr, u32 cpu)
+ *	Description
+ *		Take the address of a percpu ksym and return a pointer pointing
+ *		to the variable on *cpu*. A ksym is an extern variable decorated
+ *		with '__ksym'. A ksym is percpu if there is a global percpu var
+ *		(either static or global) defined of the same name in the kernel.
+ *
+ *		bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the
+ *		kernel, except that bpf_per_cpu_ptr() may return NULL. This
+ *		happens if *cpu* is larger than nr_cpu_ids. The caller of
+ *		bpf_per_cpu_ptr() must check the returned value.
+ *	Return
+ *		A generic pointer pointing to the variable on *cpu*.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3559,6 +3573,7 @@ union bpf_attr {
 	FN(skc_to_tcp_request_sock),	\
 	FN(skc_to_udp6_sock),		\
 	FN(get_task_stack),		\
+	FN(bpf_per_cpu_ptr),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH bpf-next v1 8/8] bpf/selftests: Test for bpf_per_cpu_ptr()
  2020-08-19 22:40 [PATCH bpf-next v1 0/8] bpf: BTF support for ksyms Hao Luo
                   ` (6 preceding siblings ...)
  2020-08-19 22:40 ` [PATCH bpf-next v1 7/8] bpf: Propagate bpf_per_cpu_ptr() to /tools Hao Luo
@ 2020-08-19 22:40 ` Hao Luo
  2020-08-22  3:30   ` Andrii Nakryiko
  7 siblings, 1 reply; 42+ messages in thread
From: Hao Luo @ 2020-08-19 22:40 UTC (permalink / raw)
  To: netdev, bpf, linux-kernel, linux-kselftest
  Cc: Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Hao Luo, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

Test bpf_per_cpu_ptr(). Test two paths in the kernel. If the base
pointer points to a struct, the returned reg is of type PTR_TO_BTF_ID.
Direct pointer dereference can be applied on the returned variable.
If the base pointer isn't a struct, the returned reg is of type
PTR_TO_MEM, which also supports direct pointer dereference.

Signed-off-by: Hao Luo <haoluo@google.com>
---
 .../testing/selftests/bpf/prog_tests/ksyms_btf.c  |  4 ++++
 .../testing/selftests/bpf/progs/test_ksyms_btf.c  | 15 ++++++++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
index 1dad61ba7e99..bdedd4a76b42 100644
--- a/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
+++ b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
@@ -71,6 +71,10 @@ void test_ksyms_btf(void)
 	      "got %llu, exp %llu\n", data->out__runqueues, runqueues_addr);
 	CHECK(data->out__bpf_prog_active != bpf_prog_active_addr, "bpf_prog_active",
 	      "got %llu, exp %llu\n", data->out__bpf_prog_active, bpf_prog_active_addr);
+	CHECK(data->out__rq_cpu != 1, "rq_cpu",
+	      "got %u, exp %u\n", data->out__rq_cpu, 1);
+	CHECK(data->out__process_counts == -1, "process_counts",
+	      "got %lu, exp != -1", data->out__process_counts);
 
 cleanup:
 	test_ksyms_btf__destroy(skel);
diff --git a/tools/testing/selftests/bpf/progs/test_ksyms_btf.c b/tools/testing/selftests/bpf/progs/test_ksyms_btf.c
index e04e31117f84..78cf1ebb753d 100644
--- a/tools/testing/selftests/bpf/progs/test_ksyms_btf.c
+++ b/tools/testing/selftests/bpf/progs/test_ksyms_btf.c
@@ -7,16 +7,29 @@
 
 __u64 out__runqueues = -1;
 __u64 out__bpf_prog_active = -1;
+__u32 out__rq_cpu = -1;
+unsigned long out__process_counts = -1;
 
-extern const struct rq runqueues __ksym; /* struct type global var. */
+extern const struct rq runqueues __ksym; /* struct type percpu var. */
 extern const int bpf_prog_active __ksym; /* int type global var. */
+extern const unsigned long process_counts __ksym; /* int type percpu var. */
 
 SEC("raw_tp/sys_enter")
 int handler(const void *ctx)
 {
+	struct rq *rq;
+	unsigned long *count;
+
 	out__runqueues = (__u64)&runqueues;
 	out__bpf_prog_active = (__u64)&bpf_prog_active;
 
+	rq = (struct rq *)bpf_per_cpu_ptr(&runqueues, 1);
+	if (rq)
+		out__rq_cpu = rq->cpu;
+	count = (unsigned long *)bpf_per_cpu_ptr(&process_counts, 1);
+	if (count)
+		out__process_counts = *count;
+
 	return 0;
 }
 
-- 
2.28.0.220.ged08abb693-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 1/8] bpf: Introduce pseudo_btf_id
  2020-08-19 22:40 ` [PATCH bpf-next v1 1/8] bpf: Introduce pseudo_btf_id Hao Luo
@ 2020-08-20 15:22   ` Yonghong Song
  2020-08-20 17:04     ` Hao Luo
  2020-08-25  0:05     ` Hao Luo
  2020-08-20 16:43   ` Yonghong Song
  2020-08-20 21:53   ` Alexei Starovoitov
  2 siblings, 2 replies; 42+ messages in thread
From: Yonghong Song @ 2020-08-20 15:22 UTC (permalink / raw)
  To: Hao Luo, netdev, bpf, linux-kernel, linux-kselftest
  Cc: Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Quentin Monnet, Steven Rostedt, Ingo Molnar, Andrey Ignatov,
	Jakub Sitnicki



On 8/19/20 3:40 PM, Hao Luo wrote:
> Pseudo_btf_id is a type of ld_imm insn that associates a btf_id to a
> ksym so that further dereferences on the ksym can use the BTF info
> to validate accesses. Internally, when seeing a pseudo_btf_id ld insn,
> the verifier reads the btf_id stored in the insn[0]'s imm field and
> marks the dst_reg as PTR_TO_BTF_ID. The btf_id points to a VAR_KIND,
> which is encoded in btf_vminux by pahole. If the VAR is not of a struct
> type, the dst reg will be marked as PTR_TO_MEM instead of PTR_TO_BTF_ID
> and the mem_size is resolved to the size of the VAR's type.
> 
>  From the VAR btf_id, the verifier can also read the address of the
> ksym's corresponding kernel var from kallsyms and use that to fill
> dst_reg.
> 
> Therefore, the proper functionality of pseudo_btf_id depends on (1)
> kallsyms and (2) the encoding of kernel global VARs in pahole, which
> should be available since pahole v1.18.

I tried your patch with latest pahole but it did not generate
expected BTF_TYPE_VARs. My pahole head is:
   f3d9054ba8ff btf_encoder: Teach pahole to store percpu variables in 
vmlinux BTF.

First I made the following changes to facilitate debugging:
diff --git a/btf_encoder.c b/btf_encoder.c
index 982f59d..f94c3a6 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -334,6 +334,9 @@ int cu__encode_btf(struct cu *cu, int verbose, bool 
force)
                 /* percpu variables are allocated in global space */
                 if (variable__scope(var) != VSCOPE_GLOBAL)
                         continue;
+               /* type 0 is void, probably an internal error */
+               if (var->ip.tag.type == 0)
+                       continue;
                 has_global_var = true;
                 head = &hash_addr[hashaddr__fn(var->ip.addr)];
                 hlist_add_head(&var->tool_hnode, head);
@@ -399,8 +402,8 @@ int cu__encode_btf(struct cu *cu, int verbose, bool 
force)
                 }

                 if (verbose)
-                       printf("symbol '%s' of address 0x%lx encoded\n",
-                              sym_name, addr);
+                       printf("symbol '%s' of address 0x%lx encoded, 
type %u\n",
+                              sym_name, addr, type);

                 /* add a BTF_KIND_VAR in btfe->types */
                 linkage = var->external ? BTF_VAR_GLOBAL_ALLOCATED : 
BTF_VAR_STATIC;
diff --git a/libbtf.c b/libbtf.c
index 7a01ded..3a0d8d7 100644
--- a/libbtf.c
+++ b/libbtf.c
@@ -304,6 +304,8 @@ static const char * const btf_kind_str[NR_BTF_KINDS] = {
         [BTF_KIND_RESTRICT]     = "RESTRICT",
         [BTF_KIND_FUNC]         = "FUNC",
         [BTF_KIND_FUNC_PROTO]   = "FUNC_PROTO",
+       [BTF_KIND_VAR]          = "VAR",
+       [BTF_KIND_DATASEC]      = "DATASEC",
  };

  static const char *btf_elf__name_in_gobuf(const struct btf_elf *btfe, 
uint32_t offset)
@@ -671,7 +673,7 @@ int32_t btf_elf__add_var_type(struct btf_elf *btfe, 
uint32_t type, uint32_t name
                 return -1;
         }

-       btf_elf__log_type(btfe, &t.type, false, false, "type=%u name=%s",
+       btf_elf__log_type(btfe, &t.type, false, false, "type=%u name=%s\n",
                           t.type.type, btf_elf__name_in_gobuf(btfe, 
t.type.name_off));

         return btfe->type_index;

It would be good if you can add some of the above changes to
pahole for easier `pahole -JV` dump.

With the above change, I only got static per cpu variables.
For example,
    static DEFINE_PER_CPU(unsigned int , mirred_rec_level);
in net/sched/act_mirred.c.

[10] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)
[74536] VAR 'mirred_rec_level' type_id=10, linkage=static

The dwarf debug_info entry for `mirred_rec_level`:
0x0001d8d6:   DW_TAG_variable
                 DW_AT_name      ("mirred_rec_level")
                 DW_AT_decl_file 
("/data/users/yhs/work/net-next/net/sched/act_mirred.c")
                 DW_AT_decl_line (31)
                 DW_AT_decl_column       (0x08)
                 DW_AT_type      (0x00000063 "unsigned int")
                 DW_AT_location  (DW_OP_addr 0x0)
It is not a declaration and it contains type.

All global per cpu variables do not have BTF_KIND_VAR generated.
I did a brief investigation and found this mostly like to be a
pahole issue. For example, for global per cpu variable
bpf_prog_active,
   include/linux/bpf.h
         DECLARE_PER_CPU(int , bpf_prog_active);
   kernel/bpf/syscall.c
         DEFINE_PER_CPU(int , bpf_prog_active);
it is declared in the header include/linux/bpf.h and
defined in kernel/bpf/syscall.c.

In many cu's, you will see:
0x0003592a:   DW_TAG_variable
                 DW_AT_name      ("bpf_prog_active")
                 DW_AT_decl_file 
("/data/users/yhs/work/net-next/include/linux/bpf.h")
                 DW_AT_decl_line (1074)
                 DW_AT_decl_column       (0x01)
                 DW_AT_type      (0x0001fa7e "int")
                 DW_AT_external  (true)
                 DW_AT_declaration       (true)

In kernel/bpf/syscall.c, I see
the following dwarf entry for real definition:
0x00013534:   DW_TAG_variable
                 DW_AT_name      ("bpf_prog_active")
                 DW_AT_decl_file 
("/data/users/yhs/work/net-next/include/linux/bpf.h")
                 DW_AT_decl_line (1074)
                 DW_AT_decl_column       (0x01)
                 DW_AT_type      (0x000000d6 "int")
                 DW_AT_external  (true)
                 DW_AT_declaration       (true)

0x00021a25:   DW_TAG_variable
                 DW_AT_specification     (0x00013534 "bpf_prog_active")
                 DW_AT_decl_file 
("/data/users/yhs/work/net-next/kernel/bpf/syscall.c")
                 DW_AT_decl_line (43)
                 DW_AT_location  (DW_OP_addr 0x0)

Note that for the second entry DW_AT_specification points to the 
declaration. I am not 100% sure whether pahole handle this properly or 
not. It generates a type id 0 (void) for bpf_prog_active variable.

Could you investigate this a little more?

I am using gcc 8.2.1. Using kernel default dwarf (dwarf 2) exposed
the above issue. Tries to use dwarf 4 and the problem still exists.


> 
> Signed-off-by: Hao Luo <haoluo@google.com>
> ---
>   include/linux/btf.h      | 15 +++++++++
>   include/uapi/linux/bpf.h | 38 ++++++++++++++++------
>   kernel/bpf/btf.c         | 15 ---------
>   kernel/bpf/verifier.c    | 68 ++++++++++++++++++++++++++++++++++++++++
>   4 files changed, 112 insertions(+), 24 deletions(-)
> 
[...]

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 1/8] bpf: Introduce pseudo_btf_id
  2020-08-19 22:40 ` [PATCH bpf-next v1 1/8] bpf: Introduce pseudo_btf_id Hao Luo
  2020-08-20 15:22   ` Yonghong Song
@ 2020-08-20 16:43   ` Yonghong Song
  2020-08-20 21:53   ` Alexei Starovoitov
  2 siblings, 0 replies; 42+ messages in thread
From: Yonghong Song @ 2020-08-20 16:43 UTC (permalink / raw)
  To: Hao Luo, netdev, bpf, linux-kernel, linux-kselftest
  Cc: Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Quentin Monnet, Steven Rostedt, Ingo Molnar, Andrey Ignatov,
	Jakub Sitnicki



On 8/19/20 3:40 PM, Hao Luo wrote:
> Pseudo_btf_id is a type of ld_imm insn that associates a btf_id to a
> ksym so that further dereferences on the ksym can use the BTF info
> to validate accesses. Internally, when seeing a pseudo_btf_id ld insn,
> the verifier reads the btf_id stored in the insn[0]'s imm field and
> marks the dst_reg as PTR_TO_BTF_ID. The btf_id points to a VAR_KIND,
> which is encoded in btf_vminux by pahole. If the VAR is not of a struct
> type, the dst reg will be marked as PTR_TO_MEM instead of PTR_TO_BTF_ID
> and the mem_size is resolved to the size of the VAR's type.
> 
>  From the VAR btf_id, the verifier can also read the address of the
> ksym's corresponding kernel var from kallsyms and use that to fill
> dst_reg.
> 
> Therefore, the proper functionality of pseudo_btf_id depends on (1)
> kallsyms and (2) the encoding of kernel global VARs in pahole, which
> should be available since pahole v1.18.
> 
> Signed-off-by: Hao Luo <haoluo@google.com>
> ---
>   include/linux/btf.h      | 15 +++++++++
>   include/uapi/linux/bpf.h | 38 ++++++++++++++++------
>   kernel/bpf/btf.c         | 15 ---------
>   kernel/bpf/verifier.c    | 68 ++++++++++++++++++++++++++++++++++++++++
>   4 files changed, 112 insertions(+), 24 deletions(-)
> 
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 8b81fbb4497c..cee4089e83c0 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -107,6 +107,21 @@ static inline bool btf_type_is_func_proto(const struct btf_type *t)
>   	return BTF_INFO_KIND(t->info) == BTF_KIND_FUNC_PROTO;
>   }
>   
> +static inline bool btf_type_is_var(const struct btf_type *t)
> +{
> +	return BTF_INFO_KIND(t->info) == BTF_KIND_VAR;
> +}
> +
> +/* union is only a special case of struct:
> + * all its offsetof(member) == 0
> + */
> +static inline bool btf_type_is_struct(const struct btf_type *t)
> +{
> +	u8 kind = BTF_INFO_KIND(t->info);
> +
> +	return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
> +}
> +
>   static inline u16 btf_type_vlen(const struct btf_type *t)
>   {
>   	return BTF_INFO_VLEN(t->info);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 0480f893facd..468376f2910b 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -346,18 +346,38 @@ enum bpf_link_type {
>   #define BPF_F_TEST_STATE_FREQ	(1U << 3)
>   
>   /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
> - * two extensions:
> - *
> - * insn[0].src_reg:  BPF_PSEUDO_MAP_FD   BPF_PSEUDO_MAP_VALUE
> - * insn[0].imm:      map fd              map fd
> - * insn[1].imm:      0                   offset into value
> - * insn[0].off:      0                   0
> - * insn[1].off:      0                   0
> - * ldimm64 rewrite:  address of map      address of map[0]+offset
> - * verifier type:    CONST_PTR_TO_MAP    PTR_TO_MAP_VALUE
> + * the following extensions:
> + *
> + * insn[0].src_reg:  BPF_PSEUDO_MAP_FD
> + * insn[0].imm:      map fd
> + * insn[1].imm:      0
> + * insn[0].off:      0
> + * insn[1].off:      0
> + * ldimm64 rewrite:  address of map
> + * verifier type:    CONST_PTR_TO_MAP
>    */
>   #define BPF_PSEUDO_MAP_FD	1
> +/*
> + * insn[0].src_reg:  BPF_PSEUDO_MAP_VALUE
> + * insn[0].imm:      map fd
> + * insn[1].imm:      offset into value
> + * insn[0].off:      0
> + * insn[1].off:      0
> + * ldimm64 rewrite:  address of map[0]+offset
> + * verifier type:    PTR_TO_MAP_VALUE
> + */
>   #define BPF_PSEUDO_MAP_VALUE	2
> +/*
> + * insn[0].src_reg:  BPF_PSEUDO_BTF_ID
> + * insn[0].imm:      kernel btd id of VAR
> + * insn[1].imm:      0
> + * insn[0].off:      0
> + * insn[1].off:      0
> + * ldimm64 rewrite:  address of the kernel variable
> + * verifier type:    PTR_TO_BTF_ID or PTR_TO_MEM, depending on whether the var
> + *                   is struct/union.
> + */
> +#define BPF_PSEUDO_BTF_ID	3
>   
>   /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
>    * offset to another bpf function
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 91afdd4c82e3..b6d8f653afe2 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -353,16 +353,6 @@ static bool btf_type_nosize_or_null(const struct btf_type *t)
>   	return !t || btf_type_nosize(t);
>   }
>   
> -/* union is only a special case of struct:
> - * all its offsetof(member) == 0
> - */
> -static bool btf_type_is_struct(const struct btf_type *t)
> -{
> -	u8 kind = BTF_INFO_KIND(t->info);
> -
> -	return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
> -}
> -
>   static bool __btf_type_is_struct(const struct btf_type *t)
>   {
>   	return BTF_INFO_KIND(t->info) == BTF_KIND_STRUCT;
> @@ -373,11 +363,6 @@ static bool btf_type_is_array(const struct btf_type *t)
>   	return BTF_INFO_KIND(t->info) == BTF_KIND_ARRAY;
>   }
>   
> -static bool btf_type_is_var(const struct btf_type *t)
> -{
> -	return BTF_INFO_KIND(t->info) == BTF_KIND_VAR;
> -}
> -
>   static bool btf_type_is_datasec(const struct btf_type *t)
>   {
>   	return BTF_INFO_KIND(t->info) == BTF_KIND_DATASEC;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index ef938f17b944..47badde71f83 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -7205,6 +7205,68 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
>   	return 0;
>   }
>   
> +/* verify ld_imm64 insn of type PSEUDO_BTF_ID is valid */
> +static inline int check_pseudo_btf_id(struct bpf_verifier_env *env,
> +				      struct bpf_insn *insn)
> +{
> +	struct bpf_reg_state *regs = cur_regs(env);
> +	u32 type, id = insn->imm;
> +	u64 addr;
> +	const char *sym_name;
> +	const struct btf_type *t = btf_type_by_id(btf_vmlinux, id);

Since this is new code, please try to conform to reverse christmas tree 
coding style. For the last one, the assignment no need to be in
declaration, you can put "t = ..." right before the first use of "t".

same for other places.

> +
> +	if (!t) {
> +		verbose(env, "%s: invalid btf_id %d\n", __func__, id);
> +		return -ENOENT;
> +	}
> +
> +	if (insn[1].imm != 0) {
> +		verbose(env, "%s: BPF_PSEUDO_BTF_ID uses reserved fields\n",
> +			__func__);
> +		return -EINVAL;
> +	}
> +
> +	if (!btf_type_is_var(t)) {
> +		verbose(env, "%s: btf_id %d isn't KIND_VAR\n", __func__, id);
> +		return -EINVAL;
> +	}
> +
> +	sym_name = btf_name_by_offset(btf_vmlinux, t->name_off);
> +	addr = kallsyms_lookup_name(sym_name);
> +	if (!addr) {
> +		verbose(env, "%s: failed to find the address of symbol '%s'.\n",
> +			__func__, sym_name);
> +		return -ENOENT;
> +	}
> +
> +	insn[0].imm = (u32)addr;
> +	insn[1].imm = addr >> 32;
> +	mark_reg_known_zero(env, regs, insn->dst_reg);
> +
> +	type = t->type;
> +	t = btf_type_skip_modifiers(btf_vmlinux, type, NULL);
> +	if (!btf_type_is_struct(t)) {
> +		u32 tsize;
> +		const struct btf_type *ret;
> +		const char *tname; > +
> +		/* resolve the type size of ksym. */
> +		ret = btf_resolve_size(btf_vmlinux, t, &tsize, NULL, NULL);
> +		if (IS_ERR(ret)) {
> +			tname = btf_name_by_offset(btf_vmlinux, t->name_off);
> +			verbose(env, "unable to resolve the size of type '%s': %ld\n",
> +				tname, PTR_ERR(ret));
> +			return -EINVAL;
> +		}
> +		regs[insn->dst_reg].type = PTR_TO_MEM;
> +		regs[insn->dst_reg].mem_size = tsize;
> +	} else {
> +		regs[insn->dst_reg].type = PTR_TO_BTF_ID;
> +		regs[insn->dst_reg].btf_id = type;
> +	}
> +	return 0;
> +}
> +
>   /* verify BPF_LD_IMM64 instruction */
[...]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 2/8] bpf: Propagate BPF_PSEUDO_BTF_ID to uapi headers in /tools
  2020-08-19 22:40 ` [PATCH bpf-next v1 2/8] bpf: Propagate BPF_PSEUDO_BTF_ID to uapi headers in /tools Hao Luo
@ 2020-08-20 16:44   ` Yonghong Song
  0 siblings, 0 replies; 42+ messages in thread
From: Yonghong Song @ 2020-08-20 16:44 UTC (permalink / raw)
  To: Hao Luo, netdev, bpf, linux-kernel, linux-kselftest
  Cc: Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Quentin Monnet, Steven Rostedt, Ingo Molnar, Andrey Ignatov,
	Jakub Sitnicki



On 8/19/20 3:40 PM, Hao Luo wrote:
> Propagate BPF_PSEUDO_BTF_ID from include/linux/uapi/bpf.h to
> tools/include/linux/uapi/bpf.h.

This can be folded into the previous patch.

> 
> Signed-off-by: Hao Luo <haoluo@google.com>
> ---
>   tools/include/uapi/linux/bpf.h | 38 ++++++++++++++++++++++++++--------
>   1 file changed, 29 insertions(+), 9 deletions(-)
> 
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 0480f893facd..468376f2910b 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -346,18 +346,38 @@ enum bpf_link_type {
>   #define BPF_F_TEST_STATE_FREQ	(1U << 3)
>   
>   /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
> - * two extensions:
> - *
> - * insn[0].src_reg:  BPF_PSEUDO_MAP_FD   BPF_PSEUDO_MAP_VALUE
> - * insn[0].imm:      map fd              map fd
> - * insn[1].imm:      0                   offset into value
> - * insn[0].off:      0                   0
> - * insn[1].off:      0                   0
> - * ldimm64 rewrite:  address of map      address of map[0]+offset
> - * verifier type:    CONST_PTR_TO_MAP    PTR_TO_MAP_VALUE
> + * the following extensions:
> + *
> + * insn[0].src_reg:  BPF_PSEUDO_MAP_FD
> + * insn[0].imm:      map fd
> + * insn[1].imm:      0
> + * insn[0].off:      0
> + * insn[1].off:      0
> + * ldimm64 rewrite:  address of map
> + * verifier type:    CONST_PTR_TO_MAP
>    */
>   #define BPF_PSEUDO_MAP_FD	1
> +/*
> + * insn[0].src_reg:  BPF_PSEUDO_MAP_VALUE
> + * insn[0].imm:      map fd
> + * insn[1].imm:      offset into value
> + * insn[0].off:      0
> + * insn[1].off:      0
> + * ldimm64 rewrite:  address of map[0]+offset
> + * verifier type:    PTR_TO_MAP_VALUE
> + */
>   #define BPF_PSEUDO_MAP_VALUE	2
> +/*
> + * insn[0].src_reg:  BPF_PSEUDO_BTF_ID
> + * insn[0].imm:      kernel btd id of VAR
> + * insn[1].imm:      0
> + * insn[0].off:      0
> + * insn[1].off:      0
> + * ldimm64 rewrite:  address of the kernel variable
> + * verifier type:    PTR_TO_BTF_ID or PTR_TO_MEM, depending on whether the var
> + *                   is struct/union.
> + */
> +#define BPF_PSEUDO_BTF_ID	3
>   
>   /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
>    * offset to another bpf function
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 1/8] bpf: Introduce pseudo_btf_id
  2020-08-20 15:22   ` Yonghong Song
@ 2020-08-20 17:04     ` Hao Luo
  2020-08-25  0:05     ` Hao Luo
  1 sibling, 0 replies; 42+ messages in thread
From: Hao Luo @ 2020-08-20 17:04 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Quentin Monnet, Steven Rostedt, Ingo Molnar, Andrey Ignatov,
	Jakub Sitnicki, Arnaldo Carvalho de Melo

Yonghong,

Thank you for taking a look. Explicitly cc'ing Arnaldo to see if he
has any immediate insights. In the meantime, I'll dedicate time to
investigate this issue you found.

Thanks,
Hao


On Thu, Aug 20, 2020 at 8:23 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 8/19/20 3:40 PM, Hao Luo wrote:
> > Pseudo_btf_id is a type of ld_imm insn that associates a btf_id to a
> > ksym so that further dereferences on the ksym can use the BTF info
> > to validate accesses. Internally, when seeing a pseudo_btf_id ld insn,
> > the verifier reads the btf_id stored in the insn[0]'s imm field and
> > marks the dst_reg as PTR_TO_BTF_ID. The btf_id points to a VAR_KIND,
> > which is encoded in btf_vminux by pahole. If the VAR is not of a struct
> > type, the dst reg will be marked as PTR_TO_MEM instead of PTR_TO_BTF_ID
> > and the mem_size is resolved to the size of the VAR's type.
> >
> >  From the VAR btf_id, the verifier can also read the address of the
> > ksym's corresponding kernel var from kallsyms and use that to fill
> > dst_reg.
> >
> > Therefore, the proper functionality of pseudo_btf_id depends on (1)
> > kallsyms and (2) the encoding of kernel global VARs in pahole, which
> > should be available since pahole v1.18.
>
> I tried your patch with latest pahole but it did not generate
> expected BTF_TYPE_VARs. My pahole head is:
>    f3d9054ba8ff btf_encoder: Teach pahole to store percpu variables in
> vmlinux BTF.
>
> First I made the following changes to facilitate debugging:
> diff --git a/btf_encoder.c b/btf_encoder.c
> index 982f59d..f94c3a6 100644
> --- a/btf_encoder.c
> +++ b/btf_encoder.c
> @@ -334,6 +334,9 @@ int cu__encode_btf(struct cu *cu, int verbose, bool
> force)
>                  /* percpu variables are allocated in global space */
>                  if (variable__scope(var) != VSCOPE_GLOBAL)
>                          continue;
> +               /* type 0 is void, probably an internal error */
> +               if (var->ip.tag.type == 0)
> +                       continue;
>                  has_global_var = true;
>                  head = &hash_addr[hashaddr__fn(var->ip.addr)];
>                  hlist_add_head(&var->tool_hnode, head);
> @@ -399,8 +402,8 @@ int cu__encode_btf(struct cu *cu, int verbose, bool
> force)
>                  }
>
>                  if (verbose)
> -                       printf("symbol '%s' of address 0x%lx encoded\n",
> -                              sym_name, addr);
> +                       printf("symbol '%s' of address 0x%lx encoded,
> type %u\n",
> +                              sym_name, addr, type);
>
>                  /* add a BTF_KIND_VAR in btfe->types */
>                  linkage = var->external ? BTF_VAR_GLOBAL_ALLOCATED :
> BTF_VAR_STATIC;
> diff --git a/libbtf.c b/libbtf.c
> index 7a01ded..3a0d8d7 100644
> --- a/libbtf.c
> +++ b/libbtf.c
> @@ -304,6 +304,8 @@ static const char * const btf_kind_str[NR_BTF_KINDS] = {
>          [BTF_KIND_RESTRICT]     = "RESTRICT",
>          [BTF_KIND_FUNC]         = "FUNC",
>          [BTF_KIND_FUNC_PROTO]   = "FUNC_PROTO",
> +       [BTF_KIND_VAR]          = "VAR",
> +       [BTF_KIND_DATASEC]      = "DATASEC",
>   };
>
>   static const char *btf_elf__name_in_gobuf(const struct btf_elf *btfe,
> uint32_t offset)
> @@ -671,7 +673,7 @@ int32_t btf_elf__add_var_type(struct btf_elf *btfe,
> uint32_t type, uint32_t name
>                  return -1;
>          }
>
> -       btf_elf__log_type(btfe, &t.type, false, false, "type=%u name=%s",
> +       btf_elf__log_type(btfe, &t.type, false, false, "type=%u name=%s\n",
>                            t.type.type, btf_elf__name_in_gobuf(btfe,
> t.type.name_off));
>
>          return btfe->type_index;
>
> It would be good if you can add some of the above changes to
> pahole for easier `pahole -JV` dump.
>
> With the above change, I only got static per cpu variables.
> For example,
>     static DEFINE_PER_CPU(unsigned int , mirred_rec_level);
> in net/sched/act_mirred.c.
>
> [10] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)
> [74536] VAR 'mirred_rec_level' type_id=10, linkage=static
>
> The dwarf debug_info entry for `mirred_rec_level`:
> 0x0001d8d6:   DW_TAG_variable
>                  DW_AT_name      ("mirred_rec_level")
>                  DW_AT_decl_file
> ("/data/users/yhs/work/net-next/net/sched/act_mirred.c")
>                  DW_AT_decl_line (31)
>                  DW_AT_decl_column       (0x08)
>                  DW_AT_type      (0x00000063 "unsigned int")
>                  DW_AT_location  (DW_OP_addr 0x0)
> It is not a declaration and it contains type.
>
> All global per cpu variables do not have BTF_KIND_VAR generated.
> I did a brief investigation and found this mostly like to be a
> pahole issue. For example, for global per cpu variable
> bpf_prog_active,
>    include/linux/bpf.h
>          DECLARE_PER_CPU(int , bpf_prog_active);
>    kernel/bpf/syscall.c
>          DEFINE_PER_CPU(int , bpf_prog_active);
> it is declared in the header include/linux/bpf.h and
> defined in kernel/bpf/syscall.c.
>
> In many cu's, you will see:
> 0x0003592a:   DW_TAG_variable
>                  DW_AT_name      ("bpf_prog_active")
>                  DW_AT_decl_file
> ("/data/users/yhs/work/net-next/include/linux/bpf.h")
>                  DW_AT_decl_line (1074)
>                  DW_AT_decl_column       (0x01)
>                  DW_AT_type      (0x0001fa7e "int")
>                  DW_AT_external  (true)
>                  DW_AT_declaration       (true)
>
> In kernel/bpf/syscall.c, I see
> the following dwarf entry for real definition:
> 0x00013534:   DW_TAG_variable
>                  DW_AT_name      ("bpf_prog_active")
>                  DW_AT_decl_file
> ("/data/users/yhs/work/net-next/include/linux/bpf.h")
>                  DW_AT_decl_line (1074)
>                  DW_AT_decl_column       (0x01)
>                  DW_AT_type      (0x000000d6 "int")
>                  DW_AT_external  (true)
>                  DW_AT_declaration       (true)
>
> 0x00021a25:   DW_TAG_variable
>                  DW_AT_specification     (0x00013534 "bpf_prog_active")
>                  DW_AT_decl_file
> ("/data/users/yhs/work/net-next/kernel/bpf/syscall.c")
>                  DW_AT_decl_line (43)
>                  DW_AT_location  (DW_OP_addr 0x0)
>
> Note that for the second entry DW_AT_specification points to the
> declaration. I am not 100% sure whether pahole handle this properly or
> not. It generates a type id 0 (void) for bpf_prog_active variable.
>
> Could you investigate this a little more?
>
> I am using gcc 8.2.1. Using kernel default dwarf (dwarf 2) exposed
> the above issue. Tries to use dwarf 4 and the problem still exists.
>
>
> >
> > Signed-off-by: Hao Luo <haoluo@google.com>
> > ---
> >   include/linux/btf.h      | 15 +++++++++
> >   include/uapi/linux/bpf.h | 38 ++++++++++++++++------
> >   kernel/bpf/btf.c         | 15 ---------
> >   kernel/bpf/verifier.c    | 68 ++++++++++++++++++++++++++++++++++++++++
> >   4 files changed, 112 insertions(+), 24 deletions(-)
> >
> [...]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 3/8] bpf: Introduce help function to validate ksym's type.
  2020-08-19 22:40 ` [PATCH bpf-next v1 3/8] bpf: Introduce help function to validate ksym's type Hao Luo
@ 2020-08-20 17:20   ` Yonghong Song
  2020-08-21 21:50     ` Andrii Nakryiko
  0 siblings, 1 reply; 42+ messages in thread
From: Yonghong Song @ 2020-08-20 17:20 UTC (permalink / raw)
  To: Hao Luo, netdev, bpf, linux-kernel, linux-kselftest
  Cc: Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Quentin Monnet, Steven Rostedt, Ingo Molnar, Andrey Ignatov,
	Jakub Sitnicki



On 8/19/20 3:40 PM, Hao Luo wrote:
> For a ksym to be safely dereferenced and accessed, its type defined in
> bpf program should basically match its type defined in kernel. Implement
> a help function for a quick matching, which is used by libbpf when
> resolving the kernel btf_id of a ksym.
> 
> Signed-off-by: Hao Luo <haoluo@google.com>
> ---
>   tools/lib/bpf/btf.c | 171 ++++++++++++++++++++++++++++++++++++++++++++
>   tools/lib/bpf/btf.h |   2 +
>   2 files changed, 173 insertions(+)
> 
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index a3d259e614b0..2ff31f244d7a 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -1005,6 +1005,177 @@ int btf__get_map_kv_tids(const struct btf *btf, const char *map_name,
>   	return 0;
>   }
>   
> +/*
> + * Basic type check for ksym support. Only checks type kind and resolved size.
> + */
> +static inline
> +bool btf_ksym_equal_type(const struct btf *ba, __u32 type_a,
> +			 const struct btf *bb, __u32 type_b)

"ba" and "bb" is not descriptive. Maybe "btf_a" or "btf_b"?
or even "btf1" or "btf2" since the number does not carry
extra meaning compared to letters.

The same for below, may be t1, t2?

> +{
> +	const struct btf_type *ta, *tb;
> +
> +	ta = btf__type_by_id(ba, type_a);
> +	tb = btf__type_by_id(bb, type_b);
> +
> +	/* compare type kind */
> +	if (btf_kind(ta) != btf_kind(tb))
> +		return false;
> +
> +	/* compare resolved type size */
> +	return btf__resolve_size(ba, type_a) == btf__resolve_size(bb, type_b);
> +}
> +
> +/*
> + * Match a ksym's type defined in bpf programs against its type encoded in
> + * kernel btf.
> + */
> +bool btf_ksym_type_match(const struct btf *ba, __u32 id_a,
> +			 const struct btf *bb, __u32 id_b)
> +{
> +	const struct btf_type *ta = btf__type_by_id(ba, id_a);
> +	const struct btf_type *tb = btf__type_by_id(bb, id_b);
> +	int i;
> +
> +	/* compare type kind */
> +	if (btf_kind(ta) != btf_kind(tb)) {
> +		pr_warn("%s:mismatched type kind (%d v.s. %d).\n",
> +			__func__, btf_kind(ta), btf_kind(tb));
> +		return false;
> +	}
> +
> +	switch (btf_kind(ta)) {
> +	case BTF_KIND_INT: { /* compare size and encoding */
> +		__u32 ea, eb;
> +
> +		if (ta->size != tb->size) {
> +			pr_warn("%s:INT size mismatch, (%u v.s. %u)\n",
> +				__func__, ta->size, tb->size);
> +			return false;
> +		}
> +		ea = *(__u32 *)(ta + 1);
> +		eb = *(__u32 *)(tb + 1);
> +		if (ea != eb) {
> +			pr_warn("%s:INT encoding mismatch (%u v.s. %u)\n",
> +				__func__, ea, eb);
> +			return false;
> +		}
> +		break;
> +	}
> +	case BTF_KIND_ARRAY: { /* compare type and number of elements */
> +		const struct btf_array *ea, *eb;
> +
> +		ea = btf_array(ta);
> +		eb = btf_array(tb);
> +		if (!btf_ksym_equal_type(ba, ea->type, bb, eb->type)) {
> +			pr_warn("%s:ARRAY elem type mismatch.\n", __func__);
> +			return false;
> +		}
> +		if (ea->nelems != eb->nelems) {
> +			pr_warn("%s:ARRAY nelems mismatch (%d v.s. %d)\n",
> +				__func__, ea->nelems, eb->nelems);
> +			return false;
> +		}
> +		break;
> +	}
> +	case BTF_KIND_STRUCT:
> +	case BTF_KIND_UNION: { /* compare size, vlen and member offset, name */
> +		const struct btf_member *ma, *mb;
> +
> +		if (ta->size != tb->size) {
> +			pr_warn("%s:STRUCT size mismatch, (%u v.s. %u)\n",
> +				__func__, ta->size, tb->size);
> +			return false;
> +		}
> +		if (btf_vlen(ta) != btf_vlen(tb)) {
> +			pr_warn("%s:STRUCT vlen mismatch, (%u v.s. %u)\n",
> +				__func__, btf_vlen(ta), btf_vlen(tb));
> +			return false;
> +		}
> +
> +		ma = btf_members(ta);
> +		mb = btf_members(tb);
> +		for (i = 0; i < btf_vlen(ta); i++, ma++, mb++) {
> +			const char *na, *nb;
> +
> +			if (ma->offset != mb->offset) {
> +				pr_warn("%s:STRUCT field offset mismatch, (%u v.s. %u)\n",
> +					__func__, ma->offset, mb->offset);
> +				return false;
> +			}
> +			na = btf__name_by_offset(ba, ma->name_off);
> +			nb = btf__name_by_offset(bb, mb->name_off);
> +			if (strcmp(na, nb)) {
> +				pr_warn("%s:STRUCT field name mismatch, (%s v.s. %s)\n",
> +					__func__, na, nb);
> +				return false;
> +			}
> +		}

I am wondering whether this is too strict and how this can co-work with 
CO-RE. Forcing users to write almost identical structure definition to 
the underlying kernel will not be user friendly and may not work cross
kernel versions even if the field user cares have not changed.

Maybe we can relax the constraint here. You can look at existing
libbpf CO-RE code.

> +		break;
> +	}
> +	case BTF_KIND_ENUM: { /* compare vlen and member value, name */
> +		const struct btf_enum *ma, *mb;
> +
> +		if (btf_vlen(ta) != btf_vlen(tb)) {
> +			pr_warn("%s:ENUM vlen mismatch, (%u v.s. %u)\n",
> +				__func__, btf_vlen(ta), btf_vlen(tb));
> +			return false;
> +		}
> +
> +		ma = btf_enum(ta);
> +		mb = btf_enum(tb);
> +		for (i = 0; i < btf_vlen(ta); i++, ma++, mb++) {
> +			if (ma->val != mb->val) {
> +				pr_warn("%s:ENUM val mismatch, (%u v.s. %u)\n",
> +					__func__, ma->val, mb->val);
> +				return false;
> +			}
> +		}
> +		break;
> +	}
> +	case BTF_KIND_PTR: { /* naive compare of ref type for PTR */
> +		if (!btf_ksym_equal_type(ba, ta->type, bb, tb->type)) {
> +			pr_warn("%s:PTR ref type mismatch.\n", __func__);
> +			return false;
> +		}
> +		break;
> +	}
> +	case BTF_KIND_FUNC_PROTO: { /* naive compare of vlen and param types */
> +		const struct btf_param *pa, *pb;
> +
> +		if (btf_vlen(ta) != btf_vlen(tb)) {
> +			pr_warn("%s:FUNC_PROTO vlen mismatch, (%u v.s. %u)\n",
> +				__func__, btf_vlen(ta), btf_vlen(tb));
> +			return false;
> +		}
> +
> +		pa = btf_params(ta);
> +		pb = btf_params(tb);
> +		for (i = 0; i < btf_vlen(ta); i++, pa++, pb++) {
> +			if (!btf_ksym_equal_type(ba, pa->type, bb, pb->type)) {
> +				pr_warn("%s:FUNC_PROTO params type mismatch.\n",
> +					__func__);
> +				return false;
> +			}
> +		}
> +		break;
> +	}
> +	case BTF_KIND_FUNC:
> +	case BTF_KIND_CONST:
> +	case BTF_KIND_VOLATILE:
> +	case BTF_KIND_RESTRICT:
> +	case BTF_KIND_TYPEDEF:
> +	case BTF_KIND_VAR:
> +	case BTF_KIND_DATASEC:
> +		pr_warn("unexpected type for matching ksym types.\n");
> +		return false;
> +	default:
> +		pr_warn("unsupported btf types.\n");
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
>   struct btf_ext_sec_setup_param {
>   	__u32 off;
>   	__u32 len;
> diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
> index 91f0ad0e0325..5ef220e52485 100644
> --- a/tools/lib/bpf/btf.h
> +++ b/tools/lib/bpf/btf.h
> @@ -52,6 +52,8 @@ LIBBPF_API int btf__get_map_kv_tids(const struct btf *btf, const char *map_name,
>   				    __u32 expected_key_size,
>   				    __u32 expected_value_size,
>   				    __u32 *key_type_id, __u32 *value_type_id);
> +LIBBPF_API bool btf_ksym_type_match(const struct btf *ba, __u32 id_a,
> +				    const struct btf *bb, __u32 id_b);
>   
>   LIBBPF_API struct btf_ext *btf_ext__new(__u8 *data, __u32 size);
>   LIBBPF_API void btf_ext__free(struct btf_ext *btf_ext);

The new API function should be added to libbpf.map.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 5/8] bpf/selftests: ksyms_btf to test typed ksyms
  2020-08-19 22:40 ` [PATCH bpf-next v1 5/8] bpf/selftests: ksyms_btf to test " Hao Luo
@ 2020-08-20 17:28   ` Yonghong Song
  2020-08-21 23:03     ` Andrii Nakryiko
  0 siblings, 1 reply; 42+ messages in thread
From: Yonghong Song @ 2020-08-20 17:28 UTC (permalink / raw)
  To: Hao Luo, netdev, bpf, linux-kernel, linux-kselftest
  Cc: Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Quentin Monnet, Steven Rostedt, Ingo Molnar, Andrey Ignatov,
	Jakub Sitnicki



On 8/19/20 3:40 PM, Hao Luo wrote:
> Selftests for typed ksyms. Tests two types of ksyms: one is a struct,
> the other is a plain int. This tests two paths in the kernel. Struct
> ksyms will be converted into PTR_TO_BTF_ID by the verifier while int
> typed ksyms will be converted into PTR_TO_MEM.
> 
> Signed-off-by: Hao Luo <haoluo@google.com>
> ---
>   .../selftests/bpf/prog_tests/ksyms_btf.c      | 77 +++++++++++++++++++
>   .../selftests/bpf/progs/test_ksyms_btf.c      | 23 ++++++
>   2 files changed, 100 insertions(+)
>   create mode 100644 tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
>   create mode 100644 tools/testing/selftests/bpf/progs/test_ksyms_btf.c
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> new file mode 100644
> index 000000000000..1dad61ba7e99
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> @@ -0,0 +1,77 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2020 Google */
> +
> +#include <test_progs.h>
> +#include <bpf/libbpf.h>
> +#include <bpf/btf.h>
> +#include "test_ksyms_btf.skel.h"
> +
> +static int duration;
> +
> +static __u64 kallsyms_find(const char *sym)
> +{
> +	char type, name[500];
> +	__u64 addr, res = 0;
> +	FILE *f;
> +
> +	f = fopen("/proc/kallsyms", "r");
> +	if (CHECK(!f, "kallsyms_fopen", "failed to open: %d\n", errno))
> +		return 0;

could you check whether libbpf API can provide this functionality for 
you? As far as I know, libbpf does parse /proc/kallsyms.

> +
> +	while (fscanf(f, "%llx %c %499s%*[^\n]\n", &addr, &type, name) > 0) {
> +		if (strcmp(name, sym) == 0) {
> +			res = addr;
> +			goto out;
> +		}
> +	}
> +
> +	CHECK(false, "not_found", "symbol %s not found\n", sym);
> +out:
> +	fclose(f);
> +	return res;
> +}
> +
> +void test_ksyms_btf(void)
> +{
> +	__u64 runqueues_addr = kallsyms_find("runqueues");
> +	__u64 bpf_prog_active_addr = kallsyms_find("bpf_prog_active");
> +	struct test_ksyms_btf *skel;
> +	struct test_ksyms_btf__data *data;
> +	struct btf *btf;
> +	int percpu_datasec;
> +	int err;
> +
> +	btf = libbpf_find_kernel_btf();
> +	if (CHECK(IS_ERR(btf), "btf_exists", "failed to load kernel BTF: %ld\n",
> +		  PTR_ERR(btf)))
> +		return;
> +
> +	percpu_datasec = btf__find_by_name_kind(btf, ".data..percpu",
> +						BTF_KIND_DATASEC);
> +	if (percpu_datasec < 0) {
> +		printf("%s:SKIP:no PERCPU DATASEC in kernel btf\n",
> +		       __func__);
> +		test__skip();
> +		return;
> +	}
> +
> +	skel = test_ksyms_btf__open_and_load();
> +	if (CHECK(!skel, "skel_open", "failed to open and load skeleton\n"))
> +		return;
> +
> +	err = test_ksyms_btf__attach(skel);
> +	if (CHECK(err, "skel_attach", "skeleton attach failed: %d\n", err))
> +		goto cleanup;
> +
> +	/* trigger tracepoint */
> +	usleep(1);
> +
> +	data = skel->data;
> +	CHECK(data->out__runqueues != runqueues_addr, "runqueues",
> +	      "got %llu, exp %llu\n", data->out__runqueues, runqueues_addr);
> +	CHECK(data->out__bpf_prog_active != bpf_prog_active_addr, "bpf_prog_active",
> +	      "got %llu, exp %llu\n", data->out__bpf_prog_active, bpf_prog_active_addr);
> +
> +cleanup:
> +	test_ksyms_btf__destroy(skel);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_ksyms_btf.c b/tools/testing/selftests/bpf/progs/test_ksyms_btf.c
> new file mode 100644
> index 000000000000..e04e31117f84
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_ksyms_btf.c
> @@ -0,0 +1,23 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2020 Google */
> +
> +#include "vmlinux.h"
> +
> +#include <bpf/bpf_helpers.h>
> +
> +__u64 out__runqueues = -1;
> +__u64 out__bpf_prog_active = -1;
> +
> +extern const struct rq runqueues __ksym; /* struct type global var. */
> +extern const int bpf_prog_active __ksym; /* int type global var. */
> +
> +SEC("raw_tp/sys_enter")
> +int handler(const void *ctx)
> +{
> +	out__runqueues = (__u64)&runqueues;
> +	out__bpf_prog_active = (__u64)&bpf_prog_active;
> +
> +	return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 7/8] bpf: Propagate bpf_per_cpu_ptr() to /tools
  2020-08-19 22:40 ` [PATCH bpf-next v1 7/8] bpf: Propagate bpf_per_cpu_ptr() to /tools Hao Luo
@ 2020-08-20 17:30   ` Yonghong Song
  0 siblings, 0 replies; 42+ messages in thread
From: Yonghong Song @ 2020-08-20 17:30 UTC (permalink / raw)
  To: Hao Luo, netdev, bpf, linux-kernel, linux-kselftest
  Cc: Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Quentin Monnet, Steven Rostedt, Ingo Molnar, Andrey Ignatov,
	Jakub Sitnicki



On 8/19/20 3:40 PM, Hao Luo wrote:
> Sync tools/include/linux/uapi/bpf.h with include/linux/uapi/bpf.h

This can be folded into the previous patch.

> 
> Signed-off-by: Hao Luo <haoluo@google.com>
> ---
>   tools/include/uapi/linux/bpf.h | 15 +++++++++++++++
>   1 file changed, 15 insertions(+)
> 
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 468376f2910b..7e3dfb2bbb86 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -3415,6 +3415,20 @@ union bpf_attr {
>    *		A non-negative value equal to or less than *size* on success,
>    *		or a negative error in case of failure.
>    *
> + * void *bpf_per_cpu_ptr(const void *ptr, u32 cpu)
> + *	Description
> + *		Take the address of a percpu ksym and return a pointer pointing
> + *		to the variable on *cpu*. A ksym is an extern variable decorated
> + *		with '__ksym'. A ksym is percpu if there is a global percpu var
> + *		(either static or global) defined of the same name in the kernel.
> + *
> + *		bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the
> + *		kernel, except that bpf_per_cpu_ptr() may return NULL. This
> + *		happens if *cpu* is larger than nr_cpu_ids. The caller of
> + *		bpf_per_cpu_ptr() must check the returned value.
> + *	Return
> + *		A generic pointer pointing to the variable on *cpu*.
> + *
>    */
>   #define __BPF_FUNC_MAPPER(FN)		\
>   	FN(unspec),			\
> @@ -3559,6 +3573,7 @@ union bpf_attr {
>   	FN(skc_to_tcp_request_sock),	\
>   	FN(skc_to_udp6_sock),		\
>   	FN(get_task_stack),		\
> +	FN(bpf_per_cpu_ptr),		\
>   	/* */
>   
>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 1/8] bpf: Introduce pseudo_btf_id
  2020-08-19 22:40 ` [PATCH bpf-next v1 1/8] bpf: Introduce pseudo_btf_id Hao Luo
  2020-08-20 15:22   ` Yonghong Song
  2020-08-20 16:43   ` Yonghong Song
@ 2020-08-20 21:53   ` Alexei Starovoitov
  2020-08-21  2:22     ` Hao Luo
  2 siblings, 1 reply; 42+ messages in thread
From: Alexei Starovoitov @ 2020-08-20 21:53 UTC (permalink / raw)
  To: Hao Luo
  Cc: netdev, bpf, linux-kernel, linux-kselftest, Shuah Khan,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

On Wed, Aug 19, 2020 at 03:40:23PM -0700, Hao Luo wrote:
> +
>  /* verify BPF_LD_IMM64 instruction */
>  static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
>  {
> @@ -7234,6 +7296,9 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
>  		return 0;
>  	}
>  
> +	if (insn->src_reg == BPF_PSEUDO_BTF_ID)
> +		return check_pseudo_btf_id(env, insn);
> +
>  	map = env->used_maps[aux->map_index];
>  	mark_reg_known_zero(env, regs, insn->dst_reg);
>  	regs[insn->dst_reg].map_ptr = map;
> @@ -9255,6 +9320,9 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>  				/* valid generic load 64-bit imm */
>  				goto next_insn;
>  
> +			if (insn[0].src_reg == BPF_PSEUDO_BTF_ID)
> +				goto next_insn;
> +

Why did you choose to do it during main do_check() walk instead of this pre-pass ?
check_ld_imm() can be called multiple times for the same insn,
so it's faster and less surprising to do it during replace_map_fd_with_map_ptr().
BTF needs to be parsed first, of course.
You can either move check_btf_info() before replace_map_fd_with_map_ptr() or
move replace_map_fd_with_map_ptr() after check_btf_info().
The latter is probably cleaner.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 1/8] bpf: Introduce pseudo_btf_id
  2020-08-20 21:53   ` Alexei Starovoitov
@ 2020-08-21  2:22     ` Hao Luo
  0 siblings, 0 replies; 42+ messages in thread
From: Hao Luo @ 2020-08-21  2:22 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

On Thu, Aug 20, 2020 at 2:53 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> Why did you choose to do it during main do_check() walk instead of this pre-pass ?
> check_ld_imm() can be called multiple times for the same insn,
> so it's faster and less surprising to do it during replace_map_fd_with_map_ptr().
> BTF needs to be parsed first, of course.
> You can either move check_btf_info() before replace_map_fd_with_map_ptr() or
> move replace_map_fd_with_map_ptr() after check_btf_info().
> The latter is probably cleaner.

Ah, that does make more sense. I can move
replace_map_fd_with_map_ptr() after check_btf_info() (or
check_attach_btf_id()) and rename it to better reflect the fact that
it's not just for maps now.

Thanks for the insights!
Hao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 3/8] bpf: Introduce help function to validate ksym's type.
  2020-08-20 17:20   ` Yonghong Song
@ 2020-08-21 21:50     ` Andrii Nakryiko
  2020-08-22  0:43       ` Hao Luo
  0 siblings, 1 reply; 42+ messages in thread
From: Andrii Nakryiko @ 2020-08-21 21:50 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Hao Luo, Networking, bpf, open list,
	open list:KERNEL SELFTEST FRAMEWORK, Shuah Khan,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Quentin Monnet, Steven Rostedt, Ingo Molnar, Andrey Ignatov,
	Jakub Sitnicki

On Thu, Aug 20, 2020 at 10:22 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 8/19/20 3:40 PM, Hao Luo wrote:
> > For a ksym to be safely dereferenced and accessed, its type defined in
> > bpf program should basically match its type defined in kernel. Implement
> > a help function for a quick matching, which is used by libbpf when
> > resolving the kernel btf_id of a ksym.
> >
> > Signed-off-by: Hao Luo <haoluo@google.com>
> > ---
> >   tools/lib/bpf/btf.c | 171 ++++++++++++++++++++++++++++++++++++++++++++
> >   tools/lib/bpf/btf.h |   2 +
> >   2 files changed, 173 insertions(+)
> >
> > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > index a3d259e614b0..2ff31f244d7a 100644
> > --- a/tools/lib/bpf/btf.c
> > +++ b/tools/lib/bpf/btf.c
> > @@ -1005,6 +1005,177 @@ int btf__get_map_kv_tids(const struct btf *btf, const char *map_name,
> >       return 0;
> >   }
> >
> > +/*
> > + * Basic type check for ksym support. Only checks type kind and resolved size.
> > + */
> > +static inline
> > +bool btf_ksym_equal_type(const struct btf *ba, __u32 type_a,
> > +                      const struct btf *bb, __u32 type_b)
>
> "ba" and "bb" is not descriptive. Maybe "btf_a" or "btf_b"?
> or even "btf1" or "btf2" since the number does not carry
> extra meaning compared to letters.
>
> The same for below, may be t1, t2?
>
> > +{
> > +     const struct btf_type *ta, *tb;
> > +
> > +     ta = btf__type_by_id(ba, type_a);
> > +     tb = btf__type_by_id(bb, type_b);
> > +
> > +     /* compare type kind */
> > +     if (btf_kind(ta) != btf_kind(tb))
> > +             return false;
> > +
> > +     /* compare resolved type size */
> > +     return btf__resolve_size(ba, type_a) == btf__resolve_size(bb, type_b);
> > +}
> > +
> > +/*
> > + * Match a ksym's type defined in bpf programs against its type encoded in
> > + * kernel btf.
> > + */
> > +bool btf_ksym_type_match(const struct btf *ba, __u32 id_a,
> > +                      const struct btf *bb, __u32 id_b)
> > +{

[...]

> > +                     }
> > +             }
>
> I am wondering whether this is too strict and how this can co-work with
> CO-RE. Forcing users to write almost identical structure definition to
> the underlying kernel will not be user friendly and may not work cross
> kernel versions even if the field user cares have not changed.
>
> Maybe we can relax the constraint here. You can look at existing
> libbpf CO-RE code.

Right. Hao, can you just re-use bpf_core_types_are_compat() instead?
See if semantics makes sense, but I think it should. BPF CO-RE has
been permissive in terms of struct size and few other type aspects,
because it handles relocations so well. This approach allows to not
have to exactly match all possible variations of some struct
definition, which is a big problem with ever-changing kernel data
structures.

>
> > +             break;
> > +     }

[...]

> > +
> >   struct btf_ext_sec_setup_param {
> >       __u32 off;
> >       __u32 len;
> > diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
> > index 91f0ad0e0325..5ef220e52485 100644
> > --- a/tools/lib/bpf/btf.h
> > +++ b/tools/lib/bpf/btf.h
> > @@ -52,6 +52,8 @@ LIBBPF_API int btf__get_map_kv_tids(const struct btf *btf, const char *map_name,
> >                                   __u32 expected_key_size,
> >                                   __u32 expected_value_size,
> >                                   __u32 *key_type_id, __u32 *value_type_id);
> > +LIBBPF_API bool btf_ksym_type_match(const struct btf *ba, __u32 id_a,
> > +                                 const struct btf *bb, __u32 id_b);
> >
> >   LIBBPF_API struct btf_ext *btf_ext__new(__u8 *data, __u32 size);
> >   LIBBPF_API void btf_ext__free(struct btf_ext *btf_ext);
>
> The new API function should be added to libbpf.map.

My question is why does this even have to be a public API?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 4/8] bpf/libbpf: BTF support for typed ksyms
  2020-08-19 22:40 ` [PATCH bpf-next v1 4/8] bpf/libbpf: BTF support for typed ksyms Hao Luo
@ 2020-08-21 22:37   ` Andrii Nakryiko
  2020-08-27 22:29     ` Hao Luo
  0 siblings, 1 reply; 42+ messages in thread
From: Andrii Nakryiko @ 2020-08-21 22:37 UTC (permalink / raw)
  To: Hao Luo
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

On Wed, Aug 19, 2020 at 3:42 PM Hao Luo <haoluo@google.com> wrote:
>
> If a ksym is defined with a type, libbpf will try to find the ksym's btf
> information from kernel btf. If a valid btf entry for the ksym is found,
> libbpf can pass in the found btf id to the verifier, which validates the
> ksym's type and value.
>
> Typeless ksyms (i.e. those defined as 'void') will not have such btf_id,
> but it has the symbol's address (read from kallsyms) and its value is
> treated as a raw pointer.
>
> Signed-off-by: Hao Luo <haoluo@google.com>
> ---
>  tools/lib/bpf/libbpf.c | 130 ++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 114 insertions(+), 16 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 4a81c6b2d21b..94eff612c7c2 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -357,7 +357,16 @@ struct extern_desc {
>                         bool is_signed;
>                 } kcfg;
>                 struct {
> -                       unsigned long long addr;
> +                       /*
> +                        *  1. If ksym is typeless, the field 'addr' is valid.
> +                        *  2. If ksym is typed, the field 'vmlinux_btf_id' is
> +                        *     valid.
> +                        */
> +                       bool is_typeless;
> +                       union {
> +                               unsigned long long addr;
> +                               int vmlinux_btf_id;
> +                       };

ksym is 16 bytes anyways, union doesn't help to save space. I propose
to encode all this with just two fields: vmlinux_btf_id and addr. If
btf_id == 0, then extern is typeless.

>                 } ksym;
>         };
>  };
> @@ -382,6 +391,7 @@ struct bpf_object {
>
>         bool loaded;
>         bool has_pseudo_calls;
> +       bool has_typed_ksyms;
>
>         /*
>          * Information when doing elf related work. Only valid if fd
> @@ -2521,6 +2531,10 @@ static int bpf_object__load_vmlinux_btf(struct bpf_object *obj)
>         if (obj->btf_ext && obj->btf_ext->core_relo_info.len)
>                 need_vmlinux_btf = true;
>
> +       /* Support for typed ksyms needs kernel BTF */
> +       if (obj->has_typed_ksyms)
> +               need_vmlinux_btf = true;

On the second read, I don't think you really need `has_typed_ksyms` at
all. Just iterate over ksym externs and see if you have a typed one.
It's the only place that cares.

> +
>         bpf_object__for_each_program(prog, obj) {
>                 if (!prog->load)
>                         continue;
> @@ -2975,10 +2989,10 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
>                         ext->type = EXT_KSYM;
>
>                         vt = skip_mods_and_typedefs(obj->btf, t->type, NULL);
> -                       if (!btf_is_void(vt)) {
> -                               pr_warn("extern (ksym) '%s' is not typeless (void)\n", ext_name);
> -                               return -ENOTSUP;
> -                       }
> +                       ext->ksym.is_typeless = btf_is_void(vt);
> +
> +                       if (!obj->has_typed_ksyms && !ext->ksym.is_typeless)
> +                               obj->has_typed_ksyms = true;

nit: keep it simple:

if (ext->ksym.is_typeless)
  obj->has_typed_ksyms = true;

>                 } else {
>                         pr_warn("unrecognized extern section '%s'\n", sec_name);
>                         return -ENOTSUP;
> @@ -2992,9 +3006,9 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
>         /* sort externs by type, for kcfg ones also by (align, size, name) */
>         qsort(obj->externs, obj->nr_extern, sizeof(*ext), cmp_externs);
>
> -       /* for .ksyms section, we need to turn all externs into allocated
> -        * variables in BTF to pass kernel verification; we do this by
> -        * pretending that each extern is a 8-byte variable
> +       /* for .ksyms section, we need to turn all typeless externs into
> +        * allocated variables in BTF to pass kernel verification; we do
> +        * this by pretending that each typeless extern is a 8-byte variable
>          */
>         if (ksym_sec) {
>                 /* find existing 4-byte integer type in BTF to use for fake
> @@ -3012,7 +3026,7 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
>
>                 sec = ksym_sec;
>                 n = btf_vlen(sec);
> -               for (i = 0, off = 0; i < n; i++, off += sizeof(int)) {
> +               for (i = 0, off = 0; i < n; i++) {
>                         struct btf_var_secinfo *vs = btf_var_secinfos(sec) + i;
>                         struct btf_type *vt;
>
> @@ -3025,9 +3039,14 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
>                                 return -ESRCH;
>                         }
>                         btf_var(vt)->linkage = BTF_VAR_GLOBAL_ALLOCATED;
> -                       vt->type = int_btf_id;
> +                       if (ext->ksym.is_typeless) {
> +                               vt->type = int_btf_id;
> +                               vs->size = sizeof(int);
> +                       }
>                         vs->offset = off;
> -                       vs->size = sizeof(int);
> +                       off += vs->size;
> +                       pr_debug("ksym var_secinfo: var '%s', type #%d, size %d, offset %d\n",
> +                                ext->name, vt->type, vs->size, vs->offset);

It's a bit of a waste that we still allocate memory for those typed
ksym externs, as they don't really need space. But modifying BTF is a
pain right now, so I think we'll have to do it, until we have a better
BTF API. But let's make them integers for now to take a fixed and
small amount of space.

>                 }
>                 sec->size = off;
>         }
> @@ -5300,8 +5319,13 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
>                                 insn[0].imm = obj->maps[obj->kconfig_map_idx].fd;
>                                 insn[1].imm = ext->kcfg.data_off;
>                         } else /* EXT_KSYM */ {
> -                               insn[0].imm = (__u32)ext->ksym.addr;
> -                               insn[1].imm = ext->ksym.addr >> 32;
> +                               if (ext->ksym.is_typeless) { /* typelss ksyms */

typo: typeless

> +                                       insn[0].imm = (__u32)ext->ksym.addr;
> +                                       insn[1].imm = ext->ksym.addr >> 32;
> +                               } else { /* typed ksyms */
> +                                       insn[0].src_reg = BPF_PSEUDO_BTF_ID;
> +                                       insn[0].imm = ext->ksym.vmlinux_btf_id;
> +                               }
>                         }
>                         break;
>                 case RELO_CALL:
> @@ -6019,6 +6043,10 @@ static int bpf_object__read_kallsyms_file(struct bpf_object *obj)
>                 if (!ext || ext->type != EXT_KSYM)
>                         continue;
>
> +               /* Typed ksyms have the verifier to fill their addresses. */
> +               if (!ext->ksym.is_typeless)
> +                       continue;

It might still be a good idea to try to find the symbol in kallsyms
and emit nicer message if it's not there. Think about the user's
experience when some global variable is removed from the kernel (or
compiled out due to missing Kconfig). If libbpf can easily detect
this, we should do it and provide a good error message.

> +
>                 if (ext->is_set && ext->ksym.addr != sym_addr) {
>                         pr_warn("extern (ksym) '%s' resolution is ambiguous: 0x%llx or 0x%llx\n",
>                                 sym_name, ext->ksym.addr, sym_addr);
> @@ -6037,10 +6065,72 @@ static int bpf_object__read_kallsyms_file(struct bpf_object *obj)
>         return err;
>  }
>
> +static int bpf_object__resolve_ksyms_btf_id(struct bpf_object *obj)
> +{
> +       struct extern_desc *ext;
> +       int i, id;
> +
> +       if (!obj->btf_vmlinux) {
> +               pr_warn("support of typed ksyms needs kernel btf.\n");
> +               return -ENOENT;
> +       }
> +
> +       for (i = 0; i < obj->nr_extern; i++) {
> +               const struct btf_type *v, *vx; /* VARs in object and vmlinux btf */
> +               const struct btf_type *t, *tx; /* TYPEs in btf */
> +               __u32 vt, vtx; /* btf_ids of TYPEs */

I use targ_ and local_ prefixes with CO-RE to distinguish something
that's coming from BPF object's BTF vs kernel's BTF, can you please
adopt the same, as the process is essentially the same. "vx" vs "vtx"
vs "v" are hard to distinguish and understand.

> +
> +               ext = &obj->externs[i];
> +               if (ext->type != EXT_KSYM)
> +                       continue;
> +
> +               if (ext->ksym.is_typeless)
> +                       continue;

nit: combine into a single filter condition (we need typed ksym,
that's one check)

> +
> +               if (ext->is_set) {
> +                       pr_warn("typed ksym '%s' resolved as typeless ksyms.\n",
> +                               ext->name);
> +                       return -EFAULT;

this is a bug if that happens, that should be caught by test, please
remove unnecessary check

> +               }
> +
> +               id = btf__find_by_name_kind(obj->btf_vmlinux, ext->name,
> +                                           BTF_KIND_VAR);
> +               if (id <= 0) {
> +                       pr_warn("no btf entry for ksym '%s' in vmlinux.\n",
> +                               ext->name);
> +                       return -ESRCH;
> +               }
> +
> +               vx = btf__type_by_id(obj->btf_vmlinux, id);
> +               tx = skip_mods_and_typedefs(obj->btf_vmlinux, vx->type, &vtx);
> +
> +               v = btf__type_by_id(obj->btf, ext->btf_id);
> +               t = skip_mods_and_typedefs(obj->btf, v->type, &vt);
> +
> +               if (!btf_ksym_type_match(obj->btf_vmlinux, vtx, obj->btf, vt)) {
> +                       const char *tname, *txname; /* names of TYPEs */
> +
> +                       txname = btf__name_by_offset(obj->btf_vmlinux, tx->name_off);
> +                       tname = btf__name_by_offset(obj->btf, t->name_off);
> +
> +                       pr_warn("ksym '%s' expects type '%s' (vmlinux_btf_id: #%d), "
> +                               "but got '%s' (btf_id: #%d)\n", ext->name,
> +                               txname, vtx, tname, vt);
> +                       return -EINVAL;
> +               }

yeah, definitely just use bpf_core_types_are_compat() here. You'll
want to skip_mods_and_typedefs first, but everything else should work
for your use case.

> +
> +               ext->is_set = true;
> +               ext->ksym.vmlinux_btf_id = id;
> +               pr_debug("extern (ksym) %s=vmlinux_btf_id(#%d)\n", ext->name, id);
> +       }
> +       return 0;
> +}
> +

[...]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 5/8] bpf/selftests: ksyms_btf to test typed ksyms
  2020-08-20 17:28   ` Yonghong Song
@ 2020-08-21 23:03     ` Andrii Nakryiko
  2020-08-22  7:26       ` Hao Luo
  0 siblings, 1 reply; 42+ messages in thread
From: Andrii Nakryiko @ 2020-08-21 23:03 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Hao Luo, Networking, bpf, open list,
	open list:KERNEL SELFTEST FRAMEWORK, Shuah Khan,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Quentin Monnet, Steven Rostedt, Ingo Molnar, Andrey Ignatov,
	Jakub Sitnicki

On Thu, Aug 20, 2020 at 10:32 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 8/19/20 3:40 PM, Hao Luo wrote:
> > Selftests for typed ksyms. Tests two types of ksyms: one is a struct,
> > the other is a plain int. This tests two paths in the kernel. Struct
> > ksyms will be converted into PTR_TO_BTF_ID by the verifier while int
> > typed ksyms will be converted into PTR_TO_MEM.
> >
> > Signed-off-by: Hao Luo <haoluo@google.com>
> > ---
> >   .../selftests/bpf/prog_tests/ksyms_btf.c      | 77 +++++++++++++++++++
> >   .../selftests/bpf/progs/test_ksyms_btf.c      | 23 ++++++
> >   2 files changed, 100 insertions(+)
> >   create mode 100644 tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> >   create mode 100644 tools/testing/selftests/bpf/progs/test_ksyms_btf.c
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> > new file mode 100644
> > index 000000000000..1dad61ba7e99
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> > @@ -0,0 +1,77 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2020 Google */
> > +
> > +#include <test_progs.h>
> > +#include <bpf/libbpf.h>
> > +#include <bpf/btf.h>
> > +#include "test_ksyms_btf.skel.h"
> > +
> > +static int duration;
> > +
> > +static __u64 kallsyms_find(const char *sym)
> > +{
> > +     char type, name[500];
> > +     __u64 addr, res = 0;
> > +     FILE *f;
> > +
> > +     f = fopen("/proc/kallsyms", "r");
> > +     if (CHECK(!f, "kallsyms_fopen", "failed to open: %d\n", errno))
> > +             return 0;
>
> could you check whether libbpf API can provide this functionality for
> you? As far as I know, libbpf does parse /proc/kallsyms.

No need to use libbpf's implementation. We already have
kallsyms_find() in prog_tests/ksyms.c and a combination of
load_kallsyms() + ksym_get_addr() in trace_helpers.c. It would be good
to switch to one implementation for both prog_tests/ksyms.c and this
one.


>
> > +
> > +     while (fscanf(f, "%llx %c %499s%*[^\n]\n", &addr, &type, name) > 0) {
> > +             if (strcmp(name, sym) == 0) {
> > +                     res = addr;
> > +                     goto out;
> > +             }
> > +     }
> > +

[...]

> > diff --git a/tools/testing/selftests/bpf/progs/test_ksyms_btf.c b/tools/testing/selftests/bpf/progs/test_ksyms_btf.c
> > new file mode 100644
> > index 000000000000..e04e31117f84
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/test_ksyms_btf.c
> > @@ -0,0 +1,23 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2020 Google */
> > +
> > +#include "vmlinux.h"
> > +
> > +#include <bpf/bpf_helpers.h>
> > +
> > +__u64 out__runqueues = -1;
> > +__u64 out__bpf_prog_active = -1;
> > +
> > +extern const struct rq runqueues __ksym; /* struct type global var. */
> > +extern const int bpf_prog_active __ksym; /* int type global var. */
> > +
> > +SEC("raw_tp/sys_enter")
> > +int handler(const void *ctx)
> > +{
> > +     out__runqueues = (__u64)&runqueues;
> > +     out__bpf_prog_active = (__u64)&bpf_prog_active;
> > +

You didn't test accessing any of the members of runqueues, because BTF
only has per-CPU variables, right? Adding global/static variables was
adding too much data to BTF or something like that, is that right?

> > +     return 0;
> > +}
> > +
> > +char _license[] SEC("license") = "GPL";
> >

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 3/8] bpf: Introduce help function to validate ksym's type.
  2020-08-21 21:50     ` Andrii Nakryiko
@ 2020-08-22  0:43       ` Hao Luo
  2020-08-22  2:43         ` Andrii Nakryiko
  0 siblings, 1 reply; 42+ messages in thread
From: Hao Luo @ 2020-08-22  0:43 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Yonghong Song, Networking, bpf, open list,
	open list:KERNEL SELFTEST FRAMEWORK, Shuah Khan,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Quentin Monnet, Steven Rostedt, Ingo Molnar, Andrey Ignatov,
	Jakub Sitnicki

On Fri, Aug 21, 2020 at 2:50 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Aug 20, 2020 at 10:22 AM Yonghong Song <yhs@fb.com> wrote:
> >
> >
> >
> > On 8/19/20 3:40 PM, Hao Luo wrote:
> > > For a ksym to be safely dereferenced and accessed, its type defined in
> > > bpf program should basically match its type defined in kernel. Implement
> > > a help function for a quick matching, which is used by libbpf when
> > > resolving the kernel btf_id of a ksym.
> > >
> > > Signed-off-by: Hao Luo <haoluo@google.com>
> > > ---
[...]
> > > +/*
> > > + * Match a ksym's type defined in bpf programs against its type encoded in
> > > + * kernel btf.
> > > + */
> > > +bool btf_ksym_type_match(const struct btf *ba, __u32 id_a,
> > > +                      const struct btf *bb, __u32 id_b)
> > > +{
>
> [...]
>
> > > +                     }
> > > +             }
> >
> > I am wondering whether this is too strict and how this can co-work with
> > CO-RE. Forcing users to write almost identical structure definition to
> > the underlying kernel will not be user friendly and may not work cross
> > kernel versions even if the field user cares have not changed.
> >
> > Maybe we can relax the constraint here. You can look at existing
> > libbpf CO-RE code.
>
> Right. Hao, can you just re-use bpf_core_types_are_compat() instead?
> See if semantics makes sense, but I think it should. BPF CO-RE has
> been permissive in terms of struct size and few other type aspects,
> because it handles relocations so well. This approach allows to not
> have to exactly match all possible variations of some struct
> definition, which is a big problem with ever-changing kernel data
> structures.
>

I have to say I hate myself writing another type comparison instead of
reusing the existing one. The issue is that when bpf_core_types_compat
compares names, it uses t1->name_off == t2->name_off. It is also used
in bpf_equal_common(). In my case, because these types are from two
different BTFs, their name_off are not expected to be the same, right?
I didn't find a good solution to refactor before posting this patch. I
think I can adapt bpf_core_type_compat() and pay more attention to
CO-RE.

> >
> > > +             break;
> > > +     }
>
> [...]
>
> > > +
> > >   struct btf_ext_sec_setup_param {
> > >       __u32 off;
> > >       __u32 len;
> > > diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
> > > index 91f0ad0e0325..5ef220e52485 100644
> > > --- a/tools/lib/bpf/btf.h
> > > +++ b/tools/lib/bpf/btf.h
> > > @@ -52,6 +52,8 @@ LIBBPF_API int btf__get_map_kv_tids(const struct btf *btf, const char *map_name,
> > >                                   __u32 expected_key_size,
> > >                                   __u32 expected_value_size,
> > >                                   __u32 *key_type_id, __u32 *value_type_id);
> > > +LIBBPF_API bool btf_ksym_type_match(const struct btf *ba, __u32 id_a,
> > > +                                 const struct btf *bb, __u32 id_b);
> > >
> > >   LIBBPF_API struct btf_ext *btf_ext__new(__u8 *data, __u32 size);
> > >   LIBBPF_API void btf_ext__free(struct btf_ext *btf_ext);
> >
> > The new API function should be added to libbpf.map.
>
> My question is why does this even have to be a public API?

I can fix. Please pardon my ignorance, what is the difference between
public and internal APIs? I wasn't sure, so used it improperly.

Thanks,
Hao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 3/8] bpf: Introduce help function to validate ksym's type.
  2020-08-22  0:43       ` Hao Luo
@ 2020-08-22  2:43         ` Andrii Nakryiko
  2020-08-22  7:04           ` Hao Luo
  0 siblings, 1 reply; 42+ messages in thread
From: Andrii Nakryiko @ 2020-08-22  2:43 UTC (permalink / raw)
  To: Hao Luo
  Cc: Yonghong Song, Networking, bpf, open list,
	open list:KERNEL SELFTEST FRAMEWORK, Shuah Khan,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Quentin Monnet, Steven Rostedt, Ingo Molnar, Andrey Ignatov,
	Jakub Sitnicki

On Fri, Aug 21, 2020 at 5:43 PM Hao Luo <haoluo@google.com> wrote:
>
> On Fri, Aug 21, 2020 at 2:50 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Aug 20, 2020 at 10:22 AM Yonghong Song <yhs@fb.com> wrote:
> > >
> > >
> > >
> > > On 8/19/20 3:40 PM, Hao Luo wrote:
> > > > For a ksym to be safely dereferenced and accessed, its type defined in
> > > > bpf program should basically match its type defined in kernel. Implement
> > > > a help function for a quick matching, which is used by libbpf when
> > > > resolving the kernel btf_id of a ksym.
> > > >
> > > > Signed-off-by: Hao Luo <haoluo@google.com>
> > > > ---
> [...]
> > > > +/*
> > > > + * Match a ksym's type defined in bpf programs against its type encoded in
> > > > + * kernel btf.
> > > > + */
> > > > +bool btf_ksym_type_match(const struct btf *ba, __u32 id_a,
> > > > +                      const struct btf *bb, __u32 id_b)
> > > > +{
> >
> > [...]
> >
> > > > +                     }
> > > > +             }
> > >
> > > I am wondering whether this is too strict and how this can co-work with
> > > CO-RE. Forcing users to write almost identical structure definition to
> > > the underlying kernel will not be user friendly and may not work cross
> > > kernel versions even if the field user cares have not changed.
> > >
> > > Maybe we can relax the constraint here. You can look at existing
> > > libbpf CO-RE code.
> >
> > Right. Hao, can you just re-use bpf_core_types_are_compat() instead?
> > See if semantics makes sense, but I think it should. BPF CO-RE has
> > been permissive in terms of struct size and few other type aspects,
> > because it handles relocations so well. This approach allows to not
> > have to exactly match all possible variations of some struct
> > definition, which is a big problem with ever-changing kernel data
> > structures.
> >
>
> I have to say I hate myself writing another type comparison instead of
> reusing the existing one. The issue is that when bpf_core_types_compat
> compares names, it uses t1->name_off == t2->name_off. It is also used

Huh? Are we talking about the same bpf_core_types_are_compat() (there
is no bpf_core_types_compat, I think it's a typo)?
bpf_core_types_are_compat() doesn't even compare any name, so I'm not
sure what you are talking about. Some of btf_dedup functions do string
comparisons using name_off directly, but that's a special and very
careful case, it's not relevant here.


> in bpf_equal_common(). In my case, because these types are from two
> different BTFs, their name_off are not expected to be the same, right?
> I didn't find a good solution to refactor before posting this patch. I

bpf_core_types_are_compat() didn't land until this week, so you must
be confusing something. Please take another look.

> think I can adapt bpf_core_type_compat() and pay more attention to
> CO-RE.
>
> > >
> > > > +             break;
> > > > +     }
> >
> > [...]
> >
> > > > +
> > > >   struct btf_ext_sec_setup_param {
> > > >       __u32 off;
> > > >       __u32 len;
> > > > diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
> > > > index 91f0ad0e0325..5ef220e52485 100644
> > > > --- a/tools/lib/bpf/btf.h
> > > > +++ b/tools/lib/bpf/btf.h
> > > > @@ -52,6 +52,8 @@ LIBBPF_API int btf__get_map_kv_tids(const struct btf *btf, const char *map_name,
> > > >                                   __u32 expected_key_size,
> > > >                                   __u32 expected_value_size,
> > > >                                   __u32 *key_type_id, __u32 *value_type_id);
> > > > +LIBBPF_API bool btf_ksym_type_match(const struct btf *ba, __u32 id_a,
> > > > +                                 const struct btf *bb, __u32 id_b);
> > > >
> > > >   LIBBPF_API struct btf_ext *btf_ext__new(__u8 *data, __u32 size);
> > > >   LIBBPF_API void btf_ext__free(struct btf_ext *btf_ext);
> > >
> > > The new API function should be added to libbpf.map.
> >
> > My question is why does this even have to be a public API?
>
> I can fix. Please pardon my ignorance, what is the difference between
> public and internal APIs? I wasn't sure, so used it improperly.

public APIs are those that users of libbpf are supposed to use,
internal one is just for libbpf internal use. The former can't change,
the latter can be refactor as much as we need to.

>
> Thanks,
> Hao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 6/8] bpf: Introduce bpf_per_cpu_ptr()
  2020-08-19 22:40 ` [PATCH bpf-next v1 6/8] bpf: Introduce bpf_per_cpu_ptr() Hao Luo
@ 2020-08-22  3:26   ` Andrii Nakryiko
  2020-08-22  3:31     ` Andrii Nakryiko
  0 siblings, 1 reply; 42+ messages in thread
From: Andrii Nakryiko @ 2020-08-22  3:26 UTC (permalink / raw)
  To: Hao Luo
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

On Wed, Aug 19, 2020 at 3:42 PM Hao Luo <haoluo@google.com> wrote:
>
> Add bpf_per_cpu_ptr() to help bpf programs access percpu vars.
> bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the kernel
> except that it may return NULL. This happens when the cpu parameter is
> out of range. So the caller must check the returned value.
>
> Signed-off-by: Hao Luo <haoluo@google.com>
> ---

The logic looks correct, few naming nits, but otherwise:

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  include/linux/bpf.h      |  3 ++
>  include/linux/btf.h      | 11 +++++++
>  include/uapi/linux/bpf.h | 14 +++++++++
>  kernel/bpf/btf.c         | 10 -------
>  kernel/bpf/verifier.c    | 64 ++++++++++++++++++++++++++++++++++++++--
>  kernel/trace/bpf_trace.c | 18 +++++++++++
>  6 files changed, 107 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 55f694b63164..613404beab33 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -268,6 +268,7 @@ enum bpf_arg_type {
>         ARG_PTR_TO_ALLOC_MEM,   /* pointer to dynamically allocated memory */
>         ARG_PTR_TO_ALLOC_MEM_OR_NULL,   /* pointer to dynamically allocated memory or NULL */
>         ARG_CONST_ALLOC_SIZE_OR_ZERO,   /* number of allocated bytes requested */
> +       ARG_PTR_TO_PERCPU_BTF_ID,       /* pointer to in-kernel percpu type */
>  };
>
>  /* type of values returned from helper functions */
> @@ -281,6 +282,7 @@ enum bpf_return_type {
>         RET_PTR_TO_SOCK_COMMON_OR_NULL, /* returns a pointer to a sock_common or NULL */
>         RET_PTR_TO_ALLOC_MEM_OR_NULL,   /* returns a pointer to dynamically allocated memory or NULL */
>         RET_PTR_TO_BTF_ID_OR_NULL,      /* returns a pointer to a btf_id or NULL */
> +       RET_PTR_TO_MEM_OR_BTF_OR_NULL,  /* returns a pointer to a valid memory or a btf_id or NULL */

I know it's already very long, but still let's use BTF_ID consistently

>  };
>
>  /* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
> @@ -360,6 +362,7 @@ enum bpf_reg_type {
>         PTR_TO_RDONLY_BUF_OR_NULL, /* reg points to a readonly buffer or NULL */
>         PTR_TO_RDWR_BUF,         /* reg points to a read/write buffer */
>         PTR_TO_RDWR_BUF_OR_NULL, /* reg points to a read/write buffer or NULL */
> +       PTR_TO_PERCPU_BTF_ID,    /* reg points to percpu kernel type */
>  };
>

[...]

> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 468376f2910b..c7e49a102ed2 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3415,6 +3415,19 @@ union bpf_attr {
>   *             A non-negative value equal to or less than *size* on success,
>   *             or a negative error in case of failure.
>   *
> + * void *bpf_per_cpu_ptr(const void *ptr, u32 cpu)
> + *     Description
> + *             Take the address of a percpu ksym and return a pointer pointing
> + *             to the variable on *cpu*. A ksym is an extern variable decorated
> + *             with '__ksym'. A ksym is percpu if there is a global percpu var
> + *             (either static or global) defined of the same name in the kernel.

The function signature has "ptr", not "ksym", but the description is
using "ksym". please make them consistent (might name param
"percpu_ptr")

> + *
> + *             bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the
> + *             kernel, except that bpf_per_cpu_ptr() may return NULL. This
> + *             happens if *cpu* is larger than nr_cpu_ids. The caller of
> + *             bpf_per_cpu_ptr() must check the returned value.
> + *     Return
> + *             A generic pointer pointing to the variable on *cpu*.
>   */

[...]

> +       } else if (arg_type == ARG_PTR_TO_PERCPU_BTF_ID) {
> +               expected_type = PTR_TO_PERCPU_BTF_ID;
> +               if (type != expected_type)
> +                       goto err_type;
> +               if (!reg->btf_id) {
> +                       verbose(env, "Helper has zero btf_id in R%d\n", regno);

nit: "invalid btf_id"?

> +                       return -EACCES;
> +               }
> +               meta->ret_btf_id = reg->btf_id;
>         } else if (arg_type == ARG_PTR_TO_BTF_ID) {
>                 expected_type = PTR_TO_BTF_ID;
>                 if (type != expected_type)
> @@ -4904,6 +4918,30 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
>                 regs[BPF_REG_0].type = PTR_TO_MEM_OR_NULL;
>                 regs[BPF_REG_0].id = ++env->id_gen;
>                 regs[BPF_REG_0].mem_size = meta.mem_size;

[...]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 8/8] bpf/selftests: Test for bpf_per_cpu_ptr()
  2020-08-19 22:40 ` [PATCH bpf-next v1 8/8] bpf/selftests: Test for bpf_per_cpu_ptr() Hao Luo
@ 2020-08-22  3:30   ` Andrii Nakryiko
  2020-08-28  3:42     ` Hao Luo
  0 siblings, 1 reply; 42+ messages in thread
From: Andrii Nakryiko @ 2020-08-22  3:30 UTC (permalink / raw)
  To: Hao Luo
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

On Wed, Aug 19, 2020 at 3:42 PM Hao Luo <haoluo@google.com> wrote:
>
> Test bpf_per_cpu_ptr(). Test two paths in the kernel. If the base
> pointer points to a struct, the returned reg is of type PTR_TO_BTF_ID.
> Direct pointer dereference can be applied on the returned variable.
> If the base pointer isn't a struct, the returned reg is of type
> PTR_TO_MEM, which also supports direct pointer dereference.
>
> Signed-off-by: Hao Luo <haoluo@google.com>
> ---

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  .../testing/selftests/bpf/prog_tests/ksyms_btf.c  |  4 ++++
>  .../testing/selftests/bpf/progs/test_ksyms_btf.c  | 15 ++++++++++++++-
>  2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> index 1dad61ba7e99..bdedd4a76b42 100644
> --- a/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> +++ b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> @@ -71,6 +71,10 @@ void test_ksyms_btf(void)
>               "got %llu, exp %llu\n", data->out__runqueues, runqueues_addr);
>         CHECK(data->out__bpf_prog_active != bpf_prog_active_addr, "bpf_prog_active",
>               "got %llu, exp %llu\n", data->out__bpf_prog_active, bpf_prog_active_addr);
> +       CHECK(data->out__rq_cpu != 1, "rq_cpu",
> +             "got %u, exp %u\n", data->out__rq_cpu, 1);
> +       CHECK(data->out__process_counts == -1, "process_counts",
> +             "got %lu, exp != -1", data->out__process_counts);
>
>  cleanup:
>         test_ksyms_btf__destroy(skel);
> diff --git a/tools/testing/selftests/bpf/progs/test_ksyms_btf.c b/tools/testing/selftests/bpf/progs/test_ksyms_btf.c
> index e04e31117f84..78cf1ebb753d 100644
> --- a/tools/testing/selftests/bpf/progs/test_ksyms_btf.c
> +++ b/tools/testing/selftests/bpf/progs/test_ksyms_btf.c
> @@ -7,16 +7,29 @@
>
>  __u64 out__runqueues = -1;
>  __u64 out__bpf_prog_active = -1;
> +__u32 out__rq_cpu = -1;
> +unsigned long out__process_counts = -1;

try to not use long for variables, it is 32-bit integer in user-space
but always 64-bit in BPF. This causes problems when using skeleton on
32-bit architecture.

>
> -extern const struct rq runqueues __ksym; /* struct type global var. */
> +extern const struct rq runqueues __ksym; /* struct type percpu var. */
>  extern const int bpf_prog_active __ksym; /* int type global var. */
> +extern const unsigned long process_counts __ksym; /* int type percpu var. */
>
>  SEC("raw_tp/sys_enter")
>  int handler(const void *ctx)
>  {
> +       struct rq *rq;
> +       unsigned long *count;
> +
>         out__runqueues = (__u64)&runqueues;
>         out__bpf_prog_active = (__u64)&bpf_prog_active;
>
> +       rq = (struct rq *)bpf_per_cpu_ptr(&runqueues, 1);
> +       if (rq)
> +               out__rq_cpu = rq->cpu;

this is awesome!

Are there any per-cpu variables that are arrays? Would be nice to test
those too.


> +       count = (unsigned long *)bpf_per_cpu_ptr(&process_counts, 1);
> +       if (count)
> +               out__process_counts = *count;
> +
>         return 0;
>  }
>
> --
> 2.28.0.220.ged08abb693-goog
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 6/8] bpf: Introduce bpf_per_cpu_ptr()
  2020-08-22  3:26   ` Andrii Nakryiko
@ 2020-08-22  3:31     ` Andrii Nakryiko
  2020-08-22  7:49       ` Hao Luo
  0 siblings, 1 reply; 42+ messages in thread
From: Andrii Nakryiko @ 2020-08-22  3:31 UTC (permalink / raw)
  To: Hao Luo
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

On Fri, Aug 21, 2020 at 8:26 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Aug 19, 2020 at 3:42 PM Hao Luo <haoluo@google.com> wrote:
> >
> > Add bpf_per_cpu_ptr() to help bpf programs access percpu vars.
> > bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the kernel
> > except that it may return NULL. This happens when the cpu parameter is
> > out of range. So the caller must check the returned value.
> >
> > Signed-off-by: Hao Luo <haoluo@google.com>
> > ---
>
> The logic looks correct, few naming nits, but otherwise:
>
> Acked-by: Andrii Nakryiko <andriin@fb.com>
>
> >  include/linux/bpf.h      |  3 ++
> >  include/linux/btf.h      | 11 +++++++
> >  include/uapi/linux/bpf.h | 14 +++++++++
> >  kernel/bpf/btf.c         | 10 -------
> >  kernel/bpf/verifier.c    | 64 ++++++++++++++++++++++++++++++++++++++--
> >  kernel/trace/bpf_trace.c | 18 +++++++++++
> >  6 files changed, 107 insertions(+), 13 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 55f694b63164..613404beab33 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -268,6 +268,7 @@ enum bpf_arg_type {
> >         ARG_PTR_TO_ALLOC_MEM,   /* pointer to dynamically allocated memory */
> >         ARG_PTR_TO_ALLOC_MEM_OR_NULL,   /* pointer to dynamically allocated memory or NULL */
> >         ARG_CONST_ALLOC_SIZE_OR_ZERO,   /* number of allocated bytes requested */
> > +       ARG_PTR_TO_PERCPU_BTF_ID,       /* pointer to in-kernel percpu type */
> >  };
> >
> >  /* type of values returned from helper functions */
> > @@ -281,6 +282,7 @@ enum bpf_return_type {
> >         RET_PTR_TO_SOCK_COMMON_OR_NULL, /* returns a pointer to a sock_common or NULL */
> >         RET_PTR_TO_ALLOC_MEM_OR_NULL,   /* returns a pointer to dynamically allocated memory or NULL */
> >         RET_PTR_TO_BTF_ID_OR_NULL,      /* returns a pointer to a btf_id or NULL */
> > +       RET_PTR_TO_MEM_OR_BTF_OR_NULL,  /* returns a pointer to a valid memory or a btf_id or NULL */
>
> I know it's already very long, but still let's use BTF_ID consistently
>
> >  };
> >
> >  /* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
> > @@ -360,6 +362,7 @@ enum bpf_reg_type {
> >         PTR_TO_RDONLY_BUF_OR_NULL, /* reg points to a readonly buffer or NULL */
> >         PTR_TO_RDWR_BUF,         /* reg points to a read/write buffer */
> >         PTR_TO_RDWR_BUF_OR_NULL, /* reg points to a read/write buffer or NULL */
> > +       PTR_TO_PERCPU_BTF_ID,    /* reg points to percpu kernel type */
> >  };
> >
>
> [...]
>
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 468376f2910b..c7e49a102ed2 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -3415,6 +3415,19 @@ union bpf_attr {
> >   *             A non-negative value equal to or less than *size* on success,
> >   *             or a negative error in case of failure.
> >   *
> > + * void *bpf_per_cpu_ptr(const void *ptr, u32 cpu)

btw, having bpf_this_cpu_ptr(const void *ptr) seems worthwhile as well, WDYT?

> > + *     Description
> > + *             Take the address of a percpu ksym and return a pointer pointing
> > + *             to the variable on *cpu*. A ksym is an extern variable decorated
> > + *             with '__ksym'. A ksym is percpu if there is a global percpu var
> > + *             (either static or global) defined of the same name in the kernel.
>
> The function signature has "ptr", not "ksym", but the description is
> using "ksym". please make them consistent (might name param
> "percpu_ptr")
>
> > + *
> > + *             bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the
> > + *             kernel, except that bpf_per_cpu_ptr() may return NULL. This
> > + *             happens if *cpu* is larger than nr_cpu_ids. The caller of
> > + *             bpf_per_cpu_ptr() must check the returned value.
> > + *     Return
> > + *             A generic pointer pointing to the variable on *cpu*.
> >   */
>
> [...]
>
> > +       } else if (arg_type == ARG_PTR_TO_PERCPU_BTF_ID) {
> > +               expected_type = PTR_TO_PERCPU_BTF_ID;
> > +               if (type != expected_type)
> > +                       goto err_type;
> > +               if (!reg->btf_id) {
> > +                       verbose(env, "Helper has zero btf_id in R%d\n", regno);
>
> nit: "invalid btf_id"?
>
> > +                       return -EACCES;
> > +               }
> > +               meta->ret_btf_id = reg->btf_id;
> >         } else if (arg_type == ARG_PTR_TO_BTF_ID) {
> >                 expected_type = PTR_TO_BTF_ID;
> >                 if (type != expected_type)
> > @@ -4904,6 +4918,30 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
> >                 regs[BPF_REG_0].type = PTR_TO_MEM_OR_NULL;
> >                 regs[BPF_REG_0].id = ++env->id_gen;
> >                 regs[BPF_REG_0].mem_size = meta.mem_size;
>
> [...]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 3/8] bpf: Introduce help function to validate ksym's type.
  2020-08-22  2:43         ` Andrii Nakryiko
@ 2020-08-22  7:04           ` Hao Luo
  0 siblings, 0 replies; 42+ messages in thread
From: Hao Luo @ 2020-08-22  7:04 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Yonghong Song, Networking, bpf, open list,
	open list:KERNEL SELFTEST FRAMEWORK, Shuah Khan,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Quentin Monnet, Steven Rostedt, Ingo Molnar, Andrey Ignatov,
	Jakub Sitnicki

Ah, I see bpf_core_types_are_compat() after sync'ing my local repo. It
seems the perfect fit for my use case. I only found the
btf_equal_xxx() defined in btf.c when posting these patches. I can
test and use bpf_core_types_are_compat() in v2. Thanks for pointing it
out and explaining the public APIs.

Hao

On Fri, Aug 21, 2020 at 7:43 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Aug 21, 2020 at 5:43 PM Hao Luo <haoluo@google.com> wrote:
> >
> > On Fri, Aug 21, 2020 at 2:50 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Thu, Aug 20, 2020 at 10:22 AM Yonghong Song <yhs@fb.com> wrote:
> > > >
> > > >
> > > >
> > > > On 8/19/20 3:40 PM, Hao Luo wrote:
> > > > > For a ksym to be safely dereferenced and accessed, its type defined in
> > > > > bpf program should basically match its type defined in kernel. Implement
> > > > > a help function for a quick matching, which is used by libbpf when
> > > > > resolving the kernel btf_id of a ksym.
> > > > >
> > > > > Signed-off-by: Hao Luo <haoluo@google.com>
> > > > > ---
> > [...]
> > > > > +/*
> > > > > + * Match a ksym's type defined in bpf programs against its type encoded in
> > > > > + * kernel btf.
> > > > > + */
> > > > > +bool btf_ksym_type_match(const struct btf *ba, __u32 id_a,
> > > > > +                      const struct btf *bb, __u32 id_b)
> > > > > +{
> > >
> > > [...]
> > >
> > > > > +                     }
> > > > > +             }
> > > >
> > > > I am wondering whether this is too strict and how this can co-work with
> > > > CO-RE. Forcing users to write almost identical structure definition to
> > > > the underlying kernel will not be user friendly and may not work cross
> > > > kernel versions even if the field user cares have not changed.
> > > >
> > > > Maybe we can relax the constraint here. You can look at existing
> > > > libbpf CO-RE code.
> > >
> > > Right. Hao, can you just re-use bpf_core_types_are_compat() instead?
> > > See if semantics makes sense, but I think it should. BPF CO-RE has
> > > been permissive in terms of struct size and few other type aspects,
> > > because it handles relocations so well. This approach allows to not
> > > have to exactly match all possible variations of some struct
> > > definition, which is a big problem with ever-changing kernel data
> > > structures.
> > >
> >
> > I have to say I hate myself writing another type comparison instead of
> > reusing the existing one. The issue is that when bpf_core_types_compat
> > compares names, it uses t1->name_off == t2->name_off. It is also used
>
> Huh? Are we talking about the same bpf_core_types_are_compat() (there
> is no bpf_core_types_compat, I think it's a typo)?
> bpf_core_types_are_compat() doesn't even compare any name, so I'm not
> sure what you are talking about. Some of btf_dedup functions do string
> comparisons using name_off directly, but that's a special and very
> careful case, it's not relevant here.
>
>
> > in bpf_equal_common(). In my case, because these types are from two
> > different BTFs, their name_off are not expected to be the same, right?
> > I didn't find a good solution to refactor before posting this patch. I
>
> bpf_core_types_are_compat() didn't land until this week, so you must
> be confusing something. Please take another look.
>
> > think I can adapt bpf_core_type_compat() and pay more attention to
> > CO-RE.
> >
> > > >
> > > > > +             break;
> > > > > +     }
> > >
> > > [...]
> > >
> > > > > +
> > > > >   struct btf_ext_sec_setup_param {
> > > > >       __u32 off;
> > > > >       __u32 len;
> > > > > diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
> > > > > index 91f0ad0e0325..5ef220e52485 100644
> > > > > --- a/tools/lib/bpf/btf.h
> > > > > +++ b/tools/lib/bpf/btf.h
> > > > > @@ -52,6 +52,8 @@ LIBBPF_API int btf__get_map_kv_tids(const struct btf *btf, const char *map_name,
> > > > >                                   __u32 expected_key_size,
> > > > >                                   __u32 expected_value_size,
> > > > >                                   __u32 *key_type_id, __u32 *value_type_id);
> > > > > +LIBBPF_API bool btf_ksym_type_match(const struct btf *ba, __u32 id_a,
> > > > > +                                 const struct btf *bb, __u32 id_b);
> > > > >
> > > > >   LIBBPF_API struct btf_ext *btf_ext__new(__u8 *data, __u32 size);
> > > > >   LIBBPF_API void btf_ext__free(struct btf_ext *btf_ext);
> > > >
> > > > The new API function should be added to libbpf.map.
> > >
> > > My question is why does this even have to be a public API?
> >
> > I can fix. Please pardon my ignorance, what is the difference between
> > public and internal APIs? I wasn't sure, so used it improperly.
>
> public APIs are those that users of libbpf are supposed to use,
> internal one is just for libbpf internal use. The former can't change,
> the latter can be refactor as much as we need to.
>
> >
> > Thanks,
> > Hao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 5/8] bpf/selftests: ksyms_btf to test typed ksyms
  2020-08-21 23:03     ` Andrii Nakryiko
@ 2020-08-22  7:26       ` Hao Luo
  2020-08-22  7:35         ` Andrii Nakryiko
  0 siblings, 1 reply; 42+ messages in thread
From: Hao Luo @ 2020-08-22  7:26 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Yonghong Song, Networking, bpf, open list,
	open list:KERNEL SELFTEST FRAMEWORK, Shuah Khan,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Quentin Monnet, Steven Rostedt, Ingo Molnar, Andrey Ignatov,
	Jakub Sitnicki

On Fri, Aug 21, 2020 at 4:03 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Aug 20, 2020 at 10:32 AM Yonghong Song <yhs@fb.com> wrote:
> >
> >
> >
> > On 8/19/20 3:40 PM, Hao Luo wrote:
> > > Selftests for typed ksyms. Tests two types of ksyms: one is a struct,
> > > the other is a plain int. This tests two paths in the kernel. Struct
> > > ksyms will be converted into PTR_TO_BTF_ID by the verifier while int
> > > typed ksyms will be converted into PTR_TO_MEM.
> > >
> > > Signed-off-by: Hao Luo <haoluo@google.com>
> > > ---
> > >   .../selftests/bpf/prog_tests/ksyms_btf.c      | 77 +++++++++++++++++++
> > >   .../selftests/bpf/progs/test_ksyms_btf.c      | 23 ++++++
> > >   2 files changed, 100 insertions(+)
> > >   create mode 100644 tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> > >   create mode 100644 tools/testing/selftests/bpf/progs/test_ksyms_btf.c
> > >
> > > diff --git a/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> > > new file mode 100644
> > > index 000000000000..1dad61ba7e99
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> > > @@ -0,0 +1,77 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/* Copyright (c) 2020 Google */
> > > +
> > > +#include <test_progs.h>
> > > +#include <bpf/libbpf.h>
> > > +#include <bpf/btf.h>
> > > +#include "test_ksyms_btf.skel.h"
> > > +
> > > +static int duration;
> > > +
> > > +static __u64 kallsyms_find(const char *sym)
> > > +{
> > > +     char type, name[500];
> > > +     __u64 addr, res = 0;
> > > +     FILE *f;
> > > +
> > > +     f = fopen("/proc/kallsyms", "r");
> > > +     if (CHECK(!f, "kallsyms_fopen", "failed to open: %d\n", errno))
> > > +             return 0;
> >
> > could you check whether libbpf API can provide this functionality for
> > you? As far as I know, libbpf does parse /proc/kallsyms.
>
> No need to use libbpf's implementation. We already have
> kallsyms_find() in prog_tests/ksyms.c and a combination of
> load_kallsyms() + ksym_get_addr() in trace_helpers.c. It would be good
> to switch to one implementation for both prog_tests/ksyms.c and this
> one.
>
Ack. I can do some refactoring. The least thing that I can do is
moving kallsyms_find() to a header for both prog_tests/ksyms.c and
this test to use.

>
> > > diff --git a/tools/testing/selftests/bpf/progs/test_ksyms_btf.c b/tools/testing/selftests/bpf/progs/test_ksyms_btf.c
> > > new file mode 100644
> > > index 000000000000..e04e31117f84
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/bpf/progs/test_ksyms_btf.c
> > > @@ -0,0 +1,23 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/* Copyright (c) 2020 Google */
> > > +
> > > +#include "vmlinux.h"
> > > +
> > > +#include <bpf/bpf_helpers.h>
> > > +
> > > +__u64 out__runqueues = -1;
> > > +__u64 out__bpf_prog_active = -1;
> > > +
> > > +extern const struct rq runqueues __ksym; /* struct type global var. */
> > > +extern const int bpf_prog_active __ksym; /* int type global var. */
> > > +
> > > +SEC("raw_tp/sys_enter")
> > > +int handler(const void *ctx)
> > > +{
> > > +     out__runqueues = (__u64)&runqueues;
> > > +     out__bpf_prog_active = (__u64)&bpf_prog_active;
> > > +
>
> You didn't test accessing any of the members of runqueues, because BTF
> only has per-CPU variables, right? Adding global/static variables was
> adding too much data to BTF or something like that, is that right?
>

Right. With some experiments, I found the address of a percpu variable
doesn't necessarily point to a valid structure. So it doesn't make
sense to dereference runqueues and access its members. However, right
now there are only percpu variables encoded in BTF, so I can't test
accessing members of general global/static variables unfortunately.

Hao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 5/8] bpf/selftests: ksyms_btf to test typed ksyms
  2020-08-22  7:26       ` Hao Luo
@ 2020-08-22  7:35         ` Andrii Nakryiko
  0 siblings, 0 replies; 42+ messages in thread
From: Andrii Nakryiko @ 2020-08-22  7:35 UTC (permalink / raw)
  To: Hao Luo
  Cc: Yonghong Song, Networking, bpf, open list,
	open list:KERNEL SELFTEST FRAMEWORK, Shuah Khan,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Quentin Monnet, Steven Rostedt, Ingo Molnar, Andrey Ignatov,
	Jakub Sitnicki

On Sat, Aug 22, 2020 at 12:27 AM Hao Luo <haoluo@google.com> wrote:
>
> On Fri, Aug 21, 2020 at 4:03 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Aug 20, 2020 at 10:32 AM Yonghong Song <yhs@fb.com> wrote:
> > >
> > >
> > >
> > > On 8/19/20 3:40 PM, Hao Luo wrote:
> > > > Selftests for typed ksyms. Tests two types of ksyms: one is a struct,
> > > > the other is a plain int. This tests two paths in the kernel. Struct
> > > > ksyms will be converted into PTR_TO_BTF_ID by the verifier while int
> > > > typed ksyms will be converted into PTR_TO_MEM.
> > > >
> > > > Signed-off-by: Hao Luo <haoluo@google.com>
> > > > ---
> > > >   .../selftests/bpf/prog_tests/ksyms_btf.c      | 77 +++++++++++++++++++
> > > >   .../selftests/bpf/progs/test_ksyms_btf.c      | 23 ++++++
> > > >   2 files changed, 100 insertions(+)
> > > >   create mode 100644 tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> > > >   create mode 100644 tools/testing/selftests/bpf/progs/test_ksyms_btf.c
> > > >
> > > > diff --git a/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> > > > new file mode 100644
> > > > index 000000000000..1dad61ba7e99
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
> > > > @@ -0,0 +1,77 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/* Copyright (c) 2020 Google */
> > > > +
> > > > +#include <test_progs.h>
> > > > +#include <bpf/libbpf.h>
> > > > +#include <bpf/btf.h>
> > > > +#include "test_ksyms_btf.skel.h"
> > > > +
> > > > +static int duration;
> > > > +
> > > > +static __u64 kallsyms_find(const char *sym)
> > > > +{
> > > > +     char type, name[500];
> > > > +     __u64 addr, res = 0;
> > > > +     FILE *f;
> > > > +
> > > > +     f = fopen("/proc/kallsyms", "r");
> > > > +     if (CHECK(!f, "kallsyms_fopen", "failed to open: %d\n", errno))
> > > > +             return 0;
> > >
> > > could you check whether libbpf API can provide this functionality for
> > > you? As far as I know, libbpf does parse /proc/kallsyms.
> >
> > No need to use libbpf's implementation. We already have
> > kallsyms_find() in prog_tests/ksyms.c and a combination of
> > load_kallsyms() + ksym_get_addr() in trace_helpers.c. It would be good
> > to switch to one implementation for both prog_tests/ksyms.c and this
> > one.
> >
> Ack. I can do some refactoring. The least thing that I can do is
> moving kallsyms_find() to a header for both prog_tests/ksyms.c and
> this test to use.

Please no extra headers. Just put kallsyms_find() in trace_helpers.c,
along other kallsyms-related functions.

>
> >
> > > > diff --git a/tools/testing/selftests/bpf/progs/test_ksyms_btf.c b/tools/testing/selftests/bpf/progs/test_ksyms_btf.c
> > > > new file mode 100644
> > > > index 000000000000..e04e31117f84
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/bpf/progs/test_ksyms_btf.c
> > > > @@ -0,0 +1,23 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/* Copyright (c) 2020 Google */
> > > > +
> > > > +#include "vmlinux.h"
> > > > +
> > > > +#include <bpf/bpf_helpers.h>
> > > > +
> > > > +__u64 out__runqueues = -1;
> > > > +__u64 out__bpf_prog_active = -1;
> > > > +
> > > > +extern const struct rq runqueues __ksym; /* struct type global var. */
> > > > +extern const int bpf_prog_active __ksym; /* int type global var. */
> > > > +
> > > > +SEC("raw_tp/sys_enter")
> > > > +int handler(const void *ctx)
> > > > +{
> > > > +     out__runqueues = (__u64)&runqueues;
> > > > +     out__bpf_prog_active = (__u64)&bpf_prog_active;
> > > > +
> >
> > You didn't test accessing any of the members of runqueues, because BTF
> > only has per-CPU variables, right? Adding global/static variables was
> > adding too much data to BTF or something like that, is that right?
> >
>
> Right. With some experiments, I found the address of a percpu variable
> doesn't necessarily point to a valid structure. So it doesn't make
> sense to dereference runqueues and access its members. However, right
> now there are only percpu variables encoded in BTF, so I can't test
> accessing members of general global/static variables unfortunately.
>
> Hao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 6/8] bpf: Introduce bpf_per_cpu_ptr()
  2020-08-22  3:31     ` Andrii Nakryiko
@ 2020-08-22  7:49       ` Hao Luo
  2020-08-22  7:55         ` Andrii Nakryiko
  0 siblings, 1 reply; 42+ messages in thread
From: Hao Luo @ 2020-08-22  7:49 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

On Fri, Aug 21, 2020 at 8:31 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Aug 21, 2020 at 8:26 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Aug 19, 2020 at 3:42 PM Hao Luo <haoluo@google.com> wrote:
> > >
> > > Add bpf_per_cpu_ptr() to help bpf programs access percpu vars.
> > > bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the kernel
> > > except that it may return NULL. This happens when the cpu parameter is
> > > out of range. So the caller must check the returned value.
> > >
> > > Signed-off-by: Hao Luo <haoluo@google.com>
> > > ---
> >
> > The logic looks correct, few naming nits, but otherwise:
> >
> > Acked-by: Andrii Nakryiko <andriin@fb.com>
> >
> > >  include/linux/bpf.h      |  3 ++
> > >  include/linux/btf.h      | 11 +++++++
> > >  include/uapi/linux/bpf.h | 14 +++++++++
> > >  kernel/bpf/btf.c         | 10 -------
> > >  kernel/bpf/verifier.c    | 64 ++++++++++++++++++++++++++++++++++++++--
> > >  kernel/trace/bpf_trace.c | 18 +++++++++++
> > >  6 files changed, 107 insertions(+), 13 deletions(-)
[...]
>
> btw, having bpf_this_cpu_ptr(const void *ptr) seems worthwhile as well, WDYT?
>

It's probably not a good idea, IMHO. How does it interact with
preemption? Should we treat it as __this_cpu_ptr()? If so, I feel it's
easy to be misused, if the bpf program is called in a preemptible
context.

Btw, is bpf programs always called with preemption disabled? How about
interrupts? I haven't thought about these questions before but I think
they matter as we start to have more ways for bpf programs to interact
with the kernel.

Best,
Hao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 6/8] bpf: Introduce bpf_per_cpu_ptr()
  2020-08-22  7:49       ` Hao Luo
@ 2020-08-22  7:55         ` Andrii Nakryiko
  2020-08-25  1:03           ` Hao Luo
  0 siblings, 1 reply; 42+ messages in thread
From: Andrii Nakryiko @ 2020-08-22  7:55 UTC (permalink / raw)
  To: Hao Luo
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

On Sat, Aug 22, 2020 at 12:49 AM Hao Luo <haoluo@google.com> wrote:
>
> On Fri, Aug 21, 2020 at 8:31 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Fri, Aug 21, 2020 at 8:26 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Wed, Aug 19, 2020 at 3:42 PM Hao Luo <haoluo@google.com> wrote:
> > > >
> > > > Add bpf_per_cpu_ptr() to help bpf programs access percpu vars.
> > > > bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the kernel
> > > > except that it may return NULL. This happens when the cpu parameter is
> > > > out of range. So the caller must check the returned value.
> > > >
> > > > Signed-off-by: Hao Luo <haoluo@google.com>
> > > > ---
> > >
> > > The logic looks correct, few naming nits, but otherwise:
> > >
> > > Acked-by: Andrii Nakryiko <andriin@fb.com>
> > >
> > > >  include/linux/bpf.h      |  3 ++
> > > >  include/linux/btf.h      | 11 +++++++
> > > >  include/uapi/linux/bpf.h | 14 +++++++++
> > > >  kernel/bpf/btf.c         | 10 -------
> > > >  kernel/bpf/verifier.c    | 64 ++++++++++++++++++++++++++++++++++++++--
> > > >  kernel/trace/bpf_trace.c | 18 +++++++++++
> > > >  6 files changed, 107 insertions(+), 13 deletions(-)
> [...]
> >
> > btw, having bpf_this_cpu_ptr(const void *ptr) seems worthwhile as well, WDYT?
> >
>
> It's probably not a good idea, IMHO. How does it interact with
> preemption? Should we treat it as __this_cpu_ptr()? If so, I feel it's
> easy to be misused, if the bpf program is called in a preemptible
> context.
>
> Btw, is bpf programs always called with preemption disabled? How about
> interrupts? I haven't thought about these questions before but I think
> they matter as we start to have more ways for bpf programs to interact
> with the kernel.

non-sleepable BPF is always disabling CPU migration, so there is no
problem with this_cpu_ptr. For sleepable not sure, but we can disable
this helper for sleepable BPF programs, if that's a problem.

>
> Best,
> Hao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 1/8] bpf: Introduce pseudo_btf_id
  2020-08-20 15:22   ` Yonghong Song
  2020-08-20 17:04     ` Hao Luo
@ 2020-08-25  0:05     ` Hao Luo
  2020-08-25  0:43       ` Yonghong Song
  1 sibling, 1 reply; 42+ messages in thread
From: Hao Luo @ 2020-08-25  0:05 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Quentin Monnet, Steven Rostedt, Ingo Molnar, Andrey Ignatov,
	Jakub Sitnicki

Yonghong,

An update on this thread. I successfully reproduced this issue on a
8.2.0 gcc compiler, It looks like gcc 4.9 did not have this issue. I
was also using clang which did not show this bug.

It seems having a DW_AT_specification that refers to another
DW_TAG_variable isn't handled in pahole. I have a (maybe hacky) pahole
patch as fix and let me clean it up and post for review soon.

Hao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 1/8] bpf: Introduce pseudo_btf_id
  2020-08-25  0:05     ` Hao Luo
@ 2020-08-25  0:43       ` Yonghong Song
  0 siblings, 0 replies; 42+ messages in thread
From: Yonghong Song @ 2020-08-25  0:43 UTC (permalink / raw)
  To: Hao Luo
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Quentin Monnet, Steven Rostedt, Ingo Molnar, Andrey Ignatov,
	Jakub Sitnicki



On 8/24/20 5:05 PM, Hao Luo wrote:
> Yonghong,
> 
> An update on this thread. I successfully reproduced this issue on a
> 8.2.0 gcc compiler, It looks like gcc 4.9 did not have this issue. I
> was also using clang which did not show this bug.
> 
> It seems having a DW_AT_specification that refers to another
> DW_TAG_variable isn't handled in pahole. I have a (maybe hacky) pahole
> patch as fix and let me clean it up and post for review soon.

Sounds good. Thanks for fixing it in pahole! People may use gcc 4.9 up 
to gcc 10, or llvm compiler (let us say llvm10 which is the version 
which has CO-RE support and is recommended for compiling bpf
programs.) Once you have fix for gcc 8.2, it might be worthwhile to
check a few other gcc/llvm version (at least gcc8/9/10 and llvm10)
to ensure it still works.

Currently, by default kernel is using DWARF2, pahole has a good
support for dwarf2. I am not sure whether it supports dwarf4 well
or not. But I think additional dwarf4 support, if needed, can
be done a little bit later as dwarf4 is not widely used yet for kernel,
I think.


> 
> Hao
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 6/8] bpf: Introduce bpf_per_cpu_ptr()
  2020-08-22  7:55         ` Andrii Nakryiko
@ 2020-08-25  1:03           ` Hao Luo
  0 siblings, 0 replies; 42+ messages in thread
From: Hao Luo @ 2020-08-25  1:03 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

On Sat, Aug 22, 2020 at 12:55 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Sat, Aug 22, 2020 at 12:49 AM Hao Luo <haoluo@google.com> wrote:
> >
> > On Fri, Aug 21, 2020 at 8:31 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Fri, Aug 21, 2020 at 8:26 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Wed, Aug 19, 2020 at 3:42 PM Hao Luo <haoluo@google.com> wrote:
> > > > >
> > > > > Add bpf_per_cpu_ptr() to help bpf programs access percpu vars.
> > > > > bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the kernel
> > > > > except that it may return NULL. This happens when the cpu parameter is
> > > > > out of range. So the caller must check the returned value.
> > > > >
> > > > > Signed-off-by: Hao Luo <haoluo@google.com>
> > > > > ---
> > > >
> > > > The logic looks correct, few naming nits, but otherwise:
> > > >
> > > > Acked-by: Andrii Nakryiko <andriin@fb.com>
> > > >
> > > > >  include/linux/bpf.h      |  3 ++
> > > > >  include/linux/btf.h      | 11 +++++++
> > > > >  include/uapi/linux/bpf.h | 14 +++++++++
> > > > >  kernel/bpf/btf.c         | 10 -------
> > > > >  kernel/bpf/verifier.c    | 64 ++++++++++++++++++++++++++++++++++++++--
> > > > >  kernel/trace/bpf_trace.c | 18 +++++++++++
> > > > >  6 files changed, 107 insertions(+), 13 deletions(-)
> > [...]
> > >
> > > btw, having bpf_this_cpu_ptr(const void *ptr) seems worthwhile as well, WDYT?
> > >
> >
> > It's probably not a good idea, IMHO. How does it interact with
> > preemption? Should we treat it as __this_cpu_ptr()? If so, I feel it's
> > easy to be misused, if the bpf program is called in a preemptible
> > context.
> >
> > Btw, is bpf programs always called with preemption disabled? How about
> > interrupts? I haven't thought about these questions before but I think
> > they matter as we start to have more ways for bpf programs to interact
> > with the kernel.
>
> non-sleepable BPF is always disabling CPU migration, so there is no
> problem with this_cpu_ptr. For sleepable not sure, but we can disable
> this helper for sleepable BPF programs, if that's a problem.
>

Sounds good. I see there is bpf_get_smp_processor_id() and we are
already doing this. I can add this_cpu_ptr() in v2.

Hao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 4/8] bpf/libbpf: BTF support for typed ksyms
  2020-08-21 22:37   ` Andrii Nakryiko
@ 2020-08-27 22:29     ` Hao Luo
  2020-09-01 18:11       ` Andrii Nakryiko
  0 siblings, 1 reply; 42+ messages in thread
From: Hao Luo @ 2020-08-27 22:29 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

On Fri, Aug 21, 2020 at 3:37 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Aug 19, 2020 at 3:42 PM Hao Luo <haoluo@google.com> wrote:
> >
> > If a ksym is defined with a type, libbpf will try to find the ksym's btf
> > information from kernel btf. If a valid btf entry for the ksym is found,
> > libbpf can pass in the found btf id to the verifier, which validates the
> > ksym's type and value.
> >
> > Typeless ksyms (i.e. those defined as 'void') will not have such btf_id,
> > but it has the symbol's address (read from kallsyms) and its value is
> > treated as a raw pointer.
> >
> > Signed-off-by: Hao Luo <haoluo@google.com>
> > ---
> >  tools/lib/bpf/libbpf.c | 130 ++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 114 insertions(+), 16 deletions(-)
> >
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index 4a81c6b2d21b..94eff612c7c2 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -357,7 +357,16 @@ struct extern_desc {
> >                         bool is_signed;
> >                 } kcfg;
> >                 struct {
> > -                       unsigned long long addr;
> > +                       /*
> > +                        *  1. If ksym is typeless, the field 'addr' is valid.
> > +                        *  2. If ksym is typed, the field 'vmlinux_btf_id' is
> > +                        *     valid.
> > +                        */
> > +                       bool is_typeless;
> > +                       union {
> > +                               unsigned long long addr;
> > +                               int vmlinux_btf_id;
> > +                       };
>
> ksym is 16 bytes anyways, union doesn't help to save space. I propose
> to encode all this with just two fields: vmlinux_btf_id and addr. If
> btf_id == 0, then extern is typeless.

Ack on expanding the union. But I slightly preferred keeping
is_typeless. IIUC, btf_id points a VAR_KIND, we need the following
pointer chasing every time

t = btf__type_by_id(obj->btf, ext->btf_id);
t->type;

which I felt is worse than keeping a is_typeless flag.

>
> >                 } ksym;
> >         };
> >  };
> > @@ -382,6 +391,7 @@ struct bpf_object {
> >
> >         bool loaded;
> >         bool has_pseudo_calls;
> > +       bool has_typed_ksyms;
> >
> >         /*
> >          * Information when doing elf related work. Only valid if fd
> > @@ -2521,6 +2531,10 @@ static int bpf_object__load_vmlinux_btf(struct bpf_object *obj)
> >         if (obj->btf_ext && obj->btf_ext->core_relo_info.len)
> >                 need_vmlinux_btf = true;
> >
> > +       /* Support for typed ksyms needs kernel BTF */
> > +       if (obj->has_typed_ksyms)
> > +               need_vmlinux_btf = true;
>
> On the second read, I don't think you really need `has_typed_ksyms` at
> all. Just iterate over ksym externs and see if you have a typed one.
> It's the only place that cares.
>

Ack. Will iterate over ksym externs here.

> > +
> >         bpf_object__for_each_program(prog, obj) {
> >                 if (!prog->load)
> >                         continue;
> > @@ -2975,10 +2989,10 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> >                         ext->type = EXT_KSYM;
> >
> >                         vt = skip_mods_and_typedefs(obj->btf, t->type, NULL);
> > -                       if (!btf_is_void(vt)) {
> > -                               pr_warn("extern (ksym) '%s' is not typeless (void)\n", ext_name);
> > -                               return -ENOTSUP;
> > -                       }
> > +                       ext->ksym.is_typeless = btf_is_void(vt);
> > +
> > +                       if (!obj->has_typed_ksyms && !ext->ksym.is_typeless)
> > +                               obj->has_typed_ksyms = true;
>
> nit: keep it simple:
>
> if (ext->ksym.is_typeless)
>   obj->has_typed_ksyms = true;
>

Ack.

> >                 } else {
> >                         pr_warn("unrecognized extern section '%s'\n", sec_name);
> >                         return -ENOTSUP;
> > @@ -2992,9 +3006,9 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> >         /* sort externs by type, for kcfg ones also by (align, size, name) */
> >         qsort(obj->externs, obj->nr_extern, sizeof(*ext), cmp_externs);
> >
> > -       /* for .ksyms section, we need to turn all externs into allocated
> > -        * variables in BTF to pass kernel verification; we do this by
> > -        * pretending that each extern is a 8-byte variable
> > +       /* for .ksyms section, we need to turn all typeless externs into
> > +        * allocated variables in BTF to pass kernel verification; we do
> > +        * this by pretending that each typeless extern is a 8-byte variable
> >          */
> >         if (ksym_sec) {
> >                 /* find existing 4-byte integer type in BTF to use for fake
> > @@ -3012,7 +3026,7 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> >
> >                 sec = ksym_sec;
> >                 n = btf_vlen(sec);
> > -               for (i = 0, off = 0; i < n; i++, off += sizeof(int)) {
> > +               for (i = 0, off = 0; i < n; i++) {
> >                         struct btf_var_secinfo *vs = btf_var_secinfos(sec) + i;
> >                         struct btf_type *vt;
> >
> > @@ -3025,9 +3039,14 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> >                                 return -ESRCH;
> >                         }
> >                         btf_var(vt)->linkage = BTF_VAR_GLOBAL_ALLOCATED;
> > -                       vt->type = int_btf_id;
> > +                       if (ext->ksym.is_typeless) {
> > +                               vt->type = int_btf_id;
> > +                               vs->size = sizeof(int);
> > +                       }
> >                         vs->offset = off;
> > -                       vs->size = sizeof(int);
> > +                       off += vs->size;
> > +                       pr_debug("ksym var_secinfo: var '%s', type #%d, size %d, offset %d\n",
> > +                                ext->name, vt->type, vs->size, vs->offset);
>
> It's a bit of a waste that we still allocate memory for those typed
> ksym externs, as they don't really need space. But modifying BTF is a
> pain right now, so I think we'll have to do it, until we have a better
> BTF API. But let's make them integers for now to take a fixed and
> small amount of space.
>

Do you mean making typed ksym externs of type integer? If so, we can't
do that, I think. After collect_externs, we later need to compare the
declared extern's type against the type defined in kernel. Better not
rewrite their types in BTf.

I am generally against modifying BTF. I initially didn't notice that
all the ksym externs' types are modified to 'int' and the type
comparison I mentioned above always failed. I dumped the btf in
vmlinux and the btf in object file, checked the kernel variable's
source code, printed out everything I could. The experience was very
bad.

> >                 }
> >                 sec->size = off;
> >         }
> > @@ -5300,8 +5319,13 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
> >                                 insn[0].imm = obj->maps[obj->kconfig_map_idx].fd;
> >                                 insn[1].imm = ext->kcfg.data_off;
> >                         } else /* EXT_KSYM */ {
> > -                               insn[0].imm = (__u32)ext->ksym.addr;
> > -                               insn[1].imm = ext->ksym.addr >> 32;
> > +                               if (ext->ksym.is_typeless) { /* typelss ksyms */
>
> typo: typeless
>

Ack.

> > +                                       insn[0].imm = (__u32)ext->ksym.addr;
> > +                                       insn[1].imm = ext->ksym.addr >> 32;
> > +                               } else { /* typed ksyms */
> > +                                       insn[0].src_reg = BPF_PSEUDO_BTF_ID;
> > +                                       insn[0].imm = ext->ksym.vmlinux_btf_id;
> > +                               }
> >                         }
> >                         break;
> >                 case RELO_CALL:
> > @@ -6019,6 +6043,10 @@ static int bpf_object__read_kallsyms_file(struct bpf_object *obj)
> >                 if (!ext || ext->type != EXT_KSYM)
> >                         continue;
> >
> > +               /* Typed ksyms have the verifier to fill their addresses. */
> > +               if (!ext->ksym.is_typeless)
> > +                       continue;
>
> It might still be a good idea to try to find the symbol in kallsyms
> and emit nicer message if it's not there. Think about the user's
> experience when some global variable is removed from the kernel (or
> compiled out due to missing Kconfig). If libbpf can easily detect
> this, we should do it and provide a good error message.
>

Ack.

> > +
> >                 if (ext->is_set && ext->ksym.addr != sym_addr) {
> >                         pr_warn("extern (ksym) '%s' resolution is ambiguous: 0x%llx or 0x%llx\n",
> >                                 sym_name, ext->ksym.addr, sym_addr);
> > @@ -6037,10 +6065,72 @@ static int bpf_object__read_kallsyms_file(struct bpf_object *obj)
> >         return err;
> >  }
> >
> > +static int bpf_object__resolve_ksyms_btf_id(struct bpf_object *obj)
> > +{
> > +       struct extern_desc *ext;
> > +       int i, id;
> > +
> > +       if (!obj->btf_vmlinux) {
> > +               pr_warn("support of typed ksyms needs kernel btf.\n");
> > +               return -ENOENT;
> > +       }
> > +
> > +       for (i = 0; i < obj->nr_extern; i++) {
> > +               const struct btf_type *v, *vx; /* VARs in object and vmlinux btf */
> > +               const struct btf_type *t, *tx; /* TYPEs in btf */
> > +               __u32 vt, vtx; /* btf_ids of TYPEs */
>
> I use targ_ and local_ prefixes with CO-RE to distinguish something
> that's coming from BPF object's BTF vs kernel's BTF, can you please
> adopt the same, as the process is essentially the same. "vx" vs "vtx"
> vs "v" are hard to distinguish and understand.
>

No problem. Will follow the convention.

> > +
> > +               ext = &obj->externs[i];
> > +               if (ext->type != EXT_KSYM)
> > +                       continue;
> > +
> > +               if (ext->ksym.is_typeless)
> > +                       continue;
>
> nit: combine into a single filter condition (we need typed ksym,
> that's one check)
>

Ack.

> > +
> > +               if (ext->is_set) {
> > +                       pr_warn("typed ksym '%s' resolved as typeless ksyms.\n",
> > +                               ext->name);
> > +                       return -EFAULT;
>
> this is a bug if that happens, that should be caught by test, please
> remove unnecessary check
>

Ack.

> > +               }
> > +
> > +               id = btf__find_by_name_kind(obj->btf_vmlinux, ext->name,
> > +                                           BTF_KIND_VAR);
> > +               if (id <= 0) {
> > +                       pr_warn("no btf entry for ksym '%s' in vmlinux.\n",
> > +                               ext->name);
> > +                       return -ESRCH;
> > +               }
> > +
> > +               vx = btf__type_by_id(obj->btf_vmlinux, id);
> > +               tx = skip_mods_and_typedefs(obj->btf_vmlinux, vx->type, &vtx);
> > +
> > +               v = btf__type_by_id(obj->btf, ext->btf_id);
> > +               t = skip_mods_and_typedefs(obj->btf, v->type, &vt);
> > +
> > +               if (!btf_ksym_type_match(obj->btf_vmlinux, vtx, obj->btf, vt)) {
> > +                       const char *tname, *txname; /* names of TYPEs */
> > +
> > +                       txname = btf__name_by_offset(obj->btf_vmlinux, tx->name_off);
> > +                       tname = btf__name_by_offset(obj->btf, t->name_off);
> > +
> > +                       pr_warn("ksym '%s' expects type '%s' (vmlinux_btf_id: #%d), "
> > +                               "but got '%s' (btf_id: #%d)\n", ext->name,
> > +                               txname, vtx, tname, vt);
> > +                       return -EINVAL;
> > +               }
>
> yeah, definitely just use bpf_core_types_are_compat() here. You'll
> want to skip_mods_and_typedefs first, but everything else should work
> for your use case.
>

Ack. bpf_core_types_are_compat() is indeed a perfect fit here.

[...]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 8/8] bpf/selftests: Test for bpf_per_cpu_ptr()
  2020-08-22  3:30   ` Andrii Nakryiko
@ 2020-08-28  3:42     ` Hao Luo
  2020-09-01 18:12       ` Andrii Nakryiko
  0 siblings, 1 reply; 42+ messages in thread
From: Hao Luo @ 2020-08-28  3:42 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

Thanks for taking a look!

On Fri, Aug 21, 2020 at 8:30 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Aug 19, 2020 at 3:42 PM Hao Luo <haoluo@google.com> wrote:
> >
> > Test bpf_per_cpu_ptr(). Test two paths in the kernel. If the base
> > pointer points to a struct, the returned reg is of type PTR_TO_BTF_ID.
> > Direct pointer dereference can be applied on the returned variable.
> > If the base pointer isn't a struct, the returned reg is of type
> > PTR_TO_MEM, which also supports direct pointer dereference.
> >
> > Signed-off-by: Hao Luo <haoluo@google.com>
> > ---
>
> Acked-by: Andrii Nakryiko <andriin@fb.com>
>
[...]
> >
> >  __u64 out__runqueues = -1;
> >  __u64 out__bpf_prog_active = -1;
> > +__u32 out__rq_cpu = -1;
> > +unsigned long out__process_counts = -1;
>
> try to not use long for variables, it is 32-bit integer in user-space
> but always 64-bit in BPF. This causes problems when using skeleton on
> 32-bit architecture.
>

Ack. I will use another variable of type 'int' instead.

> >
> > -extern const struct rq runqueues __ksym; /* struct type global var. */
> > +extern const struct rq runqueues __ksym; /* struct type percpu var. */
> >  extern const int bpf_prog_active __ksym; /* int type global var. */
> > +extern const unsigned long process_counts __ksym; /* int type percpu var. */
> >
> >  SEC("raw_tp/sys_enter")
> >  int handler(const void *ctx)
> >  {
> > +       struct rq *rq;
> > +       unsigned long *count;
> > +
> >         out__runqueues = (__u64)&runqueues;
> >         out__bpf_prog_active = (__u64)&bpf_prog_active;
> >
> > +       rq = (struct rq *)bpf_per_cpu_ptr(&runqueues, 1);
> > +       if (rq)
> > +               out__rq_cpu = rq->cpu;
>
> this is awesome!
>
> Are there any per-cpu variables that are arrays? Would be nice to test
> those too.
>
>

There are currently per-cpu arrays, but not common. There is a
'pmc_prev_left' in arch/x86, I can add that in this test.

[...]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 4/8] bpf/libbpf: BTF support for typed ksyms
  2020-08-27 22:29     ` Hao Luo
@ 2020-09-01 18:11       ` Andrii Nakryiko
  2020-09-01 20:35         ` Hao Luo
  0 siblings, 1 reply; 42+ messages in thread
From: Andrii Nakryiko @ 2020-09-01 18:11 UTC (permalink / raw)
  To: Hao Luo
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

On Thu, Aug 27, 2020 at 3:29 PM Hao Luo <haoluo@google.com> wrote:
>
> On Fri, Aug 21, 2020 at 3:37 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Aug 19, 2020 at 3:42 PM Hao Luo <haoluo@google.com> wrote:
> > >
> > > If a ksym is defined with a type, libbpf will try to find the ksym's btf
> > > information from kernel btf. If a valid btf entry for the ksym is found,
> > > libbpf can pass in the found btf id to the verifier, which validates the
> > > ksym's type and value.
> > >
> > > Typeless ksyms (i.e. those defined as 'void') will not have such btf_id,
> > > but it has the symbol's address (read from kallsyms) and its value is
> > > treated as a raw pointer.
> > >
> > > Signed-off-by: Hao Luo <haoluo@google.com>
> > > ---
> > >  tools/lib/bpf/libbpf.c | 130 ++++++++++++++++++++++++++++++++++++-----
> > >  1 file changed, 114 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > index 4a81c6b2d21b..94eff612c7c2 100644
> > > --- a/tools/lib/bpf/libbpf.c
> > > +++ b/tools/lib/bpf/libbpf.c
> > > @@ -357,7 +357,16 @@ struct extern_desc {
> > >                         bool is_signed;
> > >                 } kcfg;
> > >                 struct {
> > > -                       unsigned long long addr;
> > > +                       /*
> > > +                        *  1. If ksym is typeless, the field 'addr' is valid.
> > > +                        *  2. If ksym is typed, the field 'vmlinux_btf_id' is
> > > +                        *     valid.
> > > +                        */
> > > +                       bool is_typeless;
> > > +                       union {
> > > +                               unsigned long long addr;
> > > +                               int vmlinux_btf_id;
> > > +                       };
> >
> > ksym is 16 bytes anyways, union doesn't help to save space. I propose
> > to encode all this with just two fields: vmlinux_btf_id and addr. If
> > btf_id == 0, then extern is typeless.
>
> Ack on expanding the union. But I slightly preferred keeping
> is_typeless. IIUC, btf_id points a VAR_KIND, we need the following
> pointer chasing every time
>
> t = btf__type_by_id(obj->btf, ext->btf_id);
> t->type;
>
> which I felt is worse than keeping a is_typeless flag.

Sorry, I'm not following. In all places where you would check
sym->is_typeless, you'd now just do:

if (ext->ksym.vmlinux_btf_id) {
  /* typed, use ext->ksym.vmlinux_btf_id */
} else {
  /* typeless */
}

>
> >
> > >                 } ksym;
> > >         };
> > >  };
> > > @@ -382,6 +391,7 @@ struct bpf_object {
> > >
> > >         bool loaded;
> > >         bool has_pseudo_calls;
> > > +       bool has_typed_ksyms;
> > >
> > >         /*
> > >          * Information when doing elf related work. Only valid if fd
> > > @@ -2521,6 +2531,10 @@ static int bpf_object__load_vmlinux_btf(struct bpf_object *obj)
> > >         if (obj->btf_ext && obj->btf_ext->core_relo_info.len)
> > >                 need_vmlinux_btf = true;
> > >
> > > +       /* Support for typed ksyms needs kernel BTF */
> > > +       if (obj->has_typed_ksyms)
> > > +               need_vmlinux_btf = true;
> >
> > On the second read, I don't think you really need `has_typed_ksyms` at
> > all. Just iterate over ksym externs and see if you have a typed one.
> > It's the only place that cares.
> >
>
> Ack. Will iterate over ksym externs here.
>
> > > +
> > >         bpf_object__for_each_program(prog, obj) {
> > >                 if (!prog->load)
> > >                         continue;
> > > @@ -2975,10 +2989,10 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> > >                         ext->type = EXT_KSYM;
> > >
> > >                         vt = skip_mods_and_typedefs(obj->btf, t->type, NULL);
> > > -                       if (!btf_is_void(vt)) {
> > > -                               pr_warn("extern (ksym) '%s' is not typeless (void)\n", ext_name);
> > > -                               return -ENOTSUP;
> > > -                       }
> > > +                       ext->ksym.is_typeless = btf_is_void(vt);
> > > +
> > > +                       if (!obj->has_typed_ksyms && !ext->ksym.is_typeless)
> > > +                               obj->has_typed_ksyms = true;
> >
> > nit: keep it simple:
> >
> > if (ext->ksym.is_typeless)
> >   obj->has_typed_ksyms = true;
> >
>
> Ack.
>
> > >                 } else {
> > >                         pr_warn("unrecognized extern section '%s'\n", sec_name);
> > >                         return -ENOTSUP;
> > > @@ -2992,9 +3006,9 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> > >         /* sort externs by type, for kcfg ones also by (align, size, name) */
> > >         qsort(obj->externs, obj->nr_extern, sizeof(*ext), cmp_externs);
> > >
> > > -       /* for .ksyms section, we need to turn all externs into allocated
> > > -        * variables in BTF to pass kernel verification; we do this by
> > > -        * pretending that each extern is a 8-byte variable
> > > +       /* for .ksyms section, we need to turn all typeless externs into
> > > +        * allocated variables in BTF to pass kernel verification; we do
> > > +        * this by pretending that each typeless extern is a 8-byte variable
> > >          */
> > >         if (ksym_sec) {
> > >                 /* find existing 4-byte integer type in BTF to use for fake
> > > @@ -3012,7 +3026,7 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> > >
> > >                 sec = ksym_sec;
> > >                 n = btf_vlen(sec);
> > > -               for (i = 0, off = 0; i < n; i++, off += sizeof(int)) {
> > > +               for (i = 0, off = 0; i < n; i++) {
> > >                         struct btf_var_secinfo *vs = btf_var_secinfos(sec) + i;
> > >                         struct btf_type *vt;
> > >
> > > @@ -3025,9 +3039,14 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> > >                                 return -ESRCH;
> > >                         }
> > >                         btf_var(vt)->linkage = BTF_VAR_GLOBAL_ALLOCATED;
> > > -                       vt->type = int_btf_id;
> > > +                       if (ext->ksym.is_typeless) {
> > > +                               vt->type = int_btf_id;
> > > +                               vs->size = sizeof(int);
> > > +                       }
> > >                         vs->offset = off;
> > > -                       vs->size = sizeof(int);
> > > +                       off += vs->size;
> > > +                       pr_debug("ksym var_secinfo: var '%s', type #%d, size %d, offset %d\n",
> > > +                                ext->name, vt->type, vs->size, vs->offset);
> >
> > It's a bit of a waste that we still allocate memory for those typed
> > ksym externs, as they don't really need space. But modifying BTF is a
> > pain right now, so I think we'll have to do it, until we have a better
> > BTF API. But let's make them integers for now to take a fixed and
> > small amount of space.
> >
>
> Do you mean making typed ksym externs of type integer? If so, we can't
> do that, I think. After collect_externs, we later need to compare the
> declared extern's type against the type defined in kernel. Better not
> rewrite their types in BTf.

Then maybe we need to make btf_id to point to the actual type of the
variable, not BTF_KIND_VAR? Or just additionally record type's btf_id,
not sure which one makes more sense at the moment.

>
> I am generally against modifying BTF. I initially didn't notice that
> all the ksym externs' types are modified to 'int' and the type
> comparison I mentioned above always failed. I dumped the btf in
> vmlinux and the btf in object file, checked the kernel variable's
> source code, printed out everything I could. The experience was very
> bad.
>

It might be confusing, I agree, but the alternative is just a waste of
memory just to match the BTF definition of a DATASEC, which describes
externs. It seems sloppy to allocate a bunch of unused memory just to
match the kernel's variable size, while in reality we either use 8
bytes used (for typeless externs, storing ksym address) or none (for
typed externs).

Another alternative is to not specify BTF ID for .ksyms map, but it's
not great for typeless externs case, as we are losing all type info
completely. Trade-offs...

[...]

> > > +               }
> > > +
> > > +               id = btf__find_by_name_kind(obj->btf_vmlinux, ext->name,
> > > +                                           BTF_KIND_VAR);
> > > +               if (id <= 0) {
> > > +                       pr_warn("no btf entry for ksym '%s' in vmlinux.\n",
> > > +                               ext->name);
> > > +                       return -ESRCH;
> > > +               }
> > > +
> > > +               vx = btf__type_by_id(obj->btf_vmlinux, id);
> > > +               tx = skip_mods_and_typedefs(obj->btf_vmlinux, vx->type, &vtx);
> > > +
> > > +               v = btf__type_by_id(obj->btf, ext->btf_id);
> > > +               t = skip_mods_and_typedefs(obj->btf, v->type, &vt);
> > > +
> > > +               if (!btf_ksym_type_match(obj->btf_vmlinux, vtx, obj->btf, vt)) {
> > > +                       const char *tname, *txname; /* names of TYPEs */
> > > +
> > > +                       txname = btf__name_by_offset(obj->btf_vmlinux, tx->name_off);
> > > +                       tname = btf__name_by_offset(obj->btf, t->name_off);
> > > +
> > > +                       pr_warn("ksym '%s' expects type '%s' (vmlinux_btf_id: #%d), "
> > > +                               "but got '%s' (btf_id: #%d)\n", ext->name,
> > > +                               txname, vtx, tname, vt);
> > > +                       return -EINVAL;
> > > +               }
> >
> > yeah, definitely just use bpf_core_types_are_compat() here. You'll
> > want to skip_mods_and_typedefs first, but everything else should work
> > for your use case.
> >
>
> Ack. bpf_core_types_are_compat() is indeed a perfect fit here.
>

ok, great

> [...]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 8/8] bpf/selftests: Test for bpf_per_cpu_ptr()
  2020-08-28  3:42     ` Hao Luo
@ 2020-09-01 18:12       ` Andrii Nakryiko
  2020-09-01 19:47         ` Hao Luo
  0 siblings, 1 reply; 42+ messages in thread
From: Andrii Nakryiko @ 2020-09-01 18:12 UTC (permalink / raw)
  To: Hao Luo
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

On Thu, Aug 27, 2020 at 8:42 PM Hao Luo <haoluo@google.com> wrote:
>
> Thanks for taking a look!
>
> On Fri, Aug 21, 2020 at 8:30 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Aug 19, 2020 at 3:42 PM Hao Luo <haoluo@google.com> wrote:
> > >
> > > Test bpf_per_cpu_ptr(). Test two paths in the kernel. If the base
> > > pointer points to a struct, the returned reg is of type PTR_TO_BTF_ID.
> > > Direct pointer dereference can be applied on the returned variable.
> > > If the base pointer isn't a struct, the returned reg is of type
> > > PTR_TO_MEM, which also supports direct pointer dereference.
> > >
> > > Signed-off-by: Hao Luo <haoluo@google.com>
> > > ---
> >
> > Acked-by: Andrii Nakryiko <andriin@fb.com>
> >
> [...]
> > >
> > >  __u64 out__runqueues = -1;
> > >  __u64 out__bpf_prog_active = -1;
> > > +__u32 out__rq_cpu = -1;
> > > +unsigned long out__process_counts = -1;
> >
> > try to not use long for variables, it is 32-bit integer in user-space
> > but always 64-bit in BPF. This causes problems when using skeleton on
> > 32-bit architecture.
> >
>
> Ack. I will use another variable of type 'int' instead.

__u64 is fine as well

>
> > >
> > > -extern const struct rq runqueues __ksym; /* struct type global var. */
> > > +extern const struct rq runqueues __ksym; /* struct type percpu var. */
> > >  extern const int bpf_prog_active __ksym; /* int type global var. */
> > > +extern const unsigned long process_counts __ksym; /* int type percpu var. */
> > >
> > >  SEC("raw_tp/sys_enter")
> > >  int handler(const void *ctx)
> > >  {
> > > +       struct rq *rq;
> > > +       unsigned long *count;
> > > +
> > >         out__runqueues = (__u64)&runqueues;
> > >         out__bpf_prog_active = (__u64)&bpf_prog_active;
> > >
> > > +       rq = (struct rq *)bpf_per_cpu_ptr(&runqueues, 1);
> > > +       if (rq)
> > > +               out__rq_cpu = rq->cpu;
> >
> > this is awesome!
> >
> > Are there any per-cpu variables that are arrays? Would be nice to test
> > those too.
> >
> >
>
> There are currently per-cpu arrays, but not common. There is a
> 'pmc_prev_left' in arch/x86, I can add that in this test.

arch-specific variables are bad, because selftests will be failing on
other architectures; let's not do this then.

>
> [...]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 8/8] bpf/selftests: Test for bpf_per_cpu_ptr()
  2020-09-01 18:12       ` Andrii Nakryiko
@ 2020-09-01 19:47         ` Hao Luo
  0 siblings, 0 replies; 42+ messages in thread
From: Hao Luo @ 2020-09-01 19:47 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

On Tue, Sep 1, 2020 at 11:12 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Aug 27, 2020 at 8:42 PM Hao Luo <haoluo@google.com> wrote:
> >
[...]
> > > >
> > > > -extern const struct rq runqueues __ksym; /* struct type global var. */
> > > > +extern const struct rq runqueues __ksym; /* struct type percpu var. */
> > > >  extern const int bpf_prog_active __ksym; /* int type global var. */
> > > > +extern const unsigned long process_counts __ksym; /* int type percpu var. */
> > > >
> > > >  SEC("raw_tp/sys_enter")
> > > >  int handler(const void *ctx)
> > > >  {
> > > > +       struct rq *rq;
> > > > +       unsigned long *count;
> > > > +
> > > >         out__runqueues = (__u64)&runqueues;
> > > >         out__bpf_prog_active = (__u64)&bpf_prog_active;
> > > >
> > > > +       rq = (struct rq *)bpf_per_cpu_ptr(&runqueues, 1);
> > > > +       if (rq)
> > > > +               out__rq_cpu = rq->cpu;
> > >
> > > this is awesome!
> > >
> > > Are there any per-cpu variables that are arrays? Would be nice to test
> > > those too.
> > >
> > >
> >
> > There are currently per-cpu arrays, but not common. There is a
> > 'pmc_prev_left' in arch/x86, I can add that in this test.
>
> arch-specific variables are bad, because selftests will be failing on
> other architectures; let's not do this then.
>

Yeah, no problem. Though not going to add this arch-specific variable
in the posted patches, I tried array-typed ksyms locally in my test
environment. It worked fine, except that the array size is not
checked. For instance, if there is a percpu array in kernel as

DEFINE_PER_CPU(u32[64], foo);

we can declare a ksym of different size and it passes libbpf checks
and kernel verification.

extern u32 foo[128] __ksyms;

It seems that bpf_core_types_are_compat() doesn't check nr_elem. But
it seems the kernel verifier does check out-of-bounds accesses, so
this may not be a real problem. Just want to list what I saw.

> >
> > [...]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 4/8] bpf/libbpf: BTF support for typed ksyms
  2020-09-01 18:11       ` Andrii Nakryiko
@ 2020-09-01 20:35         ` Hao Luo
  2020-09-01 23:54           ` Andrii Nakryiko
  0 siblings, 1 reply; 42+ messages in thread
From: Hao Luo @ 2020-09-01 20:35 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

On Tue, Sep 1, 2020 at 11:11 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Aug 27, 2020 at 3:29 PM Hao Luo <haoluo@google.com> wrote:
> >
> > On Fri, Aug 21, 2020 at 3:37 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Wed, Aug 19, 2020 at 3:42 PM Hao Luo <haoluo@google.com> wrote:
> > > >
> > > > If a ksym is defined with a type, libbpf will try to find the ksym's btf
> > > > information from kernel btf. If a valid btf entry for the ksym is found,
> > > > libbpf can pass in the found btf id to the verifier, which validates the
> > > > ksym's type and value.
> > > >
> > > > Typeless ksyms (i.e. those defined as 'void') will not have such btf_id,
> > > > but it has the symbol's address (read from kallsyms) and its value is
> > > > treated as a raw pointer.
> > > >
> > > > Signed-off-by: Hao Luo <haoluo@google.com>
> > > > ---
> > > >  tools/lib/bpf/libbpf.c | 130 ++++++++++++++++++++++++++++++++++++-----
> > > >  1 file changed, 114 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > > index 4a81c6b2d21b..94eff612c7c2 100644
> > > > --- a/tools/lib/bpf/libbpf.c
> > > > +++ b/tools/lib/bpf/libbpf.c
> > > > @@ -357,7 +357,16 @@ struct extern_desc {
> > > >                         bool is_signed;
> > > >                 } kcfg;
> > > >                 struct {
> > > > -                       unsigned long long addr;
> > > > +                       /*
> > > > +                        *  1. If ksym is typeless, the field 'addr' is valid.
> > > > +                        *  2. If ksym is typed, the field 'vmlinux_btf_id' is
> > > > +                        *     valid.
> > > > +                        */
> > > > +                       bool is_typeless;
> > > > +                       union {
> > > > +                               unsigned long long addr;
> > > > +                               int vmlinux_btf_id;
> > > > +                       };
> > >
> > > ksym is 16 bytes anyways, union doesn't help to save space. I propose
> > > to encode all this with just two fields: vmlinux_btf_id and addr. If
> > > btf_id == 0, then extern is typeless.
> >
> > Ack on expanding the union. But I slightly preferred keeping
> > is_typeless. IIUC, btf_id points a VAR_KIND, we need the following
> > pointer chasing every time
> >
> > t = btf__type_by_id(obj->btf, ext->btf_id);
> > t->type;
> >
> > which I felt is worse than keeping a is_typeless flag.
>
> Sorry, I'm not following. In all places where you would check
> sym->is_typeless, you'd now just do:
>
> if (ext->ksym.vmlinux_btf_id) {
>   /* typed, use ext->ksym.vmlinux_btf_id */
> } else {
>   /* typeless */
> }
>

My apologies, I should be more specific.

'vmlinux_btf_id' gets its value in bpf_object__resolve_ksyms_btf_id().
Before we call this function, there are three places that need to tell
whether a ksym is typed, currently in v1. Specifically,

 - in bpf_object__collect_externs(), typeless ksyms are rewritten as
'int', in contrast, typed ones are left untouched (though this may
change in v2).
 - bpf_object__load_vmlinux_btf() now is called before
bpf_object__resolve_ksyms_btf_id(). In v1's design, if there is no
typed ksym, we could skip loading vmlinux_btf potentially.
 - even bpf_object__resolve_ksyms_btf_id() itself is conditionally
called, depending on whether there is any typed ksym.

At the time when these places are called, vmlinux_btf_id is
unavailable and we can't use it for the purpose of telling whether a
ksym is typed.

However, rather than vmlinux_btf_id, there may be an alternative. We
can record the ksym extern's type's btf_id and use that as
'is_typeless' flag. This also solves the problem below.

[...]

> > > >                 } else {
> > > >                         pr_warn("unrecognized extern section '%s'\n", sec_name);
> > > >                         return -ENOTSUP;
> > > > @@ -2992,9 +3006,9 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> > > >         /* sort externs by type, for kcfg ones also by (align, size, name) */
> > > >         qsort(obj->externs, obj->nr_extern, sizeof(*ext), cmp_externs);
> > > >
> > > > -       /* for .ksyms section, we need to turn all externs into allocated
> > > > -        * variables in BTF to pass kernel verification; we do this by
> > > > -        * pretending that each extern is a 8-byte variable
> > > > +       /* for .ksyms section, we need to turn all typeless externs into
> > > > +        * allocated variables in BTF to pass kernel verification; we do
> > > > +        * this by pretending that each typeless extern is a 8-byte variable
> > > >          */
> > > >         if (ksym_sec) {
> > > >                 /* find existing 4-byte integer type in BTF to use for fake
> > > > @@ -3012,7 +3026,7 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> > > >
> > > >                 sec = ksym_sec;
> > > >                 n = btf_vlen(sec);
> > > > -               for (i = 0, off = 0; i < n; i++, off += sizeof(int)) {
> > > > +               for (i = 0, off = 0; i < n; i++) {
> > > >                         struct btf_var_secinfo *vs = btf_var_secinfos(sec) + i;
> > > >                         struct btf_type *vt;
> > > >
> > > > @@ -3025,9 +3039,14 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> > > >                                 return -ESRCH;
> > > >                         }
> > > >                         btf_var(vt)->linkage = BTF_VAR_GLOBAL_ALLOCATED;
> > > > -                       vt->type = int_btf_id;
> > > > +                       if (ext->ksym.is_typeless) {
> > > > +                               vt->type = int_btf_id;
> > > > +                               vs->size = sizeof(int);
> > > > +                       }
> > > >                         vs->offset = off;
> > > > -                       vs->size = sizeof(int);
> > > > +                       off += vs->size;
> > > > +                       pr_debug("ksym var_secinfo: var '%s', type #%d, size %d, offset %d\n",
> > > > +                                ext->name, vt->type, vs->size, vs->offset);
> > >
> > > It's a bit of a waste that we still allocate memory for those typed
> > > ksym externs, as they don't really need space. But modifying BTF is a
> > > pain right now, so I think we'll have to do it, until we have a better
> > > BTF API. But let's make them integers for now to take a fixed and
> > > small amount of space.
> > >
> >
> > Do you mean making typed ksym externs of type integer? If so, we can't
> > do that, I think. After collect_externs, we later need to compare the
> > declared extern's type against the type defined in kernel. Better not
> > rewrite their types in BTf.
>
> Then maybe we need to make btf_id to point to the actual type of the
> variable, not BTF_KIND_VAR? Or just additionally record type's btf_id,
> not sure which one makes more sense at the moment.
>
> >
> > I am generally against modifying BTF. I initially didn't notice that
> > all the ksym externs' types are modified to 'int' and the type
> > comparison I mentioned above always failed. I dumped the btf in
> > vmlinux and the btf in object file, checked the kernel variable's
> > source code, printed out everything I could. The experience was very
> > bad.
> >
>
> It might be confusing, I agree, but the alternative is just a waste of
> memory just to match the BTF definition of a DATASEC, which describes
> externs. It seems sloppy to allocate a bunch of unused memory just to
> match the kernel's variable size, while in reality we either use 8
> bytes used (for typeless externs, storing ksym address) or none (for
> typed externs).
>
> Another alternative is to not specify BTF ID for .ksyms map, but it's
> not great for typeless externs case, as we are losing all type info
> completely. Trade-offs...
>

I see. It looks like rewriting all ksym externs' type to integers is
the most straightforward solution here, though I felt a bit hacky.

I can record the btf_id of the var's type before rewriting, so
bpf_core_type_are_compat() can find the true type for comparison. One
good thing about recording the type's btf_id is that it can be used to
tell whether the ksym extern is typed or not, before vmlinux_btf_id
gets its value. I will think about this and try the alternatives a bit
more and follow up if I come up with a better solution.

Thanks!

[...]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 4/8] bpf/libbpf: BTF support for typed ksyms
  2020-09-01 20:35         ` Hao Luo
@ 2020-09-01 23:54           ` Andrii Nakryiko
  2020-09-02  0:46             ` Hao Luo
  0 siblings, 1 reply; 42+ messages in thread
From: Andrii Nakryiko @ 2020-09-01 23:54 UTC (permalink / raw)
  To: Hao Luo
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

On Tue, Sep 1, 2020 at 1:35 PM Hao Luo <haoluo@google.com> wrote:
>
> On Tue, Sep 1, 2020 at 11:11 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Aug 27, 2020 at 3:29 PM Hao Luo <haoluo@google.com> wrote:
> > >
> > > On Fri, Aug 21, 2020 at 3:37 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Wed, Aug 19, 2020 at 3:42 PM Hao Luo <haoluo@google.com> wrote:
> > > > >
> > > > > If a ksym is defined with a type, libbpf will try to find the ksym's btf
> > > > > information from kernel btf. If a valid btf entry for the ksym is found,
> > > > > libbpf can pass in the found btf id to the verifier, which validates the
> > > > > ksym's type and value.
> > > > >
> > > > > Typeless ksyms (i.e. those defined as 'void') will not have such btf_id,
> > > > > but it has the symbol's address (read from kallsyms) and its value is
> > > > > treated as a raw pointer.
> > > > >
> > > > > Signed-off-by: Hao Luo <haoluo@google.com>
> > > > > ---
> > > > >  tools/lib/bpf/libbpf.c | 130 ++++++++++++++++++++++++++++++++++++-----
> > > > >  1 file changed, 114 insertions(+), 16 deletions(-)
> > > > >
> > > > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > > > index 4a81c6b2d21b..94eff612c7c2 100644
> > > > > --- a/tools/lib/bpf/libbpf.c
> > > > > +++ b/tools/lib/bpf/libbpf.c
> > > > > @@ -357,7 +357,16 @@ struct extern_desc {
> > > > >                         bool is_signed;
> > > > >                 } kcfg;
> > > > >                 struct {
> > > > > -                       unsigned long long addr;
> > > > > +                       /*
> > > > > +                        *  1. If ksym is typeless, the field 'addr' is valid.
> > > > > +                        *  2. If ksym is typed, the field 'vmlinux_btf_id' is
> > > > > +                        *     valid.
> > > > > +                        */
> > > > > +                       bool is_typeless;
> > > > > +                       union {
> > > > > +                               unsigned long long addr;
> > > > > +                               int vmlinux_btf_id;
> > > > > +                       };
> > > >
> > > > ksym is 16 bytes anyways, union doesn't help to save space. I propose
> > > > to encode all this with just two fields: vmlinux_btf_id and addr. If
> > > > btf_id == 0, then extern is typeless.
> > >
> > > Ack on expanding the union. But I slightly preferred keeping
> > > is_typeless. IIUC, btf_id points a VAR_KIND, we need the following
> > > pointer chasing every time
> > >
> > > t = btf__type_by_id(obj->btf, ext->btf_id);
> > > t->type;
> > >
> > > which I felt is worse than keeping a is_typeless flag.
> >
> > Sorry, I'm not following. In all places where you would check
> > sym->is_typeless, you'd now just do:
> >
> > if (ext->ksym.vmlinux_btf_id) {
> >   /* typed, use ext->ksym.vmlinux_btf_id */
> > } else {
> >   /* typeless */
> > }
> >
>
> My apologies, I should be more specific.
>
> 'vmlinux_btf_id' gets its value in bpf_object__resolve_ksyms_btf_id().
> Before we call this function, there are three places that need to tell
> whether a ksym is typed, currently in v1. Specifically,
>
>  - in bpf_object__collect_externs(), typeless ksyms are rewritten as
> 'int', in contrast, typed ones are left untouched (though this may
> change in v2).
>  - bpf_object__load_vmlinux_btf() now is called before
> bpf_object__resolve_ksyms_btf_id(). In v1's design, if there is no
> typed ksym, we could skip loading vmlinux_btf potentially.
>  - even bpf_object__resolve_ksyms_btf_id() itself is conditionally
> called, depending on whether there is any typed ksym.
>
> At the time when these places are called, vmlinux_btf_id is
> unavailable and we can't use it for the purpose of telling whether a
> ksym is typed.
>
> However, rather than vmlinux_btf_id, there may be an alternative. We
> can record the ksym extern's type's btf_id and use that as
> 'is_typeless' flag. This also solves the problem below.

Oh, I was thinking that vmlinux_btf_id contains a local BTF ID this
whole time (clearly ignoring the "vmlinux_" part).

>
> [...]
>
> > > > >                 } else {
> > > > >                         pr_warn("unrecognized extern section '%s'\n", sec_name);
> > > > >                         return -ENOTSUP;
> > > > > @@ -2992,9 +3006,9 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> > > > >         /* sort externs by type, for kcfg ones also by (align, size, name) */
> > > > >         qsort(obj->externs, obj->nr_extern, sizeof(*ext), cmp_externs);
> > > > >
> > > > > -       /* for .ksyms section, we need to turn all externs into allocated
> > > > > -        * variables in BTF to pass kernel verification; we do this by
> > > > > -        * pretending that each extern is a 8-byte variable
> > > > > +       /* for .ksyms section, we need to turn all typeless externs into
> > > > > +        * allocated variables in BTF to pass kernel verification; we do
> > > > > +        * this by pretending that each typeless extern is a 8-byte variable
> > > > >          */
> > > > >         if (ksym_sec) {
> > > > >                 /* find existing 4-byte integer type in BTF to use for fake
> > > > > @@ -3012,7 +3026,7 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> > > > >
> > > > >                 sec = ksym_sec;
> > > > >                 n = btf_vlen(sec);
> > > > > -               for (i = 0, off = 0; i < n; i++, off += sizeof(int)) {
> > > > > +               for (i = 0, off = 0; i < n; i++) {
> > > > >                         struct btf_var_secinfo *vs = btf_var_secinfos(sec) + i;
> > > > >                         struct btf_type *vt;
> > > > >
> > > > > @@ -3025,9 +3039,14 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> > > > >                                 return -ESRCH;
> > > > >                         }
> > > > >                         btf_var(vt)->linkage = BTF_VAR_GLOBAL_ALLOCATED;
> > > > > -                       vt->type = int_btf_id;
> > > > > +                       if (ext->ksym.is_typeless) {
> > > > > +                               vt->type = int_btf_id;
> > > > > +                               vs->size = sizeof(int);
> > > > > +                       }
> > > > >                         vs->offset = off;
> > > > > -                       vs->size = sizeof(int);
> > > > > +                       off += vs->size;
> > > > > +                       pr_debug("ksym var_secinfo: var '%s', type #%d, size %d, offset %d\n",
> > > > > +                                ext->name, vt->type, vs->size, vs->offset);
> > > >
> > > > It's a bit of a waste that we still allocate memory for those typed
> > > > ksym externs, as they don't really need space. But modifying BTF is a
> > > > pain right now, so I think we'll have to do it, until we have a better
> > > > BTF API. But let's make them integers for now to take a fixed and
> > > > small amount of space.
> > > >
> > >
> > > Do you mean making typed ksym externs of type integer? If so, we can't
> > > do that, I think. After collect_externs, we later need to compare the
> > > declared extern's type against the type defined in kernel. Better not
> > > rewrite their types in BTf.
> >
> > Then maybe we need to make btf_id to point to the actual type of the
> > variable, not BTF_KIND_VAR? Or just additionally record type's btf_id,
> > not sure which one makes more sense at the moment.
> >
> > >
> > > I am generally against modifying BTF. I initially didn't notice that
> > > all the ksym externs' types are modified to 'int' and the type
> > > comparison I mentioned above always failed. I dumped the btf in
> > > vmlinux and the btf in object file, checked the kernel variable's
> > > source code, printed out everything I could. The experience was very
> > > bad.
> > >
> >
> > It might be confusing, I agree, but the alternative is just a waste of
> > memory just to match the BTF definition of a DATASEC, which describes
> > externs. It seems sloppy to allocate a bunch of unused memory just to
> > match the kernel's variable size, while in reality we either use 8
> > bytes used (for typeless externs, storing ksym address) or none (for
> > typed externs).
> >
> > Another alternative is to not specify BTF ID for .ksyms map, but it's
> > not great for typeless externs case, as we are losing all type info
> > completely. Trade-offs...
> >
>
> I see. It looks like rewriting all ksym externs' type to integers is
> the most straightforward solution here, though I felt a bit hacky.
>
> I can record the btf_id of the var's type before rewriting, so
> bpf_core_type_are_compat() can find the true type for comparison. One
> good thing about recording the type's btf_id is that it can be used to
> tell whether the ksym extern is typed or not, before vmlinux_btf_id

that's what I've been getting at, but I missed that vmlinux_btf_id is
kernel BTF type ID. So let's record both local and target BTF type IDs
and use local_btf_id as an indicator of typed vs typeless?

> gets its value. I will think about this and try the alternatives a bit
> more and follow up if I come up with a better solution.
>
> Thanks!
>
> [...]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next v1 4/8] bpf/libbpf: BTF support for typed ksyms
  2020-09-01 23:54           ` Andrii Nakryiko
@ 2020-09-02  0:46             ` Hao Luo
  0 siblings, 0 replies; 42+ messages in thread
From: Hao Luo @ 2020-09-02  0:46 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Networking, bpf, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Shuah Khan, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Steven Rostedt, Ingo Molnar,
	Andrey Ignatov, Jakub Sitnicki

On Tue, Sep 1, 2020 at 4:55 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Sep 1, 2020 at 1:35 PM Hao Luo <haoluo@google.com> wrote:
> >
> > On Tue, Sep 1, 2020 at 11:11 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Thu, Aug 27, 2020 at 3:29 PM Hao Luo <haoluo@google.com> wrote:
> > > >
> > > > On Fri, Aug 21, 2020 at 3:37 PM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > >
> > > > > On Wed, Aug 19, 2020 at 3:42 PM Hao Luo <haoluo@google.com> wrote:
> > > > > >
> > > > > > If a ksym is defined with a type, libbpf will try to find the ksym's btf
> > > > > > information from kernel btf. If a valid btf entry for the ksym is found,
> > > > > > libbpf can pass in the found btf id to the verifier, which validates the
> > > > > > ksym's type and value.
> > > > > >
> > > > > > Typeless ksyms (i.e. those defined as 'void') will not have such btf_id,
> > > > > > but it has the symbol's address (read from kallsyms) and its value is
> > > > > > treated as a raw pointer.
> > > > > >
> > > > > > Signed-off-by: Hao Luo <haoluo@google.com>
> > > > > > ---
> > > > > >  tools/lib/bpf/libbpf.c | 130 ++++++++++++++++++++++++++++++++++++-----
> > > > > >  1 file changed, 114 insertions(+), 16 deletions(-)
> > > > > >
> > > > > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > > > > index 4a81c6b2d21b..94eff612c7c2 100644
> > > > > > --- a/tools/lib/bpf/libbpf.c
> > > > > > +++ b/tools/lib/bpf/libbpf.c
> > > > > > @@ -357,7 +357,16 @@ struct extern_desc {
> > > > > >                         bool is_signed;
> > > > > >                 } kcfg;
> > > > > >                 struct {
> > > > > > -                       unsigned long long addr;
> > > > > > +                       /*
> > > > > > +                        *  1. If ksym is typeless, the field 'addr' is valid.
> > > > > > +                        *  2. If ksym is typed, the field 'vmlinux_btf_id' is
> > > > > > +                        *     valid.
> > > > > > +                        */
> > > > > > +                       bool is_typeless;
> > > > > > +                       union {
> > > > > > +                               unsigned long long addr;
> > > > > > +                               int vmlinux_btf_id;
> > > > > > +                       };
> > > > >
> > > > > ksym is 16 bytes anyways, union doesn't help to save space. I propose
> > > > > to encode all this with just two fields: vmlinux_btf_id and addr. If
> > > > > btf_id == 0, then extern is typeless.
> > > >
> > > > Ack on expanding the union. But I slightly preferred keeping
> > > > is_typeless. IIUC, btf_id points a VAR_KIND, we need the following
> > > > pointer chasing every time
> > > >
> > > > t = btf__type_by_id(obj->btf, ext->btf_id);
> > > > t->type;
> > > >
> > > > which I felt is worse than keeping a is_typeless flag.
> > >
> > > Sorry, I'm not following. In all places where you would check
> > > sym->is_typeless, you'd now just do:
> > >
> > > if (ext->ksym.vmlinux_btf_id) {
> > >   /* typed, use ext->ksym.vmlinux_btf_id */
> > > } else {
> > >   /* typeless */
> > > }
> > >
> >
> > My apologies, I should be more specific.
> >
> > 'vmlinux_btf_id' gets its value in bpf_object__resolve_ksyms_btf_id().
> > Before we call this function, there are three places that need to tell
> > whether a ksym is typed, currently in v1. Specifically,
> >
> >  - in bpf_object__collect_externs(), typeless ksyms are rewritten as
> > 'int', in contrast, typed ones are left untouched (though this may
> > change in v2).
> >  - bpf_object__load_vmlinux_btf() now is called before
> > bpf_object__resolve_ksyms_btf_id(). In v1's design, if there is no
> > typed ksym, we could skip loading vmlinux_btf potentially.
> >  - even bpf_object__resolve_ksyms_btf_id() itself is conditionally
> > called, depending on whether there is any typed ksym.
> >
> > At the time when these places are called, vmlinux_btf_id is
> > unavailable and we can't use it for the purpose of telling whether a
> > ksym is typed.
> >
> > However, rather than vmlinux_btf_id, there may be an alternative. We
> > can record the ksym extern's type's btf_id and use that as
> > 'is_typeless' flag. This also solves the problem below.
>
> Oh, I was thinking that vmlinux_btf_id contains a local BTF ID this
> whole time (clearly ignoring the "vmlinux_" part).
>
> >
> > [...]
> >
> > > > > >                 } else {
> > > > > >                         pr_warn("unrecognized extern section '%s'\n", sec_name);
> > > > > >                         return -ENOTSUP;
> > > > > > @@ -2992,9 +3006,9 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> > > > > >         /* sort externs by type, for kcfg ones also by (align, size, name) */
> > > > > >         qsort(obj->externs, obj->nr_extern, sizeof(*ext), cmp_externs);
> > > > > >
> > > > > > -       /* for .ksyms section, we need to turn all externs into allocated
> > > > > > -        * variables in BTF to pass kernel verification; we do this by
> > > > > > -        * pretending that each extern is a 8-byte variable
> > > > > > +       /* for .ksyms section, we need to turn all typeless externs into
> > > > > > +        * allocated variables in BTF to pass kernel verification; we do
> > > > > > +        * this by pretending that each typeless extern is a 8-byte variable
> > > > > >          */
> > > > > >         if (ksym_sec) {
> > > > > >                 /* find existing 4-byte integer type in BTF to use for fake
> > > > > > @@ -3012,7 +3026,7 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> > > > > >
> > > > > >                 sec = ksym_sec;
> > > > > >                 n = btf_vlen(sec);
> > > > > > -               for (i = 0, off = 0; i < n; i++, off += sizeof(int)) {
> > > > > > +               for (i = 0, off = 0; i < n; i++) {
> > > > > >                         struct btf_var_secinfo *vs = btf_var_secinfos(sec) + i;
> > > > > >                         struct btf_type *vt;
> > > > > >
> > > > > > @@ -3025,9 +3039,14 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
> > > > > >                                 return -ESRCH;
> > > > > >                         }
> > > > > >                         btf_var(vt)->linkage = BTF_VAR_GLOBAL_ALLOCATED;
> > > > > > -                       vt->type = int_btf_id;
> > > > > > +                       if (ext->ksym.is_typeless) {
> > > > > > +                               vt->type = int_btf_id;
> > > > > > +                               vs->size = sizeof(int);
> > > > > > +                       }
> > > > > >                         vs->offset = off;
> > > > > > -                       vs->size = sizeof(int);
> > > > > > +                       off += vs->size;
> > > > > > +                       pr_debug("ksym var_secinfo: var '%s', type #%d, size %d, offset %d\n",
> > > > > > +                                ext->name, vt->type, vs->size, vs->offset);
> > > > >
> > > > > It's a bit of a waste that we still allocate memory for those typed
> > > > > ksym externs, as they don't really need space. But modifying BTF is a
> > > > > pain right now, so I think we'll have to do it, until we have a better
> > > > > BTF API. But let's make them integers for now to take a fixed and
> > > > > small amount of space.
> > > > >
> > > >
> > > > Do you mean making typed ksym externs of type integer? If so, we can't
> > > > do that, I think. After collect_externs, we later need to compare the
> > > > declared extern's type against the type defined in kernel. Better not
> > > > rewrite their types in BTf.
> > >
> > > Then maybe we need to make btf_id to point to the actual type of the
> > > variable, not BTF_KIND_VAR? Or just additionally record type's btf_id,
> > > not sure which one makes more sense at the moment.
> > >
> > > >
> > > > I am generally against modifying BTF. I initially didn't notice that
> > > > all the ksym externs' types are modified to 'int' and the type
> > > > comparison I mentioned above always failed. I dumped the btf in
> > > > vmlinux and the btf in object file, checked the kernel variable's
> > > > source code, printed out everything I could. The experience was very
> > > > bad.
> > > >
> > >
> > > It might be confusing, I agree, but the alternative is just a waste of
> > > memory just to match the BTF definition of a DATASEC, which describes
> > > externs. It seems sloppy to allocate a bunch of unused memory just to
> > > match the kernel's variable size, while in reality we either use 8
> > > bytes used (for typeless externs, storing ksym address) or none (for
> > > typed externs).
> > >
> > > Another alternative is to not specify BTF ID for .ksyms map, but it's
> > > not great for typeless externs case, as we are losing all type info
> > > completely. Trade-offs...
> > >
> >
> > I see. It looks like rewriting all ksym externs' type to integers is
> > the most straightforward solution here, though I felt a bit hacky.
> >
> > I can record the btf_id of the var's type before rewriting, so
> > bpf_core_type_are_compat() can find the true type for comparison. One
> > good thing about recording the type's btf_id is that it can be used to
> > tell whether the ksym extern is typed or not, before vmlinux_btf_id
>
> that's what I've been getting at, but I missed that vmlinux_btf_id is
> kernel BTF type ID. So let's record both local and target BTF type IDs
> and use local_btf_id as an indicator of typed vs typeless?
>

Yup, that sounds great!

[...]

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2020-09-02  0:46 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-19 22:40 [PATCH bpf-next v1 0/8] bpf: BTF support for ksyms Hao Luo
2020-08-19 22:40 ` [PATCH bpf-next v1 1/8] bpf: Introduce pseudo_btf_id Hao Luo
2020-08-20 15:22   ` Yonghong Song
2020-08-20 17:04     ` Hao Luo
2020-08-25  0:05     ` Hao Luo
2020-08-25  0:43       ` Yonghong Song
2020-08-20 16:43   ` Yonghong Song
2020-08-20 21:53   ` Alexei Starovoitov
2020-08-21  2:22     ` Hao Luo
2020-08-19 22:40 ` [PATCH bpf-next v1 2/8] bpf: Propagate BPF_PSEUDO_BTF_ID to uapi headers in /tools Hao Luo
2020-08-20 16:44   ` Yonghong Song
2020-08-19 22:40 ` [PATCH bpf-next v1 3/8] bpf: Introduce help function to validate ksym's type Hao Luo
2020-08-20 17:20   ` Yonghong Song
2020-08-21 21:50     ` Andrii Nakryiko
2020-08-22  0:43       ` Hao Luo
2020-08-22  2:43         ` Andrii Nakryiko
2020-08-22  7:04           ` Hao Luo
2020-08-19 22:40 ` [PATCH bpf-next v1 4/8] bpf/libbpf: BTF support for typed ksyms Hao Luo
2020-08-21 22:37   ` Andrii Nakryiko
2020-08-27 22:29     ` Hao Luo
2020-09-01 18:11       ` Andrii Nakryiko
2020-09-01 20:35         ` Hao Luo
2020-09-01 23:54           ` Andrii Nakryiko
2020-09-02  0:46             ` Hao Luo
2020-08-19 22:40 ` [PATCH bpf-next v1 5/8] bpf/selftests: ksyms_btf to test " Hao Luo
2020-08-20 17:28   ` Yonghong Song
2020-08-21 23:03     ` Andrii Nakryiko
2020-08-22  7:26       ` Hao Luo
2020-08-22  7:35         ` Andrii Nakryiko
2020-08-19 22:40 ` [PATCH bpf-next v1 6/8] bpf: Introduce bpf_per_cpu_ptr() Hao Luo
2020-08-22  3:26   ` Andrii Nakryiko
2020-08-22  3:31     ` Andrii Nakryiko
2020-08-22  7:49       ` Hao Luo
2020-08-22  7:55         ` Andrii Nakryiko
2020-08-25  1:03           ` Hao Luo
2020-08-19 22:40 ` [PATCH bpf-next v1 7/8] bpf: Propagate bpf_per_cpu_ptr() to /tools Hao Luo
2020-08-20 17:30   ` Yonghong Song
2020-08-19 22:40 ` [PATCH bpf-next v1 8/8] bpf/selftests: Test for bpf_per_cpu_ptr() Hao Luo
2020-08-22  3:30   ` Andrii Nakryiko
2020-08-28  3:42     ` Hao Luo
2020-09-01 18:12       ` Andrii Nakryiko
2020-09-01 19:47         ` Hao Luo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).