* [PATCH bpf-next 0/9] libbpf: BTF writer APIs
@ 2020-09-23 15:54 Andrii Nakryiko
  2020-09-23 15:54 ` [PATCH bpf-next 1/9] libbpf: refactor internals of BTF type index Andrii Nakryiko
                   ` (9 more replies)
  0 siblings, 10 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2020-09-23 15:54 UTC
  To: bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Arnaldo Carvalho de Melo

This patch set introduces a new set of BTF APIs to libbpf that allow one to
conveniently produce BTF types and strings. Internals of struct btf were
changed such that it can transparently and automatically switch to a writable
mode, which allows appending BTF types and strings. This will let libbpf
itself do more intrusive modifications of a program's BTF (by rewriting it,
at least as of right now), which is necessary for the upcoming libbpf static
linking. But the APIs are complete and generic, so they can be adopted by
anyone who needs to produce BTF type information.
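
To give a taste of the API, here is a minimal sketch of the intended flow
(error handling omitted for brevity; the APIs below are introduced in
patches #5-#7):

  struct btf *btf = btf__new_empty();

  int int_id = btf__append_int(btf, "int", 4, BTF_INT_SIGNED); /* -> 1 */
  int ptr_id = btf__append_ptr(btf, int_id);                   /* -> 2 */

  __u32 raw_sz;
  const void *raw = btf__get_raw_data(btf, &raw_sz); /* serialized BTF */

  btf__free(btf);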

One such example outside of libbpf is pahole, which was completely converted
to these APIs (locally, pending landing of these changes in libbpf). The
conversion reduces the amount of custom pahole code needed and brings nice
savings in memory usage (about 370MB reduction at peak for my kernel
configuration) and even in BTF deduplication time (a one second reduction,
23.7s -> 22.7s). Memory savings are due to avoiding pahole's own copy of
"uncompressed" raw BTF data. The time reduction comes from faster string
search and deduplication by relying on a hashmap instead of the BST used by
pahole's own code. Consequently, these APIs are already tested on real-world
complicated kernel BTF, but there is also a pretty extensive selftest doing
extra validations.

Selftests in patch #9 add a set of generic ASSERT_{EQ,STREQ,ERR,OK} macros
that are useful for writing shorter and less repetitive selftests. I decided
to keep them local to that selftest for now, but if they prove to be useful
in more contexts we should move them to test_progs.h. A few more macros
(e.g., inequality tests) are probably necessary to make the set complete.
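
For reference, these macros are thin wrappers around the existing CHECK()
macro from test_progs.h; ASSERT_EQ() has roughly the following shape (see
patch #9 for the exact definitions):

  #define ASSERT_EQ(actual, expected, name) ({                          \
          typeof(actual) ___act = (actual);                             \
          typeof(expected) ___exp = (expected);                         \
          bool ___ok = ___act == ___exp;                                \
          CHECK(!___ok, (name), "unexpected %s: %lld != %lld\n",        \
                (name), (long long)___act, (long long)___exp);          \
          ___ok;                                                        \
  })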

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>

Andrii Nakryiko (9):
  libbpf: refactor internals of BTF type index
  libbpf: remove assumption of single contiguous memory for BTF data
  libbpf: generalize common logic for managing dynamically-sized arrays
  libbpf: extract generic string hashing function for reuse
  libbpf: allow modification of BTF and add btf__add_str API
  libbpf: add btf__new_empty() to create an empty BTF object
  libbpf: add BTF writing APIs
  libbpf: add btf__str_by_offset() as a more generic variant of
    name_by_offset
  selftests/bpf: test BTF writing APIs

 tools/lib/bpf/bpf.c                           |    2 +-
 tools/lib/bpf/bpf.h                           |    2 +-
 tools/lib/bpf/btf.c                           | 1311 +++++++++++++++--
 tools/lib/bpf/btf.h                           |   41 +
 tools/lib/bpf/btf_dump.c                      |    9 +-
 tools/lib/bpf/hashmap.h                       |   12 +
 tools/lib/bpf/libbpf.map                      |   22 +
 tools/lib/bpf/libbpf_internal.h               |    3 +
 .../selftests/bpf/prog_tests/btf_write.c      |  271 ++++
 9 files changed, 1553 insertions(+), 120 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_write.c

-- 
2.24.1



* [PATCH bpf-next 1/9] libbpf: refactor internals of BTF type index
  2020-09-23 15:54 [PATCH bpf-next 0/9] libbpf: BTF writer APIs Andrii Nakryiko
@ 2020-09-23 15:54 ` Andrii Nakryiko
  2020-09-23 15:54 ` [PATCH bpf-next 2/9] libbpf: remove assumption of single contiguous memory for BTF data Andrii Nakryiko
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2020-09-23 15:54 UTC
  To: bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Arnaldo Carvalho de Melo

Refactor the implementation of the internal BTF type index to not use direct
pointers. Instead, use offsets relative to the start of the types data
section. This allows the types data to be reallocated, enabling the
implementation of modifiable BTF.

As getting a type by ID now has an extra indirection step, convert all
internal type lookups to a new helper btf_type_by_id(), which returns
a non-const pointer to a type by its ID.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/btf.c | 139 ++++++++++++++++++++++++--------------------
 1 file changed, 75 insertions(+), 64 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index a3d259e614b0..7c9957893ef2 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -31,11 +31,12 @@ struct btf {
 		struct btf_header *hdr;
 		void *data;
 	};
-	struct btf_type **types;
+	__u32 *type_offs;
+	__u32 type_offs_cap;
 	const char *strings;
 	void *nohdr_data;
+	void *types_data;
 	__u32 nr_types;
-	__u32 types_size;
 	__u32 data_size;
 	int fd;
 	int ptr_sz;
@@ -46,30 +47,30 @@ static inline __u64 ptr_to_u64(const void *ptr)
 	return (__u64) (unsigned long) ptr;
 }
 
-static int btf_add_type(struct btf *btf, struct btf_type *t)
+static int btf_add_type_idx_entry(struct btf *btf, __u32 type_off)
 {
-	if (btf->types_size - btf->nr_types < 2) {
-		struct btf_type **new_types;
+	/* nr_types is 1-based, so N types means we need N+1-sized array */
+	if (btf->nr_types + 2 > btf->type_offs_cap) {
+		__u32 *new_offs;
 		__u32 expand_by, new_size;
 
-		if (btf->types_size == BTF_MAX_NR_TYPES)
+		if (btf->type_offs_cap == BTF_MAX_NR_TYPES)
 			return -E2BIG;
 
-		expand_by = max(btf->types_size >> 2, 16U);
-		new_size = min(BTF_MAX_NR_TYPES, btf->types_size + expand_by);
+		expand_by = max(btf->type_offs_cap / 4, 16U);
+		new_size = min(BTF_MAX_NR_TYPES, btf->type_offs_cap + expand_by);
 
-		new_types = libbpf_reallocarray(btf->types, new_size, sizeof(*new_types));
-		if (!new_types)
+		new_offs = libbpf_reallocarray(btf->type_offs, new_size, sizeof(*new_offs));
+		if (!new_offs)
 			return -ENOMEM;
 
-		if (btf->nr_types == 0)
-			new_types[0] = &btf_void;
+		new_offs[0] = UINT_MAX; /* VOID is specially handled */
 
-		btf->types = new_types;
-		btf->types_size = new_size;
+		btf->type_offs = new_offs;
+		btf->type_offs_cap = new_size;
 	}
 
-	btf->types[++(btf->nr_types)] = t;
+	btf->type_offs[btf->nr_types + 1] = type_off;
 
 	return 0;
 }
@@ -147,7 +148,7 @@ static int btf_parse_str_sec(struct btf *btf)
 	return 0;
 }
 
-static int btf_type_size(struct btf_type *t)
+static int btf_type_size(const struct btf_type *t)
 {
 	int base_size = sizeof(struct btf_type);
 	__u16 vlen = btf_vlen(t);
@@ -185,22 +186,25 @@ static int btf_type_size(struct btf_type *t)
 static int btf_parse_type_sec(struct btf *btf)
 {
 	struct btf_header *hdr = btf->hdr;
-	void *nohdr_data = btf->nohdr_data;
-	void *next_type = nohdr_data + hdr->type_off;
-	void *end_type = nohdr_data + hdr->str_off;
+	void *next_type = btf->nohdr_data + hdr->type_off;
+	void *end_type = next_type + hdr->type_len;
+
+	btf->types_data = next_type;
 
 	while (next_type < end_type) {
-		struct btf_type *t = next_type;
 		int type_size;
 		int err;
 
-		type_size = btf_type_size(t);
+		err = btf_add_type_idx_entry(btf, next_type - btf->types_data);
+		if (err)
+			return err;
+
+		type_size = btf_type_size(next_type);
 		if (type_size < 0)
 			return type_size;
+
 		next_type += type_size;
-		err = btf_add_type(btf, t);
-		if (err)
-			return err;
+		btf->nr_types++;
 	}
 
 	return 0;
@@ -211,12 +215,20 @@ __u32 btf__get_nr_types(const struct btf *btf)
 	return btf->nr_types;
 }
 
+/* internal helper returning non-const pointer to a type */
+static struct btf_type *btf_type_by_id(struct btf *btf, __u32 type_id)
+{
+	if (type_id == 0)
+		return &btf_void;
+
+	return btf->types_data + btf->type_offs[type_id];
+}
+
 const struct btf_type *btf__type_by_id(const struct btf *btf, __u32 type_id)
 {
 	if (type_id > btf->nr_types)
 		return NULL;
-
-	return btf->types[type_id];
+	return btf_type_by_id((struct btf *)btf, type_id);
 }
 
 static int determine_ptr_size(const struct btf *btf)
@@ -414,7 +426,7 @@ __s32 btf__find_by_name(const struct btf *btf, const char *type_name)
 		return 0;
 
 	for (i = 1; i <= btf->nr_types; i++) {
-		const struct btf_type *t = btf->types[i];
+		const struct btf_type *t = btf__type_by_id(btf, i);
 		const char *name = btf__name_by_offset(btf, t->name_off);
 
 		if (name && !strcmp(type_name, name))
@@ -433,7 +445,7 @@ __s32 btf__find_by_name_kind(const struct btf *btf, const char *type_name,
 		return 0;
 
 	for (i = 1; i <= btf->nr_types; i++) {
-		const struct btf_type *t = btf->types[i];
+		const struct btf_type *t = btf__type_by_id(btf, i);
 		const char *name;
 
 		if (btf_kind(t) != kind)
@@ -455,7 +467,7 @@ void btf__free(struct btf *btf)
 		close(btf->fd);
 
 	free(btf->data);
-	free(btf->types);
+	free(btf->type_offs);
 	free(btf);
 }
 
@@ -789,7 +801,7 @@ int btf__finalize_data(struct bpf_object *obj, struct btf *btf)
 	__u32 i;
 
 	for (i = 1; i <= btf->nr_types; i++) {
-		struct btf_type *t = btf->types[i];
+		struct btf_type *t = btf_type_by_id(btf, i);
 
 		/* Loader needs to fix up some of the things compiler
 		 * couldn't get its hands on while emitting BTF. This
@@ -1655,7 +1667,7 @@ static struct btf_dedup *btf_dedup_new(struct btf *btf, struct btf_ext *btf_ext,
 	/* special BTF "void" type is made canonical immediately */
 	d->map[0] = 0;
 	for (i = 1; i <= btf->nr_types; i++) {
-		struct btf_type *t = d->btf->types[i];
+		struct btf_type *t = btf_type_by_id(d->btf, i);
 
 		/* VAR and DATASEC are never deduped and are self-canonical */
 		if (btf_is_var(t) || btf_is_datasec(t))
@@ -1694,7 +1706,7 @@ static int btf_for_each_str_off(struct btf_dedup *d, str_off_fn_t fn, void *ctx)
 	struct btf_type *t;
 
 	for (i = 1; i <= d->btf->nr_types; i++) {
-		t = d->btf->types[i];
+		t = btf_type_by_id(d->btf, i);
 		r = fn(&t->name_off, ctx);
 		if (r)
 			return r;
@@ -2229,7 +2241,7 @@ static bool btf_compat_fnproto(struct btf_type *t1, struct btf_type *t2)
  */
 static int btf_dedup_prim_type(struct btf_dedup *d, __u32 type_id)
 {
-	struct btf_type *t = d->btf->types[type_id];
+	struct btf_type *t = btf_type_by_id(d->btf, type_id);
 	struct hashmap_entry *hash_entry;
 	struct btf_type *cand;
 	/* if we don't find equivalent type, then we are canonical */
@@ -2256,7 +2268,7 @@ static int btf_dedup_prim_type(struct btf_dedup *d, __u32 type_id)
 		h = btf_hash_int(t);
 		for_each_dedup_cand(d, hash_entry, h) {
 			cand_id = (__u32)(long)hash_entry->value;
-			cand = d->btf->types[cand_id];
+			cand = btf_type_by_id(d->btf, cand_id);
 			if (btf_equal_int(t, cand)) {
 				new_id = cand_id;
 				break;
@@ -2268,7 +2280,7 @@ static int btf_dedup_prim_type(struct btf_dedup *d, __u32 type_id)
 		h = btf_hash_enum(t);
 		for_each_dedup_cand(d, hash_entry, h) {
 			cand_id = (__u32)(long)hash_entry->value;
-			cand = d->btf->types[cand_id];
+			cand = btf_type_by_id(d->btf, cand_id);
 			if (btf_equal_enum(t, cand)) {
 				new_id = cand_id;
 				break;
@@ -2291,7 +2303,7 @@ static int btf_dedup_prim_type(struct btf_dedup *d, __u32 type_id)
 		h = btf_hash_common(t);
 		for_each_dedup_cand(d, hash_entry, h) {
 			cand_id = (__u32)(long)hash_entry->value;
-			cand = d->btf->types[cand_id];
+			cand = btf_type_by_id(d->btf, cand_id);
 			if (btf_equal_common(t, cand)) {
 				new_id = cand_id;
 				break;
@@ -2350,13 +2362,13 @@ static uint32_t resolve_fwd_id(struct btf_dedup *d, uint32_t type_id)
 {
 	__u32 orig_type_id = type_id;
 
-	if (!btf_is_fwd(d->btf->types[type_id]))
+	if (!btf_is_fwd(btf__type_by_id(d->btf, type_id)))
 		return type_id;
 
 	while (is_type_mapped(d, type_id) && d->map[type_id] != type_id)
 		type_id = d->map[type_id];
 
-	if (!btf_is_fwd(d->btf->types[type_id]))
+	if (!btf_is_fwd(btf__type_by_id(d->btf, type_id)))
 		return type_id;
 
 	return orig_type_id;
@@ -2484,8 +2496,8 @@ static int btf_dedup_is_equiv(struct btf_dedup *d, __u32 cand_id,
 	if (btf_dedup_hypot_map_add(d, canon_id, cand_id))
 		return -ENOMEM;
 
-	cand_type = d->btf->types[cand_id];
-	canon_type = d->btf->types[canon_id];
+	cand_type = btf_type_by_id(d->btf, cand_id);
+	canon_type = btf_type_by_id(d->btf, canon_id);
 	cand_kind = btf_kind(cand_type);
 	canon_kind = btf_kind(canon_type);
 
@@ -2636,8 +2648,8 @@ static void btf_dedup_merge_hypot_map(struct btf_dedup *d)
 		targ_type_id = d->hypot_map[cand_type_id];
 		t_id = resolve_type_id(d, targ_type_id);
 		c_id = resolve_type_id(d, cand_type_id);
-		t_kind = btf_kind(d->btf->types[t_id]);
-		c_kind = btf_kind(d->btf->types[c_id]);
+		t_kind = btf_kind(btf__type_by_id(d->btf, t_id));
+		c_kind = btf_kind(btf__type_by_id(d->btf, c_id));
 		/*
 		 * Resolve FWD into STRUCT/UNION.
 		 * It's ok to resolve FWD into STRUCT/UNION that's not yet
@@ -2705,7 +2717,7 @@ static int btf_dedup_struct_type(struct btf_dedup *d, __u32 type_id)
 	if (d->map[type_id] <= BTF_MAX_NR_TYPES)
 		return 0;
 
-	t = d->btf->types[type_id];
+	t = btf_type_by_id(d->btf, type_id);
 	kind = btf_kind(t);
 
 	if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION)
@@ -2726,7 +2738,7 @@ static int btf_dedup_struct_type(struct btf_dedup *d, __u32 type_id)
 		 * creating a loop (FWD -> STRUCT and STRUCT -> FWD), because
 		 * FWD and compatible STRUCT/UNION are considered equivalent.
 		 */
-		cand_type = d->btf->types[cand_id];
+		cand_type = btf_type_by_id(d->btf, cand_id);
 		if (!btf_shallow_equal_struct(t, cand_type))
 			continue;
 
@@ -2798,7 +2810,7 @@ static int btf_dedup_ref_type(struct btf_dedup *d, __u32 type_id)
 	if (d->map[type_id] <= BTF_MAX_NR_TYPES)
 		return resolve_type_id(d, type_id);
 
-	t = d->btf->types[type_id];
+	t = btf_type_by_id(d->btf, type_id);
 	d->map[type_id] = BTF_IN_PROGRESS_ID;
 
 	switch (btf_kind(t)) {
@@ -2816,7 +2828,7 @@ static int btf_dedup_ref_type(struct btf_dedup *d, __u32 type_id)
 		h = btf_hash_common(t);
 		for_each_dedup_cand(d, hash_entry, h) {
 			cand_id = (__u32)(long)hash_entry->value;
-			cand = d->btf->types[cand_id];
+			cand = btf_type_by_id(d->btf, cand_id);
 			if (btf_equal_common(t, cand)) {
 				new_id = cand_id;
 				break;
@@ -2840,7 +2852,7 @@ static int btf_dedup_ref_type(struct btf_dedup *d, __u32 type_id)
 		h = btf_hash_array(t);
 		for_each_dedup_cand(d, hash_entry, h) {
 			cand_id = (__u32)(long)hash_entry->value;
-			cand = d->btf->types[cand_id];
+			cand = btf_type_by_id(d->btf, cand_id);
 			if (btf_equal_array(t, cand)) {
 				new_id = cand_id;
 				break;
@@ -2872,7 +2884,7 @@ static int btf_dedup_ref_type(struct btf_dedup *d, __u32 type_id)
 		h = btf_hash_fnproto(t);
 		for_each_dedup_cand(d, hash_entry, h) {
 			cand_id = (__u32)(long)hash_entry->value;
-			cand = d->btf->types[cand_id];
+			cand = btf_type_by_id(d->btf, cand_id);
 			if (btf_equal_fnproto(t, cand)) {
 				new_id = cand_id;
 				break;
@@ -2920,9 +2932,9 @@ static int btf_dedup_ref_types(struct btf_dedup *d)
  */
 static int btf_dedup_compact_types(struct btf_dedup *d)
 {
-	struct btf_type **new_types;
+	__u32 *new_offs;
 	__u32 next_type_id = 1;
-	char *types_start, *p;
+	void *p;
 	int i, len;
 
 	/* we are going to reuse hypot_map to store compaction remapping */
@@ -2930,41 +2942,40 @@ static int btf_dedup_compact_types(struct btf_dedup *d)
 	for (i = 1; i <= d->btf->nr_types; i++)
 		d->hypot_map[i] = BTF_UNPROCESSED_ID;
 
-	types_start = d->btf->nohdr_data + d->btf->hdr->type_off;
-	p = types_start;
+	p = d->btf->types_data;
 
 	for (i = 1; i <= d->btf->nr_types; i++) {
 		if (d->map[i] != i)
 			continue;
 
-		len = btf_type_size(d->btf->types[i]);
+		len = btf_type_size(btf__type_by_id(d->btf, i));
 		if (len < 0)
 			return len;
 
-		memmove(p, d->btf->types[i], len);
+		memmove(p, btf__type_by_id(d->btf, i), len);
 		d->hypot_map[i] = next_type_id;
-		d->btf->types[next_type_id] = (struct btf_type *)p;
+		d->btf->type_offs[next_type_id] = p - d->btf->types_data;
 		p += len;
 		next_type_id++;
 	}
 
 	/* shrink struct btf's internal types index and update btf_header */
 	d->btf->nr_types = next_type_id - 1;
-	d->btf->types_size = d->btf->nr_types;
-	d->btf->hdr->type_len = p - types_start;
-	new_types = libbpf_reallocarray(d->btf->types, (1 + d->btf->nr_types),
-					sizeof(struct btf_type *));
-	if (!new_types)
+	d->btf->type_offs_cap = d->btf->nr_types + 1;
+	d->btf->hdr->type_len = p - d->btf->types_data;
+	new_offs = libbpf_reallocarray(d->btf->type_offs, d->btf->type_offs_cap,
+				       sizeof(*new_offs));
+	if (!new_offs)
 		return -ENOMEM;
-	d->btf->types = new_types;
+	d->btf->type_offs = new_offs;
 
 	/* make sure string section follows type information without gaps */
-	d->btf->hdr->str_off = p - (char *)d->btf->nohdr_data;
+	d->btf->hdr->str_off = p - d->btf->nohdr_data;
 	memmove(p, d->btf->strings, d->btf->hdr->str_len);
 	d->btf->strings = p;
 	p += d->btf->hdr->str_len;
 
-	d->btf->data_size = p - (char *)d->btf->data;
+	d->btf->data_size = p - d->btf->data;
 	return 0;
 }
 
@@ -2997,7 +3008,7 @@ static int btf_dedup_remap_type_id(struct btf_dedup *d, __u32 type_id)
  */
 static int btf_dedup_remap_type(struct btf_dedup *d, __u32 type_id)
 {
-	struct btf_type *t = d->btf->types[type_id];
+	struct btf_type *t = btf_type_by_id(d->btf, type_id);
 	int i, r;
 
 	switch (btf_kind(t)) {
-- 
2.24.1



* [PATCH bpf-next 2/9] libbpf: remove assumption of single contiguous memory for BTF data
  2020-09-23 15:54 [PATCH bpf-next 0/9] libbpf: BTF writer APIs Andrii Nakryiko
  2020-09-23 15:54 ` [PATCH bpf-next 1/9] libbpf: refactor internals of BTF type index Andrii Nakryiko
@ 2020-09-23 15:54 ` Andrii Nakryiko
  2020-09-24 15:21   ` John Fastabend
  2020-09-23 15:54 ` [PATCH bpf-next 3/9] libbpf: generalize common logic for managing dynamically-sized arrays Andrii Nakryiko
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Andrii Nakryiko @ 2020-09-23 15:54 UTC
  To: bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Arnaldo Carvalho de Melo

Refactor internals of struct btf to remove the assumption that the BTF
header, type data, and string data are laid out contiguously in memory in
a single memory allocation. Now we have three separate pointers pointing to
the start of each respective area: header, types, strings. In the next
patches, these pointers will be re-assigned to point to independently
allocated memory areas, if BTF needs to be modified.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/bpf.c |  2 +-
 tools/lib/bpf/bpf.h |  2 +-
 tools/lib/bpf/btf.c | 99 ++++++++++++++++++++++++++-------------------
 3 files changed, 60 insertions(+), 43 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 2baa1308737c..9f3224c385af 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -815,7 +815,7 @@ int bpf_raw_tracepoint_open(const char *name, int prog_fd)
 	return sys_bpf(BPF_RAW_TRACEPOINT_OPEN, &attr, sizeof(attr));
 }
 
-int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size,
+int bpf_load_btf(const void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size,
 		 bool do_log)
 {
 	union bpf_attr attr = {};
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 8c1ac4b42f90..671a6e6a4ce9 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -234,7 +234,7 @@ LIBBPF_API int bpf_prog_query(int target_fd, enum bpf_attach_type type,
 			      __u32 query_flags, __u32 *attach_flags,
 			      __u32 *prog_ids, __u32 *prog_cnt);
 LIBBPF_API int bpf_raw_tracepoint_open(const char *name, int prog_fd);
-LIBBPF_API int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf,
+LIBBPF_API int bpf_load_btf(const void *btf, __u32 btf_size, char *log_buf,
 			    __u32 log_buf_size, bool do_log);
 LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf,
 				 __u32 *buf_len, __u32 *prog_id, __u32 *fd_type,
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 7c9957893ef2..d180a677a3fb 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -27,18 +27,37 @@
 static struct btf_type btf_void;
 
 struct btf {
-	union {
-		struct btf_header *hdr;
-		void *data;
-	};
+	void *raw_data;
+	__u32 raw_size;
+
+	/*
+	 * When BTF is loaded from ELF or raw memory, it is stored in
+	 * a contiguous memory block, pointed to by the raw_data pointer, and
+	 * hdr, types_data, and strs_data point inside that memory region to
+	 * respective parts of BTF representation:
+	 *
+	 * +--------------------------------+
+	 * |  Header  |  Types  |  Strings  |
+	 * +--------------------------------+
+	 * ^          ^         ^
+	 * |          |         |
+	 * hdr        |         |
+	 * types_data-+         |
+	 * strs_data------------+
+	 */
+	struct btf_header *hdr;
+	void *types_data;
+	void *strs_data;
+
+	/* type ID to `struct btf_type *` lookup index */
 	__u32 *type_offs;
 	__u32 type_offs_cap;
-	const char *strings;
-	void *nohdr_data;
-	void *types_data;
 	__u32 nr_types;
-	__u32 data_size;
+
+	/* BTF object FD, if loaded into kernel */
 	int fd;
+
+	/* Pointer size (in bytes) for a target architecture of this BTF */
 	int ptr_sz;
 };
 
@@ -80,7 +99,7 @@ static int btf_parse_hdr(struct btf *btf)
 	const struct btf_header *hdr = btf->hdr;
 	__u32 meta_left;
 
-	if (btf->data_size < sizeof(struct btf_header)) {
+	if (btf->raw_size < sizeof(struct btf_header)) {
 		pr_debug("BTF header not found\n");
 		return -EINVAL;
 	}
@@ -100,7 +119,7 @@ static int btf_parse_hdr(struct btf *btf)
 		return -ENOTSUP;
 	}
 
-	meta_left = btf->data_size - sizeof(*hdr);
+	meta_left = btf->raw_size - sizeof(*hdr);
 	if (!meta_left) {
 		pr_debug("BTF has no data\n");
 		return -EINVAL;
@@ -126,15 +145,13 @@ static int btf_parse_hdr(struct btf *btf)
 		return -EINVAL;
 	}
 
-	btf->nohdr_data = btf->hdr + 1;
-
 	return 0;
 }
 
 static int btf_parse_str_sec(struct btf *btf)
 {
 	const struct btf_header *hdr = btf->hdr;
-	const char *start = btf->nohdr_data + hdr->str_off;
+	const char *start = btf->strs_data;
 	const char *end = start + btf->hdr->str_len;
 
 	if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_STR_OFFSET ||
@@ -143,8 +160,6 @@ static int btf_parse_str_sec(struct btf *btf)
 		return -EINVAL;
 	}
 
-	btf->strings = start;
-
 	return 0;
 }
 
@@ -186,11 +201,9 @@ static int btf_type_size(const struct btf_type *t)
 static int btf_parse_type_sec(struct btf *btf)
 {
 	struct btf_header *hdr = btf->hdr;
-	void *next_type = btf->nohdr_data + hdr->type_off;
+	void *next_type = btf->types_data;
 	void *end_type = next_type + hdr->type_len;
 
-	btf->types_data = next_type;
-
 	while (next_type < end_type) {
 		int type_size;
 		int err;
@@ -466,7 +479,7 @@ void btf__free(struct btf *btf)
 	if (btf->fd >= 0)
 		close(btf->fd);
 
-	free(btf->data);
+	free(btf->raw_data);
 	free(btf->type_offs);
 	free(btf);
 }
@@ -482,24 +495,24 @@ struct btf *btf__new(const void *data, __u32 size)
 
 	btf->fd = -1;
 
-	btf->data = malloc(size);
-	if (!btf->data) {
+	btf->raw_data = malloc(size);
+	if (!btf->raw_data) {
 		err = -ENOMEM;
 		goto done;
 	}
+	memcpy(btf->raw_data, data, size);
+	btf->raw_size = size;
 
-	memcpy(btf->data, data, size);
-	btf->data_size = size;
-
+	btf->hdr = btf->raw_data;
 	err = btf_parse_hdr(btf);
 	if (err)
 		goto done;
 
-	err = btf_parse_str_sec(btf);
-	if (err)
-		goto done;
+	btf->strs_data = btf->raw_data + btf->hdr->hdr_len + btf->hdr->str_off;
+	btf->types_data = btf->raw_data + btf->hdr->hdr_len + btf->hdr->type_off;
 
-	err = btf_parse_type_sec(btf);
+	err = btf_parse_str_sec(btf);
+	err = err ?: btf_parse_type_sec(btf);
 
 done:
 	if (err) {
@@ -820,8 +833,9 @@ int btf__finalize_data(struct bpf_object *obj, struct btf *btf)
 
 int btf__load(struct btf *btf)
 {
-	__u32 log_buf_size = 0;
+	__u32 log_buf_size = 0, raw_size;
 	char *log_buf = NULL;
+	const void *raw_data;
 	int err = 0;
 
 	if (btf->fd >= 0)
@@ -836,8 +850,13 @@ int btf__load(struct btf *btf)
 		*log_buf = 0;
 	}
 
-	btf->fd = bpf_load_btf(btf->data, btf->data_size,
-			       log_buf, log_buf_size, false);
+	raw_data = btf__get_raw_data(btf, &raw_size);
+	if (!raw_data) {
+		err = -ENOMEM;
+		goto done;
+	}
+
+	btf->fd = bpf_load_btf(raw_data, raw_size, log_buf, log_buf_size, false);
 	if (btf->fd < 0) {
 		if (!log_buf || errno == ENOSPC) {
 			log_buf_size = max((__u32)BPF_LOG_BUF_SIZE,
@@ -870,14 +889,14 @@ void btf__set_fd(struct btf *btf, int fd)
 
 const void *btf__get_raw_data(const struct btf *btf, __u32 *size)
 {
-	*size = btf->data_size;
-	return btf->data;
+	*size = btf->raw_size;
+	return btf->raw_data;
 }
 
 const char *btf__name_by_offset(const struct btf *btf, __u32 offset)
 {
 	if (offset < btf->hdr->str_len)
-		return &btf->strings[offset];
+		return btf->strs_data + offset;
 	else
 		return NULL;
 }
@@ -1860,8 +1879,7 @@ static int btf_str_remap_offset(__u32 *str_off_ptr, void *ctx)
  */
 static int btf_dedup_strings(struct btf_dedup *d)
 {
-	const struct btf_header *hdr = d->btf->hdr;
-	char *start = (char *)d->btf->nohdr_data + hdr->str_off;
+	char *start = d->btf->strs_data;
 	char *end = start + d->btf->hdr->str_len;
 	char *p = start, *tmp_strs = NULL;
 	struct btf_str_ptrs strs = {
@@ -2970,12 +2988,11 @@ static int btf_dedup_compact_types(struct btf_dedup *d)
 	d->btf->type_offs = new_offs;
 
 	/* make sure string section follows type information without gaps */
-	d->btf->hdr->str_off = p - d->btf->nohdr_data;
-	memmove(p, d->btf->strings, d->btf->hdr->str_len);
-	d->btf->strings = p;
-	p += d->btf->hdr->str_len;
+	d->btf->hdr->str_off = p - d->btf->types_data;
+	memmove(p, d->btf->strs_data, d->btf->hdr->str_len);
+	d->btf->strs_data = p;
 
-	d->btf->data_size = p - d->btf->data;
+	d->btf->raw_size = d->btf->hdr->hdr_len + d->btf->hdr->type_len + d->btf->hdr->str_len;
 	return 0;
 }
 
-- 
2.24.1



* [PATCH bpf-next 3/9] libbpf: generalize common logic for managing dynamically-sized arrays
  2020-09-23 15:54 [PATCH bpf-next 0/9] libbpf: BTF writer APIs Andrii Nakryiko
  2020-09-23 15:54 ` [PATCH bpf-next 1/9] libbpf: refactor internals of BTF type index Andrii Nakryiko
  2020-09-23 15:54 ` [PATCH bpf-next 2/9] libbpf: remove assumption of single contiguous memory for BTF data Andrii Nakryiko
@ 2020-09-23 15:54 ` Andrii Nakryiko
  2020-09-23 15:54 ` [PATCH bpf-next 4/9] libbpf: extract generic string hashing function for reuse Andrii Nakryiko
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2020-09-23 15:54 UTC
  To: bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Arnaldo Carvalho de Melo

Managing a dynamically-sized array is common but non-trivial functionality,
which requires a significant amount of logic and code to implement properly.
So instead of re-implementing it all the time, extract it into a helper
function and reuse it.
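
For example, a caller growing a __u32 array by one element would look roughly
like this (MAX_CNT standing in for whatever limit the caller enforces):

  __u32 *ids = NULL;
  size_t ids_cap = 0, nr_ids = 0;
  __u32 *p;

  p = btf_add_mem((void **)&ids, &ids_cap, sizeof(*ids),
                  nr_ids, MAX_CNT, 1 /* add one element */);
  if (!p)
          return -ENOMEM;
  *p = 42;
  nr_ids++;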

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/btf.c             | 77 ++++++++++++++++++++++++---------
 tools/lib/bpf/libbpf_internal.h |  3 ++
 2 files changed, 59 insertions(+), 21 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index d180a677a3fb..f5255e2bd222 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -51,7 +51,7 @@ struct btf {
 
 	/* type ID to `struct btf_type *` lookup index */
 	__u32 *type_offs;
-	__u32 type_offs_cap;
+	size_t type_offs_cap;
 	__u32 nr_types;
 
 	/* BTF object FD, if loaded into kernel */
@@ -66,31 +66,60 @@ static inline __u64 ptr_to_u64(const void *ptr)
 	return (__u64) (unsigned long) ptr;
 }
 
-static int btf_add_type_idx_entry(struct btf *btf, __u32 type_off)
+/* Ensure given dynamically allocated memory region pointed to by *data* with
+ * capacity of *cap_cnt* elements each taking *elem_sz* bytes has enough
+ * memory to accommodate *add_cnt* new elements, assuming *cur_cnt* elements
+ * are already used. At most *max_cnt* elements can ever be allocated.
+ * If necessary, memory is reallocated and all existing data is copied over,
+ * new pointer to the memory region is stored at *data, new memory region
+ * capacity (in number of elements) is stored in *cap_cnt.
+ * On success, memory pointer to the beginning of unused memory is returned.
+ * On error, NULL is returned.
+ */
+void *btf_add_mem(void **data, size_t *cap_cnt, size_t elem_sz,
+		  size_t cur_cnt, size_t max_cnt, size_t add_cnt)
 {
-	/* nr_types is 1-based, so N types means we need N+1-sized array */
-	if (btf->nr_types + 2 > btf->type_offs_cap) {
-		__u32 *new_offs;
-		__u32 expand_by, new_size;
+	size_t new_cnt;
+	void *new_data;
 
-		if (btf->type_offs_cap == BTF_MAX_NR_TYPES)
-			return -E2BIG;
+	if (cur_cnt + add_cnt <= *cap_cnt)
+		return *data + cur_cnt * elem_sz;
 
-		expand_by = max(btf->type_offs_cap / 4, 16U);
-		new_size = min(BTF_MAX_NR_TYPES, btf->type_offs_cap + expand_by);
+	/* requested more than the set limit */
+	if (cur_cnt + add_cnt > max_cnt)
+		return NULL;
 
-		new_offs = libbpf_reallocarray(btf->type_offs, new_size, sizeof(*new_offs));
-		if (!new_offs)
-			return -ENOMEM;
+	new_cnt = *cap_cnt;
+	new_cnt += new_cnt / 4;		  /* expand by 25% */
+	if (new_cnt < 16)		  /* but at least 16 elements */
+		new_cnt = 16;
+	if (new_cnt > max_cnt)		  /* but not exceeding a set limit */
+		new_cnt = max_cnt;
+	if (new_cnt < cur_cnt + add_cnt)  /* also ensure we have enough memory */
+		new_cnt = cur_cnt + add_cnt;
+
+	new_data = libbpf_reallocarray(*data, new_cnt, elem_sz);
+	if (!new_data)
+		return NULL;
 
-		new_offs[0] = UINT_MAX; /* VOID is specially handled */
+	/* zero out newly allocated portion of memory */
+	memset(new_data + (*cap_cnt) * elem_sz, 0, (new_cnt - *cap_cnt) * elem_sz);
 
-		btf->type_offs = new_offs;
-		btf->type_offs_cap = new_size;
-	}
+	*data = new_data;
+	*cap_cnt = new_cnt;
+	return new_data + cur_cnt * elem_sz;
+}
 
-	btf->type_offs[btf->nr_types + 1] = type_off;
+static int btf_add_type_idx_entry(struct btf *btf, __u32 type_off)
+{
+	__u32 *p;
+
+	p = btf_add_mem((void **)&btf->type_offs, &btf->type_offs_cap, sizeof(__u32),
+			btf->nr_types + 1, BTF_MAX_NR_TYPES, 1);
+	if (!p)
+		return -ENOMEM;
 
+	*p = type_off;
 	return 0;
 }
 
@@ -203,11 +232,17 @@ static int btf_parse_type_sec(struct btf *btf)
 	struct btf_header *hdr = btf->hdr;
 	void *next_type = btf->types_data;
 	void *end_type = next_type + hdr->type_len;
+	int err, type_size;
 
-	while (next_type < end_type) {
-		int type_size;
-		int err;
+	/* VOID (type_id == 0) is specially handled by btf_type_by_id(),
+	 * so ensure we can never properly use its offset from index by
+	 * setting it to a large value
+	 */
+	err = btf_add_type_idx_entry(btf, UINT_MAX);
+	if (err)
+		return err;
 
+	while (next_type < end_type) {
 		err = btf_add_type_idx_entry(btf, next_type - btf->types_data);
 		if (err)
 			return err;
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index 4d1c366fca2c..d43ce00e0022 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -105,6 +105,9 @@ static inline void *libbpf_reallocarray(void *ptr, size_t nmemb, size_t size)
 	return realloc(ptr, total);
 }
 
+void *btf_add_mem(void **data, size_t *cap_cnt, size_t elem_sz,
+		  size_t cur_cnt, size_t max_cnt, size_t add_cnt);
+
 static inline bool libbpf_validate_opts(const char *opts,
 					size_t opts_sz, size_t user_sz,
 					const char *type_name)
-- 
2.24.1



* [PATCH bpf-next 4/9] libbpf: extract generic string hashing function for reuse
  2020-09-23 15:54 [PATCH bpf-next 0/9] libbpf: BTF writer APIs Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2020-09-23 15:54 ` [PATCH bpf-next 3/9] libbpf: generalize common logic for managing dynamically-sized arrays Andrii Nakryiko
@ 2020-09-23 15:54 ` Andrii Nakryiko
  2020-09-23 15:54 ` [PATCH bpf-next 5/9] libbpf: allow modification of BTF and add btf__add_str API Andrii Nakryiko
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2020-09-23 15:54 UTC
  To: bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Arnaldo Carvalho de Melo

Calculating a hash of a zero-terminated string is a common need when using
hashmap, so extract it for reuse.
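
For example, a hashmap keyed directly by C strings can now reuse it (the
callback names below are just illustrative):

  static size_t my_str_hash_fn(const void *key, void *ctx)
  {
          return str_hash(key);
  }

  static bool my_str_equal_fn(const void *a, const void *b, void *ctx)
  {
          return strcmp(a, b) == 0;
  }

  struct hashmap *map = hashmap__new(my_str_hash_fn, my_str_equal_fn, NULL);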

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/btf_dump.c |  9 +--------
 tools/lib/bpf/hashmap.h  | 12 ++++++++++++
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c
index 6c079b3c8679..91310e528a3a 100644
--- a/tools/lib/bpf/btf_dump.c
+++ b/tools/lib/bpf/btf_dump.c
@@ -90,14 +90,7 @@ struct btf_dump {
 
 static size_t str_hash_fn(const void *key, void *ctx)
 {
-	const char *s = key;
-	size_t h = 0;
-
-	while (*s) {
-		h = h * 31 + *s;
-		s++;
-	}
-	return h;
+	return str_hash(key);
 }
 
 static bool str_equal_fn(const void *a, const void *b, void *ctx)
diff --git a/tools/lib/bpf/hashmap.h b/tools/lib/bpf/hashmap.h
index e0af36b0e5d8..d9b385fe808c 100644
--- a/tools/lib/bpf/hashmap.h
+++ b/tools/lib/bpf/hashmap.h
@@ -25,6 +25,18 @@ static inline size_t hash_bits(size_t h, int bits)
 #endif
 }
 
+/* generic C-string hashing function */
+static inline size_t str_hash(const char *s)
+{
+	size_t h = 0;
+
+	while (*s) {
+		h = h * 31 + *s;
+		s++;
+	}
+	return h;
+}
+
 typedef size_t (*hashmap_hash_fn)(const void *key, void *ctx);
 typedef bool (*hashmap_equal_fn)(const void *key1, const void *key2, void *ctx);
 
-- 
2.24.1



* [PATCH bpf-next 5/9] libbpf: allow modification of BTF and add btf__add_str API
  2020-09-23 15:54 [PATCH bpf-next 0/9] libbpf: BTF writer APIs Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2020-09-23 15:54 ` [PATCH bpf-next 4/9] libbpf: extract generic string hashing function for reuse Andrii Nakryiko
@ 2020-09-23 15:54 ` Andrii Nakryiko
  2020-09-24 15:56   ` John Fastabend
  2020-09-23 15:54 ` [PATCH bpf-next 6/9] libbpf: add btf__new_empty() to create an empty BTF object Andrii Nakryiko
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Andrii Nakryiko @ 2020-09-23 15:54 UTC
  To: bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Arnaldo Carvalho de Melo

Allow the internal BTF representation to switch from the default read-only
mode, in which raw BTF data is a single non-modifiable block of memory with
the BTF header, types, and strings laid out sequentially and contiguously in
memory, into a writable representation with types and strings data split out
into separate memory regions that can be dynamically expanded.

Such a writable internal representation is transparent to users of libbpf
APIs, but allows appending new types and strings at the end of BTF, which is
a typical use case when generating BTF programmatically. All the basic
guarantees of BTF types and strings layout are preserved, i.e., the user can
get a `struct btf_type *` pointer and read it directly. Such btf_type
pointers might be invalidated if BTF is modified, so some care is required in
such mixed read/write scenarios.

The switch from the read-only to the writable configuration happens
automatically the first time the user attempts to modify BTF by adding either
a new type or a new string. It is still possible to get raw BTF data, which
is a single piece of memory that can be persisted in an ELF section or into
a file as raw BTF. Such raw data memory is also still owned by BTF and will
be freed either when the BTF object is freed or when another modification to
BTF happens, as any modification invalidates the BTF raw representation.

This patch adds the first BTF writing API: btf__add_str(), which allows
adding arbitrary strings to the BTF string section. All the added strings are
automatically deduplicated. This is achieved by maintaining an additional
string lookup index for all unique strings. This index is built when BTF is
switched to modifiable mode. If at that time the BTF strings section
contained duplicate strings, they are not de-duplicated. This is done
specifically to not modify the existing contents of BTF (types, their string
offsets, etc.), which can cause confusion and is an especially important
property if there is a struct btf_ext associated with struct btf. By
following this "imperfect deduplication" process, btf_ext is kept consistent
and correct. If deduplication of strings is necessary, it can be forced by
doing BTF deduplication, at which point all the strings will be eagerly
deduplicated and all string offsets in both struct btf and struct btf_ext
will be updated.
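
For example, a short sketch of the deduplicating behavior (a negative return
value indicates an error):

  int off1 = btf__add_str(btf, "int");  /* new string, appended */
  int off2 = btf__add_str(btf, "int");  /* duplicate, off2 == off1 */
  int off3 = btf__add_str(btf, "long"); /* new unique string, off3 != off1 */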

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/btf.c      | 230 +++++++++++++++++++++++++++++++++++++--
 tools/lib/bpf/btf.h      |   2 +
 tools/lib/bpf/libbpf.map |   1 +
 3 files changed, 225 insertions(+), 8 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index f5255e2bd222..8c8f15703608 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -44,16 +44,46 @@ struct btf {
 	 * hdr        |         |
 	 * types_data-+         |
 	 * strs_data------------+
+	 *
+	 * If BTF data is later modified, e.g., due to types added or
+	 * removed, BTF deduplication performed, etc, this contiguous
+	 * representation is broken up into three independently allocated
+	 * memory regions to be able to modify them independently.
+	 * raw_data is nulled out at that point, but can be later allocated
+	 * and cached again if user calls btf__get_raw_data(), at which point
+	 * raw_data will contain a contiguous copy of header, types, and
+	 * strings:
+	 *
+	 * +----------+  +---------+  +-----------+
+	 * |  Header  |  |  Types  |  |  Strings  |
+	 * +----------+  +---------+  +-----------+
+	 * ^             ^            ^
+	 * |             |            |
+	 * hdr           |            |
+	 * types_data----+            |
+	 * strs_data------------------+
+	 *
+	 *               +----------+---------+-----------+
+	 *               |  Header  |  Types  |  Strings  |
+	 * raw_data----->+----------+---------+-----------+
 	 */
 	struct btf_header *hdr;
+
 	void *types_data;
-	void *strs_data;
+	size_t types_data_cap; /* used size stored in hdr->type_len */
 
 	/* type ID to `struct btf_type *` lookup index */
 	__u32 *type_offs;
 	size_t type_offs_cap;
 	__u32 nr_types;
 
+	void *strs_data;
+	size_t strs_data_cap; /* used size stored in hdr->str_len */
+
+	/* lookup index for each unique string in strings section */
+	struct hashmap *strs_hash;
+	/* whether strings are already deduplicated */
+	bool strs_deduped;
 	/* BTF object FD, if loaded into kernel */
 	int fd;
 
@@ -506,6 +536,11 @@ __s32 btf__find_by_name_kind(const struct btf *btf, const char *type_name,
 	return -ENOENT;
 }
 
+static bool btf_is_modifiable(const struct btf *btf)
+{
+	return (void *)btf->hdr != btf->raw_data;
+}
+
 void btf__free(struct btf *btf)
 {
 	if (IS_ERR_OR_NULL(btf))
@@ -514,6 +549,17 @@ void btf__free(struct btf *btf)
 	if (btf->fd >= 0)
 		close(btf->fd);
 
+	if (btf_is_modifiable(btf)) {
+		/* if BTF was modified after loading, it will have a split
+		 * in-memory representation for header, types, and strings
+		 * sections, so we need to free all of them individually. It
+		 * might still have a cached contiguous raw data present,
+		 * which will be unconditionally freed below.
+		 */
+		free(btf->hdr);
+		free(btf->types_data);
+		free(btf->strs_data);
+	}
 	free(btf->raw_data);
 	free(btf->type_offs);
 	free(btf);
@@ -922,8 +968,29 @@ void btf__set_fd(struct btf *btf, int fd)
 	btf->fd = fd;
 }
 
-const void *btf__get_raw_data(const struct btf *btf, __u32 *size)
+const void *btf__get_raw_data(const struct btf *btf_ro, __u32 *size)
 {
+	struct btf *btf = (struct btf *)btf_ro;
+
+	if (!btf->raw_data) {
+		struct btf_header *hdr = btf->hdr;
+		void *data;
+
+		btf->raw_size = hdr->hdr_len + hdr->type_len + hdr->str_len;
+		btf->raw_data = calloc(1, btf->raw_size);
+		if (!btf->raw_data)
+			return NULL;
+		data = btf->raw_data;
+
+		memcpy(data, hdr, hdr->hdr_len);
+		data += hdr->hdr_len;
+
+		memcpy(data, btf->types_data, hdr->type_len);
+		data += hdr->type_len;
+
+		memcpy(data, btf->strs_data, hdr->str_len);
+		data += hdr->str_len;
+	}
 	*size = btf->raw_size;
 	return btf->raw_data;
 }
@@ -1071,6 +1138,151 @@ int btf__get_map_kv_tids(const struct btf *btf, const char *map_name,
 	return 0;
 }
 
+static size_t strs_hash_fn(const void *key, void *ctx)
+{
+	struct btf *btf = ctx;
+	const char *str = btf->strs_data + (long)key;
+
+	return str_hash(str);
+}
+
+static bool strs_hash_equal_fn(const void *key1, const void *key2, void *ctx)
+{
+	struct btf *btf = ctx;
+	const char *str1 = btf->strs_data + (long)key1;
+	const char *str2 = btf->strs_data + (long)key2;
+
+	return strcmp(str1, str2) == 0;
+}
+
+/* Ensure BTF is ready to be modified (by splitting it into three memory
+ * regions for header, types, and strings). Also invalidate cached
+ * raw_data, if any.
+ */
+static int btf_ensure_modifiable(struct btf *btf)
+{
+	void *hdr, *types, *strs, *strs_end, *s;
+	struct hashmap *hash = NULL;
+	long off;
+	int err;
+
+	if (btf_is_modifiable(btf)) {
+		/* any BTF modification invalidates raw_data */
+		if (btf->raw_data) {
+			free(btf->raw_data);
+			btf->raw_data = NULL;
+		}
+		return 0;
+	}
+
+	/* split raw data into three memory regions */
+	hdr = malloc(btf->hdr->hdr_len);
+	types = malloc(btf->hdr->type_len);
+	strs = malloc(btf->hdr->str_len);
+	if (!hdr || !types || !strs)
+		goto err_out;
+
+	memcpy(hdr, btf->hdr, btf->hdr->hdr_len);
+	memcpy(types, btf->types_data, btf->hdr->type_len);
+	memcpy(strs, btf->strs_data, btf->hdr->str_len);
+
+	/* build lookup index for all strings */
+	hash = hashmap__new(strs_hash_fn, strs_hash_equal_fn, btf);
+	if (IS_ERR(hash)) {
+		err = PTR_ERR(hash);
+		hash = NULL;
+		goto err_out;
+	}
+
+	strs_end = strs + btf->hdr->str_len;
+	for (off = 0, s = strs; s < strs_end; off += strlen(s) + 1, s = strs + off) {
+		/* hashmap__add() returns EEXIST if string with the same
+		 * content already is in the hash map
+		 */
+		err = hashmap__add(hash, (void *)off, (void *)off);
+		if (err == -EEXIST)
+			continue; /* duplicate */
+		if (err)
+			goto err_out;
+	}
+
+	/* only when everything was successful, update internal state */
+	btf->hdr = hdr;
+	btf->types_data = types;
+	btf->types_data_cap = btf->hdr->type_len;
+	btf->strs_data = strs;
+	btf->strs_data_cap = btf->hdr->str_len;
+	btf->strs_hash = hash;
+	/* if BTF was created from scratch, all strings are guaranteed to be
+	 * unique and deduplicated
+	 */
+	btf->strs_deduped = btf->hdr->str_len <= 1;
+
+	/* invalidate raw_data representation */
+	free(btf->raw_data);
+	btf->raw_data = NULL;
+
+	return 0;
+
+err_out:
+	hashmap__free(hash);
+	free(hdr);
+	free(types);
+	free(strs);
+	return -ENOMEM;
+}
+
+static void *btf_add_str_mem(struct btf *btf, size_t add_sz)
+{
+	return btf_add_mem(&btf->strs_data, &btf->strs_data_cap, 1,
+			   btf->hdr->str_len, BTF_MAX_STR_OFFSET, add_sz);
+}
+
+/* Add a string s to the BTF string section.
+ * Returns:
+ *   - > 0 offset into string section, on success;
+ *   - < 0, on error.
+ */
+int btf__add_str(struct btf *btf, const char *s)
+{
+	long old_off, new_off, len;
+	void *p;
+	int err;
+
+	if (btf_ensure_modifiable(btf))
+		return -ENOMEM;
+
+	/* Hashmap keys are always offsets within btf->strs_data, so to even
+	 * look up some string from the "outside", we need to first append it
+	 * at the end, so that it can be addressed with an offset. Luckily,
+	 * until btf->hdr->str_len is incremented, that string is just a piece
+	 * of garbage for the rest of BTF code, so no harm, no foul. On the
+	 * other hand, if the string is unique, it's already appended and
+	 * ready to be used, only a simple btf->hdr->str_len increment away.
+	 */
+	len = strlen(s) + 1;
+	p = btf_add_str_mem(btf, len);
+	if (!p)
+		return -ENOMEM;
+
+	new_off = btf->hdr->str_len;
+	memcpy(p, s, len);
+
+	/* Now attempt to add the string, but only if the string with the same
+	 * contents doesn't exist already (HASHMAP_ADD strategy). If such
+	 * string exists, we'll get its offset in old_off (that's old_key).
+	 */
+	err = hashmap__insert(btf->strs_hash, (void *)new_off, (void *)new_off,
+			      HASHMAP_ADD, (const void **)&old_off, NULL);
+	if (err == -EEXIST)
+		return old_off; /* duplicated string, return existing offset */
+	if (err)
+		return err;
+
+	btf->hdr->str_len += len; /* new unique string, adjust data length */
+	return new_off;
+}
+
 struct btf_ext_sec_setup_param {
 	__u32 off;
 	__u32 len;
@@ -1537,6 +1749,9 @@ int btf__dedup(struct btf *btf, struct btf_ext *btf_ext,
 		return -EINVAL;
 	}
 
+	if (btf_ensure_modifiable(btf))
+		return -ENOMEM;
+
 	err = btf_dedup_strings(d);
 	if (err < 0) {
 		pr_debug("btf_dedup_strings failed:%d\n", err);
@@ -1926,6 +2141,9 @@ static int btf_dedup_strings(struct btf_dedup *d)
 	int i, j, err = 0, grp_idx;
 	bool grp_used;
 
+	if (d->btf->strs_deduped)
+		return 0;
+
 	/* build index of all strings */
 	while (p < end) {
 		if (strs.cnt + 1 > strs.cap) {
@@ -2018,6 +2236,7 @@ static int btf_dedup_strings(struct btf_dedup *d)
 		goto done;
 
 	d->btf->hdr->str_len = end - start;
+	d->btf->strs_deduped = true;
 
 done:
 	free(tmp_strs);
@@ -3021,12 +3240,7 @@ static int btf_dedup_compact_types(struct btf_dedup *d)
 	if (!new_offs)
 		return -ENOMEM;
 	d->btf->type_offs = new_offs;
-
-	/* make sure string section follows type information without gaps */
-	d->btf->hdr->str_off = p - d->btf->types_data;
-	memmove(p, d->btf->strs_data, d->btf->hdr->str_len);
-	d->btf->strs_data = p;
-
+	d->btf->hdr->str_off = d->btf->hdr->type_len;
 	d->btf->raw_size = d->btf->hdr->hdr_len + d->btf->hdr->type_len + d->btf->hdr->str_len;
 	return 0;
 }
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 2a55320d87d0..84cb0502af95 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -72,6 +72,8 @@ LIBBPF_API __u32 btf_ext__line_info_rec_size(const struct btf_ext *btf_ext);
 
 LIBBPF_API struct btf *libbpf_find_kernel_btf(void);
 
+LIBBPF_API int btf__add_str(struct btf *btf, const char *s);
+
 struct btf_dedup_opts {
 	unsigned int dedup_table_size;
 	bool dont_resolve_fwds;
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 5f054dadf082..b4274c0fdd04 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -304,6 +304,7 @@ LIBBPF_0.2.0 {
 	global:
 		bpf_prog_bind_map;
 		bpf_program__section_name;
+		btf__add_str;
 		perf_buffer__buffer_cnt;
 		perf_buffer__buffer_fd;
 		perf_buffer__epoll_fd;
-- 
2.24.1



* [PATCH bpf-next 6/9] libbpf: add btf__new_empty() to create an empty BTF object
  2020-09-23 15:54 [PATCH bpf-next 0/9] libbpf: BTF writer APIs Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2020-09-23 15:54 ` [PATCH bpf-next 5/9] libbpf: allow modification of BTF and add btf__add_str API Andrii Nakryiko
@ 2020-09-23 15:54 ` Andrii Nakryiko
  2020-09-23 15:54 ` [PATCH bpf-next 7/9] libbpf: add BTF writing APIs Andrii Nakryiko
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2020-09-23 15:54 UTC
  To: bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Arnaldo Carvalho de Melo

Add the ability to create an empty BTF object from scratch. This is going to
be used by pahole for BTF encoding, and also by selftests for convenient
creation of BTF objects.
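
A minimal usage sketch, following btf__new()'s convention of returning
ERR_PTR-encoded errors (checkable with libbpf_get_error()):

  struct btf *btf = btf__new_empty();
  int err = libbpf_get_error(btf);

  if (err)
          return err;

  /* ... append types and strings ... */

  btf__free(btf);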

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/btf.c      | 28 ++++++++++++++++++++++++++++
 tools/lib/bpf/btf.h      |  1 +
 tools/lib/bpf/libbpf.map |  1 +
 3 files changed, 30 insertions(+)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 8c8f15703608..4d0e532d7b3d 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -565,6 +565,34 @@ void btf__free(struct btf *btf)
 	free(btf);
 }
 
+struct btf *btf__new_empty(void)
+{
+	struct btf *btf;
+
+	btf = calloc(1, sizeof(*btf));
+	if (!btf)
+		return ERR_PTR(-ENOMEM);
+
+	/* +1 for empty string at offset 0 */
+	btf->raw_size = sizeof(struct btf_header) + 1;
+	btf->raw_data = calloc(1, btf->raw_size);
+	if (!btf->raw_data) {
+		free(btf);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	btf->hdr = btf->raw_data;
+	btf->hdr->hdr_len = sizeof(struct btf_header);
+	btf->hdr->magic = BTF_MAGIC;
+	btf->hdr->version = BTF_VERSION;
+
+	btf->types_data = btf->raw_data + btf->hdr->hdr_len;
+	btf->strs_data = btf->raw_data + btf->hdr->hdr_len;
+	btf->hdr->str_len = 1; /* empty string at offset 0 */
+
+	return btf;
+}
+
 struct btf *btf__new(const void *data, __u32 size)
 {
 	struct btf *btf;
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 84cb0502af95..830f798b9a77 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -26,6 +26,7 @@ struct bpf_object;
 
 LIBBPF_API void btf__free(struct btf *btf);
 LIBBPF_API struct btf *btf__new(const void *data, __u32 size);
+LIBBPF_API struct btf *btf__new_empty(void);
 LIBBPF_API struct btf *btf__parse(const char *path, struct btf_ext **btf_ext);
 LIBBPF_API struct btf *btf__parse_elf(const char *path, struct btf_ext **btf_ext);
 LIBBPF_API struct btf *btf__parse_raw(const char *path);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index b4274c0fdd04..180c871b04b6 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -305,6 +305,7 @@ LIBBPF_0.2.0 {
 		bpf_prog_bind_map;
 		bpf_program__section_name;
 		btf__add_str;
+		btf__new_empty;
 		perf_buffer__buffer_cnt;
 		perf_buffer__buffer_fd;
 		perf_buffer__epoll_fd;
-- 
2.24.1



* [PATCH bpf-next 7/9] libbpf: add BTF writing APIs
  2020-09-23 15:54 [PATCH bpf-next 0/9] libbpf: BTF writer APIs Andrii Nakryiko
                   ` (5 preceding siblings ...)
  2020-09-23 15:54 ` [PATCH bpf-next 6/9] libbpf: add btf__new_empty() to create an empty BTF object Andrii Nakryiko
@ 2020-09-23 15:54 ` Andrii Nakryiko
  2020-09-24 15:59   ` John Fastabend
  2020-09-25  3:55   ` Alexei Starovoitov
  2020-09-23 15:54 ` [PATCH bpf-next 8/9] libbpf: add btf__str_by_offset() as a more generic variant of name_by_offset Andrii Nakryiko
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2020-09-23 15:54 UTC
  To: bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Arnaldo Carvalho de Melo

Add APIs for appending new BTF types at the end of BTF object.

Each BTF kind has a corresponding API of the form btf__append_<kind>(). For
types that have a variable number of additional items (struct/union, enum,
func_proto, datasec), an additional API is provided to emit each such item.
E.g., for emitting a struct, one would use the following sequence of API
calls:

btf__append_struct(...);
btf__append_field(...);
...
btf__append_field(...);

Each btf__append_field() will ensure that the last BTF type is of STRUCT or
UNION kind and will automatically increment that type's vlen field.

All strings are provided as C strings (const char *), not as string offsets.
This significantly improves the usability of BTF writer APIs. All such
strings will be automatically appended to the string section, or an existing
string will be re-used if it was already added previously.

Each API attempts to do all the reasonable validations, like enforcing
non-empty names for entities with required names, proper value bounds, various
bit offset restrictions, etc.

Type ID validation is minimal because it's possible to emit a type that
refers to a type that will be emitted later, so libbpf has no way to enforce
such cases. The user must be careful to properly emit all the necessary types
and specify type IDs that will be valid in the finally generated BTF.

Each of the btf__append_<kind>() APIs returns the new type ID on success or
a negative value on error. APIs like btf__append_field() that emit additional
items return zero on success and a negative value on error.
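
As a fuller example, emitting BTF for `struct point { int x; int y; };` into
an otherwise empty BTF object would look roughly like this:

  int int_id = btf__append_int(btf, "int", 4, BTF_INT_SIGNED);   /* -> 1 */
  int st_id = btf__append_struct(btf, "point", 8 /* byte_sz */); /* -> 2 */

  btf__append_field(btf, "x", int_id, 0, 0);  /* bit_offset 0, no bitfield */
  btf__append_field(btf, "y", int_id, 32, 0); /* bit_offset 32 */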

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/btf.c      | 781 +++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/btf.h      |  37 ++
 tools/lib/bpf/libbpf.map |  19 +
 3 files changed, 837 insertions(+)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 4d0e532d7b3d..7d50f626b823 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -1311,6 +1311,787 @@ int btf__add_str(struct btf *btf, const char *s)
 	return new_off;
 }
 
+static void *btf_add_type_mem(struct btf *btf, size_t add_sz)
+{
+	return btf_add_mem(&btf->types_data, &btf->types_data_cap, 1,
+			   btf->hdr->type_len, UINT_MAX, add_sz);
+}
+
+static __u32 btf_type_info(int kind, int vlen, int kflag)
+{
+	return (kflag << 31) | (kind << 24) | vlen;
+}
+
+static void btf_type_inc_vlen(struct btf_type *t)
+{
+	t->info = btf_type_info(btf_kind(t), btf_vlen(t) + 1, btf_kflag(t));
+}
+
+/*
+ * Append new BTF_KIND_INT type with:
+ *   - *name* - non-empty, non-NULL type name;
+ *   - *byte_sz* - power-of-2 (1, 2, 4, ..) size of the type, in bytes;
+ *   - encoding is a combination of BTF_INT_SIGNED, BTF_INT_CHAR, BTF_INT_BOOL.
+ * Returns:
+ *   - >0, type ID of newly added BTF type;
+ *   - <0, on error.
+ */
+int btf__append_int(struct btf *btf, const char *name, size_t byte_sz, int encoding)
+{
+	struct btf_type *t;
+	int sz, err, name_off;
+
+	/* non-empty name */
+	if (!name || !name[0])
+		return -EINVAL;
+	/* byte_sz must be power of 2 */
+	if (!byte_sz || (byte_sz & (byte_sz - 1)) || byte_sz > 16)
+		return -EINVAL;
+	if (encoding & ~(BTF_INT_SIGNED | BTF_INT_CHAR | BTF_INT_BOOL))
+		return -EINVAL;
+
+	/* deconstruct BTF, if necessary, and invalidate raw_data */
+	if (btf_ensure_modifiable(btf))
+		return -ENOMEM;
+
+	sz = sizeof(struct btf_type) + sizeof(int);
+	t = btf_add_type_mem(btf, sz);
+	if (!t)
+		return -ENOMEM;
+
+	/* if something goes wrong later, we might end up with an extra string,
+	 * but that shouldn't be a problem, because BTF can't be constructed
+	 * completely anyway and will most probably be just discarded
+	 */
+	name_off = btf__add_str(btf, name);
+	if (name_off < 0)
+		return name_off;
+
+	t->name_off = name_off;
+	t->info = btf_type_info(BTF_KIND_INT, 0, 0);
+	t->size = byte_sz;
+	/* set INT info, we don't allow setting legacy bit offset/size */
+	*(__u32 *)(t + 1) = (encoding << 24) | (byte_sz * 8);
+
+	err = btf_add_type_idx_entry(btf, btf->hdr->type_len);
+	if (err)
+		return err;
+
+	btf->hdr->type_len += sz;
+	btf->hdr->str_off += sz;
+	btf->nr_types++;
+	return btf->nr_types;
+}
+
+/* it's completely legal to append BTF types with type IDs pointing forward to
+ * types that haven't been appended yet, so we only make sure that id looks
+ * sane, we can't guarantee that ID will always be valid
+ */
+static int validate_type_id(int id)
+{
+	if (id < 0 || id > BTF_MAX_NR_TYPES)
+		return -EINVAL;
+	return 0;
+}
+
+/* generic append function for PTR, TYPEDEF, CONST/VOLATILE/RESTRICT */
+static int btf_append_ref_kind(struct btf *btf, int kind, const char *name, int ref_type_id)
+{
+	struct btf_type *t;
+	int sz, name_off = 0, err;
+
+	if (validate_type_id(ref_type_id))
+		return -EINVAL;
+
+	if (btf_ensure_modifiable(btf))
+		return -ENOMEM;
+
+	sz = sizeof(struct btf_type);
+	t = btf_add_type_mem(btf, sz);
+	if (!t)
+		return -ENOMEM;
+
+	if (name && name[0]) {
+		name_off = btf__add_str(btf, name);
+		if (name_off < 0)
+			return name_off;
+	}
+
+	t->name_off = name_off;
+	t->info = btf_type_info(kind, 0, 0);
+	t->type = ref_type_id;
+
+	err = btf_add_type_idx_entry(btf, btf->hdr->type_len);
+	if (err)
+		return err;
+
+	btf->hdr->type_len += sz;
+	btf->hdr->str_off += sz;
+	btf->nr_types++;
+	return btf->nr_types;
+}
+
+/*
+ * Append new BTF_KIND_PTR type with:
+ *   - *ref_type_id* - referenced type ID, it might not exist yet;
+ * Returns:
+ *   - >0, type ID of newly added BTF type;
+ *   - <0, on error.
+ */
+int btf__append_ptr(struct btf *btf, int ref_type_id)
+{
+	return btf_append_ref_kind(btf, BTF_KIND_PTR, NULL, ref_type_id);
+}
+
+/*
+ * Append new BTF_KIND_ARRAY type with:
+ *   - *index_type_id* - type ID of the type describing array index;
+ *   - *elem_type_id* - type ID of the type describing array element;
+ *   - *nr_elems* - the number of elements in the array;
+ * Returns:
+ *   - >0, type ID of newly added BTF type;
+ *   - <0, on error.
+ */
+int btf__append_array(struct btf *btf, int index_type_id, int elem_type_id, __u32 nr_elems)
+{
+	struct btf_type *t;
+	struct btf_array *a;
+	int sz, err;
+
+	if (validate_type_id(index_type_id) || validate_type_id(elem_type_id))
+		return -EINVAL;
+
+	if (btf_ensure_modifiable(btf))
+		return -ENOMEM;
+
+	sz = sizeof(struct btf_type) + sizeof(struct btf_array);
+	t = btf_add_type_mem(btf, sz);
+	if (!t)
+		return -ENOMEM;
+
+	t->name_off = 0;
+	t->info = btf_type_info(BTF_KIND_ARRAY, 0, 0);
+	t->size = 0;
+
+	a = btf_array(t);
+	a->type = elem_type_id;
+	a->index_type = index_type_id;
+	a->nelems = nr_elems;
+
+	err = btf_add_type_idx_entry(btf, btf->hdr->type_len);
+	if (err)
+		return err;
+
+	btf->hdr->type_len += sz;
+	btf->hdr->str_off += sz;
+	btf->nr_types++;
+	return btf->nr_types;
+}
+
+/* generic STRUCT/UNION append function */
+static int btf_append_composite(struct btf *btf, int kind, const char *name, __u32 bytes_sz)
+{
+	struct btf_type *t;
+	int sz, err, name_off = 0;
+
+	if (btf_ensure_modifiable(btf))
+		return -ENOMEM;
+
+	sz = sizeof(struct btf_type);
+	t = btf_add_type_mem(btf, sz);
+	if (!t)
+		return -ENOMEM;
+
+	if (name && name[0]) {
+		name_off = btf__add_str(btf, name);
+		if (name_off < 0)
+			return name_off;
+	}
+
+	/* start out with vlen=0 and no kflag; this will be adjusted when
+	 * adding each member
+	 */
+	t->name_off = name_off;
+	t->info = btf_type_info(kind, 0, 0);
+	t->size = bytes_sz;
+
+	err = btf_add_type_idx_entry(btf, btf->hdr->type_len);
+	if (err)
+		return err;
+
+	btf->hdr->type_len += sz;
+	btf->hdr->str_off += sz;
+	btf->nr_types++;
+	return btf->nr_types;
+}
+
+/*
+ * Append new BTF_KIND_STRUCT type with:
+ *   - *name* - name of the struct, can be NULL or empty for anonymous structs;
+ *   - *byte_sz* - size of the struct, in bytes;
+ *
+ * Struct initially has no fields in it. Fields can be added by
+ * btf__append_field() right after btf__append_struct() succeeds. 
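+ *
+ * A minimal usage sketch (illustrative; assumes *int_id* is a valid type
+ * ID of a previously added "int" type):
+ *
+ *   id = btf__append_struct(btf, "s", 8);
+ *   btf__append_field(btf, "a", int_id, 0, 0);
+ *   btf__append_field(btf, "b", int_id, 32, 0);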
+ *
+ * Returns:
+ *   - >0, type ID of newly added BTF type;
+ *   - <0, on error.
+ */
+int btf__append_struct(struct btf *btf, const char *name, __u32 byte_sz)
+{
+	return btf_append_composite(btf, BTF_KIND_STRUCT, name, byte_sz);
+}
+
+/*
+ * Append new BTF_KIND_UNION type with:
+ *   - *name* - name of the union, can be NULL or empty for anonymous union;
+ *   - *byte_sz* - size of the union, in bytes;
+ *
+ * Union initially has no fields in it. Fields can be added by
+ * btf__append_field() right after btf__append_union() succeeds. All fields
+ * should have *bit_offset* of 0.
+ *
+ * Returns:
+ *   - >0, type ID of newly added BTF type;
+ *   - <0, on error.
+ */
+int btf__append_union(struct btf *btf, const char *name, __u32 byte_sz)
+{
+	return btf_append_composite(btf, BTF_KIND_UNION, name, byte_sz);
+}
+
+/*
+ * Append new field for the current STRUCT/UNION type with:
+ *   - *name* - name of the field, can be NULL or empty for anonymous field;
+ *   - *type_id* - type ID for the type describing field type;
+ *   - *bit_offset* - bit offset of the start of the field within struct/union;
+ *   - *bit_size* - bit size of a bitfield, 0 for non-bitfield fields;
+ * Returns:
+ *   -  0, on success;
+ *   - <0, on error.
+ */
+int btf__append_field(struct btf *btf, const char *name, int type_id,
+		      __u32 bit_offset, __u32 bit_size)
+{
+	struct btf_type *t;
+	struct btf_member *m;
+	bool is_bitfield;
+	int sz, name_off = 0;
+
+	/* last type should be union/struct */
+	if (btf->nr_types == 0)
+		return -EINVAL;
+	t = btf_type_by_id(btf, btf->nr_types);
+	if (!btf_is_composite(t))
+		return -EINVAL;
+
+	if (validate_type_id(type_id))
+		return -EINVAL;
+	/* best-effort bit field offset/size enforcement */
+	is_bitfield = bit_size || (bit_offset % 8 != 0);
+	if (is_bitfield && (bit_size == 0 || bit_size > 255 || bit_offset > 0xffffff))
+		return -EINVAL;
+
+	/* only offset 0 is allowed for unions */
+	if (btf_is_union(t) && bit_offset)
+		return -EINVAL;
+
+	/* decompose and invalidate raw data */
+	if (btf_ensure_modifiable(btf))
+		return -ENOMEM;
+
+	sz = sizeof(struct btf_member);
+	m = btf_add_type_mem(btf, sz);
+	if (!m)
+		return -ENOMEM;
+
+	if (name && name[0]) {
+		name_off = btf__add_str(btf, name);
+		if (name_off < 0)
+			return name_off;
+	}
+
+	m->name_off = name_off;
+	m->type = type_id;
+	m->offset = bit_offset | (bit_size << 24);
+
+	/* btf_add_type_mem can invalidate t pointer */
+	t = btf_type_by_id(btf, btf->nr_types);
+	/* update parent type's vlen and kflag */
+	t->info = btf_type_info(btf_kind(t), btf_vlen(t) + 1, is_bitfield || btf_kflag(t));
+
+	btf->hdr->type_len += sz;
+	btf->hdr->str_off += sz;
+	return 0;
+}
+
+/*
+ * Append new BTF_KIND_ENUM type with:
+ *   - *name* - name of the enum, can be NULL or empty for anonymous enums;
+ *   - *byte_sz* - size of the enum, in bytes.
+ *
+ * Enum initially has no enum values in it (and corresponds to enum forward
+ * declaration). Enumerator values can be added by btf__append_enum_value()
+ * immediately after btf__append_enum() succeeds.
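+ *
+ * A minimal usage sketch (names and values are illustrative only):
+ *
+ *   id = btf__append_enum(btf, "e", 4);
+ *   btf__append_enum_value(btf, "V1", 1);
+ *   btf__append_enum_value(btf, "V2", 2);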
+ *
+ * Returns:
+ *   - >0, type ID of newly added BTF type;
+ *   - <0, on error.
+ */
+int btf__append_enum(struct btf *btf, const char *name, __u32 byte_sz)
+{
+	struct btf_type *t;
+	int sz, err, name_off = 0;
+
+	/* byte_sz must be power of 2 */
+	if (!byte_sz || (byte_sz & (byte_sz - 1)) || byte_sz > 8)
+		return -EINVAL;
+
+	if (btf_ensure_modifiable(btf))
+		return -ENOMEM;
+
+	sz = sizeof(struct btf_type);
+	t = btf_add_type_mem(btf, sz);
+	if (!t)
+		return -ENOMEM;
+
+	if (name && name[0]) {
+		name_off = btf__add_str(btf, name);
+		if (name_off < 0)
+			return name_off;
+	}
+
+	/* start out with vlen=0; it will be adjusted when adding enum values */
+	t->name_off = name_off;
+	t->info = btf_type_info(BTF_KIND_ENUM, 0, 0);
+	t->size = byte_sz;
+
+	err = btf_add_type_idx_entry(btf, btf->hdr->type_len);
+	if (err)
+		return err;
+
+	btf->hdr->type_len += sz;
+	btf->hdr->str_off += sz;
+	btf->nr_types++;
+	return btf->nr_types;
+}
+
+/*
+ * Append new enum value for the current ENUM type with:
+ *   - *name* - name of the enumerator value, can't be NULL or empty;
+ *   - *value* - integer value corresponding to enum value *name*;
+ * Returns:
+ *   -  0, on success;
+ *   - <0, on error.
+ */
+int btf__append_enum_value(struct btf *btf, const char *name, __s64 value)
+{
+	struct btf_type *t;
+	struct btf_enum *v;
+	int sz, name_off;
+
+	/* last type should be BTF_KIND_ENUM */
+	if (btf->nr_types == 0)
+		return -EINVAL;
+	t = btf_type_by_id(btf, btf->nr_types);
+	if (!btf_is_enum(t))
+		return -EINVAL;
+
+	/* non-empty name */
+	if (!name || !name[0])
+		return -EINVAL;
+	if (value < INT_MIN || value > UINT_MAX)
+		return -E2BIG;
+
+	/* decompose and invalidate raw data */
+	if (btf_ensure_modifiable(btf))
+		return -ENOMEM;
+
+	sz = sizeof(struct btf_enum);
+	v = btf_add_type_mem(btf, sz);
+	if (!v)
+		return -ENOMEM;
+
+	name_off = btf__add_str(btf, name);
+	if (name_off < 0)
+		return name_off;
+
+	v->name_off = name_off;
+	v->val = value;
+
+	/* update parent type's vlen */
+	t = btf_type_by_id(btf, btf->nr_types);
+	btf_type_inc_vlen(t);
+
+	btf->hdr->type_len += sz;
+	btf->hdr->str_off += sz;
+	return 0;
+}
+
+/*
+ * Append new BTF_KIND_FWD type with:
+ *   - *name*, non-empty/non-NULL name;
+ *   - *fwd_kind*, kind of forward declaration, one of BTF_FWD_STRUCT,
+ *     BTF_FWD_UNION, or BTF_FWD_ENUM;
+ * Returns:
+ *   - >0, type ID of newly added BTF type;
+ *   - <0, on error.
+ */
+int btf__append_fwd(struct btf *btf, const char *name, enum btf_fwd_kind fwd_kind)
+{
+	if (!name || !name[0])
+		return -EINVAL;
+
+	switch (fwd_kind) {
+	case BTF_FWD_STRUCT:
+	case BTF_FWD_UNION: {
+		struct btf_type *t;
+		int id;
+
+		id = btf_append_ref_kind(btf, BTF_KIND_FWD, name, 0);
+		if (id <= 0)
+			return id;
+		t = btf_type_by_id(btf, id);
+		t->info = btf_type_info(BTF_KIND_FWD, 0, fwd_kind == BTF_FWD_UNION);
+		return id;
+	}
+	case BTF_FWD_ENUM:
+		/* enum forward in BTF currently is just an enum with no enum
+		 * values; we also assume a standard 4-byte size for it
+		 */
+		return btf__append_enum(btf, name, sizeof(int));
+	default:
+		return -EINVAL;
+	}
+}
+
+/*
+ * Append new BTF_KIND_TYPEDEF type with:
+ *   - *name*, non-empty/non-NULL name;
+ *   - *ref_type_id* - referenced type ID, it might not exist yet;
+ * Returns:
+ *   - >0, type ID of newly added BTF type;
+ *   - <0, on error.
+ */
+int btf__append_typedef(struct btf *btf, const char *name, int ref_type_id)
+{
+	if (!name || !name[0])
+		return -EINVAL;
+
+	return btf_append_ref_kind(btf, BTF_KIND_TYPEDEF, name, ref_type_id);
+}
+
+/*
+ * Append new BTF_KIND_VOLATILE type with:
+ *   - *ref_type_id* - referenced type ID, it might not exist yet;
+ * Returns:
+ *   - >0, type ID of newly added BTF type;
+ *   - <0, on error.
+ */
+int btf__append_volatile(struct btf *btf, int ref_type_id)
+{
+	return btf_append_ref_kind(btf, BTF_KIND_VOLATILE, NULL, ref_type_id);
+}
+
+/*
+ * Append new BTF_KIND_CONST type with:
+ *   - *ref_type_id* - referenced type ID, it might not exist yet;
+ * Returns:
+ *   - >0, type ID of newly added BTF type;
+ *   - <0, on error.
+ */
+int btf__append_const(struct btf *btf, int ref_type_id)
+{
+	return btf_append_ref_kind(btf, BTF_KIND_CONST, NULL, ref_type_id);
+}
+
+/*
+ * Append new BTF_KIND_RESTRICT type with:
+ *   - *ref_type_id* - referenced type ID, it might not exist yet;
+ * Returns:
+ *   - >0, type ID of newly added BTF type;
+ *   - <0, on error.
+ */
+int btf__append_restrict(struct btf *btf, int ref_type_id)
+{
+	return btf_append_ref_kind(btf, BTF_KIND_RESTRICT, NULL, ref_type_id);
+}
+
+/*
+ * Append new BTF_KIND_FUNC type with:
+ *   - *name*, non-empty/non-NULL name;
+ *   - *proto_type_id* - FUNC_PROTO's type ID, it might not exist yet;
+ * Returns:
+ *   - >0, type ID of newly added BTF type;
+ *   - <0, on error.
+ */
+int btf__append_func(struct btf *btf, const char *name,
+		     enum btf_func_linkage linkage, int proto_type_id)
+{
+	int id;
+
+	if (!name || !name[0])
+		return -EINVAL;
+	if (linkage != BTF_FUNC_STATIC && linkage != BTF_FUNC_GLOBAL &&
+	    linkage != BTF_FUNC_EXTERN)
+		return -EINVAL;
+
+	id = btf_append_ref_kind(btf, BTF_KIND_FUNC, name, proto_type_id);
+	if (id > 0) {
+		struct btf_type *t = btf_type_by_id(btf, id);
+
+		t->info = btf_type_info(BTF_KIND_FUNC, linkage, 0);
+	}
+	return id;
+}
+
+/*
+ * Append new BTF_KIND_FUNC_PROTO with:
+ *   - *ret_type_id* - type ID for return result of a function.
+ *
+ * Function prototype initially has no arguments, but they can be added by
+ * btf__append_func_param() one by one, immediately after
+ * btf__append_func_proto() succeeds.
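+ *
+ * A minimal sketch for `int f(int x)` (illustrative; assumes *int_id* is
+ * a valid type ID for "int"):
+ *
+ *   proto_id = btf__append_func_proto(btf, int_id);
+ *   btf__append_func_param(btf, "x", int_id);
+ *   btf__append_func(btf, "f", BTF_FUNC_STATIC, proto_id);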
+ *
+ * Returns:
+ *   - >0, type ID of newly added BTF type;
+ *   - <0, on error.
+ */
+int btf__append_func_proto(struct btf *btf, int ret_type_id)
+{
+	struct btf_type *t;
+	int sz, err;
+
+	if (validate_type_id(ret_type_id))
+		return -EINVAL;
+
+	if (btf_ensure_modifiable(btf))
+		return -ENOMEM;
+
+	sz = sizeof(struct btf_type);
+	t = btf_add_type_mem(btf, sz);
+	if (!t)
+		return -ENOMEM;
+
+	/* start out with vlen=0; this will be adjusted when adding function
+	 * params, if necessary
+	 */
+	t->name_off = 0;
+	t->info = btf_type_info(BTF_KIND_FUNC_PROTO, 0, 0);
+	t->type = ret_type_id;
+
+	err = btf_add_type_idx_entry(btf, btf->hdr->type_len);
+	if (err)
+		return err;
+
+	btf->hdr->type_len += sz;
+	btf->hdr->str_off += sz;
+	btf->nr_types++;
+	return btf->nr_types;
+}
+
+/*
+ * Append new function parameter for current FUNC_PROTO type with:
+ *   - *name* - parameter name, can be NULL or empty;
+ *   - *type_id* - type ID describing the type of the parameter.
+ * Returns:
+ *   -  0, on success;
+ *   - <0, on error.
+ */
+int btf__append_func_param(struct btf *btf, const char *name, int type_id)
+{
+	struct btf_type *t;
+	struct btf_param *p;
+	int sz, name_off = 0;
+
+	if (validate_type_id(type_id))
+		return -EINVAL;
+
+	/* last type should be BTF_KIND_FUNC_PROTO */
+	if (btf->nr_types == 0)
+		return -EINVAL;
+	t = btf_type_by_id(btf, btf->nr_types);
+	if (!btf_is_func_proto(t))
+		return -EINVAL;
+
+	/* decompose and invalidate raw data */
+	if (btf_ensure_modifiable(btf))
+		return -ENOMEM;
+
+	sz = sizeof(struct btf_param);
+	p = btf_add_type_mem(btf, sz);
+	if (!p)
+		return -ENOMEM;
+
+	if (name && name[0]) {
+		name_off = btf__add_str(btf, name);
+		if (name_off < 0)
+			return name_off;
+	}
+
+	p->name_off = name_off;
+	p->type = type_id;
+
+	/* update parent type's vlen */
+	t = btf_type_by_id(btf, btf->nr_types);
+	btf_type_inc_vlen(t);
+
+	btf->hdr->type_len += sz;
+	btf->hdr->str_off += sz;
+	return 0;
+}
+
+/*
+ * Append new BTF_KIND_VAR type with:
+ *   - *name* - non-empty/non-NULL name;
+ *   - *linkage* - variable linkage, one of BTF_VAR_STATIC,
+ *     BTF_VAR_GLOBAL_ALLOCATED, or BTF_VAR_GLOBAL_EXTERN;
+ *   - *type_id* - type ID of the type describing the type of the variable.
+ * Returns:
+ *   - >0, type ID of newly added BTF type;
+ *   - <0, on error.
+ */
+int btf__append_var(struct btf *btf, const char *name, int linkage, int type_id)
+{
+	struct btf_type *t;
+	struct btf_var *v;
+	int sz, err, name_off;
+
+	/* non-empty name */
+	if (!name || !name[0])
+		return -EINVAL;
+	if (linkage != BTF_VAR_STATIC && linkage != BTF_VAR_GLOBAL_ALLOCATED &&
+	    linkage != BTF_VAR_GLOBAL_EXTERN)
+		return -EINVAL;
+	if (validate_type_id(type_id))
+		return -EINVAL;
+
+	/* deconstruct BTF, if necessary, and invalidate raw_data */
+	if (btf_ensure_modifiable(btf))
+		return -ENOMEM;
+
+	sz = sizeof(struct btf_type) + sizeof(struct btf_var);
+	t = btf_add_type_mem(btf, sz);
+	if (!t)
+		return -ENOMEM;
+
+	name_off = btf__add_str(btf, name);
+	if (name_off < 0)
+		return name_off;
+
+	t->name_off = name_off;
+	t->info = btf_type_info(BTF_KIND_VAR, 0, 0);
+	t->type = type_id;
+
+	v = btf_var(t);
+	v->linkage = linkage;
+
+	err = btf_add_type_idx_entry(btf, btf->hdr->type_len);
+	if (err)
+		return err;
+
+	btf->hdr->type_len += sz;
+	btf->hdr->str_off += sz;
+	btf->nr_types++;
+	return btf->nr_types;
+}
+
+/*
+ * Append new BTF_KIND_DATASEC type with:
+ *   - *name* - non-empty/non-NULL name;
+ *   - *byte_sz* - data section size, in bytes.
+ *
+ * Data section is initially empty. Variables info can be added with
+ * btf__append_datasec_var_info() calls, after btf__append_datasec() succeeds.
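+ *
+ * A minimal sketch (illustrative; assumes *var_id* is the type ID of
+ * a previously added BTF_KIND_VAR):
+ *
+ *   id = btf__append_datasec(btf, ".data", 4);
+ *   btf__append_datasec_var_info(btf, var_id, 0, 4);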
+ *
+ * Returns:
+ *   - >0, type ID of newly added BTF type;
+ *   - <0, on error.
+ */
+int btf__append_datasec(struct btf *btf, const char *name, __u32 byte_sz)
+{
+	struct btf_type *t;
+	int sz, err, name_off;
+
+	/* non-empty name */
+	if (!name || !name[0])
+		return -EINVAL;
+
+	if (btf_ensure_modifiable(btf))
+		return -ENOMEM;
+
+	sz = sizeof(struct btf_type);
+	t = btf_add_type_mem(btf, sz);
+	if (!t)
+		return -ENOMEM;
+
+	name_off = btf__add_str(btf, name);
+	if (name_off < 0)
+		return name_off;
+
+	/* start with vlen=0, which will be updated as var_secinfos are added */
+	t->name_off = name_off;
+	t->info = btf_type_info(BTF_KIND_DATASEC, 0, 0);
+	t->size = byte_sz;
+
+	err = btf_add_type_idx_entry(btf, btf->hdr->type_len);
+	if (err)
+		return err;
+
+	btf->hdr->type_len += sz;
+	btf->hdr->str_off += sz;
+	btf->nr_types++;
+	return btf->nr_types;
+}
+
+/*
+ * Append new data section variable information entry for current DATASEC type:
+ *   - *var_type_id* - type ID, describing type of the variable;
+ *   - *offset* - variable offset within data section, in bytes;
+ *   - *byte_sz* - variable size, in bytes.
+ *
+ * Returns:
+ *   -  0, on success;
+ *   - <0, on error.
+ */
+int btf__append_datasec_var_info(struct btf *btf, int var_type_id, __u32 offset, __u32 byte_sz)
+{
+	struct btf_type *t;
+	struct btf_var_secinfo *v;
+	int sz;
+
+	/* last type should be BTF_KIND_DATASEC */
+	if (btf->nr_types == 0)
+		return -EINVAL;
+	t = btf_type_by_id(btf, btf->nr_types);
+	if (!btf_is_datasec(t))
+		return -EINVAL;
+
+	if (validate_type_id(var_type_id))
+		return -EINVAL;
+
+	/* decompose and invalidate raw data */
+	if (btf_ensure_modifiable(btf))
+		return -ENOMEM;
+
+	sz = sizeof(struct btf_var_secinfo);
+	v = btf_add_type_mem(btf, sz);
+	if (!v)
+		return -ENOMEM;
+
+	v->type = var_type_id;
+	v->offset = offset;
+	v->size = byte_sz;
+
+	/* update parent type's vlen */
+	t = btf_type_by_id(btf, btf->nr_types);
+	btf_type_inc_vlen(t);
+
+	btf->hdr->type_len += sz;
+	btf->hdr->str_off += sz;
+	return 0;
+}
+
 struct btf_ext_sec_setup_param {
 	__u32 off;
 	__u32 len;
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 830f798b9a77..32dd71eeed38 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -74,6 +74,43 @@ LIBBPF_API __u32 btf_ext__line_info_rec_size(const struct btf_ext *btf_ext);
 LIBBPF_API struct btf *libbpf_find_kernel_btf(void);
 
 LIBBPF_API int btf__add_str(struct btf *btf, const char *s);
+LIBBPF_API int btf__append_int(struct btf *btf, const char *name, size_t byte_sz, int encoding);
+LIBBPF_API int btf__append_ptr(struct btf *btf, int ref_type_id);
+LIBBPF_API int btf__append_array(struct btf *btf,
+				 int index_type_id, int elem_type_id, __u32 nr_elems);
+/* struct/union construction APIs */
+LIBBPF_API int btf__append_struct(struct btf *btf, const char *name, __u32 sz);
+LIBBPF_API int btf__append_union(struct btf *btf, const char *name, __u32 sz);
+LIBBPF_API int btf__append_field(struct btf *btf, const char *name, int field_type_id,
+				 __u32 bit_offset, __u32 bit_size);
+
+/* enum construction APIs */
+LIBBPF_API int btf__append_enum(struct btf *btf, const char *name, __u32 bytes_sz);
+LIBBPF_API int btf__append_enum_value(struct btf *btf, const char *name, __s64 value);
+
+enum btf_fwd_kind {
+	BTF_FWD_STRUCT = 0,
+	BTF_FWD_UNION = 1,
+	BTF_FWD_ENUM = 2,
+};
+
+LIBBPF_API int btf__append_fwd(struct btf *btf, const char *name, enum btf_fwd_kind fwd_kind);
+LIBBPF_API int btf__append_typedef(struct btf *btf, const char *name, int ref_type_id);
+LIBBPF_API int btf__append_volatile(struct btf *btf, int ref_type_id);
+LIBBPF_API int btf__append_const(struct btf *btf, int ref_type_id);
+LIBBPF_API int btf__append_restrict(struct btf *btf, int ref_type_id);
+
+/* func and func_proto construction APIs */
+LIBBPF_API int btf__append_func(struct btf *btf, const char *name,
+				enum btf_func_linkage linkage, int proto_type_id);
+LIBBPF_API int btf__append_func_proto(struct btf *btf, int ret_type_id);
+LIBBPF_API int btf__append_func_param(struct btf *btf, const char *name, int type_id);
+
+/* var & datasec construction APIs */
+LIBBPF_API int btf__append_var(struct btf *btf, const char *name, int linkage, int type_id);
+LIBBPF_API int btf__append_datasec(struct btf *btf, const char *name, __u32 byte_sz);
+LIBBPF_API int btf__append_datasec_var_info(struct btf *btf, int var_type_id,
+					    __u32 offset, __u32 byte_sz);
 
 struct btf_dedup_opts {
 	unsigned int dedup_table_size;
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 180c871b04b6..6ef598472a50 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -305,6 +305,25 @@ LIBBPF_0.2.0 {
 		bpf_prog_bind_map;
 		bpf_program__section_name;
 		btf__add_str;
+		btf__append_array;
+		btf__append_const;
+		btf__append_enum;
+		btf__append_enum_value;
+		btf__append_datasec;
+		btf__append_datasec_var_info;
+		btf__append_field;
+		btf__append_func;
+		btf__append_func_param;
+		btf__append_func_proto;
+		btf__append_fwd;
+		btf__append_int;
+		btf__append_ptr;
+		btf__append_restrict;
+		btf__append_struct;
+		btf__append_typedef;
+		btf__append_union;
+		btf__append_var;
+		btf__append_volatile;
 		btf__new_empty;
 		perf_buffer__buffer_cnt;
 		perf_buffer__buffer_fd;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next 8/9] libbpf: add btf__str_by_offset() as a more generic variant of name_by_offset
  2020-09-23 15:54 [PATCH bpf-next 0/9] libbpf: BTF writer APIs Andrii Nakryiko
                   ` (6 preceding siblings ...)
  2020-09-23 15:54 ` [PATCH bpf-next 7/9] libbpf: add BTF writing APIs Andrii Nakryiko
@ 2020-09-23 15:54 ` Andrii Nakryiko
  2020-09-23 15:54 ` [PATCH bpf-next 9/9] selftests/bpf: test BTF writing APIs Andrii Nakryiko
  2020-09-24 15:16 ` [PATCH bpf-next 0/9] libbpf: BTF writer APIs John Fastabend
  9 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2020-09-23 15:54 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Arnaldo Carvalho de Melo

BTF strings are used not just for names; they can be arbitrary strings used
for CO-RE relocations, line/func infos, etc. Thus "name_by_offset" terminology
is too specific and might be misleading. Instead, introduce
btf__str_by_offset() API which uses generic string terminology.
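
For example (an illustrative sketch, not part of this patch's diff), any
valid string offset can be resolved, not just type name offsets:

    int off = btf__add_str(btf, "arbitrary string");

    /* off need not be a name offset to be resolvable */
    const char *s = btf__str_by_offset(btf, off);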

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/btf.c      | 7 ++++++-
 tools/lib/bpf/btf.h      | 1 +
 tools/lib/bpf/libbpf.map | 1 +
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 7d50f626b823..15c84d41a5cb 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -1023,7 +1023,7 @@ const void *btf__get_raw_data(const struct btf *btf_ro, __u32 *size)
 	return btf->raw_data;
 }
 
-const char *btf__name_by_offset(const struct btf *btf, __u32 offset)
+const char *btf__str_by_offset(const struct btf *btf, __u32 offset)
 {
 	if (offset < btf->hdr->str_len)
 		return btf->strs_data + offset;
@@ -1031,6 +1031,11 @@ const char *btf__name_by_offset(const struct btf *btf, __u32 offset)
 		return NULL;
 }
 
+const char *btf__name_by_offset(const struct btf *btf, __u32 offset)
+{
+	return btf__str_by_offset(btf, offset);
+}
+
 int btf__get_from_id(__u32 id, struct btf **btf)
 {
 	struct bpf_btf_info btf_info = { 0 };
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 32dd71eeed38..6a91d47283d7 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -48,6 +48,7 @@ LIBBPF_API int btf__fd(const struct btf *btf);
 LIBBPF_API void btf__set_fd(struct btf *btf, int fd);
 LIBBPF_API const void *btf__get_raw_data(const struct btf *btf, __u32 *size);
 LIBBPF_API const char *btf__name_by_offset(const struct btf *btf, __u32 offset);
+LIBBPF_API const char *btf__str_by_offset(const struct btf *btf, __u32 offset);
 LIBBPF_API int btf__get_from_id(__u32 id, struct btf **btf);
 LIBBPF_API int btf__get_map_kv_tids(const struct btf *btf, const char *map_name,
 				    __u32 expected_key_size,
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 6ef598472a50..724ad3d94bb3 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -325,6 +325,7 @@ LIBBPF_0.2.0 {
 		btf__append_var;
 		btf__append_volatile;
 		btf__new_empty;
+		btf__str_by_offset;
 		perf_buffer__buffer_cnt;
 		perf_buffer__buffer_fd;
 		perf_buffer__epoll_fd;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next 9/9] selftests/bpf: test BTF writing APIs
  2020-09-23 15:54 [PATCH bpf-next 0/9] libbpf: BTF writer APIs Andrii Nakryiko
                   ` (7 preceding siblings ...)
  2020-09-23 15:54 ` [PATCH bpf-next 8/9] libbpf: add btf__str_by_offset() as a more generic variant of name_by_offset Andrii Nakryiko
@ 2020-09-23 15:54 ` Andrii Nakryiko
  2020-09-24 15:16 ` [PATCH bpf-next 0/9] libbpf: BTF writer APIs John Fastabend
  9 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2020-09-23 15:54 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Arnaldo Carvalho de Melo

Add selftests testing BTF writer APIs.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 .../selftests/bpf/prog_tests/btf_write.c      | 271 ++++++++++++++++++
 1 file changed, 271 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_write.c

diff --git a/tools/testing/selftests/bpf/prog_tests/btf_write.c b/tools/testing/selftests/bpf/prog_tests/btf_write.c
new file mode 100644
index 000000000000..df6efdc67567
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/btf_write.c
@@ -0,0 +1,271 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <bpf/btf.h>
+
+#define ASSERT_EQ(actual, expected, name) ({				\
+	typeof(actual) ___act = (actual);				\
+	typeof(expected) ___exp = (expected);				\
+	bool ___ok = ___act == ___exp;					\
+	CHECK(!___ok, (name),						\
+	      "unexpected %s: actual %lld != expected %lld\n",		\
+	      (name), (long long)(___act), (long long)(___exp));	\
+	___ok;								\
+})
+
+#define ASSERT_STREQ(actual, expected, name) ({				\
+	const char *___act = actual;					\
+	const char *___exp = expected;					\
+	bool ___ok = strcmp(___act, ___exp) == 0;			\
+	CHECK(!___ok, (name),						\
+	      "unexpected %s: actual '%s' != expected '%s'\n",		\
+	      (name), ___act, ___exp);					\
+	___ok;								\
+})
+
+#define ASSERT_OK(res, name) ({						\
+	long long ___res = (res);					\
+	bool ___ok = ___res == 0;					\
+	CHECK(!___ok, (name), "unexpected error: %lld\n", ___res);	\
+	___ok;								\
+})
+
+#define ASSERT_ERR(res, name) ({					\
+	long long ___res = (res);					\
+	bool ___ok = ___res < 0;					\
+	CHECK(!___ok, (name), "unexpected success: %lld\n", ___res);	\
+	___ok;								\
+})
+
+static int duration = 0;
+
+void test_btf_write(void)
+{
+	const struct btf_var_secinfo *vi;
+	const struct btf_type *t;
+	const struct btf_member *m;
+	const struct btf_enum *v;
+	const struct btf_param *p;
+	struct btf *btf;
+	int id, err, str_off;
+
+	btf = btf__new_empty();
+	if (CHECK(IS_ERR(btf), "new_empty", "failed: %ld\n", PTR_ERR(btf)))
+		return;
+
+	str_off = btf__add_str(btf, "int");
+	ASSERT_EQ(str_off, 1, "int_str_off");
+
+	/* BTF_KIND_INT */
+	id = btf__append_int(btf, "int", 4,  BTF_INT_SIGNED);
+	ASSERT_EQ(id, 1, "int_id");
+
+	t = btf__type_by_id(btf, 1);
+	/* should re-use previously added "int" string */
+	ASSERT_EQ(t->name_off, str_off, "int_name_off");
+	ASSERT_STREQ(btf__str_by_offset(btf, t->name_off), "int", "int_name");
+	ASSERT_EQ(btf_kind(t), BTF_KIND_INT, "int_kind");
+	ASSERT_EQ(t->size, 4, "int_sz");
+	ASSERT_EQ(btf_int_encoding(t), BTF_INT_SIGNED, "int_enc");
+	ASSERT_EQ(btf_int_bits(t), 32, "int_bits");
+
+	/* invalid int size */
+	id = btf__append_int(btf, "bad sz int", 7, 0);
+	ASSERT_ERR(id, "int_bad_sz");
+	/* invalid encoding */
+	id = btf__append_int(btf, "bad enc int", 4, 123);
+	ASSERT_ERR(id, "int_bad_enc");
+	/* NULL name */
+	id = btf__append_int(btf, NULL, 4, 0);
+	ASSERT_ERR(id, "int_bad_null_name");
+	/* empty name */
+	id = btf__append_int(btf, "", 4, 0);
+	ASSERT_ERR(id, "int_bad_empty_name");
+
+	/* PTR/CONST/VOLATILE/RESTRICT */
+	id = btf__append_ptr(btf, 1);
+	ASSERT_EQ(id, 2, "ptr_id");
+	t = btf__type_by_id(btf, 2);
+	ASSERT_EQ(btf_kind(t), BTF_KIND_PTR, "ptr_kind");
+	ASSERT_EQ(t->type, 1, "ptr_type");
+
+	id = btf__append_const(btf, 5); /* points forward to restrict */
+	ASSERT_EQ(id, 3, "const_id");
+	t = btf__type_by_id(btf, 3);
+	ASSERT_EQ(btf_kind(t), BTF_KIND_CONST, "const_kind");
+	ASSERT_EQ(t->type, 5, "const_type");
+
+	id = btf__append_volatile(btf, 3);
+	ASSERT_EQ(id, 4, "volatile_id");
+	t = btf__type_by_id(btf, 4);
+	ASSERT_EQ(btf_kind(t), BTF_KIND_VOLATILE, "volatile_kind");
+	ASSERT_EQ(t->type, 3, "volatile_type");
+
+	id = btf__append_restrict(btf, 4);
+	ASSERT_EQ(id, 5, "restrict_id");
+	t = btf__type_by_id(btf, 5);
+	ASSERT_EQ(btf_kind(t), BTF_KIND_RESTRICT, "restrict_kind");
+	ASSERT_EQ(t->type, 4, "restrict_type");
+
+	/* ARRAY */
+	id = btf__append_array(btf, 1, 2, 10); /* int *[10] */
+	ASSERT_EQ(id, 6, "array_id");
+	t = btf__type_by_id(btf, 6);
+	ASSERT_EQ(btf_kind(t), BTF_KIND_ARRAY, "array_kind");
+	ASSERT_EQ(btf_array(t)->index_type, 1, "array_index_type");
+	ASSERT_EQ(btf_array(t)->type, 2, "array_elem_type");
+	ASSERT_EQ(btf_array(t)->nelems, 10, "array_nelems");
+
+	/* STRUCT */
+	err = btf__append_field(btf, "field", 1, 0, 0);
+	ASSERT_ERR(err, "no_struct_field");
+	id = btf__append_struct(btf, "s1", 8);
+	ASSERT_EQ(id, 7, "struct_id");
+	err = btf__append_field(btf, "f1", 1, 0, 0);
+	ASSERT_OK(err, "f1_res");
+	err = btf__append_field(btf, "f2", 1, 32, 16);
+	ASSERT_OK(err, "f2_res");
+
+	t = btf__type_by_id(btf, 7);
+	ASSERT_STREQ(btf__str_by_offset(btf, t->name_off), "s1", "struct_name");
+	ASSERT_EQ(btf_kind(t), BTF_KIND_STRUCT, "struct_kind");
+	ASSERT_EQ(btf_vlen(t), 2, "struct_vlen");
+	ASSERT_EQ(btf_kflag(t), true, "struct_kflag");
+	ASSERT_EQ(t->size, 8, "struct_sz");
+	m = btf_members(t) + 0;
+	ASSERT_STREQ(btf__str_by_offset(btf, m->name_off), "f1", "f1_name");
+	ASSERT_EQ(m->type, 1, "f1_type");
+	ASSERT_EQ(btf_member_bit_offset(t, 0), 0, "f1_bit_off");
+	ASSERT_EQ(btf_member_bitfield_size(t, 0), 0, "f1_bit_sz");
+	m = btf_members(t) + 1;
+	ASSERT_STREQ(btf__str_by_offset(btf, m->name_off), "f2", "f2_name");
+	ASSERT_EQ(m->type, 1, "f2_type");
+	ASSERT_EQ(btf_member_bit_offset(t, 1), 32, "f2_bit_off");
+	ASSERT_EQ(btf_member_bitfield_size(t, 1), 16, "f2_bit_sz");
+
+	/* UNION */
+	id = btf__append_union(btf, "u1", 8);
+	ASSERT_EQ(id, 8, "union_id");
+
+	/* invalid, non-zero offset */
+	err = btf__append_field(btf, "field", 1, 1, 0);
+	ASSERT_ERR(err, "no_struct_field");
+
+	err = btf__append_field(btf, "f1", 1, 0, 16);
+	ASSERT_OK(err, "f1_res");
+
+	t = btf__type_by_id(btf, 8);
+	ASSERT_STREQ(btf__str_by_offset(btf, t->name_off), "u1", "union_name");
+	ASSERT_EQ(btf_kind(t), BTF_KIND_UNION, "union_kind");
+	ASSERT_EQ(btf_vlen(t), 1, "union_vlen");
+	ASSERT_EQ(btf_kflag(t), true, "union_kflag");
+	ASSERT_EQ(t->size, 8, "union_sz");
+	m = btf_members(t) + 0;
+	ASSERT_STREQ(btf__str_by_offset(btf, m->name_off), "f1", "f1_name");
+	ASSERT_EQ(m->type, 1, "f1_type");
+	ASSERT_EQ(btf_member_bit_offset(t, 0), 0, "f1_bit_off");
+	ASSERT_EQ(btf_member_bitfield_size(t, 0), 16, "f1_bit_sz");
+
+	/* ENUM */
+	id = btf__append_enum(btf, "e1", 4);
+	ASSERT_EQ(id, 9, "enum_id");
+	err = btf__append_enum_value(btf, "v1", 1);
+	ASSERT_OK(err, "v1_res");
+	err = btf__append_enum_value(btf, "v2", 2);
+	ASSERT_OK(err, "v2_res");
+
+	t = btf__type_by_id(btf, 9);
+	ASSERT_STREQ(btf__str_by_offset(btf, t->name_off), "e1", "enum_name");
+	ASSERT_EQ(btf_kind(t), BTF_KIND_ENUM, "enum_kind");
+	ASSERT_EQ(btf_vlen(t), 2, "enum_vlen");
+	ASSERT_EQ(t->size, 4, "enum_sz");
+	v = btf_enum(t) + 0;
+	ASSERT_STREQ(btf__str_by_offset(btf, v->name_off), "v1", "v1_name");
+	ASSERT_EQ(v->val, 1, "v1_val");
+	v = btf_enum(t) + 1;
+	ASSERT_STREQ(btf__str_by_offset(btf, v->name_off), "v2", "v2_name");
+	ASSERT_EQ(v->val, 2, "v2_val");
+
+	/* FWDs */
+	id = btf__append_fwd(btf, "struct_fwd", BTF_FWD_STRUCT);
+	ASSERT_EQ(id, 10, "struct_fwd_id");
+	t = btf__type_by_id(btf, 10);
+	ASSERT_STREQ(btf__str_by_offset(btf, t->name_off), "struct_fwd", "fwd_name");
+	ASSERT_EQ(btf_kind(t), BTF_KIND_FWD, "fwd_kind");
+	ASSERT_EQ(btf_kflag(t), 0, "fwd_kflag");
+
+	id = btf__append_fwd(btf, "union_fwd", BTF_FWD_UNION);
+	ASSERT_EQ(id, 11, "union_fwd_id");
+	t = btf__type_by_id(btf, 11);
+	ASSERT_STREQ(btf__str_by_offset(btf, t->name_off), "union_fwd", "fwd_name");
+	ASSERT_EQ(btf_kind(t), BTF_KIND_FWD, "fwd_kind");
+	ASSERT_EQ(btf_kflag(t), 1, "fwd_kflag");
+
+	id = btf__append_fwd(btf, "enum_fwd", BTF_FWD_ENUM);
+	ASSERT_EQ(id, 12, "enum_fwd_id");
+	t = btf__type_by_id(btf, 12);
+	ASSERT_STREQ(btf__str_by_offset(btf, t->name_off), "enum_fwd", "fwd_name");
+	ASSERT_EQ(btf_kind(t), BTF_KIND_ENUM, "enum_fwd_kind");
+	ASSERT_EQ(btf_vlen(t), 0, "enum_fwd_kind");
+	ASSERT_EQ(t->size, 4, "enum_fwd_sz");
+
+	/* TYPEDEF */
+	id = btf__append_typedef(btf, "typedef1", 1);
+	ASSERT_EQ(id, 13, "typedef_fwd_id");
+	t = btf__type_by_id(btf, 13);
+	ASSERT_STREQ(btf__str_by_offset(btf, t->name_off), "typedef1", "typedef_name");
+	ASSERT_EQ(btf_kind(t), BTF_KIND_TYPEDEF, "typedef_kind");
+	ASSERT_EQ(t->type, 1, "typedef_type");
+
+	/* FUNC & FUNC_PROTO */
+	id = btf__append_func(btf, "func1", BTF_FUNC_GLOBAL, 15);
+	ASSERT_EQ(id, 14, "func_id");
+	t = btf__type_by_id(btf, 14);
+	ASSERT_STREQ(btf__str_by_offset(btf, t->name_off), "func1", "func_name");
+	ASSERT_EQ(t->type, 15, "func_type");
+	ASSERT_EQ(btf_kind(t), BTF_KIND_FUNC, "func_kind");
+	ASSERT_EQ(btf_vlen(t), BTF_FUNC_GLOBAL, "func_vlen");
+
+	id = btf__append_func_proto(btf, 1);
+	ASSERT_EQ(id, 15, "func_proto_id");
+	err = btf__append_func_param(btf, "p1", 1);
+	ASSERT_OK(err, "p1_res");
+	err = btf__append_func_param(btf, "p2", 2);
+	ASSERT_OK(err, "p2_res");
+
+	t = btf__type_by_id(btf, 15);
+	ASSERT_EQ(btf_kind(t), BTF_KIND_FUNC_PROTO, "func_proto_kind");
+	ASSERT_EQ(btf_vlen(t), 2, "func_proto_vlen");
+	ASSERT_EQ(t->type, 1, "func_proto_ret_type");
+	p = btf_params(t) + 0;
+	ASSERT_STREQ(btf__str_by_offset(btf, p->name_off), "p1", "p1_name");
+	ASSERT_EQ(p->type, 1, "p1_type");
+	p = btf_params(t) + 1;
+	ASSERT_STREQ(btf__str_by_offset(btf, p->name_off), "p2", "p2_name");
+	ASSERT_EQ(p->type, 2, "p2_type");
+
+	/* VAR */
+	id = btf__append_var(btf, "var1", BTF_VAR_GLOBAL_ALLOCATED, 1);
+	ASSERT_EQ(id, 16, "var_id");
+	t = btf__type_by_id(btf, 16);
+	ASSERT_STREQ(btf__str_by_offset(btf, t->name_off), "var1", "var_name");
+	ASSERT_EQ(btf_kind(t), BTF_KIND_VAR, "var_kind");
+	ASSERT_EQ(t->type, 1, "var_type");
+	ASSERT_EQ(btf_var(t)->linkage, BTF_VAR_GLOBAL_ALLOCATED, "var_type");
+
+	/* DATASEC */
+	id = btf__append_datasec(btf, "datasec1", 12);
+	ASSERT_EQ(id, 17, "datasec_id");
+	err = btf__append_datasec_var_info(btf, 1, 4, 8);
+	ASSERT_OK(err, "v1_res");
+
+	t = btf__type_by_id(btf, 17);
+	ASSERT_STREQ(btf__str_by_offset(btf, t->name_off), "datasec1", "datasec_name");
+	ASSERT_EQ(t->size, 12, "datasec_sz");
+	ASSERT_EQ(btf_kind(t), BTF_KIND_DATASEC, "datasec_kind");
+	ASSERT_EQ(btf_vlen(t), 1, "datasec_vlen");
+	vi = btf_var_secinfos(t) + 0;
+	ASSERT_EQ(vi->type, 1, "v1_type");
+	ASSERT_EQ(vi->offset, 4, "v1_off");
+	ASSERT_EQ(vi->size, 8, "v1_sz");
+
+	btf__free(btf);
+}
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* RE: [PATCH bpf-next 0/9] libbpf: BTF writer APIs
  2020-09-23 15:54 [PATCH bpf-next 0/9] libbpf: BTF writer APIs Andrii Nakryiko
                   ` (8 preceding siblings ...)
  2020-09-23 15:54 ` [PATCH bpf-next 9/9] selftests/bpf: test BTF writing APIs Andrii Nakryiko
@ 2020-09-24 15:16 ` John Fastabend
  9 siblings, 0 replies; 20+ messages in thread
From: John Fastabend @ 2020-09-24 15:16 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Arnaldo Carvalho de Melo

Andrii Nakryiko wrote:
> This patch set introduces a new set of BTF APIs to libbpf that allow to
> conveniently produce BTF types and strings. Internals of struct btf were
> changed such that it can transparently and automatically switch to writable
> mode, which allows appending BTF types and strings. This will allow for libbpf
> itself to do more intrusive modifications of program's BTF (by rewriting it,
> at least as of right now), which is necessary for the upcoming libbpf static
> linking. But they are complete and generic, so can be adopted by anyone who
> has a need to produce BTF type information.


I had this floating around on my todo list, so thanks a lot for doing it. I can
remove a couple more silly hacks I have floating around now!

> 
> One such example outside of libbpf is pahole, which was actually converted to
> these APIs (locally, pending landing of these changes in libbpf) completely
> and shows reduction in amount of custom pahole code necessary and brings nice
> savings in memory usage (about 370MB reduction at peak for my kernel
> configuration) and even BTF deduplication times (one second reduction,
> 23.7s -> 22.7s). Memory savings are due to avoiding pahole's own copy of
> "uncompressed" raw BTF data. Time reduction comes from faster string
> search and deduplication by relying on hashmap instead of BST used by pahole's
> own code. Consequently, these APIs are already tested on real-world
> complicated kernel BTF, but there is also pretty extensive selftest doing
> extra validations.
> 
> Selftests in patch #9 add a set of generic ASSERT_{EQ,STREQ,ERR,OK} macros
> that are useful for writing shorter and less repetitive selftests. I decided
> to keep them local to that selftest for now, but if they prove to be useful in
> more contexts we should move them to test_progs.h. And few more (e.g.,
> inequality tests) macros are probably necessary to have a more complete set.
> 
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>

For the series, I have a few nits I'll put in the patches, mostly spelling
errors and a couple of questions; otherwise this is awesome, thanks.

Acked-by: John Fastabend <john.fastabend@gmail.com>

> 
> Andrii Nakryiko (9):
>   libbpf: refactor internals of BTF type index
>   libbpf: remove assumption of single contiguous memory for BTF data
>   libbpf: generalize common logic for managing dynamically-sized arrays
>   libbpf: extract generic string hashing function for reuse
>   libbpf: allow modification of BTF and add btf__add_str API
>   libbpf: add btf__new_empty() to create an empty BTF object
>   libbpf: add BTF writing APIs
>   libbpf: add btf__str_by_offset() as a more generic variant of
>     name_by_offset
>   selftests/bpf: test BTF writing APIs
> 
>  tools/lib/bpf/bpf.c                           |    2 +-
>  tools/lib/bpf/bpf.h                           |    2 +-
>  tools/lib/bpf/btf.c                           | 1311 +++++++++++++++--
>  tools/lib/bpf/btf.h                           |   41 +
>  tools/lib/bpf/btf_dump.c                      |    9 +-
>  tools/lib/bpf/hashmap.h                       |   12 +
>  tools/lib/bpf/libbpf.map                      |   22 +
>  tools/lib/bpf/libbpf_internal.h               |    3 +
>  .../selftests/bpf/prog_tests/btf_write.c      |  271 ++++
>  9 files changed, 1553 insertions(+), 120 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_write.c
> 
> -- 
> 2.24.1
> 



^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH bpf-next 2/9] libbpf: remove assumption of single contiguous memory for BTF data
  2020-09-23 15:54 ` [PATCH bpf-next 2/9] libbpf: remove assumption of single contiguous memory for BTF data Andrii Nakryiko
@ 2020-09-24 15:21   ` John Fastabend
  2020-09-24 20:25     ` Andrii Nakryiko
  0 siblings, 1 reply; 20+ messages in thread
From: John Fastabend @ 2020-09-24 15:21 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Arnaldo Carvalho de Melo

Andrii Nakryiko wrote:
> Refactor internals of struct btf to remove assumptions that BTF header, type
> data, and string data are laid out contiguously in memory in a single
> memory allocation. Now we have three separate pointers pointing to the start
> of each respective area: header, types, strings. In the next patches, these
> pointers will be re-assigned to point to independently allocated memory areas,
> if BTF needs to be modified.
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---

[...]

> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -27,18 +27,37 @@
>  static struct btf_type btf_void;
>  
>  struct btf {
> -	union {
> -		struct btf_header *hdr;
> -		void *data;
> -	};
> +	void *raw_data;
> +	__u32 raw_size;
> +
> +	/*
> +	 * When BTF is loaded from ELF or raw memory it is stored
> +	 * in contiguous memory block, pointed to by raw_data pointer, and
> +	 * hdr, types_data, and strs_data point inside that memory region to
> +	 * respective parts of BTF representation:

I find the above comment a bit confusing. The picture though is great. How
about something like,

  When BTF is loaded from an ELF or raw memory it is stored
  in a contiguous memory block. The hdr, types_data, and strs_data
  point inside that memory region to their respective parts of BTF
  representation.

> +	 *
> +	 * +--------------------------------+
> +	 * |  Header  |  Types  |  Strings  |
> +	 * +--------------------------------+
> +	 * ^          ^         ^
> +	 * |          |         |
> +	 * hdr        |         |
> +	 * types_data-+         |
> +	 * strs_data------------+
> +	 */
> +	struct btf_header *hdr;
> +	void *types_data;
> +	void *strs_data;
> +
> +	/* type ID to `struct btf_type *` lookup index */
>  	__u32 *type_offs;
>  	__u32 type_offs_cap;
> -	const char *strings;
> -	void *nohdr_data;
> -	void *types_data;
>  	__u32 nr_types;
> -	__u32 data_size;
> +
> +	/* BTF object FD, if loaded into kernel */
>  	int fd;
> +
> +	/* Pointer size (in bytes) for a target architecture of this BTF */
>  	int ptr_sz;
>  };
>  

Thanks,
John

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH bpf-next 5/9] libbpf: allow modification of BTF and add btf__add_str API
  2020-09-23 15:54 ` [PATCH bpf-next 5/9] libbpf: allow modification of BTF and add btf__add_str API Andrii Nakryiko
@ 2020-09-24 15:56   ` John Fastabend
  2020-09-24 20:27     ` Andrii Nakryiko
  0 siblings, 1 reply; 20+ messages in thread
From: John Fastabend @ 2020-09-24 15:56 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Arnaldo Carvalho de Melo

Andrii Nakryiko wrote:
> Allow internal BTF representation to switch from default read-only mode, in
> which raw BTF data is a single non-modifiable block of memory with BTF header,
> types, and strings laid out sequentially and contiguously in memory, into
> a writable representation with types and strings data split out into separate
> memory regions, that can be dynamically expanded.
> 
> Such writable internal representation is transparent to users of libbpf APIs,
> but allows appending new types and strings at the end of BTF, which is
> a typical use case when generating BTF programmatically. All the basic
> guarantees of BTF types and strings layout are preserved, i.e., user can get
> `struct btf_type *` pointer and read it directly. Such btf_type pointers might
> be invalidated if BTF is modified, so some care is required in such mixed
> read/write scenarios.
> 
> Switch from read-only to writable configuration happens automatically the
> first time the user attempts to modify BTF by either adding a new type or new
> string. It is still possible to get raw BTF data, which is a single piece of
> memory that can be persisted in ELF section or into a file as raw BTF. Such
> raw data memory is also still owned by BTF and will be freed either when BTF
> object is freed or if another modification to BTF happens, as any modification
> invalidates BTF raw representation.
> 
> This patch adds the first BTF writing API: btf__add_str(), which allows
> adding arbitrary strings to the BTF string section. All the added strings are
> automatically deduplicated. This is achieved by maintaining an additional
> string lookup index for all unique strings. Such index is built when BTF is
> switched to modifiable mode. If at that time BTF strings section contained
> duplicate strings, they are not de-duplicated. This is done specifically to
> not modify the existing content of BTF (types, their string offsets, etc),
> which can cause confusion and is especially important property if there is
> struct btf_ext associated with struct btf. By following this "imperfect
> deduplication" process, btf_ext is kept consitent and correct. If
> deduplication of strings is necessary, it can be forced by doing BTF
> deduplication, at which point all the strings will be eagerly deduplicated and
> all string offsets both in struct btf and struct btf_ext will be updated.
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---

[...]

> +/* Ensure BTF is ready to be modified (by splitting into a three memory
> + * regions for header, types, and strings). Also invalidate cached
> + * raw_data, if any.
> + */
> +static int btf_ensure_modifiable(struct btf *btf)
> +{
> +	void *hdr, *types, *strs, *strs_end, *s;
> +	struct hashmap *hash = NULL;
> +	long off;
> +	int err;
> +
> +	if (btf_is_modifiable(btf)) {
> +		/* any BTF modification invalidates raw_data */
> +		if (btf->raw_data) {

Am I missing why this case is needed? Just being cautious? It looks like
we get btf->hdr != btf->raw_data (aka btf_is_modifiable) below, but by the
time we set this it looks like we will always NULL btf->raw_data as well.
Again, it doesn't appear harmful; just checking whether I missed a path.

> +			free(btf->raw_data);
> +			btf->raw_data = NULL;
> +		}
> +		return 0;
> +	}
> +
> +	/* split raw data into three memory regions */
> +	hdr = malloc(btf->hdr->hdr_len);
> +	types = malloc(btf->hdr->type_len);
> +	strs = malloc(btf->hdr->str_len);
> +	if (!hdr || !types || !strs)
> +		goto err_out;
> +
> +	memcpy(hdr, btf->hdr, btf->hdr->hdr_len);
> +	memcpy(types, btf->types_data, btf->hdr->type_len);
> +	memcpy(strs, btf->strs_data, btf->hdr->str_len);
> +
> +	/* build lookup index for all strings */
> +	hash = hashmap__new(strs_hash_fn, strs_hash_equal_fn, btf);
> +	if (IS_ERR(hash)) {
> +		err = PTR_ERR(hash);
> +		hash = NULL;
> +		goto err_out;
> +	}
> +

[...]

Thanks,
John

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH bpf-next 7/9] libbpf: add BTF writing APIs
  2020-09-23 15:54 ` [PATCH bpf-next 7/9] libbpf: add BTF writing APIs Andrii Nakryiko
@ 2020-09-24 15:59   ` John Fastabend
  2020-09-25  3:55   ` Alexei Starovoitov
  1 sibling, 0 replies; 20+ messages in thread
From: John Fastabend @ 2020-09-24 15:59 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Arnaldo Carvalho de Melo

Andrii Nakryiko wrote:
> Add APIs for appending new BTF types at the end of BTF object.
> 
> Each BTF kind has one API of the form btf__append_<kind>(). For types
> that have variable amount of additional items (struct/union, enum, func_proto,
> datasec), additional API is provided to emit each such item. E.g., for
> emitting a struct, one would use the following sequence of API calls:
> 
> btf__append_struct(...);
> btf__append_field(...);
> ...
> btf__append_field(...);
> 
> Each btf__append_field() will ensure that the last BTF type is of STRUCT or
> UNION kind and will automatically increment that type's vlen field.
> 
> All the strings are provided as C strings (const char *), not a string offset.
> This significantly improves usability of BTF writer APIs. All such strings
> will be automatically appended to string section or existing string will be
> re-used, if such string was already added previously.
> 
> Each API attempts to do all the reasonable validations, like enforcing
> non-empty names for entities with required names, proper value bounds, various
> bit offset restrictions, etc.
> 
> Type ID validation is minimal because it's possible to emit a type that refers
> to type that will be emitted later, so libbpf has no way to enforce such
> cases. User must be careful to properly emit all the necessary types and
> specify type IDs that will be valid in the finally generated BTF.
> 
> Each of btf__append_<kind>() APIs return new type ID on success or negative
> value on error. APIs like btf__append_field() that emit additional items
> return zero on success and negative value on error.
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---

[...]

> +	/* deconstruct BTF, if necessary, and invalidate raw_data */
> +	if (btf_ensure_modifiable(btf))
> +		return -ENOMEM;
> +
> +	sz = sizeof(struct btf_type) + sizeof(int);
> +	t = btf_add_type_mem(btf, sz);
> +	if (!t)
> +		return -ENOMEM;
> +
> +	/* if something goes wrong later, we might end up with extra an string,

nit typo, 'with an extra string'

> +	 * but that shouldn't be a problem, because BTF can't be constructed
> +	 * completely anyway and will most probably be just discarded
> +	 */
> +	name_off = btf__add_str(btf, name);
> +	if (name_off < 0)
> +		return name_off;
> +
> +	t->name_off = name_off;
> +	t->info = btf_type_info(BTF_KIND_INT, 0, 0);
> +	t->size = byte_sz;
> +	/* set INT info, we don't allow setting legacy bit offset/size */
> +	*(__u32 *)(t + 1) = (encoding << 24) | (byte_sz * 8);
> +
> +	err = btf_add_type_idx_entry(btf, btf->hdr->type_len);
> +	if (err)
> +		return err;
> +
> +	btf->hdr->type_len += sz;
> +	btf->hdr->str_off += sz;
> +	btf->nr_types++;
> +	return btf->nr_types;
> +}
> +

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 2/9] libbpf: remove assumption of single contiguous memory for BTF data
  2020-09-24 15:21   ` John Fastabend
@ 2020-09-24 20:25     ` Andrii Nakryiko
  0 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2020-09-24 20:25 UTC (permalink / raw)
  To: John Fastabend
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Arnaldo Carvalho de Melo

On Thu, Sep 24, 2020 at 8:21 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Andrii Nakryiko wrote:
> > Refactor internals of struct btf to remove assumptions that BTF header, type
> > data, and string data are laid out contiguously in memory in a single
> > memory allocation. Now we have three separate pointers pointing to the start
> > of each respective area: header, types, strings. In the next patches, these
> > pointers will be re-assigned to point to independently allocated memory areas,
> > if BTF needs to be modified.
> >
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > ---
>
> [...]
>
> > --- a/tools/lib/bpf/btf.c
> > +++ b/tools/lib/bpf/btf.c
> > @@ -27,18 +27,37 @@
> >  static struct btf_type btf_void;
> >
> >  struct btf {
> > -     union {
> > -             struct btf_header *hdr;
> > -             void *data;
> > -     };
> > +     void *raw_data;
> > +     __u32 raw_size;
> > +
> > +     /*
> > +      * When BTF is loaded from ELF or raw memory it is stored
> > +      * in contiguous memory block, pointed to by raw_data pointer, and
> > +      * hdr, types_data, and strs_data point inside that memory region to
> > +      * respective parts of BTF representation:
>
> I find the above comment a bit confusing. The picture though is great. How
> about something like,
>
>   When BTF is loaded from an ELF or raw memory it is stored
>   in a contiguous memory block. The hdr, types_data, and strs_data
>   point inside that memory region to their respective parts of BTF
>   representation.
>

Had to do a mental diff to find the part you didn't like :) I'll
update the comment as you suggested.

> > +      *
> > +      * +--------------------------------+
> > +      * |  Header  |  Types  |  Strings  |
> > +      * +--------------------------------+
> > +      * ^          ^         ^
> > +      * |          |         |
> > +      * hdr        |         |
> > +      * types_data-+         |
> > +      * strs_data------------+
> > +      */
> > +     struct btf_header *hdr;
> > +     void *types_data;
> > +     void *strs_data;
> > +
> > +     /* type ID to `struct btf_type *` lookup index */
> >       __u32 *type_offs;
> >       __u32 type_offs_cap;
> > -     const char *strings;
> > -     void *nohdr_data;
> > -     void *types_data;
> >       __u32 nr_types;
> > -     __u32 data_size;
> > +
> > +     /* BTF object FD, if loaded into kernel */
> >       int fd;
> > +
> > +     /* Pointer size (in bytes) for a target architecture of this BTF */
> >       int ptr_sz;
> >  };
> >
>
> Thanks,
> John

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 5/9] libbpf: allow modification of BTF and add btf__add_str API
  2020-09-24 15:56   ` John Fastabend
@ 2020-09-24 20:27     ` Andrii Nakryiko
  0 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2020-09-24 20:27 UTC (permalink / raw)
  To: John Fastabend
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Arnaldo Carvalho de Melo

On Thu, Sep 24, 2020 at 8:56 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Andrii Nakryiko wrote:
> > Allow internal BTF representation to switch from default read-only mode, in
> > which raw BTF data is a single non-modifiable block of memory with BTF header,
> > types, and strings laid out sequentially and contiguously in memory, into
> > a writable representation with types and strings data split out into separate
> > memory regions, that can be dynamically expanded.
> >
> > Such writable internal representation is transparent to users of libbpf APIs,
> > but allows appending new types and strings at the end of BTF, which is
> > a typical use case when generating BTF programmatically. All the basic
> > guarantees of BTF types and strings layout are preserved, i.e., user can get
> > `struct btf_type *` pointer and read it directly. Such btf_type pointers might
> > be invalidated if BTF is modified, so some care is required in such mixed
> > read/write scenarios.
> >
> > Switch from read-only to writable configuration happens automatically the
> > first time the user attempts to modify BTF by either adding a new type or new
> > string. It is still possible to get raw BTF data, which is a single piece of
> > memory that can be persisted in ELF section or into a file as raw BTF. Such
> > raw data memory is also still owned by BTF and will be freed either when BTF
> > object is freed or if another modification to BTF happens, as any modification
> > invalidates BTF raw representation.
> >
> > This patch adds the first BTF writing API: btf__add_str(), which allows
> > adding arbitrary strings to the BTF string section. All the added strings are
> > automatically deduplicated. This is achieved by maintaining an additional
> > string lookup index for all unique strings. Such index is built when BTF is
> > switched to modifiable mode. If at that time BTF strings section contained
> > duplicate strings, they are not de-duplicated. This is done specifically to
> > not modify the existing content of BTF (types, their string offsets, etc),
> > which can cause confusion and is especially important property if there is
> > struct btf_ext associated with struct btf. By following this "imperfect
> > deduplication" process, btf_ext is kept consitent and correct. If
> > deduplication of strings is necessary, it can be forced by doing BTF
> > deduplication, at which point all the strings will be eagerly deduplicated and
> > all string offsets both in struct btf and struct btf_ext will be updated.
> >
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > ---
>
> [...]
>
> > +/* Ensure BTF is ready to be modified (by splitting into a three memory
> > + * regions for header, types, and strings). Also invalidate cached
> > + * raw_data, if any.
> > + */
> > +static int btf_ensure_modifiable(struct btf *btf)
> > +{
> > +     void *hdr, *types, *strs, *strs_end, *s;
> > +     struct hashmap *hash = NULL;
> > +     long off;
> > +     int err;
> > +
> > +     if (btf_is_modifiable(btf)) {
> > +             /* any BTF modification invalidates raw_data */
> > +             if (btf->raw_data) {
>
> Am I missing why this case is needed? Just being cautious? It looks like
> we get btf->hdr != btf->raw_data (aka btf_is_modifiable) below, but by the
> time we set this it looks like we will always NULL btf->raw_data as well.
> Again, it doesn't appear harmful; just checking whether I missed a path.

It's because of btf__get_raw_data() (it's currently used by pahole for
BTF dedup). raw_data is cached in struct btf and is owned by it, so
when we attempt a modification, we have to invalidate the single-blob
representation, as it becomes immediately invalid. This is mostly to
preserve existing semantics, but also to avoid allocating new memory
if the caller created BTF and then accesses raw_data a few times.
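
For illustration, a hypothetical usage pattern (not code from this series):

    const void *raw;
    __u32 sz;

    raw = btf__get_raw_data(btf, &sz);  /* blob gets cached inside struct btf */
    btf__add_str(btf, "new string");    /* frees and invalidates cached blob */
    raw = btf__get_raw_data(btf, &sz);  /* has to be re-fetched after changes */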

>
> > +                     free(btf->raw_data);
> > +                     btf->raw_data = NULL;
> > +             }
> > +             return 0;
> > +     }
> > +
> > +     /* split raw data into three memory regions */
> > +     hdr = malloc(btf->hdr->hdr_len);
> > +     types = malloc(btf->hdr->type_len);
> > +     strs = malloc(btf->hdr->str_len);
> > +     if (!hdr || !types || !strs)
> > +             goto err_out;
> > +
> > +     memcpy(hdr, btf->hdr, btf->hdr->hdr_len);
> > +     memcpy(types, btf->types_data, btf->hdr->type_len);
> > +     memcpy(strs, btf->strs_data, btf->hdr->str_len);
> > +
> > +     /* build lookup index for all strings */
> > +     hash = hashmap__new(strs_hash_fn, strs_hash_equal_fn, btf);
> > +     if (IS_ERR(hash)) {
> > +             err = PTR_ERR(hash);
> > +             hash = NULL;
> > +             goto err_out;
> > +     }
> > +
>
> [...]
>
> Thanks,
> John

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 7/9] libbpf: add BTF writing APIs
  2020-09-23 15:54 ` [PATCH bpf-next 7/9] libbpf: add BTF writing APIs Andrii Nakryiko
  2020-09-24 15:59   ` John Fastabend
@ 2020-09-25  3:55   ` Alexei Starovoitov
  2020-09-25  6:21     ` Andrii Nakryiko
  1 sibling, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2020-09-25  3:55 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, ast, daniel, andrii.nakryiko, kernel-team,
	Arnaldo Carvalho de Melo

On Wed, Sep 23, 2020 at 08:54:34AM -0700, Andrii Nakryiko wrote:
> Add APIs for appending new BTF types at the end of BTF object.
> 
> Each BTF kind has one API of the form btf__append_<kind>(). For types
> that have a variable number of additional items (struct/union, enum, func_proto,
> datasec), an additional API is provided to emit each such item. E.g., for
> emitting a struct, one would use the following sequence of API calls:
> 
> btf__append_struct(...);
> btf__append_field(...);
> ...
> btf__append_field(...);

I've just started looking through the diffs. The first thing that struck me
is the name :) Why 'append' instead of 'add' ? The latter is shorter.

Also how would you add anon struct that is within another struct ?
The anon one would have to be added first and then added as a field?
Feels a bit odd that struct/union building doesn't have 'finish' method,
but I guess it can work.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 7/9] libbpf: add BTF writing APIs
  2020-09-25  3:55   ` Alexei Starovoitov
@ 2020-09-25  6:21     ` Andrii Nakryiko
  2020-09-25 15:37       ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Andrii Nakryiko @ 2020-09-25  6:21 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Arnaldo Carvalho de Melo

On Thu, Sep 24, 2020 at 8:55 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Sep 23, 2020 at 08:54:34AM -0700, Andrii Nakryiko wrote:
> > Add APIs for appending new BTF types at the end of BTF object.
> >
> > Each BTF kind has one API of the form btf__append_<kind>(). For types
> > that have a variable number of additional items (struct/union, enum, func_proto,
> > datasec), an additional API is provided to emit each such item. E.g., for
> > emitting a struct, one would use the following sequence of API calls:
> >
> > btf__append_struct(...);
> > btf__append_field(...);
> > ...
> > btf__append_field(...);
>
> I've just started looking through the diffs. The first thing that struck me
> is the name :) Why 'append' instead of 'add' ? The latter is shorter.

Append is very precise about those types being added at the end. Add
is more ambiguous in that sense and doesn't imply any specific order.
E.g., for btf__add_str() that's suitable, because the order in which
strings are inserted might be different (e.g., if the string is
duplicated). But it's not an "insert" either, so I'm fine with
renaming to "add", if you prefer it.

>
> Also how would you add anon struct that is within another struct ?
> The anon one would have to be added first and then added as a field?
> Feels a bit odd that struct/union building doesn't have 'finish' method,
> but I guess it can work.

That embedded anon struct will be a separate type; the field
(anonymous or not, that's orthogonal to anonymity of a struct (!))
will then refer to that anon struct type by its ID. An anon struct can be
added right before, right after, or in between completely unrelated
types; it doesn't matter to BTF itself as long as all the type IDs
match in the end.
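
Roughly, with this patch set's btf__append_*() APIs (the exact argument
lists below are illustrative, not the final signatures):

    int int_id, anon_id, outer_id;

    int_id  = btf__append_int(btf, "int", 4, BTF_INT_SIGNED);
    anon_id = btf__append_struct(btf, NULL, 4);   /* anonymous inner struct */
    btf__append_field(btf, "x", int_id, 0, 0);    /* member of anon struct */
    outer_id = btf__append_struct(btf, "outer", 4);
    btf__append_field(btf, "inner", anon_id, 0, 0); /* refers to anon ID */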

As for the finish method... There hasn't been a need for it so far, as
BTF constantly maintains a correct vlen for the "current"
struct/union/func_proto/datasec/enum (anything with extra members).
I've been writing a few more tests than what I posted here (they will
come in the second wave of patches) and converted pahole to this new
API completely. And it does feel pretty nice and natural so far. In
the future we might add something like that, I suppose, to allow
doing some more validations at the end. But that would be a
non-breaking extension, so I don't want to do it right now.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 7/9] libbpf: add BTF writing APIs
  2020-09-25  6:21     ` Andrii Nakryiko
@ 2020-09-25 15:37       ` Alexei Starovoitov
  2020-09-25 16:53         ` Andrii Nakryiko
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2020-09-25 15:37 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Daniel Borkmann, Kernel Team,
	Arnaldo Carvalho de Melo

On 9/24/20 11:21 PM, Andrii Nakryiko wrote:
> On Thu, Sep 24, 2020 at 8:55 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>
>> On Wed, Sep 23, 2020 at 08:54:34AM -0700, Andrii Nakryiko wrote:
>>> Add APIs for appending new BTF types at the end of BTF object.
>>>
>>> Each BTF kind has one API of the form btf__append_<kind>(). For types
>>> that have a variable number of additional items (struct/union, enum, func_proto,
>>> datasec), an additional API is provided to emit each such item. E.g., for
>>> emitting a struct, one would use the following sequence of API calls:
>>>
>>> btf__append_struct(...);
>>> btf__append_field(...);
>>> ...
>>> btf__append_field(...);
>>
>> I've just started looking through the diffs. The first thing that struck me
>> is the name :) Why 'append' instead of 'add' ? The latter is shorter.
> 
> Append is very precise about those types being added at the end. Add
> is more ambiguous in that sense and doesn't imply any specific order.
> E.g., for btf__add_str() that's suitable, because the order in which
> strings are inserted might be different (e.g., if the string is
> duplicated). But it's not an "insert" either, so I'm fine with
> renaming to "add", if you prefer it.

The reason I prefer shorter is to be able to write:
btf__add_var(btf, "my_var", global,
              btf__add_const(btf,
              btf__add_volatile(btf,
              btf__add_ptr(btf,
              btf__add_int(btf, "int", 4, signed))));

In other words, the shorter the name, the more suitable the API
will be for manual construction of types.
Looks like the API already checks for invalid type_id,
so there is no need to check validity at every build stage.
Hence it's nice to combine multiple btf__add_*() calls into a single line.

I think the C language isn't great for a 'constructor'-style API.
Maybe on top of the above API we can add a C++-like API?
For example, we can define
struct btf_builder {
    struct btf_builder * (*_volatile) (void);
    struct btf_builder * (*_const) (void);
    struct btf_builder * (*_int) (char *name, int sz, int encoding);
    struct btf_builder * (*_ptr) (void);
};

and then use it as:
     struct btf_builder *b = btf__create_global_builder(...);

     b->_int("int", 4, signed)
      ->_const()
      ->_volatile()
      ->_ptr()
      ->_var("my_var", global);

Every method will return its own object (only one such object)
while the actual building will be happening in some 'invisible',
global, and mutex-protected place.
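
A rough sketch of how one such method could be backed by that hidden
state (all names here are hypothetical; assumes <pthread.h>):

    static struct {
            struct btf *btf;
            int last_id;
            pthread_mutex_t lock;
    } bld;
    static struct btf_builder builder;

    static struct btf_builder *bld_ptr(void)
    {
            pthread_mutex_lock(&bld.lock);
            bld.last_id = btf__append_ptr(bld.btf, bld.last_id);
            pthread_mutex_unlock(&bld.lock);
            return &builder; /* enables ->_ptr()->_const() chaining */
    }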

>>
>> Also how would you add anon struct that is within another struct ?
>> The anon one would have to be added first and then added as a field?
>> Feels a bit odd that struct/union building doesn't have 'finish' method,
>> but I guess it can work.
> 
> That embedded anon struct will be a separate type; the field
> (anonymous or not, that's orthogonal to anonymity of a struct (!))
> will then refer to that anon struct type by its ID. An anon struct can be
> added right before, right after, or in between completely unrelated
> types; it doesn't matter to BTF itself as long as all the type IDs
> match in the end.
> 
> As for the finish method... There hasn't been a need for it so far, as
> BTF constantly maintains a correct vlen for the "current"
> struct/union/func_proto/datasec/enum (anything with extra members).
> I've been writing a few more tests than what I posted here (they will
> come in the second wave of patches) and converted pahole to this new
> API completely. And it does feel pretty nice and natural so far. In
> the future we might add something like that, I suppose, to allow
> doing some more validations at the end. But that would be a
> non-breaking extension, so I don't want to do it right now.

Sure, that's fine.
Also I suspect sooner or later it would be good to do at least partial
dedup of types while they're being built.
Instead of passing the exact btf_id of 'int' everywhere, it would be
more human-friendly to say:
   b->_int("int", 4, signed)->_var("my_var")
instead of having an extra variable that holds the btf_id of 'int' and
using it as:
   u32 btf_id_of_int; /* used everywhere the 'int' type of a var needs to be added */
   b->_var("my_var", btf_id_of_int);

I mean, since types are still being built, the real dedup cannot happen,
but dedup of simple types can happen on the fly.
That would justify 'add' vs 'append' as well.
Just a thought.
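
E.g., a small name-keyed cache could be enough for that (hypothetical
helper; assumes a hashmap created with string hash/equal callbacks, as
in tools/lib/bpf/hashmap.h):

    static int get_int_id(struct btf *btf, struct hashmap *cache,
                          const char *name, int sz, int enc)
    {
            void *id;

            if (hashmap__find(cache, name, &id))
                    return (int)(long)id; /* reuse previously added 'int' */
            id = (void *)(long)btf__append_int(btf, name, sz, enc);
            hashmap__set(cache, name, id, NULL, NULL);
            return (int)(long)id;
    }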


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 7/9] libbpf: add BTF writing APIs
  2020-09-25 15:37       ` Alexei Starovoitov
@ 2020-09-25 16:53         ` Andrii Nakryiko
  0 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2020-09-25 16:53 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Andrii Nakryiko, bpf, Networking,
	Daniel Borkmann, Kernel Team, Arnaldo Carvalho de Melo

On Fri, Sep 25, 2020 at 8:37 AM Alexei Starovoitov <ast@fb.com> wrote:
>
> On 9/24/20 11:21 PM, Andrii Nakryiko wrote:
> > On Thu, Sep 24, 2020 at 8:55 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> >>
> >> On Wed, Sep 23, 2020 at 08:54:34AM -0700, Andrii Nakryiko wrote:
> >>> Add APIs for appending new BTF types at the end of BTF object.
> >>>
> >>> Each BTF kind has one API of the form btf__append_<kind>(). For types
> >>> that have a variable number of additional items (struct/union, enum, func_proto,
> >>> datasec), an additional API is provided to emit each such item. E.g., for
> >>> emitting a struct, one would use the following sequence of API calls:
> >>>
> >>> btf__append_struct(...);
> >>> btf__append_field(...);
> >>> ...
> >>> btf__append_field(...);
> >>
> >> I've just started looking through the diffs. The first thing that struck me
> >> is the name :) Why 'append' instead of 'add' ? The latter is shorter.
> >
> > Append is very precise about those types being added at the end. Add
> > is more ambiguous in that sense and doesn't imply any specific order.
> > E.g., for btf__add_str() that's suitable, because the order in which
> > strings are inserted might be different (e.g., if the string is
> > duplicated). But it's not an "insert" either, so I'm fine with
> > renaming to "add", if you prefer it.
>
> The reason I prefer shorter is to be able to write:
> btf__add_var(btf, "my_var", global,
>               btf__add_const(btf,
>               btf__add_volatile(btf,
>               btf__add_ptr(btf,
>               btf__add_int(btf, "int", 4, signed))));

That's an interesting way of using it, I'll give you that :)

OK, I'll switch to the "add" name.

>
> In other words, the shorter the name, the more suitable the API
> will be for manual construction of types.
> Looks like the API already checks for invalid type_id,
> so there is no need to check validity at every build stage.
> Hence it's nice to combine multiple btf__add_*() calls into a single line.
>
> I think the C language isn't great for a 'constructor'-style API.
> Maybe on top of the above API we can add a C++-like API?
> For example, we can define
> struct btf_builder {
>     struct btf_builder * (*_volatile) (void);
>     struct btf_builder * (*_const) (void);
>     struct btf_builder * (*_int) (char *name, int sz, int encoding);
>     struct btf_builder * (*_ptr) (void);
> };
>
> and then use it as:
>      struct btf_builder *b = btf__create_global_builder(...);
>
>      b->_int("int", 4, signed)
>       ->_const()
>       ->_volatile()
>       ->_ptr()
>       ->_var("my_var", global);
>
> Every method will return its own object (only one such object)
> while the actual building will be happening in some 'invisible',
> global, and mutex-protected place.
>
> >>
> >> Also how would you add anon struct that is within another struct ?
> >> The anon one would have to be added first and then added as a field?
> >> Feels a bit odd that struct/union building doesn't have 'finish' method,
> >> but I guess it can work.
> >
> > That embedded anon struct will be a separate type; the field
> > (anonymous or not, that's orthogonal to anonymity of a struct (!))
> > will then refer to that anon struct type by its ID. An anon struct can be
> > added right before, right after, or in between completely unrelated
> > types; it doesn't matter to BTF itself as long as all the type IDs
> > match in the end.
> >
> > As for the finish method... There hasn't been a need for it so far, as
> > BTF constantly maintains a correct vlen for the "current"
> > struct/union/func_proto/datasec/enum (anything with extra members).
> > I've been writing a few more tests than what I posted here (they will
> > come in the second wave of patches) and converted pahole to this new
> > API completely. And it does feel pretty nice and natural so far. In
> > the future we might add something like that, I suppose, to allow
> > doing some more validations at the end. But that would be a
> > non-breaking extension, so I don't want to do it right now.
>
> Sure, that's fine.
> Also I suspect sooner or later it would be good to do at least partial
> dedup of types while they're being built.
> Instead of passing the exact btf_id of 'int' everywhere, it would be
> more human-friendly to say:
>    b->_int("int", 4, signed)->_var("my_var")
> instead of having an extra variable that holds the btf_id of 'int' and
> using it as:
>    u32 btf_id_of_int; /* used everywhere the 'int' type of a var needs to be added */
>    b->_var("my_var", btf_id_of_int);
>
> I mean, since types are still being built, the real dedup cannot happen,
> but dedup of simple types can happen on the fly.
> That would justify 'add' vs 'append' as well.
> Just a thought.
>

That's doable, though it certainly adds complexity. Unless you need to
generate millions of those variables, just appending "int"s many times
and then doing dedup once would be faster and simpler, using just a
touch more memory. But it's certainly something to keep in mind.
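
I.e., a sketch of that "append many, dedup once" approach (n_vars and
var_names are hypothetical, append argument lists illustrative;
btf__dedup() is the existing libbpf API):

    for (i = 0; i < n_vars; i++) {
            int int_id = btf__append_int(btf, "int", 4, BTF_INT_SIGNED);

            btf__append_var(btf, var_names[i], BTF_VAR_GLOBAL_ALLOCATED,
                            int_id);
    }
    err = btf__dedup(btf, NULL, NULL); /* collapses duplicate "int"s */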

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-09-25 16:53 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-23 15:54 [PATCH bpf-next 0/9] libbpf: BTF writer APIs Andrii Nakryiko
2020-09-23 15:54 ` [PATCH bpf-next 1/9] libbpf: refactor internals of BTF type index Andrii Nakryiko
2020-09-23 15:54 ` [PATCH bpf-next 2/9] libbpf: remove assumption of single contiguous memory for BTF data Andrii Nakryiko
2020-09-24 15:21   ` John Fastabend
2020-09-24 20:25     ` Andrii Nakryiko
2020-09-23 15:54 ` [PATCH bpf-next 3/9] libbpf: generalize common logic for managing dynamically-sized arrays Andrii Nakryiko
2020-09-23 15:54 ` [PATCH bpf-next 4/9] libbpf: extract generic string hashing function for reuse Andrii Nakryiko
2020-09-23 15:54 ` [PATCH bpf-next 5/9] libbpf: allow modification of BTF and add btf__add_str API Andrii Nakryiko
2020-09-24 15:56   ` John Fastabend
2020-09-24 20:27     ` Andrii Nakryiko
2020-09-23 15:54 ` [PATCH bpf-next 6/9] libbpf: add btf__new_empty() to create an empty BTF object Andrii Nakryiko
2020-09-23 15:54 ` [PATCH bpf-next 7/9] libbpf: add BTF writing APIs Andrii Nakryiko
2020-09-24 15:59   ` John Fastabend
2020-09-25  3:55   ` Alexei Starovoitov
2020-09-25  6:21     ` Andrii Nakryiko
2020-09-25 15:37       ` Alexei Starovoitov
2020-09-25 16:53         ` Andrii Nakryiko
2020-09-23 15:54 ` [PATCH bpf-next 8/9] libbpf: add btf__str_by_offset() as a more generic variant of name_by_offset Andrii Nakryiko
2020-09-23 15:54 ` [PATCH bpf-next 9/9] selftests/bpf: test BTF writing APIs Andrii Nakryiko
2020-09-24 15:16 ` [PATCH bpf-next 0/9] libbpf: BTF writer APIs John Fastabend

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).