netdev.vger.kernel.org archive mirror
* [RFC PATCH bpf-next 0/8] BTF-defined BPF map definitions
@ 2019-05-31 20:21 Andrii Nakryiko
  2019-05-31 20:21 ` [RFC PATCH bpf-next 1/8] libbpf: add common min/max macro to libbpf_internal.h Andrii Nakryiko
                   ` (7 more replies)
  0 siblings, 8 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2019-05-31 20:21 UTC (permalink / raw)
  To: andrii.nakryiko, netdev, bpf, ast, daniel, kernel-team; +Cc: Andrii Nakryiko

This patch set implements the initial version (as discussed at the LSF/MM 2019
conference) of a new way to specify BPF maps that relies on BTF type
information, which allows for easy extensibility while preserving forward and
backward compatibility. See details and examples in the description of patch #6.

Patch #1 centralizes the commonly used min/max macros in libbpf_internal.h.
Patch #2 extracts .BTF and .BTF.ext loading logic from elf_collect().
Patch #3 refactors map initialization logic into user-provided maps and global
data maps, in preparation for adding another way (BTF-defined maps).
Patch #4 adds support for map definitions in multiple ELF sections and
deprecates the bpf_object__find_map_by_offset() API, which doesn't appear to be
used anymore and assumes that all map definitions reside in a single ELF
section.
Patch #5 splits BTF initialization from sanitization/loading into the kernel to
preserve the original BTF at the time of map initialization.
Patch #6 adds support for BTF-defined maps.
Patch #7 adds a new test for BTF-defined map definitions.
Patch #8 converts test BPF map definitions to the BTF-defined form.

Andrii Nakryiko (8):
  libbpf: add common min/max macro to libbpf_internal.h
  libbpf: extract BTF loading and simplify ELF parsing logic
  libbpf: refactor map initialization
  libbpf: identify maps by section index in addition to offset
  libbpf: split initialization and loading of BTF
  libbpf: allow specifying map definitions using BTF
  selftests/bpf: add test for BTF-defined maps
  selftests/bpf: switch tests to BTF-defined map definitions

 tools/lib/bpf/bpf.c                           |   7 +-
 tools/lib/bpf/bpf_prog_linfo.c                |   5 +-
 tools/lib/bpf/btf.c                           |   3 -
 tools/lib/bpf/btf.h                           |   1 +
 tools/lib/bpf/btf_dump.c                      |   3 -
 tools/lib/bpf/libbpf.c                        | 762 +++++++++++++-----
 tools/lib/bpf/libbpf_internal.h               |   7 +
 tools/testing/selftests/bpf/progs/bpf_flow.c  |  18 +-
 .../selftests/bpf/progs/get_cgroup_id_kern.c  |  18 +-
 .../testing/selftests/bpf/progs/netcnt_prog.c |  22 +-
 .../selftests/bpf/progs/sample_map_ret0.c     |  18 +-
 .../selftests/bpf/progs/socket_cookie_prog.c  |   9 +-
 .../bpf/progs/sockmap_verdict_prog.c          |  36 +-
 .../selftests/bpf/progs/test_btf_newkv.c      |  73 ++
 .../bpf/progs/test_get_stack_rawtp.c          |  27 +-
 .../selftests/bpf/progs/test_global_data.c    |  27 +-
 tools/testing/selftests/bpf/progs/test_l4lb.c |  45 +-
 .../selftests/bpf/progs/test_l4lb_noinline.c  |  45 +-
 .../selftests/bpf/progs/test_map_in_map.c     |  20 +-
 .../selftests/bpf/progs/test_map_lock.c       |  22 +-
 .../testing/selftests/bpf/progs/test_obj_id.c |   9 +-
 .../bpf/progs/test_select_reuseport_kern.c    |  45 +-
 .../bpf/progs/test_send_signal_kern.c         |  22 +-
 .../bpf/progs/test_skb_cgroup_id_kern.c       |   9 +-
 .../bpf/progs/test_sock_fields_kern.c         |  60 +-
 .../selftests/bpf/progs/test_spin_lock.c      |  33 +-
 .../bpf/progs/test_stacktrace_build_id.c      |  44 +-
 .../selftests/bpf/progs/test_stacktrace_map.c |  40 +-
 .../testing/selftests/bpf/progs/test_tc_edt.c |   9 +-
 .../bpf/progs/test_tcp_check_syncookie_kern.c |   9 +-
 .../selftests/bpf/progs/test_tcp_estats.c     |   9 +-
 .../selftests/bpf/progs/test_tcpbpf_kern.c    |  18 +-
 .../selftests/bpf/progs/test_tcpnotify_kern.c |  18 +-
 tools/testing/selftests/bpf/progs/test_xdp.c  |  18 +-
 .../selftests/bpf/progs/test_xdp_noinline.c   |  60 +-
 tools/testing/selftests/bpf/test_btf.c        |  10 +-
 .../selftests/bpf/test_queue_stack_map.h      |  20 +-
 .../testing/selftests/bpf/test_sockmap_kern.h |  72 +-
 38 files changed, 1182 insertions(+), 491 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_btf_newkv.c

-- 
2.17.1



* [RFC PATCH bpf-next 1/8] libbpf: add common min/max macro to libbpf_internal.h
  2019-05-31 20:21 [RFC PATCH bpf-next 0/8] BTF-defined BPF map definitions Andrii Nakryiko
@ 2019-05-31 20:21 ` Andrii Nakryiko
  2019-05-31 20:21 ` [RFC PATCH bpf-next 2/8] libbpf: extract BTF loading and simplify ELF parsing logic Andrii Nakryiko
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2019-05-31 20:21 UTC (permalink / raw)
  To: andrii.nakryiko, netdev, bpf, ast, daniel, kernel-team; +Cc: Andrii Nakryiko

Multiple files in libbpf define their own min/max macros.
Let's define them once in libbpf_internal.h and use those everywhere.
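The consolidated macros are ordinary function-like macros, so callers should keep one property in mind: each argument may be evaluated twice. The sketch below copies the two definitions from the libbpf_internal.h hunk and adds two hypothetical helpers, clamp_len() and max_with_side_effect(), purely for illustration; neither is part of libbpf.

```c
#include <assert.h>
#include <stddef.h>

/* Same definitions as the ones this patch adds to libbpf_internal.h. */
#ifndef min
# define min(x, y) ((x) < (y) ? (x) : (y))
#endif
#ifndef max
# define max(x, y) ((x) < (y) ? (y) : (x))
#endif

/* Hypothetical helper: clamp a requested length to a buffer capacity. */
static size_t clamp_len(size_t len, size_t cap)
{
	size_t n = min(len, cap);

	assert(n <= cap); /* invariant guaranteed by min() */
	return n;
}

/* Hypothetical helper showing the double-evaluation pitfall: with *i == 1,
 * max((*i)++, 0) expands to ((*i)++ < 0 ? 0 : (*i)++), so the increment
 * runs twice; the result is 2 and *i ends up as 3. */
static int max_with_side_effect(int *i)
{
	return max((*i)++, 0);
}
```

Passing plain variables or constants, as libbpf's callers do, avoids the pitfall entirely.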

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/bpf.c             | 7 ++-----
 tools/lib/bpf/bpf_prog_linfo.c  | 5 +----
 tools/lib/bpf/btf.c             | 3 ---
 tools/lib/bpf/btf_dump.c        | 3 ---
 tools/lib/bpf/libbpf_internal.h | 7 +++++++
 5 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 0d4b4fe10a84..c7d7993c44bb 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -26,10 +26,11 @@
 #include <memory.h>
 #include <unistd.h>
 #include <asm/unistd.h>
+#include <errno.h>
 #include <linux/bpf.h>
 #include "bpf.h"
 #include "libbpf.h"
-#include <errno.h>
+#include "libbpf_internal.h"
 
 /*
  * When building perf, unistd.h is overridden. __NR_bpf is
@@ -53,10 +54,6 @@
 # endif
 #endif
 
-#ifndef min
-#define min(x, y) ((x) < (y) ? (x) : (y))
-#endif
-
 static inline __u64 ptr_to_u64(const void *ptr)
 {
 	return (__u64) (unsigned long) ptr;
diff --git a/tools/lib/bpf/bpf_prog_linfo.c b/tools/lib/bpf/bpf_prog_linfo.c
index 6978314ea7f6..8c67561c93b0 100644
--- a/tools/lib/bpf/bpf_prog_linfo.c
+++ b/tools/lib/bpf/bpf_prog_linfo.c
@@ -6,10 +6,7 @@
 #include <linux/err.h>
 #include <linux/bpf.h>
 #include "libbpf.h"
-
-#ifndef min
-#define min(x, y) ((x) < (y) ? (x) : (y))
-#endif
+#include "libbpf_internal.h"
 
 struct bpf_prog_linfo {
 	void *raw_linfo;
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index b2478e98c367..467224feb43b 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -16,9 +16,6 @@
 #include "libbpf_internal.h"
 #include "hashmap.h"
 
-#define max(a, b) ((a) > (b) ? (a) : (b))
-#define min(a, b) ((a) < (b) ? (a) : (b))
-
 #define BTF_MAX_NR_TYPES 0x7fffffff
 #define BTF_MAX_STR_OFFSET 0x7fffffff
 
diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c
index 4b22db77e2cc..7065bb5b2752 100644
--- a/tools/lib/bpf/btf_dump.c
+++ b/tools/lib/bpf/btf_dump.c
@@ -18,9 +18,6 @@
 #include "libbpf.h"
 #include "libbpf_internal.h"
 
-#define min(x, y) ((x) < (y) ? (x) : (y))
-#define max(x, y) ((x) < (y) ? (y) : (x))
-
 static const char PREFIXES[] = "\t\t\t\t\t\t\t\t\t\t\t\t\t";
 static const size_t PREFIX_CNT = sizeof(PREFIXES) - 1;
 
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index 850f7bdec5cb..554a7856dc2d 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -23,6 +23,13 @@
 #define BTF_PARAM_ENC(name, type) (name), (type)
 #define BTF_VAR_SECINFO_ENC(type, offset, size) (type), (offset), (size)
 
+#ifndef min
+# define min(x, y) ((x) < (y) ? (x) : (y))
+#endif
+#ifndef max
+# define max(x, y) ((x) < (y) ? (y) : (x))
+#endif
+
 extern void libbpf_print(enum libbpf_print_level level,
 			 const char *format, ...)
 	__attribute__((format(printf, 2, 3)));
-- 
2.17.1



* [RFC PATCH bpf-next 2/8] libbpf: extract BTF loading and simplify ELF parsing logic
  2019-05-31 20:21 [RFC PATCH bpf-next 0/8] BTF-defined BPF map definitions Andrii Nakryiko
  2019-05-31 20:21 ` [RFC PATCH bpf-next 1/8] libbpf: add common min/max macro to libbpf_internal.h Andrii Nakryiko
@ 2019-05-31 20:21 ` Andrii Nakryiko
  2019-05-31 20:21 ` [RFC PATCH bpf-next 3/8] libbpf: refactor map initialization Andrii Nakryiko
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2019-05-31 20:21 UTC (permalink / raw)
  To: andrii.nakryiko, netdev, bpf, ast, daniel, kernel-team; +Cc: Andrii Nakryiko

In preparation for adding BTF-based BPF map loading, extract the .BTF and
.BTF.ext loading logic. Also simplify error handling in
bpf_object__elf_collect() by returning early, as there is no common
cleanup to be done.
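The SHT_REL hunk below is a compact example of the early-return style: it remembers the old element count, grows the array by one, returns immediately if the allocation fails, and only then commits the new pointer and count. A standalone sketch of that pattern, using hypothetical struct reloc_entry / struct reloc_list types as stand-ins for libbpf's obj->efile.reloc bookkeeping:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for libbpf's per-object relocation array. */
struct reloc_entry {
	int sec_idx;
};

struct reloc_list {
	struct reloc_entry *items;
	int nr;
};

/* Append one entry; on failure return -ENOMEM and leave the list untouched. */
static int reloc_list_append(struct reloc_list *list, int sec_idx)
{
	int nr;
	struct reloc_entry *items;

	assert(list != NULL);
	nr = list->nr; /* index of the slot being added */

	items = realloc(list->items, (nr + 1) * sizeof(*items));
	if (!items)
		return -ENOMEM; /* early return: realloc left list->items valid */

	list->items = items;
	list->nr++;
	list->items[nr].sec_idx = sec_idx;
	return 0;
}
```

Because realloc() does not free the old block on failure, nothing needs undoing on the error path, which is what makes a shared "goto out" label unnecessary.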

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/libbpf.c | 137 ++++++++++++++++++++++-------------------
 1 file changed, 75 insertions(+), 62 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index ba89d9727137..9e39a0a33aeb 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1078,6 +1078,58 @@ static void bpf_object__sanitize_btf_ext(struct bpf_object *obj)
 	}
 }
 
+static int bpf_object__load_btf(struct bpf_object *obj,
+				Elf_Data *btf_data,
+				Elf_Data *btf_ext_data)
+{
+	int err = 0;
+
+	if (btf_data) {
+		obj->btf = btf__new(btf_data->d_buf, btf_data->d_size);
+		if (IS_ERR(obj->btf)) {
+			pr_warning("Error loading ELF section %s: %d.\n",
+				   BTF_ELF_SEC, err);
+			goto out;
+		}
+		err = btf__finalize_data(obj, obj->btf);
+		if (err) {
+			pr_warning("Error finalizing %s: %d.\n",
+				   BTF_ELF_SEC, err);
+			goto out;
+		}
+		bpf_object__sanitize_btf(obj);
+		err = btf__load(obj->btf);
+		if (err) {
+			pr_warning("Error loading %s into kernel: %d.\n",
+				   BTF_ELF_SEC, err);
+			goto out;
+		}
+	}
+	if (btf_ext_data) {
+		if (!obj->btf) {
+			pr_debug("Ignore ELF section %s because its depending ELF section %s is not found.\n",
+				 BTF_EXT_ELF_SEC, BTF_ELF_SEC);
+			goto out;
+		}
+		obj->btf_ext = btf_ext__new(btf_ext_data->d_buf,
+					    btf_ext_data->d_size);
+		if (IS_ERR(obj->btf_ext)) {
+			pr_warning("Error loading ELF section %s: %ld. Ignored and continue.\n",
+				   BTF_EXT_ELF_SEC, PTR_ERR(obj->btf_ext));
+			obj->btf_ext = NULL;
+			goto out;
+		}
+		bpf_object__sanitize_btf_ext(obj);
+	}
+out:
+	if (err || IS_ERR(obj->btf)) {
+		if (!IS_ERR_OR_NULL(obj->btf))
+			btf__free(obj->btf);
+		obj->btf = NULL;
+	}
+	return 0;
+}
+
 static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 {
 	Elf *elf = obj->efile.elf;
@@ -1102,24 +1154,21 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 		if (gelf_getshdr(scn, &sh) != &sh) {
 			pr_warning("failed to get section(%d) header from %s\n",
 				   idx, obj->path);
-			err = -LIBBPF_ERRNO__FORMAT;
-			goto out;
+			return -LIBBPF_ERRNO__FORMAT;
 		}
 
 		name = elf_strptr(elf, ep->e_shstrndx, sh.sh_name);
 		if (!name) {
 			pr_warning("failed to get section(%d) name from %s\n",
 				   idx, obj->path);
-			err = -LIBBPF_ERRNO__FORMAT;
-			goto out;
+			return -LIBBPF_ERRNO__FORMAT;
 		}
 
 		data = elf_getdata(scn, 0);
 		if (!data) {
 			pr_warning("failed to get section(%d) data from %s(%s)\n",
 				   idx, name, obj->path);
-			err = -LIBBPF_ERRNO__FORMAT;
-			goto out;
+			return -LIBBPF_ERRNO__FORMAT;
 		}
 		pr_debug("section(%d) %s, size %ld, link %d, flags %lx, type=%d\n",
 			 idx, name, (unsigned long)data->d_size,
@@ -1130,10 +1179,14 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 			err = bpf_object__init_license(obj,
 						       data->d_buf,
 						       data->d_size);
+			if (err)
+				return err;
 		} else if (strcmp(name, "version") == 0) {
 			err = bpf_object__init_kversion(obj,
 							data->d_buf,
 							data->d_size);
+			if (err)
+				return err;
 		} else if (strcmp(name, "maps") == 0) {
 			obj->efile.maps_shndx = idx;
 		} else if (strcmp(name, BTF_ELF_SEC) == 0) {
@@ -1144,11 +1197,10 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 			if (obj->efile.symbols) {
 				pr_warning("bpf: multiple SYMTAB in %s\n",
 					   obj->path);
-				err = -LIBBPF_ERRNO__FORMAT;
-			} else {
-				obj->efile.symbols = data;
-				obj->efile.strtabidx = sh.sh_link;
+				return -LIBBPF_ERRNO__FORMAT;
 			}
+			obj->efile.symbols = data;
+			obj->efile.strtabidx = sh.sh_link;
 		} else if (sh.sh_type == SHT_PROGBITS && data->d_size > 0) {
 			if (sh.sh_flags & SHF_EXECINSTR) {
 				if (strcmp(name, ".text") == 0)
@@ -1162,6 +1214,7 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 
 					pr_warning("failed to alloc program %s (%s): %s",
 						   name, obj->path, cp);
+					return err;
 				}
 			} else if (strcmp(name, ".data") == 0) {
 				obj->efile.data = data;
@@ -1173,8 +1226,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 				pr_debug("skip section(%d) %s\n", idx, name);
 			}
 		} else if (sh.sh_type == SHT_REL) {
+			int nr_reloc = obj->efile.nr_reloc;
 			void *reloc = obj->efile.reloc;
-			int nr_reloc = obj->efile.nr_reloc + 1;
 			int sec = sh.sh_info; /* points to other section */
 
 			/* Only do relo for section with exec instructions */
@@ -1184,79 +1237,39 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 				continue;
 			}
 
-			reloc = reallocarray(reloc, nr_reloc,
+			reloc = reallocarray(reloc, nr_reloc + 1,
 					     sizeof(*obj->efile.reloc));
 			if (!reloc) {
 				pr_warning("realloc failed\n");
-				err = -ENOMEM;
-			} else {
-				int n = nr_reloc - 1;
+				return -ENOMEM;
+			}
 
-				obj->efile.reloc = reloc;
-				obj->efile.nr_reloc = nr_reloc;
+			obj->efile.reloc = reloc;
+			obj->efile.nr_reloc++;
 
-				obj->efile.reloc[n].shdr = sh;
-				obj->efile.reloc[n].data = data;
-			}
+			obj->efile.reloc[nr_reloc].shdr = sh;
+			obj->efile.reloc[nr_reloc].data = data;
 		} else if (sh.sh_type == SHT_NOBITS && strcmp(name, ".bss") == 0) {
 			obj->efile.bss = data;
 			obj->efile.bss_shndx = idx;
 		} else {
 			pr_debug("skip section(%d) %s\n", idx, name);
 		}
-		if (err)
-			goto out;
 	}
 
 	if (!obj->efile.strtabidx || obj->efile.strtabidx >= idx) {
 		pr_warning("Corrupted ELF file: index of strtab invalid\n");
 		return -LIBBPF_ERRNO__FORMAT;
 	}
-	if (btf_data) {
-		obj->btf = btf__new(btf_data->d_buf, btf_data->d_size);
-		if (IS_ERR(obj->btf)) {
-			pr_warning("Error loading ELF section %s: %ld. Ignored and continue.\n",
-				   BTF_ELF_SEC, PTR_ERR(obj->btf));
-			obj->btf = NULL;
-		} else {
-			err = btf__finalize_data(obj, obj->btf);
-			if (!err) {
-				bpf_object__sanitize_btf(obj);
-				err = btf__load(obj->btf);
-			}
-			if (err) {
-				pr_warning("Error finalizing and loading %s into kernel: %d. Ignored and continue.\n",
-					   BTF_ELF_SEC, err);
-				btf__free(obj->btf);
-				obj->btf = NULL;
-				err = 0;
-			}
-		}
-	}
-	if (btf_ext_data) {
-		if (!obj->btf) {
-			pr_debug("Ignore ELF section %s because its depending ELF section %s is not found.\n",
-				 BTF_EXT_ELF_SEC, BTF_ELF_SEC);
-		} else {
-			obj->btf_ext = btf_ext__new(btf_ext_data->d_buf,
-						    btf_ext_data->d_size);
-			if (IS_ERR(obj->btf_ext)) {
-				pr_warning("Error loading ELF section %s: %ld. Ignored and continue.\n",
-					   BTF_EXT_ELF_SEC,
-					   PTR_ERR(obj->btf_ext));
-				obj->btf_ext = NULL;
-			} else {
-				bpf_object__sanitize_btf_ext(obj);
-			}
-		}
-	}
+	err = bpf_object__load_btf(obj, btf_data, btf_ext_data);
+	if (err)
+		return err;
 	if (bpf_object__has_maps(obj)) {
 		err = bpf_object__init_maps(obj, flags);
 		if (err)
-			goto out;
+			return err;
 	}
 	err = bpf_object__init_prog_names(obj);
-out:
 	return err;
 }
 
-- 
2.17.1



* [RFC PATCH bpf-next 3/8] libbpf: refactor map initialization
  2019-05-31 20:21 [RFC PATCH bpf-next 0/8] BTF-defined BPF map definitions Andrii Nakryiko
  2019-05-31 20:21 ` [RFC PATCH bpf-next 1/8] libbpf: add common min/max macro to libbpf_internal.h Andrii Nakryiko
  2019-05-31 20:21 ` [RFC PATCH bpf-next 2/8] libbpf: extract BTF loading and simplify ELF parsing logic Andrii Nakryiko
@ 2019-05-31 20:21 ` Andrii Nakryiko
  2019-05-31 20:21 ` [RFC PATCH bpf-next 4/8] libbpf: identify maps by section index in addition to offset Andrii Nakryiko
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2019-05-31 20:21 UTC (permalink / raw)
  To: andrii.nakryiko, netdev, bpf, ast, daniel, kernel-team; +Cc: Andrii Nakryiko

User and global data map initialization has become complicated and
unnecessarily convoluted. This patch splits the logic for global data map
and user-defined map initialization into separate functions. It also
removes the need to pre-calculate how many maps will be initialized,
instead allowing new maps to be added as they are discovered, which
will be used later for BTF-defined map definitions.
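bpf_object__add_map() in the hunk below grows obj->maps geometrically (to at least 4 slots, then by roughly 1.5x), zeroes the new slots and pre-sets their file descriptors to -1 so that an error path closing every map never touches fd 0. The sketch reproduces that policy on hypothetical struct map_slot / struct map_table types; unlike libbpf it returns NULL instead of ERR_PTR(-ENOMEM), to stay free of libbpf headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced version of struct bpf_map. */
struct map_slot {
	int fd;
	int inner_map_fd;
	char *name;
};

struct map_table {
	struct map_slot *maps;
	size_t nr, cap;
};

/* Return a fresh, initialized slot or NULL on allocation failure. The pointer
 * is only valid until the next call, since realloc() may move the array. */
static struct map_slot *map_table_add(struct map_table *t)
{
	struct map_slot *new_maps;
	size_t new_cap, i;

	if (t->nr + 1 <= t->cap)
		return &t->maps[t->nr++];

	new_cap = t->cap * 3 / 2;
	if (new_cap < 4)
		new_cap = 4;
	new_maps = realloc(t->maps, new_cap * sizeof(*new_maps));
	if (!new_maps)
		return NULL;
	t->maps = new_maps;
	t->cap = new_cap;

	/* Zero the unused tail and mark its descriptors as "not open". */
	memset(t->maps + t->nr, 0, (t->cap - t->nr) * sizeof(*t->maps));
	for (i = t->nr; i < t->cap; i++) {
		t->maps[i].fd = -1;
		t->maps[i].inner_map_fd = -1;
	}
	assert(t->nr < t->cap);
	return &t->maps[t->nr++];
}
```

Starting from an empty table, adding ten maps takes the capacity through 4, 6, 9 and 13, so only four reallocations are needed.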

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/libbpf.c | 244 ++++++++++++++++++++++-------------------
 1 file changed, 134 insertions(+), 110 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 9e39a0a33aeb..c931ee7e1fd2 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -234,6 +234,7 @@ struct bpf_object {
 	size_t nr_programs;
 	struct bpf_map *maps;
 	size_t nr_maps;
+	size_t maps_cap;
 	struct bpf_secdata sections;
 
 	bool loaded;
@@ -763,12 +764,38 @@ int bpf_object__variable_offset(const struct bpf_object *obj, const char *name,
 	return -ENOENT;
 }
 
-static bool bpf_object__has_maps(const struct bpf_object *obj)
+static struct bpf_map *bpf_object__add_map(struct bpf_object *obj)
 {
-	return obj->efile.maps_shndx >= 0 ||
-	       obj->efile.data_shndx >= 0 ||
-	       obj->efile.rodata_shndx >= 0 ||
-	       obj->efile.bss_shndx >= 0;
+	struct bpf_map *new_maps;
+	size_t new_cap;
+	int i;
+
+	if (obj->nr_maps + 1 <= obj->maps_cap)
+		return &obj->maps[obj->nr_maps++];
+
+	new_cap = max(4ul, obj->maps_cap * 3 / 2);
+	new_maps = realloc(obj->maps, new_cap * sizeof(*obj->maps));
+	if (!new_maps) {
+		pr_warning("alloc maps for object failed\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	obj->maps_cap = new_cap;
+	obj->maps = new_maps;
+
+	/* zero out new maps */
+	memset(obj->maps + obj->nr_maps, 0,
+	       (obj->maps_cap - obj->nr_maps) * sizeof(*obj->maps));
+	/*
+	 * fill all fd with -1 so won't close incorrect fd (fd=0 is stdin)
+	 * when failure (zclose won't close negative fd)).
+	 */
+	for (i = obj->nr_maps; i < obj->maps_cap; i++) {
+		obj->maps[i].fd = -1;
+		obj->maps[i].inner_map_fd = -1;
+	}
+
+	return &obj->maps[obj->nr_maps++];
 }
 
 static int
@@ -808,29 +835,68 @@ bpf_object__init_internal_map(struct bpf_object *obj, struct bpf_map *map,
 	return 0;
 }
 
-static int bpf_object__init_maps(struct bpf_object *obj, int flags)
+static int bpf_object__init_global_data_maps(struct bpf_object *obj)
+{
+	struct bpf_map *map;
+	int err;
+
+	if (!obj->caps.global_data)
+		return 0;
+	/*
+	 * Populate obj->maps with libbpf internal maps.
+	 */
+	if (obj->efile.data_shndx >= 0) {
+		map = bpf_object__add_map(obj);
+		if (IS_ERR(map))
+			return PTR_ERR(map);
+		err = bpf_object__init_internal_map(obj, map, LIBBPF_MAP_DATA,
+						    obj->efile.data,
+						    &obj->sections.data);
+		if (err)
+			return err;
+	}
+	if (obj->efile.rodata_shndx >= 0) {
+		map = bpf_object__add_map(obj);
+		if (IS_ERR(map))
+			return PTR_ERR(map);
+		err = bpf_object__init_internal_map(obj, map, LIBBPF_MAP_RODATA,
+						    obj->efile.rodata,
+						    &obj->sections.rodata);
+		if (err)
+			return err;
+	}
+	if (obj->efile.bss_shndx >= 0) {
+		map = bpf_object__add_map(obj);
+		if (IS_ERR(map))
+			return PTR_ERR(map);
+		err = bpf_object__init_internal_map(obj, map, LIBBPF_MAP_BSS,
+						    obj->efile.bss, NULL);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+static int bpf_object__init_user_maps(struct bpf_object *obj, bool strict)
 {
-	int i, map_idx, map_def_sz = 0, nr_syms, nr_maps = 0, nr_maps_glob = 0;
-	bool strict = !(flags & MAPS_RELAX_COMPAT);
 	Elf_Data *symbols = obj->efile.symbols;
+	int i, map_def_sz = 0, nr_maps = 0, nr_syms;
 	Elf_Data *data = NULL;
-	int ret = 0;
+	Elf_Scn *scn;
+
+	if (obj->efile.maps_shndx < 0)
+		return 0;
 
 	if (!symbols)
 		return -EINVAL;
-	nr_syms = symbols->d_size / sizeof(GElf_Sym);
-
-	if (obj->efile.maps_shndx >= 0) {
-		Elf_Scn *scn = elf_getscn(obj->efile.elf,
-					  obj->efile.maps_shndx);
 
-		if (scn)
-			data = elf_getdata(scn, NULL);
-		if (!scn || !data) {
-			pr_warning("failed to get Elf_Data from map section %d\n",
-				   obj->efile.maps_shndx);
-			return -EINVAL;
-		}
+	scn = elf_getscn(obj->efile.elf, obj->efile.maps_shndx);
+	if (scn)
+		data = elf_getdata(scn, NULL);
+	if (!scn || !data) {
+		pr_warning("failed to get Elf_Data from map section %d\n",
+			   obj->efile.maps_shndx);
+		return -EINVAL;
 	}
 
 	/*
@@ -840,16 +906,8 @@ static int bpf_object__init_maps(struct bpf_object *obj, int flags)
 	 *
 	 * TODO: Detect array of map and report error.
 	 */
-	if (obj->caps.global_data) {
-		if (obj->efile.data_shndx >= 0)
-			nr_maps_glob++;
-		if (obj->efile.rodata_shndx >= 0)
-			nr_maps_glob++;
-		if (obj->efile.bss_shndx >= 0)
-			nr_maps_glob++;
-	}
-
-	for (i = 0; data && i < nr_syms; i++) {
+	nr_syms = symbols->d_size / sizeof(GElf_Sym);
+	for (i = 0; i < nr_syms; i++) {
 		GElf_Sym sym;
 
 		if (!gelf_getsym(symbols, i, &sym))
@@ -858,79 +916,56 @@ static int bpf_object__init_maps(struct bpf_object *obj, int flags)
 			continue;
 		nr_maps++;
 	}
-
-	if (!nr_maps && !nr_maps_glob)
-		return 0;
-
 	/* Assume equally sized map definitions */
-	if (data) {
-		pr_debug("maps in %s: %d maps in %zd bytes\n", obj->path,
-			 nr_maps, data->d_size);
-
-		map_def_sz = data->d_size / nr_maps;
-		if (!data->d_size || (data->d_size % nr_maps) != 0) {
-			pr_warning("unable to determine map definition size "
-				   "section %s, %d maps in %zd bytes\n",
-				   obj->path, nr_maps, data->d_size);
-			return -EINVAL;
-		}
-	}
-
-	nr_maps += nr_maps_glob;
-	obj->maps = calloc(nr_maps, sizeof(obj->maps[0]));
-	if (!obj->maps) {
-		pr_warning("alloc maps for object failed\n");
-		return -ENOMEM;
-	}
-	obj->nr_maps = nr_maps;
-
-	for (i = 0; i < nr_maps; i++) {
-		/*
-		 * fill all fd with -1 so won't close incorrect
-		 * fd (fd=0 is stdin) when failure (zclose won't close
-		 * negative fd)).
-		 */
-		obj->maps[i].fd = -1;
-		obj->maps[i].inner_map_fd = -1;
+	pr_debug("maps in %s: %d maps in %zd bytes\n",
+		 obj->path, nr_maps, data->d_size);
+
+	map_def_sz = data->d_size / nr_maps;
+	if (!data->d_size || (data->d_size % nr_maps) != 0) {
+		pr_warning("unable to determine map definition size "
+			   "section %s, %d maps in %zd bytes\n",
+			   obj->path, nr_maps, data->d_size);
+		return -EINVAL;
 	}
 
-	/*
-	 * Fill obj->maps using data in "maps" section.
-	 */
-	for (i = 0, map_idx = 0; data && i < nr_syms; i++) {
+	/* Fill obj->maps using data in "maps" section.  */
+	for (i = 0; i < nr_syms; i++) {
 		GElf_Sym sym;
 		const char *map_name;
 		struct bpf_map_def *def;
+		struct bpf_map *map;
 
 		if (!gelf_getsym(symbols, i, &sym))
 			continue;
 		if (sym.st_shndx != obj->efile.maps_shndx)
 			continue;
 
-		map_name = elf_strptr(obj->efile.elf,
-				      obj->efile.strtabidx,
+		map = bpf_object__add_map(obj);
+		if (IS_ERR(map))
+			return PTR_ERR(map);
+
+		map_name = elf_strptr(obj->efile.elf, obj->efile.strtabidx,
 				      sym.st_name);
 		if (!map_name) {
 			pr_warning("failed to get map #%d name sym string for obj %s\n",
-				   map_idx, obj->path);
+				   i, obj->path);
 			return -LIBBPF_ERRNO__FORMAT;
 		}
 
-		obj->maps[map_idx].libbpf_type = LIBBPF_MAP_UNSPEC;
-		obj->maps[map_idx].offset = sym.st_value;
+		map->libbpf_type = LIBBPF_MAP_UNSPEC;
+		map->offset = sym.st_value;
 		if (sym.st_value + map_def_sz > data->d_size) {
 			pr_warning("corrupted maps section in %s: last map \"%s\" too small\n",
 				   obj->path, map_name);
 			return -EINVAL;
 		}
 
-		obj->maps[map_idx].name = strdup(map_name);
-		if (!obj->maps[map_idx].name) {
+		map->name = strdup(map_name);
+		if (!map->name) {
 			pr_warning("failed to alloc map name\n");
 			return -ENOMEM;
 		}
-		pr_debug("map %d is \"%s\"\n", map_idx,
-			 obj->maps[map_idx].name);
+		pr_debug("map %d is \"%s\"\n", i, map->name);
 		def = (struct bpf_map_def *)(data->d_buf + sym.st_value);
 		/*
 		 * If the definition of the map in the object file fits in
@@ -939,7 +974,7 @@ static int bpf_object__init_maps(struct bpf_object *obj, int flags)
 		 * calloc above.
 		 */
 		if (map_def_sz <= sizeof(struct bpf_map_def)) {
-			memcpy(&obj->maps[map_idx].def, def, map_def_sz);
+			memcpy(&map->def, def, map_def_sz);
 		} else {
 			/*
 			 * Here the map structure being read is bigger than what
@@ -959,37 +994,30 @@ static int bpf_object__init_maps(struct bpf_object *obj, int flags)
 						return -EINVAL;
 				}
 			}
-			memcpy(&obj->maps[map_idx].def, def,
-			       sizeof(struct bpf_map_def));
+			memcpy(&map->def, def, sizeof(struct bpf_map_def));
 		}
-		map_idx++;
 	}
+	return 0;
+}
 
-	if (!obj->caps.global_data)
-		goto finalize;
+static int bpf_object__init_maps(struct bpf_object *obj, int flags)
+{
+	bool strict = !(flags & MAPS_RELAX_COMPAT);
+	int err;
 
-	/*
-	 * Populate rest of obj->maps with libbpf internal maps.
-	 */
-	if (obj->efile.data_shndx >= 0)
-		ret = bpf_object__init_internal_map(obj, &obj->maps[map_idx++],
-						    LIBBPF_MAP_DATA,
-						    obj->efile.data,
-						    &obj->sections.data);
-	if (!ret && obj->efile.rodata_shndx >= 0)
-		ret = bpf_object__init_internal_map(obj, &obj->maps[map_idx++],
-						    LIBBPF_MAP_RODATA,
-						    obj->efile.rodata,
-						    &obj->sections.rodata);
-	if (!ret && obj->efile.bss_shndx >= 0)
-		ret = bpf_object__init_internal_map(obj, &obj->maps[map_idx++],
-						    LIBBPF_MAP_BSS,
-						    obj->efile.bss, NULL);
-finalize:
-	if (!ret)
+	err = bpf_object__init_user_maps(obj, strict);
+	if (err)
+		return err;
+
+	err = bpf_object__init_global_data_maps(obj);
+	if (err)
+		return err;
+
+	if (obj->nr_maps) {
 		qsort(obj->maps, obj->nr_maps, sizeof(obj->maps[0]),
 		      compare_bpf_map);
-	return ret;
+	}
+	return 0;
 }
 
 static bool section_have_execinstr(struct bpf_object *obj, int idx)
@@ -1262,14 +1290,10 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 		return -LIBBPF_ERRNO__FORMAT;
 	}
 	err = bpf_object__load_btf(obj, btf_data, btf_ext_data);
-	if (err)
-		return err;
-	if (bpf_object__has_maps(obj)) {
+	if (!err)
 		err = bpf_object__init_maps(obj, flags);
-		if (err)
-			return err;
-	}
-	err = bpf_object__init_prog_names(obj);
+	if (!err)
+		err = bpf_object__init_prog_names(obj);
 	return err;
 }
 
-- 
2.17.1



* [RFC PATCH bpf-next 4/8] libbpf: identify maps by section index in addition to offset
  2019-05-31 20:21 [RFC PATCH bpf-next 0/8] BTF-defined BPF map definitions Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2019-05-31 20:21 ` [RFC PATCH bpf-next 3/8] libbpf: refactor map initialization Andrii Nakryiko
@ 2019-05-31 20:21 ` Andrii Nakryiko
  2019-05-31 20:21 ` [RFC PATCH bpf-next 5/8] libbpf: split initialization and loading of BTF Andrii Nakryiko
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2019-05-31 20:21 UTC (permalink / raw)
  To: andrii.nakryiko, netdev, bpf, ast, daniel, kernel-team; +Cc: Andrii Nakryiko

To support maps defined in multiple sections, a map must be identified
not just by its offset within its section, but by its section index as
well. This patch adds tracking of the section index.

For global data, we record the section index of the corresponding
.data/.bss/.rodata ELF section for uniformity, and thus no longer need
a special offset value for those maps.
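With two keys, maps sort first by section index and then by offset within the section, and a relocation must match both. The sketch shows that ordering plus a bsearch() lookup on a hypothetical struct map_key. It compares with < and > instead of subtracting as compare_bpf_map() does in the hunk below, because subtracting two size_t offsets and narrowing the result to int is only reliable while the difference stays small. The patch itself matches relocations with a linear scan; bsearch() is used here only to show that the sorted order also supports a direct lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: only the fields that identify a map definition. */
struct map_key {
	int sec_idx;       /* ELF section holding the definition */
	size_t sec_offset; /* offset of the definition within that section */
};

/* Order by section index first, then by offset inside the section. */
static int cmp_map_key(const void *_a, const void *_b)
{
	const struct map_key *a = _a, *b = _b;

	if (a->sec_idx != b->sec_idx)
		return a->sec_idx < b->sec_idx ? -1 : 1;
	if (a->sec_offset != b->sec_offset)
		return a->sec_offset < b->sec_offset ? -1 : 1;
	return 0;
}

/* Find the map a relocation refers to, given its symbol's st_shndx and
 * st_value. maps must already be sorted with cmp_map_key. */
static const struct map_key *find_map(const struct map_key *maps, size_t nr,
				      int sec_idx, size_t offset)
{
	struct map_key key = { sec_idx, offset };

	assert(maps != NULL);
	return bsearch(&key, maps, nr, sizeof(*maps), cmp_map_key);
}
```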

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/libbpf.c | 42 ++++++++++++++++++++++++++----------------
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index c931ee7e1fd2..5e7ea7dac958 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -207,7 +207,8 @@ static const char * const libbpf_type_to_btf_name[] = {
 struct bpf_map {
 	int fd;
 	char *name;
-	size_t offset;
+	int sec_idx;
+	size_t sec_offset;
 	int map_ifindex;
 	int inner_map_fd;
 	struct bpf_map_def def;
@@ -647,7 +648,9 @@ static int compare_bpf_map(const void *_a, const void *_b)
 	const struct bpf_map *a = _a;
 	const struct bpf_map *b = _b;
 
-	return a->offset - b->offset;
+	if (a->sec_idx != b->sec_idx)
+		return a->sec_idx - b->sec_idx;
+	return a->sec_offset - b->sec_offset;
 }
 
 static bool bpf_map_type__is_map_in_map(enum bpf_map_type type)
@@ -800,14 +803,15 @@ static struct bpf_map *bpf_object__add_map(struct bpf_object *obj)
 
 static int
 bpf_object__init_internal_map(struct bpf_object *obj, struct bpf_map *map,
-			      enum libbpf_map_type type, Elf_Data *data,
-			      void **data_buff)
+			      enum libbpf_map_type type, int sec_idx,
+			      Elf_Data *data, void **data_buff)
 {
 	struct bpf_map_def *def = &map->def;
 	char map_name[BPF_OBJ_NAME_LEN];
 
 	map->libbpf_type = type;
-	map->offset = ~(typeof(map->offset))0;
+	map->sec_idx = sec_idx;
+	map->sec_offset = 0;
 	snprintf(map_name, sizeof(map_name), "%.8s%.7s", obj->name,
 		 libbpf_type_to_btf_name[type]);
 	map->name = strdup(map_name);
@@ -815,6 +819,8 @@ bpf_object__init_internal_map(struct bpf_object *obj, struct bpf_map *map,
 		pr_warning("failed to alloc map name\n");
 		return -ENOMEM;
 	}
+	pr_debug("map '%s' (global data): at sec_idx %d, offset %zu.\n",
+		 map_name, map->sec_idx, map->sec_offset);
 
 	def->type = BPF_MAP_TYPE_ARRAY;
 	def->key_size = sizeof(int);
@@ -850,6 +856,7 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj)
 		if (IS_ERR(map))
 			return PTR_ERR(map);
 		err = bpf_object__init_internal_map(obj, map, LIBBPF_MAP_DATA,
+						    obj->efile.data_shndx,
 						    obj->efile.data,
 						    &obj->sections.data);
 		if (err)
@@ -860,6 +867,7 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj)
 		if (IS_ERR(map))
 			return PTR_ERR(map);
 		err = bpf_object__init_internal_map(obj, map, LIBBPF_MAP_RODATA,
+						    obj->efile.rodata_shndx,
 						    obj->efile.rodata,
 						    &obj->sections.rodata);
 		if (err)
@@ -870,6 +878,7 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj)
 		if (IS_ERR(map))
 			return PTR_ERR(map);
 		err = bpf_object__init_internal_map(obj, map, LIBBPF_MAP_BSS,
+						    obj->efile.bss_shndx,
 						    obj->efile.bss, NULL);
 		if (err)
 			return err;
@@ -953,7 +962,10 @@ static int bpf_object__init_user_maps(struct bpf_object *obj, bool strict)
 		}
 
 		map->libbpf_type = LIBBPF_MAP_UNSPEC;
-		map->offset = sym.st_value;
+		map->sec_idx = sym.st_shndx;
+		map->sec_offset = sym.st_value;
+		pr_debug("map '%s' (legacy): at sec_idx %d, offset %zu.\n",
+			 map_name, map->sec_idx, map->sec_offset);
 		if (sym.st_value + map_def_sz > data->d_size) {
 			pr_warning("corrupted maps section in %s: last map \"%s\" too small\n",
 				   obj->path, map_name);
@@ -1453,9 +1465,13 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 				if (maps[map_idx].libbpf_type != type)
 					continue;
 				if (type != LIBBPF_MAP_UNSPEC ||
-				    maps[map_idx].offset == sym.st_value) {
-					pr_debug("relocation: find map %zd (%s) for insn %u\n",
-						 map_idx, maps[map_idx].name, insn_idx);
+				    (maps[map_idx].sec_idx == sym.st_shndx &&
+				     maps[map_idx].sec_offset == sym.st_value)) {
+					pr_debug("relocation: found map %zd (%s, sec_idx %d, offset %zu) for insn %u\n",
+						 map_idx, maps[map_idx].name,
+						 maps[map_idx].sec_idx,
+						 maps[map_idx].sec_offset,
+						 insn_idx);
 					break;
 				}
 			}
@@ -3472,13 +3488,7 @@ bpf_object__find_map_fd_by_name(struct bpf_object *obj, const char *name)
 struct bpf_map *
 bpf_object__find_map_by_offset(struct bpf_object *obj, size_t offset)
 {
-	int i;
-
-	for (i = 0; i < obj->nr_maps; i++) {
-		if (obj->maps[i].offset == offset)
-			return &obj->maps[i];
-	}
-	return ERR_PTR(-ENOENT);
+	return ERR_PTR(-ENOTSUP);
 }
 
 long libbpf_get_error(const void *ptr)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH bpf-next 5/8] libbpf: split initialization and loading of BTF
  2019-05-31 20:21 [RFC PATCH bpf-next 0/8] BTF-defined BPF map definitions Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2019-05-31 20:21 ` [RFC PATCH bpf-next 4/8] libbpf: identify maps by section index in addition to offset Andrii Nakryiko
@ 2019-05-31 20:21 ` Andrii Nakryiko
  2019-05-31 20:21 ` [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF Andrii Nakryiko
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2019-05-31 20:21 UTC (permalink / raw)
  To: andrii.nakryiko, netdev, bpf, ast, daniel, kernel-team; +Cc: Andrii Nakryiko

If the kernel doesn't support some of the newer BTF features, libbpf
sanitizes BTF before loading it into the kernel. Sanitization strips some
important information from BTF (e.g., DATASEC and VAR descriptions) that
is needed for map construction. This patch splits BTF processing into two
steps: an initialization step, in which BTF is parsed from ELF and all the
original data is preserved, and a sanitization/loading step, which ensures
that BTF is safe to load into the kernel. This makes it possible to use the
full BTF information to construct maps while still loading valid BTF into
older kernels.
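
To illustrate why the ordering matters, here is a toy C model. It is not
libbpf code: every type and function name below is hypothetical. Map
construction is modeled as counting VAR entries, and the stand-in
sanitizer rewrites VAR/DATASEC kinds into INT/STRUCT, loosely modeled on
what libbpf does for kernels without DATASEC support.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the init-then-sanitize ordering. All names are hypothetical;
 * real libbpf works on struct btf, not on an array of kinds. */
enum toy_kind { TOY_INT, TOY_STRUCT, TOY_VAR, TOY_DATASEC };

#define TOY_MAX_TYPES 16

struct toy_btf {
	enum toy_kind kinds[TOY_MAX_TYPES];
	int nr;
};

/* Stand-in for map construction: it needs the original VAR entries. */
static int toy_count_vars(const struct toy_btf *btf)
{
	int i, n = 0;

	assert(btf->nr <= TOY_MAX_TYPES);
	for (i = 0; i < btf->nr; i++)
		if (btf->kinds[i] == TOY_VAR)
			n++;
	return n;
}

/* Stand-in sanitizer: for a kernel without DATASEC/VAR support, rewrite
 * those kinds in place; the original information is lost afterwards. */
static void toy_sanitize(struct toy_btf *btf, bool kernel_has_datasec)
{
	int i;

	if (kernel_has_datasec)
		return;
	for (i = 0; i < btf->nr; i++) {
		if (btf->kinds[i] == TOY_VAR)
			btf->kinds[i] = TOY_INT;
		else if (btf->kinds[i] == TOY_DATASEC)
			btf->kinds[i] = TOY_STRUCT;
	}
}

/* The order this patch establishes: build maps from full BTF first,
 * then sanitize (and, in libbpf, load into the kernel). */
static int toy_collect(struct toy_btf *btf, bool kernel_has_datasec)
{
	int nr_maps = toy_count_vars(btf);

	toy_sanitize(btf, kernel_has_datasec);
	return nr_maps;
}
```

With kinds { TOY_DATASEC, TOY_VAR, TOY_STRUCT, TOY_VAR } and an old
kernel, toy_collect() returns 2, while calling toy_count_vars() afterwards
returns 0, which is what map construction would have seen under the old
sanitize-first order.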

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/libbpf.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 5e7ea7dac958..79a8143240d7 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1118,7 +1118,7 @@ static void bpf_object__sanitize_btf_ext(struct bpf_object *obj)
 	}
 }
 
-static int bpf_object__load_btf(struct bpf_object *obj,
+static int bpf_object__init_btf(struct bpf_object *obj,
 				Elf_Data *btf_data,
 				Elf_Data *btf_ext_data)
 {
@@ -1137,13 +1137,6 @@ static int bpf_object__load_btf(struct bpf_object *obj,
 				   BTF_ELF_SEC, err);
 			goto out;
 		}
-		bpf_object__sanitize_btf(obj);
-		err = btf__load(obj->btf);
-		if (err) {
-			pr_warning("Error loading %s into kernel: %d.\n",
-				   BTF_ELF_SEC, err);
-			goto out;
-		}
 	}
 	if (btf_ext_data) {
 		if (!obj->btf) {
@@ -1159,7 +1152,6 @@ static int bpf_object__load_btf(struct bpf_object *obj,
 			obj->btf_ext = NULL;
 			goto out;
 		}
-		bpf_object__sanitize_btf_ext(obj);
 	}
 out:
 	if (err || IS_ERR(obj->btf)) {
@@ -1170,6 +1162,26 @@ static int bpf_object__load_btf(struct bpf_object *obj,
 	return 0;
 }
 
+static int bpf_object__sanitize_and_load_btf(struct bpf_object *obj)
+{
+	int err = 0;
+
+	if (!obj->btf)
+		return 0;
+
+	bpf_object__sanitize_btf(obj);
+	bpf_object__sanitize_btf_ext(obj);
+
+	err = btf__load(obj->btf);
+	if (err) {
+		pr_warning("Error loading %s into kernel: %d.\n",
+			   BTF_ELF_SEC, err);
+		btf__free(obj->btf);
+		obj->btf = NULL;
+	}
+	return 0;
+}
+
 static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 {
 	Elf *elf = obj->efile.elf;
@@ -1301,9 +1313,11 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 		pr_warning("Corrupted ELF file: index of strtab invalid\n");
 		return -LIBBPF_ERRNO__FORMAT;
 	}
-	err = bpf_object__load_btf(obj, btf_data, btf_ext_data);
+	err = bpf_object__init_btf(obj, btf_data, btf_ext_data);
 	if (!err)
 		err = bpf_object__init_maps(obj, flags);
+	if (!err)
+		err = bpf_object__sanitize_and_load_btf(obj);
 	if (!err)
 		err = bpf_object__init_prog_names(obj);
 	return err;
-- 
2.17.1



* [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-05-31 20:21 [RFC PATCH bpf-next 0/8] BTF-defined BPF map definitions Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2019-05-31 20:21 ` [RFC PATCH bpf-next 5/8] libbpf: split initialization and loading of BTF Andrii Nakryiko
@ 2019-05-31 20:21 ` Andrii Nakryiko
  2019-05-31 21:28   ` Stanislav Fomichev
                     ` (2 more replies)
  2019-05-31 20:21 ` [RFC PATCH bpf-next 7/8] selftests/bpf: add test for BTF-defined maps Andrii Nakryiko
  2019-05-31 20:21 ` [RFC PATCH bpf-next 8/8] selftests/bpf: switch tests to BTF-defined map definitions Andrii Nakryiko
  7 siblings, 3 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2019-05-31 20:21 UTC (permalink / raw)
  To: andrii.nakryiko, netdev, bpf, ast, daniel, kernel-team; +Cc: Andrii Nakryiko

This patch adds support for a new way to define BPF maps. It relies on
BTF to describe the mandatory and optional attributes of a map, and it
naturally captures the type information of the key and value. This
eliminates the need for the BPF_ANNOTATE_KV_PAIR hack and ensures that
key/value sizes are always in sync with the key/value types.

By relying on BTF, this approach provides both forward and backward
compatibility as supported map definition features are extended: older
libbpf versions ignore fields they don't recognize, while newer versions
parse and recognize new optional attributes.
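
A minimal sketch of the "ignore what you don't recognize" rule. It
assumes a hypothetical, pre-flattened list of (field name, 32-bit value)
pairs instead of walking real BTF members; "pinning" stands in for a
future attribute that an older parser doesn't know about.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flattened view of one definition member: name + value. */
struct toy_field {
	const char *name;
	unsigned int value;
};

struct toy_map_def {
	unsigned int type;
	unsigned int max_entries;
	unsigned int map_flags;
};

/* Apply the fields this "version" knows about and skip the rest.
 * Returns the number of ignored fields. */
static int toy_parse_def(const struct toy_field *f, int n,
			 struct toy_map_def *def)
{
	int i, ignored = 0;

	assert(f != NULL || n == 0);
	memset(def, 0, sizeof(*def));	/* missing optional fields stay 0 */
	for (i = 0; i < n; i++) {
		if (strcmp(f[i].name, "type") == 0)
			def->type = f[i].value;
		else if (strcmp(f[i].name, "max_entries") == 0)
			def->max_entries = f[i].value;
		else if (strcmp(f[i].name, "map_flags") == 0)
			def->map_flags = f[i].value;
		else
			ignored++;	/* unknown (newer) attribute: not an error */
	}
	return ignored;
}
```

A newer parser would simply add another strcmp() branch; definitions
written for it still load with the older parser, minus the new attribute.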

The outline of the new map definition (BTF-defined maps, for short) is as follows:
1. All maps should be defined in the .maps ELF section. It's possible to
   have both "legacy" map definitions in `maps` sections and BTF-defined
   maps in .maps sections; everything will still work transparently.
2. A map is declared and initialized through a global or static variable
   of a struct type with a few mandatory and some extra optional fields:
   - the type field is mandatory and specifies the type of the BPF map;
   - the key/value fields are mandatory (unless key_size/value_size are
     used instead, see point 4) and capture key/value type/size information;
   - the max_entries attribute is optional; if max_entries is not specified
     or initialized, it has to be provided at runtime through the libbpf
     API before loading the bpf_object;
   - map_flags is optional and, if not defined, is assumed to be 0.
3. The key/value fields should be **a pointer** to a type describing the
   key/value. The pointee type is assumed to describe the key/value of the
   map, and is recorded as such and used to determine the size. This avoids
   allocating excessive space in the corresponding ELF section for large
   keys/values.
4. As some maps disallow having a BTF type ID associated with key/value,
   it's possible to specify the key/value size explicitly without
   associating a BTF type ID with it. Use the key_size and value_size
   fields to do that (see example below).
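
A rough illustration of point 3, assuming a hypothetical 4 KiB value
type: with pointer-typed key/value fields, the definition variable only
holds pointer-sized slots, while the pointee type still determines the
size that gets recorded. The embedded variant is a hypothetical
alternative shown only for comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical large value type, for illustration only. */
struct big_value {
	char payload[4096];
};

/* BTF-defined style: key/value are pointers, so the variable in .maps
 * only holds pointer-sized slots, while BTF records the pointee types. */
struct ptr_style_def {
	unsigned int type;
	unsigned int max_entries;
	int *key;
	struct big_value *value;
};

/* Hypothetical alternative that embeds key/value by value. */
struct embedded_style_def {
	unsigned int type;
	unsigned int max_entries;
	int key;
	struct big_value value;
};

static_assert(sizeof(struct ptr_style_def) < sizeof(struct embedded_style_def),
	      "pointer-style definitions should be much smaller");

/* The value size is taken from the pointee type, not from the pointer. */
static size_t ptr_style_value_size(void)
{
	struct ptr_style_def d;

	return sizeof(*d.value);
}
```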

Here's an example of a simple ARRAY map definition:

struct my_value { int x, y, z; };

struct {
	int type;
	int max_entries;
	int *key;
	struct my_value *value;
} btf_map SEC(".maps") = {
	.type = BPF_MAP_TYPE_ARRAY,
	.max_entries = 16,
};

This defines a BPF ARRAY map 'btf_map' with 16 elements. The key is of
type int, so the key size is 4 bytes; the value is struct my_value, of
size 12 bytes. This map can be used from C code exactly the same way as
existing maps defined through struct bpf_map_def.
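
The size derivation described above can be checked in plain userspace C.
This sketch drops SEC(".maps") so it builds without bpf_helpers.h, and
hard-codes 2 as a stand-in for BPF_MAP_TYPE_ARRAY (its value in
linux/bpf.h); the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct my_value { int x, y, z; };

/* Same shape as the btf_map example, without SEC(".maps") so it builds
 * as ordinary userspace C. */
static struct {
	int type;
	int max_entries;
	int *key;
	struct my_value *value;
} btf_map = {
	.type = 2,		/* BPF_MAP_TYPE_ARRAY in linux/bpf.h */
	.max_entries = 16,
};

/* Three ints, no padding expected, so 12 bytes with a 4-byte int. */
static_assert(sizeof(struct my_value) == 3 * sizeof(int),
	      "unexpected padding in struct my_value");

/* Key/value sizes come from the pointee types. */
static size_t btf_map_key_size(void)
{
	return sizeof(*btf_map.key);
}

static size_t btf_map_value_size(void)
{
	return sizeof(*btf_map.value);
}
```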

Here's an example of a STACKMAP definition (STACKMAP currently disallows
BTF type IDs for key/value):

struct {
	__u32 type;
	__u32 max_entries;
	__u32 map_flags;
	__u32 key_size;
	__u32 value_size;
} stackmap SEC(".maps") = {
	.type = BPF_MAP_TYPE_STACK_TRACE,
	.max_entries = 128,
	.map_flags = BPF_F_STACK_BUILD_ID,
	.key_size = sizeof(__u32),
	.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};

This approach extends naturally to map-in-map support, by making the value
field another struct that describes the inner map. This feature is not
implemented yet. Features like pinning can also be added incrementally,
with full backward and forward compatibility.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/btf.h    |   1 +
 tools/lib/bpf/libbpf.c | 333 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 325 insertions(+), 9 deletions(-)

diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index ba4ffa831aa4..88a52ae56fc6 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -17,6 +17,7 @@ extern "C" {
 
 #define BTF_ELF_SEC ".BTF"
 #define BTF_EXT_ELF_SEC ".BTF.ext"
+#define MAPS_ELF_SEC ".maps"
 
 struct btf;
 struct btf_ext;
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 79a8143240d7..5a8f1e82809b 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -262,6 +262,7 @@ struct bpf_object {
 		} *reloc;
 		int nr_reloc;
 		int maps_shndx;
+		int btf_maps_shndx;
 		int text_shndx;
 		int data_shndx;
 		int rodata_shndx;
@@ -514,6 +515,7 @@ static struct bpf_object *bpf_object__new(const char *path,
 	obj->efile.obj_buf = obj_buf;
 	obj->efile.obj_buf_sz = obj_buf_sz;
 	obj->efile.maps_shndx = -1;
+	obj->efile.btf_maps_shndx = -1;
 	obj->efile.data_shndx = -1;
 	obj->efile.rodata_shndx = -1;
 	obj->efile.bss_shndx = -1;
@@ -1012,6 +1014,292 @@ static int bpf_object__init_user_maps(struct bpf_object *obj, bool strict)
 	return 0;
 }
 
+static const struct btf_type *skip_mods_and_typedefs(const struct btf *btf,
+						     __u32 id)
+{
+	const struct btf_type *t = btf__type_by_id(btf, id);
+
+	while (true) {
+		switch (BTF_INFO_KIND(t->info)) {
+		case BTF_KIND_VOLATILE:
+		case BTF_KIND_CONST:
+		case BTF_KIND_RESTRICT:
+		case BTF_KIND_TYPEDEF:
+			t = btf__type_by_id(btf, t->type);
+			break;
+		default:
+			return t;
+		}
+	}
+}
+
+static bool get_map_attr_int(const char *map_name,
+			     const struct btf *btf,
+			     const struct btf_type *def,
+			     const struct btf_member *m,
+			     const void *data, __u32 *res) {
+	const struct btf_type *t = skip_mods_and_typedefs(btf, m->type);
+	const char *name = btf__name_by_offset(btf, m->name_off);
+	__u32 int_info = *(const __u32 *)(const void *)(t + 1);
+
+	if (BTF_INFO_KIND(t->info) != BTF_KIND_INT) {
+		pr_warning("map '%s': attr '%s': expected INT, got %u.\n",
+			   map_name, name, BTF_INFO_KIND(t->info));
+		return false;
+	}
+	if (t->size != 4 || BTF_INT_BITS(int_info) != 32 ||
+	    BTF_INT_OFFSET(int_info)) {
+		pr_warning("map '%s': attr '%s': expected 32-bit non-bitfield integer, "
+			   "got %u-byte (%d-bit) one with bit offset %d.\n",
+			   map_name, name, t->size, BTF_INT_BITS(int_info),
+			   BTF_INT_OFFSET(int_info));
+		return false;
+	}
+	if (BTF_INFO_KFLAG(def->info) && BTF_MEMBER_BITFIELD_SIZE(m->offset)) {
+		pr_warning("map '%s': attr '%s': bitfield is not supported.\n",
+			   map_name, name);
+		return false;
+	}
+	if (m->offset % 32) {
+		pr_warning("map '%s': attr '%s': unaligned fields are not supported.\n",
+			   map_name, name);
+		return false;
+	}
+
+	*res = *(const __u32 *)(data + m->offset / 8);
+	return true;
+}
+
+static int bpf_object__init_user_btf_map(struct bpf_object *obj,
+					 const struct btf_type *sec,
+					 int var_idx, int sec_idx,
+					 const Elf_Data *data)
+{
+	const struct btf_type *var, *def, *t;
+	const struct btf_var_secinfo *vi;
+	const struct btf_var *var_extra;
+	const struct btf_member *m;
+	const void *def_data;
+	const char *map_name;
+	struct bpf_map *map;
+	int vlen, i;
+
+	vi = (const struct btf_var_secinfo *)(const void *)(sec + 1) + var_idx;
+	var = btf__type_by_id(obj->btf, vi->type);
+	var_extra = (const void *)(var + 1);
+	map_name = btf__name_by_offset(obj->btf, var->name_off);
+	vlen = BTF_INFO_VLEN(var->info);
+
+	if (map_name == NULL || map_name[0] == '\0') {
+		pr_warning("map #%d: empty name.\n", var_idx);
+		return -EINVAL;
+	}
+	if ((__u64)vi->offset + vi->size > data->d_size) {
+		pr_warning("map '%s' BTF data is corrupted.\n", map_name);
+		return -EINVAL;
+	}
+	if (BTF_INFO_KIND(var->info) != BTF_KIND_VAR) {
+		pr_warning("map '%s': unexpected var kind %u.\n",
+			   map_name, BTF_INFO_KIND(var->info));
+		return -EINVAL;
+	}
+	if (var_extra->linkage != BTF_VAR_GLOBAL_ALLOCATED &&
+	    var_extra->linkage != BTF_VAR_STATIC) {
+		pr_warning("map '%s': unsupported var linkage %u.\n",
+			   map_name, var_extra->linkage);
+		return -EOPNOTSUPP;
+	}
+
+	def = skip_mods_and_typedefs(obj->btf, var->type);
+	if (BTF_INFO_KIND(def->info) != BTF_KIND_STRUCT) {
+		pr_warning("map '%s': unexpected def kind %u.\n",
+			   map_name, BTF_INFO_KIND(def->info));
+		return -EINVAL;
+	}
+	if (def->size > vi->size) {
+		pr_warning("map '%s': invalid def size.\n", map_name);
+		return -EINVAL;
+	}
+
+	map = bpf_object__add_map(obj);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+	map->name = strdup(map_name);
+	if (!map->name) {
+		pr_warning("map '%s': failed to alloc map name.\n", map_name);
+		return -ENOMEM;
+	}
+	map->libbpf_type = LIBBPF_MAP_UNSPEC;
+	map->def.type = BPF_MAP_TYPE_UNSPEC;
+	map->sec_idx = sec_idx;
+	map->sec_offset = vi->offset;
+	pr_debug("map '%s': at sec_idx %d, offset %zu.\n",
+		 map_name, map->sec_idx, map->sec_offset);
+
+	def_data = data->d_buf + vi->offset;
+	vlen = BTF_INFO_VLEN(def->info);
+	m = (const void *)(def + 1);
+	for (i = 0; i < vlen; i++, m++) {
+		const char *name = btf__name_by_offset(obj->btf, m->name_off);
+
+		if (strcmp(name, "type") == 0) {
+			if (!get_map_attr_int(map_name, obj->btf, def, m,
+					      def_data, &map->def.type))
+				return -EINVAL;
+			pr_debug("map '%s': found type = %u.\n",
+				 map_name, map->def.type);
+		} else if (strcmp(name, "max_entries") == 0) {
+			if (!get_map_attr_int(map_name, obj->btf, def, m,
+					      def_data, &map->def.max_entries))
+				return -EINVAL;
+			pr_debug("map '%s': found max_entries = %u.\n",
+				 map_name, map->def.max_entries);
+		} else if (strcmp(name, "map_flags") == 0) {
+			if (!get_map_attr_int(map_name, obj->btf, def, m,
+					      def_data, &map->def.map_flags))
+				return -EINVAL;
+			pr_debug("map '%s': found map_flags = %u.\n",
+				 map_name, map->def.map_flags);
+		} else if (strcmp(name, "key_size") == 0) {
+			__u32 sz;
+
+			if (!get_map_attr_int(map_name, obj->btf, def, m,
+					      def_data, &sz))
+				return -EINVAL;
+			pr_debug("map '%s': found key_size = %u.\n",
+				 map_name, sz);
+			if (map->def.key_size && map->def.key_size != sz) {
+				pr_warning("map '%s': conflicting key size %u != %u.\n",
+					   map_name, map->def.key_size, sz);
+				return -EINVAL;
+			}
+			map->def.key_size = sz;
+		} else if (strcmp(name, "key") == 0) {
+			__s64 sz;
+
+			t = btf__type_by_id(obj->btf, m->type);
+			if (BTF_INFO_KIND(t->info) != BTF_KIND_PTR) {
+				pr_warning("map '%s': key spec is not PTR: %u.\n",
+					   map_name, BTF_INFO_KIND(t->info));
+				return -EINVAL;
+			}
+			sz = btf__resolve_size(obj->btf, t->type);
+			if (sz < 0) {
+				pr_warning("map '%s': can't determine key size for type [%u]: %lld.\n",
+					   map_name, t->type, sz);
+				return sz;
+			}
+			pr_debug("map '%s': found key [%u], sz = %lld.\n",
+				 map_name, t->type, sz);
+			if (map->def.key_size && map->def.key_size != sz) {
+				pr_warning("map '%s': conflicting key size %u != %lld.\n",
+					   map_name, map->def.key_size, sz);
+				return -EINVAL;
+			}
+			map->def.key_size = sz;
+			map->btf_key_type_id = t->type;
+		} else if (strcmp(name, "value_size") == 0) {
+			__u32 sz;
+
+			if (!get_map_attr_int(map_name, obj->btf, def, m,
+					      def_data, &sz))
+				return -EINVAL;
+			pr_debug("map '%s': found value_size = %u.\n",
+				 map_name, sz);
+			if (map->def.value_size && map->def.value_size != sz) {
+				pr_warning("map '%s': conflicting value size %u != %u.\n",
+					   map_name, map->def.value_size, sz);
+				return -EINVAL;
+			}
+			map->def.value_size = sz;
+		} else if (strcmp(name, "value") == 0) {
+			__s64 sz;
+
+			t = btf__type_by_id(obj->btf, m->type);
+			if (BTF_INFO_KIND(t->info) != BTF_KIND_PTR) {
+				pr_warning("map '%s': value spec is not PTR: %u.\n",
+					   map_name, BTF_INFO_KIND(t->info));
+				return -EINVAL;
+			}
+			sz = btf__resolve_size(obj->btf, t->type);
+			if (sz < 0) {
+				pr_warning("map '%s': can't determine value size for type [%u]: %lld.\n",
+					   map_name, t->type, sz);
+				return sz;
+			}
+			pr_debug("map '%s': found value [%u], sz = %lld.\n",
+				 map_name, t->type, sz);
+			if (map->def.value_size && map->def.value_size != sz) {
+				pr_warning("map '%s': conflicting value size %u != %lld.\n",
+					   map_name, map->def.value_size, sz);
+				return -EINVAL;
+			}
+			map->def.value_size = sz;
+			map->btf_value_type_id = t->type;
+		} else {
+			pr_debug("map '%s': ignoring unknown def field '%s'.\n",
+				 map_name, name);
+		}
+	}
+
+	if (map->def.type == BPF_MAP_TYPE_UNSPEC) {
+		pr_warning("map '%s': map type isn't specified.\n", map_name);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int bpf_object__init_user_btf_maps(struct bpf_object *obj)
+{
+	const struct btf_type *sec = NULL;
+	int nr_types, i, vlen, err;
+	const struct btf_type *t;
+	const char *name;
+	Elf_Data *data;
+	Elf_Scn *scn;
+
+	if (obj->efile.btf_maps_shndx < 0)
+		return 0;
+
+	scn = elf_getscn(obj->efile.elf, obj->efile.btf_maps_shndx);
+	if (scn)
+		data = elf_getdata(scn, NULL);
+	if (!scn || !data) {
+		pr_warning("failed to get Elf_Data from map section %d (%s)\n",
+			   obj->efile.btf_maps_shndx, MAPS_ELF_SEC);
+		return -EINVAL;
+	}
+
+	nr_types = btf__get_nr_types(obj->btf);
+	for (i = 1; i <= nr_types; i++) {
+		t = btf__type_by_id(obj->btf, i);
+		if (BTF_INFO_KIND(t->info) != BTF_KIND_DATASEC)
+			continue;
+		name = btf__name_by_offset(obj->btf, t->name_off);
+		if (strcmp(name, MAPS_ELF_SEC) == 0) {
+			sec = t;
+			break;
+		}
+	}
+
+	if (!sec) {
+		pr_warning("DATASEC '%s' not found.\n", MAPS_ELF_SEC);
+		return -ENOENT;
+	}
+
+	vlen = BTF_INFO_VLEN(sec->info);
+	for (i = 0; i < vlen; i++) {
+		err = bpf_object__init_user_btf_map(obj, sec, i,
+						    obj->efile.btf_maps_shndx,
+						    data);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int bpf_object__init_maps(struct bpf_object *obj, int flags)
 {
 	bool strict = !(flags & MAPS_RELAX_COMPAT);
@@ -1021,6 +1309,10 @@ static int bpf_object__init_maps(struct bpf_object *obj, int flags)
 	if (err)
 		return err;
 
+	err = bpf_object__init_user_btf_maps(obj);
+	if (err)
+		return err;
+
 	err = bpf_object__init_global_data_maps(obj);
 	if (err)
 		return err;
@@ -1118,10 +1410,16 @@ static void bpf_object__sanitize_btf_ext(struct bpf_object *obj)
 	}
 }
 
+static bool bpf_object__is_btf_mandatory(const struct bpf_object *obj)
+{
+	return obj->efile.btf_maps_shndx >= 0;
+}
+
 static int bpf_object__init_btf(struct bpf_object *obj,
 				Elf_Data *btf_data,
 				Elf_Data *btf_ext_data)
 {
+	bool btf_required = bpf_object__is_btf_mandatory(obj);
 	int err = 0;
 
 	if (btf_data) {
@@ -1155,10 +1453,18 @@ static int bpf_object__init_btf(struct bpf_object *obj,
 	}
 out:
 	if (err || IS_ERR(obj->btf)) {
+		if (btf_required)
+			err = err ? : PTR_ERR(obj->btf);
+		else
+			err = 0;
 		if (!IS_ERR_OR_NULL(obj->btf))
 			btf__free(obj->btf);
 		obj->btf = NULL;
 	}
+	if (btf_required && !obj->btf) {
+		pr_warning("BTF is required, but is missing or corrupted.\n");
+		return err == 0 ? -ENOENT : err;
+	}
 	return 0;
 }
 
@@ -1178,6 +1484,8 @@ static int bpf_object__sanitize_and_load_btf(struct bpf_object *obj)
 			   BTF_ELF_SEC, err);
 		btf__free(obj->btf);
 		obj->btf = NULL;
+		if (bpf_object__is_btf_mandatory(obj))
+			return err;
 	}
 	return 0;
 }
@@ -1241,6 +1549,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 				return err;
 		} else if (strcmp(name, "maps") == 0) {
 			obj->efile.maps_shndx = idx;
+		} else if (strcmp(name, MAPS_ELF_SEC) == 0) {
+			obj->efile.btf_maps_shndx = idx;
 		} else if (strcmp(name, BTF_ELF_SEC) == 0) {
 			btf_data = data;
 		} else if (strcmp(name, BTF_EXT_ELF_SEC) == 0) {
@@ -1360,7 +1670,8 @@ static bool bpf_object__shndx_is_data(const struct bpf_object *obj,
 static bool bpf_object__shndx_is_maps(const struct bpf_object *obj,
 				      int shndx)
 {
-	return shndx == obj->efile.maps_shndx;
+	return shndx == obj->efile.maps_shndx ||
+	       shndx == obj->efile.btf_maps_shndx;
 }
 
 static bool bpf_object__relo_in_known_section(const struct bpf_object *obj,
@@ -1404,14 +1715,14 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 	prog->nr_reloc = nrels;
 
 	for (i = 0; i < nrels; i++) {
-		GElf_Sym sym;
-		GElf_Rel rel;
-		unsigned int insn_idx;
-		unsigned int shdr_idx;
 		struct bpf_insn *insns = prog->insns;
 		enum libbpf_map_type type;
+		unsigned int insn_idx;
+		unsigned int shdr_idx;
 		const char *name;
 		size_t map_idx;
+		GElf_Sym sym;
+		GElf_Rel rel;
 
 		if (!gelf_getrel(data, i, &rel)) {
 			pr_warning("relocation: failed to get %d reloc\n", i);
@@ -1505,14 +1816,18 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 	return 0;
 }
 
-static int bpf_map_find_btf_info(struct bpf_map *map, const struct btf *btf)
+static int bpf_map_find_btf_info(struct bpf_object *obj, struct bpf_map *map)
 {
 	struct bpf_map_def *def = &map->def;
 	__u32 key_type_id = 0, value_type_id = 0;
 	int ret;
 
+	/* if it's BTF-defined map, we don't need to search for type IDs */
+	if (map->sec_idx == obj->efile.btf_maps_shndx)
+		return 0;
+
 	if (!bpf_map__is_internal(map)) {
-		ret = btf__get_map_kv_tids(btf, map->name, def->key_size,
+		ret = btf__get_map_kv_tids(obj->btf, map->name, def->key_size,
 					   def->value_size, &key_type_id,
 					   &value_type_id);
 	} else {
@@ -1520,7 +1835,7 @@ static int bpf_map_find_btf_info(struct bpf_map *map, const struct btf *btf)
 		 * LLVM annotates global data differently in BTF, that is,
 		 * only as '.data', '.bss' or '.rodata'.
 		 */
-		ret = btf__find_by_name(btf,
+		ret = btf__find_by_name(obj->btf,
 				libbpf_type_to_btf_name[map->libbpf_type]);
 	}
 	if (ret < 0)
@@ -1810,7 +2125,7 @@ bpf_object__create_maps(struct bpf_object *obj)
 		    map->inner_map_fd >= 0)
 			create_attr.inner_map_fd = map->inner_map_fd;
 
-		if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
+		if (obj->btf && !bpf_map_find_btf_info(obj, map)) {
 			create_attr.btf_fd = btf__fd(obj->btf);
 			create_attr.btf_key_type_id = map->btf_key_type_id;
 			create_attr.btf_value_type_id = map->btf_value_type_id;
-- 
2.17.1



* [RFC PATCH bpf-next 7/8] selftests/bpf: add test for BTF-defined maps
  2019-05-31 20:21 [RFC PATCH bpf-next 0/8] BTF-defined BPF map definitions Andrii Nakryiko
                   ` (5 preceding siblings ...)
  2019-05-31 20:21 ` [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF Andrii Nakryiko
@ 2019-05-31 20:21 ` Andrii Nakryiko
  2019-05-31 20:21 ` [RFC PATCH bpf-next 8/8] selftests/bpf: switch tests to BTF-defined map definitions Andrii Nakryiko
  7 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2019-05-31 20:21 UTC (permalink / raw)
  To: andrii.nakryiko, netdev, bpf, ast, daniel, kernel-team; +Cc: Andrii Nakryiko

Add a file-based test for BTF-defined map definitions.
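
The test puts one map in `maps` and another in `.maps`, so both
definitions may sit at offset 0 of their respective sections; that is why
relocations are now matched on section index and offset together. A
simplified sketch of that lookup (the struct and function names, and the
section indices used with them, are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified map record: where the definition lives in the ELF file. */
struct toy_map {
	const char *name;
	int sec_idx;
	size_t sec_offset;
};

/* Return the index of the map whose (section, offset) matches a
 * relocation symbol's st_shndx/st_value, or -1. Matching on offset alone
 * would be ambiguous once maps live in more than one section. */
static int toy_find_map(const struct toy_map *maps, int nr,
			int st_shndx, size_t st_value)
{
	int i;

	assert(maps != NULL || nr == 0);
	for (i = 0; i < nr; i++)
		if (maps[i].sec_idx == st_shndx &&
		    maps[i].sec_offset == st_value)
			return i;
	return -1;
}
```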

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 .../selftests/bpf/progs/test_btf_newkv.c      | 73 +++++++++++++++++++
 tools/testing/selftests/bpf/test_btf.c        | 10 +--
 2 files changed, 76 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_btf_newkv.c

diff --git a/tools/testing/selftests/bpf/progs/test_btf_newkv.c b/tools/testing/selftests/bpf/progs/test_btf_newkv.c
new file mode 100644
index 000000000000..28c16bb583b6
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_btf_newkv.c
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2018 Facebook */
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+
+int _version SEC("version") = 1;
+
+struct ipv_counts {
+	unsigned int v4;
+	unsigned int v6;
+};
+
+/* just to validate we can handle maps in multiple sections */
+struct bpf_map_def SEC("maps") btf_map_legacy = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(int),
+	.value_size = sizeof(long long),
+	.max_entries = 4,
+};
+
+BPF_ANNOTATE_KV_PAIR(btf_map_legacy, int, struct ipv_counts);
+
+struct {
+	int *key;
+	struct ipv_counts *value;
+	unsigned int type;
+	unsigned int max_entries;
+} btf_map SEC(".maps") = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.max_entries = 4,
+};
+
+struct dummy_tracepoint_args {
+	unsigned long long pad;
+	struct sock *sock;
+};
+
+__attribute__((noinline))
+static int test_long_fname_2(struct dummy_tracepoint_args *arg)
+{
+	struct ipv_counts *counts;
+	int key = 0;
+
+	if (!arg->sock)
+		return 0;
+
+	counts = bpf_map_lookup_elem(&btf_map, &key);
+	if (!counts)
+		return 0;
+
+	counts->v6++;
+
+	/* just verify we can reference both maps */
+	counts = bpf_map_lookup_elem(&btf_map_legacy, &key);
+	if (!counts)
+		return 0;
+
+	return 0;
+}
+
+__attribute__((noinline))
+static int test_long_fname_1(struct dummy_tracepoint_args *arg)
+{
+	return test_long_fname_2(arg);
+}
+
+SEC("dummy_tracepoint")
+int _dummy_tracepoint(struct dummy_tracepoint_args *arg)
+{
+	return test_long_fname_1(arg);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_btf.c b/tools/testing/selftests/bpf/test_btf.c
index 289daf54dec4..8351cb5f4a20 100644
--- a/tools/testing/selftests/bpf/test_btf.c
+++ b/tools/testing/selftests/bpf/test_btf.c
@@ -4016,13 +4016,9 @@ struct btf_file_test {
 };
 
 static struct btf_file_test file_tests[] = {
-{
-	.file = "test_btf_haskv.o",
-},
-{
-	.file = "test_btf_nokv.o",
-	.btf_kv_notfound = true,
-},
+	{ .file = "test_btf_haskv.o", },
+	{ .file = "test_btf_newkv.o", },
+	{ .file = "test_btf_nokv.o", .btf_kv_notfound = true, },
 };
 
 static int do_test_file(unsigned int test_num)
-- 
2.17.1



* [RFC PATCH bpf-next 8/8] selftests/bpf: switch tests to BTF-defined map definitions
  2019-05-31 20:21 [RFC PATCH bpf-next 0/8] BTF-defined BPF map definitions Andrii Nakryiko
                   ` (6 preceding siblings ...)
  2019-05-31 20:21 ` [RFC PATCH bpf-next 7/8] selftests/bpf: add test for BTF-defined maps Andrii Nakryiko
@ 2019-05-31 20:21 ` Andrii Nakryiko
  7 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2019-05-31 20:21 UTC (permalink / raw)
  To: andrii.nakryiko, netdev, bpf, ast, daniel, kernel-team; +Cc: Andrii Nakryiko

Switch test map definitions to the new BTF-defined format.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/testing/selftests/bpf/progs/bpf_flow.c  | 18 +++--
 .../selftests/bpf/progs/get_cgroup_id_kern.c  | 18 +++--
 .../testing/selftests/bpf/progs/netcnt_prog.c | 22 +++---
 .../selftests/bpf/progs/sample_map_ret0.c     | 18 +++--
 .../selftests/bpf/progs/socket_cookie_prog.c  |  9 ++-
 .../bpf/progs/sockmap_verdict_prog.c          | 36 +++++++---
 .../bpf/progs/test_get_stack_rawtp.c          | 27 ++++---
 .../selftests/bpf/progs/test_global_data.c    | 27 ++++---
 tools/testing/selftests/bpf/progs/test_l4lb.c | 45 ++++++++----
 .../selftests/bpf/progs/test_l4lb_noinline.c  | 45 ++++++++----
 .../selftests/bpf/progs/test_map_in_map.c     | 20 ++++--
 .../selftests/bpf/progs/test_map_lock.c       | 22 +++---
 .../testing/selftests/bpf/progs/test_obj_id.c |  9 ++-
 .../bpf/progs/test_select_reuseport_kern.c    | 45 ++++++++----
 .../bpf/progs/test_send_signal_kern.c         | 22 +++---
 .../bpf/progs/test_skb_cgroup_id_kern.c       |  9 ++-
 .../bpf/progs/test_sock_fields_kern.c         | 60 +++++++++-------
 .../selftests/bpf/progs/test_spin_lock.c      | 33 ++++-----
 .../bpf/progs/test_stacktrace_build_id.c      | 44 ++++++++----
 .../selftests/bpf/progs/test_stacktrace_map.c | 40 +++++++----
 .../testing/selftests/bpf/progs/test_tc_edt.c |  9 ++-
 .../bpf/progs/test_tcp_check_syncookie_kern.c |  9 ++-
 .../selftests/bpf/progs/test_tcp_estats.c     |  9 ++-
 .../selftests/bpf/progs/test_tcpbpf_kern.c    | 18 +++--
 .../selftests/bpf/progs/test_tcpnotify_kern.c | 18 +++--
 tools/testing/selftests/bpf/progs/test_xdp.c  | 18 +++--
 .../selftests/bpf/progs/test_xdp_noinline.c   | 60 ++++++++++------
 .../selftests/bpf/test_queue_stack_map.h      | 20 ++++--
 .../testing/selftests/bpf/test_sockmap_kern.h | 72 +++++++++++++------
 29 files changed, 526 insertions(+), 276 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/bpf_flow.c b/tools/testing/selftests/bpf/progs/bpf_flow.c
index 81ad9a0b29d0..849f42e548b5 100644
--- a/tools/testing/selftests/bpf/progs/bpf_flow.c
+++ b/tools/testing/selftests/bpf/progs/bpf_flow.c
@@ -57,17 +57,25 @@ struct frag_hdr {
 	__be32 identification;
 };
 
-struct bpf_map_def SEC("maps") jmp_table = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 key_size;
+	__u32 value_size;
+} jmp_table SEC(".maps") = {
 	.type = BPF_MAP_TYPE_PROG_ARRAY,
+	.max_entries = 8,
 	.key_size = sizeof(__u32),
 	.value_size = sizeof(__u32),
-	.max_entries = 8
 };
 
-struct bpf_map_def SEC("maps") last_dissection = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct bpf_flow_keys *value;
+} last_dissection SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct bpf_flow_keys),
 	.max_entries = 1,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/get_cgroup_id_kern.c b/tools/testing/selftests/bpf/progs/get_cgroup_id_kern.c
index 014dba10b8a5..87b202381088 100644
--- a/tools/testing/selftests/bpf/progs/get_cgroup_id_kern.c
+++ b/tools/testing/selftests/bpf/progs/get_cgroup_id_kern.c
@@ -4,17 +4,23 @@
 #include <linux/bpf.h>
 #include "bpf_helpers.h"
 
-struct bpf_map_def SEC("maps") cg_ids = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u64 *value;
+} cg_ids SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u64),
 	.max_entries = 1,
 };
 
-struct bpf_map_def SEC("maps") pidmap = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u32 *value;
+} pidmap SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u32),
 	.max_entries = 1,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/netcnt_prog.c b/tools/testing/selftests/bpf/progs/netcnt_prog.c
index 9f741e69cebe..a25c82a5b7c8 100644
--- a/tools/testing/selftests/bpf/progs/netcnt_prog.c
+++ b/tools/testing/selftests/bpf/progs/netcnt_prog.c
@@ -10,24 +10,22 @@
 #define REFRESH_TIME_NS	100000000
 #define NS_PER_SEC	1000000000
 
-struct bpf_map_def SEC("maps") percpu_netcnt = {
+struct {
+	__u32 type;
+	struct bpf_cgroup_storage_key *key;
+	struct percpu_net_cnt *value;
+} percpu_netcnt SEC(".maps") = {
 	.type = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
-	.key_size = sizeof(struct bpf_cgroup_storage_key),
-	.value_size = sizeof(struct percpu_net_cnt),
 };
 
-BPF_ANNOTATE_KV_PAIR(percpu_netcnt, struct bpf_cgroup_storage_key,
-		     struct percpu_net_cnt);
-
-struct bpf_map_def SEC("maps") netcnt = {
+struct {
+	__u32 type;
+	struct bpf_cgroup_storage_key *key;
+	struct net_cnt *value;
+} netcnt SEC(".maps") = {
 	.type = BPF_MAP_TYPE_CGROUP_STORAGE,
-	.key_size = sizeof(struct bpf_cgroup_storage_key),
-	.value_size = sizeof(struct net_cnt),
 };
 
-BPF_ANNOTATE_KV_PAIR(netcnt, struct bpf_cgroup_storage_key,
-		     struct net_cnt);
-
 SEC("cgroup/skb")
 int bpf_nextcnt(struct __sk_buff *skb)
 {
diff --git a/tools/testing/selftests/bpf/progs/sample_map_ret0.c b/tools/testing/selftests/bpf/progs/sample_map_ret0.c
index 0756303676ac..0f4d47cecd4d 100644
--- a/tools/testing/selftests/bpf/progs/sample_map_ret0.c
+++ b/tools/testing/selftests/bpf/progs/sample_map_ret0.c
@@ -2,17 +2,23 @@
 #include <linux/bpf.h>
 #include "bpf_helpers.h"
 
-struct bpf_map_def SEC("maps") htab = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	long *value;
+} htab SEC(".maps") = {
 	.type = BPF_MAP_TYPE_HASH,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(long),
 	.max_entries = 2,
 };
 
-struct bpf_map_def SEC("maps") array = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	long *value;
+} array SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(long),
 	.max_entries = 2,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/socket_cookie_prog.c b/tools/testing/selftests/bpf/progs/socket_cookie_prog.c
index 9ff8ac4b0bf6..5158bd8c342a 100644
--- a/tools/testing/selftests/bpf/progs/socket_cookie_prog.c
+++ b/tools/testing/selftests/bpf/progs/socket_cookie_prog.c
@@ -7,10 +7,13 @@
 #include "bpf_helpers.h"
 #include "bpf_endian.h"
 
-struct bpf_map_def SEC("maps") socket_cookies = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u64 *key;
+	__u32 *value;
+} socket_cookies SEC(".maps") = {
 	.type = BPF_MAP_TYPE_HASH,
-	.key_size = sizeof(__u64),
-	.value_size = sizeof(__u32),
 	.max_entries = 1 << 8,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/sockmap_verdict_prog.c b/tools/testing/selftests/bpf/progs/sockmap_verdict_prog.c
index bdc22be46f2e..7b2146300489 100644
--- a/tools/testing/selftests/bpf/progs/sockmap_verdict_prog.c
+++ b/tools/testing/selftests/bpf/progs/sockmap_verdict_prog.c
@@ -5,31 +5,49 @@
 
 int _version SEC("version") = 1;
 
-struct bpf_map_def SEC("maps") sock_map_rx = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 key_size;
+	__u32 value_size;
+} sock_map_rx SEC(".maps") = {
 	.type = BPF_MAP_TYPE_SOCKMAP,
+	.max_entries = 20,
 	.key_size = sizeof(int),
 	.value_size = sizeof(int),
-	.max_entries = 20,
 };
 
-struct bpf_map_def SEC("maps") sock_map_tx = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 key_size;
+	__u32 value_size;
+} sock_map_tx SEC(".maps") = {
 	.type = BPF_MAP_TYPE_SOCKMAP,
+	.max_entries = 20,
 	.key_size = sizeof(int),
 	.value_size = sizeof(int),
-	.max_entries = 20,
 };
 
-struct bpf_map_def SEC("maps") sock_map_msg = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 key_size;
+	__u32 value_size;
+} sock_map_msg SEC(".maps") = {
 	.type = BPF_MAP_TYPE_SOCKMAP,
+	.max_entries = 20,
 	.key_size = sizeof(int),
 	.value_size = sizeof(int),
-	.max_entries = 20,
 };
 
-struct bpf_map_def SEC("maps") sock_map_break = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	int *key;
+	int *value;
+} sock_map_break SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(int),
-	.value_size = sizeof(int),
 	.max_entries = 20,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c b/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c
index f6d9f238e00a..aaa6ec250e15 100644
--- a/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c
+++ b/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c
@@ -15,17 +15,25 @@ struct stack_trace_t {
 	struct bpf_stack_build_id user_stack_buildid[MAX_STACK_RAWTP];
 };
 
-struct bpf_map_def SEC("maps") perfmap = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 key_size;
+	__u32 value_size;
+} perfmap SEC(".maps") = {
 	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
+	.max_entries = 2,
 	.key_size = sizeof(int),
 	.value_size = sizeof(__u32),
-	.max_entries = 2,
 };
 
-struct bpf_map_def SEC("maps") stackdata_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct stack_trace_t *value;
+} stackdata_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct stack_trace_t),
 	.max_entries = 1,
 };
 
@@ -47,10 +55,13 @@ struct bpf_map_def SEC("maps") stackdata_map = {
  * issue and avoid complicated C programming massaging.
  * This is an acceptable workaround since there is one entry here.
  */
-struct bpf_map_def SEC("maps") rawdata_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u64 (*value)[2 * MAX_STACK_RAWTP];
+} rawdata_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = MAX_STACK_RAWTP * sizeof(__u64) * 2,
 	.max_entries = 1,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/test_global_data.c b/tools/testing/selftests/bpf/progs/test_global_data.c
index 5ab14e941980..866cc7ddbe43 100644
--- a/tools/testing/selftests/bpf/progs/test_global_data.c
+++ b/tools/testing/selftests/bpf/progs/test_global_data.c
@@ -7,17 +7,23 @@
 
 #include "bpf_helpers.h"
 
-struct bpf_map_def SEC("maps") result_number = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u64 *value;
+} result_number SEC(".maps") = {
 	.type		= BPF_MAP_TYPE_ARRAY,
-	.key_size	= sizeof(__u32),
-	.value_size	= sizeof(__u64),
 	.max_entries	= 11,
 };
 
-struct bpf_map_def SEC("maps") result_string = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	const char (*value)[32];
+} result_string SEC(".maps") = {
 	.type		= BPF_MAP_TYPE_ARRAY,
-	.key_size	= sizeof(__u32),
-	.value_size	= 32,
 	.max_entries	= 5,
 };
 
@@ -27,10 +33,13 @@ struct foo {
 	__u64 c;
 };
 
-struct bpf_map_def SEC("maps") result_struct = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct foo *value;
+} result_struct SEC(".maps") = {
 	.type		= BPF_MAP_TYPE_ARRAY,
-	.key_size	= sizeof(__u32),
-	.value_size	= sizeof(struct foo),
 	.max_entries	= 5,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/test_l4lb.c b/tools/testing/selftests/bpf/progs/test_l4lb.c
index 1e10c9590991..848cbb90f581 100644
--- a/tools/testing/selftests/bpf/progs/test_l4lb.c
+++ b/tools/testing/selftests/bpf/progs/test_l4lb.c
@@ -169,38 +169,53 @@ struct eth_hdr {
 	unsigned short eth_proto;
 };
 
-struct bpf_map_def SEC("maps") vip_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	struct vip *key;
+	struct vip_meta *value;
+} vip_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_HASH,
-	.key_size = sizeof(struct vip),
-	.value_size = sizeof(struct vip_meta),
 	.max_entries = MAX_VIPS,
 };
 
-struct bpf_map_def SEC("maps") ch_rings = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u32 *value;
+} ch_rings SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u32),
 	.max_entries = CH_RINGS_SIZE,
 };
 
-struct bpf_map_def SEC("maps") reals = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct real_definition *value;
+} reals SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct real_definition),
 	.max_entries = MAX_REALS,
 };
 
-struct bpf_map_def SEC("maps") stats = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct vip_stats *value;
+} stats SEC(".maps") = {
 	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct vip_stats),
 	.max_entries = MAX_VIPS,
 };
 
-struct bpf_map_def SEC("maps") ctl_array = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct ctl_value *value;
+} ctl_array SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct ctl_value),
 	.max_entries = CTL_MAP_SIZE,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/test_l4lb_noinline.c b/tools/testing/selftests/bpf/progs/test_l4lb_noinline.c
index ba44a14e6dc4..c63ecf3ca573 100644
--- a/tools/testing/selftests/bpf/progs/test_l4lb_noinline.c
+++ b/tools/testing/selftests/bpf/progs/test_l4lb_noinline.c
@@ -165,38 +165,53 @@ struct eth_hdr {
 	unsigned short eth_proto;
 };
 
-struct bpf_map_def SEC("maps") vip_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	struct vip *key;
+	struct vip_meta *value;
+} vip_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_HASH,
-	.key_size = sizeof(struct vip),
-	.value_size = sizeof(struct vip_meta),
 	.max_entries = MAX_VIPS,
 };
 
-struct bpf_map_def SEC("maps") ch_rings = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u32 *value;
+} ch_rings SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u32),
 	.max_entries = CH_RINGS_SIZE,
 };
 
-struct bpf_map_def SEC("maps") reals = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct real_definition *value;
+} reals SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct real_definition),
 	.max_entries = MAX_REALS,
 };
 
-struct bpf_map_def SEC("maps") stats = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct vip_stats *value;
+} stats SEC(".maps") = {
 	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct vip_stats),
 	.max_entries = MAX_VIPS,
 };
 
-struct bpf_map_def SEC("maps") ctl_array = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct ctl_value *value;
+} ctl_array SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct ctl_value),
 	.max_entries = CTL_MAP_SIZE,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/test_map_in_map.c b/tools/testing/selftests/bpf/progs/test_map_in_map.c
index 2985f262846e..7404bee7c26e 100644
--- a/tools/testing/selftests/bpf/progs/test_map_in_map.c
+++ b/tools/testing/selftests/bpf/progs/test_map_in_map.c
@@ -5,22 +5,30 @@
 #include <linux/types.h>
 #include "bpf_helpers.h"
 
-struct bpf_map_def SEC("maps") mim_array = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 key_size;
+	__u32 value_size;
+} mim_array SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY_OF_MAPS,
+	.max_entries = 1,
 	.key_size = sizeof(int),
 	/* must be sizeof(__u32) for map in map */
 	.value_size = sizeof(__u32),
-	.max_entries = 1,
-	.map_flags = 0,
 };
 
-struct bpf_map_def SEC("maps") mim_hash = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 key_size;
+	__u32 value_size;
+} mim_hash SEC(".maps") = {
 	.type = BPF_MAP_TYPE_HASH_OF_MAPS,
+	.max_entries = 1,
 	.key_size = sizeof(int),
 	/* must be sizeof(__u32) for map in map */
 	.value_size = sizeof(__u32),
-	.max_entries = 1,
-	.map_flags = 0,
 };
 
 SEC("xdp_mimtest")
diff --git a/tools/testing/selftests/bpf/progs/test_map_lock.c b/tools/testing/selftests/bpf/progs/test_map_lock.c
index af8cc68ed2f9..40d9c2853393 100644
--- a/tools/testing/selftests/bpf/progs/test_map_lock.c
+++ b/tools/testing/selftests/bpf/progs/test_map_lock.c
@@ -11,29 +11,31 @@ struct hmap_elem {
 	int var[VAR_NUM];
 };
 
-struct bpf_map_def SEC("maps") hash_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct hmap_elem *value;
+} hash_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_HASH,
-	.key_size = sizeof(int),
-	.value_size = sizeof(struct hmap_elem),
 	.max_entries = 1,
 };
 
-BPF_ANNOTATE_KV_PAIR(hash_map, int, struct hmap_elem);
-
 struct array_elem {
 	struct bpf_spin_lock lock;
 	int var[VAR_NUM];
 };
 
-struct bpf_map_def SEC("maps") array_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	int *key;
+	struct array_elem *value;
+} array_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(int),
-	.value_size = sizeof(struct array_elem),
 	.max_entries = 1,
 };
 
-BPF_ANNOTATE_KV_PAIR(array_map, int, struct array_elem);
-
 SEC("map_lock_demo")
 int bpf_map_lock_test(struct __sk_buff *skb)
 {
diff --git a/tools/testing/selftests/bpf/progs/test_obj_id.c b/tools/testing/selftests/bpf/progs/test_obj_id.c
index 880d2963b472..2b1c2efdeed4 100644
--- a/tools/testing/selftests/bpf/progs/test_obj_id.c
+++ b/tools/testing/selftests/bpf/progs/test_obj_id.c
@@ -16,10 +16,13 @@
 
 int _version SEC("version") = 1;
 
-struct bpf_map_def SEC("maps") test_map_id = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u64 *value;
+} test_map_id SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u64),
 	.max_entries = 1,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c b/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
index 5b54ec637ada..435a9527733e 100644
--- a/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
@@ -21,38 +21,55 @@ int _version SEC("version") = 1;
 #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
 #endif
 
-struct bpf_map_def SEC("maps") outer_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 key_size;
+	__u32 value_size;
+} outer_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY_OF_MAPS,
+	.max_entries = 1,
 	.key_size = sizeof(__u32),
 	.value_size = sizeof(__u32),
-	.max_entries = 1,
 };
 
-struct bpf_map_def SEC("maps") result_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u32 *value;
+} result_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u32),
 	.max_entries = NR_RESULTS,
 };
 
-struct bpf_map_def SEC("maps") tmp_index_ovr_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	int *value;
+} tmp_index_ovr_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(int),
 	.max_entries = 1,
 };
 
-struct bpf_map_def SEC("maps") linum_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u32 *value;
+} linum_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u32),
 	.max_entries = 1,
 };
 
-struct bpf_map_def SEC("maps") data_check_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct data_check *value;
+} data_check_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct data_check),
 	.max_entries = 1,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/test_send_signal_kern.c b/tools/testing/selftests/bpf/progs/test_send_signal_kern.c
index 45a1a1a2c345..6ac68be5d68b 100644
--- a/tools/testing/selftests/bpf/progs/test_send_signal_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_send_signal_kern.c
@@ -4,24 +4,26 @@
 #include <linux/version.h>
 #include "bpf_helpers.h"
 
-struct bpf_map_def SEC("maps") info_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u64 *value;
+} info_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u64),
 	.max_entries = 1,
 };
 
-BPF_ANNOTATE_KV_PAIR(info_map, __u32, __u64);
-
-struct bpf_map_def SEC("maps") status_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u64 *value;
+} status_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u64),
 	.max_entries = 1,
 };
 
-BPF_ANNOTATE_KV_PAIR(status_map, __u32, __u64);
-
 SEC("send_signal_demo")
 int bpf_send_signal_test(void *ctx)
 {
diff --git a/tools/testing/selftests/bpf/progs/test_skb_cgroup_id_kern.c b/tools/testing/selftests/bpf/progs/test_skb_cgroup_id_kern.c
index 68cf9829f5a7..af296b876156 100644
--- a/tools/testing/selftests/bpf/progs/test_skb_cgroup_id_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_skb_cgroup_id_kern.c
@@ -10,10 +10,13 @@
 
 #define NUM_CGROUP_LEVELS	4
 
-struct bpf_map_def SEC("maps") cgroup_ids = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u64 *value;
+} cgroup_ids SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u64),
 	.max_entries = NUM_CGROUP_LEVELS,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/test_sock_fields_kern.c b/tools/testing/selftests/bpf/progs/test_sock_fields_kern.c
index 1c39e4ccb7f1..c3d383d650cb 100644
--- a/tools/testing/selftests/bpf/progs/test_sock_fields_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_sock_fields_kern.c
@@ -27,31 +27,43 @@ enum bpf_linum_array_idx {
 	__NR_BPF_LINUM_ARRAY_IDX,
 };
 
-struct bpf_map_def SEC("maps") addr_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct sockaddr_in6 *value;
+} addr_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct sockaddr_in6),
 	.max_entries = __NR_BPF_ADDR_ARRAY_IDX,
 };
 
-struct bpf_map_def SEC("maps") sock_result_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct bpf_sock *value;
+} sock_result_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct bpf_sock),
 	.max_entries = __NR_BPF_RESULT_ARRAY_IDX,
 };
 
-struct bpf_map_def SEC("maps") tcp_sock_result_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct bpf_tcp_sock *value;
+} tcp_sock_result_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct bpf_tcp_sock),
 	.max_entries = __NR_BPF_RESULT_ARRAY_IDX,
 };
 
-struct bpf_map_def SEC("maps") linum_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u32 *value;
+} linum_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u32),
 	.max_entries = __NR_BPF_LINUM_ARRAY_IDX,
 };
 
@@ -60,26 +72,26 @@ struct bpf_spinlock_cnt {
 	__u32 cnt;
 };
 
-struct bpf_map_def SEC("maps") sk_pkt_out_cnt = {
+struct {
+	__u32 type;
+	__u32 map_flags;
+	int *key;
+	struct bpf_spinlock_cnt *value;
+} sk_pkt_out_cnt SEC(".maps") = {
 	.type = BPF_MAP_TYPE_SK_STORAGE,
-	.key_size = sizeof(int),
-	.value_size = sizeof(struct bpf_spinlock_cnt),
-	.max_entries = 0,
 	.map_flags = BPF_F_NO_PREALLOC,
 };
 
-BPF_ANNOTATE_KV_PAIR(sk_pkt_out_cnt, int, struct bpf_spinlock_cnt);
-
-struct bpf_map_def SEC("maps") sk_pkt_out_cnt10 = {
+struct {
+	__u32 type;
+	__u32 map_flags;
+	int *key;
+	struct bpf_spinlock_cnt *value;
+} sk_pkt_out_cnt10 SEC(".maps") = {
 	.type = BPF_MAP_TYPE_SK_STORAGE,
-	.key_size = sizeof(int),
-	.value_size = sizeof(struct bpf_spinlock_cnt),
-	.max_entries = 0,
 	.map_flags = BPF_F_NO_PREALLOC,
 };
 
-BPF_ANNOTATE_KV_PAIR(sk_pkt_out_cnt10, int, struct bpf_spinlock_cnt);
-
 static bool is_loopback6(__u32 *a6)
 {
 	return !a6[0] && !a6[1] && !a6[2] && a6[3] == bpf_htonl(1);
diff --git a/tools/testing/selftests/bpf/progs/test_spin_lock.c b/tools/testing/selftests/bpf/progs/test_spin_lock.c
index 40f904312090..0a77ae36d981 100644
--- a/tools/testing/selftests/bpf/progs/test_spin_lock.c
+++ b/tools/testing/selftests/bpf/progs/test_spin_lock.c
@@ -10,30 +10,29 @@ struct hmap_elem {
 	int test_padding;
 };
 
-struct bpf_map_def SEC("maps") hmap = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	int *key;
+	struct hmap_elem *value;
+} hmap SEC(".maps") = {
 	.type = BPF_MAP_TYPE_HASH,
-	.key_size = sizeof(int),
-	.value_size = sizeof(struct hmap_elem),
 	.max_entries = 1,
 };
 
-BPF_ANNOTATE_KV_PAIR(hmap, int, struct hmap_elem);
-
-
 struct cls_elem {
 	struct bpf_spin_lock lock;
 	volatile int cnt;
 };
 
-struct bpf_map_def SEC("maps") cls_map = {
+struct {
+	__u32 type;
+	struct bpf_cgroup_storage_key *key;
+	struct cls_elem *value;
+} cls_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_CGROUP_STORAGE,
-	.key_size = sizeof(struct bpf_cgroup_storage_key),
-	.value_size = sizeof(struct cls_elem),
 };
 
-BPF_ANNOTATE_KV_PAIR(cls_map, struct bpf_cgroup_storage_key,
-		     struct cls_elem);
-
 struct bpf_vqueue {
 	struct bpf_spin_lock lock;
 	/* 4 byte hole */
@@ -42,14 +41,16 @@ struct bpf_vqueue {
 	unsigned int rate;
 };
 
-struct bpf_map_def SEC("maps") vqueue = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	int *key;
+	struct bpf_vqueue *value;
+} vqueue SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(int),
-	.value_size = sizeof(struct bpf_vqueue),
 	.max_entries = 1,
 };
 
-BPF_ANNOTATE_KV_PAIR(vqueue, int, struct bpf_vqueue);
 #define CREDIT_PER_NS(delta, rate) (((delta) * rate) >> 20)
 
 SEC("spin_lock_demo")
diff --git a/tools/testing/selftests/bpf/progs/test_stacktrace_build_id.c b/tools/testing/selftests/bpf/progs/test_stacktrace_build_id.c
index d86c281e957f..fcf2280bb60c 100644
--- a/tools/testing/selftests/bpf/progs/test_stacktrace_build_id.c
+++ b/tools/testing/selftests/bpf/progs/test_stacktrace_build_id.c
@@ -8,34 +8,50 @@
 #define PERF_MAX_STACK_DEPTH         127
 #endif
 
-struct bpf_map_def SEC("maps") control_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u32 *value;
+} control_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u32),
 	.max_entries = 1,
 };
 
-struct bpf_map_def SEC("maps") stackid_hmap = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u32 *value;
+} stackid_hmap SEC(".maps") = {
 	.type = BPF_MAP_TYPE_HASH,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u32),
 	.max_entries = 16384,
 };
 
-struct bpf_map_def SEC("maps") stackmap = {
+typedef struct bpf_stack_build_id stack_trace_t[PERF_MAX_STACK_DEPTH];
+
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 map_flags;
+	__u32 key_size;
+	__u32 value_size;
+} stackmap SEC(".maps") = {
 	.type = BPF_MAP_TYPE_STACK_TRACE,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct bpf_stack_build_id)
-		* PERF_MAX_STACK_DEPTH,
 	.max_entries = 128,
 	.map_flags = BPF_F_STACK_BUILD_ID,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(stack_trace_t),
 };
 
-struct bpf_map_def SEC("maps") stack_amap = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	/* there seems to be a bug in kernel not handling typedef properly */
+	struct bpf_stack_build_id (*value)[PERF_MAX_STACK_DEPTH];
+} stack_amap SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct bpf_stack_build_id)
-		* PERF_MAX_STACK_DEPTH,
 	.max_entries = 128,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/test_stacktrace_map.c b/tools/testing/selftests/bpf/progs/test_stacktrace_map.c
index af111af7ca1a..7ad09adbf648 100644
--- a/tools/testing/selftests/bpf/progs/test_stacktrace_map.c
+++ b/tools/testing/selftests/bpf/progs/test_stacktrace_map.c
@@ -8,31 +8,47 @@
 #define PERF_MAX_STACK_DEPTH         127
 #endif
 
-struct bpf_map_def SEC("maps") control_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u32 *value;
+} control_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u32),
 	.max_entries = 1,
 };
 
-struct bpf_map_def SEC("maps") stackid_hmap = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u32 *value;
+} stackid_hmap SEC(".maps") = {
 	.type = BPF_MAP_TYPE_HASH,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u32),
 	.max_entries = 16384,
 };
 
-struct bpf_map_def SEC("maps") stackmap = {
+typedef __u64 stack_trace_t[PERF_MAX_STACK_DEPTH];
+
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 key_size;
+	__u32 value_size;
+} stackmap SEC(".maps") = {
 	.type = BPF_MAP_TYPE_STACK_TRACE,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u64) * PERF_MAX_STACK_DEPTH,
 	.max_entries = 16384,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(stack_trace_t),
 };
 
-struct bpf_map_def SEC("maps") stack_amap = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u64 (*value)[PERF_MAX_STACK_DEPTH];
+} stack_amap SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u64) * PERF_MAX_STACK_DEPTH,
 	.max_entries = 16384,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/test_tc_edt.c b/tools/testing/selftests/bpf/progs/test_tc_edt.c
index 3af64c470d64..c2781dd78617 100644
--- a/tools/testing/selftests/bpf/progs/test_tc_edt.c
+++ b/tools/testing/selftests/bpf/progs/test_tc_edt.c
@@ -16,10 +16,13 @@
 #define THROTTLE_RATE_BPS (5 * 1000 * 1000)
 
 /* flow_key => last_tstamp timestamp used */
-struct bpf_map_def SEC("maps") flow_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	uint32_t *key;
+	uint64_t *value;
+} flow_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_HASH,
-	.key_size = sizeof(uint32_t),
-	.value_size = sizeof(uint64_t),
 	.max_entries = 1,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c b/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c
index 1ab095bcacd8..0f1725e25c44 100644
--- a/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c
@@ -16,10 +16,13 @@
 #include "bpf_helpers.h"
 #include "bpf_endian.h"
 
-struct bpf_map_def SEC("maps") results = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u64 *value;
+} results SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u64),
 	.max_entries = 1,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/test_tcp_estats.c b/tools/testing/selftests/bpf/progs/test_tcp_estats.c
index bee3bbecc0c4..df98f7e32832 100644
--- a/tools/testing/selftests/bpf/progs/test_tcp_estats.c
+++ b/tools/testing/selftests/bpf/progs/test_tcp_estats.c
@@ -148,10 +148,13 @@ struct tcp_estats_basic_event {
 	struct tcp_estats_conn_id conn_id;
 };
 
-struct bpf_map_def SEC("maps") ev_record_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct tcp_estats_basic_event *value;
+} ev_record_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_HASH,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct tcp_estats_basic_event),
 	.max_entries = 1024,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c b/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
index c7c3240e0dd4..38e10c9fd996 100644
--- a/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
@@ -14,17 +14,23 @@
 #include "bpf_endian.h"
 #include "test_tcpbpf.h"
 
-struct bpf_map_def SEC("maps") global_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct tcpbpf_globals *value;
+} global_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct tcpbpf_globals),
 	.max_entries = 4,
 };
 
-struct bpf_map_def SEC("maps") sockopt_results = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	int *value;
+} sockopt_results SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(int),
 	.max_entries = 2,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/test_tcpnotify_kern.c b/tools/testing/selftests/bpf/progs/test_tcpnotify_kern.c
index ec6db6e64c41..d073d37d4e27 100644
--- a/tools/testing/selftests/bpf/progs/test_tcpnotify_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tcpnotify_kern.c
@@ -14,18 +14,26 @@
 #include "bpf_endian.h"
 #include "test_tcpnotify.h"
 
-struct bpf_map_def SEC("maps") global_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct tcpnotify_globals *value;
+} global_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct tcpnotify_globals),
 	.max_entries = 4,
 };
 
-struct bpf_map_def SEC("maps") perf_event_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 key_size;
+	__u32 value_size;
+} perf_event_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
+	.max_entries = 2,
 	.key_size = sizeof(int),
 	.value_size = sizeof(__u32),
-	.max_entries = 2,
 };
 
 int _version SEC("version") = 1;
diff --git a/tools/testing/selftests/bpf/progs/test_xdp.c b/tools/testing/selftests/bpf/progs/test_xdp.c
index 5e7df8bb5b5d..ec3d2c1c8cf9 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp.c
@@ -22,17 +22,23 @@
 
 int _version SEC("version") = 1;
 
-struct bpf_map_def SEC("maps") rxcnt = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u64 *value;
+} rxcnt SEC(".maps") = {
 	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u64),
 	.max_entries = 256,
 };
 
-struct bpf_map_def SEC("maps") vip2tnl = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	struct vip *key;
+	struct iptnl_info *value;
+} vip2tnl SEC(".maps") = {
 	.type = BPF_MAP_TYPE_HASH,
-	.key_size = sizeof(struct vip),
-	.value_size = sizeof(struct iptnl_info),
 	.max_entries = MAX_IPTNL_ENTRIES,
 };
 
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_noinline.c b/tools/testing/selftests/bpf/progs/test_xdp_noinline.c
index 4fe6aaad22a4..d2eddb5553d1 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_noinline.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_noinline.c
@@ -163,52 +163,66 @@ struct lb_stats {
 	__u64 v1;
 };
 
-struct bpf_map_def __attribute__ ((section("maps"), used)) vip_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	struct vip_definition *key;
+	struct vip_meta *value;
+} vip_map SEC(".maps") = {
 	.type = BPF_MAP_TYPE_HASH,
-	.key_size = sizeof(struct vip_definition),
-	.value_size = sizeof(struct vip_meta),
 	.max_entries = 512,
-	.map_flags = 0,
 };
 
-struct bpf_map_def __attribute__ ((section("maps"), used)) lru_cache = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 map_flags;
+	struct flow_key *key;
+	struct real_pos_lru *value;
+} lru_cache SEC(".maps") = {
 	.type = BPF_MAP_TYPE_LRU_HASH,
-	.key_size = sizeof(struct flow_key),
-	.value_size = sizeof(struct real_pos_lru),
 	.max_entries = 300,
 	.map_flags = 1U << 1,
 };
 
-struct bpf_map_def __attribute__ ((section("maps"), used)) ch_rings = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	__u32 *value;
+} ch_rings SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u32),
 	.max_entries = 12 * 655,
-	.map_flags = 0,
 };
 
-struct bpf_map_def __attribute__ ((section("maps"), used)) reals = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct real_definition *value;
+} reals SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct real_definition),
 	.max_entries = 40,
-	.map_flags = 0,
 };
 
-struct bpf_map_def __attribute__ ((section("maps"), used)) stats = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct lb_stats *value;
+} stats SEC(".maps") = {
 	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct lb_stats),
 	.max_entries = 515,
-	.map_flags = 0,
 };
 
-struct bpf_map_def __attribute__ ((section("maps"), used)) ctl_array = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 *key;
+	struct ctl_value *value;
+} ctl_array SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(__u32),
-	.value_size = sizeof(struct ctl_value),
 	.max_entries = 16,
-	.map_flags = 0,
 };
 
 struct eth_hdr {
diff --git a/tools/testing/selftests/bpf/test_queue_stack_map.h b/tools/testing/selftests/bpf/test_queue_stack_map.h
index 295b9b3bc5c7..f284137a36c4 100644
--- a/tools/testing/selftests/bpf/test_queue_stack_map.h
+++ b/tools/testing/selftests/bpf/test_queue_stack_map.h
@@ -10,20 +10,28 @@
 
 int _version SEC("version") = 1;
 
-struct bpf_map_def __attribute__ ((section("maps"), used)) map_in = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 key_size;
+	__u32 value_size;
+} map_in SEC(".maps") = {
 	.type = MAP_TYPE,
+	.max_entries = 32,
 	.key_size = 0,
 	.value_size = sizeof(__u32),
-	.max_entries = 32,
-	.map_flags = 0,
 };
 
-struct bpf_map_def __attribute__ ((section("maps"), used)) map_out = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 key_size;
+	__u32 value_size;
+} map_out SEC(".maps") = {
 	.type = MAP_TYPE,
+	.max_entries = 32,
 	.key_size = 0,
 	.value_size = sizeof(__u32),
-	.max_entries = 32,
-	.map_flags = 0,
 };
 
 SEC("test")
diff --git a/tools/testing/selftests/bpf/test_sockmap_kern.h b/tools/testing/selftests/bpf/test_sockmap_kern.h
index 4e7d3da21357..70b9236cedb0 100644
--- a/tools/testing/selftests/bpf/test_sockmap_kern.h
+++ b/tools/testing/selftests/bpf/test_sockmap_kern.h
@@ -28,59 +28,89 @@
  * are established and verdicts are decided.
  */
 
-struct bpf_map_def SEC("maps") sock_map = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 key_size;
+	__u32 value_size;
+} sock_map SEC(".maps") = {
 	.type = TEST_MAP_TYPE,
+	.max_entries = 20,
 	.key_size = sizeof(int),
 	.value_size = sizeof(int),
-	.max_entries = 20,
 };
 
-struct bpf_map_def SEC("maps") sock_map_txmsg = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 key_size;
+	__u32 value_size;
+} sock_map_txmsg SEC(".maps") = {
 	.type = TEST_MAP_TYPE,
+	.max_entries = 20,
 	.key_size = sizeof(int),
 	.value_size = sizeof(int),
-	.max_entries = 20,
 };
 
-struct bpf_map_def SEC("maps") sock_map_redir = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	__u32 key_size;
+	__u32 value_size;
+} sock_map_redir SEC(".maps") = {
 	.type = TEST_MAP_TYPE,
+	.max_entries = 20,
 	.key_size = sizeof(int),
 	.value_size = sizeof(int),
-	.max_entries = 20,
 };
 
-struct bpf_map_def SEC("maps") sock_apply_bytes = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	int *key;
+	int *value;
+} sock_apply_bytes SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(int),
-	.value_size = sizeof(int),
 	.max_entries = 1
 };
 
-struct bpf_map_def SEC("maps") sock_cork_bytes = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	int *key;
+	int *value;
+} sock_cork_bytes SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(int),
-	.value_size = sizeof(int),
 	.max_entries = 1
 };
 
-struct bpf_map_def SEC("maps") sock_bytes = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	int *key;
+	int *value;
+} sock_bytes SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(int),
-	.value_size = sizeof(int),
 	.max_entries = 6
 };
 
-struct bpf_map_def SEC("maps") sock_redir_flags = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	int *key;
+	int *value;
+} sock_redir_flags SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(int),
-	.value_size = sizeof(int),
 	.max_entries = 1
 };
 
-struct bpf_map_def SEC("maps") sock_skb_opts = {
+struct {
+	__u32 type;
+	__u32 max_entries;
+	int *key;
+	int *value;
+} sock_skb_opts SEC(".maps") = {
 	.type = BPF_MAP_TYPE_ARRAY,
-	.key_size = sizeof(int),
-	.value_size = sizeof(int),
 	.max_entries = 1
 };
 
-- 
2.17.1
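The converted definitions above replace explicit key_size/value_size with `key`/`value` pointer fields. The following is a minimal user-space sketch of why (rule 3 of the BTF-defined map outline in patch #6), assuming a hypothetical 4 KiB value type. A pointer field costs only pointer-size bytes in the `.maps` section regardless of how large the pointee is, whereas embedding the value would reserve its full size. `struct big_value` and the helper functions are illustrative names. They are not part of libbpf or the selftests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical large value type: 512 * 8 = 4096 bytes. */
struct big_value {
	uint64_t counters[512];
};

/* BTF-defined style: key/value are pointers that only carry the type. */
struct map_def_ptr {
	uint32_t type;
	uint32_t max_entries;
	uint32_t *key;
	struct big_value *value;
};

/* Counter-example: embedding the value reserves its full size. */
struct map_def_embedded {
	uint32_t type;
	uint32_t max_entries;
	uint32_t key;
	struct big_value value;
};

size_t ptr_def_size(void)
{
	return sizeof(struct map_def_ptr);
}

size_t embedded_def_size(void)
{
	return sizeof(struct map_def_embedded);
}
```

On a typical 64-bit host, `ptr_def_size()` is 24 bytes while `embedded_def_size()` exceeds 4096.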

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-05-31 20:21 ` [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF Andrii Nakryiko
@ 2019-05-31 21:28   ` Stanislav Fomichev
  2019-05-31 22:58     ` Andrii Nakryiko
  2019-06-03 22:34   ` Andrii Nakryiko
  2019-06-06 16:42   ` Lorenz Bauer
  2 siblings, 1 reply; 40+ messages in thread
From: Stanislav Fomichev @ 2019-05-31 21:28 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: andrii.nakryiko, netdev, bpf, ast, daniel, kernel-team

On 05/31, Andrii Nakryiko wrote:
> This patch adds support for a new way to define BPF maps. It relies on
> BTF to describe mandatory and optional attributes of a map, as well as
> captures type information of key and value naturally. This eliminates
> the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
> always in sync with the key/value type.
My 2c: this is too magical and relies on me knowing the expected fields.
(also, the compiler won't be able to help with the misspellings).

I don't know how others feel about it, but I'd be much more comfortable
with a simpler TLV-like approach. Have a new section where the format
is |4-byte size|struct bpf_map_def_extendable|. That would essentially
allow us to extend it the way we do with syscall args.
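The following is a minimal sketch of how a loader could consume the size-prefixed record described here. It assumes a hypothetical `struct map_def_ext` layout and host byte order, and nothing like it exists in libbpf. A shorter record from an older producer is zero-extended. A longer record from a newer producer is accepted only if the bytes this loader does not understand are zero, which is the same rule the kernel applies to extensible bpf(2) arguments.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "extendable" definition as known to this loader version. */
struct map_def_ext {
	uint32_t type;
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;
	uint32_t map_flags;
};

/*
 * Parse one |4-byte size|payload| record from buf (len bytes).
 * Returns the number of bytes consumed, or a negative errno.
 */
long parse_map_def_record(const uint8_t *buf, size_t len,
			  struct map_def_ext *out)
{
	uint32_t sz;
	size_t i;

	if (len < sizeof(sz))
		return -EINVAL;
	memcpy(&sz, buf, sizeof(sz)); /* host byte order assumed */
	if (sz > len - sizeof(sz))
		return -EINVAL; /* record runs past the section */

	/* Older, shorter payloads: missing fields stay zero. */
	memset(out, 0, sizeof(*out));
	memcpy(out, buf + sizeof(sz), sz < sizeof(*out) ? sz : sizeof(*out));

	/* Newer, longer payloads: unknown tail must be all zeroes. */
	for (i = sizeof(*out); i < sz; i++)
		if (buf[sizeof(sz) + i] != 0)
			return -E2BIG;

	return (long)(sizeof(sz) + sz);
}
```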

Also, (un)related: we don't currently use BTF internally, so if
you convert all tests, we'd be unable to run them :-(

> Relying on BTF, this approach allows for both forward and backward
> compatibility w.r.t. extending supported map definition features. Old
> libbpf implementation will ignore fields it doesn't recognize, while new
> implementations will parse and recognize new optional attributes.
I also don't know how to feel about old libbpf ignoring some attributes.
In the kernel we require that the unknown fields are zeroed.
We probably need to do something like that here? What do you think
would be a good example of an optional attribute?
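Applied to BTF-defined maps, the kernel's zeroing rule would amount to a strict mode in which an unrecognized member is tolerated only if its value is zero. The following is a minimal sketch that contrasts that mode with the RFC's ignore-unknown behavior. It uses a toy name/value array in place of real BTF members. `struct def_field`, `apply_def_fields` and the `pinning` field are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a BTF struct member: its name and 32-bit value. */
struct def_field {
	const char *name;
	uint32_t value;
};

struct map_attrs {
	uint32_t type;
	uint32_t max_entries;
	uint32_t map_flags;
};

/*
 * Apply named fields to out. Unknown fields are skipped when !strict
 * (the RFC's behavior); when strict they are tolerated only if zero.
 */
int apply_def_fields(const struct def_field *f, int n, int strict,
		     struct map_attrs *out)
{
	int i;

	memset(out, 0, sizeof(*out));
	for (i = 0; i < n; i++) {
		if (strcmp(f[i].name, "type") == 0)
			out->type = f[i].value;
		else if (strcmp(f[i].name, "max_entries") == 0)
			out->max_entries = f[i].value;
		else if (strcmp(f[i].name, "map_flags") == 0)
			out->map_flags = f[i].value;
		else if (strict && f[i].value != 0)
			return -EINVAL; /* non-zero attribute we don't know */
	}
	return out->type ? 0 : -EINVAL; /* type is mandatory */
}
```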

> The outline of the new map definition (BTF-defined maps, for short) is as follows:
> 1. All the maps should be defined in the .maps ELF section. It's possible to
>    have both "legacy" map definitions in the `maps` section and BTF-defined
>    maps in the .maps section. Everything will still work transparently.
> 2. The map declaration and initialization is done through
>    a global/static variable of a struct type with a few mandatory and
>    extra optional fields:
>    - type field is mandatory and specifies the type of BPF map;
>    - key/value fields are mandatory (unless key_size/value_size are used
>      instead, see 4.) and capture key/value type/size information;
>    - max_entries attribute is optional; if max_entries is not specified or
>      initialized, it has to be provided at runtime through libbpf API
>      before loading bpf_object;
>    - map_flags is optional and if not defined, will be assumed to be 0.
> 3. Key/value fields should be **a pointer** to a type describing
>    key/value. The pointee type is assumed (and will be recorded as such
>    and used for size determination) to be a type describing key/value of
>    the map. This avoids allocating excessive space in the corresponding
>    ELF section for large key/value types.
> 4. As some maps disallow having BTF type ID associated with key/value,
>    it's possible to specify key/value size explicitly without
>    associating BTF type ID with it. Use key_size and value_size fields
>    to do that (see example below).
> 
> Here's an example of a simple ARRAY map definition:
> 
> struct my_value { int x, y, z; };
> 
> struct {
> 	int type;
> 	int max_entries;
> 	int *key;
> 	struct my_value *value;
> } btf_map SEC(".maps") = {
> 	.type = BPF_MAP_TYPE_ARRAY,
> 	.max_entries = 16,
> };
> 
> This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
> be of type int and thus key size will be 4 bytes. The value is struct
> my_value of size 12 bytes. This map can be used from C code exactly the
> same as with existing maps defined through struct bpf_map_def.
> 
> Here's an example of a STACKMAP definition (which currently disallows BTF type
> IDs for key/value):
> 
> struct {
> 	__u32 type;
> 	__u32 max_entries;
> 	__u32 map_flags;
> 	__u32 key_size;
> 	__u32 value_size;
> } stackmap SEC(".maps") = {
> 	.type = BPF_MAP_TYPE_STACK_TRACE,
> 	.max_entries = 128,
> 	.map_flags = BPF_F_STACK_BUILD_ID,
> 	.key_size = sizeof(__u32),
> 	.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
> };
> 
> This approach naturally extends to map-in-map support, by making the value
> field another struct that describes the inner map. This feature is not
> implemented yet. It's also possible to incrementally add features like pinning
> with full backward and forward compatibility.
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
>  tools/lib/bpf/btf.h    |   1 +
>  tools/lib/bpf/libbpf.c | 333 +++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 325 insertions(+), 9 deletions(-)
> 
> diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
> index ba4ffa831aa4..88a52ae56fc6 100644
> --- a/tools/lib/bpf/btf.h
> +++ b/tools/lib/bpf/btf.h
> @@ -17,6 +17,7 @@ extern "C" {
>  
>  #define BTF_ELF_SEC ".BTF"
>  #define BTF_EXT_ELF_SEC ".BTF.ext"
> +#define MAPS_ELF_SEC ".maps"
>  
>  struct btf;
>  struct btf_ext;
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 79a8143240d7..5a8f1e82809b 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -262,6 +262,7 @@ struct bpf_object {
>  		} *reloc;
>  		int nr_reloc;
>  		int maps_shndx;
> +		int btf_maps_shndx;
>  		int text_shndx;
>  		int data_shndx;
>  		int rodata_shndx;
> @@ -514,6 +515,7 @@ static struct bpf_object *bpf_object__new(const char *path,
>  	obj->efile.obj_buf = obj_buf;
>  	obj->efile.obj_buf_sz = obj_buf_sz;
>  	obj->efile.maps_shndx = -1;
> +	obj->efile.btf_maps_shndx = -1;
>  	obj->efile.data_shndx = -1;
>  	obj->efile.rodata_shndx = -1;
>  	obj->efile.bss_shndx = -1;
> @@ -1012,6 +1014,292 @@ static int bpf_object__init_user_maps(struct bpf_object *obj, bool strict)
>  	return 0;
>  }
>  
> +static const struct btf_type *skip_mods_and_typedefs(const struct btf *btf,
> +						     __u32 id)
> +{
> +	const struct btf_type *t = btf__type_by_id(btf, id);
> +
> +	while (true) {
> +		switch (BTF_INFO_KIND(t->info)) {
> +		case BTF_KIND_VOLATILE:
> +		case BTF_KIND_CONST:
> +		case BTF_KIND_RESTRICT:
> +		case BTF_KIND_TYPEDEF:
> +			t = btf__type_by_id(btf, t->type);
> +			break;
> +		default:
> +			return t;
> +		}
> +	}
> +}
> +
> +static bool get_map_attr_int(const char *map_name,
> +			     const struct btf *btf,
> +			     const struct btf_type *def,
> +			     const struct btf_member *m,
> +			     const void *data, __u32 *res)
> +{
> +	const struct btf_type *t = skip_mods_and_typedefs(btf, m->type);
> +	const char *name = btf__name_by_offset(btf, m->name_off);
> +	__u32 int_info;
> +
> +	if (BTF_INFO_KIND(t->info) != BTF_KIND_INT) {
> +		pr_warning("map '%s': attr '%s': expected INT, got %u.\n",
> +			   map_name, name, BTF_INFO_KIND(t->info));
> +		return false;
> +	}
> +	/* the INT encoding word follows the btf_type only for BTF_KIND_INT */
> +	int_info = *(const __u32 *)(const void *)(t + 1);
> +	if (t->size != 4 || BTF_INT_BITS(int_info) != 32 ||
> +	    BTF_INT_OFFSET(int_info)) {
> +		pr_warning("map '%s': attr '%s': expected 32-bit non-bitfield integer, "
> +			   "got %u-byte (%d-bit) one with bit offset %d.\n",
> +			   map_name, name, t->size, BTF_INT_BITS(int_info),
> +			   BTF_INT_OFFSET(int_info));
> +		return false;
> +	}
> +	if (BTF_INFO_KFLAG(def->info) && BTF_MEMBER_BITFIELD_SIZE(m->offset)) {
> +		pr_warning("map '%s': attr '%s': bitfield is not supported.\n",
> +			   map_name, name);
> +		return false;
> +	}
> +	if (m->offset % 32) {
> +		pr_warning("map '%s': attr '%s': unaligned fields are not supported.\n",
> +			   map_name, name);
> +		return false;
> +	}
> +
> +	*res = *(const __u32 *)(data + m->offset / 8);
> +	return true;
> +}
> +
> +static int bpf_object__init_user_btf_map(struct bpf_object *obj,
> +					 const struct btf_type *sec,
> +					 int var_idx, int sec_idx,
> +					 const Elf_Data *data)
> +{
> +	const struct btf_type *var, *def, *t;
> +	const struct btf_var_secinfo *vi;
> +	const struct btf_var *var_extra;
> +	const struct btf_member *m;
> +	const void *def_data;
> +	const char *map_name;
> +	struct bpf_map *map;
> +	int vlen, i;
> +
> +	vi = (const struct btf_var_secinfo *)(const void *)(sec + 1) + var_idx;
> +	var = btf__type_by_id(obj->btf, vi->type);
> +	var_extra = (const void *)(var + 1);
> +	map_name = btf__name_by_offset(obj->btf, var->name_off);
> +
> +	if (map_name == NULL || map_name[0] == '\0') {
> +		pr_warning("map #%d: empty name.\n", var_idx);
> +		return -EINVAL;
> +	}
> +	if ((__u64)vi->offset + vi->size > data->d_size) {
> +		pr_warning("map '%s' BTF data is corrupted.\n", map_name);
> +		return -EINVAL;
> +	}
> +	if (BTF_INFO_KIND(var->info) != BTF_KIND_VAR) {
> +		pr_warning("map '%s': unexpected var kind %u.\n",
> +			   map_name, BTF_INFO_KIND(var->info));
> +		return -EINVAL;
> +	}
> +	if (var_extra->linkage != BTF_VAR_GLOBAL_ALLOCATED &&
> +	    var_extra->linkage != BTF_VAR_STATIC) {
> +		pr_warning("map '%s': unsupported var linkage %u.\n",
> +			   map_name, var_extra->linkage);
> +		return -EOPNOTSUPP;
> +	}
> +
> +	def = skip_mods_and_typedefs(obj->btf, var->type);
> +	if (BTF_INFO_KIND(def->info) != BTF_KIND_STRUCT) {
> +		pr_warning("map '%s': unexpected def kind %u.\n",
> +			   map_name, BTF_INFO_KIND(def->info));
> +		return -EINVAL;
> +	}
> +	if (def->size > vi->size) {
> +		pr_warning("map '%s': invalid def size.\n", map_name);
> +		return -EINVAL;
> +	}
> +
> +	map = bpf_object__add_map(obj);
> +	if (IS_ERR(map))
> +		return PTR_ERR(map);
> +	map->name = strdup(map_name);
> +	if (!map->name) {
> +		pr_warning("map '%s': failed to alloc map name.\n", map_name);
> +		return -ENOMEM;
> +	}
> +	map->libbpf_type = LIBBPF_MAP_UNSPEC;
> +	map->def.type = BPF_MAP_TYPE_UNSPEC;
> +	map->sec_idx = sec_idx;
> +	map->sec_offset = vi->offset;
> +	pr_debug("map '%s': at sec_idx %d, offset %zu.\n",
> +		 map_name, map->sec_idx, map->sec_offset);
> +
> +	def_data = data->d_buf + vi->offset;
> +	vlen = BTF_INFO_VLEN(def->info);
> +	m = (const void *)(def + 1);
> +	for (i = 0; i < vlen; i++, m++) {
> +		const char *name = btf__name_by_offset(obj->btf, m->name_off);
> +
> +		if (strcmp(name, "type") == 0) {
> +			if (!get_map_attr_int(map_name, obj->btf, def, m,
> +					      def_data, &map->def.type))
> +				return -EINVAL;
> +			pr_debug("map '%s': found type = %u.\n",
> +				 map_name, map->def.type);
> +		} else if (strcmp(name, "max_entries") == 0) {
> +			if (!get_map_attr_int(map_name, obj->btf, def, m,
> +					      def_data, &map->def.max_entries))
> +				return -EINVAL;
> +			pr_debug("map '%s': found max_entries = %u.\n",
> +				 map_name, map->def.max_entries);
> +		} else if (strcmp(name, "map_flags") == 0) {
> +			if (!get_map_attr_int(map_name, obj->btf, def, m,
> +					      def_data, &map->def.map_flags))
> +				return -EINVAL;
> +			pr_debug("map '%s': found map_flags = %u.\n",
> +				 map_name, map->def.map_flags);
> +		} else if (strcmp(name, "key_size") == 0) {
> +			__u32 sz;
> +
> +			if (!get_map_attr_int(map_name, obj->btf, def, m,
> +					      def_data, &sz))
> +				return -EINVAL;
> +			pr_debug("map '%s': found key_size = %u.\n",
> +				 map_name, sz);
> +			if (map->def.key_size && map->def.key_size != sz) {
> +				pr_warning("map '%s': conflicting key size %u != %u.\n",
> +					   map_name, map->def.key_size, sz);
> +				return -EINVAL;
> +			}
> +			map->def.key_size = sz;
> +		} else if (strcmp(name, "key") == 0) {
> +			__s64 sz;
> +
> +			t = btf__type_by_id(obj->btf, m->type);
> +			if (BTF_INFO_KIND(t->info) != BTF_KIND_PTR) {
> +				pr_warning("map '%s': key spec is not PTR: %u.\n",
> +					   map_name, BTF_INFO_KIND(t->info));
> +				return -EINVAL;
> +			}
> +			sz = btf__resolve_size(obj->btf, t->type);
> +			if (sz < 0) {
> +				pr_warning("map '%s': can't determine key size for type [%u]: %lld.\n",
> +					   map_name, t->type, sz);
> +				return sz;
> +			}
> +			pr_debug("map '%s': found key [%u], sz = %lld.\n",
> +				 map_name, t->type, sz);
> +			if (map->def.key_size && map->def.key_size != sz) {
> +				pr_warning("map '%s': conflicting key size %u != %lld.\n",
> +					   map_name, map->def.key_size, sz);
> +				return -EINVAL;
> +			}
> +			map->def.key_size = sz;
> +			map->btf_key_type_id = t->type;
> +		} else if (strcmp(name, "value_size") == 0) {
> +			__u32 sz;
> +
> +			if (!get_map_attr_int(map_name, obj->btf, def, m,
> +					      def_data, &sz))
> +				return -EINVAL;
> +			pr_debug("map '%s': found value_size = %u.\n",
> +				 map_name, sz);
> +			if (map->def.value_size && map->def.value_size != sz) {
> +				pr_warning("map '%s': conflicting value size %u != %u.\n",
> +					   map_name, map->def.value_size, sz);
> +				return -EINVAL;
> +			}
> +			map->def.value_size = sz;
> +		} else if (strcmp(name, "value") == 0) {
> +			__s64 sz;
> +
> +			t = btf__type_by_id(obj->btf, m->type);
> +			if (BTF_INFO_KIND(t->info) != BTF_KIND_PTR) {
> +				pr_warning("map '%s': value spec is not PTR: %u.\n",
> +					   map_name, BTF_INFO_KIND(t->info));
> +				return -EINVAL;
> +			}
> +			sz = btf__resolve_size(obj->btf, t->type);
> +			if (sz < 0) {
> +				pr_warning("map '%s': can't determine value size for type [%u]: %lld.\n",
> +					   map_name, t->type, sz);
> +				return sz;
> +			}
> +			pr_debug("map '%s': found value [%u], sz = %lld.\n",
> +				 map_name, t->type, sz);
> +			if (map->def.value_size && map->def.value_size != sz) {
> +				pr_warning("map '%s': conflicting value size %u != %lld.\n",
> +					   map_name, map->def.value_size, sz);
> +				return -EINVAL;
> +			}
> +			map->def.value_size = sz;
> +			map->btf_value_type_id = t->type;
> +		} else {
> +			pr_debug("map '%s': ignoring unknown def field '%s'.\n",
> +				 map_name, name);
> +		}
> +	}
> +
> +	if (map->def.type == BPF_MAP_TYPE_UNSPEC) {
> +		pr_warning("map '%s': map type isn't specified.\n", map_name);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int bpf_object__init_user_btf_maps(struct bpf_object *obj)
> +{
> +	const struct btf_type *sec = NULL;
> +	int nr_types, i, vlen, err;
> +	const struct btf_type *t;
> +	const char *name;
> +	Elf_Data *data;
> +	Elf_Scn *scn;
> +
> +	if (obj->efile.btf_maps_shndx < 0)
> +		return 0;
> +
> +	scn = elf_getscn(obj->efile.elf, obj->efile.btf_maps_shndx);
> +	if (scn)
> +		data = elf_getdata(scn, NULL);
> +	if (!scn || !data) {
> +		pr_warning("failed to get Elf_Data from map section %d (%s)\n",
> +			   obj->efile.btf_maps_shndx, MAPS_ELF_SEC);
> +		return -EINVAL;
> +	}
> +
> +	nr_types = btf__get_nr_types(obj->btf);
> +	for (i = 1; i <= nr_types; i++) {
> +		t = btf__type_by_id(obj->btf, i);
> +		if (BTF_INFO_KIND(t->info) != BTF_KIND_DATASEC)
> +			continue;
> +		name = btf__name_by_offset(obj->btf, t->name_off);
> +		if (strcmp(name, MAPS_ELF_SEC) == 0) {
> +			sec = t;
> +			break;
> +		}
> +	}
> +
> +	if (!sec) {
> +		pr_warning("DATASEC '%s' not found.\n", MAPS_ELF_SEC);
> +		return -ENOENT;
> +	}
> +
> +	vlen = BTF_INFO_VLEN(sec->info);
> +	for (i = 0; i < vlen; i++) {
> +		err = bpf_object__init_user_btf_map(obj, sec, i,
> +						    obj->efile.btf_maps_shndx,
> +						    data);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
>  static int bpf_object__init_maps(struct bpf_object *obj, int flags)
>  {
>  	bool strict = !(flags & MAPS_RELAX_COMPAT);
> @@ -1021,6 +1309,10 @@ static int bpf_object__init_maps(struct bpf_object *obj, int flags)
>  	if (err)
>  		return err;
>  
> +	err = bpf_object__init_user_btf_maps(obj);
> +	if (err)
> +		return err;
> +
>  	err = bpf_object__init_global_data_maps(obj);
>  	if (err)
>  		return err;
> @@ -1118,10 +1410,16 @@ static void bpf_object__sanitize_btf_ext(struct bpf_object *obj)
>  	}
>  }
>  
> +static bool bpf_object__is_btf_mandatory(const struct bpf_object *obj)
> +{
> +	return obj->efile.btf_maps_shndx >= 0;
> +}
> +
>  static int bpf_object__init_btf(struct bpf_object *obj,
>  				Elf_Data *btf_data,
>  				Elf_Data *btf_ext_data)
>  {
> +	bool btf_required = bpf_object__is_btf_mandatory(obj);
>  	int err = 0;
>  
>  	if (btf_data) {
> @@ -1155,10 +1453,18 @@ static int bpf_object__init_btf(struct bpf_object *obj,
>  	}
>  out:
>  	if (err || IS_ERR(obj->btf)) {
> +		if (btf_required)
> +			err = err ? : PTR_ERR(obj->btf);
> +		else
> +			err = 0;
>  		if (!IS_ERR_OR_NULL(obj->btf))
>  			btf__free(obj->btf);
>  		obj->btf = NULL;
>  	}
> +	if (btf_required && !obj->btf) {
> +		pr_warning("BTF is required, but is missing or corrupted.\n");
> +		return err == 0 ? -ENOENT : err;
> +	}
>  	return 0;
>  }
>  
> @@ -1178,6 +1484,8 @@ static int bpf_object__sanitize_and_load_btf(struct bpf_object *obj)
>  			   BTF_ELF_SEC, err);
>  		btf__free(obj->btf);
>  		obj->btf = NULL;
> +		if (bpf_object__is_btf_mandatory(obj))
> +			return err;
>  	}
>  	return 0;
>  }
> @@ -1241,6 +1549,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
>  				return err;
>  		} else if (strcmp(name, "maps") == 0) {
>  			obj->efile.maps_shndx = idx;
> +		} else if (strcmp(name, MAPS_ELF_SEC) == 0) {
> +			obj->efile.btf_maps_shndx = idx;
>  		} else if (strcmp(name, BTF_ELF_SEC) == 0) {
>  			btf_data = data;
>  		} else if (strcmp(name, BTF_EXT_ELF_SEC) == 0) {
> @@ -1360,7 +1670,8 @@ static bool bpf_object__shndx_is_data(const struct bpf_object *obj,
>  static bool bpf_object__shndx_is_maps(const struct bpf_object *obj,
>  				      int shndx)
>  {
> -	return shndx == obj->efile.maps_shndx;
> +	return shndx == obj->efile.maps_shndx ||
> +	       shndx == obj->efile.btf_maps_shndx;
>  }
>  
>  static bool bpf_object__relo_in_known_section(const struct bpf_object *obj,
> @@ -1404,14 +1715,14 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>  	prog->nr_reloc = nrels;
>  
>  	for (i = 0; i < nrels; i++) {
> -		GElf_Sym sym;
> -		GElf_Rel rel;
> -		unsigned int insn_idx;
> -		unsigned int shdr_idx;
>  		struct bpf_insn *insns = prog->insns;
>  		enum libbpf_map_type type;
> +		unsigned int insn_idx;
> +		unsigned int shdr_idx;
>  		const char *name;
>  		size_t map_idx;
> +		GElf_Sym sym;
> +		GElf_Rel rel;
>  
>  		if (!gelf_getrel(data, i, &rel)) {
>  			pr_warning("relocation: failed to get %d reloc\n", i);
> @@ -1505,14 +1816,18 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>  	return 0;
>  }
>  
> -static int bpf_map_find_btf_info(struct bpf_map *map, const struct btf *btf)
> +static int bpf_map_find_btf_info(struct bpf_object *obj, struct bpf_map *map)
>  {
>  	struct bpf_map_def *def = &map->def;
>  	__u32 key_type_id = 0, value_type_id = 0;
>  	int ret;
>  
> +	/* if it's BTF-defined map, we don't need to search for type IDs */
> +	if (map->sec_idx == obj->efile.btf_maps_shndx)
> +		return 0;
> +
>  	if (!bpf_map__is_internal(map)) {
> -		ret = btf__get_map_kv_tids(btf, map->name, def->key_size,
> +		ret = btf__get_map_kv_tids(obj->btf, map->name, def->key_size,
>  					   def->value_size, &key_type_id,
>  					   &value_type_id);
>  	} else {
> @@ -1520,7 +1835,7 @@ static int bpf_map_find_btf_info(struct bpf_map *map, const struct btf *btf)
>  		 * LLVM annotates global data differently in BTF, that is,
>  		 * only as '.data', '.bss' or '.rodata'.
>  		 */
> -		ret = btf__find_by_name(btf,
> +		ret = btf__find_by_name(obj->btf,
>  				libbpf_type_to_btf_name[map->libbpf_type]);
>  	}
>  	if (ret < 0)
> @@ -1810,7 +2125,7 @@ bpf_object__create_maps(struct bpf_object *obj)
>  		    map->inner_map_fd >= 0)
>  			create_attr.inner_map_fd = map->inner_map_fd;
>  
> -		if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
> +		if (obj->btf && !bpf_map_find_btf_info(obj, map)) {
>  			create_attr.btf_fd = btf__fd(obj->btf);
>  			create_attr.btf_key_type_id = map->btf_key_type_id;
>  			create_attr.btf_value_type_id = map->btf_value_type_id;
> -- 
> 2.17.1
> 
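get_map_attr_int() above converts the BTF member offset, which is expressed in bits, into a byte offset into the variable's data, and it rejects members that are not 32-bit aligned. The following is a minimal user-space sketch of that read. It assumes a plain byte buffer in place of Elf_Data, and `read_attr_u32` is a hypothetical helper. Unlike the patch, it uses memcpy() instead of a direct `*(const __u32 *)` load to sidestep alignment and aliasing concerns. It also performs its own bounds check, whereas the patch relies on the earlier section-size and `def->size > vi->size` checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a 32-bit attribute located bit_off bits into def_data.
 * Only 32-bit-aligned members can be read as a plain byte-offset load.
 */
bool read_attr_u32(const void *def_data, size_t def_size,
		   uint32_t bit_off, uint32_t *res)
{
	if (bit_off % 32)
		return false; /* unaligned member */
	if ((size_t)bit_off / 8 + sizeof(*res) > def_size)
		return false; /* member would extend past the definition */
	memcpy(res, (const char *)def_data + bit_off / 8, sizeof(*res));
	return true;
}
```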

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-05-31 21:28   ` Stanislav Fomichev
@ 2019-05-31 22:58     ` Andrii Nakryiko
  2019-06-03  0:33       ` Jakub Kicinski
  2019-06-03 16:32       ` Stanislav Fomichev
  0 siblings, 2 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2019-05-31 22:58 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Andrii Nakryiko, Networking, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Fri, May 31, 2019 at 2:28 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 05/31, Andrii Nakryiko wrote:
> > This patch adds support for a new way to define BPF maps. It relies on
> > BTF to describe mandatory and optional attributes of a map, as well as
> > captures type information of key and value naturally. This eliminates
> > the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
> > always in sync with the key/value type.
> My 2c: this is too magical and relies on me knowing the expected fields.
> (also, the compiler won't be able to help with the misspellings).

I don't think it's really worse than the current bpf_map_def approach. In a
typical scenario, there are only two fields you need to remember: type
and max_entries (notice, they are called exactly the same as in
bpf_map_def, so this knowledge is transferable). Then you'll have
key/value, using which you are describing both type (using field's
type) and size (calculated from the type).

I can relate a bit to the point that with bpf_map_def you can find the
definition and see all possible fields, but one can also find a lot of
examples of the new map definitions as well.

One big advantage of this scheme, though, is that you get that type
association automagically, without the BPF_ANNOTATE_KV_PAIR hack and
with no chance of a mismatch. There is also less duplication (no
need to write sizeof(struct my_struct) and then pass struct my_struct
as an arg to that macro), and no need to go and ping people to add those
annotations to improve introspection of BPF maps.
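The following is a minimal sketch of the "no duplication" point. It reuses the btf_map example from the patch description, with fixed-width types. The key and value sizes follow directly from the declared types of the pointer fields, so they cannot drift out of sync the way a hand-written `sizeof(struct my_value)` plus a separate BPF_ANNOTATE_KV_PAIR() annotation can. The DEF_*_SIZE macros and helper functions are hypothetical. libbpf itself derives the sizes from BTF with btf__resolve_size(), as the patch shows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct my_value { int x, y, z; };

/* BTF-style definition from the patch description. */
struct {
	uint32_t type;
	uint32_t max_entries;
	int *key;
	struct my_value *value;
} btf_map = {
	.type = 2, /* BPF_MAP_TYPE_ARRAY */
	.max_entries = 16,
};

/*
 * sizeof does not evaluate its operand, so the NULL pointers are never
 * dereferenced; the size comes purely from the declared pointee type.
 */
#define DEF_KEY_SIZE(def)	sizeof(*(def).key)
#define DEF_VALUE_SIZE(def)	sizeof(*(def).value)

size_t btf_map_key_size(void)
{
	return DEF_KEY_SIZE(btf_map);
}

size_t btf_map_value_size(void)
{
	return DEF_VALUE_SIZE(btf_map);
}
```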

>
> I don't know how others feel about it, but I'd be much more comfortable
> with a simpler TLV-like approach. Have a new section where the format
> is |4-byte size|struct bpf_map_def_extendable|. That would essentially
> allow us to extend it the way we do with a syscall args.

It would help with extensibility, sure, though even the current
bpf_map_def approach can, sort of, be extended already. But it won't
solve the problem of capturing BTF types for key/value (see
above). Also, you'd need another macro to lay everything out properly.

>
> Also, (un)related: we don't currently use BTF internally, so if
> you convert all tests, we'd be unable to run them :-(

I'm not exactly sure what you mean by "you'd be unable to run them". Do you
mean that you use an old Clang that doesn't emit BTF? If that's what you
are saying, a lot of tests already rely on the latest Clang, so those
tests probably already don't work for you. I'll leave it up to Daniel
and Alexei to decide if we want to convert selftests right now or not.
I did it mostly to prove that we can handle all existing cases (and
found a few gotchas and bugs along the way, both in my implementation
and in the kernel - fixes coming soon).

>
> > Relying on BTF, this approach allows for both forward and backward
> > compatibility w.r.t. extending supported map definition features. Old
> > libbpf implementation will ignore fields it doesn't recognize, while new
> > implementations will parse and recognize new optional attributes.
> I also don't know how to feel about old libbpf ignoring some attributes.
> In the kernel we require that the unknown fields are zeroed.
> We probably need to do something like that here? What do you think
> would be a good example of an optional attribute?

Ignoring is required for forward compatibility, where an old libbpf will
be used to load newer user BPF programs. We can decide not to do it;
in that case it's just a question of erroring out on the first unknown
field. This RFC was posted exactly to discuss all these issues with the
wider community, as there is no single true way to do this.

As for examples of when it can be used: any feature that can be
considered optional or a hint, so that if old libbpf doesn't handle it,
it's still not the end of the world (we can live with that, or can
correct it using direct libbpf API calls).

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-05-31 22:58     ` Andrii Nakryiko
@ 2019-06-03  0:33       ` Jakub Kicinski
  2019-06-03 21:54         ` Andrii Nakryiko
  2019-06-03 16:32       ` Stanislav Fomichev
  1 sibling, 1 reply; 40+ messages in thread
From: Jakub Kicinski @ 2019-06-03  0:33 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Stanislav Fomichev, Andrii Nakryiko, Networking, bpf,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team

On Fri, 31 May 2019 15:58:41 -0700, Andrii Nakryiko wrote:
> On Fri, May 31, 2019 at 2:28 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > On 05/31, Andrii Nakryiko wrote:  
> > > This patch adds support for a new way to define BPF maps. It relies on
> > > BTF to describe mandatory and optional attributes of a map, as well as
> > > captures type information of key and value naturally. This eliminates
> > > the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
> > > always in sync with the key/value type.  
> > My 2c: this is too magical and relies on me knowing the expected fields.
> > (also, the compiler won't be able to help with the misspellings).  

I have mixed feelings, too.  Especially the key and value fields are
very non-idiomatic for C :(  They never hold any value or data, while
the other fields do.  That feels so awkward.  I'm no compiler expert,
but even something like:

struct map_def {
	void *key_type_ref;
} mamap = {
	.key_type_ref = &(struct key_xyz){},
};

would feel like less of a hack to me, and then map_def wouldn't have to
be different for every map.  But yeah, IDK (a) if it's easy to resolve
the type of what key_type_ref points to, or (b) how to do this for scalar
types.
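As plain C, the alternative above does compile. A file-scope compound literal has static storage duration, so its address is a valid constant initializer, and scalar types can be referenced in the same way, for example `&(int){ 0 }`. The following is a minimal sketch that reuses the names from the message, and `scalar_map` is hypothetical. It shows only the C side and says nothing about how BTF would describe such an object. One trade-off is worth noting: unlike a typed pointer field that stays NULL, each compound literal is a real, zero-initialized object that occupies sizeof(pointee) bytes of static storage, while the `void *` field itself carries no type.

```c
#include <assert.h>
#include <stddef.h>

struct key_xyz {
	int a;
	long b;
};

/* One generic definition type shared by every map. */
struct map_def {
	void *key_type_ref;
};

/* File-scope compound literal: static storage, constant address. */
struct map_def mamap = {
	.key_type_ref = &(struct key_xyz){ 0 },
};

/* Scalar key type referenced the same way (hypothetical map). */
struct map_def scalar_map = {
	.key_type_ref = &(int){ 0 },
};
```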

> I don't think it's really worse than current bpf_map_def approach. In
> typical scenario, there are only two fields you need to remember: type
> and max_entries (notice, they are called exactly the same as in
> bpf_map_def, so this knowledge is transferrable). Then you'll have
> key/value, using which you are describing both type (using field's
> type) and size (calculated from the type).
> 
> I can relate a bit to that with bpf_map_def you can find definition
> and see all possible fields, but one can also find a lot of examples
> for new map definitions as well.
> 
> One big advantage of this scheme, though, is that you get that type
> association automagically without using BPF_ANNOTATE_KV_PAIR hack,
> with no chance of having a mismatch, etc. This is less duplication (no
> need to do sizeof(struct my_struct) and struct my_struct as an arg to
> that macro) and there is no need to go and ping people to add those
> annotations to improve introspection of BPF maps.

> > > Relying on BTF, this approach allows for both forward and backward
> > > compatibility w.r.t. extending supported map definition features. Old
> > > libbpf implementation will ignore fields it doesn't recognize, while new
> > > implementations will parse and recognize new optional attributes.  
> > I also don't know how to feel about old libbpf ignoring some attributes.
> > In the kernel we require that the unknown fields are zeroed.
> > We probably need to do something like that here? What do you think
> > would be a good example of an optional attribute?  
> 
> Ignoring is required for forward-compatibility, where old libbpf will
> be used to load newer user BPF programs. We can decide not to do it,
> in that case it's just a question of erroring out on first unknown
> field. This RFC was posted exactly to discuss all these issues with
> more general community, as there is no single true way to do this.
> 
> As for examples of when it can be used. It's any feature that can be
> considered optional or a hint, so if old libbpf doesn't do that, it's
> still not the end of the world (and we can live with that, or can
> correct using direct libbpf API calls).

On forward compatibility, my 0.02c would be: if we want to go there
and silently ignore fields, it'd be good to have some form of "hard
required" bit.  For TLV ABIs it can be a "you have to understand
this one" bit; for libbpf perhaps we could add a "min libbpf version
required" section?  That kind of ties the ELF format to libbpf
specifics (the libbpf version presumably would imply support for
features), but I think we want to go there anyway.
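To make the "hard required" idea concrete, here is a minimal C sketch of a TLV-style attribute list in which the top bit of the type marks an attribute as must-understand: a loader skips unknown attributes without that bit and refuses unknown ones that carry it. The layout, the `ATTR_*` names, `put_attr()` and `parse_attrs()` are all hypothetical and not part of libbpf or any ELF convention; payloads are stored in host byte order.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV layout for illustration; not a libbpf/ELF format. */
#define ATTR_F_MUST_UNDERSTAND 0x8000u

enum { ATTR_MAX_ENTRIES = 1, ATTR_MAP_FLAGS = 2 };

struct attr_hdr {
	uint16_t type; /* bits 0-14: attribute id, bit 15: must-understand */
	uint16_t len;  /* payload length in bytes */
};

/* Append one attribute with a 4-byte payload; returns the new offset. */
static size_t put_attr(uint8_t *buf, size_t off, uint16_t type, uint32_t val)
{
	struct attr_hdr h = { .type = type, .len = sizeof(val) };

	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + sizeof(h), &val, sizeof(val));
	return off + sizeof(h) + sizeof(val);
}

/* Returns 0 on success, -EINVAL on malformed input and -ENOTSUP when an
 * unknown attribute is marked must-understand. */
static int parse_attrs(const uint8_t *buf, size_t size,
		       uint32_t *max_entries, uint32_t *map_flags)
{
	size_t off = 0;

	while (off < size) {
		struct attr_hdr h;
		uint16_t id;

		if (size - off < sizeof(h))
			return -EINVAL;            /* truncated header */
		memcpy(&h, buf + off, sizeof(h));
		off += sizeof(h);
		if (h.len > size - off)
			return -EINVAL;            /* truncated payload */

		id = h.type & ~ATTR_F_MUST_UNDERSTAND;
		switch (id) {
		case ATTR_MAX_ENTRIES:
		case ATTR_MAP_FLAGS:
			if (h.len != sizeof(uint32_t))
				return -EINVAL;
			memcpy(id == ATTR_MAX_ENTRIES ? max_entries : map_flags,
			       buf + off, sizeof(uint32_t));
			break;
		default:
			if (h.type & ATTR_F_MUST_UNDERSTAND)
				return -ENOTSUP;   /* newer feature we must not ignore */
			break;                     /* optional hint: safe to skip */
		}
		off += h.len;
	}
	return 0;
}
```

With this split, an older loader can still accept objects that only add optional hints, while failing loudly on anything whose semantics it cannot honour.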

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-05-31 22:58     ` Andrii Nakryiko
  2019-06-03  0:33       ` Jakub Kicinski
@ 2019-06-03 16:32       ` Stanislav Fomichev
  2019-06-03 22:03         ` Andrii Nakryiko
  1 sibling, 1 reply; 40+ messages in thread
From: Stanislav Fomichev @ 2019-06-03 16:32 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, Networking, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On 05/31, Andrii Nakryiko wrote:
> On Fri, May 31, 2019 at 2:28 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > On 05/31, Andrii Nakryiko wrote:
> > > This patch adds support for a new way to define BPF maps. It relies on
> > > BTF to describe mandatory and optional attributes of a map, as well as
> > > captures type information of key and value naturally. This eliminates
> > > the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
> > > always in sync with the key/value type.
> > My 2c: this is too magical and relies on me knowing the expected fields.
> > (also, the compiler won't be able to help with the misspellings).
> 
> I don't think it's really worse than current bpf_map_def approach. In
> typical scenario, there are only two fields you need to remember: type
> and max_entries (notice, they are called exactly the same as in
> bpf_map_def, so this knowledge is transferrable). Then you'll have
> key/value, using which you are describing both type (using field's
> type) and size (calculated from the type).
> 
> I can relate a bit to that with bpf_map_def you can find definition
> and see all possible fields, but one can also find a lot of examples
> for new map definitions as well.
> 
> One big advantage of this scheme, though, is that you get that type
> association automagically without using BPF_ANNOTATE_KV_PAIR hack,
> with no chance of having a mismatch, etc. This is less duplication (no
> need to do sizeof(struct my_struct) and struct my_struct as an arg to
> that macro) and there is no need to go and ping people to add those
> annotations to improve introspection of BPF maps.
Don't get me wrong, it looks good and there are advantages compared to
the existing way. But, again, feels to me a bit too magic. We should somehow
make it less magic (see below).

> > I don't know how others feel about it, but I'd be much more comfortable
> > with a simpler TLV-like approach. Have a new section where the format
> > is |4-byte size|struct bpf_map_def_extendable|. That would essentially
> > allow us to extend it the way we do with a syscall args.
> 
> It would help with extensibility, sure, though even current
> bpf_map_def approach sort of can be extended already. But it won't
> solve the problem of having BTF types captured for key/value (see
> above). Also, you'd need another macro to lay everything out properly.
I didn't know that we look into the list of exported symbols to estimate
the number of maps and then use it to derive struct bpf_map_def size.

In that case, maybe we can keep extending struct bpf_map_def
and support BTF mode as a better alternative? bpf_map_def could be
used as a reference for which fields there are, people can still use it
(with BPF_ANNOTATE_KV_PAIR if needed), but they can also use
new BTF mode if they find that works better for them?

Because the biggest issue for me with the BTF mode is the question
of where to look for the supported fields (and misspellings). People
on this mailing list can probably figure it out, but people who don't
work full time on bpf might find it hard. Having 'struct bpf_map_def'
as a reference (or a good supported piece of documentation) might help
with that.

What do you think? The only issue is that we now have two formats
to support :-/

> > Also, (un)related: we don't currently use BTF internally, so if
> > you convert all tests, we'd be unable to run them :-(
> 
> Not exactly sure what you mean "you'd be unable to run them". Do you
> mean that you use old Clang that doesn't emit BTF? If that's what you
> are saying, a lot of tests already rely on latest Clang, so those
> tests already don't work for you, probably. I'll leave it up to Daniel
> and Alexei to decide if we want to convert selftests right now or not.
> I did it mostly to prove that we can handle all existing cases (and
> found few gotchas and bugs along the way, both in my implementation
> and in kernel - fixes coming soon).
Yes, I mean that we don't always use the latest features of clang,
so having the existing tests in the old form (at least for a while)
would be appreciated. Good candidates to showcase new format can
be features that explicitly require BTF, stuff like spinlocks.

> > > Relying on BTF, this approach allows for both forward and backward
> > > compatibility w.r.t. extending supported map definition features. Old
> > > libbpf implementation will ignore fields it doesn't recognize, while new
> > > implementations will parse and recognize new optional attributes.
> > I also don't know how to feel about old libbpf ignoring some attributes.
> > In the kernel we require that the unknown fields are zeroed.
> > We probably need to do something like that here? What do you think
> > would be a good example of an optional attribute?
> 
> Ignoring is required for forward-compatibility, where old libbpf will
> be used to load newer user BPF programs. We can decide not to do it,
> in that case it's just a question of erroring out on first unknown
> field. This RFC was posted exactly to discuss all these issues with
> more general community, as there is no single true way to do this.
> 
> As for examples of when it can be used. It's any feature that can be
> considered optional or a hint, so if old libbpf doesn't do that, it's
> still not the end of the world (and we can live with that, or can
> correct using direct libbpf API calls).
In general, doing what we do right now with bpf_map_def (returning an error
for non-zero unknown options) seems like the safest option. We should
probably do the same with the unknown BTF fields (return an error
for non-zero value).
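A rough C sketch of the legacy behaviour referred to above, under stated assumptions: the per-map record size is derived by dividing the `maps` section size by the number of map symbols, and any bytes beyond the fields this loader knows about must be zero unless a relaxed mode is requested. `struct known_map_def` and `read_map_def()` are hypothetical stand-ins, not libbpf's actual code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the fields an (older) loader knows about. */
struct known_map_def {
	uint32_t type, key_size, value_size, max_entries, map_flags;
};

/* Copy the idx-th definition out of a section holding nr_maps equally
 * sized records. Bytes beyond what we understand must be zero, unless
 * relaxed mode is requested, in which case they only trigger a warning. */
static int read_map_def(const uint8_t *sec, size_t sec_sz, size_t nr_maps,
			size_t idx, int relaxed, struct known_map_def *out)
{
	size_t def_sz, i;
	const uint8_t *def;

	if (nr_maps == 0 || idx >= nr_maps || sec_sz % nr_maps)
		return -EINVAL;
	def_sz = sec_sz / nr_maps;          /* per-map record size */
	def = sec + idx * def_sz;

	memset(out, 0, sizeof(*out));
	if (def_sz <= sizeof(*out)) {
		memcpy(out, def, def_sz);   /* shorter, older layout: rest stays 0 */
		return 0;
	}
	for (i = sizeof(*out); i < def_sz; i++) {
		if (def[i] != 0) {
			fprintf(stderr, "map #%zu: unrecognized non-zero option\n", idx);
			if (!relaxed)
				return -EINVAL;
			break;
		}
	}
	memcpy(out, def, sizeof(*out));
	return 0;
}
```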

For a general BTF case, we can have some predefined policy: if, for example,
the field name starts with an underscore, it's optional and doesn't require
non-zero check. (or the name ends with '_opt' or some other clear policy).
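One way such a naming policy could be checked, as a hypothetical sketch: known fields are always accepted, unknown fields are accepted if their name begins with `_` or ends in `_opt`, and any other unknown field must be zero. The field list and helper names below are illustrative assumptions only.

```c
#include <stdbool.h>
#include <string.h>

/* Fields this hypothetical libbpf version understands. */
static const char *const known_fields[] = {
	"type", "max_entries", "map_flags", "key", "value", "key_size", "value_size",
};

static bool is_known_field(const char *name)
{
	for (size_t i = 0; i < sizeof(known_fields) / sizeof(known_fields[0]); i++)
		if (strcmp(name, known_fields[i]) == 0)
			return true;
	return false;
}

/* Hypothetical policy: a leading '_' or a trailing "_opt" marks a field as
 * an optional hint that an older loader may silently ignore. */
static bool is_optional_by_name(const char *name)
{
	size_t len = strlen(name);

	return name[0] == '_' ||
	       (len > 4 && strcmp(name + len - 4, "_opt") == 0);
}

/* True if the loader may proceed past this field. */
static bool field_acceptable(const char *name, unsigned long long value)
{
	if (is_known_field(name))
		return true;
	return is_optional_by_name(name) || value == 0;
}
```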

> > > The outline of the new map definition (BTF-defined maps, for short) is as follows:
> > > 1. All the maps should be defined in .maps ELF section. It's possible to
> > >    have both "legacy" map definitions in `maps` sections and BTF-defined
> > >    maps in .maps sections. Everything will still work transparently.
> > > 2. The map declaration and initialization are done through
> > >    a global/static variable of a struct type with a few mandatory and
> > >    extra optional fields:
> > >    - type field is mandatory and specifies the type of BPF map;
> > >    - key/value fields are mandatory (unless key_size/value_size are used
> > >      instead, see item 4) and capture key/value type/size information;
> > >    - max_entries attribute is optional; if max_entries is not specified or
> > >      initialized, it has to be provided at runtime through libbpf API
> > >      before loading bpf_object;
> > >    - map_flags is optional and, if not defined, is assumed to be 0.
> > > 3. Key/value fields should be **a pointer** to a type describing
> > >    key/value. The pointee type is assumed (and will be recorded as such
> > >    and used for size determination) to be a type describing key/value of
> > >    the map. This is done to save excessive amounts of space allocated in
> > >    corresponding ELF sections for key/value of big size.
> > > 4. As some maps disallow having BTF type ID associated with key/value,
> > >    it's possible to specify key/value size explicitly without
> > >    associating BTF type ID with it. Use key_size and value_size fields
> > >    to do that (see example below).
> > >
> > > Here's an example of a simple ARRAY map definition:
> > >
> > > struct my_value { int x, y, z; };
> > >
> > > struct {
> > >       int type;
> > >       int max_entries;
> > >       int *key;
> > >       struct my_value *value;
> > > } btf_map SEC(".maps") = {
> > >       .type = BPF_MAP_TYPE_ARRAY,
> > >       .max_entries = 16,
> > > };
> > >
> > > This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
> > > be of type int and thus key size will be 4 bytes. The value is struct
> > > my_value of size 12 bytes. This map can be used from C code exactly the
> > > same as with existing maps defined through struct bpf_map_def.
> > >
> > > Here's an example of STACKMAP definition (which currently disallows BTF type
> > > IDs for key/value):
> > >
> > > struct {
> > >       __u32 type;
> > >       __u32 max_entries;
> > >       __u32 map_flags;
> > >       __u32 key_size;
> > >       __u32 value_size;
> > > } stackmap SEC(".maps") = {
> > >       .type = BPF_MAP_TYPE_STACK_TRACE,
> > >       .max_entries = 128,
> > >       .map_flags = BPF_F_STACK_BUILD_ID,
> > >       .key_size = sizeof(__u32),
> > >       .value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
> > > };
> > >
> > > This approach naturally extends to map-in-map, by making the value
> > > field another struct that describes the inner map. This feature is not
> > > implemented yet. It's also possible to incrementally add features like pinning
> > > with full backward and forward compatibility.
> > >
> > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > ---
> > >  tools/lib/bpf/btf.h    |   1 +
> > >  tools/lib/bpf/libbpf.c | 333 +++++++++++++++++++++++++++++++++++++++--
> > >  2 files changed, 325 insertions(+), 9 deletions(-)

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-03  0:33       ` Jakub Kicinski
@ 2019-06-03 21:54         ` Andrii Nakryiko
  2019-06-03 23:34           ` Jakub Kicinski
  0 siblings, 1 reply; 40+ messages in thread
From: Andrii Nakryiko @ 2019-06-03 21:54 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Stanislav Fomichev, Andrii Nakryiko, Networking, bpf,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team

On Sun, Jun 2, 2019 at 5:33 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Fri, 31 May 2019 15:58:41 -0700, Andrii Nakryiko wrote:
> > On Fri, May 31, 2019 at 2:28 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > On 05/31, Andrii Nakryiko wrote:
> > > > This patch adds support for a new way to define BPF maps. It relies on
> > > > BTF to describe mandatory and optional attributes of a map, as well as
> > > > captures type information of key and value naturally. This eliminates
> > > > the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
> > > > always in sync with the key/value type.
> > > My 2c: this is too magical and relies on me knowing the expected fields.
> > > (also, the compiler won't be able to help with the misspellings).
>
> I have mixed feelings, too.  Especially the key and value fields are
> very non-idiomatic for C :(  They never hold any value or data, while
> the other fields do.  That feels so awkward.  I'm no compiler expert,
> but even something like:
>
> struct map_def {
>         void *key_type_ref;
> } mamap = {
>         .key_type_ref = &(struct key_xyz){},
> };
>
> Would feel like less of a hack to me, and then map_def doesn't have to
> be different for every map.  But yea, IDK if it's easy to (a) resolve
> the type of what key_type points to, or (b) how to do this for scalar
> types.

The syntax for scalar would be &(int){0}, that compiles.

But there are a bunch of things that make it infeasible. So let's take
an example and see what's happening:

/* huge struct */
struct custom {int a; int b; int c; int d[1000000];};

struct {
        int type;
        void *key;
        void *value;
} new_map = {
        .key = &(int){0},
        .value = &(struct custom){},
};

If we dump BTF, here's what we get:

$ bpftool btf dump file tail_call_test.o
[1] FUNC_PROTO '(anon)' ret_type_id=2 vlen=0
[2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[3] FUNC 'main' type_id=1
[4] VAR '.compoundliteral' type_id=0, linkage=static
[5] VAR '.compoundliteral.1' type_id=0, linkage=static
[6] STRUCT '(anon)' size=24 vlen=3
        'type' type_id=2 bits_offset=0
        'key' type_id=7 bits_offset=64
        'value' type_id=7 bits_offset=128
[7] PTR '(anon)' type_id=0
[8] VAR 'new_map' type_id=6, linkage=global-alloc
[9] DATASEC '.bss' size=0 vlen=2
        type_id=4 offset=0 size=4
        type_id=5 offset=4 size=4000012
[10] DATASEC '.maps' size=0 vlen=1
        type_id=8 offset=0 size=24

So notice how we get two .bss entries, one for 4 bytes (for key, var
'.compoundliteral') and another for ~4MB (for the huge struct, var
'.compoundliteral.1'). So while this won't increase the size of the ELF file,
it will force a huge .bss (and a corresponding global data map) to be
created, which is no good.

Also, notice how there is no type information associated with [4] and
[5] vars, they are just of type void. There is no type information
about struct custom at all, though it might be (?) possible to fix it
by modifying compiler to preserve more type information.

So while the second one is a technical hurdle, which we might overcome
(not sure, actually), the issue with a big .bss is a showstopper for
some applications.

To eliminate the .bss issue, we'd need something like this to capture type
information:

struct {
        void *key;
        void *value;
} new_map = {
        .key = (int *)0,
        .value = (struct custom *)0,
};

But that doesn't capture any type information for those type casts at
all, so more compiler work (if at all possible).

Which is why I think capturing type information through a plain field
declaration, using standard non-convoluted C syntax, is the most
reliable, simple, and clean way. You do initialize key/value; it's just
a NULL pointer of the corresponding type:

struct {
        int type;
        int *key;
        struct custom *value;
} new_map __attribute__((section(".maps"), used)) = {
        .type = 2,
        .key = (int *)NULL,
        .value = (struct custom *)NULL,
};
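Why the field-declaration form carries enough information can be seen with a plain-C analogy: the pointee type, and therefore the key/value size, is known at compile time even though the pointers stay NULL and the variable itself occupies only a few words. libbpf would recover the same sizes by following the BTF PTR to its pointee type rather than via `sizeof`; the `MAP_KEY_SIZE`/`MAP_VALUE_SIZE` macros are hypothetical, and the `.maps` section attribute is left off so the sketch builds as ordinary userspace C.

```c
#include <stddef.h>

/* Same shape as struct custom in the example above: 12 + 4 * 1000000 bytes. */
struct custom { int a; int b; int c; int d[1000000]; };

/* key/value stay NULL, so no storage is reserved for the pointee types,
 * yet those types remain fully known to the compiler (and to BTF). */
static struct {
	int type;
	int *key;
	struct custom *value;
} new_map = {
	.type = 2,
};

/* sizeof's operand is not evaluated (no VLAs here), so nothing is
 * actually dereferenced. */
#define MAP_KEY_SIZE(m)   sizeof(*(m).key)
#define MAP_VALUE_SIZE(m) sizeof(*(m).value)
```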


Notice, btw, that this approach doesn't prevent you from re-using struct
definitions for multiple maps, if they have identical key/value types
or if you are not capturing type information at all.

struct my_typical_map {
        int type;
        int max_entries;
        u64 *key;
        struct custom *value;
};

struct my_typical_map map1 SEC(".maps") = {
        .type = BPF_MAP_TYPE_ARRAY,
        .max_entries = 10,
};

struct my_typical_map map2 SEC(".maps") = {
        .type = BPF_MAP_TYPE_ARRAY,
        .max_entries = 20,
};

Or, you can just re-use struct bpf_map_def today like this (but you
won't have type info for key/value, of course):

struct bpf_map_def my_map_without_type_info SEC(".maps") = {
        .type = BPF_MAP_TYPE_ARRAY,
        .max_entries = 100,
        .key_size = sizeof(u64),
        .value_size = sizeof(struct custom),
};

This approach gives you as much flexibility as possible: you will only
need a different definition struct if you have different key/value
types (in C++ that would be solved by templates, but alas we are in C
land).
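Lacking templates, a C macro can stamp out one definition type per key/value pair, which is about as close as C gets. A hypothetical sketch: `DEFINE_MAP_DEF_TYPE` is not a libbpf macro, `2` is the numeric value of `BPF_MAP_TYPE_ARRAY` (used so the snippet builds without kernel headers), and `SEC(".maps")` is left off for the same reason.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: emits a map-definition struct type for one
 * key/value pair, standing in for a C++ class template. */
#define DEFINE_MAP_DEF_TYPE(name, ktype, vtype) \
	struct name {                           \
		int type;                       \
		int max_entries;                \
		ktype *key;                     \
		vtype *value;                   \
	}

struct my_value { int x, y, z; };

DEFINE_MAP_DEF_TYPE(u64_my_value_map_def, uint64_t, struct my_value);

/* Two maps sharing one definition type; in a BPF object each would also
 * carry SEC(".maps"). The type value 2 is BPF_MAP_TYPE_ARRAY. */
static struct u64_my_value_map_def map1 = { .type = 2, .max_entries = 10 };
static struct u64_my_value_map_def map2 = { .type = 2, .max_entries = 20 };
```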


>
> > I don't think it's really worse than current bpf_map_def approach. In
> > typical scenario, there are only two fields you need to remember: type
> > and max_entries (notice, they are called exactly the same as in
> > bpf_map_def, so this knowledge is transferrable). Then you'll have
> > key/value, using which you are describing both type (using field's
> > type) and size (calculated from the type).
> >
> > I can relate a bit to that with bpf_map_def you can find definition
> > and see all possible fields, but one can also find a lot of examples
> > for new map definitions as well.
> >
> > One big advantage of this scheme, though, is that you get that type
> > association automagically without using BPF_ANNOTATE_KV_PAIR hack,
> > with no chance of having a mismatch, etc. This is less duplication (no
> > need to do sizeof(struct my_struct) and struct my_struct as an arg to
> > that macro) and there is no need to go and ping people to add those
> > annotations to improve introspection of BPF maps.
>
> > > > Relying on BTF, this approach allows for both forward and backward
> > > > compatibility w.r.t. extending supported map definition features. Old
> > > > libbpf implementation will ignore fields it doesn't recognize, while new
> > > > implementations will parse and recognize new optional attributes.
> > > I also don't know how to feel about old libbpf ignoring some attributes.
> > > In the kernel we require that the unknown fields are zeroed.
> > > We probably need to do something like that here? What do you think
> > > would be a good example of an optional attribute?
> >
> > Ignoring is required for forward-compatibility, where old libbpf will
> > be used to load newer user BPF programs. We can decide not to do it,
> > in that case it's just a question of erroring out on first unknown
> > field. This RFC was posted exactly to discuss all these issues with
> > more general community, as there is no single true way to do this.
> >
> > As for examples of when it can be used. It's any feature that can be
> > considered optional or a hint, so if old libbpf doesn't do that, it's
> > still not the end of the world (and we can live with that, or can
> > correct using direct libbpf API calls).
>
> On forward compatibility, my 0.02c would be: if we want to go there
> and silently ignore fields, it'd be good to have some form of "hard
> required" bit.  For TLV ABIs it can be a "you have to understand
> this one" bit; for libbpf perhaps we could add a "min libbpf version
> required" section?  That kind of ties the ELF format to libbpf
> specifics (the libbpf version presumably would imply support for
> features), but I think we want to go there anyway.

I think we can go with a strict/non-strict mode, which libbpf already
supports via the MAPS_RELAX_COMPAT flag (see
__bpf_object__open_xattr). Would that work?

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-03 16:32       ` Stanislav Fomichev
@ 2019-06-03 22:03         ` Andrii Nakryiko
  2019-06-04  1:02           ` Stanislav Fomichev
  0 siblings, 1 reply; 40+ messages in thread
From: Andrii Nakryiko @ 2019-06-03 22:03 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Andrii Nakryiko, Networking, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Mon, Jun 3, 2019 at 9:32 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 05/31, Andrii Nakryiko wrote:
> > On Fri, May 31, 2019 at 2:28 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > >
> > > On 05/31, Andrii Nakryiko wrote:
> > > > This patch adds support for a new way to define BPF maps. It relies on
> > > > BTF to describe mandatory and optional attributes of a map, as well as
> > > > captures type information of key and value naturally. This eliminates
> > > > the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
> > > > always in sync with the key/value type.
> > > My 2c: this is too magical and relies on me knowing the expected fields.
> > > (also, the compiler won't be able to help with the misspellings).
> >
> > I don't think it's really worse than current bpf_map_def approach. In
> > typical scenario, there are only two fields you need to remember: type
> > and max_entries (notice, they are called exactly the same as in
> > bpf_map_def, so this knowledge is transferrable). Then you'll have
> > key/value, using which you are describing both type (using field's
> > type) and size (calculated from the type).
> >
> > I can relate a bit to that with bpf_map_def you can find definition
> > and see all possible fields, but one can also find a lot of examples
> > for new map definitions as well.
> >
> > One big advantage of this scheme, though, is that you get that type
> > association automagically without using BPF_ANNOTATE_KV_PAIR hack,
> > with no chance of having a mismatch, etc. This is less duplication (no
> > need to do sizeof(struct my_struct) and struct my_struct as an arg to
> > that macro) and there is no need to go and ping people to add those
> > annotations to improve introspection of BPF maps.
> Don't get me wrong, it looks good and there are advantages compared to
> the existing way. But, again, feels to me a bit too magic. We should somehow
> make it less magic (see below).
>
> > > I don't know how others feel about it, but I'd be much more comfortable
> > > with a simpler TLV-like approach. Have a new section where the format
> > > is |4-byte size|struct bpf_map_def_extendable|. That would essentially
> > > allow us to extend it the way we do with a syscall args.
> >
> > It would help with extensibility, sure, though even current
> > bpf_map_def approach sort of can be extended already. But it won't
> > solve the problem of having BTF types captured for key/value (see
> > above). Also, you'd need another macro to lay everything out properly.
> I didn't know that we look into the list of exported symbols to estimate
> the number of maps and then use it to derive struct bpf_map_def size.
>
> In that case, maybe we can keep extending struct bpf_map_def
> and support BTF mode as a better alternative? bpf_map_def could be
> used as a reference for which fields there are, people can still use it
> (with BPF_ANNOTATE_KV_PAIR if needed), but they can also use
> new BTF mode if they find that works better for them?
>
> Because the biggest issue for me with the BTF mode is the question
> of where to look for the supported fields (and misspellings). People
> on this mailing list can probably figure it out, but people who don't
> work full time on bpf might find it hard. Having 'struct bpf_map_def'
> as a reference (or a good supported piece of documentation) might help

So yeah, it seems it's more about documentation and examples than
about having a C struct in code, right? Today, if I need to add a new
map, I copy/paste from an example or existing code, or look up the
documentation. You'll be able to do the same with the new way (just grep
for \.maps).

> with that.
>
> What do you think? The only issue is that we now have two formats
> to support :-/

We'll have to support existing bpf_map_def for backwards compatibility
(and see my reply to Jakub: you can just plain re-use struct
bpf_map_def today with the BTF approach, just put it into the .maps section),
but I'd love to avoid having to support new features in two
different ways, so if we go with BTF, I'd restrict new features to BTF
only, moving forward.

>
> > > Also, (un)related: we don't currently use BTF internally, so if
> > > you convert all tests, we'd be unable to run them :-(
> >
> > Not exactly sure what you mean "you'd be unable to run them". Do you
> > mean that you use old Clang that doesn't emit BTF? If that's what you
> > are saying, a lot of tests already rely on latest Clang, so those
> > tests already don't work for you, probably. I'll leave it up to Daniel
> > and Alexei to decide if we want to convert selftests right now or not.
> > I did it mostly to prove that we can handle all existing cases (and
> > found few gotchas and bugs along the way, both in my implementation
> > and in kernel - fixes coming soon).
> Yes, I mean that we don't always use the latest features of clang,
> so having the existing tests in the old form (at least for a while)
> would be appreciated. Good candidates to showcase new format can
> be features that explicitly require BTF, stuff like spinlocks.

I totally understand the concern, but I'll still defer to the maintainers to
make a call as to when to do the conversion.

>
> > > > Relying on BTF, this approach allows for both forward and backward
> > > > compatibility w.r.t. extending supported map definition features. Old
> > > > libbpf implementation will ignore fields it doesn't recognize, while new
> > > > implementations will parse and recognize new optional attributes.
> > > I also don't know how to feel about old libbpf ignoring some attributes.
> > > In the kernel we require that the unknown fields are zeroed.
> > > We probably need to do something like that here? What do you think
> > > would be a good example of an optional attribute?
> >
> > Ignoring is required for forward-compatibility, where old libbpf will
> > be used to load newer user BPF programs. We can decide not to do it,
> > in that case it's just a question of erroring out on first unknown
> > field. This RFC was posted exactly to discuss all these issues with
> > more general community, as there is no single true way to do this.
> >
> > As for examples of when it can be used. It's any feature that can be
> > considered optional or a hint, so if old libbpf doesn't do that, it's
> > still not the end of the world (and we can live with that, or can
> > correct using direct libbpf API calls).
> In general, doing what we do right now with bpf_map_def (returning an error
> for non-zero unknown options) seems like the safest option. We should
> probably do the same with the unknown BTF fields (return an error
> for non-zero value).

Yeah, as I replied to Jakub, libbpf already has strict/non-strict
mode, we should probably do the same. The only potential difference is
that there is no need to check for zeros and stuff: just don't define
a field. And using an extra flag, we can allow more relaxed semantics
(just debug/info/warn message on unknown fields). This is what
__bpf_object__open_xattr does today with MAPS_RELAX_COMPAT flag.
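A hedged sketch of that strict/relaxed split, assuming the map's BTF struct has already been walked into a list of name/value pairs: fields that are not declared keep their defaults, and unknown fields fail the load in strict mode but only produce a warning in relaxed mode. The `member`/`parsed_def` types and `parse_map_def()` are hypothetical; the `relaxed` flag plays a role similar to MAPS_RELAX_COMPAT, but this is not libbpf code, and key/value handling is omitted for brevity.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One integer-valued member as it might look after walking the map's BTF
 * struct; this representation is hypothetical. */
struct member { const char *name; unsigned int value; };

struct parsed_def {
	unsigned int type, max_entries, map_flags;
	bool has_type;
};

/* Undeclared fields keep their defaults (map_flags = 0, max_entries = 0,
 * i.e. "set later via API"). Unknown fields are an error in strict mode
 * and only a warning in relaxed mode. */
static int parse_map_def(const struct member *m, size_t n, bool relaxed,
			 struct parsed_def *def)
{
	memset(def, 0, sizeof(*def));
	for (size_t i = 0; i < n; i++) {
		if (strcmp(m[i].name, "type") == 0) {
			def->type = m[i].value;
			def->has_type = true;
		} else if (strcmp(m[i].name, "max_entries") == 0) {
			def->max_entries = m[i].value;
		} else if (strcmp(m[i].name, "map_flags") == 0) {
			def->map_flags = m[i].value;
		} else if (relaxed) {
			fprintf(stderr, "ignoring unknown map field '%s'\n", m[i].name);
		} else {
			fprintf(stderr, "unknown map field '%s'\n", m[i].name);
			return -EINVAL;
		}
	}
	return def->has_type ? 0 : -EINVAL;  /* type is mandatory */
}
```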

>
> For a general BTF case, we can have some predefined policy: if, for example,
> the field name starts with an underscore, it's optional and doesn't require
> non-zero check. (or the name ends with '_opt' or some other clear policy).
>
> > > > The outline of the new map definition (BTF-defined maps, for short) is as follows:
> > > > 1. All the maps should be defined in .maps ELF section. It's possible to
> > > >    have both "legacy" map definitions in `maps` sections and BTF-defined
> > > >    maps in .maps sections. Everything will still work transparently.
> > > > 2. The map declaration and initialization are done through
> > > >    a global/static variable of a struct type with a few mandatory and
> > > >    extra optional fields:
> > > >    - type field is mandatory and specifies the type of BPF map;
> > > >    - key/value fields are mandatory (unless key_size/value_size are used
> > > >      instead, see item 4) and capture key/value type/size information;
> > > >    - max_entries attribute is optional; if max_entries is not specified or
> > > >      initialized, it has to be provided at runtime through libbpf API
> > > >      before loading bpf_object;
> > > >    - map_flags is optional and, if not defined, is assumed to be 0.
> > > > 3. Key/value fields should be **a pointer** to a type describing
> > > >    key/value. The pointee type is assumed (and will be recorded as such
> > > >    and used for size determination) to be a type describing key/value of
> > > >    the map. This is done to save excessive amounts of space allocated in
> > > >    corresponding ELF sections for key/value of big size.
> > > > 4. As some maps disallow having BTF type ID associated with key/value,
> > > >    it's possible to specify key/value size explicitly without
> > > >    associating BTF type ID with it. Use key_size and value_size fields
> > > >    to do that (see example below).
> > > >
> > > > Here's an example of a simple ARRAY map definition:
> > > >
> > > > struct my_value { int x, y, z; };
> > > >
> > > > struct {
> > > >       int type;
> > > >       int max_entries;
> > > >       int *key;
> > > >       struct my_value *value;
> > > > } btf_map SEC(".maps") = {
> > > >       .type = BPF_MAP_TYPE_ARRAY,
> > > >       .max_entries = 16,
> > > > };
> > > >
> > > > This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
> > > > be of type int and thus key size will be 4 bytes. The value is struct
> > > > my_value of size 12 bytes. This map can be used from C code exactly the
> > > > same as with existing maps defined through struct bpf_map_def.
> > > >
> > > > Here's an example of STACKMAP definition (which currently disallows BTF type
> > > > IDs for key/value):
> > > >
> > > > struct {
> > > >       __u32 type;
> > > >       __u32 max_entries;
> > > >       __u32 map_flags;
> > > >       __u32 key_size;
> > > >       __u32 value_size;
> > > > } stackmap SEC(".maps") = {
> > > >       .type = BPF_MAP_TYPE_STACK_TRACE,
> > > >       .max_entries = 128,
> > > >       .map_flags = BPF_F_STACK_BUILD_ID,
> > > >       .key_size = sizeof(__u32),
> > > >       .value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
> > > > };
> > > >
> > > > This approach naturally extends to map-in-map, by making the value
> > > > field another struct that describes the inner map. This feature is not
> > > > implemented yet. It's also possible to incrementally add features like pinning
> > > > with full backward and forward compatibility.
> > > >
> > > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > > ---
> > > >  tools/lib/bpf/btf.h    |   1 +
> > > >  tools/lib/bpf/libbpf.c | 333 +++++++++++++++++++++++++++++++++++++++--
> > > >  2 files changed, 325 insertions(+), 9 deletions(-)

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-05-31 20:21 ` [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF Andrii Nakryiko
  2019-05-31 21:28   ` Stanislav Fomichev
@ 2019-06-03 22:34   ` Andrii Nakryiko
  2019-06-06 16:42   ` Lorenz Bauer
  2 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2019-06-03 22:34 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Networking, bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team

On Fri, May 31, 2019 at 1:21 PM Andrii Nakryiko <andriin@fb.com> wrote:
>
> This patch adds support for a new way to define BPF maps. It relies on
> BTF to describe mandatory and optional attributes of a map, as well as
> captures type information of key and value naturally. This eliminates
> the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
> always in sync with the key/value type.
>
> Relying on BTF, this approach allows for both forward and backward
> compatibility w.r.t. extending supported map definition features. Old
> libbpf implementation will ignore fields it doesn't recognize, while new
> implementations will parse and recognize new optional attributes.
>
> The outline of the new map definition (short, BTF-defined maps) is as follows:
> 1. All the maps should be defined in .maps ELF section. It's possible to
>    have both "legacy" map definitions in `maps` sections and BTF-defined
>    maps in .maps sections. Everything will still work transparently.
> 2. The map declaration and initialization are done through
>    a global/static variable of a struct type with a few mandatory and
>    some extra optional fields:
>    - the type field is mandatory and specifies the type of the BPF map;
>    - key/value fields are mandatory and capture key/value type/size information;
>    - max_entries attribute is optional; if max_entries is not specified or
>      initialized, it has to be provided at runtime through libbpf API
>      before loading bpf_object;
>    - map_flags is optional and if not defined, will be assumed to be 0.
> 3. Key/value fields should be **a pointer** to a type describing
>    key/value. The pointee type is assumed (and will be recorded as such
>    and used for size determination) to be a type describing key/value of
>    the map. This is done to save excessive amounts of space allocated in
>    corresponding ELF sections for key/value of big size.
> 4. As some maps disallow having BTF type ID associated with key/value,
>    it's possible to specify key/value size explicitly without
>    associating BTF type ID with it. Use key_size and value_size fields
>    to do that (see example below).
>
> Here's an example of a simple ARRAY map definition:
>
> struct my_value { int x, y, z; };
>
> struct {
>         int type;
>         int max_entries;
>         int *key;
>         struct my_value *value;
> } btf_map SEC(".maps") = {
>         .type = BPF_MAP_TYPE_ARRAY,
>         .max_entries = 16,
> };
>
> This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
> be of type int and thus key size will be 4 bytes. The value is struct
> my_value of size 12 bytes. This map can be used from C code exactly the
> same as with existing maps defined through struct bpf_map_def.
>
> Here's an example of STACKMAP definition (which currently disallows BTF type
> IDs for key/value):
>
> struct {
>         __u32 type;
>         __u32 max_entries;
>         __u32 map_flags;
>         __u32 key_size;
>         __u32 value_size;
> } stackmap SEC(".maps") = {
>         .type = BPF_MAP_TYPE_STACK_TRACE,
>         .max_entries = 128,
>         .map_flags = BPF_F_STACK_BUILD_ID,
>         .key_size = sizeof(__u32),
>         .value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
> };
>
> This approach extends naturally to support map-in-map, by making the value
> field another struct that describes the inner map. This feature is not
> implemented yet. It's also possible to incrementally add features like pinning
> with full backwards and forward compatibility.

So I wanted to elaborate a bit more on what I'm planning to add, once
we agree on the approach. Those are the features that are currently
supported by iproute2 loader and here's how I was thinking to support
them with BTF-defined maps. Once all this is implemented, there should
be just a mechanical field rename to switch BPF apps relying on
iproute2 loader (size_key -> key_size, size_value -> value_size,
max_elem -> max_entries) for most maps. For more complicated cases
described below, I hope we can agree it's easy to migrate and the end
result might even look better (because it is more explicit).

1. Pinning. This one is simple:
  - add a pinning attribute, which will be one of "no pinning", "global
pinning", or "object-scope pinning".
  - by default pinning root will be "/sys/fs/bpf", but one will be
able to override this per-object using extra options (so that
"/sys/fs/bpf/tc" can be specified).

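A minimal sketch of how the three proposed pinning modes might map to a pin path. The enum names, the `pin_path()` helper, and the directory layout (global maps directly under the root, object-scope maps under a per-object subdirectory) are assumptions for illustration only; neither libbpf nor this proposal defines them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pinning modes mirroring the three options listed above. */
enum pin_mode { PIN_NONE, PIN_GLOBAL, PIN_OBJECT };

#define DEFAULT_PIN_ROOT "/sys/fs/bpf"

/* Build the pin path for a map. Returns 0 and fills buf on success,
 * 1 if the map is not pinned, -1 if buf is too small.
 * root == NULL selects the default root; a per-object override such as
 * "/sys/fs/bpf/tc" is passed in explicitly. */
static int pin_path(char *buf, size_t len, enum pin_mode mode,
		    const char *root, const char *obj_name, const char *map_name)
{
	int n;

	if (!root)
		root = DEFAULT_PIN_ROOT;

	switch (mode) {
	case PIN_NONE:
		return 1;
	case PIN_GLOBAL:
		n = snprintf(buf, len, "%s/%s", root, map_name);
		break;
	case PIN_OBJECT:
		n = snprintf(buf, len, "%s/%s/%s", root, obj_name, map_name);
		break;
	default:
		return -1;
	}
	/* snprintf reports the untruncated length; treat truncation as an error. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Keeping the root as a parameter is what would let a single object opt into "/sys/fs/bpf/tc" without changing the map definitions themselves.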
2. Map-in-map declaration:

As outlined at LSF/MM, we can extend value type to be another map
definition, specifying a prototype for inner map:

struct {
        int type;
        int max_entries;
        struct outer_key *key;
        struct { /* this is definition of inner map */
               int type;
               int max_entries;
               struct inner_key *key;
               struct inner_value *value;
        } value;
} my_hash_of_arrays BPF_MAP = {
        .type = BPF_MAP_TYPE_HASH_OF_MAPS,
        .max_entries = 1024,
        .value = {
                .type = BPF_MAP_TYPE_ARRAY,
                .max_entries = 64,
        },
};

This would declare a hash_of_maps, where inner maps are arrays of 64
elements each. Note that the struct defining the inner map can be declared
outside and shared with other maps:

struct inner_map_t {
        int type;
        int max_entries;
        struct inner_key *key;
        struct inner_value *value;
};

struct {
        int type;
        int max_entries;
        struct outer_key *key;
        struct inner_map_t value;
} my_hash_of_arrays BPF_MAP = {
        .type = BPF_MAP_TYPE_HASH_OF_MAPS,
        .max_entries = 1024,
        .value = {
                .type = BPF_MAP_TYPE_ARRAY,
                .max_entries = 64,
        },
};
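Because the nested form is ordinary C, a shared inner prototype can be compiled and inspected in userspace. In this sketch the `MAP_T_*` constants are stand-ins with arbitrary values for the real `BPF_MAP_TYPE_*` ones, the key/value structs are placeholders, and `struct hash_of_arrays_def` is a hypothetical named version of the anonymous outer struct; none of it is libbpf API.

```c
#include <assert.h>

/* Stand-ins for BPF_MAP_TYPE_* so the sketch builds without kernel headers. */
enum { MAP_T_HASH_OF_MAPS = 1, MAP_T_ARRAY = 2 };

struct outer_key { int id; };
struct inner_key { int idx; };
struct inner_value { long counter; };

/* Shared inner-map prototype, as in the second example above. */
struct inner_map_t {
	int type;
	int max_entries;
	struct inner_key *key;
	struct inner_value *value;
};

/* In the proposal, the outer 'value' is embedded by value (not a pointer):
 * it describes the inner map rather than a value type. */
struct hash_of_arrays_def {
	int type;
	int max_entries;
	struct outer_key *key;
	struct inner_map_t value;
};

static struct hash_of_arrays_def my_hash_of_arrays = {
	.type = MAP_T_HASH_OF_MAPS,
	.max_entries = 1024,
	.value = {
		.type = MAP_T_ARRAY,
		.max_entries = 64,
	},
};
```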


3. Initialization of prog array. Iproute2 supports a convention-driven
initialization of BPF_MAP_TYPE_PROG_ARRAY using special section names
(wrapped into __section_tail(ID, IDX)):

struct bpf_elf_map SEC("maps") POLICY_CALL_MAP = {
        .type = BPF_MAP_TYPE_PROG_ARRAY,
        .id = MAP_ID,
        .size_key = sizeof(__u32),
        .size_value = sizeof(__u32),
        .max_elem = 16,
};

__section_tail(MAP_ID, MAP_IDX) int handle_policy(struct __sk_buff *skb)
{
        ...
}

For each such program, iproute2 will put its FD (for later
tail-calling) into a corresponding MAP with id == MAP_ID at index
MAP_IDX.

Here's how I see this supported in BTF-defined maps case.

typedef int (* skbuff_tailcall_fn)(struct __sk_buff *);

struct {
        int type;
        int max_entries;
        int *key;
        skbuff_tailcall_fn value[];
} POLICY_CALL_MAP SEC(".maps") = {
        .type = BPF_MAP_TYPE_PROG_ARRAY,
        .max_entries = 16,
        .value = {
                &handle_policy,
                NULL,
                &handle_some_other_policy,
        },
};

The libbpf loader will create a BPF_MAP_TYPE_PROG_ARRAY map with 16 elements
and will initialize the first and third entries with FDs of handle_policy
and handle_some_other_policy programs. As an added nice bonus,
compiler should also warn on signature mismatch. ;)
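A userspace model of the proposed PROG_ARRAY initialization: the `value` array holds function pointers, and the loader would walk it and install an FD for each non-NULL slot. `struct skb_stub`, `prog_fd_of()`, and the fake FD numbers are hypothetical stand-ins for `struct __sk_buff` and for real program loading; the sketch only shows the slot-to-program mapping and the type checking the typedef provides.

```c
#include <assert.h>
#include <stddef.h>

struct skb_stub { int len; };                 /* stand-in for struct __sk_buff */
typedef int (*skbuff_tailcall_fn)(struct skb_stub *);

static int handle_policy(struct skb_stub *skb) { return skb->len; }
static int handle_some_other_policy(struct skb_stub *skb) { return -skb->len; }

#define PROG_ARRAY_MAX 16

/* Unlisted slots are zero-initialized, i.e. NULL (left empty by the loader).
 * Putting a function with a different signature here draws a compiler diagnostic. */
static skbuff_tailcall_fn policy_calls[PROG_ARRAY_MAX] = {
	handle_policy,
	NULL,
	handle_some_other_policy,
};

/* Hypothetical: pretend each loaded program got a distinct FD. */
static int prog_fd_of(skbuff_tailcall_fn fn)
{
	if (fn == handle_policy)
		return 10;
	if (fn == handle_some_other_policy)
		return 11;
	return -1;
}

/* One FD per non-NULL slot, -1 for empty slots.
 * Returns the number of populated slots. */
static int init_prog_array(const skbuff_tailcall_fn *progs, int *fds, int n)
{
	int filled = 0;

	for (int i = 0; i < n; i++) {
		fds[i] = progs[i] ? prog_fd_of(progs[i]) : -1;
		if (progs[i])
			filled++;
	}
	return filled;
}
```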


4. We can extend this idea into ARRAY_OF_MAPS initialization. This is
currently implemented in iproute2 using .id, .inner_id, and .inner_idx
fields.

struct inner_map_t {
        int type;
        int max_entries;
        struct inner_key *key;
        struct inner_value *value;
};

struct inner_map_t map1 = {...};
struct inner_map_t map2 = {...};

struct {
        int type;
        int max_entries;
        struct outer_key *key;
        struct inner_map_t *value[];
} my_array_of_maps BPF_MAP = {
        .type = BPF_MAP_TYPE_ARRAY_OF_MAPS,
        .max_entries = 2,
        .value = {
                &map1,
                &map2,
        },
};


There are a bunch of slight variations we might consider (e.g., value
vs values, when there is inline initialization, is it an array of
structs or an array of pointers to structs, etc), but the overall idea
stays the same.
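Taking the array-of-pointers variant, a sketch of how a loader might walk the `.value` list before placing inner-map FDs at each index. The consistency rule (every populated slot must have the same map type as the first) is an assumption standing in for whatever check a real loader would apply; `MAP_T_*` are stand-ins with arbitrary values, and a fixed bound replaces `value[]` so the static initializer is standard C.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for BPF_MAP_TYPE_* so the sketch builds without kernel headers. */
enum { MAP_T_ARRAY = 2, MAP_T_ARRAY_OF_MAPS = 3 };

struct outer_key { int idx; };
struct inner_key { int idx; };
struct inner_value { long counter; };

struct inner_map_t {
	int type;
	int max_entries;
	struct inner_key *key;
	struct inner_value *value;
};

static struct inner_map_t map1 = { .type = MAP_T_ARRAY, .max_entries = 64 };
static struct inner_map_t map2 = { .type = MAP_T_ARRAY, .max_entries = 64 };

struct array_of_maps_def {
	int type;
	int max_entries;
	struct outer_key *key;
	struct inner_map_t *value[2]; /* array of pointers to inner definitions */
};

static struct array_of_maps_def my_array_of_maps = {
	.type = MAP_T_ARRAY_OF_MAPS,
	.max_entries = 2,
	.value = { &map1, &map2 },
};

/* Hypothetical loader check. Returns the number of populated slots,
 * or -1 if a slot's map type differs from the first populated one. */
static int check_inner_maps(struct inner_map_t *const *slots, int n)
{
	const struct inner_map_t *first = NULL;
	int used = 0;

	for (int i = 0; i < n; i++) {
		if (!slots[i])
			continue;
		if (!first)
			first = slots[i];
		else if (slots[i]->type != first->type)
			return -1;
		used++;
	}
	return used;
}
```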

So when all this is implemented and supported, from looking at Cilium,
it seems like migrating from the iproute2 loader to libbpf should be rather simple
and painless. I'd be curious to hear what Cilium folks are thinking
about that.



>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-03 21:54         ` Andrii Nakryiko
@ 2019-06-03 23:34           ` Jakub Kicinski
  0 siblings, 0 replies; 40+ messages in thread
From: Jakub Kicinski @ 2019-06-03 23:34 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Stanislav Fomichev, Andrii Nakryiko, Networking, bpf,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team

On Mon, 3 Jun 2019 14:54:53 -0700, Andrii Nakryiko wrote:
> On Sun, Jun 2, 2019 at 5:33 PM Jakub Kicinski wrote:
> > On Fri, 31 May 2019 15:58:41 -0700, Andrii Nakryiko wrote:  
> > > On Fri, May 31, 2019 at 2:28 PM Stanislav Fomichev <sdf@fomichev.me> wrote:  
> > > > On 05/31, Andrii Nakryiko wrote:  
> > > > > This patch adds support for a new way to define BPF maps. It relies on
> > > > > BTF to describe mandatory and optional attributes of a map, as well as
> > > > > captures type information of key and value naturally. This eliminates
> > > > > the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
> > > > > always in sync with the key/value type.  
> > > > My 2c: this is too magical and relies on me knowing the expected fields.
> > > > (also, the compiler won't be able to help with the misspellings).  
> >
> > I have mixed feelings, too.  Especially the key and value fields are
> > very non-idiomatic for C :(  They never hold any value or data, while
> > the other fields do.  That feels so awkward.  I'm no compiler expert,
> > but even something like:
> >
> > struct map_def {
> >         void *key_type_ref;
> > } mamap = {
> >         .key_type_ref = &(struct key_xyz){},
> > };
> >
> > Would feel like less of a hack to me, and then map_def doesn't have to
> > be different for every map.  But yea, IDK if it's easy to (a) resolve
> > the type of what key_type points to, or (b) how to do this for scalar
> > types.  
> 
> The syntax for scalar would be &(int){0}, that compiles.
> 
> But there are a bunch of things that make it infeasible. So let's take
> an example and see what's happening:
> 
> /* huge struct */
> struct custom {int a; int b; int c; int d[1000000];};
> 
> struct {
>         void *key;
>         void *value;
> } new_map = {
>         .key = &(int){0},
>         .value = &(struct custom){},
> };
> 
> If we dump BTF, here's what we get:
> 
> $ bpftool btf dump file tail_call_test.o
> [1] FUNC_PROTO '(anon)' ret_type_id=2 vlen=0
> [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
> [3] FUNC 'main' type_id=1
> [4] VAR '.compoundliteral' type_id=0, linkage=static
> [5] VAR '.compoundliteral.1' type_id=0, linkage=static
> [6] STRUCT '(anon)' size=24 vlen=3
>         'type' type_id=2 bits_offset=0
>         'key' type_id=7 bits_offset=64
>         'value' type_id=7 bits_offset=128
> [7] PTR '(anon)' type_id=0
> [8] VAR 'new_map' type_id=6, linkage=global-alloc
> [9] DATASEC '.bss' size=0 vlen=2
>         type_id=4 offset=0 size=4
>         type_id=5 offset=4 size=4000012
> [10] DATASEC '.maps' size=0 vlen=1
>         type_id=8 offset=0 size=24
> 
> So notice how we get two .bss entries, one for 4 bytes (for key, var
> '.compoundliteral') and another for 4MB (for the huge struct, var
> '.compoundliteral.1'). So while this won't increase the size of the ELF file,
> it will force a huge .bss (and corresponding global data map) to be
> created, which is no good.
> 
> Also, notice how there is no type information associated with [4] and
> [5] vars, they are just of type void. There is no type information
> about struct custom at all, though it might be (?) possible to fix it
> by modifying compiler to preserve more type information.
> 
> So while the second one is a technical hurdle, which we might overcome
> (not sure, actually), the issue with big .BSS is a showstopper for
> some applications.

Ah :/
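The sizes in the BTF dump can be checked with plain C: `struct custom` holds 3 + 1,000,000 `int`s, i.e. 4,000,012 bytes with a 4-byte `int` (the `.compoundliteral.1` entry), while a pointer-typed member costs only a pointer in the definition. This is a userspace illustration; `struct ptr_style_def` and `value_via_literal` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Same huge struct as in the BTF dump above: 3 + 1,000,000 ints. */
struct custom { int a; int b; int c; int d[1000000]; };

/* Compound-literal style: &(struct custom){ 0 } at file scope requires a real,
 * statically allocated struct custom object just so there is something to
 * point at. Being zero-initialized, it typically lands in .bss. */
static struct custom *const value_via_literal = &(struct custom){ 0 };

/* Pointer-member style: the definition stores only a pointer (left NULL),
 * and the pointee type carries the size information. */
struct ptr_style_def {
	int type;
	int *key;
	struct custom *value;
};
```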

> To eliminate .BSS issue, we'd need something like this to capture type
> information:

Well, or we can track that the part of BSS is only referenced from map
def, but that's hairy as well.

> struct {
>         void *key;
>         void *value;
> } new_map = {
>         .key = (int *)0,
>         .value = (struct custom *)0,
> };
> 
> But that doesn't capture any type information for those type casts at
> all, so more compiler work (if at all possible).
> 
> Which is why I think capturing type information using a standard
> non-convoluted C way/syntax using a field declaration is the most
> reliable, simple, and clean way. You do initialize key/value; each is just
> a NULL pointer to the corresponding type:
> 
> struct {
>         int type;
>         int *key;
>         struct custom *value;
> } new_map __attribute__((section(".maps"), used)) = {
>         .type = 2,
>         .key = (int *)0,
>         .value = (struct custom *)NULL,
> };
> 
> 
> Notice, btw, that this approach doesn't prevent you from re-using struct
> definitions for multiple maps, if they have identical key/value types
> or if you are not capturing type information at all.
> 
> struct my_typical_map {
>         int type;
>         int max_entries;
>         u64 *key;
>         struct custom *value;
> };
> 
> struct my_typical_map map1 SEC(".maps") = {
>         .type = BPF_MAP_TYPE_ARRAY,
>         .max_entries = 10,
> };
> 
> struct my_typical_map map2 SEC(".maps") = {
>         .type = BPF_MAP_TYPE_ARRAY,
>         .max_entries = 20,
> };
> 
> Or, you can just re-use struct bpf_map_def today like this (but you
> won't have type info for key/value, of course):
> 
> struct bpf_map_def my_map_without_type_info SEC(".maps") = {
>         .type = BPF_MAP_TYPE_ARRAY,
>         .max_entries = 100,
>         .key_size = sizeof(u64),
>         .value_size = sizeof(struct custom),
> };
> 
> This approach gives you as much flexibility as possible: you will only
> need a different definition struct if you have a different
> key/value type (in C++ that would be solved by templates, but alas we
> are in C land).

To be clear, I'm not arguing that the proposal is not flexible.  To a C
guy like me, having struct members which don't hold a value mixed with
struct members for storing data seems very dirty; dare I say the
BPF_ANNOTATE_KV_PAIR() construct feels cleaner :(  I like that it
keeps the "meta structure" separate from the actual def structure.
In a sense this would feel better to me:

BPF_DECLARE_KV_PAIR(kv_ref, struct key, struct val);

struct { .. } def = {
	.btf_ref = &kv_ref,
};

BPF_DECLARE_KV_PAIR() can stuff the structure into a custom ignored
section.

Initially I was thinking to try to force a relocation to avoid the BSS
issue:

extern struct key key_ext;
extern struct value value_ext;

struct def {
        void *key;
        void *value;
} new_map = {
        .key = &key_ext,
        .value = &value_ext,
};

But then I'm not super happy with needing the extern declarations :(

IDK, I would really like a cleaner solution, but perhaps there is
none ;)

> > > > > Relying on BTF, this approach allows for both forward and backward
> > > > > compatibility w.r.t. extending supported map definition features. Old
> > > > > libbpf implementation will ignore fields it doesn't recognize, while new
> > > > > implementations will parse and recognize new optional attributes.  
> > > > I also don't know how to feel about old libbpf ignoring some attributes.
> > > > In the kernel we require that the unknown fields are zeroed.
> > > > We probably need to do something like that here? What do you think
> > > > would be a good example of an optional attribute?  
> > >
> > > Ignoring is required for forward-compatibility, where old libbpf will
> > > be used to load newer user BPF programs. We can decide not to do it,
> > > in that case it's just a question of erroring out on first unknown
> > > field. This RFC was posted exactly to discuss all these issues with
> > > more general community, as there is no single true way to do this.
> > >
> > > As for examples of when it can be used. It's any feature that can be
> > > considered optional or a hint, so if old libbpf doesn't do that, it's
> > > still not the end of the world (and we can live with that, or can
> > > correct using direct libbpf API calls).  
> >
> > On forward compatibility my 0.02c would be - if we want to go there
> > and silently ignore fields it'd be good to have some form of "hard
> > required" bit.  For TLVs ABIs it can be a "you have to understand
> > this one" bit, for libbpf perhaps we could add a "min libbpf version
> > required" section?  That kind of ties our ELF formats to libbpf
> > specifics (the libbpf version presumably would imply support for
> > features), but I think we want to go there, anyway.  
> 
> I think we can go with strict/non-strict mode, which we already
> support in libbpf with MAPS_RELAX_COMPAT flag (see
> __bpf_object__open_xattr), would that work?

I'd be a lil worried that all or nothing may not be flexible enough.
IOW someone may need a feature from 5.5 but optionally want 5.10
features if available?  No strong feelings, tho.
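A minimal sketch of the strict/relaxed choice discussed here: a parser that knows a fixed set of field names and, on an unknown one, either fails (strict) or reports and skips it (relaxed). The field list and `parse_map_fields()` are hypothetical; real libbpf walks BTF struct members, and the relaxed path is only in the spirit of `MAPS_RELAX_COMPAT`, not its actual implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Field names this hypothetical parser version understands. */
static const char *const known_fields[] = {
	"type", "max_entries", "map_flags", "key", "value", "key_size", "value_size",
};

static int is_known(const char *name)
{
	for (size_t i = 0; i < sizeof(known_fields) / sizeof(known_fields[0]); i++)
		if (strcmp(name, known_fields[i]) == 0)
			return 1;
	return 0;
}

/* Returns the number of recognized fields, or -EINVAL in strict mode as soon
 * as an unknown field is found. In relaxed mode unknown fields are reported
 * on stderr and skipped. */
static int parse_map_fields(const char *const *names, int n, int relaxed)
{
	int recognized = 0;

	for (int i = 0; i < n; i++) {
		if (is_known(names[i])) {
			recognized++;
			continue;
		}
		if (!relaxed)
			return -EINVAL;
		fprintf(stderr, "warning: ignoring unknown map field '%s'\n", names[i]);
	}
	return recognized;
}
```

With a single global switch, a definition that adds a newer field such as "pinning" either loads everywhere with that field silently dropped or fails on every older loader; a per-field "must understand" marker would give the finer granularity asked for above.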

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-03 22:03         ` Andrii Nakryiko
@ 2019-06-04  1:02           ` Stanislav Fomichev
  2019-06-04  1:07             ` Alexei Starovoitov
  0 siblings, 1 reply; 40+ messages in thread
From: Stanislav Fomichev @ 2019-06-04  1:02 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, Networking, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On 06/03, Andrii Nakryiko wrote:
> On Mon, Jun 3, 2019 at 9:32 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > On 05/31, Andrii Nakryiko wrote:
> > > On Fri, May 31, 2019 at 2:28 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > >
> > > > On 05/31, Andrii Nakryiko wrote:
> > > > > This patch adds support for a new way to define BPF maps. It relies on
> > > > > BTF to describe mandatory and optional attributes of a map, as well as
> > > > > captures type information of key and value naturally. This eliminates
> > > > > the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
> > > > > always in sync with the key/value type.
> > > > My 2c: this is too magical and relies on me knowing the expected fields.
> > > > (also, the compiler won't be able to help with the misspellings).
> > >
> > > I don't think it's really worse than current bpf_map_def approach. In
> > > typical scenario, there are only two fields you need to remember: type
> > > and max_entries (notice, they are called exactly the same as in
> > > bpf_map_def, so this knowledge is transferable). Then you'll have
> > > key/value, using which you are describing both type (using field's
> > > type) and size (calculated from the type).
> > >
> > > I can relate a bit to the point that with bpf_map_def you can find the
> > > definition and see all possible fields, but one can also find a lot of
> > > examples of the new map definitions as well.
> > >
> > > One big advantage of this scheme, though, is that you get that type
> > > association automagically without using BPF_ANNOTATE_KV_PAIR hack,
> > > with no chance of having a mismatch, etc. This is less duplication (no
> > > need to do sizeof(struct my_struct) and struct my_struct as an arg to
> > > that macro) and there is no need to go and ping people to add those
> > > annotations to improve introspection of BPF maps.
> > Don't get me wrong, it looks good and there are advantages compared to
> > the existing way. But, again, feels to me a bit too magic. We should somehow
> > make it less magic (see below).
> >
> > > > I don't know how others feel about it, but I'd be much more comfortable
> > > > with a simpler TLV-like approach. Have a new section where the format
> > > > is |4-byte size|struct bpf_map_def_extendable|. That would essentially
> > > > allow us to extend it the way we do with a syscall args.
> > >
> > > It would help with extensibility, sure, though even current
> > > bpf_map_def approach sort of can be extended already. But it won't
> > > solve the problem of having BTF types captured for key/value (see
> > > above). Also, you'd need another macro to lay everything out properly.
> > I didn't know that we look into the list of exported symbols to estimate
> > the number of maps and then use it to derive struct bpf_map_def size.
> >
> > In that case, maybe we can keep extending struct bpf_map_def
> > and support BTF mode as a better alternative? bpf_map_def could be
> > used as a reference for which fields there are, people can still use it
> > (with BPF_ANNOTATE_KV_PAIR if needed), but they can also use
> > new BTF mode if they find that works better for them?
> >
> > Because the biggest issue for me with the BTF mode is the question
> > of where to look for the supported fields (and misspellings). People
> > on this mailing list can probably figure it out, but people who don't
> > work full time on bpf might find it hard. Having 'struct bpf_map_def'
> > as a reference (or a good supported piece of documentation) might help
> 
> So yeah, it's more about documentation and examples, it seems, rather
> than having a C struct in code, right? Today, if I need to add new
> map, I copy/paste either from example, existing code or look up
Well, you know where to copy paste from ;-)

> documentation. You'll be able to do the same with new way (just grep
> for \.maps).
Yes, it's mostly about discoverability. Either documentation or
the real underlying structure could help with that.

> > with that.
> >
> > What do you think? The only issue is that we now have two formats
> > to support :-/
> 
> We'll have to support existing bpf_map_def for backwards compatibility
> (and see my reply to Jakub, you can just plain re-use struct
> bpf_map_def today with BTF approach, just put it into .maps section),
> but I'd love to avoid having to support new features using two
> different ways, so if we go with BTF, I'd restrict new features to BTF
> only, moving forward.
But what's wrong with trying to extend bpf_map_def for a while? It looks like
we have everything in place to do that. I understand your desire
to deprecate everything and move on, but when was BTF support added to
LLVM? 8.0.0? 8.0.1? Six months ago? Is there a major distro with the
latest llvm+btf? Do we want to lock everyone out of new libbpf features?
(Consider that a lot of people run on the LTS kernels).

What's wrong with having BTF be just a syntactic sugar on top of
bpf_map_def? One major use-case is supporting iproute2 features,
but some of those features can go into bpf_map_def as well and
be used by non-BTF enabled users.

One other point to consider here might be pure Go libbpf that Lorenz is
maintaining. Having simple underlying bpf_map_def which we can agree
on might be beneficial.
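For comparison, a sketch of consuming the |4-byte size|struct bpf_map_def_extendable| layout suggested earlier in the thread, using the kernel-style rule also mentioned here: a record larger than the known struct is accepted only if all unknown trailing bytes are zero. `struct map_def_v1`, `parse_map_def()`, and the native-endian size prefix are assumptions for illustration, not an existing libbpf format.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fields this hypothetical loader version knows about. */
struct map_def_v1 {
	uint32_t type;
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;
	uint32_t map_flags;
};

/* Parse one record: a native-endian uint32_t size followed by that many bytes.
 * Shorter records leave missing fields zeroed; longer records must have an
 * all-zero tail. Returns the number of bytes consumed, or -EINVAL. */
static long parse_map_def(const unsigned char *buf, size_t len, struct map_def_v1 *out)
{
	uint32_t rec_size;

	if (len < sizeof(rec_size))
		return -EINVAL;
	memcpy(&rec_size, buf, sizeof(rec_size));
	if (rec_size > len - sizeof(rec_size))
		return -EINVAL;

	buf += sizeof(rec_size);
	memset(out, 0, sizeof(*out));
	memcpy(out, buf, rec_size < sizeof(*out) ? rec_size : sizeof(*out));

	for (size_t i = sizeof(*out); i < rec_size; i++)
		if (buf[i] != 0)
			return -EINVAL; /* unknown non-zero field: refuse to guess */

	return (long)(sizeof(rec_size) + rec_size);
}
```

This keeps forward compatibility only for fields left at zero; unlike the BTF approach, it carries no key/value type information.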

> > > > Also, (un)related: we don't currently use BTF internally, so if
> > > > you convert all tests, we'd be unable to run them :-(
> > >
> > > Not exactly sure what you mean "you'd be unable to run them". Do you
> > > mean that you use old Clang that doesn't emit BTF? If that's what you
> > > are saying, a lot of tests already rely on latest Clang, so those
> > > tests already don't work for you, probably. I'll leave it up to Daniel
> > > and Alexei to decide if we want to convert selftests right now or not.
> > > I did it mostly to prove that we can handle all existing cases (and
> > > found a few gotchas and bugs along the way, both in my implementation
> > > and in kernel - fixes coming soon).
> > Yes, I mean that we don't always use the latest features of clang,
> > so having the existing tests in the old form (at least for a while)
> > would be appreciated. Good candidates to showcase new format can
> > be features that explicitly require BTF, stuff like spinlocks.
> 
> I totally understand a concern, but I'll still defer to maintainers to
> make a call as to when to do conversion.
Sure, totally up to you and the maintainers. Just raising my voice,
so you'd at least consider not converting everything.

> > > > > Relying on BTF, this approach allows for both forward and backward
> > > > > compatibility w.r.t. extending supported map definition features. Old
> > > > > libbpf implementation will ignore fields it doesn't recognize, while new
> > > > > implementations will parse and recognize new optional attributes.
> > > > I also don't know how to feel about old libbpf ignoring some attributes.
> > > > In the kernel we require that the unknown fields are zeroed.
> > > > We probably need to do something like that here? What do you think
> > > > would be a good example of an optional attribute?
> > >
> > > Ignoring is required for forward-compatibility, where old libbpf will
> > > be used to load newer user BPF programs. We can decide not to do it,
> > > in that case it's just a question of erroring out on first unknown
> > > field. This RFC was posted exactly to discuss all these issues with
> > > more general community, as there is no single true way to do this.
> > >
> > > As for examples of when it can be used. It's any feature that can be
> > > considered optional or a hint, so if old libbpf doesn't do that, it's
> > > still not the end of the world (and we can live with that, or can
> > > correct using direct libbpf API calls).
> > In general, doing what we do right now with bpf_map_def (returning an error
> > for non-zero unknown options) seems like the safest option. We should
> > probably do the same with the unknown BTF fields (return an error
> > for non-zero value).
> 
> Yeah, as I replied to Jakub, libbpf already has strict/non-strict
> mode, we should probably do the same. The only potential difference is
> that there is no need to check for zeros and stuff: just don't define
> a field. And using an extra flag, we can allow more relaxed semantics
> (just debug/info/warn message on unknown fields). This is what
> __bpf_object__open_xattr does today with MAPS_RELAX_COMPAT flag.
> 
> >
> > For a general BTF case, we can have some predefined policy: if, for example,
> > the field name starts with an underscore, it's optional and doesn't require
> > non-zero check. (or the name ends with '_opt' or some other clear policy).
> >
> > > > > The outline of the new map definition (short, BTF-defined maps) is as follows:
> > > > > 1. All the maps should be defined in .maps ELF section. It's possible to
> > > > >    have both "legacy" map definitions in `maps` sections and BTF-defined
> > > > >    maps in .maps sections. Everything will still work transparently.
> > > > > 2. The map declaration and initialization are done through
> > > > >    a global/static variable of a struct type with a few mandatory and
> > > > >    some extra optional fields:
> > > > >    - the type field is mandatory and specifies the type of the BPF map;
> > > > >    - key/value fields are mandatory and capture key/value type/size information;
> > > > >    - max_entries attribute is optional; if max_entries is not specified or
> > > > >      initialized, it has to be provided at runtime through libbpf API
> > > > >      before loading bpf_object;
> > > > >    - map_flags is optional and if not defined, will be assumed to be 0.
> > > > > 3. Key/value fields should be **a pointer** to a type describing
> > > > >    key/value. The pointee type is assumed (and will be recorded as such
> > > > >    and used for size determination) to be a type describing key/value of
> > > > >    the map. This is done to save excessive amounts of space allocated in
> > > > >    corresponding ELF sections for key/value of big size.
> > > > > 4. As some maps disallow having BTF type ID associated with key/value,
> > > > >    it's possible to specify key/value size explicitly without
> > > > >    associating BTF type ID with it. Use key_size and value_size fields
> > > > >    to do that (see example below).
> > > > >
> > > > > Here's an example of a simple ARRAY map definition:
> > > > >
> > > > > struct my_value { int x, y, z; };
> > > > >
> > > > > struct {
> > > > >       int type;
> > > > >       int max_entries;
> > > > >       int *key;
> > > > >       struct my_value *value;
> > > > > } btf_map SEC(".maps") = {
> > > > >       .type = BPF_MAP_TYPE_ARRAY,
> > > > >       .max_entries = 16,
> > > > > };
> > > > >
> > > > > This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
> > > > > be of type int and thus key size will be 4 bytes. The value is struct
> > > > > my_value of size 12 bytes. This map can be used from C code exactly the
> > > > > same as with existing maps defined through struct bpf_map_def.
> > > > >
> > > > > Here's an example of STACKMAP definition (which currently disallows BTF type
> > > > > IDs for key/value):
> > > > >
> > > > > struct {
> > > > >       __u32 type;
> > > > >       __u32 max_entries;
> > > > >       __u32 map_flags;
> > > > >       __u32 key_size;
> > > > >       __u32 value_size;
> > > > > } stackmap SEC(".maps") = {
> > > > >       .type = BPF_MAP_TYPE_STACK_TRACE,
> > > > >       .max_entries = 128,
> > > > >       .map_flags = BPF_F_STACK_BUILD_ID,
> > > > >       .key_size = sizeof(__u32),
> > > > >       .value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
> > > > > };
> > > > >
> > > > > This approach extends naturally to support map-in-map, by making the value
> > > > > field another struct that describes the inner map. This feature is not
> > > > > implemented yet. It's also possible to incrementally add features like pinning
> > > > > with full backwards and forward compatibility.
> > > > >
> > > > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > > > ---
> > > > >  tools/lib/bpf/btf.h    |   1 +
> > > > >  tools/lib/bpf/libbpf.c | 333 +++++++++++++++++++++++++++++++++++++++--
> > > > >  2 files changed, 325 insertions(+), 9 deletions(-)

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-04  1:02           ` Stanislav Fomichev
@ 2019-06-04  1:07             ` Alexei Starovoitov
  2019-06-04  4:29               ` Stanislav Fomichev
  0 siblings, 1 reply; 40+ messages in thread
From: Alexei Starovoitov @ 2019-06-04  1:07 UTC (permalink / raw)
  To: Stanislav Fomichev, Andrii Nakryiko
  Cc: Andrii Nakryiko, Networking, bpf, Daniel Borkmann, Kernel Team

On 6/3/19 6:02 PM, Stanislav Fomichev wrote:
> Do we want to lock everyone out of new libbpf features?

BTF is mandatory for _any_ new feature.
It's for introspection and debuggability in the first place.
Good debugging is not optional.

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-04  1:07             ` Alexei Starovoitov
@ 2019-06-04  4:29               ` Stanislav Fomichev
  2019-06-04 13:45                 ` Stanislav Fomichev
  0 siblings, 1 reply; 40+ messages in thread
From: Stanislav Fomichev @ 2019-06-04  4:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Andrii Nakryiko, Networking, bpf,
	Daniel Borkmann, Kernel Team

> BTF is mandatory for _any_ new feature.
If something is easy to support without asking everyone to upgrade to
a bleeding edge llvm, why not do it?
So much for backwards compatibility and flexibility.

> It's for introspection and debuggability in the first place.
> Good debugging is not optional.
Once llvm 8+ is everywhere, sure, but we are not there yet (I'm talking
about upstream LTS distros like ubuntu/redhat).

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-04  4:29               ` Stanislav Fomichev
@ 2019-06-04 13:45                 ` Stanislav Fomichev
  2019-06-04 17:31                   ` Andrii Nakryiko
  0 siblings, 1 reply; 40+ messages in thread
From: Stanislav Fomichev @ 2019-06-04 13:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Andrii Nakryiko, Networking, bpf,
	Daniel Borkmann, Kernel Team

On 06/03, Stanislav Fomichev wrote:
> > BTF is mandatory for _any_ new feature.
> If something is easy to support without asking everyone to upgrade to
> a bleeding edge llvm, why not do it?
> So much for backwards compatibility and flexibility.
> 
> > It's for introspection and debuggability in the first place.
> > Good debugging is not optional.
> Once llvm 8+ is everywhere, sure, but we are not there yet (I'm talking
> about upstream LTS distros like ubuntu/redhat).
But putting this aside, one thing that I didn't see addressed in the
cover letter is: what is the main motivation for the series?
Is it to support iproute2 map definitions (so cilium can switch to libbpf)?
If that's the case, maybe explicitly focus on that? Once we have
proof-of-concept working for iproute2 mode, we can extend it to everything.

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-04 13:45                 ` Stanislav Fomichev
@ 2019-06-04 17:31                   ` Andrii Nakryiko
  2019-06-04 21:07                     ` Stanislav Fomichev
  2019-06-06 21:09                     ` Daniel Borkmann
  0 siblings, 2 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2019-06-04 17:31 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Networking, bpf,
	Daniel Borkmann, Kernel Team

On Tue, Jun 4, 2019 at 6:45 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 06/03, Stanislav Fomichev wrote:
> > > BTF is mandatory for _any_ new feature.
> > If something is easy to support without asking everyone to upgrade to
> > a bleeding edge llvm, why not do it?
> > So much for backwards compatibility and flexibility.
> >
> > > It's for introspection and debuggability in the first place.
> > > Good debugging is not optional.
> > Once llvm 8+ is everywhere, sure, but we are not there yet (I'm talking
> > about upstream LTS distros like ubuntu/redhat).
> But putting this aside, one thing that I didn't see addressed in the
> cover letter is: what is the main motivation for the series?
> Is it to support iproute2 map definitions (so cilium can switch to libbpf)?

In general, the motivation is to arrive at a way of declaratively
defining maps that:
- captures type information (for debuggability/introspection) in a
coherent and hard-to-screw-up way;
- allows supporting missing useful features w/ good syntax (e.g., the
natural map-in-map case vs. the current completely manual,
non-declarative way in libbpf);
- ultimately allows iproute2 to use libbpf as a unified loader (and thus
requires supporting its existing features, like
BPF_MAP_TYPE_PROG_ARRAY initialization, pinning, map-in-map).

The only missing feature that can be supported reasonably with
bpf_map_def is pinning (as it's just another int field), but all the
other use cases require an awkward approach of matching arbitrary IDs,
which feels like a bad way forward.
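To illustrate the ID-matching approach referred to above, here is a minimal userspace model. It assumes the field layout of iproute2's `struct bpf_elf_map` (with `id`, `inner_id` and `inner_idx`, as used by iproute2's map-in-map example); `find_inner_map()` is a hypothetical helper, not iproute2's loader code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field layout modeled on iproute2's struct bpf_elf_map (uint32_t in
 * place of __u32 so this builds as plain userspace C). */
struct bpf_elf_map {
	uint32_t type;
	uint32_t size_key;
	uint32_t size_value;
	uint32_t max_elem;
	uint32_t flags;
	uint32_t id;        /* user-chosen ID other maps can refer to */
	uint32_t pinning;
	uint32_t inner_id;  /* outer map: ID of its inner map template */
	uint32_t inner_idx; /* inner map: slot to occupy in the outer map */
};

/* Hypothetical helper (not iproute2 code): find the map whose .id equals
 * outer->inner_id. Returns its index, or -1 if outer has no inner_id or
 * no other map carries that ID. The only link between the two
 * definitions is that the numbers happen to match. */
static int find_inner_map(const struct bpf_elf_map *maps, size_t n,
			  const struct bpf_elf_map *outer)
{
	if (outer->inner_id == 0)
		return -1;
	for (size_t i = 0; i < n; i++) {
		if (&maps[i] != outer && maps[i].id == outer->inner_id)
			return (int)i;
	}
	return -1;
}
```

Nothing in the C types ties the inner and outer definitions together; in this model a mismatched number only shows up when the lookup fails.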


> If that's the case, maybe explicitly focus on that? Once we have
> proof-of-concept working for iproute2 mode, we can extend it to everything.

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-04 17:31                   ` Andrii Nakryiko
@ 2019-06-04 21:07                     ` Stanislav Fomichev
  2019-06-04 21:22                       ` Andrii Nakryiko
  2019-06-06 21:09                     ` Daniel Borkmann
  1 sibling, 1 reply; 40+ messages in thread
From: Stanislav Fomichev @ 2019-06-04 21:07 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Andrii Nakryiko, Networking, bpf,
	Daniel Borkmann, Kernel Team

On 06/04, Andrii Nakryiko wrote:
> On Tue, Jun 4, 2019 at 6:45 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > On 06/03, Stanislav Fomichev wrote:
> > > > BTF is mandatory for _any_ new feature.
> > > If something is easy to support without asking everyone to upgrade to
> > > a bleeding edge llvm, why not do it?
> > > So much for backwards compatibility and flexibility.
> > >
> > > > It's for introspection and debuggability in the first place.
> > > > Good debugging is not optional.
> > > Once llvm 8+ is everywhere, sure, but we are not there yet (I'm talking
> > > about upstream LTS distros like ubuntu/redhat).
> > But putting this aside, one thing that I didn't see addressed in the
> > cover letter is: what is the main motivation for the series?
> > Is it to support iproute2 map definitions (so cilium can switch to libbpf)?
> 
> In general, the motivation is to arrive at a way of declaratively
> defining maps that:
> - captures type information (for debuggability/introspection) in a
> coherent and hard-to-screw-up way;
> - allows supporting missing useful features w/ good syntax (e.g., the
> natural map-in-map case vs. the current completely manual,
> non-declarative way in libbpf);

[..]
> - ultimately allows iproute2 to use libbpf as a unified loader (and thus
> requires supporting its existing features, like
> BPF_MAP_TYPE_PROG_ARRAY initialization, pinning, map-in-map).
So prog_array tail call info would be encoded in the magic struct instead of
the __section_tail(whatever) macros that iproute2 is using? Does it
mean that the programs that target iproute2 would have to be rewritten?
Or is providing source-level compatibility not a goal?

In general, supporting iproute2 seems like the most compelling
reason to use BTF given the current state of llvm+btf adoption.
BPF_ANNOTATE_KV_PAIR and the map-in-map syntax, while ugly, are not the major
pain point (imho); but I agree, with BTF both of those things
look much better.

That's why I was trying to understand whether we can start by using
BTF to support the _existing_ iproute2 format and then, once it's working,
generalize it (and kill bpf_map_def or make it a subset of generic BTF).
That way we are not implementing another way to support pinning/tail
calls, but enabling iproute2 to use libbpf.

But feel free to ignore all my nonsense above; I don't really have any
major concerns with the new generic format other than discoverability
(the docs might help) and a mandate that everyone switch to it immediately.

> The only missing feature that can be supported reasonably with
> bpf_map_def is pinning (as it's just another int field), but all the
> other use cases require an awkward approach of matching arbitrary IDs,
> which feels like a bad way forward.
> 
> 
> > If that's the case, maybe explicitly focus on that? Once we have
> > proof-of-concept working for iproute2 mode, we can extend it to everything.

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-04 21:07                     ` Stanislav Fomichev
@ 2019-06-04 21:22                       ` Andrii Nakryiko
  0 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2019-06-04 21:22 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Networking, bpf,
	Daniel Borkmann, Kernel Team

On Tue, Jun 4, 2019 at 2:07 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 06/04, Andrii Nakryiko wrote:
> > On Tue, Jun 4, 2019 at 6:45 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > >
> > > On 06/03, Stanislav Fomichev wrote:
> > > > > BTF is mandatory for _any_ new feature.
> > > > If something is easy to support without asking everyone to upgrade to
> > > > a bleeding edge llvm, why not do it?
> > > > So much for backwards compatibility and flexibility.
> > > >
> > > > > It's for introspection and debuggability in the first place.
> > > > > Good debugging is not optional.
> > > > Once llvm 8+ is everywhere, sure, but we are not there yet (I'm talking
> > > > about upstream LTS distros like ubuntu/redhat).
> > > But putting this aside, one thing that I didn't see addressed in the
> > > cover letter is: what is the main motivation for the series?
> > > Is it to support iproute2 map definitions (so cilium can switch to libbpf)?
> >
> > In general, the motivation is to arrive at a way of declaratively
> > defining maps that:
> > - captures type information (for debuggability/introspection) in a
> > coherent and hard-to-screw-up way;
> > - allows supporting missing useful features w/ good syntax (e.g., the
> > natural map-in-map case vs. the current completely manual,
> > non-declarative way in libbpf);
>
> [..]
> > - ultimately allows iproute2 to use libbpf as a unified loader (and thus
> > requires supporting its existing features, like
> > BPF_MAP_TYPE_PROG_ARRAY initialization, pinning, map-in-map).
> So prog_array tail call info would be encoded in the magic struct instead of
> the __section_tail(whatever) macros that iproute2 is using? Does it

Yes. It will be C-style array initialization (where value is address
of a function, corresponding to a BPF program).
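As a userspace analogy only, assuming ordinary C functions stand in for BPF programs, the "array initialized with function addresses" idea reads like a designated-initializer table. `prog_array` and `tail_call()` are hypothetical names, and the model captures nothing of kernel tail-call semantics beyond "jump to the program in a slot, or fall through if the slot is empty".

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for BPF programs; in real BPF code these would be separate
 * programs in their own ELF sections, not ordinary C functions. */
static int prog_ingress(int ctx) { return ctx + 1; }
static int prog_egress(int ctx) { return ctx * 2; }

typedef int (*prog_fn)(int ctx);

/* Analog of a declaratively initialized prog array: slot -> program,
 * written with C designated initializers. Unlisted slots stay NULL,
 * like unpopulated prog array entries. */
static prog_fn prog_array[4] = {
	[0] = prog_ingress,
	[2] = prog_egress,
};

/* Model of a tail call: run the program in slot idx, or "fall through"
 * by returning fallback if the slot is empty or out of range. */
static int tail_call(int ctx, size_t idx, int fallback)
{
	if (idx >= sizeof(prog_array) / sizeof(prog_array[0]) || !prog_array[idx])
		return fallback;
	return prog_array[idx](ctx);
}
```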

> mean that the programs that target iproute2 would have to be rewritten?
> Or is providing source-level compatibility not a goal?

As outlined in a separate email I sent out yesterday, my goal was to make
sure we have a very easy transition path that doesn't change the semantics
(field renaming for the common case; functionally equivalent, but
different, syntax for tail call prog array initialization; etc.). Let's
see what folks working on Cilium think about this.


>
> In general, supporting iproute2 seems like the most compelling
> reason to use BTF given the current state of llvm+btf adoption.
> BPF_ANNOTATE_KV_PAIR and the map-in-map syntax, while ugly, are not the major
> pain point (imho); but I agree, with BTF both of those things
> look much better.
>
> That's why I was trying to understand whether we can start by using
> BTF to support the _existing_ iproute2 format and then, once it's working,
> generalize it (and kill bpf_map_def or make it a subset of generic BTF).
> That way we are not implementing another way to support pinning/tail
> calls, but enabling iproute2 to use libbpf.

We currently don't have a good way (except for the programmatic API) to do
either tail calls or map-in-map declaratively in libbpf, so the hope is
this approach will allow us to address that gap, preferably in a
somewhat more intuitive way than iproute2 supports today. Given it's simple
to convert the iproute2 approach to the BTF-based one, I'd vote for not
back-porting that logic into libbpf, if possible.

>
> But feel free to ignore all my nonsense above; I don't really have any
> major concerns with the new generic format other than discoverability
> (the docs might help) and a mandate that everyone switch to it immediately.

No, thanks for the feedback! For documentation, I think we might want to
add a description to https://docs.cilium.io/en/v1.4/bpf/ (though
timing-wise it would be better to do so after iproute2 starts using
libbpf, so it's a bit of a chicken-and-egg problem). If you have better
suggestions where to put it, let me know.

>
> > The only missing feature that can be supported reasonably with
> > bpf_map_def is pinning (as it's just another int field), but all the
> > other use cases require an awkward approach of matching arbitrary IDs,
> > which feels like a bad way forward.
> >
> >
> > > If that's the case, maybe explicitly focus on that? Once we have
> > > proof-of-concept working for iproute2 mode, we can extend it to everything.

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-05-31 20:21 ` [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF Andrii Nakryiko
  2019-05-31 21:28   ` Stanislav Fomichev
  2019-06-03 22:34   ` Andrii Nakryiko
@ 2019-06-06 16:42   ` Lorenz Bauer
  2019-06-06 22:34     ` Andrii Nakryiko
  2 siblings, 1 reply; 40+ messages in thread
From: Lorenz Bauer @ 2019-06-06 16:42 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, Networking, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

Thanks for sending this RFC! For me, the biggest draw is that map-in-map
would be so much nicer to use, plus automatic dumping of map values.

Others on the thread have raised this point already: not everybody lives
on the bleeding edge or can control all of their dependencies. To me this means
that having a good compatibility story is paramount. I'd like to have very clear
rules for how the presence / absence of fields is handled.

For example:
- Fields that are present but not understood are an error. This makes
  sense because the user can simply omit the field in their definition
  if they do not use it. It's also necessary to preserve the freedom to
  add new fields in the future without risking user breakage.
- If libbpf adds support for a new field, it must be optional. Seems
  like this is what current map extensions already do, so maybe a
  no-brainer.

Somewhat related to this: I really wish that BTF was self-describing,
i.e. possible to parse without understanding all types. I mentioned
this in another thread of yours, but the more we add features where BTF
is required, the more important it becomes IMO.

Finally, some nits inline:

On Fri, 31 May 2019 at 21:22, Andrii Nakryiko <andriin@fb.com> wrote:
>
> The outline of the new map definition (short, BTF-defined maps) is as follows:
> 1. All the maps should be defined in .maps ELF section. It's possible to
>    have both "legacy" map definitions in `maps` sections and BTF-defined
>    maps in .maps sections. Everything will still work transparently.

I'd prefer using a new map section "btf_maps" or whatever. No need to
worry about code that deals with either type.

> 3. Key/value fields should be **a pointer** to a type describing
>    key/value. The pointee type is assumed (and will be recorded as such
>    and used for size determination) to be a type describing key/value of
>    the map. This is done to save excessive amounts of space allocated in
>    corresponding ELF sections for key/value of big size.

My biggest concern with the pointer is that there are cases when we want
to _not_ use a pointer, e.g. your proposal for map in map and tail calling.
There we need value to be a struct, an array, etc. The burden on the user
for this is very high.

> 4. As some maps disallow having BTF type ID associated with key/value,
>    it's possible to specify key/value size explicitly without
>    associating BTF type ID with it. Use key_size and value_size fields
>    to do that (see example below).

Why not just make them use the legacy map?

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-04 17:31                   ` Andrii Nakryiko
  2019-06-04 21:07                     ` Stanislav Fomichev
@ 2019-06-06 21:09                     ` Daniel Borkmann
  2019-06-06 23:02                       ` Andrii Nakryiko
  1 sibling, 1 reply; 40+ messages in thread
From: Daniel Borkmann @ 2019-06-06 21:09 UTC (permalink / raw)
  To: Andrii Nakryiko, Stanislav Fomichev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Networking, bpf, Kernel Team, yhs

On 06/04/2019 07:31 PM, Andrii Nakryiko wrote:
> On Tue, Jun 4, 2019 at 6:45 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
>> On 06/03, Stanislav Fomichev wrote:
>>>> BTF is mandatory for _any_ new feature.
>>> If something is easy to support without asking everyone to upgrade to
>>> a bleeding edge llvm, why not do it?
>>> So much for backwards compatibility and flexibility.
>>>
>>>> It's for introspection and debuggability in the first place.
>>>> Good debugging is not optional.
>>> Once llvm 8+ is everywhere, sure, but we are not there yet (I'm talking
>>> about upstream LTS distros like ubuntu/redhat).
>> But putting this aside, one thing that I didn't see addressed in the
>> cover letter is: what is the main motivation for the series?
>> Is it to support iproute2 map definitions (so cilium can switch to libbpf)?
> 
> In general, the motivation is to arrive at a way of declaratively
> defining maps that:
> - captures type information (for debuggability/introspection) in a
> coherent and hard-to-screw-up way;
> - allows supporting missing useful features w/ good syntax (e.g., the
> natural map-in-map case vs. the current completely manual,
> non-declarative way in libbpf);
> - ultimately allows iproute2 to use libbpf as a unified loader (and thus
> requires supporting its existing features, like
> BPF_MAP_TYPE_PROG_ARRAY initialization, pinning, map-in-map).

Thanks for working on this & sorry for jumping in late! Generally, I like
the approach of using BTF to make sense out of the individual members and
to have extensibility, so overall I think it's a step in the right direction.
Going back to the example where others complained that the k/v NULL
initialization feels like too much magic from a C POV:

struct {
	int type;
	int max_entries;
	int *key;
	struct my_value *value;
} my_map SEC(".maps") = {
	.type = BPF_MAP_TYPE_ARRAY,
	.max_entries = 16,
};

Given LLVM is in charge of emitting BTF plus given gcc/clang seem /both/
to support *target* specific attributes [0], how about something along these
lines where the type specific info is annotated as a variable BPF target
attribute, like:

struct {
	int type;
	int max_entries;
} my_map __attribute__((map(int,struct my_value))) = {
	.type = BPF_MAP_TYPE_ARRAY,
	.max_entries = 16,
};

Of course this would need BPF backend support, but at least that approach
would be more C like. Thus this would define types where we can automatically
derive key/val sizes etc. The SEC() could be dropped as well, as the map attribute
would imply it for LLVM to do the right thing underneath. The normal/actual members
of the struct have a base set of well-known names that are minimally required,
but there could be custom stuff as well where libbpf would query some user
defined callback that can handle these. Anyway, main point, what do you think
about the __attribute__ approach instead? I think this feels cleaner to me at
least iff feasible.

Thanks,
Daniel

  [0] https://clang.llvm.org/docs/AttributeReference.html
      https://gcc.gnu.org/onlinedocs/gcc/Variable-Attributes.html

> The only missing feature that can be supported reasonably with
> bpf_map_def is pinning (as it's just another int field), but all the
> other use cases require an awkward approach of matching arbitrary IDs,
> which feels like a bad way forward.
> 
>> If that's the case, maybe explicitly focus on that? Once we have
>> proof-of-concept working for iproute2 mode, we can extend it to everything.


* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-06 16:42   ` Lorenz Bauer
@ 2019-06-06 22:34     ` Andrii Nakryiko
  2019-06-17  9:07       ` Lorenz Bauer
  0 siblings, 1 reply; 40+ messages in thread
From: Andrii Nakryiko @ 2019-06-06 22:34 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Andrii Nakryiko, Networking, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Thu, Jun 6, 2019 at 9:43 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> Thanks for sending this RFC! For me, the biggest draw is that map-in-map
> would be so much nicer to use, plus automatic dumping of map values.
>
> Others on the thread have raised this point already: not everybody lives
> on the bleeding edge or can control all of their dependencies. To me this means
> that having a good compatibility story is paramount. I'd like to have very clear
> rules for how the presence / absence of fields is handled.

I think that discussion was more about selftests being switched to
BTF-defined maps rather than BPF users having to switch to the latest
compiler. struct bpf_map_def is still supported for those who can't
use a clang that supports BTF_KIND_VAR/BTF_KIND_DATASEC.
So I don't think this forces anyone to switch compilers, but it
certainly incentivizes them :)

>
> For example:
> - Fields that are present but not understood are an error. This makes
>   sense because the user can simply omit the field in their definition
>   if they do not use it. It's also necessary to preserve the freedom to
>   add new fields in the future without risking user breakage.

So you are arguing for strict-by-default behavior. That's fine by me,
but exactly that strict-by-default behavior is the problem with BTF
extensibility that you care a lot about. You are advocating for
skipping unknown BTF types (if it were possible), which is the direct
opposite of strict-by-default behavior. I have no strong preference
here, but the number of problems (and how many times we missed this
problem in the past) with introducing a new BTF feature and then
forgetting to do something for older kernels makes me lean towards
skip-and-log behavior. But I'm happy to support both (through flags),
with strict by default.
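The strict-by-default versus skip-and-log trade-off can be sketched as a single validation pass with an opt-out flag. This is a hypothetical sketch: `MAP_DEF_SKIP_UNKNOWN` and `check_map_def_fields()` are not libbpf APIs, and the list of known member names is an assumption based on the fields described for BTF-defined maps.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag: opt in to skip-and-log instead of failing. */
#define MAP_DEF_SKIP_UNKNOWN 0x1u

/* Assumed set of member names a loader understands. */
static const char *const known_fields[] = {
	"type", "max_entries", "map_flags", "key", "value",
	"key_size", "value_size",
};

static int is_known_field(const char *name)
{
	for (size_t i = 0; i < sizeof(known_fields) / sizeof(known_fields[0]); i++)
		if (strcmp(name, known_fields[i]) == 0)
			return 1;
	return 0;
}

/* Strict by default: the first unknown member fails with -EINVAL.
 * With MAP_DEF_SKIP_UNKNOWN, unknown members are logged and ignored. */
static int check_map_def_fields(const char *map_name,
				const char *const *fields, size_t n,
				unsigned int flags)
{
	for (size_t i = 0; i < n; i++) {
		if (is_known_field(fields[i]))
			continue;
		if (!(flags & MAP_DEF_SKIP_UNKNOWN))
			return -EINVAL;
		fprintf(stderr, "map '%s': skipping unknown field '%s'\n",
			map_name, fields[i]);
	}
	return 0;
}
```

With strict mode as the default, a typo in a field name is reported instead of silently ignored; the flag lets a newer object file load on an older loader when the caller accepts that risk.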

> - If libbpf adds support for a new field, it must be optional. Seems
>   like this is what current map extensions already do, so maybe a
>   no-brainer.

Yeah, of course.

>
> Somewhat related to this: I really wish that BTF was self-describing,
> i.e. possible to parse without understanding all types. I mentioned
> this in another thread of yours, but the more we add features where BTF
> is required, the more important it becomes IMO.

I relate, but have no new and better solution than previously
discussed :) We should try to add new stuff to .BTF.ext as much as
possible, which is self-describing.
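.BTF.ext record arrays carry their own record size (for func_info, a `rec_size` value precedes the records), which is what lets a reader step over fields it does not understand. A simplified sketch, assuming a single run of records whose known prefix matches `struct bpf_func_info` (`insn_off`, `type_id`); the real section also has a header and per-ELF-section grouping, and `lookup_type_id()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Known prefix of a func_info record, matching struct bpf_func_info. */
struct func_info_prefix {
	uint32_t insn_off;
	uint32_t type_id;
};

/* Walk num_info records of rec_size bytes each and return the type_id
 * recorded for insn_off. The stride comes from rec_size (read from the
 * data), not from sizeof(struct func_info_prefix), so records that a
 * newer producer extended with trailing fields are still walked
 * correctly; only the known prefix is interpreted. */
static int lookup_type_id(const void *recs, uint32_t rec_size,
			  uint32_t num_info, uint32_t insn_off,
			  uint32_t *type_id)
{
	const unsigned char *p = recs;

	if (rec_size < sizeof(struct func_info_prefix))
		return -1; /* too small to hold the fields we understand */
	for (uint32_t i = 0; i < num_info; i++) {
		struct func_info_prefix rec;

		memcpy(&rec, p + (size_t)i * rec_size, sizeof(rec));
		if (rec.insn_off == insn_off) {
			*type_id = rec.type_id;
			return 0;
		}
	}
	return -1;
}
```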

>
> Finally, some nits inline:
>
> On Fri, 31 May 2019 at 21:22, Andrii Nakryiko <andriin@fb.com> wrote:
> >
> > The outline of the new map definition (short, BTF-defined maps) is as follows:
> > 1. All the maps should be defined in .maps ELF section. It's possible to
> >    have both "legacy" map definitions in `maps` sections and BTF-defined
> >    maps in .maps sections. Everything will still work transparently.
>
> I'd prefer using a new map section "btf_maps" or whatever. No need to
> worry about code that deals with either type.

We do use a new map section: it's ".maps" vs "maps". The difference is
subtle, but ".maps" looks a bit more "standardized" than "btf_maps" to
me (and hopefully, eventually no one will use "maps" anymore :) ).
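Since only the leading dot differs, telling the two styles apart comes down to an exact section-name comparison. A hypothetical sketch (`classify_sec()` is not a libbpf function), assuming exact matches on "maps", ".maps" and the global-data sections ".data", ".rodata" and ".bss":

```c
#include <assert.h>
#include <string.h>

enum map_sec_kind {
	SEC_OTHER,
	SEC_LEGACY_MAPS, /* "maps": struct bpf_map_def entries */
	SEC_BTF_MAPS,    /* ".maps": BTF-defined map definitions */
	SEC_GLOBAL_DATA, /* ".data", ".rodata", ".bss" */
};

/* Exact comparisons only: "maps" and ".maps" must not be confused by a
 * prefix or suffix match. */
static enum map_sec_kind classify_sec(const char *name)
{
	if (strcmp(name, "maps") == 0)
		return SEC_LEGACY_MAPS;
	if (strcmp(name, ".maps") == 0)
		return SEC_BTF_MAPS;
	if (strcmp(name, ".data") == 0 || strcmp(name, ".rodata") == 0 ||
	    strcmp(name, ".bss") == 0)
		return SEC_GLOBAL_DATA;
	return SEC_OTHER;
}
```

Both kinds of section can appear in the same object, so a loader using this kind of dispatch would process each section independently.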

>
> > 3. Key/value fields should be **a pointer** to a type describing
> >    key/value. The pointee type is assumed (and will be recorded as such
> >    and used for size determination) to be a type describing key/value of
> >    the map. This is done to save excessive amounts of space allocated in
> >    corresponding ELF sections for key/value of big size.
>
> My biggest concern with the pointer is that there are cases when we want
> to _not_ use a pointer, e.g. your proposal for map in map and tail calling.
> There we need value to be a struct, an array, etc. The burden on the user
> for this is very high.

Well, map-in-map is still a special case, and whichever syntax we go
with, it will need to be slightly different to distinguish between
those cases. Initialized maps fall into a similar category, IMHO.

Embedding the full value just to capture type info/size is unacceptable:
we have use cases where that would increase ELF size so much that it
would prevent users from switching to this.
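To make the ELF-size argument concrete, here is a userspace sketch. `struct big_value` and both definition structs are hypothetical, and `sizeof` on the pointee stands in for the size libbpf would read from BTF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical large value type (64 KiB). */
struct big_value {
	char payload[64 * 1024];
};

/* Embedding the value: every instance of this definition carries 64 KiB
 * of zero-initialized bytes just to describe the value type. */
struct map_def_embedded {
	int type;
	int max_entries;
	uint32_t key;
	struct big_value value;
};

/* Pointer style, as in BTF-defined maps: the definition stores only two
 * pointers (left NULL), yet the pointee types still describe key/value. */
struct map_def_pointer {
	int type;
	int max_entries;
	uint32_t *key;
	struct big_value *value;
};

/* sizeof does not evaluate its operand, so nothing is dereferenced. */
#define DEF_KEY_SIZE(def_type)   sizeof(*((def_type *)0)->key)
#define DEF_VALUE_SIZE(def_type) sizeof(*((def_type *)0)->value)
```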

>
> > 4. As some maps disallow having BTF type ID associated with key/value,
> >    it's possible to specify key/value size explicitly without
> >    associating BTF type ID with it. Use key_size and value_size fields
> >    to do that (see example below).
>
> Why not just make them use the legacy map?

For completeness' sake, at the least. E.g., what if you want to use
map-in-map where the inner map is a stackmap or something like that,
which requires key_size/value_size? I think we all agree that it's better if
an application uses just one style instead of a mix of both, right?
Btw, for map cases where the key can be arbitrary but the value is an FD or
some other opaque value, libbpf can automatically "derive" the value size
and still capture the key type. I haven't done that, but it's very easy to
do (and we can also keep adding per-map-type checks/niceties to help
users understand what's wrong with their map definition, instead of
getting EINVAL from the kernel on map creation).
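Deriving the value size for FD-valued maps could look roughly like the following. It is a hypothetical sketch: `resolve_value_size()` and the `MAP_TYPE_*` constants are illustrative names (numeric values assumed to mirror `enum bpf_map_type` in linux/bpf.h), and it relies on the kernel requiring a 4-byte value for prog arrays and map-in-map maps.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative subset; values assumed to mirror linux/bpf.h. */
enum {
	MAP_TYPE_HASH = 1,
	MAP_TYPE_ARRAY = 2,
	MAP_TYPE_PROG_ARRAY = 3,
	MAP_TYPE_ARRAY_OF_MAPS = 12,
	MAP_TYPE_HASH_OF_MAPS = 13,
};

/* explicit_size: value_size from the definition (0 if absent)
 * type_size:     size of the value pointee type from BTF (0 if absent)
 * FD-valued maps always use a 4-byte value, so it can be derived when
 * neither is given, and a wrong size is reported before map creation. */
static int resolve_value_size(int map_type, uint32_t explicit_size,
			      uint32_t type_size, uint32_t *out)
{
	int fd_valued = map_type == MAP_TYPE_PROG_ARRAY ||
			map_type == MAP_TYPE_ARRAY_OF_MAPS ||
			map_type == MAP_TYPE_HASH_OF_MAPS;

	if (explicit_size && type_size && explicit_size != type_size)
		return -EINVAL; /* conflicting definitions */
	if (explicit_size || type_size) {
		*out = explicit_size ? explicit_size : type_size;
		if (fd_valued && *out != sizeof(uint32_t))
			return -EINVAL; /* friendlier than a kernel EINVAL */
		return 0;
	}
	if (fd_valued) {
		*out = sizeof(uint32_t);
		return 0;
	}
	return -EINVAL; /* neither given nor derivable */
}
```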

>
> --
> Lorenz Bauer  |  Systems Engineer
> 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
>
> www.cloudflare.com

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-06 21:09                     ` Daniel Borkmann
@ 2019-06-06 23:02                       ` Andrii Nakryiko
  2019-06-06 23:27                         ` Alexei Starovoitov
  0 siblings, 1 reply; 40+ messages in thread
From: Andrii Nakryiko @ 2019-06-06 23:02 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Stanislav Fomichev, Alexei Starovoitov, Andrii Nakryiko,
	Networking, bpf, Kernel Team, Yonghong Song

On Thu, Jun 6, 2019 at 2:09 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 06/04/2019 07:31 PM, Andrii Nakryiko wrote:
> > On Tue, Jun 4, 2019 at 6:45 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >> On 06/03, Stanislav Fomichev wrote:
> >>>> BTF is mandatory for _any_ new feature.
> >>> If something is easy to support without asking everyone to upgrade to
> >>> a bleeding edge llvm, why not do it?
> >>> So much for backwards compatibility and flexibility.
> >>>
> >>>> It's for introspection and debuggability in the first place.
> >>>> Good debugging is not optional.
> >>> Once llvm 8+ is everywhere, sure, but we are not there yet (I'm talking
> >>> about upstream LTS distros like ubuntu/redhat).
> >> But putting this aside, one thing that I didn't see addressed in the
> >> cover letter is: what is the main motivation for the series?
> >> Is it to support iproute2 map definitions (so cilium can switch to libbpf)?
> >
> > In general, the motivation is to arrive at a way of declaratively
> > defining maps that:
> > - captures type information (for debuggability/introspection) in a
> > coherent and hard-to-screw-up way;
> > - allows supporting missing useful features w/ good syntax (e.g., the
> > natural map-in-map case vs. the current completely manual,
> > non-declarative way in libbpf);
> > - ultimately allows iproute2 to use libbpf as a unified loader (and thus
> > requires supporting its existing features, like
> > BPF_MAP_TYPE_PROG_ARRAY initialization, pinning, map-in-map).
>
> Thanks for working on this & sorry for jumping in late! Generally, I like
> the approach of using BTF to make sense out of the individual members and
> to have extensibility, so overall I think it's a step in the right direction.
> Going back to the example where others complained that the k/v NULL
> initialization feels like too much magic from a C POV:
>
> struct {
>         int type;
>         int max_entries;
>         int *key;
>         struct my_value *value;
> } my_map SEC(".maps") = {
>         .type = BPF_MAP_TYPE_ARRAY,
>         .max_entries = 16,
> };
>
> Given LLVM is in charge of emitting BTF plus given gcc/clang seem /both/
> to support *target* specific attributes [0], how about something along these
> lines where the type specific info is annotated as a variable BPF target
> attribute, like:
>
> struct {
>         int type;
>         int max_entries;
> } my_map __attribute__((map(int,struct my_value))) = {
>         .type = BPF_MAP_TYPE_ARRAY,
>         .max_entries = 16,
> };
>
> Of course this would need BPF backend support, but at least that approach
> would be more C like. Thus this would define types where we can automatically

I guess it's technically possible (not a compiler guru, but I don't
see why it wouldn't be possible). But it will require at least two
things:
1. Compiler support, obviously, as you mentioned.
2. BTF specification on how to describe attributes and how to describe
what entities (variable in this case) it is attached to.

2. is not straightforward, as attributes in general are a collection of
values of vastly different types: some values could be integers, some
strings, and some, like in this case, would be a reference to another BTF
type. It seems like a powerful and potentially useful addition to BTF,
of course, but it's very unclear at this point what's the best way to
represent them.

I don't relate to the "non-idiomatic C" motive, though, so all that
seems like an unnecessarily heavyweight way to get something that we can
get today w/o compiler support in a clean, succinct and familiar C
syntax, which to me doesn't look like magic at all.

And if anything, an attribute feels like just as much magic to me. But here's
a very similar-looking macro trick:

#define MAP_KEY_VALUE_META(KEY, VALUE) KEY *key; VALUE *value;

struct {
       MAP_KEY_VALUE_META(int, struct my_value)
       int type;
       int max_entries;
} my_map SEC(".maps") = {
       .type = BPF_MAP_TYPE_ARRAY,
       .max_entries = 16,
};

Or even:

#define MAP_DEF(KEY, VALUE) \
       struct { KEY *key; VALUE *value; int type; int max_entries; }

MAP_DEF(int, struct my_value) my_map SEC(".maps") = {
       .type = BPF_MAP_TYPE_ARRAY,
       .max_entries = 16,
};
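A compile-checkable userspace version of the `MAP_DEF` idea follows, with the macro written as a single logical line via a backslash continuation. `SEC()` is defined as a no-op and `MAP_TYPE_ARRAY` is a local stand-in (assumed to match `BPF_MAP_TYPE_ARRAY`, value 2) so it builds without BPF headers; `struct my_value` is a made-up example type. It shows that the key/value pointers stay NULL while their pointee types still carry the sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins: in BPF code, SEC() would place the variable in
 * the ".maps" ELF section; here it expands to nothing. */
#define SEC(name)
#define MAP_TYPE_ARRAY 2 /* assumed to mirror BPF_MAP_TYPE_ARRAY */

struct my_value {
	long long counter;
	char tag[16];
};

#define MAP_DEF(KEY, VALUE) \
	struct { KEY *key; VALUE *value; int type; int max_entries; }

/* key/value are left NULL; only their pointee types matter. */
MAP_DEF(int, struct my_value) my_map SEC(".maps") = {
	.type = MAP_TYPE_ARRAY,
	.max_entries = 16,
};
```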

> derive key/val sizes etc. The SEC() could be dropped as well, as the map attribute

I think we should at least have an ability to override ELF section
name, just in case we add support to have maps in multiple sections
(e.g., shared library with its own set of maps, or whatever).

> would imply it for LLVM to do the right thing underneath. The normal/actual members
> of the struct have a base set of well-known names that are minimally required,
> but there could be custom stuff as well where libbpf would query some user
> defined callback that can handle these. Anyway, main point, what do you think

So, regarding the callback: I find it hard to imagine how that could be
implemented interface-wise. As each field can have a very different kind of
value (it could be another embedded custom struct, not just an integer;
or it could be a char array of fixed size, etc.), which is determined by
BTF, I don't know how I would expose that to a custom callback in the C
type system.

If I absolutely had to do it, though, how about this approach: we
either add the BTF type ID of the defining struct to bpf_map_def, or add
a bpf_map__btf_def() API which returns it, so:

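struct bpf_map *map = bpf_object__find_map_by_name(obj, "my_fancy_map");
struct btf *btf = bpf_object__btf(obj);
/* proposed APIs (not existing yet): BTF type ID and raw data of the map definition */
__u32 def_id = bpf_map__btf_map_def_type_id(map);
const void *def_data = bpf_map__btf_map_def_data(map);
/* btf__type_by_id() returns a const pointer */
const struct btf_type *t = btf__type_by_id(btf, def_id);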

Then the application can do whatever parsing it wants on the BTF map
definition and extract values in whatever manner suits it. This way
it's just a bunch of very straightforward APIs, instead of callbacks
w/ an unclear interface (i.e., you'd still need to expose field_name,
field's type_id, raw pointer to data).

Does this make sense?
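The "application parses the BTF map definition itself" idea boils down to looking members up by name and reading them from the raw definition bytes. A userspace sketch, where the hypothetical `struct member_desc` table (name, offset, size) stands in for the member information BTF would provide, and `read_u32_member()` is an illustrative helper rather than a libbpf API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for what BTF describes per struct member. */
struct member_desc {
	const char *name;
	size_t offset;
	size_t size;
};

/* A map definition carrying an application-specific field. */
struct my_fancy_def {
	uint32_t type;
	uint32_t max_entries;
	uint32_t my_custom_flag;
};

static const struct member_desc fancy_members[] = {
	{ "type", offsetof(struct my_fancy_def, type), sizeof(uint32_t) },
	{ "max_entries", offsetof(struct my_fancy_def, max_entries), sizeof(uint32_t) },
	{ "my_custom_flag", offsetof(struct my_fancy_def, my_custom_flag), sizeof(uint32_t) },
};

/* Find a member by name and copy its value out of the raw definition
 * bytes. Returns 0 on success, -1 if the member is missing or is not
 * 4 bytes wide. */
static int read_u32_member(const struct member_desc *members, size_t n,
			   const void *def_data, const char *name,
			   uint32_t *out)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(members[i].name, name) != 0)
			continue;
		if (members[i].size != sizeof(*out))
			return -1;
		memcpy(out, (const char *)def_data + members[i].offset,
		       sizeof(*out));
		return 0;
	}
	return -1;
}
```

No callback signature has to encode every possible member type; the application only interprets the members it knows about.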

But having said that, what are the use cases you have in mind that
require application to put custom stuff into a standardized map
definition?

> about the __attribute__ approach instead? I think this feels cleaner to me at
> least iff feasible.
>
> Thanks,
> Daniel
>
>   [0] https://clang.llvm.org/docs/AttributeReference.html
>       https://gcc.gnu.org/onlinedocs/gcc/Variable-Attributes.html
>
> > The only missing feature that can be supported reasonably with
> > bpf_map_def is pinning (as it's just another int field), but all the
> > other use cases requires awkward approach of matching arbitrary IDs,
> > which feels like a bad way forward.
> >
> >> If that's the case, maybe explicitly focus on that? Once we have
> >> proof-of-concept working for iproute2 mode, we can extend it to everything.
>


* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-06 23:02                       ` Andrii Nakryiko
@ 2019-06-06 23:27                         ` Alexei Starovoitov
  2019-06-07  0:10                           ` Jakub Kicinski
  0 siblings, 1 reply; 40+ messages in thread
From: Alexei Starovoitov @ 2019-06-06 23:27 UTC (permalink / raw)
  To: Andrii Nakryiko, Daniel Borkmann
  Cc: Stanislav Fomichev, Andrii Nakryiko, Networking, bpf,
	Kernel Team, Yonghong Song

On 6/6/19 4:02 PM, Andrii Nakryiko wrote:
>> struct {
>>          int type;
>>          int max_entries;
>> } my_map __attribute__((map(int,struct my_value))) = {
>>          .type = BPF_MAP_TYPE_ARRAY,
>>          .max_entries = 16,
>> };
>>
>> Of course this would need BPF backend support, but at least that approach
>> would be more C like. Thus this would define types where we can automatically
> I guess it's technically possible (not a compiler guru, but I don't
> see why it wouldn't be possible). But it will require at least two
> things:
> 1. Compiler support, obviously, as you mentioned.

Every time we make a common LLVM change, it takes many months.
Adding BTF took 6 months, even though the common changes were trivial.
Now we're already 1+ month into adding 4 intrinsics to support CO-RE.

In the past I was very much in favor of extending __attribute__
with bpf specific stuff. Now not so much.
__attribute__((map(int,struct my_value))) cannot be done as strings.
clang has to process the types, create new objects inside debug info.
It's not clear to me how this modified debug info will be associated
with the variable my_map.
So I suspect doing __attribute__ with actual C type inside (())
will not be possible.
I think in the future we might still add string based attributes,
but it's not going to be easy.
So... unless somebody in the community who does full-time LLVM work
steps in right now and says "I will code the above attr stuff",
we should not count on such a clang+llvm feature.


* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-06 23:27                         ` Alexei Starovoitov
@ 2019-06-07  0:10                           ` Jakub Kicinski
  2019-06-07  0:27                             ` Alexei Starovoitov
  0 siblings, 1 reply; 40+ messages in thread
From: Jakub Kicinski @ 2019-06-07  0:10 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko
  Cc: Daniel Borkmann, Stanislav Fomichev, Andrii Nakryiko, Networking,
	bpf, Kernel Team, Yonghong Song

On Thu, 6 Jun 2019 23:27:36 +0000, Alexei Starovoitov wrote:
> On 6/6/19 4:02 PM, Andrii Nakryiko wrote:
> >> struct {
> >>          int type;
> >>          int max_entries;
> >> } my_map __attribute__((map(int,struct my_value))) = {
> >>          .type = BPF_MAP_TYPE_ARRAY,
> >>          .max_entries = 16,
> >> };
> >>
> >> Of course this would need BPF backend support, but at least that approach
> >> would be more C like. Thus this would define types where we can automatically  
> > I guess it's technically possible (not a compiler guru, but I don't
> > see why it wouldn't be possible). But it will require at least two
> > things:
> > 1. Compiler support, obviously, as you mentioned.  
> 
> every time we're doing llvm common change it takes many months.
> Adding BTF took 6 month, though the common changes were trivial.
> Now we're already 1+ month into adding 4 intrinsics to support CO-RE.
> 
> In the past I was very much in favor of extending __attribute__
> with bpf specific stuff. Now not so much.
> __attribute__((map(int,struct my_value))) cannot be done as strings.
> clang has to process the types, create new objects inside debug info.
> It's not clear to me how this modified debug info will be associated
> with the variable my_map.
> So I suspect doing __attribute__ with actual C type inside (())
> will not be possible.
> I think in the future we might still add string based attributes,
> but it's not going to be easy.
> So... Unless somebody in the community who is doing full time llvm work
> will not step in right now and says "I will code the above attr stuff",
> we should not count on such clang+llvm feature.

If nobody has resources to commit to this, perhaps we can just stick 
to BPF_ANNOTATE_KV_PAIR()?

Apologies, but I think I missed the memo on why that's considered 
a hack.  Could someone point me to the relevant discussion?

We could conceivably add BTF-based map_def for other features, and
solve the K/V problem once a clean solution becomes apparent and
tractable?  BPF_ANNOTATE_KV_PAIR() is not great, but we kinda already
have it..

Perhaps I'm not thinking clearly about this and I should stay quiet :)


* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-07  0:10                           ` Jakub Kicinski
@ 2019-06-07  0:27                             ` Alexei Starovoitov
  2019-06-07  1:02                               ` Jakub Kicinski
  0 siblings, 1 reply; 40+ messages in thread
From: Alexei Starovoitov @ 2019-06-07  0:27 UTC (permalink / raw)
  To: Jakub Kicinski, Andrii Nakryiko
  Cc: Daniel Borkmann, Stanislav Fomichev, Andrii Nakryiko, Networking,
	bpf, Kernel Team, Yonghong Song

On 6/6/19 5:10 PM, Jakub Kicinski wrote:
> On Thu, 6 Jun 2019 23:27:36 +0000, Alexei Starovoitov wrote:
>> On 6/6/19 4:02 PM, Andrii Nakryiko wrote:
>>>> struct {
>>>>           int type;
>>>>           int max_entries;
>>>> } my_map __attribute__((map(int,struct my_value))) = {
>>>>           .type = BPF_MAP_TYPE_ARRAY,
>>>>           .max_entries = 16,
>>>> };
>>>>
>>>> Of course this would need BPF backend support, but at least that approach
>>>> would be more C like. Thus this would define types where we can automatically
>>> I guess it's technically possible (not a compiler guru, but I don't
>>> see why it wouldn't be possible). But it will require at least two
>>> things:
>>> 1. Compiler support, obviously, as you mentioned.
>>
>> every time we're doing llvm common change it takes many months.
>> Adding BTF took 6 month, though the common changes were trivial.
>> Now we're already 1+ month into adding 4 intrinsics to support CO-RE.
>>
>> In the past I was very much in favor of extending __attribute__
>> with bpf specific stuff. Now not so much.
>> __attribute__((map(int,struct my_value))) cannot be done as strings.
>> clang has to process the types, create new objects inside debug info.
>> It's not clear to me how this modified debug info will be associated
>> with the variable my_map.
>> So I suspect doing __attribute__ with actual C type inside (())
>> will not be possible.
>> I think in the future we might still add string based attributes,
>> but it's not going to be easy.
>> So... Unless somebody in the community who is doing full time llvm work
>> will not step in right now and says "I will code the above attr stuff",
>> we should not count on such clang+llvm feature.
> 
> If nobody has resources to commit to this, perhaps we can just stick
> to BPF_ANNOTATE_KV_PAIR()?
> 
> Apologies, but I think I missed the memo on why that's considered
> a hack.  Could someone point me to the relevant discussion?
> 
> We could conceivably add BTF-based map_def for other features, and
> solve the K/V problem once a clean solution becomes apparent and
> tractable?  BPF_ANNOTATE_KV_PAIR() is not great, but we kinda already
> have it..
> 
> Perhaps I'm not thinking clearly about this and I should stay quiet :)

the solution we're discussing should solve BPF_ANNOTATE_KV_PAIR too.
That hack must go.

If I understood correctly, your objection to Andrii's format is that
you don't like the pointer part of key/value, while Andrii explained
why we picked the pointer, right?

So how about:

struct {
   int type;
   int max_entries;
   struct {
     __u32 key;
     struct my_value value;
   } types[];
} ...




* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-07  0:27                             ` Alexei Starovoitov
@ 2019-06-07  1:02                               ` Jakub Kicinski
  2019-06-10  1:17                                 ` explicit maps. Was: " Alexei Starovoitov
  0 siblings, 1 reply; 40+ messages in thread
From: Jakub Kicinski @ 2019-06-07  1:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Daniel Borkmann, Stanislav Fomichev,
	Andrii Nakryiko, Networking, bpf, Kernel Team, Yonghong Song

On Fri, 7 Jun 2019 00:27:52 +0000, Alexei Starovoitov wrote:
> the solution we're discussing should solve BPF_ANNOTATE_KV_PAIR too.
> That hack must go.

I see.

> If I understood your objections to Andrii's format is that
> you don't like pointer part of key/value while Andrii explained
> why we picked the pointer, right?
> 
> So how about:
> 
> struct {
>    int type;
>    int max_entries;
>    struct {
>      __u32 key;
>      struct my_value value;
>    } types[];
> } ...

My objection is that k/v fields are never initialized, so they're 
"metafields", mixed with real fields which hold parameters - like 
type, max_entries etc.

But I thought about this 3 times now, and I see no better solution.
FWIW my best shot was relos:

extern struct my_key my_key;
extern int type_int;

struct map_def {
    int type;
    int max_entries;
    void *btf_key_ref;
    void *btf_val_ref;
} = {
    ...
    .btf_key_ref = &my_key,
    .btf_val_ref = &type_int,
};

The advantage being that map_def is no longer modified for each
instance and k/v combination.  And I get my assignment to k/v members :)

But really, I give up, I can't come up with anything better :)


* explicit maps. Was: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-07  1:02                               ` Jakub Kicinski
@ 2019-06-10  1:17                                 ` Alexei Starovoitov
  2019-06-10 21:15                                   ` Jakub Kicinski
  2019-06-10 23:48                                   ` Andrii Nakryiko
  0 siblings, 2 replies; 40+ messages in thread
From: Alexei Starovoitov @ 2019-06-10  1:17 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andrii Nakryiko, Daniel Borkmann, Stanislav Fomichev,
	Andrii Nakryiko, Networking, bpf, Kernel Team, Yonghong Song

On 6/6/19 6:02 PM, Jakub Kicinski wrote:
> On Fri, 7 Jun 2019 00:27:52 +0000, Alexei Starovoitov wrote:
>> the solution we're discussing should solve BPF_ANNOTATE_KV_PAIR too.
>> That hack must go.
> 
> I see.
> 
>> If I understood your objections to Andrii's format is that
>> you don't like pointer part of key/value while Andrii explained
>> why we picked the pointer, right?
>>
>> So how about:
>>
>> struct {
>>     int type;
>>     int max_entries;
>>     struct {
>>       __u32 key;
>>       struct my_value value;
>>     } types[];
>> } ...
> 
> My objection is that k/v fields are never initialized, so they're
> "metafields", mixed with real fields which hold parameters - like
> type, max_entries etc.

I don't share this meta fields vs real fields distinction.
All of the fields are meta.
Kernel implementation of the map doesn't need to hold type and
max_entries as actual configuration fields.
The map definition in c++ would have looked like:
bpf::hash_map<int, struct my_value, 1000, NO_PREALLOC> foo;
bpf::array_map<struct my_value, 2000> bar;

Sometimes the key is not necessary. Sometimes flags have to be zero.
The bpf syscall API is a superset of all fields for all maps.
All of them are configuration and meta fields at the same time.
In c++ example there is really no difference between
'struct my_value' and '1000' attributes.

I'm pretty sure bpf will have C++ front-end in the future,
but until then we have to deal with C and, I think, the map
definition should be the most natural C syntax.
In that sense what you're proposing with extern:
> extern struct my_key my_key;
> extern int type_int;
> 
> struct map_def {
>      int type;
>      int max_entries;
>      void *btf_key_ref;
>      void *btf_val_ref;
> } = {
>      ...
>      .btf_key_ref = &my_key,
>      .btf_val_ref = &type_int,
> };

is worse than

struct map_def {
       int type;
       int max_entries;
       int btf_key;
       struct my_key btf_value;
};

imo explicit key and value would be ideal,
but they take too much space. Hence pointers
or zero sized array:
struct {
      int type;
      int max_entries;
      struct {
        __u32 key;
        struct my_value value;
      } types[];
};

I think we should also consider explicit map creation.

Something like:

struct my_map {
   __u32 key;
   struct my_value value;
} *my_hash_map, *my_pinned_hash_map;

struct {
    __u64 key;
   struct my_map *value;
} *my_hash_of_maps;

struct {
   struct my_map *value;
} *my_array_of_maps;

__init void create_my_maps(void)
{
   bpf_create_hash_map(&my_hash_map, 1000/*max_entries*/);
   bpf_obj_get(&my_pinned_hash_map, "/sys/fs/bpf/my_map");
   bpf_create_hash_of_maps(&my_hash_of_maps, 1000/*max_entries*/);
   bpf_create_array_of_maps(&my_array_of_maps, 20);
}

SEC("cgroup/skb")
int bpf_prog(struct __sk_buff *skb)
{
   struct my_value *val;
   __u32 key;
   __u64 key64;
   struct my_map *map;

   val = bpf_map_lookup(my_hash_map, &key);
   map = bpf_map_lookup(my_hash_of_maps, &key64);
}

The '__init' section will be compiled by LLVM into BPF instructions
that will be executed in user space by libbpf.
The __init prog has to succeed, otherwise prog load fails.

Maybe all map pointers should be in a special section to avoid
putting them into datasec, but libbpf should be able to figure that
out without requiring the user to specify the .map section.
The rest of global vars would go into special datasec map.

No llvm changes necessary and BTF is available for keys and values.

libbpf can start with simple __init and eventually grow into
complex init procedure where maps are initialized,
prog_array is populated, etc.

Thoughts?


* Re: explicit maps. Was: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-10  1:17                                 ` explicit maps. Was: " Alexei Starovoitov
@ 2019-06-10 21:15                                   ` Jakub Kicinski
  2019-06-10 23:48                                   ` Andrii Nakryiko
  1 sibling, 0 replies; 40+ messages in thread
From: Jakub Kicinski @ 2019-06-10 21:15 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Daniel Borkmann, Stanislav Fomichev,
	Andrii Nakryiko, Networking, bpf, Kernel Team, Yonghong Song

On Mon, 10 Jun 2019 01:17:13 +0000, Alexei Starovoitov wrote:
> On 6/6/19 6:02 PM, Jakub Kicinski wrote:
> > On Fri, 7 Jun 2019 00:27:52 +0000, Alexei Starovoitov wrote:  
> >> the solution we're discussing should solve BPF_ANNOTATE_KV_PAIR too.
> >> That hack must go.  
> > 
> > I see.
> >   
> >> If I understood your objections to Andrii's format is that
> >> you don't like pointer part of key/value while Andrii explained
> >> why we picked the pointer, right?
> >>
> >> So how about:
> >>
> >> struct {
> >>     int type;
> >>     int max_entries;
> >>     struct {
> >>       __u32 key;
> >>       struct my_value value;
> >>     } types[];
> >> } ...  
> > 
> > My objection is that k/v fields are never initialized, so they're
> > "metafields", mixed with real fields which hold parameters - like
> > type, max_entries etc.  
> 
> I don't share this meta fields vs real fields distinction.
> All of the fields are meta.
> Kernel implementation of the map doesn't need to hold type and
> max_entries as actual configuration fields.
> The map definition in c++ would have looked like:
> bpf::hash_map<int, struct my_value, 1000, NO_PREALLOC> foo;
> bpf::array_map<struct my_value, 2000> bar;
> 
> Sometime key is not necessary. Sometimes flags have to be zero.
> bpf syscall api is a superset of all fiels for all maps.
> All of them are configuration and meta fields at the same time.
> In c++ example there is really no difference between
> 'struct my_value' and '1000' attributes.
> 
> I'm pretty sure bpf will have C++ front-end in the future,
> but until then we have to deal with C and, I think, the map
> definition should be the most natural C syntax.
> In that sense what you're proposing with extern:
> > extern struct my_key my_key;
> > extern int type_int;
> > 
> > struct map_def {
> >      int type;
> >      int max_entries;
> >      void *btf_key_ref;
> >      void *btf_val_ref;
> > } = {
> >      ...
> >      .btf_key_ref = &my_key,
> >      .btf_val_ref = &type_int,
> > };  
> 
> is worse than
> 
> struct map_def {
>        int type;
>        int max_entries;
>        int btf_key;
>        struct my_key btf_value;
> };
> 
> imo explicit key and value would be ideal,
> but they take too much space. Hence pointers
> or zero sized array:
> struct {
>       int type;
>       int max_entries;
>       struct {
>         __u32 key;
>         struct my_value value;
>       } types[];
> };

It is a C syntax problem; I do agree with you that it works well for
templates.  The map_def structure holds parameters, and we can't take
a type as a value in C.  Hence the types[] in your proposal - you might
as well call them ghost_fields[] :)

> I think we should also consider explicit map creation.
> 
> Something like:
> 
> struct my_map {
>    __u32 key;
>    struct my_value value;
> } *my_hash_map, *my_pinned_hash_map;
> 
> struct {
>     __u64 key;
>    struct my_map *value;
> } *my_hash_of_maps;
> 
> struct {
>    struct my_map *value;
> } *my_array_of_maps;
> 
> __init void create_my_maps(void)
> {
>    bpf_create_hash_map(&my_hash_map, 1000/*max_entries*/);
>    bpf_obj_get(&my_pinned_hash_map, "/sys/fs/bpf/my_map");
>    bpf_create_hash_of_maps(&my_hash_of_maps, 1000/*max_entries*/);
>    bpf_create_array_of_maps(&my_array_of_maps, 20);
> }
> 
> SEC("cgroup/skb")
> int bpf_prog(struct __sk_buff *skb)
> {
>    struct my_value *val;
>    __u32 key;
>    __u64 key64;
>    struct my_map *map;
> 
>    val = bpf_map_lookup(my_hash_map, &key);
>    map = bpf_map_lookup(my_hash_of_maps, &key64);
> }
> 
> '__init' section will be compiled by llvm into bpf instructions
> that will be executed in users space by libbpf.
> The __init prog has to succeed otherwise prog load fails.
> 
> May be all map pointers should be in a special section to avoid
> putting them into datasec, but libbpf should be able to figure that
> out without requiring user to specify the .map section.
> The rest of global vars would go into special datasec map.
> 
> No llvm changes necessary and BTF is available for keys and values.
> 
> libbpf can start with simple __init and eventually grow into
> complex init procedure where maps are initialized,
> prog_array is populated, etc.
> 
> Thoughts?

I like it! :)


* Re: explicit maps. Was: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-10  1:17                                 ` explicit maps. Was: " Alexei Starovoitov
  2019-06-10 21:15                                   ` Jakub Kicinski
@ 2019-06-10 23:48                                   ` Andrii Nakryiko
  1 sibling, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2019-06-10 23:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jakub Kicinski, Daniel Borkmann, Stanislav Fomichev,
	Andrii Nakryiko, Networking, bpf, Kernel Team, Yonghong Song

On Sun, Jun 9, 2019 at 6:17 PM Alexei Starovoitov <ast@fb.com> wrote:
>
> On 6/6/19 6:02 PM, Jakub Kicinski wrote:
> > On Fri, 7 Jun 2019 00:27:52 +0000, Alexei Starovoitov wrote:
> >> the solution we're discussing should solve BPF_ANNOTATE_KV_PAIR too.
> >> That hack must go.
> >
> > I see.
> >
> >> If I understood your objections to Andrii's format is that
> >> you don't like pointer part of key/value while Andrii explained
> >> why we picked the pointer, right?
> >>
> >> So how about:
> >>
> >> struct {
> >>     int type;
> >>     int max_entries;
> >>     struct {
> >>       __u32 key;
> >>       struct my_value value;
> >>     } types[];
> >> } ...
> >
> > My objection is that k/v fields are never initialized, so they're
> > "metafields", mixed with real fields which hold parameters - like
> > type, max_entries etc.
>
> I don't share this meta fields vs real fields distinction.

100% agree.

> All of the fields are meta.
> Kernel implementation of the map doesn't need to hold type and
> max_entries as actual configuration fields.
> The map definition in c++ would have looked like:
> bpf::hash_map<int, struct my_value, 1000, NO_PREALLOC> foo;
> bpf::array_map<struct my_value, 2000> bar;
>
> Sometime key is not necessary. Sometimes flags have to be zero.
> bpf syscall api is a superset of all fiels for all maps.
> All of them are configuration and meta fields at the same time.
> In c++ example there is really no difference between
> 'struct my_value' and '1000' attributes.
>
> I'm pretty sure bpf will have C++ front-end in the future,
> but until then we have to deal with C and, I think, the map
> definition should be the most natural C syntax.
> In that sense what you're proposing with extern:
> > extern struct my_key my_key;
> > extern int type_int;
> >
> > struct map_def {
> >      int type;
> >      int max_entries;
> >      void *btf_key_ref;
> >      void *btf_val_ref;
> > } = {
> >      ...
> >      .btf_key_ref = &my_key,
> >      .btf_val_ref = &type_int,
> > };
>
> is worse than
>
> struct map_def {
>        int type;
>        int max_entries;
>        int btf_key;
>        struct my_key btf_value;
> };
>
> imo explicit key and value would be ideal,

Also agree 100%; that's how I started, but I was quickly pointed to
real cases where the value is just way too big.

> but they take too much space. Hence pointers
> or zero sized array:
> struct {
>       int type;
>       int max_entries;
>       struct {
>         __u32 key;
>         struct my_value value;
>       } types[];
> };

This works, but I still prefer simpler

__u32 *key;
struct my_value *value;

It has less visual clutter and doesn't rely on somewhat obscure
flexible array feature (and it will have to be last in the struct,
unless you do zero-sized array w/ [0]).
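To put the space argument in numbers, here is a small C sketch comparing the three layouts discussed: embedding key/value directly, pointer fields, and a trailing flexible array member. `struct my_value` is an arbitrary 4 KiB type chosen for illustration. The exact size of the pointer variant depends on the ABI, but only the embedded variant grows with the value type.

```c
#include <assert.h>
#include <stdint.h>

/* A deliberately large value type, standing in for real cases where
 * the value is way too big to embed in every map definition. */
struct my_value {
	uint32_t counters[1024]; /* 4 KiB */
};

/* (a) Embedded key/value: each definition carries a full key and value
 * in the ELF section, even though only their types matter. */
struct def_embedded {
	int type;
	int max_entries;
	uint32_t key;
	struct my_value value;
};

/* (b) Pointer fields: only the pointee types matter; each costs one pointer. */
struct def_pointers {
	int type;
	int max_entries;
	uint32_t *key;
	struct my_value *value;
};

/* (c) Flexible array member: adds no storage, but must be the last member. */
struct def_flex {
	int type;
	int max_entries;
	struct {
		uint32_t key;
		struct my_value value;
	} types[];
};

static_assert(sizeof(struct def_flex) == 2 * sizeof(int),
	      "flexible array member contributes no storage here");
static_assert(sizeof(struct def_embedded) > sizeof(struct my_value),
	      "embedded layout is at least as large as the value itself");
```

Both (b) and (c) keep the definition small; the trade-off raised above is readability and the "must be last" restriction of (c).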

>
> I think we should also consider explicit map creation.
>
> Something like:
>
> struct my_map {
>    __u32 key;
>    struct my_value value;
> } *my_hash_map, *my_pinned_hash_map;
>
> struct {
>     __u64 key;
>    struct my_map *value;
> } *my_hash_of_maps;
>
> struct {
>    struct my_map *value;
> } *my_array_of_maps;
>
> __init void create_my_maps(void)
> {
>    bpf_create_hash_map(&my_hash_map, 1000/*max_entries*/);
>    bpf_obj_get(&my_pinned_hash_map, "/sys/fs/bpf/my_map");
>    bpf_create_hash_of_maps(&my_hash_of_maps, 1000/*max_entries*/);
>    bpf_create_array_of_maps(&my_array_of_maps, 20);
> }
>
> SEC("cgroup/skb")
> int bpf_prog(struct __sk_buff *skb)
> {
>    struct my_value *val;
>    __u32 key;
>    __u64 key64;
>    struct my_map *map;
>
>    val = bpf_map_lookup(my_hash_map, &key);
>    map = bpf_map_lookup(my_hash_of_maps, &key64);
> }
>
> '__init' section will be compiled by llvm into bpf instructions
> that will be executed in users space by libbpf.
> The __init prog has to succeed otherwise prog load fails.
>
> May be all map pointers should be in a special section to avoid
> putting them into datasec, but libbpf should be able to figure that
> out without requiring user to specify the .map section.
> The rest of global vars would go into special datasec map.
>
> No llvm changes necessary and BTF is available for keys and values.
>
> libbpf can start with simple __init and eventually grow into
> complex init procedure where maps are initialized,
> prog_array is populated, etc.
>
> Thoughts?

I have a few. :)

I think it would be great to have this feature as a sort of "escape
hatch" for really complicated initialization of maps, which can't be
done w/ declarative syntax (and doing it from user-land driving app is
not possible/desirable). But there is a lot of added complexity and
work to be done to make this happen:

1. We'll need to build BPF interpreter into libbpf (so partial
duplication of in-kernel BPF machinery);
2. We'll need to define some sort of user-space BPF API, so that these
init functions can call into libbpf API (at least). So now in addition
to in-kernel BPF helpers, we'll have another and different set of
helpers/APIs exposed to user-land BPF code. This will certainly add
confusion and raise learning curve.
3. Next we'll be adding not-just-libbpf APIs, for cases where the size
of map depends on some system parameter (e.g., number of CPUs, or
amount of free RAM, or something else). This probably can be done
through exposed libbpf APIs again, but now we'll need to decide what
gets exposed, in what format, etc.

It's all doable, but looks like a very large effort, while we don't
yet have a realistic use case for this. Today, cases like that are
handled by the driving user-land app. It seems like having prog_array and
map-in-map declarative initialization covers a lot of advanced use
cases (plus, of course, pinning), so for starters I'd concentrate
effort there to get declarative approach powerful enough to address a
lot of real-world needs.

The good thing, though, is that nothing prevents us from specifying
and adding this later, once we have good use cases and most needs
already covered w/ declarative syntax.

But, assuming we do explicit map creation, I'd also vote for per-map
"factory" functions, like this:

typedef int (*map_factory_fn)(struct bpf_map *map); /* can be provided by libbpf */

int init_my_map(struct bpf_map *map)
{
    /* something fancy here */
    return 0;
}

struct {
    __u64 *key;
    struct my_value *value;
    map_factory_fn factory;
} my_map SEC(".maps") = {
    .factory = &init_my_map,
};

/* we can still have per-BPF object init function: */
int init_my_app(struct bpf_object *obj) {
    /* some more initialization of BPF object */
    return 0;
}
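How a loader might drive such per-map factories can be sketched in plain C. Everything here is hypothetical (`bpf_map_model`, `map_entry` and `load_maps()` are illustration-only names, not libbpf API), and map creation itself is stubbed out. The point is the ordering: run each map's factory, if any, before creating that map, and fail the whole load if a factory fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical minimal model of a map as seen by the loader. */
struct bpf_map_model {
	const char *name;
	unsigned int max_entries;
	int created;
};

/* Per-map factory, taking a pointer to the map it may adjust. */
typedef int (*map_factory_fn)(struct bpf_map_model *map);

struct map_entry {
	struct bpf_map_model map;
	map_factory_fn factory; /* NULL: use the declarative definition as-is */
};

static int init_my_map(struct bpf_map_model *map)
{
	map->max_entries = 1000; /* "something fancy here" */
	return 0;
}

static int failing_factory(struct bpf_map_model *map)
{
	(void)map;
	return -1;
}

/* Run each factory before "creating" its map; any failure aborts the load. */
static int load_maps(struct map_entry *maps, size_t n)
{
	assert(maps != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct map_entry *e = &maps[i];

		if (e->factory && e->factory(&e->map) != 0) {
			fprintf(stderr, "map '%s': factory failed\n", e->map.name);
			return -1;
		}
		e->map.created = 1; /* stand-in for the actual map creation */
	}
	return 0;
}
```

A real loader would also need to destroy any maps it already created when a later factory fails; the sketch skips that cleanup.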


* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-06 22:34     ` Andrii Nakryiko
@ 2019-06-17  9:07       ` Lorenz Bauer
  2019-06-17 20:59         ` Andrii Nakryiko
  0 siblings, 1 reply; 40+ messages in thread
From: Lorenz Bauer @ 2019-06-17  9:07 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, Networking, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Thu, 6 Jun 2019 at 23:35, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Jun 6, 2019 at 9:43 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
> >
> > Thanks for sending this RFC! For me, the biggest draw is that map-in-map
> > would be so much nicer to use, plus automatic dumping of map values.
> >
> > Others on the thread have raised this point already: not everybody lives
> > on the bleeding edge or can control all of their dependencies. To me this means
> > that having a good compatibility story is paramount. I'd like to have very clear
> > rules how the presence / absence of fields is handled.
>
> I think that discussion was more about selftests being switched to
> BTF-defined maps rather than BPF users having to switch to latest
> compiler. struct bpf_map_def is still supported for those who can't
> use clang that supports BTF_KIND_VAR/BTF_KIND_DATASEC.
> So I don't think this enforces anyone to switch compiler, but
> certainly incentivizes them :)
>
> >
> > For example:
> > - Fields that are present but not understood are an error. This makes
> > sense because
> >   the user can simply omit the field in their definition if they do
> > not use it. It's also necessary
> >   to preserve the freedom to add new fields in the future without
> > risking user breakage.
>
> So you are arguing for strict-by-default behavior. It's fine by me,
> but exactly that strict-by-default behavior is the problem with BTF
> extensibility, that you care a lot about. You are advocating for
> skipping unknown BTF types (if it was possible), which is directly
> opposite to strict-by-default behavior. I have no strong preference
> here, but given amount of problem (and how many times we missed this
> problem in the past) w/ introducing new BTF feature and then
> forgetting about doing something for older kernels, kind of makes me
> lean towards skip-and-log behavior. But I'm happy to support both
> (through flags) w/ strict by default.

In my mind, BPF loaders should be able to pass BTF through to the kernel
as a binary blob as much as possible. That's why I want the format to
be "self-describing". Compatibility then becomes a question of which
features you are using on which kernel. The kernel itself can then still be
strict-by-default or what have you.
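The two policies discussed above, strict-by-default versus skip-and-log selectable through a flag, can be sketched as follows. The list of known field names and the `unknown_field_policy` enum are assumptions for illustration, not libbpf's actual implementation; `pinning` in the usage stands for a field that a newer definition might carry but an older loader does not understand.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Field names this (hypothetical) loader version understands. */
static const char *const known_fields[] = {
	"type", "max_entries", "map_flags", "key", "value",
	"key_size", "value_size",
};

enum unknown_field_policy {
	POLICY_STRICT,       /* unknown field => error (proposed default) */
	POLICY_SKIP_AND_LOG, /* unknown field => warn and ignore */
};

static int is_known(const char *name)
{
	for (size_t i = 0; i < sizeof(known_fields) / sizeof(known_fields[0]); i++)
		if (strcmp(known_fields[i], name) == 0)
			return 1;
	return 0;
}

/* Returns 0 if the definition is acceptable under the policy, -1 otherwise. */
static int check_fields(const char *const *fields, size_t n,
			enum unknown_field_policy policy)
{
	assert(fields != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (is_known(fields[i]))
			continue;
		if (policy == POLICY_STRICT) {
			fprintf(stderr, "unknown map field '%s'\n", fields[i]);
			return -1;
		}
		fprintf(stderr, "skipping unknown map field '%s'\n", fields[i]);
	}
	return 0;
}
```

Strict mode protects users from silently ignored settings; skip-and-log lets an older loader accept definitions written for a newer one.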

>
> > - If libbpf adds support for a new field, it must be optional. Seems
> > like this is what current
> >   map extensions already do, so maybe a no-brainer.
>
> Yeah, of course.
>
> >
> > Somewhat related to this: I really wish that BTF was self-describing,
> > e.g. possible
> > to parse without understanding all types. I mentioned this in another
> > thread of yours,
> > but the more we add features where BTF is required the more important it becomes
> > IMO.
>
> I relate, but have no new and better solution than previously
> discussed :) We should try to add new stuff to .BTF.ext as much as
> possible, which is self-describing.
>
> >
> > Finally, some nits inline:
> >
> > On Fri, 31 May 2019 at 21:22, Andrii Nakryiko <andriin@fb.com> wrote:
> > >
> > > The outline of the new map definition (short, BTF-defined maps) is as follows:
> > > 1. All the maps should be defined in .maps ELF section. It's possible to
> > >    have both "legacy" map definitions in `maps` sections and BTF-defined
> > >    maps in .maps sections. Everything will still work transparently.
> >
> > I'd prefer using a new map section "btf_maps" or whatever. No need to
> > worry about code that deals with either type.
>
> We do use new map section. Its ".maps" vs "maps". Difference is
> subtle, but ".maps" looks a bit more "standardized" than "btf_maps" to
> me (and hopefully, eventually no one will use "maps" anymore :) ).

Phew, spotting that difference is nigh impossible IMO.

>
> >
> > > 3. Key/value fields should be **a pointer** to a type describing
> > >    key/value. The pointee type is assumed (and will be recorded as such
> > >    and used for size determination) to be a type describing key/value of
> > >    the map. This is done to save excessive amounts of space allocated in
> > >    corresponding ELF sections for key/value of big size.
> >
> > My biggest concern with the pointer is that there are cases when we want
> > to _not_ use a pointer, e.g. your proposal for map in map and tail calling.
> > There we need value to be a struct, an array, etc. The burden on the user
> > for this is very high.
>
> Well, map-in-map is still a special case and whichever syntax we go
> with, it will need to be of slightly different syntax to distinguish
> between those cases. Initialized maps fall into similar category,
> IMHO.

I agree with you, the syntax probably has to be different. I'd just like it to
differ by more than a "*" in the struct definition, because that is too small
to notice.

>
> Embedding full value just to capture type info/size is unacceptable,
> as we have use cases that cause too big ELF size increase, which will
> prevent users from switching to this.
>
> >
> > > 4. As some maps disallow having BTF type ID associated with key/value,
> > >    it's possible to specify key/value size explicitly without
> > >    associating BTF type ID with it. Use key_size and value_size fields
> > >    to do that (see example below).
> >
> > Why not just make them use the legacy map?
>
> For completeness' sake at the least. E.g., what if you want to use
> map-in-map, where inner map is stackmap or something like that, which
> requires key_size/value_size? I think we all agree that it's better if
> application uses just one style, instead of a mix of both, right?

I kind of assumed that BTF support for those maps would at some point
appear, maybe I should have checked that.

> Btw, for map cases where map key can be arbitrary, but value is FD or
> some other opaque value, libbpf can automatically "derive" value size
> and still capture key type. I haven't done that, but it's very easy to
> do (and also we can keep adding per-map-type checks/niceties, to help
> users understand what's wrong with their map definition, instead of
> getting EINVAL from kernel on map creation).
>
> >
> > --
> > Lorenz Bauer  |  Systems Engineer
> > 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
> >
> > www.cloudflare.com



-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-17  9:07       ` Lorenz Bauer
@ 2019-06-17 20:59         ` Andrii Nakryiko
  2019-06-20  9:27           ` Lorenz Bauer
  0 siblings, 1 reply; 40+ messages in thread
From: Andrii Nakryiko @ 2019-06-17 20:59 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Andrii Nakryiko, Networking, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Mon, Jun 17, 2019 at 2:07 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> On Thu, 6 Jun 2019 at 23:35, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Jun 6, 2019 at 9:43 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
> > >
> > > Thanks for sending this RFC! For me, the biggest draw is that map-in-map
> > > would be so much nicer to use, plus automatic dumping of map values.
> > >
> > > Others on the thread have raised this point already: not everybody lives
> > > on the bleeding edge or can control all of their dependencies. To me this means
> > > that having a good compatibility story is paramount. I'd like to have very clear
> > > rules how the presence / absence of fields is handled.
> >
> > I think that discussion was more about selftests being switched to
> > BTF-defined maps rather than BPF users having to switch to latest
> > compiler. struct bpf_map_def is still supported for those who can't
> > use clang that supports BTF_KIND_VAR/BTF_KIND_DATASEC.
> > So I don't think this enforces anyone to switch compiler, but
> > certainly incentivizes them :)
> >
> > >
> > > For example:
> > > - Fields that are present but not understood are an error. This makes
> > > sense because
> > >   the user can simply omit the field in their definition if they do
> > > not use it. It's also necessary
> > >   to preserve the freedom to add new fields in the future without
> > > risking user breakage.
> >
> > So you are arguing for strict-by-default behavior. It's fine by me,
> > but exactly that strict-by-default behavior is the problem with BTF
> > extensivility, that you care a lot about. You are advocating for
> > skipping unknown BTF types (if it was possible), which is directly
> > opposite to strict-by-default behavior. I have no strong preference
> > here, but given amount of problem (and how many times we missed this
> > problem in the past) w/ introducing new BTF feature and then
> > forgetting about doing something for older kernels, kind of makes me
> > lean towards skip-and-log behavior. But I'm happy to support both
> > (through flags) w/ strict by default.
>
> In my mind, BPF loaders should be able to pass through BTF to the kernel
> as a binary blob as much as possible. That's why I want the format to
> be "self describing". Compatibility then becomes a question of: what
> feature are you using on which kernel. The kernel itself can then still be
> strict-by-default or what have you.

That would work in an ideal world, where the kernel is updated frequently
(and BTF is self-describing, which it is not). In practice, though,
libbpf is far more up-to-date and lends a hand in "sanitizing" .BTF
of kernel-unsupported features (so far we've managed to pull this off
quite reasonably). If you have a good proposal for how to make .BTF
self-describing, that would be great!
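As a rough illustration of what "sanitizing" means here, below is a simplified model in plain C, not libbpf's actual code: the `enum kind` values, `struct kern_features` and `sanitize()` are hypothetical stand-ins. In the spirit of what libbpf does, it rewrites kinds an older kernel doesn't understand (VAR, DATASEC) into kinds it does (INT, STRUCT), in place, so type IDs referenced elsewhere stay valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for BTF kinds; real BTF has many more. */
enum kind { K_INT, K_STRUCT, K_VAR, K_DATASEC };

/* What the loader has probed about the running kernel. */
struct kern_features {
	bool has_datasec; /* kernel understands VAR and DATASEC */
};

/* Rewrite kinds the kernel doesn't support into ones it does, in
 * place, so type IDs (array indices here) stay valid. */
void sanitize(enum kind *types, size_t n, struct kern_features f)
{
	assert(types != NULL || n == 0);
	if (f.has_datasec)
		return; /* nothing to do on a new enough kernel */
	for (size_t i = 0; i < n; i++) {
		if (types[i] == K_VAR)
			types[i] = K_INT;    /* variable -> plain integer */
		else if (types[i] == K_DATASEC)
			types[i] = K_STRUCT; /* section -> struct of its vars */
	}
}
```

Because each rewrite happens in place, the loader only needs to know which kinds the running kernel supports; it never has to renumber types.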

>
> >
> > > - If libbpf adds support for a new field, it must be optional. Seems
> > > like this is what current
> > >   map extensions already do, so maybe a no-brainer.
> >
> > Yeah, of course.
> >
> > >
> > > Somewhat related to this: I really wish that BTF was self-describing,
> > > e.g. possible
> > > to parse without understanding all types. I mentioned this in another
> > > thread of yours,
> > > but the more we add features where BTF is required the more important it becomes
> > > IMO.
> >
> > I relate, but have no new and better solution than previously
> > discussed :) We should try to add new stuff to .BTF.ext as much as
> > possible, which is self-describing.
> >
> > >
> > > Finally, some nits inline:
> > >
> > > On Fri, 31 May 2019 at 21:22, Andrii Nakryiko <andriin@fb.com> wrote:
> > > >
> > > > The outline of the new map definition (short, BTF-defined maps) is as follows:
> > > > 1. All the maps should be defined in .maps ELF section. It's possible to
> > > >    have both "legacy" map definitions in `maps` sections and BTF-defined
> > > >    maps in .maps sections. Everything will still work transparently.
> > >
> > > I'd prefer using a new map section "btf_maps" or whatever. No need to
> > > worry about code that deals with either type.
> >
> > We do use new map section. Its ".maps" vs "maps". Difference is
> > subtle, but ".maps" looks a bit more "standardized" than "btf_maps" to
> > me (and hopefully, eventually no one will use "maps" anymore :) ).
>
> Phew, spotting that difference is nigh impossible IMO.

Eventually "maps" should die off, as people switch from bpf_map_def
to BTF-defined maps in .maps. Libbpf itself can just provide a macro
hiding all that, something like:

#define BPF_MAP __attribute__((section(".maps"), used))

>
> >
> > >
> > > > 3. Key/value fields should be **a pointer** to a type describing
> > > >    key/value. The pointee type is assumed (and will be recorded as such
> > > >    and used for size determination) to be a type describing key/value of
> > > >    the map. This is done to save excessive amounts of space allocated in
> > > >    corresponding ELF sections for key/value of big size.
> > >
> > > My biggest concern with the pointer is that there are cases when we want
> > > to _not_ use a pointer, e.g. your proposal for map in map and tail calling.
> > > There we need value to be a struct, an array, etc. The burden on the user
> > > for this is very high.
> >
> > Well, map-in-map is still a special case and whichever syntax we go
> > with, it will need to be of slightly different syntax to distinguish
> > between those cases. Initialized maps fall into similar category,
> > IMHO.
>
> I agree with you, the syntax probably has to be different. I'd just like it to
> differ by more than a "*" in the struct definition, because that is too small
> to notice.

So let's lay out how it will be done in practice:

1. Simple map w/ custom key/value

struct my_key { ... };
struct my_value { ... };

struct {
    __u32 type;
    __u32 max_entries;
    struct my_key *key;
    struct my_value *value;
} my_simple_map BPF_MAP = {
    .type = BPF_MAP_TYPE_ARRAY,
    .max_entries = 16,
};

2. Now map-in-map:

struct {
    __u32 type;
    __u32 max_entries;
    struct my_key *key;
    struct {
        __u32 type;
        __u32 max_entries;
        __u64 *key;
        struct my_value *value;
    } value;
} my_map_in_map BPF_MAP = {
    .type = BPF_MAP_TYPE_HASH_OF_MAPS,
    .max_entries = 16,
    .value = {
        .type = BPF_MAP_TYPE_ARRAY,
        .max_entries = 100,
    },
};

It's clearly hard to mistake the inner map definition for a custom
anonymous struct type, right?


>
> >
> > Embedding full value just to capture type info/size is unacceptable,
> > as we have use cases that cause too big ELF size increase, which will
> > prevent users from switching to this.
> >
> > >
> > > > 4. As some maps disallow having BTF type ID associated with key/value,
> > > >    it's possible to specify key/value size explicitly without
> > > >    associating BTF type ID with it. Use key_size and value_size fields
> > > >    to do that (see example below).
> > >
> > > Why not just make them use the legacy map?
> >
> > For completeness' sake at the least. E.g., what if you want to use
> > map-in-map, where inner map is stackmap or something like that, which
> > requires key_size/value_size? I think we all agree that it's better if
> > application uses just one style, instead of a mix of both, right?
>
> I kind of assumed that BTF support for those maps would at some point
> appear, maybe I should have checked that.

It will. The current situation, with some maps not supporting BTF for
key and/or value, looks more like a bug than a feature, and we should fix
that. But even if we fix it today, kernels are updated much more slowly
than libbpf, so by not supporting key_size/value_size, we'd force people
to stick with legacy bpf_map_def for a really long time.

>
> > Btw, for map cases where map key can be arbitrary, but value is FD or
> > some other opaque value, libbpf can automatically "derive" value size
> > and still capture key type. I haven't done that, but it's very easy to
> > do (and also we can keep adding per-map-type checks/niceties, to help
> > users understand what's wrong with their map definition, instead of
> > getting EINVAL from kernel on map creation).
> >

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-17 20:59         ` Andrii Nakryiko
@ 2019-06-20  9:27           ` Lorenz Bauer
  2019-06-21  4:05             ` Andrii Nakryiko
  0 siblings, 1 reply; 40+ messages in thread
From: Lorenz Bauer @ 2019-06-20  9:27 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, Networking, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Mon, 17 Jun 2019 at 22:00, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > In my mind, BPF loaders should be able to pass through BTF to the kernel
> > as a binary blob as much as possible. That's why I want the format to
> > be "self describing". Compatibility then becomes a question of: what
> > feature are you using on which kernel. The kernel itself can then still be
> > strict-by-default or what have you.
>
> That would work in ideal world, where kernel is updated frequently
> (and BTF is self-describing, which it is not). In practice, though,
> libbpf is far more up-to-date and lends its hand on "sanitizing" .BTF
> from kernel-unsupported features (so far we manage to pull this off
> very reasonably). If you have a good proposal how to make .BTF
> self-describing, that would be great!

I think sanitizing is going to become a problem, but we've been around
that argument a few times :)

Making .BTF self-describing needs at least adding a length to certain fields,
as I mentioned in another thread, plus an interface to interrogate the
kernel about a loaded BTF blob.
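To illustrate the idea of adding lengths, here is a sketch of a hypothetical record format (explicitly not the real BTF encoding) in which every record carries its payload length, so a parser can step over kinds it doesn't understand. In actual BTF the size of the data trailing a type depends on its kind, which is why an unknown kind stops a parser today. `struct rec_hdr`, the KIND_* values and `count_known()` are all made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header: NOT the real BTF layout. */
struct rec_hdr {
	uint16_t kind;
	uint16_t len; /* payload bytes following the header */
};

enum { KIND_INT = 1, KIND_PTR = 2 }; /* kinds this parser knows */

/* Walk buf, counting known records and skipping unknown ones using
 * only their length. Returns the number of known records, or -1 if
 * the buffer is truncated or a length runs past its end. */
int count_known(const unsigned char *buf, size_t size)
{
	size_t off = 0;
	int known = 0;

	assert(buf != NULL || size == 0);
	while (off + sizeof(struct rec_hdr) <= size) {
		struct rec_hdr h;

		memcpy(&h, buf + off, sizeof(h)); /* avoid unaligned access */
		off += sizeof(h);
		if (h.len > size - off)
			return -1;
		if (h.kind == KIND_INT || h.kind == KIND_PTR)
			known++;
		off += h.len; /* unknown kinds are skipped the same way */
	}
	return off == size ? known : -1;
}
```

With such a layout, an older loader could ignore or pass through unknown records instead of failing outright; the cost is a few extra bytes per record.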

> > I agree with you, the syntax probably has to be different. I'd just like it to
> > differ by more than a "*" in the struct definition, because that is too small
> > to notice.
>
> So let's lay out how it will be done in practice:
>
> 1. Simple map w/ custom key/value
>
> struct my_key { ... };
> struct my_value { ... };
>
> struct {
>     __u32 type;
>     __u32 max_entries;
>     struct my_key *key;
>     struct my_value *value;
> } my_simple_map BPF_MAP = {
>     .type = BPF_MAP_TYPE_ARRAY,
>     .max_entries = 16,
> };
>
> 2. Now map-in-map:
>
> struct {
>     __u32 type;
>     __u32 max_entries;
>     struct my_key *key;
>     struct {
>         __u32 type;
>         __u32 max_entries;
>         __u64 *key;
>         struct my_value *value;
>     } value;
> } my_map_in_map BPF_MAP = {
>     .type = BPF_MAP_TYPE_HASH_OF_MAPS,
>     .max_entries = 16,
>     .value = {
>         .type = BPF_MAP_TYPE_ARRAY,
>         .max_entries = 100,
>     },
> };
>
> It's clearly hard to misinterpret inner map definition for a custom
> anonymous struct type, right?

That's not what I'm concerned about. My point is: sometimes you
have to use a pointer, sometimes you don't. Every user has to learn this,
and chances are they'll get it wrong at first. Is there a way to give a
reasonable error message for this?

> > I kind of assumed that BTF support for those maps would at some point
> > appear, maybe I should have checked that.
>
> It will. Current situation with maps not supporting specifying BTF for
> key and/or value looks more like a bug, than feature and we should fix
> that. But even if we fix it today, kernels are updated much slower
> than libbpf, so by not supporting key_size/value_size, we force people
> to get stuck with legacy bpf_map_def for a really long time.

OK.

I'll go and look at the newest revision of the patch set now :o)

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

* Re: [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-20  9:27           ` Lorenz Bauer
@ 2019-06-21  4:05             ` Andrii Nakryiko
  0 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2019-06-21  4:05 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Andrii Nakryiko, Networking, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Thu, Jun 20, 2019 at 2:28 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> On Mon, 17 Jun 2019 at 22:00, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > > In my mind, BPF loaders should be able to pass through BTF to the kernel
> > > as a binary blob as much as possible. That's why I want the format to
> > > be "self describing". Compatibility then becomes a question of: what
> > > feature are you using on which kernel. The kernel itself can then still be
> > > strict-by-default or what have you.
> >
> > That would work in ideal world, where kernel is updated frequently
> > (and BTF is self-describing, which it is not). In practice, though,
> > libbpf is far more up-to-date and lends its hand on "sanitizing" .BTF
> > from kernel-unsupported features (so far we manage to pull this off
> > very reasonably). If you have a good proposal how to make .BTF
> > self-describing, that would be great!
>
> I think sanitizing is going to become a problem, but we've been around
> that argument a few times :)

Yep :)

>
> Making .BTF self describing need at least adding length to certain fields,
> as I mentioned in another thread. Plus an interface to interrogate the
> kernel about a loaded BTF blob.
>
> > > I agree with you, the syntax probably has to be different. I'd just like it to
> > > differ by more than a "*" in the struct definition, because that is too small
> > > to notice.
> >
> > So let's lay out how it will be done in practice:
> >
> > 1. Simple map w/ custom key/value
> >
> > struct my_key { ... };
> > struct my_value { ... };
> >
> > struct {
> >     __u32 type;
> >     __u32 max_entries;
> >     struct my_key *key;
> >     struct my_value *value;
> > } my_simple_map BPF_MAP = {
> >     .type = BPF_MAP_TYPE_ARRAY,
> >     .max_entries = 16,
> > };
> >
> > 2. Now map-in-map:
> >
> > struct {
> >     __u32 type;
> >     __u32 max_entries;
> >     struct my_key *key;
> >     struct {
> >         __u32 type;
> >         __u32 max_entries;
> >         __u64 *key;
> >         struct my_value *value;
> >     } value;
> > } my_map_in_map BPF_MAP = {
> >     .type = BPF_MAP_TYPE_HASH_OF_MAPS,
> >     .max_entries = 16,
> >     .value = {
> >         .type = BPF_MAP_TYPE_ARRAY,
> >         .max_entries = 100,
> >     },
> > };
> >
> > It's clearly hard to misinterpret inner map definition for a custom
> > anonymous struct type, right?
>
> That's not what I'm concerned about. My point is: sometimes you
> have to use a pointer, sometimes you don't. Every user has to learn this.
> Chance is, they'll probably get it wrong first. Is there a way to give a
> reasonable error message for this?

Right now a pointer is always required. My initial intent for map-in-map
was to not use a pointer, but since then I've proposed a slightly
different approach, which eliminates all the concerns you mentioned.
As for messaging, yeah, that's the simplest part, and it can always be
improved.

>
> > > I kind of assumed that BTF support for those maps would at some point
> > > appear, maybe I should have checked that.
> >
> > It will. Current situation with maps not supporting specifying BTF for
> > key and/or value looks more like a bug, than feature and we should fix
> > that. But even if we fix it today, kernels are updated much slower
> > than libbpf, so by not supporting key_size/value_size, we force people
> > to get stuck with legacy bpf_map_def for a really long time.
>
> OK.
>
> I'll go and look at the newest revision of the patch set now :o)

* [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF
  2019-06-11  4:34 [RFC PATCH bpf-next 0/8] BTF-defined BPF " Andrii Nakryiko
@ 2019-06-11  4:35 ` Andrii Nakryiko
  0 siblings, 0 replies; 40+ messages in thread
From: Andrii Nakryiko @ 2019-06-11  4:35 UTC (permalink / raw)
  To: andrii.nakryiko, bpf, netdev, ast, daniel, kernel-team; +Cc: Andrii Nakryiko

This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
to capture type information of key and value naturally. This eliminates
the need for the BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.

Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible to relax this using the MAPS_RELAX_COMPAT flag. New attributes
added in the future will need to be optional.
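The strict-vs-relaxed policy described above can be sketched as follows. This is a simplified stand-alone model, not the patch's code: `check_map_attrs()` and `is_known_attr()` are hypothetical helpers, the `strict` flag plays the role of `!(flags & MAPS_RELAX_COMPAT)`, and the attribute list mirrors the names this patch recognizes. Like the patch, strict mode fails with -ENOTSUP.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Attribute names understood by this (hypothetical) loader. */
static const char *const known_attrs[] = {
	"type", "max_entries", "map_flags",
	"key", "key_size", "value", "value_size",
};

static bool is_known_attr(const char *name)
{
	for (size_t i = 0; i < sizeof(known_attrs) / sizeof(known_attrs[0]); i++)
		if (strcmp(name, known_attrs[i]) == 0)
			return true;
	return false;
}

/* Strict: fail on the first unknown attribute.
 * Relaxed: log unknown attributes and keep going. */
int check_map_attrs(const char *const *names, size_t n, bool strict)
{
	assert(names != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (is_known_attr(names[i]))
			continue;
		if (strict) {
			fprintf(stderr, "unknown attribute '%s'\n", names[i]);
			return -ENOTSUP;
		}
		fprintf(stderr, "ignoring unknown attribute '%s'\n", names[i]);
	}
	return 0;
}
```

A definition carrying a newer attribute (say, a future `pinning` field) is rejected by default but only logged in relaxed mode, which is what lets a newer object file load with an older libbpf when the user opts in.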

The outline of the new map definition (in short, BTF-defined maps) is as follows:
1. All the maps should be defined in the .maps ELF section. It's possible to
   have both "legacy" map definitions in `maps` sections and BTF-defined
   maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
   a global/static variable of a struct type with a few mandatory and
   extra optional fields:
   - the type field is mandatory and specifies the type of BPF map;
   - key/value fields are mandatory and capture key/value type/size information;
   - the max_entries attribute is optional; if max_entries is not specified or
     initialized, it has to be provided at runtime through the libbpf API
     before loading the bpf_object;
   - map_flags is optional and, if not defined, is assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
   key/value. The pointee type is assumed (and will be recorded as such
   and used for size determination) to be the type describing the key/value
   of the map. This avoids allocating excessive amounts of space in the
   corresponding ELF sections for large keys/values.
4. As some maps disallow having a BTF type ID associated with key/value,
   it's possible to specify key/value size explicitly without
   associating a BTF type ID with it. Use the key_size and value_size fields
   to do that (see example below).
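To make the space argument in point 3 concrete, here is a small sketch in ordinary host C (not BPF) comparing the two layouts. `struct big_value` and its 4 KiB payload are hypothetical, and the exact byte counts depend on the target ABI (on a 64-bit target the pointer form is 24 bytes).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical large map value: 4 KiB of payload. */
struct big_value {
	char payload[4096];
};

/* Style used by this patch: key/value cost one pointer each in the
 * .maps section, while BTF records the pointee types. */
struct def_with_ptr {
	unsigned int type;
	unsigned int max_entries;
	int *key;
	struct big_value *value;
};

/* Alternative: embed the value itself just to capture its type. */
struct def_embedded {
	unsigned int type;
	unsigned int max_entries;
	int key;
	struct big_value value;
};

static_assert(sizeof(struct def_embedded) > sizeof(struct big_value),
	      "embedding pays for the whole value");

/* Bytes each style would occupy in the ELF section, per map. */
size_t ptr_def_size(void)
{
	return sizeof(struct def_with_ptr);
}

size_t embedded_def_size(void)
{
	return sizeof(struct def_embedded);
}
```

With pointers, the per-map cost stays at a few bytes regardless of how large the value type is.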

Here's an example of a simple ARRAY map definition:

struct my_value { int x, y, z; };

struct {
	int type;
	int max_entries;
	int *key;
	struct my_value *value;
} btf_map SEC(".maps") = {
	.type = BPF_MAP_TYPE_ARRAY,
	.max_entries = 16,
};

This will define a BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus the key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same way as existing maps defined through struct bpf_map_def.
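The 4-byte and 12-byte figures can be checked with a plain C sketch that mirrors the definition above without SEC(".maps"). libbpf derives these sizes from BTF via btf__resolve_size() on the pointee type; here `sizeof` on the dereferenced member stands in for that, and `MEMBER_PTR_TARGET_SIZE` is a hypothetical macro. The numbers assume a 4-byte `int`.

```c
#include <assert.h>

struct my_value { int x, y, z; };

/* Host-side mirror of btf_map above, without SEC(".maps"). */
struct btf_map_def {
	int type;
	int max_entries;
	int *key;
	struct my_value *value;
};

/* Size of the type a pointer member points to (sizeof does not
 * evaluate its operand, so the null pointer is never dereferenced). */
#define MEMBER_PTR_TARGET_SIZE(type, member) sizeof(*((type *)0)->member)

static_assert(MEMBER_PTR_TARGET_SIZE(struct btf_map_def, key) == 4,
	      "key is an int: 4 bytes");
static_assert(MEMBER_PTR_TARGET_SIZE(struct btf_map_def, value) == 12,
	      "struct my_value is three ints: 12 bytes");
```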

Here's an example of a STACKMAP definition (which currently disallows BTF type
IDs for key/value):

struct {
	__u32 type;
	__u32 max_entries;
	__u32 map_flags;
	__u32 key_size;
	__u32 value_size;
} stackmap SEC(".maps") = {
	.type = BPF_MAP_TYPE_STACK_TRACE,
	.max_entries = 128,
	.map_flags = BPF_F_STACK_BUILD_ID,
	.key_size = sizeof(__u32),
	.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
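For a sense of scale, the value_size expression above can be evaluated with a small host-side sketch. The constants and struct layout below are local copies of kernel UAPI definitions (PERF_MAX_STACK_DEPTH from <linux/perf_event.h>, BPF_BUILD_ID_SIZE and struct bpf_stack_build_id from <linux/bpf.h>), duplicated so the sketch compiles without kernel headers; verify them against your own headers.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirrors of kernel UAPI definitions (check your headers). */
#define PERF_MAX_STACK_DEPTH 127
#define BPF_BUILD_ID_SIZE 20

struct bpf_stack_build_id {
	int32_t status;
	unsigned char build_id[BPF_BUILD_ID_SIZE];
	union {
		uint64_t offset;
		uint64_t ip;
	};
};

/* value_size from the stackmap example above */
#define STACKMAP_VALUE_SIZE \
	(PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id))
```

With these values each entry is 32 bytes, so value_size comes to 127 * 32 = 4064 bytes. Since STACK_TRACE maps currently reject BTF type IDs, that size has to be given explicitly through value_size.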

This approach naturally extends to support map-in-map, by making the value
field another struct that describes the inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backward and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/btf.h    |   1 +
 tools/lib/bpf/libbpf.c | 338 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 330 insertions(+), 9 deletions(-)

diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index ba4ffa831aa4..88a52ae56fc6 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -17,6 +17,7 @@ extern "C" {
 
 #define BTF_ELF_SEC ".BTF"
 #define BTF_EXT_ELF_SEC ".BTF.ext"
+#define MAPS_ELF_SEC ".maps"
 
 struct btf;
 struct btf_ext;
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 79a8143240d7..60713bcc2279 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -262,6 +262,7 @@ struct bpf_object {
 		} *reloc;
 		int nr_reloc;
 		int maps_shndx;
+		int btf_maps_shndx;
 		int text_shndx;
 		int data_shndx;
 		int rodata_shndx;
@@ -514,6 +515,7 @@ static struct bpf_object *bpf_object__new(const char *path,
 	obj->efile.obj_buf = obj_buf;
 	obj->efile.obj_buf_sz = obj_buf_sz;
 	obj->efile.maps_shndx = -1;
+	obj->efile.btf_maps_shndx = -1;
 	obj->efile.data_shndx = -1;
 	obj->efile.rodata_shndx = -1;
 	obj->efile.bss_shndx = -1;
@@ -1012,6 +1014,297 @@ static int bpf_object__init_user_maps(struct bpf_object *obj, bool strict)
 	return 0;
 }
 
+static const struct btf_type *skip_mods_and_typedefs(const struct btf *btf,
+						     __u32 id)
+{
+	const struct btf_type *t = btf__type_by_id(btf, id);
+
+	while (true) {
+		switch (BTF_INFO_KIND(t->info)) {
+		case BTF_KIND_VOLATILE:
+		case BTF_KIND_CONST:
+		case BTF_KIND_RESTRICT:
+		case BTF_KIND_TYPEDEF:
+			t = btf__type_by_id(btf, t->type);
+			break;
+		default:
+			return t;
+		}
+	}
+}
+
+static bool get_map_attr_int(const char *map_name,
+			     const struct btf *btf,
+			     const struct btf_type *def,
+			     const struct btf_member *m,
+			     const void *data, __u32 *res) {
+	const struct btf_type *t = skip_mods_and_typedefs(btf, m->type);
+	const char *name = btf__name_by_offset(btf, m->name_off);
+	__u32 int_info = *(const __u32 *)(const void *)(t + 1);
+
+	if (BTF_INFO_KIND(t->info) != BTF_KIND_INT) {
+		pr_warning("map '%s': attr '%s': expected INT, got %u.\n",
+			   map_name, name, BTF_INFO_KIND(t->info));
+		return false;
+	}
+	if (t->size != 4 || BTF_INT_BITS(int_info) != 32 ||
+	    BTF_INT_OFFSET(int_info)) {
+		pr_warning("map '%s': attr '%s': expected 32-bit non-bitfield integer, "
+			   "got %u-byte (%d-bit) one with bit offset %d.\n",
+			   map_name, name, t->size, BTF_INT_BITS(int_info),
+			   BTF_INT_OFFSET(int_info));
+		return false;
+	}
+	if (BTF_INFO_KFLAG(def->info) && BTF_MEMBER_BITFIELD_SIZE(m->offset)) {
+		pr_warning("map '%s': attr '%s': bitfield is not supported.\n",
+			   map_name, name);
+		return false;
+	}
+	if (m->offset % 32) {
+		pr_warning("map '%s': attr '%s': unaligned fields are not supported.\n",
+			   map_name, name);
+		return false;
+	}
+
+	*res = *(const __u32 *)(data + m->offset / 8);
+	return true;
+}
+
+static int bpf_object__init_user_btf_map(struct bpf_object *obj,
+					 const struct btf_type *sec,
+					 int var_idx, int sec_idx,
+					 const Elf_Data *data, bool strict)
+{
+	const struct btf_type *var, *def, *t;
+	const struct btf_var_secinfo *vi;
+	const struct btf_var *var_extra;
+	const struct btf_member *m;
+	const void *def_data;
+	const char *map_name;
+	struct bpf_map *map;
+	int vlen, i;
+
+	vi = (const struct btf_var_secinfo *)(const void *)(sec + 1) + var_idx;
+	var = btf__type_by_id(obj->btf, vi->type);
+	var_extra = (const void *)(var + 1);
+	map_name = btf__name_by_offset(obj->btf, var->name_off);
+	vlen = BTF_INFO_VLEN(var->info);
+
+	if (map_name == NULL || map_name[0] == '\0') {
+		pr_warning("map #%d: empty name.\n", var_idx);
+		return -EINVAL;
+	}
+	if ((__u64)vi->offset + vi->size > data->d_size) {
+		pr_warning("map '%s' BTF data is corrupted.\n", map_name);
+		return -EINVAL;
+	}
+	if (BTF_INFO_KIND(var->info) != BTF_KIND_VAR) {
+		pr_warning("map '%s': unexpected var kind %u.\n",
+			   map_name, BTF_INFO_KIND(var->info));
+		return -EINVAL;
+	}
+	if (var_extra->linkage != BTF_VAR_GLOBAL_ALLOCATED &&
+	    var_extra->linkage != BTF_VAR_STATIC) {
+		pr_warning("map '%s': unsupported var linkage %u.\n",
+			   map_name, var_extra->linkage);
+		return -EOPNOTSUPP;
+	}
+
+	def = skip_mods_and_typedefs(obj->btf, var->type);
+	if (BTF_INFO_KIND(def->info) != BTF_KIND_STRUCT) {
+		pr_warning("map '%s': unexpected def kind %u.\n",
+			   map_name, BTF_INFO_KIND(def->info));
+		return -EINVAL;
+	}
+	if (def->size > vi->size) {
+		pr_warning("map '%s': invalid def size.\n", map_name);
+		return -EINVAL;
+	}
+
+	map = bpf_object__add_map(obj);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+	map->name = strdup(map_name);
+	if (!map->name) {
+		pr_warning("map '%s': failed to alloc map name.\n", map_name);
+		return -ENOMEM;
+	}
+	map->libbpf_type = LIBBPF_MAP_UNSPEC;
+	map->def.type = BPF_MAP_TYPE_UNSPEC;
+	map->sec_idx = sec_idx;
+	map->sec_offset = vi->offset;
+	pr_debug("map '%s': at sec_idx %d, offset %zu.\n",
+		 map_name, map->sec_idx, map->sec_offset);
+
+	def_data = data->d_buf + vi->offset;
+	vlen = BTF_INFO_VLEN(def->info);
+	m = (const void *)(def + 1);
+	for (i = 0; i < vlen; i++, m++) {
+		const char *name = btf__name_by_offset(obj->btf, m->name_off);
+
+		if (strcmp(name, "type") == 0) {
+			if (!get_map_attr_int(map_name, obj->btf, def, m,
+					      def_data, &map->def.type))
+				return -EINVAL;
+			pr_debug("map '%s': found type = %u.\n",
+				 map_name, map->def.type);
+		} else if (strcmp(name, "max_entries") == 0) {
+			if (!get_map_attr_int(map_name, obj->btf, def, m,
+					      def_data, &map->def.max_entries))
+				return -EINVAL;
+			pr_debug("map '%s': found max_entries = %u.\n",
+				 map_name, map->def.max_entries);
+		} else if (strcmp(name, "map_flags") == 0) {
+			if (!get_map_attr_int(map_name, obj->btf, def, m,
+					      def_data, &map->def.map_flags))
+				return -EINVAL;
+			pr_debug("map '%s': found map_flags = %u.\n",
+				 map_name, map->def.map_flags);
+		} else if (strcmp(name, "key_size") == 0) {
+			__u32 sz;
+
+			if (!get_map_attr_int(map_name, obj->btf, def, m,
+					      def_data, &sz))
+				return -EINVAL;
+			pr_debug("map '%s': found key_size = %u.\n",
+				 map_name, sz);
+			if (map->def.key_size && map->def.key_size != sz) {
+				pr_warning("map '%s': conflicting key size %u != %u.\n",
+					   map_name, map->def.key_size, sz);
+				return -EINVAL;
+			}
+			map->def.key_size = sz;
+		} else if (strcmp(name, "key") == 0) {
+			__s64 sz;
+
+			t = btf__type_by_id(obj->btf, m->type);
+			if (BTF_INFO_KIND(t->info) != BTF_KIND_PTR) {
+				pr_warning("map '%s': key spec is not PTR: %u.\n",
+					   map_name, BTF_INFO_KIND(t->info));
+				return -EINVAL;
+			}
+			sz = btf__resolve_size(obj->btf, t->type);
+			if (sz < 0) {
+				pr_warning("map '%s': can't determine key size for type [%u]: %lld.\n",
+					   map_name, t->type, sz);
+				return sz;
+			}
+			pr_debug("map '%s': found key [%u], sz = %lld.\n",
+				 map_name, t->type, sz);
+			if (map->def.key_size && map->def.key_size != sz) {
+				pr_warning("map '%s': conflicting key size %u != %lld.\n",
+					   map_name, map->def.key_size, sz);
+				return -EINVAL;
+			}
+			map->def.key_size = sz;
+			map->btf_key_type_id = t->type;
+		} else if (strcmp(name, "value_size") == 0) {
+			__u32 sz;
+
+			if (!get_map_attr_int(map_name, obj->btf, def, m,
+					      def_data, &sz))
+				return -EINVAL;
+			pr_debug("map '%s': found value_size = %u.\n",
+				 map_name, sz);
+			if (map->def.value_size && map->def.value_size != sz) {
+				pr_warning("map '%s': conflicting value size %u != %u.\n",
+					   map_name, map->def.value_size, sz);
+				return -EINVAL;
+			}
+			map->def.value_size = sz;
+		} else if (strcmp(name, "value") == 0) {
+			__s64 sz;
+
+			t = btf__type_by_id(obj->btf, m->type);
+			if (BTF_INFO_KIND(t->info) != BTF_KIND_PTR) {
+				pr_warning("map '%s': value spec is not PTR: %u.\n",
+					   map_name, BTF_INFO_KIND(t->info));
+				return -EINVAL;
+			}
+			sz = btf__resolve_size(obj->btf, t->type);
+			if (sz < 0) {
+				pr_warning("map '%s': can't determine value size for type [%u]: %lld.\n",
+					   map_name, t->type, sz);
+				return sz;
+			}
+			pr_debug("map '%s': found value [%u], sz = %lld.\n",
+				 map_name, t->type, sz);
+			if (map->def.value_size && map->def.value_size != sz) {
+				pr_warning("map '%s': conflicting value size %u != %lld.\n",
+					   map_name, map->def.value_size, sz);
+				return -EINVAL;
+			}
+			map->def.value_size = sz;
+			map->btf_value_type_id = t->type;
+		} else {
+			if (strict) {
+				pr_warning("map '%s': unknown attribute '%s'.\n",
+					   map_name, name);
+				return -ENOTSUP;
+			}
+			pr_debug("map '%s': ignoring unknown attribute '%s'.\n",
+				 map_name, name);
+		}
+	}
+
+	if (map->def.type == BPF_MAP_TYPE_UNSPEC) {
+		pr_warning("map '%s': map type isn't specified.\n", map_name);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int bpf_object__init_user_btf_maps(struct bpf_object *obj, bool strict)
+{
+	const struct btf_type *sec = NULL;
+	int nr_types, i, vlen, err;
+	const struct btf_type *t;
+	const char *name;
+	Elf_Data *data;
+	Elf_Scn *scn;
+
+	if (obj->efile.btf_maps_shndx < 0)
+		return 0;
+
+	scn = elf_getscn(obj->efile.elf, obj->efile.btf_maps_shndx);
+	if (scn)
+		data = elf_getdata(scn, NULL);
+	if (!scn || !data) {
+		pr_warning("failed to get Elf_Data from map section %d (%s)\n",
+			   obj->efile.btf_maps_shndx, MAPS_ELF_SEC);
+		return -EINVAL;
+	}
+
+	nr_types = btf__get_nr_types(obj->btf);
+	for (i = 1; i <= nr_types; i++) {
+		t = btf__type_by_id(obj->btf, i);
+		if (BTF_INFO_KIND(t->info) != BTF_KIND_DATASEC)
+			continue;
+		name = btf__name_by_offset(obj->btf, t->name_off);
+		if (strcmp(name, MAPS_ELF_SEC) == 0) {
+			sec = t;
+			break;
+		}
+	}
+
+	if (!sec) {
+		pr_warning("DATASEC '%s' not found.\n", MAPS_ELF_SEC);
+		return -ENOENT;
+	}
+
+	vlen = BTF_INFO_VLEN(sec->info);
+	for (i = 0; i < vlen; i++) {
+		err = bpf_object__init_user_btf_map(obj, sec, i,
+						    obj->efile.btf_maps_shndx,
+						    data, strict);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int bpf_object__init_maps(struct bpf_object *obj, int flags)
 {
 	bool strict = !(flags & MAPS_RELAX_COMPAT);
@@ -1021,6 +1314,10 @@ static int bpf_object__init_maps(struct bpf_object *obj, int flags)
 	if (err)
 		return err;
 
+	err = bpf_object__init_user_btf_maps(obj, strict);
+	if (err)
+		return err;
+
 	err = bpf_object__init_global_data_maps(obj);
 	if (err)
 		return err;
@@ -1118,10 +1415,16 @@ static void bpf_object__sanitize_btf_ext(struct bpf_object *obj)
 	}
 }
 
+static bool bpf_object__is_btf_mandatory(const struct bpf_object *obj)
+{
+	return obj->efile.btf_maps_shndx >= 0;
+}
+
 static int bpf_object__init_btf(struct bpf_object *obj,
 				Elf_Data *btf_data,
 				Elf_Data *btf_ext_data)
 {
+	bool btf_required = bpf_object__is_btf_mandatory(obj);
 	int err = 0;
 
 	if (btf_data) {
@@ -1155,10 +1458,18 @@ static int bpf_object__init_btf(struct bpf_object *obj,
 	}
 out:
 	if (err || IS_ERR(obj->btf)) {
+		if (btf_required)
+			err = err ? : PTR_ERR(obj->btf);
+		else
+			err = 0;
 		if (!IS_ERR_OR_NULL(obj->btf))
 			btf__free(obj->btf);
 		obj->btf = NULL;
 	}
+	if (btf_required && !obj->btf) {
+		pr_warning("BTF is required, but is missing or corrupted.\n");
+		return err == 0 ? -ENOENT : err;
+	}
 	return 0;
 }
 
@@ -1178,6 +1489,8 @@ static int bpf_object__sanitize_and_load_btf(struct bpf_object *obj)
 			   BTF_ELF_SEC, err);
 		btf__free(obj->btf);
 		obj->btf = NULL;
+		if (bpf_object__is_btf_mandatory(obj))
+			return err;
 	}
 	return 0;
 }
@@ -1241,6 +1554,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 				return err;
 		} else if (strcmp(name, "maps") == 0) {
 			obj->efile.maps_shndx = idx;
+		} else if (strcmp(name, MAPS_ELF_SEC) == 0) {
+			obj->efile.btf_maps_shndx = idx;
 		} else if (strcmp(name, BTF_ELF_SEC) == 0) {
 			btf_data = data;
 		} else if (strcmp(name, BTF_EXT_ELF_SEC) == 0) {
@@ -1360,7 +1675,8 @@ static bool bpf_object__shndx_is_data(const struct bpf_object *obj,
 static bool bpf_object__shndx_is_maps(const struct bpf_object *obj,
 				      int shndx)
 {
-	return shndx == obj->efile.maps_shndx;
+	return shndx == obj->efile.maps_shndx ||
+	       shndx == obj->efile.btf_maps_shndx;
 }
 
 static bool bpf_object__relo_in_known_section(const struct bpf_object *obj,
@@ -1404,14 +1720,14 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 	prog->nr_reloc = nrels;
 
 	for (i = 0; i < nrels; i++) {
-		GElf_Sym sym;
-		GElf_Rel rel;
-		unsigned int insn_idx;
-		unsigned int shdr_idx;
 		struct bpf_insn *insns = prog->insns;
 		enum libbpf_map_type type;
+		unsigned int insn_idx;
+		unsigned int shdr_idx;
 		const char *name;
 		size_t map_idx;
+		GElf_Sym sym;
+		GElf_Rel rel;
 
 		if (!gelf_getrel(data, i, &rel)) {
 			pr_warning("relocation: failed to get %d reloc\n", i);
@@ -1505,14 +1821,18 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 	return 0;
 }
 
-static int bpf_map_find_btf_info(struct bpf_map *map, const struct btf *btf)
+static int bpf_map_find_btf_info(struct bpf_object *obj, struct bpf_map *map)
 {
 	struct bpf_map_def *def = &map->def;
 	__u32 key_type_id = 0, value_type_id = 0;
 	int ret;
 
+	/* if it's a BTF-defined map, we don't need to search for type IDs */
+	if (map->sec_idx == obj->efile.btf_maps_shndx)
+		return 0;
+
 	if (!bpf_map__is_internal(map)) {
-		ret = btf__get_map_kv_tids(btf, map->name, def->key_size,
+		ret = btf__get_map_kv_tids(obj->btf, map->name, def->key_size,
 					   def->value_size, &key_type_id,
 					   &value_type_id);
 	} else {
@@ -1520,7 +1840,7 @@ static int bpf_map_find_btf_info(struct bpf_map *map, const struct btf *btf)
 		 * LLVM annotates global data differently in BTF, that is,
 		 * only as '.data', '.bss' or '.rodata'.
 		 */
-		ret = btf__find_by_name(btf,
+		ret = btf__find_by_name(obj->btf,
 				libbpf_type_to_btf_name[map->libbpf_type]);
 	}
 	if (ret < 0)
@@ -1810,7 +2130,7 @@ bpf_object__create_maps(struct bpf_object *obj)
 		    map->inner_map_fd >= 0)
 			create_attr.inner_map_fd = map->inner_map_fd;
 
-		if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
+		if (obj->btf && !bpf_map_find_btf_info(obj, map)) {
 			create_attr.btf_fd = btf__fd(obj->btf);
 			create_attr.btf_key_type_id = map->btf_key_type_id;
 			create_attr.btf_value_type_id = map->btf_value_type_id;
-- 
2.17.1

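The lookup in bpf_object__init_user_btf_maps() walks type ids 1..btf__get_nr_types() (id 0 is reserved for void) and picks the first DATASEC whose name matches MAPS_ELF_SEC. A minimal sketch of the same scan over a hypothetical in-memory type array follows; the struct layout is invented for illustration, and using ".maps" as the section name is an assumption about the value of MAPS_ELF_SEC, which is defined outside this hunk.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical, simplified BTF type record: just a kind and a name. */
enum toy_kind { TOY_VOID, TOY_INT, TOY_VAR, TOY_DATASEC };

struct toy_type {
	enum toy_kind kind;
	const char *name;
};

/*
 * Returns the id of the first DATASEC named 'sec_name', or -ENOENT.
 * 'types' holds nr_types + 1 entries; id 0 is the reserved void type,
 * so valid ids run from 1 to nr_types inclusive, as in the patch's loop.
 */
static int toy_find_datasec(const struct toy_type *types, int nr_types,
			    const char *sec_name)
{
	int i;

	assert(types != NULL && sec_name != NULL);
	for (i = 1; i <= nr_types; i++) {
		if (types[i].kind != TOY_DATASEC)
			continue;
		if (strcmp(types[i].name, sec_name) == 0)
			return i;
	}
	return -ENOENT;
}
```

As in the patch, a missing DATASEC is reported as -ENOENT, so an object with a maps section but no matching BTF description fails to open rather than silently losing its maps.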

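The error handling added to bpf_object__init_btf() is easy to misread, so the sketch below reproduces its decision table in isolation: when the object has a section holding BTF-defined maps, a missing or broken .BTF becomes fatal (the original error if there is one, otherwise -ENOENT); otherwise BTF problems are swallowed and 0 is returned. The btf_state enum, struct btf_outcome and init_btf_result() are hypothetical names; the three-state enum stands in for the NULL, valid and ERR_PTR cases of obj->btf.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the three states obj->btf can be in. */
enum btf_state {
	BTF_ABSENT,	/* no .BTF section: obj->btf == NULL */
	BTF_VALID,	/* parsed successfully */
	BTF_ERROR,	/* parse failed: IS_ERR(obj->btf) */
};

struct btf_outcome {
	enum btf_state state;
	int ptr_err;	/* negative errno, used only for BTF_ERROR */
};

/*
 * Returns what bpf_object__init_btf() would return, given whether BTF is
 * mandatory (a BTF-defined maps section exists), the accumulated err, and
 * the state of obj->btf when control reaches the 'out' label.
 */
static int init_btf_result(bool btf_required, int err, struct btf_outcome btf)
{
	bool have_btf = btf.state == BTF_VALID;

	assert(btf.state != BTF_ERROR || btf.ptr_err < 0);

	if (err || btf.state == BTF_ERROR) {
		if (btf_required)
			err = err ? err : btf.ptr_err;
		else
			err = 0;
		have_btf = false;	/* BTF object is freed and dropped */
	}
	if (btf_required && !have_btf)
		return err == 0 ? -ENOENT : err;
	return 0;
}
```

For example, a required but absent .BTF yields -ENOENT, a required .BTF that failed to parse with -EINVAL yields -EINVAL, and the same failure without a BTF-defined maps section yields 0.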

end of thread (newest message: 2019-06-21  4:05 UTC)

Thread overview: 40+ messages
2019-05-31 20:21 [RFC PATCH bpf-next 0/8] BTF-defined BPF map definitions Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 1/8] libbpf: add common min/max macro to libbpf_internal.h Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 2/8] libbpf: extract BTF loading and simplify ELF parsing logic Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 3/8] libbpf: refactor map initialization Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 4/8] libbpf: identify maps by section index in addition to offset Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 5/8] libbpf: split initialization and loading of BTF Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF Andrii Nakryiko
2019-05-31 21:28   ` Stanislav Fomichev
2019-05-31 22:58     ` Andrii Nakryiko
2019-06-03  0:33       ` Jakub Kicinski
2019-06-03 21:54         ` Andrii Nakryiko
2019-06-03 23:34           ` Jakub Kicinski
2019-06-03 16:32       ` Stanislav Fomichev
2019-06-03 22:03         ` Andrii Nakryiko
2019-06-04  1:02           ` Stanislav Fomichev
2019-06-04  1:07             ` Alexei Starovoitov
2019-06-04  4:29               ` Stanislav Fomichev
2019-06-04 13:45                 ` Stanislav Fomichev
2019-06-04 17:31                   ` Andrii Nakryiko
2019-06-04 21:07                     ` Stanislav Fomichev
2019-06-04 21:22                       ` Andrii Nakryiko
2019-06-06 21:09                     ` Daniel Borkmann
2019-06-06 23:02                       ` Andrii Nakryiko
2019-06-06 23:27                         ` Alexei Starovoitov
2019-06-07  0:10                           ` Jakub Kicinski
2019-06-07  0:27                             ` Alexei Starovoitov
2019-06-07  1:02                               ` Jakub Kicinski
2019-06-10  1:17                                 ` explicit maps. Was: " Alexei Starovoitov
2019-06-10 21:15                                   ` Jakub Kicinski
2019-06-10 23:48                                   ` Andrii Nakryiko
2019-06-03 22:34   ` Andrii Nakryiko
2019-06-06 16:42   ` Lorenz Bauer
2019-06-06 22:34     ` Andrii Nakryiko
2019-06-17  9:07       ` Lorenz Bauer
2019-06-17 20:59         ` Andrii Nakryiko
2019-06-20  9:27           ` Lorenz Bauer
2019-06-21  4:05             ` Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 7/8] selftests/bpf: add test for BTF-defined maps Andrii Nakryiko
2019-05-31 20:21 ` [RFC PATCH bpf-next 8/8] selftests/bpf: switch tests to BTF-defined map definitions Andrii Nakryiko
2019-06-11  4:34 [RFC PATCH bpf-next 0/8] BTF-defined BPF " Andrii Nakryiko
2019-06-11  4:35 ` [RFC PATCH bpf-next 6/8] libbpf: allow specifying map definitions using BTF Andrii Nakryiko
