netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
@ 2019-08-20 11:47 Toke Høiland-Jørgensen
  2019-08-20 11:47 ` [RFC bpf-next 1/5] libbpf: Add map definition struct fields from iproute2 Toke Høiland-Jørgensen
                   ` (6 more replies)
  0 siblings, 7 replies; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-08-20 11:47 UTC (permalink / raw)
  To: Stephen Hemminger, Daniel Borkmann, Alexei Starovoitov
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen

iproute2 uses its own bpf loader to load eBPF programs, which has
evolved separately from libbpf. Since we are now standardising on
libbpf, this becomes a problem as iproute2 is slowly accumulating
feature incompatibilities with libbpf-based loaders. In particular,
iproute2 has its own (expanded) version of the map definition struct,
which makes it difficult to write programs that can be loaded with both
custom loaders and iproute2.

This series seeks to address this by converting iproute2 to using libbpf
for all its bpf needs. This version is an early proof-of-concept RFC, to
get some feedback on whether people think this is the right direction.

What this series does is the following:

- Updates the libbpf map definition struct to match that of iproute2
  (patch 1).
- Adds functionality to libbpf to support automatic pinning of maps when
  loading an eBPF program, while re-using pinned maps if they already
  exist (patches 2-3).
- Modifies iproute2 to make it possible to compile it against libbpf
  without affecting any existing functionality (patch 4).
- Changes the iproute2 eBPF loader to use libbpf for loading XDP
  programs (patch 5).


As this is an early PoC, there are still a few missing pieces before
this can be merged. Including (but probably not limited to):

- Consolidate the map definition struct in the bpf_helpers.h file in the
  kernel tree. This contains a different, and incompatible, update to
  the struct. Since the iproute2 version has actually been released for
  use outside the kernel tree (and thus is subject to API stability
  constraints), I think it makes the most sense to keep that, and port
  the selftests to use it.

- The iproute2 loader supports automatically populating map-in-map
  definitions on load. This needs to be added to libbpf as well. There
  is an implementation of this in the selftests in the kernel tree,
  which will have to be ported (related to the previous point).

- The iproute2 port needs to be completed; this means at least
  supporting TC eBPF programs as well, figuring out how to deal with
  cBPF programs, and getting the verbose output back to the same state
  as before the port. Also, I guess the iproute2 maintainers need to ACK
  that they are good with adding a dependency on libbpf.

- Some of the code added to libbpf in patch 2 in this series include
  code derived from iproute2, which is GPLv2+. So it will need to be
  re-licensed to be usable in libbpf. Since `git blame` indicated that
  the original code was written by Daniel, I figure he can ACK that
  relicensing before applying the patches :)


Please take a look at this series and let me know if you agree
that this is the right direction to go. Assuming there's consensus that
it is, I'll focus on getting the rest of the libbpf patches ready for
merging. I'll send those as a separate series, and hold off on the
iproute2 patches until they are merged; but for this version I'm
including both in one series so it's easier to see the context.

-Toke


libbpf:
Toke Høiland-Jørgensen (3):
  libbpf: Add map definition struct fields from iproute2
  libbpf: Add support for auto-pinning of maps with reuse on program
    load
  libbpf: Add support for specifying map pinning path via callback

 tools/lib/bpf/libbpf.c | 205 +++++++++++++++++++++++++++++++++++++++--
 tools/lib/bpf/libbpf.h |  18 ++++
 2 files changed, 214 insertions(+), 9 deletions(-)

iproute2:
Toke Høiland-Jørgensen (2):
  iproute2: Allow compiling against libbpf
  iproute2: Support loading XDP programs with libbpf

 configure          |  16 +++++
 include/bpf_util.h |   6 +-
 ip/ipvrf.c         |   4 +-
 lib/bpf.c          | 148 ++++++++++++++++++++++++++++++++++++---------
 4 files changed, 142 insertions(+), 32 deletions(-)


-- 
2.22.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC bpf-next 1/5] libbpf: Add map definition struct fields from iproute2
  2019-08-20 11:47 [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP) Toke Høiland-Jørgensen
@ 2019-08-20 11:47 ` Toke Høiland-Jørgensen
  2019-08-20 11:47 ` [RFC bpf-next 2/5] libbpf: Add support for auto-pinning of maps with reuse on program load Toke Høiland-Jørgensen
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-08-20 11:47 UTC (permalink / raw)
  To: Stephen Hemminger, Daniel Borkmann, Alexei Starovoitov
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen

The iproute2 bpf headers define four more fields for map definitions than
libbpf does. This adds those fields to the libbpf headers, in preparation
for porting the bpf loading functionality of iproute2 to be based on
libbpf. Subsequent commits add the functionality needed in libbpf to be
able to port over iproute2.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 tools/lib/bpf/libbpf.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index e8f70977d137..5facba6ea1e1 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -289,6 +289,10 @@ struct bpf_map_def {
 	unsigned int value_size;
 	unsigned int max_entries;
 	unsigned int map_flags;
+	unsigned int map_id;
+	unsigned int pinning;
+	unsigned int inner_id;
+	unsigned int inner_idx;
 };
 
 /*
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC bpf-next 2/5] libbpf: Add support for auto-pinning of maps with reuse on program load
  2019-08-20 11:47 [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP) Toke Høiland-Jørgensen
  2019-08-20 11:47 ` [RFC bpf-next 1/5] libbpf: Add map definition struct fields from iproute2 Toke Høiland-Jørgensen
@ 2019-08-20 11:47 ` Toke Høiland-Jørgensen
  2019-08-20 11:47 ` [RFC bpf-next 3/5] libbpf: Add support for specifying map pinning path via callback Toke Høiland-Jørgensen
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-08-20 11:47 UTC (permalink / raw)
  To: Stephen Hemminger, Daniel Borkmann, Alexei Starovoitov
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen

This adds support for automatically pinning maps on program load to libbpf.
This is needed for porting iproute2 bpf support to libbpf, but is also
useful in other contexts.

The semantics are modelled on those of the same functionality in iproute2,
namely:

- A path can be supplied in bpf_prog_load_attr specifying the directory
  that maps should be pinned into.

- Only maps that specify a non-zero value in its 'pinning' definition
  attribute will be pinned in the automatic mode.

- If an existing pinning is found at the pinning destination, its
  attributes will be compared and if they match, the existing map will be
  reused instead of creating a new map.

A subsequent commit will expand the functionality to enable programs to
support different pinning paths for different values of the map pinning
attribute, similar to what iproute2 does today.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 tools/lib/bpf/libbpf.c | 161 ++++++++++++++++++++++++++++++++++++++++-
 tools/lib/bpf/libbpf.h |   8 ++
 2 files changed, 168 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2233f919dd88..6d372a965c9d 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -220,6 +220,7 @@ struct bpf_map {
 	size_t sec_offset;
 	int map_ifindex;
 	int inner_map_fd;
+	int pin_reused;
 	struct bpf_map_def def;
 	__u32 btf_key_type_id;
 	__u32 btf_value_type_id;
@@ -3994,8 +3995,10 @@ int bpf_map__unpin(struct bpf_map *map, const char *path)
 	return 0;
 }
 
-int bpf_object__pin_maps(struct bpf_object *obj, const char *path)
+int bpf_object__pin_maps2(struct bpf_object *obj, const char *path,
+			  enum bpf_pin_mode mode)
 {
+	int explicit = (mode == BPF_PIN_MODE_EXPLICIT);
 	struct bpf_map *map;
 	int err;
 
@@ -4015,6 +4018,9 @@ int bpf_object__pin_maps(struct bpf_object *obj, const char *path)
 		char buf[PATH_MAX];
 		int len;
 
+		if ((explicit && !map->def.pinning) || map->pin_reused)
+			continue;
+
 		len = snprintf(buf, PATH_MAX, "%s/%s", path,
 			       bpf_map__name(map));
 		if (len < 0) {
@@ -4037,6 +4043,9 @@ int bpf_object__pin_maps(struct bpf_object *obj, const char *path)
 		char buf[PATH_MAX];
 		int len;
 
+		if ((explicit && !map->def.pinning) || map->pin_reused)
+			continue;
+
 		len = snprintf(buf, PATH_MAX, "%s/%s", path,
 			       bpf_map__name(map));
 		if (len < 0)
@@ -4050,6 +4059,11 @@ int bpf_object__pin_maps(struct bpf_object *obj, const char *path)
 	return err;
 }
 
+int bpf_object__pin_maps(struct bpf_object *obj, const char *path)
+{
+	return bpf_object__pin_maps2(obj, path, BPF_PIN_MODE_ALL);
+}
+
 int bpf_object__unpin_maps(struct bpf_object *obj, const char *path)
 {
 	struct bpf_map *map;
@@ -4802,6 +4816,141 @@ int bpf_prog_load(const char *file, enum bpf_prog_type type,
 	return bpf_prog_load_xattr(&attr, pobj, prog_fd);
 }
 
+static int bpf_read_map_info(int fd, struct bpf_map_def *map,
+			     enum bpf_prog_type *type)
+{
+	unsigned int val, owner_type = 0;
+	char file[PATH_MAX], buff[4096];
+	FILE *fp;
+
+	snprintf(file, sizeof(file), "/proc/%d/fdinfo/%d", getpid(), fd);
+	memset(map, 0, sizeof(*map));
+
+	fp = fopen(file, "r");
+	if (!fp) {
+		pr_warning("No procfs support?!\n");
+		return -EIO;
+	}
+
+	while (fgets(buff, sizeof(buff), fp)) {
+		if (sscanf(buff, "map_type:\t%u", &val) == 1)
+			map->type = val;
+		else if (sscanf(buff, "key_size:\t%u", &val) == 1)
+			map->key_size = val;
+		else if (sscanf(buff, "value_size:\t%u", &val) == 1)
+			map->value_size = val;
+		else if (sscanf(buff, "max_entries:\t%u", &val) == 1)
+			map->max_entries = val;
+		else if (sscanf(buff, "map_flags:\t%i", &val) == 1)
+			map->map_flags = val;
+		else if (sscanf(buff, "owner_prog_type:\t%i", &val) == 1)
+			owner_type = val;
+	}
+
+	fclose(fp);
+	if (type)
+		*type  = owner_type;
+
+	return 0;
+}
+
+static void bpf_map_pin_report(const struct bpf_map_def *pin,
+			       const struct bpf_map_def *obj)
+{
+	pr_warning("Map specification differs from pinned file!\n");
+
+	if (obj->type != pin->type)
+		pr_warning(" - Type:         %u (obj) != %u (pin)\n",
+			   obj->type, pin->type);
+	if (obj->key_size != pin->key_size)
+		pr_warning(" - Key size:     %u (obj) != %u (pin)\n",
+			   obj->key_size, pin->key_size);
+	if (obj->value_size != pin->value_size)
+		pr_warning(" - Value size:   %u (obj) != %u (pin)\n",
+			   obj->value_size, pin->value_size);
+	if (obj->max_entries != pin->max_entries)
+		pr_warning(" - Max entries:    %u (obj) != %u (pin)\n",
+			   obj->max_entries, pin->max_entries);
+	if (obj->map_flags != pin->map_flags)
+		pr_warning(" - Flags:        %#x (obj) != %#x (pin)\n",
+			   obj->map_flags, pin->map_flags);
+
+	pr_warning("\n");
+}
+
+
+
+static int bpf_map_selfcheck_pinned(int fd, const struct bpf_map_def *map,
+				    int length, enum bpf_prog_type type)
+{
+	enum bpf_prog_type owner_type = 0;
+	struct bpf_map_def tmp, zero = {};
+	int ret;
+
+	ret = bpf_read_map_info(fd, &tmp, &owner_type);
+	if (ret < 0)
+		return ret;
+
+	/* The decision to reject this is on kernel side eventually, but
+	 * at least give the user a chance to know what's wrong.
+	 */
+	if (owner_type && owner_type != type)
+		pr_warning("Program array map owner types differ: %u (obj) != %u (pin)\n",
+			   type, owner_type);
+
+	if (!memcmp(&tmp, map, length)) {
+		return 0;
+	} else {
+		/* If kernel doesn't have eBPF-related fdinfo, we cannot do much,
+		 * so just accept it. We know we do have an eBPF fd and in this
+		 * case, everything is 0. It is guaranteed that no such map exists
+		 * since map type of 0 is unloadable BPF_MAP_TYPE_UNSPEC.
+		 */
+		if (!memcmp(&tmp, &zero, length))
+			return 0;
+
+		bpf_map_pin_report(&tmp, map);
+		return -EINVAL;
+	}
+}
+
+
+int bpf_probe_pinned(const struct bpf_map *map,
+		     const struct bpf_prog_load_attr *attr)
+{
+	const char *name = bpf_map__name(map);
+	char buf[PATH_MAX];
+	int fd, len, ret;
+
+	if (!attr->auto_pin_path)
+		return -ENOENT;
+
+	len = snprintf(buf, PATH_MAX, "%s/%s", attr->auto_pin_path,
+		       name);
+	if (len < 0)
+		return -EINVAL;
+	else if (len >= PATH_MAX)
+		return -ENAMETOOLONG;
+
+	fd = bpf_obj_get(buf);
+	if (fd <= 0)
+		return fd;
+
+	ret = bpf_map_selfcheck_pinned(fd, &map->def,
+				       offsetof(struct bpf_map_def,
+						map_id),
+				       attr->prog_type);
+	if (ret < 0) {
+		close(fd);
+		pr_warning("Map \'%s\' self-check failed!\n", name);
+		return ret;
+	}
+	if (attr->log_level)
+		pr_debug("Map \'%s\' loaded as pinned!\n", name);
+
+	return fd;
+}
+
 int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
 			struct bpf_object **pobj, int *prog_fd)
 {
@@ -4853,8 +5002,14 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
 	}
 
 	bpf_object__for_each_map(map, obj) {
+		int fd;
+
 		if (!bpf_map__is_offload_neutral(map))
 			map->map_ifindex = attr->ifindex;
+
+		fd = bpf_probe_pinned(map, attr);
+		if (fd > 0 && !bpf_map__reuse_fd(map, fd))
+			map->pin_reused = 1;
 	}
 
 	if (!first_prog) {
@@ -4869,6 +5024,10 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
 		return -EINVAL;
 	}
 
+	if (attr->auto_pin_path)
+		bpf_object__pin_maps2(obj, attr->auto_pin_path,
+				      BPF_PIN_MODE_EXPLICIT);
+
 	*pobj = obj;
 	*prog_fd = bpf_program__fd(first_prog);
 	return 0;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 5facba6ea1e1..3c5c3256e22d 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -67,6 +67,11 @@ struct bpf_object_open_attr {
 	enum bpf_prog_type prog_type;
 };
 
+enum bpf_pin_mode {
+	BPF_PIN_MODE_ALL = 0,
+	BPF_PIN_MODE_EXPLICIT,
+};
+
 LIBBPF_API struct bpf_object *bpf_object__open(const char *path);
 LIBBPF_API struct bpf_object *
 bpf_object__open_xattr(struct bpf_object_open_attr *attr);
@@ -79,6 +84,8 @@ int bpf_object__section_size(const struct bpf_object *obj, const char *name,
 			     __u32 *size);
 int bpf_object__variable_offset(const struct bpf_object *obj, const char *name,
 				__u32 *off);
+LIBBPF_API int bpf_object__pin_maps2(struct bpf_object *obj, const char *path,
+				     enum bpf_pin_mode mode);
 LIBBPF_API int bpf_object__pin_maps(struct bpf_object *obj, const char *path);
 LIBBPF_API int bpf_object__unpin_maps(struct bpf_object *obj,
 				      const char *path);
@@ -353,6 +360,7 @@ struct bpf_prog_load_attr {
 	int ifindex;
 	int log_level;
 	int prog_flags;
+	const char *auto_pin_path;
 };
 
 LIBBPF_API int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC bpf-next 3/5] libbpf: Add support for specifying map pinning path via callback
  2019-08-20 11:47 [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP) Toke Høiland-Jørgensen
  2019-08-20 11:47 ` [RFC bpf-next 1/5] libbpf: Add map definition struct fields from iproute2 Toke Høiland-Jørgensen
  2019-08-20 11:47 ` [RFC bpf-next 2/5] libbpf: Add support for auto-pinning of maps with reuse on program load Toke Høiland-Jørgensen
@ 2019-08-20 11:47 ` Toke Høiland-Jørgensen
  2019-08-20 11:47 ` [RFC bpf-next 4/5] iproute2: Allow compiling against libbpf Toke Høiland-Jørgensen
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-08-20 11:47 UTC (permalink / raw)
  To: Stephen Hemminger, Daniel Borkmann, Alexei Starovoitov
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen

This adds a callback parameter that can be set in struct bpf_prog_load_attr
which will allow the calling program to specify arbitrary paths for each
map included in an ELF file being loaded.

In particular, this allows iproute2 to implement its current semantics for
map pinning on top of libbpf.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 tools/lib/bpf/libbpf.c | 73 +++++++++++++++++++++++++++++-------------
 tools/lib/bpf/libbpf.h |  6 ++++
 2 files changed, 57 insertions(+), 22 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 6d372a965c9d..a5c379378f26 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -3995,8 +3995,33 @@ int bpf_map__unpin(struct bpf_map *map, const char *path)
 	return 0;
 }
 
-int bpf_object__pin_maps2(struct bpf_object *obj, const char *path,
-			  enum bpf_pin_mode mode)
+int __bpf_map__pin_path(const struct bpf_map *map,
+			const char *path,
+			map_pinning_path_fn fn, void *fn_ctx,
+			char *buf, int buf_len)
+{
+	const char *name = bpf_map__name(map);
+	int len;
+
+	if (fn) {
+		len = fn(fn_ctx, buf, buf_len, name, map->def.pinning);
+		buf[buf_len - 1] = '\0';
+		return len;
+	}
+
+	len = snprintf(buf, PATH_MAX, "%s/%s", path,
+		       name);
+	if (len < 0)
+		return -EINVAL;
+	else if (len >= PATH_MAX)
+		return -ENAMETOOLONG;
+
+	return len;
+}
+
+int __bpf_object__pin_maps(struct bpf_object *obj, enum bpf_pin_mode mode,
+			   const char *path,
+			   map_pinning_path_fn cb, void *cb_ctx)
 {
 	int explicit = (mode == BPF_PIN_MODE_EXPLICIT);
 	struct bpf_map *map;
@@ -4010,9 +4035,11 @@ int bpf_object__pin_maps2(struct bpf_object *obj, const char *path,
 		return -ENOENT;
 	}
 
-	err = make_dir(path);
-	if (err)
-		return err;
+	if (path) {
+		err = make_dir(path);
+		if (err)
+			return err;
+	}
 
 	bpf_object__for_each_map(map, obj) {
 		char buf[PATH_MAX];
@@ -4021,14 +4048,10 @@ int bpf_object__pin_maps2(struct bpf_object *obj, const char *path,
 		if ((explicit && !map->def.pinning) || map->pin_reused)
 			continue;
 
-		len = snprintf(buf, PATH_MAX, "%s/%s", path,
-			       bpf_map__name(map));
+		len = __bpf_map__pin_path(map, path, cb, cb_ctx, buf, PATH_MAX);
 		if (len < 0) {
 			err = -EINVAL;
 			goto err_unpin_maps;
-		} else if (len >= PATH_MAX) {
-			err = -ENAMETOOLONG;
-			goto err_unpin_maps;
 		}
 
 		err = bpf_map__pin(map, buf);
@@ -4059,6 +4082,13 @@ int bpf_object__pin_maps2(struct bpf_object *obj, const char *path,
 	return err;
 }
 
+int bpf_object__pin_maps2(struct bpf_object *obj, const char *path,
+			  enum bpf_pin_mode mode)
+{
+	return __bpf_object__pin_maps(obj, mode, path, NULL, NULL);
+}
+
+
 int bpf_object__pin_maps(struct bpf_object *obj, const char *path)
 {
 	return bpf_object__pin_maps2(obj, path, BPF_PIN_MODE_ALL);
@@ -4914,23 +4944,21 @@ static int bpf_map_selfcheck_pinned(int fd, const struct bpf_map_def *map,
 	}
 }
 
-
 int bpf_probe_pinned(const struct bpf_map *map,
 		     const struct bpf_prog_load_attr *attr)
 {
 	const char *name = bpf_map__name(map);
 	char buf[PATH_MAX];
-	int fd, len, ret;
+	int fd, ret;
 
-	if (!attr->auto_pin_path)
+	if (!attr->auto_pin_path && !attr->auto_pin_cb)
 		return -ENOENT;
 
-	len = snprintf(buf, PATH_MAX, "%s/%s", attr->auto_pin_path,
-		       name);
-	if (len < 0)
-		return -EINVAL;
-	else if (len >= PATH_MAX)
-		return -ENAMETOOLONG;
+	ret = __bpf_map__pin_path(map, attr->auto_pin_path,
+				  attr->auto_pin_cb, attr->auto_pin_ctx,
+				  buf, PATH_MAX);
+	if (ret < 0)
+		return ret;
 
 	fd = bpf_obj_get(buf);
 	if (fd <= 0)
@@ -5024,9 +5052,10 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
 		return -EINVAL;
 	}
 
-	if (attr->auto_pin_path)
-		bpf_object__pin_maps2(obj, attr->auto_pin_path,
-				      BPF_PIN_MODE_EXPLICIT);
+	if (attr->auto_pin_path || attr->auto_pin_cb)
+		__bpf_object__pin_maps(obj, BPF_PIN_MODE_EXPLICIT,
+				       attr->auto_pin_path, attr->auto_pin_cb,
+				       attr->auto_pin_ctx);
 
 	*pobj = obj;
 	*prog_fd = bpf_program__fd(first_prog);
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 3c5c3256e22d..65fdd3389d27 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -72,6 +72,10 @@ enum bpf_pin_mode {
 	BPF_PIN_MODE_EXPLICIT,
 };
 
+struct bpf_map_def;
+typedef int (*map_pinning_path_fn)(void *ctx, char *buf, int buf_len,
+				   const char *name, unsigned int pinning);
+
 LIBBPF_API struct bpf_object *bpf_object__open(const char *path);
 LIBBPF_API struct bpf_object *
 bpf_object__open_xattr(struct bpf_object_open_attr *attr);
@@ -361,6 +365,8 @@ struct bpf_prog_load_attr {
 	int log_level;
 	int prog_flags;
 	const char *auto_pin_path;
+	map_pinning_path_fn auto_pin_cb;
+	void *auto_pin_ctx;
 };
 
 LIBBPF_API int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC bpf-next 4/5] iproute2: Allow compiling against libbpf
  2019-08-20 11:47 [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP) Toke Høiland-Jørgensen
                   ` (2 preceding siblings ...)
  2019-08-20 11:47 ` [RFC bpf-next 3/5] libbpf: Add support for specifying map pinning path via callback Toke Høiland-Jørgensen
@ 2019-08-20 11:47 ` Toke Høiland-Jørgensen
  2019-08-22  8:58   ` Daniel Borkmann
  2019-08-20 11:47 ` [RFC bpf-next 5/5] iproute2: Support loading XDP programs with libbpf Toke Høiland-Jørgensen
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-08-20 11:47 UTC (permalink / raw)
  To: Stephen Hemminger, Daniel Borkmann, Alexei Starovoitov
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen

This adds a configure check for libbpf and renames functions to allow
lib/bpf.c to be compiled with it present. This makes it possible to
port functionality piecemeal to use libbpf.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 configure          | 16 ++++++++++++++++
 include/bpf_util.h |  6 +++---
 ip/ipvrf.c         |  4 ++--
 lib/bpf.c          | 33 +++++++++++++++++++--------------
 4 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/configure b/configure
index 45fcffb6..5a89ee9f 100755
--- a/configure
+++ b/configure
@@ -238,6 +238,19 @@ check_elf()
     fi
 }
 
+check_libbpf()
+{
+    if ${PKG_CONFIG} libbpf --exists; then
+	echo "HAVE_LIBBPF:=y" >>$CONFIG
+	echo "yes"
+
+	echo 'CFLAGS += -DHAVE_LIBBPF' `${PKG_CONFIG} libbpf --cflags` >> $CONFIG
+	echo 'LDLIBS += ' `${PKG_CONFIG} libbpf --libs` >>$CONFIG
+    else
+	echo "no"
+    fi
+}
+
 check_selinux()
 # SELinux is a compile time option in the ss utility
 {
@@ -386,6 +399,9 @@ check_selinux
 echo -n "ELF support: "
 check_elf
 
+echo -n "libbpf support: "
+check_libbpf
+
 echo -n "libmnl support: "
 check_mnl
 
diff --git a/include/bpf_util.h b/include/bpf_util.h
index 63db07ca..72d3a32c 100644
--- a/include/bpf_util.h
+++ b/include/bpf_util.h
@@ -274,9 +274,9 @@ int bpf_trace_pipe(void);
 
 void bpf_print_ops(struct rtattr *bpf_ops, __u16 len);
 
-int bpf_prog_load(enum bpf_prog_type type, const struct bpf_insn *insns,
-		  size_t size_insns, const char *license, char *log,
-		  size_t size_log);
+int bpf_prog_load_buf(enum bpf_prog_type type, const struct bpf_insn *insns,
+		      size_t size_insns, const char *license, char *log,
+		      size_t size_log);
 
 int bpf_prog_attach_fd(int prog_fd, int target_fd, enum bpf_attach_type type);
 int bpf_prog_detach_fd(int target_fd, enum bpf_attach_type type);
diff --git a/ip/ipvrf.c b/ip/ipvrf.c
index 43366f6e..1d1aae6f 100644
--- a/ip/ipvrf.c
+++ b/ip/ipvrf.c
@@ -256,8 +256,8 @@ static int prog_load(int idx)
 		BPF_EXIT_INSN(),
 	};
 
-	return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
-			     "GPL", bpf_log_buf, sizeof(bpf_log_buf));
+	return bpf_prog_load_buf(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
+				 "GPL", bpf_log_buf, sizeof(bpf_log_buf));
 }
 
 static int vrf_configure_cgroup(const char *path, int ifindex)
diff --git a/lib/bpf.c b/lib/bpf.c
index 7d2a322f..c6e3bd0d 100644
--- a/lib/bpf.c
+++ b/lib/bpf.c
@@ -28,6 +28,11 @@
 #include <gelf.h>
 #endif
 
+#ifdef HAVE_LIBBPF
+#include <bpf/libbpf.h>
+#include <bpf/bpf.h>
+#endif
+
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <sys/un.h>
@@ -795,7 +800,7 @@ out:
 	return mnt;
 }
 
-static int bpf_obj_get(const char *pathname, enum bpf_prog_type type)
+static int bpf_obj_get_path(const char *pathname, enum bpf_prog_type type)
 {
 	union bpf_attr attr = {};
 	char tmp[PATH_MAX];
@@ -814,7 +819,7 @@ static int bpf_obj_get(const char *pathname, enum bpf_prog_type type)
 
 static int bpf_obj_pinned(const char *pathname, enum bpf_prog_type type)
 {
-	int prog_fd = bpf_obj_get(pathname, type);
+	int prog_fd = bpf_obj_get_path(pathname, type);
 
 	if (prog_fd < 0)
 		fprintf(stderr, "Couldn\'t retrieve pinned program \'%s\': %s\n",
@@ -1036,7 +1041,7 @@ int bpf_graft_map(const char *map_path, uint32_t *key, int argc, char **argv)
 		}
 	}
 
-	map_fd = bpf_obj_get(map_path, cfg.type);
+	map_fd = bpf_obj_get_path(map_path, cfg.type);
 	if (map_fd < 0) {
 		fprintf(stderr, "Couldn\'t retrieve pinned map \'%s\': %s\n",
 			map_path, strerror(errno));
@@ -1105,9 +1110,9 @@ static int bpf_prog_load_dev(enum bpf_prog_type type,
 	return bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
 }
 
-int bpf_prog_load(enum bpf_prog_type type, const struct bpf_insn *insns,
-		  size_t size_insns, const char *license, char *log,
-		  size_t size_log)
+int bpf_prog_load_buf(enum bpf_prog_type type, const struct bpf_insn *insns,
+		      size_t size_insns, const char *license, char *log,
+		      size_t size_log)
 {
 	return bpf_prog_load_dev(type, insns, size_insns, license, 0,
 				 log, size_log);
@@ -1284,7 +1289,7 @@ static int bpf_btf_load(void *btf, size_t size_btf,
 	return bpf(BPF_BTF_LOAD, &attr, sizeof(attr));
 }
 
-static int bpf_obj_pin(int fd, const char *pathname)
+static int bpf_obj_pin_fd(int fd, const char *pathname)
 {
 	union bpf_attr attr = {};
 
@@ -1433,7 +1438,7 @@ static int bpf_probe_pinned(const char *name, const struct bpf_elf_ctx *ctx,
 		return 0;
 
 	bpf_make_pathname(pathname, sizeof(pathname), name, ctx, pinning);
-	return bpf_obj_get(pathname, ctx->type);
+	return bpf_obj_get_path(pathname, ctx->type);
 }
 
 static int bpf_make_obj_path(const struct bpf_elf_ctx *ctx)
@@ -1501,7 +1506,7 @@ static int bpf_place_pinned(int fd, const char *name,
 		return ret;
 
 	bpf_make_pathname(pathname, sizeof(pathname), name, ctx, pinning);
-	return bpf_obj_pin(fd, pathname);
+	return bpf_obj_pin_fd(fd, pathname);
 }
 
 static void bpf_prog_report(int fd, const char *section,
@@ -1523,9 +1528,9 @@ static void bpf_prog_report(int fd, const char *section,
 	bpf_dump_error(ctx, "Verifier analysis:\n\n");
 }
 
-static int bpf_prog_attach(const char *section,
-			   const struct bpf_elf_prog *prog,
-			   struct bpf_elf_ctx *ctx)
+static int bpf_prog_attach_section(const char *section,
+				   const struct bpf_elf_prog *prog,
+				   struct bpf_elf_ctx *ctx)
 {
 	int tries = 0, fd;
 retry:
@@ -2347,7 +2352,7 @@ static int bpf_fetch_prog(struct bpf_elf_ctx *ctx, const char *section,
 		prog.insns_num = prog.size / sizeof(struct bpf_insn);
 		prog.insns     = data.sec_data->d_buf;
 
-		fd = bpf_prog_attach(section, &prog, ctx);
+		fd = bpf_prog_attach_section(section, &prog, ctx);
 		if (fd < 0)
 			return fd;
 
@@ -2513,7 +2518,7 @@ static int bpf_fetch_prog_relo(struct bpf_elf_ctx *ctx, const char *section,
 			goto out;
 		}
 
-		fd = bpf_prog_attach(section, prog, ctx);
+		fd = bpf_prog_attach_section(section, prog, ctx);
 		free(prog->insns);
 		if (fd < 0) {
 			*lderr = true;
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC bpf-next 5/5] iproute2: Support loading XDP programs with libbpf
  2019-08-20 11:47 [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP) Toke Høiland-Jørgensen
                   ` (3 preceding siblings ...)
  2019-08-20 11:47 ` [RFC bpf-next 4/5] iproute2: Allow compiling against libbpf Toke Høiland-Jørgensen
@ 2019-08-20 11:47 ` Toke Høiland-Jørgensen
  2019-08-21 19:26 ` [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP) Alexei Starovoitov
  2019-08-21 20:30 ` Andrii Nakryiko
  6 siblings, 0 replies; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-08-20 11:47 UTC (permalink / raw)
  To: Stephen Hemminger, Daniel Borkmann, Alexei Starovoitov
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen

This switches over loading of XDP programs to the using libbpf, if it is
available. It uses the automatic pinning features added to libbpf to
construct the same pinning paths as the libelf-based loader.

Since map-in-map support has not yet been added to libbpf, this means that
map-in-map definitions will not work with this patch.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 lib/bpf.c | 115 ++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 102 insertions(+), 13 deletions(-)

diff --git a/lib/bpf.c b/lib/bpf.c
index c6e3bd0d..de1a655a 100644
--- a/lib/bpf.c
+++ b/lib/bpf.c
@@ -938,9 +938,17 @@ static int bpf_do_parse(struct bpf_cfg_in *cfg, const bool *opt_tbl)
 	return ret;
 }
 
+#ifdef HAVE_LIBBPF
+static int bpf_do_load_libbpf(struct bpf_cfg_in *cfg);
+#endif
+
 static int bpf_do_load(struct bpf_cfg_in *cfg)
 {
 	if (cfg->mode == EBPF_OBJECT) {
+#ifdef HAVE_LIBBPF
+		if(cfg->type == BPF_PROG_TYPE_XDP)
+			return bpf_do_load_libbpf(cfg);
+#endif
 		cfg->prog_fd = bpf_obj_open(cfg->object, cfg->type,
 					    cfg->section, cfg->ifindex,
 					    cfg->verbose);
@@ -1407,25 +1415,22 @@ static bool bpf_no_pinning(const struct bpf_elf_ctx *ctx,
 	}
 }
 
-static void bpf_make_pathname(char *pathname, size_t len, const char *name,
+static int bpf_make_pathname(char *pathname, size_t len, const char *name,
 			      const struct bpf_elf_ctx *ctx, uint32_t pinning)
 {
 	switch (pinning) {
 	case PIN_OBJECT_NS:
-		snprintf(pathname, len, "%s/%s/%s",
-			 bpf_get_work_dir(ctx->type),
-			 ctx->obj_uid, name);
-		break;
+		return snprintf(pathname, len, "%s/%s/%s",
+				bpf_get_work_dir(ctx->type),
+				ctx->obj_uid, name);
 	case PIN_GLOBAL_NS:
-		snprintf(pathname, len, "%s/%s/%s",
-			 bpf_get_work_dir(ctx->type),
-			 BPF_DIR_GLOBALS, name);
-		break;
+		return snprintf(pathname, len, "%s/%s/%s",
+				bpf_get_work_dir(ctx->type),
+				BPF_DIR_GLOBALS, name);
 	default:
-		snprintf(pathname, len, "%s/../%s/%s",
-			 bpf_get_work_dir(ctx->type),
-			 bpf_custom_pinning(ctx, pinning), name);
-		break;
+		return snprintf(pathname, len, "%s/../%s/%s",
+				bpf_get_work_dir(ctx->type),
+				bpf_custom_pinning(ctx, pinning), name);
 	}
 }
 
@@ -3160,3 +3165,87 @@ int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
 	return ret;
 }
 #endif /* HAVE_ELF */
+
+#ifdef HAVE_LIBBPF
+static int bpf_gen_pin_name(void *priv, char *buf, int buf_len,
+			    const char *name, unsigned int pinning)
+{
+	struct bpf_elf_ctx *ctx = priv;
+	const char *tmp;
+	int ret = 0;
+
+	if (bpf_no_pinning(ctx, pinning) || !bpf_get_work_dir(ctx->type))
+		return 0;
+
+	if (pinning == PIN_OBJECT_NS)
+		ret = bpf_make_obj_path(ctx);
+	else if ((tmp = bpf_custom_pinning(ctx, pinning)))
+		ret = bpf_make_custom_path(ctx, tmp);
+	if (ret < 0)
+		return ret;
+
+	return bpf_make_pathname(buf, buf_len, name, ctx, pinning);
+}
+
+static int bpf_elf_ctx_init_stub(struct bpf_elf_ctx *ctx, const char *pathname,
+				 enum bpf_prog_type type, int verbose)
+{
+	uint8_t tmp[20];
+	int ret;
+
+	memset(ctx, 0, sizeof(*ctx));
+	ret = bpf_obj_hash(pathname, tmp, sizeof(tmp));
+	if (ret)
+		ctx->noafalg = true;
+	else
+		hexstring_n2a(tmp, sizeof(tmp), ctx->obj_uid,
+			      sizeof(ctx->obj_uid));
+
+	ctx->verbose = verbose;
+	ctx->type    = type;
+	bpf_hash_init(ctx, CONFDIR "/bpf_pinning");
+
+	return 0;
+}
+
+static int verbose_print(enum libbpf_print_level level, const char *format,
+			 va_list args)
+{
+	return vfprintf(stderr, format, args);
+}
+
+static int bpf_do_load_libbpf(struct bpf_cfg_in *cfg)
+{
+	struct bpf_elf_ctx *ctx = &__ctx;
+	struct bpf_program *prog;
+	struct bpf_object *obj;
+	int err, prog_fd = -1;
+
+	struct bpf_prog_load_attr attr = {
+		.file = cfg->object,
+		.prog_type = cfg->type,
+		.ifindex = cfg->ifindex,
+		.log_level = cfg->verbose,
+		.auto_pin_cb = bpf_gen_pin_name,
+		.auto_pin_ctx = ctx,
+	};
+
+	if (cfg->verbose)
+		libbpf_set_print(verbose_print);
+
+	bpf_elf_ctx_init_stub(ctx, cfg->object, cfg->type, cfg->verbose);
+
+	err = bpf_prog_load_xattr(&attr, &obj, &prog_fd);
+	if (err)
+		return err;
+
+	if (cfg->section) {
+		prog = bpf_object__find_program_by_title(obj,
+							 cfg->section);
+		prog_fd = bpf_program__fd(prog);
+	}
+
+	cfg->prog_fd = prog_fd;
+	return cfg->prog_fd;
+}
+#endif
-- 
2.22.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2019-08-20 11:47 [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP) Toke Høiland-Jørgensen
                   ` (4 preceding siblings ...)
  2019-08-20 11:47 ` [RFC bpf-next 5/5] iproute2: Support loading XDP programs with libbpf Toke Høiland-Jørgensen
@ 2019-08-21 19:26 ` Alexei Starovoitov
  2019-08-21 21:00   ` Toke Høiland-Jørgensen
  2019-08-21 20:30 ` Andrii Nakryiko
  6 siblings, 1 reply; 46+ messages in thread
From: Alexei Starovoitov @ 2019-08-21 19:26 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Stephen Hemminger, Daniel Borkmann, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

On Tue, Aug 20, 2019 at 01:47:01PM +0200, Toke Høiland-Jørgensen wrote:
> iproute2 uses its own bpf loader to load eBPF programs, which has
> evolved separately from libbpf. Since we are now standardising on
> libbpf, this becomes a problem as iproute2 is slowly accumulating
> feature incompatibilities with libbpf-based loaders. In particular,
> iproute2 has its own (expanded) version of the map definition struct,
> which makes it difficult to write programs that can be loaded with both
> custom loaders and iproute2.
> 
> This series seeks to address this by converting iproute2 to using libbpf
> for all its bpf needs. This version is an early proof-of-concept RFC, to
> get some feedback on whether people think this is the right direction.
> 
> What this series does is the following:
> 
> - Updates the libbpf map definition struct to match that of iproute2
>   (patch 1).
> - Adds functionality to libbpf to support automatic pinning of maps when
>   loading an eBPF program, while re-using pinned maps if they already
>   exist (patches 2-3).
> - Modifies iproute2 to make it possible to compile it against libbpf
>   without affecting any existing functionality (patch 4).
> - Changes the iproute2 eBPF loader to use libbpf for loading XDP
>   programs (patch 5).
> 
> 
> As this is an early PoC, there are still a few missing pieces before
> this can be merged. Including (but probably not limited to):
> 
> - Consolidate the map definition struct in the bpf_helpers.h file in the
>   kernel tree. This contains a different, and incompatible, update to
>   the struct. Since the iproute2 version has actually been released for
>   use outside the kernel tree (and thus is subject to API stability
>   constraints), I think it makes the most sense to keep that, and port
>   the selftests to use it.

It sounds like you're implying that existing libbpf format is not uapi.
It is and we cannot break it.
If patch 1 means breakage for existing pre-compiled .o that won't load
with new libbpf then we cannot use this method.
Recompiling .o with new libbpf definition of bpf_map_def isn't an option.
libbpf has to be smart before/after and recognize both old and iproute2 format.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2019-08-20 11:47 [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP) Toke Høiland-Jørgensen
                   ` (5 preceding siblings ...)
  2019-08-21 19:26 ` [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP) Alexei Starovoitov
@ 2019-08-21 20:30 ` Andrii Nakryiko
  2019-08-21 21:07   ` Toke Høiland-Jørgensen
  2019-08-23 10:27   ` Jesper Dangaard Brouer
  6 siblings, 2 replies; 46+ messages in thread
From: Andrii Nakryiko @ 2019-08-21 20:30 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Stephen Hemminger, Daniel Borkmann, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, Networking, bpf

On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> iproute2 uses its own bpf loader to load eBPF programs, which has
> evolved separately from libbpf. Since we are now standardising on
> libbpf, this becomes a problem as iproute2 is slowly accumulating
> feature incompatibilities with libbpf-based loaders. In particular,
> iproute2 has its own (expanded) version of the map definition struct,
> which makes it difficult to write programs that can be loaded with both
> custom loaders and iproute2.
>
> This series seeks to address this by converting iproute2 to using libbpf
> for all its bpf needs. This version is an early proof-of-concept RFC, to
> get some feedback on whether people think this is the right direction.
>
> What this series does is the following:
>
> - Updates the libbpf map definition struct to match that of iproute2
>   (patch 1).


Hi Toke,

Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
totally in support of making iproute2 use libbpf to load/initialize
BPF programs. But I'm against adding iproute2-specific fields to
libbpf's bpf_map_def definitions to support this.

I've proposed the plan of extending libbpf's supported features so
that it can be used to load iproute2-style BPF programs earlier,
please see discussions in [0] and [1]. I think instead of emulating
iproute2 way of matching everything based on user-specified internal
IDs, which doesn't provide good user experience and is quite easy to
get wrong, we should support same scenarios with better declarative
syntax and in a less error-prone way. I believe we can do that by
relying on BTF more heavily (again, please check some of my proposals
in [0], [1], and discussion with Daniel in those threads). It will
feel more natural and be more straightforward to follow. It would be
great if you can lend a hand in implementing pieces of that plan!

I'm currently on vacation, so my availability is very sparse, but I'd
be happy to discuss this further, if need be.

  [0] https://lore.kernel.org/bpf/CAEf4BzbfdG2ub7gCi0OYqBrUoChVHWsmOntWAkJt47=FE+km+A@xxxxxxxxxxxxxx/
  [1] https://www.spinics.net/lists/bpf/msg03976.html

> - Adds functionality to libbpf to support automatic pinning of maps when
>   loading an eBPF program, while re-using pinned maps if they already
>   exist (patches 2-3).
> - Modifies iproute2 to make it possible to compile it against libbpf
>   without affecting any existing functionality (patch 4).
> - Changes the iproute2 eBPF loader to use libbpf for loading XDP
>   programs (patch 5).
>
>
> As this is an early PoC, there are still a few missing pieces before
> this can be merged. Including (but probably not limited to):
>
> - Consolidate the map definition struct in the bpf_helpers.h file in the
>   kernel tree. This contains a different, and incompatible, update to
>   the struct. Since the iproute2 version has actually been released for
>   use outside the kernel tree (and thus is subject to API stability
>   constraints), I think it makes the most sense to keep that, and port
>   the selftests to use it.
>
> - The iproute2 loader supports automatically populating map-in-map
>   definitions on load. This needs to be added to libbpf as well. There
>   is an implementation of this in the selftests in the kernel tree,
>   which will have to be ported (related to the previous point).
>
> - The iproute2 port needs to be completed; this means at least
>   supporting TC eBPF programs as well, figuring out how to deal with
>   cBPF programs, and getting the verbose output back to the same state
>   as before the port. Also, I guess the iproute2 maintainers need to ACK
>   that they are good with adding a dependency on libbpf.
>
> - Some of the code added to libbpf in patch 2 in this series include
>   code derived from iproute2, which is GPLv2+. So it will need to be
>   re-licensed to be usable in libbpf. Since `git blame` indicated that
>   the original code was written by Daniel, I figure he can ACK that
>   relicensing before applying the patches :)
>
>
> Please take a look at this series and let me know if you agree
> that this is the right direction to go. Assuming there's consensus that
> it is, I'll focus on getting the rest of the libbpf patches ready for
> merging. I'll send those as a separate series, and hold off on the
> iproute2 patches until they are merged; but for this version I'm
> including both in one series so it's easier to see the context.
>
> -Toke
>
>
> libbpf:
> Toke Høiland-Jørgensen (3):
>   libbpf: Add map definition struct fields from iproute2
>   libbpf: Add support for auto-pinning of maps with reuse on program
>     load
>   libbpf: Add support for specifying map pinning path via callback
>
>  tools/lib/bpf/libbpf.c | 205 +++++++++++++++++++++++++++++++++++++++--
>  tools/lib/bpf/libbpf.h |  18 ++++
>  2 files changed, 214 insertions(+), 9 deletions(-)
>
> iproute2:
> Toke Høiland-Jørgensen (2):
>   iproute2: Allow compiling against libbpf
>   iproute2: Support loading XDP programs with libbpf
>
>  configure          |  16 +++++
>  include/bpf_util.h |   6 +-
>  ip/ipvrf.c         |   4 +-
>  lib/bpf.c          | 148 ++++++++++++++++++++++++++++++++++++---------
>  4 files changed, 142 insertions(+), 32 deletions(-)
>
>
> --
> 2.22.1
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2019-08-21 19:26 ` [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP) Alexei Starovoitov
@ 2019-08-21 21:00   ` Toke Høiland-Jørgensen
  2019-08-22  7:52     ` Andrii Nakryiko
  0 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-08-21 21:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Stephen Hemminger, Daniel Borkmann, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Tue, Aug 20, 2019 at 01:47:01PM +0200, Toke Høiland-Jørgensen wrote:
>> iproute2 uses its own bpf loader to load eBPF programs, which has
>> evolved separately from libbpf. Since we are now standardising on
>> libbpf, this becomes a problem as iproute2 is slowly accumulating
>> feature incompatibilities with libbpf-based loaders. In particular,
>> iproute2 has its own (expanded) version of the map definition struct,
>> which makes it difficult to write programs that can be loaded with both
>> custom loaders and iproute2.
>> 
>> This series seeks to address this by converting iproute2 to using libbpf
>> for all its bpf needs. This version is an early proof-of-concept RFC, to
>> get some feedback on whether people think this is the right direction.
>> 
>> What this series does is the following:
>> 
>> - Updates the libbpf map definition struct to match that of iproute2
>>   (patch 1).
>> - Adds functionality to libbpf to support automatic pinning of maps when
>>   loading an eBPF program, while re-using pinned maps if they already
>>   exist (patches 2-3).
>> - Modifies iproute2 to make it possible to compile it against libbpf
>>   without affecting any existing functionality (patch 4).
>> - Changes the iproute2 eBPF loader to use libbpf for loading XDP
>>   programs (patch 5).
>> 
>> 
>> As this is an early PoC, there are still a few missing pieces before
>> this can be merged. Including (but probably not limited to):
>> 
>> - Consolidate the map definition struct in the bpf_helpers.h file in the
>>   kernel tree. This contains a different, and incompatible, update to
>>   the struct. Since the iproute2 version has actually been released for
>>   use outside the kernel tree (and thus is subject to API stability
>>   constraints), I think it makes the most sense to keep that, and port
>>   the selftests to use it.
>
> It sounds like you're implying that existing libbpf format is not
> uapi.

No, that's not what I meant... See below.

> It is and we cannot break it.
> If patch 1 means breakage for existing pre-compiled .o that won't load
> with new libbpf then we cannot use this method.
> Recompiling .o with new libbpf definition of bpf_map_def isn't an option.
> libbpf has to be smart before/after and recognize both old and iproute2 format.

The libbpf.h definition of struct bpf_map_def is compatible with the one
used in iproute2. In libbpf.h, the struct only contains five fields
(type, key_size, value_size, max_entries and flags), and iproute2 adds
another 4 (id, pinning, inner_id and inner_idx; these are the ones in
patch 1 in this series).

The issue I was alluding to above is that the bpf_helpers.h file in the
kernel selftests directory *also* extends the bpf_map_def struct, and
adds two *different* fields (inner_map_idx and numa_mode). The former is
used to implement the same map-in-map definition functionality that
iproute2 has, but with different semantics. The latter is additional to
that, and I'm planning to add that to this series.

Since bpf_helpers.h is *not* part of libbpf (yet), this will make it
possible to keep API (and ABI) compatibility with both iproute2 and
libbpf. As in, old .o files will still load with libbpf after this
series, they just won't be able to use the new automatic pinning
feature.

-Toke

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2019-08-21 20:30 ` Andrii Nakryiko
@ 2019-08-21 21:07   ` Toke Høiland-Jørgensen
  2019-08-22  7:49     ` Andrii Nakryiko
  2019-08-23 10:27   ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-08-21 21:07 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Stephen Hemminger, Daniel Borkmann, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, Networking, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> iproute2 uses its own bpf loader to load eBPF programs, which has
>> evolved separately from libbpf. Since we are now standardising on
>> libbpf, this becomes a problem as iproute2 is slowly accumulating
>> feature incompatibilities with libbpf-based loaders. In particular,
>> iproute2 has its own (expanded) version of the map definition struct,
>> which makes it difficult to write programs that can be loaded with both
>> custom loaders and iproute2.
>>
>> This series seeks to address this by converting iproute2 to using libbpf
>> for all its bpf needs. This version is an early proof-of-concept RFC, to
>> get some feedback on whether people think this is the right direction.
>>
>> What this series does is the following:
>>
>> - Updates the libbpf map definition struct to match that of iproute2
>>   (patch 1).
>
>
> Hi Toke,
>
> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
> totally in support of making iproute2 use libbpf to load/initialize
> BPF programs. But I'm against adding iproute2-specific fields to
> libbpf's bpf_map_def definitions to support this.
>
> I've proposed the plan of extending libbpf's supported features so
> that it can be used to load iproute2-style BPF programs earlier,
> please see discussions in [0] and [1].

Yeah, I've seen that discussion, and agree that longer term this is
probably a better way to do map-in-map definitions.

However, I view your proposal as complementary to this series: we'll
probably also want the BTF-based definition to work with iproute2, and
that means iproute2 needs to be ported to libbpf. But iproute2 needs to
be backwards compatible with the format it supports now, and, well, this
series is the simplest way to achieve that IMO :)

> I think instead of emulating iproute2 way of matching everything based
> on user-specified internal IDs, which doesn't provide good user
> experience and is quite easy to get wrong, we should support same
> scenarios with better declarative syntax and in a less error-prone
> way. I believe we can do that by relying on BTF more heavily (again,
> please check some of my proposals in [0], [1], and discussion with
> Daniel in those threads). It will feel more natural and be more
> straightforward to follow. It would be great if you can lend a hand in
> implementing pieces of that plan!
>
> I'm currently on vacation, so my availability is very sparse, but I'd
> be happy to discuss this further, if need be.

Happy to collaborate on your proposal when you're back from vacation;
but as I said above, I believe this is a complementary longer-term
thing...

-Toke

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2019-08-21 21:07   ` Toke Høiland-Jørgensen
@ 2019-08-22  7:49     ` Andrii Nakryiko
  2019-08-22  8:33       ` Daniel Borkmann
  0 siblings, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2019-08-22  7:49 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Stephen Hemminger, Daniel Borkmann, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, Networking, bpf

On Wed, Aug 21, 2019 at 2:07 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> iproute2 uses its own bpf loader to load eBPF programs, which has
> >> evolved separately from libbpf. Since we are now standardising on
> >> libbpf, this becomes a problem as iproute2 is slowly accumulating
> >> feature incompatibilities with libbpf-based loaders. In particular,
> >> iproute2 has its own (expanded) version of the map definition struct,
> >> which makes it difficult to write programs that can be loaded with both
> >> custom loaders and iproute2.
> >>
> >> This series seeks to address this by converting iproute2 to using libbpf
> >> for all its bpf needs. This version is an early proof-of-concept RFC, to
> >> get some feedback on whether people think this is the right direction.
> >>
> >> What this series does is the following:
> >>
> >> - Updates the libbpf map definition struct to match that of iproute2
> >>   (patch 1).
> >
> >
> > Hi Toke,
> >
> > Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
> > totally in support of making iproute2 use libbpf to load/initialize
> > BPF programs. But I'm against adding iproute2-specific fields to
> > libbpf's bpf_map_def definitions to support this.
> >
> > I've proposed the plan of extending libbpf's supported features so
> > that it can be used to load iproute2-style BPF programs earlier,
> > please see discussions in [0] and [1].
>
> Yeah, I've seen that discussion, and agree that longer term this is
> probably a better way to do map-in-map definitions.
>
> However, I view your proposal as complementary to this series: we'll
> probably also want the BTF-based definition to work with iproute2, and
> that means iproute2 needs to be ported to libbpf. But iproute2 needs to
> be backwards compatible with the format it supports now, and, well, this
> series is the simplest way to achieve that IMO :)

Ok, I understand that. But I'd still want to avoid adding extra cruft
to libbpf just for backwards-compatibility with *exact* iproute2
format. Libbpf as a whole is trying to move away from relying on
binary bpf_map_def and into using BTF-defined map definitions, and
this patch series is a step backwards in that regard, that adds,
essentially, already outdated stuff that we'll need to support forever
(I mean those extra fields in bpf_map_def, that will stay there
forever).

We've discussed one way to deal with it, IMO, in a cleaner way. It can
be done in few steps:

1. I originally wanted BTF-defined map definitions to ignore unknown
fields. It shouldn't be a default mode, but it should be supported
(and of course is very easy to add). So let's add that and let libbpf
ignore unknown stuff.

2. Then to let iproute2 loader deal with backwards-compatibility for
libbpf-incompatible bpf_elf_map, we need to "pass-through" all those
fields so that users of libbpf (iproute2 loader, in this case) can
make use of it. The easiest and cleanest way to do this is to expose
BTF ID of a type describing each map entry and let iproute2 process
that in whichever way it sees fit.

Luckily, bpf_elf_map is compatible in `type` field, which will let
libbpf recognize bpf_elf_map as map definition. All the rest setup
will be done by iproute2, by processing BTF of bpf_elf_map, which will
let it set up map sizes, flags and do all of its map-in-map magic.

The only additions to libbpf in this case would be a new `__u32
bpf_map__btf_id(struct bpf_map* map);` API.

I haven't written any code and haven't 100% checked that this will
cover everything, but I think we should try. This will allow to let
users of libbpf do custom stuff with map definitions without having to
put all this extra logic into libbpf itself, which I think is
desirable outcome.


>
> > I think instead of emulating iproute2 way of matching everything based
> > on user-specified internal IDs, which doesn't provide good user
> > experience and is quite easy to get wrong, we should support same
> > scenarios with better declarative syntax and in a less error-prone
> > way. I believe we can do that by relying on BTF more heavily (again,
> > please check some of my proposals in [0], [1], and discussion with
> > Daniel in those threads). It will feel more natural and be more
> > straightforward to follow. It would be great if you can lend a hand in
> > implementing pieces of that plan!
> >
> > I'm currently on vacation, so my availability is very sparse, but I'd
> > be happy to discuss this further, if need be.
>
> Happy to collaborate on your proposal when you're back from vacation;
> but as I said above, I believe this is a complementary longer-term
> thing...
>
> -Toke

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2019-08-21 21:00   ` Toke Høiland-Jørgensen
@ 2019-08-22  7:52     ` Andrii Nakryiko
  2019-08-22 10:38       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2019-08-22  7:52 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Stephen Hemminger, Daniel Borkmann,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	David Miller, Jesper Dangaard Brouer, Networking, bpf

On Wed, Aug 21, 2019 at 4:29 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Tue, Aug 20, 2019 at 01:47:01PM +0200, Toke Høiland-Jørgensen wrote:
> >> iproute2 uses its own bpf loader to load eBPF programs, which has
> >> evolved separately from libbpf. Since we are now standardising on
> >> libbpf, this becomes a problem as iproute2 is slowly accumulating
> >> feature incompatibilities with libbpf-based loaders. In particular,
> >> iproute2 has its own (expanded) version of the map definition struct,
> >> which makes it difficult to write programs that can be loaded with both
> >> custom loaders and iproute2.
> >>
> >> This series seeks to address this by converting iproute2 to using libbpf
> >> for all its bpf needs. This version is an early proof-of-concept RFC, to
> >> get some feedback on whether people think this is the right direction.
> >>
> >> What this series does is the following:
> >>
> >> - Updates the libbpf map definition struct to match that of iproute2
> >>   (patch 1).
> >> - Adds functionality to libbpf to support automatic pinning of maps when
> >>   loading an eBPF program, while re-using pinned maps if they already
> >>   exist (patches 2-3).
> >> - Modifies iproute2 to make it possible to compile it against libbpf
> >>   without affecting any existing functionality (patch 4).
> >> - Changes the iproute2 eBPF loader to use libbpf for loading XDP
> >>   programs (patch 5).
> >>
> >>
> >> As this is an early PoC, there are still a few missing pieces before
> >> this can be merged. Including (but probably not limited to):
> >>
> >> - Consolidate the map definition struct in the bpf_helpers.h file in the
> >>   kernel tree. This contains a different, and incompatible, update to
> >>   the struct. Since the iproute2 version has actually been released for
> >>   use outside the kernel tree (and thus is subject to API stability
> >>   constraints), I think it makes the most sense to keep that, and port
> >>   the selftests to use it.
> >
> > It sounds like you're implying that existing libbpf format is not
> > uapi.
>
> No, that's not what I meant... See below.
>
> > It is and we cannot break it.
> > If patch 1 means breakage for existing pre-compiled .o that won't load
> > with new libbpf then we cannot use this method.
> > Recompiling .o with new libbpf definition of bpf_map_def isn't an option.
> > libbpf has to be smart before/after and recognize both old and iproute2 format.
>
> The libbpf.h definition of struct bpf_map_def is compatible with the one
> used in iproute2. In libbpf.h, the struct only contains five fields
> (type, key_size, value_size, max_entries and flags), and iproute2 adds
> another 4 (id, pinning, inner_id and inner_idx; these are the ones in
> patch 1 in this series).
>
> The issue I was alluding to above is that the bpf_helpers.h file in the
> kernel selftests directory *also* extends the bpf_map_def struct, and
> adds two *different* fields (inner_map_idx and numa_mode). The former is
> used to implement the same map-in-map definition functionality that
> iproute2 has, but with different semantics. The latter is additional to
> that, and I'm planning to add that to this series.
>
> Since bpf_helpers.h is *not* part of libbpf (yet), this will make it

We should start considering it as if it was, so if we can avoid adding
stuff that I'd need to untangle to move it into libbpf, I'd rather
avoid it.
We've already prepared this move by relicensing bpf_helpers.h. Moving
it into libbpf itself is immediate next thing I'll do when I'm back.

> possible to keep API (and ABI) compatibility with both iproute2 and
> libbpf. As in, old .o files will still load with libbpf after this
> series, they just won't be able to use the new automatic pinning
> feature.
>
> -Toke

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2019-08-22  7:49     ` Andrii Nakryiko
@ 2019-08-22  8:33       ` Daniel Borkmann
  2019-08-22 11:48         ` Toke Høiland-Jørgensen
  2019-08-23  6:31         ` Andrii Nakryiko
  0 siblings, 2 replies; 46+ messages in thread
From: Daniel Borkmann @ 2019-08-22  8:33 UTC (permalink / raw)
  To: Andrii Nakryiko, Toke Høiland-Jørgensen
  Cc: Stephen Hemminger, Alexei Starovoitov, Martin KaFai Lau,
	Song Liu, Yonghong Song, David Miller, Jesper Dangaard Brouer,
	Networking, bpf

On 8/22/19 9:49 AM, Andrii Nakryiko wrote:
> On Wed, Aug 21, 2019 at 2:07 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>
>>>> iproute2 uses its own bpf loader to load eBPF programs, which has
>>>> evolved separately from libbpf. Since we are now standardising on
>>>> libbpf, this becomes a problem as iproute2 is slowly accumulating
>>>> feature incompatibilities with libbpf-based loaders. In particular,
>>>> iproute2 has its own (expanded) version of the map definition struct,
>>>> which makes it difficult to write programs that can be loaded with both
>>>> custom loaders and iproute2.
>>>>
>>>> This series seeks to address this by converting iproute2 to using libbpf
>>>> for all its bpf needs. This version is an early proof-of-concept RFC, to
>>>> get some feedback on whether people think this is the right direction.
>>>>
>>>> What this series does is the following:
>>>>
>>>> - Updates the libbpf map definition struct to match that of iproute2
>>>>    (patch 1).
>>>
>>> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
>>> totally in support of making iproute2 use libbpf to load/initialize
>>> BPF programs. But I'm against adding iproute2-specific fields to
>>> libbpf's bpf_map_def definitions to support this.
>>>
>>> I've proposed the plan of extending libbpf's supported features so
>>> that it can be used to load iproute2-style BPF programs earlier,
>>> please see discussions in [0] and [1].
>>
>> Yeah, I've seen that discussion, and agree that longer term this is
>> probably a better way to do map-in-map definitions.
>>
>> However, I view your proposal as complementary to this series: we'll
>> probably also want the BTF-based definition to work with iproute2, and
>> that means iproute2 needs to be ported to libbpf. But iproute2 needs to
>> be backwards compatible with the format it supports now, and, well, this
>> series is the simplest way to achieve that IMO :)
> 
> Ok, I understand that. But I'd still want to avoid adding extra cruft
> to libbpf just for backwards-compatibility with *exact* iproute2
> format. Libbpf as a whole is trying to move away from relying on
> binary bpf_map_def and into using BTF-defined map definitions, and
> this patch series is a step backwards in that regard, that adds,
> essentially, already outdated stuff that we'll need to support forever
> (I mean those extra fields in bpf_map_def, that will stay there
> forever).

Agree, adding these extensions for libbpf would be a step backwards
compared to using BTF defined map defs.

> We've discussed one way to deal with it, IMO, in a cleaner way. It can
> be done in few steps:
> 
> 1. I originally wanted BTF-defined map definitions to ignore unknown
> fields. It shouldn't be a default mode, but it should be supported
> (and of course is very easy to add). So let's add that and let libbpf
> ignore unknown stuff.
> 
> 2. Then to let iproute2 loader deal with backwards-compatibility for
> libbpf-incompatible bpf_elf_map, we need to "pass-through" all those
> fields so that users of libbpf (iproute2 loader, in this case) can
> make use of it. The easiest and cleanest way to do this is to expose
> BTF ID of a type describing each map entry and let iproute2 process
> that in whichever way it sees fit.
> 
> Luckily, bpf_elf_map is compatible in `type` field, which will let
> libbpf recognize bpf_elf_map as map definition. All the rest setup
> will be done by iproute2, by processing BTF of bpf_elf_map, which will
> let it set up map sizes, flags and do all of its map-in-map magic.
> 
> The only additions to libbpf in this case would be a new `__u32
> bpf_map__btf_id(struct bpf_map* map);` API.
> 
> I haven't written any code and haven't 100% checked that this will
> cover everything, but I think we should try. This will allow to let
> users of libbpf do custom stuff with map definitions without having to
> put all this extra logic into libbpf itself, which I think is
> desirable outcome.

Sounds reasonable in general, but all this still has the issue that we're
assuming that BTF is /always/ present. Existing object files that would load
just fine /today/ but do not have BTF attached won't be handled here. Wouldn't
it be more straight forward to allow passing callbacks to the libbpf loader
such that if the map section is not found to be bpf_map_def compatible, we
rely on external user aka callback to parse the ELF section, handle any
non-default libbpf behavior like pinning/retrieving from BPF fs, populate
related internal libbpf map data structures and pass control back to libbpf
loader afterwards. (Similar callback with prog section name handling for the
case where tail call maps get automatically populated.)

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 4/5] iproute2: Allow compiling against libbpf
  2019-08-20 11:47 ` [RFC bpf-next 4/5] iproute2: Allow compiling against libbpf Toke Høiland-Jørgensen
@ 2019-08-22  8:58   ` Daniel Borkmann
  2019-08-22 10:43     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: Daniel Borkmann @ 2019-08-22  8:58 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Stephen Hemminger, Alexei Starovoitov
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, netdev, bpf, andrii.nakryiko

On 8/20/19 1:47 PM, Toke Høiland-Jørgensen wrote:
> This adds a configure check for libbpf and renames functions to allow
> lib/bpf.c to be compiled with it present. This makes it possible to
> port functionality piecemeal to use libbpf.
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>   configure          | 16 ++++++++++++++++
>   include/bpf_util.h |  6 +++---
>   ip/ipvrf.c         |  4 ++--
>   lib/bpf.c          | 33 +++++++++++++++++++--------------
>   4 files changed, 40 insertions(+), 19 deletions(-)
> 
> diff --git a/configure b/configure
> index 45fcffb6..5a89ee9f 100755
> --- a/configure
> +++ b/configure
> @@ -238,6 +238,19 @@ check_elf()
>       fi
>   }
>   
> +check_libbpf()
> +{
> +    if ${PKG_CONFIG} libbpf --exists; then
> +	echo "HAVE_LIBBPF:=y" >>$CONFIG
> +	echo "yes"
> +
> +	echo 'CFLAGS += -DHAVE_LIBBPF' `${PKG_CONFIG} libbpf --cflags` >> $CONFIG
> +	echo 'LDLIBS += ' `${PKG_CONFIG} libbpf --libs` >>$CONFIG
> +    else
> +	echo "no"
> +    fi
> +}
> +
>   check_selinux()

More of an implementation detail at this point in time, but want to make sure this
doesn't get missed along the way: as discussed at bpfconf [0] best for iproute2 to
handle libbpf support would be the same way of integration as pahole does, that is,
to integrate it via submodule [1] to allow kernel and libbpf features to be in sync
with iproute2 releases and therefore easily consume extensions we're adding to libbpf
to aide iproute2 integration.

Thanks,
Daniel

   [0] http://vger.kernel.org/bpfconf2019.html#session-4
   [1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=21507cd3e97bc5692d97201ee68df044c6767e9a

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2019-08-22  7:52     ` Andrii Nakryiko
@ 2019-08-22 10:38       ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-08-22 10:38 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Stephen Hemminger, Daniel Borkmann,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	David Miller, Jesper Dangaard Brouer, Networking, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Wed, Aug 21, 2019 at 4:29 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>
>> > On Tue, Aug 20, 2019 at 01:47:01PM +0200, Toke Høiland-Jørgensen wrote:
>> >> iproute2 uses its own bpf loader to load eBPF programs, which has
>> >> evolved separately from libbpf. Since we are now standardising on
>> >> libbpf, this becomes a problem as iproute2 is slowly accumulating
>> >> feature incompatibilities with libbpf-based loaders. In particular,
>> >> iproute2 has its own (expanded) version of the map definition struct,
>> >> which makes it difficult to write programs that can be loaded with both
>> >> custom loaders and iproute2.
>> >>
>> >> This series seeks to address this by converting iproute2 to using libbpf
>> >> for all its bpf needs. This version is an early proof-of-concept RFC, to
>> >> get some feedback on whether people think this is the right direction.
>> >>
>> >> What this series does is the following:
>> >>
>> >> - Updates the libbpf map definition struct to match that of iproute2
>> >>   (patch 1).
>> >> - Adds functionality to libbpf to support automatic pinning of maps when
>> >>   loading an eBPF program, while re-using pinned maps if they already
>> >>   exist (patches 2-3).
>> >> - Modifies iproute2 to make it possible to compile it against libbpf
>> >>   without affecting any existing functionality (patch 4).
>> >> - Changes the iproute2 eBPF loader to use libbpf for loading XDP
>> >>   programs (patch 5).
>> >>
>> >>
>> >> As this is an early PoC, there are still a few missing pieces before
>> >> this can be merged. Including (but probably not limited to):
>> >>
>> >> - Consolidate the map definition struct in the bpf_helpers.h file in the
>> >>   kernel tree. This contains a different, and incompatible, update to
>> >>   the struct. Since the iproute2 version has actually been released for
>> >>   use outside the kernel tree (and thus is subject to API stability
>> >>   constraints), I think it makes the most sense to keep that, and port
>> >>   the selftests to use it.
>> >
>> > It sounds like you're implying that existing libbpf format is not
>> > uapi.
>>
>> No, that's not what I meant... See below.
>>
>> > It is and we cannot break it.
>> > If patch 1 means breakage for existing pre-compiled .o that won't load
>> > with new libbpf then we cannot use this method.
>> > Recompiling .o with new libbpf definition of bpf_map_def isn't an option.
>> > libbpf has to be smart before/after and recognize both old and iproute2 format.
>>
>> The libbpf.h definition of struct bpf_map_def is compatible with the one
>> used in iproute2. In libbpf.h, the struct only contains five fields
>> (type, key_size, value_size, max_entries and flags), and iproute2 adds
>> another 4 (id, pinning, inner_id and inner_idx; these are the ones in
>> patch 1 in this series).
>>
>> The issue I was alluding to above is that the bpf_helpers.h file in the
>> kernel selftests directory *also* extends the bpf_map_def struct, and
>> adds two *different* fields (inner_map_idx and numa_mode). The former is
>> used to implement the same map-in-map definition functionality that
>> iproute2 has, but with different semantics. The latter is additional to
>> that, and I'm planning to add that to this series.
>>
>> Since bpf_helpers.h is *not* part of libbpf (yet), this will make it
>
> We should start considering it as if it was, so if we can avoid adding
> stuff that I'd need to untangle to move it into libbpf, I'd rather
> avoid it.
> We've already prepared this move by relicensing bpf_helpers.h. Moving
> it into libbpf itself is immediate next thing I'll do when I'm back.

Yeah, I figured that with the relicensing, bpf_helpers would probably be
making its way into libbpf soon. Which is why I wanted to start this
discussion before that: If we do move bpf_helpers as-is, that will put
us in the territory of full-on binary incompatibility. So the time to
discuss doing this in a compatible way is now, before any such move is
made.

-Toke

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 4/5] iproute2: Allow compiling against libbpf
  2019-08-22  8:58   ` Daniel Borkmann
@ 2019-08-22 10:43     ` Toke Høiland-Jørgensen
  2019-08-22 11:45       ` Daniel Borkmann
  0 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-08-22 10:43 UTC (permalink / raw)
  To: Daniel Borkmann, Stephen Hemminger, Alexei Starovoitov
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, netdev, bpf, andrii.nakryiko

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 8/20/19 1:47 PM, Toke Høiland-Jørgensen wrote:
>> This adds a configure check for libbpf and renames functions to allow
>> lib/bpf.c to be compiled with it present. This makes it possible to
>> port functionality piecemeal to use libbpf.
>> 
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
>>   configure          | 16 ++++++++++++++++
>>   include/bpf_util.h |  6 +++---
>>   ip/ipvrf.c         |  4 ++--
>>   lib/bpf.c          | 33 +++++++++++++++++++--------------
>>   4 files changed, 40 insertions(+), 19 deletions(-)
>> 
>> diff --git a/configure b/configure
>> index 45fcffb6..5a89ee9f 100755
>> --- a/configure
>> +++ b/configure
>> @@ -238,6 +238,19 @@ check_elf()
>>       fi
>>   }
>>   
>> +check_libbpf()
>> +{
>> +    if ${PKG_CONFIG} libbpf --exists; then
>> +	echo "HAVE_LIBBPF:=y" >>$CONFIG
>> +	echo "yes"
>> +
>> +	echo 'CFLAGS += -DHAVE_LIBBPF' `${PKG_CONFIG} libbpf --cflags` >> $CONFIG
>> +	echo 'LDLIBS += ' `${PKG_CONFIG} libbpf --libs` >>$CONFIG
>> +    else
>> +	echo "no"
>> +    fi
>> +}
>> +
>>   check_selinux()
>
> More of an implementation detail at this point in time, but want to
> make sure this doesn't get missed along the way: as discussed at
> bpfconf [0] best for iproute2 to handle libbpf support would be the
> same way of integration as pahole does, that is, to integrate it via
> submodule [1] to allow kernel and libbpf features to be in sync with
> iproute2 releases and therefore easily consume extensions we're adding
> to libbpf to aide iproute2 integration.

I can sorta see the point wrt keeping in sync with kernel features. But
how will this work with distros that package libbpf as a regular
library? Have you guys given up on regular library symbol versioning for
libbpf?

>    [0] http://vger.kernel.org/bpfconf2019.html#session-4

Thanks for that link! Didn't manage to find any of the previous
discussions on iproute2 compatibility.

-Toke

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 4/5] iproute2: Allow compiling against libbpf
  2019-08-22 10:43     ` Toke Høiland-Jørgensen
@ 2019-08-22 11:45       ` Daniel Borkmann
  2019-08-22 12:04         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: Daniel Borkmann @ 2019-08-22 11:45 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Stephen Hemminger, Alexei Starovoitov
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, netdev, bpf, andrii.nakryiko

On 8/22/19 12:43 PM, Toke Høiland-Jørgensen wrote:
> Daniel Borkmann <daniel@iogearbox.net> writes:
>> On 8/20/19 1:47 PM, Toke Høiland-Jørgensen wrote:
>>> This adds a configure check for libbpf and renames functions to allow
>>> lib/bpf.c to be compiled with it present. This makes it possible to
>>> port functionality piecemeal to use libbpf.
>>>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>> ---
>>>    configure          | 16 ++++++++++++++++
>>>    include/bpf_util.h |  6 +++---
>>>    ip/ipvrf.c         |  4 ++--
>>>    lib/bpf.c          | 33 +++++++++++++++++++--------------
>>>    4 files changed, 40 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/configure b/configure
>>> index 45fcffb6..5a89ee9f 100755
>>> --- a/configure
>>> +++ b/configure
>>> @@ -238,6 +238,19 @@ check_elf()
>>>        fi
>>>    }
>>>    
>>> +check_libbpf()
>>> +{
>>> +    if ${PKG_CONFIG} libbpf --exists; then
>>> +	echo "HAVE_LIBBPF:=y" >>$CONFIG
>>> +	echo "yes"
>>> +
>>> +	echo 'CFLAGS += -DHAVE_LIBBPF' `${PKG_CONFIG} libbpf --cflags` >> $CONFIG
>>> +	echo 'LDLIBS += ' `${PKG_CONFIG} libbpf --libs` >>$CONFIG
>>> +    else
>>> +	echo "no"
>>> +    fi
>>> +}
>>> +
>>>    check_selinux()
>>
>> More of an implementation detail at this point in time, but want to
>> make sure this doesn't get missed along the way: as discussed at
>> bpfconf [0] best for iproute2 to handle libbpf support would be the
>> same way of integration as pahole does, that is, to integrate it via
>> submodule [1] to allow kernel and libbpf features to be in sync with
>> iproute2 releases and therefore easily consume extensions we're adding
>> to libbpf to aide iproute2 integration.
> 
> I can sorta see the point wrt keeping in sync with kernel features. But
> how will this work with distros that package libbpf as a regular
> library? Have you guys given up on regular library symbol versioning for
> libbpf?

Not at all, and I hope you know that. ;-) The reason I added lib/bpf.c
integration into iproute2 directly back then was exactly such that users
can start consuming BPF for tc and XDP via iproute2 /everywhere/ with
only a simple libelf dependency which is also available on all distros
since pretty much forever. If it was an external library, we could have
waited till hell freezes over and initial distro adoption would have pretty
much taken forever: to pick one random example here wrt the pace of some
downstream distros [0]. The main rationale is pretty much the same as with
added kernel features that land complementary iproute2 patches for that
kernel release and as libbpf is developed alongside it is reasonable to
guarantee user expectations that iproute2 released for kernel version x
can make use of BPF features added to kernel x with same loader support
from x.

   [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1774815

>>     [0] http://vger.kernel.org/bpfconf2019.html#session-4
> 
> Thanks for that link! Didn't manage to find any of the previous
> discussions on iproute2 compatibility.
> 
> -Toke
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2019-08-22  8:33       ` Daniel Borkmann
@ 2019-08-22 11:48         ` Toke Høiland-Jørgensen
  2019-08-22 11:49           ` Toke Høiland-Jørgensen
  2019-08-23  6:31         ` Andrii Nakryiko
  1 sibling, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-08-22 11:48 UTC (permalink / raw)
  To: Daniel Borkmann, Andrii Nakryiko
  Cc: Stephen Hemminger, Alexei Starovoitov, Martin KaFai Lau,
	Song Liu, Yonghong Song, David Miller, Jesper Dangaard Brouer,
	Networking, bpf

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 8/22/19 9:49 AM, Andrii Nakryiko wrote:
>> On Wed, Aug 21, 2019 at 2:07 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>>> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>>
>>>>> iproute2 uses its own bpf loader to load eBPF programs, which has
>>>>> evolved separately from libbpf. Since we are now standardising on
>>>>> libbpf, this becomes a problem as iproute2 is slowly accumulating
>>>>> feature incompatibilities with libbpf-based loaders. In particular,
>>>>> iproute2 has its own (expanded) version of the map definition struct,
>>>>> which makes it difficult to write programs that can be loaded with both
>>>>> custom loaders and iproute2.
>>>>>
>>>>> This series seeks to address this by converting iproute2 to using libbpf
>>>>> for all its bpf needs. This version is an early proof-of-concept RFC, to
>>>>> get some feedback on whether people think this is the right direction.
>>>>>
>>>>> What this series does is the following:
>>>>>
>>>>> - Updates the libbpf map definition struct to match that of iproute2
>>>>>    (patch 1).
>>>>
>>>> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
>>>> totally in support of making iproute2 use libbpf to load/initialize
>>>> BPF programs. But I'm against adding iproute2-specific fields to
>>>> libbpf's bpf_map_def definitions to support this.
>>>>
>>>> I've proposed the plan of extending libbpf's supported features so
>>>> that it can be used to load iproute2-style BPF programs earlier,
>>>> please see discussions in [0] and [1].
>>>
>>> Yeah, I've seen that discussion, and agree that longer term this is
>>> probably a better way to do map-in-map definitions.
>>>
>>> However, I view your proposal as complementary to this series: we'll
>>> probably also want the BTF-based definition to work with iproute2, and
>>> that means iproute2 needs to be ported to libbpf. But iproute2 needs to
>>> be backwards compatible with the format it supports now, and, well, this
>>> series is the simplest way to achieve that IMO :)
>> 
>> Ok, I understand that. But I'd still want to avoid adding extra cruft
>> to libbpf just for backwards-compatibility with *exact* iproute2
>> format. Libbpf as a whole is trying to move away from relying on
>> binary bpf_map_def and into using BTF-defined map definitions, and
>> this patch series is a step backwards in that regard, that adds,
>> essentially, already outdated stuff that we'll need to support forever
>> (I mean those extra fields in bpf_map_def, that will stay there
>> forever).
>
> Agree, adding these extensions for libbpf would be a step backwards
> compared to using BTF defined map defs.
>
>> We've discussed one way to deal with it, IMO, in a cleaner way. It can
>> be done in few steps:
>> 
>> 1. I originally wanted BTF-defined map definitions to ignore unknown
>> fields. It shouldn't be a default mode, but it should be supported
>> (and of course is very easy to add). So let's add that and let libbpf
>> ignore unknown stuff.
>> 
>> 2. Then to let iproute2 loader deal with backwards-compatibility for
>> libbpf-incompatible bpf_elf_map, we need to "pass-through" all those
>> fields so that users of libbpf (iproute2 loader, in this case) can
>> make use of it. The easiest and cleanest way to do this is to expose
>> BTF ID of a type describing each map entry and let iproute2 process
>> that in whichever way it sees fit.
>> 
>> Luckily, bpf_elf_map is compatible in `type` field, which will let
>> libbpf recognize bpf_elf_map as map definition. All the rest setup
>> will be done by iproute2, by processing BTF of bpf_elf_map, which will
>> let it set up map sizes, flags and do all of its map-in-map magic.
>> 
>> The only additions to libbpf in this case would be a new `__u32
>> bpf_map__btf_id(struct bpf_map* map);` API.
>> 
>> I haven't written any code and haven't 100% checked that this will
>> cover everything, but I think we should try. This will allow to let
>> users of libbpf do custom stuff with map definitions without having to
>> put all this extra logic into libbpf itself, which I think is
>> desirable outcome.
>
> Sounds reasonable in general, but all this still has the issue that
> we're assuming that BTF is /always/ present. Existing object files
> that would load just fine /today/ but do not have BTF attached won't
> be handled here. Wouldn't it be more straight forward to allow passing
> callbacks to the libbpf loader such that if the map section is not
> found to be bpf_map_def compatible, we rely on external user aka
> callback to parse the ELF section, handle any non-default libbpf
> behavior like pinning/retrieving from BPF fs, populate related
> internal libbpf map data structures and pass control back to libbpf
> loader afterwards. (Similar callback with prog section name handling
> for the case where tail call maps get automatically populated.)

Thinking about this some more, I think there are two separate issues
here:

1. Do we want libbpf to support the features currently in iproute2 and
   bpf_helpers (i.e., map pinning + reuse, map-in-map definitions, and
   NUMA node placement of maps). IMO the answer to this is yes.

2. What should the data format be for BPF programs to signal that they
   want to use those features? Here, the longer-term answer is BTF-based
   map definitions, but we still want iproute2 to be backwards
   compatible.


So how about I revise this patch series to implement the *features* (I
already implemented map-in-map and numa nodes[0], so that is sorta
already done), but instead of extending the bpf_map_def struct, I just
expose callbacks that will allow programs to fill in internal-to-libbpf
data structures with the required information. Then, once the BTF-based
map definition does land, that can simply define default callbacks that
uses the BTF information to fill in those same internal data structures?

This would mean no extending bpf_map_def, and relaxing the current
libbpf restriction on extending bpf_map_def.

The drawback of this approach is that it does nothing to combat
fragmentation: People building their own loaders can still reimplement
different semantics for map defs, leading to programs that are tied to a
particular loader. So this would only work if we really believe BTF can
save us from this. I don't feel competent to comment on this just yet,
but thought I'd mention it :)

-Toke

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2019-08-22 11:48         ` Toke Høiland-Jørgensen
@ 2019-08-22 11:49           ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-08-22 11:49 UTC (permalink / raw)
  To: Daniel Borkmann, Andrii Nakryiko
  Cc: Stephen Hemminger, Alexei Starovoitov, Martin KaFai Lau,
	Song Liu, Yonghong Song, David Miller, Jesper Dangaard Brouer,
	Networking, bpf

Toke Høiland-Jørgensen <toke@redhat.com> writes:

> Daniel Borkmann <daniel@iogearbox.net> writes:
>
>> On 8/22/19 9:49 AM, Andrii Nakryiko wrote:
>>> On Wed, Aug 21, 2019 at 2:07 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>>>> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>>>
>>>>>> iproute2 uses its own bpf loader to load eBPF programs, which has
>>>>>> evolved separately from libbpf. Since we are now standardising on
>>>>>> libbpf, this becomes a problem as iproute2 is slowly accumulating
>>>>>> feature incompatibilities with libbpf-based loaders. In particular,
>>>>>> iproute2 has its own (expanded) version of the map definition struct,
>>>>>> which makes it difficult to write programs that can be loaded with both
>>>>>> custom loaders and iproute2.
>>>>>>
>>>>>> This series seeks to address this by converting iproute2 to using libbpf
>>>>>> for all its bpf needs. This version is an early proof-of-concept RFC, to
>>>>>> get some feedback on whether people think this is the right direction.
>>>>>>
>>>>>> What this series does is the following:
>>>>>>
>>>>>> - Updates the libbpf map definition struct to match that of iproute2
>>>>>>    (patch 1).
>>>>>
>>>>> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
>>>>> totally in support of making iproute2 use libbpf to load/initialize
>>>>> BPF programs. But I'm against adding iproute2-specific fields to
>>>>> libbpf's bpf_map_def definitions to support this.
>>>>>
>>>>> I've proposed the plan of extending libbpf's supported features so
>>>>> that it can be used to load iproute2-style BPF programs earlier,
>>>>> please see discussions in [0] and [1].
>>>>
>>>> Yeah, I've seen that discussion, and agree that longer term this is
>>>> probably a better way to do map-in-map definitions.
>>>>
>>>> However, I view your proposal as complementary to this series: we'll
>>>> probably also want the BTF-based definition to work with iproute2, and
>>>> that means iproute2 needs to be ported to libbpf. But iproute2 needs to
>>>> be backwards compatible with the format it supports now, and, well, this
>>>> series is the simplest way to achieve that IMO :)
>>> 
>>> Ok, I understand that. But I'd still want to avoid adding extra cruft
>>> to libbpf just for backwards-compatibility with *exact* iproute2
>>> format. Libbpf as a whole is trying to move away from relying on
>>> binary bpf_map_def and into using BTF-defined map definitions, and
>>> this patch series is a step backwards in that regard, that adds,
>>> essentially, already outdated stuff that we'll need to support forever
>>> (I mean those extra fields in bpf_map_def, that will stay there
>>> forever).
>>
>> Agree, adding these extensions for libbpf would be a step backwards
>> compared to using BTF defined map defs.
>>
>>> We've discussed one way to deal with it, IMO, in a cleaner way. It can
>>> be done in few steps:
>>> 
>>> 1. I originally wanted BTF-defined map definitions to ignore unknown
>>> fields. It shouldn't be a default mode, but it should be supported
>>> (and of course is very easy to add). So let's add that and let libbpf
>>> ignore unknown stuff.
>>> 
>>> 2. Then to let iproute2 loader deal with backwards-compatibility for
>>> libbpf-incompatible bpf_elf_map, we need to "pass-through" all those
>>> fields so that users of libbpf (iproute2 loader, in this case) can
>>> make use of it. The easiest and cleanest way to do this is to expose
>>> BTF ID of a type describing each map entry and let iproute2 process
>>> that in whichever way it sees fit.
>>> 
>>> Luckily, bpf_elf_map is compatible in `type` field, which will let
>>> libbpf recognize bpf_elf_map as map definition. All the rest setup
>>> will be done by iproute2, by processing BTF of bpf_elf_map, which will
>>> let it set up map sizes, flags and do all of its map-in-map magic.
>>> 
>>> The only additions to libbpf in this case would be a new `__u32
>>> bpf_map__btf_id(struct bpf_map* map);` API.
>>> 
>>> I haven't written any code and haven't 100% checked that this will
>>> cover everything, but I think we should try. This will allow to let
>>> users of libbpf do custom stuff with map definitions without having to
>>> put all this extra logic into libbpf itself, which I think is
>>> desirable outcome.
>>
>> Sounds reasonable in general, but all this still has the issue that
>> we're assuming that BTF is /always/ present. Existing object files
>> that would load just fine /today/ but do not have BTF attached won't
>> be handled here. Wouldn't it be more straight forward to allow passing
>> callbacks to the libbpf loader such that if the map section is not
>> found to be bpf_map_def compatible, we rely on external user aka
>> callback to parse the ELF section, handle any non-default libbpf
>> behavior like pinning/retrieving from BPF fs, populate related
>> internal libbpf map data structures and pass control back to libbpf
>> loader afterwards. (Similar callback with prog section name handling
>> for the case where tail call maps get automatically populated.)
>
> Thinking about this some more, I think there are two separate issues
> here:
>
> 1. Do we want libbpf to support the features currently in iproute2 and
>    bpf_helpers (i.e., map pinning + reuse, map-in-map definitions, and
>    NUMA node placement of maps). IMO the answer to this is yes.
>
> 2. What should the data format be for BPF programs to signal that they
>    want to use those features? Here, the longer-term answer is BTF-based
>    map definitions, but we still want iproute2 to be backwards
>    compatible.
>
>
> So how about I revise this patch series to implement the *features* (I
> already implemented map-in-map and numa nodes[0], so that is sorta
> already done), but instead of extending the bpf_map_def struct, I just
> expose callbacks that will allow programs to fill in internal-to-libbpf
> data structures with the required information. Then, once the BTF-based
> map definition does land, that can simply define default callbacks that
> uses the BTF information to fill in those same internal data structures?
>
> This would mean no extending bpf_map_def, and relaxing the current
> libbpf restriction on extending bpf_map_def.
>
> The drawback of this approach is that it does nothing to combat
> fragmentation: People building their own loaders can still reimplement
> different semantics for map defs, leading to programs that are tied to a
> particular loader. So this would only work if we really believe BTF can
> save us from this. I don't feel competent to comment on this just yet,
> but thought I'd mention it :)
>
> -Toke

[0] was supposed to be a reference to
    https://github.com/tohojo/libbpf/tree/iproute2-compat

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 4/5] iproute2: Allow compiling against libbpf
  2019-08-22 11:45       ` Daniel Borkmann
@ 2019-08-22 12:04         ` Toke Høiland-Jørgensen
  2019-08-22 12:33           ` Daniel Borkmann
  0 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-08-22 12:04 UTC (permalink / raw)
  To: Daniel Borkmann, Stephen Hemminger, Alexei Starovoitov
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, netdev, bpf, andrii.nakryiko

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 8/22/19 12:43 PM, Toke Høiland-Jørgensen wrote:
>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>> On 8/20/19 1:47 PM, Toke Høiland-Jørgensen wrote:
>>>> This adds a configure check for libbpf and renames functions to allow
>>>> lib/bpf.c to be compiled with it present. This makes it possible to
>>>> port functionality piecemeal to use libbpf.
>>>>
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>> ---
>>>>    configure          | 16 ++++++++++++++++
>>>>    include/bpf_util.h |  6 +++---
>>>>    ip/ipvrf.c         |  4 ++--
>>>>    lib/bpf.c          | 33 +++++++++++++++++++--------------
>>>>    4 files changed, 40 insertions(+), 19 deletions(-)
>>>>
>>>> diff --git a/configure b/configure
>>>> index 45fcffb6..5a89ee9f 100755
>>>> --- a/configure
>>>> +++ b/configure
>>>> @@ -238,6 +238,19 @@ check_elf()
>>>>        fi
>>>>    }
>>>>    
>>>> +check_libbpf()
>>>> +{
>>>> +    if ${PKG_CONFIG} libbpf --exists; then
>>>> +	echo "HAVE_LIBBPF:=y" >>$CONFIG
>>>> +	echo "yes"
>>>> +
>>>> +	echo 'CFLAGS += -DHAVE_LIBBPF' `${PKG_CONFIG} libbpf --cflags` >> $CONFIG
>>>> +	echo 'LDLIBS += ' `${PKG_CONFIG} libbpf --libs` >>$CONFIG
>>>> +    else
>>>> +	echo "no"
>>>> +    fi
>>>> +}
>>>> +
>>>>    check_selinux()
>>>
>>> More of an implementation detail at this point in time, but want to
>>> make sure this doesn't get missed along the way: as discussed at
>>> bpfconf [0] best for iproute2 to handle libbpf support would be the
>>> same way of integration as pahole does, that is, to integrate it via
>>> submodule [1] to allow kernel and libbpf features to be in sync with
>>> iproute2 releases and therefore easily consume extensions we're adding
>>> to libbpf to aide iproute2 integration.
>> 
>> I can sorta see the point wrt keeping in sync with kernel features. But
>> how will this work with distros that package libbpf as a regular
>> library? Have you guys given up on regular library symbol versioning for
>> libbpf?
>
> Not at all, and I hope you know that. ;-)

Good! Didn't really expect you had, just checking ;)

> The reason I added lib/bpf.c integration into iproute2 directly back
> then was exactly such that users can start consuming BPF for tc and
> XDP via iproute2 /everywhere/ with only a simple libelf dependency
> which is also available on all distros since pretty much forever. If
> it was an external library, we could have waited till hell freezes
> over and initial distro adoption would have pretty much taken forever:
> to pick one random example here wrt the pace of some downstream
> distros [0]. The main rationale is pretty much the same as with added
> kernel features that land complementary iproute2 patches for that
> kernel release and as libbpf is developed alongside it is reasonable
> to guarantee user expectations that iproute2 released for kernel
> version x can make use of BPF features added to kernel x with same
> loader support from x.

Well, for iproute2 I would expect this to be solved by version
dependencies. I.e. iproute2 version X would depend on libbpf version Y+
(like, I dunno, the version of libbpf included in the same kernel source
tree as the kernel version iproute2 is targeting? :)).

If we vendor libbpf into every project using it, we'll end up with
dozens of different versions statically linked into each binary, kinda
defeating the purpose of having it as a shared library in the first
place...

-Toke

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 4/5] iproute2: Allow compiling against libbpf
  2019-08-22 12:04         ` Toke Høiland-Jørgensen
@ 2019-08-22 12:33           ` Daniel Borkmann
  2019-08-22 13:38             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: Daniel Borkmann @ 2019-08-22 12:33 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Stephen Hemminger, Alexei Starovoitov
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, netdev, bpf, andrii.nakryiko

On 8/22/19 2:04 PM, Toke Høiland-Jørgensen wrote:
> Daniel Borkmann <daniel@iogearbox.net> writes:
>> On 8/22/19 12:43 PM, Toke Høiland-Jørgensen wrote:
>>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>>> On 8/20/19 1:47 PM, Toke Høiland-Jørgensen wrote:
>>>>> This adds a configure check for libbpf and renames functions to allow
>>>>> lib/bpf.c to be compiled with it present. This makes it possible to
>>>>> port functionality piecemeal to use libbpf.
>>>>>
>>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>>> ---
>>>>>     configure          | 16 ++++++++++++++++
>>>>>     include/bpf_util.h |  6 +++---
>>>>>     ip/ipvrf.c         |  4 ++--
>>>>>     lib/bpf.c          | 33 +++++++++++++++++++--------------
>>>>>     4 files changed, 40 insertions(+), 19 deletions(-)
>>>>>
>>>>> diff --git a/configure b/configure
>>>>> index 45fcffb6..5a89ee9f 100755
>>>>> --- a/configure
>>>>> +++ b/configure
>>>>> @@ -238,6 +238,19 @@ check_elf()
>>>>>         fi
>>>>>     }
>>>>>     
>>>>> +check_libbpf()
>>>>> +{
>>>>> +    if ${PKG_CONFIG} libbpf --exists; then
>>>>> +	echo "HAVE_LIBBPF:=y" >>$CONFIG
>>>>> +	echo "yes"
>>>>> +
>>>>> +	echo 'CFLAGS += -DHAVE_LIBBPF' `${PKG_CONFIG} libbpf --cflags` >> $CONFIG
>>>>> +	echo 'LDLIBS += ' `${PKG_CONFIG} libbpf --libs` >>$CONFIG
>>>>> +    else
>>>>> +	echo "no"
>>>>> +    fi
>>>>> +}
>>>>> +
>>>>>     check_selinux()
>>>>
>>>> More of an implementation detail at this point in time, but want to
>>>> make sure this doesn't get missed along the way: as discussed at
>>>> bpfconf [0] best for iproute2 to handle libbpf support would be the
>>>> same way of integration as pahole does, that is, to integrate it via
>>>> submodule [1] to allow kernel and libbpf features to be in sync with
>>>> iproute2 releases and therefore easily consume extensions we're adding
>>>> to libbpf to aide iproute2 integration.
>>>
>>> I can sorta see the point wrt keeping in sync with kernel features. But
>>> how will this work with distros that package libbpf as a regular
>>> library? Have you guys given up on regular library symbol versioning for
>>> libbpf?
>>
>> Not at all, and I hope you know that. ;-)
> 
> Good! Didn't really expect you had, just checking ;)
> 
>> The reason I added lib/bpf.c integration into iproute2 directly back
>> then was exactly such that users can start consuming BPF for tc and
>> XDP via iproute2 /everywhere/ with only a simple libelf dependency
>> which is also available on all distros since pretty much forever. If
>> it was an external library, we could have waited till hell freezes
>> over and initial distro adoption would have pretty much taken forever:
>> to pick one random example here wrt the pace of some downstream
>> distros [0]. The main rationale is pretty much the same as with added
>> kernel features that land complementary iproute2 patches for that
>> kernel release and as libbpf is developed alongside it is reasonable
>> to guarantee user expectations that iproute2 released for kernel
>> version x can make use of BPF features added to kernel x with same
>> loader support from x.
> 
> Well, for iproute2 I would expect this to be solved by version
> dependencies. I.e. iproute2 version X would depend on libbpf version Y+
> (like, I dunno, the version of libbpf included in the same kernel source
> tree as the kernel version iproute2 is targeting? :)).

This sounds nice in theory, but from what I've seen major (!) distros
already seem to have a hard time releasing kernel x along with iproute2
package x, concrete example was that distro kernel was on 4.13 and its
official iproute2 package on 4.9, adding yet another variable that needs
to be in sync with kernel is simply impractical especially for a _core_
package like iproute2 that should have as little dependencies as possible.
I also don't want to make a bet on whether libbpf will be available on
every distro that also ships iproute2. Hence approach that pahole (and
also bcc by the way) takes is most reasonable to have the best user
experience.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 4/5] iproute2: Allow compiling against libbpf
  2019-08-22 12:33           ` Daniel Borkmann
@ 2019-08-22 13:38             ` Toke Høiland-Jørgensen
  2019-08-22 13:45               ` Daniel Borkmann
  0 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-08-22 13:38 UTC (permalink / raw)
  To: Daniel Borkmann, Stephen Hemminger, Alexei Starovoitov
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, netdev, bpf, andrii.nakryiko

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 8/22/19 2:04 PM, Toke Høiland-Jørgensen wrote:
>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>> On 8/22/19 12:43 PM, Toke Høiland-Jørgensen wrote:
>>>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>>>> On 8/20/19 1:47 PM, Toke Høiland-Jørgensen wrote:
>>>>>> This adds a configure check for libbpf and renames functions to allow
>>>>>> lib/bpf.c to be compiled with it present. This makes it possible to
>>>>>> port functionality piecemeal to use libbpf.
>>>>>>
>>>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>>>> ---
>>>>>>     configure          | 16 ++++++++++++++++
>>>>>>     include/bpf_util.h |  6 +++---
>>>>>>     ip/ipvrf.c         |  4 ++--
>>>>>>     lib/bpf.c          | 33 +++++++++++++++++++--------------
>>>>>>     4 files changed, 40 insertions(+), 19 deletions(-)
>>>>>>
>>>>>> diff --git a/configure b/configure
>>>>>> index 45fcffb6..5a89ee9f 100755
>>>>>> --- a/configure
>>>>>> +++ b/configure
>>>>>> @@ -238,6 +238,19 @@ check_elf()
>>>>>>         fi
>>>>>>     }
>>>>>>     
>>>>>> +check_libbpf()
>>>>>> +{
>>>>>> +    if ${PKG_CONFIG} libbpf --exists; then
>>>>>> +	echo "HAVE_LIBBPF:=y" >>$CONFIG
>>>>>> +	echo "yes"
>>>>>> +
>>>>>> +	echo 'CFLAGS += -DHAVE_LIBBPF' `${PKG_CONFIG} libbpf --cflags` >> $CONFIG
>>>>>> +	echo 'LDLIBS += ' `${PKG_CONFIG} libbpf --libs` >>$CONFIG
>>>>>> +    else
>>>>>> +	echo "no"
>>>>>> +    fi
>>>>>> +}
>>>>>> +
>>>>>>     check_selinux()
>>>>>
>>>>> More of an implementation detail at this point in time, but want to
>>>>> make sure this doesn't get missed along the way: as discussed at
>>>>> bpfconf [0] best for iproute2 to handle libbpf support would be the
>>>>> same way of integration as pahole does, that is, to integrate it via
>>>>> submodule [1] to allow kernel and libbpf features to be in sync with
>>>>> iproute2 releases and therefore easily consume extensions we're adding
>>>>> to libbpf to aide iproute2 integration.
>>>>
>>>> I can sorta see the point wrt keeping in sync with kernel features. But
>>>> how will this work with distros that package libbpf as a regular
>>>> library? Have you guys given up on regular library symbol versioning for
>>>> libbpf?
>>>
>>> Not at all, and I hope you know that. ;-)
>> 
>> Good! Didn't really expect you had, just checking ;)
>> 
>>> The reason I added lib/bpf.c integration into iproute2 directly back
>>> then was exactly such that users can start consuming BPF for tc and
>>> XDP via iproute2 /everywhere/ with only a simple libelf dependency
>>> which is also available on all distros since pretty much forever. If
>>> it was an external library, we could have waited till hell freezes
>>> over and initial distro adoption would have pretty much taken forever:
>>> to pick one random example here wrt the pace of some downstream
>>> distros [0]. The main rationale is pretty much the same as with added
>>> kernel features that land complementary iproute2 patches for that
>>> kernel release and as libbpf is developed alongside it is reasonable
>>> to guarantee user expectations that iproute2 released for kernel
>>> version x can make use of BPF features added to kernel x with same
>>> loader support from x.
>> 
>> Well, for iproute2 I would expect this to be solved by version
>> dependencies. I.e. iproute2 version X would depend on libbpf version Y+
>> (like, I dunno, the version of libbpf included in the same kernel source
>> tree as the kernel version iproute2 is targeting? :)).
>
> This sounds nice in theory, but from what I've seen major (!) distros
> already seem to have a hard time releasing kernel x along with iproute2
> package x, concrete example was that distro kernel was on 4.13 and its
> official iproute2 package on 4.9,

If the iproute2 package is not being updated at all I don't really see
how it would make any difference whether libbpf is vendored or not? :)

> adding yet another variable that needs to be in sync with kernel is
> simply impractical especially for a _core_ package like iproute2 that
> should have as little dependencies as possible. I also don't want to
> make a bet on whether libbpf will be available on every distro that
> also ships iproute2. Hence approach that pahole (and also bcc by the
> way) takes is most reasonable to have the best user experience.

Most users are going to get iproute2 from their distro packages anyway,
so if distros are incompetent at packaging, my bet is that you're going
to run into issues one way or another.

But OK, if you think it is easier to work around bad distros by
vendoring, you guys are the maintainers, so that's up to you. But can we
at least put in the version dependency and let the build system pick up
a system libbpf if it exists and is compatible? That way distros that
*are* competent can still link it dynamically...

-Toke

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 4/5] iproute2: Allow compiling against libbpf
  2019-08-22 13:38             ` Toke Høiland-Jørgensen
@ 2019-08-22 13:45               ` Daniel Borkmann
  2019-08-22 15:28                 ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: Daniel Borkmann @ 2019-08-22 13:45 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Stephen Hemminger, Alexei Starovoitov
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, netdev, bpf, andrii.nakryiko

On 8/22/19 3:38 PM, Toke Høiland-Jørgensen wrote:
> Daniel Borkmann <daniel@iogearbox.net> writes:
>> On 8/22/19 2:04 PM, Toke Høiland-Jørgensen wrote:
>>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>>> On 8/22/19 12:43 PM, Toke Høiland-Jørgensen wrote:
>>>>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>>>>> On 8/20/19 1:47 PM, Toke Høiland-Jørgensen wrote:
>>>>>>> This adds a configure check for libbpf and renames functions to allow
>>>>>>> lib/bpf.c to be compiled with it present. This makes it possible to
>>>>>>> port functionality piecemeal to use libbpf.
>>>>>>>
>>>>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>>>>> ---
>>>>>>>      configure          | 16 ++++++++++++++++
>>>>>>>      include/bpf_util.h |  6 +++---
>>>>>>>      ip/ipvrf.c         |  4 ++--
>>>>>>>      lib/bpf.c          | 33 +++++++++++++++++++--------------
>>>>>>>      4 files changed, 40 insertions(+), 19 deletions(-)
>>>>>>>
>>>>>>> diff --git a/configure b/configure
>>>>>>> index 45fcffb6..5a89ee9f 100755
>>>>>>> --- a/configure
>>>>>>> +++ b/configure
>>>>>>> @@ -238,6 +238,19 @@ check_elf()
>>>>>>>          fi
>>>>>>>      }
>>>>>>>      
>>>>>>> +check_libbpf()
>>>>>>> +{
>>>>>>> +    if ${PKG_CONFIG} libbpf --exists; then
>>>>>>> +	echo "HAVE_LIBBPF:=y" >>$CONFIG
>>>>>>> +	echo "yes"
>>>>>>> +
>>>>>>> +	echo 'CFLAGS += -DHAVE_LIBBPF' `${PKG_CONFIG} libbpf --cflags` >> $CONFIG
>>>>>>> +	echo 'LDLIBS += ' `${PKG_CONFIG} libbpf --libs` >>$CONFIG
>>>>>>> +    else
>>>>>>> +	echo "no"
>>>>>>> +    fi
>>>>>>> +}
>>>>>>> +
>>>>>>>      check_selinux()
>>>>>>
>>>>>> More of an implementation detail at this point in time, but want to
>>>>>> make sure this doesn't get missed along the way: as discussed at
>>>>>> bpfconf [0] best for iproute2 to handle libbpf support would be the
>>>>>> same way of integration as pahole does, that is, to integrate it via
>>>>>> submodule [1] to allow kernel and libbpf features to be in sync with
>>>>>> iproute2 releases and therefore easily consume extensions we're adding
>>>>>> to libbpf to aide iproute2 integration.
>>>>>
>>>>> I can sorta see the point wrt keeping in sync with kernel features. But
>>>>> how will this work with distros that package libbpf as a regular
>>>>> library? Have you guys given up on regular library symbol versioning for
>>>>> libbpf?
>>>>
>>>> Not at all, and I hope you know that. ;-)
>>>
>>> Good! Didn't really expect you had, just checking ;)
>>>
>>>> The reason I added lib/bpf.c integration into iproute2 directly back
>>>> then was exactly such that users can start consuming BPF for tc and
>>>> XDP via iproute2 /everywhere/ with only a simple libelf dependency
>>>> which is also available on all distros since pretty much forever. If
>>>> it was an external library, we could have waited till hell freezes
>>>> over and initial distro adoption would have pretty much taken forever:
>>>> to pick one random example here wrt the pace of some downstream
>>>> distros [0]. The main rationale is pretty much the same as with added
>>>> kernel features that land complementary iproute2 patches for that
>>>> kernel release and as libbpf is developed alongside it is reasonable
>>>> to guarantee user expectations that iproute2 released for kernel
>>>> version x can make use of BPF features added to kernel x with same
>>>> loader support from x.
>>>
>>> Well, for iproute2 I would expect this to be solved by version
>>> dependencies. I.e. iproute2 version X would depend on libbpf version Y+
>>> (like, I dunno, the version of libbpf included in the same kernel source
>>> tree as the kernel version iproute2 is targeting? :)).
>>
>> This sounds nice in theory, but from what I've seen major (!) distros
>> already seem to have a hard time releasing kernel x along with iproute2
>> package x, concrete example was that distro kernel was on 4.13 and its
>> official iproute2 package on 4.9,
> 
> If the iproute2 package is not being updated at all I don't really see
> how it would make any difference whether libbpf is vendored or not? :)
> 
>> adding yet another variable that needs to be in sync with kernel is
>> simply impractical especially for a _core_ package like iproute2 that
>> should have as little dependencies as possible. I also don't want to
>> make a bet on whether libbpf will be available on every distro that
>> also ships iproute2. Hence approach that pahole (and also bcc by the
>> way) takes is most reasonable to have the best user experience.
> 
> Most users are going to get iproute2 from their distro packages anyway,
> so if distros are incompetent at packaging, my bet is that you're going
> to run into issues one way or another.
> 
> But OK, if you think it is easier to work around bad distros by
> vendoring, you guys are the maintainers, so that's up to you. But can we
> at least put in the version dependency and let the build system pick up
> a system libbpf if it exists and is compatible? That way distros that
> *are* competent can still link it dynamically...

Yeah that would be fine by me to use this as a fallback, and I think that
iproute2's configure script should be able to easily handle this situation.
That way we don't compromise on user experience.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 4/5] iproute2: Allow compiling against libbpf
  2019-08-22 13:45               ` Daniel Borkmann
@ 2019-08-22 15:28                 ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-08-22 15:28 UTC (permalink / raw)
  To: Daniel Borkmann, Stephen Hemminger, Alexei Starovoitov
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, netdev, bpf, andrii.nakryiko

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 8/22/19 3:38 PM, Toke Høiland-Jørgensen wrote:
>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>> On 8/22/19 2:04 PM, Toke Høiland-Jørgensen wrote:
>>>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>>>> On 8/22/19 12:43 PM, Toke Høiland-Jørgensen wrote:
>>>>>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>>>>>> On 8/20/19 1:47 PM, Toke Høiland-Jørgensen wrote:
>>>>>>>> This adds a configure check for libbpf and renames functions to allow
>>>>>>>> lib/bpf.c to be compiled with it present. This makes it possible to
>>>>>>>> port functionality piecemeal to use libbpf.
>>>>>>>>
>>>>>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>>>>>> ---
>>>>>>>>      configure          | 16 ++++++++++++++++
>>>>>>>>      include/bpf_util.h |  6 +++---
>>>>>>>>      ip/ipvrf.c         |  4 ++--
>>>>>>>>      lib/bpf.c          | 33 +++++++++++++++++++--------------
>>>>>>>>      4 files changed, 40 insertions(+), 19 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/configure b/configure
>>>>>>>> index 45fcffb6..5a89ee9f 100755
>>>>>>>> --- a/configure
>>>>>>>> +++ b/configure
>>>>>>>> @@ -238,6 +238,19 @@ check_elf()
>>>>>>>>          fi
>>>>>>>>      }
>>>>>>>>      
>>>>>>>> +check_libbpf()
>>>>>>>> +{
>>>>>>>> +    if ${PKG_CONFIG} libbpf --exists; then
>>>>>>>> +	echo "HAVE_LIBBPF:=y" >>$CONFIG
>>>>>>>> +	echo "yes"
>>>>>>>> +
>>>>>>>> +	echo 'CFLAGS += -DHAVE_LIBBPF' `${PKG_CONFIG} libbpf --cflags` >> $CONFIG
>>>>>>>> +	echo 'LDLIBS += ' `${PKG_CONFIG} libbpf --libs` >>$CONFIG
>>>>>>>> +    else
>>>>>>>> +	echo "no"
>>>>>>>> +    fi
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>      check_selinux()
>>>>>>>
>>>>>>> More of an implementation detail at this point in time, but want to
>>>>>>> make sure this doesn't get missed along the way: as discussed at
>>>>>>> bpfconf [0] best for iproute2 to handle libbpf support would be the
>>>>>>> same way of integration as pahole does, that is, to integrate it via
>>>>>>> submodule [1] to allow kernel and libbpf features to be in sync with
>>>>>>> iproute2 releases and therefore easily consume extensions we're adding
>>>>>>> to libbpf to aide iproute2 integration.
>>>>>>
>>>>>> I can sorta see the point wrt keeping in sync with kernel features. But
>>>>>> how will this work with distros that package libbpf as a regular
>>>>>> library? Have you guys given up on regular library symbol versioning for
>>>>>> libbpf?
>>>>>
>>>>> Not at all, and I hope you know that. ;-)
>>>>
>>>> Good! Didn't really expect you had, just checking ;)
>>>>
>>>>> The reason I added lib/bpf.c integration into iproute2 directly back
>>>>> then was exactly such that users can start consuming BPF for tc and
>>>>> XDP via iproute2 /everywhere/ with only a simple libelf dependency
>>>>> which is also available on all distros since pretty much forever. If
>>>>> it was an external library, we could have waited till hell freezes
>>>>> over and initial distro adoption would have pretty much taken forever:
>>>>> to pick one random example here wrt the pace of some downstream
>>>>> distros [0]. The main rationale is pretty much the same as with added
>>>>> kernel features that land complementary iproute2 patches for that
>>>>> kernel release and as libbpf is developed alongside it is reasonable
>>>>> to guarantee user expectations that iproute2 released for kernel
>>>>> version x can make use of BPF features added to kernel x with same
>>>>> loader support from x.
>>>>
>>>> Well, for iproute2 I would expect this to be solved by version
>>>> dependencies. I.e. iproute2 version X would depend on libbpf version Y+
>>>> (like, I dunno, the version of libbpf included in the same kernel source
>>>> tree as the kernel version iproute2 is targeting? :)).
>>>
>>> This sounds nice in theory, but from what I've seen major (!) distros
>>> already seem to have a hard time releasing kernel x along with iproute2
>>> package x, concrete example was that distro kernel was on 4.13 and its
>>> official iproute2 package on 4.9,
>> 
>> If the iproute2 package is not being updated at all I don't really see
>> how it would make any difference whether libbpf is vendored or not? :)
>> 
>>> adding yet another variable that needs to be in sync with kernel is
>>> simply impractical especially for a _core_ package like iproute2 that
>>> should have as little dependencies as possible. I also don't want to
>>> make a bet on whether libbpf will be available on every distro that
>>> also ships iproute2. Hence approach that pahole (and also bcc by the
>>> way) takes is most reasonable to have the best user experience.
>> 
>> Most users are going to get iproute2 from their distro packages anyway,
>> so if distros are incompetent at packaging, my bet is that you're going
>> to run into issues one way or another.
>> 
>> But OK, if you think it is easier to work around bad distros by
>> vendoring, you guys are the maintainers, so that's up to you. But can we
>> at least put in the version dependency and let the build system pick up
>> a system libbpf if it exists and is compatible? That way distros that
>> *are* competent can still link it dynamically...
>
> Yeah that would be fine by me to use this as a fallback, and I think that
> iproute2's configure script should be able to easily handle this
> situation.

Cool, I can live with that :)

-Toke

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2019-08-22  8:33       ` Daniel Borkmann
  2019-08-22 11:48         ` Toke Høiland-Jørgensen
@ 2019-08-23  6:31         ` Andrii Nakryiko
  2019-08-23 11:29           ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2019-08-23  6:31 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Toke Høiland-Jørgensen, Stephen Hemminger,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	David Miller, Jesper Dangaard Brouer, Networking, bpf

On Thu, Aug 22, 2019 at 1:33 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/22/19 9:49 AM, Andrii Nakryiko wrote:
> > On Wed, Aug 21, 2019 at 2:07 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >>> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>>>
> >>>> iproute2 uses its own bpf loader to load eBPF programs, which has
> >>>> evolved separately from libbpf. Since we are now standardising on
> >>>> libbpf, this becomes a problem as iproute2 is slowly accumulating
> >>>> feature incompatibilities with libbpf-based loaders. In particular,
> >>>> iproute2 has its own (expanded) version of the map definition struct,
> >>>> which makes it difficult to write programs that can be loaded with both
> >>>> custom loaders and iproute2.
> >>>>
> >>>> This series seeks to address this by converting iproute2 to using libbpf
> >>>> for all its bpf needs. This version is an early proof-of-concept RFC, to
> >>>> get some feedback on whether people think this is the right direction.
> >>>>
> >>>> What this series does is the following:
> >>>>
> >>>> - Updates the libbpf map definition struct to match that of iproute2
> >>>>    (patch 1).
> >>>
> >>> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
> >>> totally in support of making iproute2 use libbpf to load/initialize
> >>> BPF programs. But I'm against adding iproute2-specific fields to
> >>> libbpf's bpf_map_def definitions to support this.
> >>>
> >>> I've proposed the plan of extending libbpf's supported features so
> >>> that it can be used to load iproute2-style BPF programs earlier,
> >>> please see discussions in [0] and [1].
> >>
> >> Yeah, I've seen that discussion, and agree that longer term this is
> >> probably a better way to do map-in-map definitions.
> >>
> >> However, I view your proposal as complementary to this series: we'll
> >> probably also want the BTF-based definition to work with iproute2, and
> >> that means iproute2 needs to be ported to libbpf. But iproute2 needs to
> >> be backwards compatible with the format it supports now, and, well, this
> >> series is the simplest way to achieve that IMO :)
> >
> > Ok, I understand that. But I'd still want to avoid adding extra cruft
> > to libbpf just for backwards-compatibility with *exact* iproute2
> > format. Libbpf as a whole is trying to move away from relying on
> > binary bpf_map_def and into using BTF-defined map definitions, and
> > this patch series is a step backwards in that regard, that adds,
> > essentially, already outdated stuff that we'll need to support forever
> > (I mean those extra fields in bpf_map_def, that will stay there
> > forever).
>
> Agree, adding these extensions for libbpf would be a step backwards
> compared to using BTF defined map defs.
>
> > We've discussed one way to deal with it, IMO, in a cleaner way. It can
> > be done in few steps:
> >
> > 1. I originally wanted BTF-defined map definitions to ignore unknown
> > fields. It shouldn't be a default mode, but it should be supported
> > (and of course is very easy to add). So let's add that and let libbpf
> > ignore unknown stuff.
> >
> > 2. Then to let iproute2 loader deal with backwards-compatibility for
> > libbpf-incompatible bpf_elf_map, we need to "pass-through" all those
> > fields so that users of libbpf (iproute2 loader, in this case) can
> > make use of it. The easiest and cleanest way to do this is to expose
> > BTF ID of a type describing each map entry and let iproute2 process
> > that in whichever way it sees fit.
> >
> > Luckily, bpf_elf_map is compatible in `type` field, which will let
> > libbpf recognize bpf_elf_map as map definition. All the rest setup
> > will be done by iproute2, by processing BTF of bpf_elf_map, which will
> > let it set up map sizes, flags and do all of its map-in-map magic.
> >
> > The only additions to libbpf in this case would be a new `__u32
> > bpf_map__btf_id(struct bpf_map* map);` API.
> >
> > I haven't written any code and haven't 100% checked that this will
> > cover everything, but I think we should try. This will allow to let
> > users of libbpf do custom stuff with map definitions without having to
> > put all this extra logic into libbpf itself, which I think is
> > desirable outcome.
>
> Sounds reasonable in general, but all this still has the issue that we're
> assuming that BTF is /always/ present. Existing object files that would load
> just fine /today/ but do not have BTF attached won't be handled here. Wouldn't
> it be more straight forward to allow passing callbacks to the libbpf loader
> such that if the map section is not found to be bpf_map_def compatible, we
> rely on external user aka callback to parse the ELF section, handle any
> non-default libbpf behavior like pinning/retrieving from BPF fs, populate
> related internal libbpf map data structures and pass control back to libbpf

Having all those special callbacks feels a bit too narrow-focused. I
do agree that we need to provide enough flexibility to allow
non-standard BPF loaders like iproute2 to adjust and/or extend some of
BPF initialization logic, though. I'm just unsure if adding more and
more callbacks is the right approach.

Let's think slightly beyond iproute2, because iproute2 and libbpf use
the same ELF section name ("maps") for non-BTF-defined map defs, which
makes it easy to fall into the trap of designing too specific
solution. Let's take, say, BCC. BCC uses one section per each map.
And, unlike, iproute2, ELF section names don't coincide. So what if
BCC were to use libbpf as an underlying BPF loader, while preserving
its layout? This callback for "maps" section won't work at all.

BTW, I realize BCC might not be the best example here, given
on-the-fly complication nature of its programs and extensive usage of
macro, which provide quite a lot of flexibility in changing ELF
layout, if necessary, without changing user programs, but bear with
me, let's pretend we can't change some of those aspects easily. The
point I'm trying to make is that iproute2 and libbpf cases are
examples of almost compatible ELF layouts, and if we try to solve just
such case, we might end up inventing too specific solution.

What seems like a bit more flexible and generic solution is to make
libbpf API granular enough to allow users to adjust/add extra maps,
programs, relocations, whatever is necessary, before libbpf does final
loading or some other checks that would fail for normal libbpf
programs, but are ok for custom BPF programs. So instead of having
callbacks, we have API for each step, where custom loader can skip
and/or adjust behavior of each of them, while also implement their own
steps.

E.g., today's API is essentially three steps:

1. open and parse ELF: collect relos, programs, map definitions
2. load: create maps from collected defs, do program/global data/CO-RE
relocs, load and verify BPF programs
3. attach programs one by one.

Between step 1 and 2 user has flexibility to create more maps, set up
map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
tail call maps, etc. That's already pretty flexible. But we can tune
and break apart those steps even further, if necessary.

E.g., for iproute2 maps, I think the biggest problem is that we can't
create a custom map dynamically, but still allow BPF instructions
relocations against that map. Plus, of course, a name clash of "maps"
ELF section with incompatible map definitions. So how about:

1. for open call, tell it to not parse "maps" section at all. We can
model that as an extra option to override map definitions ELF section
name. E.g., so that some applications can put their map defs into
"my_fancy_map_def_section" ELF section, for instance. If that section
is empty, though, libbpf will just skip step of collecting
bpf_map_defs (it can still do BTF-defined maps, btw), allowing gradual
migration.

2. provide API to add dynamically created (e.g., by iproute2 loader)
maps that are "relocatable to", before program relocations happen.
We'll need to figure out best interface here. Internally relocations
refer to maps as section index + offset and translate that to map
index. If we keep relos (internally) just as section index + offset,
it will be simple and consistent interface to add external maps that
can be relocated against, IMO. Map index is inconvenient in this case.

BTW, we'd need to answer some of those questions regardless, even for
callback-based solution. There are a bunch of details to be worked
out, of course, and I don't have exact answer to all of them right
now, but I think it's a worthwhile exercise to try to answer them and
see how API would look like.

As an aside, taking a step back and thinking about this whole API
design thing, it occurred to me that ELF is just one way to specify
programs, maps, relocations, global data, etc. But it doesn't have to
be just ELF, right? What if we have a use case where we have
"on-the-fly" creation of BPF program, with dynamically added maps,
dynamically generated BPF programs, etc. E.g., think about bpftrace
generating BPF object/program on-the-fly from their DSL language. Or
some pcap to BPF translator for firewall rules, etc, etc. It might be
inconvenient and unnecessary to generate ELF and then pass it to
libbpf to load it. Instead it could be more convenient to create an
empty bpf_object (bpf_object__new) and then use programmatic APIs to
add a bunch of maps (some sort of bpf_object__add_map) and progs
(bpf_object__add_program) with relocations against those maps, and let
libbpf do all the low-level stuff (relos, CO-RE, etc). And provide
high-level API for bpf_map, bpf_program, etc.

How API would look like to support this? Today's bpf_object__open +
bpf__object__load could be just a higher-level wrappers on top of
those APIs, constructing all the entities from standardized ELF
layout. But there still is low-level API to construct same bpf_object
construct dynamically. That seems flexible and powerful and not tied
to any particular use case, but of course requires a bit of thought
about best API.

Sorry for the wall of text :) Callback-based solutions always seem
convoluted to me, as well as hard to follow and quite often too
limited. This topic is not the easiest one to discuss over email, as
well, maybe we should chat about this while at LPC?

> loader afterwards. (Similar callback with prog section name handling for the
> case where tail call maps get automatically populated.)

I'm not sure I completely understand why we need this prog section
name callback. Can you elaborate what problem does it solve that can't
be solved with existing API?

>
> Thanks,
> Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2019-08-21 20:30 ` Andrii Nakryiko
  2019-08-21 21:07   ` Toke Høiland-Jørgensen
@ 2019-08-23 10:27   ` Jesper Dangaard Brouer
  2019-08-28 20:23     ` Andrii Nakryiko
  1 sibling, 1 reply; 46+ messages in thread
From: Jesper Dangaard Brouer @ 2019-08-23 10:27 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Toke Høiland-Jørgensen, Stephen Hemminger,
	Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, David Miller, Networking, bpf, brouer,
	Anton Protopopov, Stanislav Fomichev, Yoel Caspersen

On Wed, 21 Aug 2019 13:30:09 -0700
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > iproute2 uses its own bpf loader to load eBPF programs, which has
> > evolved separately from libbpf. Since we are now standardising on
> > libbpf, this becomes a problem as iproute2 is slowly accumulating
> > feature incompatibilities with libbpf-based loaders. In particular,
> > iproute2 has its own (expanded) version of the map definition struct,
> > which makes it difficult to write programs that can be loaded with both
> > custom loaders and iproute2.
> >
> > This series seeks to address this by converting iproute2 to using libbpf
> > for all its bpf needs. This version is an early proof-of-concept RFC, to
> > get some feedback on whether people think this is the right direction.
> >
> > What this series does is the following:
> >
> > - Updates the libbpf map definition struct to match that of iproute2
> >   (patch 1).  
> 
> 
> Hi Toke,
> 
> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
> totally in support of making iproute2 use libbpf to load/initialize
> BPF programs. But I'm against adding iproute2-specific fields to
> libbpf's bpf_map_def definitions to support this.
> 
> I've proposed the plan of extending libbpf's supported features so
> that it can be used to load iproute2-style BPF programs earlier,
> please see discussions in [0] and [1]. I think instead of emulating
> iproute2 way of matching everything based on user-specified internal
> IDs, which doesn't provide good user experience and is quite easy to
> get wrong, we should support same scenarios with better declarative
> syntax and in a less error-prone way. I believe we can do that by
> relying on BTF more heavily (again, please check some of my proposals
> in [0], [1], and discussion with Daniel in those threads). It will
> feel more natural and be more straightforward to follow. It would be
> great if you can lend a hand in implementing pieces of that plan!
> 
> I'm currently on vacation, so my availability is very sparse, but I'd
> be happy to discuss this further, if need be.
> 
>   [0] https://lore.kernel.org/bpf/CAEf4BzbfdG2ub7gCi0OYqBrUoChVHWsmOntWAkJt47=FE+km+A@mail.gmail.com/
>   [1] https://www.spinics.net/lists/bpf/msg03976.html
> 
> > - Adds functionality to libbpf to support automatic pinning of maps when
> >   loading an eBPF program, while re-using pinned maps if they already
> >   exist (patches 2-3).

For production use-cases, libbpf really need an easier higher-level API
for re-using pinned maps, for establishing shared maps between
programs.  The existing libbpf API bpf_object__pin_maps() and
bpf_object__unpin_maps(), which don't re-use pinned maps, are not
really usable, because they pin/unpin ALL maps in the ELF file.

What users really need is an easy way to specify, on a per map basis,
what kind of pinning and reuse/sharing they want.  E.g. like iproute2
have, "global", "object-scope", and "no-pinning". ("ifindex-scope" would
be nice for XDP).
  Today users have to split/reimplement bpf_prog_load_xattr(), and
use/add bpf_map__reuse_fd().  Which is that I ended doing for
xdp-cpumap-tc[2] (used in production at ISP) resulting in 142 lines of
extra code[3] that should have been hidden inside libbpf.  And worse,
in this solution[4] the maps for reuse-pinning is specified in the code
by name.  Thus, they cannot use a generic loader.  That I why, I want
to mark the maps via a pinning member, like iproute2.

I really hope this moves in a practical direction, as I have the next
production request lined up (also from an ISP), and I hate to have to
advice them to choose the same route as [3].


[2] https://github.com/xdp-project/xdp-cpumap-tc/
[3] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/xdp_iphash_to_cpu_user.c#L262-L403
[4] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/xdp_iphash_to_cpu_user.c#L431-L441
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2019-08-23  6:31         ` Andrii Nakryiko
@ 2019-08-23 11:29           ` Toke Høiland-Jørgensen
  2019-08-28 20:40             ` Andrii Nakryiko
  0 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-08-23 11:29 UTC (permalink / raw)
  To: Andrii Nakryiko, Daniel Borkmann
  Cc: Stephen Hemminger, Alexei Starovoitov, Martin KaFai Lau,
	Song Liu, Yonghong Song, David Miller, Jesper Dangaard Brouer,
	Networking, bpf

[ ... snip ...]

> E.g., today's API is essentially three steps:
>
> 1. open and parse ELF: collect relos, programs, map definitions
> 2. load: create maps from collected defs, do program/global data/CO-RE
> relocs, load and verify BPF programs
> 3. attach programs one by one.
>
> Between step 1 and 2 user has flexibility to create more maps, set up
> map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
> tail call maps, etc. That's already pretty flexible. But we can tune
> and break apart those steps even further, if necessary.

Today, steps 1 and 2 can be collapsed into a single call to
bpf_prog_load_xattr(). As Jesper's mail explains, for XDP we don't
generally want to do all the fancy rewriting stuff, we just want a
simple way to load a program and get reusable pinning of maps.
Preferably in a way that is compatible with the iproute2 loader.

So I really think we need two things:

(1) a flexible API that splits up all the various steps in a way that
    allows programs to inject their own map definitions before
    relocations and loading

(2) a simple convenience wrapper that loads an object file, does
    something sensible with pinning and map-in-map definitions, and loads
    everything into the kernel.

I'd go so far as to say that (2) should even support system-wide
configuration, similar to the /etc/iproute2/bpf_pinning file. E.g., an
/etc/libbpf/pinning.conf file that sets the default pinning directory,
and makes it possible to set up pin-value-to-subdir mappings like what
iproute2 does today.

Having (2) makes it more likely that all the different custom loaders
will be compatible with each other, while still allowing people to do
their own custom thing with (1). And of course, (2) could be implemented
in terms of (1) internally in libbpf.

In my ideal world, (2) would just use the definition format already in
iproute2 (this is basically what I implemented already), but if you guys
don't want to put this into libbpf, I can probably live with the default
format being BTF-based instead. Which would mean that iproute2 I would
end up with a flow like this:

- When given an elf file, try to run it through the "standard loader"
  (2). If this works, great, proceed to program attach.

- If using (2) fails because it doesn't understand the map definition,
  fall back to a compatibility loader that parses the legacy iproute2
  map definition format and uses (1) to load that.


Does the above make sense? :)

-Toke

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2019-08-23 10:27   ` Jesper Dangaard Brouer
@ 2019-08-28 20:23     ` Andrii Nakryiko
  0 siblings, 0 replies; 46+ messages in thread
From: Andrii Nakryiko @ 2019-08-28 20:23 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, Stephen Hemminger,
	Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, David Miller, Networking, bpf, Anton Protopopov,
	Stanislav Fomichev, Yoel Caspersen

On Fri, Aug 23, 2019 at 3:27 AM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Wed, 21 Aug 2019 13:30:09 -0700
> Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> > On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >
> > > iproute2 uses its own bpf loader to load eBPF programs, which has
> > > evolved separately from libbpf. Since we are now standardising on
> > > libbpf, this becomes a problem as iproute2 is slowly accumulating
> > > feature incompatibilities with libbpf-based loaders. In particular,
> > > iproute2 has its own (expanded) version of the map definition struct,
> > > which makes it difficult to write programs that can be loaded with both
> > > custom loaders and iproute2.
> > >
> > > This series seeks to address this by converting iproute2 to using libbpf
> > > for all its bpf needs. This version is an early proof-of-concept RFC, to
> > > get some feedback on whether people think this is the right direction.
> > >
> > > What this series does is the following:
> > >
> > > - Updates the libbpf map definition struct to match that of iproute2
> > >   (patch 1).
> >
> >
> > Hi Toke,
> >
> > Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
> > totally in support of making iproute2 use libbpf to load/initialize
> > BPF programs. But I'm against adding iproute2-specific fields to
> > libbpf's bpf_map_def definitions to support this.
> >
> > I've proposed the plan of extending libbpf's supported features so
> > that it can be used to load iproute2-style BPF programs earlier,
> > please see discussions in [0] and [1]. I think instead of emulating
> > iproute2 way of matching everything based on user-specified internal
> > IDs, which doesn't provide good user experience and is quite easy to
> > get wrong, we should support same scenarios with better declarative
> > syntax and in a less error-prone way. I believe we can do that by
> > relying on BTF more heavily (again, please check some of my proposals
> > in [0], [1], and discussion with Daniel in those threads). It will
> > feel more natural and be more straightforward to follow. It would be
> > great if you can lend a hand in implementing pieces of that plan!
> >
> > I'm currently on vacation, so my availability is very sparse, but I'd
> > be happy to discuss this further, if need be.
> >
> >   [0] https://lore.kernel.org/bpf/CAEf4BzbfdG2ub7gCi0OYqBrUoChVHWsmOntWAkJt47=FE+km+A@mail.gmail.com/
> >   [1] https://www.spinics.net/lists/bpf/msg03976.html
> >
> > > - Adds functionality to libbpf to support automatic pinning of maps when
> > >   loading an eBPF program, while re-using pinned maps if they already
> > >   exist (patches 2-3).
>
> For production use-cases, libbpf really need an easier higher-level API
> for re-using pinned maps, for establishing shared maps between
> programs.  The existing libbpf API bpf_object__pin_maps() and
> bpf_object__unpin_maps(), which don't re-use pinned maps, are not
> really usable, because they pin/unpin ALL maps in the ELF file.
>
> What users really need is an easy way to specify, on a per map basis,
> what kind of pinning and reuse/sharing they want.  E.g. like iproute2
> have, "global", "object-scope", and "no-pinning". ("ifindex-scope" would
> be nice for XDP).

I totally agree and I think this is easy to add both for BTF-defined
and "classic" bpf_map_def maps. Daniel mentioned in one of the
previous threads that in practice object-scope doesn't seem to be
used, so I'd say we should start with no-pinning + global pinning as
two initial supported values for pinning attribute. ifindex-scope is
interesting, but I'd love to hear a bit more about the use cases.

>   Today users have to split/reimplement bpf_prog_load_xattr(), and
> use/add bpf_map__reuse_fd().  Which is that I ended doing for

Honestly, bpf_prog_load_xattr() existence seems redundant to me. It's
basically just bpf_object__open + bpf_object__load. There is a piece
in the middle with "guessing" program types, but it should just be
moved into bpf_object__open and happen by default. Using open + load
gives more control and isn't really harder than bpf_prog_load_xattr.
bpf_prog_load_xattr which might be slightly more convenient for simple
use case, but falls apart immediately if you need to tune anything
before load.

> xdp-cpumap-tc[2] (used in production at ISP) resulting in 142 lines of
> extra code[3] that should have been hidden inside libbpf.  And worse,
> in this solution[4] the maps for reuse-pinning is specified in the code
> by name.  Thus, they cannot use a generic loader.  That I why, I want
> to mark the maps via a pinning member, like iproute2.
>
> I really hope this moves in a practical direction, as I have the next
> production request lined up (also from an ISP), and I hate to have to
> advice them to choose the same route as [3].

It seems to me that map pinning doesn't need much discussion at this
point, let's start with no-pinning + global pinning. To accommodate
pinning at custom root, bpf_object__open_xattr should accept extra
argument with non-default pinning root path. That should solve your
case completely, shouldn't it? Ultimately, with BTF-defined maps it
should be possible to specify custom pinning path on per-map basis for
cases where user needs ultimate non-uniform manual control.

>
>
> [2] https://github.com/xdp-project/xdp-cpumap-tc/
> [3] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/xdp_iphash_to_cpu_user.c#L262-L403
> [4] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/xdp_iphash_to_cpu_user.c#L431-L441
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2019-08-23 11:29           ` Toke Høiland-Jørgensen
@ 2019-08-28 20:40             ` Andrii Nakryiko
  2020-02-03  7:29               ` Andrii Nakryiko
  0 siblings, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2019-08-28 20:40 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, Stephen Hemminger, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, Networking, bpf

On Fri, Aug 23, 2019 at 4:29 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> [ ... snip ...]
>
> > E.g., today's API is essentially three steps:
> >
> > 1. open and parse ELF: collect relos, programs, map definitions
> > 2. load: create maps from collected defs, do program/global data/CO-RE
> > relocs, load and verify BPF programs
> > 3. attach programs one by one.
> >
> > Between step 1 and 2 user has flexibility to create more maps, set up
> > map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
> > tail call maps, etc. That's already pretty flexible. But we can tune
> > and break apart those steps even further, if necessary.
>
> Today, steps 1 and 2 can be collapsed into a single call to
> bpf_prog_load_xattr(). As Jesper's mail explains, for XDP we don't
> generally want to do all the fancy rewriting stuff, we just want a
> simple way to load a program and get reusable pinning of maps.

I agree. See my response to Jesper's message. Note also my view of
bpf_prog_load_xattr() existence.

> Preferably in a way that is compatible with the iproute2 loader.
>
> So I really think we need two things:
>
> (1) a flexible API that splits up all the various steps in a way that
>     allows programs to inject their own map definitions before
>     relocations and loading
>
> (2) a simple convenience wrapper that loads an object file, does
>     something sensible with pinning and map-in-map definitions, and loads
>     everything into the kernel.

I agree. I think this wrapper is bpf_object__open + bpf_object__load
(bpf_prog_load_xattr will do as well, if you don't need to do anything
between open and load). I think pinning is simple to add in minimal
form and is pretty non-controversial (there is some ambiguity as to
how to handle merging of prog array maps, or maybe not just prog array
maps, but that can be controlled later through extra flags/attributes,
so I'd start with something sensible as a default behavior).

>
> I'd go so far as to say that (2) should even support system-wide
> configuration, similar to the /etc/iproute2/bpf_pinning file. E.g., an
> /etc/libbpf/pinning.conf file that sets the default pinning directory,
> and makes it possible to set up pin-value-to-subdir mappings like what
> iproute2 does today.

This I'm a bit hesitant about. It feels like it's not library's job to
read some system-wide configs modifying its behavior. We have all
those _xattr methods, which allow to override sensible defaults, I'd
try to go as far as possible with just that before doing
libbpf-specific /etc configs.

>
> Having (2) makes it more likely that all the different custom loaders
> will be compatible with each other, while still allowing people to do
> their own custom thing with (1). And of course, (2) could be implemented
> in terms of (1) internally in libbpf.
>
> In my ideal world, (2) would just use the definition format already in
> iproute2 (this is basically what I implemented already), but if you guys
> don't want to put this into libbpf, I can probably live with the default

I want to avoid having legacy-at-the-time-it-was-added code in libbpf
that we'd need to support for a long time, that solves only iproute2
cases, which is why I'm pushing back. With BTF we can support same
functionality in better form, which is what I want to prioritize and
which will be beneficial to the whole BPF ecosystem.

But I also want to make libbpf useful to iproute2 and other custom
loaders that have to support existing formats, and thus my proposal to
have libbpf provide granular enough APIs to augment default format in
non-intrusive way. Should this be callback-based or not is secondary,
though important to API design, concern.

> format being BTF-based instead. Which would mean that iproute2 I would
> end up with a flow like this:
>
> - When given an elf file, try to run it through the "standard loader"
>   (2). If this works, great, proceed to program attach.
>
> - If using (2) fails because it doesn't understand the map definition,
>   fall back to a compatibility loader that parses the legacy iproute2
>   map definition format and uses (1) to load that.
>
>
> Does the above make sense? :)

It does, yes. Also, with BTF enabled it should be easy to distinguish
between those two (e.g., was bpf_elf_map type used? if yes, then it's
a compatibility format) and not do extra work.

>
> -Toke

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2019-08-28 20:40             ` Andrii Nakryiko
@ 2020-02-03  7:29               ` Andrii Nakryiko
  2020-02-03 19:34                 ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2020-02-03  7:29 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, Stephen Hemminger, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, Networking, bpf

On Wed, Aug 28, 2019 at 1:40 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Aug 23, 2019 at 4:29 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > [ ... snip ...]
> >
> > > E.g., today's API is essentially three steps:
> > >
> > > 1. open and parse ELF: collect relos, programs, map definitions
> > > 2. load: create maps from collected defs, do program/global data/CO-RE
> > > relocs, load and verify BPF programs
> > > 3. attach programs one by one.
> > >
> > > Between step 1 and 2 user has flexibility to create more maps, set up
> > > map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
> > > tail call maps, etc. That's already pretty flexible. But we can tune
> > > and break apart those steps even further, if necessary.
> >
> > Today, steps 1 and 2 can be collapsed into a single call to
> > bpf_prog_load_xattr(). As Jesper's mail explains, for XDP we don't
> > generally want to do all the fancy rewriting stuff, we just want a
> > simple way to load a program and get reusable pinning of maps.
>
> I agree. See my response to Jesper's message. Note also my view of
> bpf_prog_load_xattr() existence.
>
> > Preferably in a way that is compatible with the iproute2 loader.
> >

Hi Toke,

I was wondering what's the state of converting iproute2 to use libbpf?
Is this still something you (or someone else) interested to do?

Briefly re-reading the thread, I think libbpf already has almost
everything to be used by iproute2. You've added map pinning, so with
bpf_map__set_pin_path() iproute2 should be able to specify pinning
path, according to its own logic. The only thing missing that I can
see is ability to specify numa_node, which we should add both to
BTF-defined map definitions (trivial change), as well as probably
expose a method like bpf_map__set_numa_node(struct bpf_map *map, int
numa_node) for non-declarative and non-BTF legacy cases.

There was concern about supporting "extended" bpf_map_def format of
iproute2 (bpf_elf_map, actually) with extra fields. I think it's
actually easy to handle as is without any extra new APIs.
bpf_object__open() w/ .relaxed_maps = true option will process
compatible 5 fields of bpf_map_def (type, key/value sizes,
max_entries, and map_flags) and will set up corresponding struct
bpf_map entries (but won't create BPF maps in kernel yet). Then
iproute2 can iterate over "maps" ELF section on its own, and see which
maps need to get some more adjustments before load phase: map-in-map
set up, numa node, pinning, etc. All those adjustments can be done
(except for numa yet) through existing libbpf APIs, as far as I can
tell. Once that is taken care of, proceed to bpf_object__load() and
other standard steps. No callbacks, no extra cruft.

Is there anything else that can block iproute2 conversion to libbpf?

BTW, I have a draft patches for declarative (BTF-based) map-in-map set
up and initialization the way I described it at Plumbers last year. So
while I'm finalizing that, thought I'll resurrect iproute2 thread and
see if we can get iproute2 migration to libbpf started.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2020-02-03  7:29               ` Andrii Nakryiko
@ 2020-02-03 19:34                 ` Toke Høiland-Jørgensen
  2020-02-04  0:56                   ` Andrii Nakryiko
  0 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-02-03 19:34 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Daniel Borkmann, Stephen Hemminger, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, Networking, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Wed, Aug 28, 2019 at 1:40 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
>>
>> On Fri, Aug 23, 2019 at 4:29 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >
>> > [ ... snip ...]
>> >
>> > > E.g., today's API is essentially three steps:
>> > >
>> > > 1. open and parse ELF: collect relos, programs, map definitions
>> > > 2. load: create maps from collected defs, do program/global data/CO-RE
>> > > relocs, load and verify BPF programs
>> > > 3. attach programs one by one.
>> > >
>> > > Between step 1 and 2 user has flexibility to create more maps, set up
>> > > map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
>> > > tail call maps, etc. That's already pretty flexible. But we can tune
>> > > and break apart those steps even further, if necessary.
>> >
>> > Today, steps 1 and 2 can be collapsed into a single call to
>> > bpf_prog_load_xattr(). As Jesper's mail explains, for XDP we don't
>> > generally want to do all the fancy rewriting stuff, we just want a
>> > simple way to load a program and get reusable pinning of maps.
>>
>> I agree. See my response to Jesper's message. Note also my view of
>> bpf_prog_load_xattr() existence.
>>
>> > Preferably in a way that is compatible with the iproute2 loader.
>> >
>
> Hi Toke,
>
> I was wondering what's the state of converting iproute2 to use libbpf?
> Is this still something you (or someone else) interested to do?

Yeah, it's still on my list; planning to circle back to it once I have
finished an RFC implementation for XDP multiprog loading based on the
new function-replacing in the kernel.

(Not that this should keep anyone else from giving the conversion a go
and beating me to it :)).

> Briefly re-reading the thread, I think libbpf already has almost
> everything to be used by iproute2. You've added map pinning, so with
> bpf_map__set_pin_path() iproute2 should be able to specify pinning
> path, according to its own logic. The only thing missing that I can
> see is ability to specify numa_node, which we should add both to
> BTF-defined map definitions (trivial change), as well as probably
> expose a method like bpf_map__set_numa_node(struct bpf_map *map, int
> numa_node) for non-declarative and non-BTF legacy cases.

Yes, adding this to libbpf would be good.

> There was concern about supporting "extended" bpf_map_def format of
> iproute2 (bpf_elf_map, actually) with extra fields. I think it's
> actually easy to handle as is without any extra new APIs.
> bpf_object__open() w/ .relaxed_maps = true option will process
> compatible 5 fields of bpf_map_def (type, key/value sizes,
> max_entries, and map_flags) and will set up corresponding struct
> bpf_map entries (but won't create BPF maps in kernel yet). Then
> iproute2 can iterate over "maps" ELF section on its own, and see which
> maps need to get some more adjustments before load phase: map-in-map
> set up, numa node, pinning, etc. All those adjustments can be done
> (except for numa yet) through existing libbpf APIs, as far as I can
> tell. Once that is taken care of, proceed to bpf_object__load() and
> other standard steps. No callbacks, no extra cruft.
>
> Is there anything else that can block iproute2 conversion to libbpf?

I haven't looked into the details since my last RFC conversion series,
but from what I recall from that, and what we've been changing in libbpf
since, I was basically planning to do what you explained. So while there
are some details to work out, I believe it's basically straight forward,
and I can't think of anything that should block it.

> BTW, I have a draft patches for declarative (BTF-based) map-in-map set
> up and initialization the way I described it at Plumbers last year. So
> while I'm finalizing that, thought I'll resurrect iproute2 thread and
> see if we can get iproute2 migration to libbpf started.

Great! FWIW, as long as we have the legacy compatibility code in
iproute2, I don't think it'll be a problem (or a blocker for the
conversion) if BTF-defined maps can't express all the same things as the
legacy maps. The missing bits will come automatically as libbpf is
updated. But great to hear that you're working on this :)

-Toke


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2020-02-03 19:34                 ` Toke Høiland-Jørgensen
@ 2020-02-04  0:56                   ` Andrii Nakryiko
  2020-02-04  1:46                     ` David Ahern
  2020-02-04  8:27                     ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 46+ messages in thread
From: Andrii Nakryiko @ 2020-02-04  0:56 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, Stephen Hemminger, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, Networking, bpf

On Mon, Feb 3, 2020 at 11:34 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Wed, Aug 28, 2019 at 1:40 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> >>
> >> On Fri, Aug 23, 2019 at 4:29 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >
> >> > [ ... snip ...]
> >> >
> >> > > E.g., today's API is essentially three steps:
> >> > >
> >> > > 1. open and parse ELF: collect relos, programs, map definitions
> >> > > 2. load: create maps from collected defs, do program/global data/CO-RE
> >> > > relocs, load and verify BPF programs
> >> > > 3. attach programs one by one.
> >> > >
> >> > > Between step 1 and 2 user has flexibility to create more maps, set up
> >> > > map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
> >> > > tail call maps, etc. That's already pretty flexible. But we can tune
> >> > > and break apart those steps even further, if necessary.
> >> >
> >> > Today, steps 1 and 2 can be collapsed into a single call to
> >> > bpf_prog_load_xattr(). As Jesper's mail explains, for XDP we don't
> >> > generally want to do all the fancy rewriting stuff, we just want a
> >> > simple way to load a program and get reusable pinning of maps.
> >>
> >> I agree. See my response to Jesper's message. Note also my view of
> >> bpf_prog_load_xattr() existence.
> >>
> >> > Preferably in a way that is compatible with the iproute2 loader.
> >> >
> >
> > Hi Toke,
> >
> > I was wondering what's the state of converting iproute2 to use libbpf?
> > Is this still something you (or someone else) interested to do?
>
> Yeah, it's still on my list; planning to circle back to it once I have
> finished an RFC implementation for XDP multiprog loading based on the
> new function-replacing in the kernel.
>
> (Not that this should keep anyone else from giving the conversion a go
> and beating me to it :)).
>
> > Briefly re-reading the thread, I think libbpf already has almost
> > everything to be used by iproute2. You've added map pinning, so with
> > bpf_map__set_pin_path() iproute2 should be able to specify pinning
> > path, according to its own logic. The only thing missing that I can
> > see is ability to specify numa_node, which we should add both to
> > BTF-defined map definitions (trivial change), as well as probably
> > expose a method like bpf_map__set_numa_node(struct bpf_map *map, int
> > numa_node) for non-declarative and non-BTF legacy cases.
>
> Yes, adding this to libbpf would be good.
>
> > There was concern about supporting "extended" bpf_map_def format of
> > iproute2 (bpf_elf_map, actually) with extra fields. I think it's
> > actually easy to handle as is without any extra new APIs.
> > bpf_object__open() w/ .relaxed_maps = true option will process
> > compatible 5 fields of bpf_map_def (type, key/value sizes,
> > max_entries, and map_flags) and will set up corresponding struct
> > bpf_map entries (but won't create BPF maps in kernel yet). Then
> > iproute2 can iterate over "maps" ELF section on its own, and see which
> > maps need to get some more adjustments before load phase: map-in-map
> > set up, numa node, pinning, etc. All those adjustments can be done
> > (except for numa yet) through existing libbpf APIs, as far as I can
> > tell. Once that is taken care of, proceed to bpf_object__load() and
> > other standard steps. No callbacks, no extra cruft.
> >
> > Is there anything else that can block iproute2 conversion to libbpf?
>
> I haven't looked into the details since my last RFC conversion series,
> but from what I recall from that, and what we've been changing in libbpf
> since, I was basically planning to do what you explained. So while there
> are some details to work out, I believe it's basically straight forward,
> and I can't think of anything that should block it.
>

Great! Just to disambiguate and make sure we are in agreement, my hope
here is that iproute2 can completely delegate to libbpf all the ELF
parsing, map creation, program loading, etc (including all the new
stuff like global variables, etc). And only for legacy maps in
SEC("maps"), it would have to parse that *single* ELF section (again,
on its own) and see if there are any extra features of struct
bpf_elf_map requested (i.e., numa, map-in-map, pinning), and if yes,
it would use programmatic libbpf APIs to set this up. It might need to
do additional BPF_PROG_ARRAY set up after BPF programs are loaded
(because iproute2 has its custom naming-based convention). But
hopefully we'll encourage people to gradually migrate to BTF-defined
maps with declarative ways of doing all that.

> > BTW, I have a draft patches for declarative (BTF-based) map-in-map set
> > up and initialization the way I described it at Plumbers last year. So
> > while I'm finalizing that, thought I'll resurrect iproute2 thread and
> > see if we can get iproute2 migration to libbpf started.
>
> Great! FWIW, as long as we have the legacy compatibility code in
> iproute2, I don't think it'll be a problem (or a blocker for the
> conversion) if BTF-defined maps can't express all the same things as the
> legacy maps. The missing bits will come automatically as libbpf is
> updated. But great to hear that you're working on this :)
>
> -Toke
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2020-02-04  0:56                   ` Andrii Nakryiko
@ 2020-02-04  1:46                     ` David Ahern
  2020-02-04  3:41                       ` Andrii Nakryiko
  2020-02-04  8:27                     ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 46+ messages in thread
From: David Ahern @ 2020-02-04  1:46 UTC (permalink / raw)
  To: Andrii Nakryiko, Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, Stephen Hemminger, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, Networking, bpf

On 2/3/20 5:56 PM, Andrii Nakryiko wrote:
> Great! Just to disambiguate and make sure we are in agreement, my hope
> here is that iproute2 can completely delegate to libbpf all the ELF
>

iproute2 needs to compile and continue working as is when libbpf is not
available. e.g., add check in configure to define HAVE_LIBBPF and move
the existing code and move under else branch.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2020-02-04  1:46                     ` David Ahern
@ 2020-02-04  3:41                       ` Andrii Nakryiko
  2020-02-04  4:52                         ` David Ahern
  0 siblings, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2020-02-04  3:41 UTC (permalink / raw)
  To: David Ahern
  Cc: Toke Høiland-Jørgensen, Daniel Borkmann,
	Stephen Hemminger, Alexei Starovoitov, Martin KaFai Lau,
	Song Liu, Yonghong Song, David Miller, Jesper Dangaard Brouer,
	Networking, bpf

On Mon, Feb 3, 2020 at 5:46 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 2/3/20 5:56 PM, Andrii Nakryiko wrote:
> > Great! Just to disambiguate and make sure we are in agreement, my hope
> > here is that iproute2 can completely delegate to libbpf all the ELF
> >
>
> iproute2 needs to compile and continue working as is when libbpf is not
> available. e.g., add check in configure to define HAVE_LIBBPF and move
> the existing code and move under else branch.

Wouldn't it be better to statically compile against libbpf in this
case and get rid a lot of BPF-related code and simplify the rest of
it? This can be easily done by using libbpf through submodule, the
same way as BCC and pahole do it.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2020-02-04  3:41                       ` Andrii Nakryiko
@ 2020-02-04  4:52                         ` David Ahern
  2020-02-04  5:00                           ` Andrii Nakryiko
  0 siblings, 1 reply; 46+ messages in thread
From: David Ahern @ 2020-02-04  4:52 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Toke Høiland-Jørgensen, Daniel Borkmann,
	Stephen Hemminger, Alexei Starovoitov, Martin KaFai Lau,
	Song Liu, Yonghong Song, David Miller, Jesper Dangaard Brouer,
	Networking, bpf

On 2/3/20 8:41 PM, Andrii Nakryiko wrote:
> On Mon, Feb 3, 2020 at 5:46 PM David Ahern <dsahern@gmail.com> wrote:
>>
>> On 2/3/20 5:56 PM, Andrii Nakryiko wrote:
>>> Great! Just to disambiguate and make sure we are in agreement, my hope
>>> here is that iproute2 can completely delegate to libbpf all the ELF
>>>
>>
>> iproute2 needs to compile and continue working as is when libbpf is not
>> available. e.g., add check in configure to define HAVE_LIBBPF and move
>> the existing code and move under else branch.
> 
> Wouldn't it be better to statically compile against libbpf in this
> case and get rid a lot of BPF-related code and simplify the rest of
> it? This can be easily done by using libbpf through submodule, the
> same way as BCC and pahole do it.
> 

iproute2 compiles today and runs on older distributions and older
distributions with newer kernels. That needs to hold true after the move
to libbpf.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2020-02-04  4:52                         ` David Ahern
@ 2020-02-04  5:00                           ` Andrii Nakryiko
  2020-02-04  8:25                             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2020-02-04  5:00 UTC (permalink / raw)
  To: David Ahern
  Cc: Toke Høiland-Jørgensen, Daniel Borkmann,
	Stephen Hemminger, Alexei Starovoitov, Martin KaFai Lau,
	Song Liu, Yonghong Song, David Miller, Jesper Dangaard Brouer,
	Networking, bpf

On Mon, Feb 3, 2020 at 8:53 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 2/3/20 8:41 PM, Andrii Nakryiko wrote:
> > On Mon, Feb 3, 2020 at 5:46 PM David Ahern <dsahern@gmail.com> wrote:
> >>
> >> On 2/3/20 5:56 PM, Andrii Nakryiko wrote:
> >>> Great! Just to disambiguate and make sure we are in agreement, my hope
> >>> here is that iproute2 can completely delegate to libbpf all the ELF
> >>>
> >>
> >> iproute2 needs to compile and continue working as is when libbpf is not
> >> available. e.g., add check in configure to define HAVE_LIBBPF and move
> >> the existing code and move under else branch.
> >
> > Wouldn't it be better to statically compile against libbpf in this
> > case and get rid a lot of BPF-related code and simplify the rest of
> > it? This can be easily done by using libbpf through submodule, the
> > same way as BCC and pahole do it.
> >
>
> iproute2 compiles today and runs on older distributions and older
> distributions with newer kernels. That needs to hold true after the move
> to libbpf.

And by statically compiling against libbpf, checked out as a
submodule, that will still hold true, wouldn't it? Or there is some
complications I'm missing? Libbpf is designed to handle old kernels
with no problems.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2020-02-04  5:00                           ` Andrii Nakryiko
@ 2020-02-04  8:25                             ` Toke Høiland-Jørgensen
  2020-02-04 18:47                               ` Andrii Nakryiko
  0 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-02-04  8:25 UTC (permalink / raw)
  To: Andrii Nakryiko, David Ahern
  Cc: Daniel Borkmann, Stephen Hemminger, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, Networking, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Mon, Feb 3, 2020 at 8:53 PM David Ahern <dsahern@gmail.com> wrote:
>>
>> On 2/3/20 8:41 PM, Andrii Nakryiko wrote:
>> > On Mon, Feb 3, 2020 at 5:46 PM David Ahern <dsahern@gmail.com> wrote:
>> >>
>> >> On 2/3/20 5:56 PM, Andrii Nakryiko wrote:
>> >>> Great! Just to disambiguate and make sure we are in agreement, my hope
>> >>> here is that iproute2 can completely delegate to libbpf all the ELF
>> >>>
>> >>
>> >> iproute2 needs to compile and continue working as is when libbpf is not
>> >> available. e.g., add check in configure to define HAVE_LIBBPF and move
>> >> the existing code and move under else branch.
>> >
>> > Wouldn't it be better to statically compile against libbpf in this
>> > case and get rid a lot of BPF-related code and simplify the rest of
>> > it? This can be easily done by using libbpf through submodule, the
>> > same way as BCC and pahole do it.
>> >
>>
>> iproute2 compiles today and runs on older distributions and older
>> distributions with newer kernels. That needs to hold true after the move
>> to libbpf.
>
> And by statically compiling against libbpf, checked out as a
> submodule, that will still hold true, wouldn't it? Or there is some
> complications I'm missing? Libbpf is designed to handle old kernels
> with no problems.

My plan was to use the same configure test I'm using for xdp-tools
(where I in turn copied the structure of the configure script from
iproute2):

https://github.com/xdp-project/xdp-tools/blob/master/configure#L59

This will look for a system libbpf install and compile against it if it
is compatible, and otherwise fall back to a statically linking against a
git submodule.

We'll need to double-check that this will work on everything currently
supported by iproute2, and fix libbpf if there are any issues with that.
Not that I foresee any, but you never know :)

-Toke


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2020-02-04  0:56                   ` Andrii Nakryiko
  2020-02-04  1:46                     ` David Ahern
@ 2020-02-04  8:27                     ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-02-04  8:27 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Daniel Borkmann, Stephen Hemminger, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, Networking, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Mon, Feb 3, 2020 at 11:34 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Wed, Aug 28, 2019 at 1:40 PM Andrii Nakryiko
>> > <andrii.nakryiko@gmail.com> wrote:
>> >>
>> >> On Fri, Aug 23, 2019 at 4:29 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >> >
>> >> > [ ... snip ...]
>> >> >
>> >> > > E.g., today's API is essentially three steps:
>> >> > >
>> >> > > 1. open and parse ELF: collect relos, programs, map definitions
>> >> > > 2. load: create maps from collected defs, do program/global data/CO-RE
>> >> > > relocs, load and verify BPF programs
>> >> > > 3. attach programs one by one.
>> >> > >
>> >> > > Between step 1 and 2 user has flexibility to create more maps, set up
>> >> > > map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
>> >> > > tail call maps, etc. That's already pretty flexible. But we can tune
>> >> > > and break apart those steps even further, if necessary.
>> >> >
>> >> > Today, steps 1 and 2 can be collapsed into a single call to
>> >> > bpf_prog_load_xattr(). As Jesper's mail explains, for XDP we don't
>> >> > generally want to do all the fancy rewriting stuff, we just want a
>> >> > simple way to load a program and get reusable pinning of maps.
>> >>
>> >> I agree. See my response to Jesper's message. Note also my view of
>> >> bpf_prog_load_xattr() existence.
>> >>
>> >> > Preferably in a way that is compatible with the iproute2 loader.
>> >> >
>> >
>> > Hi Toke,
>> >
>> > I was wondering what's the state of converting iproute2 to use libbpf?
>> > Is this still something you (or someone else) interested to do?
>>
>> Yeah, it's still on my list; planning to circle back to it once I have
>> finished an RFC implementation for XDP multiprog loading based on the
>> new function-replacing in the kernel.
>>
>> (Not that this should keep anyone else from giving the conversion a go
>> and beating me to it :)).
>>
>> > Briefly re-reading the thread, I think libbpf already has almost
>> > everything to be used by iproute2. You've added map pinning, so with
>> > bpf_map__set_pin_path() iproute2 should be able to specify pinning
>> > path, according to its own logic. The only thing missing that I can
>> > see is ability to specify numa_node, which we should add both to
>> > BTF-defined map definitions (trivial change), as well as probably
>> > expose a method like bpf_map__set_numa_node(struct bpf_map *map, int
>> > numa_node) for non-declarative and non-BTF legacy cases.
>>
>> Yes, adding this to libbpf would be good.
>>
>> > There was concern about supporting "extended" bpf_map_def format of
>> > iproute2 (bpf_elf_map, actually) with extra fields. I think it's
>> > actually easy to handle as is without any extra new APIs.
>> > bpf_object__open() w/ .relaxed_maps = true option will process
>> > compatible 5 fields of bpf_map_def (type, key/value sizes,
>> > max_entries, and map_flags) and will set up corresponding struct
>> > bpf_map entries (but won't create BPF maps in kernel yet). Then
>> > iproute2 can iterate over "maps" ELF section on its own, and see which
>> > maps need to get some more adjustments before load phase: map-in-map
>> > set up, numa node, pinning, etc. All those adjustments can be done
>> > (except for numa yet) through existing libbpf APIs, as far as I can
>> > tell. Once that is taken care of, proceed to bpf_object__load() and
>> > other standard steps. No callbacks, no extra cruft.
>> >
>> > Is there anything else that can block iproute2 conversion to libbpf?
>>
>> I haven't looked into the details since my last RFC conversion series,
>> but from what I recall from that, and what we've been changing in libbpf
>> since, I was basically planning to do what you explained. So while there
>> are some details to work out, I believe it's basically straight forward,
>> and I can't think of anything that should block it.
>>
>
> Great! Just to disambiguate and make sure we are in agreement, my hope
> here is that iproute2 can completely delegate to libbpf all the ELF
> parsing, map creation, program loading, etc (including all the new
> stuff like global variables, etc). And only for legacy maps in
> SEC("maps"), it would have to parse that *single* ELF section (again,
> on its own) and see if there are any extra features of struct
> bpf_elf_map requested (i.e., numa, map-in-map, pinning), and if yes,
> it would use programmatic libbpf APIs to set this up. It might need to
> do additional BPF_PROG_ARRAY set up after BPF programs are loaded
> (because iproute2 has its custom naming-based convention). But
> hopefully we'll encourage people to gradually migrate to BTF-defined
> maps with declarative ways of doing all that.

Yup, that is my hope as well. Let's see how it goes :)

-Toke


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2020-02-04  8:25                             ` Toke Høiland-Jørgensen
@ 2020-02-04 18:47                               ` Andrii Nakryiko
  2020-02-04 19:19                                 ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2020-02-04 18:47 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: David Ahern, Daniel Borkmann, Stephen Hemminger,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	David Miller, Jesper Dangaard Brouer, Networking, bpf

On Tue, Feb 4, 2020 at 12:25 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Mon, Feb 3, 2020 at 8:53 PM David Ahern <dsahern@gmail.com> wrote:
> >>
> >> On 2/3/20 8:41 PM, Andrii Nakryiko wrote:
> >> > On Mon, Feb 3, 2020 at 5:46 PM David Ahern <dsahern@gmail.com> wrote:
> >> >>
> >> >> On 2/3/20 5:56 PM, Andrii Nakryiko wrote:
> >> >>> Great! Just to disambiguate and make sure we are in agreement, my hope
> >> >>> here is that iproute2 can completely delegate to libbpf all the ELF
> >> >>>
> >> >>
> >> >> iproute2 needs to compile and continue working as is when libbpf is not
> >> >> available. e.g., add check in configure to define HAVE_LIBBPF and move
> >> >> the existing code and move under else branch.
> >> >
> >> > Wouldn't it be better to statically compile against libbpf in this
> >> > case and get rid a lot of BPF-related code and simplify the rest of
> >> > it? This can be easily done by using libbpf through submodule, the
> >> > same way as BCC and pahole do it.
> >> >
> >>
> >> iproute2 compiles today and runs on older distributions and older
> >> distributions with newer kernels. That needs to hold true after the move
> >> to libbpf.
> >
> > And by statically compiling against libbpf, checked out as a
> > submodule, that will still hold true, wouldn't it? Or there is some
> > complications I'm missing? Libbpf is designed to handle old kernels
> > with no problems.
>
> My plan was to use the same configure test I'm using for xdp-tools
> (where I in turn copied the structure of the configure script from
> iproute2):
>
> https://github.com/xdp-project/xdp-tools/blob/master/configure#L59
>
> This will look for a system libbpf install and compile against it if it
> is compatible, and otherwise fall back to a statically linking against a
> git submodule.

How will this work when build host has libbpf installed, but target
host doesn't? You'll get dynamic linker error when trying to run that
tool.

If the goal is to have a reliable tool working everywhere, and you
already support having libbpf as a submodule, why not always use
submodule's libbpf? What's the concern? Libbpf is a small library, I
don't think a binary size argument is enough reason to not do this. On
the other hand, by using libbpf from submodule, your tool is built
*and tested* with a well-known libbpf version that tool-producer
controls.

>
> We'll need to double-check that this will work on everything currently
> supported by iproute2, and fix libbpf if there are any issues with that.
> Not that I foresee any, but you never know :)
>
> -Toke
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2020-02-04 18:47                               ` Andrii Nakryiko
@ 2020-02-04 19:19                                 ` Toke Høiland-Jørgensen
  2020-02-04 19:29                                   ` Andrii Nakryiko
  0 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-02-04 19:19 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: David Ahern, Daniel Borkmann, Stephen Hemminger,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	David Miller, Jesper Dangaard Brouer, Networking, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Tue, Feb 4, 2020 at 12:25 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Mon, Feb 3, 2020 at 8:53 PM David Ahern <dsahern@gmail.com> wrote:
>> >>
>> >> On 2/3/20 8:41 PM, Andrii Nakryiko wrote:
>> >> > On Mon, Feb 3, 2020 at 5:46 PM David Ahern <dsahern@gmail.com> wrote:
>> >> >>
>> >> >> On 2/3/20 5:56 PM, Andrii Nakryiko wrote:
>> >> >>> Great! Just to disambiguate and make sure we are in agreement, my hope
>> >> >>> here is that iproute2 can completely delegate to libbpf all the ELF
>> >> >>>
>> >> >>
>> >> >> iproute2 needs to compile and continue working as is when libbpf is not
>> >> >> available. e.g., add check in configure to define HAVE_LIBBPF and move
>> >> >> the existing code and move under else branch.
>> >> >
>> >> > Wouldn't it be better to statically compile against libbpf in this
>> >> > case and get rid a lot of BPF-related code and simplify the rest of
>> >> > it? This can be easily done by using libbpf through submodule, the
>> >> > same way as BCC and pahole do it.
>> >> >
>> >>
>> >> iproute2 compiles today and runs on older distributions and older
>> >> distributions with newer kernels. That needs to hold true after the move
>> >> to libbpf.
>> >
>> > And by statically compiling against libbpf, checked out as a
>> > submodule, that will still hold true, wouldn't it? Or there is some
>> > complications I'm missing? Libbpf is designed to handle old kernels
>> > with no problems.
>>
>> My plan was to use the same configure test I'm using for xdp-tools
>> (where I in turn copied the structure of the configure script from
>> iproute2):
>>
>> https://github.com/xdp-project/xdp-tools/blob/master/configure#L59
>>
>> This will look for a system libbpf install and compile against it if it
>> is compatible, and otherwise fall back to a statically linking against a
>> git submodule.
>
> How will this work when build host has libbpf installed, but target
> host doesn't? You'll get dynamic linker error when trying to run that
> tool.

That's called dependency tracking; distros have various ways of going
about that :)

But yeah, if you're going to do you own cross-compilation, you'd
probably want to just force using the static library.

> If the goal is to have a reliable tool working everywhere, and you
> already support having libbpf as a submodule, why not always use
> submodule's libbpf? What's the concern? Libbpf is a small library, I
> don't think a binary size argument is enough reason to not do this. On
> the other hand, by using libbpf from submodule, your tool is built
> *and tested* with a well-known libbpf version that tool-producer
> controls.

I thought we already had this discussion? :)

libbpf is a library like any other. Distros that package the library
want the tools that use it to be dynamically linked against it so
library upgrades (especially of the CVE-fixing kind) get picked up by
all users. Other distros have memory and space constraints (iproute2 is
shipped on OpenWrt, for instance, which is *extremely*
space-constrained). And yeah, other deployments don't care and will just
statically compile in the vendored version. So we'll need to support all
of those use cases.

-Toke


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2020-02-04 19:19                                 ` Toke Høiland-Jørgensen
@ 2020-02-04 19:29                                   ` Andrii Nakryiko
  2020-02-04 21:56                                     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2020-02-04 19:29 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: David Ahern, Daniel Borkmann, Stephen Hemminger,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	David Miller, Jesper Dangaard Brouer, Networking, bpf

On Tue, Feb 4, 2020 at 11:19 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Tue, Feb 4, 2020 at 12:25 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >>
> >> > On Mon, Feb 3, 2020 at 8:53 PM David Ahern <dsahern@gmail.com> wrote:
> >> >>
> >> >> On 2/3/20 8:41 PM, Andrii Nakryiko wrote:
> >> >> > On Mon, Feb 3, 2020 at 5:46 PM David Ahern <dsahern@gmail.com> wrote:
> >> >> >>
> >> >> >> On 2/3/20 5:56 PM, Andrii Nakryiko wrote:
> >> >> >>> Great! Just to disambiguate and make sure we are in agreement, my hope
> >> >> >>> here is that iproute2 can completely delegate to libbpf all the ELF
> >> >> >>>
> >> >> >>
> >> >> >> iproute2 needs to compile and continue working as is when libbpf is not
> >> >> >> available. e.g., add check in configure to define HAVE_LIBBPF and move
> >> >> >> the existing code and move under else branch.
> >> >> >
> >> >> > Wouldn't it be better to statically compile against libbpf in this
> >> >> > case and get rid a lot of BPF-related code and simplify the rest of
> >> >> > it? This can be easily done by using libbpf through submodule, the
> >> >> > same way as BCC and pahole do it.
> >> >> >
> >> >>
> >> >> iproute2 compiles today and runs on older distributions and older
> >> >> distributions with newer kernels. That needs to hold true after the move
> >> >> to libbpf.
> >> >
> >> > And by statically compiling against libbpf, checked out as a
> >> > submodule, that will still hold true, wouldn't it? Or there is some
> >> > complications I'm missing? Libbpf is designed to handle old kernels
> >> > with no problems.
> >>
> >> My plan was to use the same configure test I'm using for xdp-tools
> >> (where I in turn copied the structure of the configure script from
> >> iproute2):
> >>
> >> https://github.com/xdp-project/xdp-tools/blob/master/configure#L59
> >>
> >> This will look for a system libbpf install and compile against it if it
> >> is compatible, and otherwise fall back to a statically linking against a
> >> git submodule.
> >
> > How will this work when build host has libbpf installed, but target
> > host doesn't? You'll get dynamic linker error when trying to run that
> > tool.
>
> That's called dependency tracking; distros have various ways of going
> about that :)

I'm confused, honestly. libbpf is either a dependency and thus can be
relied upon to be present in the target system, or it's not and this
whole dance with detecting libbpf presence needs to be performed.

If libbpf is optional, then I don't see how iproute2 BPF-related code
and complexity can be reduced at all, given it should still support
loading BPF programs even without libbpf. Furthermore, given libbpf
supports more features already and will probably be outpacing
iproute2's own BPF support in the future, some users will start
relying on BPF features supported only by libbpf "backend", so
iproute2's own BPF backend will just fail to load such programs,
bringing unpleasant surprises, potentially. So I still fail to see how
libbpf can be optional and what benefit does that bring. But shared or
static - whatever fits iproute2 best, no preferences.

>
> But yeah, if you're going to do you own cross-compilation, you'd
> probably want to just force using the static library.
>
> > If the goal is to have a reliable tool working everywhere, and you
> > already support having libbpf as a submodule, why not always use
> > submodule's libbpf? What's the concern? Libbpf is a small library, I
> > don't think a binary size argument is enough reason to not do this. On
> > the other hand, by using libbpf from submodule, your tool is built
> > *and tested* with a well-known libbpf version that tool-producer
> > controls.
>
> I thought we already had this discussion? :)

Yeah, I guess we did for bpftool :) No worries, I don't care that
much, just see my notes above about optional libbpf and consequences.

>
> libbpf is a library like any other. Distros that package the library
> want the tools that use it to be dynamically linked against it so
> library upgrades (especially of the CVE-fixing kind) get picked up by
> all users. Other distros have memory and space constraints (iproute2 is
> shipped on OpenWrt, for instance, which is *extremely*
> space-constrained). And yeah, other deployments don't care and will just
> statically compile in the vendored version. So we'll need to support all
> of those use cases.
>
> -Toke
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2020-02-04 19:29                                   ` Andrii Nakryiko
@ 2020-02-04 21:56                                     ` Toke Høiland-Jørgensen
  2020-02-04 22:12                                       ` David Ahern
  0 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-02-04 21:56 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: David Ahern, Daniel Borkmann, Stephen Hemminger,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	David Miller, Jesper Dangaard Brouer, Networking, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Tue, Feb 4, 2020 at 11:19 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Tue, Feb 4, 2020 at 12:25 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> >>
>> >> > On Mon, Feb 3, 2020 at 8:53 PM David Ahern <dsahern@gmail.com> wrote:
>> >> >>
>> >> >> On 2/3/20 8:41 PM, Andrii Nakryiko wrote:
>> >> >> > On Mon, Feb 3, 2020 at 5:46 PM David Ahern <dsahern@gmail.com> wrote:
>> >> >> >>
>> >> >> >> On 2/3/20 5:56 PM, Andrii Nakryiko wrote:
>> >> >> >>> Great! Just to disambiguate and make sure we are in agreement, my hope
>> >> >> >>> here is that iproute2 can completely delegate to libbpf all the ELF
>> >> >> >>>
>> >> >> >>
>> >> >> >> iproute2 needs to compile and continue working as is when libbpf is not
>> >> >> >> available. e.g., add check in configure to define HAVE_LIBBPF and move
>> >> >> >> the existing code and move under else branch.
>> >> >> >
>> >> >> > Wouldn't it be better to statically compile against libbpf in this
>> >> >> > case and get rid a lot of BPF-related code and simplify the rest of
>> >> >> > it? This can be easily done by using libbpf through submodule, the
>> >> >> > same way as BCC and pahole do it.
>> >> >> >
>> >> >>
>> >> >> iproute2 compiles today and runs on older distributions and older
>> >> >> distributions with newer kernels. That needs to hold true after the move
>> >> >> to libbpf.
>> >> >
>> >> > And by statically compiling against libbpf, checked out as a
>> >> > submodule, that will still hold true, wouldn't it? Or there is some
>> >> > complications I'm missing? Libbpf is designed to handle old kernels
>> >> > with no problems.
>> >>
>> >> My plan was to use the same configure test I'm using for xdp-tools
>> >> (where I in turn copied the structure of the configure script from
>> >> iproute2):
>> >>
>> >> https://github.com/xdp-project/xdp-tools/blob/master/configure#L59
>> >>
>> >> This will look for a system libbpf install and compile against it if it
>> >> is compatible, and otherwise fall back to a statically linking against a
>> >> git submodule.
>> >
>> > How will this work when build host has libbpf installed, but target
>> > host doesn't? You'll get dynamic linker error when trying to run that
>> > tool.
>>
>> That's called dependency tracking; distros have various ways of going
>> about that :)
>
> I'm confused, honestly. libbpf is either a dependency and thus can be
> relied upon to be present in the target system, or it's not and this
> whole dance with detecting libbpf presence needs to be performed.

Yes, and iproute2 is likely to be built in both sorts of environments,
so we will have to support both :)

> If libbpf is optional, then I don't see how iproute2 BPF-related code
> and complexity can be reduced at all, given it should still support
> loading BPF programs even without libbpf. Furthermore, given libbpf
> supports more features already and will probably be outpacing
> iproute2's own BPF support in the future, some users will start
> relying on BPF features supported only by libbpf "backend", so
> iproute2's own BPF backend will just fail to load such programs,
> bringing unpleasant surprises, potentially. So I still fail to see how
> libbpf can be optional and what benefit does that bring.

I wasn't saying that libbpf itself should be optional; if we're porting
things, we should rip out as much of the old code as we can. I just
meant that we should support both modes of building, so distros that
*do* build libbpf as a library can link iproute2 against that with as
little friction as possible.

I'm dead set on a specific auto-detection semantic either; I guess it'll
be up to the iproute2 maintainers whether they prefer defaulting to one
or the other.

> But shared or static - whatever fits iproute2 best, no preferences.

Right, cool, I think we are basically agreed, given the above :)

-Toke


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2020-02-04 21:56                                     ` Toke Høiland-Jørgensen
@ 2020-02-04 22:12                                       ` David Ahern
  2020-02-04 22:35                                         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: David Ahern @ 2020-02-04 22:12 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Andrii Nakryiko
  Cc: Daniel Borkmann, Stephen Hemminger, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, Networking, bpf

On 2/4/20 2:56 PM, Toke Høiland-Jørgensen wrote:
>> I'm confused, honestly. libbpf is either a dependency and thus can be
>> relied upon to be present in the target system, or it's not and this
>> whole dance with detecting libbpf presence needs to be performed.
> 
> Yes, and iproute2 is likely to be built in both sorts of environments,
> so we will have to support both :)
> 
>> If libbpf is optional, then I don't see how iproute2 BPF-related code
>> and complexity can be reduced at all, given it should still support
>> loading BPF programs even without libbpf. Furthermore, given libbpf
>> supports more features already and will probably be outpacing
>> iproute2's own BPF support in the future, some users will start
>> relying on BPF features supported only by libbpf "backend", so
>> iproute2's own BPF backend will just fail to load such programs,
>> bringing unpleasant surprises, potentially. So I still fail to see how
>> libbpf can be optional and what benefit does that bring.
> 
> I wasn't saying that libbpf itself should be optional; if we're porting
> things, we should rip out as much of the old code as we can. I just
> meant that we should support both modes of building, so distros that
> *do* build libbpf as a library can link iproute2 against that with as
> little friction as possible.
> 
> I'm dead set on a specific auto-detection semantic either; I guess it'll
> be up to the iproute2 maintainers whether they prefer defaulting to one
> or the other.
> 

A few concerns from my perspective:

1. Right now ip comes in around 650k unstripped; libbpf.a for 0.0.7 is
around 1.2M with the size of libbpf.o > than ip. Most likely, making
iproute2 use libbpf statically is going to be challenging and I am not
sure it is the right thing to do (unless the user is building a static
version of iproute2 commands).

2. git submodules can be a PITA to deal with (e.g., jumping between
branches and versions), so there needs to be a good reason for it.

3. iproute2 code needs to build for a wide range of OSes and not lose
functionality compared to what it has today.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2020-02-04 22:12                                       ` David Ahern
@ 2020-02-04 22:35                                         ` Toke Høiland-Jørgensen
  2020-02-04 23:13                                           ` David Ahern
  0 siblings, 1 reply; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-02-04 22:35 UTC (permalink / raw)
  To: David Ahern, Andrii Nakryiko
  Cc: Daniel Borkmann, Stephen Hemminger, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, Networking, bpf

David Ahern <dsahern@gmail.com> writes:

> On 2/4/20 2:56 PM, Toke Høiland-Jørgensen wrote:
>>> I'm confused, honestly. libbpf is either a dependency and thus can be
>>> relied upon to be present in the target system, or it's not and this
>>> whole dance with detecting libbpf presence needs to be performed.
>> 
>> Yes, and iproute2 is likely to be built in both sorts of environments,
>> so we will have to support both :)
>> 
>>> If libbpf is optional, then I don't see how iproute2 BPF-related code
>>> and complexity can be reduced at all, given it should still support
>>> loading BPF programs even without libbpf. Furthermore, given libbpf
>>> supports more features already and will probably be outpacing
>>> iproute2's own BPF support in the future, some users will start
>>> relying on BPF features supported only by libbpf "backend", so
>>> iproute2's own BPF backend will just fail to load such programs,
>>> bringing unpleasant surprises, potentially. So I still fail to see how
>>> libbpf can be optional and what benefit does that bring.
>> 
>> I wasn't saying that libbpf itself should be optional; if we're porting
>> things, we should rip out as much of the old code as we can. I just
>> meant that we should support both modes of building, so distros that
>> *do* build libbpf as a library can link iproute2 against that with as
>> little friction as possible.
>> 
>> I'm dead set on a specific auto-detection semantic either; I guess it'll
>> be up to the iproute2 maintainers whether they prefer defaulting to one
>> or the other.
>> 
>
> A few concerns from my perspective:
>
> 1. Right now ip comes in around 650k unstripped; libbpf.a for 0.0.7 is
> around 1.2M with the size of libbpf.o > than ip.

Hmm, I'm getting ~700k for libbpf.a and libbpf.so.0.0.7 is ~480k (for
whichever kernel I currently have checked out). But lib/bpf.o in
iproute2 is only 80k, so fair point :)

> Most likely, making iproute2 use libbpf statically is going to be
> challenging and I am not sure it is the right thing to do (unless the
> user is building a static version of iproute2 commands).

Linking dynamically would imply a new dependency. I'm not necessarily
against that, but would it be acceptable from your PoV? And if so,
should we keep the current internal BPF code for when libbpf is not
available, or would it be acceptable to not be able to load BPF programs
if libbpf is not present (similar to how the libelf dependency works
today)?

> 2. git submodules can be a PITA to deal with (e.g., jumping between
> branches and versions), so there needs to be a good reason for it.

Yes, totally with you on that. Another option could be to just copy the
files into the iproute2 tree, and update them the same way the kernel
headers are? Or maybe doing fancy things like this:
https://github.com/apenwarr/git-subtrac

> 3. iproute2 code needs to build for a wide range of OSes and not lose
> functionality compared to what it has today.

Could you be a bit more specific about "a wide range of OSes"? I guess
we could do the work to make sure libbpf builds on all the same
platforms iproute2 supports, but we'd need something a bit more definite
to go on...

-Toke


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2020-02-04 22:35                                         ` Toke Høiland-Jørgensen
@ 2020-02-04 23:13                                           ` David Ahern
  2020-02-05 10:37                                             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 46+ messages in thread
From: David Ahern @ 2020-02-04 23:13 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Andrii Nakryiko
  Cc: Daniel Borkmann, Stephen Hemminger, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, Networking, bpf

On 2/4/20 3:35 PM, Toke Høiland-Jørgensen wrote:
> 
>> Most likely, making iproute2 use libbpf statically is going to be
>> challenging and I am not sure it is the right thing to do (unless the
>> user is building a static version of iproute2 commands).
> 
> Linking dynamically would imply a new dependency. I'm not necessarily
> against that, but would it be acceptable from your PoV? And if so,
> should we keep the current internal BPF code for when libbpf is not
> available, or would it be acceptable to not be able to load BPF programs
> if libbpf is not present (similar to how the libelf dependency works
> today)?

iproute2 recently gained the libmnl dependency for extack. Seems like
libbpf falls into the similar category.

> 
>> 2. git submodules can be a PITA to deal with (e.g., jumping between
>> branches and versions), so there needs to be a good reason for it.
> 
> Yes, totally with you on that. Another option could be to just copy the
> files into the iproute2 tree, and update them the same way the kernel
> headers are? Or maybe doing fancy things like this:
> https://github.com/apenwarr/git-subtrac

kernel uapi is a totally different reason to import the headers. bpf
functionality is an add-on.

I would like to see iproute2 work with libbpf. Given libbpf's current
status and availability across OS'es that is going to be a challenge for
a lot of OS'es which is why I suggested the HAVE_LIBBPF check falls back
to existing code if libbpf is not installed.

> 
>> 3. iproute2 code needs to build for a wide range of OSes and not lose
>> functionality compared to what it has today.
> 
> Could you be a bit more specific about "a wide range of OSes"? I guess
> we could do the work to make sure libbpf builds on all the same
> platforms iproute2 supports, but we'd need something a bit more definite
> to go on...
> 

rhel5/centos5? definitely rhel6/centos6 time frame and forward.

Stephen: has the backwards lifetime ever been stated?

Changing configure to check for existence and fall back to existing code
seems to me the safest option.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
  2020-02-04 23:13                                           ` David Ahern
@ 2020-02-05 10:37                                             ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 46+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-02-05 10:37 UTC (permalink / raw)
  To: David Ahern, Andrii Nakryiko
  Cc: Daniel Borkmann, Stephen Hemminger, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, Networking, bpf

David Ahern <dsahern@gmail.com> writes:

> On 2/4/20 3:35 PM, Toke Høiland-Jørgensen wrote:
>> 
>>> Most likely, making iproute2 use libbpf statically is going to be
>>> challenging and I am not sure it is the right thing to do (unless the
>>> user is building a static version of iproute2 commands).
>> 
>> Linking dynamically would imply a new dependency. I'm not necessarily
>> against that, but would it be acceptable from your PoV? And if so,
>> should we keep the current internal BPF code for when libbpf is not
>> available, or would it be acceptable to not be able to load BPF programs
>> if libbpf is not present (similar to how the libelf dependency works
>> today)?
>
> iproute2 recently gained the libmnl dependency for extack. Seems like
> libbpf falls into the similar category.
>
>> 
>>> 2. git submodules can be a PITA to deal with (e.g., jumping between
>>> branches and versions), so there needs to be a good reason for it.
>> 
>> Yes, totally with you on that. Another option could be to just copy the
>> files into the iproute2 tree, and update them the same way the kernel
>> headers are? Or maybe doing fancy things like this:
>> https://github.com/apenwarr/git-subtrac
>
> kernel uapi is a totally different reason to import the headers. bpf
> functionality is an add-on.
>
> I would like to see iproute2 work with libbpf. Given libbpf's current
> status and availability across OS'es that is going to be a challenge for
> a lot of OS'es which is why I suggested the HAVE_LIBBPF check falls back
> to existing code if libbpf is not installed.

Sure, can do.

-Toke


^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2020-02-05 10:37 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-20 11:47 [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP) Toke Høiland-Jørgensen
2019-08-20 11:47 ` [RFC bpf-next 1/5] libbpf: Add map definition struct fields from iproute2 Toke Høiland-Jørgensen
2019-08-20 11:47 ` [RFC bpf-next 2/5] libbpf: Add support for auto-pinning of maps with reuse on program load Toke Høiland-Jørgensen
2019-08-20 11:47 ` [RFC bpf-next 3/5] libbpf: Add support for specifying map pinning path via callback Toke Høiland-Jørgensen
2019-08-20 11:47 ` [RFC bpf-next 4/5] iproute2: Allow compiling against libbpf Toke Høiland-Jørgensen
2019-08-22  8:58   ` Daniel Borkmann
2019-08-22 10:43     ` Toke Høiland-Jørgensen
2019-08-22 11:45       ` Daniel Borkmann
2019-08-22 12:04         ` Toke Høiland-Jørgensen
2019-08-22 12:33           ` Daniel Borkmann
2019-08-22 13:38             ` Toke Høiland-Jørgensen
2019-08-22 13:45               ` Daniel Borkmann
2019-08-22 15:28                 ` Toke Høiland-Jørgensen
2019-08-20 11:47 ` [RFC bpf-next 5/5] iproute2: Support loading XDP programs with libbpf Toke Høiland-Jørgensen
2019-08-21 19:26 ` [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP) Alexei Starovoitov
2019-08-21 21:00   ` Toke Høiland-Jørgensen
2019-08-22  7:52     ` Andrii Nakryiko
2019-08-22 10:38       ` Toke Høiland-Jørgensen
2019-08-21 20:30 ` Andrii Nakryiko
2019-08-21 21:07   ` Toke Høiland-Jørgensen
2019-08-22  7:49     ` Andrii Nakryiko
2019-08-22  8:33       ` Daniel Borkmann
2019-08-22 11:48         ` Toke Høiland-Jørgensen
2019-08-22 11:49           ` Toke Høiland-Jørgensen
2019-08-23  6:31         ` Andrii Nakryiko
2019-08-23 11:29           ` Toke Høiland-Jørgensen
2019-08-28 20:40             ` Andrii Nakryiko
2020-02-03  7:29               ` Andrii Nakryiko
2020-02-03 19:34                 ` Toke Høiland-Jørgensen
2020-02-04  0:56                   ` Andrii Nakryiko
2020-02-04  1:46                     ` David Ahern
2020-02-04  3:41                       ` Andrii Nakryiko
2020-02-04  4:52                         ` David Ahern
2020-02-04  5:00                           ` Andrii Nakryiko
2020-02-04  8:25                             ` Toke Høiland-Jørgensen
2020-02-04 18:47                               ` Andrii Nakryiko
2020-02-04 19:19                                 ` Toke Høiland-Jørgensen
2020-02-04 19:29                                   ` Andrii Nakryiko
2020-02-04 21:56                                     ` Toke Høiland-Jørgensen
2020-02-04 22:12                                       ` David Ahern
2020-02-04 22:35                                         ` Toke Høiland-Jørgensen
2020-02-04 23:13                                           ` David Ahern
2020-02-05 10:37                                             ` Toke Høiland-Jørgensen
2020-02-04  8:27                     ` Toke Høiland-Jørgensen
2019-08-23 10:27   ` Jesper Dangaard Brouer
2019-08-28 20:23     ` Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).