All of lore.kernel.org
* [PATCH iproute2 -next v2 0/5] BPF updates
@ 2015-11-26 14:38 Daniel Borkmann
  2015-11-26 14:38 ` [PATCH iproute2 -next v2 1/5] {f,m}_bpf: make tail calls working Daniel Borkmann
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Daniel Borkmann @ 2015-11-26 14:38 UTC (permalink / raw)
  To: stephen; +Cc: ast, netdev, Daniel Borkmann

Some more updates on the BPF front-end to get further eBPF
functionality working with tc. See individual patches for
details. Targeted at iproute2's -next branch.

Thanks!

v1 -> v2:
 - fix minor stylistic nit spotted by Sergei

Daniel Borkmann (5):
  {f,m}_bpf: make tail calls working
  {f,m}_bpf: check map attributes when fetching as pinned
  {f,m}_bpf: allow for user-defined object pinnings
  {f,m}_bpf: allow updates on program arrays
  {f,m}_bpf: add more example code

 etc/iproute2/bpf_pinning    |   6 +
 examples/bpf/README         |  13 +
 examples/bpf/bpf_cyclic.c   |  32 ++
 examples/bpf/bpf_funcs.h    |  11 +
 examples/bpf/bpf_graft.c    |  70 +++++
 examples/bpf/bpf_tailcall.c | 115 +++++++
 include/bpf_elf.h           |   2 +-
 include/utils.h             |   4 +
 lib/rt_names.c              |   5 +-
 tc/e_bpf.c                  |  30 +-
 tc/tc_bpf.c                 | 708 +++++++++++++++++++++++++++++++++-----------
 tc/tc_bpf.h                 |   1 +
 12 files changed, 819 insertions(+), 178 deletions(-)
 create mode 100644 etc/iproute2/bpf_pinning
 create mode 100644 examples/bpf/README
 create mode 100644 examples/bpf/bpf_cyclic.c
 create mode 100644 examples/bpf/bpf_graft.c
 create mode 100644 examples/bpf/bpf_tailcall.c

-- 
1.9.3

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH iproute2 -next v2 1/5] {f,m}_bpf: make tail calls working
  2015-11-26 14:38 [PATCH iproute2 -next v2 0/5] BPF updates Daniel Borkmann
@ 2015-11-26 14:38 ` Daniel Borkmann
  2015-11-26 14:38 ` [PATCH iproute2 -next v2 2/5] {f,m}_bpf: check map attributes when fetching as pinned Daniel Borkmann
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Daniel Borkmann @ 2015-11-26 14:38 UTC (permalink / raw)
  To: stephen; +Cc: ast, netdev, Daniel Borkmann

Now that maps can be shared, it's time to get the ELF loader fully
working with regard to tail calls. Since program array maps are pinned,
we can finally keep them alive. This patch also fixes two bugs I noticed
in bpf_fill_prog_arrays(). Example code comes as a follow-up.
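For illustration, a standalone sketch (not the iproute2 code itself) of
the section naming this relies on: tail-called programs live in ELF
sections named '<map_id>/<key_id>', which the loader splits with
sscanf("%i/%i", ...) as in the hunk below. The helper name is invented
for the example.

```c
#include <stdio.h>

/* Sketch only: split a section name of the form "<map_id>/<key_id>",
 * mirroring the sscanf("%i/%i", ...) call in bpf_fill_prog_arrays().
 * Returns 0 on success, -1 if the name is not a prog-array reference.
 */
static int parse_prog_array_section(const char *sec_name, int *map_id,
				    int *key_id)
{
	if (sscanf(sec_name, "%i/%i", map_id, key_id) != 2)
		return -1;
	return 0;
}
```

Note that %i, unlike %u, also accepts hex and octal ids, which matches
the conversion change made in the hunk below.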

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 tc/tc_bpf.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/tc/tc_bpf.c b/tc/tc_bpf.c
index bc7bc9f..c3adc23 100644
--- a/tc/tc_bpf.c
+++ b/tc/tc_bpf.c
@@ -1139,11 +1139,22 @@ static int bpf_fetch_prog_sec(struct bpf_elf_ctx *ctx, const char *section)
 	return ret;
 }
 
+static int bpf_find_map_by_id(struct bpf_elf_ctx *ctx, uint32_t id)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(ctx->map_fds); i++)
+		if (ctx->map_fds[i] && ctx->maps[i].id == id &&
+		    ctx->maps[i].type == BPF_MAP_TYPE_PROG_ARRAY)
+			return i;
+	return -1;
+}
+
 static int bpf_fill_prog_arrays(struct bpf_elf_ctx *ctx)
 {
 	struct bpf_elf_sec_data data;
 	uint32_t map_id, key_id;
-	int fd, i, ret;
+	int fd, i, ret, idx;
 
 	for (i = 1; i < ctx->elf_hdr.e_shnum; i++) {
 		if (ctx->sec_done[i])
@@ -1153,20 +1164,20 @@ static int bpf_fill_prog_arrays(struct bpf_elf_ctx *ctx)
 		if (ret < 0)
 			continue;
 
-		ret = sscanf(data.sec_name, "%u/%u", &map_id, &key_id);
-		if (ret != 2 || map_id >= ARRAY_SIZE(ctx->map_fds) ||
-		    !ctx->map_fds[map_id])
+		ret = sscanf(data.sec_name, "%i/%i", &map_id, &key_id);
+		if (ret != 2)
 			continue;
-		if (ctx->maps[map_id].type != BPF_MAP_TYPE_PROG_ARRAY ||
-		    ctx->maps[map_id].max_elem <= key_id)
+
+		idx = bpf_find_map_by_id(ctx, map_id);
+		if (idx < 0)
 			continue;
 
 		fd = bpf_fetch_prog_sec(ctx, data.sec_name);
 		if (fd < 0)
 			return -EIO;
 
-		ret = bpf_map_update(ctx->map_fds[map_id], &key_id,
-				     &fd, BPF_NOEXIST);
+		ret = bpf_map_update(ctx->map_fds[idx], &key_id,
+				     &fd, BPF_ANY);
 		if (ret < 0)
 			return -ENOENT;
 
-- 
1.9.3


* [PATCH iproute2 -next v2 2/5] {f,m}_bpf: check map attributes when fetching as pinned
  2015-11-26 14:38 [PATCH iproute2 -next v2 0/5] BPF updates Daniel Borkmann
  2015-11-26 14:38 ` [PATCH iproute2 -next v2 1/5] {f,m}_bpf: make tail calls working Daniel Borkmann
@ 2015-11-26 14:38 ` Daniel Borkmann
  2015-11-26 14:38 ` [PATCH iproute2 -next v2 3/5] {f,m}_bpf: allow for user-defined object pinnings Daniel Borkmann
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Daniel Borkmann @ 2015-11-26 14:38 UTC (permalink / raw)
  To: stephen; +Cc: ast, netdev, Daniel Borkmann

Make use of the new show_fdinfo() facility and verify that, when a
pinned map is fetched, its basic attributes match those of the map
declared in the ELF file. Maps placed into the global namespace could
otherwise collide. In such a case, warn the user and bail out.
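As a minimal, self-contained sketch of the fdinfo scanning the
self-check below performs (struct and function names are invented for
the example; the real code fills a struct bpf_elf_map and memcmp()s it
against the ELF declaration):

```c
#include <stdio.h>

struct map_attrs {
	unsigned int type, size_key, size_value, max_elem;
};

/* Sketch only: scan one fdinfo-style "key:\tvalue" line, as
 * bpf_map_selfcheck_pinned() does when reading
 * /proc/<pid>/fdinfo/<fd>, and record any map attribute it carries.
 * Unrelated lines are simply ignored.
 */
static void scan_fdinfo_line(const char *line, struct map_attrs *a)
{
	unsigned int val;

	if (sscanf(line, "map_type:\t%u", &val) == 1)
		a->type = val;
	else if (sscanf(line, "key_size:\t%u", &val) == 1)
		a->size_key = val;
	else if (sscanf(line, "value_size:\t%u", &val) == 1)
		a->size_value = val;
	else if (sscanf(line, "max_entries:\t%u", &val) == 1)
		a->max_elem = val;
}
```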

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 tc/tc_bpf.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/tc/tc_bpf.c b/tc/tc_bpf.c
index c3adc23..b44b123 100644
--- a/tc/tc_bpf.c
+++ b/tc/tc_bpf.c
@@ -205,6 +205,52 @@ void bpf_print_ops(FILE *f, struct rtattr *bpf_ops, __u16 len)
 		ops[i].jf, ops[i].k);
 }
 
+static int bpf_map_selfcheck_pinned(int fd, const struct bpf_elf_map *map)
+{
+	char file[PATH_MAX], buff[4096];
+	struct bpf_elf_map tmp, zero;
+	unsigned int val;
+	FILE *fp;
+
+	snprintf(file, sizeof(file), "/proc/%d/fdinfo/%d", getpid(), fd);
+
+	fp = fopen(file, "r");
+	if (!fp) {
+		fprintf(stderr, "No procfs support?!\n");
+		return -EIO;
+	}
+
+	memset(&tmp, 0, sizeof(tmp));
+	while (fgets(buff, sizeof(buff), fp)) {
+		if (sscanf(buff, "map_type:\t%u", &val) == 1)
+			tmp.type = val;
+		else if (sscanf(buff, "key_size:\t%u", &val) == 1)
+			tmp.size_key = val;
+		else if (sscanf(buff, "value_size:\t%u", &val) == 1)
+			tmp.size_value = val;
+		else if (sscanf(buff, "max_entries:\t%u", &val) == 1)
+			tmp.max_elem = val;
+	}
+
+	fclose(fp);
+
+	if (!memcmp(&tmp, map, offsetof(struct bpf_elf_map, id))) {
+		return 0;
+	} else {
+		memset(&zero, 0, sizeof(zero));
+		/* If kernel doesn't have eBPF-related fdinfo, we cannot do much,
+		 * so just accept it. We know we do have an eBPF fd and in this
+		 * case, everything is 0. It is guaranteed that no such map exists
+		 * since map type of 0 is unloadable BPF_MAP_TYPE_UNSPEC.
+		 */
+		if (!memcmp(&tmp, &zero, offsetof(struct bpf_elf_map, id)))
+			return 0;
+
+		fprintf(stderr, "Map specs from pinned file differ!\n");
+		return -EINVAL;
+	}
+}
+
 static int bpf_valid_mntpt(const char *mnt, unsigned long magic)
 {
 	struct statfs st_fs;
@@ -816,6 +862,13 @@ static int bpf_map_attach(const char *name, const struct bpf_elf_map *map,
 
 	fd = bpf_probe_pinned(name, map->pinning);
 	if (fd > 0) {
+		ret = bpf_map_selfcheck_pinned(fd, map);
+		if (ret < 0) {
+			close(fd);
+			fprintf(stderr, "Map \'%s\' self-check failed!\n",
+				name);
+			return ret;
+		}
 		if (verbose)
 			fprintf(stderr, "Map \'%s\' loaded as pinned!\n",
 				name);
-- 
1.9.3


* [PATCH iproute2 -next v2 3/5] {f,m}_bpf: allow for user-defined object pinnings
  2015-11-26 14:38 [PATCH iproute2 -next v2 0/5] BPF updates Daniel Borkmann
  2015-11-26 14:38 ` [PATCH iproute2 -next v2 1/5] {f,m}_bpf: make tail calls working Daniel Borkmann
  2015-11-26 14:38 ` [PATCH iproute2 -next v2 2/5] {f,m}_bpf: check map attributes when fetching as pinned Daniel Borkmann
@ 2015-11-26 14:38 ` Daniel Borkmann
  2015-11-26 14:38 ` [PATCH iproute2 -next v2 4/5] {f,m}_bpf: allow updates on program arrays Daniel Borkmann
  2015-11-26 14:38 ` [PATCH iproute2 -next v2 5/5] {f,m}_bpf: add more example code Daniel Borkmann
  4 siblings, 0 replies; 9+ messages in thread
From: Daniel Borkmann @ 2015-11-26 14:38 UTC (permalink / raw)
  To: stephen; +Cc: ast, netdev, Daniel Borkmann

The recently introduced object pinning can be further extended to allow
sharing maps beyond the tc namespace. For example, maps pinned from the
tracing side can be accessed through this facility as well.
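The patch resolves user-defined pinning ids through a small chained
hash table. A standalone sketch of that scheme (types and names
simplified from the bpf_hash_entry code below; bucket selection masks
the id, so it assumes a power-of-two table size):

```c
#include <stddef.h>

#define HT_SIZE 256	/* must stay a power of two for the mask below */

struct pin_entry {
	unsigned int pinning;
	const char *subpath;
	struct pin_entry *next;
};

/* Sketch only: chain a new id->subpath mapping into its bucket,
 * as bpf_hash_init() does for each line of the bpf_pinning file.
 */
static void pin_insert(struct pin_entry **ht, struct pin_entry *e)
{
	unsigned int b = e->pinning & (HT_SIZE - 1);

	e->next = ht[b];
	ht[b] = e;
}

/* Walk the bucket chain for an exact id match, mirroring
 * bpf_custom_pinning(); NULL means no user-defined mapping exists.
 */
static const char *pin_lookup(struct pin_entry **ht, unsigned int pinning)
{
	struct pin_entry *e = ht[pinning & (HT_SIZE - 1)];

	while (e && e->pinning != pinning)
		e = e->next;
	return e ? e->subpath : NULL;
}
```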

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 etc/iproute2/bpf_pinning |   6 ++
 include/bpf_elf.h        |   2 +-
 include/utils.h          |   4 +
 lib/rt_names.c           |   5 +-
 tc/tc_bpf.c              | 212 ++++++++++++++++++++++++++++++++++++++++++-----
 5 files changed, 204 insertions(+), 25 deletions(-)
 create mode 100644 etc/iproute2/bpf_pinning

diff --git a/etc/iproute2/bpf_pinning b/etc/iproute2/bpf_pinning
new file mode 100644
index 0000000..2b39c70
--- /dev/null
+++ b/etc/iproute2/bpf_pinning
@@ -0,0 +1,6 @@
+#
+# subpath mappings from mount point for pinning
+#
+#3	tracing
+#4	foo/bar
+#5	tc/cls1
diff --git a/include/bpf_elf.h b/include/bpf_elf.h
index 0690dd6..31a8974 100644
--- a/include/bpf_elf.h
+++ b/include/bpf_elf.h
@@ -33,7 +33,7 @@ struct bpf_elf_map {
 	__u32 size_value;
 	__u32 max_elem;
 	__u32 id;
-	__u8  pinning;
+	__u32 pinning;
 };
 
 #endif /* __BPF_ELF__ */
diff --git a/include/utils.h b/include/utils.h
index 5902a98..e830be6 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -40,6 +40,10 @@ extern bool do_all;
 #define IPSEC_PROTO_ANY	255
 #endif
 
+#ifndef CONFDIR
+#define CONFDIR		"/etc/iproute2"
+#endif
+
 #define SPRINT_BSIZE 64
 #define SPRINT_BUF(x)	char x[SPRINT_BSIZE]
 
diff --git a/lib/rt_names.c b/lib/rt_names.c
index e87c65d..3968c76 100644
--- a/lib/rt_names.c
+++ b/lib/rt_names.c
@@ -22,10 +22,7 @@
 #include <linux/rtnetlink.h>
 
 #include "rt_names.h"
-
-#ifndef CONFDIR
-#define CONFDIR "/etc/iproute2"
-#endif
+#include "utils.h"
 
 #define NAME_MAX_LEN 512
 
diff --git a/tc/tc_bpf.c b/tc/tc_bpf.c
index b44b123..17c04e9 100644
--- a/tc/tc_bpf.c
+++ b/tc/tc_bpf.c
@@ -458,6 +458,12 @@ struct bpf_elf_prog {
 	const char		*license;
 };
 
+struct bpf_hash_entry {
+	unsigned int		pinning;
+	const char		*subpath;
+	struct bpf_hash_entry	*next;
+};
+
 struct bpf_elf_ctx {
 	Elf			*elf_fd;
 	GElf_Ehdr		elf_hdr;
@@ -474,6 +480,7 @@ struct bpf_elf_ctx {
 	enum bpf_prog_type	type;
 	bool			verbose;
 	struct bpf_elf_st	stat;
+	struct bpf_hash_entry	*ht[256];
 };
 
 struct bpf_elf_sec_data {
@@ -771,20 +778,34 @@ static int bpf_init_env(const char *pathname)
 	return 0;
 }
 
-static bool bpf_no_pinning(int pinning)
+static const char *bpf_custom_pinning(const struct bpf_elf_ctx *ctx,
+				      uint32_t pinning)
+{
+	struct bpf_hash_entry *entry;
+
+	entry = ctx->ht[pinning & (ARRAY_SIZE(ctx->ht) - 1)];
+	while (entry && entry->pinning != pinning)
+		entry = entry->next;
+
+	return entry ? entry->subpath : NULL;
+}
+
+static bool bpf_no_pinning(const struct bpf_elf_ctx *ctx,
+			   uint32_t pinning)
 {
 	switch (pinning) {
 	case PIN_OBJECT_NS:
 	case PIN_GLOBAL_NS:
 		return false;
 	case PIN_NONE:
-	default:
 		return true;
+	default:
+		return !bpf_custom_pinning(ctx, pinning);
 	}
 }
 
 static void bpf_make_pathname(char *pathname, size_t len, const char *name,
-			      int pinning)
+			      const struct bpf_elf_ctx *ctx, uint32_t pinning)
 {
 	switch (pinning) {
 	case PIN_OBJECT_NS:
@@ -795,41 +816,89 @@ static void bpf_make_pathname(char *pathname, size_t len, const char *name,
 		snprintf(pathname, len, "%s/%s/%s", bpf_get_tc_dir(),
 			 BPF_DIR_GLOBALS, name);
 		break;
+	default:
+		snprintf(pathname, len, "%s/../%s/%s", bpf_get_tc_dir(),
+			 bpf_custom_pinning(ctx, pinning), name);
+		break;
 	}
 }
 
-static int bpf_probe_pinned(const char *name, int pinning)
+static int bpf_probe_pinned(const char *name, const struct bpf_elf_ctx *ctx,
+			    uint32_t pinning)
 {
 	char pathname[PATH_MAX];
 
-	if (bpf_no_pinning(pinning) || !bpf_get_tc_dir())
+	if (bpf_no_pinning(ctx, pinning) || !bpf_get_tc_dir())
 		return 0;
 
-	bpf_make_pathname(pathname, sizeof(pathname), name, pinning);
+	bpf_make_pathname(pathname, sizeof(pathname), name, ctx, pinning);
 	return bpf_obj_get(pathname);
 }
 
-static int bpf_place_pinned(int fd, const char *name, int pinning)
+static int bpf_make_obj_path(void)
 {
-	char pathname[PATH_MAX];
+	char tmp[PATH_MAX];
 	int ret;
 
-	if (bpf_no_pinning(pinning) || !bpf_get_tc_dir())
-		return 0;
+	snprintf(tmp, sizeof(tmp), "%s/%s", bpf_get_tc_dir(),
+		 bpf_get_obj_uid(NULL));
+
+	ret = mkdir(tmp, S_IRWXU);
+	if (ret && errno != EEXIST) {
+		fprintf(stderr, "mkdir %s failed: %s\n", tmp, strerror(errno));
+		return ret;
+	}
+
+	return 0;
+}
+
+static int bpf_make_custom_path(const char *todo)
+{
+	char tmp[PATH_MAX], rem[PATH_MAX], *sub;
+	int ret;
+
+	snprintf(tmp, sizeof(tmp), "%s/../", bpf_get_tc_dir());
+	snprintf(rem, sizeof(rem), "%s/", todo);
+	sub = strtok(rem, "/");
 
-	if (pinning == PIN_OBJECT_NS) {
-		snprintf(pathname, sizeof(pathname), "%s/%s",
-			 bpf_get_tc_dir(), bpf_get_obj_uid(NULL));
+	while (sub) {
+		if (strlen(tmp) + strlen(sub) + 2 > PATH_MAX)
+			return -EINVAL;
+
+		strcat(tmp, sub);
+		strcat(tmp, "/");
 
-		ret = mkdir(pathname, S_IRWXU);
+		ret = mkdir(tmp, S_IRWXU);
 		if (ret && errno != EEXIST) {
-			fprintf(stderr, "mkdir %s failed: %s\n", pathname,
+			fprintf(stderr, "mkdir %s failed: %s\n", tmp,
 				strerror(errno));
 			return ret;
 		}
+
+		sub = strtok(NULL, "/");
 	}
 
-	bpf_make_pathname(pathname, sizeof(pathname), name, pinning);
+	return 0;
+}
+
+static int bpf_place_pinned(int fd, const char *name,
+			    const struct bpf_elf_ctx *ctx, uint32_t pinning)
+{
+	char pathname[PATH_MAX];
+	const char *tmp;
+	int ret = 0;
+
+	if (bpf_no_pinning(ctx, pinning) || !bpf_get_tc_dir())
+		return 0;
+
+	if (pinning == PIN_OBJECT_NS)
+		ret = bpf_make_obj_path();
+	else if ((tmp = bpf_custom_pinning(ctx, pinning)))
+		ret = bpf_make_custom_path(tmp);
+	if (ret < 0)
+		return ret;
+
+	bpf_make_pathname(pathname, sizeof(pathname), name, ctx, pinning);
 	return bpf_obj_pin(fd, pathname);
 }
 
@@ -856,11 +925,11 @@ static int bpf_prog_attach(const char *section,
 }
 
 static int bpf_map_attach(const char *name, const struct bpf_elf_map *map,
-			  bool verbose)
+			  const struct bpf_elf_ctx *ctx, bool verbose)
 {
 	int fd, ret;
 
-	fd = bpf_probe_pinned(name, map->pinning);
+	fd = bpf_probe_pinned(name, ctx, map->pinning);
 	if (fd > 0) {
 		ret = bpf_map_selfcheck_pinned(fd, map);
 		if (ret < 0) {
@@ -889,7 +958,7 @@ static int bpf_map_attach(const char *name, const struct bpf_elf_map *map,
 			return fd;
 	}
 
-	ret = bpf_place_pinned(fd, name, map->pinning);
+	ret = bpf_place_pinned(fd, name, ctx, map->pinning);
 	if (ret < 0 && errno != EEXIST) {
 		fprintf(stderr, "Could not pin %s map: %s\n", name,
 			strerror(errno));
@@ -940,7 +1009,8 @@ static int bpf_maps_attach_all(struct bpf_elf_ctx *ctx)
 		if (!map_name)
 			return -EIO;
 
-		fd = bpf_map_attach(map_name, &ctx->maps[i], ctx->verbose);
+		fd = bpf_map_attach(map_name, &ctx->maps[i], ctx,
+				    ctx->verbose);
 		if (fd < 0)
 			return fd;
 
@@ -1258,6 +1328,105 @@ static void bpf_save_finfo(struct bpf_elf_ctx *ctx)
 	ctx->stat.st_ino = st.st_ino;
 }
 
+static int bpf_read_pin_mapping(FILE *fp, uint32_t *id, char *path)
+{
+	char buff[PATH_MAX];
+
+	while (fgets(buff, sizeof(buff), fp)) {
+		char *ptr = buff;
+
+		while (*ptr == ' ' || *ptr == '\t')
+			ptr++;
+
+		if (*ptr == '#' || *ptr == '\n' || *ptr == 0)
+			continue;
+
+		if (sscanf(ptr, "%i %s\n", id, path) != 2 &&
+		    sscanf(ptr, "%i %s #", id, path) != 2) {
+			strcpy(path, ptr);
+			return -1;
+		}
+
+		return 1;
+	}
+
+	return 0;
+}
+
+static bool bpf_pinning_reserved(uint32_t pinning)
+{
+	switch (pinning) {
+	case PIN_NONE:
+	case PIN_OBJECT_NS:
+	case PIN_GLOBAL_NS:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static void bpf_hash_init(struct bpf_elf_ctx *ctx, const char *db_file)
+{
+	struct bpf_hash_entry *entry;
+	char subpath[PATH_MAX];
+	uint32_t pinning;
+	FILE *fp;
+	int ret;
+
+	fp = fopen(db_file, "r");
+	if (!fp)
+		return;
+
+	memset(subpath, 0, sizeof(subpath));
+	while ((ret = bpf_read_pin_mapping(fp, &pinning, subpath))) {
+		if (ret == -1) {
+			fprintf(stderr, "Database %s is corrupted at: %s\n",
+				db_file, subpath);
+			fclose(fp);
+			return;
+		}
+
+		if (bpf_pinning_reserved(pinning)) {
+			fprintf(stderr, "Database %s, id %u is reserved - "
+				"ignoring!\n", db_file, pinning);
+			continue;
+		}
+
+		entry = malloc(sizeof(*entry));
+		if (!entry) {
+			fprintf(stderr, "No memory left for db entry!\n");
+			continue;
+		}
+
+		entry->pinning = pinning;
+		entry->subpath = strdup(subpath);
+		if (!entry->subpath) {
+			fprintf(stderr, "No memory left for db entry!\n");
+			free(entry);
+			continue;
+		}
+
+		entry->next = ctx->ht[pinning & (ARRAY_SIZE(ctx->ht) - 1)];
+		ctx->ht[pinning & (ARRAY_SIZE(ctx->ht) - 1)] = entry;
+	}
+
+	fclose(fp);
+}
+
+static void bpf_hash_destroy(struct bpf_elf_ctx *ctx)
+{
+	struct bpf_hash_entry *entry;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(ctx->ht); i++) {
+		while ((entry = ctx->ht[i]) != NULL) {
+			ctx->ht[i] = entry->next;
+			free((char *)entry->subpath);
+			free(entry);
+		}
+	}
+}
+
 static int bpf_elf_ctx_init(struct bpf_elf_ctx *ctx, const char *pathname,
 			    enum bpf_prog_type type, bool verbose)
 {
@@ -1295,6 +1464,8 @@ static int bpf_elf_ctx_init(struct bpf_elf_ctx *ctx, const char *pathname,
 	}
 
 	bpf_save_finfo(ctx);
+	bpf_hash_init(ctx, CONFDIR "/bpf_pinning");
+
 	return 0;
 out_elf:
 	elf_end(ctx->elf_fd);
@@ -1331,6 +1502,7 @@ static void bpf_elf_ctx_destroy(struct bpf_elf_ctx *ctx, bool failure)
 	if (failure)
 		bpf_maps_teardown(ctx);
 
+	bpf_hash_destroy(ctx);
 	free(ctx->sec_done);
 	elf_end(ctx->elf_fd);
 	close(ctx->obj_fd);
-- 
1.9.3


* [PATCH iproute2 -next v2 4/5] {f,m}_bpf: allow updates on program arrays
  2015-11-26 14:38 [PATCH iproute2 -next v2 0/5] BPF updates Daniel Borkmann
                   ` (2 preceding siblings ...)
  2015-11-26 14:38 ` [PATCH iproute2 -next v2 3/5] {f,m}_bpf: allow for user-defined object pinnings Daniel Borkmann
@ 2015-11-26 14:38 ` Daniel Borkmann
  2015-11-26 15:19   ` Hannes Frederic Sowa
  2015-11-26 14:38 ` [PATCH iproute2 -next v2 5/5] {f,m}_bpf: add more example code Daniel Borkmann
  4 siblings, 1 reply; 9+ messages in thread
From: Daniel Borkmann @ 2015-11-26 14:38 UTC (permalink / raw)
  To: stephen; +Cc: ast, netdev, Daniel Borkmann

Now that all the infrastructure is in place, allow atomic live updates
on program arrays. This can be very useful, e.g. when tail-called
programs need to be replaced, when classifier functionality needs to
change, or when protocols are added/removed at runtime.

Thus, provide a way for in-place code updates. Minimal example: an
object file cls.o contains the entry point in section 'classifier' and
declares a globally pinned program array 'jmp' with 2 slots and an id
of 0, plus two tail-called programs under sections '0/0' (prog array
key 0) and '0/1' (prog array key 1); the section encoding for the
loader is <id>/<key>. Adding the filter loads everything into cls_bpf:

  tc filter add dev foo parent ffff: bpf da obj cls.o

Now, suppose the program under section '0/1' needs to be replaced with
an updated version residing in the same section (the full path to tc's
subfolder of the mount point can also be passed, e.g.
/sys/fs/bpf/tc/globals/jmp):

  tc exec bpf graft m:globals/jmp obj cls.o sec 0/1

If the program resides under a different section 'foo', it can also be
injected into the program array like this:

  tc exec bpf graft m:globals/jmp key 1 obj cls.o sec foo

If the new tail-called classifier program is already available as a
pinned object somewhere (here: /sys/fs/bpf/tc/progs/parser), it can be
injected into the prog array like this:

  tc exec bpf graft m:globals/jmp key 1 fd m:progs/parser

In the kernel, the program at key 1 is atomically replaced and the old
one's refcount is dropped.
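The 'm:' shortcut used above is resolved relative to tc's subdirectory
of the bpf fs mount. A standalone sketch of that expansion (the tc_dir
argument here stands in for whatever bpf_get_tc_dir() returns; the
helper name is invented for the example):

```c
#include <stdio.h>
#include <string.h>

/* Sketch only: expand the "m:" shortcut that this patch adds to
 * bpf_obj_get(), so that "m:globals/jmp" resolves under tc's
 * subdirectory of the bpf fs mount. Paths without the prefix are
 * passed through unchanged.
 */
static const char *expand_map_path(const char *pathname, const char *tc_dir,
				   char *buf, size_t len)
{
	if (strlen(pathname) > 2 && pathname[0] == 'm' &&
	    pathname[1] == ':') {
		snprintf(buf, len, "%s/%s", tc_dir, pathname + 2);
		return buf;
	}
	return pathname;
}
```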

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 tc/e_bpf.c  |  30 ++++-
 tc/tc_bpf.c | 424 +++++++++++++++++++++++++++++++++++++++---------------------
 tc/tc_bpf.h |   1 +
 3 files changed, 306 insertions(+), 149 deletions(-)

diff --git a/tc/e_bpf.c b/tc/e_bpf.c
index 1f386c3..2d650a4 100644
--- a/tc/e_bpf.c
+++ b/tc/e_bpf.c
@@ -26,10 +26,19 @@ static char *argv_default[] = { BPF_DEFAULT_CMD, NULL };
 
 static void explain(void)
 {
-	fprintf(stderr, "Usage: ... bpf [ import UDS_FILE ] [ run CMD ] [ debug ]\n\n");
+	fprintf(stderr, "Usage: ... bpf [ import UDS_FILE ] [ run CMD ]\n");
+	fprintf(stderr, "       ... bpf [ debug ]\n");
+	fprintf(stderr, "       ... bpf [ graft MAP_FILE ] [ key KEY ]\n");
+	fprintf(stderr, "          `... [ object-file OBJ_FILE ] [ type TYPE ] [ section NAME ] [ verbose ]\n");
+	fprintf(stderr, "          `... [ object-pinned PROG_FILE ]\n");
+	fprintf(stderr, "\n");
 	fprintf(stderr, "Where UDS_FILE provides the name of a unix domain socket file\n");
 	fprintf(stderr, "to import eBPF maps and the optional CMD denotes the command\n");
 	fprintf(stderr, "to be executed (default: \'%s\').\n", BPF_DEFAULT_CMD);
+	fprintf(stderr, "Where MAP_FILE points to a pinned map, OBJ_FILE to an object file\n");
+	fprintf(stderr, "and PROG_FILE to a pinned program. TYPE can be {cls, act}, where\n");
+	fprintf(stderr, "\'cls\' is default. KEY is optional and can be inferred from the\n");
+	fprintf(stderr, "section name, otherwise it needs to be provided.\n");
 }
 
 static int bpf_num_env_entries(void)
@@ -67,6 +76,25 @@ static int parse_bpf(struct exec_util *eu, int argc, char **argv)
 				fprintf(stderr,
 					"No trace pipe, tracefs not mounted?\n");
 			return -1;
+		} else if (matches(*argv, "graft") == 0) {
+			const char *bpf_map_path;
+			bool has_key = false;
+			uint32_t key;
+
+			NEXT_ARG();
+			bpf_map_path = *argv;
+			NEXT_ARG();
+			if (matches(*argv, "key") == 0) {
+				NEXT_ARG();
+				if (get_unsigned(&key, *argv, 0)) {
+					fprintf(stderr, "Illegal \"key\"\n");
+					return -1;
+				}
+				has_key = true;
+				NEXT_ARG();
+			}
+			return bpf_graft_map(bpf_map_path, has_key ?
+					     &key : NULL, argc, argv);
 		} else {
 			explain();
 			return -1;
diff --git a/tc/tc_bpf.c b/tc/tc_bpf.c
index 17c04e9..beb74be 100644
--- a/tc/tc_bpf.c
+++ b/tc/tc_bpf.c
@@ -76,13 +76,17 @@ static int bpf(int cmd, union bpf_attr *attr, unsigned int size)
 #endif
 }
 
-static int bpf_obj_get(const char *pathname)
+static int bpf_map_update(int fd, const void *key, const void *value,
+			  uint64_t flags)
 {
 	union bpf_attr attr = {
-		.pathname	= bpf_ptr_to_u64(pathname),
+		.map_fd		= fd,
+		.key		= bpf_ptr_to_u64(key),
+		.value		= bpf_ptr_to_u64(value),
+		.flags		= flags,
 	};
 
-	return bpf(BPF_OBJ_GET, &attr, sizeof(attr));
+	return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
 }
 
 static int bpf_parse_string(char *arg, bool from_file, __u16 *bpf_len,
@@ -205,7 +209,8 @@ void bpf_print_ops(FILE *f, struct rtattr *bpf_ops, __u16 len)
 		ops[i].jf, ops[i].k);
 }
 
-static int bpf_map_selfcheck_pinned(int fd, const struct bpf_elf_map *map)
+static int bpf_map_selfcheck_pinned(int fd, const struct bpf_elf_map *map,
+				    int length)
 {
 	char file[PATH_MAX], buff[4096];
 	struct bpf_elf_map tmp, zero;
@@ -234,7 +239,7 @@ static int bpf_map_selfcheck_pinned(int fd, const struct bpf_elf_map *map)
 
 	fclose(fp);
 
-	if (!memcmp(&tmp, map, offsetof(struct bpf_elf_map, id))) {
+	if (!memcmp(&tmp, map, length)) {
 		return 0;
 	} else {
 		memset(&zero, 0, sizeof(zero));
@@ -243,7 +248,7 @@ static int bpf_map_selfcheck_pinned(int fd, const struct bpf_elf_map *map)
 		 * case, everything is 0. It is guaranteed that no such map exists
 		 * since map type of 0 is unloadable BPF_MAP_TYPE_UNSPEC.
 		 */
-		if (!memcmp(&tmp, &zero, offsetof(struct bpf_elf_map, id)))
+		if (!memcmp(&tmp, &zero, length))
 			return 0;
 
 		fprintf(stderr, "Map specs from pinned file differ!\n");
@@ -251,6 +256,35 @@ static int bpf_map_selfcheck_pinned(int fd, const struct bpf_elf_map *map)
 	}
 }
 
+static int bpf_mnt_fs(const char *target)
+{
+	bool bind_done = false;
+
+	while (mount("", target, "none", MS_PRIVATE | MS_REC, NULL)) {
+		if (errno != EINVAL || bind_done) {
+			fprintf(stderr, "mount --make-private %s failed: %s\n",
+				target,	strerror(errno));
+			return -1;
+		}
+
+		if (mount(target, target, "none", MS_BIND, NULL)) {
+			fprintf(stderr, "mount --bind %s %s failed: %s\n",
+				target,	target, strerror(errno));
+			return -1;
+		}
+
+		bind_done = true;
+	}
+
+	if (mount("bpf", target, "bpf", 0, NULL)) {
+		fprintf(stderr, "mount -t bpf bpf %s failed: %s\n",
+			target,	strerror(errno));
+		return -1;
+	}
+
+	return 0;
+}
+
 static int bpf_valid_mntpt(const char *mnt, unsigned long magic)
 {
 	struct statfs st_fs;
@@ -342,6 +376,79 @@ int bpf_trace_pipe(void)
 	return 0;
 }
 
+static const char *bpf_get_tc_dir(void)
+{
+	static bool bpf_mnt_cached = false;
+	static char bpf_tc_dir[PATH_MAX];
+	static const char *mnt;
+	static const char * const bpf_known_mnts[] = {
+		BPF_DIR_MNT,
+		0,
+	};
+	char bpf_mnt[PATH_MAX] = BPF_DIR_MNT;
+	char bpf_glo_dir[PATH_MAX];
+	int ret;
+
+	if (bpf_mnt_cached)
+		goto done;
+
+	mnt = bpf_find_mntpt("bpf", BPF_FS_MAGIC, bpf_mnt, sizeof(bpf_mnt),
+			     bpf_known_mnts);
+	if (!mnt) {
+		mnt = getenv(BPF_ENV_MNT);
+		if (!mnt)
+			mnt = BPF_DIR_MNT;
+		ret = bpf_mnt_fs(mnt);
+		if (ret) {
+			mnt = NULL;
+			goto out;
+		}
+	}
+
+	snprintf(bpf_tc_dir, sizeof(bpf_tc_dir), "%s/%s", mnt, BPF_DIR_TC);
+	ret = mkdir(bpf_tc_dir, S_IRWXU);
+	if (ret && errno != EEXIST) {
+		fprintf(stderr, "mkdir %s failed: %s\n", bpf_tc_dir,
+			strerror(errno));
+		mnt = NULL;
+		goto out;
+	}
+
+	snprintf(bpf_glo_dir, sizeof(bpf_glo_dir), "%s/%s",
+		 bpf_tc_dir, BPF_DIR_GLOBALS);
+	ret = mkdir(bpf_glo_dir, S_IRWXU);
+	if (ret && errno != EEXIST) {
+		fprintf(stderr, "mkdir %s failed: %s\n", bpf_glo_dir,
+			strerror(errno));
+		mnt = NULL;
+		goto out;
+	}
+
+	mnt = bpf_tc_dir;
+out:
+	bpf_mnt_cached = true;
+done:
+	return mnt;
+}
+
+static int bpf_obj_get(const char *pathname)
+{
+	union bpf_attr attr;
+	char tmp[PATH_MAX];
+
+	if (strlen(pathname) > 2 && pathname[0] == 'm' &&
+	    pathname[1] == ':' && bpf_get_tc_dir()) {
+		snprintf(tmp, sizeof(tmp), "%s/%s",
+			 bpf_get_tc_dir(), pathname + 2);
+		pathname = tmp;
+	}
+
+	memset(&attr, 0, sizeof(attr));
+	attr.pathname = bpf_ptr_to_u64(pathname);
+
+	return bpf(BPF_OBJ_GET, &attr, sizeof(attr));
+}
+
 const char *bpf_default_section(const enum bpf_prog_type type)
 {
 	switch (type) {
@@ -354,37 +461,45 @@ const char *bpf_default_section(const enum bpf_prog_type type)
 	}
 }
 
-int bpf_parse_common(int *ptr_argc, char ***ptr_argv, const int *nla_tbl,
-		     enum bpf_prog_type type, const char **ptr_object,
-		     const char **ptr_uds_name, struct nlmsghdr *n)
+enum bpf_mode {
+	CBPF_BYTECODE = 0,
+	CBPF_FILE,
+	EBPF_OBJECT,
+	EBPF_PINNED,
+	__BPF_MODE_MAX,
+#define BPF_MODE_MAX	__BPF_MODE_MAX
+};
+
+static int bpf_parse(int *ptr_argc, char ***ptr_argv, const bool *opt_tbl,
+		     enum bpf_prog_type *type, enum bpf_mode *mode,
+		     const char **ptr_object, const char **ptr_section,
+		     const char **ptr_uds_name, struct sock_filter *opcodes)
 {
-	struct sock_filter opcodes[BPF_MAXINSNS];
 	const char *file, *section, *uds_name;
-	char **argv = *ptr_argv;
-	int argc = *ptr_argc;
-	char annotation[256];
 	bool verbose = false;
-	int ret;
-	enum bpf_mode {
-		CBPF_BYTECODE,
-		CBPF_FILE,
-		EBPF_OBJECT,
-		EBPF_PINNED,
-	} mode;
-
-	if (matches(*argv, "bytecode") == 0 ||
-	    strcmp(*argv, "bc") == 0) {
-		mode = CBPF_BYTECODE;
-	} else if (matches(*argv, "bytecode-file") == 0 ||
-		   strcmp(*argv, "bcf") == 0) {
-		mode = CBPF_FILE;
-	} else if (matches(*argv, "object-file") == 0 ||
-		   strcmp(*argv, "obj") == 0) {
-		mode = EBPF_OBJECT;
-	} else if (matches(*argv, "object-pinned") == 0 ||
-		   matches(*argv, "pinned") == 0 ||
-		   matches(*argv, "fd") == 0) {
-		mode = EBPF_PINNED;
+	int ret, argc;
+	char **argv;
+
+	argv = *ptr_argv;
+	argc = *ptr_argc;
+
+	if (opt_tbl[CBPF_BYTECODE] &&
+	    (matches(*argv, "bytecode") == 0 ||
+	     strcmp(*argv, "bc") == 0)) {
+		*mode = CBPF_BYTECODE;
+	} else if (opt_tbl[CBPF_FILE] &&
+		   (matches(*argv, "bytecode-file") == 0 ||
+		    strcmp(*argv, "bcf") == 0)) {
+		*mode = CBPF_FILE;
+	} else if (opt_tbl[EBPF_OBJECT] &&
+		   (matches(*argv, "object-file") == 0 ||
+		    strcmp(*argv, "obj") == 0)) {
+		*mode = EBPF_OBJECT;
+	} else if (opt_tbl[EBPF_PINNED] &&
+		   (matches(*argv, "object-pinned") == 0 ||
+		    matches(*argv, "pinned") == 0 ||
+		    matches(*argv, "fd") == 0)) {
+		*mode = EBPF_PINNED;
 	} else {
 		fprintf(stderr, "What mode is \"%s\"?\n", *argv);
 		return -1;
@@ -392,11 +507,29 @@ int bpf_parse_common(int *ptr_argc, char ***ptr_argv, const int *nla_tbl,
 
 	NEXT_ARG();
 	file = section = uds_name = NULL;
-	if (mode == EBPF_OBJECT || mode == EBPF_PINNED) {
+	if (*mode == EBPF_OBJECT || *mode == EBPF_PINNED) {
 		file = *argv;
 		NEXT_ARG_FWD();
 
-		section = bpf_default_section(type);
+		if (*type == BPF_PROG_TYPE_UNSPEC) {
+			if (argc > 0 && matches(*argv, "type") == 0) {
+				NEXT_ARG();
+				if (matches(*argv, "cls") == 0) {
+					*type = BPF_PROG_TYPE_SCHED_CLS;
+				} else if (matches(*argv, "act") == 0) {
+					*type = BPF_PROG_TYPE_SCHED_ACT;
+				} else {
+					fprintf(stderr, "What type is \"%s\"?\n",
+						*argv);
+					return -1;
+				}
+				NEXT_ARG_FWD();
+			} else {
+				*type = BPF_PROG_TYPE_SCHED_CLS;
+			}
+		}
+
+		section = bpf_default_section(*type);
 		if (argc > 0 && matches(*argv, "section") == 0) {
 			NEXT_ARG();
 			section = *argv;
@@ -419,35 +552,125 @@ int bpf_parse_common(int *ptr_argc, char ***ptr_argv, const int *nla_tbl,
 		PREV_ARG();
 	}
 
-	if (mode == CBPF_BYTECODE || mode == CBPF_FILE)
-		ret = bpf_ops_parse(argc, argv, opcodes, mode == CBPF_FILE);
-	else if (mode == EBPF_OBJECT)
-		ret = bpf_obj_open(file, type, section, verbose);
-	else if (mode == EBPF_PINNED)
+	if (*mode == CBPF_BYTECODE || *mode == CBPF_FILE)
+		ret = bpf_ops_parse(argc, argv, opcodes, *mode == CBPF_FILE);
+	else if (*mode == EBPF_OBJECT)
+		ret = bpf_obj_open(file, *type, section, verbose);
+	else if (*mode == EBPF_PINNED)
 		ret = bpf_obj_get(file);
-	if (ret < 0)
+	else
 		return -1;
 
+	if (ptr_object)
+		*ptr_object = file;
+	if (ptr_section)
+		*ptr_section = section;
+	if (ptr_uds_name)
+		*ptr_uds_name = uds_name;
+
+	*ptr_argc = argc;
+	*ptr_argv = argv;
+
+	return ret;
+}
+
+int bpf_parse_common(int *ptr_argc, char ***ptr_argv, const int *nla_tbl,
+		     enum bpf_prog_type type, const char **ptr_object,
+		     const char **ptr_uds_name, struct nlmsghdr *n)
+{
+	struct sock_filter opcodes[BPF_MAXINSNS];
+	const bool opt_tbl[BPF_MODE_MAX] = {
+		[CBPF_BYTECODE]	= true,
+		[CBPF_FILE]	= true,
+		[EBPF_OBJECT]	= true,
+		[EBPF_PINNED]	= true,
+	};
+	char annotation[256];
+	const char *section;
+	enum bpf_mode mode;
+	int ret;
+
+	ret = bpf_parse(ptr_argc, ptr_argv, opt_tbl, &type, &mode,
+			ptr_object, &section, ptr_uds_name, opcodes);
+	if (ret < 0)
+		return ret;
+
 	if (mode == CBPF_BYTECODE || mode == CBPF_FILE) {
 		addattr16(n, MAX_MSG, nla_tbl[BPF_NLA_OPS_LEN], ret);
 		addattr_l(n, MAX_MSG, nla_tbl[BPF_NLA_OPS], opcodes,
 			  ret * sizeof(struct sock_filter));
-	} else if (mode == EBPF_OBJECT || mode == EBPF_PINNED) {
+	}
+
+	if (mode == EBPF_OBJECT || mode == EBPF_PINNED) {
 		snprintf(annotation, sizeof(annotation), "%s:[%s]",
-			 basename(file), mode == EBPF_PINNED ? "*fsobj" :
-			 section);
+			 basename(*ptr_object), mode == EBPF_PINNED ?
+			 "*fsobj" : section);
 
 		addattr32(n, MAX_MSG, nla_tbl[BPF_NLA_FD], ret);
 		addattrstrz(n, MAX_MSG, nla_tbl[BPF_NLA_NAME], annotation);
 	}
 
-	*ptr_object = file;
-	*ptr_uds_name = uds_name;
+	return 0;
+}
 
-	*ptr_argc = argc;
-	*ptr_argv = argv;
+int bpf_graft_map(const char *map_path, uint32_t *key, int argc, char **argv)
+{
+	enum bpf_prog_type type = BPF_PROG_TYPE_UNSPEC;
+	const bool opt_tbl[BPF_MODE_MAX] = {
+		[CBPF_BYTECODE]	= false,
+		[CBPF_FILE]	= false,
+		[EBPF_OBJECT]	= true,
+		[EBPF_PINNED]	= true,
+	};
+	const struct bpf_elf_map test = {
+		.type		= BPF_MAP_TYPE_PROG_ARRAY,
+		.size_key	= sizeof(int),
+		.size_value	= sizeof(int),
+	};
+	int ret, prog_fd, map_fd;
+	const char *section;
+	enum bpf_mode mode;
+	uint32_t map_key;
+
+	prog_fd = bpf_parse(&argc, &argv, opt_tbl, &type, &mode,
+			    NULL, &section, NULL, NULL);
+	if (prog_fd < 0)
+		return prog_fd;
+	if (key) {
+		map_key = *key;
+	} else {
+		ret = sscanf(section, "%*i/%i", &map_key);
+		if (ret != 1) {
+			fprintf(stderr, "Couldn\'t infer map key from section "
+				"name! Please provide \'key\' argument!\n");
+			ret = -EINVAL;
+			goto out_prog;
+		}
+	}
 
-	return 0;
+	map_fd = bpf_obj_get(map_path);
+	if (map_fd < 0) {
+		fprintf(stderr, "Couldn\'t retrieve pinned map \'%s\': %s\n",
+			map_path, strerror(errno));
+		ret = map_fd;
+		goto out_prog;
+	}
+
+	ret = bpf_map_selfcheck_pinned(map_fd, &test,
+				       offsetof(struct bpf_elf_map, max_elem));
+	if (ret < 0) {
+		fprintf(stderr, "Map \'%s\' self-check failed!\n", map_path);
+		goto out_map;
+	}
+
+	ret = bpf_map_update(map_fd, &map_key, &prog_fd, BPF_ANY);
+	if (ret < 0)
+		fprintf(stderr, "Map update failed: %s\n", strerror(errno));
+out_map:
+	close(map_fd);
+out_prog:
+	close(prog_fd);
+	return ret;
 }
 
 #ifdef HAVE_ELF
@@ -530,19 +753,6 @@ static int bpf_map_create(enum bpf_map_type type, unsigned int size_key,
 	return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
 }
 
-static int bpf_map_update(int fd, const void *key, const void *value,
-			  uint64_t flags)
-{
-	union bpf_attr attr = {
-		.map_fd		= fd,
-		.key		= bpf_ptr_to_u64(key),
-		.value		= bpf_ptr_to_u64(value),
-		.flags		= flags,
-	};
-
-	return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
-}
-
 static int bpf_prog_load(enum bpf_prog_type type, const struct bpf_insn *insns,
 			 size_t size, const char *license)
 {
@@ -672,90 +882,6 @@ done:
 	return bpf_uid;
 }
 
-static int bpf_mnt_fs(const char *target)
-{
-	bool bind_done = false;
-
-	while (mount("", target, "none", MS_PRIVATE | MS_REC, NULL)) {
-		if (errno != EINVAL || bind_done) {
-			fprintf(stderr, "mount --make-private %s failed: %s\n",
-				target,	strerror(errno));
-			return -1;
-		}
-
-		if (mount(target, target, "none", MS_BIND, NULL)) {
-			fprintf(stderr, "mount --bind %s %s failed: %s\n",
-				target,	target, strerror(errno));
-			return -1;
-		}
-
-		bind_done = true;
-	}
-
-	if (mount("bpf", target, "bpf", 0, NULL)) {
-		fprintf(stderr, "mount -t bpf bpf %s failed: %s\n",
-			target,	strerror(errno));
-		return -1;
-	}
-
-	return 0;
-}
-
-static const char *bpf_get_tc_dir(void)
-{
-	static bool bpf_mnt_cached = false;
-	static char bpf_tc_dir[PATH_MAX];
-	static const char *mnt;
-	static const char * const bpf_known_mnts[] = {
-		BPF_DIR_MNT,
-		0,
-	};
-	char bpf_mnt[PATH_MAX] = BPF_DIR_MNT;
-	char bpf_glo_dir[PATH_MAX];
-	int ret;
-
-	if (bpf_mnt_cached)
-		goto done;
-
-	mnt = bpf_find_mntpt("bpf", BPF_FS_MAGIC, bpf_mnt, sizeof(bpf_mnt),
-			     bpf_known_mnts);
-	if (!mnt) {
-		mnt = getenv(BPF_ENV_MNT);
-		if (!mnt)
-			mnt = BPF_DIR_MNT;
-		ret = bpf_mnt_fs(mnt);
-		if (ret) {
-			mnt = NULL;
-			goto out;
-		}
-	}
-
-	snprintf(bpf_tc_dir, sizeof(bpf_tc_dir), "%s/%s", mnt, BPF_DIR_TC);
-	ret = mkdir(bpf_tc_dir, S_IRWXU);
-	if (ret && errno != EEXIST) {
-		fprintf(stderr, "mkdir %s failed: %s\n", bpf_tc_dir,
-			strerror(errno));
-		mnt = NULL;
-		goto out;
-	}
-
-	snprintf(bpf_glo_dir, sizeof(bpf_glo_dir), "%s/%s",
-		 bpf_tc_dir, BPF_DIR_GLOBALS);
-	ret = mkdir(bpf_glo_dir, S_IRWXU);
-	if (ret && errno != EEXIST) {
-		fprintf(stderr, "mkdir %s failed: %s\n", bpf_glo_dir,
-			strerror(errno));
-		mnt = NULL;
-		goto out;
-	}
-
-	mnt = bpf_tc_dir;
-out:
-	bpf_mnt_cached = true;
-done:
-	return mnt;
-}
-
 static int bpf_init_env(const char *pathname)
 {
 	struct rlimit limit = {
@@ -931,7 +1057,9 @@ static int bpf_map_attach(const char *name, const struct bpf_elf_map *map,
 
 	fd = bpf_probe_pinned(name, ctx, map->pinning);
 	if (fd > 0) {
-		ret = bpf_map_selfcheck_pinned(fd, map);
+		ret = bpf_map_selfcheck_pinned(fd, map,
+					       offsetof(struct bpf_elf_map,
+							id));
 		if (ret < 0) {
 			close(fd);
 			fprintf(stderr, "Map \'%s\' self-check failed!\n",
diff --git a/tc/tc_bpf.h b/tc/tc_bpf.h
index dea3c3b..526d0b1 100644
--- a/tc/tc_bpf.h
+++ b/tc/tc_bpf.h
@@ -55,6 +55,7 @@ const char *bpf_default_section(const enum bpf_prog_type type);
 int bpf_parse_common(int *ptr_argc, char ***ptr_argv, const int *nla_tbl,
 		     enum bpf_prog_type type, const char **ptr_object,
 		     const char **ptr_uds_name, struct nlmsghdr *n);
+int bpf_graft_map(const char *map_path, uint32_t *key, int argc, char **argv);
 
 void bpf_print_ops(FILE *f, struct rtattr *bpf_ops, __u16 len);
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH iproute2 -next v2 5/5] {f,m}_bpf: add more example code
  2015-11-26 14:38 [PATCH iproute2 -next v2 0/5] BPF updates Daniel Borkmann
                   ` (3 preceding siblings ...)
  2015-11-26 14:38 ` [PATCH iproute2 -next v2 4/5] {f,m}_bpf: allow updates on program arrays Daniel Borkmann
@ 2015-11-26 14:38 ` Daniel Borkmann
  2015-11-29 19:56   ` Stephen Hemminger
  4 siblings, 1 reply; 9+ messages in thread
From: Daniel Borkmann @ 2015-11-26 14:38 UTC (permalink / raw)
  To: stephen; +Cc: ast, netdev, Daniel Borkmann

I've added three examples to examples/bpf/ that demonstrate how one can
implement eBPF tail calls in tc, e.g. with multiple levels of nesting.
That should act as a good starting point, but also as test cases for the
ELF loader and kernel. A real test suite for {f,m,e}_bpf is still to be
developed in future work.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 examples/bpf/README         |  13 +++++
 examples/bpf/bpf_cyclic.c   |  32 ++++++++++++
 examples/bpf/bpf_funcs.h    |  11 +++++
 examples/bpf/bpf_graft.c    |  70 +++++++++++++++++++++++++++
 examples/bpf/bpf_tailcall.c | 115 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 241 insertions(+)
 create mode 100644 examples/bpf/README
 create mode 100644 examples/bpf/bpf_cyclic.c
 create mode 100644 examples/bpf/bpf_graft.c
 create mode 100644 examples/bpf/bpf_tailcall.c

diff --git a/examples/bpf/README b/examples/bpf/README
new file mode 100644
index 0000000..4247257
--- /dev/null
+++ b/examples/bpf/README
@@ -0,0 +1,13 @@
+eBPF toy code examples (running in kernel) to familiarize yourself
+with syntax and features:
+
+ - bpf_prog.c		-> Classifier examples using maps
+ - bpf_shared.c		-> Ingress/egress map sharing example
+ - bpf_tailcall.c	-> Using tail call chains
+ - bpf_cyclic.c		-> Simple cycle as tail calls
+ - bpf_graft.c		-> Demo on altering runtime behaviour
+
+User space code example:
+
+ - bpf_agent.c		-> Counterpart to bpf_prog.c for user
+                           space to transfer/read out map data
diff --git a/examples/bpf/bpf_cyclic.c b/examples/bpf/bpf_cyclic.c
new file mode 100644
index 0000000..bde061c
--- /dev/null
+++ b/examples/bpf/bpf_cyclic.c
@@ -0,0 +1,32 @@
+#include <linux/bpf.h>
+
+#include "bpf_funcs.h"
+
+/* Cyclic dependency example to test the kernel's runtime upper
+ * bound on loops.
+ */
+struct bpf_elf_map __section("maps") jmp_tc = {
+	.type		= BPF_MAP_TYPE_PROG_ARRAY,
+	.id		= 0xabccba,
+	.size_key	= sizeof(int),
+	.size_value	= sizeof(int),
+	.pinning	= PIN_OBJECT_NS,
+	.max_elem	= 1,
+};
+
+__section_tail(0xabccba, 0) int cls_loop(struct __sk_buff *skb)
+{
+	char fmt[] = "cb: %u\n";
+
+	bpf_printk(fmt, sizeof(fmt), skb->cb[0]++);
+	bpf_tail_call(skb, &jmp_tc, 0);
+	return -1;
+}
+
+__section("classifier") int cls_entry(struct __sk_buff *skb)
+{
+	bpf_tail_call(skb, &jmp_tc, 0);
+	return -1;
+}
+
+char __license[] __section("license") = "GPL";
diff --git a/examples/bpf/bpf_funcs.h b/examples/bpf/bpf_funcs.h
index 1369401..6d058f0 100644
--- a/examples/bpf/bpf_funcs.h
+++ b/examples/bpf/bpf_funcs.h
@@ -10,10 +10,18 @@
 # define __maybe_unused		__attribute__ ((__unused__))
 #endif
 
+#ifndef __stringify
+# define __stringify(x)		#x
+#endif
+
 #ifndef __section
 # define __section(NAME)	__attribute__((section(NAME), used))
 #endif
 
+#ifndef __section_tail
+# define __section_tail(m, x)	__section(__stringify(m) "/" __stringify(x))
+#endif
+
 #ifndef offsetof
 # define offsetof		__builtin_offsetof
 #endif
@@ -50,6 +58,9 @@ static unsigned int (*get_prandom_u32)(void) __maybe_unused =
 static int (*bpf_printk)(const char *fmt, int fmt_size, ...) __maybe_unused =
 	(void *) BPF_FUNC_trace_printk;
 
+static void (*bpf_tail_call)(void *ctx, void *map, int index) __maybe_unused =
+	(void *) BPF_FUNC_tail_call;
+
 /* LLVM built-in functions that an eBPF C program may use to emit
  * BPF_LD_ABS and BPF_LD_IND instructions.
  */
diff --git a/examples/bpf/bpf_graft.c b/examples/bpf/bpf_graft.c
new file mode 100644
index 0000000..f36d25a
--- /dev/null
+++ b/examples/bpf/bpf_graft.c
@@ -0,0 +1,70 @@
+#include <linux/bpf.h>
+
+#include "bpf_funcs.h"
+
+/* This example demonstrates how classifier run-time behaviour
+ * can be altered with tail calls. We start out with an empty
+ * jmp_tc array, then add section aaa to the array slot 0, and
+ * later on atomically replace it with section bbb. Note that
+ * as shown in other examples, the tc loader can prepopulate
+ * tail-called sections; here we start out with an empty one
+ * on purpose to show that it can also be done this way.
+ *
+ * tc filter add dev foo parent ffff: bpf obj graft.o
+ * tc exec bpf dbg
+ *   [...]
+ *   Socket Thread-20229 [001] ..s. 138993.003923: : fallthrough
+ *   <idle>-0            [001] ..s. 138993.202265: : fallthrough
+ *   Socket Thread-20229 [001] ..s. 138994.004149: : fallthrough
+ *   [...]
+ *
+ * tc exec bpf graft m:globals/jmp_tc key 0 obj graft.o sec aaa
+ * tc exec bpf dbg
+ *   [...]
+ *   Socket Thread-19818 [002] ..s. 139012.053587: : aaa
+ *   <idle>-0            [002] ..s. 139012.172359: : aaa
+ *   Socket Thread-19818 [001] ..s. 139012.173556: : aaa
+ *   [...]
+ *
+ * tc exec bpf graft m:globals/jmp_tc key 0 obj graft.o sec bbb
+ * tc exec bpf dbg
+ *   [...]
+ *   Socket Thread-19818 [002] ..s. 139022.102967: : bbb
+ *   <idle>-0            [002] ..s. 139022.155640: : bbb
+ *   Socket Thread-19818 [001] ..s. 139022.156730: : bbb
+ *   [...]
+ */
+struct bpf_elf_map __section("maps") jmp_tc = {
+	.type		= BPF_MAP_TYPE_PROG_ARRAY,
+	.size_key	= sizeof(int),
+	.size_value	= sizeof(int),
+	.pinning	= PIN_GLOBAL_NS,
+	.max_elem	= 1,
+};
+
+__section("aaa") int cls_aaa(struct __sk_buff *skb)
+{
+	char fmt[] = "aaa\n";
+
+	bpf_printk(fmt, sizeof(fmt));
+	return -1;
+}
+
+__section("bbb") int cls_bbb(struct __sk_buff *skb)
+{
+	char fmt[] = "bbb\n";
+
+	bpf_printk(fmt, sizeof(fmt));
+	return -1;
+}
+
+__section("classifier") int cls_entry(struct __sk_buff *skb)
+{
+	char fmt[] = "fallthrough\n";
+
+	bpf_tail_call(skb, &jmp_tc, 0);
+	bpf_printk(fmt, sizeof(fmt));
+	return -1;
+}
+
+char __license[] __section("license") = "GPL";
diff --git a/examples/bpf/bpf_tailcall.c b/examples/bpf/bpf_tailcall.c
new file mode 100644
index 0000000..f186e57
--- /dev/null
+++ b/examples/bpf/bpf_tailcall.c
@@ -0,0 +1,115 @@
+#include <linux/bpf.h>
+
+#include "bpf_funcs.h"
+
+#define ENTRY_INIT	3
+#define ENTRY_0		0
+#define ENTRY_1		1
+#define MAX_JMP_SIZE	2
+
+#define FOO		42
+#define BAR		43
+
+/* This example doesn't really do anything useful, but its purpose is to
+ * demonstrate eBPF tail calls on a very simple example.
+ *
+ * cls_entry() is our classifier entry point, from there we jump based on
+ * skb->hash into cls_case1() or cls_case2(). They are both part of the
+ * program array jmp_tc. As indicated via __section_tail(), the tc
+ * loader already populates the program arrays with the loaded fds.
+ *
+ * To demonstrate nested jumps, cls_case2() jumps within the same jmp_tc
+ * array to cls_case1(). And whenever we arrive at cls_case1(), we jump
+ * into cls_exit(), part of the jump array jmp_ex.
+ *
+ * Also, to show it's possible, all programs share map_sh and dump the value
+ * that the entry point incremented. The sections that are loaded into a
+ * program array can be atomically replaced during run-time, e.g. to change
+ * classifier behaviour.
+ */
+struct bpf_elf_map __section("maps") map_sh = {
+	.type		= BPF_MAP_TYPE_ARRAY,
+	.size_key	= sizeof(int),
+	.size_value	= sizeof(int),
+	.pinning	= PIN_OBJECT_NS,
+	.max_elem	= 1,
+};
+
+struct bpf_elf_map __section("maps") jmp_tc = {
+	.type		= BPF_MAP_TYPE_PROG_ARRAY,
+	.id		= FOO,
+	.size_key	= sizeof(int),
+	.size_value	= sizeof(int),
+	.pinning	= PIN_OBJECT_NS,
+	.max_elem	= MAX_JMP_SIZE,
+};
+
+struct bpf_elf_map __section("maps") jmp_ex = {
+	.type		= BPF_MAP_TYPE_PROG_ARRAY,
+	.id		= BAR,
+	.size_key	= sizeof(int),
+	.size_value	= sizeof(int),
+	.pinning	= PIN_OBJECT_NS,
+	.max_elem	= 1,
+};
+
+__section_tail(FOO, ENTRY_0) int cls_case1(struct __sk_buff *skb)
+{
+	char fmt[] = "case1: map-val: %d from:%u\n";
+	int key = 0, *val;
+
+	val = bpf_map_lookup_elem(&map_sh, &key);
+	if (val)
+		bpf_printk(fmt, sizeof(fmt), *val, skb->cb[0]);
+
+	skb->cb[0] = ENTRY_0;
+	bpf_tail_call(skb, &jmp_ex, ENTRY_0);
+	return 0;
+}
+
+__section_tail(FOO, ENTRY_1) int cls_case2(struct __sk_buff *skb)
+{
+	char fmt[] = "case2: map-val: %d from:%u\n";
+	int key = 0, *val;
+
+	val = bpf_map_lookup_elem(&map_sh, &key);
+	if (val)
+		bpf_printk(fmt, sizeof(fmt), *val, skb->cb[0]);
+
+	skb->cb[0] = ENTRY_1;
+	bpf_tail_call(skb, &jmp_tc, ENTRY_0);
+	return 0;
+}
+
+__section_tail(BAR, ENTRY_0) int cls_exit(struct __sk_buff *skb)
+{
+	char fmt[] = "exit: map-val: %d from:%u\n";
+	int key = 0, *val;
+
+	val = bpf_map_lookup_elem(&map_sh, &key);
+	if (val)
+		bpf_printk(fmt, sizeof(fmt), *val, skb->cb[0]);
+
+	/* Termination point. */
+	return -1;
+}
+
+__section("classifier") int cls_entry(struct __sk_buff *skb)
+{
+	char fmt[] = "fallthrough\n";
+	int key = 0, *val;
+
+	/* For transferring state, we can use skb->cb[0] ... skb->cb[4]. */
+	val = bpf_map_lookup_elem(&map_sh, &key);
+	if (val) {
+		__sync_fetch_and_add(val, 1);
+
+		skb->cb[0] = ENTRY_INIT;
+		bpf_tail_call(skb, &jmp_tc, skb->hash & (MAX_JMP_SIZE - 1));
+	}
+
+	bpf_printk(fmt, sizeof(fmt));
+	return 0;
+}
+
+char __license[] __section("license") = "GPL";
-- 
1.9.3


* Re: [PATCH iproute2 -next v2 4/5] {f,m}_bpf: allow updates on program arrays
  2015-11-26 14:38 ` [PATCH iproute2 -next v2 4/5] {f,m}_bpf: allow updates on program arrays Daniel Borkmann
@ 2015-11-26 15:19   ` Hannes Frederic Sowa
  2015-11-26 15:51     ` Daniel Borkmann
  0 siblings, 1 reply; 9+ messages in thread
From: Hannes Frederic Sowa @ 2015-11-26 15:19 UTC (permalink / raw)
  To: Daniel Borkmann, stephen; +Cc: ast, netdev

On Thu, Nov 26, 2015, at 15:38, Daniel Borkmann wrote:
> +static int bpf_mnt_fs(const char *target)
> +{
> +       bool bind_done = false;
> +
> +       while (mount("", target, "none", MS_PRIVATE | MS_REC, NULL)) {
> +               if (errno != EINVAL || bind_done) {
> +                       fprintf(stderr, "mount --make-private %s failed:
> %s\n",
> +                               target, strerror(errno));
> +                       return -1;
> +               }
> +
> +               if (mount(target, target, "none", MS_BIND, NULL)) {
> +                       fprintf(stderr, "mount --bind %s %s failed:
> %s\n",
> +                               target, target, strerror(errno));
> +                       return -1;
> +               }
> +
> +               bind_done = true;
> +       }

Why does user space actually still have to deal with setting the mount
point private? Isn't this handled by the kernel?

> +       if (mount("bpf", target, "bpf", 0, NULL)) {
> +               fprintf(stderr, "mount -t bpf bpf %s failed: %s\n",
> +                       target, strerror(errno));
> +               return -1;
> +       }

Shouldn't this be just enough?

> +       return 0;
> +}

Thanks,
Hannes


* Re: [PATCH iproute2 -next v2 4/5] {f,m}_bpf: allow updates on program arrays
  2015-11-26 15:19   ` Hannes Frederic Sowa
@ 2015-11-26 15:51     ` Daniel Borkmann
  0 siblings, 0 replies; 9+ messages in thread
From: Daniel Borkmann @ 2015-11-26 15:51 UTC (permalink / raw)
  To: Hannes Frederic Sowa, stephen; +Cc: ast, netdev

On 11/26/2015 04:19 PM, Hannes Frederic Sowa wrote:
> On Thu, Nov 26, 2015, at 15:38, Daniel Borkmann wrote:
[...]
> Why does user space actually still have to deal with setting the mount
> point private? Isn't this handled by the kernel?
>
>> +       if (mount("bpf", target, "bpf", 0, NULL)) {
>> +               fprintf(stderr, "mount -t bpf bpf %s failed: %s\n",
>> +                       target, strerror(errno));
>> +               return -1;
>> +       }
>
> Shouldn't this be just enough?

Note that the patch just moves the function around, but to get to your
question, that would just make it shared by default, not private.

Thanks,
Daniel
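
To make the propagation point concrete, the logic of bpf_mnt_fs() corresponds
roughly to the following commands (a sketch, not part of the patch; requires
root, and /sys/fs/bpf stands in for BPF_DIR_MNT):

```shell
# Make the target a private mount so the bpf instance does not
# propagate into other mount namespaces (MS_PRIVATE | MS_REC):
if ! mount --make-rprivate /sys/fs/bpf 2>/dev/null; then
	# EINVAL means it is not a mount point yet: bind-mount it
	# onto itself first, then retry making it private.
	mount --bind /sys/fs/bpf /sys/fs/bpf
	mount --make-rprivate /sys/fs/bpf
fi

# Finally mount the bpf filesystem itself:
mount -t bpf bpf /sys/fs/bpf
```

Dropping the first two steps would be the "just enough" variant from the
question above, at the cost of the default (shared) propagation semantics.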


* Re: [PATCH iproute2 -next v2 5/5] {f,m}_bpf: add more example code
  2015-11-26 14:38 ` [PATCH iproute2 -next v2 5/5] {f,m}_bpf: add more example code Daniel Borkmann
@ 2015-11-29 19:56   ` Stephen Hemminger
  0 siblings, 0 replies; 9+ messages in thread
From: Stephen Hemminger @ 2015-11-29 19:56 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: ast, netdev

On Thu, 26 Nov 2015 15:38:46 +0100
Daniel Borkmann <daniel@iogearbox.net> wrote:

> I've added three examples to examples/bpf/ that demonstrate how one can
> implement eBPF tail calls in tc, e.g. with multiple levels of nesting.
> That should act as a good starting point, but also as test cases for the
> ELF loader and kernel. A real test suite for {f,m,e}_bpf is still to be
> developed in future work.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Alexei Starovoitov <ast@kernel.org>

All applied to net-next branch.



Thread overview: 9+ messages
2015-11-26 14:38 [PATCH iproute2 -next v2 0/5] BPF updates Daniel Borkmann
2015-11-26 14:38 ` [PATCH iproute2 -next v2 1/5] {f,m}_bpf: make tail calls working Daniel Borkmann
2015-11-26 14:38 ` [PATCH iproute2 -next v2 2/5] {f,m}_bpf: check map attributes when fetching as pinned Daniel Borkmann
2015-11-26 14:38 ` [PATCH iproute2 -next v2 3/5] {f,m}_bpf: allow for user-defined object pinnings Daniel Borkmann
2015-11-26 14:38 ` [PATCH iproute2 -next v2 4/5] {f,m}_bpf: allow updates on program arrays Daniel Borkmann
2015-11-26 15:19   ` Hannes Frederic Sowa
2015-11-26 15:51     ` Daniel Borkmann
2015-11-26 14:38 ` [PATCH iproute2 -next v2 5/5] {f,m}_bpf: add more example code Daniel Borkmann
2015-11-29 19:56   ` Stephen Hemminger
