netdev.vger.kernel.org archive mirror
* [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
@ 2019-10-02 13:30 Toke Høiland-Jørgensen
  2019-10-02 13:30 ` [PATCH bpf-next 1/9] hashtab: Add new bpf_map_fd_put_value op Toke Høiland-Jørgensen
                   ` (12 more replies)
  0 siblings, 13 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 13:30 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

This series adds support for executing multiple XDP programs on a single
interface in sequence, through the use of chain calls, as discussed at the Linux
Plumbers Conference last month:

https://linuxplumbersconf.org/event/4/contributions/460/

# HIGH-LEVEL IDEA

The basic idea is to express the chain call sequence through a special map type,
which contains a mapping from a (program, return code) tuple to another program
to run next in the sequence. Userspace can populate this map to express
arbitrary call sequences, and update the sequence by updating or replacing the
map.

The actual execution of the program sequence is done in bpf_prog_run_xdp(),
which will look up the chain sequence map and, if one is installed, loop through calls
to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
previous program ID and return code.

An XDP chain call map can be installed on an interface by means of a new netlink
attribute containing an fd pointing to a chain call map. This can be supplied
along with the XDP prog fd, so that a chain map is always installed together
with an XDP program.

# PERFORMANCE

I performed a simple performance test to get an initial feel for the overhead of
the chain call mechanism. This test consists of running only two programs in
sequence: one that returns XDP_PASS and another that returns XDP_DROP. I then
measure the drop PPS performance and compare it to a baseline of just a single
program that only returns XDP_DROP.

For comparison, a test case that uses regular eBPF tail calls to sequence two
programs together is also included. Finally, because 'perf' showed that the
hashmap lookup was the largest single source of overhead, I also added a test
case where I removed the jhash() call from the hashmap code, and just use the
u32 key directly as an index into the hash bucket structure.

The performance for these different cases is as follows (with retpolines disabled):

| Test case                       | Perf      | Add. overhead | Total overhead |
|---------------------------------+-----------+---------------+----------------|
| Before patch (XDP DROP program) | 31.0 Mpps |               |                |
| After patch (XDP DROP program)  | 28.9 Mpps |        2.3 ns |         2.3 ns |
| XDP tail call                   | 26.6 Mpps |        3.0 ns |         5.3 ns |
| XDP chain call (no jhash)       | 19.6 Mpps |       13.4 ns |        18.7 ns |
| XDP chain call (this series)    | 17.0 Mpps |        7.9 ns |        26.6 ns |

From this it is clear that there is some overhead from this mechanism; however,
the jhash removal example indicates that it is probably possible to optimise the
code to the point where the overhead becomes low enough to be acceptable.

# PATCH SET STRUCTURE
This series is structured as follows:

- Patch 1: Prerequisite
- Patch 2: New map type
- Patch 3: Netlink hooks to install the chain call map
- Patch 4: Core chain call logic
- Patch 5-7: Bookkeeping updates to tools
- Patch 8: Libbpf support for installing chain call maps
- Patch 9: Selftests with example user space code

The whole series is also available in my git repo on kernel.org:
https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=xdp-multiprog-01

---

Toke Høiland-Jørgensen (9):
      hashtab: Add new bpf_map_fd_put_value op
      xdp: Add new xdp_chain_map type for specifying XDP call sequences
      xdp: Support setting and getting device chain map
      xdp: Implement chain call logic to support multiple programs on one interface
      tools/include/uapi: Add XDP chain map definitions
      tools/libbpf_probes: Add support for xdp_chain map type
      bpftool: Add definitions for xdp_chain map type
      libbpf: Add support for setting and getting XDP chain maps
      selftests: Add tests for XDP chain calls


 include/linux/bpf.h                             |   10 +
 include/linux/bpf_types.h                       |    1 
 include/linux/filter.h                          |   26 ++
 include/linux/netdevice.h                       |    3 
 include/uapi/linux/bpf.h                        |   12 +
 include/uapi/linux/if_link.h                    |    2 
 kernel/bpf/hashtab.c                            |  169 +++++++++++++-
 kernel/bpf/map_in_map.c                         |    7 +
 kernel/bpf/map_in_map.h                         |    1 
 kernel/bpf/syscall.c                            |   11 +
 net/core/dev.c                                  |   42 +++-
 net/core/rtnetlink.c                            |   23 ++
 tools/bpf/bpftool/Documentation/bpftool-map.rst |    4 
 tools/bpf/bpftool/bash-completion/bpftool       |    2 
 tools/bpf/bpftool/map.c                         |    3 
 tools/include/uapi/linux/bpf.h                  |   12 +
 tools/include/uapi/linux/if_link.h              |    2 
 tools/lib/bpf/libbpf.h                          |    4 
 tools/lib/bpf/libbpf.map                        |    2 
 tools/lib/bpf/libbpf_probes.c                   |    4 
 tools/lib/bpf/netlink.c                         |   49 ++++
 tools/testing/selftests/bpf/.gitignore          |    1 
 tools/testing/selftests/bpf/Makefile            |    3 
 tools/testing/selftests/bpf/progs/xdp_dummy.c   |    6 +
 tools/testing/selftests/bpf/test_maps.c         |   45 ++++
 tools/testing/selftests/bpf/test_xdp_chain.sh   |   77 +++++++
 tools/testing/selftests/bpf/xdp_chain.c         |  271 +++++++++++++++++++++++
 27 files changed, 765 insertions(+), 27 deletions(-)
 create mode 100755 tools/testing/selftests/bpf/test_xdp_chain.sh
 create mode 100644 tools/testing/selftests/bpf/xdp_chain.c


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH bpf-next 1/9] hashtab: Add new bpf_map_fd_put_value op
  2019-10-02 13:30 [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls Toke Høiland-Jørgensen
@ 2019-10-02 13:30 ` Toke Høiland-Jørgensen
  2019-10-02 13:30 ` [PATCH bpf-next 2/9] xdp: Add new xdp_chain_map type for specifying XDP call sequences Toke Høiland-Jørgensen
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 13:30 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

The fd-type maps all resolve an fd into a pointer to the underlying object
when it is inserted into the map, and store that pointer as the real map
value. The htab code assumes that the map value is this single pointer, and
dereferences it before passing it to the map fd_put_ptr() op.

For xdp chain maps we want to be able to store multiple pointers, so we
need to get the pointer to the map value store, not the dereferenced
pointer to the actual object. So add a new more general
bpf_map_fd_put_value() op that takes the map value instead of the
dereferenced pointer, and use this on map element free.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/bpf.h     |    1 +
 kernel/bpf/hashtab.c    |   16 ++++++----------
 kernel/bpf/map_in_map.c |    7 +++++++
 kernel/bpf/map_in_map.h |    1 +
 4 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5b9d22338606..be3e9e9109c7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -50,6 +50,7 @@ struct bpf_map_ops {
 	void *(*map_fd_get_ptr)(struct bpf_map *map, struct file *map_file,
 				int fd);
 	void (*map_fd_put_ptr)(void *ptr);
+	void (*map_fd_put_value)(void *value);
 	u32 (*map_gen_lookup)(struct bpf_map *map, struct bpf_insn *insn_buf);
 	u32 (*map_fd_sys_lookup_elem)(void *ptr);
 	void (*map_seq_show_elem)(struct bpf_map *map, void *key,
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 22066a62c8c9..113e1286e184 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -87,11 +87,6 @@ static inline void __percpu *htab_elem_get_ptr(struct htab_elem *l, u32 key_size
 	return *(void __percpu **)(l->key + key_size);
 }
 
-static void *fd_htab_map_get_ptr(const struct bpf_map *map, struct htab_elem *l)
-{
-	return *(void **)(l->key + roundup(map->key_size, 8));
-}
-
 static struct htab_elem *get_htab_elem(struct bpf_htab *htab, int i)
 {
 	return (struct htab_elem *) (htab->elems + i * htab->elem_size);
@@ -679,10 +674,10 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 {
 	struct bpf_map *map = &htab->map;
 
-	if (map->ops->map_fd_put_ptr) {
-		void *ptr = fd_htab_map_get_ptr(map, l);
+	if (map->ops->map_fd_put_value) {
+		void *value = l->key + round_up(map->key_size, 8);
 
-		map->ops->map_fd_put_ptr(ptr);
+		map->ops->map_fd_put_value(value);
 	}
 
 	if (htab_is_prealloc(htab)) {
@@ -1400,9 +1395,9 @@ static void fd_htab_map_free(struct bpf_map *map)
 		head = select_bucket(htab, i);
 
 		hlist_nulls_for_each_entry_safe(l, n, head, hash_node) {
-			void *ptr = fd_htab_map_get_ptr(map, l);
+			void *value = l->key + round_up(map->key_size, 8);
 
-			map->ops->map_fd_put_ptr(ptr);
+			map->ops->map_fd_put_value(value);
 		}
 	}
 
@@ -1510,6 +1505,7 @@ const struct bpf_map_ops htab_of_maps_map_ops = {
 	.map_delete_elem = htab_map_delete_elem,
 	.map_fd_get_ptr = bpf_map_fd_get_ptr,
 	.map_fd_put_ptr = bpf_map_fd_put_ptr,
+	.map_fd_put_value = bpf_map_fd_put_value,
 	.map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem,
 	.map_gen_lookup = htab_of_map_gen_lookup,
 	.map_check_btf = map_check_no_btf,
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index fab4fb134547..1b4e8b6da777 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -114,6 +114,13 @@ void bpf_map_fd_put_ptr(void *ptr)
 	bpf_map_put(ptr);
 }
 
+void bpf_map_fd_put_value(void *value)
+{
+	void **ptr = value;
+
+	bpf_map_fd_put_ptr(*ptr);
+}
+
 u32 bpf_map_fd_sys_lookup_elem(void *ptr)
 {
 	return ((struct bpf_map *)ptr)->id;
diff --git a/kernel/bpf/map_in_map.h b/kernel/bpf/map_in_map.h
index a507bf6ef8b9..68d1a52e1757 100644
--- a/kernel/bpf/map_in_map.h
+++ b/kernel/bpf/map_in_map.h
@@ -16,6 +16,7 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
 void *bpf_map_fd_get_ptr(struct bpf_map *map, struct file *map_file,
 			 int ufd);
 void bpf_map_fd_put_ptr(void *ptr);
+void bpf_map_fd_put_value(void *value);
 u32 bpf_map_fd_sys_lookup_elem(void *ptr);
 
 #endif



* [PATCH bpf-next 2/9] xdp: Add new xdp_chain_map type for specifying XDP call sequences
  2019-10-02 13:30 [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls Toke Høiland-Jørgensen
  2019-10-02 13:30 ` [PATCH bpf-next 1/9] hashtab: Add new bpf_map_fd_put_value op Toke Høiland-Jørgensen
@ 2019-10-02 13:30 ` Toke Høiland-Jørgensen
  2019-10-02 15:50   ` Lorenz Bauer
  2019-10-02 13:30 ` [PATCH bpf-next 3/9] xdp: Support setting and getting device chain map Toke Høiland-Jørgensen
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 13:30 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

To specify call sequences for XDP programs, we need to do a lookup after an
XDP program runs to get the next program to run. Introduce a new map type
for this use which allows specifying one action per return code of the
original program.

The new map type is derived from the hashmap type, and uses the program ID
as the lookup key. The value is a struct of program pointers specifying the
next program in the call sequence for each return code of the previous
program. A special wildcard pointer can also be specified which will match
on all program return codes if no more specific entry is included in the
structure (i.e., LPM-style matching).

Userspace fills in a struct of program fds and passes that as the value,
and the map converts the fds to pointers before inserting the structure
into the map. Using the map type directly from an eBPF program is
disallowed; instead, a subsequent commit will add lookup code to the XDP
dispatch function, and the ability for userspace to set the current chain
call map when attaching an XDP program to an interface.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/bpf.h       |    9 +++
 include/linux/bpf_types.h |    1 
 include/uapi/linux/bpf.h  |   12 ++++
 kernel/bpf/hashtab.c      |  153 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c      |   11 +++
 5 files changed, 186 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index be3e9e9109c7..e72702c4cb12 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -689,6 +689,15 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
 				void *key, void *value, u64 map_flags);
 int bpf_fd_htab_map_lookup_elem(struct bpf_map *map, void *key, u32 *value);
 
+int bpf_xdp_chain_map_update_elem(struct bpf_map *map, void *key, void *value,
+				  u64 map_flags);
+int bpf_xdp_chain_map_lookup_elem(struct bpf_map *map, void *key, void *value);
+struct bpf_prog *bpf_xdp_chain_map_get_prog(struct bpf_map *map,
+					    u32 prev_id,
+					    enum xdp_action action);
+
+
+
 int bpf_get_file_flag(int flags);
 int bpf_check_uarg_tail_zero(void __user *uaddr, size_t expected_size,
 			     size_t actual_size);
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 36a9c2325176..95650ed86a12 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -63,6 +63,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_HASH_OF_MAPS, htab_of_maps_map_ops)
 #ifdef CONFIG_NET
 BPF_MAP_TYPE(BPF_MAP_TYPE_DEVMAP, dev_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_DEVMAP_HASH, dev_map_hash_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_XDP_CHAIN, xdp_chain_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_SK_STORAGE, sk_storage_map_ops)
 #if defined(CONFIG_BPF_STREAM_PARSER)
 BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 77c6be96d676..8b336fb68880 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -136,6 +136,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_STACK,
 	BPF_MAP_TYPE_SK_STORAGE,
 	BPF_MAP_TYPE_DEVMAP_HASH,
+	BPF_MAP_TYPE_XDP_CHAIN,
 };
 
 /* Note that tracing related programs such as
@@ -3153,6 +3154,17 @@ enum xdp_action {
 	XDP_PASS,
 	XDP_TX,
 	XDP_REDIRECT,
+
+	__XDP_ACT_MAX /* leave at end */
+};
+#define XDP_ACT_MAX	(__XDP_ACT_MAX - 1)
+
+struct xdp_chain_acts {
+	__u32 wildcard_act;
+	__u32 drop_act;
+	__u32 pass_act;
+	__u32 tx_act;
+	__u32 redirect_act;
 };
 
 /* user accessible metadata for XDP packet hook
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 113e1286e184..ab855095c830 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1510,3 +1510,156 @@ const struct bpf_map_ops htab_of_maps_map_ops = {
 	.map_gen_lookup = htab_of_map_gen_lookup,
 	.map_check_btf = map_check_no_btf,
 };
+
+struct xdp_chain_table {
+	struct bpf_prog *wildcard_act;
+	struct bpf_prog *act[XDP_ACT_MAX];
+};
+
+static int xdp_chain_map_alloc_check(union bpf_attr *attr)
+{
+	BUILD_BUG_ON(sizeof(struct xdp_chain_table) / sizeof(void *) !=
+		     sizeof(struct xdp_chain_acts) / sizeof(u32));
+
+	if (attr->key_size != sizeof(u32) ||
+	    attr->value_size != sizeof(struct xdp_chain_acts))
+		return -EINVAL;
+
+	attr->value_size = sizeof(struct xdp_chain_table);
+	return htab_map_alloc_check(attr);
+}
+
+struct bpf_prog *bpf_xdp_chain_map_get_prog(struct bpf_map *map,
+					    u32 prev_id,
+					    enum xdp_action action)
+{
+	struct xdp_chain_table *tab;
+	void *ptr;
+
+	ptr = htab_map_lookup_elem(map, &prev_id);
+
+	if (!ptr)
+		return NULL;
+
+	tab = READ_ONCE(ptr);
+	return tab->act[action - 1] ?: tab->wildcard_act;
+}
+EXPORT_SYMBOL_GPL(bpf_xdp_chain_map_get_prog);
+
+/* only called from syscall */
+int bpf_xdp_chain_map_lookup_elem(struct bpf_map *map, void *key, void *value)
+{
+	struct xdp_chain_acts *act = value;
+	struct xdp_chain_table *tab;
+	void *ptr;
+	u32 *cur;
+	int i;
+
+	ptr = htab_map_lookup_elem(map, key);
+	if (!ptr)
+		return -ENOENT;
+
+	tab = READ_ONCE(ptr);
+
+	if (tab->wildcard_act)
+		act->wildcard_act = tab->wildcard_act->aux->id;
+
+	cur = &act->drop_act;
+	for (i = 0; i < XDP_ACT_MAX; i++, cur++)
+		if (tab->act[i])
+			*cur = tab->act[i]->aux->id;
+
+	return 0;
+}
+
+static void *xdp_chain_map_get_ptr(int fd)
+{
+	struct bpf_prog *prog = bpf_prog_get(fd);
+
+	if (IS_ERR(prog))
+		return prog;
+
+	if (prog->type != BPF_PROG_TYPE_XDP ||
+	    bpf_prog_is_dev_bound(prog->aux)) {
+		bpf_prog_put(prog);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return prog;
+}
+
+static void xdp_chain_map_put_ptrs(void *value)
+{
+	struct xdp_chain_table *tab = value;
+	int i;
+
+	for (i = 0; i < XDP_ACT_MAX; i++)
+		if (tab->act[i])
+			bpf_prog_put(tab->act[i]);
+
+	if (tab->wildcard_act)
+		bpf_prog_put(tab->wildcard_act);
+}
+
+/* only called from syscall */
+int bpf_xdp_chain_map_update_elem(struct bpf_map *map, void *key, void *value,
+				  u64 map_flags)
+{
+	struct xdp_chain_acts *act = value;
+	struct xdp_chain_table tab = {};
+	u32 lookup_key = *(u32 *)key;
+	u32 *cur = &act->drop_act;
+	bool found_val = false;
+	int ret, i;
+	void *ptr;
+
+	if (!lookup_key)
+		return -EINVAL;
+
+	if (act->wildcard_act) {
+		ptr = xdp_chain_map_get_ptr(act->wildcard_act);
+		if (IS_ERR(ptr))
+			return PTR_ERR(ptr);
+		tab.wildcard_act = ptr;
+		found_val = true;
+	}
+
+	for (i = 0; i < XDP_ACT_MAX; i++, cur++) {
+		if (*cur) {
+			ptr = xdp_chain_map_get_ptr(*cur);
+			if (IS_ERR(ptr)) {
+				ret = PTR_ERR(ptr);
+				goto out_err;
+			}
+			tab.act[i] = ptr;
+			found_val = true;
+		}
+	}
+
+	if (!found_val) {
+		ret = -EINVAL;
+		goto out_err;
+	}
+
+	ret = htab_map_update_elem(map, key, &tab, map_flags);
+	if (ret)
+		goto out_err;
+
+	return ret;
+
+out_err:
+	xdp_chain_map_put_ptrs(&tab);
+
+	return ret;
+}
+
+
+const struct bpf_map_ops xdp_chain_map_ops = {
+	.map_alloc_check = xdp_chain_map_alloc_check,
+	.map_alloc = htab_map_alloc,
+	.map_free = fd_htab_map_free,
+	.map_get_next_key = htab_map_get_next_key,
+	.map_delete_elem = htab_map_delete_elem,
+	.map_fd_put_value = xdp_chain_map_put_ptrs,
+	.map_check_btf = map_check_no_btf,
+};
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 82eabd4e38ad..c9afaace048d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -778,6 +778,8 @@ static int map_lookup_elem(union bpf_attr *attr)
 		value_size = round_up(map->value_size, 8) * num_possible_cpus();
 	else if (IS_FD_MAP(map))
 		value_size = sizeof(u32);
+	else if (map->map_type == BPF_MAP_TYPE_XDP_CHAIN)
+		value_size = sizeof(struct xdp_chain_acts);
 	else
 		value_size = map->value_size;
 
@@ -806,6 +808,8 @@ static int map_lookup_elem(union bpf_attr *attr)
 		err = bpf_fd_array_map_lookup_elem(map, key, value);
 	} else if (IS_FD_HASH(map)) {
 		err = bpf_fd_htab_map_lookup_elem(map, key, value);
+	} else if (map->map_type == BPF_MAP_TYPE_XDP_CHAIN) {
+		err = bpf_xdp_chain_map_lookup_elem(map, key, value);
 	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
 		err = bpf_fd_reuseport_array_lookup_elem(map, key, value);
 	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
@@ -908,6 +912,8 @@ static int map_update_elem(union bpf_attr *attr)
 	    map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY ||
 	    map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE)
 		value_size = round_up(map->value_size, 8) * num_possible_cpus();
+	else if (map->map_type == BPF_MAP_TYPE_XDP_CHAIN)
+		value_size = sizeof(struct xdp_chain_acts);
 	else
 		value_size = map->value_size;
 
@@ -954,6 +960,11 @@ static int map_update_elem(union bpf_attr *attr)
 		err = bpf_fd_htab_map_update_elem(map, f.file, key, value,
 						  attr->flags);
 		rcu_read_unlock();
+	} else if (map->map_type == BPF_MAP_TYPE_XDP_CHAIN) {
+		rcu_read_lock();
+		err = bpf_xdp_chain_map_update_elem(map, key, value,
+						    attr->flags);
+		rcu_read_unlock();
 	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
 		/* rcu_read_lock() is not needed */
 		err = bpf_fd_reuseport_array_update_elem(map, key, value,



* [PATCH bpf-next 3/9] xdp: Support setting and getting device chain map
  2019-10-02 13:30 [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls Toke Høiland-Jørgensen
  2019-10-02 13:30 ` [PATCH bpf-next 1/9] hashtab: Add new bpf_map_fd_put_value op Toke Høiland-Jørgensen
  2019-10-02 13:30 ` [PATCH bpf-next 2/9] xdp: Add new xdp_chain_map type for specifying XDP call sequences Toke Høiland-Jørgensen
@ 2019-10-02 13:30 ` Toke Høiland-Jørgensen
  2019-10-02 15:50   ` Lorenz Bauer
                     ` (2 more replies)
  2019-10-02 13:30 ` [PATCH bpf-next 4/9] xdp: Implement chain call logic to support multiple programs on one interface Toke Høiland-Jørgensen
                   ` (9 subsequent siblings)
  12 siblings, 3 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 13:30 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

This adds support to rtnetlink for setting and getting the per-device XDP
chain map. The map is set by means of a new netlink attribute containing an
fd that refers to a BPF map of the XDP chain type. If such an attribute is
included, it will be inserted into the struct net_device so that the XDP
chain call code will pick it up on program execution.

To prevent old userspace programs that do not understand the chain map
attribute from messing up the chain call order, a netlink message with no
chain map attribute set will be rejected if a chain map has already been
installed.

When installing a new chain call map, an XDP program fd must also be
provided, otherwise the operation will be rejected.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/netdevice.h    |    3 ++-
 include/uapi/linux/if_link.h |    2 ++
 net/core/dev.c               |   42 ++++++++++++++++++++++++++++++++++++------
 net/core/rtnetlink.c         |   23 +++++++++++++++++++++++
 4 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 48cc71aae466..60f3b510b175 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1941,6 +1941,7 @@ struct net_device {
 	unsigned int		real_num_rx_queues;
 
 	struct bpf_prog __rcu	*xdp_prog;
+	struct bpf_map __rcu	*xdp_chain_map;
 	unsigned long		gro_flush_timeout;
 	rx_handler_func_t __rcu	*rx_handler;
 	void __rcu		*rx_handler_data;
@@ -3702,7 +3703,7 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 
 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
-		      int fd, u32 flags);
+		      int prog_fd, int chain_map_fd, u32 flags);
 u32 __dev_xdp_query(struct net_device *dev, bpf_op_t xdp_op,
 		    enum bpf_netdev_command cmd);
 int xdp_umem_query(struct net_device *dev, u16 queue_id);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8aec8769d944..e7a704387608 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -976,6 +976,8 @@ enum {
 	IFLA_XDP_DRV_PROG_ID,
 	IFLA_XDP_SKB_PROG_ID,
 	IFLA_XDP_HW_PROG_ID,
+	IFLA_XDP_CHAIN_MAP_FD,
+	IFLA_XDP_CHAIN_MAP_ID,
 	__IFLA_XDP_MAX,
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 7a456c6a7ad8..0aa5106339e7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8177,9 +8177,15 @@ static int dev_xdp_install(struct net_device *dev, bpf_op_t bpf_op,
 
 static void dev_xdp_uninstall(struct net_device *dev)
 {
+	struct bpf_map *chain_map = NULL;
 	struct netdev_bpf xdp;
 	bpf_op_t ndo_bpf;
 
+	/* Remove chain map */
+	rcu_swap_protected(dev->xdp_chain_map, chain_map, 1);
+	if (chain_map)
+		bpf_map_put_with_uref(chain_map);
+
 	/* Remove generic XDP */
 	WARN_ON(dev_xdp_install(dev, generic_xdp_install, NULL, 0, NULL));
 
@@ -8207,15 +8213,17 @@ static void dev_xdp_uninstall(struct net_device *dev)
  *	dev_change_xdp_fd - set or clear a bpf program for a device rx path
  *	@dev: device
  *	@extack: netlink extended ack
- *	@fd: new program fd or negative value to clear
+ *	@prog_fd: new program fd or negative value to clear
+ *	@chain_map_fd: new chain map fd or negative value to clear
  *	@flags: xdp-related flags
  *
  *	Set or clear a bpf program for a device
  */
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
-		      int fd, u32 flags)
+		      int prog_fd, int chain_map_fd, u32 flags)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
+	struct bpf_map *chain_map = NULL;
 	enum bpf_netdev_command query;
 	struct bpf_prog *prog = NULL;
 	bpf_op_t bpf_op, bpf_chk;
@@ -8237,7 +8245,7 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	if (bpf_op == bpf_chk)
 		bpf_chk = generic_xdp_install;
 
-	if (fd >= 0) {
+	if (prog_fd >= 0) {
 		u32 prog_id;
 
 		if (!offload && __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG)) {
@@ -8251,7 +8259,7 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 			return -EBUSY;
 		}
 
-		prog = bpf_prog_get_type_dev(fd, BPF_PROG_TYPE_XDP,
+		prog = bpf_prog_get_type_dev(prog_fd, BPF_PROG_TYPE_XDP,
 					     bpf_op == ops->ndo_bpf);
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
@@ -8267,13 +8275,35 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 			return 0;
 		}
 	} else {
+		if (chain_map_fd >= 0)
+			return -EINVAL;
+
 		if (!__dev_xdp_query(dev, bpf_op, query))
 			return 0;
 	}
 
+	if (chain_map_fd >= 0) {
+		chain_map = bpf_map_get_with_uref(chain_map_fd);
+		if (IS_ERR(chain_map))
+			return PTR_ERR(chain_map);
+
+		if (chain_map->map_type != BPF_MAP_TYPE_XDP_CHAIN) {
+			NL_SET_ERR_MSG(extack, "invalid chain map type");
+			bpf_map_put_with_uref(chain_map);
+			return -EINVAL;
+		}
+	}
+
 	err = dev_xdp_install(dev, bpf_op, extack, flags, prog);
-	if (err < 0 && prog)
-		bpf_prog_put(prog);
+	if (err < 0) {
+		if (prog)
+			bpf_prog_put(prog);
+	} else {
+		rcu_swap_protected(dev->xdp_chain_map, chain_map, 1);
+	}
+
+	if (chain_map)
+		bpf_map_put_with_uref(chain_map);
 
 	return err;
 }
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 49fa910b58af..d6123efdff80 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1427,6 +1427,7 @@ static int rtnl_xdp_report_one(struct sk_buff *skb, struct net_device *dev,
 
 static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
 {
+	struct bpf_map *chain_map;
 	struct nlattr *xdp;
 	u32 prog_id;
 	int err;
@@ -1461,6 +1462,13 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
 			goto err_cancel;
 	}
 
+	chain_map = rcu_dereference(dev->xdp_chain_map);
+	if (chain_map) {
+		err = nla_put_u32(skb, IFLA_XDP_CHAIN_MAP_ID, chain_map->id);
+		if (err)
+			goto err_cancel;
+	}
+
 	nla_nest_end(skb, xdp);
 	return 0;
 
@@ -2756,12 +2764,27 @@ static int do_setlink(const struct sk_buff *skb,
 		}
 
 		if (xdp[IFLA_XDP_FD]) {
+			int chain_map_fd = -1;
+
+			if (xdp[IFLA_XDP_CHAIN_MAP_FD]) {
+				chain_map_fd = nla_get_s32(xdp[IFLA_XDP_CHAIN_MAP_FD]);
+			} else if (dev->xdp_chain_map) {
+				NL_SET_ERR_MSG(extack, "no chain map attribute, but chain map loaded");
+				err = -EINVAL;
+				goto errout;
+			}
+
 			err = dev_change_xdp_fd(dev, extack,
 						nla_get_s32(xdp[IFLA_XDP_FD]),
+						chain_map_fd,
 						xdp_flags);
 			if (err)
 				goto errout;
 			status |= DO_SETLINK_NOTIFY;
+		} else if (xdp[IFLA_XDP_CHAIN_MAP_FD]) {
+			err = -EINVAL;
+			NL_SET_ERR_MSG(extack, "chain map attribute invalid without prog fd");
+			goto errout;
 		}
 	}
 



* [PATCH bpf-next 4/9] xdp: Implement chain call logic to support multiple programs on one interface
  2019-10-02 13:30 [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls Toke Høiland-Jørgensen
                   ` (2 preceding siblings ...)
  2019-10-02 13:30 ` [PATCH bpf-next 3/9] xdp: Support setting and getting device chain map Toke Høiland-Jørgensen
@ 2019-10-02 13:30 ` Toke Høiland-Jørgensen
  2019-10-02 17:33   ` kbuild test robot
  2019-10-02 17:53   ` kbuild test robot
  2019-10-02 13:30 ` [PATCH bpf-next 5/9] tools/include/uapi: Add XDP chain map definitions Toke Høiland-Jørgensen
                   ` (8 subsequent siblings)
  12 siblings, 2 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 13:30 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

This adds support for executing multiple XDP programs on a single interface using
the chain call map type introduced in the previous commits. The logic is
added as an extension of bpf_prog_run_xdp() which will loop through the
call sequence specified by the chain call map installed on the current
interface. The overhead when no chain map is installed is only a single
pointer dereference.

The total call sequence length is limited to 32 programs, and the call
sequence will be aborted and XDP_ABORTED returned if it is exceeded.
Likewise, if a program in the sequence returns XDP_ABORTED, the whole
sequence will be aborted immediately, on the assumption that this is a
fault somewhere in the system.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/filter.h |   26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 2ce57645f3cd..8a79ddd4f7b5 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -693,6 +693,7 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
 	return res;
 }
 
+#define BPF_XDP_MAX_CHAIN_CALLS 32
 static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 					    struct xdp_buff *xdp)
 {
@@ -702,7 +703,30 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 	 * already takes rcu_read_lock() when fetching the program, so
 	 * it's not necessary here anymore.
 	 */
-	return BPF_PROG_RUN(prog, xdp);
+
+	int i = BPF_XDP_MAX_CHAIN_CALLS;
+	struct bpf_map *chain_map;
+	u32 ret;
+
+	chain_map = rcu_dereference(xdp->rxq->dev->xdp_chain_map);
+	if (!chain_map)
+		return BPF_PROG_RUN(prog, xdp);
+
+	do {
+		if (!--i) {
+			ret = XDP_ABORTED;
+			goto out;
+		}
+
+		ret = BPF_PROG_RUN(prog, xdp);
+		if (ret == XDP_ABORTED)
+			goto out;
+
+		prog = bpf_xdp_chain_map_get_prog(chain_map, prog->aux->id, ret);
+	} while (prog);
+
+out:
+	return ret;
 }
 
 static inline u32 bpf_prog_insn_size(const struct bpf_prog *prog)


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH bpf-next 5/9] tools/include/uapi: Add XDP chain map definitions
  2019-10-02 13:30 [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls Toke Høiland-Jørgensen
                   ` (3 preceding siblings ...)
  2019-10-02 13:30 ` [PATCH bpf-next 4/9] xdp: Implement chain call logic to support multiple programs on one interface Toke Høiland-Jørgensen
@ 2019-10-02 13:30 ` Toke Høiland-Jørgensen
  2019-10-02 13:30 ` [PATCH bpf-next 6/9] tools/libbpf_probes: Add support for xdp_chain map type Toke Høiland-Jørgensen
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 13:30 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

This syncs the XDP chain-map related UAPI definitions into
tools/include/uapi.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 tools/include/uapi/linux/bpf.h     |   12 ++++++++++++
 tools/include/uapi/linux/if_link.h |    2 ++
 2 files changed, 14 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 77c6be96d676..8b336fb68880 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -136,6 +136,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_STACK,
 	BPF_MAP_TYPE_SK_STORAGE,
 	BPF_MAP_TYPE_DEVMAP_HASH,
+	BPF_MAP_TYPE_XDP_CHAIN,
 };
 
 /* Note that tracing related programs such as
@@ -3153,6 +3154,17 @@ enum xdp_action {
 	XDP_PASS,
 	XDP_TX,
 	XDP_REDIRECT,
+
+	__XDP_ACT_MAX /* leave at end */
+};
+#define XDP_ACT_MAX	(__XDP_ACT_MAX - 1)
+
+struct xdp_chain_acts {
+	__u32 wildcard_act;
+	__u32 drop_act;
+	__u32 pass_act;
+	__u32 tx_act;
+	__u32 redirect_act;
 };
 
 /* user accessible metadata for XDP packet hook
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index 4a8c02cafa9a..7387d2371489 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -974,6 +974,8 @@ enum {
 	IFLA_XDP_DRV_PROG_ID,
 	IFLA_XDP_SKB_PROG_ID,
 	IFLA_XDP_HW_PROG_ID,
+	IFLA_XDP_CHAIN_MAP_FD,
+	IFLA_XDP_CHAIN_MAP_ID,
 	__IFLA_XDP_MAX,
 };
 



* [PATCH bpf-next 6/9] tools/libbpf_probes: Add support for xdp_chain map type
  2019-10-02 13:30 [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls Toke Høiland-Jørgensen
                   ` (4 preceding siblings ...)
  2019-10-02 13:30 ` [PATCH bpf-next 5/9] tools/include/uapi: Add XDP chain map definitions Toke Høiland-Jørgensen
@ 2019-10-02 13:30 ` Toke Høiland-Jørgensen
  2019-10-02 13:30 ` [PATCH bpf-next 7/9] bpftool: Add definitions " Toke Høiland-Jørgensen
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 13:30 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

This adds support for the BPF_MAP_TYPE_XDP_CHAIN map type to
libbpf_probes.c.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 tools/lib/bpf/libbpf_probes.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index 4b0b0364f5fc..266ae6e78f88 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -231,6 +231,10 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex)
 		if (btf_fd < 0)
 			return false;
 		break;
+	case BPF_MAP_TYPE_XDP_CHAIN:
+		key_size = sizeof(__u32);
+		value_size = sizeof(struct xdp_chain_acts);
+		break;
 	case BPF_MAP_TYPE_UNSPEC:
 	case BPF_MAP_TYPE_HASH:
 	case BPF_MAP_TYPE_ARRAY:



* [PATCH bpf-next 7/9] bpftool: Add definitions for xdp_chain map type
  2019-10-02 13:30 [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls Toke Høiland-Jørgensen
                   ` (5 preceding siblings ...)
  2019-10-02 13:30 ` [PATCH bpf-next 6/9] tools/libbpf_probes: Add support for xdp_chain map type Toke Høiland-Jørgensen
@ 2019-10-02 13:30 ` Toke Høiland-Jørgensen
  2019-10-02 13:30 ` [PATCH bpf-next 8/9] libbpf: Add support for setting and getting XDP chain maps Toke Høiland-Jørgensen
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 13:30 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

This adds bash completion and map type definitions for the xdp_chain map
type to bpftool.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 tools/bpf/bpftool/Documentation/bpftool-map.rst |    4 ++--
 tools/bpf/bpftool/bash-completion/bpftool       |    2 +-
 tools/bpf/bpftool/map.c                         |    3 ++-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index 1c0f7146aab0..9aefcef50b53 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -47,8 +47,8 @@ MAP COMMANDS
 |	*TYPE* := { **hash** | **array** | **prog_array** | **perf_event_array** | **percpu_hash**
 |		| **percpu_array** | **stack_trace** | **cgroup_array** | **lru_hash**
 |		| **lru_percpu_hash** | **lpm_trie** | **array_of_maps** | **hash_of_maps**
-|		| **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash**
-|		| **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
+|		| **devmap** | **devmap_hash** | **xdp_chain** | **sockmap** | **cpumap** | **xskmap**
+|		| **sockhash** | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
 |		| **queue** | **stack** }
 
 DESCRIPTION
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 70493a6da206..95f19191b8d1 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -489,7 +489,7 @@ _bpftool()
                                 perf_event_array percpu_hash percpu_array \
                                 stack_trace cgroup_array lru_hash \
                                 lru_percpu_hash lpm_trie array_of_maps \
-                                hash_of_maps devmap devmap_hash sockmap cpumap \
+                                hash_of_maps devmap devmap_hash xdp_chain sockmap cpumap \
                                 xskmap sockhash cgroup_storage reuseport_sockarray \
                                 percpu_cgroup_storage queue stack' -- \
                                                    "$cur" ) )
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index de61d73b9030..97b5b42df79c 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -38,6 +38,7 @@ const char * const map_type_name[] = {
 	[BPF_MAP_TYPE_HASH_OF_MAPS]		= "hash_of_maps",
 	[BPF_MAP_TYPE_DEVMAP]			= "devmap",
 	[BPF_MAP_TYPE_DEVMAP_HASH]		= "devmap_hash",
+	[BPF_MAP_TYPE_XDP_CHAIN]		= "xdp_chain",
 	[BPF_MAP_TYPE_SOCKMAP]			= "sockmap",
 	[BPF_MAP_TYPE_CPUMAP]			= "cpumap",
 	[BPF_MAP_TYPE_XSKMAP]			= "xskmap",
@@ -1326,7 +1327,7 @@ static int do_help(int argc, char **argv)
 		"       TYPE := { hash | array | prog_array | perf_event_array | percpu_hash |\n"
 		"                 percpu_array | stack_trace | cgroup_array | lru_hash |\n"
 		"                 lru_percpu_hash | lpm_trie | array_of_maps | hash_of_maps |\n"
-		"                 devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n"
+		"                 devmap | devmap_hash | xdp_chain | sockmap | cpumap | xskmap | sockhash |\n"
 		"                 cgroup_storage | reuseport_sockarray | percpu_cgroup_storage }\n"
 		"       " HELP_SPEC_OPTIONS "\n"
 		"",



* [PATCH bpf-next 8/9] libbpf: Add support for setting and getting XDP chain maps
  2019-10-02 13:30 [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls Toke Høiland-Jørgensen
                   ` (6 preceding siblings ...)
  2019-10-02 13:30 ` [PATCH bpf-next 7/9] bpftool: Add definitions " Toke Høiland-Jørgensen
@ 2019-10-02 13:30 ` Toke Høiland-Jørgensen
  2019-10-02 13:30 ` [PATCH bpf-next 9/9] selftests: Add tests for XDP chain calls Toke Høiland-Jørgensen
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 13:30 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

This adds two new API functions for setting and getting XDP programs along
with their associated chain map IDs. Callers are expected to pair them: a
program that uses the chain map-aware setter should also use the
corresponding getter.

Programs using the old non-chain aware variants of the functions will not
set the XDP chain map attribute on the netlink message, resulting in the
kernel rejecting the command if a chain map has already been loaded.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 tools/lib/bpf/libbpf.h   |    4 ++++
 tools/lib/bpf/libbpf.map |    2 ++
 tools/lib/bpf/netlink.c  |   49 ++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index e8f70977d137..0a459840e32c 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -357,7 +357,11 @@ LIBBPF_API int bpf_prog_load(const char *file, enum bpf_prog_type type,
 			     struct bpf_object **pobj, int *prog_fd);
 
 LIBBPF_API int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags);
+LIBBPF_API int bpf_set_link_xdp_chain(int ifindex, int prog_fd, int chain_map_fd,
+				      __u32 flags);
 LIBBPF_API int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags);
+LIBBPF_API int bpf_get_link_xdp_chain(int ifindex, __u32 *prog_id, __u32 *chain_map_id,
+				      __u32 flags);
 
 struct perf_buffer;
 
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 8d10ca03d78d..59f412680292 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -192,4 +192,6 @@ LIBBPF_0.0.5 {
 } LIBBPF_0.0.4;
 
 LIBBPF_0.0.6 {
+		bpf_set_link_xdp_chain;
+		bpf_get_link_xdp_chain;
 } LIBBPF_0.0.5;
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index ce3ec81b71c0..c6f63bdab2e6 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -25,6 +25,7 @@ struct xdp_id_md {
 	int ifindex;
 	__u32 flags;
 	__u32 id;
+	__u32 chain_map_id;
 };
 
 int libbpf_netlink_open(__u32 *nl_pid)
@@ -128,7 +129,8 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
 	return ret;
 }
 
-int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
+static int __bpf_set_link_xdp_fd(int ifindex, int *prog_fd, int *chain_map_fd,
+				 __u32 flags)
 {
 	int sock, seq = 0, ret;
 	struct nlattr *nla, *nla_xdp;
@@ -162,9 +164,19 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
 	nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
 	nla_xdp->nla_type = IFLA_XDP_FD;
 	nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
-	memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
+	memcpy((char *)nla_xdp + NLA_HDRLEN, prog_fd, sizeof(*prog_fd));
 	nla->nla_len += nla_xdp->nla_len;
 
+	if (chain_map_fd) {
+		/* add XDP chain map */
+		nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
+		nla_xdp->nla_type = IFLA_XDP_CHAIN_MAP_FD;
+		nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
+		memcpy((char *)nla_xdp + NLA_HDRLEN, chain_map_fd,
+		       sizeof(*chain_map_fd));
+		nla->nla_len += nla_xdp->nla_len;
+	}
+
 	/* if user passed in any flags, add those too */
 	if (flags) {
 		nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
@@ -187,6 +199,17 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
 	return ret;
 }
 
+int bpf_set_link_xdp_chain(int ifindex, int prog_fd, int chain_map_fd,
+			   __u32 flags)
+{
+	return __bpf_set_link_xdp_fd(ifindex, &prog_fd, &chain_map_fd, flags);
+}
+
+int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
+{
+	return __bpf_set_link_xdp_fd(ifindex, &fd, NULL, flags);
+}
+
 static int __dump_link_nlmsg(struct nlmsghdr *nlh,
 			     libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
 {
@@ -247,10 +270,13 @@ static int get_xdp_id(void *cookie, void *msg, struct nlattr **tb)
 
 	xdp_id->id = libbpf_nla_getattr_u32(xdp_tb[xdp_attr]);
 
+	if (xdp_tb[IFLA_XDP_CHAIN_MAP_ID])
+		xdp_id->chain_map_id = libbpf_nla_getattr_u32(xdp_tb[IFLA_XDP_CHAIN_MAP_ID]);
+
 	return 0;
 }
 
-int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags)
+static int __bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 *chain_map_id, __u32 flags)
 {
 	struct xdp_id_md xdp_id = {};
 	int sock, ret;
@@ -274,13 +300,28 @@ int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags)
 	xdp_id.flags = flags;
 
 	ret = libbpf_nl_get_link(sock, nl_pid, get_xdp_id, &xdp_id);
-	if (!ret)
+	if (!ret) {
 		*prog_id = xdp_id.id;
 
+		if (chain_map_id)
+			*chain_map_id = xdp_id.chain_map_id;
+	}
+
 	close(sock);
 	return ret;
 }
 
+int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags)
+{
+	return __bpf_get_link_xdp_id(ifindex, prog_id, NULL, flags);
+}
+
+int bpf_get_link_xdp_chain(int ifindex, __u32 *prog_id, __u32 *chain_map_id,
+			   __u32 flags)
+{
+	return __bpf_get_link_xdp_id(ifindex, prog_id, chain_map_id, flags);
+}
+
 int libbpf_nl_get_link(int sock, unsigned int nl_pid,
 		       libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
 {



* [PATCH bpf-next 9/9] selftests: Add tests for XDP chain calls
  2019-10-02 13:30 [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls Toke Høiland-Jørgensen
                   ` (7 preceding siblings ...)
  2019-10-02 13:30 ` [PATCH bpf-next 8/9] libbpf: Add support for setting and getting XDP chain maps Toke Høiland-Jørgensen
@ 2019-10-02 13:30 ` Toke Høiland-Jørgensen
  2019-10-02 15:10 ` [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through " Alan Maguire
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 13:30 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

This adds new selftests for the XDP chain call functionality.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 tools/testing/selftests/bpf/.gitignore        |    1 
 tools/testing/selftests/bpf/Makefile          |    3 
 tools/testing/selftests/bpf/progs/xdp_dummy.c |    6 +
 tools/testing/selftests/bpf/test_maps.c       |   45 ++++
 tools/testing/selftests/bpf/test_xdp_chain.sh |   77 +++++++
 tools/testing/selftests/bpf/xdp_chain.c       |  271 +++++++++++++++++++++++++
 6 files changed, 402 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/bpf/test_xdp_chain.sh
 create mode 100644 tools/testing/selftests/bpf/xdp_chain.c

diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index 7470327edcfe..e9d2d765cc8f 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -39,3 +39,4 @@ libbpf.so.*
 test_hashmap
 test_btf_dump
 xdping
+xdp_chain
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 6889c19a628c..97e8f6ae4a15 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -29,7 +29,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
 	test_sock test_btf test_sockmap get_cgroup_id_user test_socket_cookie \
 	test_cgroup_storage test_select_reuseport test_section_names \
 	test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
-	test_btf_dump test_cgroup_attach xdping
+	test_btf_dump test_cgroup_attach xdping xdp_chain
 
 BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
 TEST_GEN_FILES = $(BPF_OBJ_FILES)
@@ -71,6 +71,7 @@ TEST_PROGS := test_kmod.sh \
 	test_tc_tunnel.sh \
 	test_tc_edt.sh \
 	test_xdping.sh \
+	test_xdp_chain.sh \
 	test_bpftool_build.sh
 
 TEST_PROGS_EXTENDED := with_addr.sh \
diff --git a/tools/testing/selftests/bpf/progs/xdp_dummy.c b/tools/testing/selftests/bpf/progs/xdp_dummy.c
index 43b0ef1001ed..454a1f0763a1 100644
--- a/tools/testing/selftests/bpf/progs/xdp_dummy.c
+++ b/tools/testing/selftests/bpf/progs/xdp_dummy.c
@@ -10,4 +10,10 @@ int xdp_dummy_prog(struct xdp_md *ctx)
 	return XDP_PASS;
 }
 
+SEC("xdp_drop")
+int xdp_drop_prog(struct xdp_md *ctx)
+{
+	return XDP_DROP;
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index e1f1becda529..44f2f8548a24 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -523,6 +523,50 @@ static void test_devmap_hash(unsigned int task, void *data)
 	close(fd);
 }
 
+static void test_xdp_chain_map(unsigned int task, void *data)
+{
+	const char *file = "./xdp_dummy.o";
+	struct xdp_chain_acts acts = {};
+	struct bpf_object *obj;
+	int map_fd, prog_fd;
+	__u32 key = 0;
+	int err;
+
+	map_fd = bpf_create_map(BPF_MAP_TYPE_XDP_CHAIN, sizeof(key), sizeof(acts),
+			    2, 0);
+	if (map_fd < 0) {
+		printf("Failed to create xdp_chain map '%s'!\n", strerror(errno));
+		exit(1);
+	}
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
+	if (err < 0) {
+		printf("Failed to load dummy prog: '%s'!\n", strerror(errno));
+		exit(1);
+	}
+
+	/* Try inserting NULL key/value - should fail */
+	assert(bpf_map_update_elem(map_fd, &key, &acts, 0) == -1 && errno == EINVAL);
+
+	/* Try inserting NULL value - should fail */
+	key = 1;
+	assert(bpf_map_update_elem(map_fd, &key, &acts, 0) == -1 && errno == EINVAL);
+
+	/* Try a real insert - should succeed */
+	acts.wildcard_act = prog_fd;
+	assert(bpf_map_update_elem(map_fd, &key, &acts, 0) == 0);
+
+	/* Replace with full table */
+	acts.drop_act = acts.pass_act = acts.tx_act = acts.redirect_act = prog_fd;
+	assert(bpf_map_update_elem(map_fd, &key, &acts, 0) == 0);
+
+	/* Try deleting element */
+	assert(bpf_map_delete_elem(map_fd, &key) == 0);
+
+	bpf_object__close(obj);
+	close(map_fd);
+}
+
 static void test_queuemap(unsigned int task, void *data)
 {
 	const int MAP_SIZE = 32;
@@ -1700,6 +1744,7 @@ static void run_all_tests(void)
 
 	test_devmap(0, NULL);
 	test_devmap_hash(0, NULL);
+	test_xdp_chain_map(0, NULL);
 	test_sockmap(0, NULL);
 
 	test_map_large();
diff --git a/tools/testing/selftests/bpf/test_xdp_chain.sh b/tools/testing/selftests/bpf/test_xdp_chain.sh
new file mode 100755
index 000000000000..3997655d4e45
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_xdp_chain.sh
@@ -0,0 +1,77 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# xdp_chain tests
+#   Here we setup and teardown configuration required to run
+#   xdp_chain, exercising its options.
+#
+#   Setup is similar to xdping tests.
+#
+# Topology:
+# ---------
+#     root namespace   |     xdp_ns0 namespace
+#                      |
+#      ----------      |     ----------
+#      |  veth1  | --------- |  veth0  |
+#      ----------    peer    ----------
+#
+# Device Configuration
+# --------------------
+# Root namespace with BPF
+# Device names and addresses:
+#	veth1 IP: 10.1.1.200
+#
+# Namespace xdp_ns0 with BPF
+# Device names and addresses:
+#       veth0 IPv4: 10.1.1.100
+#	xdp_chain binary run inside this
+#
+
+readonly TARGET_IP="10.1.1.100"
+readonly TARGET_NS="xdp_ns0"
+
+readonly LOCAL_IP="10.1.1.200"
+
+setup()
+{
+	ip netns add $TARGET_NS
+	ip link add veth0 type veth peer name veth1
+	ip link set veth0 netns $TARGET_NS
+	ip netns exec $TARGET_NS ip addr add ${TARGET_IP}/24 dev veth0
+	ip addr add ${LOCAL_IP}/24 dev veth1
+	ip netns exec $TARGET_NS ip link set veth0 up
+	ip link set veth1 up
+}
+
+cleanup()
+{
+	set +e
+	ip netns delete $TARGET_NS 2>/dev/null
+	ip link del veth1 2>/dev/null
+}
+
+die()
+{
+        echo "$@" >&2
+        exit 1
+}
+
+test()
+{
+	args="$1"
+
+	ip netns exec $TARGET_NS ./xdp_chain $args || die "XDP chain test error"
+}
+
+set -e
+
+server_pid=0
+
+trap cleanup EXIT
+
+setup
+
+test "-I veth0 -S $LOCAL_IP"
+
+echo "OK. All tests passed"
+exit 0
diff --git a/tools/testing/selftests/bpf/xdp_chain.c b/tools/testing/selftests/bpf/xdp_chain.c
new file mode 100644
index 000000000000..11f78bb1c2e7
--- /dev/null
+++ b/tools/testing/selftests/bpf/xdp_chain.c
@@ -0,0 +1,271 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved. */
+
+#include <linux/bpf.h>
+#include <linux/if_link.h>
+#include <arpa/inet.h>
+#include <assert.h>
+#include <errno.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <libgen.h>
+#include <sys/resource.h>
+#include <net/if.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <netdb.h>
+
+#include "bpf/bpf.h"
+#include "bpf/libbpf.h"
+
+static int ifindex;
+static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+static char *dest = NULL, *ifname = NULL;
+
+static void cleanup(int sig)
+{
+	int ret;
+
+	fprintf(stderr, "  Cleaning up\n");
+	if ((ret = bpf_set_link_xdp_chain(ifindex, -1, -1, xdp_flags)))
+		fprintf(stderr, "Warning: Unable to clear XDP prog: %s\n",
+			strerror(-ret));
+	if (sig)
+		exit(1);
+}
+
+static void show_usage(const char *prog)
+{
+	fprintf(stderr,
+		"usage: %s [OPTS] -I interface destination\n\n"
+		"OPTS:\n"
+		"    -I interface		interface name\n"
+		"    -N			Run in driver mode\n"
+		"    -S			Run in skb mode\n"
+		"    -p pin_path		path to pin chain call map\n"
+		"    -x			Exit after setup\n"
+		"    -c			Cleanup and exit\n",
+		prog);
+}
+
+static int run_ping(bool should_fail, const char *msg)
+{
+	char cmd[256];
+	bool success;
+	int ret;
+
+	snprintf(cmd, sizeof(cmd), "ping -c 1 -W 1 -I %s %s >/dev/null", ifname, dest);
+
+	printf("  %s: ", msg);
+
+	ret = system(cmd);
+
+	success = (!!ret == should_fail);
+	printf(success ? "PASS\n" : "FAIL\n");
+
+	return !success;
+}
+
+int main(int argc, char **argv)
+{
+	__u32 mode_flags = XDP_FLAGS_DRV_MODE | XDP_FLAGS_SKB_MODE;
+	int pass_prog_fd = -1, drop_prog_fd = -1, map_fd = -1;
+	const char *filename = "xdp_dummy.o", *pin_path = NULL;
+	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+	struct bpf_program *pass_prog, *drop_prog;
+	__u32 map_key, prog_id, chain_map_id;
+	struct xdp_chain_acts acts = {};
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
+	const char *optstr = "I:NSp:xc";
+	bool setup_only = false, cleanup_only = false;
+	struct bpf_object *obj;
+	int opt, ret = 1;
+
+	while ((opt = getopt(argc, argv, optstr)) != -1) {
+		switch (opt) {
+		case 'I':
+			ifname = optarg;
+			ifindex = if_nametoindex(ifname);
+			if (!ifindex) {
+				fprintf(stderr, "Could not get interface %s\n",
+					ifname);
+				return 1;
+			}
+			break;
+		case 'N':
+			xdp_flags |= XDP_FLAGS_DRV_MODE;
+			break;
+		case 'S':
+			xdp_flags |= XDP_FLAGS_SKB_MODE;
+			break;
+		case 'x':
+			setup_only = true;
+			break;
+		case 'c':
+			cleanup_only = true;
+			break;
+		case 'p':
+			pin_path = optarg;
+			break;
+		default:
+			show_usage(basename(argv[0]));
+			return 1;
+		}
+	}
+
+	if (!ifname) {
+		show_usage(basename(argv[0]));
+		return 1;
+	}
+
+	if (cleanup_only) {
+		if (pin_path)
+			unlink(pin_path);
+		cleanup(0);
+		return 0;
+	}
+
+	if (!setup_only && optind == argc) {
+		show_usage(basename(argv[0]));
+		return 1;
+	}
+	dest = argv[optind];
+
+	if ((xdp_flags & mode_flags) == mode_flags) {
+		fprintf(stderr, "-N or -S can be specified, not both.\n");
+		show_usage(basename(argv[0]));
+		return 1;
+	}
+
+	if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+		perror("setrlimit(RLIMIT_MEMLOCK)");
+		return 1;
+	}
+
+	if (bpf_prog_load(filename, BPF_PROG_TYPE_XDP, &obj, &pass_prog_fd)) {
+		fprintf(stderr, "load of %s failed\n", filename);
+		return 1;
+	}
+
+	pass_prog = bpf_object__find_program_by_title(obj, "xdp_dummy");
+	drop_prog = bpf_object__find_program_by_title(obj, "xdp_drop");
+
+	if (!pass_prog || !drop_prog) {
+		fprintf(stderr, "could not find xdp programs\n");
+		return 1;
+	}
+	pass_prog_fd = bpf_program__fd(pass_prog);
+	drop_prog_fd = bpf_program__fd(drop_prog);
+	if (pass_prog_fd < 0 || drop_prog_fd < 0) {
+		fprintf(stderr, "could not find xdp programs\n");
+		goto done;
+	}
+
+	ret = bpf_obj_get_info_by_fd(pass_prog_fd, &info, &info_len);
+	if (ret) {
+		fprintf(stderr, "unable to get program ID from kernel\n");
+		goto done;
+	}
+	map_key = info.id;
+	map_fd = bpf_create_map(BPF_MAP_TYPE_XDP_CHAIN,
+				sizeof(map_key), sizeof(acts),
+				2, 0);
+
+	if (map_fd < 0) {
+		fprintf(stderr, "unable to create chain call map: %s\n", strerror(errno));
+		goto done;
+	}
+
+	if (pin_path && (ret = bpf_obj_pin(map_fd, pin_path))) {
+		fprintf(stderr, "unable to pin map at %s: %s\n", pin_path,
+			strerror(errno));
+		goto done;
+	}
+
+
+#define RUN_PING(should_fail, err) if ((ret = run_ping(should_fail, err))) goto done;
+
+	if (!setup_only) {
+		RUN_PING(false, "Pre-setup ping test");
+
+		signal(SIGINT, cleanup);
+		signal(SIGTERM, cleanup);
+	}
+
+	if ((ret = bpf_set_link_xdp_chain(ifindex, pass_prog_fd, map_fd, xdp_flags)) < 0) {
+		fprintf(stderr, "Link set xdp fd failed for %s: %s\n", ifname,
+			strerror(-ret));
+		goto done;
+	}
+
+	if ((ret = bpf_get_link_xdp_chain(ifindex, &prog_id, &chain_map_id, 0)) < 0) {
+		fprintf(stderr, "Unable to get xdp IDs for %s: '%s'\n", ifname, strerror(-ret));
+		goto done;
+	}
+	printf("  XDP prog ID: %u Chain map ID: %u\n", prog_id, chain_map_id);
+
+	if (!setup_only) {
+		sleep(1);
+		RUN_PING(false, "Empty map test");
+	}
+
+	acts.wildcard_act = drop_prog_fd;
+	if (bpf_map_update_elem(map_fd, &map_key, &acts, 0)) {
+		fprintf(stderr, "unable to insert into map: %s\n", strerror(errno));
+		goto done;
+	}
+
+	if (setup_only) {
+		printf("Setup done; exiting.\n");
+		ret = 0;
+		goto done;
+	}
+
+	sleep(1);
+
+	RUN_PING(true, "Wildcard act test");
+
+	if (bpf_map_delete_elem(map_fd, &map_key)) {
+		fprintf(stderr, "unable to delete from map: %s\n", strerror(errno));
+		goto done;
+	}
+	sleep(1);
+
+	RUN_PING(false, "Post-delete map test");
+
+	acts.wildcard_act = 0;
+	acts.pass_act = drop_prog_fd;
+	if (bpf_map_update_elem(map_fd, &map_key, &acts, 0)) {
+		fprintf(stderr, "unable to insert into map: %s\n", strerror(errno));
+		goto done;
+	}
+	sleep(1);
+
+	RUN_PING(true, "Pass act test");
+
+
+	if ((ret = bpf_set_link_xdp_chain(ifindex, -1, -1, xdp_flags)) < 0) {
+		fprintf(stderr, "Link clear xdp fd failed for %s: '%s'\n", ifname, strerror(-ret));
+		goto done;
+	}
+	sleep(1);
+
+	RUN_PING(false, "Post-delete prog test");
+
+
+done:
+	cleanup(ret);
+
+	if (pass_prog_fd > 0)
+		close(pass_prog_fd);
+	if (drop_prog_fd > 0)
+		close(drop_prog_fd);
+	if (map_fd > 0)
+		close(map_fd);
+
+	return ret;
+}



* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 13:30 [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls Toke Høiland-Jørgensen
                   ` (8 preceding siblings ...)
  2019-10-02 13:30 ` [PATCH bpf-next 9/9] selftests: Add tests for XDP chain calls Toke Høiland-Jørgensen
@ 2019-10-02 15:10 ` Alan Maguire
  2019-10-02 15:33   ` Toke Høiland-Jørgensen
  2019-10-02 16:35 ` Lorenz Bauer
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 53+ messages in thread
From: Alan Maguire @ 2019-10-02 15:10 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf


On Wed, 2 Oct 2019, Toke Høiland-Jørgensen wrote:

> This series adds support for executing multiple XDP programs on a single
> interface in sequence, through the use of chain calls, as discussed at the Linux
> Plumbers Conference last month:
> 
> https://linuxplumbersconf.org/event/4/contributions/460/
> 
> # HIGH-LEVEL IDEA
> 
> The basic idea is to express the chain call sequence through a special map type,
> which contains a mapping from a (program, return code) tuple to another program
> to run in next in the sequence. Userspace can populate this map to express
> arbitrary call sequences, and update the sequence by updating or replacing the
> map.
> 
> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
> which will lookup the chain sequence map, and if found, will loop through calls
> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
> previous program ID and return code.
> 
> An XDP chain call map can be installed on an interface by means of a new netlink
> attribute containing an fd pointing to a chain call map. This can be supplied
> along with the XDP prog fd, so that a chain map is always installed together
> with an XDP program.
> 

This is great stuff Toke! One thing that wasn't immediately clear to me -
and this may be just me - is the relationship between program 
behaviour for the XDP_DROP case and chain call execution.  My initial
thought was that a program in the chain XDP_DROP'ping the packet would
terminate the call chain, but on looking at patch #4 it seems that
the only way the call chain execution is terminated is if

- XDP_ABORTED is returned from a program in the call chain; or
- the map entry for the next program (determined by the return value
  of the current program) is empty; or
- we run out of entries in the map

The return value of the last-executed program in the chain seems
to be what determines packet processing behaviour after executing
the chain (_DROP, _TX, _PASS, etc).  So there's no way to both XDP_PASS 
and XDP_TX a packet from the same chain, right? Just want to make
sure I've got the semantics correct. Thanks!

Alan


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 15:10 ` [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through " Alan Maguire
@ 2019-10-02 15:33   ` Toke Høiland-Jørgensen
  2019-10-02 16:34     ` John Fastabend
  0 siblings, 1 reply; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 15:33 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

Alan Maguire <alan.maguire@oracle.com> writes:

> On Wed, 2 Oct 2019, Toke Høiland-Jørgensen wrote:
>
>> This series adds support for executing multiple XDP programs on a single
>> interface in sequence, through the use of chain calls, as discussed at the Linux
>> Plumbers Conference last month:
>> 
>> https://linuxplumbersconf.org/event/4/contributions/460/
>> 
>> # HIGH-LEVEL IDEA
>> 
>> The basic idea is to express the chain call sequence through a special map type,
>> which contains a mapping from a (program, return code) tuple to another program
>> to run next in the sequence. Userspace can populate this map to express
>> arbitrary call sequences, and update the sequence by updating or replacing the
>> map.
>> 
>> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
>> which will look up the chain sequence map, and if found, will loop through calls
>> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
>> previous program ID and return code.
>> 
>> An XDP chain call map can be installed on an interface by means of a new netlink
>> attribute containing an fd pointing to a chain call map. This can be supplied
>> along with the XDP prog fd, so that a chain map is always installed together
>> with an XDP program.
>> 
>
> This is great stuff Toke!

Thanks! :)

> One thing that wasn't immediately clear to me - and this may be just
> me - is the relationship between program behaviour for the XDP_DROP
> case and chain call execution. My initial thought was that a program
> in the chain XDP_DROP'ping the packet would terminate the call chain,
> but on looking at patch #4 it seems that the only way the call chain
> execution is terminated is if
>
> - XDP_ABORTED is returned from a program in the call chain; or

Yes. Not actually sure about this one...

> - the map entry for the next program (determined by the return value
>   of the current program) is empty; or

This will be the common exit condition, I expect

> - we run out of entries in the map

You mean if we run the iteration counter to zero, right?

> The return value of the last-executed program in the chain seems to be
> what determines packet processing behaviour after executing the chain
> (_DROP, _TX, _PASS, etc). So there's no way to both XDP_PASS and
> XDP_TX a packet from the same chain, right? Just want to make sure
> I've got the semantics correct. Thanks!

Yeah, you've got all this right. The chain call mechanism itself doesn't
change any of the underlying fundamentals of XDP. I.e., each packet gets
exactly one verdict.

For chaining actual XDP programs that do different things to the packet,
I expect that the most common use case will be to only run the next
program if the previous one returns XDP_PASS. That will make the most
semantic sense I think.

But there are also use cases where one would want to match on the other
return codes; such as packet capture, for instance, where one might
install a capture program that would carry forward the previous return
code, but do something to the packet (throw it out to userspace) first.

For the latter use case, the question is if we need to expose the
previous return code to the program when it runs. You can do things
without it (by just using a different program per return code), but it
may simplify things if we just expose the return code. However, since
this will also change the semantics for running programs, I decided to
leave that off for now.

-Toke


* Re: [PATCH bpf-next 3/9] xdp: Support setting and getting device chain map
  2019-10-02 13:30 ` [PATCH bpf-next 3/9] xdp: Support setting and getting device chain map Toke Høiland-Jørgensen
@ 2019-10-02 15:50   ` Lorenz Bauer
  2019-10-02 18:32     ` Toke Høiland-Jørgensen
  2019-10-02 18:07   ` kbuild test robot
  2019-10-02 18:29   ` kbuild test robot
  2 siblings, 1 reply; 53+ messages in thread
From: Lorenz Bauer @ 2019-10-02 15:50 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Marek Majkowski, David Miller,
	Jesper Dangaard Brouer, Networking, bpf, kernel-team

On Wed, 2 Oct 2019 at 14:30, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> From: Toke Høiland-Jørgensen <toke@redhat.com>
>
> This adds support to rtnetlink for setting and getting the per-device XDP
> chain map. The map is set by means of a new netlink attribute that contains
> a pointer to a BPF map of the XDP chain type. If such an attribute is
> included, it will be inserted into the struct net_device so that the XDP
> chain call code will pick it up on program execution.
>
> To prevent old userspace programs that do not understand the chain map
> attribute from messing up the chain call order, a netlink message with no
> chain map attribute set will be rejected if a chain map has already been
> installed.
>
> When installing a new chain call map, an XDP program fd must also be
> provided, otherwise the operation will be rejected.

Why is the program required? I kind of expected the chain call map
to override any program.

--
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com


* Re: [PATCH bpf-next 2/9] xdp: Add new xdp_chain_map type for specifying XDP call sequences
  2019-10-02 13:30 ` [PATCH bpf-next 2/9] xdp: Add new xdp_chain_map type for specifying XDP call sequences Toke Høiland-Jørgensen
@ 2019-10-02 15:50   ` Lorenz Bauer
  2019-10-02 18:25     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 53+ messages in thread
From: Lorenz Bauer @ 2019-10-02 15:50 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Marek Majkowski, David Miller,
	Jesper Dangaard Brouer, Networking, bpf, kernel-team

On Wed, 2 Oct 2019 at 14:30, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 113e1286e184..ab855095c830 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -1510,3 +1510,156 @@ const struct bpf_map_ops htab_of_maps_map_ops = {
>         .map_gen_lookup = htab_of_map_gen_lookup,
>         .map_check_btf = map_check_no_btf,
>  };
> +
> +struct xdp_chain_table {
> +       struct bpf_prog *wildcard_act;
> +       struct bpf_prog *act[XDP_ACT_MAX];
> +};
> +
> +static int xdp_chain_map_alloc_check(union bpf_attr *attr)
> +{
> +       BUILD_BUG_ON(sizeof(struct xdp_chain_table) / sizeof(void *) !=
> +                    sizeof(struct xdp_chain_acts) / sizeof(u32));
> +
> +       if (attr->key_size != sizeof(u32) ||
> +           attr->value_size != sizeof(struct xdp_chain_acts))
> +               return -EINVAL;

How are we going to extend xdp_chain_acts if a new XDP action is introduced?

> +
> +       attr->value_size = sizeof(struct xdp_chain_table);
> +       return htab_map_alloc_check(attr);
> +}
> +
> +struct bpf_prog *bpf_xdp_chain_map_get_prog(struct bpf_map *map,
> +                                           u32 prev_id,
> +                                           enum xdp_action action)
> +{
> +       struct xdp_chain_table *tab;
> +       void *ptr;
> +
> +       ptr = htab_map_lookup_elem(map, &prev_id);
> +
> +       if (!ptr)
> +               return NULL;
> +
> +       tab = READ_ONCE(ptr);
> +       return tab->act[action - 1] ?: tab->wildcard_act;
> +}
> +EXPORT_SYMBOL_GPL(bpf_xdp_chain_map_get_prog);
> +
> +/* only called from syscall */
> +int bpf_xdp_chain_map_lookup_elem(struct bpf_map *map, void *key, void *value)
> +{
> +       struct xdp_chain_acts *act = value;
> +       struct xdp_chain_table *tab;
> +       void *ptr;
> +       u32 *cur;
> +       int i;
> +
> +       ptr = htab_map_lookup_elem(map, key);
> +       if (!ptr)
> +               return -ENOENT;
> +
> +       tab = READ_ONCE(ptr);
> +
> +       if (tab->wildcard_act)
> +               act->wildcard_act = tab->wildcard_act->aux->id;
> +
> +       cur = &act->drop_act;
> +       for (i = 0; i < XDP_ACT_MAX; i++, cur++)
> +               if(tab->act[i])
> +                       *cur = tab->act[i]->aux->id;

For completeness, zero out *cur in the else case?

> +
> +       return 0;
> +}
> +
> +static void *xdp_chain_map_get_ptr(int fd)
> +{
> +       struct bpf_prog *prog = bpf_prog_get(fd);
> +
> +       if (IS_ERR(prog))
> +               return prog;
> +
> +       if (prog->type != BPF_PROG_TYPE_XDP ||
> +           bpf_prog_is_dev_bound(prog->aux)) {
> +               bpf_prog_put(prog);
> +               return ERR_PTR(-EINVAL);
> +       }
> +
> +       return prog;
> +}
> +
> +static void xdp_chain_map_put_ptrs(void *value)
> +{
> +       struct xdp_chain_table *tab = value;
> +       int i;
> +
> +       for (i = 0; i < XDP_ACT_MAX; i++)
> +               if (tab->act[i])
> +                       bpf_prog_put(tab->act[i]);
> +
> +       if (tab->wildcard_act)
> +               bpf_prog_put(tab->wildcard_act);
> +}
> +
> +/* only called from syscall */
> +int bpf_xdp_chain_map_update_elem(struct bpf_map *map, void *key, void *value,
> +                                 u64 map_flags)

Nit: check that map_flags == 0

> +{
> +       struct xdp_chain_acts *act = value;
> +       struct xdp_chain_table tab = {};
> +       u32 lookup_key = *((u32*)key);
> +       u32 *cur = &act->drop_act;
> +       bool found_val = false;
> +       int ret, i;
> +       void *ptr;
> +
> +       if (!lookup_key)
> +               return -EINVAL;

Is it possible to check that this is a valid prog id / fd or whatever it is?

> +
> +       if (act->wildcard_act) {

If this is an fd, 0 is a valid value no?

> +               ptr = xdp_chain_map_get_ptr(act->wildcard_act);
> +               if (IS_ERR(ptr))
> +                       return PTR_ERR(ptr);
> +               tab.wildcard_act = ptr;
> +               found_val = true;
> +       }
> +
> +       for (i = 0; i < XDP_ACT_MAX; i++, cur++) {
> +               if (*cur) {
> +                       ptr = xdp_chain_map_get_ptr(*cur);
> +                       if (IS_ERR(ptr)) {
> +                               ret = PTR_ERR(ptr);
> +                               goto out_err;
> +                       }
> +                       tab.act[i] = ptr;
> +                       found_val = true;
> +               }
> +       }
> +
> +       if (!found_val) {
> +               ret = -EINVAL;
> +               goto out_err;
> +       }
> +
> +       ret = htab_map_update_elem(map, key, &tab, map_flags);
> +       if (ret)
> +               goto out_err;
> +
> +       return ret;
> +
> +out_err:
> +       xdp_chain_map_put_ptrs(&tab);
> +
> +       return ret;
> +}
> +
> +
> +const struct bpf_map_ops xdp_chain_map_ops = {
> +       .map_alloc_check = xdp_chain_map_alloc_check,
> +       .map_alloc = htab_map_alloc,
> +       .map_free = fd_htab_map_free,
> +       .map_get_next_key = htab_map_get_next_key,
> +       .map_delete_elem = htab_map_delete_elem,
> +       .map_fd_put_value = xdp_chain_map_put_ptrs,
> +       .map_check_btf = map_check_no_btf,
> +};

--
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 15:33   ` Toke Høiland-Jørgensen
@ 2019-10-02 16:34     ` John Fastabend
  2019-10-02 18:33       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 53+ messages in thread
From: John Fastabend @ 2019-10-02 16:34 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Alan Maguire
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

Toke Høiland-Jørgensen wrote:
> Alan Maguire <alan.maguire@oracle.com> writes:
> 
> > On Wed, 2 Oct 2019, Toke Høiland-Jørgensen wrote:
> >
> >> This series adds support for executing multiple XDP programs on a single
> >> interface in sequence, through the use of chain calls, as discussed at the Linux
> >> Plumbers Conference last month:
> >> 
> >> https://linuxplumbersconf.org/event/4/contributions/460/
> >> 
> >> # HIGH-LEVEL IDEA
> >> 
> >> The basic idea is to express the chain call sequence through a special map type,
> >> which contains a mapping from a (program, return code) tuple to another program
> >> to run next in the sequence. Userspace can populate this map to express
> >> arbitrary call sequences, and update the sequence by updating or replacing the
> >> map.
> >> 
> >> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
> >> which will look up the chain sequence map, and if found, will loop through calls
> >> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
> >> previous program ID and return code.
> >> 
> >> An XDP chain call map can be installed on an interface by means of a new netlink
> >> attribute containing an fd pointing to a chain call map. This can be supplied
> >> along with the XDP prog fd, so that a chain map is always installed together
> >> with an XDP program.
> >> 
> >
> > This is great stuff Toke!
> 
> Thanks! :)
> 
> > One thing that wasn't immediately clear to me - and this may be just
> > me - is the relationship between program behaviour for the XDP_DROP
> > case and chain call execution. My initial thought was that a program
> > in the chain XDP_DROP'ping the packet would terminate the call chain,
> > but on looking at patch #4 it seems that the only way the call chain
> > execution is terminated is if
> >
> > - XDP_ABORTED is returned from a program in the call chain; or
> 
> Yes. Not actually sure about this one...
> 
> > - the map entry for the next program (determined by the return value
> >   of the current program) is empty; or
> 
> This will be the common exit condition, I expect
> 
> > - we run out of entries in the map
> 
> You mean if we run the iteration counter to zero, right?
> 
> > The return value of the last-executed program in the chain seems to be
> > what determines packet processing behaviour after executing the chain
> > (_DROP, _TX, _PASS, etc). So there's no way to both XDP_PASS and
> > XDP_TX a packet from the same chain, right? Just want to make sure
> > I've got the semantics correct. Thanks!
> 
> Yeah, you've got all this right. The chain call mechanism itself doesn't
> change any of the underlying fundamentals of XDP. I.e., each packet gets
> exactly one verdict.
> 
> For chaining actual XDP programs that do different things to the packet,
> I expect that the most common use case will be to only run the next
> program if the previous one returns XDP_PASS. That will make the most
> semantic sense I think.
> 
> But there are also use cases where one would want to match on the other
> return codes; such as packet capture, for instance, where one might
> install a capture program that would carry forward the previous return
> code, but do something to the packet (throw it out to userspace) first.
> 
> For the latter use case, the question is if we need to expose the
> previous return code to the program when it runs. You can do things
> without it (by just using a different program per return code), but it
> may simplify things if we just expose the return code. However, since
> this will also change the semantics for running programs, I decided to
> leave that off for now.
> 
> -Toke

In other cases where programs (e.g. cgroups) are run in an array the
return codes are 'AND'ed together so that we get

   result1 & result2 & ... & resultN

The result is that if any program returns a drop then the packet is
dropped, but all programs at least "see" the packet. There was a lot
of debate over this semantic, and I think in hindsight it actually
works pretty well, so any chaining in XDP should also keep those
semantics. For things like redirect we can already, even in the same
program, make multiple calls to the redirect helper, and each call
overwrites the previous one, so those semantics can stay the same.

.John


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 13:30 [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls Toke Høiland-Jørgensen
                   ` (9 preceding siblings ...)
  2019-10-02 15:10 ` [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through " Alan Maguire
@ 2019-10-02 16:35 ` Lorenz Bauer
  2019-10-02 18:54   ` Toke Høiland-Jørgensen
  2019-10-02 16:43 ` John Fastabend
  2019-10-02 18:38 ` Song Liu
  12 siblings, 1 reply; 53+ messages in thread
From: Lorenz Bauer @ 2019-10-02 16:35 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Marek Majkowski, David Miller,
	Jesper Dangaard Brouer, Networking, bpf, kernel-team

On Wed, 2 Oct 2019 at 14:30, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> This series adds support for executing multiple XDP programs on a single
> interface in sequence, through the use of chain calls, as discussed at the Linux
> Plumbers Conference last month:

Hi Toke,

Thanks for posting the patch set! As I mentioned, this is a big pain
point for us as well. Right now, all of our different XDP components
are managed by a single daemon called (drumroll) xdpd. We'd like to
separate this out into individual bits and pieces that operate
separately from each other.

I've looked at the kernel side of your patch set, here are my thoughts:

> # HIGH-LEVEL IDEA
>
> The basic idea is to express the chain call sequence through a special map type,
> which contains a mapping from a (program, return code) tuple to another program
> to run next in the sequence. Userspace can populate this map to express
> arbitrary call sequences, and update the sequence by updating or replacing the
> map.

How do you imagine this would work in practice? From what I can tell,
the map is keyed by program ID, which makes it difficult to reliably
construct the control flow that I want.

As an example, I'd like to split the daemon into two parts, A and B,
which I want to be completely independent. So:

- A runs before B, if both are active
- If A is not active, B is called first
- If B is not active, only A is called

Both A and B may at any point in time replace their current XDP program with a
new one. This means that there is no stable program ID I can use.

Another problem is deletions: if I delete A (because that component is
shut down), B will never be called, since the program ID that linked B
into the control flow is gone. This means that B needs to know about A
and vice versa.

> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
> which will lookup the chain sequence map, and if found, will loop through calls
> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
> previous program ID and return code.

I think that the tail call chain is useful for other eBPF programs as well.
How hard is it to turn the logic in bpf_prog_run_xdp into a helper
instead?

Lorenz

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com


* RE: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 13:30 [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls Toke Høiland-Jørgensen
                   ` (10 preceding siblings ...)
  2019-10-02 16:35 ` Lorenz Bauer
@ 2019-10-02 16:43 ` John Fastabend
  2019-10-02 19:09   ` Toke Høiland-Jørgensen
  2019-10-02 19:15   ` Daniel Borkmann
  2019-10-02 18:38 ` Song Liu
  12 siblings, 2 replies; 53+ messages in thread
From: John Fastabend @ 2019-10-02 16:43 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Daniel Borkmann
  Cc: Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

Toke Høiland-Jørgensen wrote:
> This series adds support for executing multiple XDP programs on a single
> interface in sequence, through the use of chain calls, as discussed at the Linux
> Plumbers Conference last month:
> 
> https://linuxplumbersconf.org/event/4/contributions/460/
> 
> # HIGH-LEVEL IDEA
> 
> The basic idea is to express the chain call sequence through a special map type,
> which contains a mapping from a (program, return code) tuple to another program
> to run next in the sequence. Userspace can populate this map to express
> arbitrary call sequences, and update the sequence by updating or replacing the
> map.
> 
> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
> which will look up the chain sequence map, and if found, will loop through calls
> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
> previous program ID and return code.
> 
> An XDP chain call map can be installed on an interface by means of a new netlink
> attribute containing an fd pointing to a chain call map. This can be supplied
> along with the XDP prog fd, so that a chain map is always installed together
> with an XDP program.
> 
> # PERFORMANCE
> 
> I performed a simple performance test to get an initial feel for the overhead of
> the chain call mechanism. This test consists of running only two programs in
> sequence: One that returns XDP_PASS and another that returns XDP_DROP. I then
> measure the drop PPS performance and compare it to a baseline of just a single
> program that only returns XDP_DROP.
> 
> For comparison, a test case that uses regular eBPF tail calls to sequence two
> programs together is also included. Finally, because 'perf' showed that the
> hashmap lookup was the largest single source of overhead, I also added a test
> case where I removed the jhash() call from the hashmap code, and just use the
> u32 key directly as an index into the hash bucket structure.
> 
> The performance for these different cases is as follows (with retpolines disabled):

retpolines enabled would also be interesting.

> 
> | Test case                       | Perf      | Add. overhead | Total overhead |
> |---------------------------------+-----------+---------------+----------------|
> | Before patch (XDP DROP program) | 31.0 Mpps |               |                |
> | After patch (XDP DROP program)  | 28.9 Mpps |        2.3 ns |         2.3 ns |

IMO even 1 Mpps of overhead is too much for a feature that is primarily
about ease of use. Sacrificing performance to make userland a bit easier
is hard to justify for me when XDP _is_ singularly about performance.
Also, that is nearly 10% overhead, which is fairly large. So I think the
performance gap needs to be removed going forward.

> | XDP tail call                   | 26.6 Mpps |        3.0 ns |         5.3 ns |
> | XDP chain call (no jhash)       | 19.6 Mpps |       13.4 ns |        18.7 ns |
> | XDP chain call (this series)    | 17.0 Mpps |        7.9 ns |        26.6 ns |
> 
> From this it is clear that there is some overhead from this mechanism; but
> the jhash removal example indicates that it is probably possible to optimise the
> code to the point where the overhead becomes low enough to be acceptable.

I'm missing why, 'in theory' at least, this can't be made as fast as tail
calls. Again, I can't see why someone would lose 30% of their performance
when a userland program could populate a tail call map for the same effect.
Sure, userland would also have to enforce some program standards/conventions,
but it could be done, and at 30% overhead that pain is probably worth it IMO.

My thinking though is if we are a bit clever chaining and tail calls could be
performance-wise equivalent?

I'll go read the patches now ;)

.John

> 
> # PATCH SET STRUCTURE
> This series is structured as follows:
> 
> - Patch 1: Prerequisite
> - Patch 2: New map type
> - Patch 3: Netlink hooks to install the chain call map
> - Patch 4: Core chain call logic
> - Patch 5-7: Bookkeeping updates to tools
> - Patch 8: Libbpf support for installing chain call maps
> - Patch 9: Selftests with example user space code
> 
> The whole series is also available in my git repo on kernel.org:
> https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=xdp-multiprog-01
> 


* Re: [PATCH bpf-next 4/9] xdp: Implement chain call logic to support multiple programs on one interface
  2019-10-02 13:30 ` [PATCH bpf-next 4/9] xdp: Implement chain call logic to support multiple programs on one interface Toke Høiland-Jørgensen
@ 2019-10-02 17:33   ` kbuild test robot
  2019-10-02 17:53   ` kbuild test robot
  1 sibling, 0 replies; 53+ messages in thread
From: kbuild test robot @ 2019-10-02 17:33 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: kbuild-all, Daniel Borkmann, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Marek Majkowski,
	Lorenz Bauer, David Miller, Jesper Dangaard Brouer, netdev, bpf

[-- Attachment #1: Type: text/plain, Size: 2924 bytes --]

Hi Toke,

I love your patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Toke-H-iland-J-rgensen/xdp-Support-multiple-programs-on-a-single-interface-through-chain-calls/20191003-005238
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   In file included from include/net/sock.h:59:0,
                    from include/linux/tcp.h:19,
                    from include/linux/ipv6.h:87,
                    from include/net/ipv6.h:12,
                    from include/linux/sunrpc/clnt.h:28,
                    from include/linux/nfs_fs.h:32,
                    from init/do_mounts.c:23:
   include/linux/filter.h: In function 'bpf_prog_run_xdp':
>> include/linux/filter.h:725:10: error: implicit declaration of function 'bpf_xdp_chain_map_get_prog' [-Werror=implicit-function-declaration]
      prog = bpf_xdp_chain_map_get_prog(chain_map, prog->aux->id, ret);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~
>> include/linux/filter.h:725:8: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
      prog = bpf_xdp_chain_map_get_prog(chain_map, prog->aux->id, ret);
           ^
   cc1: some warnings being treated as errors

vim +/bpf_xdp_chain_map_get_prog +725 include/linux/filter.h

   695	
   696	#define BPF_XDP_MAX_CHAIN_CALLS 32
   697	static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
   698						    struct xdp_buff *xdp)
   699	{
   700		/* Caller needs to hold rcu_read_lock() (!), otherwise program
   701		 * can be released while still running, or map elements could be
   702		 * freed early while still having concurrent users. XDP fastpath
   703		 * already takes rcu_read_lock() when fetching the program, so
   704		 * it's not necessary here anymore.
   705		 */
   706	
   707		int i = BPF_XDP_MAX_CHAIN_CALLS;
   708		struct bpf_map *chain_map;
   709		u32 ret;
   710	
   711		chain_map = rcu_dereference(xdp->rxq->dev->xdp_chain_map);
   712		if (!chain_map)
   713			return BPF_PROG_RUN(prog, xdp);
   714	
   715		do {
   716			if (!--i) {
   717				ret = XDP_ABORTED;
   718				goto out;
   719			}
   720	
   721			ret = BPF_PROG_RUN(prog, xdp);
   722			if (ret == XDP_ABORTED)
   723				goto out;
   724	
 > 725			prog = bpf_xdp_chain_map_get_prog(chain_map, prog->aux->id, ret);
   726		} while(prog);
   727	
   728	out:
   729		return ret;
   730	}
   731	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7205 bytes --]


* Re: [PATCH bpf-next 4/9] xdp: Implement chain call logic to support multiple programs on one interface
  2019-10-02 13:30 ` [PATCH bpf-next 4/9] xdp: Implement chain call logic to support multiple programs on one interface Toke Høiland-Jørgensen
  2019-10-02 17:33   ` kbuild test robot
@ 2019-10-02 17:53   ` kbuild test robot
  1 sibling, 0 replies; 53+ messages in thread
From: kbuild test robot @ 2019-10-02 17:53 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: kbuild-all, Daniel Borkmann, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Marek Majkowski,
	Lorenz Bauer, David Miller, Jesper Dangaard Brouer, netdev, bpf

[-- Attachment #1: Type: text/plain, Size: 3076 bytes --]

Hi Toke,

I love your patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Toke-H-iland-J-rgensen/xdp-Support-multiple-programs-on-a-single-interface-through-chain-calls/20191003-005238
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: sparc64-allnoconfig (attached as .config)
compiler: sparc64-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=sparc64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/net/sock.h:59:0,
                    from include/linux/tcp.h:19,
                    from include/linux/ipv6.h:87,
                    from include/net/ipv6.h:12,
                    from include/linux/sunrpc/clnt.h:28,
                    from include/linux/nfs_fs.h:32,
                    from arch/sparc/kernel/sys_sparc32.c:25:
   include/linux/filter.h: In function 'bpf_prog_run_xdp':
   include/linux/filter.h:725:10: error: implicit declaration of function 'bpf_xdp_chain_map_get_prog' [-Werror=implicit-function-declaration]
      prog = bpf_xdp_chain_map_get_prog(chain_map, prog->aux->id, ret);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~
>> include/linux/filter.h:725:8: error: assignment makes pointer from integer without a cast [-Werror=int-conversion]
      prog = bpf_xdp_chain_map_get_prog(chain_map, prog->aux->id, ret);
           ^
   cc1: all warnings being treated as errors

vim +725 include/linux/filter.h

   695	
   696	#define BPF_XDP_MAX_CHAIN_CALLS 32
   697	static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
   698						    struct xdp_buff *xdp)
   699	{
   700		/* Caller needs to hold rcu_read_lock() (!), otherwise program
   701		 * can be released while still running, or map elements could be
   702		 * freed early while still having concurrent users. XDP fastpath
   703		 * already takes rcu_read_lock() when fetching the program, so
   704		 * it's not necessary here anymore.
   705		 */
   706	
   707		int i = BPF_XDP_MAX_CHAIN_CALLS;
   708		struct bpf_map *chain_map;
   709		u32 ret;
   710	
   711		chain_map = rcu_dereference(xdp->rxq->dev->xdp_chain_map);
   712		if (!chain_map)
   713			return BPF_PROG_RUN(prog, xdp);
   714	
   715		do {
   716			if (!--i) {
   717				ret = XDP_ABORTED;
   718				goto out;
   719			}
   720	
   721			ret = BPF_PROG_RUN(prog, xdp);
   722			if (ret == XDP_ABORTED)
   723				goto out;
   724	
 > 725			prog = bpf_xdp_chain_map_get_prog(chain_map, prog->aux->id, ret);
   726		} while(prog);
   727	
   728	out:
   729		return ret;
   730	}
   731	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 5713 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH bpf-next 3/9] xdp: Support setting and getting device chain map
  2019-10-02 13:30 ` [PATCH bpf-next 3/9] xdp: Support setting and getting device chain map Toke Høiland-Jørgensen
  2019-10-02 15:50   ` Lorenz Bauer
@ 2019-10-02 18:07   ` kbuild test robot
  2019-10-02 18:29   ` kbuild test robot
  2 siblings, 0 replies; 53+ messages in thread
From: kbuild test robot @ 2019-10-02 18:07 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: kbuild-all, Daniel Borkmann, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Marek Majkowski,
	Lorenz Bauer, David Miller, Jesper Dangaard Brouer, netdev, bpf

[-- Attachment #1: Type: text/plain, Size: 6239 bytes --]

Hi Toke,

I love your patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Toke-H-iland-J-rgensen/xdp-Support-multiple-programs-on-a-single-interface-through-chain-calls/20191003-005238
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: ia64-defconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=ia64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   net/core/dev.c: In function 'dev_xdp_uninstall':
>> net/core/dev.c:8187:3: error: implicit declaration of function 'bpf_map_put_with_uref' [-Werror=implicit-function-declaration]
      bpf_map_put_with_uref(chain_map);
      ^~~~~~~~~~~~~~~~~~~~~
   net/core/dev.c: In function 'dev_change_xdp_fd':
>> net/core/dev.c:8286:15: error: implicit declaration of function 'bpf_map_get_with_uref'; did you mean 'bpf_obj_get_user'? [-Werror=implicit-function-declaration]
      chain_map = bpf_map_get_with_uref(chain_map_fd);
                  ^~~~~~~~~~~~~~~~~~~~~
                  bpf_obj_get_user
>> net/core/dev.c:8286:13: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
      chain_map = bpf_map_get_with_uref(chain_map_fd);
                ^
   cc1: some warnings being treated as errors

vim +/bpf_map_put_with_uref +8187 net/core/dev.c

  8177	
  8178	static void dev_xdp_uninstall(struct net_device *dev)
  8179	{
  8180		struct bpf_map *chain_map = NULL;
  8181		struct netdev_bpf xdp;
  8182		bpf_op_t ndo_bpf;
  8183	
  8184		/* Remove chain map */
  8185		rcu_swap_protected(dev->xdp_chain_map, chain_map, 1);
  8186		if (chain_map)
> 8187			bpf_map_put_with_uref(chain_map);
  8188	
  8189		/* Remove generic XDP */
  8190		WARN_ON(dev_xdp_install(dev, generic_xdp_install, NULL, 0, NULL));
  8191	
  8192		/* Remove from the driver */
  8193		ndo_bpf = dev->netdev_ops->ndo_bpf;
  8194		if (!ndo_bpf)
  8195			return;
  8196	
  8197		memset(&xdp, 0, sizeof(xdp));
  8198		xdp.command = XDP_QUERY_PROG;
  8199		WARN_ON(ndo_bpf(dev, &xdp));
  8200		if (xdp.prog_id)
  8201			WARN_ON(dev_xdp_install(dev, ndo_bpf, NULL, xdp.prog_flags,
  8202						NULL));
  8203	
  8204		/* Remove HW offload */
  8205		memset(&xdp, 0, sizeof(xdp));
  8206		xdp.command = XDP_QUERY_PROG_HW;
  8207		if (!ndo_bpf(dev, &xdp) && xdp.prog_id)
  8208			WARN_ON(dev_xdp_install(dev, ndo_bpf, NULL, xdp.prog_flags,
  8209						NULL));
  8210	}
  8211	
  8212	/**
  8213	 *	dev_change_xdp_fd - set or clear a bpf program for a device rx path
  8214	 *	@dev: device
  8215	 *	@extack: netlink extended ack
  8216	 *	@prog_fd: new program fd or negative value to clear
  8217	 *	@chain_map_fd: new chain map fd or negative value to clear
  8218	 *	@flags: xdp-related flags
  8219	 *
  8220	 *	Set or clear a bpf program for a device
  8221	 */
  8222	int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
  8223			      int prog_fd, int chain_map_fd, u32 flags)
  8224	{
  8225		const struct net_device_ops *ops = dev->netdev_ops;
  8226		struct bpf_map *chain_map = NULL;
  8227		enum bpf_netdev_command query;
  8228		struct bpf_prog *prog = NULL;
  8229		bpf_op_t bpf_op, bpf_chk;
  8230		bool offload;
  8231		int err;
  8232	
  8233		ASSERT_RTNL();
  8234	
  8235		offload = flags & XDP_FLAGS_HW_MODE;
  8236		query = offload ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
  8237	
  8238		bpf_op = bpf_chk = ops->ndo_bpf;
  8239		if (!bpf_op && (flags & (XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE))) {
  8240			NL_SET_ERR_MSG(extack, "underlying driver does not support XDP in native mode");
  8241			return -EOPNOTSUPP;
  8242		}
  8243		if (!bpf_op || (flags & XDP_FLAGS_SKB_MODE))
  8244			bpf_op = generic_xdp_install;
  8245		if (bpf_op == bpf_chk)
  8246			bpf_chk = generic_xdp_install;
  8247	
  8248		if (prog_fd >= 0) {
  8249			u32 prog_id;
  8250	
  8251			if (!offload && __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG)) {
  8252				NL_SET_ERR_MSG(extack, "native and generic XDP can't be active at the same time");
  8253				return -EEXIST;
  8254			}
  8255	
  8256			prog_id = __dev_xdp_query(dev, bpf_op, query);
  8257			if ((flags & XDP_FLAGS_UPDATE_IF_NOEXIST) && prog_id) {
  8258				NL_SET_ERR_MSG(extack, "XDP program already attached");
  8259				return -EBUSY;
  8260			}
  8261	
  8262			prog = bpf_prog_get_type_dev(prog_fd, BPF_PROG_TYPE_XDP,
  8263						     bpf_op == ops->ndo_bpf);
  8264			if (IS_ERR(prog))
  8265				return PTR_ERR(prog);
  8266	
  8267			if (!offload && bpf_prog_is_dev_bound(prog->aux)) {
  8268				NL_SET_ERR_MSG(extack, "using device-bound program without HW_MODE flag is not supported");
  8269				bpf_prog_put(prog);
  8270				return -EINVAL;
  8271			}
  8272	
  8273			if (prog->aux->id == prog_id) {
  8274				bpf_prog_put(prog);
  8275				return 0;
  8276			}
  8277		} else {
  8278			if (chain_map_fd >= 0)
  8279				return -EINVAL;
  8280	
  8281			if (!__dev_xdp_query(dev, bpf_op, query))
  8282				return 0;
  8283		}
  8284	
  8285		if (chain_map_fd >= 0) {
> 8286			chain_map = bpf_map_get_with_uref(chain_map_fd);
  8287			if (IS_ERR(chain_map))
  8288				return PTR_ERR(chain_map);
  8289	
  8290			if (chain_map->map_type != BPF_MAP_TYPE_XDP_CHAIN) {
  8291				NL_SET_ERR_MSG(extack, "invalid chain map type");
  8292				bpf_map_put_with_uref(chain_map);
  8293				return -EINVAL;
  8294			}
  8295		}
  8296	
  8297		err = dev_xdp_install(dev, bpf_op, extack, flags, prog);
  8298		if (err < 0) {
  8299			if (prog)
  8300				bpf_prog_put(prog);
  8301		} else {
  8302			rcu_swap_protected(dev->xdp_chain_map, chain_map, 1);
  8303		}
  8304	
  8305		if (chain_map)
  8306			bpf_map_put_with_uref(chain_map);
  8307	
  8308		return err;
  8309	}
  8310	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 19101 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH bpf-next 2/9] xdp: Add new xdp_chain_map type for specifying XDP call sequences
  2019-10-02 15:50   ` Lorenz Bauer
@ 2019-10-02 18:25     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 18:25 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Marek Majkowski, David Miller,
	Jesper Dangaard Brouer, Networking, bpf, kernel-team

Lorenz Bauer <lmb@cloudflare.com> writes:

> On Wed, 2 Oct 2019 at 14:30, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index 113e1286e184..ab855095c830 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -1510,3 +1510,156 @@ const struct bpf_map_ops htab_of_maps_map_ops = {
>>         .map_gen_lookup = htab_of_map_gen_lookup,
>>         .map_check_btf = map_check_no_btf,
>>  };
>> +
>> +struct xdp_chain_table {
>> +       struct bpf_prog *wildcard_act;
>> +       struct bpf_prog *act[XDP_ACT_MAX];
>> +};
>> +
>> +static int xdp_chain_map_alloc_check(union bpf_attr *attr)
>> +{
>> +       BUILD_BUG_ON(sizeof(struct xdp_chain_table) / sizeof(void *) !=
>> +                    sizeof(struct xdp_chain_acts) / sizeof(u32));
>> +
>> +       if (attr->key_size != sizeof(u32) ||
>> +           attr->value_size != sizeof(struct xdp_chain_acts))
>> +               return -EINVAL;
>
> How are we going to extend xdp_chain_acts if a new XDP action is
> introduced?

By just checking the size and reacting appropriately? Don't think that
is problematic, just takes a few if statements here?

>> +
>> +       attr->value_size = sizeof(struct xdp_chain_table);
>> +       return htab_map_alloc_check(attr);
>> +}
>> +
>> +struct bpf_prog *bpf_xdp_chain_map_get_prog(struct bpf_map *map,
>> +                                           u32 prev_id,
>> +                                           enum xdp_action action)
>> +{
>> +       struct xdp_chain_table *tab;
>> +       void *ptr;
>> +
>> +       ptr = htab_map_lookup_elem(map, &prev_id);
>> +
>> +       if (!ptr)
>> +               return NULL;
>> +
>> +       tab = READ_ONCE(ptr);
>> +       return tab->act[action - 1] ?: tab->wildcard_act;
>> +}
>> +EXPORT_SYMBOL_GPL(bpf_xdp_chain_map_get_prog);
>> +
>> +/* only called from syscall */
>> +int bpf_xdp_chain_map_lookup_elem(struct bpf_map *map, void *key, void *value)
>> +{
>> +       struct xdp_chain_acts *act = value;
>> +       struct xdp_chain_table *tab;
>> +       void *ptr;
>> +       u32 *cur;
>> +       int i;
>> +
>> +       ptr = htab_map_lookup_elem(map, key);
>> +       if (!ptr)
>> +               return -ENOENT;
>> +
>> +       tab = READ_ONCE(ptr);
>> +
>> +       if (tab->wildcard_act)
>> +               act->wildcard_act = tab->wildcard_act->aux->id;
>> +
>> +       cur = &act->drop_act;
>> +       for (i = 0; i < XDP_ACT_MAX; i++, cur++)
>> +                       if (tab->act[i])
>> +                       *cur = tab->act[i]->aux->id;
>
> For completeness, zero out *cur in the else case?

Ah, good point. I assumed the caller was kzalloc'ing; seems that's not
the case; will fix.

>> +
>> +       return 0;
>> +}
>> +
>> +static void *xdp_chain_map_get_ptr(int fd)
>> +{
>> +       struct bpf_prog *prog = bpf_prog_get(fd);
>> +
>> +       if (IS_ERR(prog))
>> +               return prog;
>> +
>> +       if (prog->type != BPF_PROG_TYPE_XDP ||
>> +           bpf_prog_is_dev_bound(prog->aux)) {
>> +               bpf_prog_put(prog);
>> +               return ERR_PTR(-EINVAL);
>> +       }
>> +
>> +       return prog;
>> +}
>> +
>> +static void xdp_chain_map_put_ptrs(void *value)
>> +{
>> +       struct xdp_chain_table *tab = value;
>> +       int i;
>> +
>> +       for (i = 0; i < XDP_ACT_MAX; i++)
>> +               if (tab->act[i])
>> +                       bpf_prog_put(tab->act[i]);
>> +
>> +       if (tab->wildcard_act)
>> +               bpf_prog_put(tab->wildcard_act);
>> +}
>> +
>> +/* only called from syscall */
>> +int bpf_xdp_chain_map_update_elem(struct bpf_map *map, void *key, void *value,
>> +                                 u64 map_flags)
>
> Nit: check that map_flags == 0

Yup, will fix.

>> +{
>> +       struct xdp_chain_acts *act = value;
>> +       struct xdp_chain_table tab = {};
>> +       u32 lookup_key = *((u32*)key);
>> +       u32 *cur = &act->drop_act;
>> +       bool found_val = false;
>> +       int ret, i;
>> +       void *ptr;
>> +
>> +       if (!lookup_key)
>> +               return -EINVAL;
>
> Is it possible to check that this is a valid prog id / fd or whatever
> it is?

I suppose we could. The reason I didn't was that I figured it would be
valid to insert IDs into the map that don't exist (yet, or any longer).
Theoretically, if you can predict the program ID, you could insert it
into the map before it's allocated. There's no clearing of maps when
program IDs are freed either.

>> +
>> +       if (act->wildcard_act) {
>
> If this is an fd, 0 is a valid value no?

Yes, technically. But if we actually allow fd 0, that means the
userspace caller can't just pass a 0-initialised struct, but will have
to loop through and explicitly set everything to -1. Whereas if we just
disallow fd 0, that is no longer a problem (and normally you have to
jump through some hoops to end up with a bpf program fd as standard
input (fd 0), no?).

It's not ideal, but I figured this was the least ugly way to do it. If
you have a better idea I'm all ears? :)

>> +               ptr = xdp_chain_map_get_ptr(act->wildcard_act);
>> +               if (IS_ERR(ptr))
>> +                       return PTR_ERR(ptr);
>> +               tab.wildcard_act = ptr;
>> +               found_val = true;
>> +       }
>> +
>> +       for (i = 0; i < XDP_ACT_MAX; i++, cur++) {
>> +               if (*cur) {
>> +                       ptr = xdp_chain_map_get_ptr(*cur);
>> +                       if (IS_ERR(ptr)) {
>> +                               ret = PTR_ERR(ptr);
>> +                               goto out_err;
>> +                       }
>> +                       tab.act[i] = ptr;
>> +                       found_val = true;
>> +               }
>> +       }
>> +
>> +       if (!found_val) {
>> +               ret = -EINVAL;
>> +               goto out_err;
>> +       }
>> +
>> +       ret = htab_map_update_elem(map, key, &tab, map_flags);
>> +       if (ret)
>> +               goto out_err;
>> +
>> +       return ret;
>> +
>> +out_err:
>> +       xdp_chain_map_put_ptrs(&tab);
>> +
>> +       return ret;
>> +}
>> +
>> +
>> +const struct bpf_map_ops xdp_chain_map_ops = {
>> +       .map_alloc_check = xdp_chain_map_alloc_check,
>> +       .map_alloc = htab_map_alloc,
>> +       .map_free = fd_htab_map_free,
>> +       .map_get_next_key = htab_map_get_next_key,
>> +       .map_delete_elem = htab_map_delete_elem,
>> +       .map_fd_put_value = xdp_chain_map_put_ptrs,
>> +       .map_check_btf = map_check_no_btf,
>> +};
>
> --
> Lorenz Bauer  |  Systems Engineer
> 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
>
> www.cloudflare.com

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH bpf-next 3/9] xdp: Support setting and getting device chain map
  2019-10-02 13:30 ` [PATCH bpf-next 3/9] xdp: Support setting and getting device chain map Toke Høiland-Jørgensen
  2019-10-02 15:50   ` Lorenz Bauer
  2019-10-02 18:07   ` kbuild test robot
@ 2019-10-02 18:29   ` kbuild test robot
  2 siblings, 0 replies; 53+ messages in thread
From: kbuild test robot @ 2019-10-02 18:29 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: kbuild-all, Daniel Borkmann, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Marek Majkowski,
	Lorenz Bauer, David Miller, Jesper Dangaard Brouer, netdev, bpf

[-- Attachment #1: Type: text/plain, Size: 5992 bytes --]

Hi Toke,

I love your patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Toke-H-iland-J-rgensen/xdp-Support-multiple-programs-on-a-single-interface-through-chain-calls/20191003-005238
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: x86_64-randconfig-s2-201939 (attached as .config)
compiler: gcc-5 (Ubuntu 5.5.0-12ubuntu1) 5.5.0 20171010
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   net/core/dev.c: In function 'dev_xdp_uninstall':
   net/core/dev.c:8187:3: error: implicit declaration of function 'bpf_map_put_with_uref' [-Werror=implicit-function-declaration]
      bpf_map_put_with_uref(chain_map);
      ^
   net/core/dev.c: In function 'dev_change_xdp_fd':
>> net/core/dev.c:8286:15: error: implicit declaration of function 'bpf_map_get_with_uref' [-Werror=implicit-function-declaration]
      chain_map = bpf_map_get_with_uref(chain_map_fd);
                  ^
   net/core/dev.c:8286:13: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
      chain_map = bpf_map_get_with_uref(chain_map_fd);
                ^
   cc1: some warnings being treated as errors

vim +/bpf_map_get_with_uref +8286 net/core/dev.c

  8177	
  8178	static void dev_xdp_uninstall(struct net_device *dev)
  8179	{
  8180		struct bpf_map *chain_map = NULL;
  8181		struct netdev_bpf xdp;
  8182		bpf_op_t ndo_bpf;
  8183	
  8184		/* Remove chain map */
  8185		rcu_swap_protected(dev->xdp_chain_map, chain_map, 1);
  8186		if (chain_map)
> 8187			bpf_map_put_with_uref(chain_map);
  8188	
  8189		/* Remove generic XDP */
  8190		WARN_ON(dev_xdp_install(dev, generic_xdp_install, NULL, 0, NULL));
  8191	
  8192		/* Remove from the driver */
  8193		ndo_bpf = dev->netdev_ops->ndo_bpf;
  8194		if (!ndo_bpf)
  8195			return;
  8196	
  8197		memset(&xdp, 0, sizeof(xdp));
  8198		xdp.command = XDP_QUERY_PROG;
  8199		WARN_ON(ndo_bpf(dev, &xdp));
  8200		if (xdp.prog_id)
  8201			WARN_ON(dev_xdp_install(dev, ndo_bpf, NULL, xdp.prog_flags,
  8202						NULL));
  8203	
  8204		/* Remove HW offload */
  8205		memset(&xdp, 0, sizeof(xdp));
  8206		xdp.command = XDP_QUERY_PROG_HW;
  8207		if (!ndo_bpf(dev, &xdp) && xdp.prog_id)
  8208			WARN_ON(dev_xdp_install(dev, ndo_bpf, NULL, xdp.prog_flags,
  8209						NULL));
  8210	}
  8211	
  8212	/**
  8213	 *	dev_change_xdp_fd - set or clear a bpf program for a device rx path
  8214	 *	@dev: device
  8215	 *	@extack: netlink extended ack
  8216	 *	@prog_fd: new program fd or negative value to clear
  8217	 *	@chain_map_fd: new chain map fd or negative value to clear
  8218	 *	@flags: xdp-related flags
  8219	 *
  8220	 *	Set or clear a bpf program for a device
  8221	 */
  8222	int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
  8223			      int prog_fd, int chain_map_fd, u32 flags)
  8224	{
  8225		const struct net_device_ops *ops = dev->netdev_ops;
  8226		struct bpf_map *chain_map = NULL;
  8227		enum bpf_netdev_command query;
  8228		struct bpf_prog *prog = NULL;
  8229		bpf_op_t bpf_op, bpf_chk;
  8230		bool offload;
  8231		int err;
  8232	
  8233		ASSERT_RTNL();
  8234	
  8235		offload = flags & XDP_FLAGS_HW_MODE;
  8236		query = offload ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
  8237	
  8238		bpf_op = bpf_chk = ops->ndo_bpf;
  8239		if (!bpf_op && (flags & (XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE))) {
  8240			NL_SET_ERR_MSG(extack, "underlying driver does not support XDP in native mode");
  8241			return -EOPNOTSUPP;
  8242		}
  8243		if (!bpf_op || (flags & XDP_FLAGS_SKB_MODE))
  8244			bpf_op = generic_xdp_install;
  8245		if (bpf_op == bpf_chk)
  8246			bpf_chk = generic_xdp_install;
  8247	
  8248		if (prog_fd >= 0) {
  8249			u32 prog_id;
  8250	
  8251			if (!offload && __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG)) {
  8252				NL_SET_ERR_MSG(extack, "native and generic XDP can't be active at the same time");
  8253				return -EEXIST;
  8254			}
  8255	
  8256			prog_id = __dev_xdp_query(dev, bpf_op, query);
  8257			if ((flags & XDP_FLAGS_UPDATE_IF_NOEXIST) && prog_id) {
  8258				NL_SET_ERR_MSG(extack, "XDP program already attached");
  8259				return -EBUSY;
  8260			}
  8261	
  8262			prog = bpf_prog_get_type_dev(prog_fd, BPF_PROG_TYPE_XDP,
  8263						     bpf_op == ops->ndo_bpf);
  8264			if (IS_ERR(prog))
  8265				return PTR_ERR(prog);
  8266	
  8267			if (!offload && bpf_prog_is_dev_bound(prog->aux)) {
  8268				NL_SET_ERR_MSG(extack, "using device-bound program without HW_MODE flag is not supported");
  8269				bpf_prog_put(prog);
  8270				return -EINVAL;
  8271			}
  8272	
  8273			if (prog->aux->id == prog_id) {
  8274				bpf_prog_put(prog);
  8275				return 0;
  8276			}
  8277		} else {
  8278			if (chain_map_fd >= 0)
  8279				return -EINVAL;
  8280	
  8281			if (!__dev_xdp_query(dev, bpf_op, query))
  8282				return 0;
  8283		}
  8284	
  8285		if (chain_map_fd >= 0) {
> 8286			chain_map = bpf_map_get_with_uref(chain_map_fd);
  8287			if (IS_ERR(chain_map))
  8288				return PTR_ERR(chain_map);
  8289	
  8290			if (chain_map->map_type != BPF_MAP_TYPE_XDP_CHAIN) {
  8291				NL_SET_ERR_MSG(extack, "invalid chain map type");
  8292				bpf_map_put_with_uref(chain_map);
  8293				return -EINVAL;
  8294			}
  8295		}
  8296	
  8297		err = dev_xdp_install(dev, bpf_op, extack, flags, prog);
  8298		if (err < 0) {
  8299			if (prog)
  8300				bpf_prog_put(prog);
  8301		} else {
  8302			rcu_swap_protected(dev->xdp_chain_map, chain_map, 1);
  8303		}
  8304	
  8305		if (chain_map)
  8306			bpf_map_put_with_uref(chain_map);
  8307	
  8308		return err;
  8309	}
  8310	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 37305 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH bpf-next 3/9] xdp: Support setting and getting device chain map
  2019-10-02 15:50   ` Lorenz Bauer
@ 2019-10-02 18:32     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 18:32 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Marek Majkowski, David Miller,
	Jesper Dangaard Brouer, Networking, bpf, kernel-team

Lorenz Bauer <lmb@cloudflare.com> writes:

> On Wed, 2 Oct 2019 at 14:30, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>
>> This adds support to rtnetlink for setting and getting the per-device XDP
>> chain map. The map is set by means of a new netlink attribute that contains
>> a pointer to a BPF map of the XDP chain type. If such an attribute is
>> included, it will be inserted into the struct net_device so that the XDP
>> chain call code will pick it up on program execution.
>>
>> To prevent old userspace programs that do not understand the chain map
>> attribute from messing up the chain call order, a netlink message with no
>> chain map attribute set will be rejected if a chain map has already been
>> installed.
>>
>> When installing a new chain call map, an XDP program fd must also be
>> provided, otherwise the operation will be rejected.
>
> Why is the program required? I kind of expected the chain call map to
> override any program.

Mainly because drivers expect to attach an XDP program, not a map. We
could conceivably emulate this by pulling out the first program from the
map on attach (after we define semantics for this; just lookup ID 0?)
and pass that down to the driver. But then we'd either need to make that
first program immutable in the map, or we'd need to propagate map
updates all the way down to the driver. Or alternatively, we'd need to
teach all the drivers how to attach a map instead of a program. Lots of
cans of worms there I'd rather not open... :)

A userspace application or library could emulate all of this, though, by
just accepting "attach" of a map, and turning that into the right
netlink message. It would have the same problem re: map updates, but if
it controls those that should be doable...

-Toke

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 16:34     ` John Fastabend
@ 2019-10-02 18:33       ` Toke Høiland-Jørgensen
  2019-10-02 20:34         ` John Fastabend
  0 siblings, 1 reply; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 18:33 UTC (permalink / raw)
  To: John Fastabend, Alan Maguire
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

John Fastabend <john.fastabend@gmail.com> writes:

> Toke Høiland-Jørgensen wrote:
>> Alan Maguire <alan.maguire@oracle.com> writes:
>> 
>> > On Wed, 2 Oct 2019, Toke Høiland-Jørgensen wrote:
>> >
>> >> This series adds support for executing multiple XDP programs on a single
>> >> interface in sequence, through the use of chain calls, as discussed at the Linux
>> >> Plumbers Conference last month:
>> >> 
>> >> https://linuxplumbersconf.org/event/4/contributions/460/
>> >> 
>> >> # HIGH-LEVEL IDEA
>> >> 
>> >> The basic idea is to express the chain call sequence through a special map type,
>> >> which contains a mapping from a (program, return code) tuple to another program
>> >> to run in next in the sequence. Userspace can populate this map to express
>> >> arbitrary call sequences, and update the sequence by updating or replacing the
>> >> map.
>> >> 
>> >> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
>> >> which will lookup the chain sequence map, and if found, will loop through calls
>> >> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
>> >> previous program ID and return code.
>> >> 
>> >> An XDP chain call map can be installed on an interface by means of a new netlink
>> >> attribute containing an fd pointing to a chain call map. This can be supplied
>> >> along with the XDP prog fd, so that a chain map is always installed together
>> >> with an XDP program.
>> >> 
>> >
>> > This is great stuff Toke!
>> 
>> Thanks! :)
>> 
>> > One thing that wasn't immediately clear to me - and this may be just
>> > me - is the relationship between program behaviour for the XDP_DROP
>> > case and chain call execution. My initial thought was that a program
>> > in the chain XDP_DROP'ping the packet would terminate the call chain,
>> > but on looking at patch #4 it seems that the only way the call chain
>> > execution is terminated is if
>> >
>> > - XDP_ABORTED is returned from a program in the call chain; or
>> 
>> Yes. Not actually sure about this one...
>> 
>> > - the map entry for the next program (determined by the return value
>> >   of the current program) is empty; or
>> 
>> This will be the common exit condition, I expect
>> 
>> > - we run out of entries in the map
>> 
>> You mean if we run the iteration counter to zero, right?
>> 
>> > The return value of the last-executed program in the chain seems to be
>> > what determines packet processing behaviour after executing the chain
>> > (_DROP, _TX, _PASS, etc). So there's no way to both XDP_PASS and
>> > XDP_TX a packet from the same chain, right? Just want to make sure
>> > I've got the semantics correct. Thanks!
>> 
>> Yeah, you've got all this right. The chain call mechanism itself doesn't
>> change any of the underlying fundamentals of XDP. I.e., each packet gets
>> exactly one verdict.
>> 
>> For chaining actual XDP programs that do different things to the packet,
>> I expect that the most common use case will be to only run the next
>> program if the previous one returns XDP_PASS. That will make the most
>> semantic sense I think.
>> 
>> But there are also use cases where one would want to match on the other
>> return codes; such as packet capture, for instance, where one might
>> install a capture program that would carry forward the previous return
>> code, but do something to the packet (throw it out to userspace) first.
>> 
>> For the latter use case, the question is if we need to expose the
>> previous return code to the program when it runs. You can do things
>> without it (by just using a different program per return code), but it
>> may simplify things if we just expose the return code. However, since
>> this will also change the semantics for running programs, I decided to
>> leave that off for now.
>> 
>> -Toke
>
> In other cases where programs (e.g. cgroups) are run in an array the
> return codes are 'AND'ed together so that we get
>
>    result1 & result2 & ... & resultN

How would that work with multiple programs, though? PASS -> DROP seems
obvious, but what if the first program returns TX? Also, programs may
want to be able to actually override return codes (e.g., say you want to
turn DROPs into REDIRECTs, to get all your dropped packets mirrored to
your IDS or something).

-Toke

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 13:30 [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls Toke Høiland-Jørgensen
                   ` (11 preceding siblings ...)
  2019-10-02 16:43 ` John Fastabend
@ 2019-10-02 18:38 ` Song Liu
  2019-10-02 18:54   ` Song Liu
  2019-10-02 19:23   ` Toke Høiland-Jørgensen
  12 siblings, 2 replies; 53+ messages in thread
From: Song Liu @ 2019-10-02 18:38 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin Lau, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf



> On Oct 2, 2019, at 6:30 AM, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> 
> This series adds support for executing multiple XDP programs on a single
> interface in sequence, through the use of chain calls, as discussed at the Linux
> Plumbers Conference last month:
> 
> https://linuxplumbersconf.org/event/4/contributions/460/
> 
> # HIGH-LEVEL IDEA
> 
> The basic idea is to express the chain call sequence through a special map type,
> which contains a mapping from a (program, return code) tuple to another program
> to run next in the sequence. Userspace can populate this map to express
> arbitrary call sequences, and update the sequence by updating or replacing the
> map.
> 
> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
> which will lookup the chain sequence map, and if found, will loop through calls
> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
> previous program ID and return code.
> 
> An XDP chain call map can be installed on an interface by means of a new netlink
> attribute containing an fd pointing to a chain call map. This can be supplied
> along with the XDP prog fd, so that a chain map is always installed together
> with an XDP program.

Interesting work!

Quick question: can we achieve the same by adding a "retval to call_tail_next" 
map to each program? I think one issue is how to avoid a loop like A->B->C->A, 
but this should be solvable? 

> 
> # PERFORMANCE
> 
> I performed a simple performance test to get an initial feel for the overhead of
> the chain call mechanism. This test consists of running only two programs in
> sequence: One that returns XDP_PASS and another that returns XDP_DROP. I then
> measure the drop PPS performance and compare it to a baseline of just a single
> program that only returns XDP_DROP.
> 
> For comparison, a test case that uses regular eBPF tail calls to sequence two
> programs together is also included. Finally, because 'perf' showed that the
> hashmap lookup was the largest single source of overhead, I also added a test
> case where I removed the jhash() call from the hashmap code, and just use the
> u32 key directly as an index into the hash bucket structure.
> 
> The performance for these different cases is as follows (with retpolines disabled):
> 
> | Test case                       | Perf      | Add. overhead | Total overhead |
> |---------------------------------+-----------+---------------+----------------|
> | Before patch (XDP DROP program) | 31.0 Mpps |               |                |
> | After patch (XDP DROP program)  | 28.9 Mpps |        2.3 ns |         2.3 ns |
> | XDP tail call                   | 26.6 Mpps |        3.0 ns |         5.3 ns |
> | XDP chain call (no jhash)       | 19.6 Mpps |       13.4 ns |        18.7 ns |
> | XDP chain call (this series)    | 17.0 Mpps |        7.9 ns |        26.6 ns |
> 
> From this it is clear that while there is some overhead from this mechanism,
> the jhash removal example indicates that it is probably possible to optimise the
> code to the point where the overhead becomes low enough that it is acceptable.

I think we can probably re-jit multiple programs into one based on the mapping, 
which should give the best performance. 

Thanks,
Song

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 16:35 ` Lorenz Bauer
@ 2019-10-02 18:54   ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 18:54 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Marek Majkowski, David Miller,
	Jesper Dangaard Brouer, Networking, bpf, kernel-team

Lorenz Bauer <lmb@cloudflare.com> writes:

> On Wed, 2 Oct 2019 at 14:30, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> This series adds support for executing multiple XDP programs on a single
>> interface in sequence, through the use of chain calls, as discussed at the Linux
>> Plumbers Conference last month:
>
> Hi Toke,
>
> Thanks for posting the patch set! As I mentioned, this is a big pain
> point for us as well. Right now, all of our different XDP components
> are managed by a single daemon called (drumroll) xdpd. We'd like to
> separate this out into individual bits and pieces that operate
> separately from each other.

Yes, I'm counting on you guys to be one of the potential users and help
me iron out the API and semantics! ;)

> I've looked at the kernel side of your patch set, here are my thoughts:
>
>> # HIGH-LEVEL IDEA
>>
>> The basic idea is to express the chain call sequence through a special map type,
>> which contains a mapping from a (program, return code) tuple to another program
>> to run next in the sequence. Userspace can populate this map to express
>> arbitrary call sequences, and update the sequence by updating or replacing the
>> map.
>
> How do you imagine this would work in practice? From what I can tell,
> the map is keyed by program id, which makes it difficult to reliably
> construct the control flow that I want.

The way I've been picturing this working is that programs would
cooperatively manage this by updating the data structures to insert and
remove themselves as needed. I haven't actually fleshed this out in code
yet (that's next on my agenda), but I imagine it would be something like
this:

> As an example, I'd like to split the daemon into two parts, A and B,
> which I want to be completely independent. So:
>
> - A runs before B, if both are active
> - If A is not active, B is called first
> - If B is not active, only A is called
>
> Both A and B may at any point in time replace their current XDP
> program with a new one. This means that there is no stable program ID
> I can use.

They would have to cooperate. Either by a central management daemon, or
by using the chain call map as a data structure to coordinate.

If A switches out its program, it would need to first load the new
version, then update the map to replace the old ID with the new one.

> Another problem are deletions: if I delete A (because that component
> is shut down) B will never be called, since the program ID that linked
> B into the control flow is gone. This means that B needs to know about
> A and vice versa.

Say both programs are running. If B shuts down, it just removes itself
from the map. If A shuts down, it also needs to remove itself from the
map, and move up all its "descendants" in the call sequence. So in this
case, A would look up itself in the chain call map, find the next
program in the sequence, and install that as the new program on the
interface.

The operations are kinda like linked list manipulations. I had some
examples in my slides at LPC:

- Simple updates: linked-list-like operations (the map stays the same)
# Insert after id 3
  --> id = load(prog.o);
  --> map_update(map, {3, PASS}, id) # atomic update
# Insert before id 2
  --> id = load(prog.o);
  --> map_update(map, {id, PASS}, 2); # no effect on chain sequence
  --> map_update(map, {1, PASS}, id); # atomic update

- More complex operations: replace the whole map

# Replace ID 3 with new program
  --> id = load(prog.o); map = new_map();
  --> map_update(map, {1, PASS}, 2);
  --> map_update(map, {1, TX}, id);
  --> map_update(map, {2, PASS}, id);
  --> xdp_attach(eth0, 1, map, FORCE); # atomic replace


The tricky part is avoiding read-modify-update races. I was imagining
that we could provide "cmpxchg-like" semantics by adding an "expected
old value" to the netlink load and/or to map updates. The kernel could
then ensure that the update fails if the actual value being replaced has
changed since userspace read it, in which case userspace could retry the
whole operation.

>> The actual execution of the program sequence is done in
>> bpf_prog_run_xdp(), which will lookup the chain sequence map, and if
>> found, will loop through calls to BPF_PROG_RUN, looking up the next
>> XDP program in the sequence based on the previous program ID and
>> return code.
>
> I think that the tail call chain is useful for other eBPF programs as
> well. How hard is it to turn the logic in bpf_prog_run_xdp into a
> helper instead?

Heh. In principle, I guess we could move the logic *into* BPF_PROG_RUN
instead of wrapping around it. The tricky part would be figuring out
where to stash the map itself.

One way to do this, which your other question about why both prog fd and
map fd was needed made me think about, is to make programs and sequence
maps interchangeable *everywhere*. I.e., every time we attach an eBPF
program anywhere, we pass an fd to that program. So from a UAPI PoV, we
could just extend that so all attach points would accept *either* a
program fd *or* a chain call map fd, and then do the appropriate thing.
This is quite intrusive, though, so let's leave that aside for now and
maybe come back to it in the future if this turns out to be a huge
success :)

-Toke


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 18:38 ` Song Liu
@ 2019-10-02 18:54   ` Song Liu
  2019-10-02 19:25     ` Toke Høiland-Jørgensen
  2019-10-02 19:23   ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 53+ messages in thread
From: Song Liu @ 2019-10-02 18:54 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin Lau, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf



> On Oct 2, 2019, at 11:38 AM, Song Liu <songliubraving@fb.com> wrote:
> 
> 
> 
>> On Oct 2, 2019, at 6:30 AM, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> 
>> This series adds support for executing multiple XDP programs on a single
>> interface in sequence, through the use of chain calls, as discussed at the Linux
>> Plumbers Conference last month:
>> 
>> https://linuxplumbersconf.org/event/4/contributions/460/
>> 
>> # HIGH-LEVEL IDEA
>> 
>> The basic idea is to express the chain call sequence through a special map type,
>> which contains a mapping from a (program, return code) tuple to another program
>> to run next in the sequence. Userspace can populate this map to express
>> arbitrary call sequences, and update the sequence by updating or replacing the
>> map.
>> 
>> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
>> which will lookup the chain sequence map, and if found, will loop through calls
>> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
>> previous program ID and return code.
>> 
>> An XDP chain call map can be installed on an interface by means of a new netlink
>> attribute containing an fd pointing to a chain call map. This can be supplied
>> along with the XDP prog fd, so that a chain map is always installed together
>> with an XDP program.
> 
> Interesting work!
> 
> Quick question: can we achieve the same by adding a "retval to call_tail_next" 
> map to each program? I think one issue is how to avoid loops like A->B->C->A,
> but this should be solvable? 

Also, could you please share a real world example? I saw the example from LPC
slides, but I am more curious about what each program does in real use cases.

Thanks,
Song


* RE: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 16:43 ` John Fastabend
@ 2019-10-02 19:09   ` Toke Høiland-Jørgensen
  2019-10-02 19:15   ` Daniel Borkmann
  1 sibling, 0 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 19:09 UTC (permalink / raw)
  To: John Fastabend, Daniel Borkmann
  Cc: Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

John Fastabend <john.fastabend@gmail.com> writes:

> Toke Høiland-Jørgensen wrote:
>> This series adds support for executing multiple XDP programs on a single
>> interface in sequence, through the use of chain calls, as discussed at the Linux
>> Plumbers Conference last month:
>> 
>> https://linuxplumbersconf.org/event/4/contributions/460/
>> 
>> # HIGH-LEVEL IDEA
>> 
>> The basic idea is to express the chain call sequence through a special map type,
>> which contains a mapping from a (program, return code) tuple to another program
>> to run next in the sequence. Userspace can populate this map to express
>> arbitrary call sequences, and update the sequence by updating or replacing the
>> map.
>> 
>> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
>> which will lookup the chain sequence map, and if found, will loop through calls
>> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
>> previous program ID and return code.
>> 
>> An XDP chain call map can be installed on an interface by means of a new netlink
>> attribute containing an fd pointing to a chain call map. This can be supplied
>> along with the XDP prog fd, so that a chain map is always installed together
>> with an XDP program.
>> 
>> # PERFORMANCE
>> 
>> I performed a simple performance test to get an initial feel for the overhead of
>> the chain call mechanism. This test consists of running only two programs in
>> sequence: One that returns XDP_PASS and another that returns XDP_DROP. I then
>> measure the drop PPS performance and compare it to a baseline of just a single
>> program that only returns XDP_DROP.
>> 
>> For comparison, a test case that uses regular eBPF tail calls to sequence two
>> programs together is also included. Finally, because 'perf' showed that the
>> hashmap lookup was the largest single source of overhead, I also added a test
>> case where I removed the jhash() call from the hashmap code, and just use the
>> u32 key directly as an index into the hash bucket structure.
>> 
>> The performance for these different cases is as follows (with retpolines disabled):
>
> retpolines enabled would also be interesting.

Yes, I'll try that as well.

>> 
>> | Test case                       | Perf      | Add. overhead | Total overhead |
>> |---------------------------------+-----------+---------------+----------------|
>> | Before patch (XDP DROP program) | 31.0 Mpps |               |                |
>> | After patch (XDP DROP program)  | 28.9 Mpps |        2.3 ns |         2.3 ns |
>
> IMO even 1 Mpps overhead is too much for a feature that is primarily
> about ease of use. Sacrificing performance to make userland a bit
> easier is hard to justify for me when XDP _is_ singularly about
> performance. Also that is nearly 10% overhead which is fairly large.
> So I think going forward the performance gap needs to be removed.

It's not just about ease of use. If we really want XDP to catch on and
be integrated into third-party applications, we *need* to solve the
multi-program problem. Otherwise users are going to have to choose
between running their DDoS protection application and their IDS, which
means no one is going to be able to rely on XDP support, and so no one
is going to bother to implement it. Or at least, I'm afraid it'll end up
that way.

That being said, I agree that the performance impact should be kept at
an absolute minimum. In terms of code it already is: it's a single if
statement to check if a chain map is loaded. I haven't looked into why
that takes 2.3 ns to do yet. I suspect it may just be because we're
taking a cache miss: the chain map pointer is not stored anywhere near
the rest of the XDP data structures...

>> | XDP tail call                   | 26.6 Mpps |        3.0 ns |         5.3 ns |
>> | XDP chain call (no jhash)       | 19.6 Mpps |       13.4 ns |        18.7 ns |
>> | XDP chain call (this series)    | 17.0 Mpps |        7.9 ns |        26.6 ns |
>> 
>> From this it is clear that while there is some overhead from this mechanism,
>> the jhash removal example indicates that it is probably possible to optimise the
>> code to the point where the overhead becomes low enough that it is acceptable.
>
> I'm missing why 'in theory' at least this can't be made as-fast as
> tail calls? Again I can't see why someone would lose 30% of their
> performance when a userland program could populate a tail call map for
> the same effect. Sure userland would also have to enforce some program
> standards/conventions but it could be done and at 30% overhead that
> pain is probably worth it IMO.
>
> My thinking though is if we are a bit clever chaining and tail calls
> could be performance-wise equivalent?

Oh, I totally agree that 30% overhead is way too much. I just
prioritised getting this to work and send it out for comments to get the
conversation going around the API before I spend too much time
optimising the performance bits.

I think it should be possible to get it pretty close to pure tail calls.
A tail call compiles down to a direct jump instruction without leaving
the BPF execution environment, though, so maybe not *exactly* as fast.
But I'd settle for an overhead of, say, a handful of nanoseconds
compared to a pure tail call. Certainly not the tens of nanoseconds I
get right now.

Also, bear in mind that an overhead of say 4-6 nanoseconds only
translates into such a large percentage-wise overhead in this test
because the XDP programs being run don't actually do anything - not even
touch the packet data. As the programs themselves get larger, the
percentage-wise overhead becomes smaller as well. So a test like this
shows the worst-case performance.

-Toke


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 16:43 ` John Fastabend
  2019-10-02 19:09   ` Toke Høiland-Jørgensen
@ 2019-10-02 19:15   ` Daniel Borkmann
  2019-10-02 19:29     ` Toke Høiland-Jørgensen
  2019-10-02 19:46     ` Alexei Starovoitov
  1 sibling, 2 replies; 53+ messages in thread
From: Daniel Borkmann @ 2019-10-02 19:15 UTC (permalink / raw)
  To: John Fastabend
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Marek Majkowski,
	Lorenz Bauer, David Miller, Jesper Dangaard Brouer, netdev, bpf

On Wed, Oct 02, 2019 at 09:43:49AM -0700, John Fastabend wrote:
> Toke Høiland-Jørgensen wrote:
> > This series adds support for executing multiple XDP programs on a single
> > interface in sequence, through the use of chain calls, as discussed at the Linux
> > Plumbers Conference last month:
> > 
> > https://linuxplumbersconf.org/event/4/contributions/460/
> > 
> > # HIGH-LEVEL IDEA
> > 
> > The basic idea is to express the chain call sequence through a special map type,
> > which contains a mapping from a (program, return code) tuple to another program
> > to run next in the sequence. Userspace can populate this map to express
> > arbitrary call sequences, and update the sequence by updating or replacing the
> > map.
> > 
> > The actual execution of the program sequence is done in bpf_prog_run_xdp(),
> > which will lookup the chain sequence map, and if found, will loop through calls
> > to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
> > previous program ID and return code.
> > 
> > An XDP chain call map can be installed on an interface by means of a new netlink
> > attribute containing an fd pointing to a chain call map. This can be supplied
> > along with the XDP prog fd, so that a chain map is always installed together
> > with an XDP program.
> > 
> > # PERFORMANCE
> > 
> > I performed a simple performance test to get an initial feel for the overhead of
> > the chain call mechanism. This test consists of running only two programs in
> > sequence: One that returns XDP_PASS and another that returns XDP_DROP. I then
> > measure the drop PPS performance and compare it to a baseline of just a single
> > program that only returns XDP_DROP.
> > 
> > For comparison, a test case that uses regular eBPF tail calls to sequence two
> > programs together is also included. Finally, because 'perf' showed that the
> > hashmap lookup was the largest single source of overhead, I also added a test
> > case where I removed the jhash() call from the hashmap code, and just use the
> > u32 key directly as an index into the hash bucket structure.
> > 
> > The performance for these different cases is as follows (with retpolines disabled):
> 
> retpolines enabled would also be interesting.
> 
> > 
> > | Test case                       | Perf      | Add. overhead | Total overhead |
> > |---------------------------------+-----------+---------------+----------------|
> > | Before patch (XDP DROP program) | 31.0 Mpps |               |                |
> > | After patch (XDP DROP program)  | 28.9 Mpps |        2.3 ns |         2.3 ns |
> 
> IMO even 1 Mpps overhead is too much for a feature that is primarily about
> ease of use. Sacrificing performance to make userland a bit easier is hard
> to justify for me when XDP _is_ singularly about performance. Also that is
> nearly 10% overhead which is fairly large. So I think going forward the
> performance gap needs to be removed.

Fully agree, for the case where this facility is not used, it must have
*zero* overhead. This is /one/ map flavor, in future there will be other
facilities with different use-cases, but we cannot place them all into
the critical fast-path. Given this is BPF, we have the flexibility that
this can be hidden behind the scenes by rewriting and therefore only add
overhead when used.

What I also see as a red flag with this proposal is the fact that it's
tied to XDP only because you need to go and hack bpf_prog_run_xdp() all
the way to fetch xdp->rxq->dev->xdp_chain_map even though the map/concept
itself is rather generic and could be used in various other program types
as well. I'm very sure that once there, people would request it. Therefore,
better to explore a way where this has no changes to BPF_PROG_RUN() similar
to the original tail call work.

Thanks,
Daniel


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 18:38 ` Song Liu
  2019-10-02 18:54   ` Song Liu
@ 2019-10-02 19:23   ` Toke Høiland-Jørgensen
  2019-10-02 19:49     ` Song Liu
  1 sibling, 1 reply; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 19:23 UTC (permalink / raw)
  To: Song Liu
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin Lau, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

Song Liu <songliubraving@fb.com> writes:

>> On Oct 2, 2019, at 6:30 AM, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> 
>> This series adds support for executing multiple XDP programs on a single
>> interface in sequence, through the use of chain calls, as discussed at the Linux
>> Plumbers Conference last month:
>> 
>> https://linuxplumbersconf.org/event/4/contributions/460/
>> 
>> # HIGH-LEVEL IDEA
>> 
>> The basic idea is to express the chain call sequence through a special map type,
>> which contains a mapping from a (program, return code) tuple to another program
>> to run next in the sequence. Userspace can populate this map to express
>> arbitrary call sequences, and update the sequence by updating or replacing the
>> map.
>> 
>> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
>> which will lookup the chain sequence map, and if found, will loop through calls
>> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
>> previous program ID and return code.
>> 
>> An XDP chain call map can be installed on an interface by means of a new netlink
>> attribute containing an fd pointing to a chain call map. This can be supplied
>> along with the XDP prog fd, so that a chain map is always installed together
>> with an XDP program.
>
> Interesting work!
>
> Quick question: can we achieve the same by adding a "retval to
> call_tail_next" map to each program?

Hmm, that's an interesting idea; I hadn't thought of that. As long as
that map can be manipulated outside of the program itself, it may work.
I wonder how complex it gets to modify the call sequence, though; say
you want to change A->B->C to A->C->B - how do you do that without
interrupting the sequence while you're modifying things? Or is it OK if
that is not possible?

> I think one issue is how to avoid loop like A->B->C->A, but this
> should be solvable?

Well, for tail calls there's already a counter that breaks the sequence
after a certain number of calls. We could do the same here.

>> # PERFORMANCE
>> 
>> I performed a simple performance test to get an initial feel for the overhead of
>> the chain call mechanism. This test consists of running only two programs in
>> sequence: One that returns XDP_PASS and another that returns XDP_DROP. I then
>> measure the drop PPS performance and compare it to a baseline of just a single
>> program that only returns XDP_DROP.
>> 
>> For comparison, a test case that uses regular eBPF tail calls to sequence two
>> programs together is also included. Finally, because 'perf' showed that the
>> hashmap lookup was the largest single source of overhead, I also added a test
>> case where I removed the jhash() call from the hashmap code, and just use the
>> u32 key directly as an index into the hash bucket structure.
>> 
>> The performance for these different cases is as follows (with retpolines disabled):
>> 
>> | Test case                       | Perf      | Add. overhead | Total overhead |
>> |---------------------------------+-----------+---------------+----------------|
>> | Before patch (XDP DROP program) | 31.0 Mpps |               |                |
>> | After patch (XDP DROP program)  | 28.9 Mpps |        2.3 ns |         2.3 ns |
>> | XDP tail call                   | 26.6 Mpps |        3.0 ns |         5.3 ns |
>> | XDP chain call (no jhash)       | 19.6 Mpps |       13.4 ns |        18.7 ns |
>> | XDP chain call (this series)    | 17.0 Mpps |        7.9 ns |        26.6 ns |
>> 
>> From this it is clear that while there is some overhead from this mechanism,
>> the jhash removal example indicates that it is probably possible to optimise the
>> code to the point where the overhead becomes low enough that it is acceptable.
>
> I think we can probably re-jit multiple programs into one based on the
> mapping, which should give the best performance.

Yeah, integrating this into the jit+verifier would obviously give the
best performance. But I wanted to avoid that because I viewed this as an
XDP-specific feature, and I didn't want to add more complexity to the
already somewhat complex verifier.

However, if there's really interest in having this be a general feature
outside of XDP, I guess I can look at that again.

-Toke


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 18:54   ` Song Liu
@ 2019-10-02 19:25     ` Toke Høiland-Jørgensen
  2019-10-03  8:53       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 19:25 UTC (permalink / raw)
  To: Song Liu
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin Lau, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

Song Liu <songliubraving@fb.com> writes:

>> On Oct 2, 2019, at 11:38 AM, Song Liu <songliubraving@fb.com> wrote:
>> 
>> 
>> 
>>> On Oct 2, 2019, at 6:30 AM, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>> 
>>> This series adds support for executing multiple XDP programs on a single
>>> interface in sequence, through the use of chain calls, as discussed at the Linux
>>> Plumbers Conference last month:
>>> 
>>> https://linuxplumbersconf.org/event/4/contributions/460/
>>> 
>>> # HIGH-LEVEL IDEA
>>> 
>>> The basic idea is to express the chain call sequence through a special map type,
>>> which contains a mapping from a (program, return code) tuple to another program
>>> to run next in the sequence. Userspace can populate this map to express
>>> arbitrary call sequences, and update the sequence by updating or replacing the
>>> map.
>>> 
>>> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
>>> which will lookup the chain sequence map, and if found, will loop through calls
>>> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
>>> previous program ID and return code.
>>> 
>>> An XDP chain call map can be installed on an interface by means of a new netlink
>>> attribute containing an fd pointing to a chain call map. This can be supplied
>>> along with the XDP prog fd, so that a chain map is always installed together
>>> with an XDP program.
>> 
>> Interesting work!
>> 
>> Quick question: can we achieve the same by adding a "retval to call_tail_next" 
>> map to each program? I think one issue is how to avoid loops like A->B->C->A, 
>> but this should be solvable? 
>
> Also, could you please share a real world example? I saw the example
> from LPC slides, but I am more curious about what each program does
> in real use cases.

The only concrete program that I have that needs this is xdpcap:
https://github.com/cloudflare/xdpcap

Right now that needs to be integrated into the calling program to work;
I want to write a tool like it, but that can insert itself before or
after arbitrary XDP programs.

Lorenz, can you say more about your use case? :)

-Toke


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 19:15   ` Daniel Borkmann
@ 2019-10-02 19:29     ` Toke Høiland-Jørgensen
  2019-10-02 19:46     ` Alexei Starovoitov
  1 sibling, 0 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-02 19:29 UTC (permalink / raw)
  To: Daniel Borkmann, John Fastabend
  Cc: Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

Daniel Borkmann <daniel@iogearbox.net> writes:

> On Wed, Oct 02, 2019 at 09:43:49AM -0700, John Fastabend wrote:
>> Toke Høiland-Jørgensen wrote:
>> > This series adds support for executing multiple XDP programs on a single
>> > interface in sequence, through the use of chain calls, as discussed at the Linux
>> > Plumbers Conference last month:
>> > 
>> > https://linuxplumbersconf.org/event/4/contributions/460/
>> > 
>> > # HIGH-LEVEL IDEA
>> > 
>> > The basic idea is to express the chain call sequence through a special map type,
>> > which contains a mapping from a (program, return code) tuple to another program
>> > to run next in the sequence. Userspace can populate this map to express
>> > arbitrary call sequences, and update the sequence by updating or replacing the
>> > map.
>> > 
>> > The actual execution of the program sequence is done in bpf_prog_run_xdp(),
>> > which will lookup the chain sequence map, and if found, will loop through calls
>> > to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
>> > previous program ID and return code.
>> > 
>> > An XDP chain call map can be installed on an interface by means of a new netlink
>> > attribute containing an fd pointing to a chain call map. This can be supplied
>> > along with the XDP prog fd, so that a chain map is always installed together
>> > with an XDP program.
>> > 
>> > # PERFORMANCE
>> > 
>> > I performed a simple performance test to get an initial feel for the overhead of
>> > the chain call mechanism. This test consists of running only two programs in
>> > sequence: One that returns XDP_PASS and another that returns XDP_DROP. I then
>> > measure the drop PPS performance and compare it to a baseline of just a single
>> > program that only returns XDP_DROP.
>> > 
>> > For comparison, a test case that uses regular eBPF tail calls to sequence two
>> > programs together is also included. Finally, because 'perf' showed that the
>> > hashmap lookup was the largest single source of overhead, I also added a test
>> > case where I removed the jhash() call from the hashmap code, and just use the
>> > u32 key directly as an index into the hash bucket structure.
>> > 
>> > The performance for these different cases is as follows (with retpolines disabled):
>> 
>> retpolines enabled would also be interesting.
>> 
>> > 
>> > | Test case                       | Perf      | Add. overhead | Total overhead |
>> > |---------------------------------+-----------+---------------+----------------|
>> > | Before patch (XDP DROP program) | 31.0 Mpps |               |                |
>> > | After patch (XDP DROP program)  | 28.9 Mpps |        2.3 ns |         2.3 ns |
>> 
>> IMO even 1 Mpps overhead is too much for a feature that is primarily about
>> ease of use. Sacrificing performance to make userland a bit easier is hard
>> to justify for me when XDP _is_ singularly about performance. Also that is
>> nearly 10% overhead which is fairly large. So I think going forward the
>> performance gap needs to be removed.
>
> Fully agree, for the case where this facility is not used, it must
> have *zero* overhead. This is /one/ map flavor, in future there will
> be other facilities with different use-cases, but we cannot place them
> all into the critical fast-path. Given this is BPF, we have the
> flexibility that this can be hidden behind the scenes by rewriting and
> therefore only add overhead when used.
>
> What I also see as a red flag with this proposal is the fact that it's
> tied to XDP only because you need to go and hack bpf_prog_run_xdp()
> all the way to fetch xdp->rxq->dev->xdp_chain_map even though the
> map/concept itself is rather generic and could be used in various
> other program types as well. I'm very sure that once there, people
> would request it. Therefore, better to explore a way where this has no
> changes to BPF_PROG_RUN() similar to the original tail call work.

As I said in the other reply, I actually went out of my way to make this
XDP only. But since you're now the third person requesting it not be, I
guess I'll take the hint and look at a more general way to hook this in :)

-Toke

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 19:15   ` Daniel Borkmann
  2019-10-02 19:29     ` Toke Høiland-Jørgensen
@ 2019-10-02 19:46     ` Alexei Starovoitov
  2019-10-03  7:58       ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 53+ messages in thread
From: Alexei Starovoitov @ 2019-10-02 19:46 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: John Fastabend, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

On Wed, Oct 02, 2019 at 09:15:22PM +0200, Daniel Borkmann wrote:
> On Wed, Oct 02, 2019 at 09:43:49AM -0700, John Fastabend wrote:
> > Toke Høiland-Jørgensen wrote:
> > > This series adds support for executing multiple XDP programs on a single
> > > interface in sequence, through the use of chain calls, as discussed at the Linux
> > > Plumbers Conference last month:
> > > 
> > > https://linuxplumbersconf.org/event/4/contributions/460/
> > > 
> > > # HIGH-LEVEL IDEA
> > > 
> > > The basic idea is to express the chain call sequence through a special map type,
> > > which contains a mapping from a (program, return code) tuple to another program
> > > to run next in the sequence. Userspace can populate this map to express
> > > arbitrary call sequences, and update the sequence by updating or replacing the
> > > map.
> > > 
> > > The actual execution of the program sequence is done in bpf_prog_run_xdp(),
> > > which will look up the chain sequence map, and if found, will loop through calls
> > > to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
> > > previous program ID and return code.
> > > 
> > > An XDP chain call map can be installed on an interface by means of a new netlink
> > > attribute containing an fd pointing to a chain call map. This can be supplied
> > > along with the XDP prog fd, so that a chain map is always installed together
> > > with an XDP program.
> > > 
> > > # PERFORMANCE
> > > 
> > > I performed a simple performance test to get an initial feel for the overhead of
> > > the chain call mechanism. This test consists of running only two programs in
> > > sequence: One that returns XDP_PASS and another that returns XDP_DROP. I then
> > > measure the drop PPS performance and compare it to a baseline of just a single
> > > program that only returns XDP_DROP.
> > > 
> > > For comparison, a test case that uses regular eBPF tail calls to sequence two
> > > programs together is also included. Finally, because 'perf' showed that the
> > > hashmap lookup was the largest single source of overhead, I also added a test
> > > case where I removed the jhash() call from the hashmap code, and just used the
> > > u32 key directly as an index into the hash bucket structure.
> > > 
> > > The performance for these different cases is as follows (with retpolines disabled):
> > 
> > retpolines enabled would also be interesting.
> > 
> > > 
> > > | Test case                       | Perf      | Add. overhead | Total overhead |
> > > |---------------------------------+-----------+---------------+----------------|
> > > | Before patch (XDP DROP program) | 31.0 Mpps |               |                |
> > > | After patch (XDP DROP program)  | 28.9 Mpps |        2.3 ns |         2.3 ns |
> > 
> > IMO even 1 Mpps overhead is too much for a feature that is primarily about
> > ease of use. Sacrificing performance to make userland a bit easier is hard
> > to justify for me when XDP _is_ singularly about performance. Also that is
> > nearly 10% overhead which is fairly large. So I think going forward the
> > performance gap needs to be closed.
> 
> Fully agree, for the case where this facility is not used, it must have
> *zero* overhead. This is /one/ map flavor, in future there will be other
> facilities with different use-cases, but we cannot place them all into
> the critical fast-path. Given this is BPF, we have the flexibility that
> this can be hidden behind the scenes by rewriting and therefore only add
> overhead when used.
> 
> What I also see as a red flag with this proposal is the fact that it's
> tied to XDP only because you need to go and hack bpf_prog_run_xdp() all
> the way to fetch xdp->rxq->dev->xdp_chain_map even though the map/concept
> itself is rather generic and could be used in various other program types
> as well. I'm very sure that once there, people would request it. Therefore,
> better to explore a way where this has no changes to BPF_PROG_RUN() similar
> to the original tail call work.

two +1s.

1. new features have to have zero overhead when not used. this is not negotiable.
2. prog chaining is not xdp specific.

two years ago I was thinking about extending tail_call mechanism like this:
https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/commit/?h=prog_chain&id=f54f45d00f91e083f6aec2abe35b6f0be52ae85b

and the program would call the new helper 'bpf_tail_call_next()' to jump
into the next program.
Sample code is here:
https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/commit/?h=prog_chain&id=933a93208f1bd60a9707dc3f51a9aa457c86be87

In my benchmarks it was faster than existing bpf_tail_call via prog_array.
(And fit the rule of zero overhead when not used).

While coding it I figured that we can do proper indirect calls instead,
which would be even cleaner solution.
It would support arbitrary program chaining and calling.

The verifier back then didn't have enough infra to support indirect calls.
I suggest looking into implementing indirect calls instead of hacking
custom prog chaining logic via maps.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 19:23   ` Toke Høiland-Jørgensen
@ 2019-10-02 19:49     ` Song Liu
  2019-10-03  7:59       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 53+ messages in thread
From: Song Liu @ 2019-10-02 19:49 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin Lau, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf



> On Oct 2, 2019, at 12:23 PM, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> 
> Song Liu <songliubraving@fb.com> writes:
> 
>>> On Oct 2, 2019, at 6:30 AM, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>> 
>>> This series adds support for executing multiple XDP programs on a single
>>> interface in sequence, through the use of chain calls, as discussed at the Linux
>>> Plumbers Conference last month:
>>> 
>>> https://linuxplumbersconf.org/event/4/contributions/460/
>>> 
>>> # HIGH-LEVEL IDEA
>>> 
>>> The basic idea is to express the chain call sequence through a special map type,
>>> which contains a mapping from a (program, return code) tuple to another program
>>> to run next in the sequence. Userspace can populate this map to express
>>> arbitrary call sequences, and update the sequence by updating or replacing the
>>> map.
>>> 
>>> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
>>> which will look up the chain sequence map, and if found, will loop through calls
>>> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
>>> previous program ID and return code.
>>> 
>>> An XDP chain call map can be installed on an interface by means of a new netlink
>>> attribute containing an fd pointing to a chain call map. This can be supplied
>>> along with the XDP prog fd, so that a chain map is always installed together
>>> with an XDP program.
>> 
>> Interesting work!
>> 
>> Quick question: can we achieve the same by adding a "retval to
>> call_tail_next" map to each program?
> 
> Hmm, that's an interesting idea; I hadn't thought of that. As long as
> that map can be manipulated outside of the program itself, it may work.
> I wonder how complex it gets to modify the call sequence, though; say
> you want to change A->B->C to A->C->B - how do you do that without
> interrupting the sequence while you're modifying things? Or is it OK if
> that is not possible?

We can always load another copy of B and C, say D == B, and E == C. And 
make it A->E->D. 



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 18:33       ` Toke Høiland-Jørgensen
@ 2019-10-02 20:34         ` John Fastabend
  2019-10-03  7:48           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 53+ messages in thread
From: John Fastabend @ 2019-10-02 20:34 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, John Fastabend, Alan Maguire
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

Toke Høiland-Jørgensen wrote:
> John Fastabend <john.fastabend@gmail.com> writes:
> 
> > Toke Høiland-Jørgensen wrote:
> >> Alan Maguire <alan.maguire@oracle.com> writes:
> >> 
> >> > On Wed, 2 Oct 2019, Toke Høiland-Jørgensen wrote:
> >> >
> >> >> This series adds support for executing multiple XDP programs on a single
> >> >> interface in sequence, through the use of chain calls, as discussed at the Linux
> >> >> Plumbers Conference last month:
> >> >> 
> >> >> https://linuxplumbersconf.org/event/4/contributions/460/
> >> >> 
> >> >> # HIGH-LEVEL IDEA
> >> >> 
> >> >> The basic idea is to express the chain call sequence through a special map type,
> >> >> which contains a mapping from a (program, return code) tuple to another program
> >> >> to run next in the sequence. Userspace can populate this map to express
> >> >> arbitrary call sequences, and update the sequence by updating or replacing the
> >> >> map.
> >> >> 
> >> >> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
> >> >> which will look up the chain sequence map, and if found, will loop through calls
> >> >> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
> >> >> previous program ID and return code.
> >> >> 
> >> >> An XDP chain call map can be installed on an interface by means of a new netlink
> >> >> attribute containing an fd pointing to a chain call map. This can be supplied
> >> >> along with the XDP prog fd, so that a chain map is always installed together
> >> >> with an XDP program.
> >> >> 
> >> >
> >> > This is great stuff Toke!
> >> 
> >> Thanks! :)
> >> 
> >> > One thing that wasn't immediately clear to me - and this may be just
> >> > me - is the relationship between program behaviour for the XDP_DROP
> >> > case and chain call execution. My initial thought was that a program
> >> > in the chain XDP_DROP'ping the packet would terminate the call chain,
> >> > but on looking at patch #4 it seems that the only way the call chain
> >> > execution is terminated is if
> >> >
> >> > - XDP_ABORTED is returned from a program in the call chain; or
> >> 
> >> Yes. Not actually sure about this one...
> >> 
> >> > - the map entry for the next program (determined by the return value
> >> >   of the current program) is empty; or
> >> 
> >> This will be the common exit condition, I expect
> >> 
> >> > - we run out of entries in the map
> >> 
> >> You mean if we run the iteration counter to zero, right?
> >> 
> >> > The return value of the last-executed program in the chain seems to be
> >> > what determines packet processing behaviour after executing the chain
> >> > (_DROP, _TX, _PASS, etc). So there's no way to both XDP_PASS and
> >> > XDP_TX a packet from the same chain, right? Just want to make sure
> >> > I've got the semantics correct. Thanks!
> >> 
> >> Yeah, you've got all this right. The chain call mechanism itself doesn't
> >> change any of the underlying fundamentals of XDP. I.e., each packet gets
> >> exactly one verdict.
> >> 
> >> For chaining actual XDP programs that do different things to the packet,
> >> I expect that the most common use case will be to only run the next
> >> program if the previous one returns XDP_PASS. That will make the most
> >> semantic sense I think.
> >> 
> >> But there are also use cases where one would want to match on the other
> >> return codes; such as packet capture, for instance, where one might
> >> install a capture program that would carry forward the previous return
> >> code, but do something to the packet (throw it out to userspace) first.
> >> 
> >> For the latter use case, the question is if we need to expose the
> >> previous return code to the program when it runs. You can do things
> >> without it (by just using a different program per return code), but it
> >> may simplify things if we just expose the return code. However, since
> >> this will also change the semantics for running programs, I decided to
> >> leave that off for now.
> >> 
> >> -Toke
> >
> > In other cases where programs (e.g. cgroups) are run in an array the
> > return codes are 'AND'ed together so that we get
> >
> >    result1 & result2 & ... & resultN
> 
> How would that work with multiple programs, though? PASS -> DROP seems
> obvious, but what if the first program returns TX? Also, programs may
> want to be able to actually override return codes (e.g., say you want to
> turn DROPs into REDIRECTs, to get all your dropped packets mirrored to
> your IDS or something).

In general, I think you either hard-code a precedence that will have to
be overly conservative (if one program, say your firewall, tells XDP to
drop the packet and some other program redirects or passes it, that
seems incorrect to me), or you get creative with the precedence rules
and they become complex and difficult to manage, where a drop will drop
a packet unless a preceding program redirects it, and so on. I think any
hard-coded precedence you come up with will make one user happy and
another annoyed, defeating the programmability of BPF.

Better if it's programmable. I would prefer to pass the context into the
next program, so that programs can build their own semantics. Then leave
the & of return codes so any program can, if needed, really drop a packet.
The context could be pushed into a shared memory region, and then it
doesn't even need to be part of the program signature.

.John

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 20:34         ` John Fastabend
@ 2019-10-03  7:48           ` Toke Høiland-Jørgensen
  2019-10-03 10:09             ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-03  7:48 UTC (permalink / raw)
  To: John Fastabend, John Fastabend, Alan Maguire
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

John Fastabend <john.fastabend@gmail.com> writes:

> Toke Høiland-Jørgensen wrote:
>> John Fastabend <john.fastabend@gmail.com> writes:
>> 
>> > Toke Høiland-Jørgensen wrote:
>> >> Alan Maguire <alan.maguire@oracle.com> writes:
>> >> 
>> >> > On Wed, 2 Oct 2019, Toke Høiland-Jørgensen wrote:
>> >> >
>> >> >> This series adds support for executing multiple XDP programs on a single
>> >> >> interface in sequence, through the use of chain calls, as discussed at the Linux
>> >> >> Plumbers Conference last month:
>> >> >> 
>> >> >> https://linuxplumbersconf.org/event/4/contributions/460/
>> >> >> 
>> >> >> # HIGH-LEVEL IDEA
>> >> >> 
>> >> >> The basic idea is to express the chain call sequence through a special map type,
>> >> >> which contains a mapping from a (program, return code) tuple to another program
>> >> >> to run next in the sequence. Userspace can populate this map to express
>> >> >> arbitrary call sequences, and update the sequence by updating or replacing the
>> >> >> map.
>> >> >> 
>> >> >> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
>> >> >> which will look up the chain sequence map, and if found, will loop through calls
>> >> >> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
>> >> >> previous program ID and return code.
>> >> >> 
>> >> >> An XDP chain call map can be installed on an interface by means of a new netlink
>> >> >> attribute containing an fd pointing to a chain call map. This can be supplied
>> >> >> along with the XDP prog fd, so that a chain map is always installed together
>> >> >> with an XDP program.
>> >> >> 
>> >> >
>> >> > This is great stuff Toke!
>> >> 
>> >> Thanks! :)
>> >> 
>> >> > One thing that wasn't immediately clear to me - and this may be just
>> >> > me - is the relationship between program behaviour for the XDP_DROP
>> >> > case and chain call execution. My initial thought was that a program
>> >> > in the chain XDP_DROP'ping the packet would terminate the call chain,
>> >> > but on looking at patch #4 it seems that the only way the call chain
>> >> > execution is terminated is if
>> >> >
>> >> > - XDP_ABORTED is returned from a program in the call chain; or
>> >> 
>> >> Yes. Not actually sure about this one...
>> >> 
>> >> > - the map entry for the next program (determined by the return value
>> >> >   of the current program) is empty; or
>> >> 
>> >> This will be the common exit condition, I expect
>> >> 
>> >> > - we run out of entries in the map
>> >> 
>> >> You mean if we run the iteration counter to zero, right?
>> >> 
>> >> > The return value of the last-executed program in the chain seems to be
>> >> > what determines packet processing behaviour after executing the chain
>> >> > (_DROP, _TX, _PASS, etc). So there's no way to both XDP_PASS and
>> >> > XDP_TX a packet from the same chain, right? Just want to make sure
>> >> > I've got the semantics correct. Thanks!
>> >> 
>> >> Yeah, you've got all this right. The chain call mechanism itself doesn't
>> >> change any of the underlying fundamentals of XDP. I.e., each packet gets
>> >> exactly one verdict.
>> >> 
>> >> For chaining actual XDP programs that do different things to the packet,
>> >> I expect that the most common use case will be to only run the next
>> >> program if the previous one returns XDP_PASS. That will make the most
>> >> semantic sense I think.
>> >> 
>> >> But there are also use cases where one would want to match on the other
>> >> return codes; such as packet capture, for instance, where one might
>> >> install a capture program that would carry forward the previous return
>> >> code, but do something to the packet (throw it out to userspace) first.
>> >> 
>> >> For the latter use case, the question is if we need to expose the
>> >> previous return code to the program when it runs. You can do things
>> >> without it (by just using a different program per return code), but it
>> >> may simplify things if we just expose the return code. However, since
>> >> this will also change the semantics for running programs, I decided to
>> >> leave that off for now.
>> >> 
>> >> -Toke
>> >
>> > In other cases where programs (e.g. cgroups) are run in an array the
>> > return codes are 'AND'ed together so that we get
>> >
>> >    result1 & result2 & ... & resultN
>> 
>> How would that work with multiple programs, though? PASS -> DROP seems
>> obvious, but what if the first program returns TX? Also, programs may
>> want to be able to actually override return codes (e.g., say you want to
>> turn DROPs into REDIRECTs, to get all your dropped packets mirrored to
>> your IDS or something).
>
> In general, I think you either hard-code a precedence that will have
> to be overly conservative (if one program, say your firewall, tells
> XDP to drop the packet and some other program redirects or passes it,
> that seems incorrect to me), or you get creative with the precedence
> rules and they become complex and difficult to manage, where a drop
> will drop a packet unless a preceding program redirects it, and so on.
> I think any hard-coded precedence you come up with will make one user
> happy and another annoyed, defeating the programmability of BPF.

Yeah, exactly. That's basically why I punted on that completely.
Besides, technically you can get this by just installing different
programs in each slot if you really need it.

> Better if it's programmable. I would prefer to pass the context into
> the next program, so that programs can build their own semantics. Then
> leave the & of return codes so any program can, if needed, really drop
> a packet. The context could be pushed into a shared memory region, and
> then it doesn't even need to be part of the program signature.

Since it seems I'll be going down the rabbit hole of baking this into
the BPF execution environment itself, I guess I'll keep this in mind as
well. Either by stuffing the previous program return code into the
context object(s), or by adding a new helper to retrieve it.

-Toke

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 19:46     ` Alexei Starovoitov
@ 2019-10-03  7:58       ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-03  7:58 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: John Fastabend, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Wed, Oct 02, 2019 at 09:15:22PM +0200, Daniel Borkmann wrote:
>> On Wed, Oct 02, 2019 at 09:43:49AM -0700, John Fastabend wrote:
>> > Toke Høiland-Jørgensen wrote:
>> > > This series adds support for executing multiple XDP programs on a single
>> > > interface in sequence, through the use of chain calls, as discussed at the Linux
>> > > Plumbers Conference last month:
>> > > 
>> > > https://linuxplumbersconf.org/event/4/contributions/460/
>> > > 
>> > > # HIGH-LEVEL IDEA
>> > > 
>> > > The basic idea is to express the chain call sequence through a special map type,
>> > > which contains a mapping from a (program, return code) tuple to another program
>> > > to run next in the sequence. Userspace can populate this map to express
>> > > arbitrary call sequences, and update the sequence by updating or replacing the
>> > > map.
>> > > 
>> > > The actual execution of the program sequence is done in bpf_prog_run_xdp(),
>> > > which will look up the chain sequence map, and if found, will loop through calls
>> > > to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
>> > > previous program ID and return code.
>> > > 
>> > > An XDP chain call map can be installed on an interface by means of a new netlink
>> > > attribute containing an fd pointing to a chain call map. This can be supplied
>> > > along with the XDP prog fd, so that a chain map is always installed together
>> > > with an XDP program.
>> > > 
>> > > # PERFORMANCE
>> > > 
>> > > I performed a simple performance test to get an initial feel for the overhead of
>> > > the chain call mechanism. This test consists of running only two programs in
>> > > sequence: One that returns XDP_PASS and another that returns XDP_DROP. I then
>> > > measure the drop PPS performance and compare it to a baseline of just a single
>> > > program that only returns XDP_DROP.
>> > > 
>> > > For comparison, a test case that uses regular eBPF tail calls to sequence two
>> > > programs together is also included. Finally, because 'perf' showed that the
>> > > hashmap lookup was the largest single source of overhead, I also added a test
>> > > case where I removed the jhash() call from the hashmap code, and just used the
>> > > u32 key directly as an index into the hash bucket structure.
>> > > 
>> > > The performance for these different cases is as follows (with retpolines disabled):
>> > 
>> > retpolines enabled would also be interesting.
>> > 
>> > > 
>> > > | Test case                       | Perf      | Add. overhead | Total overhead |
>> > > |---------------------------------+-----------+---------------+----------------|
>> > > | Before patch (XDP DROP program) | 31.0 Mpps |               |                |
>> > > | After patch (XDP DROP program)  | 28.9 Mpps |        2.3 ns |         2.3 ns |
>> > 
>> > IMO even 1 Mpps overhead is too much for a feature that is primarily about
>> > ease of use. Sacrificing performance to make userland a bit easier is hard
>> > to justify for me when XDP _is_ singularly about performance. Also that is
>> > nearly 10% overhead which is fairly large. So I think going forward the
>> > performance gap needs to be closed.
>> 
>> Fully agree, for the case where this facility is not used, it must have
>> *zero* overhead. This is /one/ map flavor, in future there will be other
>> facilities with different use-cases, but we cannot place them all into
>> the critical fast-path. Given this is BPF, we have the flexibility that
>> this can be hidden behind the scenes by rewriting and therefore only add
>> overhead when used.
>> 
>> What I also see as a red flag with this proposal is the fact that it's
>> tied to XDP only because you need to go and hack bpf_prog_run_xdp() all
>> the way to fetch xdp->rxq->dev->xdp_chain_map even though the map/concept
>> itself is rather generic and could be used in various other program types
>> as well. I'm very sure that once there, people would request it. Therefore,
>> better to explore a way where this has no changes to BPF_PROG_RUN() similar
>> to the original tail call work.
>
> two +1s.
>
> 1. new features have to have zero overhead when not used. this is not
>    negotiable.
> 2. prog chaining is not xdp specific.
>
> two years ago I was thinking about extending tail_call mechanism like this:
> https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/commit/?h=prog_chain&id=f54f45d00f91e083f6aec2abe35b6f0be52ae85b
>
> and the program would call the new helper 'bpf_tail_call_next()' to jump
> into the next program.
> Sample code is here:
> https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/commit/?h=prog_chain&id=933a93208f1bd60a9707dc3f51a9aa457c86be87

Ah, that's a useful starting point, thanks!

> In my benchmarks it was faster than existing bpf_tail_call via prog_array.
> (And fit the rule of zero overhead when not used).
>
> While coding it I figured that we can do proper indirect calls instead,
> which would be even cleaner solution.
> It would support arbitrary program chaining and calling.
>
> The verifier back then didn't have enough infra to support indirect calls.
> I suggest looking into implementing indirect calls instead of hacking
> custom prog chaining logic via maps.

One constraint I have is that the admin needs to be able to define the
program execution chain without any cooperation from the program
authors. So it can't rely on programs calling a helper. But I guess we
can achieve that by injecting code into the program when the feature is
turned on...

I'll go jump down the BPF execution environment rabbit hole and see
what comes back up! :)

-Toke

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 19:49     ` Song Liu
@ 2019-10-03  7:59       ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-03  7:59 UTC (permalink / raw)
  To: Song Liu
  Cc: Daniel Borkmann, Alexei Starovoitov, Martin Lau, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller,
	Jesper Dangaard Brouer, netdev, bpf

Song Liu <songliubraving@fb.com> writes:

>> On Oct 2, 2019, at 12:23 PM, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> 
>> Song Liu <songliubraving@fb.com> writes:
>> 
>>>> On Oct 2, 2019, at 6:30 AM, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>> 
>>>> This series adds support for executing multiple XDP programs on a single
>>>> interface in sequence, through the use of chain calls, as discussed at the Linux
>>>> Plumbers Conference last month:
>>>> 
>>>> https://linuxplumbersconf.org/event/4/contributions/460/
>>>> 
>>>> # HIGH-LEVEL IDEA
>>>> 
>>>> The basic idea is to express the chain call sequence through a special map type,
>>>> which contains a mapping from a (program, return code) tuple to another program
>>>> to run in next in the sequence. Userspace can populate this map to express
>>>> arbitrary call sequences, and update the sequence by updating or replacing the
>>>> map.
>>>> 
>>>> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
>>>> which will look up the chain sequence map, and if found, will loop through calls
>>>> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
>>>> previous program ID and return code.
>>>> 
>>>> An XDP chain call map can be installed on an interface by means of a new netlink
>>>> attribute containing an fd pointing to a chain call map. This can be supplied
>>>> along with the XDP prog fd, so that a chain map is always installed together
>>>> with an XDP program.
>>> 
>>> Interesting work!
>>> 
>>> Quick question: can we achieve the same by adding a "retval to
>>> call_tail_next" map to each program?
>> 
>> Hmm, that's an interesting idea; I hadn't thought of that. As long as
>> that map can be manipulated outside of the program itself, it may work.
>> I wonder how complex it gets to modify the call sequence, though; say
>> you want to change A->B->C to A->C->B - how do you do that without
>> interrupting the sequence while you're modifying things? Or is it OK if
>> that is not possible?
>
> We can always load another copy of B and C, say D == B, and E == C. And 
> make it A->E->D. 

Yes, thinking some more about this I don't think it's actually a
problem. I'll go prototype something...

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-02 19:25     ` Toke Høiland-Jørgensen
@ 2019-10-03  8:53       ` Jesper Dangaard Brouer
  2019-10-03 14:03         ` Alexei Starovoitov
  0 siblings, 1 reply; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2019-10-03  8:53 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Martin Lau,
	Yonghong Song, Marek Majkowski, Lorenz Bauer, David Miller,
	netdev, bpf, brouer

On Wed, 02 Oct 2019 21:25:28 +0200
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> Song Liu <songliubraving@fb.com> writes:
> 
> >> On Oct 2, 2019, at 11:38 AM, Song Liu <songliubraving@fb.com> wrote:
> >>   
> >>> On Oct 2, 2019, at 6:30 AM, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>> 
> >>> This series adds support for executing multiple XDP programs on a single
> >>> interface in sequence, through the use of chain calls, as discussed at the Linux
> >>> Plumbers Conference last month:
> >>>

[1] https://linuxplumbersconf.org/event/4/contributions/460/
[2] Slides: http://people.netfilter.org/hawk/presentations/LinuxPlumbers2019/xdp-distro-view.pdf
[3] Source: https://github.com/xdp-project/xdp-project/tree/master/conference/LinuxPlumbers2019
 
[...]
> >
> > Also, could you please share a real word example? I saw the example
> > from LPC slides, but I am more curious about what does each program do
> > in real use cases.  
> 
> The only concrete program that I have that needs this is xdpcap:
> https://github.com/cloudflare/xdpcap
> 
> Right now that needs to be integrated into the calling program to work;
> I want to write a tool like it, but that can insert itself before or
> after arbitrary XDP programs.

The other real world use-case is Facebook's katran, which you should be aware of:
 https://github.com/facebookincubator/katran

It might be important to understand that the patchset/intent is a hybrid
that satisfies both xdpcap ([2] slide-26) and katran ([2] slide-27); see
the later slides for how this is done. Notice that a requirement is that
users don't (need to) modify the BPF ELF file to make it cooperate with
this system.

The katran use-case is to chain several eBPF programs.

The xdpcap use-case is to trap any XDP return action code (and tcpdump
it via a perf event ring_buffer).  For system administrators the xdpcap
use-case is something we hear about all the time, so it is one of the
missing features for XDP.  As Toke also wrote, we want to extend this to
ALSO be able to see/dump the packet BEFORE a given XDP program.


> Lorenz, can you say more about your use case? :)

AFAIK Cloudflare also has a chaining eBPF program use-case for XDP.  I
could not find the blog post.
 
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-03  7:48           ` Toke Høiland-Jørgensen
@ 2019-10-03 10:09             ` Jesper Dangaard Brouer
  2019-10-03 19:45               ` John Fastabend
  0 siblings, 1 reply; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2019-10-03 10:09 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: John Fastabend, Alan Maguire, Daniel Borkmann,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller, netdev, bpf, brouer

On Thu, 03 Oct 2019 09:48:22 +0200
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> John Fastabend <john.fastabend@gmail.com> writes:
> 
> > Toke Høiland-Jørgensen wrote:  
> >> John Fastabend <john.fastabend@gmail.com> writes:
> >>   
> >> > Toke Høiland-Jørgensen wrote:  
> >> >> Alan Maguire <alan.maguire@oracle.com> writes:
> >> >>   
> >> >> > On Wed, 2 Oct 2019, Toke Høiland-Jørgensen wrote:
> >> >> >  
> >> >> >> This series adds support for executing multiple XDP programs on a single
> >> >> >> interface in sequence, through the use of chain calls, as discussed at the Linux
> >> >> >> Plumbers Conference last month:
> >> >> >> 
> >> >> >> https://linuxplumbersconf.org/event/4/contributions/460/
> >> >> >> 
> >> >> >> # HIGH-LEVEL IDEA
> >> >> >> 
> >> >> >> The basic idea is to express the chain call sequence through a special map type,
> >> >> >> which contains a mapping from a (program, return code) tuple to another program
> >> >> >> to run in next in the sequence. Userspace can populate this map to express
> >> >> >> arbitrary call sequences, and update the sequence by updating or replacing the
> >> >> >> map.
> >> >> >> 
> >> >> >> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
> >> >> >> which will lookup the chain sequence map, and if found, will loop through calls
> >> >> >> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
> >> >> >> previous program ID and return code.
> >> >> >> 
> >> >> >> An XDP chain call map can be installed on an interface by means of a new netlink
> >> >> >> attribute containing an fd pointing to a chain call map. This can be supplied
> >> >> >> along with the XDP prog fd, so that a chain map is always installed together
> >> >> >> with an XDP program.
> >> >> >>   
> >> >> >
> >> >> > This is great stuff Toke!  
> >> >> 
> >> >> Thanks! :)
> >> >>   
> >> >> > One thing that wasn't immediately clear to me - and this may be just
> >> >> > me - is the relationship between program behaviour for the XDP_DROP
> >> >> > case and chain call execution. My initial thought was that a program
> >> >> > in the chain XDP_DROP'ping the packet would terminate the call chain,
> >> >> > but on looking at patch #4 it seems that the only way the call chain
> >> >> > execution is terminated is if
> >> >> >
> >> >> > - XDP_ABORTED is returned from a program in the call chain; or  
> >> >> 
> >> >> Yes. Not actually sure about this one...
> >> >>   
> >> >> > - the map entry for the next program (determined by the return value
> >> >> >   of the current program) is empty; or  
> >> >> 
> >> >> This will be the common exit condition, I expect
> >> >>   
> >> >> > - we run out of entries in the map  
> >> >> 
> >> >> You mean if we run the iteration counter to zero, right?
> >> >>   
> >> >> > The return value of the last-executed program in the chain seems to be
> >> >> > what determines packet processing behaviour after executing the chain
> >> >> > (_DROP, _TX, _PASS, etc). So there's no way to both XDP_PASS and
> >> >> > XDP_TX a packet from the same chain, right? Just want to make sure
> >> >> > I've got the semantics correct. Thanks!  
> >> >> 
> >> >> Yeah, you've got all this right. The chain call mechanism itself doesn't
> >> >> change any of the underlying fundamentals of XDP. I.e., each packet gets
> >> >> exactly one verdict.
> >> >> 
> >> >> For chaining actual XDP programs that do different things to the packet,
> >> >> I expect that the most common use case will be to only run the next
> >> >> program if the previous one returns XDP_PASS. That will make the most
> >> >> semantic sense I think.
> >> >> 
> >> >> But there are also use cases where one would want to match on the other
> >> >> return codes; such as packet capture, for instance, where one might
> >> >> install a capture program that would carry forward the previous return
> >> >> code, but do something to the packet (throw it out to userspace) first.
> >> >> 
> >> >> For the latter use case, the question is if we need to expose the
> >> >> previous return code to the program when it runs. You can do things
> >> >> without it (by just using a different program per return code), but it
> >> >> may simplify things if we just expose the return code. However, since
> >> >> this will also change the semantics for running programs, I decided to
> >> >> leave that off for now.
> >> >> 
> >> >> -Toke  
> >> >
> >> > In other cases where programs (e.g. cgroups) are run in an array the
> >> > return codes are 'AND'ed together so that we get
> >> >
> >> >    result1 & result2 & ... & resultN  

But the XDP return codes are not bit values, so AND operation doesn't
make sense to me.

> >> 
> >> How would that work with multiple programs, though? PASS -> DROP seems
> >> obvious, but what if the first program returns TX? Also, programs may
> >> want to be able to actually override return codes (e.g., say you want to
> >> turn DROPs into REDIRECTs, to get all your dropped packets mirrored to
> >> your IDS or something).  
> >
> > In general I think either you hard code a precedence that will have to
> > be overly conservative because if one program (your firewall) tells
> > XDP to drop the packet and some other program redirects it, passes,
> > etc. that seems incorrect to me. Or you get creative with the
> > precedence rules and they become complex and difficult to manage,
> > where a drop will drop a packet unless a previous/preceding program
> > redirects it, etc. I think any hard coded precedence you come up with
> > will make someone happy and some other user annoyed, defeating the
> > programmability of BPF.
> 
> Yeah, exactly. That's basically why I punted on that completely.
> Besides, technically you can get this by just installing different
> programs in each slot if you really need it.

I would really like to avoid hard coding precedence.  I know it is
"challenging" that we want to allow overruling any XDP return code, but
I think it makes sense and it is the most flexible solution.


> > Better if it's programmable. I would prefer to pass the context into
> > the next program then programs can build their own semantics. Then
> > leave the & of return codes so any program can if needed really drop a
> > packet. The context could be pushed into a shared memory region and
> > then it doesn't even need to be part of the program signature.  
> 
> Since it seems I'll be going down the rabbit hole of baking this into
> the BPF execution environment itself, I guess I'll keep this in mind as
> well. Either by stuffing the previous program return code into the
> context object(s), or by adding a new helper to retrieve it.

I would like to see the ability to retrieve the previous program's return
code, and a new helper would be the simplest approach.  It could also
potentially simplify and compact the data structure.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-03  8:53       ` Jesper Dangaard Brouer
@ 2019-10-03 14:03         ` Alexei Starovoitov
  2019-10-03 14:33           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 53+ messages in thread
From: Alexei Starovoitov @ 2019-10-03 14:03 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, Song Liu, Daniel Borkmann,
	Alexei Starovoitov, Martin Lau, Yonghong Song, Marek Majkowski,
	Lorenz Bauer, David Miller, netdev, bpf

On Thu, Oct 3, 2019 at 1:53 AM Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> The xdpcap use-case is to trap any XDP return action code (and tcpdump
> via perf event ring_buffer).  For system administrators the xdpcap
> use-case is something we hear about all the time, so one of the missing
> features for XDP.  As Toke also wrote, we want to extend this to ALSO
> be-able to see/dump the packet BEFORE a given XDP program.

It sounds to me that 'xdpdump/xdpcap' (tcpdump equivalent) is
the only use case both you and Toke are advocating for.
I think such case we can do already without new kernel code:
- retrieve prog_id of the program attached to given xdp ifindex
- convert to fd
- create prog_array of one element and store that prog_fd
- create xdpdump bpf prog that prints to ring buffer
  and tail_calls into that prog_array
- replace xdp prog on that ifindex

Now it sees all the traffic first and existing xdp progs keep working.
What am I missing?
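For illustration, the control flow of that recipe can be sketched as a
userspace C simulation, with a one-element function-pointer slot standing in
for the prog_array/bpf_tail_call() and a counter standing in for the perf
ring buffer (all names here are invented, not from the kernel sources):

```c
#include <assert.h>
#include <stddef.h>

enum xdp_action { XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

typedef int (*xdp_prog)(void *ctx);

static xdp_prog jmp_table[1];  /* one-element "prog_array" */
static int packets_dumped;     /* stands in for the perf ring buffer */

/* The original program attached to the interface. */
static int orig_prog(void *ctx) { (void)ctx; return XDP_DROP; }

/* The xdpdump wrapper: dump first, then "tail call" the original
 * program; if the slot is empty, fall through (as bpf_tail_call()
 * does on failure) and act as if no program were attached. */
static int xdpdump(void *ctx)
{
    packets_dumped++;              /* bpf_perf_event_output() */
    if (jmp_table[0])
        return jmp_table[0](ctx);  /* bpf_tail_call() */
    return XDP_PASS;
}
```

Replacing the XDP program on the ifindex with the wrapper then gives the
dumper first look at every packet while the original program keeps making
the verdict.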


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-03 14:03         ` Alexei Starovoitov
@ 2019-10-03 14:33           ` Toke Høiland-Jørgensen
  2019-10-03 14:53             ` Edward Cree
  0 siblings, 1 reply; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-03 14:33 UTC (permalink / raw)
  To: Alexei Starovoitov, Jesper Dangaard Brouer
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Martin Lau,
	Yonghong Song, Marek Majkowski, Lorenz Bauer, David Miller,
	netdev, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Thu, Oct 3, 2019 at 1:53 AM Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>> The xdpcap use-case is to trap any XDP return action code (and tcpdump
>> via perf event ring_buffer).  For system administrators the xdpcap
>> use-case is something we hear about all the time, so one of the missing
>> features for XDP.  As Toke also wrote, we want to extend this to ALSO
>> be-able to see/dump the packet BEFORE a given XDP program.
>
> It sounds to me that 'xdpdump/xdpcap' (tcpdump equivalent) is
> the only use case both you and Toke are advocating for.
> I think such case we can do already without new kernel code:
> - retrieve prog_id of the program attached to given xdp ifindex
> - convert to fd
> - create prog_array of one element and store that prog_fd
> - create xdpdump bpf prog that prints to ring buffer
>   and tail_calls into that prog_array
> - replace xdp prog on that ifindex
>
> Now it sees all the traffic first and existing xdp progs keep working.
> What am I missing?

Yeah, that takes care of the "run xdpdump as the first thing" use case.
But we also want to be able to run it *after* another program, *without*
modifying that program to add a tail call.

More generally, we want to be able to chain XDP programs from multiple
sources in arbitrary ways. Let me try to provide a more fleshed-out
usage example:

Say a distro ships MyFirewall and MyIDS, two different upstream
projects, both of which support XDP acceleration. MyFirewall has
specified in its documentation that its XDP program will return XDP_PASS
for anything that it has determined should not be dropped. So the
sysadmin decides he wants to enable both, and of course he wants both to
be XDP-accelerated.

This particular sysadmin doesn't want to expend IDS resources on traffic
that the firewall has already decided to drop, so he'll just install the
firewall first, and then run the IDS on any traffic that gets XDP_PASS.
So he installs IDS as a chain-call XDP program on the XDP_PASS action
after the firewall.

Another sysadmin might be more paranoid (or have more CPU resources
available), and so he wants to run the IDS first, and the firewall
afterwards. So he installs the two XDP programs in the reverse order, by
chaining the firewall to the IDS' XDP_PASS action.

At the same time, the sysadmin wants to inspect what the firewall is
actually dropping, so he fires up xdpdump and tells it to show him
everything dropped by the firewall. The xdpdump tool does this by
attaching itself as a chain call program to the XDP_DROP action of the
firewall program.

In all cases, the sysadmin can't (or doesn't want to) modify any of the
XDP programs. In fact, they may just be installed as pre-compiled .so
BPF files on his system. So he needs to be able to configure the call
chain of different programs without modifying the eBPF program source
code.

This is basically what we're trying to support with XDP chain calls
(which I guess is now turning into more general eBPF chain calls). I
think it is doable with something based on the BPF_PROG_CHAIN_* series
you posted a link to earlier; but instead of having an explicit
tail_call_next() helper, I'll just make the verifier insert the chain
calls before each BPF_EXIT instruction when this feature is turned on.
Do you see any reason why this wouldn't work?
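As a sanity check of the intended semantics, here is a minimal userspace
model of the chain map from the cover letter: a (program slot, return code)
-> next slot table, walked with a bounded iteration counter. The
firewall/IDS stubs and all names are illustrative only, not from the
patches:

```c
#include <assert.h>

enum xdp_action { XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

#define N_ACTIONS 5
#define MAX_PROGS 4
#define MAX_CHAIN 8  /* bound on chain length, like the series' counter */

typedef int (*xdp_prog)(void *ctx);

static xdp_prog progs[MAX_PROGS];
static int chain_map[MAX_PROGS][N_ACTIONS];  /* next slot, or -1 */

static void chain_init(void)
{
    for (int i = 0; i < MAX_PROGS; i++)
        for (int a = 0; a < N_ACTIONS; a++)
            chain_map[i][a] = -1;
}

/* Mirror of what bpf_prog_run_xdp() would do: run a program, use its
 * return code to look up a successor, stop when no entry exists. */
static int run_chain(void *ctx)
{
    int slot = 0, ret = XDP_ABORTED;

    for (int i = 0; i < MAX_CHAIN && slot >= 0 && progs[slot]; i++) {
        ret = progs[slot](ctx);
        slot = chain_map[slot][ret];
    }
    return ret;  /* verdict of the last program that ran */
}

/* Sample programs for the sysadmin scenario above. */
static int firewall(void *ctx) { (void)ctx; return XDP_PASS; }
static int ids(void *ctx)      { (void)ctx; return XDP_DROP; }
```

Note how the "common exit condition" (an empty map entry) and the
iteration bound both terminate the loop, and each packet still gets
exactly one verdict.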

-Toke


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-03 14:33           ` Toke Høiland-Jørgensen
@ 2019-10-03 14:53             ` Edward Cree
  2019-10-03 18:49               ` Jesper Dangaard Brouer
  2019-10-03 19:35               ` John Fastabend
  0 siblings, 2 replies; 53+ messages in thread
From: Edward Cree @ 2019-10-03 14:53 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Alexei Starovoitov,
	Jesper Dangaard Brouer
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Martin Lau,
	Yonghong Song, Marek Majkowski, Lorenz Bauer, David Miller,
	netdev, bpf

On 03/10/2019 15:33, Toke Høiland-Jørgensen wrote:
> In all cases, the sysadmin can't (or doesn't want to) modify any of the
> XDP programs. In fact, they may just be installed as pre-compiled .so
> BPF files on his system. So he needs to be able to configure the call
> chain of different programs without modifying the eBPF program source
> code.
Perhaps I'm being dumb, but can't we solve this if we make linking work?
I.e. myIDS.so has ids_main() function, myFirewall.so has firewall()
 function, and sysadmin writes a little XDP prog to call these:

int main(struct xdp_md *ctx)
{
        int rc = firewall(ctx), rc2;

        switch(rc) {
        case XDP_DROP:
        case XDP_ABORTED:
        default:
                return rc;
        case XDP_PASS:
                return ids_main(ctx);
        case XDP_TX:
        case XDP_REDIRECT:
                rc2 = ids_main(ctx);
                if (rc2 == XDP_PASS)
                        return rc;
                return rc2;
        }
}

Now he compiles this and links it against those .so files, giving him
 a new object file which he can then install.

(One problem which does spring to mind is that the .so files may very
 inconsiderately both name their entry points main(), which makes
 linking against both of them rather challenging.  But I think that
 can be worked around with a sufficiently clever linker).

-Ed


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-03 14:53             ` Edward Cree
@ 2019-10-03 18:49               ` Jesper Dangaard Brouer
  2019-10-03 19:35               ` John Fastabend
  1 sibling, 0 replies; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2019-10-03 18:49 UTC (permalink / raw)
  To: Edward Cree
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov, Song Liu,
	Daniel Borkmann, Alexei Starovoitov, Martin Lau, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller, netdev, bpf, brouer

On Thu, 3 Oct 2019 15:53:50 +0100
Edward Cree <ecree@solarflare.com> wrote:

> On 03/10/2019 15:33, Toke Høiland-Jørgensen wrote:
> > In all cases, the sysadmin can't (or doesn't want to) modify any of the
> > XDP programs. In fact, they may just be installed as pre-compiled .so
> > BPF files on his system. So he needs to be able to configure the call
> > chain of different programs without modifying the eBPF program source
> > code.  
>
> Perhaps I'm being dumb, but can't we solve this if we make linking work?
> I.e. myIDS.so has ids_main() function, myFirewall.so has firewall()
>  function, and sysadmin writes a little XDP prog to call these:
> 
> int main(struct xdp_md *ctx)
> {
>         int rc = firewall(ctx), rc2;
> 
>         switch(rc) {
>         case XDP_DROP:
>         case XDP_ABORTED:
>         default:
>                 return rc;
>         case XDP_PASS:
>                 return ids_main(ctx);
>         case XDP_TX:
>         case XDP_REDIRECT:
>                 rc2 = ids_main(ctx);
>                 if (rc2 == XDP_PASS)
>                         return rc;
>                 return rc2;
>         }
> }
> 
> Now he compiles this and links it against those .so files, giving him
> a new object file which he can then install.

Sorry, but I don't think this makes sense.

We are not dealing with .so (shared-object) files.  These are BPF
ELF-object files, which are not really ELF objects in the normal sense.
The role of libbpf is to load the BPF elements of the ELF file via the
BPF syscall.  First the maps are loaded (each gets back an FD from the
BPF syscall), and then the program sections are "opened" and ELF
relocation is performed, replacing map accesses with map FDs, before the
BPF byte-code is loaded via the BPF syscall.

The next issue is that the program loading the BPF ELF object gets the
map FDs, which it expects and needs to query and also populate with,
for example, its config.  Thus, you would need to pass the FDs to the
original userspace application.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-03 14:53             ` Edward Cree
  2019-10-03 18:49               ` Jesper Dangaard Brouer
@ 2019-10-03 19:35               ` John Fastabend
  2019-10-04  8:09                 ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 53+ messages in thread
From: John Fastabend @ 2019-10-03 19:35 UTC (permalink / raw)
  To: Edward Cree, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Jesper Dangaard Brouer
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Martin Lau,
	Yonghong Song, Marek Majkowski, Lorenz Bauer, David Miller,
	netdev, bpf

Edward Cree wrote:
> On 03/10/2019 15:33, Toke Høiland-Jørgensen wrote:
> > In all cases, the sysadmin can't (or doesn't want to) modify any of the
> > XDP programs. In fact, they may just be installed as pre-compiled .so
> > BPF files on his system. So he needs to be able to configure the call
> > chain of different programs without modifying the eBPF program source
> > code.
> Perhaps I'm being dumb, but can't we solve this if we make linking work?
> I.e. myIDS.so has ids_main() function, myFirewall.so has firewall()
>  function, and sysadmin writes a little XDP prog to call these:
> 
> int main(struct xdp_md *ctx)
> {
>         int rc = firewall(ctx), rc2;
> 
>         switch(rc) {
>         case XDP_DROP:
>         case XDP_ABORTED:
>         default:
>                 return rc;
>         case XDP_PASS:
>                 return ids_main(ctx);
>         case XDP_TX:
>         case XDP_REDIRECT:
>                 rc2 = ids_main(ctx);
>                 if (rc2 == XDP_PASS)
>                         return rc;
>                 return rc2;
>         }
> }
> 
> Now he compiles this and links it against those .so files, giving him
>  a new object file which he can then install.
> 
> (One problem which does spring to mind is that the .so files may very
>  inconsiderately both name their entry points main(), which makes
>  linking against both of them rather challenging.  But I think that
>  can be worked around with a sufficiently clever linker).

I agree, but the same could be done today if ids_main and firewall
were inline functions. The admin can write their little program like the
above and just '#include firewall', '#include ids'. Then you don't need
linking, although linking does make things nicer.
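A hedged sketch of that variant (the callee names and stub bodies are
invented here; in practice they would come from the projects' headers):

```c
#include <assert.h>

enum xdp_action { XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

struct xdp_md { int dummy; };  /* placeholder for the real context */

/* In reality these would come from '#include "firewall.h"' and
 * '#include "ids.h"'; stubbed here so the sketch is self-contained. */
static inline int firewall(struct xdp_md *ctx) { (void)ctx; return XDP_PASS; }
static inline int ids_main(struct xdp_md *ctx) { (void)ctx; return XDP_DROP; }

/* The admin's little program: only run the IDS on traffic the
 * firewall passed; any other verdict stands as-is. */
int xdp_main(struct xdp_md *ctx)
{
    int rc = firewall(ctx);

    return rc == XDP_PASS ? ids_main(ctx) : rc;
}
```

With inline functions the compiler sees both callees, so no BPF-to-BPF
call or linker support is needed, at the cost of recompiling the wrapper
whenever a component changes.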

> 
> -Ed




* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-03 10:09             ` Jesper Dangaard Brouer
@ 2019-10-03 19:45               ` John Fastabend
  0 siblings, 0 replies; 53+ messages in thread
From: John Fastabend @ 2019-10-03 19:45 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Toke Høiland-Jørgensen
  Cc: John Fastabend, Alan Maguire, Daniel Borkmann,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Marek Majkowski, Lorenz Bauer, David Miller, netdev, bpf, brouer

Jesper Dangaard Brouer wrote:
> On Thu, 03 Oct 2019 09:48:22 +0200
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> 
> > John Fastabend <john.fastabend@gmail.com> writes:
> > 
> > > Toke Høiland-Jørgensen wrote:  
> > >> John Fastabend <john.fastabend@gmail.com> writes:
> > >>   
> > >> > Toke Høiland-Jørgensen wrote:  
> > >> >> Alan Maguire <alan.maguire@oracle.com> writes:
> > >> >>   
> > >> >> > On Wed, 2 Oct 2019, Toke Høiland-Jørgensen wrote:
> > >> >> >  
> > >> >> >> This series adds support for executing multiple XDP programs on a single
> > >> >> >> interface in sequence, through the use of chain calls, as discussed at the Linux
> > >> >> >> Plumbers Conference last month:
> > >> >> >> 
> > >> >> >> https://linuxplumbersconf.org/event/4/contributions/460/
> > >> >> >> 
> > >> >> >> # HIGH-LEVEL IDEA
> > >> >> >> 
> > >> >> >> The basic idea is to express the chain call sequence through a special map type,
> > >> >> >> which contains a mapping from a (program, return code) tuple to another program
> > >> >> >> to run in next in the sequence. Userspace can populate this map to express
> > >> >> >> arbitrary call sequences, and update the sequence by updating or replacing the
> > >> >> >> map.
> > >> >> >> 
> > >> >> >> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
> > >> >> >> which will lookup the chain sequence map, and if found, will loop through calls
> > >> >> >> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
> > >> >> >> previous program ID and return code.
> > >> >> >> 
> > >> >> >> An XDP chain call map can be installed on an interface by means of a new netlink
> > >> >> >> attribute containing an fd pointing to a chain call map. This can be supplied
> > >> >> >> along with the XDP prog fd, so that a chain map is always installed together
> > >> >> >> with an XDP program.
> > >> >> >>   
> > >> >> >
> > >> >> > This is great stuff Toke!  
> > >> >> 
> > >> >> Thanks! :)
> > >> >>   
> > >> >> > One thing that wasn't immediately clear to me - and this may be just
> > >> >> > me - is the relationship between program behaviour for the XDP_DROP
> > >> >> > case and chain call execution. My initial thought was that a program
> > >> >> > in the chain XDP_DROP'ping the packet would terminate the call chain,
> > >> >> > but on looking at patch #4 it seems that the only way the call chain
> > >> >> > execution is terminated is if
> > >> >> >
> > >> >> > - XDP_ABORTED is returned from a program in the call chain; or  
> > >> >> 
> > >> >> Yes. Not actually sure about this one...
> > >> >>   
> > >> >> > - the map entry for the next program (determined by the return value
> > >> >> >   of the current program) is empty; or  
> > >> >> 
> > >> >> This will be the common exit condition, I expect
> > >> >>   
> > >> >> > - we run out of entries in the map  
> > >> >> 
> > >> >> You mean if we run the iteration counter to zero, right?
> > >> >>   
> > >> >> > The return value of the last-executed program in the chain seems to be
> > >> >> > what determines packet processing behaviour after executing the chain
> > >> >> > (_DROP, _TX, _PASS, etc). So there's no way to both XDP_PASS and
> > >> >> > XDP_TX a packet from the same chain, right? Just want to make sure
> > >> >> > I've got the semantics correct. Thanks!  
> > >> >> 
> > >> >> Yeah, you've got all this right. The chain call mechanism itself doesn't
> > >> >> change any of the underlying fundamentals of XDP. I.e., each packet gets
> > >> >> exactly one verdict.
> > >> >> 
> > >> >> For chaining actual XDP programs that do different things to the packet,
> > >> >> I expect that the most common use case will be to only run the next
> > >> >> program if the previous one returns XDP_PASS. That will make the most
> > >> >> semantic sense I think.
> > >> >> 
> > >> >> But there are also use cases where one would want to match on the other
> > >> >> return codes; such as packet capture, for instance, where one might
> > >> >> install a capture program that would carry forward the previous return
> > >> >> code, but do something to the packet (throw it out to userspace) first.
> > >> >> 
> > >> >> For the latter use case, the question is if we need to expose the
> > >> >> previous return code to the program when it runs. You can do things
> > >> >> without it (by just using a different program per return code), but it
> > >> >> may simplify things if we just expose the return code. However, since
> > >> >> this will also change the semantics for running programs, I decided to
> > >> >> leave that off for now.
> > >> >> 
> > >> >> -Toke  
> > >> >
> > >> > In other cases where programs (e.g. cgroups) are run in an array the
> > >> > return codes are 'AND'ed together so that we get
> > >> >
> > >> >    result1 & result2 & ... & resultN  
> 
> But the XDP return codes are not bit values, so AND operation doesn't
> make sense to me.
> 
> > >> 
> > >> How would that work with multiple programs, though? PASS -> DROP seems
> > >> obvious, but what if the first program returns TX? Also, programs may
> > >> want to be able to actually override return codes (e.g., say you want to
> > >> turn DROPs into REDIRECTs, to get all your dropped packets mirrored to
> > >> your IDS or something).  
> > >
> > > In general I think either you hard code a precedence that will have to
> > > be overly conservative because if one program (your firewall) tells
> > > XDP to drop the packet and some other program redirects it, passes,
> > > etc. that seems incorrect to me. Or you get creative with the
> > > precedence rules and they become complex and difficult to manage,
> > > where a drop will drop a packet unless a previous/preceding program
> > > redirects it, etc. I think any hard coded precedence you come up with
> > > will make someone happy and some other user annoyed, defeating the
> > > programmability of BPF.
> > 
> > Yeah, exactly. That's basically why I punted on that completely.
> > Besides, technically you can get this by just installing different
> > programs in each slot if you really need it.
> 
> I would really like to avoid hard coding precedence.  I know it is
> "challenging" that we want to allow overruling any XDP return code, but
> I think it makes sense and it is the most flexible solution.
> 
> 
> > > Better if it's programmable. I would prefer to pass the context into
> > > the next program then programs can build their own semantics. Then
> > > leave the & of return codes so any program can if needed really drop a
> > > packet. The context could be pushed into a shared memory region and
> > > then it doesn't even need to be part of the program signature.  
> > 
> > Since it seems I'll be going down the rabbit hole of baking this into
> > the BPF execution environment itself, I guess I'll keep this in mind as
> > well. Either by stuffing the previous program return code into the
> > context object(s), or by adding a new helper to retrieve it.
> 
> I would like to see the ability to retrieve previous program return
> code, and a new helper would be the simplest approach.  As this could
> potentially simplify and compact the data-structure.

But I'm a bit lost here; my thinking is similar to Alexei's comment.

Sounds like you want 'something' that implements a graph of functions,
where depending on the return code of a function you call the next
function in the graph. And you want the edges of that graph to be
programmable, so that depending on the administrator's configuration
the graph may be built differently?

The more I think about this, the more it seems that you want a BPF
program that is generated from a configuration file to chain together
whatever the administrator asks for. Then you don't need any kernel bits
other than what we have. Of course we can optimize with shared libs
and such as we go, but I'm not seeing the missing piece. Use proper
calls instead of tail calls and you can get return codes to handle
however you like. It requires a bit of cooperation from the IDS
writers to build you a function to call, but that should simply be
their normal XDP hook. Maybe the verifier needs a bit of extra
smarts to follow ctx types into calls, but I'm not sure.


The user says, via some fancy GUI or YAML or ...: build this


                                      ----- XDP_PASS
                                      |
                                      |
 XDP_firewall  --- pass ---> XDP_IDS ------ XDP_DROP
                |
                -- drop ------------------> XDP_DROP

generate the file and load it. This seems to be the advantage of
BPF here: you can build whatever you like, make the rules however
complex you like, and build any precedence on the return codes you
like.
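For instance, a generator might emit something like the following for the
diagram above (hypothetical names; the firewall/IDS stubs stand in for the
real programs):

```c
#include <assert.h>

enum xdp_action { XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

struct xdp_md { int dummy; };  /* placeholder context */

static int xdp_firewall(struct xdp_md *ctx) { (void)ctx; return XDP_PASS; }
static int xdp_ids(struct xdp_md *ctx)      { (void)ctx; return XDP_PASS; }

/* Generated dispatcher for the graph above: the firewall's drop edge
 * goes straight to XDP_DROP; on pass, the IDS decides PASS vs DROP. */
int xdp_generated(struct xdp_md *ctx)
{
    if (xdp_firewall(ctx) != XDP_PASS)
        return XDP_DROP;
    return xdp_ids(ctx) == XDP_PASS ? XDP_PASS : XDP_DROP;
}
```

Because the dispatcher is itself ordinary BPF program text, any
precedence policy the administrator wants is just a different generated
function, with no kernel-side chain machinery involved.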

.John


> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer




* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-03 19:35               ` John Fastabend
@ 2019-10-04  8:09                 ` Toke Høiland-Jørgensen
  2019-10-04 10:34                   ` Edward Cree
  0 siblings, 1 reply; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-10-04  8:09 UTC (permalink / raw)
  To: John Fastabend, Edward Cree, Alexei Starovoitov, Jesper Dangaard Brouer
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Martin Lau,
	Yonghong Song, Marek Majkowski, Lorenz Bauer, David Miller,
	netdev, bpf

John Fastabend <john.fastabend@gmail.com> writes:

> Edward Cree wrote:
>> On 03/10/2019 15:33, Toke Høiland-Jørgensen wrote:
>> > In all cases, the sysadmin can't (or doesn't want to) modify any of the
>> > XDP programs. In fact, they may just be installed as pre-compiled .so
>> > BPF files on the system. So the sysadmin needs to be able to configure
>> > the call chain of different programs without modifying the eBPF program
>> > source code.
>> Perhaps I'm being dumb, but can't we solve this if we make linking work?
>> I.e. myIDS.so has ids_main() function, myFirewall.so has firewall()
>>  function, and sysadmin writes a little XDP prog to call these:
>> 
>> int main(struct xdp_md *ctx)
>> {
>>         int rc = firewall(ctx), rc2;
>> 
>>         switch(rc) {
>>         case XDP_DROP:
>>         case XDP_ABORTED:
>>         default:
>>                 return rc;
>>         case XDP_PASS:
>>                 return ids_main(ctx);
>>         case XDP_TX:
>>         case XDP_REDIRECT:
>>                 rc2 = ids_main(ctx);
>>                 if (rc2 == XDP_PASS)
>>                         return rc;
>>                 return rc2;
>>         }
>> }
>> 
>> Now he compiles this and links it against those .so files, giving him
>>  a new object file which he can then install.
>> 
>> (One problem which does spring to mind is that the .so files may very
>>  inconsiderately both name their entry points main(), which makes
>>  linking against both of them rather challenging.  But I think that
>>  can be worked around with a sufficiently clever linker).
>
> I agree but the same could be done today if ids_main and firewall
> were inline functions. Admin can write their little program like above
> and just '#include firewall', '#include ids'. Then you don't need
> linking although it does make things nicer.

(Replying to both you and Edward here, as what you suggested in your
other email is basically the same).

Yes, you are right that we could technically do this entirely in
userspace with a sufficiently smart loader. This was even in the
presentation at LPC (see slide 34: "Implementation option #1: userspace
only").

The reason we want to add kernel support is that it solves several
important problems:

- Centralisation: To do this entirely in userspace, everything has to go
  through the same loader that builds the "dispatcher program".

- Enforcement: Related to the point above - if an XDP-enabled
  application loads its own XDP program without going through the
  central loader, we can't load another program after that.

- State inspection: If userspace builds and loads a single dispatch
  program, it is difficult to go from that back to the call sequence
  graph.

- Dynamic changes: While the dispatch program can be rebuilt from a
  config file, it becomes difficult to do dynamic changes to the
  sequence (such as temporarily attaching xdpdump to the XDP_DROP action
  of this already-loaded program while debugging).

Having the mechanism be in-kernel makes solving these problems a lot
easier, because the kernel can be responsible for state management, and
it can enforce the chain call execution logic.

The fact that Lorenz et al are interested in this feature (even though
they are essentially already doing what you suggested, by having a
centralised daemon to manage all XDP programs), tells me that having
kernel support for this is the right thing to do.

-Toke


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-04  8:09                 ` Toke Høiland-Jørgensen
@ 2019-10-04 10:34                   ` Edward Cree
  2019-10-04 15:58                     ` Lorenz Bauer
  0 siblings, 1 reply; 53+ messages in thread
From: Edward Cree @ 2019-10-04 10:34 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, John Fastabend,
	Alexei Starovoitov, Jesper Dangaard Brouer, Lorenz Bauer
  Cc: Song Liu, Daniel Borkmann, Alexei Starovoitov, Martin Lau,
	Yonghong Song, Marek Majkowski, David Miller, netdev, bpf

On 04/10/2019 09:09, Toke Høiland-Jørgensen wrote:
> Having the mechanism be in-kernel makes solving these problems a lot
> easier, because the kernel can be responsible for state management, and
> it can enforce the chain call execution logic.
I would argue this isn't mechanism, but rather policy, because the
 mechanism we already have is sufficient to express the policy.

Enforcement is easily dealt with: you just don't give people the caps/
 perms to load XDP programs directly, so the only way they can do it is
 via your loader (which you give them a socket or dbus or something to
 talk to).  (Whereas your chain map doesn't really 'enforce' anything;
 anyone who can add themselves to the chain can also remove others.)
Then state inspection happens by querying the loader; if we assume that
 the distro provided the central loader, then they can also include the
 query in their standard system-dump tools.
Dynamic changes would just mean compiling a new dispatcher, then
 atomically replacing the old prog with the new (which we can already
 do), since the central loader daemon knows the call graph and can make
 changed versions easily.
Centralisation is something that happens normally in userspace; just
 count how many daemons your favourite init system runs to administer
 system resources and multiplex requests.  Probably we'd end up with
 one or two standard loaders and interfaces to them.
In any case, it seems like XDP users in userspace still need to
 communicate with each other in order to update the chain map (which
 seems to rely on knowing where one's own program fits into it); you
 suggest they might communicate through the chain map itself, and then
 veer off into the weeds of finding race-free ways of doing that.  This
 seems (to me) needlessly complex.

Incidentally, there's also a performance advantage to an eBPF dispatcher,
 because it means the calls to the individual programs can be JITted and
 therefore be direct, whereas an in-kernel data-driven dispatcher has to
 use indirect calls (*waves at spectre*).

> The fact that Lorenz et al are interested in this feature (even though
> they are essentially already doing what you suggested, by having a
> centralised daemon to manage all XDP programs), tells me that having
> kernel support for this is the right thing to do.
Maybe Lorenz could describe what he sees as the difficulties with the
 centralised daemon approach.  In what ways is his current "xdpd"
 solution unsatisfactory?

-Ed


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-04 10:34                   ` Edward Cree
@ 2019-10-04 15:58                     ` Lorenz Bauer
  2019-10-07 16:43                       ` Edward Cree
  0 siblings, 1 reply; 53+ messages in thread
From: Lorenz Bauer @ 2019-10-04 15:58 UTC (permalink / raw)
  To: Edward Cree
  Cc: Toke Høiland-Jørgensen, John Fastabend,
	Alexei Starovoitov, Jesper Dangaard Brouer, Song Liu,
	Daniel Borkmann, Alexei Starovoitov, Martin Lau, Yonghong Song,
	Marek Majkowski, David Miller, netdev, bpf, kernel-team

On Fri, 4 Oct 2019 at 11:34, Edward Cree <ecree@solarflare.com> wrote:
>
> Enforcement is easily dealt with: you just don't give people the caps/
>  perms to load XDP programs directly, so the only way they can do it is
>  via your loader (which you give them a socket or dbus or something to
>  talk to).

Writing this daemon is actually harder than it sounds. Loading eBPF
programs can become fairly complex, with eBPF
maps being shared between different programs. If you want to support
all use cases (which you kind of have to) then you'll end up writing an
RPC wrapper for libbpf, which sounds very painful to me.

So I think for this to work at all, loading has to happen in the user space
components. Only construction of the control flow should be centralised.
This has the knock-on effect that these components need
CAP_NET_ADMIN, since too much of eBPF relies on having that
capability right now: various map types, safety features applied to non-root
eBPF, etc. Given time this will be fixed, and maybe these programs can then
just have CAP_BPF or whatever.

I chatted with my colleague Arthur, and we think this might work if all
programs are forced to comply with the xdpcap-style tail call map:
a prog array with MAX_XDP_ACTION slots, which each program
calls into via

  tail_call(map, action);
  return action; // to handle the empty slot case

You could then send (program fd, tail call map fd) along with a priority
of some sort via SCM_RIGHTS. The daemon can then update the tail
call maps as needed. The problem is that this only allows
for linear (not tree-like) control flow.

We'll try and hack up a POC to see if it works at all.

> In any case, it seems like XDP users in userspace still need to
>  communicate with each other in order to update the chain map (which
>  seems to rely on knowing where one's own program fits into it); you
>  suggest they might communicate through the chain map itself, and then
>  veer off into the weeds of finding race-free ways of doing that.  This
>  seems (to me) needlessly complex.

I agree.

> Incidentally, there's also a performance advantage to an eBPF dispatcher,
>  because it means the calls to the individual programs can be JITted and
>  therefore be direct, whereas an in-kernel data-driven dispatcher has to
>  use indirect calls (*waves at spectre*).

This is if we somehow got full blown calls between distinct eBPF programs?

> Maybe Lorenz could describe what he sees as the difficulties with the
>  centralised daemon approach.  In what ways is his current "xdpd"
>  solution unsatisfactory?

xdpd contains the logic to load and install all the different XDP programs
we have. If we want to change one of them we have to redeploy the whole
thing. Same if we want to add one. It also makes life-cycle management
harder than it should be. So our xdpd is not at all like the "loader"
you envision.

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-04 15:58                     ` Lorenz Bauer
@ 2019-10-07 16:43                       ` Edward Cree
  2019-10-07 17:12                         ` Lorenz Bauer
  2019-10-07 21:01                         ` Alexei Starovoitov
  0 siblings, 2 replies; 53+ messages in thread
From: Edward Cree @ 2019-10-07 16:43 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Toke Høiland-Jørgensen, John Fastabend,
	Alexei Starovoitov, Jesper Dangaard Brouer, Song Liu,
	Daniel Borkmann, Alexei Starovoitov, Martin Lau, Yonghong Song,
	Marek Majkowski, David Miller, netdev, bpf, kernel-team

On 04/10/2019 16:58, Lorenz Bauer wrote:
> If you want to support
> all use cases (which you kind of have to) then you'll end up writing an
> RPC wrapper for libbpf,
Yes, you more-or-less would need that.  Though I think you can e.g. have
 the clients load & pin their own maps and then pass the map fds over
 SCM_RIGHTS (though I'm not sure if our current permissions system is
 granular enough for that).

> which sounds very painful to me.
I might be being naïve, but it doesn't sound more painful than is normal
 for userland.  I mean, what operations have you got-
* create/destroy map (maybe, see above)
* load prog (pass it an fd from which it can read an ELF, and more fds
  for the maps it uses.  Everything else, e.g. BTFs, can just live in the
  ELF.)
* destroy prog
* bind prog to hook (admittedly there's a long list of hooks, but this is
  only to cover the XDP ones, so basically we just have to specify
  interface and generic/driver/hw)
-that doesn't seem like it presents great difficulties?

>> Incidentally, there's also a performance advantage to an eBPF dispatcher,
>>  because it means the calls to the individual programs can be JITted and
>>  therefore be direct, whereas an in-kernel data-driven dispatcher has to
>>  use indirect calls (*waves at spectre*).
> This is if we somehow got full blown calls between distinct eBPF programs?
No, I'm talking about doing a linker step (using the 'full-blown calls'
 _within_ an eBPF program that Alexei added a few months back) before the
 program is submitted to the kernel.  So the BPF_CALL|BPF_PSEUDO_CALL insn
 gets JITed to a direct call.

(Although I also think full-blown dynamically-linked calls ought not to be
 impossible, *if* we restrict them to taking a ctx and returning a u64, in
 which case the callee can be verified as though it were a normal program,
 and the caller's verifier just treats the program as returning an unknown
 scalar.  The devil is in the details, though, and it seems no-one's quite
 wanted it enough to do the work required to make it happen.)

>> Maybe Lorenz could describe what he sees as the difficulties with the
>>  centralised daemon approach.  In what ways is his current "xdpd"
>>  solution unsatisfactory?
> xdpd contains the logic to load and install all the different XDP programs
> we have. If we want to change one of them we have to redeploy the whole
> thing. Same if we want to add one. It also makes life-cycle management
> harder than it should be. So our xdpd is not at all like the "loader"
> you envision.
OK, but in that case xdpd isn't evidence that the "loader" approach doesn't
 work, so I still think it should be tried before we go to the lengths of
 pushing something into the kernel (that we then have to maintain forever).

No promises but I might find the time to put together a strawman
 implementation of the loader, to show how I envisage it working.

-Ed


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-07 16:43                       ` Edward Cree
@ 2019-10-07 17:12                         ` Lorenz Bauer
  2019-10-07 19:21                           ` Edward Cree
  2019-10-07 21:01                         ` Alexei Starovoitov
  1 sibling, 1 reply; 53+ messages in thread
From: Lorenz Bauer @ 2019-10-07 17:12 UTC (permalink / raw)
  To: Edward Cree
  Cc: Toke Høiland-Jørgensen, John Fastabend,
	Alexei Starovoitov, Jesper Dangaard Brouer, Song Liu,
	Daniel Borkmann, Alexei Starovoitov, Martin Lau, Yonghong Song,
	Marek Majkowski, David Miller, netdev, bpf, kernel-team

On Mon, 7 Oct 2019 at 17:43, Edward Cree <ecree@solarflare.com> wrote:
>
> I might be being naïve, but it doesn't sound more painful than is normal
>  for userland.  I mean, what operations have you got-
> * create/destroy map (maybe, see above)
> * load prog (pass it an fd from which it can read an ELF, and more fds
>   for the maps it uses.  Everything else, e.g. BTFs, can just live in the
>   ELF.)
> * destroy prog
> * bind prog to hook (admittedly there's a long list of hooks, but this is
>   only to cover the XDP ones, so basically we just have to specify
>   interface and generic/driver/hw)
> -that doesn't seem like it presents great difficulties?

Sure, but this is the simplest, not necessarily realistic use case. There
is a reason that libbpf has the API it has. For example, we patch our
eBPF before loading it. I'm sure there are other complications, which is
why I prefer to keep loading my own programs.

> No, I'm talking about doing a linker step (using the 'full-blown calls'
>  _within_ an eBPF program that Alexei added a few months back) before the
>  program is submitted to the kernel.  So the BPF_CALL|BPF_PSEUDO_CALL insn
>  gets JITed to a direct call.

Ah, I see. I'm not sure whether this restriction has been lifted, but those
calls are incompatible with tail calls. So we wouldn't be able to use this.

> OK, but in that case xdpd isn't evidence that the "loader" approach doesn't
>  work, so I still think it should be tried before we go to the lengths of
>  pushing something into the kernel (that we then have to maintain forever).

Maybe this came across the wrong way, I never said it is. Merely that it's
the status quo we'd like to move away from. If we can achieve that in
userspace, great.

Lorenz

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-07 17:12                         ` Lorenz Bauer
@ 2019-10-07 19:21                           ` Edward Cree
  0 siblings, 0 replies; 53+ messages in thread
From: Edward Cree @ 2019-10-07 19:21 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Toke Høiland-Jørgensen, John Fastabend,
	Alexei Starovoitov, Jesper Dangaard Brouer, Song Liu,
	Daniel Borkmann, Alexei Starovoitov, Martin Lau, Yonghong Song,
	Marek Majkowski, David Miller, netdev, bpf, kernel-team

On 07/10/2019 18:12, Lorenz Bauer wrote:
> Sure, but this is the simplest, not necessarily realistic use case. There
> is a reason that libbpf has the API it has. For example, we patch our
> eBPF before loading it. I'm sure there are other complications, which is
> why I prefer to keep loading my own programs.
Any reason why you can't have the client patch the eBPF (possibly with
 libbpf) before supplying the patched object file to the loaderiser?

>> No, I'm talking about doing a linker step (using the 'full-blown calls'
>>  _within_ an eBPF program that Alexei added a few months back) before the
>>  program is submitted to the kernel.  So the BPF_CALL|BPF_PSEUDO_CALL insn
>>  gets JITed to a direct call.
> Ah, I see. I'm not sure whether this restriction has been lifted, but those
> calls are incompatible with tail calls. So we wouldn't be able to use this.
Indeed, tail calls don't fit into my scheme, because being a tail-call from
 the subprogram doesn't make you a tail-call from the dispatcher program.
But AIUI tail calls are only in use today in various work-arounds for the
 lack of proper linking (including dynamic linking).  If we supported that,
 would you still need them?

>> OK, but in that case xdpd isn't evidence that the "loader" approach doesn't
>>  work, so I still think it should be tried before we go to the lengths of
>>  pushing something into the kernel (that we then have to maintain forever).
> Maybe this came across the wrong way, I never said it is.
No, you didn't (sorry).  Toke somewhat implied it, which is what I was
 responding to there.

-Ed


* Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls
  2019-10-07 16:43                       ` Edward Cree
  2019-10-07 17:12                         ` Lorenz Bauer
@ 2019-10-07 21:01                         ` Alexei Starovoitov
  1 sibling, 0 replies; 53+ messages in thread
From: Alexei Starovoitov @ 2019-10-07 21:01 UTC (permalink / raw)
  To: Edward Cree
  Cc: Lorenz Bauer, Toke Høiland-Jørgensen, John Fastabend,
	Jesper Dangaard Brouer, Song Liu, Daniel Borkmann,
	Alexei Starovoitov, Martin Lau, Yonghong Song, Marek Majkowski,
	David Miller, netdev, bpf, kernel-team

On Mon, Oct 07, 2019 at 05:43:44PM +0100, Edward Cree wrote:
> 
> (Although I also think full-blown dynamically-linked calls ought not to be
>  impossible, *if* we restrict them to taking a ctx and returning a u64, in
>  which case the callee can be verified as though it were a normal program,
>  and the caller's verifier just treats the program as returning an unknown
>  scalar.  The devil is in the details, though, and it seems no-one's quite
>  wanted it enough to do the work required to make it happen.)

Absolutely.
Full dynamic linking and libraries are on the todo list.
It's taking a long time, since it needs to be powerful and generic from day one.
If we do 'pass ctx only and return u64' as a stop gap, it will be just as limited
as the existing bpf_tail_calls.
The bpf_tail_call api was badly designed.
I couldn't figure out how to make tail calls safe and generic, so I came up
with this bpf_tail_call hack. bpf usability suffers as a result.
The verifier is different now. It's powerful enough to do true calls and jumps.
Especially with BTF it can see and track all types.
True tail calls will be seen as indirect jump assembler insn.
True indirect calls will be seen as indirect call assembler insn.
The choice of opcode encoding is clear.
Only the verifier, interpreter, and JIT work is left.
It won't take long.



end of thread, other threads:[~2019-10-07 21:01 UTC | newest]

Thread overview: 53+ messages
2019-10-02 13:30 [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls Toke Høiland-Jørgensen
2019-10-02 13:30 ` [PATCH bpf-next 1/9] hashtab: Add new bpf_map_fd_put_value op Toke Høiland-Jørgensen
2019-10-02 13:30 ` [PATCH bpf-next 2/9] xdp: Add new xdp_chain_map type for specifying XDP call sequences Toke Høiland-Jørgensen
2019-10-02 15:50   ` Lorenz Bauer
2019-10-02 18:25     ` Toke Høiland-Jørgensen
2019-10-02 13:30 ` [PATCH bpf-next 3/9] xdp: Support setting and getting device chain map Toke Høiland-Jørgensen
2019-10-02 15:50   ` Lorenz Bauer
2019-10-02 18:32     ` Toke Høiland-Jørgensen
2019-10-02 18:07   ` kbuild test robot
2019-10-02 18:29   ` kbuild test robot
2019-10-02 13:30 ` [PATCH bpf-next 4/9] xdp: Implement chain call logic to support multiple programs on one interface Toke Høiland-Jørgensen
2019-10-02 17:33   ` kbuild test robot
2019-10-02 17:53   ` kbuild test robot
2019-10-02 13:30 ` [PATCH bpf-next 5/9] tools/include/uapi: Add XDP chain map definitions Toke Høiland-Jørgensen
2019-10-02 13:30 ` [PATCH bpf-next 6/9] tools/libbpf_probes: Add support for xdp_chain map type Toke Høiland-Jørgensen
2019-10-02 13:30 ` [PATCH bpf-next 7/9] bpftool: Add definitions " Toke Høiland-Jørgensen
2019-10-02 13:30 ` [PATCH bpf-next 8/9] libbpf: Add support for setting and getting XDP chain maps Toke Høiland-Jørgensen
2019-10-02 13:30 ` [PATCH bpf-next 9/9] selftests: Add tests for XDP chain calls Toke Høiland-Jørgensen
2019-10-02 15:10 ` [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through " Alan Maguire
2019-10-02 15:33   ` Toke Høiland-Jørgensen
2019-10-02 16:34     ` John Fastabend
2019-10-02 18:33       ` Toke Høiland-Jørgensen
2019-10-02 20:34         ` John Fastabend
2019-10-03  7:48           ` Toke Høiland-Jørgensen
2019-10-03 10:09             ` Jesper Dangaard Brouer
2019-10-03 19:45               ` John Fastabend
2019-10-02 16:35 ` Lorenz Bauer
2019-10-02 18:54   ` Toke Høiland-Jørgensen
2019-10-02 16:43 ` John Fastabend
2019-10-02 19:09   ` Toke Høiland-Jørgensen
2019-10-02 19:15   ` Daniel Borkmann
2019-10-02 19:29     ` Toke Høiland-Jørgensen
2019-10-02 19:46     ` Alexei Starovoitov
2019-10-03  7:58       ` Toke Høiland-Jørgensen
2019-10-02 18:38 ` Song Liu
2019-10-02 18:54   ` Song Liu
2019-10-02 19:25     ` Toke Høiland-Jørgensen
2019-10-03  8:53       ` Jesper Dangaard Brouer
2019-10-03 14:03         ` Alexei Starovoitov
2019-10-03 14:33           ` Toke Høiland-Jørgensen
2019-10-03 14:53             ` Edward Cree
2019-10-03 18:49               ` Jesper Dangaard Brouer
2019-10-03 19:35               ` John Fastabend
2019-10-04  8:09                 ` Toke Høiland-Jørgensen
2019-10-04 10:34                   ` Edward Cree
2019-10-04 15:58                     ` Lorenz Bauer
2019-10-07 16:43                       ` Edward Cree
2019-10-07 17:12                         ` Lorenz Bauer
2019-10-07 19:21                           ` Edward Cree
2019-10-07 21:01                         ` Alexei Starovoitov
2019-10-02 19:23   ` Toke Høiland-Jørgensen
2019-10-02 19:49     ` Song Liu
2019-10-03  7:59       ` Toke Høiland-Jørgensen
