bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v4 0/3] Add TC-BPF API
@ 2021-04-23 15:05 Kumar Kartikeya Dwivedi
  2021-04-23 15:05 ` [PATCH bpf-next v4 1/3] libbpf: add helpers for preparing netlink attributes Kumar Kartikeya Dwivedi
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-23 15:05 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Toke Høiland-Jørgensen, netdev

This is the fourth version of the TC-BPF series.

It adds a simple API that uses netlink to attach the tc filter and its bpf
classifier. Currently, a user needs to shell out to the tc command line to be
able to create filters and attach SCHED_CLS programs as classifiers. With the
help of this API, it will be possible to use libbpf for doing all parts of bpf
program setup and attach.

Changelog contains details of patchset evolution.

In an effort to keep discussion focused, this series doesn't have the high level
TC-BPF API. It was clear that there is a need for a bpf_link API in the kernel,
hence that will be submitted as a separate patchset.

The individual commit messages contain more details, and also a brief summary of
the API.

Changelog:
----------
v3 -> v4
v3: https://lore.kernel.org/bpf/20210420193740.124285-1-memxor@gmail.com

 * We add a concept of bpf_tc_ctx context structure representing the attach point.
   The qdisc setup and delete is tied to this object's lifetime if it succeeds
   in creating the clsact qdisc when the attach point is BPF_TC_INGRESS or
   BPF_TC_EGRESS. Qdisc is only deleted when there are no filters attached to
   it. The struct itself is opaque to the user.
 * Refactor all API functions to take ctx.
 * Remove bpf_tc_info, bpf_tc_attach_id, instead reuse bpf_tc_opts for filling
   in attributes in various API functions (including query).
 * Explicitly document the expectation of each function regarding the opts
   fields that must be set/unset. Add some small notes for the defaults chosen
   by the API.
 * Rename bpf_tc_get_info to bpf_tc_query
 * Keep the netlink socket open in the context structure to save on open/close
   cycles for each operation.
 * Miscellaneous adjustments due to keeping the socket open.
 * Rewrite the tests, and also add tests for verifying all preconditions of the
   TC-BPF API.
 * Use bpf skeleton API in examples and tests.

v2 -> v3
v2: https://lore.kernel.org/bpf/20210419121811.117400-1-memxor@gmail.com

 * bpf_tc_cls_* -> bpf_tc_* rename
 * bpf_tc_attach_id now only consists of handle and priority, the two variables
   that user may or may not set.
 * bpf_tc_replace has been dropped, instead a replace bool is introduced in
   bpf_tc_opts for the same purpose.
 * bpf_tc_get_info now takes attach_id for filling in filter details during
   lookup instead of requiring user to do so. This also allows us to remove the
   fd parameter, as no matching is needed as long as we have all attributes
   necessary to identify a specific filter.
 * A little bit of code simplification taking into account the change above.
 * priority and protocol are now __u16 members in user facing API structs to
   reflect actual size.
 * Patch updating pkt_cls.h header has been removed, as it is unused now.
 * protocol and chain_index options have been dropped in bpf_tc_opts,
   protocol is always set to ETH_P_ALL, while chain_index is set as 0 by
   default in the kernel. This also means removal of chain_index from
   bpf_tc_attach_id, as it is unconditionally always 0.
 * bpf_tc_cls_change has been dropped
 * selftest now uses ASSERT_* macros

v1 -> v2
v1: https://lore.kernel.org/bpf/20210325120020.236504-1-memxor@gmail.com

 * netlink helpers have been renamed to object_action style.
 * attach_id now only contains attributes that are not explicitly set. Only
   the bare minimum info is kept in it.
 * protocol is now an optional and always set to ETH_P_ALL.
 * direct-action mode is always set.
 * skip_sw and skip_hw options have also been removed.
 * bpf_tc_cls_info struct now also returns the bpf program tag and id, as
   available in the netlink response. This came up as a requirement during
   discussion with people wanting to use this functionality.
 * support for attaching SCHED_ACT programs has been dropped, as it isn't
   useful without any support for binding loaded actions to a classifier.
 * the distinction between dev and block API has been dropped, there is now
   a single set of functions and user has to pass the special ifindex value
   to indicate operation on a shared filter block on their own.
 * The high level API returning a bpf_link is gone. This was already non-
   functional for pinning and typical ownership semantics. Instead, a separate
   patchset will be sent adding a bpf_link API for attaching SCHED_CLS progs to
   the kernel, and its corresponding libbpf API.
 * The clsact qdisc is now setup automatically in a best-effort fashion whenever
   user passes in the clsact ingress or egress parent id. This is done with
   exclusive mode, such that if an ingress or clsact qdisc is already set up,
   we skip the setup and move on with filter creation.
 * Other minor changes that came up during the course of discussion and rework.

Kumar Kartikeya Dwivedi (3):
  libbpf: add helpers for preparing netlink attributes
  libbpf: add low level TC-BPF API
  libbpf: add selftests for TC-BPF API

 tools/lib/bpf/libbpf.h                        |  92 ++++
 tools/lib/bpf/libbpf.map                      |   5 +
 tools/lib/bpf/netlink.c                       | 515 +++++++++++++++++-
 tools/lib/bpf/nlattr.h                        |  48 ++
 .../testing/selftests/bpf/prog_tests/tc_bpf.c | 204 +++++++
 .../testing/selftests/bpf/progs/test_tc_bpf.c |  12 +
 6 files changed, 854 insertions(+), 22 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_bpf.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf.c

--
2.30.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH bpf-next v4 1/3] libbpf: add helpers for preparing netlink attributes
  2021-04-23 15:05 [PATCH bpf-next v4 0/3] Add TC-BPF API Kumar Kartikeya Dwivedi
@ 2021-04-23 15:05 ` Kumar Kartikeya Dwivedi
  2021-04-23 15:05 ` [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
  2021-04-23 15:06 ` [PATCH bpf-next v4 3/3] libbpf: add selftests for " Kumar Kartikeya Dwivedi
  2 siblings, 0 replies; 18+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-23 15:05 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev

This change introduces a few helpers to wrap open coded attribute
preparation in netlink.c.

Every nested attribute's closure must happen using the helper
nlattr_end_nested, which sets its length properly. NLA_F_NESTED is
enforced using nlattr_begin_nested helper. Other simple attributes
can be added directly.

The maxsz parameter corresponds to the size of the request structure
which is being filled in, so for instance with req being:

struct {
	struct nlmsghdr nh;
	struct tcmsg t;
	char buf[4096];
} req;

Then, maxsz should be sizeof(req).

This change also converts the open coded attribute preparation with the
helpers. Note that the only failure the internal call to nlattr_add
could result in the nested helper would be -EMSGSIZE, hence that is what
we return to our caller.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/lib/bpf/netlink.c | 37 ++++++++++++++-----------------
 tools/lib/bpf/nlattr.h  | 48 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+), 21 deletions(-)

diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index d2cb28e9ef52..c79e30484e81 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -135,7 +135,7 @@ static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
 					 __u32 flags)
 {
 	int sock, seq = 0, ret;
-	struct nlattr *nla, *nla_xdp;
+	struct nlattr *nla;
 	struct {
 		struct nlmsghdr  nh;
 		struct ifinfomsg ifinfo;
@@ -157,36 +157,31 @@ static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
 	req.ifinfo.ifi_index = ifindex;
 
 	/* started nested attribute for XDP */
-	nla = (struct nlattr *)(((char *)&req)
-				+ NLMSG_ALIGN(req.nh.nlmsg_len));
-	nla->nla_type = NLA_F_NESTED | IFLA_XDP;
-	nla->nla_len = NLA_HDRLEN;
+	nla = nlattr_begin_nested(&req.nh, sizeof(req), IFLA_XDP);
+	if (!nla) {
+		ret = -EMSGSIZE;
+		goto cleanup;
+	}
 
 	/* add XDP fd */
-	nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
-	nla_xdp->nla_type = IFLA_XDP_FD;
-	nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
-	memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
-	nla->nla_len += nla_xdp->nla_len;
+	ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_FD, &fd, sizeof(fd));
+	if (ret < 0)
+		goto cleanup;
 
 	/* if user passed in any flags, add those too */
 	if (flags) {
-		nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
-		nla_xdp->nla_type = IFLA_XDP_FLAGS;
-		nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
-		memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
-		nla->nla_len += nla_xdp->nla_len;
+		ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_FLAGS, &flags, sizeof(flags));
+		if (ret < 0)
+			goto cleanup;
 	}
 
 	if (flags & XDP_FLAGS_REPLACE) {
-		nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
-		nla_xdp->nla_type = IFLA_XDP_EXPECTED_FD;
-		nla_xdp->nla_len = NLA_HDRLEN + sizeof(old_fd);
-		memcpy((char *)nla_xdp + NLA_HDRLEN, &old_fd, sizeof(old_fd));
-		nla->nla_len += nla_xdp->nla_len;
+		ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_EXPECTED_FD, &flags, sizeof(flags));
+		if (ret < 0)
+			goto cleanup;
 	}
 
-	req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
+	nlattr_end_nested(&req.nh, nla);
 
 	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
 		ret = -errno;
diff --git a/tools/lib/bpf/nlattr.h b/tools/lib/bpf/nlattr.h
index 6cc3ac91690f..1c94cdb6e89d 100644
--- a/tools/lib/bpf/nlattr.h
+++ b/tools/lib/bpf/nlattr.h
@@ -10,7 +10,10 @@
 #define __LIBBPF_NLATTR_H
 
 #include <stdint.h>
+#include <string.h>
+#include <errno.h>
 #include <linux/netlink.h>
+
 /* avoid multiple definition of netlink features */
 #define __LINUX_NETLINK_H
 
@@ -103,4 +106,49 @@ int libbpf_nla_parse_nested(struct nlattr *tb[], int maxtype,
 
 int libbpf_nla_dump_errormsg(struct nlmsghdr *nlh);
 
+static inline struct nlattr *nla_data(struct nlattr *nla)
+{
+	return (struct nlattr *)((char *)nla + NLA_HDRLEN);
+}
+
+static inline struct nlattr *nh_tail(struct nlmsghdr *nh)
+{
+	return (struct nlattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
+}
+
+static inline int nlattr_add(struct nlmsghdr *nh, size_t maxsz, int type,
+			     const void *data, int len)
+{
+	struct nlattr *nla;
+
+	if (NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(NLA_HDRLEN + len) > maxsz)
+		return -EMSGSIZE;
+	if ((!data && len) || (data && !len))
+		return -EINVAL;
+
+	nla = nh_tail(nh);
+	nla->nla_type = type;
+	nla->nla_len = NLA_HDRLEN + len;
+	if (data)
+		memcpy(nla_data(nla), data, len);
+	nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(nla->nla_len);
+	return 0;
+}
+
+static inline struct nlattr *nlattr_begin_nested(struct nlmsghdr *nh,
+						 size_t maxsz, int type)
+{
+	struct nlattr *tail;
+
+	tail = nh_tail(nh);
+	if (nlattr_add(nh, maxsz, type | NLA_F_NESTED, NULL, 0))
+		return NULL;
+	return tail;
+}
+
+static inline void nlattr_end_nested(struct nlmsghdr *nh, struct nlattr *tail)
+{
+	tail->nla_len = (char *)nh_tail(nh) - (char *)tail;
+}
+
 #endif /* __LIBBPF_NLATTR_H */
-- 
2.30.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API
  2021-04-23 15:05 [PATCH bpf-next v4 0/3] Add TC-BPF API Kumar Kartikeya Dwivedi
  2021-04-23 15:05 ` [PATCH bpf-next v4 1/3] libbpf: add helpers for preparing netlink attributes Kumar Kartikeya Dwivedi
@ 2021-04-23 15:05 ` Kumar Kartikeya Dwivedi
  2021-04-27 15:04   ` Daniel Borkmann
  2021-04-23 15:06 ` [PATCH bpf-next v4 3/3] libbpf: add selftests for " Kumar Kartikeya Dwivedi
  2 siblings, 1 reply; 18+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-23 15:05 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev

This adds functions that wrap the netlink API used for adding,
manipulating, and removing traffic control filters.

An API summary:

bpf_tc_ctx_init may be used to create a context structure represented by
the opaque struct bpf_tc_ctx pointer returned to the caller. A bpf
context represents the attach point for a filter, which means it takes
the ifindex of the device user is attaching to, and an attach point that
translates to the parent of the filter. When specifying BPF_TC_INGRESS
or BPF_TC_EGRESS, the ctx initialization will automatically setup the
clsact qdisc if possible, and also clean it up if it was created during
ctx_init and there are no more remaining filters. A custom parent value
can be set in opts during filter attach and indicated by setting
BPF_TC_CUSTOM_PARENT during ctx initialization.

bpf_tc_ctx_destroy is used to release a bpf_tc_ctx object and remove any
qdiscs owned by it. Note that there is a small race between the checking
of existing filters and removal of qdisc. The chances of this happening
are quite slim however, as we only remove a qdisc we set up.

bpf_tc_attach may be used to attach, and replace a filter and bind a
SCHED_CLS prog as its bpf classifier. Direct action mode is always
enabled, and protocol is always set to ETH_P_ALL. The filter is attached
to the main chain with index 0.

bpf_tc_detach may be used to delete existing filter and detach the
SCHED_CLS filter.

Example (using bpf skeleton infrastructure):

BPF program (test_tc_bpf.c):
	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	SEC("classifier")
	int cls(struct __sk_buff *skb)
	{
		return 0;
	}

Userspace loader:
	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, 0);
	struct test_tc_bpf *skel;
	struct bpf_tc_ctx *ctx;
	int fd, r;

	skel = test_tc_bpf__open_and_load();
	if (!skel)
		return -ENOMEM;

	fd = bpf_program__fd(skel->progs.cls);

	ctx = bpf_tc_ctx_init(if_nametoindex("lo"),
			      BPF_TC_INGRESS, NULL);
	if (!ctx)
		return -ENOMEM;

	r = bpf_tc_attach(ctx, fd, &opts);
	if (r < 0)
		return r;

... which is roughly equivalent to:
  # tc qdisc add dev lo clsact
  # tc filter add dev lo ingress bpf obj foo.o sec classifier da

To replace an existing filter (e.g. the one we just created):

	/* opts.{handle, parent, priority, prog_id} was filled
	 * in by attach.
	 */
	opts.replace = true;
	/* clear fields that must be unset, we must also clear parent as
	   our attach point is ingress hook */
	opts.parent = opts.prog_id = 0;
	r = bpf_tc_attach(ctx, fd, &opts);
	if (r < 0)
		return r;

To obtain info of a particular filter:

	/* Find info for filter with handle 1 and priority 50 */
	DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts, .handle = 1,
			    .priority = 50, .parent = opts.parent)
	r = bpf_tc_query(ctx, &info_opts);
	if (r < 0 && r == -ENOENT)
		printf("Filter not found\n");
	else
		return r;

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/lib/bpf/libbpf.h   |  92 ++++++++
 tools/lib/bpf/libbpf.map |   5 +
 tools/lib/bpf/netlink.c  | 478 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 574 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index bec4e6a6e31d..1c717c07b66e 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -775,6 +775,98 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
 LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
 LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);

+enum bpf_tc_attach_point {
+	BPF_TC_INGRESS,
+	BPF_TC_EGRESS,
+	BPF_TC_CUSTOM_PARENT,
+	_BPF_TC_PARENT_MAX,
+};
+
+/* The opts structure is also used to return the created filters attributes
+ * (e.g. in case the user left them unset). Some of the options that were left
+ * out default to a reasonable value, documented below.
+ *
+ *	protocol - ETH_P_ALL
+ *	chain index - 0
+ *	class_id - 0 (can be set by bpf program using skb->tc_classid)
+ *	bpf_flags - TCA_BPF_FLAG_ACT_DIRECT (direct action mode)
+ *	bpf_flags_gen - 0
+ *
+ *	The user must fulfill documented requirements for each function.
+ */
+struct bpf_tc_opts {
+	size_t sz;
+	__u32 handle;
+	__u32 parent;
+	__u16 priority;
+	__u32 prog_id;
+	bool replace;
+	size_t :0;
+};
+
+#define bpf_tc_opts__last_field replace
+
+struct bpf_tc_ctx;
+
+struct bpf_tc_ctx_opts {
+	size_t sz;
+};
+
+#define bpf_tc_ctx_opts__last_field sz
+
+/* Requirements */
+/*
+ * @ifindex: Must be > 0.
+ * @parent: Must be one of the enum constants < _BPF_TC_PARENT_MAX
+ * @opts: Can be NULL, currently no options are supported.
+ */
+LIBBPF_API struct bpf_tc_ctx *bpf_tc_ctx_init(__u32 ifindex,
+					      enum bpf_tc_attach_point parent,
+					      struct bpf_tc_ctx_opts *opts);
+/*
+ * @ctx: Can be NULL, if not, must point to a valid object.
+ *	 If the qdisc was attached during ctx_init, it will be deleted if no
+ *	 filters are attached to it.
+ *	 When ctx == NULL, this is a no-op.
+ */
+LIBBPF_API int bpf_tc_ctx_destroy(struct bpf_tc_ctx *ctx);
+/*
+ * @ctx: Cannot be NULL.
+ * @fd: Must be >= 0.
+ * @opts: Cannot be NULL, prog_id must be unset, all other fields can be
+ *	  optionally set. All fields except replace  will be set as per created
+ *        filter's attributes. parent must only be set when attach_point of ctx is
+ *        BPF_TC_CUSTOM_PARENT, otherwise parent must be unset.
+ *
+ * Fills the following fields in opts:
+ *	handle
+ *	parent
+ *	priority
+ *	prog_id
+ */
+LIBBPF_API int bpf_tc_attach(struct bpf_tc_ctx *ctx, int fd,
+			     struct bpf_tc_opts *opts);
+/*
+ * @ctx: Cannot be NULL.
+ * @opts: Cannot be NULL, replace and prog_id must be unset, all other fields
+ *	  must be set.
+ */
+LIBBPF_API int bpf_tc_detach(struct bpf_tc_ctx *ctx,
+			     const struct bpf_tc_opts *opts);
+/*
+ * @ctx: Cannot be NULL.
+ * @opts: Cannot be NULL, replace and prog_id must be unset, all other fields
+ *	  must be set.
+ *
+ * Fills the following fields in opts:
+ *	handle
+ *	parent
+ *	priority
+ *	prog_id
+ */
+LIBBPF_API int bpf_tc_query(struct bpf_tc_ctx *ctx,
+			    struct bpf_tc_opts *opts);
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index b9b29baf1df8..f6490d521601 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -361,4 +361,9 @@ LIBBPF_0.4.0 {
 		bpf_linker__new;
 		bpf_map__inner_map;
 		bpf_object__set_kversion;
+		bpf_tc_ctx_init;
+		bpf_tc_ctx_destroy;
+		bpf_tc_attach;
+		bpf_tc_detach;
+		bpf_tc_query;
 } LIBBPF_0.3.0;
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index c79e30484e81..a389e1151391 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -4,7 +4,11 @@
 #include <stdlib.h>
 #include <memory.h>
 #include <unistd.h>
+#include <inttypes.h>
+#include <arpa/inet.h>
 #include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/pkt_cls.h>
 #include <linux/rtnetlink.h>
 #include <sys/socket.h>
 #include <errno.h>
@@ -73,6 +77,11 @@ static int libbpf_netlink_open(__u32 *nl_pid)
 	return ret;
 }

+enum {
+	BPF_NL_CONT,
+	BPF_NL_NEXT,
+};
+
 static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
 			    __dump_nlmsg_t _fn, libbpf_dump_nlmsg_t fn,
 			    void *cookie)
@@ -84,6 +93,7 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
 	int len, ret;

 	while (multipart) {
+start:
 		multipart = false;
 		len = recv(sock, buf, sizeof(buf), 0);
 		if (len < 0) {
@@ -121,8 +131,16 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
 			}
 			if (_fn) {
 				ret = _fn(nh, fn, cookie);
-				if (ret)
+				if (ret < 0)
+					return ret;
+				switch (ret) {
+				case BPF_NL_CONT:
+					break;
+				case BPF_NL_NEXT:
+					goto start;
+				default:
 					return ret;
+				}
 			}
 		}
 	}
@@ -131,6 +149,21 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
 	return ret;
 }

+/* In TC-BPF we use seqnum to form causal order of operations on shared ctx
+ * socket, so we want to skip messages older than the one we are looking for,
+ * in case they are left in socket buffer for some reason (e.g. errors). */
+static int bpf_netlink_recv_skip(int sock, __u32 nl_pid, int seq, __dump_nlmsg_t fn,
+				 void *cookie)
+{
+	int ret;
+
+restart:
+	ret = bpf_netlink_recv(sock, nl_pid, seq, fn, NULL, cookie);
+	if (ret < 0 && ret == -LIBBPF_ERRNO__INVSEQ)
+		goto restart;
+	return ret;
+}
+
 static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
 					 __u32 flags)
 {
@@ -365,3 +398,446 @@ int libbpf_nl_get_link(int sock, unsigned int nl_pid,
 	return bpf_netlink_recv(sock, nl_pid, seq, __dump_link_nlmsg,
 				dump_link_nlmsg, cookie);
 }
+
+/* TC-CTX */
+
+struct bpf_tc_ctx {
+	__u32 ifindex;
+	enum bpf_tc_attach_point parent;
+	int sock;
+	__u32 nl_pid;
+	__u32 seq;
+	bool created_qdisc;
+};
+
+typedef int (*qdisc_config_t)(struct nlmsghdr *nh, struct tcmsg *t,
+			      size_t maxsz);
+
+static int clsact_config(struct nlmsghdr *nh, struct tcmsg *t, size_t maxsz)
+{
+	int ret;
+
+	t->tcm_parent = TC_H_CLSACT;
+	t->tcm_handle = TC_H_MAKE(TC_H_CLSACT, 0);
+
+	ret = nlattr_add(nh, maxsz, TCA_KIND, "clsact", sizeof("clsact"));
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static const qdisc_config_t parent_to_qdisc[_BPF_TC_PARENT_MAX] = {
+	[BPF_TC_INGRESS] = &clsact_config,
+	[BPF_TC_EGRESS] = &clsact_config,
+	[BPF_TC_CUSTOM_PARENT] = NULL,
+};
+
+static int tc_qdisc_modify(struct bpf_tc_ctx *ctx, int cmd, int flags,
+			   qdisc_config_t config)
+{
+	int ret = 0;
+	struct {
+		struct nlmsghdr nh;
+		struct tcmsg t;
+		char buf[256];
+	} req;
+
+	if (!ctx || !config)
+		return -EINVAL;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
+	req.nh.nlmsg_flags =
+		NLM_F_REQUEST | NLM_F_ACK | flags;
+	req.nh.nlmsg_type = cmd;
+	req.nh.nlmsg_pid = 0;
+	req.nh.nlmsg_seq = ++ctx->seq;
+	req.t.tcm_family = AF_UNSPEC;
+	req.t.tcm_ifindex = ctx->ifindex;
+
+	ret = config(&req.nh, &req.t, sizeof(req));
+	if (ret < 0)
+		return ret;
+
+	ret = send(ctx->sock, &req.nh, req.nh.nlmsg_len, 0);
+	if (ret < 0)
+		return ret;
+
+	return bpf_netlink_recv_skip(ctx->sock, ctx->nl_pid, ctx->seq, NULL, NULL);
+}
+
+static int tc_qdisc_create_excl(struct bpf_tc_ctx *ctx, qdisc_config_t config)
+{
+	return tc_qdisc_modify(ctx, RTM_NEWQDISC, NLM_F_CREATE | NLM_F_EXCL, config);
+}
+
+static int tc_qdisc_delete(struct bpf_tc_ctx *ctx, qdisc_config_t config)
+{
+	return tc_qdisc_modify(ctx, RTM_DELQDISC, 0, config);
+}
+
+struct bpf_tc_ctx *bpf_tc_ctx_init(__u32 ifindex, enum bpf_tc_attach_point parent,
+				   struct bpf_tc_ctx_opts *opts)
+{
+	struct bpf_tc_ctx *ctx = NULL;
+	qdisc_config_t config;
+	int ret, sock;
+	__u32 nl_pid;
+
+	if (!ifindex || parent >= _BPF_TC_PARENT_MAX ||
+	    !OPTS_VALID(opts, bpf_tc_ctx_opts)) {
+		errno = EINVAL;
+		return NULL;
+	}
+
+	sock = libbpf_netlink_open(&nl_pid);
+	if (sock < 0) {
+		errno = -sock;
+		return NULL;
+	}
+
+	ctx = calloc(1, sizeof(*ctx));
+	if (!ctx) {
+		errno = ENOMEM;
+		goto end_sock;
+	}
+
+	ctx->ifindex = ifindex;
+	ctx->parent = parent;
+	ctx->seq = time(NULL);
+	ctx->nl_pid = nl_pid;
+	ctx->sock = sock;
+
+	config = parent_to_qdisc[parent];
+	if (config) {
+		ret = tc_qdisc_create_excl(ctx, config);
+		if (ret < 0 && ret != -EEXIST) {
+			errno = -ret;
+			goto end_ctx;
+		}
+		ctx->created_qdisc = ret == 0;
+	}
+
+	return ctx;
+
+end_ctx:
+	free(ctx);
+end_sock:
+	close(sock);
+	return NULL;
+}
+
+struct pass_info {
+	struct bpf_tc_opts *opts;
+	bool processed;
+};
+
+static int __tc_query(struct bpf_tc_ctx *ctx,
+	              struct bpf_tc_opts *opts);
+
+int bpf_tc_ctx_destroy(struct bpf_tc_ctx *ctx)
+{
+	qdisc_config_t config;
+	int ret = 0;
+
+	if (!ctx)
+		return 0;
+
+	config = parent_to_qdisc[ctx->parent];
+	if (ctx->created_qdisc && config) {
+		/* ctx->parent cannot be BPF_TC_CUSTOM_PARENT, as this doesn't
+		 * map to a qdisc that can be created, so opts being NULL won't
+		 * be an error (e.g. in tc_ctx_get_tcm_parent).
+		 */
+		if (__tc_query(ctx, NULL) == -ENOENT)
+			ret = tc_qdisc_delete(ctx, config);
+	}
+
+	close(ctx->sock);
+	free(ctx);
+	return ret;
+}
+
+static long long int tc_ctx_get_tcm_parent(enum bpf_tc_attach_point type,
+					   __u32 parent)
+{
+	long long int ret;
+
+	switch (type) {
+	case BPF_TC_INGRESS:
+		ret = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS);
+		if (parent && parent != ret)
+			return -EINVAL;
+		break;
+	case BPF_TC_EGRESS:
+		ret = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS);
+		if (parent && parent != ret)
+			return -EINVAL;
+		break;
+	case BPF_TC_CUSTOM_PARENT:
+		if (!parent)
+			return -EINVAL;
+		ret = parent;
+		break;
+	default:
+		return -ERANGE;
+	}
+
+	return ret;
+}
+
+/* TC-BPF */
+
+static int tc_bpf_add_fd_and_name(struct nlmsghdr *nh, size_t maxsz, int fd)
+{
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
+	char name[256] = {};
+	int len, ret;
+
+	ret = bpf_obj_get_info_by_fd(fd, &info, &info_len);
+	if (ret < 0)
+		return ret;
+
+	ret = nlattr_add(nh, maxsz, TCA_BPF_FD, &fd, sizeof(fd));
+	if (ret < 0)
+		return ret;
+
+	len = snprintf(name, sizeof(name), "%s:[%" PRIu32 "]", info.name,
+		       info.id);
+	if (len < 0 || len >= sizeof(name))
+		return len < 0 ? -EINVAL : -ENAMETOOLONG;
+
+	return nlattr_add(nh, maxsz, TCA_BPF_NAME, name, len + 1);
+}
+
+static int tc_cls_bpf_modify(struct bpf_tc_ctx *ctx, int fd, int cmd, int flags,
+			     struct bpf_tc_opts *opts, __dump_nlmsg_t fn)
+{
+	unsigned int bpf_flags = 0;
+	struct pass_info info = {};
+	__u32 protocol, priority;
+	long long int tcm_parent;
+	struct nlattr *nla;
+	int ret;
+	struct {
+		struct nlmsghdr nh;
+		struct tcmsg t;
+		char buf[256];
+	} req;
+
+	if (cmd == RTM_NEWTFILTER)
+		flags |= OPTS_GET(opts, replace, false) ? NLM_F_REPLACE :
+								NLM_F_EXCL;
+	priority = OPTS_GET(opts, priority, 0);
+	protocol = ETH_P_ALL;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
+	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | flags;
+	req.nh.nlmsg_type = cmd;
+	req.nh.nlmsg_pid = 0;
+	req.nh.nlmsg_seq = ++ctx->seq;
+	req.t.tcm_family = AF_UNSPEC;
+	req.t.tcm_handle = OPTS_GET(opts, handle, 0);
+	req.t.tcm_ifindex = ctx->ifindex;
+	req.t.tcm_info = TC_H_MAKE(priority << 16, htons(protocol));
+
+	tcm_parent = tc_ctx_get_tcm_parent(ctx->parent, OPTS_GET(opts, parent, 0));
+	if (tcm_parent < 0)
+		return tcm_parent;
+	req.t.tcm_parent = tcm_parent;
+
+	ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "bpf", sizeof("bpf"));
+	if (ret < 0)
+		return ret;
+
+	nla = nlattr_begin_nested(&req.nh, sizeof(req), TCA_OPTIONS);
+	if (!nla)
+		return -EMSGSIZE;
+
+	if (cmd != RTM_DELTFILTER) {
+		ret = tc_bpf_add_fd_and_name(&req.nh, sizeof(req), fd);
+		if (ret < 0)
+			return ret;
+
+		/* direct action mode is always enabled */
+		bpf_flags |= TCA_BPF_FLAG_ACT_DIRECT;
+		ret = nlattr_add(&req.nh, sizeof(req), TCA_BPF_FLAGS,
+				 &bpf_flags, sizeof(bpf_flags));
+		if (ret < 0)
+			return ret;
+	}
+
+	nlattr_end_nested(&req.nh, nla);
+
+	ret = send(ctx->sock, &req.nh, req.nh.nlmsg_len, 0);
+	if (ret < 0)
+		return ret;
+
+	info.opts = opts;
+	ret = bpf_netlink_recv_skip(ctx->sock, ctx->nl_pid, ctx->seq, fn,
+				    &info);
+	if (ret < 0)
+		return ret;
+
+	/* Failed to process unicast response */
+	if (fn && !info.processed)
+		ret = -ENOENT;
+
+	return ret;
+}
+
+static int cls_get_info(struct nlmsghdr *nh, libbpf_dump_nlmsg_t fn,
+			void *cookie);
+
+int bpf_tc_attach(struct bpf_tc_ctx *ctx, int fd,
+		  struct bpf_tc_opts *opts)
+{
+	if (!ctx || fd < 0 || !opts)
+		return -EINVAL;
+
+	if (!OPTS_VALID(opts, bpf_tc_opts) || OPTS_GET(opts, prog_id, 0))
+		return -EINVAL;
+
+	if (OPTS_GET(opts, parent, 0) && ctx->parent < BPF_TC_CUSTOM_PARENT)
+		return -EINVAL;
+
+	return tc_cls_bpf_modify(ctx, fd, RTM_NEWTFILTER,
+				 NLM_F_ECHO | NLM_F_CREATE,
+				 opts, cls_get_info);
+}
+
+int bpf_tc_detach(struct bpf_tc_ctx *ctx,
+		  const struct bpf_tc_opts *opts)
+{
+	if (!ctx || !opts)
+		return -EINVAL;
+
+	if (!OPTS_VALID(opts, bpf_tc_opts) || !OPTS_GET(opts, handle, 0) ||
+	    !OPTS_GET(opts, priority, 0) || !OPTS_GET(opts, parent, 0) ||
+	    OPTS_GET(opts, replace, false) || OPTS_GET(opts, prog_id, 0))
+		return -EINVAL;
+
+	/* Won't write to opts when fn is NULL */
+	return tc_cls_bpf_modify(ctx, 0, RTM_DELTFILTER, 0,
+				 (struct bpf_tc_opts *)opts, NULL);
+}
+
+static int __cls_get_info(void *cookie, void *msg, struct nlattr **tb,
+			  bool unicast)
+{
+	struct nlattr *tbb[TCA_BPF_MAX + 1];
+	struct pass_info *info = cookie;
+	struct tcmsg *t = msg;
+
+	if (!info)
+		return -EINVAL;
+	if (unicast && info->processed)
+		return -EINVAL;
+	/* We use BPF_NL_CONT even after finding the filter to consume all
+	 * remaining multipart messages.
+	 */
+	if (info->processed || !tb[TCA_OPTIONS])
+		return BPF_NL_CONT;
+
+	libbpf_nla_parse_nested(tbb, TCA_BPF_MAX, tb[TCA_OPTIONS], NULL);
+	if (!tbb[TCA_BPF_ID])
+		return BPF_NL_CONT;
+
+	OPTS_SET(info->opts, handle, t->tcm_handle);
+	OPTS_SET(info->opts, parent, t->tcm_parent);
+	OPTS_SET(info->opts, priority, TC_H_MAJ(t->tcm_info) >> 16);
+	OPTS_SET(info->opts, prog_id, libbpf_nla_getattr_u32(tbb[TCA_BPF_ID]));
+
+	info->processed = true;
+	return unicast ? BPF_NL_NEXT : BPF_NL_CONT;
+}
+
+static int cls_get_info(struct nlmsghdr *nh, libbpf_dump_nlmsg_t fn,
+			void *cookie)
+{
+	struct tcmsg *t = NLMSG_DATA(nh);
+	struct nlattr *tb[TCA_MAX + 1];
+
+	libbpf_nla_parse(tb, TCA_MAX,
+			 (struct nlattr *)((char *)t + NLMSG_ALIGN(sizeof(*t))),
+			 NLMSG_PAYLOAD(nh, sizeof(*t)), NULL);
+	if (!tb[TCA_KIND])
+		return BPF_NL_CONT;
+
+	return __cls_get_info(cookie, t, tb, nh->nlmsg_flags & NLM_F_ECHO);
+}
+
+/* This is the less strict internal helper used to get dump for more than one
+ * filter, used to determine if there are any filters attach for a bpf_tc_ctx.
+ *
+ */
+static int __tc_query(struct bpf_tc_ctx *ctx,
+	              struct bpf_tc_opts *opts)
+{
+	struct pass_info pinfo = {};
+	__u32 priority, protocol;
+	long long int tcm_parent;
+	int ret;
+	struct {
+		struct nlmsghdr nh;
+		struct tcmsg t;
+		char buf[256];
+	} req = {
+		.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
+		.nh.nlmsg_type = RTM_GETTFILTER,
+		.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
+		.t.tcm_family = AF_UNSPEC,
+	};
+
+	if (!ctx)
+		return -EINVAL;
+
+	priority = OPTS_GET(opts, priority, 0);
+	protocol = ETH_P_ALL;
+
+	req.nh.nlmsg_seq = ++ctx->seq;
+	req.t.tcm_ifindex = ctx->ifindex;
+	req.t.tcm_handle = OPTS_GET(opts, handle, 0);
+	req.t.tcm_info = TC_H_MAKE(priority << 16, htons(protocol));
+
+	tcm_parent = tc_ctx_get_tcm_parent(ctx->parent, OPTS_GET(opts, parent, 0));
+	if (tcm_parent < 0)
+		return tcm_parent;
+	req.t.tcm_parent = tcm_parent;
+
+	ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "bpf", sizeof("bpf"));
+	if (ret < 0)
+		return ret;
+
+	ret = send(ctx->sock, &req.nh, req.nh.nlmsg_len, 0);
+	if (ret < 0)
+		return ret;
+
+	pinfo.opts = opts;
+	ret = bpf_netlink_recv_skip(ctx->sock, ctx->nl_pid, ctx->seq,
+				    cls_get_info, &pinfo);
+	if (ret < 0)
+		return ret;
+
+	if (!pinfo.processed)
+		ret = -ENOENT;
+
+	return ret;
+}
+
+int bpf_tc_query(struct bpf_tc_ctx *ctx,
+		 struct bpf_tc_opts *opts)
+{
+	if (!ctx || !opts)
+		return -EINVAL;
+
+	if (!OPTS_VALID(opts, bpf_tc_opts) || !OPTS_GET(opts, handle, 0) ||
+	    !OPTS_GET(opts, priority, 0) || !OPTS_GET(opts, parent, 0) ||
+	    OPTS_GET(opts, replace, false) || OPTS_GET(opts, prog_id, 0))
+		return -EINVAL;
+
+	return __tc_query(ctx, opts);
+}
--
2.30.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH bpf-next v4 3/3] libbpf: add selftests for TC-BPF API
  2021-04-23 15:05 [PATCH bpf-next v4 0/3] Add TC-BPF API Kumar Kartikeya Dwivedi
  2021-04-23 15:05 ` [PATCH bpf-next v4 1/3] libbpf: add helpers for preparing netlink attributes Kumar Kartikeya Dwivedi
  2021-04-23 15:05 ` [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
@ 2021-04-23 15:06 ` Kumar Kartikeya Dwivedi
  2021-04-27 21:50   ` Andrii Nakryiko
  2 siblings, 1 reply; 18+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-23 15:06 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev

This adds some basic tests for the low level bpf_tc_* API.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../testing/selftests/bpf/prog_tests/tc_bpf.c | 204 ++++++++++++++++++
 .../testing/selftests/bpf/progs/test_tc_bpf.c |  12 ++
 2 files changed, 216 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_bpf.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf.c

diff --git a/tools/testing/selftests/bpf/prog_tests/tc_bpf.c b/tools/testing/selftests/bpf/prog_tests/tc_bpf.c
new file mode 100644
index 000000000000..47505b92e50a
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/tc_bpf.c
@@ -0,0 +1,204 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+#include <linux/pkt_cls.h>
+
+#include "test_tc_bpf.skel.h"
+
+#define LO_IFINDEX 1
+
+static const __u32 tcm_parent[2] = {
+	[BPF_TC_INGRESS] = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS),
+	[BPF_TC_EGRESS] = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS),
+};
+
+static int test_tc_internal(struct bpf_tc_ctx *ctx, int fd,
+			    enum bpf_tc_attach_point parent)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 10);
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
+	int ret;
+
+	ret = bpf_obj_get_info_by_fd(fd, &info, &info_len);
+	if (!ASSERT_EQ(ret, 0, "bpf_obj_get_info_by_fd"))
+		return ret;
+
+	ret = bpf_tc_attach(ctx, fd, &opts);
+	if (!ASSERT_EQ(ret, 0, "bpf_tc_attach"))
+		return ret;
+
+	if (!ASSERT_EQ(opts.handle, 1, "handle set") ||
+	    !ASSERT_EQ(opts.priority, 10, "priority set") ||
+	    !ASSERT_EQ(opts.parent, tcm_parent[parent], "parent set") ||
+	    !ASSERT_NEQ(opts.prog_id, 0, "prog_id set"))
+		goto end;
+
+	opts.prog_id = 0;
+	ret = bpf_tc_query(ctx, &opts);
+	if (!ASSERT_EQ(ret, 0, "bpf_tc_query"))
+		goto end;
+
+	if (!ASSERT_NEQ(opts.prog_id, 0, "prog_id set") ||
+	    !ASSERT_EQ(info.id, opts.prog_id, "prog_id matching"))
+		goto end;
+
+	/* Atomic replace */
+	opts.replace = true;
+	opts.parent = opts.prog_id = 0;
+	ret = bpf_tc_attach(ctx, fd, &opts);
+	if (!ASSERT_EQ(ret, 0, "bpf_tc_attach replace mode"))
+		return ret;
+	opts.replace = false;
+
+end:
+	opts.prog_id = 0;
+	ret = bpf_tc_detach(ctx, &opts);
+	ASSERT_EQ(ret, 0, "bpf_tc_detach");
+	return ret;
+}
+
+int test_tc_invalid(struct bpf_tc_ctx *ctx, int fd)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 10,
+			    .parent = tcm_parent[BPF_TC_INGRESS]);
+	struct bpf_tc_ctx *inv_ctx;
+	int ret, saved_errno;
+
+	inv_ctx = bpf_tc_ctx_init(0, BPF_TC_INGRESS, NULL);
+	saved_errno = errno;
+	if (!ASSERT_EQ(inv_ctx, NULL, "bpf_tc_ctx_init invalid ifindex = 0"))
+		return -EINVAL;
+
+	ASSERT_EQ(saved_errno, EINVAL, "errno");
+
+	inv_ctx = bpf_tc_ctx_init(LO_IFINDEX, 0xdeadc0de, NULL);
+	saved_errno = errno;
+	if (!ASSERT_EQ(inv_ctx, NULL,
+		       "bpf_tc_ctx_init invalid parent >= _BPF_TC_PARENT_MAX"))
+		return -EINVAL;
+
+	ASSERT_EQ(saved_errno, EINVAL, "errno");
+
+	ret = bpf_tc_ctx_destroy(NULL);
+	if (!ASSERT_EQ(ret, 0, "bpf_tc_ctx_destroy ctx = NULL"))
+		return -EINVAL;
+
+	ret = bpf_tc_detach(NULL, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid ctx = NULL"))
+		return -EINVAL;
+
+	ret = bpf_tc_detach(ctx, NULL);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid opts = NULL"))
+		return -EINVAL;
+
+	ret = bpf_tc_query(NULL, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_query invalid ctx = NULL"))
+		return -EINVAL;
+
+	ret = bpf_tc_query(ctx, NULL);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_query invalid opts = NULL"))
+		return -EINVAL;
+
+	opts.replace = true;
+	ret = bpf_tc_detach(ctx, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid replace set"))
+		return -EINVAL;
+	ret = bpf_tc_query(ctx, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_query invalid replace set"))
+		return -EINVAL;
+	opts.replace = false;
+
+	opts.prog_id = 42;
+	ret = bpf_tc_detach(ctx, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid prog_id set"))
+		return -EINVAL;
+	ret = bpf_tc_query(ctx, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_query invalid prog_id set"))
+		return -EINVAL;
+	opts.prog_id = 0;
+
+	opts.handle = 0;
+	ret = bpf_tc_detach(ctx, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid handle unset"))
+		return -EINVAL;
+	ret = bpf_tc_query(ctx, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_query invalid handle unset"))
+		return -EINVAL;
+	opts.handle = 1;
+
+	opts.priority = 0;
+	ret = bpf_tc_detach(ctx, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid priority unset"))
+		return -EINVAL;
+	ret = bpf_tc_query(ctx, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_query invalid priority unset"))
+		return -EINVAL;
+	opts.priority = 10;
+
+	opts.parent = 0;
+	ret = bpf_tc_detach(ctx, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid parent unset"))
+		return -EINVAL;
+	ret = bpf_tc_query(ctx, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_query invalid parent unset"))
+		return -EINVAL;
+
+	ret = bpf_tc_attach(NULL, fd, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid ctx = NULL"))
+		return -EINVAL;
+
+	ret = bpf_tc_attach(ctx, -1, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid fd < 0"))
+		return -EINVAL;
+
+	ret = bpf_tc_attach(ctx, fd, NULL);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid opts = NULL"))
+		return -EINVAL;
+
+	opts.prog_id = 42;
+	ret = bpf_tc_attach(ctx, fd, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid prog_id set"))
+		return -EINVAL;
+	opts.prog_id = 0;
+
+	return 0;
+}
+
+void test_tc_bpf(void)
+{
+	struct bpf_tc_ctx *ctx_ing = NULL, *ctx_eg = NULL;
+	struct test_tc_bpf *skel = NULL;
+	int cls_fd, ret;
+
+	skel = test_tc_bpf__open_and_load();
+	if (!ASSERT_NEQ(skel, NULL, "test_tc_bpf skeleton"))
+		goto end;
+
+	cls_fd = bpf_program__fd(skel->progs.cls);
+
+	ctx_ing = bpf_tc_ctx_init(LO_IFINDEX, BPF_TC_INGRESS, NULL);
+	if (!ASSERT_NEQ(ctx_ing, NULL, "bpf_tc_ctx_init(BPF_TC_INGRESS)"))
+		goto end;
+
+	ctx_eg = bpf_tc_ctx_init(LO_IFINDEX, BPF_TC_EGRESS, NULL);
+	if (!ASSERT_NEQ(ctx_eg, NULL, "bpf_tc_ctx_init(BPF_TC_EGRESS)"))
+		goto end;
+
+	ret = test_tc_internal(ctx_ing, cls_fd, BPF_TC_INGRESS);
+	if (!ASSERT_EQ(ret, 0, "test_tc_internal ingress"))
+		goto end;
+
+	ret = test_tc_internal(ctx_eg, cls_fd, BPF_TC_EGRESS);
+	if (!ASSERT_EQ(ret, 0, "test_tc_internal egress"))
+		goto end;
+
+	ret = test_tc_invalid(ctx_ing, cls_fd);
+	if (!ASSERT_EQ(ret, 0, "test_tc_invalid"))
+		goto end;
+
+end:
+	bpf_tc_ctx_destroy(ctx_eg);
+	bpf_tc_ctx_destroy(ctx_ing);
+	test_tc_bpf__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_tc_bpf.c b/tools/testing/selftests/bpf/progs/test_tc_bpf.c
new file mode 100644
index 000000000000..18a3a7ed924a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_tc_bpf.c
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+/* Dummy prog to test TC-BPF API */
+
+SEC("classifier")
+int cls(struct __sk_buff *skb)
+{
+	return 0;
+}
-- 
2.30.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API
  2021-04-23 15:05 ` [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
@ 2021-04-27 15:04   ` Daniel Borkmann
  2021-04-27 18:00     ` Toke Høiland-Jørgensen
                       ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Daniel Borkmann @ 2021-04-27 15:04 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev

On 4/23/21 5:05 PM, Kumar Kartikeya Dwivedi wrote:
[...]
>   tools/lib/bpf/libbpf.h   |  92 ++++++++
>   tools/lib/bpf/libbpf.map |   5 +
>   tools/lib/bpf/netlink.c  | 478 ++++++++++++++++++++++++++++++++++++++-
>   3 files changed, 574 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index bec4e6a6e31d..1c717c07b66e 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -775,6 +775,98 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
>   LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
>   LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
> 
> +enum bpf_tc_attach_point {
> +	BPF_TC_INGRESS,
> +	BPF_TC_EGRESS,
> +	BPF_TC_CUSTOM_PARENT,
> +	_BPF_TC_PARENT_MAX,

I don't think we need to expose _BPF_TC_PARENT_MAX as part of the API, I would drop
the latter.

> +};
> +
> +/* The opts structure is also used to return the created filters attributes
> + * (e.g. in case the user left them unset). Some of the options that were left
> + * out default to a reasonable value, documented below.
> + *
> + *	protocol - ETH_P_ALL
> + *	chain index - 0
> + *	class_id - 0 (can be set by bpf program using skb->tc_classid)
> + *	bpf_flags - TCA_BPF_FLAG_ACT_DIRECT (direct action mode)
> + *	bpf_flags_gen - 0
> + *
> + *	The user must fulfill documented requirements for each function.

Not sure if this is overly relevant as part of the bpf_tc_opts in here. For the
2nd part, I would probably just mention that libbpf internally attaches the bpf
programs with direct action mode. The hw offload may be future todo, and the other
bits are little used anyway; mentioning them here, what value does it have to
libbpf users? I'd rather just drop the 2nd part and/or simplify this paragraph
just stating that the progs are attached in direct action mode.

> + */
> +struct bpf_tc_opts {
> +	size_t sz;
> +	__u32 handle;
> +	__u32 parent;
> +	__u16 priority;
> +	__u32 prog_id;
> +	bool replace;
> +	size_t :0;
> +};
> +
> +#define bpf_tc_opts__last_field replace
> +
> +struct bpf_tc_ctx;
> +
> +struct bpf_tc_ctx_opts {
> +	size_t sz;
> +};
> +
> +#define bpf_tc_ctx_opts__last_field sz
> +
> +/* Requirements */
> +/*
> + * @ifindex: Must be > 0.
> + * @parent: Must be one of the enum constants < _BPF_TC_PARENT_MAX
> + * @opts: Can be NULL, currently no options are supported.
> + */

Up to Andrii, but we don't have such API doc in general inside libbpf.h, I
would drop it for the time being to be consistent with the rest (same for
others below).

> +LIBBPF_API struct bpf_tc_ctx *bpf_tc_ctx_init(__u32 ifindex,

nit: in user space s/__u32 ifindex/int ifindex/

> +					      enum bpf_tc_attach_point parent,
> +					      struct bpf_tc_ctx_opts *opts);

Should we enforce opts being NULL or non-NULL here, or drop the arg from here
for now altogether? (And if later versions of the functions show up this could
be mapped to the right one?)

> +/*
> + * @ctx: Can be NULL, if not, must point to a valid object.
> + *	 If the qdisc was attached during ctx_init, it will be deleted if no
> + *	 filters are attached to it.
> + *	 When ctx == NULL, this is a no-op.
> + */
> +LIBBPF_API int bpf_tc_ctx_destroy(struct bpf_tc_ctx *ctx);
> +/*
> + * @ctx: Cannot be NULL.
> + * @fd: Must be >= 0.
> + * @opts: Cannot be NULL, prog_id must be unset, all other fields can be
> + *	  optionally set. All fields except replace  will be set as per created
> + *        filter's attributes. parent must only be set when attach_point of ctx is
> + *        BPF_TC_CUSTOM_PARENT, otherwise parent must be unset.
> + *
> + * Fills the following fields in opts:
> + *	handle
> + *	parent
> + *	priority
> + *	prog_id
> + */
> +LIBBPF_API int bpf_tc_attach(struct bpf_tc_ctx *ctx, int fd,
> +			     struct bpf_tc_opts *opts);
> +/*
> + * @ctx: Cannot be NULL.
> + * @opts: Cannot be NULL, replace and prog_id must be unset, all other fields
> + *	  must be set.
> + */
> +LIBBPF_API int bpf_tc_detach(struct bpf_tc_ctx *ctx,
> +			     const struct bpf_tc_opts *opts);

One thing that I find a bit odd from this API is that BPF_TC_INGRESS / BPF_TC_EGRESS
needs to be set each time via bpf_tc_ctx_init(). So whenever a specific program would
be attached to both we need to 're-init' in between just to change from hook a to b,
whereas when you have BPF_TC_CUSTOM_PARENT, you could just use a different opts->parent
without going this detour (unless the clsact wasn't loaded there in the first place).

Could we add a BPF_TC_UNSPEC to enum bpf_tc_attach_point, which the user would pass to
bpf_tc_ctx_init(), so that opts.direction = BPF_TC_INGRESS with subsequent bpf_tc_attach()
can be called, and same opts.direction = BPF_TC_EGRESS with bpf_tc_attach() for different
fd. The only thing we cared about in bpf_tc_ctx_init() resp. the ctx was that qdisc was
ready.

> +/*
> + * @ctx: Cannot be NULL.
> + * @opts: Cannot be NULL, replace and prog_id must be unset, all other fields
> + *	  must be set.
> + *
> + * Fills the following fields in opts:
> + *	handle
> + *	parent
> + *	priority
> + *	prog_id
> + */
> +LIBBPF_API int bpf_tc_query(struct bpf_tc_ctx *ctx,
> +			    struct bpf_tc_opts *opts);
> +
>   #ifdef __cplusplus
>   } /* extern "C" */
>   #endif

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API
  2021-04-27 15:04   ` Daniel Borkmann
@ 2021-04-27 18:00     ` Toke Høiland-Jørgensen
  2021-04-27 18:02     ` Kumar Kartikeya Dwivedi
  2021-04-27 20:36     ` Andrii Nakryiko
  2 siblings, 0 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-04-27 18:00 UTC (permalink / raw)
  To: Daniel Borkmann, Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, netdev

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 4/23/21 5:05 PM, Kumar Kartikeya Dwivedi wrote:
> [...]
>>   tools/lib/bpf/libbpf.h   |  92 ++++++++
>>   tools/lib/bpf/libbpf.map |   5 +
>>   tools/lib/bpf/netlink.c  | 478 ++++++++++++++++++++++++++++++++++++++-
>>   3 files changed, 574 insertions(+), 1 deletion(-)
>> 
>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>> index bec4e6a6e31d..1c717c07b66e 100644
>> --- a/tools/lib/bpf/libbpf.h
>> +++ b/tools/lib/bpf/libbpf.h
>> @@ -775,6 +775,98 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
>>   LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
>>   LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
>> 
>> +enum bpf_tc_attach_point {
>> +	BPF_TC_INGRESS,
>> +	BPF_TC_EGRESS,
>> +	BPF_TC_CUSTOM_PARENT,
>> +	_BPF_TC_PARENT_MAX,
>
> I don't think we need to expose _BPF_TC_PARENT_MAX as part of the API, I would drop
> the latter.
>
>> +};
>> +
>> +/* The opts structure is also used to return the created filters attributes
>> + * (e.g. in case the user left them unset). Some of the options that were left
>> + * out default to a reasonable value, documented below.
>> + *
>> + *	protocol - ETH_P_ALL
>> + *	chain index - 0
>> + *	class_id - 0 (can be set by bpf program using skb->tc_classid)
>> + *	bpf_flags - TCA_BPF_FLAG_ACT_DIRECT (direct action mode)
>> + *	bpf_flags_gen - 0
>> + *
>> + *	The user must fulfill documented requirements for each function.
>
> Not sure if this is overly relevant as part of the bpf_tc_opts in here. For the
> 2nd part, I would probably just mention that libbpf internally attaches the bpf
> programs with direct action mode. The hw offload may be future todo, and the other
> bits are little used anyway; mentioning them here, what value does it have to
> libbpf users? I'd rather just drop the 2nd part and/or simplify this paragraph
> just stating that the progs are attached in direct action mode.

The idea was that this would be for the benefit of people familiar with
TC concepts. Maybe simplify it to "Programs are attached in direct
action mode with a protocol value of 'all', and all other parameters
that the 'tc' binary supports will be set to 0"?

>> + */
>> +struct bpf_tc_opts {
>> +	size_t sz;
>> +	__u32 handle;
>> +	__u32 parent;
>> +	__u16 priority;
>> +	__u32 prog_id;
>> +	bool replace;
>> +	size_t :0;
>> +};
>> +
>> +#define bpf_tc_opts__last_field replace
>> +
>> +struct bpf_tc_ctx;
>> +
>> +struct bpf_tc_ctx_opts {
>> +	size_t sz;
>> +};
>> +
>> +#define bpf_tc_ctx_opts__last_field sz
>> +
>> +/* Requirements */
>> +/*
>> + * @ifindex: Must be > 0.
>> + * @parent: Must be one of the enum constants < _BPF_TC_PARENT_MAX
>> + * @opts: Can be NULL, currently no options are supported.
>> + */
>
> Up to Andrii, but we don't have such API doc in general inside libbpf.h, I
> would drop it for the time being to be consistent with the rest (same for
> others below).
>
>> +LIBBPF_API struct bpf_tc_ctx *bpf_tc_ctx_init(__u32 ifindex,
>
> nit: in user space s/__u32 ifindex/int ifindex/
>
>> +					      enum bpf_tc_attach_point parent,
>> +					      struct bpf_tc_ctx_opts *opts);
>
> Should we enforce opts being NULL or non-NULL here, or drop the arg from here
> for now altogether? (And if later versions of the functions show up this could
> be mapped to the right one?)

Hmm, extending later is easier if there's already an opts parameter. But
it could be declared 'void *' and enforced as always 0 for now?

>> +/*
>> + * @ctx: Can be NULL, if not, must point to a valid object.
>> + *	 If the qdisc was attached during ctx_init, it will be deleted if no
>> + *	 filters are attached to it.
>> + *	 When ctx == NULL, this is a no-op.
>> + */
>> +LIBBPF_API int bpf_tc_ctx_destroy(struct bpf_tc_ctx *ctx);
>> +/*
>> + * @ctx: Cannot be NULL.
>> + * @fd: Must be >= 0.
>> + * @opts: Cannot be NULL, prog_id must be unset, all other fields can be
>> + *	  optionally set. All fields except replace  will be set as per created
>> + *        filter's attributes. parent must only be set when attach_point of ctx is
>> + *        BPF_TC_CUSTOM_PARENT, otherwise parent must be unset.
>> + *
>> + * Fills the following fields in opts:
>> + *	handle
>> + *	parent
>> + *	priority
>> + *	prog_id
>> + */
>> +LIBBPF_API int bpf_tc_attach(struct bpf_tc_ctx *ctx, int fd,
>> +			     struct bpf_tc_opts *opts);
>> +/*
>> + * @ctx: Cannot be NULL.
>> + * @opts: Cannot be NULL, replace and prog_id must be unset, all other fields
>> + *	  must be set.
>> + */
>> +LIBBPF_API int bpf_tc_detach(struct bpf_tc_ctx *ctx,
>> +			     const struct bpf_tc_opts *opts);
>
> One thing that I find a bit odd from this API is that BPF_TC_INGRESS / BPF_TC_EGRESS
> needs to be set each time via bpf_tc_ctx_init(). So whenever a specific program would
> be attached to both we need to 're-init' in between just to change from hook a to b,
> whereas when you have BPF_TC_CUSTOM_PARENT, you could just use a different opts->parent
> without going this detour (unless the clsact wasn't loaded there in the first place).
>
> Could we add a BPF_TC_UNSPEC to enum bpf_tc_attach_point, which the user would pass to
> bpf_tc_ctx_init(), so that opts.direction = BPF_TC_INGRESS with subsequent bpf_tc_attach()
> can be called, and same opts.direction = BPF_TC_EGRESS with bpf_tc_attach() for different
> fd. The only thing we cared about in bpf_tc_ctx_init() resp. the ctx was that qdisc was
> ready.

This sounds workable - no objections from me :)

-Toke


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API
  2021-04-27 15:04   ` Daniel Borkmann
  2021-04-27 18:00     ` Toke Høiland-Jørgensen
@ 2021-04-27 18:02     ` Kumar Kartikeya Dwivedi
  2021-04-27 18:15       ` Toke Høiland-Jørgensen
  2021-04-27 21:55       ` Daniel Borkmann
  2021-04-27 20:36     ` Andrii Nakryiko
  2 siblings, 2 replies; 18+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-27 18:02 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev

On Tue, Apr 27, 2021 at 08:34:30PM IST, Daniel Borkmann wrote:
> On 4/23/21 5:05 PM, Kumar Kartikeya Dwivedi wrote:
> [...]
> >   tools/lib/bpf/libbpf.h   |  92 ++++++++
> >   tools/lib/bpf/libbpf.map |   5 +
> >   tools/lib/bpf/netlink.c  | 478 ++++++++++++++++++++++++++++++++++++++-
> >   3 files changed, 574 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > index bec4e6a6e31d..1c717c07b66e 100644
> > --- a/tools/lib/bpf/libbpf.h
> > +++ b/tools/lib/bpf/libbpf.h
> > @@ -775,6 +775,98 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
> >   LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
> >   LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
> >
> > +enum bpf_tc_attach_point {
> > +	BPF_TC_INGRESS,
> > +	BPF_TC_EGRESS,
> > +	BPF_TC_CUSTOM_PARENT,
> > +	_BPF_TC_PARENT_MAX,
>
> I don't think we need to expose _BPF_TC_PARENT_MAX as part of the API, I would drop
> the latter.
>

Ok, will drop.

> > +};
> > +
> > +/* The opts structure is also used to return the created filters attributes
> > + * (e.g. in case the user left them unset). Some of the options that were left
> > + * out default to a reasonable value, documented below.
> > + *
> > + *	protocol - ETH_P_ALL
> > + *	chain index - 0
> > + *	class_id - 0 (can be set by bpf program using skb->tc_classid)
> > + *	bpf_flags - TCA_BPF_FLAG_ACT_DIRECT (direct action mode)
> > + *	bpf_flags_gen - 0
> > + *
> > + *	The user must fulfill documented requirements for each function.
>
> Not sure if this is overly relevant as part of the bpf_tc_opts in here. For the
> 2nd part, I would probably just mention that libbpf internally attaches the bpf
> programs with direct action mode. The hw offload may be future todo, and the other
> bits are little used anyway; mentioning them here, what value does it have to
> libbpf users? I'd rather just drop the 2nd part and/or simplify this paragraph
> just stating that the progs are attached in direct action mode.
>

The goal was to just document whatever attributes were set to by default, but I can see
your point. I'll trim it.

> > + */
> > +struct bpf_tc_opts {
> > +	size_t sz;
> > +	__u32 handle;
> > +	__u32 parent;
> > +	__u16 priority;
> > +	__u32 prog_id;
> > +	bool replace;
> > +	size_t :0;
> > +};
> > +
> > +#define bpf_tc_opts__last_field replace
> > +
> > +struct bpf_tc_ctx;
> > +
> > +struct bpf_tc_ctx_opts {
> > +	size_t sz;
> > +};
> > +
> > +#define bpf_tc_ctx_opts__last_field sz
> > +
> > +/* Requirements */
> > +/*
> > + * @ifindex: Must be > 0.
> > + * @parent: Must be one of the enum constants < _BPF_TC_PARENT_MAX
> > + * @opts: Can be NULL, currently no options are supported.
> > + */
>
> Up to Andrii, but we don't have such API doc in general inside libbpf.h, I
> would drop it for the time being to be consistent with the rest (same for
> others below).
>

I think we need to keep this somewhere. We dropped bpf_tc_info since it wasn't
really serving any purpose, but that meant we would put the only extra thing it
returned (prog_id) into bpf_tc_opts. That means we also need to take care that
it is unset (along with replace) in functions where it isn't used, to allow for
reuse for some future purpose. If we don't document that the user needs to unset
them whenever working with bpf_tc_query and bpf_tc_detach, how are they supposed
to know?

Maybe a man page and/or a blog post would be better? Just putting it above the
function seemed best for now.

> > +LIBBPF_API struct bpf_tc_ctx *bpf_tc_ctx_init(__u32 ifindex,
>
> nit: in user space s/__u32 ifindex/int ifindex/
>

Ok.

> > +					      enum bpf_tc_attach_point parent,
> > +					      struct bpf_tc_ctx_opts *opts);
>
> Should we enforce opts being NULL or non-NULL here, or drop the arg from here
> for now altogether? (And if later versions of the functions show up this could
> be mapped to the right one?)
>

I can enforce it to be NULL for now, but I'd rather keep it this way than add
another with opts later. The way this works, you could extend it to attach
other kinds of qdiscs (by taking a simple config in opts), which I think can be
very useful.

> > +/*
> > + * @ctx: Can be NULL, if not, must point to a valid object.
> > + *	 If the qdisc was attached during ctx_init, it will be deleted if no
> > + *	 filters are attached to it.
> > + *	 When ctx == NULL, this is a no-op.
> > + */
> > +LIBBPF_API int bpf_tc_ctx_destroy(struct bpf_tc_ctx *ctx);
> > +/*
> > + * @ctx: Cannot be NULL.
> > + * @fd: Must be >= 0.
> > + * @opts: Cannot be NULL, prog_id must be unset, all other fields can be
> > + *	  optionally set. All fields except replace  will be set as per created
> > + *        filter's attributes. parent must only be set when attach_point of ctx is
> > + *        BPF_TC_CUSTOM_PARENT, otherwise parent must be unset.
> > + *
> > + * Fills the following fields in opts:
> > + *	handle
> > + *	parent
> > + *	priority
> > + *	prog_id
> > + */
> > +LIBBPF_API int bpf_tc_attach(struct bpf_tc_ctx *ctx, int fd,
> > +			     struct bpf_tc_opts *opts);
> > +/*
> > + * @ctx: Cannot be NULL.
> > + * @opts: Cannot be NULL, replace and prog_id must be unset, all other fields
> > + *	  must be set.
> > + */
> > +LIBBPF_API int bpf_tc_detach(struct bpf_tc_ctx *ctx,
> > +			     const struct bpf_tc_opts *opts);
>
> One thing that I find a bit odd from this API is that BPF_TC_INGRESS / BPF_TC_EGRESS
> needs to be set each time via bpf_tc_ctx_init(). So whenever a specific program would
> be attached to both we need to 're-init' in between just to change from hook a to b,
> whereas when you have BPF_TC_CUSTOM_PARENT, you could just use a different opts->parent
> without going this detour (unless the clsact wasn't loaded there in the first place).
>

Currently I check that opts->parent is unset when BPF_TC_INGRESS or BPF_TC_EGRESS
is set as attach point. But since both map to clsact, we could allow the user to
specify opts->parent as BPF_TC_INGRESS or BPF_TC_EGRESS (no need to use
TC_H_MAKE, we can detect it from ctx->parent that it won't be a parent id). This
would mean that by default attach point is what you set for ctx, but for
bpf_tc_attach you can temporarily override to be some other attach point (for
the same qdisc). You still won't be able to set anything other than the two
though.

> Could we add a BPF_TC_UNSPEC to enum bpf_tc_attach_point, which the user would pass to
> bpf_tc_ctx_init(), so that opts.direction = BPF_TC_INGRESS with subsequent bpf_tc_attach()
> can be called, and same opts.direction = BPF_TC_EGRESS with bpf_tc_attach() for different
> fd. The only thing we cared about in bpf_tc_ctx_init() resp. the ctx was that qdisc was
> ready.
>
> > +/*
> > + * @ctx: Cannot be NULL.
> > + * @opts: Cannot be NULL, replace and prog_id must be unset, all other fields
> > + *	  must be set.
> > + *
> > + * Fills the following fields in opts:
> > + *	handle
> > + *	parent
> > + *	priority
> > + *	prog_id
> > + */
> > +LIBBPF_API int bpf_tc_query(struct bpf_tc_ctx *ctx,
> > +			    struct bpf_tc_opts *opts);
> > +
> >   #ifdef __cplusplus
> >   } /* extern "C" */
> >   #endif

--
Kartikeya

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API
  2021-04-27 18:02     ` Kumar Kartikeya Dwivedi
@ 2021-04-27 18:15       ` Toke Høiland-Jørgensen
  2021-04-27 21:55       ` Daniel Borkmann
  1 sibling, 0 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-04-27 18:15 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Daniel Borkmann
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, netdev

Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:

> On Tue, Apr 27, 2021 at 08:34:30PM IST, Daniel Borkmann wrote:
>> On 4/23/21 5:05 PM, Kumar Kartikeya Dwivedi wrote:
>> [...]
>> >   tools/lib/bpf/libbpf.h   |  92 ++++++++
>> >   tools/lib/bpf/libbpf.map |   5 +
>> >   tools/lib/bpf/netlink.c  | 478 ++++++++++++++++++++++++++++++++++++++-
>> >   3 files changed, 574 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>> > index bec4e6a6e31d..1c717c07b66e 100644
>> > --- a/tools/lib/bpf/libbpf.h
>> > +++ b/tools/lib/bpf/libbpf.h
>> > @@ -775,6 +775,98 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
>> >   LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
>> >   LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
>> >
>> > +enum bpf_tc_attach_point {
>> > +	BPF_TC_INGRESS,
>> > +	BPF_TC_EGRESS,
>> > +	BPF_TC_CUSTOM_PARENT,
>> > +	_BPF_TC_PARENT_MAX,
>>
>> I don't think we need to expose _BPF_TC_PARENT_MAX as part of the API, I would drop
>> the latter.
>>
>
> Ok, will drop.
>
>> > +};
>> > +
>> > +/* The opts structure is also used to return the created filters attributes
>> > + * (e.g. in case the user left them unset). Some of the options that were left
>> > + * out default to a reasonable value, documented below.
>> > + *
>> > + *	protocol - ETH_P_ALL
>> > + *	chain index - 0
>> > + *	class_id - 0 (can be set by bpf program using skb->tc_classid)
>> > + *	bpf_flags - TCA_BPF_FLAG_ACT_DIRECT (direct action mode)
>> > + *	bpf_flags_gen - 0
>> > + *
>> > + *	The user must fulfill documented requirements for each function.
>>
>> Not sure if this is overly relevant as part of the bpf_tc_opts in here. For the
>> 2nd part, I would probably just mention that libbpf internally attaches the bpf
>> programs with direct action mode. The hw offload may be future todo, and the other
>> bits are little used anyway; mentioning them here, what value does it have to
>> libbpf users? I'd rather just drop the 2nd part and/or simplify this paragraph
>> just stating that the progs are attached in direct action mode.
>>
>
> The goal was to just document whatever attributes were set to by default, but I can see
> your point. I'll trim it.
>
>> > + */
>> > +struct bpf_tc_opts {
>> > +	size_t sz;
>> > +	__u32 handle;
>> > +	__u32 parent;
>> > +	__u16 priority;
>> > +	__u32 prog_id;
>> > +	bool replace;
>> > +	size_t :0;
>> > +};
>> > +
>> > +#define bpf_tc_opts__last_field replace
>> > +
>> > +struct bpf_tc_ctx;
>> > +
>> > +struct bpf_tc_ctx_opts {
>> > +	size_t sz;
>> > +};
>> > +
>> > +#define bpf_tc_ctx_opts__last_field sz
>> > +
>> > +/* Requirements */
>> > +/*
>> > + * @ifindex: Must be > 0.
>> > + * @parent: Must be one of the enum constants < _BPF_TC_PARENT_MAX
>> > + * @opts: Can be NULL, currently no options are supported.
>> > + */
>>
>> Up to Andrii, but we don't have such API doc in general inside libbpf.h, I
>> would drop it for the time being to be consistent with the rest (same for
>> others below).
>>
>
> I think we need to keep this somewhere. We dropped bpf_tc_info since it wasn't
> really serving any purpose, but that meant we would put the only extra thing it
> returned (prog_id) into bpf_tc_opts. That means we also need to take care that
> it is unset (along with replace) in functions where it isn't used, to allow for
> reuse for some future purpose. If we don't document that the user needs to unset
> them whenever working with bpf_tc_query and bpf_tc_detach, how are they supposed
> to know?
>
> Maybe a man page and/or a blog post would be better? Just putting it above the
> function seemed best for now.

You could document it together with the struct definition instead of for
each function? See the inline comments in bpf_object_open_opts for instance...

-Toke


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API
  2021-04-27 15:04   ` Daniel Borkmann
  2021-04-27 18:00     ` Toke Høiland-Jørgensen
  2021-04-27 18:02     ` Kumar Kartikeya Dwivedi
@ 2021-04-27 20:36     ` Andrii Nakryiko
  2 siblings, 0 replies; 18+ messages in thread
From: Andrii Nakryiko @ 2021-04-27 20:36 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kumar Kartikeya Dwivedi, bpf, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Networking

On Tue, Apr 27, 2021 at 8:04 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 4/23/21 5:05 PM, Kumar Kartikeya Dwivedi wrote:
> [...]
> >   tools/lib/bpf/libbpf.h   |  92 ++++++++
> >   tools/lib/bpf/libbpf.map |   5 +
> >   tools/lib/bpf/netlink.c  | 478 ++++++++++++++++++++++++++++++++++++++-
> >   3 files changed, 574 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > index bec4e6a6e31d..1c717c07b66e 100644
> > --- a/tools/lib/bpf/libbpf.h
> > +++ b/tools/lib/bpf/libbpf.h
> > @@ -775,6 +775,98 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
> >   LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
> >   LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
> >
> > +enum bpf_tc_attach_point {
> > +     BPF_TC_INGRESS,
> > +     BPF_TC_EGRESS,
> > +     BPF_TC_CUSTOM_PARENT,
> > +     _BPF_TC_PARENT_MAX,
>
> I don't think we need to expose _BPF_TC_PARENT_MAX as part of the API, I would drop
> the latter.
>
> > +};
> > +
> > +/* The opts structure is also used to return the created filters attributes
> > + * (e.g. in case the user left them unset). Some of the options that were left
> > + * out default to a reasonable value, documented below.
> > + *
> > + *   protocol - ETH_P_ALL
> > + *   chain index - 0
> > + *   class_id - 0 (can be set by bpf program using skb->tc_classid)
> > + *   bpf_flags - TCA_BPF_FLAG_ACT_DIRECT (direct action mode)
> > + *   bpf_flags_gen - 0
> > + *
> > + *   The user must fulfill documented requirements for each function.
>
> Not sure if this is overly relevant as part of the bpf_tc_opts in here. For the
> 2nd part, I would probably just mention that libbpf internally attaches the bpf
> programs with direct action mode. The hw offload may be future todo, and the other
> bits are little used anyway; mentioning them here, what value does it have to
> libbpf users? I'd rather just drop the 2nd part and/or simplify this paragraph
> just stating that the progs are attached in direct action mode.
>
> > + */
> > +struct bpf_tc_opts {
> > +     size_t sz;
> > +     __u32 handle;
> > +     __u32 parent;
> > +     __u16 priority;
> > +     __u32 prog_id;
> > +     bool replace;
> > +     size_t :0;
> > +};
> > +
> > +#define bpf_tc_opts__last_field replace
> > +
> > +struct bpf_tc_ctx;
> > +
> > +struct bpf_tc_ctx_opts {
> > +     size_t sz;
> > +};
> > +
> > +#define bpf_tc_ctx_opts__last_field sz
> > +
> > +/* Requirements */
> > +/*
> > + * @ifindex: Must be > 0.
> > + * @parent: Must be one of the enum constants < _BPF_TC_PARENT_MAX
> > + * @opts: Can be NULL, currently no options are supported.
> > + */
>
> Up to Andrii, but we don't have such API doc in general inside libbpf.h, I
> would drop it for the time being to be consistent with the rest (same for
> others below).

+1

>
> > +LIBBPF_API struct bpf_tc_ctx *bpf_tc_ctx_init(__u32 ifindex,
>
> nit: in user space s/__u32 ifindex/int ifindex/
>
> > +                                           enum bpf_tc_attach_point parent,
> > +                                           struct bpf_tc_ctx_opts *opts);
>
> Should we enforce opts being NULL or non-NULL here, or drop the arg from here
> for now altogether? (And if later versions of the functions show up this could
> be mapped to the right one?)
>

OPTS_VALID check handles all the cases. Opts are always allowed to be
NULL. If it's not null, all the bytes beyond what current libbpf
version supports should be zero. All that is handled by OPTS_VALID, so
I don't think anything extra needs to be checked.

> > +/*
> > + * @ctx: Can be NULL, if not, must point to a valid object.
> > + *    If the qdisc was attached during ctx_init, it will be deleted if no
> > + *    filters are attached to it.
> > + *    When ctx == NULL, this is a no-op.
> > + */
> > +LIBBPF_API int bpf_tc_ctx_destroy(struct bpf_tc_ctx *ctx);
> > +/*
> > + * @ctx: Cannot be NULL.
> > + * @fd: Must be >= 0.
> > + * @opts: Cannot be NULL, prog_id must be unset, all other fields can be
> > + *     optionally set. All fields except replace  will be set as per created
> > + *        filter's attributes. parent must only be set when attach_point of ctx is
> > + *        BPF_TC_CUSTOM_PARENT, otherwise parent must be unset.
> > + *
> > + * Fills the following fields in opts:
> > + *   handle
> > + *   parent
> > + *   priority
> > + *   prog_id
> > + */
> > +LIBBPF_API int bpf_tc_attach(struct bpf_tc_ctx *ctx, int fd,
> > +                          struct bpf_tc_opts *opts);
> > +/*
> > + * @ctx: Cannot be NULL.
> > + * @opts: Cannot be NULL, replace and prog_id must be unset, all other fields
> > + *     must be set.
> > + */
> > +LIBBPF_API int bpf_tc_detach(struct bpf_tc_ctx *ctx,
> > +                          const struct bpf_tc_opts *opts);
>
> One thing that I find a bit odd from this API is that BPF_TC_INGRESS / BPF_TC_EGRESS
> needs to be set each time via bpf_tc_ctx_init(). So whenever a specific program would
> be attached to both we need to 're-init' in between just to change from hook a to b,
> whereas when you have BPF_TC_CUSTOM_PARENT, you could just use a different opts->parent
> without going this detour (unless the clsact wasn't loaded there in the first place).
>
> Could we add a BPF_TC_UNSPEC to enum bpf_tc_attach_point, which the user would pass to
> bpf_tc_ctx_init(), so that opts.direction = BPF_TC_INGRESS with subsequent bpf_tc_attach()
> can be called, and same opts.direction = BPF_TC_EGRESS with bpf_tc_attach() for different
> fd. The only thing we cared about in bpf_tc_ctx_init() resp. the ctx was that qdisc was
> ready.
>
> > +/*
> > + * @ctx: Cannot be NULL.
> > + * @opts: Cannot be NULL, replace and prog_id must be unset, all other fields
> > + *     must be set.
> > + *
> > + * Fills the following fields in opts:
> > + *   handle
> > + *   parent
> > + *   priority
> > + *   prog_id
> > + */
> > +LIBBPF_API int bpf_tc_query(struct bpf_tc_ctx *ctx,
> > +                         struct bpf_tc_opts *opts);
> > +
> >   #ifdef __cplusplus
> >   } /* extern "C" */
> >   #endif

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v4 3/3] libbpf: add selftests for TC-BPF API
  2021-04-23 15:06 ` [PATCH bpf-next v4 3/3] libbpf: add selftests for " Kumar Kartikeya Dwivedi
@ 2021-04-27 21:50   ` Andrii Nakryiko
  0 siblings, 0 replies; 18+ messages in thread
From: Andrii Nakryiko @ 2021-04-27 21:50 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Networking

On Fri, Apr 23, 2021 at 8:06 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> This adds some basic tests for the low level bpf_tc_* API.
>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  .../testing/selftests/bpf/prog_tests/tc_bpf.c | 204 ++++++++++++++++++
>  .../testing/selftests/bpf/progs/test_tc_bpf.c |  12 ++
>  2 files changed, 216 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_bpf.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf.c
>

[...]

> +
> +void test_tc_bpf(void)
> +{
> +       struct bpf_tc_ctx *ctx_ing = NULL, *ctx_eg = NULL;
> +       struct test_tc_bpf *skel = NULL;
> +       int cls_fd, ret;
> +
> +       skel = test_tc_bpf__open_and_load();
> +       if (!ASSERT_NEQ(skel, NULL, "test_tc_bpf skeleton"))
> +               goto end;
> +
> +       cls_fd = bpf_program__fd(skel->progs.cls);
> +
> +       ctx_ing = bpf_tc_ctx_init(LO_IFINDEX, BPF_TC_INGRESS, NULL);
> +       if (!ASSERT_NEQ(ctx_ing, NULL, "bpf_tc_ctx_init(BPF_TC_INGRESS)"))

please use ASSERT_OK_PTR() and ASSERT_ERR_PTR() for pointer checks.
They handle both NULL/non-NULL cases and ERR_PTR() errors.

> +               goto end;
> +
> +       ctx_eg = bpf_tc_ctx_init(LO_IFINDEX, BPF_TC_EGRESS, NULL);
> +       if (!ASSERT_NEQ(ctx_eg, NULL, "bpf_tc_ctx_init(BPF_TC_EGRESS)"))
> +               goto end;
> +
> +       ret = test_tc_internal(ctx_ing, cls_fd, BPF_TC_INGRESS);
> +       if (!ASSERT_EQ(ret, 0, "test_tc_internal ingress"))
> +               goto end;
> +
> +       ret = test_tc_internal(ctx_eg, cls_fd, BPF_TC_EGRESS);
> +       if (!ASSERT_EQ(ret, 0, "test_tc_internal egress"))
> +               goto end;
> +
> +       ret = test_tc_invalid(ctx_ing, cls_fd);
> +       if (!ASSERT_EQ(ret, 0, "test_tc_invalid"))
> +               goto end;
> +
> +end:
> +       bpf_tc_ctx_destroy(ctx_eg);
> +       bpf_tc_ctx_destroy(ctx_ing);
> +       test_tc_bpf__destroy(skel);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_tc_bpf.c b/tools/testing/selftests/bpf/progs/test_tc_bpf.c
> new file mode 100644
> index 000000000000..18a3a7ed924a
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_tc_bpf.c
> @@ -0,0 +1,12 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +
> +/* Dummy prog to test TC-BPF API */
> +
> +SEC("classifier")
> +int cls(struct __sk_buff *skb)
> +{
> +       return 0;
> +}
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API
  2021-04-27 18:02     ` Kumar Kartikeya Dwivedi
  2021-04-27 18:15       ` Toke Høiland-Jørgensen
@ 2021-04-27 21:55       ` Daniel Borkmann
  2021-04-27 22:05         ` Daniel Borkmann
  1 sibling, 1 reply; 18+ messages in thread
From: Daniel Borkmann @ 2021-04-27 21:55 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev

On 4/27/21 8:02 PM, Kumar Kartikeya Dwivedi wrote:
> On Tue, Apr 27, 2021 at 08:34:30PM IST, Daniel Borkmann wrote:
>> On 4/23/21 5:05 PM, Kumar Kartikeya Dwivedi wrote:
[...]
>>> +/*
>>> + * @ctx: Can be NULL, if not, must point to a valid object.
>>> + *	 If the qdisc was attached during ctx_init, it will be deleted if no
>>> + *	 filters are attached to it.
>>> + *	 When ctx == NULL, this is a no-op.
>>> + */
>>> +LIBBPF_API int bpf_tc_ctx_destroy(struct bpf_tc_ctx *ctx);
>>> +/*
>>> + * @ctx: Cannot be NULL.
>>> + * @fd: Must be >= 0.
>>> + * @opts: Cannot be NULL, prog_id must be unset, all other fields can be
>>> + *	  optionally set. All fields except replace  will be set as per created
>>> + *        filter's attributes. parent must only be set when attach_point of ctx is
>>> + *        BPF_TC_CUSTOM_PARENT, otherwise parent must be unset.
>>> + *
>>> + * Fills the following fields in opts:
>>> + *	handle
>>> + *	parent
>>> + *	priority
>>> + *	prog_id
>>> + */
>>> +LIBBPF_API int bpf_tc_attach(struct bpf_tc_ctx *ctx, int fd,
>>> +			     struct bpf_tc_opts *opts);
>>> +/*
>>> + * @ctx: Cannot be NULL.
>>> + * @opts: Cannot be NULL, replace and prog_id must be unset, all other fields
>>> + *	  must be set.
>>> + */
>>> +LIBBPF_API int bpf_tc_detach(struct bpf_tc_ctx *ctx,
>>> +			     const struct bpf_tc_opts *opts);
>>
>> One thing that I find a bit odd from this API is that BPF_TC_INGRESS / BPF_TC_EGRESS
>> needs to be set each time via bpf_tc_ctx_init(). So whenever a specific program would
>> be attached to both we need to 're-init' in between just to change from hook a to b,
>> whereas when you have BPF_TC_CUSTOM_PARENT, you could just use a different opts->parent
>> without going this detour (unless the clsact wasn't loaded there in the first place).
> 
> Currently I check that opts->parent is unset when BPF_TC_INGRESS or BPF_TC_EGRESS
> is set as attach point. But since both map to clsact, we could allow the user to
> specify opts->parent as BPF_TC_INGRESS or BPF_TC_EGRESS (no need to use
> TC_H_MAKE, we can detect it from ctx->parent that it won't be a parent id). This
> would mean that by default attach point is what you set for ctx, but for
> bpf_tc_attach you can temporarily override to be some other attach point (for
> the same qdisc). You still won't be able to set anything other than the two
> though.

I think the assumption on auto-detecting the parent id in that case might not hold given
major number could very well be 0. Wrt BPF_TC_UNSPEC ... maybe it's not even needed, back
to drawing board ...

Here's how the whole API could look like, usage examples below:

   enum bpf_tc_attach_point {
	BPF_TC_INGRESS = 1 << 0,
	BPF_TC_EGRESS  = 1 << 1,
	BPF_TC_CUSTOM  = 1 << 2,
   };

   enum bpf_tc_attach_flags {
	BPF_TC_F_REPLACE = 1 << 0,
   };

   struct bpf_tc_hook {
	size_t sz;
	int    ifindex;
	enum bpf_tc_attach_point which;
	__u32  parent;
	size_t :0;
   };

   struct bpf_tc_opts {
	size_t sz;
	__u32  handle;
	__u16  priority;
	union {
		int   prog_fd;
		__u32 prog_id;
	};
	size_t :0;
   };

   LIBBPF_API int bpf_tc_hook_create(struct bpf_tc_hook *hook);
   LIBBPF_API int bpf_tc_hook_destroy(struct bpf_tc_hook *hook);

   LIBBPF_API int bpf_tc_attach(const struct bpf_tc_hook *hook, const struct bpf_tc_opts *opts, int flags);
   LIBBPF_API int bpf_tc_detach(const struct bpf_tc_hook *hook, const struct bpf_tc_opts *opts);
   LIBBPF_API int bpf_tc_query(const struct bpf_tc_hook *hook, struct bpf_tc_opts *opts);

So a user could do just:

   DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS);
   DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1, .prog_fd = fd);

   err = bpf_tc_attach(&hook, &opts, BPF_TC_F_REPLACE);
   [...]

If it's not known whether the hook exists, then a preceding call to:

   err = bpf_tc_hook_create(&hook);
   [...]

The bpf_tc_query() would look like:

   DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_EGRESS);
   DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1);

   err = bpf_tc_query(&hook, &opts);
   if (!err) {
          [...]  // gives access to: opts.prog_id
   }

The bpf_tc_detach():

   DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS);
   DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1);

   err = bpf_tc_detach(&hook, &opts);
   [...]

The nice thing would be that hook and opts are kept semantically separate, meaning with
hook you can iterate though a bunch of devs and ingress/egress locations without changing
opts, whereas with opts you could iterate on the cls_bpf instance itself w/o changing
hook. Both are kept extensible via DECLARE_LIBBPF_OPTS().

Now the bpf_tc_hook_destroy() one:

   DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS|BPF_TC_EGRESS);

   err = bpf_tc_hook_destroy(&hook, &opts);
   [...]

For triggering a remove of the clsact qdisc on the device, both directions are passed in.
Combining both is only ever allowed for bpf_tc_hook_destroy().

If /only/ BPF_TC_INGRESS or only BPF_TC_EGRESS is passed, it could flush their lists (aka
equivalent of `tc filter del dev eth0 ingress` and `tc filter del dev eth0 egress` command).

For bpf_tc_hook_{create,destroy}() with BPF_TC_CUSTOM, we just return -EINVAL or -EOPNOTSUPP.

I think the above interface would work nicely and feels intuitive while being extensible.
Thoughts?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API
  2021-04-27 21:55       ` Daniel Borkmann
@ 2021-04-27 22:05         ` Daniel Borkmann
  2021-04-27 22:32           ` Daniel Borkmann
  2021-04-27 22:36           ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 18+ messages in thread
From: Daniel Borkmann @ 2021-04-27 22:05 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev

On 4/27/21 11:55 PM, Daniel Borkmann wrote:
> On 4/27/21 8:02 PM, Kumar Kartikeya Dwivedi wrote:
>> On Tue, Apr 27, 2021 at 08:34:30PM IST, Daniel Borkmann wrote:
>>> On 4/23/21 5:05 PM, Kumar Kartikeya Dwivedi wrote:
> [...]
>>>> +/*
>>>> + * @ctx: Can be NULL, if not, must point to a valid object.
>>>> + *     If the qdisc was attached during ctx_init, it will be deleted if no
>>>> + *     filters are attached to it.
>>>> + *     When ctx == NULL, this is a no-op.
>>>> + */
>>>> +LIBBPF_API int bpf_tc_ctx_destroy(struct bpf_tc_ctx *ctx);
>>>> +/*
>>>> + * @ctx: Cannot be NULL.
>>>> + * @fd: Must be >= 0.
>>>> + * @opts: Cannot be NULL, prog_id must be unset, all other fields can be
>>>> + *      optionally set. All fields except replace  will be set as per created
>>>> + *        filter's attributes. parent must only be set when attach_point of ctx is
>>>> + *        BPF_TC_CUSTOM_PARENT, otherwise parent must be unset.
>>>> + *
>>>> + * Fills the following fields in opts:
>>>> + *    handle
>>>> + *    parent
>>>> + *    priority
>>>> + *    prog_id
>>>> + */
>>>> +LIBBPF_API int bpf_tc_attach(struct bpf_tc_ctx *ctx, int fd,
>>>> +                 struct bpf_tc_opts *opts);
>>>> +/*
>>>> + * @ctx: Cannot be NULL.
>>>> + * @opts: Cannot be NULL, replace and prog_id must be unset, all other fields
>>>> + *      must be set.
>>>> + */
>>>> +LIBBPF_API int bpf_tc_detach(struct bpf_tc_ctx *ctx,
>>>> +                 const struct bpf_tc_opts *opts);
>>>
>>> One thing that I find a bit odd from this API is that BPF_TC_INGRESS / BPF_TC_EGRESS
>>> needs to be set each time via bpf_tc_ctx_init(). So whenever a specific program would
>>> be attached to both we need to 're-init' in between just to change from hook a to b,
>>> whereas when you have BPF_TC_CUSTOM_PARENT, you could just use a different opts->parent
>>> without going this detour (unless the clsact wasn't loaded there in the first place).
>>
>> Currently I check that opts->parent is unset when BPF_TC_INGRESS or BPF_TC_EGRESS
>> is set as attach point. But since both map to clsact, we could allow the user to
>> specify opts->parent as BPF_TC_INGRESS or BPF_TC_EGRESS (no need to use
>> TC_H_MAKE, we can detect it from ctx->parent that it won't be a parent id). This
>> would mean that by default attach point is what you set for ctx, but for
>> bpf_tc_attach you can temporarily override to be some other attach point (for
>> the same qdisc). You still won't be able to set anything other than the two
>> though.
> 
> I think the assumption on auto-detecting the parent id in that case might not hold given
> major number could very well be 0. Wrt BPF_TC_UNSPEC ... maybe it's not even needed, back
> to drawing board ...
> 
> Here's how the whole API could look like, usage examples below:
> 
>    enum bpf_tc_attach_point {
>      BPF_TC_INGRESS = 1 << 0,
>      BPF_TC_EGRESS  = 1 << 1,
>      BPF_TC_CUSTOM  = 1 << 2,
>    };
> 
>    enum bpf_tc_attach_flags {
>      BPF_TC_F_REPLACE = 1 << 0,
>    };
> 
>    struct bpf_tc_hook {
>      size_t sz;
>      int    ifindex;
>      enum bpf_tc_attach_point which;
>      __u32  parent;
>      size_t :0;
>    };
> 
>    struct bpf_tc_opts {
>      size_t sz;
>      __u32  handle;
>      __u16  priority;
>      union {
>          int   prog_fd;
>          __u32 prog_id;
>      };
>      size_t :0;
>    };
> 
>    LIBBPF_API int bpf_tc_hook_create(struct bpf_tc_hook *hook);
>    LIBBPF_API int bpf_tc_hook_destroy(struct bpf_tc_hook *hook);
> 
>    LIBBPF_API int bpf_tc_attach(const struct bpf_tc_hook *hook, const struct bpf_tc_opts *opts, int flags);
>    LIBBPF_API int bpf_tc_detach(const struct bpf_tc_hook *hook, const struct bpf_tc_opts *opts);
>    LIBBPF_API int bpf_tc_query(const struct bpf_tc_hook *hook, struct bpf_tc_opts *opts);
> 
> So a user could do just:
> 
>    DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS);
>    DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1, .prog_fd = fd);
> 
>    err = bpf_tc_attach(&hook, &opts, BPF_TC_F_REPLACE);
>    [...]
> 
> If it's not known whether the hook exists, then a preceding call to:
> 
>    err = bpf_tc_hook_create(&hook);
>    [...]
> 
> The bpf_tc_query() would look like:
> 
>    DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_EGRESS);
>    DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1);
> 
>    err = bpf_tc_query(&hook, &opts);
>    if (!err) {
>           [...]  // gives access to: opts.prog_id
>    }
> 
> The bpf_tc_detach():
> 
>    DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS);
>    DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1);
> 
>    err = bpf_tc_detach(&hook, &opts);
>    [...]
> 
> The nice thing would be that hook and opts are kept semantically separate, meaning with
> hook you can iterate though a bunch of devs and ingress/egress locations without changing
> opts, whereas with opts you could iterate on the cls_bpf instance itself w/o changing
> hook. Both are kept extensible via DECLARE_LIBBPF_OPTS().
> 
> Now the bpf_tc_hook_destroy() one:
> 
>    DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS|BPF_TC_EGRESS);
> 
>    err = bpf_tc_hook_destroy(&hook);
>    [...]
> 
> For triggering a remove of the clsact qdisc on the device, both directions are passed in.
> Combining both is only ever allowed for bpf_tc_hook_destroy().

Small addendum:

     DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS|BPF_TC_EGRESS);

     err = bpf_tc_hook_create(&hook);
     [...]

... is also possible, of course, and then both bpf_tc_hook_{create,destroy}() are symmetric.

> If /only/ BPF_TC_INGRESS or only BPF_TC_EGRESS is passed, it could flush their lists (aka
> equivalent of `tc filter del dev eth0 ingress` and `tc filter del dev eth0 egress` command).
> 
> For bpf_tc_hook_{create,destroy}() with BPF_TC_CUSTOM, we just return -EINVAL or -EOPNOTSUPP.
> 
> I think the above interface would work nicely and feels intuitive while being extensible.
> Thoughts?
> 
> Thanks,
> Daniel


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API
  2021-04-27 22:05         ` Daniel Borkmann
@ 2021-04-27 22:32           ` Daniel Borkmann
  2021-04-27 22:36           ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 18+ messages in thread
From: Daniel Borkmann @ 2021-04-27 22:32 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev

On 4/28/21 12:05 AM, Daniel Borkmann wrote:
> On 4/27/21 11:55 PM, Daniel Borkmann wrote:
>> On 4/27/21 8:02 PM, Kumar Kartikeya Dwivedi wrote:
>>> On Tue, Apr 27, 2021 at 08:34:30PM IST, Daniel Borkmann wrote:
>>>> On 4/23/21 5:05 PM, Kumar Kartikeya Dwivedi wrote:
>> [...]
>>>>> +/*
>>>>> + * @ctx: Can be NULL, if not, must point to a valid object.
>>>>> + *     If the qdisc was attached during ctx_init, it will be deleted if no
>>>>> + *     filters are attached to it.
>>>>> + *     When ctx == NULL, this is a no-op.
>>>>> + */
>>>>> +LIBBPF_API int bpf_tc_ctx_destroy(struct bpf_tc_ctx *ctx);
>>>>> +/*
>>>>> + * @ctx: Cannot be NULL.
>>>>> + * @fd: Must be >= 0.
>>>>> + * @opts: Cannot be NULL, prog_id must be unset, all other fields can be
>>>>> + *      optionally set. All fields except replace  will be set as per created
>>>>> + *        filter's attributes. parent must only be set when attach_point of ctx is
>>>>> + *        BPF_TC_CUSTOM_PARENT, otherwise parent must be unset.
>>>>> + *
>>>>> + * Fills the following fields in opts:
>>>>> + *    handle
>>>>> + *    parent
>>>>> + *    priority
>>>>> + *    prog_id
>>>>> + */
>>>>> +LIBBPF_API int bpf_tc_attach(struct bpf_tc_ctx *ctx, int fd,
>>>>> +                 struct bpf_tc_opts *opts);
>>>>> +/*
>>>>> + * @ctx: Cannot be NULL.
>>>>> + * @opts: Cannot be NULL, replace and prog_id must be unset, all other fields
>>>>> + *      must be set.
>>>>> + */
>>>>> +LIBBPF_API int bpf_tc_detach(struct bpf_tc_ctx *ctx,
>>>>> +                 const struct bpf_tc_opts *opts);
>>>>
>>>> One thing that I find a bit odd from this API is that BPF_TC_INGRESS / BPF_TC_EGRESS
>>>> needs to be set each time via bpf_tc_ctx_init(). So whenever a specific program would
>>>> be attached to both we need to 're-init' in between just to change from hook a to b,
>>>> whereas when you have BPF_TC_CUSTOM_PARENT, you could just use a different opts->parent
>>>> without going this detour (unless the clsact wasn't loaded there in the first place).
>>>
>>> Currently I check that opts->parent is unset when BPF_TC_INGRESS or BPF_TC_EGRESS
>>> is set as attach point. But since both map to clsact, we could allow the user to
>>> specify opts->parent as BPF_TC_INGRESS or BPF_TC_EGRESS (no need to use
>>> TC_H_MAKE, we can detect it from ctx->parent that it won't be a parent id). This
>>> would mean that by default attach point is what you set for ctx, but for
>>> bpf_tc_attach you can temporarily override to be some other attach point (for
>>> the same qdisc). You still won't be able to set anything other than the two
>>> though.
>>
>> I think the assumption on auto-detecting the parent id in that case might not hold given
>> major number could very well be 0. Wrt BPF_TC_UNSPEC ... maybe it's not even needed, back
>> to drawing board ...
>>
>> Here's how the whole API could look like, usage examples below:

And one last follow-up thought:

>>    enum bpf_tc_attach_point {
>>      BPF_TC_INGRESS = 1 << 0,
>>      BPF_TC_EGRESS  = 1 << 1,
>>      BPF_TC_CUSTOM  = 1 << 2,
>>    };
>>
>>    enum bpf_tc_attach_flags {
>>      BPF_TC_F_REPLACE = 1 << 0,
>>    };
>>
>>    struct bpf_tc_hook {
>>      size_t sz;
>>      int    ifindex;
>>      enum bpf_tc_attach_point which;
>>      __u32  parent;
>>      size_t :0;
>>    };
>>
>>    struct bpf_tc_opts {
>>      size_t sz;
>>      __u32  handle;
>>      __u16  priority;

To avoid gaps, priority could be __u32 as well, but we need to enforce [0, u16max].

>>      union {
>>          int   prog_fd;
>>          __u32 prog_id;
>>      };

We should rather remove the union to make it less error-prone.

>>      size_t :0;
>>    };
>>
>>    LIBBPF_API int bpf_tc_hook_create(struct bpf_tc_hook *hook);
>>    LIBBPF_API int bpf_tc_hook_destroy(struct bpf_tc_hook *hook);
>>
>>    LIBBPF_API int bpf_tc_attach(const struct bpf_tc_hook *hook, const struct bpf_tc_opts *opts, int flags);

Changing s/const struct bpf_tc_opts *opts/struct bpf_tc_opts *opts/ so that if a user did:

     DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS);
     DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = fd);

     err = bpf_tc_attach(&hook, &opts, 0);
     [...]

Then we'd rely on the kernel (cls_bpf) to auto-allocate handle/prio. libbpf in that case
would populate opts.handle and opts.priority upon success, which can then later be used
again for bpf_tc_detach() / bpf_tc_query() calls.

>>    LIBBPF_API int bpf_tc_detach(const struct bpf_tc_hook *hook, const struct bpf_tc_opts *opts);
>>    LIBBPF_API int bpf_tc_query(const struct bpf_tc_hook *hook, struct bpf_tc_opts *opts);
>>
>> So a user could do just:
>>
>>    DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS);
>>    DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1, .prog_fd = fd);
>>
>>    err = bpf_tc_attach(&hook, &opts, BPF_TC_F_REPLACE);
>>    [...]
>>
>> If it's not known whether the hook exists, then a preceding call to:
>>
>>    err = bpf_tc_hook_create(&hook);
>>    [...]
>>
>> The bpf_tc_query() would look like:
>>
>>    DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_EGRESS);
>>    DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1);
>>
>>    err = bpf_tc_query(&hook, &opts);
>>    if (!err) {
>>           [...]  // gives access to: opts.prog_id
>>    }
>>
>> The bpf_tc_detach():
>>
>>    DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS);
>>    DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1);
>>
>>    err = bpf_tc_detach(&hook, &opts);
>>    [...]
>>
>> The nice thing would be that hook and opts are kept semantically separate, meaning with
>> hook you can iterate though a bunch of devs and ingress/egress locations without changing
>> opts, whereas with opts you could iterate on the cls_bpf instance itself w/o changing
>> hook. Both are kept extensible via DECLARE_LIBBPF_OPTS().
>>
>> Now the bpf_tc_hook_destroy() one:
>>
>>    DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS|BPF_TC_EGRESS);
>>
>>    err = bpf_tc_hook_destroy(&hook);
>>    [...]
>>
>> For triggering a remove of the clsact qdisc on the device, both directions are passed in.
>> Combining both is only ever allowed for bpf_tc_hook_destroy().
> 
> Small addendum:
> 
>      DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS|BPF_TC_EGRESS);
> 
>      err = bpf_tc_hook_create(&hook);
>      [...]
> 
> ... is also possible, of course, and then both bpf_tc_hook_{create,destroy}() are symmetric.
> 
>> If /only/ BPF_TC_INGRESS or only BPF_TC_EGRESS is passed, it could flush their lists (aka
>> equivalent of `tc filter del dev eth0 ingress` and `tc filter del dev eth0 egress` command).
>>
>> For bpf_tc_hook_{create,destroy}() with BPF_TC_CUSTOM, we just return -EINVAL or -EOPNOTSUPP.
>>
>> I think the above interface would work nicely and feels intuitive while being extensible.
>> Thoughts?
>>
>> Thanks,
>> Daniel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API
  2021-04-27 22:05         ` Daniel Borkmann
  2021-04-27 22:32           ` Daniel Borkmann
@ 2021-04-27 22:36           ` Toke Høiland-Jørgensen
  2021-04-27 22:40             ` Daniel Borkmann
  1 sibling, 1 reply; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-04-27 22:36 UTC (permalink / raw)
  To: Daniel Borkmann, Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, netdev

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 4/27/21 11:55 PM, Daniel Borkmann wrote:
>> On 4/27/21 8:02 PM, Kumar Kartikeya Dwivedi wrote:
>>> On Tue, Apr 27, 2021 at 08:34:30PM IST, Daniel Borkmann wrote:
>>>> On 4/23/21 5:05 PM, Kumar Kartikeya Dwivedi wrote:
>> [...]
>>>>> +/*
>>>>> + * @ctx: Can be NULL, if not, must point to a valid object.
>>>>> + *     If the qdisc was attached during ctx_init, it will be deleted if no
>>>>> + *     filters are attached to it.
>>>>> + *     When ctx == NULL, this is a no-op.
>>>>> + */
>>>>> +LIBBPF_API int bpf_tc_ctx_destroy(struct bpf_tc_ctx *ctx);
>>>>> +/*
>>>>> + * @ctx: Cannot be NULL.
>>>>> + * @fd: Must be >= 0.
>>>>> + * @opts: Cannot be NULL, prog_id must be unset, all other fields can be
>>>>> + *      optionally set. All fields except replace  will be set as per created
>>>>> + *        filter's attributes. parent must only be set when attach_point of ctx is
>>>>> + *        BPF_TC_CUSTOM_PARENT, otherwise parent must be unset.
>>>>> + *
>>>>> + * Fills the following fields in opts:
>>>>> + *    handle
>>>>> + *    parent
>>>>> + *    priority
>>>>> + *    prog_id
>>>>> + */
>>>>> +LIBBPF_API int bpf_tc_attach(struct bpf_tc_ctx *ctx, int fd,
>>>>> +                 struct bpf_tc_opts *opts);
>>>>> +/*
>>>>> + * @ctx: Cannot be NULL.
>>>>> + * @opts: Cannot be NULL, replace and prog_id must be unset, all other fields
>>>>> + *      must be set.
>>>>> + */
>>>>> +LIBBPF_API int bpf_tc_detach(struct bpf_tc_ctx *ctx,
>>>>> +                 const struct bpf_tc_opts *opts);
>>>>
>>>> One thing that I find a bit odd from this API is that BPF_TC_INGRESS / BPF_TC_EGRESS
>>>> needs to be set each time via bpf_tc_ctx_init(). So whenever a specific program would
>>>> be attached to both we need to 're-init' in between just to change from hook a to b,
>>>> whereas when you have BPF_TC_CUSTOM_PARENT, you could just use a different opts->parent
>>>> without going this detour (unless the clsact wasn't loaded there in the first place).
>>>
>>> Currently I check that opts->parent is unset when BPF_TC_INGRESS or BPF_TC_EGRESS
>>> is set as attach point. But since both map to clsact, we could allow the user to
>>> specify opts->parent as BPF_TC_INGRESS or BPF_TC_EGRESS (no need to use
>>> TC_H_MAKE, we can detect it from ctx->parent that it won't be a parent id). This
>>> would mean that by default attach point is what you set for ctx, but for
>>> bpf_tc_attach you can temporarily override to be some other attach point (for
>>> the same qdisc). You still won't be able to set anything other than the two
>>> though.
>> 
>> I think the assumption on auto-detecting the parent id in that case might not hold given
>> major number could very well be 0. Wrt BPF_TC_UNSPEC ... maybe it's not even needed, back
>> to drawing board ...
>> 
>> Here's how the whole API could look like, usage examples below:
>> 
>>    enum bpf_tc_attach_point {
>>      BPF_TC_INGRESS = 1 << 0,
>>      BPF_TC_EGRESS  = 1 << 1,
>>      BPF_TC_CUSTOM  = 1 << 2,
>>    };
>> 
>>    enum bpf_tc_attach_flags {
>>      BPF_TC_F_REPLACE = 1 << 0,
>>    };
>> 
>>    struct bpf_tc_hook {
>>      size_t sz;
>>      int    ifindex;
>>      enum bpf_tc_attach_point which;
>>      __u32  parent;
>>      size_t :0;
>>    };
>> 
>>    struct bpf_tc_opts {
>>      size_t sz;
>>      __u32  handle;
>>      __u16  priority;
>>      union {
>>          int   prog_fd;
>>          __u32 prog_id;
>>      };
>>      size_t :0;
>>    };
>> 
>>    LIBBPF_API int bpf_tc_hook_create(struct bpf_tc_hook *hook);
>>    LIBBPF_API int bpf_tc_hook_destroy(struct bpf_tc_hook *hook);
>> 
>>    LIBBPF_API int bpf_tc_attach(const struct bpf_tc_hook *hook, const struct bpf_tc_opts *opts, int flags);
>>    LIBBPF_API int bpf_tc_detach(const struct bpf_tc_hook *hook, const struct bpf_tc_opts *opts);
>>    LIBBPF_API int bpf_tc_query(const struct bpf_tc_hook *hook, struct bpf_tc_opts *opts);
>> 
>> So a user could do just:
>> 
>>    DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS);
>>    DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1, .prog_fd = fd);
>> 
>>    err = bpf_tc_attach(&hook, &opts, BPF_TC_F_REPLACE);
>>    [...]
>> 
>> If it's not known whether the hook exists, then a preceding call to:
>> 
>>    err = bpf_tc_hook_create(&hook);
>>    [...]
>> 
>> The bpf_tc_query() would look like:
>> 
>>    DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_EGRESS);
>>    DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1);
>> 
>>    err = bpf_tc_query(&hook, &opts);
>>    if (!err) {
>>           [...]  // gives access to: opts.prog_id
>>    }
>> 
>> The bpf_tc_detach():
>> 
>>    DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS);
>>    DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1);
>> 
>>    err = bpf_tc_detach(&hook, &opts);
>>    [...]
>> 
>> The nice thing would be that hook and opts are kept semantically separate, meaning with
>> hook you can iterate though a bunch of devs and ingress/egress locations without changing
>> opts, whereas with opts you could iterate on the cls_bpf instance itself w/o changing
>> hook. Both are kept extensible via DECLARE_LIBBPF_OPTS().
>> 
>> Now the bpf_tc_hook_destroy() one:
>> 
>>    DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS|BPF_TC_EGRESS);
>> 
>>    err = bpf_tc_hook_destroy(&hook);
>>    [...]
>> 
>> For triggering a remove of the clsact qdisc on the device, both directions are passed in.
>> Combining both is only ever allowed for bpf_tc_hook_destroy().
>
> Small addendum:
>
>      DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS|BPF_TC_EGRESS);
>
>      err = bpf_tc_hook_create(&hook);
>      [...]
>
> ... is also possible, of course, and then both bpf_tc_hook_{create,destroy}() are symmetric.

It should be allowed, but it wouldn't actually make any difference which
combination of TC_INGRESS and TC_EGRESS you specify, as long as one of
them is set, right? I.e., we just attach the clsact qdisc in both
cases...

-Toke


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API
  2021-04-27 22:36           ` Toke Høiland-Jørgensen
@ 2021-04-27 22:40             ` Daniel Borkmann
  2021-04-27 22:51               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Borkmann @ 2021-04-27 22:40 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, netdev

On 4/28/21 12:36 AM, Toke Høiland-Jørgensen wrote:
> Daniel Borkmann <daniel@iogearbox.net> writes:
[...]
>> Small addendum:
>>
>>       DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS|BPF_TC_EGRESS);
>>
>>       err = bpf_tc_hook_create(&hook);
>>       [...]
>>
>> ... is also possible, of course, and then both bpf_tc_hook_{create,destroy}() are symmetric.
> 
> It should be allowed, but it wouldn't actually make any difference which
> combination of TC_INGRESS and TC_EGRESS you specify, as long as one of
> them is set, right? I.e., we just attach the clsact qdisc in both
> cases...

Yes, that is correct, for the bpf_tc_hook_create() whether you pass in BPF_TC_INGRESS,
BPF_TC_EGRESS or BPF_TC_INGRESS|BPF_TC_EGRESS, you'll end up creating clsact qdisc in
either of the three cases. Only the bpf_tc_hook_destroy() differs between all of them.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API
  2021-04-27 22:40             ` Daniel Borkmann
@ 2021-04-27 22:51               ` Toke Høiland-Jørgensen
  2021-04-27 23:14                 ` Daniel Borkmann
  0 siblings, 1 reply; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-04-27 22:51 UTC (permalink / raw)
  To: Daniel Borkmann, Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, netdev

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 4/28/21 12:36 AM, Toke Høiland-Jørgensen wrote:
>> Daniel Borkmann <daniel@iogearbox.net> writes:
> [...]
>>> Small addendum:
>>>
>>>       DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS|BPF_TC_EGRESS);
>>>
>>>       err = bpf_tc_hook_create(&hook);
>>>       [...]
>>>
>>> ... is also possible, of course, and then both bpf_tc_hook_{create,destroy}() are symmetric.
>> 
>> It should be allowed, but it wouldn't actually make any difference which
>> combination of TC_INGRESS and TC_EGRESS you specify, as long as one of
>> them is set, right? I.e., we just attach the clsact qdisc in both
>> cases...
>
> Yes, that is correct, for the bpf_tc_hook_create() whether you pass in BPF_TC_INGRESS,
> BPF_TC_EGRESS or BPF_TC_INGRESS|BPF_TC_EGRESS, you'll end up creating clsact qdisc in
> either of the three cases. Only the bpf_tc_hook_destroy() differs
> between all of them.

Right, just checking. Other than that, I like your proposal; it loses
the "automatic removal of qdisc if we added it" feature, but that's
probably OK: less magic is good. And as long as bpf_tc_hook_create()
returns EEXIST if the qdisc already exists, the caller can do the same
thing if they want.

-Toke


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API
  2021-04-27 22:51               ` Toke Høiland-Jørgensen
@ 2021-04-27 23:14                 ` Daniel Borkmann
  2021-04-27 23:19                   ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Borkmann @ 2021-04-27 23:14 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, netdev

On 4/28/21 12:51 AM, Toke Høiland-Jørgensen wrote:
> Daniel Borkmann <daniel@iogearbox.net> writes:
>> On 4/28/21 12:36 AM, Toke Høiland-Jørgensen wrote:
>>> Daniel Borkmann <daniel@iogearbox.net> writes:
>> [...]
>>>> Small addendum:
>>>>
>>>>        DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS|BPF_TC_EGRESS);
>>>>
>>>>        err = bpf_tc_hook_create(&hook);
>>>>        [...]
>>>>
>>>> ... is also possible, of course, and then both bpf_tc_hook_{create,destroy}() are symmetric.
>>>
>>> It should be allowed, but it wouldn't actually make any difference which
>>> combination of TC_INGRESS and TC_EGRESS you specify, as long as one of
>>> them is set, right? I.e., we just attach the clsact qdisc in both
>>> cases...
>>
>> Yes, that is correct, for the bpf_tc_hook_create() whether you pass in BPF_TC_INGRESS,
>> BPF_TC_EGRESS or BPF_TC_INGRESS|BPF_TC_EGRESS, you'll end up creating clsact qdisc in
>> either of the three cases. Only the bpf_tc_hook_destroy() differs
>> between all of them.
> 
> Right, just checking. Other than that, I like your proposal; it loses
> the "automatic removal of qdisc if we added it" feature, but that's
> probably OK: less magic is good. And as long as bpf_tc_hook_create()
> returns EEXIST if the qdisc already exists, the caller can do the same
> thing if they want.

Yes exactly. Less magic the better, especially given this has global effect.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API
  2021-04-27 23:14                 ` Daniel Borkmann
@ 2021-04-27 23:19                   ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 18+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-27 23:19 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Toke Høiland-Jørgensen, bpf, Alexei Starovoitov,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev

On Wed, Apr 28, 2021 at 04:44:23AM IST, Daniel Borkmann wrote:
> On 4/28/21 12:51 AM, Toke Høiland-Jørgensen wrote:
> > Daniel Borkmann <daniel@iogearbox.net> writes:
> > > On 4/28/21 12:36 AM, Toke Høiland-Jørgensen wrote:
> > > > Daniel Borkmann <daniel@iogearbox.net> writes:
> > > [...]
> > > > > Small addendum:
> > > > >
> > > > >        DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS|BPF_TC_EGRESS);
> > > > >
> > > > >        err = bpf_tc_hook_create(&hook);
> > > > >        [...]
> > > > >
> > > > > ... is also possible, of course, and then both bpf_tc_hook_{create,destroy}() are symmetric.
> > > >
> > > > It should be allowed, but it wouldn't actually make any difference which
> > > > combination of TC_INGRESS and TC_EGRESS you specify, as long as one of
> > > > them is set, right? I.e., we just attach the clsact qdisc in both
> > > > cases...
> > >
> > > Yes, that is correct, for the bpf_tc_hook_create() whether you pass in BPF_TC_INGRESS,
> > > BPF_TC_EGRESS or BPF_TC_INGRESS|BPF_TC_EGRESS, you'll end up creating clsact qdisc in
> > > either of the three cases. Only the bpf_tc_hook_destroy() differs
> > > between all of them.
> >
> > Right, just checking. Other than that, I like your proposal; it loses
> > the "automatic removal of qdisc if we added it" feature, but that's
> > probably OK: less magic is good. And as long as bpf_tc_hook_create()
> > returns EEXIST if the qdisc already exists, the caller can do the same
> > thing if they want.
>
> Yes exactly. Less magic the better, especially given this has global effect.
>

Everything sounds good. I'll do a resend.

> Thanks,
> Daniel

--
Kartikeya

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-04-27 23:19 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-23 15:05 [PATCH bpf-next v4 0/3] Add TC-BPF API Kumar Kartikeya Dwivedi
2021-04-23 15:05 ` [PATCH bpf-next v4 1/3] libbpf: add helpers for preparing netlink attributes Kumar Kartikeya Dwivedi
2021-04-23 15:05 ` [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
2021-04-27 15:04   ` Daniel Borkmann
2021-04-27 18:00     ` Toke Høiland-Jørgensen
2021-04-27 18:02     ` Kumar Kartikeya Dwivedi
2021-04-27 18:15       ` Toke Høiland-Jørgensen
2021-04-27 21:55       ` Daniel Borkmann
2021-04-27 22:05         ` Daniel Borkmann
2021-04-27 22:32           ` Daniel Borkmann
2021-04-27 22:36           ` Toke Høiland-Jørgensen
2021-04-27 22:40             ` Daniel Borkmann
2021-04-27 22:51               ` Toke Høiland-Jørgensen
2021-04-27 23:14                 ` Daniel Borkmann
2021-04-27 23:19                   ` Kumar Kartikeya Dwivedi
2021-04-27 20:36     ` Andrii Nakryiko
2021-04-23 15:06 ` [PATCH bpf-next v4 3/3] libbpf: add selftests for " Kumar Kartikeya Dwivedi
2021-04-27 21:50   ` Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).