bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v3 0/3] Add TC-BPF API
@ 2021-04-20 19:37 Kumar Kartikeya Dwivedi
  2021-04-20 19:37 ` [PATCH bpf-next v3 1/3] libbpf: add helpers for preparing netlink attributes Kumar Kartikeya Dwivedi
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-20 19:37 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Toke Høiland-Jørgensen, netdev

This is the third version of the TC-BPF series.

It adds a simple API that uses netlink to attach the tc filter and its bpf
classifier. Currently, a user needs to shell out to the tc command line to be
able to create filters and attach SCHED_CLS programs as classifiers. With the
help of this API, it will be possible to use libbpf for doing all parts of bpf
program setup and attach.

Direct action is now always set, and currently no way to disable it has been
provided. This also means that SCHED_ACT programs are a lot less useful, so
support for them has been removed for now. In the future, if someone comes up
with a convincing use case where the direct action mode doesn't serve their
needs, a simple extension that allows disabling direct action mode and passing
the list of actions that would be bound to the classifier can be added.

In an effort to keep discussion focused, this series doesn't have the high level
TC-BPF API. It was clear that there is a need for a bpf_link API in the kernel,
hence that will be submitted as a separate patchset.

The individual commit messages contain more details, and also a brief summary of
the API.

Changelog:
----------
v2 -> v3
v2: https://lore.kernel.org/bpf/20210419121811.117400-1-memxor@gmail.com

 * bpf_tc_cls_* -> bpf_tc_* rename
 * bpf_tc_attach_id now only consists of handle and priority, the two variables
   that user may or may not set.
 * bpf_tc_replace has been dropped, instead a replace bool is introduced in
   bpf_tc_opts for the same purpose.
 * bpf_tc_get_info now takes attach_id for filling in filter details during
   lookup instead of requiring user to do so. This also allows us to remove the
   fd parameter, as no matching is needed as long as we have all attributes
   necessary to identify a specific filter.
 * A little bit of code simplification taking into account the change above.
 * priority and protocol are now __u16 members in user facing API structs to
   reflect actual size.
 * Patch updating pkt_cls.h header has been removed, as it is unused now.
 * protocol and chain_index options have been dropped in bpf_tc_opts,
   protocol is always set to ETH_P_ALL, while chain_index is set as 0 by
   default in the kernel. This also means removal of chain_index from
   bpf_tc_attach_id, as it is unconditionally always 0.
 * bpf_tc_cls_change has been dropped
 * selftest now uses ASSERT_* macros

v1 -> v2
v1: https://lore.kernel.org/bpf/20210325120020.236504-1-memxor@gmail.com

 * netlink helpers have been renamed to object_action style.
 * attach_id now only contains attributes that are not explicitly set. Only
   the bare minimum info is kept in it.
 * protocol is now an optional and always set to ETH_P_ALL.
 * direct-action mode is always set.
 * skip_sw and skip_hw options have also been removed.
 * bpf_tc_cls_info struct now also returns the bpf program tag and id, as
   available in the netlink response. This came up as a requirement during
   discussion with people wanting to use this functionality.
 * support for attaching SCHED_ACT programs has been dropped, as it isn't
   useful without any support for binding loaded actions to a classifier.
 * the distinction between dev and block API has been dropped, there is now
   a single set of functions and user has to pass the special ifindex value
   to indicate operation on a shared filter block on their own.
 * The high level API returning a bpf_link is gone. This was already non-
   functional for pinning and typical ownership semantics. Instead, a separate
   patchset will be sent adding a bpf_link API for attaching SCHED_CLS progs to
   the kernel, and its corresponding libbpf API.
 * The clsact qdisc is now setup automatically in a best-effort fashion whenever
   user passes in the clsact ingress or egress parent id. This is done with
   exclusive mode, such that if an ingress or clsact qdisc is already set up,
   we skip the setup and move on with filter creation.
 * Other minor changes that came up during the course of discussion and rework.

Kumar Kartikeya Dwivedi (3):
  libbpf: add helpers for preparing netlink attributes
  libbpf: add low level TC-BPF API
  libbpf: add selftests for TC-BPF API

 tools/lib/bpf/libbpf.h                        |  44 +++
 tools/lib/bpf/libbpf.map                      |   3 +
 tools/lib/bpf/netlink.c                       | 356 ++++++++++++++++--
 tools/lib/bpf/nlattr.h                        |  48 +++
 .../selftests/bpf/prog_tests/test_tc_bpf.c    | 169 +++++++++
 .../selftests/bpf/progs/test_tc_bpf_kern.c    |  12 +
 6 files changed, 605 insertions(+), 27 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c

--
2.30.2


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH bpf-next v3 1/3] libbpf: add helpers for preparing netlink attributes
  2021-04-20 19:37 [PATCH bpf-next v3 0/3] Add TC-BPF API Kumar Kartikeya Dwivedi
@ 2021-04-20 19:37 ` Kumar Kartikeya Dwivedi
  2021-04-21  6:37   ` Yonghong Song
  2021-04-20 19:37 ` [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
  2021-04-20 19:37 ` [PATCH bpf-next v3 3/3] libbpf: add selftests for " Kumar Kartikeya Dwivedi
  2 siblings, 1 reply; 29+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-20 19:37 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev

This change introduces a few helpers to wrap open coded attribute
preparation in netlink.c.

Every nested attribute's closure must happen using the helper
nlattr_end_nested, which sets its length properly. NLA_F_NESTED is
enforeced using nlattr_begin_nested helper. Other simple attributes
can be added directly.

The maxsz parameter corresponds to the size of the request structure
which is being filled in, so for instance with req being:

struct {
	struct nlmsghdr nh;
	struct tcmsg t;
	char buf[4096];
} req;

Then, maxsz should be sizeof(req).

This change also converts the open coded attribute preparation with the
helpers. Note that the only failure the internal call to nlattr_add
could result in the nested helper would be -EMSGSIZE, hence that is what
we return to our caller.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/lib/bpf/netlink.c | 37 ++++++++++++++-----------------
 tools/lib/bpf/nlattr.h  | 48 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+), 21 deletions(-)

diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index d2cb28e9ef52..c79e30484e81 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -135,7 +135,7 @@ static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
 					 __u32 flags)
 {
 	int sock, seq = 0, ret;
-	struct nlattr *nla, *nla_xdp;
+	struct nlattr *nla;
 	struct {
 		struct nlmsghdr  nh;
 		struct ifinfomsg ifinfo;
@@ -157,36 +157,31 @@ static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
 	req.ifinfo.ifi_index = ifindex;
 
 	/* started nested attribute for XDP */
-	nla = (struct nlattr *)(((char *)&req)
-				+ NLMSG_ALIGN(req.nh.nlmsg_len));
-	nla->nla_type = NLA_F_NESTED | IFLA_XDP;
-	nla->nla_len = NLA_HDRLEN;
+	nla = nlattr_begin_nested(&req.nh, sizeof(req), IFLA_XDP);
+	if (!nla) {
+		ret = -EMSGSIZE;
+		goto cleanup;
+	}
 
 	/* add XDP fd */
-	nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
-	nla_xdp->nla_type = IFLA_XDP_FD;
-	nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
-	memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
-	nla->nla_len += nla_xdp->nla_len;
+	ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_FD, &fd, sizeof(fd));
+	if (ret < 0)
+		goto cleanup;
 
 	/* if user passed in any flags, add those too */
 	if (flags) {
-		nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
-		nla_xdp->nla_type = IFLA_XDP_FLAGS;
-		nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
-		memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
-		nla->nla_len += nla_xdp->nla_len;
+		ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_FLAGS, &flags, sizeof(flags));
+		if (ret < 0)
+			goto cleanup;
 	}
 
 	if (flags & XDP_FLAGS_REPLACE) {
-		nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
-		nla_xdp->nla_type = IFLA_XDP_EXPECTED_FD;
-		nla_xdp->nla_len = NLA_HDRLEN + sizeof(old_fd);
-		memcpy((char *)nla_xdp + NLA_HDRLEN, &old_fd, sizeof(old_fd));
-		nla->nla_len += nla_xdp->nla_len;
+		ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_EXPECTED_FD, &flags, sizeof(flags));
+		if (ret < 0)
+			goto cleanup;
 	}
 
-	req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
+	nlattr_end_nested(&req.nh, nla);
 
 	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
 		ret = -errno;
diff --git a/tools/lib/bpf/nlattr.h b/tools/lib/bpf/nlattr.h
index 6cc3ac91690f..1c94cdb6e89d 100644
--- a/tools/lib/bpf/nlattr.h
+++ b/tools/lib/bpf/nlattr.h
@@ -10,7 +10,10 @@
 #define __LIBBPF_NLATTR_H
 
 #include <stdint.h>
+#include <string.h>
+#include <errno.h>
 #include <linux/netlink.h>
+
 /* avoid multiple definition of netlink features */
 #define __LINUX_NETLINK_H
 
@@ -103,4 +106,49 @@ int libbpf_nla_parse_nested(struct nlattr *tb[], int maxtype,
 
 int libbpf_nla_dump_errormsg(struct nlmsghdr *nlh);
 
+static inline struct nlattr *nla_data(struct nlattr *nla)
+{
+	return (struct nlattr *)((char *)nla + NLA_HDRLEN);
+}
+
+static inline struct nlattr *nh_tail(struct nlmsghdr *nh)
+{
+	return (struct nlattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
+}
+
+static inline int nlattr_add(struct nlmsghdr *nh, size_t maxsz, int type,
+			     const void *data, int len)
+{
+	struct nlattr *nla;
+
+	if (NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(NLA_HDRLEN + len) > maxsz)
+		return -EMSGSIZE;
+	if ((!data && len) || (data && !len))
+		return -EINVAL;
+
+	nla = nh_tail(nh);
+	nla->nla_type = type;
+	nla->nla_len = NLA_HDRLEN + len;
+	if (data)
+		memcpy(nla_data(nla), data, len);
+	nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(nla->nla_len);
+	return 0;
+}
+
+static inline struct nlattr *nlattr_begin_nested(struct nlmsghdr *nh,
+						 size_t maxsz, int type)
+{
+	struct nlattr *tail;
+
+	tail = nh_tail(nh);
+	if (nlattr_add(nh, maxsz, type | NLA_F_NESTED, NULL, 0))
+		return NULL;
+	return tail;
+}
+
+static inline void nlattr_end_nested(struct nlmsghdr *nh, struct nlattr *tail)
+{
+	tail->nla_len = (char *)nh_tail(nh) - (char *)tail;
+}
+
 #endif /* __LIBBPF_NLATTR_H */
-- 
2.30.2


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-20 19:37 [PATCH bpf-next v3 0/3] Add TC-BPF API Kumar Kartikeya Dwivedi
  2021-04-20 19:37 ` [PATCH bpf-next v3 1/3] libbpf: add helpers for preparing netlink attributes Kumar Kartikeya Dwivedi
@ 2021-04-20 19:37 ` Kumar Kartikeya Dwivedi
  2021-04-21  6:58   ` Yonghong Song
                     ` (2 more replies)
  2021-04-20 19:37 ` [PATCH bpf-next v3 3/3] libbpf: add selftests for " Kumar Kartikeya Dwivedi
  2 siblings, 3 replies; 29+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-20 19:37 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev

This adds functions that wrap the netlink API used for adding,
manipulating, and removing traffic control filters. These functions
operate directly on the loaded prog's fd, and return a handle to the
filter using an out parameter named id.

The basic featureset is covered to allow for attaching and removal of
filters. Some additional features like TCA_BPF_POLICE and TCA_RATE for
the API have been omitted. These can added on top later by extending the
bpf_tc_opts struct.

Support for binding actions directly to a classifier by passing them in
during filter creation has also been omitted for now. These actions have
an auto clean up property because their lifetime is bound to the filter
they are attached to. This can be added later, but was omitted for now
as direct action mode is a better alternative to it, which is enabled by
default.

An API summary:

bpf_tc_attach may be used to attach, and replace SCHED_CLS bpf
classifier. The protocol is always set as ETH_P_ALL. The replace option
in bpf_tc_opts is used to control replacement behavior.  Attachment
fails if filter with existing attributes already exists.

bpf_tc_detach may be used to detach existing SCHED_CLS filter. The
bpf_tc_attach_id object filled in during attach must be passed in to the
detach functions for them to remove the filter and its attached
classififer correctly.

bpf_tc_get_info is a helper that can be used to obtain attributes
for the filter and classififer.

Examples:

	struct bpf_tc_attach_id id = {};
	struct bpf_object *obj;
	struct bpf_program *p;
	int fd, r;

	obj = bpf_object_open("foo.o");
	if (IS_ERR_OR_NULL(obj))
		return PTR_ERR(obj);

	p = bpf_object__find_program_by_title(obj, "classifier");
	if (IS_ERR_OR_NULL(p))
		return PTR_ERR(p);

	if (bpf_object__load(obj) < 0)
		return -1;

	fd = bpf_program__fd(p);

	r = bpf_tc_attach(fd, if_nametoindex("lo"),
			  BPF_TC_CLSACT_INGRESS,
			  NULL, &id);
	if (r < 0)
		return r;

... which is roughly equivalent to:
  # tc qdisc add dev lo clsact
  # tc filter add dev lo ingress bpf obj foo.o sec classifier da

... as direct action mode is always enabled.

To replace an existing filter:

	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = id.handle,
			    .priority = id.priority, .replace = true);
	r = bpf_tc_attach(fd, if_nametoindex("lo"),
			  BPF_TC_CLSACT_INGRESS,
			  &opts, &id);
	if (r < 0)
		return r;

To obtain info of a particular filter, the example above can be extended
as follows:

	struct bpf_tc_info info = {};

	r = bpf_tc_get_info(if_nametoindex("lo"),
			    BPF_TC_CLSACT_INGRESS,
			    &id, &info);
	if (r < 0)
		return r;

... where id corresponds to the bpf_tc_attach_id filled in during an
attach operation.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/lib/bpf/libbpf.h   |  44 ++++++
 tools/lib/bpf/libbpf.map |   3 +
 tools/lib/bpf/netlink.c  | 319 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 360 insertions(+), 6 deletions(-)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index bec4e6a6e31d..b4ed6a41ea70 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -16,6 +16,8 @@
 #include <stdbool.h>
 #include <sys/types.h>  // for size_t
 #include <linux/bpf.h>
+#include <linux/pkt_sched.h>
+#include <linux/tc_act/tc_bpf.h>
 
 #include "libbpf_common.h"
 
@@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
 LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
 LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
 
+/* Convenience macros for the clsact attach hooks */
+#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
+#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
+
+struct bpf_tc_opts {
+	size_t sz;
+	__u32 handle;
+	__u32 class_id;
+	__u16 priority;
+	bool replace;
+	size_t :0;
+};
+
+#define bpf_tc_opts__last_field replace
+
+/* Acts as a handle for an attached filter */
+struct bpf_tc_attach_id {
+	__u32 handle;
+	__u16 priority;
+};
+
+struct bpf_tc_info {
+	struct bpf_tc_attach_id id;
+	__u16 protocol;
+	__u32 chain_index;
+	__u32 prog_id;
+	__u8 tag[BPF_TAG_SIZE];
+	__u32 class_id;
+	__u32 bpf_flags;
+	__u32 bpf_flags_gen;
+};
+
+/* id is out parameter that will be written to, it must not be NULL */
+LIBBPF_API int bpf_tc_attach(int fd, __u32 ifindex, __u32 parent_id,
+			     const struct bpf_tc_opts *opts,
+			     struct bpf_tc_attach_id *id);
+LIBBPF_API int bpf_tc_detach(__u32 ifindex, __u32 parent_id,
+			     const struct bpf_tc_attach_id *id);
+LIBBPF_API int bpf_tc_get_info(__u32 ifindex, __u32 parent_id,
+			       const struct bpf_tc_attach_id *id,
+			       struct bpf_tc_info *info);
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index b9b29baf1df8..686444fbb838 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -361,4 +361,7 @@ LIBBPF_0.4.0 {
 		bpf_linker__new;
 		bpf_map__inner_map;
 		bpf_object__set_kversion;
+		bpf_tc_attach;
+		bpf_tc_detach;
+		bpf_tc_get_info;
 } LIBBPF_0.3.0;
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index c79e30484e81..71109dcea9e4 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -4,7 +4,10 @@
 #include <stdlib.h>
 #include <memory.h>
 #include <unistd.h>
+#include <inttypes.h>
+#include <arpa/inet.h>
 #include <linux/bpf.h>
+#include <linux/if_ether.h>
 #include <linux/rtnetlink.h>
 #include <sys/socket.h>
 #include <errno.h>
@@ -344,6 +347,20 @@ int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags)
 	return ret;
 }
 
+static int bpf_nl_get_ext(struct nlmsghdr *nh, int sock, unsigned int nl_pid,
+			  __dump_nlmsg_t dump_link_nlmsg_p,
+			  libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
+{
+	int seq = time(NULL);
+
+	nh->nlmsg_seq = seq;
+	if (send(sock, nh, nh->nlmsg_len, 0) < 0)
+		return -errno;
+
+	return bpf_netlink_recv(sock, nl_pid, seq, dump_link_nlmsg_p,
+				dump_link_nlmsg, cookie);
+}
+
 int libbpf_nl_get_link(int sock, unsigned int nl_pid,
 		       libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
 {
@@ -356,12 +373,302 @@ int libbpf_nl_get_link(int sock, unsigned int nl_pid,
 		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
 		.ifm.ifi_family = AF_PACKET,
 	};
-	int seq = time(NULL);
 
-	req.nlh.nlmsg_seq = seq;
-	if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0)
-		return -errno;
+	return bpf_nl_get_ext(&req.nlh, sock, nl_pid, __dump_link_nlmsg,
+			      dump_link_nlmsg, cookie);
+}
 
-	return bpf_netlink_recv(sock, nl_pid, seq, __dump_link_nlmsg,
-				dump_link_nlmsg, cookie);
+static int tc_setup_clsact_excl(int sock, __u32 nl_pid, __u32 ifindex)
+{
+	int seq = 0, ret = 0;
+	struct {
+		struct nlmsghdr nh;
+		struct tcmsg t;
+		char buf[256];
+	} req;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
+	req.nh.nlmsg_flags =
+		NLM_F_REQUEST | NLM_F_CREATE | NLM_F_ACK | NLM_F_EXCL;
+	req.nh.nlmsg_type = RTM_NEWQDISC;
+	req.nh.nlmsg_pid = 0;
+	req.nh.nlmsg_seq = ++seq;
+	req.t.tcm_family = AF_UNSPEC;
+	req.t.tcm_ifindex = ifindex;
+	req.t.tcm_parent = TC_H_CLSACT;
+	req.t.tcm_handle = TC_H_MAKE(TC_H_CLSACT, 0);
+
+	ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "clsact",
+			 sizeof("clsact"));
+	if (ret < 0)
+		return ret;
+
+	ret = send(sock, &req.nh, req.nh.nlmsg_len, 0);
+	if (ret < 0)
+		return ret;
+
+	return bpf_netlink_recv(sock, nl_pid, seq, NULL, NULL, NULL);
+}
+
+static int tc_bpf_add_fd_and_name(struct nlmsghdr *nh, size_t maxsz, int fd)
+{
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
+	char name[256] = {};
+	int len, ret;
+
+	ret = bpf_obj_get_info_by_fd(fd, &info, &info_len);
+	if (ret < 0)
+		return ret;
+
+	ret = nlattr_add(nh, maxsz, TCA_BPF_FD, &fd, sizeof(fd));
+	if (ret < 0)
+		return ret;
+
+	len = snprintf(name, sizeof(name), "%s:[%" PRIu32 "]", info.name,
+		       info.id);
+	if (len < 0 || len >= sizeof(name))
+		return len < 0 ? -EINVAL : -ENAMETOOLONG;
+
+	return nlattr_add(nh, maxsz, TCA_BPF_NAME, name, len + 1);
+}
+
+static int cls_get_info(struct nlmsghdr *nh, libbpf_dump_nlmsg_t fn,
+			void *cookie);
+
+static int tc_cls_bpf_modify(int fd, int cmd, unsigned int flags, __u32 ifindex,
+			     __u32 parent_id, const struct bpf_tc_opts *opts,
+			     __dump_nlmsg_t fn, struct bpf_tc_attach_id *id)
+{
+	__u32 nl_pid = 0, protocol, priority;
+	struct bpf_tc_info info = {};
+	unsigned int bpf_flags = 0;
+	int sock, seq = 0, ret;
+	struct nlattr *nla;
+	struct {
+		struct nlmsghdr nh;
+		struct tcmsg t;
+		char buf[256];
+	} req;
+
+	sock = libbpf_netlink_open(&nl_pid);
+	if (sock < 0)
+		return sock;
+
+	if ((parent_id == BPF_TC_CLSACT_INGRESS ||
+	     parent_id == BPF_TC_CLSACT_EGRESS) &&
+	    flags & NLM_F_CREATE) {
+		ret = tc_setup_clsact_excl(sock, nl_pid, ifindex);
+		/* Attachment can still fail if ingress qdisc is installed, and
+		 * we're trying attach on egress as parent. Ignore in that case
+		 * as well.
+		 */
+		if (ret < 0 && ret != -EEXIST)
+			goto end;
+	}
+
+	priority = OPTS_GET(opts, priority, 0);
+	protocol = ETH_P_ALL;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
+	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | flags;
+	req.nh.nlmsg_type = cmd;
+	req.nh.nlmsg_pid = 0;
+	req.nh.nlmsg_seq = ++seq;
+	req.t.tcm_family = AF_UNSPEC;
+	req.t.tcm_handle = OPTS_GET(opts, handle, 0);
+	req.t.tcm_parent = parent_id;
+	req.t.tcm_ifindex = ifindex;
+	req.t.tcm_info = TC_H_MAKE(priority << 16, htons(protocol));
+
+	ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "bpf", sizeof("bpf"));
+	if (ret < 0)
+		goto end;
+
+	nla = nlattr_begin_nested(&req.nh, sizeof(req), TCA_OPTIONS);
+	if (!nla) {
+		ret = -EMSGSIZE;
+		goto end;
+	}
+
+	if (OPTS_GET(opts, class_id, TC_H_UNSPEC)) {
+		ret = nlattr_add(&req.nh, sizeof(req), TCA_BPF_CLASSID,
+				 &opts->class_id, sizeof(opts->class_id));
+		if (ret < 0)
+			goto end;
+	}
+
+	if (cmd != RTM_DELTFILTER) {
+		ret = tc_bpf_add_fd_and_name(&req.nh, sizeof(req), fd);
+		if (ret < 0)
+			goto end;
+
+		/* direct action is always set */
+		bpf_flags |= TCA_BPF_FLAG_ACT_DIRECT;
+		ret = nlattr_add(&req.nh, sizeof(req), TCA_BPF_FLAGS,
+				 &bpf_flags, sizeof(bpf_flags));
+		if (ret < 0)
+			goto end;
+	}
+
+	nlattr_end_nested(&req.nh, nla);
+
+	ret = send(sock, &req.nh, req.nh.nlmsg_len, 0);
+	if (ret < 0)
+		goto end;
+
+	ret = bpf_netlink_recv(sock, nl_pid, seq, fn, NULL, &info);
+
+	if (fn) {
+		*id = info.id;
+		ret = ret < 0 ? ret : 0;
+	}
+
+end:
+	close(sock);
+	return ret;
+}
+
+int bpf_tc_attach(int fd, __u32 ifindex, __u32 parent_id,
+		  const struct bpf_tc_opts *opts,
+		  struct bpf_tc_attach_id *id)
+{
+	if (fd < 0 || !OPTS_VALID(opts, bpf_tc_opts) || !id)
+		return -EINVAL;
+
+	return tc_cls_bpf_modify(fd, RTM_NEWTFILTER,
+				 NLM_F_ECHO | NLM_F_CREATE |
+			         (OPTS_GET(opts, replace, false) ?
+					NLM_F_REPLACE : NLM_F_EXCL),
+				 ifindex, parent_id, opts, cls_get_info, id);
+}
+
+int bpf_tc_detach(__u32 ifindex, __u32 parent_id,
+		  const struct bpf_tc_attach_id *id)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, 0);
+
+	if (!id)
+		return -EINVAL;
+
+	opts.handle = id->handle;
+	opts.priority = id->priority;
+
+	return tc_cls_bpf_modify(-1, RTM_DELTFILTER, 0, ifindex, parent_id,
+				 &opts, NULL, NULL);
+}
+
+static int __cls_get_info(void *cookie, void *msg, struct nlattr **tb)
+{
+	struct nlattr *tbb[TCA_BPF_MAX + 1];
+	struct bpf_tc_info *info = cookie;
+	struct tcmsg *t = msg;
+
+	if (!tb[TCA_OPTIONS])
+		return 0;
+
+	libbpf_nla_parse_nested(tbb, TCA_BPF_MAX, tb[TCA_OPTIONS], NULL);
+	if (!tbb[TCA_BPF_ID])
+		return 0;
+
+	info->protocol = ntohs(TC_H_MIN(t->tcm_info));
+	info->id.priority = TC_H_MAJ(t->tcm_info) >> 16;
+	info->id.handle = t->tcm_handle;
+
+	if (tb[TCA_CHAIN])
+		info->chain_index = libbpf_nla_getattr_u32(tb[TCA_CHAIN]);
+	else
+		info->chain_index = 0;
+
+	if (tbb[TCA_BPF_FLAGS])
+		info->bpf_flags = libbpf_nla_getattr_u32(tbb[TCA_BPF_FLAGS]);
+
+	if (tbb[TCA_BPF_FLAGS_GEN])
+		info->bpf_flags_gen =
+			libbpf_nla_getattr_u32(tbb[TCA_BPF_FLAGS_GEN]);
+
+	if (tbb[TCA_BPF_ID])
+		info->prog_id = libbpf_nla_getattr_u32(tbb[TCA_BPF_ID]);
+
+	if (tbb[TCA_BPF_TAG])
+		memcpy(info->tag, libbpf_nla_getattr_str(tbb[TCA_BPF_TAG]),
+		       sizeof(info->tag));
+
+	if (tbb[TCA_BPF_CLASSID])
+		info->class_id = libbpf_nla_getattr_u32(tbb[TCA_BPF_CLASSID]);
+
+	return 1;
+}
+
+static int cls_get_info(struct nlmsghdr *nh, libbpf_dump_nlmsg_t fn,
+			void *cookie)
+{
+	struct tcmsg *t = NLMSG_DATA(nh);
+	struct nlattr *tb[TCA_MAX + 1];
+
+	libbpf_nla_parse(tb, TCA_MAX,
+			 (struct nlattr *)((char *)t + NLMSG_ALIGN(sizeof(*t))),
+			 NLMSG_PAYLOAD(nh, sizeof(*t)), NULL);
+	if (!tb[TCA_KIND])
+		return -EINVAL;
+
+	return __cls_get_info(cookie, t, tb);
+}
+
+static int tc_cls_get_info(__u32 ifindex, __u32 parent_id,
+			   const struct bpf_tc_attach_id *id,
+			   struct bpf_tc_info *info)
+{
+	__u32 nl_pid = 0, protocol;
+	__u32 priority;
+	int sock, ret;
+	struct {
+		struct nlmsghdr nh;
+		struct tcmsg t;
+		char buf[256];
+	} req = {
+		.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
+		.nh.nlmsg_type = RTM_GETTFILTER,
+		.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
+		.t.tcm_family = AF_UNSPEC,
+	};
+
+	priority = id->priority;
+	protocol = ETH_P_ALL;
+
+	req.t.tcm_parent = parent_id;
+	req.t.tcm_ifindex = ifindex;
+	req.t.tcm_handle = id->handle;
+	req.t.tcm_info = TC_H_MAKE(priority << 16, htons(protocol));
+
+	sock = libbpf_netlink_open(&nl_pid);
+	if (sock < 0)
+		return sock;
+
+	ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "bpf", sizeof("bpf"));
+	if (ret < 0)
+		goto end;
+
+	req.nh.nlmsg_seq = time(NULL);
+
+	ret = bpf_nl_get_ext(&req.nh, sock, nl_pid, cls_get_info, NULL, info);
+	if (ret < 0)
+		goto end;
+	/* 1 denotes a match */
+	ret = ret == 1 ? 0 : -ESRCH;
+end:
+	close(sock);
+	return ret;
+}
+
+int bpf_tc_get_info(__u32 ifindex, __u32 parent_id,
+		    const struct bpf_tc_attach_id *id,
+		    struct bpf_tc_info *info)
+{
+	if (!id || !info)
+		return -EINVAL;
+
+	return tc_cls_get_info(ifindex, parent_id, id, info);
 }
-- 
2.30.2


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH bpf-next v3 3/3] libbpf: add selftests for TC-BPF API
  2021-04-20 19:37 [PATCH bpf-next v3 0/3] Add TC-BPF API Kumar Kartikeya Dwivedi
  2021-04-20 19:37 ` [PATCH bpf-next v3 1/3] libbpf: add helpers for preparing netlink attributes Kumar Kartikeya Dwivedi
  2021-04-20 19:37 ` [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
@ 2021-04-20 19:37 ` Kumar Kartikeya Dwivedi
  2021-04-21  7:01   ` Yonghong Song
  2021-04-21 18:24   ` Andrii Nakryiko
  2 siblings, 2 replies; 29+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-20 19:37 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev

This adds some basic tests for the low level bpf_tc_* API.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/prog_tests/test_tc_bpf.c    | 169 ++++++++++++++++++
 .../selftests/bpf/progs/test_tc_bpf_kern.c    |  12 ++
 2 files changed, 181 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c

diff --git a/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
new file mode 100644
index 000000000000..563a3944553c
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
@@ -0,0 +1,169 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <linux/err.h>
+#include <linux/limits.h>
+#include <bpf/libbpf.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <test_progs.h>
+#include <linux/if_ether.h>
+
+#define LO_IFINDEX 1
+
+static int test_tc_internal(int fd, __u32 parent_id)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 10,
+			    .class_id = TC_H_MAKE(1UL << 16, 1));
+	struct bpf_tc_attach_id id = {};
+	struct bpf_tc_info info = {};
+	int ret;
+
+	ret = bpf_tc_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
+	if (!ASSERT_EQ(ret, 0, "bpf_tc_attach"))
+		return ret;
+
+	ret = bpf_tc_get_info(LO_IFINDEX, parent_id, &id, &info);
+	if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
+		goto end;
+
+	if (!ASSERT_EQ(info.id.handle, id.handle, "handle mismatch") ||
+	    !ASSERT_EQ(info.id.priority, id.priority, "priority mismatch") ||
+	    !ASSERT_EQ(info.id.handle, 1, "handle incorrect") ||
+	    !ASSERT_EQ(info.chain_index, 0, "chain_index incorrect") ||
+	    !ASSERT_EQ(info.id.priority, 10, "priority incorrect") ||
+	    !ASSERT_EQ(info.class_id, TC_H_MAKE(1UL << 16, 1),
+		       "class_id incorrect") ||
+	    !ASSERT_EQ(info.protocol, ETH_P_ALL, "protocol incorrect"))
+		goto end;
+
+	opts.replace = true;
+	ret = bpf_tc_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
+	if (!ASSERT_EQ(ret, 0, "bpf_tc_attach in replace mode"))
+		return ret;
+
+	/* Demonstrate changing attributes */
+	opts.class_id = TC_H_MAKE(1UL << 16, 2);
+
+	ret = bpf_tc_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
+	if (!ASSERT_EQ(ret, 0, "bpf_tc attach in replace mode"))
+		goto end;
+
+	ret = bpf_tc_get_info(LO_IFINDEX, parent_id, &id, &info);
+	if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
+		goto end;
+
+	if (!ASSERT_EQ(info.class_id, TC_H_MAKE(1UL << 16, 2),
+		       "class_id incorrect after replace"))
+		goto end;
+	if (!ASSERT_EQ(info.bpf_flags & TCA_BPF_FLAG_ACT_DIRECT, 1,
+		       "direct action mode not set"))
+		goto end;
+
+end:
+	ret = bpf_tc_detach(LO_IFINDEX, parent_id, &id);
+	ASSERT_EQ(ret, 0, "detach failed");
+	return ret;
+}
+
+int test_tc_info(int fd)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 10,
+			    .class_id = TC_H_MAKE(1UL << 16, 1));
+	struct bpf_tc_attach_id id = {}, old;
+	struct bpf_tc_info info = {};
+	int ret;
+
+	ret = bpf_tc_attach(fd, LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &opts, &id);
+	if (!ASSERT_EQ(ret, 0, "bpf_tc_attach"))
+		return ret;
+	old = id;
+
+	ret = bpf_tc_get_info(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &id, &info);
+	if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
+		goto end_old;
+
+	if (!ASSERT_EQ(info.id.handle, id.handle, "handle mismatch") ||
+	    !ASSERT_EQ(info.id.priority, id.priority, "priority mismatch") ||
+	    !ASSERT_EQ(info.id.handle, 1, "handle incorrect") ||
+	    !ASSERT_EQ(info.chain_index, 0, "chain_index incorrect") ||
+	    !ASSERT_EQ(info.id.priority, 10, "priority incorrect") ||
+	    !ASSERT_EQ(info.class_id, TC_H_MAKE(1UL << 16, 1),
+		       "class_id incorrect") ||
+	    !ASSERT_EQ(info.protocol, ETH_P_ALL, "protocol incorrect"))
+		goto end_old;
+
+	/* choose a priority */
+	opts.priority = 0;
+	ret = bpf_tc_attach(fd, LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &opts, &id);
+	if (!ASSERT_EQ(ret, 0, "bpf_tc_attach"))
+		goto end_old;
+
+	ret = bpf_tc_get_info(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &id, &info);
+	if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
+		goto end;
+
+	if (!ASSERT_NEQ(id.priority, old.priority, "filter priority mismatch"))
+		goto end;
+	if (!ASSERT_EQ(info.id.priority, id.priority, "priority mismatch"))
+		goto end;
+
+end:
+	ret = bpf_tc_detach(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &id);
+	ASSERT_EQ(ret, 0, "detach failed");
+end_old:
+	ret = bpf_tc_detach(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &old);
+	ASSERT_EQ(ret, 0, "detach failed");
+	return ret;
+}
+
+void test_test_tc_bpf(void)
+{
+	const char *file = "./test_tc_bpf_kern.o";
+	struct bpf_program *clsp;
+	struct bpf_object *obj;
+	int cls_fd, ret;
+
+	obj = bpf_object__open(file);
+	if (!ASSERT_OK_PTR(obj, "bpf_object__open"))
+		return;
+
+	clsp = bpf_object__find_program_by_title(obj, "classifier");
+	if (!ASSERT_OK_PTR(clsp, "bpf_object__find_program_by_title"))
+		goto end;
+
+	ret = bpf_object__load(obj);
+	if (!ASSERT_EQ(ret, 0, "bpf_object__load"))
+		goto end;
+
+	cls_fd = bpf_program__fd(clsp);
+
+	system("tc qdisc del dev lo clsact");
+
+	ret = test_tc_internal(cls_fd, BPF_TC_CLSACT_INGRESS);
+	if (!ASSERT_EQ(ret, 0, "test_tc_internal INGRESS"))
+		goto end;
+
+	if (!ASSERT_EQ(system("tc qdisc del dev lo clsact"), 0,
+		       "clsact qdisc delete failed"))
+		goto end;
+
+	ret = test_tc_info(cls_fd);
+	if (!ASSERT_EQ(ret, 0, "test_tc_info"))
+		goto end;
+
+	if (!ASSERT_EQ(system("tc qdisc del dev lo clsact"), 0,
+		       "clsact qdisc delete failed"))
+		goto end;
+
+	ret = test_tc_internal(cls_fd, BPF_TC_CLSACT_EGRESS);
+	if (!ASSERT_EQ(ret, 0, "test_tc_internal EGRESS"))
+		goto end;
+
+	ASSERT_EQ(system("tc qdisc del dev lo clsact"), 0,
+		  "clsact qdisc delete failed");
+
+end:
+	bpf_object__close(obj);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
new file mode 100644
index 000000000000..18a3a7ed924a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+/* Dummy prog to test TC-BPF API */
+
+SEC("classifier")
+int cls(struct __sk_buff *skb)
+{
+	return 0;
+}
-- 
2.30.2


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 1/3] libbpf: add helpers for preparing netlink attributes
  2021-04-20 19:37 ` [PATCH bpf-next v3 1/3] libbpf: add helpers for preparing netlink attributes Kumar Kartikeya Dwivedi
@ 2021-04-21  6:37   ` Yonghong Song
  0 siblings, 0 replies; 29+ messages in thread
From: Yonghong Song @ 2021-04-21  6:37 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev



On 4/20/21 12:37 PM, Kumar Kartikeya Dwivedi wrote:
> This change introduces a few helpers to wrap open coded attribute
> preparation in netlink.c.
> 
> Every nested attribute's closure must happen using the helper
> nlattr_end_nested, which sets its length properly. NLA_F_NESTED is
> enforeced using nlattr_begin_nested helper. Other simple attributes

typo: enforced

> can be added directly.
> 
> The maxsz parameter corresponds to the size of the request structure
> which is being filled in, so for instance with req being:
> 
> struct {
> 	struct nlmsghdr nh;
> 	struct tcmsg t;
> 	char buf[4096];
> } req;
> 
> Then, maxsz should be sizeof(req).
> 
> This change also converts the open coded attribute preparation with the
> helpers. Note that the only failure the internal call to nlattr_add
> could result in the nested helper would be -EMSGSIZE, hence that is what
> we return to our caller.
> 
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>   tools/lib/bpf/netlink.c | 37 ++++++++++++++-----------------
>   tools/lib/bpf/nlattr.h  | 48 +++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 64 insertions(+), 21 deletions(-)
> 
[...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-20 19:37 ` [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
@ 2021-04-21  6:58   ` Yonghong Song
  2021-04-21 17:06     ` Kumar Kartikeya Dwivedi
  2021-04-21 18:19   ` Andrii Nakryiko
  2021-04-21 22:59   ` Daniel Borkmann
  2 siblings, 1 reply; 29+ messages in thread
From: Yonghong Song @ 2021-04-21  6:58 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev



On 4/20/21 12:37 PM, Kumar Kartikeya Dwivedi wrote:
> This adds functions that wrap the netlink API used for adding,
> manipulating, and removing traffic control filters. These functions
> operate directly on the loaded prog's fd, and return a handle to the
> filter using an out parameter named id.
> 
> The basic featureset is covered to allow for attaching and removal of
> filters. Some additional features like TCA_BPF_POLICE and TCA_RATE for
> the API have been omitted. These can added on top later by extending the

"later" => "layer"?

> bpf_tc_opts struct.
> 
> Support for binding actions directly to a classifier by passing them in
> during filter creation has also been omitted for now. These actions have
> an auto clean up property because their lifetime is bound to the filter
> they are attached to. This can be added later, but was omitted for now
> as direct action mode is a better alternative to it, which is enabled by
> default.
> 
> An API summary:
> 
> bpf_tc_attach may be used to attach, and replace SCHED_CLS bpf
> classifier. The protocol is always set as ETH_P_ALL. The replace option
> in bpf_tc_opts is used to control replacement behavior.  Attachment
> fails if filter with existing attributes already exists.
> 
> bpf_tc_detach may be used to detach existing SCHED_CLS filter. The
> bpf_tc_attach_id object filled in during attach must be passed in to the
> detach functions for them to remove the filter and its attached
> classififer correctly.
> 
> bpf_tc_get_info is a helper that can be used to obtain attributes
> for the filter and classififer.
> 
> Examples:
> 
> 	struct bpf_tc_attach_id id = {};
> 	struct bpf_object *obj;
> 	struct bpf_program *p;
> 	int fd, r;
> 
> 	obj = bpf_object_open("foo.o");
> 	if (IS_ERR_OR_NULL(obj))
> 		return PTR_ERR(obj);
> 
> 	p = bpf_object__find_program_by_title(obj, "classifier");

Please use bpf_object__find_program_by_name() API.
bpf_object__find_program_by_title() is not recommended as now
libbpf supports multiple programs within the same section.

> 	if (IS_ERR_OR_NULL(p))
> 		return PTR_ERR(p);
> 
> 	if (bpf_object__load(obj) < 0)
> 		return -1;
> 
> 	fd = bpf_program__fd(p);
> 
> 	r = bpf_tc_attach(fd, if_nametoindex("lo"),
> 			  BPF_TC_CLSACT_INGRESS,
> 			  NULL, &id);
> 	if (r < 0)
> 		return r;
> 
> ... which is roughly equivalent to:
>    # tc qdisc add dev lo clsact
>    # tc filter add dev lo ingress bpf obj foo.o sec classifier da
> 
> ... as direct action mode is always enabled.
> 
> To replace an existing filter:
> 
> 	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = id.handle,
> 			    .priority = id.priority, .replace = true);
> 	r = bpf_tc_attach(fd, if_nametoindex("lo"),
> 			  BPF_TC_CLSACT_INGRESS,
> 			  &opts, &id);
> 	if (r < 0)
> 		return r;
> 
> To obtain info of a particular filter, the example above can be extended
> as follows:
> 
> 	struct bpf_tc_info info = {};
> 
> 	r = bpf_tc_get_info(if_nametoindex("lo"),
> 			    BPF_TC_CLSACT_INGRESS,
> 			    &id, &info);
> 	if (r < 0)
> 		return r;
> 
> ... where id corresponds to the bpf_tc_attach_id filled in during an
> attach operation.
> 
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>   tools/lib/bpf/libbpf.h   |  44 ++++++
>   tools/lib/bpf/libbpf.map |   3 +
>   tools/lib/bpf/netlink.c  | 319 ++++++++++++++++++++++++++++++++++++++-
>   3 files changed, 360 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index bec4e6a6e31d..b4ed6a41ea70 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -16,6 +16,8 @@
>   #include <stdbool.h>
>   #include <sys/types.h>  // for size_t
>   #include <linux/bpf.h>
> +#include <linux/pkt_sched.h>
> +#include <linux/tc_act/tc_bpf.h>
>   
>   #include "libbpf_common.h"
>   
> @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
>   LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
>   LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
>   
> +/* Convenience macros for the clsact attach hooks */
> +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
> +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
> +
> +struct bpf_tc_opts {
> +	size_t sz;
> +	__u32 handle;
> +	__u32 class_id;
> +	__u16 priority;
> +	bool replace;
> +	size_t :0;

Did you see any error without "size_t :0"?

> +};
> +
> +#define bpf_tc_opts__last_field replace
> +
> +/* Acts as a handle for an attached filter */
> +struct bpf_tc_attach_id {
> +	__u32 handle;
> +	__u16 priority;
> +};
> +
[...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 3/3] libbpf: add selftests for TC-BPF API
  2021-04-20 19:37 ` [PATCH bpf-next v3 3/3] libbpf: add selftests for " Kumar Kartikeya Dwivedi
@ 2021-04-21  7:01   ` Yonghong Song
  2021-04-21 18:24   ` Andrii Nakryiko
  1 sibling, 0 replies; 29+ messages in thread
From: Yonghong Song @ 2021-04-21  7:01 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev



On 4/20/21 12:37 PM, Kumar Kartikeya Dwivedi wrote:
> This adds some basic tests for the low level bpf_tc_* API.
> 
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>   .../selftests/bpf/prog_tests/test_tc_bpf.c    | 169 ++++++++++++++++++
>   .../selftests/bpf/progs/test_tc_bpf_kern.c    |  12 ++
>   2 files changed, 181 insertions(+)
>   create mode 100644 tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
>   create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
> new file mode 100644
> index 000000000000..563a3944553c
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
> @@ -0,0 +1,169 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/bpf.h>
> +#include <linux/err.h>
> +#include <linux/limits.h>
> +#include <bpf/libbpf.h>
> +#include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <test_progs.h>
> +#include <linux/if_ether.h>
> +
> +#define LO_IFINDEX 1
> +
[...]
> +
> +void test_test_tc_bpf(void)
> +{
> +	const char *file = "./test_tc_bpf_kern.o";
> +	struct bpf_program *clsp;
> +	struct bpf_object *obj;
> +	int cls_fd, ret;
> +
> +	obj = bpf_object__open(file);
> +	if (!ASSERT_OK_PTR(obj, "bpf_object__open"))
> +		return;
> +
> +	clsp = bpf_object__find_program_by_title(obj, "classifier");
> +	if (!ASSERT_OK_PTR(clsp, "bpf_object__find_program_by_title"))
> +		goto end;

Please use bpf_object__find_program_by_name() API.

> +
> +	ret = bpf_object__load(obj);
> +	if (!ASSERT_EQ(ret, 0, "bpf_object__load"))
> +		goto end;
> +
> +	cls_fd = bpf_program__fd(clsp);
> +
> +	system("tc qdisc del dev lo clsact");
> +
> +	ret = test_tc_internal(cls_fd, BPF_TC_CLSACT_INGRESS);
> +	if (!ASSERT_EQ(ret, 0, "test_tc_internal INGRESS"))
> +		goto end;
> +
> +	if (!ASSERT_EQ(system("tc qdisc del dev lo clsact"), 0,
> +		       "clsact qdisc delete failed"))
> +		goto end;
> +
> +	ret = test_tc_info(cls_fd);
> +	if (!ASSERT_EQ(ret, 0, "test_tc_info"))
> +		goto end;
> +
> +	if (!ASSERT_EQ(system("tc qdisc del dev lo clsact"), 0,
> +		       "clsact qdisc delete failed"))
> +		goto end;
> +
> +	ret = test_tc_internal(cls_fd, BPF_TC_CLSACT_EGRESS);
> +	if (!ASSERT_EQ(ret, 0, "test_tc_internal EGRESS"))
> +		goto end;
> +
> +	ASSERT_EQ(system("tc qdisc del dev lo clsact"), 0,
> +		  "clsact qdisc delete failed");
> +
> +end:
> +	bpf_object__close(obj);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
> new file mode 100644
> index 000000000000..18a3a7ed924a
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
> @@ -0,0 +1,12 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +
> +/* Dummy prog to test TC-BPF API */
> +
> +SEC("classifier")
> +int cls(struct __sk_buff *skb)
> +{
> +	return 0;
> +}
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-21  6:58   ` Yonghong Song
@ 2021-04-21 17:06     ` Kumar Kartikeya Dwivedi
  2021-04-22  1:56       ` Yonghong Song
  0 siblings, 1 reply; 29+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-21 17:06 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev

On Wed, Apr 21, 2021 at 12:28:01PM IST, Yonghong Song wrote:
>
>
> On 4/20/21 12:37 PM, Kumar Kartikeya Dwivedi wrote:
> > This adds functions that wrap the netlink API used for adding,
> > manipulating, and removing traffic control filters. These functions
> > operate directly on the loaded prog's fd, and return a handle to the
> > filter using an out parameter named id.
> >
> > The basic featureset is covered to allow for attaching and removal of
> > filters. Some additional features like TCA_BPF_POLICE and TCA_RATE for
> > the API have been omitted. These can added on top later by extending the
>
> "later" => "layer"?
>

No, I meant that other options can be added on top of this series by someone
later. I'll reword it.

> > bpf_tc_opts struct.
> >
> > Support for binding actions directly to a classifier by passing them in
> > during filter creation has also been omitted for now. These actions have
> > an auto clean up property because their lifetime is bound to the filter
> > they are attached to. This can be added later, but was omitted for now
> > as direct action mode is a better alternative to it, which is enabled by
> > default.
> >
> > An API summary:
> >
> > bpf_tc_attach may be used to attach, and replace SCHED_CLS bpf
> > classifier. The protocol is always set as ETH_P_ALL. The replace option
> > in bpf_tc_opts is used to control replacement behavior.  Attachment
> > fails if filter with existing attributes already exists.
> >
> > bpf_tc_detach may be used to detach existing SCHED_CLS filter. The
> > bpf_tc_attach_id object filled in during attach must be passed in to the
> > detach functions for them to remove the filter and its attached
> > classififer correctly.
> >
> > bpf_tc_get_info is a helper that can be used to obtain attributes
> > for the filter and classififer.
> >
> > Examples:
> >
> > 	struct bpf_tc_attach_id id = {};
> > 	struct bpf_object *obj;
> > 	struct bpf_program *p;
> > 	int fd, r;
> >
> > 	obj = bpf_object_open("foo.o");
> > 	if (IS_ERR_OR_NULL(obj))
> > 		return PTR_ERR(obj);
> >
> > 	p = bpf_object__find_program_by_title(obj, "classifier");
>
> Please use bpf_object__find_program_by_name() API.
> bpf_object__find_program_by_title() is not recommended as now
> libbpf supports multiple programs within the same section.

Thanks, will do.

>
> > 	if (IS_ERR_OR_NULL(p))
> > 		return PTR_ERR(p);
> >
> > 	if (bpf_object__load(obj) < 0)
> > 		return -1;
> >
> > 	fd = bpf_program__fd(p);
> >
> > 	r = bpf_tc_attach(fd, if_nametoindex("lo"),
> > 			  BPF_TC_CLSACT_INGRESS,
> > 			  NULL, &id);
> > 	if (r < 0)
> > 		return r;
> >
> > ... which is roughly equivalent to:
> >    # tc qdisc add dev lo clsact
> >    # tc filter add dev lo ingress bpf obj foo.o sec classifier da
> >
> > ... as direct action mode is always enabled.
> >
> > To replace an existing filter:
> >
> > 	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = id.handle,
> > 			    .priority = id.priority, .replace = true);
> > 	r = bpf_tc_attach(fd, if_nametoindex("lo"),
> > 			  BPF_TC_CLSACT_INGRESS,
> > 			  &opts, &id);
> > 	if (r < 0)
> > 		return r;
> >
> > To obtain info of a particular filter, the example above can be extended
> > as follows:
> >
> > 	struct bpf_tc_info info = {};
> >
> > 	r = bpf_tc_get_info(if_nametoindex("lo"),
> > 			    BPF_TC_CLSACT_INGRESS,
> > 			    &id, &info);
> > 	if (r < 0)
> > 		return r;
> >
> > ... where id corresponds to the bpf_tc_attach_id filled in during an
> > attach operation.
> >
> > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >   tools/lib/bpf/libbpf.h   |  44 ++++++
> >   tools/lib/bpf/libbpf.map |   3 +
> >   tools/lib/bpf/netlink.c  | 319 ++++++++++++++++++++++++++++++++++++++-
> >   3 files changed, 360 insertions(+), 6 deletions(-)
> >
> > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > index bec4e6a6e31d..b4ed6a41ea70 100644
> > --- a/tools/lib/bpf/libbpf.h
> > +++ b/tools/lib/bpf/libbpf.h
> > @@ -16,6 +16,8 @@
> >   #include <stdbool.h>
> >   #include <sys/types.h>  // for size_t
> >   #include <linux/bpf.h>
> > +#include <linux/pkt_sched.h>
> > +#include <linux/tc_act/tc_bpf.h>
> >   #include "libbpf_common.h"
> > @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
> >   LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
> >   LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
> > +/* Convenience macros for the clsact attach hooks */
> > +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
> > +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
> > +
> > +struct bpf_tc_opts {
> > +	size_t sz;
> > +	__u32 handle;
> > +	__u32 class_id;
> > +	__u16 priority;
> > +	bool replace;
> > +	size_t :0;
>
> Did you see any error without "size_t :0"?
>

Not really, I just added this considering this recent change:
dde7b3f5f2f4 ("libbpf: Add explicit padding to bpf_xdp_set_link_opts")

> > +};
> > +
> > +#define bpf_tc_opts__last_field replace
> > +
> > +/* Acts as a handle for an attached filter */
> > +struct bpf_tc_attach_id {
> > +	__u32 handle;
> > +	__u16 priority;
> > +};
> > +
> [...]

--
Kartikeya

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-20 19:37 ` [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
  2021-04-21  6:58   ` Yonghong Song
@ 2021-04-21 18:19   ` Andrii Nakryiko
  2021-04-21 19:48     ` Toke Høiland-Jørgensen
  2021-04-21 22:59   ` Daniel Borkmann
  2 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2021-04-21 18:19 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Networking

On Tue, Apr 20, 2021 at 12:37 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> This adds functions that wrap the netlink API used for adding,
> manipulating, and removing traffic control filters. These functions
> operate directly on the loaded prog's fd, and return a handle to the
> filter using an out parameter named id.
>
> The basic featureset is covered to allow for attaching and removal of
> filters. Some additional features like TCA_BPF_POLICE and TCA_RATE for
> the API have been omitted. These can added on top later by extending the
> bpf_tc_opts struct.
>
> Support for binding actions directly to a classifier by passing them in
> during filter creation has also been omitted for now. These actions have
> an auto clean up property because their lifetime is bound to the filter
> they are attached to. This can be added later, but was omitted for now
> as direct action mode is a better alternative to it, which is enabled by
> default.
>
> An API summary:
>
> bpf_tc_attach may be used to attach, and replace SCHED_CLS bpf
> classifier. The protocol is always set as ETH_P_ALL. The replace option
> in bpf_tc_opts is used to control replacement behavior.  Attachment
> fails if filter with existing attributes already exists.
>
> bpf_tc_detach may be used to detach existing SCHED_CLS filter. The
> bpf_tc_attach_id object filled in during attach must be passed in to the
> detach functions for them to remove the filter and its attached
> classififer correctly.
>
> bpf_tc_get_info is a helper that can be used to obtain attributes
> for the filter and classififer.
>
> Examples:
>
>         struct bpf_tc_attach_id id = {};
>         struct bpf_object *obj;
>         struct bpf_program *p;
>         int fd, r;
>
>         obj = bpf_object_open("foo.o");
>         if (IS_ERR_OR_NULL(obj))
>                 return PTR_ERR(obj);
>
>         p = bpf_object__find_program_by_title(obj, "classifier");
>         if (IS_ERR_OR_NULL(p))
>                 return PTR_ERR(p);
>
>         if (bpf_object__load(obj) < 0)
>                 return -1;
>
>         fd = bpf_program__fd(p);
>
>         r = bpf_tc_attach(fd, if_nametoindex("lo"),
>                           BPF_TC_CLSACT_INGRESS,
>                           NULL, &id);
>         if (r < 0)
>                 return r;
>
> ... which is roughly equivalent to:
>   # tc qdisc add dev lo clsact
>   # tc filter add dev lo ingress bpf obj foo.o sec classifier da
>
> ... as direct action mode is always enabled.
>
> To replace an existing filter:
>
>         DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = id.handle,
>                             .priority = id.priority, .replace = true);
>         r = bpf_tc_attach(fd, if_nametoindex("lo"),
>                           BPF_TC_CLSACT_INGRESS,
>                           &opts, &id);
>         if (r < 0)
>                 return r;
>
> To obtain info of a particular filter, the example above can be extended
> as follows:
>
>         struct bpf_tc_info info = {};
>
>         r = bpf_tc_get_info(if_nametoindex("lo"),
>                             BPF_TC_CLSACT_INGRESS,
>                             &id, &info);
>         if (r < 0)
>                 return r;
>
> ... where id corresponds to the bpf_tc_attach_id filled in during an
> attach operation.
>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  tools/lib/bpf/libbpf.h   |  44 ++++++
>  tools/lib/bpf/libbpf.map |   3 +
>  tools/lib/bpf/netlink.c  | 319 ++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 360 insertions(+), 6 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index bec4e6a6e31d..b4ed6a41ea70 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -16,6 +16,8 @@
>  #include <stdbool.h>
>  #include <sys/types.h>  // for size_t
>  #include <linux/bpf.h>
> +#include <linux/pkt_sched.h>
> +#include <linux/tc_act/tc_bpf.h>

apart from those unused macros below, are these needed in public API header?

>
>  #include "libbpf_common.h"
>
> @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
>  LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
>  LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
>
> +/* Convenience macros for the clsact attach hooks */
> +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
> +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)

these seem to be used only internally, why exposing them in public API?

> +
> +struct bpf_tc_opts {
> +       size_t sz;
> +       __u32 handle;
> +       __u32 class_id;
> +       __u16 priority;
> +       bool replace;
> +       size_t :0;
> +};
> +
> +#define bpf_tc_opts__last_field replace
> +
> +/* Acts as a handle for an attached filter */
> +struct bpf_tc_attach_id {
> +       __u32 handle;
> +       __u16 priority;
> +};

what are the chances that we'll need to grow this id struct? If that
happens, how do we do that in a backward/forward compatible manner?

if handle/prio are the only two ever necessary, we can actually use
bpf_tc_opts to return them back to user (we do that with
bpf_test_run_opts API). And then adjust detach/get_info methods to let
pass those values.

The whole idea of a struct for id just screams "compatibility problems
down the road" at me. Does anyone else has any other opinion on this?

> +
> +struct bpf_tc_info {
> +       struct bpf_tc_attach_id id;
> +       __u16 protocol;
> +       __u32 chain_index;
> +       __u32 prog_id;
> +       __u8 tag[BPF_TAG_SIZE];
> +       __u32 class_id;
> +       __u32 bpf_flags;
> +       __u32 bpf_flags_gen;
> +};
> +
> +/* id is out parameter that will be written to, it must not be NULL */
> +LIBBPF_API int bpf_tc_attach(int fd, __u32 ifindex, __u32 parent_id,

so parent_id is INGRESS|EGRESS, right? Is that an obvious name for
this parameter? I had to look at the code to understand what's
expected. Is it possible that it will be anything other than INGRESS
or EGRESS? If not `bool ingress` might be an option. Or perhaps enum
bpf_tc_direction { BPF_TC_INGRESS, BPF_TC_EGRESS } is better still.

> +                            const struct bpf_tc_opts *opts,
> +                            struct bpf_tc_attach_id *id);
> +LIBBPF_API int bpf_tc_detach(__u32 ifindex, __u32 parent_id,
> +                            const struct bpf_tc_attach_id *id);
> +LIBBPF_API int bpf_tc_get_info(__u32 ifindex, __u32 parent_id,

bpf_tc_query() to be more in line with attach/detach single-word verbs?

> +                              const struct bpf_tc_attach_id *id,
> +                              struct bpf_tc_info *info);
> +
>  #ifdef __cplusplus
>  } /* extern "C" */
>  #endif
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index b9b29baf1df8..686444fbb838 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -361,4 +361,7 @@ LIBBPF_0.4.0 {
>                 bpf_linker__new;
>                 bpf_map__inner_map;
>                 bpf_object__set_kversion;
> +               bpf_tc_attach;
> +               bpf_tc_detach;
> +               bpf_tc_get_info;
>  } LIBBPF_0.3.0;

[...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 3/3] libbpf: add selftests for TC-BPF API
  2021-04-20 19:37 ` [PATCH bpf-next v3 3/3] libbpf: add selftests for " Kumar Kartikeya Dwivedi
  2021-04-21  7:01   ` Yonghong Song
@ 2021-04-21 18:24   ` Andrii Nakryiko
  2021-04-21 19:56     ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2021-04-21 18:24 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Networking

On Tue, Apr 20, 2021 at 12:37 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> This adds some basic tests for the low level bpf_tc_* API.
>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  .../selftests/bpf/prog_tests/test_tc_bpf.c    | 169 ++++++++++++++++++
>  .../selftests/bpf/progs/test_tc_bpf_kern.c    |  12 ++
>  2 files changed, 181 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c

we normally don't call prog_test's files with "test_" prefix, it can
be just tc_bpf.c (or just tc.c)

>  create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c

we also don't typically call BPF source code files with _kern suffix,
just test_tc_bpf.c would be more in line with most common case

>
> diff --git a/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
> new file mode 100644
> index 000000000000..563a3944553c
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
> @@ -0,0 +1,169 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/bpf.h>
> +#include <linux/err.h>
> +#include <linux/limits.h>
> +#include <bpf/libbpf.h>
> +#include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <test_progs.h>
> +#include <linux/if_ether.h>
> +
> +#define LO_IFINDEX 1
> +
> +static int test_tc_internal(int fd, __u32 parent_id)
> +{
> +       DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 10,
> +                           .class_id = TC_H_MAKE(1UL << 16, 1));
> +       struct bpf_tc_attach_id id = {};
> +       struct bpf_tc_info info = {};
> +       int ret;
> +
> +       ret = bpf_tc_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
> +       if (!ASSERT_EQ(ret, 0, "bpf_tc_attach"))
> +               return ret;
> +
> +       ret = bpf_tc_get_info(LO_IFINDEX, parent_id, &id, &info);
> +       if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
> +               goto end;
> +
> +       if (!ASSERT_EQ(info.id.handle, id.handle, "handle mismatch") ||
> +           !ASSERT_EQ(info.id.priority, id.priority, "priority mismatch") ||
> +           !ASSERT_EQ(info.id.handle, 1, "handle incorrect") ||
> +           !ASSERT_EQ(info.chain_index, 0, "chain_index incorrect") ||
> +           !ASSERT_EQ(info.id.priority, 10, "priority incorrect") ||
> +           !ASSERT_EQ(info.class_id, TC_H_MAKE(1UL << 16, 1),
> +                      "class_id incorrect") ||
> +           !ASSERT_EQ(info.protocol, ETH_P_ALL, "protocol incorrect"))
> +               goto end;
> +
> +       opts.replace = true;
> +       ret = bpf_tc_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
> +       if (!ASSERT_EQ(ret, 0, "bpf_tc_attach in replace mode"))
> +               return ret;
> +
> +       /* Demonstrate changing attributes */
> +       opts.class_id = TC_H_MAKE(1UL << 16, 2);
> +
> +       ret = bpf_tc_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
> +       if (!ASSERT_EQ(ret, 0, "bpf_tc attach in replace mode"))
> +               goto end;
> +
> +       ret = bpf_tc_get_info(LO_IFINDEX, parent_id, &id, &info);
> +       if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
> +               goto end;
> +
> +       if (!ASSERT_EQ(info.class_id, TC_H_MAKE(1UL << 16, 2),
> +                      "class_id incorrect after replace"))
> +               goto end;
> +       if (!ASSERT_EQ(info.bpf_flags & TCA_BPF_FLAG_ACT_DIRECT, 1,
> +                      "direct action mode not set"))
> +               goto end;
> +
> +end:
> +       ret = bpf_tc_detach(LO_IFINDEX, parent_id, &id);
> +       ASSERT_EQ(ret, 0, "detach failed");
> +       return ret;
> +}
> +
> +int test_tc_info(int fd)
> +{
> +       DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 10,
> +                           .class_id = TC_H_MAKE(1UL << 16, 1));
> +       struct bpf_tc_attach_id id = {}, old;
> +       struct bpf_tc_info info = {};
> +       int ret;
> +
> +       ret = bpf_tc_attach(fd, LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &opts, &id);
> +       if (!ASSERT_EQ(ret, 0, "bpf_tc_attach"))
> +               return ret;
> +       old = id;
> +
> +       ret = bpf_tc_get_info(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &id, &info);
> +       if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
> +               goto end_old;
> +
> +       if (!ASSERT_EQ(info.id.handle, id.handle, "handle mismatch") ||
> +           !ASSERT_EQ(info.id.priority, id.priority, "priority mismatch") ||
> +           !ASSERT_EQ(info.id.handle, 1, "handle incorrect") ||
> +           !ASSERT_EQ(info.chain_index, 0, "chain_index incorrect") ||
> +           !ASSERT_EQ(info.id.priority, 10, "priority incorrect") ||
> +           !ASSERT_EQ(info.class_id, TC_H_MAKE(1UL << 16, 1),
> +                      "class_id incorrect") ||
> +           !ASSERT_EQ(info.protocol, ETH_P_ALL, "protocol incorrect"))
> +               goto end_old;
> +
> +       /* choose a priority */
> +       opts.priority = 0;
> +       ret = bpf_tc_attach(fd, LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &opts, &id);
> +       if (!ASSERT_EQ(ret, 0, "bpf_tc_attach"))
> +               goto end_old;
> +
> +       ret = bpf_tc_get_info(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &id, &info);
> +       if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
> +               goto end;
> +
> +       if (!ASSERT_NEQ(id.priority, old.priority, "filter priority mismatch"))
> +               goto end;
> +       if (!ASSERT_EQ(info.id.priority, id.priority, "priority mismatch"))
> +               goto end;
> +
> +end:
> +       ret = bpf_tc_detach(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &id);
> +       ASSERT_EQ(ret, 0, "detach failed");
> +end_old:
> +       ret = bpf_tc_detach(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &old);
> +       ASSERT_EQ(ret, 0, "detach failed");
> +       return ret;
> +}
> +
> +void test_test_tc_bpf(void)

test_test_ tautology, drop one test?

> +{
> +       const char *file = "./test_tc_bpf_kern.o";

please use BPF skeleton instead, see lots of other selftests doing
that already. You won't even need find_program_by_{name,title}, among
other things.

> +       struct bpf_program *clsp;
> +       struct bpf_object *obj;
> +       int cls_fd, ret;
> +
> +       obj = bpf_object__open(file);
> +       if (!ASSERT_OK_PTR(obj, "bpf_object__open"))
> +               return;
> +
> +       clsp = bpf_object__find_program_by_title(obj, "classifier");
> +       if (!ASSERT_OK_PTR(clsp, "bpf_object__find_program_by_title"))
> +               goto end;
> +
> +       ret = bpf_object__load(obj);
> +       if (!ASSERT_EQ(ret, 0, "bpf_object__load"))
> +               goto end;
> +
> +       cls_fd = bpf_program__fd(clsp);
> +
> +       system("tc qdisc del dev lo clsact");

can this fail? also why is this necessary? it's still not possible to
do anything with only libbpf APIs?

> +
> +       ret = test_tc_internal(cls_fd, BPF_TC_CLSACT_INGRESS);
> +       if (!ASSERT_EQ(ret, 0, "test_tc_internal INGRESS"))
> +               goto end;
> +
> +       if (!ASSERT_EQ(system("tc qdisc del dev lo clsact"), 0,
> +                      "clsact qdisc delete failed"))
> +               goto end;
> +
> +       ret = test_tc_info(cls_fd);
> +       if (!ASSERT_EQ(ret, 0, "test_tc_info"))
> +               goto end;
> +
> +       if (!ASSERT_EQ(system("tc qdisc del dev lo clsact"), 0,
> +                      "clsact qdisc delete failed"))
> +               goto end;
> +
> +       ret = test_tc_internal(cls_fd, BPF_TC_CLSACT_EGRESS);
> +       if (!ASSERT_EQ(ret, 0, "test_tc_internal EGRESS"))
> +               goto end;
> +
> +       ASSERT_EQ(system("tc qdisc del dev lo clsact"), 0,
> +                 "clsact qdisc delete failed");
> +
> +end:
> +       bpf_object__close(obj);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
> new file mode 100644
> index 000000000000..18a3a7ed924a
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
> @@ -0,0 +1,12 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +
> +/* Dummy prog to test TC-BPF API */
> +
> +SEC("classifier")
> +int cls(struct __sk_buff *skb)
> +{
> +       return 0;
> +}
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-21 18:19   ` Andrii Nakryiko
@ 2021-04-21 19:48     ` Toke Høiland-Jørgensen
  2021-04-21 23:14       ` Daniel Borkmann
  0 siblings, 1 reply; 29+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-04-21 19:48 UTC (permalink / raw)
  To: Andrii Nakryiko, Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Networking

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Tue, Apr 20, 2021 at 12:37 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
>>
>> This adds functions that wrap the netlink API used for adding,
>> manipulating, and removing traffic control filters. These functions
>> operate directly on the loaded prog's fd, and return a handle to the
>> filter using an out parameter named id.
>>
>> The basic featureset is covered to allow for attaching and removal of
>> filters. Some additional features like TCA_BPF_POLICE and TCA_RATE for
>> the API have been omitted. These can added on top later by extending the
>> bpf_tc_opts struct.
>>
>> Support for binding actions directly to a classifier by passing them in
>> during filter creation has also been omitted for now. These actions have
>> an auto clean up property because their lifetime is bound to the filter
>> they are attached to. This can be added later, but was omitted for now
>> as direct action mode is a better alternative to it, which is enabled by
>> default.
>>
>> An API summary:
>>
>> bpf_tc_attach may be used to attach, and replace SCHED_CLS bpf
>> classifier. The protocol is always set as ETH_P_ALL. The replace option
>> in bpf_tc_opts is used to control replacement behavior.  Attachment
>> fails if filter with existing attributes already exists.
>>
>> bpf_tc_detach may be used to detach existing SCHED_CLS filter. The
>> bpf_tc_attach_id object filled in during attach must be passed in to the
>> detach functions for them to remove the filter and its attached
>> classififer correctly.
>>
>> bpf_tc_get_info is a helper that can be used to obtain attributes
>> for the filter and classififer.
>>
>> Examples:
>>
>>         struct bpf_tc_attach_id id = {};
>>         struct bpf_object *obj;
>>         struct bpf_program *p;
>>         int fd, r;
>>
>>         obj = bpf_object_open("foo.o");
>>         if (IS_ERR_OR_NULL(obj))
>>                 return PTR_ERR(obj);
>>
>>         p = bpf_object__find_program_by_title(obj, "classifier");
>>         if (IS_ERR_OR_NULL(p))
>>                 return PTR_ERR(p);
>>
>>         if (bpf_object__load(obj) < 0)
>>                 return -1;
>>
>>         fd = bpf_program__fd(p);
>>
>>         r = bpf_tc_attach(fd, if_nametoindex("lo"),
>>                           BPF_TC_CLSACT_INGRESS,
>>                           NULL, &id);
>>         if (r < 0)
>>                 return r;
>>
>> ... which is roughly equivalent to:
>>   # tc qdisc add dev lo clsact
>>   # tc filter add dev lo ingress bpf obj foo.o sec classifier da
>>
>> ... as direct action mode is always enabled.
>>
>> To replace an existing filter:
>>
>>         DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = id.handle,
>>                             .priority = id.priority, .replace = true);
>>         r = bpf_tc_attach(fd, if_nametoindex("lo"),
>>                           BPF_TC_CLSACT_INGRESS,
>>                           &opts, &id);
>>         if (r < 0)
>>                 return r;
>>
>> To obtain info of a particular filter, the example above can be extended
>> as follows:
>>
>>         struct bpf_tc_info info = {};
>>
>>         r = bpf_tc_get_info(if_nametoindex("lo"),
>>                             BPF_TC_CLSACT_INGRESS,
>>                             &id, &info);
>>         if (r < 0)
>>                 return r;
>>
>> ... where id corresponds to the bpf_tc_attach_id filled in during an
>> attach operation.
>>
>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
>> ---
>>  tools/lib/bpf/libbpf.h   |  44 ++++++
>>  tools/lib/bpf/libbpf.map |   3 +
>>  tools/lib/bpf/netlink.c  | 319 ++++++++++++++++++++++++++++++++++++++-
>>  3 files changed, 360 insertions(+), 6 deletions(-)
>>
>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>> index bec4e6a6e31d..b4ed6a41ea70 100644
>> --- a/tools/lib/bpf/libbpf.h
>> +++ b/tools/lib/bpf/libbpf.h
>> @@ -16,6 +16,8 @@
>>  #include <stdbool.h>
>>  #include <sys/types.h>  // for size_t
>>  #include <linux/bpf.h>
>> +#include <linux/pkt_sched.h>
>> +#include <linux/tc_act/tc_bpf.h>
>
> apart from those unused macros below, are these needed in public API header?
>
>>
>>  #include "libbpf_common.h"
>>
>> @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
>>  LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
>>  LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
>>
>> +/* Convenience macros for the clsact attach hooks */
>> +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
>> +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
>
> these seem to be used only internally, why exposing them in public
> API?

No they're "aliases" for when you want to attach the filter directly to
the interface (and thus install the clsact qdisc as the root). You can
also use the filter with an existing qdisc (most commonly HTB), in which
case you need to specify the qdisc handle as the root. We have a few
examples of this use case:

https://github.com/xdp-project/bpf-examples/tree/master/traffic-pacing-edt
and
https://github.com/xdp-project/xdp-cpumap-tc

>> +struct bpf_tc_opts {
>> +       size_t sz;
>> +       __u32 handle;
>> +       __u32 class_id;
>> +       __u16 priority;
>> +       bool replace;
>> +       size_t :0;
>> +};
>> +
>> +#define bpf_tc_opts__last_field replace
>> +
>> +/* Acts as a handle for an attached filter */
>> +struct bpf_tc_attach_id {
>> +       __u32 handle;
>> +       __u16 priority;
>> +};
>
> what are the chances that we'll need to grow this id struct? If that
> happens, how do we do that in a backward/forward compatible manner?
>
> if handle/prio are the only two ever necessary, we can actually use
> bpf_tc_opts to return them back to user (we do that with
> bpf_test_run_opts API). And then adjust detach/get_info methods to let
> pass those values.
>
> The whole idea of a struct for id just screams "compatibility problems
> down the road" at me. Does anyone else has any other opinion on this?

Well, *if* we ever want to extend them (e.g., to support other values of
the protocol field, if that ever becomes necessary), we'll probably also
want to make it possible to pass the same identifiers as options, so
just reusing the opts struct definitely makes sense!

>> +struct bpf_tc_info {
>> +       struct bpf_tc_attach_id id;
>> +       __u16 protocol;
>> +       __u32 chain_index;
>> +       __u32 prog_id;
>> +       __u8 tag[BPF_TAG_SIZE];
>> +       __u32 class_id;
>> +       __u32 bpf_flags;
>> +       __u32 bpf_flags_gen;
>> +};
>> +
>> +/* id is out parameter that will be written to, it must not be NULL */
>> +LIBBPF_API int bpf_tc_attach(int fd, __u32 ifindex, __u32 parent_id,
>
> so parent_id is INGRESS|EGRESS, right? Is that an obvious name for
> this parameter? I had to look at the code to understand what's
> expected. Is it possible that it will be anything other than INGRESS
> or EGRESS? If not `bool ingress` might be an option. Or perhaps enum
> bpf_tc_direction { BPF_TC_INGRESS, BPF_TC_EGRESS } is better still.

See above; the parent is the attach point, and you use the defines from
above if you just want to attach to the interface.

But maybe documenting this in a comment above the function signature
would be good (along with a bit of terminology from the TC world for
those coming from there) :)

>> +                            const struct bpf_tc_opts *opts,
>> +                            struct bpf_tc_attach_id *id);
>> +LIBBPF_API int bpf_tc_detach(__u32 ifindex, __u32 parent_id,
>> +                            const struct bpf_tc_attach_id *id);
>> +LIBBPF_API int bpf_tc_get_info(__u32 ifindex, __u32 parent_id,
>
> bpf_tc_query() to be more in line with attach/detach single-word
> verbs?

OK by me!

-Toke


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 3/3] libbpf: add selftests for TC-BPF API
  2021-04-21 18:24   ` Andrii Nakryiko
@ 2021-04-21 19:56     ` Kumar Kartikeya Dwivedi
  2021-04-21 20:38       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 29+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-21 19:56 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Networking

On Wed, Apr 21, 2021 at 11:54:18PM IST, Andrii Nakryiko wrote:
> On Tue, Apr 20, 2021 at 12:37 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > This adds some basic tests for the low level bpf_tc_* API.
> >
> > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  .../selftests/bpf/prog_tests/test_tc_bpf.c    | 169 ++++++++++++++++++
> >  .../selftests/bpf/progs/test_tc_bpf_kern.c    |  12 ++
> >  2 files changed, 181 insertions(+)
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
>
> we normally don't call prog_test's files with "test_" prefix, it can
> be just tc_bpf.c (or just tc.c)
>

Ok, will rename.

> >  create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
>
> we also don't typically call BPF source code files with _kern suffix,
> just test_tc_bpf.c would be more in line with most common case
>

Will rename.

> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
> > new file mode 100644
> > index 000000000000..563a3944553c
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
> > @@ -0,0 +1,169 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/bpf.h>
> > +#include <linux/err.h>
> > +#include <linux/limits.h>
> > +#include <bpf/libbpf.h>
> > +#include <errno.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <test_progs.h>
> > +#include <linux/if_ether.h>
> > +
> > +#define LO_IFINDEX 1
> > +
> > +static int test_tc_internal(int fd, __u32 parent_id)
> > +{
> > +       DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 10,
> > +                           .class_id = TC_H_MAKE(1UL << 16, 1));
> > +       struct bpf_tc_attach_id id = {};
> > +       struct bpf_tc_info info = {};
> > +       int ret;
> > +
> > +       ret = bpf_tc_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
> > +       if (!ASSERT_EQ(ret, 0, "bpf_tc_attach"))
> > +               return ret;
> > +
> > +       ret = bpf_tc_get_info(LO_IFINDEX, parent_id, &id, &info);
> > +       if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
> > +               goto end;
> > +
> > +       if (!ASSERT_EQ(info.id.handle, id.handle, "handle mismatch") ||
> > +           !ASSERT_EQ(info.id.priority, id.priority, "priority mismatch") ||
> > +           !ASSERT_EQ(info.id.handle, 1, "handle incorrect") ||
> > +           !ASSERT_EQ(info.chain_index, 0, "chain_index incorrect") ||
> > +           !ASSERT_EQ(info.id.priority, 10, "priority incorrect") ||
> > +           !ASSERT_EQ(info.class_id, TC_H_MAKE(1UL << 16, 1),
> > +                      "class_id incorrect") ||
> > +           !ASSERT_EQ(info.protocol, ETH_P_ALL, "protocol incorrect"))
> > +               goto end;
> > +
> > +       opts.replace = true;
> > +       ret = bpf_tc_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
> > +       if (!ASSERT_EQ(ret, 0, "bpf_tc_attach in replace mode"))
> > +               return ret;
> > +
> > +       /* Demonstrate changing attributes */
> > +       opts.class_id = TC_H_MAKE(1UL << 16, 2);
> > +
> > +       ret = bpf_tc_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
> > +       if (!ASSERT_EQ(ret, 0, "bpf_tc attach in replace mode"))
> > +               goto end;
> > +
> > +       ret = bpf_tc_get_info(LO_IFINDEX, parent_id, &id, &info);
> > +       if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
> > +               goto end;
> > +
> > +       if (!ASSERT_EQ(info.class_id, TC_H_MAKE(1UL << 16, 2),
> > +                      "class_id incorrect after replace"))
> > +               goto end;
> > +       if (!ASSERT_EQ(info.bpf_flags & TCA_BPF_FLAG_ACT_DIRECT, 1,
> > +                      "direct action mode not set"))
> > +               goto end;
> > +
> > +end:
> > +       ret = bpf_tc_detach(LO_IFINDEX, parent_id, &id);
> > +       ASSERT_EQ(ret, 0, "detach failed");
> > +       return ret;
> > +}
> > +
> > +int test_tc_info(int fd)
> > +{
> > +       DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 10,
> > +                           .class_id = TC_H_MAKE(1UL << 16, 1));
> > +       struct bpf_tc_attach_id id = {}, old;
> > +       struct bpf_tc_info info = {};
> > +       int ret;
> > +
> > +       ret = bpf_tc_attach(fd, LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &opts, &id);
> > +       if (!ASSERT_EQ(ret, 0, "bpf_tc_attach"))
> > +               return ret;
> > +       old = id;
> > +
> > +       ret = bpf_tc_get_info(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &id, &info);
> > +       if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
> > +               goto end_old;
> > +
> > +       if (!ASSERT_EQ(info.id.handle, id.handle, "handle mismatch") ||
> > +           !ASSERT_EQ(info.id.priority, id.priority, "priority mismatch") ||
> > +           !ASSERT_EQ(info.id.handle, 1, "handle incorrect") ||
> > +           !ASSERT_EQ(info.chain_index, 0, "chain_index incorrect") ||
> > +           !ASSERT_EQ(info.id.priority, 10, "priority incorrect") ||
> > +           !ASSERT_EQ(info.class_id, TC_H_MAKE(1UL << 16, 1),
> > +                      "class_id incorrect") ||
> > +           !ASSERT_EQ(info.protocol, ETH_P_ALL, "protocol incorrect"))
> > +               goto end_old;
> > +
> > +       /* choose a priority */
> > +       opts.priority = 0;
> > +       ret = bpf_tc_attach(fd, LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &opts, &id);
> > +       if (!ASSERT_EQ(ret, 0, "bpf_tc_attach"))
> > +               goto end_old;
> > +
> > +       ret = bpf_tc_get_info(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &id, &info);
> > +       if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
> > +               goto end;
> > +
> > +       if (!ASSERT_NEQ(id.priority, old.priority, "filter priority mismatch"))
> > +               goto end;
> > +       if (!ASSERT_EQ(info.id.priority, id.priority, "priority mismatch"))
> > +               goto end;
> > +
> > +end:
> > +       ret = bpf_tc_detach(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &id);
> > +       ASSERT_EQ(ret, 0, "detach failed");
> > +end_old:
> > +       ret = bpf_tc_detach(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &old);
> > +       ASSERT_EQ(ret, 0, "detach failed");
> > +       return ret;
> > +}
> > +
> > +void test_test_tc_bpf(void)
>
> test_test_ tautology, drop one test?
>

Ok.

> > +{
> > +       const char *file = "./test_tc_bpf_kern.o";
>
> please use BPF skeleton instead, see lots of other selftests doing
> that already. You won't even need find_program_by_{name,title}, among
> other things.
>

Sounds good, will change.

> > +       struct bpf_program *clsp;
> > +       struct bpf_object *obj;
> > +       int cls_fd, ret;
> > +
> > +       obj = bpf_object__open(file);
> > +       if (!ASSERT_OK_PTR(obj, "bpf_object__open"))
> > +               return;
> > +
> > +       clsp = bpf_object__find_program_by_title(obj, "classifier");
> > +       if (!ASSERT_OK_PTR(clsp, "bpf_object__find_program_by_title"))
> > +               goto end;
> > +
> > +       ret = bpf_object__load(obj);
> > +       if (!ASSERT_EQ(ret, 0, "bpf_object__load"))
> > +               goto end;
> > +
> > +       cls_fd = bpf_program__fd(clsp);
> > +
> > +       system("tc qdisc del dev lo clsact");
>
> can this fail? also why is this necessary? it's still not possible to

This is just removing any existing clsact qdisc since it will be setup by the
attach call, which is again removed later (where we do check if it fails, if it
does clsact qdisc was not setup, and something was wrong in that it returned 0
when the attach point was one of the clsact hooks).

We don't care about failure initially, since if it isn't present we'd just move
on to running the test.

> do anything with only libbpf APIs?
>

I don't think so, I can do the qdisc teardown using netlink in the selftest,
but that would mean duplicating a lot of code. I think expecting tc to be
present on the system is a reasonable assumption for this test.

> > +
> > +       ret = test_tc_internal(cls_fd, BPF_TC_CLSACT_INGRESS);
> > +       if (!ASSERT_EQ(ret, 0, "test_tc_internal INGRESS"))
> > +               goto end;
> > +
> > +       if (!ASSERT_EQ(system("tc qdisc del dev lo clsact"), 0,
> > +                      "clsact qdisc delete failed"))
> > +               goto end;
> > +
> > +       ret = test_tc_info(cls_fd);
> > +       if (!ASSERT_EQ(ret, 0, "test_tc_info"))
> > +               goto end;
> > +
> > +       if (!ASSERT_EQ(system("tc qdisc del dev lo clsact"), 0,
> > +                      "clsact qdisc delete failed"))
> > +               goto end;
> > +
> > +       ret = test_tc_internal(cls_fd, BPF_TC_CLSACT_EGRESS);
> > +       if (!ASSERT_EQ(ret, 0, "test_tc_internal EGRESS"))
> > +               goto end;
> > +
> > +       ASSERT_EQ(system("tc qdisc del dev lo clsact"), 0,
> > +                 "clsact qdisc delete failed");
> > +
> > +end:
> > +       bpf_object__close(obj);
> > +}
> > diff --git a/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
> > new file mode 100644
> > index 000000000000..18a3a7ed924a
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
> > @@ -0,0 +1,12 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/bpf.h>
> > +#include <bpf/bpf_helpers.h>
> > +
> > +/* Dummy prog to test TC-BPF API */
> > +
> > +SEC("classifier")
> > +int cls(struct __sk_buff *skb)
> > +{
> > +       return 0;
> > +}
> > --
> > 2.30.2
> >

--
Kartikeya

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 3/3] libbpf: add selftests for TC-BPF API
  2021-04-21 19:56     ` Kumar Kartikeya Dwivedi
@ 2021-04-21 20:38       ` Toke Høiland-Jørgensen
  2021-04-21 22:41         ` Daniel Borkmann
  0 siblings, 1 reply; 29+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-04-21 20:38 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Networking

Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:

> On Wed, Apr 21, 2021 at 11:54:18PM IST, Andrii Nakryiko wrote:
>> On Tue, Apr 20, 2021 at 12:37 PM Kumar Kartikeya Dwivedi
>> <memxor@gmail.com> wrote:
>> >
>> > This adds some basic tests for the low level bpf_tc_* API.
>> >
>> > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
>> > ---
>> >  .../selftests/bpf/prog_tests/test_tc_bpf.c    | 169 ++++++++++++++++++
>> >  .../selftests/bpf/progs/test_tc_bpf_kern.c    |  12 ++
>> >  2 files changed, 181 insertions(+)
>> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
>>
>> we normally don't call prog_test's files with "test_" prefix, it can
>> be just tc_bpf.c (or just tc.c)
>>
>
> Ok, will rename.
>
>> >  create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
>>
>> we also don't typically call BPF source code files with _kern suffix,
>> just test_tc_bpf.c would be more in line with most common case
>>
>
> Will rename.
>
>> >
>> > diff --git a/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
>> > new file mode 100644
>> > index 000000000000..563a3944553c
>> > --- /dev/null
>> > +++ b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
>> > @@ -0,0 +1,169 @@
>> > +// SPDX-License-Identifier: GPL-2.0
>> > +
>> > +#include <linux/bpf.h>
>> > +#include <linux/err.h>
>> > +#include <linux/limits.h>
>> > +#include <bpf/libbpf.h>
>> > +#include <errno.h>
>> > +#include <stdio.h>
>> > +#include <stdlib.h>
>> > +#include <test_progs.h>
>> > +#include <linux/if_ether.h>
>> > +
>> > +#define LO_IFINDEX 1
>> > +
>> > +static int test_tc_internal(int fd, __u32 parent_id)
>> > +{
>> > +       DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 10,
>> > +                           .class_id = TC_H_MAKE(1UL << 16, 1));
>> > +       struct bpf_tc_attach_id id = {};
>> > +       struct bpf_tc_info info = {};
>> > +       int ret;
>> > +
>> > +       ret = bpf_tc_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
>> > +       if (!ASSERT_EQ(ret, 0, "bpf_tc_attach"))
>> > +               return ret;
>> > +
>> > +       ret = bpf_tc_get_info(LO_IFINDEX, parent_id, &id, &info);
>> > +       if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
>> > +               goto end;
>> > +
>> > +       if (!ASSERT_EQ(info.id.handle, id.handle, "handle mismatch") ||
>> > +           !ASSERT_EQ(info.id.priority, id.priority, "priority mismatch") ||
>> > +           !ASSERT_EQ(info.id.handle, 1, "handle incorrect") ||
>> > +           !ASSERT_EQ(info.chain_index, 0, "chain_index incorrect") ||
>> > +           !ASSERT_EQ(info.id.priority, 10, "priority incorrect") ||
>> > +           !ASSERT_EQ(info.class_id, TC_H_MAKE(1UL << 16, 1),
>> > +                      "class_id incorrect") ||
>> > +           !ASSERT_EQ(info.protocol, ETH_P_ALL, "protocol incorrect"))
>> > +               goto end;
>> > +
>> > +       opts.replace = true;
>> > +       ret = bpf_tc_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
>> > +       if (!ASSERT_EQ(ret, 0, "bpf_tc_attach in replace mode"))
>> > +               return ret;
>> > +
>> > +       /* Demonstrate changing attributes */
>> > +       opts.class_id = TC_H_MAKE(1UL << 16, 2);
>> > +
>> > +       ret = bpf_tc_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
>> > +       if (!ASSERT_EQ(ret, 0, "bpf_tc attach in replace mode"))
>> > +               goto end;
>> > +
>> > +       ret = bpf_tc_get_info(LO_IFINDEX, parent_id, &id, &info);
>> > +       if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
>> > +               goto end;
>> > +
>> > +       if (!ASSERT_EQ(info.class_id, TC_H_MAKE(1UL << 16, 2),
>> > +                      "class_id incorrect after replace"))
>> > +               goto end;
>> > +       if (!ASSERT_EQ(info.bpf_flags & TCA_BPF_FLAG_ACT_DIRECT, 1,
>> > +                      "direct action mode not set"))
>> > +               goto end;
>> > +
>> > +end:
>> > +       ret = bpf_tc_detach(LO_IFINDEX, parent_id, &id);
>> > +       ASSERT_EQ(ret, 0, "detach failed");
>> > +       return ret;
>> > +}
>> > +
>> > +int test_tc_info(int fd)
>> > +{
>> > +       DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 10,
>> > +                           .class_id = TC_H_MAKE(1UL << 16, 1));
>> > +       struct bpf_tc_attach_id id = {}, old;
>> > +       struct bpf_tc_info info = {};
>> > +       int ret;
>> > +
>> > +       ret = bpf_tc_attach(fd, LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &opts, &id);
>> > +       if (!ASSERT_EQ(ret, 0, "bpf_tc_attach"))
>> > +               return ret;
>> > +       old = id;
>> > +
>> > +       ret = bpf_tc_get_info(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &id, &info);
>> > +       if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
>> > +               goto end_old;
>> > +
>> > +       if (!ASSERT_EQ(info.id.handle, id.handle, "handle mismatch") ||
>> > +           !ASSERT_EQ(info.id.priority, id.priority, "priority mismatch") ||
>> > +           !ASSERT_EQ(info.id.handle, 1, "handle incorrect") ||
>> > +           !ASSERT_EQ(info.chain_index, 0, "chain_index incorrect") ||
>> > +           !ASSERT_EQ(info.id.priority, 10, "priority incorrect") ||
>> > +           !ASSERT_EQ(info.class_id, TC_H_MAKE(1UL << 16, 1),
>> > +                      "class_id incorrect") ||
>> > +           !ASSERT_EQ(info.protocol, ETH_P_ALL, "protocol incorrect"))
>> > +               goto end_old;
>> > +
>> > +       /* choose a priority */
>> > +       opts.priority = 0;
>> > +       ret = bpf_tc_attach(fd, LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &opts, &id);
>> > +       if (!ASSERT_EQ(ret, 0, "bpf_tc_attach"))
>> > +               goto end_old;
>> > +
>> > +       ret = bpf_tc_get_info(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &id, &info);
>> > +       if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
>> > +               goto end;
>> > +
>> > +       if (!ASSERT_NEQ(id.priority, old.priority, "filter priority mismatch"))
>> > +               goto end;
>> > +       if (!ASSERT_EQ(info.id.priority, id.priority, "priority mismatch"))
>> > +               goto end;
>> > +
>> > +end:
>> > +       ret = bpf_tc_detach(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &id);
>> > +       ASSERT_EQ(ret, 0, "detach failed");
>> > +end_old:
>> > +       ret = bpf_tc_detach(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &old);
>> > +       ASSERT_EQ(ret, 0, "detach failed");
>> > +       return ret;
>> > +}
>> > +
>> > +void test_test_tc_bpf(void)
>>
>> test_test_ tautology, drop one test?
>>
>
> Ok.
>
>> > +{
>> > +       const char *file = "./test_tc_bpf_kern.o";
>>
>> please use BPF skeleton instead, see lots of other selftests doing
>> that already. You won't even need find_program_by_{name,title}, among
>> other things.
>>
>
> Sounds good, will change.
>
>> > +       struct bpf_program *clsp;
>> > +       struct bpf_object *obj;
>> > +       int cls_fd, ret;
>> > +
>> > +       obj = bpf_object__open(file);
>> > +       if (!ASSERT_OK_PTR(obj, "bpf_object__open"))
>> > +               return;
>> > +
>> > +       clsp = bpf_object__find_program_by_title(obj, "classifier");
>> > +       if (!ASSERT_OK_PTR(clsp, "bpf_object__find_program_by_title"))
>> > +               goto end;
>> > +
>> > +       ret = bpf_object__load(obj);
>> > +       if (!ASSERT_EQ(ret, 0, "bpf_object__load"))
>> > +               goto end;
>> > +
>> > +       cls_fd = bpf_program__fd(clsp);
>> > +
>> > +       system("tc qdisc del dev lo clsact");
>>
>> can this fail? also why is this necessary? it's still not possible to
>
> This is just removing any existing clsact qdisc since it will be setup by the
> attach call, which is again removed later (where we do check if it fails, if it
> does clsact qdisc was not setup, and something was wrong in that it returned 0
> when the attach point was one of the clsact hooks).
>
> We don't care about failure initially, since if it isn't present we'd just move
> on to running the test.
>
>> do anything with only libbpf APIs?
>>
>
> I don't think so, I can do the qdisc teardown using netlink in the selftest,
> but that would mean duplicating a lot of code. I think expecting tc to be
> present on the system is a reasonable assumption for this test.

So this stems from the fact that bpf_tc_detach() doesn't clean up the
clsact qdisc that is added by bpf_tc_attach(). I think we should fix
this.

Andrii, Kumar and I discussed this, and concluded that the best we can
do from userspace right now is query the number of filters before remove
and if there's only one, also remove the clsact qdisc. This is racy in
that a new filter can be attached between the check and the remove, but
to fix that we need a way for the filter to take the ref on the qdisc.
Since something like this will be needed for a bpf_link attach mode as
well, we figured that can be added as part of such a series, and we'll
just do the best-effort thing now. WDYT?

-Toke


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 3/3] libbpf: add selftests for TC-BPF API
  2021-04-21 20:38       ` Toke Høiland-Jørgensen
@ 2021-04-21 22:41         ` Daniel Borkmann
  0 siblings, 0 replies; 29+ messages in thread
From: Daniel Borkmann @ 2021-04-21 22:41 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Kumar Kartikeya Dwivedi,
	Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Networking

On 4/21/21 10:38 PM, Toke Høiland-Jørgensen wrote:
> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
> 
>> On Wed, Apr 21, 2021 at 11:54:18PM IST, Andrii Nakryiko wrote:
>>> On Tue, Apr 20, 2021 at 12:37 PM Kumar Kartikeya Dwivedi
>>> <memxor@gmail.com> wrote:
>>>>
>>>> This adds some basic tests for the low level bpf_tc_* API.
>>>>
>>>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
>>>> ---
>>>>   .../selftests/bpf/prog_tests/test_tc_bpf.c    | 169 ++++++++++++++++++
>>>>   .../selftests/bpf/progs/test_tc_bpf_kern.c    |  12 ++
>>>>   2 files changed, 181 insertions(+)
>>>>   create mode 100644 tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
>>>
>>> we normally don't call prog_test's files with "test_" prefix, it can
>>> be just tc_bpf.c (or just tc.c)
>>>
>>
>> Ok, will rename.
>>
>>>>   create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
>>>
>>> we also don't typically call BPF source code files with _kern suffix,
>>> just test_tc_bpf.c would be more in line with most common case
>>>
>>
>> Will rename.
>>
>>>>
>>>> diff --git a/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
>>>> new file mode 100644
>>>> index 000000000000..563a3944553c
>>>> --- /dev/null
>>>> +++ b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
>>>> @@ -0,0 +1,169 @@
>>>> +// SPDX-License-Identifier: GPL-2.0
>>>> +
>>>> +#include <linux/bpf.h>
>>>> +#include <linux/err.h>
>>>> +#include <linux/limits.h>
>>>> +#include <bpf/libbpf.h>
>>>> +#include <errno.h>
>>>> +#include <stdio.h>
>>>> +#include <stdlib.h>
>>>> +#include <test_progs.h>
>>>> +#include <linux/if_ether.h>
>>>> +
>>>> +#define LO_IFINDEX 1
>>>> +
>>>> +static int test_tc_internal(int fd, __u32 parent_id)
>>>> +{
>>>> +       DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 10,
>>>> +                           .class_id = TC_H_MAKE(1UL << 16, 1));
>>>> +       struct bpf_tc_attach_id id = {};
>>>> +       struct bpf_tc_info info = {};
>>>> +       int ret;
>>>> +
>>>> +       ret = bpf_tc_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
>>>> +       if (!ASSERT_EQ(ret, 0, "bpf_tc_attach"))
>>>> +               return ret;
>>>> +
>>>> +       ret = bpf_tc_get_info(LO_IFINDEX, parent_id, &id, &info);
>>>> +       if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
>>>> +               goto end;
>>>> +
>>>> +       if (!ASSERT_EQ(info.id.handle, id.handle, "handle mismatch") ||
>>>> +           !ASSERT_EQ(info.id.priority, id.priority, "priority mismatch") ||
>>>> +           !ASSERT_EQ(info.id.handle, 1, "handle incorrect") ||
>>>> +           !ASSERT_EQ(info.chain_index, 0, "chain_index incorrect") ||
>>>> +           !ASSERT_EQ(info.id.priority, 10, "priority incorrect") ||
>>>> +           !ASSERT_EQ(info.class_id, TC_H_MAKE(1UL << 16, 1),
>>>> +                      "class_id incorrect") ||
>>>> +           !ASSERT_EQ(info.protocol, ETH_P_ALL, "protocol incorrect"))
>>>> +               goto end;
>>>> +
>>>> +       opts.replace = true;
>>>> +       ret = bpf_tc_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
>>>> +       if (!ASSERT_EQ(ret, 0, "bpf_tc_attach in replace mode"))
>>>> +               return ret;
>>>> +
>>>> +       /* Demonstrate changing attributes */
>>>> +       opts.class_id = TC_H_MAKE(1UL << 16, 2);
>>>> +
>>>> +       ret = bpf_tc_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
>>>> +       if (!ASSERT_EQ(ret, 0, "bpf_tc attach in replace mode"))
>>>> +               goto end;
>>>> +
>>>> +       ret = bpf_tc_get_info(LO_IFINDEX, parent_id, &id, &info);
>>>> +       if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
>>>> +               goto end;
>>>> +
>>>> +       if (!ASSERT_EQ(info.class_id, TC_H_MAKE(1UL << 16, 2),
>>>> +                      "class_id incorrect after replace"))
>>>> +               goto end;
>>>> +       if (!ASSERT_EQ(info.bpf_flags & TCA_BPF_FLAG_ACT_DIRECT, 1,
>>>> +                      "direct action mode not set"))
>>>> +               goto end;
>>>> +
>>>> +end:
>>>> +       ret = bpf_tc_detach(LO_IFINDEX, parent_id, &id);
>>>> +       ASSERT_EQ(ret, 0, "detach failed");
>>>> +       return ret;
>>>> +}
>>>> +
>>>> +int test_tc_info(int fd)
>>>> +{
>>>> +       DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 10,
>>>> +                           .class_id = TC_H_MAKE(1UL << 16, 1));
>>>> +       struct bpf_tc_attach_id id = {}, old;
>>>> +       struct bpf_tc_info info = {};
>>>> +       int ret;
>>>> +
>>>> +       ret = bpf_tc_attach(fd, LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &opts, &id);
>>>> +       if (!ASSERT_EQ(ret, 0, "bpf_tc_attach"))
>>>> +               return ret;
>>>> +       old = id;
>>>> +
>>>> +       ret = bpf_tc_get_info(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &id, &info);
>>>> +       if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
>>>> +               goto end_old;
>>>> +
>>>> +       if (!ASSERT_EQ(info.id.handle, id.handle, "handle mismatch") ||
>>>> +           !ASSERT_EQ(info.id.priority, id.priority, "priority mismatch") ||
>>>> +           !ASSERT_EQ(info.id.handle, 1, "handle incorrect") ||
>>>> +           !ASSERT_EQ(info.chain_index, 0, "chain_index incorrect") ||
>>>> +           !ASSERT_EQ(info.id.priority, 10, "priority incorrect") ||
>>>> +           !ASSERT_EQ(info.class_id, TC_H_MAKE(1UL << 16, 1),
>>>> +                      "class_id incorrect") ||
>>>> +           !ASSERT_EQ(info.protocol, ETH_P_ALL, "protocol incorrect"))
>>>> +               goto end_old;
>>>> +
>>>> +       /* choose a priority */
>>>> +       opts.priority = 0;
>>>> +       ret = bpf_tc_attach(fd, LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &opts, &id);
>>>> +       if (!ASSERT_EQ(ret, 0, "bpf_tc_attach"))
>>>> +               goto end_old;
>>>> +
>>>> +       ret = bpf_tc_get_info(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &id, &info);
>>>> +       if (!ASSERT_EQ(ret, 0, "bpf_tc_get_info"))
>>>> +               goto end;
>>>> +
>>>> +       if (!ASSERT_NEQ(id.priority, old.priority, "filter priority mismatch"))
>>>> +               goto end;
>>>> +       if (!ASSERT_EQ(info.id.priority, id.priority, "priority mismatch"))
>>>> +               goto end;
>>>> +
>>>> +end:
>>>> +       ret = bpf_tc_detach(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &id);
>>>> +       ASSERT_EQ(ret, 0, "detach failed");
>>>> +end_old:
>>>> +       ret = bpf_tc_detach(LO_IFINDEX, BPF_TC_CLSACT_INGRESS, &old);
>>>> +       ASSERT_EQ(ret, 0, "detach failed");
>>>> +       return ret;
>>>> +}
>>>> +
>>>> +void test_test_tc_bpf(void)
>>>
>>> test_test_ tautology, drop one test?
>>>
>>
>> Ok.
>>
>>>> +{
>>>> +       const char *file = "./test_tc_bpf_kern.o";
>>>
>>> please use BPF skeleton instead, see lots of other selftests doing
>>> that already. You won't even need find_program_by_{name,title}, among
>>> other things.
>>>
>>
>> Sounds good, will change.
>>
>>>> +       struct bpf_program *clsp;
>>>> +       struct bpf_object *obj;
>>>> +       int cls_fd, ret;
>>>> +
>>>> +       obj = bpf_object__open(file);
>>>> +       if (!ASSERT_OK_PTR(obj, "bpf_object__open"))
>>>> +               return;
>>>> +
>>>> +       clsp = bpf_object__find_program_by_title(obj, "classifier");
>>>> +       if (!ASSERT_OK_PTR(clsp, "bpf_object__find_program_by_title"))
>>>> +               goto end;
>>>> +
>>>> +       ret = bpf_object__load(obj);
>>>> +       if (!ASSERT_EQ(ret, 0, "bpf_object__load"))
>>>> +               goto end;
>>>> +
>>>> +       cls_fd = bpf_program__fd(clsp);
>>>> +
>>>> +       system("tc qdisc del dev lo clsact");
>>>
>>> can this fail? also why is this necessary? it's still not possible to
>>
>> This is just removing any existing clsact qdisc since it will be setup by the
>> attach call, which is again removed later (where we do check if it fails, if it
>> does clsact qdisc was not setup, and something was wrong in that it returned 0
>> when the attach point was one of the clsact hooks).
>>
>> We don't care about failure initially, since if it isn't present we'd just move
>> on to running the test.
>>
>>> do anything with only libbpf APIs?
>>
>> I don't think so, I can do the qdisc teardown using netlink in the selftest,
>> but that would mean duplicating a lot of code. I think expecting tc to be
>> present on the system is a reasonable assumption for this test.
> 
> So this stems from the fact that bpf_tc_detach() doesn't clean up the
> clsact qdisc that is added by bpf_tc_attach(). I think we should fix
> this.

I was wondering whether it would make sense to add a bpf_tc_ctx_init() and
bpf_tc_ctx_destroy() API which would auto-create the sch_clsact qdisc, plus
provide a 'handle' for bpf_tc_attach() and bpf_tc_detach(), and for the other
one, it would delete the qdisc. Otoh, if an empty sch_clsact obj is sitting
around while not being great (given minor effect on fast-path), it also doesn't
harm /overly/ much. Maybe a /poor/ analogy could be that if you open a v6 socket,
it pulls in the ipv6 module, but also doesn't remove it when you close() it.
Anyway, but for the test itself, given you can define prio etc, I don't think
it would even need the system() call?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-20 19:37 ` [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
  2021-04-21  6:58   ` Yonghong Song
  2021-04-21 18:19   ` Andrii Nakryiko
@ 2021-04-21 22:59   ` Daniel Borkmann
  2021-04-21 23:08     ` Kumar Kartikeya Dwivedi
  2021-04-22  3:43     ` Andrii Nakryiko
  2 siblings, 2 replies; 29+ messages in thread
From: Daniel Borkmann @ 2021-04-21 22:59 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev

On 4/20/21 9:37 PM, Kumar Kartikeya Dwivedi wrote:
[...]
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index bec4e6a6e31d..b4ed6a41ea70 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -16,6 +16,8 @@
>   #include <stdbool.h>
>   #include <sys/types.h>  // for size_t
>   #include <linux/bpf.h>
> +#include <linux/pkt_sched.h>
> +#include <linux/tc_act/tc_bpf.h>
>   
>   #include "libbpf_common.h"
>   
> @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
>   LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
>   LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
>   
> +/* Convenience macros for the clsact attach hooks */
> +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
> +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)

I would abstract those away into an enum, plus avoid having to pull in
linux/pkt_sched.h and linux/tc_act/tc_bpf.h from main libbpf.h header.

Just add a enum { BPF_TC_DIR_INGRESS, BPF_TC_DIR_EGRESS, } and then the
concrete tc bits (TC_H_MAKE()) can be translated internally.

> +struct bpf_tc_opts {
> +	size_t sz;

Is this set anywhere?

> +	__u32 handle;
> +	__u32 class_id;

I'd remove class_id from here as well given in direct-action a BPF prog can
set it if needed.

> +	__u16 priority;
> +	bool replace;
> +	size_t :0;

What's the rationale for this padding?

> +};
> +
> +#define bpf_tc_opts__last_field replace
> +
> +/* Acts as a handle for an attached filter */
> +struct bpf_tc_attach_id {

nit: maybe bpf_tc_ctx

> +	__u32 handle;
> +	__u16 priority;
> +};
> +
> +struct bpf_tc_info {
> +	struct bpf_tc_attach_id id;
> +	__u16 protocol;
> +	__u32 chain_index;
> +	__u32 prog_id;
> +	__u8 tag[BPF_TAG_SIZE];
> +	__u32 class_id;
> +	__u32 bpf_flags;
> +	__u32 bpf_flags_gen;

Given we do not yet have any setters e.g. for offload, etc, the one thing
I'd see useful and crucial initially is prog_id.

The protocol, chain_index, and I would also include tag should be dropped.
Similarly class_id given my earlier statement, and flags I would extend once
this lib API would support offloading progs.

> +};
> +
> +/* id is out parameter that will be written to, it must not be NULL */
> +LIBBPF_API int bpf_tc_attach(int fd, __u32 ifindex, __u32 parent_id,
> +			     const struct bpf_tc_opts *opts,
> +			     struct bpf_tc_attach_id *id);
> +LIBBPF_API int bpf_tc_detach(__u32 ifindex, __u32 parent_id,
> +			     const struct bpf_tc_attach_id *id);
> +LIBBPF_API int bpf_tc_get_info(__u32 ifindex, __u32 parent_id,
> +			       const struct bpf_tc_attach_id *id,
> +			       struct bpf_tc_info *info);

As per above, for parent_id I'd replace with dir enum.

> +
>   #ifdef __cplusplus
>   } /* extern "C" */
>   #endif

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-21 22:59   ` Daniel Borkmann
@ 2021-04-21 23:08     ` Kumar Kartikeya Dwivedi
  2021-04-21 23:21       ` Daniel Borkmann
  2021-04-22  3:43     ` Andrii Nakryiko
  1 sibling, 1 reply; 29+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-21 23:08 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev

On Thu, Apr 22, 2021 at 04:29:28AM IST, Daniel Borkmann wrote:
> On 4/20/21 9:37 PM, Kumar Kartikeya Dwivedi wrote:
> [...]
> > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > index bec4e6a6e31d..b4ed6a41ea70 100644
> > --- a/tools/lib/bpf/libbpf.h
> > +++ b/tools/lib/bpf/libbpf.h
> > @@ -16,6 +16,8 @@
> >   #include <stdbool.h>
> >   #include <sys/types.h>  // for size_t
> >   #include <linux/bpf.h>
> > +#include <linux/pkt_sched.h>
> > +#include <linux/tc_act/tc_bpf.h>
> >   #include "libbpf_common.h"
> > @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
> >   LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
> >   LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
> > +/* Convenience macros for the clsact attach hooks */
> > +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
> > +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
>
> I would abstract those away into an enum, plus avoid having to pull in
> linux/pkt_sched.h and linux/tc_act/tc_bpf.h from main libbpf.h header.
>
> Just add a enum { BPF_TC_DIR_INGRESS, BPF_TC_DIR_EGRESS, } and then the
> concrete tc bits (TC_H_MAKE()) can be translated internally.
>

Ok, will do.

> > +struct bpf_tc_opts {
> > +	size_t sz;
>
> Is this set anywhere?
>

This is needed by the OPTS_* infrastructure.

> > +	__u32 handle;
> > +	__u32 class_id;
>
> I'd remove class_id from here as well given in direct-action a BPF prog can
> set it if needed.
>

Ok, makes sense.

> > +	__u16 priority;
> > +	bool replace;
> > +	size_t :0;
>
> What's the rationale for this padding?
>

dde7b3f5f2f4 ("libbpf: Add explicit padding to bpf_xdp_set_link_opts")

> > +};
> > +
> > +#define bpf_tc_opts__last_field replace
> > +
> > +/* Acts as a handle for an attached filter */
> > +struct bpf_tc_attach_id {
>
> nit: maybe bpf_tc_ctx
>

Noted.

> > +	__u32 handle;
> > +	__u16 priority;
> > +};
> > +
> > +struct bpf_tc_info {
> > +	struct bpf_tc_attach_id id;
> > +	__u16 protocol;
> > +	__u32 chain_index;
> > +	__u32 prog_id;
> > +	__u8 tag[BPF_TAG_SIZE];
> > +	__u32 class_id;
> > +	__u32 bpf_flags;
> > +	__u32 bpf_flags_gen;
>
> Given we do not yet have any setters e.g. for offload, etc, the one thing
> I'd see useful and crucial initially is prog_id.
>
> The protocol, chain_index, and I would also include tag should be dropped.

A future user of this API needs to know the tag, so I would like to keep that.
The rest we can drop, and probably document the default values explicitly.

> Similarly class_id given my earlier statement, and flags I would extend once
> this lib API would support offloading progs.
>
> > +};
> > +
> > +/* id is out parameter that will be written to, it must not be NULL */
> > +LIBBPF_API int bpf_tc_attach(int fd, __u32 ifindex, __u32 parent_id,
> > +			     const struct bpf_tc_opts *opts,
> > +			     struct bpf_tc_attach_id *id);
> > +LIBBPF_API int bpf_tc_detach(__u32 ifindex, __u32 parent_id,
> > +			     const struct bpf_tc_attach_id *id);
> > +LIBBPF_API int bpf_tc_get_info(__u32 ifindex, __u32 parent_id,
> > +			       const struct bpf_tc_attach_id *id,
> > +			       struct bpf_tc_info *info);
>
> As per above, for parent_id I'd replace with dir enum.
>
> > +
> >   #ifdef __cplusplus
> >   } /* extern "C" */
> >   #endif

--
Kartikeya

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-21 19:48     ` Toke Høiland-Jørgensen
@ 2021-04-21 23:14       ` Daniel Borkmann
  2021-04-22  9:08         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Borkmann @ 2021-04-21 23:14 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Andrii Nakryiko,
	Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Networking

On 4/21/21 9:48 PM, Toke Høiland-Jørgensen wrote:
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> On Tue, Apr 20, 2021 at 12:37 PM Kumar Kartikeya Dwivedi
>> <memxor@gmail.com> wrote:
[...]
>>> ---
>>>   tools/lib/bpf/libbpf.h   |  44 ++++++
>>>   tools/lib/bpf/libbpf.map |   3 +
>>>   tools/lib/bpf/netlink.c  | 319 ++++++++++++++++++++++++++++++++++++++-
>>>   3 files changed, 360 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>>> index bec4e6a6e31d..b4ed6a41ea70 100644
>>> --- a/tools/lib/bpf/libbpf.h
>>> +++ b/tools/lib/bpf/libbpf.h
>>> @@ -16,6 +16,8 @@
>>>   #include <stdbool.h>
>>>   #include <sys/types.h>  // for size_t
>>>   #include <linux/bpf.h>
>>> +#include <linux/pkt_sched.h>
>>> +#include <linux/tc_act/tc_bpf.h>
>>
>> apart from those unused macros below, are these needed in public API header?
>>
>>>   #include "libbpf_common.h"
>>>
>>> @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
>>>   LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
>>>   LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
>>>
>>> +/* Convenience macros for the clsact attach hooks */
>>> +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
>>> +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
>>
>> these seem to be used only internally, why exposing them in public
>> API?
> 
> No they're "aliases" for when you want to attach the filter directly to
> the interface (and thus install the clsact qdisc as the root). You can
> also use the filter with an existing qdisc (most commonly HTB), in which
> case you need to specify the qdisc handle as the root. We have a few
> examples of this use case:
> 
> https://github.com/xdp-project/bpf-examples/tree/master/traffic-pacing-edt
> and
> https://github.com/xdp-project/xdp-cpumap-tc

I'm a bit puzzled, could you elaborate on your use case on why you wouldn't
use the tc egress hook for those especially given it's guaranteed to run
outside of root qdisc lock?

Some pointers as well:

  - http://vger.kernel.org/lpc-bpf2018.html#session-1
  - https://netdevconf.info/0x14/session.html?talk-replacing-HTB-with-EDT-and-BPF
  - https://cilium.io/blog/2020/11/10/cilium-19#bwmanager

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-21 23:08     ` Kumar Kartikeya Dwivedi
@ 2021-04-21 23:21       ` Daniel Borkmann
  2021-04-21 23:30         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Borkmann @ 2021-04-21 23:21 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev

On 4/22/21 1:08 AM, Kumar Kartikeya Dwivedi wrote:
> On Thu, Apr 22, 2021 at 04:29:28AM IST, Daniel Borkmann wrote:
>> On 4/20/21 9:37 PM, Kumar Kartikeya Dwivedi wrote:
>> [...]
>>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>>> index bec4e6a6e31d..b4ed6a41ea70 100644
>>> --- a/tools/lib/bpf/libbpf.h
>>> +++ b/tools/lib/bpf/libbpf.h
>>> @@ -16,6 +16,8 @@
>>>    #include <stdbool.h>
>>>    #include <sys/types.h>  // for size_t
>>>    #include <linux/bpf.h>
>>> +#include <linux/pkt_sched.h>
>>> +#include <linux/tc_act/tc_bpf.h>
>>>    #include "libbpf_common.h"
>>> @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
>>>    LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
>>>    LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
>>> +/* Convenience macros for the clsact attach hooks */
>>> +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
>>> +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
>>
>> I would abstract those away into an enum, plus avoid having to pull in
>> linux/pkt_sched.h and linux/tc_act/tc_bpf.h from main libbpf.h header.
>>
>> Just add a enum { BPF_TC_DIR_INGRESS, BPF_TC_DIR_EGRESS, } and then the
>> concrete tc bits (TC_H_MAKE()) can be translated internally.
> 
> Ok, will do.
> 
>>> +struct bpf_tc_opts {
>>> +	size_t sz;
>>
>> Is this set anywhere?
> 
> This is needed by the OPTS_* infrastructure.
> 
>>> +	__u32 handle;
>>> +	__u32 class_id;
>>
>> I'd remove class_id from here as well given in direct-action a BPF prog can
>> set it if needed.
> 
> Ok, makes sense.
> 
>>> +	__u16 priority;
>>> +	bool replace;
>>> +	size_t :0;
>>
>> What's the rationale for this padding?
> 
> dde7b3f5f2f4 ("libbpf: Add explicit padding to bpf_xdp_set_link_opts")

Hm, fair enough.

>>> +};
>>> +
>>> +#define bpf_tc_opts__last_field replace
>>> +
>>> +/* Acts as a handle for an attached filter */
>>> +struct bpf_tc_attach_id {
>>
>> nit: maybe bpf_tc_ctx
>>
> 
> Noted.
> 
>>> +	__u32 handle;
>>> +	__u16 priority;
>>> +};
>>> +
>>> +struct bpf_tc_info {
>>> +	struct bpf_tc_attach_id id;
>>> +	__u16 protocol;
>>> +	__u32 chain_index;
>>> +	__u32 prog_id;
>>> +	__u8 tag[BPF_TAG_SIZE];
>>> +	__u32 class_id;
>>> +	__u32 bpf_flags;
>>> +	__u32 bpf_flags_gen;
>>
>> Given we do not yet have any setters e.g. for offload, etc, the one thing
>> I'd see useful and crucial initially is prog_id.
>>
>> The protocol, chain_index, and I would also include tag should be dropped.
> 
> A future user of this API needs to know the tag, so I would like to keep that.
> The rest we can drop, and probably document the default values explicitly.

Couldn't this be added along with the future patch for the [future] user?

The tag should be the tag of the prog itself, so if you have prog_id, you
could also retrieve the same tag from the prog. The tag was basically from
the early days where we didn't have bpf_prog_get_info_by_fd().

What does that future user need to do different here?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-21 23:21       ` Daniel Borkmann
@ 2021-04-21 23:30         ` Kumar Kartikeya Dwivedi
  2021-04-21 23:41           ` Daniel Borkmann
  0 siblings, 1 reply; 29+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-21 23:30 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Shaun Crampton, Jesper Dangaard Brouer, netdev

On Thu, Apr 22, 2021 at 04:51:55AM IST, Daniel Borkmann wrote:
> On 4/22/21 1:08 AM, Kumar Kartikeya Dwivedi wrote:
> > On Thu, Apr 22, 2021 at 04:29:28AM IST, Daniel Borkmann wrote:
> > > On 4/20/21 9:37 PM, Kumar Kartikeya Dwivedi wrote:
> > > [...]
> > > > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > > > index bec4e6a6e31d..b4ed6a41ea70 100644
> > > > --- a/tools/lib/bpf/libbpf.h
> > > > +++ b/tools/lib/bpf/libbpf.h
> > > > @@ -16,6 +16,8 @@
> > > >    #include <stdbool.h>
> > > >    #include <sys/types.h>  // for size_t
> > > >    #include <linux/bpf.h>
> > > > +#include <linux/pkt_sched.h>
> > > > +#include <linux/tc_act/tc_bpf.h>
> > > >    #include "libbpf_common.h"
> > > > @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
> > > >    LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
> > > >    LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
> > > > +/* Convenience macros for the clsact attach hooks */
> > > > +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
> > > > +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
> > >
> > > I would abstract those away into an enum, plus avoid having to pull in
> > > linux/pkt_sched.h and linux/tc_act/tc_bpf.h from main libbpf.h header.
> > >
> > > Just add a enum { BPF_TC_DIR_INGRESS, BPF_TC_DIR_EGRESS, } and then the
> > > concrete tc bits (TC_H_MAKE()) can be translated internally.
> >
> > Ok, will do.
> >
> > > > +struct bpf_tc_opts {
> > > > +	size_t sz;
> > >
> > > Is this set anywhere?
> >
> > This is needed by the OPTS_* infrastructure.
> >
> > > > +	__u32 handle;
> > > > +	__u32 class_id;
> > >
> > > I'd remove class_id from here as well given in direct-action a BPF prog can
> > > set it if needed.
> >
> > Ok, makes sense.
> >
> > > > +	__u16 priority;
> > > > +	bool replace;
> > > > +	size_t :0;
> > >
> > > What's the rationale for this padding?
> >
> > dde7b3f5f2f4 ("libbpf: Add explicit padding to bpf_xdp_set_link_opts")
>
> Hm, fair enough.
>
> > > > +};
> > > > +
> > > > +#define bpf_tc_opts__last_field replace
> > > > +
> > > > +/* Acts as a handle for an attached filter */
> > > > +struct bpf_tc_attach_id {
> > >
> > > nit: maybe bpf_tc_ctx
> > >
> >
> > Noted.
> >
> > > > +	__u32 handle;
> > > > +	__u16 priority;
> > > > +};
> > > > +
> > > > +struct bpf_tc_info {
> > > > +	struct bpf_tc_attach_id id;
> > > > +	__u16 protocol;
> > > > +	__u32 chain_index;
> > > > +	__u32 prog_id;
> > > > +	__u8 tag[BPF_TAG_SIZE];
> > > > +	__u32 class_id;
> > > > +	__u32 bpf_flags;
> > > > +	__u32 bpf_flags_gen;
> > >
> > > Given we do not yet have any setters e.g. for offload, etc, the one thing
> > > I'd see useful and crucial initially is prog_id.
> > >
> > > The protocol, chain_index, and I would also include tag should be dropped.
> >
> > A future user of this API needs to know the tag, so I would like to keep that.
> > The rest we can drop, and probably document the default values explicitly.
>
> Couldn't this be added along with the future patch for the [future] user?
>

True.

> The tag should be the tag of the prog itself, so if you have prog_id, you
> could also retrieve the same tag from the prog. The tag was basically from
> the early days where we didn't have bpf_prog_get_info_by_fd().
>
> What does that future user need to do different here?
>

From Shaun Crampton:
"My particular use case is to load a program, link it with its maps and then
check if its tag matches the existing program on the interface (and if so, abort
the update)"

Also CC'd, they would be able to elaborate better, and whether or not dropping
it is ok.

> Thanks,
> Daniel

--
Kartikeya

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-21 23:30         ` Kumar Kartikeya Dwivedi
@ 2021-04-21 23:41           ` Daniel Borkmann
  2021-04-22  9:47             ` Shaun Crampton
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Borkmann @ 2021-04-21 23:41 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Shaun Crampton, Jesper Dangaard Brouer, netdev

On 4/22/21 1:30 AM, Kumar Kartikeya Dwivedi wrote:
> On Thu, Apr 22, 2021 at 04:51:55AM IST, Daniel Borkmann wrote:
>> On 4/22/21 1:08 AM, Kumar Kartikeya Dwivedi wrote:
>>> On Thu, Apr 22, 2021 at 04:29:28AM IST, Daniel Borkmann wrote:
>>>> On 4/20/21 9:37 PM, Kumar Kartikeya Dwivedi wrote:
>>>> [...]
>>>>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>>>>> index bec4e6a6e31d..b4ed6a41ea70 100644
>>>>> --- a/tools/lib/bpf/libbpf.h
>>>>> +++ b/tools/lib/bpf/libbpf.h
>>>>> @@ -16,6 +16,8 @@
>>>>>     #include <stdbool.h>
>>>>>     #include <sys/types.h>  // for size_t
>>>>>     #include <linux/bpf.h>
>>>>> +#include <linux/pkt_sched.h>
>>>>> +#include <linux/tc_act/tc_bpf.h>
>>>>>     #include "libbpf_common.h"
>>>>> @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
>>>>>     LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
>>>>>     LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
>>>>> +/* Convenience macros for the clsact attach hooks */
>>>>> +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
>>>>> +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
>>>>
>>>> I would abstract those away into an enum, plus avoid having to pull in
>>>> linux/pkt_sched.h and linux/tc_act/tc_bpf.h from main libbpf.h header.
>>>>
>>>> Just add a enum { BPF_TC_DIR_INGRESS, BPF_TC_DIR_EGRESS, } and then the
>>>> concrete tc bits (TC_H_MAKE()) can be translated internally.
>>>
>>> Ok, will do.
>>>
>>>>> +struct bpf_tc_opts {
>>>>> +	size_t sz;
>>>>
>>>> Is this set anywhere?
>>>
>>> This is needed by the OPTS_* infrastructure.
>>>
>>>>> +	__u32 handle;
>>>>> +	__u32 class_id;
>>>>
>>>> I'd remove class_id from here as well given in direct-action a BPF prog can
>>>> set it if needed.
>>>
>>> Ok, makes sense.
>>>
>>>>> +	__u16 priority;
>>>>> +	bool replace;
>>>>> +	size_t :0;
>>>>
>>>> What's the rationale for this padding?
>>>
>>> dde7b3f5f2f4 ("libbpf: Add explicit padding to bpf_xdp_set_link_opts")
>>
>> Hm, fair enough.
>>
>>>>> +};
>>>>> +
>>>>> +#define bpf_tc_opts__last_field replace
>>>>> +
>>>>> +/* Acts as a handle for an attached filter */
>>>>> +struct bpf_tc_attach_id {
>>>>
>>>> nit: maybe bpf_tc_ctx
>>>
>>> Noted.
>>>
>>>>> +	__u32 handle;
>>>>> +	__u16 priority;
>>>>> +};
>>>>> +
>>>>> +struct bpf_tc_info {
>>>>> +	struct bpf_tc_attach_id id;
>>>>> +	__u16 protocol;
>>>>> +	__u32 chain_index;
>>>>> +	__u32 prog_id;
>>>>> +	__u8 tag[BPF_TAG_SIZE];
>>>>> +	__u32 class_id;
>>>>> +	__u32 bpf_flags;
>>>>> +	__u32 bpf_flags_gen;
>>>>
>>>> Given we do not yet have any setters e.g. for offload, etc, the one thing
>>>> I'd see useful and crucial initially is prog_id.
>>>>
>>>> The protocol, chain_index, and I would also include tag should be dropped.
>>>
>>> A future user of this API needs to know the tag, so I would like to keep that.
>>> The rest we can drop, and probably document the default values explicitly.
>>
>> Couldn't this be added along with the future patch for the [future] user?
> 
> True.
> 
>> The tag should be the tag of the prog itself, so if you have prog_id, you
>> could also retrieve the same tag from the prog. The tag was basically from
>> the early days where we didn't have bpf_prog_get_info_by_fd().
>>
>> What does that future user need to do different here?
> 
>  From Shaun Crampton:
> "My particular use case is to load a program, link it with its maps and then
> check if its tag matches the existing program on the interface (and if so, abort
> the update)"
> 
> Also CC'd, they would be able to elaborate better, and whether or not dropping
> it is ok.

Nope, just get it from the prog itself.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-21 17:06     ` Kumar Kartikeya Dwivedi
@ 2021-04-22  1:56       ` Yonghong Song
  0 siblings, 0 replies; 29+ messages in thread
From: Yonghong Song @ 2021-04-22  1:56 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, netdev



On 4/21/21 10:06 AM, Kumar Kartikeya Dwivedi wrote:
> On Wed, Apr 21, 2021 at 12:28:01PM IST, Yonghong Song wrote:
>>
>>
>> On 4/20/21 12:37 PM, Kumar Kartikeya Dwivedi wrote:
>>> This adds functions that wrap the netlink API used for adding,
>>> manipulating, and removing traffic control filters. These functions
>>> operate directly on the loaded prog's fd, and return a handle to the
>>> filter using an out parameter named id.
>>>
>>> The basic featureset is covered to allow for attaching and removal of
>>> filters. Some additional features like TCA_BPF_POLICE and TCA_RATE for
>>> the API have been omitted. These can added on top later by extending the
>>
>> "later" => "layer"?
>>
> 
> No, I meant that other options can be added on top of this series by someone
> later. I'll reword it.
> 
[...]
>>>
>>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
>>> ---
>>>    tools/lib/bpf/libbpf.h   |  44 ++++++
>>>    tools/lib/bpf/libbpf.map |   3 +
>>>    tools/lib/bpf/netlink.c  | 319 ++++++++++++++++++++++++++++++++++++++-
>>>    3 files changed, 360 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>>> index bec4e6a6e31d..b4ed6a41ea70 100644
>>> --- a/tools/lib/bpf/libbpf.h
>>> +++ b/tools/lib/bpf/libbpf.h
>>> @@ -16,6 +16,8 @@
>>>    #include <stdbool.h>
>>>    #include <sys/types.h>  // for size_t
>>>    #include <linux/bpf.h>
>>> +#include <linux/pkt_sched.h>
>>> +#include <linux/tc_act/tc_bpf.h>
>>>    #include "libbpf_common.h"
>>> @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
>>>    LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
>>>    LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
>>> +/* Convenience macros for the clsact attach hooks */
>>> +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
>>> +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
>>> +
>>> +struct bpf_tc_opts {
>>> +	size_t sz;
>>> +	__u32 handle;
>>> +	__u32 class_id;
>>> +	__u16 priority;
>>> +	bool replace;
>>> +	size_t :0;
>>
>> Did you see any error without "size_t :0"?
>>
> 
> Not really, I just added this considering this recent change:
> dde7b3f5f2f4 ("libbpf: Add explicit padding to bpf_xdp_set_link_opts")

Thanks. That is fine with me then.

> 
>>> +};
>>> +
>>> +#define bpf_tc_opts__last_field replace
>>> +
>>> +/* Acts as a handle for an attached filter */
>>> +struct bpf_tc_attach_id {
>>> +	__u32 handle;
>>> +	__u16 priority;
>>> +};
>>> +
>> [...]
> 
> --
> Kartikeya
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-21 22:59   ` Daniel Borkmann
  2021-04-21 23:08     ` Kumar Kartikeya Dwivedi
@ 2021-04-22  3:43     ` Andrii Nakryiko
  2021-04-22 15:35       ` Daniel Borkmann
  1 sibling, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2021-04-22  3:43 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kumar Kartikeya Dwivedi, bpf, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Networking

On Wed, Apr 21, 2021 at 3:59 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 4/20/21 9:37 PM, Kumar Kartikeya Dwivedi wrote:
> [...]
> > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > index bec4e6a6e31d..b4ed6a41ea70 100644
> > --- a/tools/lib/bpf/libbpf.h
> > +++ b/tools/lib/bpf/libbpf.h
> > @@ -16,6 +16,8 @@
> >   #include <stdbool.h>
> >   #include <sys/types.h>  // for size_t
> >   #include <linux/bpf.h>
> > +#include <linux/pkt_sched.h>
> > +#include <linux/tc_act/tc_bpf.h>
> >
> >   #include "libbpf_common.h"
> >
> > @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
> >   LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
> >   LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
> >
> > +/* Convenience macros for the clsact attach hooks */
> > +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
> > +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
>
> I would abstract those away into an enum, plus avoid having to pull in
> linux/pkt_sched.h and linux/tc_act/tc_bpf.h from main libbpf.h header.
>
> Just add a enum { BPF_TC_DIR_INGRESS, BPF_TC_DIR_EGRESS, } and then the
> concrete tc bits (TC_H_MAKE()) can be translated internally.
>
> > +struct bpf_tc_opts {
> > +     size_t sz;
>
> Is this set anywhere?
>
> > +     __u32 handle;
> > +     __u32 class_id;
>
> I'd remove class_id from here as well given in direct-action a BPF prog can
> set it if needed.
>
> > +     __u16 priority;
> > +     bool replace;
> > +     size_t :0;
>
> What's the rationale for this padding?
>
> > +};
> > +
> > +#define bpf_tc_opts__last_field replace
> > +
> > +/* Acts as a handle for an attached filter */
> > +struct bpf_tc_attach_id {
>
> nit: maybe bpf_tc_ctx

ok, so wait. It seems like apart from INGRESS|EGRESS enum and ifindex,
everything else is optional and/or has some sane defaults, right? So
this bpf_tc_attach_id or bpf_tc_ctx seems a bit artificial construct
and it will cause problems for extending this.

So if my understanding is correct, I'd get rid of it completely. As I
said previously, opts allow returning parameters back, so if user
didn't specify handle and priority and kernel picks values on user's
behalf, we can return them in the same opts fields.

For detach, again, ifindex and INGRESS|EGRESS is sufficient, but if
user want to provide more detailed parameters, we should do that
through extensible opts. That way we can keep growing this easily,
plus simple cases will remain simple.

Similarly bpf_tc_info below, there is no need to have struct
bpf_tc_attach_id id; field, just have handle and priority right there.
And bpf_tc_info should use OPTS framework for extensibility (though
opts name doesn't fit it very well, but it is still nice for
extensibility and for doing optional input/output params).

Does this make sense? Am I missing something crucial here?

The general rule with any new structs added to libbpf APIs is to
either be 100% (ok, 99.99%) sure that they will never be changed, or
do forward/backward compatible OPTS. Any other thing is pain and calls
for symbol versioning, which we are trying really hard to avoid.

>
> > +     __u32 handle;
> > +     __u16 priority;
> > +};
> > +
> > +struct bpf_tc_info {
> > +     struct bpf_tc_attach_id id;
> > +     __u16 protocol;
> > +     __u32 chain_index;
> > +     __u32 prog_id;
> > +     __u8 tag[BPF_TAG_SIZE];
> > +     __u32 class_id;
> > +     __u32 bpf_flags;
> > +     __u32 bpf_flags_gen;
>
> Given we do not yet have any setters e.g. for offload, etc, the one thing
> I'd see useful and crucial initially is prog_id.
>
> The protocol, chain_index, and I would also include tag should be dropped.
> Similarly class_id given my earlier statement, and flags I would extend once
> this lib API would support offloading progs.
>
> > +};
> > +
> > +/* id is out parameter that will be written to, it must not be NULL */
> > +LIBBPF_API int bpf_tc_attach(int fd, __u32 ifindex, __u32 parent_id,
> > +                          const struct bpf_tc_opts *opts,
> > +                          struct bpf_tc_attach_id *id);
> > +LIBBPF_API int bpf_tc_detach(__u32 ifindex, __u32 parent_id,
> > +                          const struct bpf_tc_attach_id *id);
> > +LIBBPF_API int bpf_tc_get_info(__u32 ifindex, __u32 parent_id,
> > +                            const struct bpf_tc_attach_id *id,
> > +                            struct bpf_tc_info *info);
>
> As per above, for parent_id I'd replace with dir enum.
>
> > +
> >   #ifdef __cplusplus
> >   } /* extern "C" */
> >   #endif

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-21 23:14       ` Daniel Borkmann
@ 2021-04-22  9:08         ` Toke Høiland-Jørgensen
  2021-04-22 11:51           ` Daniel Borkmann
  0 siblings, 1 reply; 29+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-04-22  9:08 UTC (permalink / raw)
  To: Daniel Borkmann, Andrii Nakryiko, Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Networking

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 4/21/21 9:48 PM, Toke Høiland-Jørgensen wrote:
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>> On Tue, Apr 20, 2021 at 12:37 PM Kumar Kartikeya Dwivedi
>>> <memxor@gmail.com> wrote:
> [...]
>>>> ---
>>>>   tools/lib/bpf/libbpf.h   |  44 ++++++
>>>>   tools/lib/bpf/libbpf.map |   3 +
>>>>   tools/lib/bpf/netlink.c  | 319 ++++++++++++++++++++++++++++++++++++++-
>>>>   3 files changed, 360 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>>>> index bec4e6a6e31d..b4ed6a41ea70 100644
>>>> --- a/tools/lib/bpf/libbpf.h
>>>> +++ b/tools/lib/bpf/libbpf.h
>>>> @@ -16,6 +16,8 @@
>>>>   #include <stdbool.h>
>>>>   #include <sys/types.h>  // for size_t
>>>>   #include <linux/bpf.h>
>>>> +#include <linux/pkt_sched.h>
>>>> +#include <linux/tc_act/tc_bpf.h>
>>>
>>> apart from those unused macros below, are these needed in public API header?
>>>
>>>>   #include "libbpf_common.h"
>>>>
>>>> @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
>>>>   LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
>>>>   LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
>>>>
>>>> +/* Convenience macros for the clsact attach hooks */
>>>> +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
>>>> +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
>>>
>>> these seem to be used only internally, why exposing them in public
>>> API?
>> 
>> No they're "aliases" for when you want to attach the filter directly to
>> the interface (and thus install the clsact qdisc as the root). You can
>> also use the filter with an existing qdisc (most commonly HTB), in which
>> case you need to specify the qdisc handle as the root. We have a few
>> examples of this use case:
>> 
>> https://github.com/xdp-project/bpf-examples/tree/master/traffic-pacing-edt
>> and
>> https://github.com/xdp-project/xdp-cpumap-tc
>
> I'm a bit puzzled, could you elaborate on your use case on why you wouldn't
> use the tc egress hook for those especially given it's guaranteed to run
> outside of root qdisc lock?

Jesper can correct me if I'm wrong, but I think the first one of the
links above is basically his implementation of just that EDT-based
shaper. And it works reasonably well, except you don't get the nice
per-flow scheduling and sparse flow prioritisation like in FQ-CoDel
unless you implement that yourself in BPF when you set the timestamps
(and that is by no means trivial to implement).

So if you want to use any of the features of the existing qdiscs (I have
also been suggesting to people that they use tc_bpf if they want to
customise sch_cake's notion of flows or shaping tiers), you need to be
able to attach the filter to an existing qdisc. Sure, this means you're
still stuck behind the qdisc lock, but for some applications that is
fine (not everything is a data centre, some devices don't have that many
CPUs anyway; and as the second example above shows, you can get around
the qdisc lock by some clever use of partitioning of flows using
cpumap).

So what this boils down to is, we should keep the 'parent' parameter not
just as an egress/ingress enum, but also as a field the user can fill
out. I'm fine with moving the latter into the opts struct, though, so
maybe the function parameter can be an enum like:

enum bpf_tc_attach_point {
  BPF_TC_CLSACT_INGRESS,
  BPF_TC_CLSACT_EGRESS,
  BPF_TC_QDISC_PARENT
};

where if you set the last one you have to fill in the parent in opts?

-Toke


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-21 23:41           ` Daniel Borkmann
@ 2021-04-22  9:47             ` Shaun Crampton
  2021-04-22 11:26               ` Daniel Borkmann
  0 siblings, 1 reply; 29+ messages in thread
From: Shaun Crampton @ 2021-04-22  9:47 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kumar Kartikeya Dwivedi, bpf, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, netdev

> Nope, just get it from the prog itself.

Looks like the API returns the prog ID, so from that we can look up the prog
and then get its tag? If so that meets our needs.  Alternatively, if
the API allows
for atomic replacement of a BPF program with another, that'd also work for us.

The use case is that our daemon is restarted and it doesn't know what BPF
program was previously loaded. We want to make sure we end up with the
latest version loaded, either by doing a seamless replace with the
latest version
or by checking if the loaded version matches the latest.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-22  9:47             ` Shaun Crampton
@ 2021-04-22 11:26               ` Daniel Borkmann
  0 siblings, 0 replies; 29+ messages in thread
From: Daniel Borkmann @ 2021-04-22 11:26 UTC (permalink / raw)
  To: Shaun Crampton
  Cc: Kumar Kartikeya Dwivedi, bpf, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, netdev

On 4/22/21 11:47 AM, Shaun Crampton wrote:
>> Nope, just get it from the prog itself.
> 
> Looks like the API returns the prog ID, so from that we can look up the prog
> and then get its tag? If so that meets our needs.  Alternatively, if
> the API allows
> for atomic replacement of a BPF program with another, that'd also work for us.

Both is the case: from prog ID you can already retrieve that same tag, and progs
can be atomically replaced with the current API code.

Exposing the tag in here otherwise feels just odd/wrong from a design PoV, explain
that to a user of this API on /why/ such field is in the tc API when it already
can be retrieved via bpf_prog_get_info_by_fd(), as in, what is so special on this
field in the tc API here (aside from legacy reasons when there was no mentioned
helper [which we don't need to support given it dates way too far back]) ... I
cannot. ;-) Hence lets drop it from there.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-22  9:08         ` Toke Høiland-Jørgensen
@ 2021-04-22 11:51           ` Daniel Borkmann
  2021-04-22 12:46             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Borkmann @ 2021-04-22 11:51 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Andrii Nakryiko,
	Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Networking

On 4/22/21 11:08 AM, Toke Høiland-Jørgensen wrote:
> Daniel Borkmann <daniel@iogearbox.net> writes:
>> On 4/21/21 9:48 PM, Toke Høiland-Jørgensen wrote:
>>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>>> On Tue, Apr 20, 2021 at 12:37 PM Kumar Kartikeya Dwivedi
>>>> <memxor@gmail.com> wrote:
>> [...]
>>>>> ---
>>>>>    tools/lib/bpf/libbpf.h   |  44 ++++++
>>>>>    tools/lib/bpf/libbpf.map |   3 +
>>>>>    tools/lib/bpf/netlink.c  | 319 ++++++++++++++++++++++++++++++++++++++-
>>>>>    3 files changed, 360 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>>>>> index bec4e6a6e31d..b4ed6a41ea70 100644
>>>>> --- a/tools/lib/bpf/libbpf.h
>>>>> +++ b/tools/lib/bpf/libbpf.h
>>>>> @@ -16,6 +16,8 @@
>>>>>    #include <stdbool.h>
>>>>>    #include <sys/types.h>  // for size_t
>>>>>    #include <linux/bpf.h>
>>>>> +#include <linux/pkt_sched.h>
>>>>> +#include <linux/tc_act/tc_bpf.h>
>>>>
>>>> apart from those unused macros below, are these needed in public API header?
>>>>
>>>>>    #include "libbpf_common.h"
>>>>>
>>>>> @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
>>>>>    LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
>>>>>    LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
>>>>>
>>>>> +/* Convenience macros for the clsact attach hooks */
>>>>> +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
>>>>> +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
>>>>
>>>> these seem to be used only internally, why exposing them in public
>>>> API?
>>>
>>> No they're "aliases" for when you want to attach the filter directly to
>>> the interface (and thus install the clsact qdisc as the root). You can
>>> also use the filter with an existing qdisc (most commonly HTB), in which
>>> case you need to specify the qdisc handle as the root. We have a few
>>> examples of this use case:
>>>
>>> https://github.com/xdp-project/bpf-examples/tree/master/traffic-pacing-edt
>>> and
>>> https://github.com/xdp-project/xdp-cpumap-tc
>>
>> I'm a bit puzzled, could you elaborate on your use case on why you wouldn't
>> use the tc egress hook for those especially given it's guaranteed to run
>> outside of root qdisc lock?
> 
> Jesper can correct me if I'm wrong, but I think the first one of the
> links above is basically his implementation of just that EDT-based
> shaper. And it works reasonably well, except you don't get the nice
> per-flow scheduling and sparse flow prioritisation like in FQ-CoDel
> unless you implement that yourself in BPF when you set the timestamps
> (and that is by no means trivial to implement).
> 
> So if you want to use any of the features of the existing qdiscs (I have
> also been suggesting to people that they use tc_bpf if they want to
> customise sch_cake's notion of flows or shaping tiers), you need to be
> able to attach the filter to an existing qdisc. Sure, this means you're
> still stuck behind the qdisc lock, but for some applications that is
> fine (not everything is a data centre, some devices don't have that many
> CPUs anyway; and as the second example above shows, you can get around
> the qdisc lock by some clever use of partitioning of flows using
> cpumap).
> 
> So what this boils down to is, we should keep the 'parent' parameter not
> just as an egress/ingress enum, but also as a field the user can fill
> out. I'm fine with moving the latter into the opts struct, though, so
> maybe the function parameter can be an enum like:
> 
> enum bpf_tc_attach_point {
>    BPF_TC_CLSACT_INGRESS,
>    BPF_TC_CLSACT_EGRESS,
>    BPF_TC_QDISC_PARENT
> };
> 
> where if you set the last one you have to fill in the parent in opts?

Fair enough, I still think this is a bit backwards and should be discouraged
given the constraints, but if you have an actual need for it ... I'd rather
simplify API naming, the fact that it's clsact is an implementation detail
and shouldn't matter to a user, like:

enum bpf_tc_attach_point {
	BPF_TC_INGRESS,
	BPF_TC_EGRESS,
	BPF_TC_CUSTOM_PARENT,
};

For BPF_TC_INGRESS and BPF_TC_EGRESS, I would enforce opts parent parameter
to be /zero/ from the API.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-22 11:51           ` Daniel Borkmann
@ 2021-04-22 12:46             ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 29+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-04-22 12:46 UTC (permalink / raw)
  To: Daniel Borkmann, Andrii Nakryiko, Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Networking

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 4/22/21 11:08 AM, Toke Høiland-Jørgensen wrote:
>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>> On 4/21/21 9:48 PM, Toke Høiland-Jørgensen wrote:
>>>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>>>> On Tue, Apr 20, 2021 at 12:37 PM Kumar Kartikeya Dwivedi
>>>>> <memxor@gmail.com> wrote:
>>> [...]
>>>>>> ---
>>>>>>    tools/lib/bpf/libbpf.h   |  44 ++++++
>>>>>>    tools/lib/bpf/libbpf.map |   3 +
>>>>>>    tools/lib/bpf/netlink.c  | 319 ++++++++++++++++++++++++++++++++++++++-
>>>>>>    3 files changed, 360 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>>>>>> index bec4e6a6e31d..b4ed6a41ea70 100644
>>>>>> --- a/tools/lib/bpf/libbpf.h
>>>>>> +++ b/tools/lib/bpf/libbpf.h
>>>>>> @@ -16,6 +16,8 @@
>>>>>>    #include <stdbool.h>
>>>>>>    #include <sys/types.h>  // for size_t
>>>>>>    #include <linux/bpf.h>
>>>>>> +#include <linux/pkt_sched.h>
>>>>>> +#include <linux/tc_act/tc_bpf.h>
>>>>>
>>>>> apart from those unused macros below, are these needed in public API header?
>>>>>
>>>>>>    #include "libbpf_common.h"
>>>>>>
>>>>>> @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
>>>>>>    LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
>>>>>>    LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
>>>>>>
>>>>>> +/* Convenience macros for the clsact attach hooks */
>>>>>> +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
>>>>>> +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
>>>>>
>>>>> these seem to be used only internally, why exposing them in public
>>>>> API?
>>>>
>>>> No they're "aliases" for when you want to attach the filter directly to
>>>> the interface (and thus install the clsact qdisc as the root). You can
>>>> also use the filter with an existing qdisc (most commonly HTB), in which
>>>> case you need to specify the qdisc handle as the root. We have a few
>>>> examples of this use case:
>>>>
>>>> https://github.com/xdp-project/bpf-examples/tree/master/traffic-pacing-edt
>>>> and
>>>> https://github.com/xdp-project/xdp-cpumap-tc
>>>
>>> I'm a bit puzzled, could you elaborate on your use case on why you wouldn't
>>> use the tc egress hook for those especially given it's guaranteed to run
>>> outside of root qdisc lock?
>> 
>> Jesper can correct me if I'm wrong, but I think the first one of the
>> links above is basically his implementation of just that EDT-based
>> shaper. And it works reasonably well, except you don't get the nice
>> per-flow scheduling and sparse flow prioritisation like in FQ-CoDel
>> unless you implement that yourself in BPF when you set the timestamps
>> (and that is by no means trivial to implement).
>> 
>> So if you want to use any of the features of the existing qdiscs (I have
>> also been suggesting to people that they use tc_bpf if they want to
>> customise sch_cake's notion of flows or shaping tiers), you need to be
>> able to attach the filter to an existing qdisc. Sure, this means you're
>> still stuck behind the qdisc lock, but for some applications that is
>> fine (not everything is a data centre, some devices don't have that many
>> CPUs anyway; and as the second example above shows, you can get around
>> the qdisc lock by some clever use of partitioning of flows using
>> cpumap).
>> 
>> So what this boils down to is, we should keep the 'parent' parameter not
>> just as an egress/ingress enum, but also as a field the user can fill
>> out. I'm fine with moving the latter into the opts struct, though, so
>> maybe the function parameter can be an enum like:
>> 
>> enum bpf_tc_attach_point {
>>    BPF_TC_CLSACT_INGRESS,
>>    BPF_TC_CLSACT_EGRESS,
>>    BPF_TC_QDISC_PARENT
>> };
>> 
>> where if you set the last one you have to fill in the parent in opts?
>
> Fair enough, I still think this is a bit backwards and should be discouraged
> given the constraints, but if you have an actual need for it ... I'd rather
> simplify API naming, the fact that it's clsact is an implementation detail
> and shouldn't matter to a user, like:
>
> enum bpf_tc_attach_point {
> 	BPF_TC_INGRESS,
> 	BPF_TC_EGRESS,
> 	BPF_TC_CUSTOM_PARENT,
> };
>
> For BPF_TC_INGRESS and BPF_TC_EGRESS, I would enforce opts parent parameter
> to be /zero/ from the API.

OK, SGTM :)

-Toke


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-22  3:43     ` Andrii Nakryiko
@ 2021-04-22 15:35       ` Daniel Borkmann
  2021-04-22 18:28         ` Andrii Nakryiko
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Borkmann @ 2021-04-22 15:35 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Kumar Kartikeya Dwivedi, bpf, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Networking

On 4/22/21 5:43 AM, Andrii Nakryiko wrote:
> On Wed, Apr 21, 2021 at 3:59 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 4/20/21 9:37 PM, Kumar Kartikeya Dwivedi wrote:
>> [...]
>>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>>> index bec4e6a6e31d..b4ed6a41ea70 100644
>>> --- a/tools/lib/bpf/libbpf.h
>>> +++ b/tools/lib/bpf/libbpf.h
>>> @@ -16,6 +16,8 @@
>>>    #include <stdbool.h>
>>>    #include <sys/types.h>  // for size_t
>>>    #include <linux/bpf.h>
>>> +#include <linux/pkt_sched.h>
>>> +#include <linux/tc_act/tc_bpf.h>
>>>
>>>    #include "libbpf_common.h"
>>>
>>> @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
>>>    LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
>>>    LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
>>>
>>> +/* Convenience macros for the clsact attach hooks */
>>> +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
>>> +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
>>
>> I would abstract those away into an enum, plus avoid having to pull in
>> linux/pkt_sched.h and linux/tc_act/tc_bpf.h from main libbpf.h header.
>>
>> Just add a enum { BPF_TC_DIR_INGRESS, BPF_TC_DIR_EGRESS, } and then the
>> concrete tc bits (TC_H_MAKE()) can be translated internally.
>>
>>> +struct bpf_tc_opts {
>>> +     size_t sz;
>>
>> Is this set anywhere?
>>
>>> +     __u32 handle;
>>> +     __u32 class_id;
>>
>> I'd remove class_id from here as well given in direct-action a BPF prog can
>> set it if needed.
>>
>>> +     __u16 priority;
>>> +     bool replace;
>>> +     size_t :0;
>>
>> What's the rationale for this padding?
>>
>>> +};
>>> +
>>> +#define bpf_tc_opts__last_field replace
>>> +
>>> +/* Acts as a handle for an attached filter */
>>> +struct bpf_tc_attach_id {
>>
>> nit: maybe bpf_tc_ctx
> 
> ok, so wait. It seems like apart from INGRESS|EGRESS enum and ifindex,
> everything else is optional and/or has some sane defaults, right? So
> this bpf_tc_attach_id or bpf_tc_ctx seems a bit artificial construct
> and it will cause problems for extending this.
> 
> So if my understanding is correct, I'd get rid of it completely. As I
> said previously, opts allow returning parameters back, so if user
> didn't specify handle and priority and kernel picks values on user's
> behalf, we can return them in the same opts fields.
> 
> For detach, again, ifindex and INGRESS|EGRESS is sufficient, but if
> user want to provide more detailed parameters, we should do that
> through extensible opts. That way we can keep growing this easily,
> plus simple cases will remain simple.
> 
> Similarly bpf_tc_info below, there is no need to have struct
> bpf_tc_attach_id id; field, just have handle and priority right there.
> And bpf_tc_info should use OPTS framework for extensibility (though
> opts name doesn't fit it very well, but it is still nice for
> extensibility and for doing optional input/output params).
> 
> Does this make sense? Am I missing something crucial here?

I would probably keep the handle + priority in there; maybe if both are 0,
we could fix it to some default value internally, but without those it might
be a bit hard if people want to build a 'pipeline' of cls_bpf progs if they
need/want to.

Potentially, one could fixate the handle itself, and then allow to specify
different priorities for it such that when a BPF prog returns a TC_ACT_UNSPEC,
it will exec the next one inside that cls_bpf instance, every other TC_ACT_*
opcode will terminate the processing. Technically, only priority would really
be needed (unless you combine multiple different classifiers from tc side on
the ingress/egress hook which is not great to begin with, tbh).

> The general rule with any new structs added to libbpf APIs is to
> either be 100% (ok, 99.99%) sure that they will never be changed, or
> do forward/backward compatible OPTS. Any other thing is pain and calls
> for symbol versioning, which we are trying really hard to avoid.
> 
>>> +     __u32 handle;
>>> +     __u16 priority;
>>> +};
>>> +
>>> +struct bpf_tc_info {
>>> +     struct bpf_tc_attach_id id;
>>> +     __u16 protocol;
>>> +     __u32 chain_index;
>>> +     __u32 prog_id;
>>> +     __u8 tag[BPF_TAG_SIZE];
>>> +     __u32 class_id;
>>> +     __u32 bpf_flags;
>>> +     __u32 bpf_flags_gen;
>>
>> Given we do not yet have any setters e.g. for offload, etc, the one thing
>> I'd see useful and crucial initially is prog_id.
>>
>> The protocol, chain_index, and I would also include tag should be dropped.
>> Similarly class_id given my earlier statement, and flags I would extend once
>> this lib API would support offloading progs.
>>
>>> +};
>>> +
>>> +/* id is out parameter that will be written to, it must not be NULL */
>>> +LIBBPF_API int bpf_tc_attach(int fd, __u32 ifindex, __u32 parent_id,
>>> +                          const struct bpf_tc_opts *opts,
>>> +                          struct bpf_tc_attach_id *id);
>>> +LIBBPF_API int bpf_tc_detach(__u32 ifindex, __u32 parent_id,
>>> +                          const struct bpf_tc_attach_id *id);
>>> +LIBBPF_API int bpf_tc_get_info(__u32 ifindex, __u32 parent_id,
>>> +                            const struct bpf_tc_attach_id *id,
>>> +                            struct bpf_tc_info *info);
>>
>> As per above, for parent_id I'd replace with dir enum.
>>
>>> +
>>>    #ifdef __cplusplus
>>>    } /* extern "C" */
>>>    #endif


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
  2021-04-22 15:35       ` Daniel Borkmann
@ 2021-04-22 18:28         ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2021-04-22 18:28 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kumar Kartikeya Dwivedi, bpf, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Networking

On Thu, Apr 22, 2021 at 8:35 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 4/22/21 5:43 AM, Andrii Nakryiko wrote:
> > On Wed, Apr 21, 2021 at 3:59 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >> On 4/20/21 9:37 PM, Kumar Kartikeya Dwivedi wrote:
> >> [...]
> >>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> >>> index bec4e6a6e31d..b4ed6a41ea70 100644
> >>> --- a/tools/lib/bpf/libbpf.h
> >>> +++ b/tools/lib/bpf/libbpf.h
> >>> @@ -16,6 +16,8 @@
> >>>    #include <stdbool.h>
> >>>    #include <sys/types.h>  // for size_t
> >>>    #include <linux/bpf.h>
> >>> +#include <linux/pkt_sched.h>
> >>> +#include <linux/tc_act/tc_bpf.h>
> >>>
> >>>    #include "libbpf_common.h"
> >>>
> >>> @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
> >>>    LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
> >>>    LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
> >>>
> >>> +/* Convenience macros for the clsact attach hooks */
> >>> +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
> >>> +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
> >>
> >> I would abstract those away into an enum, plus avoid having to pull in
> >> linux/pkt_sched.h and linux/tc_act/tc_bpf.h from main libbpf.h header.
> >>
> >> Just add a enum { BPF_TC_DIR_INGRESS, BPF_TC_DIR_EGRESS, } and then the
> >> concrete tc bits (TC_H_MAKE()) can be translated internally.
> >>
> >>> +struct bpf_tc_opts {
> >>> +     size_t sz;
> >>
> >> Is this set anywhere?
> >>
> >>> +     __u32 handle;
> >>> +     __u32 class_id;
> >>
> >> I'd remove class_id from here as well given in direct-action a BPF prog can
> >> set it if needed.
> >>
> >>> +     __u16 priority;
> >>> +     bool replace;
> >>> +     size_t :0;
> >>
> >> What's the rationale for this padding?
> >>
> >>> +};
> >>> +
> >>> +#define bpf_tc_opts__last_field replace
> >>> +
> >>> +/* Acts as a handle for an attached filter */
> >>> +struct bpf_tc_attach_id {
> >>
> >> nit: maybe bpf_tc_ctx
> >
> > ok, so wait. It seems like apart from INGRESS|EGRESS enum and ifindex,
> > everything else is optional and/or has some sane defaults, right? So
> > this bpf_tc_attach_id or bpf_tc_ctx seems a bit artificial construct
> > and it will cause problems for extending this.
> >
> > So if my understanding is correct, I'd get rid of it completely. As I
> > said previously, opts allow returning parameters back, so if user
> > didn't specify handle and priority and kernel picks values on user's
> > behalf, we can return them in the same opts fields.
> >
> > For detach, again, ifindex and INGRESS|EGRESS is sufficient, but if
> > user want to provide more detailed parameters, we should do that
> > through extensible opts. That way we can keep growing this easily,
> > plus simple cases will remain simple.
> >
> > Similarly bpf_tc_info below, there is no need to have struct
> > bpf_tc_attach_id id; field, just have handle and priority right there.
> > And bpf_tc_info should use OPTS framework for extensibility (though
> > opts name doesn't fit it very well, but it is still nice for
> > extensibility and for doing optional input/output params).
> >
> > Does this make sense? Am I missing something crucial here?
>
> I would probably keep the handle + priority in there; maybe if both are 0,
> we could fix it to some default value internally, but without those it might
> be a bit hard if people want to build a 'pipeline' of cls_bpf progs if they
> need/want to.

Oh, I'm not proposing to drop support for specifying handle and prio.
I'm just saying having a fixed UAPI struct bpf_tc_attach_id as an "ID"
is problematic from API stability point of view. So instead of
pretending we know what "ID" will always be like, pass any extra
non-default fields in OPTS struct. And if those are not specified by
user (either opts is NULL or handle/prio is 0), use sane defaults, as
you are proposing.

>
> Potentially, one could fixate the handle itself, and then allow to specify
> different priorities for it such that when a BPF prog returns a TC_ACT_UNSPEC,
> it will exec the next one inside that cls_bpf instance, every other TC_ACT_*
> opcode will terminate the processing. Technically, only priority would really
> be needed (unless you combine multiple different classifiers from tc side on
> the ingress/egress hook which is not great to begin with, tbh).
>
> > The general rule with any new structs added to libbpf APIs is to
> > either be 100% (ok, 99.99%) sure that they will never be changed, or
> > do forward/backward compatible OPTS. Any other thing is pain and calls
> > for symbol versioning, which we are trying really hard to avoid.
> >
> >>> +     __u32 handle;
> >>> +     __u16 priority;
> >>> +};
> >>> +
> >>> +struct bpf_tc_info {
> >>> +     struct bpf_tc_attach_id id;
> >>> +     __u16 protocol;
> >>> +     __u32 chain_index;
> >>> +     __u32 prog_id;
> >>> +     __u8 tag[BPF_TAG_SIZE];
> >>> +     __u32 class_id;
> >>> +     __u32 bpf_flags;
> >>> +     __u32 bpf_flags_gen;
> >>
> >> Given we do not yet have any setters e.g. for offload, etc, the one thing
> >> I'd see useful and crucial initially is prog_id.
> >>
> >> The protocol, chain_index, and I would also include tag should be dropped.
> >> Similarly class_id given my earlier statement, and flags I would extend once
> >> this lib API would support offloading progs.
> >>
> >>> +};
> >>> +
> >>> +/* id is out parameter that will be written to, it must not be NULL */
> >>> +LIBBPF_API int bpf_tc_attach(int fd, __u32 ifindex, __u32 parent_id,
> >>> +                          const struct bpf_tc_opts *opts,
> >>> +                          struct bpf_tc_attach_id *id);
> >>> +LIBBPF_API int bpf_tc_detach(__u32 ifindex, __u32 parent_id,
> >>> +                          const struct bpf_tc_attach_id *id);
> >>> +LIBBPF_API int bpf_tc_get_info(__u32 ifindex, __u32 parent_id,
> >>> +                            const struct bpf_tc_attach_id *id,
> >>> +                            struct bpf_tc_info *info);
> >>
> >> As per above, for parent_id I'd replace with dir enum.
> >>
> >>> +
> >>>    #ifdef __cplusplus
> >>>    } /* extern "C" */
> >>>    #endif
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2021-04-22 18:28 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-20 19:37 [PATCH bpf-next v3 0/3] Add TC-BPF API Kumar Kartikeya Dwivedi
2021-04-20 19:37 ` [PATCH bpf-next v3 1/3] libbpf: add helpers for preparing netlink attributes Kumar Kartikeya Dwivedi
2021-04-21  6:37   ` Yonghong Song
2021-04-20 19:37 ` [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
2021-04-21  6:58   ` Yonghong Song
2021-04-21 17:06     ` Kumar Kartikeya Dwivedi
2021-04-22  1:56       ` Yonghong Song
2021-04-21 18:19   ` Andrii Nakryiko
2021-04-21 19:48     ` Toke Høiland-Jørgensen
2021-04-21 23:14       ` Daniel Borkmann
2021-04-22  9:08         ` Toke Høiland-Jørgensen
2021-04-22 11:51           ` Daniel Borkmann
2021-04-22 12:46             ` Toke Høiland-Jørgensen
2021-04-21 22:59   ` Daniel Borkmann
2021-04-21 23:08     ` Kumar Kartikeya Dwivedi
2021-04-21 23:21       ` Daniel Borkmann
2021-04-21 23:30         ` Kumar Kartikeya Dwivedi
2021-04-21 23:41           ` Daniel Borkmann
2021-04-22  9:47             ` Shaun Crampton
2021-04-22 11:26               ` Daniel Borkmann
2021-04-22  3:43     ` Andrii Nakryiko
2021-04-22 15:35       ` Daniel Borkmann
2021-04-22 18:28         ` Andrii Nakryiko
2021-04-20 19:37 ` [PATCH bpf-next v3 3/3] libbpf: add selftests for " Kumar Kartikeya Dwivedi
2021-04-21  7:01   ` Yonghong Song
2021-04-21 18:24   ` Andrii Nakryiko
2021-04-21 19:56     ` Kumar Kartikeya Dwivedi
2021-04-21 20:38       ` Toke Høiland-Jørgensen
2021-04-21 22:41         ` Daniel Borkmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).