* [PATCH bpf-next v5 0/3] Add TC-BPF API
@ 2021-04-28 16:25 Kumar Kartikeya Dwivedi
  2021-04-28 16:25 ` [PATCH bpf-next v5 1/3] libbpf: add netlink helpers Kumar Kartikeya Dwivedi
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-28 16:25 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Toke Høiland-Jørgensen,
	Shaun Crampton, netdev

This is the fifth version of the TC-BPF series.

It adds a simple API that uses netlink to attach the tc filter and its bpf
classifier program. Currently, a user needs to shell out to the tc command line
to create filters and attach SCHED_CLS programs as classifiers. With the help
of this API, it will be possible to use libbpf for every part of bpf program
setup and attachment.

Changelog contains details of patchset evolution.

In an effort to keep discussion focused, this series doesn't have the high level
TC-BPF API. It was clear that there is a need for a bpf_link API in the kernel,
hence that will be submitted as a separate patchset based on this.

The individual commit messages contain more details, and also a brief summary of
the API.

Changelog:
----------
v4 -> v5:
v4: https://lore.kernel.org/bpf/20210423150600.498490-1-memxor@gmail.com
 * Added bpf_tc_hook to represent the attach location of a filter.
 * Removed the bpf_tc_ctx context object, refactored code to not assume shared
   open socket across operations on the same ctx.
 * Add a helper libbpf_nl_send_recv that wraps socket creation, sending and
   receiving the netlink message.
 * Extended netlink code to cut short message processing using BPF_NL_DONE. This
   is used in a few places to return early to the user and discard remaining
   data.
 * selftests rewrite and expansion, considering API is looking more solid now.
 * Documented the API assumptions and behaviour in the commit that adds it,
   along with a few basic usage examples.
 * Dropped documentation from libbpf.h.
 * Relax some restrictions on bpf_tc_query to make it more useful (e.g. to
   detect if any filters exist).
 * Incorporate other minor suggestions from previous review (Andrii and Daniel).
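
To illustrate the early-exit item above, here is a minimal standalone sketch of
how callback return codes can steer a receive loop. It is a simulation only:
NL_CONT, NL_NEXT, NL_DONE, process_batches and find_first_even are hypothetical
names chosen for this example, and integer arrays stand in for received netlink
buffers. It is not the code added by this series.

```c
#include <stddef.h>

/* Hypothetical callback verdicts, modeled on the idea of BPF_NL_*. */
enum { NL_CONT, NL_NEXT, NL_DONE };

#define BATCH_LEN 3

typedef int (*msg_cb)(int msg, void *cookie);

/* Each batch stands in for one recv() worth of messages. */
static int process_batches(const int (*batches)[BATCH_LEN], size_t nbatches,
			   msg_cb cb, void *cookie)
{
	for (size_t b = 0; b < nbatches; b++) {
		for (size_t i = 0; i < BATCH_LEN; i++) {
			int ret = cb(batches[b][i], cookie);

			if (ret < 0)		/* error: propagate */
				return ret;
			if (ret == NL_DONE)	/* discard all remaining data */
				return 0;
			if (ret == NL_NEXT)	/* skip the rest of this batch */
				break;
			/* NL_CONT: move on to the next message */
		}
	}
	return 0;
}

struct find_state {
	int visited;
	int found;
};

/* Stop as soon as the first even "message" has been seen. */
static int find_first_even(int msg, void *cookie)
{
	struct find_state *st = cookie;

	st->visited++;
	if (msg % 2 == 0) {
		st->found = msg;
		return NL_DONE;
	}
	return NL_CONT;
}
```

Calling process_batches() with find_first_even stops at the first even value,
so any later messages are never handed to the callback.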

v3 -> v4
v3: https://lore.kernel.org/bpf/20210420193740.124285-1-memxor@gmail.com
 * Added a concept of bpf_tc_ctx context structure representing the attach point.
   The qdisc setup and delete is tied to this object's lifetime if it succeeds
   in creating the clsact qdisc when the attach point is BPF_TC_INGRESS or
   BPF_TC_EGRESS. Qdisc is only deleted when there are no filters attached to
   it.
 * Refactored all API functions to take ctx.
 * Removed bpf_tc_info, bpf_tc_attach_id, instead reused bpf_tc_opts for filling
   in attributes in various API functions (including query).
 * Explicitly documented the expectation of each function regarding the opts
   fields set. Added some small notes for the defaults chosen by the API.
 * Rename bpf_tc_get_info to bpf_tc_query
 * Keep the netlink socket open in the context structure to save on open/close
   cycles for each operation.
 * Miscellaneous adjustments due to keeping the socket open.
 * Rewrote the tests, and also added tests for testing all preconditions of the
   TC-BPF API.
 * We now use bpf skeleton in examples and tests.

v2 -> v3
v2: https://lore.kernel.org/bpf/20210419121811.117400-1-memxor@gmail.com

 * bpf_tc_cls_* -> bpf_tc_* rename
 * bpf_tc_attach_id now only consists of handle and priority, the two variables
   that user may or may not set.
 * bpf_tc_replace has been dropped, instead a replace bool is introduced in
   bpf_tc_opts for the same purpose.
 * bpf_tc_get_info now takes attach_id for filling in filter details during
   lookup instead of requiring user to do so. This also allows us to remove the
   fd parameter, as no matching is needed as long as we have all attributes
   necessary to identify a specific filter.
 * A little bit of code simplification taking into account the change above.
 * priority and protocol are now __u16 members in user facing API structs to
   reflect actual size.
 * Patch updating pkt_cls.h header has been removed, as it is unused now.
 * protocol and chain_index options have been dropped in bpf_tc_opts,
   protocol is always set to ETH_P_ALL, while chain_index is set as 0 by
   default in the kernel. This also means removal of chain_index from
   bpf_tc_attach_id, as it is unconditionally always 0.
 * bpf_tc_cls_change has been dropped
 * selftest now uses ASSERT_* macros

v1 -> v2
v1: https://lore.kernel.org/bpf/20210325120020.236504-1-memxor@gmail.com

 * netlink helpers have been renamed to object_action style.
 * attach_id now only contains attributes that are not explicitly set. Only
   the bare minimum info is kept in it.
 * protocol is now optional and is set to ETH_P_ALL by default.
 * direct-action mode is always set.
 * skip_sw and skip_hw options have also been removed.
 * bpf_tc_cls_info struct now also returns the bpf program tag and id, as
   available in the netlink response. This came up as a requirement during
   discussion with people wanting to use this functionality.
 * support for attaching SCHED_ACT programs has been dropped, as it isn't
   useful without any support for binding loaded actions to a classifier.
 * the distinction between dev and block API has been dropped, there is now
   a single set of functions and user has to pass the special ifindex value
   to indicate operation on a shared filter block on their own.
 * The high level API returning a bpf_link is gone. This was already non-
   functional for pinning and typical ownership semantics. Instead, a separate
   patchset will be sent adding a bpf_link API for attaching SCHED_CLS progs to
   the kernel, and its corresponding libbpf API.
 * The clsact qdisc is now set up automatically in a best-effort fashion whenever
   user passes in the clsact ingress or egress parent id. This is done with
   exclusive mode, such that if an ingress or clsact qdisc is already set up,
   we skip the setup and move on with filter creation.
 * Other minor changes that came up during the course of discussion and rework.

Kumar Kartikeya Dwivedi (3):
  libbpf: add netlink helpers
  libbpf: add low level TC-BPF API
  libbpf: add selftests for TC-BPF API

 tools/lib/bpf/libbpf.h                        |  41 ++
 tools/lib/bpf/libbpf.map                      |   5 +
 tools/lib/bpf/netlink.c                       | 578 ++++++++++++++++--
 tools/lib/bpf/nlattr.h                        |  48 ++
 .../testing/selftests/bpf/prog_tests/tc_bpf.c | 467 ++++++++++++++
 .../testing/selftests/bpf/progs/test_tc_bpf.c |  12 +
 6 files changed, 1086 insertions(+), 65 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_bpf.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf.c

--
2.30.2



* [PATCH bpf-next v5 1/3] libbpf: add netlink helpers
  2021-04-28 16:25 [PATCH bpf-next v5 0/3] Add TC-BPF API Kumar Kartikeya Dwivedi
@ 2021-04-28 16:25 ` Kumar Kartikeya Dwivedi
  2021-04-30 19:04   ` Andrii Nakryiko
  2021-04-28 16:25 ` [PATCH bpf-next v5 2/3] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
  2021-04-28 16:25 ` [PATCH bpf-next v5 3/3] libbpf: add selftests for " Kumar Kartikeya Dwivedi
  2 siblings, 1 reply; 14+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-28 16:25 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Shaun Crampton, netdev

This change introduces a few helpers to wrap open coded attribute
preparation in netlink.c. It also adds a libbpf_nl_send_recv helper that
wraps send + recv handling in a generic way. A subsequent patch will
also use this function for sending a netlink request and receiving the
response. The libbpf_nl_get_link helper has been removed, with socket
creation moved into the new libbpf_nl_send_recv.

Every nested attribute's closure must happen using the helper
nlattr_end_nested, which sets its length properly. NLA_F_NESTED is
enforced using nlattr_begin_nested helper. Other simple attributes
can be added directly.

The maxsz parameter corresponds to the size of the request structure
which is being filled in, so for instance with req being:

struct {
	struct nlmsghdr nh;
	struct tcmsg t;
	char buf[4096];
} req;

Then, maxsz should be sizeof(req).

This change also converts the open coded attribute preparation to use the
helpers. Note that the only failure the nested helper can see from its
internal call to nlattr_add is -EMSGSIZE, hence that is what we return to
our caller.
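
As a rough standalone illustration of the maxsz bound and of closing a nested
attribute, the sketch below builds an RTM_SETLINK request carrying
IFLA_XDP { IFLA_XDP_FD }. nh_tail and nlattr_add are simplified copies modeled
on the helpers in this patch; struct xdp_req and build_xdp_request are
hypothetical names used only for this example, and the begin/end nested steps
are written inline.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Request layout: netlink header, family header, room for attributes. */
struct xdp_req {
	struct nlmsghdr  nh;
	struct ifinfomsg ifinfo;
	char             attrbuf[64];
};

/* First aligned byte past the current end of the message. */
static struct nlattr *nh_tail(struct nlmsghdr *nh)
{
	return (struct nlattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
}

/* Append one attribute; -EMSGSIZE if it would not fit within maxsz. */
static int nlattr_add(struct nlmsghdr *nh, size_t maxsz, int type,
		      const void *data, int len)
{
	struct nlattr *nla;

	if (NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(NLA_HDRLEN + len) > maxsz)
		return -EMSGSIZE;
	nla = nh_tail(nh);
	nla->nla_type = type;
	nla->nla_len = NLA_HDRLEN + len;
	if (data)
		memcpy((char *)nla + NLA_HDRLEN, data, len);
	nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(nla->nla_len);
	return 0;
}

/* Hypothetical builder: fills req, returns 0 or a negative errno. */
static int build_xdp_request(struct xdp_req *req, int ifindex, int fd)
{
	struct nlattr *nest;
	int ret;

	memset(req, 0, sizeof(*req));
	req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	req->nh.nlmsg_type = RTM_SETLINK;
	req->ifinfo.ifi_family = AF_UNSPEC;
	req->ifinfo.ifi_index = ifindex;

	/* begin nested: remember where the empty nest header is placed */
	nest = nh_tail(&req->nh);
	ret = nlattr_add(&req->nh, sizeof(*req), IFLA_XDP | NLA_F_NESTED,
			 NULL, 0);
	if (ret < 0)
		return ret;

	ret = nlattr_add(&req->nh, sizeof(*req), IFLA_XDP_FD, &fd, sizeof(fd));
	if (ret < 0)
		return ret;

	/* end nested: the nest spans everything appended since its header */
	nest->nla_len = (char *)nh_tail(&req->nh) - (char *)nest;
	return 0;
}
```

Here maxsz is always sizeof(*req), so any attribute that would run past
attrbuf is rejected before anything is written.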

The libbpf_nl_send_recv call takes care of opening the socket, sending the
netlink message, receiving the response, potentially invoking callbacks,
returning errors if any, and finally closing the socket. This allows
users to avoid identical socket setup code in different places. The only
user of libbpf_nl_get_link has been converted to make use of it.

__bpf_set_link_xdp_fd_replace has also been refactored to use it.
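
The open, send, receive, close sequence described above can be sketched in
isolation roughly as follows. nl_roundtrip and query_link are hypothetical
names for this example; the sketch receives a single datagram and does no
message parsing or callback dispatch, so it is far simpler than
libbpf_nl_send_recv itself.

```c
#include <errno.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Returns the number of bytes received, or a negative errno. */
static int nl_roundtrip(struct nlmsghdr *nh, void *buf, size_t bufsz)
{
	struct sockaddr_nl sa;
	ssize_t len;
	int sock, ret;

	sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (sock < 0)
		return -errno;

	memset(&sa, 0, sizeof(sa));
	sa.nl_family = AF_NETLINK;
	if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		ret = -errno;
		goto end;
	}

	/* the helper, not the caller, owns pid and sequence number */
	nh->nlmsg_pid = 0;
	nh->nlmsg_seq = time(NULL);
	if (send(sock, nh, nh->nlmsg_len, 0) < 0) {
		ret = -errno;
		goto end;
	}

	len = recv(sock, buf, bufsz, 0);
	ret = len < 0 ? -errno : (int)len;
end:
	/* single exit path, so the socket is closed on every outcome */
	close(sock);
	return ret;
}

/* Hypothetical caller: ask the kernel about one link by ifindex. */
static int query_link(int ifindex)
{
	struct {
		struct nlmsghdr nh;
		struct ifinfomsg ifm;
	} req;
	char buf[8192];

	memset(&req, 0, sizeof(req));
	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	req.nh.nlmsg_type = RTM_GETLINK;
	req.nh.nlmsg_flags = NLM_F_REQUEST;
	req.ifm.ifi_family = AF_UNSPEC;
	req.ifm.ifi_index = ifindex;

	return nl_roundtrip(&req.nh, buf, sizeof(buf));
}
```

Because the socket lives only for one call, callers such as query_link need no
setup or teardown code of their own.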

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/lib/bpf/netlink.c | 117 ++++++++++++++++++----------------------
 tools/lib/bpf/nlattr.h  |  48 +++++++++++++++++
 2 files changed, 100 insertions(+), 65 deletions(-)

diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index d2cb28e9ef52..6daee6640725 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -131,72 +131,53 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
 	return ret;
 }
 
+static int libbpf_nl_send_recv(struct nlmsghdr *nh, __dump_nlmsg_t fn,
+			       libbpf_dump_nlmsg_t _fn, void *cookie);
+
 static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
 					 __u32 flags)
 {
-	int sock, seq = 0, ret;
-	struct nlattr *nla, *nla_xdp;
+	struct nlattr *nla;
+	int ret;
 	struct {
 		struct nlmsghdr  nh;
 		struct ifinfomsg ifinfo;
 		char             attrbuf[64];
 	} req;
-	__u32 nl_pid = 0;
-
-	sock = libbpf_netlink_open(&nl_pid);
-	if (sock < 0)
-		return sock;
 
 	memset(&req, 0, sizeof(req));
 	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
 	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
 	req.nh.nlmsg_type = RTM_SETLINK;
-	req.nh.nlmsg_pid = 0;
-	req.nh.nlmsg_seq = ++seq;
 	req.ifinfo.ifi_family = AF_UNSPEC;
 	req.ifinfo.ifi_index = ifindex;
 
 	/* started nested attribute for XDP */
-	nla = (struct nlattr *)(((char *)&req)
-				+ NLMSG_ALIGN(req.nh.nlmsg_len));
-	nla->nla_type = NLA_F_NESTED | IFLA_XDP;
-	nla->nla_len = NLA_HDRLEN;
+	nla = nlattr_begin_nested(&req.nh, sizeof(req), IFLA_XDP);
+	if (!nla)
+		return -EMSGSIZE;
 
 	/* add XDP fd */
-	nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
-	nla_xdp->nla_type = IFLA_XDP_FD;
-	nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
-	memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
-	nla->nla_len += nla_xdp->nla_len;
+	ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_FD, &fd, sizeof(fd));
+	if (ret < 0)
+		return ret;
 
 	/* if user passed in any flags, add those too */
 	if (flags) {
-		nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
-		nla_xdp->nla_type = IFLA_XDP_FLAGS;
-		nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
-		memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
-		nla->nla_len += nla_xdp->nla_len;
+		ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_FLAGS, &flags, sizeof(flags));
+		if (ret < 0)
+			return ret;
 	}
 
 	if (flags & XDP_FLAGS_REPLACE) {
-		nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
-		nla_xdp->nla_type = IFLA_XDP_EXPECTED_FD;
-		nla_xdp->nla_len = NLA_HDRLEN + sizeof(old_fd);
-		memcpy((char *)nla_xdp + NLA_HDRLEN, &old_fd, sizeof(old_fd));
-		nla->nla_len += nla_xdp->nla_len;
+		ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_EXPECTED_FD, &old_fd, sizeof(old_fd));
+		if (ret < 0)
+			return ret;
 	}
 
-	req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
+	nlattr_end_nested(&req.nh, nla);
 
-	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
-		ret = -errno;
-		goto cleanup;
-	}
-	ret = bpf_netlink_recv(sock, nl_pid, seq, NULL, NULL, NULL);
-
-cleanup:
-	close(sock);
-	return ret;
+	return libbpf_nl_send_recv(&req.nh, NULL, NULL, NULL);
 }
 
 int bpf_set_link_xdp_fd_opts(int ifindex, int fd, __u32 flags,
@@ -282,16 +263,22 @@ static int get_xdp_info(void *cookie, void *msg, struct nlattr **tb)
 	return 0;
 }
 
-static int libbpf_nl_get_link(int sock, unsigned int nl_pid,
-			      libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie);
 
 int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
 			  size_t info_size, __u32 flags)
 {
 	struct xdp_id_md xdp_id = {};
-	int sock, ret;
-	__u32 nl_pid = 0;
 	__u32 mask;
+	int ret;
+	struct {
+		struct nlmsghdr nlh;
+		struct ifinfomsg ifm;
+	} req = {
+		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+		.nlh.nlmsg_type = RTM_GETLINK,
+		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
+		.ifm.ifi_family = AF_PACKET,
+	};
 
 	if (flags & ~XDP_FLAGS_MASK || !info_size)
 		return -EINVAL;
@@ -302,14 +289,10 @@ int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
 	if (flags && flags & mask)
 		return -EINVAL;
 
-	sock = libbpf_netlink_open(&nl_pid);
-	if (sock < 0)
-		return sock;
-
 	xdp_id.ifindex = ifindex;
 	xdp_id.flags = flags;
 
-	ret = libbpf_nl_get_link(sock, nl_pid, get_xdp_info, &xdp_id);
+	ret = libbpf_nl_send_recv(&req.nlh, __dump_link_nlmsg, get_xdp_info, &xdp_id);
 	if (!ret) {
 		size_t sz = min(info_size, sizeof(xdp_id.info));
 
@@ -317,7 +300,6 @@ int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
 		memset((void *) info + sz, 0, info_size - sz);
 	}
 
-	close(sock);
 	return ret;
 }
 
@@ -349,24 +331,29 @@ int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags)
 	return ret;
 }
 
-int libbpf_nl_get_link(int sock, unsigned int nl_pid,
-		       libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
+static int libbpf_nl_send_recv(struct nlmsghdr *nh, __dump_nlmsg_t fn,
+			       libbpf_dump_nlmsg_t _fn, void *cookie)
 {
-	struct {
-		struct nlmsghdr nlh;
-		struct ifinfomsg ifm;
-	} req = {
-		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-		.nlh.nlmsg_type = RTM_GETLINK,
-		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
-		.ifm.ifi_family = AF_PACKET,
-	};
-	int seq = time(NULL);
+	__u32 nl_pid = 0;
+	int sock, ret;
 
-	req.nlh.nlmsg_seq = seq;
-	if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0)
-		return -errno;
+	if (!nh)
+		return -EINVAL;
+
+	sock = libbpf_netlink_open(&nl_pid);
+	if (sock < 0)
+		return sock;
 
-	return bpf_netlink_recv(sock, nl_pid, seq, __dump_link_nlmsg,
-				dump_link_nlmsg, cookie);
+	nh->nlmsg_pid = 0;
+	nh->nlmsg_seq = time(NULL);
+	if (send(sock, nh, nh->nlmsg_len, 0) < 0) {
+		ret = -errno;
+		goto end;
+	}
+
+	ret = bpf_netlink_recv(sock, nl_pid, nh->nlmsg_seq, fn, _fn, cookie);
+
+end:
+	close(sock);
+	return ret;
 }
diff --git a/tools/lib/bpf/nlattr.h b/tools/lib/bpf/nlattr.h
index 6cc3ac91690f..1c94cdb6e89d 100644
--- a/tools/lib/bpf/nlattr.h
+++ b/tools/lib/bpf/nlattr.h
@@ -10,7 +10,10 @@
 #define __LIBBPF_NLATTR_H
 
 #include <stdint.h>
+#include <string.h>
+#include <errno.h>
 #include <linux/netlink.h>
+
 /* avoid multiple definition of netlink features */
 #define __LINUX_NETLINK_H
 
@@ -103,4 +106,49 @@ int libbpf_nla_parse_nested(struct nlattr *tb[], int maxtype,
 
 int libbpf_nla_dump_errormsg(struct nlmsghdr *nlh);
 
+static inline struct nlattr *nla_data(struct nlattr *nla)
+{
+	return (struct nlattr *)((char *)nla + NLA_HDRLEN);
+}
+
+static inline struct nlattr *nh_tail(struct nlmsghdr *nh)
+{
+	return (struct nlattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
+}
+
+static inline int nlattr_add(struct nlmsghdr *nh, size_t maxsz, int type,
+			     const void *data, int len)
+{
+	struct nlattr *nla;
+
+	if (NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(NLA_HDRLEN + len) > maxsz)
+		return -EMSGSIZE;
+	if ((!data && len) || (data && !len))
+		return -EINVAL;
+
+	nla = nh_tail(nh);
+	nla->nla_type = type;
+	nla->nla_len = NLA_HDRLEN + len;
+	if (data)
+		memcpy(nla_data(nla), data, len);
+	nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(nla->nla_len);
+	return 0;
+}
+
+static inline struct nlattr *nlattr_begin_nested(struct nlmsghdr *nh,
+						 size_t maxsz, int type)
+{
+	struct nlattr *tail;
+
+	tail = nh_tail(nh);
+	if (nlattr_add(nh, maxsz, type | NLA_F_NESTED, NULL, 0))
+		return NULL;
+	return tail;
+}
+
+static inline void nlattr_end_nested(struct nlmsghdr *nh, struct nlattr *tail)
+{
+	tail->nla_len = (char *)nh_tail(nh) - (char *)tail;
+}
+
 #endif /* __LIBBPF_NLATTR_H */
-- 
2.30.2



* [PATCH bpf-next v5 2/3] libbpf: add low level TC-BPF API
  2021-04-28 16:25 [PATCH bpf-next v5 0/3] Add TC-BPF API Kumar Kartikeya Dwivedi
  2021-04-28 16:25 ` [PATCH bpf-next v5 1/3] libbpf: add netlink helpers Kumar Kartikeya Dwivedi
@ 2021-04-28 16:25 ` Kumar Kartikeya Dwivedi
  2021-04-30 19:35   ` Andrii Nakryiko
  2021-04-28 16:25 ` [PATCH bpf-next v5 3/3] libbpf: add selftests for " Kumar Kartikeya Dwivedi
  2 siblings, 1 reply; 14+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-28 16:25 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Shaun Crampton, netdev

This adds functions that wrap the netlink API used for adding,
manipulating, and removing traffic control filters.

An API summary:

A bpf_tc_hook represents a location where a TC-BPF filter can be
attached. This means that creating a hook leads to creation of the
backing qdisc, while destruction either removes all filters attached to
a hook, or destroys qdisc if requested explicitly (as discussed below).

The TC-BPF API functions operate on this bpf_tc_hook to attach, replace,
query, and detach tc filters.

All functions return 0 on success, and a negative error code on failure.

bpf_tc_hook_create - Create a hook
Parameters:
	@hook - Cannot be NULL, ifindex > 0, attach_point must be set to
		proper enum constant. Note that parent must be unset when
		attach_point is one of BPF_TC_INGRESS or BPF_TC_EGRESS. Note
		that as an exception BPF_TC_INGRESS|BPF_TC_EGRESS is also a
		valid value for attach_point.

		Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.

	@flags - Currently only BPF_TC_F_REPLACE, which creates qdisc in
		 non-exclusive mode (i.e. an existing qdisc will be replaced
		 instead of this function failing with -EEXIST).
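
The constraints on @hook listed above can be summarized by a small standalone
check. This is only a sketch: the enum repeats the values this patch adds to
libbpf.h so the example builds on its own, and check_hook_for_create is a
hypothetical name, not a function in the series.

```c
#include <errno.h>
#include <stdint.h>

/* Same values as enum bpf_tc_attach_point added by this patch. */
enum {
	BPF_TC_INGRESS = 1 << 0,
	BPF_TC_EGRESS  = 1 << 1,
	BPF_TC_CUSTOM  = 1 << 2,
};

/* Hypothetical validator mirroring the documented rules for hook creation. */
static int check_hook_for_create(int ifindex, int attach_point, uint32_t parent)
{
	if (ifindex <= 0)
		return -EINVAL;

	switch (attach_point) {
	case BPF_TC_INGRESS:
	case BPF_TC_EGRESS:
	case BPF_TC_INGRESS | BPF_TC_EGRESS:
		/* clsact hooks derive their parent; it must stay unset */
		return parent ? -EINVAL : 0;
	case BPF_TC_CUSTOM:
		/* no qdisc is created for custom attach points */
		return -EOPNOTSUPP;
	default:
		return -EINVAL;
	}
}
```

For instance, check_hook_for_create(1, BPF_TC_INGRESS, 0) succeeds, while
passing a non-zero parent with the same attach point yields -EINVAL.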

bpf_tc_hook_destroy - Destroy the hook
Parameters:
	@hook - Cannot be NULL. The behaviour depends on the value of
		attach_point.

		If BPF_TC_INGRESS, all filters attached to the ingress
		hook will be detached.
		If BPF_TC_EGRESS, all filters attached to the egress hook
		will be detached.
		If BPF_TC_INGRESS|BPF_TC_EGRESS, the clsact qdisc will be
		deleted, also detaching all filters.

		It is advised that if the qdisc is operated on by many programs,
		then each program should at least check that there are no other
		existing filters before deleting the clsact qdisc. An example is
		shown below:

		/* set opts as NULL, as we're not really interested in
		 * getting any info for a particular filter, but just
		 * detecting its presence.
		 */
		DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = if_nametoindex("lo"),
				    .attach_point = BPF_TC_INGRESS);
		r = bpf_tc_query(&hook, NULL);
		if (r == -ENOENT) {
			/* no filters */
			hook.attach_point = BPF_TC_INGRESS|BPF_TC_EGRESS;
			return bpf_tc_hook_destroy(&hook);
		} else /* failed or r == 0, the latter means filters do exist */
			return r;

		Note that there is a small race between checking for no
		filters and deleting the qdisc. This is currently unavoidable.

		Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.

bpf_tc_attach - Attach a filter to a hook
Parameters:
	@hook - Cannot be NULL. Represents the hook the filter will be
		attached to. Requirements for ifindex and attach_point are
		same as described in bpf_tc_hook_create, but BPF_TC_CUSTOM
		is also supported.  In that case, parent must be set to the
		handle where the filter will be attached (using TC_H_MAKE).

		E.g. to set parent to 1:16 as on the tc command line (which
		     reads both halves as hexadecimal), the equivalent would
		     be TC_H_MAKE(1 << 16, 0x16)

	@opts - Cannot be NULL.

		The following opts are optional:
			handle - The handle of the filter
			priority - The priority of the filter
				   Must be >= 0 and <= UINT16_MAX
		The following opts must be set:
			prog_fd - The fd of the loaded SCHED_CLS prog
		The following opts must be unset:
			prog_id - The ID of the BPF prog

		The following opts will be filled by bpf_tc_attach on a
		successful attach operation if they are unset:
			handle - The handle of the attached filter
			priority - The priority of the attached filter
			prog_id - The ID of the attached SCHED_CLS prog

		This way, the user can know what the auto allocated
		values for optional opts like handle and priority are
		for the newly attached filter, if they were unset.

		Note that some other attributes are set to some default
		values listed below (this holds for all bpf_tc_* APIs):
			protocol - ETH_P_ALL
			mode - direct action
			chain index - 0
			class ID - 0 (this can be set by writing to the
			skb->tc_classid field from the BPF program)

	@flags - Currently only BPF_TC_F_REPLACE, which creates filter
		 in non-exclusive mode (i.e. an existing filter with the
		 same attributes will be replaced instead of this
		 function failing with -EEXIST).
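
As a small aid for the parent handle mentioned under @hook, the sketch below
shows how a 32-bit handle is composed with TC_H_MAKE from
<linux/pkt_sched.h>. custom_parent and clsact_parent are hypothetical helper
names for this example. Keep in mind that the tc command line reads and prints
both halves of a handle in hexadecimal.

```c
#include <stdint.h>
#include <linux/pkt_sched.h>

/* Compose major:minor into the 32-bit handle used as a filter parent. */
static uint32_t custom_parent(uint16_t major, uint16_t minor)
{
	return TC_H_MAKE((uint32_t)major << 16, minor);
}

/* Parent used for filters on the clsact ingress or egress hook. */
static uint32_t clsact_parent(int egress)
{
	return TC_H_MAKE(TC_H_CLSACT,
			 egress ? TC_H_MIN_EGRESS : TC_H_MIN_INGRESS);
}
```

With these, custom_parent(1, 0x16) evaluates to 0x00010016, and
clsact_parent(0) and clsact_parent(1) evaluate to 0xFFFFFFF2 and 0xFFFFFFF3
respectively.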

bpf_tc_detach - Detach a filter from a hook
Parameters:
	@hook: Cannot be NULL. Represents the hook the filter will be
		detached from. Requirements are same as described above
		in bpf_tc_attach.

	@opts:	Cannot be NULL.

		The following opts must be set:
			handle
			priority
		The following opts must be unset:
			prog_fd
			prog_id

bpf_tc_query - Query a hook for an attached filter
Parameters:
	@hook: Cannot be NULL. Represents the hook where the filter
	       lookup will be performed. Requirements are the same as
	       described above in bpf_tc_attach.

	@opts: Can be NULL.

	       The following opts are optional:
			handle
			priority
			prog_fd
			prog_id

	       However, at most one of prog_fd and prog_id may be
	       set. Setting both leads to an error. Setting neither is
	       allowed.

	       The following fields will be filled by bpf_tc_query on a
	       successful lookup if they are unset:
			handle
			priority
			prog_id

	       Based on the specified optional parameters, the matching
	       data for the first matching filter is filled in and 0 is
	       returned. When setting prog_fd, the prog_id will be
	       matched against prog_id of the loaded SCHED_CLS prog
	       represented by prog_fd.

	       To uniquely identify a filter, e.g. to detect its presence,
	       it is recommended to set both handle and priority fields.
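
The first-match behaviour of bpf_tc_query can be pictured with a standalone
sketch like the one below. It is a conceptual model only: struct filter,
struct query and first_match are hypothetical, an in-memory array stands in
for the kernel's filter dump, and a zero field is assumed to mean "unset".

```c
#include <stddef.h>
#include <stdint.h>

struct filter {
	uint32_t handle, priority, prog_id;
};

/* Zero in any field means "do not match on this field". */
struct query {
	uint32_t handle, priority, prog_id;
};

/* Return the first filter satisfying every field set in q, else NULL. */
static const struct filter *first_match(const struct filter *f, size_t n,
					const struct query *q)
{
	for (size_t i = 0; i < n; i++) {
		if (q->handle && q->handle != f[i].handle)
			continue;
		if (q->priority && q->priority != f[i].priority)
			continue;
		if (q->prog_id && q->prog_id != f[i].prog_id)
			continue;
		return &f[i];
	}
	return NULL;
}
```

With no fields set, the first filter in the list matches, which corresponds to
using the query purely to detect whether any filter exists. Setting both
handle and priority narrows the result to one specific filter.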

Some usage examples (using bpf skeleton infrastructure):

BPF program (test_tc_bpf.c):

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	SEC("classifier")
	int cls(struct __sk_buff *skb)
	{
		return 0;
	}

Userspace loader:

	struct test_tc_bpf *skel = NULL;
	int fd, r;

	skel = test_tc_bpf__open_and_load();
	if (!skel)
		return -ENOMEM;

	fd = bpf_program__fd(skel->progs.cls);

	DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex =
			    if_nametoindex("lo"), .attach_point =
			    BPF_TC_INGRESS);
	/* Create clsact qdisc */
	r = bpf_tc_hook_create(&hook, 0);
	if (r < 0)
		goto end;

	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = fd);
	r = bpf_tc_attach(&hook, &opts, 0);
	if (r < 0)
		goto end;
	/* Print the auto allocated handle and priority */
	printf("Handle=%" PRIu32 "\n", opts.handle);
	printf("Priority=%" PRIu32 "\n", opts.priority);

	opts.prog_fd = opts.prog_id = 0;
	bpf_tc_detach(&hook, &opts);
end:
	test_tc_bpf__destroy(skel);

This is equivalent to doing the following using tc command line:
  # tc qdisc add dev lo clsact
  # tc filter add dev lo ingress bpf obj foo.o sec classifier da

Another example replacing a filter (extending prior example):

	/* We can also set handle and priority explicitly (both or just
	 * one); let's try replacing an existing filter.
	 */
	DECLARE_LIBBPF_OPTS(bpf_tc_opts, replace_opts, .handle =
			    opts.handle, .priority = opts.priority,
			    .prog_fd = fd);
	r = bpf_tc_attach(&hook, &replace_opts, 0);
	if (r < 0 && r == -EEXIST) {
		/* Expected, now use BPF_TC_F_REPLACE to replace it */
		return bpf_tc_attach(&hook, &replace_opts, BPF_TC_F_REPLACE);
	} else if (r == 0) {
		/* There must be no existing filter with these
		 * attributes, so cleanup and return an error.
		 */
		replace_opts.prog_fd = replace_opts.prog_id = 0;
		r = bpf_tc_detach(&hook, &replace_opts);
		if (r == 0)
			r = -1;
	}
	return r;

To obtain info of a particular filter:

	/* Find info for filter with handle 1 and priority 50 */
	DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts, .handle = 1,
			    .priority = 50);
	r = bpf_tc_query(&hook, &info_opts);
	if (r == -ENOENT)
		printf("Filter not found\n");
	else if (r == 0)
		printf("Prog ID: %" PRIu32 "\n", info_opts.prog_id);
	return r;

We can also match using prog_id to find the same filter:

	DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts2, .prog_id =
			    info_opts.prog_id);
	r = bpf_tc_query(&hook, &info_opts2);
	if (r < 0 && r == -ENOENT)
		printf("Filter not found");
	else if (r == 0) {
		/* If we know there's only one filter for this loaded prog,
		 * it is safe to assert that the handle and priority are
		 * as expected.
		 */
		assert(info_opts2.handle == 1);
		assert(info_opts2.priority == 50);
	}
	return r;

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/lib/bpf/libbpf.h   |  41 ++++
 tools/lib/bpf/libbpf.map |   5 +
 tools/lib/bpf/netlink.c  | 463 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 508 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index bec4e6a6e31d..3de701f46a33 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -775,6 +775,47 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
 LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
 LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
 
+enum bpf_tc_attach_point {
+	BPF_TC_INGRESS = 1 << 0,
+	BPF_TC_EGRESS  = 1 << 1,
+	BPF_TC_CUSTOM  = 1 << 2,
+};
+
+enum bpf_tc_attach_flags {
+	BPF_TC_F_REPLACE = 1 << 0,
+};
+
+struct bpf_tc_hook {
+	size_t sz;
+	int ifindex;
+	enum bpf_tc_attach_point attach_point;
+	__u32 parent;
+	size_t :0;
+};
+
+#define bpf_tc_hook__last_field parent
+
+struct bpf_tc_opts {
+	size_t sz;
+	int prog_fd;
+	__u32 prog_id;
+	__u32 handle;
+	__u32 priority;
+	size_t :0;
+};
+
+#define bpf_tc_opts__last_field priority
+
+LIBBPF_API int bpf_tc_hook_create(struct bpf_tc_hook *hook, int flags);
+LIBBPF_API int bpf_tc_hook_destroy(struct bpf_tc_hook *hook);
+LIBBPF_API int bpf_tc_attach(const struct bpf_tc_hook *hook,
+			     struct bpf_tc_opts *opts,
+			     int flags);
+LIBBPF_API int bpf_tc_detach(const struct bpf_tc_hook *hook,
+			     const struct bpf_tc_opts *opts);
+LIBBPF_API int bpf_tc_query(const struct bpf_tc_hook *hook,
+			    struct bpf_tc_opts *opts);
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index b9b29baf1df8..04509c7c144b 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -361,4 +361,9 @@ LIBBPF_0.4.0 {
 		bpf_linker__new;
 		bpf_map__inner_map;
 		bpf_object__set_kversion;
+		bpf_tc_hook_create;
+		bpf_tc_hook_destroy;
+		bpf_tc_attach;
+		bpf_tc_detach;
+		bpf_tc_query;
 } LIBBPF_0.3.0;
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index 6daee6640725..88f7b6144c78 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -4,7 +4,11 @@
 #include <stdlib.h>
 #include <memory.h>
 #include <unistd.h>
+#include <inttypes.h>
+#include <arpa/inet.h>
 #include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/pkt_cls.h>
 #include <linux/rtnetlink.h>
 #include <sys/socket.h>
 #include <errno.h>
@@ -73,6 +77,12 @@ static int libbpf_netlink_open(__u32 *nl_pid)
 	return ret;
 }
 
+enum {
+	BPF_NL_CONT,
+	BPF_NL_NEXT,
+	BPF_NL_DONE,
+};
+
 static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
 			    __dump_nlmsg_t _fn, libbpf_dump_nlmsg_t fn,
 			    void *cookie)
@@ -84,6 +94,7 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
 	int len, ret;
 
 	while (multipart) {
+start:
 		multipart = false;
 		len = recv(sock, buf, sizeof(buf), 0);
 		if (len < 0) {
@@ -121,8 +132,18 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
 			}
 			if (_fn) {
 				ret = _fn(nh, fn, cookie);
-				if (ret)
+				if (ret < 0)
+					return ret;
+				switch (ret) {
+				case BPF_NL_CONT:
+					break;
+				case BPF_NL_NEXT:
+					goto start;
+				case BPF_NL_DONE:
+					return 0;
+				default:
 					return ret;
+				}
 			}
 		}
 	}
@@ -357,3 +378,443 @@ static int libbpf_nl_send_recv(struct nlmsghdr *nh, __dump_nlmsg_t fn,
 	close(sock);
 	return ret;
 }
+
+/* TC-HOOK */
+
+typedef int (*qdisc_config_t)(struct nlmsghdr *nh, struct tcmsg *t,
+			      size_t maxsz);
+
+static int clsact_config(struct nlmsghdr *nh, struct tcmsg *t, size_t maxsz)
+{
+	int ret;
+
+	t->tcm_parent = TC_H_CLSACT;
+	t->tcm_handle = TC_H_MAKE(TC_H_CLSACT, 0);
+
+	ret = nlattr_add(nh, maxsz, TCA_KIND, "clsact", sizeof("clsact"));
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static int attach_point_to_config(struct bpf_tc_hook *hook, qdisc_config_t *configp)
+{
+	if (!hook)
+		return -EINVAL;
+
+	switch ((int)OPTS_GET(hook, attach_point, 0)) {
+		case BPF_TC_INGRESS:
+		case BPF_TC_EGRESS:
+		case BPF_TC_INGRESS|BPF_TC_EGRESS:
+			if (OPTS_GET(hook, parent, 0))
+				return -EINVAL;
+			*configp = &clsact_config;
+			break;
+		case BPF_TC_CUSTOM:
+			return -EOPNOTSUPP;
+		default:
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static long long int tc_get_tcm_parent(enum bpf_tc_attach_point attach_point,
+				       __u32 parent)
+{
+	long long int ret;
+
+	switch (attach_point) {
+	case BPF_TC_INGRESS:
+		if (parent)
+			return -EINVAL;
+		ret = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS);
+		break;
+	case BPF_TC_EGRESS:
+		if (parent)
+			return -EINVAL;
+		ret = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS);
+		break;
+	case BPF_TC_CUSTOM:
+		if (!parent)
+			return -EINVAL;
+		ret = parent;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return ret;
+}
+
+static int tc_qdisc_modify(struct bpf_tc_hook *hook, int cmd, int flags)
+{
+	qdisc_config_t config;
+	int ret = 0;
+	struct {
+		struct nlmsghdr nh;
+		struct tcmsg t;
+		char buf[256];
+	} req;
+
+	ret = attach_point_to_config(hook, &config);
+	if (ret < 0)
+		return ret;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
+	req.nh.nlmsg_flags =
+		NLM_F_REQUEST | NLM_F_ACK | flags;
+	req.nh.nlmsg_type = cmd;
+	req.t.tcm_family = AF_UNSPEC;
+	req.t.tcm_ifindex = OPTS_GET(hook, ifindex, 0);
+
+	ret = config(&req.nh, &req.t, sizeof(req));
+	if (ret < 0)
+		return ret;
+
+	ret = libbpf_nl_send_recv(&req.nh, NULL, NULL, NULL);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static int tc_qdisc_create_excl(struct bpf_tc_hook *hook, int flags)
+{
+	flags = flags & BPF_TC_F_REPLACE ? NLM_F_REPLACE : NLM_F_EXCL;
+	return tc_qdisc_modify(hook, RTM_NEWQDISC, NLM_F_CREATE | flags);
+}
+
+static int tc_qdisc_delete(struct bpf_tc_hook *hook)
+{
+	return tc_qdisc_modify(hook, RTM_DELQDISC, 0);
+}
+
+int bpf_tc_hook_create(struct bpf_tc_hook *hook, int flags)
+{
+	if (!hook || !OPTS_VALID(hook, bpf_tc_hook))
+		return -EINVAL;
+	if (OPTS_GET(hook, ifindex, 0) <= 0 || flags & ~BPF_TC_F_REPLACE)
+		return -EINVAL;
+
+	return tc_qdisc_create_excl(hook, flags);
+}
+
+static int tc_cls_detach(const struct bpf_tc_hook *hook,
+			 const struct bpf_tc_opts *opts, bool flush);
+
+int bpf_tc_hook_destroy(struct bpf_tc_hook *hook)
+{
+	if (!hook || !OPTS_VALID(hook, bpf_tc_hook) ||
+	    OPTS_GET(hook, ifindex, 0) <= 0)
+		return -EINVAL;
+
+	switch ((int)OPTS_GET(hook, attach_point, 0)) {
+		case BPF_TC_INGRESS:
+		case BPF_TC_EGRESS:
+			return tc_cls_detach(hook, NULL, true);
+		case BPF_TC_INGRESS|BPF_TC_EGRESS:
+			return tc_qdisc_delete(hook);
+		case BPF_TC_CUSTOM:
+			return -EOPNOTSUPP;
+		default:
+			return -EINVAL;
+	}
+}
+
+struct pass_info {
+	struct bpf_tc_opts *opts;
+	__u32 match_prog_id;
+	bool processed;
+};
+
+/* TC-BPF */
+
+static int tc_cls_add_fd_and_name(struct nlmsghdr *nh, size_t maxsz, int fd)
+{
+	struct bpf_prog_info info = {};
+	char name[256] = {};
+	int len, ret;
+
+	ret = bpf_obj_get_info_by_fd(fd, &info, &(__u32){sizeof(info)});
+	if (ret < 0)
+		return ret;
+
+	ret = nlattr_add(nh, maxsz, TCA_BPF_FD, &fd, sizeof(fd));
+	if (ret < 0)
+		return ret;
+
+	len = snprintf(name, sizeof(name), "%s:[%" PRIu32 "]", info.name,
+		       info.id);
+	if (len < 0 || len >= sizeof(name))
+		return len < 0 ? -EINVAL : -ENAMETOOLONG;
+
+	return nlattr_add(nh, maxsz, TCA_BPF_NAME, name, len + 1);
+}
+
+
+static int cls_get_info(struct nlmsghdr *nh, libbpf_dump_nlmsg_t fn,
+			void *cookie);
+
+int bpf_tc_attach(const struct bpf_tc_hook *hook,
+		  struct bpf_tc_opts *opts, int flags)
+{
+	__u32 protocol = 0, bpf_flags;
+	struct pass_info info = {};
+	long long int tcm_parent;
+	struct nlattr *nla;
+	int ret;
+	struct {
+		struct nlmsghdr nh;
+		struct tcmsg t;
+		char buf[256];
+	} req;
+
+	if (!hook || !opts || !OPTS_VALID(hook, bpf_tc_hook) ||
+	    !OPTS_VALID(opts, bpf_tc_opts))
+		return -EINVAL;
+	if (OPTS_GET(hook, ifindex, 0) <= 0 || !OPTS_GET(opts, prog_fd, 0) ||
+	    OPTS_GET(opts, prog_id, 0))
+		return -EINVAL;
+	if (OPTS_GET(opts, priority, 0) > UINT16_MAX)
+		return -EINVAL;
+	if (flags & ~BPF_TC_F_REPLACE)
+		return -EINVAL;
+
+	protocol = ETH_P_ALL;
+	flags = flags & BPF_TC_F_REPLACE ? NLM_F_REPLACE : NLM_F_EXCL;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
+	req.nh.nlmsg_flags =
+		NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE | NLM_F_ECHO | flags;
+	req.nh.nlmsg_type = RTM_NEWTFILTER;
+	req.t.tcm_family = AF_UNSPEC;
+	req.t.tcm_handle = OPTS_GET(opts, handle, 0);
+	req.t.tcm_ifindex = OPTS_GET(hook, ifindex, 0);
+	req.t.tcm_info = TC_H_MAKE(OPTS_GET(opts, priority, 0) << 16, htons(protocol));
+
+	tcm_parent = tc_get_tcm_parent(OPTS_GET(hook, attach_point, 0), OPTS_GET(hook, parent, 0));
+	if (tcm_parent < 0)
+		return tcm_parent;
+	req.t.tcm_parent = tcm_parent;
+
+	ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "bpf", sizeof("bpf"));
+	if (ret < 0)
+		return ret;
+
+	nla = nlattr_begin_nested(&req.nh, sizeof(req), TCA_OPTIONS);
+	if (!nla)
+		return -EMSGSIZE;
+
+	ret = tc_cls_add_fd_and_name(&req.nh, sizeof(req), OPTS_GET(opts, prog_fd, 0));
+	if (ret < 0)
+		return ret;
+
+	/* direct action mode is always enabled */
+	bpf_flags = TCA_BPF_FLAG_ACT_DIRECT;
+	ret = nlattr_add(&req.nh, sizeof(req), TCA_BPF_FLAGS,
+			 &bpf_flags, sizeof(bpf_flags));
+	if (ret < 0)
+		return ret;
+
+	nlattr_end_nested(&req.nh, nla);
+
+	info.opts = opts;
+
+	ret = libbpf_nl_send_recv(&req.nh, &cls_get_info, NULL, &info);
+	if (ret < 0)
+		return ret;
+
+	/* Failed to process unicast response */
+	if (!info.processed)
+		ret = -ENOENT;
+
+	return ret;
+}
+
+static int tc_cls_detach(const struct bpf_tc_hook *hook,
+			 const struct bpf_tc_opts *opts, bool flush)
+{
+	long long int tcm_parent;
+	__u32 protocol = 0;
+	int ret, c;
+	struct {
+		struct nlmsghdr nh;
+		struct tcmsg t;
+		char buf[256];
+	} req;
+
+	if (!hook || !OPTS_VALID(hook, bpf_tc_hook) ||
+	    !OPTS_VALID(opts, bpf_tc_opts))
+		return -EINVAL;
+	if (OPTS_GET(hook, ifindex, 0) <= 0 || OPTS_GET(opts, prog_fd, 0) ||
+	    OPTS_GET(opts, prog_id, 0))
+		return -EINVAL;
+	c = !!OPTS_GET(opts, handle, 0) + !!OPTS_GET(opts, priority, 0);
+	if ((flush && c != 0) || (!flush && c != 2))
+		return -EINVAL;
+	if (OPTS_GET(opts, priority, 0) > UINT16_MAX)
+		return -EINVAL;
+
+	if (!flush)
+		protocol = ETH_P_ALL;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
+	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_type = RTM_DELTFILTER;
+	req.t.tcm_family = AF_UNSPEC;
+	if (!flush)
+		req.t.tcm_handle = OPTS_GET(opts, handle, 0);
+	req.t.tcm_ifindex = OPTS_GET(hook, ifindex, 0);
+	if (!flush)
+		req.t.tcm_info = TC_H_MAKE(OPTS_GET(opts, priority, 0) << 16,
+					   htons(protocol));
+
+	tcm_parent = tc_get_tcm_parent(OPTS_GET(hook, attach_point, 0), OPTS_GET(hook, parent, 0));
+	if (tcm_parent < 0)
+		return tcm_parent;
+	req.t.tcm_parent = tcm_parent;
+
+	if (!flush) {
+		ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "bpf", sizeof("bpf"));
+		if (ret < 0)
+			return ret;
+	}
+
+	return libbpf_nl_send_recv(&req.nh, NULL, NULL, NULL);
+}
+
+int bpf_tc_detach(const struct bpf_tc_hook *hook,
+		  const struct bpf_tc_opts *opts)
+{
+	if (!opts)
+		return -EINVAL;
+
+	return tc_cls_detach(hook, opts, false);
+}
+
+static int __cls_get_info(void *cookie, void *msg, struct nlattr **tb,
+			  bool unicast)
+{
+	struct nlattr *tbb[TCA_BPF_MAX + 1];
+	struct pass_info *info = cookie;
+	struct tcmsg *t = msg;
+	__u32 prog_id;
+
+	if (!info)
+		return -EINVAL;
+	if (unicast && info->processed)
+		return -EINVAL;
+	if (!tb[TCA_OPTIONS])
+		return BPF_NL_CONT;
+
+	libbpf_nla_parse_nested(tbb, TCA_BPF_MAX, tb[TCA_OPTIONS], NULL);
+	if (!tbb[TCA_BPF_ID])
+		return -EINVAL;
+
+	if (!info->opts) {
+		/* This is a special case, where user isn't really looking for
+		 * info for the filter, but just wants to detect if there's
+		 * at least one attached. In that case, terminate processing as a
+		 * short cut.
+		 */
+		if (unicast)
+			return -EINVAL;
+		goto end;
+	}
+
+	prog_id = libbpf_nla_getattr_u32(tbb[TCA_BPF_ID]);
+	if (info->match_prog_id && info->match_prog_id != prog_id)
+		return BPF_NL_CONT;
+
+	OPTS_SET(info->opts, handle, t->tcm_handle);
+	OPTS_SET(info->opts, priority, TC_H_MAJ(t->tcm_info) >> 16);
+	OPTS_SET(info->opts, prog_id, prog_id);
+
+end:
+	info->processed = true;
+	return unicast ? BPF_NL_NEXT : BPF_NL_DONE;
+}
+
+static int cls_get_info(struct nlmsghdr *nh, libbpf_dump_nlmsg_t fn,
+			void *cookie)
+{
+	struct tcmsg *t = NLMSG_DATA(nh);
+	struct nlattr *tb[TCA_MAX + 1];
+
+	libbpf_nla_parse(tb, TCA_MAX,
+			 (struct nlattr *)((char *)t + NLMSG_ALIGN(sizeof(*t))),
+			 NLMSG_PAYLOAD(nh, sizeof(*t)), NULL);
+	if (!tb[TCA_KIND])
+		return -EINVAL;
+
+	return __cls_get_info(cookie, t, tb, nh->nlmsg_flags & NLM_F_ECHO);
+}
+
+int bpf_tc_query(const struct bpf_tc_hook *hook,
+	         struct bpf_tc_opts *opts)
+{
+	struct pass_info pinfo = {};
+	long long int tcm_parent;
+	__u32 protocol;
+	int ret;
+	struct {
+		struct nlmsghdr nh;
+		struct tcmsg t;
+		char buf[256];
+	} req;
+
+	if (!hook || !OPTS_VALID(hook, bpf_tc_hook) ||
+	    !OPTS_VALID(opts, bpf_tc_opts))
+		return -EINVAL;
+	if (OPTS_GET(hook, ifindex, 0) <= 0 || (OPTS_GET(opts, prog_fd, 0) &&
+	    OPTS_GET(opts, prog_id, 0)))
+		return -EINVAL;
+	if (OPTS_GET(opts, priority, 0) > UINT16_MAX)
+		return -EINVAL;
+
+	protocol = ETH_P_ALL;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
+	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
+	req.nh.nlmsg_type = RTM_GETTFILTER;
+	req.t.tcm_family = AF_UNSPEC;
+	req.t.tcm_handle = OPTS_GET(opts, handle, 0);
+	req.t.tcm_ifindex = OPTS_GET(hook, ifindex, 0);
+	req.t.tcm_info = TC_H_MAKE(OPTS_GET(opts, priority, 0) << 16, htons(protocol));
+
+	tcm_parent = tc_get_tcm_parent(OPTS_GET(hook, attach_point, 0), OPTS_GET(hook, parent, 0));
+	if (tcm_parent < 0)
+		return tcm_parent;
+	req.t.tcm_parent = tcm_parent;
+
+	ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "bpf", sizeof("bpf"));
+	if (ret < 0)
+		return ret;
+
+	if (OPTS_GET(opts, prog_fd, 0)) {
+		struct bpf_prog_info info = {};
+		ret = bpf_obj_get_info_by_fd(OPTS_GET(opts, prog_fd, 0), &info, &(__u32){sizeof(info)});
+		if (ret < 0)
+			return ret;
+
+		pinfo.match_prog_id = info.id;
+	} else
+		pinfo.match_prog_id = OPTS_GET(opts, prog_id, 0);
+
+	pinfo.opts = opts;
+
+	ret = libbpf_nl_send_recv(&req.nh, cls_get_info, NULL, &pinfo);
+	if (ret < 0)
+		return ret;
+
+	if (!pinfo.processed)
+		ret = -ENOENT;
+
+	return ret;
+}
-- 
2.30.2



* [PATCH bpf-next v5 3/3] libbpf: add selftests for TC-BPF API
  2021-04-28 16:25 [PATCH bpf-next v5 0/3] Add TC-BPF API Kumar Kartikeya Dwivedi
  2021-04-28 16:25 ` [PATCH bpf-next v5 1/3] libbpf: add netlink helpers Kumar Kartikeya Dwivedi
  2021-04-28 16:25 ` [PATCH bpf-next v5 2/3] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
@ 2021-04-28 16:25 ` Kumar Kartikeya Dwivedi
  2021-04-30 19:41   ` Andrii Nakryiko
  2 siblings, 1 reply; 14+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-28 16:25 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Shaun Crampton, netdev

This adds some basic tests for the low level bpf_tc_* API.
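
For readers following the handle and priority assertions in the tests, here
is a minimal, illustrative sketch (not part of this patch) of how a filter's
priority and protocol are packed into tcm_info and read back. The TC_H_*
macros are local copies of the ones in <linux/pkt_sched.h>, redefined so the
sketch builds without kernel headers; the helper names make_tcm_info,
tcm_info_priority, tcm_info_protocol and tcm_info_selfcheck are hypothetical.

```c
#include <arpa/inet.h>	/* htons(), ntohs() */
#include <assert.h>
#include <stdint.h>

/* Local copies of the handle macros from <linux/pkt_sched.h>. */
#define TC_H_MAJ_MASK 0xFFFF0000U
#define TC_H_MIN_MASK 0x0000FFFFU
#define TC_H_MAJ(h) ((h) & TC_H_MAJ_MASK)
#define TC_H_MIN(h) ((h) & TC_H_MIN_MASK)
#define TC_H_MAKE(maj, min) (((maj) & TC_H_MAJ_MASK) | ((min) & TC_H_MIN_MASK))

#ifndef ETH_P_ALL
#define ETH_P_ALL 0x0003	/* same value as in <linux/if_ether.h> */
#endif

/* Hypothetical helper: priority goes in the upper 16 bits, the protocol
 * (in network byte order) in the lower 16 bits.
 */
static uint32_t make_tcm_info(uint32_t priority, uint16_t protocol)
{
	return TC_H_MAKE(priority << 16, (uint32_t)htons(protocol));
}

/* Hypothetical helper: recover the priority from the upper 16 bits. */
static uint32_t tcm_info_priority(uint32_t info)
{
	return TC_H_MAJ(info) >> 16;
}

/* Hypothetical helper: recover the protocol in host byte order. */
static uint16_t tcm_info_protocol(uint32_t info)
{
	return ntohs((uint16_t)TC_H_MIN(info));
}

/* Round-trip check: what goes in via make_tcm_info() comes back out. */
static void tcm_info_selfcheck(void)
{
	uint32_t info = make_tcm_info(1, ETH_P_ALL);

	assert(tcm_info_priority(info) == 1);
	assert(tcm_info_protocol(info) == ETH_P_ALL);
}
```

A priority of UINT16_MAX + 1 wraps to 0 after the 16-bit shift in this
sketch, which is presumably why the API rejects such values with -EINVAL
before building the request.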

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../testing/selftests/bpf/prog_tests/tc_bpf.c | 467 ++++++++++++++++++
 .../testing/selftests/bpf/progs/test_tc_bpf.c |  12 +
 2 files changed, 479 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_bpf.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf.c

diff --git a/tools/testing/selftests/bpf/prog_tests/tc_bpf.c b/tools/testing/selftests/bpf/prog_tests/tc_bpf.c
new file mode 100644
index 000000000000..40441f4e23e2
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/tc_bpf.c
@@ -0,0 +1,467 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+#include <linux/pkt_cls.h>
+
+#include "test_tc_bpf.skel.h"
+
+#define LO_IFINDEX 1
+
+static int test_tc_internal(const struct bpf_tc_hook *hook, int fd)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1,
+			    .prog_fd = fd);
+	struct bpf_prog_info info = {};
+	int ret;
+
+	ret = bpf_obj_get_info_by_fd(fd, &info, &(__u32){sizeof(info)});
+	if (!ASSERT_OK(ret, "bpf_obj_get_info_by_fd"))
+		return ret;
+
+	ret = bpf_tc_attach(hook, &opts, 0);
+	if (!ASSERT_OK(ret, "bpf_tc_attach"))
+		return ret;
+
+	if (!ASSERT_EQ(opts.handle, 1, "handle set") ||
+	    !ASSERT_EQ(opts.priority, 1, "priority set") ||
+	    !ASSERT_EQ(opts.prog_id, info.id, "prog_id set"))
+		goto end;
+
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts, .prog_fd = fd);
+	ret = bpf_tc_query(hook, &info_opts);
+	if (!ASSERT_OK(ret, "bpf_tc_query"))
+		goto end;
+
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts2, .prog_id = info.id);
+	ret = bpf_tc_query(hook, &info_opts2);
+	if (!ASSERT_OK(ret, "bpf_tc_query"))
+		goto end;
+
+	if (!ASSERT_EQ(opts.handle, 1, "handle set") ||
+	    !ASSERT_EQ(opts.priority, 1, "priority set") ||
+	    !ASSERT_EQ(opts.prog_id, info.id, "prog_id set"))
+		goto end;
+
+	opts.prog_id = 0;
+	ret = bpf_tc_attach(hook, &opts, BPF_TC_F_REPLACE);
+	if (!ASSERT_OK(ret, "bpf_tc_attach replace mode"))
+		return ret;
+
+end:
+	opts.prog_fd = opts.prog_id = 0;
+	ret = bpf_tc_detach(hook, &opts);
+	ASSERT_OK(ret, "bpf_tc_detach");
+	return ret;
+}
+
+static int test_tc_bpf_api(struct bpf_tc_hook *hook, int fd)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1);
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, attach_opts, .handle = 1, .priority = 1,
+			    .prog_fd = fd);
+	DECLARE_LIBBPF_OPTS(bpf_tc_hook, inv_hook, .attach_point = BPF_TC_INGRESS);
+	int ret;
+
+	ret = bpf_tc_hook_create(NULL, 0);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_hook_create invalid hook = NULL"))
+		return -EINVAL;
+
+	ret = bpf_tc_hook_create(hook, 42);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_hook_create invalid flags"))
+		return -EINVAL;
+	ret = bpf_tc_hook_destroy(NULL);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_hook_destroy invalid hook = NULL"))
+		return -EINVAL;
+
+	/* hook ifindex = 0 */
+	ret = bpf_tc_hook_create(&inv_hook, 0);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_hook_create invalid hook ifindex == 0"))
+		return -EINVAL;
+	ret = bpf_tc_hook_destroy(&inv_hook);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_hook_destroy invalid hook ifindex == 0"))
+		return -EINVAL;
+	ret = bpf_tc_attach(&inv_hook, &attach_opts, 0);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid hook ifindex == 0"))
+		return -EINVAL;
+	ret = bpf_tc_detach(&inv_hook, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid hook ifindex == 0"))
+		return -EINVAL;
+	ret = bpf_tc_query(&inv_hook, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_query invalid hook ifindex == 0"))
+		return -EINVAL;
+
+	/* hook ifindex < 0 */
+	inv_hook.ifindex = -1;
+	ret = bpf_tc_hook_create(&inv_hook, 0);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_hook_create invalid hook ifindex < 0"))
+		return -EINVAL;
+	ret = bpf_tc_hook_destroy(&inv_hook);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_hook_destroy invalid hook ifindex < 0"))
+		return -EINVAL;
+	ret = bpf_tc_attach(&inv_hook, &attach_opts, 0);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid hook ifindex < 0"))
+		return -EINVAL;
+	ret = bpf_tc_detach(&inv_hook, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid hook ifindex < 0"))
+		return -EINVAL;
+	ret = bpf_tc_query(&inv_hook, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_query invalid hook ifindex < 0"))
+		return -EINVAL;
+	inv_hook.ifindex = LO_IFINDEX;
+
+	/* hook.attach_point invalid */
+	inv_hook.attach_point = 0xabcd;
+	ret = bpf_tc_hook_create(&inv_hook, 0);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_hook_create invalid hook.attach_point"))
+		return -EINVAL;
+	ret = bpf_tc_hook_destroy(&inv_hook);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_hook_destroy invalid hook.attach_point"))
+		return -EINVAL;
+	ret = bpf_tc_attach(&inv_hook, &attach_opts, 0);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid hook.attach_point"))
+		return -EINVAL;
+	ret = bpf_tc_detach(&inv_hook, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid hook.attach_point"))
+		return -EINVAL;
+	ret = bpf_tc_query(&inv_hook, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_query invalid hook.attach_point"))
+		return -EINVAL;
+	inv_hook.attach_point = BPF_TC_INGRESS;
+
+	/* hook.attach_point valid, but parent invalid */
+	inv_hook.parent = TC_H_MAKE(1UL << 16, 10);
+	ret = bpf_tc_hook_create(&inv_hook, 0);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_hook_create invalid hook parent"))
+		return -EINVAL;
+	ret = bpf_tc_hook_destroy(&inv_hook);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_hook_destroy invalid hook parent"))
+		return -EINVAL;
+	ret = bpf_tc_attach(&inv_hook, &attach_opts, 0);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid hook parent"))
+		return -EINVAL;
+	ret = bpf_tc_detach(&inv_hook, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid hook parent"))
+		return -EINVAL;
+	ret = bpf_tc_query(&inv_hook, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_query invalid hook parent"))
+		return -EINVAL;
+
+	inv_hook.attach_point = BPF_TC_CUSTOM;
+	inv_hook.parent = 0;
+	/* These return EOPNOTSUPP instead of EINVAL as parent is checked after
+	 * attach_point of the hook.
+	 */
+	ret = bpf_tc_hook_create(&inv_hook, 0);
+	if (!ASSERT_EQ(ret, -EOPNOTSUPP, "bpf_tc_hook_create invalid hook parent"))
+		return -EINVAL;
+	ret = bpf_tc_hook_destroy(&inv_hook);
+	if (!ASSERT_EQ(ret, -EOPNOTSUPP, "bpf_tc_hook_destroy invalid hook parent"))
+		return -EINVAL;
+	ret = bpf_tc_attach(&inv_hook, &attach_opts, 0);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid hook parent"))
+		return -EINVAL;
+	ret = bpf_tc_detach(&inv_hook, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid hook parent"))
+		return -EINVAL;
+	ret = bpf_tc_query(&inv_hook, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_query invalid hook parent"))
+		return -EINVAL;
+	inv_hook.attach_point = BPF_TC_INGRESS;
+
+	/* detach */
+	ret = bpf_tc_detach(NULL, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid hook = NULL"))
+		return -EINVAL;
+	opts.prog_fd = 42;
+	ret = bpf_tc_detach(hook, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid prog_fd set"))
+		return -EINVAL;
+	opts.prog_fd = 0;
+	opts.prog_id = 42;
+	ret = bpf_tc_detach(hook, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid prog_id set"))
+		return -EINVAL;
+	opts.prog_id = 0;
+	opts.handle = 0;
+	ret = bpf_tc_detach(hook, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid handle unset"))
+		return -EINVAL;
+	opts.handle = 1;
+	opts.priority = 0;
+	ret = bpf_tc_detach(hook, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid priority unset"))
+		return -EINVAL;
+	opts.priority = UINT16_MAX + 1;
+	ret = bpf_tc_detach(hook, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid priority > UINT16_MAX"))
+		return -EINVAL;
+	opts.priority = 1;
+	ret = bpf_tc_detach(hook, NULL);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_detach invalid opts = NULL"))
+		return -EINVAL;
+
+	/* query */
+	ret = bpf_tc_query(NULL, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_query invalid hook = NULL"))
+		return -EINVAL;
+	opts.prog_fd = fd;
+	ret = bpf_tc_query(hook, &opts);
+	if (!ASSERT_EQ(ret, -ENOENT, "bpf_tc_query valid only prog_fd set"))
+		return -EINVAL;
+	opts.prog_fd = 0;
+	opts.prog_id = 42;
+	ret = bpf_tc_query(hook, &opts);
+	if (!ASSERT_EQ(ret, -ENOENT, "bpf_tc_query valid only prog_id set"))
+		return -EINVAL;
+	opts.prog_fd = opts.prog_id = 42;
+	ret = bpf_tc_query(hook, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_query invalid both prog_fd and prog_id set"))
+		return -EINVAL;
+	opts.prog_fd = opts.prog_id = 0;
+	opts.handle = 0;
+	ret = bpf_tc_query(hook, &opts);
+	if (!ASSERT_EQ(ret, -ENOENT, "bpf_tc_query valid handle unset"))
+		return -EINVAL;
+	opts.handle = 1;
+	opts.priority = 0;
+	ret = bpf_tc_query(hook, &opts);
+	if (!ASSERT_EQ(ret, -ENOENT, "bpf_tc_query valid priority unset"))
+		return -EINVAL;
+	opts.priority = UINT16_MAX + 1;
+	ret = bpf_tc_query(hook, &opts);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_query invalid priority > UINT16_MAX"))
+		return -EINVAL;
+	opts.priority = 1;
+	ret = bpf_tc_query(hook, NULL);
+	if (!ASSERT_EQ(ret, -ENOENT, "bpf_tc_query valid opts = NULL"))
+		return -EINVAL;
+
+	/* attach */
+	ret = bpf_tc_attach(NULL, &attach_opts, 0);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid hook = NULL"))
+		return -EINVAL;
+	ret = bpf_tc_attach(hook, &attach_opts, 42);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid flags"))
+		return -EINVAL;
+	attach_opts.prog_fd = 0;
+	ret = bpf_tc_attach(hook, &attach_opts, 0);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid prog_fd unset"))
+		return -EINVAL;
+	attach_opts.prog_fd = fd;
+	attach_opts.prog_id = 42;
+	ret = bpf_tc_attach(hook, &attach_opts, 0);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid prog_id set"))
+		return -EINVAL;
+	attach_opts.prog_id = 0;
+	attach_opts.handle = 0;
+	ret = bpf_tc_attach(hook, &attach_opts, 0);
+	if (!ASSERT_OK(ret, "bpf_tc_attach valid handle unset"))
+		return -EINVAL;
+	attach_opts.prog_fd = attach_opts.prog_id = 0;
+	ASSERT_OK(bpf_tc_detach(hook, &attach_opts), "bpf_tc_detach");
+	attach_opts.prog_fd = fd;
+	attach_opts.handle = 1;
+	attach_opts.priority = 0;
+	ret = bpf_tc_attach(hook, &attach_opts, 0);
+	if (!ASSERT_OK(ret, "bpf_tc_attach valid priority unset"))
+		return -EINVAL;
+	attach_opts.prog_fd = attach_opts.prog_id = 0;
+	ASSERT_OK(bpf_tc_detach(hook, &attach_opts), "bpf_tc_detach");
+	attach_opts.prog_fd = fd;
+	attach_opts.priority = UINT16_MAX + 1;
+	ret = bpf_tc_attach(hook, &attach_opts, 0);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid priority > UINT16_MAX"))
+		return -EINVAL;
+	attach_opts.priority = 0;
+	attach_opts.handle = attach_opts.priority = 0;
+	ret = bpf_tc_attach(hook, &attach_opts, 0);
+	if (!ASSERT_OK(ret, "bpf_tc_attach valid both handle and priority unset"))
+		return -EINVAL;
+	attach_opts.prog_fd = attach_opts.prog_id = 0;
+	ASSERT_OK(bpf_tc_detach(hook, &attach_opts), "bpf_tc_detach");
+	ret = bpf_tc_attach(hook, NULL, 0);
+	if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid opts = NULL"))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int test_tc_query(const struct bpf_tc_hook *hook, int fd)
+{
+	struct test_tc_bpf *skel = NULL;
+	int new_fd, ret, i = 0;
+
+	skel = test_tc_bpf__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_tc_bpf__open_and_load"))
+		return -EINVAL;
+
+	new_fd = bpf_program__fd(skel->progs.cls);
+
+	/* make sure no other filters are attached */
+	ret = bpf_tc_query(hook, NULL);
+	if (!ASSERT_EQ(ret, -ENOENT, "bpf_tc_query == -ENOENT"))
+		goto end_destroy;
+
+	for (i = 0; i < 5; i++) {
+		DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = fd);
+		ret = bpf_tc_attach(hook, &opts, 0);
+		if (!ASSERT_OK(ret, "bpf_tc_attach"))
+			goto end;
+	}
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1,
+			    .prog_fd = new_fd);
+	ret = bpf_tc_attach(hook, &opts, 0);
+	if (!ASSERT_OK(ret, "bpf_tc_attach"))
+		goto end;
+	i++;
+
+	ASSERT_EQ(opts.handle, 1, "handle match");
+	ASSERT_EQ(opts.priority, 1, "priority match");
+	ASSERT_NEQ(opts.prog_id, 0, "prog_id set");
+
+	opts.prog_fd = 0;
+	/* search with handle, priority, prog_id */
+	ret = bpf_tc_query(hook, &opts);
+	if (!ASSERT_OK(ret, "bpf_tc_query"))
+		goto end;
+
+	ASSERT_EQ(opts.handle, 1, "handle match");
+	ASSERT_EQ(opts.priority, 1, "priority match");
+	ASSERT_NEQ(opts.prog_id, 0, "prog_id set");
+
+	opts.priority = opts.prog_fd = 0;
+	/* search with handle, prog_id */
+	ret = bpf_tc_query(hook, &opts);
+	if (!ASSERT_OK(ret, "bpf_tc_query"))
+		goto end;
+
+	ASSERT_EQ(opts.handle, 1, "handle match");
+	ASSERT_EQ(opts.priority, 1, "priority match");
+	ASSERT_NEQ(opts.prog_id, 0, "prog_id set");
+
+	opts.handle = opts.prog_fd = 0;
+	/* search with priority, prog_id */
+	ret = bpf_tc_query(hook, &opts);
+	if (!ASSERT_OK(ret, "bpf_tc_query"))
+		goto end;
+
+	ASSERT_EQ(opts.handle, 1, "handle match");
+	ASSERT_EQ(opts.priority, 1, "priority match");
+	ASSERT_NEQ(opts.prog_id, 0, "prog_id set");
+
+	opts.handle = opts.priority = opts.prog_fd = 0;
+	/* search with prog_id */
+	ret = bpf_tc_query(hook, &opts);
+	if (!ASSERT_OK(ret, "bpf_tc_query"))
+		goto end;
+
+	ASSERT_EQ(opts.handle, 1, "handle match");
+	ASSERT_EQ(opts.priority, 1, "priority match");
+	ASSERT_NEQ(opts.prog_id, 0, "prog_id set");
+
+	while (i != 1) {
+		DECLARE_LIBBPF_OPTS(bpf_tc_opts, del_opts, .prog_fd = fd);
+		ret = bpf_tc_query(hook, &del_opts);
+		if (!ASSERT_OK(ret, "bpf_tc_query"))
+			goto end;
+		ASSERT_NEQ(del_opts.prog_id, opts.prog_id, "prog_id should not be same");
+		ASSERT_NEQ(del_opts.priority, 1, "priority should not be 1");
+		del_opts.prog_fd = del_opts.prog_id = 0;
+		ret = bpf_tc_detach(hook, &del_opts);
+		if (!ASSERT_OK(ret, "bpf_tc_detach"))
+			goto end;
+		i--;
+	}
+
+	opts.handle = opts.priority = opts.prog_id = 0;
+	opts.prog_fd = fd;
+	ret = bpf_tc_query(hook, &opts);
+	ASSERT_EQ(ret, -ENOENT, "bpf_tc_query == -ENOENT");
+
+end:
+	while (i--) {
+		DECLARE_LIBBPF_OPTS(bpf_tc_opts, del_opts, 0);
+		ret = bpf_tc_query(hook, &del_opts);
+		if (!ASSERT_OK(ret, "bpf_tc_query"))
+			break;
+		del_opts.prog_id = 0;
+		ret = bpf_tc_detach(hook, &del_opts);
+		if (!ASSERT_OK(ret, "bpf_tc_detach"))
+			break;
+	}
+	ASSERT_EQ(bpf_tc_query(hook, NULL), -ENOENT, "bpf_tc_query == -ENOENT");
+end_destroy:
+	test_tc_bpf__destroy(skel);
+	return ret;
+}
+
+void test_tc_bpf(void)
+{
+	struct test_tc_bpf *skel = NULL;
+	bool hook_created = true;
+	int cls_fd, ret;
+
+	skel = test_tc_bpf__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_tc_bpf__open_and_load"))
+		return;
+
+	cls_fd = bpf_program__fd(skel->progs.cls);
+
+	DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = LO_IFINDEX,
+			    .attach_point = BPF_TC_INGRESS);
+	ret = bpf_tc_hook_create(&hook, 0);
+	if (ret == -EEXIST) {
+		hook_created = false;
+		ret = 0;
+	}
+	if (!ASSERT_OK(ret, "bpf_tc_hook_create(BPF_TC_INGRESS)"))
+		goto end;
+
+	hook.attach_point = BPF_TC_CUSTOM;
+	hook.parent = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS);
+	ret = bpf_tc_hook_create(&hook, 0);
+	if (!ASSERT_EQ(ret, -EOPNOTSUPP, "bpf_tc_hook_create invalid hook.attach_point"))
+		goto end;
+
+	ret = test_tc_internal(&hook, cls_fd);
+	if (!ASSERT_OK(ret, "test_tc_internal ingress"))
+		goto end;
+
+	ret = bpf_tc_hook_destroy(&hook);
+	if (!ASSERT_EQ(ret, -EOPNOTSUPP, "bpf_tc_hook_destroy invalid hook.attach_point"))
+		goto end;
+
+	hook.attach_point = BPF_TC_INGRESS;
+	hook.parent = 0;
+	bpf_tc_hook_destroy(&hook);
+
+	ret = test_tc_internal(&hook, cls_fd);
+	if (!ASSERT_OK(ret, "test_tc_internal ingress"))
+		goto end;
+
+	bpf_tc_hook_destroy(&hook);
+
+	hook.attach_point = BPF_TC_EGRESS;
+	ret = test_tc_internal(&hook, cls_fd);
+	if (!ASSERT_OK(ret, "test_tc_internal egress"))
+		goto end;
+
+	bpf_tc_hook_destroy(&hook);
+
+	ret = test_tc_bpf_api(&hook, cls_fd);
+	if (!ASSERT_OK(ret, "test_tc_bpf_api"))
+		goto end;
+
+	bpf_tc_hook_destroy(&hook);
+
+	ret = test_tc_query(&hook, cls_fd);
+	if (!ASSERT_OK(ret, "test_tc_query"))
+		goto end;
+
+end:
+	if (hook_created) {
+		hook.attach_point = BPF_TC_INGRESS|BPF_TC_EGRESS;
+		bpf_tc_hook_destroy(&hook);
+	}
+	test_tc_bpf__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_tc_bpf.c b/tools/testing/selftests/bpf/progs/test_tc_bpf.c
new file mode 100644
index 000000000000..18a3a7ed924a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_tc_bpf.c
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+/* Dummy prog to test TC-BPF API */
+
+SEC("classifier")
+int cls(struct __sk_buff *skb)
+{
+	return 0;
+}
-- 
2.30.2



* Re: [PATCH bpf-next v5 1/3] libbpf: add netlink helpers
  2021-04-28 16:25 ` [PATCH bpf-next v5 1/3] libbpf: add netlink helpers Kumar Kartikeya Dwivedi
@ 2021-04-30 19:04   ` Andrii Nakryiko
  2021-05-01  6:13     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 14+ messages in thread
From: Andrii Nakryiko @ 2021-04-30 19:04 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Shaun Crampton,
	Networking

On Wed, Apr 28, 2021 at 9:26 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> This change introduces a few helpers to wrap open-coded attribute
> preparation in netlink.c. It also adds libbpf_nl_send_recv, which wraps
> send + recv handling in a generic way. A subsequent patch will also use
> this function for sending a netlink request and receiving its response.
> The libbpf_nl_get_link helper has been removed, and socket creation has
> moved into the new libbpf_nl_send_recv.
>
> Every nested attribute must be closed with the nlattr_end_nested
> helper, which sets its length properly. NLA_F_NESTED is enforced by
> the nlattr_begin_nested helper. Other simple attributes can be added
> directly.
>
> The maxsz parameter corresponds to the size of the request structure
> which is being filled in, so for instance with req being:
>
> struct {
>         struct nlmsghdr nh;
>         struct tcmsg t;
>         char buf[4096];
> } req;
>
> Then, maxsz should be sizeof(req).
>
> This change also converts the open-coded attribute preparation to use the
> helpers. Note that the only failure the internal call to nlattr_add can
> produce in the nested helper is -EMSGSIZE, hence that is what we return
> to our caller.
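
As an illustration of the helper contract described above (begin a nested
attribute, add plain attributes, close the nest so its length is fixed up,
and fail with -EMSGSIZE when maxsz would be exceeded), here is a minimal
standalone sketch. It is not the code from this patch: the ex_* names and
the simplified ex_nlmsghdr/ex_nlattr layouts are assumptions made so the
sketch builds without kernel headers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for struct nlmsghdr / struct nlattr (assumption:
 * only the fields needed for length bookkeeping are modelled).
 */
struct ex_nlmsghdr { uint32_t len; };
struct ex_nlattr { uint16_t len; uint16_t type; };

#define EX_ALIGN(n)	(((n) + 3u) & ~3u)	/* 4-byte alignment */
#define EX_HDRLEN	((uint32_t)sizeof(struct ex_nlattr))
#define EX_F_NESTED	(1u << 15)

/* First free (aligned) byte after the current message contents. */
static struct ex_nlattr *ex_tail(struct ex_nlmsghdr *nh)
{
	return (struct ex_nlattr *)((char *)nh + EX_ALIGN(nh->len));
}

/* Append one attribute; refuse if it would not fit in maxsz bytes. */
static int ex_nlattr_add(struct ex_nlmsghdr *nh, size_t maxsz, uint16_t type,
			 const void *data, uint16_t len)
{
	struct ex_nlattr *nla;

	if (EX_ALIGN(nh->len) + EX_ALIGN(EX_HDRLEN + len) > maxsz)
		return -EMSGSIZE;

	nla = ex_tail(nh);
	nla->type = type;
	nla->len = (uint16_t)(EX_HDRLEN + len);
	if (len)
		memcpy((char *)nla + EX_HDRLEN, data, len);
	nh->len = EX_ALIGN(nh->len) + EX_ALIGN(nla->len);
	return 0;
}

/* Open a nest: an empty attribute with the nested flag set. */
static struct ex_nlattr *ex_begin_nested(struct ex_nlmsghdr *nh, size_t maxsz,
					 uint16_t type)
{
	struct ex_nlattr *nla = ex_tail(nh);

	if (ex_nlattr_add(nh, maxsz, (uint16_t)(type | EX_F_NESTED), NULL, 0) < 0)
		return NULL;
	return nla;
}

/* Close a nest: its length covers everything added since it was opened. */
static void ex_end_nested(struct ex_nlmsghdr *nh, struct ex_nlattr *nla)
{
	assert((char *)ex_tail(nh) > (char *)nla);
	nla->len = (uint16_t)((char *)ex_tail(nh) - (char *)nla);
}

/* Builds [hdr][nest 1: [attr 2: u32][attr 3: u32]]. Returns the final
 * message length or a negative errno; maxsz is normally sizeof(req).
 */
int ex_build_request(size_t maxsz, uint16_t *nested_len)
{
	struct {
		struct ex_nlmsghdr nh;
		char buf[64];
	} req;
	struct ex_nlmsghdr *nh = (struct ex_nlmsghdr *)&req;
	struct ex_nlattr *nla;
	uint32_t fd = 7, flags = 1;	/* arbitrary payload values */
	int ret;

	memset(&req, 0, sizeof(req));
	nh->len = sizeof(req.nh);

	nla = ex_begin_nested(nh, maxsz, 1);
	if (!nla)
		return -EMSGSIZE;
	ret = ex_nlattr_add(nh, maxsz, 2, &fd, sizeof(fd));
	if (ret < 0)
		return ret;
	ret = ex_nlattr_add(nh, maxsz, 3, &flags, sizeof(flags));
	if (ret < 0)
		return ret;
	ex_end_nested(nh, nla);

	*nested_len = nla->len;
	return (int)nh->len;
}
```

Passing a maxsz smaller than sizeof(req) to ex_build_request simulates a
request buffer that is too small, in which case the first attribute that
does not fit makes the call return -EMSGSIZE.
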
>
> The libbpf_nl_send_recv call takes care of opening the socket, sending the
> netlink message, receiving the response, potentially invoking callbacks,
> returning errors if any, and finally closing the socket. This allows
> users to avoid identical socket setup code in different places. The only
> user of libbpf_nl_get_link has been converted to make use of it.
>
> __bpf_set_link_xdp_fd_replace has also been refactored to use it.
>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  tools/lib/bpf/netlink.c | 117 ++++++++++++++++++----------------------
>  tools/lib/bpf/nlattr.h  |  48 +++++++++++++++++
>  2 files changed, 100 insertions(+), 65 deletions(-)
>
> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
> index d2cb28e9ef52..6daee6640725 100644
> --- a/tools/lib/bpf/netlink.c
> +++ b/tools/lib/bpf/netlink.c
> @@ -131,72 +131,53 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
>         return ret;
>  }
>
> +static int libbpf_nl_send_recv(struct nlmsghdr *nh, __dump_nlmsg_t fn,
> +                              libbpf_dump_nlmsg_t _fn, void *cookie);
> +
>  static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
>                                          __u32 flags)
>  {
> -       int sock, seq = 0, ret;
> -       struct nlattr *nla, *nla_xdp;
> +       struct nlattr *nla;
> +       int ret;
>         struct {
>                 struct nlmsghdr  nh;
>                 struct ifinfomsg ifinfo;
>                 char             attrbuf[64];
>         } req;
> -       __u32 nl_pid = 0;
> -
> -       sock = libbpf_netlink_open(&nl_pid);
> -       if (sock < 0)
> -               return sock;
>
>         memset(&req, 0, sizeof(req));
>         req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
>         req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
>         req.nh.nlmsg_type = RTM_SETLINK;
> -       req.nh.nlmsg_pid = 0;
> -       req.nh.nlmsg_seq = ++seq;
>         req.ifinfo.ifi_family = AF_UNSPEC;
>         req.ifinfo.ifi_index = ifindex;
>
>         /* started nested attribute for XDP */
> -       nla = (struct nlattr *)(((char *)&req)
> -                               + NLMSG_ALIGN(req.nh.nlmsg_len));
> -       nla->nla_type = NLA_F_NESTED | IFLA_XDP;
> -       nla->nla_len = NLA_HDRLEN;
> +       nla = nlattr_begin_nested(&req.nh, sizeof(req), IFLA_XDP);
> +       if (!nla)
> +               return -EMSGSIZE;
>
>         /* add XDP fd */
> -       nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
> -       nla_xdp->nla_type = IFLA_XDP_FD;
> -       nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
> -       memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
> -       nla->nla_len += nla_xdp->nla_len;
> +       ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_FD, &fd, sizeof(fd));
> +       if (ret < 0)
> +               return ret;
>
>         /* if user passed in any flags, add those too */
>         if (flags) {
> -               nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
> -               nla_xdp->nla_type = IFLA_XDP_FLAGS;
> -               nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
> -               memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
> -               nla->nla_len += nla_xdp->nla_len;
> +               ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_FLAGS, &flags, sizeof(flags));
> +               if (ret < 0)
> +                       return ret;
>         }
>
>         if (flags & XDP_FLAGS_REPLACE) {
> -               nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
> -               nla_xdp->nla_type = IFLA_XDP_EXPECTED_FD;
> -               nla_xdp->nla_len = NLA_HDRLEN + sizeof(old_fd);
> -               memcpy((char *)nla_xdp + NLA_HDRLEN, &old_fd, sizeof(old_fd));
> -               nla->nla_len += nla_xdp->nla_len;
> +               ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_EXPECTED_FD, &flags, sizeof(flags));

shouldn't old_fd be used here?

> +               if (ret < 0)
> +                       return ret;
>         }
>
> -       req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
> +       nlattr_end_nested(&req.nh, nla);
>
> -       if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
> -               ret = -errno;
> -               goto cleanup;
> -       }
> -       ret = bpf_netlink_recv(sock, nl_pid, seq, NULL, NULL, NULL);
> -
> -cleanup:
> -       close(sock);
> -       return ret;
> +       return libbpf_nl_send_recv(&req.nh, NULL, NULL, NULL);
>  }
>
>  int bpf_set_link_xdp_fd_opts(int ifindex, int fd, __u32 flags,

[...]

> -int libbpf_nl_get_link(int sock, unsigned int nl_pid,
> -                      libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
> +static int libbpf_nl_send_recv(struct nlmsghdr *nh, __dump_nlmsg_t fn,
> +                              libbpf_dump_nlmsg_t _fn, void *cookie)
>  {
> -       struct {
> -               struct nlmsghdr nlh;
> -               struct ifinfomsg ifm;
> -       } req = {
> -               .nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
> -               .nlh.nlmsg_type = RTM_GETLINK,
> -               .nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
> -               .ifm.ifi_family = AF_PACKET,
> -       };
> -       int seq = time(NULL);
> +       __u32 nl_pid = 0;
> +       int sock, ret;
>
> -       req.nlh.nlmsg_seq = seq;
> -       if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0)
> -               return -errno;
> +       if (!nh)
> +               return -EINVAL;
> +
> +       sock = libbpf_netlink_open(&nl_pid);
> +       if (sock < 0)
> +               return sock;
>
> -       return bpf_netlink_recv(sock, nl_pid, seq, __dump_link_nlmsg,
> -                               dump_link_nlmsg, cookie);
> +       nh->nlmsg_pid = 0;
> +       nh->nlmsg_seq = time(NULL);
> +       if (send(sock, nh, nh->nlmsg_len, 0) < 0) {
> +               ret = -errno;
> +               goto end;
> +       }
> +
> +       ret = bpf_netlink_recv(sock, nl_pid, nh->nlmsg_seq, fn, _fn, cookie);

what's the difference between fn and _fn, can this be somehow
reflected in the name?

> +
> +end:
> +       close(sock);
> +       return ret;
>  }
> diff --git a/tools/lib/bpf/nlattr.h b/tools/lib/bpf/nlattr.h
> index 6cc3ac91690f..1c94cdb6e89d 100644
> --- a/tools/lib/bpf/nlattr.h
> +++ b/tools/lib/bpf/nlattr.h
> @@ -10,7 +10,10 @@
>  #define __LIBBPF_NLATTR_H
>
>  #include <stdint.h>
> +#include <string.h>
> +#include <errno.h>
>  #include <linux/netlink.h>
> +
>  /* avoid multiple definition of netlink features */
>  #define __LINUX_NETLINK_H
>
> @@ -103,4 +106,49 @@ int libbpf_nla_parse_nested(struct nlattr *tb[], int maxtype,
>
>  int libbpf_nla_dump_errormsg(struct nlmsghdr *nlh);
>
> +static inline struct nlattr *nla_data(struct nlattr *nla)
> +{
> +       return (struct nlattr *)((char *)nla + NLA_HDRLEN);
> +}
> +
> +static inline struct nlattr *nh_tail(struct nlmsghdr *nh)
> +{
> +       return (struct nlattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
> +}
> +
> +static inline int nlattr_add(struct nlmsghdr *nh, size_t maxsz, int type,
> +                            const void *data, int len)
> +{
> +       struct nlattr *nla;
> +
> +       if (NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(NLA_HDRLEN + len) > maxsz)
> +               return -EMSGSIZE;
> +       if ((!data && len) || (data && !len))

we use !!data != !!len for this in at least a few places
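
For readers unfamiliar with the idiom, here is a minimal sketch of that check. The helper name check_payload is hypothetical and not part of the patch; it only isolates the comparison:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper (not in the patch): accept a payload only when
 * data and len are both set or both unset.
 */
static int check_payload(const void *data, int len)
{
	/* !! collapses each operand to 0 or 1, so the two must match */
	if (!!data != !!len)
		return -EINVAL;
	return 0;
}
```

This rejects the same two cases as the longer (!data && len) || (data && !len) form.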

> +               return -EINVAL;
> +
> +       nla = nh_tail(nh);
> +       nla->nla_type = type;
> +       nla->nla_len = NLA_HDRLEN + len;
> +       if (data)
> +               memcpy(nla_data(nla), data, len);
> +       nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(nla->nla_len);
> +       return 0;
> +}
> +

[...]


* Re: [PATCH bpf-next v5 2/3] libbpf: add low level TC-BPF API
  2021-04-28 16:25 ` [PATCH bpf-next v5 2/3] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
@ 2021-04-30 19:35   ` Andrii Nakryiko
  2021-05-01  6:32     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 14+ messages in thread
From: Andrii Nakryiko @ 2021-04-30 19:35 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Shaun Crampton,
	Networking

On Wed, Apr 28, 2021 at 9:26 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> This adds functions that wrap the netlink API used for adding,
> manipulating, and removing traffic control filters.
>
> An API summary:
>
> A bpf_tc_hook represents a location where a TC-BPF filter can be
> attached. This means that creating a hook leads to creation of the
> backing qdisc, while destruction either removes all filters attached to
> a hook, or destroys qdisc if requested explicitly (as discussed below).
>
> The TC-BPF API functions operate on this bpf_tc_hook to attach, replace,
> query, and detach tc filters.
>
> All functions return 0 on success, and a negative error code on failure.
>
> bpf_tc_hook_create - Create a hook
> Parameters:
>         @hook - Cannot be NULL, ifindex > 0, attach_point must be set to
>                 proper enum constant. Note that parent must be unset when
>                 attach_point is one of BPF_TC_INGRESS or BPF_TC_EGRESS. Note
>                 that as an exception BPF_TC_INGRESS|BPF_TC_EGRESS is also a
>                 valid value for attach_point.
>
>                 Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.
>
>         @flags - Currently only BPF_TC_F_REPLACE, which creates qdisc in
>                  non-exclusive mode (i.e. an existing qdisc will be replaced
>                  instead of this function failing with -EEXIST).
>
> bpf_tc_hook_destroy - Destroy the hook
> Parameters:
>         @hook - Cannot be NULL. The behaviour depends on value of
>                 attach_point.
>
>                 If BPF_TC_INGRESS, all filters attached to the ingress
>                 hook will be detached.
>                 If BPF_TC_EGRESS, all filters attached to the egress hook
>                 will be detached.
>                 If BPF_TC_INGRESS|BPF_TC_EGRESS, the clsact qdisc will be
>                 deleted, also detaching all filters.
>
>                 It is advised that if the qdisc is operated on by many programs,
>                 then the program atleast check that there are no other existing

typo: at least

>                 filters before deleting the clsact qdisc. An example is shown
>                 below:
>
>                 /* set opts as NULL, as we're not really interested in
>                  * getting any info for a particular filter, but just
>                  * detecting its presence.
>                  */

this comment probably is better moved to right before bpf_tc_query,
otherwise it reads as if it's related to bpf_tc_hook

>                 DECLARE_LIBBPF_OPTS(bpf_tc_hook, .ifindex = if_nametoindex("lo"),
>                                     .attach_point = BPF_TC_INGRESS);
>                 r = bpf_tc_query(&hook, NULL);
>                 if (r < 0 && r == -ENOENT) {

well, r == -ENOENT should be enough then, no?

>                         /* no filters */
>                         hook.attach_point = BPF_TC_INGRESS|BPF_TC_EGREESS;
>                         return bpf_tc_hook_destroy(&hook);
>                 } else /* failed or r == 0, the latter means filters do exist */
>                         return r;
>
>                 Note that there is a small race between checking for no
>                 filters and deleting the qdisc. This is currently unavoidable.
>
>                 Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.
>
> bpf_tc_attach - Attach a filter to a hook
> Parameters:
>         @hook - Cannot be NULL. Represents the hook the filter will be
>                 attached to. Requirements for ifindex and attach_point are
>                 same as described in bpf_tc_hook_create, but BPF_TC_CUSTOM
>                 is also supported.  In that case, parent must be set to the
>                 handle where the filter will be attached (using TC_H_MAKE).
>
>                 E.g. To set parent to 1:16 like in tc command line,
>                      the equivalent would be TC_H_MAKE(1 << 16, 16)
>
>         @opts - Cannot be NULL.
>
>                 The following opts are optional:
>                         handle - The handle of the filter
>                         priority - The priority of the filter
>                                    Must be >= 0 and <= UINT16_MAX
>                 The following opts must be set:
>                         prog_fd - The fd of the loaded SCHED_CLS prog
>                 The following opts must be unset:
>                         prog_id - The ID of the BPF prog
>
>                 The following opts will be filled by bpf_tc_attach on a
>                 successful attach operation if they are unset:
>                         handle - The handle of the attached filter
>                         priority - The priority of the attached filter
>                         prog_id - The ID of the attached SCHED_CLS prog
>
>                 This way, the user can know what the auto allocated
>                 values for optional opts like handle and priority are
>                 for the newly attached filter, if they were unset.
>
>                 Note that some other attributes are set to some default
>                 values listed below (this holds for all bpf_tc_* APIs):
>                         protocol - ETH_P_ALL
>                         mode - direct action
>                         chain index - 0
>                         class ID - 0 (this can be set by writing to the
>                         skb->tc_classid field from the BPF program)
>
>         @flags - Currently only BPF_TC_F_REPLACE, which creates filter
>                  in non-exclusive mode (i.e. an existing filter with the
>                  same attributes will be replaced instead of this
>                  function failing with -EEXIST).
>
> bpf_tc_detach
> Parameters:
>         @hook: Cannot be NULL. Represents the hook the filter will be
>                 detached from. Requirements are same as described above
>                 in bpf_tc_attach.
>
>         @opts:  Cannot be NULL.
>
>                 The following opts must be set:
>                         handle
>                         priority
>                 The following opts must be unset:
>                         prog_fd
>                         prog_id
>
> bpf_tc_query
> Parameters:
>         @hook: Cannot be NULL. Represents the hook where the filter
>                lookup will be performed. Requires are same as described
>                above in bpf_tc_attach.
>
>         @opts: Can be NULL.
>
>                The following opts are optional:
>                         handle
>                         priority
>                         prog_fd
>                         prog_id
>
>                However, only one of prog_fd and prog_id must be
>                set. Setting both leads to an error. Setting none is
>                allowed.
>
>                The following fields will be filled by bpf_tc_query on a
>                successful lookup if they are unset:
>                         handle
>                         priority
>                         prog_id
>
>                Based on the specified optional parameters, the matching
>                data for the first matching filter is filled in and 0 is
>                returned. When setting prog_fd, the prog_id will be
>                matched against prog_id of the loaded SCHED_CLS prog
>                represented by prog_fd.
>
>                To uniquely identify a filter, e.g. to detect its presence,
>                it is recommended to set both handle and priority fields.
>
> Some usage examples (using bpf skeleton infrastructure):
>
> BPF program (test_tc_bpf.c):
>
>         #include <linux/bpf.h>
>         #include <bpf/bpf_helpers.h>
>
>         SEC("classifier")
>         int cls(struct __sk_buff *skb)
>         {
>                 return 0;
>         }
>
> Userspace loader:
>
>         DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, 0);
>         struct test_tc_bpf *skel = NULL;
>         int fd, r;
>
>         skel = test_tc_bpf__open_and_load();
>         if (!skel)
>                 return -ENOMEM;
>
>         fd = bpf_program__fd(skel->progs.cls);
>
>         DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex =
>                             if_nametoindex("lo"), .attach_point =
>                             BPF_TC_INGRESS);
>         /* Create clsact qdisc */
>         r = bpf_tc_hook_create(&hook, 0);
>         if (r < 0)
>                 goto end;
>
>         DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = fd);

I don't feel too strongly about this w.r.t. example, but
DECLARE_LIBBPF_OPTS() does declare a variable, so according to C89 all
such declarations should be gathered at the top. It would be nice to
stick to this in the example, but I can see how such locality is a bit
better for educational purposes, so I'm ok with that as well.

>         r = bpf_tc_attach(&hook, &opts, 0);
>         if (r < 0)
>                 goto end;
>         /* Print the auto allocated handle and priority */
>         printf("Handle=%"PRIu32", opts.handle);

let's drop PRIu32, libbpf doesn't use it so let's not use it as an
example, %u would work fine here

>         printf("Priority=%"PRIu32", opts.priority);
>
>         opts.prog_fd = opts.prog_id = 0;
>         bpf_tc_detach(&hook, &opts);
> end:
>         test_tc_bpf__destroy(skel);
>
> This is equivalent to doing the following using tc command line:
>   # tc qdisc add dev lo clsact
>   # tc filter add dev lo ingress bpf obj foo.o sec classifier da
>
> Another example replacing a filter (extending prior example):
>
>         /* We can also choose both (or one), let's try replacing an
>          * existing filter.
>          */
>         DECLARE_LIBBPF_OPTS(bpf_tc_opts, replace_opts, .handle =
>                             opts.handle, .priority = opts.priority,
>                             .prog_fd = fd);
>         r = bpf_tc_attach(&hook, &replace_opts, 0);
>         if (r < 0 && r == -EEXIST) {

again, == -EEXIST implies r < 0, this just looks sloppy

>                 /* Expected, now use BPF_TC_F_REPLACE to replace it */
>                 return bpf_tc_attach(&hook, &replace_opts, BPF_TC_F_REPLACE);
>         } else if (r == 0) {

I'd go with

else if (r < 0) {
    return r;
}

/* handle happy case without unnecessary nesting */

>                 /* There must be no existing filter with these
>                  * attributes, so cleanup and return an error.
>                  */
>                 replace_opts.prog_fd = replace_opts.prog_id = 0;
>                 r = bpf_tc_detach(&hook, &replace_opts);
>                 if (r == 0)
>                         r = -1;

just return -1;

>         }
>         return r;
>
> To obtain info of a particular filter:
>
>         /* Find info for filter with handle 1 and priority 50 */
>         DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts, .handle = 1,
>                             .priority = 50);
>         r = bpf_tc_query(&hook, &info_opts);
>         if (r < 0 && r == -ENOENT)
>                 printf("Filter not found");
>         else if (r == 0)
>                 printf("Prog ID: %"PRIu32", info_opts.prog_id);

same about PRI and r < 0

>         return r;
>
> We can also match using prog_id to find the same filter:
>
>         DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts2, .prog_id =
>                             info_opts.prog_id);
>         r = bpf_tc_query(&hook, &info_opts2);
>         if (r < 0 && r == -ENOENT)
>                 printf("Filter not found");
>         else if (r == 0) {
>                 /* If we know there's only one filter for this loaded prog,
>                  * it is safe to assert that the handle and priority are
>                  * as expected.
>                  */
>                 assert(info_opts2.handle == 1);
>                 assert(info_opts2.priority == 50);
>         }
>         return r;
>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---

API looks good to me (except the flags field that just stands out).
But I'll defer to Daniel to make the final call.

>  tools/lib/bpf/libbpf.h   |  41 ++++
>  tools/lib/bpf/libbpf.map |   5 +
>  tools/lib/bpf/netlink.c  | 463 ++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 508 insertions(+), 1 deletion(-)
>
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index bec4e6a6e31d..3de701f46a33 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -775,6 +775,47 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
>  LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
>  LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
>
> +enum bpf_tc_attach_point {
> +       BPF_TC_INGRESS = 1 << 0,
> +       BPF_TC_EGRESS  = 1 << 1,
> +       BPF_TC_CUSTOM  = 1 << 2,
> +};
> +
> +enum bpf_tc_attach_flags {
> +       BPF_TC_F_REPLACE = 1 << 0,
> +};
> +
> +struct bpf_tc_hook {
> +       size_t sz;
> +       int ifindex;
> +       enum bpf_tc_attach_point attach_point;
> +       __u32 parent;
> +       size_t :0;
> +};
> +
> +#define bpf_tc_hook__last_field parent
> +
> +struct bpf_tc_opts {
> +       size_t sz;
> +       int prog_fd;
> +       __u32 prog_id;
> +       __u32 handle;
> +       __u32 priority;
> +       size_t :0;
> +};
> +
> +#define bpf_tc_opts__last_field priority
> +
> +LIBBPF_API int bpf_tc_hook_create(struct bpf_tc_hook *hook, int flags);
> +LIBBPF_API int bpf_tc_hook_destroy(struct bpf_tc_hook *hook);
> +LIBBPF_API int bpf_tc_attach(const struct bpf_tc_hook *hook,
> +                            struct bpf_tc_opts *opts,
> +                            int flags);

why didn't you put flags into bpf_tc_opts? they are clearly optional
and fit into "opts" paradigm...
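
To make the suggestion concrete, a rough sketch of what carrying flags inside the opts struct could look like. Everything prefixed ex_/EX_ is a hypothetical stand-in, not the layout proposed in this patch, and the size check is a simplification of what OPTS_VALID would do:

```c
#include <errno.h>
#include <stddef.h>

enum ex_tc_flags { EX_TC_F_REPLACE = 1 << 0 };

/* Hypothetical opts layout with flags folded in (illustration only) */
struct ex_tc_opts {
	size_t sz;
	int prog_fd;
	unsigned int flags;	/* assumption: moved here from the function argument */
	unsigned int prog_id;
	unsigned int handle;
	unsigned int priority;
};

/* Validate the opts the way an attach entry point might */
static int ex_tc_attach_check(const struct ex_tc_opts *opts)
{
	if (!opts || opts->sz != sizeof(*opts))
		return -EINVAL;
	/* reject any flag bit we do not know about */
	if (opts->flags & ~(unsigned int)EX_TC_F_REPLACE)
		return -EINVAL;
	return 0;
}
```

With this shape, a caller that does not care about flags simply leaves the field zeroed.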

> +LIBBPF_API int bpf_tc_detach(const struct bpf_tc_hook *hook,
> +                            const struct bpf_tc_opts *opts);
> +LIBBPF_API int bpf_tc_query(const struct bpf_tc_hook *hook,
> +                           struct bpf_tc_opts *opts);
> +
>  #ifdef __cplusplus
>  } /* extern "C" */
>  #endif
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index b9b29baf1df8..04509c7c144b 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -361,4 +361,9 @@ LIBBPF_0.4.0 {
>                 bpf_linker__new;
>                 bpf_map__inner_map;
>                 bpf_object__set_kversion;
> +               bpf_tc_hook_create;
> +               bpf_tc_hook_destroy;

please keep this alphabetically sorted

> +               bpf_tc_attach;
> +               bpf_tc_detach;
> +               bpf_tc_query;
>  } LIBBPF_0.3.0;
> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
> index 6daee6640725..88f7b6144c78 100644
> --- a/tools/lib/bpf/netlink.c
> +++ b/tools/lib/bpf/netlink.c
> @@ -4,7 +4,11 @@
>  #include <stdlib.h>
>  #include <memory.h>
>  #include <unistd.h>
> +#include <inttypes.h>
> +#include <arpa/inet.h>
>  #include <linux/bpf.h>
> +#include <linux/if_ether.h>
> +#include <linux/pkt_cls.h>
>  #include <linux/rtnetlink.h>
>  #include <sys/socket.h>
>  #include <errno.h>
> @@ -73,6 +77,12 @@ static int libbpf_netlink_open(__u32 *nl_pid)
>         return ret;
>  }
>
> +enum {
> +       BPF_NL_CONT,
> +       BPF_NL_NEXT,
> +       BPF_NL_DONE,
> +};
> +
>  static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
>                             __dump_nlmsg_t _fn, libbpf_dump_nlmsg_t fn,
>                             void *cookie)
> @@ -84,6 +94,7 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
>         int len, ret;
>
>         while (multipart) {
> +start:
>                 multipart = false;
>                 len = recv(sock, buf, sizeof(buf), 0);
>                 if (len < 0) {
> @@ -121,8 +132,18 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
>                         }
>                         if (_fn) {
>                                 ret = _fn(nh, fn, cookie);
> -                               if (ret)
> +                               if (ret < 0)
> +                                       return ret;
> +                               switch (ret) {
> +                               case BPF_NL_CONT:
> +                                       break;
> +                               case BPF_NL_NEXT:
> +                                       goto start;
> +                               case BPF_NL_DONE:
> +                                       return 0;
> +                               default:
>                                         return ret;
> +                               }
>                         }
>                 }
>         }
> @@ -357,3 +378,443 @@ static int libbpf_nl_send_recv(struct nlmsghdr *nh, __dump_nlmsg_t fn,
>         close(sock);
>         return ret;
>  }
> +
> +/* TC-HOOK */
> +
> +typedef int (*qdisc_config_t)(struct nlmsghdr *nh, struct tcmsg *t,
> +                             size_t maxsz);
> +
> +static int clsact_config(struct nlmsghdr *nh, struct tcmsg *t, size_t maxsz)
> +{
> +       int ret;
> +
> +       t->tcm_parent = TC_H_CLSACT;
> +       t->tcm_handle = TC_H_MAKE(TC_H_CLSACT, 0);
> +
> +       ret = nlattr_add(nh, maxsz, TCA_KIND, "clsact", sizeof("clsact"));
> +       if (ret < 0)
> +               return ret;
> +
> +       return 0;

nit: return nlattr_add(...)

> +}
> +
> +static int attach_point_to_config(struct bpf_tc_hook *hook, qdisc_config_t *configp)
> +{
> +       if (!hook)
> +               return -EINVAL;

!hook should be already ensured by calling functions, no need to
re-check this everywhere, do this only in API methods. All internal
functions should already ensure non-NULL, otherwise it's a bug.
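
A minimal sketch of that split, with hypothetical stand-in names (ex_hook, ex_hook_config, ex_hook_create) rather than the real libbpf types:

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the hook struct, reduced to two fields for illustration */
struct ex_hook {
	int ifindex;
	unsigned int parent;
};

/* Internal helper: callers guarantee hook != NULL, so no re-check here */
static int ex_hook_config(const struct ex_hook *hook)
{
	return hook->parent ? -EINVAL : 0;
}

/* API entry point: the only place that validates the pointer */
int ex_hook_create(const struct ex_hook *hook)
{
	if (!hook || hook->ifindex <= 0)
		return -EINVAL;
	return ex_hook_config(hook);
}
```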

> +
> +       switch ((int)OPTS_GET(hook, attach_point, 0)) {

is int casting necessary here?

> +               case BPF_TC_INGRESS:
> +               case BPF_TC_EGRESS:
> +               case BPF_TC_INGRESS|BPF_TC_EGRESS:
> +                       if (OPTS_GET(hook, parent, 0))
> +                               return -EINVAL;
> +                       *configp = &clsact_config;
> +                       break;
> +               case BPF_TC_CUSTOM:
> +                       return -EOPNOTSUPP;
> +               default:
> +                       return -EINVAL;
> +       }
> +
> +       return 0;
> +}
> +
> +static long long int tc_get_tcm_parent(enum bpf_tc_attach_point attach_point,
> +                                      __u32 parent)
> +{
> +       long long int ret;
> +
> +       switch (attach_point) {
> +       case BPF_TC_INGRESS:
> +               if (parent)
> +                       return -EINVAL;
> +               ret = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS);

direct return

> +               break;
> +       case BPF_TC_EGRESS:
> +               if (parent)
> +                       return -EINVAL;
> +               ret = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS);

same, make it explicit that we are done and it's the final value returned
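
Roughly what the direct-return shape could look like. The EX_ constants and macro below are illustrative stand-ins defined locally so the sketch compiles on its own; real code would take them from <linux/pkt_sched.h>:

```c
#include <errno.h>

/* Illustrative stand-ins; not taken from the patch */
enum ex_attach_point {
	EX_INGRESS = 1 << 0,
	EX_EGRESS  = 1 << 1,
	EX_CUSTOM  = 1 << 2,
};

#define EX_H_MAKE(maj, min) (((maj) & 0xFFFF0000U) | ((min) & 0x0000FFFFU))
#define EX_H_CLSACT      0xFFFFFFF1U
#define EX_H_MIN_INGRESS 0xFFF2U
#define EX_H_MIN_EGRESS  0xFFF3U

/* Each case returns its final value directly; no shared ret variable.
 * long long keeps negative error codes distinct from 32-bit handles.
 */
static long long ex_get_tcm_parent(enum ex_attach_point ap, unsigned int parent)
{
	switch (ap) {
	case EX_INGRESS:
		if (parent)
			return -EINVAL;
		return EX_H_MAKE(EX_H_CLSACT, EX_H_MIN_INGRESS);
	case EX_EGRESS:
		if (parent)
			return -EINVAL;
		return EX_H_MAKE(EX_H_CLSACT, EX_H_MIN_EGRESS);
	case EX_CUSTOM:
		if (!parent)
			return -EINVAL;
		return parent;
	default:
		return -EINVAL;
	}
}
```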

> +               break;
> +       case BPF_TC_CUSTOM:
> +               if (!parent)
> +                       return -EINVAL;
> +               ret = parent;
> +               break;
> +       default:
> +               return -EINVAL;
> +       }
> +
> +       return ret;
> +}
> +
> +static int tc_qdisc_modify(struct bpf_tc_hook *hook, int cmd, int flags)
> +{
> +       qdisc_config_t config;
> +       int ret = 0;

unnecessary initialization, some tooling definitely will complain,
please drop = 0 part

> +       struct {
> +               struct nlmsghdr nh;
> +               struct tcmsg t;
> +               char buf[256];
> +       } req;
> +
> +       ret = attach_point_to_config(hook, &config);
> +       if (ret < 0)
> +               return ret;
> +
> +       memset(&req, 0, sizeof(req));
> +       req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
> +       req.nh.nlmsg_flags =
> +               NLM_F_REQUEST | NLM_F_ACK | flags;

we can go up to 100 character lines, keep it on single line

> +       req.nh.nlmsg_type = cmd;
> +       req.t.tcm_family = AF_UNSPEC;
> +       req.t.tcm_ifindex = OPTS_GET(hook, ifindex, 0);
> +
> +       ret = config(&req.nh, &req.t, sizeof(req));
> +       if (ret < 0)
> +               return ret;
> +
> +       ret = libbpf_nl_send_recv(&req.nh, NULL, NULL, NULL);
> +       if (ret < 0)
> +               return ret;
> +
> +       return 0;
> +}
> +
> +static int tc_qdisc_create_excl(struct bpf_tc_hook *hook, int flags)
> +{
> +       flags = flags & BPF_TC_F_REPLACE ? NLM_F_REPLACE : NLM_F_EXCL;

see below as well, please use () around bit operators
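
For illustration, the two expressions with explicit parentheses. In C, & already binds tighter than both ?: and ||, so this changes readability only, not behaviour. The EX_ names and numeric values are placeholders, not the real netlink constants:

```c
/* Placeholder constants for illustration only */
enum { EX_F_REPLACE = 1 << 0 };
#define EX_NLM_REPLACE 0x100
#define EX_NLM_EXCL    0x200

/* Map the API flag to a netlink-style flag, with the bit test parenthesized */
static int ex_to_nl_flags(int flags)
{
	return (flags & EX_F_REPLACE) ? EX_NLM_REPLACE : EX_NLM_EXCL;
}

/* Non-zero when flags contains only known bits and ifindex is positive */
static int ex_args_valid(int ifindex, int flags)
{
	if (ifindex <= 0 || (flags & ~EX_F_REPLACE))
		return 0;
	return 1;
}
```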

> +       return tc_qdisc_modify(hook, RTM_NEWQDISC, NLM_F_CREATE | flags);
> +}
> +
> +static int tc_qdisc_delete(struct bpf_tc_hook *hook)
> +{
> +       return tc_qdisc_modify(hook, RTM_DELQDISC, 0);
> +}
> +
> +int bpf_tc_hook_create(struct bpf_tc_hook *hook, int flags)
> +{
> +       if (!hook || !OPTS_VALID(hook, bpf_tc_hook))
> +               return -EINVAL;
> +       if (OPTS_GET(hook, ifindex, 0) <= 0 || flags & ~BPF_TC_F_REPLACE)

please use () around bit operators

> +               return -EINVAL;
> +
> +       return tc_qdisc_create_excl(hook, flags);
> +}
> +
> +static int tc_cls_detach(const struct bpf_tc_hook *hook,
> +                        const struct bpf_tc_opts *opts, bool flush);
> +
> +int bpf_tc_hook_destroy(struct bpf_tc_hook *hook)
> +{
> +       if (!hook || !OPTS_VALID(hook, bpf_tc_hook) ||
> +           OPTS_GET(hook, ifindex, 0) <= 0)
> +               return -EINVAL;
> +
> +       switch ((int)OPTS_GET(hook, attach_point, 0)) {

int casting. Did the compiler complain about that or what?

> +               case BPF_TC_INGRESS:
> +               case BPF_TC_EGRESS:
> +                       return tc_cls_detach(hook, NULL, true);
> +               case BPF_TC_INGRESS|BPF_TC_EGRESS:
> +                       return tc_qdisc_delete(hook);
> +               case BPF_TC_CUSTOM:
> +                       return -EOPNOTSUPP;
> +               default:
> +                       return -EINVAL;
> +       }
> +}
> +
> +struct pass_info {
> +       struct bpf_tc_opts *opts;
> +       __u32 match_prog_id;
> +       bool processed;
> +};
> +
> +/* TC-BPF */
> +
> +static int tc_cls_add_fd_and_name(struct nlmsghdr *nh, size_t maxsz, int fd)
> +{
> +       struct bpf_prog_info info = {};
> +       char name[256] = {};

you are unconditionally snprintf()'ing into name, don't unnecessarily
initialize it

> +       int len, ret;
> +
> +       ret = bpf_obj_get_info_by_fd(fd, &info, &(__u32){sizeof(info)});

that sizeof part... even if that works reliably, stick to normal use
pattern, have a local variable for that. It can be overwritten by the
kernel.

you can re-use len for this, btw

> +       if (ret < 0)
> +               return ret;
> +
> +       ret = nlattr_add(nh, maxsz, TCA_BPF_FD, &fd, sizeof(fd));
> +       if (ret < 0)
> +               return ret;
> +
> +       len = snprintf(name, sizeof(name), "%s:[%" PRIu32 "]", info.name,

libbpf doesn't use PRI modifiers, use %u

> +                      info.id);
> +       if (len < 0 || len >= sizeof(name))
> +               return len < 0 ? -EINVAL : -ENAMETOOLONG;

if (len < 0)
    return -errno;
if (len >= sizeof(name))
    return -ENAMETOOLONG;
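
Put together with the %u suggestion above, the formatting step might look like the sketch below. ex_format_name is a hypothetical helper, and unsigned int stands in for __u32:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: format "<name>:[<id>]" into buf.
 * Returns the string length on success, a negative error otherwise.
 */
static int ex_format_name(char *buf, size_t sz, const char *prog_name,
			  unsigned int id)
{
	int len;

	len = snprintf(buf, sz, "%s:[%u]", prog_name, id);
	if (len < 0)
		return -errno;
	/* snprintf returns the untruncated length; >= sz means it did not fit */
	if ((size_t)len >= sz)
		return -ENAMETOOLONG;
	return len;
}
```

Splitting the two checks keeps each error path on its own line and avoids the nested ternary.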

> +
> +       return nlattr_add(nh, maxsz, TCA_BPF_NAME, name, len + 1);
> +}
> +
> +
> +static int cls_get_info(struct nlmsghdr *nh, libbpf_dump_nlmsg_t fn,
> +                       void *cookie);
> +
> +int bpf_tc_attach(const struct bpf_tc_hook *hook,
> +                 struct bpf_tc_opts *opts, int flags)
> +{
> +       __u32 protocol = 0, bpf_flags;
> +       struct pass_info info = {};
> +       long long int tcm_parent;
> +       struct nlattr *nla;
> +       int ret;
> +       struct {
> +               struct nlmsghdr nh;
> +               struct tcmsg t;
> +               char buf[256];
> +       } req;
> +
> +       if (!hook || !opts || !OPTS_VALID(hook, bpf_tc_opts) ||
> +           !OPTS_VALID(opts, bpf_tc_opts))
> +               return -EINVAL;
> +       if (OPTS_GET(hook, ifindex, 0) <= 0 || !OPTS_GET(opts, prog_fd, 0) ||
> +           OPTS_GET(opts, prog_id, 0))
> +               return -EINVAL;
> +       if (OPTS_GET(opts, priority, 0) > UINT16_MAX)
> +               return -EINVAL;
> +       if (flags & ~BPF_TC_F_REPLACE)
> +               return -EINVAL;
> +
> +       protocol = ETH_P_ALL;
> +       flags = flags & BPF_TC_F_REPLACE ? NLM_F_REPLACE : NLM_F_EXCL;

()

> +
> +       memset(&req, 0, sizeof(req));
> +       req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
> +       req.nh.nlmsg_flags =
> +               NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE | NLM_F_ECHO | flags;
> +       req.nh.nlmsg_type = RTM_NEWTFILTER;
> +       req.t.tcm_family = AF_UNSPEC;
> +       req.t.tcm_handle = OPTS_GET(opts, handle, 0);
> +       req.t.tcm_ifindex = OPTS_GET(hook, ifindex, 0);

you are OPTS_GET()ing same stuff multiple times, it might look cleaner
to use local variables for that. It will be faster also, but that's
not important here.

> +       req.t.tcm_info = TC_H_MAKE(OPTS_GET(opts, priority, 0) << 16, htons(protocol));
> +
> +       tcm_parent = tc_get_tcm_parent(OPTS_GET(hook, attach_point, 0), OPTS_GET(hook, parent, 0));

and this will be much shorter, positively, please use local variables
for all those input fields you care about

> +       if (tcm_parent < 0)
> +               return tcm_parent;
> +       req.t.tcm_parent = tcm_parent;
> +
> +       ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "bpf", sizeof("bpf"));
> +       if (ret < 0)
> +               return ret;
> +
> +       nla = nlattr_begin_nested(&req.nh, sizeof(req), TCA_OPTIONS);
> +       if (!nla)
> +               return -EMSGSIZE;
> +
> +       ret = tc_cls_add_fd_and_name(&req.nh, sizeof(req), OPTS_GET(opts, prog_fd, 0));
> +       if (ret < 0)
> +               return ret;
> +
> +       /* direct action mode is always enabled */
> +       bpf_flags = TCA_BPF_FLAG_ACT_DIRECT;
> +       ret = nlattr_add(&req.nh, sizeof(req), TCA_BPF_FLAGS,
> +                        &bpf_flags, sizeof(bpf_flags));
> +       if (ret < 0)
> +               return ret;
> +
> +       nlattr_end_nested(&req.nh, nla);
> +
> +       info.opts = opts;
> +
> +       ret = libbpf_nl_send_recv(&req.nh, &cls_get_info, NULL, &info);
> +       if (ret < 0)
> +               return ret;
> +
> +       /* Failed to process unicast response */
> +       if (!info.processed)
> +               ret = -ENOENT;

just return directly, you just did that multiple times above, why is this
one special?

> +
> +       return ret;
> +}
> +
> +static int tc_cls_detach(const struct bpf_tc_hook *hook,
> +                        const struct bpf_tc_opts *opts, bool flush)
> +{
> +       long long int tcm_parent;
> +       __u32 protocol = 0;
> +       int ret, c;
> +       struct {
> +               struct nlmsghdr nh;
> +               struct tcmsg t;
> +               char buf[256];
> +       } req;
> +
> +       if (!hook || !OPTS_VALID(hook, bpf_tc_opts) ||
> +           !OPTS_VALID(opts, bpf_tc_opts))
> +               return -EINVAL;
> +       if (OPTS_GET(hook, ifindex, 0) <= 0 || OPTS_GET(opts, prog_fd, 0) ||
> +           OPTS_GET(opts, prog_id, 0))
> +               return -EINVAL;
> +       c = !!OPTS_GET(opts, handle, 0) + !!OPTS_GET(opts, priority, 0);
> +       if ((flush && c != 0) || (!flush && c != 2))
> +               return -EINVAL;

the arithmetic here looks pretty ugly; would it be too bad with logical checks?
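
For illustration, a minimal sketch of what the logical form could look like. The helper name detach_opts_valid and its plain integer parameters are hypothetical stand-ins for the OPTS_GET() values in the patch; the rule it encodes (flush requires neither handle nor priority, a normal detach requires both) is read off the quoted check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: same rule as the quoted
 * "c = !!handle + !!priority" check, written with logical tests.
 */
static bool detach_opts_valid(bool flush, uint32_t handle, uint32_t priority)
{
	/* flush removes every filter, so neither field may be set */
	if (flush)
		return !handle && !priority;

	/* a single-filter detach needs both fields to identify the filter */
	return handle && priority;
}
```

The caller would then return -EINVAL when this helper returns false.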

> +       if (OPTS_GET(opts, priority, 0) > UINT16_MAX)
> +               return -EINVAL;
> +
> +       if (!flush)
> +               protocol = ETH_P_ALL;
> +
> +       memset(&req, 0, sizeof(req));
> +       req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
> +       req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
> +       req.nh.nlmsg_type = RTM_DELTFILTER;
> +       req.t.tcm_family = AF_UNSPEC;
> +       if (!flush)
> +               req.t.tcm_handle = OPTS_GET(opts, handle, 0);
> +       req.t.tcm_ifindex = OPTS_GET(hook, ifindex, 0);
> +       if (!flush)
> +               req.t.tcm_info = TC_H_MAKE(OPTS_GET(opts, priority, 0) << 16,

OPTS_GET()s just make everything uglier and unnecessarily verbose

> +                                          htons(protocol));
> +
> +       tcm_parent = tc_get_tcm_parent(OPTS_GET(hook, attach_point, 0), OPTS_GET(hook, parent, 0));
> +       if (tcm_parent < 0)
> +               return tcm_parent;
> +       req.t.tcm_parent = tcm_parent;
> +
> +       if (!flush) {
> +               ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "bpf", sizeof("bpf"));
> +               if (ret < 0)
> +                       return ret;
> +       }
> +
> +       return libbpf_nl_send_recv(&req.nh, NULL, NULL, NULL);
> +}
> +

[...]

> +       tcm_parent = tc_get_tcm_parent(OPTS_GET(hook, attach_point, 0), OPTS_GET(hook, parent, 0));
> +       if (tcm_parent < 0)
> +               return tcm_parent;
> +       req.t.tcm_parent = tcm_parent;
> +
> +       ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "bpf", sizeof("bpf"));
> +       if (ret < 0)
> +               return ret;
> +
> +       if (OPTS_GET(opts, prog_fd, 0)) {
> +               struct bpf_prog_info info = {};
> +               ret = bpf_obj_get_info_by_fd(OPTS_GET(opts, prog_fd, 0), &info, &(__u32){sizeof(info)});

same as before, use dedicated variable

> +               if (ret < 0)
> +                       return ret;
> +
> +               pinfo.match_prog_id = info.id;
> +       } else
> +               pinfo.match_prog_id = OPTS_GET(opts, prog_id, 0);

when one branch of if has {}, the other one has to have it as well, please fix

> +
> +       pinfo.opts = opts;
> +
> +       ret = libbpf_nl_send_recv(&req.nh, cls_get_info, NULL, &pinfo);
> +       if (ret < 0)
> +               return ret;
> +
> +       if (!pinfo.processed)
> +               ret = -ENOENT;

direct return

> +
> +       return ret;
> +}
> --
> 2.30.2
>

* Re: [PATCH bpf-next v5 3/3] libbpf: add selftests for TC-BPF API
  2021-04-28 16:25 ` [PATCH bpf-next v5 3/3] libbpf: add selftests for " Kumar Kartikeya Dwivedi
@ 2021-04-30 19:41   ` Andrii Nakryiko
  2021-05-01  6:34     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 14+ messages in thread
From: Andrii Nakryiko @ 2021-04-30 19:41 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Shaun Crampton,
	Networking

On Wed, Apr 28, 2021 at 9:26 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> This adds some basic tests for the low level bpf_tc_* API.
>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  .../testing/selftests/bpf/prog_tests/tc_bpf.c | 467 ++++++++++++++++++
>  .../testing/selftests/bpf/progs/test_tc_bpf.c |  12 +
>  2 files changed, 479 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_bpf.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/tc_bpf.c b/tools/testing/selftests/bpf/prog_tests/tc_bpf.c
> new file mode 100644
> index 000000000000..40441f4e23e2
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/tc_bpf.c
> @@ -0,0 +1,467 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <test_progs.h>
> +#include <linux/pkt_cls.h>
> +
> +#include "test_tc_bpf.skel.h"
> +
> +#define LO_IFINDEX 1
> +
> +static int test_tc_internal(const struct bpf_tc_hook *hook, int fd)
> +{
> +       DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1,
> +                           .prog_fd = fd);

we have 100 characters per line; if needed, use them to keep this on a single line

> +       struct bpf_prog_info info = {};
> +       int ret;
> +
> +       ret = bpf_obj_get_info_by_fd(fd, &info, &(__u32){sizeof(info)});

as in previous patch, don't do this
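
Assuming the objection is to the &(__u32){sizeof(info)} compound literal, a rough sketch of the dedicated-variable form follows. fake_get_info and struct fake_info are hypothetical stand-ins so the snippet compiles on its own; they only imitate an in/out length parameter and are not the libbpf API.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct bpf_prog_info */
struct fake_info {
	uint32_t id;
};

/* Hypothetical stand-in for a call with an in/out length argument:
 * reads *len as the buffer size and writes back the bytes filled in.
 */
static int fake_get_info(struct fake_info *info, uint32_t *len)
{
	if (!info || !len || *len < sizeof(*info))
		return -1;
	info->id = 7;
	*len = sizeof(*info);
	return 0;
}

static int query_id(uint32_t *id)
{
	struct fake_info info;
	uint32_t info_len = sizeof(info); /* named variable instead of a compound literal */
	int ret;

	memset(&info, 0, sizeof(info));
	ret = fake_get_info(&info, &info_len);
	if (ret < 0)
		return ret;

	*id = info.id;
	return 0;
}
```

With a named info_len the value written back by the call stays available to the caller, which the compound literal hides.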

> +       if (!ASSERT_OK(ret, "bpf_obj_get_info_by_fd"))
> +               return ret;
> +
> +       ret = bpf_tc_attach(hook, &opts, 0);
> +       if (!ASSERT_OK(ret, "bpf_tc_attach"))
> +               return ret;
> +
> +       if (!ASSERT_EQ(opts.handle, 1, "handle set") ||
> +           !ASSERT_EQ(opts.priority, 1, "priority set") ||
> +           !ASSERT_EQ(opts.prog_id, info.id, "prog_id set"))
> +               goto end;
> +
> +       DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts, .prog_fd = fd);

this is not C89, please move variable declarations to the top

> +       ret = bpf_tc_query(hook, &info_opts);
> +       if (!ASSERT_OK(ret, "bpf_tc_query"))
> +               goto end;
> +
> +       DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts2, .prog_id = info.id);

and here

> +       ret = bpf_tc_query(hook, &info_opts2);
> +       if (!ASSERT_OK(ret, "bpf_tc_query"))
> +               goto end;
> +
> +       if (!ASSERT_EQ(opts.handle, 1, "handle set") ||
> +           !ASSERT_EQ(opts.priority, 1, "priority set") ||
> +           !ASSERT_EQ(opts.prog_id, info.id, "prog_id set"))
> +               goto end;
> +
> +       opts.prog_id = 0;
> +       ret = bpf_tc_attach(hook, &opts, BPF_TC_F_REPLACE);
> +       if (!ASSERT_OK(ret, "bpf_tc_attach replace mode"))
> +               return ret;

goto end?

> +
> +end:
> +       opts.prog_fd = opts.prog_id = 0;
> +       ret = bpf_tc_detach(hook, &opts);
> +       ASSERT_OK(ret, "bpf_tc_detach");
> +       return ret;
> +}
> +

[...]

> +
> +       /* attach */
> +       ret = bpf_tc_attach(NULL, &attach_opts, 0);
> +       if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid hook = NULL"))
> +               return -EINVAL;
> +       ret = bpf_tc_attach(hook, &attach_opts, 42);
> +       if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid flags"))
> +               return -EINVAL;
> +       attach_opts.prog_fd = 0;
> +       ret = bpf_tc_attach(hook, &attach_opts, 0);
> +       if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid prog_fd unset"))
> +               return -EINVAL;
> +       attach_opts.prog_fd = fd;
> +       attach_opts.prog_id = 42;
> +       ret = bpf_tc_attach(hook, &attach_opts, 0);
> +       if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid prog_id set"))
> +               return -EINVAL;
> +       attach_opts.prog_id = 0;
> +       attach_opts.handle = 0;
> +       ret = bpf_tc_attach(hook, &attach_opts, 0);
> +       if (!ASSERT_OK(ret, "bpf_tc_attach valid handle unset"))
> +               return -EINVAL;
> +       attach_opts.prog_fd = attach_opts.prog_id = 0;
> +       ASSERT_OK(bpf_tc_detach(hook, &attach_opts), "bpf_tc_detach");

this code is quite hard to follow, maybe sprinkle empty lines between
logical groups of statements (i.e., prepare inputs + call bpf_tc_xxx +
assert is one group that goes together)

> +       attach_opts.prog_fd = fd;
> +       attach_opts.handle = 1;
> +       attach_opts.priority = 0;
> +       ret = bpf_tc_attach(hook, &attach_opts, 0);
> +       if (!ASSERT_OK(ret, "bpf_tc_attach valid priority unset"))
> +               return -EINVAL;
> +       attach_opts.prog_fd = attach_opts.prog_id = 0;
> +       ASSERT_OK(bpf_tc_detach(hook, &attach_opts), "bpf_tc_detach");
> +       attach_opts.prog_fd = fd;
> +       attach_opts.priority = UINT16_MAX + 1;
> +       ret = bpf_tc_attach(hook, &attach_opts, 0);
> +       if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid priority > UINT16_MAX"))
> +               return -EINVAL;
> +       attach_opts.priority = 0;
> +       attach_opts.handle = attach_opts.priority = 0;
> +       ret = bpf_tc_attach(hook, &attach_opts, 0);
> +       if (!ASSERT_OK(ret, "bpf_tc_attach valid both handle and priority unset"))
> +               return -EINVAL;
> +       attach_opts.prog_fd = attach_opts.prog_id = 0;
> +       ASSERT_OK(bpf_tc_detach(hook, &attach_opts), "bpf_tc_detach");
> +       ret = bpf_tc_attach(hook, NULL, 0);
> +       if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid opts = NULL"))
> +               return -EINVAL;
> +
> +       return 0;
> +}
> +
> +static int test_tc_query(const struct bpf_tc_hook *hook, int fd)
> +{
> +       struct test_tc_bpf *skel = NULL;
> +       int new_fd, ret, i = 0;
> +
> +       skel = test_tc_bpf__open_and_load();
> +       if (!ASSERT_OK_PTR(skel, "test_tc_bpf__open_and_load"))
> +               return -EINVAL;
> +
> +       new_fd = bpf_program__fd(skel->progs.cls);
> +
> +       /* make sure no other filters are attached */
> +       ret = bpf_tc_query(hook, NULL);
> +       if (!ASSERT_EQ(ret, -ENOENT, "bpf_tc_query == -ENOENT"))
> +               goto end_destroy;
> +
> +       for (i = 0; i < 5; i++) {
> +               DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = fd);

empty line after variable declaration

> +               ret = bpf_tc_attach(hook, &opts, 0);
> +               if (!ASSERT_OK(ret, "bpf_tc_attach"))
> +                       goto end;
> +       }
> +       DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1,
> +                           .prog_fd = new_fd);
> +       ret = bpf_tc_attach(hook, &opts, 0);
> +       if (!ASSERT_OK(ret, "bpf_tc_attach"))
> +               goto end;
> +       i++;
> +
> +       ASSERT_EQ(opts.handle, 1, "handle match");
> +       ASSERT_EQ(opts.priority, 1, "priority match");
> +       ASSERT_NEQ(opts.prog_id, 0, "prog_id set");
> +
> +       opts.prog_fd = 0;
> +       /* search with handle, priority, prog_id */
> +       ret = bpf_tc_query(hook, &opts);
> +       if (!ASSERT_OK(ret, "bpf_tc_query"))
> +               goto end;
> +
> +       ASSERT_EQ(opts.handle, 1, "handle match");
> +       ASSERT_EQ(opts.priority, 1, "priority match");
> +       ASSERT_NEQ(opts.prog_id, 0, "prog_id set");
> +
> +       opts.priority = opts.prog_fd = 0;
> +       /* search with handle, prog_id */
> +       ret = bpf_tc_query(hook, &opts);
> +       if (!ASSERT_OK(ret, "bpf_tc_query"))
> +               goto end;
> +
> +       ASSERT_EQ(opts.handle, 1, "handle match");
> +       ASSERT_EQ(opts.priority, 1, "priority match");
> +       ASSERT_NEQ(opts.prog_id, 0, "prog_id set");
> +
> +       opts.handle = opts.prog_fd = 0;
> +       /* search with priority, prog_id */
> +       ret = bpf_tc_query(hook, &opts);
> +       if (!ASSERT_OK(ret, "bpf_tc_query"))
> +               goto end;
> +
> +       ASSERT_EQ(opts.handle, 1, "handle match");
> +       ASSERT_EQ(opts.priority, 1, "priority match");
> +       ASSERT_NEQ(opts.prog_id, 0, "prog_id set");
> +
> +       opts.handle = opts.priority = opts.prog_fd = 0;
> +       /* search with prog_id */
> +       ret = bpf_tc_query(hook, &opts);
> +       if (!ASSERT_OK(ret, "bpf_tc_query"))
> +               goto end;
> +
> +       ASSERT_EQ(opts.handle, 1, "handle match");
> +       ASSERT_EQ(opts.priority, 1, "priority match");
> +       ASSERT_NEQ(opts.prog_id, 0, "prog_id set");
> +
> +       while (i != 1) {
> +               DECLARE_LIBBPF_OPTS(bpf_tc_opts, del_opts, .prog_fd = fd);

empty line here

> +               ret = bpf_tc_query(hook, &del_opts);
> +               if (!ASSERT_OK(ret, "bpf_tc_query"))
> +                       goto end;
> +               ASSERT_NEQ(del_opts.prog_id, opts.prog_id, "prog_id should not be same");
> +               ASSERT_NEQ(del_opts.priority, 1, "priority should not be 1");
> +               del_opts.prog_fd = del_opts.prog_id = 0;
> +               ret = bpf_tc_detach(hook, &del_opts);
> +               if (!ASSERT_OK(ret, "bpf_tc_detach"))
> +                       goto end;
> +               i--;
> +       }
> +
> +       opts.handle = opts.priority = opts.prog_id = 0;
> +       opts.prog_fd = fd;
> +       ret = bpf_tc_query(hook, &opts);
> +       ASSERT_EQ(ret, -ENOENT, "bpf_tc_query == -ENOENT");
> +
> +end:
> +       while (i--) {
> +               DECLARE_LIBBPF_OPTS(bpf_tc_opts, del_opts, 0);

you get the idea by now

> +               ret = bpf_tc_query(hook, &del_opts);
> +               if (!ASSERT_OK(ret, "bpf_tc_query"))
> +                       break;
> +               del_opts.prog_id = 0;
> +               ret = bpf_tc_detach(hook, &del_opts);
> +               if (!ASSERT_OK(ret, "bpf_tc_detach"))
> +                       break;
> +       }
> +       ASSERT_EQ(bpf_tc_query(hook, NULL), -ENOENT, "bpf_tc_query == -ENOENT");
> +end_destroy:
> +       test_tc_bpf__destroy(skel);
> +       return ret;
> +}
> +

[...]

* Re: [PATCH bpf-next v5 1/3] libbpf: add netlink helpers
  2021-04-30 19:04   ` Andrii Nakryiko
@ 2021-05-01  6:13     ` Kumar Kartikeya Dwivedi
  2021-05-03 22:47       ` Andrii Nakryiko
  0 siblings, 1 reply; 14+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-05-01  6:13 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Shaun Crampton,
	Networking

On Sat, May 01, 2021 at 12:34:39AM IST, Andrii Nakryiko wrote:
> On Wed, Apr 28, 2021 at 9:26 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > This change introduces a few helpers to wrap open coded attribute
> > preparation in netlink.c. It also adds a libbpf_nl_send_recv that is useful
> > to wrap send + recv handling in a generic way. Subsequent patch will
> > also use this function for sending and receiving a netlink response.
> > The libbpf_nl_get_link helper has been removed instead, moving socket
> > creation into the newly named libbpf_nl_send_recv.
> >
> > Every nested attribute's closure must happen using the helper
> > nlattr_end_nested, which sets its length properly. NLA_F_NESTED is
> > enforced using nlattr_begin_nested helper. Other simple attributes
> > can be added directly.
> >
> > The maxsz parameter corresponds to the size of the request structure
> > which is being filled in, so for instance with req being:
> >
> > struct {
> >         struct nlmsghdr nh;
> >         struct tcmsg t;
> >         char buf[4096];
> > } req;
> >
> > Then, maxsz should be sizeof(req).
> >
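
A small self-contained sketch of how maxsz bounds the message, under some assumptions: ex_nlattr_add is a local copy of the nlattr_add logic quoted further down in this patch (with the !!data != !!len form of the check), and struct ex_req with its deliberately tiny 16-byte buffer is made up for the example.

```c
#include <errno.h>
#include <string.h>
#include <linux/netlink.h>

/* Made-up request type with a tiny attribute buffer to hit the limit fast */
struct ex_req {
	struct nlmsghdr nh;
	char buf[16];
};

/* Local copy of the patch's nlattr_add logic, for illustration only */
static int ex_nlattr_add(struct nlmsghdr *nh, size_t maxsz, int type,
			 const void *data, int len)
{
	struct nlattr *nla;

	/* refuse to grow the message past the enclosing request structure */
	if (NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(NLA_HDRLEN + len) > maxsz)
		return -EMSGSIZE;
	if (!!data != !!len)
		return -EINVAL;

	nla = (struct nlattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
	nla->nla_type = type;
	nla->nla_len = NLA_HDRLEN + len;
	if (data)
		memcpy((char *)nla + NLA_HDRLEN, data, len);
	nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(nla->nla_len);
	return 0;
}

/* Append n 4-byte attributes; maxsz is sizeof(*req), as described above */
static int ex_fill(struct ex_req *req, int n)
{
	int value = 42, i, ret;

	memset(req, 0, sizeof(*req));
	req->nh.nlmsg_len = NLMSG_LENGTH(0);
	for (i = 0; i < n; i++) {
		ret = ex_nlattr_add(&req->nh, sizeof(*req), 1, &value, sizeof(value));
		if (ret < 0)
			return ret;
	}
	return 0;
}
```

Each attribute here takes 8 bytes (4-byte header plus 4-byte payload), so two of them fit in the 16-byte buffer and a third attempt fails with -EMSGSIZE.
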
> > This change also converts the open coded attribute preparation with the
> > helpers. Note that the only failure the internal call to nlattr_add
> > in the nested helper could result in is -EMSGSIZE, hence that is what
> > we return to our caller.
> >
> > The libbpf_nl_send_recv call takes care of opening the socket, sending the
> > netlink message, receiving the response, potentially invoking callbacks,
> > returning errors if any, and finally closing the socket. This allows
> > users to avoid identical socket setup code in different places. The only
> > user of libbpf_nl_get_link has been converted to make use of it.
> >
> > __bpf_set_link_xdp_fd_replace has also been refactored to use it.
> >
> > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  tools/lib/bpf/netlink.c | 117 ++++++++++++++++++----------------------
> >  tools/lib/bpf/nlattr.h  |  48 +++++++++++++++++
> >  2 files changed, 100 insertions(+), 65 deletions(-)
> >
> > diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
> > index d2cb28e9ef52..6daee6640725 100644
> > --- a/tools/lib/bpf/netlink.c
> > +++ b/tools/lib/bpf/netlink.c
> > @@ -131,72 +131,53 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
> >         return ret;
> >  }
> >
> > +static int libbpf_nl_send_recv(struct nlmsghdr *nh, __dump_nlmsg_t fn,
> > +                              libbpf_dump_nlmsg_t _fn, void *cookie);
> > +
> >  static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
> >                                          __u32 flags)
> >  {
> > -       int sock, seq = 0, ret;
> > -       struct nlattr *nla, *nla_xdp;
> > +       struct nlattr *nla;
> > +       int ret;
> >         struct {
> >                 struct nlmsghdr  nh;
> >                 struct ifinfomsg ifinfo;
> >                 char             attrbuf[64];
> >         } req;
> > -       __u32 nl_pid = 0;
> > -
> > -       sock = libbpf_netlink_open(&nl_pid);
> > -       if (sock < 0)
> > -               return sock;
> >
> >         memset(&req, 0, sizeof(req));
> >         req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
> >         req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
> >         req.nh.nlmsg_type = RTM_SETLINK;
> > -       req.nh.nlmsg_pid = 0;
> > -       req.nh.nlmsg_seq = ++seq;
> >         req.ifinfo.ifi_family = AF_UNSPEC;
> >         req.ifinfo.ifi_index = ifindex;
> >
> >         /* started nested attribute for XDP */
> > -       nla = (struct nlattr *)(((char *)&req)
> > -                               + NLMSG_ALIGN(req.nh.nlmsg_len));
> > -       nla->nla_type = NLA_F_NESTED | IFLA_XDP;
> > -       nla->nla_len = NLA_HDRLEN;
> > +       nla = nlattr_begin_nested(&req.nh, sizeof(req), IFLA_XDP);
> > +       if (!nla)
> > +               return -EMSGSIZE;
> >
> >         /* add XDP fd */
> > -       nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
> > -       nla_xdp->nla_type = IFLA_XDP_FD;
> > -       nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
> > -       memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
> > -       nla->nla_len += nla_xdp->nla_len;
> > +       ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_FD, &fd, sizeof(fd));
> > +       if (ret < 0)
> > +               return ret;
> >
> >         /* if user passed in any flags, add those too */
> >         if (flags) {
> > -               nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
> > -               nla_xdp->nla_type = IFLA_XDP_FLAGS;
> > -               nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
> > -               memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
> > -               nla->nla_len += nla_xdp->nla_len;
> > +               ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_FLAGS, &flags, sizeof(flags));
> > +               if (ret < 0)
> > +                       return ret;
> >         }
> >
> >         if (flags & XDP_FLAGS_REPLACE) {
> > -               nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
> > -               nla_xdp->nla_type = IFLA_XDP_EXPECTED_FD;
> > -               nla_xdp->nla_len = NLA_HDRLEN + sizeof(old_fd);
> > -               memcpy((char *)nla_xdp + NLA_HDRLEN, &old_fd, sizeof(old_fd));
> > -               nla->nla_len += nla_xdp->nla_len;
> > +               ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_EXPECTED_FD, &flags, sizeof(flags));
>
> shouldn't old_fd be used here?
>

Ouch, yes, thanks for spotting this.

> > +               if (ret < 0)
> > +                       return ret;
> >         }
> >
> > -       req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
> > +       nlattr_end_nested(&req.nh, nla);
> >
> > -       if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
> > -               ret = -errno;
> > -               goto cleanup;
> > -       }
> > -       ret = bpf_netlink_recv(sock, nl_pid, seq, NULL, NULL, NULL);
> > -
> > -cleanup:
> > -       close(sock);
> > -       return ret;
> > +       return libbpf_nl_send_recv(&req.nh, NULL, NULL, NULL);
> >  }
> >
> >  int bpf_set_link_xdp_fd_opts(int ifindex, int fd, __u32 flags,
>
> [...]
>
> > -int libbpf_nl_get_link(int sock, unsigned int nl_pid,
> > -                      libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
> > +static int libbpf_nl_send_recv(struct nlmsghdr *nh, __dump_nlmsg_t fn,
> > +                              libbpf_dump_nlmsg_t _fn, void *cookie)
> >  {
> > -       struct {
> > -               struct nlmsghdr nlh;
> > -               struct ifinfomsg ifm;
> > -       } req = {
> > -               .nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
> > -               .nlh.nlmsg_type = RTM_GETLINK,
> > -               .nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
> > -               .ifm.ifi_family = AF_PACKET,
> > -       };
> > -       int seq = time(NULL);
> > +       __u32 nl_pid = 0;
> > +       int sock, ret;
> >
> > -       req.nlh.nlmsg_seq = seq;
> > -       if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0)
> > -               return -errno;
> > +       if (!nh)
> > +               return -EINVAL;
> > +
> > +       sock = libbpf_netlink_open(&nl_pid);
> > +       if (sock < 0)
> > +               return sock;
> >
> > -       return bpf_netlink_recv(sock, nl_pid, seq, __dump_link_nlmsg,
> > -                               dump_link_nlmsg, cookie);
> > +       nh->nlmsg_pid = 0;
> > +       nh->nlmsg_seq = time(NULL);
> > +       if (send(sock, nh, nh->nlmsg_len, 0) < 0) {
> > +               ret = -errno;
> > +               goto end;
> > +       }
> > +
> > +       ret = bpf_netlink_recv(sock, nl_pid, nh->nlmsg_seq, fn, _fn, cookie);
>
> what's the difference between fn and _fn, can this be somehow
> reflected in the name?
>

You can use fn as a common parsing function for the same RTM_GET* message, and
then use _fn to parse a nested layer of attributes below it to fill in different
kinds of opts (through the cookie user data parameter).

How about outer_cb, inner_cb?
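
To make the split concrete, here is a simplified model of the two layers. Everything in it (struct msg, the typedefs, parse_msg, store_value, recv_one) is hypothetical and much simpler than the real netlink types and libbpf callback signatures; it only shows the outer callback doing the common parsing and handing the nested part to the inner callback along with the cookie.

```c
#include <stddef.h>

/* Hypothetical stand-in for a netlink message with one nested value */
struct msg {
	int type;
	int nested_value;
};

typedef int (*inner_cb_t)(int nested_value, void *cookie);
typedef int (*outer_cb_t)(const struct msg *m, inner_cb_t inner_cb, void *cookie);

/* outer_cb: parsing shared by every user of this message type */
static int parse_msg(const struct msg *m, inner_cb_t inner_cb, void *cookie)
{
	if (m->type != 1)
		return -1;
	/* hand the nested layer to the caller-specific inner callback */
	return inner_cb ? inner_cb(m->nested_value, cookie) : 0;
}

/* inner_cb: fills caller-specific data through the cookie */
static int store_value(int nested_value, void *cookie)
{
	*(int *)cookie = nested_value;
	return 0;
}

/* Stand-in for the receive path: invokes the outer callback per message */
static int recv_one(const struct msg *m, outer_cb_t outer_cb,
		    inner_cb_t inner_cb, void *cookie)
{
	return outer_cb ? outer_cb(m, inner_cb, cookie) : 0;
}
```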

> > +
> > +end:
> > +       close(sock);
> > +       return ret;
> >  }
> > diff --git a/tools/lib/bpf/nlattr.h b/tools/lib/bpf/nlattr.h
> > index 6cc3ac91690f..1c94cdb6e89d 100644
> > --- a/tools/lib/bpf/nlattr.h
> > +++ b/tools/lib/bpf/nlattr.h
> > @@ -10,7 +10,10 @@
> >  #define __LIBBPF_NLATTR_H
> >
> >  #include <stdint.h>
> > +#include <string.h>
> > +#include <errno.h>
> >  #include <linux/netlink.h>
> > +
> >  /* avoid multiple definition of netlink features */
> >  #define __LINUX_NETLINK_H
> >
> > @@ -103,4 +106,49 @@ int libbpf_nla_parse_nested(struct nlattr *tb[], int maxtype,
> >
> >  int libbpf_nla_dump_errormsg(struct nlmsghdr *nlh);
> >
> > +static inline struct nlattr *nla_data(struct nlattr *nla)
> > +{
> > +       return (struct nlattr *)((char *)nla + NLA_HDRLEN);
> > +}
> > +
> > +static inline struct nlattr *nh_tail(struct nlmsghdr *nh)
> > +{
> > +       return (struct nlattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
> > +}
> > +
> > +static inline int nlattr_add(struct nlmsghdr *nh, size_t maxsz, int type,
> > +                            const void *data, int len)
> > +{
> > +       struct nlattr *nla;
> > +
> > +       if (NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(NLA_HDRLEN + len) > maxsz)
> > +               return -EMSGSIZE;
> > +       if ((!data && len) || (data && !len))
>
> we use !!data != !!len for this in at least a few places
>

Ok.
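
For reference, a quick sketch showing that the two spellings agree. The function names mismatch_long and mismatch_short are made up for the comparison.

```c
#include <stdbool.h>
#include <stddef.h>

/* Form used in the patch: exactly one of data/len is set */
static bool mismatch_long(const void *data, int len)
{
	return (!data && len) || (data && !len);
}

/* Suggested form: normalize both to 0/1 and compare */
static bool mismatch_short(const void *data, int len)
{
	return !!data != !!len;
}
```

Both return true only when a payload pointer is given without a length, or a length without a payload pointer.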

> > +               return -EINVAL;
> > +
> > +       nla = nh_tail(nh);
> > +       nla->nla_type = type;
> > +       nla->nla_len = NLA_HDRLEN + len;
> > +       if (data)
> > +               memcpy(nla_data(nla), data, len);
> > +       nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(nla->nla_len);
> > +       return 0;
> > +}
> > +
>
> [...]

--
Kartikeya

* Re: [PATCH bpf-next v5 2/3] libbpf: add low level TC-BPF API
  2021-04-30 19:35   ` Andrii Nakryiko
@ 2021-05-01  6:32     ` Kumar Kartikeya Dwivedi
  2021-05-03 22:54       ` Andrii Nakryiko
  0 siblings, 1 reply; 14+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-05-01  6:32 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Shaun Crampton,
	Networking

On Sat, May 01, 2021 at 01:05:40AM IST, Andrii Nakryiko wrote:
> On Wed, Apr 28, 2021 at 9:26 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > This adds functions that wrap the netlink API used for adding,
> > manipulating, and removing traffic control filters.
> >
> > An API summary:
> >
> > A bpf_tc_hook represents a location where a TC-BPF filter can be
> > attached. This means that creating a hook leads to creation of the
> > backing qdisc, while destruction either removes all filters attached to
> > a hook, or destroys qdisc if requested explicitly (as discussed below).
> >
> > The TC-BPF API functions operate on this bpf_tc_hook to attach, replace,
> > query, and detach tc filters.
> >
> > All functions return 0 on success, and a negative error code on failure.
> >
> > bpf_tc_hook_create - Create a hook
> > Parameters:
> >         @hook - Cannot be NULL, ifindex > 0, attach_point must be set to
> >                 proper enum constant. Note that parent must be unset when
> >                 attach_point is one of BPF_TC_INGRESS or BPF_TC_EGRESS. Note
> >                 that as an exception BPF_TC_INGRESS|BPF_TC_EGRESS is also a
> >                 valid value for attach_point.
> >
> >                 Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.
> >
> >         @flags - Currently only BPF_TC_F_REPLACE, which creates qdisc in
> >                  non-exclusive mode (i.e. an existing qdisc will be replaced
> >                  instead of this function failing with -EEXIST).
> >
> > bpf_tc_hook_destroy - Destroy the hook
> > Parameters:
> >         @hook - Cannot be NULL. The behaviour depends on value of
> >                 attach_point.
> >
> >                 If BPF_TC_INGRESS, all filters attached to the ingress
> >                 hook will be detached.
> >                 If BPF_TC_EGRESS, all filters attached to the egress hook
> >                 will be detached.
> >                 If BPF_TC_INGRESS|BPF_TC_EGRESS, the clsact qdisc will be
> >                 deleted, also detaching all filters.
> >
> >                 It is advised that if the qdisc is operated on by many programs,
> >                 then the program atleast check that there are no other existing
>
> typo: at least
>

Will fix.

> >                 filters before deleting the clsact qdisc. An example is shown
> >                 below:
> >
> >                 /* set opts as NULL, as we're not really interested in
> >                  * getting any info for a particular filter, but just
> >                  * detecting its presence.
> >                  */
>
> this comment probably is better moved to right before bpf_tc_query,
> otherwise it reads as if it's related to bpf_tc_hook
>

Ok.

> >                 DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = if_nametoindex("lo"),
> >                                     .attach_point = BPF_TC_INGRESS);
> >                 r = bpf_tc_query(&hook, NULL);
> >                 if (r < 0 && r == -ENOENT) {
>
> well, r == -ENOENT should be enough then, no?
>

Yes, I'll change it.

> >                         /* no filters */
> >                         hook.attach_point = BPF_TC_INGRESS|BPF_TC_EGRESS;
> >                         return bpf_tc_hook_destroy(&hook);
> >                 } else /* failed or r == 0, the latter means filters do exist */
> >                         return r;
> >
> >                 Note that there is a small race between checking for no
> >                 filters and deleting the qdisc. This is currently unavoidable.
> >
> >                 Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.
> >
> > bpf_tc_attach - Attach a filter to a hook
> > Parameters:
> >         @hook - Cannot be NULL. Represents the hook the filter will be
> >                 attached to. Requirements for ifindex and attach_point are
> >                 same as described in bpf_tc_hook_create, but BPF_TC_CUSTOM
> >                 is also supported.  In that case, parent must be set to the
> >                 handle where the filter will be attached (using TC_H_MAKE).
> >
> >                 E.g. To set parent to 1:16 like in tc command line,
> >                      the equivalent would be TC_H_MAKE(1 << 16, 16)
> >
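
A small sketch of how the parent value is composed, for illustration. The EX_ macros are local copies that I believe mirror TC_H_MAJ_MASK, TC_H_MIN_MASK and TC_H_MAKE from <linux/pkt_sched.h>; make_parent is a made-up helper. One caveat I am not certain about: I believe the tc command line parses handle components as hexadecimal, in which case a textual 1:16 there would carry minor 0x16 rather than decimal 16, so the sketch leaves the minor value to the caller.

```c
#include <stdint.h>

/* Local copies assumed to mirror the definitions in <linux/pkt_sched.h> */
#define EX_TC_H_MAJ_MASK 0xFFFF0000U
#define EX_TC_H_MIN_MASK 0x0000FFFFU
#define EX_TC_H_MAKE(maj, min) \
	(((maj) & EX_TC_H_MAJ_MASK) | ((min) & EX_TC_H_MIN_MASK))

/* Made-up helper: major goes in the upper 16 bits, minor in the lower 16 */
static uint32_t make_parent(uint32_t major, uint32_t minor)
{
	return EX_TC_H_MAKE(major << 16, minor);
}
```

With these definitions make_parent(1, 16) is 0x00010010 and make_parent(1, 0x16) is 0x00010016.
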
> >         @opts - Cannot be NULL.
> >
> >                 The following opts are optional:
> >                         handle - The handle of the filter
> >                         priority - The priority of the filter
> >                                    Must be >= 0 and <= UINT16_MAX
> >                 The following opts must be set:
> >                         prog_fd - The fd of the loaded SCHED_CLS prog
> >                 The following opts must be unset:
> >                         prog_id - The ID of the BPF prog
> >
> >                 The following opts will be filled by bpf_tc_attach on a
> >                 successful attach operation if they are unset:
> >                         handle - The handle of the attached filter
> >                         priority - The priority of the attached filter
> >                         prog_id - The ID of the attached SCHED_CLS prog
> >
> >                 This way, the user can know what the auto allocated
> >                 values for optional opts like handle and priority are
> >                 for the newly attached filter, if they were unset.
> >
> >                 Note that some other attributes are set to some default
> >                 values listed below (this holds for all bpf_tc_* APIs):
> >                         protocol - ETH_P_ALL
> >                         mode - direct action
> >                         chain index - 0
> >                         class ID - 0 (this can be set by writing to the
> >                         skb->tc_classid field from the BPF program)
> >
> >         @flags - Currently only BPF_TC_F_REPLACE, which creates filter
> >                  in non-exclusive mode (i.e. an existing filter with the
> >                  same attributes will be replaced instead of this
> >                  function failing with -EEXIST).
> >
> > bpf_tc_detach
> > Parameters:
> >         @hook: Cannot be NULL. Represents the hook the filter will be
> >                 detached from. Requirements are same as described above
> >                 in bpf_tc_attach.
> >
> >         @opts:  Cannot be NULL.
> >
> >                 The following opts must be set:
> >                         handle
> >                         priority
> >                 The following opts must be unset:
> >                         prog_fd
> >                         prog_id
> >
> > bpf_tc_query
> > Parameters:
> >         @hook: Cannot be NULL. Represents the hook where the filter
> >                lookup will be performed. Requirements are same as described
> >                above in bpf_tc_attach.
> >
> >         @opts: Can be NULL.
> >
> >                The following opts are optional:
> >                         handle
> >                         priority
> >                         prog_fd
> >                         prog_id
> >
> >                However, at most one of prog_fd and prog_id may be
> >                set. Setting both leads to an error. Setting none is
> >                allowed.
> >
> >                The following fields will be filled by bpf_tc_query on a
> >                successful lookup if they are unset:
> >                         handle
> >                         priority
> >                         prog_id
> >
> >                Based on the specified optional parameters, the matching
> >                data for the first matching filter is filled in and 0 is
> >                returned. When setting prog_fd, the prog_id will be
> >                matched against prog_id of the loaded SCHED_CLS prog
> >                represented by prog_fd.
> >
> >                To uniquely identify a filter, e.g. to detect its presence,
> >                it is recommended to set both handle and priority fields.
> >
> > Some usage examples (using bpf skeleton infrastructure):
> >
> > BPF program (test_tc_bpf.c):
> >
> >         #include <linux/bpf.h>
> >         #include <bpf/bpf_helpers.h>
> >
> >         SEC("classifier")
> >         int cls(struct __sk_buff *skb)
> >         {
> >                 return 0;
> >         }
> >
> > Userspace loader:
> >
> >         DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, 0);
> >         struct test_tc_bpf *skel = NULL;
> >         int fd, r;
> >
> >         skel = test_tc_bpf__open_and_load();
> >         if (!skel)
> >                 return -ENOMEM;
> >
> >         fd = bpf_program__fd(skel->progs.cls);
> >
> >         DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex =
> >                             if_nametoindex("lo"), .attach_point =
> >                             BPF_TC_INGRESS);
> >         /* Create clsact qdisc */
> >         r = bpf_tc_hook_create(&hook, 0);
> >         if (r < 0)
> >                 goto end;
> >
> >         DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = fd);
>
> I don't feel too strongly about this w.r.t. example, but
> DECLARE_LIBBPF_OPTS() does declare a variable, so according to C89 all
> such declarations should be gathered at the top. It would be nice to
> stick to this in the example, but I can see how such locality is a bit
> better for educational purposes, so I'm ok with that as well.
>
> >         r = bpf_tc_attach(&hook, &opts, 0);
> >         if (r < 0)
> >                 goto end;
> >         /* Print the auto allocated handle and priority */
> >         printf("Handle=%"PRIu32"\n", opts.handle);
>
> let's drop PRIu32, libbpf doesn't use it so let's not use it as an
> example, %u would work fine here
>

Ok, will drop.

> >         printf("Priority=%"PRIu32"\n", opts.priority);
> >
> >         opts.prog_fd = opts.prog_id = 0;
> >         bpf_tc_detach(&hook, &opts);
> > end:
> >         test_tc_bpf__destroy(skel);
> >
> > This is equivalent to doing the following using tc command line:
> >   # tc qdisc add dev lo clsact
> >   # tc filter add dev lo ingress bpf obj foo.o sec classifier da
> >
> > Another example replacing a filter (extending prior example):
> >
> >         /* We can also choose both (or one), let's try replacing an
> >          * existing filter.
> >          */
> >         DECLARE_LIBBPF_OPTS(bpf_tc_opts, replace_opts, .handle =
> >                             opts.handle, .priority = opts.priority,
> >                             .prog_fd = fd);
> >         r = bpf_tc_attach(&hook, &replace_opts, 0);
> >         if (r < 0 && r == -EEXIST) {
>
> again, == -EEXIST implies r < 0, this just looks sloppy
>
> >                 /* Expected, now use BPF_TC_F_REPLACE to replace it */
> >                 return bpf_tc_attach(&hook, &replace_opts, BPF_TC_F_REPLACE);
> >         } else if (r == 0) {
>
> I'd go with
>
> else if (r < 0) {
>     return r;
> }
>
> /* handle happy case without unnecessary nesting */
>

Ok.

> >                 /* There must be no existing filter with these
> >                  * attributes, so cleanup and return an error.
> >                  */
> >                 replace_opts.prog_fd = replace_opts.prog_id = 0;
> >                 r = bpf_tc_detach(&hook, &replace_opts);
> >                 if (r == 0)
> >                         r = -1;
>
> just return -1;
>

Ok.

> >         }
> >         return r;
> >
> > To obtain info of a particular filter:
> >
> >         /* Find info for filter with handle 1 and priority 50 */
> >         DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts, .handle = 1,
> >                             .priority = 50);
> >         r = bpf_tc_query(&hook, &info_opts);
> >         if (r < 0 && r == -ENOENT)
> >                 printf("Filter not found");
> >         else if (r == 0)
> >                 printf("Prog ID: %"PRIu32"\n", info_opts.prog_id);
>
> same about PRI and r < 0
>
> >         return r;
> >
> > We can also match using prog_id to find the same filter:
> >
> >         DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts2, .prog_id =
> >                             info_opts.prog_id);
> >         r = bpf_tc_query(&hook, &info_opts2);
> >         if (r < 0 && r == -ENOENT)
> >                 printf("Filter not found");
> >         else if (r == 0) {
> >                 /* If we know there's only one filter for this loaded prog,
> >                  * it is safe to assert that the handle and priority are
> >                  * as expected.
> >                  */
> >                 assert(info_opts2.handle == 1);
> >                 assert(info_opts2.priority == 50);
> >         }
> >         return r;
> >
> > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
>
> API looks good to me (except the flags field that just stands out).
> But I'll defer to Daniel to make the final call.
>
> >  tools/lib/bpf/libbpf.h   |  41 ++++
> >  tools/lib/bpf/libbpf.map |   5 +
> >  tools/lib/bpf/netlink.c  | 463 ++++++++++++++++++++++++++++++++++++++-
> >  3 files changed, 508 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > index bec4e6a6e31d..3de701f46a33 100644
> > --- a/tools/lib/bpf/libbpf.h
> > +++ b/tools/lib/bpf/libbpf.h
> > @@ -775,6 +775,47 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
> >  LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
> >  LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
> >
> > +enum bpf_tc_attach_point {
> > +       BPF_TC_INGRESS = 1 << 0,
> > +       BPF_TC_EGRESS  = 1 << 1,
> > +       BPF_TC_CUSTOM  = 1 << 2,
> > +};
> > +
> > +enum bpf_tc_attach_flags {
> > +       BPF_TC_F_REPLACE = 1 << 0,
> > +};
> > +
> > +struct bpf_tc_hook {
> > +       size_t sz;
> > +       int ifindex;
> > +       enum bpf_tc_attach_point attach_point;
> > +       __u32 parent;
> > +       size_t :0;
> > +};
> > +
> > +#define bpf_tc_hook__last_field parent
> > +
> > +struct bpf_tc_opts {
> > +       size_t sz;
> > +       int prog_fd;
> > +       __u32 prog_id;
> > +       __u32 handle;
> > +       __u32 priority;
> > +       size_t :0;
> > +};
> > +
> > +#define bpf_tc_opts__last_field priority
> > +
> > +LIBBPF_API int bpf_tc_hook_create(struct bpf_tc_hook *hook, int flags);
> > +LIBBPF_API int bpf_tc_hook_destroy(struct bpf_tc_hook *hook);
> > +LIBBPF_API int bpf_tc_attach(const struct bpf_tc_hook *hook,
> > +                            struct bpf_tc_opts *opts,
> > +                            int flags);
>
> why didn't you put flags into bpf_tc_opts? they are clearly optional
> and fit into "opts" paradigm...
>

I can move this into opts, but during previous discussion it was kept outside
opts by Daniel, so I kept that unchanged.

> > +LIBBPF_API int bpf_tc_detach(const struct bpf_tc_hook *hook,
> > +                            const struct bpf_tc_opts *opts);
> > +LIBBPF_API int bpf_tc_query(const struct bpf_tc_hook *hook,
> > +                           struct bpf_tc_opts *opts);
> > +
> >  #ifdef __cplusplus
> >  } /* extern "C" */
> >  #endif
> > diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> > index b9b29baf1df8..04509c7c144b 100644
> > --- a/tools/lib/bpf/libbpf.map
> > +++ b/tools/lib/bpf/libbpf.map
> > @@ -361,4 +361,9 @@ LIBBPF_0.4.0 {
> >                 bpf_linker__new;
> >                 bpf_map__inner_map;
> >                 bpf_object__set_kversion;
> > +               bpf_tc_hook_create;
> > +               bpf_tc_hook_destroy;
>
> please keep this alphabetically sorted
>

Ok.

> > +               bpf_tc_attach;
> > +               bpf_tc_detach;
> > +               bpf_tc_query;
> >  } LIBBPF_0.3.0;
> > diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
> > index 6daee6640725..88f7b6144c78 100644
> > --- a/tools/lib/bpf/netlink.c
> > +++ b/tools/lib/bpf/netlink.c
> > @@ -4,7 +4,11 @@
> >  #include <stdlib.h>
> >  #include <memory.h>
> >  #include <unistd.h>
> > +#include <inttypes.h>
> > +#include <arpa/inet.h>
> >  #include <linux/bpf.h>
> > +#include <linux/if_ether.h>
> > +#include <linux/pkt_cls.h>
> >  #include <linux/rtnetlink.h>
> >  #include <sys/socket.h>
> >  #include <errno.h>
> > @@ -73,6 +77,12 @@ static int libbpf_netlink_open(__u32 *nl_pid)
> >         return ret;
> >  }
> >
> > +enum {
> > +       BPF_NL_CONT,
> > +       BPF_NL_NEXT,
> > +       BPF_NL_DONE,
> > +};
> > +
> >  static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
> >                             __dump_nlmsg_t _fn, libbpf_dump_nlmsg_t fn,
> >                             void *cookie)
> > @@ -84,6 +94,7 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
> >         int len, ret;
> >
> >         while (multipart) {
> > +start:
> >                 multipart = false;
> >                 len = recv(sock, buf, sizeof(buf), 0);
> >                 if (len < 0) {
> > @@ -121,8 +132,18 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
> >                         }
> >                         if (_fn) {
> >                                 ret = _fn(nh, fn, cookie);
> > -                               if (ret)
> > +                               if (ret < 0)
> > +                                       return ret;
> > +                               switch (ret) {
> > +                               case BPF_NL_CONT:
> > +                                       break;
> > +                               case BPF_NL_NEXT:
> > +                                       goto start;
> > +                               case BPF_NL_DONE:
> > +                                       return 0;
> > +                               default:
> >                                         return ret;
> > +                               }
> >                         }
> >                 }
> >         }
> > @@ -357,3 +378,443 @@ static int libbpf_nl_send_recv(struct nlmsghdr *nh, __dump_nlmsg_t fn,
> >         close(sock);
> >         return ret;
> >  }
> > +
> > +/* TC-HOOK */
> > +
> > +typedef int (*qdisc_config_t)(struct nlmsghdr *nh, struct tcmsg *t,
> > +                             size_t maxsz);
> > +
> > +static int clsact_config(struct nlmsghdr *nh, struct tcmsg *t, size_t maxsz)
> > +{
> > +       int ret;
> > +
> > +       t->tcm_parent = TC_H_CLSACT;
> > +       t->tcm_handle = TC_H_MAKE(TC_H_CLSACT, 0);
> > +
> > +       ret = nlattr_add(nh, maxsz, TCA_KIND, "clsact", sizeof("clsact"));
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       return 0;
>
> nit: return nlattr_add(...)
>

Will fix.

> > +}
> > +
> > +static int attach_point_to_config(struct bpf_tc_hook *hook, qdisc_config_t *configp)
> > +{
> > +       if (!hook)
> > +               return -EINVAL;
>
> !hook should be already ensured by calling functions, no need to
> re-check this everywhere, do this only in API methods. All internal
> functions should already ensure non-NULL, otherwise it's a bug.
>

Right, will fix.

> > +
> > +       switch ((int)OPTS_GET(hook, attach_point, 0)) {
>
> is int casting necessary here?
>
> > +               case BPF_TC_INGRESS:
> > +               case BPF_TC_EGRESS:
> > +               case BPF_TC_INGRESS|BPF_TC_EGRESS:
> > +                       if (OPTS_GET(hook, parent, 0))
> > +                               return -EINVAL;
> > +                       *configp = &clsact_config;
> > +                       break;
> > +               case BPF_TC_CUSTOM:
> > +                       return -EOPNOTSUPP;
> > +               default:
> > +                       return -EINVAL;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static long long int tc_get_tcm_parent(enum bpf_tc_attach_point attach_point,
> > +                                      __u32 parent)
> > +{
> > +       long long int ret;
> > +
> > +       switch (attach_point) {
> > +       case BPF_TC_INGRESS:
> > +               if (parent)
> > +                       return -EINVAL;
> > +               ret = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS);
>
> direct return
>
> > +               break;
> > +       case BPF_TC_EGRESS:
> > +               if (parent)
> > +                       return -EINVAL;
> > +               ret = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS);
>
> same, make it explicit that we are done and it's the final value returned
>
> > +               break;
> > +       case BPF_TC_CUSTOM:
> > +               if (!parent)
> > +                       return -EINVAL;
> > +               ret = parent;
> > +               break;
> > +       default:
> > +               return -EINVAL;
> > +       }
> > +
> > +       return ret;
> > +}
> > +
> > +static int tc_qdisc_modify(struct bpf_tc_hook *hook, int cmd, int flags)
> > +{
> > +       qdisc_config_t config;
> > +       int ret = 0;
>
> unnecessary initialization, some tooling definitely will complain,
> please drop = 0 part
>
> > +       struct {
> > +               struct nlmsghdr nh;
> > +               struct tcmsg t;
> > +               char buf[256];
> > +       } req;
> > +
> > +       ret = attach_point_to_config(hook, &config);
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       memset(&req, 0, sizeof(req));
> > +       req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
> > +       req.nh.nlmsg_flags =
> > +               NLM_F_REQUEST | NLM_F_ACK | flags;
>
> we can go up to 100 character lines, keep it on single line
>
> > +       req.nh.nlmsg_type = cmd;
> > +       req.t.tcm_family = AF_UNSPEC;
> > +       req.t.tcm_ifindex = OPTS_GET(hook, ifindex, 0);
> > +
> > +       ret = config(&req.nh, &req.t, sizeof(req));
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       ret = libbpf_nl_send_recv(&req.nh, NULL, NULL, NULL);
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       return 0;
> > +}
> > +
> > +static int tc_qdisc_create_excl(struct bpf_tc_hook *hook, int flags)
> > +{
> > +       flags = flags & BPF_TC_F_REPLACE ? NLM_F_REPLACE : NLM_F_EXCL;
>
> see below as well, please use () around bit operators
>

Right.
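
A small sketch of the parenthesized form, assuming a local copy of
BPF_TC_F_REPLACE from the diff and two hypothetical helpers
(tc_flags_to_nlm, tc_flags_valid) that are not part of the patch. The
parentheses do not change how the expressions parse, since & already binds
tighter than ?: and ||; they only make the grouping explicit.

```c
#include <linux/netlink.h>

/* Local copy of the flag from the quoted diff, for illustration only. */
enum bpf_tc_attach_flags {
	BPF_TC_F_REPLACE = 1 << 0,
};

/* Hypothetical helper: map the API flag to the netlink create flags. */
static int tc_flags_to_nlm(int flags)
{
	return NLM_F_CREATE | ((flags & BPF_TC_F_REPLACE) ? NLM_F_REPLACE : NLM_F_EXCL);
}

/* Hypothetical helper: non-zero if no unknown flag bits are set. */
static int tc_flags_valid(int flags)
{
	return !(flags & ~BPF_TC_F_REPLACE);
}
```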

> > +       return tc_qdisc_modify(hook, RTM_NEWQDISC, NLM_F_CREATE | flags);
> > +}
> > +
> > +static int tc_qdisc_delete(struct bpf_tc_hook *hook)
> > +{
> > +       return tc_qdisc_modify(hook, RTM_DELQDISC, 0);
> > +}
> > +
> > +int bpf_tc_hook_create(struct bpf_tc_hook *hook, int flags)
> > +{
> > +       if (!hook || !OPTS_VALID(hook, bpf_tc_hook))
> > +               return -EINVAL;
> > +       if (OPTS_GET(hook, ifindex, 0) <= 0 || flags & ~BPF_TC_F_REPLACE)
>
> please use () around bit operators
>

Ok.

> > +               return -EINVAL;
> > +
> > +       return tc_qdisc_create_excl(hook, flags);
> > +}
> > +
> > +static int tc_cls_detach(const struct bpf_tc_hook *hook,
> > +                        const struct bpf_tc_opts *opts, bool flush);
> > +
> > +int bpf_tc_hook_destroy(struct bpf_tc_hook *hook)
> > +{
> > +       if (!hook || !OPTS_VALID(hook, bpf_tc_hook) ||
> > +           OPTS_GET(hook, ifindex, 0) <= 0)
> > +               return -EINVAL;
> > +
> > +       switch ((int)OPTS_GET(hook, attach_point, 0)) {
>
> int casting. Did the compiler complain about that or what?
>

It complains on -Wswitch, as we switch on values apart from the enum values, but
I'll see if I can remove it.
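
For illustration, a minimal standalone sketch of why the cast comes up. It
assumes a local copy of the enum from the diff and a hypothetical helper,
classify_attach_point(), that is not part of the patch. With -Wswitch, GCC
warns about case labels that are not named enumerators, and
BPF_TC_INGRESS | BPF_TC_EGRESS is such a value; switching on an int avoids
the warning.

```c
#include <errno.h>

/* Local copy of the enum from the quoted diff, for illustration only. */
enum bpf_tc_attach_point {
	BPF_TC_INGRESS = 1 << 0,
	BPF_TC_EGRESS  = 1 << 1,
	BPF_TC_CUSTOM  = 1 << 2,
};

/* Hypothetical helper: 0 if the value maps to clsact, else a negative errno. */
static int classify_attach_point(enum bpf_tc_attach_point ap)
{
	/* The combined value is not a named enumerator, so switch on int. */
	switch ((int)ap) {
	case BPF_TC_INGRESS:
	case BPF_TC_EGRESS:
	case BPF_TC_INGRESS | BPF_TC_EGRESS:
		return 0;
	case BPF_TC_CUSTOM:
		return -EOPNOTSUPP;
	default:
		return -EINVAL;
	}
}
```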

> > +               case BPF_TC_INGRESS:
> > +               case BPF_TC_EGRESS:
> > +                       return tc_cls_detach(hook, NULL, true);
> > +               case BPF_TC_INGRESS|BPF_TC_EGRESS:
> > +                       return tc_qdisc_delete(hook);
> > +               case BPF_TC_CUSTOM:
> > +                       return -EOPNOTSUPP;
> > +               default:
> > +                       return -EINVAL;
> > +       }
> > +}
> > +
> > +struct pass_info {
> > +       struct bpf_tc_opts *opts;
> > +       __u32 match_prog_id;
> > +       bool processed;
> > +};
> > +
> > +/* TC-BPF */
> > +
> > +static int tc_cls_add_fd_and_name(struct nlmsghdr *nh, size_t maxsz, int fd)
> > +{
> > +       struct bpf_prog_info info = {};
> > +       char name[256] = {};
>
> you are unconditionally snprintf()'ing into name, don't unnecessarily
> initialize it
>

Ok.

> > +       int len, ret;
> > +
> > +       ret = bpf_obj_get_info_by_fd(fd, &info, &(__u32){sizeof(info)});
>
> that sizeof part... even if that works reliably, stick to normal use
> pattern, have a local variable for that. It can be overwritten by the
> kernel.
>
> you can re-use len for this, btw
>

Ok, will fix everywhere.

> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       ret = nlattr_add(nh, maxsz, TCA_BPF_FD, &fd, sizeof(fd));
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       len = snprintf(name, sizeof(name), "%s:[%" PRIu32 "]", info.name,
>
> libbpf doesn't use PRI modifiers, use %u
>

Ok.

> > +                      info.id);
> > +       if (len < 0 || len >= sizeof(name))
> > +               return len < 0 ? -EINVAL : -ENAMETOOLONG;
>
> if (len < 0)
>     return -errno;
> if (len >= sizeof(name))
>     return -ENAMETOOLONG;
>

Ok.
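
A rough sketch of the split checks together with %u, written as a standalone
function so it can be tried outside libbpf. format_prog_name() is a
hypothetical name and not part of the patch; the format string mirrors the
one in the diff.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: write "<name>:[<id>]" into buf.
 * Returns the string length on success or a negative errno.
 */
static int format_prog_name(char *buf, size_t sz, const char *name, unsigned int id)
{
	int len;

	len = snprintf(buf, sz, "%s:[%u]", name, id);
	if (len < 0)
		return -errno;
	/* snprintf returns the untruncated length, so >= sz means truncation */
	if ((size_t)len >= sz)
		return -ENAMETOOLONG;

	return len;
}
```

The caller can then pass len + 1 as the attribute size, as the diff does, to
include the terminating NUL.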

> > +
> > +       return nlattr_add(nh, maxsz, TCA_BPF_NAME, name, len + 1);
> > +}
> > +
> > +
> > +static int cls_get_info(struct nlmsghdr *nh, libbpf_dump_nlmsg_t fn,
> > +                       void *cookie);
> > +
> > +int bpf_tc_attach(const struct bpf_tc_hook *hook,
> > +                 struct bpf_tc_opts *opts, int flags)
> > +{
> > +       __u32 protocol = 0, bpf_flags;
> > +       struct pass_info info = {};
> > +       long long int tcm_parent;
> > +       struct nlattr *nla;
> > +       int ret;
> > +       struct {
> > +               struct nlmsghdr nh;
> > +               struct tcmsg t;
> > +               char buf[256];
> > +       } req;
> > +
> > +       if (!hook || !opts || !OPTS_VALID(hook, bpf_tc_hook) ||
> > +           !OPTS_VALID(opts, bpf_tc_opts))
> > +               return -EINVAL;
> > +       if (OPTS_GET(hook, ifindex, 0) <= 0 || !OPTS_GET(opts, prog_fd, 0) ||
> > +           OPTS_GET(opts, prog_id, 0))
> > +               return -EINVAL;
> > +       if (OPTS_GET(opts, priority, 0) > UINT16_MAX)
> > +               return -EINVAL;
> > +       if (flags & ~BPF_TC_F_REPLACE)
> > +               return -EINVAL;
> > +
> > +       protocol = ETH_P_ALL;
> > +       flags = flags & BPF_TC_F_REPLACE ? NLM_F_REPLACE : NLM_F_EXCL;
>
> ()
>
> > +
> > +       memset(&req, 0, sizeof(req));
> > +       req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
> > +       req.nh.nlmsg_flags =
> > +               NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE | NLM_F_ECHO | flags;
> > +       req.nh.nlmsg_type = RTM_NEWTFILTER;
> > +       req.t.tcm_family = AF_UNSPEC;
> > +       req.t.tcm_handle = OPTS_GET(opts, handle, 0);
> > +       req.t.tcm_ifindex = OPTS_GET(hook, ifindex, 0);
>
> you are OPTS_GET()ing same stuff multiple times, it might look cleaner
> to use local variables for that. It will be faster also, but that's
> not important here.
>
> > +       req.t.tcm_info = TC_H_MAKE(OPTS_GET(opts, priority, 0) << 16, htons(protocol));
> > +
> > +       tcm_parent = tc_get_tcm_parent(OPTS_GET(hook, attach_point, 0), OPTS_GET(hook, parent, 0));
>
> and this will be much shorter, positively, please use local variables
> for all those input fields you care about
>

Ok, will fix.
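
As a sketch of how this could read once the inputs are in local variables,
here is the tcm_info packing on its own. tc_make_info() is a hypothetical
helper, not something in the patch; TC_H_MAKE, ETH_P_ALL and htons are the
same ones the diff uses. The priority goes in the upper 16 bits and the
protocol, in network byte order, in the lower 16.

```c
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/pkt_sched.h>
#include <linux/types.h>

/* Hypothetical helper: pack priority and protocol the way the diff does. */
static __u32 tc_make_info(__u32 priority)
{
	__u32 protocol = ETH_P_ALL;

	return TC_H_MAKE(priority << 16, htons(protocol));
}
```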

> > +       if (tcm_parent < 0)
> > +               return tcm_parent;
> > +       req.t.tcm_parent = tcm_parent;
> > +
> > +       ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "bpf", sizeof("bpf"));
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       nla = nlattr_begin_nested(&req.nh, sizeof(req), TCA_OPTIONS);
> > +       if (!nla)
> > +               return -EMSGSIZE;
> > +
> > +       ret = tc_cls_add_fd_and_name(&req.nh, sizeof(req), OPTS_GET(opts, prog_fd, 0));
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       /* direct action mode is always enabled */
> > +       bpf_flags = TCA_BPF_FLAG_ACT_DIRECT;
> > +       ret = nlattr_add(&req.nh, sizeof(req), TCA_BPF_FLAGS,
> > +                        &bpf_flags, sizeof(bpf_flags));
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       nlattr_end_nested(&req.nh, nla);
> > +
> > +       info.opts = opts;
> > +
> > +       ret = libbpf_nl_send_recv(&req.nh, &cls_get_info, NULL, &info);
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       /* Failed to process unicast response */
> > +       if (!info.processed)
> > +               ret = -ENOENT;
>
> just return directly, you just did that multiple times above, why this
> one is special?
>

Yes, this can be a direct return. A lot of this is just oversight from the
constant rewriting etc.

> > +
> > +       return ret;
> > +}
> > +
> > +static int tc_cls_detach(const struct bpf_tc_hook *hook,
> > +                        const struct bpf_tc_opts *opts, bool flush)
> > +{
> > +       long long int tcm_parent;
> > +       __u32 protocol = 0;
> > +       int ret, c;
> > +       struct {
> > +               struct nlmsghdr nh;
> > +               struct tcmsg t;
> > +               char buf[256];
> > +       } req;
> > +
> > +       if (!hook || !OPTS_VALID(hook, bpf_tc_hook) ||
> > +           !OPTS_VALID(opts, bpf_tc_opts))
> > +               return -EINVAL;
> > +       if (OPTS_GET(hook, ifindex, 0) <= 0 || OPTS_GET(opts, prog_fd, 0) ||
> > +           OPTS_GET(opts, prog_id, 0))
> > +               return -EINVAL;
> > +       c = !!OPTS_GET(opts, handle, 0) + !!OPTS_GET(opts, priority, 0);
> > +       if ((flush && c != 0) || (!flush && c != 2))
> > +               return -EINVAL;
>
> the arithmetic here looks pretty ugly, would it be too bad with logical checks?
>

I'll do it with logical checks, this was just shorter.
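
Roughly what the logical version could look like, as a standalone sketch.
check_detach_ids() is a hypothetical helper and not part of the patch; it is
meant to accept and reject the same inputs as the c != 0 / c != 2 test plus
the priority range check that follows it in the diff.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: validate handle/priority for detach.
 * Flush wants both unset; a normal detach wants both set.
 */
static int check_detach_ids(bool flush, uint32_t handle, uint32_t priority)
{
	if (flush) {
		if (handle || priority)
			return -EINVAL;
	} else {
		if (!handle || !priority)
			return -EINVAL;
	}
	/* priority is carried in 16 bits of tcm_info */
	if (priority > UINT16_MAX)
		return -EINVAL;

	return 0;
}
```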

> > +       if (OPTS_GET(opts, priority, 0) > UINT16_MAX)
> > +               return -EINVAL;
> > +
> > +       if (!flush)
> > +               protocol = ETH_P_ALL;
> > +
> > +       memset(&req, 0, sizeof(req));
> > +       req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
> > +       req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
> > +       req.nh.nlmsg_type = RTM_DELTFILTER;
> > +       req.t.tcm_family = AF_UNSPEC;
> > +       if (!flush)
> > +               req.t.tcm_handle = OPTS_GET(opts, handle, 0);
> > +       req.t.tcm_ifindex = OPTS_GET(hook, ifindex, 0);
> > +       if (!flush)
> > +               req.t.tcm_info = TC_H_MAKE(OPTS_GET(opts, priority, 0) << 16,
>
> OPTS_GET()s just make everything uglier and unnecessarily verbose
>
> > +                                          htons(protocol));
> > +
> > +       tcm_parent = tc_get_tcm_parent(OPTS_GET(hook, attach_point, 0), OPTS_GET(hook, parent, 0));
> > +       if (tcm_parent < 0)
> > +               return tcm_parent;
> > +       req.t.tcm_parent = tcm_parent;
> > +
> > +       if (!flush) {
> > +               ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "bpf", sizeof("bpf"));
> > +               if (ret < 0)
> > +                       return ret;
> > +       }
> > +
> > +       return libbpf_nl_send_recv(&req.nh, NULL, NULL, NULL);
> > +}
> > +
>
> [...]
>
> > +       tcm_parent = tc_get_tcm_parent(OPTS_GET(hook, attach_point, 0), OPTS_GET(hook, parent, 0));
> > +       if (tcm_parent < 0)
> > +               return tcm_parent;
> > +       req.t.tcm_parent = tcm_parent;
> > +
> > +       ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "bpf", sizeof("bpf"));
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       if (OPTS_GET(opts, prog_fd, 0)) {
> > +               struct bpf_prog_info info = {};
> > +               ret = bpf_obj_get_info_by_fd(OPTS_GET(opts, prog_fd, 0), &info, &(__u32){sizeof(info)});
>
> same as before, use dedicated variable
>
> > +               if (ret < 0)
> > +                       return ret;
> > +
> > +               pinfo.match_prog_id = info.id;
> > +       } else
> > +               pinfo.match_prog_id = OPTS_GET(opts, prog_id, 0);
>
> when one branch of if has {}, the other one has to have it as well, please fix
>

Ok.

> > +
> > +       pinfo.opts = opts;
> > +
> > +       ret = libbpf_nl_send_recv(&req.nh, cls_get_info, NULL, &pinfo);
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       if (!pinfo.processed)
> > +               ret = -ENOENT;
>
> direct return
>

Ok.

> > +
> > +       return ret;
> > +}
> > --
> > 2.30.2
> >

--
Kartikeya

* Re: [PATCH bpf-next v5 3/3] libbpf: add selftests for TC-BPF API
  2021-04-30 19:41   ` Andrii Nakryiko
@ 2021-05-01  6:34     ` Kumar Kartikeya Dwivedi
  2021-05-03 22:55       ` Andrii Nakryiko
  0 siblings, 1 reply; 14+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-05-01  6:34 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Shaun Crampton,
	Networking

On Sat, May 01, 2021 at 01:11:47AM IST, Andrii Nakryiko wrote:
> On Wed, Apr 28, 2021 at 9:26 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > This adds some basic tests for the low level bpf_tc_* API.
> >
> > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  .../testing/selftests/bpf/prog_tests/tc_bpf.c | 467 ++++++++++++++++++
> >  .../testing/selftests/bpf/progs/test_tc_bpf.c |  12 +
> >  2 files changed, 479 insertions(+)
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_bpf.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf.c
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/tc_bpf.c b/tools/testing/selftests/bpf/prog_tests/tc_bpf.c
> > new file mode 100644
> > index 000000000000..40441f4e23e2
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/tc_bpf.c
> > @@ -0,0 +1,467 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <test_progs.h>
> > +#include <linux/pkt_cls.h>
> > +
> > +#include "test_tc_bpf.skel.h"
> > +
> > +#define LO_IFINDEX 1
> > +
> > +static int test_tc_internal(const struct bpf_tc_hook *hook, int fd)
> > +{
> > +       DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1,
> > +                           .prog_fd = fd);
>
> we have 100 characters, if needed, use it to keep it on the single line
>

Ok.

> > +       struct bpf_prog_info info = {};
> > +       int ret;
> > +
> > +       ret = bpf_obj_get_info_by_fd(fd, &info, &(__u32){sizeof(info)});
>
> as in previous patch, don't do this
>
> > +       if (!ASSERT_OK(ret, "bpf_obj_get_info_by_fd"))
> > +               return ret;
> > +
> > +       ret = bpf_tc_attach(hook, &opts, 0);
> > +       if (!ASSERT_OK(ret, "bpf_tc_attach"))
> > +               return ret;
> > +
> > +       if (!ASSERT_EQ(opts.handle, 1, "handle set") ||
> > +           !ASSERT_EQ(opts.priority, 1, "priority set") ||
> > +           !ASSERT_EQ(opts.prog_id, info.id, "prog_id set"))
> > +               goto end;
> > +
> > +       DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts, .prog_fd = fd);
>
> this is not C89, please move variable declarations to the top
>
> > +       ret = bpf_tc_query(hook, &info_opts);
> > +       if (!ASSERT_OK(ret, "bpf_tc_query"))
> > +               goto end;
> > +
> > +       DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts2, .prog_id = info.id);
>
> and here
>
> > +       ret = bpf_tc_query(hook, &info_opts2);
> > +       if (!ASSERT_OK(ret, "bpf_tc_query"))
> > +               goto end;
> > +
> > +       if (!ASSERT_EQ(opts.handle, 1, "handle set") ||
> > +           !ASSERT_EQ(opts.priority, 1, "priority set") ||
> > +           !ASSERT_EQ(opts.prog_id, info.id, "prog_id set"))
> > +               goto end;
> > +
> > +       opts.prog_id = 0;
> > +       ret = bpf_tc_attach(hook, &opts, BPF_TC_F_REPLACE);
> > +       if (!ASSERT_OK(ret, "bpf_tc_attach replace mode"))
> > +               return ret;
>
> goto end?
>

Yes, thanks for spotting it.

> > +
> > +end:
> > +       opts.prog_fd = opts.prog_id = 0;
> > +       ret = bpf_tc_detach(hook, &opts);
> > +       ASSERT_OK(ret, "bpf_tc_detach");
> > +       return ret;
> > +}
> > +
>
> [...]
>
> > +
> > +       /* attach */
> > +       ret = bpf_tc_attach(NULL, &attach_opts, 0);
> > +       if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid hook = NULL"))
> > +               return -EINVAL;
> > +       ret = bpf_tc_attach(hook, &attach_opts, 42);
> > +       if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid flags"))
> > +               return -EINVAL;
> > +       attach_opts.prog_fd = 0;
> > +       ret = bpf_tc_attach(hook, &attach_opts, 0);
> > +       if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid prog_fd unset"))
> > +               return -EINVAL;
> > +       attach_opts.prog_fd = fd;
> > +       attach_opts.prog_id = 42;
> > +       ret = bpf_tc_attach(hook, &attach_opts, 0);
> > +       if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid prog_id set"))
> > +               return -EINVAL;
> > +       attach_opts.prog_id = 0;
> > +       attach_opts.handle = 0;
> > +       ret = bpf_tc_attach(hook, &attach_opts, 0);
> > +       if (!ASSERT_OK(ret, "bpf_tc_attach valid handle unset"))
> > +               return -EINVAL;
> > +       attach_opts.prog_fd = attach_opts.prog_id = 0;
> > +       ASSERT_OK(bpf_tc_detach(hook, &attach_opts), "bpf_tc_detach");
>
> this code is quite hard to follow, maybe sprinkle empty lines between
> logical groups of statements (i.e., prepare inputs + call bpf_tc_xxx +
> assert is one group that goes together)
>

I agree it looks bad. I can also just make a new opts for each combination, and
name it that way. Maybe that will look much better.

> > +       attach_opts.prog_fd = fd;
> > +       attach_opts.handle = 1;
> > +       attach_opts.priority = 0;
> > +       ret = bpf_tc_attach(hook, &attach_opts, 0);
> > +       if (!ASSERT_OK(ret, "bpf_tc_attach valid priority unset"))
> > +               return -EINVAL;
> > +       attach_opts.prog_fd = attach_opts.prog_id = 0;
> > +       ASSERT_OK(bpf_tc_detach(hook, &attach_opts), "bpf_tc_detach");
> > +       attach_opts.prog_fd = fd;
> > +       attach_opts.priority = UINT16_MAX + 1;
> > +       ret = bpf_tc_attach(hook, &attach_opts, 0);
> > +       if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid priority > UINT16_MAX"))
> > +               return -EINVAL;
> > +       attach_opts.priority = 0;
> > +       attach_opts.handle = attach_opts.priority = 0;
> > +       ret = bpf_tc_attach(hook, &attach_opts, 0);
> > +       if (!ASSERT_OK(ret, "bpf_tc_attach valid both handle and priority unset"))
> > +               return -EINVAL;
> > +       attach_opts.prog_fd = attach_opts.prog_id = 0;
> > +       ASSERT_OK(bpf_tc_detach(hook, &attach_opts), "bpf_tc_detach");
> > +       ret = bpf_tc_attach(hook, NULL, 0);
> > +       if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid opts = NULL"))
> > +               return -EINVAL;
> > +
> > +       return 0;
> > +}
> > +
> > +static int test_tc_query(const struct bpf_tc_hook *hook, int fd)
> > +{
> > +       struct test_tc_bpf *skel = NULL;
> > +       int new_fd, ret, i = 0;
> > +
> > +       skel = test_tc_bpf__open_and_load();
> > +       if (!ASSERT_OK_PTR(skel, "test_tc_bpf__open_and_load"))
> > +               return -EINVAL;
> > +
> > +       new_fd = bpf_program__fd(skel->progs.cls);
> > +
> > +       /* make sure no other filters are attached */
> > +       ret = bpf_tc_query(hook, NULL);
> > +       if (!ASSERT_EQ(ret, -ENOENT, "bpf_tc_query == -ENOENT"))
> > +               goto end_destroy;
> > +
> > +       for (i = 0; i < 5; i++) {
> > +               DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = fd);
>
> empty line after variable declaration
>

Ok, will fix everywhere.

> > +               ret = bpf_tc_attach(hook, &opts, 0);
> > +               if (!ASSERT_OK(ret, "bpf_tc_attach"))
> > +                       goto end;
> > +       }
> > +       DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1,
> > +                           .prog_fd = new_fd);
> > +       ret = bpf_tc_attach(hook, &opts, 0);
> > +       if (!ASSERT_OK(ret, "bpf_tc_attach"))
> > +               goto end;
> > +       i++;
> > +
> > +       ASSERT_EQ(opts.handle, 1, "handle match");
> > +       ASSERT_EQ(opts.priority, 1, "priority match");
> > +       ASSERT_NEQ(opts.prog_id, 0, "prog_id set");
> > +
> > +       opts.prog_fd = 0;
> > +       /* search with handle, priority, prog_id */
> > +       ret = bpf_tc_query(hook, &opts);
> > +       if (!ASSERT_OK(ret, "bpf_tc_query"))
> > +               goto end;
> > +
> > +       ASSERT_EQ(opts.handle, 1, "handle match");
> > +       ASSERT_EQ(opts.priority, 1, "priority match");
> > +       ASSERT_NEQ(opts.prog_id, 0, "prog_id set");
> > +
> > +       opts.priority = opts.prog_fd = 0;
> > +       /* search with handle, prog_id */
> > +       ret = bpf_tc_query(hook, &opts);
> > +       if (!ASSERT_OK(ret, "bpf_tc_query"))
> > +               goto end;
> > +
> > +       ASSERT_EQ(opts.handle, 1, "handle match");
> > +       ASSERT_EQ(opts.priority, 1, "priority match");
> > +       ASSERT_NEQ(opts.prog_id, 0, "prog_id set");
> > +
> > +       opts.handle = opts.prog_fd = 0;
> > +       /* search with priority, prog_id */
> > +       ret = bpf_tc_query(hook, &opts);
> > +       if (!ASSERT_OK(ret, "bpf_tc_query"))
> > +               goto end;
> > +
> > +       ASSERT_EQ(opts.handle, 1, "handle match");
> > +       ASSERT_EQ(opts.priority, 1, "priority match");
> > +       ASSERT_NEQ(opts.prog_id, 0, "prog_id set");
> > +
> > +       opts.handle = opts.priority = opts.prog_fd = 0;
> > +       /* search with prog_id */
> > +       ret = bpf_tc_query(hook, &opts);
> > +       if (!ASSERT_OK(ret, "bpf_tc_query"))
> > +               goto end;
> > +
> > +       ASSERT_EQ(opts.handle, 1, "handle match");
> > +       ASSERT_EQ(opts.priority, 1, "priority match");
> > +       ASSERT_NEQ(opts.prog_id, 0, "prog_id set");
> > +
> > +       while (i != 1) {
> > +               DECLARE_LIBBPF_OPTS(bpf_tc_opts, del_opts, .prog_fd = fd);
>
> empty line here
>
> > +               ret = bpf_tc_query(hook, &del_opts);
> > +               if (!ASSERT_OK(ret, "bpf_tc_query"))
> > +                       goto end;
> > +               ASSERT_NEQ(del_opts.prog_id, opts.prog_id, "prog_id should not be same");
> > +               ASSERT_NEQ(del_opts.priority, 1, "priority should not be 1");
> > +               del_opts.prog_fd = del_opts.prog_id = 0;
> > +               ret = bpf_tc_detach(hook, &del_opts);
> > +               if (!ASSERT_OK(ret, "bpf_tc_detach"))
> > +                       goto end;
> > +               i--;
> > +       }
> > +
> > +       opts.handle = opts.priority = opts.prog_id = 0;
> > +       opts.prog_fd = fd;
> > +       ret = bpf_tc_query(hook, &opts);
> > +       ASSERT_EQ(ret, -ENOENT, "bpf_tc_query == -ENOENT");
> > +
> > +end:
> > +       while (i--) {
> > +               DECLARE_LIBBPF_OPTS(bpf_tc_opts, del_opts, 0);
>
> you get the idea by now
>
> > +               ret = bpf_tc_query(hook, &del_opts);
> > +               if (!ASSERT_OK(ret, "bpf_tc_query"))
> > +                       break;
> > +               del_opts.prog_id = 0;
> > +               ret = bpf_tc_detach(hook, &del_opts);
> > +               if (!ASSERT_OK(ret, "bpf_tc_detach"))
> > +                       break;
> > +       }
> > +       ASSERT_EQ(bpf_tc_query(hook, NULL), -ENOENT, "bpf_tc_query == -ENOENT");
> > +end_destroy:
> > +       test_tc_bpf__destroy(skel);
> > +       return ret;
> > +}
> > +
>
> [...]

--
Kartikeya

* Re: [PATCH bpf-next v5 1/3] libbpf: add netlink helpers
  2021-05-01  6:13     ` Kumar Kartikeya Dwivedi
@ 2021-05-03 22:47       ` Andrii Nakryiko
  0 siblings, 0 replies; 14+ messages in thread
From: Andrii Nakryiko @ 2021-05-03 22:47 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Shaun Crampton,
	Networking

On Fri, Apr 30, 2021 at 11:13 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Sat, May 01, 2021 at 12:34:39AM IST, Andrii Nakryiko wrote:
> > On Wed, Apr 28, 2021 at 9:26 AM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > This change introduces a few helpers to wrap open coded attribute
> > > preparation in netlink.c. It also adds a libbpf_nl_send_recv that is useful
> > > to wrap send + recv handling in a generic way. Subsequent patch will
> > > also use this function for sending and receiving a netlink response.
> > > The libbpf_nl_get_link helper has been removed instead, moving socket
> > > creation into the newly named libbpf_nl_send_recv.
> > >
> > > Every nested attribute's closure must happen using the helper
> > > nlattr_end_nested, which sets its length properly. NLA_F_NESTED is
> > > enforced using nlattr_begin_nested helper. Other simple attributes
> > > can be added directly.
> > >
> > > The maxsz parameter corresponds to the size of the request structure
> > > which is being filled in, so for instance with req being:
> > >
> > > struct {
> > >         struct nlmsghdr nh;
> > >         struct tcmsg t;
> > >         char buf[4096];
> > > } req;
> > >
> > > Then, maxsz should be sizeof(req).
> > >
> > > This change also converts the open coded attribute preparation with the
> > > helpers. Note that the only failure the internal call to nlattr_add
> > > could result in the nested helper would be -EMSGSIZE, hence that is what
> > > we return to our caller.
> > >
> > > The libbpf_nl_send_recv call takes care of opening the socket, sending the
> > > netlink message, receiving the response, potentially invoking callbacks,
> > > and return errors if any, and then finally close the socket. This allows
> > > users to avoid identical socket setup code in different places. The only
> > > user of libbpf_nl_get_link has been converted to make use of it.
> > >
> > > __bpf_set_link_xdp_fd_replace has also been refactored to use it.
> > >
> > > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > ---
> > >  tools/lib/bpf/netlink.c | 117 ++++++++++++++++++----------------------
> > >  tools/lib/bpf/nlattr.h  |  48 +++++++++++++++++
> > >  2 files changed, 100 insertions(+), 65 deletions(-)
> > >
> > > diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
> > > index d2cb28e9ef52..6daee6640725 100644
> > > --- a/tools/lib/bpf/netlink.c
> > > +++ b/tools/lib/bpf/netlink.c
> > > @@ -131,72 +131,53 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
> > >         return ret;
> > >  }
> > >
> > > +static int libbpf_nl_send_recv(struct nlmsghdr *nh, __dump_nlmsg_t fn,
> > > +                              libbpf_dump_nlmsg_t _fn, void *cookie);
> > > +
> > >  static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
> > >                                          __u32 flags)
> > >  {
> > > -       int sock, seq = 0, ret;
> > > -       struct nlattr *nla, *nla_xdp;
> > > +       struct nlattr *nla;
> > > +       int ret;
> > >         struct {
> > >                 struct nlmsghdr  nh;
> > >                 struct ifinfomsg ifinfo;
> > >                 char             attrbuf[64];
> > >         } req;
> > > -       __u32 nl_pid = 0;
> > > -
> > > -       sock = libbpf_netlink_open(&nl_pid);
> > > -       if (sock < 0)
> > > -               return sock;
> > >
> > >         memset(&req, 0, sizeof(req));
> > >         req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
> > >         req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
> > >         req.nh.nlmsg_type = RTM_SETLINK;
> > > -       req.nh.nlmsg_pid = 0;
> > > -       req.nh.nlmsg_seq = ++seq;
> > >         req.ifinfo.ifi_family = AF_UNSPEC;
> > >         req.ifinfo.ifi_index = ifindex;
> > >
> > >         /* started nested attribute for XDP */
> > > -       nla = (struct nlattr *)(((char *)&req)
> > > -                               + NLMSG_ALIGN(req.nh.nlmsg_len));
> > > -       nla->nla_type = NLA_F_NESTED | IFLA_XDP;
> > > -       nla->nla_len = NLA_HDRLEN;
> > > +       nla = nlattr_begin_nested(&req.nh, sizeof(req), IFLA_XDP);
> > > +       if (!nla)
> > > +               return -EMSGSIZE;
> > >
> > >         /* add XDP fd */
> > > -       nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
> > > -       nla_xdp->nla_type = IFLA_XDP_FD;
> > > -       nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
> > > -       memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
> > > -       nla->nla_len += nla_xdp->nla_len;
> > > +       ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_FD, &fd, sizeof(fd));
> > > +       if (ret < 0)
> > > +               return ret;
> > >
> > >         /* if user passed in any flags, add those too */
> > >         if (flags) {
> > > -               nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
> > > -               nla_xdp->nla_type = IFLA_XDP_FLAGS;
> > > -               nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
> > > -               memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
> > > -               nla->nla_len += nla_xdp->nla_len;
> > > +               ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_FLAGS, &flags, sizeof(flags));
> > > +               if (ret < 0)
> > > +                       return ret;
> > >         }
> > >
> > >         if (flags & XDP_FLAGS_REPLACE) {
> > > -               nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
> > > -               nla_xdp->nla_type = IFLA_XDP_EXPECTED_FD;
> > > -               nla_xdp->nla_len = NLA_HDRLEN + sizeof(old_fd);
> > > -               memcpy((char *)nla_xdp + NLA_HDRLEN, &old_fd, sizeof(old_fd));
> > > -               nla->nla_len += nla_xdp->nla_len;
> > > +               ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_EXPECTED_FD, &flags, sizeof(flags));
> >
> > shouldn't old_fd be used here?
> >
>
> Ouch, yes, thanks for spotting this.
>
> > > +               if (ret < 0)
> > > +                       return ret;
> > >         }
> > >
> > > -       req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
> > > +       nlattr_end_nested(&req.nh, nla);
> > >
> > > -       if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
> > > -               ret = -errno;
> > > -               goto cleanup;
> > > -       }
> > > -       ret = bpf_netlink_recv(sock, nl_pid, seq, NULL, NULL, NULL);
> > > -
> > > -cleanup:
> > > -       close(sock);
> > > -       return ret;
> > > +       return libbpf_nl_send_recv(&req.nh, NULL, NULL, NULL);
> > >  }
> > >
> > >  int bpf_set_link_xdp_fd_opts(int ifindex, int fd, __u32 flags,
> >
> > [...]
> >
> > > -int libbpf_nl_get_link(int sock, unsigned int nl_pid,
> > > -                      libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
> > > +static int libbpf_nl_send_recv(struct nlmsghdr *nh, __dump_nlmsg_t fn,
> > > +                              libbpf_dump_nlmsg_t _fn, void *cookie)
> > >  {
> > > -       struct {
> > > -               struct nlmsghdr nlh;
> > > -               struct ifinfomsg ifm;
> > > -       } req = {
> > > -               .nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
> > > -               .nlh.nlmsg_type = RTM_GETLINK,
> > > -               .nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
> > > -               .ifm.ifi_family = AF_PACKET,
> > > -       };
> > > -       int seq = time(NULL);
> > > +       __u32 nl_pid = 0;
> > > +       int sock, ret;
> > >
> > > -       req.nlh.nlmsg_seq = seq;
> > > -       if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0)
> > > -               return -errno;
> > > +       if (!nh)
> > > +               return -EINVAL;
> > > +
> > > +       sock = libbpf_netlink_open(&nl_pid);
> > > +       if (sock < 0)
> > > +               return sock;
> > >
> > > -       return bpf_netlink_recv(sock, nl_pid, seq, __dump_link_nlmsg,
> > > -                               dump_link_nlmsg, cookie);
> > > +       nh->nlmsg_pid = 0;
> > > +       nh->nlmsg_seq = time(NULL);
> > > +       if (send(sock, nh, nh->nlmsg_len, 0) < 0) {
> > > +               ret = -errno;
> > > +               goto end;
> > > +       }
> > > +
> > > +       ret = bpf_netlink_recv(sock, nl_pid, nh->nlmsg_seq, fn, _fn, cookie);
> >
> > what's the difference between fn and _fn, can this be somehow
> > reflected in the name?
> >
>
> You can use fn as a common parsing function for the same RTM_GET* message, and
> then use _fn to parse a nested layer of attributes below it to fill in different
> kind of opts (through the cookie user data parameter).
>
> How about outer_cb, inner_cb?

So the outer thingy is a "message" and the inner one is an "attribute" in
netlink lingo? If yes, then parse_msg and parse_attr would make more
sense, imo. If not, outer_cb/inner_cb is fine as well.
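
To make the two callback layers concrete, here is a minimal sketch of the
message/attribute split. It does not use real netlink structures: toy_msg,
toy_recv, toy_parse_msg and toy_sum_attr are hypothetical names invented for
illustration, and the "attribute" is a single pre-decoded integer.

```c
#include <stddef.h>

/* Hypothetical stand-in for one received netlink message. */
struct toy_msg {
	int type;     /* message type */
	int attr_val; /* one attribute value, already decoded */
};

#define TOY_MSG_WANTED 1

/* Attribute-level callback: fills caller state through cookie. */
typedef int (*toy_parse_attr_t)(int attr_val, void *cookie);

/* Message-level callback: checks the message, then forwards its
 * attribute to parse_attr.
 */
typedef int (*toy_parse_msg_t)(const struct toy_msg *msg,
			       toy_parse_attr_t parse_attr, void *cookie);

/* Receive loop: one parse_msg call per message, stop on first error. */
static int toy_recv(const struct toy_msg *msgs, size_t n,
		    toy_parse_msg_t parse_msg, toy_parse_attr_t parse_attr,
		    void *cookie)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		if (!parse_msg)
			continue;
		ret = parse_msg(&msgs[i], parse_attr, cookie);
		if (ret)
			return ret;
	}
	return 0;
}

/* Shared message parser: skips unrelated message types. */
static int toy_parse_msg(const struct toy_msg *msg,
			 toy_parse_attr_t parse_attr, void *cookie)
{
	if (msg->type != TOY_MSG_WANTED)
		return 0;
	return parse_attr ? parse_attr(msg->attr_val, cookie) : 0;
}

/* One possible attribute parser: accumulates values into an int. */
static int toy_sum_attr(int attr_val, void *cookie)
{
	*(int *)cookie += attr_val;
	return 0;
}
```

The same toy_parse_msg can be paired with different attribute callbacks, each
filling a different kind of state through the cookie.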

>
> > > +
> > > +end:
> > > +       close(sock);
> > > +       return ret;
> > >  }
> > > diff --git a/tools/lib/bpf/nlattr.h b/tools/lib/bpf/nlattr.h
> > > index 6cc3ac91690f..1c94cdb6e89d 100644
> > > --- a/tools/lib/bpf/nlattr.h
> > > +++ b/tools/lib/bpf/nlattr.h
> > > @@ -10,7 +10,10 @@
> > >  #define __LIBBPF_NLATTR_H
> > >
> > >  #include <stdint.h>
> > > +#include <string.h>
> > > +#include <errno.h>
> > >  #include <linux/netlink.h>
> > > +
> > >  /* avoid multiple definition of netlink features */
> > >  #define __LINUX_NETLINK_H
> > >
> > > @@ -103,4 +106,49 @@ int libbpf_nla_parse_nested(struct nlattr *tb[], int maxtype,
> > >
> > >  int libbpf_nla_dump_errormsg(struct nlmsghdr *nlh);
> > >
> > > +static inline struct nlattr *nla_data(struct nlattr *nla)
> > > +{
> > > +       return (struct nlattr *)((char *)nla + NLA_HDRLEN);
> > > +}
> > > +
> > > +static inline struct nlattr *nh_tail(struct nlmsghdr *nh)
> > > +{
> > > +       return (struct nlattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
> > > +}
> > > +
> > > +static inline int nlattr_add(struct nlmsghdr *nh, size_t maxsz, int type,
> > > +                            const void *data, int len)
> > > +{
> > > +       struct nlattr *nla;
> > > +
> > > +       if (NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(NLA_HDRLEN + len) > maxsz)
> > > +               return -EMSGSIZE;
> > > +       if ((!data && len) || (data && !len))
> >
> > we use !!data != !!len for this in at least few places
> >
>
> Ok.
>
> > > +               return -EINVAL;
> > > +
> > > +       nla = nh_tail(nh);
> > > +       nla->nla_type = type;
> > > +       nla->nla_len = NLA_HDRLEN + len;
> > > +       if (data)
> > > +               memcpy(nla_data(nla), data, len);
> > > +       nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(nla->nla_len);
> > > +       return 0;
> > > +}
> > > +
> >
> > [...]
>
> --
> Kartikeya
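
For reference, a minimal sketch of a bounds-checked attribute append that uses
the !!data != !!len form suggested above. It is an assumption-laden toy, not
the nlattr.h code: toy_hdr, toy_attr, TOY_ALIGN and toy_attr_add are
hypothetical names, and the layout only mimics a length-prefixed header
followed by 4-byte-aligned attributes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round up to a 4-byte boundary, as netlink-style attributes are padded. */
#define TOY_ALIGN(len) (((size_t)(len) + 3) & ~(size_t)3)

struct toy_hdr {
	uint32_t len; /* bytes used so far, including this header */
};

struct toy_attr {
	uint16_t len; /* attribute header plus payload, unpadded */
	uint16_t type;
};

/* Append one attribute after the current tail of the message.
 * maxsz is the size of the whole request buffer that starts at h.
 */
static int toy_attr_add(struct toy_hdr *h, size_t maxsz, uint16_t type,
			const void *data, size_t len)
{
	size_t off = TOY_ALIGN(h->len);
	struct toy_attr a;
	char *dst;

	/* data and len must be both set or both unset */
	if (!!data != !!len)
		return -EINVAL;
	if (len > UINT16_MAX - sizeof(a))
		return -EMSGSIZE;
	if (off + TOY_ALIGN(sizeof(a) + len) > maxsz)
		return -EMSGSIZE;

	a.len = (uint16_t)(sizeof(a) + len);
	a.type = type;

	dst = (char *)h + off;
	memcpy(dst, &a, sizeof(a));
	if (data)
		memcpy(dst + sizeof(a), data, len);

	h->len = (uint32_t)(off + TOY_ALIGN(a.len));
	return 0;
}
```

As in the commit message, the caller passes sizeof() of the whole request
structure as maxsz, so the helper can refuse an attribute that would not fit.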

* Re: [PATCH bpf-next v5 2/3] libbpf: add low level TC-BPF API
  2021-05-01  6:32     ` Kumar Kartikeya Dwivedi
@ 2021-05-03 22:54       ` Andrii Nakryiko
  2021-05-03 23:11         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 14+ messages in thread
From: Andrii Nakryiko @ 2021-05-03 22:54 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Shaun Crampton,
	Networking

On Fri, Apr 30, 2021 at 11:32 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Sat, May 01, 2021 at 01:05:40AM IST, Andrii Nakryiko wrote:
> > On Wed, Apr 28, 2021 at 9:26 AM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > This adds functions that wrap the netlink API used for adding,
> > > manipulating, and removing traffic control filters.
> > >
> > > An API summary:
> > >
> > > A bpf_tc_hook represents a location where a TC-BPF filter can be
> > > attached. This means that creating a hook leads to creation of the
> > > backing qdisc, while destruction either removes all filters attached to
> > > a hook, or destroys qdisc if requested explicitly (as discussed below).
> > >
> > > The TC-BPF API functions operate on this bpf_tc_hook to attach, replace,
> > > query, and detach tc filters.
> > >
> > > All functions return 0 on success, and a negative error code on failure.
> > >

[...]

> > >
> > > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > ---
> >
> > API looks good to me (except the flags field that just stands out).
> > But I'll defer to Daniel to make the final call.
> >
> > >  tools/lib/bpf/libbpf.h   |  41 ++++
> > >  tools/lib/bpf/libbpf.map |   5 +
> > >  tools/lib/bpf/netlink.c  | 463 ++++++++++++++++++++++++++++++++++++++-
> > >  3 files changed, 508 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > > index bec4e6a6e31d..3de701f46a33 100644
> > > --- a/tools/lib/bpf/libbpf.h
> > > +++ b/tools/lib/bpf/libbpf.h
> > > @@ -775,6 +775,47 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
> > >  LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
> > >  LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
> > >
> > > +enum bpf_tc_attach_point {
> > > +       BPF_TC_INGRESS = 1 << 0,
> > > +       BPF_TC_EGRESS  = 1 << 1,
> > > +       BPF_TC_CUSTOM  = 1 << 2,
> > > +};
> > > +
> > > +enum bpf_tc_attach_flags {
> > > +       BPF_TC_F_REPLACE = 1 << 0,
> > > +};
> > > +
> > > +struct bpf_tc_hook {
> > > +       size_t sz;
> > > +       int ifindex;
> > > +       enum bpf_tc_attach_point attach_point;
> > > +       __u32 parent;
> > > +       size_t :0;
> > > +};
> > > +
> > > +#define bpf_tc_hook__last_field parent
> > > +
> > > +struct bpf_tc_opts {
> > > +       size_t sz;
> > > +       int prog_fd;
> > > +       __u32 prog_id;
> > > +       __u32 handle;
> > > +       __u32 priority;
> > > +       size_t :0;
> > > +};
> > > +
> > > +#define bpf_tc_opts__last_field priority
> > > +
> > > +LIBBPF_API int bpf_tc_hook_create(struct bpf_tc_hook *hook, int flags);
> > > +LIBBPF_API int bpf_tc_hook_destroy(struct bpf_tc_hook *hook);
> > > +LIBBPF_API int bpf_tc_attach(const struct bpf_tc_hook *hook,
> > > +                            struct bpf_tc_opts *opts,
> > > +                            int flags);
> >
> > why didn't you put flags into bpf_tc_opts? they are clearly optional
> > and fit into "opts" paradigm...
> >
>
> I can move this into opts, but during previous discussion it was kept outside
> opts by Daniel, so I kept that unchanged.

For bpf_tc_attach() I see no reason to keep flags separate. For
bpf_tc_hook_create()... for extensibility it would need its own opts
for hook creation. But if flags is 99% the only thing we'll need, then
we can always add an extra bpf_tc_hook_create_opts() later.
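
A rough sketch of what "flags inside opts" could look like with a size-prefixed
options struct. This is only an illustration under assumptions: toy_tc_opts,
TOY_F_REPLACE, TOY_OPTS_HAS, TOY_OPTS_GET and toy_tc_attach are hypothetical
names, and the macros are a simplified imitation of the sz-based pattern, not
the libbpf ones.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_F_REPLACE (1U << 0)

/* Hypothetical options struct; flags is appended as the newest field. */
struct toy_tc_opts {
	size_t sz; /* caller sets this to sizeof() of its own struct */
	int prog_fd;
	unsigned int handle;
	unsigned int priority;
	unsigned int flags;
};

/* A field is present only if the caller's struct is large enough. */
#define TOY_OPTS_HAS(opts, field) \
	((opts)->sz >= offsetof(struct toy_tc_opts, field) + \
		       sizeof((opts)->field))

/* Read a field, falling back to a default for older, shorter structs. */
#define TOY_OPTS_GET(opts, field, dflt) \
	(TOY_OPTS_HAS(opts, field) ? (opts)->field : (dflt))

/* Validation-only stand-in for an attach call that takes flags via opts. */
static int toy_tc_attach(const struct toy_tc_opts *opts)
{
	unsigned int flags;

	if (!opts || opts->sz < sizeof(opts->sz))
		return -EINVAL;

	flags = TOY_OPTS_GET(opts, flags, 0);
	if (flags & ~TOY_F_REPLACE)
		return -EINVAL;

	if (TOY_OPTS_GET(opts, prog_fd, 0) <= 0)
		return -EINVAL;

	return 0;
}
```

A caller built against an older struct without the flags member passes a
smaller sz and is treated as if flags were 0, which is what makes adding the
field later a compatible change.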

>
> > > +LIBBPF_API int bpf_tc_detach(const struct bpf_tc_hook *hook,
> > > +                            const struct bpf_tc_opts *opts);
> > > +LIBBPF_API int bpf_tc_query(const struct bpf_tc_hook *hook,
> > > +                           struct bpf_tc_opts *opts);
> > > +
> > >  #ifdef __cplusplus
> > >  } /* extern "C" */
> > >  #endif

[...]

> > > +               return -EINVAL;
> > > +
> > > +       return tc_qdisc_create_excl(hook, flags);
> > > +}
> > > +
> > > +static int tc_cls_detach(const struct bpf_tc_hook *hook,
> > > +                        const struct bpf_tc_opts *opts, bool flush);
> > > +
> > > +int bpf_tc_hook_destroy(struct bpf_tc_hook *hook)
> > > +{
> > > +       if (!hook || !OPTS_VALID(hook, bpf_tc_hook) ||
> > > +           OPTS_GET(hook, ifindex, 0) <= 0)
> > > +               return -EINVAL;
> > > +
> > > +       switch ((int)OPTS_GET(hook, attach_point, 0)) {
> >
> > int casting. Did the compiler complain about that or what?
> >
>
> It complains on -Wswitch, as we switch on values apart from the enum values, but
> I'll see if I can remove it.

ah, because of BPF_TC_INGRESS|BPF_TC_EGRESS? That sucks, of course. An
alternative I guess is just declaring BPF_TC_INGRESS_EGRESS =
BPF_TC_INGRESS | BPF_TC_EGRESS, but I don't know how awful that would
be.
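
A small sketch of that alternative, assuming a named enumerator for the
combined value. The toy_ names are hypothetical and the return codes merely
stand in for the flush/delete calls in the patch. With every switched value
being a declared enumerator, the switch needs no int cast.

```c
#include <errno.h>

enum toy_attach_point {
	TOY_INGRESS = 1 << 0,
	TOY_EGRESS = 1 << 1,
	TOY_CUSTOM = 1 << 2,
	/* named combination, so the switch below stays within the enum */
	TOY_INGRESS_EGRESS = TOY_INGRESS | TOY_EGRESS,
};

/* Stand-ins for "flush filters on the hook" and "delete the qdisc". */
enum toy_destroy_action {
	TOY_FLUSH_FILTERS = 1,
	TOY_DELETE_QDISC = 2,
};

/* Decide what destroying a hook would do for a given attach point. */
static int toy_hook_destroy_action(enum toy_attach_point ap)
{
	switch (ap) {
	case TOY_INGRESS:
	case TOY_EGRESS:
		return TOY_FLUSH_FILTERS;
	case TOY_INGRESS_EGRESS:
		return TOY_DELETE_QDISC;
	case TOY_CUSTOM:
		return -EOPNOTSUPP;
	default:
		return -EINVAL;
	}
}
```

The cost is one more public enumerator per supported combination, which is the
"how awful" part of the trade-off.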

>
> > > +               case BPF_TC_INGRESS:
> > > +               case BPF_TC_EGRESS:
> > > +                       return tc_cls_detach(hook, NULL, true);
> > > +               case BPF_TC_INGRESS|BPF_TC_EGRESS:
> > > +                       return tc_qdisc_delete(hook);
> > > +               case BPF_TC_CUSTOM:
> > > +                       return -EOPNOTSUPP;
> > > +               default:
> > > +                       return -EINVAL;
> > > +       }
> > > +}
> > > +

[...]

* Re: [PATCH bpf-next v5 3/3] libbpf: add selftests for TC-BPF API
  2021-05-01  6:34     ` Kumar Kartikeya Dwivedi
@ 2021-05-03 22:55       ` Andrii Nakryiko
  0 siblings, 0 replies; 14+ messages in thread
From: Andrii Nakryiko @ 2021-05-03 22:55 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Shaun Crampton,
	Networking

On Fri, Apr 30, 2021 at 11:34 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Sat, May 01, 2021 at 01:11:47AM IST, Andrii Nakryiko wrote:
> > On Wed, Apr 28, 2021 at 9:26 AM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > This adds some basic tests for the low level bpf_tc_* API.
> > >
> > > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > ---
> > >  .../testing/selftests/bpf/prog_tests/tc_bpf.c | 467 ++++++++++++++++++
> > >  .../testing/selftests/bpf/progs/test_tc_bpf.c |  12 +
> > >  2 files changed, 479 insertions(+)
> > >  create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_bpf.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf.c
> > >

[...]

> >
> > > +
> > > +       /* attach */
> > > +       ret = bpf_tc_attach(NULL, &attach_opts, 0);
> > > +       if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid hook = NULL"))
> > > +               return -EINVAL;
> > > +       ret = bpf_tc_attach(hook, &attach_opts, 42);
> > > +       if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid flags"))
> > > +               return -EINVAL;
> > > +       attach_opts.prog_fd = 0;
> > > +       ret = bpf_tc_attach(hook, &attach_opts, 0);
> > > +       if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid prog_fd unset"))
> > > +               return -EINVAL;
> > > +       attach_opts.prog_fd = fd;
> > > +       attach_opts.prog_id = 42;
> > > +       ret = bpf_tc_attach(hook, &attach_opts, 0);
> > > +       if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid prog_id set"))
> > > +               return -EINVAL;
> > > +       attach_opts.prog_id = 0;
> > > +       attach_opts.handle = 0;
> > > +       ret = bpf_tc_attach(hook, &attach_opts, 0);
> > > +       if (!ASSERT_OK(ret, "bpf_tc_attach valid handle unset"))
> > > +               return -EINVAL;
> > > +       attach_opts.prog_fd = attach_opts.prog_id = 0;
> > > +       ASSERT_OK(bpf_tc_detach(hook, &attach_opts), "bpf_tc_detach");
> >
> > this code is quite hard to follow, maybe sprinkle empty lines between
> > logical groups of statements (i.e., prepare inputs + call bpf_tc_xxx +
> > assert is one group that goes together)
> >
>
> I agree it looks bad. I can also just make a new opts for each combination, and
> name it that way. Maybe that will look much better.

It probably would be just more code to read. Try to space it out with
empty lines into logical groups, that should be enough.
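
Roughly this layout, where each group is "prepare inputs + call + assert" and
groups are separated by an empty line. The sketch is self-contained and does
not call libbpf: toy_opts, toy_attach and toy_test_attach_invalid are
hypothetical, and toy_attach is only a validator standing in for the attach
call under test.

```c
#include <errno.h>
#include <stdint.h>

struct toy_opts {
	int prog_fd;
	uint32_t prog_id;
	uint32_t handle;
	uint32_t priority;
};

/* Hypothetical validator standing in for the attach call under test. */
static int toy_attach(const struct toy_opts *o)
{
	if (!o || o->prog_fd <= 0 || o->prog_id || o->priority > UINT16_MAX)
		return -EINVAL;
	return 0;
}

/* Returns 0 only if every group behaves as expected. */
static int toy_test_attach_invalid(int fd)
{
	struct toy_opts o = { .prog_fd = fd, .handle = 1, .priority = 1 };
	int ret;

	/* prog_fd unset */
	o.prog_fd = 0;
	ret = toy_attach(&o);
	if (ret != -EINVAL)
		return -1;

	/* prog_id set */
	o.prog_fd = fd;
	o.prog_id = 42;
	ret = toy_attach(&o);
	if (ret != -EINVAL)
		return -1;

	/* priority > UINT16_MAX */
	o.prog_id = 0;
	o.priority = UINT16_MAX + 1;
	ret = toy_attach(&o);
	if (ret != -EINVAL)
		return -1;

	/* handle and priority unset is valid */
	o.handle = o.priority = 0;
	return toy_attach(&o);
}
```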

>
> > > +       attach_opts.prog_fd = fd;
> > > +       attach_opts.handle = 1;
> > > +       attach_opts.priority = 0;
> > > +       ret = bpf_tc_attach(hook, &attach_opts, 0);
> > > +       if (!ASSERT_OK(ret, "bpf_tc_attach valid priority unset"))
> > > +               return -EINVAL;
> > > +       attach_opts.prog_fd = attach_opts.prog_id = 0;
> > > +       ASSERT_OK(bpf_tc_detach(hook, &attach_opts), "bpf_tc_detach");
> > > +       attach_opts.prog_fd = fd;
> > > +       attach_opts.priority = UINT16_MAX + 1;
> > > +       ret = bpf_tc_attach(hook, &attach_opts, 0);
> > > +       if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid priority > UINT16_MAX"))
> > > +               return -EINVAL;
> > > +       attach_opts.priority = 0;
> > > +       attach_opts.handle = attach_opts.priority = 0;
> > > +       ret = bpf_tc_attach(hook, &attach_opts, 0);
> > > +       if (!ASSERT_OK(ret, "bpf_tc_attach valid both handle and priority unset"))
> > > +               return -EINVAL;
> > > +       attach_opts.prog_fd = attach_opts.prog_id = 0;
> > > +       ASSERT_OK(bpf_tc_detach(hook, &attach_opts), "bpf_tc_detach");
> > > +       ret = bpf_tc_attach(hook, NULL, 0);
> > > +       if (!ASSERT_EQ(ret, -EINVAL, "bpf_tc_attach invalid opts = NULL"))
> > > +               return -EINVAL;
> > > +
> > > +       return 0;
> > > +}
> > > +

[...]

* Re: [PATCH bpf-next v5 2/3] libbpf: add low level TC-BPF API
  2021-05-03 22:54       ` Andrii Nakryiko
@ 2021-05-03 23:11         ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 14+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-05-03 23:11 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Shaun Crampton,
	Networking

On Tue, May 04, 2021 at 04:24:05AM IST, Andrii Nakryiko wrote:
> On Fri, Apr 30, 2021 at 11:32 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Sat, May 01, 2021 at 01:05:40AM IST, Andrii Nakryiko wrote:
[...]
> > >
> > > why didn't you put flags into bpf_tc_opts? they are clearly optional
> > > and fit into "opts" paradigm...
> > >
> >
> > I can move this into opts, but during previous discussion it was kept outside
> > opts by Daniel, so I kept that unchanged.
>
> For bpf_tc_attach() I see no reason to keep flags separate. For
> bpf_tc_hook_create()... for extensibility it would need its own opts
> for hook creation. But if flags is 99% the only thing we'll need, then
> we can always add an extra bpf_tc_hook_create_opts() later.
>

I'll put flags in the respective opts struct for both.

The hook creation path was kept generic enough that it can be extended in the
future to more complex qdisc setups than just clsact (even classful qdiscs should
be possible). So it is quite possible for bpf_tc_hook to take more parameters
than just flags, by mapping different attach_point values to different qdiscs.

Given some parameters are already optional depending on attach_point, it is
probably better to put flags in opts than to drop opts for now.

--
Kartikeya

Thread overview: 14+ messages
2021-04-28 16:25 [PATCH bpf-next v5 0/3] Add TC-BPF API Kumar Kartikeya Dwivedi
2021-04-28 16:25 ` [PATCH bpf-next v5 1/3] libbpf: add netlink helpers Kumar Kartikeya Dwivedi
2021-04-30 19:04   ` Andrii Nakryiko
2021-05-01  6:13     ` Kumar Kartikeya Dwivedi
2021-05-03 22:47       ` Andrii Nakryiko
2021-04-28 16:25 ` [PATCH bpf-next v5 2/3] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
2021-04-30 19:35   ` Andrii Nakryiko
2021-05-01  6:32     ` Kumar Kartikeya Dwivedi
2021-05-03 22:54       ` Andrii Nakryiko
2021-05-03 23:11         ` Kumar Kartikeya Dwivedi
2021-04-28 16:25 ` [PATCH bpf-next v5 3/3] libbpf: add selftests for " Kumar Kartikeya Dwivedi
2021-04-30 19:41   ` Andrii Nakryiko
2021-05-01  6:34     ` Kumar Kartikeya Dwivedi
2021-05-03 22:55       ` Andrii Nakryiko
