bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v2 0/4] Add TC-BPF API
@ 2021-04-19 12:18 Kumar Kartikeya Dwivedi
  2021-04-19 12:18 ` [PATCH bpf-next v2 1/4] tools: pkt_cls.h: sync with kernel sources Kumar Kartikeya Dwivedi
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-19 12:18 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Jesper Dangaard Brouer,
	Toke Høiland-Jørgensen, linux-kernel, netdev

This is the second version of the TC-BPF series.

It adds a simple API that uses netlink to attach the tc filter and its bpf
classifier. Currently, a user needs to shell out to the tc command line to be
able to create filters and attach SCHED_CLS programs as classifiers. With the
help of this API, it will be possible to use libbpf for doing all parts of bpf
program setup and attach.

Direct action is now the default, and currently no way to disable it has been
provided. This also means that SCHED_ACT programs are a lot less useful, so
support for them has been removed for now. In the future, if someone comes up
with a convincing use case where the direct action mode doesn't serve their
needs, a simple extension that allows disabling direct action mode and passing
the list of actions that would be bound to the classifier can be added.

In an effort to keep discussion focused, this series doesn't have the high level
TC-BPF API. It was clear that there is a need for a bpf_link API in the kernel,
hence that will be submitted as a separate patchset.

The individual commit messages contain more details, and also a brief summary of
the API.

Changelog:
----------
v1 -> v2
v1: https://lore.kernel.org/bpf/20210325120020.236504-1-memxor@gmail.com

 * netlink helpers have been renamed to object_action style.
 * attach_id now only contains attributes that are not explicitly set. Only
   the bare minimum info is kept in it.
 * protocol is now an optional and defaults to ETH_P_ALL.
 * direct-action mode is default, and cannot be unset for now.
 * skip_sw and skip_hw options have also been removed.
 * bpf_tc_cls_info struct now also returns the bpf program tag and id, as
   available in the netlink response. This came up as a requirement during
   discussion with people wanting to use this functionality.
 * support for attaching SCHED_ACT programs has been dropped, as it isn't
   useful without any support for binding loaded actions to a classifier.
 * the distinction between dev and block API has been dropped, there is now
   a single set of functions and user has to pass the special ifindex value
   to indicate operation on a shared filter block on their own.
 * The high level API returning a bpf_link is gone. This was already non-
   functional for pinning and typical ownership semantics. Instead, a separate
   patchset will be sent adding a bpf_link API for attaching SCHED_CLS progs to
   the kernel, and its corresponding libbpf API.
 * The clsact qdisc is now setup automatically in a best-effort fashion whenever
   user passes in the clsact ingress or egress parent id. This is done with
   exclusive mode, such that if an ingress or clsact qdisc is already set up,
   we skip the setup and move on with filter creation.
 * Other minor changes that came up during the course of discussion and rework.

Kumar Kartikeya Dwivedi (4):
  tools: pkt_cls.h: sync with kernel sources
  libbpf: add helpers for preparing netlink attributes
  libbpf: add low level TC-BPF API
  libbpf: add selftests for TC-BPF API

 tools/include/uapi/linux/pkt_cls.h            | 174 +++++++-
 tools/lib/bpf/Makefile                        |   3 +
 tools/lib/bpf/libbpf.h                        |  52 +++
 tools/lib/bpf/libbpf.map                      |   5 +
 tools/lib/bpf/netlink.c                       | 414 ++++++++++++++++--
 tools/lib/bpf/nlattr.h                        |  48 ++
 .../selftests/bpf/prog_tests/test_tc_bpf.c    | 112 +++++
 .../selftests/bpf/progs/test_tc_bpf_kern.c    |  12 +
 8 files changed, 789 insertions(+), 31 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c

--
2.30.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH bpf-next v2 1/4] tools: pkt_cls.h: sync with kernel sources
  2021-04-19 12:18 [PATCH bpf-next v2 0/4] Add TC-BPF API Kumar Kartikeya Dwivedi
@ 2021-04-19 12:18 ` Kumar Kartikeya Dwivedi
  2021-04-19 12:18 ` [PATCH bpf-next v2 2/4] libbpf: add helpers for preparing netlink attributes Kumar Kartikeya Dwivedi
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-19 12:18 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Jesper Dangaard Brouer, linux-kernel,
	netdev

Update the header file so we can use the new defines in subsequent
patches. Also make sure they remain up to date.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/include/uapi/linux/pkt_cls.h | 174 ++++++++++++++++++++++++++++-
 tools/lib/bpf/Makefile             |   3 +
 2 files changed, 173 insertions(+), 4 deletions(-)

diff --git a/tools/include/uapi/linux/pkt_cls.h b/tools/include/uapi/linux/pkt_cls.h
index 12153771396a..025c40fef93d 100644
--- a/tools/include/uapi/linux/pkt_cls.h
+++ b/tools/include/uapi/linux/pkt_cls.h
@@ -16,9 +16,36 @@ enum {
 	TCA_ACT_STATS,
 	TCA_ACT_PAD,
 	TCA_ACT_COOKIE,
+	TCA_ACT_FLAGS,
+	TCA_ACT_HW_STATS,
+	TCA_ACT_USED_HW_STATS,
 	__TCA_ACT_MAX
 };
 
+#define TCA_ACT_FLAGS_NO_PERCPU_STATS 1 /* Don't use percpu allocator for
+					 * actions stats.
+					 */
+
+/* tca HW stats type
+ * When user does not pass the attribute, he does not care.
+ * It is the same as if he would pass the attribute with
+ * all supported bits set.
+ * In case no bits are set, user is not interested in getting any HW statistics.
+ */
+#define TCA_ACT_HW_STATS_IMMEDIATE (1 << 0) /* Means that in dump, user
+					     * gets the current HW stats
+					     * state from the device
+					     * queried at the dump time.
+					     */
+#define TCA_ACT_HW_STATS_DELAYED (1 << 1) /* Means that in dump, user gets
+					   * HW stats that might be out of date
+					   * for some time, maybe couple of
+					   * seconds. This is the case when
+					   * driver polls stats updates
+					   * periodically or when it gets async
+					   * stats update from the device.
+					   */
+
 #define TCA_ACT_MAX __TCA_ACT_MAX
 #define TCA_OLD_COMPAT (TCA_ACT_MAX+1)
 #define TCA_ACT_MAX_PRIO 32
@@ -63,12 +90,53 @@ enum {
 #define TC_ACT_GOTO_CHAIN __TC_ACT_EXT(2)
 #define TC_ACT_EXT_OPCODE_MAX	TC_ACT_GOTO_CHAIN
 
+/* These macros are put here for binary compatibility with userspace apps that
+ * make use of them. For kernel code and new userspace apps, use the TCA_ID_*
+ * versions.
+ */
+#define TCA_ACT_GACT 5
+#define TCA_ACT_IPT 6
+#define TCA_ACT_PEDIT 7
+#define TCA_ACT_MIRRED 8
+#define TCA_ACT_NAT 9
+#define TCA_ACT_XT 10
+#define TCA_ACT_SKBEDIT 11
+#define TCA_ACT_VLAN 12
+#define TCA_ACT_BPF 13
+#define TCA_ACT_CONNMARK 14
+#define TCA_ACT_SKBMOD 15
+#define TCA_ACT_CSUM 16
+#define TCA_ACT_TUNNEL_KEY 17
+#define TCA_ACT_SIMP 22
+#define TCA_ACT_IFE 25
+#define TCA_ACT_SAMPLE 26
+
 /* Action type identifiers*/
-enum {
-	TCA_ID_UNSPEC=0,
-	TCA_ID_POLICE=1,
+enum tca_id {
+	TCA_ID_UNSPEC = 0,
+	TCA_ID_POLICE = 1,
+	TCA_ID_GACT = TCA_ACT_GACT,
+	TCA_ID_IPT = TCA_ACT_IPT,
+	TCA_ID_PEDIT = TCA_ACT_PEDIT,
+	TCA_ID_MIRRED = TCA_ACT_MIRRED,
+	TCA_ID_NAT = TCA_ACT_NAT,
+	TCA_ID_XT = TCA_ACT_XT,
+	TCA_ID_SKBEDIT = TCA_ACT_SKBEDIT,
+	TCA_ID_VLAN = TCA_ACT_VLAN,
+	TCA_ID_BPF = TCA_ACT_BPF,
+	TCA_ID_CONNMARK = TCA_ACT_CONNMARK,
+	TCA_ID_SKBMOD = TCA_ACT_SKBMOD,
+	TCA_ID_CSUM = TCA_ACT_CSUM,
+	TCA_ID_TUNNEL_KEY = TCA_ACT_TUNNEL_KEY,
+	TCA_ID_SIMP = TCA_ACT_SIMP,
+	TCA_ID_IFE = TCA_ACT_IFE,
+	TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
+	TCA_ID_CTINFO,
+	TCA_ID_MPLS,
+	TCA_ID_CT,
+	TCA_ID_GATE,
 	/* other actions go here */
-	__TCA_ID_MAX=255
+	__TCA_ID_MAX = 255
 };
 
 #define TCA_ID_MAX __TCA_ID_MAX
@@ -120,6 +188,10 @@ enum {
 	TCA_POLICE_RESULT,
 	TCA_POLICE_TM,
 	TCA_POLICE_PAD,
+	TCA_POLICE_RATE64,
+	TCA_POLICE_PEAKRATE64,
+	TCA_POLICE_PKTRATE64,
+	TCA_POLICE_PKTBURST64,
 	__TCA_POLICE_MAX
 #define TCA_POLICE_RESULT TCA_POLICE_RESULT
 };
@@ -333,12 +405,19 @@ enum {
 
 /* Basic filter */
 
+struct tc_basic_pcnt {
+	__u64 rcnt;
+	__u64 rhit;
+};
+
 enum {
 	TCA_BASIC_UNSPEC,
 	TCA_BASIC_CLASSID,
 	TCA_BASIC_EMATCHES,
 	TCA_BASIC_ACT,
 	TCA_BASIC_POLICE,
+	TCA_BASIC_PCNT,
+	TCA_BASIC_PAD,
 	__TCA_BASIC_MAX
 };
 
@@ -485,17 +564,54 @@ enum {
 
 	TCA_FLOWER_IN_HW_COUNT,
 
+	TCA_FLOWER_KEY_PORT_SRC_MIN,	/* be16 */
+	TCA_FLOWER_KEY_PORT_SRC_MAX,	/* be16 */
+	TCA_FLOWER_KEY_PORT_DST_MIN,	/* be16 */
+	TCA_FLOWER_KEY_PORT_DST_MAX,	/* be16 */
+
+	TCA_FLOWER_KEY_CT_STATE,	/* u16 */
+	TCA_FLOWER_KEY_CT_STATE_MASK,	/* u16 */
+	TCA_FLOWER_KEY_CT_ZONE,		/* u16 */
+	TCA_FLOWER_KEY_CT_ZONE_MASK,	/* u16 */
+	TCA_FLOWER_KEY_CT_MARK,		/* u32 */
+	TCA_FLOWER_KEY_CT_MARK_MASK,	/* u32 */
+	TCA_FLOWER_KEY_CT_LABELS,	/* u128 */
+	TCA_FLOWER_KEY_CT_LABELS_MASK,	/* u128 */
+
+	TCA_FLOWER_KEY_MPLS_OPTS,
+
+	TCA_FLOWER_KEY_HASH,		/* u32 */
+	TCA_FLOWER_KEY_HASH_MASK,	/* u32 */
+
 	__TCA_FLOWER_MAX,
 };
 
 #define TCA_FLOWER_MAX (__TCA_FLOWER_MAX - 1)
 
+enum {
+	TCA_FLOWER_KEY_CT_FLAGS_NEW = 1 << 0, /* Beginning of a new connection. */
+	TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED = 1 << 1, /* Part of an existing connection. */
+	TCA_FLOWER_KEY_CT_FLAGS_RELATED = 1 << 2, /* Related to an established connection. */
+	TCA_FLOWER_KEY_CT_FLAGS_TRACKED = 1 << 3, /* Conntrack has occurred. */
+	TCA_FLOWER_KEY_CT_FLAGS_INVALID = 1 << 4, /* Conntrack is invalid. */
+	TCA_FLOWER_KEY_CT_FLAGS_REPLY = 1 << 5, /* Packet is in the reply direction. */
+	__TCA_FLOWER_KEY_CT_FLAGS_MAX,
+};
+
 enum {
 	TCA_FLOWER_KEY_ENC_OPTS_UNSPEC,
 	TCA_FLOWER_KEY_ENC_OPTS_GENEVE, /* Nested
 					 * TCA_FLOWER_KEY_ENC_OPT_GENEVE_
 					 * attributes
 					 */
+	TCA_FLOWER_KEY_ENC_OPTS_VXLAN,	/* Nested
+					 * TCA_FLOWER_KEY_ENC_OPT_VXLAN_
+					 * attributes
+					 */
+	TCA_FLOWER_KEY_ENC_OPTS_ERSPAN,	/* Nested
+					 * TCA_FLOWER_KEY_ENC_OPT_ERSPAN_
+					 * attributes
+					 */
 	__TCA_FLOWER_KEY_ENC_OPTS_MAX,
 };
 
@@ -513,18 +629,68 @@ enum {
 #define TCA_FLOWER_KEY_ENC_OPT_GENEVE_MAX \
 		(__TCA_FLOWER_KEY_ENC_OPT_GENEVE_MAX - 1)
 
+enum {
+	TCA_FLOWER_KEY_ENC_OPT_VXLAN_UNSPEC,
+	TCA_FLOWER_KEY_ENC_OPT_VXLAN_GBP,		/* u32 */
+	__TCA_FLOWER_KEY_ENC_OPT_VXLAN_MAX,
+};
+
+#define TCA_FLOWER_KEY_ENC_OPT_VXLAN_MAX \
+		(__TCA_FLOWER_KEY_ENC_OPT_VXLAN_MAX - 1)
+
+enum {
+	TCA_FLOWER_KEY_ENC_OPT_ERSPAN_UNSPEC,
+	TCA_FLOWER_KEY_ENC_OPT_ERSPAN_VER,              /* u8 */
+	TCA_FLOWER_KEY_ENC_OPT_ERSPAN_INDEX,            /* be32 */
+	TCA_FLOWER_KEY_ENC_OPT_ERSPAN_DIR,              /* u8 */
+	TCA_FLOWER_KEY_ENC_OPT_ERSPAN_HWID,             /* u8 */
+	__TCA_FLOWER_KEY_ENC_OPT_ERSPAN_MAX,
+};
+
+#define TCA_FLOWER_KEY_ENC_OPT_ERSPAN_MAX \
+		(__TCA_FLOWER_KEY_ENC_OPT_ERSPAN_MAX - 1)
+
+enum {
+	TCA_FLOWER_KEY_MPLS_OPTS_UNSPEC,
+	TCA_FLOWER_KEY_MPLS_OPTS_LSE,
+	__TCA_FLOWER_KEY_MPLS_OPTS_MAX,
+};
+
+#define TCA_FLOWER_KEY_MPLS_OPTS_MAX (__TCA_FLOWER_KEY_MPLS_OPTS_MAX - 1)
+
+enum {
+	TCA_FLOWER_KEY_MPLS_OPT_LSE_UNSPEC,
+	TCA_FLOWER_KEY_MPLS_OPT_LSE_DEPTH,
+	TCA_FLOWER_KEY_MPLS_OPT_LSE_TTL,
+	TCA_FLOWER_KEY_MPLS_OPT_LSE_BOS,
+	TCA_FLOWER_KEY_MPLS_OPT_LSE_TC,
+	TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL,
+	__TCA_FLOWER_KEY_MPLS_OPT_LSE_MAX,
+};
+
+#define TCA_FLOWER_KEY_MPLS_OPT_LSE_MAX \
+		(__TCA_FLOWER_KEY_MPLS_OPT_LSE_MAX - 1)
+
 enum {
 	TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT = (1 << 0),
 	TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST = (1 << 1),
 };
 
+#define TCA_FLOWER_MASK_FLAGS_RANGE	(1 << 0) /* Range-based match */
+
 /* Match-all classifier */
 
+struct tc_matchall_pcnt {
+	__u64 rhit;
+};
+
 enum {
 	TCA_MATCHALL_UNSPEC,
 	TCA_MATCHALL_CLASSID,
 	TCA_MATCHALL_ACT,
 	TCA_MATCHALL_FLAGS,
+	TCA_MATCHALL_PCNT,
+	TCA_MATCHALL_PAD,
 	__TCA_MATCHALL_MAX,
 };
 
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index e43e1896cb4b..745624d91a01 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -152,6 +152,9 @@ $(BPF_IN_SHARED): force $(BPF_HELPER_DEFS)
 	@(test -f ../../include/uapi/linux/if_xdp.h -a -f ../../../include/uapi/linux/if_xdp.h && ( \
 	(diff -B ../../include/uapi/linux/if_xdp.h ../../../include/uapi/linux/if_xdp.h >/dev/null) || \
 	echo "Warning: Kernel ABI header at 'tools/include/uapi/linux/if_xdp.h' differs from latest version at 'include/uapi/linux/if_xdp.h'" >&2 )) || true
+	@(test -f ../../include/uapi/linux/pkt_cls.h -a -f ../../../include/uapi/linux/pkt_cls.h && ( \
+	(diff -B ../../include/uapi/linux/pkt_cls.h ../../../include/uapi/linux/pkt_cls.h >/dev/null) || \
+	echo "Warning: Kernel ABI header at 'tools/include/uapi/linux/pkt_cls.h' differs from latest version at 'include/uapi/linux/pkt_cls.h'" >&2 )) || true
 	$(Q)$(MAKE) $(build)=libbpf OUTPUT=$(SHARED_OBJDIR) CFLAGS="$(CFLAGS) $(SHLIB_FLAGS)"
 
 $(BPF_IN_STATIC): force $(BPF_HELPER_DEFS)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH bpf-next v2 2/4] libbpf: add helpers for preparing netlink attributes
  2021-04-19 12:18 [PATCH bpf-next v2 0/4] Add TC-BPF API Kumar Kartikeya Dwivedi
  2021-04-19 12:18 ` [PATCH bpf-next v2 1/4] tools: pkt_cls.h: sync with kernel sources Kumar Kartikeya Dwivedi
@ 2021-04-19 12:18 ` Kumar Kartikeya Dwivedi
  2021-04-19 12:18 ` [PATCH bpf-next v2 3/4] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
  2021-04-19 12:18 ` [PATCH bpf-next v2 4/4] libbpf: add selftests for " Kumar Kartikeya Dwivedi
  3 siblings, 0 replies; 10+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-19 12:18 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Jesper Dangaard Brouer, linux-kernel,
	netdev

This change introduces a few helpers to wrap open coded attribute
preparation in netlink.c.

Every nested attribute's closure must happen using the helper
nlattr_end_nested, which sets its length properly. NLA_F_NESTED is
enforeced using nlattr_begin_nested helper. Other simple attributes
can be added directly.

The maxsz parameter corresponds to the size of the request structure
which is being filled in, so for instance with req being:

struct {
	struct nlmsghdr nh;
	struct tcmsg t;
	char buf[4096];
} req;

Then, maxsz should be sizeof(req).

This change also converts the open coded attribute preparation with the
helpers. Note that the only failure the internal call to nlattr_add
could result in the nested helper would be -EMSGSIZE, hence that is what
we return to our caller.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/lib/bpf/netlink.c | 37 ++++++++++++++-----------------
 tools/lib/bpf/nlattr.h  | 48 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+), 21 deletions(-)

diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index d2cb28e9ef52..c79e30484e81 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -135,7 +135,7 @@ static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
 					 __u32 flags)
 {
 	int sock, seq = 0, ret;
-	struct nlattr *nla, *nla_xdp;
+	struct nlattr *nla;
 	struct {
 		struct nlmsghdr  nh;
 		struct ifinfomsg ifinfo;
@@ -157,36 +157,31 @@ static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
 	req.ifinfo.ifi_index = ifindex;
 
 	/* started nested attribute for XDP */
-	nla = (struct nlattr *)(((char *)&req)
-				+ NLMSG_ALIGN(req.nh.nlmsg_len));
-	nla->nla_type = NLA_F_NESTED | IFLA_XDP;
-	nla->nla_len = NLA_HDRLEN;
+	nla = nlattr_begin_nested(&req.nh, sizeof(req), IFLA_XDP);
+	if (!nla) {
+		ret = -EMSGSIZE;
+		goto cleanup;
+	}
 
 	/* add XDP fd */
-	nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
-	nla_xdp->nla_type = IFLA_XDP_FD;
-	nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
-	memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
-	nla->nla_len += nla_xdp->nla_len;
+	ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_FD, &fd, sizeof(fd));
+	if (ret < 0)
+		goto cleanup;
 
 	/* if user passed in any flags, add those too */
 	if (flags) {
-		nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
-		nla_xdp->nla_type = IFLA_XDP_FLAGS;
-		nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
-		memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
-		nla->nla_len += nla_xdp->nla_len;
+		ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_FLAGS, &flags, sizeof(flags));
+		if (ret < 0)
+			goto cleanup;
 	}
 
 	if (flags & XDP_FLAGS_REPLACE) {
-		nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
-		nla_xdp->nla_type = IFLA_XDP_EXPECTED_FD;
-		nla_xdp->nla_len = NLA_HDRLEN + sizeof(old_fd);
-		memcpy((char *)nla_xdp + NLA_HDRLEN, &old_fd, sizeof(old_fd));
-		nla->nla_len += nla_xdp->nla_len;
+		ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_EXPECTED_FD, &flags, sizeof(flags));
+		if (ret < 0)
+			goto cleanup;
 	}
 
-	req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
+	nlattr_end_nested(&req.nh, nla);
 
 	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
 		ret = -errno;
diff --git a/tools/lib/bpf/nlattr.h b/tools/lib/bpf/nlattr.h
index 6cc3ac91690f..1c94cdb6e89d 100644
--- a/tools/lib/bpf/nlattr.h
+++ b/tools/lib/bpf/nlattr.h
@@ -10,7 +10,10 @@
 #define __LIBBPF_NLATTR_H
 
 #include <stdint.h>
+#include <string.h>
+#include <errno.h>
 #include <linux/netlink.h>
+
 /* avoid multiple definition of netlink features */
 #define __LINUX_NETLINK_H
 
@@ -103,4 +106,49 @@ int libbpf_nla_parse_nested(struct nlattr *tb[], int maxtype,
 
 int libbpf_nla_dump_errormsg(struct nlmsghdr *nlh);
 
+static inline struct nlattr *nla_data(struct nlattr *nla)
+{
+	return (struct nlattr *)((char *)nla + NLA_HDRLEN);
+}
+
+static inline struct nlattr *nh_tail(struct nlmsghdr *nh)
+{
+	return (struct nlattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
+}
+
+static inline int nlattr_add(struct nlmsghdr *nh, size_t maxsz, int type,
+			     const void *data, int len)
+{
+	struct nlattr *nla;
+
+	if (NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(NLA_HDRLEN + len) > maxsz)
+		return -EMSGSIZE;
+	if ((!data && len) || (data && !len))
+		return -EINVAL;
+
+	nla = nh_tail(nh);
+	nla->nla_type = type;
+	nla->nla_len = NLA_HDRLEN + len;
+	if (data)
+		memcpy(nla_data(nla), data, len);
+	nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(nla->nla_len);
+	return 0;
+}
+
+static inline struct nlattr *nlattr_begin_nested(struct nlmsghdr *nh,
+						 size_t maxsz, int type)
+{
+	struct nlattr *tail;
+
+	tail = nh_tail(nh);
+	if (nlattr_add(nh, maxsz, type | NLA_F_NESTED, NULL, 0))
+		return NULL;
+	return tail;
+}
+
+static inline void nlattr_end_nested(struct nlmsghdr *nh, struct nlattr *tail)
+{
+	tail->nla_len = (char *)nh_tail(nh) - (char *)tail;
+}
+
 #endif /* __LIBBPF_NLATTR_H */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH bpf-next v2 3/4] libbpf: add low level TC-BPF API
  2021-04-19 12:18 [PATCH bpf-next v2 0/4] Add TC-BPF API Kumar Kartikeya Dwivedi
  2021-04-19 12:18 ` [PATCH bpf-next v2 1/4] tools: pkt_cls.h: sync with kernel sources Kumar Kartikeya Dwivedi
  2021-04-19 12:18 ` [PATCH bpf-next v2 2/4] libbpf: add helpers for preparing netlink attributes Kumar Kartikeya Dwivedi
@ 2021-04-19 12:18 ` Kumar Kartikeya Dwivedi
  2021-04-19 21:00   ` Daniel Borkmann
  2021-04-19 12:18 ` [PATCH bpf-next v2 4/4] libbpf: add selftests for " Kumar Kartikeya Dwivedi
  3 siblings, 1 reply; 10+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-19 12:18 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Jesper Dangaard Brouer, linux-kernel,
	netdev

This adds functions that wrap the netlink API used for adding,
manipulating, and removing traffic control filters. These functions
operate directly on the loaded prog's fd, and return a handle to the
filter using an out parameter named id.

The basic featureset is covered to allow for attaching, manipulation of
properties, and removal of filters. Some additional features like
TCA_BPF_POLICE and TCA_RATE for tc_cls have been omitted. These can
added on top later by extending the bpf_tc_cls_opts struct.

Support for binding actions directly to a classifier by passing them in
during filter creation has also been omitted for now. These actions have
an auto clean up property because their lifetime is bound to the filter
they are attached to. This can be added later, but was omitted for now
as direct action mode is a better alternative to it, which is enabled by
default.

An API summary:

bpf_tc_act_{attach, change, replace} may be used to attach, change, and
replace SCHED_CLS bpf classifier. The protocol field can be set as 0, in
which case it is subsitituted as ETH_P_ALL by default.

The behavior of the three functions is as follows:

attach = create filter if it does not exist, fail otherwise
change = change properties of the classifier of existing filter
replace = create filter, and replace any existing filter

bpf_tc_cls_detach may be used to detach existing SCHED_CLS
filter. The bpf_tc_cls_attach_id object filled in during attach,
change, or replace must be passed in to the detach functions for them to
remove the filter and its attached classififer correctly.

bpf_tc_cls_get_info is a helper that can be used to obtain attributes
for the filter and classififer. The opts structure may be used to
choose the granularity of search, such that info for a specific filter
corresponding to the same loaded bpf program can be obtained. By
default, the first match is returned to the user.

Examples:

	struct bpf_tc_cls_attach_id id = {};
	struct bpf_object *obj;
	struct bpf_program *p;
	int fd, r;

	obj = bpf_object_open("foo.o");
	if (IS_ERR_OR_NULL(obj))
		return PTR_ERR(obj);

	p = bpf_object__find_program_by_title(obj, "classifier");
	if (IS_ERR_OR_NULL(p))
		return PTR_ERR(p);

	if (bpf_object__load(obj) < 0)
		return -1;

	fd = bpf_program__fd(p);

	r = bpf_tc_cls_attach(fd, if_nametoindex("lo"),
			      BPF_TC_CLSACT_INGRESS,
			      NULL, &id);
	if (r < 0)
		return r;

... which is roughly equivalent to (after clsact qdisc setup):
  # tc filter add dev lo ingress bpf obj foo.o sec classifier da

... as direct action mode is always enabled.

If a user wishes to modify existing options on an attached classifier,
bpf_tc_cls_change API may be used.

Only parameters class_id can be modified, the rest are filled in to
identify the correct filter. protocol can be left out if it was not
chosen explicitly (defaulting to ETH_P_ALL).

Example:

	/* Optional parameters necessary to select the right filter */
	DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts,
			    .handle = id.handle,
			    .priority = id.priority,
			    .chain_index = id.chain_index)
	opts.class_id = TC_H_MAKE(1UL << 16, 12);
	r = bpf_tc_cls_change(fd, if_nametoindex("lo"),
			      BPF_TC_CLSACT_INGRESS,
			      &opts, &id);
	if (r < 0)
		return r;

	struct bpf_tc_cls_info info = {};
	r = bpf_tc_cls_get_info(fd, if_nametoindex("lo"),
			        BPF_TC_CLSACT_INGRESS,
				&opts, &info);
	if (r < 0)
		return r;

	assert(info.class_id == TC_H_MAKE(1UL << 16, 12));

This would be roughly equivalent to doing:
  # tc filter change dev lo egress prio <p> handle <h> bpf obj foo.o sec \
    classifier classid 1:12

... except a new bpf program will be loaded and replace existing one.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/lib/bpf/libbpf.h   |  52 ++++++
 tools/lib/bpf/libbpf.map |   5 +
 tools/lib/bpf/netlink.c  | 377 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 428 insertions(+), 6 deletions(-)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index bec4e6a6e31d..2f4a2036cb74 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -16,6 +16,9 @@
 #include <stdbool.h>
 #include <sys/types.h>  // for size_t
 #include <linux/bpf.h>
+#include <linux/pkt_cls.h>
+#include <linux/pkt_sched.h>
+#include <linux/tc_act/tc_bpf.h>
 
 #include "libbpf_common.h"
 
@@ -775,6 +778,55 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
 LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
 LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
 
+/* Convenience macros for the clsact attach hooks */
+#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
+#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
+
+struct bpf_tc_cls_opts {
+	size_t sz;
+	__u32 protocol;
+	__u32 handle;
+	__u32 chain_index;
+	__u32 priority;
+	__u32 class_id;
+	size_t :0;
+};
+
+#define bpf_tc_cls_opts__last_field class_id
+
+/* Acts as a handle for an attached filter */
+struct bpf_tc_cls_attach_id {
+	__u32 protocol;
+	__u32 chain_index;
+	__u32 handle;
+	__u32 priority;
+};
+
+struct bpf_tc_cls_info {
+	struct bpf_tc_cls_attach_id id;
+	__u32 prog_id;
+	__u8 tag[BPF_TAG_SIZE];
+	__u32 class_id;
+	__u32 bpf_flags;
+	__u32 bpf_flags_gen;
+};
+
+/* id is out parameter that will be written to, it must not be NULL */
+LIBBPF_API int bpf_tc_cls_attach(int fd, __u32 ifindex, __u32 parent_id,
+				 const struct bpf_tc_cls_opts *opts,
+				 struct bpf_tc_cls_attach_id *id);
+LIBBPF_API int bpf_tc_cls_change(int fd, __u32 ifindex, __u32 parent_id,
+				 const struct bpf_tc_cls_opts *opts,
+				 struct bpf_tc_cls_attach_id *id);
+LIBBPF_API int bpf_tc_cls_replace(int fd, __u32 ifindex, __u32 parent_id,
+				  const struct bpf_tc_cls_opts *opts,
+				  struct bpf_tc_cls_attach_id *id);
+LIBBPF_API int bpf_tc_cls_detach(__u32 ifindex, __u32 parent_id,
+				 const struct bpf_tc_cls_attach_id *id);
+LIBBPF_API int bpf_tc_cls_get_info(int fd, __u32 ifindex, __u32 parent_id,
+				   const struct bpf_tc_cls_opts *opts,
+				   struct bpf_tc_cls_info *info);
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index b9b29baf1df8..52e5de1e82ea 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -361,4 +361,9 @@ LIBBPF_0.4.0 {
 		bpf_linker__new;
 		bpf_map__inner_map;
 		bpf_object__set_kversion;
+		bpf_tc_cls_attach;
+		bpf_tc_cls_change;
+		bpf_tc_cls_detach;
+		bpf_tc_cls_replace;
+		bpf_tc_cls_get_info;
 } LIBBPF_0.3.0;
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index c79e30484e81..93cc1e027065 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -4,7 +4,11 @@
 #include <stdlib.h>
 #include <memory.h>
 #include <unistd.h>
+#include <inttypes.h>
+#include <arpa/inet.h>
 #include <linux/bpf.h>
+#include <linux/pkt_cls.h>
+#include <linux/if_ether.h>
 #include <linux/rtnetlink.h>
 #include <sys/socket.h>
 #include <errno.h>
@@ -131,6 +135,41 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
 	return ret;
 }
 
+static int tc_setup_clsact_excl(int sock, __u32 nl_pid, __u32 ifindex)
+{
+	int seq = 0, ret = 0;
+	struct {
+		struct nlmsghdr nh;
+		struct tcmsg t;
+		char buf[256];
+	} req;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
+	req.nh.nlmsg_flags =
+		NLM_F_REQUEST | NLM_F_CREATE | NLM_F_ACK | NLM_F_EXCL;
+	req.nh.nlmsg_type = RTM_NEWQDISC;
+	req.nh.nlmsg_pid = 0;
+	req.nh.nlmsg_seq = ++seq;
+	req.t.tcm_family = AF_UNSPEC;
+	req.t.tcm_ifindex = ifindex;
+	req.t.tcm_parent = TC_H_CLSACT;
+	req.t.tcm_handle = TC_H_MAKE(TC_H_CLSACT, 0);
+
+	ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "clsact",
+			 sizeof("clsact"));
+	if (ret < 0)
+		return ret;
+
+	ret = send(sock, &req.nh, req.nh.nlmsg_len, 0);
+	if (ret < 0)
+		return ret;
+
+	ret = bpf_netlink_recv(sock, nl_pid, seq, NULL, NULL, NULL);
+
+	return ret;
+}
+
 static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
 					 __u32 flags)
 {
@@ -344,6 +383,20 @@ int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags)
 	return ret;
 }
 
+static int bpf_nl_get_ext(struct nlmsghdr *nh, int sock, unsigned int nl_pid,
+			  __dump_nlmsg_t dump_link_nlmsg_p,
+			  libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
+{
+	int seq = time(NULL);
+
+	nh->nlmsg_seq = seq;
+	if (send(sock, nh, nh->nlmsg_len, 0) < 0)
+		return -errno;
+
+	return bpf_netlink_recv(sock, nl_pid, seq, dump_link_nlmsg_p,
+				dump_link_nlmsg, cookie);
+}
+
 int libbpf_nl_get_link(int sock, unsigned int nl_pid,
 		       libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
 {
@@ -356,12 +409,324 @@ int libbpf_nl_get_link(int sock, unsigned int nl_pid,
 		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
 		.ifm.ifi_family = AF_PACKET,
 	};
-	int seq = time(NULL);
 
-	req.nlh.nlmsg_seq = seq;
-	if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0)
-		return -errno;
+	return bpf_nl_get_ext(&req.nlh, sock, nl_pid, __dump_link_nlmsg,
+			      dump_link_nlmsg, cookie);
+}
 
-	return bpf_netlink_recv(sock, nl_pid, seq, __dump_link_nlmsg,
-				dump_link_nlmsg, cookie);
+static int tc_bpf_add_fd_and_name(struct nlmsghdr *nh, size_t maxsz, int fd)
+{
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
+	char name[256] = {};
+	int len, ret;
+
+	ret = bpf_obj_get_info_by_fd(fd, &info, &info_len);
+	if (ret < 0)
+		return ret;
+
+	ret = nlattr_add(nh, maxsz, TCA_BPF_FD, &fd, sizeof(fd));
+	if (ret < 0)
+		return ret;
+
+	len = snprintf(name, sizeof(name), "%s:[%" PRIu32 "]", info.name,
+		       info.id);
+	if (len < 0 || len >= sizeof(name))
+		return len < 0 ? -EINVAL : -ENAMETOOLONG;
+
+	return nlattr_add(nh, maxsz, TCA_BPF_NAME, name, len + 1);
+}
+
+struct pass_info {
+	struct bpf_tc_cls_info *info;
+	__u32 prog_id;
+};
+
+static int cls_get_info(struct nlmsghdr *nh, libbpf_dump_nlmsg_t fn,
+			void *cookie);
+
+static int tc_cls_bpf_modify(int fd, int cmd, unsigned int flags, __u32 ifindex,
+			     __u32 parent_id, const struct bpf_tc_cls_opts *opts,
+			     __dump_nlmsg_t fn, struct bpf_tc_cls_attach_id *id)
+{
+	struct bpf_tc_cls_info info = {};
+	unsigned int bpf_flags = 0;
+	__u32 nl_pid = 0, protocol;
+	int sock, seq = 0, ret;
+	struct nlattr *nla;
+	struct {
+		struct nlmsghdr nh;
+		struct tcmsg t;
+		char buf[256];
+	} req;
+
+	if (OPTS_GET(opts, priority, 0) > 0xFFFF)
+		return -EINVAL;
+
+	sock = libbpf_netlink_open(&nl_pid);
+	if (sock < 0)
+		return sock;
+
+	if ((parent_id == BPF_TC_CLSACT_INGRESS ||
+	     parent_id == BPF_TC_CLSACT_EGRESS) &&
+	    flags & NLM_F_CREATE) {
+		ret = tc_setup_clsact_excl(sock, nl_pid, ifindex);
+		/* attachment can still fail if ingress qdisc is installed, and
+		 * we're trying attach on egress as parent */
+		if (ret < 0 && ret != -EEXIST)
+			goto end;
+	}
+
+	protocol = OPTS_GET(opts, protocol, 0) ?: ETH_P_ALL;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
+	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | flags;
+	req.nh.nlmsg_type = cmd;
+	req.nh.nlmsg_pid = 0;
+	req.nh.nlmsg_seq = ++seq;
+	req.t.tcm_family = AF_UNSPEC;
+	req.t.tcm_handle = OPTS_GET(opts, handle, 0);
+	req.t.tcm_parent = parent_id;
+	req.t.tcm_ifindex = ifindex;
+	req.t.tcm_info =
+		TC_H_MAKE(OPTS_GET(opts, priority, 0UL) << 16, htons(protocol));
+
+	if (OPTS_HAS(opts, chain_index)) {
+		ret = nlattr_add(&req.nh, sizeof(req), TCA_CHAIN,
+				 &opts->chain_index, sizeof(opts->chain_index));
+		if (ret < 0)
+			goto end;
+	}
+
+	ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "bpf", sizeof("bpf"));
+	if (ret < 0)
+		goto end;
+
+	nla = nlattr_begin_nested(&req.nh, sizeof(req), TCA_OPTIONS);
+	if (!nla) {
+		ret = -EMSGSIZE;
+		goto end;
+	}
+
+	if (OPTS_GET(opts, class_id, TC_H_UNSPEC)) {
+		ret = nlattr_add(&req.nh, sizeof(req), TCA_BPF_CLASSID,
+				 &opts->class_id, sizeof(opts->class_id));
+		if (ret < 0)
+			goto end;
+	}
+
+	if (cmd != RTM_DELTFILTER) {
+		ret = tc_bpf_add_fd_and_name(&req.nh, sizeof(req), fd);
+		if (ret < 0)
+			goto end;
+
+		/* direct action is always set */
+		bpf_flags |= TCA_BPF_FLAG_ACT_DIRECT;
+		ret = nlattr_add(&req.nh, sizeof(req), TCA_BPF_FLAGS,
+				 &bpf_flags, sizeof(bpf_flags));
+		if (ret < 0)
+			goto end;
+	}
+
+	nlattr_end_nested(&req.nh, nla);
+
+	ret = send(sock, &req.nh, req.nh.nlmsg_len, 0);
+	if (ret < 0)
+		goto end;
+
+	ret = bpf_netlink_recv(sock, nl_pid, seq, fn, NULL,
+			       &(struct pass_info){ &info, 0 });
+
+	if (fn)
+		*id = info.id;
+
+end:
+	close(sock);
+	return ret;
+}
+
+int bpf_tc_cls_attach(int fd, __u32 ifindex, __u32 parent_id,
+		      const struct bpf_tc_cls_opts *opts,
+		      struct bpf_tc_cls_attach_id *id)
+{
+	if (fd < 0 || !OPTS_VALID(opts, bpf_tc_cls_opts) || !id)
+		return -EINVAL;
+
+	return tc_cls_bpf_modify(fd, RTM_NEWTFILTER,
+				 NLM_F_ECHO | NLM_F_EXCL | NLM_F_CREATE,
+				 ifindex, parent_id, opts, cls_get_info, id);
+}
+
+int bpf_tc_cls_change(int fd, __u32 ifindex, __u32 parent_id,
+		      const struct bpf_tc_cls_opts *opts,
+		      struct bpf_tc_cls_attach_id *id)
+{
+	if (fd < 0 || !OPTS_VALID(opts, bpf_tc_cls_opts) || !id)
+		return -EINVAL;
+
+	return tc_cls_bpf_modify(fd, RTM_NEWTFILTER, NLM_F_ECHO, ifindex,
+				 parent_id, opts, cls_get_info, id);
+}
+
+int bpf_tc_cls_replace(int fd, __u32 ifindex, __u32 parent_id,
+		       const struct bpf_tc_cls_opts *opts,
+		       struct bpf_tc_cls_attach_id *id)
+{
+	if (fd < 0 || !OPTS_VALID(opts, bpf_tc_cls_opts) || !id)
+		return -EINVAL;
+
+	return tc_cls_bpf_modify(fd, RTM_NEWTFILTER, NLM_F_ECHO | NLM_F_CREATE,
+				 ifindex, parent_id, opts, cls_get_info, id);
+}
+
+int bpf_tc_cls_detach(__u32 ifindex, __u32 parent_id,
+		      const struct bpf_tc_cls_attach_id *id)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts, 0);
+
+	if (!id)
+		return -EINVAL;
+
+	opts.protocol = id->protocol;
+	opts.chain_index = id->chain_index;
+	opts.handle = id->handle;
+	opts.priority = id->priority;
+
+	return tc_cls_bpf_modify(-1, RTM_DELTFILTER, 0, ifindex, parent_id,
+				 &opts, NULL, NULL);
+}
+
+static int __cls_get_info(void *cookie, void *msg, struct nlattr **tb)
+{
+	struct nlattr *tbb[TCA_BPF_MAX + 1];
+	struct pass_info *cinfo = cookie;
+	struct bpf_tc_cls_info *info;
+	struct tcmsg *t = msg;
+	__u32 prog_id;
+
+	info = cinfo->info;
+
+	if (!tb[TCA_OPTIONS])
+		return 0;
+
+	libbpf_nla_parse_nested(tbb, TCA_BPF_MAX, tb[TCA_OPTIONS], NULL);
+	if (!tbb[TCA_BPF_ID])
+		return 0;
+
+	prog_id = libbpf_nla_getattr_u32(tbb[TCA_BPF_ID]);
+	if (cinfo->prog_id && cinfo->prog_id != prog_id)
+		return 0;
+
+	info->id.protocol = ntohs(TC_H_MIN(t->tcm_info));
+	info->id.priority = TC_H_MAJ(t->tcm_info) >> 16;
+	info->id.handle = t->tcm_handle;
+
+	if (tb[TCA_CHAIN])
+		info->id.chain_index = libbpf_nla_getattr_u32(tb[TCA_CHAIN]);
+	else
+		info->id.chain_index = 0;
+
+	if (tbb[TCA_BPF_FLAGS])
+		info->bpf_flags = libbpf_nla_getattr_u32(tbb[TCA_BPF_FLAGS]);
+
+	if (tbb[TCA_BPF_FLAGS_GEN])
+		info->bpf_flags_gen =
+			libbpf_nla_getattr_u32(tbb[TCA_BPF_FLAGS_GEN]);
+
+	if (tbb[TCA_BPF_ID])
+		info->prog_id = libbpf_nla_getattr_u32(tbb[TCA_BPF_ID]);
+
+	if (tbb[TCA_BPF_TAG])
+		memcpy(info->tag, libbpf_nla_getattr_str(tbb[TCA_BPF_TAG]),
+		       sizeof(info->tag));
+
+	if (tbb[TCA_BPF_CLASSID])
+		info->class_id = libbpf_nla_getattr_u32(tbb[TCA_BPF_CLASSID]);
+
+	return 1;
+}
+
+static int cls_get_info(struct nlmsghdr *nh, libbpf_dump_nlmsg_t fn,
+			void *cookie)
+{
+	struct tcmsg *t = NLMSG_DATA(nh);
+	struct nlattr *tb[TCA_MAX + 1];
+
+	libbpf_nla_parse(tb, TCA_MAX,
+			 (struct nlattr *)((char *)t + NLMSG_ALIGN(sizeof(*t))),
+			 NLMSG_PAYLOAD(nh, sizeof(*t)), NULL);
+	if (!tb[TCA_KIND])
+		return -EINVAL;
+
+	return __cls_get_info(cookie, t, tb);
+}
+
+static int tc_cls_get_info(int fd, __u32 ifindex, __u32 parent_id,
+			   const struct bpf_tc_cls_opts *opts,
+			   struct bpf_tc_cls_info *info)
+{
+	__u32 nl_pid = 0, protocol, info_len = sizeof(struct bpf_prog_info);
+	struct bpf_prog_info prog_info = {};
+	int sock, ret;
+	struct {
+		struct nlmsghdr nh;
+		struct tcmsg t;
+		char buf[256];
+	} req = {
+		.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
+		.nh.nlmsg_type = RTM_GETTFILTER,
+		.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
+		.t.tcm_family = AF_UNSPEC,
+	};
+
+
+	if (!OPTS_VALID(opts, bpf_tc_cls_opts))
+		return -EINVAL;
+
+	protocol = OPTS_GET(opts, protocol, 0) ?: ETH_P_ALL;
+
+	req.t.tcm_parent = parent_id;
+	req.t.tcm_ifindex = ifindex;
+	req.t.tcm_handle = OPTS_GET(opts, handle, 0);
+	req.t.tcm_info =
+		TC_H_MAKE(OPTS_GET(opts, priority, 0UL) << 16, htons(protocol));
+
+	ret = bpf_obj_get_info_by_fd(fd, &prog_info, &info_len);
+	if (ret < 0)
+		return ret;
+
+	sock = libbpf_netlink_open(&nl_pid);
+	if (sock < 0)
+		return sock;
+
+	ret = nlattr_add(&req.nh, sizeof(req), TCA_KIND, "bpf", sizeof("bpf"));
+	if (ret < 0)
+		goto end;
+
+	if (OPTS_HAS(opts, chain_index)) {
+		ret = nlattr_add(&req.nh, sizeof(req), TCA_CHAIN,
+				 &opts->chain_index, sizeof(opts->chain_index));
+		if (ret < 0)
+			goto end;
+	}
+
+	req.nh.nlmsg_seq = time(NULL);
+
+	ret = bpf_nl_get_ext(&req.nh, sock, nl_pid, cls_get_info, NULL,
+			     &(struct pass_info){ info, prog_info.id });
+	if (ret < 0)
+		goto end;
+	/* 1 denotes a match */
+	ret = ret == 1 ? 0 : -ESRCH;
+end:
+	close(sock);
+	return ret;
+}
+
+int bpf_tc_cls_get_info(int fd, __u32 ifindex, __u32 parent_id,
+			const struct bpf_tc_cls_opts *opts,
+			struct bpf_tc_cls_info *info)
+{
+	return tc_cls_get_info(fd, ifindex, parent_id, opts, info);
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH bpf-next v2 4/4] libbpf: add selftests for TC-BPF API
  2021-04-19 12:18 [PATCH bpf-next v2 0/4] Add TC-BPF API Kumar Kartikeya Dwivedi
                   ` (2 preceding siblings ...)
  2021-04-19 12:18 ` [PATCH bpf-next v2 3/4] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
@ 2021-04-19 12:18 ` Kumar Kartikeya Dwivedi
  2021-04-20  4:35   ` Andrii Nakryiko
  3 siblings, 1 reply; 10+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-19 12:18 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Jesper Dangaard Brouer, linux-kernel,
	netdev

This adds some basic tests for the low level bpf_tc_cls_* API.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/prog_tests/test_tc_bpf.c    | 112 ++++++++++++++++++
 .../selftests/bpf/progs/test_tc_bpf_kern.c    |  12 ++
 2 files changed, 124 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c

diff --git a/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
new file mode 100644
index 000000000000..945f3a1a72f8
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <linux/err.h>
+#include <linux/limits.h>
+#include <bpf/libbpf.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <test_progs.h>
+#include <linux/if_ether.h>
+
+#define LO_IFINDEX 1
+
+static int test_tc_cls_internal(int fd, __u32 parent_id)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts, .handle = 1, .priority = 10,
+			    .class_id = TC_H_MAKE(1UL << 16, 1),
+			    .chain_index = 5);
+	struct bpf_tc_cls_attach_id id = {};
+	struct bpf_tc_cls_info info = {};
+	int ret;
+
+	ret = bpf_tc_cls_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
+	if (CHECK_FAIL(ret < 0))
+		return ret;
+
+	ret = bpf_tc_cls_get_info(fd, LO_IFINDEX, parent_id, NULL, &info);
+	if (CHECK_FAIL(ret < 0))
+		goto end;
+
+	ret = -1;
+
+	if (CHECK_FAIL(info.id.handle != id.handle) ||
+	    CHECK_FAIL(info.id.chain_index != id.chain_index) ||
+	    CHECK_FAIL(info.id.priority != id.priority) ||
+	    CHECK_FAIL(info.id.handle != 1) ||
+	    CHECK_FAIL(info.id.priority != 10) ||
+	    CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 1)) ||
+	    CHECK_FAIL(info.id.chain_index != 5))
+		goto end;
+
+	ret = bpf_tc_cls_replace(fd, LO_IFINDEX, parent_id, &opts, &id);
+	if (CHECK_FAIL(ret < 0))
+		return ret;
+
+	if (CHECK_FAIL(info.id.handle != 1) ||
+	    CHECK_FAIL(info.id.priority != 10) ||
+	    CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 1)))
+		goto end;
+
+	/* Demonstrate changing attributes */
+	opts.class_id = TC_H_MAKE(1UL << 16, 2);
+
+	ret = bpf_tc_cls_change(fd, LO_IFINDEX, parent_id, &opts, &info.id);
+	if (CHECK_FAIL(ret < 0))
+		goto end;
+
+	ret = bpf_tc_cls_get_info(fd, LO_IFINDEX, parent_id, NULL, &info);
+	if (CHECK_FAIL(ret < 0))
+		goto end;
+
+	if (CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 2)))
+		goto end;
+	if (CHECK_FAIL((info.bpf_flags & TCA_BPF_FLAG_ACT_DIRECT) != 1))
+		goto end;
+
+end:
+	ret = bpf_tc_cls_detach(LO_IFINDEX, parent_id, &id);
+	CHECK_FAIL(ret < 0);
+	return ret;
+}
+
+void test_test_tc_bpf(void)
+{
+	const char *file = "./test_tc_bpf_kern.o";
+	struct bpf_program *clsp;
+	struct bpf_object *obj;
+	int cls_fd, ret;
+
+	obj = bpf_object__open(file);
+	if (CHECK_FAIL(IS_ERR_OR_NULL(obj)))
+		return;
+
+	clsp = bpf_object__find_program_by_title(obj, "classifier");
+	if (CHECK_FAIL(IS_ERR_OR_NULL(clsp)))
+		goto end;
+
+	ret = bpf_object__load(obj);
+	if (CHECK_FAIL(ret < 0))
+		goto end;
+
+	cls_fd = bpf_program__fd(clsp);
+
+	system("tc qdisc del dev lo clsact");
+
+	ret = test_tc_cls_internal(cls_fd, BPF_TC_CLSACT_INGRESS);
+	if (CHECK_FAIL(ret < 0))
+		goto end;
+
+	if (CHECK_FAIL(system("tc qdisc del dev lo clsact")))
+		goto end;
+
+	ret = test_tc_cls_internal(cls_fd, BPF_TC_CLSACT_EGRESS);
+	if (CHECK_FAIL(ret < 0))
+		goto end;
+
+	CHECK_FAIL(system("tc qdisc del dev lo clsact"));
+
+end:
+	bpf_object__close(obj);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
new file mode 100644
index 000000000000..3dd40e21af8e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+// Dummy prog to test TC-BPF API
+
+SEC("classifier")
+int cls(struct __sk_buff *skb)
+{
+	return 0;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v2 3/4] libbpf: add low level TC-BPF API
  2021-04-19 12:18 ` [PATCH bpf-next v2 3/4] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
@ 2021-04-19 21:00   ` Daniel Borkmann
  2021-04-19 21:43     ` Toke Høiland-Jørgensen
  2021-04-19 21:45     ` Kumar Kartikeya Dwivedi
  0 siblings, 2 replies; 10+ messages in thread
From: Daniel Borkmann @ 2021-04-19 21:00 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Jesper Dangaard Brouer, linux-kernel,
	netdev

On 4/19/21 2:18 PM, Kumar Kartikeya Dwivedi wrote:
> This adds functions that wrap the netlink API used for adding,
> manipulating, and removing traffic control filters. These functions
> operate directly on the loaded prog's fd, and return a handle to the
> filter using an out parameter named id.
> 
> The basic featureset is covered to allow for attaching, manipulation of
> properties, and removal of filters. Some additional features like
> TCA_BPF_POLICE and TCA_RATE for tc_cls have been omitted. These can
> added on top later by extending the bpf_tc_cls_opts struct.
> 
> Support for binding actions directly to a classifier by passing them in
> during filter creation has also been omitted for now. These actions have
> an auto clean up property because their lifetime is bound to the filter
> they are attached to. This can be added later, but was omitted for now
> as direct action mode is a better alternative to it, which is enabled by
> default.
> 
> An API summary:
> 
> bpf_tc_act_{attach, change, replace} may be used to attach, change, and

typo on bpf_tc_act_{...} ?
                ^^^
> replace SCHED_CLS bpf classifier. The protocol field can be set as 0, in
> which case it is subsitituted as ETH_P_ALL by default.

Do you have an actual user that needs anything other than ETH_P_ALL? Why is it
even needed? Why not stick to just ETH_P_ALL?

> The behavior of the three functions is as follows:
> 
> attach = create filter if it does not exist, fail otherwise
> change = change properties of the classifier of existing filter
> replace = create filter, and replace any existing filter

This touches on tc oddities quite a bit. Why do we need to expose them? Can't we
simplify/abstract this e.g. i) create or update instance, ii) delete instance,
iii) get instance ? What concrete use case do you have that you need those three
above?

> bpf_tc_cls_detach may be used to detach existing SCHED_CLS
> filter. The bpf_tc_cls_attach_id object filled in during attach,
> change, or replace must be passed in to the detach functions for them to
> remove the filter and its attached classififer correctly.
> 
> bpf_tc_cls_get_info is a helper that can be used to obtain attributes
> for the filter and classififer. The opts structure may be used to
> choose the granularity of search, such that info for a specific filter
> corresponding to the same loaded bpf program can be obtained. By
> default, the first match is returned to the user.
> 
> Examples:
> 
> 	struct bpf_tc_cls_attach_id id = {};
> 	struct bpf_object *obj;
> 	struct bpf_program *p;
> 	int fd, r;
> 
> 	obj = bpf_object_open("foo.o");
> 	if (IS_ERR_OR_NULL(obj))
> 		return PTR_ERR(obj);
> 
> 	p = bpf_object__find_program_by_title(obj, "classifier");
> 	if (IS_ERR_OR_NULL(p))
> 		return PTR_ERR(p);
> 
> 	if (bpf_object__load(obj) < 0)
> 		return -1;
> 
> 	fd = bpf_program__fd(p);
> 
> 	r = bpf_tc_cls_attach(fd, if_nametoindex("lo"),
> 			      BPF_TC_CLSACT_INGRESS,
> 			      NULL, &id);
> 	if (r < 0)
> 		return r;
> 
> ... which is roughly equivalent to (after clsact qdisc setup):
>    # tc filter add dev lo ingress bpf obj foo.o sec classifier da
> 
> ... as direct action mode is always enabled.
> 
> If a user wishes to modify existing options on an attached classifier,
> bpf_tc_cls_change API may be used.
> 
> Only parameters class_id can be modified, the rest are filled in to
> identify the correct filter. protocol can be left out if it was not
> chosen explicitly (defaulting to ETH_P_ALL).
> 
> Example:
> 
> 	/* Optional parameters necessary to select the right filter */
> 	DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts,
> 			    .handle = id.handle,
> 			    .priority = id.priority,
> 			    .chain_index = id.chain_index)

Why do we need chain_index as part of the basic API?

> 	opts.class_id = TC_H_MAKE(1UL << 16, 12);
> 	r = bpf_tc_cls_change(fd, if_nametoindex("lo"),
> 			      BPF_TC_CLSACT_INGRESS,
> 			      &opts, &id);

Also, I'm not sure whether the prefix should even be named  bpf_tc_cls_*() tbh,
yes, despite being "low level" api. I think in the context of bpf we should stop
regarding this as 'classifier' and 'action' objects since it's really just a
single entity and not separate ones. It's weird enough to explain this concept
to new users and if a libbpf based api could cleanly abstract it, I would be all
for it. I don't think we need to map 1:1 the old tc legacy even in the low level
api, tbh, as it feels backwards. I think the 'handle' & 'priority' bits are okay,
but I would remove the others.

> 	if (r < 0)
> 		return r;
> 
> 	struct bpf_tc_cls_info info = {};
> 	r = bpf_tc_cls_get_info(fd, if_nametoindex("lo"),
> 			        BPF_TC_CLSACT_INGRESS,
> 				&opts, &info);
> 	if (r < 0)
> 		return r;
> 
> 	assert(info.class_id == TC_H_MAKE(1UL << 16, 12));
> 
> This would be roughly equivalent to doing:
>    # tc filter change dev lo egress prio <p> handle <h> bpf obj foo.o sec \
>      classifier classid 1:12

Why even bother to support !da mode, what are you trying to solve with it? I
don't think official libbpf api should support something that doesn't scale.

> ... except a new bpf program will be loaded and replace existing one.
> 
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v2 3/4] libbpf: add low level TC-BPF API
  2021-04-19 21:00   ` Daniel Borkmann
@ 2021-04-19 21:43     ` Toke Høiland-Jørgensen
  2021-04-19 21:57       ` Daniel Borkmann
  2021-04-19 21:45     ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 10+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-04-19 21:43 UTC (permalink / raw)
  To: Daniel Borkmann, Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Jesper Dangaard Brouer,
	linux-kernel, netdev

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 4/19/21 2:18 PM, Kumar Kartikeya Dwivedi wrote:
>> This adds functions that wrap the netlink API used for adding,
>> manipulating, and removing traffic control filters. These functions
>> operate directly on the loaded prog's fd, and return a handle to the
>> filter using an out parameter named id.
>> 
>> The basic featureset is covered to allow for attaching, manipulation of
>> properties, and removal of filters. Some additional features like
>> TCA_BPF_POLICE and TCA_RATE for tc_cls have been omitted. These can
>> added on top later by extending the bpf_tc_cls_opts struct.
>> 
>> Support for binding actions directly to a classifier by passing them in
>> during filter creation has also been omitted for now. These actions have
>> an auto clean up property because their lifetime is bound to the filter
>> they are attached to. This can be added later, but was omitted for now
>> as direct action mode is a better alternative to it, which is enabled by
>> default.
>> 
>> An API summary:
>> 
>> bpf_tc_act_{attach, change, replace} may be used to attach, change, and
>
> typo on bpf_tc_act_{...} ?
>                 ^^^
>> replace SCHED_CLS bpf classifier. The protocol field can be set as 0, in
>> which case it is subsitituted as ETH_P_ALL by default.
>
> Do you have an actual user that needs anything other than ETH_P_ALL? Why is it
> even needed? Why not stick to just ETH_P_ALL?
>
>> The behavior of the three functions is as follows:
>> 
>> attach = create filter if it does not exist, fail otherwise
>> change = change properties of the classifier of existing filter
>> replace = create filter, and replace any existing filter
>
> This touches on tc oddities quite a bit. Why do we need to expose them? Can't we
> simplify/abstract this e.g. i) create or update instance, ii) delete instance,
> iii) get instance ? What concrete use case do you have that you need those three
> above?
>
>> bpf_tc_cls_detach may be used to detach existing SCHED_CLS
>> filter. The bpf_tc_cls_attach_id object filled in during attach,
>> change, or replace must be passed in to the detach functions for them to
>> remove the filter and its attached classififer correctly.
>> 
>> bpf_tc_cls_get_info is a helper that can be used to obtain attributes
>> for the filter and classififer. The opts structure may be used to
>> choose the granularity of search, such that info for a specific filter
>> corresponding to the same loaded bpf program can be obtained. By
>> default, the first match is returned to the user.
>> 
>> Examples:
>> 
>> 	struct bpf_tc_cls_attach_id id = {};
>> 	struct bpf_object *obj;
>> 	struct bpf_program *p;
>> 	int fd, r;
>> 
>> 	obj = bpf_object_open("foo.o");
>> 	if (IS_ERR_OR_NULL(obj))
>> 		return PTR_ERR(obj);
>> 
>> 	p = bpf_object__find_program_by_title(obj, "classifier");
>> 	if (IS_ERR_OR_NULL(p))
>> 		return PTR_ERR(p);
>> 
>> 	if (bpf_object__load(obj) < 0)
>> 		return -1;
>> 
>> 	fd = bpf_program__fd(p);
>> 
>> 	r = bpf_tc_cls_attach(fd, if_nametoindex("lo"),
>> 			      BPF_TC_CLSACT_INGRESS,
>> 			      NULL, &id);
>> 	if (r < 0)
>> 		return r;
>> 
>> ... which is roughly equivalent to (after clsact qdisc setup):
>>    # tc filter add dev lo ingress bpf obj foo.o sec classifier da
>> 
>> ... as direct action mode is always enabled.
>> 
>> If a user wishes to modify existing options on an attached classifier,
>> bpf_tc_cls_change API may be used.
>> 
>> Only parameters class_id can be modified, the rest are filled in to
>> identify the correct filter. protocol can be left out if it was not
>> chosen explicitly (defaulting to ETH_P_ALL).
>> 
>> Example:
>> 
>> 	/* Optional parameters necessary to select the right filter */
>> 	DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts,
>> 			    .handle = id.handle,
>> 			    .priority = id.priority,
>> 			    .chain_index = id.chain_index)
>
> Why do we need chain_index as part of the basic API?
>
>> 	opts.class_id = TC_H_MAKE(1UL << 16, 12);
>> 	r = bpf_tc_cls_change(fd, if_nametoindex("lo"),
>> 			      BPF_TC_CLSACT_INGRESS,
>> 			      &opts, &id);
>
> Also, I'm not sure whether the prefix should even be named  bpf_tc_cls_*() tbh,
> yes, despite being "low level" api. I think in the context of bpf we should stop
> regarding this as 'classifier' and 'action' objects since it's really just a
> single entity and not separate ones. It's weird enough to explain this concept
> to new users and if a libbpf based api could cleanly abstract it, I would be all
> for it. I don't think we need to map 1:1 the old tc legacy even in the low level
> api, tbh, as it feels backwards. I think the 'handle' & 'priority' bits are okay,
> but I would remove the others.

Hmm, I'm OK with dropping the TC oddities (including the cls_ in the
name), but I think we should be documenting it so that users that do
come from TC will not be completely lost :)

-Toke


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v2 3/4] libbpf: add low level TC-BPF API
  2021-04-19 21:00   ` Daniel Borkmann
  2021-04-19 21:43     ` Toke Høiland-Jørgensen
@ 2021-04-19 21:45     ` Kumar Kartikeya Dwivedi
  1 sibling, 0 replies; 10+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-04-19 21:45 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Jesper Dangaard Brouer, linux-kernel,
	netdev

On Tue, Apr 20, 2021 at 02:30:44AM IST, Daniel Borkmann wrote:
> On 4/19/21 2:18 PM, Kumar Kartikeya Dwivedi wrote:
> > This adds functions that wrap the netlink API used for adding,
> > manipulating, and removing traffic control filters. These functions
> > operate directly on the loaded prog's fd, and return a handle to the
> > filter using an out parameter named id.
> >
> > The basic featureset is covered to allow for attaching, manipulation of
> > properties, and removal of filters. Some additional features like
> > TCA_BPF_POLICE and TCA_RATE for tc_cls have been omitted. These can
> > added on top later by extending the bpf_tc_cls_opts struct.
> >
> > Support for binding actions directly to a classifier by passing them in
> > during filter creation has also been omitted for now. These actions have
> > an auto clean up property because their lifetime is bound to the filter
> > they are attached to. This can be added later, but was omitted for now
> > as direct action mode is a better alternative to it, which is enabled by
> > default.
> >
> > An API summary:
> >
> > bpf_tc_act_{attach, change, replace} may be used to attach, change, and
>
> typo on bpf_tc_act_{...} ?
>

Oops, yes. Should be bpf_tc_cls_...

> > replace SCHED_CLS bpf classifier. The protocol field can be set as 0, in
> > which case it is subsitituted as ETH_P_ALL by default.
>
> Do you have an actual user that needs anything other than ETH_P_ALL? Why is it
> even needed? Why not stick to just ETH_P_ALL?
>

Mostly because it was little to no effort to expose this. Though if you feel
strongly about it I can drop the protocol option, and just bake in ETH_P_ALL. It
can always be added later ofcourse, if the need arises in the future.

> > The behavior of the three functions is as follows:
> >
> > attach = create filter if it does not exist, fail otherwise
> > change = change properties of the classifier of existing filter
> > replace = create filter, and replace any existing filter
>
> This touches on tc oddities quite a bit. Why do we need to expose them? Can't we
> simplify/abstract this e.g. i) create or update instance, ii) delete instance,
> iii) get instance ? What concrete use case do you have that you need those three
> above?
>

'change' is relevant for modifying classifier specific options, and given it's
a lot less useful now as per the current state of this patch, I am fine with
removing it. This is also where the distinction becomes visible to the user, so
removing it should hide the filter/classifier separation.

> > bpf_tc_cls_detach may be used to detach existing SCHED_CLS
> > filter. The bpf_tc_cls_attach_id object filled in during attach,
> > change, or replace must be passed in to the detach functions for them to
> > remove the filter and its attached classififer correctly.
> >
> > bpf_tc_cls_get_info is a helper that can be used to obtain attributes
> > for the filter and classififer. The opts structure may be used to
> > choose the granularity of search, such that info for a specific filter
> > corresponding to the same loaded bpf program can be obtained. By
> > default, the first match is returned to the user.
> >
> > Examples:
> >
> > 	struct bpf_tc_cls_attach_id id = {};
> > 	struct bpf_object *obj;
> > 	struct bpf_program *p;
> > 	int fd, r;
> >
> > 	obj = bpf_object_open("foo.o");
> > 	if (IS_ERR_OR_NULL(obj))
> > 		return PTR_ERR(obj);
> >
> > 	p = bpf_object__find_program_by_title(obj, "classifier");
> > 	if (IS_ERR_OR_NULL(p))
> > 		return PTR_ERR(p);
> >
> > 	if (bpf_object__load(obj) < 0)
> > 		return -1;
> >
> > 	fd = bpf_program__fd(p);
> >
> > 	r = bpf_tc_cls_attach(fd, if_nametoindex("lo"),
> > 			      BPF_TC_CLSACT_INGRESS,
> > 			      NULL, &id);
> > 	if (r < 0)
> > 		return r;
> >
> > ... which is roughly equivalent to (after clsact qdisc setup):
> >    # tc filter add dev lo ingress bpf obj foo.o sec classifier da
> >
> > ... as direct action mode is always enabled.
> >
> > If a user wishes to modify existing options on an attached classifier,
> > bpf_tc_cls_change API may be used.
> >
> > Only parameters class_id can be modified, the rest are filled in to
> > identify the correct filter. protocol can be left out if it was not
> > chosen explicitly (defaulting to ETH_P_ALL).
> >
> > Example:
> >
> > 	/* Optional parameters necessary to select the right filter */
> > 	DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts,
> > 			    .handle = id.handle,
> > 			    .priority = id.priority,
> > 			    .chain_index = id.chain_index)
>
> Why do we need chain_index as part of the basic API?
>

It would be relevant when TC_ACT_GOTO_CHAIN is used. Other than that, I guess
it's not very useful.

> > 	opts.class_id = TC_H_MAKE(1UL << 16, 12);
> > 	r = bpf_tc_cls_change(fd, if_nametoindex("lo"),
> > 			      BPF_TC_CLSACT_INGRESS,
> > 			      &opts, &id);
>
> Also, I'm not sure whether the prefix should even be named  bpf_tc_cls_*() tbh,
> yes, despite being "low level" api. I think in the context of bpf we should stop
> regarding this as 'classifier' and 'action' objects since it's really just a
> single entity and not separate ones. It's weird enough to explain this concept
> to new users and if a libbpf based api could cleanly abstract it, I would be all
> for it. I don't think we need to map 1:1 the old tc legacy even in the low level
> api, tbh, as it feels backwards. I think the 'handle' & 'priority' bits are okay,
> but I would remove the others.
>

Ok, would dropping _cls from the name be better?
bpf_tc_attach
bpf_tc_replace
bpf_tc_get_info
bpf_tc_detach

As for options, I'll drop protocol, if you feel strongly about chain_index I can
drop that one too.

> > 	if (r < 0)
> > 		return r;
> >
> > 	struct bpf_tc_cls_info info = {};
> > 	r = bpf_tc_cls_get_info(fd, if_nametoindex("lo"),
> > 			        BPF_TC_CLSACT_INGRESS,
> > 				&opts, &info);
> > 	if (r < 0)
> > 		return r;
> >
> > 	assert(info.class_id == TC_H_MAKE(1UL << 16, 12));
> >
> > This would be roughly equivalent to doing:
> >    # tc filter change dev lo egress prio <p> handle <h> bpf obj foo.o sec \
> >      classifier classid 1:12
>
> Why even bother to support !da mode, what are you trying to solve with it? I
> don't think official libbpf api should support something that doesn't scale.
>

da is default now, this is yet another typo/oversight...

--
Kartikeya

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v2 3/4] libbpf: add low level TC-BPF API
  2021-04-19 21:43     ` Toke Høiland-Jørgensen
@ 2021-04-19 21:57       ` Daniel Borkmann
  0 siblings, 0 replies; 10+ messages in thread
From: Daniel Borkmann @ 2021-04-19 21:57 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Kumar Kartikeya Dwivedi, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Jesper Dangaard Brouer,
	linux-kernel, netdev

On 4/19/21 11:43 PM, Toke Høiland-Jørgensen wrote:
> Daniel Borkmann <daniel@iogearbox.net> writes:
>> On 4/19/21 2:18 PM, Kumar Kartikeya Dwivedi wrote:
>>> This adds functions that wrap the netlink API used for adding,
>>> manipulating, and removing traffic control filters. These functions
>>> operate directly on the loaded prog's fd, and return a handle to the
>>> filter using an out parameter named id.
>>>
>>> The basic featureset is covered to allow for attaching, manipulation of
>>> properties, and removal of filters. Some additional features like
>>> TCA_BPF_POLICE and TCA_RATE for tc_cls have been omitted. These can
>>> added on top later by extending the bpf_tc_cls_opts struct.
>>>
>>> Support for binding actions directly to a classifier by passing them in
>>> during filter creation has also been omitted for now. These actions have
>>> an auto clean up property because their lifetime is bound to the filter
>>> they are attached to. This can be added later, but was omitted for now
>>> as direct action mode is a better alternative to it, which is enabled by
>>> default.
>>>
>>> An API summary:
>>>
>>> bpf_tc_act_{attach, change, replace} may be used to attach, change, and
>>
>> typo on bpf_tc_act_{...} ?
>>                  ^^^
>>> replace SCHED_CLS bpf classifier. The protocol field can be set as 0, in
>>> which case it is subsitituted as ETH_P_ALL by default.
>>
>> Do you have an actual user that needs anything other than ETH_P_ALL? Why is it
>> even needed? Why not stick to just ETH_P_ALL?
>>
>>> The behavior of the three functions is as follows:
>>>
>>> attach = create filter if it does not exist, fail otherwise
>>> change = change properties of the classifier of existing filter
>>> replace = create filter, and replace any existing filter
>>
>> This touches on tc oddities quite a bit. Why do we need to expose them? Can't we
>> simplify/abstract this e.g. i) create or update instance, ii) delete instance,
>> iii) get instance ? What concrete use case do you have that you need those three
>> above?
>>
>>> bpf_tc_cls_detach may be used to detach existing SCHED_CLS
>>> filter. The bpf_tc_cls_attach_id object filled in during attach,
>>> change, or replace must be passed in to the detach functions for them to
>>> remove the filter and its attached classififer correctly.
>>>
>>> bpf_tc_cls_get_info is a helper that can be used to obtain attributes
>>> for the filter and classififer. The opts structure may be used to
>>> choose the granularity of search, such that info for a specific filter
>>> corresponding to the same loaded bpf program can be obtained. By
>>> default, the first match is returned to the user.
>>>
>>> Examples:
>>>
>>> 	struct bpf_tc_cls_attach_id id = {};
>>> 	struct bpf_object *obj;
>>> 	struct bpf_program *p;
>>> 	int fd, r;
>>>
>>> 	obj = bpf_object_open("foo.o");
>>> 	if (IS_ERR_OR_NULL(obj))
>>> 		return PTR_ERR(obj);
>>>
>>> 	p = bpf_object__find_program_by_title(obj, "classifier");
>>> 	if (IS_ERR_OR_NULL(p))
>>> 		return PTR_ERR(p);
>>>
>>> 	if (bpf_object__load(obj) < 0)
>>> 		return -1;
>>>
>>> 	fd = bpf_program__fd(p);
>>>
>>> 	r = bpf_tc_cls_attach(fd, if_nametoindex("lo"),
>>> 			      BPF_TC_CLSACT_INGRESS,
>>> 			      NULL, &id);
>>> 	if (r < 0)
>>> 		return r;
>>>
>>> ... which is roughly equivalent to (after clsact qdisc setup):
>>>     # tc filter add dev lo ingress bpf obj foo.o sec classifier da
>>>
>>> ... as direct action mode is always enabled.
>>>
>>> If a user wishes to modify existing options on an attached classifier,
>>> bpf_tc_cls_change API may be used.
>>>
>>> Only parameters class_id can be modified, the rest are filled in to
>>> identify the correct filter. protocol can be left out if it was not
>>> chosen explicitly (defaulting to ETH_P_ALL).
>>>
>>> Example:
>>>
>>> 	/* Optional parameters necessary to select the right filter */
>>> 	DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts,
>>> 			    .handle = id.handle,
>>> 			    .priority = id.priority,
>>> 			    .chain_index = id.chain_index)
>>
>> Why do we need chain_index as part of the basic API?
>>
>>> 	opts.class_id = TC_H_MAKE(1UL << 16, 12);
>>> 	r = bpf_tc_cls_change(fd, if_nametoindex("lo"),
>>> 			      BPF_TC_CLSACT_INGRESS,
>>> 			      &opts, &id);
>>
>> Also, I'm not sure whether the prefix should even be named  bpf_tc_cls_*() tbh,
>> yes, despite being "low level" api. I think in the context of bpf we should stop
>> regarding this as 'classifier' and 'action' objects since it's really just a
>> single entity and not separate ones. It's weird enough to explain this concept
>> to new users and if a libbpf based api could cleanly abstract it, I would be all
>> for it. I don't think we need to map 1:1 the old tc legacy even in the low level
>> api, tbh, as it feels backwards. I think the 'handle' & 'priority' bits are okay,
>> but I would remove the others.
> 
> Hmm, I'm OK with dropping the TC oddities (including the cls_ in the
> name), but I think we should be documenting it so that users that do
> come from TC will not be completely lost :)

Yeah, that sounds good to me. :) All I'm trying to say is that /we/ are used to the
terminology and quirks that come with it, but I'm hoping that such API will be used
by *new* folks who have zero context on the underlying details, and they also really
shouldn't have to care. Even if on the lower level we expose handle/priority at least,
the API should be designed with a mindset of no prior tc experience required. Simple
and easy to use. I think the handle/priority concept can be explained/documented fairly
straight forward. There is a chance to hide all the nasty historic/legacy details in
something clean, easy to use and scaleable (implied by the da mode), we should use this
opportunity. ;)

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v2 4/4] libbpf: add selftests for TC-BPF API
  2021-04-19 12:18 ` [PATCH bpf-next v2 4/4] libbpf: add selftests for " Kumar Kartikeya Dwivedi
@ 2021-04-20  4:35   ` Andrii Nakryiko
  0 siblings, 0 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2021-04-20  4:35 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Jesper Dangaard Brouer,
	open list, Networking

On Mon, Apr 19, 2021 at 5:18 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> This adds some basic tests for the low level bpf_tc_cls_* API.
>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  .../selftests/bpf/prog_tests/test_tc_bpf.c    | 112 ++++++++++++++++++
>  .../selftests/bpf/progs/test_tc_bpf_kern.c    |  12 ++
>  2 files changed, 124 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
> new file mode 100644
> index 000000000000..945f3a1a72f8
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
> @@ -0,0 +1,112 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/bpf.h>
> +#include <linux/err.h>
> +#include <linux/limits.h>
> +#include <bpf/libbpf.h>
> +#include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <test_progs.h>
> +#include <linux/if_ether.h>
> +
> +#define LO_IFINDEX 1
> +
> +static int test_tc_cls_internal(int fd, __u32 parent_id)
> +{
> +       DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts, .handle = 1, .priority = 10,
> +                           .class_id = TC_H_MAKE(1UL << 16, 1),
> +                           .chain_index = 5);
> +       struct bpf_tc_cls_attach_id id = {};
> +       struct bpf_tc_cls_info info = {};
> +       int ret;
> +
> +       ret = bpf_tc_cls_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
> +       if (CHECK_FAIL(ret < 0))
> +               return ret;
> +
> +       ret = bpf_tc_cls_get_info(fd, LO_IFINDEX, parent_id, NULL, &info);
> +       if (CHECK_FAIL(ret < 0))
> +               goto end;
> +
> +       ret = -1;
> +
> +       if (CHECK_FAIL(info.id.handle != id.handle) ||
> +           CHECK_FAIL(info.id.chain_index != id.chain_index) ||
> +           CHECK_FAIL(info.id.priority != id.priority) ||
> +           CHECK_FAIL(info.id.handle != 1) ||
> +           CHECK_FAIL(info.id.priority != 10) ||
> +           CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 1)) ||
> +           CHECK_FAIL(info.id.chain_index != 5))
> +               goto end;
> +
> +       ret = bpf_tc_cls_replace(fd, LO_IFINDEX, parent_id, &opts, &id);
> +       if (CHECK_FAIL(ret < 0))
> +               return ret;
> +
> +       if (CHECK_FAIL(info.id.handle != 1) ||
> +           CHECK_FAIL(info.id.priority != 10) ||
> +           CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 1)))
> +               goto end;
> +
> +       /* Demonstrate changing attributes */
> +       opts.class_id = TC_H_MAKE(1UL << 16, 2);
> +
> +       ret = bpf_tc_cls_change(fd, LO_IFINDEX, parent_id, &opts, &info.id);
> +       if (CHECK_FAIL(ret < 0))
> +               goto end;
> +
> +       ret = bpf_tc_cls_get_info(fd, LO_IFINDEX, parent_id, NULL, &info);
> +       if (CHECK_FAIL(ret < 0))
> +               goto end;
> +
> +       if (CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 2)))
> +               goto end;
> +       if (CHECK_FAIL((info.bpf_flags & TCA_BPF_FLAG_ACT_DIRECT) != 1))
> +               goto end;
> +
> +end:
> +       ret = bpf_tc_cls_detach(LO_IFINDEX, parent_id, &id);
> +       CHECK_FAIL(ret < 0);
> +       return ret;
> +}
> +
> +void test_test_tc_bpf(void)
> +{
> +       const char *file = "./test_tc_bpf_kern.o";
> +       struct bpf_program *clsp;
> +       struct bpf_object *obj;
> +       int cls_fd, ret;
> +
> +       obj = bpf_object__open(file);
> +       if (CHECK_FAIL(IS_ERR_OR_NULL(obj)))
> +               return;
> +
> +       clsp = bpf_object__find_program_by_title(obj, "classifier");
> +       if (CHECK_FAIL(IS_ERR_OR_NULL(clsp)))
> +               goto end;
> +
> +       ret = bpf_object__load(obj);
> +       if (CHECK_FAIL(ret < 0))
> +               goto end;
> +
> +       cls_fd = bpf_program__fd(clsp);
> +
> +       system("tc qdisc del dev lo clsact");
> +
> +       ret = test_tc_cls_internal(cls_fd, BPF_TC_CLSACT_INGRESS);
> +       if (CHECK_FAIL(ret < 0))
> +               goto end;
> +
> +       if (CHECK_FAIL(system("tc qdisc del dev lo clsact")))
> +               goto end;
> +
> +       ret = test_tc_cls_internal(cls_fd, BPF_TC_CLSACT_EGRESS);
> +       if (CHECK_FAIL(ret < 0))
> +               goto end;
> +
> +       CHECK_FAIL(system("tc qdisc del dev lo clsact"));

please don't use CHECK_FAIL. And prefer ASSERT_xxx over CHECK().

> +
> +end:
> +       bpf_object__close(obj);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
> new file mode 100644
> index 000000000000..3dd40e21af8e
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
> @@ -0,0 +1,12 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +
> +// Dummy prog to test TC-BPF API

no C++-style comments, please (except for SPDX header, of course)
> +
> +SEC("classifier")
> +int cls(struct __sk_buff *skb)
> +{
> +       return 0;
> +}
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-04-20  4:35 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-19 12:18 [PATCH bpf-next v2 0/4] Add TC-BPF API Kumar Kartikeya Dwivedi
2021-04-19 12:18 ` [PATCH bpf-next v2 1/4] tools: pkt_cls.h: sync with kernel sources Kumar Kartikeya Dwivedi
2021-04-19 12:18 ` [PATCH bpf-next v2 2/4] libbpf: add helpers for preparing netlink attributes Kumar Kartikeya Dwivedi
2021-04-19 12:18 ` [PATCH bpf-next v2 3/4] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
2021-04-19 21:00   ` Daniel Borkmann
2021-04-19 21:43     ` Toke Høiland-Jørgensen
2021-04-19 21:57       ` Daniel Borkmann
2021-04-19 21:45     ` Kumar Kartikeya Dwivedi
2021-04-19 12:18 ` [PATCH bpf-next v2 4/4] libbpf: add selftests for " Kumar Kartikeya Dwivedi
2021-04-20  4:35   ` Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).