* [PATCH bpf-next v4 0/8] BPF link support for tc BPF programs
@ 2023-07-10 20:12 Daniel Borkmann
  2023-07-10 20:12 ` [PATCH bpf-next v4 1/8] bpf: Add generic attach/detach/query API for multi-progs Daniel Borkmann
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Daniel Borkmann @ 2023-07-10 20:12 UTC (permalink / raw)
  To: ast
  Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
	toke, davem, bpf, netdev, Daniel Borkmann

This series adds BPF link support for tc BPF programs. We initially
presented the motivation, related work and design at last year's LPC
conference in the networking & BPF track [0], and gave a recent update
on our progress with the rework at this year's LSF/MM/BPF summit [1].
The main changes are in the first two patches, and the last two carry
an extensive batch of test cases we developed along with it; please see
the individual patches for details. We tested this series with the
tc-testing selftest suite as well as BPF CI/selftests. Thanks!

v3 -> v4:
  - Fix bpftool output to display tcx/{ingress,egress} (Stan)
  - Documentation around API, BPF_MPROG_* return codes and locking
    expectations (Stan, Alexei)
  - Change _after and _before to have the same semantics for return
    value (Alexei)
  - Rework mprog initialization and move allocation/free one layer
    up into tcx to simplify the code (Stan)
  - Add comment on synchronize_rcu and parent->ref (Stan)
  - Add comment on bpf_mprog_pos_() helpers wrt target position (Stan)
v2 -> v3:
  - Removal of BPF_F_FIRST/BPF_F_LAST from control UAPI (Toke, Stan)
  - Along with that full rework of bpf_mprog internals to simplify
    dependency management, looks much nicer now imho
  - Just single bpf_mprog_cp instead of two (Andrii)
  - atomic64_t for revision counter (Andrii)
  - Evaluate target position and reject on conflicts (Andrii)
  - Keep track of actual count in bpf_mprog_bundle (Andrii)
  - Make combo of REPLACE and BEFORE/AFTER work (Andrii)
  - Moved miniq as first struct member (Jamal)
  - Rework tcx_link_attach with regards to rtnl (Jakub, Andrii)
  - Moved wrappers after bpf_prog_detach_ops (Andrii)
  - Removed union for relative_fd and friends for opts and link in
    libbpf (Andrii)
  - Add doc comments to attach/detach/query libbpf APIs (Andrii)
  - Dropped SEC_ATTACHABLE_OPT (Andrii)
  - Add an OPTS_ZEROED check to bpf_link_create (Andrii)
  - Keep opts as the last argument in bpf_program_attach_fd (Andrii)
  - Rework bpf_program_attach_fd (Andrii)
  - Remove OPTS_GET before we checked OPTS_VALID in
    bpf_program__attach_tcx (Andrii)
  - Add `size_t :0;` to prevent compiler from leaving garbage (Andrii)
  - Add helper macro to clear opts structs which I found useful
    when writing tests
  - Rework of both opts and link test cases to accommodate for changes
v1 -> v2:
  - Rework of almost entire series to remove prio from UAPI and switch
    to better control directives BPF_F_FIRST/BPF_F_LAST/BPF_F_BEFORE/
    BPF_F_AFTER (Alexei, Toke, Stan, Andrii)
  - Addition of big test suite to cover all corner cases

  [0] https://lpc.events/event/16/contributions/1353/
  [1] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf

Daniel Borkmann (8):
  bpf: Add generic attach/detach/query API for multi-progs
  bpf: Add fd-based tcx multi-prog infra with link support
  libbpf: Add opts-based attach/detach/query API for tcx
  libbpf: Add link-based API for tcx
  libbpf: Add helper macro to clear opts structs
  bpftool: Extend net dump with tcx progs
  selftests/bpf: Add mprog API tests for BPF tcx opts
  selftests/bpf: Add mprog API tests for BPF tcx links

 MAINTAINERS                                   |    5 +-
 include/linux/bpf_mprog.h                     |  352 +++
 include/linux/netdevice.h                     |   15 +-
 include/linux/skbuff.h                        |    4 +-
 include/net/sch_generic.h                     |    2 +-
 include/net/tcx.h                             |  199 ++
 include/uapi/linux/bpf.h                      |   70 +-
 kernel/bpf/Kconfig                            |    1 +
 kernel/bpf/Makefile                           |    3 +-
 kernel/bpf/mprog.c                            |  427 ++++
 kernel/bpf/syscall.c                          |   83 +-
 kernel/bpf/tcx.c                              |  351 +++
 net/Kconfig                                   |    5 +
 net/core/dev.c                                |  267 +-
 net/core/filter.c                             |    4 +-
 net/sched/Kconfig                             |    4 +-
 net/sched/sch_ingress.c                       |   61 +-
 tools/bpf/bpftool/net.c                       |   86 +-
 tools/include/uapi/linux/bpf.h                |   70 +-
 tools/lib/bpf/bpf.c                           |  124 +-
 tools/lib/bpf/bpf.h                           |   97 +-
 tools/lib/bpf/libbpf.c                        |   74 +-
 tools/lib/bpf/libbpf.h                        |   16 +
 tools/lib/bpf/libbpf.map                      |    2 +
 tools/lib/bpf/libbpf_common.h                 |   11 +
 .../selftests/bpf/prog_tests/tc_helpers.h     |   72 +
 .../selftests/bpf/prog_tests/tc_links.c       | 1604 ++++++++++++
 .../selftests/bpf/prog_tests/tc_opts.c        | 2182 +++++++++++++++++
 .../selftests/bpf/progs/test_tc_link.c        |   40 +
 29 files changed, 6002 insertions(+), 229 deletions(-)
 create mode 100644 include/linux/bpf_mprog.h
 create mode 100644 include/net/tcx.h
 create mode 100644 kernel/bpf/mprog.c
 create mode 100644 kernel/bpf/tcx.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_helpers.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_links.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_opts.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_tc_link.c

-- 
2.34.1



* [PATCH bpf-next v4 1/8] bpf: Add generic attach/detach/query API for multi-progs
  2023-07-10 20:12 [PATCH bpf-next v4 0/8] BPF link support for tc BPF programs Daniel Borkmann
@ 2023-07-10 20:12 ` Daniel Borkmann
  2023-07-11  0:23   ` Alexei Starovoitov
  2023-07-11 18:48   ` Andrii Nakryiko
  2023-07-10 20:12 ` [PATCH bpf-next v4 2/8] bpf: Add fd-based tcx multi-prog infra with link support Daniel Borkmann
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 22+ messages in thread
From: Daniel Borkmann @ 2023-07-10 20:12 UTC (permalink / raw)
  To: ast
  Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
	toke, davem, bpf, netdev, Daniel Borkmann

This adds a generic layer called bpf_mprog which can be reused by different
attachment layers to enable multi-program attachment and dependency resolution.
In-kernel users of bpf_mprog don't need to care about the dependency
resolution internals; they can just consume it with a few API calls.

The initial idea of having a generic API sparked out of a discussion [0] on an
earlier revision of this work where tc's priority was reused and exposed via
BPF uapi as a way to coordinate dependencies among tc BPF programs, similar
to how it is done for classic tc BPF. The feedback was that priority provides
a bad user experience and is hard to use [1], e.g.:

  I cannot help but feel that priority logic copy-paste from old tc, netfilter
  and friends is done because "that's how things were done in the past". [...]
  Priority gets exposed everywhere in uapi all the way to bpftool when it's
  right there for users to understand. And that's the main problem with it.

  The user don't want to and don't need to be aware of it, but uapi forces them
  to pick the priority. [...] Your cover letter [0] example proves that in
  real life different service pick the same priority. They simply don't know
  any better. Priority is an unnecessary magic that apps _have_ to pick, so
  they just copy-paste and everyone ends up using the same.

The course of the discussion showed more and more the need for a generic,
reusable API where the "same look and feel" can be applied to various other
program types beyond just tc BPF. For example, XDP today does not have
multi-program support in the kernel, and there was also interest around this
API for improving management of cgroup program types. Such a common
multi-program management concept is useful for BPF management daemons or
user space BPF applications coordinating internally about their attachments.

From both the Cilium and Meta side [2], we've collected the following
requirements for a generic attach/detach/query API for multi-progs, which has
been implemented as part of this work:

  - Support prog-based attach/detach and link API
  - Dependency directives (can also be combined):
    - BPF_F_{BEFORE,AFTER} with relative_{fd,id} which can be {prog,link,none}
      - BPF_F_ID flag as {fd,id} toggle; the rationale for id is so that a
        user space application does not need CAP_SYS_ADMIN to retrieve
        foreign fds via bpf_*_get_fd_by_id()
      - BPF_F_LINK flag as {prog,link} toggle
      - If relative_{fd,id} is none, then BPF_F_BEFORE will just prepend, and
        BPF_F_AFTER will just append for attaching
      - Enforced only at attach time
    - BPF_F_REPLACE with replace_bpf_fd which can be a prog; links have
      their own infra for replacing their internal prog
    - If no flags are set, then the default behavior for attaching is to append
  - Internal revision counter and optionally being able to pass expected_revision
  - User space application can query current state with revision, and pass it
    along for attachment to assert current state before doing updates
  - Query also gets extension for link_ids array and link_attach_flags:
    - prog_ids are always filled with program IDs
    - link_ids are filled with link IDs when link was used, otherwise 0
    - {prog,link}_attach_flags for holding {prog,link}-specific flags
  - Must be easy to integrate/reuse for in-kernel users

The uapi-side changes needed for supporting bpf_mprog are rather minimal,
consisting of the addition of the attachment flags and revision counter, and
expanding the existing union with a relative_{fd,id} member.
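
As a rough user space sketch of the above (not part of this patch; it assumes
UAPI headers that carry the fields added below, the BPF_TCX_INGRESS attach
type from the tcx patch later in this series, and prog_fd/relative_fd/ifindex
obtained elsewhere), inserting a program before an existing one while
asserting the expected revision could look like:

  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  static int tcx_attach_before(int prog_fd, int relative_fd, int ifindex,
                               __u64 expected_revision)
  {
          union bpf_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.target_ifindex = ifindex;        /* attach location */
          attr.attach_bpf_fd = prog_fd;         /* program to attach */
          attr.attach_type = BPF_TCX_INGRESS;
          attr.attach_flags = BPF_F_BEFORE;     /* insert before relative_fd */
          attr.relative_fd = relative_fd;
          /* Non-zero expected_revision lets the kernel reject the attach
           * with -ESTALE if the attach location changed in the meantime;
           * 0 means don't care.
           */
          attr.expected_revision = expected_revision;
          return syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
  }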

The bpf_mprog framework consists of a bpf_mprog_entry object which holds
an array of bpf_mprog_fp (fast-path structure). The bpf_mprog_cp (control-path
structure) is part of bpf_mprog_bundle. Both have been separated so that the
fast path gets efficient packing of bpf_prog pointers for maximum cache
efficiency. Also, an array has been chosen instead of a linked list or other
structures to remove unnecessary indirections for a fast entry point into
BPF from tc.

The bpf_mprog_entry comes as a pair via bpf_mprog_bundle so that in case of
updates the peer bpf_mprog_entry is populated and then just swapped, which
avoids additional allocations that could otherwise fail, for example, in the
detach case. The bpf_mprog_{fp,cp} arrays are currently static, but they
could be converted to dynamic allocation if necessary at a later point.
Locking is deferred to the in-kernel user of bpf_mprog; for example, tcx,
which uses this API in the next patch, piggybacks on rtnl.

An extensive test suite checking all aspects of this API for both prog-based
attach/detach and the link API comes as BPF selftests in this series.

Kudos also to Andrii Nakryiko for API discussions wrt Meta's BPF management.

  [0] https://lore.kernel.org/bpf/20221004231143.19190-1-daniel@iogearbox.net
  [1] https://lore.kernel.org/bpf/CAADnVQ+gEY3FjCR=+DmjDR4gp5bOYZUFJQXj4agKFHT9CQPZBw@mail.gmail.com
  [2] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 MAINTAINERS                    |   1 +
 include/linux/bpf_mprog.h      | 343 ++++++++++++++++++++++++++
 include/uapi/linux/bpf.h       |  36 ++-
 kernel/bpf/Makefile            |   2 +-
 kernel/bpf/mprog.c             | 427 +++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  36 ++-
 6 files changed, 828 insertions(+), 17 deletions(-)
 create mode 100644 include/linux/bpf_mprog.h
 create mode 100644 kernel/bpf/mprog.c

diff --git a/MAINTAINERS b/MAINTAINERS
index acbe54087d1c..7e5ba799d1c5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3736,6 +3736,7 @@ F:	include/linux/filter.h
 F:	include/linux/tnum.h
 F:	kernel/bpf/core.c
 F:	kernel/bpf/dispatcher.c
+F:	kernel/bpf/mprog.c
 F:	kernel/bpf/syscall.c
 F:	kernel/bpf/tnum.c
 F:	kernel/bpf/trampoline.c
diff --git a/include/linux/bpf_mprog.h b/include/linux/bpf_mprog.h
new file mode 100644
index 000000000000..63f0f35bd3e2
--- /dev/null
+++ b/include/linux/bpf_mprog.h
@@ -0,0 +1,343 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2023 Isovalent */
+#ifndef __BPF_MPROG_H
+#define __BPF_MPROG_H
+
+#include <linux/bpf.h>
+
+/* bpf_mprog framework:
+ * ~~~~~~~~~~~~~~~~~~~~
+ *
+ * bpf_mprog is a generic layer for multi-program attachment. In-kernel users
+ * of the bpf_mprog don't need to care about the dependency resolution
+ * internals, they can just consume it with few API calls. Currently available
+ * dependency directives are BPF_F_{BEFORE,AFTER} which enable insertion of
+ * a BPF program or BPF link relative to an existing BPF program or BPF link
+ * inside the multi-program array as well as prepend and append behavior if
+ * no relative object was specified, see corresponding selftests for concrete
+ * examples (e.g. tc_links and tc_opts test cases of test_progs).
+ *
+ * Usage of bpf_mprog_{attach,detach,query}() core APIs with pseudo code:
+ *
+ *  Attach case:
+ *
+ *   struct bpf_mprog_entry *entry, *peer;
+ *   int ret;
+ *
+ *   // bpf_mprog user-side lock
+ *   // fetch active @entry from attach location
+ *   [...]
+ *   ret = bpf_mprog_attach(entry, [...]);
+ *   if (ret >= 0) {
+ *       peer = bpf_mprog_peer(entry);
+ *       if (bpf_mprog_swap_entries(ret))
+ *           // swap @entry to @peer at attach location
+ *       bpf_mprog_commit(entry);
+ *       ret = 0;
+ *   } else {
+ *       // error path, bail out, propagate @ret
+ *   }
+ *   // bpf_mprog user-side unlock
+ *
+ *  Detach case:
+ *
+ *   struct bpf_mprog_entry *entry, *peer;
+ *   bool release;
+ *   int ret;
+ *
+ *   // bpf_mprog user-side lock
+ *   // fetch active @entry from attach location
+ *   [...]
+ *   ret = bpf_mprog_detach(entry, [...]);
+ *   if (ret >= 0) {
+ *       release = ret == BPF_MPROG_FREE;
+ *       peer = release ? NULL : bpf_mprog_peer(entry);
+ *       if (bpf_mprog_swap_entries(ret))
+ *           // swap @entry to @peer at attach location
+ *       bpf_mprog_commit(entry);
+ *       if (release)
+ *           // free bpf_mprog_bundle
+ *       ret = 0;
+ *   } else {
+ *       // error path, bail out, propagate @ret
+ *   }
+ *   // bpf_mprog user-side unlock
+ *
+ *  Query case:
+ *
+ *   struct bpf_mprog_entry *entry;
+ *   int ret;
+ *
+ *   // bpf_mprog user-side lock
+ *   // fetch active @entry from attach location
+ *   [...]
+ *   ret = bpf_mprog_query(attr, uattr, entry);
+ *   // bpf_mprog user-side unlock
+ *
+ *  Data/fast path:
+ *
+ *   struct bpf_mprog_entry *entry;
+ *   struct bpf_mprog_fp *fp;
+ *   struct bpf_prog *prog;
+ *   int ret = [...];
+ *
+ *   rcu_read_lock();
+ *   // fetch active @entry from attach location
+ *   [...]
+ *   bpf_mprog_foreach_prog(entry, fp, prog) {
+ *       ret = bpf_prog_run(prog, [...]);
+ *       // process @ret from program
+ *   }
+ *   [...]
+ *   rcu_read_unlock();
+ *
+ * bpf_mprog_{attach,detach}() return codes:
+ *
+ * Negative return code means that an error occurred and the bpf_mprog_entry
+ * has not been changed. The error should be propagated to the user. A non-
+ * negative return code can be one of the following:
+ *
+ * BPF_MPROG_KEEP:
+ *   The bpf_mprog_entry does not need a/b swap, the bpf_mprog_fp item has
+ *   been replaced in the current active bpf_mprog_entry.
+ *
+ * BPF_MPROG_SWAP:
+ *   The bpf_mprog_entry does need an a/b swap and must be updated to its
+ *   peer entry (peer = bpf_mprog_peer(entry)) which has been populated to
+ *   the new bpf_mprog_fp item configuration.
+ *
+ * BPF_MPROG_FREE:
+ *   The bpf_mprog_entry now does not hold any non-NULL bpf_mprog_fp items
+ *   anymore. The bpf_mprog_entry should be swapped with NULL and the
+ *   corresponding bpf_mprog_bundle can be freed.
+ *
+ * bpf_mprog locking considerations:
+ *
+ * bpf_mprog_{attach,detach,query}() must be protected by an external lock
+ * (like RTNL in case of tcx).
+ *
+ * bpf_mprog_entry pointer can be an __rcu annotated pointer (in case of tcx
+ * the netdevice has tcx_ingress and tcx_egress __rcu pointer) which gets
+ * updated via rcu_assign_pointer() pointing to the active bpf_mprog_entry of
+ * the bpf_mprog_bundle.
+ *
+ * Fast path accesses the active bpf_mprog_entry within RCU critical section
+ * (in case of tcx it runs in NAPI which provides RCU protection there,
+ * other users might need explicit rcu_read_lock()). The bpf_mprog_commit()
+ * assumes that RCU protection.
+ *
+ * The READ_ONCE()/WRITE_ONCE() pairing for bpf_mprog_fp's prog access is for
+ * the replacement case where we don't swap the bpf_mprog_entry.
+ */
+
+#define BPF_MPROG_KEEP	0
+#define BPF_MPROG_SWAP	1
+#define BPF_MPROG_FREE	2
+
+#define BPF_MPROG_MAX	64
+
+#define bpf_mprog_foreach_tuple(entry, fp, cp, t)			\
+	for (fp = &entry->fp_items[0], cp = &entry->parent->cp_items[0];\
+	     ({								\
+		t.prog = READ_ONCE(fp->prog);				\
+		t.link = cp->link;					\
+		t.prog;							\
+	      });							\
+	     fp++, cp++)
+
+#define bpf_mprog_foreach_prog(entry, fp, p)				\
+	for (fp = &entry->fp_items[0];					\
+	     (p = READ_ONCE(fp->prog));					\
+	     fp++)
+
+struct bpf_mprog_fp {
+	struct bpf_prog *prog;
+};
+
+struct bpf_mprog_cp {
+	struct bpf_link *link;
+};
+
+struct bpf_mprog_entry {
+	struct bpf_mprog_fp fp_items[BPF_MPROG_MAX];
+	struct bpf_mprog_bundle *parent;
+};
+
+struct bpf_mprog_bundle {
+	struct bpf_mprog_entry a;
+	struct bpf_mprog_entry b;
+	struct bpf_mprog_cp cp_items[BPF_MPROG_MAX];
+	struct bpf_prog *ref;
+	atomic64_t revision;
+	u32 count;
+};
+
+struct bpf_tuple {
+	struct bpf_prog *prog;
+	struct bpf_link *link;
+};
+
+static inline struct bpf_mprog_entry *
+bpf_mprog_peer(const struct bpf_mprog_entry *entry)
+{
+	if (entry == &entry->parent->a)
+		return &entry->parent->b;
+	else
+		return &entry->parent->a;
+}
+
+static inline void bpf_mprog_bundle_init(struct bpf_mprog_bundle *bundle)
+{
+	BUILD_BUG_ON(sizeof(bundle->a.fp_items[0]) > sizeof(u64));
+	BUILD_BUG_ON(ARRAY_SIZE(bundle->a.fp_items) !=
+		     ARRAY_SIZE(bundle->cp_items));
+
+	memset(bundle, 0, sizeof(*bundle));
+	atomic64_set(&bundle->revision, 1);
+	bundle->a.parent = bundle;
+	bundle->b.parent = bundle;
+}
+
+static inline void bpf_mprog_inc(struct bpf_mprog_entry *entry)
+{
+	entry->parent->count++;
+}
+
+static inline void bpf_mprog_dec(struct bpf_mprog_entry *entry)
+{
+	entry->parent->count--;
+}
+
+static inline int bpf_mprog_max(void)
+{
+	return ARRAY_SIZE(((struct bpf_mprog_entry *)NULL)->fp_items) - 1;
+}
+
+static inline int bpf_mprog_total(struct bpf_mprog_entry *entry)
+{
+	int total = entry->parent->count;
+
+	WARN_ON_ONCE(total > bpf_mprog_max());
+	return total;
+}
+
+static inline bool bpf_mprog_exists(struct bpf_mprog_entry *entry,
+				    struct bpf_prog *prog)
+{
+	const struct bpf_mprog_fp *fp;
+	const struct bpf_prog *tmp;
+
+	bpf_mprog_foreach_prog(entry, fp, tmp) {
+		if (tmp == prog)
+			return true;
+	}
+	return false;
+}
+
+static inline bool bpf_mprog_swap_entries(const int code)
+{
+	return code == BPF_MPROG_SWAP ||
+	       code == BPF_MPROG_FREE;
+}
+
+static inline void bpf_mprog_mark_ref(struct bpf_mprog_entry *entry,
+				      struct bpf_tuple *tuple)
+{
+	WARN_ON_ONCE(entry->parent->ref);
+	if (!tuple->link)
+		entry->parent->ref = tuple->prog;
+}
+
+static inline void bpf_mprog_commit(struct bpf_mprog_entry *entry)
+{
+	atomic64_inc(&entry->parent->revision);
+	/* bpf_mprog_entry got a/b swapped or prog replacement occurred
+	 * on the active bpf_mprog_entry. Ensure there are no inflight
+	 * users.
+	 */
+	synchronize_rcu();
+	/* bpf_mprog_delete() marked plain prog via bpf_mprog_mark_ref()
+	 * where its reference needs to be dropped after the RCU sync.
+	 */
+	if (entry->parent->ref) {
+		bpf_prog_put(entry->parent->ref);
+		entry->parent->ref = NULL;
+	}
+}
+
+static inline u64 bpf_mprog_revision(struct bpf_mprog_entry *entry)
+{
+	return atomic64_read(&entry->parent->revision);
+}
+
+static inline void bpf_mprog_entry_clear(struct bpf_mprog_entry *entry)
+{
+	memset(entry->fp_items, 0, sizeof(entry->fp_items));
+}
+
+static inline void bpf_mprog_commit_cp(struct bpf_mprog_entry *entry,
+				       struct bpf_mprog_cp *cp_items)
+{
+	memcpy(entry->parent->cp_items, cp_items,
+	       sizeof(entry->parent->cp_items));
+}
+
+static inline void bpf_mprog_read_fp(struct bpf_mprog_entry *entry, u32 idx,
+				     struct bpf_mprog_fp **fp)
+{
+	*fp = &entry->fp_items[idx];
+}
+
+static inline void bpf_mprog_read_cp(struct bpf_mprog_entry *entry, u32 idx,
+				     struct bpf_mprog_cp **cp)
+{
+	*cp = &entry->parent->cp_items[idx];
+}
+
+static inline void bpf_mprog_read(struct bpf_mprog_entry *entry, u32 idx,
+				  struct bpf_mprog_fp **fp,
+				  struct bpf_mprog_cp **cp)
+{
+	bpf_mprog_read_fp(entry, idx, fp);
+	bpf_mprog_read_cp(entry, idx, cp);
+}
+
+static inline void bpf_mprog_write_fp(struct bpf_mprog_fp *fp,
+				      struct bpf_tuple *tuple)
+{
+	WRITE_ONCE(fp->prog, tuple->prog);
+}
+
+static inline void bpf_mprog_write_cp(struct bpf_mprog_cp *cp,
+				      struct bpf_tuple *tuple)
+{
+	cp->link = tuple->link;
+}
+
+static inline void bpf_mprog_write(struct bpf_mprog_fp *fp,
+				   struct bpf_mprog_cp *cp,
+				   struct bpf_tuple *tuple)
+{
+	bpf_mprog_write_fp(fp, tuple);
+	bpf_mprog_write_cp(cp, tuple);
+}
+
+static inline void bpf_mprog_copy(struct bpf_mprog_fp *fp_dst,
+				  struct bpf_mprog_cp *cp_dst,
+				  struct bpf_mprog_fp *fp_src,
+				  struct bpf_mprog_cp *cp_src)
+{
+	WRITE_ONCE(fp_dst->prog, READ_ONCE(fp_src->prog));
+	memcpy(cp_dst, cp_src, sizeof(*cp_src));
+}
+
+int bpf_mprog_attach(struct bpf_mprog_entry *entry, struct bpf_prog *prog_new,
+		     struct bpf_link *link, struct bpf_prog *prog_old,
+		     u32 flags, u32 object, u64 revision);
+int bpf_mprog_detach(struct bpf_mprog_entry *entry, struct bpf_prog *prog,
+		     struct bpf_link *link, u32 flags, u32 object, u64 revision);
+
+int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr,
+		    struct bpf_mprog_entry *entry);
+
+#endif /* __BPF_MPROG_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 60a9d59beeab..74879c538f2b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1103,7 +1103,12 @@ enum bpf_link_type {
  */
 #define BPF_F_ALLOW_OVERRIDE	(1U << 0)
 #define BPF_F_ALLOW_MULTI	(1U << 1)
+/* Generic attachment flags. */
 #define BPF_F_REPLACE		(1U << 2)
+#define BPF_F_BEFORE		(1U << 3)
+#define BPF_F_AFTER		(1U << 4)
+#define BPF_F_ID		(1U << 5)
+#define BPF_F_LINK		BPF_F_LINK /* 1 << 13 */
 
 /* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the
  * verifier will perform strict alignment checking as if the kernel
@@ -1434,14 +1439,19 @@ union bpf_attr {
 	};
 
 	struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
-		__u32		target_fd;	/* container object to attach to */
-		__u32		attach_bpf_fd;	/* eBPF program to attach */
+		union {
+			__u32	target_fd;	/* target object to attach to or ... */
+			__u32	target_ifindex;	/* target ifindex */
+		};
+		__u32		attach_bpf_fd;
 		__u32		attach_type;
 		__u32		attach_flags;
-		__u32		replace_bpf_fd;	/* previously attached eBPF
-						 * program to replace if
-						 * BPF_F_REPLACE is used
-						 */
+		__u32		replace_bpf_fd;
+		union {
+			__u32	relative_fd;
+			__u32	relative_id;
+		};
+		__u64		expected_revision;
 	};
 
 	struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */
@@ -1487,16 +1497,26 @@ union bpf_attr {
 	} info;
 
 	struct { /* anonymous struct used by BPF_PROG_QUERY command */
-		__u32		target_fd;	/* container object to query */
+		union {
+			__u32	target_fd;	/* target object to query or ... */
+			__u32	target_ifindex;	/* target ifindex */
+		};
 		__u32		attach_type;
 		__u32		query_flags;
 		__u32		attach_flags;
 		__aligned_u64	prog_ids;
-		__u32		prog_cnt;
+		union {
+			__u32	prog_cnt;
+			__u32	count;
+		};
+		__u32		:32;
 		/* output: per-program attach_flags.
 		 * not allowed to be set during effective query.
 		 */
 		__aligned_u64	prog_attach_flags;
+		__aligned_u64	link_ids;
+		__aligned_u64	link_attach_flags;
+		__u64		revision;
 	} query;
 
 	struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 1d3892168d32..1bea2eb912cd 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -12,7 +12,7 @@ obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM}	  += bpf_inode_storage.o
-obj-$(CONFIG_BPF_SYSCALL) += disasm.o
+obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
 obj-$(CONFIG_BPF_JIT) += trampoline.o
 obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o
 obj-$(CONFIG_BPF_JIT) += dispatcher.o
diff --git a/kernel/bpf/mprog.c b/kernel/bpf/mprog.c
new file mode 100644
index 000000000000..1c4fcde74969
--- /dev/null
+++ b/kernel/bpf/mprog.c
@@ -0,0 +1,427 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+
+#include <linux/bpf.h>
+#include <linux/bpf_mprog.h>
+
+static int bpf_mprog_link(struct bpf_tuple *tuple,
+			  u32 object, u32 flags,
+			  enum bpf_prog_type type)
+{
+	bool id = flags & BPF_F_ID;
+	struct bpf_link *link;
+
+	if (id)
+		link = bpf_link_by_id(object);
+	else
+		link = bpf_link_get_from_fd(object);
+	if (IS_ERR(link))
+		return PTR_ERR(link);
+	if (type && link->prog->type != type) {
+		bpf_link_put(link);
+		return -EINVAL;
+	}
+
+	tuple->link = link;
+	tuple->prog = link->prog;
+	return 0;
+}
+
+static int bpf_mprog_prog(struct bpf_tuple *tuple,
+			  u32 object, u32 flags,
+			  enum bpf_prog_type type)
+{
+	bool id = flags & BPF_F_ID;
+	struct bpf_prog *prog;
+
+	if (id)
+		prog = bpf_prog_by_id(object);
+	else
+		prog = bpf_prog_get(object);
+	if (IS_ERR(prog)) {
+		if (!object && !id)
+			return 0;
+		return PTR_ERR(prog);
+	}
+	if (type && prog->type != type) {
+		bpf_prog_put(prog);
+		return -EINVAL;
+	}
+
+	tuple->link = NULL;
+	tuple->prog = prog;
+	return 0;
+}
+
+static int bpf_mprog_tuple_relative(struct bpf_tuple *tuple,
+				    u32 object, u32 flags,
+				    enum bpf_prog_type type)
+{
+	memset(tuple, 0, sizeof(*tuple));
+	if (flags & BPF_F_LINK)
+		return bpf_mprog_link(tuple, object, flags, type);
+	return bpf_mprog_prog(tuple, object, flags, type);
+}
+
+static void bpf_mprog_tuple_put(struct bpf_tuple *tuple)
+{
+	if (tuple->link)
+		bpf_link_put(tuple->link);
+	else if (tuple->prog)
+		bpf_prog_put(tuple->prog);
+}
+
+static int bpf_mprog_replace(struct bpf_mprog_entry *entry,
+			     struct bpf_tuple *ntuple, int idx)
+{
+	struct bpf_mprog_fp *fp;
+	struct bpf_mprog_cp *cp;
+	struct bpf_prog *oprog;
+
+	bpf_mprog_read(entry, idx, &fp, &cp);
+	oprog = READ_ONCE(fp->prog);
+	bpf_mprog_write(fp, cp, ntuple);
+	if (!ntuple->link) {
+		WARN_ON_ONCE(cp->link);
+		bpf_prog_put(oprog);
+	}
+	return BPF_MPROG_KEEP;
+}
+
+static int bpf_mprog_insert(struct bpf_mprog_entry *entry,
+			    struct bpf_tuple *ntuple, int idx, u32 flags)
+{
+	int i, j = 0, total = bpf_mprog_total(entry);
+	struct bpf_mprog_cp *cp, cpp[BPF_MPROG_MAX] = {};
+	struct bpf_mprog_fp *fp, *fpp;
+	struct bpf_mprog_entry *peer;
+
+	peer = bpf_mprog_peer(entry);
+	bpf_mprog_entry_clear(peer);
+	if (idx < 0) {
+		bpf_mprog_read_fp(peer, j, &fpp);
+		bpf_mprog_write_fp(fpp, ntuple);
+		bpf_mprog_write_cp(&cpp[j], ntuple);
+		j++;
+	}
+	for (i = 0; i <= total; i++) {
+		bpf_mprog_read_fp(peer, j, &fpp);
+		if (idx == i && (flags & BPF_F_AFTER)) {
+			bpf_mprog_write(fpp, &cpp[j], ntuple);
+			j++;
+			bpf_mprog_read_fp(peer, j, &fpp);
+		}
+		if (i < total) {
+			bpf_mprog_read(entry, i, &fp, &cp);
+			bpf_mprog_copy(fpp, &cpp[j], fp, cp);
+			j++;
+		}
+		if (idx == i && (flags & BPF_F_BEFORE)) {
+			bpf_mprog_read_fp(peer, j, &fpp);
+			bpf_mprog_write(fpp, &cpp[j], ntuple);
+			j++;
+		}
+	}
+	bpf_mprog_commit_cp(peer, cpp);
+	bpf_mprog_inc(peer);
+	return BPF_MPROG_SWAP;
+}
+
+static int bpf_mprog_tuple_confirm(struct bpf_mprog_entry *entry,
+				   struct bpf_tuple *dtuple, int idx)
+{
+	int first = 0, last = bpf_mprog_total(entry) - 1;
+	struct bpf_mprog_cp *cp;
+	struct bpf_mprog_fp *fp;
+	struct bpf_prog *prog;
+	struct bpf_link *link;
+
+	if (idx <= first)
+		bpf_mprog_read(entry, first, &fp, &cp);
+	else if (idx >= last)
+		bpf_mprog_read(entry, last, &fp, &cp);
+	else
+		bpf_mprog_read(entry, idx, &fp, &cp);
+
+	prog = READ_ONCE(fp->prog);
+	link = cp->link;
+	if (!dtuple->link && link)
+		return -EBUSY;
+
+	WARN_ON_ONCE(dtuple->prog && dtuple->prog != prog);
+	WARN_ON_ONCE(dtuple->link && dtuple->link != link);
+
+	dtuple->prog = prog;
+	dtuple->link = link;
+	return 0;
+}
+
+static int bpf_mprog_delete(struct bpf_mprog_entry *entry,
+			    struct bpf_tuple *dtuple, int idx)
+{
+	int i = 0, j, ret, total = bpf_mprog_total(entry);
+	struct bpf_mprog_cp *cp, cpp[BPF_MPROG_MAX] = {};
+	struct bpf_mprog_fp *fp, *fpp;
+	struct bpf_mprog_entry *peer;
+
+	ret = bpf_mprog_tuple_confirm(entry, dtuple, idx);
+	if (ret)
+		return ret;
+	peer = bpf_mprog_peer(entry);
+	bpf_mprog_entry_clear(peer);
+	if (idx < 0)
+		i++;
+	if (idx == total)
+		total--;
+	for (j = 0; i < total; i++) {
+		if (idx == i)
+			continue;
+		bpf_mprog_read_fp(peer, j, &fpp);
+		bpf_mprog_read(entry, i, &fp, &cp);
+		bpf_mprog_copy(fpp, &cpp[j], fp, cp);
+		j++;
+	}
+	bpf_mprog_commit_cp(peer, cpp);
+	bpf_mprog_dec(peer);
+	bpf_mprog_mark_ref(peer, dtuple);
+	return bpf_mprog_total(peer) ?
+	       BPF_MPROG_SWAP : BPF_MPROG_FREE;
+}
+
+/* In bpf_mprog_pos_*() we evaluate the target position for the BPF
+ * program/link that needs to be replaced, inserted or deleted for
+ * each "rule" independently. If all rules agree on that position
+ * or existing element, then enact replacement, addition or deletion.
+ * If this is not the case, then the request cannot be satisfied and
+ * we bail out with an error.
+ */
+static int bpf_mprog_pos_exact(struct bpf_mprog_entry *entry,
+			       struct bpf_tuple *tuple)
+{
+	struct bpf_mprog_fp *fp;
+	struct bpf_mprog_cp *cp;
+	int i;
+
+	for (i = 0; i < bpf_mprog_total(entry); i++) {
+		bpf_mprog_read(entry, i, &fp, &cp);
+		if (tuple->prog == READ_ONCE(fp->prog))
+			return tuple->link == cp->link ? i : -EBUSY;
+	}
+	return -ENOENT;
+}
+
+static int bpf_mprog_pos_before(struct bpf_mprog_entry *entry,
+				struct bpf_tuple *tuple)
+{
+	struct bpf_mprog_fp *fp;
+	struct bpf_mprog_cp *cp;
+	int i;
+
+	for (i = 0; i < bpf_mprog_total(entry); i++) {
+		bpf_mprog_read(entry, i, &fp, &cp);
+		if (tuple->prog == READ_ONCE(fp->prog) &&
+		    (!tuple->link || tuple->link == cp->link))
+			return i - 1;
+	}
+	return tuple->prog ? -ENOENT : -1;
+}
+
+static int bpf_mprog_pos_after(struct bpf_mprog_entry *entry,
+			       struct bpf_tuple *tuple)
+{
+	struct bpf_mprog_fp *fp;
+	struct bpf_mprog_cp *cp;
+	int i;
+
+	for (i = 0; i < bpf_mprog_total(entry); i++) {
+		bpf_mprog_read(entry, i, &fp, &cp);
+		if (tuple->prog == READ_ONCE(fp->prog) &&
+		    (!tuple->link || tuple->link == cp->link))
+			return i + 1;
+	}
+	return tuple->prog ? -ENOENT : bpf_mprog_total(entry);
+}
+
+int bpf_mprog_attach(struct bpf_mprog_entry *entry, struct bpf_prog *prog_new,
+		     struct bpf_link *link, struct bpf_prog *prog_old,
+		     u32 flags, u32 object, u64 revision)
+{
+	struct bpf_tuple rtuple, ntuple = {
+		.prog = prog_new,
+		.link = link,
+	}, otuple = {
+		.prog = prog_old,
+		.link = link,
+	};
+	int ret, idx = -2, tidx;
+
+	if (revision && revision != bpf_mprog_revision(entry))
+		return -ESTALE;
+	if (bpf_mprog_exists(entry, prog_new))
+		return -EEXIST;
+	ret = bpf_mprog_tuple_relative(&rtuple, object,
+				       flags & ~BPF_F_REPLACE,
+				       prog_new->type);
+	if (ret)
+		return ret;
+	if (flags & BPF_F_REPLACE) {
+		tidx = bpf_mprog_pos_exact(entry, &otuple);
+		if (tidx < 0) {
+			ret = tidx;
+			goto out;
+		}
+		idx = tidx;
+	}
+	if (flags & BPF_F_BEFORE) {
+		tidx = bpf_mprog_pos_before(entry, &rtuple);
+		if (tidx < -1 || (idx >= -1 && tidx != idx)) {
+			ret = tidx < -1 ? tidx : -EDOM;
+			goto out;
+		}
+		idx = tidx;
+	}
+	if (flags & BPF_F_AFTER) {
+		tidx = bpf_mprog_pos_after(entry, &rtuple);
+		if (tidx < -1 || (idx >= -1 && tidx != idx)) {
+			ret = tidx < 0 ? tidx : -EDOM;
+			goto out;
+		}
+		idx = tidx;
+	}
+	if (idx < -1) {
+		if (rtuple.prog || flags) {
+			ret = -EINVAL;
+			goto out;
+		}
+		idx = bpf_mprog_total(entry);
+		flags = BPF_F_AFTER;
+	}
+	if (idx >= bpf_mprog_max()) {
+		ret = -EDOM;
+		goto out;
+	}
+	if (flags & BPF_F_REPLACE)
+		ret = bpf_mprog_replace(entry, &ntuple, idx);
+	else
+		ret = bpf_mprog_insert(entry, &ntuple, idx, flags);
+out:
+	bpf_mprog_tuple_put(&rtuple);
+	return ret;
+}
+
+int bpf_mprog_detach(struct bpf_mprog_entry *entry, struct bpf_prog *prog,
+		     struct bpf_link *link, u32 flags, u32 object, u64 revision)
+{
+	struct bpf_tuple rtuple, dtuple = {
+		.prog = prog,
+		.link = link,
+	};
+	int ret, idx = -2, tidx;
+
+	if (flags & BPF_F_REPLACE)
+		return -EINVAL;
+	if (revision && revision != bpf_mprog_revision(entry))
+		return -ESTALE;
+	ret = bpf_mprog_tuple_relative(&rtuple, object, flags,
+				       prog ? prog->type :
+				       BPF_PROG_TYPE_UNSPEC);
+	if (ret)
+		return ret;
+	if (dtuple.prog) {
+		tidx = bpf_mprog_pos_exact(entry, &dtuple);
+		if (tidx < 0) {
+			ret = tidx;
+			goto out;
+		}
+		idx = tidx;
+	}
+	if (flags & BPF_F_BEFORE) {
+		tidx = bpf_mprog_pos_before(entry, &rtuple);
+		if (tidx < -1 || (idx >= -1 && tidx != idx)) {
+			ret = tidx < -1 ? tidx : -EDOM;
+			goto out;
+		}
+		idx = tidx;
+	}
+	if (flags & BPF_F_AFTER) {
+		tidx = bpf_mprog_pos_after(entry, &rtuple);
+		if (tidx < -1 || (idx >= -1 && tidx != idx)) {
+			ret = tidx < 0 ? tidx : -EDOM;
+			goto out;
+		}
+		idx = tidx;
+	}
+	if (idx < -1) {
+		if (rtuple.prog || flags) {
+			ret = -EINVAL;
+			goto out;
+		}
+		idx = bpf_mprog_total(entry);
+		flags = BPF_F_AFTER;
+	}
+	if (idx >= bpf_mprog_max()) {
+		ret = -EDOM;
+		goto out;
+	}
+	ret = bpf_mprog_delete(entry, &dtuple, idx);
+out:
+	bpf_mprog_tuple_put(&rtuple);
+	return ret;
+}
+
+int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr,
+		    struct bpf_mprog_entry *entry)
+{
+	u32 __user *uprog_flags, *ulink_flags;
+	u32 __user *uprog_id, *ulink_id;
+	struct bpf_mprog_fp *fp;
+	struct bpf_mprog_cp *cp;
+	struct bpf_prog *prog;
+	const u32 flags = 0;
+	int i, ret = 0;
+	u32 id, count;
+	u64 revision;
+
+	if (attr->query.query_flags || attr->query.attach_flags)
+		return -EINVAL;
+	revision = bpf_mprog_revision(entry);
+	count = bpf_mprog_total(entry);
+	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
+		return -EFAULT;
+	if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision)))
+		return -EFAULT;
+	if (copy_to_user(&uattr->query.count, &count, sizeof(count)))
+		return -EFAULT;
+	uprog_id = u64_to_user_ptr(attr->query.prog_ids);
+	uprog_flags = u64_to_user_ptr(attr->query.prog_attach_flags);
+	ulink_id = u64_to_user_ptr(attr->query.link_ids);
+	ulink_flags = u64_to_user_ptr(attr->query.link_attach_flags);
+	if (attr->query.count == 0 || !uprog_id || !count)
+		return 0;
+	if (attr->query.count < count) {
+		count = attr->query.count;
+		ret = -ENOSPC;
+	}
+	for (i = 0; i < bpf_mprog_max(); i++) {
+		bpf_mprog_read(entry, i, &fp, &cp);
+		prog = READ_ONCE(fp->prog);
+		if (!prog)
+			break;
+		id = prog->aux->id;
+		if (copy_to_user(uprog_id + i, &id, sizeof(id)))
+			return -EFAULT;
+		if (uprog_flags &&
+		    copy_to_user(uprog_flags + i, &flags, sizeof(flags)))
+			return -EFAULT;
+		id = cp->link ? cp->link->id : 0;
+		if (ulink_id &&
+		    copy_to_user(ulink_id + i, &id, sizeof(id)))
+			return -EFAULT;
+		if (ulink_flags &&
+		    copy_to_user(ulink_flags + i, &flags, sizeof(flags)))
+			return -EFAULT;
+		if (i + 1 == count)
+			break;
+	}
+	return ret;
+}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 60a9d59beeab..74879c538f2b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1103,7 +1103,12 @@ enum bpf_link_type {
  */
 #define BPF_F_ALLOW_OVERRIDE	(1U << 0)
 #define BPF_F_ALLOW_MULTI	(1U << 1)
+/* Generic attachment flags. */
 #define BPF_F_REPLACE		(1U << 2)
+#define BPF_F_BEFORE		(1U << 3)
+#define BPF_F_AFTER		(1U << 4)
+#define BPF_F_ID		(1U << 5)
+#define BPF_F_LINK		BPF_F_LINK /* 1 << 13 */
 
 /* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the
  * verifier will perform strict alignment checking as if the kernel
@@ -1434,14 +1439,19 @@ union bpf_attr {
 	};
 
 	struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
-		__u32		target_fd;	/* container object to attach to */
-		__u32		attach_bpf_fd;	/* eBPF program to attach */
+		union {
+			__u32	target_fd;	/* target object to attach to or ... */
+			__u32	target_ifindex;	/* target ifindex */
+		};
+		__u32		attach_bpf_fd;
 		__u32		attach_type;
 		__u32		attach_flags;
-		__u32		replace_bpf_fd;	/* previously attached eBPF
-						 * program to replace if
-						 * BPF_F_REPLACE is used
-						 */
+		__u32		replace_bpf_fd;
+		union {
+			__u32	relative_fd;
+			__u32	relative_id;
+		};
+		__u64		expected_revision;
 	};
 
 	struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */
@@ -1487,16 +1497,26 @@ union bpf_attr {
 	} info;
 
 	struct { /* anonymous struct used by BPF_PROG_QUERY command */
-		__u32		target_fd;	/* container object to query */
+		union {
+			__u32	target_fd;	/* target object to query or ... */
+			__u32	target_ifindex;	/* target ifindex */
+		};
 		__u32		attach_type;
 		__u32		query_flags;
 		__u32		attach_flags;
 		__aligned_u64	prog_ids;
-		__u32		prog_cnt;
+		union {
+			__u32	prog_cnt;
+			__u32	count;
+		};
+		__u32		:32;
 		/* output: per-program attach_flags.
 		 * not allowed to be set during effective query.
 		 */
 		__aligned_u64	prog_attach_flags;
+		__aligned_u64	link_ids;
+		__aligned_u64	link_attach_flags;
+		__u64		revision;
 	} query;
 
 	struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
-- 
2.34.1



* [PATCH bpf-next v4 2/8] bpf: Add fd-based tcx multi-prog infra with link support
  2023-07-10 20:12 [PATCH bpf-next v4 0/8] BPF link support for tc BPF programs Daniel Borkmann
  2023-07-10 20:12 ` [PATCH bpf-next v4 1/8] bpf: Add generic attach/detach/query API for multi-progs Daniel Borkmann
@ 2023-07-10 20:12 ` Daniel Borkmann
  2023-07-10 20:12 ` [PATCH bpf-next v4 3/8] libbpf: Add opts-based attach/detach/query API for tcx Daniel Borkmann
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Daniel Borkmann @ 2023-07-10 20:12 UTC (permalink / raw)
  To: ast
  Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
	toke, davem, bpf, netdev, Daniel Borkmann

This work refactors and adds a lightweight extension ("tcx") to the tc BPF
ingress and egress data path side to allow BPF program management based
on fds via the bpf() syscall through the newly added generic multi-prog API.
The main goal behind this work, which we also presented at LPC [0] last year
with a recent update at LSF/MM/BPF this year [3], is to support the
long-awaited BPF link functionality for tc BPF programs, which allows for a
model of safe ownership and program detachment.

Given the rise in tc BPF users in cloud native environments, this becomes
necessary to avoid hard-to-debug incidents caused either by stale leftover
programs or by 3rd party applications accidentally stepping on each other's
toes. As a recap, a BPF link represents the attachment of a BPF program to a
BPF hook point. The BPF link holds a single reference to keep the BPF program
alive. Moreover, hook points do not reference a BPF link, only the
application's fd or pinning does. A BPF link holds meta-data specific to the
attachment and implements operations for link creation, (atomic) BPF program
update, detachment and introspection. The motivation for BPF links for tc BPF
programs is multi-fold, for example:

  - From Meta: "It's especially important for applications that are deployed
    fleet-wide and that don't "control" hosts they are deployed to. If such
    application crashes and no one notices and does anything about that, BPF
    program will keep running draining resources or even just, say, dropping
    packets. We at FB had outages due to such permanent BPF attachment
    semantics. With fd-based BPF link we are getting a framework, which allows
    safe, auto-detachable behavior by default, unless application explicitly
    opts in by pinning the BPF link." [1]

  - From the Cilium side, the tc BPF programs we attach to host-facing veth
    devices and phys devices build the core datapath for Kubernetes Pods, and
    they implement forwarding, load-balancing, policy, EDT-management, etc,
    within BPF. Currently there is no concept of 'safe' ownership, e.g. we've
    recently experienced hard-to-debug issues in a user's staging environment
    where another Kubernetes application using tc BPF attached to the same
    prio/handle of cls_bpf accidentally wiped all Cilium-based BPF programs
    from underneath it. The goal is to establish a clear/safe ownership model
    via links which cannot accidentally be overridden. [0,2]

BPF links for tc can co-exist with non-link attachments, and the semantics are
in line also with XDP links: BPF links cannot replace other BPF links, BPF
links cannot replace non-BPF links, non-BPF links cannot replace BPF links and
lastly only non-BPF links can replace non-BPF links. In case of Cilium, this
would solve mentioned issue of safe ownership model as 3rd party applications
would not be able to accidentally wipe Cilium programs, even if they are not
BPF link aware.

Earlier attempts [4] tried to integrate BPF links into the core tc machinery
to solve this for cls_bpf, which was intrusive to the generic tc kernel API
with extensions specific only to cls_bpf and suboptimal/complex since cls_bpf
could also be wiped from the qdisc. Locking a tc BPF program in place this way
gets into layering hacks given that the two object models are vastly different.

We instead implemented the tcx (tc 'express') layer, which is an fd-based tc BPF
attach API, so that the BPF link implementation blends in naturally, similar to
other link types which are fd-based, and without the need for changing core tc
internal APIs. BPF programs for tc can then be successively migrated from classic
cls_bpf to the new tc BPF link without needing to change the program's source
code; adapting the BPF loader mechanics for attaching is sufficient.
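
As a rough sketch of what the loader-side change boils down to (not part of
this patch; it assumes UAPI headers with the additions below and a prog_fd of
an already loaded BPF_PROG_TYPE_SCHED_CLS program), creating a tcx BPF link
on the egress side could look like:

  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  static int tcx_link_create(int prog_fd, int ifindex)
  {
          union bpf_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.link_create.prog_fd = prog_fd;
          attr.link_create.target_ifindex = ifindex;
          attr.link_create.attach_type = BPF_TCX_EGRESS;
          /* No flags and no relative_{fd,id}: default append behavior. */
          return syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
  }

The returned link fd owns the attachment; once the fd is closed without the
link being pinned, the program is auto-detached. The libbpf link-based API
for tcx comes in a later patch of this series.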

For the current tc framework there is no change in behavior, and this work
does not touch core tc kernel APIs either. The gist of this patch is that the
ingress and egress hooks gain a lightweight, qdisc-less extension for BPF to
attach its tc BPF programs to, in other words, a minimal entry point for tc
BPF. The name tcx has been suggested in discussions of earlier revisions of
this work as a good fit, and to more easily distinguish between the classic
cls_bpf attachment and the fd-based one.

For the ingress and egress tcx points, the device holds a cache-friendly array
with program pointers which is separated from control plane (slow-path) data.
Earlier versions of this work used priority to determine ordering and to
express dependencies, similar to classic tc, but it was challenged that
something more future-proof with a better user experience is required. Hence
this resulted in the design and development of the generic attach/detach/query
API for multi-progs. See the prior patch and its discussion on the API design.
tcx is the first user and later we plan to integrate others as well; for
example, one candidate is multi-prog support for XDP, which would benefit from
and have the same 'look and feel' from an API perspective.

The goal with tcx is to have maximum compatibility with existing tc BPF
programs, so they don't need to be rewritten specifically. Compatibility to
call into classic tcf_classify() is also provided in order to allow successive
migration or to let both cleanly co-exist where needed, given it's all one
logical tc layer and tcx plus classic tc cls/act build one logical overall
processing pipeline.

tcx supports the simplified return codes: TCX_NEXT, which is non-terminating
(go to the next program), and the terminating ones TCX_PASS, TCX_DROP and
TCX_REDIRECT. The fd-based API is behind a static key, so that when unused the
code is not entered. The struct tcx_entry's program array is currently static,
but could be made dynamic if necessary at a later point. The a/b pair swap
design has been chosen so that for detachment there are no allocations which
could otherwise fail.
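
For illustration, a minimal tc BPF program that is compatible with tcx (not
part of this patch; BPF_PROG_TYPE_SCHED_CLS as before, SEC("tc") being the
usual libbpf section name and "tcx_example" just a placeholder) only needs to
return one of the codes above:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("tc")
  int tcx_example(struct __sk_buff *skb)
  {
          /* Terminating verdict: accept the packet. Returning TCX_NEXT
           * instead would hand the skb to the next program in the tcx
           * array, TCX_DROP would drop it.
           */
          return TCX_PASS;
  }

  char __license[] SEC("license") = "GPL";

Existing programs returning TC_ACT_* verdicts keep working given the TCX_*
values remain compatible with their TC_ACT_* counter-parts, and unknown
return codes are mapped to TCX_NEXT.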

The work has been tested with the tc-testing selftest suite, which all passes,
as well as with the tc BPF tests from the BPF CI, and also with Cilium's L4LB.

Kudos also to Nikolay Aleksandrov and Martin Lau for in-depth early reviews
of this work.

  [0] https://lpc.events/event/16/contributions/1353/
  [1] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@mail.gmail.com
  [2] https://colocatedeventseu2023.sched.com/event/1Jo6O/tales-from-an-ebpf-programs-murder-mystery-hemanth-malla-guillaume-fournier-datadog
  [3] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf
  [4] https://lore.kernel.org/bpf/20210604063116.234316-1-memxor@gmail.com

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 MAINTAINERS                    |   4 +-
 include/linux/bpf_mprog.h      |   9 +
 include/linux/netdevice.h      |  15 +-
 include/linux/skbuff.h         |   4 +-
 include/net/sch_generic.h      |   2 +-
 include/net/tcx.h              | 199 +++++++++++++++++++
 include/uapi/linux/bpf.h       |  34 +++-
 kernel/bpf/Kconfig             |   1 +
 kernel/bpf/Makefile            |   1 +
 kernel/bpf/syscall.c           |  83 ++++++--
 kernel/bpf/tcx.c               | 351 +++++++++++++++++++++++++++++++++
 net/Kconfig                    |   5 +
 net/core/dev.c                 | 267 +++++++++++++++----------
 net/core/filter.c              |   4 +-
 net/sched/Kconfig              |   4 +-
 net/sched/sch_ingress.c        |  61 +++++-
 tools/include/uapi/linux/bpf.h |  34 +++-
 17 files changed, 934 insertions(+), 144 deletions(-)
 create mode 100644 include/net/tcx.h
 create mode 100644 kernel/bpf/tcx.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7e5ba799d1c5..0a1327cf4e34 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3830,13 +3830,15 @@ L:	netdev@vger.kernel.org
 S:	Maintained
 F:	kernel/bpf/bpf_struct*
 
-BPF [NETWORKING] (tc BPF, sock_addr)
+BPF [NETWORKING] (tcx & tc BPF, sock_addr)
 M:	Martin KaFai Lau <martin.lau@linux.dev>
 M:	Daniel Borkmann <daniel@iogearbox.net>
 R:	John Fastabend <john.fastabend@gmail.com>
 L:	bpf@vger.kernel.org
 L:	netdev@vger.kernel.org
 S:	Maintained
+F:	include/net/tcx.h
+F:	kernel/bpf/tcx.c
 F:	net/core/filter.c
 F:	net/sched/act_bpf.c
 F:	net/sched/cls_bpf.c
diff --git a/include/linux/bpf_mprog.h b/include/linux/bpf_mprog.h
index 63f0f35bd3e2..ffe39bac011f 100644
--- a/include/linux/bpf_mprog.h
+++ b/include/linux/bpf_mprog.h
@@ -340,4 +340,13 @@ int bpf_mprog_detach(struct bpf_mprog_entry *entry, struct bpf_prog *prog,
 int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr,
 		    struct bpf_mprog_entry *entry);
 
+static inline bool bpf_mprog_supported(enum bpf_prog_type type)
+{
+	switch (type) {
+	case BPF_PROG_TYPE_SCHED_CLS:
+		return true;
+	default:
+		return false;
+	}
+}
 #endif /* __BPF_MPROG_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b828c7a75be2..024314c68bc8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1930,8 +1930,7 @@ enum netdev_ml_priv_type {
  *
  *	@rx_handler:		handler for received packets
  *	@rx_handler_data: 	XXX: need comments on this one
- *	@miniq_ingress:		ingress/clsact qdisc specific data for
- *				ingress processing
+ *	@tcx_ingress:		BPF & clsact qdisc specific data for ingress processing
  *	@ingress_queue:		XXX: need comments on this one
  *	@nf_hooks_ingress:	netfilter hooks executed for ingress packets
  *	@broadcast:		hw bcast address
@@ -1952,8 +1951,7 @@ enum netdev_ml_priv_type {
  *	@xps_maps:		all CPUs/RXQs maps for XPS device
  *
  *	@xps_maps:	XXX: need comments on this one
- *	@miniq_egress:		clsact qdisc specific data for
- *				egress processing
+ *	@tcx_egress:		BPF & clsact qdisc specific data for egress processing
  *	@nf_hooks_egress:	netfilter hooks executed for egress packets
  *	@qdisc_hash:		qdisc hash table
  *	@watchdog_timeo:	Represents the timeout that is used by
@@ -2252,9 +2250,8 @@ struct net_device {
 	unsigned int		gro_ipv4_max_size;
 	rx_handler_func_t __rcu	*rx_handler;
 	void __rcu		*rx_handler_data;
-
-#ifdef CONFIG_NET_CLS_ACT
-	struct mini_Qdisc __rcu	*miniq_ingress;
+#ifdef CONFIG_NET_XGRESS
+	struct bpf_mprog_entry __rcu *tcx_ingress;
 #endif
 	struct netdev_queue __rcu *ingress_queue;
 #ifdef CONFIG_NETFILTER_INGRESS
@@ -2282,8 +2279,8 @@ struct net_device {
 #ifdef CONFIG_XPS
 	struct xps_dev_maps __rcu *xps_maps[XPS_MAPS_MAX];
 #endif
-#ifdef CONFIG_NET_CLS_ACT
-	struct mini_Qdisc __rcu	*miniq_egress;
+#ifdef CONFIG_NET_XGRESS
+	struct bpf_mprog_entry __rcu *tcx_egress;
 #endif
 #ifdef CONFIG_NETFILTER_EGRESS
 	struct nf_hook_entries __rcu *nf_hooks_egress;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 91ed66952580..ed83f1c5fc1f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -944,7 +944,7 @@ struct sk_buff {
 	__u8			__mono_tc_offset[0];
 	/* public: */
 	__u8			mono_delivery_time:1;	/* See SKB_MONO_DELIVERY_TIME_MASK */
-#ifdef CONFIG_NET_CLS_ACT
+#ifdef CONFIG_NET_XGRESS
 	__u8			tc_at_ingress:1;	/* See TC_AT_INGRESS_MASK */
 	__u8			tc_skip_classify:1;
 #endif
@@ -993,7 +993,7 @@ struct sk_buff {
 	__u8			csum_not_inet:1;
 #endif
 
-#ifdef CONFIG_NET_SCHED
+#if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS)
 	__u16			tc_index;	/* traffic control index */
 #endif
 
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index e92f73bb3198..15be2d96b06d 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -703,7 +703,7 @@ int skb_do_redirect(struct sk_buff *);
 
 static inline bool skb_at_tc_ingress(const struct sk_buff *skb)
 {
-#ifdef CONFIG_NET_CLS_ACT
+#ifdef CONFIG_NET_XGRESS
 	return skb->tc_at_ingress;
 #else
 	return false;
diff --git a/include/net/tcx.h b/include/net/tcx.h
new file mode 100644
index 000000000000..6c84817f6a6c
--- /dev/null
+++ b/include/net/tcx.h
@@ -0,0 +1,199 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2023 Isovalent */
+#ifndef __NET_TCX_H
+#define __NET_TCX_H
+
+#include <linux/bpf.h>
+#include <linux/bpf_mprog.h>
+
+#include <net/sch_generic.h>
+
+struct mini_Qdisc;
+
+struct tcx_entry {
+	struct mini_Qdisc __rcu *miniq;
+	struct bpf_mprog_bundle bundle;
+	bool miniq_active;
+	struct rcu_head rcu;
+};
+
+struct tcx_link {
+	struct bpf_link link;
+	struct net_device *dev;
+	u32 location;
+};
+
+static inline void tcx_set_ingress(struct sk_buff *skb, bool ingress)
+{
+#ifdef CONFIG_NET_XGRESS
+	skb->tc_at_ingress = ingress;
+#endif
+}
+
+#ifdef CONFIG_NET_XGRESS
+static inline struct tcx_entry *tcx_entry(struct bpf_mprog_entry *entry)
+{
+	struct bpf_mprog_bundle *bundle = entry->parent;
+
+	return container_of(bundle, struct tcx_entry, bundle);
+}
+
+static inline struct tcx_link *tcx_link(struct bpf_link *link)
+{
+	return container_of(link, struct tcx_link, link);
+}
+
+static inline const struct tcx_link *tcx_link_const(const struct bpf_link *link)
+{
+	return tcx_link((struct bpf_link *)link);
+}
+
+void tcx_inc(void);
+void tcx_dec(void);
+
+static inline void
+tcx_entry_update(struct net_device *dev, struct bpf_mprog_entry *entry,
+		 bool ingress)
+{
+	ASSERT_RTNL();
+	if (ingress)
+		rcu_assign_pointer(dev->tcx_ingress, entry);
+	else
+		rcu_assign_pointer(dev->tcx_egress, entry);
+}
+
+static inline bool tcx_entry_needs_release(struct bpf_mprog_entry *entry,
+					   int code)
+{
+	ASSERT_RTNL();
+	return code == BPF_MPROG_FREE && !tcx_entry(entry)->miniq_active;
+}
+
+static inline struct bpf_mprog_entry *
+tcx_entry_fetch(struct net_device *dev, bool ingress)
+{
+	ASSERT_RTNL();
+	if (ingress)
+		return rcu_dereference_rtnl(dev->tcx_ingress);
+	else
+		return rcu_dereference_rtnl(dev->tcx_egress);
+}
+
+static inline struct bpf_mprog_entry *tcx_entry_create(void)
+{
+	struct tcx_entry *tcx = kzalloc(sizeof(*tcx), GFP_KERNEL);
+
+	if (tcx) {
+		bpf_mprog_bundle_init(&tcx->bundle);
+		return &tcx->bundle.a;
+	}
+	return NULL;
+}
+
+static inline void tcx_entry_free(struct bpf_mprog_entry *entry)
+{
+	kfree_rcu(tcx_entry(entry), rcu);
+}
+
+static inline struct bpf_mprog_entry *
+tcx_entry_fetch_or_create(struct net_device *dev, bool ingress, bool *created)
+{
+	struct bpf_mprog_entry *entry = tcx_entry_fetch(dev, ingress);
+
+	*created = false;
+	if (!entry) {
+		entry = tcx_entry_create();
+		if (!entry)
+			return NULL;
+		*created = true;
+	}
+	return entry;
+}
+
+static inline void tcx_skeys_inc(bool ingress)
+{
+	tcx_inc();
+	if (ingress)
+		net_inc_ingress_queue();
+	else
+		net_inc_egress_queue();
+}
+
+static inline void tcx_skeys_dec(bool ingress)
+{
+	if (ingress)
+		net_dec_ingress_queue();
+	else
+		net_dec_egress_queue();
+	tcx_dec();
+}
+
+static inline void tcx_miniq_active(struct bpf_mprog_entry *entry,
+				    const bool active)
+{
+	ASSERT_RTNL();
+	tcx_entry(entry)->miniq_active = active;
+}
+
+static inline enum tcx_action_base tcx_action_code(struct sk_buff *skb,
+						   int code)
+{
+	switch (code) {
+	case TCX_PASS:
+		skb->tc_index = qdisc_skb_cb(skb)->tc_classid;
+		fallthrough;
+	case TCX_DROP:
+	case TCX_REDIRECT:
+		return code;
+	case TCX_NEXT:
+	default:
+		return TCX_NEXT;
+	}
+}
+#endif /* CONFIG_NET_XGRESS */
+
+#if defined(CONFIG_NET_XGRESS) && defined(CONFIG_BPF_SYSCALL)
+int tcx_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog);
+int tcx_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
+int tcx_prog_detach(const union bpf_attr *attr, struct bpf_prog *prog);
+void tcx_uninstall(struct net_device *dev, bool ingress);
+
+int tcx_prog_query(const union bpf_attr *attr,
+		   union bpf_attr __user *uattr);
+
+static inline void dev_tcx_uninstall(struct net_device *dev)
+{
+	ASSERT_RTNL();
+	tcx_uninstall(dev, true);
+	tcx_uninstall(dev, false);
+}
+#else
+static inline int tcx_prog_attach(const union bpf_attr *attr,
+				  struct bpf_prog *prog)
+{
+	return -EINVAL;
+}
+
+static inline int tcx_link_attach(const union bpf_attr *attr,
+				  struct bpf_prog *prog)
+{
+	return -EINVAL;
+}
+
+static inline int tcx_prog_detach(const union bpf_attr *attr,
+				  struct bpf_prog *prog)
+{
+	return -EINVAL;
+}
+
+static inline int tcx_prog_query(const union bpf_attr *attr,
+				 union bpf_attr __user *uattr)
+{
+	return -EINVAL;
+}
+
+static inline void dev_tcx_uninstall(struct net_device *dev)
+{
+}
+#endif /* CONFIG_NET_XGRESS && CONFIG_BPF_SYSCALL */
+#endif /* __NET_TCX_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 74879c538f2b..98c4a3a6e137 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1036,6 +1036,8 @@ enum bpf_attach_type {
 	BPF_LSM_CGROUP,
 	BPF_STRUCT_OPS,
 	BPF_NETFILTER,
+	BPF_TCX_INGRESS,
+	BPF_TCX_EGRESS,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -1053,7 +1055,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_KPROBE_MULTI = 8,
 	BPF_LINK_TYPE_STRUCT_OPS = 9,
 	BPF_LINK_TYPE_NETFILTER = 10,
-
+	BPF_LINK_TYPE_TCX = 11,
 	MAX_BPF_LINK_TYPE,
 };
 
@@ -1559,13 +1561,13 @@ union bpf_attr {
 			__u32		map_fd;		/* struct_ops to attach */
 		};
 		union {
-			__u32		target_fd;	/* object to attach to */
-			__u32		target_ifindex; /* target ifindex */
+			__u32	target_fd;	/* target object to attach to or ... */
+			__u32	target_ifindex; /* target ifindex */
 		};
 		__u32		attach_type;	/* attach type */
 		__u32		flags;		/* extra flags */
 		union {
-			__u32		target_btf_id;	/* btf_id of target to attach to */
+			__u32	target_btf_id;	/* btf_id of target to attach to */
 			struct {
 				__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
 				__u32		iter_info_len;	/* iter_info length */
@@ -1599,6 +1601,13 @@ union bpf_attr {
 				__s32		priority;
 				__u32		flags;
 			} netfilter;
+			struct {
+				union {
+					__u32	relative_fd;
+					__u32	relative_id;
+				};
+				__u64		expected_revision;
+			} tcx;
 		};
 	} link_create;
 
@@ -6207,6 +6216,19 @@ struct bpf_sock_tuple {
 	};
 };
 
+/* (Simplified) user return codes for tcx prog type.
+ * A valid tcx program must return one of these defined values. All other
+ * return codes are reserved for future use. Must remain compatible with
+ * their TC_ACT_* counterparts. For compatibility in behavior, unknown
+ * return codes are mapped to TCX_NEXT.
+ */
+enum tcx_action_base {
+	TCX_NEXT	= -1,
+	TCX_PASS	= 0,
+	TCX_DROP	= 2,
+	TCX_REDIRECT	= 7,
+};
+
 struct bpf_xdp_sock {
 	__u32 queue_id;
 };
@@ -6459,6 +6481,10 @@ struct bpf_link_info {
 			__s32 priority;
 			__u32 flags;
 		} netfilter;
+		struct {
+			__u32 ifindex;
+			__u32 attach_type;
+		} tcx;
 	};
 } __attribute__((aligned(8)));
 
diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
index 2dfe1079f772..6a906ff93006 100644
--- a/kernel/bpf/Kconfig
+++ b/kernel/bpf/Kconfig
@@ -31,6 +31,7 @@ config BPF_SYSCALL
 	select TASKS_TRACE_RCU
 	select BINARY_PRINTF
 	select NET_SOCK_MSG if NET
+	select NET_XGRESS if NET
 	select PAGE_POOL if NET
 	default n
 	help
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 1bea2eb912cd..f526b7573e97 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_BPF_SYSCALL) += devmap.o
 obj-$(CONFIG_BPF_SYSCALL) += cpumap.o
 obj-$(CONFIG_BPF_SYSCALL) += offload.o
 obj-$(CONFIG_BPF_SYSCALL) += net_namespace.o
+obj-$(CONFIG_BPF_SYSCALL) += tcx.o
 endif
 ifeq ($(CONFIG_PERF_EVENTS),y)
 obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a2aef900519c..8c884cd50413 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -37,6 +37,8 @@
 #include <linux/trace_events.h>
 #include <net/netfilter/nf_bpf_link.h>
 
+#include <net/tcx.h>
+
 #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
 			  (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
 			  (map)->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
@@ -3588,31 +3590,45 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
 		return BPF_PROG_TYPE_XDP;
 	case BPF_LSM_CGROUP:
 		return BPF_PROG_TYPE_LSM;
+	case BPF_TCX_INGRESS:
+	case BPF_TCX_EGRESS:
+		return BPF_PROG_TYPE_SCHED_CLS;
 	default:
 		return BPF_PROG_TYPE_UNSPEC;
 	}
 }
 
-#define BPF_PROG_ATTACH_LAST_FIELD replace_bpf_fd
+#define BPF_PROG_ATTACH_LAST_FIELD expected_revision
+
+#define BPF_F_ATTACH_MASK_BASE	\
+	(BPF_F_ALLOW_OVERRIDE |	\
+	 BPF_F_ALLOW_MULTI |	\
+	 BPF_F_REPLACE)
 
-#define BPF_F_ATTACH_MASK \
-	(BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI | BPF_F_REPLACE)
+#define BPF_F_ATTACH_MASK_MPROG	\
+	(BPF_F_REPLACE |	\
+	 BPF_F_BEFORE |		\
+	 BPF_F_AFTER |		\
+	 BPF_F_ID |		\
+	 BPF_F_LINK)
 
 static int bpf_prog_attach(const union bpf_attr *attr)
 {
 	enum bpf_prog_type ptype;
 	struct bpf_prog *prog;
+	u32 mask;
 	int ret;
 
 	if (CHECK_ATTR(BPF_PROG_ATTACH))
 		return -EINVAL;
 
-	if (attr->attach_flags & ~BPF_F_ATTACH_MASK)
-		return -EINVAL;
-
 	ptype = attach_type_to_prog_type(attr->attach_type);
 	if (ptype == BPF_PROG_TYPE_UNSPEC)
 		return -EINVAL;
+	mask = bpf_mprog_supported(ptype) ?
+	       BPF_F_ATTACH_MASK_MPROG : BPF_F_ATTACH_MASK_BASE;
+	if (attr->attach_flags & ~mask)
+		return -EINVAL;
 
 	prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype);
 	if (IS_ERR(prog))
@@ -3648,6 +3664,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 		else
 			ret = cgroup_bpf_prog_attach(attr, ptype, prog);
 		break;
+	case BPF_PROG_TYPE_SCHED_CLS:
+		ret = tcx_prog_attach(attr, prog);
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -3657,25 +3676,42 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	return ret;
 }
 
-#define BPF_PROG_DETACH_LAST_FIELD attach_type
+#define BPF_PROG_DETACH_LAST_FIELD expected_revision
 
 static int bpf_prog_detach(const union bpf_attr *attr)
 {
+	struct bpf_prog *prog = NULL;
 	enum bpf_prog_type ptype;
+	int ret;
 
 	if (CHECK_ATTR(BPF_PROG_DETACH))
 		return -EINVAL;
 
 	ptype = attach_type_to_prog_type(attr->attach_type);
+	if (bpf_mprog_supported(ptype)) {
+		if (ptype == BPF_PROG_TYPE_UNSPEC)
+			return -EINVAL;
+		if (attr->attach_flags & ~BPF_F_ATTACH_MASK_MPROG)
+			return -EINVAL;
+		prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype);
+		if (IS_ERR(prog)) {
+			if ((int)attr->attach_bpf_fd > 0)
+				return PTR_ERR(prog);
+			prog = NULL;
+		}
+	}
 
 	switch (ptype) {
 	case BPF_PROG_TYPE_SK_MSG:
 	case BPF_PROG_TYPE_SK_SKB:
-		return sock_map_prog_detach(attr, ptype);
+		ret = sock_map_prog_detach(attr, ptype);
+		break;
 	case BPF_PROG_TYPE_LIRC_MODE2:
-		return lirc_prog_detach(attr);
+		ret = lirc_prog_detach(attr);
+		break;
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
-		return netns_bpf_prog_detach(attr, ptype);
+		ret = netns_bpf_prog_detach(attr, ptype);
+		break;
 	case BPF_PROG_TYPE_CGROUP_DEVICE:
 	case BPF_PROG_TYPE_CGROUP_SKB:
 	case BPF_PROG_TYPE_CGROUP_SOCK:
@@ -3684,13 +3720,21 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	case BPF_PROG_TYPE_CGROUP_SYSCTL:
 	case BPF_PROG_TYPE_SOCK_OPS:
 	case BPF_PROG_TYPE_LSM:
-		return cgroup_bpf_prog_detach(attr, ptype);
+		ret = cgroup_bpf_prog_detach(attr, ptype);
+		break;
+	case BPF_PROG_TYPE_SCHED_CLS:
+		ret = tcx_prog_detach(attr, prog);
+		break;
 	default:
-		return -EINVAL;
+		ret = -EINVAL;
 	}
+
+	if (prog)
+		bpf_prog_put(prog);
+	return ret;
 }
 
-#define BPF_PROG_QUERY_LAST_FIELD query.prog_attach_flags
+#define BPF_PROG_QUERY_LAST_FIELD query.link_attach_flags
 
 static int bpf_prog_query(const union bpf_attr *attr,
 			  union bpf_attr __user *uattr)
@@ -3738,6 +3782,9 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	case BPF_SK_MSG_VERDICT:
 	case BPF_SK_SKB_VERDICT:
 		return sock_map_bpf_prog_query(attr, uattr);
+	case BPF_TCX_INGRESS:
+	case BPF_TCX_EGRESS:
+		return tcx_prog_query(attr, uattr);
 	default:
 		return -EINVAL;
 	}
@@ -4700,6 +4747,13 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
 			goto out;
 		}
 		break;
+	case BPF_PROG_TYPE_SCHED_CLS:
+		if (attr->link_create.attach_type != BPF_TCX_INGRESS &&
+		    attr->link_create.attach_type != BPF_TCX_EGRESS) {
+			ret = -EINVAL;
+			goto out;
+		}
+		break;
 	default:
 		ptype = attach_type_to_prog_type(attr->link_create.attach_type);
 		if (ptype == BPF_PROG_TYPE_UNSPEC || ptype != prog->type) {
@@ -4751,6 +4805,9 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
 	case BPF_PROG_TYPE_XDP:
 		ret = bpf_xdp_link_attach(attr, prog);
 		break;
+	case BPF_PROG_TYPE_SCHED_CLS:
+		ret = tcx_link_attach(attr, prog);
+		break;
 	case BPF_PROG_TYPE_NETFILTER:
 		ret = bpf_nf_link_attach(attr, prog);
 		break;
diff --git a/kernel/bpf/tcx.c b/kernel/bpf/tcx.c
new file mode 100644
index 000000000000..f5211bb6714c
--- /dev/null
+++ b/kernel/bpf/tcx.c
@@ -0,0 +1,351 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+
+#include <linux/bpf.h>
+#include <linux/bpf_mprog.h>
+#include <linux/netdevice.h>
+
+#include <net/tcx.h>
+
+int tcx_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	bool created, ingress = attr->attach_type == BPF_TCX_INGRESS;
+	struct net *net = current->nsproxy->net_ns;
+	struct bpf_prog *replace_prog = NULL;
+	struct bpf_mprog_entry *entry;
+	struct net_device *dev;
+	int ret;
+
+	rtnl_lock();
+	dev = __dev_get_by_index(net, attr->target_ifindex);
+	if (!dev) {
+		ret = -ENODEV;
+		goto out;
+	}
+	if (attr->attach_flags & BPF_F_REPLACE) {
+		replace_prog = bpf_prog_get_type(attr->replace_bpf_fd,
+						 prog->type);
+		if (IS_ERR(replace_prog)) {
+			ret = PTR_ERR(replace_prog);
+			replace_prog = NULL;
+			goto out;
+		}
+	}
+	entry = tcx_entry_fetch_or_create(dev, ingress, &created);
+	if (!entry) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	ret = bpf_mprog_attach(entry, prog, NULL, replace_prog,
+			       attr->attach_flags, attr->relative_fd,
+			       attr->expected_revision);
+	if (ret >= 0) {
+		if (bpf_mprog_swap_entries(ret))
+			tcx_entry_update(dev, bpf_mprog_peer(entry), ingress);
+		bpf_mprog_commit(entry);
+		tcx_skeys_inc(ingress);
+		ret = 0;
+	} else if (created) {
+		tcx_entry_free(entry);
+	}
+out:
+	if (replace_prog)
+		bpf_prog_put(replace_prog);
+	rtnl_unlock();
+	return ret;
+}
+
+int tcx_prog_detach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	bool tcx_release, ingress = attr->attach_type == BPF_TCX_INGRESS;
+	struct net *net = current->nsproxy->net_ns;
+	struct bpf_mprog_entry *entry, *peer;
+	struct net_device *dev;
+	int ret;
+
+	rtnl_lock();
+	dev = __dev_get_by_index(net, attr->target_ifindex);
+	if (!dev) {
+		ret = -ENODEV;
+		goto out;
+	}
+	entry = tcx_entry_fetch(dev, ingress);
+	if (!entry) {
+		ret = -ENOENT;
+		goto out;
+	}
+	ret = bpf_mprog_detach(entry, prog, NULL, attr->attach_flags,
+			       attr->relative_fd, attr->expected_revision);
+	if (ret >= 0) {
+		tcx_release = tcx_entry_needs_release(entry, ret);
+		peer = tcx_release ? NULL : bpf_mprog_peer(entry);
+		if (bpf_mprog_swap_entries(ret))
+			tcx_entry_update(dev, peer, ingress);
+		bpf_mprog_commit(entry);
+		tcx_skeys_dec(ingress);
+		if (tcx_release)
+			tcx_entry_free(entry);
+		ret = 0;
+	}
+out:
+	rtnl_unlock();
+	return ret;
+}
+
+void tcx_uninstall(struct net_device *dev, bool ingress)
+{
+	struct bpf_tuple tuple = {};
+	struct bpf_mprog_entry *entry;
+	struct bpf_mprog_fp *fp;
+	struct bpf_mprog_cp *cp;
+
+	entry = tcx_entry_fetch(dev, ingress);
+	if (!entry)
+		return;
+	tcx_entry_update(dev, NULL, ingress);
+	bpf_mprog_commit(entry);
+	bpf_mprog_foreach_tuple(entry, fp, cp, tuple) {
+		if (tuple.link)
+			tcx_link(tuple.link)->dev = NULL;
+		else
+			bpf_prog_put(tuple.prog);
+		tcx_skeys_dec(ingress);
+	}
+	WARN_ON_ONCE(tcx_entry(entry)->miniq_active);
+	tcx_entry_free(entry);
+}
+
+int tcx_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
+{
+	bool ingress = attr->query.attach_type == BPF_TCX_INGRESS;
+	struct net *net = current->nsproxy->net_ns;
+	struct bpf_mprog_entry *entry;
+	struct net_device *dev;
+	int ret;
+
+	rtnl_lock();
+	dev = __dev_get_by_index(net, attr->query.target_ifindex);
+	if (!dev) {
+		ret = -ENODEV;
+		goto out;
+	}
+	entry = tcx_entry_fetch(dev, ingress);
+	if (!entry) {
+		ret = -ENOENT;
+		goto out;
+	}
+	ret = bpf_mprog_query(attr, uattr, entry);
+out:
+	rtnl_unlock();
+	return ret;
+}
+
+static int tcx_link_prog_attach(struct bpf_link *link, u32 flags, u32 object,
+				u64 revision)
+{
+	struct tcx_link *tcx = tcx_link(link);
+	bool created, ingress = tcx->location == BPF_TCX_INGRESS;
+	struct net_device *dev = tcx->dev;
+	struct bpf_mprog_entry *entry;
+	int ret;
+
+	ASSERT_RTNL();
+	entry = tcx_entry_fetch_or_create(dev, ingress, &created);
+	if (!entry)
+		return -ENOMEM;
+	ret = bpf_mprog_attach(entry, link->prog, link, NULL, flags, object,
+			       revision);
+	if (ret >= 0) {
+		if (bpf_mprog_swap_entries(ret))
+			tcx_entry_update(dev, bpf_mprog_peer(entry), ingress);
+		bpf_mprog_commit(entry);
+		tcx_skeys_inc(ingress);
+		ret = 0;
+	} else if (created) {
+		tcx_entry_free(entry);
+	}
+	return ret;
+}
+
+static void tcx_link_release(struct bpf_link *link)
+{
+	struct tcx_link *tcx = tcx_link(link);
+	bool tcx_release, ingress = tcx->location == BPF_TCX_INGRESS;
+	struct bpf_mprog_entry *entry, *peer;
+	struct net_device *dev;
+	int ret = 0;
+
+	rtnl_lock();
+	dev = tcx->dev;
+	if (!dev)
+		goto out;
+	entry = tcx_entry_fetch(dev, ingress);
+	if (!entry) {
+		ret = -ENOENT;
+		goto out;
+	}
+	ret = bpf_mprog_detach(entry, link->prog, link, 0, 0, 0);
+	if (ret >= 0) {
+		tcx_release = tcx_entry_needs_release(entry, ret);
+		peer = tcx_release ? NULL : bpf_mprog_peer(entry);
+		if (bpf_mprog_swap_entries(ret))
+			tcx_entry_update(dev, peer, ingress);
+		bpf_mprog_commit(entry);
+		tcx_skeys_dec(ingress);
+		if (tcx_release)
+			tcx_entry_free(entry);
+		tcx->dev = NULL;
+		ret = 0;
+	}
+out:
+	WARN_ON_ONCE(ret);
+	rtnl_unlock();
+}
+
+static int tcx_link_update(struct bpf_link *link, struct bpf_prog *nprog,
+			   struct bpf_prog *oprog)
+{
+	struct tcx_link *tcx = tcx_link(link);
+	bool ingress = tcx->location == BPF_TCX_INGRESS;
+	struct bpf_mprog_entry *entry;
+	struct net_device *dev;
+	int ret = 0;
+
+	rtnl_lock();
+	dev = tcx->dev;
+	if (!dev) {
+		ret = -ENOLINK;
+		goto out;
+	}
+	if (oprog && link->prog != oprog) {
+		ret = -EPERM;
+		goto out;
+	}
+	oprog = link->prog;
+	if (oprog == nprog) {
+		bpf_prog_put(nprog);
+		goto out;
+	}
+	entry = tcx_entry_fetch(dev, ingress);
+	if (!entry) {
+		ret = -ENOENT;
+		goto out;
+	}
+	ret = bpf_mprog_attach(entry, nprog, link, oprog,
+			       BPF_F_REPLACE | BPF_F_ID,
+			       link->prog->aux->id, 0);
+	if (ret >= 0) {
+		if (bpf_mprog_swap_entries(ret))
+			tcx_entry_update(dev, bpf_mprog_peer(entry), ingress);
+		bpf_mprog_commit(entry);
+		tcx_skeys_inc(ingress);
+		oprog = xchg(&link->prog, nprog);
+		bpf_prog_put(oprog);
+		ret = 0;
+	}
+out:
+	rtnl_unlock();
+	return ret;
+}
+
+static void tcx_link_dealloc(struct bpf_link *link)
+{
+	kfree(tcx_link(link));
+}
+
+static void tcx_link_fdinfo(const struct bpf_link *link, struct seq_file *seq)
+{
+	const struct tcx_link *tcx = tcx_link_const(link);
+	u32 ifindex = 0;
+
+	rtnl_lock();
+	if (tcx->dev)
+		ifindex = tcx->dev->ifindex;
+	rtnl_unlock();
+
+	seq_printf(seq, "ifindex:\t%u\n", ifindex);
+	seq_printf(seq, "attach_type:\t%u (%s)\n",
+		   tcx->location,
+		   tcx->location == BPF_TCX_INGRESS ? "ingress" : "egress");
+}
+
+static int tcx_link_fill_info(const struct bpf_link *link,
+			      struct bpf_link_info *info)
+{
+	const struct tcx_link *tcx = tcx_link_const(link);
+	u32 ifindex = 0;
+
+	rtnl_lock();
+	if (tcx->dev)
+		ifindex = tcx->dev->ifindex;
+	rtnl_unlock();
+
+	info->tcx.ifindex = ifindex;
+	info->tcx.attach_type = tcx->location;
+	return 0;
+}
+
+static int tcx_link_detach(struct bpf_link *link)
+{
+	tcx_link_release(link);
+	return 0;
+}
+
+static const struct bpf_link_ops tcx_link_lops = {
+	.release	= tcx_link_release,
+	.detach		= tcx_link_detach,
+	.dealloc	= tcx_link_dealloc,
+	.update_prog	= tcx_link_update,
+	.show_fdinfo	= tcx_link_fdinfo,
+	.fill_link_info	= tcx_link_fill_info,
+};
+
+static int tcx_link_init(struct tcx_link *tcx,
+			 struct bpf_link_primer *link_primer,
+			 const union bpf_attr *attr,
+			 struct net_device *dev,
+			 struct bpf_prog *prog)
+{
+	bpf_link_init(&tcx->link, BPF_LINK_TYPE_TCX, &tcx_link_lops, prog);
+	tcx->location = attr->link_create.attach_type;
+	tcx->dev = dev;
+	return bpf_link_prime(&tcx->link, link_primer);
+}
+
+int tcx_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	struct net *net = current->nsproxy->net_ns;
+	struct bpf_link_primer link_primer;
+	struct net_device *dev;
+	struct tcx_link *tcx;
+	int ret;
+
+	rtnl_lock();
+	dev = __dev_get_by_index(net, attr->link_create.target_ifindex);
+	if (!dev) {
+		ret = -ENODEV;
+		goto out;
+	}
+	tcx = kzalloc(sizeof(*tcx), GFP_USER);
+	if (!tcx) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	ret = tcx_link_init(tcx, &link_primer, attr, dev, prog);
+	if (ret) {
+		kfree(tcx);
+		goto out;
+	}
+	ret = tcx_link_prog_attach(&tcx->link, attr->link_create.flags,
+				   attr->link_create.tcx.relative_fd,
+				   attr->link_create.tcx.expected_revision);
+	if (ret) {
+		tcx->dev = NULL;
+		bpf_link_cleanup(&link_primer);
+		goto out;
+	}
+	ret = bpf_link_settle(&link_primer);
+out:
+	rtnl_unlock();
+	return ret;
+}
diff --git a/net/Kconfig b/net/Kconfig
index 2fb25b534df5..d532ec33f1fe 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -52,6 +52,11 @@ config NET_INGRESS
 config NET_EGRESS
 	bool
 
+config NET_XGRESS
+	select NET_INGRESS
+	select NET_EGRESS
+	bool
+
 config NET_REDIRECT
 	bool
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 69a3e544676c..db60921ba1f8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -107,6 +107,7 @@
 #include <net/pkt_cls.h>
 #include <net/checksum.h>
 #include <net/xfrm.h>
+#include <net/tcx.h>
 #include <linux/highmem.h>
 #include <linux/init.h>
 #include <linux/module.h>
@@ -154,7 +155,6 @@
 #include "dev.h"
 #include "net-sysfs.h"
 
-
 static DEFINE_SPINLOCK(ptype_lock);
 struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
 struct list_head ptype_all __read_mostly;	/* Taps */
@@ -3882,69 +3882,200 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
 EXPORT_SYMBOL(dev_loopback_xmit);
 
 #ifdef CONFIG_NET_EGRESS
-static struct sk_buff *
-sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
+static struct netdev_queue *
+netdev_tx_queue_mapping(struct net_device *dev, struct sk_buff *skb)
+{
+	int qm = skb_get_queue_mapping(skb);
+
+	return netdev_get_tx_queue(dev, netdev_cap_txqueue(dev, qm));
+}
+
+static bool netdev_xmit_txqueue_skipped(void)
 {
+	return __this_cpu_read(softnet_data.xmit.skip_txqueue);
+}
+
+void netdev_xmit_skip_txqueue(bool skip)
+{
+	__this_cpu_write(softnet_data.xmit.skip_txqueue, skip);
+}
+EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue);
+#endif /* CONFIG_NET_EGRESS */
+
+#ifdef CONFIG_NET_XGRESS
+static int tc_run(struct tcx_entry *entry, struct sk_buff *skb)
+{
+	int ret = TC_ACT_UNSPEC;
 #ifdef CONFIG_NET_CLS_ACT
-	struct mini_Qdisc *miniq = rcu_dereference_bh(dev->miniq_egress);
-	struct tcf_result cl_res;
+	struct mini_Qdisc *miniq = rcu_dereference_bh(entry->miniq);
+	struct tcf_result res;
 
 	if (!miniq)
-		return skb;
+		return ret;
 
-	/* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
 	tc_skb_cb(skb)->mru = 0;
 	tc_skb_cb(skb)->post_ct = false;
-	mini_qdisc_bstats_cpu_update(miniq, skb);
 
-	switch (tcf_classify(skb, miniq->block, miniq->filter_list, &cl_res, false)) {
+	mini_qdisc_bstats_cpu_update(miniq, skb);
+	ret = tcf_classify(skb, miniq->block, miniq->filter_list, &res, false);
+	/* Only tcf related quirks below. */
+	switch (ret) {
+	case TC_ACT_SHOT:
+		mini_qdisc_qstats_cpu_drop(miniq);
+		break;
 	case TC_ACT_OK:
 	case TC_ACT_RECLASSIFY:
-		skb->tc_index = TC_H_MIN(cl_res.classid);
+		skb->tc_index = TC_H_MIN(res.classid);
 		break;
+	}
+#endif /* CONFIG_NET_CLS_ACT */
+	return ret;
+}
+
+static DEFINE_STATIC_KEY_FALSE(tcx_needed_key);
+
+void tcx_inc(void)
+{
+	static_branch_inc(&tcx_needed_key);
+}
+EXPORT_SYMBOL_GPL(tcx_inc);
+
+void tcx_dec(void)
+{
+	static_branch_dec(&tcx_needed_key);
+}
+EXPORT_SYMBOL_GPL(tcx_dec);
+
+static __always_inline enum tcx_action_base
+tcx_run(const struct bpf_mprog_entry *entry, struct sk_buff *skb,
+	const bool needs_mac)
+{
+	const struct bpf_mprog_fp *fp;
+	const struct bpf_prog *prog;
+	int ret = TCX_NEXT;
+
+	if (needs_mac)
+		__skb_push(skb, skb->mac_len);
+	bpf_mprog_foreach_prog(entry, fp, prog) {
+		bpf_compute_data_pointers(skb);
+		ret = bpf_prog_run(prog, skb);
+		if (ret != TCX_NEXT)
+			break;
+	}
+	if (needs_mac)
+		__skb_pull(skb, skb->mac_len);
+	return tcx_action_code(skb, ret);
+}
+
+static __always_inline struct sk_buff *
+sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
+		   struct net_device *orig_dev, bool *another)
+{
+	struct bpf_mprog_entry *entry = rcu_dereference_bh(skb->dev->tcx_ingress);
+	int sch_ret;
+
+	if (!entry)
+		return skb;
+	if (*pt_prev) {
+		*ret = deliver_skb(skb, *pt_prev, orig_dev);
+		*pt_prev = NULL;
+	}
+
+	qdisc_skb_cb(skb)->pkt_len = skb->len;
+	tcx_set_ingress(skb, true);
+
+	if (static_branch_unlikely(&tcx_needed_key)) {
+		sch_ret = tcx_run(entry, skb, true);
+		if (sch_ret != TC_ACT_UNSPEC)
+			goto ingress_verdict;
+	}
+	sch_ret = tc_run(tcx_entry(entry), skb);
+ingress_verdict:
+	switch (sch_ret) {
+	case TC_ACT_REDIRECT:
+		/* skb_mac_header check was done by BPF, so we can safely
+		 * push the L2 header back before redirecting to another
+		 * netdev.
+		 */
+		__skb_push(skb, skb->mac_len);
+		if (skb_do_redirect(skb) == -EAGAIN) {
+			__skb_pull(skb, skb->mac_len);
+			*another = true;
+			break;
+		}
+		*ret = NET_RX_SUCCESS;
+		return NULL;
 	case TC_ACT_SHOT:
-		mini_qdisc_qstats_cpu_drop(miniq);
-		*ret = NET_XMIT_DROP;
-		kfree_skb_reason(skb, SKB_DROP_REASON_TC_EGRESS);
+		kfree_skb_reason(skb, SKB_DROP_REASON_TC_INGRESS);
+		*ret = NET_RX_DROP;
 		return NULL;
+	/* used by tc_run */
 	case TC_ACT_STOLEN:
 	case TC_ACT_QUEUED:
 	case TC_ACT_TRAP:
-		*ret = NET_XMIT_SUCCESS;
 		consume_skb(skb);
+		fallthrough;
+	case TC_ACT_CONSUMED:
+		*ret = NET_RX_SUCCESS;
 		return NULL;
+	}
+
+	return skb;
+}
+
+static __always_inline struct sk_buff *
+sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
+{
+	struct bpf_mprog_entry *entry = rcu_dereference_bh(dev->tcx_egress);
+	int sch_ret;
+
+	if (!entry)
+		return skb;
+
+	/* qdisc_skb_cb(skb)->pkt_len & tcx_set_ingress() was
+	 * already set by the caller.
+	 */
+	if (static_branch_unlikely(&tcx_needed_key)) {
+		sch_ret = tcx_run(entry, skb, false);
+		if (sch_ret != TC_ACT_UNSPEC)
+			goto egress_verdict;
+	}
+	sch_ret = tc_run(tcx_entry(entry), skb);
+egress_verdict:
+	switch (sch_ret) {
 	case TC_ACT_REDIRECT:
 		/* No need to push/pop skb's mac_header here on egress! */
 		skb_do_redirect(skb);
 		*ret = NET_XMIT_SUCCESS;
 		return NULL;
-	default:
-		break;
+	case TC_ACT_SHOT:
+		kfree_skb_reason(skb, SKB_DROP_REASON_TC_EGRESS);
+		*ret = NET_XMIT_DROP;
+		return NULL;
+	/* used by tc_run */
+	case TC_ACT_STOLEN:
+	case TC_ACT_QUEUED:
+	case TC_ACT_TRAP:
+		*ret = NET_XMIT_SUCCESS;
+		return NULL;
 	}
-#endif /* CONFIG_NET_CLS_ACT */
 
 	return skb;
 }
-
-static struct netdev_queue *
-netdev_tx_queue_mapping(struct net_device *dev, struct sk_buff *skb)
-{
-	int qm = skb_get_queue_mapping(skb);
-
-	return netdev_get_tx_queue(dev, netdev_cap_txqueue(dev, qm));
-}
-
-static bool netdev_xmit_txqueue_skipped(void)
+#else
+static __always_inline struct sk_buff *
+sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
+		   struct net_device *orig_dev, bool *another)
 {
-	return __this_cpu_read(softnet_data.xmit.skip_txqueue);
+	return skb;
 }
 
-void netdev_xmit_skip_txqueue(bool skip)
+static __always_inline struct sk_buff *
+sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 {
-	__this_cpu_write(softnet_data.xmit.skip_txqueue, skip);
+	return skb;
 }
-EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue);
-#endif /* CONFIG_NET_EGRESS */
+#endif /* CONFIG_NET_XGRESS */
 
 #ifdef CONFIG_XPS
 static int __get_xps_queue_idx(struct net_device *dev, struct sk_buff *skb,
@@ -4128,9 +4259,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 	skb_update_prio(skb);
 
 	qdisc_pkt_len_init(skb);
-#ifdef CONFIG_NET_CLS_ACT
-	skb->tc_at_ingress = 0;
-#endif
+	tcx_set_ingress(skb, false);
 #ifdef CONFIG_NET_EGRESS
 	if (static_branch_unlikely(&egress_needed_key)) {
 		if (nf_hook_egress_active()) {
@@ -5064,72 +5193,6 @@ int (*br_fdb_test_addr_hook)(struct net_device *dev,
 EXPORT_SYMBOL_GPL(br_fdb_test_addr_hook);
 #endif
 
-static inline struct sk_buff *
-sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
-		   struct net_device *orig_dev, bool *another)
-{
-#ifdef CONFIG_NET_CLS_ACT
-	struct mini_Qdisc *miniq = rcu_dereference_bh(skb->dev->miniq_ingress);
-	struct tcf_result cl_res;
-
-	/* If there's at least one ingress present somewhere (so
-	 * we get here via enabled static key), remaining devices
-	 * that are not configured with an ingress qdisc will bail
-	 * out here.
-	 */
-	if (!miniq)
-		return skb;
-
-	if (*pt_prev) {
-		*ret = deliver_skb(skb, *pt_prev, orig_dev);
-		*pt_prev = NULL;
-	}
-
-	qdisc_skb_cb(skb)->pkt_len = skb->len;
-	tc_skb_cb(skb)->mru = 0;
-	tc_skb_cb(skb)->post_ct = false;
-	skb->tc_at_ingress = 1;
-	mini_qdisc_bstats_cpu_update(miniq, skb);
-
-	switch (tcf_classify(skb, miniq->block, miniq->filter_list, &cl_res, false)) {
-	case TC_ACT_OK:
-	case TC_ACT_RECLASSIFY:
-		skb->tc_index = TC_H_MIN(cl_res.classid);
-		break;
-	case TC_ACT_SHOT:
-		mini_qdisc_qstats_cpu_drop(miniq);
-		kfree_skb_reason(skb, SKB_DROP_REASON_TC_INGRESS);
-		*ret = NET_RX_DROP;
-		return NULL;
-	case TC_ACT_STOLEN:
-	case TC_ACT_QUEUED:
-	case TC_ACT_TRAP:
-		consume_skb(skb);
-		*ret = NET_RX_SUCCESS;
-		return NULL;
-	case TC_ACT_REDIRECT:
-		/* skb_mac_header check was done by cls/act_bpf, so
-		 * we can safely push the L2 header back before
-		 * redirecting to another netdev
-		 */
-		__skb_push(skb, skb->mac_len);
-		if (skb_do_redirect(skb) == -EAGAIN) {
-			__skb_pull(skb, skb->mac_len);
-			*another = true;
-			break;
-		}
-		*ret = NET_RX_SUCCESS;
-		return NULL;
-	case TC_ACT_CONSUMED:
-		*ret = NET_RX_SUCCESS;
-		return NULL;
-	default:
-		break;
-	}
-#endif /* CONFIG_NET_CLS_ACT */
-	return skb;
-}
-
 /**
  *	netdev_is_rx_handler_busy - check if receive handler is registered
  *	@dev: device to check
@@ -10838,7 +10901,7 @@ void unregister_netdevice_many_notify(struct list_head *head,
 
 		/* Shutdown queueing discipline. */
 		dev_shutdown(dev);
-
+		dev_tcx_uninstall(dev);
 		dev_xdp_uninstall(dev);
 		bpf_dev_bound_netdev_unregister(dev);
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 06ba0e56e369..e39a8a20dd10 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -9312,7 +9312,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
 	__u8 value_reg = si->dst_reg;
 	__u8 skb_reg = si->src_reg;
 
-#ifdef CONFIG_NET_CLS_ACT
+#ifdef CONFIG_NET_XGRESS
 	/* If the tstamp_type is read,
 	 * the bpf prog is aware the tstamp could have delivery time.
 	 * Thus, read skb->tstamp as is if tstamp_type_access is true.
@@ -9346,7 +9346,7 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
 	__u8 value_reg = si->src_reg;
 	__u8 skb_reg = si->dst_reg;
 
-#ifdef CONFIG_NET_CLS_ACT
+#ifdef CONFIG_NET_XGRESS
 	/* If the tstamp_type is read,
 	 * the bpf prog is aware the tstamp could have delivery time.
 	 * Thus, write skb->tstamp as is if tstamp_type_access is true.
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 4b95cb1ac435..470c70deffe2 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -347,8 +347,7 @@ config NET_SCH_FQ_PIE
 config NET_SCH_INGRESS
 	tristate "Ingress/classifier-action Qdisc"
 	depends on NET_CLS_ACT
-	select NET_INGRESS
-	select NET_EGRESS
+	select NET_XGRESS
 	help
 	  Say Y here if you want to use classifiers for incoming and/or outgoing
 	  packets. This qdisc doesn't do anything else besides running classifiers,
@@ -679,6 +678,7 @@ config NET_EMATCH_IPT
 config NET_CLS_ACT
 	bool "Actions"
 	select NET_CLS
+	select NET_XGRESS
 	help
 	  Say Y here if you want to use traffic control actions. Actions
 	  get attached to classifiers and are invoked after a successful
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index e43a45499372..ea1baa0e0d76 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -13,6 +13,7 @@
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
 #include <net/pkt_cls.h>
+#include <net/tcx.h>
 
 struct ingress_sched_data {
 	struct tcf_block *block;
@@ -78,6 +79,8 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
 {
 	struct ingress_sched_data *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
+	struct bpf_mprog_entry *entry;
+	bool created;
 	int err;
 
 	if (sch->parent != TC_H_INGRESS)
@@ -85,7 +88,13 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
 
 	net_inc_ingress_queue();
 
-	mini_qdisc_pair_init(&q->miniqp, sch, &dev->miniq_ingress);
+	entry = tcx_entry_fetch_or_create(dev, true, &created);
+	if (!entry)
+		return -ENOMEM;
+	tcx_miniq_active(entry, true);
+	mini_qdisc_pair_init(&q->miniqp, sch, &tcx_entry(entry)->miniq);
+	if (created)
+		tcx_entry_update(dev, entry, true);
 
 	q->block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
 	q->block_info.chain_head_change = clsact_chain_head_change;
@@ -103,11 +112,22 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
 static void ingress_destroy(struct Qdisc *sch)
 {
 	struct ingress_sched_data *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
+	struct bpf_mprog_entry *entry = rtnl_dereference(dev->tcx_ingress);
 
 	if (sch->parent != TC_H_INGRESS)
 		return;
 
 	tcf_block_put_ext(q->block, sch, &q->block_info);
+
+	if (entry) {
+		tcx_miniq_active(entry, false);
+		if (!bpf_mprog_total(entry)) {
+			tcx_entry_update(dev, NULL, false);
+			tcx_entry_free(entry);
+		}
+	}
+
 	net_dec_ingress_queue();
 }
 
@@ -223,6 +243,8 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
 {
 	struct clsact_sched_data *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
+	struct bpf_mprog_entry *entry;
+	bool created;
 	int err;
 
 	if (sch->parent != TC_H_CLSACT)
@@ -231,7 +253,13 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
 	net_inc_ingress_queue();
 	net_inc_egress_queue();
 
-	mini_qdisc_pair_init(&q->miniqp_ingress, sch, &dev->miniq_ingress);
+	entry = tcx_entry_fetch_or_create(dev, true, &created);
+	if (!entry)
+		return -ENOMEM;
+	tcx_miniq_active(entry, true);
+	mini_qdisc_pair_init(&q->miniqp_ingress, sch, &tcx_entry(entry)->miniq);
+	if (created)
+		tcx_entry_update(dev, entry, true);
 
 	q->ingress_block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
 	q->ingress_block_info.chain_head_change = clsact_chain_head_change;
@@ -244,7 +272,13 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
 
 	mini_qdisc_pair_block_init(&q->miniqp_ingress, q->ingress_block);
 
-	mini_qdisc_pair_init(&q->miniqp_egress, sch, &dev->miniq_egress);
+	entry = tcx_entry_fetch_or_create(dev, false, &created);
+	if (!entry)
+		return -ENOMEM;
+	tcx_miniq_active(entry, true);
+	mini_qdisc_pair_init(&q->miniqp_egress, sch, &tcx_entry(entry)->miniq);
+	if (created)
+		tcx_entry_update(dev, entry, false);
 
 	q->egress_block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS;
 	q->egress_block_info.chain_head_change = clsact_chain_head_change;
@@ -256,12 +290,31 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
 static void clsact_destroy(struct Qdisc *sch)
 {
 	struct clsact_sched_data *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
+	struct bpf_mprog_entry *ingress_entry = rtnl_dereference(dev->tcx_ingress);
+	struct bpf_mprog_entry *egress_entry = rtnl_dereference(dev->tcx_egress);
 
 	if (sch->parent != TC_H_CLSACT)
 		return;
 
-	tcf_block_put_ext(q->egress_block, sch, &q->egress_block_info);
 	tcf_block_put_ext(q->ingress_block, sch, &q->ingress_block_info);
+	tcf_block_put_ext(q->egress_block, sch, &q->egress_block_info);
+
+	if (ingress_entry) {
+		tcx_miniq_active(ingress_entry, false);
+		if (!bpf_mprog_total(ingress_entry)) {
+			tcx_entry_update(dev, NULL, true);
+			tcx_entry_free(ingress_entry);
+		}
+	}
+
+	if (egress_entry) {
+		tcx_miniq_active(egress_entry, false);
+		if (!bpf_mprog_total(egress_entry)) {
+			tcx_entry_update(dev, NULL, false);
+			tcx_entry_free(egress_entry);
+		}
+	}
 
 	net_dec_ingress_queue();
 	net_dec_egress_queue();
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 74879c538f2b..98c4a3a6e137 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1036,6 +1036,8 @@ enum bpf_attach_type {
 	BPF_LSM_CGROUP,
 	BPF_STRUCT_OPS,
 	BPF_NETFILTER,
+	BPF_TCX_INGRESS,
+	BPF_TCX_EGRESS,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -1053,7 +1055,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_KPROBE_MULTI = 8,
 	BPF_LINK_TYPE_STRUCT_OPS = 9,
 	BPF_LINK_TYPE_NETFILTER = 10,
-
+	BPF_LINK_TYPE_TCX = 11,
 	MAX_BPF_LINK_TYPE,
 };
 
@@ -1559,13 +1561,13 @@ union bpf_attr {
 			__u32		map_fd;		/* struct_ops to attach */
 		};
 		union {
-			__u32		target_fd;	/* object to attach to */
-			__u32		target_ifindex; /* target ifindex */
+			__u32	target_fd;	/* target object to attach to or ... */
+			__u32	target_ifindex; /* target ifindex */
 		};
 		__u32		attach_type;	/* attach type */
 		__u32		flags;		/* extra flags */
 		union {
-			__u32		target_btf_id;	/* btf_id of target to attach to */
+			__u32	target_btf_id;	/* btf_id of target to attach to */
 			struct {
 				__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
 				__u32		iter_info_len;	/* iter_info length */
@@ -1599,6 +1601,13 @@ union bpf_attr {
 				__s32		priority;
 				__u32		flags;
 			} netfilter;
+			struct {
+				union {
+					__u32	relative_fd;
+					__u32	relative_id;
+				};
+				__u64		expected_revision;
+			} tcx;
 		};
 	} link_create;
 
@@ -6207,6 +6216,19 @@ struct bpf_sock_tuple {
 	};
 };
 
+/* (Simplified) user return codes for tcx prog type.
+ * A valid tcx program must return one of these defined values. All other
+ * return codes are reserved for future use. Must remain compatible with
+ * their TC_ACT_* counterparts. For compatibility in behavior, unknown
+ * return codes are mapped to TCX_NEXT.
+ */
+enum tcx_action_base {
+	TCX_NEXT	= -1,
+	TCX_PASS	= 0,
+	TCX_DROP	= 2,
+	TCX_REDIRECT	= 7,
+};
+
 struct bpf_xdp_sock {
 	__u32 queue_id;
 };
@@ -6459,6 +6481,10 @@ struct bpf_link_info {
 			__s32 priority;
 			__u32 flags;
 		} netfilter;
+		struct {
+			__u32 ifindex;
+			__u32 attach_type;
+		} tcx;
 	};
 } __attribute__((aligned(8)));
 
-- 
2.34.1


* [PATCH bpf-next v4 3/8] libbpf: Add opts-based attach/detach/query API for tcx
  2023-07-10 20:12 [PATCH bpf-next v4 0/8] BPF link support for tc BPF programs Daniel Borkmann
  2023-07-10 20:12 ` [PATCH bpf-next v4 1/8] bpf: Add generic attach/detach/query API for multi-progs Daniel Borkmann
  2023-07-10 20:12 ` [PATCH bpf-next v4 2/8] bpf: Add fd-based tcx multi-prog infra with link support Daniel Borkmann
@ 2023-07-10 20:12 ` Daniel Borkmann
  2023-07-11  4:00   ` Andrii Nakryiko
  2023-07-10 20:12 ` [PATCH bpf-next v4 4/8] libbpf: Add link-based " Daniel Borkmann
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Daniel Borkmann @ 2023-07-10 20:12 UTC (permalink / raw)
  To: ast
  Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
	toke, davem, bpf, netdev, Daniel Borkmann

Extend the libbpf attach opts and add a new detach opts API so they can be
used to add and remove fd-based tcx BPF programs. The old-style
bpf_prog_detach() and bpf_prog_detach2() APIs are refactored to reuse the
new bpf_prog_detach_opts() internally.

The bpf_prog_query_opts() API is extended to handle the new link_ids,
link_attach_flags and revision fields.

For concrete usage examples, see the extensive selftests that have been
developed as part of this series.
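
A minimal usage sketch of the new opts API (not part of the patch; prog_fd
and ifindex are placeholders for an already loaded SCHED_CLS program and an
existing netdevice, error handling omitted):

  LIBBPF_OPTS(bpf_prog_attach_opts, opta);
  LIBBPF_OPTS(bpf_prog_query_opts, optq);
  LIBBPF_OPTS(bpf_prog_detach_opts, optd);
  __u32 prog_ids[4] = {};
  int err;

  /* prog_fd/ifindex below are placeholders, not part of this patch. */
  err = bpf_prog_attach_opts(prog_fd, ifindex, BPF_TCX_INGRESS, &opta);

  /* Query attached programs plus the current attach revision. */
  optq.prog_ids = prog_ids;
  optq.count = 4;
  err = bpf_prog_query_opts(ifindex, BPF_TCX_INGRESS, &optq);

  /* Detach again, guarded by the revision reported by the query. */
  optd.expected_revision = optq.revision;
  err = bpf_prog_detach_opts(prog_fd, ifindex, BPF_TCX_INGRESS, &optd);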

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/lib/bpf/bpf.c      | 105 +++++++++++++++++++++++++--------------
 tools/lib/bpf/bpf.h      |  92 ++++++++++++++++++++++++++++------
 tools/lib/bpf/libbpf.c   |  12 +++--
 tools/lib/bpf/libbpf.map |   1 +
 4 files changed, 157 insertions(+), 53 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 3b0da19715e1..3dfc43b477c3 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -629,55 +629,87 @@ int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type,
 	return bpf_prog_attach_opts(prog_fd, target_fd, type, &opts);
 }
 
-int bpf_prog_attach_opts(int prog_fd, int target_fd,
-			  enum bpf_attach_type type,
-			  const struct bpf_prog_attach_opts *opts)
+int bpf_prog_attach_opts(int prog_fd, int target,
+			 enum bpf_attach_type type,
+			 const struct bpf_prog_attach_opts *opts)
 {
-	const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd);
+	const size_t attr_sz = offsetofend(union bpf_attr, expected_revision);
+	__u32 relative_id, flags;
 	union bpf_attr attr;
-	int ret;
+	int ret, relative;
 
 	if (!OPTS_VALID(opts, bpf_prog_attach_opts))
 		return libbpf_err(-EINVAL);
 
+	relative_id = OPTS_GET(opts, relative_id, 0);
+	relative = OPTS_GET(opts, relative_fd, 0);
+	flags = OPTS_GET(opts, flags, 0);
+
+	/* validate we don't have unexpected combinations of non-zero fields */
+	if (relative > 0 && relative_id)
+		return libbpf_err(-EINVAL);
+	if (relative_id) {
+		relative = relative_id;
+		flags |= BPF_F_ID;
+	}
+
 	memset(&attr, 0, attr_sz);
-	attr.target_fd	   = target_fd;
-	attr.attach_bpf_fd = prog_fd;
-	attr.attach_type   = type;
-	attr.attach_flags  = OPTS_GET(opts, flags, 0);
-	attr.replace_bpf_fd = OPTS_GET(opts, replace_prog_fd, 0);
+	attr.target_fd		= target;
+	attr.attach_bpf_fd	= prog_fd;
+	attr.attach_type	= type;
+	attr.attach_flags	= flags;
+	attr.relative_fd	= relative;
+	attr.replace_bpf_fd	= OPTS_GET(opts, replace_fd, 0);
+	attr.expected_revision	= OPTS_GET(opts, expected_revision, 0);
 
 	ret = sys_bpf(BPF_PROG_ATTACH, &attr, attr_sz);
 	return libbpf_err_errno(ret);
 }
 
-int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
+int bpf_prog_detach_opts(int prog_fd, int target,
+			 enum bpf_attach_type type,
+			 const struct bpf_prog_detach_opts *opts)
 {
-	const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd);
+	const size_t attr_sz = offsetofend(union bpf_attr, expected_revision);
+	__u32 relative_id, flags;
 	union bpf_attr attr;
-	int ret;
+	int ret, relative;
+
+	if (!OPTS_VALID(opts, bpf_prog_detach_opts))
+		return libbpf_err(-EINVAL);
+
+	relative_id = OPTS_GET(opts, relative_id, 0);
+	relative = OPTS_GET(opts, relative_fd, 0);
+	flags = OPTS_GET(opts, flags, 0);
+
+	/* validate we don't have unexpected combinations of non-zero fields */
+	if (relative > 0 && relative_id)
+		return libbpf_err(-EINVAL);
+	if (relative_id) {
+		relative = relative_id;
+		flags |= BPF_F_ID;
+	}
 
 	memset(&attr, 0, attr_sz);
-	attr.target_fd	 = target_fd;
-	attr.attach_type = type;
+	attr.target_fd		= target;
+	attr.attach_bpf_fd	= prog_fd;
+	attr.attach_type	= type;
+	attr.attach_flags	= flags;
+	attr.relative_fd	= relative;
+	attr.expected_revision	= OPTS_GET(opts, expected_revision, 0);
 
 	ret = sys_bpf(BPF_PROG_DETACH, &attr, attr_sz);
 	return libbpf_err_errno(ret);
 }
 
-int bpf_prog_detach2(int prog_fd, int target_fd, enum bpf_attach_type type)
+int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
 {
-	const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd);
-	union bpf_attr attr;
-	int ret;
-
-	memset(&attr, 0, attr_sz);
-	attr.target_fd	 = target_fd;
-	attr.attach_bpf_fd = prog_fd;
-	attr.attach_type = type;
+	return bpf_prog_detach_opts(0, target_fd, type, NULL);
+}
 
-	ret = sys_bpf(BPF_PROG_DETACH, &attr, attr_sz);
-	return libbpf_err_errno(ret);
+int bpf_prog_detach2(int prog_fd, int target_fd, enum bpf_attach_type type)
+{
+	return bpf_prog_detach_opts(prog_fd, target_fd, type, NULL);
 }
 
 int bpf_link_create(int prog_fd, int target_fd,
@@ -841,8 +873,7 @@ int bpf_iter_create(int link_fd)
 	return libbpf_err_errno(fd);
 }
 
-int bpf_prog_query_opts(int target_fd,
-			enum bpf_attach_type type,
+int bpf_prog_query_opts(int target, enum bpf_attach_type type,
 			struct bpf_prog_query_opts *opts)
 {
 	const size_t attr_sz = offsetofend(union bpf_attr, query);
@@ -853,18 +884,20 @@ int bpf_prog_query_opts(int target_fd,
 		return libbpf_err(-EINVAL);
 
 	memset(&attr, 0, attr_sz);
-
-	attr.query.target_fd	= target_fd;
-	attr.query.attach_type	= type;
-	attr.query.query_flags	= OPTS_GET(opts, query_flags, 0);
-	attr.query.prog_cnt	= OPTS_GET(opts, prog_cnt, 0);
-	attr.query.prog_ids	= ptr_to_u64(OPTS_GET(opts, prog_ids, NULL));
-	attr.query.prog_attach_flags = ptr_to_u64(OPTS_GET(opts, prog_attach_flags, NULL));
+	attr.query.target_fd		= target;
+	attr.query.attach_type		= type;
+	attr.query.query_flags		= OPTS_GET(opts, query_flags, 0);
+	attr.query.count		= OPTS_GET(opts, count, 0);
+	attr.query.prog_ids		= ptr_to_u64(OPTS_GET(opts, prog_ids, NULL));
+	attr.query.link_ids		= ptr_to_u64(OPTS_GET(opts, link_ids, NULL));
+	attr.query.prog_attach_flags	= ptr_to_u64(OPTS_GET(opts, prog_attach_flags, NULL));
+	attr.query.link_attach_flags	= ptr_to_u64(OPTS_GET(opts, link_attach_flags, NULL));
 
 	ret = sys_bpf(BPF_PROG_QUERY, &attr, attr_sz);
 
 	OPTS_SET(opts, attach_flags, attr.query.attach_flags);
-	OPTS_SET(opts, prog_cnt, attr.query.prog_cnt);
+	OPTS_SET(opts, revision, attr.query.revision);
+	OPTS_SET(opts, count, attr.query.count);
 
 	return libbpf_err_errno(ret);
 }
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index c676295ab9bf..49e9d88fd9cf 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -312,22 +312,68 @@ LIBBPF_API int bpf_obj_get(const char *pathname);
 LIBBPF_API int bpf_obj_get_opts(const char *pathname,
 				const struct bpf_obj_get_opts *opts);
 
-struct bpf_prog_attach_opts {
-	size_t sz; /* size of this struct for forward/backward compatibility */
-	unsigned int flags;
-	int replace_prog_fd;
-};
-#define bpf_prog_attach_opts__last_field replace_prog_fd
-
 LIBBPF_API int bpf_prog_attach(int prog_fd, int attachable_fd,
 			       enum bpf_attach_type type, unsigned int flags);
-LIBBPF_API int bpf_prog_attach_opts(int prog_fd, int attachable_fd,
-				     enum bpf_attach_type type,
-				     const struct bpf_prog_attach_opts *opts);
 LIBBPF_API int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
 LIBBPF_API int bpf_prog_detach2(int prog_fd, int attachable_fd,
 				enum bpf_attach_type type);
 
+struct bpf_prog_attach_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+	__u32 flags;
+	union {
+		int replace_prog_fd;
+		int replace_fd;
+	};
+	int relative_fd;
+	__u32 relative_id;
+	__u64 expected_revision;
+	size_t :0;
+};
+#define bpf_prog_attach_opts__last_field expected_revision
+
+struct bpf_prog_detach_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+	__u32 flags;
+	int relative_fd;
+	__u32 relative_id;
+	__u64 expected_revision;
+	size_t :0;
+};
+#define bpf_prog_detach_opts__last_field expected_revision
+
+/**
+ * @brief **bpf_prog_attach_opts()** attaches the BPF program corresponding to
+ * *prog_fd* to a *target* which can represent a file descriptor or netdevice
+ * ifindex.
+ *
+ * @param prog_fd BPF program file descriptor
+ * @param target attach location file descriptor or ifindex
+ * @param type attach type for the BPF program
+ * @param opts options for configuring the attachment
+ * @return 0, on success; negative error code, otherwise (errno is also set to
+ * the error code)
+ */
+LIBBPF_API int bpf_prog_attach_opts(int prog_fd, int target,
+				    enum bpf_attach_type type,
+				    const struct bpf_prog_attach_opts *opts);
+
+/**
+ * @brief **bpf_prog_detach_opts()** detaches the BPF program corresponding to
+ * *prog_fd* from a *target* which can represent a file descriptor or netdevice
+ * ifindex.
+ *
+ * @param prog_fd BPF program file descriptor
+ * @param target detach location file descriptor or ifindex
+ * @param type detach type for the BPF program
+ * @param opts options for configuring the detachment
+ * @return 0, on success; negative error code, otherwise (errno is also set to
+ * the error code)
+ */
+LIBBPF_API int bpf_prog_detach_opts(int prog_fd, int target,
+				    enum bpf_attach_type type,
+				    const struct bpf_prog_detach_opts *opts);
+
 union bpf_iter_link_info; /* defined in up-to-date linux/bpf.h */
 struct bpf_link_create_opts {
 	size_t sz; /* size of this struct for forward/backward compatibility */
@@ -495,13 +541,31 @@ struct bpf_prog_query_opts {
 	__u32 query_flags;
 	__u32 attach_flags; /* output argument */
 	__u32 *prog_ids;
-	__u32 prog_cnt; /* input+output argument */
+	union {
+		/* input+output argument */
+		__u32 prog_cnt;
+		__u32 count;
+	};
 	__u32 *prog_attach_flags;
+	__u32 *link_ids;
+	__u32 *link_attach_flags;
+	__u64 revision;
+	size_t :0;
 };
-#define bpf_prog_query_opts__last_field prog_attach_flags
+#define bpf_prog_query_opts__last_field revision
 
-LIBBPF_API int bpf_prog_query_opts(int target_fd,
-				   enum bpf_attach_type type,
+/**
+ * @brief **bpf_prog_query_opts()** queries the BPF programs and BPF links
+ * which are attached to *target* which can represent a file descriptor or
+ * netdevice ifindex.
+ *
+ * @param target query location file descriptor or ifindex
+ * @param type attach type for the BPF program
+ * @param opts options for configuring the query
+ * @return 0, on success; negative error code, otherwise (errno is also set to
+ * the error code)
+ */
+LIBBPF_API int bpf_prog_query_opts(int target, enum bpf_attach_type type,
 				   struct bpf_prog_query_opts *opts);
 LIBBPF_API int bpf_prog_query(int target_fd, enum bpf_attach_type type,
 			      __u32 query_flags, __u32 *attach_flags,
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 78635feb1946..bd621c916783 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -118,6 +118,8 @@ static const char * const attach_type_name[] = {
 	[BPF_TRACE_KPROBE_MULTI]	= "trace_kprobe_multi",
 	[BPF_STRUCT_OPS]		= "struct_ops",
 	[BPF_NETFILTER]			= "netfilter",
+	[BPF_TCX_INGRESS]		= "tcx_ingress",
+	[BPF_TCX_EGRESS]		= "tcx_egress",
 };
 
 static const char * const link_type_name[] = {
@@ -8691,9 +8693,13 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("ksyscall+",		KPROBE,	0, SEC_NONE, attach_ksyscall),
 	SEC_DEF("kretsyscall+",		KPROBE, 0, SEC_NONE, attach_ksyscall),
 	SEC_DEF("usdt+",		KPROBE,	0, SEC_NONE, attach_usdt),
-	SEC_DEF("tc",			SCHED_CLS, 0, SEC_NONE),
-	SEC_DEF("classifier",		SCHED_CLS, 0, SEC_NONE),
-	SEC_DEF("action",		SCHED_ACT, 0, SEC_NONE),
+	SEC_DEF("tc/ingress",		SCHED_CLS, BPF_TCX_INGRESS, SEC_NONE), /* alias for tcx */
+	SEC_DEF("tc/egress",		SCHED_CLS, BPF_TCX_EGRESS, SEC_NONE),  /* alias for tcx */
+	SEC_DEF("tcx/ingress",		SCHED_CLS, BPF_TCX_INGRESS, SEC_NONE),
+	SEC_DEF("tcx/egress",		SCHED_CLS, BPF_TCX_EGRESS, SEC_NONE),
+	SEC_DEF("tc",			SCHED_CLS, 0, SEC_NONE), /* deprecated / legacy, use tcx */
+	SEC_DEF("classifier",		SCHED_CLS, 0, SEC_NONE), /* deprecated / legacy, use tcx */
+	SEC_DEF("action",		SCHED_ACT, 0, SEC_NONE), /* deprecated / legacy, use tcx */
 	SEC_DEF("tracepoint+",		TRACEPOINT, 0, SEC_NONE, attach_tp),
 	SEC_DEF("tp+",			TRACEPOINT, 0, SEC_NONE, attach_tp),
 	SEC_DEF("raw_tracepoint+",	RAW_TRACEPOINT, 0, SEC_NONE, attach_raw_tp),
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index d9ec4407befa..a95d39bbef90 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -396,4 +396,5 @@ LIBBPF_1.3.0 {
 	global:
 		bpf_obj_pin_opts;
 		bpf_program__attach_netfilter;
+		bpf_prog_detach_opts;
 } LIBBPF_1.2.0;
-- 
2.34.1


* [PATCH bpf-next v4 4/8] libbpf: Add link-based API for tcx
  2023-07-10 20:12 [PATCH bpf-next v4 0/8] BPF link support for tc BPF programs Daniel Borkmann
                   ` (2 preceding siblings ...)
  2023-07-10 20:12 ` [PATCH bpf-next v4 3/8] libbpf: Add opts-based attach/detach/query API for tcx Daniel Borkmann
@ 2023-07-10 20:12 ` Daniel Borkmann
  2023-07-11  4:00   ` Andrii Nakryiko
  2023-07-10 20:12 ` [PATCH bpf-next v4 5/8] libbpf: Add helper macro to clear opts structs Daniel Borkmann
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Daniel Borkmann @ 2023-07-10 20:12 UTC (permalink / raw)
  To: ast
  Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
	toke, davem, bpf, netdev, Daniel Borkmann

Implement tcx BPF link support for libbpf.

The bpf_program__attach_fd() API has been refactored slightly in order to
pass a bpf_link_create_opts pointer as input.

A new bpf_program__attach_tcx() API has been added on top of this, which
allows passing all relevant data via the extensible struct bpf_tcx_opts.

The program sections tcx/ingress and tcx/egress correspond to the hook locations
for tc ingress and egress, respectively.

For concrete usage examples, see the extensive selftests that have been
developed as part of this series.
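
A minimal usage sketch of the link API (not part of the patch; the skeleton
handle skel, its tcx/ingress program tc_ingress and the ifindex value are
placeholders, error handling reduced to the bare minimum):

  LIBBPF_OPTS(bpf_tcx_opts, opts,
          .ifindex = ifindex, /* placeholder: target netdevice */
  );
  struct bpf_link *link;

  /* Attach the tcx/ingress program via a BPF link. */
  link = bpf_program__attach_tcx(skel->progs.tc_ingress, &opts);
  if (!link)
          return -errno;
  /* ... */
  bpf_link__destroy(link);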

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/lib/bpf/bpf.c      | 19 ++++++++++--
 tools/lib/bpf/bpf.h      |  5 ++++
 tools/lib/bpf/libbpf.c   | 62 ++++++++++++++++++++++++++++++++++------
 tools/lib/bpf/libbpf.h   | 16 +++++++++++
 tools/lib/bpf/libbpf.map |  1 +
 5 files changed, 92 insertions(+), 11 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 3dfc43b477c3..d513c226b9aa 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -717,9 +717,9 @@ int bpf_link_create(int prog_fd, int target_fd,
 		    const struct bpf_link_create_opts *opts)
 {
 	const size_t attr_sz = offsetofend(union bpf_attr, link_create);
-	__u32 target_btf_id, iter_info_len;
+	__u32 target_btf_id, iter_info_len, relative_id;
+	int fd, err, relative;
 	union bpf_attr attr;
-	int fd, err;
 
 	if (!OPTS_VALID(opts, bpf_link_create_opts))
 		return libbpf_err(-EINVAL);
@@ -781,6 +781,21 @@ int bpf_link_create(int prog_fd, int target_fd,
 		if (!OPTS_ZEROED(opts, netfilter))
 			return libbpf_err(-EINVAL);
 		break;
+	case BPF_TCX_INGRESS:
+	case BPF_TCX_EGRESS:
+		relative = OPTS_GET(opts, tcx.relative_fd, 0);
+		relative_id = OPTS_GET(opts, tcx.relative_id, 0);
+		if (relative > 0 && relative_id)
+			return libbpf_err(-EINVAL);
+		if (relative_id) {
+			relative = relative_id;
+			attr.link_create.flags |= BPF_F_ID;
+		}
+		attr.link_create.tcx.relative_fd = relative;
+		attr.link_create.tcx.expected_revision = OPTS_GET(opts, tcx.expected_revision, 0);
+		if (!OPTS_ZEROED(opts, tcx))
+			return libbpf_err(-EINVAL);
+		break;
 	default:
 		if (!OPTS_ZEROED(opts, flags))
 			return libbpf_err(-EINVAL);
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 49e9d88fd9cf..044a74ffc38a 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -401,6 +401,11 @@ struct bpf_link_create_opts {
 			__s32 priority;
 			__u32 flags;
 		} netfilter;
+		struct {
+			__u32 relative_fd;
+			__u32 relative_id;
+			__u64 expected_revision;
+		} tcx;
 	};
 	size_t :0;
 };
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index bd621c916783..aa94d4af0ecb 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -134,6 +134,7 @@ static const char * const link_type_name[] = {
 	[BPF_LINK_TYPE_KPROBE_MULTI]		= "kprobe_multi",
 	[BPF_LINK_TYPE_STRUCT_OPS]		= "struct_ops",
 	[BPF_LINK_TYPE_NETFILTER]		= "netfilter",
+	[BPF_LINK_TYPE_TCX]			= "tcx",
 };
 
 static const char * const map_type_name[] = {
@@ -11845,11 +11846,10 @@ static int attach_lsm(const struct bpf_program *prog, long cookie, struct bpf_li
 }
 
 static struct bpf_link *
-bpf_program__attach_fd(const struct bpf_program *prog, int target_fd, int btf_id,
-		       const char *target_name)
+bpf_program_attach_fd(const struct bpf_program *prog,
+		      int target_fd, const char *target_name,
+		      const struct bpf_link_create_opts *opts)
 {
-	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts,
-			    .target_btf_id = btf_id);
 	enum bpf_attach_type attach_type;
 	char errmsg[STRERR_BUFSIZE];
 	struct bpf_link *link;
@@ -11867,7 +11867,7 @@ bpf_program__attach_fd(const struct bpf_program *prog, int target_fd, int btf_id
 	link->detach = &bpf_link__detach_fd;
 
 	attach_type = bpf_program__expected_attach_type(prog);
-	link_fd = bpf_link_create(prog_fd, target_fd, attach_type, &opts);
+	link_fd = bpf_link_create(prog_fd, target_fd, attach_type, opts);
 	if (link_fd < 0) {
 		link_fd = -errno;
 		free(link);
@@ -11883,19 +11883,58 @@ bpf_program__attach_fd(const struct bpf_program *prog, int target_fd, int btf_id
 struct bpf_link *
 bpf_program__attach_cgroup(const struct bpf_program *prog, int cgroup_fd)
 {
-	return bpf_program__attach_fd(prog, cgroup_fd, 0, "cgroup");
+	return bpf_program_attach_fd(prog, cgroup_fd, "cgroup", NULL);
 }
 
 struct bpf_link *
 bpf_program__attach_netns(const struct bpf_program *prog, int netns_fd)
 {
-	return bpf_program__attach_fd(prog, netns_fd, 0, "netns");
+	return bpf_program_attach_fd(prog, netns_fd, "netns", NULL);
 }
 
 struct bpf_link *bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex)
 {
 	/* target_fd/target_ifindex use the same field in LINK_CREATE */
-	return bpf_program__attach_fd(prog, ifindex, 0, "xdp");
+	return bpf_program_attach_fd(prog, ifindex, "xdp", NULL);
+}
+
+struct bpf_link *
+bpf_program__attach_tcx(const struct bpf_program *prog,
+			const struct bpf_tcx_opts *opts)
+{
+	LIBBPF_OPTS(bpf_link_create_opts, link_create_opts);
+	__u32 relative_id, flags;
+	int ifindex, relative_fd;
+
+	if (!OPTS_VALID(opts, bpf_tcx_opts))
+		return libbpf_err_ptr(-EINVAL);
+
+	relative_id = OPTS_GET(opts, relative_id, 0);
+	relative_fd = OPTS_GET(opts, relative_fd, 0);
+	flags = OPTS_GET(opts, flags, 0);
+	ifindex = OPTS_GET(opts, ifindex, 0);
+
+	/* validate we don't have unexpected combinations of non-zero fields */
+	if (!ifindex) {
+		pr_warn("prog '%s': target netdevice ifindex cannot be zero\n",
+			prog->name);
+		return libbpf_err_ptr(-EINVAL);
+	}
+	if (relative_fd > 0 && relative_id) {
+		pr_warn("prog '%s': relative_fd and relative_id cannot be set at the same time\n",
+			prog->name);
+		return libbpf_err_ptr(-EINVAL);
+	}
+	if (relative_id)
+		flags |= BPF_F_ID;
+
+	link_create_opts.tcx.expected_revision = OPTS_GET(opts, expected_revision, 0);
+	link_create_opts.tcx.relative_fd = relative_fd;
+	link_create_opts.tcx.relative_id = relative_id;
+	link_create_opts.flags = flags;
+
+	/* target_fd/target_ifindex use the same field in LINK_CREATE */
+	return bpf_program_attach_fd(prog, ifindex, "tc", &link_create_opts);
 }
 
 struct bpf_link *bpf_program__attach_freplace(const struct bpf_program *prog,
@@ -11917,11 +11956,16 @@ struct bpf_link *bpf_program__attach_freplace(const struct bpf_program *prog,
 	}
 
 	if (target_fd) {
+		LIBBPF_OPTS(bpf_link_create_opts, target_opts);
+
 		btf_id = libbpf_find_prog_btf_id(attach_func_name, target_fd);
 		if (btf_id < 0)
 			return libbpf_err_ptr(btf_id);
 
-		return bpf_program__attach_fd(prog, target_fd, btf_id, "freplace");
+		target_opts.target_btf_id = btf_id;
+
+		return bpf_program_attach_fd(prog, target_fd, "freplace",
+					     &target_opts);
 	} else {
 		/* no target, so use raw_tracepoint_open for compatibility
 		 * with old kernels
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 10642ad69d76..33f60a318e81 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -733,6 +733,22 @@ LIBBPF_API struct bpf_link *
 bpf_program__attach_netfilter(const struct bpf_program *prog,
 			      const struct bpf_netfilter_opts *opts);
 
+struct bpf_tcx_opts {
+	/* size of this struct, for forward/backward compatibility */
+	size_t sz;
+	int ifindex;
+	__u32 flags;
+	__u32 relative_fd;
+	__u32 relative_id;
+	__u64 expected_revision;
+	size_t :0;
+};
+#define bpf_tcx_opts__last_field expected_revision
+
+LIBBPF_API struct bpf_link *
+bpf_program__attach_tcx(const struct bpf_program *prog,
+			const struct bpf_tcx_opts *opts);
+
 struct bpf_map;
 
 LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index a95d39bbef90..2a2db5c78048 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -397,4 +397,5 @@ LIBBPF_1.3.0 {
 		bpf_obj_pin_opts;
 		bpf_program__attach_netfilter;
 		bpf_prog_detach_opts;
+		bpf_program__attach_tcx;
 } LIBBPF_1.2.0;
-- 
2.34.1


* [PATCH bpf-next v4 5/8] libbpf: Add helper macro to clear opts structs
  2023-07-10 20:12 [PATCH bpf-next v4 0/8] BPF link support for tc BPF programs Daniel Borkmann
                   ` (3 preceding siblings ...)
  2023-07-10 20:12 ` [PATCH bpf-next v4 4/8] libbpf: Add link-based " Daniel Borkmann
@ 2023-07-10 20:12 ` Daniel Borkmann
  2023-07-11  4:02   ` Andrii Nakryiko
  2023-07-10 20:12 ` [PATCH bpf-next v4 6/8] bpftool: Extend net dump with tcx progs Daniel Borkmann
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Daniel Borkmann @ 2023-07-10 20:12 UTC (permalink / raw)
  To: ast
  Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
	toke, davem, bpf, netdev, Daniel Borkmann

Add a small and generic LIBBPF_OPTS_CLEAR() helper macro which clears an
opts structure and reinitializes its .sz member to the structure size. I
found this very useful when developing selftests, and it is generic enough
to sit next to the existing LIBBPF_OPTS() macro, which likewise hides the
.sz initialization.
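
A short usage sketch (prog_fd1..prog_fd3 and ifindex are placeholders,
prog_fd1 is assumed to be already attached, error handling omitted) showing
the macro reused between two opts-based attachments:

  LIBBPF_OPTS(bpf_prog_attach_opts, opta);
  int err;

  /* Insert prog_fd2 before the already attached prog_fd1. */
  opta.flags = BPF_F_BEFORE;
  opta.relative_fd = prog_fd1;
  err = bpf_prog_attach_opts(prog_fd2, ifindex, BPF_TCX_INGRESS, &opta);

  /* Reset all fields and restore .sz before reusing the same opts. */
  LIBBPF_OPTS_CLEAR(opta);

  /* Insert prog_fd3 after prog_fd1 this time. */
  opta.flags = BPF_F_AFTER;
  opta.relative_fd = prog_fd1;
  err = bpf_prog_attach_opts(prog_fd3, ifindex, BPF_TCX_INGRESS, &opta);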

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/lib/bpf/libbpf_common.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/lib/bpf/libbpf_common.h b/tools/lib/bpf/libbpf_common.h
index 9a7937f339df..eb180023aa97 100644
--- a/tools/lib/bpf/libbpf_common.h
+++ b/tools/lib/bpf/libbpf_common.h
@@ -70,4 +70,15 @@
 		};							    \
 	})
 
+/* Helper macro to clear a libbpf options struct
+ *
+ * Small helper macro to reset all fields and to reinitialize the common
+ * structure size member.
+ */
+#define LIBBPF_OPTS_CLEAR(NAME)						    \
+	do {								    \
+		memset(&NAME, 0, sizeof(NAME));				    \
+		NAME.sz = sizeof(NAME);					    \
+	} while (0)
+
 #endif /* __LIBBPF_LIBBPF_COMMON_H */
-- 
2.34.1


* [PATCH bpf-next v4 6/8] bpftool: Extend net dump with tcx progs
  2023-07-10 20:12 [PATCH bpf-next v4 0/8] BPF link support for tc BPF programs Daniel Borkmann
                   ` (4 preceding siblings ...)
  2023-07-10 20:12 ` [PATCH bpf-next v4 5/8] libbpf: Add helper macro to clear opts structs Daniel Borkmann
@ 2023-07-10 20:12 ` Daniel Borkmann
  2023-07-11 14:19   ` Quentin Monnet
  2023-07-10 20:12 ` [PATCH bpf-next v4 7/8] selftests/bpf: Add mprog API tests for BPF tcx opts Daniel Borkmann
  2023-07-10 20:12 ` [PATCH bpf-next v4 8/8] selftests/bpf: Add mprog API tests for BPF tcx links Daniel Borkmann
  7 siblings, 1 reply; 22+ messages in thread
From: Daniel Borkmann @ 2023-07-10 20:12 UTC (permalink / raw)
  To: ast
  Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
	toke, davem, bpf, netdev, Daniel Borkmann

Add support for dumping fd-based attach types via bpftool. This covers
both the tc BPF link and attach ops programs. The dumped information
contains the attach location, function entry name, program ID and link ID
when applicable.

Example with tc BPF link:

  # ./bpftool net
  xdp:

  tc:
  bond0(4) tcx/ingress cil_from_netdev prog id 784 link id 10
  bond0(4) tcx/egress cil_to_netdev prog id 804 link id 11

  flow_dissector:

  netfilter:

Example with tc BPF attach ops:

  # ./bpftool net
  xdp:

  tc:
  bond0(4) tcx/ingress cil_from_netdev prog id 654
  bond0(4) tcx/egress cil_to_netdev prog id 672

  flow_dissector:

  netfilter:

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/bpf/bpftool/net.c | 86 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 82 insertions(+), 4 deletions(-)

diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
index 26a49965bf71..22af0a81458c 100644
--- a/tools/bpf/bpftool/net.c
+++ b/tools/bpf/bpftool/net.c
@@ -76,6 +76,11 @@ static const char * const attach_type_strings[] = {
 	[NET_ATTACH_TYPE_XDP_OFFLOAD]	= "xdpoffload",
 };
 
+static const char * const attach_loc_strings[] = {
+	[BPF_TCX_INGRESS]		= "tcx/ingress",
+	[BPF_TCX_EGRESS]		= "tcx/egress",
+};
+
 const size_t net_attach_type_size = ARRAY_SIZE(attach_type_strings);
 
 static enum net_attach_type parse_attach_type(const char *str)
@@ -422,8 +427,80 @@ static int dump_filter_nlmsg(void *cookie, void *msg, struct nlattr **tb)
 			      filter_info->devname, filter_info->ifindex);
 }
 
-static int show_dev_tc_bpf(int sock, unsigned int nl_pid,
-			   struct ip_devname_ifindex *dev)
+static const char *flags_strings(__u32 flags)
+{
+	return json_output ? "none" : "";
+}
+
+static int __show_dev_tc_bpf_name(__u32 id, char *name, size_t len)
+{
+	struct bpf_prog_info info = {};
+	__u32 ilen = sizeof(info);
+	int fd, ret;
+
+	fd = bpf_prog_get_fd_by_id(id);
+	if (fd < 0)
+		return fd;
+	ret = bpf_obj_get_info_by_fd(fd, &info, &ilen);
+	if (ret < 0)
+		goto out;
+	ret = -ENOENT;
+	if (info.name[0]) {
+		get_prog_full_name(&info, fd, name, len);
+		ret = 0;
+	}
+out:
+	close(fd);
+	return ret;
+}
+
+static void __show_dev_tc_bpf(const struct ip_devname_ifindex *dev,
+			      const enum bpf_attach_type loc)
+{
+	__u32 prog_flags[64] = {}, link_flags[64] = {}, i;
+	__u32 prog_ids[64] = {}, link_ids[64] = {};
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	char prog_name[MAX_PROG_FULL_NAME];
+	int ret;
+
+	optq.prog_ids = prog_ids;
+	optq.prog_attach_flags = prog_flags;
+	optq.link_ids = link_ids;
+	optq.link_attach_flags = link_flags;
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	ret = bpf_prog_query_opts(dev->ifindex, loc, &optq);
+	if (ret)
+		return;
+	for (i = 0; i < optq.count; i++) {
+		NET_START_OBJECT;
+		NET_DUMP_STR("devname", "%s", dev->devname);
+		NET_DUMP_UINT("ifindex", "(%u)", dev->ifindex);
+		NET_DUMP_STR("kind", " %s", attach_loc_strings[loc]);
+		ret = __show_dev_tc_bpf_name(prog_ids[i], prog_name,
+					     sizeof(prog_name));
+		if (!ret)
+			NET_DUMP_STR("name", " %s", prog_name);
+		NET_DUMP_UINT("prog_id", " prog id %u", prog_ids[i]);
+		if (prog_flags[i])
+			NET_DUMP_STR("prog_flags", "%s", flags_strings(prog_flags[i]));
+		if (link_ids[i])
+			NET_DUMP_UINT("link_id", " link id %u",
+				      link_ids[i]);
+		if (link_flags[i])
+			NET_DUMP_STR("link_flags", "%s", flags_strings(link_flags[i]));
+		NET_END_OBJECT_FINAL;
+	}
+}
+
+static void show_dev_tc_bpf(struct ip_devname_ifindex *dev)
+{
+	__show_dev_tc_bpf(dev, BPF_TCX_INGRESS);
+	__show_dev_tc_bpf(dev, BPF_TCX_EGRESS);
+}
+
+static int show_dev_tc_bpf_classic(int sock, unsigned int nl_pid,
+				   struct ip_devname_ifindex *dev)
 {
 	struct bpf_filter_t filter_info;
 	struct bpf_tcinfo_t tcinfo;
@@ -790,8 +867,9 @@ static int do_show(int argc, char **argv)
 	if (!ret) {
 		NET_START_ARRAY("tc", "%s:\n");
 		for (i = 0; i < dev_array.used_len; i++) {
-			ret = show_dev_tc_bpf(sock, nl_pid,
-					      &dev_array.devices[i]);
+			show_dev_tc_bpf(&dev_array.devices[i]);
+			ret = show_dev_tc_bpf_classic(sock, nl_pid,
+						      &dev_array.devices[i]);
 			if (ret)
 				break;
 		}
-- 
2.34.1


* [PATCH bpf-next v4 7/8] selftests/bpf: Add mprog API tests for BPF tcx opts
  2023-07-10 20:12 [PATCH bpf-next v4 0/8] BPF link support for tc BPF programs Daniel Borkmann
                   ` (5 preceding siblings ...)
  2023-07-10 20:12 ` [PATCH bpf-next v4 6/8] bpftool: Extend net dump with tcx progs Daniel Borkmann
@ 2023-07-10 20:12 ` Daniel Borkmann
  2023-07-10 20:12 ` [PATCH bpf-next v4 8/8] selftests/bpf: Add mprog API tests for BPF tcx links Daniel Borkmann
  7 siblings, 0 replies; 22+ messages in thread
From: Daniel Borkmann @ 2023-07-10 20:12 UTC (permalink / raw)
  To: ast
  Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
	toke, davem, bpf, netdev, Daniel Borkmann

Add a big batch of test coverage to assert all aspects of the tcx opts
attach, detach and query API:

  # ./vmtest.sh -- ./test_progs -t tc_opts
  [...]
  #238     tc_opts_after:OK
  #239     tc_opts_append:OK
  #240     tc_opts_basic:OK
  #241     tc_opts_before:OK
  #242     tc_opts_chain_classic:OK
  #243     tc_opts_demixed:OK
  #244     tc_opts_detach:OK
  #245     tc_opts_detach_after:OK
  #246     tc_opts_detach_before:OK
  #247     tc_opts_dev_cleanup:OK
  #248     tc_opts_invalid:OK
  #249     tc_opts_mixed:OK
  #250     tc_opts_prepend:OK
  #251     tc_opts_replace:OK
  #252     tc_opts_revision:OK
  Summary: 15/0 PASSED, 0 SKIPPED, 0 FAILED
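
For reference, the core attach/query/detach flow these tests exercise
boils down to the following sketch (fd1/fd2 are tc BPF program fds,
loopback is the loopback ifindex, error handling trimmed):

  LIBBPF_OPTS(bpf_prog_attach_opts, opta);
  LIBBPF_OPTS(bpf_prog_query_opts, optq);
  LIBBPF_OPTS(bpf_prog_detach_opts, optd);
  __u32 prog_ids[3] = {};

  /* Attach fd1 first, then append fd2 after it on tcx ingress. */
  err = bpf_prog_attach_opts(fd1, loopback, BPF_TCX_INGRESS, &opta);
  LIBBPF_OPTS_CLEAR(opta);
  opta.flags = BPF_F_AFTER;
  opta.relative_fd = fd1;
  err = bpf_prog_attach_opts(fd2, loopback, BPF_TCX_INGRESS, &opta);

  /* Query the attached programs along with the current revision. */
  optq.prog_ids = prog_ids;
  optq.count = ARRAY_SIZE(prog_ids);
  err = bpf_prog_query_opts(loopback, BPF_TCX_INGRESS, &optq);
  /* optq.count == 2, optq.revision == 3, prog_ids in attach order */

  /* Detach both again via plain prog fds. */
  err = bpf_prog_detach_opts(fd2, loopback, BPF_TCX_INGRESS, &optd);
  err = bpf_prog_detach_opts(fd1, loopback, BPF_TCX_INGRESS, &optd);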

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 .../selftests/bpf/prog_tests/tc_helpers.h     |   72 +
 .../selftests/bpf/prog_tests/tc_opts.c        | 2182 +++++++++++++++++
 .../selftests/bpf/progs/test_tc_link.c        |   40 +
 3 files changed, 2294 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_helpers.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_opts.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_tc_link.c

diff --git a/tools/testing/selftests/bpf/prog_tests/tc_helpers.h b/tools/testing/selftests/bpf/prog_tests/tc_helpers.h
new file mode 100644
index 000000000000..6c93215be8a3
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/tc_helpers.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2023 Isovalent */
+#ifndef TC_HELPERS
+#define TC_HELPERS
+#include <test_progs.h>
+
+static inline __u32 id_from_prog_fd(int fd)
+{
+	struct bpf_prog_info prog_info = {};
+	__u32 prog_info_len = sizeof(prog_info);
+	int err;
+
+	err = bpf_obj_get_info_by_fd(fd, &prog_info, &prog_info_len);
+	if (!ASSERT_OK(err, "id_from_prog_fd"))
+		return 0;
+
+	ASSERT_NEQ(prog_info.id, 0, "prog_info.id");
+	return prog_info.id;
+}
+
+static inline __u32 id_from_link_fd(int fd)
+{
+	struct bpf_link_info link_info = {};
+	__u32 link_info_len = sizeof(link_info);
+	int err;
+
+	err = bpf_link_get_info_by_fd(fd, &link_info, &link_info_len);
+	if (!ASSERT_OK(err, "id_from_link_fd"))
+		return 0;
+
+	ASSERT_NEQ(link_info.id, 0, "link_info.id");
+	return link_info.id;
+}
+
+static inline __u32 ifindex_from_link_fd(int fd)
+{
+	struct bpf_link_info link_info = {};
+	__u32 link_info_len = sizeof(link_info);
+	int err;
+
+	err = bpf_link_get_info_by_fd(fd, &link_info, &link_info_len);
+	if (!ASSERT_OK(err, "id_from_link_fd"))
+		return 0;
+
+	return link_info.tcx.ifindex;
+}
+
+static inline void __assert_mprog_count(int target, int expected, bool miniq, int ifindex)
+{
+	__u32 count = 0, attach_flags = 0;
+	int err;
+
+	err = bpf_prog_query(ifindex, target, 0, &attach_flags,
+			     NULL, &count);
+	ASSERT_EQ(count, expected, "count");
+	if (!expected && !miniq)
+		ASSERT_EQ(err, -ENOENT, "prog_query");
+	else
+		ASSERT_EQ(err, 0, "prog_query");
+}
+
+static inline void assert_mprog_count(int target, int expected)
+{
+	__assert_mprog_count(target, expected, false, loopback);
+}
+
+static inline void assert_mprog_count_ifindex(int ifindex, int target, int expected)
+{
+	__assert_mprog_count(target, expected, false, ifindex);
+}
+
+#endif /* TC_HELPERS */
diff --git a/tools/testing/selftests/bpf/prog_tests/tc_opts.c b/tools/testing/selftests/bpf/prog_tests/tc_opts.c
new file mode 100644
index 000000000000..c21a2940ea53
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/tc_opts.c
@@ -0,0 +1,2182 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+#include <uapi/linux/if_link.h>
+#include <net/if.h>
+#include <test_progs.h>
+
+#define loopback 1
+#define ping_cmd "ping -q -c1 -w1 127.0.0.1 > /dev/null"
+
+#include "test_tc_link.skel.h"
+#include "tc_helpers.h"
+
+void serial_test_tc_opts_basic(void)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 fd1, fd2, id1, id2;
+	struct test_tc_link *skel;
+	__u32 prog_ids[2];
+	int err;
+
+	skel = test_tc_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.tc1);
+	fd2 = bpf_program__fd(skel->progs.tc2);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+
+	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+
+	assert_mprog_count(BPF_TCX_INGRESS, 0);
+	assert_mprog_count(BPF_TCX_EGRESS, 0);
+
+	ASSERT_EQ(skel->bss->seen_tc1, false, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+
+	err = bpf_prog_attach_opts(fd1, loopback, BPF_TCX_INGRESS, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(BPF_TCX_INGRESS, 1);
+	assert_mprog_count(BPF_TCX_EGRESS, 0);
+
+	optq.prog_ids = prog_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, BPF_TCX_INGRESS, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_in;
+
+	ASSERT_EQ(optq.count, 1, "count");
+	ASSERT_EQ(optq.revision, 2, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+
+	err = bpf_prog_attach_opts(fd2, loopback, BPF_TCX_EGRESS, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup_in;
+
+	assert_mprog_count(BPF_TCX_INGRESS, 1);
+	assert_mprog_count(BPF_TCX_EGRESS, 1);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, BPF_TCX_EGRESS, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_eg;
+
+	ASSERT_EQ(optq.count, 1, "count");
+	ASSERT_EQ(optq.revision, 2, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+
+cleanup_eg:
+	err = bpf_prog_detach_opts(fd2, loopback, BPF_TCX_EGRESS, &optd);
+	ASSERT_OK(err, "prog_detach_eg");
+
+	assert_mprog_count(BPF_TCX_INGRESS, 1);
+	assert_mprog_count(BPF_TCX_EGRESS, 0);
+
+cleanup_in:
+	err = bpf_prog_detach_opts(fd1, loopback, BPF_TCX_INGRESS, &optd);
+	ASSERT_OK(err, "prog_detach_in");
+
+	assert_mprog_count(BPF_TCX_INGRESS, 0);
+	assert_mprog_count(BPF_TCX_EGRESS, 0);
+
+cleanup:
+	test_tc_link__destroy(skel);
+}
+
+static void test_tc_opts_before_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+	struct test_tc_link *skel;
+	__u32 prog_ids[5];
+	int err;
+
+	skel = test_tc_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.tc1);
+	fd2 = bpf_program__fd(skel->progs.tc2);
+	fd3 = bpf_program__fd(skel->progs.tc3);
+	fd4 = bpf_program__fd(skel->progs.tc4);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+	id3 = id_from_prog_fd(fd3);
+	id4 = id_from_prog_fd(fd4);
+
+	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+	ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+	ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup_target;
+
+	assert_mprog_count(target, 2);
+
+	optq.prog_ids = prog_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_target2;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 3, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_BEFORE;
+	opta.relative_fd = fd2;
+
+	err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup_target2;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_target3;
+
+	ASSERT_EQ(optq.count, 3, "count");
+	ASSERT_EQ(optq.revision, 4, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id2, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_BEFORE;
+	opta.relative_id = id1;
+
+	err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup_target3;
+
+	assert_mprog_count(target, 4);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_target4;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id4, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id1, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id3, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], id2, "prog_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");
+
+cleanup_target4:
+	err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 3);
+
+cleanup_target3:
+	err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 2);
+
+cleanup_target2:
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 1);
+
+cleanup_target:
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 0);
+
+cleanup:
+	test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_before(void)
+{
+	test_tc_opts_before_target(BPF_TCX_INGRESS);
+	test_tc_opts_before_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_after_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+	struct test_tc_link *skel;
+	__u32 prog_ids[5];
+	int err;
+
+	skel = test_tc_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.tc1);
+	fd2 = bpf_program__fd(skel->progs.tc2);
+	fd3 = bpf_program__fd(skel->progs.tc3);
+	fd4 = bpf_program__fd(skel->progs.tc4);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+	id3 = id_from_prog_fd(fd3);
+	id4 = id_from_prog_fd(fd4);
+
+	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+	ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+	ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup_target;
+
+	assert_mprog_count(target, 2);
+
+	optq.prog_ids = prog_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_target2;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 3, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_AFTER;
+	opta.relative_fd = fd1;
+
+	err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup_target2;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_target3;
+
+	ASSERT_EQ(optq.count, 3, "count");
+	ASSERT_EQ(optq.revision, 4, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id2, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_AFTER;
+	opta.relative_id = id2;
+
+	err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup_target3;
+
+	assert_mprog_count(target, 4);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_target4;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id2, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], id4, "prog_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");
+
+cleanup_target4:
+	err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 3);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_target3;
+
+	ASSERT_EQ(optq.count, 3, "count");
+	ASSERT_EQ(optq.revision, 6, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id2, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");
+
+cleanup_target3:
+	err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 2);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_target2;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 7, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+cleanup_target2:
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 1);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_target;
+
+	ASSERT_EQ(optq.count, 1, "count");
+	ASSERT_EQ(optq.revision, 8, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+
+cleanup_target:
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 0);
+
+cleanup:
+	test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_after(void)
+{
+	test_tc_opts_after_target(BPF_TCX_INGRESS);
+	test_tc_opts_after_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_revision_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 fd1, fd2, id1, id2;
+	struct test_tc_link *skel;
+	__u32 prog_ids[3];
+	int err;
+
+	skel = test_tc_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.tc1);
+	fd2 = bpf_program__fd(skel->progs.tc2);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+
+	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.expected_revision = 1;
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.expected_revision = 1;
+
+	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+	if (!ASSERT_EQ(err, -ESTALE, "prog_attach"))
+		goto cleanup_target;
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.expected_revision = 2;
+
+	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup_target;
+
+	assert_mprog_count(target, 2);
+
+	optq.prog_ids = prog_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_target2;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 3, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.expected_revision = 2;
+
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_EQ(err, -ESTALE, "prog_detach");
+	assert_mprog_count(target, 2);
+
+cleanup_target2:
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.expected_revision = 3;
+
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 1);
+
+cleanup_target:
+	LIBBPF_OPTS_CLEAR(optd);
+
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 0);
+
+cleanup:
+	test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_revision(void)
+{
+	test_tc_opts_revision_target(BPF_TCX_INGRESS);
+	test_tc_opts_revision_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_chain_classic(int target, bool chain_tc_old)
+{
+	LIBBPF_OPTS(bpf_tc_opts, tc_opts, .handle = 1, .priority = 1);
+	LIBBPF_OPTS(bpf_tc_hook, tc_hook, .ifindex = loopback);
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	bool hook_created = false, tc_attached = false;
+	__u32 fd1, fd2, fd3, id1, id2, id3;
+	struct test_tc_link *skel;
+	int err;
+
+	skel = test_tc_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.tc1);
+	fd2 = bpf_program__fd(skel->progs.tc2);
+	fd3 = bpf_program__fd(skel->progs.tc3);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+	id3 = id_from_prog_fd(fd3);
+
+	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+	ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	if (chain_tc_old) {
+		tc_hook.attach_point = target == BPF_TCX_INGRESS ?
+				       BPF_TC_INGRESS : BPF_TC_EGRESS;
+		err = bpf_tc_hook_create(&tc_hook);
+		if (err == 0)
+			hook_created = true;
+		err = err == -EEXIST ? 0 : err;
+		if (!ASSERT_OK(err, "bpf_tc_hook_create"))
+			goto cleanup;
+
+		tc_opts.prog_fd = fd3;
+		err = bpf_tc_attach(&tc_hook, &tc_opts);
+		if (!ASSERT_OK(err, "bpf_tc_attach"))
+			goto cleanup;
+		tc_attached = true;
+	}
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup_detach;
+
+	assert_mprog_count(target, 2);
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, chain_tc_old, "seen_tc3");
+
+	skel->bss->seen_tc1 = false;
+	skel->bss->seen_tc2 = false;
+	skel->bss->seen_tc3 = false;
+
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	if (!ASSERT_OK(err, "prog_detach"))
+		goto cleanup_detach;
+
+	assert_mprog_count(target, 1);
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, chain_tc_old, "seen_tc3");
+
+cleanup_detach:
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	if (!ASSERT_OK(err, "prog_detach"))
+		goto cleanup;
+
+	__assert_mprog_count(target, 0, chain_tc_old, loopback);
+cleanup:
+	if (tc_attached) {
+		tc_opts.flags = tc_opts.prog_fd = tc_opts.prog_id = 0;
+		err = bpf_tc_detach(&tc_hook, &tc_opts);
+		ASSERT_OK(err, "bpf_tc_detach");
+	}
+	if (hook_created) {
+		tc_hook.attach_point = BPF_TC_INGRESS | BPF_TC_EGRESS;
+		bpf_tc_hook_destroy(&tc_hook);
+	}
+	test_tc_link__destroy(skel);
+	assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_opts_chain_classic(void)
+{
+	test_tc_chain_classic(BPF_TCX_INGRESS, false);
+	test_tc_chain_classic(BPF_TCX_EGRESS, false);
+	test_tc_chain_classic(BPF_TCX_INGRESS, true);
+	test_tc_chain_classic(BPF_TCX_EGRESS, true);
+}
+
+static void test_tc_opts_replace_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 fd1, fd2, fd3, id1, id2, id3, detach_fd;
+	__u32 prog_ids[4], prog_flags[4];
+	struct test_tc_link *skel;
+	int err;
+
+	skel = test_tc_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.tc1);
+	fd2 = bpf_program__fd(skel->progs.tc2);
+	fd3 = bpf_program__fd(skel->progs.tc3);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+	id3 = id_from_prog_fd(fd3);
+
+	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+	ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.expected_revision = 1;
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_BEFORE;
+	opta.relative_id = id1;
+	opta.expected_revision = 2;
+
+	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup_target;
+
+	detach_fd = fd2;
+
+	assert_mprog_count(target, 2);
+
+	optq.prog_attach_flags = prog_flags;
+	optq.prog_ids = prog_ids;
+
+	memset(prog_flags, 0, sizeof(prog_flags));
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_target2;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 3, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id1, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+	ASSERT_EQ(optq.prog_attach_flags[0], 0, "prog_flags[0]");
+	ASSERT_EQ(optq.prog_attach_flags[1], 0, "prog_flags[1]");
+	ASSERT_EQ(optq.prog_attach_flags[2], 0, "prog_flags[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+
+	skel->bss->seen_tc1 = false;
+	skel->bss->seen_tc2 = false;
+	skel->bss->seen_tc3 = false;
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_REPLACE;
+	opta.replace_prog_fd = fd2;
+	opta.expected_revision = 3;
+
+	err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup_target2;
+
+	detach_fd = fd3;
+
+	assert_mprog_count(target, 2);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_target2;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 4, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id3, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id1, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+
+	skel->bss->seen_tc1 = false;
+	skel->bss->seen_tc2 = false;
+	skel->bss->seen_tc3 = false;
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_REPLACE | BPF_F_BEFORE;
+	opta.replace_prog_fd = fd3;
+	opta.relative_fd = fd1;
+	opta.expected_revision = 4;
+
+	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup_target2;
+
+	detach_fd = fd2;
+
+	assert_mprog_count(target, 2);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_target2;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id1, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_REPLACE;
+	opta.replace_prog_fd = fd2;
+
+	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+	ASSERT_EQ(err, -EEXIST, "prog_attach");
+	assert_mprog_count(target, 2);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_REPLACE | BPF_F_AFTER;
+	opta.replace_prog_fd = fd2;
+	opta.relative_fd = fd1;
+	opta.expected_revision = 5;
+
+	err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+	ASSERT_EQ(err, -EDOM, "prog_attach");
+	assert_mprog_count(target, 2);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_BEFORE | BPF_F_AFTER | BPF_F_REPLACE;
+	opta.replace_prog_fd = fd2;
+	opta.relative_fd = fd1;
+	opta.expected_revision = 5;
+
+	err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+	ASSERT_EQ(err, -EDOM, "prog_attach");
+	assert_mprog_count(target, 2);
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_BEFORE;
+	optd.relative_id = id1;
+	optd.expected_revision = 5;
+
+cleanup_target2:
+	err = bpf_prog_detach_opts(detach_fd, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 1);
+
+cleanup_target:
+	LIBBPF_OPTS_CLEAR(optd);
+
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 0);
+
+cleanup:
+	test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_replace(void)
+{
+	test_tc_opts_replace_target(BPF_TCX_INGRESS);
+	test_tc_opts_replace_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_invalid_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	__u32 fd1, fd2, id1, id2;
+	struct test_tc_link *skel;
+	int err;
+
+	skel = test_tc_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.tc1);
+	fd2 = bpf_program__fd(skel->progs.tc2);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+
+	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_BEFORE | BPF_F_AFTER;
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	ASSERT_EQ(err, -EDOM, "prog_attach");
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_BEFORE | BPF_F_ID;
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	ASSERT_EQ(err, -ENOENT, "prog_attach");
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_AFTER | BPF_F_ID;
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	ASSERT_EQ(err, -ENOENT, "prog_attach");
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.relative_fd = fd2;
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	ASSERT_EQ(err, -EINVAL, "prog_attach");
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_BEFORE | BPF_F_AFTER;
+	opta.relative_fd = fd2;
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	ASSERT_EQ(err, -ENOENT, "prog_attach");
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_ID;
+	opta.relative_id = id2;
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	ASSERT_EQ(err, -EINVAL, "prog_attach");
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_BEFORE;
+	opta.relative_fd = fd1;
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	ASSERT_EQ(err, -ENOENT, "prog_attach");
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_AFTER;
+	opta.relative_fd = fd1;
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	ASSERT_EQ(err, -ENOENT, "prog_attach");
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(opta);
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_CLEAR(opta);
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	ASSERT_EQ(err, -EEXIST, "prog_attach");
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_BEFORE;
+	opta.relative_fd = fd1;
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	ASSERT_EQ(err, -EEXIST, "prog_attach");
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_AFTER;
+	opta.relative_fd = fd1;
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	ASSERT_EQ(err, -EEXIST, "prog_attach");
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_REPLACE;
+	opta.relative_fd = fd1;
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	ASSERT_EQ(err, -EINVAL, "prog_attach_x1");
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_REPLACE;
+	opta.replace_prog_fd = fd1;
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	ASSERT_EQ(err, -EEXIST, "prog_attach");
+	assert_mprog_count(target, 1);
+
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 0);
+cleanup:
+	test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_invalid(void)
+{
+	test_tc_opts_invalid_target(BPF_TCX_INGRESS);
+	test_tc_opts_invalid_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_prepend_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+	struct test_tc_link *skel;
+	__u32 prog_ids[5];
+	int err;
+
+	skel = test_tc_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.tc1);
+	fd2 = bpf_program__fd(skel->progs.tc2);
+	fd3 = bpf_program__fd(skel->progs.tc3);
+	fd4 = bpf_program__fd(skel->progs.tc4);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+	id3 = id_from_prog_fd(fd3);
+	id4 = id_from_prog_fd(fd4);
+
+	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+	ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+	ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_BEFORE;
+
+	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup_target;
+
+	assert_mprog_count(target, 2);
+
+	optq.prog_ids = prog_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_target2;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 3, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id1, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_BEFORE;
+
+	err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup_target2;
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_BEFORE;
+
+	err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup_target3;
+
+	assert_mprog_count(target, 4);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_target4;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id4, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id2, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], id1, "prog_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");
+
+cleanup_target4:
+	err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 3);
+
+cleanup_target3:
+	err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 2);
+
+cleanup_target2:
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 1);
+
+cleanup_target:
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 0);
+
+cleanup:
+	test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_prepend(void)
+{
+	test_tc_opts_prepend_target(BPF_TCX_INGRESS);
+	test_tc_opts_prepend_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_append_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+	struct test_tc_link *skel;
+	__u32 prog_ids[5];
+	int err;
+
+	skel = test_tc_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.tc1);
+	fd2 = bpf_program__fd(skel->progs.tc2);
+	fd3 = bpf_program__fd(skel->progs.tc3);
+	fd4 = bpf_program__fd(skel->progs.tc4);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+	id3 = id_from_prog_fd(fd3);
+	id4 = id_from_prog_fd(fd4);
+
+	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+	ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+	ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_AFTER;
+
+	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup_target;
+
+	assert_mprog_count(target, 2);
+
+	optq.prog_ids = prog_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_target2;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 3, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_AFTER;
+
+	err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup_target2;
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_AFTER;
+
+	err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup_target3;
+
+	assert_mprog_count(target, 4);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup_target4;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id3, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], id4, "prog_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");
+
+cleanup_target4:
+	err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 3);
+
+cleanup_target3:
+	err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 2);
+
+cleanup_target2:
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 1);
+
+cleanup_target:
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 0);
+
+cleanup:
+	test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_append(void)
+{
+	test_tc_opts_append_target(BPF_TCX_INGRESS);
+	test_tc_opts_append_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_dev_cleanup_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+	struct test_tc_link *skel;
+	int err, ifindex;
+
+	ASSERT_OK(system("ip link add dev tcx_opts1 type veth peer name tcx_opts2"), "add veth");
+	ifindex = if_nametoindex("tcx_opts1");
+	ASSERT_NEQ(ifindex, 0, "non_zero_ifindex");
+
+	skel = test_tc_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.tc1);
+	fd2 = bpf_program__fd(skel->progs.tc2);
+	fd3 = bpf_program__fd(skel->progs.tc3);
+	fd4 = bpf_program__fd(skel->progs.tc4);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+	id3 = id_from_prog_fd(fd3);
+	id4 = id_from_prog_fd(fd4);
+
+	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+	ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+	ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+	assert_mprog_count_ifindex(ifindex, target, 0);
+
+	err = bpf_prog_attach_opts(fd1, ifindex, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count_ifindex(ifindex, target, 1);
+
+	err = bpf_prog_attach_opts(fd2, ifindex, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup1;
+
+	assert_mprog_count_ifindex(ifindex, target, 2);
+
+	err = bpf_prog_attach_opts(fd3, ifindex, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup2;
+
+	assert_mprog_count_ifindex(ifindex, target, 3);
+
+	err = bpf_prog_attach_opts(fd4, ifindex, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup3;
+
+	assert_mprog_count_ifindex(ifindex, target, 4);
+
+	ASSERT_OK(system("ip link del dev tcx_opts1"), "del veth");
+	ASSERT_EQ(if_nametoindex("tcx_opts1"), 0, "dev1_removed");
+	ASSERT_EQ(if_nametoindex("tcx_opts2"), 0, "dev2_removed");
+	return;
+cleanup3:
+	err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count_ifindex(ifindex, target, 2);
+cleanup2:
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count_ifindex(ifindex, target, 1);
+cleanup1:
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count_ifindex(ifindex, target, 0);
+cleanup:
+	test_tc_link__destroy(skel);
+
+	ASSERT_OK(system("ip link del dev tcx_opts1"), "del veth");
+	ASSERT_EQ(if_nametoindex("tcx_opts1"), 0, "dev1_removed");
+	ASSERT_EQ(if_nametoindex("tcx_opts2"), 0, "dev2_removed");
+}
+
+void serial_test_tc_opts_dev_cleanup(void)
+{
+	test_tc_opts_dev_cleanup_target(BPF_TCX_INGRESS);
+	test_tc_opts_dev_cleanup_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_mixed_target(int target)
+{
+	LIBBPF_OPTS(bpf_tcx_opts, optl,
+		.ifindex = loopback,
+	);
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 pid1, pid2, pid3, pid4, lid2, lid4;
+	__u32 prog_flags[4], link_flags[4];
+	__u32 prog_ids[4], link_ids[4];
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	int err, detach_fd;
+
+	skel = test_tc_link__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+		  0, "tc1_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+		  0, "tc2_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target),
+		  0, "tc3_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc4, target),
+		  0, "tc4_attach_type");
+
+	err = test_tc_link__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+	pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+	pid4 = id_from_prog_fd(bpf_program__fd(skel->progs.tc4));
+
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+	ASSERT_NEQ(pid3, pid4, "prog_ids_3_4");
+	ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc1),
+				   loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	detach_fd = bpf_program__fd(skel->progs.tc1);
+
+	assert_mprog_count(target, 1);
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup1;
+	skel->links.tc2 = link;
+
+	lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+
+	assert_mprog_count(target, 2);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_REPLACE;
+	opta.replace_prog_fd = bpf_program__fd(skel->progs.tc1);
+
+	err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc2),
+				   loopback, target, &opta);
+	ASSERT_EQ(err, -EEXIST, "prog_attach");
+
+	assert_mprog_count(target, 2);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_REPLACE;
+	opta.replace_prog_fd = bpf_program__fd(skel->progs.tc2);
+
+	err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc1),
+				   loopback, target, &opta);
+	ASSERT_EQ(err, -EEXIST, "prog_attach");
+
+	assert_mprog_count(target, 2);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_REPLACE;
+	opta.replace_prog_fd = bpf_program__fd(skel->progs.tc2);
+
+	err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc3),
+				   loopback, target, &opta);
+	ASSERT_EQ(err, -EBUSY, "prog_attach");
+
+	assert_mprog_count(target, 2);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_REPLACE;
+	opta.replace_prog_fd = bpf_program__fd(skel->progs.tc1);
+
+	err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc3),
+				   loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup1;
+
+	detach_fd = bpf_program__fd(skel->progs.tc3);
+
+	assert_mprog_count(target, 2);
+
+	link = bpf_program__attach_tcx(skel->progs.tc4, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup1;
+	skel->links.tc4 = link;
+
+	lid4 = id_from_link_fd(bpf_link__fd(skel->links.tc4));
+
+	assert_mprog_count(target, 3);
+
+	LIBBPF_OPTS_CLEAR(opta);
+	opta.flags = BPF_F_REPLACE;
+	opta.replace_prog_fd = bpf_program__fd(skel->progs.tc4);
+
+	err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc2),
+				   loopback, target, &opta);
+	ASSERT_EQ(err, -EEXIST, "prog_attach");
+
+	optq.prog_ids = prog_ids;
+	optq.prog_attach_flags = prog_flags;
+	optq.link_ids = link_ids;
+	optq.link_attach_flags = link_flags;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(prog_flags, 0, sizeof(prog_flags));
+	memset(link_ids, 0, sizeof(link_ids));
+	memset(link_flags, 0, sizeof(link_flags));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup1;
+
+	ASSERT_EQ(optq.count, 3, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid3, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_attach_flags[0], 0, "prog_flags[0]");
+	ASSERT_EQ(optq.link_ids[0], 0, "link_ids[0]");
+	ASSERT_EQ(optq.link_attach_flags[0], 0, "link_flags[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid2, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_attach_flags[1], 0, "prog_flags[1]");
+	ASSERT_EQ(optq.link_ids[1], lid2, "link_ids[1]");
+	ASSERT_EQ(optq.link_attach_flags[1], 0, "link_flags[1]");
+	ASSERT_EQ(optq.prog_ids[2], pid4, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_attach_flags[2], 0, "prog_flags[2]");
+	ASSERT_EQ(optq.link_ids[2], lid4, "link_ids[2]");
+	ASSERT_EQ(optq.link_attach_flags[2], 0, "link_flags[2]");
+	ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");
+	ASSERT_EQ(optq.prog_attach_flags[3], 0, "prog_flags[3]");
+	ASSERT_EQ(optq.link_ids[3], 0, "link_ids[3]");
+	ASSERT_EQ(optq.link_attach_flags[3], 0, "link_flags[3]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+cleanup1:
+	err = bpf_prog_detach_opts(detach_fd, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 2);
+
+cleanup:
+	test_tc_link__destroy(skel);
+	assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_opts_mixed(void)
+{
+	test_tc_opts_mixed_target(BPF_TCX_INGRESS);
+	test_tc_opts_mixed_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_demixed_target(int target)
+{
+	LIBBPF_OPTS(bpf_tcx_opts, optl,
+		.ifindex = loopback,
+	);
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	__u32 pid1, pid2;
+	int err;
+
+	skel = test_tc_link__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+		  0, "tc1_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+		  0, "tc2_attach_type");
+
+	err = test_tc_link__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+
+	assert_mprog_count(target, 0);
+
+	err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc1),
+				   loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup1;
+	skel->links.tc2 = link;
+
+	assert_mprog_count(target, 2);
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_AFTER;
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_EQ(err, -EBUSY, "prog_detach");
+
+	assert_mprog_count(target, 2);
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_BEFORE;
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 1);
+	goto cleanup;
+
+cleanup1:
+	err = bpf_prog_detach_opts(bpf_program__fd(skel->progs.tc1),
+				   loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 2);
+
+cleanup:
+	test_tc_link__destroy(skel);
+	assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_opts_demixed(void)
+{
+	test_tc_opts_demixed_target(BPF_TCX_INGRESS);
+	test_tc_opts_demixed_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_detach_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+	struct test_tc_link *skel;
+	__u32 prog_ids[5];
+	int err;
+
+	skel = test_tc_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.tc1);
+	fd2 = bpf_program__fd(skel->progs.tc2);
+	fd3 = bpf_program__fd(skel->progs.tc3);
+	fd4 = bpf_program__fd(skel->progs.tc4);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+	id3 = id_from_prog_fd(fd3);
+	id4 = id_from_prog_fd(fd4);
+
+	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+	ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+	ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup1;
+
+	assert_mprog_count(target, 2);
+
+	err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup2;
+
+	assert_mprog_count(target, 3);
+
+	err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup3;
+
+	assert_mprog_count(target, 4);
+
+	optq.prog_ids = prog_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id3, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], id4, "prog_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_BEFORE;
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 3);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 3, "count");
+	ASSERT_EQ(optq.revision, 6, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id4, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_AFTER;
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 2);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 7, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+	LIBBPF_OPTS_CLEAR(optd);
+
+	err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 1);
+
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_BEFORE;
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_EQ(err, -ENOENT, "prog_detach");
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_AFTER;
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_EQ(err, -ENOENT, "prog_detach");
+	goto cleanup;
+
+cleanup4:
+	err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 3);
+
+cleanup3:
+	err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 2);
+
+cleanup2:
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 1);
+
+cleanup1:
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 0);
+
+cleanup:
+	test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_detach(void)
+{
+	test_tc_opts_detach_target(BPF_TCX_INGRESS);
+	test_tc_opts_detach_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_detach_before_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+	struct test_tc_link *skel;
+	__u32 prog_ids[5];
+	int err;
+
+	skel = test_tc_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.tc1);
+	fd2 = bpf_program__fd(skel->progs.tc2);
+	fd3 = bpf_program__fd(skel->progs.tc3);
+	fd4 = bpf_program__fd(skel->progs.tc4);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+	id3 = id_from_prog_fd(fd3);
+	id4 = id_from_prog_fd(fd4);
+
+	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+	ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+	ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup1;
+
+	assert_mprog_count(target, 2);
+
+	err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup2;
+
+	assert_mprog_count(target, 3);
+
+	err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup3;
+
+	assert_mprog_count(target, 4);
+
+	optq.prog_ids = prog_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id3, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], id4, "prog_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_BEFORE;
+	optd.relative_fd = fd2;
+
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 3);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 3, "count");
+	ASSERT_EQ(optq.revision, 6, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id4, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_BEFORE;
+	optd.relative_fd = fd2;
+
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_EQ(err, -ENOENT, "prog_detach");
+	assert_mprog_count(target, 3);
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_BEFORE;
+	optd.relative_fd = fd4;
+
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_EQ(err, -EDOM, "prog_detach");
+	assert_mprog_count(target, 3);
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_BEFORE;
+	optd.relative_fd = fd1;
+
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_EQ(err, -ENOENT, "prog_detach");
+	assert_mprog_count(target, 3);
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_BEFORE;
+	optd.relative_fd = fd3;
+
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 2);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 7, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id3, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id4, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_BEFORE;
+	optd.relative_fd = fd4;
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 1);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 1, "count");
+	ASSERT_EQ(optq.revision, 8, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id4, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_BEFORE;
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 0);
+	goto cleanup;
+
+cleanup4:
+	err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 3);
+
+cleanup3:
+	err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 2);
+
+cleanup2:
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 1);
+
+cleanup1:
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 0);
+
+cleanup:
+	test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_detach_before(void)
+{
+	test_tc_opts_detach_before_target(BPF_TCX_INGRESS);
+	test_tc_opts_detach_before_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_detach_after_target(int target)
+{
+	LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+	LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+	struct test_tc_link *skel;
+	__u32 prog_ids[5];
+	int err;
+
+	skel = test_tc_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	fd1 = bpf_program__fd(skel->progs.tc1);
+	fd2 = bpf_program__fd(skel->progs.tc2);
+	fd3 = bpf_program__fd(skel->progs.tc3);
+	fd4 = bpf_program__fd(skel->progs.tc4);
+
+	id1 = id_from_prog_fd(fd1);
+	id2 = id_from_prog_fd(fd2);
+	id3 = id_from_prog_fd(fd3);
+	id4 = id_from_prog_fd(fd4);
+
+	ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+	ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+	ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup1;
+
+	assert_mprog_count(target, 2);
+
+	err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup2;
+
+	assert_mprog_count(target, 3);
+
+	err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
+	if (!ASSERT_EQ(err, 0, "prog_attach"))
+		goto cleanup3;
+
+	assert_mprog_count(target, 4);
+
+	optq.prog_ids = prog_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id3, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], id4, "prog_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_AFTER;
+	optd.relative_fd = fd1;
+
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 3);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 3, "count");
+	ASSERT_EQ(optq.revision, 6, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], id4, "prog_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_AFTER;
+	optd.relative_fd = fd1;
+
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_EQ(err, -ENOENT, "prog_detach");
+	assert_mprog_count(target, 3);
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_AFTER;
+	optd.relative_fd = fd4;
+
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_EQ(err, -EDOM, "prog_detach");
+	assert_mprog_count(target, 3);
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_AFTER;
+	optd.relative_fd = fd3;
+
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_EQ(err, -EDOM, "prog_detach");
+	assert_mprog_count(target, 3);
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_AFTER;
+	optd.relative_fd = fd1;
+
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_EQ(err, -EDOM, "prog_detach");
+	assert_mprog_count(target, 3);
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_AFTER;
+	optd.relative_fd = fd1;
+
+	err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 2);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 7, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], id4, "prog_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_AFTER;
+	optd.relative_fd = fd1;
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 1);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup4;
+
+	ASSERT_EQ(optq.count, 1, "count");
+	ASSERT_EQ(optq.revision, 8, "revision");
+	ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+
+	LIBBPF_OPTS_CLEAR(optd);
+	optd.flags = BPF_F_AFTER;
+
+	err = bpf_prog_detach_opts(0, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count(target, 0);
+	goto cleanup;
+
+cleanup4:
+	err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 3);
+
+cleanup3:
+	err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 2);
+
+cleanup2:
+	err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 1);
+
+cleanup1:
+	err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 0);
+
+cleanup:
+	test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_detach_after(void)
+{
+	test_tc_opts_detach_after_target(BPF_TCX_INGRESS);
+	test_tc_opts_detach_after_target(BPF_TCX_EGRESS);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_tc_link.c b/tools/testing/selftests/bpf/progs/test_tc_link.c
new file mode 100644
index 000000000000..ed1fd0e9cee9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_tc_link.c
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+#include <stdbool.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+char LICENSE[] SEC("license") = "GPL";
+
+bool seen_tc1;
+bool seen_tc2;
+bool seen_tc3;
+bool seen_tc4;
+
+SEC("tc/ingress")
+int tc1(struct __sk_buff *skb)
+{
+	seen_tc1 = true;
+	return TCX_NEXT;
+}
+
+SEC("tc/egress")
+int tc2(struct __sk_buff *skb)
+{
+	seen_tc2 = true;
+	return TCX_NEXT;
+}
+
+SEC("tc/egress")
+int tc3(struct __sk_buff *skb)
+{
+	seen_tc3 = true;
+	return TCX_NEXT;
+}
+
+SEC("tc/egress")
+int tc4(struct __sk_buff *skb)
+{
+	seen_tc4 = true;
+	return TCX_NEXT;
+}
-- 
2.34.1



* [PATCH bpf-next v4 8/8] selftests/bpf: Add mprog API tests for BPF tcx links
  2023-07-10 20:12 [PATCH bpf-next v4 0/8] BPF link support for tc BPF programs Daniel Borkmann
                   ` (6 preceding siblings ...)
  2023-07-10 20:12 ` [PATCH bpf-next v4 7/8] selftests/bpf: Add mprog API tests for BPF tcx opts Daniel Borkmann
@ 2023-07-10 20:12 ` Daniel Borkmann
  7 siblings, 0 replies; 22+ messages in thread
From: Daniel Borkmann @ 2023-07-10 20:12 UTC (permalink / raw)
  To: ast
  Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
	toke, davem, bpf, netdev, Daniel Borkmann

Add a big batch of test coverage to assert all aspects of the tcx link API:

  # ./vmtest.sh -- ./test_progs -t tc_links
  [...]
  #225     tc_links_after:OK
  #226     tc_links_append:OK
  #227     tc_links_basic:OK
  #228     tc_links_before:OK
  #229     tc_links_chain_classic:OK
  #230     tc_links_dev_cleanup:OK
  #231     tc_links_invalid:OK
  #232     tc_links_prepend:OK
  #233     tc_links_replace:OK
  #234     tc_links_revision:OK
  Summary: 10/0 PASSED, 0 SKIPPED, 0 FAILED
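
For readers less familiar with the API under test, a minimal sketch of the
flow these cases exercise (assuming a loaded test_tc_link skeleton and the
loopback ifindex 1 used throughout the selftests) looks roughly like this:

  LIBBPF_OPTS(bpf_tcx_opts, optl,
          .ifindex = 1,   /* assumed: loopback ifindex as in the selftests */
  );
  struct bpf_link *link;

  /* Direction (ingress vs egress) follows the program's expected attach
   * type, e.g. SEC("tc/ingress") on tc1:
   */
  link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
  if (!link)
          return; /* NULL on error, errno is set */

  /* Insert tc3 in front of tc2 within the same mprog chain: */
  LIBBPF_OPTS_CLEAR(optl);
  optl.ifindex = 1;
  optl.flags = BPF_F_BEFORE;
  optl.relative_fd = bpf_program__fd(skel->progs.tc2);
  link = bpf_program__attach_tcx(skel->progs.tc3, &optl);

  /* Detaching happens once the link is destroyed/released: */
  bpf_link__destroy(link);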

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 .../selftests/bpf/prog_tests/tc_links.c       | 1604 +++++++++++++++++
 1 file changed, 1604 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_links.c

diff --git a/tools/testing/selftests/bpf/prog_tests/tc_links.c b/tools/testing/selftests/bpf/prog_tests/tc_links.c
new file mode 100644
index 000000000000..6a05f75492b5
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/tc_links.c
@@ -0,0 +1,1604 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+#include <uapi/linux/if_link.h>
+#include <net/if.h>
+#include <test_progs.h>
+
+#define loopback 1
+#define ping_cmd "ping -q -c1 -w1 127.0.0.1 > /dev/null"
+
+#include "test_tc_link.skel.h"
+#include "tc_helpers.h"
+
+void serial_test_tc_links_basic(void)
+{
+	LIBBPF_OPTS(bpf_tcx_opts, optl,
+		.ifindex = loopback,
+	);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 prog_ids[2], link_ids[2];
+	__u32 pid1, pid2, lid1, lid2;
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	int err;
+
+	skel = test_tc_link__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+
+	assert_mprog_count(BPF_TCX_INGRESS, 0);
+	assert_mprog_count(BPF_TCX_EGRESS, 0);
+
+	ASSERT_EQ(skel->bss->seen_tc1, false, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc1 = link;
+
+	lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+	assert_mprog_count(BPF_TCX_INGRESS, 1);
+	assert_mprog_count(BPF_TCX_EGRESS, 0);
+
+	optq.prog_ids = prog_ids;
+	optq.link_ids = link_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, BPF_TCX_INGRESS, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 1, "count");
+	ASSERT_EQ(optq.revision, 2, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], 0, "link_ids[1]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc2 = link;
+
+	lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+	ASSERT_NEQ(lid1, lid2, "link_ids_1_2");
+
+	assert_mprog_count(BPF_TCX_INGRESS, 1);
+	assert_mprog_count(BPF_TCX_EGRESS, 1);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, BPF_TCX_EGRESS, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 1, "count");
+	ASSERT_EQ(optq.revision, 2, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid2, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid2, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], 0, "link_ids[1]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+cleanup:
+	test_tc_link__destroy(skel);
+
+	assert_mprog_count(BPF_TCX_INGRESS, 0);
+	assert_mprog_count(BPF_TCX_EGRESS, 0);
+}
+
+static void test_tc_links_before_target(int target)
+{
+	LIBBPF_OPTS(bpf_tcx_opts, optl,
+		.ifindex = loopback,
+	);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 prog_ids[5], link_ids[5];
+	__u32 pid1, pid2, pid3, pid4;
+	__u32 lid1, lid2, lid3, lid4;
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	int err;
+
+	skel = test_tc_link__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+		  0, "tc1_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+		  0, "tc2_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target),
+		  0, "tc3_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc4, target),
+		  0, "tc4_attach_type");
+
+	err = test_tc_link__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+	pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+	pid4 = id_from_prog_fd(bpf_program__fd(skel->progs.tc4));
+
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+	ASSERT_NEQ(pid3, pid4, "prog_ids_3_4");
+	ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc1 = link;
+
+	lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+	assert_mprog_count(target, 1);
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc2 = link;
+
+	lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+
+	assert_mprog_count(target, 2);
+
+	optq.prog_ids = prog_ids;
+	optq.link_ids = link_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 3, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid2, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], lid2, "link_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+	ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");
+
+	skel->bss->seen_tc1 = false;
+	skel->bss->seen_tc2 = false;
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_BEFORE;
+	optl.relative_fd = bpf_program__fd(skel->progs.tc2);
+
+	link = bpf_program__attach_tcx(skel->progs.tc3, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc3 = link;
+
+	lid3 = id_from_link_fd(bpf_link__fd(skel->links.tc3));
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_BEFORE | BPF_F_LINK;
+	optl.relative_id = lid1;
+
+	link = bpf_program__attach_tcx(skel->progs.tc4, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc4 = link;
+
+	lid4 = id_from_link_fd(bpf_link__fd(skel->links.tc4));
+
+	assert_mprog_count(target, 4);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid4, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid4, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid1, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], lid1, "link_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], pid3, "prog_ids[2]");
+	ASSERT_EQ(optq.link_ids[2], lid3, "link_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], pid2, "prog_ids[3]");
+	ASSERT_EQ(optq.link_ids[3], lid2, "link_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+	ASSERT_EQ(optq.link_ids[4], 0, "link_ids[4]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");
+cleanup:
+	test_tc_link__destroy(skel);
+	assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_before(void)
+{
+	test_tc_links_before_target(BPF_TCX_INGRESS);
+	test_tc_links_before_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_links_after_target(int target)
+{
+	LIBBPF_OPTS(bpf_tcx_opts, optl,
+		.ifindex = loopback,
+	);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 prog_ids[5], link_ids[5];
+	__u32 pid1, pid2, pid3, pid4;
+	__u32 lid1, lid2, lid3, lid4;
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	int err;
+
+	skel = test_tc_link__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+		  0, "tc1_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+		  0, "tc2_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target),
+		  0, "tc3_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc4, target),
+		  0, "tc4_attach_type");
+
+	err = test_tc_link__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+	pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+	pid4 = id_from_prog_fd(bpf_program__fd(skel->progs.tc4));
+
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+	ASSERT_NEQ(pid3, pid4, "prog_ids_3_4");
+	ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc1 = link;
+
+	lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+	assert_mprog_count(target, 1);
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc2 = link;
+
+	lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+
+	assert_mprog_count(target, 2);
+
+	optq.prog_ids = prog_ids;
+	optq.link_ids = link_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 3, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid2, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], lid2, "link_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+	ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");
+
+	skel->bss->seen_tc1 = false;
+	skel->bss->seen_tc2 = false;
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_AFTER;
+	optl.relative_fd = bpf_program__fd(skel->progs.tc1);
+
+	link = bpf_program__attach_tcx(skel->progs.tc3, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc3 = link;
+
+	lid3 = id_from_link_fd(bpf_link__fd(skel->links.tc3));
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_AFTER | BPF_F_LINK;
+	optl.relative_fd = bpf_link__fd(skel->links.tc2);
+
+	link = bpf_program__attach_tcx(skel->progs.tc4, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc4 = link;
+
+	lid4 = id_from_link_fd(bpf_link__fd(skel->links.tc4));
+
+	assert_mprog_count(target, 4);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid3, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], lid3, "link_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], pid2, "prog_ids[2]");
+	ASSERT_EQ(optq.link_ids[2], lid2, "link_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], pid4, "prog_ids[3]");
+	ASSERT_EQ(optq.link_ids[3], lid4, "link_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+	ASSERT_EQ(optq.link_ids[4], 0, "link_ids[4]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");
+cleanup:
+	test_tc_link__destroy(skel);
+	assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_after(void)
+{
+	test_tc_links_after_target(BPF_TCX_INGRESS);
+	test_tc_links_after_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_links_revision_target(int target)
+{
+	LIBBPF_OPTS(bpf_tcx_opts, optl,
+		.ifindex = loopback,
+	);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 prog_ids[3], link_ids[3];
+	__u32 pid1, pid2, lid1, lid2;
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	int err;
+
+	skel = test_tc_link__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+		  0, "tc1_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+		  0, "tc2_attach_type");
+
+	err = test_tc_link__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+
+	assert_mprog_count(target, 0);
+
+	optl.expected_revision = 1;
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc1 = link;
+
+	lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+	assert_mprog_count(target, 1);
+
+	optl.expected_revision = 1;
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 1);
+
+	optl.expected_revision = 2;
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc2 = link;
+
+	lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+
+	assert_mprog_count(target, 2);
+
+	optq.prog_ids = prog_ids;
+	optq.link_ids = link_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 3, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid2, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], lid2, "link_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+	ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+cleanup:
+	test_tc_link__destroy(skel);
+	assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_revision(void)
+{
+	test_tc_links_revision_target(BPF_TCX_INGRESS);
+	test_tc_links_revision_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_chain_classic(int target, bool chain_tc_old)
+{
+	LIBBPF_OPTS(bpf_tc_opts, tc_opts, .handle = 1, .priority = 1);
+	LIBBPF_OPTS(bpf_tc_hook, tc_hook, .ifindex = loopback);
+	bool hook_created = false, tc_attached = false;
+	LIBBPF_OPTS(bpf_tcx_opts, optl,
+		.ifindex = loopback,
+	);
+	__u32 pid1, pid2, pid3;
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	int err;
+
+	skel = test_tc_link__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+		  0, "tc1_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+		  0, "tc2_attach_type");
+
+	err = test_tc_link__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+	pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+	ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	if (chain_tc_old) {
+		tc_hook.attach_point = target == BPF_TCX_INGRESS ?
+				       BPF_TC_INGRESS : BPF_TC_EGRESS;
+		err = bpf_tc_hook_create(&tc_hook);
+		if (err == 0)
+			hook_created = true;
+		err = err == -EEXIST ? 0 : err;
+		if (!ASSERT_OK(err, "bpf_tc_hook_create"))
+			goto cleanup;
+
+		tc_opts.prog_fd = bpf_program__fd(skel->progs.tc3);
+		err = bpf_tc_attach(&tc_hook, &tc_opts);
+		if (!ASSERT_OK(err, "bpf_tc_attach"))
+			goto cleanup;
+		tc_attached = true;
+	}
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc1 = link;
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc2 = link;
+
+	assert_mprog_count(target, 2);
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, chain_tc_old, "seen_tc3");
+
+	skel->bss->seen_tc1 = false;
+	skel->bss->seen_tc2 = false;
+	skel->bss->seen_tc3 = false;
+
+	err = bpf_link__detach(skel->links.tc2);
+	if (!ASSERT_OK(err, "prog_detach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, chain_tc_old, "seen_tc3");
+cleanup:
+	if (tc_attached) {
+		tc_opts.flags = tc_opts.prog_fd = tc_opts.prog_id = 0;
+		err = bpf_tc_detach(&tc_hook, &tc_opts);
+		ASSERT_OK(err, "bpf_tc_detach");
+	}
+	if (hook_created) {
+		tc_hook.attach_point = BPF_TC_INGRESS | BPF_TC_EGRESS;
+		bpf_tc_hook_destroy(&tc_hook);
+	}
+	assert_mprog_count(target, 1);
+	test_tc_link__destroy(skel);
+	assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_chain_classic(void)
+{
+	test_tc_chain_classic(BPF_TCX_INGRESS, false);
+	test_tc_chain_classic(BPF_TCX_EGRESS, false);
+	test_tc_chain_classic(BPF_TCX_INGRESS, true);
+	test_tc_chain_classic(BPF_TCX_EGRESS, true);
+}
+
+static void test_tc_links_replace_target(int target)
+{
+	LIBBPF_OPTS(bpf_tcx_opts, optl,
+		.ifindex = loopback,
+	);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 pid1, pid2, pid3, lid1, lid2;
+	__u32 prog_ids[4], link_ids[4];
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	int err;
+
+	skel = test_tc_link__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+		  0, "tc1_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+		  0, "tc2_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target),
+		  0, "tc3_attach_type");
+
+	err = test_tc_link__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+	pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+	ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	optl.expected_revision = 1;
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc1 = link;
+
+	lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_BEFORE;
+	optl.relative_id = pid1;
+	optl.expected_revision = 2;
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc2 = link;
+
+	lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+
+	assert_mprog_count(target, 2);
+
+	optq.prog_ids = prog_ids;
+	optq.link_ids = link_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 3, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid2, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid2, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid1, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], lid1, "link_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+	ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+
+	skel->bss->seen_tc1 = false;
+	skel->bss->seen_tc2 = false;
+	skel->bss->seen_tc3 = false;
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_REPLACE;
+	optl.relative_fd = bpf_program__fd(skel->progs.tc2);
+	optl.expected_revision = 3;
+
+	link = bpf_program__attach_tcx(skel->progs.tc3, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 2);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_REPLACE | BPF_F_LINK;
+	optl.relative_fd = bpf_link__fd(skel->links.tc2);
+	optl.expected_revision = 3;
+
+	link = bpf_program__attach_tcx(skel->progs.tc3, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 2);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_REPLACE | BPF_F_LINK | BPF_F_AFTER;
+	optl.relative_id = lid2;
+	optl.expected_revision = 0;
+
+	link = bpf_program__attach_tcx(skel->progs.tc3, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 2);
+
+	err = bpf_link__update_program(skel->links.tc2, skel->progs.tc3);
+	if (!ASSERT_OK(err, "link_update"))
+		goto cleanup;
+
+	assert_mprog_count(target, 2);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 4, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid3, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid2, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid1, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], lid1, "link_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+	ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+
+	skel->bss->seen_tc1 = false;
+	skel->bss->seen_tc2 = false;
+	skel->bss->seen_tc3 = false;
+
+	err = bpf_link__detach(skel->links.tc2);
+	if (!ASSERT_OK(err, "link_detach"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 1, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], 0, "link_ids[1]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+
+	skel->bss->seen_tc1 = false;
+	skel->bss->seen_tc2 = false;
+	skel->bss->seen_tc3 = false;
+
+	err = bpf_link__update_program(skel->links.tc1, skel->progs.tc1);
+	if (!ASSERT_OK(err, "link_update_self"))
+		goto cleanup;
+
+	assert_mprog_count(target, 1);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 1, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], 0, "link_ids[1]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+cleanup:
+	test_tc_link__destroy(skel);
+	assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_replace(void)
+{
+	test_tc_links_replace_target(BPF_TCX_INGRESS);
+	test_tc_links_replace_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_links_invalid_target(int target)
+{
+	LIBBPF_OPTS(bpf_tcx_opts, optl,
+		.ifindex = loopback,
+	);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 pid1, pid2, lid1;
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	int err;
+
+	skel = test_tc_link__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+		  0, "tc1_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+		  0, "tc2_attach_type");
+
+	err = test_tc_link__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+
+	assert_mprog_count(target, 0);
+
+	optl.flags = BPF_F_BEFORE | BPF_F_AFTER;
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_BEFORE | BPF_F_ID;
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_AFTER | BPF_F_ID;
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_ID;
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_LINK;
+	optl.relative_fd = bpf_program__fd(skel->progs.tc2);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_LINK;
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.relative_fd = bpf_program__fd(skel->progs.tc2);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_BEFORE | BPF_F_AFTER;
+	optl.relative_fd = bpf_program__fd(skel->progs.tc2);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_BEFORE;
+	optl.relative_fd = bpf_program__fd(skel->progs.tc1);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_ID;
+	optl.relative_id = pid2;
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_ID;
+	optl.relative_id = 42;
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_BEFORE;
+	optl.relative_fd = bpf_program__fd(skel->progs.tc1);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_BEFORE | BPF_F_LINK;
+	optl.relative_fd = bpf_program__fd(skel->progs.tc1);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_AFTER;
+	optl.relative_fd = bpf_program__fd(skel->progs.tc1);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(optl);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_AFTER | BPF_F_LINK;
+	optl.relative_fd = bpf_program__fd(skel->progs.tc1);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 0);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc1 = link;
+
+	lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_AFTER | BPF_F_LINK;
+	optl.relative_fd = bpf_program__fd(skel->progs.tc1);
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_BEFORE | BPF_F_LINK | BPF_F_ID;
+	optl.relative_id = ~0;
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_BEFORE | BPF_F_LINK | BPF_F_ID;
+	optl.relative_id = lid1;
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_BEFORE | BPF_F_ID;
+	optl.relative_id = pid1;
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+		bpf_link__destroy(link);
+		goto cleanup;
+	}
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_BEFORE | BPF_F_LINK | BPF_F_ID;
+	optl.relative_id = lid1;
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc2 = link;
+
+	assert_mprog_count(target, 2);
+cleanup:
+	test_tc_link__destroy(skel);
+	assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_invalid(void)
+{
+	test_tc_links_invalid_target(BPF_TCX_INGRESS);
+	test_tc_links_invalid_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_links_prepend_target(int target)
+{
+	LIBBPF_OPTS(bpf_tcx_opts, optl,
+		.ifindex = loopback,
+	);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 prog_ids[5], link_ids[5];
+	__u32 pid1, pid2, pid3, pid4;
+	__u32 lid1, lid2, lid3, lid4;
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	int err;
+
+	skel = test_tc_link__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+		  0, "tc1_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+		  0, "tc2_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target),
+		  0, "tc3_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc4, target),
+		  0, "tc4_attach_type");
+
+	err = test_tc_link__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+	pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+	pid4 = id_from_prog_fd(bpf_program__fd(skel->progs.tc4));
+
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+	ASSERT_NEQ(pid3, pid4, "prog_ids_3_4");
+	ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc1 = link;
+
+	lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_BEFORE;
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc2 = link;
+
+	lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+
+	assert_mprog_count(target, 2);
+
+	optq.prog_ids = prog_ids;
+	optq.link_ids = link_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 3, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid2, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid2, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid1, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], lid1, "link_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+	ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");
+
+	skel->bss->seen_tc1 = false;
+	skel->bss->seen_tc2 = false;
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_BEFORE;
+
+	link = bpf_program__attach_tcx(skel->progs.tc3, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc3 = link;
+
+	lid3 = id_from_link_fd(bpf_link__fd(skel->links.tc3));
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_BEFORE;
+
+	link = bpf_program__attach_tcx(skel->progs.tc4, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc4 = link;
+
+	lid4 = id_from_link_fd(bpf_link__fd(skel->links.tc4));
+
+	assert_mprog_count(target, 4);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid4, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid4, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid3, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], lid3, "link_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], pid2, "prog_ids[2]");
+	ASSERT_EQ(optq.link_ids[2], lid2, "link_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], pid1, "prog_ids[3]");
+	ASSERT_EQ(optq.link_ids[3], lid1, "link_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+	ASSERT_EQ(optq.link_ids[4], 0, "link_ids[4]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");
+cleanup:
+	test_tc_link__destroy(skel);
+	assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_prepend(void)
+{
+	test_tc_links_prepend_target(BPF_TCX_INGRESS);
+	test_tc_links_prepend_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_links_append_target(int target)
+{
+	LIBBPF_OPTS(bpf_tcx_opts, optl,
+		.ifindex = loopback,
+	);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 prog_ids[5], link_ids[5];
+	__u32 pid1, pid2, pid3, pid4;
+	__u32 lid1, lid2, lid3, lid4;
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	int err;
+
+	skel = test_tc_link__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+		  0, "tc1_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+		  0, "tc2_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target),
+		  0, "tc3_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc4, target),
+		  0, "tc4_attach_type");
+
+	err = test_tc_link__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+	pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+	pid4 = id_from_prog_fd(bpf_program__fd(skel->progs.tc4));
+
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+	ASSERT_NEQ(pid3, pid4, "prog_ids_3_4");
+	ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc1 = link;
+
+	lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+	assert_mprog_count(target, 1);
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_AFTER;
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc2 = link;
+
+	lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+
+	assert_mprog_count(target, 2);
+
+	optq.prog_ids = prog_ids;
+	optq.link_ids = link_ids;
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 2, "count");
+	ASSERT_EQ(optq.revision, 3, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid2, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], lid2, "link_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+	ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");
+
+	skel->bss->seen_tc1 = false;
+	skel->bss->seen_tc2 = false;
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_AFTER;
+
+	link = bpf_program__attach_tcx(skel->progs.tc3, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc3 = link;
+
+	lid3 = id_from_link_fd(bpf_link__fd(skel->links.tc3));
+
+	LIBBPF_OPTS_CLEAR(optl);
+	optl.ifindex = loopback;
+	optl.flags = BPF_F_AFTER;
+
+	link = bpf_program__attach_tcx(skel->progs.tc4, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc4 = link;
+
+	lid4 = id_from_link_fd(bpf_link__fd(skel->links.tc4));
+
+	assert_mprog_count(target, 4);
+
+	memset(prog_ids, 0, sizeof(prog_ids));
+	memset(link_ids, 0, sizeof(link_ids));
+	optq.count = ARRAY_SIZE(prog_ids);
+
+	err = bpf_prog_query_opts(loopback, target, &optq);
+	if (!ASSERT_OK(err, "prog_query"))
+		goto cleanup;
+
+	ASSERT_EQ(optq.count, 4, "count");
+	ASSERT_EQ(optq.revision, 5, "revision");
+	ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+	ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+	ASSERT_EQ(optq.prog_ids[1], pid2, "prog_ids[1]");
+	ASSERT_EQ(optq.link_ids[1], lid2, "link_ids[1]");
+	ASSERT_EQ(optq.prog_ids[2], pid3, "prog_ids[2]");
+	ASSERT_EQ(optq.link_ids[2], lid3, "link_ids[2]");
+	ASSERT_EQ(optq.prog_ids[3], pid4, "prog_ids[3]");
+	ASSERT_EQ(optq.link_ids[3], lid4, "link_ids[3]");
+	ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+	ASSERT_EQ(optq.link_ids[4], 0, "link_ids[4]");
+
+	ASSERT_OK(system(ping_cmd), ping_cmd);
+
+	ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+	ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+	ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+	ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");
+cleanup:
+	test_tc_link__destroy(skel);
+	assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_append(void)
+{
+	test_tc_links_append_target(BPF_TCX_INGRESS);
+	test_tc_links_append_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_links_dev_cleanup_target(int target)
+{
+	LIBBPF_OPTS(bpf_tcx_opts, optl);
+	LIBBPF_OPTS(bpf_prog_query_opts, optq);
+	__u32 pid1, pid2, pid3, pid4;
+	struct test_tc_link *skel;
+	struct bpf_link *link;
+	int err, ifindex;
+
+	ASSERT_OK(system("ip link add dev tcx_opts1 type veth peer name tcx_opts2"), "add veth");
+	ifindex = if_nametoindex("tcx_opts1");
+	ASSERT_NEQ(ifindex, 0, "non_zero_ifindex");
+	optl.ifindex = ifindex;
+
+	skel = test_tc_link__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		goto cleanup;
+
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+		  0, "tc1_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+		  0, "tc2_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target),
+		  0, "tc3_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc4, target),
+		  0, "tc4_attach_type");
+
+	err = test_tc_link__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+	pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+	pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+	pid4 = id_from_prog_fd(bpf_program__fd(skel->progs.tc4));
+
+	ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+	ASSERT_NEQ(pid3, pid4, "prog_ids_3_4");
+	ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+	assert_mprog_count(target, 0);
+
+	link = bpf_program__attach_tcx(skel->progs.tc1, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc1 = link;
+
+	assert_mprog_count_ifindex(ifindex, target, 1);
+
+	link = bpf_program__attach_tcx(skel->progs.tc2, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc2 = link;
+
+	assert_mprog_count_ifindex(ifindex, target, 2);
+
+	link = bpf_program__attach_tcx(skel->progs.tc3, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc3 = link;
+
+	assert_mprog_count_ifindex(ifindex, target, 3);
+
+	link = bpf_program__attach_tcx(skel->progs.tc4, &optl);
+	if (!ASSERT_OK_PTR(link, "link_attach"))
+		goto cleanup;
+
+	skel->links.tc4 = link;
+
+	assert_mprog_count_ifindex(ifindex, target, 4);
+
+	ASSERT_OK(system("ip link del dev tcx_opts1"), "del veth");
+	ASSERT_EQ(if_nametoindex("tcx_opts1"), 0, "dev1_removed");
+	ASSERT_EQ(if_nametoindex("tcx_opts2"), 0, "dev2_removed");
+
+	ASSERT_EQ(ifindex_from_link_fd(bpf_link__fd(skel->links.tc1)), 0, "tc1_ifindex");
+	ASSERT_EQ(ifindex_from_link_fd(bpf_link__fd(skel->links.tc2)), 0, "tc2_ifindex");
+	ASSERT_EQ(ifindex_from_link_fd(bpf_link__fd(skel->links.tc3)), 0, "tc3_ifindex");
+	ASSERT_EQ(ifindex_from_link_fd(bpf_link__fd(skel->links.tc4)), 0, "tc4_ifindex");
+
+	test_tc_link__destroy(skel);
+	return;
+cleanup:
+	test_tc_link__destroy(skel);
+
+	ASSERT_OK(system("ip link del dev tcx_opts1"), "del veth");
+	ASSERT_EQ(if_nametoindex("tcx_opts1"), 0, "dev1_removed");
+	ASSERT_EQ(if_nametoindex("tcx_opts2"), 0, "dev2_removed");
+}
+
+void serial_test_tc_links_dev_cleanup(void)
+{
+	test_tc_links_dev_cleanup_target(BPF_TCX_INGRESS);
+	test_tc_links_dev_cleanup_target(BPF_TCX_EGRESS);
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v4 1/8] bpf: Add generic attach/detach/query API for multi-progs
  2023-07-10 20:12 ` [PATCH bpf-next v4 1/8] bpf: Add generic attach/detach/query API for multi-progs Daniel Borkmann
@ 2023-07-11  0:23   ` Alexei Starovoitov
  2023-07-11 18:51     ` Andrii Nakryiko
  2023-07-11 18:48   ` Andrii Nakryiko
  1 sibling, 1 reply; 22+ messages in thread
From: Alexei Starovoitov @ 2023-07-11  0:23 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: ast, andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu,
	joe, toke, davem, bpf, netdev

On Mon, Jul 10, 2023 at 10:12:11PM +0200, Daniel Borkmann wrote:
> + *
> + *   struct bpf_mprog_entry *entry, *peer;
> + *   int ret;
> + *
> + *   // bpf_mprog user-side lock
> + *   // fetch active @entry from attach location
> + *   [...]
> + *   ret = bpf_mprog_attach(entry, [...]);
> + *   if (ret >= 0) {
> + *       peer = bpf_mprog_peer(entry);
> + *       if (bpf_mprog_swap_entries(ret))
> + *           // swap @entry to @peer at attach location
> + *       bpf_mprog_commit(entry);
> + *       ret = 0;
> + *   } else {
> + *       // error path, bail out, propagate @ret
> + *   }
> + *   // bpf_mprog user-side unlock
> + *
> + *  Detach case:
> + *
> + *   struct bpf_mprog_entry *entry, *peer;
> + *   bool release;
> + *   int ret;
> + *
> + *   // bpf_mprog user-side lock
> + *   // fetch active @entry from attach location
> + *   [...]
> + *   ret = bpf_mprog_detach(entry, [...]);
> + *   if (ret >= 0) {
> + *       release = ret == BPF_MPROG_FREE;
> + *       peer = release ? NULL : bpf_mprog_peer(entry);
> + *       if (bpf_mprog_swap_entries(ret))
> + *           // swap @entry to @peer at attach location
> + *       bpf_mprog_commit(entry);
> + *       if (release)
> + *           // free bpf_mprog_bundle
> + *       ret = 0;
> + *   } else {
> + *       // error path, bail out, propagate @ret
> + *   }
> + *   // bpf_mprog user-side unlock

Thanks for the doc. It helped a lot.
And when it's contained like this, it's easier to discuss the API.
It seems bpf_mprog_swap_entries() is trying to abstract the error code
away, but BPF_MPROG_FREE leaks out and tcx_entry_needs_release()
captures it with an extra miniq_active twist, which I don't understand yet.
bpf_mprog_peer() is also leaking a bit of implementation detail.
Can we abstract it further, like:

ret = bpf_mprog_detach(entry, [...], &new_entry);
if (ret >= 0) {
   if (entry != new_entry)
     // swap @entry to @new_entry at attach location
   bpf_mprog_commit(entry);
   if (!new_entry)
     // free bpf_mprog_bundle
}
and make bpf_mprog_peer internal to mprog. It will also allow removing
the BPF_MPROG_FREE vs SWAP distinction. The peer is hidden.
   if (entry != new_entry)
      // update
will also be easier to read inside tcx code without looking into mprog details.
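
For symmetry, the attach path could presumably collapse to the same caller
pattern (rough sketch only; the 3-arg bpf_mprog_attach() and @new_entry are
assumptions following the detach example above, not the current API):

   ret = bpf_mprog_attach(entry, [...], &new_entry);
   if (ret >= 0) {
      if (entry != new_entry)
         // swap @entry to @new_entry at attach location
      bpf_mprog_commit(entry);
      ret = 0;
   }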

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v4 3/8] libbpf: Add opts-based attach/detach/query API for tcx
  2023-07-10 20:12 ` [PATCH bpf-next v4 3/8] libbpf: Add opts-based attach/detach/query API for tcx Daniel Borkmann
@ 2023-07-11  4:00   ` Andrii Nakryiko
  2023-07-11 14:03     ` Daniel Borkmann
  0 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2023-07-11  4:00 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: ast, andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu,
	joe, toke, davem, bpf, netdev

On Mon, Jul 10, 2023 at 1:12 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Extend libbpf attach opts and add a new detach opts API so this can be used
> to add/remove fd-based tcx BPF programs. The old-style bpf_prog_detach() and
> bpf_prog_detach2() APIs are refactored to reuse the new bpf_prog_detach_opts()
> internally.
>
> The bpf_prog_query_opts() API got extended to be able to handle the new
> link_ids, link_attach_flags and revision fields.
>
> For concrete usage examples, see the extensive selftests that have been
> developed as part of this series.
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  tools/lib/bpf/bpf.c      | 105 +++++++++++++++++++++++++--------------
>  tools/lib/bpf/bpf.h      |  92 ++++++++++++++++++++++++++++------
>  tools/lib/bpf/libbpf.c   |  12 +++--
>  tools/lib/bpf/libbpf.map |   1 +
>  4 files changed, 157 insertions(+), 53 deletions(-)
>

Thanks for doc comments! Looks good, left a few nits with suggestions
for simplifying code, but it's minor.

Acked-by: Andrii Nakryiko <andrii@kernel.org>

> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 3b0da19715e1..3dfc43b477c3 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -629,55 +629,87 @@ int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type,
>         return bpf_prog_attach_opts(prog_fd, target_fd, type, &opts);
>  }
>
> -int bpf_prog_attach_opts(int prog_fd, int target_fd,
> -                         enum bpf_attach_type type,
> -                         const struct bpf_prog_attach_opts *opts)
> +int bpf_prog_attach_opts(int prog_fd, int target,
> +                        enum bpf_attach_type type,
> +                        const struct bpf_prog_attach_opts *opts)
>  {
> -       const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd);
> +       const size_t attr_sz = offsetofend(union bpf_attr, expected_revision);
> +       __u32 relative_id, flags;
>         union bpf_attr attr;
> -       int ret;
> +       int ret, relative;
>
>         if (!OPTS_VALID(opts, bpf_prog_attach_opts))
>                 return libbpf_err(-EINVAL);
>
> +       relative_id = OPTS_GET(opts, relative_id, 0);
> +       relative = OPTS_GET(opts, relative_fd, 0);
> +       flags = OPTS_GET(opts, flags, 0);
> +
> +       /* validate we don't have unexpected combinations of non-zero fields */
> +       if (relative > 0 && relative_id)
> +               return libbpf_err(-EINVAL);

I left a comment in the next patch about this, I think it should be
simple `if (relative_fd && relative_id) { /* bad */ }`. But see the
next patch for why.

> +       if (relative_id) {
> +               relative = relative_id;
> +               flags |= BPF_F_ID;
> +       }

it's a bit hard to follow as written (to me at least). How about a
slight variation that has less in-place state update


int relative_fd, relative_id;

relative_fd = OPTS_GET(opts, relative_fd, 0);
relative_id = OPTS_GET(opts, relative_id, 0);

/* only one of fd or id can be specified */
if (relative_fd && relative_id > 0)
    return libbpf_err(-EINVAL);

... then see further below

> +
>         memset(&attr, 0, attr_sz);
> -       attr.target_fd     = target_fd;
> -       attr.attach_bpf_fd = prog_fd;
> -       attr.attach_type   = type;
> -       attr.attach_flags  = OPTS_GET(opts, flags, 0);
> -       attr.replace_bpf_fd = OPTS_GET(opts, replace_prog_fd, 0);
> +       attr.target_fd          = target;
> +       attr.attach_bpf_fd      = prog_fd;
> +       attr.attach_type        = type;
> +       attr.attach_flags       = flags;
> +       attr.relative_fd        = relative;

instead of two lines above, have simple if/else

if (relative_id) {
    attr.relative_id = relative_id;
    attr.attach_flags = flags | BPF_F_ID;
} else {
    attr.relative_fd = relative_fd;
    attr.attach_flags = flags;
}

This combined with the piece above seems very straightforward in terms
of what is checked and what's passed into attr. WDYT?

> +       attr.replace_bpf_fd     = OPTS_GET(opts, replace_fd, 0);
> +       attr.expected_revision  = OPTS_GET(opts, expected_revision, 0);
>
>         ret = sys_bpf(BPF_PROG_ATTACH, &attr, attr_sz);
>         return libbpf_err_errno(ret);
>  }
>
> -int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
> +int bpf_prog_detach_opts(int prog_fd, int target,
> +                        enum bpf_attach_type type,
> +                        const struct bpf_prog_detach_opts *opts)
>  {
> -       const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd);
> +       const size_t attr_sz = offsetofend(union bpf_attr, expected_revision);
> +       __u32 relative_id, flags;
>         union bpf_attr attr;
> -       int ret;
> +       int ret, relative;
> +
> +       if (!OPTS_VALID(opts, bpf_prog_detach_opts))
> +               return libbpf_err(-EINVAL);
> +
> +       relative_id = OPTS_GET(opts, relative_id, 0);
> +       relative = OPTS_GET(opts, relative_fd, 0);
> +       flags = OPTS_GET(opts, flags, 0);
> +
> +       /* validate we don't have unexpected combinations of non-zero fields */
> +       if (relative > 0 && relative_id)
> +               return libbpf_err(-EINVAL);
> +       if (relative_id) {
> +               relative = relative_id;
> +               flags |= BPF_F_ID;
> +       }

see above, I think the same data flow simplification can be done

>
>         memset(&attr, 0, attr_sz);
> -       attr.target_fd   = target_fd;
> -       attr.attach_type = type;
> +       attr.target_fd          = target;
> +       attr.attach_bpf_fd      = prog_fd;
> +       attr.attach_type        = type;
> +       attr.attach_flags       = flags;
> +       attr.relative_fd        = relative;
> +       attr.expected_revision  = OPTS_GET(opts, expected_revision, 0);
>
>         ret = sys_bpf(BPF_PROG_DETACH, &attr, attr_sz);
>         return libbpf_err_errno(ret);
>  }
>

[...]

> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index d9ec4407befa..a95d39bbef90 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -396,4 +396,5 @@ LIBBPF_1.3.0 {
>         global:
>                 bpf_obj_pin_opts;
>                 bpf_program__attach_netfilter;
> +               bpf_prog_detach_opts;

I think it sorts before bpf_program__attach_netfilter?

>  } LIBBPF_1.2.0;


> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v4 4/8] libbpf: Add link-based API for tcx
  2023-07-10 20:12 ` [PATCH bpf-next v4 4/8] libbpf: Add link-based " Daniel Borkmann
@ 2023-07-11  4:00   ` Andrii Nakryiko
  2023-07-11 14:08     ` Daniel Borkmann
  0 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2023-07-11  4:00 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: ast, andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu,
	joe, toke, davem, bpf, netdev

On Mon, Jul 10, 2023 at 1:12 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Implement tcx BPF link support for libbpf.
>
> The bpf_program__attach_fd() API has been refactored slightly in order to pass
> bpf_link_create_opts pointer as input.
>
> A new bpf_program__attach_tcx() has been added on top of this which allows for
> passing all relevant data via extensible struct bpf_tcx_opts.
>
> The program sections tcx/ingress and tcx/egress correspond to the hook locations
> for tc ingress and egress, respectively.
>
> For concrete usage examples, see the extensive selftests that have been
> developed as part of this series.
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  tools/lib/bpf/bpf.c      | 19 ++++++++++--
>  tools/lib/bpf/bpf.h      |  5 ++++
>  tools/lib/bpf/libbpf.c   | 62 ++++++++++++++++++++++++++++++++++------
>  tools/lib/bpf/libbpf.h   | 16 +++++++++++
>  tools/lib/bpf/libbpf.map |  1 +
>  5 files changed, 92 insertions(+), 11 deletions(-)
>

Pretty minor nits; I think the ifindex move to a mandatory argument is
the most consequential, as it's an API change. With that addressed, please
add my ack for the next rev

Acked-by: Andrii Nakryiko <andrii@kernel.org>

> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 3dfc43b477c3..d513c226b9aa 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -717,9 +717,9 @@ int bpf_link_create(int prog_fd, int target_fd,
>                     const struct bpf_link_create_opts *opts)
>  {
>         const size_t attr_sz = offsetofend(union bpf_attr, link_create);
> -       __u32 target_btf_id, iter_info_len;
> +       __u32 target_btf_id, iter_info_len, relative_id;
> +       int fd, err, relative;

nit: maybe make these new vars local to the TCX cases branch below?

>         union bpf_attr attr;
> -       int fd, err;
>
>         if (!OPTS_VALID(opts, bpf_link_create_opts))
>                 return libbpf_err(-EINVAL);
> @@ -781,6 +781,21 @@ int bpf_link_create(int prog_fd, int target_fd,
>                 if (!OPTS_ZEROED(opts, netfilter))
>                         return libbpf_err(-EINVAL);
>                 break;
> +       case BPF_TCX_INGRESS:
> +       case BPF_TCX_EGRESS:
> +               relative = OPTS_GET(opts, tcx.relative_fd, 0);
> +               relative_id = OPTS_GET(opts, tcx.relative_id, 0);
> +               if (relative > 0 && relative_id)
> +                       return libbpf_err(-EINVAL);
> +               if (relative_id) {
> +                       relative = relative_id;
> +                       attr.link_create.flags |= BPF_F_ID;
> +               }

Well, I have the same nit as in the previous patch, this "relative =
relative_id" is both confusing because of naming asymmetry (no
relative_fd throws me off), and also unnecessary updating of the
state. link_create.flags |= BPF_F_ID is inevitable, but the rest can
be more straightforward, IMO

> +               attr.link_create.tcx.relative_fd = relative;
> +               attr.link_create.tcx.expected_revision = OPTS_GET(opts, tcx.expected_revision, 0);
> +               if (!OPTS_ZEROED(opts, tcx))
> +                       return libbpf_err(-EINVAL);
> +               break;
>         default:
>                 if (!OPTS_ZEROED(opts, flags))
>                         return libbpf_err(-EINVAL);

[...]

> +struct bpf_link *
> +bpf_program__attach_tcx(const struct bpf_program *prog,
> +                       const struct bpf_tcx_opts *opts)
> +{
> +       LIBBPF_OPTS(bpf_link_create_opts, link_create_opts);
> +       __u32 relative_id, flags;
> +       int ifindex, relative_fd;
> +
> +       if (!OPTS_VALID(opts, bpf_tcx_opts))
> +               return libbpf_err_ptr(-EINVAL);
> +
> +       relative_id = OPTS_GET(opts, relative_id, 0);
> +       relative_fd = OPTS_GET(opts, relative_fd, 0);
> +       flags = OPTS_GET(opts, flags, 0);
> +       ifindex = OPTS_GET(opts, ifindex, 0);
> +
> +       /* validate we don't have unexpected combinations of non-zero fields */
> +       if (!ifindex) {
> +               pr_warn("prog '%s': target netdevice ifindex cannot be zero\n",
> +                       prog->name);
> +               return libbpf_err_ptr(-EINVAL);
> +       }

given ifindex is non-optional, then it makes more sense to have it as
a mandatory argument between prog and opts in
bpf_program__attach_tcx(), instead of as a field of an opts struct

> +       if (relative_fd > 0 && relative_id) {

this asymmetrical check is a bit distracting. And also, if someone
specifies negative FD and positive ID, that's also a bad combo and we
shouldn't just ignore invalid FD, right? So I'd have a nice and clean

if (relative_fd && relative_id) { /* bad */ }

> +               pr_warn("prog '%s': relative_fd and relative_id cannot be set at the same time\n",
> +                       prog->name);
> +               return libbpf_err_ptr(-EINVAL);
> +       }
> +       if (relative_id)
> +               flags |= BPF_F_ID;

I think bpf_link_create() will add this flag anyways, so can drop this
adjustment logic here?

> +
> +       link_create_opts.tcx.expected_revision = OPTS_GET(opts, expected_revision, 0);
> +       link_create_opts.tcx.relative_fd = relative_fd;
> +       link_create_opts.tcx.relative_id = relative_id;
> +       link_create_opts.flags = flags;
> +
> +       /* target_fd/target_ifindex use the same field in LINK_CREATE */
> +       return bpf_program_attach_fd(prog, ifindex, "tc", &link_create_opts);

s/tc/tcx/ ?

>  }
>
>  struct bpf_link *bpf_program__attach_freplace(const struct bpf_program *prog,
> @@ -11917,11 +11956,16 @@ struct bpf_link *bpf_program__attach_freplace(const struct bpf_program *prog,
>         }
>
>         if (target_fd) {
> +               LIBBPF_OPTS(bpf_link_create_opts, target_opts);
> +
>                 btf_id = libbpf_find_prog_btf_id(attach_func_name, target_fd);
>                 if (btf_id < 0)
>                         return libbpf_err_ptr(btf_id);
>
> -               return bpf_program__attach_fd(prog, target_fd, btf_id, "freplace");
> +               target_opts.target_btf_id = btf_id;
> +
> +               return bpf_program_attach_fd(prog, target_fd, "freplace",
> +                                            &target_opts);
>         } else {
>                 /* no target, so use raw_tracepoint_open for compatibility
>                  * with old kernels
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 10642ad69d76..33f60a318e81 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -733,6 +733,22 @@ LIBBPF_API struct bpf_link *
>  bpf_program__attach_netfilter(const struct bpf_program *prog,
>                               const struct bpf_netfilter_opts *opts);
>
> +struct bpf_tcx_opts {
> +       /* size of this struct, for forward/backward compatibility */
> +       size_t sz;
> +       int ifindex;

is ifindex optional or it's expected to always be specified? If the
latter, then I'd move ifindex out of opts and make it second arg of
bpf_program__attach_tcx, between prog and opts

> +       __u32 flags;
> +       __u32 relative_fd;
> +       __u32 relative_id;
> +       __u64 expected_revision;
> +       size_t :0;
> +};
> +#define bpf_tcx_opts__last_field expected_revision
> +
> +LIBBPF_API struct bpf_link *
> +bpf_program__attach_tcx(const struct bpf_program *prog,
> +                       const struct bpf_tcx_opts *opts);
> +
>  struct bpf_map;
>
>  LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map);
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index a95d39bbef90..2a2db5c78048 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -397,4 +397,5 @@ LIBBPF_1.3.0 {
>                 bpf_obj_pin_opts;
>                 bpf_program__attach_netfilter;
>                 bpf_prog_detach_opts;
> +               bpf_program__attach_tcx;

heh, now we definitely screwed up sorting ;)

>  } LIBBPF_1.2.0;

> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v4 5/8] libbpf: Add helper macro to clear opts structs
  2023-07-10 20:12 ` [PATCH bpf-next v4 5/8] libbpf: Add helper macro to clear opts structs Daniel Borkmann
@ 2023-07-11  4:02   ` Andrii Nakryiko
  2023-07-11  9:42     ` Daniel Borkmann
  0 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2023-07-11  4:02 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: ast, andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu,
	joe, toke, davem, bpf, netdev

On Mon, Jul 10, 2023 at 1:12 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Add a small and generic LIBBPF_OPTS_CLEAR() helper macro which clears
> an opts structure and reinitializes its .sz member to the structure
> size. I found this very useful when developing selftests, but it is also
> generic enough as a macro next to the existing LIBBPF_OPTS() which hides
> the .sz initialization, too.
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  tools/lib/bpf/libbpf_common.h | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/tools/lib/bpf/libbpf_common.h b/tools/lib/bpf/libbpf_common.h
> index 9a7937f339df..eb180023aa97 100644
> --- a/tools/lib/bpf/libbpf_common.h
> +++ b/tools/lib/bpf/libbpf_common.h
> @@ -70,4 +70,15 @@
>                 };                                                          \
>         })
>
> +/* Helper macro to clear a libbpf options struct
> + *
> + * Small helper macro to reset all fields and to reinitialize the common
> + * structure size member.
> + */
> +#define LIBBPF_OPTS_CLEAR(NAME)                                                    \
> +       do {                                                                \
> +               memset(&NAME, 0, sizeof(NAME));                             \
> +               NAME.sz = sizeof(NAME);                                     \
> +       } while (0)
> +

This is fine, but I think you can go a half-step further and have
something even more universal and useful. Something like this:


#define LIBBPF_OPTS_RESET(NAME, ...)
    do {
        memset(&NAME, 0, sizeof(NAME));
        NAME = (typeof(NAME)) {
            .sz = sizeof(NAME),
            __VA_ARGS__
        };
    } while (0)

I actually haven't tried if that typeof() trick works, but I hope it does :)


Then your LIBBPF_OPTS_CLEAR() is just LIBBPF_OPTS_RESET(x). But you
can also re-initialize:

LIBBPF_OPTS_RESET(x, .flags = 123, .prog_fd = 456);

It's more in line with LIBBPF_OPTS() itself in capabilities, except it
works on existing variable.


>  #endif /* __LIBBPF_LIBBPF_COMMON_H */
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v4 5/8] libbpf: Add helper macro to clear opts structs
  2023-07-11  4:02   ` Andrii Nakryiko
@ 2023-07-11  9:42     ` Daniel Borkmann
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Borkmann @ 2023-07-11  9:42 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: ast, andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu,
	joe, toke, davem, bpf, netdev

On 7/11/23 6:02 AM, Andrii Nakryiko wrote:
> On Mon, Jul 10, 2023 at 1:12 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> Add a small and generic LIBBPF_OPTS_CLEAR() helper macro which clears
>> an opts structure and reinitializes its .sz member to the structure
>> size. I found this very useful when developing selftests, but it is also
>> generic enough as a macro next to the existing LIBBPF_OPTS() which hides
>> the .sz initialization, too.
>>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> ---
>>   tools/lib/bpf/libbpf_common.h | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/tools/lib/bpf/libbpf_common.h b/tools/lib/bpf/libbpf_common.h
>> index 9a7937f339df..eb180023aa97 100644
>> --- a/tools/lib/bpf/libbpf_common.h
>> +++ b/tools/lib/bpf/libbpf_common.h
>> @@ -70,4 +70,15 @@
>>                  };                                                          \
>>          })
>>
>> +/* Helper macro to clear a libbpf options struct
>> + *
>> + * Small helper macro to reset all fields and to reinitialize the common
>> + * structure size member.
>> + */
>> +#define LIBBPF_OPTS_CLEAR(NAME)                                                    \
>> +       do {                                                                \
>> +               memset(&NAME, 0, sizeof(NAME));                             \
>> +               NAME.sz = sizeof(NAME);                                     \
>> +       } while (0)
>> +
> 
> This is fine, but I think you can go a half-step further and have
> something even more universal and useful. Something like this:
> 
> 
> #define LIBBPF_OPTS_RESET(NAME, ...)
>      do {
>          memset(&NAME, 0, sizeof(NAME));
>          NAME = (typeof(NAME)) {
>              .sz = sizeof(NAME),
>              __VA_ARGS__
>          };
>      } while (0)
> 
> I actually haven't tried if that typeof() trick works, but I hope it does :)

It does, I've used this in BPF code for Cilium, too. ;)

> Then your LIBBPF_OPTS_CLEAR() is just LIBBPF_OPTS_RESET(x). But you
> can also re-initialize:
> 
> LIBBPF_OPTS_RESET(x, .flags = 123, .prog_fd = 456);
> 
> It's more in line with LIBBPF_OPTS() itself in capabilities, except it
> works on existing variable.

Agree, changed into ...

/* Helper macro to clear and optionally reinitialize libbpf options struct
  *
  * Small helper macro to reset all fields and to reinitialize the common
  * structure size member. Values provided by users in struct initializer-
  * syntax as varargs can be provided as well to reinitialize options struct
  * specific members.
  */
#define LIBBPF_OPTS_RESET(NAME, ...)                                        \
         do {                                                                \
                 memset(&NAME, 0, sizeof(NAME));                             \
                 NAME = (typeof(NAME)) {                                     \
                         .sz = sizeof(NAME),                                 \
                         __VA_ARGS__                                         \
                 };                                                          \
         } while (0)

... and updated all the test cases.
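
For example, a clear-and-reassign sequence like the one used in the tc link
selftests (assuming the opts layout from this series, where ifindex is still
part of the opts):

   LIBBPF_OPTS_CLEAR(optl);
   optl.ifindex = loopback;
   optl.flags = BPF_F_BEFORE;

... then shrinks to a single line:

   LIBBPF_OPTS_RESET(optl, .ifindex = loopback, .flags = BPF_F_BEFORE);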

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v4 3/8] libbpf: Add opts-based attach/detach/query API for tcx
  2023-07-11  4:00   ` Andrii Nakryiko
@ 2023-07-11 14:03     ` Daniel Borkmann
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Borkmann @ 2023-07-11 14:03 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: ast, andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu,
	joe, toke, davem, bpf, netdev

On 7/11/23 6:00 AM, Andrii Nakryiko wrote:
> On Mon, Jul 10, 2023 at 1:12 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> Extend libbpf attach opts and add a new detach opts API so this can be used
>> to add/remove fd-based tcx BPF programs. The old-style bpf_prog_detach() and
>> bpf_prog_detach2() APIs are refactored to reuse the new bpf_prog_detach_opts()
>> internally.
>>
>> The bpf_prog_query_opts() API got extended to be able to handle the new
>> link_ids, link_attach_flags and revision fields.
>>
>> For concrete usage examples, see the extensive selftests that have been
>> developed as part of this series.
>>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> ---
>>   tools/lib/bpf/bpf.c      | 105 +++++++++++++++++++++++++--------------
>>   tools/lib/bpf/bpf.h      |  92 ++++++++++++++++++++++++++++------
>>   tools/lib/bpf/libbpf.c   |  12 +++--
>>   tools/lib/bpf/libbpf.map |   1 +
>>   4 files changed, 157 insertions(+), 53 deletions(-)
>>
> 
> Thanks for doc comments! Looks good, left a few nits with suggestions
> for simplifying code, but it's minor.
> 
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> 
>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>> index 3b0da19715e1..3dfc43b477c3 100644
>> --- a/tools/lib/bpf/bpf.c
>> +++ b/tools/lib/bpf/bpf.c
>> @@ -629,55 +629,87 @@ int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type,
>>          return bpf_prog_attach_opts(prog_fd, target_fd, type, &opts);
>>   }
>>
>> -int bpf_prog_attach_opts(int prog_fd, int target_fd,
>> -                         enum bpf_attach_type type,
>> -                         const struct bpf_prog_attach_opts *opts)
>> +int bpf_prog_attach_opts(int prog_fd, int target,
>> +                        enum bpf_attach_type type,
>> +                        const struct bpf_prog_attach_opts *opts)
>>   {
>> -       const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd);
>> +       const size_t attr_sz = offsetofend(union bpf_attr, expected_revision);
>> +       __u32 relative_id, flags;
>>          union bpf_attr attr;
>> -       int ret;
>> +       int ret, relative;
>>
>>          if (!OPTS_VALID(opts, bpf_prog_attach_opts))
>>                  return libbpf_err(-EINVAL);
>>
>> +       relative_id = OPTS_GET(opts, relative_id, 0);
>> +       relative = OPTS_GET(opts, relative_fd, 0);
>> +       flags = OPTS_GET(opts, flags, 0);
>> +
>> +       /* validate we don't have unexpected combinations of non-zero fields */
>> +       if (relative > 0 && relative_id)
>> +               return libbpf_err(-EINVAL);
> 
> I left a comment in the next patch about this, I think it should be
> simple `if (relative_fd && relative_id) { /* bad */ }`. But see the
> next patch for why.
> 
>> +       if (relative_id) {
>> +               relative = relative_id;
>> +               flags |= BPF_F_ID;
>> +       }
> 
> it's a bit hard to follow as written (to me at least). How about a
> slight variation that has less in-place state update
> 
> 
> int relative_fd, relative_id;
> 
> relative_fd = OPTS_GET(opts, relative_fd, 0);
> relative_id = OPTS_GET(opts, relative_id, 0);
> 
> /* only one of fd or id can be specified */
> if (relative_fd && relative_id > 0)
>      return libbpf_err(-EINVAL);
> 
> ... then see further below
> 
>> +
>>          memset(&attr, 0, attr_sz);
>> -       attr.target_fd     = target_fd;
>> -       attr.attach_bpf_fd = prog_fd;
>> -       attr.attach_type   = type;
>> -       attr.attach_flags  = OPTS_GET(opts, flags, 0);
>> -       attr.replace_bpf_fd = OPTS_GET(opts, replace_prog_fd, 0);
>> +       attr.target_fd          = target;
>> +       attr.attach_bpf_fd      = prog_fd;
>> +       attr.attach_type        = type;
>> +       attr.attach_flags       = flags;
>> +       attr.relative_fd        = relative;
> 
> instead of two lines above, have simple if/else
> 
> if (relative_id) {
>      attr.relative_id = relative_id;
>      attr.attach_flags = flags | BPF_F_ID;
> } else {
>      attr.relative_fd = relative_fd;
>      attr.attach_flags = flags;
> }
> 
> This combined with the piece above seems very straightforward in terms
> of what is checked and what's passed into attr. WDYT?

All sgtm, I've implemented the suggestions locally for v5.

>> +       attr.replace_bpf_fd     = OPTS_GET(opts, replace_fd, 0);
>> +       attr.expected_revision  = OPTS_GET(opts, expected_revision, 0);
>>
>>          ret = sys_bpf(BPF_PROG_ATTACH, &attr, attr_sz);
>>          return libbpf_err_errno(ret);
>>   }
>>
>> -int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
>> +int bpf_prog_detach_opts(int prog_fd, int target,
>> +                        enum bpf_attach_type type,
>> +                        const struct bpf_prog_detach_opts *opts)
>>   {
>> -       const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd);
>> +       const size_t attr_sz = offsetofend(union bpf_attr, expected_revision);
>> +       __u32 relative_id, flags;
>>          union bpf_attr attr;
>> -       int ret;
>> +       int ret, relative;
>> +
>> +       if (!OPTS_VALID(opts, bpf_prog_detach_opts))
>> +               return libbpf_err(-EINVAL);
>> +
>> +       relative_id = OPTS_GET(opts, relative_id, 0);
>> +       relative = OPTS_GET(opts, relative_fd, 0);
>> +       flags = OPTS_GET(opts, flags, 0);
>> +
>> +       /* validate we don't have unexpected combinations of non-zero fields */
>> +       if (relative > 0 && relative_id)
>> +               return libbpf_err(-EINVAL);
>> +       if (relative_id) {
>> +               relative = relative_id;
>> +               flags |= BPF_F_ID;
>> +       }
> 
> see above, I think the same data flow simplification can be done
> 
>>
>>          memset(&attr, 0, attr_sz);
>> -       attr.target_fd   = target_fd;
>> -       attr.attach_type = type;
>> +       attr.target_fd          = target;
>> +       attr.attach_bpf_fd      = prog_fd;
>> +       attr.attach_type        = type;
>> +       attr.attach_flags       = flags;
>> +       attr.relative_fd        = relative;
>> +       attr.expected_revision  = OPTS_GET(opts, expected_revision, 0);
>>
>>          ret = sys_bpf(BPF_PROG_DETACH, &attr, attr_sz);
>>          return libbpf_err_errno(ret);
>>   }
>>
> 
> [...]
> 
>> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
>> index d9ec4407befa..a95d39bbef90 100644
>> --- a/tools/lib/bpf/libbpf.map
>> +++ b/tools/lib/bpf/libbpf.map
>> @@ -396,4 +396,5 @@ LIBBPF_1.3.0 {
>>          global:
>>                  bpf_obj_pin_opts;
>>                  bpf_program__attach_netfilter;
>> +               bpf_prog_detach_opts;
> 
> I think it sorts before bpf_program__attach_netfilter?

Yeap, also fixed.
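
The global section of the 1.3.0 block should then presumably read:

                bpf_obj_pin_opts;
                bpf_prog_detach_opts;
                bpf_program__attach_netfilter;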

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v4 4/8] libbpf: Add link-based API for tcx
  2023-07-11  4:00   ` Andrii Nakryiko
@ 2023-07-11 14:08     ` Daniel Borkmann
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Borkmann @ 2023-07-11 14:08 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: ast, andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu,
	joe, toke, davem, bpf, netdev

On 7/11/23 6:00 AM, Andrii Nakryiko wrote:
> On Mon, Jul 10, 2023 at 1:12 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> Implement tcx BPF link support for libbpf.
>>
>> The bpf_program__attach_fd() API has been refactored slightly in order to pass
>> bpf_link_create_opts pointer as input.
>>
>> A new bpf_program__attach_tcx() has been added on top of this which allows for
>> passing all relevant data via extensible struct bpf_tcx_opts.
>>
>> The program sections tcx/ingress and tcx/egress correspond to the hook locations
>> for tc ingress and egress, respectively.
>>
>> For concrete usage examples, see the extensive selftests that have been
>> developed as part of this series.
>>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> ---
>>   tools/lib/bpf/bpf.c      | 19 ++++++++++--
>>   tools/lib/bpf/bpf.h      |  5 ++++
>>   tools/lib/bpf/libbpf.c   | 62 ++++++++++++++++++++++++++++++++++------
>>   tools/lib/bpf/libbpf.h   | 16 +++++++++++
>>   tools/lib/bpf/libbpf.map |  1 +
>>   5 files changed, 92 insertions(+), 11 deletions(-)
>>
> 
> Pretty minor nits; I think the ifindex move to a mandatory argument is
> the most consequential, as it's an API change. With that addressed, please
> add my ack for the next rev
> 
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> 
>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>> index 3dfc43b477c3..d513c226b9aa 100644
>> --- a/tools/lib/bpf/bpf.c
>> +++ b/tools/lib/bpf/bpf.c
>> @@ -717,9 +717,9 @@ int bpf_link_create(int prog_fd, int target_fd,
>>                      const struct bpf_link_create_opts *opts)
>>   {
>>          const size_t attr_sz = offsetofend(union bpf_attr, link_create);
>> -       __u32 target_btf_id, iter_info_len;
>> +       __u32 target_btf_id, iter_info_len, relative_id;
>> +       int fd, err, relative;
> 
> nit: maybe make these new vars local to the TCX cases branch below?
> 
>>          union bpf_attr attr;
>> -       int fd, err;
>>
>>          if (!OPTS_VALID(opts, bpf_link_create_opts))
>>                  return libbpf_err(-EINVAL);
>> @@ -781,6 +781,21 @@ int bpf_link_create(int prog_fd, int target_fd,
>>                  if (!OPTS_ZEROED(opts, netfilter))
>>                          return libbpf_err(-EINVAL);
>>                  break;
>> +       case BPF_TCX_INGRESS:
>> +       case BPF_TCX_EGRESS:
>> +               relative = OPTS_GET(opts, tcx.relative_fd, 0);
>> +               relative_id = OPTS_GET(opts, tcx.relative_id, 0);
>> +               if (relative > 0 && relative_id)
>> +                       return libbpf_err(-EINVAL);
>> +               if (relative_id) {
>> +                       relative = relative_id;
>> +                       attr.link_create.flags |= BPF_F_ID;
>> +               }
> 
> Well, I have the same nit as in the previous patch, this "relative =
> relative_id" is both confusing because of naming asymmetry (no
> relative_fd throws me off), and also unnecessary updating of the
> state. link_create.flags |= BPF_F_ID is inevitable, but the rest can
> be more straightforward, IMO
> 
>> +               attr.link_create.tcx.relative_fd = relative;
>> +               attr.link_create.tcx.expected_revision = OPTS_GET(opts, tcx.expected_revision, 0);
>> +               if (!OPTS_ZEROED(opts, tcx))
>> +                       return libbpf_err(-EINVAL);
>> +               break;
>>          default:
>>                  if (!OPTS_ZEROED(opts, flags))
>>                          return libbpf_err(-EINVAL);
> 
> [...]
> 
>> +struct bpf_link *
>> +bpf_program__attach_tcx(const struct bpf_program *prog,
>> +                       const struct bpf_tcx_opts *opts)
>> +{
>> +       LIBBPF_OPTS(bpf_link_create_opts, link_create_opts);
>> +       __u32 relative_id, flags;
>> +       int ifindex, relative_fd;
>> +
>> +       if (!OPTS_VALID(opts, bpf_tcx_opts))
>> +               return libbpf_err_ptr(-EINVAL);
>> +
>> +       relative_id = OPTS_GET(opts, relative_id, 0);
>> +       relative_fd = OPTS_GET(opts, relative_fd, 0);
>> +       flags = OPTS_GET(opts, flags, 0);
>> +       ifindex = OPTS_GET(opts, ifindex, 0);
>> +
>> +       /* validate we don't have unexpected combinations of non-zero fields */
>> +       if (!ifindex) {
>> +               pr_warn("prog '%s': target netdevice ifindex cannot be zero\n",
>> +                       prog->name);
>> +               return libbpf_err_ptr(-EINVAL);
>> +       }
> 
> given ifindex is non-optional, then it makes more sense to have it as
> a mandatory argument between prog and opts in
> bpf_program__attach_tcx(), instead of as a field of an opts struct

Agree, and it will also be more in line with bpf_program__attach_xdp(),
which has ifindex as the 2nd param too.
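
i.e. roughly the following shape (just a sketch of the direction, not the
final v5 signature):

  LIBBPF_API struct bpf_link *
  bpf_program__attach_tcx(const struct bpf_program *prog, int ifindex,
                          const struct bpf_tcx_opts *opts);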

I also implemented the rest of the suggestions in here for v5, thanks!

>> +       if (relative_fd > 0 && relative_id) {
> 
> this asymmetrical check is a bit distracting. And also, if someone
> specifies negative FD and positive ID, that's also a bad combo and we
> shouldn't just ignore invalid FD, right? So I'd have a nice and clean
> 
> if (relative_fd && relative_id) { /* bad */ }
> 
>> +               pr_warn("prog '%s': relative_fd and relative_id cannot be set at the same time\n",
>> +                       prog->name);
>> +               return libbpf_err_ptr(-EINVAL);
>> +       }
>> +       if (relative_id)
>> +               flags |= BPF_F_ID;
> 
> I think bpf_link_create() will add this flag anyways, so can drop this
> adjustment logic here?
> 
>> +
>> +       link_create_opts.tcx.expected_revision = OPTS_GET(opts, expected_revision, 0);
>> +       link_create_opts.tcx.relative_fd = relative_fd;
>> +       link_create_opts.tcx.relative_id = relative_id;
>> +       link_create_opts.flags = flags;
>> +
>> +       /* target_fd/target_ifindex use the same field in LINK_CREATE */
>> +       return bpf_program_attach_fd(prog, ifindex, "tc", &link_create_opts);
> 
> s/tc/tcx/ ?
> 
>>   }
>>
>>   struct bpf_link *bpf_program__attach_freplace(const struct bpf_program *prog,
>> @@ -11917,11 +11956,16 @@ struct bpf_link *bpf_program__attach_freplace(const struct bpf_program *prog,
>>          }
>>
>>          if (target_fd) {
>> +               LIBBPF_OPTS(bpf_link_create_opts, target_opts);
>> +
>>                  btf_id = libbpf_find_prog_btf_id(attach_func_name, target_fd);
>>                  if (btf_id < 0)
>>                          return libbpf_err_ptr(btf_id);
>>
>> -               return bpf_program__attach_fd(prog, target_fd, btf_id, "freplace");
>> +               target_opts.target_btf_id = btf_id;
>> +
>> +               return bpf_program_attach_fd(prog, target_fd, "freplace",
>> +                                            &target_opts);
>>          } else {
>>                  /* no target, so use raw_tracepoint_open for compatibility
>>                   * with old kernels
>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>> index 10642ad69d76..33f60a318e81 100644
>> --- a/tools/lib/bpf/libbpf.h
>> +++ b/tools/lib/bpf/libbpf.h
>> @@ -733,6 +733,22 @@ LIBBPF_API struct bpf_link *
>>   bpf_program__attach_netfilter(const struct bpf_program *prog,
>>                                const struct bpf_netfilter_opts *opts);
>>
>> +struct bpf_tcx_opts {
>> +       /* size of this struct, for forward/backward compatibility */
>> +       size_t sz;
>> +       int ifindex;
> 
> is ifindex optional or it's expected to always be specified? If the
> latter, then I'd move ifindex out of opts and make it second arg of
> bpf_program__attach_tcx, between prog and opts
> 
>> +       __u32 flags;
>> +       __u32 relative_fd;
>> +       __u32 relative_id;
>> +       __u64 expected_revision;
>> +       size_t :0;
>> +};
>> +#define bpf_tcx_opts__last_field expected_revision
>> +
>> +LIBBPF_API struct bpf_link *
>> +bpf_program__attach_tcx(const struct bpf_program *prog,
>> +                       const struct bpf_tcx_opts *opts);
>> +
>>   struct bpf_map;
>>
>>   LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map);
>> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
>> index a95d39bbef90..2a2db5c78048 100644
>> --- a/tools/lib/bpf/libbpf.map
>> +++ b/tools/lib/bpf/libbpf.map
>> @@ -397,4 +397,5 @@ LIBBPF_1.3.0 {
>>                  bpf_obj_pin_opts;
>>                  bpf_program__attach_netfilter;
>>                  bpf_prog_detach_opts;
>> +               bpf_program__attach_tcx;
> 
> heh, now we definitely screwed up sorting ;)
> 
>>   } LIBBPF_1.2.0;
> 
>> --
>> 2.34.1
>>
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v4 6/8] bpftool: Extend net dump with tcx progs
  2023-07-10 20:12 ` [PATCH bpf-next v4 6/8] bpftool: Extend net dump with tcx progs Daniel Borkmann
@ 2023-07-11 14:19   ` Quentin Monnet
  2023-07-11 16:46     ` Daniel Borkmann
  0 siblings, 1 reply; 22+ messages in thread
From: Quentin Monnet @ 2023-07-11 14:19 UTC (permalink / raw)
  To: Daniel Borkmann, ast
  Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
	toke, davem, bpf, netdev

2023-07-10 22:12 UTC+0200 ~ Daniel Borkmann <daniel@iogearbox.net>
> Add support to dump fd-based attach types via bpftool. This includes both
> the tc BPF link and attach ops programs. Dumped information contain the
> attach location, function entry name, program ID and link ID when applicable.
> 
> Example with tc BPF link:
> 
>   # ./bpftool net
>   xdp:
> 
>   tc:
>   bond0(4) tcx/ingress cil_from_netdev prog id 784 link id 10
>   bond0(4) tcx/egress cil_to_netdev prog id 804 link id 11
> 
>   flow_dissector:
> 
>   netfilter:
> 
> Example with tc BPF attach ops:
> 
>   # ./bpftool net
>   xdp:
> 
>   tc:
>   bond0(4) tcx/ingress cil_from_netdev prog id 654
>   bond0(4) tcx/egress cil_to_netdev prog id 672
> 
>   flow_dissector:
> 
>   netfilter:
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Reviewed-by: Quentin Monnet <quentin@isovalent.com>

Thank you!

If you respin, would you mind updating the docs, please
(Documentation/bpftool-net.rst)? I realise it says that "bpftool net"
only dumps for tc and XDP, but that's not true any more since we have
the flow dissector, netfilter programs, and now tcx. The examples are
out-of-date too, but updating them doesn't have to be part of this PR.

> ---
>  tools/bpf/bpftool/net.c | 86 +++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 82 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
> index 26a49965bf71..22af0a81458c 100644
> --- a/tools/bpf/bpftool/net.c
> +++ b/tools/bpf/bpftool/net.c
> @@ -76,6 +76,11 @@ static const char * const attach_type_strings[] = {
>  	[NET_ATTACH_TYPE_XDP_OFFLOAD]	= "xdpoffload",
>  };
>  
> +static const char * const attach_loc_strings[] = {
> +	[BPF_TCX_INGRESS]		= "tcx/ingress",
> +	[BPF_TCX_EGRESS]		= "tcx/egress",
> +};
> +
>  const size_t net_attach_type_size = ARRAY_SIZE(attach_type_strings);
>  
>  static enum net_attach_type parse_attach_type(const char *str)
> @@ -422,8 +427,80 @@ static int dump_filter_nlmsg(void *cookie, void *msg, struct nlattr **tb)
>  			      filter_info->devname, filter_info->ifindex);
>  }
>  
> -static int show_dev_tc_bpf(int sock, unsigned int nl_pid,
> -			   struct ip_devname_ifindex *dev)
> +static const char *flags_strings(__u32 flags)
> +{
> +	return json_output ? "none" : "";
> +}
> +
> +static int __show_dev_tc_bpf_name(__u32 id, char *name, size_t len)
> +{
> +	struct bpf_prog_info info = {};
> +	__u32 ilen = sizeof(info);
> +	int fd, ret;
> +
> +	fd = bpf_prog_get_fd_by_id(id);
> +	if (fd < 0)
> +		return fd;
> +	ret = bpf_obj_get_info_by_fd(fd, &info, &ilen);
> +	if (ret < 0)
> +		goto out;
> +	ret = -ENOENT;
> +	if (info.name[0]) {
> +		get_prog_full_name(&info, fd, name, len);
> +		ret = 0;
> +	}
> +out:
> +	close(fd);
> +	return ret;
> +}
> +
> +static void __show_dev_tc_bpf(const struct ip_devname_ifindex *dev,
> +			      const enum bpf_attach_type loc)
> +{
> +	__u32 prog_flags[64] = {}, link_flags[64] = {}, i;
> +	__u32 prog_ids[64] = {}, link_ids[64] = {};
> +	LIBBPF_OPTS(bpf_prog_query_opts, optq);
> +	char prog_name[MAX_PROG_FULL_NAME];
> +	int ret;
> +
> +	optq.prog_ids = prog_ids;
> +	optq.prog_attach_flags = prog_flags;
> +	optq.link_ids = link_ids;
> +	optq.link_attach_flags = link_flags;
> +	optq.count = ARRAY_SIZE(prog_ids);
> +
> +	ret = bpf_prog_query_opts(dev->ifindex, loc, &optq);
> +	if (ret)
> +		return;
> +	for (i = 0; i < optq.count; i++) {
> +		NET_START_OBJECT;
> +		NET_DUMP_STR("devname", "%s", dev->devname);
> +		NET_DUMP_UINT("ifindex", "(%u)", dev->ifindex);
> +		NET_DUMP_STR("kind", " %s", attach_loc_strings[loc]);
> +		ret = __show_dev_tc_bpf_name(prog_ids[i], prog_name,
> +					     sizeof(prog_name));
> +		if (!ret)
> +			NET_DUMP_STR("name", " %s", prog_name);
> +		NET_DUMP_UINT("prog_id", " prog id %u", prog_ids[i]);

I was unsure at first about having two words for "prog id", or "link id"
below (we use "prog_id" for netfilter, for example), but I see it leaves
you the opportunity to append the flags, if any, without additional
keywords so... why not.
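
For reference, if I read the NET_DUMP_* calls right, the JSON side would then
presumably emit objects along the lines of (values taken from the commit
message example):

  {"devname": "bond0", "ifindex": 4, "kind": "tcx/ingress",
   "name": "cil_from_netdev", "prog_id": 784, "link_id": 10}

while only the plain output uses the two-word "prog id"/"link id" keywords.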

> +		if (prog_flags[i])
> +			NET_DUMP_STR("prog_flags", "%s", flags_strings(prog_flags[i]));
> +		if (link_ids[i])
> +			NET_DUMP_UINT("link_id", " link id %u",
> +				      link_ids[i]);
> +		if (link_flags[i])
> +			NET_DUMP_STR("link_flags", "%s", flags_strings(link_flags[i]));
> +		NET_END_OBJECT_FINAL;
> +	}
> +}
> +
> +static void show_dev_tc_bpf(struct ip_devname_ifindex *dev)
> +{
> +	__show_dev_tc_bpf(dev, BPF_TCX_INGRESS);
> +	__show_dev_tc_bpf(dev, BPF_TCX_EGRESS);
> +}
> +
> +static int show_dev_tc_bpf_classic(int sock, unsigned int nl_pid,
> +				   struct ip_devname_ifindex *dev)
>  {
>  	struct bpf_filter_t filter_info;
>  	struct bpf_tcinfo_t tcinfo;
> @@ -790,8 +867,9 @@ static int do_show(int argc, char **argv)
>  	if (!ret) {
>  		NET_START_ARRAY("tc", "%s:\n");
>  		for (i = 0; i < dev_array.used_len; i++) {
> -			ret = show_dev_tc_bpf(sock, nl_pid,
> -					      &dev_array.devices[i]);
> +			show_dev_tc_bpf(&dev_array.devices[i]);
> +			ret = show_dev_tc_bpf_classic(sock, nl_pid,
> +						      &dev_array.devices[i]);
>  			if (ret)
>  				break;
>  		}


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v4 6/8] bpftool: Extend net dump with tcx progs
  2023-07-11 14:19   ` Quentin Monnet
@ 2023-07-11 16:46     ` Daniel Borkmann
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Borkmann @ 2023-07-11 16:46 UTC (permalink / raw)
  To: Quentin Monnet, ast
  Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
	toke, davem, bpf, netdev

On 7/11/23 4:19 PM, Quentin Monnet wrote:
> 2023-07-10 22:12 UTC+0200 ~ Daniel Borkmann <daniel@iogearbox.net>
>> Add support to dump fd-based attach types via bpftool. This includes both
>> the tc BPF link and attach ops programs. Dumped information contain the
>> attach location, function entry name, program ID and link ID when applicable.
>>
>> Example with tc BPF link:
>>
>>    # ./bpftool net
>>    xdp:
>>
>>    tc:
>>    bond0(4) tcx/ingress cil_from_netdev prog id 784 link id 10
>>    bond0(4) tcx/egress cil_to_netdev prog id 804 link id 11
>>
>>    flow_dissector:
>>
>>    netfilter:
>>
>> Example with tc BPF attach ops:
>>
>>    # ./bpftool net
>>    xdp:
>>
>>    tc:
>>    bond0(4) tcx/ingress cil_from_netdev prog id 654
>>    bond0(4) tcx/egress cil_to_netdev prog id 672
>>
>>    flow_dissector:
>>
>>    netfilter:
>>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> 
> Reviewed-by: Quentin Monnet <quentin@isovalent.com>
> 
> Thank you!
> 
> If you respin, would you mind updating the docs, please
> (Documentation/bpftool-net.rst)? I realise it says that "bpftool net"
> only dumps for tc and XDP, but that's not true any more since we have
> the flow dissector, netfilter programs, and now tcx. The examples are
> out-of-date too, but updating them doesn't have to be part of this PR.

Good point, I updated the docs and help usage to reflect that.

>>   tools/bpf/bpftool/net.c | 86 +++++++++++++++++++++++++++++++++++++++--
>>   1 file changed, 82 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
>> index 26a49965bf71..22af0a81458c 100644
>> --- a/tools/bpf/bpftool/net.c
>> +++ b/tools/bpf/bpftool/net.c
>> @@ -76,6 +76,11 @@ static const char * const attach_type_strings[] = {
>>   	[NET_ATTACH_TYPE_XDP_OFFLOAD]	= "xdpoffload",
>>   };
>>   
>> +static const char * const attach_loc_strings[] = {
>> +	[BPF_TCX_INGRESS]		= "tcx/ingress",
>> +	[BPF_TCX_EGRESS]		= "tcx/egress",
>> +};
>> +
>>   const size_t net_attach_type_size = ARRAY_SIZE(attach_type_strings);
>>   
>>   static enum net_attach_type parse_attach_type(const char *str)
>> @@ -422,8 +427,80 @@ static int dump_filter_nlmsg(void *cookie, void *msg, struct nlattr **tb)
>>   			      filter_info->devname, filter_info->ifindex);
>>   }
>>   
>> -static int show_dev_tc_bpf(int sock, unsigned int nl_pid,
>> -			   struct ip_devname_ifindex *dev)
>> +static const char *flags_strings(__u32 flags)
>> +{
>> +	return json_output ? "none" : "";
>> +}
>> +
>> +static int __show_dev_tc_bpf_name(__u32 id, char *name, size_t len)
>> +{
>> +	struct bpf_prog_info info = {};
>> +	__u32 ilen = sizeof(info);
>> +	int fd, ret;
>> +
>> +	fd = bpf_prog_get_fd_by_id(id);
>> +	if (fd < 0)
>> +		return fd;
>> +	ret = bpf_obj_get_info_by_fd(fd, &info, &ilen);
>> +	if (ret < 0)
>> +		goto out;
>> +	ret = -ENOENT;
>> +	if (info.name[0]) {
>> +		get_prog_full_name(&info, fd, name, len);
>> +		ret = 0;
>> +	}
>> +out:
>> +	close(fd);
>> +	return ret;
>> +}
>> +
>> +static void __show_dev_tc_bpf(const struct ip_devname_ifindex *dev,
>> +			      const enum bpf_attach_type loc)
>> +{
>> +	__u32 prog_flags[64] = {}, link_flags[64] = {}, i;
>> +	__u32 prog_ids[64] = {}, link_ids[64] = {};
>> +	LIBBPF_OPTS(bpf_prog_query_opts, optq);
>> +	char prog_name[MAX_PROG_FULL_NAME];
>> +	int ret;
>> +
>> +	optq.prog_ids = prog_ids;
>> +	optq.prog_attach_flags = prog_flags;
>> +	optq.link_ids = link_ids;
>> +	optq.link_attach_flags = link_flags;
>> +	optq.count = ARRAY_SIZE(prog_ids);
>> +
>> +	ret = bpf_prog_query_opts(dev->ifindex, loc, &optq);
>> +	if (ret)
>> +		return;
>> +	for (i = 0; i < optq.count; i++) {
>> +		NET_START_OBJECT;
>> +		NET_DUMP_STR("devname", "%s", dev->devname);
>> +		NET_DUMP_UINT("ifindex", "(%u)", dev->ifindex);
>> +		NET_DUMP_STR("kind", " %s", attach_loc_strings[loc]);
>> +		ret = __show_dev_tc_bpf_name(prog_ids[i], prog_name,
>> +					     sizeof(prog_name));
>> +		if (!ret)
>> +			NET_DUMP_STR("name", " %s", prog_name);
>> +		NET_DUMP_UINT("prog_id", " prog id %u", prog_ids[i]);
> 
> I was unsure at first about having two words for "prog id", or "link id"
> below (we use "prog_id" for netfilter, for example), but I see it leaves
> you the opportunity to append the flags, if any, without additional
> keywords so... why not.

Ok, I'll change it to prog_id and link_id for consistency in the human-readable output.
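
With that rename, the human-readable output would then read along the
lines of (illustrative only, reusing the example from the commit message):

   bond0(4) tcx/ingress cil_from_netdev prog_id 784 link_id 10
   bond0(4) tcx/egress cil_to_netdev prog_id 804 link_id 11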

And some, like the flow dissector, just dump 'id'. After syncing with Quentin, I tracked
this in [0] to further streamline the net dump output for the other types.

   [0] https://github.com/libbpf/bpftool/issues/106

Thanks,
Daniel


* Re: [PATCH bpf-next v4 1/8] bpf: Add generic attach/detach/query API for multi-progs
  2023-07-10 20:12 ` [PATCH bpf-next v4 1/8] bpf: Add generic attach/detach/query API for multi-progs Daniel Borkmann
  2023-07-11  0:23   ` Alexei Starovoitov
@ 2023-07-11 18:48   ` Andrii Nakryiko
  2023-07-14 16:00     ` Daniel Borkmann
  1 sibling, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2023-07-11 18:48 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: ast, andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu,
	joe, toke, davem, bpf, netdev

On Mon, Jul 10, 2023 at 1:12 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> This adds a generic layer called bpf_mprog which can be reused by different
> attachment layers to enable multi-program attachment and dependency resolution.
> In-kernel users of bpf_mprog don't need to care about the dependency
> resolution internals; they can just consume it with a few API calls.
>
> The initial idea of having a generic API came out of the discussion [0] on an
> earlier revision of this work where tc's priority was reused and exposed via
> BPF uapi as a way to coordinate dependencies among tc BPF programs, similar
> to classic tc BPF. The feedback was that priority provides a bad user
> experience and is hard to use [1], e.g.:
>
>   I cannot help but feel that priority logic copy-paste from old tc, netfilter
>   and friends is done because "that's how things were done in the past". [...]
>   Priority gets exposed everywhere in uapi all the way to bpftool when it's
>   right there for users to understand. And that's the main problem with it.
>
>   The user doesn't want to and doesn't need to be aware of it, but uapi forces them
>   to pick the priority. [...] Your cover letter [0] example proves that in
>   real life different services pick the same priority. They simply don't know
>   any better. Priority is an unnecessary magic that apps _have_ to pick, so
>   they just copy-paste and everyone ends up using the same.
>
> The course of the discussion increasingly showed the need for a generic,
> reusable API where the same look and feel can be applied to various other
> program types beyond just tc BPF. For example, XDP today does not have
> multi-program support in the kernel, and there was also interest in this API
> for improving management of cgroup program types. Such a common multi-program
> management concept is useful for BPF management daemons or user space BPF
> applications coordinating internally about their attachments.
>
> Both from Cilium and Meta side [2], we've collected the following requirements
> for a generic attach/detach/query API for multi-progs which has been implemented
> as part of this work:
>
>   - Support prog-based attach/detach and link API
>   - Dependency directives (can also be combined):
>     - BPF_F_{BEFORE,AFTER} with relative_{fd,id} which can be {prog,link,none}
>       - BPF_F_ID flag as {fd,id} toggle; the rationale for id is so that user
>         space application does not need CAP_SYS_ADMIN to retrieve foreign fds
>         via bpf_*_get_fd_by_id()
>       - BPF_F_LINK flag as {prog,link} toggle
>       - If relative_{fd,id} is none, then BPF_F_BEFORE will just prepend, and
>         BPF_F_AFTER will just append for attaching
>       - Enforced only at attach time
>     - BPF_F_REPLACE with replace_bpf_fd which can be prog, links have their
>       own infra for replacing their internal prog
>     - If no flags are set, then it's default append behavior for attaching
>   - Internal revision counter and optionally being able to pass expected_revision
>   - User space application can query current state with revision, and pass it
>     along for attachment to assert current state before doing updates
>   - Query also gets extension for link_ids array and link_attach_flags:
>     - prog_ids are always filled with program IDs
>     - link_ids are filled with link IDs when link was used, otherwise 0
>     - {prog,link}_attach_flags for holding {prog,link}-specific flags
>   - Must be easy to integrate/reuse for in-kernel users
>
> The uapi-side changes needed for supporting bpf_mprog are rather minimal,
> consisting of the addition of the attachment flags and revision counter, and
> the expansion of the existing union with a relative_{fd,id} member.
>
> The bpf_mprog framework consists of a bpf_mprog_entry object which holds
> an array of bpf_mprog_fp (fast-path structure). The bpf_mprog_cp (control-path
> structure) is part of bpf_mprog_bundle. Both have been separated so that the
> fast path gets efficient packing of bpf_prog pointers for maximum cache
> efficiency. Also, an array has been chosen instead of a linked list or other
> structures to remove unnecessary indirections for a fast point-to-entry in
> tc for BPF.
>
> The bpf_mprog_entry comes as a pair via bpf_mprog_bundle so that in case of
> updates the peer bpf_mprog_entry is populated and then just swapped, which
> avoids additional allocations that could otherwise fail, for example, in the
> detach case. The bpf_mprog_{fp,cp} arrays are currently static, but they could
> be converted to dynamic allocation if necessary at some point in the future.
> Locking is deferred to the in-kernel user of bpf_mprog; for example, tcx,
> which uses this API in the next patch, piggybacks on rtnl.
>
> An extensive test suite for checking all aspects of this API for prog-based
> attach/detach and link API comes as BPF selftests in this series.
>
> Kudos also to Andrii Nakryiko for API discussions wrt Meta's BPF management.
>
>   [0] https://lore.kernel.org/bpf/20221004231143.19190-1-daniel@iogearbox.net
>   [1] https://lore.kernel.org/bpf/CAADnVQ+gEY3FjCR=+DmjDR4gp5bOYZUFJQXj4agKFHT9CQPZBw@mail.gmail.com
>   [2] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  MAINTAINERS                    |   1 +
>  include/linux/bpf_mprog.h      | 343 ++++++++++++++++++++++++++
>  include/uapi/linux/bpf.h       |  36 ++-
>  kernel/bpf/Makefile            |   2 +-
>  kernel/bpf/mprog.c             | 427 +++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  36 ++-
>  6 files changed, 828 insertions(+), 17 deletions(-)
>  create mode 100644 include/linux/bpf_mprog.h
>  create mode 100644 kernel/bpf/mprog.c
>

From a UAPI perspective this looks great! A few implementation suggestions
below. I'll also reply separately to Alexei's reply with a discussion of the
higher-level *internal* API.
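
For reference, a minimal user-space sketch of driving this attach UAPI via
libbpf's opts-based API (field names follow the commit message; the exact
opts layout in this revision may differ, so treat it as illustrative only):

  LIBBPF_OPTS(bpf_prog_attach_opts, opts,
              .flags = BPF_F_BEFORE,        /* place before relative_fd */
              .relative_fd = other_prog_fd, /* existing prog to anchor on */
              .expected_revision = rev);    /* optional optimistic check */

  /* attach prog_fd to tcx ingress of ifindex, in front of other_prog_fd */
  err = bpf_prog_attach_opts(prog_fd, ifindex, BPF_TCX_INGRESS, &opts);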

[...]

> +
> +#define BPF_MPROG_KEEP 0
> +#define BPF_MPROG_SWAP 1
> +#define BPF_MPROG_FREE 2
> +
> +#define BPF_MPROG_MAX  64
> +
> +#define bpf_mprog_foreach_tuple(entry, fp, cp, t)                      \
> +       for (fp = &entry->fp_items[0], cp = &entry->parent->cp_items[0];\
> +            ({                                                         \
> +               t.prog = READ_ONCE(fp->prog);                           \
> +               t.link = cp->link;                                      \
> +               t.prog;                                                 \
> +             });                                                       \
> +            fp++, cp++)

I wish we could do something like the below to avoid the need to pass
fp and cp from outside:

for (struct { struct bpf_mprog_fp *fp; struct bpf_mprog_cp *cp; } tmp =
     { &entry->fp_items[0], &entry->parent->cp_items[0] };
     t.link = tmp.cp->link, t.prog = READ_ONCE(tmp.fp->prog);
     tmp.fp++, tmp.cp++)

But I'm not sure the kernel's C style allows that yet.

But I think you can use the comma operator to avoid that more verbose
({ }) construct.
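
Concretely, a sketch of the macro rewritten with the comma operator (same
field names as in the patch, untested):

#define bpf_mprog_foreach_tuple(entry, fp, cp, t)                      \
	for (fp = &entry->fp_items[0], cp = &entry->parent->cp_items[0];\
	     t.prog = READ_ONCE(fp->prog), t.link = cp->link, t.prog;   \
	     fp++, cp++)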

> +
> +#define bpf_mprog_foreach_prog(entry, fp, p)                           \
> +       for (fp = &entry->fp_items[0];                                  \
> +            (p = READ_ONCE(fp->prog));                                 \
> +            fp++)
> +

[...]

> +static inline void bpf_mprog_inc(struct bpf_mprog_entry *entry)
> +{
> +       entry->parent->count++;
> +}
> +
> +static inline void bpf_mprog_dec(struct bpf_mprog_entry *entry)
> +{
> +       entry->parent->count--;
> +}
> +
> +static inline int bpf_mprog_max(void)
> +{
> +       return ARRAY_SIZE(((struct bpf_mprog_entry *)NULL)->fp_items) - 1;
> +}

so we can only add BPF_MPROG_MAX - 1 progs, right? I presume the last
entry is always expected to be NULL?

> +
> +static inline int bpf_mprog_total(struct bpf_mprog_entry *entry)
> +{
> +       int total = entry->parent->count;
> +
> +       WARN_ON_ONCE(total > bpf_mprog_max());
> +       return total;
> +}
> +

[...]

> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> index 1d3892168d32..1bea2eb912cd 100644
> --- a/kernel/bpf/Makefile
> +++ b/kernel/bpf/Makefile
> @@ -12,7 +12,7 @@ obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list
>  obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
>  obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
>  obj-${CONFIG_BPF_LSM}    += bpf_inode_storage.o
> -obj-$(CONFIG_BPF_SYSCALL) += disasm.o
> +obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
>  obj-$(CONFIG_BPF_JIT) += trampoline.o
>  obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o
>  obj-$(CONFIG_BPF_JIT) += dispatcher.o
> diff --git a/kernel/bpf/mprog.c b/kernel/bpf/mprog.c
> new file mode 100644
> index 000000000000..1c4fcde74969
> --- /dev/null
> +++ b/kernel/bpf/mprog.c
> @@ -0,0 +1,427 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2023 Isovalent */
> +
> +#include <linux/bpf.h>
> +#include <linux/bpf_mprog.h>
> +
> +static int bpf_mprog_link(struct bpf_tuple *tuple,
> +                         u32 object, u32 flags,

so I tried to get used to this "object" notation, but I think it's
still awkward and keeps me asking "what is this really" every single
time I read this. I wonder if something like "fd_or_id" as a name
would make it more obvious?

> +                         enum bpf_prog_type type)
> +{
> +       bool id = flags & BPF_F_ID;
> +       struct bpf_link *link;
> +

should we reject this object/fd_or_id if it's zero, instead of trying
to look up ID/FD 0?


> +       if (id)
> +               link = bpf_link_by_id(object);
> +       else
> +               link = bpf_link_get_from_fd(object);
> +       if (IS_ERR(link))
> +               return PTR_ERR(link);
> +       if (type && link->prog->type != type) {
> +               bpf_link_put(link);
> +               return -EINVAL;
> +       }
> +
> +       tuple->link = link;
> +       tuple->prog = link->prog;
> +       return 0;
> +}
> +
> +static int bpf_mprog_prog(struct bpf_tuple *tuple,
> +                         u32 object, u32 flags,
> +                         enum bpf_prog_type type)
> +{
> +       bool id = flags & BPF_F_ID;
> +       struct bpf_prog *prog;
> +

same here about rejecting zero object?

> +       if (id)
> +               prog = bpf_prog_by_id(object);
> +       else
> +               prog = bpf_prog_get(object);
> +       if (IS_ERR(prog)) {
> +               if (!object && !id)
> +                       return 0;
> +               return PTR_ERR(prog);
> +       }
> +       if (type && prog->type != type) {
> +               bpf_prog_put(prog);
> +               return -EINVAL;
> +       }
> +
> +       tuple->link = NULL;
> +       tuple->prog = prog;
> +       return 0;
> +}
> +
> +static int bpf_mprog_tuple_relative(struct bpf_tuple *tuple,
> +                                   u32 object, u32 flags,
> +                                   enum bpf_prog_type type)
> +{
> +       memset(tuple, 0, sizeof(*tuple));
> +       if (flags & BPF_F_LINK)
> +               return bpf_mprog_link(tuple, object, flags, type);
> +       return bpf_mprog_prog(tuple, object, flags, type);
> +}
> +
> +static void bpf_mprog_tuple_put(struct bpf_tuple *tuple)
> +{
> +       if (tuple->link)
> +               bpf_link_put(tuple->link);
> +       else if (tuple->prog)
> +               bpf_prog_put(tuple->prog);
> +}
> +
> +static int bpf_mprog_replace(struct bpf_mprog_entry *entry,
> +                            struct bpf_tuple *ntuple, int idx)
> +{
> +       struct bpf_mprog_fp *fp;
> +       struct bpf_mprog_cp *cp;
> +       struct bpf_prog *oprog;
> +
> +       bpf_mprog_read(entry, idx, &fp, &cp);
> +       oprog = READ_ONCE(fp->prog);
> +       bpf_mprog_write(fp, cp, ntuple);
> +       if (!ntuple->link) {
> +               WARN_ON_ONCE(cp->link);
> +               bpf_prog_put(oprog);
> +       }
> +       return BPF_MPROG_KEEP;
> +}
> +
> +static int bpf_mprog_insert(struct bpf_mprog_entry *entry,
> +                           struct bpf_tuple *ntuple, int idx, u32 flags)
> +{
> +       int i, j = 0, total = bpf_mprog_total(entry);
> +       struct bpf_mprog_cp *cp, cpp[BPF_MPROG_MAX] = {};

a bit worried about using 512 bytes for the local cpp array... my initial
assumption was that we wouldn't have to create a copy of cp_items, just
update it in place. Hm... let's have the higher-level API discussion
in one branch, where Alexei has some proposals as well.

> +       struct bpf_mprog_fp *fp, *fpp;
> +       struct bpf_mprog_entry *peer;
> +
> +       peer = bpf_mprog_peer(entry);
> +       bpf_mprog_entry_clear(peer);
> +       if (idx < 0) {
> +               bpf_mprog_read_fp(peer, j, &fpp);
> +               bpf_mprog_write_fp(fpp, ntuple);
> +               bpf_mprog_write_cp(&cpp[j], ntuple);
> +               j++;
> +       }
> +       for (i = 0; i <= total; i++) {
> +               bpf_mprog_read_fp(peer, j, &fpp);
> +               if (idx == i && (flags & BPF_F_AFTER)) {
> +                       bpf_mprog_write(fpp, &cpp[j], ntuple);
> +                       j++;
> +                       bpf_mprog_read_fp(peer, j, &fpp);
> +               }
> +               if (i < total) {
> +                       bpf_mprog_read(entry, i, &fp, &cp);
> +                       bpf_mprog_copy(fpp, &cpp[j], fp, cp);
> +                       j++;
> +               }
> +               if (idx == i && (flags & BPF_F_BEFORE)) {
> +                       bpf_mprog_read_fp(peer, j, &fpp);
> +                       bpf_mprog_write(fpp, &cpp[j], ntuple);
> +                       j++;
> +               }
> +       }

sorry if I'm missing some subtle point, but I wonder why this is so
complicated? I think this choice of idx == -1 meaning prepend is
leading to this complication. It's also not clear why there is this
BPF_F_AFTER vs BPF_F_BEFORE distinction when we have already determined the
position where the new program has to be inserted (so after or before
should be irrelevant).

Please let me know why the below doesn't work.

Let's define that idx is the position where new prog/link tuple has to
be inserted. It can be in the range [0, N], where N is number of
programs currently in the mprog_peer. Note that N is inclusive above.

The algorithm for insertion is simple: everything currently at
entry->fp_items[idx] and after gets shifted. And we can do it with a
simple memmove:

memmove(peer->fp_items + idx + 1, peer->fp_items + idx,
(bpf_mprog_total(entry) - idx) * sizeof(struct bpf_mprog_fp));
/* similar memmove for cp_items/cpp array, of course */
/* now set new prog at peer->fp_items[idx] */

The above should replace entire above for loop and that extra if
before the loop. And it should work for corner cases:

  - idx == 0 (prepend), will shift everything to the right, and put
new prog at position 0. Exactly what we wanted.
  - idx == N (append), will shift nothing (that memmove should be a
no-op because size is zero, total == idx == N)


We just need to make sure that the above shift won't overwrite the
very last NULL. So bpf_mprog_total() should be < BPF_MPROG_MAX - 2
before all this.

Seems as simple as that, is there any complication I skimmed over?
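
To make that concrete, a rough sketch of the insert path with a single shift
(untested; assumes peer already holds a copy of entry's arrays, total is
bpf_mprog_total(entry), and idx is the final position of the new tuple with
0 <= idx <= total):

	memmove(peer->fp_items + idx + 1, peer->fp_items + idx,
		(total - idx) * sizeof(peer->fp_items[0]));
	memmove(cpp + idx + 1, cpp + idx,
		(total - idx) * sizeof(cpp[0]));
	bpf_mprog_write(&peer->fp_items[idx], &cpp[idx], ntuple);
	bpf_mprog_commit_cp(peer, cpp);
	bpf_mprog_inc(peer);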


> +       bpf_mprog_commit_cp(peer, cpp);
> +       bpf_mprog_inc(peer);
> +       return BPF_MPROG_SWAP;
> +}
> +
> +static int bpf_mprog_tuple_confirm(struct bpf_mprog_entry *entry,
> +                                  struct bpf_tuple *dtuple, int idx)
> +{
> +       int first = 0, last = bpf_mprog_total(entry) - 1;
> +       struct bpf_mprog_cp *cp;
> +       struct bpf_mprog_fp *fp;
> +       struct bpf_prog *prog;
> +       struct bpf_link *link;
> +
> +       if (idx <= first)
> +               bpf_mprog_read(entry, first, &fp, &cp);
> +       else if (idx >= last)
> +               bpf_mprog_read(entry, last, &fp, &cp);
> +       else
> +               bpf_mprog_read(entry, idx, &fp, &cp);
> +
> +       prog = READ_ONCE(fp->prog);
> +       link = cp->link;
> +       if (!dtuple->link && link)
> +               return -EBUSY;
> +
> +       WARN_ON_ONCE(dtuple->prog && dtuple->prog != prog);
> +       WARN_ON_ONCE(dtuple->link && dtuple->link != link);
> +
> +       dtuple->prog = prog;
> +       dtuple->link = link;
> +       return 0;
> +}
> +
> +static int bpf_mprog_delete(struct bpf_mprog_entry *entry,
> +                           struct bpf_tuple *dtuple, int idx)
> +{
> +       int i = 0, j, ret, total = bpf_mprog_total(entry);
> +       struct bpf_mprog_cp *cp, cpp[BPF_MPROG_MAX] = {};
> +       struct bpf_mprog_fp *fp, *fpp;
> +       struct bpf_mprog_entry *peer;
> +
> +       ret = bpf_mprog_tuple_confirm(entry, dtuple, idx);
> +       if (ret)
> +               return ret;
> +       peer = bpf_mprog_peer(entry);
> +       bpf_mprog_entry_clear(peer);
> +       if (idx < 0)
> +               i++;
> +       if (idx == total)
> +               total--;
> +       for (j = 0; i < total; i++) {
> +               if (idx == i)
> +                       continue;
> +               bpf_mprog_read_fp(peer, j, &fpp);
> +               bpf_mprog_read(entry, i, &fp, &cp);
> +               bpf_mprog_copy(fpp, &cpp[j], fp, cp);
> +               j++;
> +       }
> +       bpf_mprog_commit_cp(peer, cpp);
> +       bpf_mprog_dec(peer);
> +       bpf_mprog_mark_ref(peer, dtuple);
> +       return bpf_mprog_total(peer) ?
> +              BPF_MPROG_SWAP : BPF_MPROG_FREE;

for delete it's also a bit unclear to me. We are deleting some
specific spot, so idx should be a valid [0, N) value, no? Then why does
bpf_mprog_tuple_confirm() have this special idx <= first and idx >= last
handling?

Deletion should be similar to insertion, just with the shift in the
other direction, and then setting NULLs at the N-1 position to ensure
proper NULL termination of the fp array.
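
Roughly (again untested, cp side elided, and assuming peer starts as a full
copy of entry), the delete shift would be the mirror image:

	/* close the gap left by the element removed at idx */
	memmove(peer->fp_items + idx, peer->fp_items + idx + 1,
		(total - idx - 1) * sizeof(peer->fp_items[0]));
	/* keep the fp array NULL-terminated after shrinking by one */
	WRITE_ONCE(peer->fp_items[total - 1].prog, NULL);
	bpf_mprog_dec(peer);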

> +}
> +
> +/* In bpf_mprog_pos_*() we evaluate the target position for the BPF
> + * program/link that needs to be replaced, inserted or deleted for
> + * each "rule" independently. If all rules agree on that position
> + * or existing element, then enact replacement, addition or deletion.
> + * If this is not the case, then the request cannot be satisfied and
> + * we bail out with an error.
> + */
> +static int bpf_mprog_pos_exact(struct bpf_mprog_entry *entry,
> +                              struct bpf_tuple *tuple)
> +{
> +       struct bpf_mprog_fp *fp;
> +       struct bpf_mprog_cp *cp;
> +       int i;
> +
> +       for (i = 0; i < bpf_mprog_total(entry); i++) {
> +               bpf_mprog_read(entry, i, &fp, &cp);
> +               if (tuple->prog == READ_ONCE(fp->prog))
> +                       return tuple->link == cp->link ? i : -EBUSY;
> +       }
> +       return -ENOENT;
> +}
> +
> +static int bpf_mprog_pos_before(struct bpf_mprog_entry *entry,
> +                               struct bpf_tuple *tuple)
> +{
> +       struct bpf_mprog_fp *fp;
> +       struct bpf_mprog_cp *cp;
> +       int i;
> +
> +       for (i = 0; i < bpf_mprog_total(entry); i++) {
> +               bpf_mprog_read(entry, i, &fp, &cp);
> +               if (tuple->prog == READ_ONCE(fp->prog) &&
> +                   (!tuple->link || tuple->link == cp->link))
> +                       return i - 1;

taking all the above into account, this should just `return i;`

> +       }
> +       return tuple->prog ? -ENOENT : -1;
> +}
> +
> +static int bpf_mprog_pos_after(struct bpf_mprog_entry *entry,
> +                              struct bpf_tuple *tuple)
> +{
> +       struct bpf_mprog_fp *fp;
> +       struct bpf_mprog_cp *cp;
> +       int i;
> +
> +       for (i = 0; i < bpf_mprog_total(entry); i++) {
> +               bpf_mprog_read(entry, i, &fp, &cp);
> +               if (tuple->prog == READ_ONCE(fp->prog) &&
> +                   (!tuple->link || tuple->link == cp->link))
> +                       return i + 1;
> +       }
> +       return tuple->prog ? -ENOENT : bpf_mprog_total(entry);
> +}

I actually wonder if it would be simpler not to have the _exact, _before,
and _after variants. Instead, do a generic find of a tuple, and then,
outside of that, depending on BPF_F_BEFORE/BPF_F_AFTER/BPF_F_REPLACE,
adjust the returned position (if the item is found): keep it as is for
BPF_F_BEFORE and BPF_F_REPLACE, or bump it by +1 for BPF_F_AFTER.
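
A sketch of that single lookup (hypothetical helper name, untested):

static int bpf_mprog_pos_find(struct bpf_mprog_entry *entry,
			      struct bpf_tuple *tuple)
{
	struct bpf_mprog_fp *fp;
	struct bpf_mprog_cp *cp;
	int i;

	for (i = 0; i < bpf_mprog_total(entry); i++) {
		bpf_mprog_read(entry, i, &fp, &cp);
		if (tuple->prog == READ_ONCE(fp->prog) &&
		    (!tuple->link || tuple->link == cp->link))
			return i;
	}
	return -ENOENT;
}

/* callers then do: idx = bpf_mprog_pos_find(...); and for BPF_F_AFTER, idx++ */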

> +
> +int bpf_mprog_attach(struct bpf_mprog_entry *entry, struct bpf_prog *prog_new,
> +                    struct bpf_link *link, struct bpf_prog *prog_old,
> +                    u32 flags, u32 object, u64 revision)
> +{
> +       struct bpf_tuple rtuple, ntuple = {
> +               .prog = prog_new,
> +               .link = link,
> +       }, otuple = {
> +               .prog = prog_old,
> +               .link = link,
> +       };
> +       int ret, idx = -2, tidx;

so here I'd init idx to some "impossible" error, like -ERANGE (to pair
with -EDOM ;)

> +
> +       if (revision && revision != bpf_mprog_revision(entry))
> +               return -ESTALE;
> +       if (bpf_mprog_exists(entry, prog_new))
> +               return -EEXIST;
> +       ret = bpf_mprog_tuple_relative(&rtuple, object,
> +                                      flags & ~BPF_F_REPLACE,
> +                                      prog_new->type);
> +       if (ret)
> +               return ret;
> +       if (flags & BPF_F_REPLACE) {
> +               tidx = bpf_mprog_pos_exact(entry, &otuple);
> +               if (tidx < 0) {
> +                       ret = tidx;
> +                       goto out;
> +               }
> +               idx = tidx;
> +       }
> +       if (flags & BPF_F_BEFORE) {
> +               tidx = bpf_mprog_pos_before(entry, &rtuple);
> +               if (tidx < -1 || (idx >= -1 && tidx != idx)) {
> +                       ret = tidx < -1 ? tidx : -EDOM;
> +                       goto out;
> +               }
> +               idx = tidx;
> +       }
> +       if (flags & BPF_F_AFTER) {
> +               tidx = bpf_mprog_pos_after(entry, &rtuple);
> +               if (tidx < -1 || (idx >= -1 && tidx != idx)) {
> +                       ret = tidx < 0 ? tidx : -EDOM;
> +                       goto out;
> +               }
> +               idx = tidx;

and then here just have special casing for -ERANGE, and otherwise
treat anything else negative as an error

tidx = bpf_mprog_pos_exact(entry, &rtuple);
/* and adjust +1 for BPF_F_AFTER */
if (tidx >= 0)
    tidx += 1;
if (idx != -ERANGE && tidx != idx) {
    ret = tidx < 0 ? tidx : -EDOM;
    goto out;
}
idx = tidx;

> +       }
> +       if (idx < -1) {
> +               if (rtuple.prog || flags) {
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +               idx = bpf_mprog_total(entry);
> +               flags = BPF_F_AFTER;
> +       }
> +       if (idx >= bpf_mprog_max()) {
> +               ret = -EDOM;
> +               goto out;
> +       }
> +       if (flags & BPF_F_REPLACE)
> +               ret = bpf_mprog_replace(entry, &ntuple, idx);
> +       else
> +               ret = bpf_mprog_insert(entry, &ntuple, idx, flags);
> +out:
> +       bpf_mprog_tuple_put(&rtuple);
> +       return ret;
> +}
> +

[...]


* Re: [PATCH bpf-next v4 1/8] bpf: Add generic attach/detach/query API for multi-progs
  2023-07-11  0:23   ` Alexei Starovoitov
@ 2023-07-11 18:51     ` Andrii Nakryiko
  2023-07-14 16:06       ` Daniel Borkmann
  0 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2023-07-11 18:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, ast, andrii, martin.lau, razor, sdf,
	john.fastabend, kuba, dxu, joe, toke, davem, bpf, netdev

On Mon, Jul 10, 2023 at 5:23 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jul 10, 2023 at 10:12:11PM +0200, Daniel Borkmann wrote:
> > + *
> > + *   struct bpf_mprog_entry *entry, *peer;
> > + *   int ret;
> > + *
> > + *   // bpf_mprog user-side lock
> > + *   // fetch active @entry from attach location
> > + *   [...]
> > + *   ret = bpf_mprog_attach(entry, [...]);
> > + *   if (ret >= 0) {
> > + *       peer = bpf_mprog_peer(entry);
> > + *       if (bpf_mprog_swap_entries(ret))
> > + *           // swap @entry to @peer at attach location
> > + *       bpf_mprog_commit(entry);
> > + *       ret = 0;
> > + *   } else {
> > + *       // error path, bail out, propagate @ret
> > + *   }
> > + *   // bpf_mprog user-side unlock
> > + *
> > + *  Detach case:
> > + *
> > + *   struct bpf_mprog_entry *entry, *peer;
> > + *   bool release;
> > + *   int ret;
> > + *
> > + *   // bpf_mprog user-side lock
> > + *   // fetch active @entry from attach location
> > + *   [...]
> > + *   ret = bpf_mprog_detach(entry, [...]);
> > + *   if (ret >= 0) {
> > + *       release = ret == BPF_MPROG_FREE;
> > + *       peer = release ? NULL : bpf_mprog_peer(entry);
> > + *       if (bpf_mprog_swap_entries(ret))
> > + *           // swap @entry to @peer at attach location
> > + *       bpf_mprog_commit(entry);
> > + *       if (release)
> > + *           // free bpf_mprog_bundle
> > + *       ret = 0;
> > + *   } else {
> > + *       // error path, bail out, propagate @ret
> > + *   }
> > + *   // bpf_mprog user-side unlock
>
> Thanks for the doc. It helped a lot.
> And when it's contained like this it's easier to discuss api.
> It seems bpf_mprog_swap_entries() is trying to abstract the error code
> away, but BPF_MPROG_FREE leaks out and tcx_entry_needs_release()
> captures it with extra miniq_active twist, which I don't understand yet.
> bpf_mprog_peer() is also leaking a bit of implementation detail.
> Can we abstract it further, like:
>
> ret = bpf_mprog_detach(entry, [...], &new_entry);
> if (ret >= 0) {
>    if (entry != new_entry)
>      // swap @entry to @new_entry at attach location
>    bpf_mprog_commit(entry);
>    if (!new_entry)
>      // free bpf_mprog_bundle
> }
> and make bpf_mprog_peer internal to mprog. It will also allow removing
> BPF_MPROG_FREE vs SWAP distinction. peer is hidden.
>    if (entry != new_entry)
>       // update
> also will be easier to read inside tcx code without looking into mprog details.

I'm actually wondering whether it's possible to simplify this even further.
For example, do we even need separate bpf_mprog_{attach,detach} and
bpf_mprog_commit() steps? So far it seems like bpf_mprog_commit() is
inevitable when attach/detach succeeds, so we might as well just do it
as the last step of the attach/detach operation.

The only problem seems to be that the bpf_mprog interface does this
optimization of replacing stuff in place, if possible, and allows the
caller to skip the swap. How important is it to avoid that swap of a
bpf_mprog_fp (pointer)? It seems pretty cheap (and a relatively rare
operation), so I wouldn't bother optimizing this.

So how about we just say that there is always a swap. Internally in
bpf_mprog_bundle the current entry is determined based on revision & 1. We
can have bpf_mprog_cur_entry() return the proper pointer after commit, or
bpf_mprog_attach() can return the proper new entry as an output parameter,
whichever is preferable.

As for BPF_MPROG_FREE, that seems like an unnecessary complication as
well. The caller can just check bpf_mprog_total() quickly and, if it
dropped to zero, assume FREE. Unless there is something more subtle
there?

With the above, the interface will be much simpler, IMO. You just do
bpf_mprog_attach/detach, and then swap the pointer to the new
bpf_mprog_entry. Then you can check bpf_mprog_total() for zero, and clean
up further, if necessary.

We assume the caller has a proper locking, so all the above should be non-racy.
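
Under that model, the caller side (here roughly a detach at a tcx attach
location, with hypothetical tcx_entry_fetch/update/free names, so purely a
sketch) would boil down to:

	entry = tcx_entry_fetch(dev, ingress);
	ret = bpf_mprog_detach(entry, ..., &new_entry);
	if (!ret) {
		bool keep = bpf_mprog_total(new_entry) > 0;

		/* swap @entry to @new_entry (or NULL) at the attach location */
		tcx_entry_update(dev, keep ? new_entry : NULL, ingress);
		if (!keep)
			tcx_entry_free(entry); /* nothing left attached */
	}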

BTW, combining commit with attach allows us to avoid that relatively
big bpf_mprog_cp array on the stack as well, because we will be able
to update bundle->cp_items in-place.

The only (I believe :) ) big assumption I'm making in all of the above
is that commit is inevitable and we won't have a situation where we
start attach, update fp/cpp, and then decide to abort instead of going
for commit. Is this possible? Can we avoid it by careful checks
upfront and doing attach as last step that cannot be undone?

P.S. I guess one bit that I might have oversimplified is the
synchronize_rcu() + bpf_prog_put(), but I'm not sure exactly why we
put the prog after sync_rcu. But if it's really necessary (and I assume it
is) and is a blocker for the proposal above, then maybe the interface
should delegate that to the caller (i.e., optionally return the replaced
prog pointer from attach/detach) or use call_rcu() with a callback?


* Re: [PATCH bpf-next v4 1/8] bpf: Add generic attach/detach/query API for multi-progs
  2023-07-11 18:48   ` Andrii Nakryiko
@ 2023-07-14 16:00     ` Daniel Borkmann
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Borkmann @ 2023-07-14 16:00 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: ast, andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu,
	joe, toke, davem, bpf, netdev

On 7/11/23 8:48 PM, Andrii Nakryiko wrote:
> On Mon, Jul 10, 2023 at 1:12 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
[...]
>> +static inline int bpf_mprog_max(void)
>> +{
>> +       return ARRAY_SIZE(((struct bpf_mprog_entry *)NULL)->fp_items) - 1;
>> +}
> 
> so we can only add BPF_MPROG_MAX - 1 progs, right? I presume the last
> entry is presumed to be always NULL, right?

Correct.

>> +static inline int bpf_mprog_total(struct bpf_mprog_entry *entry)
>> +{
>> +       int total = entry->parent->count;
>> +
>> +       WARN_ON_ONCE(total > bpf_mprog_max());
>> +       return total;
>> +}
>> +
> 
> [...]
> 
>> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
>> index 1d3892168d32..1bea2eb912cd 100644
>> --- a/kernel/bpf/Makefile
>> +++ b/kernel/bpf/Makefile
>> @@ -12,7 +12,7 @@ obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list
>>   obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
>>   obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
>>   obj-${CONFIG_BPF_LSM}    += bpf_inode_storage.o
>> -obj-$(CONFIG_BPF_SYSCALL) += disasm.o
>> +obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
>>   obj-$(CONFIG_BPF_JIT) += trampoline.o
>>   obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o
>>   obj-$(CONFIG_BPF_JIT) += dispatcher.o
>> diff --git a/kernel/bpf/mprog.c b/kernel/bpf/mprog.c
>> new file mode 100644
>> index 000000000000..1c4fcde74969
>> --- /dev/null
>> +++ b/kernel/bpf/mprog.c
>> @@ -0,0 +1,427 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/* Copyright (c) 2023 Isovalent */
>> +
>> +#include <linux/bpf.h>
>> +#include <linux/bpf_mprog.h>
>> +
>> +static int bpf_mprog_link(struct bpf_tuple *tuple,
>> +                         u32 object, u32 flags,
> 
> so I tried to get used to this "object" notation, but I think it's
> still awkwards and keeps me asking "what is this really" every single
> time I read this. I wonder if something like "fd_or_id" as a name
> would make it more obvious?

Ok, fixed it in the v5.

[...]
>> +       struct bpf_mprog_fp *fp, *fpp;
>> +       struct bpf_mprog_entry *peer;
>> +
>> +       peer = bpf_mprog_peer(entry);
>> +       bpf_mprog_entry_clear(peer);
>> +       if (idx < 0) {
>> +               bpf_mprog_read_fp(peer, j, &fpp);
>> +               bpf_mprog_write_fp(fpp, ntuple);
>> +               bpf_mprog_write_cp(&cpp[j], ntuple);
>> +               j++;
>> +       }
>> +       for (i = 0; i <= total; i++) {
>> +               bpf_mprog_read_fp(peer, j, &fpp);
>> +               if (idx == i && (flags & BPF_F_AFTER)) {
>> +                       bpf_mprog_write(fpp, &cpp[j], ntuple);
>> +                       j++;
>> +                       bpf_mprog_read_fp(peer, j, &fpp);
>> +               }
>> +               if (i < total) {
>> +                       bpf_mprog_read(entry, i, &fp, &cp);
>> +                       bpf_mprog_copy(fpp, &cpp[j], fp, cp);
>> +                       j++;
>> +               }
>> +               if (idx == i && (flags & BPF_F_BEFORE)) {
>> +                       bpf_mprog_read_fp(peer, j, &fpp);
>> +                       bpf_mprog_write(fpp, &cpp[j], ntuple);
>> +                       j++;
>> +               }
>> +       }
> 
> sorry if I miss some subtle point, but I wonder why this is so
> complicated? I think this choice of idx == -1 meaning prepend is
> leading to this complication. It's not also clear why there is this
> BPF_F_AFTER vs BPF_F_BEFORE distinction when we already determined a
> position where new program has to be inserted (so after or before
> should be irrelevant).
> 
> Please let me know why the below doesn't work.
> 
> Let's define that idx is the position where new prog/link tuple has to
> be inserted. It can be in the range [0, N], where N is number of
> programs currently in the mprog_peer. Note that N is inclusive above.
> 
> The algorithm for insertion is simple: everything currently at
> entry->fp_items[idx] and after gets shifted. And we can do it with a
> simple memmove:
> 
> memmove(peer->fp_items + idx + 1, peer->fp_items + idx,
> (bpf_mprog_total(entry) - idx) * sizeof(struct bpf_mprog_fp));
> /* similar memmove for cp_items/cpp array, of course */
> /* now set new prog at peer->fp_items[idx] */
> 
> The above should replace entire above for loop and that extra if
> before the loop. And it should work for corner cases:
> 
>    - idx == 0 (prepend), will shift everything to the right, and put
> new prog at position 0. Exactly what we wanted.
>    - idx == N (append), will shift nothing (that memmove should be a
> no-op because size is zero, total == idx == N)
> 
> We just need to make sure that the above shift won't overwrite the
> very last NULL. So bpf_mprog_total() should be < BPF_MPROG_MAX - 2
> before all this.
> 
> Seems as simple as that, is there any complication I skimmed over?
[...]

>> +static int bpf_mprog_delete(struct bpf_mprog_entry *entry,
>> +                           struct bpf_tuple *dtuple, int idx)
>> +{
>> +       int i = 0, j, ret, total = bpf_mprog_total(entry);
>> +       struct bpf_mprog_cp *cp, cpp[BPF_MPROG_MAX] = {};
>> +       struct bpf_mprog_fp *fp, *fpp;
>> +       struct bpf_mprog_entry *peer;
>> +
>> +       ret = bpf_mprog_tuple_confirm(entry, dtuple, idx);
>> +       if (ret)
>> +               return ret;
>> +       peer = bpf_mprog_peer(entry);
>> +       bpf_mprog_entry_clear(peer);
>> +       if (idx < 0)
>> +               i++;
>> +       if (idx == total)
>> +               total--;
>> +       for (j = 0; i < total; i++) {
>> +               if (idx == i)
>> +                       continue;
>> +               bpf_mprog_read_fp(peer, j, &fpp);
>> +               bpf_mprog_read(entry, i, &fp, &cp);
>> +               bpf_mprog_copy(fpp, &cpp[j], fp, cp);
>> +               j++;
>> +       }
>> +       bpf_mprog_commit_cp(peer, cpp);
>> +       bpf_mprog_dec(peer);
>> +       bpf_mprog_mark_ref(peer, dtuple);
>> +       return bpf_mprog_total(peer) ?
>> +              BPF_MPROG_SWAP : BPF_MPROG_FREE;
> 
> for delete it's also a bit unclear to me. We are deleting some
> specific spot, so idx should be a valid [0, N) value, no? Then why the
> bpf_mprog_tuple_confirm() has this special <= first and idx >= last
> handling?
> 
> Deletion should be similar to insertion, just the shift is in the
> other direction. And then setting NULLs at N-1 position to ensure
> proper NULL termination of fp array.

Agree, the naming was suboptimal and I adapted this slightly in v5.
It picks the element when no deletion fd was specified but rather a
delete from the front/back or relative to some element was requested,
so it needs to fetch the prog.

[...]
> 
> and then here just have special casing for -ERANGE, and otherwise
> treat anything else negative as error
> 
> tidx = bpf_mprog_pos_exact(entry, &rtuple);
> /* and adjust +1 for BPF_F_AFTER */
> if (tidx >= 0)
>      tidx += 1;
> if (idx != -ERANGE && tidx != idx) {
>      ret = tidx < 0 ? tidx : -EDOM;
>      goto out;
> }
> idx = tidx;

This looks much less intuitive to me given that the replace and delete
cases need the exact position, unlike the relative insertion. I also
reworked this with the memmove in v5, but kept the more obvious
_exact/_before/_after helpers.

Thanks a lot for the feedback!

>> +       }
>> +       if (idx < -1) {
>> +               if (rtuple.prog || flags) {
>> +                       ret = -EINVAL;
>> +                       goto out;
>> +               }
>> +               idx = bpf_mprog_total(entry);
>> +               flags = BPF_F_AFTER;
>> +       }
>> +       if (idx >= bpf_mprog_max()) {
>> +               ret = -EDOM;
>> +               goto out;
>> +       }
>> +       if (flags & BPF_F_REPLACE)
>> +               ret = bpf_mprog_replace(entry, &ntuple, idx);
>> +       else
>> +               ret = bpf_mprog_insert(entry, &ntuple, idx, flags);
>> +out:
>> +       bpf_mprog_tuple_put(&rtuple);
>> +       return ret;
>> +}
>> +
> 
> [...]
> 



* Re: [PATCH bpf-next v4 1/8] bpf: Add generic attach/detach/query API for multi-progs
  2023-07-11 18:51     ` Andrii Nakryiko
@ 2023-07-14 16:06       ` Daniel Borkmann
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Borkmann @ 2023-07-14 16:06 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov
  Cc: ast, andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu,
	joe, toke, davem, bpf, netdev

On 7/11/23 8:51 PM, Andrii Nakryiko wrote:
> On Mon, Jul 10, 2023 at 5:23 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>
>> On Mon, Jul 10, 2023 at 10:12:11PM +0200, Daniel Borkmann wrote:
>>> + *
>>> + *   struct bpf_mprog_entry *entry, *peer;
>>> + *   int ret;
>>> + *
>>> + *   // bpf_mprog user-side lock
>>> + *   // fetch active @entry from attach location
>>> + *   [...]
>>> + *   ret = bpf_mprog_attach(entry, [...]);
>>> + *   if (ret >= 0) {
>>> + *       peer = bpf_mprog_peer(entry);
>>> + *       if (bpf_mprog_swap_entries(ret))
>>> + *           // swap @entry to @peer at attach location
>>> + *       bpf_mprog_commit(entry);
>>> + *       ret = 0;
>>> + *   } else {
>>> + *       // error path, bail out, propagate @ret
>>> + *   }
>>> + *   // bpf_mprog user-side unlock
>>> + *
>>> + *  Detach case:
>>> + *
>>> + *   struct bpf_mprog_entry *entry, *peer;
>>> + *   bool release;
>>> + *   int ret;
>>> + *
>>> + *   // bpf_mprog user-side lock
>>> + *   // fetch active @entry from attach location
>>> + *   [...]
>>> + *   ret = bpf_mprog_detach(entry, [...]);
>>> + *   if (ret >= 0) {
>>> + *       release = ret == BPF_MPROG_FREE;
>>> + *       peer = release ? NULL : bpf_mprog_peer(entry);
>>> + *       if (bpf_mprog_swap_entries(ret))
>>> + *           // swap @entry to @peer at attach location
>>> + *       bpf_mprog_commit(entry);
>>> + *       if (release)
>>> + *           // free bpf_mprog_bundle
>>> + *       ret = 0;
>>> + *   } else {
>>> + *       // error path, bail out, propagate @ret
>>> + *   }
>>> + *   // bpf_mprog user-side unlock
>>
>> Thanks for the doc. It helped a lot.
>> And when it's contained like this it's easier to discuss api.
>> It seems bpf_mprog_swap_entries() is trying to abstract the error code
>> away, but BPF_MPROG_FREE leaks out and tcx_entry_needs_release()
>> captures it with extra miniq_active twist, which I don't understand yet.
>> bpf_mprog_peer() is also leaking a bit of implementation detail.
>> Can we abstract it further, like:
>>
>> ret = bpf_mprog_detach(entry, [...], &new_entry);
>> if (ret >= 0) {
>>     if (entry != new_entry)
>>       // swap @entry to @new_entry at attach location
>>     bpf_mprog_commit(entry);
>>     if (!new_entry)
>>       // free bpf_mprog_bundle
>> }
>> and make bpf_mprog_peer internal to mprog. It will also allow removing
>> BPF_MPROG_FREE vs SWAP distinction. peer is hidden.
>>     if (entry != new_entry)
>>        // update
>> also will be easier to read inside tcx code without looking into mprog details.

+1, agree, and I implemented this suggestion in the v5.

> I'm actually thinking if it's possible to simplify it even further.
> For example, do we even need a separate bpf_mprog_{attach,detach} and
> bpf_mprog_commit()? So far it seems like bpf_mprog_commit() is
> inevitable in case of success of attach/detach, so we might as well
> just do it as the last step of attach/detach operation.

It needs to be done after the pointers have been swapped by the mprog user.

> The only problem seems to be due to bpf_mprog interface doing this
> optimization of replacing stuff in place, if possible, and allowing
> the caller to not do the swap. How important is it to avoid that swap
> of a bpf_mprog_fp (pointer)? Seems pretty cheap (and relatively rare
> operation), so I wouldn't bother optimizing this.

I would like to keep it given that, e.g., when an application comes up,
fetches its links from bpffs and updates all its programs in place, this is
exactly the replace situation.

> So how about we just say that there is always a swap. Internally in
> bpf_mprog_bundle current entry is determined based on revision&1. We
> can have bpf_mprog_cur_entry() to return a proper pointer after
> commit. Or bpf_mprog_attach() can return proper new entry as output
> parameter, whichever is preferable.
> 
> As for BPF_MPROG_FREE. That seems like an unnecessary complication as
> well. Caller can just check bpf_mprog_total() quickly, and if it
> dropped to zero assume FREE. Unless there is something more subtle
> there?

Agree, some may want to keep an empty bpf_mprog, others may want to
free it. I implemented it this way. I removed all the BPF_MPROG_*
return codes.

> With the above, the interface will be much simpler, IMO. You just do
> bpf_mprog_attach/detach, and then swap pointer to new bpf_mprog_entry.
> Then you can check bpf_mprog_total() for zero, and clean up further,
> if necessary.
> 
> We assume the caller has a proper locking, so all the above should be non-racy.
> 
> BTW, combining commit with attach allows us to avoid that relatively
> big bpf_mprog_cp array on the stack as well, because we will be able
> to update bundle->cp_items in-place.
> 
> The only (I believe :) ) big assumption I'm making in all of the above
> is that commit is inevitable and we won't have a situation where we
> start attach, update fp/cpp, and then decide to abort instead of going
> for commit. Is this possible? Can we avoid it by careful checks
> upfront and doing attach as last step that cannot be undone?
> 
> P.S. I guess one bit that I might have simplified is that
> synchronize_rcu() + bpf_prog_put(), but I'm not sure exactly why we
> put prog after sync_rcu. But if it's really necessary (and I assume it

It is because users can still be in flight on the old mprog_entry, so it
must come after the synchronize_rcu() where we drop the ref for the delete case.
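
To spell that out, a conceptual sketch (not the literal patch code;
deleted_prog is a hypothetical name for the program the detach removed):

	/* readers may still be traversing the old mprog_entry under RCU */
	synchronize_rcu();
	/* only now is it safe to drop the reference on the removed program */
	bpf_prog_put(deleted_prog);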

Thanks again,
Daniel


end of thread

Thread overview: 22+ messages
2023-07-10 20:12 [PATCH bpf-next v4 0/8] BPF link support for tc BPF programs Daniel Borkmann
2023-07-10 20:12 ` [PATCH bpf-next v4 1/8] bpf: Add generic attach/detach/query API for multi-progs Daniel Borkmann
2023-07-11  0:23   ` Alexei Starovoitov
2023-07-11 18:51     ` Andrii Nakryiko
2023-07-14 16:06       ` Daniel Borkmann
2023-07-11 18:48   ` Andrii Nakryiko
2023-07-14 16:00     ` Daniel Borkmann
2023-07-10 20:12 ` [PATCH bpf-next v4 2/8] bpf: Add fd-based tcx multi-prog infra with link support Daniel Borkmann
2023-07-10 20:12 ` [PATCH bpf-next v4 3/8] libbpf: Add opts-based attach/detach/query API for tcx Daniel Borkmann
2023-07-11  4:00   ` Andrii Nakryiko
2023-07-11 14:03     ` Daniel Borkmann
2023-07-10 20:12 ` [PATCH bpf-next v4 4/8] libbpf: Add link-based " Daniel Borkmann
2023-07-11  4:00   ` Andrii Nakryiko
2023-07-11 14:08     ` Daniel Borkmann
2023-07-10 20:12 ` [PATCH bpf-next v4 5/8] libbpf: Add helper macro to clear opts structs Daniel Borkmann
2023-07-11  4:02   ` Andrii Nakryiko
2023-07-11  9:42     ` Daniel Borkmann
2023-07-10 20:12 ` [PATCH bpf-next v4 6/8] bpftool: Extend net dump with tcx progs Daniel Borkmann
2023-07-11 14:19   ` Quentin Monnet
2023-07-11 16:46     ` Daniel Borkmann
2023-07-10 20:12 ` [PATCH bpf-next v4 7/8] selftests/bpf: Add mprog API tests for BPF tcx opts Daniel Borkmann
2023-07-10 20:12 ` [PATCH bpf-next v4 8/8] selftests/bpf: Add mprog API tests for BPF tcx links Daniel Borkmann
