* [PATCH bpf-next v6 1/8] bpf: Add generic attach/detach/query API for multi-progs
2023-07-19 14:08 [PATCH bpf-next v6 0/8] BPF link support for tc BPF programs Daniel Borkmann
@ 2023-07-19 14:08 ` Daniel Borkmann
2023-07-19 14:08 ` [PATCH bpf-next v6 2/8] bpf: Add fd-based tcx multi-prog infra with link support Daniel Borkmann
` (7 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Daniel Borkmann @ 2023-07-19 14:08 UTC (permalink / raw)
To: ast
Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
toke, davem, bpf, netdev, Daniel Borkmann
This adds a generic layer called bpf_mprog which can be reused by different
attachment layers to enable multi-program attachment and dependency resolution.
In-kernel users of bpf_mprog don't need to care about the dependency
resolution internals; they can just consume it with a few API calls.
The initial idea of having a generic API sparked from a discussion [0] on an
earlier revision of this work, where tc's priority was reused and exposed via
BPF uapi as a way to coordinate dependencies among tc BPF programs, similar
to classic tc BPF. The feedback was that priority provides a bad user
experience and is hard to use [1], e.g.:
I cannot help but feel that priority logic copy-paste from old tc, netfilter
and friends is done because "that's how things were done in the past". [...]
Priority gets exposed everywhere in uapi all the way to bpftool when it's
right there for users to understand. And that's the main problem with it.
The user don't want to and don't need to be aware of it, but uapi forces them
to pick the priority. [...] Your cover letter [0] example proves that in
real life different service pick the same priority. They simply don't know
any better. Priority is an unnecessary magic that apps _have_ to pick, so
they just copy-paste and everyone ends up using the same.
The course of the discussion made the need increasingly clear for a generic,
reusable API with the same look and feel across various other program types
beyond just tc BPF: XDP, for example, has no in-kernel multi-program support
today, and there was also interest in this API for improving the management
of cgroup program types. Such a common multi-program management concept is
useful for BPF management daemons or user-space BPF applications
coordinating their attachments internally.
From both the Cilium and Meta side [2], we've collected the following
requirements for a generic attach/detach/query API for multi-progs, which
have been implemented as part of this work:
- Support prog-based attach/detach and link API
- Dependency directives (can also be combined):
- BPF_F_{BEFORE,AFTER} with relative_{fd,id} which can be {prog,link,none}
- BPF_F_ID flag as {fd,id} toggle; the rationale for id is that a user-space
application does not need CAP_SYS_ADMIN to retrieve foreign fds
via bpf_*_get_fd_by_id()
- BPF_F_LINK flag as {prog,link} toggle
- If relative_{fd,id} is none, then BPF_F_BEFORE will just prepend, and
BPF_F_AFTER will just append for attaching
- Enforced only at attach time
- BPF_F_REPLACE with replace_bpf_fd which can be prog, links have their
own infra for replacing their internal prog
- If no flags are set, then it's default append behavior for attaching
- Internal revision counter and optionally being able to pass expected_revision
- User space application can query current state with revision, and pass it
along for attachment to assert current state before doing updates
- Query also gets an extension for a link_ids array and link_attach_flags:
- prog_ids are always filled with program IDs
- link_ids are filled with link IDs when link was used, otherwise 0
- {prog,link}_attach_flags for holding {prog,link}-specific flags
- Must be easy to integrate/reuse for in-kernel users
The uapi-side changes needed for supporting bpf_mprog are rather minimal,
consisting of the addition of the attachment flags and the revision counter,
and the expansion of the existing union with the relative_{fd,id} member.
The bpf_mprog framework consists of a bpf_mprog_entry object which holds
an array of bpf_mprog_fp (fast-path structures). The bpf_mprog_cp
(control-path structure) is part of bpf_mprog_bundle. The two have been
separated so that the fast path gets dense packing of bpf_prog pointers for
maximum cache efficiency. An array was chosen over a linked list or other
structures to avoid unnecessary indirections on the fast path into tc BPF.
The bpf_mprog_entry comes as a pair via bpf_mprog_bundle: on updates, the
peer bpf_mprog_entry is populated and then simply swapped in, which avoids
additional allocations that could otherwise fail, for example, in the detach
case. The bpf_mprog_{fp,cp} arrays are currently statically sized, but they
could be converted to dynamic allocation if necessary at some point in the
future.
Locking is deferred to the in-kernel user of bpf_mprog; for example, tcx,
which uses this API in the next patch, piggybacks on rtnl.
An extensive test suite checking all aspects of this API for both prog-based
attach/detach and the link API comes as BPF selftests in this series.
Thanks also to Andrii Nakryiko for early API discussions wrt Meta's BPF prog
management.
[0] https://lore.kernel.org/bpf/20221004231143.19190-1-daniel@iogearbox.net
[1] https://lore.kernel.org/bpf/CAADnVQ+gEY3FjCR=+DmjDR4gp5bOYZUFJQXj4agKFHT9CQPZBw@mail.gmail.com
[2] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
MAINTAINERS | 1 +
include/linux/bpf_mprog.h | 318 +++++++++++++++++++++++
include/uapi/linux/bpf.h | 36 ++-
kernel/bpf/Makefile | 2 +-
kernel/bpf/mprog.c | 445 +++++++++++++++++++++++++++++++++
tools/include/uapi/linux/bpf.h | 36 ++-
6 files changed, 821 insertions(+), 17 deletions(-)
create mode 100644 include/linux/bpf_mprog.h
create mode 100644 kernel/bpf/mprog.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 9a5863f1b016..678bef9f60b4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3684,6 +3684,7 @@ F: include/linux/filter.h
F: include/linux/tnum.h
F: kernel/bpf/core.c
F: kernel/bpf/dispatcher.c
+F: kernel/bpf/mprog.c
F: kernel/bpf/syscall.c
F: kernel/bpf/tnum.c
F: kernel/bpf/trampoline.c
diff --git a/include/linux/bpf_mprog.h b/include/linux/bpf_mprog.h
new file mode 100644
index 000000000000..6feefec43422
--- /dev/null
+++ b/include/linux/bpf_mprog.h
@@ -0,0 +1,318 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2023 Isovalent */
+#ifndef __BPF_MPROG_H
+#define __BPF_MPROG_H
+
+#include <linux/bpf.h>
+
+/* bpf_mprog framework:
+ *
+ * bpf_mprog is a generic layer for multi-program attachment. In-kernel users
+ * of the bpf_mprog don't need to care about the dependency resolution
+ * internals; they can just consume it with a few API calls. Currently available
+ * dependency directives are BPF_F_{BEFORE,AFTER} which enable insertion of
+ * a BPF program or BPF link relative to an existing BPF program or BPF link
+ * inside the multi-program array as well as prepend and append behavior if
+ * no relative object was specified, see corresponding selftests for concrete
+ * examples (e.g. tc_links and tc_opts test cases of test_progs).
+ *
+ * Usage of bpf_mprog_{attach,detach,query}() core APIs with pseudo code:
+ *
+ * Attach case:
+ *
+ * struct bpf_mprog_entry *entry, *entry_new;
+ * int ret;
+ *
+ * // bpf_mprog user-side lock
+ * // fetch active @entry from attach location
+ * [...]
+ * ret = bpf_mprog_attach(entry, &entry_new, [...]);
+ * if (!ret) {
+ * if (entry != entry_new) {
+ * // swap @entry to @entry_new at attach location
+ * // ensure there are no inflight users of @entry:
+ * synchronize_rcu();
+ * }
+ * bpf_mprog_commit(entry);
+ * } else {
+ * // error path, bail out, propagate @ret
+ * }
+ * // bpf_mprog user-side unlock
+ *
+ * Detach case:
+ *
+ * struct bpf_mprog_entry *entry, *entry_new;
+ * int ret;
+ *
+ * // bpf_mprog user-side lock
+ * // fetch active @entry from attach location
+ * [...]
+ * ret = bpf_mprog_detach(entry, &entry_new, [...]);
+ * if (!ret) {
+ * // all (*) marked is optional and depends on the use-case
+ * // whether bpf_mprog_bundle should be freed or not
+ * if (!bpf_mprog_total(entry_new)) (*)
+ * entry_new = NULL (*)
+ * // swap @entry to @entry_new at attach location
+ * // ensure there are no inflight users of @entry:
+ * synchronize_rcu();
+ * bpf_mprog_commit(entry);
+ * if (!entry_new) (*)
+ * // free bpf_mprog_bundle (*)
+ * } else {
+ * // error path, bail out, propagate @ret
+ * }
+ * // bpf_mprog user-side unlock
+ *
+ * Query case:
+ *
+ * struct bpf_mprog_entry *entry;
+ * int ret;
+ *
+ * // bpf_mprog user-side lock
+ * // fetch active @entry from attach location
+ * [...]
+ * ret = bpf_mprog_query(attr, uattr, entry);
+ * // bpf_mprog user-side unlock
+ *
+ * Data/fast path:
+ *
+ * struct bpf_mprog_entry *entry;
+ * struct bpf_mprog_fp *fp;
+ * struct bpf_prog *prog;
+ * int ret = [...];
+ *
+ * rcu_read_lock();
+ * // fetch active @entry from attach location
+ * [...]
+ * bpf_mprog_foreach_prog(entry, fp, prog) {
+ * ret = bpf_prog_run(prog, [...]);
+ * // process @ret from program
+ * }
+ * [...]
+ * rcu_read_unlock();
+ *
+ * bpf_mprog locking considerations:
+ *
+ * bpf_mprog_{attach,detach,query}() must be protected by an external lock
+ * (like RTNL in case of tcx).
+ *
+ * bpf_mprog_entry pointer can be an __rcu annotated pointer (in case of tcx
+ * the netdevice has tcx_ingress and tcx_egress __rcu pointer) which gets
+ * updated via rcu_assign_pointer() pointing to the active bpf_mprog_entry of
+ * the bpf_mprog_bundle.
+ *
+ * Fast path accesses the active bpf_mprog_entry within RCU critical section
+ * (in case of tcx it runs in NAPI which provides RCU protection there,
+ * other users might need explicit rcu_read_lock()). The bpf_mprog_commit()
+ * assumes that for the old bpf_mprog_entry there are no inflight users
+ * anymore.
+ *
+ * The READ_ONCE()/WRITE_ONCE() pairing for bpf_mprog_fp's prog access is for
+ * the replacement case where we don't swap the bpf_mprog_entry.
+ */
+
+#define bpf_mprog_foreach_tuple(entry, fp, cp, t) \
+ for (fp = &entry->fp_items[0], cp = &entry->parent->cp_items[0];\
+ ({ \
+ t.prog = READ_ONCE(fp->prog); \
+ t.link = cp->link; \
+ t.prog; \
+ }); \
+ fp++, cp++)
+
+#define bpf_mprog_foreach_prog(entry, fp, p) \
+ for (fp = &entry->fp_items[0]; \
+ (p = READ_ONCE(fp->prog)); \
+ fp++)
+
+#define BPF_MPROG_MAX 64
+
+struct bpf_mprog_fp {
+ struct bpf_prog *prog;
+};
+
+struct bpf_mprog_cp {
+ struct bpf_link *link;
+};
+
+struct bpf_mprog_entry {
+ struct bpf_mprog_fp fp_items[BPF_MPROG_MAX];
+ struct bpf_mprog_bundle *parent;
+};
+
+struct bpf_mprog_bundle {
+ struct bpf_mprog_entry a;
+ struct bpf_mprog_entry b;
+ struct bpf_mprog_cp cp_items[BPF_MPROG_MAX];
+ struct bpf_prog *ref;
+ atomic64_t revision;
+ u32 count;
+};
+
+struct bpf_tuple {
+ struct bpf_prog *prog;
+ struct bpf_link *link;
+};
+
+static inline struct bpf_mprog_entry *
+bpf_mprog_peer(const struct bpf_mprog_entry *entry)
+{
+ if (entry == &entry->parent->a)
+ return &entry->parent->b;
+ else
+ return &entry->parent->a;
+}
+
+static inline void bpf_mprog_bundle_init(struct bpf_mprog_bundle *bundle)
+{
+ BUILD_BUG_ON(sizeof(bundle->a.fp_items[0]) > sizeof(u64));
+ BUILD_BUG_ON(ARRAY_SIZE(bundle->a.fp_items) !=
+ ARRAY_SIZE(bundle->cp_items));
+
+ memset(bundle, 0, sizeof(*bundle));
+ atomic64_set(&bundle->revision, 1);
+ bundle->a.parent = bundle;
+ bundle->b.parent = bundle;
+}
+
+static inline void bpf_mprog_inc(struct bpf_mprog_entry *entry)
+{
+ entry->parent->count++;
+}
+
+static inline void bpf_mprog_dec(struct bpf_mprog_entry *entry)
+{
+ entry->parent->count--;
+}
+
+static inline int bpf_mprog_max(void)
+{
+ return ARRAY_SIZE(((struct bpf_mprog_entry *)NULL)->fp_items) - 1;
+}
+
+static inline int bpf_mprog_total(struct bpf_mprog_entry *entry)
+{
+ int total = entry->parent->count;
+
+ WARN_ON_ONCE(total > bpf_mprog_max());
+ return total;
+}
+
+static inline bool bpf_mprog_exists(struct bpf_mprog_entry *entry,
+ struct bpf_prog *prog)
+{
+ const struct bpf_mprog_fp *fp;
+ const struct bpf_prog *tmp;
+
+ bpf_mprog_foreach_prog(entry, fp, tmp) {
+ if (tmp == prog)
+ return true;
+ }
+ return false;
+}
+
+static inline void bpf_mprog_mark_for_release(struct bpf_mprog_entry *entry,
+ struct bpf_tuple *tuple)
+{
+ WARN_ON_ONCE(entry->parent->ref);
+ if (!tuple->link)
+ entry->parent->ref = tuple->prog;
+}
+
+static inline void bpf_mprog_complete_release(struct bpf_mprog_entry *entry)
+{
+ /* In the non-link case prog deletions can only drop the reference
+ * to the prog after the bpf_mprog_entry got swapped and the
+ * bpf_mprog ensured that there are no inflight users anymore.
+ *
+ * Paired with bpf_mprog_mark_for_release().
+ */
+ if (entry->parent->ref) {
+ bpf_prog_put(entry->parent->ref);
+ entry->parent->ref = NULL;
+ }
+}
+
+static inline void bpf_mprog_revision_new(struct bpf_mprog_entry *entry)
+{
+ atomic64_inc(&entry->parent->revision);
+}
+
+static inline void bpf_mprog_commit(struct bpf_mprog_entry *entry)
+{
+ bpf_mprog_complete_release(entry);
+ bpf_mprog_revision_new(entry);
+}
+
+static inline u64 bpf_mprog_revision(struct bpf_mprog_entry *entry)
+{
+ return atomic64_read(&entry->parent->revision);
+}
+
+static inline void bpf_mprog_entry_copy(struct bpf_mprog_entry *dst,
+ struct bpf_mprog_entry *src)
+{
+ memcpy(dst->fp_items, src->fp_items, sizeof(src->fp_items));
+}
+
+static inline void bpf_mprog_entry_grow(struct bpf_mprog_entry *entry, int idx)
+{
+ int total = bpf_mprog_total(entry);
+
+ memmove(entry->fp_items + idx + 1,
+ entry->fp_items + idx,
+ (total - idx) * sizeof(struct bpf_mprog_fp));
+
+ memmove(entry->parent->cp_items + idx + 1,
+ entry->parent->cp_items + idx,
+ (total - idx) * sizeof(struct bpf_mprog_cp));
+}
+
+static inline void bpf_mprog_entry_shrink(struct bpf_mprog_entry *entry, int idx)
+{
+	/* Total array size is needed in this case to ensure the NULL
+ * entry is copied at the end.
+ */
+ int total = ARRAY_SIZE(entry->fp_items);
+
+ memmove(entry->fp_items + idx,
+ entry->fp_items + idx + 1,
+ (total - idx - 1) * sizeof(struct bpf_mprog_fp));
+
+ memmove(entry->parent->cp_items + idx,
+ entry->parent->cp_items + idx + 1,
+ (total - idx - 1) * sizeof(struct bpf_mprog_cp));
+}
+
+static inline void bpf_mprog_read(struct bpf_mprog_entry *entry, u32 idx,
+ struct bpf_mprog_fp **fp,
+ struct bpf_mprog_cp **cp)
+{
+ *fp = &entry->fp_items[idx];
+ *cp = &entry->parent->cp_items[idx];
+}
+
+static inline void bpf_mprog_write(struct bpf_mprog_fp *fp,
+ struct bpf_mprog_cp *cp,
+ struct bpf_tuple *tuple)
+{
+ WRITE_ONCE(fp->prog, tuple->prog);
+ cp->link = tuple->link;
+}
+
+int bpf_mprog_attach(struct bpf_mprog_entry *entry,
+ struct bpf_mprog_entry **entry_new,
+ struct bpf_prog *prog_new, struct bpf_link *link,
+ struct bpf_prog *prog_old,
+ u32 flags, u32 id_or_fd, u64 revision);
+
+int bpf_mprog_detach(struct bpf_mprog_entry *entry,
+ struct bpf_mprog_entry **entry_new,
+ struct bpf_prog *prog, struct bpf_link *link,
+ u32 flags, u32 id_or_fd, u64 revision);
+
+int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr,
+ struct bpf_mprog_entry *entry);
+
+#endif /* __BPF_MPROG_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9ed59896ebc5..d4c07e435336 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1113,7 +1113,12 @@ enum bpf_perf_event_type {
*/
#define BPF_F_ALLOW_OVERRIDE (1U << 0)
#define BPF_F_ALLOW_MULTI (1U << 1)
+/* Generic attachment flags. */
#define BPF_F_REPLACE (1U << 2)
+#define BPF_F_BEFORE (1U << 3)
+#define BPF_F_AFTER (1U << 4)
+#define BPF_F_ID (1U << 5)
+#define BPF_F_LINK BPF_F_LINK /* 1 << 13 */
/* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the
* verifier will perform strict alignment checking as if the kernel
@@ -1444,14 +1449,19 @@ union bpf_attr {
};
struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
- __u32 target_fd; /* container object to attach to */
- __u32 attach_bpf_fd; /* eBPF program to attach */
+ union {
+ __u32 target_fd; /* target object to attach to or ... */
+ __u32 target_ifindex; /* target ifindex */
+ };
+ __u32 attach_bpf_fd;
__u32 attach_type;
__u32 attach_flags;
- __u32 replace_bpf_fd; /* previously attached eBPF
- * program to replace if
- * BPF_F_REPLACE is used
- */
+ __u32 replace_bpf_fd;
+ union {
+ __u32 relative_fd;
+ __u32 relative_id;
+ };
+ __u64 expected_revision;
};
struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */
@@ -1497,16 +1507,26 @@ union bpf_attr {
} info;
struct { /* anonymous struct used by BPF_PROG_QUERY command */
- __u32 target_fd; /* container object to query */
+ union {
+ __u32 target_fd; /* target object to query or ... */
+ __u32 target_ifindex; /* target ifindex */
+ };
__u32 attach_type;
__u32 query_flags;
__u32 attach_flags;
__aligned_u64 prog_ids;
- __u32 prog_cnt;
+ union {
+ __u32 prog_cnt;
+ __u32 count;
+ };
+ __u32 :32;
/* output: per-program attach_flags.
* not allowed to be set during effective query.
*/
__aligned_u64 prog_attach_flags;
+ __aligned_u64 link_ids;
+ __aligned_u64 link_attach_flags;
+ __u64 revision;
} query;
struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 1d3892168d32..1bea2eb912cd 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -12,7 +12,7 @@ obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list
obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
-obj-$(CONFIG_BPF_SYSCALL) += disasm.o
+obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
obj-$(CONFIG_BPF_JIT) += trampoline.o
obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o
obj-$(CONFIG_BPF_JIT) += dispatcher.o
diff --git a/kernel/bpf/mprog.c b/kernel/bpf/mprog.c
new file mode 100644
index 000000000000..f7816d2bc3e4
--- /dev/null
+++ b/kernel/bpf/mprog.c
@@ -0,0 +1,445 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+
+#include <linux/bpf.h>
+#include <linux/bpf_mprog.h>
+
+static int bpf_mprog_link(struct bpf_tuple *tuple,
+ u32 id_or_fd, u32 flags,
+ enum bpf_prog_type type)
+{
+ struct bpf_link *link = ERR_PTR(-EINVAL);
+ bool id = flags & BPF_F_ID;
+
+ if (id)
+ link = bpf_link_by_id(id_or_fd);
+ else if (id_or_fd)
+ link = bpf_link_get_from_fd(id_or_fd);
+ if (IS_ERR(link))
+ return PTR_ERR(link);
+ if (type && link->prog->type != type) {
+ bpf_link_put(link);
+ return -EINVAL;
+ }
+
+ tuple->link = link;
+ tuple->prog = link->prog;
+ return 0;
+}
+
+static int bpf_mprog_prog(struct bpf_tuple *tuple,
+ u32 id_or_fd, u32 flags,
+ enum bpf_prog_type type)
+{
+ struct bpf_prog *prog = ERR_PTR(-EINVAL);
+ bool id = flags & BPF_F_ID;
+
+ if (id)
+ prog = bpf_prog_by_id(id_or_fd);
+ else if (id_or_fd)
+ prog = bpf_prog_get(id_or_fd);
+ if (IS_ERR(prog))
+ return PTR_ERR(prog);
+ if (type && prog->type != type) {
+ bpf_prog_put(prog);
+ return -EINVAL;
+ }
+
+ tuple->link = NULL;
+ tuple->prog = prog;
+ return 0;
+}
+
+static int bpf_mprog_tuple_relative(struct bpf_tuple *tuple,
+ u32 id_or_fd, u32 flags,
+ enum bpf_prog_type type)
+{
+ bool link = flags & BPF_F_LINK;
+ bool id = flags & BPF_F_ID;
+
+ memset(tuple, 0, sizeof(*tuple));
+ if (link)
+ return bpf_mprog_link(tuple, id_or_fd, flags, type);
+ /* If no relevant flag is set and no id_or_fd was passed, then
+ * tuple link/prog is just NULLed. This is the case when before/
+ * after selects first/last position without passing fd.
+ */
+ if (!id && !id_or_fd)
+ return 0;
+ return bpf_mprog_prog(tuple, id_or_fd, flags, type);
+}
+
+static void bpf_mprog_tuple_put(struct bpf_tuple *tuple)
+{
+ if (tuple->link)
+ bpf_link_put(tuple->link);
+ else if (tuple->prog)
+ bpf_prog_put(tuple->prog);
+}
+
+/* The bpf_mprog_{replace,delete}() operate on exact idx position with the
+ * one exception that for deletion we support delete from front/back. In
+ * case of front idx is -1, in case of back idx is bpf_mprog_total(entry).
+ * Adjustment to first and last entry is trivial. The bpf_mprog_insert()
+ * we have to deal with the following cases:
+ *
+ * idx + before:
+ *
+ * Insert P4 before P3: idx for old array is 1, idx for new array is 2,
+ * hence we adjust target idx for the new array, so that memmove copies
+ * P1 and P2 to the new entry, and we insert P4 into idx 2. Inserting
+ * before P1 would have old idx -1 and new idx 0.
+ *
+ * +--+--+--+ +--+--+--+--+ +--+--+--+--+
+ * |P1|P2|P3| ==> |P1|P2| |P3| ==> |P1|P2|P4|P3|
+ * +--+--+--+ +--+--+--+--+ +--+--+--+--+
+ *
+ * idx + after:
+ *
+ * Insert P4 after P2: idx for old array is 2, idx for new array is 2.
+ * Again, memmove copies P1 and P2 to the new entry, and we insert P4
+ * into idx 2. Inserting after P3 would have both old/new idx at 4 aka
+ * bpf_mprog_total(entry).
+ *
+ * +--+--+--+ +--+--+--+--+ +--+--+--+--+
+ * |P1|P2|P3| ==> |P1|P2| |P3| ==> |P1|P2|P4|P3|
+ * +--+--+--+ +--+--+--+--+ +--+--+--+--+
+ */
+static int bpf_mprog_replace(struct bpf_mprog_entry *entry,
+ struct bpf_mprog_entry **entry_new,
+ struct bpf_tuple *ntuple, int idx)
+{
+ struct bpf_mprog_fp *fp;
+ struct bpf_mprog_cp *cp;
+ struct bpf_prog *oprog;
+
+ bpf_mprog_read(entry, idx, &fp, &cp);
+ oprog = READ_ONCE(fp->prog);
+ bpf_mprog_write(fp, cp, ntuple);
+ if (!ntuple->link) {
+ WARN_ON_ONCE(cp->link);
+ bpf_prog_put(oprog);
+ }
+ *entry_new = entry;
+ return 0;
+}
+
+static int bpf_mprog_insert(struct bpf_mprog_entry *entry,
+ struct bpf_mprog_entry **entry_new,
+ struct bpf_tuple *ntuple, int idx, u32 flags)
+{
+ int total = bpf_mprog_total(entry);
+ struct bpf_mprog_entry *peer;
+ struct bpf_mprog_fp *fp;
+ struct bpf_mprog_cp *cp;
+
+ peer = bpf_mprog_peer(entry);
+ bpf_mprog_entry_copy(peer, entry);
+ if (idx == total)
+ goto insert;
+ else if (flags & BPF_F_BEFORE)
+ idx += 1;
+ bpf_mprog_entry_grow(peer, idx);
+insert:
+ bpf_mprog_read(peer, idx, &fp, &cp);
+ bpf_mprog_write(fp, cp, ntuple);
+ bpf_mprog_inc(peer);
+ *entry_new = peer;
+ return 0;
+}
+
+static int bpf_mprog_delete(struct bpf_mprog_entry *entry,
+ struct bpf_mprog_entry **entry_new,
+ struct bpf_tuple *dtuple, int idx)
+{
+ int total = bpf_mprog_total(entry);
+ struct bpf_mprog_entry *peer;
+
+ peer = bpf_mprog_peer(entry);
+ bpf_mprog_entry_copy(peer, entry);
+ if (idx == -1)
+ idx = 0;
+ else if (idx == total)
+ idx = total - 1;
+ bpf_mprog_entry_shrink(peer, idx);
+ bpf_mprog_dec(peer);
+ bpf_mprog_mark_for_release(peer, dtuple);
+ *entry_new = peer;
+ return 0;
+}
+
+/* In bpf_mprog_pos_*() we evaluate the target position for the BPF
+ * program/link that needs to be replaced, inserted or deleted for
+ * each "rule" independently. If all rules agree on that position
+ * or existing element, then enact replacement, addition or deletion.
+ * If this is not the case, then the request cannot be satisfied and
+ * we bail out with an error.
+ */
+static int bpf_mprog_pos_exact(struct bpf_mprog_entry *entry,
+ struct bpf_tuple *tuple)
+{
+ struct bpf_mprog_fp *fp;
+ struct bpf_mprog_cp *cp;
+ int i;
+
+ for (i = 0; i < bpf_mprog_total(entry); i++) {
+ bpf_mprog_read(entry, i, &fp, &cp);
+ if (tuple->prog == READ_ONCE(fp->prog))
+ return tuple->link == cp->link ? i : -EBUSY;
+ }
+ return -ENOENT;
+}
+
+static int bpf_mprog_pos_before(struct bpf_mprog_entry *entry,
+ struct bpf_tuple *tuple)
+{
+ struct bpf_mprog_fp *fp;
+ struct bpf_mprog_cp *cp;
+ int i;
+
+ for (i = 0; i < bpf_mprog_total(entry); i++) {
+ bpf_mprog_read(entry, i, &fp, &cp);
+ if (tuple->prog == READ_ONCE(fp->prog) &&
+ (!tuple->link || tuple->link == cp->link))
+ return i - 1;
+ }
+ return tuple->prog ? -ENOENT : -1;
+}
+
+static int bpf_mprog_pos_after(struct bpf_mprog_entry *entry,
+ struct bpf_tuple *tuple)
+{
+ struct bpf_mprog_fp *fp;
+ struct bpf_mprog_cp *cp;
+ int i;
+
+ for (i = 0; i < bpf_mprog_total(entry); i++) {
+ bpf_mprog_read(entry, i, &fp, &cp);
+ if (tuple->prog == READ_ONCE(fp->prog) &&
+ (!tuple->link || tuple->link == cp->link))
+ return i + 1;
+ }
+ return tuple->prog ? -ENOENT : bpf_mprog_total(entry);
+}
+
+int bpf_mprog_attach(struct bpf_mprog_entry *entry,
+ struct bpf_mprog_entry **entry_new,
+ struct bpf_prog *prog_new, struct bpf_link *link,
+ struct bpf_prog *prog_old,
+ u32 flags, u32 id_or_fd, u64 revision)
+{
+ struct bpf_tuple rtuple, ntuple = {
+ .prog = prog_new,
+ .link = link,
+ }, otuple = {
+ .prog = prog_old,
+ .link = link,
+ };
+ int ret, idx = -ERANGE, tidx;
+
+ if (revision && revision != bpf_mprog_revision(entry))
+ return -ESTALE;
+ if (bpf_mprog_exists(entry, prog_new))
+ return -EEXIST;
+ ret = bpf_mprog_tuple_relative(&rtuple, id_or_fd,
+ flags & ~BPF_F_REPLACE,
+ prog_new->type);
+ if (ret)
+ return ret;
+ if (flags & BPF_F_REPLACE) {
+ tidx = bpf_mprog_pos_exact(entry, &otuple);
+ if (tidx < 0) {
+ ret = tidx;
+ goto out;
+ }
+ idx = tidx;
+ }
+ if (flags & BPF_F_BEFORE) {
+ tidx = bpf_mprog_pos_before(entry, &rtuple);
+ if (tidx < -1 || (idx >= -1 && tidx != idx)) {
+ ret = tidx < -1 ? tidx : -ERANGE;
+ goto out;
+ }
+ idx = tidx;
+ }
+ if (flags & BPF_F_AFTER) {
+ tidx = bpf_mprog_pos_after(entry, &rtuple);
+ if (tidx < -1 || (idx >= -1 && tidx != idx)) {
+ ret = tidx < 0 ? tidx : -ERANGE;
+ goto out;
+ }
+ idx = tidx;
+ }
+ if (idx < -1) {
+ if (rtuple.prog || flags) {
+ ret = -EINVAL;
+ goto out;
+ }
+ idx = bpf_mprog_total(entry);
+ flags = BPF_F_AFTER;
+ }
+ if (idx >= bpf_mprog_max()) {
+ ret = -ERANGE;
+ goto out;
+ }
+ if (flags & BPF_F_REPLACE)
+ ret = bpf_mprog_replace(entry, entry_new, &ntuple, idx);
+ else
+ ret = bpf_mprog_insert(entry, entry_new, &ntuple, idx, flags);
+out:
+ bpf_mprog_tuple_put(&rtuple);
+ return ret;
+}
+
+static int bpf_mprog_fetch(struct bpf_mprog_entry *entry,
+ struct bpf_tuple *tuple, int idx)
+{
+ int total = bpf_mprog_total(entry);
+ struct bpf_mprog_cp *cp;
+ struct bpf_mprog_fp *fp;
+ struct bpf_prog *prog;
+ struct bpf_link *link;
+
+ if (idx == -1)
+ idx = 0;
+ else if (idx == total)
+ idx = total - 1;
+ bpf_mprog_read(entry, idx, &fp, &cp);
+ prog = READ_ONCE(fp->prog);
+ link = cp->link;
+ /* The deletion request can either be without filled tuple in which
+ * case it gets populated here based on idx, or with filled tuple
+ * where the only thing we end up doing is the WARN_ON_ONCE() assert.
+ * If we hit a BPF link at the given index, it must not be removed
+ * from opts path.
+ */
+ if (link && !tuple->link)
+ return -EBUSY;
+ WARN_ON_ONCE(tuple->prog && tuple->prog != prog);
+ WARN_ON_ONCE(tuple->link && tuple->link != link);
+ tuple->prog = prog;
+ tuple->link = link;
+ return 0;
+}
+
+int bpf_mprog_detach(struct bpf_mprog_entry *entry,
+ struct bpf_mprog_entry **entry_new,
+ struct bpf_prog *prog, struct bpf_link *link,
+ u32 flags, u32 id_or_fd, u64 revision)
+{
+ struct bpf_tuple rtuple, dtuple = {
+ .prog = prog,
+ .link = link,
+ };
+ int ret, idx = -ERANGE, tidx;
+
+ if (flags & BPF_F_REPLACE)
+ return -EINVAL;
+ if (revision && revision != bpf_mprog_revision(entry))
+ return -ESTALE;
+ ret = bpf_mprog_tuple_relative(&rtuple, id_or_fd, flags,
+ prog ? prog->type :
+ BPF_PROG_TYPE_UNSPEC);
+ if (ret)
+ return ret;
+ if (dtuple.prog) {
+ tidx = bpf_mprog_pos_exact(entry, &dtuple);
+ if (tidx < 0) {
+ ret = tidx;
+ goto out;
+ }
+ idx = tidx;
+ }
+ if (flags & BPF_F_BEFORE) {
+ tidx = bpf_mprog_pos_before(entry, &rtuple);
+ if (tidx < -1 || (idx >= -1 && tidx != idx)) {
+ ret = tidx < -1 ? tidx : -ERANGE;
+ goto out;
+ }
+ idx = tidx;
+ }
+ if (flags & BPF_F_AFTER) {
+ tidx = bpf_mprog_pos_after(entry, &rtuple);
+ if (tidx < -1 || (idx >= -1 && tidx != idx)) {
+ ret = tidx < 0 ? tidx : -ERANGE;
+ goto out;
+ }
+ idx = tidx;
+ }
+ if (idx < -1) {
+ if (rtuple.prog || flags) {
+ ret = -EINVAL;
+ goto out;
+ }
+ idx = bpf_mprog_total(entry);
+ flags = BPF_F_AFTER;
+ }
+ if (idx >= bpf_mprog_max()) {
+ ret = -ERANGE;
+ goto out;
+ }
+ ret = bpf_mprog_fetch(entry, &dtuple, idx);
+ if (ret)
+ goto out;
+ ret = bpf_mprog_delete(entry, entry_new, &dtuple, idx);
+out:
+ bpf_mprog_tuple_put(&rtuple);
+ return ret;
+}
+
+int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr,
+ struct bpf_mprog_entry *entry)
+{
+ u32 __user *uprog_flags, *ulink_flags;
+ u32 __user *uprog_id, *ulink_id;
+ struct bpf_mprog_fp *fp;
+ struct bpf_mprog_cp *cp;
+ struct bpf_prog *prog;
+ const u32 flags = 0;
+ int i, ret = 0;
+ u32 id, count;
+ u64 revision;
+
+ if (attr->query.query_flags || attr->query.attach_flags)
+ return -EINVAL;
+ revision = bpf_mprog_revision(entry);
+ count = bpf_mprog_total(entry);
+ if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
+ return -EFAULT;
+ if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision)))
+ return -EFAULT;
+ if (copy_to_user(&uattr->query.count, &count, sizeof(count)))
+ return -EFAULT;
+ uprog_id = u64_to_user_ptr(attr->query.prog_ids);
+ uprog_flags = u64_to_user_ptr(attr->query.prog_attach_flags);
+ ulink_id = u64_to_user_ptr(attr->query.link_ids);
+ ulink_flags = u64_to_user_ptr(attr->query.link_attach_flags);
+ if (attr->query.count == 0 || !uprog_id || !count)
+ return 0;
+ if (attr->query.count < count) {
+ count = attr->query.count;
+ ret = -ENOSPC;
+ }
+ for (i = 0; i < bpf_mprog_max(); i++) {
+ bpf_mprog_read(entry, i, &fp, &cp);
+ prog = READ_ONCE(fp->prog);
+ if (!prog)
+ break;
+ id = prog->aux->id;
+ if (copy_to_user(uprog_id + i, &id, sizeof(id)))
+ return -EFAULT;
+ if (uprog_flags &&
+ copy_to_user(uprog_flags + i, &flags, sizeof(flags)))
+ return -EFAULT;
+ id = cp->link ? cp->link->id : 0;
+ if (ulink_id &&
+ copy_to_user(ulink_id + i, &id, sizeof(id)))
+ return -EFAULT;
+ if (ulink_flags &&
+ copy_to_user(ulink_flags + i, &flags, sizeof(flags)))
+ return -EFAULT;
+ if (i + 1 == count)
+ break;
+ }
+ return ret;
+}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 600d0caebbd8..1c166870cdf3 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1113,7 +1113,12 @@ enum bpf_perf_event_type {
*/
#define BPF_F_ALLOW_OVERRIDE (1U << 0)
#define BPF_F_ALLOW_MULTI (1U << 1)
+/* Generic attachment flags. */
#define BPF_F_REPLACE (1U << 2)
+#define BPF_F_BEFORE (1U << 3)
+#define BPF_F_AFTER (1U << 4)
+#define BPF_F_ID (1U << 5)
+#define BPF_F_LINK BPF_F_LINK /* 1 << 13 */
/* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the
* verifier will perform strict alignment checking as if the kernel
@@ -1444,14 +1449,19 @@ union bpf_attr {
};
struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
- __u32 target_fd; /* container object to attach to */
- __u32 attach_bpf_fd; /* eBPF program to attach */
+ union {
+ __u32 target_fd; /* target object to attach to or ... */
+ __u32 target_ifindex; /* target ifindex */
+ };
+ __u32 attach_bpf_fd;
__u32 attach_type;
__u32 attach_flags;
- __u32 replace_bpf_fd; /* previously attached eBPF
- * program to replace if
- * BPF_F_REPLACE is used
- */
+ __u32 replace_bpf_fd;
+ union {
+ __u32 relative_fd;
+ __u32 relative_id;
+ };
+ __u64 expected_revision;
};
struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */
@@ -1497,16 +1507,26 @@ union bpf_attr {
} info;
struct { /* anonymous struct used by BPF_PROG_QUERY command */
- __u32 target_fd; /* container object to query */
+ union {
+ __u32 target_fd; /* target object to query or ... */
+ __u32 target_ifindex; /* target ifindex */
+ };
__u32 attach_type;
__u32 query_flags;
__u32 attach_flags;
__aligned_u64 prog_ids;
- __u32 prog_cnt;
+ union {
+ __u32 prog_cnt;
+ __u32 count;
+ };
+ __u32 :32;
/* output: per-program attach_flags.
* not allowed to be set during effective query.
*/
__aligned_u64 prog_attach_flags;
+ __aligned_u64 link_ids;
+ __aligned_u64 link_attach_flags;
+ __u64 revision;
} query;
struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
--
2.34.1
* [PATCH bpf-next v6 2/8] bpf: Add fd-based tcx multi-prog infra with link support
2023-07-19 14:08 [PATCH bpf-next v6 0/8] BPF link support for tc BPF programs Daniel Borkmann
2023-07-19 14:08 ` [PATCH bpf-next v6 1/8] bpf: Add generic attach/detach/query API for multi-progs Daniel Borkmann
@ 2023-07-19 14:08 ` Daniel Borkmann
2023-07-20 2:13 ` Yafang Shao
2023-07-21 14:57 ` Petr Machata
2023-07-19 14:08 ` [PATCH bpf-next v6 3/8] libbpf: Add opts-based attach/detach/query API for tcx Daniel Borkmann
` (6 subsequent siblings)
8 siblings, 2 replies; 14+ messages in thread
From: Daniel Borkmann @ 2023-07-19 14:08 UTC (permalink / raw)
To: ast
Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
toke, davem, bpf, netdev, Daniel Borkmann
This work refactors and adds a lightweight extension ("tcx") to the tc BPF
ingress and egress data path side to allow BPF program management based
on fds via the bpf() syscall through the newly added generic multi-prog API.
The main goal behind this work, which we presented at LPC [0] last year and
gave a recent update on at LSF/MM/BPF [3] this year, is to support the
long-awaited BPF link functionality for tc BPF programs, which allows for a
model of safe ownership and program detachment.
Given the rise in tc BPF users in cloud native environments, this becomes
necessary to avoid hard-to-debug incidents caused either by stale leftover
programs or by 3rd party applications accidentally stepping on each other's toes.
As a recap, a BPF link represents the attachment of a BPF program to a BPF
hook point. The BPF link holds a single reference to keep the BPF program
alive. Moreover, hook points do not reference a BPF link; only the application's
fd or pinning does. A BPF link holds meta-data specific to the attachment and
implements operations for link creation, (atomic) BPF program update,
detachment and introspection. The motivation for BPF links for tc BPF programs
is multi-fold, for example:
- From Meta: "It's especially important for applications that are deployed
fleet-wide and that don't "control" hosts they are deployed to. If such
application crashes and no one notices and does anything about that, BPF
program will keep running draining resources or even just, say, dropping
packets. We at FB had outages due to such permanent BPF attachment
semantics. With fd-based BPF link we are getting a framework, which allows
safe, auto-detachable behavior by default, unless application explicitly
opts in by pinning the BPF link." [1]
- From the Cilium side: the tc BPF programs we attach to host-facing veth
devices and physical devices build the core datapath for Kubernetes Pods, and
they implement forwarding, load-balancing, policy, EDT-management, etc, within
BPF. Currently there is no concept of 'safe' ownership; e.g. we recently
experienced hard-to-debug issues in a user's staging environment where
another Kubernetes application using tc BPF attached to the same prio/handle
of cls_bpf and accidentally wiped all Cilium-based BPF programs from underneath
it. The goal is to establish a clear/safe ownership model via links which
cannot accidentally be overridden. [0,2]
BPF links for tc can co-exist with non-link attachments, and the semantics are
in line also with XDP links: BPF links cannot replace other BPF links, BPF
links cannot replace non-BPF links, non-BPF links cannot replace BPF links and
lastly only non-BPF links can replace non-BPF links. In the case of Cilium,
this would solve the mentioned issue of the safe ownership model, as 3rd party
applications would not be able to accidentally wipe Cilium programs, even if
they are not BPF link aware.
Earlier attempts [4] have tried to integrate BPF links into the core tc
machinery to solve cls_bpf, which has been intrusive to the generic tc kernel
API with extensions only specific to cls_bpf, and suboptimal/complex since
cls_bpf could also be wiped from the qdisc. Locking a tc BPF program in place
this way gets into layering hacks given the two object models are vastly
different.
We instead implemented the tcx (tc 'express') layer, which is an fd-based tc
BPF attach API, so that the BPF link implementation blends in naturally,
similar to other link types which are fd-based, and without the need to change
core tc internal APIs. BPF programs for tc can then be successively migrated
from classic cls_bpf to the new tc BPF link without needing to change the
program's source code; changing just the BPF loader's attach mechanics is
sufficient.
For the current tc framework, there is no change in behavior, and this work
does not touch tc core kernel APIs. The gist of this patch is that the ingress
and egress hooks get a lightweight, qdisc-less extension for BPF to attach its
tc BPF programs, in other words, a minimal entry point for tc BPF. The name tcx
has been suggested in discussion of earlier revisions of this work as a good
fit, and to more easily differentiate between the classic cls_bpf attachment
and the fd-based one.
For the ingress and egress tcx points, the device holds a cache-friendly array
with program pointers which is separated from control plane (slow-path) data.
Earlier versions of this work used priority to determine ordering and to
express dependencies, similar to classic tc, but it was challenged that
something more future-proof requires a better user experience. This resulted
in the design and development of the generic attach/detach/query API for
multi-progs. See the prior patch with its discussion on the API design. tcx is
the first user, and later we plan to integrate others as well; for example,
one candidate is multi-prog support for XDP, which would benefit from and have
the same 'look and feel' from an API perspective.
The goal with tcx is maximum compatibility with existing tc BPF programs, so
they don't need to be rewritten specifically. Compatibility to call into
classic tcf_classify() is also provided in order to allow successive migration,
or for both to cleanly co-exist where needed, given it is all one logical tc
layer and tcx plus classic tc cls/act build one logical overall processing
pipeline. tcx supports the simplified return code TCX_NEXT, which is
non-terminating (go to the next program), and the terminating codes TCX_PASS,
TCX_DROP and TCX_REDIRECT.
The fd-based API is behind a static key, so that when unused the code is not
entered. The struct tcx_entry's program array is currently static, but could
be made dynamic at a point in the future if necessary. The a/b pair swap
design has been chosen so that detachment requires no allocations which could
otherwise fail.
The work has been tested with the tc-testing selftest suite, which fully
passes, as well as with the tc BPF tests from the BPF CI, and also with
Cilium's L4LB.
Thanks also to Nikolay Aleksandrov and Martin Lau for in-depth early reviews
of this work.
[0] https://lpc.events/event/16/contributions/1353/
[1] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@mail.gmail.com
[2] https://colocatedeventseu2023.sched.com/event/1Jo6O/tales-from-an-ebpf-programs-murder-mystery-hemanth-malla-guillaume-fournier-datadog
[3] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf
[4] https://lore.kernel.org/bpf/20210604063116.234316-1-memxor@gmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
---
MAINTAINERS | 4 +-
include/linux/bpf_mprog.h | 9 +
include/linux/netdevice.h | 15 +-
include/linux/skbuff.h | 4 +-
include/net/sch_generic.h | 2 +-
include/net/tcx.h | 206 +++++++++++++++++++
include/uapi/linux/bpf.h | 34 +++-
kernel/bpf/Kconfig | 1 +
kernel/bpf/Makefile | 1 +
kernel/bpf/syscall.c | 82 ++++++--
kernel/bpf/tcx.c | 348 +++++++++++++++++++++++++++++++++
net/Kconfig | 5 +
net/core/dev.c | 265 +++++++++++++++----------
net/core/filter.c | 4 +-
net/sched/Kconfig | 4 +-
net/sched/sch_ingress.c | 61 +++++-
tools/include/uapi/linux/bpf.h | 34 +++-
17 files changed, 935 insertions(+), 144 deletions(-)
create mode 100644 include/net/tcx.h
create mode 100644 kernel/bpf/tcx.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 678bef9f60b4..990e3fce753c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3778,13 +3778,15 @@ L: netdev@vger.kernel.org
S: Maintained
F: kernel/bpf/bpf_struct*
-BPF [NETWORKING] (tc BPF, sock_addr)
+BPF [NETWORKING] (tcx & tc BPF, sock_addr)
M: Martin KaFai Lau <martin.lau@linux.dev>
M: Daniel Borkmann <daniel@iogearbox.net>
R: John Fastabend <john.fastabend@gmail.com>
L: bpf@vger.kernel.org
L: netdev@vger.kernel.org
S: Maintained
+F: include/net/tcx.h
+F: kernel/bpf/tcx.c
F: net/core/filter.c
F: net/sched/act_bpf.c
F: net/sched/cls_bpf.c
diff --git a/include/linux/bpf_mprog.h b/include/linux/bpf_mprog.h
index 6feefec43422..2b429488f840 100644
--- a/include/linux/bpf_mprog.h
+++ b/include/linux/bpf_mprog.h
@@ -315,4 +315,13 @@ int bpf_mprog_detach(struct bpf_mprog_entry *entry,
int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr,
struct bpf_mprog_entry *entry);
+static inline bool bpf_mprog_supported(enum bpf_prog_type type)
+{
+ switch (type) {
+ case BPF_PROG_TYPE_SCHED_CLS:
+ return true;
+ default:
+ return false;
+ }
+}
#endif /* __BPF_MPROG_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b828c7a75be2..024314c68bc8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1930,8 +1930,7 @@ enum netdev_ml_priv_type {
*
* @rx_handler: handler for received packets
* @rx_handler_data: XXX: need comments on this one
- * @miniq_ingress: ingress/clsact qdisc specific data for
- * ingress processing
+ * @tcx_ingress: BPF & clsact qdisc specific data for ingress processing
* @ingress_queue: XXX: need comments on this one
* @nf_hooks_ingress: netfilter hooks executed for ingress packets
* @broadcast: hw bcast address
@@ -1952,8 +1951,7 @@ enum netdev_ml_priv_type {
* @xps_maps: all CPUs/RXQs maps for XPS device
*
* @xps_maps: XXX: need comments on this one
- * @miniq_egress: clsact qdisc specific data for
- * egress processing
+ * @tcx_egress: BPF & clsact qdisc specific data for egress processing
* @nf_hooks_egress: netfilter hooks executed for egress packets
* @qdisc_hash: qdisc hash table
* @watchdog_timeo: Represents the timeout that is used by
@@ -2252,9 +2250,8 @@ struct net_device {
unsigned int gro_ipv4_max_size;
rx_handler_func_t __rcu *rx_handler;
void __rcu *rx_handler_data;
-
-#ifdef CONFIG_NET_CLS_ACT
- struct mini_Qdisc __rcu *miniq_ingress;
+#ifdef CONFIG_NET_XGRESS
+ struct bpf_mprog_entry __rcu *tcx_ingress;
#endif
struct netdev_queue __rcu *ingress_queue;
#ifdef CONFIG_NETFILTER_INGRESS
@@ -2282,8 +2279,8 @@ struct net_device {
#ifdef CONFIG_XPS
struct xps_dev_maps __rcu *xps_maps[XPS_MAPS_MAX];
#endif
-#ifdef CONFIG_NET_CLS_ACT
- struct mini_Qdisc __rcu *miniq_egress;
+#ifdef CONFIG_NET_XGRESS
+ struct bpf_mprog_entry __rcu *tcx_egress;
#endif
#ifdef CONFIG_NETFILTER_EGRESS
struct nf_hook_entries __rcu *nf_hooks_egress;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 91ed66952580..ed83f1c5fc1f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -944,7 +944,7 @@ struct sk_buff {
__u8 __mono_tc_offset[0];
/* public: */
__u8 mono_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */
-#ifdef CONFIG_NET_CLS_ACT
+#ifdef CONFIG_NET_XGRESS
__u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */
__u8 tc_skip_classify:1;
#endif
@@ -993,7 +993,7 @@ struct sk_buff {
__u8 csum_not_inet:1;
#endif
-#ifdef CONFIG_NET_SCHED
+#if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS)
__u16 tc_index; /* traffic control index */
#endif
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index e92f73bb3198..15be2d96b06d 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -703,7 +703,7 @@ int skb_do_redirect(struct sk_buff *);
static inline bool skb_at_tc_ingress(const struct sk_buff *skb)
{
-#ifdef CONFIG_NET_CLS_ACT
+#ifdef CONFIG_NET_XGRESS
return skb->tc_at_ingress;
#else
return false;
diff --git a/include/net/tcx.h b/include/net/tcx.h
new file mode 100644
index 000000000000..264f147953ba
--- /dev/null
+++ b/include/net/tcx.h
@@ -0,0 +1,206 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2023 Isovalent */
+#ifndef __NET_TCX_H
+#define __NET_TCX_H
+
+#include <linux/bpf.h>
+#include <linux/bpf_mprog.h>
+
+#include <net/sch_generic.h>
+
+struct mini_Qdisc;
+
+struct tcx_entry {
+ struct mini_Qdisc __rcu *miniq;
+ struct bpf_mprog_bundle bundle;
+ bool miniq_active;
+ struct rcu_head rcu;
+};
+
+struct tcx_link {
+ struct bpf_link link;
+ struct net_device *dev;
+ u32 location;
+};
+
+static inline void tcx_set_ingress(struct sk_buff *skb, bool ingress)
+{
+#ifdef CONFIG_NET_XGRESS
+ skb->tc_at_ingress = ingress;
+#endif
+}
+
+#ifdef CONFIG_NET_XGRESS
+static inline struct tcx_entry *tcx_entry(struct bpf_mprog_entry *entry)
+{
+ struct bpf_mprog_bundle *bundle = entry->parent;
+
+ return container_of(bundle, struct tcx_entry, bundle);
+}
+
+static inline struct tcx_link *tcx_link(struct bpf_link *link)
+{
+ return container_of(link, struct tcx_link, link);
+}
+
+static inline const struct tcx_link *tcx_link_const(const struct bpf_link *link)
+{
+ return tcx_link((struct bpf_link *)link);
+}
+
+void tcx_inc(void);
+void tcx_dec(void);
+
+static inline void tcx_entry_sync(void)
+{
+ /* bpf_mprog_entry got a/b swapped, therefore ensure that
+ * there are no inflight users on the old one anymore.
+ */
+ synchronize_rcu();
+}
+
+static inline void
+tcx_entry_update(struct net_device *dev, struct bpf_mprog_entry *entry,
+ bool ingress)
+{
+ ASSERT_RTNL();
+ if (ingress)
+ rcu_assign_pointer(dev->tcx_ingress, entry);
+ else
+ rcu_assign_pointer(dev->tcx_egress, entry);
+}
+
+static inline struct bpf_mprog_entry *
+tcx_entry_fetch(struct net_device *dev, bool ingress)
+{
+ ASSERT_RTNL();
+ if (ingress)
+ return rcu_dereference_rtnl(dev->tcx_ingress);
+ else
+ return rcu_dereference_rtnl(dev->tcx_egress);
+}
+
+static inline struct bpf_mprog_entry *tcx_entry_create(void)
+{
+ struct tcx_entry *tcx = kzalloc(sizeof(*tcx), GFP_KERNEL);
+
+ if (tcx) {
+ bpf_mprog_bundle_init(&tcx->bundle);
+ return &tcx->bundle.a;
+ }
+ return NULL;
+}
+
+static inline void tcx_entry_free(struct bpf_mprog_entry *entry)
+{
+ kfree_rcu(tcx_entry(entry), rcu);
+}
+
+static inline struct bpf_mprog_entry *
+tcx_entry_fetch_or_create(struct net_device *dev, bool ingress, bool *created)
+{
+ struct bpf_mprog_entry *entry = tcx_entry_fetch(dev, ingress);
+
+ *created = false;
+ if (!entry) {
+ entry = tcx_entry_create();
+ if (!entry)
+ return NULL;
+ *created = true;
+ }
+ return entry;
+}
+
+static inline void tcx_skeys_inc(bool ingress)
+{
+ tcx_inc();
+ if (ingress)
+ net_inc_ingress_queue();
+ else
+ net_inc_egress_queue();
+}
+
+static inline void tcx_skeys_dec(bool ingress)
+{
+ if (ingress)
+ net_dec_ingress_queue();
+ else
+ net_dec_egress_queue();
+ tcx_dec();
+}
+
+static inline void tcx_miniq_set_active(struct bpf_mprog_entry *entry,
+ const bool active)
+{
+ ASSERT_RTNL();
+ tcx_entry(entry)->miniq_active = active;
+}
+
+static inline bool tcx_entry_is_active(struct bpf_mprog_entry *entry)
+{
+ ASSERT_RTNL();
+ return bpf_mprog_total(entry) || tcx_entry(entry)->miniq_active;
+}
+
+static inline enum tcx_action_base tcx_action_code(struct sk_buff *skb,
+ int code)
+{
+ switch (code) {
+ case TCX_PASS:
+ skb->tc_index = qdisc_skb_cb(skb)->tc_classid;
+ fallthrough;
+ case TCX_DROP:
+ case TCX_REDIRECT:
+ return code;
+ case TCX_NEXT:
+ default:
+ return TCX_NEXT;
+ }
+}
+#endif /* CONFIG_NET_XGRESS */
+
+#if defined(CONFIG_NET_XGRESS) && defined(CONFIG_BPF_SYSCALL)
+int tcx_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog);
+int tcx_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
+int tcx_prog_detach(const union bpf_attr *attr, struct bpf_prog *prog);
+void tcx_uninstall(struct net_device *dev, bool ingress);
+
+int tcx_prog_query(const union bpf_attr *attr,
+ union bpf_attr __user *uattr);
+
+static inline void dev_tcx_uninstall(struct net_device *dev)
+{
+ ASSERT_RTNL();
+ tcx_uninstall(dev, true);
+ tcx_uninstall(dev, false);
+}
+#else
+static inline int tcx_prog_attach(const union bpf_attr *attr,
+ struct bpf_prog *prog)
+{
+ return -EINVAL;
+}
+
+static inline int tcx_link_attach(const union bpf_attr *attr,
+ struct bpf_prog *prog)
+{
+ return -EINVAL;
+}
+
+static inline int tcx_prog_detach(const union bpf_attr *attr,
+ struct bpf_prog *prog)
+{
+ return -EINVAL;
+}
+
+static inline int tcx_prog_query(const union bpf_attr *attr,
+ union bpf_attr __user *uattr)
+{
+ return -EINVAL;
+}
+
+static inline void dev_tcx_uninstall(struct net_device *dev)
+{
+}
+#endif /* CONFIG_NET_XGRESS && CONFIG_BPF_SYSCALL */
+#endif /* __NET_TCX_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d4c07e435336..739c15906a65 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1036,6 +1036,8 @@ enum bpf_attach_type {
BPF_LSM_CGROUP,
BPF_STRUCT_OPS,
BPF_NETFILTER,
+ BPF_TCX_INGRESS,
+ BPF_TCX_EGRESS,
__MAX_BPF_ATTACH_TYPE
};
@@ -1053,7 +1055,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_KPROBE_MULTI = 8,
BPF_LINK_TYPE_STRUCT_OPS = 9,
BPF_LINK_TYPE_NETFILTER = 10,
-
+ BPF_LINK_TYPE_TCX = 11,
MAX_BPF_LINK_TYPE,
};
@@ -1569,13 +1571,13 @@ union bpf_attr {
__u32 map_fd; /* struct_ops to attach */
};
union {
- __u32 target_fd; /* object to attach to */
- __u32 target_ifindex; /* target ifindex */
+ __u32 target_fd; /* target object to attach to or ... */
+ __u32 target_ifindex; /* target ifindex */
};
__u32 attach_type; /* attach type */
__u32 flags; /* extra flags */
union {
- __u32 target_btf_id; /* btf_id of target to attach to */
+ __u32 target_btf_id; /* btf_id of target to attach to */
struct {
__aligned_u64 iter_info; /* extra bpf_iter_link_info */
__u32 iter_info_len; /* iter_info length */
@@ -1609,6 +1611,13 @@ union bpf_attr {
__s32 priority;
__u32 flags;
} netfilter;
+ struct {
+ union {
+ __u32 relative_fd;
+ __u32 relative_id;
+ };
+ __u64 expected_revision;
+ } tcx;
};
} link_create;
@@ -6217,6 +6226,19 @@ struct bpf_sock_tuple {
};
};
+/* (Simplified) user return codes for tcx prog type.
+ * A valid tcx program must return one of these defined values. All other
+ * return codes are reserved for future use. Must remain compatible with
+ * their TC_ACT_* counter-parts. For compatibility in behavior, unknown
+ * return codes are mapped to TCX_NEXT.
+ */
+enum tcx_action_base {
+ TCX_NEXT = -1,
+ TCX_PASS = 0,
+ TCX_DROP = 2,
+ TCX_REDIRECT = 7,
+};
+
struct bpf_xdp_sock {
__u32 queue_id;
};
@@ -6499,6 +6521,10 @@ struct bpf_link_info {
} event; /* BPF_PERF_EVENT_EVENT */
};
} perf_event;
+ struct {
+ __u32 ifindex;
+ __u32 attach_type;
+ } tcx;
};
} __attribute__((aligned(8)));
diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
index 2dfe1079f772..6a906ff93006 100644
--- a/kernel/bpf/Kconfig
+++ b/kernel/bpf/Kconfig
@@ -31,6 +31,7 @@ config BPF_SYSCALL
select TASKS_TRACE_RCU
select BINARY_PRINTF
select NET_SOCK_MSG if NET
+ select NET_XGRESS if NET
select PAGE_POOL if NET
default n
help
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 1bea2eb912cd..f526b7573e97 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_BPF_SYSCALL) += devmap.o
obj-$(CONFIG_BPF_SYSCALL) += cpumap.o
obj-$(CONFIG_BPF_SYSCALL) += offload.o
obj-$(CONFIG_BPF_SYSCALL) += net_namespace.o
+obj-$(CONFIG_BPF_SYSCALL) += tcx.o
endif
ifeq ($(CONFIG_PERF_EVENTS),y)
obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ee8cb1a174aa..7f4e8c357a6a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -37,6 +37,8 @@
#include <linux/trace_events.h>
#include <net/netfilter/nf_bpf_link.h>
+#include <net/tcx.h>
+
#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
(map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
(map)->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
@@ -3740,31 +3742,45 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
return BPF_PROG_TYPE_XDP;
case BPF_LSM_CGROUP:
return BPF_PROG_TYPE_LSM;
+ case BPF_TCX_INGRESS:
+ case BPF_TCX_EGRESS:
+ return BPF_PROG_TYPE_SCHED_CLS;
default:
return BPF_PROG_TYPE_UNSPEC;
}
}
-#define BPF_PROG_ATTACH_LAST_FIELD replace_bpf_fd
+#define BPF_PROG_ATTACH_LAST_FIELD expected_revision
+
+#define BPF_F_ATTACH_MASK_BASE \
+ (BPF_F_ALLOW_OVERRIDE | \
+ BPF_F_ALLOW_MULTI | \
+ BPF_F_REPLACE)
-#define BPF_F_ATTACH_MASK \
- (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI | BPF_F_REPLACE)
+#define BPF_F_ATTACH_MASK_MPROG \
+ (BPF_F_REPLACE | \
+ BPF_F_BEFORE | \
+ BPF_F_AFTER | \
+ BPF_F_ID | \
+ BPF_F_LINK)
static int bpf_prog_attach(const union bpf_attr *attr)
{
enum bpf_prog_type ptype;
struct bpf_prog *prog;
+ u32 mask;
int ret;
if (CHECK_ATTR(BPF_PROG_ATTACH))
return -EINVAL;
- if (attr->attach_flags & ~BPF_F_ATTACH_MASK)
- return -EINVAL;
-
ptype = attach_type_to_prog_type(attr->attach_type);
if (ptype == BPF_PROG_TYPE_UNSPEC)
return -EINVAL;
+ mask = bpf_mprog_supported(ptype) ?
+ BPF_F_ATTACH_MASK_MPROG : BPF_F_ATTACH_MASK_BASE;
+ if (attr->attach_flags & ~mask)
+ return -EINVAL;
prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype);
if (IS_ERR(prog))
@@ -3800,6 +3816,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
else
ret = cgroup_bpf_prog_attach(attr, ptype, prog);
break;
+ case BPF_PROG_TYPE_SCHED_CLS:
+ ret = tcx_prog_attach(attr, prog);
+ break;
default:
ret = -EINVAL;
}
@@ -3809,25 +3828,41 @@ static int bpf_prog_attach(const union bpf_attr *attr)
return ret;
}
-#define BPF_PROG_DETACH_LAST_FIELD attach_type
+#define BPF_PROG_DETACH_LAST_FIELD expected_revision
static int bpf_prog_detach(const union bpf_attr *attr)
{
+ struct bpf_prog *prog = NULL;
enum bpf_prog_type ptype;
+ int ret;
if (CHECK_ATTR(BPF_PROG_DETACH))
return -EINVAL;
ptype = attach_type_to_prog_type(attr->attach_type);
+ if (bpf_mprog_supported(ptype)) {
+ if (ptype == BPF_PROG_TYPE_UNSPEC)
+ return -EINVAL;
+ if (attr->attach_flags & ~BPF_F_ATTACH_MASK_MPROG)
+ return -EINVAL;
+ if (attr->attach_bpf_fd) {
+ prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype);
+ if (IS_ERR(prog))
+ return PTR_ERR(prog);
+ }
+ }
switch (ptype) {
case BPF_PROG_TYPE_SK_MSG:
case BPF_PROG_TYPE_SK_SKB:
- return sock_map_prog_detach(attr, ptype);
+ ret = sock_map_prog_detach(attr, ptype);
+ break;
case BPF_PROG_TYPE_LIRC_MODE2:
- return lirc_prog_detach(attr);
+ ret = lirc_prog_detach(attr);
+ break;
case BPF_PROG_TYPE_FLOW_DISSECTOR:
- return netns_bpf_prog_detach(attr, ptype);
+ ret = netns_bpf_prog_detach(attr, ptype);
+ break;
case BPF_PROG_TYPE_CGROUP_DEVICE:
case BPF_PROG_TYPE_CGROUP_SKB:
case BPF_PROG_TYPE_CGROUP_SOCK:
@@ -3836,13 +3871,21 @@ static int bpf_prog_detach(const union bpf_attr *attr)
case BPF_PROG_TYPE_CGROUP_SYSCTL:
case BPF_PROG_TYPE_SOCK_OPS:
case BPF_PROG_TYPE_LSM:
- return cgroup_bpf_prog_detach(attr, ptype);
+ ret = cgroup_bpf_prog_detach(attr, ptype);
+ break;
+ case BPF_PROG_TYPE_SCHED_CLS:
+ ret = tcx_prog_detach(attr, prog);
+ break;
default:
- return -EINVAL;
+ ret = -EINVAL;
}
+
+ if (prog)
+ bpf_prog_put(prog);
+ return ret;
}
-#define BPF_PROG_QUERY_LAST_FIELD query.prog_attach_flags
+#define BPF_PROG_QUERY_LAST_FIELD query.link_attach_flags
static int bpf_prog_query(const union bpf_attr *attr,
union bpf_attr __user *uattr)
@@ -3890,6 +3933,9 @@ static int bpf_prog_query(const union bpf_attr *attr,
case BPF_SK_MSG_VERDICT:
case BPF_SK_SKB_VERDICT:
return sock_map_bpf_prog_query(attr, uattr);
+ case BPF_TCX_INGRESS:
+ case BPF_TCX_EGRESS:
+ return tcx_prog_query(attr, uattr);
default:
return -EINVAL;
}
@@ -4852,6 +4898,13 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
goto out;
}
break;
+ case BPF_PROG_TYPE_SCHED_CLS:
+ if (attr->link_create.attach_type != BPF_TCX_INGRESS &&
+ attr->link_create.attach_type != BPF_TCX_EGRESS) {
+ ret = -EINVAL;
+ goto out;
+ }
+ break;
default:
ptype = attach_type_to_prog_type(attr->link_create.attach_type);
if (ptype == BPF_PROG_TYPE_UNSPEC || ptype != prog->type) {
@@ -4903,6 +4956,9 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
case BPF_PROG_TYPE_XDP:
ret = bpf_xdp_link_attach(attr, prog);
break;
+ case BPF_PROG_TYPE_SCHED_CLS:
+ ret = tcx_link_attach(attr, prog);
+ break;
case BPF_PROG_TYPE_NETFILTER:
ret = bpf_nf_link_attach(attr, prog);
break;
diff --git a/kernel/bpf/tcx.c b/kernel/bpf/tcx.c
new file mode 100644
index 000000000000..69a272712b29
--- /dev/null
+++ b/kernel/bpf/tcx.c
@@ -0,0 +1,348 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+
+#include <linux/bpf.h>
+#include <linux/bpf_mprog.h>
+#include <linux/netdevice.h>
+
+#include <net/tcx.h>
+
+int tcx_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+ bool created, ingress = attr->attach_type == BPF_TCX_INGRESS;
+ struct net *net = current->nsproxy->net_ns;
+ struct bpf_mprog_entry *entry, *entry_new;
+ struct bpf_prog *replace_prog = NULL;
+ struct net_device *dev;
+ int ret;
+
+ rtnl_lock();
+ dev = __dev_get_by_index(net, attr->target_ifindex);
+ if (!dev) {
+ ret = -ENODEV;
+ goto out;
+ }
+ if (attr->attach_flags & BPF_F_REPLACE) {
+ replace_prog = bpf_prog_get_type(attr->replace_bpf_fd,
+ prog->type);
+ if (IS_ERR(replace_prog)) {
+ ret = PTR_ERR(replace_prog);
+ replace_prog = NULL;
+ goto out;
+ }
+ }
+ entry = tcx_entry_fetch_or_create(dev, ingress, &created);
+ if (!entry) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ ret = bpf_mprog_attach(entry, &entry_new, prog, NULL, replace_prog,
+ attr->attach_flags, attr->relative_fd,
+ attr->expected_revision);
+ if (!ret) {
+ if (entry != entry_new) {
+ tcx_entry_update(dev, entry_new, ingress);
+ tcx_entry_sync();
+ tcx_skeys_inc(ingress);
+ }
+ bpf_mprog_commit(entry);
+ } else if (created) {
+ tcx_entry_free(entry);
+ }
+out:
+ if (replace_prog)
+ bpf_prog_put(replace_prog);
+ rtnl_unlock();
+ return ret;
+}
+
+int tcx_prog_detach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+ bool ingress = attr->attach_type == BPF_TCX_INGRESS;
+ struct net *net = current->nsproxy->net_ns;
+ struct bpf_mprog_entry *entry, *entry_new;
+ struct net_device *dev;
+ int ret;
+
+ rtnl_lock();
+ dev = __dev_get_by_index(net, attr->target_ifindex);
+ if (!dev) {
+ ret = -ENODEV;
+ goto out;
+ }
+ entry = tcx_entry_fetch(dev, ingress);
+ if (!entry) {
+ ret = -ENOENT;
+ goto out;
+ }
+ ret = bpf_mprog_detach(entry, &entry_new, prog, NULL, attr->attach_flags,
+ attr->relative_fd, attr->expected_revision);
+ if (!ret) {
+ if (!tcx_entry_is_active(entry_new))
+ entry_new = NULL;
+ tcx_entry_update(dev, entry_new, ingress);
+ tcx_entry_sync();
+ tcx_skeys_dec(ingress);
+ bpf_mprog_commit(entry);
+ if (!entry_new)
+ tcx_entry_free(entry);
+ }
+out:
+ rtnl_unlock();
+ return ret;
+}
+
+void tcx_uninstall(struct net_device *dev, bool ingress)
+{
+ struct bpf_tuple tuple = {};
+ struct bpf_mprog_entry *entry;
+ struct bpf_mprog_fp *fp;
+ struct bpf_mprog_cp *cp;
+
+ entry = tcx_entry_fetch(dev, ingress);
+ if (!entry)
+ return;
+ tcx_entry_update(dev, NULL, ingress);
+ tcx_entry_sync();
+ bpf_mprog_foreach_tuple(entry, fp, cp, tuple) {
+ if (tuple.link)
+ tcx_link(tuple.link)->dev = NULL;
+ else
+ bpf_prog_put(tuple.prog);
+ tcx_skeys_dec(ingress);
+ }
+ WARN_ON_ONCE(tcx_entry(entry)->miniq_active);
+ tcx_entry_free(entry);
+}
+
+int tcx_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
+{
+ bool ingress = attr->query.attach_type == BPF_TCX_INGRESS;
+ struct net *net = current->nsproxy->net_ns;
+ struct bpf_mprog_entry *entry;
+ struct net_device *dev;
+ int ret;
+
+ rtnl_lock();
+ dev = __dev_get_by_index(net, attr->query.target_ifindex);
+ if (!dev) {
+ ret = -ENODEV;
+ goto out;
+ }
+ entry = tcx_entry_fetch(dev, ingress);
+ if (!entry) {
+ ret = -ENOENT;
+ goto out;
+ }
+ ret = bpf_mprog_query(attr, uattr, entry);
+out:
+ rtnl_unlock();
+ return ret;
+}
+
+static int tcx_link_prog_attach(struct bpf_link *link, u32 flags, u32 id_or_fd,
+ u64 revision)
+{
+ struct tcx_link *tcx = tcx_link(link);
+ bool created, ingress = tcx->location == BPF_TCX_INGRESS;
+ struct bpf_mprog_entry *entry, *entry_new;
+ struct net_device *dev = tcx->dev;
+ int ret;
+
+ ASSERT_RTNL();
+ entry = tcx_entry_fetch_or_create(dev, ingress, &created);
+ if (!entry)
+ return -ENOMEM;
+ ret = bpf_mprog_attach(entry, &entry_new, link->prog, link, NULL, flags,
+ id_or_fd, revision);
+ if (!ret) {
+ if (entry != entry_new) {
+ tcx_entry_update(dev, entry_new, ingress);
+ tcx_entry_sync();
+ tcx_skeys_inc(ingress);
+ }
+ bpf_mprog_commit(entry);
+ } else if (created) {
+ tcx_entry_free(entry);
+ }
+ return ret;
+}
+
+static void tcx_link_release(struct bpf_link *link)
+{
+ struct tcx_link *tcx = tcx_link(link);
+ bool ingress = tcx->location == BPF_TCX_INGRESS;
+ struct bpf_mprog_entry *entry, *entry_new;
+ struct net_device *dev;
+ int ret = 0;
+
+ rtnl_lock();
+ dev = tcx->dev;
+ if (!dev)
+ goto out;
+ entry = tcx_entry_fetch(dev, ingress);
+ if (!entry) {
+ ret = -ENOENT;
+ goto out;
+ }
+ ret = bpf_mprog_detach(entry, &entry_new, link->prog, link, 0, 0, 0);
+ if (!ret) {
+ if (!tcx_entry_is_active(entry_new))
+ entry_new = NULL;
+ tcx_entry_update(dev, entry_new, ingress);
+ tcx_entry_sync();
+ tcx_skeys_dec(ingress);
+ bpf_mprog_commit(entry);
+ if (!entry_new)
+ tcx_entry_free(entry);
+ tcx->dev = NULL;
+ }
+out:
+ WARN_ON_ONCE(ret);
+ rtnl_unlock();
+}
+
+static int tcx_link_update(struct bpf_link *link, struct bpf_prog *nprog,
+ struct bpf_prog *oprog)
+{
+ struct tcx_link *tcx = tcx_link(link);
+ bool ingress = tcx->location == BPF_TCX_INGRESS;
+ struct bpf_mprog_entry *entry, *entry_new;
+ struct net_device *dev;
+ int ret = 0;
+
+ rtnl_lock();
+ dev = tcx->dev;
+ if (!dev) {
+ ret = -ENOLINK;
+ goto out;
+ }
+ if (oprog && link->prog != oprog) {
+ ret = -EPERM;
+ goto out;
+ }
+ oprog = link->prog;
+ if (oprog == nprog) {
+ bpf_prog_put(nprog);
+ goto out;
+ }
+ entry = tcx_entry_fetch(dev, ingress);
+ if (!entry) {
+ ret = -ENOENT;
+ goto out;
+ }
+ ret = bpf_mprog_attach(entry, &entry_new, nprog, link, oprog,
+ BPF_F_REPLACE | BPF_F_ID,
+ link->prog->aux->id, 0);
+ if (!ret) {
+ WARN_ON_ONCE(entry != entry_new);
+ oprog = xchg(&link->prog, nprog);
+ bpf_prog_put(oprog);
+ bpf_mprog_commit(entry);
+ }
+out:
+ rtnl_unlock();
+ return ret;
+}
+
+static void tcx_link_dealloc(struct bpf_link *link)
+{
+ kfree(tcx_link(link));
+}
+
+static void tcx_link_fdinfo(const struct bpf_link *link, struct seq_file *seq)
+{
+ const struct tcx_link *tcx = tcx_link_const(link);
+ u32 ifindex = 0;
+
+ rtnl_lock();
+ if (tcx->dev)
+ ifindex = tcx->dev->ifindex;
+ rtnl_unlock();
+
+ seq_printf(seq, "ifindex:\t%u\n", ifindex);
+ seq_printf(seq, "attach_type:\t%u (%s)\n",
+ tcx->location,
+ tcx->location == BPF_TCX_INGRESS ? "ingress" : "egress");
+}
+
+static int tcx_link_fill_info(const struct bpf_link *link,
+ struct bpf_link_info *info)
+{
+ const struct tcx_link *tcx = tcx_link_const(link);
+ u32 ifindex = 0;
+
+ rtnl_lock();
+ if (tcx->dev)
+ ifindex = tcx->dev->ifindex;
+ rtnl_unlock();
+
+ info->tcx.ifindex = ifindex;
+ info->tcx.attach_type = tcx->location;
+ return 0;
+}
+
+static int tcx_link_detach(struct bpf_link *link)
+{
+ tcx_link_release(link);
+ return 0;
+}
+
+static const struct bpf_link_ops tcx_link_lops = {
+ .release = tcx_link_release,
+ .detach = tcx_link_detach,
+ .dealloc = tcx_link_dealloc,
+ .update_prog = tcx_link_update,
+ .show_fdinfo = tcx_link_fdinfo,
+ .fill_link_info = tcx_link_fill_info,
+};
+
+static int tcx_link_init(struct tcx_link *tcx,
+ struct bpf_link_primer *link_primer,
+ const union bpf_attr *attr,
+ struct net_device *dev,
+ struct bpf_prog *prog)
+{
+ bpf_link_init(&tcx->link, BPF_LINK_TYPE_TCX, &tcx_link_lops, prog);
+ tcx->location = attr->link_create.attach_type;
+ tcx->dev = dev;
+ return bpf_link_prime(&tcx->link, link_primer);
+}
+
+int tcx_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+ struct net *net = current->nsproxy->net_ns;
+ struct bpf_link_primer link_primer;
+ struct net_device *dev;
+ struct tcx_link *tcx;
+ int ret;
+
+ rtnl_lock();
+ dev = __dev_get_by_index(net, attr->link_create.target_ifindex);
+ if (!dev) {
+ ret = -ENODEV;
+ goto out;
+ }
+ tcx = kzalloc(sizeof(*tcx), GFP_USER);
+ if (!tcx) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ ret = tcx_link_init(tcx, &link_primer, attr, dev, prog);
+ if (ret) {
+ kfree(tcx);
+ goto out;
+ }
+ ret = tcx_link_prog_attach(&tcx->link, attr->link_create.flags,
+ attr->link_create.tcx.relative_fd,
+ attr->link_create.tcx.expected_revision);
+ if (ret) {
+ tcx->dev = NULL;
+ bpf_link_cleanup(&link_primer);
+ goto out;
+ }
+ ret = bpf_link_settle(&link_primer);
+out:
+ rtnl_unlock();
+ return ret;
+}
diff --git a/net/Kconfig b/net/Kconfig
index 2fb25b534df5..d532ec33f1fe 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -52,6 +52,11 @@ config NET_INGRESS
config NET_EGRESS
bool
+config NET_XGRESS
+ select NET_INGRESS
+ select NET_EGRESS
+ bool
+
config NET_REDIRECT
bool
diff --git a/net/core/dev.c b/net/core/dev.c
index d6e1b786c5c5..c4b826024978 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -107,6 +107,7 @@
#include <net/pkt_cls.h>
#include <net/checksum.h>
#include <net/xfrm.h>
+#include <net/tcx.h>
#include <linux/highmem.h>
#include <linux/init.h>
#include <linux/module.h>
@@ -154,7 +155,6 @@
#include "dev.h"
#include "net-sysfs.h"
-
static DEFINE_SPINLOCK(ptype_lock);
struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
struct list_head ptype_all __read_mostly; /* Taps */
@@ -3882,69 +3882,198 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
EXPORT_SYMBOL(dev_loopback_xmit);
#ifdef CONFIG_NET_EGRESS
-static struct sk_buff *
-sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
+static struct netdev_queue *
+netdev_tx_queue_mapping(struct net_device *dev, struct sk_buff *skb)
+{
+ int qm = skb_get_queue_mapping(skb);
+
+ return netdev_get_tx_queue(dev, netdev_cap_txqueue(dev, qm));
+}
+
+static bool netdev_xmit_txqueue_skipped(void)
{
+ return __this_cpu_read(softnet_data.xmit.skip_txqueue);
+}
+
+void netdev_xmit_skip_txqueue(bool skip)
+{
+ __this_cpu_write(softnet_data.xmit.skip_txqueue, skip);
+}
+EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue);
+#endif /* CONFIG_NET_EGRESS */
+
+#ifdef CONFIG_NET_XGRESS
+static int tc_run(struct tcx_entry *entry, struct sk_buff *skb)
+{
+ int ret = TC_ACT_UNSPEC;
#ifdef CONFIG_NET_CLS_ACT
- struct mini_Qdisc *miniq = rcu_dereference_bh(dev->miniq_egress);
- struct tcf_result cl_res;
+ struct mini_Qdisc *miniq = rcu_dereference_bh(entry->miniq);
+ struct tcf_result res;
if (!miniq)
- return skb;
+ return ret;
- /* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
tc_skb_cb(skb)->mru = 0;
tc_skb_cb(skb)->post_ct = false;
- mini_qdisc_bstats_cpu_update(miniq, skb);
- switch (tcf_classify(skb, miniq->block, miniq->filter_list, &cl_res, false)) {
+ mini_qdisc_bstats_cpu_update(miniq, skb);
+ ret = tcf_classify(skb, miniq->block, miniq->filter_list, &res, false);
+ /* Only tcf related quirks below. */
+ switch (ret) {
+ case TC_ACT_SHOT:
+ mini_qdisc_qstats_cpu_drop(miniq);
+ break;
case TC_ACT_OK:
case TC_ACT_RECLASSIFY:
- skb->tc_index = TC_H_MIN(cl_res.classid);
+ skb->tc_index = TC_H_MIN(res.classid);
break;
+ }
+#endif /* CONFIG_NET_CLS_ACT */
+ return ret;
+}
+
+static DEFINE_STATIC_KEY_FALSE(tcx_needed_key);
+
+void tcx_inc(void)
+{
+ static_branch_inc(&tcx_needed_key);
+}
+
+void tcx_dec(void)
+{
+ static_branch_dec(&tcx_needed_key);
+}
+
+static __always_inline enum tcx_action_base
+tcx_run(const struct bpf_mprog_entry *entry, struct sk_buff *skb,
+ const bool needs_mac)
+{
+ const struct bpf_mprog_fp *fp;
+ const struct bpf_prog *prog;
+ int ret = TCX_NEXT;
+
+ if (needs_mac)
+ __skb_push(skb, skb->mac_len);
+ bpf_mprog_foreach_prog(entry, fp, prog) {
+ bpf_compute_data_pointers(skb);
+ ret = bpf_prog_run(prog, skb);
+ if (ret != TCX_NEXT)
+ break;
+ }
+ if (needs_mac)
+ __skb_pull(skb, skb->mac_len);
+ return tcx_action_code(skb, ret);
+}
+
+static __always_inline struct sk_buff *
+sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
+ struct net_device *orig_dev, bool *another)
+{
+ struct bpf_mprog_entry *entry = rcu_dereference_bh(skb->dev->tcx_ingress);
+ int sch_ret;
+
+ if (!entry)
+ return skb;
+ if (*pt_prev) {
+ *ret = deliver_skb(skb, *pt_prev, orig_dev);
+ *pt_prev = NULL;
+ }
+
+ qdisc_skb_cb(skb)->pkt_len = skb->len;
+ tcx_set_ingress(skb, true);
+
+ if (static_branch_unlikely(&tcx_needed_key)) {
+ sch_ret = tcx_run(entry, skb, true);
+ if (sch_ret != TC_ACT_UNSPEC)
+ goto ingress_verdict;
+ }
+ sch_ret = tc_run(tcx_entry(entry), skb);
+ingress_verdict:
+ switch (sch_ret) {
+ case TC_ACT_REDIRECT:
+ /* skb_mac_header check was done by BPF, so we can safely
+ * push the L2 header back before redirecting to another
+ * netdev.
+ */
+ __skb_push(skb, skb->mac_len);
+ if (skb_do_redirect(skb) == -EAGAIN) {
+ __skb_pull(skb, skb->mac_len);
+ *another = true;
+ break;
+ }
+ *ret = NET_RX_SUCCESS;
+ return NULL;
case TC_ACT_SHOT:
- mini_qdisc_qstats_cpu_drop(miniq);
- *ret = NET_XMIT_DROP;
- kfree_skb_reason(skb, SKB_DROP_REASON_TC_EGRESS);
+ kfree_skb_reason(skb, SKB_DROP_REASON_TC_INGRESS);
+ *ret = NET_RX_DROP;
return NULL;
+ /* used by tc_run */
case TC_ACT_STOLEN:
case TC_ACT_QUEUED:
case TC_ACT_TRAP:
- *ret = NET_XMIT_SUCCESS;
consume_skb(skb);
+ fallthrough;
+ case TC_ACT_CONSUMED:
+ *ret = NET_RX_SUCCESS;
return NULL;
+ }
+
+ return skb;
+}
+
+static __always_inline struct sk_buff *
+sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
+{
+ struct bpf_mprog_entry *entry = rcu_dereference_bh(dev->tcx_egress);
+ int sch_ret;
+
+ if (!entry)
+ return skb;
+
+ /* qdisc_skb_cb(skb)->pkt_len & tcx_set_ingress() was
+ * already set by the caller.
+ */
+ if (static_branch_unlikely(&tcx_needed_key)) {
+ sch_ret = tcx_run(entry, skb, false);
+ if (sch_ret != TC_ACT_UNSPEC)
+ goto egress_verdict;
+ }
+ sch_ret = tc_run(tcx_entry(entry), skb);
+egress_verdict:
+ switch (sch_ret) {
case TC_ACT_REDIRECT:
/* No need to push/pop skb's mac_header here on egress! */
skb_do_redirect(skb);
*ret = NET_XMIT_SUCCESS;
return NULL;
- default:
- break;
+ case TC_ACT_SHOT:
+ kfree_skb_reason(skb, SKB_DROP_REASON_TC_EGRESS);
+ *ret = NET_XMIT_DROP;
+ return NULL;
+ /* used by tc_run */
+ case TC_ACT_STOLEN:
+ case TC_ACT_QUEUED:
+ case TC_ACT_TRAP:
+ *ret = NET_XMIT_SUCCESS;
+ return NULL;
}
-#endif /* CONFIG_NET_CLS_ACT */
return skb;
}
-
-static struct netdev_queue *
-netdev_tx_queue_mapping(struct net_device *dev, struct sk_buff *skb)
-{
- int qm = skb_get_queue_mapping(skb);
-
- return netdev_get_tx_queue(dev, netdev_cap_txqueue(dev, qm));
-}
-
-static bool netdev_xmit_txqueue_skipped(void)
+#else
+static __always_inline struct sk_buff *
+sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
+ struct net_device *orig_dev, bool *another)
{
- return __this_cpu_read(softnet_data.xmit.skip_txqueue);
+ return skb;
}
-void netdev_xmit_skip_txqueue(bool skip)
+static __always_inline struct sk_buff *
+sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
{
- __this_cpu_write(softnet_data.xmit.skip_txqueue, skip);
+ return skb;
}
-EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue);
-#endif /* CONFIG_NET_EGRESS */
+#endif /* CONFIG_NET_XGRESS */
#ifdef CONFIG_XPS
static int __get_xps_queue_idx(struct net_device *dev, struct sk_buff *skb,
@@ -4128,9 +4257,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
skb_update_prio(skb);
qdisc_pkt_len_init(skb);
-#ifdef CONFIG_NET_CLS_ACT
- skb->tc_at_ingress = 0;
-#endif
+ tcx_set_ingress(skb, false);
#ifdef CONFIG_NET_EGRESS
if (static_branch_unlikely(&egress_needed_key)) {
if (nf_hook_egress_active()) {
@@ -5064,72 +5191,6 @@ int (*br_fdb_test_addr_hook)(struct net_device *dev,
EXPORT_SYMBOL_GPL(br_fdb_test_addr_hook);
#endif
-static inline struct sk_buff *
-sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
- struct net_device *orig_dev, bool *another)
-{
-#ifdef CONFIG_NET_CLS_ACT
- struct mini_Qdisc *miniq = rcu_dereference_bh(skb->dev->miniq_ingress);
- struct tcf_result cl_res;
-
- /* If there's at least one ingress present somewhere (so
- * we get here via enabled static key), remaining devices
- * that are not configured with an ingress qdisc will bail
- * out here.
- */
- if (!miniq)
- return skb;
-
- if (*pt_prev) {
- *ret = deliver_skb(skb, *pt_prev, orig_dev);
- *pt_prev = NULL;
- }
-
- qdisc_skb_cb(skb)->pkt_len = skb->len;
- tc_skb_cb(skb)->mru = 0;
- tc_skb_cb(skb)->post_ct = false;
- skb->tc_at_ingress = 1;
- mini_qdisc_bstats_cpu_update(miniq, skb);
-
- switch (tcf_classify(skb, miniq->block, miniq->filter_list, &cl_res, false)) {
- case TC_ACT_OK:
- case TC_ACT_RECLASSIFY:
- skb->tc_index = TC_H_MIN(cl_res.classid);
- break;
- case TC_ACT_SHOT:
- mini_qdisc_qstats_cpu_drop(miniq);
- kfree_skb_reason(skb, SKB_DROP_REASON_TC_INGRESS);
- *ret = NET_RX_DROP;
- return NULL;
- case TC_ACT_STOLEN:
- case TC_ACT_QUEUED:
- case TC_ACT_TRAP:
- consume_skb(skb);
- *ret = NET_RX_SUCCESS;
- return NULL;
- case TC_ACT_REDIRECT:
- /* skb_mac_header check was done by cls/act_bpf, so
- * we can safely push the L2 header back before
- * redirecting to another netdev
- */
- __skb_push(skb, skb->mac_len);
- if (skb_do_redirect(skb) == -EAGAIN) {
- __skb_pull(skb, skb->mac_len);
- *another = true;
- break;
- }
- *ret = NET_RX_SUCCESS;
- return NULL;
- case TC_ACT_CONSUMED:
- *ret = NET_RX_SUCCESS;
- return NULL;
- default:
- break;
- }
-#endif /* CONFIG_NET_CLS_ACT */
- return skb;
-}
-
/**
* netdev_is_rx_handler_busy - check if receive handler is registered
* @dev: device to check
@@ -10834,7 +10895,7 @@ void unregister_netdevice_many_notify(struct list_head *head,
/* Shutdown queueing discipline. */
dev_shutdown(dev);
-
+ dev_tcx_uninstall(dev);
dev_xdp_uninstall(dev);
bpf_dev_bound_netdev_unregister(dev);
diff --git a/net/core/filter.c b/net/core/filter.c
index 06ba0e56e369..e39a8a20dd10 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -9312,7 +9312,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
__u8 value_reg = si->dst_reg;
__u8 skb_reg = si->src_reg;
-#ifdef CONFIG_NET_CLS_ACT
+#ifdef CONFIG_NET_XGRESS
/* If the tstamp_type is read,
* the bpf prog is aware the tstamp could have delivery time.
* Thus, read skb->tstamp as is if tstamp_type_access is true.
@@ -9346,7 +9346,7 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
__u8 value_reg = si->src_reg;
__u8 skb_reg = si->dst_reg;
-#ifdef CONFIG_NET_CLS_ACT
+#ifdef CONFIG_NET_XGRESS
/* If the tstamp_type is read,
* the bpf prog is aware the tstamp could have delivery time.
* Thus, write skb->tstamp as is if tstamp_type_access is true.
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 4b95cb1ac435..470c70deffe2 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -347,8 +347,7 @@ config NET_SCH_FQ_PIE
config NET_SCH_INGRESS
tristate "Ingress/classifier-action Qdisc"
depends on NET_CLS_ACT
- select NET_INGRESS
- select NET_EGRESS
+ select NET_XGRESS
help
Say Y here if you want to use classifiers for incoming and/or outgoing
packets. This qdisc doesn't do anything else besides running classifiers,
@@ -679,6 +678,7 @@ config NET_EMATCH_IPT
config NET_CLS_ACT
bool "Actions"
select NET_CLS
+ select NET_XGRESS
help
Say Y here if you want to use traffic control actions. Actions
get attached to classifiers and are invoked after a successful
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index e43a45499372..04e886f6cee4 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -13,6 +13,7 @@
#include <net/netlink.h>
#include <net/pkt_sched.h>
#include <net/pkt_cls.h>
+#include <net/tcx.h>
struct ingress_sched_data {
struct tcf_block *block;
@@ -78,6 +79,8 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
{
struct ingress_sched_data *q = qdisc_priv(sch);
struct net_device *dev = qdisc_dev(sch);
+ struct bpf_mprog_entry *entry;
+ bool created;
int err;
if (sch->parent != TC_H_INGRESS)
@@ -85,7 +88,13 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
net_inc_ingress_queue();
- mini_qdisc_pair_init(&q->miniqp, sch, &dev->miniq_ingress);
+ entry = tcx_entry_fetch_or_create(dev, true, &created);
+ if (!entry)
+ return -ENOMEM;
+ tcx_miniq_set_active(entry, true);
+ mini_qdisc_pair_init(&q->miniqp, sch, &tcx_entry(entry)->miniq);
+ if (created)
+ tcx_entry_update(dev, entry, true);
q->block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
q->block_info.chain_head_change = clsact_chain_head_change;
@@ -103,11 +112,22 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
static void ingress_destroy(struct Qdisc *sch)
{
struct ingress_sched_data *q = qdisc_priv(sch);
+ struct net_device *dev = qdisc_dev(sch);
+ struct bpf_mprog_entry *entry = rtnl_dereference(dev->tcx_ingress);
if (sch->parent != TC_H_INGRESS)
return;
tcf_block_put_ext(q->block, sch, &q->block_info);
+
+ if (entry) {
+ tcx_miniq_set_active(entry, false);
+ if (!tcx_entry_is_active(entry)) {
+ tcx_entry_update(dev, NULL, false);
+ tcx_entry_free(entry);
+ }
+ }
+
net_dec_ingress_queue();
}
@@ -223,6 +243,8 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
{
struct clsact_sched_data *q = qdisc_priv(sch);
struct net_device *dev = qdisc_dev(sch);
+ struct bpf_mprog_entry *entry;
+ bool created;
int err;
if (sch->parent != TC_H_CLSACT)
@@ -231,7 +253,13 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
net_inc_ingress_queue();
net_inc_egress_queue();
- mini_qdisc_pair_init(&q->miniqp_ingress, sch, &dev->miniq_ingress);
+ entry = tcx_entry_fetch_or_create(dev, true, &created);
+ if (!entry)
+ return -ENOMEM;
+ tcx_miniq_set_active(entry, true);
+ mini_qdisc_pair_init(&q->miniqp_ingress, sch, &tcx_entry(entry)->miniq);
+ if (created)
+ tcx_entry_update(dev, entry, true);
q->ingress_block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
q->ingress_block_info.chain_head_change = clsact_chain_head_change;
@@ -244,7 +272,13 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
mini_qdisc_pair_block_init(&q->miniqp_ingress, q->ingress_block);
- mini_qdisc_pair_init(&q->miniqp_egress, sch, &dev->miniq_egress);
+ entry = tcx_entry_fetch_or_create(dev, false, &created);
+ if (!entry)
+ return -ENOMEM;
+ tcx_miniq_set_active(entry, true);
+ mini_qdisc_pair_init(&q->miniqp_egress, sch, &tcx_entry(entry)->miniq);
+ if (created)
+ tcx_entry_update(dev, entry, false);
q->egress_block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS;
q->egress_block_info.chain_head_change = clsact_chain_head_change;
@@ -256,12 +290,31 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
static void clsact_destroy(struct Qdisc *sch)
{
struct clsact_sched_data *q = qdisc_priv(sch);
+ struct net_device *dev = qdisc_dev(sch);
+ struct bpf_mprog_entry *ingress_entry = rtnl_dereference(dev->tcx_ingress);
+ struct bpf_mprog_entry *egress_entry = rtnl_dereference(dev->tcx_egress);
if (sch->parent != TC_H_CLSACT)
return;
- tcf_block_put_ext(q->egress_block, sch, &q->egress_block_info);
tcf_block_put_ext(q->ingress_block, sch, &q->ingress_block_info);
+ tcf_block_put_ext(q->egress_block, sch, &q->egress_block_info);
+
+ if (ingress_entry) {
+ tcx_miniq_set_active(ingress_entry, false);
+ if (!tcx_entry_is_active(ingress_entry)) {
+ tcx_entry_update(dev, NULL, true);
+ tcx_entry_free(ingress_entry);
+ }
+ }
+
+ if (egress_entry) {
+ tcx_miniq_set_active(egress_entry, false);
+ if (!tcx_entry_is_active(egress_entry)) {
+ tcx_entry_update(dev, NULL, false);
+ tcx_entry_free(egress_entry);
+ }
+ }
net_dec_ingress_queue();
net_dec_egress_queue();
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 1c166870cdf3..47b76925189f 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1036,6 +1036,8 @@ enum bpf_attach_type {
BPF_LSM_CGROUP,
BPF_STRUCT_OPS,
BPF_NETFILTER,
+ BPF_TCX_INGRESS,
+ BPF_TCX_EGRESS,
__MAX_BPF_ATTACH_TYPE
};
@@ -1053,7 +1055,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_KPROBE_MULTI = 8,
BPF_LINK_TYPE_STRUCT_OPS = 9,
BPF_LINK_TYPE_NETFILTER = 10,
-
+ BPF_LINK_TYPE_TCX = 11,
MAX_BPF_LINK_TYPE,
};
@@ -1569,13 +1571,13 @@ union bpf_attr {
__u32 map_fd; /* struct_ops to attach */
};
union {
- __u32 target_fd; /* object to attach to */
- __u32 target_ifindex; /* target ifindex */
+ __u32 target_fd; /* target object to attach to or ... */
+ __u32 target_ifindex; /* target ifindex */
};
__u32 attach_type; /* attach type */
__u32 flags; /* extra flags */
union {
- __u32 target_btf_id; /* btf_id of target to attach to */
+ __u32 target_btf_id; /* btf_id of target to attach to */
struct {
__aligned_u64 iter_info; /* extra bpf_iter_link_info */
__u32 iter_info_len; /* iter_info length */
@@ -1609,6 +1611,13 @@ union bpf_attr {
__s32 priority;
__u32 flags;
} netfilter;
+ struct {
+ union {
+ __u32 relative_fd;
+ __u32 relative_id;
+ };
+ __u64 expected_revision;
+ } tcx;
};
} link_create;
@@ -6217,6 +6226,19 @@ struct bpf_sock_tuple {
};
};
+/* (Simplified) user return codes for tcx prog type.
+ * A valid tcx program must return one of these defined values. All other
+ * return codes are reserved for future use. Must remain compatible with
+ * their TC_ACT_* counter-parts. For compatibility in behavior, unknown
+ * return codes are mapped to TCX_NEXT.
+ */
+enum tcx_action_base {
+ TCX_NEXT = -1,
+ TCX_PASS = 0,
+ TCX_DROP = 2,
+ TCX_REDIRECT = 7,
+};
+
struct bpf_xdp_sock {
__u32 queue_id;
};
@@ -6499,6 +6521,10 @@ struct bpf_link_info {
} event; /* BPF_PERF_EVENT_EVENT */
};
} perf_event;
+ struct {
+ __u32 ifindex;
+ __u32 attach_type;
+ } tcx;
};
} __attribute__((aligned(8)));
--
2.34.1
* Re: [PATCH bpf-next v6 2/8] bpf: Add fd-based tcx multi-prog infra with link support
2023-07-19 14:08 ` [PATCH bpf-next v6 2/8] bpf: Add fd-based tcx multi-prog infra with link support Daniel Borkmann
@ 2023-07-20 2:13 ` Yafang Shao
2023-07-21 12:53 ` Daniel Borkmann
2023-07-21 14:57 ` Petr Machata
1 sibling, 1 reply; 14+ messages in thread
From: Yafang Shao @ 2023-07-20 2:13 UTC (permalink / raw)
To: Daniel Borkmann
Cc: ast, andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu,
joe, toke, davem, bpf, netdev
On Wed, Jul 19, 2023 at 10:11 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> This work refactors and adds a lightweight extension ("tcx") to the tc BPF
> ingress and egress data path side for allowing BPF program management based
> on fds via the bpf() syscall through the newly added generic multi-prog API.
> The main goal behind this work which we also presented at LPC [0] last year
> and a recent update at LSF/MM/BPF this year [3] is to support long-awaited
> BPF link functionality for tc BPF programs, which allows for a model of safe
> ownership and program detachment.
>
> Given the rise in tc BPF users in cloud native environments, this becomes
> necessary to avoid hard-to-debug incidents either through stale leftover
> programs or 3rd party applications accidentally stepping on each other's toes.
> As a recap, a BPF link represents the attachment of a BPF program to a BPF
> hook point. The BPF link holds a single reference to keep the BPF program alive.
> Moreover, hook points do not reference a BPF link, only the application's
> fd or pinning does. A BPF link holds meta-data specific to attachment and
> implements operations for link creation, (atomic) BPF program update,
> detachment and introspection. The motivation for BPF links for tc BPF programs
> is multi-fold, for example:
>
> - From Meta: "It's especially important for applications that are deployed
> fleet-wide and that don't "control" hosts they are deployed to. If such
> application crashes and no one notices and does anything about that, BPF
> program will keep running draining resources or even just, say, dropping
> packets. We at FB had outages due to such permanent BPF attachment
> semantics. With fd-based BPF link we are getting a framework, which allows
> safe, auto-detachable behavior by default, unless application explicitly
> opts in by pinning the BPF link." [1]
>
> - From Cilium-side the tc BPF programs we attach to host-facing veth devices
> and phys devices build the core datapath for Kubernetes Pods, and they
> implement forwarding, load-balancing, policy, EDT-management, etc, within
> BPF. Currently there is no concept of 'safe' ownership, e.g. we've recently
> experienced hard-to-debug issues in a user's staging environment where
> another Kubernetes application using tc BPF attached to the same prio/handle
> of cls_bpf accidentally wiped all Cilium-based BPF programs from underneath
> it. The goal is to establish a clear/safe ownership model via links which
> cannot accidentally be overridden. [0,2]
>
> BPF links for tc can co-exist with non-link attachments, and the semantics are
> also in line with XDP links: BPF links cannot replace other BPF links, BPF
> links cannot replace non-BPF links, non-BPF links cannot replace BPF links and
> lastly only non-BPF links can replace non-BPF links. In the case of Cilium, this
> would solve the mentioned issue of a safe ownership model as 3rd party applications
> would not be able to accidentally wipe Cilium programs, even if they are not
> BPF link aware.
>
> Earlier attempts [4] have tried to integrate BPF links into core tc machinery
> to solve cls_bpf, which has been intrusive to the generic tc kernel API with
> extensions only specific to cls_bpf and suboptimal/complex since cls_bpf could
> be wiped from the qdisc also. Locking a tc BPF program in place this way is
> getting into layering hacks given the two object models are vastly different.
>
> We instead implemented the tcx (tc 'express') layer, which is an fd-based tc BPF
> attach API, so that the BPF link implementation blends in naturally similar to
> other link types which are fd-based and without the need for changing core tc
> internal APIs. BPF programs for tc can then be successively migrated from classic
> cls_bpf to the new tc BPF link without needing to change the program's source
> code; adapting just the BPF loader mechanics for attaching is sufficient.
>
> For the current tc framework, there is no change in behavior with this patch,
> nor does it touch tc core kernel APIs. The gist of this
> patch is that the ingress and egress hooks have a lightweight, qdisc-less
> extension for BPF to attach its tc BPF programs, in other words, a minimal
> entry point for tc BPF. The name tcx has been suggested from discussion of
> earlier revisions of this work as a good fit, and to more easily differentiate between
> the classic cls_bpf attachment and the fd-based one.
>
> For the ingress and egress tcx points, the device holds a cache-friendly array
> with program pointers which is separated from control plane (slow-path) data.
> Earlier versions of this work used priority to determine ordering and expression
> of dependencies, similar to classic tc, but it was challenged that for
> something more future-proof a better user experience is required. Hence this
> resulted in the design and development of the generic attach/detach/query API
> for multi-progs. See prior patch with its discussion on the API design. tcx is
> the first user, and later we plan to integrate others as well; for example, one
> candidate is multi-prog support for XDP which would benefit and have the same
> 'look and feel' from API perspective.
>
> The goal with tcx is to have maximum compatibility with existing tc BPF programs,
> so they don't need to be rewritten specifically. Compatibility for calling into
> classic tcf_classify() is also provided in order to allow successive migration
> or for both to cleanly co-exist where needed, given it's all one logical tc layer and
> the tcx plus classic tc cls/act build one logical overall processing pipeline.
>
> tcx supports the simplified return code TCX_NEXT, which is non-terminating (go
> to next program), and the terminating ones TCX_PASS, TCX_DROP and TCX_REDIRECT.
> The fd-based API is behind a static key, so that when unused the code is also
> not entered. The struct tcx_entry's program array is currently static, but
> could be made dynamic if necessary at a point in future. The a/b pair swap
> design has been chosen so that for detachment there are no allocations which
> otherwise could fail.
>
> The work has been tested with the tc-testing selftest suite, which passes in full, as
> well as the tc BPF tests from the BPF CI, and also with Cilium's L4LB.
>
> Thanks also to Nikolay Aleksandrov and Martin Lau for in-depth early reviews
> of this work.
>
> [0] https://lpc.events/event/16/contributions/1353/
> [1] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@mail.gmail.com
> [2] https://colocatedeventseu2023.sched.com/event/1Jo6O/tales-from-an-ebpf-programs-murder-mystery-hemanth-malla-guillaume-fournier-datadog
> [3] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf
> [4] https://lore.kernel.org/bpf/20210604063116.234316-1-memxor@gmail.com
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Jakub Kicinski <kuba@kernel.org>
> ---
> MAINTAINERS | 4 +-
> include/linux/bpf_mprog.h | 9 +
> include/linux/netdevice.h | 15 +-
> include/linux/skbuff.h | 4 +-
> include/net/sch_generic.h | 2 +-
> include/net/tcx.h | 206 +++++++++++++++++++
> include/uapi/linux/bpf.h | 34 +++-
> kernel/bpf/Kconfig | 1 +
> kernel/bpf/Makefile | 1 +
> kernel/bpf/syscall.c | 82 ++++++--
> kernel/bpf/tcx.c | 348 +++++++++++++++++++++++++++++++++
> net/Kconfig | 5 +
> net/core/dev.c | 265 +++++++++++++++----------
> net/core/filter.c | 4 +-
> net/sched/Kconfig | 4 +-
> net/sched/sch_ingress.c | 61 +++++-
> tools/include/uapi/linux/bpf.h | 34 +++-
> 17 files changed, 935 insertions(+), 144 deletions(-)
> create mode 100644 include/net/tcx.h
> create mode 100644 kernel/bpf/tcx.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 678bef9f60b4..990e3fce753c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3778,13 +3778,15 @@ L: netdev@vger.kernel.org
> S: Maintained
> F: kernel/bpf/bpf_struct*
>
> -BPF [NETWORKING] (tc BPF, sock_addr)
> +BPF [NETWORKING] (tcx & tc BPF, sock_addr)
> M: Martin KaFai Lau <martin.lau@linux.dev>
> M: Daniel Borkmann <daniel@iogearbox.net>
> R: John Fastabend <john.fastabend@gmail.com>
> L: bpf@vger.kernel.org
> L: netdev@vger.kernel.org
> S: Maintained
> +F: include/net/tcx.h
> +F: kernel/bpf/tcx.c
> F: net/core/filter.c
> F: net/sched/act_bpf.c
> F: net/sched/cls_bpf.c
> diff --git a/include/linux/bpf_mprog.h b/include/linux/bpf_mprog.h
> index 6feefec43422..2b429488f840 100644
> --- a/include/linux/bpf_mprog.h
> +++ b/include/linux/bpf_mprog.h
> @@ -315,4 +315,13 @@ int bpf_mprog_detach(struct bpf_mprog_entry *entry,
> int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr,
> struct bpf_mprog_entry *entry);
>
> +static inline bool bpf_mprog_supported(enum bpf_prog_type type)
> +{
> + switch (type) {
> + case BPF_PROG_TYPE_SCHED_CLS:
> + return true;
> + default:
> + return false;
> + }
> +}
> #endif /* __BPF_MPROG_H */
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index b828c7a75be2..024314c68bc8 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1930,8 +1930,7 @@ enum netdev_ml_priv_type {
> *
> * @rx_handler: handler for received packets
> * @rx_handler_data: XXX: need comments on this one
> - * @miniq_ingress: ingress/clsact qdisc specific data for
> - * ingress processing
> + * @tcx_ingress: BPF & clsact qdisc specific data for ingress processing
> * @ingress_queue: XXX: need comments on this one
> * @nf_hooks_ingress: netfilter hooks executed for ingress packets
> * @broadcast: hw bcast address
> @@ -1952,8 +1951,7 @@ enum netdev_ml_priv_type {
> * @xps_maps: all CPUs/RXQs maps for XPS device
> *
> * @xps_maps: XXX: need comments on this one
> - * @miniq_egress: clsact qdisc specific data for
> - * egress processing
> + * @tcx_egress: BPF & clsact qdisc specific data for egress processing
> * @nf_hooks_egress: netfilter hooks executed for egress packets
> * @qdisc_hash: qdisc hash table
> * @watchdog_timeo: Represents the timeout that is used by
> @@ -2252,9 +2250,8 @@ struct net_device {
> unsigned int gro_ipv4_max_size;
> rx_handler_func_t __rcu *rx_handler;
> void __rcu *rx_handler_data;
> -
> -#ifdef CONFIG_NET_CLS_ACT
> - struct mini_Qdisc __rcu *miniq_ingress;
> +#ifdef CONFIG_NET_XGRESS
> + struct bpf_mprog_entry __rcu *tcx_ingress;
> #endif
> struct netdev_queue __rcu *ingress_queue;
> #ifdef CONFIG_NETFILTER_INGRESS
> @@ -2282,8 +2279,8 @@ struct net_device {
> #ifdef CONFIG_XPS
> struct xps_dev_maps __rcu *xps_maps[XPS_MAPS_MAX];
> #endif
> -#ifdef CONFIG_NET_CLS_ACT
> - struct mini_Qdisc __rcu *miniq_egress;
> +#ifdef CONFIG_NET_XGRESS
> + struct bpf_mprog_entry __rcu *tcx_egress;
> #endif
> #ifdef CONFIG_NETFILTER_EGRESS
> struct nf_hook_entries __rcu *nf_hooks_egress;
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 91ed66952580..ed83f1c5fc1f 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -944,7 +944,7 @@ struct sk_buff {
> __u8 __mono_tc_offset[0];
> /* public: */
> __u8 mono_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */
> -#ifdef CONFIG_NET_CLS_ACT
> +#ifdef CONFIG_NET_XGRESS
> __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */
> __u8 tc_skip_classify:1;
> #endif
> @@ -993,7 +993,7 @@ struct sk_buff {
> __u8 csum_not_inet:1;
> #endif
>
> -#ifdef CONFIG_NET_SCHED
> +#if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS)
> __u16 tc_index; /* traffic control index */
> #endif
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index e92f73bb3198..15be2d96b06d 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -703,7 +703,7 @@ int skb_do_redirect(struct sk_buff *);
>
> static inline bool skb_at_tc_ingress(const struct sk_buff *skb)
> {
> -#ifdef CONFIG_NET_CLS_ACT
> +#ifdef CONFIG_NET_XGRESS
> return skb->tc_at_ingress;
> #else
> return false;
> diff --git a/include/net/tcx.h b/include/net/tcx.h
> new file mode 100644
> index 000000000000..264f147953ba
> --- /dev/null
> +++ b/include/net/tcx.h
> @@ -0,0 +1,206 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (c) 2023 Isovalent */
> +#ifndef __NET_TCX_H
> +#define __NET_TCX_H
> +
> +#include <linux/bpf.h>
> +#include <linux/bpf_mprog.h>
> +
> +#include <net/sch_generic.h>
> +
> +struct mini_Qdisc;
> +
> +struct tcx_entry {
> + struct mini_Qdisc __rcu *miniq;
> + struct bpf_mprog_bundle bundle;
> + bool miniq_active;
> + struct rcu_head rcu;
> +};
> +
> +struct tcx_link {
> + struct bpf_link link;
> + struct net_device *dev;
> + u32 location;
> +};
> +
> +static inline void tcx_set_ingress(struct sk_buff *skb, bool ingress)
> +{
> +#ifdef CONFIG_NET_XGRESS
> + skb->tc_at_ingress = ingress;
> +#endif
> +}
> +
> +#ifdef CONFIG_NET_XGRESS
> +static inline struct tcx_entry *tcx_entry(struct bpf_mprog_entry *entry)
> +{
> + struct bpf_mprog_bundle *bundle = entry->parent;
> +
> + return container_of(bundle, struct tcx_entry, bundle);
> +}
> +
> +static inline struct tcx_link *tcx_link(struct bpf_link *link)
> +{
> + return container_of(link, struct tcx_link, link);
> +}
> +
> +static inline const struct tcx_link *tcx_link_const(const struct bpf_link *link)
> +{
> + return tcx_link((struct bpf_link *)link);
> +}
> +
> +void tcx_inc(void);
> +void tcx_dec(void);
> +
> +static inline void tcx_entry_sync(void)
> +{
> + /* bpf_mprog_entry got a/b swapped, therefore ensure that
> + * there are no inflight users on the old one anymore.
> + */
> + synchronize_rcu();
> +}
> +
> +static inline void
> +tcx_entry_update(struct net_device *dev, struct bpf_mprog_entry *entry,
> + bool ingress)
> +{
> + ASSERT_RTNL();
> + if (ingress)
> + rcu_assign_pointer(dev->tcx_ingress, entry);
> + else
> + rcu_assign_pointer(dev->tcx_egress, entry);
> +}
> +
> +static inline struct bpf_mprog_entry *
> +tcx_entry_fetch(struct net_device *dev, bool ingress)
> +{
> + ASSERT_RTNL();
> + if (ingress)
> + return rcu_dereference_rtnl(dev->tcx_ingress);
> + else
> + return rcu_dereference_rtnl(dev->tcx_egress);
> +}
> +
> +static inline struct bpf_mprog_entry *tcx_entry_create(void)
> +{
> + struct tcx_entry *tcx = kzalloc(sizeof(*tcx), GFP_KERNEL);
> +
> + if (tcx) {
> + bpf_mprog_bundle_init(&tcx->bundle);
> + return &tcx->bundle.a;
> + }
> + return NULL;
> +}
> +
> +static inline void tcx_entry_free(struct bpf_mprog_entry *entry)
> +{
> + kfree_rcu(tcx_entry(entry), rcu);
> +}
> +
> +static inline struct bpf_mprog_entry *
> +tcx_entry_fetch_or_create(struct net_device *dev, bool ingress, bool *created)
> +{
> + struct bpf_mprog_entry *entry = tcx_entry_fetch(dev, ingress);
> +
> + *created = false;
> + if (!entry) {
> + entry = tcx_entry_create();
> + if (!entry)
> + return NULL;
> + *created = true;
> + }
> + return entry;
> +}
> +
> +static inline void tcx_skeys_inc(bool ingress)
> +{
> + tcx_inc();
> + if (ingress)
> + net_inc_ingress_queue();
> + else
> + net_inc_egress_queue();
> +}
> +
> +static inline void tcx_skeys_dec(bool ingress)
> +{
> + if (ingress)
> + net_dec_ingress_queue();
> + else
> + net_dec_egress_queue();
> + tcx_dec();
> +}
> +
> +static inline void tcx_miniq_set_active(struct bpf_mprog_entry *entry,
> + const bool active)
> +{
> + ASSERT_RTNL();
> + tcx_entry(entry)->miniq_active = active;
> +}
> +
> +static inline bool tcx_entry_is_active(struct bpf_mprog_entry *entry)
> +{
> + ASSERT_RTNL();
> + return bpf_mprog_total(entry) || tcx_entry(entry)->miniq_active;
> +}
> +
> +static inline enum tcx_action_base tcx_action_code(struct sk_buff *skb,
> + int code)
> +{
> + switch (code) {
> + case TCX_PASS:
> + skb->tc_index = qdisc_skb_cb(skb)->tc_classid;
> + fallthrough;
> + case TCX_DROP:
> + case TCX_REDIRECT:
> + return code;
> + case TCX_NEXT:
> + default:
> + return TCX_NEXT;
> + }
> +}
> +#endif /* CONFIG_NET_XGRESS */
> +
> +#if defined(CONFIG_NET_XGRESS) && defined(CONFIG_BPF_SYSCALL)
> +int tcx_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog);
> +int tcx_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
> +int tcx_prog_detach(const union bpf_attr *attr, struct bpf_prog *prog);
> +void tcx_uninstall(struct net_device *dev, bool ingress);
> +
> +int tcx_prog_query(const union bpf_attr *attr,
> + union bpf_attr __user *uattr);
> +
> +static inline void dev_tcx_uninstall(struct net_device *dev)
> +{
> + ASSERT_RTNL();
> + tcx_uninstall(dev, true);
> + tcx_uninstall(dev, false);
> +}
> +#else
> +static inline int tcx_prog_attach(const union bpf_attr *attr,
> + struct bpf_prog *prog)
> +{
> + return -EINVAL;
> +}
> +
> +static inline int tcx_link_attach(const union bpf_attr *attr,
> + struct bpf_prog *prog)
> +{
> + return -EINVAL;
> +}
> +
> +static inline int tcx_prog_detach(const union bpf_attr *attr,
> + struct bpf_prog *prog)
> +{
> + return -EINVAL;
> +}
> +
> +static inline int tcx_prog_query(const union bpf_attr *attr,
> + union bpf_attr __user *uattr)
> +{
> + return -EINVAL;
> +}
> +
> +static inline void dev_tcx_uninstall(struct net_device *dev)
> +{
> +}
> +#endif /* CONFIG_NET_XGRESS && CONFIG_BPF_SYSCALL */
> +#endif /* __NET_TCX_H */
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d4c07e435336..739c15906a65 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1036,6 +1036,8 @@ enum bpf_attach_type {
> BPF_LSM_CGROUP,
> BPF_STRUCT_OPS,
> BPF_NETFILTER,
> + BPF_TCX_INGRESS,
> + BPF_TCX_EGRESS,
> __MAX_BPF_ATTACH_TYPE
> };
>
> @@ -1053,7 +1055,7 @@ enum bpf_link_type {
> BPF_LINK_TYPE_KPROBE_MULTI = 8,
> BPF_LINK_TYPE_STRUCT_OPS = 9,
> BPF_LINK_TYPE_NETFILTER = 10,
> -
> + BPF_LINK_TYPE_TCX = 11,
> MAX_BPF_LINK_TYPE,
> };
>
> @@ -1569,13 +1571,13 @@ union bpf_attr {
> __u32 map_fd; /* struct_ops to attach */
> };
> union {
> - __u32 target_fd; /* object to attach to */
> - __u32 target_ifindex; /* target ifindex */
> + __u32 target_fd; /* target object to attach to or ... */
> + __u32 target_ifindex; /* target ifindex */
> };
> __u32 attach_type; /* attach type */
> __u32 flags; /* extra flags */
> union {
> - __u32 target_btf_id; /* btf_id of target to attach to */
> + __u32 target_btf_id; /* btf_id of target to attach to */
> struct {
> __aligned_u64 iter_info; /* extra bpf_iter_link_info */
> __u32 iter_info_len; /* iter_info length */
> @@ -1609,6 +1611,13 @@ union bpf_attr {
> __s32 priority;
> __u32 flags;
> } netfilter;
> + struct {
> + union {
> + __u32 relative_fd;
> + __u32 relative_id;
> + };
> + __u64 expected_revision;
> + } tcx;
> };
> } link_create;
>
> @@ -6217,6 +6226,19 @@ struct bpf_sock_tuple {
> };
> };
>
> +/* (Simplified) user return codes for tcx prog type.
> + * A valid tcx program must return one of these defined values. All other
> + * return codes are reserved for future use. Must remain compatible with
> + * their TC_ACT_* counterparts. For compatibility in behavior, unknown
> + * return codes are mapped to TCX_NEXT.
> + */
> +enum tcx_action_base {
> + TCX_NEXT = -1,
> + TCX_PASS = 0,
> + TCX_DROP = 2,
> + TCX_REDIRECT = 7,
> +};
> +
> struct bpf_xdp_sock {
> __u32 queue_id;
> };
> @@ -6499,6 +6521,10 @@ struct bpf_link_info {
> } event; /* BPF_PERF_EVENT_EVENT */
> };
> } perf_event;
> + struct {
> + __u32 ifindex;
> + __u32 attach_type;
> + } tcx;
> };
> } __attribute__((aligned(8)));
>
> diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
> index 2dfe1079f772..6a906ff93006 100644
> --- a/kernel/bpf/Kconfig
> +++ b/kernel/bpf/Kconfig
> @@ -31,6 +31,7 @@ config BPF_SYSCALL
> select TASKS_TRACE_RCU
> select BINARY_PRINTF
> select NET_SOCK_MSG if NET
> + select NET_XGRESS if NET
> select PAGE_POOL if NET
> default n
> help
> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> index 1bea2eb912cd..f526b7573e97 100644
> --- a/kernel/bpf/Makefile
> +++ b/kernel/bpf/Makefile
> @@ -21,6 +21,7 @@ obj-$(CONFIG_BPF_SYSCALL) += devmap.o
> obj-$(CONFIG_BPF_SYSCALL) += cpumap.o
> obj-$(CONFIG_BPF_SYSCALL) += offload.o
> obj-$(CONFIG_BPF_SYSCALL) += net_namespace.o
> +obj-$(CONFIG_BPF_SYSCALL) += tcx.o
> endif
> ifeq ($(CONFIG_PERF_EVENTS),y)
> obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index ee8cb1a174aa..7f4e8c357a6a 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -37,6 +37,8 @@
> #include <linux/trace_events.h>
> #include <net/netfilter/nf_bpf_link.h>
>
> +#include <net/tcx.h>
> +
> #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
> (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
> (map)->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
> @@ -3740,31 +3742,45 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
> return BPF_PROG_TYPE_XDP;
> case BPF_LSM_CGROUP:
> return BPF_PROG_TYPE_LSM;
> + case BPF_TCX_INGRESS:
> + case BPF_TCX_EGRESS:
> + return BPF_PROG_TYPE_SCHED_CLS;
> default:
> return BPF_PROG_TYPE_UNSPEC;
> }
> }
>
> -#define BPF_PROG_ATTACH_LAST_FIELD replace_bpf_fd
> +#define BPF_PROG_ATTACH_LAST_FIELD expected_revision
> +
> +#define BPF_F_ATTACH_MASK_BASE \
> + (BPF_F_ALLOW_OVERRIDE | \
> + BPF_F_ALLOW_MULTI | \
> + BPF_F_REPLACE)
>
> -#define BPF_F_ATTACH_MASK \
> - (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI | BPF_F_REPLACE)
> +#define BPF_F_ATTACH_MASK_MPROG \
> + (BPF_F_REPLACE | \
> + BPF_F_BEFORE | \
> + BPF_F_AFTER | \
> + BPF_F_ID | \
> + BPF_F_LINK)
>
> static int bpf_prog_attach(const union bpf_attr *attr)
> {
> enum bpf_prog_type ptype;
> struct bpf_prog *prog;
> + u32 mask;
> int ret;
>
> if (CHECK_ATTR(BPF_PROG_ATTACH))
> return -EINVAL;
>
> - if (attr->attach_flags & ~BPF_F_ATTACH_MASK)
> - return -EINVAL;
> -
> ptype = attach_type_to_prog_type(attr->attach_type);
> if (ptype == BPF_PROG_TYPE_UNSPEC)
> return -EINVAL;
> + mask = bpf_mprog_supported(ptype) ?
> + BPF_F_ATTACH_MASK_MPROG : BPF_F_ATTACH_MASK_BASE;
> + if (attr->attach_flags & ~mask)
> + return -EINVAL;
>
> prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype);
> if (IS_ERR(prog))
> @@ -3800,6 +3816,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
> else
> ret = cgroup_bpf_prog_attach(attr, ptype, prog);
> break;
> + case BPF_PROG_TYPE_SCHED_CLS:
> + ret = tcx_prog_attach(attr, prog);
> + break;
> default:
> ret = -EINVAL;
> }
> @@ -3809,25 +3828,41 @@ static int bpf_prog_attach(const union bpf_attr *attr)
> return ret;
> }
>
> -#define BPF_PROG_DETACH_LAST_FIELD attach_type
> +#define BPF_PROG_DETACH_LAST_FIELD expected_revision
>
> static int bpf_prog_detach(const union bpf_attr *attr)
> {
> + struct bpf_prog *prog = NULL;
> enum bpf_prog_type ptype;
> + int ret;
>
> if (CHECK_ATTR(BPF_PROG_DETACH))
> return -EINVAL;
>
> ptype = attach_type_to_prog_type(attr->attach_type);
> + if (bpf_mprog_supported(ptype)) {
> + if (ptype == BPF_PROG_TYPE_UNSPEC)
> + return -EINVAL;
> + if (attr->attach_flags & ~BPF_F_ATTACH_MASK_MPROG)
> + return -EINVAL;
> + if (attr->attach_bpf_fd) {
> + prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype);
> + if (IS_ERR(prog))
> + return PTR_ERR(prog);
> + }
> + }
>
> switch (ptype) {
> case BPF_PROG_TYPE_SK_MSG:
> case BPF_PROG_TYPE_SK_SKB:
> - return sock_map_prog_detach(attr, ptype);
> + ret = sock_map_prog_detach(attr, ptype);
> + break;
> case BPF_PROG_TYPE_LIRC_MODE2:
> - return lirc_prog_detach(attr);
> + ret = lirc_prog_detach(attr);
> + break;
> case BPF_PROG_TYPE_FLOW_DISSECTOR:
> - return netns_bpf_prog_detach(attr, ptype);
> + ret = netns_bpf_prog_detach(attr, ptype);
> + break;
> case BPF_PROG_TYPE_CGROUP_DEVICE:
> case BPF_PROG_TYPE_CGROUP_SKB:
> case BPF_PROG_TYPE_CGROUP_SOCK:
> @@ -3836,13 +3871,21 @@ static int bpf_prog_detach(const union bpf_attr *attr)
> case BPF_PROG_TYPE_CGROUP_SYSCTL:
> case BPF_PROG_TYPE_SOCK_OPS:
> case BPF_PROG_TYPE_LSM:
> - return cgroup_bpf_prog_detach(attr, ptype);
> + ret = cgroup_bpf_prog_detach(attr, ptype);
> + break;
> + case BPF_PROG_TYPE_SCHED_CLS:
> + ret = tcx_prog_detach(attr, prog);
> + break;
> default:
> - return -EINVAL;
> + ret = -EINVAL;
> }
> +
> + if (prog)
> + bpf_prog_put(prog);
> + return ret;
> }
>
> -#define BPF_PROG_QUERY_LAST_FIELD query.prog_attach_flags
> +#define BPF_PROG_QUERY_LAST_FIELD query.link_attach_flags
>
> static int bpf_prog_query(const union bpf_attr *attr,
> union bpf_attr __user *uattr)
> @@ -3890,6 +3933,9 @@ static int bpf_prog_query(const union bpf_attr *attr,
> case BPF_SK_MSG_VERDICT:
> case BPF_SK_SKB_VERDICT:
> return sock_map_bpf_prog_query(attr, uattr);
> + case BPF_TCX_INGRESS:
> + case BPF_TCX_EGRESS:
> + return tcx_prog_query(attr, uattr);
> default:
> return -EINVAL;
> }
> @@ -4852,6 +4898,13 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
> goto out;
> }
> break;
> + case BPF_PROG_TYPE_SCHED_CLS:
> + if (attr->link_create.attach_type != BPF_TCX_INGRESS &&
> + attr->link_create.attach_type != BPF_TCX_EGRESS) {
> + ret = -EINVAL;
> + goto out;
> + }
> + break;
> default:
> ptype = attach_type_to_prog_type(attr->link_create.attach_type);
> if (ptype == BPF_PROG_TYPE_UNSPEC || ptype != prog->type) {
> @@ -4903,6 +4956,9 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
> case BPF_PROG_TYPE_XDP:
> ret = bpf_xdp_link_attach(attr, prog);
> break;
> + case BPF_PROG_TYPE_SCHED_CLS:
> + ret = tcx_link_attach(attr, prog);
> + break;
> case BPF_PROG_TYPE_NETFILTER:
> ret = bpf_nf_link_attach(attr, prog);
> break;
> diff --git a/kernel/bpf/tcx.c b/kernel/bpf/tcx.c
> new file mode 100644
> index 000000000000..69a272712b29
> --- /dev/null
> +++ b/kernel/bpf/tcx.c
> @@ -0,0 +1,348 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2023 Isovalent */
> +
> +#include <linux/bpf.h>
> +#include <linux/bpf_mprog.h>
> +#include <linux/netdevice.h>
> +
> +#include <net/tcx.h>
> +
> +int tcx_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> +{
> + bool created, ingress = attr->attach_type == BPF_TCX_INGRESS;
> + struct net *net = current->nsproxy->net_ns;
> + struct bpf_mprog_entry *entry, *entry_new;
> + struct bpf_prog *replace_prog = NULL;
> + struct net_device *dev;
> + int ret;
> +
> + rtnl_lock();
> + dev = __dev_get_by_index(net, attr->target_ifindex);
> + if (!dev) {
> + ret = -ENODEV;
> + goto out;
> + }
> + if (attr->attach_flags & BPF_F_REPLACE) {
> + replace_prog = bpf_prog_get_type(attr->replace_bpf_fd,
> + prog->type);
> + if (IS_ERR(replace_prog)) {
> + ret = PTR_ERR(replace_prog);
> + replace_prog = NULL;
> + goto out;
> + }
> + }
> + entry = tcx_entry_fetch_or_create(dev, ingress, &created);
> + if (!entry) {
> + ret = -ENOMEM;
> + goto out;
> + }
> + ret = bpf_mprog_attach(entry, &entry_new, prog, NULL, replace_prog,
> + attr->attach_flags, attr->relative_fd,
> + attr->expected_revision);
> + if (!ret) {
> + if (entry != entry_new) {
> + tcx_entry_update(dev, entry_new, ingress);
> + tcx_entry_sync();
> + tcx_skeys_inc(ingress);
> + }
> + bpf_mprog_commit(entry);
> + } else if (created) {
> + tcx_entry_free(entry);
> + }
> +out:
> + if (replace_prog)
> + bpf_prog_put(replace_prog);
> + rtnl_unlock();
> + return ret;
> +}
> +
> +int tcx_prog_detach(const union bpf_attr *attr, struct bpf_prog *prog)
> +{
> + bool ingress = attr->attach_type == BPF_TCX_INGRESS;
> + struct net *net = current->nsproxy->net_ns;
> + struct bpf_mprog_entry *entry, *entry_new;
> + struct net_device *dev;
> + int ret;
> +
> + rtnl_lock();
> + dev = __dev_get_by_index(net, attr->target_ifindex);
> + if (!dev) {
> + ret = -ENODEV;
> + goto out;
> + }
> + entry = tcx_entry_fetch(dev, ingress);
> + if (!entry) {
> + ret = -ENOENT;
> + goto out;
> + }
> + ret = bpf_mprog_detach(entry, &entry_new, prog, NULL, attr->attach_flags,
> + attr->relative_fd, attr->expected_revision);
> + if (!ret) {
> + if (!tcx_entry_is_active(entry_new))
> + entry_new = NULL;
> + tcx_entry_update(dev, entry_new, ingress);
> + tcx_entry_sync();
> + tcx_skeys_dec(ingress);
> + bpf_mprog_commit(entry);
> + if (!entry_new)
> + tcx_entry_free(entry);
> + }
> +out:
> + rtnl_unlock();
> + return ret;
> +}
> +
> +void tcx_uninstall(struct net_device *dev, bool ingress)
> +{
> + struct bpf_tuple tuple = {};
> + struct bpf_mprog_entry *entry;
> + struct bpf_mprog_fp *fp;
> + struct bpf_mprog_cp *cp;
> +
> + entry = tcx_entry_fetch(dev, ingress);
> + if (!entry)
> + return;
> + tcx_entry_update(dev, NULL, ingress);
> + tcx_entry_sync();
> + bpf_mprog_foreach_tuple(entry, fp, cp, tuple) {
> + if (tuple.link)
> + tcx_link(tuple.link)->dev = NULL;
> + else
> + bpf_prog_put(tuple.prog);
> + tcx_skeys_dec(ingress);
> + }
> + WARN_ON_ONCE(tcx_entry(entry)->miniq_active);
> + tcx_entry_free(entry);
> +}
> +
> +int tcx_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
> +{
> + bool ingress = attr->query.attach_type == BPF_TCX_INGRESS;
> + struct net *net = current->nsproxy->net_ns;
> + struct bpf_mprog_entry *entry;
> + struct net_device *dev;
> + int ret;
> +
> + rtnl_lock();
> + dev = __dev_get_by_index(net, attr->query.target_ifindex);
> + if (!dev) {
> + ret = -ENODEV;
> + goto out;
> + }
> + entry = tcx_entry_fetch(dev, ingress);
> + if (!entry) {
> + ret = -ENOENT;
> + goto out;
> + }
> + ret = bpf_mprog_query(attr, uattr, entry);
> +out:
> + rtnl_unlock();
> + return ret;
> +}
> +
> +static int tcx_link_prog_attach(struct bpf_link *link, u32 flags, u32 id_or_fd,
> + u64 revision)
> +{
> + struct tcx_link *tcx = tcx_link(link);
> + bool created, ingress = tcx->location == BPF_TCX_INGRESS;
> + struct bpf_mprog_entry *entry, *entry_new;
> + struct net_device *dev = tcx->dev;
> + int ret;
> +
> + ASSERT_RTNL();
> + entry = tcx_entry_fetch_or_create(dev, ingress, &created);
> + if (!entry)
> + return -ENOMEM;
> + ret = bpf_mprog_attach(entry, &entry_new, link->prog, link, NULL, flags,
> + id_or_fd, revision);
> + if (!ret) {
> + if (entry != entry_new) {
> + tcx_entry_update(dev, entry_new, ingress);
> + tcx_entry_sync();
> + tcx_skeys_inc(ingress);
> + }
> + bpf_mprog_commit(entry);
> + } else if (created) {
> + tcx_entry_free(entry);
> + }
> + return ret;
> +}
> +
> +static void tcx_link_release(struct bpf_link *link)
> +{
> + struct tcx_link *tcx = tcx_link(link);
> + bool ingress = tcx->location == BPF_TCX_INGRESS;
> + struct bpf_mprog_entry *entry, *entry_new;
> + struct net_device *dev;
> + int ret = 0;
> +
> + rtnl_lock();
> + dev = tcx->dev;
> + if (!dev)
> + goto out;
> + entry = tcx_entry_fetch(dev, ingress);
> + if (!entry) {
> + ret = -ENOENT;
> + goto out;
> + }
> + ret = bpf_mprog_detach(entry, &entry_new, link->prog, link, 0, 0, 0);
> + if (!ret) {
> + if (!tcx_entry_is_active(entry_new))
> + entry_new = NULL;
> + tcx_entry_update(dev, entry_new, ingress);
> + tcx_entry_sync();
> + tcx_skeys_dec(ingress);
> + bpf_mprog_commit(entry);
> + if (!entry_new)
> + tcx_entry_free(entry);
> + tcx->dev = NULL;
> + }
> +out:
> + WARN_ON_ONCE(ret);
> + rtnl_unlock();
> +}
> +
> +static int tcx_link_update(struct bpf_link *link, struct bpf_prog *nprog,
> + struct bpf_prog *oprog)
> +{
> + struct tcx_link *tcx = tcx_link(link);
> + bool ingress = tcx->location == BPF_TCX_INGRESS;
> + struct bpf_mprog_entry *entry, *entry_new;
> + struct net_device *dev;
> + int ret = 0;
> +
> + rtnl_lock();
> + dev = tcx->dev;
> + if (!dev) {
> + ret = -ENOLINK;
> + goto out;
> + }
> + if (oprog && link->prog != oprog) {
> + ret = -EPERM;
> + goto out;
> + }
> + oprog = link->prog;
> + if (oprog == nprog) {
> + bpf_prog_put(nprog);
> + goto out;
> + }
> + entry = tcx_entry_fetch(dev, ingress);
> + if (!entry) {
> + ret = -ENOENT;
> + goto out;
> + }
> + ret = bpf_mprog_attach(entry, &entry_new, nprog, link, oprog,
> + BPF_F_REPLACE | BPF_F_ID,
> + link->prog->aux->id, 0);
> + if (!ret) {
> + WARN_ON_ONCE(entry != entry_new);
> + oprog = xchg(&link->prog, nprog);
> + bpf_prog_put(oprog);
> + bpf_mprog_commit(entry);
> + }
> +out:
> + rtnl_unlock();
> + return ret;
> +}
> +
> +static void tcx_link_dealloc(struct bpf_link *link)
> +{
> + kfree(tcx_link(link));
> +}
> +
> +static void tcx_link_fdinfo(const struct bpf_link *link, struct seq_file *seq)
> +{
> + const struct tcx_link *tcx = tcx_link_const(link);
> + u32 ifindex = 0;
> +
> + rtnl_lock();
> + if (tcx->dev)
> + ifindex = tcx->dev->ifindex;
> + rtnl_unlock();
> +
> + seq_printf(seq, "ifindex:\t%u\n", ifindex);
> + seq_printf(seq, "attach_type:\t%u (%s)\n",
> + tcx->location,
> + tcx->location == BPF_TCX_INGRESS ? "ingress" : "egress");
> +}
> +
> +static int tcx_link_fill_info(const struct bpf_link *link,
> + struct bpf_link_info *info)
> +{
> + const struct tcx_link *tcx = tcx_link_const(link);
> + u32 ifindex = 0;
> +
> + rtnl_lock();
> + if (tcx->dev)
> + ifindex = tcx->dev->ifindex;
> + rtnl_unlock();
> +
> + info->tcx.ifindex = ifindex;
> + info->tcx.attach_type = tcx->location;
> + return 0;
> +}
> +
> +static int tcx_link_detach(struct bpf_link *link)
> +{
> + tcx_link_release(link);
> + return 0;
> +}
> +
> +static const struct bpf_link_ops tcx_link_lops = {
> + .release = tcx_link_release,
> + .detach = tcx_link_detach,
> + .dealloc = tcx_link_dealloc,
> + .update_prog = tcx_link_update,
> + .show_fdinfo = tcx_link_fdinfo,
> + .fill_link_info = tcx_link_fill_info,
Should we also surface the tcx link info in `bpftool link show`? I
believe `bpftool link show` is the appropriate command for displaying
comprehensive information about all link types.
> +};
> +
> +static int tcx_link_init(struct tcx_link *tcx,
> + struct bpf_link_primer *link_primer,
> + const union bpf_attr *attr,
> + struct net_device *dev,
> + struct bpf_prog *prog)
> +{
> + bpf_link_init(&tcx->link, BPF_LINK_TYPE_TCX, &tcx_link_lops, prog);
> + tcx->location = attr->link_create.attach_type;
> + tcx->dev = dev;
> + return bpf_link_prime(&tcx->link, link_primer);
> +}
> +
> +int tcx_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> +{
> + struct net *net = current->nsproxy->net_ns;
> + struct bpf_link_primer link_primer;
> + struct net_device *dev;
> + struct tcx_link *tcx;
> + int ret;
> +
> + rtnl_lock();
> + dev = __dev_get_by_index(net, attr->link_create.target_ifindex);
> + if (!dev) {
> + ret = -ENODEV;
> + goto out;
> + }
> + tcx = kzalloc(sizeof(*tcx), GFP_USER);
> + if (!tcx) {
> + ret = -ENOMEM;
> + goto out;
> + }
> + ret = tcx_link_init(tcx, &link_primer, attr, dev, prog);
> + if (ret) {
> + kfree(tcx);
> + goto out;
> + }
> + ret = tcx_link_prog_attach(&tcx->link, attr->link_create.flags,
> + attr->link_create.tcx.relative_fd,
> + attr->link_create.tcx.expected_revision);
> + if (ret) {
> + tcx->dev = NULL;
> + bpf_link_cleanup(&link_primer);
> + goto out;
> + }
> + ret = bpf_link_settle(&link_primer);
> +out:
> + rtnl_unlock();
> + return ret;
> +}
> diff --git a/net/Kconfig b/net/Kconfig
> index 2fb25b534df5..d532ec33f1fe 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -52,6 +52,11 @@ config NET_INGRESS
> config NET_EGRESS
> bool
>
> +config NET_XGRESS
> + select NET_INGRESS
> + select NET_EGRESS
> + bool
> +
> config NET_REDIRECT
> bool
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index d6e1b786c5c5..c4b826024978 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -107,6 +107,7 @@
> #include <net/pkt_cls.h>
> #include <net/checksum.h>
> #include <net/xfrm.h>
> +#include <net/tcx.h>
> #include <linux/highmem.h>
> #include <linux/init.h>
> #include <linux/module.h>
> @@ -154,7 +155,6 @@
> #include "dev.h"
> #include "net-sysfs.h"
>
> -
> static DEFINE_SPINLOCK(ptype_lock);
> struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
> struct list_head ptype_all __read_mostly; /* Taps */
> @@ -3882,69 +3882,198 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
> EXPORT_SYMBOL(dev_loopback_xmit);
>
> #ifdef CONFIG_NET_EGRESS
> -static struct sk_buff *
> -sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
> +static struct netdev_queue *
> +netdev_tx_queue_mapping(struct net_device *dev, struct sk_buff *skb)
> +{
> + int qm = skb_get_queue_mapping(skb);
> +
> + return netdev_get_tx_queue(dev, netdev_cap_txqueue(dev, qm));
> +}
> +
> +static bool netdev_xmit_txqueue_skipped(void)
> {
> + return __this_cpu_read(softnet_data.xmit.skip_txqueue);
> +}
> +
> +void netdev_xmit_skip_txqueue(bool skip)
> +{
> + __this_cpu_write(softnet_data.xmit.skip_txqueue, skip);
> +}
> +EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue);
> +#endif /* CONFIG_NET_EGRESS */
> +
> +#ifdef CONFIG_NET_XGRESS
> +static int tc_run(struct tcx_entry *entry, struct sk_buff *skb)
> +{
> + int ret = TC_ACT_UNSPEC;
> #ifdef CONFIG_NET_CLS_ACT
> - struct mini_Qdisc *miniq = rcu_dereference_bh(dev->miniq_egress);
> - struct tcf_result cl_res;
> + struct mini_Qdisc *miniq = rcu_dereference_bh(entry->miniq);
> + struct tcf_result res;
>
> if (!miniq)
> - return skb;
> + return ret;
>
> - /* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
> tc_skb_cb(skb)->mru = 0;
> tc_skb_cb(skb)->post_ct = false;
> - mini_qdisc_bstats_cpu_update(miniq, skb);
>
> - switch (tcf_classify(skb, miniq->block, miniq->filter_list, &cl_res, false)) {
> + mini_qdisc_bstats_cpu_update(miniq, skb);
> + ret = tcf_classify(skb, miniq->block, miniq->filter_list, &res, false);
> + /* Only tcf related quirks below. */
> + switch (ret) {
> + case TC_ACT_SHOT:
> + mini_qdisc_qstats_cpu_drop(miniq);
> + break;
> case TC_ACT_OK:
> case TC_ACT_RECLASSIFY:
> - skb->tc_index = TC_H_MIN(cl_res.classid);
> + skb->tc_index = TC_H_MIN(res.classid);
> break;
> + }
> +#endif /* CONFIG_NET_CLS_ACT */
> + return ret;
> +}
> +
> +static DEFINE_STATIC_KEY_FALSE(tcx_needed_key);
> +
> +void tcx_inc(void)
> +{
> + static_branch_inc(&tcx_needed_key);
> +}
> +
> +void tcx_dec(void)
> +{
> + static_branch_dec(&tcx_needed_key);
> +}
> +
> +static __always_inline enum tcx_action_base
> +tcx_run(const struct bpf_mprog_entry *entry, struct sk_buff *skb,
> + const bool needs_mac)
> +{
> + const struct bpf_mprog_fp *fp;
> + const struct bpf_prog *prog;
> + int ret = TCX_NEXT;
> +
> + if (needs_mac)
> + __skb_push(skb, skb->mac_len);
> + bpf_mprog_foreach_prog(entry, fp, prog) {
> + bpf_compute_data_pointers(skb);
> + ret = bpf_prog_run(prog, skb);
> + if (ret != TCX_NEXT)
> + break;
> + }
> + if (needs_mac)
> + __skb_pull(skb, skb->mac_len);
> + return tcx_action_code(skb, ret);
> +}
> +
> +static __always_inline struct sk_buff *
> +sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
> + struct net_device *orig_dev, bool *another)
> +{
> + struct bpf_mprog_entry *entry = rcu_dereference_bh(skb->dev->tcx_ingress);
> + int sch_ret;
> +
> + if (!entry)
> + return skb;
> + if (*pt_prev) {
> + *ret = deliver_skb(skb, *pt_prev, orig_dev);
> + *pt_prev = NULL;
> + }
> +
> + qdisc_skb_cb(skb)->pkt_len = skb->len;
> + tcx_set_ingress(skb, true);
> +
> + if (static_branch_unlikely(&tcx_needed_key)) {
> + sch_ret = tcx_run(entry, skb, true);
> + if (sch_ret != TC_ACT_UNSPEC)
> + goto ingress_verdict;
> + }
> + sch_ret = tc_run(tcx_entry(entry), skb);
> +ingress_verdict:
> + switch (sch_ret) {
> + case TC_ACT_REDIRECT:
> + /* skb_mac_header check was done by BPF, so we can safely
> + * push the L2 header back before redirecting to another
> + * netdev.
> + */
> + __skb_push(skb, skb->mac_len);
> + if (skb_do_redirect(skb) == -EAGAIN) {
> + __skb_pull(skb, skb->mac_len);
> + *another = true;
> + break;
> + }
> + *ret = NET_RX_SUCCESS;
> + return NULL;
> case TC_ACT_SHOT:
> - mini_qdisc_qstats_cpu_drop(miniq);
> - *ret = NET_XMIT_DROP;
> - kfree_skb_reason(skb, SKB_DROP_REASON_TC_EGRESS);
> + kfree_skb_reason(skb, SKB_DROP_REASON_TC_INGRESS);
> + *ret = NET_RX_DROP;
> return NULL;
> + /* used by tc_run */
> case TC_ACT_STOLEN:
> case TC_ACT_QUEUED:
> case TC_ACT_TRAP:
> - *ret = NET_XMIT_SUCCESS;
> consume_skb(skb);
> + fallthrough;
> + case TC_ACT_CONSUMED:
> + *ret = NET_RX_SUCCESS;
> return NULL;
> + }
> +
> + return skb;
> +}
> +
> +static __always_inline struct sk_buff *
> +sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
> +{
> + struct bpf_mprog_entry *entry = rcu_dereference_bh(dev->tcx_egress);
> + int sch_ret;
> +
> + if (!entry)
> + return skb;
> +
> +	/* qdisc_skb_cb(skb)->pkt_len was already set and
> +	 * tcx_set_ingress() already called by the caller.
> +	 */
> + if (static_branch_unlikely(&tcx_needed_key)) {
> + sch_ret = tcx_run(entry, skb, false);
> + if (sch_ret != TC_ACT_UNSPEC)
> + goto egress_verdict;
> + }
> + sch_ret = tc_run(tcx_entry(entry), skb);
> +egress_verdict:
> + switch (sch_ret) {
> case TC_ACT_REDIRECT:
> /* No need to push/pop skb's mac_header here on egress! */
> skb_do_redirect(skb);
> *ret = NET_XMIT_SUCCESS;
> return NULL;
> - default:
> - break;
> + case TC_ACT_SHOT:
> + kfree_skb_reason(skb, SKB_DROP_REASON_TC_EGRESS);
> + *ret = NET_XMIT_DROP;
> + return NULL;
> + /* used by tc_run */
> + case TC_ACT_STOLEN:
> + case TC_ACT_QUEUED:
> + case TC_ACT_TRAP:
> + *ret = NET_XMIT_SUCCESS;
> + return NULL;
> }
> -#endif /* CONFIG_NET_CLS_ACT */
>
> return skb;
> }
> -
> -static struct netdev_queue *
> -netdev_tx_queue_mapping(struct net_device *dev, struct sk_buff *skb)
> -{
> - int qm = skb_get_queue_mapping(skb);
> -
> - return netdev_get_tx_queue(dev, netdev_cap_txqueue(dev, qm));
> -}
> -
> -static bool netdev_xmit_txqueue_skipped(void)
> +#else
> +static __always_inline struct sk_buff *
> +sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
> + struct net_device *orig_dev, bool *another)
> {
> - return __this_cpu_read(softnet_data.xmit.skip_txqueue);
> + return skb;
> }
>
> -void netdev_xmit_skip_txqueue(bool skip)
> +static __always_inline struct sk_buff *
> +sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
> {
> - __this_cpu_write(softnet_data.xmit.skip_txqueue, skip);
> + return skb;
> }
> -EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue);
> -#endif /* CONFIG_NET_EGRESS */
> +#endif /* CONFIG_NET_XGRESS */
>
> #ifdef CONFIG_XPS
> static int __get_xps_queue_idx(struct net_device *dev, struct sk_buff *skb,
> @@ -4128,9 +4257,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> skb_update_prio(skb);
>
> qdisc_pkt_len_init(skb);
> -#ifdef CONFIG_NET_CLS_ACT
> - skb->tc_at_ingress = 0;
> -#endif
> + tcx_set_ingress(skb, false);
> #ifdef CONFIG_NET_EGRESS
> if (static_branch_unlikely(&egress_needed_key)) {
> if (nf_hook_egress_active()) {
> @@ -5064,72 +5191,6 @@ int (*br_fdb_test_addr_hook)(struct net_device *dev,
> EXPORT_SYMBOL_GPL(br_fdb_test_addr_hook);
> #endif
>
> -static inline struct sk_buff *
> -sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
> - struct net_device *orig_dev, bool *another)
> -{
> -#ifdef CONFIG_NET_CLS_ACT
> - struct mini_Qdisc *miniq = rcu_dereference_bh(skb->dev->miniq_ingress);
> - struct tcf_result cl_res;
> -
> - /* If there's at least one ingress present somewhere (so
> - * we get here via enabled static key), remaining devices
> - * that are not configured with an ingress qdisc will bail
> - * out here.
> - */
> - if (!miniq)
> - return skb;
> -
> - if (*pt_prev) {
> - *ret = deliver_skb(skb, *pt_prev, orig_dev);
> - *pt_prev = NULL;
> - }
> -
> - qdisc_skb_cb(skb)->pkt_len = skb->len;
> - tc_skb_cb(skb)->mru = 0;
> - tc_skb_cb(skb)->post_ct = false;
> - skb->tc_at_ingress = 1;
> - mini_qdisc_bstats_cpu_update(miniq, skb);
> -
> - switch (tcf_classify(skb, miniq->block, miniq->filter_list, &cl_res, false)) {
> - case TC_ACT_OK:
> - case TC_ACT_RECLASSIFY:
> - skb->tc_index = TC_H_MIN(cl_res.classid);
> - break;
> - case TC_ACT_SHOT:
> - mini_qdisc_qstats_cpu_drop(miniq);
> - kfree_skb_reason(skb, SKB_DROP_REASON_TC_INGRESS);
> - *ret = NET_RX_DROP;
> - return NULL;
> - case TC_ACT_STOLEN:
> - case TC_ACT_QUEUED:
> - case TC_ACT_TRAP:
> - consume_skb(skb);
> - *ret = NET_RX_SUCCESS;
> - return NULL;
> - case TC_ACT_REDIRECT:
> - /* skb_mac_header check was done by cls/act_bpf, so
> - * we can safely push the L2 header back before
> - * redirecting to another netdev
> - */
> - __skb_push(skb, skb->mac_len);
> - if (skb_do_redirect(skb) == -EAGAIN) {
> - __skb_pull(skb, skb->mac_len);
> - *another = true;
> - break;
> - }
> - *ret = NET_RX_SUCCESS;
> - return NULL;
> - case TC_ACT_CONSUMED:
> - *ret = NET_RX_SUCCESS;
> - return NULL;
> - default:
> - break;
> - }
> -#endif /* CONFIG_NET_CLS_ACT */
> - return skb;
> -}
> -
> /**
> * netdev_is_rx_handler_busy - check if receive handler is registered
> * @dev: device to check
> @@ -10834,7 +10895,7 @@ void unregister_netdevice_many_notify(struct list_head *head,
>
> /* Shutdown queueing discipline. */
> dev_shutdown(dev);
> -
> + dev_tcx_uninstall(dev);
> dev_xdp_uninstall(dev);
> bpf_dev_bound_netdev_unregister(dev);
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 06ba0e56e369..e39a8a20dd10 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -9312,7 +9312,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
> __u8 value_reg = si->dst_reg;
> __u8 skb_reg = si->src_reg;
>
> -#ifdef CONFIG_NET_CLS_ACT
> +#ifdef CONFIG_NET_XGRESS
> /* If the tstamp_type is read,
> * the bpf prog is aware the tstamp could have delivery time.
> * Thus, read skb->tstamp as is if tstamp_type_access is true.
> @@ -9346,7 +9346,7 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
> __u8 value_reg = si->src_reg;
> __u8 skb_reg = si->dst_reg;
>
> -#ifdef CONFIG_NET_CLS_ACT
> +#ifdef CONFIG_NET_XGRESS
> /* If the tstamp_type is read,
> * the bpf prog is aware the tstamp could have delivery time.
> * Thus, write skb->tstamp as is if tstamp_type_access is true.
> diff --git a/net/sched/Kconfig b/net/sched/Kconfig
> index 4b95cb1ac435..470c70deffe2 100644
> --- a/net/sched/Kconfig
> +++ b/net/sched/Kconfig
> @@ -347,8 +347,7 @@ config NET_SCH_FQ_PIE
> config NET_SCH_INGRESS
> tristate "Ingress/classifier-action Qdisc"
> depends on NET_CLS_ACT
> - select NET_INGRESS
> - select NET_EGRESS
> + select NET_XGRESS
> help
> Say Y here if you want to use classifiers for incoming and/or outgoing
> packets. This qdisc doesn't do anything else besides running classifiers,
> @@ -679,6 +678,7 @@ config NET_EMATCH_IPT
> config NET_CLS_ACT
> bool "Actions"
> select NET_CLS
> + select NET_XGRESS
> help
> Say Y here if you want to use traffic control actions. Actions
> get attached to classifiers and are invoked after a successful
> diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
> index e43a45499372..04e886f6cee4 100644
> --- a/net/sched/sch_ingress.c
> +++ b/net/sched/sch_ingress.c
> @@ -13,6 +13,7 @@
> #include <net/netlink.h>
> #include <net/pkt_sched.h>
> #include <net/pkt_cls.h>
> +#include <net/tcx.h>
>
> struct ingress_sched_data {
> struct tcf_block *block;
> @@ -78,6 +79,8 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
> {
> struct ingress_sched_data *q = qdisc_priv(sch);
> struct net_device *dev = qdisc_dev(sch);
> + struct bpf_mprog_entry *entry;
> + bool created;
> int err;
>
> if (sch->parent != TC_H_INGRESS)
> @@ -85,7 +88,13 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
>
> net_inc_ingress_queue();
>
> - mini_qdisc_pair_init(&q->miniqp, sch, &dev->miniq_ingress);
> + entry = tcx_entry_fetch_or_create(dev, true, &created);
> + if (!entry)
> + return -ENOMEM;
> + tcx_miniq_set_active(entry, true);
> + mini_qdisc_pair_init(&q->miniqp, sch, &tcx_entry(entry)->miniq);
> + if (created)
> + tcx_entry_update(dev, entry, true);
>
> q->block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
> q->block_info.chain_head_change = clsact_chain_head_change;
> @@ -103,11 +112,22 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
> static void ingress_destroy(struct Qdisc *sch)
> {
> struct ingress_sched_data *q = qdisc_priv(sch);
> + struct net_device *dev = qdisc_dev(sch);
> + struct bpf_mprog_entry *entry = rtnl_dereference(dev->tcx_ingress);
>
> if (sch->parent != TC_H_INGRESS)
> return;
>
> tcf_block_put_ext(q->block, sch, &q->block_info);
> +
> + if (entry) {
> + tcx_miniq_set_active(entry, false);
> + if (!tcx_entry_is_active(entry)) {
> + tcx_entry_update(dev, NULL, false);
> + tcx_entry_free(entry);
> + }
> + }
> +
> net_dec_ingress_queue();
> }
>
> @@ -223,6 +243,8 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
> {
> struct clsact_sched_data *q = qdisc_priv(sch);
> struct net_device *dev = qdisc_dev(sch);
> + struct bpf_mprog_entry *entry;
> + bool created;
> int err;
>
> if (sch->parent != TC_H_CLSACT)
> @@ -231,7 +253,13 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
> net_inc_ingress_queue();
> net_inc_egress_queue();
>
> - mini_qdisc_pair_init(&q->miniqp_ingress, sch, &dev->miniq_ingress);
> + entry = tcx_entry_fetch_or_create(dev, true, &created);
> + if (!entry)
> + return -ENOMEM;
> + tcx_miniq_set_active(entry, true);
> + mini_qdisc_pair_init(&q->miniqp_ingress, sch, &tcx_entry(entry)->miniq);
> + if (created)
> + tcx_entry_update(dev, entry, true);
>
> q->ingress_block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
> q->ingress_block_info.chain_head_change = clsact_chain_head_change;
> @@ -244,7 +272,13 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
>
> mini_qdisc_pair_block_init(&q->miniqp_ingress, q->ingress_block);
>
> - mini_qdisc_pair_init(&q->miniqp_egress, sch, &dev->miniq_egress);
> + entry = tcx_entry_fetch_or_create(dev, false, &created);
> + if (!entry)
> + return -ENOMEM;
> + tcx_miniq_set_active(entry, true);
> + mini_qdisc_pair_init(&q->miniqp_egress, sch, &tcx_entry(entry)->miniq);
> + if (created)
> + tcx_entry_update(dev, entry, false);
>
> q->egress_block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS;
> q->egress_block_info.chain_head_change = clsact_chain_head_change;
> @@ -256,12 +290,31 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
> static void clsact_destroy(struct Qdisc *sch)
> {
> struct clsact_sched_data *q = qdisc_priv(sch);
> + struct net_device *dev = qdisc_dev(sch);
> + struct bpf_mprog_entry *ingress_entry = rtnl_dereference(dev->tcx_ingress);
> + struct bpf_mprog_entry *egress_entry = rtnl_dereference(dev->tcx_egress);
>
> if (sch->parent != TC_H_CLSACT)
> return;
>
> - tcf_block_put_ext(q->egress_block, sch, &q->egress_block_info);
> tcf_block_put_ext(q->ingress_block, sch, &q->ingress_block_info);
> + tcf_block_put_ext(q->egress_block, sch, &q->egress_block_info);
> +
> + if (ingress_entry) {
> + tcx_miniq_set_active(ingress_entry, false);
> + if (!tcx_entry_is_active(ingress_entry)) {
> + tcx_entry_update(dev, NULL, true);
> + tcx_entry_free(ingress_entry);
> + }
> + }
> +
> + if (egress_entry) {
> + tcx_miniq_set_active(egress_entry, false);
> + if (!tcx_entry_is_active(egress_entry)) {
> + tcx_entry_update(dev, NULL, false);
> + tcx_entry_free(egress_entry);
> + }
> + }
>
> net_dec_ingress_queue();
> net_dec_egress_queue();
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 1c166870cdf3..47b76925189f 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -1036,6 +1036,8 @@ enum bpf_attach_type {
> BPF_LSM_CGROUP,
> BPF_STRUCT_OPS,
> BPF_NETFILTER,
> + BPF_TCX_INGRESS,
> + BPF_TCX_EGRESS,
> __MAX_BPF_ATTACH_TYPE
> };
>
> @@ -1053,7 +1055,7 @@ enum bpf_link_type {
> BPF_LINK_TYPE_KPROBE_MULTI = 8,
> BPF_LINK_TYPE_STRUCT_OPS = 9,
> BPF_LINK_TYPE_NETFILTER = 10,
> -
> + BPF_LINK_TYPE_TCX = 11,
> MAX_BPF_LINK_TYPE,
> };
>
> @@ -1569,13 +1571,13 @@ union bpf_attr {
> __u32 map_fd; /* struct_ops to attach */
> };
> union {
> - __u32 target_fd; /* object to attach to */
> - __u32 target_ifindex; /* target ifindex */
> + __u32 target_fd; /* target object to attach to or ... */
> + __u32 target_ifindex; /* target ifindex */
> };
> __u32 attach_type; /* attach type */
> __u32 flags; /* extra flags */
> union {
> - __u32 target_btf_id; /* btf_id of target to attach to */
> + __u32 target_btf_id; /* btf_id of target to attach to */
> struct {
> __aligned_u64 iter_info; /* extra bpf_iter_link_info */
> __u32 iter_info_len; /* iter_info length */
> @@ -1609,6 +1611,13 @@ union bpf_attr {
> __s32 priority;
> __u32 flags;
> } netfilter;
> + struct {
> + union {
> + __u32 relative_fd;
> + __u32 relative_id;
> + };
> + __u64 expected_revision;
> + } tcx;
> };
> } link_create;
>
> @@ -6217,6 +6226,19 @@ struct bpf_sock_tuple {
> };
> };
>
> +/* (Simplified) user return codes for tcx prog type.
> + * A valid tcx program must return one of these defined values. All other
> + * return codes are reserved for future use. Must remain compatible with
> + * their TC_ACT_* counter-parts. For compatibility in behavior, unknown
> + * return codes are mapped to TCX_NEXT.
> + */
> +enum tcx_action_base {
> + TCX_NEXT = -1,
> + TCX_PASS = 0,
> + TCX_DROP = 2,
> + TCX_REDIRECT = 7,
> +};
> +
> struct bpf_xdp_sock {
> __u32 queue_id;
> };
> @@ -6499,6 +6521,10 @@ struct bpf_link_info {
> } event; /* BPF_PERF_EVENT_EVENT */
> };
> } perf_event;
> + struct {
> + __u32 ifindex;
> + __u32 attach_type;
> + } tcx;
> };
> } __attribute__((aligned(8)));
>
> --
> 2.34.1
>
>
--
Regards
Yafang
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next v6 2/8] bpf: Add fd-based tcx multi-prog infra with link support
2023-07-20 2:13 ` Yafang Shao
@ 2023-07-21 12:53 ` Daniel Borkmann
0 siblings, 0 replies; 14+ messages in thread
From: Daniel Borkmann @ 2023-07-21 12:53 UTC (permalink / raw)
To: Yafang Shao
Cc: ast, andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu,
joe, toke, davem, bpf, netdev
On 7/20/23 4:13 AM, Yafang Shao wrote:
> On Wed, Jul 19, 2023 at 10:11 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
[...]
>> +static const struct bpf_link_ops tcx_link_lops = {
>> + .release = tcx_link_release,
>> + .detach = tcx_link_detach,
>> + .dealloc = tcx_link_dealloc,
>> + .update_prog = tcx_link_update,
>> + .show_fdinfo = tcx_link_fdinfo,
>> + .fill_link_info = tcx_link_fill_info,
>
> Should we show the tc link info in `bpftool link show` as well? I
> believe that `bpftool link show` is the appropriate command to display
> comprehensive information about all links.
Yep, good idea. I'll add this to my todo list to tackle once I'm back
from travel.
Thanks,
Daniel
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH bpf-next v6 2/8] bpf: Add fd-based tcx multi-prog infra with link support
2023-07-19 14:08 ` [PATCH bpf-next v6 2/8] bpf: Add fd-based tcx multi-prog infra with link support Daniel Borkmann
2023-07-20 2:13 ` Yafang Shao
@ 2023-07-21 14:57 ` Petr Machata
2023-07-21 23:43 ` Daniel Borkmann
1 sibling, 1 reply; 14+ messages in thread
From: Petr Machata @ 2023-07-21 14:57 UTC (permalink / raw)
To: Daniel Borkmann
Cc: ast, andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu,
joe, toke, davem, bpf, netdev
[-- Attachment #1: Type: text/plain, Size: 1211 bytes --]
As of this patch (commit e420bed02507), TC qdisc installation and/or
removal cause memory access issues in the system.
A semi-minimal reproducer is:
bash-5.2# ip l a name v1 type veth peer name v2
bash-5.2# ip l s dev v1 up
bash-5.2# ip l s dev v2 up
bash-5.2# tc q a dev v1 ingress
bash-5.2# tc q d dev v1 ingress
bash-5.2# tc q a dev v1 ingress
bash-5.2# tc q d dev v1 ingress
It's a bit finicky, but only a little. For me, the first two "tc q"
operations never triggered a splat. Then it could take a few "tc q a" /
"tc q d" iterations to get it to splat. So it looks like maybe the first
"tc q d" is the problematic bit? And then there's some likelihood of
failing on any following "tc q" operation. The above in particular
produced three warning splats for me (attached as decoded.txt,
decoded2.txt and decoded3.txt). Probing further:
bash-5.2# tc q a dev v1 ingress
Produced two more splats from KASAN (decoded4.txt and decoded5.txt),
which look more serious.
Further attempts to prod the system deadlock it, I guess because RTNL
was left locked.
Reverting e420bed02507 (along with fe20ce3a5126 + 55cc3768473e, which
fail to build without it) makes net-next/main work again.
[-- Attachment #2: decoded.txt --]
[-- Type: text/plain, Size: 13473 bytes --]
[ 337.885866] ------------[ cut here ]------------
[ 337.886351] ODEBUG: activate active (active state 1) object: ffff888008a7a000 object type: rcu_head hint: 0x0
[ 337.887126] WARNING: CPU: 0 PID: 171 at lib/debugobjects.c:514 debug_print_object (/home/petr/src/linux_mlxsw/lib/debugobjects.c:514 (discriminator 2))
[ 337.887813] Modules linked in: sch_ingress veth
[ 337.888504] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
[ 337.888996] RIP: 0010:debug_print_object (/home/petr/src/linux_mlxsw/lib/debugobjects.c:514 (discriminator 2))
[ 337.889324] Code: 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 49 41 56 48 8b 14 dd 80 e3 61 83 4c 89 e6 48 c7 c7 e0 d6 61 83 e8 52 8b 2e ff <0f> 0b 58 83 05 9c b1 58 02 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e
All code
========
0: 00 fc add %bh,%ah
2: ff (bad)
3: df 48 89 fisttps -0x77(%rax)
6: fa cli
7: 48 c1 ea 03 shr $0x3,%rdx
b: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1)
f: 75 49 jne 0x5a
11: 41 56 push %r14
13: 48 8b 14 dd 80 e3 61 mov -0x7c9e1c80(,%rbx,8),%rdx
1a: 83
1b: 4c 89 e6 mov %r12,%rsi
1e: 48 c7 c7 e0 d6 61 83 mov $0xffffffff8361d6e0,%rdi
25: e8 52 8b 2e ff call 0xffffffffff2e8b7c
2a:* 0f 0b ud2 <-- trapping instruction
2c: 58 pop %rax
2d: 83 05 9c b1 58 02 01 addl $0x1,0x258b19c(%rip) # 0x258b1d0
34: 48 83 c4 18 add $0x18,%rsp
38: 5b pop %rbx
39: 5d pop %rbp
3a: 41 5c pop %r12
3c: 41 5d pop %r13
3e: 41 5e pop %r14
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 58 pop %rax
3: 83 05 9c b1 58 02 01 addl $0x1,0x258b19c(%rip) # 0x258b1a6
a: 48 83 c4 18 add $0x18,%rsp
e: 5b pop %rbx
f: 5d pop %rbp
10: 41 5c pop %r12
12: 41 5d pop %r13
14: 41 5e pop %r14
[ 337.890435] RSP: 0018:ffffc9000009f1c8 EFLAGS: 00010286
[ 337.890798] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
[ 337.891213] RDX: ffff88800c8c9fc0 RSI: ffffffff813f13cb RDI: 0000000000000001
[ 337.891639] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
[ 337.892038] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff8361dd40
[ 337.892460] R13: ffffffff834daf80 R14: 0000000000000000 R15: ffff8880093eee90
[ 337.892865] FS: 00007f0089130740(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
[ 337.893339] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 337.893673] CR2: 000055958cefedc0 CR3: 000000000c0af001 CR4: 0000000000370ef0
[ 337.894071] Call Trace:
[ 337.894244] <TASK>
[ 337.894380] ? __warn (/home/petr/src/linux_mlxsw/kernel/panic.c:673)
[ 337.894585] ? debug_print_object (/home/petr/src/linux_mlxsw/lib/debugobjects.c:514 (discriminator 2))
[ 337.894865] ? report_bug (/home/petr/src/linux_mlxsw/lib/bug.c:180 /home/petr/src/linux_mlxsw/lib/bug.c:219)
[ 337.895154] ? handle_bug (/home/petr/src/linux_mlxsw/arch/x86/kernel/traps.c:324 (discriminator 1))
[ 337.895390] ? exc_invalid_op (/home/petr/src/linux_mlxsw/arch/x86/kernel/traps.c:345 (discriminator 1))
[ 337.895628] ? asm_exc_invalid_op (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/idtentry.h:568)
[ 337.895890] ? __warn_printk (/home/petr/src/linux_mlxsw/kernel/panic.c:712)
[ 337.896124] ? debug_print_object (/home/petr/src/linux_mlxsw/lib/debugobjects.c:514 (discriminator 2))
[ 337.896414] ? debug_print_object (/home/petr/src/linux_mlxsw/lib/debugobjects.c:514 (discriminator 2))
[ 337.896680] ? _raw_spin_unlock_irqrestore (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:42 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:77 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:135 /home/petr/src/linux_mlxsw/./include/linux/spinlock_api_smp.h:151 /home/petr/src/linux_mlxsw/kernel/locking/spinlock.c:194)
[ 337.896981] debug_object_activate (/home/petr/src/linux_mlxsw/lib/debugobjects.c:734)
[ 337.897274] ? debug_object_destroy (/home/petr/src/linux_mlxsw/lib/debugobjects.c:702)
[ 337.897559] ? mark_held_locks (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:4281 (discriminator 1))
[ 337.897811] ? kvfree_call_rcu (/home/petr/src/linux_mlxsw/kernel/rcu/rcu.h:227 /home/petr/src/linux_mlxsw/kernel/rcu/tree.c:3359)
[ 337.898049] kvfree_call_rcu (/home/petr/src/linux_mlxsw/kernel/rcu/rcu.h:227 /home/petr/src/linux_mlxsw/kernel/rcu/tree.c:3359)
[ 337.898300] ? __tcf_block_put (/home/petr/src/linux_mlxsw/net/sched/cls_api.c:535 /home/petr/src/linux_mlxsw/net/sched/cls_api.c:530 /home/petr/src/linux_mlxsw/net/sched/cls_api.c:1301)
[ 337.898569] ingress_destroy (/home/petr/src/linux_mlxsw/net/sched/sch_ingress.c:131) sch_ingress
[ 337.898877] ? clsact_init (/home/petr/src/linux_mlxsw/net/sched/sch_ingress.c:113) sch_ingress
[ 337.899192] __qdisc_destroy (/home/petr/src/linux_mlxsw/net/sched/sch_generic.c:1065)
[ 337.899443] qdisc_destroy (/home/petr/src/linux_mlxsw/net/sched/sch_generic.c:1079)
[ 337.899669] qdisc_graft (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1134)
[ 337.899939] ? tc_dump_tclass (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1076)
[ 337.900248] tc_get_qdisc (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1541)
[ 337.900479] ? tc_ctl_tclass (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1475)
[ 337.900713] ? rtnetlink_rcv_msg (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6421)
[ 337.901034] ? cap_capable (/home/petr/src/linux_mlxsw/security/commoncap.c:102)
[ 337.901384] ? lock_is_held_type (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:467 (discriminator 4) /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5833 (discriminator 4))
[ 337.901641] ? tc_ctl_tclass (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1475)
[ 337.901902] rtnetlink_rcv_msg (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6423)
[ 337.902196] ? rtnl_dump_ifinfo (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6319)
[ 337.902478] ? lockdep_hardirqs_on_prepare (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5000)
[ 337.902797] ? lockdep_hardirqs_on_prepare (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5000)
[ 337.903101] ? find_held_lock (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5251 (discriminator 1))
[ 337.903378] netlink_rcv_skb (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:2547)
[ 337.903626] ? rtnl_dump_ifinfo (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6319)
[ 337.903902] ? netlink_ack (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:2523)
[ 337.904139] ? lock_sync (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5729)
[ 337.904409] ? netlink_deliver_tap (/home/petr/src/linux_mlxsw/./include/linux/rcupdate.h:308 /home/petr/src/linux_mlxsw/./include/linux/rcupdate.h:782 /home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:340)
[ 337.904679] ? is_vmalloc_addr (/home/petr/src/linux_mlxsw/mm/vmalloc.c:83)
[ 337.904933] netlink_unicast (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1340 /home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1365)
[ 337.905183] ? netlink_attachskb (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1350)
[ 337.905462] ? __sanitizer_cov_trace_switch (/home/petr/src/linux_mlxsw/kernel/kcov.c:340 (discriminator 1))
[ 337.905778] ? __check_object_size (/home/petr/src/linux_mlxsw/mm/usercopy.c:113 /home/petr/src/linux_mlxsw/mm/usercopy.c:145 /home/petr/src/linux_mlxsw/mm/usercopy.c:254 /home/petr/src/linux_mlxsw/mm/usercopy.c:213)
[ 337.906050] netlink_sendmsg (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1911)
[ 337.906319] ? netlink_unicast (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1830)
[ 337.906615] ? netlink_unicast (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1830)
[ 337.906916] ____sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:728 (discriminator 1) /home/petr/src/linux_mlxsw/net/socket.c:748 (discriminator 1) /home/petr/src/linux_mlxsw/net/socket.c:2494 (discriminator 1))
[ 337.907170] ? copy_msghdr_from_user (/home/petr/src/linux_mlxsw/net/socket.c:2420)
[ 337.907481] ? sock_read_iter (/home/petr/src/linux_mlxsw/net/socket.c:2440)
[ 337.907739] ? __lock_acquire (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/bitops.h:228 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/bitops.h:240 /home/petr/src/linux_mlxsw/./include/asm-generic/bitops/instrumented-non-atomic.h:142 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:228 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:3788 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:3844 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5144)
[ 337.908007] ___sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:2550)
[ 337.908275] ? do_recvmmsg (/home/petr/src/linux_mlxsw/net/socket.c:2537)
[ 337.908547] ? local_clock_noinstr (/home/petr/src/linux_mlxsw/kernel/sched/clock.c:301 (discriminator 1))
[ 337.908810] ? __fget_light (/home/petr/src/linux_mlxsw/fs/file.c:1027)
[ 337.909080] __sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:2579)
[ 337.909343] ? __sys_sendmsg_sock (/home/petr/src/linux_mlxsw/net/socket.c:2565)
[ 337.909607] ? xfd_validate_state (/home/petr/src/linux_mlxsw/arch/x86/kernel/fpu/xstate.c:1411 /home/petr/src/linux_mlxsw/arch/x86/kernel/fpu/xstate.c:1455)
[ 337.909899] ? syscall_enter_from_user_mode (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:42 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:77 /home/petr/src/linux_mlxsw/kernel/entry/common.c:111)
[ 337.910245] do_syscall_64 (/home/petr/src/linux_mlxsw/arch/x86/entry/common.c:50 (discriminator 1) /home/petr/src/linux_mlxsw/arch/x86/entry/common.c:80 (discriminator 1))
[ 337.910480] entry_SYSCALL_64_after_hwframe (/home/petr/src/linux_mlxsw/arch/x86/entry/entry_64.S:120)
[ 337.910813] RIP: 0033:0x7f008946a8b4
[ 337.911091] Code: 15 59 f5 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 00 f3 0f 1e fa 80 3d 2d 7d 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
All code
========
0: 15 59 f5 0b 00 adc $0xbf559,%eax
5: f7 d8 neg %eax
7: 64 89 02 mov %eax,%fs:(%rdx)
a: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
11: eb b5 jmp 0xffffffffffffffc8
13: 0f 1f 00 nopl (%rax)
16: f3 0f 1e fa endbr64
1a: 80 3d 2d 7d 0c 00 00 cmpb $0x0,0xc7d2d(%rip) # 0xc7d4e
21: 74 13 je 0x36
23: b8 2e 00 00 00 mov $0x2e,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 4c ja 0x7e
32: c3 ret
33: 0f 1f 00 nopl (%rax)
36: 55 push %rbp
37: 48 89 e5 mov %rsp,%rbp
3a: 48 83 ec 20 sub $0x20,%rsp
3e: 89 .byte 0x89
3f: 55 push %rbp
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 4c ja 0x54
8: c3 ret
9: 0f 1f 00 nopl (%rax)
c: 55 push %rbp
d: 48 89 e5 mov %rsp,%rbp
10: 48 83 ec 20 sub $0x20,%rsp
14: 89 .byte 0x89
15: 55 push %rbp
[ 337.912135] RSP: 002b:00007ffd615b18c8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
[ 337.912598] RAX: ffffffffffffffda RBX: 000055958cf26f80 RCX: 00007f008946a8b4
[ 337.913006] RDX: 0000000000000000 RSI: 00007ffd615b1940 RDI: 0000000000000003
[ 337.913435] RBP: 00007ffd615b19b0 R08: 0000000064bab34a R09: 0000000000000001
[ 337.913846] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffd615b1a30
[ 337.914302] R13: 0000000064bab34b R14: 000055958cf26f80 R15: 0000000000000000
[ 337.914793] </TASK>
[ 337.914929] irq event stamp: 167013
[ 337.915150] hardirqs last enabled at (167021): __up_console_sem (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:42 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:77 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:135 /home/petr/src/linux_mlxsw/kernel/printk/printk.c:347 /home/petr/src/linux_mlxsw/kernel/printk/printk.c:339)
[ 337.915684] hardirqs last disabled at (167030): __up_console_sem (/home/petr/src/linux_mlxsw/kernel/printk/printk.c:345 (discriminator 3))
[ 337.916182] softirqs last enabled at (166282): irq_exit_rcu (/home/petr/src/linux_mlxsw/kernel/softirq.c:427 /home/petr/src/linux_mlxsw/kernel/softirq.c:632 /home/petr/src/linux_mlxsw/kernel/softirq.c:644)
[ 337.916803] softirqs last disabled at (166245): irq_exit_rcu (/home/petr/src/linux_mlxsw/kernel/softirq.c:427 /home/petr/src/linux_mlxsw/kernel/softirq.c:632 /home/petr/src/linux_mlxsw/kernel/softirq.c:644)
[ 337.917302] ---[ end trace 0000000000000000 ]---
[-- Attachment #3: decoded2.txt --]
[-- Type: text/plain, Size: 13363 bytes --]
[ 337.918159] ------------[ cut here ]------------
[ 337.918626] ODEBUG: active_state active (active state 1) object: ffff888008a7a000 object type: rcu_head hint: 0x0
[ 337.920604] WARNING: CPU: 0 PID: 171 at lib/debugobjects.c:514 debug_print_object (/home/petr/src/linux_mlxsw/lib/debugobjects.c:514 (discriminator 2))
[ 337.921684] Modules linked in: sch_ingress veth
[ 337.923119] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
[ 337.924543] RIP: 0010:debug_print_object (/home/petr/src/linux_mlxsw/lib/debugobjects.c:514 (discriminator 2))
[ 337.925136] Code: 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 49 41 56 48 8b 14 dd 80 e3 61 83 4c 89 e6 48 c7 c7 e0 d6 61 83 e8 52 8b 2e ff <0f> 0b 58 83 05 9c b1 58 02 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e
All code
========
0: 00 fc add %bh,%ah
2: ff (bad)
3: df 48 89 fisttps -0x77(%rax)
6: fa cli
7: 48 c1 ea 03 shr $0x3,%rdx
b: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1)
f: 75 49 jne 0x5a
11: 41 56 push %r14
13: 48 8b 14 dd 80 e3 61 mov -0x7c9e1c80(,%rbx,8),%rdx
1a: 83
1b: 4c 89 e6 mov %r12,%rsi
1e: 48 c7 c7 e0 d6 61 83 mov $0xffffffff8361d6e0,%rdi
25: e8 52 8b 2e ff call 0xffffffffff2e8b7c
2a:* 0f 0b ud2 <-- trapping instruction
2c: 58 pop %rax
2d: 83 05 9c b1 58 02 01 addl $0x1,0x258b19c(%rip) # 0x258b1d0
34: 48 83 c4 18 add $0x18,%rsp
38: 5b pop %rbx
39: 5d pop %rbp
3a: 41 5c pop %r12
3c: 41 5d pop %r13
3e: 41 5e pop %r14
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 58 pop %rax
3: 83 05 9c b1 58 02 01 addl $0x1,0x258b19c(%rip) # 0x258b1a6
a: 48 83 c4 18 add $0x18,%rsp
e: 5b pop %rbx
f: 5d pop %rbp
10: 41 5c pop %r12
12: 41 5d pop %r13
14: 41 5e pop %r14
[ 337.926728] RSP: 0000:ffffc9000009f1c8 EFLAGS: 00010286
[ 337.927123] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
[ 337.927704] RDX: ffff88800c8c9fc0 RSI: ffffffff813f13cb RDI: 0000000000000001
[ 337.928362] RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
[ 337.928930] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff8361db20
[ 337.929541] R13: ffffffff834daf80 R14: 0000000000000000 R15: ffff8880093eee90
[ 337.929977] FS: 00007f0089130740(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
[ 337.930495] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 337.930842] CR2: 00007fd15cc3d000 CR3: 000000000c0af001 CR4: 0000000000370ef0
[ 337.931284] Call Trace:
[ 337.931443] <TASK>
[ 337.931579] ? __warn (/home/petr/src/linux_mlxsw/kernel/panic.c:673)
[ 337.931801] ? debug_print_object (/home/petr/src/linux_mlxsw/lib/debugobjects.c:514 (discriminator 2))
[ 337.932083] ? report_bug (/home/petr/src/linux_mlxsw/lib/bug.c:180 /home/petr/src/linux_mlxsw/lib/bug.c:219)
[ 337.932349] ? handle_bug (/home/petr/src/linux_mlxsw/arch/x86/kernel/traps.c:324 (discriminator 1))
[ 337.932576] ? exc_invalid_op (/home/petr/src/linux_mlxsw/arch/x86/kernel/traps.c:345 (discriminator 1))
[ 337.932816] ? asm_exc_invalid_op (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/idtentry.h:568)
[ 337.933086] ? __warn_printk (/home/petr/src/linux_mlxsw/kernel/panic.c:712)
[ 337.933347] ? debug_print_object (/home/petr/src/linux_mlxsw/lib/debugobjects.c:514 (discriminator 2))
[ 337.933616] ? debug_print_object (/home/petr/src/linux_mlxsw/lib/debugobjects.c:514 (discriminator 2))
[ 337.933878] ? _raw_spin_unlock_irqrestore (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:42 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:77 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:135 /home/petr/src/linux_mlxsw/./include/linux/spinlock_api_smp.h:151 /home/petr/src/linux_mlxsw/kernel/locking/spinlock.c:194)
[ 337.934192] debug_object_active_state (/home/petr/src/linux_mlxsw/lib/debugobjects.c:993 /home/petr/src/linux_mlxsw/lib/debugobjects.c:954)
[ 337.934500] ? debug_stats_show (/home/petr/src/linux_mlxsw/lib/debugobjects.c:956)
[ 337.934763] ? mark_held_locks (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:4281 (discriminator 1))
[ 337.935017] kvfree_call_rcu (/home/petr/src/linux_mlxsw/kernel/rcu/tree.c:3359 (discriminator 1))
[ 337.935271] ? __tcf_block_put (/home/petr/src/linux_mlxsw/net/sched/cls_api.c:535 /home/petr/src/linux_mlxsw/net/sched/cls_api.c:530 /home/petr/src/linux_mlxsw/net/sched/cls_api.c:1301)
[ 337.935537] ingress_destroy (/home/petr/src/linux_mlxsw/net/sched/sch_ingress.c:131) sch_ingress
[ 337.935852] ? clsact_init (/home/petr/src/linux_mlxsw/net/sched/sch_ingress.c:113) sch_ingress
[ 337.936188] __qdisc_destroy (/home/petr/src/linux_mlxsw/net/sched/sch_generic.c:1065)
[ 337.936461] qdisc_destroy (/home/petr/src/linux_mlxsw/net/sched/sch_generic.c:1079)
[ 337.936694] qdisc_graft (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1134)
[ 337.936939] ? tc_dump_tclass (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1076)
[ 337.937287] tc_get_qdisc (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1541)
[ 337.937545] ? tc_ctl_tclass (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1475)
[ 337.937809] ? rtnetlink_rcv_msg (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6421)
[ 337.938095] ? cap_capable (/home/petr/src/linux_mlxsw/security/commoncap.c:102)
[ 337.938360] ? lock_is_held_type (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:467 (discriminator 4) /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5833 (discriminator 4))
[ 337.938620] ? tc_ctl_tclass (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1475)
[ 337.938883] rtnetlink_rcv_msg (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6423)
[ 337.939320] ? rtnl_dump_ifinfo (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6319)
[ 337.939726] ? lockdep_hardirqs_on_prepare (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5000)
[ 337.940217] ? lockdep_hardirqs_on_prepare (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5000)
[ 337.940749] ? find_held_lock (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5251 (discriminator 1))
[ 337.941119] netlink_rcv_skb (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:2547)
[ 337.941659] ? rtnl_dump_ifinfo (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6319)
[ 337.942060] ? netlink_ack (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:2523)
[ 337.942478] ? lock_sync (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5729)
[ 337.942825] ? netlink_deliver_tap (/home/petr/src/linux_mlxsw/./include/linux/rcupdate.h:308 /home/petr/src/linux_mlxsw/./include/linux/rcupdate.h:782 /home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:340)
[ 337.943338] ? is_vmalloc_addr (/home/petr/src/linux_mlxsw/mm/vmalloc.c:83)
[ 337.943667] netlink_unicast (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1340 /home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1365)
[ 337.943947] ? netlink_attachskb (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1350)
[ 337.944312] ? __sanitizer_cov_trace_switch (/home/petr/src/linux_mlxsw/kernel/kcov.c:340 (discriminator 1))
[ 337.944706] ? __check_object_size (/home/petr/src/linux_mlxsw/mm/usercopy.c:113 /home/petr/src/linux_mlxsw/mm/usercopy.c:145 /home/petr/src/linux_mlxsw/mm/usercopy.c:254 /home/petr/src/linux_mlxsw/mm/usercopy.c:213)
[ 337.945042] netlink_sendmsg (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1911)
[ 337.945468] ? netlink_unicast (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1830)
[ 337.945748] ? netlink_unicast (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1830)
[ 337.945997] ____sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:728 (discriminator 1) /home/petr/src/linux_mlxsw/net/socket.c:748 (discriminator 1) /home/petr/src/linux_mlxsw/net/socket.c:2494 (discriminator 1))
[ 337.946273] ? copy_msghdr_from_user (/home/petr/src/linux_mlxsw/net/socket.c:2420)
[ 337.946560] ? sock_read_iter (/home/petr/src/linux_mlxsw/net/socket.c:2440)
[ 337.946819] ? __lock_acquire (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/bitops.h:228 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/bitops.h:240 /home/petr/src/linux_mlxsw/./include/asm-generic/bitops/instrumented-non-atomic.h:142 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:228 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:3788 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:3844 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5144)
[ 337.947097] ___sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:2550)
[ 337.947372] ? do_recvmmsg (/home/petr/src/linux_mlxsw/net/socket.c:2537)
[ 337.947653] ? local_clock_noinstr (/home/petr/src/linux_mlxsw/kernel/sched/clock.c:301 (discriminator 1))
[ 337.947909] ? __fget_light (/home/petr/src/linux_mlxsw/fs/file.c:1027)
[ 337.948164] __sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:2579)
[ 337.948417] ? __sys_sendmsg_sock (/home/petr/src/linux_mlxsw/net/socket.c:2565)
[ 337.948689] ? xfd_validate_state (/home/petr/src/linux_mlxsw/arch/x86/kernel/fpu/xstate.c:1411 /home/petr/src/linux_mlxsw/arch/x86/kernel/fpu/xstate.c:1455)
[ 337.948970] ? syscall_enter_from_user_mode (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:42 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:77 /home/petr/src/linux_mlxsw/kernel/entry/common.c:111)
[ 337.949300] do_syscall_64 (/home/petr/src/linux_mlxsw/arch/x86/entry/common.c:50 (discriminator 1) /home/petr/src/linux_mlxsw/arch/x86/entry/common.c:80 (discriminator 1))
[ 337.949530] entry_SYSCALL_64_after_hwframe (/home/petr/src/linux_mlxsw/arch/x86/entry/entry_64.S:120)
[ 337.949836] RIP: 0033:0x7f008946a8b4
[ 337.950059] Code: 15 59 f5 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 00 f3 0f 1e fa 80 3d 2d 7d 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
All code
========
0: 15 59 f5 0b 00 adc $0xbf559,%eax
5: f7 d8 neg %eax
7: 64 89 02 mov %eax,%fs:(%rdx)
a: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
11: eb b5 jmp 0xffffffffffffffc8
13: 0f 1f 00 nopl (%rax)
16: f3 0f 1e fa endbr64
1a: 80 3d 2d 7d 0c 00 00 cmpb $0x0,0xc7d2d(%rip) # 0xc7d4e
21: 74 13 je 0x36
23: b8 2e 00 00 00 mov $0x2e,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 4c ja 0x7e
32: c3 ret
33: 0f 1f 00 nopl (%rax)
36: 55 push %rbp
37: 48 89 e5 mov %rsp,%rbp
3a: 48 83 ec 20 sub $0x20,%rsp
3e: 89 .byte 0x89
3f: 55 push %rbp
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 4c ja 0x54
8: c3 ret
9: 0f 1f 00 nopl (%rax)
c: 55 push %rbp
d: 48 89 e5 mov %rsp,%rbp
10: 48 83 ec 20 sub $0x20,%rsp
14: 89 .byte 0x89
15: 55 push %rbp
[ 337.951115] RSP: 002b:00007ffd615b18c8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
[ 337.951576] RAX: ffffffffffffffda RBX: 000055958cf26f80 RCX: 00007f008946a8b4
[ 337.951983] RDX: 0000000000000000 RSI: 00007ffd615b1940 RDI: 0000000000000003
[ 337.952415] RBP: 00007ffd615b19b0 R08: 0000000064bab34a R09: 0000000000000001
[ 337.952825] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffd615b1a30
[ 337.953247] R13: 0000000064bab34b R14: 000055958cf26f80 R15: 0000000000000000
[ 337.953670] </TASK>
[ 337.953809] irq event stamp: 167935
[ 337.954012] hardirqs last enabled at (167943): __up_console_sem (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:42 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:77 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:135 /home/petr/src/linux_mlxsw/kernel/printk/printk.c:347 /home/petr/src/linux_mlxsw/kernel/printk/printk.c:339)
[ 337.954517] hardirqs last disabled at (167952): __up_console_sem (/home/petr/src/linux_mlxsw/kernel/printk/printk.c:345 (discriminator 3))
[ 337.955007] softirqs last enabled at (167216): irq_exit_rcu (/home/petr/src/linux_mlxsw/kernel/softirq.c:427 /home/petr/src/linux_mlxsw/kernel/softirq.c:632 /home/petr/src/linux_mlxsw/kernel/softirq.c:644)
[ 337.955509] softirqs last disabled at (167199): irq_exit_rcu (/home/petr/src/linux_mlxsw/kernel/softirq.c:427 /home/petr/src/linux_mlxsw/kernel/softirq.c:632 /home/petr/src/linux_mlxsw/kernel/softirq.c:644)
[ 337.955999] ---[ end trace 0000000000000000 ]---
[-- Attachment #4: decoded3.txt --]
[-- Type: text/plain, Size: 12055 bytes --]
[ 337.957029] ------------[ cut here ]------------
[ 337.957696] kvfree_call_rcu(): Double-freed call. rcu_head ffff888008a7a638
[ 337.961920] WARNING: CPU: 0 PID: 171 at kernel/rcu/tree.c:3361 kvfree_call_rcu (/home/petr/src/linux_mlxsw/kernel/rcu/tree.c:3361 (discriminator 1))
[ 337.963725] Modules linked in: sch_ingress veth
[ 337.964797] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
[ 337.965437] RIP: 0010:kvfree_call_rcu (/home/petr/src/linux_mlxsw/kernel/rcu/tree.c:3361 (discriminator 1))
[ 337.965764] Code: 00 00 00 e8 2a 4c e8 ff e9 35 01 00 00 4c 89 e2 48 c7 c6 a0 07 4e 83 48 c7 c7 40 df 4d 83 c6 05 29 35 07 03 01 e8 28 7c e0 ff <0f> 0b e9 db fa ff ff 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 08
All code
========
0: 00 00 add %al,(%rax)
2: 00 e8 add %ch,%al
4: 2a 4c e8 ff sub -0x1(%rax,%rbp,8),%cl
8: e9 35 01 00 00 jmp 0x142
d: 4c 89 e2 mov %r12,%rdx
10: 48 c7 c6 a0 07 4e 83 mov $0xffffffff834e07a0,%rsi
17: 48 c7 c7 40 df 4d 83 mov $0xffffffff834ddf40,%rdi
1e: c6 05 29 35 07 03 01 movb $0x1,0x3073529(%rip) # 0x307354e
25: e8 28 7c e0 ff call 0xffffffffffe07c52
2a:* 0f 0b ud2 <-- trapping instruction
2c: e9 db fa ff ff jmp 0xfffffffffffffb0c
31: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
38: fc ff df
3b: 49 8d 7c 24 08 lea 0x8(%r12),%rdi
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: e9 db fa ff ff jmp 0xfffffffffffffae2
7: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
e: fc ff df
11: 49 8d 7c 24 08 lea 0x8(%r12),%rdi
[ 337.966903] RSP: 0000:ffffc9000009f328 EFLAGS: 00010286
[ 337.967220] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 337.967645] RDX: ffff88800c8c9fc0 RSI: ffffffff813f13cb RDI: 0000000000000001
[ 337.968049] RBP: ffff888008a7a000 R08: 0000000000000001 R09: 0000000000000000
[ 337.968474] R10: 0000000000000001 R11: 0000000000000001 R12: ffff888008a7a638
[ 337.968879] R13: ffff888008a7a208 R14: ffff888008a7a008 R15: 0000000000000002
[ 337.969302] FS: 00007f0089130740(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
[ 337.969759] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 337.970091] CR2: 00007fd15cc3d000 CR3: 000000000c0af001 CR4: 0000000000370ef0
[ 337.970515] Call Trace:
[ 337.970670] <TASK>
[ 337.970806] ? __warn (/home/petr/src/linux_mlxsw/kernel/panic.c:673)
[ 337.971008] ? kvfree_call_rcu (/home/petr/src/linux_mlxsw/kernel/rcu/tree.c:3361 (discriminator 1))
[ 337.971279] ? report_bug (/home/petr/src/linux_mlxsw/lib/bug.c:180 /home/petr/src/linux_mlxsw/lib/bug.c:219)
[ 337.971519] ? handle_bug (/home/petr/src/linux_mlxsw/arch/x86/kernel/traps.c:324 (discriminator 1))
[ 337.971738] ? exc_invalid_op (/home/petr/src/linux_mlxsw/arch/x86/kernel/traps.c:345 (discriminator 1))
[ 337.971973] ? asm_exc_invalid_op (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/idtentry.h:568)
[ 337.972261] ? __warn_printk (/home/petr/src/linux_mlxsw/kernel/panic.c:712)
[ 337.972501] ? kvfree_call_rcu (/home/petr/src/linux_mlxsw/kernel/rcu/tree.c:3361 (discriminator 1))
[ 337.972750] ? kvfree_call_rcu (/home/petr/src/linux_mlxsw/kernel/rcu/tree.c:3361 (discriminator 1))
[ 337.972997] ? __tcf_block_put (/home/petr/src/linux_mlxsw/net/sched/cls_api.c:535 /home/petr/src/linux_mlxsw/net/sched/cls_api.c:530 /home/petr/src/linux_mlxsw/net/sched/cls_api.c:1301)
[ 337.973276] ingress_destroy (/home/petr/src/linux_mlxsw/net/sched/sch_ingress.c:131) sch_ingress
[ 337.973591] ? clsact_init (/home/petr/src/linux_mlxsw/net/sched/sch_ingress.c:113) sch_ingress
[ 337.973892] __qdisc_destroy (/home/petr/src/linux_mlxsw/net/sched/sch_generic.c:1065)
[ 337.974128] qdisc_destroy (/home/petr/src/linux_mlxsw/net/sched/sch_generic.c:1079)
[ 337.974371] qdisc_graft (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1134)
[ 337.974612] ? tc_dump_tclass (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1076)
[ 337.974873] tc_get_qdisc (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1541)
[ 337.975106] ? tc_ctl_tclass (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1475)
[ 337.975363] ? rtnetlink_rcv_msg (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6421)
[ 337.975645] ? cap_capable (/home/petr/src/linux_mlxsw/security/commoncap.c:102)
[ 337.975883] ? lock_is_held_type (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:467 (discriminator 4) /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5833 (discriminator 4))
[ 337.976137] ? tc_ctl_tclass (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1475)
[ 337.976398] rtnetlink_rcv_msg (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6423)
[ 337.976653] ? rtnl_dump_ifinfo (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6319)
[ 337.976915] ? lockdep_hardirqs_on_prepare (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5000)
[ 337.977243] ? lockdep_hardirqs_on_prepare (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5000)
[ 337.977548] ? find_held_lock (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5251 (discriminator 1))
[ 337.977799] netlink_rcv_skb (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:2547)
[ 337.978041] ? rtnl_dump_ifinfo (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6319)
[ 337.978331] ? netlink_ack (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:2523)
[ 337.978583] ? lock_sync (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5729)
[ 337.978833] ? netlink_deliver_tap (/home/petr/src/linux_mlxsw/./include/linux/rcupdate.h:308 /home/petr/src/linux_mlxsw/./include/linux/rcupdate.h:782 /home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:340)
[ 337.979103] ? is_vmalloc_addr (/home/petr/src/linux_mlxsw/mm/vmalloc.c:83)
[ 337.979376] netlink_unicast (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1340 /home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1365)
[ 337.979668] ? netlink_attachskb (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1350)
[ 337.979933] ? __sanitizer_cov_trace_switch (/home/petr/src/linux_mlxsw/kernel/kcov.c:340 (discriminator 1))
[ 337.980307] ? __check_object_size (/home/petr/src/linux_mlxsw/mm/usercopy.c:113 /home/petr/src/linux_mlxsw/mm/usercopy.c:145 /home/petr/src/linux_mlxsw/mm/usercopy.c:254 /home/petr/src/linux_mlxsw/mm/usercopy.c:213)
[ 337.980608] netlink_sendmsg (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1911)
[ 337.980861] ? netlink_unicast (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1830)
[ 337.981124] ? netlink_unicast (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1830)
[ 337.981406] ____sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:728 (discriminator 1) /home/petr/src/linux_mlxsw/net/socket.c:748 (discriminator 1) /home/petr/src/linux_mlxsw/net/socket.c:2494 (discriminator 1))
[ 337.981654] ? copy_msghdr_from_user (/home/petr/src/linux_mlxsw/net/socket.c:2420)
[ 337.981944] ? sock_read_iter (/home/petr/src/linux_mlxsw/net/socket.c:2440)
[ 337.982202] ? __lock_acquire (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/bitops.h:228 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/bitops.h:240 /home/petr/src/linux_mlxsw/./include/asm-generic/bitops/instrumented-non-atomic.h:142 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:228 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:3788 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:3844 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5144)
[ 337.982486] ___sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:2550)
[ 337.982734] ? do_recvmmsg (/home/petr/src/linux_mlxsw/net/socket.c:2537)
[ 337.983010] ? local_clock_noinstr (/home/petr/src/linux_mlxsw/kernel/sched/clock.c:301 (discriminator 1))
[ 337.983299] ? __fget_light (/home/petr/src/linux_mlxsw/fs/file.c:1027)
[ 337.983549] __sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:2579)
[ 337.983782] ? __sys_sendmsg_sock (/home/petr/src/linux_mlxsw/net/socket.c:2565)
[ 337.984043] ? xfd_validate_state (/home/petr/src/linux_mlxsw/arch/x86/kernel/fpu/xstate.c:1411 /home/petr/src/linux_mlxsw/arch/x86/kernel/fpu/xstate.c:1455)
[ 337.984352] ? syscall_enter_from_user_mode (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:42 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:77 /home/petr/src/linux_mlxsw/kernel/entry/common.c:111)
[ 337.984667] do_syscall_64 (/home/petr/src/linux_mlxsw/arch/x86/entry/common.c:50 (discriminator 1) /home/petr/src/linux_mlxsw/arch/x86/entry/common.c:80 (discriminator 1))
[ 337.984894] entry_SYSCALL_64_after_hwframe (/home/petr/src/linux_mlxsw/arch/x86/entry/entry_64.S:120)
[ 337.985202] RIP: 0033:0x7f008946a8b4
[ 337.985437] Code: 15 59 f5 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 00 f3 0f 1e fa 80 3d 2d 7d 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
All code
========
0: 15 59 f5 0b 00 adc $0xbf559,%eax
5: f7 d8 neg %eax
7: 64 89 02 mov %eax,%fs:(%rdx)
a: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
11: eb b5 jmp 0xffffffffffffffc8
13: 0f 1f 00 nopl (%rax)
16: f3 0f 1e fa endbr64
1a: 80 3d 2d 7d 0c 00 00 cmpb $0x0,0xc7d2d(%rip) # 0xc7d4e
21: 74 13 je 0x36
23: b8 2e 00 00 00 mov $0x2e,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 4c ja 0x7e
32: c3 ret
33: 0f 1f 00 nopl (%rax)
36: 55 push %rbp
37: 48 89 e5 mov %rsp,%rbp
3a: 48 83 ec 20 sub $0x20,%rsp
3e: 89 .byte 0x89
3f: 55 push %rbp
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 4c ja 0x54
8: c3 ret
9: 0f 1f 00 nopl (%rax)
c: 55 push %rbp
d: 48 89 e5 mov %rsp,%rbp
10: 48 83 ec 20 sub $0x20,%rsp
14: 89 .byte 0x89
15: 55 push %rbp
[ 337.986472] RSP: 002b:00007ffd615b18c8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
[ 337.986905] RAX: ffffffffffffffda RBX: 000055958cf26f80 RCX: 00007f008946a8b4
[ 337.987324] RDX: 0000000000000000 RSI: 00007ffd615b1940 RDI: 0000000000000003
[ 337.987730] RBP: 00007ffd615b19b0 R08: 0000000064bab34a R09: 0000000000000001
[ 337.988146] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffd615b1a30
[ 337.988585] R13: 0000000064bab34b R14: 000055958cf26f80 R15: 0000000000000000
[ 337.989023] </TASK>
[ 337.989171] irq event stamp: 168885
[ 337.989399] hardirqs last enabled at (168895): __up_console_sem (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:42 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:77 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:135 /home/petr/src/linux_mlxsw/kernel/printk/printk.c:347 /home/petr/src/linux_mlxsw/kernel/printk/printk.c:339)
[ 337.989908] hardirqs last disabled at (168902): __up_console_sem (/home/petr/src/linux_mlxsw/kernel/printk/printk.c:345 (discriminator 3))
[ 337.990424] softirqs last enabled at (168216): irq_exit_rcu (/home/petr/src/linux_mlxsw/kernel/softirq.c:427 /home/petr/src/linux_mlxsw/kernel/softirq.c:632 /home/petr/src/linux_mlxsw/kernel/softirq.c:644)
[ 337.990951] softirqs last disabled at (168207): irq_exit_rcu (/home/petr/src/linux_mlxsw/kernel/softirq.c:427 /home/petr/src/linux_mlxsw/kernel/softirq.c:632 /home/petr/src/linux_mlxsw/kernel/softirq.c:644)
[ 337.991830] ---[ end trace 0000000000000000 ]---
[-- Attachment #5: decoded4.txt --]
[-- Type: text/plain, Size: 16024 bytes --]
[ 835.734697] ==================================================================
[ 835.735303] BUG: KASAN: slab-use-after-free in ingress_init (/home/petr/src/linux_mlxsw/./include/net/tcx.h:36 /home/petr/src/linux_mlxsw/./include/net/tcx.h:136 /home/petr/src/linux_mlxsw/net/sched/sch_ingress.c:94) sch_ingress
[ 835.735840] Read of size 8 at addr ffff888008a7a208 by task tc/303
[ 835.736187]
[ 835.736761] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
[ 835.737244] Call Trace:
[ 835.737394] <TASK>
[ 835.737524] dump_stack_lvl (/home/petr/src/linux_mlxsw/lib/dump_stack.c:107)
[ 835.737749] print_report (/home/petr/src/linux_mlxsw/mm/kasan/report.c:365 /home/petr/src/linux_mlxsw/mm/kasan/report.c:475)
[ 835.738015] ? __virt_addr_valid (/home/petr/src/linux_mlxsw/arch/x86/mm/physaddr.c:66)
[ 835.738265] kasan_report (/home/petr/src/linux_mlxsw/mm/kasan/report.c:590)
[ 835.738485] ? ingress_init (/home/petr/src/linux_mlxsw/./include/net/tcx.h:36 /home/petr/src/linux_mlxsw/./include/net/tcx.h:136 /home/petr/src/linux_mlxsw/net/sched/sch_ingress.c:94) sch_ingress
[ 835.738783] ? ingress_init (/home/petr/src/linux_mlxsw/./include/net/tcx.h:36 /home/petr/src/linux_mlxsw/./include/net/tcx.h:136 /home/petr/src/linux_mlxsw/net/sched/sch_ingress.c:94) sch_ingress
[ 835.739086] ingress_init (/home/petr/src/linux_mlxsw/./include/net/tcx.h:36 /home/petr/src/linux_mlxsw/./include/net/tcx.h:136 /home/petr/src/linux_mlxsw/net/sched/sch_ingress.c:94) sch_ingress
[ 835.739393] ? ingress_dump (/home/petr/src/linux_mlxsw/net/sched/sch_ingress.c:79) sch_ingress
[ 835.739703] qdisc_create (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1327)
[ 835.739929] ? tc_get_qdisc (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1228)
[ 835.740158] ? lock_is_held_type (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:467 (discriminator 4) /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5833 (discriminator 4))
[ 835.740409] tc_modify_qdisc (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1703 (discriminator 1))
[ 835.740651] ? qdisc_create (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1556)
[ 835.740886] ? rtnetlink_rcv_msg (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6421)
[ 835.741144] ? cap_capable (/home/petr/src/linux_mlxsw/security/commoncap.c:102)
[ 835.741372] ? lock_is_held_type (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:467 (discriminator 4) /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5833 (discriminator 4))
[ 835.741664] ? qdisc_create (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1556)
[ 835.741900] rtnetlink_rcv_msg (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6423)
[ 835.742142] ? rtnl_dump_ifinfo (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6319)
[ 835.742402] ? lockdep_hardirqs_on_prepare (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5000)
[ 835.742702] ? lockdep_hardirqs_on_prepare (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5000)
[ 835.742998] ? find_held_lock (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5251 (discriminator 1))
[ 835.743233] netlink_rcv_skb (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:2547)
[ 835.743481] ? rtnl_dump_ifinfo (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6319)
[ 835.743732] ? netlink_ack (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:2523)
[ 835.743955] ? lock_sync (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5729)
[ 835.744170] ? netlink_deliver_tap (/home/petr/src/linux_mlxsw/./include/linux/rcupdate.h:308 /home/petr/src/linux_mlxsw/./include/linux/rcupdate.h:782 /home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:340)
[ 835.744463] ? is_vmalloc_addr (/home/petr/src/linux_mlxsw/mm/vmalloc.c:83)
[ 835.744686] netlink_unicast (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1340 /home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1365)
[ 835.744912] ? netlink_attachskb (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1350)
[ 835.745156] ? __sanitizer_cov_trace_switch (/home/petr/src/linux_mlxsw/kernel/kcov.c:340 (discriminator 1))
[ 835.745482] ? __check_object_size (/home/petr/src/linux_mlxsw/mm/usercopy.c:113 /home/petr/src/linux_mlxsw/mm/usercopy.c:145 /home/petr/src/linux_mlxsw/mm/usercopy.c:254 /home/petr/src/linux_mlxsw/mm/usercopy.c:213)
[ 835.745736] netlink_sendmsg (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1911)
[ 835.745967] ? netlink_unicast (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1830)
[ 835.746204] ? netlink_unicast (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1830)
[ 835.746481] ____sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:728 (discriminator 1) /home/petr/src/linux_mlxsw/net/socket.c:748 (discriminator 1) /home/petr/src/linux_mlxsw/net/socket.c:2494 (discriminator 1))
[ 835.746705] ? copy_msghdr_from_user (/home/petr/src/linux_mlxsw/net/socket.c:2420)
[ 835.746987] ? sock_read_iter (/home/petr/src/linux_mlxsw/net/socket.c:2440)
[ 835.747225] ? __lock_acquire (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/bitops.h:228 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/bitops.h:240 /home/petr/src/linux_mlxsw/./include/asm-generic/bitops/instrumented-non-atomic.h:142 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:228 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:3788 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:3844 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5144)
[ 835.747495] ___sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:2550)
[ 835.747718] ? do_recvmmsg (/home/petr/src/linux_mlxsw/net/socket.c:2537)
[ 835.747958] ? local_clock_noinstr (/home/petr/src/linux_mlxsw/kernel/sched/clock.c:301 (discriminator 1))
[ 835.748235] ? __fget_light (/home/petr/src/linux_mlxsw/fs/file.c:1027)
[ 835.748523] __sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:2579)
[ 835.748756] ? __sys_sendmsg_sock (/home/petr/src/linux_mlxsw/net/socket.c:2565)
[ 835.749004] ? __up_read (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/preempt.h:104 (discriminator 1) /home/petr/src/linux_mlxsw/kernel/locking/rwsem.c:1354 (discriminator 1))
[ 835.749229] ? syscall_enter_from_user_mode (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:42 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:77 /home/petr/src/linux_mlxsw/kernel/entry/common.c:111)
[ 835.749531] do_syscall_64 (/home/petr/src/linux_mlxsw/arch/x86/entry/common.c:50 (discriminator 1) /home/petr/src/linux_mlxsw/arch/x86/entry/common.c:80 (discriminator 1))
[ 835.749755] entry_SYSCALL_64_after_hwframe (/home/petr/src/linux_mlxsw/arch/x86/entry/entry_64.S:120)
[ 835.750059] RIP: 0033:0x7f4a861c38b4
[ 835.750279] Code: 15 59 f5 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 00 f3 0f 1e fa 80 3d 2d 7d 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
All code
========
0: 15 59 f5 0b 00 adc $0xbf559,%eax
5: f7 d8 neg %eax
7: 64 89 02 mov %eax,%fs:(%rdx)
a: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
11: eb b5 jmp 0xffffffffffffffc8
13: 0f 1f 00 nopl (%rax)
16: f3 0f 1e fa endbr64
1a: 80 3d 2d 7d 0c 00 00 cmpb $0x0,0xc7d2d(%rip) # 0xc7d4e
21: 74 13 je 0x36
23: b8 2e 00 00 00 mov $0x2e,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 4c ja 0x7e
32: c3 ret
33: 0f 1f 00 nopl (%rax)
36: 55 push %rbp
37: 48 89 e5 mov %rsp,%rbp
3a: 48 83 ec 20 sub $0x20,%rsp
3e: 89 .byte 0x89
3f: 55 push %rbp
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 4c ja 0x54
8: c3 ret
9: 0f 1f 00 nopl (%rax)
c: 55 push %rbp
d: 48 89 e5 mov %rsp,%rbp
10: 48 83 ec 20 sub $0x20,%rsp
14: 89 .byte 0x89
15: 55 push %rbp
[ 835.751300] RSP: 002b:00007fff3b43db58 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
[ 835.751739] RAX: ffffffffffffffda RBX: 000055dc998edf80 RCX: 00007f4a861c38b4
[ 835.752143] RDX: 0000000000000000 RSI: 00007fff3b43dbd0 RDI: 0000000000000003
[ 835.752553] RBP: 00007fff3b43dc40 R08: 0000000064bab53c R09: 0000000000000001
[ 835.752962] R10: 0000000000000001 R11: 0000000000000202 R12: 00007fff3b43dcc0
[ 835.753367] R13: 0000000064bab53d R14: 000055dc998edf80 R15: 0000000000000000
[ 835.753784] </TASK>
[ 835.753923]
[ 835.754017] Allocated by task 165:
[ 835.754220] kasan_save_stack (/home/petr/src/linux_mlxsw/mm/kasan/common.c:46)
[ 835.754466] kasan_set_track (/home/petr/src/linux_mlxsw/mm/kasan/common.c:52 (discriminator 1))
[ 835.754705] __kasan_kmalloc (/home/petr/src/linux_mlxsw/mm/kasan/common.c:374 /home/petr/src/linux_mlxsw/mm/kasan/common.c:383)
[ 835.754937] ingress_init (/home/petr/src/linux_mlxsw/./include/linux/slab.h:582 /home/petr/src/linux_mlxsw/./include/linux/slab.h:703 /home/petr/src/linux_mlxsw/./include/net/tcx.h:85 /home/petr/src/linux_mlxsw/./include/net/tcx.h:106 /home/petr/src/linux_mlxsw/./include/net/tcx.h:100 /home/petr/src/linux_mlxsw/net/sched/sch_ingress.c:91) sch_ingress
[ 835.755240] qdisc_create (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1327)
[ 835.755481] tc_modify_qdisc (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1703 (discriminator 1))
[ 835.755721] rtnetlink_rcv_msg (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6423)
[ 835.755964] netlink_rcv_skb (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:2547)
[ 835.756197] netlink_unicast (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1340 /home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1365)
[ 835.756445] netlink_sendmsg (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1911)
[ 835.756675] ____sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:728 (discriminator 1) /home/petr/src/linux_mlxsw/net/socket.c:748 (discriminator 1) /home/petr/src/linux_mlxsw/net/socket.c:2494 (discriminator 1))
[ 835.756906] ___sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:2550)
[ 835.757133] __sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:2579)
[ 835.757360] do_syscall_64 (/home/petr/src/linux_mlxsw/arch/x86/entry/common.c:50 (discriminator 1) /home/petr/src/linux_mlxsw/arch/x86/entry/common.c:80 (discriminator 1))
[ 835.757574] entry_SYSCALL_64_after_hwframe (/home/petr/src/linux_mlxsw/arch/x86/entry/entry_64.S:120)
[ 835.757866]
[ 835.757964] Last potentially related work creation:
[ 835.758236] kasan_save_stack (/home/petr/src/linux_mlxsw/mm/kasan/common.c:46)
[ 835.758473] __kasan_record_aux_stack (/home/petr/src/linux_mlxsw/mm/kasan/generic.c:492 (discriminator 1))
[ 835.758752] kvfree_call_rcu (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:26 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:67 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:103 /home/petr/src/linux_mlxsw/kernel/rcu/tree.c:2883 /home/petr/src/linux_mlxsw/kernel/rcu/tree.c:3284 /home/petr/src/linux_mlxsw/kernel/rcu/tree.c:3369)
[ 835.758994] ingress_destroy (/home/petr/src/linux_mlxsw/net/sched/sch_ingress.c:131) sch_ingress
[ 835.759321] __qdisc_destroy (/home/petr/src/linux_mlxsw/net/sched/sch_generic.c:1065)
[ 835.759551] qdisc_destroy (/home/petr/src/linux_mlxsw/net/sched/sch_generic.c:1079)
[ 835.759769] qdisc_graft (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1134)
[ 835.759994] tc_get_qdisc (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1541)
[ 835.760219] rtnetlink_rcv_msg (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6423)
[ 835.760477] netlink_rcv_skb (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:2547)
[ 835.760710] netlink_unicast (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1340 /home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1365)
[ 835.760941] netlink_sendmsg (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1911)
[ 835.761170] ____sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:728 (discriminator 1) /home/petr/src/linux_mlxsw/net/socket.c:748 (discriminator 1) /home/petr/src/linux_mlxsw/net/socket.c:2494 (discriminator 1))
[ 835.761423] ___sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:2550)
[ 835.761646] __sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:2579)
[ 835.761864] do_syscall_64 (/home/petr/src/linux_mlxsw/arch/x86/entry/common.c:50 (discriminator 1) /home/petr/src/linux_mlxsw/arch/x86/entry/common.c:80 (discriminator 1))
[ 835.762072] entry_SYSCALL_64_after_hwframe (/home/petr/src/linux_mlxsw/arch/x86/entry/entry_64.S:120)
[ 835.762398]
[ 835.762490] Second to last potentially related work creation:
[ 835.762802] kasan_save_stack (/home/petr/src/linux_mlxsw/mm/kasan/common.c:46)
[ 835.763067] __kasan_record_aux_stack (/home/petr/src/linux_mlxsw/mm/kasan/generic.c:492 (discriminator 1))
[ 835.763340] __call_rcu_common.constprop.0 (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:26 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:67 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:103 /home/petr/src/linux_mlxsw/kernel/rcu/tree.c:2650)
[ 835.763631] netlink_release (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:829)
[ 835.763865] __sock_release (/home/petr/src/linux_mlxsw/net/socket.c:655)
[ 835.764085] sock_close (/home/petr/src/linux_mlxsw/net/socket.c:1388)
[ 835.764282] __fput (/home/petr/src/linux_mlxsw/fs/file_table.c:385)
[ 835.764493] task_work_run (/home/petr/src/linux_mlxsw/kernel/task_work.c:181)
[ 835.764715] do_exit (/home/petr/src/linux_mlxsw/kernel/exit.c:875)
[ 835.764915] do_group_exit (/home/petr/src/linux_mlxsw/kernel/exit.c:1005)
[ 835.765132] __x64_sys_exit_group (/home/petr/src/linux_mlxsw/kernel/exit.c:1033)
[ 835.765382] do_syscall_64 (/home/petr/src/linux_mlxsw/arch/x86/entry/common.c:50 (discriminator 1) /home/petr/src/linux_mlxsw/arch/x86/entry/common.c:80 (discriminator 1))
[ 835.765596] entry_SYSCALL_64_after_hwframe (/home/petr/src/linux_mlxsw/arch/x86/entry/entry_64.S:120)
[ 835.765891]
[ 835.765988] The buggy address belongs to the object at ffff888008a7a000
[ 835.765988] which belongs to the cache kmalloc-2k of size 2048
[ 835.766668] The buggy address is located 520 bytes inside of
[ 835.766668] freed 2048-byte region [ffff888008a7a000, ffff888008a7a800)
[ 835.767340]
[ 835.767438] The buggy address belongs to the physical page:
[ 835.767750] page:ffffea0000229e00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888008a7a000 pfn:0x8a78
[ 835.768391] head:ffffea0000229e00 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 835.768837] flags: 0x100000000010200(slab|head|node=0|zone=1)
[ 835.769170] page_type: 0xffffffff()
[ 835.769385] raw: 0100000000010200 ffff888006842340 ffffea0000241a10 ffffea000022a010
[ 835.769861] raw: ffff888008a7a000 0000000000050001 00000001ffffffff 0000000000000000
[ 835.770622] page dumped because: kasan: bad access detected
[ 835.771278]
[ 835.771540] Memory state around the buggy address:
[ 835.772029] ffff888008a7a100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 835.772980] ffff888008a7a180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 835.773813] >ffff888008a7a200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 835.774506] ^
[ 835.774844] ffff888008a7a280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 835.775524] ffff888008a7a300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 835.776185] ==================================================================
[-- Attachment #6: decoded5.txt --]
[-- Type: text/plain, Size: 12804 bytes --]
[ 835.780645] general protection fault, probably for non-canonical address 0xed6d696d6d6d6e32: 0000 [#1] PREEMPT SMP KASAN
[ 835.781337] KASAN: maybe wild-memory-access in range [0x6b6b6b6b6b6b7190-0x6b6b6b6b6b6b7197]
[ 835.782241] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
[ 835.782741] RIP: 0010:ingress_init (/home/petr/src/linux_mlxsw/./include/net/tcx.h:136 (discriminator 1) /home/petr/src/linux_mlxsw/net/sched/sch_ingress.c:94 (discriminator 1)) sch_ingress
[ 835.783089] Code: 03 80 3c 02 00 0f 85 91 04 00 00 4c 8b ad 00 02 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d bd 28 06 00 00 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 06 0f 8e 75 03 00 00 41 c6 85 28 06 00 00 01
All code
========
0: 03 80 3c 02 00 0f add 0xf00023c(%rax),%eax
6: 85 91 04 00 00 4c test %edx,0x4c000004(%rcx)
c: 8b ad 00 02 00 00 mov 0x200(%rbp),%ebp
12: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
19: fc ff df
1c: 49 8d bd 28 06 00 00 lea 0x628(%r13),%rdi
23: 48 89 fa mov %rdi,%rdx
26: 48 c1 ea 03 shr $0x3,%rdx
2a:* 0f b6 04 02 movzbl (%rdx,%rax,1),%eax <-- trapping instruction
2e: 84 c0 test %al,%al
30: 74 06 je 0x38
32: 0f 8e 75 03 00 00 jle 0x3ad
38: 41 c6 85 28 06 00 00 movb $0x1,0x628(%r13)
3f: 01
Code starting with the faulting instruction
===========================================
0: 0f b6 04 02 movzbl (%rdx,%rax,1),%eax
4: 84 c0 test %al,%al
6: 74 06 je 0xe
8: 0f 8e 75 03 00 00 jle 0x383
e: 41 c6 85 28 06 00 00 movb $0x1,0x628(%r13)
15: 01
[ 835.784122] RSP: 0018:ffffc90000d17400 EFLAGS: 00010202
[ 835.784429] RAX: dffffc0000000000 RBX: ffff88800c841000 RCX: 0000000000000001
[ 835.784824] RDX: 0d6d6d6d6d6d6e32 RSI: ffffffff81c2398e RDI: 6b6b6b6b6b6b7193
[ 835.785218] RBP: ffff888008a7a008 R08: 0000000000000007 R09: 0000000000000000
[ 835.785620] R10: 0000000000000000 R11: 0000000000000001 R12: ffffc90000d17818
[ 835.786017] R13: 6b6b6b6b6b6b6b6b R14: 0000000000000000 R15: ffff88800d731000
[ 835.786437] FS: 00007f4a85e89740(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
[ 835.786907] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 835.787245] CR2: 000055dc998c5dc0 CR3: 000000000bbc8005 CR4: 0000000000370ef0
[ 835.787664] Call Trace:
[ 835.787818] <TASK>
[ 835.787958] ? die_addr (/home/petr/src/linux_mlxsw/arch/x86/kernel/dumpstack.c:421 /home/petr/src/linux_mlxsw/arch/x86/kernel/dumpstack.c:460)
[ 835.788173] ? exc_general_protection (/home/petr/src/linux_mlxsw/arch/x86/kernel/traps.c:786 /home/petr/src/linux_mlxsw/arch/x86/kernel/traps.c:728)
[ 835.788468] ? asm_exc_general_protection (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/idtentry.h:564)
[ 835.788760] ? end_report (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/current.h:41 (discriminator 1) /home/petr/src/linux_mlxsw/mm/kasan/report.c:239 (discriminator 1))
[ 835.788984] ? ingress_init (/home/petr/src/linux_mlxsw/./include/net/tcx.h:136 (discriminator 1) /home/petr/src/linux_mlxsw/net/sched/sch_ingress.c:94 (discriminator 1)) sch_ingress
[ 835.789431] ? ingress_dump (/home/petr/src/linux_mlxsw/net/sched/sch_ingress.c:79) sch_ingress
[ 835.789870] qdisc_create (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1327)
[ 835.790234] ? tc_get_qdisc (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1228)
[ 835.790636] ? lock_is_held_type (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:467 (discriminator 4) /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5833 (discriminator 4))
[ 835.791033] tc_modify_qdisc (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1703 (discriminator 1))
[ 835.791530] ? qdisc_create (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1556)
[ 835.792092] ? rtnetlink_rcv_msg (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6421)
[ 835.792543] ? cap_capable (/home/petr/src/linux_mlxsw/security/commoncap.c:102)
[ 835.792906] ? lock_is_held_type (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:467 (discriminator 4) /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5833 (discriminator 4))
[ 835.793269] ? qdisc_create (/home/petr/src/linux_mlxsw/net/sched/sch_api.c:1556)
[ 835.793723] rtnetlink_rcv_msg (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6423)
[ 835.794040] ? rtnl_dump_ifinfo (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6319)
[ 835.794412] ? lockdep_hardirqs_on_prepare (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5000)
[ 835.794892] ? lockdep_hardirqs_on_prepare (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5000)
[ 835.795404] ? find_held_lock (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5251 (discriminator 1))
[ 835.795774] netlink_rcv_skb (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:2547)
[ 835.796011] ? rtnl_dump_ifinfo (/home/petr/src/linux_mlxsw/net/core/rtnetlink.c:6319)
[ 835.796272] ? netlink_ack (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:2523)
[ 835.796517] ? lock_sync (/home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5729)
[ 835.796758] ? netlink_deliver_tap (/home/petr/src/linux_mlxsw/./include/linux/rcupdate.h:308 /home/petr/src/linux_mlxsw/./include/linux/rcupdate.h:782 /home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:340)
[ 835.797034] ? is_vmalloc_addr (/home/petr/src/linux_mlxsw/mm/vmalloc.c:83)
[ 835.797286] netlink_unicast (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1340 /home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1365)
[ 835.797547] ? netlink_attachskb (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1350)
[ 835.797809] ? __sanitizer_cov_trace_switch (/home/petr/src/linux_mlxsw/kernel/kcov.c:340 (discriminator 1))
[ 835.798141] ? __check_object_size (/home/petr/src/linux_mlxsw/mm/usercopy.c:113 /home/petr/src/linux_mlxsw/mm/usercopy.c:145 /home/petr/src/linux_mlxsw/mm/usercopy.c:254 /home/petr/src/linux_mlxsw/mm/usercopy.c:213)
[ 835.798429] netlink_sendmsg (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1911)
[ 835.798670] ? netlink_unicast (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1830)
[ 835.798927] ? netlink_unicast (/home/petr/src/linux_mlxsw/net/netlink/af_netlink.c:1830)
[ 835.799186] ____sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:728 (discriminator 1) /home/petr/src/linux_mlxsw/net/socket.c:748 (discriminator 1) /home/petr/src/linux_mlxsw/net/socket.c:2494 (discriminator 1))
[ 835.799440] ? copy_msghdr_from_user (/home/petr/src/linux_mlxsw/net/socket.c:2420)
[ 835.799723] ? sock_read_iter (/home/petr/src/linux_mlxsw/net/socket.c:2440)
[ 835.799968] ? __lock_acquire (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/bitops.h:228 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/bitops.h:240 /home/petr/src/linux_mlxsw/./include/asm-generic/bitops/instrumented-non-atomic.h:142 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:228 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:3788 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:3844 /home/petr/src/linux_mlxsw/kernel/locking/lockdep.c:5144)
[ 835.800227] ___sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:2550)
[ 835.800480] ? do_recvmmsg (/home/petr/src/linux_mlxsw/net/socket.c:2537)
[ 835.800723] ? local_clock_noinstr (/home/petr/src/linux_mlxsw/kernel/sched/clock.c:301 (discriminator 1))
[ 835.800971] ? __fget_light (/home/petr/src/linux_mlxsw/fs/file.c:1027)
[ 835.801222] __sys_sendmsg (/home/petr/src/linux_mlxsw/net/socket.c:2579)
[ 835.801460] ? __sys_sendmsg_sock (/home/petr/src/linux_mlxsw/net/socket.c:2565)
[ 835.801708] ? __up_read (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/preempt.h:104 (discriminator 1) /home/petr/src/linux_mlxsw/kernel/locking/rwsem.c:1354 (discriminator 1))
[ 835.801933] ? syscall_enter_from_user_mode (/home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:42 /home/petr/src/linux_mlxsw/./arch/x86/include/asm/irqflags.h:77 /home/petr/src/linux_mlxsw/kernel/entry/common.c:111)
[ 835.802228] do_syscall_64 (/home/petr/src/linux_mlxsw/arch/x86/entry/common.c:50 (discriminator 1) /home/petr/src/linux_mlxsw/arch/x86/entry/common.c:80 (discriminator 1))
[ 835.802455] entry_SYSCALL_64_after_hwframe (/home/petr/src/linux_mlxsw/arch/x86/entry/entry_64.S:120)
[ 835.802758] RIP: 0033:0x7f4a861c38b4
[ 835.802983] Code: 15 59 f5 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 00 f3 0f 1e fa 80 3d 2d 7d 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
All code
========
0: 15 59 f5 0b 00 adc $0xbf559,%eax
5: f7 d8 neg %eax
7: 64 89 02 mov %eax,%fs:(%rdx)
a: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
11: eb b5 jmp 0xffffffffffffffc8
13: 0f 1f 00 nopl (%rax)
16: f3 0f 1e fa endbr64
1a: 80 3d 2d 7d 0c 00 00 cmpb $0x0,0xc7d2d(%rip) # 0xc7d4e
21: 74 13 je 0x36
23: b8 2e 00 00 00 mov $0x2e,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 4c ja 0x7e
32: c3 ret
33: 0f 1f 00 nopl (%rax)
36: 55 push %rbp
37: 48 89 e5 mov %rsp,%rbp
3a: 48 83 ec 20 sub $0x20,%rsp
3e: 89 .byte 0x89
3f: 55 push %rbp
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 4c ja 0x54
8: c3 ret
9: 0f 1f 00 nopl (%rax)
c: 55 push %rbp
d: 48 89 e5 mov %rsp,%rbp
10: 48 83 ec 20 sub $0x20,%rsp
14: 89 .byte 0x89
15: 55 push %rbp
[ 835.803998] RSP: 002b:00007fff3b43db58 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
[ 835.804428] RAX: ffffffffffffffda RBX: 000055dc998edf80 RCX: 00007f4a861c38b4
[ 835.804824] RDX: 0000000000000000 RSI: 00007fff3b43dbd0 RDI: 0000000000000003
[ 835.805222] RBP: 00007fff3b43dc40 R08: 0000000064bab53c R09: 0000000000000001
[ 835.805622] R10: 0000000000000001 R11: 0000000000000202 R12: 00007fff3b43dcc0
[ 835.806035] R13: 0000000064bab53d R14: 000055dc998edf80 R15: 0000000000000000
[ 835.806466] </TASK>
[ 835.806606] Modules linked in: sch_ingress veth
[ 835.807662] ---[ end trace 0000000000000000 ]---
[ 835.808497] RIP: 0010:ingress_init (/home/petr/src/linux_mlxsw/./include/net/tcx.h:136 (discriminator 1) /home/petr/src/linux_mlxsw/net/sched/sch_ingress.c:94 (discriminator 1)) sch_ingress
[ 835.812394] Code: 03 80 3c 02 00 0f 85 91 04 00 00 4c 8b ad 00 02 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d bd 28 06 00 00 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 06 0f 8e 75 03 00 00 41 c6 85 28 06 00 00 01
All code
========
0: 03 80 3c 02 00 0f add 0xf00023c(%rax),%eax
6: 85 91 04 00 00 4c test %edx,0x4c000004(%rcx)
c: 8b ad 00 02 00 00 mov 0x200(%rbp),%ebp
12: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
19: fc ff df
1c: 49 8d bd 28 06 00 00 lea 0x628(%r13),%rdi
23: 48 89 fa mov %rdi,%rdx
26: 48 c1 ea 03 shr $0x3,%rdx
2a:* 0f b6 04 02 movzbl (%rdx,%rax,1),%eax <-- trapping instruction
2e: 84 c0 test %al,%al
30: 74 06 je 0x38
32: 0f 8e 75 03 00 00 jle 0x3ad
38: 41 c6 85 28 06 00 00 movb $0x1,0x628(%r13)
3f: 01
Code starting with the faulting instruction
===========================================
0: 0f b6 04 02 movzbl (%rdx,%rax,1),%eax
4: 84 c0 test %al,%al
6: 74 06 je 0xe
8: 0f 8e 75 03 00 00 jle 0x383
e: 41 c6 85 28 06 00 00 movb $0x1,0x628(%r13)
15: 01
[ 835.814250] RSP: 0018:ffffc90000d17400 EFLAGS: 00010202
[ 835.814569] RAX: dffffc0000000000 RBX: ffff88800c841000 RCX: 0000000000000001
[ 835.815017] RDX: 0d6d6d6d6d6d6e32 RSI: ffffffff81c2398e RDI: 6b6b6b6b6b6b7193
[ 835.815451] RBP: ffff888008a7a008 R08: 0000000000000007 R09: 0000000000000000
[ 835.815857] R10: 0000000000000000 R11: 0000000000000001 R12: ffffc90000d17818
[ 835.816270] R13: 6b6b6b6b6b6b6b6b R14: 0000000000000000 R15: ffff88800d731000
[ 835.816683] FS: 00007f4a85e89740(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
[ 835.817133] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 835.820478] CR2: 000055dc998c5dc0 CR3: 000000000bbc8005 CR4: 0000000000370ef0
* Re: [PATCH bpf-next v6 2/8] bpf: Add fd-based tcx multi-prog infra with link support
2023-07-21 14:57 ` Petr Machata
@ 2023-07-21 23:43 ` Daniel Borkmann
0 siblings, 0 replies; 14+ messages in thread
From: Daniel Borkmann @ 2023-07-21 23:43 UTC (permalink / raw)
To: Petr Machata
Cc: ast, andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu,
joe, toke, davem, bpf, netdev
On 7/21/23 4:57 PM, Petr Machata wrote:
> As of this patch (commit e420bed02507), TC qdisc installation and/or
> removal cause memory access issues in the system.
>
> A semi-minimal reproducer is:
>
> bash-5.2# ip l a name v1 type veth peer name v2
> bash-5.2# ip l s dev v1 up
> bash-5.2# ip l s dev v2 up
> bash-5.2# tc q a dev v1 ingress
> bash-5.2# tc q d dev v1 ingress
> bash-5.2# tc q a dev v1 ingress
> bash-5.2# tc q d dev v1 ingress
>
> It's a bit finicky, but only a little. For me, the first two "tc q"
> operations never triggered a splat. Then it could take a few "tc q a"
> "tc q d" iterations to get it to splat. So it looks like maybe the first
> "tc q d" is the problematic bit? And then there's some likelihood of
> failing on any following "tc q" operation. The above in particular
> produced three warning splats for me (attached as decoded.txt,
> decoded2.txt and decoded3.txt). Probing further:
>
> bash-5.2# tc q a dev v1 ingress
>
> Produced two more splats from KASAN (decoded4.txt and decoded5.txt),
> which look more serious.
>
> Further attempts to prod the system deadlock it, I guess because RTNL
> was left locked.
>
> Reverting e420bed02507, and fe20ce3a5126 + 55cc3768473e that fail to
> build without it, makes net-next/main work again.
Sorry about that, fix should be here:
https://lore.kernel.org/netdev/20230721233330.5678-1-daniel@iogearbox.net/
Thanks,
Daniel
* [PATCH bpf-next v6 3/8] libbpf: Add opts-based attach/detach/query API for tcx
2023-07-19 14:08 [PATCH bpf-next v6 0/8] BPF link support for tc BPF programs Daniel Borkmann
2023-07-19 14:08 ` [PATCH bpf-next v6 1/8] bpf: Add generic attach/detach/query API for multi-progs Daniel Borkmann
2023-07-19 14:08 ` [PATCH bpf-next v6 2/8] bpf: Add fd-based tcx multi-prog infra with link support Daniel Borkmann
@ 2023-07-19 14:08 ` Daniel Borkmann
2023-07-19 14:08 ` [PATCH bpf-next v6 4/8] libbpf: Add link-based " Daniel Borkmann
` (5 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Daniel Borkmann @ 2023-07-19 14:08 UTC (permalink / raw)
To: ast
Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
toke, davem, bpf, netdev, Daniel Borkmann
Extend libbpf attach opts and add a new detach opts API so this can be used
to add/remove fd-based tcx BPF programs. The old-style bpf_prog_detach() and
bpf_prog_detach2() APIs are refactored to reuse the new bpf_prog_detach_opts()
internally.
The bpf_prog_query_opts() API has been extended to handle the new link_ids,
link_attach_flags and revision fields.
For concrete usage examples, see the extensive selftests that have been
developed as part of this series.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
---
tools/lib/bpf/bpf.c | 107 ++++++++++++++++++++++++++-------------
tools/lib/bpf/bpf.h | 92 ++++++++++++++++++++++++++++-----
tools/lib/bpf/libbpf.c | 12 +++--
tools/lib/bpf/libbpf.map | 1 +
4 files changed, 159 insertions(+), 53 deletions(-)
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 3b0da19715e1..4131d3a1484b 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -629,55 +629,89 @@ int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type,
return bpf_prog_attach_opts(prog_fd, target_fd, type, &opts);
}
-int bpf_prog_attach_opts(int prog_fd, int target_fd,
- enum bpf_attach_type type,
- const struct bpf_prog_attach_opts *opts)
+int bpf_prog_attach_opts(int prog_fd, int target, enum bpf_attach_type type,
+ const struct bpf_prog_attach_opts *opts)
{
- const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd);
+ const size_t attr_sz = offsetofend(union bpf_attr, expected_revision);
+ __u32 relative_id, flags;
+ int ret, relative_fd;
union bpf_attr attr;
- int ret;
if (!OPTS_VALID(opts, bpf_prog_attach_opts))
return libbpf_err(-EINVAL);
+ relative_id = OPTS_GET(opts, relative_id, 0);
+ relative_fd = OPTS_GET(opts, relative_fd, 0);
+ flags = OPTS_GET(opts, flags, 0);
+
+ /* validate we don't have unexpected combinations of non-zero fields */
+ if (relative_fd && relative_id)
+ return libbpf_err(-EINVAL);
+
memset(&attr, 0, attr_sz);
- attr.target_fd = target_fd;
- attr.attach_bpf_fd = prog_fd;
- attr.attach_type = type;
- attr.attach_flags = OPTS_GET(opts, flags, 0);
- attr.replace_bpf_fd = OPTS_GET(opts, replace_prog_fd, 0);
+ attr.target_fd = target;
+ attr.attach_bpf_fd = prog_fd;
+ attr.attach_type = type;
+ attr.replace_bpf_fd = OPTS_GET(opts, replace_fd, 0);
+ attr.expected_revision = OPTS_GET(opts, expected_revision, 0);
+
+ if (relative_id) {
+ attr.attach_flags = flags | BPF_F_ID;
+ attr.relative_id = relative_id;
+ } else {
+ attr.attach_flags = flags;
+ attr.relative_fd = relative_fd;
+ }
ret = sys_bpf(BPF_PROG_ATTACH, &attr, attr_sz);
return libbpf_err_errno(ret);
}
-int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
+int bpf_prog_detach_opts(int prog_fd, int target, enum bpf_attach_type type,
+ const struct bpf_prog_detach_opts *opts)
{
- const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd);
+ const size_t attr_sz = offsetofend(union bpf_attr, expected_revision);
+ __u32 relative_id, flags;
+ int ret, relative_fd;
union bpf_attr attr;
- int ret;
+
+ if (!OPTS_VALID(opts, bpf_prog_detach_opts))
+ return libbpf_err(-EINVAL);
+
+ relative_id = OPTS_GET(opts, relative_id, 0);
+ relative_fd = OPTS_GET(opts, relative_fd, 0);
+ flags = OPTS_GET(opts, flags, 0);
+
+ /* validate we don't have unexpected combinations of non-zero fields */
+ if (relative_fd && relative_id)
+ return libbpf_err(-EINVAL);
memset(&attr, 0, attr_sz);
- attr.target_fd = target_fd;
- attr.attach_type = type;
+ attr.target_fd = target;
+ attr.attach_bpf_fd = prog_fd;
+ attr.attach_type = type;
+ attr.expected_revision = OPTS_GET(opts, expected_revision, 0);
+
+ if (relative_id) {
+ attr.attach_flags = flags | BPF_F_ID;
+ attr.relative_id = relative_id;
+ } else {
+ attr.attach_flags = flags;
+ attr.relative_fd = relative_fd;
+ }
ret = sys_bpf(BPF_PROG_DETACH, &attr, attr_sz);
return libbpf_err_errno(ret);
}
-int bpf_prog_detach2(int prog_fd, int target_fd, enum bpf_attach_type type)
+int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
{
- const size_t attr_sz = offsetofend(union bpf_attr, replace_bpf_fd);
- union bpf_attr attr;
- int ret;
-
- memset(&attr, 0, attr_sz);
- attr.target_fd = target_fd;
- attr.attach_bpf_fd = prog_fd;
- attr.attach_type = type;
+ return bpf_prog_detach_opts(0, target_fd, type, NULL);
+}
- ret = sys_bpf(BPF_PROG_DETACH, &attr, attr_sz);
- return libbpf_err_errno(ret);
+int bpf_prog_detach2(int prog_fd, int target_fd, enum bpf_attach_type type)
+{
+ return bpf_prog_detach_opts(prog_fd, target_fd, type, NULL);
}
int bpf_link_create(int prog_fd, int target_fd,
@@ -841,8 +875,7 @@ int bpf_iter_create(int link_fd)
return libbpf_err_errno(fd);
}
-int bpf_prog_query_opts(int target_fd,
- enum bpf_attach_type type,
+int bpf_prog_query_opts(int target, enum bpf_attach_type type,
struct bpf_prog_query_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, query);
@@ -853,18 +886,20 @@ int bpf_prog_query_opts(int target_fd,
return libbpf_err(-EINVAL);
memset(&attr, 0, attr_sz);
-
- attr.query.target_fd = target_fd;
- attr.query.attach_type = type;
- attr.query.query_flags = OPTS_GET(opts, query_flags, 0);
- attr.query.prog_cnt = OPTS_GET(opts, prog_cnt, 0);
- attr.query.prog_ids = ptr_to_u64(OPTS_GET(opts, prog_ids, NULL));
- attr.query.prog_attach_flags = ptr_to_u64(OPTS_GET(opts, prog_attach_flags, NULL));
+ attr.query.target_fd = target;
+ attr.query.attach_type = type;
+ attr.query.query_flags = OPTS_GET(opts, query_flags, 0);
+ attr.query.count = OPTS_GET(opts, count, 0);
+ attr.query.prog_ids = ptr_to_u64(OPTS_GET(opts, prog_ids, NULL));
+ attr.query.link_ids = ptr_to_u64(OPTS_GET(opts, link_ids, NULL));
+ attr.query.prog_attach_flags = ptr_to_u64(OPTS_GET(opts, prog_attach_flags, NULL));
+ attr.query.link_attach_flags = ptr_to_u64(OPTS_GET(opts, link_attach_flags, NULL));
ret = sys_bpf(BPF_PROG_QUERY, &attr, attr_sz);
OPTS_SET(opts, attach_flags, attr.query.attach_flags);
- OPTS_SET(opts, prog_cnt, attr.query.prog_cnt);
+ OPTS_SET(opts, revision, attr.query.revision);
+ OPTS_SET(opts, count, attr.query.count);
return libbpf_err_errno(ret);
}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index c676295ab9bf..49e9d88fd9cf 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -312,22 +312,68 @@ LIBBPF_API int bpf_obj_get(const char *pathname);
LIBBPF_API int bpf_obj_get_opts(const char *pathname,
const struct bpf_obj_get_opts *opts);
-struct bpf_prog_attach_opts {
- size_t sz; /* size of this struct for forward/backward compatibility */
- unsigned int flags;
- int replace_prog_fd;
-};
-#define bpf_prog_attach_opts__last_field replace_prog_fd
-
LIBBPF_API int bpf_prog_attach(int prog_fd, int attachable_fd,
enum bpf_attach_type type, unsigned int flags);
-LIBBPF_API int bpf_prog_attach_opts(int prog_fd, int attachable_fd,
- enum bpf_attach_type type,
- const struct bpf_prog_attach_opts *opts);
LIBBPF_API int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
LIBBPF_API int bpf_prog_detach2(int prog_fd, int attachable_fd,
enum bpf_attach_type type);
+struct bpf_prog_attach_opts {
+ size_t sz; /* size of this struct for forward/backward compatibility */
+ __u32 flags;
+ union {
+ int replace_prog_fd;
+ int replace_fd;
+ };
+ int relative_fd;
+ __u32 relative_id;
+ __u64 expected_revision;
+ size_t :0;
+};
+#define bpf_prog_attach_opts__last_field expected_revision
+
+struct bpf_prog_detach_opts {
+ size_t sz; /* size of this struct for forward/backward compatibility */
+ __u32 flags;
+ int relative_fd;
+ __u32 relative_id;
+ __u64 expected_revision;
+ size_t :0;
+};
+#define bpf_prog_detach_opts__last_field expected_revision
+
+/**
+ * @brief **bpf_prog_attach_opts()** attaches the BPF program corresponding to
+ * *prog_fd* to a *target* which can represent a file descriptor or netdevice
+ * ifindex.
+ *
+ * @param prog_fd BPF program file descriptor
+ * @param target attach location file descriptor or ifindex
+ * @param type attach type for the BPF program
+ * @param opts options for configuring the attachment
+ * @return 0, on success; negative error code, otherwise (errno is also set to
+ * the error code)
+ */
+LIBBPF_API int bpf_prog_attach_opts(int prog_fd, int target,
+ enum bpf_attach_type type,
+ const struct bpf_prog_attach_opts *opts);
+
+/**
+ * @brief **bpf_prog_detach_opts()** detaches the BPF program corresponding to
+ * *prog_fd* from a *target* which can represent a file descriptor or netdevice
+ * ifindex.
+ *
+ * @param prog_fd BPF program file descriptor
+ * @param target detach location file descriptor or ifindex
+ * @param type detach type for the BPF program
+ * @param opts options for configuring the detachment
+ * @return 0, on success; negative error code, otherwise (errno is also set to
+ * the error code)
+ */
+LIBBPF_API int bpf_prog_detach_opts(int prog_fd, int target,
+ enum bpf_attach_type type,
+ const struct bpf_prog_detach_opts *opts);
+
union bpf_iter_link_info; /* defined in up-to-date linux/bpf.h */
struct bpf_link_create_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
@@ -495,13 +541,31 @@ struct bpf_prog_query_opts {
__u32 query_flags;
__u32 attach_flags; /* output argument */
__u32 *prog_ids;
- __u32 prog_cnt; /* input+output argument */
+ union {
+ /* input+output argument */
+ __u32 prog_cnt;
+ __u32 count;
+ };
__u32 *prog_attach_flags;
+ __u32 *link_ids;
+ __u32 *link_attach_flags;
+ __u64 revision;
+ size_t :0;
};
-#define bpf_prog_query_opts__last_field prog_attach_flags
+#define bpf_prog_query_opts__last_field revision
-LIBBPF_API int bpf_prog_query_opts(int target_fd,
- enum bpf_attach_type type,
+/**
+ * @brief **bpf_prog_query_opts()** queries the BPF programs and BPF links
+ * which are attached to *target* which can represent a file descriptor or
+ * netdevice ifindex.
+ *
+ * @param target query location file descriptor or ifindex
+ * @param type attach type for the BPF program
+ * @param opts options for configuring the query
+ * @return 0, on success; negative error code, otherwise (errno is also set to
+ * the error code)
+ */
+LIBBPF_API int bpf_prog_query_opts(int target, enum bpf_attach_type type,
struct bpf_prog_query_opts *opts);
LIBBPF_API int bpf_prog_query(int target_fd, enum bpf_attach_type type,
__u32 query_flags, __u32 *attach_flags,
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 63311a73c16d..0f6913e1059b 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -118,6 +118,8 @@ static const char * const attach_type_name[] = {
[BPF_TRACE_KPROBE_MULTI] = "trace_kprobe_multi",
[BPF_STRUCT_OPS] = "struct_ops",
[BPF_NETFILTER] = "netfilter",
+ [BPF_TCX_INGRESS] = "tcx_ingress",
+ [BPF_TCX_EGRESS] = "tcx_egress",
};
static const char * const link_type_name[] = {
@@ -8696,9 +8698,13 @@ static const struct bpf_sec_def section_defs[] = {
SEC_DEF("ksyscall+", KPROBE, 0, SEC_NONE, attach_ksyscall),
SEC_DEF("kretsyscall+", KPROBE, 0, SEC_NONE, attach_ksyscall),
SEC_DEF("usdt+", KPROBE, 0, SEC_NONE, attach_usdt),
- SEC_DEF("tc", SCHED_CLS, 0, SEC_NONE),
- SEC_DEF("classifier", SCHED_CLS, 0, SEC_NONE),
- SEC_DEF("action", SCHED_ACT, 0, SEC_NONE),
+ SEC_DEF("tc/ingress", SCHED_CLS, BPF_TCX_INGRESS, SEC_NONE), /* alias for tcx */
+ SEC_DEF("tc/egress", SCHED_CLS, BPF_TCX_EGRESS, SEC_NONE), /* alias for tcx */
+ SEC_DEF("tcx/ingress", SCHED_CLS, BPF_TCX_INGRESS, SEC_NONE),
+ SEC_DEF("tcx/egress", SCHED_CLS, BPF_TCX_EGRESS, SEC_NONE),
+ SEC_DEF("tc", SCHED_CLS, 0, SEC_NONE), /* deprecated / legacy, use tcx */
+ SEC_DEF("classifier", SCHED_CLS, 0, SEC_NONE), /* deprecated / legacy, use tcx */
+ SEC_DEF("action", SCHED_ACT, 0, SEC_NONE), /* deprecated / legacy, use tcx */
SEC_DEF("tracepoint+", TRACEPOINT, 0, SEC_NONE, attach_tp),
SEC_DEF("tp+", TRACEPOINT, 0, SEC_NONE, attach_tp),
SEC_DEF("raw_tracepoint+", RAW_TRACEPOINT, 0, SEC_NONE, attach_raw_tp),
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index d9ec4407befa..8d1608643c33 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -395,5 +395,6 @@ LIBBPF_1.2.0 {
LIBBPF_1.3.0 {
global:
bpf_obj_pin_opts;
+ bpf_prog_detach_opts;
bpf_program__attach_netfilter;
} LIBBPF_1.2.0;
--
2.34.1
* [PATCH bpf-next v6 4/8] libbpf: Add link-based API for tcx
2023-07-19 14:08 [PATCH bpf-next v6 0/8] BPF link support for tc BPF programs Daniel Borkmann
` (2 preceding siblings ...)
2023-07-19 14:08 ` [PATCH bpf-next v6 3/8] libbpf: Add opts-based attach/detach/query API for tcx Daniel Borkmann
@ 2023-07-19 14:08 ` Daniel Borkmann
2023-07-19 14:08 ` [PATCH bpf-next v6 5/8] libbpf: Add helper macro to clear opts structs Daniel Borkmann
` (4 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Daniel Borkmann @ 2023-07-19 14:08 UTC (permalink / raw)
To: ast
Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
toke, davem, bpf, netdev, Daniel Borkmann
Implement tcx BPF link support for libbpf.
The bpf_program__attach_fd() API has been refactored slightly in order to pass
a bpf_link_create_opts pointer as input.
A new bpf_program__attach_tcx() has been added on top of this which allows
passing all relevant data via the extensible struct bpf_tcx_opts.
The program sections tcx/ingress and tcx/egress correspond to the hook locations
for tc ingress and egress, respectively.
For concrete usage examples, see the extensive selftests that have been
developed as part of this series.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
---
tools/lib/bpf/bpf.c | 20 ++++++++++++--
tools/lib/bpf/bpf.h | 5 ++++
tools/lib/bpf/libbpf.c | 58 +++++++++++++++++++++++++++++++++-------
tools/lib/bpf/libbpf.h | 15 +++++++++++
tools/lib/bpf/libbpf.map | 1 +
5 files changed, 88 insertions(+), 11 deletions(-)
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 4131d3a1484b..c9b6b311a441 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -719,9 +719,9 @@ int bpf_link_create(int prog_fd, int target_fd,
const struct bpf_link_create_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, link_create);
- __u32 target_btf_id, iter_info_len;
+ __u32 target_btf_id, iter_info_len, relative_id;
+ int fd, err, relative_fd;
union bpf_attr attr;
- int fd, err;
if (!OPTS_VALID(opts, bpf_link_create_opts))
return libbpf_err(-EINVAL);
@@ -783,6 +783,22 @@ int bpf_link_create(int prog_fd, int target_fd,
if (!OPTS_ZEROED(opts, netfilter))
return libbpf_err(-EINVAL);
break;
+ case BPF_TCX_INGRESS:
+ case BPF_TCX_EGRESS:
+ relative_fd = OPTS_GET(opts, tcx.relative_fd, 0);
+ relative_id = OPTS_GET(opts, tcx.relative_id, 0);
+ if (relative_fd && relative_id)
+ return libbpf_err(-EINVAL);
+ if (relative_id) {
+ attr.link_create.tcx.relative_id = relative_id;
+ attr.link_create.flags |= BPF_F_ID;
+ } else {
+ attr.link_create.tcx.relative_fd = relative_fd;
+ }
+ attr.link_create.tcx.expected_revision = OPTS_GET(opts, tcx.expected_revision, 0);
+ if (!OPTS_ZEROED(opts, tcx))
+ return libbpf_err(-EINVAL);
+ break;
default:
if (!OPTS_ZEROED(opts, flags))
return libbpf_err(-EINVAL);
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 49e9d88fd9cf..044a74ffc38a 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -401,6 +401,11 @@ struct bpf_link_create_opts {
__s32 priority;
__u32 flags;
} netfilter;
+ struct {
+ __u32 relative_fd;
+ __u32 relative_id;
+ __u64 expected_revision;
+ } tcx;
};
size_t :0;
};
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 0f6913e1059b..17883f5a44b9 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -134,6 +134,7 @@ static const char * const link_type_name[] = {
[BPF_LINK_TYPE_KPROBE_MULTI] = "kprobe_multi",
[BPF_LINK_TYPE_STRUCT_OPS] = "struct_ops",
[BPF_LINK_TYPE_NETFILTER] = "netfilter",
+ [BPF_LINK_TYPE_TCX] = "tcx",
};
static const char * const map_type_name[] = {
@@ -11854,11 +11855,10 @@ static int attach_lsm(const struct bpf_program *prog, long cookie, struct bpf_li
}
static struct bpf_link *
-bpf_program__attach_fd(const struct bpf_program *prog, int target_fd, int btf_id,
- const char *target_name)
+bpf_program_attach_fd(const struct bpf_program *prog,
+ int target_fd, const char *target_name,
+ const struct bpf_link_create_opts *opts)
{
- DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts,
- .target_btf_id = btf_id);
enum bpf_attach_type attach_type;
char errmsg[STRERR_BUFSIZE];
struct bpf_link *link;
@@ -11876,7 +11876,7 @@ bpf_program__attach_fd(const struct bpf_program *prog, int target_fd, int btf_id
link->detach = &bpf_link__detach_fd;
attach_type = bpf_program__expected_attach_type(prog);
- link_fd = bpf_link_create(prog_fd, target_fd, attach_type, &opts);
+ link_fd = bpf_link_create(prog_fd, target_fd, attach_type, opts);
if (link_fd < 0) {
link_fd = -errno;
free(link);
@@ -11892,19 +11892,54 @@ bpf_program__attach_fd(const struct bpf_program *prog, int target_fd, int btf_id
struct bpf_link *
bpf_program__attach_cgroup(const struct bpf_program *prog, int cgroup_fd)
{
- return bpf_program__attach_fd(prog, cgroup_fd, 0, "cgroup");
+ return bpf_program_attach_fd(prog, cgroup_fd, "cgroup", NULL);
}
struct bpf_link *
bpf_program__attach_netns(const struct bpf_program *prog, int netns_fd)
{
- return bpf_program__attach_fd(prog, netns_fd, 0, "netns");
+ return bpf_program_attach_fd(prog, netns_fd, "netns", NULL);
}
struct bpf_link *bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex)
{
/* target_fd/target_ifindex use the same field in LINK_CREATE */
- return bpf_program__attach_fd(prog, ifindex, 0, "xdp");
+ return bpf_program_attach_fd(prog, ifindex, "xdp", NULL);
+}
+
+struct bpf_link *
+bpf_program__attach_tcx(const struct bpf_program *prog, int ifindex,
+ const struct bpf_tcx_opts *opts)
+{
+ LIBBPF_OPTS(bpf_link_create_opts, link_create_opts);
+ __u32 relative_id;
+ int relative_fd;
+
+ if (!OPTS_VALID(opts, bpf_tcx_opts))
+ return libbpf_err_ptr(-EINVAL);
+
+ relative_id = OPTS_GET(opts, relative_id, 0);
+ relative_fd = OPTS_GET(opts, relative_fd, 0);
+
+ /* validate we don't have unexpected combinations of non-zero fields */
+ if (!ifindex) {
+ pr_warn("prog '%s': target netdevice ifindex cannot be zero\n",
+ prog->name);
+ return libbpf_err_ptr(-EINVAL);
+ }
+ if (relative_fd && relative_id) {
+ pr_warn("prog '%s': relative_fd and relative_id cannot be set at the same time\n",
+ prog->name);
+ return libbpf_err_ptr(-EINVAL);
+ }
+
+ link_create_opts.tcx.expected_revision = OPTS_GET(opts, expected_revision, 0);
+ link_create_opts.tcx.relative_fd = relative_fd;
+ link_create_opts.tcx.relative_id = relative_id;
+ link_create_opts.flags = OPTS_GET(opts, flags, 0);
+
+ /* target_fd/target_ifindex use the same field in LINK_CREATE */
+ return bpf_program_attach_fd(prog, ifindex, "tcx", &link_create_opts);
}
struct bpf_link *bpf_program__attach_freplace(const struct bpf_program *prog,
@@ -11926,11 +11961,16 @@ struct bpf_link *bpf_program__attach_freplace(const struct bpf_program *prog,
}
if (target_fd) {
+ LIBBPF_OPTS(bpf_link_create_opts, target_opts);
+
btf_id = libbpf_find_prog_btf_id(attach_func_name, target_fd);
if (btf_id < 0)
return libbpf_err_ptr(btf_id);
- return bpf_program__attach_fd(prog, target_fd, btf_id, "freplace");
+ target_opts.target_btf_id = btf_id;
+
+ return bpf_program_attach_fd(prog, target_fd, "freplace",
+ &target_opts);
} else {
/* no target, so use raw_tracepoint_open for compatibility
* with old kernels
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 10642ad69d76..d9b8f3519bf0 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -733,6 +733,21 @@ LIBBPF_API struct bpf_link *
bpf_program__attach_netfilter(const struct bpf_program *prog,
const struct bpf_netfilter_opts *opts);
+struct bpf_tcx_opts {
+ /* size of this struct, for forward/backward compatibility */
+ size_t sz;
+ __u32 flags;
+ __u32 relative_fd;
+ __u32 relative_id;
+ __u64 expected_revision;
+ size_t :0;
+};
+#define bpf_tcx_opts__last_field expected_revision
+
+LIBBPF_API struct bpf_link *
+bpf_program__attach_tcx(const struct bpf_program *prog, int ifindex,
+ const struct bpf_tcx_opts *opts);
+
struct bpf_map;
LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(const struct bpf_map *map);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 8d1608643c33..9c7538dd5835 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -397,4 +397,5 @@ LIBBPF_1.3.0 {
bpf_obj_pin_opts;
bpf_prog_detach_opts;
bpf_program__attach_netfilter;
+ bpf_program__attach_tcx;
} LIBBPF_1.2.0;
--
2.34.1
* [PATCH bpf-next v6 5/8] libbpf: Add helper macro to clear opts structs
2023-07-19 14:08 [PATCH bpf-next v6 0/8] BPF link support for tc BPF programs Daniel Borkmann
` (3 preceding siblings ...)
2023-07-19 14:08 ` [PATCH bpf-next v6 4/8] libbpf: Add link-based " Daniel Borkmann
@ 2023-07-19 14:08 ` Daniel Borkmann
2023-07-19 14:08 ` [PATCH bpf-next v6 6/8] bpftool: Extend net dump with tcx progs Daniel Borkmann
` (3 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Daniel Borkmann @ 2023-07-19 14:08 UTC (permalink / raw)
To: ast
Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
toke, davem, bpf, netdev, Daniel Borkmann
Add a small and generic LIBBPF_OPTS_RESET() helper macro which clears an
opts structure and reinitializes its .sz member with the structure size.
Additionally, the user can pass option-specific data to reinitialize via
varargs.
I found this very useful when developing selftests, and it is generic enough
to sit as a macro next to the existing LIBBPF_OPTS(), which likewise hides the
.sz initialization.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
tools/lib/bpf/libbpf_common.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/tools/lib/bpf/libbpf_common.h b/tools/lib/bpf/libbpf_common.h
index 9a7937f339df..b7060f254486 100644
--- a/tools/lib/bpf/libbpf_common.h
+++ b/tools/lib/bpf/libbpf_common.h
@@ -70,4 +70,20 @@
}; \
})
+/* Helper macro to clear and optionally reinitialize libbpf options struct
+ *
+ * Small helper macro to reset all fields and to reinitialize the common
+ * structure size member. Values in struct initializer syntax can be passed
+ * as varargs to reinitialize specific members of the options struct.
+ */
+#define LIBBPF_OPTS_RESET(NAME, ...) \
+ do { \
+ memset(&NAME, 0, sizeof(NAME)); \
+ NAME = (typeof(NAME)) { \
+ .sz = sizeof(NAME), \
+ __VA_ARGS__ \
+ }; \
+ } while (0)
+
#endif /* __LIBBPF_LIBBPF_COMMON_H */
--
2.34.1
* [PATCH bpf-next v6 6/8] bpftool: Extend net dump with tcx progs
2023-07-19 14:08 [PATCH bpf-next v6 0/8] BPF link support for tc BPF programs Daniel Borkmann
` (4 preceding siblings ...)
2023-07-19 14:08 ` [PATCH bpf-next v6 5/8] libbpf: Add helper macro to clear opts structs Daniel Borkmann
@ 2023-07-19 14:08 ` Daniel Borkmann
2023-07-19 14:08 ` [PATCH bpf-next v6 7/8] selftests/bpf: Add mprog API tests for BPF tcx opts Daniel Borkmann
` (2 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Daniel Borkmann @ 2023-07-19 14:08 UTC (permalink / raw)
To: ast
Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
toke, davem, bpf, netdev, Daniel Borkmann, Quentin Monnet
Add support for dumping fd-based attach types via bpftool. This covers both
tc BPF link and attach ops programs. The dumped information contains the
attach location, function entry name, program ID and link ID when applicable.
Example with tc BPF link:
# ./bpftool net
xdp:
tc:
bond0(4) tcx/ingress cil_from_netdev prog_id 784 link_id 10
bond0(4) tcx/egress cil_to_netdev prog_id 804 link_id 11
flow_dissector:
netfilter:
Example with tc BPF attach ops:
# ./bpftool net
xdp:
tc:
bond0(4) tcx/ingress cil_from_netdev prog_id 654
bond0(4) tcx/egress cil_to_netdev prog_id 672
flow_dissector:
netfilter:
Currently, permanent flags are not yet supported, so 'unknown' ones are dumped
via NET_DUMP_UINT_ONLY(); once permanent flags exist, they will be dumped as
human-readable strings.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
---
.../bpf/bpftool/Documentation/bpftool-net.rst | 26 ++---
tools/bpf/bpftool/net.c | 98 ++++++++++++++++++-
tools/bpf/bpftool/netlink_dumper.h | 8 ++
3 files changed, 116 insertions(+), 16 deletions(-)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-net.rst b/tools/bpf/bpftool/Documentation/bpftool-net.rst
index f4e0a516335a..5e2abd3de5ab 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-net.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-net.rst
@@ -4,7 +4,7 @@
bpftool-net
================
-------------------------------------------------------------------------------
-tool for inspection of netdev/tc related bpf prog attachments
+tool for inspection of networking related bpf prog attachments
-------------------------------------------------------------------------------
:Manual section: 8
@@ -37,10 +37,13 @@ DESCRIPTION
**bpftool net { show | list }** [ **dev** *NAME* ]
List bpf program attachments in the kernel networking subsystem.
- Currently, only device driver xdp attachments and tc filter
- classification/action attachments are implemented, i.e., for
- program types **BPF_PROG_TYPE_SCHED_CLS**,
- **BPF_PROG_TYPE_SCHED_ACT** and **BPF_PROG_TYPE_XDP**.
+ Currently, device driver xdp attachments, tcx and old-style tc
+ classifier/action attachments, flow_dissector as well as netfilter
+ attachments are implemented, i.e., for
+ program types **BPF_PROG_TYPE_XDP**, **BPF_PROG_TYPE_SCHED_CLS**,
+ **BPF_PROG_TYPE_SCHED_ACT**, **BPF_PROG_TYPE_FLOW_DISSECTOR**,
+ **BPF_PROG_TYPE_NETFILTER**.
+
For programs attached to a particular cgroup, e.g.,
**BPF_PROG_TYPE_CGROUP_SKB**, **BPF_PROG_TYPE_CGROUP_SOCK**,
**BPF_PROG_TYPE_SOCK_OPS** and **BPF_PROG_TYPE_CGROUP_SOCK_ADDR**,
@@ -49,12 +52,13 @@ DESCRIPTION
bpf programs, users should consult other tools, e.g., iproute2.
The current output will start with all xdp program attachments, followed by
- all tc class/qdisc bpf program attachments. Both xdp programs and
- tc programs are ordered based on ifindex number. If multiple bpf
- programs attached to the same networking device through **tc filter**,
- the order will be first all bpf programs attached to tc classes, then
- all bpf programs attached to non clsact qdiscs, and finally all
- bpf programs attached to root and clsact qdisc.
+ all tcx, then tc class/qdisc bpf program attachments, then flow_dissector
+ and finally netfilter programs. Both xdp programs and tcx/tc programs are
+ ordered based on ifindex number. If multiple bpf programs attached
+ to the same networking device through **tc**, the order will be first
+ all bpf programs attached to tcx, then tc classes, then all bpf programs
+ attached to non clsact qdiscs, and finally all bpf programs attached
+ to root and clsact qdisc.
**bpftool** **net attach** *ATTACH_TYPE* *PROG* **dev** *NAME* [ **overwrite** ]
Attach bpf program *PROG* to network interface *NAME* with
diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
index 26a49965bf71..66a8ce8ae012 100644
--- a/tools/bpf/bpftool/net.c
+++ b/tools/bpf/bpftool/net.c
@@ -76,6 +76,11 @@ static const char * const attach_type_strings[] = {
[NET_ATTACH_TYPE_XDP_OFFLOAD] = "xdpoffload",
};
+static const char * const attach_loc_strings[] = {
+ [BPF_TCX_INGRESS] = "tcx/ingress",
+ [BPF_TCX_EGRESS] = "tcx/egress",
+};
+
const size_t net_attach_type_size = ARRAY_SIZE(attach_type_strings);
static enum net_attach_type parse_attach_type(const char *str)
@@ -422,8 +427,89 @@ static int dump_filter_nlmsg(void *cookie, void *msg, struct nlattr **tb)
filter_info->devname, filter_info->ifindex);
}
-static int show_dev_tc_bpf(int sock, unsigned int nl_pid,
- struct ip_devname_ifindex *dev)
+static int __show_dev_tc_bpf_name(__u32 id, char *name, size_t len)
+{
+ struct bpf_prog_info info = {};
+ __u32 ilen = sizeof(info);
+ int fd, ret;
+
+ fd = bpf_prog_get_fd_by_id(id);
+ if (fd < 0)
+ return fd;
+ ret = bpf_obj_get_info_by_fd(fd, &info, &ilen);
+ if (ret < 0)
+ goto out;
+ ret = -ENOENT;
+ if (info.name[0]) {
+ get_prog_full_name(&info, fd, name, len);
+ ret = 0;
+ }
+out:
+ close(fd);
+ return ret;
+}
+
+static void __show_dev_tc_bpf(const struct ip_devname_ifindex *dev,
+ const enum bpf_attach_type loc)
+{
+ __u32 prog_flags[64] = {}, link_flags[64] = {}, i, j;
+ __u32 prog_ids[64] = {}, link_ids[64] = {};
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ char prog_name[MAX_PROG_FULL_NAME];
+ int ret;
+
+ optq.prog_ids = prog_ids;
+ optq.prog_attach_flags = prog_flags;
+ optq.link_ids = link_ids;
+ optq.link_attach_flags = link_flags;
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ ret = bpf_prog_query_opts(dev->ifindex, loc, &optq);
+ if (ret)
+ return;
+ for (i = 0; i < optq.count; i++) {
+ NET_START_OBJECT;
+ NET_DUMP_STR("devname", "%s", dev->devname);
+ NET_DUMP_UINT("ifindex", "(%u)", dev->ifindex);
+ NET_DUMP_STR("kind", " %s", attach_loc_strings[loc]);
+ ret = __show_dev_tc_bpf_name(prog_ids[i], prog_name,
+ sizeof(prog_name));
+ if (!ret)
+ NET_DUMP_STR("name", " %s", prog_name);
+ NET_DUMP_UINT("prog_id", " prog_id %u ", prog_ids[i]);
+ if (prog_flags[i] || json_output) {
+ NET_START_ARRAY("prog_flags", "%s ");
+ for (j = 0; prog_flags[i] && j < 32; j++) {
+ if (!(prog_flags[i] & (1 << j)))
+ continue;
+ NET_DUMP_UINT_ONLY(1 << j);
+ }
+ NET_END_ARRAY("");
+ }
+ if (link_ids[i] || json_output) {
+ NET_DUMP_UINT("link_id", "link_id %u ", link_ids[i]);
+ if (link_flags[i] || json_output) {
+ NET_START_ARRAY("link_flags", "%s ");
+ for (j = 0; link_flags[i] && j < 32; j++) {
+ if (!(link_flags[i] & (1 << j)))
+ continue;
+ NET_DUMP_UINT_ONLY(1 << j);
+ }
+ NET_END_ARRAY("");
+ }
+ }
+ NET_END_OBJECT_FINAL;
+ }
+}
+
+static void show_dev_tc_bpf(struct ip_devname_ifindex *dev)
+{
+ __show_dev_tc_bpf(dev, BPF_TCX_INGRESS);
+ __show_dev_tc_bpf(dev, BPF_TCX_EGRESS);
+}
+
+static int show_dev_tc_bpf_classic(int sock, unsigned int nl_pid,
+ struct ip_devname_ifindex *dev)
{
struct bpf_filter_t filter_info;
struct bpf_tcinfo_t tcinfo;
@@ -790,8 +876,9 @@ static int do_show(int argc, char **argv)
if (!ret) {
NET_START_ARRAY("tc", "%s:\n");
for (i = 0; i < dev_array.used_len; i++) {
- ret = show_dev_tc_bpf(sock, nl_pid,
- &dev_array.devices[i]);
+ show_dev_tc_bpf(&dev_array.devices[i]);
+ ret = show_dev_tc_bpf_classic(sock, nl_pid,
+ &dev_array.devices[i]);
if (ret)
break;
}
@@ -839,7 +926,8 @@ static int do_help(int argc, char **argv)
" ATTACH_TYPE := { xdp | xdpgeneric | xdpdrv | xdpoffload }\n"
" " HELP_SPEC_OPTIONS " }\n"
"\n"
- "Note: Only xdp and tc attachments are supported now.\n"
+ "Note: Only xdp, tcx, tc, flow_dissector and netfilter attachments\n"
+ " are currently supported.\n"
" For progs attached to cgroups, use \"bpftool cgroup\"\n"
" to dump program attachments. For program types\n"
" sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
diff --git a/tools/bpf/bpftool/netlink_dumper.h b/tools/bpf/bpftool/netlink_dumper.h
index 774af6c62ef5..96318106fb49 100644
--- a/tools/bpf/bpftool/netlink_dumper.h
+++ b/tools/bpf/bpftool/netlink_dumper.h
@@ -76,6 +76,14 @@
fprintf(stdout, fmt_str, val); \
}
+#define NET_DUMP_UINT_ONLY(str) \
+{ \
+ if (json_output) \
+ jsonw_uint(json_wtr, str); \
+ else \
+ fprintf(stdout, "%u ", str); \
+}
+
#define NET_DUMP_STR(name, fmt_str, str) \
{ \
if (json_output) \
--
2.34.1
* [PATCH bpf-next v6 7/8] selftests/bpf: Add mprog API tests for BPF tcx opts
2023-07-19 14:08 [PATCH bpf-next v6 0/8] BPF link support for tc BPF programs Daniel Borkmann
` (5 preceding siblings ...)
2023-07-19 14:08 ` [PATCH bpf-next v6 6/8] bpftool: Extend net dump with tcx progs Daniel Borkmann
@ 2023-07-19 14:08 ` Daniel Borkmann
2023-07-19 14:08 ` [PATCH bpf-next v6 8/8] selftests/bpf: Add mprog API tests for BPF tcx links Daniel Borkmann
2023-07-19 17:20 ` [PATCH bpf-next v6 0/8] BPF link support for tc BPF programs patchwork-bot+netdevbpf
8 siblings, 0 replies; 14+ messages in thread
From: Daniel Borkmann @ 2023-07-19 14:08 UTC (permalink / raw)
To: ast
Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
toke, davem, bpf, netdev, Daniel Borkmann
Add a big batch of test coverage to assert all aspects of the tcx opts
attach, detach and query API:
# ./vmtest.sh -- ./test_progs -t tc_opts
[...]
#238 tc_opts_after:OK
#239 tc_opts_append:OK
#240 tc_opts_basic:OK
#241 tc_opts_before:OK
#242 tc_opts_chain_classic:OK
#243 tc_opts_demixed:OK
#244 tc_opts_detach:OK
#245 tc_opts_detach_after:OK
#246 tc_opts_detach_before:OK
#247 tc_opts_dev_cleanup:OK
#248 tc_opts_invalid:OK
#249 tc_opts_mixed:OK
#250 tc_opts_prepend:OK
#251 tc_opts_replace:OK
#252 tc_opts_revision:OK
Summary: 15/0 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
.../selftests/bpf/prog_tests/tc_helpers.h | 72 +
.../selftests/bpf/prog_tests/tc_opts.c | 2239 +++++++++++++++++
.../selftests/bpf/progs/test_tc_link.c | 40 +
3 files changed, 2351 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_helpers.h
create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_opts.c
create mode 100644 tools/testing/selftests/bpf/progs/test_tc_link.c
diff --git a/tools/testing/selftests/bpf/prog_tests/tc_helpers.h b/tools/testing/selftests/bpf/prog_tests/tc_helpers.h
new file mode 100644
index 000000000000..6c93215be8a3
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/tc_helpers.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2023 Isovalent */
+#ifndef TC_HELPERS
+#define TC_HELPERS
+#include <test_progs.h>
+
+static inline __u32 id_from_prog_fd(int fd)
+{
+ struct bpf_prog_info prog_info = {};
+ __u32 prog_info_len = sizeof(prog_info);
+ int err;
+
+ err = bpf_obj_get_info_by_fd(fd, &prog_info, &prog_info_len);
+ if (!ASSERT_OK(err, "id_from_prog_fd"))
+ return 0;
+
+ ASSERT_NEQ(prog_info.id, 0, "prog_info.id");
+ return prog_info.id;
+}
+
+static inline __u32 id_from_link_fd(int fd)
+{
+ struct bpf_link_info link_info = {};
+ __u32 link_info_len = sizeof(link_info);
+ int err;
+
+ err = bpf_link_get_info_by_fd(fd, &link_info, &link_info_len);
+ if (!ASSERT_OK(err, "id_from_link_fd"))
+ return 0;
+
+ ASSERT_NEQ(link_info.id, 0, "link_info.id");
+ return link_info.id;
+}
+
+static inline __u32 ifindex_from_link_fd(int fd)
+{
+ struct bpf_link_info link_info = {};
+ __u32 link_info_len = sizeof(link_info);
+ int err;
+
+ err = bpf_link_get_info_by_fd(fd, &link_info, &link_info_len);
+ if (!ASSERT_OK(err, "id_from_link_fd"))
+ return 0;
+
+ return link_info.tcx.ifindex;
+}
+
+static inline void __assert_mprog_count(int target, int expected, bool miniq, int ifindex)
+{
+ __u32 count = 0, attach_flags = 0;
+ int err;
+
+ err = bpf_prog_query(ifindex, target, 0, &attach_flags,
+ NULL, &count);
+ ASSERT_EQ(count, expected, "count");
+ if (!expected && !miniq)
+ ASSERT_EQ(err, -ENOENT, "prog_query");
+ else
+ ASSERT_EQ(err, 0, "prog_query");
+}
+
+static inline void assert_mprog_count(int target, int expected)
+{
+ __assert_mprog_count(target, expected, false, loopback);
+}
+
+static inline void assert_mprog_count_ifindex(int ifindex, int target, int expected)
+{
+ __assert_mprog_count(target, expected, false, ifindex);
+}
+
+#endif /* TC_HELPERS */
diff --git a/tools/testing/selftests/bpf/prog_tests/tc_opts.c b/tools/testing/selftests/bpf/prog_tests/tc_opts.c
new file mode 100644
index 000000000000..7914100f9b46
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/tc_opts.c
@@ -0,0 +1,2239 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+#include <uapi/linux/if_link.h>
+#include <net/if.h>
+#include <test_progs.h>
+
+#define loopback 1
+#define ping_cmd "ping -q -c1 -w1 127.0.0.1 > /dev/null"
+
+#include "test_tc_link.skel.h"
+#include "tc_helpers.h"
+
+void serial_test_tc_opts_basic(void)
+{
+ LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+ LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ __u32 fd1, fd2, id1, id2;
+ struct test_tc_link *skel;
+ __u32 prog_ids[2];
+ int err;
+
+ skel = test_tc_link__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_load"))
+ goto cleanup;
+
+ fd1 = bpf_program__fd(skel->progs.tc1);
+ fd2 = bpf_program__fd(skel->progs.tc2);
+
+ id1 = id_from_prog_fd(fd1);
+ id2 = id_from_prog_fd(fd2);
+
+ ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+
+ assert_mprog_count(BPF_TCX_INGRESS, 0);
+ assert_mprog_count(BPF_TCX_EGRESS, 0);
+
+ ASSERT_EQ(skel->bss->seen_tc1, false, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+
+ err = bpf_prog_attach_opts(fd1, loopback, BPF_TCX_INGRESS, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup;
+
+ assert_mprog_count(BPF_TCX_INGRESS, 1);
+ assert_mprog_count(BPF_TCX_EGRESS, 0);
+
+ optq.prog_ids = prog_ids;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, BPF_TCX_INGRESS, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_in;
+
+ ASSERT_EQ(optq.count, 1, "count");
+ ASSERT_EQ(optq.revision, 2, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+
+ err = bpf_prog_attach_opts(fd2, loopback, BPF_TCX_EGRESS, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup_in;
+
+ assert_mprog_count(BPF_TCX_INGRESS, 1);
+ assert_mprog_count(BPF_TCX_EGRESS, 1);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, BPF_TCX_EGRESS, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_eg;
+
+ ASSERT_EQ(optq.count, 1, "count");
+ ASSERT_EQ(optq.revision, 2, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+
+cleanup_eg:
+ err = bpf_prog_detach_opts(fd2, loopback, BPF_TCX_EGRESS, &optd);
+ ASSERT_OK(err, "prog_detach_eg");
+
+ assert_mprog_count(BPF_TCX_INGRESS, 1);
+ assert_mprog_count(BPF_TCX_EGRESS, 0);
+
+cleanup_in:
+ err = bpf_prog_detach_opts(fd1, loopback, BPF_TCX_INGRESS, &optd);
+ ASSERT_OK(err, "prog_detach_in");
+
+ assert_mprog_count(BPF_TCX_INGRESS, 0);
+ assert_mprog_count(BPF_TCX_EGRESS, 0);
+
+cleanup:
+ test_tc_link__destroy(skel);
+}
+
+static void test_tc_opts_before_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+ LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ __u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+ struct test_tc_link *skel;
+ __u32 prog_ids[5];
+ int err;
+
+ skel = test_tc_link__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_load"))
+ goto cleanup;
+
+ fd1 = bpf_program__fd(skel->progs.tc1);
+ fd2 = bpf_program__fd(skel->progs.tc2);
+ fd3 = bpf_program__fd(skel->progs.tc3);
+ fd4 = bpf_program__fd(skel->progs.tc4);
+
+ id1 = id_from_prog_fd(fd1);
+ id2 = id_from_prog_fd(fd2);
+ id3 = id_from_prog_fd(fd3);
+ id4 = id_from_prog_fd(fd4);
+
+ ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+ ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+ ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+ assert_mprog_count(target, 0);
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup;
+
+ assert_mprog_count(target, 1);
+
+ err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup_target;
+
+ assert_mprog_count(target, 2);
+
+ optq.prog_ids = prog_ids;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_target2;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 3, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+ ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_BEFORE,
+ .relative_fd = fd2,
+ );
+
+ err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup_target2;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_target3;
+
+ ASSERT_EQ(optq.count, 3, "count");
+ ASSERT_EQ(optq.revision, 4, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], id2, "prog_ids[2]");
+ ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_BEFORE,
+ .relative_id = id1,
+ );
+
+ err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup_target3;
+
+ assert_mprog_count(target, 4);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_target4;
+
+ ASSERT_EQ(optq.count, 4, "count");
+ ASSERT_EQ(optq.revision, 5, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id4, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id1, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], id3, "prog_ids[2]");
+ ASSERT_EQ(optq.prog_ids[3], id2, "prog_ids[3]");
+ ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+ ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");
+
+cleanup_target4:
+ err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 3);
+
+cleanup_target3:
+ err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 2);
+
+cleanup_target2:
+ err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 1);
+
+cleanup_target:
+ err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 0);
+
+cleanup:
+ test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_before(void)
+{
+ test_tc_opts_before_target(BPF_TCX_INGRESS);
+ test_tc_opts_before_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_after_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+ LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ __u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+ struct test_tc_link *skel;
+ __u32 prog_ids[5];
+ int err;
+
+ skel = test_tc_link__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_load"))
+ goto cleanup;
+
+ fd1 = bpf_program__fd(skel->progs.tc1);
+ fd2 = bpf_program__fd(skel->progs.tc2);
+ fd3 = bpf_program__fd(skel->progs.tc3);
+ fd4 = bpf_program__fd(skel->progs.tc4);
+
+ id1 = id_from_prog_fd(fd1);
+ id2 = id_from_prog_fd(fd2);
+ id3 = id_from_prog_fd(fd3);
+ id4 = id_from_prog_fd(fd4);
+
+ ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+ ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+ ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+ assert_mprog_count(target, 0);
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup;
+
+ assert_mprog_count(target, 1);
+
+ err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup_target;
+
+ assert_mprog_count(target, 2);
+
+ optq.prog_ids = prog_ids;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_target2;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 3, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+ ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_AFTER,
+ .relative_fd = fd1,
+ );
+
+ err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup_target2;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_target3;
+
+ ASSERT_EQ(optq.count, 3, "count");
+ ASSERT_EQ(optq.revision, 4, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], id2, "prog_ids[2]");
+ ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_AFTER,
+ .relative_id = id2,
+ );
+
+ err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup_target3;
+
+ assert_mprog_count(target, 4);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_target4;
+
+ ASSERT_EQ(optq.count, 4, "count");
+ ASSERT_EQ(optq.revision, 5, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], id2, "prog_ids[2]");
+ ASSERT_EQ(optq.prog_ids[3], id4, "prog_ids[3]");
+ ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+ ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");
+
+cleanup_target4:
+ err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 3);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_target3;
+
+ ASSERT_EQ(optq.count, 3, "count");
+ ASSERT_EQ(optq.revision, 6, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], id2, "prog_ids[2]");
+ ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");
+
+cleanup_target3:
+ err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 2);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_target2;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 7, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+cleanup_target2:
+ err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 1);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_target;
+
+ ASSERT_EQ(optq.count, 1, "count");
+ ASSERT_EQ(optq.revision, 8, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+
+cleanup_target:
+ err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 0);
+
+cleanup:
+ test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_after(void)
+{
+ test_tc_opts_after_target(BPF_TCX_INGRESS);
+ test_tc_opts_after_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_revision_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+ LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ __u32 fd1, fd2, id1, id2;
+ struct test_tc_link *skel;
+ __u32 prog_ids[3];
+ int err;
+
+ skel = test_tc_link__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_load"))
+ goto cleanup;
+
+ fd1 = bpf_program__fd(skel->progs.tc1);
+ fd2 = bpf_program__fd(skel->progs.tc2);
+
+ id1 = id_from_prog_fd(fd1);
+ id2 = id_from_prog_fd(fd2);
+
+ ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(opta,
+ .expected_revision = 1,
+ );
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup;
+
+ assert_mprog_count(target, 1);
+
+ LIBBPF_OPTS_RESET(opta,
+ .expected_revision = 1,
+ );
+
+ err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+ if (!ASSERT_EQ(err, -ESTALE, "prog_attach"))
+ goto cleanup_target;
+
+ assert_mprog_count(target, 1);
+
+ LIBBPF_OPTS_RESET(opta,
+ .expected_revision = 2,
+ );
+
+ err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup_target;
+
+ assert_mprog_count(target, 2);
+
+ optq.prog_ids = prog_ids;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_target2;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 3, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+
+ LIBBPF_OPTS_RESET(optd,
+ .expected_revision = 2,
+ );
+
+ err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+ ASSERT_EQ(err, -ESTALE, "prog_detach");
+ assert_mprog_count(target, 2);
+
+cleanup_target2:
+ LIBBPF_OPTS_RESET(optd,
+ .expected_revision = 3,
+ );
+
+ err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 1);
+
+cleanup_target:
+ LIBBPF_OPTS_RESET(optd);
+
+ err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 0);
+
+cleanup:
+ test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_revision(void)
+{
+ test_tc_opts_revision_target(BPF_TCX_INGRESS);
+ test_tc_opts_revision_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_chain_classic(int target, bool chain_tc_old)
+{
+ LIBBPF_OPTS(bpf_tc_opts, tc_opts, .handle = 1, .priority = 1);
+ LIBBPF_OPTS(bpf_tc_hook, tc_hook, .ifindex = loopback);
+ LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+ LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+ bool hook_created = false, tc_attached = false;
+ __u32 fd1, fd2, fd3, id1, id2, id3;
+ struct test_tc_link *skel;
+ int err;
+
+ skel = test_tc_link__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_load"))
+ goto cleanup;
+
+ fd1 = bpf_program__fd(skel->progs.tc1);
+ fd2 = bpf_program__fd(skel->progs.tc2);
+ fd3 = bpf_program__fd(skel->progs.tc3);
+
+ id1 = id_from_prog_fd(fd1);
+ id2 = id_from_prog_fd(fd2);
+ id3 = id_from_prog_fd(fd3);
+
+ ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+ ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+ assert_mprog_count(target, 0);
+
+ if (chain_tc_old) {
+ tc_hook.attach_point = target == BPF_TCX_INGRESS ?
+ BPF_TC_INGRESS : BPF_TC_EGRESS;
+ err = bpf_tc_hook_create(&tc_hook);
+ if (err == 0)
+ hook_created = true;
+ err = err == -EEXIST ? 0 : err;
+ if (!ASSERT_OK(err, "bpf_tc_hook_create"))
+ goto cleanup;
+
+ tc_opts.prog_fd = fd3;
+ err = bpf_tc_attach(&tc_hook, &tc_opts);
+ if (!ASSERT_OK(err, "bpf_tc_attach"))
+ goto cleanup;
+ tc_attached = true;
+ }
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup;
+
+ err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup_detach;
+
+ assert_mprog_count(target, 2);
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, chain_tc_old, "seen_tc3");
+
+ skel->bss->seen_tc1 = false;
+ skel->bss->seen_tc2 = false;
+ skel->bss->seen_tc3 = false;
+
+ err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+ if (!ASSERT_OK(err, "prog_detach"))
+ goto cleanup_detach;
+
+ assert_mprog_count(target, 1);
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, chain_tc_old, "seen_tc3");
+
+cleanup_detach:
+ err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+ if (!ASSERT_OK(err, "prog_detach"))
+ goto cleanup;
+
+ __assert_mprog_count(target, 0, chain_tc_old, loopback);
+cleanup:
+ if (tc_attached) {
+ tc_opts.flags = tc_opts.prog_fd = tc_opts.prog_id = 0;
+ err = bpf_tc_detach(&tc_hook, &tc_opts);
+ ASSERT_OK(err, "bpf_tc_detach");
+ }
+ if (hook_created) {
+ tc_hook.attach_point = BPF_TC_INGRESS | BPF_TC_EGRESS;
+ bpf_tc_hook_destroy(&tc_hook);
+ }
+ test_tc_link__destroy(skel);
+ assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_opts_chain_classic(void)
+{
+ test_tc_chain_classic(BPF_TCX_INGRESS, false);
+ test_tc_chain_classic(BPF_TCX_EGRESS, false);
+ test_tc_chain_classic(BPF_TCX_INGRESS, true);
+ test_tc_chain_classic(BPF_TCX_EGRESS, true);
+}
+
+static void test_tc_opts_replace_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+ LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ __u32 fd1, fd2, fd3, id1, id2, id3, detach_fd;
+ __u32 prog_ids[4], prog_flags[4];
+ struct test_tc_link *skel;
+ int err;
+
+ skel = test_tc_link__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_load"))
+ goto cleanup;
+
+ fd1 = bpf_program__fd(skel->progs.tc1);
+ fd2 = bpf_program__fd(skel->progs.tc2);
+ fd3 = bpf_program__fd(skel->progs.tc3);
+
+ id1 = id_from_prog_fd(fd1);
+ id2 = id_from_prog_fd(fd2);
+ id3 = id_from_prog_fd(fd3);
+
+ ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+ ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(opta,
+ .expected_revision = 1,
+ );
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup;
+
+ assert_mprog_count(target, 1);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_BEFORE,
+ .relative_id = id1,
+ .expected_revision = 2,
+ );
+
+ err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup_target;
+
+ detach_fd = fd2;
+
+ assert_mprog_count(target, 2);
+
+ optq.prog_attach_flags = prog_flags;
+ optq.prog_ids = prog_ids;
+
+ memset(prog_flags, 0, sizeof(prog_flags));
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_target2;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 3, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id1, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+ ASSERT_EQ(optq.prog_attach_flags[0], 0, "prog_flags[0]");
+ ASSERT_EQ(optq.prog_attach_flags[1], 0, "prog_flags[1]");
+ ASSERT_EQ(optq.prog_attach_flags[2], 0, "prog_flags[2]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+
+ skel->bss->seen_tc1 = false;
+ skel->bss->seen_tc2 = false;
+ skel->bss->seen_tc3 = false;
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_REPLACE,
+ .replace_prog_fd = fd2,
+ .expected_revision = 3,
+ );
+
+ err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup_target2;
+
+ detach_fd = fd3;
+
+ assert_mprog_count(target, 2);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_target2;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 4, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id3, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id1, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+
+ skel->bss->seen_tc1 = false;
+ skel->bss->seen_tc2 = false;
+ skel->bss->seen_tc3 = false;
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_REPLACE | BPF_F_BEFORE,
+ .replace_prog_fd = fd3,
+ .relative_fd = fd1,
+ .expected_revision = 4,
+ );
+
+ err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup_target2;
+
+ detach_fd = fd2;
+
+ assert_mprog_count(target, 2);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_target2;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 5, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id1, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_REPLACE,
+ .replace_prog_fd = fd2,
+ );
+
+ err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+ ASSERT_EQ(err, -EEXIST, "prog_attach");
+ assert_mprog_count(target, 2);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_REPLACE | BPF_F_AFTER,
+ .replace_prog_fd = fd2,
+ .relative_fd = fd1,
+ .expected_revision = 5,
+ );
+
+ err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+ ASSERT_EQ(err, -ERANGE, "prog_attach");
+ assert_mprog_count(target, 2);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_BEFORE | BPF_F_AFTER | BPF_F_REPLACE,
+ .replace_prog_fd = fd2,
+ .relative_fd = fd1,
+ .expected_revision = 5,
+ );
+
+ err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+ ASSERT_EQ(err, -ERANGE, "prog_attach");
+ assert_mprog_count(target, 2);
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_BEFORE,
+ .relative_id = id1,
+ .expected_revision = 5,
+ );
+
+cleanup_target2:
+ err = bpf_prog_detach_opts(detach_fd, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 1);
+
+cleanup_target:
+ LIBBPF_OPTS_RESET(optd);
+
+ err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 0);
+
+cleanup:
+ test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_replace(void)
+{
+ test_tc_opts_replace_target(BPF_TCX_INGRESS);
+ test_tc_opts_replace_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_invalid_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+ LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+ __u32 fd1, fd2, id1, id2;
+ struct test_tc_link *skel;
+ int err;
+
+ skel = test_tc_link__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_load"))
+ goto cleanup;
+
+ fd1 = bpf_program__fd(skel->progs.tc1);
+ fd2 = bpf_program__fd(skel->progs.tc2);
+
+ id1 = id_from_prog_fd(fd1);
+ id2 = id_from_prog_fd(fd2);
+
+ ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_BEFORE | BPF_F_AFTER,
+ );
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ ASSERT_EQ(err, -ERANGE, "prog_attach");
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_BEFORE | BPF_F_ID,
+ );
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ ASSERT_EQ(err, -ENOENT, "prog_attach");
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_AFTER | BPF_F_ID,
+ );
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ ASSERT_EQ(err, -ENOENT, "prog_attach");
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(opta,
+ .relative_fd = fd2,
+ );
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ ASSERT_EQ(err, -EINVAL, "prog_attach");
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_BEFORE | BPF_F_AFTER,
+ .relative_fd = fd2,
+ );
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ ASSERT_EQ(err, -ENOENT, "prog_attach");
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_ID,
+ .relative_id = id2,
+ );
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ ASSERT_EQ(err, -EINVAL, "prog_attach");
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_BEFORE,
+ .relative_fd = fd1,
+ );
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ ASSERT_EQ(err, -ENOENT, "prog_attach");
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_AFTER,
+ .relative_fd = fd1,
+ );
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ ASSERT_EQ(err, -ENOENT, "prog_attach");
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(opta);
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup;
+
+ assert_mprog_count(target, 1);
+
+ LIBBPF_OPTS_RESET(opta);
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ ASSERT_EQ(err, -EEXIST, "prog_attach");
+ assert_mprog_count(target, 1);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_BEFORE,
+ .relative_fd = fd1,
+ );
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ ASSERT_EQ(err, -EEXIST, "prog_attach");
+ assert_mprog_count(target, 1);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_AFTER,
+ .relative_fd = fd1,
+ );
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ ASSERT_EQ(err, -EEXIST, "prog_attach");
+ assert_mprog_count(target, 1);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_REPLACE,
+ .relative_fd = fd1,
+ );
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ ASSERT_EQ(err, -EINVAL, "prog_attach_x1");
+ assert_mprog_count(target, 1);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_REPLACE,
+ .replace_prog_fd = fd1,
+ );
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ ASSERT_EQ(err, -EEXIST, "prog_attach");
+ assert_mprog_count(target, 1);
+
+ err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 0);
+cleanup:
+ test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_invalid(void)
+{
+ test_tc_opts_invalid_target(BPF_TCX_INGRESS);
+ test_tc_opts_invalid_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_prepend_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+ LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ __u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+ struct test_tc_link *skel;
+ __u32 prog_ids[5];
+ int err;
+
+ skel = test_tc_link__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_load"))
+ goto cleanup;
+
+ fd1 = bpf_program__fd(skel->progs.tc1);
+ fd2 = bpf_program__fd(skel->progs.tc2);
+ fd3 = bpf_program__fd(skel->progs.tc3);
+ fd4 = bpf_program__fd(skel->progs.tc4);
+
+ id1 = id_from_prog_fd(fd1);
+ id2 = id_from_prog_fd(fd2);
+ id3 = id_from_prog_fd(fd3);
+ id4 = id_from_prog_fd(fd4);
+
+ ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+ ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+ ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+ assert_mprog_count(target, 0);
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup;
+
+ assert_mprog_count(target, 1);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_BEFORE,
+ );
+
+ err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup_target;
+
+ assert_mprog_count(target, 2);
+
+ optq.prog_ids = prog_ids;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_target2;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 3, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id1, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+ ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_BEFORE,
+ );
+
+ err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup_target2;
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_BEFORE,
+ );
+
+ err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup_target3;
+
+ assert_mprog_count(target, 4);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_target4;
+
+ ASSERT_EQ(optq.count, 4, "count");
+ ASSERT_EQ(optq.revision, 5, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id4, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], id2, "prog_ids[2]");
+ ASSERT_EQ(optq.prog_ids[3], id1, "prog_ids[3]");
+ ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+ ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");
+
+cleanup_target4:
+ err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 3);
+
+cleanup_target3:
+ err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 2);
+
+cleanup_target2:
+ err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 1);
+
+cleanup_target:
+ err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 0);
+
+cleanup:
+ test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_prepend(void)
+{
+ test_tc_opts_prepend_target(BPF_TCX_INGRESS);
+ test_tc_opts_prepend_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_append_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+ LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ __u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+ struct test_tc_link *skel;
+ __u32 prog_ids[5];
+ int err;
+
+ skel = test_tc_link__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_load"))
+ goto cleanup;
+
+ fd1 = bpf_program__fd(skel->progs.tc1);
+ fd2 = bpf_program__fd(skel->progs.tc2);
+ fd3 = bpf_program__fd(skel->progs.tc3);
+ fd4 = bpf_program__fd(skel->progs.tc4);
+
+ id1 = id_from_prog_fd(fd1);
+ id2 = id_from_prog_fd(fd2);
+ id3 = id_from_prog_fd(fd3);
+ id4 = id_from_prog_fd(fd4);
+
+ ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+ ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+ ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+ assert_mprog_count(target, 0);
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup;
+
+ assert_mprog_count(target, 1);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_AFTER,
+ );
+
+ err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup_target;
+
+ assert_mprog_count(target, 2);
+
+ optq.prog_ids = prog_ids;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_target2;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 3, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+ ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_AFTER,
+ );
+
+ err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup_target2;
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_AFTER,
+ );
+
+ err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup_target3;
+
+ assert_mprog_count(target, 4);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup_target4;
+
+ ASSERT_EQ(optq.count, 4, "count");
+ ASSERT_EQ(optq.revision, 5, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], id3, "prog_ids[2]");
+ ASSERT_EQ(optq.prog_ids[3], id4, "prog_ids[3]");
+ ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+ ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");
+
+cleanup_target4:
+ err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 3);
+
+cleanup_target3:
+ err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 2);
+
+cleanup_target2:
+ err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 1);
+
+cleanup_target:
+ err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 0);
+
+cleanup:
+ test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_append(void)
+{
+ test_tc_opts_append_target(BPF_TCX_INGRESS);
+ test_tc_opts_append_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_dev_cleanup_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+ LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ __u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+ struct test_tc_link *skel;
+ int err, ifindex;
+
+ ASSERT_OK(system("ip link add dev tcx_opts1 type veth peer name tcx_opts2"), "add veth");
+ ifindex = if_nametoindex("tcx_opts1");
+ ASSERT_NEQ(ifindex, 0, "non_zero_ifindex");
+
+ skel = test_tc_link__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_load"))
+ goto cleanup;
+
+ fd1 = bpf_program__fd(skel->progs.tc1);
+ fd2 = bpf_program__fd(skel->progs.tc2);
+ fd3 = bpf_program__fd(skel->progs.tc3);
+ fd4 = bpf_program__fd(skel->progs.tc4);
+
+ id1 = id_from_prog_fd(fd1);
+ id2 = id_from_prog_fd(fd2);
+ id3 = id_from_prog_fd(fd3);
+ id4 = id_from_prog_fd(fd4);
+
+ ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+ ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+ ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+ assert_mprog_count_ifindex(ifindex, target, 0);
+
+ err = bpf_prog_attach_opts(fd1, ifindex, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup;
+
+ assert_mprog_count_ifindex(ifindex, target, 1);
+
+ err = bpf_prog_attach_opts(fd2, ifindex, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup1;
+
+ assert_mprog_count_ifindex(ifindex, target, 2);
+
+ err = bpf_prog_attach_opts(fd3, ifindex, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup2;
+
+ assert_mprog_count_ifindex(ifindex, target, 3);
+
+ err = bpf_prog_attach_opts(fd4, ifindex, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup3;
+
+ assert_mprog_count_ifindex(ifindex, target, 4);
+
+ ASSERT_OK(system("ip link del dev tcx_opts1"), "del veth");
+ ASSERT_EQ(if_nametoindex("tcx_opts1"), 0, "dev1_removed");
+ ASSERT_EQ(if_nametoindex("tcx_opts2"), 0, "dev2_removed");
+ return;
+cleanup3:
+	err = bpf_prog_detach_opts(fd3, ifindex, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count_ifindex(ifindex, target, 2);
+cleanup2:
+	err = bpf_prog_detach_opts(fd2, ifindex, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count_ifindex(ifindex, target, 1);
+cleanup1:
+	err = bpf_prog_detach_opts(fd1, ifindex, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+
+	assert_mprog_count_ifindex(ifindex, target, 0);
+cleanup:
+ test_tc_link__destroy(skel);
+
+ ASSERT_OK(system("ip link del dev tcx_opts1"), "del veth");
+ ASSERT_EQ(if_nametoindex("tcx_opts1"), 0, "dev1_removed");
+ ASSERT_EQ(if_nametoindex("tcx_opts2"), 0, "dev2_removed");
+}
+
+void serial_test_tc_opts_dev_cleanup(void)
+{
+ test_tc_opts_dev_cleanup_target(BPF_TCX_INGRESS);
+ test_tc_opts_dev_cleanup_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_mixed_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+ LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ LIBBPF_OPTS(bpf_tcx_opts, optl);
+ __u32 pid1, pid2, pid3, pid4, lid2, lid4;
+ __u32 prog_flags[4], link_flags[4];
+ __u32 prog_ids[4], link_ids[4];
+ struct test_tc_link *skel;
+ struct bpf_link *link;
+ int err, detach_fd;
+
+ skel = test_tc_link__open();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ goto cleanup;
+
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+ 0, "tc1_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+ 0, "tc2_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target),
+ 0, "tc3_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc4, target),
+ 0, "tc4_attach_type");
+
+ err = test_tc_link__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+
+ pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+ pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+ pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+ pid4 = id_from_prog_fd(bpf_program__fd(skel->progs.tc4));
+
+ ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+ ASSERT_NEQ(pid3, pid4, "prog_ids_3_4");
+ ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+ assert_mprog_count(target, 0);
+
+ err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc1),
+ loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup;
+
+ detach_fd = bpf_program__fd(skel->progs.tc1);
+
+ assert_mprog_count(target, 1);
+
+ link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup1;
+ skel->links.tc2 = link;
+
+ lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+
+ assert_mprog_count(target, 2);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_REPLACE,
+ .replace_prog_fd = bpf_program__fd(skel->progs.tc1),
+ );
+
+ err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc2),
+ loopback, target, &opta);
+ ASSERT_EQ(err, -EEXIST, "prog_attach");
+
+ assert_mprog_count(target, 2);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_REPLACE,
+ .replace_prog_fd = bpf_program__fd(skel->progs.tc2),
+ );
+
+ err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc1),
+ loopback, target, &opta);
+ ASSERT_EQ(err, -EEXIST, "prog_attach");
+
+ assert_mprog_count(target, 2);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_REPLACE,
+ .replace_prog_fd = bpf_program__fd(skel->progs.tc2),
+ );
+
+ err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc3),
+ loopback, target, &opta);
+ ASSERT_EQ(err, -EBUSY, "prog_attach");
+
+ assert_mprog_count(target, 2);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_REPLACE,
+ .replace_prog_fd = bpf_program__fd(skel->progs.tc1),
+ );
+
+ err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc3),
+ loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup1;
+
+ detach_fd = bpf_program__fd(skel->progs.tc3);
+
+ assert_mprog_count(target, 2);
+
+ link = bpf_program__attach_tcx(skel->progs.tc4, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup1;
+ skel->links.tc4 = link;
+
+ lid4 = id_from_link_fd(bpf_link__fd(skel->links.tc4));
+
+ assert_mprog_count(target, 3);
+
+ LIBBPF_OPTS_RESET(opta,
+ .flags = BPF_F_REPLACE,
+ .replace_prog_fd = bpf_program__fd(skel->progs.tc4),
+ );
+
+ err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc2),
+ loopback, target, &opta);
+ ASSERT_EQ(err, -EEXIST, "prog_attach");
+
+ optq.prog_ids = prog_ids;
+ optq.prog_attach_flags = prog_flags;
+ optq.link_ids = link_ids;
+ optq.link_attach_flags = link_flags;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ memset(prog_flags, 0, sizeof(prog_flags));
+ memset(link_ids, 0, sizeof(link_ids));
+ memset(link_flags, 0, sizeof(link_flags));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup1;
+
+ ASSERT_EQ(optq.count, 3, "count");
+ ASSERT_EQ(optq.revision, 5, "revision");
+ ASSERT_EQ(optq.prog_ids[0], pid3, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_attach_flags[0], 0, "prog_flags[0]");
+ ASSERT_EQ(optq.link_ids[0], 0, "link_ids[0]");
+ ASSERT_EQ(optq.link_attach_flags[0], 0, "link_flags[0]");
+ ASSERT_EQ(optq.prog_ids[1], pid2, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_attach_flags[1], 0, "prog_flags[1]");
+ ASSERT_EQ(optq.link_ids[1], lid2, "link_ids[1]");
+ ASSERT_EQ(optq.link_attach_flags[1], 0, "link_flags[1]");
+ ASSERT_EQ(optq.prog_ids[2], pid4, "prog_ids[2]");
+ ASSERT_EQ(optq.prog_attach_flags[2], 0, "prog_flags[2]");
+ ASSERT_EQ(optq.link_ids[2], lid4, "link_ids[2]");
+ ASSERT_EQ(optq.link_attach_flags[2], 0, "link_flags[2]");
+ ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");
+ ASSERT_EQ(optq.prog_attach_flags[3], 0, "prog_flags[3]");
+ ASSERT_EQ(optq.link_ids[3], 0, "link_ids[3]");
+ ASSERT_EQ(optq.link_attach_flags[3], 0, "link_flags[3]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+cleanup1:
+ err = bpf_prog_detach_opts(detach_fd, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 2);
+
+cleanup:
+ test_tc_link__destroy(skel);
+ assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_opts_mixed(void)
+{
+ test_tc_opts_mixed_target(BPF_TCX_INGRESS);
+ test_tc_opts_mixed_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_demixed_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+ LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+ LIBBPF_OPTS(bpf_tcx_opts, optl);
+ struct test_tc_link *skel;
+ struct bpf_link *link;
+ __u32 pid1, pid2;
+ int err;
+
+ skel = test_tc_link__open();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ goto cleanup;
+
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+ 0, "tc1_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+ 0, "tc2_attach_type");
+
+ err = test_tc_link__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+
+ pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+ pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+ ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+
+ assert_mprog_count(target, 0);
+
+ err = bpf_prog_attach_opts(bpf_program__fd(skel->progs.tc1),
+ loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup;
+
+ assert_mprog_count(target, 1);
+
+ link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup1;
+ skel->links.tc2 = link;
+
+ assert_mprog_count(target, 2);
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_AFTER,
+ );
+
+ err = bpf_prog_detach_opts(0, loopback, target, &optd);
+ ASSERT_EQ(err, -EBUSY, "prog_detach");
+
+ assert_mprog_count(target, 2);
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_BEFORE,
+ );
+
+ err = bpf_prog_detach_opts(0, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+
+ assert_mprog_count(target, 1);
+ goto cleanup;
+
+cleanup1:
+	err = bpf_prog_detach_opts(bpf_program__fd(skel->progs.tc1),
+				   loopback, target, &optd);
+	ASSERT_OK(err, "prog_detach");
+	assert_mprog_count(target, 0);
+
+cleanup:
+ test_tc_link__destroy(skel);
+ assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_opts_demixed(void)
+{
+ test_tc_opts_demixed_target(BPF_TCX_INGRESS);
+ test_tc_opts_demixed_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_detach_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+ LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ __u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+ struct test_tc_link *skel;
+ __u32 prog_ids[5];
+ int err;
+
+ skel = test_tc_link__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_load"))
+ goto cleanup;
+
+ fd1 = bpf_program__fd(skel->progs.tc1);
+ fd2 = bpf_program__fd(skel->progs.tc2);
+ fd3 = bpf_program__fd(skel->progs.tc3);
+ fd4 = bpf_program__fd(skel->progs.tc4);
+
+ id1 = id_from_prog_fd(fd1);
+ id2 = id_from_prog_fd(fd2);
+ id3 = id_from_prog_fd(fd3);
+ id4 = id_from_prog_fd(fd4);
+
+ ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+ ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+ ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+ assert_mprog_count(target, 0);
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup;
+
+ assert_mprog_count(target, 1);
+
+ err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup1;
+
+ assert_mprog_count(target, 2);
+
+ err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup2;
+
+ assert_mprog_count(target, 3);
+
+ err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup3;
+
+ assert_mprog_count(target, 4);
+
+ optq.prog_ids = prog_ids;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup4;
+
+ ASSERT_EQ(optq.count, 4, "count");
+ ASSERT_EQ(optq.revision, 5, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], id3, "prog_ids[2]");
+ ASSERT_EQ(optq.prog_ids[3], id4, "prog_ids[3]");
+ ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_BEFORE,
+ );
+
+ err = bpf_prog_detach_opts(0, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+
+ assert_mprog_count(target, 3);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup4;
+
+ ASSERT_EQ(optq.count, 3, "count");
+ ASSERT_EQ(optq.revision, 6, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], id4, "prog_ids[2]");
+ ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_AFTER,
+ );
+
+ err = bpf_prog_detach_opts(0, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+
+ assert_mprog_count(target, 2);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup4;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 7, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+ LIBBPF_OPTS_RESET(optd);
+
+ err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 1);
+
+ err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_BEFORE,
+ );
+
+ err = bpf_prog_detach_opts(0, loopback, target, &optd);
+ ASSERT_EQ(err, -ENOENT, "prog_detach");
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_AFTER,
+ );
+
+ err = bpf_prog_detach_opts(0, loopback, target, &optd);
+ ASSERT_EQ(err, -ENOENT, "prog_detach");
+ goto cleanup;
+
+cleanup4:
+ err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 3);
+
+cleanup3:
+ err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 2);
+
+cleanup2:
+ err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 1);
+
+cleanup1:
+ err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 0);
+
+cleanup:
+ test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_detach(void)
+{
+ test_tc_opts_detach_target(BPF_TCX_INGRESS);
+ test_tc_opts_detach_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_detach_before_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+ LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ __u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+ struct test_tc_link *skel;
+ __u32 prog_ids[5];
+ int err;
+
+ skel = test_tc_link__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_load"))
+ goto cleanup;
+
+ fd1 = bpf_program__fd(skel->progs.tc1);
+ fd2 = bpf_program__fd(skel->progs.tc2);
+ fd3 = bpf_program__fd(skel->progs.tc3);
+ fd4 = bpf_program__fd(skel->progs.tc4);
+
+ id1 = id_from_prog_fd(fd1);
+ id2 = id_from_prog_fd(fd2);
+ id3 = id_from_prog_fd(fd3);
+ id4 = id_from_prog_fd(fd4);
+
+ ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+ ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+ ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+ assert_mprog_count(target, 0);
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup;
+
+ assert_mprog_count(target, 1);
+
+ err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup1;
+
+ assert_mprog_count(target, 2);
+
+ err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup2;
+
+ assert_mprog_count(target, 3);
+
+ err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup3;
+
+ assert_mprog_count(target, 4);
+
+ optq.prog_ids = prog_ids;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup4;
+
+ ASSERT_EQ(optq.count, 4, "count");
+ ASSERT_EQ(optq.revision, 5, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], id3, "prog_ids[2]");
+ ASSERT_EQ(optq.prog_ids[3], id4, "prog_ids[3]");
+ ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_BEFORE,
+ .relative_fd = fd2,
+ );
+
+ err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+
+ assert_mprog_count(target, 3);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup4;
+
+ ASSERT_EQ(optq.count, 3, "count");
+ ASSERT_EQ(optq.revision, 6, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id2, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], id4, "prog_ids[2]");
+ ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_BEFORE,
+ .relative_fd = fd2,
+ );
+
+ err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+ ASSERT_EQ(err, -ENOENT, "prog_detach");
+ assert_mprog_count(target, 3);
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_BEFORE,
+ .relative_fd = fd4,
+ );
+
+ err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+ ASSERT_EQ(err, -ERANGE, "prog_detach");
+ assert_mprog_count(target, 3);
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_BEFORE,
+ .relative_fd = fd1,
+ );
+
+ err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+ ASSERT_EQ(err, -ENOENT, "prog_detach");
+ assert_mprog_count(target, 3);
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_BEFORE,
+ .relative_fd = fd3,
+ );
+
+ err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+
+ assert_mprog_count(target, 2);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup4;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 7, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id3, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id4, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_BEFORE,
+ .relative_fd = fd4,
+ );
+
+ err = bpf_prog_detach_opts(0, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+
+ assert_mprog_count(target, 1);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup4;
+
+ ASSERT_EQ(optq.count, 1, "count");
+ ASSERT_EQ(optq.revision, 8, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id4, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_BEFORE,
+ );
+
+ err = bpf_prog_detach_opts(0, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+
+ assert_mprog_count(target, 0);
+ goto cleanup;
+
+cleanup4:
+ err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 3);
+
+cleanup3:
+ err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 2);
+
+cleanup2:
+ err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 1);
+
+cleanup1:
+ err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 0);
+
+cleanup:
+ test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_detach_before(void)
+{
+ test_tc_opts_detach_before_target(BPF_TCX_INGRESS);
+ test_tc_opts_detach_before_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_opts_detach_after_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_attach_opts, opta);
+ LIBBPF_OPTS(bpf_prog_detach_opts, optd);
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ __u32 fd1, fd2, fd3, fd4, id1, id2, id3, id4;
+ struct test_tc_link *skel;
+ __u32 prog_ids[5];
+ int err;
+
+ skel = test_tc_link__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_load"))
+ goto cleanup;
+
+ fd1 = bpf_program__fd(skel->progs.tc1);
+ fd2 = bpf_program__fd(skel->progs.tc2);
+ fd3 = bpf_program__fd(skel->progs.tc3);
+ fd4 = bpf_program__fd(skel->progs.tc4);
+
+ id1 = id_from_prog_fd(fd1);
+ id2 = id_from_prog_fd(fd2);
+ id3 = id_from_prog_fd(fd3);
+ id4 = id_from_prog_fd(fd4);
+
+ ASSERT_NEQ(id1, id2, "prog_ids_1_2");
+ ASSERT_NEQ(id3, id4, "prog_ids_3_4");
+ ASSERT_NEQ(id2, id3, "prog_ids_2_3");
+
+ assert_mprog_count(target, 0);
+
+ err = bpf_prog_attach_opts(fd1, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup;
+
+ assert_mprog_count(target, 1);
+
+ err = bpf_prog_attach_opts(fd2, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup1;
+
+ assert_mprog_count(target, 2);
+
+ err = bpf_prog_attach_opts(fd3, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup2;
+
+ assert_mprog_count(target, 3);
+
+ err = bpf_prog_attach_opts(fd4, loopback, target, &opta);
+ if (!ASSERT_EQ(err, 0, "prog_attach"))
+ goto cleanup3;
+
+ assert_mprog_count(target, 4);
+
+ optq.prog_ids = prog_ids;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup4;
+
+ ASSERT_EQ(optq.count, 4, "count");
+ ASSERT_EQ(optq.revision, 5, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id2, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], id3, "prog_ids[2]");
+ ASSERT_EQ(optq.prog_ids[3], id4, "prog_ids[3]");
+ ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_AFTER,
+ .relative_fd = fd1,
+ );
+
+ err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+
+ assert_mprog_count(target, 3);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup4;
+
+ ASSERT_EQ(optq.count, 3, "count");
+ ASSERT_EQ(optq.revision, 6, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id3, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], id4, "prog_ids[2]");
+ ASSERT_EQ(optq.prog_ids[3], 0, "prog_ids[3]");
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_AFTER,
+ .relative_fd = fd1,
+ );
+
+ err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+ ASSERT_EQ(err, -ENOENT, "prog_detach");
+ assert_mprog_count(target, 3);
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_AFTER,
+ .relative_fd = fd4,
+ );
+
+ err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+ ASSERT_EQ(err, -ERANGE, "prog_detach");
+ assert_mprog_count(target, 3);
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_AFTER,
+ .relative_fd = fd3,
+ );
+
+ err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+ ASSERT_EQ(err, -ERANGE, "prog_detach");
+ assert_mprog_count(target, 3);
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_AFTER,
+ .relative_fd = fd1,
+ );
+
+ err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+ ASSERT_EQ(err, -ERANGE, "prog_detach");
+ assert_mprog_count(target, 3);
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_AFTER,
+ .relative_fd = fd1,
+ );
+
+ err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+
+ assert_mprog_count(target, 2);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup4;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 7, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], id4, "prog_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_AFTER,
+ .relative_fd = fd1,
+ );
+
+ err = bpf_prog_detach_opts(0, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+
+ assert_mprog_count(target, 1);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup4;
+
+ ASSERT_EQ(optq.count, 1, "count");
+ ASSERT_EQ(optq.revision, 8, "revision");
+ ASSERT_EQ(optq.prog_ids[0], id1, "prog_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+
+ LIBBPF_OPTS_RESET(optd,
+ .flags = BPF_F_AFTER,
+ );
+
+ err = bpf_prog_detach_opts(0, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+
+ assert_mprog_count(target, 0);
+ goto cleanup;
+
+cleanup4:
+ err = bpf_prog_detach_opts(fd4, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 3);
+
+cleanup3:
+ err = bpf_prog_detach_opts(fd3, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 2);
+
+cleanup2:
+ err = bpf_prog_detach_opts(fd2, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 1);
+
+cleanup1:
+ err = bpf_prog_detach_opts(fd1, loopback, target, &optd);
+ ASSERT_OK(err, "prog_detach");
+ assert_mprog_count(target, 0);
+
+cleanup:
+ test_tc_link__destroy(skel);
+}
+
+void serial_test_tc_opts_detach_after(void)
+{
+ test_tc_opts_detach_after_target(BPF_TCX_INGRESS);
+ test_tc_opts_detach_after_target(BPF_TCX_EGRESS);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_tc_link.c b/tools/testing/selftests/bpf/progs/test_tc_link.c
new file mode 100644
index 000000000000..ed1fd0e9cee9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_tc_link.c
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+#include <stdbool.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+char LICENSE[] SEC("license") = "GPL";
+
+bool seen_tc1;
+bool seen_tc2;
+bool seen_tc3;
+bool seen_tc4;
+
+SEC("tc/ingress")
+int tc1(struct __sk_buff *skb)
+{
+ seen_tc1 = true;
+ return TCX_NEXT;
+}
+
+SEC("tc/egress")
+int tc2(struct __sk_buff *skb)
+{
+ seen_tc2 = true;
+ return TCX_NEXT;
+}
+
+SEC("tc/egress")
+int tc3(struct __sk_buff *skb)
+{
+ seen_tc3 = true;
+ return TCX_NEXT;
+}
+
+SEC("tc/egress")
+int tc4(struct __sk_buff *skb)
+{
+ seen_tc4 = true;
+ return TCX_NEXT;
+}
--
2.34.1
* [PATCH bpf-next v6 8/8] selftests/bpf: Add mprog API tests for BPF tcx links
2023-07-19 14:08 [PATCH bpf-next v6 0/8] BPF link support for tc BPF programs Daniel Borkmann
2023-07-19 14:08 ` [PATCH bpf-next v6 7/8] selftests/bpf: Add mprog API tests for BPF tcx opts Daniel Borkmann
@ 2023-07-19 14:08 ` Daniel Borkmann
2023-07-19 17:20 ` [PATCH bpf-next v6 0/8] BPF link support for tc BPF programs patchwork-bot+netdevbpf
From: Daniel Borkmann @ 2023-07-19 14:08 UTC (permalink / raw)
To: ast
Cc: andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu, joe,
toke, davem, bpf, netdev, Daniel Borkmann
Add a big batch of test coverage to assert all aspects of the tcx link API:
# ./vmtest.sh -- ./test_progs -t tc_links
[...]
#225 tc_links_after:OK
#226 tc_links_append:OK
#227 tc_links_basic:OK
#228 tc_links_before:OK
#229 tc_links_chain_classic:OK
#230 tc_links_dev_cleanup:OK
#231 tc_links_invalid:OK
#232 tc_links_prepend:OK
#233 tc_links_replace:OK
#234 tc_links_revision:OK
Summary: 10/0 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
.../selftests/bpf/prog_tests/tc_links.c | 1583 +++++++++++++++++
1 file changed, 1583 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_links.c
diff --git a/tools/testing/selftests/bpf/prog_tests/tc_links.c b/tools/testing/selftests/bpf/prog_tests/tc_links.c
new file mode 100644
index 000000000000..81eea5f10742
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/tc_links.c
@@ -0,0 +1,1583 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Isovalent */
+#include <uapi/linux/if_link.h>
+#include <net/if.h>
+#include <test_progs.h>
+
+#define loopback 1
+#define ping_cmd "ping -q -c1 -w1 127.0.0.1 > /dev/null"
+
+#include "test_tc_link.skel.h"
+#include "tc_helpers.h"
+
+void serial_test_tc_links_basic(void)
+{
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ LIBBPF_OPTS(bpf_tcx_opts, optl);
+ __u32 prog_ids[2], link_ids[2];
+ __u32 pid1, pid2, lid1, lid2;
+ struct test_tc_link *skel;
+ struct bpf_link *link;
+ int err;
+
+ skel = test_tc_link__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_load"))
+ goto cleanup;
+
+ pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+ pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+
+ ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+
+ assert_mprog_count(BPF_TCX_INGRESS, 0);
+ assert_mprog_count(BPF_TCX_EGRESS, 0);
+
+ ASSERT_EQ(skel->bss->seen_tc1, false, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc1 = link;
+
+ lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+ assert_mprog_count(BPF_TCX_INGRESS, 1);
+ assert_mprog_count(BPF_TCX_EGRESS, 0);
+
+ optq.prog_ids = prog_ids;
+ optq.link_ids = link_ids;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ memset(link_ids, 0, sizeof(link_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, BPF_TCX_INGRESS, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup;
+
+ ASSERT_EQ(optq.count, 1, "count");
+ ASSERT_EQ(optq.revision, 2, "revision");
+ ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+ ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+ ASSERT_EQ(optq.link_ids[1], 0, "link_ids[1]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+
+ link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc2 = link;
+
+ lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+ ASSERT_NEQ(lid1, lid2, "link_ids_1_2");
+
+ assert_mprog_count(BPF_TCX_INGRESS, 1);
+ assert_mprog_count(BPF_TCX_EGRESS, 1);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ memset(link_ids, 0, sizeof(link_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, BPF_TCX_EGRESS, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup;
+
+ ASSERT_EQ(optq.count, 1, "count");
+ ASSERT_EQ(optq.revision, 2, "revision");
+ ASSERT_EQ(optq.prog_ids[0], pid2, "prog_ids[0]");
+ ASSERT_EQ(optq.link_ids[0], lid2, "link_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+ ASSERT_EQ(optq.link_ids[1], 0, "link_ids[1]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+cleanup:
+ test_tc_link__destroy(skel);
+
+ assert_mprog_count(BPF_TCX_INGRESS, 0);
+ assert_mprog_count(BPF_TCX_EGRESS, 0);
+}
+
+static void test_tc_links_before_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ LIBBPF_OPTS(bpf_tcx_opts, optl);
+ __u32 prog_ids[5], link_ids[5];
+ __u32 pid1, pid2, pid3, pid4;
+ __u32 lid1, lid2, lid3, lid4;
+ struct test_tc_link *skel;
+ struct bpf_link *link;
+ int err;
+
+ skel = test_tc_link__open();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ goto cleanup;
+
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+ 0, "tc1_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+ 0, "tc2_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target),
+ 0, "tc3_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc4, target),
+ 0, "tc4_attach_type");
+
+ err = test_tc_link__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+
+ pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+ pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+ pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+ pid4 = id_from_prog_fd(bpf_program__fd(skel->progs.tc4));
+
+ ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+ ASSERT_NEQ(pid3, pid4, "prog_ids_3_4");
+ ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+ assert_mprog_count(target, 0);
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc1 = link;
+
+ lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+ assert_mprog_count(target, 1);
+
+ link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc2 = link;
+
+ lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+
+ assert_mprog_count(target, 2);
+
+ optq.prog_ids = prog_ids;
+ optq.link_ids = link_ids;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ memset(link_ids, 0, sizeof(link_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 3, "revision");
+ ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+ ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], pid2, "prog_ids[1]");
+ ASSERT_EQ(optq.link_ids[1], lid2, "link_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+ ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+ ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");
+
+ skel->bss->seen_tc1 = false;
+ skel->bss->seen_tc2 = false;
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_BEFORE,
+ .relative_fd = bpf_program__fd(skel->progs.tc2),
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc3, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc3 = link;
+
+ lid3 = id_from_link_fd(bpf_link__fd(skel->links.tc3));
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_BEFORE | BPF_F_LINK,
+ .relative_id = lid1,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc4, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc4 = link;
+
+ lid4 = id_from_link_fd(bpf_link__fd(skel->links.tc4));
+
+ assert_mprog_count(target, 4);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ memset(link_ids, 0, sizeof(link_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup;
+
+ ASSERT_EQ(optq.count, 4, "count");
+ ASSERT_EQ(optq.revision, 5, "revision");
+ ASSERT_EQ(optq.prog_ids[0], pid4, "prog_ids[0]");
+ ASSERT_EQ(optq.link_ids[0], lid4, "link_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], pid1, "prog_ids[1]");
+ ASSERT_EQ(optq.link_ids[1], lid1, "link_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], pid3, "prog_ids[2]");
+ ASSERT_EQ(optq.link_ids[2], lid3, "link_ids[2]");
+ ASSERT_EQ(optq.prog_ids[3], pid2, "prog_ids[3]");
+ ASSERT_EQ(optq.link_ids[3], lid2, "link_ids[3]");
+ ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+ ASSERT_EQ(optq.link_ids[4], 0, "link_ids[4]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+ ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");
+cleanup:
+ test_tc_link__destroy(skel);
+ assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_before(void)
+{
+ test_tc_links_before_target(BPF_TCX_INGRESS);
+ test_tc_links_before_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_links_after_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ LIBBPF_OPTS(bpf_tcx_opts, optl);
+ __u32 prog_ids[5], link_ids[5];
+ __u32 pid1, pid2, pid3, pid4;
+ __u32 lid1, lid2, lid3, lid4;
+ struct test_tc_link *skel;
+ struct bpf_link *link;
+ int err;
+
+ skel = test_tc_link__open();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ goto cleanup;
+
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+ 0, "tc1_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+ 0, "tc2_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target),
+ 0, "tc3_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc4, target),
+ 0, "tc4_attach_type");
+
+ err = test_tc_link__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+
+ pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+ pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+ pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+ pid4 = id_from_prog_fd(bpf_program__fd(skel->progs.tc4));
+
+ ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+ ASSERT_NEQ(pid3, pid4, "prog_ids_3_4");
+ ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+ assert_mprog_count(target, 0);
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc1 = link;
+
+ lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+ assert_mprog_count(target, 1);
+
+ link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc2 = link;
+
+ lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+
+ assert_mprog_count(target, 2);
+
+ optq.prog_ids = prog_ids;
+ optq.link_ids = link_ids;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ memset(link_ids, 0, sizeof(link_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 3, "revision");
+ ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+ ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], pid2, "prog_ids[1]");
+ ASSERT_EQ(optq.link_ids[1], lid2, "link_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+ ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+ ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");
+
+ skel->bss->seen_tc1 = false;
+ skel->bss->seen_tc2 = false;
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_AFTER,
+ .relative_fd = bpf_program__fd(skel->progs.tc1),
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc3, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc3 = link;
+
+ lid3 = id_from_link_fd(bpf_link__fd(skel->links.tc3));
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_AFTER | BPF_F_LINK,
+ .relative_fd = bpf_link__fd(skel->links.tc2),
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc4, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc4 = link;
+
+ lid4 = id_from_link_fd(bpf_link__fd(skel->links.tc4));
+
+ assert_mprog_count(target, 4);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ memset(link_ids, 0, sizeof(link_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup;
+
+ ASSERT_EQ(optq.count, 4, "count");
+ ASSERT_EQ(optq.revision, 5, "revision");
+ ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+ ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], pid3, "prog_ids[1]");
+ ASSERT_EQ(optq.link_ids[1], lid3, "link_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], pid2, "prog_ids[2]");
+ ASSERT_EQ(optq.link_ids[2], lid2, "link_ids[2]");
+ ASSERT_EQ(optq.prog_ids[3], pid4, "prog_ids[3]");
+ ASSERT_EQ(optq.link_ids[3], lid4, "link_ids[3]");
+ ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+ ASSERT_EQ(optq.link_ids[4], 0, "link_ids[4]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+ ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");
+cleanup:
+ test_tc_link__destroy(skel);
+ assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_after(void)
+{
+ test_tc_links_after_target(BPF_TCX_INGRESS);
+ test_tc_links_after_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_links_revision_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ LIBBPF_OPTS(bpf_tcx_opts, optl);
+ __u32 prog_ids[3], link_ids[3];
+ __u32 pid1, pid2, lid1, lid2;
+ struct test_tc_link *skel;
+ struct bpf_link *link;
+ int err;
+
+ skel = test_tc_link__open();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ goto cleanup;
+
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+ 0, "tc1_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+ 0, "tc2_attach_type");
+
+ err = test_tc_link__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+
+ pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+ pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+
+ ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+
+ assert_mprog_count(target, 0);
+
+ optl.expected_revision = 1;
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc1 = link;
+
+ lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+ assert_mprog_count(target, 1);
+
+ optl.expected_revision = 1;
+
+ link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 1);
+
+ optl.expected_revision = 2;
+
+ link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc2 = link;
+
+ lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+
+ assert_mprog_count(target, 2);
+
+ optq.prog_ids = prog_ids;
+ optq.link_ids = link_ids;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ memset(link_ids, 0, sizeof(link_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 3, "revision");
+ ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+ ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], pid2, "prog_ids[1]");
+ ASSERT_EQ(optq.link_ids[1], lid2, "link_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+ ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+cleanup:
+ test_tc_link__destroy(skel);
+ assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_revision(void)
+{
+ test_tc_links_revision_target(BPF_TCX_INGRESS);
+ test_tc_links_revision_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_chain_classic(int target, bool chain_tc_old)
+{
+ LIBBPF_OPTS(bpf_tc_opts, tc_opts, .handle = 1, .priority = 1);
+ LIBBPF_OPTS(bpf_tc_hook, tc_hook, .ifindex = loopback);
+ bool hook_created = false, tc_attached = false;
+ LIBBPF_OPTS(bpf_tcx_opts, optl);
+ __u32 pid1, pid2, pid3;
+ struct test_tc_link *skel;
+ struct bpf_link *link;
+ int err;
+
+ skel = test_tc_link__open();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ goto cleanup;
+
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+ 0, "tc1_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+ 0, "tc2_attach_type");
+
+ err = test_tc_link__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+
+ pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+ pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+ pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+
+ ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+ ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+ assert_mprog_count(target, 0);
+
+ if (chain_tc_old) {
+ tc_hook.attach_point = target == BPF_TCX_INGRESS ?
+ BPF_TC_INGRESS : BPF_TC_EGRESS;
+ err = bpf_tc_hook_create(&tc_hook);
+ if (err == 0)
+ hook_created = true;
+ err = err == -EEXIST ? 0 : err;
+ if (!ASSERT_OK(err, "bpf_tc_hook_create"))
+ goto cleanup;
+
+ tc_opts.prog_fd = bpf_program__fd(skel->progs.tc3);
+ err = bpf_tc_attach(&tc_hook, &tc_opts);
+ if (!ASSERT_OK(err, "bpf_tc_attach"))
+ goto cleanup;
+ tc_attached = true;
+ }
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc1 = link;
+
+ link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc2 = link;
+
+ assert_mprog_count(target, 2);
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, chain_tc_old, "seen_tc3");
+
+ skel->bss->seen_tc1 = false;
+ skel->bss->seen_tc2 = false;
+ skel->bss->seen_tc3 = false;
+
+ err = bpf_link__detach(skel->links.tc2);
+ if (!ASSERT_OK(err, "prog_detach"))
+ goto cleanup;
+
+ assert_mprog_count(target, 1);
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, chain_tc_old, "seen_tc3");
+cleanup:
+ if (tc_attached) {
+ tc_opts.flags = tc_opts.prog_fd = tc_opts.prog_id = 0;
+ err = bpf_tc_detach(&tc_hook, &tc_opts);
+ ASSERT_OK(err, "bpf_tc_detach");
+ }
+ if (hook_created) {
+ tc_hook.attach_point = BPF_TC_INGRESS | BPF_TC_EGRESS;
+ bpf_tc_hook_destroy(&tc_hook);
+ }
+ assert_mprog_count(target, 1);
+ test_tc_link__destroy(skel);
+ assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_chain_classic(void)
+{
+ test_tc_chain_classic(BPF_TCX_INGRESS, false);
+ test_tc_chain_classic(BPF_TCX_EGRESS, false);
+ test_tc_chain_classic(BPF_TCX_INGRESS, true);
+ test_tc_chain_classic(BPF_TCX_EGRESS, true);
+}
+
+static void test_tc_links_replace_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ LIBBPF_OPTS(bpf_tcx_opts, optl);
+ __u32 pid1, pid2, pid3, lid1, lid2;
+ __u32 prog_ids[4], link_ids[4];
+ struct test_tc_link *skel;
+ struct bpf_link *link;
+ int err;
+
+ skel = test_tc_link__open();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ goto cleanup;
+
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+ 0, "tc1_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+ 0, "tc2_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target),
+ 0, "tc3_attach_type");
+
+ err = test_tc_link__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+
+ pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+ pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+ pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+
+ ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+ ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+ assert_mprog_count(target, 0);
+
+ optl.expected_revision = 1;
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc1 = link;
+
+ lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+ assert_mprog_count(target, 1);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_BEFORE,
+ .relative_id = pid1,
+ .expected_revision = 2,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc2 = link;
+
+ lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+
+ assert_mprog_count(target, 2);
+
+ optq.prog_ids = prog_ids;
+ optq.link_ids = link_ids;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ memset(link_ids, 0, sizeof(link_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 3, "revision");
+ ASSERT_EQ(optq.prog_ids[0], pid2, "prog_ids[0]");
+ ASSERT_EQ(optq.link_ids[0], lid2, "link_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], pid1, "prog_ids[1]");
+ ASSERT_EQ(optq.link_ids[1], lid1, "link_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+ ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+
+ skel->bss->seen_tc1 = false;
+ skel->bss->seen_tc2 = false;
+ skel->bss->seen_tc3 = false;
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_REPLACE,
+ .relative_fd = bpf_program__fd(skel->progs.tc2),
+ .expected_revision = 3,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc3, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 2);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_REPLACE | BPF_F_LINK,
+ .relative_fd = bpf_link__fd(skel->links.tc2),
+ .expected_revision = 3,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc3, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 2);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_REPLACE | BPF_F_LINK | BPF_F_AFTER,
+ .relative_id = lid2,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc3, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 2);
+
+ err = bpf_link__update_program(skel->links.tc2, skel->progs.tc3);
+ if (!ASSERT_OK(err, "link_update"))
+ goto cleanup;
+
+ assert_mprog_count(target, 2);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ memset(link_ids, 0, sizeof(link_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 4, "revision");
+ ASSERT_EQ(optq.prog_ids[0], pid3, "prog_ids[0]");
+ ASSERT_EQ(optq.link_ids[0], lid2, "link_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], pid1, "prog_ids[1]");
+ ASSERT_EQ(optq.link_ids[1], lid1, "link_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+ ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+
+ skel->bss->seen_tc1 = false;
+ skel->bss->seen_tc2 = false;
+ skel->bss->seen_tc3 = false;
+
+ err = bpf_link__detach(skel->links.tc2);
+ if (!ASSERT_OK(err, "link_detach"))
+ goto cleanup;
+
+ assert_mprog_count(target, 1);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ memset(link_ids, 0, sizeof(link_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup;
+
+ ASSERT_EQ(optq.count, 1, "count");
+ ASSERT_EQ(optq.revision, 5, "revision");
+ ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+ ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+ ASSERT_EQ(optq.link_ids[1], 0, "link_ids[1]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+
+ skel->bss->seen_tc1 = false;
+ skel->bss->seen_tc2 = false;
+ skel->bss->seen_tc3 = false;
+
+ err = bpf_link__update_program(skel->links.tc1, skel->progs.tc1);
+ if (!ASSERT_OK(err, "link_update_self"))
+ goto cleanup;
+
+ assert_mprog_count(target, 1);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ memset(link_ids, 0, sizeof(link_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup;
+
+ ASSERT_EQ(optq.count, 1, "count");
+ ASSERT_EQ(optq.revision, 5, "revision");
+ ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+ ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], 0, "prog_ids[1]");
+ ASSERT_EQ(optq.link_ids[1], 0, "link_ids[1]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, false, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+cleanup:
+ test_tc_link__destroy(skel);
+ assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_replace(void)
+{
+ test_tc_links_replace_target(BPF_TCX_INGRESS);
+ test_tc_links_replace_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_links_invalid_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ LIBBPF_OPTS(bpf_tcx_opts, optl);
+ __u32 pid1, pid2, lid1;
+ struct test_tc_link *skel;
+ struct bpf_link *link;
+ int err;
+
+ skel = test_tc_link__open();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ goto cleanup;
+
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+ 0, "tc1_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+ 0, "tc2_attach_type");
+
+ err = test_tc_link__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+
+ pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+ pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+
+ ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+
+ assert_mprog_count(target, 0);
+
+ optl.flags = BPF_F_BEFORE | BPF_F_AFTER;
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_BEFORE | BPF_F_ID,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_AFTER | BPF_F_ID,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_ID,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_LINK,
+ .relative_fd = bpf_program__fd(skel->progs.tc2),
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_LINK,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(optl,
+ .relative_fd = bpf_program__fd(skel->progs.tc2),
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_BEFORE | BPF_F_AFTER,
+ .relative_fd = bpf_program__fd(skel->progs.tc2),
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_BEFORE,
+ .relative_fd = bpf_program__fd(skel->progs.tc1),
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_ID,
+ .relative_id = pid2,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_ID,
+ .relative_id = 42,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_BEFORE,
+ .relative_fd = bpf_program__fd(skel->progs.tc1),
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_BEFORE | BPF_F_LINK,
+ .relative_fd = bpf_program__fd(skel->progs.tc1),
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_AFTER,
+ .relative_fd = bpf_program__fd(skel->progs.tc1),
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(optl);
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, 0, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_AFTER | BPF_F_LINK,
+ .relative_fd = bpf_program__fd(skel->progs.tc1),
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 0);
+
+ LIBBPF_OPTS_RESET(optl);
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc1 = link;
+
+ lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+ assert_mprog_count(target, 1);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_AFTER | BPF_F_LINK,
+ .relative_fd = bpf_program__fd(skel->progs.tc1),
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 1);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_BEFORE | BPF_F_LINK | BPF_F_ID,
+ .relative_id = ~0,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 1);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_BEFORE | BPF_F_LINK | BPF_F_ID,
+ .relative_id = lid1,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 1);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_BEFORE | BPF_F_ID,
+ .relative_id = pid1,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_ERR_PTR(link, "link_attach_should_fail")) {
+ bpf_link__destroy(link);
+ goto cleanup;
+ }
+
+ assert_mprog_count(target, 1);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_BEFORE | BPF_F_LINK | BPF_F_ID,
+ .relative_id = lid1,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc2 = link;
+
+ assert_mprog_count(target, 2);
+cleanup:
+ test_tc_link__destroy(skel);
+ assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_invalid(void)
+{
+ test_tc_links_invalid_target(BPF_TCX_INGRESS);
+ test_tc_links_invalid_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_links_prepend_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ LIBBPF_OPTS(bpf_tcx_opts, optl);
+ __u32 prog_ids[5], link_ids[5];
+ __u32 pid1, pid2, pid3, pid4;
+ __u32 lid1, lid2, lid3, lid4;
+ struct test_tc_link *skel;
+ struct bpf_link *link;
+ int err;
+
+ skel = test_tc_link__open();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ goto cleanup;
+
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+ 0, "tc1_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+ 0, "tc2_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target),
+ 0, "tc3_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc4, target),
+ 0, "tc4_attach_type");
+
+ err = test_tc_link__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+
+ pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+ pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+ pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+ pid4 = id_from_prog_fd(bpf_program__fd(skel->progs.tc4));
+
+ ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+ ASSERT_NEQ(pid3, pid4, "prog_ids_3_4");
+ ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+ assert_mprog_count(target, 0);
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc1 = link;
+
+ lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+ assert_mprog_count(target, 1);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_BEFORE,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc2 = link;
+
+ lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+
+ assert_mprog_count(target, 2);
+
+ optq.prog_ids = prog_ids;
+ optq.link_ids = link_ids;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ memset(link_ids, 0, sizeof(link_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 3, "revision");
+ ASSERT_EQ(optq.prog_ids[0], pid2, "prog_ids[0]");
+ ASSERT_EQ(optq.link_ids[0], lid2, "link_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], pid1, "prog_ids[1]");
+ ASSERT_EQ(optq.link_ids[1], lid1, "link_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+ ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+ ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");
+
+ skel->bss->seen_tc1 = false;
+ skel->bss->seen_tc2 = false;
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_BEFORE,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc3, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc3 = link;
+
+ lid3 = id_from_link_fd(bpf_link__fd(skel->links.tc3));
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_BEFORE,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc4, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc4 = link;
+
+ lid4 = id_from_link_fd(bpf_link__fd(skel->links.tc4));
+
+ assert_mprog_count(target, 4);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ memset(link_ids, 0, sizeof(link_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup;
+
+ ASSERT_EQ(optq.count, 4, "count");
+ ASSERT_EQ(optq.revision, 5, "revision");
+ ASSERT_EQ(optq.prog_ids[0], pid4, "prog_ids[0]");
+ ASSERT_EQ(optq.link_ids[0], lid4, "link_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], pid3, "prog_ids[1]");
+ ASSERT_EQ(optq.link_ids[1], lid3, "link_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], pid2, "prog_ids[2]");
+ ASSERT_EQ(optq.link_ids[2], lid2, "link_ids[2]");
+ ASSERT_EQ(optq.prog_ids[3], pid1, "prog_ids[3]");
+ ASSERT_EQ(optq.link_ids[3], lid1, "link_ids[3]");
+ ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+ ASSERT_EQ(optq.link_ids[4], 0, "link_ids[4]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+ ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");
+cleanup:
+ test_tc_link__destroy(skel);
+ assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_prepend(void)
+{
+ test_tc_links_prepend_target(BPF_TCX_INGRESS);
+ test_tc_links_prepend_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_links_append_target(int target)
+{
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ LIBBPF_OPTS(bpf_tcx_opts, optl);
+ __u32 prog_ids[5], link_ids[5];
+ __u32 pid1, pid2, pid3, pid4;
+ __u32 lid1, lid2, lid3, lid4;
+ struct test_tc_link *skel;
+ struct bpf_link *link;
+ int err;
+
+ skel = test_tc_link__open();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ goto cleanup;
+
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+ 0, "tc1_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+ 0, "tc2_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target),
+ 0, "tc3_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc4, target),
+ 0, "tc4_attach_type");
+
+ err = test_tc_link__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+
+ pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+ pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+ pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+ pid4 = id_from_prog_fd(bpf_program__fd(skel->progs.tc4));
+
+ ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+ ASSERT_NEQ(pid3, pid4, "prog_ids_3_4");
+ ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+ assert_mprog_count(target, 0);
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc1 = link;
+
+ lid1 = id_from_link_fd(bpf_link__fd(skel->links.tc1));
+
+ assert_mprog_count(target, 1);
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_AFTER,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc2, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc2 = link;
+
+ lid2 = id_from_link_fd(bpf_link__fd(skel->links.tc2));
+
+ assert_mprog_count(target, 2);
+
+ optq.prog_ids = prog_ids;
+ optq.link_ids = link_ids;
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ memset(link_ids, 0, sizeof(link_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup;
+
+ ASSERT_EQ(optq.count, 2, "count");
+ ASSERT_EQ(optq.revision, 3, "revision");
+ ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+ ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], pid2, "prog_ids[1]");
+ ASSERT_EQ(optq.link_ids[1], lid2, "link_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], 0, "prog_ids[2]");
+ ASSERT_EQ(optq.link_ids[2], 0, "link_ids[2]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, false, "seen_tc3");
+ ASSERT_EQ(skel->bss->seen_tc4, false, "seen_tc4");
+
+ skel->bss->seen_tc1 = false;
+ skel->bss->seen_tc2 = false;
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_AFTER,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc3, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc3 = link;
+
+ lid3 = id_from_link_fd(bpf_link__fd(skel->links.tc3));
+
+ LIBBPF_OPTS_RESET(optl,
+ .flags = BPF_F_AFTER,
+ );
+
+ link = bpf_program__attach_tcx(skel->progs.tc4, loopback, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc4 = link;
+
+ lid4 = id_from_link_fd(bpf_link__fd(skel->links.tc4));
+
+ assert_mprog_count(target, 4);
+
+ memset(prog_ids, 0, sizeof(prog_ids));
+ memset(link_ids, 0, sizeof(link_ids));
+ optq.count = ARRAY_SIZE(prog_ids);
+
+ err = bpf_prog_query_opts(loopback, target, &optq);
+ if (!ASSERT_OK(err, "prog_query"))
+ goto cleanup;
+
+ ASSERT_EQ(optq.count, 4, "count");
+ ASSERT_EQ(optq.revision, 5, "revision");
+ ASSERT_EQ(optq.prog_ids[0], pid1, "prog_ids[0]");
+ ASSERT_EQ(optq.link_ids[0], lid1, "link_ids[0]");
+ ASSERT_EQ(optq.prog_ids[1], pid2, "prog_ids[1]");
+ ASSERT_EQ(optq.link_ids[1], lid2, "link_ids[1]");
+ ASSERT_EQ(optq.prog_ids[2], pid3, "prog_ids[2]");
+ ASSERT_EQ(optq.link_ids[2], lid3, "link_ids[2]");
+ ASSERT_EQ(optq.prog_ids[3], pid4, "prog_ids[3]");
+ ASSERT_EQ(optq.link_ids[3], lid4, "link_ids[3]");
+ ASSERT_EQ(optq.prog_ids[4], 0, "prog_ids[4]");
+ ASSERT_EQ(optq.link_ids[4], 0, "link_ids[4]");
+
+ ASSERT_OK(system(ping_cmd), ping_cmd);
+
+ ASSERT_EQ(skel->bss->seen_tc1, true, "seen_tc1");
+ ASSERT_EQ(skel->bss->seen_tc2, true, "seen_tc2");
+ ASSERT_EQ(skel->bss->seen_tc3, true, "seen_tc3");
+ ASSERT_EQ(skel->bss->seen_tc4, true, "seen_tc4");
+cleanup:
+ test_tc_link__destroy(skel);
+ assert_mprog_count(target, 0);
+}
+
+void serial_test_tc_links_append(void)
+{
+ test_tc_links_append_target(BPF_TCX_INGRESS);
+ test_tc_links_append_target(BPF_TCX_EGRESS);
+}
+
+static void test_tc_links_dev_cleanup_target(int target)
+{
+ LIBBPF_OPTS(bpf_tcx_opts, optl);
+ LIBBPF_OPTS(bpf_prog_query_opts, optq);
+ __u32 pid1, pid2, pid3, pid4;
+ struct test_tc_link *skel;
+ struct bpf_link *link;
+ int err, ifindex;
+
+ ASSERT_OK(system("ip link add dev tcx_opts1 type veth peer name tcx_opts2"), "add veth");
+ ifindex = if_nametoindex("tcx_opts1");
+ ASSERT_NEQ(ifindex, 0, "non_zero_ifindex");
+
+ skel = test_tc_link__open();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ goto cleanup;
+
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc1, target),
+ 0, "tc1_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc2, target),
+ 0, "tc2_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc3, target),
+ 0, "tc3_attach_type");
+ ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc4, target),
+ 0, "tc4_attach_type");
+
+ err = test_tc_link__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+
+ pid1 = id_from_prog_fd(bpf_program__fd(skel->progs.tc1));
+ pid2 = id_from_prog_fd(bpf_program__fd(skel->progs.tc2));
+ pid3 = id_from_prog_fd(bpf_program__fd(skel->progs.tc3));
+ pid4 = id_from_prog_fd(bpf_program__fd(skel->progs.tc4));
+
+ ASSERT_NEQ(pid1, pid2, "prog_ids_1_2");
+ ASSERT_NEQ(pid3, pid4, "prog_ids_3_4");
+ ASSERT_NEQ(pid2, pid3, "prog_ids_2_3");
+
+ assert_mprog_count(target, 0);
+
+ link = bpf_program__attach_tcx(skel->progs.tc1, ifindex, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc1 = link;
+
+ assert_mprog_count_ifindex(ifindex, target, 1);
+
+ link = bpf_program__attach_tcx(skel->progs.tc2, ifindex, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc2 = link;
+
+ assert_mprog_count_ifindex(ifindex, target, 2);
+
+ link = bpf_program__attach_tcx(skel->progs.tc3, ifindex, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc3 = link;
+
+ assert_mprog_count_ifindex(ifindex, target, 3);
+
+ link = bpf_program__attach_tcx(skel->progs.tc4, ifindex, &optl);
+ if (!ASSERT_OK_PTR(link, "link_attach"))
+ goto cleanup;
+
+ skel->links.tc4 = link;
+
+ assert_mprog_count_ifindex(ifindex, target, 4);
+
+ ASSERT_OK(system("ip link del dev tcx_opts1"), "del veth");
+ ASSERT_EQ(if_nametoindex("tcx_opts1"), 0, "dev1_removed");
+ ASSERT_EQ(if_nametoindex("tcx_opts2"), 0, "dev2_removed");
+
+ ASSERT_EQ(ifindex_from_link_fd(bpf_link__fd(skel->links.tc1)), 0, "tc1_ifindex");
+ ASSERT_EQ(ifindex_from_link_fd(bpf_link__fd(skel->links.tc2)), 0, "tc2_ifindex");
+ ASSERT_EQ(ifindex_from_link_fd(bpf_link__fd(skel->links.tc3)), 0, "tc3_ifindex");
+ ASSERT_EQ(ifindex_from_link_fd(bpf_link__fd(skel->links.tc4)), 0, "tc4_ifindex");
+
+ test_tc_link__destroy(skel);
+ return;
+cleanup:
+ test_tc_link__destroy(skel);
+
+ ASSERT_OK(system("ip link del dev tcx_opts1"), "del veth");
+ ASSERT_EQ(if_nametoindex("tcx_opts1"), 0, "dev1_removed");
+ ASSERT_EQ(if_nametoindex("tcx_opts2"), 0, "dev2_removed");
+}
+
+void serial_test_tc_links_dev_cleanup(void)
+{
+ test_tc_links_dev_cleanup_target(BPF_TCX_INGRESS);
+ test_tc_links_dev_cleanup_target(BPF_TCX_EGRESS);
+}
--
2.34.1
* Re: [PATCH bpf-next v6 0/8] BPF link support for tc BPF programs
2023-07-19 14:08 [PATCH bpf-next v6 0/8] BPF link support for tc BPF programs Daniel Borkmann
` (7 preceding siblings ...)
2023-07-19 14:08 ` [PATCH bpf-next v6 8/8] selftests/bpf: Add mprog API tests for BPF tcx links Daniel Borkmann
@ 2023-07-19 17:20 ` patchwork-bot+netdevbpf
8 siblings, 0 replies; 14+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-07-19 17:20 UTC (permalink / raw)
To: Daniel Borkmann
Cc: ast, andrii, martin.lau, razor, sdf, john.fastabend, kuba, dxu,
joe, toke, davem, bpf, netdev
Hello:
This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:
On Wed, 19 Jul 2023 16:08:50 +0200 you wrote:
> This series adds BPF link support for tc BPF programs. We initially
> presented the motivation, related work and design at last year's LPC
> conference in the networking & BPF track [0], and a recent update on
> our progress of the rework during this year's LSF/MM/BPF summit [1].
> The main changes are in the first two patches, and the last two carry an
> extensive batch of test cases we developed along with them; please see
> individual patches for details. We tested this series with tc-testing
> selftest suite as well as BPF CI/selftests. Thanks!
>
> [...]
Here is the summary with links:
- [bpf-next,v6,1/8] bpf: Add generic attach/detach/query API for multi-progs
https://git.kernel.org/bpf/bpf-next/c/053c8e1f235d
- [bpf-next,v6,2/8] bpf: Add fd-based tcx multi-prog infra with link support
https://git.kernel.org/bpf/bpf-next/c/e420bed02507
- [bpf-next,v6,3/8] libbpf: Add opts-based attach/detach/query API for tcx
https://git.kernel.org/bpf/bpf-next/c/fe20ce3a5126
- [bpf-next,v6,4/8] libbpf: Add link-based API for tcx
https://git.kernel.org/bpf/bpf-next/c/55cc3768473e
- [bpf-next,v6,5/8] libbpf: Add helper macro to clear opts structs
https://git.kernel.org/bpf/bpf-next/c/4e9c2d9af561
- [bpf-next,v6,6/8] bpftool: Extend net dump with tcx progs
https://git.kernel.org/bpf/bpf-next/c/57c61da8bff4
- [bpf-next,v6,7/8] selftests/bpf: Add mprog API tests for BPF tcx opts
https://git.kernel.org/bpf/bpf-next/c/cd13c91d9290
- [bpf-next,v6,8/8] selftests/bpf: Add mprog API tests for BPF tcx links
https://git.kernel.org/bpf/bpf-next/c/c6d479b3346c
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html