netdev.vger.kernel.org archive mirror
* [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API
@ 2021-06-04  6:31 Kumar Kartikeya Dwivedi
  2021-06-04  6:31 ` [PATCH bpf-next v2 1/7] net: sched: refactor cls_bpf creation code Kumar Kartikeya Dwivedi
                   ` (7 more replies)
  0 siblings, 8 replies; 35+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-06-04  6:31 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Jamal Hadi Salim, Vlad Buslov, Cong Wang,
	Jesper Dangaard Brouer, Toke Høiland-Jørgensen, netdev

This is the second (non-RFC) version.

This adds a bpf_link path to create TC filters tied to the cls_bpf classifier,
and introduces fd-based ownership for such TC filters. Netlink cannot delete or
replace such filters, but the bpf_link is severed on indirect destruction of the
filter (backing qdisc being deleted, or chain being flushed, etc.). To ensure
that filters remain attached beyond process lifetime, the usual bpf_link fd
pinning approach can be used.

The individual patches contain more details and comments, but the overall kernel
API and libbpf helper mirrors the semantics of the netlink based TC-BPF API
merged recently. This means that we start by always setting direct action mode,
protocol to ETH_P_ALL, chain_index as 0, etc. If more options are needed
later, they can easily be exposed through the bpf_link API.

Patch 1 refactors cls_bpf change function to extract two helpers that will be
reused in bpf_link creation.

Patch 2 exports some bpf_link management functions to modules. This is needed
because our bpf_link object is tied to the cls_bpf_prog object. Tying it to
tcf_proto would be weird, because the update path has to replace offloaded bpf
prog, which happens using internal cls_bpf helpers, and would in general be more
code to abstract over an operation that is unlikely to be implemented for other
filter types.

Patch 3 adds the main bpf_link API. A function in cls_api takes care of
obtaining a block reference, creating the filter object, and then calls the
bpf_link_change tcf_proto op (only supported by cls_bpf), which returns a fd
after setting up the internal structures. An optimization is made to not keep
around resources for extended actions, which is explained in a code comment as
it wasn't immediately obvious.

Patch 4 adds an update path for bpf_link. Since bpf_link_update only supports
replacing the bpf_prog, we can skip tc filter's change path by reusing the
filter object but swapping its bpf_prog. This takes care of replacing the
offloaded prog as well (if that fails, the update is aborted). Until now,
tcf_classify could do a normal load (possibly torn), as cls_bpf_prog->filter
was never modified concurrently. This is no longer true, and to avoid
penalizing the classify hot path, we also cannot impose serialization around
its load. Hence the load is changed to READ_ONCE, so that the pointer value is
always consistent. Because the invocation happens inside an RCU read-side
critical section, the lifetime of the prog is guaranteed for the duration of
the call.

Patches 5 and 6 take care of updating the userspace bits and add a bpf_link
returning function to libbpf.

Patch 7 adds a selftest that exercises all possible problematic interactions
that I could think of.

Design:

This is where our bpf_link object sits in the object hierarchy:

                                                                            ┌─────┐
                                                                            │     │
                                                                            │ BPF │
                                                                            program
                                                                            │     │
                                                                            └──▲──┘
                                                      ┌───────┐                │
                                                      │       │         ┌──────┴───────┐
                                                      │  mod  ├─────────► cls_bpf_prog │
┌────────────────┐                                    │cls_bpf│         └────┬───▲─────┘
│    tcf_block   │                                    │       │              │   │
└────────┬───────┘                                    └───▲───┘              │   │
         │          ┌─────────────┐                       │                ┌─▼───┴──┐
         └──────────►  tcf_chain  │                       │                │bpf_link│
                    └───────┬─────┘                       │                └────────┘
                            │          ┌─────────────┐    │
                            └──────────►  tcf_proto  ├────┘
                                       └─────────────┘

The bpf_link is detached on destruction of the cls_bpf_prog. Doing it this way
allows us to implement update in a lightweight manner, without recreating the
filter: we just replace the BPF prog attached to cls_bpf_prog.

The other way to do it would be to link the bpf_link to tcf_proto, but there
are several downsides to this:

1. All filter types would have to embed the pointer even though only cls_bpf
uses it.
2. This probably won't make sense to be extended to other filter types anyway.
3. We aren't able to optimize the update case without adding another bpf_link
specific update operation to tcf_proto ops.

The downside of tying this to the module is having to export bpf_link
management functions and introducing a tcf_proto op. Hopefully the cost of
another operation func pointer is acceptable, as there is only one ops struct
per module.

Changelog:
----------
v1 (RFC) -> v2
v1: https://lore.kernel.org/bpf/20210528195946.2375109-1-memxor@gmail.com

 * Avoid overwriting other members of union in bpf_attr (Andrii)
 * Set link to NULL after bpf_link_cleanup to avoid double free (Andrii)
 * Use __be16 to store the result of htons (Kernel Test Robot)
 * Make assignment of tcf_exts::net conditional on CONFIG_NET_CLS_ACT
   (Kernel Test Robot)

Kumar Kartikeya Dwivedi (7):
  net: sched: refactor cls_bpf creation code
  bpf: export bpf_link functions for modules
  net: sched: add bpf_link API for bpf classifier
  net: sched: add lightweight update path for cls_bpf
  tools: bpf.h: sync with kernel sources
  libbpf: add bpf_link based TC-BPF management API
  libbpf: add selftest for bpf_link based TC-BPF management API

 include/linux/bpf_types.h                     |   3 +
 include/net/pkt_cls.h                         |  13 +
 include/net/sch_generic.h                     |   6 +-
 include/uapi/linux/bpf.h                      |  15 +
 kernel/bpf/syscall.c                          |  14 +-
 net/sched/cls_api.c                           | 139 ++++++-
 net/sched/cls_bpf.c                           | 389 ++++++++++++++++--
 tools/include/uapi/linux/bpf.h                |  15 +
 tools/lib/bpf/bpf.c                           |   8 +-
 tools/lib/bpf/bpf.h                           |   8 +-
 tools/lib/bpf/libbpf.c                        |  59 ++-
 tools/lib/bpf/libbpf.h                        |  17 +
 tools/lib/bpf/libbpf.map                      |   1 +
 tools/lib/bpf/netlink.c                       |   5 +-
 tools/lib/bpf/netlink.h                       |   8 +
 .../selftests/bpf/prog_tests/tc_bpf_link.c    | 285 +++++++++++++
 16 files changed, 940 insertions(+), 45 deletions(-)
 create mode 100644 tools/lib/bpf/netlink.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_bpf_link.c

-- 
2.31.1



* [PATCH bpf-next v2 1/7] net: sched: refactor cls_bpf creation code
  2021-06-04  6:31 [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API Kumar Kartikeya Dwivedi
@ 2021-06-04  6:31 ` Kumar Kartikeya Dwivedi
  2021-06-04  6:31 ` [PATCH bpf-next v2 2/7] bpf: export bpf_link functions for modules Kumar Kartikeya Dwivedi
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-06-04  6:31 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jamal Hadi Salim, Vlad Buslov, Cong Wang, Jesper Dangaard Brouer,
	netdev

Move parts of the code that are independent and need to be reused into
their own helpers. These will be needed for adding a bpf_link creation
path in a subsequent patch.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 net/sched/cls_bpf.c | 84 ++++++++++++++++++++++++++++-----------------
 1 file changed, 53 insertions(+), 31 deletions(-)

diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 6e3e63db0e01..360b97ab8646 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -455,6 +455,57 @@ static int cls_bpf_set_parms(struct net *net, struct tcf_proto *tp,
 	return 0;
 }
 
+static int __cls_bpf_alloc_idr(struct cls_bpf_head *head, u32 handle,
+			       struct cls_bpf_prog *prog,
+			       struct cls_bpf_prog *oldprog)
+{
+	int ret = 0;
+
+	if (oldprog) {
+		if (handle && oldprog->handle != handle)
+			return -EINVAL;
+	}
+
+	if (handle == 0) {
+		handle = 1;
+		ret = idr_alloc_u32(&head->handle_idr, prog, &handle, INT_MAX,
+				    GFP_KERNEL);
+	} else if (!oldprog) {
+		ret = idr_alloc_u32(&head->handle_idr, prog, &handle, handle,
+				    GFP_KERNEL);
+	}
+
+	prog->handle = handle;
+	return ret;
+}
+
+static int __cls_bpf_change(struct cls_bpf_head *head, struct tcf_proto *tp,
+			    struct cls_bpf_prog *prog,
+			    struct cls_bpf_prog *oldprog,
+			    struct netlink_ext_ack *extack)
+{
+	int ret;
+
+	ret = cls_bpf_offload(tp, prog, oldprog, extack);
+	if (ret)
+		return ret;
+
+	if (!tc_in_hw(prog->gen_flags))
+		prog->gen_flags |= TCA_CLS_FLAGS_NOT_IN_HW;
+
+	if (oldprog) {
+		idr_replace(&head->handle_idr, prog, prog->handle);
+		list_replace_rcu(&oldprog->link, &prog->link);
+		tcf_unbind_filter(tp, &oldprog->res);
+		tcf_exts_get_net(&oldprog->exts);
+		tcf_queue_work(&oldprog->rwork, cls_bpf_delete_prog_work);
+	} else {
+		list_add_rcu(&prog->link, &head->plist);
+	}
+
+	return 0;
+}
+
 static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 			  struct tcf_proto *tp, unsigned long base,
 			  u32 handle, struct nlattr **tca,
@@ -483,48 +534,19 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 	if (ret < 0)
 		goto errout;
 
-	if (oldprog) {
-		if (handle && oldprog->handle != handle) {
-			ret = -EINVAL;
-			goto errout;
-		}
-	}
-
-	if (handle == 0) {
-		handle = 1;
-		ret = idr_alloc_u32(&head->handle_idr, prog, &handle,
-				    INT_MAX, GFP_KERNEL);
-	} else if (!oldprog) {
-		ret = idr_alloc_u32(&head->handle_idr, prog, &handle,
-				    handle, GFP_KERNEL);
-	}
-
+	ret = __cls_bpf_alloc_idr(head, handle, prog, oldprog);
 	if (ret)
 		goto errout;
-	prog->handle = handle;
 
 	ret = cls_bpf_set_parms(net, tp, prog, base, tb, tca[TCA_RATE], ovr,
 				extack);
 	if (ret < 0)
 		goto errout_idr;
 
-	ret = cls_bpf_offload(tp, prog, oldprog, extack);
+	ret = __cls_bpf_change(head, tp, prog, oldprog, extack);
 	if (ret)
 		goto errout_parms;
 
-	if (!tc_in_hw(prog->gen_flags))
-		prog->gen_flags |= TCA_CLS_FLAGS_NOT_IN_HW;
-
-	if (oldprog) {
-		idr_replace(&head->handle_idr, prog, handle);
-		list_replace_rcu(&oldprog->link, &prog->link);
-		tcf_unbind_filter(tp, &oldprog->res);
-		tcf_exts_get_net(&oldprog->exts);
-		tcf_queue_work(&oldprog->rwork, cls_bpf_delete_prog_work);
-	} else {
-		list_add_rcu(&prog->link, &head->plist);
-	}
-
 	*arg = prog;
 	return 0;
 
-- 
2.31.1



* [PATCH bpf-next v2 2/7] bpf: export bpf_link functions for modules
  2021-06-04  6:31 [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API Kumar Kartikeya Dwivedi
  2021-06-04  6:31 ` [PATCH bpf-next v2 1/7] net: sched: refactor cls_bpf creation code Kumar Kartikeya Dwivedi
@ 2021-06-04  6:31 ` Kumar Kartikeya Dwivedi
  2021-06-04  6:31 ` [PATCH bpf-next v2 3/7] net: sched: add bpf_link API for bpf classifier Kumar Kartikeya Dwivedi
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-06-04  6:31 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jamal Hadi Salim, Vlad Buslov, Cong Wang, Jesper Dangaard Brouer,
	netdev

These are required in a subsequent patch to implement the bpf_link
command for cls_bpf. Since the bpf_link object is tied to the
cls_bpf_prog object, it has to be initialized and managed from inside
the module.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/syscall.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 50457019da27..e5934b748ced 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2338,6 +2338,7 @@ void bpf_link_init(struct bpf_link *link, enum bpf_link_type type,
 	link->ops = ops;
 	link->prog = prog;
 }
+EXPORT_SYMBOL_GPL(bpf_link_init);
 
 static void bpf_link_free_id(int id)
 {
@@ -2363,6 +2364,7 @@ void bpf_link_cleanup(struct bpf_link_primer *primer)
 	fput(primer->file);
 	put_unused_fd(primer->fd);
 }
+EXPORT_SYMBOL_GPL(bpf_link_cleanup);
 
 void bpf_link_inc(struct bpf_link *link)
 {
@@ -2510,6 +2512,7 @@ int bpf_link_prime(struct bpf_link *link, struct bpf_link_primer *primer)
 	primer->id = id;
 	return 0;
 }
+EXPORT_SYMBOL_GPL(bpf_link_prime);
 
 int bpf_link_settle(struct bpf_link_primer *primer)
 {
@@ -2522,6 +2525,7 @@ int bpf_link_settle(struct bpf_link_primer *primer)
 	/* pass through installed FD */
 	return primer->fd;
 }
+EXPORT_SYMBOL_GPL(bpf_link_settle);
 
 int bpf_link_new_fd(struct bpf_link *link)
 {
-- 
2.31.1



* [PATCH bpf-next v2 3/7] net: sched: add bpf_link API for bpf classifier
  2021-06-04  6:31 [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API Kumar Kartikeya Dwivedi
  2021-06-04  6:31 ` [PATCH bpf-next v2 1/7] net: sched: refactor cls_bpf creation code Kumar Kartikeya Dwivedi
  2021-06-04  6:31 ` [PATCH bpf-next v2 2/7] bpf: export bpf_link functions for modules Kumar Kartikeya Dwivedi
@ 2021-06-04  6:31 ` Kumar Kartikeya Dwivedi
  2021-06-05  3:08   ` Yonghong Song
  2021-06-07 23:23   ` Andrii Nakryiko
  2021-06-04  6:31 ` [PATCH bpf-next v2 4/7] net: sched: add lightweight update path for cls_bpf Kumar Kartikeya Dwivedi
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 35+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-06-04  6:31 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jamal Hadi Salim, Vlad Buslov, Cong Wang, Jesper Dangaard Brouer,
	netdev

This commit introduces a bpf_link based kernel API for creating tc filters
tied to the cls_bpf classifier. Only a subset of what the netlink API offers
is supported; features like TCA_BPF_POLICE, TCA_RATE, and embedded actions
are unsupported.

The kernel API and the libbpf wrapper added in a subsequent patch are
more opinionated and mirror the semantics of the low-level netlink based
TC-BPF API, i.e. always setting direct action mode, always setting
protocol to ETH_P_ALL, and only exposing handle and priority as the
variables the user can control. We do add an additional gen_flags
parameter to allow for offloading use cases. It would be trivial to extend
the current API to support specifying other attributes in the future,
but for now I'm sticking to how we want to push usage.

The semantics around bpf_link support are as follows:

A user can create a classifier attached to a filter using the bpf_link
API, after which changing and deleting it only happens through the
bpf_link API. It is not possible to bind the bpf_link to an existing
filter, and any such attempt will fail with EEXIST. Hence EEXIST can be
returned in two cases: when an existing bpf_link-owned filter exists, or
when an existing netlink-owned filter exists.

Removing a bpf_link-owned filter via netlink returns EPERM, denoting that
netlink is locked out of filter manipulation when a bpf_link is
involved.

Whenever a filter is detached due to chain removal, qdisc teardown, or
net_device shutdown, the bpf_link is automatically detached.

In this way, the netlink API and the bpf_link creation path are exclusive
and don't stomp over one another. Filters created using the bpf_link API
cannot be replaced by the netlink API, and filters created by the netlink
API are never replaced by bpf_link. Netlink also cannot detach bpf_link
filters.

We serialize all changes over rtnl_lock, as cls_bpf doesn't support the
unlocked classifier API.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf_types.h |   3 +
 include/net/pkt_cls.h     |  13 ++
 include/net/sch_generic.h |   6 +-
 include/uapi/linux/bpf.h  |  15 +++
 kernel/bpf/syscall.c      |  10 +-
 net/sched/cls_api.c       | 139 ++++++++++++++++++++-
 net/sched/cls_bpf.c       | 250 +++++++++++++++++++++++++++++++++++++-
 7 files changed, 430 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index a9db1eae6796..b1aaf7680917 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -135,3 +135,6 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
 #ifdef CONFIG_NET
 BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
 #endif
+#if IS_ENABLED(CONFIG_NET_CLS_BPF)
+BPF_LINK_TYPE(BPF_LINK_TYPE_TC, tc)
+#endif
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 255e4f4b521f..c36c5d79db6b 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -2,6 +2,7 @@
 #ifndef __NET_PKT_CLS_H
 #define __NET_PKT_CLS_H
 
+#include <linux/bpf.h>
 #include <linux/pkt_cls.h>
 #include <linux/workqueue.h>
 #include <net/sch_generic.h>
@@ -45,6 +46,9 @@ bool tcf_queue_work(struct rcu_work *rwork, work_func_t func);
 struct tcf_chain *tcf_chain_get_by_act(struct tcf_block *block,
 				       u32 chain_index);
 void tcf_chain_put_by_act(struct tcf_chain *chain);
+void tcf_chain_tp_delete_empty(struct tcf_chain *chain,
+			       struct tcf_proto *tp, bool rtnl_held,
+			       struct netlink_ext_ack *extack);
 struct tcf_chain *tcf_get_next_chain(struct tcf_block *block,
 				     struct tcf_chain *chain);
 struct tcf_proto *tcf_get_next_proto(struct tcf_chain *chain,
@@ -1004,4 +1008,13 @@ struct tc_fifo_qopt_offload {
 	};
 };
 
+#if IS_ENABLED(CONFIG_NET_CLS_BPF)
+int bpf_tc_link_attach(union bpf_attr *attr, struct bpf_prog *prog);
+#else
+static inline int bpf_tc_link_attach(union bpf_attr *attr, struct bpf_prog *prog)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
 #endif
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index f7a6e14491fb..bacd70bfc5ed 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -341,7 +341,11 @@ struct tcf_proto_ops {
 	int			(*tmplt_dump)(struct sk_buff *skb,
 					      struct net *net,
 					      void *tmplt_priv);
-
+#if IS_ENABLED(CONFIG_NET_CLS_BPF)
+	int			(*bpf_link_change)(struct net *net, struct tcf_proto *tp,
+						   struct bpf_prog *filter, void **arg, u32 handle,
+						   u32 gen_flags);
+#endif
 	struct module		*owner;
 	int			flags;
 };
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2c1ba70abbf1..a3488463d145 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -994,6 +994,7 @@ enum bpf_attach_type {
 	BPF_SK_LOOKUP,
 	BPF_XDP,
 	BPF_SK_SKB_VERDICT,
+	BPF_TC,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -1007,6 +1008,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_ITER = 4,
 	BPF_LINK_TYPE_NETNS = 5,
 	BPF_LINK_TYPE_XDP = 6,
+	BPF_LINK_TYPE_TC = 7,
 
 	MAX_BPF_LINK_TYPE,
 };
@@ -1447,6 +1449,12 @@ union bpf_attr {
 				__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
 				__u32		iter_info_len;	/* iter_info length */
 			};
+			struct { /* used by BPF_TC */
+				__u32 parent;
+				__u32 handle;
+				__u32 gen_flags;
+				__u16 priority;
+			} tc;
 		};
 	} link_create;
 
@@ -5519,6 +5527,13 @@ struct bpf_link_info {
 		struct {
 			__u32 ifindex;
 		} xdp;
+		struct {
+			__u32 ifindex;
+			__u32 parent;
+			__u32 handle;
+			__u32 gen_flags;
+			__u16 priority;
+		} tc;
 	};
 } __attribute__((aligned(8)));
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e5934b748ced..ce7c00ea135c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
  */
+#include <net/pkt_cls.h>
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
 #include <linux/bpf_lirc.h>
@@ -3027,6 +3028,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
 		return BPF_PROG_TYPE_SK_LOOKUP;
 	case BPF_XDP:
 		return BPF_PROG_TYPE_XDP;
+	case BPF_TC:
+		return BPF_PROG_TYPE_SCHED_CLS;
 	default:
 		return BPF_PROG_TYPE_UNSPEC;
 	}
@@ -4085,7 +4088,7 @@ static int tracing_bpf_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
 	return -EINVAL;
 }
 
-#define BPF_LINK_CREATE_LAST_FIELD link_create.iter_info_len
+#define BPF_LINK_CREATE_LAST_FIELD link_create.tc.priority
 static int link_create(union bpf_attr *attr, bpfptr_t uattr)
 {
 	enum bpf_prog_type ptype;
@@ -4136,6 +4139,11 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
 	case BPF_PROG_TYPE_XDP:
 		ret = bpf_xdp_link_attach(attr, prog);
 		break;
+#endif
+#if IS_ENABLED(CONFIG_NET_CLS_BPF)
+	case BPF_PROG_TYPE_SCHED_CLS:
+		ret = bpf_tc_link_attach(attr, prog);
+		break;
 #endif
 	default:
 		ret = -EINVAL;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 75e3a288a7c8..f492b4764301 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -9,6 +9,7 @@
  * Eduardo J. Blanco <ejbs@netlabs.com.uy> :990222: kmod support
  */
 
+#include <linux/bpf.h>
 #include <linux/module.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
@@ -1720,9 +1721,9 @@ static struct tcf_proto *tcf_chain_tp_insert_unique(struct tcf_chain *chain,
 	return tp_new;
 }
 
-static void tcf_chain_tp_delete_empty(struct tcf_chain *chain,
-				      struct tcf_proto *tp, bool rtnl_held,
-				      struct netlink_ext_ack *extack)
+void tcf_chain_tp_delete_empty(struct tcf_chain *chain,
+			       struct tcf_proto *tp, bool rtnl_held,
+			       struct netlink_ext_ack *extack)
 {
 	struct tcf_chain_info chain_info;
 	struct tcf_proto *tp_iter;
@@ -1760,6 +1761,7 @@ static void tcf_chain_tp_delete_empty(struct tcf_chain *chain,
 
 	tcf_proto_put(tp, rtnl_held, extack);
 }
+EXPORT_SYMBOL_GPL(tcf_chain_tp_delete_empty);
 
 static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
 					   struct tcf_chain_info *chain_info,
@@ -3917,3 +3919,134 @@ static int __init tc_filter_init(void)
 }
 
 subsys_initcall(tc_filter_init);
+
+#if IS_ENABLED(CONFIG_NET_CLS_BPF)
+
+int bpf_tc_link_attach(union bpf_attr *attr, struct bpf_prog *prog)
+{
+	struct net *net = current->nsproxy->net_ns;
+	struct tcf_chain_info chain_info;
+	u32 chain_index, prio, parent;
+	struct tcf_block *block;
+	struct tcf_chain *chain;
+	struct tcf_proto *tp;
+	int err, tp_created;
+	unsigned long cl;
+	struct Qdisc *q;
+	__be16 protocol;
+	void *fh;
+
+	/* Caller already checks bpf_capable */
+	if (!ns_capable(current->nsproxy->net_ns->user_ns, CAP_NET_ADMIN))
+		return -EPERM;
+
+	if (attr->link_create.flags ||
+	    !attr->link_create.target_ifindex ||
+	    !tc_flags_valid(attr->link_create.tc.gen_flags))
+		return -EINVAL;
+
+replay:
+	parent = attr->link_create.tc.parent;
+	prio = attr->link_create.tc.priority;
+	protocol = htons(ETH_P_ALL);
+	chain_index = 0;
+	tp_created = 0;
+	prio <<= 16;
+	cl = 0;
+
+	/* Address this when cls_bpf switches to RTNL_FLAG_DOIT_UNLOCKED */
+	rtnl_lock();
+
+	block = tcf_block_find(net, &q, &parent, &cl,
+			       attr->link_create.target_ifindex, parent, NULL);
+	if (IS_ERR(block)) {
+		err = PTR_ERR(block);
+		goto out_unlock;
+	}
+	block->classid = parent;
+
+	chain = tcf_chain_get(block, chain_index, true);
+	if (!chain) {
+		err = -ENOMEM;
+		goto out_block;
+	}
+
+	mutex_lock(&chain->filter_chain_lock);
+
+	tp = tcf_chain_tp_find(chain, &chain_info, protocol,
+			       prio ?: TC_H_MAKE(0x80000000U, 0U),
+			       !prio);
+	if (IS_ERR(tp)) {
+		err = PTR_ERR(tp);
+		goto out_chain_unlock;
+	}
+
+	if (!tp) {
+		struct tcf_proto *tp_new = NULL;
+
+		if (chain->flushing) {
+			err = -EAGAIN;
+			goto out_chain_unlock;
+		}
+
+		if (!prio)
+			prio = tcf_auto_prio(tcf_chain_tp_prev(chain,
+							       &chain_info));
+
+		mutex_unlock(&chain->filter_chain_lock);
+
+		tp_new = tcf_proto_create("bpf", protocol, prio, chain, true,
+					  NULL);
+		if (IS_ERR(tp_new)) {
+			err = PTR_ERR(tp_new);
+			goto out_chain;
+		}
+
+		tp_created = 1;
+		tp = tcf_chain_tp_insert_unique(chain, tp_new, protocol, prio,
+						true);
+		if (IS_ERR(tp)) {
+			err = PTR_ERR(tp);
+			goto out_chain;
+		}
+	} else {
+		mutex_unlock(&chain->filter_chain_lock);
+	}
+
+	fh = tp->ops->get(tp, attr->link_create.tc.handle);
+
+	if (!tp->ops->bpf_link_change)
+		err = -EDEADLK;
+	else
+		err = tp->ops->bpf_link_change(net, tp, prog, &fh,
+					       attr->link_create.tc.handle,
+					       attr->link_create.tc.gen_flags);
+	if (err >= 0 && q)
+		q->flags &= ~TCQ_F_CAN_BYPASS;
+
+out:
+	if (err < 0 && tp_created)
+		tcf_chain_tp_delete_empty(chain, tp, true, NULL);
+out_chain:
+	if (chain) {
+		if (!IS_ERR_OR_NULL(tp))
+			tcf_proto_put(tp, true, NULL);
+		/* Chain reference only kept for tp creation
+		 * to pair with tcf_chain_put from tcf_proto_destroy
+		 */
+		if (!tp_created)
+			tcf_chain_put(chain);
+	}
+out_block:
+	tcf_block_release(q, block, true);
+out_unlock:
+	rtnl_unlock();
+	if (err == -EAGAIN)
+		goto replay;
+	return err;
+out_chain_unlock:
+	mutex_unlock(&chain->filter_chain_lock);
+	goto out;
+}
+
+#endif
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 360b97ab8646..bf61ffbb7fd0 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -34,6 +34,11 @@ struct cls_bpf_head {
 	struct rcu_head rcu;
 };
 
+struct cls_bpf_link {
+	struct bpf_link link;
+	struct cls_bpf_prog *prog;
+};
+
 struct cls_bpf_prog {
 	struct bpf_prog *filter;
 	struct list_head link;
@@ -48,6 +53,7 @@ struct cls_bpf_prog {
 	const char *bpf_name;
 	struct tcf_proto *tp;
 	struct rcu_work rwork;
+	struct cls_bpf_link *bpf_link;
 };
 
 static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
@@ -289,6 +295,8 @@ static void __cls_bpf_delete(struct tcf_proto *tp, struct cls_bpf_prog *prog,
 {
 	struct cls_bpf_head *head = rtnl_dereference(tp->root);
 
+	if (prog->bpf_link)
+		prog->bpf_link->prog = NULL;
 	idr_remove(&head->handle_idr, prog->handle);
 	cls_bpf_stop_offload(tp, prog, extack);
 	list_del_rcu(&prog->link);
@@ -303,8 +311,13 @@ static int cls_bpf_delete(struct tcf_proto *tp, void *arg, bool *last,
 			  bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct cls_bpf_head *head = rtnl_dereference(tp->root);
+	struct cls_bpf_prog *prog = arg;
+
+	/* Cannot remove bpf_link owned filter using netlink */
+	if (prog->bpf_link)
+		return -EPERM;
 
-	__cls_bpf_delete(tp, arg, extack);
+	__cls_bpf_delete(tp, prog, extack);
 	*last = list_empty(&head->plist);
 	return 0;
 }
@@ -494,6 +507,11 @@ static int __cls_bpf_change(struct cls_bpf_head *head, struct tcf_proto *tp,
 		prog->gen_flags |= TCA_CLS_FLAGS_NOT_IN_HW;
 
 	if (oldprog) {
+		/* Since netlink and bpf_link cannot replace a bpf_link
+		 * attached filter, this should never be true.
+		 */
+		WARN_ON(oldprog->bpf_link);
+
 		idr_replace(&head->handle_idr, prog, prog->handle);
 		list_replace_rcu(&oldprog->link, &prog->link);
 		tcf_unbind_filter(tp, &oldprog->res);
@@ -521,6 +539,10 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 	if (tca[TCA_OPTIONS] == NULL)
 		return -EINVAL;
 
+	/* Can't touch bpf_link filter */
+	if (oldprog && oldprog->bpf_link)
+		return -EPERM;
+
 	ret = nla_parse_nested_deprecated(tb, TCA_BPF_MAX, tca[TCA_OPTIONS],
 					  bpf_policy, NULL);
 	if (ret < 0)
@@ -716,6 +738,231 @@ static int cls_bpf_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb
 	return 0;
 }
 
+static void cls_bpf_link_release(struct bpf_link *link)
+{
+	struct cls_bpf_link *cls_link;
+	struct cls_bpf_prog *prog;
+	struct cls_bpf_head *head;
+
+	rtnl_lock();
+
+	cls_link = container_of(link, struct cls_bpf_link, link);
+	prog = cls_link->prog;
+
+	if (prog) {
+		head = rtnl_dereference(prog->tp->root);
+		/* Deletion of the filter will unset cls_link->prog */
+		__cls_bpf_delete(prog->tp, prog, NULL);
+		if (list_empty(&head->plist))
+			tcf_chain_tp_delete_empty(prog->tp->chain, prog->tp,
+						  true, NULL);
+	}
+
+	rtnl_unlock();
+}
+
+static void cls_bpf_link_dealloc(struct bpf_link *link)
+{
+	struct cls_bpf_link *cls_link;
+
+	cls_link = container_of(link, struct cls_bpf_link, link);
+	kfree(cls_link);
+}
+
+static int cls_bpf_link_detach(struct bpf_link *link)
+{
+	cls_bpf_link_release(link);
+	return 0;
+}
+
+static void __bpf_fill_link_info(struct cls_bpf_link *link,
+				 struct bpf_link_info *info)
+{
+	struct tcf_block *block;
+	struct tcf_proto *tp;
+	struct Qdisc *q;
+
+	ASSERT_RTNL();
+
+	if (WARN_ON(!link->prog))
+		return;
+
+	tp = link->prog->tp;
+	block = tp->chain->block;
+	q = block->q;
+
+	info->tc.ifindex = q ? qdisc_dev(q)->ifindex : TCM_IFINDEX_MAGIC_BLOCK;
+	info->tc.parent = block->classid;
+	info->tc.handle = link->prog->handle;
+	info->tc.priority = tp->prio >> 16;
+	info->tc.gen_flags = link->prog->gen_flags;
+}
+
+#ifdef CONFIG_PROC_FS
+
+static void cls_bpf_link_show_fdinfo(const struct bpf_link *link,
+				     struct seq_file *seq)
+{
+	struct cls_bpf_link *cls_link;
+	struct bpf_link_info info = {};
+
+	rtnl_lock();
+
+	cls_link = container_of(link, struct cls_bpf_link, link);
+	if (!cls_link->prog)
+		goto out;
+
+	__bpf_fill_link_info(cls_link, &info);
+
+	seq_printf(seq,
+		   "ifindex:\t%u\n"
+		   "parent:\t%u\n"
+		   "handle:\t%u\n"
+		   "priority:\t%u\n"
+		   "gen_flags:\t%u\n",
+		   info.tc.ifindex, info.tc.parent,
+		   info.tc.handle, (u32)info.tc.priority,
+		   info.tc.gen_flags);
+
+out:
+	rtnl_unlock();
+}
+
+#endif
+
+static int cls_bpf_link_fill_link_info(const struct bpf_link *link,
+				       struct bpf_link_info *info)
+{
+	struct cls_bpf_link *cls_link;
+	int ret = 0;
+
+	rtnl_lock();
+
+	cls_link = container_of(link, struct cls_bpf_link, link);
+	if (!cls_link->prog) {
+		ret = -ENOLINK;
+		goto out;
+	}
+
+	__bpf_fill_link_info(cls_link, info);
+
+out:
+	rtnl_unlock();
+	return ret;
+}
+
+static const struct bpf_link_ops cls_bpf_link_ops = {
+	.release = cls_bpf_link_release,
+	.dealloc = cls_bpf_link_dealloc,
+	.detach = cls_bpf_link_detach,
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo = cls_bpf_link_show_fdinfo,
+#endif
+	.fill_link_info = cls_bpf_link_fill_link_info,
+};
+
+static inline char *cls_bpf_link_name(u32 prog_id, const char *name)
+{
+	char *str = kmalloc(CLS_BPF_NAME_LEN, GFP_KERNEL);
+
+	if (str)
+		snprintf(str, CLS_BPF_NAME_LEN, "%s:[%u]", name, prog_id);
+
+	return str;
+}
+
+static int cls_bpf_link_change(struct net *net, struct tcf_proto *tp,
+			       struct bpf_prog *filter, void **arg,
+			       u32 handle, u32 gen_flags)
+{
+	struct cls_bpf_head *head = rtnl_dereference(tp->root);
+	struct cls_bpf_prog *oldprog = *arg, *prog;
+	struct bpf_link_primer primer;
+	struct cls_bpf_link *link;
+	int ret;
+
+	if (gen_flags & ~CLS_BPF_SUPPORTED_GEN_FLAGS)
+		return -EINVAL;
+
+	if (oldprog)
+		return -EEXIST;
+
+	prog = kzalloc(sizeof(*prog), GFP_KERNEL);
+	if (!prog)
+		return -ENOMEM;
+
+	link = kzalloc(sizeof(*link), GFP_KERNEL);
+	if (!link) {
+		ret = -ENOMEM;
+		goto err_prog;
+	}
+
+	bpf_link_init(&link->link, BPF_LINK_TYPE_TC, &cls_bpf_link_ops,
+		      filter);
+
+	ret = bpf_link_prime(&link->link, &primer);
+	if (ret < 0)
+		goto err_link;
+
+	/* We don't init exts to save on memory, but we still need to store the
+	 * net_ns pointer, as during delete whether the deletion work will be
+	 * queued or executed inline depends on the refcount of net_ns. In
+	 * __cls_bpf_delete the reference is taken to keep the action IDR alive
+	 * (which we don't require), but its maybe_get_net also allows us to
+	 * detect whether we are being invoked in netns destruction path or not.
+	 * In the former case deletion will have to be done synchronously.
+	 *
+	 * Leaving it NULL would prevent us from doing deletion work
+	 * asynchronously, so set it here.
+	 *
+	 * On the tcf_classify side, exts->actions are not touched for
+	 * exts_integrated progs, so we should be good.
+	 */
+#ifdef CONFIG_NET_CLS_ACT
+	prog->exts.net = net;
+#endif
+
+	ret = __cls_bpf_alloc_idr(head, handle, prog, oldprog);
+	if (ret < 0)
+		goto err_primer;
+
+	prog->exts_integrated = true;
+	prog->bpf_link = link;
+	prog->filter = filter;
+	prog->tp = tp;
+	link->prog = prog;
+
+	prog->bpf_name = cls_bpf_link_name(filter->aux->id, filter->aux->name);
+	if (!prog->bpf_name) {
+		ret = -ENOMEM;
+		goto err_idr;
+	}
+
+	ret = __cls_bpf_change(head, tp, prog, oldprog, NULL);
+	if (ret < 0)
+		goto err_name;
+
+	bpf_prog_inc(filter);
+
+	if (filter->dst_needed)
+		tcf_block_netif_keep_dst(tp->chain->block);
+
+	return bpf_link_settle(&primer);
+
+err_name:
+	kfree(prog->bpf_name);
+err_idr:
+	idr_remove(&head->handle_idr, prog->handle);
+err_primer:
+	bpf_link_cleanup(&primer);
+	link = NULL;
+err_link:
+	kfree(link);
+err_prog:
+	kfree(prog);
+	return ret;
+}
+
 static struct tcf_proto_ops cls_bpf_ops __read_mostly = {
 	.kind		=	"bpf",
 	.owner		=	THIS_MODULE,
@@ -729,6 +976,7 @@ static struct tcf_proto_ops cls_bpf_ops __read_mostly = {
 	.reoffload	=	cls_bpf_reoffload,
 	.dump		=	cls_bpf_dump,
 	.bind_class	=	cls_bpf_bind_class,
+	.bpf_link_change =	cls_bpf_link_change,
 };
 
 static int __init cls_bpf_init_mod(void)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH bpf-next v2 4/7] net: sched: add lightweight update path for cls_bpf
  2021-06-04  6:31 [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API Kumar Kartikeya Dwivedi
                   ` (2 preceding siblings ...)
  2021-06-04  6:31 ` [PATCH bpf-next v2 3/7] net: sched: add bpf_link API for bpf classifier Kumar Kartikeya Dwivedi
@ 2021-06-04  6:31 ` Kumar Kartikeya Dwivedi
  2021-06-04 17:54   ` Alexei Starovoitov
  2021-06-07 23:32   ` Andrii Nakryiko
  2021-06-04  6:31 ` [PATCH bpf-next v2 5/7] tools: bpf.h: sync with kernel sources Kumar Kartikeya Dwivedi
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 35+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-06-04  6:31 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jamal Hadi Salim, Vlad Buslov, Cong Wang, Jesper Dangaard Brouer,
	netdev

This is used by BPF_LINK_UPDATE to replace the attached SCHED_CLS bpf
prog, effectively changing the classifier implementation for a given
filter owned by a bpf_link.

Note that READ_ONCE suffices in this case, as the ordering for loads
from the filter is implicitly provided by the data dependency on the
BPF prog pointer.

On the writer side, a relaxed WRITE_ONCE store is enough to make sure
one or the other value is visible to a reader in cls_bpf_classify.
Lifetime is managed using RCU, so the bpf_prog_put path waits until
readers are done with old_prog.

All other parties accessing the BPF prog are under RTNL protection, so
they need no changes.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 net/sched/cls_bpf.c | 55 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 53 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index bf61ffbb7fd0..f23304685c48 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -9,6 +9,7 @@
  * (C) 2013 Daniel Borkmann <dborkman@redhat.com>
  */
 
+#include <linux/atomic.h>
 #include <linux/module.h>
 #include <linux/types.h>
 #include <linux/skbuff.h>
@@ -104,11 +105,11 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			/* It is safe to push/pull even if skb_shared() */
 			__skb_push(skb, skb->mac_len);
 			bpf_compute_data_pointers(skb);
-			filter_res = BPF_PROG_RUN(prog->filter, skb);
+			filter_res = BPF_PROG_RUN(READ_ONCE(prog->filter), skb);
 			__skb_pull(skb, skb->mac_len);
 		} else {
 			bpf_compute_data_pointers(skb);
-			filter_res = BPF_PROG_RUN(prog->filter, skb);
+			filter_res = BPF_PROG_RUN(READ_ONCE(prog->filter), skb);
 		}
 
 		if (prog->exts_integrated) {
@@ -775,6 +776,55 @@ static int cls_bpf_link_detach(struct bpf_link *link)
 	return 0;
 }
 
+static int cls_bpf_link_update(struct bpf_link *link, struct bpf_prog *new_prog,
+			       struct bpf_prog *old_prog)
+{
+	struct cls_bpf_link *cls_link;
+	struct cls_bpf_prog cls_prog;
+	struct cls_bpf_prog *prog;
+	int ret;
+
+	rtnl_lock();
+
+	cls_link = container_of(link, struct cls_bpf_link, link);
+	if (!cls_link->prog) {
+		ret = -ENOLINK;
+		goto out;
+	}
+
+	prog = cls_link->prog;
+
+	/* BPF_F_REPLACEing? */
+	if (old_prog && prog->filter != old_prog) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	old_prog = prog->filter;
+
+	if (new_prog == old_prog) {
+		ret = 0;
+		goto out;
+	}
+
+	cls_prog = *prog;
+	cls_prog.filter = new_prog;
+
+	ret = cls_bpf_offload(prog->tp, &cls_prog, prog, NULL);
+	if (ret < 0)
+		goto out;
+
+	WRITE_ONCE(prog->filter, new_prog);
+
+	bpf_prog_inc(new_prog);
+	/* release our reference */
+	bpf_prog_put(old_prog);
+
+out:
+	rtnl_unlock();
+	return ret;
+}
+
 static void __bpf_fill_link_info(struct cls_bpf_link *link,
 				 struct bpf_link_info *info)
 {
@@ -859,6 +909,7 @@ static const struct bpf_link_ops cls_bpf_link_ops = {
 	.show_fdinfo = cls_bpf_link_show_fdinfo,
 #endif
 	.fill_link_info = cls_bpf_link_fill_link_info,
+	.update_prog = cls_bpf_link_update,
 };
 
 static inline char *cls_bpf_link_name(u32 prog_id, const char *name)
-- 
2.31.1



* [PATCH bpf-next v2 5/7] tools: bpf.h: sync with kernel sources
  2021-06-04  6:31 [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API Kumar Kartikeya Dwivedi
                   ` (3 preceding siblings ...)
  2021-06-04  6:31 ` [PATCH bpf-next v2 4/7] net: sched: add lightweight update path for cls_bpf Kumar Kartikeya Dwivedi
@ 2021-06-04  6:31 ` Kumar Kartikeya Dwivedi
  2021-06-04  6:31 ` [PATCH bpf-next v2 6/7] libbpf: add bpf_link based TC-BPF management API Kumar Kartikeya Dwivedi
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-06-04  6:31 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jamal Hadi Salim, Vlad Buslov, Cong Wang, Jesper Dangaard Brouer,
	netdev

This will be used to expose the bpf_link based libbpf API to users.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/include/uapi/linux/bpf.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 2c1ba70abbf1..a3488463d145 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -994,6 +994,7 @@ enum bpf_attach_type {
 	BPF_SK_LOOKUP,
 	BPF_XDP,
 	BPF_SK_SKB_VERDICT,
+	BPF_TC,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -1007,6 +1008,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_ITER = 4,
 	BPF_LINK_TYPE_NETNS = 5,
 	BPF_LINK_TYPE_XDP = 6,
+	BPF_LINK_TYPE_TC = 7,
 
 	MAX_BPF_LINK_TYPE,
 };
@@ -1447,6 +1449,12 @@ union bpf_attr {
 				__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
 				__u32		iter_info_len;	/* iter_info length */
 			};
+			struct { /* used by BPF_TC */
+				__u32 parent;
+				__u32 handle;
+				__u32 gen_flags;
+				__u16 priority;
+			} tc;
 		};
 	} link_create;
 
@@ -5519,6 +5527,13 @@ struct bpf_link_info {
 		struct {
 			__u32 ifindex;
 		} xdp;
+		struct {
+			__u32 ifindex;
+			__u32 parent;
+			__u32 handle;
+			__u32 gen_flags;
+			__u16 priority;
+		} tc;
 	};
 } __attribute__((aligned(8)));
 
-- 
2.31.1



* [PATCH bpf-next v2 6/7] libbpf: add bpf_link based TC-BPF management API
  2021-06-04  6:31 [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API Kumar Kartikeya Dwivedi
                   ` (4 preceding siblings ...)
  2021-06-04  6:31 ` [PATCH bpf-next v2 5/7] tools: bpf.h: sync with kernel sources Kumar Kartikeya Dwivedi
@ 2021-06-04  6:31 ` Kumar Kartikeya Dwivedi
  2021-06-04 18:01   ` Alexei Starovoitov
                     ` (2 more replies)
  2021-06-04  6:31 ` [PATCH bpf-next v2 7/7] libbpf: add selftest for " Kumar Kartikeya Dwivedi
  2022-06-10  0:24 ` [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API Joanne Koong
  7 siblings, 3 replies; 35+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-06-04  6:31 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jamal Hadi Salim, Vlad Buslov, Cong Wang, Jesper Dangaard Brouer,
	netdev

This adds userspace TC-BPF management API based on bpf_link.
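A minimal caller-side sketch of the API this patch adds (hypothetical
usage, not part of the patch: `prog` is assumed to be a loaded
SCHED_CLS bpf_program, the ifindex and pin path are made up, and the
ERR_PTR-style return checking matches the pre-1.0 libbpf semantics used
here):

```c
#include <bpf/bpf.h>
#include <bpf/libbpf.h>

static struct bpf_link *attach_cls(struct bpf_program *prog, int ifindex)
{
	DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = ifindex,
			    .attach_point = BPF_TC_INGRESS);
	DECLARE_LIBBPF_OPTS(bpf_tc_link_opts, opts, .handle = 1,
			    .priority = 1);
	struct bpf_link *link;

	link = bpf_program__attach_tc(prog, &hook, &opts);
	if (libbpf_get_error(link))
		return NULL;

	/* Optionally pin so the filter outlives the process; otherwise
	 * bpf_link__destroy() detaches it. */
	bpf_link__pin(link, "/sys/fs/bpf/tc_link");
	return link;
}
```

Dropping the returned link fd (or unpinning) severs the link and removes
the filter, which is the fd-based ownership model described in the cover
letter.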

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/lib/bpf/bpf.c      |  8 +++++-
 tools/lib/bpf/bpf.h      |  8 +++++-
 tools/lib/bpf/libbpf.c   | 59 ++++++++++++++++++++++++++++++++++++++--
 tools/lib/bpf/libbpf.h   | 17 ++++++++++++
 tools/lib/bpf/libbpf.map |  1 +
 tools/lib/bpf/netlink.c  |  5 ++--
 tools/lib/bpf/netlink.h  |  8 ++++++
 7 files changed, 100 insertions(+), 6 deletions(-)
 create mode 100644 tools/lib/bpf/netlink.h

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 86dcac44f32f..bebccea9bfd7 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -28,6 +28,7 @@
 #include <asm/unistd.h>
 #include <errno.h>
 #include <linux/bpf.h>
+#include <arpa/inet.h>
 #include "bpf.h"
 #include "libbpf.h"
 #include "libbpf_internal.h"
@@ -693,7 +694,12 @@ int bpf_link_create(int prog_fd, int target_fd,
 	attr.link_create.attach_type = attach_type;
 	attr.link_create.flags = OPTS_GET(opts, flags, 0);
 
-	if (iter_info_len) {
+	if (attach_type == BPF_TC) {
+		attr.link_create.tc.parent = OPTS_GET(opts, tc.parent, 0);
+		attr.link_create.tc.handle = OPTS_GET(opts, tc.handle, 0);
+		attr.link_create.tc.priority = OPTS_GET(opts, tc.priority, 0);
+		attr.link_create.tc.gen_flags = OPTS_GET(opts, tc.gen_flags, 0);
+	} else if (iter_info_len) {
 		attr.link_create.iter_info =
 			ptr_to_u64(OPTS_GET(opts, iter_info, (void *)0));
 		attr.link_create.iter_info_len = iter_info_len;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 4f758f8f50cd..f2178309e9ea 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -177,8 +177,14 @@ struct bpf_link_create_opts {
 	union bpf_iter_link_info *iter_info;
 	__u32 iter_info_len;
 	__u32 target_btf_id;
+	struct {
+		__u32 parent;
+		__u32 handle;
+		__u32 priority;
+		__u32 gen_flags;
+	} tc;
 };
-#define bpf_link_create_opts__last_field target_btf_id
+#define bpf_link_create_opts__last_field tc.gen_flags
 
 LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,
 			       enum bpf_attach_type attach_type,
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 1c4e20e75237..7809536980b1 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -55,6 +55,7 @@
 #include "libbpf_internal.h"
 #include "hashmap.h"
 #include "bpf_gen_internal.h"
+#include "netlink.h"
 
 #ifndef BPF_FS_MAGIC
 #define BPF_FS_MAGIC		0xcafe4a11
@@ -7185,7 +7186,7 @@ static int bpf_object__collect_relos(struct bpf_object *obj)
 
 	for (i = 0; i < obj->nr_programs; i++) {
 		struct bpf_program *p = &obj->programs[i];
-		
+
 		if (!p->nr_reloc)
 			continue;
 
@@ -10005,7 +10006,7 @@ struct bpf_link {
 int bpf_link__update_program(struct bpf_link *link, struct bpf_program *prog)
 {
 	int ret;
-	
+
 	ret = bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog), NULL);
 	return libbpf_err_errno(ret);
 }
@@ -10613,6 +10614,60 @@ struct bpf_link *bpf_program__attach_xdp(struct bpf_program *prog, int ifindex)
 	return bpf_program__attach_fd(prog, ifindex, 0, "xdp");
 }
 
+struct bpf_link *bpf_program__attach_tc(struct bpf_program *prog,
+					const struct bpf_tc_hook *hook,
+					const struct bpf_tc_link_opts *opts)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, lopts, 0);
+	char errmsg[STRERR_BUFSIZE];
+	int prog_fd, link_fd, ret;
+	struct bpf_link *link;
+	__u32 parent;
+
+	if (!hook || !OPTS_VALID(hook, bpf_tc_hook) ||
+	    !OPTS_VALID(opts, bpf_tc_link_opts))
+		return ERR_PTR(-EINVAL);
+
+	if (OPTS_GET(hook, ifindex, 0) <= 0 ||
+	    OPTS_GET(opts, priority, 0) > UINT16_MAX)
+		return ERR_PTR(-EINVAL);
+
+	parent = OPTS_GET(hook, parent, 0);
+
+	ret = tc_get_tcm_parent(OPTS_GET(hook, attach_point, 0),
+				&parent);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	lopts.tc.parent = parent;
+	lopts.tc.handle = OPTS_GET(opts, handle, 0);
+	lopts.tc.priority = OPTS_GET(opts, priority, 0);
+	lopts.tc.gen_flags = OPTS_GET(opts, gen_flags, 0);
+
+	prog_fd = bpf_program__fd(prog);
+	if (prog_fd < 0) {
+		pr_warn("prog '%s': can't attach before loaded\n", prog->name);
+		return ERR_PTR(-EINVAL);
+	}
+
+	link = calloc(1, sizeof(*link));
+	if (!link)
+		return ERR_PTR(-ENOMEM);
+	link->detach = &bpf_link__detach_fd;
+
+	link_fd = bpf_link_create(prog_fd, OPTS_GET(hook, ifindex, 0), BPF_TC, &lopts);
+	if (link_fd < 0) {
+		link_fd = -errno;
+		free(link);
+		pr_warn("prog '%s': failed to attach tc filter: %s\n",
+			prog->name, libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg)));
+		return ERR_PTR(link_fd);
+	}
+	link->fd = link_fd;
+
+	return link;
+}
+
 struct bpf_link *bpf_program__attach_freplace(struct bpf_program *prog,
 					      int target_fd,
 					      const char *attach_func_name)
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 6e61342ba56c..284a446c6513 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -282,6 +282,23 @@ LIBBPF_API struct bpf_link *
 bpf_program__attach_iter(struct bpf_program *prog,
 			 const struct bpf_iter_attach_opts *opts);
 
+/* TC bpf_link related API */
+struct bpf_tc_hook;
+
+struct bpf_tc_link_opts {
+	size_t sz;
+	__u32 handle;
+	__u32 priority;
+	__u32 gen_flags;
+	size_t :0;
+};
+#define bpf_tc_link_opts__last_field gen_flags
+
+LIBBPF_API struct bpf_link *
+bpf_program__attach_tc(struct bpf_program *prog,
+		       const struct bpf_tc_hook *hook,
+		       const struct bpf_tc_link_opts *opts);
+
 struct bpf_insn;
 
 /*
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 944c99d1ded3..5aa2e62b9fc2 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -373,5 +373,6 @@ LIBBPF_0.5.0 {
 		bpf_map__initial_value;
 		bpf_map_lookup_and_delete_elem_flags;
 		bpf_object__gen_loader;
+		bpf_program__attach_tc;
 		libbpf_set_strict_mode;
 } LIBBPF_0.4.0;
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index d743c8721aa7..b7ac36fc9c1a 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -17,6 +17,7 @@
 #include "libbpf.h"
 #include "libbpf_internal.h"
 #include "nlattr.h"
+#include "netlink.h"
 
 #ifndef SOL_NETLINK
 #define SOL_NETLINK 270
@@ -405,8 +406,8 @@ static int attach_point_to_config(struct bpf_tc_hook *hook,
 	}
 }
 
-static int tc_get_tcm_parent(enum bpf_tc_attach_point attach_point,
-			     __u32 *parent)
+int tc_get_tcm_parent(enum bpf_tc_attach_point attach_point,
+		      __u32 *parent)
 {
 	switch (attach_point) {
 	case BPF_TC_INGRESS:
diff --git a/tools/lib/bpf/netlink.h b/tools/lib/bpf/netlink.h
new file mode 100644
index 000000000000..c89133d56eb4
--- /dev/null
+++ b/tools/lib/bpf/netlink.h
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+#pragma once
+
+#include <linux/types.h>
+#include "libbpf.h"
+
+int tc_get_tcm_parent(enum bpf_tc_attach_point attach_point,
+		      __u32 *parent);
-- 
2.31.1



* [PATCH bpf-next v2 7/7] libbpf: add selftest for bpf_link based TC-BPF management API
  2021-06-04  6:31 [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API Kumar Kartikeya Dwivedi
                   ` (5 preceding siblings ...)
  2021-06-04  6:31 ` [PATCH bpf-next v2 6/7] libbpf: add bpf_link based TC-BPF management API Kumar Kartikeya Dwivedi
@ 2021-06-04  6:31 ` Kumar Kartikeya Dwivedi
  2021-06-05 17:26   ` Yonghong Song
  2021-06-07 23:57   ` Andrii Nakryiko
  2022-06-10  0:24 ` [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API Joanne Koong
  7 siblings, 2 replies; 35+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-06-04  6:31 UTC (permalink / raw)
  To: bpf
  Cc: Kumar Kartikeya Dwivedi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jamal Hadi Salim, Vlad Buslov, Cong Wang, Jesper Dangaard Brouer,
	netdev

This covers basic attach/detach/update, and tests interaction with the
netlink API. It also exercises the bpf_link_info and fdinfo codepaths.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/prog_tests/tc_bpf_link.c    | 285 ++++++++++++++++++
 1 file changed, 285 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_bpf_link.c

diff --git a/tools/testing/selftests/bpf/prog_tests/tc_bpf_link.c b/tools/testing/selftests/bpf/prog_tests/tc_bpf_link.c
new file mode 100644
index 000000000000..beaf06e0557c
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/tc_bpf_link.c
@@ -0,0 +1,285 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+#include <linux/pkt_cls.h>
+
+#include "test_tc_bpf.skel.h"
+
+#define LO_IFINDEX 1
+
+static int test_tc_bpf_link_basic(struct bpf_tc_hook *hook,
+				  struct bpf_program *prog)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_link_opts, opts, .handle = 1, .priority = 1);
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, qopts, .handle = 1, .priority = 1);
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
+	struct bpf_link *link, *invl;
+	int ret;
+
+	link = bpf_program__attach_tc(prog, hook, &opts);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_tc"))
+		return PTR_ERR(link);
+
+	ret = bpf_obj_get_info_by_fd(bpf_program__fd(prog), &info, &info_len);
+	if (!ASSERT_OK(ret, "bpf_obj_get_info_by_fd"))
+		goto end;
+
+	ret = bpf_tc_query(hook, &qopts);
+	if (!ASSERT_OK(ret, "bpf_tc_query"))
+		goto end;
+
+	if (!ASSERT_EQ(qopts.prog_id, info.id, "prog_id match"))
+		goto end;
+
+	opts.gen_flags = ~0u;
+	invl = bpf_program__attach_tc(prog, hook, &opts);
+	if (!ASSERT_ERR_PTR(invl, "bpf_program__attach_tc with invalid flags")) {
+		bpf_link__destroy(invl);
+		ret = -EINVAL;
+	}
+
+end:
+	bpf_link__destroy(link);
+	return ret;
+}
+
+static int test_tc_bpf_link_netlink_interaction(struct bpf_tc_hook *hook,
+						struct bpf_program *prog)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, lopts,
+			    .old_prog_fd = bpf_program__fd(prog));
+	DECLARE_LIBBPF_OPTS(bpf_tc_link_opts, opts, .handle = 1, .priority = 1);
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, nopts, .handle = 1, .priority = 1);
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, dopts, .handle = 1, .priority = 1);
+	struct bpf_link *link;
+	int ret;
+
+	/* We need to test the following cases:
+	 *	1. BPF link owned filter cannot be replaced by netlink
+	 *	2. Netlink owned filter cannot be replaced by BPF link
+	 *	3. Netlink cannot do targeted delete of BPF link owned filter
+	 *	4. Filter is actually deleted (with chain cleanup)
+	 *	   We actually (ab)use the kernel behavior of returning EINVAL when
+	 *	   target chain doesn't exist on tc_get_tfilter (which maps to
+	 *	   bpf_tc_query) here, to know if the chain was really cleaned
+	 *	   up on tcf_proto destruction. Our setup is so that there is
+	 *	   only one reference to the chain.
+	 *
+	 *	   So on query, chain ? (filter ?: ENOENT) : EINVAL
+	 */
+
+	link = bpf_program__attach_tc(prog, hook, &opts);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_tc"))
+		return PTR_ERR(link);
+
+	nopts.prog_fd = bpf_program__fd(prog);
+	ret = bpf_tc_attach(hook, &nopts);
+	if (!ASSERT_EQ(ret, -EEXIST, "bpf_tc_attach without replace"))
+		goto end;
+
+	nopts.flags = BPF_TC_F_REPLACE;
+	ret = bpf_tc_attach(hook, &nopts);
+	if (!ASSERT_EQ(ret, -EPERM, "bpf_tc_attach with replace"))
+		goto end;
+
+	ret = bpf_tc_detach(hook, &dopts);
+	if (!ASSERT_EQ(ret, -EPERM, "bpf_tc_detach"))
+		goto end;
+
+	lopts.flags = BPF_F_REPLACE;
+	ret = bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog),
+			      &lopts);
+	ASSERT_OK(ret, "bpf_link_update");
+	ret = ret < 0 ? -errno : ret;
+
+end:
+	bpf_link__destroy(link);
+	if (!ret && !ASSERT_EQ(bpf_tc_query(hook, &dopts), -EINVAL,
+			       "chain empty delete"))
+		ret = -EINVAL;
+	return ret;
+}
+
+static int test_tc_bpf_link_update_ways(struct bpf_tc_hook *hook,
+					struct bpf_program *prog)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_link_opts, opts, .handle = 1, .priority = 1);
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, uopts, 0);
+	struct test_tc_bpf *skel;
+	struct bpf_link *link;
+	int ret;
+
+	skel = test_tc_bpf__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_tc_bpf__open_and_load"))
+		return PTR_ERR(skel);
+
+	link = bpf_program__attach_tc(prog, hook, &opts);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_tc")) {
+		ret = PTR_ERR(link);
+		goto end;
+	}
+
+	ret = bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog),
+			      &uopts);
+	if (!ASSERT_OK(ret, "bpf_link_update no old prog"))
+		goto end;
+
+	uopts.old_prog_fd = bpf_program__fd(prog);
+	ret = bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog),
+			      &uopts);
+	if (!ASSERT_TRUE(ret < 0 && errno == EINVAL,
+			 "bpf_link_update with old prog without BPF_F_REPLACE")) {
+		ret = -EINVAL;
+		goto end;
+	}
+
+	uopts.flags = BPF_F_REPLACE;
+	ret = bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog),
+			      &uopts);
+	if (!ASSERT_OK(ret, "bpf_link_update with old prog with BPF_F_REPLACE"))
+		goto end;
+
+	uopts.old_prog_fd = bpf_program__fd(skel->progs.cls);
+	ret = bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog),
+			      &uopts);
+	if (!ASSERT_TRUE(ret < 0 && errno == EINVAL,
+			 "bpf_link_update with wrong old prog")) {
+		ret = -EINVAL;
+		goto end;
+	}
+	ret = 0;
+
+end:
+	test_tc_bpf__destroy(skel);
+	return ret;
+}
+
+static int test_tc_bpf_link_info_api(struct bpf_tc_hook *hook,
+				     struct bpf_program *prog)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_link_opts, opts, .handle = 1, .priority = 1);
+	__u32 ifindex, parent, handle, gen_flags, priority;
+	char buf[4096], path[256], *begin;
+	struct bpf_link_info info = {};
+	__u32 info_len = sizeof(info);
+	struct bpf_link *link;
+	int ret, fdinfo;
+
+	link = bpf_program__attach_tc(prog, hook, &opts);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_tc"))
+		return PTR_ERR(link);
+
+	ret = bpf_obj_get_info_by_fd(bpf_link__fd(link), &info, &info_len);
+	if (!ASSERT_OK(ret, "bpf_obj_get_info_by_fd"))
+		goto end;
+
+	ret = snprintf(path, sizeof(path), "/proc/self/fdinfo/%d",
+		       bpf_link__fd(link));
+	if (!ASSERT_TRUE(!ret || ret < sizeof(path), "snprintf pathname"))
+		goto end;
+
+	fdinfo = open(path, O_RDONLY);
+	if (!ASSERT_GT(fdinfo, -1, "open fdinfo"))
+		goto end;
+
+	ret = read(fdinfo, buf, sizeof(buf));
+	if (!ASSERT_GT(ret, 0, "read fdinfo")) {
+		ret = -EINVAL;
+		goto end_file;
+	}
+
+	begin = strstr(buf, "ifindex");
+	if (!ASSERT_OK_PTR(begin, "find beginning of fdinfo info")) {
+		ret = -EINVAL;
+		goto end_file;
+	}
+
+	ret = sscanf(begin, "ifindex:\t%u\n"
+			    "parent:\t%u\n"
+			    "handle:\t%u\n"
+			    "priority:\t%u\n"
+			    "gen_flags:\t%u\n",
+			    &ifindex, &parent, &handle, &priority, &gen_flags);
+	if (!ASSERT_EQ(ret, 5, "sscanf fdinfo")) {
+		ret = -EINVAL;
+		goto end_file;
+	}
+
+	ret = -EINVAL;
+
+#define X(a, b, c) (!ASSERT_EQ(a, b, #a " == " #b) || !ASSERT_EQ(b, c, #b " == " #c))
+	if (X(info.tc.ifindex, ifindex, 1) ||
+	    X(info.tc.parent, parent,
+	      TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)) ||
+	    X(info.tc.handle, handle, 1) ||
+	    X(info.tc.gen_flags, gen_flags, TCA_CLS_FLAGS_NOT_IN_HW) ||
+	    X(info.tc.priority, priority, 1))
+#undef X
+		goto end_file;
+
+	ret = 0;
+
+end_file:
+	close(fdinfo);
+end:
+	bpf_link__destroy(link);
+	return ret;
+}
+
+void test_tc_bpf_link(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = LO_IFINDEX,
+			    .attach_point = BPF_TC_INGRESS);
+	struct test_tc_bpf *skel = NULL;
+	bool hook_created = false;
+	int ret;
+
+	skel = test_tc_bpf__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_tc_bpf__open_and_load"))
+		return;
+
+	ret = bpf_tc_hook_create(&hook);
+	if (ret == 0)
+		hook_created = true;
+
+	ret = ret == -EEXIST ? 0 : ret;
+	if (!ASSERT_OK(ret, "bpf_tc_hook_create(BPF_TC_INGRESS)"))
+		goto end;
+
+	ret = test_tc_bpf_link_basic(&hook, skel->progs.cls);
+	if (!ASSERT_OK(ret, "test_tc_bpf_link_basic"))
+		goto end;
+
+	bpf_tc_hook_destroy(&hook);
+
+	hook.attach_point = BPF_TC_EGRESS;
+	ret = test_tc_bpf_link_basic(&hook, skel->progs.cls);
+	if (!ASSERT_OK(ret, "test_tc_bpf_link_basic"))
+		goto end;
+
+	bpf_tc_hook_destroy(&hook);
+
+	ret = test_tc_bpf_link_netlink_interaction(&hook, skel->progs.cls);
+	if (!ASSERT_OK(ret, "test_tc_bpf_link_netlink_interaction"))
+		goto end;
+
+	bpf_tc_hook_destroy(&hook);
+
+	ret = test_tc_bpf_link_update_ways(&hook, skel->progs.cls);
+	if (!ASSERT_OK(ret, "test_tc_bpf_link_update_ways"))
+		goto end;
+
+	bpf_tc_hook_destroy(&hook);
+
+	ret = test_tc_bpf_link_info_api(&hook, skel->progs.cls);
+	if (!ASSERT_OK(ret, "test_tc_bpf_link_info_api"))
+		goto end;
+
+end:
+	if (hook_created) {
+		hook.attach_point = BPF_TC_INGRESS | BPF_TC_EGRESS;
+		bpf_tc_hook_destroy(&hook);
+	}
+	test_tc_bpf__destroy(skel);
+}
-- 
2.31.1



* Re: [PATCH bpf-next v2 4/7] net: sched: add lightweight update path for cls_bpf
  2021-06-04  6:31 ` [PATCH bpf-next v2 4/7] net: sched: add lightweight update path for cls_bpf Kumar Kartikeya Dwivedi
@ 2021-06-04 17:54   ` Alexei Starovoitov
  2021-06-05  4:42     ` Kumar Kartikeya Dwivedi
  2021-06-07 23:32   ` Andrii Nakryiko
  1 sibling, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2021-06-04 17:54 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jamal Hadi Salim, Vlad Buslov,
	Cong Wang, Jesper Dangaard Brouer, netdev

On Fri, Jun 04, 2021 at 12:01:13PM +0530, Kumar Kartikeya Dwivedi wrote:
> This is used by BPF_LINK_UPDATE to replace the attach SCHED_CLS bpf prog
> effectively changing the classifier implementation for a given filter
> owned by a bpf_link.
> 
> Note that READ_ONCE suffices in this case as the ordering for loads from
> the filter are implicitly provided by the data dependency on BPF prog
> pointer.
> 
> On the writer side we can just use a relaxed WRITE_ONCE store to make
> sure one or the other value is visible to a reader in cls_bpf_classify.
> Lifetime is managed using RCU so bpf_prog_put path should wait until
> readers are done for old_prog.

Should those be rcu_deref and rcu_assign_pointer ?
Typically the pointer would be __rcu annotated which would be
another small change in struct cls_bpf_prog.
That would make the life time easier to follow?
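Roughly, the annotated variant being suggested would look like this (an
illustrative sketch of the proposal only, not code from the patch):

```c
struct cls_bpf_prog {
	struct bpf_prog __rcu *filter;
	/* ... */
};

/* Reader side, within the RCU read-side section in cls_bpf_classify(): */
filter_res = BPF_PROG_RUN(rcu_dereference(prog->filter), skb);

/* Writer side, under RTNL in cls_bpf_link_update(): */
rcu_assign_pointer(prog->filter, new_prog);
```

With the __rcu annotation, sparse can then flag any unprotected access
to the pointer.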

> All other parties accessing the BPF prog are under RTNL protection, so
> need no changes.
> 
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>.
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  net/sched/cls_bpf.c | 55 +++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 53 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
> index bf61ffbb7fd0..f23304685c48 100644
> --- a/net/sched/cls_bpf.c
> +++ b/net/sched/cls_bpf.c
> @@ -9,6 +9,7 @@
>   * (C) 2013 Daniel Borkmann <dborkman@redhat.com>
>   */
>  
> +#include <linux/atomic.h>
>  #include <linux/module.h>
>  #include <linux/types.h>
>  #include <linux/skbuff.h>
> @@ -104,11 +105,11 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>  			/* It is safe to push/pull even if skb_shared() */
>  			__skb_push(skb, skb->mac_len);
>  			bpf_compute_data_pointers(skb);
> -			filter_res = BPF_PROG_RUN(prog->filter, skb);
> +			filter_res = BPF_PROG_RUN(READ_ONCE(prog->filter), skb);
>  			__skb_pull(skb, skb->mac_len);
>  		} else {
>  			bpf_compute_data_pointers(skb);
> -			filter_res = BPF_PROG_RUN(prog->filter, skb);
> +			filter_res = BPF_PROG_RUN(READ_ONCE(prog->filter), skb);
>  		}
>  
>  		if (prog->exts_integrated) {
> @@ -775,6 +776,55 @@ static int cls_bpf_link_detach(struct bpf_link *link)
>  	return 0;
>  }
>  
> +static int cls_bpf_link_update(struct bpf_link *link, struct bpf_prog *new_prog,
> +			       struct bpf_prog *old_prog)
> +{
> +	struct cls_bpf_link *cls_link;
> +	struct cls_bpf_prog cls_prog;
> +	struct cls_bpf_prog *prog;
> +	int ret;
> +
> +	rtnl_lock();
> +
> +	cls_link = container_of(link, struct cls_bpf_link, link);
> +	if (!cls_link->prog) {
> +		ret = -ENOLINK;
> +		goto out;
> +	}
> +
> +	prog = cls_link->prog;
> +
> +	/* BPF_F_REPLACEing? */
> +	if (old_prog && prog->filter != old_prog) {
> +		ret = -EINVAL;

Other places like cgroup_bpf_replace and bpf_iter_link_replace
return -EPERM in such case.

> +		goto out;
> +	}
> +
> +	old_prog = prog->filter;
> +
> +	if (new_prog == old_prog) {
> +		ret = 0;
> +		goto out;
> +	}
> +
> +	cls_prog = *prog;
> +	cls_prog.filter = new_prog;
> +
> +	ret = cls_bpf_offload(prog->tp, &cls_prog, prog, NULL);
> +	if (ret < 0)
> +		goto out;
> +
> +	WRITE_ONCE(prog->filter, new_prog);
> +
> +	bpf_prog_inc(new_prog);
> +	/* release our reference */
> +	bpf_prog_put(old_prog);
> +
> +out:
> +	rtnl_unlock();
> +	return ret;
> +}
> +
>  static void __bpf_fill_link_info(struct cls_bpf_link *link,
>  				 struct bpf_link_info *info)
>  {
> @@ -859,6 +909,7 @@ static const struct bpf_link_ops cls_bpf_link_ops = {
>  	.show_fdinfo = cls_bpf_link_show_fdinfo,
>  #endif
>  	.fill_link_info = cls_bpf_link_fill_link_info,
> +	.update_prog = cls_bpf_link_update,
>  };
>  
>  static inline char *cls_bpf_link_name(u32 prog_id, const char *name)
> -- 
> 2.31.1
> 

-- 


* Re: [PATCH bpf-next v2 6/7] libbpf: add bpf_link based TC-BPF management API
  2021-06-04  6:31 ` [PATCH bpf-next v2 6/7] libbpf: add bpf_link based TC-BPF management API Kumar Kartikeya Dwivedi
@ 2021-06-04 18:01   ` Alexei Starovoitov
  2021-06-05  4:51     ` Kumar Kartikeya Dwivedi
  2021-06-05 17:09   ` Yonghong Song
  2021-06-07 23:41   ` Andrii Nakryiko
  2 siblings, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2021-06-04 18:01 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jamal Hadi Salim, Vlad Buslov,
	Cong Wang, Jesper Dangaard Brouer, netdev

On Fri, Jun 04, 2021 at 12:01:15PM +0530, Kumar Kartikeya Dwivedi wrote:
> +/* TC bpf_link related API */
> +struct bpf_tc_hook;
> +
> +struct bpf_tc_link_opts {
> +	size_t sz;
> +	__u32 handle;
> +	__u32 priority;
> +	__u32 gen_flags;
> +	size_t :0;
> +};

Did you think of a way to share struct bpf_tc_opts with above?
Or use bpf_tc_link_opts inside bpf_tc_opts? 
Some other way?
gen_flags here and flags there are the same?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 3/7] net: sched: add bpf_link API for bpf classifier
  2021-06-04  6:31 ` [PATCH bpf-next v2 3/7] net: sched: add bpf_link API for bpf classifier Kumar Kartikeya Dwivedi
@ 2021-06-05  3:08   ` Yonghong Song
  2021-06-05  4:52     ` Kumar Kartikeya Dwivedi
  2021-06-07 23:23   ` Andrii Nakryiko
  1 sibling, 1 reply; 35+ messages in thread
From: Yonghong Song @ 2021-06-05  3:08 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jamal Hadi Salim, Vlad Buslov,
	Cong Wang, Jesper Dangaard Brouer, netdev



On 6/3/21 11:31 PM, Kumar Kartikeya Dwivedi wrote:
> This commit introduces a bpf_link based kernel API for creating tc
> filters and using the cls_bpf classifier. Only a subset of what netlink
> API offers is supported, things like TCA_BPF_POLICE, TCA_RATE and
> embedded actions are unsupported.
> 
> The kernel API and the libbpf wrapper added in a subsequent patch are
> more opinionated and mirror the semantics of low level netlink based
> TC-BPF API, i.e. always setting direct action mode, always setting
> protocol to ETH_P_ALL, and only exposing handle and priority as the
> variables the user can control. We add an additional gen_flags parameter
> though to allow for offloading use cases. It would be trivial to extend
> the current API to support specifying other attributes in the future,
> but for now I'm sticking to how we want to push usage.
> 
> The semantics around bpf_link support are as follows:
> 
> A user can create a classifier attached to a filter using the bpf_link
> API, after which changing it and deleting it only happens through the
> bpf_link API. It is not possible to bind the bpf_link to an existing
> filter, and any such attempt will fail with EEXIST. Hence EEXIST can be
> returned in two cases: when an existing bpf_link owned filter exists, or
> when an existing netlink owned filter exists.
> 
> Removing bpf_link owned filter from netlink returns EPERM, denoting that
> netlink is locked out from filter manipulation when bpf_link is
> involved.
> 
> Whenever a filter is detached due to chain removal, or qdisc tear down,
> or net_device shutdown, the bpf_link becomes automatically detached.
> 
> In this way, the netlink API and bpf_link creation path are exclusive
> and don't stomp over one another. Filters created using bpf_link API
> cannot be replaced by netlink API, and filters created by netlink API are
> never replaced by bpf_link. Netlink also cannot detach bpf_link filters.
> 
> We serialize all changes dover rtnl_lock as cls_bpf API doesn't support the

dover => over?

> unlocked classifier API.
> 
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>.
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>   include/linux/bpf_types.h |   3 +
>   include/net/pkt_cls.h     |  13 ++
>   include/net/sch_generic.h |   6 +-
>   include/uapi/linux/bpf.h  |  15 +++
>   kernel/bpf/syscall.c      |  10 +-
>   net/sched/cls_api.c       | 139 ++++++++++++++++++++-
>   net/sched/cls_bpf.c       | 250 +++++++++++++++++++++++++++++++++++++-
>   7 files changed, 430 insertions(+), 6 deletions(-)
> 
[...]
>   subsys_initcall(tc_filter_init);
> +
> +#if IS_ENABLED(CONFIG_NET_CLS_BPF)
> +
> +int bpf_tc_link_attach(union bpf_attr *attr, struct bpf_prog *prog)
> +{
> +	struct net *net = current->nsproxy->net_ns;
> +	struct tcf_chain_info chain_info;
> +	u32 chain_index, prio, parent;
> +	struct tcf_block *block;
> +	struct tcf_chain *chain;
> +	struct tcf_proto *tp;
> +	int err, tp_created;
> +	unsigned long cl;
> +	struct Qdisc *q;
> +	__be16 protocol;
> +	void *fh;
> +
> +	/* Caller already checks bpf_capable */
> +	if (!ns_capable(current->nsproxy->net_ns->user_ns, CAP_NET_ADMIN))

net->user_ns?

> +		return -EPERM;
> +
> +	if (attr->link_create.flags ||
> +	    !attr->link_create.target_ifindex ||
> +	    !tc_flags_valid(attr->link_create.tc.gen_flags))
> +		return -EINVAL;
> +
[...]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 4/7] net: sched: add lightweight update path for cls_bpf
  2021-06-04 17:54   ` Alexei Starovoitov
@ 2021-06-05  4:42     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 35+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-06-05  4:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jamal Hadi Salim, Vlad Buslov,
	Cong Wang, Jesper Dangaard Brouer, netdev

On Fri, Jun 04, 2021 at 11:24:28PM IST, Alexei Starovoitov wrote:
> On Fri, Jun 04, 2021 at 12:01:13PM +0530, Kumar Kartikeya Dwivedi wrote:
> > This is used by BPF_LINK_UPDATE to replace the attached SCHED_CLS bpf prog
> > effectively changing the classifier implementation for a given filter
> > owned by a bpf_link.
> >
> > Note that READ_ONCE suffices in this case as the ordering for loads from
> > the filter is implicitly provided by the data dependency on BPF prog
> > pointer.
> >
> > On the writer side we can just use a relaxed WRITE_ONCE store to make
> > sure one or the other value is visible to a reader in cls_bpf_classify.
> > Lifetime is managed using RCU so bpf_prog_put path should wait until
> > readers are done for old_prog.
>
> Should those be rcu_deref and rcu_assign_pointer ?
> Typically the pointer would be __rcu annotated which would be
> another small change in struct cls_bpf_prog.
> That would make the life time easier to follow?
>

True, I'll make that change.

> > All other parties accessing the BPF prog are under RTNL protection, so
> > need no changes.
> >
> > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>.
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  net/sched/cls_bpf.c | 55 +++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 53 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
> > index bf61ffbb7fd0..f23304685c48 100644
> > --- a/net/sched/cls_bpf.c
> > +++ b/net/sched/cls_bpf.c
> > @@ -9,6 +9,7 @@
> >   * (C) 2013 Daniel Borkmann <dborkman@redhat.com>
> >   */
> >
> > +#include <linux/atomic.h>
> >  #include <linux/module.h>
> >  #include <linux/types.h>
> >  #include <linux/skbuff.h>
> > @@ -104,11 +105,11 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
> >  			/* It is safe to push/pull even if skb_shared() */
> >  			__skb_push(skb, skb->mac_len);
> >  			bpf_compute_data_pointers(skb);
> > -			filter_res = BPF_PROG_RUN(prog->filter, skb);
> > +			filter_res = BPF_PROG_RUN(READ_ONCE(prog->filter), skb);
> >  			__skb_pull(skb, skb->mac_len);
> >  		} else {
> >  			bpf_compute_data_pointers(skb);
> > -			filter_res = BPF_PROG_RUN(prog->filter, skb);
> > +			filter_res = BPF_PROG_RUN(READ_ONCE(prog->filter), skb);
> >  		}
> >
> >  		if (prog->exts_integrated) {
> > @@ -775,6 +776,55 @@ static int cls_bpf_link_detach(struct bpf_link *link)
> >  	return 0;
> >  }
> >
> > +static int cls_bpf_link_update(struct bpf_link *link, struct bpf_prog *new_prog,
> > +			       struct bpf_prog *old_prog)
> > +{
> > +	struct cls_bpf_link *cls_link;
> > +	struct cls_bpf_prog cls_prog;
> > +	struct cls_bpf_prog *prog;
> > +	int ret;
> > +
> > +	rtnl_lock();
> > +
> > +	cls_link = container_of(link, struct cls_bpf_link, link);
> > +	if (!cls_link->prog) {
> > +		ret = -ENOLINK;
> > +		goto out;
> > +	}
> > +
> > +	prog = cls_link->prog;
> > +
> > +	/* BPF_F_REPLACEing? */
> > +	if (old_prog && prog->filter != old_prog) {
> > +		ret = -EINVAL;
>
> Other places like cgroup_bpf_replace and bpf_iter_link_replace
> return -EPERM in such case.
>

Ok, will change.

> > +		goto out;
> > +	}
> > +
> > +	old_prog = prog->filter;
> > +
> > +	if (new_prog == old_prog) {
> > +		ret = 0;
> > +		goto out;
> > +	}
> > +
> > +	cls_prog = *prog;
> > +	cls_prog.filter = new_prog;
> > +
> > +	ret = cls_bpf_offload(prog->tp, &cls_prog, prog, NULL);
> > +	if (ret < 0)
> > +		goto out;
> > +
> > +	WRITE_ONCE(prog->filter, new_prog);
> > +
> > +	bpf_prog_inc(new_prog);
> > +	/* release our reference */
> > +	bpf_prog_put(old_prog);
> > +
> > +out:
> > +	rtnl_unlock();
> > +	return ret;
> > +}
> > +
> >  static void __bpf_fill_link_info(struct cls_bpf_link *link,
> >  				 struct bpf_link_info *info)
> >  {
> > @@ -859,6 +909,7 @@ static const struct bpf_link_ops cls_bpf_link_ops = {
> >  	.show_fdinfo = cls_bpf_link_show_fdinfo,
> >  #endif
> >  	.fill_link_info = cls_bpf_link_fill_link_info,
> > +	.update_prog = cls_bpf_link_update,
> >  };
> >
> >  static inline char *cls_bpf_link_name(u32 prog_id, const char *name)
> > --
> > 2.31.1
> >
>
> --

--
Kartikeya

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 6/7] libbpf: add bpf_link based TC-BPF management API
  2021-06-04 18:01   ` Alexei Starovoitov
@ 2021-06-05  4:51     ` Kumar Kartikeya Dwivedi
  2021-06-07 23:37       ` Andrii Nakryiko
  0 siblings, 1 reply; 35+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-06-05  4:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jamal Hadi Salim, Vlad Buslov,
	Cong Wang, Jesper Dangaard Brouer, netdev

On Fri, Jun 04, 2021 at 11:31:57PM IST, Alexei Starovoitov wrote:
> On Fri, Jun 04, 2021 at 12:01:15PM +0530, Kumar Kartikeya Dwivedi wrote:
> > +/* TC bpf_link related API */
> > +struct bpf_tc_hook;
> > +
> > +struct bpf_tc_link_opts {
> > +	size_t sz;
> > +	__u32 handle;
> > +	__u32 priority;
> > +	__u32 gen_flags;
> > +	size_t :0;
> > +};
>
> Did you think of a way to share struct bpf_tc_opts with above?
> Or use bpf_tc_link_opts inside bpf_tc_opts?

A couple of fields in bpf_tc_opts aren't really relevant here (prog_fd, prog_id)
and will always be unused, so I thought it would be cleaner to give this its own
opts struct. It still reuses the hook abstraction that was added, though.

> Some other way?
> gen_flags here and flags there are the same?

No, it was an oversight that I missed adding gen_flags there, I'll send a patch
separately with some other assorted things. It's used when offloading to HW.

We don't really support any other flags (e.g. BPF_TC_F_REPLACE) for this.

--
Kartikeya

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 3/7] net: sched: add bpf_link API for bpf classifier
  2021-06-05  3:08   ` Yonghong Song
@ 2021-06-05  4:52     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 35+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-06-05  4:52 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jamal Hadi Salim, Vlad Buslov,
	Cong Wang, Jesper Dangaard Brouer, netdev

On Sat, Jun 05, 2021 at 08:38:17AM IST, Yonghong Song wrote:
>
>
> On 6/3/21 11:31 PM, Kumar Kartikeya Dwivedi wrote:
> > This commit introduces a bpf_link based kernel API for creating tc
> > filters and using the cls_bpf classifier. Only a subset of what netlink
> > API offers is supported, things like TCA_BPF_POLICE, TCA_RATE and
> > embedded actions are unsupported.
> >
> > The kernel API and the libbpf wrapper added in a subsequent patch are
> > more opinionated and mirror the semantics of low level netlink based
> > TC-BPF API, i.e. always setting direct action mode, always setting
> > protocol to ETH_P_ALL, and only exposing handle and priority as the
> > variables the user can control. We add an additional gen_flags parameter
> > though to allow for offloading use cases. It would be trivial to extend
> > the current API to support specifying other attributes in the future,
> > but for now I'm sticking to how we want to push usage.
> >
> > The semantics around bpf_link support are as follows:
> >
> > A user can create a classifier attached to a filter using the bpf_link
> > API, after which changing it and deleting it only happens through the
> > bpf_link API. It is not possible to bind the bpf_link to an existing
> > filter, and any such attempt will fail with EEXIST. Hence EEXIST can be
> > returned in two cases: when an existing bpf_link owned filter exists, or
> > when an existing netlink owned filter exists.
> >
> > Removing bpf_link owned filter from netlink returns EPERM, denoting that
> > netlink is locked out from filter manipulation when bpf_link is
> > involved.
> >
> > Whenever a filter is detached due to chain removal, or qdisc tear down,
> > or net_device shutdown, the bpf_link becomes automatically detached.
> >
> > In this way, the netlink API and bpf_link creation path are exclusive
> > and don't stomp over one another. Filters created using bpf_link API
> > cannot be replaced by netlink API, and filters created by netlink API are
> > never replaced by bpf_link. Netlink also cannot detach bpf_link filters.
> >
> > We serialize all changes dover rtnl_lock as cls_bpf API doesn't support the
>
> dover => over?
>

Thanks, will fix.

> > unlocked classifier API.
> >
> > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>.
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >   include/linux/bpf_types.h |   3 +
> >   include/net/pkt_cls.h     |  13 ++
> >   include/net/sch_generic.h |   6 +-
> >   include/uapi/linux/bpf.h  |  15 +++
> >   kernel/bpf/syscall.c      |  10 +-
> >   net/sched/cls_api.c       | 139 ++++++++++++++++++++-
> >   net/sched/cls_bpf.c       | 250 +++++++++++++++++++++++++++++++++++++-
> >   7 files changed, 430 insertions(+), 6 deletions(-)
> >
> [...]
> >   subsys_initcall(tc_filter_init);
> > +
> > +#if IS_ENABLED(CONFIG_NET_CLS_BPF)
> > +
> > +int bpf_tc_link_attach(union bpf_attr *attr, struct bpf_prog *prog)
> > +{
> > +	struct net *net = current->nsproxy->net_ns;
> > +	struct tcf_chain_info chain_info;
> > +	u32 chain_index, prio, parent;
> > +	struct tcf_block *block;
> > +	struct tcf_chain *chain;
> > +	struct tcf_proto *tp;
> > +	int err, tp_created;
> > +	unsigned long cl;
> > +	struct Qdisc *q;
> > +	__be16 protocol;
> > +	void *fh;
> > +
> > +	/* Caller already checks bpf_capable */
> > +	if (!ns_capable(current->nsproxy->net_ns->user_ns, CAP_NET_ADMIN))
>
> net->user_ns?
>

True, will fix.

> > +		return -EPERM;
> > +
> > +	if (attr->link_create.flags ||
> > +	    !attr->link_create.target_ifindex ||
> > +	    !tc_flags_valid(attr->link_create.tc.gen_flags))
> > +		return -EINVAL;
> > +
> [...]

--
Kartikeya

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 6/7] libbpf: add bpf_link based TC-BPF management API
  2021-06-04  6:31 ` [PATCH bpf-next v2 6/7] libbpf: add bpf_link based TC-BPF management API Kumar Kartikeya Dwivedi
  2021-06-04 18:01   ` Alexei Starovoitov
@ 2021-06-05 17:09   ` Yonghong Song
  2021-06-07 23:41   ` Andrii Nakryiko
  2 siblings, 0 replies; 35+ messages in thread
From: Yonghong Song @ 2021-06-05 17:09 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jamal Hadi Salim, Vlad Buslov,
	Cong Wang, Jesper Dangaard Brouer, netdev



On 6/3/21 11:31 PM, Kumar Kartikeya Dwivedi wrote:
> This adds userspace TC-BPF management API based on bpf_link.
> 
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>.
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>   tools/lib/bpf/bpf.c      |  8 +++++-
>   tools/lib/bpf/bpf.h      |  8 +++++-
>   tools/lib/bpf/libbpf.c   | 59 ++++++++++++++++++++++++++++++++++++++--
>   tools/lib/bpf/libbpf.h   | 17 ++++++++++++
>   tools/lib/bpf/libbpf.map |  1 +
>   tools/lib/bpf/netlink.c  |  5 ++--
>   tools/lib/bpf/netlink.h  |  8 ++++++
>   7 files changed, 100 insertions(+), 6 deletions(-)
>   create mode 100644 tools/lib/bpf/netlink.h
> 
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 86dcac44f32f..bebccea9bfd7 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -28,6 +28,7 @@
>   #include <asm/unistd.h>
>   #include <errno.h>
>   #include <linux/bpf.h>
> +#include <arpa/inet.h>
>   #include "bpf.h"
>   #include "libbpf.h"
>   #include "libbpf_internal.h"
> @@ -693,7 +694,12 @@ int bpf_link_create(int prog_fd, int target_fd,
>   	attr.link_create.attach_type = attach_type;
>   	attr.link_create.flags = OPTS_GET(opts, flags, 0);
>   
> -	if (iter_info_len) {
> +	if (attach_type == BPF_TC) {
> +		attr.link_create.tc.parent = OPTS_GET(opts, tc.parent, 0);
> +		attr.link_create.tc.handle = OPTS_GET(opts, tc.handle, 0);
> +		attr.link_create.tc.priority = OPTS_GET(opts, tc.priority, 0);
> +		attr.link_create.tc.gen_flags = OPTS_GET(opts, tc.gen_flags, 0);
> +	} else if (iter_info_len) {
>   		attr.link_create.iter_info =
>   			ptr_to_u64(OPTS_GET(opts, iter_info, (void *)0));
>   		attr.link_create.iter_info_len = iter_info_len;
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index 4f758f8f50cd..f2178309e9ea 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -177,8 +177,14 @@ struct bpf_link_create_opts {
>   	union bpf_iter_link_info *iter_info;
>   	__u32 iter_info_len;
>   	__u32 target_btf_id;
> +	struct {
> +		__u32 parent;
> +		__u32 handle;
> +		__u32 priority;
> +		__u32 gen_flags;
> +	} tc;
>   };
> -#define bpf_link_create_opts__last_field target_btf_id
> +#define bpf_link_create_opts__last_field tc.gen_flags
>   
>   LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,
>   			       enum bpf_attach_type attach_type,
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 1c4e20e75237..7809536980b1 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -55,6 +55,7 @@
>   #include "libbpf_internal.h"
>   #include "hashmap.h"
>   #include "bpf_gen_internal.h"
> +#include "netlink.h"
>   
>   #ifndef BPF_FS_MAGIC
>   #define BPF_FS_MAGIC		0xcafe4a11
> @@ -7185,7 +7186,7 @@ static int bpf_object__collect_relos(struct bpf_object *obj)
>   
>   	for (i = 0; i < obj->nr_programs; i++) {
>   		struct bpf_program *p = &obj->programs[i];
> -		
> +
>   		if (!p->nr_reloc)
>   			continue;
>   
> @@ -10005,7 +10006,7 @@ struct bpf_link {
>   int bpf_link__update_program(struct bpf_link *link, struct bpf_program *prog)
>   {
>   	int ret;
> -	
> +
>   	ret = bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog), NULL);
>   	return libbpf_err_errno(ret);
>   }
> @@ -10613,6 +10614,60 @@ struct bpf_link *bpf_program__attach_xdp(struct bpf_program *prog, int ifindex)
>   	return bpf_program__attach_fd(prog, ifindex, 0, "xdp");
>   }
>   
> +struct bpf_link *bpf_program__attach_tc(struct bpf_program *prog,
> +					const struct bpf_tc_hook *hook,
> +					const struct bpf_tc_link_opts *opts)
> +{
> +	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, lopts, 0);
> +	char errmsg[STRERR_BUFSIZE];
> +	int prog_fd, link_fd, ret;
> +	struct bpf_link *link;
> +	__u32 parent;
> +
> +	if (!hook || !OPTS_VALID(hook, bpf_tc_hook) ||
> +	    !OPTS_VALID(opts, bpf_tc_link_opts))
> +		return ERR_PTR(-EINVAL);

For libbpf API functions, the libbpf proposed 1.0 change
     https://lore.kernel.org/bpf/20210525035935.1461796-5-andrii@kernel.org
will return NULL and set errno properly.

So the above code probably should be
	return errno = EINVAL, NULL;


> +
> +	if (OPTS_GET(hook, ifindex, 0) <= 0 ||
> +	    OPTS_GET(opts, priority, 0) > UINT16_MAX)
> +		return ERR_PTR(-EINVAL);
> +
> +	parent = OPTS_GET(hook, parent, 0);
> +
> +	ret = tc_get_tcm_parent(OPTS_GET(hook, attach_point, 0),
> +				&parent);
> +	if (ret < 0)
> +		return ERR_PTR(ret);
> +
> +	lopts.tc.parent = parent;
> +	lopts.tc.handle = OPTS_GET(opts, handle, 0);
> +	lopts.tc.priority = OPTS_GET(opts, priority, 0);
> +	lopts.tc.gen_flags = OPTS_GET(opts, gen_flags, 0);
> +
> +	prog_fd = bpf_program__fd(prog);
> +	if (prog_fd < 0) {
> +		pr_warn("prog '%s': can't attach before loaded\n", prog->name);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	link = calloc(1, sizeof(*link));
> +	if (!link)
> +		return ERR_PTR(-ENOMEM);
> +	link->detach = &bpf_link__detach_fd;
> +
> +	link_fd = bpf_link_create(prog_fd, OPTS_GET(hook, ifindex, 0), BPF_TC, &lopts);
> +	if (link_fd < 0) {
> +		link_fd = -errno;
> +		free(link);
> +		pr_warn("prog '%s': failed to attach tc filter: %s\n",
> +			prog->name, libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg)));
> +		return ERR_PTR(link_fd);
> +	}
> +	link->fd = link_fd;
> +
> +	return link;
> +}
> +
[...]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 7/7] libbpf: add selftest for bpf_link based TC-BPF management API
  2021-06-04  6:31 ` [PATCH bpf-next v2 7/7] libbpf: add selftest for " Kumar Kartikeya Dwivedi
@ 2021-06-05 17:26   ` Yonghong Song
  2021-06-07 23:57   ` Andrii Nakryiko
  1 sibling, 0 replies; 35+ messages in thread
From: Yonghong Song @ 2021-06-05 17:26 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jamal Hadi Salim, Vlad Buslov,
	Cong Wang, Jesper Dangaard Brouer, netdev



On 6/3/21 11:31 PM, Kumar Kartikeya Dwivedi wrote:
> This covers basic attach/detach/update, and tests interaction with the
> netlink API. It also exercises the bpf_link_info and fdinfo codepaths.
> 
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>.
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>   .../selftests/bpf/prog_tests/tc_bpf_link.c    | 285 ++++++++++++++++++
>   1 file changed, 285 insertions(+)
>   create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_bpf_link.c
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/tc_bpf_link.c b/tools/testing/selftests/bpf/prog_tests/tc_bpf_link.c
> new file mode 100644
> index 000000000000..beaf06e0557c
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/tc_bpf_link.c
> @@ -0,0 +1,285 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <test_progs.h>
> +#include <linux/pkt_cls.h>
> +
> +#include "test_tc_bpf.skel.h"
> +
> +#define LO_IFINDEX 1
> +
> +static int test_tc_bpf_link_basic(struct bpf_tc_hook *hook,
> +				  struct bpf_program *prog)
> +{
> +	DECLARE_LIBBPF_OPTS(bpf_tc_link_opts, opts, .handle = 1, .priority = 1);
> +	DECLARE_LIBBPF_OPTS(bpf_tc_opts, qopts, .handle = 1, .priority = 1);
> +	struct bpf_prog_info info = {};
> +	__u32 info_len = sizeof(info);
> +	struct bpf_link *link, *invl;
> +	int ret;
> +
> +	link = bpf_program__attach_tc(prog, hook, &opts);
> +	if (!ASSERT_OK_PTR(link, "bpf_program__attach_tc"))
> +		return PTR_ERR(link);

If we changed the bpf_program__attach_tc return semantics such that
the return "link" value can only be NULL or a valid pointer, you can do
	if (!ASSERT_OK_PTR(link, "bpf_program__attach_tc"))
		return -EINVAL;

The true error code should have been printed inside ASSERT_OK_PTR.

The same for a few other cases below.

> +
> +	ret = bpf_obj_get_info_by_fd(bpf_program__fd(prog), &info, &info_len);
> +	if (!ASSERT_OK(ret, "bpf_obj_get_info_by_fd"))
> +		goto end;
> +
> +	ret = bpf_tc_query(hook, &qopts);
> +	if (!ASSERT_OK(ret, "bpf_tc_query"))
> +		goto end;
> +
> +	if (!ASSERT_EQ(qopts.prog_id, info.id, "prog_id match"))
> +		goto end;
> +
> +	opts.gen_flags = ~0u;
> +	invl = bpf_program__attach_tc(prog, hook, &opts);
> +	if (!ASSERT_ERR_PTR(invl, "bpf_program__attach_tc with invalid flags")) {
> +		bpf_link__destroy(invl);
> +		ret = -EINVAL;
> +	}
> +
> +end:
> +	bpf_link__destroy(link);
> +	return ret;
> +}
> +
[...]
> +
> +static int test_tc_bpf_link_info_api(struct bpf_tc_hook *hook,
> +				     struct bpf_program *prog)
> +{
> +	DECLARE_LIBBPF_OPTS(bpf_tc_link_opts, opts, .handle = 1, .priority = 1);
> +	__u32 ifindex, parent, handle, gen_flags, priority;
> +	char buf[4096], path[256], *begin;
> +	struct bpf_link_info info = {};
> +	__u32 info_len = sizeof(info);
> +	struct bpf_link *link;
> +	int ret, fdinfo;
> +
> +	link = bpf_program__attach_tc(prog, hook, &opts);
> +	if (!ASSERT_OK_PTR(link, "bpf_program__attach_tc"))
> +		return PTR_ERR(link);
> +
> +	ret = bpf_obj_get_info_by_fd(bpf_link__fd(link), &info, &info_len);
> +	if (!ASSERT_OK(ret, "bpf_obj_get_info_by_fd"))
> +		goto end;
> +
> +	ret = snprintf(path, sizeof(path), "/proc/self/fdinfo/%d",
> +		       bpf_link__fd(link));
> +	if (!ASSERT_TRUE(!ret || ret < sizeof(path), "snprintf pathname"))
> +		goto end;
> +
> +	fdinfo = open(path, O_RDONLY);
> +	if (!ASSERT_GT(fdinfo, -1, "open fdinfo"))
> +		goto end;
> +
> +	ret = read(fdinfo, buf, sizeof(buf));
> +	if (!ASSERT_GT(ret, 0, "read fdinfo")) {
> +		ret = -EINVAL;
> +		goto end_file;
> +	}
> +
> +	begin = strstr(buf, "ifindex");
> +	if (!ASSERT_OK_PTR(begin, "find beginning of fdinfo info")) {
> +		ret = -EINVAL;
> +		goto end_file;
> +	}
> +
> +	ret = sscanf(begin, "ifindex:\t%u\n"
> +			    "parent:\t%u\n"
> +			    "handle:\t%u\n"
> +			    "priority:\t%u\n"
> +			    "gen_flags:\t%u\n",
> +			    &ifindex, &parent, &handle, &priority, &gen_flags);
> +	if (!ASSERT_EQ(ret, 5, "sscanf fdinfo")) {
> +		ret = -EINVAL;
> +		goto end_file;
> +	}
> +
> +	ret = -EINVAL;
> +
> +#define X(a, b, c) (!ASSERT_EQ(a, b, #a " == " #b) || !ASSERT_EQ(b, c, #b " == " #c))
> +	if (X(info.tc.ifindex, ifindex, 1) ||
> +	    X(info.tc.parent, parent,
> +	      TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)) ||
> +	    X(info.tc.handle, handle, 1) ||
> +	    X(info.tc.gen_flags, gen_flags, TCA_CLS_FLAGS_NOT_IN_HW) ||
> +	    X(info.tc.priority, priority, 1))
> +#undef X
> +		goto end_file;

Maybe put "#undef X" after "goto end_file" is a little bit better?

> +
> +	ret = 0;
> +
> +end_file:
> +	close(fdinfo);
> +end:
> +	bpf_link__destroy(link);
> +	return ret;
> +}
> +
[...]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 3/7] net: sched: add bpf_link API for bpf classifier
  2021-06-04  6:31 ` [PATCH bpf-next v2 3/7] net: sched: add bpf_link API for bpf classifier Kumar Kartikeya Dwivedi
  2021-06-05  3:08   ` Yonghong Song
@ 2021-06-07 23:23   ` Andrii Nakryiko
  1 sibling, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2021-06-07 23:23 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jamal Hadi Salim, Vlad Buslov,
	Cong Wang, Jesper Dangaard Brouer, Networking

On Thu, Jun 3, 2021 at 11:32 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> This commit introduces a bpf_link based kernel API for creating tc
> filters and using the cls_bpf classifier. Only a subset of what netlink
> API offers is supported, things like TCA_BPF_POLICE, TCA_RATE and
> embedded actions are unsupported.
>
> The kernel API and the libbpf wrapper added in a subsequent patch are
> more opinionated and mirror the semantics of low level netlink based
> TC-BPF API, i.e. always setting direct action mode, always setting
> protocol to ETH_P_ALL, and only exposing handle and priority as the
> variables the user can control. We add an additional gen_flags parameter
> though to allow for offloading use cases. It would be trivial to extend
> the current API to support specifying other attributes in the future,
> but for now I'm sticking to how we want to push usage.
>
> The semantics around bpf_link support are as follows:
>
> A user can create a classifier attached to a filter using the bpf_link
> API, after which changing it and deleting it only happens through the
> bpf_link API. It is not possible to bind the bpf_link to an existing
> filter, and any such attempt will fail with EEXIST. Hence EEXIST can be
> returned in two cases: when an existing bpf_link owned filter exists, or
> when an existing netlink owned filter exists.
>
> Removing bpf_link owned filter from netlink returns EPERM, denoting that
> netlink is locked out from filter manipulation when bpf_link is
> involved.
>
> Whenever a filter is detached due to chain removal, or qdisc tear down,
> or net_device shutdown, the bpf_link becomes automatically detached.
>
> In this way, the netlink API and bpf_link creation path are exclusive
> and don't stomp over one another. Filters created using bpf_link API
> cannot be replaced by netlink API, and filters created by netlink API are
> never replaced by bpf_link. Netlink also cannot detach bpf_link filters.
>
> We serialize all changes dover rtnl_lock as cls_bpf API doesn't support the
> unlocked classifier API.
>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>.
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  include/linux/bpf_types.h |   3 +
>  include/net/pkt_cls.h     |  13 ++
>  include/net/sch_generic.h |   6 +-
>  include/uapi/linux/bpf.h  |  15 +++
>  kernel/bpf/syscall.c      |  10 +-
>  net/sched/cls_api.c       | 139 ++++++++++++++++++++-
>  net/sched/cls_bpf.c       | 250 +++++++++++++++++++++++++++++++++++++-
>  7 files changed, 430 insertions(+), 6 deletions(-)
>

[...]

> @@ -1447,6 +1449,12 @@ union bpf_attr {
>                                 __aligned_u64   iter_info;      /* extra bpf_iter_link_info */
>                                 __u32           iter_info_len;  /* iter_info length */
>                         };
> +                       struct { /* used by BPF_TC */
> +                               __u32 parent;
> +                               __u32 handle;
> +                               __u32 gen_flags;

There is already link_create.flags that's totally up to a specific
type of bpf_link. E.g., cgroup bpf_link doesn't accept any flags,
while xdp bpf_link uses it for passing XDP-specific flags. Is there a
need to have both gen_flags and flags for TC link?

> +                               __u16 priority;

No strong preference, but we typically try to not have unnecessary
padding in UAPI bpf_attr, so I wonder if using __u32 for this would
make sense?

> +                       } tc;
>                 };
>         } link_create;
>
> @@ -5519,6 +5527,13 @@ struct bpf_link_info {
>                 struct {
>                         __u32 ifindex;
>                 } xdp;
> +               struct {
> +                       __u32 ifindex;
> +                       __u32 parent;
> +                       __u32 handle;
> +                       __u32 gen_flags;
> +                       __u16 priority;
> +               } tc;
>         };
>  } __attribute__((aligned(8)));
>

[...]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 4/7] net: sched: add lightweight update path for cls_bpf
  2021-06-04  6:31 ` [PATCH bpf-next v2 4/7] net: sched: add lightweight update path for cls_bpf Kumar Kartikeya Dwivedi
  2021-06-04 17:54   ` Alexei Starovoitov
@ 2021-06-07 23:32   ` Andrii Nakryiko
  2021-06-10 14:14     ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 35+ messages in thread
From: Andrii Nakryiko @ 2021-06-07 23:32 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jamal Hadi Salim, Vlad Buslov,
	Cong Wang, Jesper Dangaard Brouer, Networking

On Thu, Jun 3, 2021 at 11:32 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> This is used by BPF_LINK_UPDATE to replace the attached SCHED_CLS bpf prog
> effectively changing the classifier implementation for a given filter
> owned by a bpf_link.
>
> Note that READ_ONCE suffices in this case as the ordering for loads from
> the filter is implicitly provided by the data dependency on BPF prog
> pointer.
>
> On the writer side we can just use a relaxed WRITE_ONCE store to make
> sure one or the other value is visible to a reader in cls_bpf_classify.
> Lifetime is managed using RCU so bpf_prog_put path should wait until
> readers are done for old_prog.
>
> All other parties accessing the BPF prog are under RTNL protection, so
> need no changes.
>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>.
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  net/sched/cls_bpf.c | 55 +++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 53 insertions(+), 2 deletions(-)
>
> diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
> index bf61ffbb7fd0..f23304685c48 100644
> --- a/net/sched/cls_bpf.c
> +++ b/net/sched/cls_bpf.c
> @@ -9,6 +9,7 @@
>   * (C) 2013 Daniel Borkmann <dborkman@redhat.com>
>   */
>
> +#include <linux/atomic.h>
>  #include <linux/module.h>
>  #include <linux/types.h>
>  #include <linux/skbuff.h>
> @@ -104,11 +105,11 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>                         /* It is safe to push/pull even if skb_shared() */
>                         __skb_push(skb, skb->mac_len);
>                         bpf_compute_data_pointers(skb);
> -                       filter_res = BPF_PROG_RUN(prog->filter, skb);
> +                       filter_res = BPF_PROG_RUN(READ_ONCE(prog->filter), skb);
>                         __skb_pull(skb, skb->mac_len);
>                 } else {
>                         bpf_compute_data_pointers(skb);
> -                       filter_res = BPF_PROG_RUN(prog->filter, skb);
> +                       filter_res = BPF_PROG_RUN(READ_ONCE(prog->filter), skb);
>                 }
>
>                 if (prog->exts_integrated) {
> @@ -775,6 +776,55 @@ static int cls_bpf_link_detach(struct bpf_link *link)
>         return 0;
>  }
>
> +static int cls_bpf_link_update(struct bpf_link *link, struct bpf_prog *new_prog,
> +                              struct bpf_prog *old_prog)
> +{
> +       struct cls_bpf_link *cls_link;
> +       struct cls_bpf_prog cls_prog;
> +       struct cls_bpf_prog *prog;
> +       int ret;
> +
> +       rtnl_lock();
> +
> +       cls_link = container_of(link, struct cls_bpf_link, link);
> +       if (!cls_link->prog) {
> +               ret = -ENOLINK;
> +               goto out;
> +       }
> +
> +       prog = cls_link->prog;
> +
> +       /* BPF_F_REPLACEing? */
> +       if (old_prog && prog->filter != old_prog) {
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       old_prog = prog->filter;
> +
> +       if (new_prog == old_prog) {
> +               ret = 0;

So the contract is that if update is successful, new_prog's refcount
taken by link_update() in kernel/bpf/syscall.c is transferred here. On
error, it will be bpf_prog_put() by link_update(). So here you don't
need extra refcnt, but it's also not an error, so you need to
bpf_prog_put(new_prog) explicitly to balance out refcnt. See how it's
done for XDP, for example.
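The contract described above can be modeled in plain userspace C. This is only a sketch of the fixed flow, not kernel code: prog_get()/prog_put() stand in for bpf_prog_inc()/bpf_prog_put(), and update_prog() plays the role of the driver-side cls_bpf_link_update().

```c
#include <assert.h>

/* Toy model of the bpf_link update refcount contract: the syscall
 * layer (link_update() in kernel/bpf/syscall.c) takes one reference
 * on new_prog before calling the driver's update callback. On success
 * that reference is transferred to the filter; on error the syscall
 * layer puts it back. The no-op case (new_prog == old_prog) succeeds,
 * so the driver must put the reference itself to stay balanced.
 * Nothing here is a real kernel API. */

struct prog { int refcnt; };

static void prog_get(struct prog *p) { p->refcnt++; }
static void prog_put(struct prog *p) { p->refcnt--; }

/* Driver-side update; *cur is the filter's current prog, on which the
 * filter holds one reference. */
static int update_prog(struct prog **cur, struct prog *new_prog)
{
	struct prog *old_prog = *cur;

	if (new_prog == old_prog) {
		prog_put(new_prog);	/* balance the caller's reference */
		return 0;
	}
	*cur = new_prog;		/* caller's reference transfers here */
	prog_put(old_prog);		/* drop the filter's old reference */
	return 0;
}
```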


> +               goto out;
> +       }
> +
> +       cls_prog = *prog;
> +       cls_prog.filter = new_prog;
> +
> +       ret = cls_bpf_offload(prog->tp, &cls_prog, prog, NULL);
> +       if (ret < 0)
> +               goto out;
> +
> +       WRITE_ONCE(prog->filter, new_prog);
> +
> +       bpf_prog_inc(new_prog);

and you don't need this, you already got the reference from link_update()

> +       /* release our reference */
> +       bpf_prog_put(old_prog);
> +
> +out:
> +       rtnl_unlock();
> +       return ret;
> +}
> +
>  static void __bpf_fill_link_info(struct cls_bpf_link *link,
>                                  struct bpf_link_info *info)
>  {
> @@ -859,6 +909,7 @@ static const struct bpf_link_ops cls_bpf_link_ops = {
>         .show_fdinfo = cls_bpf_link_show_fdinfo,
>  #endif
>         .fill_link_info = cls_bpf_link_fill_link_info,
> +       .update_prog = cls_bpf_link_update,
>  };
>
>  static inline char *cls_bpf_link_name(u32 prog_id, const char *name)
> --
> 2.31.1
>
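The READ_ONCE/WRITE_ONCE publication scheme used in the patch above can be sketched with C11 relaxed atomics, the closest userspace analogue to the kernel macros. All names below are illustrative, and the RCU side (delaying the old prog's free until readers are done) is deliberately not modeled.

```c
#include <assert.h>
#include <stdatomic.h>

struct filter { int (*classify)(int); };

static int drop_pkt(int pkt) { (void)pkt; return 0; }
static int pass_pkt(int pkt) { (void)pkt; return 1; }

static struct filter f_old = { .classify = drop_pkt };
static struct filter f_new = { .classify = pass_pkt };

/* The shared prog pointer. _Atomic plus relaxed loads/stores play the
 * role of READ_ONCE/WRITE_ONCE: accesses are never torn, the reader
 * sees either the old or the new pointer, and the subsequent
 * dereference is ordered by its data dependency on the loaded value. */
static _Atomic(struct filter *) active = &f_old;

/* Reader side, analogous to cls_bpf_classify(). */
static int classify(int pkt)
{
	struct filter *f = atomic_load_explicit(&active, memory_order_relaxed);

	return f->classify(pkt);
}

/* Writer side, analogous to the WRITE_ONCE in cls_bpf_link_update(). */
static void set_active(struct filter *f)
{
	atomic_store_explicit(&active, f, memory_order_relaxed);
}
```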


* Re: [PATCH bpf-next v2 6/7] libbpf: add bpf_link based TC-BPF management API
  2021-06-05  4:51     ` Kumar Kartikeya Dwivedi
@ 2021-06-07 23:37       ` Andrii Nakryiko
  0 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2021-06-07 23:37 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Alexei Starovoitov, bpf, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jamal Hadi Salim, Vlad Buslov, Cong Wang, Jesper Dangaard Brouer,
	Networking

On Fri, Jun 4, 2021 at 9:52 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Fri, Jun 04, 2021 at 11:31:57PM IST, Alexei Starovoitov wrote:
> > On Fri, Jun 04, 2021 at 12:01:15PM +0530, Kumar Kartikeya Dwivedi wrote:
> > > +/* TC bpf_link related API */
> > > +struct bpf_tc_hook;
> > > +
> > > +struct bpf_tc_link_opts {
> > > +   size_t sz;
> > > +   __u32 handle;
> > > +   __u32 priority;
> > > +   __u32 gen_flags;
> > > +   size_t :0;
> > > +};
> >
> > Did you think of a way to share struct bpf_tc_opts with above?
> > Or use bpf_tc_link_opts inside bpf_tc_opts?
>
> A couple of fields in bpf_tc_opts aren't really relevant here (prog_fd, prog_id)
> and will always be unused, so I thought it would be cleaner to give this its own
> opts struct. It still reuses the hook abstraction that was added, though.

Overall probably it will be less confusing to have one _opts struct
across both APIs, even if some of the fields are not used. Just
enforce that they are always zero (and document in a comment that for
bpf_link-based API it's not expected).

>
> > Some other way?
> > gen_flags here and flags there are the same?
>
> No, it was an oversight that I missed adding gen_flags there, I'll send a patch
> separately with some other assorted things. It's used when offloading to HW.
>
> We don't really support any other flags (e.g. BPF_TC_F_REPLACE) for this.
>
> --
> Kartikeya


* Re: [PATCH bpf-next v2 6/7] libbpf: add bpf_link based TC-BPF management API
  2021-06-04  6:31 ` [PATCH bpf-next v2 6/7] libbpf: add bpf_link based TC-BPF management API Kumar Kartikeya Dwivedi
  2021-06-04 18:01   ` Alexei Starovoitov
  2021-06-05 17:09   ` Yonghong Song
@ 2021-06-07 23:41   ` Andrii Nakryiko
  2 siblings, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2021-06-07 23:41 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jamal Hadi Salim, Vlad Buslov,
	Cong Wang, Jesper Dangaard Brouer, Networking

On Thu, Jun 3, 2021 at 11:32 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> This adds userspace TC-BPF management API based on bpf_link.
>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  tools/lib/bpf/bpf.c      |  8 +++++-
>  tools/lib/bpf/bpf.h      |  8 +++++-
>  tools/lib/bpf/libbpf.c   | 59 ++++++++++++++++++++++++++++++++++++++--
>  tools/lib/bpf/libbpf.h   | 17 ++++++++++++
>  tools/lib/bpf/libbpf.map |  1 +
>  tools/lib/bpf/netlink.c  |  5 ++--
>  tools/lib/bpf/netlink.h  |  8 ++++++
>  7 files changed, 100 insertions(+), 6 deletions(-)
>  create mode 100644 tools/lib/bpf/netlink.h
>

[...]

> +       link = calloc(1, sizeof(*link));
> +       if (!link)
> +               return ERR_PTR(-ENOMEM);
> +       link->detach = &bpf_link__detach_fd;
> +
> +       link_fd = bpf_link_create(prog_fd, OPTS_GET(hook, ifindex, 0), BPF_TC, &lopts);
> +       if (link_fd < 0) {
> +               link_fd = -errno;
> +               free(link);
> +               pr_warn("prog '%s': failed to attach tc filter: %s\n",
> +                       prog->name, libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg)));
> +               return ERR_PTR(link_fd);

like Yonghong mentioned, all these error returns have to be either
`return libbpf_err_ptr(err);` or, given it's a new API, we can do just
direct `return errno = err, NULL;`. I'd probably use libbpf_err_ptr()
for now for consistency.
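The two conventions being compared reduce to the same observable behavior for a pointer-returning API; a minimal stand-in looks like this (the real helper in libbpf is libbpf_err_ptr(), defined in libbpf's internal headers; the name below is a sketch):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the libbpf 1.0 error convention: pointer-returning
 * APIs report failure by setting errno to a positive errno value and
 * returning NULL, instead of encoding the error in the pointer
 * itself. err is expected to be negative, e.g. -ENOMEM. */
static void *err_ptr(int err)
{
	errno = -err;
	return NULL;
}
```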

> +       }
> +       link->fd = link_fd;
> +
> +       return link;
> +}
> +
>  struct bpf_link *bpf_program__attach_freplace(struct bpf_program *prog,
>                                               int target_fd,
>                                               const char *attach_func_name)

[...]

> diff --git a/tools/lib/bpf/netlink.h b/tools/lib/bpf/netlink.h
> new file mode 100644
> index 000000000000..c89133d56eb4
> --- /dev/null
> +++ b/tools/lib/bpf/netlink.h
> @@ -0,0 +1,8 @@
> +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> +#pragma once

we don't use "#pragma once" in libbpf. And I saw some kernel
discussions discouraging its use. So please just stick to the classic
#ifndef/#define/#endif combo.
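For completeness, the classic guard for the new header would look like the fragment below; the guard-macro name is assumed here, so check libbpf's neighboring headers for the exact convention.

```c
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
#ifndef __LIBBPF_NETLINK_H
#define __LIBBPF_NETLINK_H

/* ... declarations ... */

#endif /* __LIBBPF_NETLINK_H */
```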

> +
> +#include <linux/types.h>
> +#include "libbpf.h"
> +
> +int tc_get_tcm_parent(enum bpf_tc_attach_point attach_point,
> +                     __u32 *parent);
> --
> 2.31.1
>


* Re: [PATCH bpf-next v2 7/7] libbpf: add selftest for bpf_link based TC-BPF management API
  2021-06-04  6:31 ` [PATCH bpf-next v2 7/7] libbpf: add selftest for " Kumar Kartikeya Dwivedi
  2021-06-05 17:26   ` Yonghong Song
@ 2021-06-07 23:57   ` Andrii Nakryiko
  1 sibling, 0 replies; 35+ messages in thread
From: Andrii Nakryiko @ 2021-06-07 23:57 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jamal Hadi Salim, Vlad Buslov,
	Cong Wang, Jesper Dangaard Brouer, Networking

On Thu, Jun 3, 2021 at 11:32 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> This covers basic attach/detach/update, and tests interaction with the
> netlink API. It also exercises the bpf_link_info and fdinfo codepaths.
>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  .../selftests/bpf/prog_tests/tc_bpf_link.c    | 285 ++++++++++++++++++
>  1 file changed, 285 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_bpf_link.c
>

[...]

> +static int test_tc_bpf_link_netlink_interaction(struct bpf_tc_hook *hook,
> +                                               struct bpf_program *prog)
> +{
> +       DECLARE_LIBBPF_OPTS(bpf_link_update_opts, lopts,
> +                           .old_prog_fd = bpf_program__fd(prog));
> +       DECLARE_LIBBPF_OPTS(bpf_tc_link_opts, opts, .handle = 1, .priority = 1);
> +       DECLARE_LIBBPF_OPTS(bpf_tc_opts, nopts, .handle = 1, .priority = 1);
> +       DECLARE_LIBBPF_OPTS(bpf_tc_opts, dopts, .handle = 1, .priority = 1);
> +       struct bpf_link *link;
> +       int ret;
> +
> +       /* We need to test the following cases:
> +        *      1. BPF link owned filter cannot be replaced by netlink
> +        *      2. Netlink owned filter cannot be replaced by BPF link
> +        *      3. Netlink cannot do targeted delete of BPF link owned filter
> +        *      4. Filter is actually deleted (with chain cleanup)
> +        *         We actually (ab)use the kernel behavior of returning EINVAL when
> +        *         target chain doesn't exist on tc_get_tfilter (which maps to
> +        *         bpf_tc_query) here, to know if the chain was really cleaned
> +        *         up on tcf_proto destruction. Our setup is so that there is
> +        *         only one reference to the chain.
> +        *
> +        *         So on query, chain ? (filter ?: ENOENT) : EINVAL
> +        */
> +
> +       link = bpf_program__attach_tc(prog, hook, &opts);
> +       if (!ASSERT_OK_PTR(link, "bpf_program__attach_tc"))
> +               return PTR_ERR(link);
> +
> +       nopts.prog_fd = bpf_program__fd(prog);
> +       ret = bpf_tc_attach(hook, &nopts);
> +       if (!ASSERT_EQ(ret, -EEXIST, "bpf_tc_attach without replace"))
> +               goto end;
> +
> +       nopts.flags = BPF_TC_F_REPLACE;
> +       ret = bpf_tc_attach(hook, &nopts);
> +       if (!ASSERT_EQ(ret, -EPERM, "bpf_tc_attach with replace"))
> +               goto end;
> +
> +       ret = bpf_tc_detach(hook, &dopts);
> +       if (!ASSERT_EQ(ret, -EPERM, "bpf_tc_detach"))
> +               goto end;
> +
> +       lopts.flags = BPF_F_REPLACE;
> +       ret = bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog),
> +                             &lopts);
> +       ASSERT_OK(ret, "bpf_link_update");
> +       ret = ret < 0 ? -errno : ret;

all selftests run in libbpf 1.0 mode, so you get actual error
directly, so no need to deal with -errno here.

> +
> +end:
> +       bpf_link__destroy(link);
> +       if (!ret && !ASSERT_EQ(bpf_tc_query(hook, &dopts), -EINVAL,
> +                              "chain empty delete"))
> +               ret = -EINVAL;
> +       return ret;
> +}
> +
> +static int test_tc_bpf_link_update_ways(struct bpf_tc_hook *hook,
> +                                       struct bpf_program *prog)
> +{
> +       DECLARE_LIBBPF_OPTS(bpf_tc_link_opts, opts, .handle = 1, .priority = 1);
> +       DECLARE_LIBBPF_OPTS(bpf_link_update_opts, uopts, 0);
> +       struct test_tc_bpf *skel;
> +       struct bpf_link *link;
> +       int ret;
> +
> +       skel = test_tc_bpf__open_and_load();
> +       if (!ASSERT_OK_PTR(skel, "test_tc_bpf__open_and_load"))
> +               return PTR_ERR(skel);
> +
> +       link = bpf_program__attach_tc(prog, hook, &opts);
> +       if (!ASSERT_OK_PTR(link, "bpf_program__attach_tc")) {
> +               ret = PTR_ERR(link);
> +               goto end;
> +       }
> +
> +       ret = bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog),
> +                             &uopts);
> +       if (!ASSERT_OK(ret, "bpf_link_update no old prog"))
> +               goto end;
> +
> +       uopts.old_prog_fd = bpf_program__fd(prog);
> +       ret = bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog),
> +                             &uopts);

please keep all such calls single-line, they aren't excessively long at all

> +       if (!ASSERT_TRUE(ret < 0 && errno == EINVAL,
> +                        "bpf_link_update with old prog without BPF_F_REPLACE")) {

same as above, ret should already be -EINVAL, so just check directly

> +               ret = -EINVAL;
> +               goto end;
> +       }
> +
> +       uopts.flags = BPF_F_REPLACE;
> +       ret = bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog),
> +                             &uopts);
> +       if (!ASSERT_OK(ret, "bpf_link_update with old prog with BPF_F_REPLACE"))
> +               goto end;
> +
> +       uopts.old_prog_fd = bpf_program__fd(skel->progs.cls);
> +       ret = bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog),
> +                             &uopts);
> +       if (!ASSERT_TRUE(ret < 0 && errno == EINVAL,
> +                        "bpf_link_update with wrong old prog")) {

and here

> +               ret = -EINVAL;
> +               goto end;
> +       }
> +       ret = 0;
> +
> +end:
> +       test_tc_bpf__destroy(skel);
> +       return ret;
> +}
> +
> +static int test_tc_bpf_link_info_api(struct bpf_tc_hook *hook,
> +                                    struct bpf_program *prog)
> +{
> +       DECLARE_LIBBPF_OPTS(bpf_tc_link_opts, opts, .handle = 1, .priority = 1);
> +       __u32 ifindex, parent, handle, gen_flags, priority;
> +       char buf[4096], path[256], *begin;
> +       struct bpf_link_info info = {};
> +       __u32 info_len = sizeof(info);
> +       struct bpf_link *link;
> +       int ret, fdinfo;
> +
> +       link = bpf_program__attach_tc(prog, hook, &opts);
> +       if (!ASSERT_OK_PTR(link, "bpf_program__attach_tc"))
> +               return PTR_ERR(link);
> +
> +       ret = bpf_obj_get_info_by_fd(bpf_link__fd(link), &info, &info_len);
> +       if (!ASSERT_OK(ret, "bpf_obj_get_info_by_fd"))
> +               goto end;
> +
> +       ret = snprintf(path, sizeof(path), "/proc/self/fdinfo/%d",
> +                      bpf_link__fd(link));
> +       if (!ASSERT_TRUE(!ret || ret < sizeof(path), "snprintf pathname"))
> +               goto end;

ASSERT_TRUE is very generic, it's better to do ASSERT_LT(ret,
sizeof(path), "snprintf") here

not sure why `!ret` is allowed?..

> +
> +       fdinfo = open(path, O_RDONLY);
> +       if (!ASSERT_GT(fdinfo, -1, "open fdinfo"))
> +               goto end;
> +
> +       ret = read(fdinfo, buf, sizeof(buf));
> +       if (!ASSERT_GT(ret, 0, "read fdinfo")) {
> +               ret = -EINVAL;
> +               goto end_file;
> +       }
> +
> +       begin = strstr(buf, "ifindex");
> +       if (!ASSERT_OK_PTR(begin, "find beginning of fdinfo info")) {
> +               ret = -EINVAL;
> +               goto end_file;
> +       }
> +
> +       ret = sscanf(begin, "ifindex:\t%u\n"
> +                           "parent:\t%u\n"
> +                           "handle:\t%u\n"
> +                           "priority:\t%u\n"
> +                           "gen_flags:\t%u\n",
> +                           &ifindex, &parent, &handle, &priority, &gen_flags);
> +       if (!ASSERT_EQ(ret, 5, "sscanf fdinfo")) {
> +               ret = -EINVAL;
> +               goto end_file;
> +       }
> +
> +       ret = -EINVAL;
> +
> +#define X(a, b, c) (!ASSERT_EQ(a, b, #a " == " #b) || !ASSERT_EQ(b, c, #b " == " #c))
> +       if (X(info.tc.ifindex, ifindex, 1) ||
> +           X(info.tc.parent, parent,
> +             TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)) ||
> +           X(info.tc.handle, handle, 1) ||
> +           X(info.tc.gen_flags, gen_flags, TCA_CLS_FLAGS_NOT_IN_HW) ||
> +           X(info.tc.priority, priority, 1))
> +#undef X

This seems to be a bit too convoluted and over-engineered. Just
validate all the equalities unconditionally.

ASSERT_EQ(info.tc.ifindex, 1, "info.tc.ifindex");
ASSERT_EQ(ifindex, 1, "fdinfo.ifindex");
ASSERT_EQ(info.tc.parent, TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS),
"info.tc.parent");

and so on.

Then, you don't really need to propagate errors from
test_tc_bpf_link_info_api, because each ASSERT_EQ() marks the test (or
subtest) as failed, so you don't have to do that below in
test_tc_bpf_link.

> +               goto end_file;
> +
> +       ret = 0;
> +
> +end_file:
> +       close(fdinfo);
> +end:
> +       bpf_link__destroy(link);
> +       return ret;
> +}
> +
> +void test_tc_bpf_link(void)
> +{
> +       DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = LO_IFINDEX,
> +                           .attach_point = BPF_TC_INGRESS);
> +       struct test_tc_bpf *skel = NULL;
> +       bool hook_created = false;
> +       int ret;
> +
> +       skel = test_tc_bpf__open_and_load();
> +       if (!ASSERT_OK_PTR(skel, "test_tc_bpf__open_and_load"))
> +               return;
> +
> +       ret = bpf_tc_hook_create(&hook);
> +       if (ret == 0)
> +               hook_created = true;
> +
> +       ret = ret == -EEXIST ? 0 : ret;
> +       if (!ASSERT_OK(ret, "bpf_tc_hook_create(BPF_TC_INGRESS)"))
> +               goto end;
> +
> +       ret = test_tc_bpf_link_basic(&hook, skel->progs.cls);
> +       if (!ASSERT_OK(ret, "test_tc_bpf_link_basic"))
> +               goto end;
> +
> +       bpf_tc_hook_destroy(&hook);
> +
> +       hook.attach_point = BPF_TC_EGRESS;
> +       ret = test_tc_bpf_link_basic(&hook, skel->progs.cls);
> +       if (!ASSERT_OK(ret, "test_tc_bpf_link_basic"))
> +               goto end;
> +
> +       bpf_tc_hook_destroy(&hook);
> +
> +       ret = test_tc_bpf_link_netlink_interaction(&hook, skel->progs.cls);
> +       if (!ASSERT_OK(ret, "test_tc_bpf_link_netlink_interaction"))
> +               goto end;
> +
> +       bpf_tc_hook_destroy(&hook);
> +
> +       ret = test_tc_bpf_link_update_ways(&hook, skel->progs.cls);
> +       if (!ASSERT_OK(ret, "test_tc_bpf_link_update_ways"))
> +               goto end;
> +
> +       bpf_tc_hook_destroy(&hook);
> +
> +       ret = test_tc_bpf_link_info_api(&hook, skel->progs.cls);
> +       if (!ASSERT_OK(ret, "test_tc_bpf_link_info_api"))

I was talking about this above, it's completely unnecessary and
redundant. Just complicates everything.

> +               goto end;
> +
> +end:
> +       if (hook_created) {
> +               hook.attach_point = BPF_TC_INGRESS | BPF_TC_EGRESS;
> +               bpf_tc_hook_destroy(&hook);
> +       }
> +       test_tc_bpf__destroy(skel);
> +}
> --
> 2.31.1
>
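The query contract that the selftest comment above encodes as `chain ? (filter ?: ENOENT) : EINVAL` can be written out as a tiny model (purely illustrative; this is not a kernel interface):

```c
#include <assert.h>
#include <errno.h>

/* Toy encoding of the tc_get_tfilter behavior the selftest relies on:
 * querying a chain that no longer exists fails with -EINVAL, a live
 * chain without the filter fails with -ENOENT, and a live chain with
 * the filter succeeds. */
static int query(int chain_exists, int filter_exists)
{
	if (!chain_exists)
		return -EINVAL;		/* chain was cleaned up */
	return filter_exists ? 0 : -ENOENT;
}
```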


* Re: [PATCH bpf-next v2 4/7] net: sched: add lightweight update path for cls_bpf
  2021-06-07 23:32   ` Andrii Nakryiko
@ 2021-06-10 14:14     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 35+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2021-06-10 14:14 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Jamal Hadi Salim, Vlad Buslov,
	Cong Wang, Jesper Dangaard Brouer, Networking

On Tue, Jun 08, 2021 at 05:02:04AM IST, Andrii Nakryiko wrote:
> On Thu, Jun 3, 2021 at 11:32 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > This is used by BPF_LINK_UPDATE to replace the attached SCHED_CLS bpf prog,
> > effectively changing the classifier implementation for a given filter
> > owned by a bpf_link.
> >
> > Note that READ_ONCE suffices in this case as the ordering for loads from
> > the filter is implicitly provided by the data dependency on the BPF prog
> > pointer.
> >
> > On the writer side we can just use a relaxed WRITE_ONCE store to make
> > sure one or the other value is visible to a reader in cls_bpf_classify.
> > Lifetime is managed using RCU so bpf_prog_put path should wait until
> > readers are done for old_prog.
> >
> > All other parties accessing the BPF prog are under RTNL protection, so
> > need no changes.
> >
> > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  net/sched/cls_bpf.c | 55 +++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 53 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
> > index bf61ffbb7fd0..f23304685c48 100644
> > --- a/net/sched/cls_bpf.c
> > +++ b/net/sched/cls_bpf.c
> > @@ -9,6 +9,7 @@
> >   * (C) 2013 Daniel Borkmann <dborkman@redhat.com>
> >   */
> >
> > +#include <linux/atomic.h>
> >  #include <linux/module.h>
> >  #include <linux/types.h>
> >  #include <linux/skbuff.h>
> > @@ -104,11 +105,11 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
> >                         /* It is safe to push/pull even if skb_shared() */
> >                         __skb_push(skb, skb->mac_len);
> >                         bpf_compute_data_pointers(skb);
> > -                       filter_res = BPF_PROG_RUN(prog->filter, skb);
> > +                       filter_res = BPF_PROG_RUN(READ_ONCE(prog->filter), skb);
> >                         __skb_pull(skb, skb->mac_len);
> >                 } else {
> >                         bpf_compute_data_pointers(skb);
> > -                       filter_res = BPF_PROG_RUN(prog->filter, skb);
> > +                       filter_res = BPF_PROG_RUN(READ_ONCE(prog->filter), skb);
> >                 }
> >
> >                 if (prog->exts_integrated) {
> > @@ -775,6 +776,55 @@ static int cls_bpf_link_detach(struct bpf_link *link)
> >         return 0;
> >  }
> >
> > +static int cls_bpf_link_update(struct bpf_link *link, struct bpf_prog *new_prog,
> > +                              struct bpf_prog *old_prog)
> > +{
> > +       struct cls_bpf_link *cls_link;
> > +       struct cls_bpf_prog cls_prog;
> > +       struct cls_bpf_prog *prog;
> > +       int ret;
> > +
> > +       rtnl_lock();
> > +
> > +       cls_link = container_of(link, struct cls_bpf_link, link);
> > +       if (!cls_link->prog) {
> > +               ret = -ENOLINK;
> > +               goto out;
> > +       }
> > +
> > +       prog = cls_link->prog;
> > +
> > +       /* BPF_F_REPLACEing? */
> > +       if (old_prog && prog->filter != old_prog) {
> > +               ret = -EINVAL;
> > +               goto out;
> > +       }
> > +
> > +       old_prog = prog->filter;
> > +
> > +       if (new_prog == old_prog) {
> > +               ret = 0;
>
> So the contract is that if update is successful, new_prog's refcount
> taken by link_update() in kernel/bpf/syscall.c is transferred here. On
> error, it will be bpf_prog_put() by link_update(). So here you don't
> need extra refcnt, but it's also not an error, so you need to
> bpf_prog_put(new_prog) explicitly to balance out refcnt. See how it's
> done for XDP, for example.
>

Yes, thanks for spotting this.

>
> > +               goto out;
> > +       }
> > +
> > +       cls_prog = *prog;
> > +       cls_prog.filter = new_prog;
> > +
> > +       ret = cls_bpf_offload(prog->tp, &cls_prog, prog, NULL);
> > +       if (ret < 0)
> > +               goto out;
> > +
> > +       WRITE_ONCE(prog->filter, new_prog);
> > +
> > +       bpf_prog_inc(new_prog);
>
> and you don't need this, you already got the reference from link_update()
>

So the reason I still keep an extra refcount is because the existing code on the
netlink side assumes that. Even though the link itself holds a refcount for us,
the actual freeing of cls_bpf_prog may happen independent of bpf_link.

I'll add a comment for this.

> > +       /* release our reference */
> > +       bpf_prog_put(old_prog);
> > +
> > +out:
> > +       rtnl_unlock();
> > +       return ret;
> > +}
> > +
> >  static void __bpf_fill_link_info(struct cls_bpf_link *link,
> >                                  struct bpf_link_info *info)
> >  {
> > @@ -859,6 +909,7 @@ static const struct bpf_link_ops cls_bpf_link_ops = {
> >         .show_fdinfo = cls_bpf_link_show_fdinfo,
> >  #endif
> >         .fill_link_info = cls_bpf_link_fill_link_info,
> > +       .update_prog = cls_bpf_link_update,
> >  };
> >
> >  static inline char *cls_bpf_link_name(u32 prog_id, const char *name)
> > --
> > 2.31.1
> >

--
Kartikeya


* Re: [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API
  2021-06-04  6:31 [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API Kumar Kartikeya Dwivedi
                   ` (6 preceding siblings ...)
  2021-06-04  6:31 ` [PATCH bpf-next v2 7/7] libbpf: add selftest for " Kumar Kartikeya Dwivedi
@ 2022-06-10  0:24 ` Joanne Koong
  2022-06-10 12:58   ` Kumar Kartikeya Dwivedi
  7 siblings, 1 reply; 35+ messages in thread
From: Joanne Koong @ 2022-06-10  0:24 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jamal Hadi Salim, Vlad Buslov, Cong Wang, Jesper Dangaard Brouer,
	Toke Høiland-Jørgensen, netdev

On Thu, Jun 3, 2021 at 11:31 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> This is the second (non-RFC) version.
>
> This adds a bpf_link path to create TC filters tied to cls_bpf classifier, and
> introduces fd based ownership for such TC filters. Netlink cannot delete or
> replace such filters, but the bpf_link is severed on indirect destruction of the
> filter (backing qdisc being deleted, or chain being flushed, etc.). To ensure
> that filters remain attached beyond process lifetime, the usual bpf_link fd
> pinning approach can be used.
>
> The individual patches contain more details and comments, but the overall kernel
> API and libbpf helper mirrors the semantics of the netlink based TC-BPF API
> merged recently. This means that we start by always setting direct action mode,
> protocol to ETH_P_ALL, chain_index as 0, etc. If there is a need for more
> options in the future, they can be easily exposed through the bpf_link API.
>
> Patch 1 refactors cls_bpf change function to extract two helpers that will be
> reused in bpf_link creation.
>
> Patch 2 exports some bpf_link management functions to modules. This is needed
> because our bpf_link object is tied to the cls_bpf_prog object. Tying it to
> tcf_proto would be weird, because the update path has to replace offloaded bpf
> prog, which happens using internal cls_bpf helpers, and would in general be more
> code to abstract over an operation that is unlikely to be implemented for other
> filter types.
>
> Patch 3 adds the main bpf_link API. A function in cls_api takes care of
> obtaining block reference, creating the filter object, and then calls the
> bpf_link_change tcf_proto op (only supported by cls_bpf) that returns a fd after
> setting up the internal structures. An optimization is made to not keep around
> resources for extended actions, which is explained in a code comment as it wasn't
> immediately obvious.
>
> Patch 4 adds an update path for bpf_link. Since bpf_link_update only supports
> replacing the bpf_prog, we can skip tc filter's change path by reusing the
> filter object but swapping its bpf_prog. This takes care of replacing the
> offloaded prog as well (if that fails, update is aborted). So far however,
> tcf_classify could do normal load (possibly torn) as the cls_bpf_prog->filter
> would never be modified concurrently. This is no longer true, and to not
> penalize the classify hot path, we also cannot impose serialization around
> its load. Hence the load is changed to READ_ONCE, so that the pointer value is
> always consistent. Due to invocation in a RCU critical section, the lifetime of
> the prog is guaranteed for the duration of the call.
>
> Patch 5, 6 take care of updating the userspace bits and add a bpf_link returning
> function to libbpf.
>
> Patch 7 adds a selftest that exercises all possible problematic interactions
> that I could think of.
>
> Design:
>
> This is where in the object hierarchy our bpf_link object is attached.
>
>                                                                             ┌─────┐
>                                                                             │     │
>                                                                             │ BPF │
>                                                                             program
>                                                                             │     │
>                                                                             └──▲──┘
>                                                       ┌───────┐                │
>                                                       │       │         ┌──────┴───────┐
>                                                       │  mod  ├─────────► cls_bpf_prog │
> ┌────────────────┐                                    │cls_bpf│         └────┬───▲─────┘
> │    tcf_block   │                                    │       │              │   │
> └────────┬───────┘                                    └───▲───┘              │   │
>          │          ┌─────────────┐                       │                ┌─▼───┴──┐
>          └──────────►  tcf_chain  │                       │                │bpf_link│
>                     └───────┬─────┘                       │                └────────┘
>                             │          ┌─────────────┐    │
>                             └──────────►  tcf_proto  ├────┘
>                                        └─────────────┘
>
> The bpf_link is detached on destruction of the cls_bpf_prog.  Doing it this way
> allows us to implement update in a lightweight manner without having to recreate
> a new filter, where we can just replace the BPF prog attached to cls_bpf_prog.
>
> The other way to do it would be to link the bpf_link to tcf_proto, there are
> numerous downsides to this:
>
> 1. All filters have to embed the pointer even though they won't be using it when
> cls_bpf is compiled in.
> 2. This probably won't make sense to be extended to other filter types anyway.
> 3. We aren't able to optimize the update case without adding another bpf_link
> specific update operation to tcf_proto ops.
>
> The downside with tying this to the module is having to export bpf_link
> management functions and introducing a tcf_proto op. Hopefully the cost of
> another operation func pointer is not big enough (as there is only one ops
> struct per module).
>
Hi Kumar,

Do you have any plans / bandwidth to land this feature upstream? If
so, do you have a tentative estimation for when you'll be able to work
on this? And if not, are you okay with someone else working on this to
get it merged in?

The reason I'm asking is that there are a few networking teams
within Meta that have been requesting this feature :)

Thanks,
Joanne

> Changelog:
> ----------
> v1 (RFC) -> v2
> v1: https://lore.kernel.org/bpf/20210528195946.2375109-1-memxor@gmail.com
>
>  * Avoid overwriting other members of union in bpf_attr (Andrii)
>  * Set link to NULL after bpf_link_cleanup to avoid double free (Andrii)
>  * Use __be16 to store the result of htons (Kernel Test Robot)
>  * Make assignment of tcf_exts::net conditional on CONFIG_NET_CLS_ACT
>    (Kernel Test Robot)
>
> Kumar Kartikeya Dwivedi (7):
>   net: sched: refactor cls_bpf creation code
>   bpf: export bpf_link functions for modules
>   net: sched: add bpf_link API for bpf classifier
>   net: sched: add lightweight update path for cls_bpf
>   tools: bpf.h: sync with kernel sources
>   libbpf: add bpf_link based TC-BPF management API
>   libbpf: add selftest for bpf_link based TC-BPF management API
>
>  include/linux/bpf_types.h                     |   3 +
>  include/net/pkt_cls.h                         |  13 +
>  include/net/sch_generic.h                     |   6 +-
>  include/uapi/linux/bpf.h                      |  15 +
>  kernel/bpf/syscall.c                          |  14 +-
>  net/sched/cls_api.c                           | 139 ++++++-
>  net/sched/cls_bpf.c                           | 389 ++++++++++++++++--
>  tools/include/uapi/linux/bpf.h                |  15 +
>  tools/lib/bpf/bpf.c                           |   8 +-
>  tools/lib/bpf/bpf.h                           |   8 +-
>  tools/lib/bpf/libbpf.c                        |  59 ++-
>  tools/lib/bpf/libbpf.h                        |  17 +
>  tools/lib/bpf/libbpf.map                      |   1 +
>  tools/lib/bpf/netlink.c                       |   5 +-
>  tools/lib/bpf/netlink.h                       |   8 +
>  .../selftests/bpf/prog_tests/tc_bpf_link.c    | 285 +++++++++++++
>  16 files changed, 940 insertions(+), 45 deletions(-)
>  create mode 100644 tools/lib/bpf/netlink.h
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_bpf_link.c
>
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API
  2022-06-10  0:24 ` [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API Joanne Koong
@ 2022-06-10 12:58   ` Kumar Kartikeya Dwivedi
  2022-06-10 17:23     ` Joanne Koong
  0 siblings, 1 reply; 35+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-06-10 12:58 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jamal Hadi Salim, Vlad Buslov, Cong Wang, Jesper Dangaard Brouer,
	Toke Høiland-Jørgensen, netdev

On Fri, Jun 10, 2022 at 05:54:27AM IST, Joanne Koong wrote:
> On Thu, Jun 3, 2021 at 11:31 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > This is the second (non-RFC) version.
> >
> > This adds a bpf_link path to create TC filters tied to the cls_bpf classifier,
> > and introduces fd based ownership for such TC filters. Netlink cannot delete or
> > replace such filters, but the bpf_link is severed on indirect destruction of the
> > filter (backing qdisc being deleted, or chain being flushed, etc.). To ensure
> > that filters remain attached beyond process lifetime, the usual bpf_link fd
> > pinning approach can be used.
> >
> > The individual patches contain more details and comments, but the overall kernel
> > API and libbpf helper mirror the semantics of the recently merged netlink based
> > TC-BPF API. This means that we start by always setting direct action mode,
> > protocol to ETH_P_ALL, chain_index to 0, etc. If more options are needed, they
> > can be easily exposed through the bpf_link API in the future.
> >
> > Patch 1 refactors the cls_bpf change function to extract two helpers that will
> > be reused in bpf_link creation.
> >
> > Patch 2 exports some bpf_link management functions to modules. This is needed
> > because our bpf_link object is tied to the cls_bpf_prog object. Tying it to
> > tcf_proto would be weird, because the update path has to replace offloaded bpf
> > prog, which happens using internal cls_bpf helpers, and would in general be more
> > code to abstract over an operation that is unlikely to be implemented for other
> > filter types.
> >
> > Patch 3 adds the main bpf_link API. A function in cls_api takes care of
> > obtaining the block reference and creating the filter object, and then calls the
> > bpf_link_change tcf_proto op (only supported by cls_bpf), which returns an fd
> > after setting up the internal structures. An optimization is made to not keep
> > around resources for extended actions; this is explained in a code comment as it
> > wasn't immediately obvious.
> >
> > Patch 4 adds an update path for bpf_link. Since bpf_link_update only supports
> > replacing the bpf_prog, we can skip the tc filter's change path by reusing the
> > filter object and swapping its bpf_prog. This takes care of replacing the
> > offloaded prog as well (if that fails, the update is aborted). So far, however,
> > tcf_classify could do a plain (possibly torn) load, as cls_bpf_prog->filter
> > was never modified concurrently. This is no longer true, and to avoid
> > penalizing the classify hot path, we also cannot impose serialization around
> > its load. Hence the load is changed to READ_ONCE, so that the pointer value is
> > always consistent. Since the call happens in an RCU critical section, the
> > lifetime of the prog is guaranteed for the duration of the call.
> >
> > Patches 5 and 6 update the userspace bits and add a bpf_link-returning
> > function to libbpf.
> >
> > Patch 7 adds a selftest that exercises all possible problematic interactions
> > that I could think of.
> >
> > Design:
> >
> > This is where in the object hierarchy our bpf_link object is attached.
> >
> >                                                                             ┌─────┐
> >                                                                             │     │
> >                                                                             │ BPF │
> >                                                                             program
> >                                                                             │     │
> >                                                                             └──▲──┘
> >                                                       ┌───────┐                │
> >                                                       │       │         ┌──────┴───────┐
> >                                                       │  mod  ├─────────► cls_bpf_prog │
> > ┌────────────────┐                                    │cls_bpf│         └────┬───▲─────┘
> > │    tcf_block   │                                    │       │              │   │
> > └────────┬───────┘                                    └───▲───┘              │   │
> >          │          ┌─────────────┐                       │                ┌─▼───┴──┐
> >          └──────────►  tcf_chain  │                       │                │bpf_link│
> >                     └───────┬─────┘                       │                └────────┘
> >                             │          ┌─────────────┐    │
> >                             └──────────►  tcf_proto  ├────┘
> >                                        └─────────────┘
> >
> > The bpf_link is detached on destruction of the cls_bpf_prog. Doing it this way
> > allows us to implement update in a lightweight manner without having to create a
> > new filter: we can simply replace the BPF prog attached to the cls_bpf_prog.
> >
> > The other way to do it would be to link the bpf_link to tcf_proto, but that has
> > several downsides:
> >
> > 1. Every filter type would have to embed the pointer even though only cls_bpf
> > would ever use it.
> > 2. This mechanism probably won't make sense to extend to other filter types
> > anyway.
> > 3. We would be unable to optimize the update case without adding another
> > bpf_link-specific update operation to the tcf_proto ops.
> >
> > The downside of tying this to the module is having to export the bpf_link
> > management functions and introduce a tcf_proto op. Hopefully the cost of
> > another operation func pointer is acceptable (there is only one ops
> > struct per module).
> >
> Hi Kumar,
>
> Do you have any plans / bandwidth to land this feature upstream? If
> so, do you have a tentative estimation for when you'll be able to work
> on this? And if not, are you okay with someone else working on this to
> get it merged in?
>

I can have a look at resurrecting it later this month, if you're ok with waiting
until then. Otherwise, if someone else wants to pick this up before that, that's
fine by me; just let me know so we avoid duplicated effort. Note that the
approach in v2 is dead (unlikely to be accepted by the TC maintainers), so we'd
have to implement it the way Daniel suggested in [0].

  [0]: https://lore.kernel.org/bpf/15cd0a9c-95a1-9766-fca1-4bf9d09e4100@iogearbox.net

> The reason I'm asking is because there are a few networking teams
> within Meta that have been requesting this feature :)
>
> Thanks,
> Joanne
>
> > Changelog:
> > ----------
> > v1 (RFC) -> v2
> > v1: https://lore.kernel.org/bpf/20210528195946.2375109-1-memxor@gmail.com
> >
> >  * Avoid overwriting other members of union in bpf_attr (Andrii)
> >  * Set link to NULL after bpf_link_cleanup to avoid double free (Andrii)
> >  * Use __be16 to store the result of htons (Kernel Test Robot)
> >  * Make assignment of tcf_exts::net conditional on CONFIG_NET_CLS_ACT
> >    (Kernel Test Robot)
> >
> > Kumar Kartikeya Dwivedi (7):
> >   net: sched: refactor cls_bpf creation code
> >   bpf: export bpf_link functions for modules
> >   net: sched: add bpf_link API for bpf classifier
> >   net: sched: add lightweight update path for cls_bpf
> >   tools: bpf.h: sync with kernel sources
> >   libbpf: add bpf_link based TC-BPF management API
> >   libbpf: add selftest for bpf_link based TC-BPF management API
> >
> >  include/linux/bpf_types.h                     |   3 +
> >  include/net/pkt_cls.h                         |  13 +
> >  include/net/sch_generic.h                     |   6 +-
> >  include/uapi/linux/bpf.h                      |  15 +
> >  kernel/bpf/syscall.c                          |  14 +-
> >  net/sched/cls_api.c                           | 139 ++++++-
> >  net/sched/cls_bpf.c                           | 389 ++++++++++++++++--
> >  tools/include/uapi/linux/bpf.h                |  15 +
> >  tools/lib/bpf/bpf.c                           |   8 +-
> >  tools/lib/bpf/bpf.h                           |   8 +-
> >  tools/lib/bpf/libbpf.c                        |  59 ++-
> >  tools/lib/bpf/libbpf.h                        |  17 +
> >  tools/lib/bpf/libbpf.map                      |   1 +
> >  tools/lib/bpf/netlink.c                       |   5 +-
> >  tools/lib/bpf/netlink.h                       |   8 +
> >  .../selftests/bpf/prog_tests/tc_bpf_link.c    | 285 +++++++++++++
> >  16 files changed, 940 insertions(+), 45 deletions(-)
> >  create mode 100644 tools/lib/bpf/netlink.h
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_bpf_link.c
> >
> > --
> > 2.31.1
> >

--
Kartikeya

* Re: [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API
  2022-06-10 12:58   ` Kumar Kartikeya Dwivedi
@ 2022-06-10 17:23     ` Joanne Koong
  2022-06-10 19:07       ` Joanne Koong
  0 siblings, 1 reply; 35+ messages in thread
From: Joanne Koong @ 2022-06-10 17:23 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jamal Hadi Salim, Vlad Buslov, Cong Wang, Jesper Dangaard Brouer,
	Toke Høiland-Jørgensen, netdev

On Fri, Jun 10, 2022 at 5:58 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Fri, Jun 10, 2022 at 05:54:27AM IST, Joanne Koong wrote:
> > On Thu, Jun 3, 2021 at 11:31 PM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > [...]
> > >
> > Hi Kumar,
> >
> > Do you have any plans / bandwidth to land this feature upstream? If
> > so, do you have a tentative estimation for when you'll be able to work
> > on this? And if not, are you okay with someone else working on this to
> > get it merged in?
> >
>
> I can have a look at resurrecting it later this month, if you're ok with waiting
> until then, otherwise if someone else wants to pick this up before that it's
> fine by me, just let me know so we avoid duplicated effort. Note that the
> approach in v2 is dead/unlikely to get accepted by the TC maintainers, so we'd
> have to implement the way Daniel mentioned in [0].

Sounds great! We'll wait and check back in with you later this month.

>
>   [0]: https://lore.kernel.org/bpf/15cd0a9c-95a1-9766-fca1-4bf9d09e4100@iogearbox.net
>
> > The reason I'm asking is because there are a few networking teams
> > within Meta that have been requesting this feature :)
> >
> > Thanks,
> > Joanne
> >
> > > [...]
>
> --
> Kartikeya

* Re: [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API
  2022-06-10 17:23     ` Joanne Koong
@ 2022-06-10 19:07       ` Joanne Koong
  2022-06-10 19:34         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 35+ messages in thread
From: Joanne Koong @ 2022-06-10 19:07 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jamal Hadi Salim, Vlad Buslov, Cong Wang, Jesper Dangaard Brouer,
	Toke Høiland-Jørgensen, netdev

On Fri, Jun 10, 2022 at 10:23 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Fri, Jun 10, 2022 at 5:58 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Fri, Jun 10, 2022 at 05:54:27AM IST, Joanne Koong wrote:
> > > On Thu, Jun 3, 2021 at 11:31 PM Kumar Kartikeya Dwivedi
> > > <memxor@gmail.com> wrote:
> > > >
> > > > [...]
> > > >
> > > Hi Kumar,
> > >
> > > Do you have any plans / bandwidth to land this feature upstream? If
> > > so, do you have a tentative estimation for when you'll be able to work
> > > on this? And if not, are you okay with someone else working on this to
> > > get it merged in?
> > >
> >
> > I can have a look at resurrecting it later this month, if you're ok with waiting
> > until then, otherwise if someone else wants to pick this up before that it's
> > fine by me, just let me know so we avoid duplicated effort. Note that the
> > approach in v2 is dead/unlikely to get accepted by the TC maintainers, so we'd
> > have to implement the way Daniel mentioned in [0].
>
> Sounds great! We'll wait and check back in with you later this month.
>
After reading the linked thread (which I should have done before
submitting my previous reply :)), if I'm understanding it correctly,
it seems that the work needed for tc bpf_link will go in a new
direction that's not based on the code in this v2 patchset. I'm
interested in learning more about bpf_link and tc, so I can pick this
up. But if this is something you wanted to work on, please don't
hesitate to let me know; I can find some other bpf_link thing to work
on instead.


> >
> >   [0]: https://lore.kernel.org/bpf/15cd0a9c-95a1-9766-fca1-4bf9d09e4100@iogearbox.net
> >
> > > The reason I'm asking is because there are a few networking teams
> > > within Meta that have been requesting this feature :)
> > >
> > > Thanks,
> > > Joanne
> > >
> > > > [...]
> >
> > --
> > Kartikeya

* Re: [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API
  2022-06-10 19:07       ` Joanne Koong
@ 2022-06-10 19:34         ` Kumar Kartikeya Dwivedi
  2022-06-10 20:04           ` Daniel Borkmann
  2022-06-10 20:16           ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 35+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-06-10 19:34 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jamal Hadi Salim, Vlad Buslov, Cong Wang, Jesper Dangaard Brouer,
	Toke Høiland-Jørgensen, netdev

On Sat, Jun 11, 2022 at 12:37:50AM IST, Joanne Koong wrote:
> On Fri, Jun 10, 2022 at 10:23 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Fri, Jun 10, 2022 at 5:58 AM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > On Fri, Jun 10, 2022 at 05:54:27AM IST, Joanne Koong wrote:
> > > > On Thu, Jun 3, 2021 at 11:31 PM Kumar Kartikeya Dwivedi
> > > > <memxor@gmail.com> wrote:
> > > > >
> > > > > This is the second (non-RFC) version.
> > > > >
> > > > > This adds a bpf_link path to create TC filters tied to cls_bpf classifier, and
> > > > > introduces fd based ownership for such TC filters. Netlink cannot delete or
> > > > > replace such filters, but the bpf_link is severed on indirect destruction of the
> > > > > filter (backing qdisc being deleted, or chain being flushed, etc.). To ensure
> > > > > that filters remain attached beyond process lifetime, the usual bpf_link fd
> > > > > pinning approach can be used.
> > > > >
> > > > > The individual patches contain more details and comments, but the overall kernel
> > > > > API and libbpf helper mirrors the semantics of the netlink based TC-BPF API
> > > > > merged recently. This means that we start by always setting direct action mode,
> > > > > protocol to ETH_P_ALL, chain_index as 0, etc. If there is a need for more
> > > > > options in the future, they can be easily exposed through the bpf_link API in
> > > > > the future.
> > > > >
> > > > > Patch 1 refactors cls_bpf change function to extract two helpers that will be
> > > > > reused in bpf_link creation.
> > > > >
> > > > > Patch 2 exports some bpf_link management functions to modules. This is needed
> > > > > because our bpf_link object is tied to the cls_bpf_prog object. Tying it to
> > > > > tcf_proto would be weird, because the update path has to replace offloaded bpf
> > > > > prog, which happens using internal cls_bpf helpers, and would in general be more
> > > > > code to abstract over an operation that is unlikely to be implemented for other
> > > > > filter types.
> > > > >
> > > > > Patch 3 adds the main bpf_link API. A function in cls_api takes care of
> > > > > obtaining block reference, creating the filter object, and then calls the
> > > > > bpf_link_change tcf_proto op (only supported by cls_bpf) that returns a fd after
> > > > > setting up the internal structures. An optimization is made to not keep around
> > > > > resources for extended actions, which is explained in a code comment as it wasn't
> > > > > immediately obvious.
> > > > >
> > > > > Patch 4 adds an update path for bpf_link. Since bpf_link_update only supports
> > > > > replacing the bpf_prog, we can skip tc filter's change path by reusing the
> > > > > filter object but swapping its bpf_prog. This takes care of replacing the
> > > > > offloaded prog as well (if that fails, update is aborted). So far however,
> > > > > tcf_classify could do normal load (possibly torn) as the cls_bpf_prog->filter
> > > > > would never be modified concurrently. This is no longer true, and to not
> > > > > penalize the classify hot path, we also cannot impose serialization around
> > > > > its load. Hence the load is changed to READ_ONCE, so that the pointer value is
> > > > > always consistent. Due to invocation in a RCU critical section, the lifetime of
> > > > > the prog is guaranteed for the duration of the call.
> > > > >
> > > > > Patch 5, 6 take care of updating the userspace bits and add a bpf_link returning
> > > > > function to libbpf.
> > > > >
> > > > > Patch 7 adds a selftest that exercises all possible problematic interactions
> > > > > that I could think of.
> > > > >
> > > > > Design:
> > > > >
> > > > > This is where in the object hierarchy our bpf_link object is attached.
> > > > >
> > > > >                                                                             ┌─────┐
> > > > >                                                                             │     │
> > > > >                                                                             │ BPF │
> > > > >                                                                             program
> > > > >                                                                             │     │
> > > > >                                                                             └──▲──┘
> > > > >                                                       ┌───────┐                │
> > > > >                                                       │       │         ┌──────┴───────┐
> > > > >                                                       │  mod  ├─────────► cls_bpf_prog │
> > > > > ┌────────────────┐                                    │cls_bpf│         └────┬───▲─────┘
> > > > > │    tcf_block   │                                    │       │              │   │
> > > > > └────────┬───────┘                                    └───▲───┘              │   │
> > > > >          │          ┌─────────────┐                       │                ┌─▼───┴──┐
> > > > >          └──────────►  tcf_chain  │                       │                │bpf_link│
> > > > >                     └───────┬─────┘                       │                └────────┘
> > > > >                             │          ┌─────────────┐    │
> > > > >                             └──────────►  tcf_proto  ├────┘
> > > > >                                        └─────────────┘
> > > > >
> > > > > The bpf_link is detached on destruction of the cls_bpf_prog.  Doing it this way
> > > > > allows us to implement update in a lightweight manner without having to recreate
> > > > > a new filter, where we can just replace the BPF prog attached to cls_bpf_prog.
> > > > >
> > > > > The other way to do it would be to link the bpf_link to tcf_proto, there are
> > > > > numerous downsides to this:
> > > > >
> > > > > 1. All filters have to embed the pointer even though they won't be using it when
> > > > > cls_bpf is compiled in.
> > > > > 2. This probably won't make sense to be extended to other filter types anyway.
> > > > > 3. We aren't able to optimize the update case without adding another bpf_link
> > > > > specific update operation to tcf_proto ops.
> > > > >
> > > > > The downside with tying this to the module is having to export bpf_link
> > > > > management functions and introducing a tcf_proto op. Hopefully the cost of
> > > > > another operation func pointer is not big enough (as there is only one ops
> > > > > struct per module).
> > > > >
> > > > Hi Kumar,
> > > >
> > > > Do you have any plans / bandwidth to land this feature upstream? If
> > > > so, do you have a tentative estimation for when you'll be able to work
> > > > on this? And if not, are you okay with someone else working on this to
> > > > get it merged in?
> > > >
> > >
> > > I can have a look at resurrecting it later this month, if you're ok with waiting
> > > until then, otherwise if someone else wants to pick this up before that it's
> > > fine by me, just let me know so we avoid duplicated effort. Note that the
> > > approach in v2 is dead/unlikely to get accepted by the TC maintainers, so we'd
> > > have to implement the way Daniel mentioned in [0].
> >
> > Sounds great! We'll wait and check back in with you later this month.
> >
> After reading the linked thread (which I should have done before
> submitting my previous reply :)),  if I'm understanding it correctly,
> it seems then that the work needed for tc bpf_link will be in a new
> direction that's not based on the code in this v2 patchset. I'm
> interested in learning more about bpf link and tc - I can pick this up
> to work on. But if this was something you wanted to work on though,
> please don't hesitate to let me know; I can find some other bpf link
> thing to work on instead if that's the case.
>

Feel free to take it. And yes, it's going to be much simpler than this. I think
you can just add two bpf_prog pointers to struct net_device, use rtnl_lock to
protect the updates, and invoke them using bpf_prog_run in sch_handle_ingress
and sch_handle_egress. You could also split the old and new paths so that there
are fewer branches when the ingress/egress static key is enabled only for
bpf_link mode (as clsact will become redundant after this). It should work much
like XDP's bpf_link. Auto-detach can be handled the same way too: sever the
link by unlinking the dev from it when net_device teardown occurs.
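To make the shape of that concrete, here is a rough userspace sketch of the
attach/update model described above -- NOT kernel code. The function names
(tc_bpf_attach, tc_bpf_detach_all) are hypothetical; in the kernel the two
pointers would live in struct net_device, updates would be serialized by
rtnl_lock rather than a pthread mutex, and the pointer swap would use
rcu_replace_pointer so the classify hot path can keep running locklessly:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct bpf_prog { int id; };          /* stand-in for the kernel's struct bpf_prog */

struct net_device {
	struct bpf_prog *ingress_prog;    /* would be run from sch_handle_ingress() */
	struct bpf_prog *egress_prog;     /* would be run from sch_handle_egress()  */
};

static pthread_mutex_t rtnl_mutex = PTHREAD_MUTEX_INITIALIZER; /* models rtnl_lock */

/* Replace the current prog; return the old one so the caller can release it.
 * This is what makes bpf_link_update cheap: no filter object is recreated. */
static struct bpf_prog *tc_bpf_attach(struct net_device *dev,
				      struct bpf_prog *prog, int ingress)
{
	struct bpf_prog *old;

	pthread_mutex_lock(&rtnl_mutex);
	if (ingress) {
		old = dev->ingress_prog;
		dev->ingress_prog = prog; /* kernel: rcu_replace_pointer() */
	} else {
		old = dev->egress_prog;
		dev->egress_prog = prog;
	}
	pthread_mutex_unlock(&rtnl_mutex);
	return old;
}

/* Auto-detach on device teardown, like XDP's bpf_link severing itself. */
static void tc_bpf_detach_all(struct net_device *dev)
{
	pthread_mutex_lock(&rtnl_mutex);
	dev->ingress_prog = NULL;
	dev->egress_prog = NULL;
	pthread_mutex_unlock(&rtnl_mutex);
}
```

The key design point is that attach, update, and detach all funnel through the
same lock-protected pointer swap, so the link never needs netlink or a tcf_proto
op at all.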

The other thing to keep in mind is that not all return codes are permitted in
direct action mode, so you may want to lift cls_bpf_exec_opcode out into a
header as a helper and repurpose it for the switch statement in
sch_handle_ingress/egress so that only those codes are handled. You also need
to document for the user that only direct action mode is supported by TC
bpf_link.
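For reference, a standalone sketch of that sanitization, modelled on what
cls_bpf_exec_opcode() in net/sched/cls_bpf.c does today (the helper name
tc_bpf_exec_opcode below is hypothetical; the TC_ACT_* values mirror
include/uapi/linux/pkt_cls.h). Any opcode outside the allowed set collapses to
TC_ACT_UNSPEC so a prog cannot inject arbitrary verdicts into the stack:

```c
#include <assert.h>

#define TC_ACT_UNSPEC   (-1)
#define TC_ACT_OK         0
#define TC_ACT_SHOT       2
#define TC_ACT_STOLEN     4
#define TC_ACT_REDIRECT   7
#define TC_ACT_TRAP       8

/* Hypothetical shared helper for sch_handle_ingress/egress. */
static int tc_bpf_exec_opcode(int code)
{
	switch (code) {
	case TC_ACT_OK:
	case TC_ACT_SHOT:
	case TC_ACT_STOLEN:
	case TC_ACT_TRAP:
	case TC_ACT_REDIRECT:
	case TC_ACT_UNSPEC:
		return code;          /* permitted in direct action mode */
	default:
		return TC_ACT_UNSPEC; /* everything else is rejected */
	}
}
```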

You can look at the cls_bpf_classify function in net/sched/cls_bpf.c. The
branches guarded by prog->exts_integrated are for direct action mode, so the
handling for bpf_link will be similar. You also need to handle the
mono_delivery_time unsetting that Martin added recently.

That's all from my notes from that time. Best of luck! :)

>
> > >
> > >   [0]: https://lore.kernel.org/bpf/15cd0a9c-95a1-9766-fca1-4bf9d09e4100@iogearbox.net
> > >
> > > > The reason I'm asking is because there are a few networking teams
> > > > within Meta that have been requesting this feature :)
> > > >
> > > > Thanks,
> > > > Joanne
> > > >
> > > > > Changelog:
> > > > > ----------
> > > > > v1 (RFC) -> v2
> > > > > v1: https://lore.kernel.org/bpf/20210528195946.2375109-1-memxor@gmail.com
> > > > >
> > > > >  * Avoid overwriting other members of union in bpf_attr (Andrii)
> > > > >  * Set link to NULL after bpf_link_cleanup to avoid double free (Andrii)
> > > > >  * Use __be16 to store the result of htons (Kernel Test Robot)
> > > > >  * Make assignment of tcf_exts::net conditional on CONFIG_NET_CLS_ACT
> > > > >    (Kernel Test Robot)
> > > > >
> > > > > Kumar Kartikeya Dwivedi (7):
> > > > >   net: sched: refactor cls_bpf creation code
> > > > >   bpf: export bpf_link functions for modules
> > > > >   net: sched: add bpf_link API for bpf classifier
> > > > >   net: sched: add lightweight update path for cls_bpf
> > > > >   tools: bpf.h: sync with kernel sources
> > > > >   libbpf: add bpf_link based TC-BPF management API
> > > > >   libbpf: add selftest for bpf_link based TC-BPF management API
> > > > >
> > > > >  include/linux/bpf_types.h                     |   3 +
> > > > >  include/net/pkt_cls.h                         |  13 +
> > > > >  include/net/sch_generic.h                     |   6 +-
> > > > >  include/uapi/linux/bpf.h                      |  15 +
> > > > >  kernel/bpf/syscall.c                          |  14 +-
> > > > >  net/sched/cls_api.c                           | 139 ++++++-
> > > > >  net/sched/cls_bpf.c                           | 389 ++++++++++++++++--
> > > > >  tools/include/uapi/linux/bpf.h                |  15 +
> > > > >  tools/lib/bpf/bpf.c                           |   8 +-
> > > > >  tools/lib/bpf/bpf.h                           |   8 +-
> > > > >  tools/lib/bpf/libbpf.c                        |  59 ++-
> > > > >  tools/lib/bpf/libbpf.h                        |  17 +
> > > > >  tools/lib/bpf/libbpf.map                      |   1 +
> > > > >  tools/lib/bpf/netlink.c                       |   5 +-
> > > > >  tools/lib/bpf/netlink.h                       |   8 +
> > > > >  .../selftests/bpf/prog_tests/tc_bpf_link.c    | 285 +++++++++++++
> > > > >  16 files changed, 940 insertions(+), 45 deletions(-)
> > > > >  create mode 100644 tools/lib/bpf/netlink.h
> > > > >  create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_bpf_link.c
> > > > >
> > > > > --
> > > > > 2.31.1
> > > > >
> > >
> > > --
> > > Kartikeya

--
Kartikeya

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API
  2022-06-10 19:34         ` Kumar Kartikeya Dwivedi
@ 2022-06-10 20:04           ` Daniel Borkmann
  2022-06-10 22:01             ` Joanne Koong
  2022-06-10 20:16           ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 35+ messages in thread
From: Daniel Borkmann @ 2022-06-10 20:04 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Joanne Koong
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Jamal Hadi Salim,
	Vlad Buslov, Cong Wang, Jesper Dangaard Brouer,
	Toke Høiland-Jørgensen, netdev, john.fastabend

Hi Joanne, hi Kumar,

On 6/10/22 9:34 PM, Kumar Kartikeya Dwivedi wrote:
> On Sat, Jun 11, 2022 at 12:37:50AM IST, Joanne Koong wrote:
>> On Fri, Jun 10, 2022 at 10:23 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>>> On Fri, Jun 10, 2022 at 5:58 AM Kumar Kartikeya Dwivedi
>>> <memxor@gmail.com> wrote:
>>>> On Fri, Jun 10, 2022 at 05:54:27AM IST, Joanne Koong wrote:
>>>>> On Thu, Jun 3, 2021 at 11:31 PM Kumar Kartikeya Dwivedi
>>>>> <memxor@gmail.com> wrote:
[...]
>>>> I can have a look at resurrecting it later this month, if you're ok with waiting
>>>> until then, otherwise if someone else wants to pick this up before that it's
>>>> fine by me, just let me know so we avoid duplicated effort. Note that the
>>>> approach in v2 is dead/unlikely to get accepted by the TC maintainers, so we'd
>>>> have to implement the way Daniel mentioned in [0].
>>>
>>> Sounds great! We'll wait and check back in with you later this month.
>>>
>> After reading the linked thread (which I should have done before
>> submitting my previous reply :)),  if I'm understanding it correctly,
>> it seems then that the work needed for tc bpf_link will be in a new
>> direction that's not based on the code in this v2 patchset. I'm
>> interested in learning more about bpf link and tc - I can pick this up
>> to work on. But if this was something you wanted to work on though,
>> please don't hesitate to let me know; I can find some other bpf link
>> thing to work on instead if that's the case.

The tc ingress/egress overhaul we also discussed at lsf/mm/bpf in our session
with John and pretty much is along the lines as in the earlier link you sent.
We need it from Cilium & Tetragon as well, so it's wip from our side at the
moment, modulo the bpf link part. Would you be okay if I pinged you once something
that is plateable is ready?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API
  2022-06-10 19:34         ` Kumar Kartikeya Dwivedi
  2022-06-10 20:04           ` Daniel Borkmann
@ 2022-06-10 20:16           ` Toke Høiland-Jørgensen
  2022-06-10 20:35             ` Daniel Borkmann
  1 sibling, 1 reply; 35+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-06-10 20:16 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Joanne Koong
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Jamal Hadi Salim, Vlad Buslov, Cong Wang, Jesper Dangaard Brouer,
	netdev

Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:

> On Sat, Jun 11, 2022 at 12:37:50AM IST, Joanne Koong wrote:
[...]
>
> Feel free to take it. And yes, it's going to be much simpler than this. I think
> you can just add two bpf_prog pointers in struct net_device, use rtnl_lock to
> protect the updates, and invoke using bpf_prog_run in sch_handle_ingress and
> sch_handle_egress.

Except we'd want to also support multiple programs on different
priorities? I don't think requiring a libxdp-like dispatcher to achieve
this is a good idea if we can just have it be part of the API from the
get-go...

-Toke


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API
  2022-06-10 20:16           ` Toke Høiland-Jørgensen
@ 2022-06-10 20:35             ` Daniel Borkmann
  2022-06-10 20:41               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 35+ messages in thread
From: Daniel Borkmann @ 2022-06-10 20:35 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Kumar Kartikeya Dwivedi, Joanne Koong
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Jamal Hadi Salim,
	Vlad Buslov, Cong Wang, Jesper Dangaard Brouer, netdev

On 6/10/22 10:16 PM, Toke Høiland-Jørgensen wrote:
> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
> 
>> On Sat, Jun 11, 2022 at 12:37:50AM IST, Joanne Koong wrote:
>>> On Fri, Jun 10, 2022 at 10:23 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>>>>
>>>> On Fri, Jun 10, 2022 at 5:58 AM Kumar Kartikeya Dwivedi
>>>> <memxor@gmail.com> wrote:
>>>>>
>>>>> On Fri, Jun 10, 2022 at 05:54:27AM IST, Joanne Koong wrote:
>>>>>> On Thu, Jun 3, 2021 at 11:31 PM Kumar Kartikeya Dwivedi
>>>>>> <memxor@gmail.com> wrote:
>>>>>>>
>>>>>>> This is the second (non-RFC) version.
>>>>>>>
>>>>>>> This adds a bpf_link path to create TC filters tied to cls_bpf classifier, and
>>>>>>> introduces fd based ownership for such TC filters. Netlink cannot delete or
>>>>>>> replace such filters, but the bpf_link is severed on indirect destruction of the
>>>>>>> filter (backing qdisc being deleted, or chain being flushed, etc.). To ensure
>>>>>>> that filters remain attached beyond process lifetime, the usual bpf_link fd
>>>>>>> pinning approach can be used.
>>>>>>>
>>>>>>> The individual patches contain more details and comments, but the overall kernel
>>>>>>> API and libbpf helper mirrors the semantics of the netlink based TC-BPF API
>>>>>>> merged recently. This means that we start by always setting direct action mode,
>>>>>>> protocol to ETH_P_ALL, chain_index as 0, etc. If there is a need for more
>>>>>>> options in the future, they can be easily exposed through the bpf_link API in
>>>>>>> the future.
>>>>>>>
>>>>>>> Patch 1 refactors cls_bpf change function to extract two helpers that will be
>>>>>>> reused in bpf_link creation.
>>>>>>>
>>>>>>> Patch 2 exports some bpf_link management functions to modules. This is needed
>>>>>>> because our bpf_link object is tied to the cls_bpf_prog object. Tying it to
>>>>>>> tcf_proto would be weird, because the update path has to replace offloaded bpf
>>>>>>> prog, which happens using internal cls_bpf helpers, and would in general be more
>>>>>>> code to abstract over an operation that is unlikely to be implemented for other
>>>>>>> filter types.
>>>>>>>
>>>>>>> Patch 3 adds the main bpf_link API. A function in cls_api takes care of
>>>>>>> obtaining block reference, creating the filter object, and then calls the
>>>>>>> bpf_link_change tcf_proto op (only supported by cls_bpf) that returns a fd after
>>>>>>> setting up the internal structures. An optimization is made to not keep around
>>>>>>> resources for extended actions, which is explained in a code comment as it wasn't
>>>>>>> immediately obvious.
>>>>>>>
>>>>>>> Patch 4 adds an update path for bpf_link. Since bpf_link_update only supports
>>>>>>> replacing the bpf_prog, we can skip tc filter's change path by reusing the
>>>>>>> filter object but swapping its bpf_prog. This takes care of replacing the
>>>>>>> offloaded prog as well (if that fails, update is aborted). So far however,
>>>>>>> tcf_classify could do normal load (possibly torn) as the cls_bpf_prog->filter
>>>>>>> would never be modified concurrently. This is no longer true, and to not
>>>>>>> penalize the classify hot path, we also cannot impose serialization around
>>>>>>> its load. Hence the load is changed to READ_ONCE, so that the pointer value is
>>>>>>> always consistent. Due to invocation in a RCU critical section, the lifetime of
>>>>>>> the prog is guaranteed for the duration of the call.
>>>>>>>
>>>>>>> Patch 5, 6 take care of updating the userspace bits and add a bpf_link returning
>>>>>>> function to libbpf.
>>>>>>>
>>>>>>> Patch 7 adds a selftest that exercises all possible problematic interactions
>>>>>>> that I could think of.
>>>>>>>
>>>>>>> Design:
>>>>>>>
>>>>>>> This is where in the object hierarchy our bpf_link object is attached.
>>>>>>>
>>>>>>>                                                                              ┌─────┐
>>>>>>>                                                                              │     │
>>>>>>>                                                                              │ BPF │
>>>>>>>                                                                              program
>>>>>>>                                                                              │     │
>>>>>>>                                                                              └──▲──┘
>>>>>>>                                                        ┌───────┐                │
>>>>>>>                                                        │       │         ┌──────┴───────┐
>>>>>>>                                                        │  mod  ├─────────► cls_bpf_prog │
>>>>>>> ┌────────────────┐                                    │cls_bpf│         └────┬───▲─────┘
>>>>>>> │    tcf_block   │                                    │       │              │   │
>>>>>>> └────────┬───────┘                                    └───▲───┘              │   │
>>>>>>>           │          ┌─────────────┐                       │                ┌─▼───┴──┐
>>>>>>>           └──────────►  tcf_chain  │                       │                │bpf_link│
>>>>>>>                      └───────┬─────┘                       │                └────────┘
>>>>>>>                              │          ┌─────────────┐    │
>>>>>>>                              └──────────►  tcf_proto  ├────┘
>>>>>>>                                         └─────────────┘
>>>>>>>
>>>>>>> The bpf_link is detached on destruction of the cls_bpf_prog.  Doing it this way
>>>>>>> allows us to implement update in a lightweight manner without having to recreate
>>>>>>> a new filter, where we can just replace the BPF prog attached to cls_bpf_prog.
>>>>>>>
>>>>>>> The other way to do it would be to link the bpf_link to tcf_proto, there are
>>>>>>> numerous downsides to this:
>>>>>>>
>>>>>>> 1. All filters have to embed the pointer even though they won't be using it when
>>>>>>> cls_bpf is compiled in.
>>>>>>> 2. This probably won't make sense to be extended to other filter types anyway.
>>>>>>> 3. We aren't able to optimize the update case without adding another bpf_link
>>>>>>> specific update operation to tcf_proto ops.
>>>>>>>
>>>>>>> The downside with tying this to the module is having to export bpf_link
>>>>>>> management functions and introducing a tcf_proto op. Hopefully the cost of
>>>>>>> another operation func pointer is not big enough (as there is only one ops
>>>>>>> struct per module).
>>>>>>>
>>>>>> Hi Kumar,
>>>>>>
>>>>>> Do you have any plans / bandwidth to land this feature upstream? If
>>>>>> so, do you have a tentative estimation for when you'll be able to work
>>>>>> on this? And if not, are you okay with someone else working on this to
>>>>>> get it merged in?
>>>>>>
>>>>>
>>>>> I can have a look at resurrecting it later this month, if you're ok with waiting
>>>>> until then, otherwise if someone else wants to pick this up before that it's
>>>>> fine by me, just let me know so we avoid duplicated effort. Note that the
>>>>> approach in v2 is dead/unlikely to get accepted by the TC maintainers, so we'd
>>>>> have to implement the way Daniel mentioned in [0].
>>>>
>>>> Sounds great! We'll wait and check back in with you later this month.
>>>>
>>> After reading the linked thread (which I should have done before
>>> submitting my previous reply :)),  if I'm understanding it correctly,
>>> it seems then that the work needed for tc bpf_link will be in a new
>>> direction that's not based on the code in this v2 patchset. I'm
>>> interested in learning more about bpf link and tc - I can pick this up
>>> to work on. But if this was something you wanted to work on though,
>>> please don't hesitate to let me know; I can find some other bpf link
>>> thing to work on instead if that's the case.
>>
>> Feel free to take it. And yes, it's going to be much simpler than this. I think
>> you can just add two bpf_prog pointers in struct net_device, use rtnl_lock to
>> protect the updates, and invoke using bpf_prog_run in sch_handle_ingress and
>> sch_handle_egress.
> 
> Except we'd want to also support multiple programs on different
> priorities? I don't think requiring a libxdp-like dispatcher to achieve
> this is a good idea if we can just have it be part of the API from the
> get-go...

Yes, it will be multi-prog to avoid a situation where dispatcher is needed.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API
  2022-06-10 20:35             ` Daniel Borkmann
@ 2022-06-10 20:41               ` Toke Høiland-Jørgensen
  2022-06-10 21:52                 ` Alexei Starovoitov
  0 siblings, 1 reply; 35+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-06-10 20:41 UTC (permalink / raw)
  To: Daniel Borkmann, Kumar Kartikeya Dwivedi, Joanne Koong
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Jamal Hadi Salim,
	Vlad Buslov, Cong Wang, Jesper Dangaard Brouer, netdev

>> Except we'd want to also support multiple programs on different
>> priorities? I don't think requiring a libxdp-like dispatcher to achieve
>> this is a good idea if we can just have it be part of the API from the
>> get-go...
>
> Yes, it will be multi-prog to avoid a situation where dispatcher is needed.

Awesome! :)

-Toke


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API
  2022-06-10 20:41               ` Toke Høiland-Jørgensen
@ 2022-06-10 21:52                 ` Alexei Starovoitov
  2022-06-10 22:02                   ` Daniel Borkmann
  0 siblings, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2022-06-10 21:52 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, Kumar Kartikeya Dwivedi, Joanne Koong, bpf,
	Alexei Starovoitov, Andrii Nakryiko, Jamal Hadi Salim,
	Vlad Buslov, Cong Wang, Jesper Dangaard Brouer, netdev

On Fri, Jun 10, 2022 at 1:41 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> >> Except we'd want to also support multiple programs on different
> >> priorities? I don't think requiring a libxdp-like dispatcher to achieve
> >> this is a good idea if we can just have it be part of the API from the
> >> get-go...
> >
> > Yes, it will be multi-prog to avoid a situation where dispatcher is needed.
>
> Awesome! :)

Let's keep it simple to start.
Priorities or anything fancy can be added later if really necessary.
Otherwise, I'm afraid, we will go into endless bikeshedding
for the best priority scheme.

A linked list of bpf progs like cls_bpf with the same semantics as
cls_bpf_classify.
With prog->exts_integrated always true and no classid, since this
concept doesn't apply.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API
  2022-06-10 20:04           ` Daniel Borkmann
@ 2022-06-10 22:01             ` Joanne Koong
  0 siblings, 0 replies; 35+ messages in thread
From: Joanne Koong @ 2022-06-10 22:01 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kumar Kartikeya Dwivedi, bpf, Alexei Starovoitov,
	Andrii Nakryiko, Jamal Hadi Salim, Vlad Buslov, Cong Wang,
	Jesper Dangaard Brouer, Toke Høiland-Jørgensen, netdev,
	John Fastabend

On Fri, Jun 10, 2022 at 1:04 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Hi Joanne, hi Kumar,
>
> On 6/10/22 9:34 PM, Kumar Kartikeya Dwivedi wrote:
> > On Sat, Jun 11, 2022 at 12:37:50AM IST, Joanne Koong wrote:
> >> On Fri, Jun 10, 2022 at 10:23 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> >>> On Fri, Jun 10, 2022 at 5:58 AM Kumar Kartikeya Dwivedi
> >>> <memxor@gmail.com> wrote:
> >>>> On Fri, Jun 10, 2022 at 05:54:27AM IST, Joanne Koong wrote:
> >>>>> On Thu, Jun 3, 2021 at 11:31 PM Kumar Kartikeya Dwivedi
> >>>>> <memxor@gmail.com> wrote:
> [...]
> >>>> I can have a look at resurrecting it later this month, if you're ok with waiting
> >>>> until then, otherwise if someone else wants to pick this up before that it's
> >>>> fine by me, just let me know so we avoid duplicated effort. Note that the
> >>>> approach in v2 is dead/unlikely to get accepted by the TC maintainers, so we'd
> >>>> have to implement it the way Daniel mentioned in [0].
> >>>
> >>> Sounds great! We'll wait and check back in with you later this month.
> >>>
> >> After reading the linked thread (which I should have done before
> >> submitting my previous reply :)),  if I'm understanding it correctly,
> >> it seems then that the work needed for tc bpf_link will be in a new
> >> direction that's not based on the code in this v2 patchset. I'm
> >> interested in learning more about bpf link and tc - I can pick this up
> >> to work on. But if this was something you wanted to work on,
> >> please don't hesitate to let me know; I can find some other bpf link
> >> thing to work on instead if that's the case.
>
> We also discussed the tc ingress/egress overhaul at lsf/mm/bpf in our session
> with John, and it's pretty much along the lines of the earlier link you sent.
> We need it for Cilium & Tetragon as well, so it's wip from our side at the
> moment, modulo the bpf_link part. Would you be okay if I pinged you once
> something presentable is ready?
Yeah definitely! Thanks for letting me know!
>
> Thanks,
> Daniel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API
  2022-06-10 21:52                 ` Alexei Starovoitov
@ 2022-06-10 22:02                   ` Daniel Borkmann
  2022-06-11 10:54                     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 35+ messages in thread
From: Daniel Borkmann @ 2022-06-10 22:02 UTC (permalink / raw)
  To: Alexei Starovoitov, Toke Høiland-Jørgensen
  Cc: Kumar Kartikeya Dwivedi, Joanne Koong, bpf, Alexei Starovoitov,
	Andrii Nakryiko, Jamal Hadi Salim, Vlad Buslov, Cong Wang,
	Jesper Dangaard Brouer, netdev

On 6/10/22 11:52 PM, Alexei Starovoitov wrote:
> On Fri, Jun 10, 2022 at 1:41 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>>>> Except we'd want to also support multiple programs on different
>>>> priorities? I don't think requiring a libxdp-like dispatcher to achieve
>>>> this is a good idea if we can just have it be part of the API from the
>>>> get-go...
>>>
>>> Yes, it will be multi-prog to avoid a situation where dispatcher is needed.
>>
>> Awesome! :)
> 
> Let's keep it simple to start.
> Priorities or anything fancy can be added later if really necessary.
> Otherwise, I'm afraid, we will go into endless bikeshedding
> for the best priority scheme.
> 
> A linked list of bpf progs like cls_bpf with the same semantics as
> cls_bpf_classify.
> With prog->exts_integrated always true and no classid, since this
> concept doesn't apply.
Yes, the semantics must be that TC_ACT_UNSPEC continues to the next program
in the list, and any other return code terminates the evaluation.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API
  2022-06-10 22:02                   ` Daniel Borkmann
@ 2022-06-11 10:54                     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 35+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-06-11 10:54 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov
  Cc: Kumar Kartikeya Dwivedi, Joanne Koong, bpf, Alexei Starovoitov,
	Andrii Nakryiko, Jamal Hadi Salim, Vlad Buslov, Cong Wang,
	Jesper Dangaard Brouer, netdev

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 6/10/22 11:52 PM, Alexei Starovoitov wrote:
>> On Fri, Jun 10, 2022 at 1:41 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>
>>>>> Except we'd want to also support multiple programs on different
>>>>> priorities? I don't think requiring a libxdp-like dispatcher to achieve
>>>>> this is a good idea if we can just have it be part of the API from the
>>>>> get-go...
>>>>
>>>> Yes, it will be multi-prog to avoid a situation where dispatcher is needed.
>>>
>>> Awesome! :)
>> 
>> Let's keep it simple to start.
>> Priorities or anything fancy can be added later if really necessary.
>> Otherwise, I'm afraid, we will go into endless bikeshedding
>> for the best priority scheme.
>> 
>> A linked list of bpf progs like cls_bpf with the same semantics as
>> cls_bpf_classify.
>> With prog->exts_integrated always true and no classid, since this
>> concept doesn't apply.
> Yes, the semantics must be that TC_ACT_UNSPEC continues to the next program
> in the list, and any other return code terminates the evaluation.

Sure, SGTM!

-Toke


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2022-06-11 10:55 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-04  6:31 [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API Kumar Kartikeya Dwivedi
2021-06-04  6:31 ` [PATCH bpf-next v2 1/7] net: sched: refactor cls_bpf creation code Kumar Kartikeya Dwivedi
2021-06-04  6:31 ` [PATCH bpf-next v2 2/7] bpf: export bpf_link functions for modules Kumar Kartikeya Dwivedi
2021-06-04  6:31 ` [PATCH bpf-next v2 3/7] net: sched: add bpf_link API for bpf classifier Kumar Kartikeya Dwivedi
2021-06-05  3:08   ` Yonghong Song
2021-06-05  4:52     ` Kumar Kartikeya Dwivedi
2021-06-07 23:23   ` Andrii Nakryiko
2021-06-04  6:31 ` [PATCH bpf-next v2 4/7] net: sched: add lightweight update path for cls_bpf Kumar Kartikeya Dwivedi
2021-06-04 17:54   ` Alexei Starovoitov
2021-06-05  4:42     ` Kumar Kartikeya Dwivedi
2021-06-07 23:32   ` Andrii Nakryiko
2021-06-10 14:14     ` Kumar Kartikeya Dwivedi
2021-06-04  6:31 ` [PATCH bpf-next v2 5/7] tools: bpf.h: sync with kernel sources Kumar Kartikeya Dwivedi
2021-06-04  6:31 ` [PATCH bpf-next v2 6/7] libbpf: add bpf_link based TC-BPF management API Kumar Kartikeya Dwivedi
2021-06-04 18:01   ` Alexei Starovoitov
2021-06-05  4:51     ` Kumar Kartikeya Dwivedi
2021-06-07 23:37       ` Andrii Nakryiko
2021-06-05 17:09   ` Yonghong Song
2021-06-07 23:41   ` Andrii Nakryiko
2021-06-04  6:31 ` [PATCH bpf-next v2 7/7] libbpf: add selftest for " Kumar Kartikeya Dwivedi
2021-06-05 17:26   ` Yonghong Song
2021-06-07 23:57   ` Andrii Nakryiko
2022-06-10  0:24 ` [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API Joanne Koong
2022-06-10 12:58   ` Kumar Kartikeya Dwivedi
2022-06-10 17:23     ` Joanne Koong
2022-06-10 19:07       ` Joanne Koong
2022-06-10 19:34         ` Kumar Kartikeya Dwivedi
2022-06-10 20:04           ` Daniel Borkmann
2022-06-10 22:01             ` Joanne Koong
2022-06-10 20:16           ` Toke Høiland-Jørgensen
2022-06-10 20:35             ` Daniel Borkmann
2022-06-10 20:41               ` Toke Høiland-Jørgensen
2022-06-10 21:52                 ` Alexei Starovoitov
2022-06-10 22:02                   ` Daniel Borkmann
2022-06-11 10:54                     ` Toke Høiland-Jørgensen
