netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v3 0/4] bpf, netns: Prepare for multi-prog attachment
@ 2020-06-25 14:13 Jakub Sitnicki
  2020-06-25 14:13 ` [PATCH bpf-next v3 1/4] flow_dissector: Pull BPF program assignment up to bpf-netns Jakub Sitnicki
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Jakub Sitnicki @ 2020-06-25 14:13 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Andrii Nakryiko, Martin KaFai Lau,
	Stanislav Fomichev

This patch set prepares ground for link-based multi-prog attachment for
future netns attach types, with BPF_SK_LOOKUP attach type in mind [0].

Two changes are needed in order to attach and run a series of BPF programs:

  1) an bpf_prog_array of programs to run (patch #2), and
  2) a list of attached links to keep track of attachments (patch #3).

Nothing changes for BPF flow_dissector. Just as before only one program can
be attached to netns.


In v3 I've simplified patch #2 that introduces bpf_prog_array to take
advantage of the fact that it will hold at most one program for now.

In particular, I'm no longer using bpf_prog_array_copy. It turned out to be
less suitable for link operations than I thought as it fails to append the
same BPF program.

bpf_prog_array_replace_item is also gone, because we know we always want to
replace the first element in prog_array.

Naturally the code that handles bpf_prog_array will need change once
more when there is a program type that allows multi-prog attachment. But I
feel it will be better to do it gradually and present it together with
tests that actually exercise multi-prog code paths.

Thanks,
-jkbs

[0] https://lore.kernel.org/bpf/20200511185218.1422406-1-jakub@cloudflare.com/

Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Stanislav Fomichev <sdf@google.com>

v2 -> v3:
- Don't check if run_array is null in link update callback. (Martin)
- Allow updating the link with the same BPF program. (Andrii)
- Add patch #4 with a test for the above case.
- Kill bpf_prog_array_replace_item. Access the run_array directly.
- Switch from bpf_prog_array_copy() to bpf_prog_array_alloc(1, ...).
- Replace rcu_deref_protected & RCU_INIT_POINTER with rcu_replace_pointer.
- Drop Andrii's Ack from patch #2. Code changed.

v1 -> v2:

- Show with a (void) cast that bpf_prog_array_replace_item() return value
  is ignored on purpose. (Andrii)
- Explain why bpf-cgroup cannot replace programs in bpf_prog_array based
  on bpf_prog pointer comparison in patch #2 description. (Andrii)

Jakub Sitnicki (4):
  flow_dissector: Pull BPF program assignment up to bpf-netns
  bpf, netns: Keep attached programs in bpf_prog_array
  bpf, netns: Keep a list of attached bpf_link's
  selftests/bpf: Test updating flow_dissector link with same program

 include/net/flow_dissector.h                  |   3 +-
 include/net/netns/bpf.h                       |   7 +-
 kernel/bpf/net_namespace.c                    | 162 ++++++++++++------
 net/core/flow_dissector.c                     |  32 ++--
 .../bpf/prog_tests/flow_dissector_reattach.c  |  32 +++-
 5 files changed, 160 insertions(+), 76 deletions(-)

-- 
2.25.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH bpf-next v3 1/4] flow_dissector: Pull BPF program assignment up to bpf-netns
  2020-06-25 14:13 [PATCH bpf-next v3 0/4] bpf, netns: Prepare for multi-prog attachment Jakub Sitnicki
@ 2020-06-25 14:13 ` Jakub Sitnicki
  2020-06-26  1:38   ` Martin KaFai Lau
  2020-06-25 14:13 ` [PATCH bpf-next v3 2/4] bpf, netns: Keep attached programs in bpf_prog_array Jakub Sitnicki
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Jakub Sitnicki @ 2020-06-25 14:13 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team, Andrii Nakryiko

Prepare for using bpf_prog_array to store attached programs by moving out
code that updates the attached program out of flow dissector.

Managing bpf_prog_array is more involved than updating a single bpf_prog
pointer. This will let us do it all from one place, bpf/net_namespace.c, in
the subsequent patch.

No functional change intended.

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/flow_dissector.h |  3 ++-
 kernel/bpf/net_namespace.c   | 20 ++++++++++++++++++--
 net/core/flow_dissector.c    | 13 ++-----------
 3 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index a7eba43fe4e4..4b6e36288ddd 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -372,7 +372,8 @@ flow_dissector_init_keys(struct flow_dissector_key_control *key_control,
 }
 
 #ifdef CONFIG_BPF_SYSCALL
-int flow_dissector_bpf_prog_attach(struct net *net, struct bpf_prog *prog);
+int flow_dissector_bpf_prog_attach_check(struct net *net,
+					 struct bpf_prog *prog);
 #endif /* CONFIG_BPF_SYSCALL */
 
 #endif
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index 78cf061f8179..b951dab2687f 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -189,6 +189,7 @@ int netns_bpf_prog_query(const union bpf_attr *attr,
 int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
 	enum netns_bpf_attach_type type;
+	struct bpf_prog *attached;
 	struct net *net;
 	int ret;
 
@@ -207,12 +208,26 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 
 	switch (type) {
 	case NETNS_BPF_FLOW_DISSECTOR:
-		ret = flow_dissector_bpf_prog_attach(net, prog);
+		ret = flow_dissector_bpf_prog_attach_check(net, prog);
 		break;
 	default:
 		ret = -EINVAL;
 		break;
 	}
+	if (ret)
+		goto out_unlock;
+
+	attached = rcu_dereference_protected(net->bpf.progs[type],
+					     lockdep_is_held(&netns_bpf_mutex));
+	if (attached == prog) {
+		/* The same program cannot be attached twice */
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+	rcu_assign_pointer(net->bpf.progs[type], prog);
+	if (attached)
+		bpf_prog_put(attached);
+
 out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
 
@@ -277,7 +292,7 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 
 	switch (type) {
 	case NETNS_BPF_FLOW_DISSECTOR:
-		err = flow_dissector_bpf_prog_attach(net, link->prog);
+		err = flow_dissector_bpf_prog_attach_check(net, link->prog);
 		break;
 	default:
 		err = -EINVAL;
@@ -286,6 +301,7 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 	if (err)
 		goto out_unlock;
 
+	rcu_assign_pointer(net->bpf.progs[type], link->prog);
 	net->bpf.links[type] = link;
 
 out_unlock:
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index d02df0b6d0d9..b57fb1359395 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -70,10 +70,10 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
 EXPORT_SYMBOL(skb_flow_dissector_init);
 
 #ifdef CONFIG_BPF_SYSCALL
-int flow_dissector_bpf_prog_attach(struct net *net, struct bpf_prog *prog)
+int flow_dissector_bpf_prog_attach_check(struct net *net,
+					 struct bpf_prog *prog)
 {
 	enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
-	struct bpf_prog *attached;
 
 	if (net == &init_net) {
 		/* BPF flow dissector in the root namespace overrides
@@ -97,15 +97,6 @@ int flow_dissector_bpf_prog_attach(struct net *net, struct bpf_prog *prog)
 			return -EEXIST;
 	}
 
-	attached = rcu_dereference_protected(net->bpf.progs[type],
-					     lockdep_is_held(&netns_bpf_mutex));
-	if (attached == prog)
-		/* The same program cannot be attached twice */
-		return -EINVAL;
-
-	rcu_assign_pointer(net->bpf.progs[type], prog);
-	if (attached)
-		bpf_prog_put(attached);
 	return 0;
 }
 #endif /* CONFIG_BPF_SYSCALL */
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH bpf-next v3 2/4] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-25 14:13 [PATCH bpf-next v3 0/4] bpf, netns: Prepare for multi-prog attachment Jakub Sitnicki
  2020-06-25 14:13 ` [PATCH bpf-next v3 1/4] flow_dissector: Pull BPF program assignment up to bpf-netns Jakub Sitnicki
@ 2020-06-25 14:13 ` Jakub Sitnicki
  2020-06-25 20:50   ` Andrii Nakryiko
  2020-06-26  5:41   ` Martin KaFai Lau
  2020-06-25 14:13 ` [PATCH bpf-next v3 3/4] bpf, netns: Keep a list of attached bpf_link's Jakub Sitnicki
  2020-06-25 14:13 ` [PATCH bpf-next v3 4/4] selftests/bpf: Test updating flow_dissector link with same program Jakub Sitnicki
  3 siblings, 2 replies; 14+ messages in thread
From: Jakub Sitnicki @ 2020-06-25 14:13 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

Prepare for having multi-prog attachments for new netns attach types by
storing programs to run in a bpf_prog_array, which is well suited for
iterating over programs and running them in sequence.

After this change bpf(PROG_QUERY) may block to allocate memory in
bpf_prog_array_copy_to_user() for collected program IDs. This forces a
change in how we protect access to the attached program in the query
callback. Because bpf_prog_array_copy_to_user() can sleep, we switch from
an RCU read lock to holding a mutex that serializes updaters.

Because we allow only one BPF flow_dissector program to be attached to
netns at all times, the bpf_prog_array pointed by net->bpf.run_array is
always either detached (null) or one element long.

No functional changes intended.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/netns/bpf.h    |   5 +-
 kernel/bpf/net_namespace.c | 120 +++++++++++++++++++++++++------------
 net/core/flow_dissector.c  |  19 +++---
 3 files changed, 96 insertions(+), 48 deletions(-)

diff --git a/include/net/netns/bpf.h b/include/net/netns/bpf.h
index a8dce2a380c8..a5015bda9979 100644
--- a/include/net/netns/bpf.h
+++ b/include/net/netns/bpf.h
@@ -9,9 +9,12 @@
 #include <linux/bpf-netns.h>
 
 struct bpf_prog;
+struct bpf_prog_array;
 
 struct netns_bpf {
-	struct bpf_prog __rcu *progs[MAX_NETNS_BPF_ATTACH_TYPE];
+	/* Array of programs to run compiled from progs or links */
+	struct bpf_prog_array __rcu *run_array[MAX_NETNS_BPF_ATTACH_TYPE];
+	struct bpf_prog *progs[MAX_NETNS_BPF_ATTACH_TYPE];
 	struct bpf_link *links[MAX_NETNS_BPF_ATTACH_TYPE];
 };
 
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index b951dab2687f..0dba97202357 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -33,6 +33,17 @@ static void __net_exit bpf_netns_link_auto_detach(struct bpf_link *link)
 	net_link->net = NULL;
 }
 
+/* Must be called with netns_bpf_mutex held. */
+static void netns_bpf_run_array_detach(struct net *net,
+				       enum netns_bpf_attach_type type)
+{
+	struct bpf_prog_array *run_array;
+
+	run_array = rcu_replace_pointer(net->bpf.run_array[type], NULL,
+					lockdep_is_held(&netns_bpf_mutex));
+	bpf_prog_array_free(run_array);
+}
+
 static void bpf_netns_link_release(struct bpf_link *link)
 {
 	struct bpf_netns_link *net_link =
@@ -54,8 +65,8 @@ static void bpf_netns_link_release(struct bpf_link *link)
 	if (!net)
 		goto out_unlock;
 
+	netns_bpf_run_array_detach(net, type);
 	net->bpf.links[type] = NULL;
-	RCU_INIT_POINTER(net->bpf.progs[type], NULL);
 
 out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
@@ -76,6 +87,7 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
 	struct bpf_netns_link *net_link =
 		container_of(link, struct bpf_netns_link, link);
 	enum netns_bpf_attach_type type = net_link->netns_type;
+	struct bpf_prog_array *run_array;
 	struct net *net;
 	int ret = 0;
 
@@ -93,8 +105,11 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
 		goto out_unlock;
 	}
 
+	run_array = rcu_dereference_protected(net->bpf.run_array[type],
+					      lockdep_is_held(&netns_bpf_mutex));
+	WRITE_ONCE(run_array->items[0].prog, new_prog);
+
 	old_prog = xchg(&link->prog, new_prog);
-	rcu_assign_pointer(net->bpf.progs[type], new_prog);
 	bpf_prog_put(old_prog);
 
 out_unlock:
@@ -142,14 +157,38 @@ static const struct bpf_link_ops bpf_netns_link_ops = {
 	.show_fdinfo = bpf_netns_link_show_fdinfo,
 };
 
+/* Must be called with netns_bpf_mutex held. */
+static int __netns_bpf_prog_query(const union bpf_attr *attr,
+				  union bpf_attr __user *uattr,
+				  struct net *net,
+				  enum netns_bpf_attach_type type)
+{
+	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
+	struct bpf_prog_array *run_array;
+	u32 prog_cnt = 0, flags = 0;
+
+	run_array = rcu_dereference_protected(net->bpf.run_array[type],
+					      lockdep_is_held(&netns_bpf_mutex));
+	if (run_array)
+		prog_cnt = bpf_prog_array_length(run_array);
+
+	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
+		return -EFAULT;
+	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
+		return -EFAULT;
+	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
+		return 0;
+
+	return bpf_prog_array_copy_to_user(run_array, prog_ids,
+					   attr->query.prog_cnt);
+}
+
 int netns_bpf_prog_query(const union bpf_attr *attr,
 			 union bpf_attr __user *uattr)
 {
-	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
-	u32 prog_id, prog_cnt = 0, flags = 0;
 	enum netns_bpf_attach_type type;
-	struct bpf_prog *attached;
 	struct net *net;
+	int ret;
 
 	if (attr->query.query_flags)
 		return -EINVAL;
@@ -162,32 +201,17 @@ int netns_bpf_prog_query(const union bpf_attr *attr,
 	if (IS_ERR(net))
 		return PTR_ERR(net);
 
-	rcu_read_lock();
-	attached = rcu_dereference(net->bpf.progs[type]);
-	if (attached) {
-		prog_cnt = 1;
-		prog_id = attached->aux->id;
-	}
-	rcu_read_unlock();
+	mutex_lock(&netns_bpf_mutex);
+	ret = __netns_bpf_prog_query(attr, uattr, net, type);
+	mutex_unlock(&netns_bpf_mutex);
 
 	put_net(net);
-
-	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
-		return -EFAULT;
-	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
-		return -EFAULT;
-
-	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
-		return 0;
-
-	if (copy_to_user(prog_ids, &prog_id, sizeof(u32)))
-		return -EFAULT;
-
-	return 0;
+	return ret;
 }
 
 int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
+	struct bpf_prog_array *run_array;
 	enum netns_bpf_attach_type type;
 	struct bpf_prog *attached;
 	struct net *net;
@@ -217,14 +241,28 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 	if (ret)
 		goto out_unlock;
 
-	attached = rcu_dereference_protected(net->bpf.progs[type],
-					     lockdep_is_held(&netns_bpf_mutex));
+	attached = net->bpf.progs[type];
 	if (attached == prog) {
 		/* The same program cannot be attached twice */
 		ret = -EINVAL;
 		goto out_unlock;
 	}
-	rcu_assign_pointer(net->bpf.progs[type], prog);
+
+	run_array = rcu_dereference_protected(net->bpf.run_array[type],
+					      lockdep_is_held(&netns_bpf_mutex));
+	if (run_array) {
+		WRITE_ONCE(run_array->items[0].prog, prog);
+	} else {
+		run_array = bpf_prog_array_alloc(1, GFP_KERNEL);
+		if (!run_array) {
+			ret = -ENOMEM;
+			goto out_unlock;
+		}
+		run_array->items[0].prog = prog;
+		rcu_assign_pointer(net->bpf.run_array[type], run_array);
+	}
+
+	net->bpf.progs[type] = prog;
 	if (attached)
 		bpf_prog_put(attached);
 
@@ -244,11 +282,11 @@ static int __netns_bpf_prog_detach(struct net *net,
 	if (net->bpf.links[type])
 		return -EINVAL;
 
-	attached = rcu_dereference_protected(net->bpf.progs[type],
-					     lockdep_is_held(&netns_bpf_mutex));
+	attached = net->bpf.progs[type];
 	if (!attached)
 		return -ENOENT;
-	RCU_INIT_POINTER(net->bpf.progs[type], NULL);
+	netns_bpf_run_array_detach(net, type);
+	net->bpf.progs[type] = NULL;
 	bpf_prog_put(attached);
 	return 0;
 }
@@ -272,7 +310,7 @@ int netns_bpf_prog_detach(const union bpf_attr *attr)
 static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 				 enum netns_bpf_attach_type type)
 {
-	struct bpf_prog *prog;
+	struct bpf_prog_array *run_array;
 	int err;
 
 	mutex_lock(&netns_bpf_mutex);
@@ -283,9 +321,7 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 		goto out_unlock;
 	}
 	/* Links are not compatible with attaching prog directly */
-	prog = rcu_dereference_protected(net->bpf.progs[type],
-					 lockdep_is_held(&netns_bpf_mutex));
-	if (prog) {
+	if (net->bpf.progs[type]) {
 		err = -EEXIST;
 		goto out_unlock;
 	}
@@ -301,7 +337,14 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 	if (err)
 		goto out_unlock;
 
-	rcu_assign_pointer(net->bpf.progs[type], link->prog);
+	run_array = bpf_prog_array_alloc(1, GFP_KERNEL);
+	if (!run_array) {
+		err = -ENOMEM;
+		goto out_unlock;
+	}
+	run_array->items[0].prog = link->prog;
+	rcu_assign_pointer(net->bpf.run_array[type], run_array);
+
 	net->bpf.links[type] = link;
 
 out_unlock:
@@ -368,11 +411,12 @@ static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
 
 	mutex_lock(&netns_bpf_mutex);
 	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
+		netns_bpf_run_array_detach(net, type);
 		link = net->bpf.links[type];
 		if (link)
 			bpf_netns_link_auto_detach(link);
-		else
-			__netns_bpf_prog_detach(net, type);
+		else if (net->bpf.progs[type])
+			bpf_prog_put(net->bpf.progs[type]);
 	}
 	mutex_unlock(&netns_bpf_mutex);
 }
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index b57fb1359395..142a8824f0a8 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -86,14 +86,14 @@ int flow_dissector_bpf_prog_attach_check(struct net *net,
 		for_each_net(ns) {
 			if (ns == &init_net)
 				continue;
-			if (rcu_access_pointer(ns->bpf.progs[type]))
+			if (rcu_access_pointer(ns->bpf.run_array[type]))
 				return -EEXIST;
 		}
 	} else {
 		/* Make sure root flow dissector is not attached
 		 * when attaching to the non-root namespace.
 		 */
-		if (rcu_access_pointer(init_net.bpf.progs[type]))
+		if (rcu_access_pointer(init_net.bpf.run_array[type]))
 			return -EEXIST;
 	}
 
@@ -894,7 +894,6 @@ bool __skb_flow_dissect(const struct net *net,
 	struct flow_dissector_key_addrs *key_addrs;
 	struct flow_dissector_key_tags *key_tags;
 	struct flow_dissector_key_vlan *key_vlan;
-	struct bpf_prog *attached = NULL;
 	enum flow_dissect_ret fdret;
 	enum flow_dissector_key_id dissector_vlan = FLOW_DISSECTOR_KEY_MAX;
 	bool mpls_el = false;
@@ -951,14 +950,14 @@ bool __skb_flow_dissect(const struct net *net,
 	WARN_ON_ONCE(!net);
 	if (net) {
 		enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
+		struct bpf_prog_array *run_array;
 
 		rcu_read_lock();
-		attached = rcu_dereference(init_net.bpf.progs[type]);
+		run_array = rcu_dereference(init_net.bpf.run_array[type]);
+		if (!run_array)
+			run_array = rcu_dereference(net->bpf.run_array[type]);
 
-		if (!attached)
-			attached = rcu_dereference(net->bpf.progs[type]);
-
-		if (attached) {
+		if (run_array) {
 			struct bpf_flow_keys flow_keys;
 			struct bpf_flow_dissector ctx = {
 				.flow_keys = &flow_keys,
@@ -966,6 +965,7 @@ bool __skb_flow_dissect(const struct net *net,
 				.data_end = data + hlen,
 			};
 			__be16 n_proto = proto;
+			struct bpf_prog *prog;
 
 			if (skb) {
 				ctx.skb = skb;
@@ -976,7 +976,8 @@ bool __skb_flow_dissect(const struct net *net,
 				n_proto = skb->protocol;
 			}
 
-			ret = bpf_flow_dissect(attached, &ctx, n_proto, nhoff,
+			prog = READ_ONCE(run_array->items[0].prog);
+			ret = bpf_flow_dissect(prog, &ctx, n_proto, nhoff,
 					       hlen, flags);
 			__skb_flow_bpf_to_target(&flow_keys, flow_dissector,
 						 target_container);
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH bpf-next v3 3/4] bpf, netns: Keep a list of attached bpf_link's
  2020-06-25 14:13 [PATCH bpf-next v3 0/4] bpf, netns: Prepare for multi-prog attachment Jakub Sitnicki
  2020-06-25 14:13 ` [PATCH bpf-next v3 1/4] flow_dissector: Pull BPF program assignment up to bpf-netns Jakub Sitnicki
  2020-06-25 14:13 ` [PATCH bpf-next v3 2/4] bpf, netns: Keep attached programs in bpf_prog_array Jakub Sitnicki
@ 2020-06-25 14:13 ` Jakub Sitnicki
  2020-06-26  5:43   ` Martin KaFai Lau
  2020-06-25 14:13 ` [PATCH bpf-next v3 4/4] selftests/bpf: Test updating flow_dissector link with same program Jakub Sitnicki
  3 siblings, 1 reply; 14+ messages in thread
From: Jakub Sitnicki @ 2020-06-25 14:13 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team, Andrii Nakryiko

To support multi-prog link-based attachments for new netns attach types, we
need to keep track of more than one bpf_link per attach type. Hence,
convert net->bpf.links into a list, that currently can be either empty or
have just one item.

Instead of reusing bpf_prog_list from bpf-cgroup, we link together
bpf_netns_link's themselves. This makes list management simpler as we don't
have to allocate, initialize, and later release list elements. We can do
this because multi-prog attachment will be available only for bpf_link, and
we don't need to build a list of programs attached directly and indirectly
via links.

No functional changes intended.

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/netns/bpf.h    |  2 +-
 kernel/bpf/net_namespace.c | 42 +++++++++++++++++++++-----------------
 2 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/include/net/netns/bpf.h b/include/net/netns/bpf.h
index a5015bda9979..0ca6a1b87185 100644
--- a/include/net/netns/bpf.h
+++ b/include/net/netns/bpf.h
@@ -15,7 +15,7 @@ struct netns_bpf {
 	/* Array of programs to run compiled from progs or links */
 	struct bpf_prog_array __rcu *run_array[MAX_NETNS_BPF_ATTACH_TYPE];
 	struct bpf_prog *progs[MAX_NETNS_BPF_ATTACH_TYPE];
-	struct bpf_link *links[MAX_NETNS_BPF_ATTACH_TYPE];
+	struct list_head links[MAX_NETNS_BPF_ATTACH_TYPE];
 };
 
 #endif /* __NETNS_BPF_H__ */
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index 0dba97202357..7a34a8caf954 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -19,20 +19,12 @@ struct bpf_netns_link {
 	 * with netns_bpf_mutex held.
 	 */
 	struct net *net;
+	struct list_head node; /* node in list of links attached to net */
 };
 
 /* Protects updates to netns_bpf */
 DEFINE_MUTEX(netns_bpf_mutex);
 
-/* Must be called with netns_bpf_mutex held. */
-static void __net_exit bpf_netns_link_auto_detach(struct bpf_link *link)
-{
-	struct bpf_netns_link *net_link =
-		container_of(link, struct bpf_netns_link, link);
-
-	net_link->net = NULL;
-}
-
 /* Must be called with netns_bpf_mutex held. */
 static void netns_bpf_run_array_detach(struct net *net,
 				       enum netns_bpf_attach_type type)
@@ -66,7 +58,7 @@ static void bpf_netns_link_release(struct bpf_link *link)
 		goto out_unlock;
 
 	netns_bpf_run_array_detach(net, type);
-	net->bpf.links[type] = NULL;
+	list_del(&net_link->node);
 
 out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
@@ -225,7 +217,7 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 	mutex_lock(&netns_bpf_mutex);
 
 	/* Attaching prog directly is not compatible with links */
-	if (net->bpf.links[type]) {
+	if (!list_empty(&net->bpf.links[type])) {
 		ret = -EEXIST;
 		goto out_unlock;
 	}
@@ -279,7 +271,7 @@ static int __netns_bpf_prog_detach(struct net *net,
 	struct bpf_prog *attached;
 
 	/* Progs attached via links cannot be detached */
-	if (net->bpf.links[type])
+	if (!list_empty(&net->bpf.links[type]))
 		return -EINVAL;
 
 	attached = net->bpf.progs[type];
@@ -310,13 +302,15 @@ int netns_bpf_prog_detach(const union bpf_attr *attr)
 static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 				 enum netns_bpf_attach_type type)
 {
+	struct bpf_netns_link *net_link =
+		container_of(link, struct bpf_netns_link, link);
 	struct bpf_prog_array *run_array;
 	int err;
 
 	mutex_lock(&netns_bpf_mutex);
 
 	/* Allow attaching only one prog or link for now */
-	if (net->bpf.links[type]) {
+	if (!list_empty(&net->bpf.links[type])) {
 		err = -E2BIG;
 		goto out_unlock;
 	}
@@ -345,7 +339,7 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 	run_array->items[0].prog = link->prog;
 	rcu_assign_pointer(net->bpf.run_array[type], run_array);
 
-	net->bpf.links[type] = link;
+	list_add_tail(&net_link->node, &net->bpf.links[type]);
 
 out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
@@ -404,24 +398,34 @@ int netns_bpf_link_create(const union bpf_attr *attr, struct bpf_prog *prog)
 	return err;
 }
 
+static int __net_init netns_bpf_pernet_init(struct net *net)
+{
+	int type;
+
+	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++)
+		INIT_LIST_HEAD(&net->bpf.links[type]);
+
+	return 0;
+}
+
 static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
 {
 	enum netns_bpf_attach_type type;
-	struct bpf_link *link;
+	struct bpf_netns_link *net_link;
 
 	mutex_lock(&netns_bpf_mutex);
 	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
 		netns_bpf_run_array_detach(net, type);
-		link = net->bpf.links[type];
-		if (link)
-			bpf_netns_link_auto_detach(link);
-		else if (net->bpf.progs[type])
+		list_for_each_entry(net_link, &net->bpf.links[type], node)
+			net_link->net = NULL; /* auto-detach link */
+		if (net->bpf.progs[type])
 			bpf_prog_put(net->bpf.progs[type]);
 	}
 	mutex_unlock(&netns_bpf_mutex);
 }
 
 static struct pernet_operations netns_bpf_pernet_ops __net_initdata = {
+	.init = netns_bpf_pernet_init,
 	.pre_exit = netns_bpf_pernet_pre_exit,
 };
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH bpf-next v3 4/4] selftests/bpf: Test updating flow_dissector link with same program
  2020-06-25 14:13 [PATCH bpf-next v3 0/4] bpf, netns: Prepare for multi-prog attachment Jakub Sitnicki
                   ` (2 preceding siblings ...)
  2020-06-25 14:13 ` [PATCH bpf-next v3 3/4] bpf, netns: Keep a list of attached bpf_link's Jakub Sitnicki
@ 2020-06-25 14:13 ` Jakub Sitnicki
  2020-06-26  5:52   ` Martin KaFai Lau
  3 siblings, 1 reply; 14+ messages in thread
From: Jakub Sitnicki @ 2020-06-25 14:13 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

This case, while not particularly useful, is worth covering because we
expect the operation to succeed as opposed when re-attaching the same
program directly with PROG_ATTACH.

While at it, update the tests summary that fell out of sync when tests
extended to cover links.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 .../bpf/prog_tests/flow_dissector_reattach.c  | 32 ++++++++++++++++---
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector_reattach.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector_reattach.c
index 15cb554a66d8..a2db3b0f84db 100644
--- a/tools/testing/selftests/bpf/prog_tests/flow_dissector_reattach.c
+++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector_reattach.c
@@ -1,9 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- * Test that the flow_dissector program can be updated with a single
- * syscall by attaching a new program that replaces the existing one.
- *
- * Corner case - the same program cannot be attached twice.
+ * Tests for attaching, detaching, and replacing flow_dissector BPF program.
  */
 
 #define _GNU_SOURCE
@@ -308,6 +305,31 @@ static void test_link_update_replace_old_prog(int netns, int prog1, int prog2)
 	CHECK_FAIL(prog_is_attached(netns));
 }
 
+static void test_link_update_same_prog(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
+	int err, link;
+
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	/* Expect success updating the prog with the same one */
+	update_opts.flags = 0;
+	update_opts.old_prog_fd = 0;
+	err = bpf_link_update(link, prog1, &update_opts);
+	if (CHECK_FAIL(err))
+		perror("bpf_link_update");
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	close(link);
+	CHECK_FAIL(prog_is_attached(netns));
+}
+
 static void test_link_update_invalid_opts(int netns, int prog1, int prog2)
 {
 	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
@@ -571,6 +593,8 @@ static void run_tests(int netns)
 		  test_link_update_no_old_prog },
 		{ "link update with replace old prog",
 		  test_link_update_replace_old_prog },
+		{ "link update with same prog",
+		  test_link_update_same_prog },
 		{ "link update invalid opts",
 		  test_link_update_invalid_opts },
 		{ "link update invalid prog",
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v3 2/4] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-25 14:13 ` [PATCH bpf-next v3 2/4] bpf, netns: Keep attached programs in bpf_prog_array Jakub Sitnicki
@ 2020-06-25 20:50   ` Andrii Nakryiko
  2020-06-26  9:45     ` Jakub Sitnicki
  2020-06-26  5:41   ` Martin KaFai Lau
  1 sibling, 1 reply; 14+ messages in thread
From: Andrii Nakryiko @ 2020-06-25 20:50 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team

On Thu, Jun 25, 2020 at 7:17 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Prepare for having multi-prog attachments for new netns attach types by
> storing programs to run in a bpf_prog_array, which is well suited for
> iterating over programs and running them in sequence.
>
> After this change bpf(PROG_QUERY) may block to allocate memory in
> bpf_prog_array_copy_to_user() for collected program IDs. This forces a
> change in how we protect access to the attached program in the query
> callback. Because bpf_prog_array_copy_to_user() can sleep, we switch from
> an RCU read lock to holding a mutex that serializes updaters.
>
> Because we allow only one BPF flow_dissector program to be attached to
> netns at all times, the bpf_prog_array pointed by net->bpf.run_array is
> always either detached (null) or one element long.
>
> No functional changes intended.
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---

I wonder if instead of NULL prog_array, it's better to just use a
dummy empty (but allocated) array. Might help eliminate some of the
IFs, maybe even in the hot path.


>  include/net/netns/bpf.h    |   5 +-
>  kernel/bpf/net_namespace.c | 120 +++++++++++++++++++++++++------------
>  net/core/flow_dissector.c  |  19 +++---
>  3 files changed, 96 insertions(+), 48 deletions(-)
>

[...]


>
> +/* Must be called with netns_bpf_mutex held. */
> +static int __netns_bpf_prog_query(const union bpf_attr *attr,
> +                                 union bpf_attr __user *uattr,
> +                                 struct net *net,
> +                                 enum netns_bpf_attach_type type)
> +{
> +       __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
> +       struct bpf_prog_array *run_array;
> +       u32 prog_cnt = 0, flags = 0;
> +
> +       run_array = rcu_dereference_protected(net->bpf.run_array[type],
> +                                             lockdep_is_held(&netns_bpf_mutex));
> +       if (run_array)
> +               prog_cnt = bpf_prog_array_length(run_array);
> +
> +       if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
> +               return -EFAULT;
> +       if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
> +               return -EFAULT;
> +       if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
> +               return 0;
> +
> +       return bpf_prog_array_copy_to_user(run_array, prog_ids,
> +                                          attr->query.prog_cnt);

It doesn't seem like bpf_prog_array_copy_to_user can handle NULL run_array

> +}
> +
>  int netns_bpf_prog_query(const union bpf_attr *attr,
>                          union bpf_attr __user *uattr)
>  {
> -       __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
> -       u32 prog_id, prog_cnt = 0, flags = 0;
>         enum netns_bpf_attach_type type;
> -       struct bpf_prog *attached;
>         struct net *net;
> +       int ret;
>
>         if (attr->query.query_flags)
>                 return -EINVAL;

[...]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v3 1/4] flow_dissector: Pull BPF program assignment up to bpf-netns
  2020-06-25 14:13 ` [PATCH bpf-next v3 1/4] flow_dissector: Pull BPF program assignment up to bpf-netns Jakub Sitnicki
@ 2020-06-26  1:38   ` Martin KaFai Lau
  0 siblings, 0 replies; 14+ messages in thread
From: Martin KaFai Lau @ 2020-06-26  1:38 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, netdev, kernel-team, Andrii Nakryiko

On Thu, Jun 25, 2020 at 04:13:54PM +0200, Jakub Sitnicki wrote:
> Prepare for using bpf_prog_array to store attached programs by moving out
> code that updates the attached program out of flow dissector.
> 
> Managing bpf_prog_array is more involved than updating a single bpf_prog
> pointer. This will let us do it all from one place, bpf/net_namespace.c, in
> the subsequent patch.
> 
> No functional change intended.
Acked-by: Martin KaFai Lau <kafai@fb.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v3 2/4] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-25 14:13 ` [PATCH bpf-next v3 2/4] bpf, netns: Keep attached programs in bpf_prog_array Jakub Sitnicki
  2020-06-25 20:50   ` Andrii Nakryiko
@ 2020-06-26  5:41   ` Martin KaFai Lau
  2020-06-26 10:40     ` Jakub Sitnicki
  1 sibling, 1 reply; 14+ messages in thread
From: Martin KaFai Lau @ 2020-06-26  5:41 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, netdev, kernel-team

On Thu, Jun 25, 2020 at 04:13:55PM +0200, Jakub Sitnicki wrote:
> Prepare for having multi-prog attachments for new netns attach types by
> storing programs to run in a bpf_prog_array, which is well suited for
> iterating over programs and running them in sequence.
> 
> After this change bpf(PROG_QUERY) may block to allocate memory in
> bpf_prog_array_copy_to_user() for collected program IDs. This forces a
> change in how we protect access to the attached program in the query
> callback. Because bpf_prog_array_copy_to_user() can sleep, we switch from
> an RCU read lock to holding a mutex that serializes updaters.
> 
> Because we allow only one BPF flow_dissector program to be attached to
> netns at all times, the bpf_prog_array pointed by net->bpf.run_array is
> always either detached (null) or one element long.
> 
> No functional changes intended.
> 
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>  include/net/netns/bpf.h    |   5 +-
>  kernel/bpf/net_namespace.c | 120 +++++++++++++++++++++++++------------
>  net/core/flow_dissector.c  |  19 +++---
>  3 files changed, 96 insertions(+), 48 deletions(-)
> 
> diff --git a/include/net/netns/bpf.h b/include/net/netns/bpf.h
> index a8dce2a380c8..a5015bda9979 100644
> --- a/include/net/netns/bpf.h
> +++ b/include/net/netns/bpf.h
> @@ -9,9 +9,12 @@
>  #include <linux/bpf-netns.h>
>  
>  struct bpf_prog;
> +struct bpf_prog_array;
>  
>  struct netns_bpf {
> -	struct bpf_prog __rcu *progs[MAX_NETNS_BPF_ATTACH_TYPE];
> +	/* Array of programs to run compiled from progs or links */
> +	struct bpf_prog_array __rcu *run_array[MAX_NETNS_BPF_ATTACH_TYPE];
> +	struct bpf_prog *progs[MAX_NETNS_BPF_ATTACH_TYPE];
>  	struct bpf_link *links[MAX_NETNS_BPF_ATTACH_TYPE];
With the new run_array, I think the "*progs[]" is not needed.
It seems the original "*progs[]" is only used to tell
if it is in the prog_attach mode or the newer link mode.
There is other ways to do that.

It is something to think about when there is more clarity on how
multi netns prog will look like in the next set.

Other lgtm,

Acked-by: Martin KaFai Lau <kafai@fb.com>

Please continue the other discussion.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v3 3/4] bpf, netns: Keep a list of attached bpf_link's
  2020-06-25 14:13 ` [PATCH bpf-next v3 3/4] bpf, netns: Keep a list of attached bpf_link's Jakub Sitnicki
@ 2020-06-26  5:43   ` Martin KaFai Lau
  0 siblings, 0 replies; 14+ messages in thread
From: Martin KaFai Lau @ 2020-06-26  5:43 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, netdev, kernel-team, Andrii Nakryiko

On Thu, Jun 25, 2020 at 04:13:56PM +0200, Jakub Sitnicki wrote:
> To support multi-prog link-based attachments for new netns attach types, we
> need to keep track of more than one bpf_link per attach type. Hence,
> convert net->bpf.links into a list, that currently can be either empty or
> have just one item.
> 
> Instead of reusing bpf_prog_list from bpf-cgroup, we link together
> bpf_netns_link's themselves. This makes list management simpler as we don't
> have to allocate, initialize, and later release list elements. We can do
> this because multi-prog attachment will be available only for bpf_link, and
> we don't need to build a list of programs attached directly and indirectly
> via links.
Acked-by: Martin KaFai Lau <kafai@fb.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v3 4/4] selftests/bpf: Test updating flow_dissector link with same program
  2020-06-25 14:13 ` [PATCH bpf-next v3 4/4] selftests/bpf: Test updating flow_dissector link with same program Jakub Sitnicki
@ 2020-06-26  5:52   ` Martin KaFai Lau
  0 siblings, 0 replies; 14+ messages in thread
From: Martin KaFai Lau @ 2020-06-26  5:52 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, netdev, kernel-team

On Thu, Jun 25, 2020 at 04:13:57PM +0200, Jakub Sitnicki wrote:
> This case, while not particularly useful, is worth covering because we
> expect the operation to succeed as opposed when re-attaching the same
> program directly with PROG_ATTACH.
> 
> While at it, update the tests summary that fell out of sync when tests
> extended to cover links.
Acked-by: Martin KaFai Lau <kafai@fb.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v3 2/4] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-25 20:50   ` Andrii Nakryiko
@ 2020-06-26  9:45     ` Jakub Sitnicki
  2020-06-26 22:13       ` Andrii Nakryiko
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Sitnicki @ 2020-06-26  9:45 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, Networking, kernel-team

On Thu, Jun 25, 2020 at 10:50 PM CEST, Andrii Nakryiko wrote:
> On Thu, Jun 25, 2020 at 7:17 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Prepare for having multi-prog attachments for new netns attach types by
>> storing programs to run in a bpf_prog_array, which is well suited for
>> iterating over programs and running them in sequence.
>>
>> After this change bpf(PROG_QUERY) may block to allocate memory in
>> bpf_prog_array_copy_to_user() for collected program IDs. This forces a
>> change in how we protect access to the attached program in the query
>> callback. Because bpf_prog_array_copy_to_user() can sleep, we switch from
>> an RCU read lock to holding a mutex that serializes updaters.
>>
>> Because we allow only one BPF flow_dissector program to be attached to
>> netns at all times, the bpf_prog_array pointed by net->bpf.run_array is
>> always either detached (null) or one element long.
>>
>> No functional changes intended.
>>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>
> I wonder if instead of NULL prog_array, it's better to just use a
> dummy empty (but allocated) array. Might help eliminate some of the
> IFs, maybe even in the hot path.

That was my initial approach, which I abandoned seeing that it leads to
replacing NULL prog_array checks in flow_dissector with
bpf_prog_array_is_empty() checks to determine which netns has a BPF
program attached. So no IFs gone there.

While on the hot path, where we run the program, we probably would still
be left with an IF checking for empty prog_array to avoid building the
context if no progs will RUN.

The checks I'm referring to are on attach path, in
flow_dissector_bpf_prog_attach_check(), and hot-path,
__skb_flow_dissect().

>
>
>>  include/net/netns/bpf.h    |   5 +-
>>  kernel/bpf/net_namespace.c | 120 +++++++++++++++++++++++++------------
>>  net/core/flow_dissector.c  |  19 +++---
>>  3 files changed, 96 insertions(+), 48 deletions(-)
>>
>
> [...]
>
>
>>
>> +/* Must be called with netns_bpf_mutex held. */
>> +static int __netns_bpf_prog_query(const union bpf_attr *attr,
>> +                                 union bpf_attr __user *uattr,
>> +                                 struct net *net,
>> +                                 enum netns_bpf_attach_type type)
>> +{
>> +       __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
>> +       struct bpf_prog_array *run_array;
>> +       u32 prog_cnt = 0, flags = 0;
>> +
>> +       run_array = rcu_dereference_protected(net->bpf.run_array[type],
>> +                                             lockdep_is_held(&netns_bpf_mutex));
>> +       if (run_array)
>> +               prog_cnt = bpf_prog_array_length(run_array);
>> +
>> +       if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
>> +               return -EFAULT;
>> +       if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
>> +               return -EFAULT;
>> +       if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
>> +               return 0;
>> +
>> +       return bpf_prog_array_copy_to_user(run_array, prog_ids,
>> +                                          attr->query.prog_cnt);
>
> It doesn't seem like bpf_prog_array_copy_to_user can handle NULL run_array

Correct. And we never invoke it when run_array is NULL because then
prog_cnt == 0.

>
>> +}
>> +
>>  int netns_bpf_prog_query(const union bpf_attr *attr,
>>                          union bpf_attr __user *uattr)
>>  {
>> -       __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
>> -       u32 prog_id, prog_cnt = 0, flags = 0;
>>         enum netns_bpf_attach_type type;
>> -       struct bpf_prog *attached;
>>         struct net *net;
>> +       int ret;
>>
>>         if (attr->query.query_flags)
>>                 return -EINVAL;
>
> [...]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v3 2/4] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-26  5:41   ` Martin KaFai Lau
@ 2020-06-26 10:40     ` Jakub Sitnicki
  0 siblings, 0 replies; 14+ messages in thread
From: Jakub Sitnicki @ 2020-06-26 10:40 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: bpf, netdev, kernel-team

On Fri, Jun 26, 2020 at 07:41 AM CEST, Martin KaFai Lau wrote:
> On Thu, Jun 25, 2020 at 04:13:55PM +0200, Jakub Sitnicki wrote:
>> Prepare for having multi-prog attachments for new netns attach types by
>> storing programs to run in a bpf_prog_array, which is well suited for
>> iterating over programs and running them in sequence.
>>
>> After this change bpf(PROG_QUERY) may block to allocate memory in
>> bpf_prog_array_copy_to_user() for collected program IDs. This forces a
>> change in how we protect access to the attached program in the query
>> callback. Because bpf_prog_array_copy_to_user() can sleep, we switch from
>> an RCU read lock to holding a mutex that serializes updaters.
>>
>> Because we allow only one BPF flow_dissector program to be attached to
>> netns at all times, the bpf_prog_array pointed by net->bpf.run_array is
>> always either detached (null) or one element long.
>>
>> No functional changes intended.
>>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>  include/net/netns/bpf.h    |   5 +-
>>  kernel/bpf/net_namespace.c | 120 +++++++++++++++++++++++++------------
>>  net/core/flow_dissector.c  |  19 +++---
>>  3 files changed, 96 insertions(+), 48 deletions(-)
>>
>> diff --git a/include/net/netns/bpf.h b/include/net/netns/bpf.h
>> index a8dce2a380c8..a5015bda9979 100644
>> --- a/include/net/netns/bpf.h
>> +++ b/include/net/netns/bpf.h
>> @@ -9,9 +9,12 @@
>>  #include <linux/bpf-netns.h>
>>
>>  struct bpf_prog;
>> +struct bpf_prog_array;
>>
>>  struct netns_bpf {
>> -	struct bpf_prog __rcu *progs[MAX_NETNS_BPF_ATTACH_TYPE];
>> +	/* Array of programs to run compiled from progs or links */
>> +	struct bpf_prog_array __rcu *run_array[MAX_NETNS_BPF_ATTACH_TYPE];
>> +	struct bpf_prog *progs[MAX_NETNS_BPF_ATTACH_TYPE];
>>  	struct bpf_link *links[MAX_NETNS_BPF_ATTACH_TYPE];
> With the new run_array, I think the "*progs[]" is not needed.
> It seems the original "*progs[]" is only used to tell
> if it is in the prog_attach mode or the newer link mode.
> There is other ways to do that.
>
> It is something to think about when there is more clarity on how
> multi netns prog will look like in the next set.

Having just the run_array without *progs[] is something I've tried
initially but ended up rewriting it. The end result was confusing to me.
I couldn't convince myself to sign off on it and present it.

Adding back the pointer to bpf_prog was counterintutivive, because it is
wasteful, but it actually made the code readable.

Best I can articulate why it didn't work out great (should have tagged
the branch...) is that without *progs[] the run_array holds bpf_prog
pointers sometimes with ref-count on the prog (old mode), and sometimes
without one (new mode). This mixed state manifests itself mostly on
netns teardown, where we need to access the prog_array to put the prog,
but only if we're not using links.

Then again, perhaps I simply messsed up the code back then, and it
deserves another shot. Either way, getting rid of *progs[] is a
potential optimization.

[...]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v3 2/4] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-26  9:45     ` Jakub Sitnicki
@ 2020-06-26 22:13       ` Andrii Nakryiko
  2020-06-26 22:15         ` Andrii Nakryiko
  0 siblings, 1 reply; 14+ messages in thread
From: Andrii Nakryiko @ 2020-06-26 22:13 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team

On Fri, Jun 26, 2020 at 2:45 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Thu, Jun 25, 2020 at 10:50 PM CEST, Andrii Nakryiko wrote:
> > On Thu, Jun 25, 2020 at 7:17 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >>
> >> Prepare for having multi-prog attachments for new netns attach types by
> >> storing programs to run in a bpf_prog_array, which is well suited for
> >> iterating over programs and running them in sequence.
> >>
> >> After this change bpf(PROG_QUERY) may block to allocate memory in
> >> bpf_prog_array_copy_to_user() for collected program IDs. This forces a
> >> change in how we protect access to the attached program in the query
> >> callback. Because bpf_prog_array_copy_to_user() can sleep, we switch from
> >> an RCU read lock to holding a mutex that serializes updaters.
> >>
> >> Because we allow only one BPF flow_dissector program to be attached to
> >> netns at all times, the bpf_prog_array pointed by net->bpf.run_array is
> >> always either detached (null) or one element long.
> >>
> >> No functional changes intended.
> >>
> >> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> >> ---
> >
> > I wonder if instead of NULL prog_array, it's better to just use a
> > dummy empty (but allocated) array. Might help eliminate some of the
> > IFs, maybe even in the hot path.
>
> That was my initial approach, which I abandoned seeing that it leads to
> replacing NULL prog_array checks in flow_dissector with
> bpf_prog_array_is_empty() checks to determine which netns has a BPF
> program attached. So no IFs gone there.
>
> While on the hot path, where we run the program, we probably would still
> be left with an IF checking for empty prog_array to avoid building the
> context if no progs will RUN.
>
> The checks I'm referring to are on attach path, in
> flow_dissector_bpf_prog_attach_check(), and hot-path,
> __skb_flow_dissect().
>

Fair enough.

> >
> >
> >>  include/net/netns/bpf.h    |   5 +-
> >>  kernel/bpf/net_namespace.c | 120 +++++++++++++++++++++++++------------
> >>  net/core/flow_dissector.c  |  19 +++---
> >>  3 files changed, 96 insertions(+), 48 deletions(-)
> >>
> >
> > [...]
> >
> >
> >>
> >> +/* Must be called with netns_bpf_mutex held. */
> >> +static int __netns_bpf_prog_query(const union bpf_attr *attr,
> >> +                                 union bpf_attr __user *uattr,
> >> +                                 struct net *net,
> >> +                                 enum netns_bpf_attach_type type)
> >> +{
> >> +       __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
> >> +       struct bpf_prog_array *run_array;
> >> +       u32 prog_cnt = 0, flags = 0;
> >> +
> >> +       run_array = rcu_dereference_protected(net->bpf.run_array[type],
> >> +                                             lockdep_is_held(&netns_bpf_mutex));
> >> +       if (run_array)
> >> +               prog_cnt = bpf_prog_array_length(run_array);
> >> +
> >> +       if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
> >> +               return -EFAULT;
> >> +       if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
> >> +               return -EFAULT;
> >> +       if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
> >> +               return 0;
> >> +
> >> +       return bpf_prog_array_copy_to_user(run_array, prog_ids,
> >> +                                          attr->query.prog_cnt);
> >
> > It doesn't seem like bpf_prog_array_copy_to_user can handle NULL run_array
>
> Correct. And we never invoke it when run_array is NULL because then
> prog_cnt == 0.

Oh, that !prog_cnt above, right.. it's easy to miss.

>
> >
> >> +}
> >> +
> >>  int netns_bpf_prog_query(const union bpf_attr *attr,
> >>                          union bpf_attr __user *uattr)
> >>  {
> >> -       __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
> >> -       u32 prog_id, prog_cnt = 0, flags = 0;
> >>         enum netns_bpf_attach_type type;
> >> -       struct bpf_prog *attached;
> >>         struct net *net;
> >> +       int ret;
> >>
> >>         if (attr->query.query_flags)
> >>                 return -EINVAL;
> >
> > [...]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v3 2/4] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-26 22:13       ` Andrii Nakryiko
@ 2020-06-26 22:15         ` Andrii Nakryiko
  0 siblings, 0 replies; 14+ messages in thread
From: Andrii Nakryiko @ 2020-06-26 22:15 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team

On Fri, Jun 26, 2020 at 3:13 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Jun 26, 2020 at 2:45 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >
> > On Thu, Jun 25, 2020 at 10:50 PM CEST, Andrii Nakryiko wrote:
> > > On Thu, Jun 25, 2020 at 7:17 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> > >>
> > >> Prepare for having multi-prog attachments for new netns attach types by
> > >> storing programs to run in a bpf_prog_array, which is well suited for
> > >> iterating over programs and running them in sequence.
> > >>
> > >> After this change bpf(PROG_QUERY) may block to allocate memory in
> > >> bpf_prog_array_copy_to_user() for collected program IDs. This forces a
> > >> change in how we protect access to the attached program in the query
> > >> callback. Because bpf_prog_array_copy_to_user() can sleep, we switch from
> > >> an RCU read lock to holding a mutex that serializes updaters.
> > >>
> > >> Because we allow only one BPF flow_dissector program to be attached to
> > >> netns at all times, the bpf_prog_array pointed by net->bpf.run_array is
> > >> always either detached (null) or one element long.
> > >>
> > >> No functional changes intended.
> > >>
> > >> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> > >> ---
> > >
> > > I wonder if instead of NULL prog_array, it's better to just use a
> > > dummy empty (but allocated) array. Might help eliminate some of the
> > > IFs, maybe even in the hot path.
> >
> > That was my initial approach, which I abandoned seeing that it leads to
> > replacing NULL prog_array checks in flow_dissector with
> > bpf_prog_array_is_empty() checks to determine which netns has a BPF
> > program attached. So no IFs gone there.
> >
> > While on the hot path, where we run the program, we probably would still
> > be left with an IF checking for empty prog_array to avoid building the
> > context if no progs will RUN.
> >
> > The checks I'm referring to are on attach path, in
> > flow_dissector_bpf_prog_attach_check(), and hot-path,
> > __skb_flow_dissect().
> >
>
> Fair enough.
>

Acked-by: Andrii Nakryiko <andriin@fb.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-06-26 22:15 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-25 14:13 [PATCH bpf-next v3 0/4] bpf, netns: Prepare for multi-prog attachment Jakub Sitnicki
2020-06-25 14:13 ` [PATCH bpf-next v3 1/4] flow_dissector: Pull BPF program assignment up to bpf-netns Jakub Sitnicki
2020-06-26  1:38   ` Martin KaFai Lau
2020-06-25 14:13 ` [PATCH bpf-next v3 2/4] bpf, netns: Keep attached programs in bpf_prog_array Jakub Sitnicki
2020-06-25 20:50   ` Andrii Nakryiko
2020-06-26  9:45     ` Jakub Sitnicki
2020-06-26 22:13       ` Andrii Nakryiko
2020-06-26 22:15         ` Andrii Nakryiko
2020-06-26  5:41   ` Martin KaFai Lau
2020-06-26 10:40     ` Jakub Sitnicki
2020-06-25 14:13 ` [PATCH bpf-next v3 3/4] bpf, netns: Keep a list of attached bpf_link's Jakub Sitnicki
2020-06-26  5:43   ` Martin KaFai Lau
2020-06-25 14:13 ` [PATCH bpf-next v3 4/4] selftests/bpf: Test updating flow_dissector link with same program Jakub Sitnicki
2020-06-26  5:52   ` Martin KaFai Lau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).