All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next 0/3] bpf, netns: Prepare for multi-prog attachment
@ 2020-06-22 16:02 Jakub Sitnicki
  2020-06-22 16:02 ` [PATCH bpf-next 1/3] flow_dissector: Pull BPF program assignment up to bpf-netns Jakub Sitnicki
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jakub Sitnicki @ 2020-06-22 16:02 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team, Stanislav Fomichev

This patch set prepares ground for link-based multi-prog attachment for
future netns attach types, with BPF_SK_LOOKUP attach type in mind [0].

Two changes are needed in order to attach and run a series of BPF programs:

  1) an bpf_prog_array of programs to run (patch #2), and
  2) a list of attached links to keep track of attachments (patch #3).

I've been using these patches with the next iteration of BPF socket lookup
hook patches, and saw that they are self-contained and can be split out to
ease the review burden.

Nothing changes for BPF flow_dissector. That is at most one prog can be
attached.

Thanks,
-jkbs

[0] https://lore.kernel.org/bpf/20200511185218.1422406-1-jakub@cloudflare.com/

Cc: Stanislav Fomichev <sdf@google.com>

Jakub Sitnicki (3):
  flow_dissector: Pull BPF program assignment up to bpf-netns
  bpf, netns: Keep attached programs in bpf_prog_array
  bpf, netns: Keep a list of attached bpf_link's

 include/linux/bpf.h          |   3 +
 include/net/flow_dissector.h |   3 +-
 include/net/netns/bpf.h      |   7 +-
 kernel/bpf/core.c            |  19 +++-
 kernel/bpf/net_namespace.c   | 189 +++++++++++++++++++++++++----------
 net/core/flow_dissector.c    |  34 +++----
 6 files changed, 172 insertions(+), 83 deletions(-)

-- 
2.25.4


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH bpf-next 1/3] flow_dissector: Pull BPF program assignment up to bpf-netns
  2020-06-22 16:02 [PATCH bpf-next 0/3] bpf, netns: Prepare for multi-prog attachment Jakub Sitnicki
@ 2020-06-22 16:02 ` Jakub Sitnicki
  2020-06-23  6:02   ` Andrii Nakryiko
  2020-06-22 16:02 ` [PATCH bpf-next 2/3] bpf, netns: Keep attached programs in bpf_prog_array Jakub Sitnicki
  2020-06-22 16:03 ` [PATCH bpf-next 3/3] bpf, netns: Keep a list of attached bpf_link's Jakub Sitnicki
  2 siblings, 1 reply; 8+ messages in thread
From: Jakub Sitnicki @ 2020-06-22 16:02 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

Prepare for using bpf_prog_array to store attached programs by moving out
code that updates the attached program out of flow dissector.

Managing bpf_prog_array is more involved than updating a single bpf_prog
pointer. This will let us do it all from one place, bpf/net_namespace.c, in
the subsequent patch.

No functional change intended.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/flow_dissector.h |  3 ++-
 kernel/bpf/net_namespace.c   | 20 ++++++++++++++++++--
 net/core/flow_dissector.c    | 13 ++-----------
 3 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index a7eba43fe4e4..4b6e36288ddd 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -372,7 +372,8 @@ flow_dissector_init_keys(struct flow_dissector_key_control *key_control,
 }
 
 #ifdef CONFIG_BPF_SYSCALL
-int flow_dissector_bpf_prog_attach(struct net *net, struct bpf_prog *prog);
+int flow_dissector_bpf_prog_attach_check(struct net *net,
+					 struct bpf_prog *prog);
 #endif /* CONFIG_BPF_SYSCALL */
 
 #endif
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index 78cf061f8179..b951dab2687f 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -189,6 +189,7 @@ int netns_bpf_prog_query(const union bpf_attr *attr,
 int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
 	enum netns_bpf_attach_type type;
+	struct bpf_prog *attached;
 	struct net *net;
 	int ret;
 
@@ -207,12 +208,26 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 
 	switch (type) {
 	case NETNS_BPF_FLOW_DISSECTOR:
-		ret = flow_dissector_bpf_prog_attach(net, prog);
+		ret = flow_dissector_bpf_prog_attach_check(net, prog);
 		break;
 	default:
 		ret = -EINVAL;
 		break;
 	}
+	if (ret)
+		goto out_unlock;
+
+	attached = rcu_dereference_protected(net->bpf.progs[type],
+					     lockdep_is_held(&netns_bpf_mutex));
+	if (attached == prog) {
+		/* The same program cannot be attached twice */
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+	rcu_assign_pointer(net->bpf.progs[type], prog);
+	if (attached)
+		bpf_prog_put(attached);
+
 out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
 
@@ -277,7 +292,7 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 
 	switch (type) {
 	case NETNS_BPF_FLOW_DISSECTOR:
-		err = flow_dissector_bpf_prog_attach(net, link->prog);
+		err = flow_dissector_bpf_prog_attach_check(net, link->prog);
 		break;
 	default:
 		err = -EINVAL;
@@ -286,6 +301,7 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 	if (err)
 		goto out_unlock;
 
+	rcu_assign_pointer(net->bpf.progs[type], link->prog);
 	net->bpf.links[type] = link;
 
 out_unlock:
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index d02df0b6d0d9..b57fb1359395 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -70,10 +70,10 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
 EXPORT_SYMBOL(skb_flow_dissector_init);
 
 #ifdef CONFIG_BPF_SYSCALL
-int flow_dissector_bpf_prog_attach(struct net *net, struct bpf_prog *prog)
+int flow_dissector_bpf_prog_attach_check(struct net *net,
+					 struct bpf_prog *prog)
 {
 	enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
-	struct bpf_prog *attached;
 
 	if (net == &init_net) {
 		/* BPF flow dissector in the root namespace overrides
@@ -97,15 +97,6 @@ int flow_dissector_bpf_prog_attach(struct net *net, struct bpf_prog *prog)
 			return -EEXIST;
 	}
 
-	attached = rcu_dereference_protected(net->bpf.progs[type],
-					     lockdep_is_held(&netns_bpf_mutex));
-	if (attached == prog)
-		/* The same program cannot be attached twice */
-		return -EINVAL;
-
-	rcu_assign_pointer(net->bpf.progs[type], prog);
-	if (attached)
-		bpf_prog_put(attached);
 	return 0;
 }
 #endif /* CONFIG_BPF_SYSCALL */
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH bpf-next 2/3] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-22 16:02 [PATCH bpf-next 0/3] bpf, netns: Prepare for multi-prog attachment Jakub Sitnicki
  2020-06-22 16:02 ` [PATCH bpf-next 1/3] flow_dissector: Pull BPF program assignment up to bpf-netns Jakub Sitnicki
@ 2020-06-22 16:02 ` Jakub Sitnicki
  2020-06-23  6:23   ` Andrii Nakryiko
  2020-06-22 16:03 ` [PATCH bpf-next 3/3] bpf, netns: Keep a list of attached bpf_link's Jakub Sitnicki
  2 siblings, 1 reply; 8+ messages in thread
From: Jakub Sitnicki @ 2020-06-22 16:02 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

Prepare for having multi-prog attachments for new netns attach types by
storing programs to run in a bpf_prog_array, which is well suited for
iterating over programs and running them in sequence.

Because bpf_prog_array is dynamically resized, after this change a
potentially blocking memory allocation in bpf(PROG_QUERY) callback can
happen, in order to collect program IDs before copying the values to
user-space supplied buffer. This forces us to adapt how we protect access
to the attached program in the callback. As bpf_prog_array_copy_to_user()
helper can sleep, we switch from an RCU read lock to holding a mutex that
serializes updaters.

To handle bpf(PROG_ATTACH) scenario when we are replacing an already
attached program, we introduce a new bpf_prog_array helper called
bpf_prog_array_replace_item that will exchange the old program with a new
one. bpf-cgroup does away with such helper by computing an index into the
array based on program position in an external list of attached
programs/links. Such approach seems fragile, however, when dummy progs can
be left in the array after a memory allocation failure on link release.

No functional changes intended.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/linux/bpf.h        |   3 +
 include/net/netns/bpf.h    |   5 +-
 kernel/bpf/core.c          |  19 +++--
 kernel/bpf/net_namespace.c | 137 +++++++++++++++++++++++++++----------
 net/core/flow_dissector.c  |  21 +++---
 5 files changed, 131 insertions(+), 54 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 07052d44bca1..a3d2652185ce 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -924,6 +924,9 @@ int bpf_prog_array_copy(struct bpf_prog_array *old_array,
 			struct bpf_prog *exclude_prog,
 			struct bpf_prog *include_prog,
 			struct bpf_prog_array **new_array);
+int bpf_prog_array_replace_item(struct bpf_prog_array *array,
+				struct bpf_prog *old_prog,
+				struct bpf_prog *new_prog);
 
 #define __BPF_PROG_RUN_ARRAY(array, ctx, func, check_non_null)	\
 	({						\
diff --git a/include/net/netns/bpf.h b/include/net/netns/bpf.h
index a8dce2a380c8..a5015bda9979 100644
--- a/include/net/netns/bpf.h
+++ b/include/net/netns/bpf.h
@@ -9,9 +9,12 @@
 #include <linux/bpf-netns.h>
 
 struct bpf_prog;
+struct bpf_prog_array;
 
 struct netns_bpf {
-	struct bpf_prog __rcu *progs[MAX_NETNS_BPF_ATTACH_TYPE];
+	/* Array of programs to run compiled from progs or links */
+	struct bpf_prog_array __rcu *run_array[MAX_NETNS_BPF_ATTACH_TYPE];
+	struct bpf_prog *progs[MAX_NETNS_BPF_ATTACH_TYPE];
 	struct bpf_link *links[MAX_NETNS_BPF_ATTACH_TYPE];
 };
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 9df4cc9a2907..7798093c901d 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1946,16 +1946,25 @@ int bpf_prog_array_copy_to_user(struct bpf_prog_array *array,
 	return 0;
 }
 
-void bpf_prog_array_delete_safe(struct bpf_prog_array *array,
-				struct bpf_prog *old_prog)
+int bpf_prog_array_replace_item(struct bpf_prog_array *array,
+				struct bpf_prog *old_prog,
+				struct bpf_prog *new_prog)
 {
 	struct bpf_prog_array_item *item;
 
-	for (item = array->items; item->prog; item++)
+	for (item = array->items; item->prog; item++) {
 		if (item->prog == old_prog) {
-			WRITE_ONCE(item->prog, &dummy_bpf_prog.prog);
-			break;
+			WRITE_ONCE(item->prog, new_prog);
+			return 0;
 		}
+	}
+	return -ENOENT;
+}
+
+void bpf_prog_array_delete_safe(struct bpf_prog_array *array,
+				struct bpf_prog *old_prog)
+{
+	bpf_prog_array_replace_item(array, old_prog, &dummy_bpf_prog.prog);
 }
 
 int bpf_prog_array_copy(struct bpf_prog_array *old_array,
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index b951dab2687f..593523a22168 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -38,7 +38,9 @@ static void bpf_netns_link_release(struct bpf_link *link)
 	struct bpf_netns_link *net_link =
 		container_of(link, struct bpf_netns_link, link);
 	enum netns_bpf_attach_type type = net_link->netns_type;
+	struct bpf_prog_array *old_array, *new_array;
 	struct net *net;
+	int err;
 
 	/* Link auto-detached by dying netns. */
 	if (!net_link->net)
@@ -54,8 +56,20 @@ static void bpf_netns_link_release(struct bpf_link *link)
 	if (!net)
 		goto out_unlock;
 
+	/* Rebuild run array without the prog or replace it with a
+	 * dummy one because we cannot fail during link release.
+	 */
+	old_array = rcu_dereference_protected(net->bpf.run_array[type],
+					      lockdep_is_held(&netns_bpf_mutex));
+	err = bpf_prog_array_copy(old_array, link->prog, NULL, &new_array);
+	if (err) {
+		bpf_prog_array_delete_safe(old_array, link->prog);
+	} else {
+		rcu_assign_pointer(net->bpf.run_array[type], new_array);
+		bpf_prog_array_free(old_array);
+	}
+
 	net->bpf.links[type] = NULL;
-	RCU_INIT_POINTER(net->bpf.progs[type], NULL);
 
 out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
@@ -76,6 +90,7 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
 	struct bpf_netns_link *net_link =
 		container_of(link, struct bpf_netns_link, link);
 	enum netns_bpf_attach_type type = net_link->netns_type;
+	struct bpf_prog_array *run_array;
 	struct net *net;
 	int ret = 0;
 
@@ -93,8 +108,16 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
 		goto out_unlock;
 	}
 
+	run_array = rcu_dereference_protected(net->bpf.run_array[type],
+					      lockdep_is_held(&netns_bpf_mutex));
+	if (run_array)
+		ret = bpf_prog_array_replace_item(run_array, link->prog, new_prog);
+	else
+		ret = -ENOENT;
+	if (ret)
+		goto out_unlock;
+
 	old_prog = xchg(&link->prog, new_prog);
-	rcu_assign_pointer(net->bpf.progs[type], new_prog);
 	bpf_prog_put(old_prog);
 
 out_unlock:
@@ -142,14 +165,38 @@ static const struct bpf_link_ops bpf_netns_link_ops = {
 	.show_fdinfo = bpf_netns_link_show_fdinfo,
 };
 
+/* Must be called with netns_bpf_mutex held. */
+static int __netns_bpf_prog_query(const union bpf_attr *attr,
+				  union bpf_attr __user *uattr,
+				  struct net *net,
+				  enum netns_bpf_attach_type type)
+{
+	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
+	struct bpf_prog_array *run_array;
+	u32 prog_cnt = 0, flags = 0;
+
+	run_array = rcu_dereference_protected(net->bpf.run_array[type],
+					      lockdep_is_held(&netns_bpf_mutex));
+	if (run_array)
+		prog_cnt = bpf_prog_array_length(run_array);
+
+	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
+		return -EFAULT;
+	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
+		return -EFAULT;
+	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
+		return 0;
+
+	return bpf_prog_array_copy_to_user(run_array, prog_ids,
+					   attr->query.prog_cnt);
+}
+
 int netns_bpf_prog_query(const union bpf_attr *attr,
 			 union bpf_attr __user *uattr)
 {
-	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
-	u32 prog_id, prog_cnt = 0, flags = 0;
 	enum netns_bpf_attach_type type;
-	struct bpf_prog *attached;
 	struct net *net;
+	int ret;
 
 	if (attr->query.query_flags)
 		return -EINVAL;
@@ -162,32 +209,17 @@ int netns_bpf_prog_query(const union bpf_attr *attr,
 	if (IS_ERR(net))
 		return PTR_ERR(net);
 
-	rcu_read_lock();
-	attached = rcu_dereference(net->bpf.progs[type]);
-	if (attached) {
-		prog_cnt = 1;
-		prog_id = attached->aux->id;
-	}
-	rcu_read_unlock();
+	mutex_lock(&netns_bpf_mutex);
+	ret = __netns_bpf_prog_query(attr, uattr, net, type);
+	mutex_unlock(&netns_bpf_mutex);
 
 	put_net(net);
-
-	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
-		return -EFAULT;
-	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
-		return -EFAULT;
-
-	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
-		return 0;
-
-	if (copy_to_user(prog_ids, &prog_id, sizeof(u32)))
-		return -EFAULT;
-
-	return 0;
+	return ret;
 }
 
 int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
+	struct bpf_prog_array *run_array;
 	enum netns_bpf_attach_type type;
 	struct bpf_prog *attached;
 	struct net *net;
@@ -217,14 +249,25 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 	if (ret)
 		goto out_unlock;
 
-	attached = rcu_dereference_protected(net->bpf.progs[type],
-					     lockdep_is_held(&netns_bpf_mutex));
+	attached = net->bpf.progs[type];
 	if (attached == prog) {
 		/* The same program cannot be attached twice */
 		ret = -EINVAL;
 		goto out_unlock;
 	}
-	rcu_assign_pointer(net->bpf.progs[type], prog);
+
+	run_array = rcu_dereference_protected(net->bpf.run_array[type],
+					      lockdep_is_held(&netns_bpf_mutex));
+	if (run_array) {
+		ret = bpf_prog_array_replace_item(run_array, attached, prog);
+	} else {
+		ret = bpf_prog_array_copy(NULL, NULL, prog, &run_array);
+		rcu_assign_pointer(net->bpf.run_array[type], run_array);
+	}
+	if (ret)
+		goto out_unlock;
+
+	net->bpf.progs[type] = prog;
 	if (attached)
 		bpf_prog_put(attached);
 
@@ -234,6 +277,18 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 	return ret;
 }
 
+/* Must be called with netns_bpf_mutex held. */
+static void netns_bpf_run_array_detach(struct net *net,
+				       enum netns_bpf_attach_type type)
+{
+	struct bpf_prog_array *run_array;
+
+	run_array = rcu_dereference_protected(net->bpf.run_array[type],
+					      lockdep_is_held(&netns_bpf_mutex));
+	RCU_INIT_POINTER(net->bpf.run_array[type], NULL);
+	bpf_prog_array_free(run_array);
+}
+
 /* Must be called with netns_bpf_mutex held. */
 static int __netns_bpf_prog_detach(struct net *net,
 				   enum netns_bpf_attach_type type)
@@ -244,11 +299,11 @@ static int __netns_bpf_prog_detach(struct net *net,
 	if (net->bpf.links[type])
 		return -EINVAL;
 
-	attached = rcu_dereference_protected(net->bpf.progs[type],
-					     lockdep_is_held(&netns_bpf_mutex));
+	attached = net->bpf.progs[type];
 	if (!attached)
 		return -ENOENT;
-	RCU_INIT_POINTER(net->bpf.progs[type], NULL);
+	netns_bpf_run_array_detach(net, type);
+	net->bpf.progs[type] = NULL;
 	bpf_prog_put(attached);
 	return 0;
 }
@@ -272,7 +327,7 @@ int netns_bpf_prog_detach(const union bpf_attr *attr)
 static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 				 enum netns_bpf_attach_type type)
 {
-	struct bpf_prog *prog;
+	struct bpf_prog_array *old_array, *new_array;
 	int err;
 
 	mutex_lock(&netns_bpf_mutex);
@@ -283,9 +338,7 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 		goto out_unlock;
 	}
 	/* Links are not compatible with attaching prog directly */
-	prog = rcu_dereference_protected(net->bpf.progs[type],
-					 lockdep_is_held(&netns_bpf_mutex));
-	if (prog) {
+	if (net->bpf.progs[type]) {
 		err = -EEXIST;
 		goto out_unlock;
 	}
@@ -301,7 +354,14 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 	if (err)
 		goto out_unlock;
 
-	rcu_assign_pointer(net->bpf.progs[type], link->prog);
+	old_array = rcu_dereference_protected(net->bpf.run_array[type],
+					      lockdep_is_held(&netns_bpf_mutex));
+	err = bpf_prog_array_copy(old_array, NULL, link->prog, &new_array);
+	if (err)
+		goto out_unlock;
+	rcu_assign_pointer(net->bpf.run_array[type], new_array);
+	bpf_prog_array_free(old_array);
+
 	net->bpf.links[type] = link;
 
 out_unlock:
@@ -368,11 +428,12 @@ static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
 
 	mutex_lock(&netns_bpf_mutex);
 	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
+		netns_bpf_run_array_detach(net, type);
 		link = net->bpf.links[type];
 		if (link)
 			bpf_netns_link_auto_detach(link);
-		else
-			__netns_bpf_prog_detach(net, type);
+		else if (net->bpf.progs[type])
+			bpf_prog_put(net->bpf.progs[type]);
 	}
 	mutex_unlock(&netns_bpf_mutex);
 }
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index b57fb1359395..e501aefb9a0e 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -86,14 +86,14 @@ int flow_dissector_bpf_prog_attach_check(struct net *net,
 		for_each_net(ns) {
 			if (ns == &init_net)
 				continue;
-			if (rcu_access_pointer(ns->bpf.progs[type]))
+			if (rcu_access_pointer(ns->bpf.run_array[type]))
 				return -EEXIST;
 		}
 	} else {
 		/* Make sure root flow dissector is not attached
 		 * when attaching to the non-root namespace.
 		 */
-		if (rcu_access_pointer(init_net.bpf.progs[type]))
+		if (rcu_access_pointer(init_net.bpf.run_array[type]))
 			return -EEXIST;
 	}
 
@@ -894,7 +894,6 @@ bool __skb_flow_dissect(const struct net *net,
 	struct flow_dissector_key_addrs *key_addrs;
 	struct flow_dissector_key_tags *key_tags;
 	struct flow_dissector_key_vlan *key_vlan;
-	struct bpf_prog *attached = NULL;
 	enum flow_dissect_ret fdret;
 	enum flow_dissector_key_id dissector_vlan = FLOW_DISSECTOR_KEY_MAX;
 	bool mpls_el = false;
@@ -951,14 +950,14 @@ bool __skb_flow_dissect(const struct net *net,
 	WARN_ON_ONCE(!net);
 	if (net) {
 		enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
+		struct bpf_prog_array *run_array;
 
 		rcu_read_lock();
-		attached = rcu_dereference(init_net.bpf.progs[type]);
+		run_array = rcu_dereference(init_net.bpf.run_array[type]);
+		if (!run_array)
+			run_array = rcu_dereference(net->bpf.run_array[type]);
 
-		if (!attached)
-			attached = rcu_dereference(net->bpf.progs[type]);
-
-		if (attached) {
+		if (run_array) {
 			struct bpf_flow_keys flow_keys;
 			struct bpf_flow_dissector ctx = {
 				.flow_keys = &flow_keys,
@@ -966,6 +965,7 @@ bool __skb_flow_dissect(const struct net *net,
 				.data_end = data + hlen,
 			};
 			__be16 n_proto = proto;
+			struct bpf_prog *prog;
 
 			if (skb) {
 				ctx.skb = skb;
@@ -976,8 +976,9 @@ bool __skb_flow_dissect(const struct net *net,
 				n_proto = skb->protocol;
 			}
 
-			ret = bpf_flow_dissect(attached, &ctx, n_proto, nhoff,
-					       hlen, flags);
+			prog = READ_ONCE(run_array->items[0].prog);
+			ret = bpf_flow_dissect(prog, &ctx, n_proto,
+					       nhoff, hlen, flags);
 			__skb_flow_bpf_to_target(&flow_keys, flow_dissector,
 						 target_container);
 			rcu_read_unlock();
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH bpf-next 3/3] bpf, netns: Keep a list of attached bpf_link's
  2020-06-22 16:02 [PATCH bpf-next 0/3] bpf, netns: Prepare for multi-prog attachment Jakub Sitnicki
  2020-06-22 16:02 ` [PATCH bpf-next 1/3] flow_dissector: Pull BPF program assignment up to bpf-netns Jakub Sitnicki
  2020-06-22 16:02 ` [PATCH bpf-next 2/3] bpf, netns: Keep attached programs in bpf_prog_array Jakub Sitnicki
@ 2020-06-22 16:03 ` Jakub Sitnicki
  2020-06-23  6:26   ` Andrii Nakryiko
  2 siblings, 1 reply; 8+ messages in thread
From: Jakub Sitnicki @ 2020-06-22 16:03 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

To support multi-prog link-based attachments for new netns attach types, we
need to keep track of more than one bpf_link per attach type. Hence,
convert net->bpf.links into a list, that currently can be either empty or
have just one item.

Instead of reusing bpf_prog_list from bpf-cgroup, we link together
bpf_netns_link's themselves. This makes list management simpler as we don't
have to allocate, initialize, and later release list elements. We can do
this because multi-prog attachment will be available only for bpf_link, and
we don't need to build a list of programs attached directly and indirectly
via links.

No functional changes intended.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/netns/bpf.h    |  2 +-
 kernel/bpf/net_namespace.c | 42 +++++++++++++++++++++-----------------
 2 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/include/net/netns/bpf.h b/include/net/netns/bpf.h
index a5015bda9979..0ca6a1b87185 100644
--- a/include/net/netns/bpf.h
+++ b/include/net/netns/bpf.h
@@ -15,7 +15,7 @@ struct netns_bpf {
 	/* Array of programs to run compiled from progs or links */
 	struct bpf_prog_array __rcu *run_array[MAX_NETNS_BPF_ATTACH_TYPE];
 	struct bpf_prog *progs[MAX_NETNS_BPF_ATTACH_TYPE];
-	struct bpf_link *links[MAX_NETNS_BPF_ATTACH_TYPE];
+	struct list_head links[MAX_NETNS_BPF_ATTACH_TYPE];
 };
 
 #endif /* __NETNS_BPF_H__ */
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index 593523a22168..3e89c7ad42cb 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -19,20 +19,12 @@ struct bpf_netns_link {
 	 * with netns_bpf_mutex held.
 	 */
 	struct net *net;
+	struct list_head node; /* node in list of links attached to net */
 };
 
 /* Protects updates to netns_bpf */
 DEFINE_MUTEX(netns_bpf_mutex);
 
-/* Must be called with netns_bpf_mutex held. */
-static void __net_exit bpf_netns_link_auto_detach(struct bpf_link *link)
-{
-	struct bpf_netns_link *net_link =
-		container_of(link, struct bpf_netns_link, link);
-
-	net_link->net = NULL;
-}
-
 static void bpf_netns_link_release(struct bpf_link *link)
 {
 	struct bpf_netns_link *net_link =
@@ -69,7 +61,7 @@ static void bpf_netns_link_release(struct bpf_link *link)
 		bpf_prog_array_free(old_array);
 	}
 
-	net->bpf.links[type] = NULL;
+	list_del(&net_link->node);
 
 out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
@@ -233,7 +225,7 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 	mutex_lock(&netns_bpf_mutex);
 
 	/* Attaching prog directly is not compatible with links */
-	if (net->bpf.links[type]) {
+	if (!list_empty(&net->bpf.links[type])) {
 		ret = -EEXIST;
 		goto out_unlock;
 	}
@@ -296,7 +288,7 @@ static int __netns_bpf_prog_detach(struct net *net,
 	struct bpf_prog *attached;
 
 	/* Progs attached via links cannot be detached */
-	if (net->bpf.links[type])
+	if (!list_empty(&net->bpf.links[type]))
 		return -EINVAL;
 
 	attached = net->bpf.progs[type];
@@ -327,13 +319,15 @@ int netns_bpf_prog_detach(const union bpf_attr *attr)
 static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 				 enum netns_bpf_attach_type type)
 {
+	struct bpf_netns_link *net_link =
+		container_of(link, struct bpf_netns_link, link);
 	struct bpf_prog_array *old_array, *new_array;
 	int err;
 
 	mutex_lock(&netns_bpf_mutex);
 
 	/* Allow attaching only one prog or link for now */
-	if (net->bpf.links[type]) {
+	if (!list_empty(&net->bpf.links[type])) {
 		err = -E2BIG;
 		goto out_unlock;
 	}
@@ -362,7 +356,7 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 	rcu_assign_pointer(net->bpf.run_array[type], new_array);
 	bpf_prog_array_free(old_array);
 
-	net->bpf.links[type] = link;
+	list_add_tail(&net_link->node, &net->bpf.links[type]);
 
 out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
@@ -421,24 +415,34 @@ int netns_bpf_link_create(const union bpf_attr *attr, struct bpf_prog *prog)
 	return err;
 }
 
+static int __net_init netns_bpf_pernet_init(struct net *net)
+{
+	int type;
+
+	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++)
+		INIT_LIST_HEAD(&net->bpf.links[type]);
+
+	return 0;
+}
+
 static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
 {
 	enum netns_bpf_attach_type type;
-	struct bpf_link *link;
+	struct bpf_netns_link *net_link;
 
 	mutex_lock(&netns_bpf_mutex);
 	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
 		netns_bpf_run_array_detach(net, type);
-		link = net->bpf.links[type];
-		if (link)
-			bpf_netns_link_auto_detach(link);
-		else if (net->bpf.progs[type])
+		list_for_each_entry(net_link, &net->bpf.links[type], node)
+			net_link->net = NULL; /* auto-detach link */
+		if (net->bpf.progs[type])
 			bpf_prog_put(net->bpf.progs[type]);
 	}
 	mutex_unlock(&netns_bpf_mutex);
 }
 
 static struct pernet_operations netns_bpf_pernet_ops __net_initdata = {
+	.init = netns_bpf_pernet_init,
 	.pre_exit = netns_bpf_pernet_pre_exit,
 };
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH bpf-next 1/3] flow_dissector: Pull BPF program assignment up to bpf-netns
  2020-06-22 16:02 ` [PATCH bpf-next 1/3] flow_dissector: Pull BPF program assignment up to bpf-netns Jakub Sitnicki
@ 2020-06-23  6:02   ` Andrii Nakryiko
  0 siblings, 0 replies; 8+ messages in thread
From: Andrii Nakryiko @ 2020-06-23  6:02 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team

On Mon, Jun 22, 2020 at 9:03 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Prepare for using bpf_prog_array to store attached programs by moving out
> code that updates the attached program out of flow dissector.
>
> Managing bpf_prog_array is more involved than updating a single bpf_prog
> pointer. This will let us do it all from one place, bpf/net_namespace.c, in
> the subsequent patch.
>
> No functional change intended.
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---

LGTM.

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  include/net/flow_dissector.h |  3 ++-
>  kernel/bpf/net_namespace.c   | 20 ++++++++++++++++++--
>  net/core/flow_dissector.c    | 13 ++-----------
>  3 files changed, 22 insertions(+), 14 deletions(-)
>

[...]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH bpf-next 2/3] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-22 16:02 ` [PATCH bpf-next 2/3] bpf, netns: Keep attached programs in bpf_prog_array Jakub Sitnicki
@ 2020-06-23  6:23   ` Andrii Nakryiko
  2020-06-23 10:51     ` Jakub Sitnicki
  0 siblings, 1 reply; 8+ messages in thread
From: Andrii Nakryiko @ 2020-06-23  6:23 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team

On Mon, Jun 22, 2020 at 9:04 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Prepare for having multi-prog attachments for new netns attach types by
> storing programs to run in a bpf_prog_array, which is well suited for
> iterating over programs and running them in sequence.
>
> Because bpf_prog_array is dynamically resized, after this change a
> potentially blocking memory allocation in bpf(PROG_QUERY) callback can
> happen, in order to collect program IDs before copying the values to
> user-space supplied buffer. This forces us to adapt how we protect access
> to the attached program in the callback. As bpf_prog_array_copy_to_user()
> helper can sleep, we switch from an RCU read lock to holding a mutex that
> serializes updaters.
>
> To handle bpf(PROG_ATTACH) scenario when we are replacing an already
> attached program, we introduce a new bpf_prog_array helper called
> bpf_prog_array_replace_item that will exchange the old program with a new
> one. bpf-cgroup does away with such helper by computing an index into the
> array based on program position in an external list of attached
> programs/links. Such approach seems fragile, however, when dummy progs can
> be left in the array after a memory allocation failure on link release.

bpf-cgroup can have the same BPF program present multiple times in the
effective prog array due to inheritance. It also has strict
guarantee/requirement about relative order of programs in parent
cgroup vs child cgroups. For such cases, replacing a BPF program based
on its pointer is not going to work correctly.

We do need to make sure that cgroup detachment never fails by falling
back to replacing BPF prog with dummy prog, though. If you are
interested in a challenge, you are very welcome to do that! :)

>
> No functional changes intended.
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---

LGTM.

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  include/linux/bpf.h        |   3 +
>  include/net/netns/bpf.h    |   5 +-
>  kernel/bpf/core.c          |  19 +++--
>  kernel/bpf/net_namespace.c | 137 +++++++++++++++++++++++++++----------
>  net/core/flow_dissector.c  |  21 +++---
>  5 files changed, 131 insertions(+), 54 deletions(-)
>

[...]

> +
> +void bpf_prog_array_delete_safe(struct bpf_prog_array *array,
> +                               struct bpf_prog *old_prog)
> +{
> +       bpf_prog_array_replace_item(array, old_prog, &dummy_bpf_prog.prog);

nit: (void) cast to show we don't care about return code?

[...]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH bpf-next 3/3] bpf, netns: Keep a list of attached bpf_link's
  2020-06-22 16:03 ` [PATCH bpf-next 3/3] bpf, netns: Keep a list of attached bpf_link's Jakub Sitnicki
@ 2020-06-23  6:26   ` Andrii Nakryiko
  0 siblings, 0 replies; 8+ messages in thread
From: Andrii Nakryiko @ 2020-06-23  6:26 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team

On Mon, Jun 22, 2020 at 9:04 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> To support multi-prog link-based attachments for new netns attach types, we
> need to keep track of more than one bpf_link per attach type. Hence,
> convert net->bpf.links into a list, that currently can be either empty or
> have just one item.
>
> Instead of reusing bpf_prog_list from bpf-cgroup, we link together
> bpf_netns_link's themselves. This makes list management simpler as we don't
> have to allocate, initialize, and later release list elements. We can do
> this because multi-prog attachment will be available only for bpf_link, and
> we don't need to build a list of programs attached directly and indirectly
> via links.
>
> No functional changes intended.
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  include/net/netns/bpf.h    |  2 +-
>  kernel/bpf/net_namespace.c | 42 +++++++++++++++++++++-----------------
>  2 files changed, 24 insertions(+), 20 deletions(-)
>

[...]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH bpf-next 2/3] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-23  6:23   ` Andrii Nakryiko
@ 2020-06-23 10:51     ` Jakub Sitnicki
  0 siblings, 0 replies; 8+ messages in thread
From: Jakub Sitnicki @ 2020-06-23 10:51 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, Networking, kernel-team

On Tue, Jun 23, 2020 at 08:23 AM CEST, Andrii Nakryiko wrote:
> On Mon, Jun 22, 2020 at 9:04 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Prepare for having multi-prog attachments for new netns attach types by
>> storing programs to run in a bpf_prog_array, which is well suited for
>> iterating over programs and running them in sequence.
>>
>> Because bpf_prog_array is dynamically resized, after this change a
>> potentially blocking memory allocation in bpf(PROG_QUERY) callback can
>> happen, in order to collect program IDs before copying the values to
>> user-space supplied buffer. This forces us to adapt how we protect access
>> to the attached program in the callback. As bpf_prog_array_copy_to_user()
>> helper can sleep, we switch from an RCU read lock to holding a mutex that
>> serializes updaters.
>>
>> To handle bpf(PROG_ATTACH) scenario when we are replacing an already
>> attached program, we introduce a new bpf_prog_array helper called
>> bpf_prog_array_replace_item that will exchange the old program with a new
>> one. bpf-cgroup does away with such helper by computing an index into the
>> array based on program position in an external list of attached
>> programs/links. Such approach seems fragile, however, when dummy progs can
>> be left in the array after a memory allocation failure on link release.
>
> bpf-cgroup can have the same BPF program present multiple times in the
> effective prog array due to inheritance. It also has strict
> guarantee/requirement about relative order of programs in parent
> cgroup vs child cgroups. For such cases, replacing a BPF program based
> on its pointer is not going to work correctly.

Thanks for the explanation. That did not occur to me. Incorporated it
into the description in v2.

>
> We do need to make sure that cgroup detachment never fails by falling
> back to replacing BPF prog with dummy prog, though. If you are
> interested in a challenge, you are very welcome to do that! :)

I keep a list of tasks for a slow day.

[...]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-06-23 10:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-22 16:02 [PATCH bpf-next 0/3] bpf, netns: Prepare for multi-prog attachment Jakub Sitnicki
2020-06-22 16:02 ` [PATCH bpf-next 1/3] flow_dissector: Pull BPF program assignment up to bpf-netns Jakub Sitnicki
2020-06-23  6:02   ` Andrii Nakryiko
2020-06-22 16:02 ` [PATCH bpf-next 2/3] bpf, netns: Keep attached programs in bpf_prog_array Jakub Sitnicki
2020-06-23  6:23   ` Andrii Nakryiko
2020-06-23 10:51     ` Jakub Sitnicki
2020-06-22 16:03 ` [PATCH bpf-next 3/3] bpf, netns: Keep a list of attached bpf_link's Jakub Sitnicki
2020-06-23  6:26   ` Andrii Nakryiko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.