[PATCH bpf-next v2 0/3] bpf, netns: Prepare for multi-prog attachment

bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH bpf-next v2 0/3] bpf, netns: Prepare for multi-prog attachment
@ 2020-06-23 10:34 Jakub Sitnicki
  2020-06-23 10:34 ` [PATCH bpf-next v2 1/3] flow_dissector: Pull BPF program assignment up to bpf-netns Jakub Sitnicki
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Jakub Sitnicki @ 2020-06-23 10:34 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team, Andrii Nakryiko, Stanislav Fomichev

This patch set prepares ground for link-based multi-prog attachment for
future netns attach types, with BPF_SK_LOOKUP attach type in mind [0].

Two changes are needed in order to attach and run a series of BPF programs:

  1) an bpf_prog_array of programs to run (patch #2), and
  2) a list of attached links to keep track of attachments (patch #3).

I've been using these patches with the next iteration of BPF socket lookup
hook patches, and saw that they are self-contained and can be split out to
ease the review burden.

Nothing changes for BPF flow_dissector. That is at most one prog can be
attached.

Thanks,
-jkbs

[0] https://lore.kernel.org/bpf/20200511185218.1422406-1-jakub@cloudflare.com/

Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Stanislav Fomichev <sdf@google.com>

v1 -> v2:

- Show with a (void) cast that bpf_prog_array_replace_item() return value
  is ignored on purpose. (Andrii)
- Explain why bpf-cgroup cannot replace programs in bpf_prog_array based
  on bpf_prog pointer comparison in patch #2 description. (Andrii)

Jakub Sitnicki (3):
  flow_dissector: Pull BPF program assignment up to bpf-netns
  bpf, netns: Keep attached programs in bpf_prog_array
  bpf, netns: Keep a list of attached bpf_link's

 include/linux/bpf.h          |   3 +
 include/net/flow_dissector.h |   3 +-
 include/net/netns/bpf.h      |   7 +-
 kernel/bpf/core.c            |  20 +++-
 kernel/bpf/net_namespace.c   | 189 +++++++++++++++++++++++++----------
 net/core/flow_dissector.c    |  34 +++----
 6 files changed, 173 insertions(+), 83 deletions(-)

-- 
2.25.4


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH bpf-next v2 1/3] flow_dissector: Pull BPF program assignment up to bpf-netns
  2020-06-23 10:34 [PATCH bpf-next v2 0/3] bpf, netns: Prepare for multi-prog attachment Jakub Sitnicki
@ 2020-06-23 10:34 ` Jakub Sitnicki
  2020-06-23 10:34 ` [PATCH bpf-next v2 2/3] bpf, netns: Keep attached programs in bpf_prog_array Jakub Sitnicki
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Jakub Sitnicki @ 2020-06-23 10:34 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team, Andrii Nakryiko

Prepare for using bpf_prog_array to store attached programs by moving out
code that updates the attached program out of flow dissector.

Managing bpf_prog_array is more involved than updating a single bpf_prog
pointer. This will let us do it all from one place, bpf/net_namespace.c, in
the subsequent patch.

No functional change intended.

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/flow_dissector.h |  3 ++-
 kernel/bpf/net_namespace.c   | 20 ++++++++++++++++++--
 net/core/flow_dissector.c    | 13 ++-----------
 3 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index a7eba43fe4e4..4b6e36288ddd 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -372,7 +372,8 @@ flow_dissector_init_keys(struct flow_dissector_key_control *key_control,
 }
 
 #ifdef CONFIG_BPF_SYSCALL
-int flow_dissector_bpf_prog_attach(struct net *net, struct bpf_prog *prog);
+int flow_dissector_bpf_prog_attach_check(struct net *net,
+					 struct bpf_prog *prog);
 #endif /* CONFIG_BPF_SYSCALL */
 
 #endif
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index 78cf061f8179..b951dab2687f 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -189,6 +189,7 @@ int netns_bpf_prog_query(const union bpf_attr *attr,
 int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
 	enum netns_bpf_attach_type type;
+	struct bpf_prog *attached;
 	struct net *net;
 	int ret;
 
@@ -207,12 +208,26 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 
 	switch (type) {
 	case NETNS_BPF_FLOW_DISSECTOR:
-		ret = flow_dissector_bpf_prog_attach(net, prog);
+		ret = flow_dissector_bpf_prog_attach_check(net, prog);
 		break;
 	default:
 		ret = -EINVAL;
 		break;
 	}
+	if (ret)
+		goto out_unlock;
+
+	attached = rcu_dereference_protected(net->bpf.progs[type],
+					     lockdep_is_held(&netns_bpf_mutex));
+	if (attached == prog) {
+		/* The same program cannot be attached twice */
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+	rcu_assign_pointer(net->bpf.progs[type], prog);
+	if (attached)
+		bpf_prog_put(attached);
+
 out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
 
@@ -277,7 +292,7 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 
 	switch (type) {
 	case NETNS_BPF_FLOW_DISSECTOR:
-		err = flow_dissector_bpf_prog_attach(net, link->prog);
+		err = flow_dissector_bpf_prog_attach_check(net, link->prog);
 		break;
 	default:
 		err = -EINVAL;
@@ -286,6 +301,7 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 	if (err)
 		goto out_unlock;
 
+	rcu_assign_pointer(net->bpf.progs[type], link->prog);
 	net->bpf.links[type] = link;
 
 out_unlock:
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index d02df0b6d0d9..b57fb1359395 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -70,10 +70,10 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
 EXPORT_SYMBOL(skb_flow_dissector_init);
 
 #ifdef CONFIG_BPF_SYSCALL
-int flow_dissector_bpf_prog_attach(struct net *net, struct bpf_prog *prog)
+int flow_dissector_bpf_prog_attach_check(struct net *net,
+					 struct bpf_prog *prog)
 {
 	enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
-	struct bpf_prog *attached;
 
 	if (net == &init_net) {
 		/* BPF flow dissector in the root namespace overrides
@@ -97,15 +97,6 @@ int flow_dissector_bpf_prog_attach(struct net *net, struct bpf_prog *prog)
 			return -EEXIST;
 	}
 
-	attached = rcu_dereference_protected(net->bpf.progs[type],
-					     lockdep_is_held(&netns_bpf_mutex));
-	if (attached == prog)
-		/* The same program cannot be attached twice */
-		return -EINVAL;
-
-	rcu_assign_pointer(net->bpf.progs[type], prog);
-	if (attached)
-		bpf_prog_put(attached);
 	return 0;
 }
 #endif /* CONFIG_BPF_SYSCALL */
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH bpf-next v2 2/3] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-23 10:34 [PATCH bpf-next v2 0/3] bpf, netns: Prepare for multi-prog attachment Jakub Sitnicki
  2020-06-23 10:34 ` [PATCH bpf-next v2 1/3] flow_dissector: Pull BPF program assignment up to bpf-netns Jakub Sitnicki
@ 2020-06-23 10:34 ` Jakub Sitnicki
  2020-06-23 19:33   ` Martin KaFai Lau
  2020-06-24 17:18   ` Jakub Sitnicki
  2020-06-23 10:34 ` [PATCH bpf-next v2 3/3] bpf, netns: Keep a list of attached bpf_link's Jakub Sitnicki
  2020-06-29 14:56 ` [PATCH bpf-next v2 0/3] bpf, netns: Prepare for multi-prog attachment Daniel Borkmann
  3 siblings, 2 replies; 15+ messages in thread
From: Jakub Sitnicki @ 2020-06-23 10:34 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team, Andrii Nakryiko

Prepare for having multi-prog attachments for new netns attach types by
storing programs to run in a bpf_prog_array, which is well suited for
iterating over programs and running them in sequence.

Because bpf_prog_array is dynamically resized, after this change a
potentially blocking memory allocation in bpf(PROG_QUERY) callback can
happen, in order to collect program IDs before copying the values to
user-space supplied buffer. This forces us to adapt how we protect access
to the attached program in the callback. As bpf_prog_array_copy_to_user()
helper can sleep, we switch from an RCU read lock to holding a mutex that
serializes updaters.

To handle bpf(PROG_ATTACH) scenario when we are replacing an already
attached program, we introduce a new bpf_prog_array helper called
bpf_prog_array_replace_item that will exchange the old program with a new
one. bpf-cgroup does away with such helper by computing an index into the
array from a program position in an external list of attached
programs/links. Such approach fails when a dummy prog is left in the array
after a memory allocation failure on link release, but is necessary in
bpf-cgroup case because the same BPF program can be present in the array
multiple times due to inheritance, and attachment cannot be reliably
identified by bpf_prog pointer comparison.

No functional changes intended.

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/linux/bpf.h        |   3 +
 include/net/netns/bpf.h    |   5 +-
 kernel/bpf/core.c          |  20 ++++--
 kernel/bpf/net_namespace.c | 137 +++++++++++++++++++++++++++----------
 net/core/flow_dissector.c  |  21 +++---
 5 files changed, 132 insertions(+), 54 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 1e1501ee53ce..5d0506f46f24 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -928,6 +928,9 @@ int bpf_prog_array_copy(struct bpf_prog_array *old_array,
 			struct bpf_prog *exclude_prog,
 			struct bpf_prog *include_prog,
 			struct bpf_prog_array **new_array);
+int bpf_prog_array_replace_item(struct bpf_prog_array *array,
+				struct bpf_prog *old_prog,
+				struct bpf_prog *new_prog);
 
 #define __BPF_PROG_RUN_ARRAY(array, ctx, func, check_non_null)	\
 	({						\
diff --git a/include/net/netns/bpf.h b/include/net/netns/bpf.h
index a8dce2a380c8..a5015bda9979 100644
--- a/include/net/netns/bpf.h
+++ b/include/net/netns/bpf.h
@@ -9,9 +9,12 @@
 #include <linux/bpf-netns.h>
 
 struct bpf_prog;
+struct bpf_prog_array;
 
 struct netns_bpf {
-	struct bpf_prog __rcu *progs[MAX_NETNS_BPF_ATTACH_TYPE];
+	/* Array of programs to run compiled from progs or links */
+	struct bpf_prog_array __rcu *run_array[MAX_NETNS_BPF_ATTACH_TYPE];
+	struct bpf_prog *progs[MAX_NETNS_BPF_ATTACH_TYPE];
 	struct bpf_link *links[MAX_NETNS_BPF_ATTACH_TYPE];
 };
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 9df4cc9a2907..07937ddf722f 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1946,16 +1946,26 @@ int bpf_prog_array_copy_to_user(struct bpf_prog_array *array,
 	return 0;
 }
 
-void bpf_prog_array_delete_safe(struct bpf_prog_array *array,
-				struct bpf_prog *old_prog)
+int bpf_prog_array_replace_item(struct bpf_prog_array *array,
+				struct bpf_prog *old_prog,
+				struct bpf_prog *new_prog)
 {
 	struct bpf_prog_array_item *item;
 
-	for (item = array->items; item->prog; item++)
+	for (item = array->items; item->prog; item++) {
 		if (item->prog == old_prog) {
-			WRITE_ONCE(item->prog, &dummy_bpf_prog.prog);
-			break;
+			WRITE_ONCE(item->prog, new_prog);
+			return 0;
 		}
+	}
+	return -ENOENT;
+}
+
+void bpf_prog_array_delete_safe(struct bpf_prog_array *array,
+				struct bpf_prog *old_prog)
+{
+	(void)bpf_prog_array_replace_item(array, old_prog,
+					  &dummy_bpf_prog.prog);
 }
 
 int bpf_prog_array_copy(struct bpf_prog_array *old_array,
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index b951dab2687f..593523a22168 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -38,7 +38,9 @@ static void bpf_netns_link_release(struct bpf_link *link)
 	struct bpf_netns_link *net_link =
 		container_of(link, struct bpf_netns_link, link);
 	enum netns_bpf_attach_type type = net_link->netns_type;
+	struct bpf_prog_array *old_array, *new_array;
 	struct net *net;
+	int err;
 
 	/* Link auto-detached by dying netns. */
 	if (!net_link->net)
@@ -54,8 +56,20 @@ static void bpf_netns_link_release(struct bpf_link *link)
 	if (!net)
 		goto out_unlock;
 
+	/* Rebuild run array without the prog or replace it with a
+	 * dummy one because we cannot fail during link release.
+	 */
+	old_array = rcu_dereference_protected(net->bpf.run_array[type],
+					      lockdep_is_held(&netns_bpf_mutex));
+	err = bpf_prog_array_copy(old_array, link->prog, NULL, &new_array);
+	if (err) {
+		bpf_prog_array_delete_safe(old_array, link->prog);
+	} else {
+		rcu_assign_pointer(net->bpf.run_array[type], new_array);
+		bpf_prog_array_free(old_array);
+	}
+
 	net->bpf.links[type] = NULL;
-	RCU_INIT_POINTER(net->bpf.progs[type], NULL);
 
 out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
@@ -76,6 +90,7 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
 	struct bpf_netns_link *net_link =
 		container_of(link, struct bpf_netns_link, link);
 	enum netns_bpf_attach_type type = net_link->netns_type;
+	struct bpf_prog_array *run_array;
 	struct net *net;
 	int ret = 0;
 
@@ -93,8 +108,16 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
 		goto out_unlock;
 	}
 
+	run_array = rcu_dereference_protected(net->bpf.run_array[type],
+					      lockdep_is_held(&netns_bpf_mutex));
+	if (run_array)
+		ret = bpf_prog_array_replace_item(run_array, link->prog, new_prog);
+	else
+		ret = -ENOENT;
+	if (ret)
+		goto out_unlock;
+
 	old_prog = xchg(&link->prog, new_prog);
-	rcu_assign_pointer(net->bpf.progs[type], new_prog);
 	bpf_prog_put(old_prog);
 
 out_unlock:
@@ -142,14 +165,38 @@ static const struct bpf_link_ops bpf_netns_link_ops = {
 	.show_fdinfo = bpf_netns_link_show_fdinfo,
 };
 
+/* Must be called with netns_bpf_mutex held. */
+static int __netns_bpf_prog_query(const union bpf_attr *attr,
+				  union bpf_attr __user *uattr,
+				  struct net *net,
+				  enum netns_bpf_attach_type type)
+{
+	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
+	struct bpf_prog_array *run_array;
+	u32 prog_cnt = 0, flags = 0;
+
+	run_array = rcu_dereference_protected(net->bpf.run_array[type],
+					      lockdep_is_held(&netns_bpf_mutex));
+	if (run_array)
+		prog_cnt = bpf_prog_array_length(run_array);
+
+	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
+		return -EFAULT;
+	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
+		return -EFAULT;
+	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
+		return 0;
+
+	return bpf_prog_array_copy_to_user(run_array, prog_ids,
+					   attr->query.prog_cnt);
+}
+
 int netns_bpf_prog_query(const union bpf_attr *attr,
 			 union bpf_attr __user *uattr)
 {
-	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
-	u32 prog_id, prog_cnt = 0, flags = 0;
 	enum netns_bpf_attach_type type;
-	struct bpf_prog *attached;
 	struct net *net;
+	int ret;
 
 	if (attr->query.query_flags)
 		return -EINVAL;
@@ -162,32 +209,17 @@ int netns_bpf_prog_query(const union bpf_attr *attr,
 	if (IS_ERR(net))
 		return PTR_ERR(net);
 
-	rcu_read_lock();
-	attached = rcu_dereference(net->bpf.progs[type]);
-	if (attached) {
-		prog_cnt = 1;
-		prog_id = attached->aux->id;
-	}
-	rcu_read_unlock();
+	mutex_lock(&netns_bpf_mutex);
+	ret = __netns_bpf_prog_query(attr, uattr, net, type);
+	mutex_unlock(&netns_bpf_mutex);
 
 	put_net(net);
-
-	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
-		return -EFAULT;
-	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
-		return -EFAULT;
-
-	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
-		return 0;
-
-	if (copy_to_user(prog_ids, &prog_id, sizeof(u32)))
-		return -EFAULT;
-
-	return 0;
+	return ret;
 }
 
 int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
+	struct bpf_prog_array *run_array;
 	enum netns_bpf_attach_type type;
 	struct bpf_prog *attached;
 	struct net *net;
@@ -217,14 +249,25 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 	if (ret)
 		goto out_unlock;
 
-	attached = rcu_dereference_protected(net->bpf.progs[type],
-					     lockdep_is_held(&netns_bpf_mutex));
+	attached = net->bpf.progs[type];
 	if (attached == prog) {
 		/* The same program cannot be attached twice */
 		ret = -EINVAL;
 		goto out_unlock;
 	}
-	rcu_assign_pointer(net->bpf.progs[type], prog);
+
+	run_array = rcu_dereference_protected(net->bpf.run_array[type],
+					      lockdep_is_held(&netns_bpf_mutex));
+	if (run_array) {
+		ret = bpf_prog_array_replace_item(run_array, attached, prog);
+	} else {
+		ret = bpf_prog_array_copy(NULL, NULL, prog, &run_array);
+		rcu_assign_pointer(net->bpf.run_array[type], run_array);
+	}
+	if (ret)
+		goto out_unlock;
+
+	net->bpf.progs[type] = prog;
 	if (attached)
 		bpf_prog_put(attached);
 
@@ -234,6 +277,18 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 	return ret;
 }
 
+/* Must be called with netns_bpf_mutex held. */
+static void netns_bpf_run_array_detach(struct net *net,
+				       enum netns_bpf_attach_type type)
+{
+	struct bpf_prog_array *run_array;
+
+	run_array = rcu_dereference_protected(net->bpf.run_array[type],
+					      lockdep_is_held(&netns_bpf_mutex));
+	RCU_INIT_POINTER(net->bpf.run_array[type], NULL);
+	bpf_prog_array_free(run_array);
+}
+
 /* Must be called with netns_bpf_mutex held. */
 static int __netns_bpf_prog_detach(struct net *net,
 				   enum netns_bpf_attach_type type)
@@ -244,11 +299,11 @@ static int __netns_bpf_prog_detach(struct net *net,
 	if (net->bpf.links[type])
 		return -EINVAL;
 
-	attached = rcu_dereference_protected(net->bpf.progs[type],
-					     lockdep_is_held(&netns_bpf_mutex));
+	attached = net->bpf.progs[type];
 	if (!attached)
 		return -ENOENT;
-	RCU_INIT_POINTER(net->bpf.progs[type], NULL);
+	netns_bpf_run_array_detach(net, type);
+	net->bpf.progs[type] = NULL;
 	bpf_prog_put(attached);
 	return 0;
 }
@@ -272,7 +327,7 @@ int netns_bpf_prog_detach(const union bpf_attr *attr)
 static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 				 enum netns_bpf_attach_type type)
 {
-	struct bpf_prog *prog;
+	struct bpf_prog_array *old_array, *new_array;
 	int err;
 
 	mutex_lock(&netns_bpf_mutex);
@@ -283,9 +338,7 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 		goto out_unlock;
 	}
 	/* Links are not compatible with attaching prog directly */
-	prog = rcu_dereference_protected(net->bpf.progs[type],
-					 lockdep_is_held(&netns_bpf_mutex));
-	if (prog) {
+	if (net->bpf.progs[type]) {
 		err = -EEXIST;
 		goto out_unlock;
 	}
@@ -301,7 +354,14 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 	if (err)
 		goto out_unlock;
 
-	rcu_assign_pointer(net->bpf.progs[type], link->prog);
+	old_array = rcu_dereference_protected(net->bpf.run_array[type],
+					      lockdep_is_held(&netns_bpf_mutex));
+	err = bpf_prog_array_copy(old_array, NULL, link->prog, &new_array);
+	if (err)
+		goto out_unlock;
+	rcu_assign_pointer(net->bpf.run_array[type], new_array);
+	bpf_prog_array_free(old_array);
+
 	net->bpf.links[type] = link;
 
 out_unlock:
@@ -368,11 +428,12 @@ static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
 
 	mutex_lock(&netns_bpf_mutex);
 	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
+		netns_bpf_run_array_detach(net, type);
 		link = net->bpf.links[type];
 		if (link)
 			bpf_netns_link_auto_detach(link);
-		else
-			__netns_bpf_prog_detach(net, type);
+		else if (net->bpf.progs[type])
+			bpf_prog_put(net->bpf.progs[type]);
 	}
 	mutex_unlock(&netns_bpf_mutex);
 }
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index b57fb1359395..e501aefb9a0e 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -86,14 +86,14 @@ int flow_dissector_bpf_prog_attach_check(struct net *net,
 		for_each_net(ns) {
 			if (ns == &init_net)
 				continue;
-			if (rcu_access_pointer(ns->bpf.progs[type]))
+			if (rcu_access_pointer(ns->bpf.run_array[type]))
 				return -EEXIST;
 		}
 	} else {
 		/* Make sure root flow dissector is not attached
 		 * when attaching to the non-root namespace.
 		 */
-		if (rcu_access_pointer(init_net.bpf.progs[type]))
+		if (rcu_access_pointer(init_net.bpf.run_array[type]))
 			return -EEXIST;
 	}
 
@@ -894,7 +894,6 @@ bool __skb_flow_dissect(const struct net *net,
 	struct flow_dissector_key_addrs *key_addrs;
 	struct flow_dissector_key_tags *key_tags;
 	struct flow_dissector_key_vlan *key_vlan;
-	struct bpf_prog *attached = NULL;
 	enum flow_dissect_ret fdret;
 	enum flow_dissector_key_id dissector_vlan = FLOW_DISSECTOR_KEY_MAX;
 	bool mpls_el = false;
@@ -951,14 +950,14 @@ bool __skb_flow_dissect(const struct net *net,
 	WARN_ON_ONCE(!net);
 	if (net) {
 		enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
+		struct bpf_prog_array *run_array;
 
 		rcu_read_lock();
-		attached = rcu_dereference(init_net.bpf.progs[type]);
+		run_array = rcu_dereference(init_net.bpf.run_array[type]);
+		if (!run_array)
+			run_array = rcu_dereference(net->bpf.run_array[type]);
 
-		if (!attached)
-			attached = rcu_dereference(net->bpf.progs[type]);
-
-		if (attached) {
+		if (run_array) {
 			struct bpf_flow_keys flow_keys;
 			struct bpf_flow_dissector ctx = {
 				.flow_keys = &flow_keys,
@@ -966,6 +965,7 @@ bool __skb_flow_dissect(const struct net *net,
 				.data_end = data + hlen,
 			};
 			__be16 n_proto = proto;
+			struct bpf_prog *prog;
 
 			if (skb) {
 				ctx.skb = skb;
@@ -976,8 +976,9 @@ bool __skb_flow_dissect(const struct net *net,
 				n_proto = skb->protocol;
 			}
 
-			ret = bpf_flow_dissect(attached, &ctx, n_proto, nhoff,
-					       hlen, flags);
+			prog = READ_ONCE(run_array->items[0].prog);
+			ret = bpf_flow_dissect(prog, &ctx, n_proto,
+					       nhoff, hlen, flags);
 			__skb_flow_bpf_to_target(&flow_keys, flow_dissector,
 						 target_container);
 			rcu_read_unlock();
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH bpf-next v2 3/3] bpf, netns: Keep a list of attached bpf_link's
  2020-06-23 10:34 [PATCH bpf-next v2 0/3] bpf, netns: Prepare for multi-prog attachment Jakub Sitnicki
  2020-06-23 10:34 ` [PATCH bpf-next v2 1/3] flow_dissector: Pull BPF program assignment up to bpf-netns Jakub Sitnicki
  2020-06-23 10:34 ` [PATCH bpf-next v2 2/3] bpf, netns: Keep attached programs in bpf_prog_array Jakub Sitnicki
@ 2020-06-23 10:34 ` Jakub Sitnicki
  2020-06-29 14:56 ` [PATCH bpf-next v2 0/3] bpf, netns: Prepare for multi-prog attachment Daniel Borkmann
  3 siblings, 0 replies; 15+ messages in thread
From: Jakub Sitnicki @ 2020-06-23 10:34 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team, Andrii Nakryiko

To support multi-prog link-based attachments for new netns attach types, we
need to keep track of more than one bpf_link per attach type. Hence,
convert net->bpf.links into a list, that currently can be either empty or
have just one item.

Instead of reusing bpf_prog_list from bpf-cgroup, we link together
bpf_netns_link's themselves. This makes list management simpler as we don't
have to allocate, initialize, and later release list elements. We can do
this because multi-prog attachment will be available only for bpf_link, and
we don't need to build a list of programs attached directly and indirectly
via links.

No functional changes intended.

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/netns/bpf.h    |  2 +-
 kernel/bpf/net_namespace.c | 42 +++++++++++++++++++++-----------------
 2 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/include/net/netns/bpf.h b/include/net/netns/bpf.h
index a5015bda9979..0ca6a1b87185 100644
--- a/include/net/netns/bpf.h
+++ b/include/net/netns/bpf.h
@@ -15,7 +15,7 @@ struct netns_bpf {
 	/* Array of programs to run compiled from progs or links */
 	struct bpf_prog_array __rcu *run_array[MAX_NETNS_BPF_ATTACH_TYPE];
 	struct bpf_prog *progs[MAX_NETNS_BPF_ATTACH_TYPE];
-	struct bpf_link *links[MAX_NETNS_BPF_ATTACH_TYPE];
+	struct list_head links[MAX_NETNS_BPF_ATTACH_TYPE];
 };
 
 #endif /* __NETNS_BPF_H__ */
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index 593523a22168..3e89c7ad42cb 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -19,20 +19,12 @@ struct bpf_netns_link {
 	 * with netns_bpf_mutex held.
 	 */
 	struct net *net;
+	struct list_head node; /* node in list of links attached to net */
 };
 
 /* Protects updates to netns_bpf */
 DEFINE_MUTEX(netns_bpf_mutex);
 
-/* Must be called with netns_bpf_mutex held. */
-static void __net_exit bpf_netns_link_auto_detach(struct bpf_link *link)
-{
-	struct bpf_netns_link *net_link =
-		container_of(link, struct bpf_netns_link, link);
-
-	net_link->net = NULL;
-}
-
 static void bpf_netns_link_release(struct bpf_link *link)
 {
 	struct bpf_netns_link *net_link =
@@ -69,7 +61,7 @@ static void bpf_netns_link_release(struct bpf_link *link)
 		bpf_prog_array_free(old_array);
 	}
 
-	net->bpf.links[type] = NULL;
+	list_del(&net_link->node);
 
 out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
@@ -233,7 +225,7 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 	mutex_lock(&netns_bpf_mutex);
 
 	/* Attaching prog directly is not compatible with links */
-	if (net->bpf.links[type]) {
+	if (!list_empty(&net->bpf.links[type])) {
 		ret = -EEXIST;
 		goto out_unlock;
 	}
@@ -296,7 +288,7 @@ static int __netns_bpf_prog_detach(struct net *net,
 	struct bpf_prog *attached;
 
 	/* Progs attached via links cannot be detached */
-	if (net->bpf.links[type])
+	if (!list_empty(&net->bpf.links[type]))
 		return -EINVAL;
 
 	attached = net->bpf.progs[type];
@@ -327,13 +319,15 @@ int netns_bpf_prog_detach(const union bpf_attr *attr)
 static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 				 enum netns_bpf_attach_type type)
 {
+	struct bpf_netns_link *net_link =
+		container_of(link, struct bpf_netns_link, link);
 	struct bpf_prog_array *old_array, *new_array;
 	int err;
 
 	mutex_lock(&netns_bpf_mutex);
 
 	/* Allow attaching only one prog or link for now */
-	if (net->bpf.links[type]) {
+	if (!list_empty(&net->bpf.links[type])) {
 		err = -E2BIG;
 		goto out_unlock;
 	}
@@ -362,7 +356,7 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 	rcu_assign_pointer(net->bpf.run_array[type], new_array);
 	bpf_prog_array_free(old_array);
 
-	net->bpf.links[type] = link;
+	list_add_tail(&net_link->node, &net->bpf.links[type]);
 
 out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
@@ -421,24 +415,34 @@ int netns_bpf_link_create(const union bpf_attr *attr, struct bpf_prog *prog)
 	return err;
 }
 
+static int __net_init netns_bpf_pernet_init(struct net *net)
+{
+	int type;
+
+	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++)
+		INIT_LIST_HEAD(&net->bpf.links[type]);
+
+	return 0;
+}
+
 static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
 {
 	enum netns_bpf_attach_type type;
-	struct bpf_link *link;
+	struct bpf_netns_link *net_link;
 
 	mutex_lock(&netns_bpf_mutex);
 	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
 		netns_bpf_run_array_detach(net, type);
-		link = net->bpf.links[type];
-		if (link)
-			bpf_netns_link_auto_detach(link);
-		else if (net->bpf.progs[type])
+		list_for_each_entry(net_link, &net->bpf.links[type], node)
+			net_link->net = NULL; /* auto-detach link */
+		if (net->bpf.progs[type])
 			bpf_prog_put(net->bpf.progs[type]);
 	}
 	mutex_unlock(&netns_bpf_mutex);
 }
 
 static struct pernet_operations netns_bpf_pernet_ops __net_initdata = {
+	.init = netns_bpf_pernet_init,
 	.pre_exit = netns_bpf_pernet_pre_exit,
 };
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 2/3] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-23 10:34 ` [PATCH bpf-next v2 2/3] bpf, netns: Keep attached programs in bpf_prog_array Jakub Sitnicki
@ 2020-06-23 19:33   ` Martin KaFai Lau
  2020-06-23 20:59     ` Jakub Sitnicki
  2020-06-24 17:18   ` Jakub Sitnicki
  1 sibling, 1 reply; 15+ messages in thread
From: Martin KaFai Lau @ 2020-06-23 19:33 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, netdev, kernel-team, Andrii Nakryiko

On Tue, Jun 23, 2020 at 12:34:58PM +0200, Jakub Sitnicki wrote:

[ ... ]

> @@ -93,8 +108,16 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
>  		goto out_unlock;
>  	}
>  
> +	run_array = rcu_dereference_protected(net->bpf.run_array[type],
> +					      lockdep_is_held(&netns_bpf_mutex));
> +	if (run_array)
> +		ret = bpf_prog_array_replace_item(run_array, link->prog, new_prog);
> +	else
When will this happen?

> +		ret = -ENOENT;
> +	if (ret)
> +		goto out_unlock;
> +
>  	old_prog = xchg(&link->prog, new_prog);
> -	rcu_assign_pointer(net->bpf.progs[type], new_prog);
>  	bpf_prog_put(old_prog);
>  
>  out_unlock:
> @@ -142,14 +165,38 @@ static const struct bpf_link_ops bpf_netns_link_ops = {
>  	.show_fdinfo = bpf_netns_link_show_fdinfo,
>  };

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 2/3] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-23 19:33   ` Martin KaFai Lau
@ 2020-06-23 20:59     ` Jakub Sitnicki
  2020-06-23 21:24       ` Martin KaFai Lau
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Sitnicki @ 2020-06-23 20:59 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: bpf, netdev, kernel-team, Andrii Nakryiko

On Tue, Jun 23, 2020 at 09:33 PM CEST, Martin KaFai Lau wrote:
> On Tue, Jun 23, 2020 at 12:34:58PM +0200, Jakub Sitnicki wrote:
>
> [ ... ]
>
>> @@ -93,8 +108,16 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
>>  		goto out_unlock;
>>  	}
>>
>> +	run_array = rcu_dereference_protected(net->bpf.run_array[type],
>> +					      lockdep_is_held(&netns_bpf_mutex));
>> +	if (run_array)
>> +		ret = bpf_prog_array_replace_item(run_array, link->prog, new_prog);
>> +	else
> When will this happen?

This will never happen, unless there is a bug. As long as there is a
link attached, run_array should never be detached (null). Because it can
be handled gracefully, we fail the bpf(LINK_UPDATE) syscall.

Your question makes me think that perhaps it should trigger a warning,
with WARN_ON_ONCE, to signal clearly to the reader that this is an
unexpected state.

WDYT?

>
>> +		ret = -ENOENT;
>> +	if (ret)
>> +		goto out_unlock;
>> +
>>  	old_prog = xchg(&link->prog, new_prog);
>> -	rcu_assign_pointer(net->bpf.progs[type], new_prog);
>>  	bpf_prog_put(old_prog);
>>
>>  out_unlock:
>> @@ -142,14 +165,38 @@ static const struct bpf_link_ops bpf_netns_link_ops = {
>>  	.show_fdinfo = bpf_netns_link_show_fdinfo,
>>  };

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 2/3] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-23 20:59     ` Jakub Sitnicki
@ 2020-06-23 21:24       ` Martin KaFai Lau
  2020-06-24 17:33         ` Jakub Sitnicki
  0 siblings, 1 reply; 15+ messages in thread
From: Martin KaFai Lau @ 2020-06-23 21:24 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, netdev, kernel-team, Andrii Nakryiko

On Tue, Jun 23, 2020 at 10:59:37PM +0200, Jakub Sitnicki wrote:
> On Tue, Jun 23, 2020 at 09:33 PM CEST, Martin KaFai Lau wrote:
> > On Tue, Jun 23, 2020 at 12:34:58PM +0200, Jakub Sitnicki wrote:
> >
> > [ ... ]
> >
> >> @@ -93,8 +108,16 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
> >>  		goto out_unlock;
> >>  	}
> >>
> >> +	run_array = rcu_dereference_protected(net->bpf.run_array[type],
> >> +					      lockdep_is_held(&netns_bpf_mutex));
> >> +	if (run_array)
> >> +		ret = bpf_prog_array_replace_item(run_array, link->prog, new_prog);
> >> +	else
> > When will this happen?
> 
> This will never happen, unless there is a bug. As long as there is a
> link attached, run_array should never be detached (null). Because it can
> be handled gracefully, we fail the bpf(LINK_UPDATE) syscall.
> 
> Your question makes me think that perhaps it should trigger a warning,
> with WARN_ON_ONCE, to signal clearly to the reader that this is an
> unexpected state.
> 
> WDYT?
Thanks for confirming and the explanation.

If it will never happen, I would skip the "if (run_array)".  That
will help the code reading in the future.

I would not WARN also.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 2/3] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-23 10:34 ` [PATCH bpf-next v2 2/3] bpf, netns: Keep attached programs in bpf_prog_array Jakub Sitnicki
  2020-06-23 19:33   ` Martin KaFai Lau
@ 2020-06-24 17:18   ` Jakub Sitnicki
  2020-06-24 17:47     ` Andrii Nakryiko
  1 sibling, 1 reply; 15+ messages in thread
From: Jakub Sitnicki @ 2020-06-24 17:18 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team, Andrii Nakryiko

On Tue, Jun 23, 2020 at 12:34 PM CEST, Jakub Sitnicki wrote:
> Prepare for having multi-prog attachments for new netns attach types by
> storing programs to run in a bpf_prog_array, which is well suited for
> iterating over programs and running them in sequence.
>
> Because bpf_prog_array is dynamically resized, after this change a
> potentially blocking memory allocation in bpf(PROG_QUERY) callback can
> happen, in order to collect program IDs before copying the values to
> user-space supplied buffer. This forces us to adapt how we protect access
> to the attached program in the callback. As bpf_prog_array_copy_to_user()
> helper can sleep, we switch from an RCU read lock to holding a mutex that
> serializes updaters.
>
> To handle bpf(PROG_ATTACH) scenario when we are replacing an already
> attached program, we introduce a new bpf_prog_array helper called
> bpf_prog_array_replace_item that will exchange the old program with a new
> one. bpf-cgroup does away with such helper by computing an index into the
> array from a program position in an external list of attached
> programs/links. Such approach fails when a dummy prog is left in the array
> after a memory allocation failure on link release, but is necessary in
> bpf-cgroup case because the same BPF program can be present in the array
> multiple times due to inheritance, and attachment cannot be reliably
> identified by bpf_prog pointer comparison.
>
> No functional changes intended.
>
> Acked-by: Andrii Nakryiko <andriin@fb.com>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>  include/linux/bpf.h        |   3 +
>  include/net/netns/bpf.h    |   5 +-
>  kernel/bpf/core.c          |  20 ++++--
>  kernel/bpf/net_namespace.c | 137 +++++++++++++++++++++++++++----------
>  net/core/flow_dissector.c  |  21 +++---
>  5 files changed, 132 insertions(+), 54 deletions(-)
>

[...]

> diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
> index b951dab2687f..593523a22168 100644
> --- a/kernel/bpf/net_namespace.c
> +++ b/kernel/bpf/net_namespace.c

[...]

> @@ -93,8 +108,16 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
>  		goto out_unlock;
>  	}
>
> +	run_array = rcu_dereference_protected(net->bpf.run_array[type],
> +					      lockdep_is_held(&netns_bpf_mutex));
> +	if (run_array)
> +		ret = bpf_prog_array_replace_item(run_array, link->prog, new_prog);

Thinking about this some more, link update should fail with -EINVAL if
new_prog already exists in run_array. Same as PROG_ATTACH fails with
-EINVAL when trying to attach the same prog for the second time.

Otherwise, LINK_UPDATE can lead to having same BPF prog present multiple
times in the prog_array, once attaching more than one prog gets enabled.

Then we would we end up with the same challenge as bpf-cgroup, that is
how to find the program index into the prog_array in presence of
dummy_prog's.

> +	else
> +		ret = -ENOENT;
> +	if (ret)
> +		goto out_unlock;
> +
>  	old_prog = xchg(&link->prog, new_prog);
> -	rcu_assign_pointer(net->bpf.progs[type], new_prog);
>  	bpf_prog_put(old_prog);
>
>  out_unlock:

[...]

> @@ -217,14 +249,25 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>  	if (ret)
>  		goto out_unlock;
>
> -	attached = rcu_dereference_protected(net->bpf.progs[type],
> -					     lockdep_is_held(&netns_bpf_mutex));
> +	attached = net->bpf.progs[type];
>  	if (attached == prog) {
>  		/* The same program cannot be attached twice */
>  		ret = -EINVAL;
>  		goto out_unlock;
>  	}
> -	rcu_assign_pointer(net->bpf.progs[type], prog);
> +
> +	run_array = rcu_dereference_protected(net->bpf.run_array[type],
> +					      lockdep_is_held(&netns_bpf_mutex));
> +	if (run_array) {
> +		ret = bpf_prog_array_replace_item(run_array, attached, prog);

I didn't consider here that there can be a run_array with a dummy_prog
from a link release that failed to allocate memory.

In such case bpf_prog_array_replace_item will fail, while we actually
want to replace the dummy_prog.

The right thing to do is to replace the first item in prog array:

	if (run_array) {
		WRITE_ONCE(run_array->items[0].prog, prog);
	} else {
                /* allocate a bpf_prog_array */
        }

This leaves just one user of bpf_prog_array_replace_item(), so I think
I'm just going to fold it into its only caller, that is the update_prog
callback.

> +	} else {
> +		ret = bpf_prog_array_copy(NULL, NULL, prog, &run_array);
> +		rcu_assign_pointer(net->bpf.run_array[type], run_array);
> +	}
> +	if (ret)
> +		goto out_unlock;
> +
> +	net->bpf.progs[type] = prog;
>  	if (attached)
>  		bpf_prog_put(attached);
>

[...]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 2/3] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-23 21:24       ` Martin KaFai Lau
@ 2020-06-24 17:33         ` Jakub Sitnicki
  0 siblings, 0 replies; 15+ messages in thread
From: Jakub Sitnicki @ 2020-06-24 17:33 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: bpf, netdev, kernel-team, Andrii Nakryiko

On Tue, Jun 23, 2020 at 11:24 PM CEST, Martin KaFai Lau wrote:
> On Tue, Jun 23, 2020 at 10:59:37PM +0200, Jakub Sitnicki wrote:
>> On Tue, Jun 23, 2020 at 09:33 PM CEST, Martin KaFai Lau wrote:
>> > On Tue, Jun 23, 2020 at 12:34:58PM +0200, Jakub Sitnicki wrote:
>> >
>> > [ ... ]
>> >
>> >> @@ -93,8 +108,16 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
>> >>  		goto out_unlock;
>> >>  	}
>> >>
>> >> +	run_array = rcu_dereference_protected(net->bpf.run_array[type],
>> >> +					      lockdep_is_held(&netns_bpf_mutex));
>> >> +	if (run_array)
>> >> +		ret = bpf_prog_array_replace_item(run_array, link->prog, new_prog);
>> >> +	else
>> > When will this happen?
>>
>> This will never happen, unless there is a bug. As long as there is a
>> link attached, run_array should never be detached (null). Because it can
>> be handled gracefully, we fail the bpf(LINK_UPDATE) syscall.
>>
>> Your question makes me think that perhaps it should trigger a warning,
>> with WARN_ON_ONCE, to signal clearly to the reader that this is an
>> unexpected state.
>>
>> WDYT?
> Thanks for confirming and the explanation.
>
> If it will never happen, I would skip the "if (run_array)".  That
> will help the code reading in the future.
>
> I would not WARN also.

Best code is no code :-)

I realized that bpf_prog_array_replace_item() cannot fail either, unless
there is a bug how we compile the prog_array. So I plan to remove that
error check as well.

Thanks for feedback.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 2/3] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-24 17:18   ` Jakub Sitnicki
@ 2020-06-24 17:47     ` Andrii Nakryiko
  2020-06-24 18:13       ` Jakub Sitnicki
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2020-06-24 17:47 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team, Andrii Nakryiko

On Wed, Jun 24, 2020 at 10:19 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Tue, Jun 23, 2020 at 12:34 PM CEST, Jakub Sitnicki wrote:
> > Prepare for having multi-prog attachments for new netns attach types by
> > storing programs to run in a bpf_prog_array, which is well suited for
> > iterating over programs and running them in sequence.
> >
> > Because bpf_prog_array is dynamically resized, after this change a
> > potentially blocking memory allocation in bpf(PROG_QUERY) callback can
> > happen, in order to collect program IDs before copying the values to
> > user-space supplied buffer. This forces us to adapt how we protect access
> > to the attached program in the callback. As bpf_prog_array_copy_to_user()
> > helper can sleep, we switch from an RCU read lock to holding a mutex that
> > serializes updaters.
> >
> > To handle bpf(PROG_ATTACH) scenario when we are replacing an already
> > attached program, we introduce a new bpf_prog_array helper called
> > bpf_prog_array_replace_item that will exchange the old program with a new
> > one. bpf-cgroup does away with such helper by computing an index into the
> > array from a program position in an external list of attached
> > programs/links. Such approach fails when a dummy prog is left in the array
> > after a memory allocation failure on link release, but is necessary in
> > bpf-cgroup case because the same BPF program can be present in the array
> > multiple times due to inheritance, and attachment cannot be reliably
> > identified by bpf_prog pointer comparison.
> >
> > No functional changes intended.
> >
> > Acked-by: Andrii Nakryiko <andriin@fb.com>
> > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> > ---
> >  include/linux/bpf.h        |   3 +
> >  include/net/netns/bpf.h    |   5 +-
> >  kernel/bpf/core.c          |  20 ++++--
> >  kernel/bpf/net_namespace.c | 137 +++++++++++++++++++++++++++----------
> >  net/core/flow_dissector.c  |  21 +++---
> >  5 files changed, 132 insertions(+), 54 deletions(-)
> >
>
> [...]
>
> > diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
> > index b951dab2687f..593523a22168 100644
> > --- a/kernel/bpf/net_namespace.c
> > +++ b/kernel/bpf/net_namespace.c
>
> [...]
>
> > @@ -93,8 +108,16 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
> >               goto out_unlock;
> >       }
> >
> > +     run_array = rcu_dereference_protected(net->bpf.run_array[type],
> > +                                           lockdep_is_held(&netns_bpf_mutex));
> > +     if (run_array)
> > +             ret = bpf_prog_array_replace_item(run_array, link->prog, new_prog);
>
> Thinking about this some more, link update should fail with -EINVAL if
> new_prog already exists in run_array. Same as PROG_ATTACH fails with
> -EINVAL when trying to attach the same prog for the second time.
>
> Otherwise, LINK_UPDATE can lead to having same BPF prog present multiple
> times in the prog_array, once attaching more than one prog gets enabled.
>
> Then we would we end up with the same challenge as bpf-cgroup, that is
> how to find the program index into the prog_array in presence of
> dummy_prog's.

If you attach 5 different links having the same bpf_prog, it should be
allowed and all five bpf_progs should be attached and called 5 times.
They are independent links, that's the main thing. What specific BPF
program is attached by the link (or later updated to) shouldn't be of
any concern here (relative to other attached links/programs).

Attaching the same *link* twice shouldn't be allowed, though.

>
> > +     else
> > +             ret = -ENOENT;
> > +     if (ret)
> > +             goto out_unlock;
> > +
> >       old_prog = xchg(&link->prog, new_prog);
> > -     rcu_assign_pointer(net->bpf.progs[type], new_prog);
> >       bpf_prog_put(old_prog);
> >
> >  out_unlock:
>
> [...]
>
> > @@ -217,14 +249,25 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> >       if (ret)
> >               goto out_unlock;
> >
> > -     attached = rcu_dereference_protected(net->bpf.progs[type],
> > -                                          lockdep_is_held(&netns_bpf_mutex));
> > +     attached = net->bpf.progs[type];
> >       if (attached == prog) {
> >               /* The same program cannot be attached twice */
> >               ret = -EINVAL;
> >               goto out_unlock;
> >       }
> > -     rcu_assign_pointer(net->bpf.progs[type], prog);
> > +
> > +     run_array = rcu_dereference_protected(net->bpf.run_array[type],
> > +                                           lockdep_is_held(&netns_bpf_mutex));
> > +     if (run_array) {
> > +             ret = bpf_prog_array_replace_item(run_array, attached, prog);
>
> I didn't consider here that there can be a run_array with a dummy_prog
> from a link release that failed to allocate memory.
>
> In such case bpf_prog_array_replace_item will fail, while we actually
> want to replace the dummy_prog.
>
> The right thing to do is to replace the first item in prog array:
>
>         if (run_array) {
>                 WRITE_ONCE(run_array->items[0].prog, prog);
>         } else {
>                 /* allocate a bpf_prog_array */
>         }
>
> This leaves just one user of bpf_prog_array_replace_item(), so I think
> I'm just going to fold it into its only caller, that is the update_prog
> callback.

That will change relative order of BPF programs, which I think is bad.
So I agree that bpf_prog_array_replace_item is not all that useful and
probably should be dropped. And the right way is to know the position
of bpf_prog you are trying to replace/delete, just like cgroup case.
Except cgroup case is even more complicated due to inheritance and
hierarchy, which luckily you don't have to deal with here.

>
> > +     } else {
> > +             ret = bpf_prog_array_copy(NULL, NULL, prog, &run_array);
> > +             rcu_assign_pointer(net->bpf.run_array[type], run_array);
> > +     }
> > +     if (ret)
> > +             goto out_unlock;
> > +
> > +     net->bpf.progs[type] = prog;
> >       if (attached)
> >               bpf_prog_put(attached);
> >
>
> [...]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 2/3] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-24 17:47     ` Andrii Nakryiko
@ 2020-06-24 18:13       ` Jakub Sitnicki
  2020-06-24 18:24         ` Andrii Nakryiko
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Sitnicki @ 2020-06-24 18:13 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, Networking, kernel-team, Andrii Nakryiko

On Wed, Jun 24, 2020 at 07:47 PM CEST, Andrii Nakryiko wrote:
> On Wed, Jun 24, 2020 at 10:19 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> On Tue, Jun 23, 2020 at 12:34 PM CEST, Jakub Sitnicki wrote:
>> > Prepare for having multi-prog attachments for new netns attach types by
>> > storing programs to run in a bpf_prog_array, which is well suited for
>> > iterating over programs and running them in sequence.
>> >
>> > Because bpf_prog_array is dynamically resized, after this change a
>> > potentially blocking memory allocation in bpf(PROG_QUERY) callback can
>> > happen, in order to collect program IDs before copying the values to
>> > user-space supplied buffer. This forces us to adapt how we protect access
>> > to the attached program in the callback. As bpf_prog_array_copy_to_user()
>> > helper can sleep, we switch from an RCU read lock to holding a mutex that
>> > serializes updaters.
>> >
>> > To handle bpf(PROG_ATTACH) scenario when we are replacing an already
>> > attached program, we introduce a new bpf_prog_array helper called
>> > bpf_prog_array_replace_item that will exchange the old program with a new
>> > one. bpf-cgroup does away with such helper by computing an index into the
>> > array from a program position in an external list of attached
>> > programs/links. Such approach fails when a dummy prog is left in the array
>> > after a memory allocation failure on link release, but is necessary in
>> > bpf-cgroup case because the same BPF program can be present in the array
>> > multiple times due to inheritance, and attachment cannot be reliably
>> > identified by bpf_prog pointer comparison.
>> >
>> > No functional changes intended.
>> >
>> > Acked-by: Andrii Nakryiko <andriin@fb.com>
>> > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> > ---
>> >  include/linux/bpf.h        |   3 +
>> >  include/net/netns/bpf.h    |   5 +-
>> >  kernel/bpf/core.c          |  20 ++++--
>> >  kernel/bpf/net_namespace.c | 137 +++++++++++++++++++++++++++----------
>> >  net/core/flow_dissector.c  |  21 +++---
>> >  5 files changed, 132 insertions(+), 54 deletions(-)
>> >
>>
>> [...]
>>
>> > diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
>> > index b951dab2687f..593523a22168 100644
>> > --- a/kernel/bpf/net_namespace.c
>> > +++ b/kernel/bpf/net_namespace.c
>>
>> [...]
>>
>> > @@ -93,8 +108,16 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
>> >               goto out_unlock;
>> >       }
>> >
>> > +     run_array = rcu_dereference_protected(net->bpf.run_array[type],
>> > +                                           lockdep_is_held(&netns_bpf_mutex));
>> > +     if (run_array)
>> > +             ret = bpf_prog_array_replace_item(run_array, link->prog, new_prog);
>>
>> Thinking about this some more, link update should fail with -EINVAL if
>> new_prog already exists in run_array. Same as PROG_ATTACH fails with
>> -EINVAL when trying to attach the same prog for the second time.
>>
>> Otherwise, LINK_UPDATE can lead to having same BPF prog present multiple
>> times in the prog_array, once attaching more than one prog gets enabled.
>>
>> Then we would we end up with the same challenge as bpf-cgroup, that is
>> how to find the program index into the prog_array in presence of
>> dummy_prog's.
>
> If you attach 5 different links having the same bpf_prog, it should be
> allowed and all five bpf_progs should be attached and called 5 times.
> They are independent links, that's the main thing. What specific BPF
> program is attached by the link (or later updated to) shouldn't be of
> any concern here (relative to other attached links/programs).
>
> Attaching the same *link* twice shouldn't be allowed, though.

Thanks for clarifying. I need to change the approach then:

 1) find the prog index into prog_array by iterating the list of links,
 2) adjust the index for any dummy progs in prog_array below the index.

That might work for bpf-cgroup too.

The only other alternative I can think of it to copy the prog array to
filter out dummy_progs, before replacing the prog on link update.

>
>>
>> > +     else
>> > +             ret = -ENOENT;
>> > +     if (ret)
>> > +             goto out_unlock;
>> > +
>> >       old_prog = xchg(&link->prog, new_prog);
>> > -     rcu_assign_pointer(net->bpf.progs[type], new_prog);
>> >       bpf_prog_put(old_prog);
>> >
>> >  out_unlock:
>>
>> [...]
>>
>> > @@ -217,14 +249,25 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>> >       if (ret)
>> >               goto out_unlock;
>> >
>> > -     attached = rcu_dereference_protected(net->bpf.progs[type],
>> > -                                          lockdep_is_held(&netns_bpf_mutex));
>> > +     attached = net->bpf.progs[type];
>> >       if (attached == prog) {
>> >               /* The same program cannot be attached twice */
>> >               ret = -EINVAL;
>> >               goto out_unlock;
>> >       }
>> > -     rcu_assign_pointer(net->bpf.progs[type], prog);
>> > +
>> > +     run_array = rcu_dereference_protected(net->bpf.run_array[type],
>> > +                                           lockdep_is_held(&netns_bpf_mutex));
>> > +     if (run_array) {
>> > +             ret = bpf_prog_array_replace_item(run_array, attached, prog);
>>
>> I didn't consider here that there can be a run_array with a dummy_prog
>> from a link release that failed to allocate memory.
>>
>> In such case bpf_prog_array_replace_item will fail, while we actually
>> want to replace the dummy_prog.
>>
>> The right thing to do is to replace the first item in prog array:
>>
>>         if (run_array) {
>>                 WRITE_ONCE(run_array->items[0].prog, prog);
>>         } else {
>>                 /* allocate a bpf_prog_array */
>>         }
>>
>> This leaves just one user of bpf_prog_array_replace_item(), so I think
>> I'm just going to fold it into its only caller, that is the update_prog
>> callback.
>
> That will change relative order of BPF programs, which I think is bad.
> So I agree that bpf_prog_array_replace_item is not all that useful and
> probably should be dropped. And the right way is to know the position
> of bpf_prog you are trying to replace/delete, just like cgroup case.
> Except cgroup case is even more complicated due to inheritance and
> hierarchy, which luckily you don't have to deal with here.

I think we are on the same page.

The write to first item could change the relative prog order, and even
replace the wrong program, but only if we had to deal with a prog_array
larger > 1.

In this case, direct attachment with PROG_ATTACH to netns, the
prog_array size is always 1, if it has been allocated already.

That is because I did not plan to enable multi-prog attachment for
flow_dissector with links. And only flow_dissector uses PROG_ATTACH to
netns.

>
>>
>> > +     } else {
>> > +             ret = bpf_prog_array_copy(NULL, NULL, prog, &run_array);
>> > +             rcu_assign_pointer(net->bpf.run_array[type], run_array);
>> > +     }
>> > +     if (ret)
>> > +             goto out_unlock;
>> > +
>> > +     net->bpf.progs[type] = prog;
>> >       if (attached)
>> >               bpf_prog_put(attached);
>> >
>>
>> [...]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 2/3] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-24 18:13       ` Jakub Sitnicki
@ 2020-06-24 18:24         ` Andrii Nakryiko
  2020-06-24 18:37           ` Jakub Sitnicki
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2020-06-24 18:24 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team, Andrii Nakryiko

On Wed, Jun 24, 2020 at 11:16 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Wed, Jun 24, 2020 at 07:47 PM CEST, Andrii Nakryiko wrote:
> > On Wed, Jun 24, 2020 at 10:19 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >>
> >> On Tue, Jun 23, 2020 at 12:34 PM CEST, Jakub Sitnicki wrote:
> >> > Prepare for having multi-prog attachments for new netns attach types by
> >> > storing programs to run in a bpf_prog_array, which is well suited for
> >> > iterating over programs and running them in sequence.
> >> >
> >> > Because bpf_prog_array is dynamically resized, after this change a
> >> > potentially blocking memory allocation in bpf(PROG_QUERY) callback can
> >> > happen, in order to collect program IDs before copying the values to
> >> > user-space supplied buffer. This forces us to adapt how we protect access
> >> > to the attached program in the callback. As bpf_prog_array_copy_to_user()
> >> > helper can sleep, we switch from an RCU read lock to holding a mutex that
> >> > serializes updaters.
> >> >
> >> > To handle bpf(PROG_ATTACH) scenario when we are replacing an already
> >> > attached program, we introduce a new bpf_prog_array helper called
> >> > bpf_prog_array_replace_item that will exchange the old program with a new
> >> > one. bpf-cgroup does away with such helper by computing an index into the
> >> > array from a program position in an external list of attached
> >> > programs/links. Such approach fails when a dummy prog is left in the array
> >> > after a memory allocation failure on link release, but is necessary in
> >> > bpf-cgroup case because the same BPF program can be present in the array
> >> > multiple times due to inheritance, and attachment cannot be reliably
> >> > identified by bpf_prog pointer comparison.
> >> >
> >> > No functional changes intended.
> >> >
> >> > Acked-by: Andrii Nakryiko <andriin@fb.com>
> >> > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> >> > ---
> >> >  include/linux/bpf.h        |   3 +
> >> >  include/net/netns/bpf.h    |   5 +-
> >> >  kernel/bpf/core.c          |  20 ++++--
> >> >  kernel/bpf/net_namespace.c | 137 +++++++++++++++++++++++++++----------
> >> >  net/core/flow_dissector.c  |  21 +++---
> >> >  5 files changed, 132 insertions(+), 54 deletions(-)
> >> >
> >>
> >> [...]
> >>
> >> > diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
> >> > index b951dab2687f..593523a22168 100644
> >> > --- a/kernel/bpf/net_namespace.c
> >> > +++ b/kernel/bpf/net_namespace.c
> >>
> >> [...]
> >>
> >> > @@ -93,8 +108,16 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
> >> >               goto out_unlock;
> >> >       }
> >> >
> >> > +     run_array = rcu_dereference_protected(net->bpf.run_array[type],
> >> > +                                           lockdep_is_held(&netns_bpf_mutex));
> >> > +     if (run_array)
> >> > +             ret = bpf_prog_array_replace_item(run_array, link->prog, new_prog);
> >>
> >> Thinking about this some more, link update should fail with -EINVAL if
> >> new_prog already exists in run_array. Same as PROG_ATTACH fails with
> >> -EINVAL when trying to attach the same prog for the second time.
> >>
> >> Otherwise, LINK_UPDATE can lead to having same BPF prog present multiple
> >> times in the prog_array, once attaching more than one prog gets enabled.
> >>
> >> Then we would we end up with the same challenge as bpf-cgroup, that is
> >> how to find the program index into the prog_array in presence of
> >> dummy_prog's.
> >
> > If you attach 5 different links having the same bpf_prog, it should be
> > allowed and all five bpf_progs should be attached and called 5 times.
> > They are independent links, that's the main thing. What specific BPF
> > program is attached by the link (or later updated to) shouldn't be of
> > any concern here (relative to other attached links/programs).
> >
> > Attaching the same *link* twice shouldn't be allowed, though.
>
> Thanks for clarifying. I need to change the approach then:
>
>  1) find the prog index into prog_array by iterating the list of links,
>  2) adjust the index for any dummy progs in prog_array below the index.
>
> That might work for bpf-cgroup too.
>

I thought that's what bpf-cgroup already does... Except right now
there could be no dummy progs, but if we do non-failing detachment,
there might be. Then hierarchies can get out of sync and we need to
handle that nicely. It's not super straightforward, that's why I said
that it's a nice challenge to consider :)

But here we don't have hierarchy, it's a happy place to be in :)


> The only other alternative I can think of it to copy the prog array to
> filter out dummy_progs, before replacing the prog on link update.

Probably over-complication. I'd filter dummy progs only on new link
attachment or detachment. Update can be kept very simple.

>
> >
> >>
> >> > +     else
> >> > +             ret = -ENOENT;
> >> > +     if (ret)
> >> > +             goto out_unlock;
> >> > +
> >> >       old_prog = xchg(&link->prog, new_prog);
> >> > -     rcu_assign_pointer(net->bpf.progs[type], new_prog);
> >> >       bpf_prog_put(old_prog);
> >> >
> >> >  out_unlock:
> >>
> >> [...]
> >>
> >> > @@ -217,14 +249,25 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> >> >       if (ret)
> >> >               goto out_unlock;
> >> >
> >> > -     attached = rcu_dereference_protected(net->bpf.progs[type],
> >> > -                                          lockdep_is_held(&netns_bpf_mutex));
> >> > +     attached = net->bpf.progs[type];
> >> >       if (attached == prog) {
> >> >               /* The same program cannot be attached twice */
> >> >               ret = -EINVAL;
> >> >               goto out_unlock;
> >> >       }
> >> > -     rcu_assign_pointer(net->bpf.progs[type], prog);
> >> > +
> >> > +     run_array = rcu_dereference_protected(net->bpf.run_array[type],
> >> > +                                           lockdep_is_held(&netns_bpf_mutex));
> >> > +     if (run_array) {
> >> > +             ret = bpf_prog_array_replace_item(run_array, attached, prog);
> >>
> >> I didn't consider here that there can be a run_array with a dummy_prog
> >> from a link release that failed to allocate memory.
> >>
> >> In such case bpf_prog_array_replace_item will fail, while we actually
> >> want to replace the dummy_prog.
> >>
> >> The right thing to do is to replace the first item in prog array:
> >>
> >>         if (run_array) {
> >>                 WRITE_ONCE(run_array->items[0].prog, prog);
> >>         } else {
> >>                 /* allocate a bpf_prog_array */
> >>         }
> >>
> >> This leaves just one user of bpf_prog_array_replace_item(), so I think
> >> I'm just going to fold it into its only caller, that is the update_prog
> >> callback.
> >
> > That will change relative order of BPF programs, which I think is bad.
> > So I agree that bpf_prog_array_replace_item is not all that useful and
> > probably should be dropped. And the right way is to know the position
> > of bpf_prog you are trying to replace/delete, just like cgroup case.
> > Except cgroup case is even more complicated due to inheritance and
> > hierarchy, which luckily you don't have to deal with here.
>
> I think we are on the same page.
>
> The write to first item could change the relative prog order, and even
> replace the wrong program, but only if we had to deal with a prog_array
> larger > 1.
>
> In this case, direct attachment with PROG_ATTACH to netns, the
> prog_array size is always 1, if it has been allocated already.
>
> That is because I did not plan to enable multi-prog attachment for
> flow_dissector with links. And only flow_dissector uses PROG_ATTACH to
> netns.

I was thinking in general for multi-link case, yeah. For this case, sure.

>
> >
> >>
> >> > +     } else {
> >> > +             ret = bpf_prog_array_copy(NULL, NULL, prog, &run_array);
> >> > +             rcu_assign_pointer(net->bpf.run_array[type], run_array);
> >> > +     }
> >> > +     if (ret)
> >> > +             goto out_unlock;
> >> > +
> >> > +     net->bpf.progs[type] = prog;
> >> >       if (attached)
> >> >               bpf_prog_put(attached);
> >> >
> >>
> >> [...]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 2/3] bpf, netns: Keep attached programs in bpf_prog_array
  2020-06-24 18:24         ` Andrii Nakryiko
@ 2020-06-24 18:37           ` Jakub Sitnicki
  0 siblings, 0 replies; 15+ messages in thread
From: Jakub Sitnicki @ 2020-06-24 18:37 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, Networking, kernel-team, Andrii Nakryiko

On Wed, Jun 24, 2020 at 08:24 PM CEST, Andrii Nakryiko wrote:
> On Wed, Jun 24, 2020 at 11:16 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> On Wed, Jun 24, 2020 at 07:47 PM CEST, Andrii Nakryiko wrote:
>> > On Wed, Jun 24, 2020 at 10:19 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>> >>
>> >> On Tue, Jun 23, 2020 at 12:34 PM CEST, Jakub Sitnicki wrote:
>> >> > Prepare for having multi-prog attachments for new netns attach types by
>> >> > storing programs to run in a bpf_prog_array, which is well suited for
>> >> > iterating over programs and running them in sequence.
>> >> >
>> >> > Because bpf_prog_array is dynamically resized, after this change a
>> >> > potentially blocking memory allocation in bpf(PROG_QUERY) callback can
>> >> > happen, in order to collect program IDs before copying the values to
>> >> > user-space supplied buffer. This forces us to adapt how we protect access
>> >> > to the attached program in the callback. As bpf_prog_array_copy_to_user()
>> >> > helper can sleep, we switch from an RCU read lock to holding a mutex that
>> >> > serializes updaters.
>> >> >
>> >> > To handle bpf(PROG_ATTACH) scenario when we are replacing an already
>> >> > attached program, we introduce a new bpf_prog_array helper called
>> >> > bpf_prog_array_replace_item that will exchange the old program with a new
>> >> > one. bpf-cgroup does away with such helper by computing an index into the
>> >> > array from a program position in an external list of attached
>> >> > programs/links. Such approach fails when a dummy prog is left in the array
>> >> > after a memory allocation failure on link release, but is necessary in
>> >> > bpf-cgroup case because the same BPF program can be present in the array
>> >> > multiple times due to inheritance, and attachment cannot be reliably
>> >> > identified by bpf_prog pointer comparison.
>> >> >
>> >> > No functional changes intended.
>> >> >
>> >> > Acked-by: Andrii Nakryiko <andriin@fb.com>
>> >> > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> >> > ---
>> >> >  include/linux/bpf.h        |   3 +
>> >> >  include/net/netns/bpf.h    |   5 +-
>> >> >  kernel/bpf/core.c          |  20 ++++--
>> >> >  kernel/bpf/net_namespace.c | 137 +++++++++++++++++++++++++++----------
>> >> >  net/core/flow_dissector.c  |  21 +++---
>> >> >  5 files changed, 132 insertions(+), 54 deletions(-)
>> >> >
>> >>
>> >> [...]
>> >>
>> >> > diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
>> >> > index b951dab2687f..593523a22168 100644
>> >> > --- a/kernel/bpf/net_namespace.c
>> >> > +++ b/kernel/bpf/net_namespace.c
>> >>
>> >> [...]
>> >>
>> >> > @@ -93,8 +108,16 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
>> >> >               goto out_unlock;
>> >> >       }
>> >> >
>> >> > +     run_array = rcu_dereference_protected(net->bpf.run_array[type],
>> >> > +                                           lockdep_is_held(&netns_bpf_mutex));
>> >> > +     if (run_array)
>> >> > +             ret = bpf_prog_array_replace_item(run_array, link->prog, new_prog);
>> >>
>> >> Thinking about this some more, link update should fail with -EINVAL if
>> >> new_prog already exists in run_array. Same as PROG_ATTACH fails with
>> >> -EINVAL when trying to attach the same prog for the second time.
>> >>
>> >> Otherwise, LINK_UPDATE can lead to having same BPF prog present multiple
>> >> times in the prog_array, once attaching more than one prog gets enabled.
>> >>
>> >> Then we would we end up with the same challenge as bpf-cgroup, that is
>> >> how to find the program index into the prog_array in presence of
>> >> dummy_prog's.
>> >
>> > If you attach 5 different links having the same bpf_prog, it should be
>> > allowed and all five bpf_progs should be attached and called 5 times.
>> > They are independent links, that's the main thing. What specific BPF
>> > program is attached by the link (or later updated to) shouldn't be of
>> > any concern here (relative to other attached links/programs).
>> >
>> > Attaching the same *link* twice shouldn't be allowed, though.
>>
>> Thanks for clarifying. I need to change the approach then:
>>
>>  1) find the prog index into prog_array by iterating the list of links,
>>  2) adjust the index for any dummy progs in prog_array below the index.
>>
>> That might work for bpf-cgroup too.
>>
>
> I thought that's what bpf-cgroup already does... Except right now
> there could be no dummy progs, but if we do non-failing detachment,
> there might be. Then hierarchies can get out of sync and we need to
> handle that nicely. It's not super straightforward, that's why I said
> that it's a nice challenge to consider :)

Now I get it... bpf-cgroup doesn't use bpf_prog_array_delete_safe(). I
was confusing it with what I saw in bpf_trace all along.

> But here we don't have hierarchy, it's a happy place to be in :)

It feels like there are some war stories to tell about bpf-cgroup.

>> The only other alternative I can think of it to copy the prog array to
>> filter out dummy_progs, before replacing the prog on link update.
>
> Probably over-complication. I'd filter dummy progs only on new link
> attachment or detachment. Update can be kept very simple.

OK, I know what I need to do.

[...]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 0/3] bpf, netns: Prepare for multi-prog attachment
  2020-06-23 10:34 [PATCH bpf-next v2 0/3] bpf, netns: Prepare for multi-prog attachment Jakub Sitnicki
                   ` (2 preceding siblings ...)
  2020-06-23 10:34 ` [PATCH bpf-next v2 3/3] bpf, netns: Keep a list of attached bpf_link's Jakub Sitnicki
@ 2020-06-29 14:56 ` Daniel Borkmann
  2020-06-29 14:57   ` Daniel Borkmann
  3 siblings, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2020-06-29 14:56 UTC (permalink / raw)
  To: Jakub Sitnicki, bpf
  Cc: netdev, kernel-team, Andrii Nakryiko, Stanislav Fomichev

On 6/23/20 12:34 PM, Jakub Sitnicki wrote:
> This patch set prepares ground for link-based multi-prog attachment for
> future netns attach types, with BPF_SK_LOOKUP attach type in mind [0].
> 
> Two changes are needed in order to attach and run a series of BPF programs:
> 
>    1) an bpf_prog_array of programs to run (patch #2), and
>    2) a list of attached links to keep track of attachments (patch #3).
> 
> I've been using these patches with the next iteration of BPF socket lookup
> hook patches, and saw that they are self-contained and can be split out to
> ease the review burden.
> 
> Nothing changes for BPF flow_dissector. That is at most one prog can be
> attached.
> 
> Thanks,
> -jkbs
> 
> [0] https://lore.kernel.org/bpf/20200511185218.1422406-1-jakub@cloudflare.com/
> 
> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> Cc: Stanislav Fomichev <sdf@google.com>
> 
> v1 -> v2:
> 
> - Show with a (void) cast that bpf_prog_array_replace_item() return value
>    is ignored on purpose. (Andrii)
> - Explain why bpf-cgroup cannot replace programs in bpf_prog_array based
>    on bpf_prog pointer comparison in patch #2 description. (Andrii)
> 
> Jakub Sitnicki (3):
>    flow_dissector: Pull BPF program assignment up to bpf-netns
>    bpf, netns: Keep attached programs in bpf_prog_array
>    bpf, netns: Keep a list of attached bpf_link's
> 
>   include/linux/bpf.h          |   3 +
>   include/net/flow_dissector.h |   3 +-
>   include/net/netns/bpf.h      |   7 +-
>   kernel/bpf/core.c            |  20 +++-
>   kernel/bpf/net_namespace.c   | 189 +++++++++++++++++++++++++----------
>   net/core/flow_dissector.c    |  34 +++----
>   6 files changed, 173 insertions(+), 83 deletions(-)
> 

Applied, thanks!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v2 0/3] bpf, netns: Prepare for multi-prog attachment
  2020-06-29 14:56 ` [PATCH bpf-next v2 0/3] bpf, netns: Prepare for multi-prog attachment Daniel Borkmann
@ 2020-06-29 14:57   ` Daniel Borkmann
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Borkmann @ 2020-06-29 14:57 UTC (permalink / raw)
  To: Jakub Sitnicki, bpf
  Cc: netdev, kernel-team, Andrii Nakryiko, Stanislav Fomichev

On 6/29/20 4:56 PM, Daniel Borkmann wrote:
[...]
> Applied, thanks!

(The v3 one of course.)

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-06-29 19:43 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-23 10:34 [PATCH bpf-next v2 0/3] bpf, netns: Prepare for multi-prog attachment Jakub Sitnicki
2020-06-23 10:34 ` [PATCH bpf-next v2 1/3] flow_dissector: Pull BPF program assignment up to bpf-netns Jakub Sitnicki
2020-06-23 10:34 ` [PATCH bpf-next v2 2/3] bpf, netns: Keep attached programs in bpf_prog_array Jakub Sitnicki
2020-06-23 19:33   ` Martin KaFai Lau
2020-06-23 20:59     ` Jakub Sitnicki
2020-06-23 21:24       ` Martin KaFai Lau
2020-06-24 17:33         ` Jakub Sitnicki
2020-06-24 17:18   ` Jakub Sitnicki
2020-06-24 17:47     ` Andrii Nakryiko
2020-06-24 18:13       ` Jakub Sitnicki
2020-06-24 18:24         ` Andrii Nakryiko
2020-06-24 18:37           ` Jakub Sitnicki
2020-06-23 10:34 ` [PATCH bpf-next v2 3/3] bpf, netns: Keep a list of attached bpf_link's Jakub Sitnicki
2020-06-29 14:56 ` [PATCH bpf-next v2 0/3] bpf, netns: Prepare for multi-prog attachment Daniel Borkmann
2020-06-29 14:57   ` Daniel Borkmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).